How many different words are in an average book?

Tim   Sat Sep 19, 2009 8:31 am GMT
Hi,
I'd like to know how many different words there are in an average book. Say, for example, an average book by Michael Crichton, or the recent book Brisingr by Christopher Paolini. I imagine the latter would have more words.
Thanks!
Trimac20   Sat Sep 19, 2009 9:05 am GMT
I think in the average novel most of the 'everyday words' would be covered, like 'a', 'it', 'them'. The count would obviously vary with the author's style and the length and genre of the book.
Tim   Sat Sep 19, 2009 10:51 am GMT
Trimac20, I mean no slight, but can you read?
I've written a perfectly understandable post with a clear question. I've requested an answer. And what do I get?
You're unnecessarily leading this thread off-topic. If you don't know the answer, don't write at all.
Does anyone else have a rough idea of the number of different words in books (novels)?
Johnny   Sat Sep 19, 2009 11:00 am GMT
<<I think in the average novel most of the 'everyday words' would be covered. Like 'a', 'it', 'them'.>>

LOL, I don't think you can find a novel without a single "it" or "a" or "them" in it.

Tim, I once read somewhere, I don't remember where, that something like 98% of a novel is made up of less than 8000 words, if I'm not mistaken. Try to google something like "number of vocabulary words". But the point was that "knowing 8000 words, you can read and understand a novel", not that 8000 words are all in a single novel. I don't know how you should count those.
Guest   Sat Sep 19, 2009 11:32 am GMT
I wrote a program a while back that can measure this. I have applied it to some public domain works because their full texts are easily available. Note that these counts are inflated because they include proper nouns and do not merge different forms of the same word.

Alice's Adventures in Wonderland: 2766
Pride and Prejudice: 6424
A Tale of Two Cities: 9877
Oliver Twist: 10419
A Connecticut Yankee in King Arthur's Court: 10312

Well, you can see that the number varies quite a bit. I can try other PD books if you like.
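For anyone who wants to try a count like this, here is a minimal sketch in Python. It is not the program mentioned above, which was not posted; the function name and the tokenizing rule (lowercase letters with optional internal apostrophes) are assumptions made for illustration. Like the counts above, it treats every inflected form and every proper noun as a separate word.

```python
import re

def distinct_words(text):
    """Return the set of distinct lowercase word forms found in text."""
    # A "word" here is a run of letters, optionally joined by apostrophes
    # (so "don't" and "cat's" each count as one form). This rule is an
    # assumption; other tokenizing choices give different totals.
    return set(re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower()))

sample = "The cat sat. The cats sat on the mat; the cat's mat."
forms = distinct_words(sample)
print(len(forms), sorted(forms))
```

To run it on a whole book, read a plain-text file into a string and pass that string to distinct_words.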
Tim   Sat Sep 19, 2009 11:39 am GMT
Thanks, Johnny!
Lately, I've been wondering how many words I know (passively). I can read novels with ease. I don't come across new words very often, but there are still some words I haven't encountered or don't remember. So now I'm trying to figure out how many words I might know.
<<Tim, I once read somewhere, I don't remember where, that something like 98% of a novel is made up of less than 8000 words, if I'm not mistaken.>>
That remaining 2% is still a huge number of new words, considering the overall number of words in a novel, if I knew only those 8000 words.
I'll try to google some more. Meanwhile, I'd be very interested in other ideas.
Tim   Sat Sep 19, 2009 11:46 am GMT
Guest,
That's very interesting! Too bad the program counts all the inflected forms of the same word separately. I'm assuming that your program also counts names, places and the like.
Judging by your results, that's not too many words, although I suspected something like that.
Guest   Sun Sep 20, 2009 5:08 am GMT
Yeah, it does what you mentioned so the numbers are higher than they should be. If I had a data set which listed a large number of words along with their inflections, I could get an exact count. I don't know of anything like that, though.
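As a rough sketch of what that could look like: given a table mapping each inflected form to its base form, the count collapses to the number of distinct base forms. The LEMMAS table below is a tiny hypothetical stand-in for such a data set, not a real resource, and count_base_words is an illustrative name.

```python
# Hypothetical inflection table: inflected form -> base form.
# A real table would need many thousands of entries.
LEMMAS = {"cats": "cat", "sat": "sit", "sits": "sit", "mats": "mat"}

def count_base_words(tokens, lemmas):
    """Count distinct base forms; words missing from the table count as themselves."""
    return len({lemmas.get(token, token) for token in tokens})

tokens = ["the", "cat", "sat", "the", "cats", "sits", "on", "mats"]
print(len(set(tokens)))                  # distinct forms, inflections counted separately
print(count_base_words(tokens, LEMMAS))  # distinct base forms
```

Proper nouns would still be counted unless they were filtered out separately.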
Guest   Sun Sep 20, 2009 7:05 am GMT
I think BE (British English) books contain more words than AE (American English) books.
Phoenicius   Sun Sep 20, 2009 9:06 am GMT
IIRC, the King James Bible has something like 11000 or 12000 distinct words.
in   Sun Sep 20, 2009 12:06 pm GMT
I think you can manually count what share of the words on a page are inflections (on 3-4 pages, let's say), and then just multiply that percentage by the total number of words to estimate how many to subtract.
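The arithmetic for that estimate might look like the sketch below. The page tallies are made-up numbers for illustration only; 6424 is the Pride and Prejudice count given earlier in the thread. Note that the share of inflected words on a page is only a rough proxy for the share of distinct forms that are duplicates, so treat the result as a ballpark figure.

```python
def estimate_base_count(total_distinct, page_tallies):
    """Scale a distinct-form count down by the sampled share of inflections.

    page_tallies is a list of (inflected_words, words_checked) pairs,
    one pair per hand-checked page.
    """
    inflected = sum(i for i, _ in page_tallies)
    checked = sum(n for _, n in page_tallies)
    share = inflected / checked
    return round(total_distinct * (1 - share))

# Invented tallies from three hand-checked pages (illustration only).
pages = [(60, 300), (75, 310), (65, 290)]
print(estimate_base_count(6424, pages))
```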
Tim   Sat Sep 26, 2009 8:40 am GMT
I've found this article that says:

"In written text, we know that about 2000 word families cover about 85-90% of the running words in general texts and that 50% of any text will be function words (Nation 2001). We also know that to read a native novel, a newspaper or a magazine with 98% vocabulary coverage, a learner would need to know about 8000-9000 word families."

The article: http://www.robwaring.org/what_and_why/er_is_vital.htm
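To check coverage figures like these against a particular text, one could count how many of the most frequent words are needed to reach a target share of the running words. A minimal sketch follows; it counts word forms rather than word families, so its numbers will not match the article's directly, and the function name is made up for illustration.

```python
from collections import Counter

def words_needed_for_coverage(tokens, target):
    """Return how many of the most frequent words cover `target` of all tokens.

    target is a fraction of the running words, e.g. 0.98 for 98%.
    """
    counts = Counter(tokens)
    total = len(tokens)
    covered = 0
    # Walk through words from most to least frequent, adding up their counts.
    for rank, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered / total >= target:
            return rank
    return len(counts)

tokens = ["the"] * 5 + ["cat"] * 3 + ["sat", "mat"]
print(words_needed_for_coverage(tokens, 0.8))
```

On a real novel, tokens would be the full list of words in reading order and target would be 0.98.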