Underdispersed Word Counts

Keywords: Poisson distribution, underdispersion


In studies aimed at characterising an author's style, samples of n words are taken and the number of function words in each sample is counted. Binomial or Poisson distributions are often assumed to hold for these counts. The table shows the combined frequencies (x) of the articles "the", "a" and "an" in samples from Macaulay's "Essay on Milton", taken from the Oxford edition of Macaulay's (1923) literary essays. Non-overlapping samples were drawn from the opening words of two randomly chosen lines from each of 50 pages of printed text, the 10-word samples being simply extensions of the 5-word samples. The data show clear evidence of underdispersion: the counts vary less than a binomial or Poisson model would imply.
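One common way to check for underdispersion is to compare the sample variance of the counts with the variance a reference model would predict. The sketch below is a minimal illustration in Python, not the analysis used by the original source. The function names are hypothetical, and the counts in `example_counts` are invented for illustration only; they are not the Macaulay data. Under a Poisson model the variance-to-mean ratio should be close to 1, and under a binomial model with n words per sample the variance should be close to n p (1 - p), so ratios well below 1 point to underdispersion.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean (close to 1 under a Poisson model)."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count is zero; the index is undefined")
    return variance(counts) / m


def binomial_dispersion_ratio(counts, n):
    """Sample variance divided by the binomial variance n*p*(1-p), with p = mean/n."""
    p = mean(counts) / n
    expected = n * p * (1 - p)
    if expected == 0:
        raise ValueError("binomial variance is zero; the ratio is undefined")
    return variance(counts) / expected


# Invented article counts for ten 5-word samples (NOT the real data).
example_counts = [1, 0, 1, 1, 0, 1, 1, 2, 1, 1]

print("variance / mean:", dispersion_index(example_counts))
print("variance / binomial variance:", binomial_dispersion_ratio(example_counts, n=5))
```

To apply this to the actual data, the frequency table would first need to be expanded into one count per sample (or the mean and variance computed directly from the frequencies), separately for the 5-word and 10-word samples.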


Data file (tab-delimited text)


