
OzDASL

Underdispersed Word Counts

Keywords: Poisson distribution, underdispersion


Description

In studies that aim to characterise an author's style, samples of n words are taken and the number of function words in each sample is counted. Binomial or Poisson distributions are often assumed for these counts. The table shows the combined frequencies (x) of the articles "the", "a" and "an" in samples from Macaulay's "Essay on Milton", taken from the 1923 Oxford edition of Macaulay's literary essays. Non-overlapping samples were taken from the opening words of two randomly chosen lines on each of 50 pages of printed text, giving 100 samples per sample size. The 10-word samples are simply extensions of the 5-word samples. The data show clear evidence of underdispersion: the counts vary less than the assumed models predict. Under a Poisson model the variance equals the mean, so here the sample variance of x falls below its mean.
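One rough way to check the underdispersion claim is to compare the sample variance of x with the variance each model implies. The sketch below is illustrative and uses only the Python standard library. `dispersion_summary` is a hypothetical helper, not part of the data file or the cited sources. It takes a frequency table mapping each count x to the number of samples with that count, together with the sample size n. It then reports the variance-to-mean ratio (about 1 under a Poisson model) and the variance relative to the binomial value n·p·(1 − p), with p estimated as mean/n. It also returns the index-of-dispersion statistic, which is approximately chi-square with (samples − 1) degrees of freedom under the Poisson model. The frequency table in the usage lines is made up for illustration and is not the Bailey (1990) data.

```python
from typing import Dict


def dispersion_summary(freq: Dict[int, int], n_words: int) -> Dict[str, float]:
    """Compare the spread of counts x with Poisson and binomial models.

    Hypothetical helper: freq maps each count x (e.g. number of articles in a
    sample) to the number of samples showing that count; n_words is the
    number of words per sample (n).
    """
    n_samples = sum(freq.values())
    if n_samples < 2:
        raise ValueError("need at least two samples to estimate a variance")
    mean = sum(x * f for x, f in freq.items()) / n_samples
    p_hat = mean / n_words  # estimated proportion of function words
    if mean == 0 or p_hat >= 1:
        raise ValueError("mean must lie strictly between 0 and n_words")
    # Sum of squared deviations, weighted by how many samples had each count.
    ss = sum(f * (x - mean) ** 2 for x, f in freq.items())
    var = ss / (n_samples - 1)  # unbiased sample variance
    return {
        "samples": n_samples,
        "mean": mean,
        "variance": var,
        # Poisson: variance == mean, so a ratio well below 1 suggests underdispersion.
        "poisson_ratio": var / mean,
        # Binomial(n, p): variance == n * p * (1 - p).
        "binomial_ratio": var / (n_words * p_hat * (1 - p_hat)),
        # Index of dispersion, roughly chi-square on n_samples - 1 df under Poisson.
        "dispersion_stat": ss / mean,
    }


if __name__ == "__main__":
    # Made-up frequency table for 5-word samples (NOT the published data).
    example = {0: 2, 1: 4, 2: 4}
    for key, value in dispersion_summary(example, n_words=5).items():
        print(f"{key}: {value:.4f}")
```

Small values of the dispersion statistic, relative to a chi-square distribution on (samples − 1) degrees of freedom, point to underdispersion. Computing the tail probability would need a chi-square CDF, for example from SciPy. That step is left out here to keep the sketch dependency-free.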

Download

Data file (tab-delimited text)

Source

Bailey, B.J.R. (1990). A model for function word counts. Applied Statistics, 39, 107-114, Table 1.
Hand, D.J., Daly, F., Lunn, A.D., McConway, K.J., and Ostrowski, E. (1994). A Handbook of Small Data Sets. London: Chapman & Hall. Data set 486.


Copyright © Gordon Smyth