Keywords: regression with two groups, multiple regression, outliers.
This data set was assembled by Rowan Todd and Mark McNaughton, two students studying Statistics at QUT in a class taught by Dr Margaret Mackisack. For a class project they decided to investigate the effect on football game attendance of various covariates. They collected data involving Saturday Australian Football League (AFL) matches at the Melbourne Cricket Ground (MCG). They looked only at matches during the normal home and away season (i.e. not including finals). They used statistics from all such games in 1993 and 1994 (nineteen relevant matches in 1993 and twenty-two in 1994). The response variable measured was attendance at the MCG, and after consideration, they came up with the following covariates:
|MCG||Attendance at the MCG in 1000's.|
|Temp||Temperature. The forecast maximum temperature on the day of the match, in whole degrees C, found in The Weekend Australian.|
|Other||Attendance at other matches in 1000's. The sum of the attendances at other AFL matches in Melbourne and Geelong on the same day as the match in question.|
|Members||Membership. The sum, in 1000's, of the memberships of the two clubs whose teams were playing in the match in question.|
|Top50||Number of players from the top fifty. The number of players in the top 50 in the AFL who happened to be playing in the match in question.|
|Date||Date of the match in the format dd/mm/yy.|
|Home||Abbreviation for home team.|
|Away||Abbreviation for away team.|
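As a small aid to reading the Date column, the following sketch parses a dd/mm/yy string with the Python standard library. The example string is hypothetical and is not taken from the data file; it only shows the format.

```python
from datetime import datetime

# Hypothetical example string in dd/mm/yy format; NOT a row from the data file.
raw_date = "03/04/93"

# %d = day, %m = month, %y = two-digit year (93 is read as 1993).
match_date = datetime.strptime(raw_date, "%d/%m/%y").date()

# weekday() returns 0 for Monday through 6 for Sunday, so Saturday is 5.
is_saturday = match_date.weekday() == 5
print(match_date.isoformat(), is_saturday)
```

A check like `is_saturday` may be useful for confirming that every row read from the file is a Saturday match, as the description states.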
The abbreviations for team names are given below, together with the membership of each club in 1993 and 1994.
|Abbrev.||Club Name||Members 93||Members 94|
Data file (tab-delimited text)
Various copies of The Weekend Australian, The Football Bible '94 by Rex Hunt, and various copies of Inside Football and Football Record.
Data and description supplied by Dr Margaret S. Mackisack, Department of Mathematics, University of Queensland.
Attendance increases with club membership, but at only about half the rate when one of the teams is from interstate. Perhaps the membership of the home and away teams should be included as separate covariates.
To use as an example of simple linear regression, regress MCG on Members just for the Victorian clubs. To use as an example of simple linear regression with two groups, compare the regression of MCG on Members when the away side is Victorian with the regression when it is from interstate.
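The two-group comparison can be sketched in Python as follows. This is a minimal illustration, not the analysis used by the authors. The function names `fit_line` and `fit_by_group` are hypothetical, the `interstate` flag is assumed to have been derived by hand from the Away abbreviation, and the numbers are placeholders chosen so the fitted lines are easy to check; they are not values from the data file.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_by_group(members, mcg, interstate):
    """Fit a separate line of MCG on Members for each value of the group flag."""
    fits = {}
    for flag in (False, True):
        xs = [m for m, g in zip(members, interstate) if g == flag]
        ys = [a for a, g in zip(mcg, interstate) if g == flag]
        fits[flag] = fit_line(xs, ys)
    return fits


# Placeholder values for illustration only; NOT taken from the data file.
members = [10.0, 20.0, 30.0, 10.0, 20.0, 30.0]      # Members, in 1000's
mcg = [15.0, 25.0, 35.0, 10.0, 15.0, 20.0]          # MCG, in 1000's
interstate = [False, False, False, True, True, True]  # assumed: away side from interstate

fits = fit_by_group(members, mcg, interstate)
for flag, (intercept, slope) in fits.items():
    label = "interstate away side" if flag else "Victorian away side"
    print(label, "intercept:", round(intercept, 2), "slope:", round(slope, 2))
```

With the real data, `members`, `mcg` and `interstate` would be read from the tab-delimited file, and comparing the two fitted slopes gives a direct check on the "about half the rate" observation above.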