Keywords: multiple regression, outliers
The data were collected as part of a time study for Telecom, now known as Telstra. The purpose of the study was to model the total hours worked in a section of Telecom in terms of the counts of various tasks. It was hoped that such a model could be used to predict hours worked, and hence staffing requirements, in changing circumstances. The number of hours worked by employees in a fault reporting centre was recorded, together with the number of faults of each type recorded.
Employees often work on a flexitime system which allows them to build up time and to leave early every second Friday.
|Hours||Number of hours worked|
|ByDa||Number of tasks of a certain type|
|RWT||A type of fault variable|
|SOA||Number of service orders of type A|
|SOB||Number of service orders of type B|
|SOC||Number of service orders of type C|
|Day||Day of the week: 1-Monday, 2-Tuesday, 3-Wednesday, 4-Thursday, 5-Friday|
Data file (tab-delimited text)
Gordon Smyth, Consulting Problem, 1981.
This is an interesting data set because there is a noticeable Friday effect as employees take flexitime off. Initially this information was not known, so two Fridays were identified as outliers. The following GLIM code shows the transformations applied to each of the predictor variables.
$UNITS 31
$FACTOR DAY 5 FRI 2
$VAR HRS BD PR RWT FT SOA SOB SOC CBL FD HOT REST SPEC APP PROB SC HO MO
$DATA HRS BD PR RWT FT SOA SOB SOC CBL FD HOT REST SPEC APP PROB SC HO MO
$DINPUT 21
$CALC LRWT=%LOG(28/109*RWT+1)
    : SO=SOA+SOB+SOC
    : LSO=%LOG(28/496*SO+1)
    : LHOT=%LOG(HOT+1)
    : LBD=%LOG(28/174*BD+1)
    : LPR=%LOG(28/81*PR+1)
    : LFT=%LOG(28/390*FT+1)
    : DAY=%GL(5,1)+2
    : DAY=DAY-5*%GT(DAY,5)
    : FRI=%EQ(DAY,5)+1
$RETURN
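
For readers unfamiliar with GLIM, the $CALC transformations above can be sketched in Python. This is only an illustrative translation, not part of the original analysis: the function names (scaled_log, day_codes, friday_factor) are hypothetical, the divisors (109, 496, 174, 81, 390) are copied from the GLIM code without further interpretation, and it assumes that %LOG is the natural logarithm and that %GL(5,1) cycles through the levels 1 to 5 one unit at a time.

```python
import math


def scaled_log(x, divisor):
    """Mirror e.g. LRWT=%LOG(28/109*RWT+1): natural log of 28/divisor*x + 1."""
    return math.log(28.0 / divisor * x + 1.0)


def day_codes(n_units=31):
    """Mirror DAY=%GL(5,1)+2 followed by DAY=DAY-5*%GT(DAY,5).

    %GL(5,1) is assumed to give 1,2,3,4,5,1,2,... so adding 2 and wrapping
    values above 5 yields a day sequence that starts on Wednesday (3).
    """
    days = []
    for i in range(n_units):
        day = (i % 5 + 1) + 2   # %GL(5,1) + 2
        if day > 5:             # subtract 5 where DAY > 5
            day -= 5
        days.append(day)
    return days


def friday_factor(days):
    """Mirror FRI=%EQ(DAY,5)+1: level 2 on Fridays, level 1 otherwise."""
    return [2 if d == 5 else 1 for d in days]


# Example: the first week and a half of day codes and the Friday factor.
days = day_codes(7)
fri = friday_factor(days)

# Example: combined service orders, as in SO=SOA+SOB+SOC then LSO.
# The counts below are made-up values used purely for illustration.
soa, sob, soc = 10, 5, 3
lso = scaled_log(soa + sob + soc, 496)
```

The +1 inside each logarithm keeps the transform defined when a count is zero, and the two-level Friday factor is what allows a model to absorb the flexitime effect rather than flagging those Fridays as outliers.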