S archive: Generalized linear models with random effects

reglm Generalized linear models with random effects

DESCRIPTION

Fits a generalized linear model including random factors using the method of Schall (1991). Produces an object of class "glm" with some additional components.

USAGE

reglm(formula = formula(data), random = NULL, family = gaussian(), data = sys.parent(), weights = NULL, dispersion = NULL, subset = NULL, na.action = na.fail, contrasts = NULL, start = NULL, mustart = NULL, betastart = NULL, sigmastart = NULL, control = reglm.control(...), ykeep = T, xkeep = F, ukeep = F, ...)

REQUIRED ARGUMENTS

formula a formula expression as for glm, of the form response ~ predictors. This specifies the fixed part of the model. See the documentation of lm and formula for details. As for glm, this specifies the linear predictor for modelling the mean. A term of the form offset(expression) is allowed.

OPTIONAL ARGUMENTS

Almost all arguments for reglm are as for glm. The following arguments are new or altered. The first is the important one. The others are likely to be of only occasional use.

`random`		a list object, the components of which are the factors to be treated as random. This specifies the random part of the model.
`dispersion`		specifies a prior value for the dispersion of the glm. This argument defaults to 1 for the binomial and Poisson families. Same as the $scale command in GLIM.
`mustart`		numeric vector giving starting values for the fitted values or expected responses (including the predicted values of the random effects). This is the same as the usual glm argument `start` except on the scale of the means rather than of the linear predictors. Must be of the same length as the response, or of length 1 if a constant starting vector is desired. Ignored if `betastart` is supplied.
`betastart`		numeric vector giving starting values for the regression coefficients in the link-linear model for the mean (including predicted values of the random effects).
`sigmastart`		numeric vector giving starting values for the variance components
`control`		a list of iteration and algorithmic constants. See `reglm.control` for their names and default values. These can also be set as arguments to `dglm` itself. This argument is the same as control.glm except that the defaults have been changed.
`ykeep`		logical flag: if `TRUE`, the vector of responses is returned. Same as `y` for glm.
`xkeep`		logical flag: if `TRUE`, the `model.matrix` for the mean model is returned. Same as `x` for glm.
`zkeep`		logical flag: if `TRUE`, the `model.matrix` for the variance components is returned.

VALUE

an object of class glm is returned. The following components are additional to the usual glm components:

`sigma`		vector of estimated variance components.
`dispersion`		estimated dispersion. If the family had a prior dispersion, then the dispersion is estimated only at convergence of the iteration and returned to diagnose over- or under-dispersion.
`prior.dispersion`		prior value of the dispersion for the glm. Same as the input argument dispersion if that was specified.
`q`		vector of dimensions of the random factors..
`edf`		vector of effective degrees of freedom used to estimate the variance components.
`u`		design matrix for the random factors, if ukeep=T on input.

DETAILS

We assume the linear predictor for the generalized linear model is of the form

g(m) = Xb + U₁b₁ + ... + U_cb_c

where b is a vector of fixed coefficients and b₁ ... b_c are vectors of random effects. The random effects are assumed to have variances s₁ ... s_c. The design matrix X is specified using the model formula and the design matrices U₁ ... U_c are specified using the input argument random.

The parameters are estimated here using the REML method described in Schall (1991).

REFERENCES

Schall, R (1991). Estimation in generalized linear models with random effects. Biometrika, 78, 719-727.

Candy, S. G. (2004). Modelling catch and effort data using generalized linear models, the Tweedie distribution, random vessel effects and random stratum-by-year effects. CCAMLR Science 11, 59-80.

ACKNOWLEDGEMENTS

Thanks for Dr Steve Candy, Australian Antarctic Division, for contributing a bug fix, 6 Nov 2004.

SEE ALSO

dglm

The following program is included in the reglm software distribution, but is not separately documented: reglm.control.

BUGS

On output the regression parameters don't have labels.

EXAMPLES

Here we demonstrate the reglm function on the radiation of cancer cells data example from Schall (1991). Basic analysis of the data shows that there is strong evidence for differences between the occasions, and also good evidence for overdispersion between the dishes (observations) within occasions.

> radiatio <- read.table("radatio.txt",header=T)
> attach(radiatio)
> n <- rep(400,27)

We specify a binomial model with two random effects. The factor Occasion represents the radiation occasion. There is also a random effect at the observation level, represented by (1:27) since there are 27 observations, which picks up any over-dispersion in the data. Trace is turned on so that we can see the progress of the iteration. The variance components and dispersion are output at each iteration.

> out <- reglm(Survived/n~1,list(Occasion=Occasion,Units=(1:27)),
         family=binomial,weights=n,trace=T)
Iter 1 0.0000505604239786021 0.0000166989428102248 1
Iter 2 0.00283092775872843 0.000306570522656396 1
Iter 3 0.0914639350937484 0.00247127680331739 1
Iter 4 0.207417308402352 0.00381847728525275 1
Iter 5 0.222886503891888 0.00539847167615311 1
Iter 6 0.223151486991139 0.00690093186214274 1
Iter 7 0.222683195620167 0.00805926586181473 1
Iter 8 0.222303061660932 0.00882599204674674 1
Iter 9 0.222049310739124 0.00928435931728272 1
Iter 10 0.22189665203915 0.0095420092039735 1
Iter 11 0.221810447442714 0.00968186587710608 1
Iter 12 0.221763517700708 0.0097563502955682 1
Iter 13 0.221738481620298 0.00979561756518506 1
Iter 14 0.221725270519245 0.00981620792679515 1
Iter 15 0.221718339587564 0.00982697440004381 1
Iter 16 0.221714714509054 0.00983259577809794 1
Iter 17 0.221712821520788 0.00983552854676926 1
> out$sigma
[1] 0.221712822 0.009835529
> out$dispersion
[1] 0.9369549

The final variance components, and residual variance estimate, are the same as in Schall (1991). Note that the prior dispersion in this case is 1.

> out$q
[1]  9 27
> out$edf
[1] 7.744181 8.232174

The random effects have dimensions 9 and 27 respectively. After smoothing the random effects, the effective degrees of freedom are 7.7 and 8.2 respectively. Including Occasion as a fixed effect would have used 8 df. In this case the effect of Occasion is so strong that there is very little difference between the fixed and random models for it.

> out$rank-sum(out$q)+sum(out$edf)
[1] 16.97635

out$rank gives the total number of parameters in the linear predictor including both the fixed and random parameters. Therefore out$rank-sum(out$q) gives the number of fixed parameters, in this case 1. The total effective degrees of freedom used in modelling the means is just under 17.0, including the 1 degree of freedom for the fixed model.

> coef(out)[2:10]
[1]  0.69126760 -0.18631757 -0.70870278 -0.01694652  0.19448300 -0.02442938
[7]  0.60688086 -0.54311626 -0.01311896
> contrasts(Occasion) <- contr.sum(levels(Occasion))
> out.fixed <- glm(Survived/n~Occasion,family=binomial,weights=n)
> coef(out.fixed)[2:9]
 Occasion1  Occasion2  Occasion3   Occasion4 Occasion5   Occasion6
 0.7128812 -0.1915751 -0.7354824 -0.01624658 0.2027526 -0.02395969
 Occasion7  Occasion8
 0.6260501 -0.5620218

Note that the estimated coefficients for Occasion (there are 9 of them and they add to zero) are similar to the equivalent parameters from the fixed model, but shrunk somewhat towards zero. In the fixed model, the last coefficient is not included explicitly because the coefficients are constrained to add to zero.

S-Archive

Download Script