Introduction

Generalized linear models (GLMs) are used to do regression modelling for non-normal data with a minimum of extra complication compared with normal linear regression. GLMs are flexible enough to include a wide range of common situations, but at the same time allow most of the familiar ideas of normal linear regression to carry over.

The normal linear model. Let y be a vector of observations, and let X be a matrix of covariates. The usual multiple regression model takes the form

(1) m = Xb

where m = E(y) and b is a vector of regression coefficients. Typically we assume that the y_i are normal and independent with standard deviation s, so that we estimate b by minimizing the sum of squares

(y - m)^T(y - m)
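The least squares estimate can be illustrated numerically. The following is a minimal sketch, assuming NumPy is available; the data values and the helper name sum_of_squares are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical toy data: an intercept column plus one covariate.
X = np.column_stack([np.ones(5), np.array([0.0, 1.0, 2.0, 3.0, 4.0])])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

def sum_of_squares(b):
    """Return (y - m)^T (y - m) with m = X b."""
    r = y - X @ b
    return float(r @ r)

# Least squares estimate of b: the minimizer of the sum of squares.
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# At the minimum the residuals are orthogonal to the columns of X.
normal_equations = X.T @ (y - X @ b_hat)
```

Moving b_hat in any direction increases sum_of_squares, and normal_equations is zero up to rounding error.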

Why is linearity not enough? The most important and common case is that in which the y_i and m_i are bounded. For example, if y represents the amount of some physical substance then we may have y > 0 and m > 0. On the other hand, if y is binary, say y = 1 if an animal survives and y = 0 if it does not, then 0 < m < 1. The linear model (1) is inadequate in these cases because complicated and unnatural constraints on b would be required to make sure that m stays in the possible range. Generalized linear models instead assume a link-linear relationship

(2) g(m) = Xb

where g() is some known monotonic function which acts pointwise on m. Typically g() is used to transform the m_i to a scale on which they are unconstrained. For example, we might use g(m) = log(m) if m_i > 0, or g(m) = log[ m / (1-m) ] if 0 < m_i < 1.
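The two example links can be written out directly to show how they remove the constraints on m. This is a minimal sketch, assuming NumPy; the function names are hypothetical and the values of the linear predictor are arbitrary.

```python
import numpy as np

def log_link(m):
    """g(m) = log(m), for means with m > 0."""
    return np.log(m)

def inv_log_link(eta):
    """Inverse of the log link: always returns a positive mean."""
    return np.exp(eta)

def logit_link(m):
    """g(m) = log[m / (1 - m)], for means with 0 < m < 1."""
    return np.log(m / (1.0 - m))

def inv_logit_link(eta):
    """Inverse of the logit link: always returns a mean in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-eta))

# Any real value of the linear predictor Xb maps back to a valid mean.
eta = np.array([-5.0, 0.0, 5.0])
m_pos = inv_log_link(eta)
m_unit = inv_logit_link(eta)
```

Because the inverse links always return values in the permitted range, no constraints on b are needed.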

Why is normality not enough? In some situations, typically cases when s is small, the normal approximation to the distribution of y is accurate. More typically, responses are not normal.

If y is bounded then the variance of y must depend on its mean. Specifically, if m is close to a boundary for y then var(y) must also be small. For example, if y > 0, then we must have var(y) -> 0 as m -> 0. For this reason strictly positive data almost always show increasing variability with increasing size. If 0 < y < 1, then var(y) -> 0 as m -> 0 or m -> 1. For this reason, generalized linear models assume that

(3) var(y) = f V(m)

where f is a dispersion parameter and V() is some known variance function appropriate for the data at hand.
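As an illustration of (3), two standard variance functions are sketched below. They are assumptions for this example rather than choices made in the text: V(m) = m, often used for counts, and V(m) = m(1 - m), often used for binary responses. The function names and the value of f are also hypothetical.

```python
import numpy as np

def V_positive(m):
    """V(m) = m: vanishes as m -> 0 (assumed choice for m > 0)."""
    return m

def V_unit_interval(m):
    """V(m) = m (1 - m): vanishes as m -> 0 and as m -> 1."""
    return m * (1.0 - m)

f = 1.0  # dispersion parameter in var(y) = f V(m); value assumed for illustration

m_grid = np.array([0.001, 0.25, 0.5, 0.75, 0.999])
var_pos = f * V_positive(m_grid)        # grows with the mean
var_unit = f * V_unit_interval(m_grid)  # small near both boundaries
```

Both functions shrink to zero as m approaches a boundary of the response, which is the behaviour the argument above requires.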

We therefore estimate the nonlinear regression equation (2) weighting the observations inversely according to the variance functions V(m_i). This weighting procedure turns out to be exactly equivalent to maximum likelihood estimation when the observations actually come from an exponential family distribution.
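One common way to carry out this weighted estimation is iteratively reweighted least squares. The sketch below assumes a log link and the variance function V(m) = m, which corresponds to a Poisson-type response; these choices, the function name irls_poisson_log, and the data are all assumptions made for illustration, not a definitive implementation.

```python
import numpy as np

def irls_poisson_log(X, y, n_iter=50, tol=1e-10):
    """Fit g(m) = log(m) = X b with V(m) = m by iteratively
    reweighted least squares (illustrative sketch only)."""
    m = y + 0.5                    # starting means, kept strictly positive
    eta = np.log(m)                # linear predictor on the link scale
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = eta + (y - m) / m      # working response: eta + (y - m) g'(m)
        w = m                      # weights 1 / [V(m) g'(m)^2] = m here
        XtW = X.T * w              # X^T W with W = diag(w)
        b_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ b_new
        m = np.exp(eta)
        converged = np.max(np.abs(b_new - b)) < tol
        b = b_new
        if converged:
            break
    return b

# Hypothetical count data with an intercept and one covariate.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([1.0, 2.0, 4.0, 6.0, 12.0, 20.0])

b_hat = irls_poisson_log(X, y)
m_hat = np.exp(X @ b_hat)

# For this link and variance function, the Poisson likelihood equations
# are X^T (y - m) = 0, so the score below should vanish at b_hat.
score = X.T @ (y - m_hat)
```

At convergence the score is zero up to rounding error, which illustrates the equivalence with maximum likelihood in this assumed Poisson case.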