S archive: Estimation of Insurance Tariffs

tariff Estimate insurance tariffs

DESCRIPTION

Estimate a mean and dispersion model for the cost and frequency of insurance claims, from which insurance tariffs can be computed. Produces a double generalized linear model object of class "dglm", which inherits from "glm" and "lm".

Note: to use this function you also need the functions associated with dglm and the Tweedie family.

USAGE

tariff(formula = formula(data), dformula = ~1, nclaims = NULL, exposure = NULL, link.power = 0, dlink.power = 0, var.power = 1.5, data = sys.parent(), subset = NULL, contrasts = NULL, method = "ml", mustart = NULL, betastart = NULL, phistart = NULL, control = dglm.control(...), ykeep = T, xkeep = F, zkeep = F, ...)

REQUIRED ARGUMENTS

formula a formula expression as for glm, of the form response ~ predictors. See the documentation of lm and formula for details. As for glm, this specifies the linear predictor for modelling the mean. A term of the form offset(expression) is allowed. The response should be the total cost of claims divided by the exposure, that is, the claim cost per unit of exposure (Payment/Insured in the example below).

OPTIONAL ARGUMENTS

`dformula`		a formula expression of the form `~ predictor`, the response being ignored. This specifies the linear predictor for modelling the dispersion. A term of the form `offset(expression)` is allowed. For insurance modelling, this will often be the same as the mean model.
`nclaims`		vector giving the number of claims.
`exposure`		vector giving a measure of exposure to risk, usually proportional to policy years.
`link.power`		power of the link function for the mean: the mean raised to `link.power` is modelled by a linear predictor, with 0 indicating the log link.
`dlink.power`		power of the link function for the dispersion: the dispersion raised to `dlink.power` is modelled by a linear predictor, with 0 indicating the log link.
`var.power`		scalar. The variance is assumed proportional to the mean raised to this power. Must lie strictly between 1 and 2, the range in which Tweedie's distribution is compound Poisson.
`data`		as for the glm function; see S-Plus documentation.
`subset`		as for the glm function; see S-Plus documentation.
`contrasts`		as for the glm function; see S-Plus documentation.
`method`		the method used to estimate the dispersion parameters; the default is "ml" for maximum likelihood and the alternative is "reml" for restricted maximum likelihood. Upper case and partial matches are allowed.
`mustart`		numeric vector giving starting values for the fitted values or expected responses. Must be of the same length as the response, or of length 1 if a constant starting vector is desired. Ignored if `betastart` is supplied.
`betastart`		numeric vector giving starting values for the regression coefficients in the link-linear model for the mean.
`phistart`		numeric vector giving starting values for the dispersion parameters.
`control`		a list of iteration and algorithmic constants. See `dglm.control` for their names and default values. These can also be set as arguments to `tariff` itself.
`ykeep`		logical flag: if `TRUE`, the vector of responses is returned.
`xkeep`		logical flag: if `TRUE`, the `model.matrix` for the mean model is returned.
`zkeep`		logical flag: if `TRUE`, the `model.matrix` for the dispersion model is returned.
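
The power conventions used by `link.power`, `dlink.power` and `var.power` can be sketched in R as follows. The helpers `power_link` and `tweedie_var` are hypothetical illustrations of the formulas stated above, not functions from the tariff or dglm code.

```r
# Hypothetical helpers illustrating the link.power, dlink.power and var.power
# conventions; they are not part of the tariff code.

# Power link: eta = mu^lp, with lp = 0 meaning eta = log(mu)
power_link <- function(lp) {
  if (lp == 0) {
    list(linkfun = function(mu) log(mu),
         linkinv = function(eta) exp(eta))
  } else {
    list(linkfun = function(mu) mu^lp,
         linkinv = function(eta) eta^(1 / lp))
  }
}

# Tweedie variance function: Var(y) = phi * mu^var.power,
# with var.power restricted to the compound Poisson range (1, 2)
tweedie_var <- function(mu, phi, var.power) {
  if (var.power <= 1 || var.power >= 2)
    stop("var.power must lie strictly between 1 and 2")
  phi * mu^var.power
}

log_link <- power_link(0)
eta <- log_link$linkfun(100)        # linear predictor on the log scale
log_link$linkinv(eta)               # back-transforms to the mean, 100
tweedie_var(mu = 100, phi = 2, var.power = 1.5)  # 2 * 100^1.5 = 2000
```

Base R uses the same convention for its power link family: `stats::power(0)` returns the log link.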

VALUE

an object of class dglm is returned, which inherits from glm and lm. See dglm.object for details.

DETAILS

Let z_i be the total cost of claims in the ith category, let n_i be the number of claims and let w_i be the exposure. We assume that the n_i are Poisson and that the size of each claim follows a gamma distribution. This implies that z_i, and hence the claim cost per unit exposure y_i = z_i/w_i, follows Tweedie's compound Poisson distribution, which has a point mass at zero for categories with no claims. The function tariff computes maximum likelihood or restricted maximum likelihood estimators for the parameters based on the joint likelihood of y_i and n_i.

The function is similar in structure to the double generalized linear model function dglm, and it returns an object of the same class.
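
The Poisson-gamma construction can be made concrete. A minimal sketch in R, assuming the standard parametrization of Tweedie's distribution with mean mu, dispersion phi and index p = var.power (1 < p < 2): the number of claims is Poisson with mean lambda = mu^(2-p)/(phi(2-p)), and each claim is gamma with shape (2-p)/(p-1) and scale phi(p-1)mu^(p-1). The helper names `tweedie_to_poisson_gamma` and `simulate_total_cost` are hypothetical. The code checks the mean and variance in closed form and by simulation.

```r
# Hypothetical helper: map Tweedie parameters (mu, phi, p), 1 < p < 2,
# to the equivalent Poisson-gamma parameters
tweedie_to_poisson_gamma <- function(mu, phi, p) {
  stopifnot(p > 1, p < 2)
  list(lambda = mu^(2 - p) / (phi * (2 - p)),  # Poisson mean number of claims
       shape  = (2 - p) / (p - 1),              # gamma shape of one claim
       scale  = phi * (p - 1) * mu^(p - 1))     # gamma scale of one claim
}

# Hypothetical simulator: total cost z is the sum of n gamma claim sizes
simulate_total_cost <- function(nsim, pars) {
  n <- rpois(nsim, pars$lambda)
  z <- numeric(nsim)  # categories with no claims have a total cost of exactly 0
  pos <- n > 0
  # the sum of n iid Gamma(shape, scale) variables is Gamma(n * shape, scale)
  z[pos] <- rgamma(sum(pos), shape = n[pos] * pars$shape, scale = pars$scale)
  list(n = n, z = z)
}

pars <- tweedie_to_poisson_gamma(mu = 10, phi = 5, p = 1.72)
pars$lambda * pars$shape * pars$scale                       # equals mu
pars$lambda * pars$shape * (pars$shape + 1) * pars$scale^2  # equals phi * mu^p
exp(-pars$lambda)                                           # P(z = 0)

set.seed(1)
sim <- simulate_total_cost(1e5, pars)
c(mean = mean(sim$z), var = var(sim$z), p0 = mean(sim$z == 0))
```

Inverting the shape formula gives p = (shape + 2)/(shape + 1), so var.power = 1.72, as in the example below, corresponds to a gamma claim-size shape of about 0.39.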

REFERENCES

Smyth, G. K., and Verbyla, A. P. (1999). Adjusted likelihood methods for modelling dispersion in generalized linear models. Environmetrics 10, 696-709.

Smyth, G. K., and Jørgensen, B. (to appear). Fitting Tweedie's compound Poisson model to insurance claims data: dispersion modelling. ASTIN Bulletin.

SEE ALSO

dglm, dglm.object, Tweedie family.

WARNING

The anova method is questionable when applied to a dglm object fitted with method="reml"; use method="ml" when comparing models with anova.

EXAMPLES

Estimate tariffs for the Swedish 3rd party motor insurance data. This reproduces results from Smyth and Jørgensen (in press).

motorins <- read.table("c:/gordon/www/data/general/motorins.txt", header = TRUE)  # local copy; adjust the path
motorins <- motorins[motorins$Zone == 1 & motorins$Make != 9,]
motorins$Bonus <- factor(motorins$Bonus)
motorins$Make <- factor(motorins$Make)
motorins$Kilometres <- factor(motorins$Kilometres)
contrasts(motorins$Bonus) <- contr.treatment(levels(motorins$Bonus))
contrasts(motorins$Make) <- contr.treatment(levels(motorins$Make))
contrasts(motorins$Kilometres) <- contr.treatment(levels(motorins$Kilometres))
attach(motorins)

out <- tariff(Payment/Insured~Bonus+Make+Kilometres,~Bonus+Make+Kilometres,nclaims=Claims,exposure=Insured,var.power=1.72)
summary(out)

# Base risk
tapply(fitted(out),list(Bonus,Make,Kilometres),mean)[1,1,1]

# Multiplicative tariff factors for other factor levels
exp(coef(out))
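
With the default link.power = 0 (log link) and the treatment contrasts set above, each fitted mean is the base risk multiplied by one factor per non-base level, which is why exp(coef(out)) reads as a tariff table. (S-Plus defaults to Helmert contrasts, under which this reading would not hold.) The sketch below illustrates only this algebra, using R's ordinary glm with a Gamma family on simulated data; it does not use tariff, dglm or the motorins data, and the factor names A and B and all numbers are hypothetical.

```r
set.seed(42)
# Hypothetical rating factors A and B with 20 replicates per cell
d <- expand.grid(A = factor(1:3), B = factor(1:2), rep = 1:20)
true_A <- c(1, 1.3, 0.8)  # hypothetical multiplicative effects
true_B <- c(1, 1.5)
mu <- 50 * true_A[d$A] * true_B[d$B]  # factor indices act as integer codes
d$y <- rgamma(nrow(d), shape = 2, scale = mu / 2)

# Log link; R uses treatment contrasts for unordered factors by default
fit <- glm(y ~ A + B, family = Gamma(link = "log"), data = d)
factors <- exp(coef(fit))
factors  # (Intercept) is the base risk; the rest are relative factors

# Premium for cell A = 3, B = 2 built from the tariff table ...
premium <- factors[["(Intercept)"]] * factors[["A3"]] * factors[["B2"]]
# ... matches the fitted mean for that cell
cell <- data.frame(A = factor(3, levels = 1:3), B = factor(2, levels = 1:2))
pred <- predict(fit, newdata = cell, type = "response")
all.equal(premium, unname(pred))
```

The equality is exact up to rounding because exp(b0 + bA3 + bB2) = exp(b0) exp(bA3) exp(bB2); with main effects only, the same identity explains why the base-risk cell [1,1,1] in the example equals the exponentiated intercept.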
