孟生旺 中国人民大学统计学院 http://blog.sina.com.cn/mengshw R 在精算中的应用 第5届中国R会议 孟生旺 中国人民大学统计学院 http://blog.sina.com.cn/mengshw
概述 统计 精算 统计软件:R,SAS,…… 精算软件 寿险:定价、准备金、分类 非寿险:定价、准备金、分类 Prophet,MoSes,VIP, IGLOO, EMBlem
统计软件在非寿险精算中的应用 Source Palisade 2006 ( @Risk ): http://www.palisade.com/downloads/pdf/Pryor.pdf
EXCEL,SAS,R的简单比较 EXCEL:简单易学,容易出错,结果不稳定,计算效率低,统计功能有限。
精算应用中的R包 利息理论和寿险精算:lifecontigencies 损失模型:actuar 非寿险准备金评估:ChainLadder 非寿险定价:glm,glm.nb (在MASS包中), cplm,gamlss 数据处理和绘图:plyr,reshape,ggplot2
利息理论与寿险精算:lifecontigencies 功能: 人口统计、利息理论和精算数学的计算 寿险保单的定价、准备金评估 不足: 目前只能处理单减因表 不能处理连续时间。 Bug?
Lifecontigencies示例 > library(lifecontingencies) > nominal2Real(i=0.12, k = 12, type="interest") > #现值计算 > cf=c(10,20,30) #现金流 > t=1:3 #付款时间 > p=c(0.5,0.6,0.8) #付款概率 > i=0.03 #年实际利率 > presentValue(cashFlows=cf, timeIds=t, interestRates=i, probabilities=p) [1] 38.12892 > #30岁,10年定期寿险,年利率4% > Axn(soa08Act, 30, 10, i=0.04) [1] 0.01577283
A bug? > library(lifecontingencies) > cf=c(10,10,10,10,10,110) > t=1:6 > duration(cf, t, i=0.03,macaulay=F) #得到macaulay久期? [1] 4.984214 > sum(t*cf*1.03^(-t))/sum(cf*1.03^(-t)) #直接计算macaulay久期 > convexity(cf, t, i=0.03) #计算凸度 [1] 30.69613 > sum(t*(t+1)*cf*(1+0.03)^(-t-2))/sum(cf*1.03^(-t)) 直接计算凸度
损失模型:actuar 分布计算和参数估计:d, p, q, r, m, lev 信度模型 累积损失的计算 破产概率的计算 分层损失模型的随机模拟 注:CAS的LSM可以模拟保险公司的损失发生过程及其进展:http://www.casact.org/research/lsmwp/index.cfm?fa=software
Actuar:BS信度模型示例 > library(actuar) > mydata policy loss1 loss2 loss3 w1 w2 w3 1 1 1738 1642 1794 7861 9251 8706 2 2 1364 1408 1597 1622 1742 1523 3 3 1759 1685 1479 1147 1357 1329 4 NA NA 1010 NA NA 348 > myfit=cm(~policy,mydata,ratios=loss1:loss3, weights=w1:w3)
> summary(myfit) Call: cm(formula = ~policy, data = mydata, ratios = loss1:loss3, weights = w1:w3) Structure Parameters Estimators Collective premium: 1566.874 Between policy variance: 24189.8 Within policy variance: 34611317 Detailed premiums Level: policy policy Indiv. mean Weight Cred. factor Cred. premium 1 1722.485 25818 0.9474905 1714.314 2 1452.297 4887 0.7735260 1478.246 3 1635.718 3833 0.7281780 1617.005 4 1010.000 348 0.1956350 1457.930
#分层信度模型 > mydata1 type policy loss1 loss2 loss3 w1 w2 w3 1 A 1 1738 1642 1794 7861 9251 8706 2 B 2 1364 1408 1597 1622 1742 1523 3 A 3 1759 1685 1479 1147 1357 1329 4 B 4 NA NA 1010 NA NA 348 > myfit1=cm(~type+type:policy, mydata1, ratios=loss1:loss3, weights=w1:w3, method='iterative')
> summary(myfit1) Call: cm(formula = ~type + type:policy, data = mydata1, ratios = loss1:loss3, weights = w1:w3, method = "iterative") Structure Parameters Estimators Collective premium: 1560.691 Between type variance: 33970.18 Within type/Between policy variance: 5775.545 Within policy variance: 34611317 Detailed premiums Level: type type Indiv. mean Weight Cred. factor Cred. premium A 1694.319 1.2017108 0.8760556 1677.757 B 1404.139 0.5040669 0.7477794 1443.625 Level: policy type policy Indiv. mean Weight Cred. factor Cred. premium A 1 1722.485 25818 0.81161277 1714.059 B 2 1452.297 4887 0.44918368 1447.520 A 3 1635.718 3833 0.39009799 1661.358 B 4 1010.000 348 0.05488322 1419.826
Actuar:累积损失计算示例 S = X1 + X2 + … + XN library(actuar) fx=discretize(pgamma(x, 2, 1), from = 0, to = 22, step = 0.5, method = "unbiased", lev = levgamma(x, 2, 1)) #单次损失分布的离散化 Fs=aggregateDist("recursive", model.freq = "poisson", model.sev = fx, lambda = 10, x.scale = 0.5) #累积损失的计算 > VaR(Fs,seq(0.9,1,0.02)) 90% 92% 94% 96% 98% 100% 30.5 31.5 33.0 35.0 38.0 71.0 > CTE(Fs,seq(0.9,1,0.02)) 90% 92% 94% 96% 98% 100% 35.41874 36.30422 37.64626 39.45808 42.21550 NaN
par(mfrow=c(1,2)) plot(Fs,verticals=T,do par(mfrow=c(1,2)) plot(Fs,verticals=T,do.points=F,col=2,main='分布函数') #分布函数 plot(diff(Fs),type='l',col=2,main='密度函数') #密度函数
非寿险准备金评估:ChainLadder 基于流量三角形 适用于常见的准备金评估方法: 确定性模型 随机模型:bootstrap,GLM
Chainladder:基于GLM的准备金评估示例 > dat dev origin 1 2 3 4 5 6 7 8 9 10 1981 501 827 1091 1181 1354 1618 1801 1861 1866 1883 1982 11 429 540 1067 1379 1561 1563 1630 1684 NA 1983 341 899 1387 1614 1873 2221 2286 2346 NA NA 1984 566 1156 1577 2127 2343 2609 2707 NA NA NA 1985 109 956 1583 2216 2595 2617 NA NA NA NA 1986 151 644 1170 1293 1585 NA NA NA NA NA 1987 56 402 1095 1232 NA NA NA NA NA NA 1988 135 695 1311 NA NA NA NA NA NA NA 1989 313 539 NA NA NA NA NA NA NA NA 1990 206 NA NA NA NA NA NA NA NA NA > fit=glmReserve(dat,var.power=2,mse.method='bootstrap')
Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 5.4363557 0.3204344 16.966 < 2e-16 *** factor(origin)1982 -0.1897676 0.3127652 -0.607 0.54783 factor(origin)1983 0.0916830 0.3270977 0.280 0.78086 factor(origin)1984 0.3190545 0.3427445 0.931 0.35812 factor(origin)1985 0.1560711 0.3613516 0.432 0.66838 factor(origin)1986 -0.1696000 0.3849454 -0.441 0.66215 factor(origin)1987 -0.3552483 0.4169667 -0.852 0.39986 factor(origin)1988 -0.0007145 0.4645266 -0.002 0.99878 factor(origin)1989 -0.0900973 0.5461889 -0.165 0.86990 factor(origin)1990 -0.1084795 0.7368022 -0.147 0.88377 factor(dev)2 0.7510720 0.3127652 2.401 0.02162 * factor(dev)3 0.7565416 0.3270977 2.313 0.02656 * factor(dev)4 0.3215201 0.3427445 0.938 0.35446 factor(dev)5 0.1581825 0.3613516 0.438 0.66418 factor(dev)6 -0.1244326 0.3849454 -0.323 0.74838 factor(dev)7 -1.0670776 0.4169667 -2.559 0.01484 * factor(dev)8 -1.2581527 0.4645266 -2.708 0.01028 * factor(dev)9 -1.8769195 0.5461889 -3.436 0.00150 ** factor(dev)10 -2.6031424 0.7368022 -3.533 0.00115 **
> fit$summary Latest Dev.To.Date Ultimate IBNR S.E CV 1982 1684 0.9917550 1698 14 15.39944 1.0999601 1983 2346 0.9762797 2403 57 42.68783 0.7489093 1984 2707 0.9435343 2869 162 102.59693 0.6333144 1985 2617 0.9192132 2847 230 127.54743 0.5545540 1986 1585 0.8246618 1922 337 193.09326 0.5729770 1987 1232 0.7247059 1700 468 271.48780 0.5801021 1988 1311 0.5712418 2295 984 563.30382 0.5724632 1989 539 0.2857900 1886 1347 890.46892 0.6610757 1990 206 0.1048346 1965 1759 1425.98589 0.8106799 total 14227 0.7264233 19585 5358 1973.53746 0.3683347
可以用于非寿险定价的R包 glm glm.nb cplm gamlss
gamlss的应用示例 拟合损失次数, 数据来源:de Jong(2008) > dat claims freq 1 0 63232 1 0 63232 2 1 4333 3 2 271 4 3 18 5 4 2
> library(gamlss) > fam=c('PO','NBI','NBII','PIG','DEL','SICHEL','ZIP','ZINBI') > m1=m2=list() > for (i in 1:8) {m1[[fam[i]]]=GAIC(gamlss(claims~1,weights=freq,data=dat,family=fam[i],n.cyc=100,trace=F)) + m2[[fam[i]]]=GAIC(gamlss(claims~1,weights=freq,data=dat,family=fam[i],n.cyc=100,trace=F),k=log(sum(dat$freq)))} > sort(unlist(m1)) #AIC比较 PIG NBI NBII SICHEL DEL ZINBI ZIP PO 36102.91 36103.36 36103.36 36104.91 36104.93 36105.60 36108.40 36205.00 > sort(unlist(m2)) #SBC比较 PIG NBI NBII ZIP SICHEL DEL ZINBI PO 36121.16 36121.61 36121.61 36126.65 36132.28 36132.30 36132.97 36214.13
claims freq PIG PO 1 0 63232 63232.1 63094.3 2 1 4333 4332.7 4590.6 3 2 271 270.9 167.0 4 3 18 18.7 4.1 5 4 2 1.4 0.1
R与SAS在精算应用中的几个比较 优化: GLM: GLMM: R:optim,nlm …… SAS:NLMIXED R:glm,glm.nb, gamlss SAS:GENMOD GLMM: R:lme4 SAS:GLIMMIX,NLMIXED
小结 R和SAS需要互补 R的精算包:需要进一步完善和优化 R对精算学习的影响