First, we explain how to estimate the Lee-Carter model using the
lc_stan
function in the StanMoMo
package.
For illustration, the package already includes the object
FRMaleData
containing deaths (FRMaleData$Dxt
)
and central exposures (FRMaleData$Ext
) of French males for
the period 1816-2017 and for ages 0-110 extracted from the Human Mortality Database. In this
example, we focus on ages 60-90 and the period 1980-2010. This can be
obtained via:
ages.fit<-60:90
years.fit<-1980:2010
deathFR<-FRMaleData$Dxt[formatC(ages.fit),formatC(years.fit)]
exposureFR<-FRMaleData$Ext[formatC(ages.fit),formatC(years.fit)]
As a reminder, the Lee-Carter model assumes that the log death rates are given by logμxt=αx+βxκt To ensure identifiability of the model, we assume that ∑xβx=1,κ1=0
Moreover, we assume that the period parameter follows a random walk with drift: κt∼N(c+κt−1,σ) The choice of priors for each parameter can be found in the documentation of each function.
All the parameters can be estimated either under a Poisson model with
argument family = "poisson"
or under a Negative-Binomial
model which includes an additional overdispersion parameter ϕ with argument
family="nb"
: Dxt∼Poisson(extμxt) or Dxt∼NB(extμxt,ϕ) Given the matrix of deaths deathFR
and the matrix
of central exposures exposureFR
, we can infer the posterior
distribution of all the parameters and obtain death rates forecasts for
the next 10 years under a Poisson model by a simple call to the
lc_stan
function:
fitLC=lc_stan(death = deathFR,exposure=exposureFR, forecast = 10, family = "poisson",chains=1,iter=1000,cores=1)
##
## SAMPLING FOR MODEL 'leecarter' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0.000235 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2.35 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 1000 [ 0%] (Warmup)
## Chain 1: Iteration: 100 / 1000 [ 10%] (Warmup)
## Chain 1: Iteration: 200 / 1000 [ 20%] (Warmup)
## Chain 1: Iteration: 300 / 1000 [ 30%] (Warmup)
## Chain 1: Iteration: 400 / 1000 [ 40%] (Warmup)
## Chain 1: Iteration: 500 / 1000 [ 50%] (Warmup)
## Chain 1: Iteration: 501 / 1000 [ 50%] (Sampling)
## Chain 1: Iteration: 600 / 1000 [ 60%] (Sampling)
## Chain 1: Iteration: 700 / 1000 [ 70%] (Sampling)
## Chain 1: Iteration: 800 / 1000 [ 80%] (Sampling)
## Chain 1: Iteration: 900 / 1000 [ 90%] (Sampling)
## Chain 1: Iteration: 1000 / 1000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 3.77 seconds (Warm-up)
## Chain 1: 2.354 seconds (Sampling)
## Chain 1: 6.124 seconds (Total)
## Chain 1:
Internally, the sampling is performed through the
rstan::sampling
function. By default, Stan samples four
Markov chains of 2000 iterations. For each chain, there are 1000 warmup
iterations (hence, 4000 post warm-up draws in total). Moreover, by
default, Stan uses 1 core but we recommend using as many processors as
the hardware and RAM allow (up to the number of chains). All this can be
set using the following additional arguments : for instance, if one
wants to sample 2 chains of 3000 iterations with 1000 warm-up samples,
one can add chains=2,iter=3000,warmup=1000
.
For additional arguments, the reader may refer to the
rstan::sampling
documentation.
The output is an object of class stanfit
(cf. the
rstan package) which contains, among others,
posterior draws, posterior summary statistics and convergence
diagnostics.
The easiest way to extract the posterior draws is to call the
rstan::extract
function which returns a list with named
components corresponding to the model parameters.
## [1] "a" "b" "c" "ks" "sigma" "k" "phi"
## [8] "k_p" "mufor" "log_lik" "pos" "pos2" "pos3" "lp__"
Among these parameters, we find
a
: αx.b
: βx.k
: κt.k_p
: forecasts of κt.mufor
: forecasts of death rates μxt.log_lik
: log-likelihoods.phi
: overdispersion parameter for the NB model.c
and sigma
: drift and standard deviation
of the random walk with drift.The user can have access to an interface for interactive MCMC
diagnostics, plots and tables helpful for analyzing posterior samples
through the shinystan
package (fore more details, you can
check the shinystan web
page)
Note: you need to close the shiny app before continuing in R.
Moreover, the package also includes functions to represent boxplots
of the posterior distribution based on the ggplot2
package.
For instance, boxplots for the parameters from the Lee-Carter model can be obtained via:
All forecast death rates μxt are stored in the output
mufor
. We point out that the output mufor
is a
matrix of dimension N × (A.M) where the number of rows N is the posterior sample size and the
number of columns A.M is the
product of A the age dimension and
M the number of forecast years.
In a similar fashion, the models RH, APC, CBD and M6 can be estimated with the following functions:
Model | Predictor |
---|---|
LC | logμxt=αx+βxκt |
RH | logμxt=αx+βxκt+γt−x |
APC | logμxt=αx+κt+γt−x |
CBD | logμxt=κ(1)t+(x−ˉx)κ(2)t |
M6 | logμxt=κ(1)t+(x−ˉx)κ(2)t+γt−x |
fitRH=rh_stan(death = deathFR,exposure=exposureFR, forecast = 10, family = "poisson",cores=4)
fitAPC=apc_stan(death = deathFR,exposure=exposureFR, forecast = 10, family = "poisson",cores=4)
fitCBD=cbd_stan(death = deathFR,exposure=exposureFR, age=ages.fit, forecast=10,family = "poisson",cores=4)
fitM6=m6_stan(death = deathFR,exposure=exposureFR, age=ages.fit,forecast = 10, family = "poisson",cores=4)
For information, the estimated parameters κ(1)t, κ(2)t and γt−x are respectively stored in
the variables k
,k2
and g
. The
forecast parameters are respectively stored in the variables
k_p
,k2_p
and g_p
.
You can check the next vignette to know more about model averaging techniques based on leave-future-out validation.