Anthropological Demography Short-Course

       First, “R” is your friend, so:

http://cran.r-project.org/


 

Kaplan-Meier survivorship

Download the “kia.txt” file and save it in your “R” workspace (usually this is something like C:\Program Files\R\rw1071).

Then load into your “R” workspace with:

  kia<-dget('kia.txt')
 

Or do in one fell-swoop as:

 
  download.file('http://konig.la.utk.edu/kia.txt','kia.txt')
  kia<-dget('kia.txt')

At the prompt type “kia” to make sure you’ve got it right.

Now attach “kia” with attach(kia) and load the survival library with library(survival).  You can plot with:

  plot(survfit(Surv(Age)~1),xlim=c(17,45))

and see where the lines come from with the code below (for uncensored data the Kaplan-Meier estimate is just the empirical survivor function, and Greenwood’s variance formula collapses to the binomial variance S(t)(1-S(t))/n used here):

  lines(sort(Age),(232:1)/232,col='blue')
  st<-(232:1)/232
  lines(sort(Age),st+1.96*sqrt(st*(1-st)/232),col='blue')
  lines(sort(Age),st-1.96*sqrt(st*(1-st)/232),col='blue')
 

The “kia” file has Todd phases scored in it, so you could fit K-M survivorship for each phase, but there are very few individuals in the advanced phases, so let’s combine these data with male data from the Terry collection.  You do this, again, with “dget.”  First download “terry.txt” by hand, or:

 

  download.file('http://konig.la.utk.edu/terry.txt','terry.txt')
  terry<-dget('terry.txt')
 

Then do the plot (copy and paste the function below into the “Rgui”, then type plotit()):

 
plotit<-function () 
{
# combine the Korean War data with the Terry collection males, collapsing
# the ten Todd phases into six groups
age<-kia$Age
phase<-kia$Todd
recode<-c(1,1,1,2,2,3,4,4,5,6)
phase<-recode[phase]
 
age2<-terry[,2]
phase2<-recode[terry[,3]]
 
age<-c(age,age2)
phase<-c(phase,phase2)
 
# Kaplan-Meier survivorship within each recoded phase, with dashed lines
# marking 0.975 and 0.025 survivorship
st<-survfit(Surv(age)~phase)
plot(st,xlim=c(17,90),col=1:6)
lines(c(0,100),c(0.975,0.975),lty=2)
lines(c(0,100),c(.025,.025),lty=2)
legend(70,.8,legend=c('I','II','III','IV','V','VI'),fill=1:6)
}
 

September 29, 2003

Parametric hazard models

Now let’s fit a parametric hazard model to the “kia” data.  We can first do a histogram or a kernel density plot to get some idea of the shape of the age-at-death distribution:

 

 
  attach(kia)
  hist(Age)
  plot(density(Age))
 

It looks like there’s a long tail to the right, which suggests a log-normal distribution, but let’s try a plain old normal distribution first.  We’ll plot it as a survival distribution (S(t)) against the Kaplan-Meier:

 
  library(survival)
  plot(survfit(Surv(Age)~1),xlim=c(17,40))
  lines(17:40,1-pnorm(17:40,mean(Age),sd(Age)),col='blue',lwd=3)
 

Yes, that looks like a horrible fit, but while we’re at it let’s also plot the probability density function (pdf or f(t)) against the kernel density fit:

  plot(density(Age))
  lines(15:40,dnorm(15:40,mean(Age),sd(Age)),col='blue',lwd=3)

Yes, that looks terrible too.  Now, to complete the lunacy let’s plot the hazard function:

  plot(17:40,dnorm(17:40,mean(Age),sd(Age))/(1-pnorm(17:40,mean(Age),sd(Age))),type='l',ylab='h(t)',xlab='Age')

To compare to the original data you need to smooth it (kernel density) and divide f(t) by S(t).  Let’s hold off on a careful treatment of that for the moment.
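Still, here is a minimal sketch of the idea, assuming “kia” is loaded as above (the 0.05 cutoff is arbitrary, just to keep from dividing by a near-zero S(t) in the tail):

emp.haz<-function(x=kia$Age)
{
d<-density(x)          # kernel estimate of f(t)
S<-1-ecdf(x)(d$x)      # empirical S(t) evaluated on the same age grid
keep<-S>0.05           # drop the far tail, where S(t) is nearly zero
plot(d$x[keep],d$y[keep]/S[keep],type='l',xlab='Age',ylab='h(t)')
}
emp.haz()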

The normal was a bad fit, so let’s try a shifted log-normal for the pdf.  The shift will be 16.99 years (just under the youngest age in the sample, so that log(Age-16.99) is defined for everyone); note that the mean and standard deviation of log(Age-16.99) are the maximum-likelihood estimates of the log-normal parameters:

  plot(density(Age))
  lines(sort(Age),dlnorm(sort(Age-16.99),mean(log(Age-16.99)),sd(log(Age-16.99))),col='blue',lwd=2)

O.K. Give credit where credit is due.  Darryl Holman suggested this, but he fit it in his “mle” package.  Now plot the survivorship against the Kaplan-Meier:

  plot(survfit(Surv(Age)~1),xlim=c(17,40))
  lines(sort(Age),1-plnorm(sort(Age-16.99),mean(log(Age-16.99)),sd(log(Age-16.99))),col='blue',lwd=2)
 

Looks pretty good.  Now let’s plot the hazard:

  f.t<-dlnorm(sort(Age-16.99),mean(log(Age-16.99)),sd(log(Age-16.99)))
  s.t<-1-plnorm(sort(Age-16.99),mean(log(Age-16.99)),sd(log(Age-16.99)))
  plot(sort(Age),f.t/s.t,type='l',xlab='Age',ylab='h(t)')
 

And beyond the quick sketch above, let’s skip trying to do the “empirical” hazard with kernels (ick).

Censored data

There’s a lot of mortality data for the Korean War, but it is interval censored.  Get it from:

  download.file('http://konig.la.utk.edu/kia2.txt','kia2.txt')
  kia2<-dget('kia2.txt')
 

Make a function to get the log-likelihood:

  kia3<-function (x,deaths=kia2) 
  {
          mu<-x[1]
          sd<-x[2]
 
          # interval endpoints on the log scale, after the 17-year shift
          t<-log(deaths[,1:2]-16.9999)
          S<-pnorm(t,mu,sd,lower=F)
          # probability of death within each interval: S(open) - S(close)
          d<-S[,1]-S[,2]
          obs<-deaths[,3]
          # log-likelihood: interval death counts times log interval probabilities
          lnlk<-crossprod(obs,log(d))
          return(lnlk)
  }
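As a quick sanity check (the starting values here are arbitrary), the function should return a single finite value:

  kia3(c(1.4,0.77))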

And maximize the log-likelihood to get the estimates:

> optim(c(1,2),kia3,control=list(fnscale=-1))
 
$par
[1] 1.4122365 0.7700474
 
$value
[1] -27049.53
 
$counts
function gradient 
      61       NA 
 
$convergence
[1] 0
 
$message
NULL
 
Warning message: 
NaNs produced in: pnorm(q, mean, sd, lower.tail, log.p)

(The warning is harmless here: optim occasionally tries a negative standard deviation during the search, for which pnorm returns NaN.)
 

October 13, 2003

Parametric hazard models

Let’s continue that example from last time:

First, we need a quick step function to represent survivorship such as would come from a life table.  You might be able to coax either “survival” or “stepfun” to do this, but let’s take the brain-dead approach:

lftable<-function () 
{
attach(kia2)
n.ints<-NROW(kia2)
n.deaths<-sum(deaths)
# lx: proportion surviving to the start of each age interval
lx<-c(n.deaths,n.deaths-cumsum(deaths))/n.deaths
# double-up the interval boundaries so the plot comes out as a step function
x<-c(start[1],rep(start[2:n.ints],each=2),50,50)
y<-c(rep(lx[1:n.ints],each=2),0)
print(x)
print(y)
plot(x,y,type='l')
}
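Run it to draw the step function:

lftable()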
 

To this we’ll add our parametric model, for which we need another function:

 
kia4<-function (t) 
{
        # mle's from the optim() fit above
        mu<-1.4122365
        sd<-0.7700474
        t<-log(t-16.9999)
        S<-pnorm(t,mu,sd,lower=F)
        return(S)
}
 

and then add the parametric line to the plot:

 
lines(17:50,kia4(17:50),col='blue',lwd=2)
 

But enough of this, let’s look at some Bosnian data:

 
  download.file('http://konig.la.utk.edu/bosmale.txt','bosmale.txt')
  bosmale<-dget('bosmale.txt')
 

First do the K-M plot:

  plot(survfit(Surv(bosmale)~1))
 

Now let’s fit the simplest hazard model we can think of.  That would be an exponential model:

 
survreg(Surv(bosmale-7.9999)~rep(1,772),dist='exp')
 
Call:
survreg(formula = Surv(bosmale - 7.9999) ~ rep(1, 772), dist = "exp")
 
Coefficients:
(Intercept) rep(1, 772) 
   3.637551    0.000000 
 
Scale fixed at 1 
 
Loglik(model)= -3580.2   Loglik(intercept only)= -3580.2
        Chisq= 0 on 1 degrees of freedom, p= 1 
n= 772
 
and add the hazard model line to the plot:
 
lines(8:80,exp(-(1/exp(3.637551))*(0:72)),col='blue',lwd=2)
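Why 1/exp(3.637551)?  survreg() fits an accelerated failure time model on log time, so with dist='exp' the constant hazard is the reciprocal of exp(intercept) (and the 7.9999 subtracted above starts the clock just under what is presumably the youngest age in the sample).  A sketch of the arithmetic:

lambda<-1/exp(3.637551)   # constant hazard implied by the survreg() intercept
c(lambda,1/lambda)        # the rate, and the mean years-to-death after the shift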
 

Well, that looked pretty terrible, because mortality is not following an exponential.

So, let’s try a two-parameter model – the Gompertz.  The Gompertz is not in the “survival” library, so make your own function and find the maximum likelihood:

gompertz<- function (x,t=bosmale) 
{
        a3<-x[1]
        b3<-x[2]
        shift<-0
        # Gompertz hazard and survivorship; the density is f(t) = h(t)*S(t)
        h.t<-a3*exp(b3*(t-shift))
        S.t<-exp(a3/b3*(1-exp(b3*(t-shift))))
        return(S.t*h.t)
}
 
gompertz2<-function (x) 
{
lnlk<-sum(log(gompertz(x,bosmale)))
return(lnlk)
}
 
> optim(c(.01,.01),gompertz2,control=list(fnscale=-1))
$par
[1] 0.003765531 0.047976935
 
$value
[1] -3323.380
 
$counts
function gradient 
      81       NA 
 
$convergence
[1] 0
 
$message
NULL
 

Now add the Gompertz survivorship line:

 
gompertz3<-function (t) 
{
        a3<-0.003765531
        b3<-0.047976935
        shift<-0
        h.t<-a3*exp(b3*(t-shift))
        S.t<-exp(a3/b3*(1-exp(b3*(t-shift))))
        return(S.t)
}
 
lines(0:80,gompertz3(0:80),col='green4',lwd=2)
 

and for fun you could find the MRDT (mortality rate doubling time) as log(2)/b3.
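For the fit above that works out to about 14.4 years:

log(2)/0.047976935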

Now for a three-parameter function, the Gompertz-Makeham.  But we’re only going to use those who are 20 or over (for “homework” figure out how to form the subset “bosmale20”).

     OK - here's the answer to that:

bosmale20=bosmale[bosmale>=20]
GM = function(age=bosmale20){
library(survival)
library(alabama)
options(warn=-1)
N=length(age)
age=sort(age)
shift=age[1]

# negative log-likelihood for the Gompertz-Makeham
# (constrOptim.nl minimizes, hence the minus sign)
makeham=function(x)
{
 a2=x[1]; a3=x[2]; b3=x[3]
 lnlk=0
 for(i in 1:N){
 t=age[i]
 h.t=a2+a3*exp(b3*(t-shift))
 S.t=exp(-a2*(t-shift)+a3/b3*(1-exp(b3*(t-shift))))
 lnlk=lnlk-log(h.t*S.t)
 }
 return(lnlk)
}

hin=function(x){h=0; h[1]=x[1]; h[2]=x[2]; h}   # constraints for constrOptim.nl (each >= 0): a2>=0, a3>=0

makeham2=function(a2,a3,b3,t) exp(-a2*(t-shift)+a3/b3*(1-exp(b3*(t-shift))))


plot(survfit(Surv(age)~1),xlim=range(age),xlab='Age',ylab='Survivorship')
start.theta=rep(0.01,3)
par = constrOptim.nl(par=start.theta,fn=makeham,control.outer=list(trace=F),hin=hin)$par
y=makeham2(par[1],par[2],par[3],age)
lines(age,y,lwd=2)
cat('Shift = ')
cat(age[1])
cat('\n')
return(par)
}

And run:

GM()
 
 

October 20, 2003

Age Estimation and Parametric Hazard Models

We will need a data set with some morphological data that can be used to estimate age-at-death.  Paste the following into your “Rgui” to get data on periodontal recession, root transparency, and root height.

  download.file('http://konig.la.utk.edu/debbi.txt','debbi.txt')
  debbi<-dget('debbi.txt')
 

Now let’s fit a linear model for trans/rh regressed on age.

  attach(debbi)
  plot(AGE,trans/rh)
  abline(lm(trans/rh~AGE))

And we can find the un-normalized density across age for someone with a trans/rh of 0.2:

debbi4<-function ()
{
##################################################################
#                                                                #
#  A function to get the probability density for trans/rh=.20    #
#  at exact age t                                                #
#                                                                #
##################################################################
toothprob<-function (t,xt=.2) 
{
 pred<-predict(tra,data.frame(AGE=t))
 p<-dnorm(xt,pred,see)
 return(p)
} 
 
attach(debbi)
tra<-lm((trans/rh)~AGE)
see<-summary(tra)$sigma
 
hpd<-optimize(toothprob,c(-40,100),max=T)$maximum
cat('\n\n Maximum density occurs at ',hpd,'\n')
plot(-20:120,toothprob(-20:120),type='l',xlab='Age',ylab='Density',xlim=c(-20,120))
lines(c(-20,120),c(0,0))
}
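Run it with:

debbi4()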

How does this compare to the age estimate we would get the “usual” way?

tr<-trans/rh
predict(lm(AGE~tr),data.frame(tr=.2))
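As a sketch, predict.lm() will also attach a 95% prediction interval to this “usual” estimate:

predict(lm(AGE~tr),data.frame(tr=.2),interval='prediction')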
 

So the ages differ by about 12.7 years.  Could we plot the densities together?  Yes, but we have to normalize within the “debbi4” function:

debbi4<-function ()
{
##################################################################
#                                                                #
#  A function to get the probability density for trans/rh=.20    #
#  at exact age t                                                #
#                                                                #
##################################################################
toothprob<-function (t,xt=.2) 
{
 pred<-predict(tra,data.frame(AGE=t))
 p<-dnorm(xt,pred,see)
 return(p)
} 
##################################################################
#                                                                #
#  integrate() requires a vectorized function, so this wrapper   #
#  loops toothprob() over a vector of ages                       #
#                                                                #
##################################################################
toothprob2<-function (t,xt=.2) 
{
        n<-NROW(t)
        p<-0
        for(j in 1:n){
          p[j]<-toothprob(t[j],xt)
                    }
        return(p)
}
##################################################################
#  pia()                                                         #
#  This function finds the unnormalized posterior density of age #
#  conditional on trans/rh=.2 and uniform prior                  #
#                                                                #
##################################################################
pia<-function(tt)
{
  p<-toothprob2(tt,xt=.2)
  return(p)
}
 
##################################################################
#  pia2()                                                        #
#                                                                #
#  Normalizes pia()                                              #
#                                                                #
##################################################################
pia2<-function(tt)
{
  p<-toothprob2(tt,xt=.2)
  return(p/denom)
}
 
 
attach(debbi)
tr<-trans/rh
 
# Classical calibration
 
tra<-lm(tr~AGE)
see<-summary(tra)$sigma
 
# Inverse calibration
 
tra2<-lm(AGE~tr)
see2<-summary(tra2)$sigma
pred2<-predict(tra2,data.frame(tr=.2))
plot(-40:120,dnorm(-40:120,pred2,see2),type='l',xlab='Age',ylab='Density',col='blue',lwd=2)
 
denom<-integrate(pia,-40,100)$value
 
lines(-40:120,pia2(-40:120),lwd=2)
lines(c(-40,120),c(0,0))
legend(80,.01,c('Inverse','Classical'),lty=c(1,1),lwd=c(2,2),col=c('blue','black'))
}
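And run the comparison:

debbi4()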
 

November 3, 2003

Parametric Statistical Models for Ordinal Categorical Data (a.k.a., “transition analysis”)

Now we return to the Bosnian data to fit a cumulative probit model (proportional odds), a forward continuation ratio probit (the original “transition model”), a backward continuation ratio probit, and an unrestricted cumulative probit model.  The data used here can be obtained by:

download.file('http://konig.la.utk.edu/malepube.txt','malepube.txt')
malepube<-dget('malepube.txt')

 

Cumulative probit

For the cumulative probit (proportional odds) model we use a slightly edited version of the function polr() from the MASS library.  polr() does “proportional odds logistic regression,” while the variant below is called popr() because it does proportional odds probit regression.

popr<-function(){
library(MASS)
junk<-polr(factor(malepube$suchey)~malepube$age,method='probit')
# polr() fits pnorm(zeta - coef*age), so dividing through by the coefficient
# converts to mean ages-of-transition and a common standard deviation
mu<-as.vector(junk$zeta)
sdev<-as.vector(junk$coef)
mu<-mu/sdev
sdev<-1/sdev
curve(dnorm(x,mu[1],sdev),0,100,xlab='Age',ylab='Density')
curve(dnorm(x,mu[2],sdev),0,100,add=T)
curve(dnorm(x,mu[3],sdev),0,100,add=T)
curve(dnorm(x,mu[4],sdev),0,100,add=T)
curve(dnorm(x,mu[5],sdev),0,100,add=T)
return(list(mu=mu,sdev=sdev))
}

Now, run as:

popr()
 

Note the very low average age of transition for phase I to II in the Suchey phases.  This is an artifact of assuming a constant standard deviation for the transition distributions (i.e., I/II and V/VI have the same standard deviation).  A number of ways have been proposed to fix this problem.  The simplest is to assume log-normals for the transitions:

popr2<-function(){
library(MASS)
junk<-polr(factor(malepube$suchey)~log(malepube$age),method='probit')
mu<-as.vector(junk$zeta)
sdev<-as.vector(junk$coef)
mu<-mu/sdev
sdev<-1/sdev
curve(dlnorm(x,mu[1],sdev),0,100,xlab='Age',ylab='Density')
curve(dlnorm(x,mu[2],sdev),0,100,add=T)
curve(dlnorm(x,mu[3],sdev),0,100,add=T)
curve(dlnorm(x,mu[4],sdev),0,100,add=T)
curve(dlnorm(x,mu[5],sdev),0,100,add=T)
# mu is on the log-age scale, so exp(mu) gives the median transition ages
return(list(mu=exp(mu),sdev=sdev))
}

Now, run as:

popr2()

Another way is to use a “forward continuation ratio” model, for which we will again use the probit (Boldsen et al. use the logit, and call this “transition analysis”).

Forward Continuation Ratio Probit

The forward continuation ratio is a common enough model that Frank Harrell has included a utility (cr.setup) in his “Design” library to implement the analysis.  You will need this library, as well as his library of utility functions (Hmisc).  Below is a function called “my.tran” that will do the forward continuation ratio model:

my.tran<- function () 
{
 library(Hmisc)
 library(Design)
 attach(malepube)
 mu<-0
 sdev<-0
 types<-c('all','suchey>=2','suchey>=3','suchey>=4','suchey>=5')
 junk<-cr.setup(suchey)
 # fit a probit GLM within each continuation-ratio "cohort" and convert the
 # coefficients to a mean and standard deviation of transition, as in popr()
 for(i in 1:5){
 x<-as.vector(glm(junk$y[junk$cohort==types[i]]
           ~age[junk$subs[junk$cohort==types[i]]],
           family=binomial(link='probit'))$coefficients)
 x[2]<-(-x[2])
 mu[i]<-x[1]/x[2]
 sdev[i]<-1/x[2]
             }
 return(t(cbind(mu,sdev)))
}
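And run:

my.tran()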

Backward Continuation Ratio Probit

If there is a forward, then there must be a backward.  Ralf Bender has provided a “crback.setup” function, which is listed below.  See Bender’s webpage (http://wwwhomes.uni-bielefeld.de/rbender/softw.htm) for the original, and for a useful paper [Bender and Benner, 2000] comparing some ordinal categorical models:

crback.setup<-function(y)
{
yname <- as.character(substitute(y))
if(!is.category(y))
y <- factor(y, exclude = NA)
y <- unclass(y)# in case is.factor
ylevels <- levels(y)
kint <- length(ylevels) - 1
y <- as.integer(y - 1)
reps <- ifelse(is.na(y), 1, ifelse(y > 1, kint - y + 1, kint))
subs <- rep(1:length(y), reps)
cuts <- vector("list", kint + 2)
cuts[[1]] <- NA
for(j in 0:kint)
cuts[[j + 2]] <- if(j < 1) 1:kint else j:kint
cuts <- unlist(cuts[ifelse(is.na(y), 1, y + 2)])
y <- rep(y, reps)
Y <- 1 * (y == cuts)
labels <- c(paste(yname, "<=", ylevels[2:kint], sep = ""), "all")
cohort <- factor(cuts, levels = 1:kint, labels = labels)
list(y = Y, cohort = cohort, subs = subs, reps = reps)
}

my.tran2() will fit the model:

my.tran2<-function () 
{
 library(Hmisc)
 library(Design)
 attach(malepube)
 mu<-0
 sdev<-0
 types<-c('suchey<=2','suchey<=3','suchey<=4','suchey<=5','all')
 junk<-crback.setup(suchey)
 for(i in 1:5){
 y<-junk$y[junk$cohort==types[i]]
 y<-abs(y-1)
 x<-as.vector(glm(y
           ~age[junk$subs[junk$cohort==types[i]]],
           family=binomial(link='probit'))$coefficients)
 x[2]<-(-x[2])
 mu[i]<-x[1]/x[2]
 sdev[i]<-1/x[2]
             }
 return(t(cbind(mu,sdev)))
}
 
my.tran2()

Unrestricted Cumulative Probit

Unfortunately, this model cannot currently be fit using “R.”  CatReg (see http://www.epa.gov/ncea/catreg.htm) should do the job, but it is currently available only for S+, and it has numerical difficulties because of “cross-overs” that occur early in the optimization.  There is FORTRAN source code and an executable for this model available at http://konig.la.utk.edu/nphases2.htm. 

Comparison of Forward, Backward, and Unrestricted

Does it make a difference which you use?  Probably not much.  Below is a function to compare the three:

ehk09<- function (i=1,top=90) 
{
######################################
mu<-c(21.00696, 22.67136, 27.64891, 44.35809, 66.16238)
mu.forward<-c(21.89896, 20.02034, 23.82622, 40.25905, 62.47263)
mu.backward<-c(23.708705, 24.645957, 30.50105, 48.01302, 66.65829)
 
sdev<-c(3.115283, 4.063592, 9.083801, 19.05828, 15.619209)
sdev.forward<-c(2.190101, 5.810575, 11.16819, 25.32287, 18.24818)
sdev.backward<-c(3.856725,  4.457414,  9.11710, 22.02238, 16.26292)
######################################
probit<-function (t=50,ip=5) 
{
alpha<-mu
beta<-sdev
imax<-6
p<-0
p[1]<-1-pnorm(t,alpha[1],beta[1])
for(i in 2:(imax-1))
{p[i]<-pnorm(t,alpha[i-1],beta[i-1])-pnorm(t,alpha[i],beta[i])}
p[imax]<-pnorm(t,alpha[imax-1],beta[imax-1])
for(i in 2:(imax-1)){if(p[i]<0) p[i]<-0}
return(p[ip]/sum(p))
}
######################################
probit.forward<-function (t=17,ip=5) 
{
imax<-6
p<-0
for(i in 1:(imax-1)){
p[i]<-pnorm(t,mu.forward[i],sdev.forward[i])
}
p<-c(1,p)
p<-cumprod(p)
if(ip==6) p<-p[6] else p<-p[ip]-p[ip+1]
return(p)
}
######################################
probit.backward<-function (t=17,ip=5) 
{
imax<-6
p<-1
 
if(ip==imax){
p<-pnorm(t,mu.backward[imax-1],sdev.backward[imax-1])
return(p)
}
 
if(ip==(imax-1)){
p<-pnorm(t,mu.backward[imax-2],sdev.backward[imax-2])*(1-pnorm(t,mu.backward[imax-1],sdev.backward[imax-1]))
return(p)
}
 
for(i in ip:(imax-1)){
p<-p*(1-pnorm(t,mu.backward[i],sdev.backward[i]))
}
if(ip==1) return(p)
p<-p*pnorm(t,mu.backward[ip-1],sdev.backward[ip-1])
return(p)
}
######################################
probit2<-function (i,t) 
{
        n<-NROW(t)
        p<-0
        for(j in 1:n){
          p[j]<-probit(t[j],i)
                    }
        return(p)
}
######################################
probit3<-function (i,t) 
{
        n<-NROW(t)
        p<-0
        for(j in 1:n){
          p[j]<-probit.forward(t[j],i)
                    }
        return(p)
}
######################################
probit4<-function (i,t) 
{
        n<-NROW(t)
        p<-0
        for(j in 1:n){
          p[j]<-probit.backward(t[j],i)
                    }
        return(p)
}
######################################
pia<-function(tt)
{
 
  p<-probit2(i,tt)
  return(p)
}
 
######################################
pia2<-function(tt)
{
  p<-probit2(i,tt)
  return(p/denom1)
}
######################################
pia3<-function(tt)
{
 
  p<-probit3(i,tt)
  return(p)
}
######################################
pia4<-function(tt)
{
  p<-probit3(i,tt)
  return(p/denom2)
}
######################################
pia5<-function(tt)
{
 
  p<-probit4(i,tt)
  return(p)
}
######################################
pia6<-function(tt)
{
  p<-probit4(i,tt)
  return(p/denom3)
}
######################################
 
denom1<-integrate(pia,17,120)$value
denom2<-integrate(pia3,17,120)$value
denom3<-integrate(pia5,17,120)$value
y1<-pia2(17:90)
y2<-pia4(17:90)
y3<-pia6(17:90)
p.max<-max(c(y1,y2,y3))
plot(17:90,y1,type='l',ylim=c(0,p.max),xlim=c(17,top),xlab='Age',ylab='Density',lwd=2)
lines(17:90,y2,lwd=2,lty=2)
lines(17:90,y3,lwd=2,lty=3)
start.x<-2/3*(top-17)+17
if(i==6) start.x<-1/3*(top-17)+17
legend(start.x,p.max,c('Unrestricted','Forward','Backward'),lty=c(1,2,3),lwd=c(2,2,2))
}
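And run, for example, for phase III and then for phase VI:

ehk09(i=3)
ehk09(i=6)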
 

November 10, 2003

Highest Posterior Densities in Forensics and Estimation of Hazards in Paleodemography

Armed with the unrestricted cumulative probit from last time, let’s try to find “highest posterior densities” for age using Suchey phases for Bosnian material.  Last time, when we compared the unrestricted cumulative probit and the forward and backward continuation ratios, we assumed an uninformative prior for age (see ehk09 above).  This is not a particularly logical thing to do, as there is prior information we could use; see in particular Komar (2003, Journal of Forensic Sciences 48:713-716).  Get Komar’s life table data as follows:

  download.file('http://konig.la.utk.edu/komar.txt','komar.txt')
  komar<-dget('komar.txt')

and at the prompt type “komar” to make sure you’ve got it right.  Now write a Makeham function as follows (note that because “R” is case-sensitive this will not overwrite your old function called “makeham”):

Makeham<-function (x,t) 
{
        a2<-x[1]
        a3<-x[2]
        b3<-x[3]
        shift<-20
        # Gompertz-Makeham survivorship, with the clock starting at age 20
        S.t<-exp(-a2*(t-shift)+a3/b3*(1-exp(b3*(t-shift))))
        return(S.t)
}

and a “wrapper” function to get the log-likelihood:

Makeham2<- function (x) 
{
# pull the columns directly; optim() calls this repeatedly, and a fresh
# attach() on every call would clutter the search path
age<-komar$age
deaths<-komar$deaths
S.t<-Makeham(x,age)
# deaths within each interval, plus the open-ended last interval
d.t<-c(diff(-S.t),S.t[NROW(age)])
lnlk<-as.vector(log(d.t)%*%deaths)
return(lnlk)
}
 

Then use “optim” to find the mle’s for the hazard parameters:

> optim(rep(.01,3),Makeham2,control=list(fnscale=-1))
$par
[1] 0.023907712 0.005382351 0.063442251
 
$value
[1] -17762.83
 
$counts
function gradient 
     196       NA 
 
$convergence
[1] 0
 
$message
NULL
 

And plot the lifetable lx with the Makeham survivorship:

 
Lftable<- function () 
{
attach(komar)
n.ints<-NROW(komar)
n.deaths<-sum(deaths)
# lx: proportion surviving to the start of each age interval
lx<-c(n.deaths,n.deaths-cumsum(deaths))/n.deaths
print(age)
lx<-lx[1:n.ints]
print(lx)
plot(age,lx)
}
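Run Lftable() first so there is a plot to draw on:

Lftable()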
 
> lines(20:70,Makeham(c(0.023907712, 0.005382351, 0.063442251),20:70),col='blue',lwd=2)
 

And add the line from our previous data:

 
> lines(20:70,Makeham(c(0.025555897, 0.000834712, 0.088227287),20:70),col='red',lwd=2)
 

Ouch, those don’t look very similar.  Now let’s combine the prior with phase information:

 
lrage<-function (i=5,bot=15,top=100,area=.9,a2=0.012482505,a3=0.003543657,b3=0.058074052)
{
 
mu<-c(23.09848, 29.59097, 35.20909, 45.61214, 60.71858, 72.66794)
sdev<-c(12.48234, 13.94446, 17.41642, 17.66973, 17.02856, 16.06338)
 
# The default call uses male mortality at and above 15 years.  To switch to female mortality
# uncomment the following three lines
 
# a2=0.0140656844 
# a3=0.0003480993 
# b3=0.0866711354
################################################################## 
probit<-function (t=50,ip=5) 
{
alpha<-mu
beta<-sdev
imax<-7
p<-0
p[1]<-1-pnorm(t,alpha[1],beta[1])
for(i in 2:(imax-1))
{p[i]<-pnorm(t,alpha[i-1],beta[i-1])-pnorm(t,alpha[i],beta[i])}
p[imax]<-pnorm(t,alpha[imax-1],beta[imax-1])
for(i in 2:(imax-1)){if(p[i]<0) p[i]<-0}
return(p[ip]/sum(p))
} 
###################################################################
probit2<-function (i,t) 
{
        n<-NROW(t)
        p<-0
        for(j in 1:n){
          p[j]<-probit(t[j],i)
                    }
        return(p)
}
##################################################################
#  pia()                                                         #
#  This function finds the unnormalized posterior density of age #
#  conditional on the observed phase and the Gompertz-Makeham    #
#                                                                #
##################################################################
pia<-function(tt)
{
  p<-probit2(i,tt)
  h.t<-a2+a3*exp(b3*(tt-bot))
  S.t<-exp(-a2*(tt-bot)+a3/b3*(1-exp(b3*(tt-bot))))
  return(S.t*h.t*p)
}
 
##################################################################
#  pia2()                                                        #
#  This function finds the normalized posterior density of age   #
#  conditional on the observed phase and the Gompertz-Makeham    #
#                                                                #
##################################################################
pia2<-function(tt)
{
  p<-probit2(i,tt)
  shift<-bot
  h.t<-a2+a3*exp(b3*(tt-bot))
  S.t<-exp(-a2*(tt-bot)+a3/b3*(1-exp(b3*(tt-bot))))
  return(S.t*h.t*p/denom)
}
 
##################################################################
#  pia5()                                                        #
#  Searches left and right to find cut points, such that the     #
#  integral between cut points less the desired area is near     #
#  zero                                                          #
#                                                                #
##################################################################
pia5<-function(divid)
{
   pia4<-function(tt)
    {
     cd<-pia2(tt)-prob*divid
     return(cd)
    }
 
  low<-uniroot(pia4,c(bot,hpd))$root
   hi<-uniroot(pia4,c(hpd,top))$root
  return(integrate(pia2,low,hi)$value-area)
}
 
##################################################################
#  pia6()                                                        #
#  Returns the cut points after they have been found by pia5()   #
#                                                                #
##################################################################
pia6<-function(divid)
{
 
  pia4<-function(tt)
   {
    cd<-pia2(tt)-prob*divid
    return(cd)
   }
  
  low<-uniroot(pia4,c(bot,hpd))$root
  hi<-uniroot(pia4,c(hpd,top))$root
  return(c(low,hi))
}
#################################################################
pia7<-function(t)
{return(integrate(pia2,bot,t)$value-area)}
#################END OF FUNCTIONS################################
 
denom<-integrate(pia,bot,top)$value
hpd<-optimize(pia,c(bot,top),max=T)$maximum
prob<-pia2(hpd)
prob2<-pia2(bot)
tunebot<-1.00001*prob2/prob
cat(' tunebot = ',tunebot,'\n')
cat(' Change tunebot value?: ')
ans<-scan(what='character',n=1)
if(ans=='y'){
  cat('\n  Enter value: ') 
  tunebot<-scan(n=1)
            }
tune<-c(tunebot,.99999)
plot(c(bot,bot:top),c(0,pia2(bot:top)),type='l',xlab='Age',ylab='Density',xlim=c(bot,top))
 
 
cat('\n\n Maximum density occurs at ',hpd,'\n')
 
lines(c(0,100),c(0,0))
 
hi<-uniroot(pia7,c(bot,top))$root
print(c(pia2(hi),prob2))
if(pia2(hi)<prob2)
{
  x<-c((bot:top)[(bot:top)<hi],hi)
  polygon(c(bot,x,x[NROW(x)]),c(0,pia2(x),0),density=10)
  cat(area*100,'% HPD from bottom [',bot,']',hi,'\n\n')
 
}
 
if(pia2(hi)>prob2)
{
 divid<-uniroot(pia5,tune)$root
 x<-pia6(divid)
 x<-c(x[1],(bot:top)[(bot:top)>x[1] & (bot:top)<x[2]],x[2])
 polygon(c(x[1],x,x[NROW(x)]),c(0,pia2(x),0),density=10)
 cat(area*100,'% HPD \n')
 return(c(x[1],hpd,x[NROW(x)]))
}
}
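And run, for example (answer anything but ‘y’ at the tunebot prompt to accept the default):

lrage(i=4)
lrage(i=4,area=.95)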
 

November 17, 2003

Estimation of Hazards in Paleodemography

Now we will pretend that for the 214 males in the Bosnian sample with Suchey scores we do not know the ages-at-death.  As a consequence, all we can do is count the number of individuals in each phase.

obs<-c(13,6,21,66,71,37)
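As a check, these counts total the 214 males:

sum(obs)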

Now we need to write a function that will integrate across age using “transition” parameters from a known-age sample, as in the following:

integ<- function (xx) 
{
 
########################################################
        pia<-function (i,a) 
{
p<-probit2(i,a)
return(p)
}
########################################################
# BOSNIAN
# mu<-c(21.007911,22.673167,27.655352,44.352924,65.992957)        
# sdev<-c(3.115333,4.063623,9.080723,19.009080,15.456997)
 
# LOS ANGELES
 mu<-c(21.060991,25.598330,28.393430,40.575782,69.879650)
 sdev<-c(2.186370,5.107382,6.642665,11.897768,18.843960)
########################################################
probit<-function (t=50,ip=5) 
{
alpha<-mu
beta<-sdev
imax<-6
p<-0
p[1]<-1-pnorm(t,alpha[1],beta[1])
for(i in 2:(imax-1))
{p[i]<-pnorm(t,alpha[i-1],beta[i-1])-pnorm(t,alpha[i],beta[i])}
p[imax]<-pnorm(t,alpha[imax-1],beta[imax-1])
for(i in 2:(imax-1)){if(p[i]<0) p[i]<-0}
return(p[ip]/sum(p))
}
########################################################
probit2<-function (i,t) 
{
        n<-NROW(t)
        p<-0
        for(j in 1:n){
          p[j]<-probit(t[j],i)
                    }
        return(p)
}
########################################################
Makeham<-function (x,t)
{
        a2<-0     # a2 is fixed at zero, so this is really a Gompertz
        a3<-x[1]
        b3<-x[2]
        shift<-17
        h.t<-a2+a3*exp(b3*(t-shift))
        S.t<-exp(-a2*(t-shift)+a3/b3*(1-exp(b3*(t-shift))))
        return(S.t*h.t)
}
 
        # P(phase i) = integral over age of Pr(phase i | age) times f(age)
        p<-0
        for(i in 1:6){
        p[i]<-integrate(function(t){pia(i,t)*Makeham(xx,t)},17,120)$value
                     }
        return(p/sum(p))
}
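As a quick check (the parameter values here are arbitrary), integ() returns the six phase probabilities, which should sum to one:

sum(integ(c(.01,.05)))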
 

Note that this function uses Suchey’s L.A. sample for “transition” parameters, but it also has the Bosnian parameters if we want to cheat.  Now write a likelihood function:

lnlk<- function (xx) 
{
# multinomial log-likelihood: observed phase counts times log cell probabilities
return(as.vector(log(integ(xx))%*%obs))
}

 
 

and find the maximum likelihood estimates:

optim(c(.01,.01),lnlk,control=list(fnscale=-1))
 
$par
[1] 0.01408801 0.04407935
 
$value
[1] -339.2635
 
$counts
function gradient 
      89       NA 
 
$convergence
[1] 0
 
$message
NULL
 

How does this compare to the actual survivorship from the ages-at-death?

nderiv<- function () 
{
# covariance matrices of (a3,b3): V from the actual ages, V2 from the
# L.A.-based fit, V3 from the Bosnian-based fit
V<-c(3.312196e-06, -6.523779e-06, -6.523779e-06,  1.730414e-05)
V<-matrix(V,nc=2)
V2<-c(1.101711e-05, -3.675924e-05, -3.675924e-05,  1.545947e-04)
V2<-matrix(V2,nc=2)
V3<-c(9.896624e-06, -2.802083e-05, -2.802083e-05,  1.003747e-04)
V3<-matrix(V3,nc=2)
# L.A.-based estimates (plotted as the blue band)
myenv <- new.env()
assign("a3", 0.01408801, env = myenv)
assign("b3", 0.04407935, env = myenv)
assign("t",17:90, env = myenv)
evals<-numericDeriv(quote(exp(a3/b3*(1-exp(b3*(t-17))))), c("a3", "b3"), myenv)
grad<-attributes(evals)$gradient
evals<-evals[1:length(evals)]
s.e.<-0
for(i in 17:90)
{
 ip=i-16
 s.e.[ip]<-sqrt(as.vector(grad[ip,]%*%V2%*%grad[ip,]))
 }
 
plot(17:90,evals+1.96*s.e.,type='l',xlab='Age',ylab='Survivorship',col='blue')
lines(17:90,evals-1.96*s.e.,col='blue')
x<-c(17:90,90:17)
y<-c(evals+1.96*s.e.,rev(evals-1.96*s.e.))
polygon(x,y,col='blue')
 
# estimates from the actual ages-at-death (the black band)
myenv <- new.env()
assign("a3", 0.0135843, env = myenv)
assign("b3", 0.04192333, env = myenv)
assign("t",17:90, env = myenv)
evals<-numericDeriv(quote(exp(a3/b3*(1-exp(b3*(t-17))))), c("a3", "b3"), myenv)
grad<-attributes(evals)$gradient
evals<-evals[1:length(evals)]
for(i in 17:90)
{
 ip=i-16
 s.e.[ip]<-sqrt(as.vector(grad[ip,]%*%V%*%grad[ip,]))
 }
 
x<-c(17:90,90:17)
y<-c(evals+1.96*s.e.,rev(evals-1.96*s.e.))
polygon(x,y,col='black')
 
# Bosnian-based estimates (the red lines)
myenv <- new.env()
assign("a3", 0.01343238, env = myenv)
assign("b3", 0.04310035, env = myenv)
assign("t",17:90, env = myenv)
evals<-numericDeriv(quote(exp(a3/b3*(1-exp(b3*(t-17))))), c("a3", "b3"), myenv)
grad<-attributes(evals)$gradient
evals<-evals[1:length(evals)]
for(i in 17:90)
{
 ip=i-16
 s.e.[ip]<-sqrt(as.vector(grad[ip,]%*%V3%*%grad[ip,]))
 }
x<-c(17:90,90:17)
y<-c(evals+1.96*s.e.,rev(evals-1.96*s.e.))
 lines(17:90,evals+1.96*s.e.,col='red',lwd=2)
 lines(17:90,evals-1.96*s.e.,col='red',lwd=2)
legend(60,1,legend=c('Actual','L.A.','Bosnia'),fill=c('black','blue','red'))
}
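And run:

nderiv()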

Looks pretty good, but now for the scary part:

> integ(c(0.01408801, 0.04407935))*sum(obs)
[1] 13.27944 17.44172 11.64160 53.12957 87.54793 30.95974
> obs
[1] 13  6 21 66 71 37

Looks bad.