"R" and Forensic Age Estimation

        This page contains some computer "stuff" for use in forensic age estimation, all of it implemented using the statistical package "R."  The main reason for using "R" is that the package is a free download, and the "code" from the page you are looking out now reads more or less like English.  Further, you can easily modify the code to do things the way you want (as opposed to the way I want).  And finally, the documentation for "R" is excellent, and free (again!)(available as *.pdf and *.html).  There's a bit of a learning curve, but when you consider the utility of "R" for doing both statistics and publication quality graphics, it is well worth the adventure.

 

        "R" executables can be downloaded from the following address:

http://temper.stat.cmu.edu/R/CRAN/

In all of the examples from this page you start the "RGui" going, then mark from this page and paste into the "R" window and hit an "enter" ("return").  If you want to edit code do something like:

mystuff<-edit(yourstuff)

to edit an old file and place it in a new name, or save over by using mystuff<-edit(mystuff).  Be sure to save your workspace when you exit "R" if you want to keep what you've done.


Highest posterior density bounded on the left and the right

        This code is for finding the most likely age at death and the left and right bounds for the highest posterior density, conditional on:

1.      Phase parameters from a reference sample of known age

2.      The observed phase for a forensic case

3.      An informative prior for age-at-death (expressed as a Gompertz-Makeham hazard model)

To begin with, we need some phase parameters.  Below are the mean “ages-to-transition” and standard deviations for the “age to transition” taken from Suchey-Brooks scores on a known age reference sample.  Copy and paste these into the “Rgui.”  Note that there are six transitions, because there are seven phases.



 
mu<-c(17.12492, 21.70249, 35.23981, 49.41972, 62.61990, 71.76162)
sdev<-c(5.720715, 14.154538, 14.022637, 19.150908, 18.777776, 18.229731)
 

The next two functions do probit analysis (i.e., find the probability of being in a particular phase conditional on age).  Pay carefully attention to the “imax<-7” statement.  This should be equal to the actual number of phases, so edit if need be.


 
probit<-function (t=50,ip=5) 
{
alpha<-mu
beta<-sdev
imax<-7
p<-0
p[1]<-1-pnorm(t,alpha[1],beta[1])
for(i in 2:(imax-1))
{p[i]<-pnorm(t,alpha[i-1],beta[i-1])-pnorm(t,alpha[i],beta[i])}
p[imax]<-pnorm(t,alpha[imax-1],beta[imax-1])
for(i in 2:(imax-1)){if(p[i]<0) p[i]<-0}
return(p[ip]/sum(p))
}
 


probit2<-function (i,t) 
{
        n<-NROW(t)
        p<-0
        for(j in 1:n){
          p[j]<-probit(t[j],i)
                    }
        return(p)
}

lrage – this function finds the left and right bounds for an HPD (lrage stands for left and right age)

 

And now we’re into the nitty-gritty.  The following function finds the most likely age and the specified percentage hpd.  The parameters in the call are as follows:

i = 5 – This is the observed phase for the forensic case

bot = 15 – This is the minimum possible age.  Note that this age is subtracted from the age that goes into the hazard model

top = 100 – This is the maximum possible age

area = .9 – This is the proportion in the hpd.  “.9” means 90%.

a2, a3, b3 – These are the Gompertz-Makeham parameters (where age starts at 0.0 because we subtract 15 to make it do so), and the parameters (by default) are from Bosnian males.  To switch to females uncomment the three indicated lines

 

When you run this you’ll get a plot and a triplet of numbers (the left bound, the highest density, and the right bound).  HOWEVER, if you run this for a case where the hpd runs into a bound on the left, then you will get the defined bottom and the upper age for the requested hpd

 


 
lrage<-function (i=5,bot=15,top=100,area=.9,a2=0.012482505,a3=0.003543657,b3=0.058074052)
{
 
# The default call uses male mortality at and above 15 years.  To switch to female mortality
# uncomment the following three lines
 
# a2=0.0140656844 
# a3=0.0003480993 
# b3=0.0866711354
 
 
##################################################################
#  pia()                                                         #
#  This function finds the unormalized posterior density of age  #
#  conditional on the observed phase and the Gompertz-Makeham    #
#                                                                #
##################################################################
pia<-function(tt)
{
  p<-probit2(i,tt)
  h.t<-a2+a3*exp(b3*(tt-bot))
  S.t<-exp(-a2*(tt-bot)+a3/b3*(1-exp(b3*(tt-bot))))
  return(S.t*h.t*p)
}
 
##################################################################
#  pia2()                                                        #
#  This function finds the normalized posterior density of age   #
#  conditional on the observed phase and the Gompertz-Makeham    #
#                                                                #
##################################################################
pia2<-function(tt)
{
  p<-probit2(i,tt)
  shift<-bot
  h.t<-a2+a3*exp(b3*(tt-bot))
  S.t<-exp(-a2*(tt-bot)+a3/b3*(1-exp(b3*(tt-bot))))
  return(S.t*h.t*p/denom)
}
 
##################################################################
#  pia5()                                                        #
#  Searches left and right to find cut points, such that the     #
#  integral between cut points less the desired area is near     #
#  zero                                                          #
#                                                                #
##################################################################
pia5<-function(divid)
{
   pia4<-function(tt)
    {
     cd<-pia2(tt)-prob*divid
     return(cd)
    }
 
# print(c(divid,pia4(bot),pia4(hpd),pia4(top)))
  low<-uniroot(pia4,c(bot,hpd))$root
   hi<-uniroot(pia4,c(hpd,top))$root
# print(c(low,hi))
  return(integrate(pia2,low,hi)$value-area)
}
 
##################################################################
#  pia6()                                                        #
#  Returns the cut points after they have been found by pia5()   #
#                                                                #
##################################################################
pia6<-function(divid)
{
 
  pia4<-function(tt)
   {
    cd<-pia2(tt)-prob*divid
    return(cd)
   }
  
  low<-uniroot(pia4,c(bot,hpd))$root
  hi<-uniroot(pia4,c(hpd,top))$root
  return(c(low,hi))
}
#################################################################
pia7<-function(t)
{return(integrate(pia2,bot,t)$value-area)}
#################END OF FUNCTIONS################################
 
denom<-integrate(pia,bot,top)$value
hpd<-optimize(pia,c(bot,top),max=T)$maximum
prob<-pia2(hpd)
prob2<-pia2(bot)
tunebot<-1.00001*prob2/prob
cat(' tunebot = ',tunebot,'\n')
cat(' Change tunebot value?: ')
ans<-scan(what='character',n=1)
if(ans=='y'){
  cat('\n  Enter value: ') 
  tunebot<-scan(n=1)
            }
tune<-c(tunebot,.99999)
plot(c(bot,bot:top),c(0,pia2(bot:top)),type='l',xlab='Age',ylab='Density',xlim=c(bot,top))
 
 
cat('\n\n Maximum density occurs at ',hpd,'\n')
 
lines(c(0,100),c(0,0))
 
hi<-uniroot(pia7,c(bot,top))$root
print(c(pia2(hi),prob2))
if(pia2(hi)<prob2)
{
  x<-c((bot:top)[(bot:top)<hi],hi)
  polygon(c(bot,x,x[NROW(x)]),c(0,pia2(x),0),density=10)
  cat(area*100,'% HPD from bottom [',bot,']',hi,'\n\n')
 
}
 
if(pia2(hi)>prob2)
{
 divid<-uniroot(pia5,tune)$root
 x<-pia6(divid)
 x<-c(x[1],(bot:top)[(bot:top)>x[1] & (bot:top)<x[2]],x[2])
 polygon(c(x[1],x,x[NROW(x)]),c(0,pia2(x),0),density=10)
 cat(area*100,'% HPD \n')
 return(c(x[1],hpd,x[NROW(x)]))
}
}

 

And now, for some examples:

 

> lrage(i=1)
 tunebot =  1.000018 
 Change tunebot value?: 1: n
Read 1 items
 
 
 Maximum density occurs at  15.00007 
90 % HPD from bottom [ 15 ] 23.73827

Note that because you requested the hpd for the first phase (which is primarily defined by the truncation age at 15) you get an hpd that is defined as starting at age 15.  Disregard the “maximum density occurs at 15.00007, because really that’s 15.0 years.  Following is a more typical example where the hpd is defined at both the bottom and the top age:

 

> lrage(i=2)
 tunebot =  0.1165028 
 Change tunebot value?: 1: n
Read 1 items
 
 
 Maximum density occurs at  24.97376 
90 % HPD 
[1] 16.13532 24.97376 41.51356

 

And now a less typical example:

 

> lrage(i=7)
 tunebot =  0.002685022 
 Change tunebot value?: 1: n
Read 1 items
 
 
 Maximum density occurs at  69.85449 
Error in uniroot(pia4, c(hpd, top)) : f() values at end points not of opposite sign
 

That didn’t work, so try bumping “tunebot” up a bit:

 

> lrage(i=7)
 tunebot =  0.002685022 
 Change tunebot value?: 1: y
Read 1 items
 
  Enter value: 1: .05
Read 1 items
 
 
 Maximum density occurs at  69.85449 
90 % HPD 
[1] 46.51535 69.85449 87.57487