We write a single “age indicator”
as *C*, for “character.” The
character can take any of 1,..,*I* values, each represented as *c _{i}*. For example, in the Suchey-Brooks pubic phase
system the character states are

where _{} is the standard
normal integral, _{} is a set of
intercepts with _{} equal to negative
infinity and _{} equal to positive
infinity, and _{} is a slope. The method of maximum likelihood can be used
on a reference sample in order to estimate the parameters _{} and _{}. In addition to proprietary
software, there are a number of freely available programs that can be used to
fit the ordinal probit model (tda, which is available from ftp://ftp.stat.ruhr-uni-bochum.de/pub/tda/,nkotp
from ftp://k7moa.gsia.cmu.edu/nkotp.zip
and see http://k7moa.gsia.cmu.edu/nkotp.htm
for documentation, and MIXOR from http://www.uic.edu/~hedeker/mix.html). Following Boldsen and co-worker's
terminology, the ordinal probit model can also be written in terms of the mean
“age to transitions” and the standard deviation of the transition
distributions, so that equation becomes:

where _{}is the normal probability density function at variate value *a*
with mean m
and standard deviation s.

The representation of the ordinal probit in equation makes clear one particular problem, which is that the integration across age starts at negative infinity. There are two ways to circumvent this problem. The first, and simplest way, is to measure age in a logarithmic scale, or equivalently to let the “age to transition” distributions be log-normal rather than normal. The second way is to allow the standard deviations for transitions to vary by phase, so that equation becomes:

In our experience, the standard deviations for transitions between early phases are generally so small relative to the means, that the integrals from negative infinity to zero (across age) are negligible. Equation does, however, produce a different problem, which is that some of the calculated probabilities for being in a particular phase at a particular age can be negative. To circumvent this problem, we set any calculated probabilities to zero and renormalize (i.e., divide the probability of being in a particular phase at a particular age by the sum of the probabilities of being in each phase at that age).

Both models that we have discussed (the log-normal model and the separate standard deviation model) are supported in CatReg (U.S. EPA 2000), a collection S+ routines available from http://www.epa.gov/ncea/catreg.htm and http://www.stat.uiuc.edu/~simpson/papers.html. In CatReg the separate standard deviation model is referred to as an “unrestricted cumulative” probit, while in Hedeker et al.’s (1999) terminology this is referred to as a “thresholds of change model.” In general, we have not been able to fit the separate standard deviation model in CatReg, or in MIXOR, because the occurrence of negative probabilities early in the optimization of the total log-likelihood causes severe numerical problems. We have written our own software which uses a tensor method (Chow et al. 1994) for maximizing the log-likelihood, and which replaces negative probabilities with positive values near zero. In our experience this method has almost always converged properly, and yields answers identical to the “unconstrained cumulative” probit in CatReg when that program has converged properly.

NPHASES2 is written in FORTRAN 77 and was compiled using the GNU compiler g77 (in Windows ’98). When you obtain the executable it also comes with the source code (“nphases2.for”) in case you want to modify and re-compile, and with a test data set called “stewart.dat” (these are the ages and pubic phases in the Todd system for 358 individuals from the McKern and Stewart Korean War Dead study). To run nphases2.exe click on its icon or use a DOS window to run it. You will be prompted for a file name, and it is simplest if the file is in the same directory as the executable. On that note, I strongly recommend that you do not run the program “from the desktop,” as most people who do that have little chance of finding the path to their data files. The stuff below is what you will see on your screen, where the blue font represents your answers. Output is appended to a file called “junk.out.”

Name of input file:

stewart.dat

I read
358 records. If that is wrong,
bail!

PAUSE statement executed

To
resume execution type go. Other input
will terminate the job.

go

This system has 10 phases

Model to fit?

("C" or "c" will fit
cumulative probit,

"L" or "l" will fit
log age cumulative probit,

any other single character will fit

multiple s.d. model):

c

Bottom truncation age?:

-1000

Top truncation age?:

1000

nopar =
10

INITIAL FUNCTION VALUE F= 0.7783672510073E+03

INITIAL GRADIENT G= 0.1067040628979E+03-0.7033229053722E+02-0.2452956146482E+02-0.2560136423968E+01-0.5417676633931E+01-0.7138809903946E+00
0.2453813637782E+02-0.2343716163692E+02-0.7530402087538E+01 0.3092735961612E+02

----------- ITERATION 1
----------------

X= 0.1952383859257E+02 0.2027291494099E+02
0.2146744689456E+02 0.2400697040114E+02 0.2668122749510E+02 0.2896725086346E+02
0.3028639393876E+02 0.3151697174511E+02 0.3471555599626E+02 0.4528791742785E+01

F(X)= 0.7516959811154E+03

G(X)=
0.9671368859902E+02-0.6395290800352E+02-0.2220154557146E+02-0.2262817546794E+01-0.4858710084908E+01-0.6406097763174E+00
0.2229295836350E+02-0.2117806466688E+02-0.6751974075181E+01 0.2646326869703E+02

----------- ITERATION 2
----------------

X= 0.1829815952978E+02 0.1969586953910E+02
0.2161373582133E+02 0.2457896447418E+02 0.2767576452350E+02 0.2964025813414E+02
0.3072021865315E+02 0.3283829437895E+02 0.3750775641667E+02 0.2693270850434E+01

F(X)= 0.6990714429314E+03

G(X)=
0.3386741265048E+02-0.3631685556265E+02-0.7546418979795E+01 0.2871656402900E+01
0.1462232795863E+01 0.3609745638392E+00 0.1306661289861E+02-0.3832301237806E+01
0.6144807827946E+00-0.9688545789301E+02

.

.

.

----------- ITERATION 7
----------------

X= 0.1561109877024E+02 0.2013905856279E+02 0.2256982849924E+02
0.2513979706418E+02 0.2784830905376E+02 0.2957148915937E+02 0.3054926553731E+02
0.3432316275048E+02 0.3982809918431E+02 0.4500524337221E+01

F(X)= 0.6081384792419E+03

G(X)=-0.7599598613541E-05-0.2606296839793E-04
0.1784020950171E-04 0.3203291321853E-04-0.7229352203519E-05-0.4486042227881E-04
0.1005002827707E-03 0.1179400417934E-03-0.2346896592650E-04-0.2859763366388E-03

OPTSTP
RELATIVE GRADIENT CLOSE TO ZERO.

OPTSTP
CURRENT ITERATE IS PROBABLY SOLUTION.

FINAL X= 0.1561108853982E+02
0.2013905471715E+02 0.2256982376472E+02 0.2513979048805E+02 0.2784830141980E+02
0.2957147731599E+02 0.3054924678061E+02 0.3432313361715E+02 0.3982811697564E+02
0.4500534490873E+01

GRADIENT G=-0.4605820359870E-06
0.3570270325310E-06-0.4778628547525E-06-0.1430041142758E-06
0.0000000000000E+00-0.2431460175903E-06
0.1176819021968E-06-0.1047425771762E-06-0.9026521324173E-07-0.1597629561161E-05

FUNCTION VALUE F(X)= 0.6081384792382E+03 AT
ITERATION 8

End parms = 15.6110885
20.1390547 22.5698238 25.1397905
27.8483014

29.5714773
30.5492468 34.3231336 39.828117
4.50053449 0.

1.30303408E+270 1.07266771E+270
1.07195056E+270 1.30301843E+270

1.30304261E+270 1.05658906E+270
8.48798324E-314 3.33928504E-289

4.52090354E-308

0

0.531710688

0.369493084

0.36217331

0.390503705

0.431154948

0.462691614

0.485789855

0.615553357

1.04282831

0.270274789

__input/output__

__ __

The input format is free-field, with a column of ages and a column of phases, like so (this taken from the “stewart.dat”) file:

18.5681 1

21.03765 3

20.82409 4

20.26831 3

21.27858 4

.

.

.

31.89596 8

39.77276 9

36.50103 9

28.41068 8

34.26694 6

30.4011 5

Note that there should not be any “tabs.” The only allowed characters are numerals and spaces, and a decimal for the age. The last line of data should be the last line (don’t put any blank lines below).

The output in “junk.out” looks like the following:

Data read from file stewart.dat with 358 records

This system has 10 phases

Model fit with option "c"

Following model was fit with truncation at
ages

-1000.0 and 1000.0

15.611089 0.531711

20.139055 0.369493

22.569824 0.362173

25.139790 0.390504

27.848301 0.431155

29.571477 0.462692

30.549247 0.485790

34.323134 0.615553

39.828117 1.042828

4.500534 0.270275

-608.138479

Data read from file stewart.dat with 358 records

This system has 10 phases

Model fit with option "l"

Following model was fit with truncation at
ages

-1000.0 and 1000.0

2.842642 0.018351

3.008901 0.013086

3.100221 0.013039

3.198777 0.014059

3.301085 0.015242

3.364243 0.016143

3.399392 0.016842

3.531747 0.020985

3.716118 0.034767

0.155982 0.008928

-589.737731

Data read from file stewart.dat with 358 records

This system has 10 phases

Model fit with option "s"

Following model was fit with truncation at
ages

-1000.0 and 1000.0

17.747852 0.422595

20.419633 0.256458

22.107561 0.246049

24.086645 0.314911

27.087691 0.440840

28.962254 0.516070

30.603027 0.705234

36.380301
1.328248

47.220229 4.591539

2.249520 0.452320

3.023636 0.337138

2.602468 0.246692

2.919999 0.265688

4.169001 0.366412

4.660805 0.456106

6.159124 0.784415

7.318319 1.163636

9.236322 2.713909

-578.792138

What you see above are three runs, one with “c” (for cumulative
probit), one with “l” (for cumulative probit on log age) and one with the separate standard deviation model. The left column contains estimates and the
right contains standard errors of those estimates. The last (negative) number is the log-likelihood of the
model. In the “c” model the average age
for moving from a Todd I to a Todd II is 15.61 years, for Todd II to Todd III
is 20.14,…, and for Todd IX to Todd X is 39.83, and all of these transitions
have a standard deviation of 4.5 years.
In the “l” model the interpretation is the same, but on a natural log
scale. In the final model the first
transition has an average of 17.75 years with a standard deviation of 2.25
years, while the last transition has an average of 47.22 years with a standard
deviation of 9.24 years. Finally, what
are the truncations about? They are an
old feature I originally put in the program that I suggest you don’t use (so
use –1000 and 1000 as the truncations).
If you are using the log age model you must use a positive number for
the lower truncation point, because the log of a negative number is
undefined. I often use something like
0.0001, which on a natural log scale is about –9.2.

__How
to get “NPHASES2” up and running__

1) Download the file from http://konig.la.utk.edu/nphases.exe

2) Put the file in a directory and “run it” (either double click or “run” in a DOS Window).

3) The file is a self-extracting file, so you will get the executable, plus other “stuff.”

__One
“Bug”__

The program is written so that the
amount of user input is quite minimal.
Consequently, the program picks reasonable starting values, so that you
don’t have to. In an earlier version
some of these starting values were far enough away from reality that the
optimization did not converge properly.
If you get results that look ridiculous (e.g., parameters with enormous
standard errors) or mean age-to-transitions of, oh say, 600 years, let me
know. I’m still trying to fix this
part. E-mail to: lylek@utk.edu.

** **

Chow T, Eskow E and Schnabel R (1994)
Algorithm 739: A software package for unconstrained optimization using tensor
methods. *ACM Transactions on
Mathematical Software*, **20**, 518-530.

Hedeker D, Mermelstein RJ and Weeks KA
(1999) The thresholds of change model: an approach to analyzing stages of
change data. *Annals of Behavioral
Medicine*, **21**, 61-70.

Johnson VE and Albert JH (1999) *Ordinal
Data Modeling*. New York, NY:
Springer-Verlag.

Long JS (1997) *Regression Models
for Categorical and Limited Dependent Variables*. Thousand Oaks, CA: Sage Publications.

Powers DA and Xie Y (2000) *Statistical
Methods for Categorical Data Analysis*.
New York, NY: Academic Press.