
What is...? series

Second edition

Statistics

Supported by sanofi-aventis

What is a Cox model?

Stephen J Walters BSc MSc PhD CStat, Reader in Medical Statistics, School of Health and Related Research (ScHARR), University of Sheffield

● A Cox model is a statistical technique for exploring the relationship between the survival of a patient and several explanatory variables.

● Survival analysis is concerned with studying the time between entry to a study and a subsequent event (such as death).

● A Cox model provides an estimate of the treatment effect on survival after adjustment for other explanatory variables. In addition, it allows us to estimate the hazard (or risk) of death for an individual, given their prognostic variables.

● A Cox model must be fitted using an appropriate computer program (such as SAS, STATA or SPSS). The final model from a Cox regression analysis will yield an equation for the hazard as a function of several explanatory variables.

● Interpreting the Cox model involves examining the coefficients for each explanatory variable. A positive regression coefficient for an explanatory variable means that the hazard is higher, and thus the prognosis worse, for patients with higher values of that variable. Conversely, a negative regression coefficient implies a better prognosis for patients with higher values of that variable.
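As a rough illustration of this interpretation, the Python sketch below takes three rows of Table 2 (age, sex and treatment group, from the trial analysed later in this article) and turns each coefficient b into a hazard ratio exp(b), an approximate z statistic b/SE(b), a two-sided p-value and a 95% confidence interval exp(b ± 1.96 × SE(b)). It is not output from SAS, STATA or SPSS; it only re-applies these standard formulas to the published, rounded estimates, so the last decimal place may differ slightly from Table 2. The names `coefficients`, `summarise` and `relative_hazard` are hypothetical, and the histology terms are left out for brevity, so the patient comparison assumes both patients have the same histology.

```python
import math

# Rounded estimates (b, SE(b)) copied from Table 2; histology terms omitted for brevity.
coefficients = {
    "age": (0.004, 0.004),     # per additional year of age
    "sex": (-0.312, 0.110),    # 0 = female, 1 = male
    "group": (-0.090, 0.108),  # 0 = control, 1 = interferon
}


def summarise(b, se, z_crit=1.96):
    """Return (z, two-sided p, hazard ratio, lower 95% limit, upper 95% limit)."""
    z = b / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p, math.exp(b), math.exp(b - z_crit * se), math.exp(b + z_crit * se)


def relative_hazard(patient_a, patient_b):
    """Hazard of patient a divided by the hazard of patient b.

    The baseline hazard h0(t) cancels, so under proportional hazards
    this ratio is the same at every time t.
    """
    diff = sum(b * (patient_a[name] - patient_b[name])
               for name, (b, _se) in coefficients.items())
    return math.exp(diff)


for name, (b, se) in coefficients.items():
    z, p, hr, lower, upper = summarise(b, se)
    direction = "worse" if b > 0 else "better"
    print(f"{name:5s} b={b:+.3f} z={z:+.2f} p={p:.3f} "
          f"HR={hr:.3f} (95% CI {lower:.3f} to {upper:.3f}); "
          f"higher values -> {direction} prognosis")

# Hypothetical comparison: a 60-year-old man on interferon versus a
# 60-year-old woman in the control group (same histology assumed).
man_interferon = {"age": 60, "sex": 1, "group": 1}
woman_control = {"age": 60, "sex": 0, "group": 0}
print(f"relative hazard = {relative_hazard(man_interferon, woman_control):.3f}")
```

For the two hypothetical patients the relative hazard is exp(–0.312 – 0.090), about 0.67; because h0(t) cancels, no baseline hazard is needed to compare them.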

For further titles in the series, visit: www.whatisseries.co.uk

Date of preparation: May 2009 1 NPR09/1005



What is a Cox model?


What is the purpose of the Cox model?
The Cox model is based on a modelling approach to the analysis of survival data. The purpose of the model is to simultaneously explore the effects of several variables on survival.

The Cox model is a well-recognised statistical technique for analysing survival data. When it is used to analyse the survival of patients in a clinical trial, the model allows us to isolate the effects of treatment from the effects of other variables. The model can also be used, a priori, if it is known that there are other variables besides treatment that influence patient survival and these variables cannot be easily controlled in a clinical trial. Using the model may improve the estimate of treatment effect by narrowing the confidence interval. Survival times now often refer to the development of a particular symptom or to relapse after remission of a disease, as well as to the time to death.

Why are survival times censored?
A significant feature of survival times is that the event of interest is very rarely observed in all subjects. For example, in a study to compare the survival of patients having different types of treatment for malignant melanoma of the skin, although the patients may be followed up for several years, there will be some patients who are still alive at the end of the study. We do not know when these patients will die, only that they are still alive at the end of the study; therefore, we do not know their survival time from the start of treatment, only that it will be longer than their time in the study. Such survival times are termed censored, to indicate that the period of observation was cut off before the event of interest occurred.

From a set of observed survival times (including censored times) in a sample of individuals, we can estimate the proportion of the population of such people who would survive a given length of time under the same circumstances. This method is called the product limit or Kaplan–Meier method. The method allows a table and a graph to be produced; these are referred to as the life table and survival curve respectively.

Kaplan–Meier estimate of the survivor function
The data on ten patients presented in Table 1 refer to the survival time in years following treatment for malignant melanoma of the skin.

Table 1. Calculation of Kaplan–Meier estimate of the survivor function

A              B                  C           D          E                        F
Survival time  Number at risk     Number of   Number     Proportion surviving     Cumulative proportion
(years)        at start of study  deaths      censored   until end of interval    surviving
0.909          10                 1           0          1 – 1/10 = 0.900         0.900
1.112          9                  1           0          1 – 1/9 = 0.889          0.800
1.322*         8                  0           1          1 – 0/8 = 1.000          0.800
1.328          7                  1           0          1 – 1/7 = 0.857          0.686
1.536          6                  1           0          1 – 1/6 = 0.833          0.571
2.713          5                  1           0          1 – 1/5 = 0.800          0.457
2.741*         4                  0           1          1 – 0/4 = 1.000          0.457
2.743          3                  1           0          1 – 1/3 = 0.667          0.305
3.524*         2                  0           1          1 – 0/2 = 1.000          0.305
4.079*         1                  0           1          1 – 0/1 = 1.000          0.305
* Indicates a censored survival time
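The calculation in Table 1 can be reproduced with a short Python sketch of the product-limit (Kaplan–Meier) method. It assumes each patient is recorded as a survival time plus an event indicator (1 = death, 0 = censored). Where a death and a censoring share the same time, the censored patient is still counted as at risk at that time, which matches the convention of taking the censored time to occur immediately after the death time. The function name `kaplan_meier` and the layout of its output rows are illustrative choices, not part of any particular statistics package.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of the survivor function.

    times  : survival times
    events : 1 if the patient died at that time, 0 if the time is censored
    Returns one row per distinct time, mirroring columns A to F of Table 1:
    (time, number at risk, deaths, censored, interval proportion, cumulative).
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    censored = Counter(t for t, e in zip(times, events) if e == 0)
    at_risk = len(times)
    cumulative = 1.0
    rows = []
    for t in sorted(set(times)):
        d, c = deaths[t], censored[t]
        proportion = 1 - d / at_risk   # column E: proportion surviving the interval
        cumulative *= proportion       # column F: product of all proportions so far
        rows.append((t, at_risk, d, c, proportion, cumulative))
        # Censored patients leave the risk set just after any deaths at time t.
        at_risk -= d + c
    return rows


# Data from Table 1 (0 marks a censored time, shown with * in the table).
times = [0.909, 1.112, 1.322, 1.328, 1.536, 2.713, 2.741, 2.743, 3.524, 4.079]
events = [1, 1, 0, 1, 1, 1, 0, 1, 0, 0]

rows = kaplan_meier(times, events)
for t, n, d, c, p, s in rows:
    print(f"{t:6.3f}  at risk {n:2d}  deaths {d}  censored {c}  "
          f"interval {p:.3f}  cumulative {s:.3f}")
```

The cumulative values printed, rounded to three decimals, match column F of Table 1 (0.900, 0.800, 0.800, 0.686, 0.571, 0.457, 0.457, 0.305, 0.305, 0.305). Plotting the cumulative proportion against time as a step function gives the Kaplan–Meier survival curve.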

To determine the Kaplan–Meier estimate of the survivor function for the above example, a series of time intervals is formed. Each of these intervals is constructed to be such that one observed death is contained in the interval, and the time of this death is taken to occur at the start of the interval.

Table 1 shows the survival times arranged in ascending order (column A). Some survival times are censored (that is, the patient did not die during the follow-up period) and these are labelled with an asterisk. The number of patients who are alive just before 0.909 years is ten (column B). Since one patient dies at 0.909 years (column C), the probability of dying by 0.909 years is 1/10 = 0.10. So the corresponding probability of surviving up to 0.909 years is 1 minus the probability of dying, or 0.900 (columns E and F).

The cumulative probability of surviving up to 1.112 years, then, is the probability of surviving at 1.112 years and surviving throughout the preceding time interval, that is, 0.900 × 0.889 = 0.800 (column F). The third time interval (1.322 years) contains censored data, so the probability of surviving in this time interval is 1 (unity), and the cumulative probability of surviving is unchanged from the previous interval. This is the Kaplan–Meier estimate of the survivor function.

Sometimes the censored survival times occur at the same time as deaths. The censored survival time is then taken to occur immediately after the death time when calculating the survivor function.

A plot of the Kaplan–Meier estimate of the survivor function (Figure 1) is a step function, in which the estimated survival probabilities are constant between adjacent death times and only decrease at each death.

[Figure 1. Kaplan–Meier estimate of the survival function: cumulative proportion surviving (0.0 to 1.0) against overall survival (years from surgery, 0 to 5); the step curve is the survival function and marks show censored times.]

An important part of survival analysis is to produce a plot of the survival curves for each group of interest.1 However, the comparison of the survival curves of two groups should be based on a formal non-parametric statistical test called the logrank test, and not upon visual impressions.2 Figure 2 shows the survival of patients treated for malignant melanoma: the survival of 338 patients on interferon treatment was compared with that of 336 patients in the control group.3 The two groups of patients appear to have similar survival, and the logrank test supports this conclusion.

[Figure 2. Kaplan–Meier survival curves in patients receiving treatment for malignant melanoma.3 Cumulative proportion surviving (0.0 to 1.00) against time from randomisation to death (years, 0 to 8) for the control and interferon groups. Number at risk at 0, 2, 4, 6 and 8 years: control 336, 203, 97, 22, 0; interferon 338, 215, 84, 23, 0. Hazard ratio 0.92 (95% CI: 0.74–1.13); p=0.411 (logrank).]

Modelling survival – the Cox regression model
The logrank test cannot be used to explore (and adjust for) the effects of several variables, such as age and disease duration, known to affect survival. Adjustment for variables that are known to affect survival may improve the precision with which we can estimate the treatment effect.

The regression method introduced by Cox is used to investigate several variables at a time.4 It is also known as proportional hazards regression analysis.

Briefly, the procedure models or regresses the survival times (or more specifically, the so-called hazard function) on the explanatory variables. The actual method is much too complex for detailed discussion here. This publication is intended to give an introduction to the method, and should be of use in the understanding and interpretation of the results of such analyses. A more detailed discussion is given by Machin et al5 and Collett.6

What is a hazard function?
The hazard function is the probability that an individual will experience an event (for example, death) within a small time interval,
given that the individual has survived up to the beginning of the interval. It can therefore be interpreted as the risk of dying at time t.

The hazard function, denoted by h(t), can be estimated using the following equation:

$$
h(t) = \frac{\text{number of individuals experiencing an event in the interval beginning at } t}{(\text{number of individuals surviving at time } t) \times (\text{interval width})}
$$

What is regression?
If we want to describe the relationship between the values of two or more variables we can use a statistical technique called regression.7 If we have observed the values of two variables, X (for example, age of children) and Y (for example, height of children), we can perform a regression of Y on X. We are investigating the relationship between a dependent variable (the height of children) and an explanatory variable (the age of children).

When more than one explanatory (X) variable needs to be taken into account (for example, height of the father), the method is known as multiple regression. Cox's method is similar to multiple regression analysis, except that the dependent (Y) variable is the hazard function at a given time. If we have several explanatory (X) variables of interest (for example, age, sex and treatment group), then we can express the hazard or risk of dying at time t as:

$$
h(t) = h_0(t) \times \exp\left(b_{\text{age}}\,\text{age} + b_{\text{sex}}\,\text{sex} + \dots + b_{\text{group}}\,\text{group}\right)
$$

Taking natural logarithms of both sides:

$$
\ln h(t) = \ln h_0(t) + b_{\text{age}}\,\text{age} + b_{\text{sex}}\,\text{sex} + \dots + b_{\text{group}}\,\text{group}
$$

The quantity h0(t) is the baseline or underlying hazard function and corresponds to the probability of dying (or reaching an event) when all the explanatory variables are zero. The baseline hazard function is analogous to the intercept in ordinary regression (since exp(0) = 1).

The regression coefficients bage to bgroup give the change in the log hazard that can be expected for a unit change in each explanatory variable; exponentiating a coefficient, exp(b), gives the proportional change in the hazard. They are estimated by a complex statistical method called maximum likelihood,6 using an appropriate computer program (for example, SAS, SPSS or STATA).

The assumption of a constant relationship between the dependent variable and the explanatory variables is called proportional
hazards. This means that the hazard functions for any two individuals at any point in time are proportional. In other words, if an individual has a risk of death at some initial time point that is twice as high as that of another individual, then at all later times the risk of death remains twice as high. This assumption of proportional hazards should be tested.6

The testing of the proportional hazards assumption is most straightforward when we compare two groups with no covariates. The simplest check is to plot the Kaplan–Meier survival curves together (Figure 2).3 If they cross, then the proportional hazards assumption may be violated. For small data sets, where there may be a great deal of error attached to the survival curve, it is possible for curves to cross, even under the proportional hazards assumption. A more sophisticated check is based on what is known as the complementary log-log plot. With this method, a plot of the logarithm of the negative logarithm of the estimated survivor function against the logarithm of survival time will yield parallel curves if the hazards are proportional across the groups (Figure 3).3

[Figure 3. Complementary log-log plot:3 ln{–ln[survival probability]} (–6 to 1) against ln(time) (–3 to 2), by randomised group (control, interferon).]

Interpretation of the model
As mentioned above, the Cox model must be fitted using an appropriate computer program. The final model from a Cox regression analysis will yield an equation for the hazard as a function of several explanatory variables (including treatment). So how do we interpret the results? This is illustrated by the following example.

Cox regression analysis was carried out on the data from a randomised trial comparing the effect of low-dose adjuvant interferon alfa-2a therapy with that of no further treatment in patients with malignant melanoma at high risk of recurrence.3,8 Malignant melanoma is a serious type of skin cancer, characterised by uncontrolled growth of pigment cells called melanocytes. Treatments include surgical removal of the tumour, adjuvant treatment, chemo- and immunotherapy, and radiation therapy. In this trial, 674 patients with a radically resected malignant melanoma (who were at high risk of disease recurrence) were randomly assigned to one of two treatment groups: interferon (3 megaunits of interferon alfa-2a three times a week until recurrence of cancer, or for two years, whichever occurred first) or no further treatment. The primary
aim of this multicentre study was to determine the effects of interferon on overall survival. Patients were followed for up to eight years from randomisation.8

The final Cox model included two demographic variables (age and gender) and one baseline clinical variable (histology) as independent prognostic factors, plus a treatment variable (Table 2). An approximate test of significance for each variable is obtained by dividing the regression estimate b by its standard error SE(b), and comparing the result with the standard normal distribution. Values of this ratio greater than 1.96 in absolute value will be statistically significant at the 5% level. The Cox model is shown in Table 2.

Table 2. Cox regression model fitted to the data from the AIM HIGH trial of interferon versus no further treatment (control) in malignant melanoma (n=674)

Variable                                Regression       Standard      p-value   Hazard          95% CI for hazard ratio
                                        coefficient (b)  error SE(b)             ratio* (e^b)    Lower   Upper
Age                                     0.004            0.004         0.359     1.004           0.996   1.012
Sex (0 = female, 1 = male)              –0.312           0.110         0.005     0.732           0.590   0.909
Histology                                                              0.001
Histology (1) (0 = localised, 1 = LM)   –0.033           0.234         0.887     0.967           0.612   1.530
Histology (2) (0 = localised, 1 = RMD)  0.446            0.204         0.029     1.562           1.048   2.330
Histology (3) (0 = localised, 1 = RMR)  0.569            0.154         0.001     1.766           1.306   2.387
Group (0 = control, 1 = interferon)     –0.090           0.108         0.404     0.914           0.740   1.129
* Risk of death according to treatment assignment and prognostic variables
CI: confidence interval; LM: locally metastatic; RMD: regionally metastatic at diagnosis; RMR: regionally metastatic at recurrence

The first feature to note in such a table is the sign of the regression coefficients. A positive sign means that the hazard (risk of death) is higher, and thus the prognosis worse, for subjects with higher values of that variable. Thus, from Table 2, older age and regionally metastatic cancer histology are associated with poorer survival, whereas being male is associated with better survival.

An individual regression coefficient is interpreted quite easily. Note that patients are either given interferon (coded as 1) or not (coded as 0). From Table 2, the estimated hazard in the interferon group is exp(–0.090) = 0.914 of that of the control group; that is, a 9% decrease in the risk of death after adjustment for the other explanatory variables in the model. However, with a p-value of 0.404 this reduction is not statistically significant, and the 95% confidence interval for the hazard ratio (0.740 to 1.129) includes 1, consistent with no difference in survival. In this study the authors concluded that there was no significant difference in overall survival between interferon-treated patients and those in the control group, even after adjustment for prognostic factors.8

For explanatory variables that are continuous (for example, age) the regression coefficient refers to the increase in log hazard for an increase of 1 in the value of the covariate. Thus, the estimated hazard or risk of death is multiplied by exp(0.004) = 1.004 if a patient is a year older, after adjustment for the effects of the other variables in the model
(Table 2). The overall effect on survival for an individual patient, however, cannot be described simply, as it depends on the patient's values of the other variables in the model.

Other models
Cox regression is considered a 'semi-parametric' procedure because the baseline hazard function, h0(t), (and the probability distribution of the survival times) does not have to be specified. Since the baseline hazard is not specified, a different parameter is used for each unique survival time. Because the hazard function is not restricted to a specific form, the semi-parametric model has considerable flexibility and is widely used. However, if the assumption of a particular probability distribution for the data is valid, inferences based on such an assumption are more precise. That is, estimates of the hazard ratio will have smaller standard errors and hence narrower confidence limits.

A fully parametric proportional hazards model makes the same assumptions as the Cox regression model but, in addition, also assumes that the baseline hazard function, h0(t), can be parameterised according to a specific model for the distribution of the survival times. Survival time distributions that can be used for this purpose (those that have the proportional hazards property) are mainly the exponential, Weibull and Gompertz distributions.

Figure 4 shows examples of the hazard functions for the exponential, Weibull and Gompertz distributions. The simplest model for the hazard function is to assume that it is constant over time. The hazard of death at any time after the start of the study is then the same, irrespective of the time elapsed, and the survival times follow an exponential distribution (Figure 4A). In practice, the assumption of a constant hazard function (or equivalently exponentially distributed survival times) is rarely tenable. A more general form of hazard function is given by the Weibull distribution. The shape of the Weibull hazard function depends critically on the value of its shape parameter, typically denoted by the Greek letter gamma, γ. Figure 4B shows the general form of this hazard function for different values of gamma. Since the Weibull hazard function can take a variety of forms depending on the value of the shape parameter gamma, this distribution is widely used in the parametric analysis of survival data. When the hazard of death is expected to increase or decrease with time in the short term and then to become constant, a hazard function that follows a Gompertz distribution may be appropriate (Figure 4C).

[Figure 4. Examples of hazard functions over time for exponential (a), Weibull (b) and Gompertz (c) distributions. Each panel plots the hazard function against time; panel (b) shows the Weibull hazard for 0 < gamma < 1, gamma = 1, gamma = 2 and gamma > 2.]

Different distributions imply different shapes of the hazard function, and in practice the distribution that best describes the functional form of the observed hazard
function is chosen.6 Fitting three parametric proportional hazards models, assuming exponential, Weibull and Gompertz baseline hazards, to the malignant melanoma trial data produced regression coefficients similar to those of the standard Cox model in Table 2.

A family of fully parametric models that directly accommodate the multiplicative effects of explanatory variables on survival times, and hence do not have to rely on proportional hazards, are called accelerated failure time models. These models are too complex for discussion here; a more detailed discussion is given by Collett.6

References
1. Freeman JV, Walters SJ, Campbell MJ. How to display data. Oxford: Blackwell BMJ Books, 2008.
2. Altman DG. Practical Statistics for Medical Research. London: Chapman & Hall/CRC, 1991: 365–396.
3. Dixon S, Walters SJ, Turner L, Hancock BW. Quality of life and cost-effectiveness of interferon-alpha in malignant melanoma: results from randomised trial. Br J Cancer 2006; 94: 492–498.
4. Cox DR. Regression models and life tables. J Roy Statist Soc B 1972; 34: 187–220.
5. Machin D, Cheung YB, Parmar M. Survival Analysis: A Practical Approach, 2nd edn. Chichester: Wiley, 2006.
6. Collett D. Modelling Survival Data in Medical Research, 2nd edn. London: Chapman & Hall/CRC, 2003.
7. Campbell MJ, Machin D, Walters SJ. Medical Statistics: A text book for the health sciences, 4th edn. Chichester: Wiley, 2007.
8. Hancock BW, Wheatley K, Harris S et al. Adjuvant interferon in high-risk melanoma: the AIM HIGH Study – United Kingdom Coordinating Committee on Cancer Research randomized study of adjuvant low-dose extended-duration interferon Alfa-2a in high-risk resected malignant melanoma. J Clin Oncol 2004; 22: 53–61.

Further reading
Chapter 13 of Altman2 provides a good introduction to survival analysis, the logrank test and the Cox regression model. A more detailed technical discussion of survival analysis and Cox regression is given by Machin et al and Collett.5,6

Box 1. Glossary of terms

Confidence interval (CI). A range of values, calculated from the sample of observations, that are believed, with a particular probability, to contain the true parameter value. A 95% confidence interval implies that if the estimation process were repeated again and again, then 95% of the calculated intervals would be expected to contain the true parameter value. Note that the stated probability level refers to the properties of the interval and not to the parameter itself.

e^x or exp(x). The exponential function, denoting the inverse procedure to that of taking logarithms.

Logrank test. A method for comparing the survival times of two or more groups of subjects. It involves the calculation of observed and expected frequencies of failures in separate time intervals. The relevant test statistic is a comparison of the observed number of deaths occurring at each particular point with the number to be expected if the survival experience of the two groups is the same.

Logarithms. Logarithms are mainly used in statistics to transform a set of observations to values with a more convenient distribution. The natural logarithm (loge x or ln x) of a quantity x is the value y such that x = e^y. Here e is the constant 2.718281… The log of 1 is 0, and log x tends to minus infinity as x approaches 0. Log transformation can only be used for data where all x values are positive.

SE or se. The standard error of a sample mean or some other estimated statistic (for example, a regression coefficient). It is the measure of the uncertainty of such an estimate and it is used to derive a confidence interval for the population value. The notation SE(b) means the 'standard error of b'.

p. The probability value, or significance level, from a hypothesis test. p is the probability of the data (or some other more extreme data) arising by chance when the null hypothesis is true.

First edition published 2001
Author: Stephen J Walters

This publication, along with the others in the series, is available on the internet at www.whatisseries.co.uk

The data, opinions and statements appearing in the article(s) herein are those of the contributor(s) concerned. Accordingly, the sponsor and publisher, and their respective employees, officers and agents, accept no liability for the consequences of any such inaccurate or misleading data, opinion or statement.

Published by Hayward Medical Communications, a division of Hayward Group Ltd.
Copyright © 2009 Hayward Group Ltd. All rights reserved.
