Chest 2007;131;628-632
DOI 10.1378/chest.06-2088
The online version of this article, along with updated information
and services, can be found at:
http://chestjournals.org
Guideline: Confirm That the Assumptions of the Analysis Were Met and State How Each Was Checked

A statement that the assumptions were verified and by which methods is all that need be included. There are both formal checks (eg, hypothesis tests) and informal checks (eg, inspection of graphs of residuals) for these assumptions. Sometimes, data that violate the assumptions can be adjusted (eg, with data transformations) to meet the assumptions. If such adjustments were made, they should be identified.

Guideline: Report How Any Missing Data Were Treated in the Analyses

Missing data can be a problem in multivariate analysis because they reduce the sample size unless corrective measures are taken. To create a model for predicting weight from age and height, for example, values for each of these variables must be collected for each patient. If age is missing from one patient, the patient is excluded from the analysis, and the sample size is reduced by one. In regression models with several variables, losses to missing data can be common.

However, missing data can be replaced in a process called imputation. Simple imputation methods [...]

Guideline: Report How Any Outlying Values Were Treated in the Analysis

Outliers are extreme values that appear to be anomalies. Outliers cannot be ignored: even a single outlier can have a profound effect on the relationship derived from the regression line.2,3 All outliers must be reported, but it is permissible to report the results with and without the outliers to indicate their effect on the results.

Guideline: Report the Regression Model

A simple linear regression equation can be reported in the text or in a scatter plot of the data. Multiple linear regression models can be reported as equations (Fig 1) or in tables (Table 1); logistic regression models are typically reported in tables because the equations are so complex (Table 2).

Guideline: Report the actual p Value and the 95% Confidence Interval for the Regression Coefficient(s) of the Explanatory Variable(s), and in Logistic Regression, Report the Odds Ratio and the Associated 95% Confidence Interval

In regression analysis, the regression coefficient for an explanatory variable indicates how much the average value of the response variable, Y, varies with each unit change in the explanatory variable, X. The coefficient, or β-weight, is an estimate and so should be accompanied by a confidence interval that indicates its precision.

Odds ratios are widely used in logistic regression analysis. For a binary explanatory variable, the odds ratio is the ratio of the odds that an event will occur in one group to the odds that the event will occur in the other group. An odds ratio of 1 means that both groups have the same odds of the event (eg, having a heart attack). The larger the odds ratio, the more likely the event is expected to occur in the group used in the numerator.

[...] with p values less than 0.1 on univariate analysis are considered for inclusion in the model.

The second step in building a regression model is to identify the best combination of explanatory variables to include in the model. In simultaneous regression, all of the explanatory variables are included in the model and are tested as a group. In hierarchical regression, the investigator defines the number and order in which the explanatory variables are entered into the model. Common procedures are forward, backward, stepwise, and best-subset techniques.
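
As an illustration of the odds ratio and 95% confidence interval that the guideline above asks authors to report, the following minimal sketch computes both for a single binary explanatory variable from a 2 x 2 table. It assumes the simplest case, with no other explanatory variables in the model, and uses the common log-based (Wald) approximation for the interval; the function name and the cell counts are hypothetical and are not taken from the article. A multiple logistic regression model would normally be fitted with statistical software instead.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.959964):
    """Odds ratio and approximate confidence interval from a 2 x 2 table.

    a: events in group 1 (numerator group), b: non-events in group 1
    c: events in group 2,                   d: non-events in group 2
    z: normal quantile; the default gives a 95% interval
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    # Ratio of the odds of the event in group 1 to the odds in group 2
    odds_ratio = (a / b) / (c / d)
    # Standard error of log(OR) under the log-based (Wald) approximation
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only
or_, lo, hi = odds_ratio_with_ci(a=30, b=70, c=15, d=85)
print(f"OR = {or_:.2f} (95% CI, {lo:.2f} to {hi:.2f})")
```

An interval that excludes 1 suggests that the odds of the event differ between the two groups; reporting the interval alongside the odds ratio shows how precisely the ratio was estimated.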
Table 2—A Table for Reporting a Multiple Logistic Regression Model With Four Explanatory Variables*
The most common ANOVA procedures used in biomedical research are as follows:
• One-way ANOVA assesses the effect of a single (hence the “one-way” designation) categorical explanatory variable (sometimes called a factor) on a single continuous response variable. Note, too, that the factor (category) has three or more alternatives (or “levels” or “values”; eg, blood type is A, B, AB, or O). When there are only two alternatives (two groups), this analysis reduces to the Student t test.
• Two-way ANOVA assesses the effect of two categorical explanatory variables (again, sometimes called factors) on a single continuous response variable.
• Multiway ANOVA assesses the effect of three or more categorical explanatory variables (still called factors) on a single continuous response variable.
• Analysis of covariance assesses the effect of one or more categorical explanatory variables while controlling for the effects of some other (possibly continuous) explanatory variables (now called covariates) on a single continuous response variable.
• Repeated-measures ANOVA is used to assess several, or repeated, measurements of the same participants under different conditions (such as BP measurements taken while the patient is supine, sitting, or standing) or at different points over time (such as muscle strength measured 1, 5, 10, and 20 days after surgery).

ANOVA is typically used to compare three or more group means on a certain response variable. It can also be expanded to include additional explanatory variables and can assess their simultaneous effects on the response variable. Whereas the purpose of regression analyses is usually to predict the value of the response variable, the purpose of ANOVA is usually to compare groups for differences in the means of the response variable. ANOVA models are also usually reported in tables (Table 3).

ACKNOWLEDGMENT: This article draws heavily from How To Report Statistics in Medicine, by Tom Lang and Michelle Secic.1

References
1 Lang T, Secic M. How to report statistics in medicine. 2nd ed. Philadelphia, PA: American College of Physicians, 2006
2 Godfrey K. Simple linear regression in medical research. In: Bailar JC, Mosteller F, eds. Medical uses of statistics. 2nd ed. Boston, MA: NEJM Books, 1992; 201–232
3 Altman DG, Gore SM, Gardner MJ, et al. Statistical guidelines for contributors to medical journals. BMJ 1983; 286:1489–1493
4 Shutty M. Guidelines for presenting multivariate statistical analyses in rehabilitation psychology. Rehabil Psych 1994; 39:141–144
5 Bagley SC, White H, Golomb BA. Logistic regression in the medical literature: standards for use and reporting, with particular attention to one medical domain. J Clin Epidemiol 2001; 54:979–985
6 Hosmer DW, Taber S, Lemeshow S. The importance of assessing the fit of logistic regression models: a case study. Am J Public Health 1991; 81:1630–1635