
Faculty Development Program

Clinical Epidemiology and Clinical Research


TOPIC: Biostatistics 5: Multivariable and Logistic Regression
DATE: May (1:30 PM - 5:00 PM)
LEADERS: Art Evans

OBJECTIVES:
1. Interpret a logistic regression equation.
2. Calculate OR and RR from logistic regression for continuous and categorical predictor variables.
3. Describe the main assumptions of the logistic model.
4. Describe the main errors in studies that analyze data with logistic regression.
5. Check for interactions in linear and logistic models.

REQUIRED READINGS:
Norman and Streiner. PDQ Statistics. 2nd Ed. Pages 65-69 (ANCOVA); 116-117 (logistic regression).
Norman and Streiner. Biostatistics: The Bare Essentials. Pages 119-127.
Concato and Feinstein. The risk of determining risk with multivariable models. Ann Intern Med. 1993;201-210.

PROBLEMS:
A randomized trial was performed testing a new treatment against placebo with mortality at one year as the outcome of interest. A logistic regression model was used to assess the treatment effect while adjusting for potential confounders.

Questions:
1. According to the logistic regression model, is there evidence of interaction?
2. Based on the first model (no confounding or interaction considered), what is the OR that describes the treatment effect? Verify by calculating the OR in the raw data.
3. What is the treatment OR after adjusting for the potential confounder? Is there evidence of confounding? How do you make that decision?

Model without the confounder

                            beta coefficient    P value
intercept                   -0.2
treatment (1=Tx, 0=C)       -1.0                <0.01

Model with confounder

                            beta coefficient    P value
intercept                   -2.20
treatment (1=Tx, 0=C)       -1.60               0.01
confounder (1=Yes, 0=No)    3.55                <0.01

Model with confounder and interaction term

                            beta coefficient    P value
intercept                   -2.20
treatment (1=Tx, 0=C)       -1.60               0.01
confounder (1=Yes, 0=No)    3.55                0.01
treatment*confounder        0.001               0.99

Raw data

            Tx      Control
Dead        466     900
Alive       1534    1100
Total       2000    2000

High Risk Stratum

            Tx      Control
Dead        444     800
Alive       556     200
Total       1000    1000

Low Risk Stratum

            Tx      Control
Dead        22      100
Alive       978     900
Total       1000    1000
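One way to work through questions 2 and 3 is to compute the ORs directly from the counts above. A minimal sketch in Python, using only the numbers printed in the tables: the crude OR should match e^b from the first model, and the stratum-specific ORs should match e^b from the confounder-adjusted model.

```python
import math

# Crude OR from the raw 2x2 table: (odds of death on Tx) / (odds of death on control)
crude_or = (466 / 1534) / (900 / 1100)
print(f"Crude OR from raw data:       {crude_or:.3f}")          # ~0.37
print(f"OR from first model, e^-1.0:  {math.exp(-1.0):.3f}")    # ~0.37

# Stratum-specific ORs should agree with the adjusted OR from the confounder model
or_high = (444 / 556) / (800 / 200)   # high-risk stratum
or_low = (22 / 978) / (100 / 900)     # low-risk stratum
print(f"High-risk stratum OR:         {or_high:.3f}")           # ~0.20
print(f"Low-risk stratum OR:          {or_low:.3f}")            # ~0.20
print(f"Adjusted OR, e^-1.60:         {math.exp(-1.60):.3f}")   # ~0.20
```

The gap between the crude OR (~0.37) and the adjusted OR (~0.20) is the evidence of confounding asked about in question 3; the near-identical stratum ORs and the interaction P value of 0.99 argue against interaction (question 1).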

Multivariable Linear Regression


1. Confusion: multivariable vs. multivariate
Multivariable means that you are simultaneously considering more than one predictor variable (independent; X), eg, Y = X1 + X2 + X3 + X4.
Multivariate usually means that you are simultaneously considering more than one outcome variable (dependent; Y), eg, Y1 + Y2 + Y3 = X1 + X2 + X3 + X4.

2. Sample Size: Rough rules of thumb
Linear regression: 10 subjects for every potential predictor variable.
Logistic regression: 10 subjects in the smaller group of the dichotomous outcome variable for every potential predictor variable.
Multivariate linear regression: 10 subjects for every potential variable, including the multiple dependent (Y) variables.
Note: this is 10 subjects for every potential predictor, not 10 for every significant predictor in the final model! (This assumes you are interested in developing a prediction model, rather than simply controlling for many potential confounders while examining a specific exposure-disease relationship. If the latter is true, then the combination of potential confounders counts as one variable.)

3. Interpretation of beta coefficients:
The importance of a beta coefficient must be interpreted in light of the units of the particular X variable. For example, the beta coefficient for height measured in yards would be 36 times larger than the coefficient for height measured in inches, because one yard is 36 inches. Despite the 36-fold difference between these two beta coefficients, their importance is identical. Therefore, it is impossible to judge a beta coefficient without knowing the units of measurement (the sketch below demonstrates the scaling).
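A minimal demonstration in Python, on simulated data (the height-weight relationship and all numbers here are hypothetical, chosen only to show the unit effect):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: weight predicted from height measured in inches
height_in = rng.uniform(60, 75, size=200)
weight = 0.9 * height_in + rng.normal(0, 4, 200)   # arbitrary true relationship

height_yd = height_in / 36                         # the same heights, in yards

# Least-squares slope for each unit of measurement (np.polyfit returns [slope, intercept])
b_inches = np.polyfit(height_in, weight, 1)[0]
b_yards = np.polyfit(height_yd, weight, 1)[0]

print(f"beta per inch: {b_inches:.3f}")
print(f"beta per yard: {b_yards:.3f}")             # ~36 times the per-inch beta
print(f"ratio:         {b_yards / b_inches:.1f}")  # ~36
```

Both fits describe exactly the same relationship; only the units of X changed.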

4. Interactions:
Always look for interactions among predictor variables. Two X variables may appear to have no relationship with the outcome variable until an interaction term is also considered in the model. Interaction terms are the best method to test for important differences among subgroups. A very bad method: testing within each subgroup separately and then declaring interaction present if one subgroup demonstrates a significant difference whereas the other subgroup shows no significant difference. Always test for interaction before trying to simplify the model (see the sketch below).
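A minimal sketch of fitting an interaction term with statsmodels in Python; the file name and column names are hypothetical. The formula interface expands `treatment * confounder` into both main effects plus their product, and the P value on the product term is the test for interaction.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical dataset; column names are illustrative only
df = pd.read_csv("trial.csv")   # columns: dead (0/1), treatment (0/1), confounder (0/1)

# 'treatment * confounder' expands to treatment + confounder + treatment:confounder;
# the P value on the 'treatment:confounder' row tests for interaction
model = smf.logit("dead ~ treatment * confounder", data=df).fit()
print(model.summary())
```

The same formula works for a linear model by replacing `smf.logit` with `smf.ols`.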

Always check for interaction before checking for confounding. (Remember: adjusting for confounding is similar to taking the average among the subgroups. Taking the average is bad if there is important interaction, ie, the effect is markedly different among subgroups.)

5. Confounding:
If the primary goal is to estimate the effect of one X variable on Y, while adjusting for possible confounding from other X variables, then see whether the beta coefficient for the main X changes when all the other Xs are added to the model. If it does, then there is some confounding. If it changes a lot, then there is a lot of confounding.

6. Test the linearity assumption:
For multivariable linear regression (single Y; multiple Xs), the assumptions are: a linear relationship between Y and the Xs; and, for all possible combinations of X, the distribution of Y is normal with a constant variance.
Eyeball test: SPSS: Graphs: Scatter: Matrix: enter all Xs and Y: look at the row in the matrix that compares Y to each of the Xs and ask yourself: Is there really a linear relationship?
Do NOT test the linearity assumption by looking at a table of correlation coefficients between Y and each of the Xs. Instead, look at the scatterplots.
Check all partial regression plots to see if they are linear: SPSS: Analyze: Regression: Linear: Plots: select "Produce all partial plots".

7. Test for collinearity (multicollinearity):
It's okay for the X variables to be correlated, but it's not okay if they are nearly identical, with correlations near 1.0 (completely redundant). Check that the tolerance is > 0.1 for each X variable (collinearity diagnostics). (A tolerance of < 0.1 is bad and means that something has to be done.)

8. Test for outliers: unusual values of Y for combinations of Xs
Do NOT plot residuals against the observed values of Y (that plot will always have a positive slope equal to 1 - R^2). Instead, plot residuals against the expected (predicted) values of Y. Cook's distance tells you how much the beta coefficients will change if a particular case (outlier) is removed. If Cook's distance is > 1, then it is a case with a particularly big influence and should be double-checked to make sure there is no measurement error.
(A sketch of the tolerance and Cook's distance checks follows.)
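A minimal sketch of both checks in Python with statsmodels; the file and column names are hypothetical. Tolerance is computed as 1/VIF, matching the SPSS collinearity diagnostic described above.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical dataset; column names are illustrative only
df = pd.read_csv("study.csv")   # columns: y, x1, x2, x3
X = sm.add_constant(df[["x1", "x2", "x3"]])

# Tolerance = 1 / VIF; flag predictors with tolerance < 0.1
for i, name in enumerate(X.columns):
    if name == "const":
        continue
    tol = 1.0 / variance_inflation_factor(X.values, i)
    print(f"{name}: tolerance = {tol:.3f}{'  <-- problem' if tol < 0.1 else ''}")

# Cook's distance: flag cases with D > 1 for double-checking
fit = sm.OLS(df["y"], X).fit()
cooks_d = fit.get_influence().cooks_distance[0]
print("Cases with Cook's distance > 1:", np.where(cooks_d > 1)[0])
```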

9. Choosing the best model (for prediction, rather than explanation):
Among the different methods (forward, backward, stepwise, best subset), backward is often the best: start with all Xs in the model, take out the most nonsignificant predictor, and repeat until only significant predictors are left. There are better ways, but a good rule of thumb is to do it several ways; if you get a different answer, then be cautious and get more help. (A sketch of backward elimination follows.)
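A minimal sketch of the backward procedure described above, in Python with statsmodels; the data file, column names, and the 0.05 cutoff are all hypothetical choices for illustration.

```python
import pandas as pd
import statsmodels.formula.api as smf

def backward_eliminate(df, outcome, predictors, alpha=0.05):
    """Drop the least significant predictor one at a time until all P values < alpha.
    A simple sketch of backward elimination, not a substitute for judgment."""
    current = list(predictors)
    while current:
        formula = f"{outcome} ~ " + " + ".join(current)
        fit = smf.ols(formula, data=df).fit()
        pvals = fit.pvalues.drop("Intercept")
        worst = pvals.idxmax()
        if pvals[worst] < alpha:
            return fit                 # all remaining predictors are significant
        current.remove(worst)          # remove the most nonsignificant predictor
    return None

# Hypothetical usage; column names are illustrative only:
# df = pd.read_csv("study.csv")
# final = backward_eliminate(df, "y", ["x1", "x2", "x3", "x4"])
```

Running forward selection or best-subset on the same data and comparing the chosen models is one way to apply the "do it several ways" advice.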

10. Multiple Linear Regression in SPSS:
SPSS: Analyze: Regression: Linear allows only continuous variables as predictors.
SPSS: Analyze: General Linear Model: Univariate allows different kinds of predictors.

Logistic Regression
1. Logistic regression models: 2 common goals (test associations vs. make predictions)
If the outcome (Y) variable is dichotomous, then logistic regression allows you to assess the association between Y and any type of X variable (nominal, ordinal, or interval), while controlling for other variables (other Xs). Logistic regression models also allow you to make predictions: for any combination of predictor (X) variables, what is the probability that Y=1?

2. Logistic equation: natural log of (odds that Y=1) = b0 + b1X1 + b2X2 + b3X3

3. Beta coefficients in the logistic model:
For each of the X variables, there will be a beta coefficient. There will also be a Y intercept term (except for case-control studies).
ln (odds Y=1) = b0 + b1X1 + b2X2 + b3X3
Odds (Y=1) = e^(b0 + b1X1 + b2X2 + b3X3)
If there are no interaction terms, then the odds ratio (OR) for the relationship between Y and any X is simply e^b, where b is the beta coefficient for that particular X. This odds ratio is adjusted for all the other Xs in the model. If X is an ordinal or interval variable, then the odds ratio (e^b) measures the relative change in odds for every one-unit change in the X variable. (A sketch of these calculations follows item 6.)

4. Interactions:
As with other regression models, you must force the computer to look for interactions. If there are two X variables in the model, then the relationship between Y and X1 (measured as an OR) is adjusted for the average value of X2. However, if the relationship between Y and X1 (OR) is different for different values of X2, then there is interaction.

5. Interaction is good to find. It means there are important differences among subgroups of patients (subgroups defined by X2).

6. Sample Size:
You should consider only one X variable for every 10 events in the smaller of the two subgroups of Y. Again, this rule applies to the total number of potential predictor variables being considered, not the final number of significant predictors. However, if the goal is just measuring the association between one main X variable and Y, while adjusting for several possible confounders, then all the potential confounders (all the other Xs) can be considered together as the equivalent of one other variable. In this situation, you would need at least 20-30 events in the smallest subgroup of Y.
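A minimal sketch in Python of turning the logistic equation into an adjusted OR and a predicted probability; the coefficient values here are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical fitted coefficients: ln(odds Y=1) = b0 + b1*X1 + b2*X2
b0, b1, b2 = -2.0, 0.7, -0.4

# Adjusted OR for X1 (no interaction terms): e^b1
print(f"OR per one-unit change in X1: {math.exp(b1):.2f}")

# Predicted probability that Y=1 for a given combination of Xs
def predicted_prob(x1, x2):
    log_odds = b0 + b1 * x1 + b2 * x2
    odds = math.exp(log_odds)
    return odds / (1 + odds)        # probability = odds / (1 + odds)

print(f"P(Y=1 | X1=2, X2=1): {predicted_prob(2, 1):.3f}")
```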

7. Ordinal or Interval Predictor Variables: do they meet the assumption of the model?
There is a linearity assumption for logistic regression models just as there is for linear regression models. The assumption is that for any change of 1 unit in the X variable, the OR will be the same (ie, the OR for X=2 compared to X=1 will be the same as the OR for X=5 compared to X=4). This is the same as saying: there is a straight-line relationship when you plot the X variable on the horizontal axis and ln (odds Y=1) on the vertical axis. If this assumption is violated, then the conclusions of the model will be misleading. Unfortunately, there is no easy test for this assumption. Ideally, you need to visually inspect the plot, which you must create yourself (a sketch follows). For dichotomous X variables, there is no problem, since this assumption is automatically satisfied.
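One way to create that plot yourself is to bin the X variable and compute the empirical log-odds of Y=1 within each bin. A minimal sketch in Python; the file and column names are hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical dataset; column names are illustrative only
df = pd.read_csv("study.csv")    # columns: y (0/1), x (ordinal or interval)

# Bin X into deciles, then compute the empirical log-odds of Y=1 within each bin
# (bins where the event rate is exactly 0 or 1 will produce infinite log-odds)
df["bin"] = pd.qcut(df["x"], q=10, duplicates="drop")
grouped = df.groupby("bin", observed=True).agg(x_mid=("x", "mean"), p=("y", "mean"))
grouped["log_odds"] = np.log(grouped["p"] / (1 - grouped["p"]))

# If the logistic linearity assumption holds, these points fall on a straight line
plt.plot(grouped["x_mid"], grouped["log_odds"], "o-")
plt.xlabel("X")
plt.ylabel("ln(odds Y=1)")
plt.show()
```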

8. Goodness of Fit Test:
Always inspect the goodness of fit test. It is a test for logistic regression models that compares the expected to the observed percentages of Y=1 for different combinations of Xs. If it is significant (small P value), that's bad: it means the model doesn't fit the data well. In that case, look for important interactions, or look for problems with ordinal or interval X variables that might not be satisfying the linearity assumption. Another reason might be too few outcome events in one of the subgroups of Y. (A sketch of one common version of this test follows.)
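The handout does not name the test, but the description matches the Hosmer-Lemeshow approach: group cases by predicted probability and compare observed to expected counts of Y=1. A minimal sketch in Python; the fitted model and data frame in the usage comment are hypothetical.

```python
import pandas as pd
from scipy.stats import chi2

def hosmer_lemeshow(y, p, groups=10):
    """Compare observed vs expected counts of Y=1 across groups of predicted risk."""
    data = pd.DataFrame({"y": y, "p": p})
    data["grp"] = pd.qcut(data["p"], q=groups, duplicates="drop")
    stat = 0.0
    for _, g in data.groupby("grp", observed=True):
        n, obs, exp = len(g), g["y"].sum(), g["p"].sum()
        p_bar = exp / n
        stat += (obs - exp) ** 2 / (n * p_bar * (1 - p_bar))
    n_groups = data["grp"].nunique()
    p_value = chi2.sf(stat, df=n_groups - 2)   # chi-square with (groups - 2) df
    return stat, p_value

# Hypothetical usage, given a fitted statsmodels logistic model `fit` on data frame `df`:
# stat, p = hosmer_lemeshow(df["dead"], fit.predict(df))
# A small P value means the model does not fit the data well.
```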
