Goals
Understand the issue of confounding in statistical analysis Learn how to use matching and logistic regression to control for confounding
Confounding
Mostly members of the same dinner club BUT many club members also went to a city-wide food festival Food handling practices in the dinner club might be blamed for the outbreak when food eaten at the festival was the cause Membership in the dinner club could be a confounder of the relationship between attendance at the food festival and illness Analyzing the data to account for both dinner club membership and food festival attendance could help determine which event was truly associated with the outcome
Confounding
Stratification methods could be used to calculate the risk of illness due to the food festival for those in the dinner club vs. those not in the dinner club If attending the food festival was a significant risk factor for illness in both groups, then the festival would be implicated because illness occurred whether or not people were members of the dinner club
Confounding
What if there are multiple factors that might be confounding the exposure-disease relationship?
Using our previous example, what if we had to stratify by membership in the dinner club and by health status? Or stratify by other potential confounders (age, occupation, income, etc.)? Trying to stratify by all of these layers becomes difficult Logistic regression controls for many potential confounders at one time Matching when incorporated correctly into the study design, reduces confounding before analysis begins
Confounding Confounders
Risk ratio (RR) in cohort studies Odds ratio (OR) in case-control studies
May have multiple exposures significantly associated with disease or no exposures associated
In these cases you need to explore whether a confounder is present making it appear that exposures are associated with the disease (when they really are not) or making it appear that no association exists (when there really is one)
Confounders
A confounder is a variable that distorts the risk ratio or odds ratio of an exposure leading to an outcome
Confounding is a form of bias that can result in a distortion in the measure of association between an exposure and disease Confounding must be eliminated for accurate results (1)
Confounding can occur in an observational epidemiologic study whenever two groups are compared to each other
Confounding is a mixing of effects when the groups are compared (exposure-disease relationship can be affected by factors other than the relationship)
Common Confounders
Children born later in the birth order are more likely to have Downs syndrome.
Does birth order cause Downs syndrome? Norelationship is confounded by mothers age, older women are more likely to have children with Downs Mothers age confounds the association between birth order and Downs syndrome: appears there is an association when there is not (2)
Common Confounders--Examples
Womens use of hormone replacement therapy (HRT) and risk of cardiovascular disease
Some studies suggest an association, others do not Women of higher socio-economic status (SES) are more likely to be able to afford HRT Women of lower SES are at higher risk of cardiovascular disease Differences in SES may thus confound the relationship between HRT and cardiovascular disease Need to control for SES among study participants (3)
Common Confounders--Examples
Study shows women were at much greater risk of the disease than men Association is confounded by eating salad women were much more likely to order salad than men Salad was contaminated with disease-causing agent Relationship between gender and disease was confounded by salad consumption (which was the true cause of the outbreak)
Characteristics of Confounders
A confounder must be associated with the disease being studied A confounder must be associated with the exposure being studied
To control for confounding you must take the confounding variable out of the picture There are 3 ways to do this:
Restrict the analysisanalyze the exposure-disease relationship only among those at one level of the confounding variable
Stratifyanalyze the exposure-disease relationship separately for all levels of the confounding variable
Example: look at the relationship between HRT and cardiovascular disease ONLY among women of high SES Example: look at the relationship between HRT and cardiovascular disease separately among women of high SES and low SES
Conduct logistic regressionregression puts all the variables into a mathematical model
Stratification can be used to separate the effects of exposures and confounders Example: tuberculosis (TB) outbreak among homeless men
Homeless shelter and soup kitchen implicated as the place of transmission Men likely to spend time in both places To determine which site is most likely, could examine the association between the homeless shelter and TB among men who did NOT go to the soup kitchen and among men who DID go to the soup kitchen
Stratification--Example
Suspicion that one food item is confounding the other Cannot tease out the effects without stratifying because many people consumed both cookies and punch
Stratification--Example
Stratification--Example
Data continued..
Punch Exposure
Cases Punch No Punch Total
Controls 40 10 20 30
Total 60 40 100
Stratification--Example
Both cookies and punch have a high odds ratio for illness & a confidence interval that does not include 1
OR (cookies) = 3.93; 95% CI, 1.69 9.15, p= 0.001* OR (punch) = 6.00; 95% CI, 2.83 12.71, p= 0.0004*
Among those who did not drink punch, what is the odds ratio for the association between cookies and illness? Among those who did drink punch, what is the odds ratio for the association between cookies and illness? If cookies are the culprit, there should be an association between cookies and illness, regardless of whether anyone drank punch
Stratification--Example
Cookies
No Cookies Total
35
5
17
3
52
8 60
Stratification--Example
Stratification--Example
Among those who did not eat cookies, what is the odds ratio for the association between punch and illness? Among those who did eat cookies, what is the odds ratio for the association between punch and illness? If punch is the culprit, there should be an association between punch and illness, regardless of whether anyone ate cookies
Stratification--Example
Stratification--Example
Stratification
Stratification allows us to examine two risk factors independently of each other In our cookies and punch example we can see that cookies were not really a risk factor independent of punch (stratified ORs 1) Punch remained a potential risk factor independent of cookies (large ORs and pvalues close to significant)
More on Stratification
Method of controlling for confounding using stratified analysis Takes an association, stratifies it by a potential confounder and then combines these by averaging them into one estimate that is controlled for the stratifying variable
2 stratum-specific estimates of the association between punch and illness (ORs of 4.1 and 5.4) More convenient to have only one estimatecan average two estimates into a pooled or common odds ratio
Example: if gender is an effect measure modifier, you should give 2 odds or risk ratios, 1 for men and 1 for women You identify effect measure modification by stratification (same technique used to identify confounding) but you are looking for the measure of effect to be different between the 2 or more strata
One stratum shows no association (OR 1) while another stratum does have an association No confounding third variable present, rather, need to identify and present estimates separately for each level or stratum
Among the elderly, gender is an effect modifier of the association between nutritional intake and osteoporosis
In developing countries, sanitation is an effect modifier of the association between breastfeeding and infant mortality
Nutritional intake (calcium) is associated with osteoporosis among women Among men this association is not so strong because mens bone mineral content is not affected as much by nutritional intake
In unsanitary conditions, breastfeeding has a strong effect in reducing infant mortality In cleaner conditions infant mortality is not very different between breastfed and bottle-fed infants
Matching
In case-control studies cases are matched to controls on desired characteristics In cohort studies unexposed persons are matched to exposed persons on desired characteristics
You must account for matching when analyzing matched data Important that the matched variables not be exposures of interest
Matching--Example
Hypothetical study where students in a high school have reported a strange smell and sudden illness
Test the association between smelling an unusual odor and a set of symptoms Match cases and controls on gender, grade and hallway
Precedents for outbreaks of illness related to unusual odors in buildings, possibly psychogenic (ie. illness spread by panic rather than true cause) Women are more reactive in this situation, grade level controls for age (different ages may react differently) and matching on hallway controls for actual odor observed (different locations may produce different odors)
Matching--Example
With matched case-control pairs, a 2x2 table is set up to examine pairs Table 1: Analysis of matched pairs for a case control study
Controls Exposed Cases Exposed Not Exposed Total e g e+g Not Exposed f h f+h Total e+f g+h
Cells e and h are concordant cells because the case and the control have the same exposure status Cells f and g are discordant because the case and control have a different exposure status Only the discordant cells give us useful data to contrast the exposure between cases and controls
Matching--Example
A chi-square for matched data (McNemars chi-square) can be calculated using a statistical computing program
Calculation examines discordant pairs and results in a McNemar chi-square value and p-value If the p-value <0.05, you can conclude that there is a statistically significant difference in exposure between cases and controls
Matching--Example
Cases
Matching--Example
Interpretation:
The odds of having a sudden onset of nausea, vomiting, or fainting if students smelled an unusual odor in the school were 3.0 times the odds of having a sudden onset of these symptoms if students did not smell an unusual odor in the school, controlling for gender, grade, and location in the school.
Matching
Once you have matched on a variable, you cannot use that variable as a risk factor in your analysis Cases and controls will have the exact same matched variables so they are useless as risk factors Do not match on any variable you suspect might be a risk factor
Logistic regression is a mathematical process that results in an odds ratio Logistic regression can control for numerous confounders The odds ratio produced by logistic regression is known as the adjusted odds ratio because its value has been adjusted for the confounders
Outcome variable (sick or not sick) and exposure variable (exposed or not exposed) must both be dichotomous Other variables (the confounders) can be dichotomous, categorical, or continuous
Logistic regression uses an equation called a logit function to calculate the odds ratio Using our earlier punch and cookies example, we suspect one of these food items is confounding the other Variables would be:
SICK (value is 1 if ill, 0 if not ill) PUNCH (1 if drank punch, 0 if did not drink punch) COOKIES (1 if ate cookies, 0 if did not eat cookies)
Logistic Regression--Example
Outcome = variable SICK Exposure = variable PUNCH Confounder = variable COOKIES Equation is: Logit (SICK) = PUNCH + COOKIES
Logistic Regression--Example
Computer uses the math behind logistic regression to give the results as odds ratios Each variable on the right side will have its own odds ratio
Odds ratio for PUNCH would be the odds of becoming ill if punch was consumed compared to the odds of becoming ill if punch was not consumed, controlling for COOKIES Odds ratio for COOKIES is the odds of becoming ill if cookies were consumed compared to the odds of becoming ill if cookies were not consumed, controlling for PUNCH
Each variable on the right side of the equation is controlling for all the other variables on the right side of the equation
If you are not sure whether one of several variables is a confounder, you can examine them all at the same time
Do not put too many variables in the equation (a loose rule of thumb is you can add one variable for every 25 observations) You cannot control for confounders you did not measure (Example: if a childs attendance at a particular daycare was a confounder of the SICK-PUNCH relationship, but you do not have data on childrens daycare attendance, you cannot control for it.)
Logistic regression can also account for matching in the data analysis
Known as conditional logistic regression Computer calculates odds ratios similar to McNemars test but the results are conditioned on the matching variables Can be done using Epi Info Interpretation of matched odds ratios (MORs) using conditional logistic regression is the same as interpretation of MORs calculated from tables
Logistic Regression
For many investigations you may not need to use logistic regression Logistic regression is helpful in managing confounding variables, useful with large datasets and in studies designed to establish risk factors for chronic conditions, cancer cluster investigations or other situations with numerous confounding factors Many software packages can simplify data analysis using logistic regression
Common software packages used for data analysis, including logistic regression*
SAS Cary, NC http://www.sas.com/index.html SPSS Chicago, IL http://www.spss.com/ STATA College Station, TX http://www.stata.com Epi Info Atlanta, GA http://www.cdc.gov/EpiInfo/ Episheet Boston, MA http://members.aol.com/krothman/modepi.htm (Episheet cannot do logistic regression but is useful for simpler analyses, e.g., 2x2 tables and stratified analyses.)
*This is not a comprehensive list, and UNC does not specifically
Logistic Regression--Examples
(4)
Guests complained of a diarrheal illness diagnosed as cyclosporiasis Univariate analysis (using 2x2 tables) showed eating raspberries was the exposure most strongly associated with risk for illness Multivariate logistic regression showed same results Investigators determined raspberries had not been washed
Logistic Regression--Examples
Assessing the relationship between obesity and concern about food security (5)
Washington State Dept. of Health analyzed data from the 1995-99 Behavioral Risk Factor Surveillance System A variable indicating concern about food security was analyzed using a logistic regression model with income and education as potential confounders Persons who reported being concerned about food security were more likely to be obese than those who did not report such concerns (adjusted OR = 1.29, 95% CI: 1.04-1.83)
(6)
Affected 47 people from 5 different states Case-control study carried out, controls matched by agegroup Logistic regression conducted to control for confounders Cases were more likely than controls to have eaten ground beef (MOR = 2.3, 95% CI: 0.9-5.7) and more likely to have eaten raw or undercooked ground beef (MOR = 50.9, 95% CI: 5.3-489.0) No specific contamination event identified but public health alert issued to remind consumers about safe food-handling practices
10,000 people affected in outbreak, case-control study conducted Cases were culture positive for the organism (Salmonella serotype Typhi) Using 2x2 tables, illness was associated with:
When all variables were included in conditional logistic regression, only drinking unboiled water (MOR = 9.6, 95% CI: 2.7-334.0) and obtaining water from an outside tap (MOR = 16.7, 95% CI: 2.0138.0) were significantly associated with illness
Drinking unboiled water in the 30 days before onset (MOR = 6.5, 95% CI: 3.0-24.0) Using drinking water from a tap outside the home (MOR = 9.1, 95% CI: 1.6-82.0) Eating food from a street vendor (MOR = 2.9, 95% CI: 1.4-7.2)
Routinely boiling drinking water was protective (MOR = 0.2, 95% CI: 0.05-0.5)
Conclusion
Controlling for confounding can be done using matched study design and logistic regression While complicated, with practice these methods can be as easy to use as 2x2 tables
References
1. 2. Gregg MB. Field Epidemiology. 2nd ed. New York, NY: Oxford University Press; 2002. Hecht CA, Hook EB. Rates of Down syndrome at livebirth by one-year maternal age intervals in studies with apparent close to complete ascertainment in populations of European origin: a proposed revised rate schedule for use in genetic and prenatal screening. Am J Med Genet. 1996;62:376-385. Humphrey LL, Nelson HD, Chan BKS, Nygren P, Allan J, Teutsch S. Relationship between hormone replacement therapy, socioeconomic status, and coronary heart disease. JAMA. 2003;289:45. Centers for Disease Control and Prevention. Update: Outbreaks of Cyclosporiasis -- United States, 1997. MMWR Morb Mort Wkly Rep. 1997;46:461-462. Available at: http://www.cdc.gov/mmwr/PDF/ wk/mm4621.pdf. Accessed December 12, 2006. Centers for Disease Control and Prevention. Self-reported concern about food security associated with obesity --- Washington, 1995 1999. MMWR Morb Mort Wkly Rep. 2003;52:840-842. Available at: http://www.cdc.gov/mmwr/preview/mmwrhtml/mm5235a3.htm. Accessed December 12, 2006.
3.
4.
5.