Study designs: Cross-sectional studies, ecologic studies (and confidence intervals) Victor J. Schoenbach, PhD Department of Epidemiology Gillings School of Global Public Health University of North Carolina at Chapel Hill www.unc.edu/epid600/
Principles of Epidemiology for Public Health (EPID600)
2 Signs from around the world In a Copenhagen airline ticket office: We take your bags and send them in all directions. 3 Signs from around the world In a Norwegian cocktail lounge: Ladies are requested not to have children in the bar. 4 Signs from around the world Rome laundry: Ladies, leave your clothes here and spend the afternoon having a good time. 5 Faster keyboarding - 1 I cdnuolt blveiee taht I cluod aulaclty uesdnatnrd waht I was rdanieg. The phaonmneal pweor of the hmuan mnid, aoccdrnig to a rscheearch at Cmabrigde Uinervtisy. It dn'seot mttaer in waht oredr the ltteers in a wrod are, the olny iprmoatnt tihng is taht the frist and lsat ltteer be in the rghit pclae. The rset can be a taotl mses and you can sitll raed it wouthit a porbelm. Gary C. Ramseyer's First Internet Gallery of Statistics Jokes http://davidmlane.com/hyperstat/humorf.html (#162)
6 Faster keyboarding - 2 Most of my friends could read this with understanding and rather quickly I might add. Then I had them read a statistical bit of literature: Miittluvraae asilyans sattes an idtenossiy ctuoonr epilsle is the itternoiecsno of a panle pleralal to the xl-yapne and the sruacfe of a btiiarave nmarol dbttiisruein. Gary C. Ramseyer's First Internet Gallery of Statistics Jokes http://davidmlane.com/hyperstat/humorf.html (#162)
2/22/2011 Cross-sectional studies 7 Study designs: Cross-sectional studies, ecologic studies (and confidence intervals) Victor J. Schoenbach, PhD home page Department of Epidemiology Gillings School of Global Public Health University of North Carolina at Chapel Hill www.unc.edu/epid600/
Principles of Epidemiology for Public Health (EPID600)
10/15/2001 Cross-sectional studies 8
Today: outline
Cross-sectional studies (and sampling)
Ecologic studies
Confidence intervals

2/10/2009 Cross-sectional studies 9
Cross-sectional studies
Cross-sectional studies include surveys
People are studied at a point in time, without follow-up.
Can combine a cross-sectional study with follow-up to create a cohort study.
Can conduct repeated cross-sectional studies to measure change in a population.
2/22/2011 Cross-sectional studies 10
Cross-sectional studies
Number of uninsured Americans rises to 50.7 million. (USA Today, 9/17/2010; data from Census Bureau)
In 2007-2008, almost one in five children older than 5 years was obese. (Health, United States, 2010; data from the National Health and Nutrition Examination Survey)
35% (~7.4 million) of births to U.S. women during the preceding 5 years were mistimed or unwanted. (2002 National Survey of Family Growth, Series 23, No. 25, Table 21)
[Source: www.cdc.gov/nchs/]

2/10/2009 Cross-sectional studies 11
Cross-sectional studies
Incidence information is not available from a typical cross-sectional study
Sometimes can reconstruct incidence from historical information
Example: the incidence proportion of quitting smoking, called the quit ratio (ex-smokers / ever-smokers), is calculated from survey data.

10/15/2001 Cross-sectional studies 12
Measure prevalence at point in time
Snapshot of a population, a still life
Can measure attitudes, beliefs, behaviors, personal or family history, genetic factors, existing or past health conditions, or anything else that does not require follow-up to assess.
The source of most of what we know about the population

2/22/2011 Cross-sectional studies 13
Population census
A cross-sectional study of an entire population
Provides the denominator data for many purposes (e.g., estimation of rates, assessing generalizability, projecting from smaller studies)
A huge effort: people can be difficult to find and to count; may not want to provide data
Some countries maintain accurate and current registries of the entire country

2/22/2011 Cross-sectional studies 14
National surveys conducted by NCHS
National Health Interview Survey (NHIS): household interviews
National Health and Nutrition Examination Survey (NHANES): interviews and physical examinations
National Survey of Family Growth (NSFG): household interviews
National Health Care Survey (NHCS): medical records

2/22/2011 Cross-sectional studies 15
National surveys
Designed to be representative of the entire country
Modes: household interview, telephone, mail
Employ complex sampling designs to optimize efficiency (tradeoff between information and cost)
Logistically challenging (answering machines, cellphones, . . .)
See presentation by Dr. Anjani Chandra at www.minority.unc.edu/institute/2003/materials/slides/Chandra-20030522.ppt

10/15/2001 Cross-sectional studies 16
Example: National Health Interview Survey
Conducted every year in U.S. by National Center for Health Statistics (CDC)
Stratified, multistaged, household survey that covers the civilian noninstitutionalized population of the United States
Redesigned every decade to use new census
2/10/2009 Cross-sectional studies 17
"multistaged"
Improves logistical feasibility and reduces costs (though reduces precision)
1. Divide population into primary sampling units (PSUs)
PSU = primary sampling unit: metropolitan statistical area, county, group of adjacent counties

2/10/2009 Cross-sectional studies 18
"multistaged"
2. Select sample of census block groups (SSUs) within each selected PSU
3. Map each selected census block group or examine building permits
4. Select one cluster of 4-8 housing units dispersed evenly throughout the block
NCHS draws a new representative sample for each week's interviews

10/15/2001 Cross-sectional studies 19
"stratified"
US divided into 1,900 PSUs
Largest 52 PSUs are self-representing
Rest of PSUs divided into 73 categories (strata), based on socioeconomic and demographic variables
Sampling takes place separately within each category (stratum)
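The stratified selection of PSUs described above can be sketched in a few lines of Python. This is only an illustration, not the NCHS procedure: the function name `stratified_psu_sample`, the toy sampling frame, and the choice of a fixed number of PSUs per stratum are all assumptions made for the example.

```python
import random


def stratified_psu_sample(psus, n_per_stratum, seed=0):
    """Select PSUs stratum by stratum (hypothetical helper).

    psus: list of (psu_id, stratum) pairs; stratum is None for a
          self-representing PSU, which is always included.
    n_per_stratum: number of PSUs to draw at random within each stratum.
    """
    rng = random.Random(seed)

    # Self-representing PSUs enter the sample with certainty.
    selected = [psu_id for psu_id, stratum in psus if stratum is None]

    # Group the remaining PSUs by stratum.
    by_stratum = {}
    for psu_id, stratum in psus:
        if stratum is not None:
            by_stratum.setdefault(stratum, []).append(psu_id)

    # Sampling takes place separately within each stratum.
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        k = min(n_per_stratum, len(members))
        selected.extend(rng.sample(members, k))
    return selected


# Toy frame (made-up labels): 3 self-representing PSUs and 3 strata of 4 PSUs.
frame = [("metro-1", None), ("metro-2", None), ("metro-3", None)]
for stratum in ("A", "B", "C"):
    for i in range(1, 5):
        frame.append((f"{stratum}-{i}", stratum))

sample = stratified_psu_sample(frame, n_per_stratum=2)
print(sample)
```

Drawing within each stratum guarantees that every stratum is represented, which a single simple random sample of PSUs would not.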
7/30/2010 Cross-sectional studies 20
Sample size and Precision

Sample size   Lower 95%   Point estimate   Upper 95%   Width
        100        0.17             0.25        0.33    0.16
        400        0.21             0.25        0.29    0.08
        900        0.22             0.25        0.28    0.06
       1600        0.23             0.25        0.27    0.04

Working values: p = 0.25; p(1 - p) = 0.188; sqrt(p(1 - p)) = 0.43301

3/6/2006 Cross-sectional studies 21
Weighted sampling (hypothetical)

Age group   Pop (1,000's)   Unweighted sample   Weighted sample
20-39 yrs          18,000                 900               400
40-59 yrs          18,000                 900               400
60-69 yrs           8,000                 400               400
Total              44,000               2,200             1,200

10/15/2001 Cross-sectional studies 22
"stratified"
Also place census blocks into categories and sample within each
Oversample some strata
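The sample size and precision table for a point estimate of 0.25 can be recomputed with a short script. This sketch assumes the limits come from the usual normal approximation, p ± 1.96 × sqrt(p(1 − p)/n); the slide does not name the method, and the function name `wald_interval` is introduced here only for illustration.

```python
from math import sqrt


def wald_interval(p, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion p from n subjects."""
    se = sqrt(p * (1 - p) / n)  # standard error of the proportion
    return p - z * se, p + z * se


p = 0.25
rows = []
for n in (100, 400, 900, 1600):
    lower, upper = wald_interval(p, n)
    rows.append((n, lower, upper, upper - lower))

print(f"{'n':>6} {'lower':>7} {'upper':>7} {'width':>7}")
for n, lower, upper, width in rows:
    print(f"{n:>6} {lower:>7.2f} {upper:>7.2f} {width:>7.2f}")
```

Quadrupling the sample size halves the width, because the standard error shrinks with the square root of n. A width computed from unrounded limits can differ in the last digit from one computed from limits already rounded to two decimals.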
2/10/2009 Cross-sectional studies 23
Defined population
Studies, especially cross-sectional studies, are easiest to interpret when they are based in a population that has some existence apart from the study itself (defined population)
1. Political subdivision (city, county, state)
2. Institutional (HMO, employer, profession)
Probability sampling enables statistical generalizability to the defined population

2/22/2011 Cross-sectional studies 24
Surveys of sentinel populations
HIV seroprevalence survey in three county STD clinics in central NC in 1988
3,000 anonymous, unlinked, leftover sera
Anonymous questionnaire for demographics and risk factors
[Schoenbach VJ, Landis SE, Weber DJ, Mittal M, Koch GG, Levine PH. HIV seroprevalence in sexually transmitted disease clients in a low-prevalence southern state. Ann Epidemiol 1993;3:281-288]

10/15/2001 Cross-sectional studies 25
HIV seroprevalence [Schoenbach et al., Ann Epidemiol 1993;3:281-288]

Group              % HIV+
Homosexual men         46
Bisexual men           25
Heterosexual men      1.6
Women                 0.6
Total                 2.5
10/14/2003 Cross-sectional studies 26
Seroprevalence (% HIV+) by risk factors [Schoenbach et al., Ann Epidemiol 1993;3:281-288]

Characteristic               Gay   Hetero   Women
Syphilis (history/current)    53      9.0       3
Gonorrhea (history)           37      2.6       1
Anal intercourse              41      1.7       2

Paid for sex: 5.2

10/15/2001 Cross-sectional studies 27
Interpretation
Measures prevalence: if incidence is our real interest, prevalence is often not a good surrogate measure
Studies only survivors and stayers
May be difficult to determine whether a cause came before an effect (exception: genetic factors)

10/15/2001 Cross-sectional studies 28
Other points
Can choose by exposure or overall
Can choose by disease: may not be distinguishable from a case-control study with prevalent cases

10/15/2001 Cross-sectional studies 29
Outline
Cross-sectional studies (and sampling)
Ecologic studies
Confidence intervals

10/15/2001 Cross-sectional studies 30
Ecologic studies
Most study designs (cross-sectional, case-control, cohort, intervention trials) can be carried out with individuals or with groups
Group-level studies which use routinely collected data are easier and less costly
Group-level studies that involve interventions may not be easier or less costly

3/6/2006 Cross-sectional studies 31
Types of group-level variables
Summary of individual-level variable (e.g., median household income, % with high school diploma)
Property of the aggregate (e.g., neighborhood grocery stores, seat belt legislation, community competence)

2/22/2011 Cross-sectional studies 32
Interpretation
Link between summary exposure variable and individual-level outcome must be inferred
Inference from group to individual is not always sound
Example: Male Circumcision and HIV
Source: Bongaarts J, et al. The relationship between male circumcision and HIV infection in African populations. AIDS 1989; 3(6): 373-7.
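Why inference from group to individual is not always sound can be illustrated with made-up numbers. In the sketch below, every count is hypothetical and unrelated to the circumcision data: three groups are constructed so that exposure prevalence and outcome prevalence rise together perfectly across groups, yet within each group exposed and unexposed individuals have identical risk.

```python
from math import sqrt

# Hypothetical counts per group:
# (exposed cases, exposed total, unexposed cases, unexposed total)
groups = {
    "Group 1": (2, 20, 8, 80),
    "Group 2": (10, 50, 10, 50),
    "Group 3": (24, 80, 6, 20),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Group-level (ecologic) view: one summary point per group.
exposure_prev = [en / (en + un) for _, en, _, un in groups.values()]
outcome_prev = [(ec + uc) / (en + un) for ec, en, uc, un in groups.values()]
group_r = pearson_r(exposure_prev, outcome_prev)

# Individual-level view: risk ratio (exposed vs. unexposed) within each group.
within_group_rr = {
    name: (ec / en) / (uc / un) for name, (ec, en, uc, un) in groups.items()
}

print("group-level correlation:", round(group_r, 3))
for name, rr in within_group_rr.items():
    print(name, "within-group risk ratio:", round(rr, 2))
```

Here the group-level correlation is as strong as it can be, while the within-group risk ratios show no individual-level association at all; something else that differs between the groups drives the outcome.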
2/22/2011 Cross-sectional studies 33
(Slope indicates strength of relationship; r indicates linearity)

10/15/2001 Cross-sectional studies 34
Outline
Cross-sectional studies (and sampling)
Ecologic studies
Confidence intervals

3/8/2006 Cross-sectional studies 35
Confidence intervals
Provide a plausible range for the quantity being estimated
Width indicates the precision of an estimate for a given level of confidence
Confidence intervals quantify only random error from sampling variation, not systematic error from nonresponse, study design, etc.

10/15/2001 Cross-sectional studies 36
Confidence level vs. precision
The more vague my estimate, the more confident I can be that it includes the population parameter:
  "I am 100% confident that the prevalence of HIV is between 0 and 100%."
The more specific my estimate, the lower my confidence:
  "I am 0% confident that the prevalence of HIV is 5.23%"

10/12/2004 Cross-sectional studies 37
Confidence intervals: interpretation
Simple interpretations are typically not precise
Precise interpretations are typically not simple

10/15/2001 Cross-sectional studies 38
Simple but imprecise
"There is 95% confidence that the interval contains the true value"
True, but begs the question of how to define "confidence"

10/15/2001 Cross-sectional studies 39
Simple but imprecise
"There is a 95% probability that the interval contains the true value"
Not quite correct: probability (as conventionally defined) applies to a process, not to a single instance

3/7/2006 Cross-sectional studies 40
Probability applies to a process: example
A 95% confidence interval can be viewed as a measurement or estimation process that will be correct (the interval includes the true value of the parameter) 95% of the time and incorrect 5% of the time.
Let us make up another estimation process that will be correct (about) 95% of the time.

6/29/2002 Cross-sectional studies 41
Why probability applies to a process
Estimate your gender by flipping a coin 5 times: if the result is 5 heads, estimate your gender to be its opposite; otherwise estimate your gender to be what you think it is now.
Probability that estimate will be correct is (1 − Probability of 5 heads) = 1 − (1/2)^5 ≈ 0.97 = 97%
Probability that estimate will be incorrect is 3%

6/29/2002 Cross-sectional studies 42
Why probability applies to a process
So we now have a measurement process that will be correct 97% of the time.
We will use it to measure your gender.
Flip the coin 5 times, and suppose you get 5 heads
Is there a 97% probability that you are of the opposite sex?

2/22/2011 Cross-sectional studies 43
Precise but not simple
A 95% confidence interval is:
1. obtained by using a procedure that will include the population parameter being estimated 95% of the time
2. the set of all population values which are likely to yield a sample like the one we obtained

10/15/2001 Cross-sectional studies 44
Suppose that this line represents the value of the parameter we are trying to estimate
[Figure: line marking the value; label: True value]

10/15/2001 Cross-sectional studies 45
Possible estimates of that parameter in N identical studies (shows sampling variation)
[Figure: dot plot; labels: Study estimates, True value]

10/15/2001 Cross-sectional studies 46
One possible true value and how it would manifest, on average, in N identical studies
[Figure: dot plot; labels: 95% of the distribution, True value]

10/15/2001 Cross-sectional studies 47
Estimate from one study of a given size
[Figure: label: Estimate ?]

10/14/2003 Cross-sectional studies 48
A possible true value with < 2.5% chance of being observed at or beyond the estimate
[Figure: dot plot; labels: 95% of the distribution, Estimate ?]

10/15/2001 Cross-sectional studies 49
A possible true value with > 2.5% probability of being observed at or beyond the estimate
[Figure: dot plot; labels: 95% of the distribution, Estimate ?]

10/15/2001 Cross-sectional studies 50
A possible true value with > 2.5% probability of being observed at or beyond the estimate
[Figure: dot plot; labels: 95% of the distribution, Estimate ?]

10/15/2001 Cross-sectional studies 51
A possible true value with < 2.5% probability of being observed at or beyond the estimate
[Figure: dot plot; labels: 95% of the distribution, Estimate ?]
10/14/2003 Cross-sectional studies 52
What the confidence interval represents
[Figure: several overlapping dot plots; labels: 95% confidence interval, ?]

10/15/2001 Cross-sectional studies 53
What the confidence interval represents
[Figure: many overlapping dot plots; label: 95% confidence interval]

3/8/2006 Cross-sectional studies 54
One possible true value and how it would manifest, on average, in N identical studies
[Figure: dot plot; labels: 1.96 x s.e. | 1.96 x s.e., True value]

10/15/2001 Cross-sectional studies 55
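The idea that a 95% confidence interval is a procedure that includes the population parameter 95% of the time can be checked by simulating many identical studies. The sketch below is an illustration under stated assumptions: the true prevalence (0.25), study size (400), number of studies (2,000), and the normal-approximation interval are all choices made for this example, and `coverage` is a hypothetical helper name.

```python
import random
from math import sqrt


def coverage(true_p=0.25, n=400, n_studies=2000, z=1.96, seed=1):
    """Fraction of simulated studies whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # One study: n independent subjects, each a case with probability true_p.
        cases = sum(rng.random() < true_p for _ in range(n))
        p_hat = cases / n
        se = sqrt(p_hat * (1 - p_hat) / n)
        # Does this study's interval include the true value?
        if p_hat - z * se <= true_p <= p_hat + z * se:
            hits += 1
    return hits / n_studies


observed_coverage = coverage()
print(f"Intervals containing the true value: {observed_coverage:.1%}")
```

Each individual interval either contains the true value or does not; the 95% describes how often the procedure succeeds over many repetitions, which is the sense in which probability applies to the process.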
Confidence intervals: another take

10/15/2001 Cross-sectional studies 56
One possible population
[Figure: population diagram]

10/15/2001 Cross-sectional studies 57
Another possible population
[Figure: population diagram]

10/15/2001 Cross-sectional studies 58
A 3rd possible population
[Figure: population diagram]

10/15/2001 Cross-sectional studies 59
A 4th possible population
[Figure: population diagram]

10/15/2001 Cross-sectional studies 60
A 5th possible population
[Figure: population diagram]

10/15/2001 Cross-sectional studies 61
A 6th possible population
[Figure: population diagram]

10/15/2001 Cross-sectional studies 62
etc.
[Figure: population diagram]

10/15/2001 Cross-sectional studies 63
There are 1.6 x 10^60 possible populations (no cases to all cases)
[Figure: population diagram]

10/15/2001 Cross-sectional studies 64
Suppose this is the population (prevalence = 15%)
[Figure: population diagram with cases marked]

10/15/2001 Cross-sectional studies 65
Take a sample (n=10)
[Figure: population diagram with cases marked]

10/15/2001 Cross-sectional studies 66
The sample
[Figure: sample containing 2 cases]

10/15/2001 Cross-sectional studies 67
Make point estimate of prevalence
[Figure: sample containing 2 cases]

6/29/2005 Cross-sectional studies 68
Interval estimate
What are all the possible populations that would be expected to yield this prevalence in a sample of size 10?

10/15/2001 Cross-sectional studies 69
This one is not possible
[Figure: population with 1 case]

3/8/2006 Cross-sectional studies 70
Possible, but VERY UNLIKELY
[Figure: population with 2 cases]

3/8/2006 Cross-sectional studies 71
Not quite 2.5% probability (2.1%, in fact)
[Figure: population with 5 cases]

3/8/2006 Cross-sectional studies 72
Yields just about 2.5% (3%, actually) probability of selecting 2 (or more) cases in 10
[Figure: population with 6 cases]

3/8/2006 Cross-sectional studies 73
One possible true value and how it would manifest, on average, in N identical studies
[Figure: dot plot; labels: 95% of the distribution, True value]

3/8/2006 Cross-sectional studies 74
Just above 2.5% (actually 2.6%) probability of selecting 2 (or fewer) cases in 10
[Figure: population in which most members are cases]

3/8/2006 Cross-sectional studies 75
Just below 2.5% (actually 2.4%) probability of selecting 2 (or fewer) cases in 10
[Figure: population in which most members are cases]

3/8/2006 Cross-sectional studies 76
Interval estimate for 2/10
Lower bound: 2.5% (5 cases)
Upper bound: 55% (110 cases)
Meaning: Our sample of 10 with 2 cases provides evidence to exclude, at conventional error tolerance, populations with fewer than 5 cases or more than 110 cases. Populations with 5-110 cases cannot be excluded as likely sources for this sample.

3/8/2006 Cross-sectional studies 77
Interval estimate for 2/10
Actual population prevalence was 15%, which in fact is between 2.5% and 55%.
2.5% to 55% is a very wide interval, i.e., a very imprecise estimate
To make it more precise, we need a larger sample
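The "possible populations" reasoning behind the interval estimate for 2/10 can be reproduced by direct calculation. The sketch below rests on two assumptions that the slides imply but do not state: the population has 200 members (so that 15% is 30 cases, 5 cases is 2.5%, and 2^200 is about 1.6 x 10^60 possible populations), and the sample of 10 is drawn without replacement, which makes the number of cases in the sample hypergeometric. The function names are introduced here for illustration.

```python
from math import comb

N_POP = 200      # assumed population size (inferred, not stated on the slides)
N_SAMPLE = 10    # sample size
OBSERVED = 2     # cases observed in the sample


def prob_exactly(k, cases):
    """P(sample contains exactly k cases | population contains `cases` cases)."""
    return comb(cases, k) * comb(N_POP - cases, N_SAMPLE - k) / comb(N_POP, N_SAMPLE)


def prob_at_least(k, cases):
    """P(sample contains k or more cases)."""
    return sum(prob_exactly(j, cases) for j in range(k, N_SAMPLE + 1))


def prob_at_most(k, cases):
    """P(sample contains k or fewer cases)."""
    return sum(prob_exactly(j, cases) for j in range(0, k + 1))


print(f"Possible populations: {2 ** N_POP:.1e}")
for cases in (1, 2, 5, 6):
    print(f"{cases:>3} cases: P(2 or more in 10) = {prob_at_least(OBSERVED, cases):.3f}")

# Populations not excluded when each tail is held to a strict 2.5% cutoff.
plausible = [
    cases
    for cases in range(N_POP + 1)
    if prob_at_least(OBSERVED, cases) >= 0.025
    and prob_at_most(OBSERVED, cases) >= 0.025
]
print("Plausible case counts:", min(plausible), "to", max(plausible))
```

Under these assumptions the tail probabilities for populations with 5 and 6 cases come out near 2.1% and 3.0%, in line with the figures quoted on the slides. With a strict cutoff, the computed limits may differ by a case or so from the rounded bounds given on the slides.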
78 Signs from around the world Germany A sign posted in Germany's Black Forest: It is strictly forbidden on our black forest camping site that people of different sex, for instance, men and women, live together in one tent unless they are married with each other for that purpose. 79 Signs from around the world Finland On the faucet in a Finnish washroom: To stop the drip, turn cock to right.