
Sampling and Sample Size Computation

Grace H. Encelan-Brizuela, MD, MSPH July 29, 2010

SAMPLING METHODS IN RESEARCH


INTRODUCTION

Challenge to every research protocol: it must specify a sample of subjects that:
- can be studied at an acceptable cost in time and money
- is large enough to control random error in generalizing the study findings to the population
- is representative enough to control systematic error in these inferences

Basic terms and concepts
- Population: the complete set of people with a specified set of characteristics
- Sample: a subset of the population, selected so as to be representative of the larger population (e.g., population: Medicine Class of 2013; sample: Section B of the Class of 2013)
- Target population: the large set of patients throughout the world to which the results will be generalized. Defined by clinical and demographic characteristics.
- Accessible population: the subset of the target population that is available for the study. Defined by geographic and temporal characteristics.

Reasons for sampling
1. Samples can be studied more quickly than populations
2. A study of a sample is less expensive than a study of an entire population
3. A study of an entire population is impossible in most situations (e.g., sex workers)
4. Sample results are often more accurate than results based on a population (a study is more prone to mistakes when the group studied is larger)
5. If samples are properly selected, probability methods can be used to estimate the error in the resulting statistics (the results can also be generalized)
6. Samples can be selected to reduce heterogeneity (i.e., through the use of inclusion and exclusion criteria)
RESEARCH QUESTION (Truth in the Universe)
STEP #1: Target population. Specify clinical and demographic characteristics. Criterion: well suited to the research question.

STUDY PLAN (Truth in the Study)
STEP #3: Intended sample. Design an approach to selecting the sample. Criterion: representative of the accessible population and easy to do.

Establishing Inclusion Criteria

Inclusion criteria define the main characteristics of the target and accessible populations. (The exclusion criteria are not simply the opposite of the inclusion criteria, and vice versa.)

Considerations: specifying the characteristics that define populations that are relevant to the research question and efficient for study.

Target population (derived from the literature):
- Demographic characteristics
- Clinical characteristics

Accessible population:
- Geographic characteristics
- Temporal characteristics

Example: A 5-year trial of calcium supplementation for preventing osteoporosis might specify that the subjects be:
- White females aged 45 to 50 (demographic)
- In good general health: no known life-threatening disease; not taking long-term corticosteroids (clinical)
- Patients attending the medical clinic at the investigator's hospital (geographic)
- Seen between Jan 1 and Dec 31, 2006 (temporal)

Establishing Exclusion Criteria

Exclusion criteria indicate subsets of individuals who meet the eligibility criteria but are likely to interfere with the quality of the data or the interpretation of the findings.

Considerations: specifying subsets of the population that will not be studied because of:
- A high likelihood of being lost to follow-up
- An inability to provide good data
- Ethical barriers
- The subject's refusal to participate

Example: A 5-year trial of calcium supplementation for preventing osteoporosis might exclude subjects who:
- Plan to move out of state (likely to be lost to follow-up)
- Are disoriented or have a language barrier (unable to provide good data)
- Are kidney stone formers (ethical barrier)
- Are unwilling to accept the possibility of random allocation to the placebo group (refusal to participate)

STEP #2: Accessible population. Specify temporal and geographic characteristics. Criterion: representative of the target population and easy to study.


Choosing an accessible population
- Clinic-based samples: inexpensive and easy to recruit, but the selection factors that determine who comes to the hospital or clinic may have an important effect
- Population-based samples: particularly useful for guiding public health and clinical practice in the whole community, but the chief disadvantage is the expense and difficulty involved

SAMPLING

1. Probability Sampling
- Uses a random process to guarantee that each unit of the population has a specified chance of selection
- The researcher knows the denominator and the characteristics of the population
- If there is no accurate listing of the target population, use nonprobability sampling

a. Simple Random Sampling
- Every subject has an equal probability of being selected for the study
- Process of enumerating every unit of the accessible population, and then selecting the sample at random
- The recommended way is to use a table of random numbers or a computer-generated list of random numbers
- Fishbowl sampling is included here
- What is needed: an accurate listing of the population, and a mechanism to find and enroll those who are chosen
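Simple random sampling from a computer-generated list of random numbers can be sketched in a few lines of Python. This is only an illustration: the function name `simple_random_sample`, the 160-person sampling frame, and the seed are assumptions made for the example, not part of the lecture.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n units without replacement; every unit has the same chance of selection."""
    if n > len(sampling_frame):
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)          # seeded generator so the draw can be reproduced
    return rng.sample(sampling_frame, n)

# Hypothetical sampling frame: subjects enumerated 1 to 160
frame = list(range(1, 161))
chosen = simple_random_sample(frame, 40, seed=2010)
print(sorted(chosen))
```

The enumerated list plays the role of the accurate listing of the population; finding and enrolling the chosen subjects still has to be done by the investigator.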

ADDITIONAL: Multi-Stage Sampling
- Combination of the sampling methods described under probability sampling
- Used in very big research studies (e.g., a nationwide study)
- e.g., nationwide: choose provinces, then cities/municipalities, then barangays, etc.

2. Nonprobability Sampling
- Sampling method in which the probability that a subject is selected is unknown
- The denominator (total population number) is unknown
- E.g., studies involving abused children/women, sex workers, etc.

a. Consecutive Sampling
- Involves taking every patient who meets the selection criteria over a specified time interval or number of patients; it amounts to taking the complete accessible population over the duration of the study
- Usually used by residents in their research (due to time constraints)

1. Probability Sampling (continued)

b. Systematic Sampling
- Involves selecting by a periodic process; the starting point is chosen at random
- Example: get 200 samples from a population of 3400
- Procedure: Number all units 1 to 3400. Divide the population size by the number to be sampled (3400/200 = 17) to get the sampling interval k = 17. Select at random any number from 1 to 17 as the starting point, then select every 17th subject thereafter.
- NOTE: should not be used when a cyclic repetition is inherent in the sampling frame
  e.g., not appropriate for selecting months of the year in a study of the frequency of different types of accidents, because some accidents occur most often at certain times of the year
  e.g., selecting all even or all odd numbers when males and females are seated alternately

c. Stratified Random Sampling
- Involves dividing the population into subgroups (strata) according to characteristics and taking a random sample from each of these strata
- Characteristics used to stratify should be related to the measurement of interest
- In medicine, commonly used strata include age, gender, and severity of disease
- e.g., use of proportionate numbers in groups with differing population sizes

d. Cluster Sampling
- Process of taking a random sample of natural groupings of individuals in the population; very useful when the population is widely dispersed and it is impractical or costly to list and sample from all of its elements
- Clusters are commonly based on geographic areas or districts, so this approach is used more often in epidemiologic research than in clinical research
- e.g., different areas [Area 48 or 81] in Brgy. Dona Imelda
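The systematic sampling procedure (200 units from 3400, interval 17) and the idea of proportionate numbers per stratum can be sketched as follows. The function names, the seeds, and the three age strata are hypothetical choices for illustration. The proportionate allocation uses simple rounding, so the stratum sizes may not always add up exactly to the requested total.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Return the 1-based unit numbers picked by systematic sampling."""
    k = population_size // sample_size            # sampling interval (3400 // 200 = 17)
    start = random.Random(seed).randint(1, k)     # random starting point, 1 to k inclusive
    return [start + i * k for i in range(sample_size)]

def proportionate_stratified_sample(strata, total_n, seed=None):
    """Simple random sample within each stratum, sized in proportion to the stratum."""
    rng = random.Random(seed)
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_stratum = round(total_n * len(units) / population_size)  # proportionate share
        sample[name] = rng.sample(units, n_stratum)
    return sample

picked = systematic_sample(3400, 200, seed=17)
print(picked[:5], "...", picked[-1])

# Hypothetical strata of unequal size (unit IDs are made up for the example)
strata = {
    "age 20-39": list(range(1, 601)),      # 600 people
    "age 40-59": list(range(601, 901)),    # 300 people
    "age 60+":   list(range(901, 1001)),   # 100 people
}
by_stratum = proportionate_stratified_sample(strata, 100, seed=17)
print({name: len(units) for name, units in by_stratum.items()})
```

Because every selected unit is exactly k places after the previous one, any pattern in the list that repeats with the same period would bias the sample, which is the reason for the note on cyclic repetition.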

2. Nonprobability Sampling (continued)

b. Convenience Sampling
- Process of taking those members of the accessible population who are easily available
- Prone to many biases; the sample is not homogeneous

c. Judgmental Sampling
- Involves handpicking from the accessible population those individuals judged most appropriate for the study
- E.g., accreditation

d. Snowball Sampling
- Referral method: after recruiting one subject, ask that subject for other prospective subjects

SAMPLE SIZE COMPUTATION

Sample Size
Factors that affect the number of subjects required for a study:
1. Whether the research design involves paired or unpaired data
2. Whether beta error is considered in addition to alpha error
3. Whether a large or small variance is anticipated in the data set
4. Whether the alpha level chosen is the usual one (p value of 0.05) or smaller
5. Whether the desired difference between means or proportions to be detected is fairly small or extremely small

Pre-test
Answer with LARGER or SMALLER:
- What sample size would be needed if the investigator wants the answer to be very close to the true value (i.e., to have a very narrow confidence interval or a very small p value)? LARGER
- What sample size would be needed if the anticipated variance is small? SMALLER
- What sample size would be needed if the difference the investigator wants to detect is extremely small? LARGER

Review of Basic Concepts and Terms
- Effect size: the difference you want to detect between one group and the other group (related to the third pre-test question)
- Alpha level / significance level: the probability that a positive finding is due to chance alone (in medicine, usually set at 0.05, i.e., 95% confidence; z = 1.96, treated as a constant)
- Power: the probability that a true effect will be detected; equal to 1 minus the beta error (in medicine, usually set at 80%, i.e., 20% beta error; z = 0.84, treated as a constant)
- Alpha error: type I error; the error of finding something when in fact there is nothing (rejecting the null hypothesis when it is true); the z-value for alpha error is 1.96; p = 0.05 means that you are allowing yourself a 5% chance of committing a type I error
- Beta error: type II error; the error of finding nothing when in fact there is something (accepting, i.e., failing to reject, the null hypothesis when it is false); the z-value for beta error is 0.84

Derivation of the Basic Sample Size Formula

Recall (Nice to Know):

$$t = \frac{d}{s_d / \sqrt{N}}$$

Where d is the mean difference that was observed, s_d is the standard deviation of the differences (so that s_d / \sqrt{N} is the standard error of the mean difference), and N is the sample size.

To solve for N, the formula has to be rearranged: \sqrt{N} = t \cdot s_d / d, so N = t^2 \cdot s_d^2 / d^2. Using the z-value for alpha in place of t, the formula becomes (Need to Know):

$$N = \frac{(z_\alpha)^2 \cdot (s)^2}{(d)^2}$$

Formulas for the Calculation of Sample Size for Studies Commonly Pursued in Medical Research

Studies using the paired t test (e.g., before-and-after studies) and considering alpha (type I) error only

$$N = \frac{(z_\alpha)^2 \cdot (s)^2}{(d)^2}$$

Use the paired t test if:
- The study is a before-and-after study (involves 1 group only)
- Matching was employed (according to gender, age, etc.)
- The subjects are twins
If not, and the data are continuous, use the independent (Student's) t test.

Study characteristics and assumptions made by the investigator:
- Type of study: before-and-after study of an antihypertensive (anti-HPN) drug
- Data sets: pre-treatment and post-treatment observations in the same group of subjects
- Variable: systolic blood pressure
- Standard deviation (s): 15 mmHg
- Variance (s^2): 225 mmHg^2
- Data for alpha (z_alpha): p = 0.05; therefore, 95% confidence desired (two-tailed test); z_alpha = 1.96
- Difference to be detected (d): 10 mmHg or larger difference between pre-treatment and post-treatment blood pressure values

$$N = \frac{(z_\alpha)^2 \cdot (s)^2}{(d)^2} = \frac{(1.96)^2 \cdot (15)^2}{(10)^2} \approx \frac{(3.84)(225)}{100} = \frac{864}{100} = 8.64$$

Rounded up: N = 9 subjects total.

Studies using Student's t test (e.g., one experimental group and one control group) and considering alpha (type I) error only

$$N = \frac{(z_\alpha)^2 \cdot 2 \cdot (s)^2}{(d)^2}$$

Study characteristics and assumptions made by the investigator:
- Type of study: RCT of an anti-HPN drug
- Data sets: observations in one experimental group and one control group
- Variable: systolic blood pressure
- Standard deviation (s): 15 mmHg
- Variance (s^2): 225 mmHg^2
- Data for alpha (z_alpha): p = 0.05; therefore, 95% confidence desired (two-tailed test); z_alpha = 1.96
- Difference to be detected (d): 10 mmHg or larger difference between the mean blood pressure values of the experimental group and the control group

$$N = \frac{(z_\alpha)^2 \cdot 2 \cdot (s)^2}{(d)^2} = \frac{(1.96)^2 \cdot 2 \cdot (15)^2}{(10)^2} \approx \frac{(3.84)(2)(225)}{100} = \frac{1728}{100} = 17.28$$

Rounded up: N = 18 subjects per group x 2 groups = 36 subjects.

Studies using Student's t test and considering both alpha (type I) and beta (type II) errors

$$N = \frac{(z_\alpha + z_\beta)^2 \cdot 2 \cdot (s)^2}{(d)^2}$$

Study characteristics and assumptions made by the investigator:
- Type of study: RCT of an anti-HPN drug
- Data sets: observations in one experimental group and one control group
- Variable: systolic blood pressure
- Standard deviation (s): 15 mmHg
- Variance (s^2): 225 mmHg^2
- Data for alpha (z_alpha): p = 0.05; therefore, 95% confidence desired (two-tailed test); z_alpha = 1.96
- Data for beta (z_beta): 20% beta error; therefore, 80% power desired (one-tailed test); z_beta = 0.84
- Difference to be detected (d): 10 mmHg or larger difference between the mean blood pressure values of the experimental group and the control group

$$N = \frac{(z_\alpha + z_\beta)^2 \cdot 2 \cdot (s)^2}{(d)^2} = \frac{(1.96 + 0.84)^2 \cdot 2 \cdot (15)^2}{(10)^2} = \frac{(7.84)(2)(225)}{100} = \frac{3528}{100} = 35.28$$

Rounded up: N = 36 subjects per group x 2 groups = 72 subjects.

Studies using a test of differences in proportions and considering both alpha (type I) and beta (type II) errors

$$N = \frac{(z_\alpha + z_\beta)^2 \cdot 2 \cdot p(1 - p)}{(d)^2}$$

Study characteristics and assumptions made by the investigator:
- Type of study: RCT of a drug to reduce the 5-year mortality in patients with a particular form of cancer
- Data sets: observations in one experimental group and one control group
- Variable: success = 5-year survival after treatment; failure = death within 5 years of treatment
- Variance, p(1 - p): p = 0.55; therefore, (1 - p) = 0.45
- Data for alpha (z_alpha): p = 0.05; therefore, 95% confidence desired (two-tailed test); z_alpha = 1.96
- Data for beta (z_beta): 20% beta error; therefore, 80% power desired (one-tailed test); z_beta = 0.84
- Difference to be detected (d): 0.1 or larger difference between the success (survival) of the experimental group and that of the control group

$$N = \frac{(z_\alpha + z_\beta)^2 \cdot 2 \cdot p(1 - p)}{(d)^2} = \frac{(1.96 + 0.84)^2 \cdot 2 \cdot (0.55)(0.45)}{(0.1)^2} = \frac{(7.84)(2)(0.2475)}{0.01} = \frac{3.8808}{0.01} = 388.08$$

Taken as N = 388 subjects per group x 2 groups = 776 subjects (rounding up, as in the earlier examples, would give 389 per group, or 778 in total).

Remember:
- N = sample size
- z_alpha = z-value for alpha error = 1.96 (constant)
- z_beta = z-value for beta error = 0.84 (constant)
- (s)^2 = variance (from the literature)
- p = mean proportion of success (from the literature)
- d = difference to be detected (assigned by the researcher)
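The four sample size formulas can be checked with a short Python sketch. The function names are assumptions made for this illustration, and the z-values are the constants used in the worked examples (1.96 and 0.84). Each result is rounded up with `math.ceil`. For the proportions example, strict rounding up of 388.08 gives 389 per group, whereas the worked example reports 388.

```python
import math

Z_ALPHA = 1.96   # two-tailed alpha of 0.05
Z_BETA = 0.84    # 20% beta error (80% power)

def n_paired(s, d, z_alpha=Z_ALPHA):
    """Paired t test, alpha error only: total number of subjects."""
    return math.ceil(z_alpha**2 * s**2 / d**2)

def n_two_groups(s, d, z_alpha=Z_ALPHA, z_beta=0.0):
    """Student's t test: subjects per group (z_beta=0 means alpha error only)."""
    return math.ceil((z_alpha + z_beta)**2 * 2 * s**2 / d**2)

def n_two_proportions(p, d, z_alpha=Z_ALPHA, z_beta=Z_BETA):
    """Test of differences in proportions: subjects per group."""
    return math.ceil((z_alpha + z_beta)**2 * 2 * p * (1 - p) / d**2)

# Blood pressure examples: s = 15 mmHg, d = 10 mmHg
print("paired, alpha only:", n_paired(15, 10))
print("two groups, alpha only:", n_two_groups(15, 10), "per group")
print("two groups, alpha and beta:", n_two_groups(15, 10, z_beta=Z_BETA), "per group")
# Cancer survival example: p = 0.55, d = 0.1
print("two proportions, alpha and beta:", n_two_proportions(0.55, 0.1), "per group")
```

Changing `s` or `d` in these calls shows the pre-test answers directly: a smaller anticipated variance lowers N, and a smaller difference to be detected raises it.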
