Table of Contents

Chapter 1: Thinking Like a Scientist
Chapter 2: Getting Started: Ideas, Resources, Ethics
Chapter 3: Defining, Measuring, and Manipulating Variables
Chapter 4: Descriptive Methods
Chapter 5: Data Organization and Descriptive Statistics
Chapter 6: Correlational Methods and Statistics
Chapter 7: Probability and Hypothesis Testing
Chapter 8: Introduction to Inferential Statistics
Chapter 9: The Logic of Experimental Design
Chapter 10: Inferential Statistics: Two-Group Designs
Chapter 11: Experimental Designs with More Than Two Levels of an Independent Variable
Chapter 12: Complex Experimental Designs
Chapter 13: Quasi-Experimental and Single-Case Designs

Chapter 1: Thinking Like a Scientist

Sources of knowledge (p.6): superstition; intuition (e.g. the belief that couples are more likely to conceive after adopting, which is an illusory correlation); authority (e.g. parents, actors); tenacity (repetition increases believability, as in advertising); rationalism (logical reasoning); and empiricism.

Rationalism works through syllogisms. A categorical syllogism consists of three parts: the major premise, the minor premise, and the conclusion. Each premise has one term in common with the conclusion: in the major premise, this is the major term (i.e., the predicate of the conclusion); in the minor premise, it is the minor term (the subject of the conclusion). For example:
o Major premise: All men are mortal.
o Minor premise: All Greeks are men.
o Conclusion: All Greeks are mortal.

Empiricism: knowledge through observation and experience. It yields a long list of observable facts; rationalism is needed to assemble these facts logically. Aristotle was an empiricist, while Plato was a theorist.

Science: rationalism + empiricism. A hypothesis is a prediction regarding the outcome of a single study. Many hypotheses may be tested and several research studies conducted before a comprehensive theory on a topic is put forth. Publicly verifiable knowledge: research can be observed, replicated, criticized, and tested for veracity. p.11 Principle of falsifiability: a theory must be stated in such a way that it is possible to refute it.

Scientific research has three basic goals: (1) to describe behavior, (2) to predict behavior, and (3) to explain behavior. p.14

Research Methods in Science

Descriptive methods:
o Observational: naturalistic observation and laboratory observation (p.15)
o Case study method: in-depth study of one or more individuals, e.g. Piaget's theory of cognitive development in children, developed by simply describing the individual(s) being studied.
o Survey method: question individuals on topic(s) and then describe their responses.

Predictive (relational) methods: we do not systematically manipulate the variables of interest; we only measure them. Since alternative explanations can't be ruled out, causation cannot be established.
o Correlational method: assesses the degree of relationship between two measured variables.
o Quasi-experimental method: differs from the experimental method in that subjects choose to be members of the different groups being studied, i.e. the subject/participant variable can't be changed and is not a manipulated variable, e.g. sorority vs. non-sorority members (p.17)

Experimental method: controls are very important in such experiments. You control who is in the study (get a representative sample), who participates in each group (control for differences between participants by random assignment to the control (baseline) group and the experimental group), and the treatment each group receives (e.g. some take vitamin C and some do not). Other variables such as amount of sleep, type of diet, and amount of exercise might also need to be controlled. p.19

Chapter 2: Getting Started on a Research Idea

Selecting a Problem: review past research on the problem, OR review the pertinent chapter in the psychology text, OR observe a problem in nature and decide how to address it.

Reviewing the Literature (p.33,34): A list of psychology journals is on p.32. Psych Abstracts, published by the APA, lists abstracts of all published work on a monthly basis. PsycINFO is an electronic database that provides abstracts and citations to the scholarly literature in the behavioral sciences and mental health. To help you choose appropriate keywords, use the APA's Thesaurus of Psychological Index Terms. Whereas Psych Abstracts finds articles published on a given topic within a given year, the Social Science Citation Index (SSCI) lets you work from a given article (a key article) and see what has been published on that topic since the key article was published. p.34 PsycARTICLES is an online database that provides full-text articles from many psychology journals and is available through many academic libraries. ProQuest is an online database that searches both scholarly journals and popular media sources; full-text articles are often available. p.35

Journal Article Structure (p.37): Research articles usually have five main sections: Abstract, Introduction, Method, Results, and Discussion.
o Abstract: a brief description of the entire paper that typically discusses each section of the paper (Introduction, Method, Results, and Discussion). It should not exceed 120 words.
o Introduction: has 3 components: (1) introduction to the problem, (2) review of relevant previous research, and (3) purpose/rationale for the study.
o Method: includes participants (selection processes), materials/apparatus (testing materials, equipment), and procedure (groups used in the study, instructions to participants, experimental manipulation, controls, etc.).
o Results: summarizes the data collected and the type of statistic used to analyze the data. This section should include a description of the results only, not an explanation of the results.
o Discussion: the results are evaluated and interpreted here. It typically begins with a restatement of the predictions of the study and tells whether or not the predictions were supported.

Institutional Review Boards (IRBs) oversee all federally funded research involving human participants. p.44 If the participants in a study are classified as at minimal risk, then informed consent is not mandatory. p.47 In studies where anonymity and confidentiality are at risk, an informed consent form should be used.

Chapter 3: Defining, Measuring, and Manipulating Variables (p.57)

Operational definition: defining a variable in concrete terms, e.g. hunger: >12 hrs without food intake; anxiety: galvanic skin response or heart rate. Purpose: communicate clearly to others and measure/manipulate variables consistently. The four properties of measurement are identity, magnitude, equal unit size, and absolute zero (p.58).

Scales of Measurement
o Nominal scale: objects or individuals are assigned to categories that have no numerical properties. Nominal scales have the characteristic of identity. Such variables are categorical because the data are grouped into categories. E.g. ethnicity.
o Ordinal scale: the categories form a rank order along a continuum and have the properties of identity and magnitude but lack equal unit size (the difference between ranks 1 and 2 need not equal the difference between ranks 2 and 3) and absolute zero. Also known as ranked data. E.g. class rank.
o Interval scale: the units of measurement (intervals) between the numbers on the scale are all equal in size, so the criteria of identity, magnitude, and equal unit size are met. E.g. the Celsius and Fahrenheit temperature scales, neither of which has an absolute zero. Because of this, you cannot form ratios based on these scales (for example, 100 degrees is not twice as hot as 50 degrees).
o Ratio scale: ratio data have all four properties of measurement: identity, magnitude, equal unit size, and absolute zero. Examples include weight, time, and height.

Aptitude tests measure an individual's potential to do something, whereas achievement tests measure an individual's competence in an area. p.63 Behavioral measures are often referred to as observational measures because they involve observing anything that a participant does. Most physical measures, or measures of bodily activity, are not directly observable.

Reliability refers to the consistency or stability of a measuring instrument. p.65 Examples of errors include trait error (e.g. participant truthfulness) and method error (e.g. the operator's use of the equipment). In effect, a measurement is a combination of the true score and an error score:
Observed score = True score + Measurement error

Reliability is measured using correlation coefficients. A correlation coefficient measures the degree of relationship between two sets of scores and can vary between -1.00 and +1.00. To establish the reliability (or consistency) of a measure, we expect a strong correlation coefficient (usually in the .80s or .90s) between the two variables or scores being measured. A positive coefficient indicates that those who scored high on the measuring instrument at one time also scored high at another time, and those who scored low at one point scored low again. p.68

Types of reliability: test/retest reliability (lowered by practice effects, since a person can improve between testings through practice), alternate-forms reliability (different but equivalent questions on the tests), split-half reliability, and inter-rater reliability.

Validity: a measure that is valid measures what it claims to measure. p.70
o Content validity: assessed by a systematic examination of the test content to determine whether it covers a representative sample of the domain of behaviors to be measured.
o Criterion validity: the ability to estimate present performance (concurrent validity) or to predict future performance (predictive validity).
o Construct validity: the extent to which a measuring instrument accurately measures the theoretical construct or trait that it is designed to measure.

p.72: A test can be reliable but not valid, but it can never be valid without being reliable.
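To make the test/retest idea concrete, here is a minimal sketch that correlates two administrations of the same instrument. The scores and the helper name `pearson_r` are invented for illustration and do not come from the text; the function uses the standard deviation-score form of the Pearson coefficient.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for the same five people tested at two times.
time1 = [10, 12, 15, 18, 20]
time2 = [11, 12, 16, 17, 21]

reliability = pearson_r(time1, time2)
print(f"test/retest reliability: r = {reliability:.2f}")
```

A coefficient in the .80s or .90s, as here, would be read as acceptable consistency under the rule of thumb above.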

Chapter 4: Descriptive Methods

Descriptive methods: describe what has been observed in a group of people or animals, but don't allow one to make accurate predictions or determine cause-and-effect relationships. Five different types of descriptive methods: observational methods, case studies, archival method, qualitative methods, and surveys (p.79).

Observational Methods: two types, naturalistic and laboratory (systematic).
o Naturalistic: Ecological validity refers to the extent to which research can be generalized to real-life situations. In nonparticipant observation (e.g. Goodall), there is the issue of reactivity: participants reacting in an unnatural way to someone obviously watching them. Disguised observation mitigates this. Expectancy effects are the effect of the researcher's expectations on the outcome of the study. Naturalistic observation has greater flexibility but less control than laboratory observation.
o Laboratory: also concerned with reactivity and expectancy effects. The advantage is that the situation is contrived, so the likelihood that participants will perform the behavior is higher. p.83

Observational Methods: Data Collection
o Narrative records: full narrative descriptions of a participant's behavior, e.g. Piaget's studies of cognitive development in children.
o Checklists: A static item is a means of collecting data on characteristics that will not change while the observations are being made, e.g. number of people present, age, gender. An action item is used to record whether specific behaviors were present or absent during the observational time period. The disadvantage is missing behavior that is not on the checklist.

Qualitative Methods (case studies, archival, interviews/focus groups, field studies, action research): these are distinguished from observational methods in that researchers are typically not interested in simplifying, objectifying, or quantifying what they observe.
o Case study method: e.g. Piaget; an in-depth study of one or more individuals in the hope of revealing things that are true of all of us. Problems: an atypical individual leads to erroneous generalizations to the population; also, expectancy effects. p.85
o Archival method: describing data that existed before the time of the study, e.g. whether more babies are born when the moon is full, using sources such as the US Census Bureau. Risks are selection bias (cherry-picking data sources) and unknown reliability or validity because someone else's data are being used.
o Interviews/focus group methods: 3 types of interviews: standardized (fixed questions), semistandardized, and unstandardized (unstructured).
o Field studies: similar to naturalistic observation; the difference is that data are always collected in narrative form and left that way. p.90

Qualitative Method: Qualitative research focuses on phenomena that occur in natural settings, and the data are typically analyzed without the use of statistics. Both the naturalistic observational method and the case study method can be qualitative in nature.

Surveys (summary table on p.102): question types are closed-ended, open-ended, and partially open-ended (closed-ended questions with an additional "other" option). A Likert rating scale (most psychologists treat it as interval, but some as ordinal) presents a statement rather than a question, and respondents are asked to rate their level of agreement with the statement. p.89 A loaded question is one that includes nonneutral or emotionally laden terms (e.g. "eliminating wasteful excesses"). A leading question is one that sways the respondent to answer in a desired manner, e.g. "most people believe...". A double-barreled question asks more than one thing in a single item. Survey questions should not be randomly arranged: put sensitive questions (e.g. drug use or sexual behavior) at the end, put demographic questions at the end because they are boring, and group questions on similar topics together. p.90

A socially desirable response is one that is given because participants believe it is deemed appropriate by society, rather than because it truly reflects their own views or behaviors.

Mail survey: less sampling bias than phone/email because of wide availability; also, people are more comfortable answering sensitive questions. Disadvantages: if questions are unclear, no clarification is possible; low response rate (about 20%).

Sampling Techniques for Surveys: There are two ways to sample individuals from a population: probability sampling and nonprobability sampling.

Probability sampling (p.95):
o Random selection
o Stratified random sampling: guarantees that the sample accurately represents the population on specific characteristics
o Cluster sampling: e.g. sample from classes that are required of all students at the university, such as English composition

Non-probability sampling: individual members of the population do not have an equal likelihood of being selected.
o Convenience (haphazard) sampling: if you wanted a sample of 100 college students, you could stand outside of the library and ask people who pass by to participate.
o Quota sampling: quota sampling is to nonprobability sampling what stratified random sampling is to probability sampling, but still not much effort is devoted to creating a sample that is truly representative of the population.
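The difference between simple random selection and stratified random sampling can be sketched in a few lines. The population below (1,000 students split by class year) and the function names are assumptions made up for this illustration; the stratified version simply samples each stratum in proportion to its share of the population.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: (class year, student id) pairs.
sizes = {"freshman": 400, "sophomore": 300, "junior": 200, "senior": 100}
population = [(year, i) for year, n in sizes.items() for i in range(n)]

def simple_random_sample(pop, k):
    """Every member has an equal chance of being selected."""
    return random.sample(pop, k)

def stratified_random_sample(pop, k):
    """Randomly sample within each stratum, proportional to stratum size.

    Rounding may make the total differ slightly from k for awkward sizes.
    """
    strata = {}
    for member in pop:
        strata.setdefault(member[0], []).append(member)
    sample = []
    for members in strata.values():
        n = round(k * len(members) / len(pop))
        sample.extend(random.sample(members, n))
    return sample

simple_counts = Counter(year for year, _ in simple_random_sample(population, 100))
strat_counts = Counter(year for year, _ in stratified_random_sample(population, 100))
print("simple random:", dict(simple_counts))
print("stratified:   ", dict(strat_counts))
```

The stratified counts match the population proportions exactly, while the simple random counts only approximate them.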

Chapter 5: Data Organization and Descriptive Statistics

In a class interval frequency distribution, individual scores are combined into categories, or intervals, and then listed along with the frequency of scores in each interval. p.106 For nominal scale or qualitative data, a bar graph (a graphical representation of a frequency distribution) is most appropriate, e.g. Democrats, Independents, Republicans. For quantitative data on ordinal, interval, or ratio scales, a histogram is used. p.107 Unlike in a bar graph, in a histogram the bars touch each other to indicate that the scores on the variable represent related, increasing values. Frequency polygon: a line graph of the frequencies of individual scores or intervals. Again, scores (or intervals) are shown on the x-axis and frequencies on the y-axis. After all the frequencies are plotted, the data points are connected. Use with quantitative, continuous data like height and weight.

Measures of Central Tendency: A measure of central tendency is a representative number that characterizes the "middleness" of an entire set of data, e.g. mean, median, and mode. p.110 Mean: the mean is appropriate for interval and ratio data but not for ordinal or nominal data.

For the sample mean, $\bar{X} = \frac{\sum X}{N}$. Median: not affected by extreme scores.

Measures of Variation: A measure of central tendency provides information about the "middleness" of a distribution of scores but not about the width or spread of the distribution. p.114 A measure of variation indicates the degree to which scores are either clustered or spread out in a distribution. Range: the simplest measure of variation is the range, the difference between the lowest and the highest scores in a distribution. The range is usually reported with the mean of the distribution.
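As a small illustration of the measures above, the sketch below builds a frequency distribution and computes the mean, median, mode, and range for a made-up set of scores (the data are hypothetical, chosen so that one extreme score pulls the mean but not the median).

```python
from collections import Counter
from statistics import mean, median, mode

# Hypothetical scores; 30 is an extreme value.
scores = [2, 3, 3, 4, 5, 5, 5, 6, 30]

frequency = Counter(scores)               # frequency distribution
center = {
    "mean": mean(scores),                 # pulled upward by the extreme score
    "median": median(scores),             # unaffected by the extreme score
    "mode": mode(scores),                 # most frequent score
}
score_range = max(scores) - min(scores)   # highest minus lowest score

print(sorted(frequency.items()))
print(center, "range =", score_range)
```

Dropping the extreme score would change the mean noticeably while leaving the median and mode where they are, which is the sense in which the median is "not affected by extreme scores".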

Standard Deviation for a population (p.117): $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$

Standard Deviation for a sample: $S = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$

Compare this to the average deviation for a population, which is given by $AD = \frac{\sum |X - \mu|}{N}$. Note that the standard deviation will be at least as large as the average deviation because the squaring of the terms gives more weight to outlying values. If, however, sample data are being used to estimate the population standard deviation, then the unbiased-estimator modification of N - 1 must be used: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$ p.126

Notice that the symbol for the unbiased estimator of the population standard deviation is s (lowercase), whereas the symbol for the sample standard deviation is S (uppercase). The estimate has N - 1 in the denominator to compensate for small samples not containing as much variability as the real population. Variance is the square of the standard deviation.

Normal distributions are bell-shaped, symmetrical, and have an identical mean, median, and mode. They are unimodal; most observations are centrally clustered. Last, when standard deviations are plotted on the x-axis, the percentage of scores falling between the mean and any point on the x-axis is the same for all normal curves. (p.121)

Kurtosis: how flat or peaked a normal distribution is. Platykurtic = short and wide (think: platypus = close to the ground, flat); mesokurtic = medium height/breadth; leptokurtic = tall and thin (think: lepto = leap).

In a positively skewed distribution, the peak is to the left of the center point, and the tail extends toward the right. Reason for its name: a few individuals have extremely high scores that pull the distribution in that direction. Negatively skewed is just the opposite. p.122 If your disease has a low median survival time, you would prefer a positive skew: this means some people live for a very long time post-diagnosis.
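The three measures of spread described above (average deviation, standard deviation with N, and the unbiased estimator with N - 1) can be compared side by side. This is a minimal sketch on invented scores; `pstdev` and `stdev` from Python's standard library are used only as a cross-check on the hand-written formulas.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
n = len(scores)
m = mean(scores)

sum_sq = sum((x - m) ** 2 for x in scores)        # sum of squared deviations

avg_dev = sum(abs(x - m) for x in scores) / n     # average deviation
sd_n = sqrt(sum_sq / n)                           # S (or sigma): divide by N
sd_unbiased = sqrt(sum_sq / (n - 1))              # s: divide by N - 1

print(f"average deviation = {avg_dev:.3f}")
print(f"SD with N         = {sd_n:.3f}  (pstdev: {pstdev(scores):.3f})")
print(f"SD with N - 1     = {sd_unbiased:.3f}  (stdev:  {stdev(scores):.3f})")
```

On these scores the ordering is average deviation < SD with N < SD with N - 1, which matches the two remarks above about squaring and about the N - 1 correction.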

The z-score (p.124): A z-score or standard score is a measure of how many standard deviation units an individual raw score falls from the mean of the distribution. Thus, when calculating a z-score for an individual in comparison to a sample, we use $z = \frac{X - \bar{X}}{S}$, while for a population, we use $z = \frac{X - \mu}{\sigma}$.

If the distribution of scores for which you are calculating transformations (z-scores) is normal (symmetrical and unimodal), then it is referred to as the standard normal distribution: a normal distribution with a mean of 0 and a standard deviation of 1. p.126 The standard normal curve can also be used to determine an individual's percentile rank: the percentage of scores equal to or below the given raw score. p.131
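A short sketch of the z-score and percentile-rank calculations follows. The raw score, mean, and standard deviation are hypothetical values (an IQ-style scale with mean 100 and SD 15 is assumed purely for illustration), and the percentile rank assumes the scores are normally distributed.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviation units that x lies from the mean."""
    return (x - mu) / sigma

def percentile_rank(x, mu, sigma):
    """Percentage of scores at or below x, assuming a normal distribution."""
    return 100 * NormalDist().cdf(z_score(x, mu, sigma))

# Hypothetical raw score on a scale with mean 100 and SD 15.
z = z_score(115, 100, 15)
rank = percentile_rank(115, 100, 15)
print(f"z = {z:.2f}, percentile rank = {rank:.1f}")
```

`NormalDist()` with no arguments is the standard normal distribution (mean 0, SD 1), so its cumulative distribution function plays the role of the printed z table.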

Chapter 6: Correlational Methods and Statistics

Correlational Methods: determine whether two variables are related to one another. p.148 In addition to describing a relationship, correlations allow us to make predictions from one variable to another. If two variables are correlated, we can predict from one variable to the other with a certain degree of accuracy. The magnitude or strength of a relationship is determined by the correlation coefficient describing the relationship (in absolute value): 0 = no correlation; 0 to 0.29: weak correlation; 0.3 to 0.69: moderate correlation; 0.7 to 1.0: strong correlation; 1.0 = perfect correlation. In a perfect correlation, an increase/decrease in one variable is always accompanied by an increase/decrease in the other variable. Thus, in a graph, when there is a perfect correlation, the data points all fall exactly on a straight line (the slope is irrelevant unless it is zero). Accompanying scatterplot shows no relationship. Also, it is possible for a correlation coefficient of zero to arise from a curvilinear relationship (the positive and negative portions nullify each other, e.g. anxiety vs. test performance, memory and age). p.144

Misinterpreting Correlations
o Causality and directionality: causality refers to the assumption that the correlation indicates a causal relationship between two variables, whereas directionality refers to the inference made with respect to the direction of a causal relationship between two variables. p.146
o Third-variable effects: a strong correlation between two variables is not really a meaningful relationship but the product of a third variable. E.g. researchers found contraceptive use strongly correlated with number of electrical appliances; the third variable was socioeconomic status. To remove the effect of the third variable, use partial correlation. p.148
o Restricted range: the correlated variables are examined over a range too narrow to reveal the correlation.
o Curvilinear relationships mask correlations.

Pearson product-moment correlation coefficient: Pearson's r is used for data measured on an interval or ratio scale of measurement. p.151 E.g. consider a list of 20 individuals' heights and weights. Step 1: calculate the mean and S.D. for the heights and weights. Next, convert each value to a z-score. If the correlation is strong and positive, we should find that positive z-scores on one variable go with positive z-scores on the other variable, and negative with negative. Step 2: calculate the cross-products, i.e. multiply each person's two z-scores together and sum the products. If the paired z-scores consistently share the same sign, the sum is a large positive value (strong positive correlation); if they consistently have opposite signs, the sum is a large negative value (strong negative correlation).

The overall formula is below: $r = \frac{\sum z_X z_Y}{N}$ General rule of thumb: at least 10 people per variable. An alternative, computational formula is listed on p.154.

Coefficient of Determination: Calculated by squaring the correlation coefficient, the coefficient of determination ($r^2$) is a measure of the proportion of the variance in one variable that is accounted for by another variable. $r^2$ is typically reported as a percentage. p.154

Correlations for Nominal or Ordinal Data:
o Spearman's rank-order correlation coefficient: both variables are on an ordinal (ranking) scale. If one variable is interval/ratio, it must first be converted to the ordinal scale.
o Point-biserial correlation coefficient: one variable is a two-value (dichotomous) nominal variable (e.g. gender) and the other is interval or ratio.
o Phi coefficient: both variables are dichotomous nominal variables.

Regression Analysis (p.156): Regression analysis is a tool that enables us to predict an individual's score on one variable based on knowing one or more other variables. It involves determining the equation for the best-fitting line for a data set. The line has the form y = mx + b, but is written as follows: $Y' = bX + a$, where $Y'$ is the predicted value on the Y variable, b is the slope of the line, X represents an individual's score on the X variable, and a is the y-intercept. To compute the slope: $b = r\left(\frac{\sigma_Y}{\sigma_X}\right)$. To compute the y-intercept: $a = \bar{Y} - b\bar{X}$, where the bars are the respective sample means. Multiple regression analysis involves combining several predictor variables in a single regression equation to increase the predictive accuracy, because in the real world it is unlikely that one variable is affected by only one other variable.
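The two-step z-score procedure for Pearson's r, the coefficient of determination, and the regression line can be traced in code. The heights and weights below are invented for illustration, the function names are assumptions, and standard deviations are computed with N in the denominator so that the mean cross-product of z-scores equals r.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Step 1: convert each raw score to a z-score (SD computed with N)."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]

def pearson_r(xs, ys):
    """Step 2: sum the cross-products of paired z-scores and divide by N."""
    products = [zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys))]
    return sum(products) / len(xs)

def regression_line(xs, ys):
    """Slope b = r * (SD_Y / SD_X); intercept a = mean(Y) - b * mean(X)."""
    b = pearson_r(xs, ys) * pstdev(ys) / pstdev(xs)
    a = mean(ys) - b * mean(xs)
    return b, a

heights = [60, 62, 65, 68, 70]        # hypothetical, inches
weights = [110, 120, 140, 155, 175]   # hypothetical, pounds

r = pearson_r(heights, weights)
b, a = regression_line(heights, weights)
predicted = b * 66 + a                # predicted weight for a height of 66

print(f"r = {r:.3f}, r^2 = {r * r:.1%}")
print(f"Y' = {b:.3f}X + ({a:.3f}); predicted weight at 66 in = {predicted:.1f}")
```

Because the slope uses a ratio of standard deviations, the same b results whether both are computed with N or both with N - 1.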

Chapter 7: Hypothesis Testing and Inferential Statistics

Probability: multiplication and addition rule. p.178 It is impossible statistically, however, to demonstrate that something is true. In fact, statistical techniques are much better at demonstrating that something is not true. Whatever the research topic, the null hypothesis always predicts that there is no difference between the groups being compared.

One-tailed test (p.184): e.g. do students in after-school programs have higher IQs than those in the general population? The null and alternative hypotheses are: $H_0: \mu_{\text{after-school}} \le \mu_{\text{general}}$ and $H_a: \mu_{\text{after-school}} > \mu_{\text{general}}$.

Two-tailed test: e.g. the researcher just wants to show that there are IQ differences between the two groups, but isn't concerned with the direction of those differences: $H_0: \mu_{\text{after-school}} = \mu_{\text{general}}$ and $H_a: \mu_{\text{after-school}} \ne \mu_{\text{general}}$.

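Errors (p.186): a Type I error is rejecting H0 when it is actually true; a Type II error is failing to reject H0 when it is actually false.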

The p-value or alpha level: When a result is statistically significant at the 0.05 (or 5%) level, it means that, if the null hypothesis were true, a difference between the sample and the population at least as large as the one observed would occur by chance no more than 5 out of every 100 times. The difference is therefore taken to reflect a true/real difference between the groups rather than chance. In this case, the risk of a Type I error is 5%.

Chapter 8: Introduction to Inferential Statistics

Inferential Statistics (p.197): three tests: the z test, the t test, and the chi-square ($\chi^2$) test. The z test and the t test are used with interval or ratio data and are parametric (they require assumptions about population parameters such as the mean μ and standard deviation σ); the chi-square test is used with ordinal or nominal data and is nonparametric.

The z-test:
o A parametric inferential statistical test.
o Needs population parameters such as the mean and standard deviation.
o Determines the likelihood that the sample is part of the sampling distribution.
o Allows us to test the null hypothesis for a single sample when the population variance is known.

Remember that a z-score tells us how many standard deviations above or below the mean of the distribution an individual score falls. But in the IQ problem above, we are not comparing an individual's score to the population mean; rather, a sample mean must be compared with a distribution of sample means, known as the sampling distribution.

Standard Error of the Mean (p.198): the standard error of the mean (the standard deviation of the sampling distribution) can never be as large as σ, the standard deviation for the distribution of individual scores (for samples of more than one score). Think about it this way: if the size of each of these samples were to approach the population size, their means would all be tightly clustered around the population mean and the standard deviation of the sampling distribution would be very small. Thus, the central limit theorem states that for any population with mean μ and standard deviation σ, the distribution of sample means for sample size N will have a mean of μ and a standard deviation of $\sigma/\sqrt{N}$ and will approach a normal distribution as N approaches infinity. p.198 Thus, $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$ and $z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$.

The z-score will tell us how many standard deviation units a sample mean is from the population mean, or the likelihood that the sample is from that population. p.175 E.g. if we wind up with z = 2.06 for the one-tailed test, the critical value is $z_{cv} = 1.645$, i.e. the area under the curve to the right of that value is 5%. The z-value would be significant and H0 would be rejected. In APA style, report the result as z (N = 50) = 2.06, p < .05 (one-tailed). For a two-tailed test, $z_{cv}$ at an alpha of 0.05 is ±1.96.

Statistical power refers to the probability of correctly rejecting a false H0. Two ways to increase statistical power: first, with a one-tailed test, we are more likely to reject H0 because $z_{obt}$ does not have to be as large (p.180); second, by increasing sample size, we reduce the standard error of the mean and increase the z value.

Assumptions of the z-test: the distribution of sample means is normal; if the sample size is small (N < 30), the z-test may not be appropriate. p.206 In such cases, or when the S.D. of the population is not known, use the t-test.
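The single-sample z test described above can be sketched as follows. The numbers (population mean 100, σ = 15, N = 75, sample mean 103.5) are hypothetical and chosen only for illustration; the critical value is obtained from the standard normal distribution rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, mu, sigma, n):
    """Return (z, one-tailed p) for a single-sample z test."""
    standard_error = sigma / sqrt(n)            # sigma / sqrt(N)
    z = (sample_mean - mu) / standard_error
    p_one_tailed = 1 - NormalDist().cdf(z)      # area to the right of z
    return z, p_one_tailed

ALPHA = 0.05
z_critical = NormalDist().inv_cdf(1 - ALPHA)    # one-tailed cutoff

# Hypothetical study values.
z_obt, p = z_test(sample_mean=103.5, mu=100, sigma=15, n=75)
reject_h0 = z_obt > z_critical

print(f"z = {z_obt:.2f}, critical z = {z_critical:.3f}, p = {p:.3f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

Raising `n` while holding the other values fixed shrinks the standard error and increases z, which is the second route to greater power noted above.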

Confidence Intervals based on the z-distribution (p.208): This differs from the previously described z test in that we are not determining whether the sample mean differs significantly from the population mean; rather, we are estimating the population mean based on knowing the sample mean. We are usually given the sample size (e.g. N = 100), sample mean ($\bar{X} = 86$), and population standard deviation (σ = 17). First we calculate $\sigma_{\bar{X}} = \frac{17}{\sqrt{100}} = 1.7$; next, we rearrange the z-equation as follows: $\bar{X} - z_{cv}\,\sigma_{\bar{X}} \le \mu \le \bar{X} + z_{cv}\,\sigma_{\bar{X}}$, i.e. $86 \pm 1.96(1.7)$, to get 82.668 ≤ μ ≤ 89.332 for the 95% CI. It is also possible to do hypothesis testing with confidence intervals. For example, if you construct a 95% confidence interval based on knowing a sample mean and then determine that the population mean is not in the confidence interval, the result is significant.

The t-test (p.209): It is similar to the z-test: a parametric statistical test of the null hypothesis for a single sample that determines the number of standard deviations a score is from the mean of a distribution. Key differences from the z-test: here, the population variance is not known; also, t distributions, although symmetrical and bell-shaped, do not fit the standard normal distribution because the size of the samples is usually less than N = 30. Further, unlike the z distribution, of which there is only one, the t distributions are a family of symmetrical distributions that differ for each sample size. As sample size increases, the t-distribution approaches the z-distribution. Degrees of freedom equal N - 1. Notice that when the degrees of freedom approach infinity at the bottom of the table, for a two-tailed alpha of 0.05 (95% CI), the critical t-score is 1.960, which is the same as the critical z-score value!
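Returning to the confidence-interval example above (N = 100, sample mean 86, σ = 17), the arithmetic can be checked with a short sketch. The function name and the `level` parameter are assumptions introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, level=0.95):
    """Confidence interval for the population mean when sigma is known."""
    standard_error = sigma / sqrt(n)
    # Two-tailed critical z: split the leftover area between both tails.
    z_critical = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_critical * standard_error
    return sample_mean - margin, sample_mean + margin

low, high = z_confidence_interval(sample_mean=86, sigma=17, n=100)
print(f"95% CI: {low:.3f} <= mu <= {high:.3f}")

# Hypothesis testing with the interval: a hypothesized population mean
# outside the interval corresponds to a significant two-tailed result.
hypothesized_mu = 80   # hypothetical value
print("significant" if not (low <= hypothesized_mu <= high) else "not significant")
```

Passing `level=0.99` widens the interval, since a larger critical z is needed to capture the population mean with more confidence.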

The t statistic and the estimated standard error of the mean: $t = \frac{\bar{X} - \mu}{s_{\bar{X}}}$, where $s_{\bar{X}} = \frac{s}{\sqrt{N}}$. $s_{\bar{X}}$ is the estimated standard error of the mean, i.e. an estimate of the standard deviation of the sampling distribution based on sample data, since the population standard deviation is not known. s (the estimated standard deviation for a population, based on sample data): $s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$. APA style: t(9) = 2.06, p < .05 (one-tailed). p.212

The chi-square ($\chi^2$) goodness-of-fit test (p.216): Neither μ nor σ is known. It is a nonparametric statistical test that tests the observed frequency (the frequency with which participants fall into a category) against the expected frequency (the frequency expected in a category if the sample data represent the population). p.191 It is used with nominal or ordinal data (i.e. categorical data).

$\chi^2 = \sum \frac{(O - E)^2}{E}$, where O is the observed frequency, E is the expected frequency, and $\sum$ indicates that we must sum the indicated fractions for each category in the study (e.g. for the pregnant and not pregnant groups). The null hypothesis is rejected if $\chi^2_{obt}$ is greater than $\chi^2_{cv}$. The $\chi^2_{cv}$ is found in Table A.4 in Appendix A. The degrees of freedom is the number of categories minus 1.
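The goodness-of-fit computation can be sketched as below. The observed and expected counts for the two categories are illustrative values, not data taken from these notes, and the critical value 3.841 is the standard chi-square table entry for df = 1 at alpha = .05.

```python
def chi_square_gof(observed, expected):
    """Sum (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Illustrative counts for two categories (e.g. pregnant / not pregnant).
observed = [7, 73]
expected = [12, 68]

chi2_obt = chi_square_gof(observed, expected)
df = len(observed) - 1            # number of categories minus 1
CHI2_CRITICAL = 3.841             # table value for df = 1, alpha = .05

reject_h0 = chi2_obt > CHI2_CRITICAL
print(f"chi-square({df}) = {chi2_obt:.2f}; reject H0: {reject_h0}")
```

Here the obtained value falls below the critical value, so the null hypothesis that the sample frequencies match the expected frequencies is not rejected.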

Pearson r Correlation Coefficients and Statistical Significance (p.218): The null hypothesis (H0) is that the true population correlation is .00: the variables are not related. The alternative hypothesis (Ha) is that the population correlation is not equal to .00: the variables are related. A one-tailed test of a correlation coefficient means that we have predicted the expected direction of the correlation coefficient (i.e., predicted either a positive or negative correlation). Table A.5 in Appendix A shows critical values for both one- and two-tailed tests of r, the Pearson product-moment correlation coefficient. The degrees of freedom = N - 2; if the correlation coefficient of +0.33 is based on 20 pairs of observations, then the degrees of freedom are 20 - 2 = 18.

Summary: in the case of the z test, μ and σ must be known; for the t test, only μ is needed; for the chi-square test, neither μ nor σ is needed.
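As a companion to the summary above, here is a minimal sketch of the single-sample t test, where only μ is known and the standard error is estimated from the sample. The ten scores and μ = 100 are hypothetical; the critical value 1.833 is the standard t-table entry for df = 9, one-tailed alpha = .05.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(scores, mu):
    """Return (t, df) for a single-sample t test."""
    n = len(scores)
    s = stdev(scores)                      # unbiased estimator (N - 1)
    estimated_se = s / sqrt(n)             # estimated standard error of the mean
    return (mean(scores) - mu) / estimated_se, n - 1

# Hypothetical sample of ten scores tested against mu = 100.
scores = [104, 98, 110, 107, 101, 112, 95, 108, 103, 102]
t_obt, df = one_sample_t(scores, mu=100)

T_CRITICAL = 1.833                         # table value: df = 9, one-tailed .05
print(f"t({df}) = {t_obt:.2f}; reject H0: {t_obt > T_CRITICAL}")
```

In the APA format shown earlier, a result like this would be reported as t(9) followed by the obtained value and the p level.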

Chapter 9: The Logic of Experimental Design p.226

Experiments involve (1) random sampling to ensure a sample that is representative of the population, (2) random assignment of subjects to the control and experimental groups, and (3) control of what happens during the experiment so that the only difference between subjects in the two groups is the independent variable. p.227 In a between-participants design, the participants in each group are different; that is, different people serve in the control and experimental groups. p.228 E.g. in an experimental smoking study, smoking is the independent variable and the incidence of cancer is the dependent variable. A problem could occur if the differences in performance on the memory test resulted from the fact that, based on chance, the more educated participants made up the experimental group. To ensure that participants are equivalent at the beginning of a study, a pretest/posttest control group design (test participants before and after the experiment) could be used. p.228 The researcher needs to maximize the internal validity of the study: the extent to which the results can be attributed to the manipulation of the independent variable rather than to some confounding variable. Potential confounds are listed below.

Threats to Internal Validity (summary table below is on p.236)
Non-equivalent control groups: e.g. compare smokers who voluntarily signed up for a cessation program vs. other smokers; the former group may be more concerned with their health. Solve via random assignment.

o History: change in the dependent variable due to external circumstances, e.g. stress reduction because exams fall at the start of the study and vacation at the end.
o Maturation: participants mature physically, socially, and cognitively over the course of the study.
o Testing: the testing effect is a change in performance due to familiarity with and practice on test items. It includes both the practice effect (improvement) and the fatigue effect (decline).
o Regression to the mean: extreme scores that are the product of chance will moderate upon retesting.
o Instrumentation effect: the observer becomes better at, or more fatigued with, taking measures.
o Attrition/Mortality: e.g. the heaviest smokers in an experimental cessation group drop out, so post-test measures would be unduly optimistic.
o Diffusion of treatment: people receive treatment information from other participants.
o Experimenter/Participant effects: experimenter bias or expectancy effects influence the outcome, e.g. Clever Hans, the mathematical horse, receiving cues from his owner. Solve via a single-blind procedure (either the experimenter or the participants are blind to the manipulation being made) or a double-blind procedure (both are unaware). Participant effects include reactivity (a change in behavior due to being watched) and the placebo effect.
o Floor and ceiling effects: e.g. measuring a rat's weight in pounds detects no change (floor effect); weighing an elephant on a bathroom scale with a 350 lb maximum cannot register its true weight (ceiling effect).

Threats to External Validity
Generalization to Populations: hampered by the college sophomore problem.
Generalization from Lab Settings: control is maximized in lab settings, which invites the artificiality criticism; solve by conceptual replication, i.e. testing the same concepts with a different independent or dependent variable.

Correlated-Group Designs: participants in the experimental and control groups are related.
Within-participant design: also known as a repeated measures design; all participants serve in all conditions. Benefits: fewer participants are needed (e.g. with 4 conditions and 15 people per condition, a between-participants design needs 60 people, whereas a within-participant design needs only 15), it takes less time, and it increases statistical power because it reduces variability due to individual differences. This design is popular in psychological research. p.240 Downsides: because participants are tested at least twice, practice and fatigue effects can occur; solve via counterbalancing, i.e. systematically varying the order of conditions across participants. However, three conditions have 6 possible orderings and four conditions have 24, so complete counterbalancing (using every possible ordering of conditions) quickly becomes impractical. There can also be carryover effects, e.g. a drug administered in one condition affects performance in subsequent conditions.
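The counting in the within-participant discussion above can be checked with a short sketch. It enumerates every ordering of a set of conditions (what complete counterbalancing would require) and compares the number of participants needed under the two designs. The function names and condition labels are hypothetical.

```python
from itertools import permutations

def all_orderings(conditions):
    """Every possible ordering of the conditions (complete counterbalancing)."""
    return list(permutations(conditions))

def participants_needed(n_conditions, per_condition, within):
    """Participants required for a within- or between-participants design."""
    return per_condition if within else n_conditions * per_condition

print(len(all_orderings(["A", "B", "C"])))       # orderings for 3 conditions
print(len(all_orderings(["A", "B", "C", "D"])))  # orderings for 4 conditions
print(participants_needed(4, 15, within=False))  # between-participants
print(participants_needed(4, 15, within=True))   # within-participant
```

The number of orderings grows as the factorial of the number of conditions, which is why complete counterbalancing is rarely used beyond a few conditions.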

Matched-Participants Experimental Design: for each participant in one condition, there is a participant in the other condition(s) who matches him or her on some relevant variable or variables. It has advantages over the between-participants design (groups are more similar) and over the within-participant design (fewer carryover and testing effects). Downsides: more people are needed; mortality effects are costlier, because if one person drops out, the whole pair is compromised; and it can be difficult to find matching participants (p.242).

Chapter 10: Inferential StatisticsTwo-group Designs

The inferential statistics discussed in Chapter 7 compared single samples with populations (z test, t test, and χ² test). The statistics discussed in this chapter are designed to test differences between two equivalent groups or treatment conditions.

The t Test for Independent Groups (Samples): p.251

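The independent-groups t test indicates whether the two samples perform so similarly that we conclude they are likely from the same population, or whether they perform so differently that we conclude they represent two different populations. P.227 E.g. a researcher wants to determine whether spacing (studying the same amount of material at intervals) is superior to cramming (studying it all at once). Thus, for a one-tailed test, the hypotheses are $H_0: \mu_{spaced} \le \mu_{cramming}$ and $H_a: \mu_{spaced} > \mu_{cramming}$.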

The dependent variable is participants' scores on a test. Statistical significance indicates that an observed difference between two descriptive statistics (such as means) is unlikely to have occurred by chance.

Rather than comparing a single sample mean to a population mean, we are comparing two sample means. To determine how far the difference between the sample means is from the difference between the population means, we convert the mean differences to standard errors.

The standard error of the difference between the means does have a logical meaning. If we took thousands of pairs of samples from these two populations and found the difference between the sample means for each pair, those differences would not all be the same. They would form a distribution. The mean of that distribution would be the difference between the means of the populations, and its standard deviation would be $s_{\bar{X}_1 - \bar{X}_2}$. Thus,

$t_{obt} = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$, where $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ and $s_1^2$ and $s_2^2$ are the variances of the two groups. p.252 The degrees of freedom for this independent-groups t test are (n1 - 1) + (n2 - 1). Refer to Table A.3 for the t critical value. APA style: t(18) = 4.92, p < .05 (one-tailed). Note that the obtained t value (and hence statistical power) can be increased by three things: greater differences produced by the independent variable, less variability of raw scores in each condition, and increased sample size. Effect Size: Cohen's d and r2

Effect size: the proportion of variance in the dependent variable that is accounted for by the manipulation of the independent variable. It is an estimate of the effect of the independent variable, regardless of sample size. P.232 For the t test, one formula for effect size, known as Cohen's d, is

$d = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{2} + \frac{s_2^2}{2}}}$

According to Cohen (1988, 1992), a small effect size is one of at least 0.20, a medium effect size is at least 0.50, and a large effect size is at least 0.80. E.g. APA: t(18) = 4.92, p < .05 (one-tailed), d = 2.198. r2: the proportion of variance accounted for in the dependent variable based on knowing which treatment group the participants were assigned to for the independent variable; it can be computed as $r^2 = \frac{t^2}{t^2 + df}$. Confidence Intervals: Same formula as before (Ch. 7), except that rather than using the sample mean and the standard error of the mean, we use the difference between the means and the standard error of the difference between means. p.257
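To make the independent-groups computations above concrete, here is a minimal sketch using only the Python standard library. It assumes the usual textbook forms: standard error sqrt(s1^2/n1 + s2^2/n2), Cohen's d with the denominator sqrt(s1^2/2 + s2^2/2), and r^2 = t^2 / (t^2 + df). The function names and the two small score lists are hypothetical and are not data from the notes.

```python
import math
from statistics import mean, variance

def independent_t(group1, group2):
    """Return (t, df) for an independent-groups t test."""
    n1, n2 = len(group1), len(group2)
    # Standard error of the difference between means, from sample variances.
    se_diff = math.sqrt(variance(group1) / n1 + variance(group2) / n2)
    t = (mean(group1) - mean(group2)) / se_diff
    df = (n1 - 1) + (n2 - 1)
    return t, df

def cohens_d(group1, group2):
    """Cohen's d using the average of the two sample variances."""
    spread = math.sqrt(variance(group1) / 2 + variance(group2) / 2)
    return (mean(group1) - mean(group2)) / spread

def r_squared(t, df):
    """Proportion of variance accounted for, computed from t and df."""
    return t ** 2 / (t ** 2 + df)

# Hypothetical scores for a spaced-study group and a cramming group.
spaced = [2, 4, 6]
crammed = [1, 2, 3]
t, df = independent_t(spaced, crammed)
print(round(t, 3), df)
print(round(cohens_d(spaced, crammed), 3))
print(round(r_squared(4.92, 18), 3))  # r^2 for the t(18) = 4.92 example
```

The obtained t would still be compared with the critical value from Table A.3 at the computed df; the sketch does not look up critical values.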

The t Test for Correlated Groups

The same people are used in each group (a within-participants design) or different participants are matched between groups (a matched-participants design). P.260 In a correlated-groups design, the sample includes two scores for each person (or matched pair), instead of just one. The null hypothesis is that there is no difference between the two scores. The degrees of freedom for a correlated-groups t test are equal to N - 1. Step 1: We compute a difference score for each person by subtracting one score from the other for that person (or for the two individuals in a matched pair).

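The standard error of the difference scores ($s_{\bar{D}}$) is the standard deviation of the sampling distribution of mean differences between dependent samples:

$s_{\bar{D}} = \frac{s_D}{\sqrt{N}}$, where sD is the unbiased estimator of the standard deviation of the difference scores and N is the number of participants in each group. The obtained t is then $t_{obt} = \frac{\bar{D} - 0}{s_{\bar{D}}}$, where $\bar{D}$ is the mean of the difference scores.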

Effect size: Cohen's d and r2 p.262

$d = \frac{\bar{D}}{s_D}$. The r2 formula is the same as that listed above.
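The correlated-groups procedure above (difference scores, their standard error, then t) can be sketched as follows. The sketch assumes the standard forms: standard error of the difference scores = sD / sqrt(N), t = mean difference / standard error, and d = mean difference / sD. The function name and the paired scores are hypothetical.

```python
import math
from statistics import mean, stdev

def correlated_t(scores_a, scores_b):
    """Correlated-groups t test on paired scores.

    Returns a dict with t, df and Cohen's d. Each position in the two lists
    must belong to the same participant (or the same matched pair).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired designs need the same number of scores")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    d_bar = mean(diffs)          # mean of the difference scores
    s_d = stdev(diffs)           # unbiased SD of the difference scores
    se = s_d / math.sqrt(n)      # standard error of the difference scores
    return {"t": d_bar / se, "df": n - 1, "d": d_bar / s_d}

# Hypothetical recall scores for concrete vs. abstract words, same people.
result = correlated_t([3, 5, 8], [2, 3, 5])
print(round(result["t"], 3), result["df"], result["d"])
```

Because each participant serves as his or her own comparison, only the difference scores enter the computation; the raw score variability between people drops out.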

Confidence interval: e.g. for word memorization differences between concrete and abstract words, we could say that we are 95% confident that the difference in performance on the 20-item memory test between the two word-type conditions is between 0.96 and 4.04 words recalled correctly.

Nonparametric Tests

A nonparametric test does not use any population parameters, such as the mean and standard deviation. Three nonparametric tests: the Wilcoxon rank-sum test and the Wilcoxon matched-pairs signed-ranks T test (both used with ordinal data), and the chi-square test of independence (used with nominal data). P.240

Wilcoxon Rank-Sum Test: p.265 The Wilcoxon rank-sum test is similar to the independent-groups t test; however, it uses ordinal data (rankings) rather than interval-ratio data and compares medians rather than means. Interval or ratio data may be converted to ranked ordinal data. The underlying distribution is not assumed to be normal. First, rank all scores and sum the ranks for the group expected to have the smaller total. This value needs to be equal to or less than the critical value to be statistically significant. Refer to Table A.6, which presents the critical values for one-tailed tests only; n1 (the number of participants in a group) is always the smaller of the two groups. If a two-tailed test is used, the table can be adapted by dividing the alpha level in half. Assumptions of this test: p.266

Wilcoxon Matched-Pairs T Test

This is a nonparametric statistic and is necessary whenever the distribution is skewed (i.e. not normal). P.243 E.g. during the first term, the teacher measures the number of books her students read and ranks them ordinally; during the second term, a rewards program is instituted and the students are again ranked. Is there a statistically significant difference between the number of books read?
The null hypothesis is that the median number of books read does not differ; the alternative hypothesis is that the median number of books read during the rewards program is greater.
Step 1: For each student, compute a difference score (subtract the number of books read in the second term from the number read in the first term); if the program had no effect, we would expect most difference scores to be close to 0.
Step 2: Rank the absolute values of the difference scores. If the two smallest scores have the same numerical value, they are both ranked 1.5 and the next score gets a 3. Note that any difference scores of zero are not ranked and do not figure into the N value.
Step 3: Give each rank the sign of the difference score it represents.
Step 4: Sum the positive ranks and the negative ranks separately. For a two-tailed test, Tobt is equal to the smaller of the two summed ranks (in absolute value). In contrast, Tobt for a one-tailed test is the sum of the signed ranks predicted to be smaller. p.268 As with the Wilcoxon rank-sum test, the obtained value needs to be equal to or less than the critical value to be statistically significant.
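The four steps above can be sketched in a few lines. The function below drops zero differences, ranks the absolute differences with tied values sharing the average rank, and returns the sums of the positive and negative ranks together with the reduced N. The function name and the book counts are hypothetical; critical values would still come from the table.

```python
def signed_rank_sums(first, second):
    """Wilcoxon matched-pairs signed-ranks sums.

    Returns (sum of positive ranks, sum of negative ranks, N), where N
    excludes pairs whose difference score is zero.
    """
    # Step 1: difference scores (first minus second); zeros are dropped.
    diffs = [a - b for a, b in zip(first, second) if a != b]
    # Step 2: rank absolute values; tied values share the average rank.
    ordered = sorted(abs(d) for d in diffs)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Positions i..j-1 hold ranks i+1..j, whose average is (i + 1 + j) / 2.
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    # Steps 3 and 4: attach signs and sum each side (as magnitudes).
    positive = sum(rank_of[abs(d)] for d in diffs if d > 0)
    negative = sum(rank_of[abs(d)] for d in diffs if d < 0)
    return positive, negative, len(diffs)

# Hypothetical books read per student in term 1 and term 2.
term1 = [3, 5, 2, 4, 6]
term2 = [5, 5, 6, 3, 9]
pos, neg, n = signed_rank_sums(term1, term2)
t_two_tailed = min(pos, neg)  # two-tailed T: the smaller summed rank
print(pos, neg, n, t_two_tailed)
```

For a one-tailed test, the caller would pick whichever of the two sums the hypothesis predicts to be smaller instead of taking the minimum.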

Chi-Square Test of Independence

This nonparametric test compares an observed frequency distribution to an expected frequency distribution of two nominal variables. P.245 The difference between the chi-square test of independence and the chi-square goodness-of-fit test (ch. 7) is that the goodness-of-fit test compares how well an observed frequency distribution of one nominal variable fits some expected pattern of frequencies, whereas the test of independence compares how well an observed frequency distribution of two nominal variables fits some expected pattern of frequencies. The degrees of freedom for this test are equal to (r - 1)(c - 1), where r is the number of rows and c is the number of columns.

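Objective: determine whether teenagers who have worked as babysitters are more likely to have taken a first aid class than those who have never worked as babysitters. To determine the expected frequency for each cell:

$E = \frac{RT \times CT}{N}$, where RT is the row total, CT is the column total, and N is the total number of observations. P.246 The obtained value is $\chi^2_{obt} = \sum \frac{(O - E)^2}{E}$, where O is the observed frequency in each cell. If $\chi^2_{obt}$ exceeds $\chi^2_{cv}$, then the null hypothesis can be rejected.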
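As an illustration of the expected-frequency rule above, the sketch below builds the expected table from row and column totals, then sums (O - E)^2 / E over the cells, which is the standard chi-square statistic. The function name and the 2 x 2 table of counts are hypothetical and are not the babysitter data from the text.

```python
def chi_square_independence(observed):
    """Chi-square test of independence for a table of observed counts.

    observed is a list of rows. Returns (chi-square, df, expected table).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected frequency for each cell: (row total * column total) / N.
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

# Hypothetical counts: rows = babysitter / non-babysitter,
# columns = took first aid class / did not.
chi2, df, expected = chi_square_independence([[30, 10], [20, 40]])
print(round(chi2, 2), df)
print(expected)
```

The obtained value would then be compared with the chi-square critical value at the computed df.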

Chi-Square test and effect size: Phi Coefficient As with the t tests discussed earlier in this chapter, we can also compute the effect size for a χ² test of independence.

$\phi = \sqrt{\frac{\chi^2}{N}}$. Cohen's (1988) specifications for the phi coefficient indicate that a phi coefficient of .10 is a small effect, .30 is a medium effect, and .50 is a large effect. In our particular example, if the phi value is small, then the difference observed in whether a teenager had taken a first aid class is not strongly accounted for by being a babysitter. Summary: First consideration: determine whether to use a parametric or a nonparametric statistic. If the data are not normally distributed, use a nonparametric test; likewise, if population parameters such as the mean and standard deviation are not provided, use a nonparametric test (Wilcoxon or chi-square). If the data are normal, use a parametric test, such as the t test. Second consideration: whether a between-participants or a correlated-groups design has been used. P.248 A nonparametric test is one that does not involve the use of any population parameters, such as the mean and standard deviation. In addition, a nonparametric test does not assume a bell-shaped distribution. The χ² test is nonparametric because it fits this definition.
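The two considerations in the summary above amount to a small decision rule, which can be sketched as a lookup. This is a simplification under stated assumptions: the function name, the argument values, and the choice to return the chi-square test of independence for all nominal data are illustrative, and only the two-group tests covered in this chapter are included.

```python
def choose_two_group_test(scale, design, normal=True):
    """Pick a two-group test from this chapter.

    scale:  "nominal", "ordinal" or "interval-ratio"
    design: "between" (between-participants) or "correlated"
    normal: whether the data are roughly normally distributed
    """
    if design not in ("between", "correlated"):
        raise ValueError("design must be 'between' or 'correlated'")
    if scale == "nominal":
        return "chi-square test of independence"
    if scale == "ordinal" or not normal:
        # First consideration: nonparametric when data are ranked or skewed.
        if design == "between":
            return "Wilcoxon rank-sum test"
        return "Wilcoxon matched-pairs signed-ranks T test"
    if scale == "interval-ratio":
        # Second consideration: which design was used.
        if design == "between":
            return "independent-groups t test"
        return "correlated-groups t test"
    raise ValueError("unknown scale of measurement")

print(choose_two_group_test("interval-ratio", "between"))
print(choose_two_group_test("interval-ratio", "correlated", normal=False))
print(choose_two_group_test("nominal", "between"))
```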

Chapter 11: Experimental designs with More than Two Levels of an Independent Variable
The experiments described in Chapter 9 involved manipulating one independent variable with only two levels (aka treatments): either a control group and an experimental group or two experimental groups. Researchers may want more than two levels of an independent variable because they can then compare multiple treatments, e.g. compare a placebo group with the control and experimental groups. P.281 If group 1 is compared to group 2, 2 to 3, 3 to 4, and so on, the probability of a Type I error rises to $1 - (1 - \alpha)^c$, where c equals the number of comparisons performed. One way of counteracting this is to use a more stringent alpha level by performing the Bonferroni adjustment, in which the desired alpha level is divided by the number of tests or comparisons. However, this increases the risk of a Type II error. A better method is to use a single statistical test that compares all groups: ANOVA. ANOVA is an inferential parametric statistical test for comparing the means of three or more groups that have interval or ratio data. P.286 If the data are ordinal, use the Kruskal-Wallis analysis of variance for a between-subjects design; for a within-subjects design, where the data are skewed and/or ordinal, use the Friedman rank test. If the data are nominal, use the chi-square test. If the Fobt value is greater than the Fcrit value, the results of the ANOVA indicate that at least one of the sample means differs significantly from the others. In that case, a post hoc test that compares each of the groups in the study with each of the other groups must be conducted to determine which ones differ significantly from each other, e.g. Tukey's HSD test. p.297 Also, see p.296 for the assumptions of the ANOVA (interval-ratio data, normally distributed, etc.)
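To see how quickly multiple comparisons inflate the Type I error rate, and what the Bonferroni adjustment does about it, consider this small sketch. It assumes independent comparisons and the usual formula 1 - (1 - alpha)^c; the function names are hypothetical.

```python
def familywise_error(alpha, comparisons):
    """Probability of at least one Type I error across c comparisons."""
    return 1 - (1 - alpha) ** comparisons

def bonferroni_alpha(alpha, comparisons):
    """Per-comparison alpha after the Bonferroni adjustment."""
    return alpha / comparisons

# Three pairwise comparisons at alpha = .05.
print(round(familywise_error(0.05, 3), 4))
print(round(bonferroni_alpha(0.05, 3), 4))
```

With the adjusted per-comparison alpha, the overall error rate stays close to the intended .05, at the cost of power, which is the Type II error trade-off noted above.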

One-way randomized ANOVA

A significant ANOVA result (i.e. a significant F value) indicates that at least one of the sample means differs significantly from the others. To determine which means differ significantly from the others, one needs to perform a post hoc test (such as Tukey's HSD). p.297 Assumptions (p.296): data are interval/ratio, normally distributed, observations are independent, etc. The term randomized indicates that participants are randomly assigned to conditions in a between-participants design. The term one-way indicates that the design uses only one independent variable. E.g. rote rehearsal vs. imagery vs. story-telling and their effect on the number of words recalled. This is a design with one independent variable with 3 levels. The null hypothesis is $H_0: \mu_1 = \mu_2 = \mu_3$. The alternative hypothesis is that at least one μ is not equal to another μ. When a researcher rejects H0 using an ANOVA, it means that the independent variable affected the dependent variable to the extent that at least one group mean differs from the others by more than would be expected based on chance. The grand mean is the mean performance across all participants in all conditions. Since none of the participants scored exactly the grand mean, there is variability between conditions. Is this variability due to the independent variable or due to error variance (chance or uncontrolled variables such as individual differences between participants)?
Within-groups variance: an estimate of the population error variance. Error variance can be ascertained from the variability within each condition, because participants within a condition were treated similarly.
Between-groups variance: systematic variance (due either to the effects of the independent variable or to uncontrolled confounding variables) plus error variance.
The F-ratio: $F = \frac{\text{between-groups variance}}{\text{within-groups variance}} = \frac{\text{systematic variance} + \text{error variance}}{\text{error variance}}$

If we assume that the systematic variance is due to the effects of the independent variable, then if the independent variable has a strong effect, the F-ratio will be substantially greater than 1; otherwise it will be around 1. P.264
Step 1: Sum of Squares p.291: Several types of sums of squares (SS) are used in the calculation of an ANOVA; SSwithin + SSbetween = SStotal.
Total sum of squares: $SS_{total} = \sum (X - \bar{X}_G)^2$, the sum of the squared deviations of each score from the grand mean $\bar{X}_G$.
Within-groups sum of squares: $SS_{within} = \sum (X - \bar{X}_g)^2$, where X is each individual score and $\bar{X}_g$ is the mean for each group or condition. This is the sum of the squared deviations of each score from its group or condition mean and is a reflection of the amount of error variance.
Between-groups sum of squares: $SS_{between} = \sum [(\bar{X}_g - \bar{X}_G)^2 n]$. This is the sum of the squared deviations of each group's mean from the grand mean, multiplied by the number of participants (n) in each group. The between-groups variance is an indication of the systematic variance across the groups. The basic idea: if the independent variable has no effect, the group means will be similar to the grand mean, and there will be little variance across conditions.

Step 2: Mean Square (MS) is the mean squared deviation, an estimate of the variance between and within the groups. MSwithin and MSbetween are calculated by dividing each SS by the appropriate df. dftotal = N - 1, where N is the total number of subjects in the study; dfwithin = N - k, where k = number of groups; dfbetween = k - 1. Note that if the dfwithin number is not present in the table at the back, use the next lowest number (because when df values decrease, the critical value increases, which makes this the more conservative choice). p.294

Step 3: Calculate the F-ratio p.293: $F = \frac{MS_{between}}{MS_{within}}$

In APA format, to say that a test with a between-groups df of 2 and a within-groups df of 21 has a value of 11.07 and is significant at the 0.01 level, we write: F(2, 21) = 11.07, p < .01. Increasing any differences between the groups by using stronger controls increases the F value. Decreasing the error variance within groups and increasing group size (and hence dfwithin) also increase the F value.
Effect Size: From Fobt, we know that there was more variability between groups than within groups. However, it would be useful to know how much of the variability in the dependent variable can be attributed to the independent variable. In ANOVA, effect size is estimated using eta-squared:

$\eta^2 = \frac{SS_{between}}{SS_{total}}$. Since SSbetween reflects the differences between the means of the various levels of the independent variable, and SStotal reflects the total differences between all scores in the experiment, $\eta^2$ reflects how much of the variability in the dependent variable (memory) is attributable to the manipulation of the independent variable.
Tukey's Post Hoc Test p.297: Tukey's honestly significant difference (HSD) compares each of the groups in the study with each of the other groups to determine which ones differ significantly from each other. Tukey's test identifies the smallest difference between any two means that is significant with alpha = .05 or alpha = .01.

$HSD = Q(k, df_{within}) \sqrt{\frac{MS_{within}}{n}}$, where k = number of groups and n = number of participants in each group. The value for Q is found in Appendix A.9. A difference of at least the HSD value between two means is necessary to conclude that the difference between those means is greater than would be expected based on chance. If a difference is significant at 0.05, check at 0.01 as well. p.298
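The three ANOVA steps and the follow-up measures above can be sketched together. The sketch assumes the definitional formulas (squared deviations from the grand mean and from group means), eta-squared = SSbetween / SStotal, and HSD = Q * sqrt(MSwithin / n) with equal group sizes. The Q value must be looked up by the reader; the value passed in below is a placeholder, not a table entry, and the function names and scores are hypothetical.

```python
import math
from statistics import mean

def one_way_anova(groups):
    """One-way randomized ANOVA from a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Step 1: sums of squares.
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Step 2: mean squares (each SS divided by its df).
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    # Step 3: F-ratio, plus eta-squared as the effect size.
    return {
        "ss_total": ss_total, "ss_within": ss_within, "ss_between": ss_between,
        "df_between": df_between, "df_within": df_within,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
        "eta_squared": ss_between / ss_total,
    }

def tukey_hsd(q, ms_within, n_per_group):
    """Smallest significant difference between two means (equal n assumed)."""
    return q * math.sqrt(ms_within / n_per_group)

# Hypothetical words recalled under rote, imagery and story conditions.
result = one_way_anova([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(result["F"], result["eta_squared"])
# q = 4.0 is a placeholder; the real value comes from the Q table.
print(round(tukey_hsd(4.0, result["ms_within"], 3), 3))
```

Any pair of group means that differs by at least the HSD value would be declared significantly different at the alpha level used to look up Q.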

Correlated-Group Designs: One-Way Repeated Measures ANOVA p.299 (assumptions p.306)

This ANOVA is used for a within-participants study. All correlated-group designs are more statistically powerful than between-participants designs. Also, fewer participants are needed in a within-participants study, where the same group of participants is subject to the different conditions of the independent variable. The phrase repeated measures refers to the fact that measures are taken repeatedly on the same individuals. The single largest factor contributing to error variance (individual differences across participants) has been removed; the smaller denominator increases the F-ratio. Because there is only one group of participants, what was referred to as the between-groups sum of squares in a randomized ANOVA is now called a between-treatments, or simply a between, sum of squares. SStotal and SSbetween are computed in the same manner as for the one-way randomized ANOVA. However, what was the within-groups sum of squares in the randomized ANOVA is split into two sources of variance in the repeated measures ANOVA: participant (subject) variance, which is attributable to individual differences, and error (residual) variance. Step 1: calculate the within-groups sum of squares, SSwithin, identically to that for the one-way randomized ANOVA: p.301 $SS_{within} = \sum (X - \bar{X}_t)^2$, where X represents each score and $\bar{X}_t$ represents each treatment mean.

Step 2: determine the participant sum of squares, SSparticipants, which indicates within-groups variance due to individual differences: $SS_{participants} = \sum [(\bar{X}_P - \bar{X}_G)^2 k]$, where $\bar{X}_P$ is the mean across treatments for each participant, $\bar{X}_G$ is the grand mean, and k is the number of treatments. After the variability due to individual differences, SSparticipants, has been removed from the within-groups sum of squares, the error sum of squares is left. Thus the error sum of squares, SSerror, equals $SS_{within} - SS_{participants}$.

Step 3: calculate F = MSbetween/MSerror p.302, where MS is the mean square (each SS divided by its df). dftotal = N - 1, where N is the total number of scores in the study; dfparticipants = n - 1, where n is the number of participants (p.304); dfbetween = k - 1, where k is the number of conditions; dferror = dfbetween × dfparticipants. In Table A.8, use dfbetween and dferror to find Fcv. Effect size in the repeated measures ANOVA is calculated similarly to the one-way randomized ANOVA. P.280 Tukey's Post Hoc HSD test: $HSD = Q(k, df_{error}) \sqrt{\frac{MS_{error}}{n}}$
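The partitioning described in the steps above, where the within-groups sum of squares is split into participant and error components, can be sketched as follows. The sketch assumes the definitional formulas SSparticipants = sum of k * (participant mean - grand mean)^2 and SSerror = SSwithin - SSparticipants. The function name and the score matrix are hypothetical.

```python
from statistics import mean

def repeated_measures_anova(scores):
    """One-way repeated measures ANOVA.

    scores[p][t] is the score of participant p under treatment t.
    """
    n = len(scores)       # number of participants
    k = len(scores[0])    # number of treatments
    grand_mean = mean(x for row in scores for x in row)
    treatment_means = [mean(col) for col in zip(*scores)]
    participant_means = [mean(row) for row in scores]

    ss_between = sum(n * (m - grand_mean) ** 2 for m in treatment_means)
    # Step 1: deviations of each score from its treatment mean.
    ss_within = sum(
        (x - treatment_means[t]) ** 2
        for row in scores for t, x in enumerate(row)
    )
    # Step 2: variability due to individual differences, then what is left.
    ss_participants = sum(k * (m - grand_mean) ** 2 for m in participant_means)
    ss_error = ss_within - ss_participants

    # Step 3: F = MSbetween / MSerror.
    df_between = k - 1
    df_participants = n - 1
    df_error = df_between * df_participants
    ms_between = ss_between / df_between
    ms_error = ss_error / df_error
    return {
        "ss_between": ss_between, "ss_participants": ss_participants,
        "ss_error": ss_error, "df_between": df_between,
        "df_error": df_error, "F": ms_between / ms_error,
    }

# Hypothetical scores: 3 participants (rows) under 3 treatments (columns).
result = repeated_measures_anova([[1, 2, 6], [2, 4, 7], [3, 3, 8]])
print(round(result["ss_error"], 3), result["df_error"], round(result["F"], 2))
```

Because the participant variability is subtracted out before forming the error term, the denominator of F is smaller than it would be in a randomized ANOVA on the same scores.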

Chapter 12: Complex Experimental Designs p.316

In the previous chapter, we discussed designs with more than two levels of an independent variable. In this chapter, we will look at designs with more than one independent variable: factorial designs. P.316 A complete factorial design is one in which all levels of each independent variable are paired with all levels of every other independent variable. In an incomplete factorial design, not all levels are paired with all levels of every other variable. The factorial notation for a factorial design is determined as follows: number of levels of independent variable 1 X number of levels of independent variable 2 X number of levels of independent variable 3, and so on.

Thus, a 3 X 6 factorial design is one with two independent variables, the first one of which has 3 levels and the second one, 6 levels, for a total of 18 possible conditions. It is not possible to have a 1 X 3 factorial design. A main effect is an effect of a single independent variable. The main effect of each independent variable tells us about the relationship between that single independent variable and the dependent variable. In other words, do different levels of one independent variable bring about changes in the dependent variable? For example, in a study about the effects of different rehearsal types (rote, imagery) and different word types (concrete, abstract) on memory, the first two are the independent variables, and memory is the dependent variable. p.317 There can be as many main effects as there are independent variables. An interaction effect is the effect of each independent variable across the levels of the other independent variable. The relationship can be graphed. The dependent variable always goes on the y-axis. One independent variable is placed on the x-axis, and the levels of the other independent variable are captioned in the graph. P.294 Possible outcomes of a 2 X 2 factorial design are Main effect of A? Main Effect of B? Interaction Effect? So 2*2*2 = 8 possible outcomes (p.296). Question p.322: How many main effect(s) and interaction effect(s) are possible in a 4 X 6 factorial design? A 4 X 6 factorial design has two independent variables. Thus, there is the possibility of two main effects (one for each independent variable) and one interaction effect (the interaction between the two independent variables). Two-Way ANOVA p.323 For the factorial designs discussed in this chapter, a two-way ANOVA would be used. The term two-way indicates that there are two independent variables in the study. 
As with one-way ANOVA, if either of the variables has an effect, the variance between the groups should be greater than the variance within the groups. In a 2 X 2 factorial design, such as the one we have been looking at in this chapter, there are three null and alternative hypotheses. The null hypothesis for factor A states that there is no main effect for factor A, and the alternative hypothesis states that there is an effect of factor A. A second null hypothesis states that there is no main effect for factor B. The third null hypothesis states that there is no interaction of factors A and B.
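The factorial notation and the counting of possible outcomes described above can be illustrated with a short sketch. It enumerates the cells of a complete factorial design and the possible yes/no patterns of the three effects in a two-factor design. The function name and level labels are hypothetical.

```python
from itertools import product

def factorial_conditions(*factors):
    """All cells of a complete factorial design (every level paired with every other)."""
    return list(product(*factors))

# 2 X 2 design: rehearsal type by word type.
cells = factorial_conditions(["rote", "imagery"], ["concrete", "abstract"])
print(cells)

# A 3 X 6 design has 3 * 6 conditions.
print(len(factorial_conditions(range(3), range(6))))

# Main effect of A? Main effect of B? Interaction? Each is present or absent.
outcomes = list(product([True, False], repeat=3))
print(len(outcomes))
```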

Step 1: Calculate SStotal. This is calculated in the same manner as in one-way ANOVA. The dftotal also is the same: N - 1.
Step 2: Calculate SSA. p.325 This is the sum of the squared deviations of each factor A group mean from the grand mean, multiplied by the number of scores in each factor A condition (column). The definitional formula is: $SS_A = \sum [(\bar{X}_A - \bar{X}_G)^2 n_A]$, where $\bar{X}_A$ is the mean for each condition of factor A, $\bar{X}_G$ is the grand mean, and $n_A$ is the number of people in each of the factor A conditions. dfA = the number of levels of factor A minus 1. P.325 SSB is calculated similarly.
Step 3: Calculate the sum of squares interaction (SSA X B): $SS_{A \times B} = \sum [(\bar{X}_C - \bar{X}_G)^2 n_C] - SS_A - SS_B$, where $\bar{X}_C$ is the mean for each condition (cell), $\bar{X}_G$ is the grand mean, and $n_C$ is the number of scores in each condition or cell. The degrees of freedom for the interaction are based on the number of conditions in the study. To determine the degrees of freedom across the conditions, we multiply the degrees of freedom for the factors involved in the interaction: dfA X B = (A - 1)(B - 1). p.327
Step 4: Calculate the sum of squares error (SSError): the sum of the squared deviations of each score from its condition (cell) mean: $SS_{Error} = \sum (X - \bar{X}_C)^2$. dfError is calculated as follows: the number of conditions in the study is multiplied by the number of participants in each condition minus the one score not free to vary, or AB(n - 1). P.303 In this notation, A = number of conditions in factor A (e.g. concrete vs. abstract), B = number of conditions in factor B (e.g. rote vs. imagery), and n = number of participants in each condition.

To determine the Fcritical value in Table A.8, we use dferror running down the left side of the table and dfbetween running across the top of the table. p.329 However, note that there are three dfbetween values and thus three Fcv values. For factor A, dfbetween is dfA; for factor B, dfbetween is dfB; for the interaction, dfbetween is dfinteraction. If FA is significant, this means that there was a significant main effect for factor A. Note that Tukey's post hoc test needs to be completed only if either or both of the independent variables have more than two levels (assuming that the main effects are significant to begin with). E.g. in a 2 X 6 factorial design for which both main effects are significant, the post hoc test needs to be calculated only for the independent variable that has six levels, to determine which pairs of these six levels differ significantly. p.331 Eta-squared = SSbetween/SStotal; here SSbetween equals SSA, SSB, and SSA X B, respectively. p.331
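As a worked illustration of the two-way ANOVA steps, the following sketch computes the sums of squares, the three F values and the three eta-squared values for a design with equal cell sizes. It assumes the definitional formulas, with the interaction obtained as the between-cells sum of squares minus SSA and SSB. The function name, the dictionary layout and the scores are hypothetical.

```python
from statistics import mean

def two_way_anova(cells):
    """Two-way ANOVA; cells maps (a_level, b_level) -> list of scores.

    Equal numbers of scores per cell are assumed.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    all_scores = [x for scores in cells.values() for x in scores]
    grand_mean = mean(all_scores)
    n = len(next(iter(cells.values())))  # scores per cell

    def ss_factor(levels, index):
        # Squared deviation of each level mean from the grand mean,
        # weighted by the number of scores at that level.
        total = 0.0
        for level in levels:
            scores = [x for key, s in cells.items() if key[index] == level for x in s]
            total += len(scores) * (mean(scores) - grand_mean) ** 2
        return total

    ss = {"A": ss_factor(a_levels, 0), "B": ss_factor(b_levels, 1)}
    ss_cells = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in cells.values())
    ss["AxB"] = ss_cells - ss["A"] - ss["B"]
    ss_error = sum((x - mean(s)) ** 2 for s in cells.values() for x in s)
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

    df = {"A": len(a_levels) - 1, "B": len(b_levels) - 1}
    df["AxB"] = df["A"] * df["B"]
    df_error = len(a_levels) * len(b_levels) * (n - 1)
    ms_error = ss_error / df_error

    f = {name: (ss[name] / df[name]) / ms_error for name in ss}
    eta_squared = {name: ss[name] / ss_total for name in ss}
    return {"ss": ss, "ss_error": ss_error, "ss_total": ss_total,
            "df": df, "df_error": df_error, "F": f, "eta_squared": eta_squared}

# Hypothetical recall scores: factor A = word type, factor B = rehearsal type.
data = {
    ("concrete", "rote"): [1, 3],
    ("concrete", "imagery"): [3, 5],
    ("abstract", "rote"): [5, 7],
    ("abstract", "imagery"): [11, 13],
}
result = two_way_anova(data)
print(result["F"])
print(result["df"], result["df_error"])
```

Each of the three F values would be compared with its own critical value, found with the matching dfbetween and the shared dferror.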

Chapter 13: Quasi-Experimental and Single-Case Designs

Non-manipulated independent variables (aka participant variables, e.g. gender, age, ethnicity, political affiliation): as with experimental studies, groups are compared and hypotheses regarding causality are tested; however, the participants are not assigned randomly and the groups occur naturally. (p.345)

Single-group posttest-only design: involves the use of a single group of participants to whom some treatment is given. There is neither a comparison group nor a comparison of the results to any previous measurements. The single-group pretest/posttest design is an improvement over the posttest-only design in that measures are taken twice: before the treatment and after the treatment. The single-group time-series design involves using a single group of participants, taking multiple measures over a period of time before introducing the treatment, and then continuing to take several measures after the treatment. The nonequivalent control group posttest-only design is similar to the single-group posttest-only design; however, a nonequivalent control group is added as a comparison group. Nonequivalent means that group membership is not random, but already established. Thus, the differences observed between the two groups on the dependent variable may be due to the nonequivalence of the groups and not to the treatment. P.323 An improvement over the previous design involves the addition of a pretest measure, making it a nonequivalent control group pretest/posttest design. A pretest allows us to assess whether the groups are equivalent on the dependent measure before the treatment is given to the experimental group. The logical extension of the previous design is to take more than one pretest and posttest. In a multiple-group time-series design, several measures are taken on nonequivalent groups before and after treatment. Internal validity is the extent to which the results of an experiment can be attributed to the manipulation of the independent variable, rather than to some confounding variable. Because participants are not randomly assigned, quasi-experimental designs have weaker internal validity than true experiments.
p.325 Statistical Analysis: Depending on the type of data (nominal, ordinal, or interval-ratio), the number of levels of the independent variable, the number of independent variables, and whether the design is between-participants or within-participants, we choose the appropriate statistic as we did for the experimental designs.
Cross-sectional Designs p.352: Researchers study individuals of different ages at the same time. The advantage of this design is that a wide variety of ages can be studied in a short period of time. The main issue is that the researcher is typically attempting to determine whether or not there are differences across different ages; however, the researcher tests not only individuals of different ages but also individuals who were born at different times and raised in different generations or cohorts, so rather than testing age differences, the study may be testing generational differences.
Longitudinal Design: With a longitudinal design, the same participants are studied repeatedly over a period of time. Disadvantage: people who drop out (attrition) may differ from those who remain in the study.
Sequential Designs: A researcher begins with participants of different ages (a cross-sectional design) and tests or measures them. Then, a number of months or years later, the researcher retests or measures the same individuals (a longitudinal design). P.352
Single-Case Research: versions of a within-participants experiment in which only one person is measured repeatedly. Often the research is replicated on one or two other participants. Thus, we sometimes refer to these studies as small-n designs. A reversal design is a within-participants design with only one participant in which the independent variable is introduced and removed one or more times.
o An ABA reversal design involves taking baseline measures (A), introducing the independent variable (B) and measuring behavior again, and then removing the independent variable and retaking the baseline measures (A). The reversal controls for confounds that may be changing the dependent variable.
o The ABAB reversal design involves reintroducing the independent variable after the second baseline measurement.
Multiple-baseline designs: Because single-case designs are a type of within-participants design, carryover effects from one condition to another are of concern.

Multiple baselines across participants: Here we assess the effect of introducing the treatment over multiple participants, behaviors, or situations. We control for confounds not by reversing back to baseline after treatment, as in a reversal design, but by introducing the treatment at different times across different people, behaviors, or situations. P.331 This reduces the possibility that some other extraneous variable produced the results.
Multiple baselines across behaviors: An alternative multiple-baseline design uses only one participant and assesses the effects of introducing a treatment over several behaviors. E.g. first introduce the treatment for aggressive behaviors, then days later for talking out of turn, then days later for temper tantrums.
Multiple baselines across situations: Introduce the treatment across different situations. E.g. treat first for bad behavior in math class, then days later for bad behavior in English class. Introducing the treatment at different times in the two classes minimizes the possibility that a confounding variable is responsible for the behavior change.