
CHAPTER 9 - Design

 An experimental design details an experimenter’s plan for testing a hypothesis.
 It is the experiment’s structure or floor plan—not the experiment’s specific content.
 We can use the same design to investigate different hypotheses.

The experimental design is largely determined by the experimental hypothesis.

A researcher mainly selects an experimental design on the basis of three factors:
1. the number of independent variables in the hypothesis
2. the number of treatment conditions needed to fairly test the hypothesis
3. whether the same subjects are used in each of the conditions

In a between-subjects design, a subject participates in only one condition of the experiment.

The representativeness of our sample determines whether we can generalize our results to the entire population from which the sample was drawn.

Random sampling increases an experiment’s external validity.

You should have at least 10-20 subjects in each treatment condition to detect a strong treatment effect. Fewer subjects in each condition risks not detecting the effect of the IV on the DV.

Effect size
 a statistical estimate of the size or magnitude of a treatment effect.
 determines the number of subjects required to detect a treatment effect.

The larger the effect size, the stronger the relationship between the independent and dependent variables, and the fewer subjects needed to detect a treatment effect.

Researchers determine the number of subjects required for an expected effect size using power charts or programs that incorporate these charts.

Two-group design
 involves the creation of two separate groups of subjects.
 two independent groups design
- a design where there is one IV with two levels and subjects are randomly assigned to one of the two conditions.
- this design includes the Experimental Group-Control Group design and the Two-Experimental Groups design.
- this design is appropriate if there is one independent variable with two levels and if we can assume that randomization will control extraneous variables.
 two matched groups design
- 1. match participants on a subject variable correlated with the DV
- 2. randomly assign them to one of two treatment conditions
- matching is used to create groups that are equivalent on potentially confounding subject variables.
- successful matching prevents the selection threat from undermining internal validity.
- use a two matched groups design when there are two levels of an independent variable and there is an extraneous variable we can measure that could affect the dependent variable.

Random assignment
 involves assigning subjects to conditions so that each subject has an equal chance of participating in each condition.
 we use random assignment to equally distribute subject variables between the treatment groups to prevent them from confounding an experiment.

An experimental condition presents a value of the independent variable. A control condition presents a zero level of the independent variable.

Experimental Group-Control Group design
 this is a two independent groups design.
 the experimental group receives a level of the IV and the control group receives the same procedures, but receives no treatment.

Two-Experimental Groups design
 in a two experimental groups design, we assign subjects to one of two levels of the independent variable.
 this design is appropriate if there is one independent variable with two levels and if we can assume that randomization will control extraneous variables.

What limits the effectiveness of randomly assigning subjects to different conditions?

Random assignment works poorly with 5-10 subjects per condition. Since people often differ on many subject variables that could potentially confound your experiment, random assignment may not control all of them.

What must we measure to form matched groups?

We need to measure an extraneous variable, strongly correlated with the dependent variable, that could confound our results if not controlled.

Multiple groups design
 a between-subjects design with more than two levels of an independent variable.

Multiple independent groups design
 we randomly assign subjects to one of the treatment conditions.

Block randomization is a process for randomly assigning equal numbers of subjects to conditions.
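The assignment procedures above can be sketched in a few lines of Python (a minimal illustration, not from the chapter; the subject lists and the matching score are hypothetical):

```python
import random

def random_assignment(subjects, n_conditions=2, seed=None):
    """Random assignment: shuffle the pool so each subject has an equal
    chance of each condition, then deal subjects out round-robin so the
    groups end up with (nearly) equal numbers."""
    rng = random.Random(seed)
    pool = list(subjects)
    rng.shuffle(pool)
    groups = {c: [] for c in range(n_conditions)}
    for i, s in enumerate(pool):
        groups[i % n_conditions].append(s)
    return groups

def two_matched_groups(subjects, score, seed=None):
    """Two matched groups design: (1) rank subjects on a subject variable
    correlated with the DV, (2) within each successive matched pair,
    randomly assign one member to each of the two conditions."""
    rng = random.Random(seed)
    ranked = sorted(subjects, key=score)
    group_a, group_b = [], []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)          # coin flip within the matched pair
        group_a.append(pair[0])
        group_b.append(pair[1])
    return group_a, group_b
```

With 20 subjects and two conditions, `random_assignment` yields two groups of 10; `two_matched_groups` guarantees that each pair of subjects adjacent in the ranking contributes one member to each group.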
The experimenter creates random sequences of each experimental condition and subjects are randomly assigned to fill each treatment block.

How should a researcher choose the number of treatments?

The hypothesis, prior research, pilot study results, and practical limits can all help determine the number of treatments. A researcher needs to answer the question: “What will I gain by adding these extra conditions to the experiment?”

A pilot study can reveal whether:
▪ you have allocated sufficient time
▪ your instructions are clear
▪ your deception worked
▪ you need additional treatment conditions

A factor is an independent variable.

A factorial design contains more than one independent variable. A two-factor experiment is the simplest factorial design.

A factorial design can provide information about both treatment and interaction effects.

A main effect is the action of a single IV on the DV. There can be as many main effects as independent variables.

An experimenter studies the effects of exercise intensity (IV1) and duration (IV2) on depression (DV). If exercise intensity or duration separately reduced depression, these would constitute main effects.

How do we determine whether we have main effects in our experiment?

Perform an appropriate statistical test.

In a 2 x 3 x 3 study, how many IVs and treatment conditions are there?

There are 3 independent variables and 18 treatment conditions.

The independent variables were the perpetrator’s gender (male or female), relationship to the child (parent, step-parent, or parent’s partner), and severity of the abuse (neurological damage, broken bones, or bruising). The dependent variable was sentence length.

An interaction is the joint effect of two or more IVs on the DV. When there is an interaction, the effect of one IV is different across levels of the other IV.

Example: If the antidepressant Paxil produced greater reductions in depression in the Cognitive Behavior Therapy (CBT) condition than in the Waiting List condition, this would illustrate an interaction between drug and psychotherapy.

A higher-order interaction is an interaction among three or more IVs. Interpretation can be difficult when more than three IVs interact in an experiment.

Example: A previous hypothetical study examined the effect of a perpetrator’s gender (male or female), relationship to the child (parent, step-parent, or parent’s partner), and severity of the abuse (neurological damage, broken bones, or bruising) on sentencing. There would be a higher-order interaction if the perpetrator’s gender, relationship to the child, and severity of abuse jointly determined sentence length.

How many interactions are possible in a study with three IVs?

Assign letters (A, B, C) to the independent variables. Identify all unique two- and three-treatment combinations. For three independent variables, these include AB, AC, BC, and ABC. ABC is the higher-order interaction.

How does an interaction affect the interpretation of our results?

An interaction qualifies a main effect, warning us that there may be limits or exceptions to the effect of an IV on the DV. When there is an interaction, we must consider both IVs, because the effects of one factor will depend on the levels of the other factor.

The factor-labeling method lists the two factors in parentheses after the numerical notation. For example, 2 x 2 (Type of Name x Length of Name).

The factor and levels method lists the two factors and their respective levels after the numerical notation. For example, 2 x 2 (Type of Name: given, nickname x Length of Name: short, long). The factor and levels method provides more detailed information about the design than the factor-labeling method.

Why use a factorial design instead of two separate univariate experiments?

A factorial design is more efficient since it combines several one-factor experiments and allows us to study interactions. A factorial design can achieve greater external validity since it can better recreate the complexity of the multivariate real world.

Why should we keep between-subjects designs simple?

Practical limitations include:
▪ number of subjects
▪ time
▪ interpretability of results
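The letter-assignment procedure for counting effects, and the 2 x 3 x 3 condition count, can both be sketched in Python (a minimal illustration, not from the chapter):

```python
from itertools import combinations
from math import prod

def possible_effects(factors):
    """List every main effect and interaction for a factorial design.
    For factors A, B, C this yields the main effects A, B, C, the
    two-way interactions AB, AC, BC, and the higher-order interaction ABC."""
    effects = []
    for k in range(1, len(factors) + 1):
        effects += ["".join(c) for c in combinations(factors, k)]
    return effects

def n_treatment_conditions(levels):
    """The number of treatment conditions is the product of the number of
    levels of each factor, e.g. a 2 x 3 x 3 design has 18 conditions."""
    return prod(levels)
```

`possible_effects("ABC")` lists three main effects plus the AB, AC, BC, and ABC interactions; `n_treatment_conditions([2, 3, 3])` returns 18, matching the worked example above.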
Describing Main Effects and Interactions

• In a 2x2 design there are THREE possible effects:

– A main effect of IV(A)
– A main effect of IV(B)
– An IV(A) x IV(B) interaction

• You need to describe each in English.

[Graphs, not reproduced here, illustrate possible outcome patterns for a 2x2 (Word Type x Rehearsal Type) example:]

– No Main Effect of Word Type; No Main Effect of Rehearsal Type; No Interaction
– Main Effect of Word Type; Main Effect of Rehearsal Type; No Interaction
– Main Effect of Word Type (line is on a diagonal); No Main Effect of Rehearsal Type; No Interaction
– Main Effect of Word Type; Main Effect of Rehearsal Type; Interaction (lines are not parallel)
– No Main Effect of Word Type; Main Effect of Rehearsal Type (space between lines); No Interaction
– Main Effect of Word Type; No Main Effect of Rehearsal Type
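The patterns above can be read off a table of cell means (a minimal sketch, not from the chapter: it treats the means as exact, whereas real data would need a statistical test, as the text notes; main effects are differences between marginal means, and an interaction is non-parallel lines):

```python
def describe_2x2(means):
    """Describe main effects and the interaction in a 2 x 2 design.

    means[i][j] is the cell mean for level i of factor A (e.g. Word Type)
    and level j of factor B (e.g. Rehearsal Type)."""
    (m00, m01), (m10, m11) = means
    a_effect = (m00 + m01) / 2 - (m10 + m11) / 2   # difference of row marginals
    b_effect = (m00 + m10) / 2 - (m01 + m11) / 2   # difference of column marginals
    interaction = (m00 - m01) - (m10 - m11)        # non-parallelism of the lines
    return {
        "main effect of A": a_effect != 0,
        "main effect of B": b_effect != 0,
        "interaction": interaction != 0,
    }
```

Parallel lines (e.g. cell means 10/20 and 30/40) give two main effects and no interaction; a crossover (10/20 and 20/10) gives an interaction with no main effects.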
Statistical power is desirable when it allows us to detect practically significant differences between the experimental conditions.

Theoretically, there is a point of diminishing returns where excessive power detects meaningless differences between treatment conditions.

For example, in a study of treatments to lower blood pressure, a difference of 0.1 mm Hg—while statistically significant—would not affect patient health or life expectancy.
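The diminishing-returns point can be made concrete with a little arithmetic (a sketch with assumed numbers: the 0.1 mm Hg difference is from the text, but the standard deviation of 10 mm Hg and the group sizes are hypothetical):

```python
from math import sqrt

def z_for_mean_difference(diff, sd, n_per_group):
    """z statistic for a difference between two independent group means,
    assuming a known standard deviation `sd` in both groups."""
    return diff / (sd * sqrt(2.0 / n_per_group))

# With 100 subjects per group, a 0.1 mm Hg difference is nowhere near
# the 1.96 two-tailed criterion (z is about 0.07). With a million
# subjects per group, the same clinically meaningless difference
# becomes "significant" (z is about 7.1).
small_n = z_for_mean_difference(0.1, 10.0, 100)
huge_n = z_for_mean_difference(0.1, 10.0, 1_000_000)
```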

Why do we call this approach a repeated-measures design?

In a within-subjects experiment, researchers measure subjects on the dependent variable after each treatment.

Basic principles of a within-subjects design

 Subjects participate in more than one treatment condition and serve as their own control.
 We compare their performance on the dependent variable across conditions to determine whether there is a treatment effect.

A within-subjects factorial design assigns subjects to all levels of two or more independent variables.

A mixed design is an experiment where there is at least one between-subjects and one within-subjects variable.

What are the advantages of within-subjects designs?
▪ use fewer subjects
▪ save time on training
▪ greater statistical power
▪ more complete record of subjects’ performance

Disadvantages:
▪ subjects participate longer
▪ resetting equipment may consume time
▪ treatment conditions may interfere with each other
▪ treatment order may confound results

[Two further graph captions from this page: (1) No Main Effect of Word Type; Main Effect of Rehearsal Type; Interaction. (2) No Main Effect of Word Type; No Main Effect of Rehearsal Type; Interaction.]

We cannot use a within-subjects design when one treatment condition precludes another due to interference.

Order effects are positive (practice) and negative (fatigue) performance changes due to a condition’s position in a series of treatments.

The term progressive error encompasses both positive and negative order effects.

Counterbalancing is a method of controlling order effects by distributing progressive error across different treatment conditions.

Two major counterbalancing strategies are:

Subject-by-subject counterbalancing, which controls progressive error for each subject.

Across-subjects counterbalancing, which distributes progressive error across all subjects.

Partial counterbalancing is a form of across-subjects counterbalancing, where we present only some of the possible (N!) orders. Two partial counterbalancing techniques are randomized partial and Latin square counterbalancing.

In a within-subjects experiment, subjects are assigned to more than one treatment condition.

Power is an experiment’s ability to detect the independent variable’s effect on the dependent variable.

A fatigue effect is a form of progressive error where performance declines on the DV due to tiredness, boredom, or irritation.

Subject performance on the dependent variable may improve across the conditions of a within-subjects experiment; these positive changes are called practice effects. Practice effects may be due to relaxation, increased familiarity with the equipment or task, development of problem-solving strategies, or discovery of the purpose of the experiment.

We cannot eliminate order effects because there is an order as soon as we present two or more treatments.

A within-subjects design is usually preferable when you need to control large individual differences or have a small number of subjects. However, it may not be feasible if the experiment is long or there is a risk of asymmetrical carryover.
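Two of the counterbalancing schemes named above can be generated mechanically (a minimal sketch, not from the chapter; note that the simple cyclic Latin square shown here does not balance immediate sequence effects the way a balanced Latin square would):

```python
def reverse_counterbalance(conditions):
    """Subject-by-subject counterbalancing: present the treatments and then
    their mirror image (e.g. A B B A), so that linear progressive error is
    equated across conditions."""
    return list(conditions) + list(reversed(conditions))

def latin_square(conditions):
    """A simple cyclic Latin square for partial counterbalancing: each
    condition appears exactly once per row (one order per subject) and
    exactly once per ordinal position (column)."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)]
            for row in range(n)]
```

For conditions A and B, `reverse_counterbalance` yields the ABBA sequence discussed below; for A, B, C the Latin square supplies three of the 3! = 6 possible orders while keeping every condition balanced across serial positions.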

Holding order constant - always assigning subjects to the sequence ABC - would confound the experiment.

Subject-by-subject counterbalancing controls progressive error for each subject by presenting all treatment conditions more than once. Two subject-by-subject counterbalancing techniques are reverse counterbalancing and block randomization.

In reverse counterbalancing, we administer treatments twice in a mirror-image sequence, for example, ABBA.

When progressive error is linear, it changes at a constant rate across the experiment, so that A and B accumulate the same amount of progressive error. Nonlinear progressive error, which can be curvilinear (inverted-U) or non-monotonic (changes direction), cannot be graphed as a straight line.

Reverse counterbalancing only controls for linear progressive error. When progressive error does not increase in a straight line, this method actually confounds the experiment.

Block randomization is a subject-by-subject counterbalancing technique where researchers assign each subject to several complete blocks of treatments. A block consists of a random sequence of all treatments, so that each block presents the treatments in a different order.

Since subject-by-subject counterbalancing presents each treatment several times, this can result in long-duration, expensive, or boring procedures. This problem is compounded as the experimenter increases the number of treatments.

Across-subjects counterbalancing techniques present each treatment once and control progressive error by distributing it across all subjects. Two techniques are complete and partial counterbalancing.

Complete counterbalancing uses all possible treatment sequences an equal number of times. Researchers randomly assign each subject to one of these sequences.

CHAPTER 12

A large N design compares the performance of groups of subjects.

A small N design studies one or two subjects, often using variations of the ABA reversal design.

Aggregate effects are the pooled findings from many subjects.

Critics argue that large N studies ignore individual subject responses to the IV and instead report aggregate results or trends. When subjects vary greatly in their response to the IV, this can create the appearance of no difference between the groups.

A clinical psychologist could use a small N design to test a treatment when there are insufficient subjects to conduct a large N study and when s/he wants to avoid the ethical problem of an untreated control group.

Animal researchers prefer small N designs to minimize the acquisition and maintenance cost, training time, and possible sacrifice of their animal subjects.

Sir Ronald Fisher’s (1935) creation of the analysis of variance allowed inferential testing of large N data.

Small N designs have been most extensively used in operant conditioning research. B. F. Skinner examined the continuous behavior of individual subjects in preference to analyzing discrete measurements from separate groups of subjects.

Single-subject research is a group of research methods that are used extensively in the experimental analysis of behavior and applied behavior analysis with both human and non-human participants.

An A-B design is a two-phase design composed of a baseline (“A” phase) with no changes, and a treatment or intervention (“B”) phase.

However, many interventions cannot be reversed, some for ethical reasons (e.g., involving self-injurious behavior, smoking) and some for practical reasons (they cannot be unlearned, like a skill).

In both large and small N designs, baselines are control conditions that allow us to measure behavior without the influence of the IV.
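Complete counterbalancing and block randomization, described above, can be sketched in Python (a minimal illustration, not from the chapter):

```python
import random
from itertools import permutations

def complete_counterbalancing(conditions):
    """Across-subjects complete counterbalancing: enumerate all N! possible
    treatment orders; each order is then used an equal number of times."""
    return [list(p) for p in permutations(conditions)]

def block_randomized_orders(conditions, n_blocks, seed=None):
    """Subject-by-subject block randomization: one subject receives several
    complete blocks, each block a fresh random order of all treatments."""
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        block = list(conditions)
        rng.shuffle(block)
        blocks.append(block)
    return blocks
```

For three conditions, `complete_counterbalancing` yields the 3! = 6 sequences; `block_randomized_orders` shows why N! grows so fast that partial counterbalancing becomes attractive as treatments are added.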
How did Kazdin explain the decision of many clinical researchers to end without a return to baseline?

It would be ethically indefensible to cause a patient to relapse by returning to baseline after treatment appeared to improve behavior.

When is this most important?

When relapse threatens the health or safety of the patient or others, as in self-injurious, suicidal, or homicidal behavior.

What price do researchers pay when they can’t return to baseline?

They can’t rule out the possibility that the patient’s clinical improvement was caused by an extraneous variable.

In a multiple baseline design, a series of baselines and treatments are compared within the same subject, and once treatments are administered, they are not withdrawn. This approach could also be used to evaluate the effect of a treatment administered to different individuals after baselines of different lengths. A researcher can evaluate the effects of a treatment on two or more behaviors or on the same behavior in different settings.

The A-B-A-B design represents an attempt to measure a baseline (the first A), a treatment measurement (the first B), the withdrawal of treatment (the second A), and the re-introduction of treatment (the second B). In other words, the A-B-A-B design involves two parts: (1) gathering baseline information, then applying a treatment and measuring its effects; and (2) measuring a return to baseline, or what happens when the treatment is removed, and then applying the treatment again and measuring the change.

In a multiple baseline design, an experimenter never withdraws treatments after administering them.

Researchers often visually inspect changes in the dependent variable across treatment conditions. The independent variable’s effect is often apparent. They may also use statistics to analyze small N data.

Critics are concerned about generalizing from a single subject to a population. Unless 50 measurements are taken during each baseline and treatment phase, important assumptions underlying inferential tests may be violated.

In changing criterion designs, the criteria for reinforcement are incrementally increased as participants succeed. For example, initially, a subject might receive a reward for 30 minutes of daily exercise, later for 45 minutes, and finally for 60 minutes.

Reinforcement for successive approximations of the target behavior is central to athletic training, behavior modification, and biofeedback and neurofeedback.

A discrete trials design is a small N design without baselines used in psychophysical research. Instead, the impact of different levels of the independent variable is averaged across 100s to 1000s of trials. A discrete trials design has no baselines and administers the levels of the independent variable 100s to 1000s of times to each subject.

The large number of data points produced by 100s to 1000s of trials provides a very reliable measurement of the effect of the independent variable. The similarity of human sensory systems allows researchers to generalize from a small number of subjects.

When is a small N design appropriate?

When studying a clinical subject (a self-injurious child) or when very few subjects are available.

When would we prefer a large N design?

A large N design would be desirable when we have sufficient subjects and want to increase generalizability.

The generalizability of a large N study depends on how we select our sample, since a seriously biased sample will not represent the population. The generalizability of a small N study depends on repeated successful replications with different subjects.

If a large N study’s sample is biased, we will be unable to generalize its findings to a larger population. Also, if it is poorly controlled, there will be no valid findings to generalize. In contrast, a well-controlled small N experiment using a single subject might be successfully replicated across sufficient subjects to generalize its results to the population from which they were drawn.

Statistics are quantitative measurements of samples.

Descriptive statistics describe sample central tendency and variability.

Inferential statistics allow us to draw conclusions about a parent population from a sample.

What point does the Ms. Adams story make about evaluating experimental data?

Just as Detective Katz can at best show that Ms. Adams is probably guilty, in statistics we can only state that the independent variable probably affected the dependent variable. While we cannot prove that the independent variable definitely caused the change in the dependent variable, we can state the probability that our conclusion is correct.

A population is a set of people, animals, or objects that share at least one characteristic in common (like college sophomores).

A sample is a subset of the population that we use to draw inferences about the population.
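The population/sample/descriptive-statistics distinction can be sketched with Python's standard library (a toy illustration; the population of 10,000 scores with mean 100 and SD 15 is entirely hypothetical):

```python
import random
import statistics

# Hypothetical population: 10,000 scores sharing one characteristic.
rng = random.Random(42)
population = [rng.gauss(100, 15) for _ in range(10_000)]

# A sample is a subset of the population used to draw inferences about it.
sample = rng.sample(population, 50)

# Descriptive statistics summarize the sample's central tendency
# and variability.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)
```

Inferential statistics would go one step further and use `sample_mean` and `sample_sd` to make probabilistic statements about the parent population.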
Statistical inference is the process by which we make statements about a parent population based on a sample.

The differences in scores obtained from separate treatment groups are not significantly greater than what we might expect between any samples randomly drawn from this population. When researchers report this outcome, it means that there was no treatment effect.

For a set of dependent variable measurements, there is variability when the scores are different. Variability “spreads out” a sample of scores drawn from a population.

The null hypothesis (H0) is the statement that the scores came from the same population and the independent variable did not significantly affect the dependent variable.

Results are statistically significant when the difference between our treatment groups exceeds the normal variability of scores on the dependent variable. Statistical significance means that there is a treatment effect at an alpha level we have preselected, like .01 or .05.

The alternative hypothesis (H1) is the statement that the scores came from different populations: the independent variable significantly affected the dependent variable.

We may reject the null hypothesis when the differences between treatment groups exceed the normal variability in the dependent variable at our chosen level of significance.

The frequency distribution displays the number of individuals contributing a specific value of the dependent variable in a sample. The values of the dependent variable are indicated on the horizontal X-axis (abscissa) and the frequencies of these values are indicated on the vertical Y-axis (ordinate). You can calculate the total number of participants by adding the frequencies.

The decision to accept or reject the null hypothesis depends on whether the differences we measure between treatment groups are significantly greater than the normal variability among people in the population. The greater the normal variability in the population, the larger the difference between groups required to reject the null hypothesis.

A directional hypothesis predicts the “direction” of the difference between two groups on the dependent variable. For example: The experimental group will lower their systolic blood pressure more than the control group.

A nondirectional hypothesis predicts that the two groups will have different values on the dependent variable. For example: The experimental group and control group will achieve different systolic blood pressure reductions.

The significance level (alpha) is our criterion for deciding whether to accept or reject the null hypothesis. Psychologists do not use a significance level larger than .05. A significance level of .05 means that a pattern of results is so unlikely that it could have occurred by chance fewer than 5 times out of 100.

A Type 1 error (α) is rejecting the null hypothesis when it is correct. The experimenter determines the risk of a Type 1 error by selecting the alpha level.

A Type 2 error (β) is accepting the null hypothesis when it is false.

An American Psychological Association task force recommended that researchers include estimates of effect size and confidence intervals, in addition to p values.

When you calculate a p value that is statistically significant, this means that your results are unlikely to be due to chance (are probably real).

Effect size estimates the strength of the association between the independent and dependent variable—the percentage of the variability in the dependent variable that is due to the independent variable.

A confidence interval is a range of values above and below a sample mean that is likely to contain the population mean (usually 95% or 99% of the time).

A critical region is a region of the distribution of a test statistic sufficiently extreme to reject the null hypothesis. For example, if our criterion is the .05 level, the critical region consists of the most extreme 5% of the distribution. To reject the null hypothesis, the test statistic would have to fall within the critical region.

A one-tailed test has a critical region at one tail of the distribution. We use a one-tailed test with a directional hypothesis.

A two-tailed test has two critical regions, found at opposite ends of the distribution. We use a two-tailed test with a nondirectional hypothesis.

Inferential statistics allow us to predict the behavior of a population from a sample. Examples of inferential statistics are the t test and F test.

CHAPTER 14

A nominal scale assigns items to two or more distinct categories that can be named using a shared feature, but does not quantify the items.

Example: you can sort pictures into attractive and unattractive categories.
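The critical-region logic for the one- and two-tailed tests described earlier in this section reduces to a simple comparison (a sketch; 1.96 and 1.645 are the conventional .05-level z criteria for two- and one-tailed tests respectively):

```python
def two_tailed_reject(z_obtained, z_critical=1.96):
    """Two-tailed test (nondirectional hypothesis): the critical region
    lies in both tails, so reject H0 when the statistic is extreme in
    either direction."""
    return abs(z_obtained) >= z_critical

def one_tailed_reject(z_obtained, z_critical=1.645):
    """One-tailed test (directional hypothesis): the entire 5% critical
    region sits in one tail, here the upper tail."""
    return z_obtained >= z_critical
```

Note that a statistic of 1.7 falls inside the one-tailed critical region but outside the two-tailed one, which is why the choice of test must follow from the hypothesis, not from the data.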
An ordinal scale measures the magnitude of the DV using ranks. This scale allows us to make statements about contestants’ relative speed. Example: marathon contestants are assigned to places from first place to last place.

An interval scale measures the magnitude of the DV using equal intervals between values with no absolute zero. Example: Fahrenheit or Centigrade temperatures, and Sarnoff and Zimbardo’s 0-100 scale.

A ratio scale measures the magnitude of the DV using equal intervals between values and an absolute zero. This scale allows us to state that 2 meters are twice as long as 1 meter. Example: distance in meters or time in seconds.

Nonparametric tests use nominal or ordinal data. Parametric tests require interval or ratio data.

When should we use the chi square test?

When the data are nominal and the groups are independent, which means the experimenter assigns different subjects to them.

The chi square test determines whether the frequency of sample responses represents the frequencies we would expect in the population. The X²obt is computed from the actual frequencies of responses. The critical value is the minimum value required to reject the null hypothesis. If X²obt > X²crit, reject the null hypothesis.

Cramér’s coefficient Φ is analogous to r² and indexes the degree of association between priming and the number of incorrect responses.

If our sample included every member of the population, we would have the maximum possible degrees of freedom and would know the exact population values of the mean and variance. The sample size determines the degrees of freedom.

There is a different t distribution for each value of degrees of freedom. The t distribution approaches a normal curve as sample size increases.

What does robustness mean?

The t test provides a valid test of the hypothesis when assumptions like normal distribution of population values are slightly to moderately violated.

We reject the null hypothesis when tobt > tcrit. For 9 df, if tobt > 2.262, we would reject the null hypothesis.

Calculate an effect size for a t test for independent groups.

First, we calculate the t statistic (2.47) and then we enter it into the formula r = √(t² / (t² + df)).

An r value of .50 is a large effect. If we square our r of .66, this reveals that fun accounts for 44% of the variance in the subjects’ time estimates.

A t test for matched groups either assigns the same subjects to both conditions or matches subjects and then randomly assigns them to either condition. A t test for matched groups may use fewer subjects and achieve greater control over individual differences than a t test for independent groups. This makes a t test for matched groups potentially more powerful.

We use an analysis of variance when data are interval or ratio level and there is at least one independent variable with three or more levels.

Within-groups variability is the degree to which the scores of subjects in the same treatment group differ from each other. Within-groups variability consists of error due to individual differences and extraneous variables.

Between-groups variability is the degree to which the scores of different treatment groups differ from one another or the grand mean. Between-groups variability consists of error due to individual differences and extraneous variables, plus treatment effects.

What does it mean when an F ratio is statistically significant?

Across all group means, there is a significant difference due to the independent variable. When Fobtained > Fcritical, reject H0.

When an overall ANOVA is significant and you have made no specific predictions, you may perform post hoc tests on all pairs of treatment groups.

We may use a priori tests to test predictions of differences between groups, such as between two groups or between one group and the others. The maximum number of comparisons = p – 1, where p is the number of treatment groups.
The samples used in psychological research are often
A priori tests are more powerful than post hoc tests; but biased and may not represent the larger population.
you may perform fewer a priori tests.
The samples may not always represent even college
Effect size measured by η2 is the proportion of the sophomores since we heavily depend on volunteers.
variability in the dependent variable that can be accounted for Experimental variables like anger may have multiple
by the independent variable. operational definitions.

η 2 indexes the strength of the relationship between the When we generalize from our experimental results, we
independent and dependent variables. move from discussing our specific operational definition of
anger to discussing the concept of anger itself.
It is dangerous to generalize from a single experiment’s
An experiment is internally valid when the effects on operational definition of anger.
the dependent variable are due to the independent variable.
We cannot be sure of the reliability or validity of our
An internally valid experiment is free of confounding. procedures.

A manipulation check evaluates how well the A study achieves research significance when its
experimenter manipulated the experimental situation. findings clarify or extend knowledge gained from previous
studies and raise implications for broader theoretical issues.
A manipulation check determines whether subjects
followed directions and were appropriately affected by our We should question novel findings when they
treatments. contradict prior findings that have been successfully replicated.

What did Orne (1969) mean by a pact of ignorance? The burden of proof is on the experimenter who claims
novel findings to explain this discrepancy.
Subjects expect their data to be discarded if they guess
the experimental hypothesis, and don’t volunteer this We want to generalize beyond the laboratory to
information to the experimenter. increase the external validity of our findings

Experimenters don’t want to test additional subjects and may take subject reports at face value.

Debrief subjects after the experiment and convey that you want to know if they guessed the hypothesis.

Provide incentives for guessing the hypothesis.

Since extraneous variables are uncontrolled in real-world settings and operate in complex combinations, they can modify the influence of our individual variables.

The trade-off is between the laboratory’s more precise control of extraneous variables and the field experiment’s greater realism and external validity.

Selecting the wrong statistical test: using a t-test to analyze ordinal data.

Improperly using a statistical test: calculating multiple t-tests.

Drawing the wrong conclusions from the test: reporting p = .07 as a trend.

An experiment is externally valid when its findings can be extended to other situations and populations.

Hanson (1980) found that more laboratory than field studies reported a positive correlation between reported attitudes and behavior.

We can’t confirm external validity until additional studies are completed in field settings.

Researchers can increase and verify the external validity of laboratory findings using aggregation, multivariate designs, nonreactive measurements, field experiments, and naturalistic observation.

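Why is calculating multiple t-tests (noted above) improper? Each test risks a Type I error, and the risks compound across tests. A small illustration, assuming independent tests each run at α = .05:

```python
# Familywise Type I error for k independent tests at alpha = .05:
# P(at least one false positive) = 1 - (1 - alpha) ** k

alpha = 0.05
for k in (1, 3, 6, 10):
    familywise = 1 - (1 - alpha) ** k
    print(f"{k:2d} tests -> P(at least one false positive) = {familywise:.3f}")
```

With ten comparisons, the chance of at least one spurious “significant” result is roughly 40%, which is why an omnibus ANOVA with planned follow-up tests is generally preferred over many separate t-tests.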
What two requirements must an externally valid study satisfy?

1. The experiment must be internally valid.
2. The experimental findings can be replicated.

What does it mean to generalize across subjects? Why is this important?

The findings can be extended to a larger group than our sophomores, since we heavily depend on volunteers.

Generalizing across subjects is critical to the external validity and usefulness of experimental findings.

Aggregation is the grouping together and averaging of data to increase external validity.

Combining the results of experiments with different subjects and methodologies increases the generality and external validity of our findings.

Meta-analysis uses statistical analysis to combine and quantify data from many comparable experiments to calculate an average effect size.

Aggregation establishes external validity by combining the results of experiments performed using different subjects, stimuli and/or situations, trials or occasions, and measures.
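A minimal sketch of how a meta-analysis averages effect sizes. This uses the standard fixed-effect (inverse-variance) weighting, which the notes above do not spell out, and the study numbers are invented:

```python
# Fixed-effect meta-analysis sketch: weight each study's effect size
# by its precision (1 / sampling variance) and take the weighted mean.

def combined_effect(effects, variances):
    weights = [1.0 / v for v in variances]
    return sum(w * d for w, d in zip(weights, effects)) / sum(weights)

studies_d = [0.40, 0.55, 0.30]    # Cohen's d from three hypothetical studies
studies_var = [0.04, 0.02, 0.08]  # sampling variance of each d
print(round(combined_effect(studies_d, studies_var), 3))
```

More precise studies (smaller sampling variance) pull the average effect size toward their estimates.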
Which problems prevent us from generalizing across subjects?

A multivariate design studies multiple DVs. For example, a study of repetitive strain places a computer keyboard at different distances from the subject (the IV) and measures the effect on three different muscle groups (3 DVs).

Multivariate designs allow us to study the effect of an independent variable on combinations of dependent variables.

These designs better simulate the complexity of the real world than univariate designs and provide more detailed information.

Experimenter effects. Prophesying a difference caused research assistants to create an effect, and this could be done equally in either direction.

Observer effects. Much as for experimenter effects.

Trial effects. As for experimenter effects, but with the emphasis on the activity, not the person, of the experimenter.

Research participation effects: perhaps a more general label for observer, experimenter, trial, and novelty effects.

We analyze multivariate experiments with a multivariate analysis of variance (MANOVA).

Demand characteristics: participants of an experiment or interview provide responses and act in ways they believe are expected of them.
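The repetitive-strain example above can be pictured as a small data layout. The level names, strain scores, and group sizes here are invented for illustration, and a real analysis would use a MANOVA rather than eyeballing means:

```python
# Multivariate layout: one IV (keyboard distance, 2 levels) and three
# DVs (strain scores for three muscle groups), several subjects per level.

data = {
    "near": [(2.1, 3.0, 1.2), (2.4, 2.8, 1.0), (2.0, 3.1, 1.4)],
    "far":  [(3.5, 3.9, 2.2), (3.2, 4.1, 2.0), (3.6, 3.8, 2.4)],
}

# A MANOVA compares the mean *vectors* (one mean per DV) across levels,
# rather than running a separate univariate test on each DV.
for level, rows in data.items():
    mean_vector = tuple(round(sum(col) / len(col), 2) for col in zip(*rows))
    print(level, mean_vector)
```

Each level of the independent variable gets one mean per dependent variable; the combination of the three means is what the multivariate test evaluates.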
How should researchers handle a nonsignificant outcome?

Accept the outcome; don’t reframe your result as “almost significant.”

Examine the experimental procedures for design flaws.

If the design appears sound, decide whether the hypothesis was reasonable.

How should we handle the possibility of faulty procedures?

Check for possible causes of a nonsignificant outcome like:

1. confounding
2. extraneous variables that increase within-subjects variability
3. weak manipulation of the IV
4. inconsistent or flawed procedures
5. ceiling and floor effects
6. insufficient power
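To see how insufficient power (item 6) produces nonsignificant outcomes, here is a rough power calculation using a normal (z) approximation for a two-group comparison at two-sided α = .05; the effect size (d = 0.5) and group sizes are illustrative assumptions, not values from the text:

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def approx_power(d, n, z_crit=1.96):
    """Approximate power to detect effect size d with n subjects per group."""
    se = math.sqrt(2 / n)          # SE of the mean difference, in d units
    shift = d / se
    return (1 - norm_cdf(z_crit - shift)) + norm_cdf(-z_crit - shift)

for n in (10, 20, 50, 100):
    print(f"n = {n:3d} per group -> power = {approx_power(0.5, n):.2f}")
```

With a medium effect and only 10–20 subjects per group, most experiments would miss a real effect, so a nonsignificant result says little about the hypothesis.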

If previous studies supported the hypothesis and ours did not, look for differences in experimental design or sample.

If there was no previous support and our design and execution were good, we may have to revise or discard our hypothesis.
The placebo effect: patients experience treatment effects based on their belief that a treatment will work.

The Hawthorne effect: aspects of this suggest that the effect did not depend on the particular expectation of the researchers, but that being studied caused the improved performance.

The John Henry effect is the opposite of the Hawthorne effect.

Jastrow’s effect on factory work was much bigger: here an explicit expectation about performance was transmitted and turned out to change output by a factor of three.

The Pygmalion effect or “expectancy advantage” is that of a self-fulfilling prophecy.

The charisma effect: this term is mostly used as one of the rival theories of leadership.

The halo effect: rating the performance of someone based on an overall impression of them.

The novelty effect: participants think the technology or educational intervention is wonderful, and that belief is the real cause of raised outcomes.