
An experiment deliberately imposes a treatment on a group of objects or subjects in the interest of observing the response.

This differs from an observational study, which involves collecting and analyzing data without changing existing conditions. Because the validity of an experiment is directly affected by its construction and execution, attention to experimental design is extremely important.

Treatment

In experiments, a treatment is something that researchers administer to experimental units. For example, a corn field is divided into four parts, and each part is 'treated' with a different fertilizer to see which produces the most corn; a teacher practices different teaching methods on different groups in her class to see which yields the best results; a doctor treats a patient with a skin condition with different creams to see which is most effective. Treatments are administered to experimental units by 'level', where level implies amount or magnitude. For example, if the experimental units were given 5 mg, 10 mg, and 15 mg of a medication, those amounts would be three levels of the treatment. (Definition taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1)

Reference: http://www.stat.yale.edu/Courses/1997-98/101/expdes.htm

Experimental Treatments

Alternative manipulations of the independent variable being investigated

Test Unit

Entity whose responses to experimental treatments are being observed or measured

Reference: www.jsu.edu/ccba/mm/faculty/thomas/497/497experiments ppt

Factor

A factor of an experiment is a controlled independent variable; a variable whose levels are set by the experimenter. A factor is a general type or category of treatments. Different treatments constitute different levels of a factor. For example, three different groups of runners are subjected to different training methods. The runners are the experimental units and the training methods are the treatments; the three types of training methods constitute three levels of the factor 'type of training'. (Definition taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1)

Reference: http://www.stat.yale.edu/Courses/1997-98/101/expdes.htm

Assignment methods

Randomization

Assignment of subjects and treatments to groups is based on chance.
Provides control by chance.
Random assignment allows the assumption that the groups are identical with respect to all variables except the experimental treatment.

Random Assignment

Random assignment is a procedure used in experiments to create multiple study groups that include participants with similar characteristics, so that the groups are equivalent at the beginning of the study. The procedure involves assigning individuals to an experimental treatment or program at random, or by chance (like the flip of a coin). This means that each individual has an equal chance of being assigned to either group. Usually in studies that involve random assignment, participants will receive a new treatment or program, an existing treatment, or nothing at all. When using random assignment, neither the researcher nor the participant can choose the group to which the participant is assigned.

The benefit of using random assignment is that it evens the playing field. This means that the groups are expected to differ only in the program or treatment to which they are assigned. If both groups are equivalent except for the program or treatment that they receive, then any change that is observed after comparing information collected about individuals at the beginning of the study and again at the end of the study can be attributed to the program or treatment. This way, the researcher has more confidence that any changes that might have occurred are due to the treatment under study and not to the characteristics of the group.

A potential problem with random assignment is the temptation to ignore the random assignment procedures. For example, it may be tempting to assign an overweight participant to the treatment group that includes participation in a weight-loss program. Ignoring random assignment procedures in this study limits the ability to determine whether or not the weight-loss program is effective, because the groups will no longer be randomized. Research staff must follow the random assignment protocol, if that is part of the study design, to maintain the integrity of the research. Failure to follow the procedures used for random assignment prevents the study outcomes from being meaningful and applicable to the groups represented.
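The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the participant IDs, the group names, and the function name randomly_assign are hypothetical, and the seed is there only so the example is repeatable. The point it shows is that the shuffle, not the researcher or the participant, decides who lands in which group.

```python
import random

def randomly_assign(participants, groups=("treatment", "control"), seed=None):
    """Shuffle the participants, then deal them into groups of near-equal size."""
    rng = random.Random(seed)  # seeded only so the illustration is repeatable
    shuffled = list(participants)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for index, person in enumerate(shuffled):
        # Deal in turn, so chance alone decides each person's group.
        assignment[groups[index % len(groups)]].append(person)
    return assignment

# Hypothetical participant IDs, for illustration only.
participants = [f"P{n}" for n in range(1, 11)]
assignment = randomly_assign(participants, seed=42)
for group, members in assignment.items():
    print(group, members)
```

Hand-picking a participant for a group, as in the weight-loss example, would bypass the shuffle and break the equivalence that the procedure is meant to provide.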

http://ori.hhs.gov/education/products/sdsu/rand_assign.htm

Between group and within group design

There are two basic research designs associated with the experimental research strategy:

Between-subjects design: each of the different groups of scores is obtained from a separate group of participants, e.g. one group of students is assigned to teaching method A and a separate group to method B.

Within-subjects design: the different groups of scores are all obtained from the same sample of participants, e.g. one sample of individuals is given a memory test using a list of one-syllable words, and then the same set of individuals is tested again using a list of two-syllable words.

analysis", which is a multivariate statistical technique, which involves the statistical grouping of multiple continuous variables.)

                          (IV) Test Difficulty
(IV) Room Temperature     (Level) Hard Test          (Level) Easy Test
(Level) 50 degrees        Hard Test in 50 degrees    Easy Test in 50 degrees
(Level) 90 degrees        Hard Test in 90 degrees    Easy Test in 90 degrees

Figure 1. Example of a 2 X 2 Design

Whenever we see numbers and the multiplication sign in describing designs like this (2 X 2 or "two by two"), each of the numbers represents an independent variable and the value of the number represents the number of levels of that independent variable. For example, 3 X 2 would mean two independent variables, one with three levels and one with two. 3 X 4 X 2 would represent an experiment with three independent variables, one with three levels, one with four, and one with two. Researchers often use the term "way" to refer to the number of independent variables in an analysis, so a "one-way" ANOVA refers to an ANOVA with one independent variable, and a two-way ANOVA would be used to analyze an experiment with two independent variables. Mathematically, we can analyze data with as many independent variables as we want. However, due to the complexity of interpreting higher-level ANOVAs, it's rare to see anything beyond a three-way ANOVA.

Results and Analyses

An experiment that includes multiple categorical independent variables and one continuous dependent variable is appropriately analyzed using an analysis of variance (ANOVA). Statistically, in a two-way ANOVA there are three basic types of effects that are tested: the main effect for independent variable A, the main effect for independent variable B, and the effect for the interaction of A and B. We will consider main effects first. A main effect for a given independent variable means that there is a significant difference between the levels of this independent variable across levels of the other independent variable. Another way of thinking of this is that there is an effect for one independent variable regardless of the level of the other.

For example, in the test experiment, students are probably going to do better on the easy test than the hard one, regardless of the temperature of the room they are tested in (Figure 2 displays means that represent such an effect, and Figure 3 is a line graph of these means). The means of the cell means for a given independent variable, collapsing across the levels of the other independent variable, are referred to as "marginal means", and these means aid us in interpreting a main effect. So, if we look at the marginal means in this example, we can see that there is clearly a dramatic difference between the marginal means associated with test difficulty (60 vs. 80) and no difference in the means associated with room temperature.

                                  Test Difficulty
Room Temperature                  Hard    Easy    Marginal means (room temperature)
50 degrees                        60      80      70
90 degrees                        60      80      70
Marginal means (test difficulty)  60      80

Figure 2. Means representing a main effect for Test Difficulty.
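The marginal means for Figure 2 can be checked with a short calculation. The sketch below assumes the four cell means are stored in a dictionary keyed by (room temperature, test difficulty); the variable and function names are illustrative, not part of the original material.

```python
from statistics import mean

# Cell means from Figure 2, keyed by (room temperature, test difficulty).
cell_means = {
    ("50 degrees", "hard"): 60, ("50 degrees", "easy"): 80,
    ("90 degrees", "hard"): 60, ("90 degrees", "easy"): 80,
}

def marginal_means(cells, factor_index):
    """Average the cell means for each level of one factor, collapsing across the other."""
    by_level = {}
    for key, value in cells.items():
        by_level.setdefault(key[factor_index], []).append(value)
    return {level: mean(values) for level, values in by_level.items()}

temperature_means = marginal_means(cell_means, 0)  # collapse across test difficulty
difficulty_means = marginal_means(cell_means, 1)   # collapse across room temperature
print("Room temperature:", temperature_means)
print("Test difficulty:", difficulty_means)
```

The test difficulty marginal means differ (60 vs. 80) while the room temperature marginal means are equal, which is the pattern of a main effect for test difficulty only.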

Figure 3. Line Graph of Means from Figure 2

We could, of course, find a main effect of Room Temperature by reversing this hypothetical example, in which case we could, for example, change the means for both test difficulty groups in the 50 degree room to a score of 60, and for both test difficulty groups in the 90 degree room to a score of 80. The examples thus far might lead you to the conclusion that a main effect for one independent variable precludes the possibility of finding a main effect for the other. Actually, this is not true, as illustrated in the table and line graph in Figures 4 and 5. Notice that in this case, we would conclude from the results that students perform best in 90 degrees regardless of whether the test was hard or easy, and they also perform better on the easy test regardless of whether they take it in 50 degrees or 90.

Note that the graphs of the means in both Figures 3 and 5 show parallel lines. Parallel lines in these types of graphs indicate that there is no interaction; any effects present are main effects. If the lines are not parallel, this is indicative of an interaction. (Note that it is possible to find both a significant main effect and a significant interaction with the same set of means, and in this case the lines will not be parallel. In interpreting such a case, the main effect is usually ignored, because it is misleading. We will address this further below.)

                                  Test Difficulty
Room Temperature                  Hard    Easy    Marginal means (room temperature)
50 degrees                        60      70      65
90 degrees                        70      80      75
Marginal means (test difficulty)  65      75

Figure 4. Means Representing a Main Effect for Both Room Temperature and Test Difficulty.

Figure 5. Line Graph of Means from Figure 4

Figures 6 and 7 represent the classic "crossing" interaction. This effect is an interaction because the effect of one independent variable depends on the level of the other. If we found these means, and we were asked a question about one independent variable, such as: "Did students do better with the hard test or easy test?", our answer would be something like: "That depends; with a room temperature of 50 they did better on the hard test, with a room temperature of 90 they did better on the easy test". Conversely, if someone asked: "Did students do better in 90 degrees or 50 degrees?", our answer would be: "That depends; with the hard test they did better in 50 degrees, with the easy test they did better in 90 degrees." As you can see from the marginal means, there are absolutely no main effects in this case.

This illustrates the fundamental advantage of using a multi-way design and analysis. If we were to set up an experiment where we just compared hard vs. easy tests or 50 vs. 90 degree rooms, and students scored just as is illustrated below, we would never realize that the effect of one of these factors was dependent on the other. Likewise, if we analyzed the present experiment, and students scored just as illustrated below, using only two t-tests, one for each of the independent variables, our conclusions about the effect of these two independent variables would be quite different, and they would be incorrect.

                                  Test Difficulty
Room Temperature                  Hard    Easy    Marginal means (room temperature)
50 degrees                        80      60      70
90 degrees                        60      80      70
Marginal means (test difficulty)  70      70

Figure 6. Means Representing "crossing" Room Temperature X Test Difficulty Interaction
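One way to make "the effect of one variable depends on the level of the other" concrete is to compare the easy-minus-hard difference at each room temperature. The sketch below uses the cell means from Figures 4 and 6; the function names are illustrative, and the "interaction contrast" here is simply the difference between the two simple effects, not a significance test.

```python
def simple_effects(cells):
    """Easy-minus-hard difference at each room temperature."""
    return {temp: scores["easy"] - scores["hard"] for temp, scores in cells.items()}

def interaction_contrast(cells):
    """Difference between the two simple effects; 0 corresponds to parallel lines."""
    effects = simple_effects(cells)
    return effects["90 degrees"] - effects["50 degrees"]

# Cell means from Figure 4 (two main effects, parallel lines).
figure_4 = {
    "50 degrees": {"hard": 60, "easy": 70},
    "90 degrees": {"hard": 70, "easy": 80},
}
# Cell means from Figure 6 (crossing interaction).
figure_6 = {
    "50 degrees": {"hard": 80, "easy": 60},
    "90 degrees": {"hard": 60, "easy": 80},
}

for name, cells in (("Figure 4", figure_4), ("Figure 6", figure_6)):
    print(name, simple_effects(cells), "contrast:", interaction_contrast(cells))
```

For the Figure 4 means the easy test is 10 points higher at both temperatures, so the contrast is 0. For the Figure 6 means the difference flips sign between rooms, so the contrast is far from 0 even though every marginal mean is 70.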

Figure 7. Line Graph of Means from Figure 6

Although the classic "crossing" interaction in Figures 6 and 7 is used most often to illustrate an interaction in a two-way analysis, it's also possible to find an interaction in which the lines do not cross (though note they are still not parallel), as illustrated in Figures 8 and 9. Note that, to explain the results, we would still have to describe the results of one independent variable in terms of the other. For example: "Do students do better on hard tests or easy tests?" "It depends; in a 50 degree room there is no difference, but in a 90 degree room they do much better on easy tests."

Note that this also represents the case that I referred to above, in which the marginal means indicate that there are two main effects. When we average across test difficulty, the mean for the 90 degree room is higher, and when we average across room temperature, the easy test scores are higher. However, if we were to conclude from these results that students do better in 90 degree rooms regardless of test difficulty, or that they do better on easy tests regardless of room temperature, we would clearly be incorrect. The correct conclusion to be drawn from the results below is that students do best when the test is easy and the temperature is 90 degrees. Otherwise, temperature and difficulty level do not matter. This is why researchers often disregard a main effect that occurs in the data analysis when there is also an interaction.

                                  Test Difficulty
Room Temperature                  Hard    Easy    Marginal means (room temperature)
50 degrees                        60      60      60
90 degrees                        60      80      70
Marginal means (test difficulty)  60      70

Figure 9. Line Graph of Means from Figure 8

Within-Subjects Designs

A within-subjects design is an experiment in which the same group of subjects serves in more than one treatment. Note that I'm using the word "treatment" to refer to levels of the independent variable, rather than "group". It's probably always better to use the word "treatment", as opposed to "group". The term "group" can be very misleading when you are using a within-subjects design, because the same "group" of people is often in more than one treatment.

As an example of a within-subjects design, let's say that we are interested in the effect of different types of exercise on memory. We decide to use two treatments, aerobic exercise and anaerobic exercise. In the aerobic condition we will have participants run in place for five minutes, after which they will take a memory test. In the anaerobic condition we will have them lift weights for five minutes, after which they will take a different memory test of equivalent difficulty. Since we are using a within-subjects design, we have all participants begin by running in place and taking the test, after which we have the same group of people lift weights and then take the test. We compare the memory test scores in order to answer the question of which type of exercise aids memory the most.

Strengths

There are two fundamental advantages of the within-subjects design: a) power and b) reduction in error variance associated with individual differences. A fundamental principle of inferential statistics is that, as the number of subjects increases, statistical power increases and the probability of beta error (the probability of not finding an effect when one "truly" exists) decreases. This is why it is always better to have more subjects, and why, if you look at a significance table, such as the t-table, the t value necessary for statistical significance decreases as the number of subjects increases.

The reason this is so relevant to the within-subjects design is that, by using a within-subjects design, you have in effect increased the number of "subjects" relative to a between-subjects design. For example, in the exercise experiment, since you have the same subjects in both treatments, you will have twice as many "subjects" as you would have had if you had used a between-subjects design. If ten students sign up for the experiment, and you use a between-subjects design with equal-size groups, you will have five subjects in the aerobic condition and five in the anaerobic condition. However, if you use a within-subjects design, you will in effect have ten subjects in both conditions. Just as with the terms "groups" vs. "treatments", instead of using the term "subjects" it's better to speak of "observations", since the term "subjects" is misleading in the within-subjects design, where the same person may effectively be more than one "subject".

The reduction in error variance comes from the fact that much of the error variance in a between-subjects design arises because, even though you randomly assigned subjects to groups, the two groups may differ with regard to important individual difference factors that affect the dependent variable. With within-subjects designs, the conditions are always exactly equivalent with respect to individual difference variables, since the participants are the same in the different conditions. So, in our exercise example above, any factor that may affect performance on the dependent variable (memory), such as sleep the night before, intelligence, or memory skill, will be exactly the same for the two conditions, because the exact same group of people is in both conditions.
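The reduction in error variance can be illustrated numerically. The sketch below is an assumption-laden illustration, not an analysis from the original material: the memory scores are hypothetical, and the two functions compute a pooled-variance t statistic (treating the scores as if they came from two separate groups) and a paired t statistic (using each participant's difference score).

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical memory scores for the same five participants (illustration only).
aerobic = [12, 15, 9, 20, 17]
anaerobic = [10, 14, 8, 18, 15]

def independent_t(a, b):
    """t statistic treating the scores as two separate, equal-size groups."""
    pooled = (variance(a) + variance(b)) / 2  # pooled variance for equal n
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / len(a) + 1 / len(b)))

def paired_t(a, b):
    """t statistic computed on each participant's difference score."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

print("Between-subjects style t:", round(independent_t(aerobic, anaerobic), 2))
print("Within-subjects style t:", round(paired_t(aerobic, anaerobic), 2))
```

In these made-up scores the participants differ a lot from one another but each person's two scores are close, so the paired statistic is much larger: the person-to-person variability drops out of the error term when each participant serves as their own comparison.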

Weaknesses

There is also a fundamental disadvantage of the within-subjects design, which can be referred to as "carryover effects". In general, this means that participation in one condition may affect performance in other conditions, thus creating a confounding extraneous variable that varies with the independent variable. Two basic types of carryover effects are practice and fatigue.

As you read about the hypothetical exercise and memory experiment, you may very possibly have recognized that one problem with this experiment would be that participating in one exercise condition first, followed by the memory test, may inadvertently affect performance in the second condition. First of all, participants may very possibly be more tired from running in place and weight lifting than they are from just running in place, so that they perform worse on the second memory test. If this is the case, they wouldn't do worse on the second test because aerobic exercise is better for memory than anaerobic; rather, they would do worse because they were actually more worn out from exercising for ten minutes total than after only exercising for five. When one within-subjects treatment negatively affects performance on a later treatment, this is referred to as a fatigue effect.

On the other hand, in the exercise experiment the second memory test may be very similar to the first, so that by practicing with the first test they perform much better the second time. Again, the difference between the two conditions would not be due to the independent variable (aerobic vs. anaerobic); rather, it would be due to practice with the test. When a within-subjects treatment positively affects performance on a later treatment, this is referred to as a practice effect.

Factorial Design

A factorial design is a way to study the effects of two or more independent variables. It is commonly used to avoid the need to study each individual effect separately; it combines the studies of different effects into a single study, which makes it possible to create and test every combination of conditions. The factorial design consists of the independent variables, which are also called factors. These factors are broken down into multiple sub-divisions, or levels. The number of levels of each factor determines the number of possible treatment groups and the name of the design. The number of groups is calculated by multiplying the numbers of levels of the factors by one another. For example, suppose there are three factors. Factor 1 has 2 levels, Factor 2 has 4 levels, and Factor 3 has 3 levels. This would result in a 2 x 4 x 3 design with 2 * 4 * 3 = 24 treatment groups.

The image above shows a 2x2 design, resulting in 4 treatment groups.
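The group count for the 2 x 4 x 3 example can be checked by enumerating every combination of levels. In this sketch only the level counts (2, 4, and 3) come from the text; the level labels such as "a1" and "b3" are hypothetical placeholders.

```python
from itertools import product
from math import prod

# Hypothetical level labels; only the counts (2, 4, 3) come from the example.
factors = {
    "Factor 1": ["a1", "a2"],
    "Factor 2": ["b1", "b2", "b3", "b4"],
    "Factor 3": ["c1", "c2", "c3"],
}

# The design name lists the number of levels of each factor.
design_name = " x ".join(str(len(levels)) for levels in factors.values())

# The number of treatment groups is the product of the level counts.
group_count = prod(len(levels) for levels in factors.values())

# Each treatment group is one combination of a level from every factor.
treatment_groups = list(product(*factors.values()))

print(design_name, "design with", group_count, "treatment groups")
print("First group:", treatment_groups[0])
```

Enumerating the combinations rather than only multiplying makes it easy to see why adding a factor or a level quickly increases the number of groups that need participants.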

Results from Factorial Design

Utilizing a factorial design can give vital information about the experiment being performed. The result illustrated in the two images below is representative of the null case. This case occurs when, regardless of the combination of factor levels, the outcome is exactly the same; the chance of having a seizure is still 10%. This ultimately implies that neither the amount of medicine nor the age of the patient has any effect on the likelihood of having a seizure.

Main effects are also outcomes from using a factorial design. Unlike the null case, a difference can be seen among the levels of a factor. The difference is calculated by taking the average of one variable across all the levels of the other variable. The figure titled Main Effects shows two graphs. The graph on the left shows that time makes a difference: in both the in and out settings, the 4 hour time slot proved to be more beneficial than the 1 hour slot. However, the second graph shows that the setting has no effect. The same result was achieved by the in and out settings over both periods of time, so the lines for the in and out settings overlap completely.

Factorial design is also useful in discovering interaction effects between variables. An interaction exists when the effect of one variable depends on the level of another. It can be found by noticing that you cannot explain the effect of one variable without mentioning the other. Statistically, analysis software will report whether an interaction between variables is present. Graphically, one can notice that an interaction has occurred if the lines are not parallel to one another. Lines that cross are a clear sign of an interaction.

Different Types of Experiments

True

True experiments are thought to be the most accurate of all research types. This is due to their use of random assignment. Subjects are put into their test conditions at random, with no previous testing. This act of randomization helps to control extraneous variables. In a true experiment only one variable is manipulated and tested. This type of experiment also has a control group, where nothing is changed. Although this type of test is highly favored for its accuracy, it is also one of the most expensive types of testing.

Quasi

Quasi experiments are also referred to as natural experiments. This is due to the nature of the group assignments. The membership in the treatment groups is predetermined by factors outside of the experimenter's control. For example, comparing the math abilities of male and female students would fall under the quasi category, because there is no control over what someone's gender already is. One disadvantage of quasi testing is that it does not rule out other factors. In the previous example with the male and female students, a test could show that boys score higher in math than girls, but it would not account for the possibility that this is due to outside factors that are totally unrelated to gender. Two types of quasi experiments are pre vs. post events and treatment vs. control.

Non-Experimental

This form of design is unlike the other two. There is no control or assigned grouping. It essentially looks at the event or trend overall instead of trying to explain the relationship. This is more of a design focused on pure observation. It is favorable because the design is simple, easy, and inexpensive to implement. However, even more so than the quasi experiment, it does not indicate whether, or which, factors have an effect on the results. Two types of non-experimental design are pre- and post-test design and time series design.

References:
http://www.socialresearchmethods.net/kb/expfact.php
https://controls.engin.umich.edu/wiki/index.php/Design_of_experiments_via_factorial_designs#What_is_Factorial_Design.3F
http://psychology.ucdavis.edu/SommerB/sommerdemo/experiment/types.htm
http://explorable.com/quasi-experimental-design
http://explorable.com/true-experimental-design
http://www.ehow.com/info_8481025_true-experiments-vs-quasi-experiments.html
http://www1.cyfernet.org/eval/family/Section5/Page-05.html
http://faculty.ksu.edu.sa/Hanan_Alkorashy/Nursing%20management%20489NUR/Non_experimental_Designs.pdf

Activity: Suppose you want to measure the effect of noise conditions and test difficulty on a student's test score.

          Easy    Average    Hard
Quiet     95      80         72
Loud      85      80         63

What are the factors in this design? _________________________________________
How many levels does each factor have? ________________________________________
What kind of design is this (ex. 6 x 3)? ________________________________________
Is this representative of a null case? __________________________________________
Are there any main effects indicated? _________________________________________

Answers:
2 (noise condition & test difficulty)
Noise condition: 2; Test difficulty: 3
2 x 3 or 3 x 2
No
Yes, there are main effects for both noise condition and test difficulty. If you calculate the average for both levels of noise condition across the three levels of test difficulty (Quiet = 82.3, Loud = 76), you can see that they are different values. The same occurs for test difficulty (Easy = 90, Average = 80, Hard = 67.5).
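The averages quoted in the answers can be reproduced with a short calculation. This sketch assumes the activity table is stored as a nested dictionary; the variable names are illustrative. It also checks the null case in the sense used earlier, where every combination of levels gives the same outcome.

```python
from statistics import mean

# Scores from the activity table: noise condition by test difficulty.
scores = {
    "Quiet": {"Easy": 95, "Average": 80, "Hard": 72},
    "Loud": {"Easy": 85, "Average": 80, "Hard": 63},
}

# Average each noise condition across the three levels of test difficulty.
noise_means = {
    noise: round(mean(list(row.values())), 1) for noise, row in scores.items()
}

# Average each difficulty level across the two noise conditions.
difficulty_means = {
    level: mean([row[level] for row in scores.values()])
    for level in ("Easy", "Average", "Hard")
}

# Null case: every cell holds the same value.
all_cells = [value for row in scores.values() for value in row.values()]
is_null_case = len(set(all_cells)) == 1

print("Noise condition means:", noise_means)
print("Test difficulty means:", difficulty_means)
print("Null case:", is_null_case)
```

Because the means differ within each factor, the table suggests main effects for both; whether those differences are statistically significant would still require an ANOVA on the underlying scores.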