
BI3007 Experimental Design, Analysis and Presentation An introduction to Minitab 2011

Contents
Introduction
Getting started
Descriptive statistics
Unpaired t-test
Paired t-test
F-test for equality of variances
Stacking & manipulating data
One-way ANOVA
Diagnostic tests for ANOVA
Two-way ANOVA
Data transformation
Correlation & simple linear regression
Chi-square test
Non-parametric tests

Introduction

Minitab is a data analysis software package that runs in the Windows environment. The current version on the teaching server is Minitab 15. The software package itself is fairly easy to use and is almost completely menu driven so should be familiar to most of you. The following is a brief introduction to Minitab and instructions on how to conduct some of the more common statistical tests. As for most Windows applications, extensive help is available by clicking on Help on the Menu Bar, then on Search Help... Click on the Index tab and enter what it is you want help on in the box that appears.

Getting started

To start Minitab on the classroom computers, log on to Windows, open the Common Applications folder, then open the Statistical Software folder and open the Minitab 15 folder. Double click on the Minitab icon in the new window that appears. A Minitab Session and Worksheet like this should appear:

[Figure: Minitab window showing the main menu, the Session window and the Worksheet window]

A. Douglas, updated by L McPherson

09/10/2011

You will also need to download all the Minitab project (.MPJ) files from the Minitab tutorials link on the BI3007 course web page (on MyAberdeen) and save them to your network drive (usually the H: drive) or a pen drive for use during the tutorial sessions. The Minitab project file (.MPJ) to be used for each exercise in this handbook is given in parentheses after the exercise title.

To open a project, click on File on the main menu and then click on Open Project. Navigate to the Minitab project file you wish to open and double click on the file. Note: when using Minitab on the teaching server, avoid opening Minitab by double clicking on individual project files, as this causes Minitab to crash. You can also save your work as you progress by clicking on File and then Save Project on the main menu.

If you are using an external program to manage your data (such as Microsoft Excel) you can copy and paste your data directly into Minitab. If your data includes column headings, be sure to paste your selection starting at the top left-hand cell. See below for example:

Descriptive Statistics and frequency distributions (ttest data.MPJ)

Open the project file entitled ttest data.MPJ. Click on Stat on the menu bar, select Basic Statistics, and then click on Display Descriptive Statistics. In the box labelled Variables: type in the column(s) containing your data, e.g. C1 (or select it by double clicking the column titles from the selection available in the box to the left of the Variables: box).


Minitab will produce a range of descriptive statistics for the dataset you selected and display the output in the Session window.
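As a rough cross-check outside Minitab, the same kind of summary can be sketched with Python's standard library. The data values below are hypothetical (they are not taken from ttest data.MPJ), and the quartile method used by Python may not match the one Minitab uses, so small differences in Q1 and Q3 are possible.

```python
import statistics

# Hypothetical measurements standing in for one worksheet column (e.g. C1).
sample_a = [4.1, 5.0, 5.6, 6.2, 4.8, 5.3, 5.9, 4.5]

n = len(sample_a)
mean = statistics.mean(sample_a)
stdev = statistics.stdev(sample_a)        # sample standard deviation (n - 1)
se_mean = stdev / n ** 0.5                # standard error of the mean
median = statistics.median(sample_a)
q1, _, q3 = statistics.quantiles(sample_a, n=4)   # quartiles, Python's default method

print(f"N={n} Mean={mean:.3f} SE Mean={se_mean:.3f} StDev={stdev:.3f}")
print(f"Min={min(sample_a)} Q1={q1:.3f} Median={median:.3f} Q3={q3:.3f} Max={max(sample_a)}")
```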

You can also obtain a graphical summary of your data by clicking on Basic Statistics, then Graphical Summary. In the window that appears, enter the column(s) containing your data in the box labelled Variables:. This method has the advantage of producing the results of an Anderson-Darling Normality test. A P value less than 0.05 indicates that it is unlikely that the data are normally distributed, so either parametric statistics are not appropriate or you must transform your data before using them (see below for more details on transformations).

[Figure: graphical summary output, with the Anderson-Darling test for normality highlighted]

Unpaired t-test (ttest data.MPJ)

To conduct an unpaired t-test, click on Stat on the menu bar, select Basic Statistics, and then click on 2-Sample t. Select either the option Samples in one column or Samples in different columns, depending on how your data are arranged. Click on the appropriate boxes within each option, e.g. if your samples are in two different columns, then in the box labelled First: type C1 (if your first sample data is in C1) and in Second: type C2 (if your second sample data is in C2). An alternative way of entering the column headings is to select a column by double clicking on the column name (e.g. sample A) in the box situated in the top left section of the window. Do not select the option Assume equal variances unless you know that the two samples you are comparing have equal variances (you can test for this using an F-test, see below). Click on OK.


The results of the t-test are shown in the Session window:

The P value (probability) tells you how likely you would be to obtain a difference at least as large as the one observed if your null hypothesis (that the sample means are not different) were correct (i.e. a P value < 0.05 suggests that you can reject the null hypothesis and therefore infer that the sample means are significantly different). REMEMBER a P value is NEVER zero. If Minitab displays it as zero (as in the output above), report it as P < 0.001.
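To show what the test statistic is built from when equal variances are not assumed, here is a minimal sketch of the unequal-variance (Welch) t statistic and its approximate degrees of freedom. The two samples are hypothetical, and `welch_t` is an illustrative helper rather than anything provided by Minitab. The P value itself is left to Minitab, because the Python standard library has no t distribution.

```python
import math
import statistics

# Hypothetical data standing in for two worksheet columns (C1 and C2).
sample_a = [12.1, 11.4, 13.0, 12.6, 11.9, 12.3]
sample_b = [13.2, 13.9, 12.8, 14.1, 13.5, 13.0]

def welch_t(x, y):
    """Return (t, df) for an unpaired t-test without assuming equal variances."""
    vx = statistics.variance(x) / len(x)   # squared standard error of the mean of x
    vy = statistics.variance(y) / len(y)   # squared standard error of the mean of y
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

t_stat, df = welch_t(sample_a, sample_b)
print(f"t = {t_stat:.3f}, approximate df = {df:.1f}")
```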

Paired t-test (ttest data.MPJ)

Click on Stat on the menu bar, select Basic Statistics, and then click on Paired t. In the box labelled First sample: type C1 (if your first sample is in C1) and in Second sample: type C2 (if your second sample is in C2). You can also select it by double clicking on the correct option (e.g. sample A) in the box situated in the top left section of the window.


Click on OK. The results of the t-test are shown in the Session window.

The P value tells you how likely (probable) a mean difference at least as large as the one observed would be if your null hypothesis (that the mean difference between the paired observations does not differ from 0) were correct (i.e. a P value < 0.05 suggests that the mean difference between the paired observations is significantly different from 0).
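The paired test works on the differences within each pair, which the following sketch makes explicit. The paired readings are hypothetical and the variable names are illustrative only. As before, the P value is left to Minitab.

```python
import math
import statistics

# Hypothetical paired observations (the same subjects measured twice).
first = [140, 152, 138, 147, 160, 155]
second = [135, 150, 139, 141, 154, 150]

differences = [a - b for a, b in zip(first, second)]
mean_diff = statistics.mean(differences)
se_diff = statistics.stdev(differences) / math.sqrt(len(differences))
t_stat = mean_diff / se_diff      # tests the null hypothesis: mean difference = 0
df = len(differences) - 1

print(f"mean difference = {mean_diff:.3f}, t = {t_stat:.3f}, df = {df}")
```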

F-test for equality of variances (ttest data.MPJ)

One assumption you make about your data when using parametric statistics (such as a t-test, ANOVA etc.) is that the variances of the groups you are comparing are approximately equal. You can perform an F-test to check this. Click on Stat on the menu bar, select Basic Statistics, and then click on 2 Variances. Select either the option Samples are in one column or Samples are in different columns, depending on how your data are arranged. Click on the appropriate boxes within each option, e.g. if your samples are in different columns, then in the box labelled First: type C1 (if your first sample data is in C1) and in Second: type C2 (if your second sample data is in C2). You can also select it by double clicking on the correct option (e.g. sample A) in the box situated in the top left section of the window.


Click on OK. The results of the F-test are shown in the Session window. The P value tells you how likely a difference in variances at least as large as the one observed would be if your null hypothesis (that the variances of the two samples are the same) were correct (i.e. a P value < 0.05 suggests that the variances differ, as you are able to reject the null hypothesis).

In this case (above) P > 0.05 so we cannot reject the null hypothesis (that the variances of the two samples are the same). Therefore we assume equal variances. The procedure described above can only be used to test for equal variances between two groups. In order to test for equal variances between two or more groups, use the method described below in Diagnostic tests for ANOVA.
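The F statistic for this test is simply the ratio of the two sample variances, as the short sketch below illustrates. The samples are hypothetical and the names are illustrative. The P value is again left to Minitab.

```python
import statistics

# Hypothetical data standing in for two worksheet columns (C1 and C2).
sample_a = [12.1, 11.4, 13.0, 12.6, 11.9, 12.3]
sample_b = [13.2, 13.9, 12.8, 14.1, 13.5, 13.0]

var_a = statistics.variance(sample_a)   # sample variance (n - 1 in the denominator)
var_b = statistics.variance(sample_b)
f_ratio = var_a / var_b                 # values near 1 suggest similar variances
df_a, df_b = len(sample_a) - 1, len(sample_b) - 1

print(f"F = {f_ratio:.3f} on {df_a} and {df_b} degrees of freedom")
```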

Stacking and manipulating data (ANOVA.MPJ)

For many of the statistical tests that you may want to carry out (especially some of the more complex tests), Minitab requires that the data are stacked. Stacking data simply means placing your data in a single column in the spreadsheet and including a list of descriptors in an adjacent column. Fortunately, Minitab can automatically stack data for you, and list the descriptors in an adjacent column. Click on Data on the menu bar, select Stack and then choose Columns. In the box labelled Stack the following columns: enter the columns you wish to stack. Select Column of current worksheet and enter the next available column on your worksheet in which you want to store the stacked data. You can also store your subscripts (descriptors) in a column adjacent to your data by entering the column heading into Store subscripts in:. Make sure that you have Use variable names in subscript column selected.


Click on OK. The stacked data will now be stored on your worksheet. It is advisable to give your stacked data column headings for later reference.

[Figure: worksheet showing the original data, the stacked data and where to type in column headings]
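The idea of stacking can be sketched in a few lines of Python. The column names and values below are hypothetical; the point is only to show one data column paired with one subscript (descriptor) column.

```python
# Hypothetical unstacked worksheet: one column of biomass values per temperature.
unstacked = {
    "10C": [4.2, 5.1, 4.8],
    "15C": [6.9, 7.3, 6.5],
    "20C": [14.8, 15.2, 14.1],
}

biomass = []   # stacked data column
temp = []      # subscript (descriptor) column
for column_name, values in unstacked.items():
    for value in values:
        biomass.append(value)
        temp.append(column_name)   # the variable name is used as the subscript

for label, value in zip(temp, biomass):
    print(label, value)
```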

One-way ANOVA (ANOVA.MPJ)

In Minitab, it is preferable to stack all the data in one column and in an adjacent column type the code for the data (i.e. 1, 2 or 3 etc.). There is also an option to carry out a One-way ANOVA on unstacked data (Stat > ANOVA > One-way (Unstacked)) but it is not as versatile as the One-way ANOVA on stacked data option. For this exercise you will use the data you stacked in the previous exercise to perform a one-way ANOVA. You will be investigating the effect of three different temperatures on the biomass of a species of plant.

Click on Stat on the menu bar, select ANOVA, and then click on One-way. In the box labelled Response: enter the column containing all the stacked data (e.g. Biomass in the example below) and in Factor: enter the column containing the codes for the data (e.g. Temp in the example below). If you would like to store the residuals for later use, make sure that this option is checked. To determine which pairs of means are significantly different, click on the Comparisons button.


Select Fisher's, individual error rate: (Fisher's LSD) and check that the box on the right has a value of 5. This is the percentage significance level and therefore represents a P value of 0.05 (5%). Click on OK and then on OK again. The results of the ANOVA should be shown in the Session window:

The main values of interest are the P (probability) value associated with the Source of variation (e.g. temp in the above example) and the Mean for each of the different groups. In the above example, temperature has a significant (P < 0.001) effect on biomass (a P value < 0.05 suggests that you can reject the null hypothesis and therefore two or more of the means are significantly different). Minitab also provides you with a graphical representation of the confidence intervals for each mean. However, the results given in the ANOVA table will only tell you whether a significant difference between the sample means exists. They will not tell you which treatment means are different from which. To see which means are significantly different, maximise your Session window or scroll down your Session window in order to view the results from the Comparisons test. The results should look like those shown on the following page:


Minitab calculates whether there is a significant difference between the mean biomass of the three temperature treatments. The 95% confidence intervals for the mean differences between the groups are provided in the section below Simultaneous confidence level = 88.07%. In the first table, biomass at temperature 10°C is compared to biomass at temperatures 15 and 20°C. If the range between the lower and upper values does not pass through zero then the two mean biomasses are significantly different from each other. So, in the above example the range for temp 15°C is 0.981 to 3.080; as this range does not pass through zero we can surmise that the mean biomass is significantly different between temp 10°C and temp 15°C. In addition, as the lower and upper values are both above zero we can conclude that the mean biomass for temp 15°C is significantly higher than for temp 10°C. Similarly, the mean biomass at 20°C is significantly higher than at 10°C (9.040 to 11.138). In the second table the remaining comparison (temp 15°C compared with temp 20°C) is performed, and again the mean biomass for temp 20°C is significantly higher than for temp 15°C (7.009 to 9.108). If the range between the lower and upper values did pass through zero then we would conclude that there was no significant difference between the temperature treatments being compared.
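The decision rule described above (does the interval pass through zero?) can be written down as a small check. The interval limits are the ones quoted in the text; the function name `excludes_zero` and the group labels are illustrative only.

```python
# Confidence limits for the differences between means, as quoted in the text above.
intervals = {
    ("10C", "15C"): (0.981, 3.080),
    ("10C", "20C"): (9.040, 11.138),
    ("15C", "20C"): (7.009, 9.108),
}

def excludes_zero(lower, upper):
    """True when the interval lies wholly above or wholly below zero."""
    return lower > 0 or upper < 0

for (baseline, other), (lower, upper) in intervals.items():
    if excludes_zero(lower, upper):
        direction = "higher" if lower > 0 else "lower"
        print(f"{other} is significantly {direction} than {baseline}")
    else:
        print(f"No significant difference between {baseline} and {other}")
```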

Diagnostic tests for ANOVA (ANOVA.MPJ)

There are various options available for testing the suitability of using ANOVA to analyse your data. You MUST check these before you accept the results from your ANOVA test. Once you have selected the type of ANOVA you are going to carry out, there are two options available. One option is to click on Graphs (available in the dialog box in which you specify your ANOVA) and under 'Residual plots' select the Four in One option. Click on OK and then click on OK to run your ANOVA.


[Figure: four-in-one residual plots; the left-hand plots assess normality, the right-hand plots assess equal variances]

The four plots of the residuals should enable you to decide whether your data are suitable for ANOVA. The top and bottom left-hand graphs give you an indication of whether the residuals of your analysis are normally distributed. If your residuals are normally distributed then the red data points will lie on or close to the blue line in the graph entitled 'normal probability plot of the residuals' (top left). Likewise, in the graph 'histogram of the residuals' (bottom left), your residuals should be distributed in approximately a bell shape with one major peak if they are normally distributed.

The top and bottom right-hand graphs provide information on whether the variances for each of your treatment groups are approximately equal. In the 'residuals versus fitted' graph (top right) you are looking for an equal spread of data points above and below the zero line. If the spread is unequal this would indicate that the variance in each of your treatment groups is unequal. The 'residuals versus the order of the data' graph (bottom right) indicates whether there are any unusual patterns in the data. Ideally, your data points would be scattered randomly above and below the zero line.

Another option is to store the residuals when you carry out your ANOVA by selecting Store residuals (as described above). When you click on OK to run your ANOVA, the residuals will be stored in the first empty column available on your worksheet (you can either label your residuals column with something meaningful, or leave it as the default RESI1). You can then carry out a normality test on these residuals. To do this, click on Stat on the menu bar, select Basic Statistics and click on Normality Test. In the box labelled Variable: enter the column containing your residuals, select which 'Tests of Normality' you wish to use (either Kolmogorov-Smirnov or Anderson-Darling tests are good choices) and click on OK.
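To make concrete what the stored residuals are, the sketch below computes them for a one-way design (each observation minus its own group mean) and, from the same quantities, the F statistic of the one-way ANOVA. The biomass values are hypothetical and the variable names are illustrative; this is a simplified outline of the arithmetic, not a reproduction of Minitab's output.

```python
import statistics

# Hypothetical stacked data: biomass grouped by temperature treatment.
groups = {
    "10C": [4.2, 5.1, 4.8, 4.5],
    "15C": [6.9, 7.3, 6.5, 7.0],
    "20C": [14.8, 15.2, 14.1, 15.0],
}

group_means = {name: statistics.mean(vals) for name, vals in groups.items()}
all_values = [v for vals in groups.values() for v in vals]
grand_mean = statistics.mean(all_values)

# Residual = observation minus its fitted value (the mean of its own group).
residuals = [v - group_means[name] for name, vals in groups.items() for v in vals]

ss_between = sum(len(vals) * (group_means[name] - grand_mean) ** 2
                 for name, vals in groups.items())
ss_within = sum(r ** 2 for r in residuals)
df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
print("residuals:", [round(r, 3) for r in residuals])
```

The residuals sum to zero within each group, which is why the diagnostic plots are centred on the zero line.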


A P value > 0.05 indicates that there is no evidence that the data depart from a normal distribution (we cannot reject the null hypothesis that the data are normally distributed). It is also possible to test for equality of variances between your treatment groups. To do this, click on Stat on the menu bar, select ANOVA and click on Test for Equal Variances. In the box labelled Response: enter the column containing your saved residuals (RESI1 in this example) and in Factors: select the columns containing the factors in your model (temp in this example). Click OK.

A P value > 0.05 using Bartlett's test (for normally distributed data) suggests that the variances do not differ (i.e. we cannot reject the null hypothesis that there is no difference between the variances of the treatments).

Two-way ANOVA (anova2 data.MPJ)

In this exercise you are comparing the biomass of a plant species grown at two different moisture levels and at three temperatures. There are five replicates for each of the treatment combinations. The data should be stacked and the levels of the two factors placed in adjacent columns [e.g. on the following screen shot, the data in columns C1-C4 have been stacked into column C8; the subscripts for each factor being tested (moisture and temperature) are in columns C6 and C7].

Once the data have been arranged into the correct format, click on Stat on the menu bar, select ANOVA, and then click on General Linear Model (GLM is a general term for a group of statistical tests which includes ANOVA and is the most versatile option for a two-way ANOVA and higher). In the box labelled Responses: enter the column containing your stacked data, e.g. C8 (biomass) in the above example. In the box labelled Model: enter the columns containing the factors, e.g. C6 C7, followed by C6*C7 to test for an interaction between the two factors (the model can be shortened to C6|C7 or C6!C7).

To determine which pairs of means are significantly different, click on Comparisons. In the box labelled Terms: enter the model terms for comparison (e.g. C6*C7 in the above example) and de-select the Confidence interval, with confidence level option. Make sure that the option Pairwise comparisons is selected and that Tukey is the method selected for the comparisons. Click on OK.

Click on Results. In the box labelled Display least squares means corresponding to the terms: enter the model terms of interest (e.g. C6*C7 in the example below). Click on OK and then click on OK again.


The results should look like the following screen capture:

The values of interest are the P values shown in the ANOVA table. In the example above, the effect of moisture level on mean biomass was significant (P < 0.001), the effect of temperature was significant (P < 0.05) and there was a significant (P < 0.001) interaction between moisture and temperature treatments.

To visualise the main effects, click on Stat on the menu bar, select ANOVA, and then click on Main Effects Plot. In the box labelled Responses: enter the column containing your stacked data, e.g. C8 in the above example, and in Factors: enter the columns containing the factors, e.g. C6 C7. Click on OK.

To visualise the interactions, click on Stat on the menu bar, select ANOVA, and then click on Interactions Plot. In the box labelled Responses: enter the column containing your stacked data, e.g. C8 in the above example, and in Factors: enter the columns containing the factors, e.g. C6 C7. Click on OK. If there were no interaction between the two factors, the lines joining the means in the graphical output would be parallel.

To see which treatment means are significantly different, maximise your Session window or scroll down your Session window in order to view the results from the Comparisons test. The output should look like this:


[Figure: Session window output from the Comparisons test, with the treatment combinations that have significantly different biomass highlighted]

The best way to interpret this information is to work through each table systematically. In the above example the first treatment combination (moisture level 1 (dry), temperature 1 (10°C)) is compared with moisture level 1 (dry), temperature 2 (15°C). There is no significant difference in biomass between these two treatments as the P value is 0.9234. The next comparison is between moisture level 1 (dry), temperature 1 (10°C) and moisture level 1 (dry), temperature 3 (20°C), which is also not significantly different. Keep working through the comparisons in the same fashion until all treatments have been compared.
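The interaction plot described earlier in this section is a plot of the mean of each treatment combination. As a rough illustration, the sketch below computes those cell means from stacked rows and shows how the gap between moisture levels changes with temperature when an interaction is present. All values are hypothetical and are not taken from anova2 data.MPJ.

```python
import statistics
from collections import defaultdict

# Hypothetical stacked rows: (moisture level, temperature level, biomass).
rows = [
    (1, 1, 4.0), (1, 1, 4.4), (1, 2, 4.6), (1, 2, 4.2), (1, 3, 4.9), (1, 3, 5.1),
    (2, 1, 5.0), (2, 1, 5.4), (2, 2, 8.1), (2, 2, 7.7), (2, 3, 12.2), (2, 3, 11.8),
]

cells = defaultdict(list)
for moisture, temperature, biomass in rows:
    cells[(moisture, temperature)].append(biomass)

cell_means = {key: statistics.mean(vals) for key, vals in cells.items()}

for moisture in (1, 2):
    line = [round(cell_means[(moisture, t)], 2) for t in (1, 2, 3)]
    print(f"moisture {moisture}: {line}")

# With no interaction the two lines would be parallel, i.e. the gap between
# moisture levels would be about the same at every temperature.
gaps = [cell_means[(2, t)] - cell_means[(1, t)] for t in (1, 2, 3)]
print("gap between moisture levels at each temperature:", [round(g, 2) for g in gaps])
```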

Data Transformation (anova data.MPJ)

If your data are not normally distributed and/or the variances of your treatment groups are not equal, then an alternative to using non-parametric statistics is to transform your data. There are a number of different transformations that you can apply to your data and you should consult your course text regarding these. In Minitab you are able to perform a number of transformations, including the log10 transformation.

To perform a log10 transformation on your biomass data (anova data.MPJ, but remember to stack the data again if you haven't already saved it), click on Calc on the main menu and then select Calculator. In the box labelled Store result in variable enter the name of the next available column in your worksheet (C6 in this case). Click in the Expression box and then scroll down and select the Log 10 function in the Functions: box by double clicking it. This will enter the function into the Expression box. You then need to double click on your response variable (biomass in this case) and then click on the + and then 1 buttons on the calculator. This should result in LOGT(biomass+1) being entered into the Expression box.


Click on OK to transform your data. Your transformed data will now be stored in C6. It is a good idea to give the column heading a title for later reference.
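The same log10(x + 1) transformation can be sketched in Python as follows. The biomass values are hypothetical; the zero is included to show why 1 is added before taking logs, since the logarithm of zero is undefined.

```python
import math

# Hypothetical biomass values; the zero shows why 1 is added before taking logs.
biomass = [0.0, 0.9, 9.0, 99.0]

# Equivalent in spirit to the calculator expression LOGT(biomass+1).
log_biomass = [math.log10(value + 1) for value in biomass]

for raw, transformed in zip(biomass, log_biomass):
    print(f"{raw:6.1f} -> {transformed:.3f}")
```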

If you are unsuccessful in applying other transformations, a last resort is the Box-Cox Transformation. To perform this transformation in Minitab, click on Stat on the menu bar, select Control Charts and then click on Box-Cox Transformation. Select the appropriate option for the way in which your data are arranged. For the ANOVA examples in this manual you would select Single column: and in the box on the right you would type in the column where your data are (e.g. C5 in the example of a one-way ANOVA). In the box labelled Subgroup size: you would type in the number of replicates (e.g. 10 in the example of a one-way ANOVA). Finally, below where it says Store transformed data, in the box labelled Single column: type in an empty column in which to store the data (e.g. C8 in the example of a one-way ANOVA). Click on OK. The optimal transformation for your data will have been carried out and the transformed values stored in the specified column.

Correlation & simple linear regression (blood pressure.MPJ)

To determine whether two variables measured on the same set of subjects are correlated, click on Stat on the menu bar, select Basic Statistics and then click on Correlation. In the box labelled Variables: enter the columns containing the variables (e.g. first reading, second reading), select Display p-values and click on OK. In the Session window, the Pearson correlation coefficient, r, is given and indicates the strength of the linear association between the two variables, as well as the P value, which indicates the strength of evidence that there is a linear association between the two variables in the population.
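For reference, the Pearson correlation coefficient can be computed from first principles as in the sketch below. The paired readings are hypothetical (they are not the values in blood pressure.MPJ) and `pearson_r` is an illustrative helper name.

```python
import math
import statistics

# Hypothetical paired readings taken on the same subjects.
first_reading = [110, 118, 125, 131, 140, 152]
second_reading = [112, 115, 124, 128, 137, 144]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(first_reading, second_reading)
print(f"r = {r:.3f}")
```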


For regression, click on Stat on the menu bar, select Regression and then click on Regression. In the box labelled Response: enter the column containing the Y data (second reading in this example) and in Predictors: enter the column containing the X data (first reading in this example). Click on OK. If you use the data in blood pressure.MPJ the output should look like this:

The equation of the linear regression is: Second reading (Pa) = 20.2 + 0.812 First reading (Pa), i.e. the point at which the line intercepts the Y axis is 20.2 and the slope of the line is 0.812. The r² (R-Sq) value measures the strength of the relationship between the two variables. Minitab gives you this value as a percentage, but it is more appropriate to convert this to a decimal. In the above example this would give an r² value of 0.77. The P value in the ANOVA table indicates whether there is a significant linear relationship in the population. In the above example there was a significant (P < 0.001) linear relationship in the population. The same diagnostic tools used for ANOVA are available for regression, i.e. you need to examine the residual plots.
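As a small worked illustration, the fitted equation quoted above can be used to predict a second reading from a first reading, and an R-Sq percentage can be converted to a decimal. The intercept and slope are the values from the text; the input reading of 100 and the R-Sq of 77.0% are assumptions chosen for illustration (77.0 is merely consistent with the rounded 0.77 quoted above).

```python
# Coefficients quoted in the text for the blood pressure example.
INTERCEPT = 20.2
SLOPE = 0.812

def predict_second_reading(first_reading):
    """Predicted second reading from the fitted regression line."""
    return INTERCEPT + SLOPE * first_reading

def percent_to_proportion(r_sq_percent):
    """Convert Minitab's R-Sq percentage to a decimal r-squared."""
    return r_sq_percent / 100

# Hypothetical inputs, used only to exercise the two helpers.
print(f"predicted second reading: {predict_second_reading(100.0):.1f}")
print(f"r-squared as a decimal: {percent_to_proportion(77.0):.2f}")
```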

Chi-square test (chisquare data.MPJ)

To carry out a chi-square test for association, click on Stat on the menu bar, select Tables and click on Chi-Square Test (Two-Way Table in Worksheet). In the box labelled Columns containing the table: enter the columns containing your data, e.g. C1 C2 C3 in the example below (the data should be entered as a table in Minitab) and click on OK. The columns can be labelled with the names of the categories but the rows cannot be labelled. Minitab calculates the expected values, the chi-square (χ²) statistic and the P value associated with the null hypothesis. In the example below, the columns refer to counts of milkwort flowers of a particular colour; row 1 refers to counts obtained at dry sites and row 2 refers to counts obtained at wet sites. In this example, the P value of < 0.05 suggests that there is evidence of a significant association between milkwort colour and site characteristics. The output should look like this:
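For readers who want to check the arithmetic behind the expected values and the χ² statistic, here is a minimal sketch. The counts are hypothetical (they are not the milkwort data in chisquare data.MPJ), and the P value is left to Minitab.

```python
# Hypothetical counts: rows = sites (dry, wet), columns = flower colours.
observed = [
    [30, 15, 5],
    [10, 20, 20],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count = row total x column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(observed, expected)
    for obs, exp in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(col_totals) - 1)

print(f"chi-square = {chi_square:.3f}, df = {df}")
```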

Non-parametric tests (mann-whitney data.MPJ)

Minitab has a range of non-parametric tests. To access these tests, click on Stat on the menu bar, select Nonparametrics and click on the desired test. For example, to compare two samples using the Mann-Whitney U test (an alternative to an unpaired t-test), click on Stat on the menu bar, select Nonparametrics and click on Mann-Whitney. In the box labelled First Sample: enter the column containing the first sample (e.g. C1 Morning in the example below) and in the box labelled Second Sample: enter the column containing the second sample (e.g. C2 Afternoon in the example below), check that the Confidence level: is 95.0, check that the Alternative: hypothesis selected is not equal and click on OK.

In the example below, the null hypothesis is that there is no difference between the test scores of students that take a class test in the morning or afternoon. As the P value is > 0.05 we cannot reject the null hypothesis, so there is no evidence of a difference in test scores between students that take the test in the morning and those that take it in the afternoon. The output should look like this:
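To show what a rank-based test works with, the sketch below computes the rank sum of the first sample (often labelled W) and the corresponding Mann-Whitney U statistic. The scores are hypothetical (not the values in mann-whitney data.MPJ) and `rank_sum_first` is an illustrative helper. The P value is left to Minitab.

```python
# Hypothetical test scores for two independent groups of students.
morning = [62, 71, 55, 68, 74, 59]
afternoon = [65, 70, 58, 77, 66, 61]

def rank_sum_first(x, y):
    """Sum of the ranks of x in the combined sample (ties get the average rank)."""
    combined = sorted(x + y)

    def average_rank(value):
        first = combined.index(value) + 1   # rank of the first occurrence
        count = combined.count(value)       # number of tied values
        return first + (count - 1) / 2

    return sum(average_rank(v) for v in x)

n1, n2 = len(morning), len(afternoon)
w = rank_sum_first(morning, afternoon)
u = w - n1 * (n1 + 1) / 2    # Mann-Whitney U for the first sample

print(f"W = {w}, U = {u}")
```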

