
Stat 350 Project 01

Projects are due at the beginning of class on the indicated due date (9/14/2015, Monday).
To receive full credit you must:
Show sufficient output: not too much and not too little.
Write clear, complete sentences. An important part of a statistical analysis is clearly
communicating your conclusions, often to people who don't understand statistics.
Clearly label your solutions.
Staple your pages together.

For this project, you will be performing an exploratory analysis on a set of data
containing repeated measurements on the speed of light. You will be generating
graphical and numerical summaries in Minitab, and creating a short write-up describing your findings.
Data set: LIGHT.MTW can be downloaded from Canvas.
Most of what you will do in Minitab will be under Graphs or Stat > Descriptive
Statistics, as demonstrated in class. You can also search the Minitab Help pages
or the web.

In 1879, Michelson made 100 measurements of the velocity of light in air using a modification of a method proposed by the French physicist Foucault. The experiments were
broken down into five trials of 20 experiments each. Download the Minitab data file
LIGHT.MTW from the course website, which contains two variables: Velocity and Trial. The
velocity measurements are in km/sec (and have had 299,000 subtracted from them).
Here you will go through the process of exploratory data analysis, where you will summarize the data, numerically and graphically, and identify some patterns not obvious
from the list of numbers in the data file. All calculations and plots should be done with
Minitab and the relevant output pasted into your report. In the write-up you submit,
you should use complete sentences, your numerical answers should have units attached,
and your tables/graphs should be clearly labeled.
1. Draw a histogram of Velocity and comment on the shape (symmetric, bimodal,
skewed) of the distribution. If the distribution is skewed, specify which direction.
2. Calculate the mean and median of Velocity. Are the two numbers about the same,
or is one considerably larger than the other? Can you explain this relationship
based on your graph in Problem 1?

3. Report the five-number summary. Interpret the value of the third quartile (Q3) you
obtained in the context of this speed of light study.
4. An important part of data analysis is identifying observations that are extreme (too
large or too small compared to the rest). These observations are called outliers.
(a) Draw a stem-and-leaf plot of Velocity. Are there any observations you suspect
could be outliers?
(b) Here is a precise rule for identifying outliers: An observation is considered an
outlier if it falls outside the interval

\[ [\,Q_1 - 1.5 \times \mathrm{IQR},\; Q_3 + 1.5 \times \mathrm{IQR}\,], \tag{1} \]

where $Q_1$ and $Q_3$ are the first and third quartiles, respectively, and $\mathrm{IQR} = Q_3 - Q_1$
is the inter-quartile range. Use the five-number summary in Problem 3
(and a calculator) to find interval (1) for Velocity, and identify the outliers.
(c) Construct a boxplot of Velocity and confirm that the outliers detected by
Minitab agree with your conclusions in part (b).
(d) Create a new variable VelocityNew (see footnote 1) in column C3 that consists of the Velocity
measurements with the outliers removed. Compute the mean and median of
VelocityNew. How has the relationship between mean and median changed
(compared to Problem 2) when the outliers are removed?
5. An important technique in many statistical analyses is standardization. For data
$x_1, \ldots, x_n$ with mean $\bar{x}$ and standard deviation $s$, the standardized values (also
called z-scores) are defined as

\[ z_i = \frac{x_i - \bar{x}}{s}, \qquad i = 1, 2, \ldots, n. \tag{2} \]

(a) Create a new variable Z (see footnote 2) in column C4 that contains the standardized values
of Velocity. Compute the mean and standard deviation of Z.
(b) Construct a histogram of Z. What has changed here compared to the histogram
in Problem 1? What has not changed?
6. Up until now we have not used the Trial variable. But perhaps some additional
information can be obtained if we take this variable into account.
(a) Draw a side-by-side boxplot of Velocity using Trial as the "by" variable. Compare the five distributions in terms of their location and spread.
(b) What does this tell us about the sequence of experiments performed by Michelson? In particular, is it reasonable to assume that the 100 experiments were
performed under identical experimental conditions? Use the side-by-side boxplot from (a) to justify your answer. (This is important because the way
we would proceed to analyze this data would depend on the design of the
experiment.)
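
The outlier rule in Problem 4(b) can also be checked outside Minitab. The following is a minimal Python sketch, not part of the required Minitab workflow. The function names and the sample values are hypothetical (the sample is not the LIGHT.MTW data), and quartile conventions differ between software packages, so the quartiles computed here may not match Minitab's output exactly.

```python
import statistics


def iqr_fences(values):
    """Return (lower, upper) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR) as in interval (1)."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    # Assumption: Python's default quartile method is used; other software
    # may interpolate differently.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values):
    """Return the observations that fall outside interval (1)."""
    lower, upper = iqr_fences(values)
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements for illustration only (NOT the LIGHT.MTW data).
sample = [2, 4, 4, 5, 6, 7, 8, 9, 10, 30]

lower, upper = iqr_fences(sample)
outliers = find_outliers(sample)
print("Interval (1):", (lower, upper))
print("Outliers:", outliers)
```

An observation is flagged only when it lies strictly outside the interval, which matches the wording of the rule above.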
Footnote 1: Highlight the column Velocity and paste it into C3; then manually delete each outlier.
Footnote 2: Go to Calc > Standardize; double-click Velocity and type C4 for storage.
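
For comparison with the Standardize step, formula (2) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function name and sample values are hypothetical (not the LIGHT.MTW data), and s is taken to be the sample standard deviation (divisor n - 1), which is an assumption about the intended definition.

```python
import statistics


def standardize(values):
    """Return the z-scores z_i = (x_i - mean) / s from formula (2)."""
    mean = statistics.mean(values)
    # Assumption: s is the sample standard deviation (divisor n - 1).
    s = statistics.stdev(values)
    return [(x - mean) / s for x in values]


# Hypothetical measurements for illustration only (NOT the LIGHT.MTW data).
raw = [2, 4, 4, 5, 6, 7, 8, 9, 10, 30]

z = standardize(raw)
z_mean = statistics.mean(z)
z_sd = statistics.stdev(z)
print("Mean of Z:", z_mean)
print("Standard deviation of Z:", z_sd)
```

Floating-point rounding means the printed summaries may differ from exact values by a very small amount.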
