
Ryan Gardner

Math 1040

Professor: Richard Oremus

4/18/17

Term Project

For the term project, we each purchased a 2.17 oz bag of Original Skittles and recorded the count of
each color in the bag and the total Skittle count. Each student submitted their data, and the data was
compiled for the entire class. Using this sample data, we worked both in groups and as individuals to
calculate descriptive statistics. Later we used the data to perform inferential statistics to estimate
characteristics of all Skittles. The project consists of 5 parts and began with my Skittle counts, which
are as follows.

Term Project- Part 1

Counts
                 Red Count   Orange Count   Yellow Count   Green Count   Purple Count   Total
My Bag Count          8.00          10.00          17.00         13.00           8.00   56.00

Term Project- Part 2

In part two of the term project we first worked as a group and then as individuals. As a group, we
first made a guess of the proportions of the class counts of each color. We then calculated the actual
proportions and displayed the proportions using a Pie and Pareto Chart. We then discussed whether the
data represented a random sample.

Without looking at the data, we expect the Skittles to be in even proportions across all colors. We
expect each color to represent 20% of the total. We think the proportions will be even because the
different colored Skittles are probably manufactured in equal quantities before being mixed and
placed in individual bags. With modern production processes, the factory should be able to achieve
an even production of the different colors.

Here is a table representing the actual proportions:

Red Count   Orange Count   Yellow Count   Green Count   Purple Count
    20.5%          19.4%          21.4%         19.8%          19.0%

Yes, we believe that the class data does represent a random sample, because the selection of
the bags of 2.17oz Skittles was left up to chance. Each individual selection was made at different
stores in different areas of the valley depending on where each student lives. Every bag could
have been made on a different day, or month. Also, these bags could have come from different
factories.
The population is 2.17oz bags of Skittles.

For the individual submission in part two I created a table that includes the counts for my bag
and the counts for the entire class. I then discussed my observations.

Counts
                      Red Count   Orange Count   Yellow Count   Green Count   Purple Count   Total
My Bag Count               8.00          10.00          17.00         13.00           8.00   56.00
Average Class Count       12.21          11.55          12.76         11.82          11.34   59.68

Proportions
                      Red Count   Orange Count   Yellow Count   Green Count   Purple Count   Total Count
My Bag Proportions        14.3%          17.9%          30.4%         23.2%          14.3%         56.00
Class Proportions         20.5%          19.4%          21.4%         19.8%          19.0%         59.68
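
Each proportion in the table above is a color count divided by the total count. As a minimal sketch
in Python (the variable and function names are illustrative and were not part of the original
project), the percentages can be reproduced from the counts:

```python
# Counts taken from the tables above; all names here are illustrative assumptions.
COLORS = ["Red", "Orange", "Yellow", "Green", "Purple"]
my_bag = [8, 10, 17, 13, 8]                          # total 56
class_average = [12.21, 11.55, 12.76, 11.82, 11.34]  # total 59.68


def proportions(counts):
    """Return each count as a percentage of the total, rounded to 0.1%."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]


my_bag_pct = proportions(my_bag)
class_pct = proportions(class_average)

for color, mine, cls in zip(COLORS, my_bag_pct, class_pct):
    print(f"{color:<7} my bag: {mine:4.1f}%   class: {cls:4.1f}%")
```

Using the average class counts gives the same proportions as using the class totals, because every
average is the total divided by the same number of bags.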

For the most part, the graphs do reflect what I expected to see. I expected that the
proportions for each color of Skittle would be close to 20% in the class count, and I was surprised
by how close the class data came to that expectation. The yellow count in my bag appears to be an
outlier. Since it is much higher than the average class count for yellow, it causes the graphics
and summary statistics for my bag to be skewed toward a higher count for yellow. Since the frequency
of yellow Skittles is high, and the frequencies for purple and red are low, my single bag of Skittles
does not match the distribution of colors in the total class data.

Term Project- Part 3

For part 3 we performed calculations to obtain descriptive statistics of the skittle data. We then
displayed the data using a frequency histogram and box plot.

1. I: Mean Number of Candies: 59.7
   II: Standard Deviation: 2.9
   III: 5 Number Summary: 53, 58, 59.5, 61, 66

Term Project Part 3- Individual

The individual submission of part 3 discusses the findings about the variable, total candies in
each bag. The differences between categorical and quantitative data are also discussed.

The shape of the distribution is relatively symmetric. This is demonstrated by the difference
between the upper limit and the median, and the lower limit and the median. The difference
between the upper limit and the median is 5.5, and the difference between the lower limit and
the median is 4.5. Since the difference between the upper limit and the median is slightly higher
than the difference between the lower limit and the median, the data is skewed right only
slightly, which is why the shape of the distribution is relatively symmetric.

I was surprised by the amount of dispersion of the data. Since each bag of Skittles is the same
weight (2.17 oz) I expected the standard deviation to be lower than it was. One thing that I can
think of that would account for this difference is an inconsistency in the size of the individual
Skittles.

My Bag Count          56.00
Average Class Count   59.68

My bag of Skittles fell on the lower end of the counts for total Skittles. My bag was 1.27
standard deviations below the mean. Since the count from my bag was greater than the lower
fence limit of 55, the overall class data does agree with the count for my bag. My counts are not
an outlier.
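
The distance of my bag from the class mean, measured in standard deviations, is a z-score. The sketch
below assumes the average class count of 59.68 and the rounded standard deviation of 2.9 reported
earlier in Part 3. The variable names are illustrative.

```python
# Values taken from the report; the names are illustrative assumptions.
my_bag_total = 56
class_mean = 59.68   # average class count
class_sd = 2.9       # standard deviation as reported in Part 3 (rounded)

# z-score: how many standard deviations the bag lies from the class mean.
z = (my_bag_total - class_mean) / class_sd
print(f"z = {z:.2f}")
```

A negative value means the bag is below the mean. With these inputs the result rounds to about
-1.27, which matches the figure quoted above.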

There are many differences between categorical and quantitative data. One difference in
particular that stands out to me is the ability to use arithmetic on quantitative data. In our
example, we began with categorical data- the color of the Skittles. We can come to this
conclusion by trying to apply arithmetic to the data. Trying to add red plus yellow is nonsense,
so we know this is categorical data.

Frequency and relative frequency bar graphs and pie charts can be used to provide a
visual representation of categorical data. The height of the bars in a bar graph gives a visual
comparison of the frequency of each category relative to the other categories. A pie
chart accomplishes something similar, but is particularly useful when comparing a part of the
data to the whole. Stem and leaf plots, histograms, and box plots do not work with categorical
data, and are used for quantitative data. The stem and leaf plot and histogram each require
numeric classes to be established, so categorical data cannot be displayed meaningfully with
them. The box plot requires a 5 number summary, which is not possible to calculate from
categorical data. Calculations for categorical data include frequency and relative frequency. The 5
number summary, standard deviation, variance, and range are not appropriate for categorical
data, because they require the ability to use arithmetic on the data.

Initially, we began with categorical data when we put together a frequency distribution
from the colors of Skittles included in the bag. When the class data was compiled and made
available for comparison, the data became quantitative. For instance, 5 number summary,
standard deviation, and variance can all be calculated when comparing the total count for each
of the bags. We can also make histograms, stem and leaf plots, and dot plots since we can assign
classes to the data.

Group Project Part 4

In part 4 of the project we constructed a 99% confidence interval estimate for the population
proportion of yellow candies, and a 95% confidence interval estimate for the population mean number
of candies per bag. We then summarized the meaning of our estimates. For the individual portion of the
project I described the meaning of the confidence interval. Our work is as follows:

1. Construct a 99% confidence interval estimate for the population proportion of yellow
candies. Show your work, including the computations for the margin of error and the
critical value.

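Total Candies: $n = 2268$     Yellow Candies: $x = 485$

$E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

$\hat{p} = \frac{x}{n} = \frac{485}{2268} = .2138$     $\alpha/2 = .01/2 = .005$     $z_{\alpha/2} = 2.5758$

Lower Bound $= .2138 - 2.5758\sqrt{\frac{.2138(1-.2138)}{2268}} = .1916$

Upper Bound $= .2138 + 2.5758\sqrt{\frac{.2138(1-.2138)}{2268}} = .236$

Margin of Error $= \frac{.236 - .1916}{2} = .0222$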
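
As a cross-check on the hand calculation above, here is a minimal Python sketch of the same normal
approximation interval for a proportion. The variable names are illustrative. The code carries the
unrounded sample proportion through the calculation, so its bounds may differ from the hand-computed
values in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

n = 2268           # total candies in the class sample
x = 485            # yellow candies
confidence = 0.99

p_hat = x / n
alpha = 1 - confidence
# Critical value z_{alpha/2} from the standard normal distribution.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Margin of error and interval bounds for the population proportion.
margin = z_crit * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(f"p_hat = {p_hat:.4f}, z = {z_crit:.4f}, E = {margin:.4f}")
print(f"99% CI: ({lower:.4f}, {upper:.4f})")
```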

2. Construct a 95% confidence interval estimate for the population mean number of
candies per bag. Show your work, including the computations for the margin of error
and the critical value.

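$E = t_{\alpha/2}\frac{s}{\sqrt{n}}$

$\alpha/2 = .05/2 = .025$     $t_{\alpha/2} = 2.0262$     $\bar{x} = 59.68$     $s = 2.88$     $n = 38$

Lower Bound: $59.68 - 2.0262\cdot\frac{2.88}{\sqrt{38}} = 58.7334$

Upper Bound: $59.68 + 2.0262\cdot\frac{2.88}{\sqrt{38}} = 60.6266$

Margin of Error: $\frac{60.6266 - 58.7334}{2} = .9466$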
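
The interval for the mean can be checked the same way. This sketch takes the critical value 2.0262
from the calculation above as a given constant (37 degrees of freedom), since the Python standard
library has no Student t distribution. The variable names are illustrative.

```python
from math import sqrt

x_bar = 59.68     # sample mean number of candies per bag
s = 2.88          # sample standard deviation
n = 38            # number of bags in the class sample
t_crit = 2.0262   # t_{alpha/2} for 95% confidence, df = n - 1 = 37 (from the report)

# Margin of error and interval bounds for the population mean.
margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"E = {margin:.4f}")
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

If SciPy is installed, `scipy.stats.t.ppf(0.975, df=37)` should return approximately the same
critical value.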

3. Discuss and interpret (with complete sentences) the results of each of your two
interval estimates.

We are 99% confident that the population proportion of yellow candies is .2138 ± .0222.

We are 95% confident that the population mean of the skittles per bag is between 58.7334 and
60.6266 candies.

Term Project Part 4: Individual

The general purpose of a confidence interval is to attach a margin of error to an estimate.
Confidence intervals show how precisely sample data represents the population. Depending on how
important accuracy is to the situation, confidence intervals can be used to determine whether data is
suitable for decision making. A confidence interval provides a range of values and a confidence level
that the population parameter falls within this range of values. The confidence level is expressed as
a percentage. In situations where accuracy is important, and obtaining a parameter from an entire
population is cost prohibitive, confidence intervals can be used to determine the most effective
sample size. An increase in the sample size can have different results depending on the needs of the
statistician. It can increase the confidence that the population parameter falls within the same
range of values, it can reduce the width and margin of error of the range at the same level of
confidence, or it can do some of both. In general, the confidence interval is used to determine how
confident one can be that a statistic represents the population parameter.
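
The trade-off between sample size, confidence level, and margin of error can be illustrated with a
short Python sketch. It reuses the sample proportion of yellow candies from Part 4. The function name
and the quadrupled sample size are hypothetical and are used only for comparison.

```python
from math import sqrt
from statistics import NormalDist


def proportion_margin(p_hat, n, confidence):
    """Margin of error for a proportion using the normal approximation."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_crit * sqrt(p_hat * (1 - p_hat) / n)


p_hat = 0.2138  # sample proportion of yellow candies from Part 4

# Same confidence level, four times the sample: the margin of error is halved.
base = proportion_margin(p_hat, 2268, 0.99)
quadrupled = proportion_margin(p_hat, 4 * 2268, 0.99)  # hypothetical sample size

# Same sample size, lower confidence level: the interval is narrower.
at_95 = proportion_margin(p_hat, 2268, 0.95)

print(f"99%, n = 2268: E = {base:.4f}")
print(f"99%, n = 9072: E = {quadrupled:.4f}")
print(f"95%, n = 2268: E = {at_95:.4f}")
```

Because the margin of error is proportional to one over the square root of the sample size, halving
it takes four times as many candies.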

Term Project Part 5: Reflection

The biggest takeaway for me from the term project was learning new skills that I will be able
to use for problem solving in the future. The skill that I think will be the most useful that I
learned during the project was how to calculate and interpret confidence interval estimates for
population proportions and population means. It is often very difficult if not impossible to gather
data for an entire population, making this an important skill.

Prior to taking this course I was working on suggestions for solving a problem at work.
The problem was that rail vehicles were being delayed because headlamps on the vehicles were
going out during service and had to be replaced at the end of the line, where there is not enough
time to complete the replacement. Learning how to construct a confidence interval estimate for
the population mean gave me a new tool to use to look for a solution to the problem. Using
sample data from the repair database, I generated a 99% confidence interval to estimate the mean
headlamp life. The confidence interval can now be used to determine a replacement interval
that reduces delays due to headlamp failures without drastically increasing the number of
headlamps that are replaced prematurely.

The applications of population mean and population proportion estimates are virtually
endless. Working through these estimates for the term project has given me a better
understanding of what goes into the polling statistics before an election, and the importance of a
confidence interval when describing a statistic, since it is only an estimate of a population
parameter. I have been confused many times about why estimates of the outcome of an election
are incorrect. After completing the project I have come to realize that many of the reasons why
the statistic did not represent the population are due to limitations of the sample data and not the
statistical calculations that were used. Completing this project has given me a better
understanding of what I am looking at when reading statistics.

Conclusion

In order to complete the term project I had to expand my skills. I learned how to calculate
proportions and display the information in a Pie and Pareto Chart. I learned how to display data
using a box plot in order to easily identify outliers in the data. I learned about histograms
and how to identify the type of distribution. I learned the difference between categorical and
quantitative data, and finally, I learned how to take the sample statistics and apply them to the
population to make estimates of the population parameters with a certain level of confidence.
