Math 1040
4/18/17
Term Project
For the term project, we each purchased a 2.17oz bag of original Skittles, recorded counts of
each color in the bag, and the total Skittle count. Each individual submitted their data and the data was
compiled for the entire class. Using this sample data, we worked both in groups and as individuals to
calculate descriptive statistics. Later we used the data to perform inferential statistics to determine
characteristics of all Skittles. The project consists of 5 parts and began with my Skittle counts, which
are as follows.
Counts
                 Red     Orange   Yellow   Green    Purple   Total
My Bag Count     8.00    10.00    17.00    13.00    8.00     56.00
In part two of the term project we first worked as a group and then as individuals. As a group, we
first made a guess of the proportions of the class counts of each color. We then calculated the actual
proportions and displayed the proportions using a Pie and Pareto Chart. We then discussed whether the
data represented a random sample.
Without looking at the data, we expect the Skittles to be in even proportions across all colors,
with each color representing 20% of the total. We expect even proportions because we think the
different colored Skittles are manufactured in equal quantities before being mixed and placed in
individual bags. With modern production processes, the factory should be able to achieve an even
production of the different colors.
Yes, we believe that the class data does represent a random sample, because the selection of
the bags of 2.17oz Skittles was left up to chance. Each individual selection was made at different
stores in different areas of the valley depending on where each student lives. Every bag could
have been made on a different day, or month. Also, these bags could have come from different
factories.
The population is 2.17oz bags of Skittles.
For the individual submission in part two I created a table that includes the counts for my bag
and the counts for the entire class. I then discussed my observations.
Counts
                      Red     Orange   Yellow   Green    Purple   Total
My Bag Count          8.00    10.00    17.00    13.00    8.00     56.00
Average Class Count   12.21   11.55    12.76    11.82    11.34    59.68

Proportions
                      Red     Orange   Yellow   Green    Purple   Total
My Bag Proportions    14.3%   17.9%    30.4%    23.2%    14.3%    56.00
Class Proportions     20.5%   19.4%    21.4%    19.8%    19.0%    59.68
For the most part, the graphs do reflect what I expected to see. I expected that the
proportions for each color of Skittle would be close to 20% in the class count, and I was
surprised to see how close the data was to that expectation. The yellow count in my bag appears
to be an outlier. Since it is much higher than the average class count for yellow, it pulls the
graphics and summary statistics for my bag toward a higher count for yellow. Because the
frequency of yellow Skittles is high and the frequencies for purple and red are low, my single
bag of Skittles does not match the distribution of colors in the total class data.
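
As a rough check on the proportions discussed above, the short Python sketch below recomputes each color's share of my bag from the recorded counts. The function name `proportions` is only an illustrative choice and is not part of the original project work.

```python
# Counts from my bag, as recorded in the counts table.
my_bag = {"Red": 8, "Orange": 10, "Yellow": 17, "Green": 13, "Purple": 8}


def proportions(counts):
    """Return each color's share of the bag as a fraction of the total."""
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}


my_props = proportions(my_bag)

# Print each color next to its percentage of the bag.
for color, p in my_props.items():
    print(f"{color:<7}{p:6.1%}")
```

Comparing these shares with the 20% expected for each color makes the high yellow share and the low red and purple shares easy to see.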
For part 3 we performed calculations to obtain descriptive statistics of the skittle data. We then
displayed the data using a frequency histogram and box plot.
Term Project Part 3- Individual
The individual submission of part 3 discusses the findings about the variable, total candies in
each bag. The differences between categorical and quantitative data are also discussed.
The shape of the distribution is relatively symmetric. This is demonstrated by the difference
between the upper limit and the median, and the lower limit and the median. The difference
between the upper limit and the median is 5.5, and the difference between the lower limit and
the median is 4.5. Since the difference between the upper limit and the median is slightly higher
than the difference between the lower limit and the median, the data is skewed right only
slightly, which is why the shape of the distribution is relatively symmetric.
I was surprised by the amount of dispersion of the data. Since each bag of Skittles is the same
weight (2.17 oz) I expected the standard deviation to be lower than it was. One thing that I can
think of that would account for this difference is an inconsistency in the size of the individual
Skittles.
My bag of Skittles fell on the lower end of the counts for total Skittles. My bag was 1.27
standard deviations below the mean. Since the count from my bag (56) was greater than the lower
fence of 55, the count for my bag is consistent with the overall class data. My count is not
an outlier.
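
The comparison of my bag with the class can be sketched in a few lines of Python. This is only an approximate check: it uses the rounded class mean (59.68) and standard deviation (2.88) that appear elsewhere in this report, so the last digit of the z-score may differ slightly from the value computed from the unrounded class data.

```python
my_total = 56        # total candies in my bag
class_mean = 59.68   # class mean candies per bag (rounded)
class_sd = 2.88      # class standard deviation (rounded, as used in part 4)
lower_fence = 55     # lower fence reported for the class data

# z-score: how many standard deviations my bag is from the class mean.
z = (my_total - class_mean) / class_sd

# A count is a low outlier only if it falls below the lower fence.
is_low_outlier = my_total < lower_fence

print(f"z = {z:.2f}, low outlier: {is_low_outlier}")
```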
There are many differences between categorical and quantitative data. One difference in
particular that stands out to me is the ability to use arithmetic on quantitative data. In our
example, we began with categorical data: the color of the Skittles. We can come to this
conclusion by trying to apply arithmetic to the data. Trying to add red plus yellow is nonsense,
so we know this is categorical data.
Frequency and relative frequency bar graphs and pie charts can be used to provide a
visual representation of categorical data. The height of the bars in a bar graph gives a visual
comparison of the frequency of each category relative to the other categories. A pie
chart accomplishes something similar, but is particularly useful when comparing a part of the
data to the whole. Stem and leaf plots, histograms, and box plots do not work with categorical
data, and are used for quantitative data. The stem and leaf plot and histogram each require
distinct numerical classes to be established, so displaying categorical data with them would be
nonsense. The box plot requires a 5 number summary, which is not possible to calculate from
categorical data. Calculations for categorical data include frequency and relative frequency. The 5
number summary, standard deviation, variance, and range are not appropriate for categorical
data, because they require the ability to use arithmetic on the data.
Initially, we began with categorical data when we put together a frequency distribution
from the colors of Skittles included in the bag. When the class data was compiled and made
available for comparison, the data became quantitative. For instance, the 5 number summary,
standard deviation, and variance can all be calculated when comparing the total count for each
of the bags. We can also make histograms, stem and leaf plots, and dot plots, since we can assign
classes to the data.
In part 4 of the project we constructed a 99% confidence interval estimate for the population
proportion of yellow candies, and a 95% confidence interval estimate for the population mean number
of candies per bag. We then summarized the meaning of our estimates. For the individual portion of the
project I described the meaning of the confidence interval. Our work is as follows:
1. Construct a 99% confidence interval estimate for the population proportion of yellow
candies. Show your work, including the computations for the margin of error and the
critical value.
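$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

$$\hat{p} = \frac{x}{n} = \frac{485}{2268} = .2138 \qquad \frac{\alpha}{2} = \frac{.01}{2} = .005 \qquad z_{\alpha/2} = 2.5758$$

$$\text{Lower Bound} = .2138 - 2.5758\sqrt{\frac{.2138(1-.2138)}{2268}} = .1916$$

$$\text{Upper Bound} = .2138 + 2.5758\sqrt{\frac{.2138(1-.2138)}{2268}} = .236$$

$$\text{Margin of Error} = \frac{.236 - .1916}{2} = .0222$$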
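
The 99% interval for the proportion of yellow candies can be checked with a short Python sketch. It uses the class totals from the computation (485 yellow candies out of 2268) and takes the critical value from the standard normal distribution through the standard library's `statistics.NormalDist`. The function name `proportion_ci` is only an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = x / n
    # Critical value z_(alpha/2) from the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, z, margin, p_hat - margin, p_hat + margin


# 485 yellow candies out of 2268 candies in the class data.
p_hat, z, margin, lower, upper = proportion_ci(485, 2268, 0.99)
print(f"p_hat = {p_hat:.4f}, z = {z:.4f}, margin = {margin:.4f}")
print(f"interval: ({lower:.4f}, {upper:.4f})")
```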
2. Construct a 95% confidence interval estimate for the population mean number of
candies per bag. Show your work, including the computations for the margin of error
and the critical value.
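$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$$

$$\text{Lower Bound} = 59.68 - 2.0262\cdot\frac{2.88}{\sqrt{38}} = 58.7334$$

$$\text{Upper Bound} = 59.68 + 2.0262\cdot\frac{2.88}{\sqrt{38}} = 60.6266$$

$$\text{Margin of Error} = \frac{60.6266 - 58.7334}{2} = .9466$$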
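
The 95% interval for the mean can be checked the same way. The sketch below uses only the Python standard library, so the t critical value (2.0262, for 37 degrees of freedom) is copied from the computation rather than looked up in code. The variable names are illustrative.

```python
from math import sqrt

x_bar = 59.68    # sample mean candies per bag
s = 2.88         # sample standard deviation
n = 38           # number of bags in the class sample
t_crit = 2.0262  # t critical value taken from the computation above

# Margin of error and interval bounds for the population mean.
margin = t_crit * s / sqrt(n)
lower = x_bar - margin
upper = x_bar + margin

print(f"margin = {margin:.4f}, interval: ({lower:.4f}, {upper:.4f})")
```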
3. Discuss and interpret (with complete sentences) the results of each of your two
interval estimates.
We are 99% confident that the population proportion of yellow candies is .2138 ± .0222, that is,
between .1916 and .236.
We are 95% confident that the population mean number of Skittles per bag is between 58.7334 and
60.6266 candies.
Confidence intervals show how accurately sample data represents the population. Depending on how important
accuracy is to the situation, confidence intervals can be used to determine if data is suitable for decision
making. Confidence intervals provide a range of values and a confidence level that the population
parameter falls within this range of values. The confidence level is expressed as a percentage. In
situations where accuracy is important, and obtaining a parameter from an entire population is cost
prohibitive, confidence intervals can be used to determine the most effective survey size. An increase in
the survey size can have different results depending on the needs of the statistician. Increasing the
survey size can increase the confidence that the population parameter will fall within the range of
values, or it can reduce the width and margin of error of the range of values at the same level of
confidence, or it can both increase the confidence and decrease the margin of error. In general, the
confidence interval is used to determine how confident one can be that a statistic represents the
population parameter.
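
The effect of sample size on the margin of error can be illustrated with a small Python sketch. It holds the confidence level at 99% and the sample proportion at .2138 (the yellow proportion from part 4) and varies only the sample size. The sizes 500 and 10000 are hypothetical values chosen for illustration; only 2268 comes from the class data.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p_hat, n, confidence):
    """Margin of error for a proportion using the normal approximation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)


p_hat = 0.2138
# 2268 is the class total; 500 and 10000 are hypothetical sample sizes.
sizes = [500, 2268, 10000]
margins = [margin_of_error(p_hat, n, 0.99) for n in sizes]

for n, m in zip(sizes, margins):
    print(f"n = {n:>5}: margin of error = {m:.4f}")
```

With the confidence level fixed, the margin of error shrinks as the sample size grows, which is the trade-off described above.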
The biggest takeaway for me from the term project was learning new skills that I will be able
to use for problem solving in the future. The skill that I think will be the most useful that I
learned during the project was how to calculate and interpret confidence interval estimates for
population proportions and population means. It is often very difficult if not impossible to gather
Prior to taking this course I was working on suggestions for solving a problem at my work.
The problem was that rail vehicles were being delayed because headlamps on the vehicles were
going out during service, and had to be replaced at the end of the line where there is not enough
time to complete the replacement. Learning how to construct a confidence interval estimate for
the population mean gave me a new tool to use to look for a solution to the problem. Using
sample data from the repair database, I generated a 99% confidence interval to estimate the mean
headlamp life. The confidence interval can now be used to determine a replacement interval
which results in a reduction of delays due to headlamp replacements, without drastically
The applications of population mean and population proportion estimates are virtually
endless. Working through these estimates for the term project has given me a better
understanding of what goes into the voting statistics before an election, and the importance of a
parameter. I have been confused many times about why estimates of the outcome of an election
are incorrect. After completing the project I have come to realize that many of the reasons why
the statistic did not represent the population are due to limitations of the sample data and not the
statistical calculations that were used. Completing this project has given me a better
Conclusion
In order to complete the term project I had to expand my skills. I learned how to calculate
proportions and display the information in a Pie and Pareto Chart. I learned how to display data
using a box plot in order to easily identify outliers in the data. I learned about histograms
and how to identify the type of distribution. I learned the difference between categorical and
quantitative data, and finally, I learned how to take the sample statistics and apply them to the
population to make estimates of the population parameters with a certain level of confidence.