
Skittles Term Project

Math 1040

December 4, 2014
By: Sarah Nielson

Introduction:
Each student in the class was instructed to purchase a 2.17 oz bag of regular Skittles and count the total
number of Skittles in the bag. The Skittles were then sorted by color and an inventory was taken. The
results were submitted to the instructor, who compiled the total number of bags and the total number
of Skittles by color. This information was distributed to the students, who then performed a statistical
analysis of the data. This report is the statistical analysis I performed.

Objective:
The objective of this project was to become familiar with statistical analysis in real life. For the project
itself, students are expected to be able to estimate, with a stated degree of confidence, the proportion
of Skittles of a specific color and the average number of Skittles in a randomly selected bag.

Procedures:
1. A 2.17 oz bag of regular Skittles was purchased by each student within the class.
2. The total number of Skittles in each bag was counted.
3. An inventory of each color of Skittles was taken per bag.
4. The data were submitted to the instructor, who compiled them into an Excel file that was
distributed to the class.
5. In order to determine the proportion of each color within the overall sample gathered by the class, a Pie
Chart and a Pareto Chart were created in StatCrunch.

6. Tables showing the totals and calculated proportions from my individual bag and the total class
sample were created.
7. A table was then created showing the Mean, Standard Deviation, Minimum, Quartile 1, Median,
Quartile 3 and the Maximum.
8. A Histogram and Boxplot were created in order to display quantitative data.
9. A 99% confidence interval estimate was constructed for the true proportion of yellow candies.
10. A 95% confidence interval estimate was constructed for the true mean number of candies per bag.
11. A 98% confidence interval estimate was constructed for the standard deviation of the number of candies
per bag.
12. A hypothesis test was made using a 0.05 significance level to test the claim that 20% of all Skittles
candies are red.
13. A hypothesis test was made using a 0.01 significance level to test the claim that the mean number of
candies in a bag is 55.

Pie Chart:
Figure 1: Pie Chart of the Total Class Data

Pareto Chart:
Figure 2: Pareto Chart of the Total Class Data

My Data:
Table 1: Table of the Data Collected in My Bag

My Data and Proportions

Bag          RED    ORANGE  GREEN  YELLOW  PURPLE  Candies per Bag
1            15     11      15     15      5       61
Proportions  0.246  0.180   0.246  0.246   0.082   1

Table 2: Table of the Data Collected by the Total Class Sample

Class Total Data and Proportions

Bag          RED    ORANGE  GREEN  YELLOW  PURPLE  Candies per Bag
Totals       321    292     316    306     276     1511
Proportions  0.212  0.193   0.209  0.203   0.183   1

Discussion on the Single Bag and Total Data Skittles Proportions:

When I counted the Skittles in my own bag, I was surprised at how evenly the colors were distributed,
excluding purple. I was curious to see whether the entire class sample would show similar proportions. I noticed
that the proportions of red, orange, green and yellow in my bag were all within about 4 percentage points of the
class proportions. However, purple was off by about 10 percentage points (8.2% in my bag versus 18.3% for the
class), which is a large difference between my individual bag and the class sample.
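
The comparison described above can be sketched in a few lines of Python. The counts for my bag come from Table 1 and the class proportions are the rounded values from Table 2; the helper name `proportions` is only illustrative and is not part of the original StatCrunch work.

```python
def proportions(counts):
    """Return each color's share of the total count."""
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}

# Counts from Table 1 (my bag).
my_bag = {"red": 15, "orange": 11, "green": 15, "yellow": 15, "purple": 5}
# Rounded class proportions from Table 2.
class_props = {"red": 0.212, "orange": 0.193, "green": 0.209,
               "yellow": 0.203, "purple": 0.183}

bag_props = proportions(my_bag)
# Difference between my bag and the class, in percentage points.
diff_points = {c: round(100 * (bag_props[c] - class_props[c]), 1) for c in my_bag}

for color, diff in diff_points.items():
    print(f"{color:>6}: bag {bag_props[color]:.3f}, "
          f"class {class_props[color]:.3f}, diff {diff:+.1f} points")
```

Purple is the only color whose difference is near 10 percentage points; the other four differ by roughly 1 to 4 points.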

Sample Calculations:
Table 3: Table Displaying the Statistical Calculations for the Total Sample

Sample Calculations

Mean                60.4
Standard Deviation  4.36
Minimum             53
Quartile 1          59
Median              60
Quartile 3          61
Maximum             77

Frequency Histogram:
Figure 3: Frequency Histogram Displaying the Number of Candies per Bag

Boxplot:
Figure 4: Boxplot Displaying the Min, Q1, Median, Q3, Max

Discussion:
With the outlier, the frequency histogram appears to be skewed right. The graphs did not reflect what I
expected to see because I did not expect such a large outlier. Without the outlier, however, the box of the
boxplot spans a reasonable range. The total number of candies in my own bag fell within the box of the
boxplot because I had 61 candies, which equals Quartile 3 for the class sample. The class as a whole collected
25 bags containing a total of 1,511 candies.
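
One common way to check the outlier mentioned above is the 1.5 × IQR rule. The report does not say which rule StatCrunch applied, so the rule used here is an assumption; the inputs are the values from Table 3.

```python
# Five-number summary values copied from Table 3.
minimum, q1, median, q3, maximum = 53, 59, 60, 61, 77

# Assumed rule: a value is flagged if it lies more than 1.5 * IQR outside the box.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

low_outlier = minimum < lower_fence
high_outlier = maximum > upper_fence

print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print(f"minimum flagged: {low_outlier}, maximum flagged: {high_outlier}")
```

Under this assumed rule the maximum of 77 lies above the upper fence, and the minimum of 53 lies below the lower fence as well.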

Reflection:

Categorical data are separated into categories, such as the different colors of candies within the
class sample. Quantitative data are counts or measurements, such as the number of candies in each bag.
As reflected in the report, the Pie Chart and Pareto Chart are used for categorical data, while Frequency Histograms
and Boxplots are used for quantitative data. This is because the Pie Chart and the Pareto Chart are by nature split into
categories, which makes them applicable to categorical data, whereas the Frequency Histogram and the Boxplot
display the distribution of numerical values (center, spread and outliers) in plot form. The calculations that make
sense for categorical data are proportions. The mean and standard deviation do not make sense for categorical data
because category labels such as colors cannot be averaged; they apply to quantitative data, where they describe the
center and spread of the values.

Confidence Interval Estimates


In general, a confidence interval is a range of values used to estimate a population parameter. The
confidence level describes how sure one may be that the range contains the true value of that parameter.
99% Confidence interval estimate for the true proportion of yellow candies.
$n = 1511$
$x = 306$
$\alpha = 0.01$
$\hat{p} = 0.2025$
$E = 2.575\sqrt{\dfrac{(0.2025)(0.7975)}{1511}} = 0.027$
$0.1755 < p < 0.2295$
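
As a rough check of the interval above, the margin of error for a proportion can be recomputed in Python. This is a sketch, not the StatCrunch procedure used for the report: the function name `proportion_ci` is hypothetical, and the critical value is taken from the standard normal distribution in the `statistics` module rather than from a table, so the result differs slightly from the rounded hand calculation.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = x / n
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin

# n = 1511 candies in the class sample, x = 306 yellow candies.
low, high, margin = proportion_ci(306, 1511, 0.99)
print(f"E = {margin:.3f}, {low:.4f} < p < {high:.4f}")
```

The margin of error rounds to 0.027, and the endpoints agree with the report's 17.55% and 22.95% to within rounding.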
95% Confidence interval estimate for the true mean number of candies.
$n = 25$
$\bar{x} = 60.4$
$\alpha = 0.05$
$s = 4.36$
$E = 2.064 \cdot \dfrac{4.36}{\sqrt{25}} = 1.80$
$58.6 < \mu < 62.2$
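
The same arithmetic can be written as a short Python sketch. The critical value 2.064 is the t value used in the report (24 degrees of freedom, 95% confidence) and is passed in as a constant rather than computed; the function name `mean_ci` is hypothetical.

```python
from math import sqrt

def mean_ci(x_bar, s, n, t_crit):
    """Confidence interval for a mean when the population sigma is unknown."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin, margin

# Sample values from the report; t_crit = 2.064 is the table value it uses.
low, high, margin = mean_ci(x_bar=60.4, s=4.36, n=25, t_crit=2.064)
print(f"E = {margin:.2f}, {low:.1f} < mu < {high:.1f}")
```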
98% Confidence interval estimate for the standard deviation of the number
of candies per bag.
$n = 25$
$s = 4.36$
$\sqrt{\dfrac{(24)(4.36)^2}{42.980}} < \sigma < \sqrt{\dfrac{(24)(4.36)^2}{10.856}}$
$3.26 < \sigma < 6.48$
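
A minimal Python sketch of this calculation follows. The chi-square critical values 42.980 and 10.856 are the ones shown in the report (24 degrees of freedom, 98% confidence) and are supplied as constants; the function name `std_dev_ci` is hypothetical.

```python
from math import sqrt

def std_dev_ci(s, n, chi2_right, chi2_left):
    """Confidence interval for a population standard deviation."""
    df = n - 1
    low = sqrt(df * s**2 / chi2_right)   # larger critical value gives the lower bound
    high = sqrt(df * s**2 / chi2_left)   # smaller critical value gives the upper bound
    return low, high

low, high = std_dev_ci(s=4.36, n=25, chi2_right=42.980, chi2_left=10.856)
print(f"{low:.2f} < sigma < {high:.2f}")
```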

Discussion:
For the first confidence interval estimate, the true proportion of yellow candies is estimated to lie between
17.55% and 22.95%. This is consistent with the proportion reported in the sample, where
the proportion was 20.3% for the yellow candies. This means that we can be 99% confident that the
interval from 17.55% to 22.95% contains the true proportion of yellow candies among all Skittles.
For the second confidence interval estimate, the true mean number of candies per bag is estimated to lie between
58.6 and 62.2. This is reflected in the boxplot, where the median is 60 and the box runs from Q1 = 59 to
Q3 = 61. This means that we can be 95% confident that the interval from 58.6 to 62.2 contains the true mean
number of candies in a 2.17 oz bag of Skittles; it does not describe the count in any single bag.
For the third confidence interval estimate, the standard deviation of the number of candies per bag is
estimated to lie between 3.26 and 6.48. This means that we can be 98% confident that this interval contains
the true standard deviation, so the count in a typical 2.17 oz bag differs from the mean by roughly 3 to 6 candies.

Hypothesis Tests
Similar to a confidence interval, a hypothesis test is used to draw a conclusion about a population parameter. The
difference is that a hypothesis test checks whether the sample data are consistent with a specific claim at a chosen
significance level.
Hypothesis test using a 0.05 significance level to test the claim that 20% of
all Skittles candies are red.
$n = 1511$
$x = 321$
$\alpha = 0.05$
a.) Claim $H_0: p = 0.20$
$H_1: p \neq 0.20$
b.) $z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}} = \dfrac{\dfrac{321}{1511} - 0.20}{\sqrt{\dfrac{0.20(0.8)}{1511}}} = 1.2091$
c.) P-value: 0.2262 (two-tailed, $2 \times 0.1131$)
d.) Fail to reject the null
e.) There is not sufficient evidence to reject the claim that 20% of all Skittles candies are
red.
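
The test statistic above can be reproduced with a short Python sketch. Because the claim is a statement of equality, the sketch assumes a two-tailed alternative, so the P-value it reports is twice the one-tail area. The function name `proportion_z_test` is hypothetical, and the normal distribution comes from the `statistics` module rather than StatCrunch.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(x, n, p0):
    """z statistic and two-tailed P-value for the claim p = p0."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

# 321 red candies out of 1511, claimed proportion 0.20, significance level 0.05.
z, p_value = proportion_z_test(321, 1511, 0.20)
reject_null = p_value < 0.05
print(f"z = {z:.4f}, P-value = {p_value:.4f}, reject H0: {reject_null}")
```

The P-value is well above 0.05 under either a one-tailed or two-tailed reading, so the decision to fail to reject the null does not change.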
Hypothesis test using a 0.01 significance level to test the claim that the
mean number of candies in a bag of Skittles is 55.
$n = 25$
$\bar{x} = 60.4$
$\alpha = 0.01$
a.) Claim $H_0: \mu = 55$
$H_1: \mu \neq 55$
b.) $t = \dfrac{\bar{x} - \mu}{\dfrac{s}{\sqrt{n}}} = \dfrac{60.4 - 55}{\dfrac{4.36}{\sqrt{25}}} = 6.1927$
c.) Critical Value(s) = -2.797 & 2.797
d.) Reject the null
e.) The sample data does not support the claim that the mean number of
candies in a bag of Skittles is 55.
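
The test statistic and decision can be checked with a few lines of Python. The critical value 2.797 is the one listed in the report (24 degrees of freedom, 0.01 significance level, two tails) and is used as a constant; the function name `one_sample_t` is hypothetical.

```python
from math import sqrt

def one_sample_t(x_bar, mu0, s, n):
    """t statistic for testing the claim that the population mean equals mu0."""
    return (x_bar - mu0) / (s / sqrt(n))

t_stat = one_sample_t(x_bar=60.4, mu0=55, s=4.36, n=25)
t_crit = 2.797  # table value used in the report
reject_null = abs(t_stat) > t_crit
print(f"t = {t_stat:.4f}, reject H0: {reject_null}")
```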

Discussion:
In the first hypothesis test, the claim that 20% of the Skittles candies are red was tested. The sample data did
not provide sufficient evidence to reject the claim that 20% of all Skittles candies are red. This means that the
original claim, which was the null hypothesis, failed to be rejected based on the statistical data. The second
hypothesis test tested the claim that the mean number of candies in a bag of Skittles is 55. The data did not support
the claim that the mean number of candies in a bag of Skittles is 55. Therefore the null was rejected.

Reflection:

The conditions for doing an interval estimate for a proportion are that the sample must be a simple random sample,
the conditions for a binomial distribution must be met, and lastly, $np \geq 5$ and $nq \geq 5$, which means that
there are at least five successes and at least five failures. For estimating confidence intervals of the mean where
the standard deviation is unknown, the requirements are that the sample must be a simple random sample and
either the population must be normally distributed or n > 30. The requirements for a confidence interval
of the mean where the standard deviation is known are the same as the requirements for a confidence interval
of the mean where the standard deviation is unknown. The requirements that must be met for the
confidence interval estimate of the standard deviation are that the sample must be a simple random sample
and the population must have normally distributed values. This applies even when the sample is large.

For a hypothesis test when a claim is tested for a proportion, the requirements are the same as the requirements
needed for doing an interval estimate for a proportion. The requirements that are needed to test a claim
about a mean with the standard deviation known or unknown are identical to the requirements needed for
estimating the confidence intervals of the mean whether the standard deviation is known or unknown. The
requirements needed to test a claim about standard deviation are also identical to the requirements needed
to construct a confidence interval for the standard deviation. Each of the requirements stated above was
met by our sample set.

The possible errors that could have been made are Type I and Type II errors, which
mean that we could have rejected the null hypothesis when it was true (Type I) or that we could have
failed to reject the null when it was false (Type II). However, there is no possible way of knowing if either a Type
I or Type II error occurred. The conclusion drawn from the statistical research is that if a random bag of
Skittles is purchased from a store, it can be expected that each color will make up roughly 20% of the total
candies in the bag.
