
Michael Newell

Fall Semester 2017


Math 1040 Skittles Project

In this project we analyze and organize data from the Skittles bags in our
classroom sample. Each person in the class was required to purchase a 2.17 oz bag
of Skittles, count the number of candies of each color, and report the counts to
our professor. He then compiled the data into a chart in Excel for us to use as a
reference for our tests and analysis.
The pie chart and Pareto chart above show the number of candies of each color in
our class sample and how they are distributed. The distribution of the class data
is actually quite a bit different from that of my own bag of Skittles. I thought
that my data would be relatively close to the data of the class. However, as you
will see below, only the proportion of yellow candies in my bag was similar to
that of the class. I was particularly surprised at the small number of purple
Skittles in my bag compared with the class.

My Bag of Skittles

             Orange Candies   Red Candies   Green Candies   Yellow Candies   Purple Candies
Count        18               19            8               12               3
Proportion   .3               .317          .133            .2               .05

Class Sample Skittles Bag

             Orange Candies   Red Candies   Green Candies   Yellow Candies   Purple Candies
Count        338              361           312             351              361
Proportion   .196             .21           .181            .204             .21
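
The proportions in both tables are each color's count divided by the total number of candies. As a cross-check, the arithmetic can be sketched in Python. This is only an illustration of the calculation, not how the class chart was built, and the names `my_bag`, `class_sample`, and `proportions` are made up for the example.

```python
# Counts copied from the two tables above.
my_bag = {"Orange": 18, "Red": 19, "Green": 8, "Yellow": 12, "Purple": 3}
class_sample = {"Orange": 338, "Red": 361, "Green": 312, "Yellow": 351, "Purple": 361}


def proportions(counts):
    """Return each color's share of the total, rounded to 3 decimals."""
    total = sum(counts.values())
    return {color: round(n / total, 3) for color, n in counts.items()}


my_props = proportions(my_bag)
class_props = proportions(class_sample)

# Side-by-side comparison of my bag and the class sample.
for color in my_bag:
    print(f"{color:<7} my bag: {my_props[color]:.3f}   class: {class_props[color]:.3f}")
```

Printed side by side, yellow is the only color where the two proportions are close.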
Summary Statistics (Per Bag)

n    Mean   Std.   Median   Min   Max   Q1   Q3
28   61.5   1.86   60       57    65    60   63
In the histogram and boxplot below, the data appear to be approximately bell
shaped. The distribution is skewed very slightly to the left, but not enough to
make a big difference. I was surprised to see that, as a class, we had a fairly
consistent number of Skittles in our bags. So I guess kudos to the Skittles
company. As shown above, the mean number of Skittles per bag for the class sample
was 61.5, and my own bag contained 60.
Number of Bags in the Sample   Candies in my Bag
28                             60 pieces

Two basic types of data are quantitative and categorical. Quantitative data, also
called numerical data, are values that can be measured or counted. Some examples
of quantitative data are length, weight, height in inches, time, and financial
figures. Categorical data are values or observations, like names or labels, that
can be sorted into groups or categories but cannot be measured. Categorical data
can take on numerical values in some cases, but those numbers don't have any
mathematical meaning. Some examples of categorical data are gender, eye color,
race, and part numbers. Graphs that make sense with categorical data include the
Pareto chart, bar graph, and pie chart. The scatterplot, stem-and-leaf plot,
time-series graph, and dot plot are examples of graphs that make sense with
quantitative data. With quantitative data you can add, subtract, multiply, and
divide the values, and the results still make sense mathematically. You cannot
make such calculations with categorical labels themselves, although you can count
how many observations fall into each category.

Confidence Intervals
A confidence interval is a range of values, calculated from sample data, that is
used to estimate a population parameter. It allows us to estimate the range in
which the true population parameter is likely to fall. Because no estimate can be
100 percent reliable, we need to know how confident we can be in our estimates.
The confidence level describes the method rather than any single interval: for
example, if samples were taken over and over and a 99% confidence interval was
calculated from each sample, about 99% of those intervals would contain the
population mean.
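
The "samples taken over and over" idea can be illustrated with a small simulation. The sketch below is not part of the project data. It assumes a hypothetical normal population (mean 60, standard deviation 2), treats the standard deviation as known so that a simple z-interval applies, and uses a fixed random seed so the run is repeatable.

```python
import random
from statistics import NormalDist, mean

random.seed(1040)  # fixed seed so the simulation is repeatable

# Hypothetical population, chosen only for illustration.
TRUE_MEAN, TRUE_SD = 60.0, 2.0
N, TRIALS, LEVEL = 30, 2000, 0.99

# Critical value for a 99% z-interval (about 2.576).
z_star = NormalDist().inv_cdf(1 - (1 - LEVEL) / 2)
margin = z_star * TRUE_SD / N ** 0.5  # sigma is treated as known here

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    x_bar = mean(sample)
    # Count the interval as a hit if it captures the true mean.
    if x_bar - margin <= TRUE_MEAN <= x_bar + margin:
        hits += 1

coverage = hits / TRIALS
print(f"{coverage:.3f} of the {TRIALS} intervals contain the true mean")
```

The printed fraction should land close to 0.99, which is what the 99% confidence level refers to.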
Below you will find examples of my confidence intervals, calculated on a TI-83
graphing calculator:
The first is a 99% confidence interval for the true population proportion of
yellow candies. My results show that I can say with 99% confidence that the true
population proportion is between .179 and .229.
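
As a cross-check on the calculator result, the same interval can be sketched in Python using the usual one-proportion z-interval formula, p-hat ± z* · sqrt(p-hat(1 − p-hat)/n). This is a minimal illustration rather than the TI-83 procedure itself. The counts come from the class table, and the variable names are made up for the example.

```python
from math import sqrt
from statistics import NormalDist

yellow, total = 351, 1723  # class counts from the table above
level = 0.99

p_hat = yellow / total
z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 2.576
margin = z_star * sqrt(p_hat * (1 - p_hat) / total)

lower, upper = p_hat - margin, p_hat + margin
print(f"99% CI for the yellow proportion: ({lower:.3f}, {upper:.3f})")
```

Rounded to three decimals, the endpoints agree with the interval reported above.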
The second is a 95% confidence interval for the true mean number of candies per
bag. I used a T interval since this concerns a mean and the population standard
deviation is unknown. I can say with 95% confidence that the true mean is between
60.82 and 62.26.
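
The t interval can be approximated from the summary statistics with x-bar ± t* · s/sqrt(n). The sketch below makes two assumptions. First, the mean is taken as 1723/28, which assumes the 1723 candies in the color table come from the same 28 bags (this rounds to the reported 61.5). Second, the critical value t* = 2.052 for 27 degrees of freedom is read from a t table rather than computed. Because the standard deviation is rounded to 1.86, the endpoints may differ from the calculator's in the second decimal place.

```python
from math import sqrt

n = 28            # number of bags
x_bar = 1723 / n  # assumed: total candies in the color table / number of bags
s = 1.86          # sample standard deviation from the summary table (rounded)
t_star = 2.052    # t table value: df = 27, 95% confidence

margin = t_star * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(f"95% CI for the mean candies per bag: ({lower:.2f}, {upper:.2f})")
```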

Hypothesis Testing

Hypothesis testing is a procedure for testing an assumption (a claim) about a
population parameter using sample data. It allows us to weigh the evidence for or
against claims being made about a population.
Below are the hypothesis tests I've done on the calculator using the data from our
class sample:
In this first hypothesis test we used a .05 significance level, which is the
probability of rejecting a null hypothesis when it is actually true. We are
testing the claim that 20% of all Skittles candies are red. As you can see above,
the P-value came out to be .323, which is greater than the .05 significance level.
Because that is the case, we fail to reject the null hypothesis. The null
hypothesis is our claim in this instance, and so we say: There is not sufficient
evidence to warrant rejection of the claim that 20% of all candies are red.
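
The one-proportion z-test can be reproduced from the class counts as a cross-check. The sketch below uses the standard test statistic z = (p-hat − p0) / sqrt(p0(1 − p0)/n) with a two-tailed P-value. It illustrates the calculation and is not the calculator's own routine. The variable names are made up for the example.

```python
from math import sqrt
from statistics import NormalDist

red, total = 361, 1723  # class counts from the table above
p0 = 0.20               # claimed proportion of red candies (H0: p = 0.20)
alpha = 0.05

p_hat = red / total
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / total)

# Two-tailed P-value, because the alternative is p != 0.20.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_null = p_value < alpha

print(f"z = {z:.3f}, P-value = {p_value:.3f}, reject H0: {reject_null}")
```

The P-value is well above .05, so the null hypothesis is not rejected.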
Our next hypothesis test is about a mean, so we use a T-Test; the prior test used
a 1-PropZTest. We are testing the claim that the mean number of candies in a bag
of Skittles is 55. Because our claim is that the mean equals 55, that is our null
hypothesis, and our alternative is that the mean is not equal to 55. Upon doing
the calculation, I found that the P-value is essentially 0, which is less than the
.01 significance level, so we reject the null. Our conclusion is: There is
sufficient evidence to warrant rejection of the claim that the mean number of
candies in a bag of Skittles is 55.
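
The size of the test statistic shows why the P-value is so small. The sketch below computes t = (x-bar − mu0) / (s/sqrt(n)) from the rounded summary statistics and compares it with the two-tailed critical value for 27 degrees of freedom at the .01 level. That critical value (2.771) is read from a t table rather than computed, so this uses the critical-value method instead of the calculator's P-value.

```python
from math import sqrt

n, x_bar, s = 28, 61.5, 1.86  # rounded summary statistics reported above
mu0 = 55                      # claimed mean (H0: mu = 55)
t_crit = 2.771                # t table value: df = 27, two-tailed alpha = 0.01

t_stat = (x_bar - mu0) / (s / sqrt(n))
reject_null = abs(t_stat) > t_crit

print(f"t = {t_stat:.2f}, critical value = {t_crit}, reject H0: {reject_null}")
```

The test statistic is far beyond the critical value, which is consistent with a P-value that is essentially 0.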

There are 3 conditions for a confidence interval for estimating a population
mean or proportion:

1- The sample is a simple random sample.
2- For a mean, either or both of these conditions should be satisfied: the
population is normally distributed or n is greater than 30.
3- For a proportion, there are at least 5 successes and at least 5 failures.

Conditions for a hypothesis test about a population proportion:

1. The sample proportion must be obtained from a random sample.
2. np0 is greater than or equal to 10, where p0 is the assumed population
proportion from H0.
3. n(1 - p0) is greater than or equal to 10, where p0 is the assumed population
proportion from H0.
4. The population size is at least ten times the sample size.

It appears that our samples met these conditions.
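
The numerical conditions for the proportion procedures can be checked directly from the class counts. The sketch below covers only those arithmetic checks. Random sampling and the population-size condition cannot be verified from the counts and are simply assumed.

```python
total = 1723  # candies in the class sample
p0 = 0.20     # assumed proportion from H0 (red candies)
yellow = 351  # "successes" for the yellow confidence interval

# Hypothesis test conditions: n*p0 >= 10 and n*(1 - p0) >= 10.
expected_successes = total * p0
expected_failures = total * (1 - p0)
test_ok = expected_successes >= 10 and expected_failures >= 10

# Confidence interval condition: at least 5 successes and 5 failures.
interval_ok = yellow >= 5 and (total - yellow) >= 5

print(f"n*p0 = {expected_successes:.1f}, n*(1 - p0) = {expected_failures:.1f}")
print(f"test conditions met: {test_ok}, interval conditions met: {interval_ok}")
```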

Possible Errors
- Data could have been entered incorrectly
- Nonresponse error (someone didn't buy a bag, so we are technically missing
that data)

I conclude that I have learned how to properly obtain a random sample and find the
5-number summary, and how to create confidence intervals and run hypothesis tests.
These tools allow us to draw conclusions about the data and about the claims being
made about it, and whether or not those claims are supported or rejected.


At the beginning this project sounded very daunting. I wasn't the best about
working on it as I went throughout the semester. I was so focused on trying to
understand the concepts we were learning that I didn't think about the project too
often. However, now that I have the knowledge to perform all the tests and other
things this project asks for, I can say that it wasn't very hard at all. Time
consuming, yes, but not hard. This class has really opened my eyes to how much
goes on behind the scenes. It is interesting to think that someone has a
real-world job doing confidence intervals and hypothesis tests. I love numbers and
have really enjoyed this stats class, and I am definitely more skeptical now of
studies and data that are thrown around on TV and social media. I actually had the
opportunity at work to use the principle of combinations we were taught in class.
That experience really put into perspective for me that the things I've learned in
this project and in class have real-world application!