Kali (Taylor) Hansen

Skittles Project

This semester, MATH 1040 students were given a term project that brought together all the
concepts we learned over the semester. The project started with each student from the two
statistics classes getting a 2.17 oz bag of Skittles and counting how many candies of each color
were in the bag. After the class collected everyone's counts, we found the proportion of each
color across all of the bags, that is, each color's share of the total number of candies. We
learned at the beginning of the semester how to find a mean and a standard deviation. This
project required us to find the mean and the standard deviation of the number of candies per
bag. Knowing these two values helps solve other scenarios in the project as well as real-life
problems. We also displayed the collected data in charts suited to categorical data and to
quantitative data. After deciding which chart fit which data, we applied the data to several
scenarios. The first group of scenarios used confidence intervals. A confidence interval is a
range of values used to estimate the true value of a population parameter. We solved three
scenarios: a proportion with a 90% confidence interval, the mean with a 99% confidence interval,
and the standard deviation with a 98% confidence interval. To solve them we used the z-score
table for the proportion, the t-table for the mean, and the chi-square table for the standard
deviation, because each parameter calls for a different distribution. The last concept used in
the project is the hypothesis test. Hypothesis testing follows the same multi-step process in
math as in science experiments: we start with a claim or theory and then test whether the data
support it. This project uses two types of hypothesis tests, one for a proportion and one for a
mean, in different situations.

Categorical data:
Total Number of Each Color from 50 bags of Skittles

Color     My Sample   My Sample Percentage   Classes Sample   Classes Sample Percentage
Red       7           11.29%                 548              18%
Orange    11          17.74%                 607              19.94%
Yellow    11          17.74%                 629              20.66%
Green     15          24.19%                 648              21.29%
Purple    18          29.03%                 612              20.11%
Total     62          100%                   3044             100%

Overall, the proportions in My Sample are relatively similar to those in the Classes Sample.
In the Classes Sample there is only a 3.29 percentage point difference between the most common
color, Green at 21.29%, and the least common, Red at 18%. In My Sample the difference between
the most common color, Purple, and the least common, Red, is 17.74 percentage points. The
spread in My Sample is larger because it is a much smaller sample: 62 candies compared to 3044
in the entire Classes Sample. Both My Sample and the Classes Sample showed Red as the least
common color. Purple is my favorite Skittle flavor, so I was delighted that 29.03% of the
Skittles in my bag were Purple. In the Classes Sample, Purple and Yellow were very close, with
a 0.55 percentage point difference. Purple appears to be a common color in a Skittles bag.
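
The percentages and differences discussed above can be reproduced from the raw counts. The following is a minimal Python sketch, not the method used in class; the function and variable names are made up for illustration, and the counts are copied from the table.

```python
# Color counts copied from the categorical data table.
my_sample = {"Red": 7, "Orange": 11, "Yellow": 11, "Green": 15, "Purple": 18}
classes_sample = {"Red": 548, "Orange": 607, "Yellow": 629, "Green": 648, "Purple": 612}


def percentages(counts):
    """Return each color's share of the total, in percent, rounded to 2 places."""
    total = sum(counts.values())
    return {color: round(100 * n / total, 2) for color, n in counts.items()}


def spread(counts):
    """Percentage-point gap between the most and the least common color."""
    total = sum(counts.values())
    gap = max(counts.values()) - min(counts.values())
    return round(100 * gap / total, 2)


my_pct = percentages(my_sample)
classes_pct = percentages(classes_sample)
my_spread = spread(my_sample)
classes_spread = spread(classes_sample)

print(my_pct)
print(classes_pct)
print(my_spread, classes_spread)
```

The two spreads come out to 17.74 and 3.29 percentage points, matching the figures quoted in the paragraph.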

Quantitative Data:
Summary statistics for the total number of Skittles in 50 bags:

Column                    n    Mean    Std. dev.   Median   Range   Min   Max   Q1   Q3
Skittles collected data   50   60.88   5.5093242   61       42      44    86    60   63

Number of Skittles per Bag- 50 bags

The graphs above are other ways to look at the collected data. The graph on the left, the
histogram, shows the total number of Skittles in each bag from the two math classes. The 60-65
range is the most frequent total for a single bag. My bag had 62 Skittles, which falls in that
60-65 range. The graph on the right is a boxplot of the same data. Both graphs show that the
data is a little skewed to the right because of the one bag with 86 candies, which appears in
both graphs as a large outlier.
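
The outlier described above can be checked against the summary statistics. This sketch assumes the common 1.5 x IQR rule for flagging outliers; the project does not say which rule the graphing software used, so treat the fences as an illustration only.

```python
# Quartiles and extremes copied from the summary statistics table.
q1, q3 = 60, 63
minimum, maximum = 44, 86

# Assumption: a value is an outlier if it lies more than 1.5 * IQR
# below Q1 or above Q3.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

max_is_outlier = maximum > upper_fence
min_is_outlier = minimum < lower_fence

print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print(f"86 is an outlier: {max_is_outlier}")
print(f"44 is an outlier: {min_is_outlier}")
```

Under this rule the bag with 86 candies lies well above the upper fence, and the smallest bag, with 44 candies, also falls below the lower fence.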
Number of Skittles in My Bag: 62
Total Number of Bags in Sample: 50

Reflection
The project wanted the classes to learn how to present data both categorically and
quantitatively. The categorical data is the colors in a Skittles bag, which are represented by
a pie chart and a Pareto chart. Both graphs show the proportion of each Skittle color. The pie
chart doesn't clearly show the differences between colors; it makes it appear that all Skittle
colors are almost evenly divided in the sample. According to our data, this is not the case.
The Pareto chart is the best display for the categorical data in this sample. It shows the
total number of Skittles of each color, and the heights of the bars make the differences
between colors visible across the entire sample. Green and Red are the tallest and shortest
bars, respectively. The quantitative data is represented by the 5-number summary, histogram,
and boxplot. The 5-number summary is information that helps in building the boxplot and in
reading the histogram of the collected data. One student had a high total number of Skittles,
which caused a large outlier. For me, it is harder to see from the boxplot how the outlier
affects the collected data. The histogram, I believe, gives a better visual of how one
student's total caused a large outlier.

Confidence Interval:
A confidence interval is a range of values used to estimate the true value of a population
parameter. The confidence level is the probability 1 - α that the confidence interval actually
contains the population parameter. We use it with the following data to construct confidence
intervals for the population proportion, mean, and standard deviation.
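
For reference, the three intervals take the standard textbook forms below. The notation is an assumption for illustration and does not appear in the original write-up: E is the margin of error, q-hat is 1 minus p-hat, and the two chi-square values are the right-tail and left-tail critical values with n - 1 degrees of freedom.

```latex
\begin{align*}
\hat{p} - E < p < \hat{p} + E, &\qquad E = z_{\alpha/2}\sqrt{\frac{\hat{p}\,\hat{q}}{n}} \\
\bar{x} - E < \mu < \bar{x} + E, &\qquad E = t_{\alpha/2}\,\frac{s}{\sqrt{n}} \\
\sqrt{\frac{(n-1)s^2}{\chi^2_R}} < \sigma &< \sqrt{\frac{(n-1)s^2}{\chi^2_L}}
\end{align*}
```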

• 0.187 < p < 0.215
    o We are 95% confident that the interval from 0.187 to 0.215 actually does contain the
      true value of p, which is the population proportion of Purple Skittles. This means
      that if we were to select many different samples of size 3044 and construct the
      corresponding confidence intervals, 95% of them would actually contain the value of
      the population proportion p.
• 58.82 < μ < 62.98
    o We are 99% confident that the interval from 58.82 to 62.98 actually does contain the
      true value of μ, which is the mean number of Skittles per bag. This means that if we
      were to select many different samples of the same size and construct the
      corresponding confidence intervals, in the long run 99% of them would actually
      contain the value of μ.
• 4.42 < σ < 7.08
    o We are 95% confident that the interval from 4.42 to 7.08 actually does contain the
      true value of σ, which is the standard deviation of the number of Skittles per bag.
      This means that if we were to select many different samples of the same size and
      construct the corresponding confidence intervals, in the long run 95% of them would
      actually contain the value of σ.
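
The interval from 0.187 to 0.215 for the proportion of Purple Skittles can be reproduced from the class counts (612 Purple out of 3044). The sketch below is one way to do it with Python's standard library instead of a z-score chart; the function name `proportion_ci` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = x / n
    # Critical value z_(alpha/2) from the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 612 Purple Skittles out of 3044 in the Classes Sample, 95% confidence.
lower, upper = proportion_ci(612, 3044, 0.95)
print(f"{lower:.3f} < p < {upper:.3f}")
```

The intervals for the mean and the standard deviation need t and chi-square critical values, which the standard library does not provide, so they are left out of this sketch.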

Hypothesis Tests:
A hypothesis is a claim or statement about a property of a population. A hypothesis test is a
procedure for testing such a claim. We apply it to the data in two situations to see whether
the data support each claim.

• 0.01 significance level to test the claim that 20% of all Skittles are green
    o Fail to reject the null hypothesis
    o There is not sufficient evidence to reject the claim that 20% of all Skittles are
      green.
• 0.05 significance level to test the claim that the mean number of candies in a bag of
  Skittles is 56
    o Reject the null hypothesis
    o The sample data doesn't support the claim that the mean number of Skittles in a
      2.17-oz bag is 56.
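
The first test can be sketched in Python as a two-tailed z-test for a proportion. This assumes the test was run on the Classes Sample counts (648 Green out of 3044), which the write-up does not state explicitly; the function name is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(x, n, p0):
    """Two-tailed z-test of H0: p = p0, using the normal approximation."""
    q0 = 1 - p0
    # Requirement from the project: np >= 5 and nq >= 5.
    if n * p0 < 5 or n * q0 < 5:
        raise ValueError("normal approximation requirements not met")
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * q0 / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Assumption: 648 Green Skittles out of 3044 in the Classes Sample.
alpha = 0.01
z, p_value = proportion_z_test(648, 3044, 0.20)
reject = p_value < alpha

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
print("Reject H0" if reject else "Fail to reject H0")
```

With these counts the P-value is larger than 0.01, which agrees with the decision to fail to reject the null hypothesis.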

Reflection:
Working through the confidence intervals, we could see in three different situations whether
the collected data met the requirements for each interval. The first situation is the 95%
confidence interval for the true proportion of Purple Skittles. It met the requirement that
the sample be a simple random sample. The second situation is the 99% confidence interval for
the true mean number of Skittles per bag. It met each requirement: the sample had to be a
simple random sample, and either the population is normally distributed or n is greater than
30; the value used was 103, which is greater than 30. The last situation is the 95% confidence
interval for the standard deviation of the number of Skittles per bag. It met the requirements
that the sample be a simple random sample and that the population have normally distributed
values regardless of the sample size; the resulting interval was 4.42 to 7.08.
To test each hypothesis claim, each problem had to be analyzed and worked through the
hypothesis test procedure, after checking the requirements for that test. In the first
situation we used a 0.01 significance level to test the claim that 20% of all Skittles candies
are green. Because this is a population proportion problem, the requirements are that the
sample is a simple random sample, the conditions for a binomial distribution are satisfied,
and np and nq are both greater than or equal to 5. The sample met all of these requirements,
and the test failed to reject the null hypothesis that the proportion equals 20%. In the second
situation we used a 0.05 significance level to test the claim that the mean number of candies
in a bag of Skittles is 56. The requirements for a population mean are that the sample is a
simple random sample and that at least one of these conditions is satisfied: the population is
normally distributed or n is greater than 30. The P-value for this claim was less than the
significance level, so the test rejected the null hypothesis that the mean is equal to 56.
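
The second test can be sketched from the summary statistics. This assumes the test was run on the 50 bag totals (n = 50, mean 60.88, standard deviation 5.5093242). It uses the critical-value method, not the P-value method described above, because the standard library has no t distribution. The critical value of about 2.010 for a two-tailed test at 0.05 with 49 degrees of freedom is taken from a t-table and is hard-coded as an assumption.

```python
from math import sqrt

# Summary statistics copied from the quantitative data table.
n = 50
sample_mean = 60.88
sample_sd = 5.5093242
claimed_mean = 56

# Assumption: t-table critical value, two-tailed, alpha = 0.05, df = 49.
t_critical = 2.010

# Test statistic for H0: mu = 56.
t_stat = (sample_mean - claimed_mean) / (sample_sd / sqrt(n))
reject = abs(t_stat) > t_critical

print(f"t = {t_stat:.2f}")
print("Reject H0" if reject else "Fail to reject H0")
```

The test statistic is far beyond the critical value, which is consistent with a P-value below 0.05 and with rejecting the claim that the mean is 56.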
