
Skittles Project Group 3

Joel Hanes, Alamissi Ouro-Gneni, Virginia Darger, Lily Ratliff


Introduction:
The study of statistics comprises methods for planning data collection, analyzing results, and
drawing conclusions about a particular study. In the course Intro to Statistics 1040, we were
assigned a semester project to help us understand various statistical concepts and apply them
to everyday life. At the beginning of the semester, each student in the class purchased a 2.71 oz
bag of Skittles candy. Our initial task was data collection, which we did by counting how many
Skittles of each color were in each individual bag, along with the total number of Skittles per bag.
Every student had their own bag, and the data were compiled into a spreadsheet in order to analyze
the results. Our next task was to analyze and interpret the data. We did this by constructing graphs
and charts as well as the statistical tables shown below. The goal of this assignment is to better
understand the concepts of statistics and the key components needed to effectively interpret the
validity of statistical studies. Understanding these concepts is useful in everyday life, and the
Skittles project helped make the concepts more relatable.

Data Collection:
Organizing and Displaying Categorical Data: Colors

Observations:

The color counts came out relatively close to one another. We expected one or two colors to be
lower in count and perhaps a favored color to be out in front, which we did see in half of the
charts, while the others were more evenly distributed. The data from our bag follow the pattern of
the whole class's bags, except for purple, which is higher in ours.
Group Three Bag:
Summary statistics:
Column   n   Mean   Variance   Median   Range   Min   Max   Q1   Q3   Sum
NUMBER   5   48.4   43.3       46       16      44    60    45   47   242
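The summary statistics above can be recomputed with Python's standard `statistics` module. This is a minimal sketch, assuming the five color counts were 44, 45, 46, 47 and 60: the report does not list the individual counts, so these are hypothetical values reconstructed to be consistent with every entry in the table (n, mean, variance, median, range, min, max, quartiles and sum). Quartile conventions differ between software packages; `method="inclusive"` is used here because it reproduces the Q1 and Q3 shown, and StatCrunch's own rule may differ.

```python
import statistics

# Hypothetical per-color counts for the Group 3 bags (not listed in the report);
# chosen so that every value in the summary table above is reproduced.
color_counts = [44, 45, 46, 47, 60]

def summarize(values):
    """Return the same summary statistics as the table above."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),  # sample variance (divides by n - 1)
        "median": median,
        "range": max(values) - min(values),
        "min": min(values),
        "max": max(values),
        "Q1": q1,
        "Q3": q3,
        "sum": sum(values),
    }

summary = summarize(color_counts)
for name, value in summary.items():
    print(f"{name}: {value}")
```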

Class Bags:
Summary statistics:

Frequency table results for total Skittles per person:
Count = 26

Total Skittles per person   Frequency   Relative Frequency   Percent of Total   Cumulative Frequency
54                          1           0.038461538          3.8461538          1
55                          1           0.038461538          3.8461538          2
56                          1           0.038461538          3.8461538          3
58                          5           0.19230769           19.230769          8
59                          6           0.23076923           23.076923          14
60                          6           0.23076923           23.076923          20
61                          5           0.19230769           19.230769          25
62                          1           0.038461538          3.8461538          26
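A frequency table like this one can be rebuilt directly from the raw bag totals. The following is a minimal Python sketch, assuming the 26 bag totals are expanded back out of the frequency column (which student had which total is not needed here). Relative frequency is frequency divided by the count, and cumulative frequency is a running sum.

```python
from collections import Counter

# The 26 class bag totals, expanded from the frequency table above.
bag_totals = [54, 55, 56] + [58] * 5 + [59] * 6 + [60] * 6 + [61] * 5 + [62]

def frequency_table(values):
    """Return rows of (value, frequency, relative frequency, percent, cumulative frequency)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        freq = counts[value]
        cumulative += freq  # running total of frequencies so far
        rows.append((value, freq, freq / n, 100 * freq / n, cumulative))
    return rows

table = frequency_table(bag_totals)
print(f"Count = {len(bag_totals)}")
for value, freq, rel, pct, cum in table:
    print(f"{value:>3} {freq:>3} {rel:.9f} {pct:.7f} {cum:>3}")
```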

Organizing and Displaying Quantitative Data: The Number of Candies per Bag

Mean: 59.1
Standard Deviation: 1.90
5 Number Summary:
Min: 54
Q1: 58
Q2: 59
Q3: 60
Max: 62
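These descriptive statistics can be checked with a short Python sketch using the standard `statistics` module, assuming the same 26 bag totals from the frequency table. `stdev` is the sample standard deviation (n - 1 in the denominator). The `"inclusive"` quartile method is an assumption that happens to reproduce Q1 = 58 and Q3 = 60 here; other quartile rules can give slightly different values.

```python
import statistics

# The 26 class bag totals, expanded from the frequency table.
bag_totals = [54, 55, 56] + [58] * 5 + [59] * 6 + [60] * 6 + [61] * 5 + [62]

mean = statistics.mean(bag_totals)
sd = statistics.stdev(bag_totals)  # sample standard deviation
q1, q2, q3 = statistics.quantiles(bag_totals, n=4, method="inclusive")

print(f"Mean: {mean:.1f}")
print(f"Standard Deviation: {sd:.2f}")
print(f"Min: {min(bag_totals)}  Q1: {q1:g}  Q2: {q2:g}  Q3: {q3:g}  Max: {max(bag_totals)}")
```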
Observations:

The shape of the distribution is nearly a normal bell shape, although it is skewed to the left. The
class data appear roughly normal, which is reasonable since there are no extreme outliers. We
expected there would be about the same number of candies in each bag, which wasn't the case.
The number of candies in my individual bag was 26, and the total number of bags in the sample
is 26.
Reflection
Categorical data are qualitative variables that consist of names or labels (not numbers
representing counts or measurements). Pie charts and bar graphs make sense for categorical data
because they compare the frequency of each category against the others. Computations, or ordering
from low to high, do not make sense for categorical data. Survey responses of yes, no, and
undecided are a typical example of categorical data.
Quantitative data consist of numbers that represent counts or measurements, so they can be
ordered and used in computations. Scatterplots and stemplots make sense for quantitative data: a
scatterplot helps determine whether there is a relationship between two variables, and a stemplot
separates each value into two parts (a stem and a leaf). Computations make sense for quantitative
data, such as finding the average or mean, standard deviation, five-number summary, and sum.
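To make the stemplot idea concrete, here is a minimal Python sketch that splits each class bag total into a stem (tens digit) and a leaf (units digit). The data are the 26 bag totals from the frequency table; the function names `stemplot` and `render` are illustrative, not from any library.

```python
from collections import defaultdict

# The 26 class bag totals, expanded from the frequency table.
bag_totals = [54, 55, 56] + [58] * 5 + [59] * 6 + [60] * 6 + [61] * 5 + [62]

def stemplot(values):
    """Split each value into a stem (tens digit) and a leaf (units digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

def render(plot):
    """Format the stemplot as text, one stem per line."""
    return "\n".join(
        f"{stem} | {''.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(plot.items())
    )

plot = stemplot(bag_totals)
print(render(plot))
# 5 | 45688888999999
# 6 | 000000111112
```

The long run of leaves on both stems reflects the concentration of bag totals between 58 and 61.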

Confidence Interval Estimates


A confidence interval is a range of values used to estimate the true value of a population
parameter. The confidence interval is a range of values instead of just a single number so that
statisticians can better understand how close the calculated estimate is likely to be to the true
population value. We also associate a confidence level (in the form of a percentage) with the
confidence interval: it is the proportion of times the method would capture the true population
parameter if the sampling were repeated many times. The confidence level allows us to report how
confident we are that the true population parameter lies within our confidence interval. In the
following section, our group constructed three confidence interval estimates using the class
Skittles data.

The 95% confidence interval estimate for the true proportion of purple candies:

Based on the calculations from our sample data, we are 95% confident that the interval between
0.182 and 0.222 contains the true population proportion of purple candies.
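The interval for a proportion is p̂ ± z·√(p̂(1 − p̂)/n). Below is a minimal Python sketch of that normal-approximation (Wald) interval using `statistics.NormalDist` for the critical value. The total of 1536 candies is the class total used in the green-proportion test later in this report; the purple count of 310 is hypothetical, since the report does not state it.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value z_(alpha/2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# 1536 = class total; 310 is a hypothetical purple count (not given in the report).
lower, upper = proportion_ci(310, 1536)
print(f"({lower:.3f}, {upper:.3f})")
```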
The 99% confidence interval estimate for the true mean number of candies per bag:
99% confidence interval results:
μ : Mean of variable
Variable               Sample Mean   Std. Err.    DF   L. Limit    U. Limit
Mean candies per bag   59.076923     0.37178603   25   58.040593   60.113253
Based on the calculations from our data, we are 99% confident that the interval between 58.041
and 60.113 contains the true population mean number of candies per bag.
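The t interval for a mean is x̄ ± t·s/√n with n − 1 degrees of freedom. Python's standard library has no t distribution, so this minimal sketch assumes the table value t = 2.787 (df = 25, 99% confidence) instead of calling a statistics package. With that rounded critical value the limits match the StatCrunch results to three decimals, though later digits differ slightly.

```python
import statistics
from math import sqrt

# The 26 class bag totals, expanded from the frequency table.
bag_totals = [54, 55, 56] + [58] * 5 + [59] * 6 + [60] * 6 + [61] * 5 + [62]

# Critical value t_(0.005) with df = 25, taken from a standard t table.
T_CRIT_99_DF25 = 2.787

def mean_ci(values, t_crit):
    """t interval for a population mean: x_bar +/- t * s / sqrt(n)."""
    n = len(values)
    x_bar = statistics.mean(values)
    std_err = statistics.stdev(values) / sqrt(n)
    margin = t_crit * std_err
    return x_bar - margin, x_bar + margin

lower, upper = mean_ci(bag_totals, T_CRIT_99_DF25)
print(f"({lower:.3f}, {upper:.3f})")
```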
The 98% confidence interval estimate for the standard deviation of the number of candies per bag:
98% confidence interval results:
σ : Standard deviation of variable
Variable               Sample Std. Dev.   DF   L. Limit      U. Limit
Mean candies per bag   1.895744234        25   1.423897574   2.792213262

Based on the calculations from our data, we are 98% confident that the interval between 1.424
and 2.792 contains the true population standard deviation of the number of candies per bag.
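The interval for a standard deviation uses the chi-square distribution: √((n − 1)s²/χ²_R) < σ < √((n − 1)s²/χ²_L). This minimal sketch assumes the standard table values for df = 25 at 98% confidence (χ²_R = 44.314 for a right-tail area of 0.01, χ²_L = 11.524 for a left-tail area of 0.01), since Python's standard library has no chi-square quantile function.

```python
import statistics
from math import sqrt

# The 26 class bag totals, expanded from the frequency table.
bag_totals = [54, 55, 56] + [58] * 5 + [59] * 6 + [60] * 6 + [61] * 5 + [62]

# Chi-square critical values for df = 25, 98% confidence, from a standard table.
CHI2_RIGHT = 44.314  # area 0.01 in the right tail
CHI2_LEFT = 11.524   # area 0.01 in the left tail

def sd_ci(values, chi2_right, chi2_left):
    """Confidence interval for a population standard deviation (chi-square method)."""
    n = len(values)
    s2 = statistics.variance(values)  # sample variance
    return sqrt((n - 1) * s2 / chi2_right), sqrt((n - 1) * s2 / chi2_left)

lower, upper = sd_ci(bag_totals, CHI2_RIGHT, CHI2_LEFT)
print(f"({lower:.3f}, {upper:.3f})")
```

The interval is not symmetric about s ≈ 1.896 because the chi-square distribution is skewed.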

Hypothesis tests
A hypothesis is a claim or statement about a property of a population. The population parameters
most often involved in hypothesis testing are the mean, proportion, standard deviation, and
variance. Hypothesis tests are used to evaluate a claim (hypothesis) made about a property of a
population. In the following section, our group performed two hypothesis tests on our class's
Skittles data.
Test the claim that 20% of all Skittles are green (class bags):
Hypothesis test results:
p: Proportion of successes
H0: p = 0.2
HA: p ≠ 0.2
Proportion   Count   Total   Sample Prop.   Std. Err.     Z-Stat       P-value
p            311     1536    0.20247396     0.010206207   0.24239742   0.8085

Variable                  Sample Mean   Std. Err.   DF   T-Stat      P-value
Candies per bag group 3   60.5          0.8660254   3    5.1961524   0.0138

Since our p-value of 0.8085 was greater than α, we fail to reject the claim. There is not
sufficient evidence to reject the claim (H0) that 20% of all Skittles candies are green.
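The one-proportion z test compares p̂ with the claimed p0 using the standard error √(p0(1 − p0)/n) computed under H0. This minimal Python sketch reproduces the test above from the count (311) and total (1536) in the table. The significance level α = 0.05 is an assumption, because the report does not state which α was used.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Two-tailed z test of H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    std_err = sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = one_proportion_z_test(311, 1536, 0.2)
alpha = 0.05  # assumed significance level; not stated in the report
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.4f}, p-value = {p_value:.4f}: {decision}")
```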
Test the claim that the mean number of candies in a bag of Skittles is 56 (class bags):
Hypothesis test results:
μ : Mean of variable
H0 : μ = 56
HA : μ ≠ 56
Variable                Sample Mean   Std. Err.    DF   T-Stat      P-value
Candies per bag class   59.076923     0.37178603   25   8.2760589   <0.0001

Since our p-value (less than 0.0001) was less than α, we reject the claim that the mean number of
candies in a bag of Skittles is 56. There is sufficient evidence to warrant rejection of the claim.
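The one-sample t statistic is t = (x̄ − μ0)/(s/√n). Because Python's standard library has no t distribution for computing a p-value, this minimal sketch uses the critical-value approach instead, assuming α = 0.05 (not stated in the report) and the two-tailed table value t = 2.060 for df = 25.

```python
import statistics
from math import sqrt

# The 26 class bag totals, expanded from the frequency table.
bag_totals = [54, 55, 56] + [58] * 5 + [59] * 6 + [60] * 6 + [61] * 5 + [62]

# Two-tailed critical value t_(0.025), df = 25, from a standard t table (assumes alpha = 0.05).
T_CRIT_05_DF25 = 2.060

def one_sample_t(values, mu0):
    """Return the t statistic for H0: mu = mu0."""
    n = len(values)
    std_err = statistics.stdev(values) / sqrt(n)
    return (statistics.mean(values) - mu0) / std_err

t_stat = one_sample_t(bag_totals, 56)
decision = "reject H0" if abs(t_stat) > T_CRIT_05_DF25 else "fail to reject H0"
print(f"t = {t_stat:.4f}: {decision}")
```

A t statistic of about 8.28 is far beyond the critical value, which agrees with the very small p-value reported above.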

Reflection:
The conditions for constructing interval estimates and performing hypothesis tests:
The sample must be a simple random sample.

Either the population must be normally distributed or the sample size n must be greater than 30.

Our data met both requirements: the class sample was a simple random sample, and although our
sample size was less than 30, the data were approximately normally distributed, as shown in the
histogram from the previous section.
One possible source of error is that, although the distribution is approximately normal, it is
slightly skewed, and the sample size is less than 30, which could bias the results.
The sampling method could be improved by making the sample more random. For example, we
would likely get more accurate results if the sample were taken from students in statistics classes
throughout Utah.

Reflective Writing of Math 1040
By Joel Hanes
Through creating the Skittles project and taking this class, I've learned that my preconceived ideas and beliefs about statistics were unfounded and mostly wrong. Coming into the class, I believed statistics was
just manipulating numbers to make ideas fit your own views. I have found that, except for people who
manipulate data consciously or subconsciously, the use of statistics is a scientific act, and you can create or find data
that is useful in daily life or on the job.
Being able to use a software program such as StatCrunch to store data in an orderly way, organize and sort the data
in different ways to see it from different viewpoints, apply the collected data to find meaning in it, and create
visual representations that show others the data in an easy-to-understand format has been a great experience and a
much-appreciated tool.
I will be able to use visual representations such as pie charts and histograms, as well as confidence intervals and
hypothesis tests, in my other classes. In anthropology, I can show the differences and overlaps between cultures and
beliefs; in archaeology, I can show the differences and overlaps between sites and artifact populations.
I will also be able to use the tools learned through the project and the class in my personal life and at work. I
am active, so I will be able to calculate means for respiration rate, blood pressure, and time spent on an activity to
see whether I improve over time. At work, I will be able to do the same for the patients I take care of by keeping
track of their vital signs to see whether they change over time.
Through this project I have learned how to create better reports for my classes by using graphs, and I have
gained an appreciation of how statistics is used to solve problems and understand the world around us; it is not
just data manipulation.
