
Lucy Wood

9/14/14
Kiker Pd. 6
AP Statistics
The Data Exploration Mini-Project
A t-shirt is something that almost everyone owns. Some people have more than
others, but nearly everyone has at least one. High school students often wear
them to school because they are comfortable and easy to wear. When I asked high school
students how many t-shirts each of them owns, the answers ranged from 15 all the way
up to 100.
The values in the data represent how many t-shirts each student owns, so the unit
is t-shirts. The question "How many t-shirts do you own?" is not something that can be
answered by observation, so in order to collect my data, I had to ask thirty people
individually. I asked them both in person and by text message because I wanted to give
them a chance to go home and either estimate by looking at their t-shirt collection or
count their shirts to get an exact number. I asked this question because I have a lot of
t-shirts myself and, because they are such a popular clothing item, I wanted to see how
many my friends owned.
The most important calculations when dealing with a set of data are the sample
size, the five-number summary, the mean, median, range, standard deviation, variance,
and the interquartile range. The first number we have is the sample size. Because I asked
30 people how many t-shirts they owned, there are 30 data points, so the sample size is
30. The next few numbers are in the five-number summary, which consists of the minimum
value, the first quartile, the median, the third quartile, and the maximum.
The minimum value is 15 because the person with the fewest t-shirts in my data
collection had 15 t-shirts. The first quartile can only be found once the median is known.
To find the median, it is necessary to put the data in numerical order. Because there
are 30 values, the median lies between the 15th and the 16th values.
The 15th value is 32 and the 16th value is 33. To find the median, we take the mean of
these two values by adding them together and dividing by two, which gives a median of
32.5. Next, we find the first quartile as the median of the lower half of the data (the
15 values below the median) and the third quartile as the median of the upper half (the
15 values above the median). The middle value of the lower half is 25, so 25 is the
first quartile. The middle value of the upper half is 57, so 57 is the third quartile. The
last number in the five-number summary is the maximum. The highest number in the data
set is 100, so 100 is the maximum. To find the mean of the data, we add all 30 numbers
together to get 1237 and then divide by 30, because there are 30 numbers. This gives
41.233, which is the average of the data points. The next number we must
find is the range. To find it, we subtract the minimum value of 15 from the
maximum value of 100 to get 85. Next, we need to find the standard deviation of the data.
To find it, we take the square root of the sum of the squared differences between each
X value and the mean, divided by the number of data points minus one. We start with the
mean, which we already have as 41.233. We then subtract the mean from each value in the
data set. Finally, to finish the table, we square each of those differences and add all
of the squares together. That sum goes on the top of the fraction. On the bottom, we put
29 because 30 minus 1 is 29. The sum of the squared differences is 13135.367. That
divided by 29 is 452.944, and the square root of that is 21.282, which is
the standard deviation. To find the variance of this data set, all we have to do is
square the standard deviation, which brings us back to 452.944, the value we had before
taking the square root. Lastly, to find the interquartile range, or IQR, we find the
difference between the first and third quartiles. The third quartile, 57, minus the
first quartile, 25, is 32, which is the IQR.
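
The calculations in this paragraph can be sketched in Python. This is only an illustration under a few assumptions: the function names are made up for this example, the short list `example` is hypothetical (it is not the 30 survey answers), and the quartiles use the median-of-halves method described above, which may differ slightly from what a calculator or library reports. The last lines use only numbers stated in the text (the sum of squared differences, 13135.367, and the sample size, 30).

```python
import math
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method.

    Needs at least two values; the middle value is left out of both
    halves when the number of values is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]           # values below the median position
    upper = data[half + n % 2:]   # values above the median position
    return (
        data[0],
        statistics.median(lower),
        statistics.median(data),
        statistics.median(upper),
        data[-1],
    )


def sample_sd(sum_sq_dev, n):
    """Sample standard deviation from the sum of squared differences."""
    return math.sqrt(sum_sq_dev / (n - 1))


# Hypothetical values for illustration only, not the survey data.
example = [15, 20, 25, 30, 35, 40, 57, 100]
print(five_number_summary(example))  # (15, 22.5, 32.5, 48.5, 100)

# Numbers reported in the text: sum of squared differences and sample size.
variance = 13135.367 / (30 - 1)
print(round(variance, 3))                     # 452.944
print(round(sample_sd(13135.367, 30), 3))     # 21.282
```

Dividing by n - 1 instead of n is what makes this the sample standard deviation, matching the 29 used in the text.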
To find any outliers in my data set, we use the expressions Q3 + 1.5(IQR) and
Q1 - 1.5(IQR). When we input numbers into these expressions, we have 57 + 1.5(32) and
25 - 1.5(32). The answers are 105 and -23. Because we don't have any
data points over 105 or under -23, we have no outliers.
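
As a quick check of the outlier rule, here is a minimal Python sketch. The function names `outlier_fences` and `is_outlier` are hypothetical, chosen only for this example; the inputs are the quartiles, minimum, and maximum reported in the text.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cutoffs Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, q1, q3, k=1.5):
    """True when the value falls outside the two cutoffs."""
    lower, upper = outlier_fences(q1, q3, k)
    return value < lower or value > upper


lower, upper = outlier_fences(25, 57)
print(lower, upper)  # -23.0 105.0

# The smallest and largest answers (15 and 100) are both inside the cutoffs.
print(is_outlier(15, 25, 57), is_outlier(100, 25, 57))  # False False
```

Only the minimum and maximum need to be checked: if they are inside the cutoffs, every other value is too.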

When 100 is added to each number in my data, some of the statistics change and others
do not. Because there are still 30 data points, the sample size stays at 30.
The five-number summary does change. The minimum is no longer 15, it
is now 115, and the maximum is no longer 100, it is now 200. The mean increases by
exactly 100, so it is now 141.233. The first and third quartiles also each increase by
100, so the first quartile is 125 and the third quartile is 157. The
median increases by 100 as well. It is 132.5 because the mean
of 132 and 133 is 132.5. The range stays the same as before, however, and is 85 because
200 minus 115 still equals 85. The standard deviation for the new data also stays the
same: every X value and the mean all moved up by 100, so the distance of each value
from the mean did not change. That being said, it is still 21.282. The
variance is also still the same because variance is the standard deviation squared, and
because the standard deviation didn't change, the variance didn't change and is still
452.944. The IQR didn't change either because the difference between the new Q1 and Q3
is still the same as the difference between the old Q1 and Q3, which is 32. Lastly, to
find any outliers in my new data set, we use Q3 + 1.5(IQR) and Q1 - 1.5(IQR) again. When
we input numbers into these expressions, we have 157 + 1.5(32) and 125 - 1.5(32). The
answers are 205 and 77. Because we don't have any data points over
205 or under 77, we still have no outliers.
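
The effect of adding a constant can be checked with a short Python sketch. The list `example` is hypothetical (a handful of made-up values, not the 30 survey answers) and `add_constant` is a helper name invented for this illustration; the point is only to show which statistics move and which stay put.

```python
import statistics


def add_constant(values, c):
    """Return a new list with c added to every value."""
    return [v + c for v in values]


# Hypothetical values for illustration only, not the survey data.
example = [15, 25, 32, 33, 57, 100]
shifted = add_constant(example, 100)

for label, data in (("original", example), ("plus 100", shifted)):
    print(
        label,
        "mean:", round(statistics.mean(data), 3),
        "median:", statistics.median(data),
        "range:", max(data) - min(data),
        "sd:", round(statistics.stdev(data), 3),
    )
```

The mean and median printed for the second list are each 100 higher, while the range and standard deviation are the same in both lines.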

Lastly, when we increase our original data points by 50%, almost all of the
statistics change. The sample size is still 30,
however, because there are still 30 data points. The five-number summary changes a lot
though. The minimum is now 22.5 because 50% of 15 is 7.5, and 7.5 added to 15 is 22.5.
The maximum is now 150, because half of 100 is 50, and 50 added to 100 is 150. The mean
is now 61.85 because all of the new numbers added together and divided by 30 is 61.85,
which is 150% of the original mean.
The first quartile is 150% of the original first quartile, 25, so the new first quartile
is 37.5. The third quartile is 150% of the original third quartile, 57, so the new third
quartile is 85.5. The median is 150% of the original median, 32.5. The new median is
48.75, which is also halfway between the new 15th number, 48, and the new 16th number,
49.5.
The range is also affected by the 50% increase to each number. Because the maximum
grew by 50 while the minimum grew by only 7.5, the new range is 150
minus 22.5, which is 127.5, or 150% of the original range. When each number was
increased by 100, the standard deviation did not change because every X value moved by
the same amount. Now, however, each data point changed in proportion to its original
value, so the values are spread farther apart.
Because the distances from the mean all grew by 50%, the standard deviation and variance
changed as well. The standard deviation changed from 21.282 to about 31.92, which is 1.5
times the original. The variance increased from 452.944 all the
way to about 1019.12, which is 1.5 squared, or 2.25, times the original. Also, because
both Q1 and Q3 changed, the IQR changed. The
formula for IQR is Q3 - Q1. That being said, the new IQR is 85.5 - 37.5, which equals 48.
This is a significant increase from the previous IQR. Lastly, to find any outliers in the
last data set, we use Q3 + 1.5(IQR) and Q1 - 1.5(IQR) again. When we input
numbers into these expressions, we have 85.5 + 1.5(48) and 37.5 - 1.5(48). The answers
are 157.5 and -34.5. Because we don't have any data points over 157.5
or under -34.5, we still have no outliers.
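
The effect of a 50% increase can also be worked out directly from the original summary, since multiplying every value by 1.5 multiplies each measure of center, position, and spread by 1.5, and the variance by 1.5 squared. The sketch below assumes that rule; `scale_summary` and the dictionary keys are names invented for this illustration, and the input values are the original statistics reported in the text.

```python
def scale_summary(summary, factor):
    """Apply 'multiply every data point by factor' to a dict of statistics."""
    scaled = {}
    for name, value in summary.items():
        if name == "n":
            scaled[name] = value                  # sample size never changes
        elif name == "variance":
            scaled[name] = value * factor ** 2    # variance is in squared units
        else:
            scaled[name] = value * factor
    return scaled


# Original statistics as reported in the text.
original = {
    "n": 30, "min": 15, "q1": 25, "median": 32.5, "q3": 57, "max": 100,
    "mean": 1237 / 30, "range": 85, "iqr": 32,
    "sd": 21.282, "variance": 452.944,
}

for name, value in scale_summary(original, 1.5).items():
    print(name, round(value, 3))
```

Because the standard deviation above is already rounded to three places, the scaled standard deviation and variance printed here are approximate in their last digit.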

In my original data, we can figure out what percent of the values is more than 5 units
above the mean. The original mean is 41.233. The first value greater than the mean
is 45. Counting 5 values above the mean brings us to the first 60. After that 60, there
are 6 values that are more than 5 units above the mean. To find the percentage, we
divide 6, the number of data points more than 5 units above the mean, by the total
number of data points, which is 30. 6/30 is equal to .2, or 20%. To find the percent that
lies between 3 units below the mean and 2 units above the mean, we divide 5 by 30, since
there are only 5 data points between 3 units below and 2 units above the mean. 5/30 is
.1667, or 16.67%. Lastly, to find how many data points make up the top 10% of the data,
we think about how many data points there are. Because there are 30, we multiply by 10%,
or .1, to get 3, because 30*.1 is 3. Therefore, the top 10% of my data is the 3 highest
values.
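
The percentage calculations above follow one pattern: count the values that meet a condition and divide by the total. Here is a minimal Python sketch of that pattern. The function names and the list `example` are hypothetical (the list is not the survey data); the first three printed results use only the counts stated in the text.

```python
def percent_above(values, threshold):
    """Percent of values strictly greater than threshold."""
    count = sum(1 for v in values if v > threshold)
    return 100 * count / len(values)


def percent_between(values, low, high):
    """Percent of values from low to high, inclusive."""
    count = sum(1 for v in values if low <= v <= high)
    return 100 * count / len(values)


def top_count(n, percent):
    """How many of n data points make up the top `percent` percent."""
    return n * percent // 100


# Counts reported in the text, out of 30 data points.
print(100 * 6 / 30)            # 20.0
print(round(100 * 5 / 30, 2))  # 16.67
print(top_count(30, 10))       # 3

# Hypothetical values for illustration only, not the survey data.
example = [10, 20, 30, 40, 50]
mean = sum(example) / len(example)                    # 30.0
print(percent_above(example, mean + 5))               # 40.0
print(percent_between(example, mean - 3, mean + 2))   # 20.0
```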
When asking my classmates how many t-shirts they owned, I expected to get
similar answers across the board. I didn't think that my range would be as high as it is.
Because of this, I have come to the conclusion that while t-shirts seem to be a major
part of everyone's closet, the number different people have varies widely. One thing I
noticed is that the males I asked tended to have more t-shirts than the females, possibly
because they wear them more often. The mean number of t-shirts among the students I
asked is higher than the median because one student had a much larger number than most
of the others, owning about 100 t-shirts. The reason this wasn't an outlier, however, is
that 100 is still below the upper cutoff of 105 found with Q3 + 1.5(IQR).
While the numbers range from 15 all the way to 100, everyone I
asked had a fair number of t-shirts, which supports my original point: they are an
essential item in everyone's closet.
