
Lab 4 Joey Martinez

05/09/16
Math 311

The following analysis consists of two parts. The first part covers random data generation of
dice rolls for samples of size n=50. We begin by creating 2, 10, and 30 random samples of
the size specified and then compare and contrast their behavior with respect to the Central Limit
Theorem (CLT). Based on our knowledge of the Central Limit Theorem, we expect that, as the
number of samples collected increases, the distribution of the sample means will tend to approximate a
normal distribution. We also predict, by the CLT, that of the three cases, the histogram built from
30 samples will resemble the normal distribution the most.

In part two, we will construct confidence intervals at varying levels of confidence and examine
the effects of changing the level among 90%, 95%, and 99%. The data
provided consist of fat percentages in bottled sauce from a processed foods company, where 30
random samples were collected. After our analysis of the bottled sauce data, we will move to
another confidence interval problem concerning noise levels in areas around urban hospitals.

Part I)

In this section, random samples of size 50 were generated with the Minitab random data
generator to emulate dice rolls, as mentioned above. Subsection A will discuss the expected theoretical
distribution for rolling a die; we will also examine the expected value and standard deviation of such a
distribution. Then, in subsections B, C, and D, we will gradually increase the number of
samples collected and offer descriptive statistics, histograms, and an analysis of our results. Finally,
subsection E will give a commentary on the behavior of our distributions as we increase the number of
samples collected.

A) Before we begin, we shall discuss the theoretical distribution of rolling a die. Given that the
act of rolling a die is discrete, independent of previous rolls, and has an equal (1/6)
probability for each face of the die, our theoretical distribution should be a discrete
uniform distribution; we demonstrate this distribution graphically as:

[Figure: Uniform Distribution of Die Roll. Vertical axis: Density (0.00 to 0.18); horizontal axis: X = Number Rolled (1 to 6).]
Based on our knowledge of this theoretical distribution, we calculate the mean to be E(x) = µ = 3.5
and the standard deviation to be

$$\sigma = \sqrt{\sum_{k=1}^{6} k^{2}\left(\frac{1}{6}\right) - (3.5)^{2}} \approx 1.7078.$$

These values indicate that after repeated rolling, we expect the average value of the die to give 3.5
and the typical roll to fall 1.7078 away from this average.
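The two values above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python (not part of the original Minitab work); the variable names are chosen here for illustration only.

```python
from fractions import Fraction
from math import sqrt

faces = range(1, 7)
p = Fraction(1, 6)  # each face of a fair die is equally likely

# E(X) = sum of k * P(X = k)
mean = sum(k * p for k in faces)

# Var(X) = E(X^2) - E(X)^2, kept as an exact fraction
variance = sum(k**2 * p for k in faces) - mean**2

# Standard deviation as a floating-point number
std_dev = sqrt(variance)

print(f"mean = {float(mean)}, variance = {variance}, sd = {std_dev:.4f}")
```

Exact fractions are used for the mean and variance so that rounding only enters at the final square root.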

B) We now generate 2 random samples of size 50 representing the roll of a single die, where x
represents sample 1 and y represents sample 2. We display a distribution of the sample
means and the descriptive statistics as follows:

[Figure: Histogram of Sample Means (x̄,ȳ) with normal fit; Mean 3.35, StDev 1.108, N 50. Vertical axis: Frequency; horizontal axis: Mean (x̄,ȳ), 1 to 6.]

Descriptive Statistics: Mean (x̄,ȳ)

Variable      N   N*   Mean    SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum
Mean (x̄,ȳ)   50  0    3.350   0.157    1.108  1.500    2.500  3.500   4.125  5.500

We note, from the results above, that our distribution is slightly asymmetric, with a minor skew to
the left and uniformity in its center. The initial mean of our sampling distribution, 3.35, is
fairly close to the expected mean of 3.5, differing by only 0.15. We also begin with
a standard deviation of our sample means of σx̅=1.108, which we expect to decrease as we
increase our number of samples.
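As a rough benchmark for how fast the spread should shrink, the following worked equation assumes that each of the 50 plotted values is the mean of k independent rolls, with k = 2, 10, or 30. That reading is suggested by N = 50 in the Minitab output but is not stated explicitly in this report. Under that assumption the standard deviation of such a mean is σ/√k:

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{k}}, \qquad
\frac{1.7078}{\sqrt{2}} \approx 1.208, \quad
\frac{1.7078}{\sqrt{10}} \approx 0.540, \quad
\frac{1.7078}{\sqrt{30}} \approx 0.312
```

The first of these is in line with the observed value of 1.108 for two samples.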

C) Next, we generate 10 random samples of size 50 for die rolls, where x_i represents the i^th
sample mean collected. We have our results as follows:

[Figure: Histogram of Mean (x_i) with normal fit; Mean 3.49, StDev 0.5683, N 50. Vertical axis: Frequency; horizontal axis: Mean (x_i), 1 to 6.]

Descriptive Statistics: Mean (x_i)


Variable N N* Mean SE Mean StDev Minimum Q1 Median Q3 Maximum
Mean (x_i) 50 0 3.4900 0.0804 0.5683 2.2000 3.0750 3.5000 3.8250 4.6000

As can be seen above, our distribution has now become fairly symmetric, with a noticeable
unimodal peak at approximately 3.5. There is a negligible skew to the left; however, the
mean of our sample means has nearly approached the expected value of 3.5 and is given by
µ=3.49. As predicted, the spread of the sample means has been pulled in closer to the
center toward 3.5, indicated by a smaller standard deviation of σx̅=0.5683.

D) Finally, we generate 30 random die roll samples of size 50, with the jth sample being denoted
by y_j. Our results are given by the following histogram and descriptive statistics:

[Figure: Histogram of Mean(y_j) with normal fit; Mean 3.458, StDev 0.3050, N 50. Vertical axis: Frequency; horizontal axis: Mean(y_j), 1 to 6.]
Descriptive Statistics: Mean(y_j)

Variable    N   N*   Mean    SE Mean  StDev   Minimum  Q1      Median  Q3      Maximum
Mean(y_j)   50  0    3.4580  0.0431   0.3050  2.5333   3.2583  3.4667  3.6417  4.0000

Unlike our previous sampling distributions, the most noticeable feature of our 30
collected samples is the level of compactness and symmetry that has been achieved.
Although the mean of the sample means has fallen to µ=3.4580, the distribution above
approximates normality the most and displays the least variability. This observation is
verified by another sizable reduction in the standard deviation, now given by σx̅=0.3050.

E) The most obvious quality of the histograms above is their immediate change in shape as the
number of samples collected increases. Initially, the distribution of sample means collected
from two random samples has no semblance of being normal: its center
appears to be uniform, and the mean of the sample means is
3.35 rather than approximately 3.5. However, we can observe an almost instant effect
on the shape of our sampling distribution by simply increasing our number of collected
samples to 10. There is a significant decrease in variability, our shape begins to
resemble a normal distribution, and our mean of the sample means begins to
approach the expected mean of the population. The greatest transformation occurs when
the number of samples collected reaches 30, where our distribution has nearly achieved
normality and is the least variable, with a standard deviation of 0.3050 and a mean of 3.458.
Thus our results above relate to the Central Limit Theorem by demonstrating how the
sampling distribution of sample means approaches normality as we increase our number of
samples collected, regardless of the population. In our case, this holds even though the samples
come from a population with a discrete uniform distribution. We visually represent this
concept by the histograms below, where the number of samples collected increases from
left to right:

[Figure: the three histograms side by side, each with a normal fit and N 50. Left: Histogram of Sample Means (x̄,ȳ), Mean 3.35, StDev 1.108. Center: Histogram of Mean (x_i), Mean 3.49, StDev 0.5683. Right: Histogram of Mean(y_j), Mean 3.458, StDev 0.3050. Vertical axes: Frequency; horizontal axes: 1 to 6.]
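The narrowing seen across the three histograms can be reproduced outside Minitab. The sketch below is a hypothetical Python re-creation, not the procedure used for this lab: it assumes each of the 50 plotted values is the mean of k simulated fair die rolls (k = 2, 10, 30), and the function name `simulate_means` and the fixed seed are illustrative choices. Because the rolls are random, the simulated means and standard deviations will not match the Minitab figures exactly.

```python
import random
import statistics

SIGMA = (35 / 12) ** 0.5  # population SD of a single fair die roll (about 1.7078)

def simulate_means(k, rows=50, seed=0):
    """Return `rows` values, each the mean of k simulated fair die rolls."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.randint(1, 6) for _ in range(k)) / k for _ in range(rows)]

results = {}
for k in (2, 10, 30):
    means = simulate_means(k)
    observed_mean = statistics.mean(means)
    observed_sd = statistics.stdev(means)  # sample SD of the 50 means
    expected_sd = SIGMA / k ** 0.5          # theoretical SD of a mean of k rolls
    results[k] = (observed_mean, observed_sd, expected_sd)
    print(f"k={k:2d}  mean={observed_mean:.3f}  "
          f"sd={observed_sd:.4f}  sigma/sqrt(k)={expected_sd:.4f}")
```

Comparing the observed standard deviation with σ/√k in each row gives a numerical counterpart to the visual comparison of the histograms.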
Part II)

The following data contain the percentage fat in bottled sauce for 30 randomly collected
samples taken by a scientist wishing to assess the percentage of fat contained in a processed food
manufacturer’s bottled sauce. The purported percentage fat contained in a typical bottled sauce is 15%,
as advertised by the same company. Suppose previous measurements have found the standard
deviation to be 2.6% fat. The collected data are as follows:
Sample ID Percent Fat
1 15.2
2 12.4
3 15.4
4 16.5
5 15.9
6 17.1
7 16.9
8 14.3
9 19.1
10 18.2
11 18.5
12 16.3
13 20
14 19.2
15 12.3
16 12.8
17 17.9
18 16.3
19 18.7
20 16.2
21 18.5
22 12.7
23 14.9
24 16.2
25 15.9
26 17.3
27 13.7
28 15.5
29 16.4
30 14.2

A) Using the provided data, we proceed to construct a confidence interval for the mean percentage
fat in each bottle at the 90% confidence level. The desired confidence interval is given below:
One-Sample Z: Percent Fat

The assumed standard deviation = 2.6

Variable     N   Mean    StDev  SE Mean  90% CI
Percent Fat  30  16.150  2.113  0.475    (15.369, 16.931)

From the confidence interval above, we note that the manufacturer’s advertised percentage falls
slightly outside our bounds at the 90% confidence level. In the context of the problem, our
scientist can be 90% confident that the true mean fat percentage lies inside the
interval (15.369, 16.931); that is, about 90% of intervals built this way under repeated sampling
would contain the true mean. Hence, the company’s advertised 15% fat is not contained in the
interval at this level of confidence.
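For reference, the bounds reported by Minitab follow from the usual one-sample z interval with known σ. The worked arithmetic below uses the sample mean 16.150, the assumed σ = 2.6, n = 30, and the standard normal critical value 1.645 for 90% confidence (rounded to three decimals here):

```latex
\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
= 16.150 \pm 1.645 \cdot \frac{2.6}{\sqrt{30}}
\approx 16.150 \pm 0.781
= (15.369,\ 16.931)
```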

B) We now construct a 95% confidence interval as follows:

One-Sample Z: Percent Fat

The assumed standard deviation = 2.6

Variable     N   Mean    StDev  SE Mean  95% CI
Percent Fat  30  16.150  2.113  0.475    (15.220, 17.080)

At the 95% confidence level, the advertised percentage of fat in bottled sauce still falls outside
our interval. Since the scientist can state, with 95% confidence, that the population mean fat
percentage is contained within the constructed interval, it is fairly suspicious that the
advertised 15% falls outside it. To see whether the manufacturer’s claim that the typical bottle
contains 15% fat is plausible at a higher level, the scientist would have to construct another
confidence interval.

C) The final confidence interval to be calculated is at the 99% confidence level. This interval is given
by:

One-Sample Z: Percent Fat

The assumed standard deviation = 2.6

Variable     N   Mean    StDev  SE Mean  99% CI
Percent Fat  30  16.150  2.113  0.475    (14.927, 17.373)

At the 99% confidence level, we finally construct an interval that contains the manufacturer’s
advertised fat percentage. Although the advertised 15% is now a plausible value for the mean
at 99% confidence, it is somewhat unconvincing that
our level of confidence must be raised this far before the desired 15% is contained
in an interval.
D) The three constructed confidence intervals share the quality of expanding their bounds as the
level of confidence increases. In fact, each confidence interval at a lower level of confidence is
nested in the interval of higher confidence. For example, the 90% confidence interval is
contained in the 95% confidence interval and the 95% confidence interval is contained in the
99% confidence interval. This should be expected, since the confidence interval depends on the
critical values in the margin of error, and the critical values themselves depend on the level of
confidence decided upon.
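The nesting described above can be checked directly from the data. The following is a minimal Python sketch, not the Minitab procedure used in this lab; the helper name `z_interval` is hypothetical, and the critical values come from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist, mean

# Percent fat for the 30 sampled bottles, as listed in Part II
percent_fat = [
    15.2, 12.4, 15.4, 16.5, 15.9, 17.1, 16.9, 14.3, 19.1, 18.2,
    18.5, 16.3, 20.0, 19.2, 12.3, 12.8, 17.9, 16.3, 18.7, 16.2,
    18.5, 12.7, 14.9, 16.2, 15.9, 17.3, 13.7, 15.5, 16.4, 14.2,
]
SIGMA = 2.6  # assumed (known) population standard deviation

def z_interval(data, sigma, level):
    """Two-sided z confidence interval for the mean with known sigma."""
    center = mean(data)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # critical value z_{alpha/2}
    margin = z * sigma / sqrt(len(data))            # margin of error
    return center - margin, center + margin

intervals = {level: z_interval(percent_fat, SIGMA, level)
             for level in (0.90, 0.95, 0.99)}

for level, (low, high) in intervals.items():
    print(f"{level:.0%} CI: ({low:.3f}, {high:.3f})  "
          f"contains 15? {low <= 15 <= high}")
```

Only the critical value changes between levels, so the margin of error, and with it the width of the interval, grows with the confidence level while the center stays at the sample mean.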

2. In this final subsection, suppose we are interested in the noise levels (measured in decibels)
around urban hospitals in several areas. Let the sample mean and standard deviation of the noise levels in n=84
corridors be 61.2 decibels and 7.9 decibels, respectively. Given these values, we estimate the
true population mean with the 95% confidence interval (59.5, 62.9). In the context of our problem, our
result means that we can state, with 95% confidence, that the true mean noise level around urban
hospitals lies in the interval (59.5, 62.9); that is, about 95% of intervals constructed this way
under repeated sampling would contain the true mean.
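The interval can be reproduced by hand. The arithmetic below assumes the standard normal critical value 1.96, with the sample standard deviation standing in for σ because n = 84 is large; the report does not state which critical value was actually used.

```latex
61.2 \pm 1.96 \cdot \frac{7.9}{\sqrt{84}}
\approx 61.2 \pm 1.69
= (59.5,\ 62.9)
```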
