Assignment 1

Statistics: Assignment 1
Arend Slomp
Groningen, September 22, 2011
Contents
1 Individual exercises
1.1 Exercise 1: Difference between median and mean
1.2 Exercise 2.8
1.3 Exercise 2.43
1.4 Exercise 7.9
1.5 Exercise 7.28
2 Research: Does smoking by pregnant mothers have an effect on the birth weight of their babies?
2.1 Introduction
2.2 Exploratory analysis
2.3 Formal analysis
2.4 Conclusion
2.5 Discussion
1 Individual exercises
1.1 Exercise 1: Difference between median and mean
The mean is the sum of all values divided by the number of values. The median is found by sorting the data and selecting the middle value; if the data set contains an even number of values, the average of the two middle values is taken. The median is a more robust estimator of the centre than the mean because it is less sensitive to outliers.
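A small sketch of this sensitivity to outliers, using a made-up vector of values (an assumption for illustration, not data from the exercise):

```r
# Hypothetical values; the last one is an outlier.
values <- c(2, 3, 3, 4, 5, 6, 100)

mean(values)    # pulled upward by the outlier
median(values)  # middle value of the sorted data

# With an even number of values, the median is the average
# of the two middle values (here 2 and 4).
even_values <- c(1, 2, 4, 10)
median(even_values)
```

Replacing the outlier 100 with, say, 7 would change the mean considerably but leave the median where it is.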
1.2 Exercise 2.8

> library(UsingR)
> data(npdb)
> attach(npdb)
> max(table(state))
[1] 1566
> which(table(state) == max(table(state)))
CA
 6
California had the most awards.
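The lookup above can also be written with which.max. The sketch below uses a short made-up vector of state codes in place of the npdb data, so the counts are illustrative only:

```r
# Hypothetical state codes standing in for the state column of npdb.
state <- c("CA", "NY", "CA", "TX", "CA", "NY")

counts <- table(state)            # frequency of each state
top <- names(which.max(counts))   # label of the first maximum
top
max(counts)
```

Note that which.max returns only the first maximum, whereas the comparison with max(table(state)) used in the exercise returns every state that ties for the highest count.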
1.3 Exercise 2.43

> data(brightness)
> png("brightness.png")
> hist(brightness)
> dev.off()
As the histogram shows, the distribution of the brightness of the stars is roughly symmetric. The data are also unimodal.
1.4 Exercise 7.9

> x = 4; n = 5
> prop.test(4,5,conf.level=0.9)

        1-sample proportions test with continuity correction

data:  4 out of 5, null probability 0.5
X-squared = 0.8, df = 1, p-value = 0.3711
alternative hypothesis: true p is not equal to 0.5
90 percent confidence interval:
 0.3493025 0.9861052
sample estimates:
  p
0.8

Warning message:
In prop.test(4, 5, conf.level = 0.9) :
  Chi-squared approximation may be incorrect
> x = 80; n = 100
> prop.test(80,100,conf.level=0.9)

        1-sample proportions test with continuity correction

data:  80 out of 100, null probability 0.5
X-squared = 34.81, df = 1, p-value = 3.635e-09
alternative hypothesis: true p is not equal to 0.5
90 percent confidence interval:
 0.7212471 0.8617706
sample estimates:
  p
0.8

> x = 800; n = 1000
> prop.test(800,1000,conf.level=0.9)

        1-sample proportions test with continuity correction

data:  800 out of 1000, null probability 0.5
X-squared = 358.801, df = 1, p-value < 2.2e-16
alternative hypothesis: true p is not equal to 0.5
90 percent confidence interval:
 0.7778789 0.8204633
sample estimates:
  p
0.8

The 90% confidence intervals are:
4 out of 5: 0.3493025 to 0.9861052
80 out of 100: 0.7212471 to 0.8617706
800 out of 1000: 0.7778789 to 0.8204633
The sample proportion is 0.8 in all three cases, but the interval becomes narrower as n grows.
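The three intervals can be compared directly by computing their widths. This is a minimal sketch using the counts from the exercise; the variable names are my own:

```r
# Counts and sample sizes from the three cases above.
successes <- c(4, 80, 800)
trials <- c(5, 100, 1000)

# Width of the 90% interval for each case. The small-sample warning
# for 4 out of 5 is suppressed here.
widths <- suppressWarnings(mapply(function(x, n) {
  ci <- prop.test(x, n, conf.level = 0.9)$conf.int
  ci[2] - ci[1]
}, successes, trials))

round(widths, 3)
```

The widths shrink with each tenfold increase in n, even though the estimated proportion stays at 0.8.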
1.5 Exercise 7.28

Initializing the environment:
> library(UsingR)
> data(babies)
> attach(babies)
> t.test(dage-age,conf.level=0.95)

        One Sample t-test

data:  dage - age
t = 17.3922, df = 1235, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 2.986035 3.745356
sample estimates:
mean of x
 3.365696

The 95% confidence interval does not contain 0, so the mean of dage - age differs significantly from 0 at the 5% level.
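A one-sample t-test on the differences is the same as a paired t-test on the two columns. The sketch below illustrates this with made-up ages; the names father_age and mother_age are hypothetical and assume that dage and age hold the father's and the mother's age:

```r
# Hypothetical paired ages in years; not taken from the babies data set.
father_age <- c(31, 28, 40, 35, 29, 33)
mother_age <- c(27, 26, 35, 33, 30, 29)

# One-sample t-test on the differences, as in the exercise.
one_sample <- t.test(father_age - mother_age, conf.level = 0.95)

# The same test written as a paired two-sample t-test.
paired <- t.test(father_age, mother_age, paired = TRUE, conf.level = 0.95)

one_sample$conf.int
paired$conf.int
```

Both calls should report the same t statistic and the same confidence interval, since the paired test works on the within-pair differences.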
2 Research: Does smoking by pregnant mothers have an effect on the birth weight of their babies?
2.1 Introduction
I am conducting a study of the effect of smoking by pregnant mothers on the birth weight of their babies. I used data collected at Baystate Medical Center in Springfield, Massachusetts, during 1986. The data set covers 189 mothers and records the mother's age, whether she smoked, and the baby's birth weight. It also contains the mother's weight in pounds at the last menstrual period, her race, the number of previous premature labours, the history of hypertension, the presence of uterine irritability, and the number of physician visits during the first trimester. I do not consider these extra variables relevant to my research question, so I will not use them.
The data fall into two groups: 115 mothers who did not smoke and 74 who did. Since we have the birth weights for both groups, we can compare their means with a two-sample t-test. We first need to formulate the hypotheses, where μ1 and μ2 denote the mean birth weight for non-smoking and smoking mothers:
H0: μ1 = μ2 (there is no difference between smoking and not smoking)
Ha: μ1 ≠ μ2 (there is a difference between smoking and not smoking)
2.2 Exploratory analysis
Figure 2.1: Boxplots of birth weight. 1 = babies of mothers who smoke, 2 = babies of mothers who do not smoke.
In what follows, "smoking mothers" is shorthand for the birth weights of the babies whose mothers smoked, and "non-smoking mothers" for the birth weights of the babies whose mothers did not smoke. For smoking mothers the median birth weight is 2775.5 grams and the mean is 2772 grams. Because the mean and the median nearly coincide, the distribution is roughly symmetric, which is consistent with the data being approximately normally distributed. For non-smoking mothers the median is 3100 grams and the mean is 3055.7 grams.
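Group summaries of this kind can be obtained with tapply. The following is only a sketch on made-up weights; the vectors reuse the names bwt and smoke from the analysis but do not contain the study data:

```r
# Hypothetical birth weights (grams) and smoking indicator (1 = smoker).
bwt <- c(2500, 2900, 3100, 3300, 3600, 2400, 2700, 2800, 3000)
smoke <- c(0, 0, 0, 0, 0, 1, 1, 1, 1)

# Median and mean birth weight per group.
group_median <- tapply(bwt, smoke, median)
group_mean <- tapply(bwt, smoke, mean)

group_median
group_mean
```

A mean close to the median within a group suggests a roughly symmetric distribution, which is the comparison made in the text above.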
2.3 Formal analysis

We choose a confidence level of 0.95.
> m.smoke = bwt[smoke==1]
> m.nosmoke = bwt[smoke==0]
> t.test(m.nosmoke,m.smoke,var.equal=TRUE)

        Two Sample t-test

data:  m.nosmoke and m.smoke
t = 2.6529, df = 187, p-value = 0.008667
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  72.75612 494.79735
sample estimates:
mean of x mean of y
 3055.696  2771.919

The p-value is approximately 0.009, which is below 0.05. The test is therefore significant at the 5% level, and we reject the null hypothesis.
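To show what t.test computes when var.equal=TRUE, the pooled t statistic can be worked out by hand. This is a sketch on made-up weights, not the study data; the variable names are my own:

```r
# Hypothetical birth weights in grams; not the study data.
nosmoke <- c(3100, 3300, 2900, 3500, 3200, 3000)
smoke <- c(2800, 2600, 3100, 2700, 2900)

n1 <- length(nosmoke)
n2 <- length(smoke)

# Pooled variance: weighted average of the two sample variances.
pooled_var <- ((n1 - 1) * var(nosmoke) + (n2 - 1) * var(smoke)) / (n1 + n2 - 2)

# t statistic and two-sided p-value with n1 + n2 - 2 degrees of freedom.
t_stat <- (mean(nosmoke) - mean(smoke)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
p_value <- 2 * pt(-abs(t_stat), df = n1 + n2 - 2)

# The built-in test should give the same statistic.
builtin <- t.test(nosmoke, smoke, var.equal = TRUE)
c(manual = t_stat, builtin = unname(builtin$statistic))
```

The degrees of freedom follow the same rule as in the analysis above, where 115 + 74 - 2 = 187.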
2.4 Conclusion
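The difference in mean birth weight is statistically significant at the 5% level, so we reject the null hypothesis. In this sample, babies of mothers who smoked weighed on average about 284 grams less than babies of mothers who did not smoke (95% confidence interval: about 73 to 495 grams).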
2.5 Discussion
The fact that there is a relation between the birth weight of the babies and whether the mothers smoke does not mean that smoking is really the cause of the difference in weight. Furthermore, we have no information about the living conditions of the two groups. If the smokers come from ghettos and the non-smokers from the wealthy part of society, then that could be a different reason why we see this correlation.