
Running head: THE INFERENCE PROJECT

Title: The inference project

College: Grady College of Journalism and Mass Communication

Professor: Dr. April Galyardt

Student: Fengyao Luo



The inference project

The Educational Longitudinal Study of 2005 school administrator survey samples individuals who work as school administrators or who are concerned with educational quality. The study aims to help educators and policy makers understand the important issues facing the nation’s schools. Schools were selected from the population by simple random sampling. A total of 1,954 individuals responded to the survey, and their responses are the data analyzed in this report.

How many days is the school year (for 10th graders)?

Figure 1: The number of days in the school year for 10th graders

This question concerns the number of days in the school year for 10th graders. To answer it, I analyzed the responses of the administrators who answered this survey item. I treated the values “missing” (0.4%), “item not applicable to sample member” (61.5%), and “non-respondent” (0.5%) as missing values. After removing them, the number of valid responses is 735. The median of the distribution is 180 and the mean is 179.39. Because the two are nearly equal, the distribution appears roughly symmetric and approximately normal, which the graph supports. Thus, the mean (about 179 days) represents the center of the distribution. The standard deviation of the distribution is 3.89 days.

With a valid sample size of 735, the standard error of the mean is 0.14 (3.89/√735). The 95% confidence interval for the mean is 179.12 to 179.66. That is, I am 95% confident that the mean number of days in the school year for 10th graders in U.S. schools falls between 179.12 and 179.66. Because the variable counts whole days, I round to the nearest integer. In sum, the school year for 10th graders in U.S. schools is about 179 days long.
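The arithmetic behind this interval can be sketched in Python. This is a minimal sketch that assumes the normal approximation (z = 1.96) and uses only the summary statistics reported above; the function name `mean_confidence_interval` is illustrative and is not part of the original analysis.

```python
import math

def mean_confidence_interval(mean, sd, n, z=1.96):
    """Return (standard error, lower limit, upper limit) for a sample mean.

    Uses the normal approximation, which is an assumption that is
    reasonable here only because the sample is large.
    """
    se = sd / math.sqrt(n)   # standard error of the mean
    margin = z * se          # half-width of the interval
    return se, mean - margin, mean + margin

# Summary statistics reported for the school-year question.
se, lower, upper = mean_confidence_interval(mean=179.39, sd=3.89, n=735)
print(f"SE = {se:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Carrying the unrounded standard error through the calculation gives limits within about 0.01 of the reported ones; the small difference comes from rounding the standard error to 0.14 before multiplying.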

What proportion of schools require students to pass a test in order to receive a high school

diploma?

Figure 2: The percentage of high schools that require a test for a diploma

This question concerns the proportion of schools that require students to pass a test in order to receive a high school diploma. Respondents who selected “Yes” indicated that students in their schools must pass a test to receive a diploma, while those who selected “No” indicated that no test is required. To keep the bar chart concise, I treated “missing” (0.2%), “partial interview breakoff” (4.5%), “non-respondent” (0.5%), and “survey component legitimate skip/NA” (61.5%) as missing values and removed them. The number of valid responses (N) is then 652. As the graph shows, 42.94% of schools report that students do not need to pass a test to receive a high school diploma, while 57.06% report that students must pass a test.

With a valid sample size of 652, the standard error of the sample proportion is 0.019 (about 1.94 percentage points). The 95% confidence interval for the percentage of schools that require students to pass a test for a high school diploma is 57.06% ± 1.96 × 1.94%, or approximately 53.3% to 60.9%. Therefore, I am 95% confident that the proportion of U.S. high schools that require students to pass a test to receive a high school diploma lies between 53.3% and 60.9%.
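The standard error and interval for a proportion can be recomputed from the reported sample proportion (57.06%) and N = 652. The following is a minimal sketch that assumes the usual normal-approximation (Wald) interval with z = 1.96; the function name `proportion_confidence_interval` is illustrative only.

```python
import math

def proportion_confidence_interval(p_hat, n, z=1.96):
    """Return (standard error, lower limit, upper limit) for a sample proportion.

    Normal-approximation (Wald) interval; assumed adequate here because
    both n * p_hat and n * (1 - p_hat) are far above 10.
    """
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
    margin = z * se                          # half-width of the interval
    return se, p_hat - margin, p_hat + margin

# Reported share of schools answering "Yes" and the valid sample size.
se, lower, upper = proportion_confidence_interval(p_hat=0.5706, n=652)
print(f"SE = {se:.3f}, 95% CI = ({lower:.1%}, {upper:.1%})")
```

The half-width is 1.96 times the standard error, so an interval for a proportion near 57% with N = 652 should be several percentage points wide on each side.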



Do students who live in different types of communities (urban, suburban, rural) enroll in

vocational programs in different proportions?

Figure 3: The percentage of 10th graders in vocational programs by type of community

This question examines the relationship between the type of community a school is in and the percentage of its 10th graders in vocational programs. To obtain valid data, I recoded the school urbanicity variable as type 1 (rural), 2 (suburban), and 3 (urban) and treated “{Missing}” (69.1%) as missing values. The valid sample size is 604. As the boxplot shows, all three types of school urbanicity have outliers, so the median is a better measure of center than the mean. For rural schools, the median percentage of 10th graders in vocational programs is 10%, while for urban and suburban schools the medians are 5% and 6%, respectively.

My null hypothesis is that there is no relationship between the type of community and the percentage of 10th graders in vocational programs, that is, that the mean percentage is the same for all three community types. The graph shows that there are outliers and that the distributions for the three community types are not normal. However, the sample size is 604, which is large enough to use ANOVA to test the hypothesis. The ANOVA gives F = 3.542 with a p-value of 0.03. At the 0.05 significance level, I reject the null hypothesis and conclude that there is a relationship between the type of community and the percentage of 10th graders in vocational programs. That is, schools in at least one of the three types of communities (urban, suburban, rural) enroll a different proportion of students in vocational programs than the others.
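The reported p-value can be checked against the F statistic. The sketch below assumes the degrees of freedom are 2 (three community types minus one) and 601 (604 valid schools minus three groups); the report does not state them, so they are an assumption. With 2 numerator degrees of freedom the upper tail of the F distribution has a simple closed form, so no statistics library is needed. The function name is illustrative.

```python
def f_upper_tail_two_numerator_df(f_stat, df_within):
    """Upper-tail p-value of an F statistic with 2 numerator degrees of freedom.

    For exactly 2 numerator degrees of freedom the survival function is
    (1 + 2 * F / df_within) ** (-df_within / 2).
    """
    return (1.0 + 2.0 * f_stat / df_within) ** (-df_within / 2.0)

groups = 3                    # rural, suburban, urban
n_valid = 604                 # valid sample size reported above
df_between = groups - 1       # 2
df_within = n_valid - groups  # 601, assuming every valid school entered the ANOVA

p_value = f_upper_tail_two_numerator_df(3.542, df_within)
print(f"F({df_between}, {df_within}) = 3.542, p = {p_value:.3f}")
```

Under these assumed degrees of freedom the p-value agrees with the reported 0.03 after rounding.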

Is there a relationship between administrators reporting that learning is hindered by a lack

of texts/supplies and reporting that absenteeism is a problem at school?

The crosstable of learning hindered by a lack of texts/supplies and absenteeism as a problem in schools

                                        How much is learning hindered by a lack of texts/supplies?
How often is absenteeism
a problem at your school?               A lot      Less than a lot      Total
Happens daily        Count               52.0           494.0           546.0
                     Expected            49.9           496.1           546.0
                     % of column        92.9%           88.7%           89.1%
Less than daily      Count                4.0            63.0            67.0
                     Expected             6.1            60.9            67.0
                     % of column         7.1%           11.3%           10.9%
Total                Count               56.0           557.0           613.0
                     Expected            56.0           557.0           613.0
                     % of column         100%            100%            100%

Table 1: The two-way table of learning hindered by a lack of texts/supplies and absenteeism as a problem in schools

This question examines the relationship between learning being hindered by a lack of texts/supplies and absenteeism being a problem in schools. To display the relationship clearly, I removed irrelevant values and recoded both variables. For the explanatory variable (learning is hindered by a lack of texts/supplies), I kept the value “a lot” as “A lot” and recoded “not at all”, “very little”, and “to some extent” as “Less than a lot”. For the response variable (absenteeism is a problem), I kept the value “happens daily” as “Happens daily” and recoded “happens at least once a week”, “happens at least once a month”, “happens on occasion”, and “never happens” as “Less than daily”. For both variables, all other values were set as missing because they are irrelevant to this question.

The two-way table shows that, among schools where learning is hindered a lot by a lack of supplies, 92.9% (52 of 56) experience absenteeism problems daily. Among schools where learning is hindered less than a lot, 88.7% (494 of 557) experience absenteeism problems daily. The null hypothesis is that the two variables are independent. Using the chi-square test, I obtained χ² = 9.108 with a p-value of 0.694. Because the p-value is larger than 0.05, I do not have enough evidence to reject the null hypothesis. At the 0.05 significance level, I conclude that the survey does not provide enough evidence of a relationship between learning being hindered by a lack of texts/supplies and absenteeism being a problem in U.S. schools.
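The expected counts in Table 1 follow from the rule "row total × column total / grand total", and the chi-square statistic sums (observed − expected)² / expected over the cells. The sketch below applies this to the four observed counts in the collapsed 2×2 table, without a continuity correction. On this collapsed table the statistic comes out near 0.91 on 1 degree of freedom (p of about 0.34), not the reported 9.108; the reported value may have been computed on the original, unrecoded response categories, which the report does not state, so this is an illustration of the method and not a reproduction of the reported result. Both versions lead to the same decision at the 0.05 level.

```python
import math

# Observed counts from Table 1.
# Rows: absenteeism happens daily / less than daily.
# Columns: learning hindered a lot / less than a lot.
observed = [[52, 494],
            [4, 63]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic (no continuity correction).
chi_square = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(observed, expected)
    for obs, exp in zip(obs_row, exp_row)
)

# With 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

print([[round(e, 1) for e in row] for row in expected])
print(f"chi-square = {chi_square:.3f}, p = {p_value:.3f}")
```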



How does the percentage of 10th graders in a college preparatory program differ between

schools that have a process to get parent input on discipline policies and schools that do

not?

Figure 4: The percentage of 10th graders in a college preparatory program, by whether schools have a process for parent input

This question asks whether having a process for parent input is related to the percentage of 10th graders in a college preparatory program. To display the two variables clearly, I kept the values “Yes” (18.7%) and “No” (14.6%) and removed the values “{missing}” (0.3%), “{nonrespondent}” (0.5%), “{partial interview-breakoff}” (4.5%), and “{survey component legitimate}” (61.5%). The number of valid responses is then 651, which is 33.32% of the full sample (1,954).

The graph shows that the distribution for schools that do not involve parents is skewed, while the distribution for schools that involve parents is nearly normal. For schools that do not involve parents, the median (78%) represents the center, indicating that typically about 78% of 10th graders in these schools are in a college preparatory program. The approximately normal distribution for schools that involve parents is centered at about 60%, indicating that typically about 60% of 10th graders in these schools are in a college preparatory program.

Although the distribution for schools that do not involve parents is right skewed, the sample size is large, so a t-test can be used to examine the hypothesis. The null hypothesis is that the mean percentage of 10th graders in a college preparatory program is the same for both groups of schools. A two-sample t-test gives a t statistic of -3.003 and a p-value of 0.003. Because the p-value is smaller than 0.05, I reject the null hypothesis. At the 5% significance level, I conclude that the percentage of 10th graders in a college preparatory program differs between schools that have a process to get parent input and schools that do not.
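The link between the t statistic and its p-value can be sketched as follows. Because the report does not give the degrees of freedom, this sketch assumes they are large (about 650 valid schools) and treats the t statistic as standard normal, which is an approximation and not the exact t-distribution calculation. The function name is illustrative.

```python
import math

def two_sided_p_value_normal_approx(t_stat):
    """Two-sided p-value treating the test statistic as standard normal.

    Reasonable only when the degrees of freedom are large, as assumed here.
    """
    return math.erfc(abs(t_stat) / math.sqrt(2))

p_value = two_sided_p_value_normal_approx(-3.003)
print(f"p = {p_value:.3f}")
```

Under this approximation the p-value rounds to the reported 0.003.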

Describe the relationship between the percentage of 10th graders in a college preparatory

program and the percentage of 10th graders receiving remediation in reading.

Figure 5: The scatter plot of percentage of students in preparatory programs and getting reading

remediation

Figure 6: The normal P-P plot of regression standardized residuals

Figure 7: The scatter plot of regression residuals and predicted values



This question examines the relationship between the percentage of 10th graders in a college preparatory program and the percentage of 10th graders receiving remediation in reading in U.S. schools. To obtain valid data, I removed some values from both variables. For the variable “percentage of 10th graders in preparatory program”, I set “missing” (1.5%), “item not applicable to sample member” (61.5%), “partial interview-breakoff” (4.5%), “out of range” (0.1%), and “nonrespondent” (0.5%) as missing values.

Testing the correlation between the two variables gives a p-value of 0.026 and a correlation coefficient (r) of -0.092. Because the p-value is smaller than 0.05, the correlation is statistically significant at the 0.05 level, but because r is close to 0, the negative linear relationship between the percentage of 10th graders in a college preparatory program and the percentage of 10th graders receiving remediation is very weak. Moreover, when I compared the linear fit line y = 5.77 - 0.02x with the loess fit line (the curve in Figure 5), I found that the curve does not resemble the straight line. This indicates that a linear regression line may not be the best way to describe the relationship between the two variables.

To investigate further, I made the normal P-P plot (Figure 6) and the scatterplot of residuals against predicted values (Figure 7). The P-P plot shows that the residuals are not normally distributed, and the scatterplot shows that the residuals are not evenly distributed around the reference line y = 0. This means that the relationship is not really linear. In sum, because the p-value is below 0.05, linear regression can serve as a simple approximation of the relationship, but other methods that may fit the relationship better should be explored.
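To make the size of this relationship concrete, the fitted line and the correlation can be evaluated directly. The sketch below assumes that x is the percentage of 10th graders in a college preparatory program and y is the percentage receiving reading remediation, which the report implies but does not state; the function name is illustrative.

```python
def predicted_remediation_pct(college_prep_pct, intercept=5.77, slope=-0.02):
    """Predicted % of 10th graders in reading remediation from the reported fit line.

    Assumes x = % in college preparatory program and y = % in remediation.
    """
    return intercept + slope * college_prep_pct

r = -0.092
r_squared = r ** 2  # share of variance in y explained by the linear fit

for x in (0, 50, 100):
    print(f"x = {x:3d}%  ->  predicted y = {predicted_remediation_pct(x):.2f}%")
print(f"r^2 = {r_squared:.4f}")
```

Across the full range of x, from 0% to 100%, the fitted line changes the prediction by only 2 percentage points, and r² is below 0.01, so the linear fit explains less than 1% of the variation in remediation percentages.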



How are good teachers recognized/awarded? (Keep in mind each school might use multiple

ways.)

The percentage of schools that reward good teachers

[Bar chart: percentage of schools answering “Yes” or “No” for each way of recognizing good teachers (given special awards, assigned to better students, given a lighter teaching load, relieved of administrative duties, given priority on requests for materials, receive higher pay, not recognized in these ways); vertical axis from 0% to 100%. Data labels legible in the chart: 91.30%, 55.80%, 50.90%, 49.10%, 44.20%, 8.70%.]

Figure 8: Bar chart of the ways schools reward good teachers

This question concerns the ways in which schools reward good teachers. To display clearly the proportion of schools that use each way of recognizing good teachers, I removed irrelevant values, such as “missing” (0.7%), “nonrespondent” (0.5%), “partial interview-breakoff” (4.5%), and “survey component legitimate” (61.5%). The valid sample size is 643. Comparing the percentages of schools that use each measure, I find that slightly more than half of schools (50.9%) give special awards to good teachers, while each of the other measures is used by only a few schools. Specifically, 8.7% of schools assign good teachers to better students, 2.2% give them lighter teaching loads, 1.6% relieve them of administrative duties, 2.8% give them priority on requests for materials, and 6.2% give them higher pay. The graph also shows that 55.8% of schools reported that good teachers are not recognized in these ways. This suggests that some schools reward good teachers in ways of their own that the survey did not list, and researchers could use interviews or group discussions to study this further.

Therefore, I treat giving special awards as the most common of the listed ways to reward good teachers, since a slight majority of schools do so. The 95% confidence interval for the percentage of schools that give special awards to good teachers is 46.9% to 54.9%. At the 95% confidence level, I conclude that the percentage of U.S. schools that reward good teachers with special awards falls between 46.9% and 54.9%. This shows that about half of high schools in the U.S. give good teachers special awards to encourage them.
