individuals who work as school administrators or who are concerned about educational quality.
This study aims to help educators and policy makers understand the important issues facing
the nation’s schools. Schools were selected from the population by simple random sampling. A total of 1954
individuals responded to the survey, and we use their data for the analysis in this report.
Figure 1: The number of days in the school year for 10th graders
THE INFERENCE PROJECT 3
This question concerns the number of days in the school year for 10th graders. To answer
it, I analyzed the responses of those who answered this item on the survey. To analyze
“the number of days in school year for 10th graders” effectively, I treated the
values “missing” (0.4%), “item not applicable to sample member” (61.5%), and
“nonrespondent” (0.5%) as missing values. After removing them, 735 valid responses
remain. The median of the distribution is 180 and the mean is 179.39. Because the two
are very close, the distribution appears to be approximately normal, which the graph
supports. Thus, the mean (179) represents the center of this distribution. And the standard deviation
The valid sample size is 735, which gives a standard error of 0.14. The 95%
confidence interval for the sample mean is from 179.12 to 179.66. This indicates that, with 95%
confidence, the mean number of days in the school year for 10th graders in U.S.
schools falls between 179.12 and 179.66. Because the variable counts days
in the school year, I round the confidence interval to the nearest integer. In sum, U.S.
schools have around 179 days in the school year for 10th graders.
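The interval arithmetic can be checked with a short Python sketch. It assumes the interval was built with the normal approximation (mean ± z × standard error); the function name `mean_confidence_interval` is illustrative and not part of the original analysis.

```python
from statistics import NormalDist


def mean_confidence_interval(sample_mean, standard_error, confidence=0.95):
    """Normal-approximation confidence interval for a mean."""
    # Critical z value for a two-sided interval (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


# Mean and standard error as reported in the text.
low, high = mean_confidence_interval(179.39, 0.14)
print(round(low, 2), round(high, 2))
```

With the reported mean and standard error, the two bounds round to 179.12 and 179.66.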
What proportion of schools require students to pass a test in order to receive a high school
diploma?
Figure 2: The percentage of high schools that require tests for a diploma
This question concerns the proportion of schools that require students to pass a test in order
to receive a high school diploma. Respondents who selected “Yes” mean that students in their schools
need to pass tests to get a diploma, while those who selected “No” mean that students do not need to
pass tests. To keep the bar chart concise, I set “missing” (0.2%) and “partial
skip/NA” (61.5%) as missing values and removed them. The valid sample size (N)
is then 652. As the graph shows, 42.94% of schools claim that students in their schools
do not need to pass a test to get a high school diploma, while 57.06% claim that students in their
The valid sample size is 652, which gives a standard error of 0.019. The 95% confidence
interval for the percentage of schools that require students to pass a test to receive a high school
diploma is therefore 57.06% ± 1.96 × 1.9%, or roughly 53% to 61%. Therefore, I am 95% confident that the
proportion of U.S. high schools that require students to pass a test to receive a high school
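As a check on this step, the following Python sketch recomputes the standard error and a 95% interval from the reported sample proportion (57.06%) and valid sample size (652). It assumes the usual normal-approximation (Wald) interval; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(p_hat, n, confidence=0.95):
    """Standard error and Wald confidence interval for a sample proportion."""
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return standard_error, p_hat - margin, p_hat + margin


# Proportion answering "Yes" and valid N as reported in the text.
se, low, high = proportion_confidence_interval(0.5706, 652)
print(f"SE = {se:.3f}, interval = {low:.2%} to {high:.2%}")
```

The standard error it returns rounds to the 0.019 reported above.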
Here, we examine the relationship between the type of community a school is in
and the percentage of 10th graders in vocational programs. To get valid data, I recoded the
school urbanicity variable as type 1 (rural), 2 (suburban), and 3 (urban) and treated
“{Missing}” (69.1%) as missing values. The valid sample size is 604. As the boxplot
shows, all three types of school urbanicity have outliers, so the median is used to represent
the center. For rural schools, the median percentage of 10th graders in vocational programs is
10%, while for urban and suburban schools, the median percentage of 10th graders in
Regarding the hypothesis for these two variables, my null hypothesis is that there is no
relationship between the type of community a school is in and the percentage of 10th graders in vocational
programs. The graph shows some outliers, and the distributions for the three types
of communities are not normal. However, the sample size is 604, which is large enough for us to use
ANOVA to test the hypothesis. The ANOVA test of these two variables gives F = 3.542
and a p-value of 0.03. At the 0.05 significance level, I can reject the null hypothesis and
conclude that there is a relationship between the type of community and the percentage of
10th graders in vocational programs. This indicates that schools from at least one of the three types of
communities differ from the others in the mean percentage of 10th graders in
vocational programs.
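To illustrate what the F statistic measures, here is a minimal one-way ANOVA sketch in plain Python. The three lists are hypothetical values invented for the example, not the survey data, and `one_way_anova_f` is an illustrative name. F is the ratio of the between-group mean square to the within-group mean square; with three community types and 604 valid schools, the degrees of freedom would be 2 and 601.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    group_means = [mean(group) for group in groups]

    # Between-group sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(
        len(group) * (group_mean - grand_mean) ** 2
        for group, group_mean in zip(groups, group_means)
    )
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum(
        (value - group_mean) ** 2
        for group, group_mean in zip(groups, group_means)
        for value in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical percentages of 10th graders in vocational programs (not survey data).
rural = [10, 20, 15, 5, 30]
suburban = [5, 0, 10, 5, 5]
urban = [0, 5, 10, 0, 5]

f_stat, df_between, df_within = one_way_anova_f([rural, suburban, urban])
print(f"F = {f_stat:.3f} with df = ({df_between}, {df_within})")
```

A large F means the group means differ by more than the spread within groups would suggest; the p-value is then read from the F distribution with those degrees of freedom.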
Is there a relationship between administrators reporting that learning is hindered by a lack
Crosstabulation of learning hindered by a lack of texts and absenteeism as a problem in schools
Table 1: The two-way table of learning hindered by a lack of texts and absenteeism as a problem
in schools
In this question, we examine the relationship between learning being hindered by a lack
of texts and absenteeism being a problem in schools. To display the relationship clearly, I
removed some irrelevant values and recoded the two variables. For the explanatory variable
(learning is hindered by a lack of texts/supplies), I kept the value “a lot” as “A lot” and recoded
the values “not at all”, “very little” and “to some extent” as “Less than a lot”. For the response
variable (absenteeism is a problem), I kept the value “happens daily” as “happens daily” and
recoded the values “happens at least once a week”, “happens at least once a month”, “Happens
on occasion” and “Never happens” as “less than daily”. For both variables, all values other than
those mentioned above are set as missing values since they are irrelevant to this question.
The two-way table shows that, among schools where learning is hindered a lot by a lack of
supplies, 92.9% experience the absenteeism problem daily. Among schools where
learning is hindered less than a lot by a lack of supplies, 80.6% experience the
absenteeism problem daily. Using the chi-square test to examine the hypothesis, I obtained χ² =
9.108 and a p-value of 0.694. Because the p-value is larger than 0.05, I do not have enough
evidence to reject the null hypothesis. At the 0.05 significance level, I conclude that the
survey did not provide enough evidence of a relationship between learning
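For readers who want to see how a chi-square test on a two-way table works, here is a minimal Python sketch for a 2×2 table (one degree of freedom, no continuity correction). The counts are hypothetical and are not taken from Table 1; `chi_square_2x2` is an illustrative name.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows = "A lot" / "Less than a lot",
# columns = "happens daily" / "less than daily".
example_table = [[10, 20], [20, 10]]
chi2, p_value = chi_square_2x2(example_table)
print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}")
```

The statistic compares each observed count with the count expected under independence, so it is zero when the row percentages are identical and grows as they diverge.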
schools that have a process to get parent input on discipline policies and schools that do
not?
Figure 4: The percentage of 10th graders in a preparatory program, by whether schools
involve parents
In this question, we examine whether a school involving parents
influences the percentage of 10th graders in a preparatory program. To display the two
variables clearly (parent involvement and student participation in a college preparatory program), I
kept the values “Yes” (18.7%) and “No” (14.6%), and removed the values named “{missing}”
component legitimate}” (61.5%). The valid sample size is then 651, which is 33.32% of the
The graph shows that the distribution for schools that do not involve parents is
skewed, while the distribution for schools that involve parents is nearly normal. For
schools that do not involve parents, the median (78%) represents the center, which
indicates that typically about 78% of 10th graders attend a college preparatory program in
these schools. The approximately normal distribution for schools
that involve parents indicates that typically about 60% of 10th graders attend a college preparatory
Although the distribution for schools that do not involve parents is right skewed, the
sample size is large, so we can use a t-test to examine the hypothesis. The two-sample t-test
gives a t statistic of -3.003 and a p-value of 0.003. The p-value is smaller than 0.05, so I
can reject the null hypothesis. At the 5% significance level, I conclude that the percentage of
10th graders in a college preparatory program differs between schools that involve parents and schools that do not.
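The test above can be sketched in Python as follows. This is an illustration under stated assumptions: it uses the unpooled (Welch) form of the two-sample t statistic and, because the samples are large, approximates the p-value with the standard normal distribution. The report does not say whether the original test pooled the variances, and the two short lists are hypothetical values, not survey data.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(sample_a, sample_b):
    """Two-sample t statistic without assuming equal variances."""
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error


def two_sided_p_normal(t_stat):
    """Large-sample two-sided p-value using the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))


# Hypothetical percentages of 10th graders in a college preparatory program.
parents_involved = [55, 60, 62, 58, 65, 61]
parents_not_involved = [70, 78, 80, 75, 82, 77]
t_stat = welch_t(parents_involved, parents_not_involved)
print(f"t = {t_stat:.3f}, approximate p = {two_sided_p_normal(t_stat):.4f}")
```

Applied to the reported statistic, `two_sided_p_normal(-3.003)` rounds to 0.003, in line with the p-value above.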
Describe the relationship between the percentage of 10th graders in a college preparatory
Figure 5: The scatter plot of the percentage of students in preparatory programs and the percentage receiving reading
remediation
graders in a college preparatory program and the percentage of 10th graders receiving
remediation in reading in U.S. schools. To obtain valid data, I removed some
values from both variables. For the variable “percentage of 10th graders in preparatory program”, I
set “missing” (1.5%), “Item not applicable to sample member” (61.5%), “partial
interview-breakoff” (4.5%), “out of range” (0.1%) and “nonrespondent” (0.5%) as missing values.
Testing the correlation between the two variables gives a p-value of 0.026 and a
correlation coefficient (r) of -0.092. Because the p-value is smaller than
0.05 and r is close to 0, there appears to be a weak negative linear relationship between
the percentage of 10th graders in a college preparatory program and the percentage of 10th
graders receiving remediation. However, when I compared the linear fit line y = 5.77 - 0.02x with the
loess fit line (the curved line in Figure 5), I found that the curve does not resemble
the straight line. This indicates that the linear regression line may not be the best way to describe the
To check this, I made the normal P-P plot (Figure 6) and the scatterplot of residuals against
predicted values (Figure 7). The P-P plot shows that the residuals are not normally distributed,
and the scatterplot shows that the residuals are not distributed evenly around the reference line
y = 0. This means that the relationship is not really linear. In sum, the p-value is below 0.05, so we
can use linear regression as a simple approximation of the relationship, but we need to explore
ways.)
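The strength of this linear relationship can be made concrete with the reported numbers. The sketch below assumes that x in the fit line is the percentage of 10th graders in a college preparatory program and y is the percentage receiving reading remediation (the report does not state which variable is which); `predicted_remediation` is an illustrative name.

```python
def predicted_remediation(prep_pct, intercept=5.77, slope=-0.02):
    """Predicted y from the reported linear fit y = 5.77 - 0.02 * x."""
    return intercept + slope * prep_pct


# Share of variance explained by the linear fit, from the reported r.
r = -0.092
r_squared = r ** 2

for prep_pct in (0, 50, 100):
    print(prep_pct, round(predicted_remediation(prep_pct), 2))
print("r squared:", round(r_squared, 4))
```

An r of -0.092 corresponds to an r² of about 0.0085, so the straight line accounts for less than 1% of the variation, which fits the caution about relying on the linear fit.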
[Figure 8 bar chart. Legend entries recovered: given a lighter teaching load; relieved of administrative duties; given priority on requests for materials. Bar labels recovered: 55.80%, 50.90%, 49.10%, 44.20%.]
Figure 8: The bar chart of the ways schools reward good teachers
This question concerns the ways that schools reward good teachers. To
display clearly the proportion of schools taking different measures to recognize good teachers, I
removed some irrelevant values, such as “missing” (0.7%), “nonrespondent” (0.5%), “partial
interview-breakoff” (4.5%) and “survey component legitimate” (61.5%). The valid sample size
is 643. Comparing the percentages of schools taking different measures,
I find that more than half of schools (50.9%) give special awards
to good teachers, while only a few schools use the other measures to encourage good teachers.
Among them, 8.7% of schools assign good students to good teachers, 2.2% of schools give
lighter teaching loads to good teachers, 1.6% of schools relieve good teachers of administrative
duties, 2.8% of schools give priority on requests for materials and 6.2% of schools give good
teachers higher pay. The graph shows that 55.8% of schools reported that good teachers are not
recognized in these ways. This suggests that some schools reward good teachers in
their own specific ways, and researchers could use interviews or group discussions to
Therefore, I treat giving special awards to good teachers as the best way to
reward good teachers, since a majority of schools do so. The
confidence interval for the percentage of schools that give special awards to good teachers
is from 46.9% to 54.9%. At the 95% confidence level, I can conclude that the percentage
of U.S. schools that reward good teachers by giving them special awards falls in the interval
from 46.9% to 54.9%. This shows that about half of high schools in the U.S. give good teachers special