Simple Linear Ekek

Schnell, Jan Andrew, S.
CE60P / B2
Estillore, Ava Portia, S.

Simple Linear
It is assumed that achievement test scores should be correlated with student's classroom
performance. One would expect that students who consistently perform well in the classroom
(tests, quizes, etc.) would also perform well on a standardized achievement test (0 - 100 with
100 indicating high achievement). A teacher decides to examine this hypothesis. At the end of
the academic year, she computes a correlation between the students achievement test scores
(she purposefully did not look at this data until after she submitted students grades) and the
overall g.p.a. for each student computed over the entire year. The data for her class are
provided below.
Achievement
98
96
94
88
91
77
86
71
59
63
84
79
75
72
86
85
71
93
90
62
G.P.A.
3.6
2.7
3.1
4.0
3.2
3.0
3.8
2.6
3.0
2.2
1.7
3.1
2.6
2.9
2.4
3.4
2.8
3.7
3.2
1.6
1. Compute the correlation coefficient.

2. What does this statistic mean concerning the relationship between achievement test
prformance and g.p.a.?
3. What percent of the variability is accounted for by the relationship between the two
variables and what does this statistic mean?
4. What would be the slope and y-intercept for a regression line based on this data?
Polynomial
Here we use an example from the physical sciences to emphasise the point that polynomial
regression is mostly applicable to studies where environments are highly controlled and
observations are made to a specified level of tolerance. The data below are the electricity
consumptions in kilowatt-hours per month from ten houses and the areas in square feet of
those houses:
Home
Size
KW
Hrs/Mnth
1290
1182
1350
1172
1470
1264
1600
1493
1710
1571
1840
1711
1980
1804
2230
1840
2400
1956
2930
1954
Polynomial regression
Intercept
b0= -1216.143887 t = -5.008698
P = .0016
Home Size
b1= 2.39893
t = 9.75827
P < .0001
Home Size^2
b2= -0.00045
t = -7.617907
P = .0001
KW Hrs/Mnth = -1216.143887 +2.39893 Home Size -0.00045 Home Size^2
Analysis of variance from regression
Source of variation
Sum Squares
DF
Regression
831069.546371
415534.773185
Residual
15332.553629
2190.364804
Total (corrected)
846402.1
Root MSE = 46.801333

F = 189.710304 P < .0001
Multiple correlation coefficient (R) = 0.990901
Mean Square
R = 98.188502%
Ra = 97.670932%
Durbin-Watson test statistic = 1.63341

Polynomial regression - area under curve
AUC (polynomial function) = 2855413.374801
AUC (by trapezoidal rule) = 283819
Multiple Linear
Nonlinear
Upon using a spreadsheet to create a graph of raw data it is frequently desirable to generate the
best fit equation for that data. Most spreadsheets have the capability of performing a polynomial
regression analysis of any order. By means of an example, the following steps can be followed
to obtain the equation of best fit for a set of data. In an attempt to keep this discussion general,
a discipline specific example will not be presented. Also, nonlinear data will be used, although
linear data can also be analyzed.
1. Gather the data. For the example, the following data will be used.
Y Data
0
0.2
0.6
1.8
5.4
16.2
X Data
0
1
2
3
4
5
2. Enter the data into a spreadsheet in two columns with the Y data to the left of the X data.
Since the data is obviously nonlinear, create a third column that is the X data raised to the
second power (X2) as shown below.
Y Data
0
0.2
0.6
1.8
5.4
16.2
X Data
0
1
2
3
4
5
X2
0
1
4
9
16
25
3. Select the regression analysis option within the spreadsheet you are using. Highlight the
column of Y data as the dependent variable and the X and X2 data as the independent
variables. Selecting only the X data as the independent variable will yield a linear equation.
4. The menu should have an OUTPUT selection option. Select OUTPUT and then place the
cursor somewhere off of your raw data. It is necessary to place the output off your raw data so
that your data is not erased when the analysis is performed.
5. Select GO from the regression analysis menu. Your output should look like the following.
Regression Output:
Constant
0.964286
Std Err of Y Est
1.77477
R Squared
No. of Observations
6
Degrees of Freedom
3
X Coefficient(s)
Std Err of Coef.
0.952187
3.08071 1.175
1.513022
0.290465
6. The equation correlating the Y and X data is as follows,

Y = 0.964286 - 3.08071*X + 1.175*X2
(1)
The R Squared (correlation coeffient) value is used to determine if the equation is of adequate
accuracy. The closer R Squared is to 1.0, the better the equation fits the data. For this data, a
value of 0.952187 may or may not be of adequate accuracy depending on what the user's
requirements are. Let's assume a better fit is needed. The analysis will be rerun using X through
X4 data.
7. Expand the X data to include X3 and X4 values, as shown below.
Y Data
0
0.2
0.6
1.8
5.4
16.2
X Data
0
1
2
3
4
5
X2
0
1
4
9
16
25
X3
0
1
8
27
64
125
X4
0
1
16
81
256
625
8. In the regression analysis option, highlight the X, X2, X3 and X4 data as the independent data.
Rerun the regression analysis and the output will look like the following.
Regression Output:
Constant
0.00873
Std Err of Y Est
0.138587
R Squared
0.999903
No. of Observations
6
Degrees of Freedom
1
X Coefficient(s)
Std Err of Coef.
0.53201
0.510028
1.098611
0.487215
0.50648 0.0875
0.153745
0.015278
9. The new equation correlating the data is as follows,

Y = 0.00873 - 0.53201*X + 1.098611*X2 - 0.50648*X3 + 0.0875*X4.
(2)

Simple Linear Ekek

Diunggah oleh

Informasi Dokumen

Hak Cipta

Format Tersedia

Bagikan dokumen Ini

Bagikan atau Tanam Dokumen

Opsi Berbagi

Apakah menurut Anda dokumen ini bermanfaat?

Apakah konten ini tidak pantas?

Hak Cipta:

Format Tersedia

Simple Linear Ekek

Diunggah oleh

Hak Cipta:

Format Tersedia

Schnell, Jan Andrew, S.

Estillore, Ava Portia, S.

1. Compute the correlation coefficient.

b0= -1216.143887 t = -5.008698

KW Hrs/Mnth = -1216.143887 +2.39893 Home Size -0.00045 Home Size^2

Analysis of variance from regression

Root MSE = 46.801333

Multiple correlation coefficient (R) = 0.990901

Durbin-Watson test statistic = 1.63341

6. The equation correlating the Y and X data is as follows,

9. The new equation correlating the data is as follows,

Anda mungkin juga menyukai