Regression

Decision Model
To be Submitted to
PROF. SREEDHARA R.
Presented by:
Philip Chehalan
1527614, M1
QUESTION: A biotech major specializing in the development of genetically modified seeds wants to know the degree to which the yield of wheat per hectare is influenced by the variety of seeds used for the crop and a few other variables. It deputed an employee at its Bangalore head office to conduct research on the effect of 6 different conditions (independent variables) on the yield per hectare (dependent variable) of a wheat crop. The research was conducted by accumulating data from 15 major states in India.
Dependent Variable
Y = yield per hectare in quintals
Independent Variables
X1 = Rainfall in cms.
X2 = Soil Type [1 = Low Quality; 5 = High Quality of Soil Suitable for Wheat]
X3 = Quantity of Fertilizers [in quintals per sq. km. of land]
X4 = Percentage of land being irrigated by the State Agriculture Department
X5 = Seed Quality [1= Low; 5 = High level of Genetically modified quality seeds]
X6 = Percentage of automation in the cultivation process
Build a regression model for the dependent variable.
Yield  Rainfall  SoilType  Fertilizer  IrrigatedArea  SeedType  Automation
40     23        3         8           78             2         45
30     12        1         3           47             1         26
42     28        3         9           82             2         39
35     15        2         5           51             2         25
44     30        3         4           80             3         35
48     31        4         7           86             3         40
55     37        5         10          90             4         42
34     15        2         5           50             2         25
28     64        1         4           42             1         21
60     44        5         9           92             5         52
35     16        2         7           49             2         25
48     30        1         4           82             2         38
29     65        1         3           41             1         20
46     30        3         8           80             3         36
58     42        4         8           89             4         49
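To make the analysis reproducible outside SPSS, the table can be transcribed into Python. The sketch below is an illustrative transcription (the list names are hypothetical, not part of the original study); it checks that the total sum of squares of Yield equals the 1515.733 reported in the Total row of every ANOVA table in this report.

```python
# Transcription of the data table: one list per variable, states in table order.
yield_q = [40, 30, 42, 35, 44, 48, 55, 34, 28, 60, 35, 48, 29, 46, 58]
rainfall = [23, 12, 28, 15, 30, 31, 37, 15, 64, 44, 16, 30, 65, 30, 42]
soil_type = [3, 1, 3, 2, 3, 4, 5, 2, 1, 5, 2, 1, 1, 3, 4]
fertilizer = [8, 3, 9, 5, 4, 7, 10, 5, 4, 9, 7, 4, 3, 8, 8]
irrigated = [78, 47, 82, 51, 80, 86, 90, 50, 42, 92, 49, 82, 41, 80, 89]
seed_quality = [2, 1, 2, 2, 3, 3, 4, 2, 1, 5, 2, 2, 1, 3, 4]
automation = [45, 26, 39, 25, 35, 40, 42, 25, 21, 52, 25, 38, 20, 36, 49]

columns = [yield_q, rainfall, soil_type, fertilizer, irrigated, seed_quality, automation]
assert all(len(col) == 15 for col in columns), "every variable needs 15 observations"

n = len(yield_q)
mean_yield = sum(yield_q) / n
# Total sum of squares: squared deviations of each state's yield from the mean yield.
sst = sum((y - mean_yield) ** 2 for y in yield_q)
print(f"n = {n}, mean yield = {mean_yield:.3f}, total SS = {sst:.3f}")
```

A total SS that differs from 1515.733 would point to a transcription error in the table.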
Table 1: Correlations
Pearson Correlation, with Sig. (2-tailed) in parentheses; N = 15 for every pair.
Variable | Yield | Rainfall | Soil Type | Fertilizers | Irrigated | Seed Quality | Automation
Yield | 1 | .067 (.813) | .848** (.000) | .701** (.004) | .929** (.000) | .937** (.000) | .916** (.000)
Rainfall | .067 (.813) | 1 | .024 (.932) | -.067 (.811) | -.009 (.976) | .056 (.844) | -.001 (.998)
Soil Type | .848** (.000) | .024 (.932) | 1 | .847** (.000) | .810** (.000) | .919** (.000) | .818** (.000)
Fertilizers | .701** (.004) | -.067 (.811) | .847** (.000) | 1 | .706** (.003) | .713** (.003) | .736** (.002)
Irrigated | .929** (.000) | -.009 (.976) | .810** (.000) | .706** (.003) | 1 | .812** (.000) | .941** (.000)
Seed Quality | .937** (.000) | .056 (.844) | .919** (.000) | .713** (.003) | .812** (.000) | 1 | .820** (.000)
Automation | .916** (.000) | -.001 (.998) | .818** (.000) | .736** (.002) | .941** (.000) | .820** (.000) | 1
**. Correlation is significant at the 0.01 level (2-tailed).
Yield = Yield per Hectare in Quintals; Rainfall = Rainfall (in cms.); Fertilizers = Quantity of Fertilizers (in quintals per sq. km. of land); Irrigated = Percentage of land being irrigated by the State Agriculture Department; Automation = Percentage of Automation in the Cultivation Process.
Find out the degree of correlation at p < 0.05, i.e. at the 95% confidence level, and use it to find which factors determine a greater yield of wheat.
The first column of the table, for the dependent variable Y (Yield per Hectare in Quintals), shows the degree of relationship between the yield per hectare of wheat and each independent variable. The remaining columns show how the independent variables are related to each other.
Pearson Correlation measures the strength of the linear relationship between two variables. The coefficient ranges from -1 to +1, with -1 indicating a perfect negative correlation, +1 a perfect positive correlation, and 0 no linear correlation at all. (A variable correlated with itself always has a coefficient of 1.) From the table, all the independent variables have a positive correlation with the yield per hectare, although for Rainfall it is negligible (.067). Rainfall (in cms.) is also essentially uncorrelated with the other independent variables, with coefficients between -.067 and .056.
The significance level was taken as 0.05, so every variable with a p-value less than 0.05 is treated as a significant factor for the research. Rainfall has a p-value of .813, far greater than 0.05, and is therefore an insignificant factor. All the other independent variables have p-values below 0.05 (in fact below 0.01, as the ** flags show), making them significant factors of study.
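The test behind the Sig. (2-tailed) column can be sketched in plain Python. Assuming the data are transcribed as in the data table, the sketch computes each Pearson r with Yield and the t statistic t = r * sqrt(n - 2) / sqrt(1 - r^2), then compares |t| with the two-tailed 5% critical value for 13 degrees of freedom (2.160, taken from standard t tables) instead of computing an exact p-value as SPSS does.

```python
import math

# Transcribed data (same order of states as the data table).
yield_q = [40, 30, 42, 35, 44, 48, 55, 34, 28, 60, 35, 48, 29, 46, 58]
predictors = {
    "Rainfall": [23, 12, 28, 15, 30, 31, 37, 15, 64, 44, 16, 30, 65, 30, 42],
    "Soil Type": [3, 1, 3, 2, 3, 4, 5, 2, 1, 5, 2, 1, 1, 3, 4],
    "Fertilizers": [8, 3, 9, 5, 4, 7, 10, 5, 4, 9, 7, 4, 3, 8, 8],
    "Irrigated": [78, 47, 82, 51, 80, 86, 90, 50, 42, 92, 49, 82, 41, 80, 89],
    "Seed Quality": [2, 1, 2, 2, 3, 3, 4, 2, 1, 5, 2, 2, 1, 3, 4],
    "Automation": [45, 26, 39, 25, 35, 40, 42, 25, 21, 52, 25, 38, 20, 36, 49],
}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


T_CRIT_DF13 = 2.160  # two-tailed 5% critical t for n - 2 = 13 degrees of freedom

results = {}
for name, x in predictors.items():
    r = pearson_r(x, yield_q)
    t = r * math.sqrt(len(x) - 2) / math.sqrt(1 - r ** 2)
    results[name] = (r, t, abs(t) > T_CRIT_DF13)
    print(f"{name:<13} r = {r:6.3f}  t = {t:6.2f}  significant at 5%: {abs(t) > T_CRIT_DF13}")
```

Only Rainfall should come out as not significant. Equivalently, with 15 observations any |r| above about .514 is significant at the 5% level.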
Regression:
2.1 ENTER METHOD
Table of Variables Entered/Removed
Model | Variables Entered | Variables Removed | Method
1 | Percentage of Automation in the Cultivation Process, Quantity of Fertilizers (in quintals per sq. km. of land), Seed Quality, Percentage of land being irrigated by the State Agriculture Department, Soil Type | . | Enter
This table lists the variables entered into the model. Rainfall, which the correlation analysis found insignificant, is not among them, so the ENTER model has five predictors.
Table of ANOVA
Model | | Sum of Squares | df | Mean Square | F | Sig.
1 | Regression | 1482.693 | 5 | 296.539 | 80.775 | .000
1 | Residual | 33.040 | 9 | 3.671
1 | Total | 1515.733 | 14
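The ANOVA table tells us whether the model as a whole fits the data better than a model with no predictors. Its columns can be interpreted as follows:
Model - SPSS allows you to specify multiple models in a single regression command; here only Model 1 is fitted.
Sum of Squares - the Sum of Squares associated with the Regression (model), the Residual and the Total; the Regression and Residual sums add up to the Total.
df - the degrees of freedom. For the Regression it is the number of coefficients estimated minus 1, i.e. the number of predictors (5); for the Residual it is the number of observations minus the number of estimated coefficients (15 - 6 = 9); for the Total it is N - 1 (14).
Mean Square - the Sum of Squares divided by its respective df.
F and Sig. - the F-statistic is the Mean Square (Regression) divided by the Mean Square (Residual), here 296.539 / 3.671 = 80.775, and Sig. is its p-value.
Since the significance value (.000) is less than our chosen level of .05, the predictors jointly explain a significant share of the variation in yield, which makes this model a good fit for the data.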
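The arithmetic behind the Mean Square and F columns can be checked directly from the table. The sketch below uses only the figures printed above; the 5% critical value of F with (5, 9) degrees of freedom, about 3.48, comes from standard F tables and is an addition not found in the SPSS output.

```python
# Figures from the ENTER-method ANOVA table.
ss_regression, df_regression = 1482.693, 5
ss_residual, df_residual = 33.040, 9

ms_regression = ss_regression / df_regression  # Mean Square (Regression)
ms_residual = ss_residual / df_residual        # Mean Square (Residual)
f_stat = ms_regression / ms_residual

F_CRIT_5_9 = 3.48  # approximate 5% critical value of F(5, 9) from F tables
print(f"MS reg = {ms_regression:.3f}, MS res = {ms_residual:.3f}, F = {f_stat:.3f}")
print("model significant at 5%:", f_stat > F_CRIT_5_9)
```

Because the inputs are already rounded to three decimals, F comes out at about 80.78 rather than exactly 80.775.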
2.1.3 Table of Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .989 | .978 | .966 | 1.916
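This table shows the multiple linear regression model summary and overall fit statistics. R Square (the coefficient of determination) is .978, meaning the five predictors together explain about 97.8% of the variation in yield per hectare; R = .989 is the multiple correlation between the observed and the predicted yield. The Adjusted R Square of .966 corrects R Square for the number of predictors, which makes it the more useful figure when comparing models with different numbers of variables. The Std. Error of the Estimate (1.916 quintals) is the typical size of a residual.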
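All four Model Summary figures follow from the ANOVA sums of squares. A minimal sketch, assuming only the numbers printed in the ANOVA table and N = 15:

```python
import math

# Sums of squares from the ENTER-method ANOVA table.
ss_regression, ss_residual, ss_total = 1482.693, 33.040, 1515.733
n, k = 15, 5  # 15 states, 5 predictors in the ENTER model

r_squared = ss_regression / ss_total
r = math.sqrt(r_squared)
# Adjusted R Square penalises R Square for the number of predictors used.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
# Std. Error of the Estimate: square root of the residual mean square.
std_error = math.sqrt(ss_residual / (n - k - 1))
print(f"R = {r:.3f}, R Square = {r_squared:.3f}, "
      f"Adjusted R Square = {adj_r_squared:.3f}, SE = {std_error:.3f}")
```

This reproduces R = .989, R Square = .978, Adjusted R Square = .966 and a standard error of 1.916.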
Table 2.1.4 Table of Coefficients
Model 1 | Unstandardized B | Std. Error | Standardized Beta | t | Sig.
(Constant) | 10.415 | 2.242 | | 4.644 | .001
Soil Type | -2.983 | 1.294 | -.401 | -2.305 | .047
Quantity of Fertilizers (in quintals per sq. km. of land) | .310 | .443 | .071 | .701 | .501
Percentage of land being irrigated by the State Agriculture Department | .234 | .079 | .442 | 2.964 | .016
Seed Quality | 6.951 | 1.235 | .793 | 5.627 | .000
Percentage of Automation in the Cultivation Process | .126 | .157 | .125 | .802 | .443
This table helps us determine the form of the regression equation and which factors influence the yield of wheat.
Model - the number of the model reported, which in this case is 1.
B - the unstandardized coefficients of the regression equation for predicting the dependent variable from the independent variables. The regression equation can be presented as:
$$\hat{Y} = b_0 + b_1x_1 + b_2x_2 + b_3x_3 + b_4x_4 + b_5x_5$$
where b0 is the constant and x1 to x5 are the five predictors. The Std. Error of the Estimate is not a term of this equation; it measures the typical size of the residuals.
Std. Error - the standard errors associated with each coefficient.
Beta - the coefficients obtained when all the variables in the regression are standardized, which makes their sizes comparable across predictors.
t and Sig. - t is B divided by its Std. Error, and Sig. is the p-value for testing whether the coefficient differs from 0.
The coefficient for Seed Quality (6.951) is significantly different from 0 because its p-value (.000) is smaller than 0.05. The coefficient for Percentage of land being irrigated (0.234) is also significant, with a p-value of .016, and so is the coefficient for Soil Type (-2.983), whose p-value of .047 is just below 0.05. The constant (10.415, p = .001) is kept as the intercept. By contrast, the coefficients for Quantity of Fertilizers (0.310, p = .501) and Percentage of Automation (0.126, p = .443) are not statistically significantly different from 0, so these variables are dropped.
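The Sig. column follows from t = B / Std. Error. As a sketch, assuming the two-tailed 5% critical t for the residual df of 9 (2.262, from standard t tables), the same three predictors can be picked out from the printed B and Std. Error values; the dictionary keys are shortened labels chosen for illustration.

```python
# (B, Std. Error) pairs from the ENTER-method Coefficients table.
coefficients = {
    "(Constant)": (10.415, 2.242),
    "Soil Type": (-2.983, 1.294),
    "Quantity of Fertilizers": (0.310, 0.443),
    "Percentage of Land Irrigated": (0.234, 0.079),
    "Seed Quality": (6.951, 1.235),
    "Percentage of Automation": (0.126, 0.157),
}
T_CRIT_DF9 = 2.262  # two-tailed 5% critical t with 9 residual degrees of freedom

significant = []
for name, (b, se) in coefficients.items():
    t = b / se
    print(f"{name:<30} t = {t:7.3f}")
    # Keep predictors (not the constant) whose |t| exceeds the critical value.
    if name != "(Constant)" and abs(t) > T_CRIT_DF9:
        significant.append(name)

print("Significant predictors:", significant)
```

Soil Type only just clears the threshold (|t| of about 2.31 against 2.262), which mirrors its p-value of .047.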
The variables which make up the regression equation for the yield of wheat per hectare of land are:
Seed Quality
Percentage of land being irrigated by the State Agriculture Department
Soil Type
The regression equation is:
$$\hat{Y} = 10.415 + 6.951\,X_{\text{Seed Quality}} + 0.234\,X_{\text{\% Land Irrigated}} - 2.983\,X_{\text{Soil Type}}$$
FORWARD METHOD
Table 2.2.1 Variables Entered/Removed
Model | Variables Entered | Variables Removed | Method
1 | Seed Quality | . | Forward (Criterion: Probability-of-F-to-enter <= .050)
2 | Percentage of land being irrigated by the State Agriculture Department | . | Forward (Criterion: Probability-of-F-to-enter <= .050)
3 | Soil Type | . | Forward (Criterion: Probability-of-F-to-enter <= .050)
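The entry rule in the Method column can be sketched as a small forward-selection loop. This is an illustrative re-implementation in plain Python, not SPSS's own code: it fits ordinary least squares through the normal equations, computes the partial F of each candidate (the square of the t statistic SPSS reports for it), and enters the best candidate while its F exceeds the squared two-tailed 5% critical t for the new residual df (values from standard t tables). Rainfall is left out of the candidate list, as in the SPSS runs.

```python
# Transcribed data (states in table order); Rainfall omitted as in the SPSS runs.
Y = [40, 30, 42, 35, 44, 48, 55, 34, 28, 60, 35, 48, 29, 46, 58]
X = {
    "Soil Type": [3, 1, 3, 2, 3, 4, 5, 2, 1, 5, 2, 1, 1, 3, 4],
    "Fertilizers": [8, 3, 9, 5, 4, 7, 10, 5, 4, 9, 7, 4, 3, 8, 8],
    "Irrigated": [78, 47, 82, 51, 80, 86, 90, 50, 42, 92, 49, 82, 41, 80, 89],
    "Seed Quality": [2, 1, 2, 2, 3, 3, 4, 2, 1, 5, 2, 2, 1, 3, 4],
    "Automation": [45, 26, 39, 25, 35, 40, 42, 25, 21, 52, 25, 38, 20, 36, 49],
}
# Two-tailed 5% critical t values by residual degrees of freedom (standard t tables).
T_CRIT = {13: 2.160, 12: 2.179, 11: 2.201, 10: 2.228, 9: 2.262}


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (aug[r][m] - sum(aug[r][c] * x[c] for c in range(r + 1, m))) / aug[r][r]
    return x


def fit(names):
    """OLS of Y on an intercept plus the named predictors; returns (coefficients, SSE)."""
    rows = [[1.0] + [X[nm][i] for nm in names] for i in range(len(Y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, Y)) for i in range(p)]
    beta = solve(xtx, xty)
    sse = sum((y - sum(b * v for b, v in zip(beta, r))) ** 2 for r, y in zip(rows, Y))
    return beta, sse


selected = []
_, sse_current = fit(selected)
while len(selected) < len(X):
    df_resid = len(Y) - len(selected) - 2  # residual df after adding one more predictor
    best_name, best_f, best_sse = None, -1.0, None
    for cand in X:
        if cand in selected:
            continue
        _, sse_new = fit(selected + [cand])
        # Partial F for adding one variable (equals the square of its t statistic).
        f_partial = (sse_current - sse_new) / (sse_new / df_resid)
        if f_partial > best_f:
            best_name, best_f, best_sse = cand, f_partial, sse_new
    if best_f < T_CRIT[df_resid] ** 2:
        break  # no remaining candidate meets the entry criterion
    selected.append(best_name)
    sse_current = best_sse
    print(f"Step {len(selected)}: entered {best_name} (partial F = {best_f:.2f})")

beta, _ = fit(selected)
print("Final predictors:", selected)
print("Intercept and slopes:", [round(b, 3) for b in beta])
```

The entry order should follow Table 2.2.1, and the final coefficients should agree with Model 3 of the forward Coefficients table to rounding. SPSS applies the criterion to the p-value of F-to-enter; comparing F with a tabulated critical value is the equivalent test.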
Table 2.2.2 ANOVA
Model | | Sum of Squares | df | Mean Square | F | Sig.
1 | Regression | 1331.027 | 1 | 1331.027 | 93.680 | .000
1 | Residual | 184.706 | 13 | 14.208
1 | Total | 1515.733 | 14
2 | Regression | 1456.843 | 2 | 728.422 | 148.430 | .000
2 | Residual | 58.890 | 12 | 4.908
2 | Total | 1515.733 | 14
3 | Regression | 1477.421 | 3 | 492.474 | 141.397 | .000
3 | Residual | 38.312 | 11 | 3.483
3 | Total | 1515.733 | 14
The third model is the final model, reached after all the candidate variables have been assessed; it contains Seed Quality, Percentage of land irrigated and Soil Type. Its significance of .000 shows that it is a good fit for the data.
Table 2.2.3 Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .937 | .878 | .869 | 3.769
2 | .980 | .961 | .955 | 2.215
3 | .987 | .975 | .968 | 1.866
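From the above table, the adjusted coefficient of determination of the third model is .968, which is slightly greater than that of the ENTER method (.966). Therefore the three-predictor model explains a slightly greater share of the variance in the dependent variable, after adjusting for the number of predictors, even though it uses two fewer variables.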
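Table 2.2.4 Coefficients
Model | | Unstandardized B | Std. Error | Standardized Beta | t | Sig.
1 | (Constant) | 21.875 | 2.308 | | 9.477 | .000
1 | Seed Quality | 8.213 | .849 | .937 | 9.679 | .000
2 | (Constant) | 12.425 | 2.307 | | 5.385 | .000
2 | Seed Quality | 4.698 | .855 | .536 | 5.497 | .000
2 | Percentage of land being irrigated by the State Agriculture Department | .262 | .052 | .494 | 5.063 | .000
3 | (Constant) | 11.316 | 1.997 | | 5.667 | .000
3 | Seed Quality | 6.757 | 1.112 | .771 | 6.078 | .000
3 | Percentage of land being irrigated by the State Agriculture Department | .292 | .045 | .552 | 6.450 | .000
3 | Soil Type | -2.286 | .940 | -.307 | -2.431 | .033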
The third model shows that all of its variables are significant, as they all have p-values less than .050. The regression equation is:
$$\hat{Y} = 11.316 + 6.757\,X_{\text{Seed Quality}} + 0.292\,X_{\text{\% Land Irrigated}} - 2.286\,X_{\text{Soil Type}}$$
Table 2.2.5 Excluded Variables
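Model | | Beta In | t | Sig. | Partial Correlation | Collinearity Statistics: Tolerance
1 | Soil Type | -.080 | -.315 | .758 | -.090 | .156
1 | Quantity of Fertilizers (in quintals per sq. km. of land) | .067 | .471 | .646 | .135 | .492
1 | Percentage of land being irrigated by the State Agriculture Department | .494 | 5.063 | .000 | .825 | .340
1 | Percentage of Automation in the Cultivation Process | .449 | 3.776 | .003 | .737 | .328
2 | Soil Type | -.307 | -2.431 | .033 | -.591 | .144
2 | Quantity of Fertilizers (in quintals per sq. km. of land) | -.067 | -.770 | .457 | -.226 | .444
2 | Percentage of Automation in the Cultivation Process | .109 | .608 | .555 | .180 | .106
3 | Quantity of Fertilizers (in quintals per sq. km. of land) | .088 | .907 | .386 | .276 | .249
3 | Percentage of Automation in the Cultivation Process | .148 | .998 | .342 | .301 | .105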
BACKWARD METHOD
Table 2.3.1 Variables Entered/Removed
Model | Variables Entered | Variables Removed | Method
1 | Percentage of Automation in the Cultivation Process, Quantity of Fertilizers (in quintals per sq. km. of land), Seed Quality, Percentage of land being irrigated by the State Agriculture Department, Soil Type | . | Enter
2 | . | Quantity of Fertilizers (in quintals per sq. km. of land) | Backward (criterion: Probability of F-to-remove >= .100)
3 | . | Percentage of Automation in the Cultivation Process | Backward (criterion: Probability of F-to-remove >= .100)
In BACKWARD selection, all the variables are first entered simultaneously; the least significant variable is then removed, one at a time, for as long as its probability of F-to-remove is at least .100.
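The removal rule can be sketched the same way in plain Python. This is an illustrative re-implementation, not SPSS's code, and it re-declares the small least-squares helpers so that it runs on its own. Starting from all five predictors (Rainfall excluded, as in the SPSS runs), it computes each predictor's partial F for removal and drops the weakest one while that F is below the squared two-tailed 10% critical t, which corresponds to the Probability of F-to-remove >= .100 criterion; the critical values are taken from standard t tables.

```python
# Transcribed data (states in table order); Rainfall omitted as in the SPSS runs.
Y = [40, 30, 42, 35, 44, 48, 55, 34, 28, 60, 35, 48, 29, 46, 58]
X = {
    "Soil Type": [3, 1, 3, 2, 3, 4, 5, 2, 1, 5, 2, 1, 1, 3, 4],
    "Fertilizers": [8, 3, 9, 5, 4, 7, 10, 5, 4, 9, 7, 4, 3, 8, 8],
    "Irrigated": [78, 47, 82, 51, 80, 86, 90, 50, 42, 92, 49, 82, 41, 80, 89],
    "Seed Quality": [2, 1, 2, 2, 3, 3, 4, 2, 1, 5, 2, 2, 1, 3, 4],
    "Automation": [45, 26, 39, 25, 35, 40, 42, 25, 21, 52, 25, 38, 20, 36, 49],
}
# Two-tailed 10% critical t values by residual df (standard t tables); a predictor whose
# partial F is below t**2 has p >= .10 and meets the F-to-remove criterion.
T_REMOVE = {9: 1.833, 10: 1.812, 11: 1.796, 12: 1.782, 13: 1.771}


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (aug[r][m] - sum(aug[r][c] * x[c] for c in range(r + 1, m))) / aug[r][r]
    return x


def fit(names):
    """OLS of Y on an intercept plus the named predictors; returns (coefficients, SSE)."""
    rows = [[1.0] + [X[nm][i] for nm in names] for i in range(len(Y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, Y)) for i in range(p)]
    beta = solve(xtx, xty)
    sse = sum((y - sum(b * v for b, v in zip(beta, r))) ** 2 for r, y in zip(rows, Y))
    return beta, sse


current = list(X)  # Model 1: all five predictors entered together
while current:
    _, sse_full = fit(current)
    df_resid = len(Y) - len(current) - 1
    # Partial F for dropping each predictor: the rise in SSE over the residual mean square.
    f_remove = {}
    for name in current:
        _, sse_reduced = fit([v for v in current if v != name])
        f_remove[name] = (sse_reduced - sse_full) / (sse_full / df_resid)
    weakest = min(f_remove, key=f_remove.get)
    if f_remove[weakest] >= T_REMOVE[df_resid] ** 2:
        break  # every remaining predictor has p < .10
    current.remove(weakest)
    print(f"Removed {weakest} (partial F = {f_remove[weakest]:.3f})")

beta, _ = fit(current)
print("Final predictors:", current)
print("Intercept and slopes:", [round(b, 3) for b in beta])
```

The sketch should remove Quantity of Fertilizers and then Percentage of Automation, leaving the three predictors of Model 3 with the same coefficients as the forward run.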
Table 2.3.2 ANOVA
Model | | Sum of Squares | df | Mean Square | F | Sig.
1 | Regression | 1482.693 | 5 | 296.539 | 80.775 | .000
1 | Residual | 33.040 | 9 | 3.671
1 | Total | 1515.733 | 14
2 | Regression | 1480.891 | 4 | 370.223 | 106.258 | .000
2 | Residual | 34.842 | 10 | 3.484
2 | Total | 1515.733 | 14
3 | Regression | 1477.421 | 3 | 492.474 | 141.397 | .000
3 | Residual | 38.312 | 11 | 3.483
3 | Total | 1515.733 | 14
Model 3, containing Seed Quality, Percentage of land irrigated by the State Agriculture Department and Soil Type, has the largest F-statistic (141.397) of the three models.
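Coefficients
Model | | Unstandardized B | Std. Error | Standardized Beta | t | Sig.
1 | (Constant) | 10.415 | 2.242 | | 4.644 | .001
1 | Soil Type | -2.983 | 1.294 | -.401 | -2.305 | .047
1 | Quantity of Fertilizers (in quintals per sq. km. of land) | .310 | .443 | .071 | .701 | .501
1 | Percentage of land being irrigated by the State Agriculture Department | .234 | .079 | .442 | 2.964 | .016
1 | Seed Quality | 6.951 | 1.235 | .793 | 5.627 | .000
1 | Percentage of Automation in the Cultivation Process | .126 | .157 | .125 | .802 | .443
2 | (Constant) | 11.013 | 2.020 | | 5.452 | .000
2 | Soil Type | -2.383 | .946 | -.320 | -2.520 | .030
2 | Percentage of land being irrigated by the State Agriculture Department | .230 | .077 | .435 | 2.999 | .013
2 | Seed Quality | 6.632 | 1.119 | .757 | 5.928 | .000
2 | Percentage of Automation in the Cultivation Process | .149 | .150 | .148 | .998 | .342
3 | (Constant) | 11.316 | 1.997 | | 5.667 | .000
3 | Soil Type | -2.286 | .940 | -.307 | -2.431 | .033
3 | Percentage of land being irrigated by the State Agriculture Department | .292 | .045 | .552 | 6.450 | .000
3 | Seed Quality | 6.757 | 1.112 | .771 | 6.078 | .000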
The regression model in this scenario is as follows:
$$\hat{Y} = 11.316 + 6.757\,X_{\text{Seed Quality}} + 0.292\,X_{\text{\% Land Irrigated}} - 2.286\,X_{\text{Soil Type}}$$
Excluded Variables
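Model | | Beta In | t | Sig. | Partial Correlation | Collinearity Statistics: Tolerance
2 | Quantity of Fertilizers (in quintals per sq. km. of land) | .071 | .701 | .501 | .227 | .237
3 | Quantity of Fertilizers (in quintals per sq. km. of land) | .088 | .907 | .386 | .276 | .249
3 | Percentage of Automation in the Cultivation Process | .148 | .998 | .342 | .301 | .105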
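The equations from the above three processes are:
ENTER Method (Std. Error of the Estimate = 1.916):
$$\hat{Y} = 10.415 + 6.951\,X_{\text{Seed Quality}} + 0.234\,X_{\text{\% Land Irrigated}} - 2.983\,X_{\text{Soil Type}}$$
FORWARD Method (Std. Error of the Estimate = 1.866):
$$\hat{Y} = 11.316 + 6.757\,X_{\text{Seed Quality}} + 0.292\,X_{\text{\% Land Irrigated}} - 2.286\,X_{\text{Soil Type}}$$
BACKWARD Method (Std. Error of the Estimate = 1.866):
$$\hat{Y} = 11.316 + 6.757\,X_{\text{Seed Quality}} + 0.292\,X_{\text{\% Land Irrigated}} - 2.286\,X_{\text{Soil Type}}$$
The standard error of the estimate measures the typical size of a prediction error; it is not a term to be added to the predicted value.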
Now substitute the values of one state into the equations of each method. The ninth state in the data table has:
Yield  Rainfall  SoilType  Fertilizer  IrrigatedArea  SeedType  Automation
28     64        1         4           42             1         21
From the ENTER method:
$$\hat{Y} = 10.415 + 6.951(1) + 0.234(42) - 2.983(1) = 24.211 \text{ quintals}$$
From the FORWARD and BACKWARD methods of selection:
$$\hat{Y} = 11.316 + 6.757(1) + 0.292(42) - 2.286(1) = 28.051 \text{ quintals}$$
The FORWARD/BACKWARD prediction is much nearer to the actual yield of 28 quintals than the ENTER prediction. Therefore it is suitable for the employee to select the regression model generated by either the BACKWARD or the FORWARD selection process.
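The substitution can be scripted so that other states can be checked the same way. A minimal sketch, with hypothetical function names, using the coefficients exactly as printed in the equations:

```python
def predict_enter(seed_quality, irrigated, soil_type):
    """ENTER-method equation (significant terms only)."""
    return 10.415 + 6.951 * seed_quality + 0.234 * irrigated - 2.983 * soil_type


def predict_forward_backward(seed_quality, irrigated, soil_type):
    """Equation chosen by both the FORWARD and BACKWARD methods."""
    return 11.316 + 6.757 * seed_quality + 0.292 * irrigated - 2.286 * soil_type


# Ninth state in the data table: Yield 28, Soil Type 1, Irrigated 42, Seed Quality 1.
actual = 28
enter = predict_enter(seed_quality=1, irrigated=42, soil_type=1)
fwd_bwd = predict_forward_backward(seed_quality=1, irrigated=42, soil_type=1)
print(f"ENTER: {enter:.3f} (error {enter - actual:+.3f})")
print(f"FORWARD/BACKWARD: {fwd_bwd:.3f} (error {fwd_bwd - actual:+.3f})")
```

A single state is only a spot check; applying both functions to all 15 rows and comparing the residuals gives a fuller comparison.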
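After removing Soil Type, the variable with a negative regression coefficient
Soil Type has a negative coefficient in the regression models above, although its simple correlation with yield (.848) is positive. The models below are refitted without it.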
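ENTER METHOD
Table 4.1.1 ANOVA
Model | | Sum of Squares | df | Mean Square | F | Sig.
1 | Regression | 1463.184 | 4 | 365.796 | 69.610 | .000
1 | Residual | 52.550 | 10 | 5.255
1 | Total | 1515.733 | 14
Table 4.1.2 Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .983 | .965 | .951 | 2.292
Table 4.1.3 Coefficients
Model | | Unstandardized B | Std. Error | Standardized Beta | t | Sig.
1 | (Constant) | 12.578 | 2.437 | | 5.162 | .000
1 | Quantity of Fertilizers (in quintals per sq. km. of land) | -.365 | .397 | -.083 | -.918 | .380
1 | Percentage of land being irrigated by the State Agriculture Department | .215 | .094 | .406 | 2.284 | .045
1 | Seed Quality | 4.785 | .959 | .546 | 4.988 | .001
1 | Percentage of Automation in the Cultivation Process | .149 | .188 | .148 | .796 | .445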
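The regression equation is:
$$\hat{Y} = 12.578 + 4.785\,X_{\text{Seed Quality}} + 0.215\,X_{\text{\% Land Irrigated}}$$
(Std. Error of the Estimate = 2.292)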
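FORWARD METHOD
Table 4.2.1 ANOVA
Model | | Sum of Squares | df | Mean Square | F | Sig.
1 | Regression | 1331.027 | 1 | 1331.027 | 93.680 | .000
1 | Residual | 184.706 | 13 | 14.208
1 | Total | 1515.733 | 14
2 | Regression | 1456.843 | 2 | 728.422 | 148.430 | .000
2 | Residual | 58.890 | 12 | 4.908
2 | Total | 1515.733 | 14
Table 4.2.2 Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .937 | .878 | .869 | 3.769
2 | .980 | .961 | .955 | 2.215
Table 4.2.4 Coefficients
Model | | Unstandardized B | Std. Error | Standardized Beta | t | Sig.
1 | (Constant) | 21.875 | 2.308 | | 9.477 | .000
1 | Seed Quality | 8.213 | .849 | .937 | 9.679 | .000
2 | (Constant) | 12.425 | 2.307 | | 5.385 | .000
2 | Seed Quality | 4.698 | .855 | .536 | 5.497 | .000
2 | Percentage of land being irrigated by the State Agriculture Department | .262 | .052 | .494 | 5.063 | .000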
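Therefore the regression equation is:
$$\hat{Y} = 12.425 + 4.698\,X_{\text{Seed Quality}} + 0.262\,X_{\text{\% Land Irrigated}}$$
(Std. Error of the Estimate = 2.215)
BACKWARD METHOD
The BACKWARD method gives the same equation:
$$\hat{Y} = 12.425 + 4.698\,X_{\text{Seed Quality}} + 0.262\,X_{\text{\% Land Irrigated}}$$
(Std. Error of the Estimate = 2.215)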
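Dropping Soil Type lowers the fit of both the ENTER and the FORWARD/BACKWARD models. A short sketch, assuming only the ANOVA figures printed in this report, recomputes the adjusted R Square of the four final models from their residual sums of squares so they can be compared side by side; the model labels are descriptive names chosen for illustration.

```python
# (Residual SS, number of predictors) for the final models, from the ANOVA tables.
models = {
    "ENTER, with Soil Type": (33.040, 5),
    "FORWARD/BACKWARD, with Soil Type": (38.312, 3),
    "ENTER, without Soil Type": (52.550, 4),
    "FORWARD/BACKWARD, without Soil Type": (58.890, 2),
}
SS_TOTAL, N = 1515.733, 15

adjusted = {}
for name, (ss_residual, k) in models.items():
    # Adjusted R Square = 1 - (residual mean square / total mean square).
    adjusted[name] = 1 - (ss_residual / (N - k - 1)) / (SS_TOTAL / (N - 1))
    print(f"{name:<36} adjusted R Square = {adjusted[name]:.3f}")

best = max(adjusted, key=adjusted.get)
print("Highest adjusted R Square:", best)
```

The three-predictor model that keeps Soil Type has the highest adjusted R Square (.968), so removing Soil Type costs explanatory power even though its coefficient is negative.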
