
Decision Model

To be Submitted to
PROF. SREEDHARA R.

Presented by:
Philip Chehalan
1527614, M1

QUESTION: A biotech major specializing in the development of genetically modified seeds wants to know the degree to which the yield of wheat per hectare is influenced by the variety of seeds used for the crop and by a few other variables. It deputed an employee at its Bangalore Head Office to conduct research on the effect of 6 different conditions (independent variables) on the yield per hectare (dependent variable) of a wheat crop. The research was conducted by collecting data from 15 major states in India.
Dependent Variable
Y = yield per hectare in quintals
Independent Variables
X1 = Rainfall in cms.
X2 = Soil Type [1 = Low Quality; 5 = High Quality of Soil Suitable for Wheat]
X3 = Quantity of Fertilizers [in quintals per sq. km. of land]
X4 = Percentage of land being irrigated by the State Agriculture Department
X5 = Seed Quality [1 = Low; 5 = High level of genetically modified quality seeds]
X6 = Percentage of automation in the cultivation process
Build a regression model for the dependent variable.

Yield  Rainfall  SoilType  Fertilizer  IrrigatedArea  SeedType  Automation
40     23        3         8           78             2         45
30     12        1         3           47             1         26
42     28        3         9           82             2         39
35     15        2         5           51             2         25
44     30        3         4           80             3         35
48     31        4         7           86             3         40
55     37        5         10          90             4         42
34     15        2         5           50             2         25
28     64        1         4           42             1         21
60     44        5         9           92             5         52
35     16        2         7           49             2         25
48     30        1         4           82             2         38
29     65        1         3           41             1         20
46     30        3         8           80             3         36
58     42        4         8           89             4         49

Table 1: Correlations (N = 15 for every pair)

Pearson Correlation

              Yield    Rainfall  Soil     Fertilizer  Irrigated  Seed     Automation
Yield         1        .067      .848**   .701**      .929**     .937**   .916**
Rainfall      .067     1         .024     -.067       -.009      .056     -.001
Soil Type     .848**   .024      1        .847**      .810**     .919**   .818**
Fertilizer    .701**   -.067     .847**   1           .706**     .713**   .736**
Irrigated     .929**   -.009     .810**   .706**      1          .812**   .941**
Seed Quality  .937**   .056      .919**   .713**      .812**     1        .820**
Automation    .916**   -.001     .818**   .736**      .941**     .820**   1

Sig. (2-tailed)

              Yield    Rainfall  Soil     Fertilizer  Irrigated  Seed     Automation
Yield                  .813      .000     .004        .000       .000     .000
Rainfall      .813               .932     .811        .976       .844     .998
Soil Type     .000     .932               .000        .000       .000     .000
Fertilizer    .004     .811      .000                 .003       .003     .002
Irrigated     .000     .976      .000     .003                   .000     .000
Seed Quality  .000     .844      .000     .003        .000                .000
Automation    .000     .998      .000     .002        .000       .000

**. Correlation is significant at the 0.01 level (2-tailed).
Yield = Yield per Hectare in Quintals; Rainfall = Rainfall (in cms.); Soil = Soil Type; Fertilizer = Quantity of Fertilizers (in quintals per sq. km. of land); Irrigated = Percentage of land being irrigated by the State Agriculture Department; Seed = Seed Quality; Automation = Percentage of Automation in the Cultivation Process.

Find the degree of correlation at p < 0.05 (i.e., the 95% confidence level) to determine which factors are associated with a greater yield of wheat.
The first column of the table shows the degree of relationship between the dependent variable Y (Yield per Hectare in Quintals) and each independent variable. The remaining columns show how the independent variables are related to each other.
Pearson Correlation - measures the strength of the linear relationship between two variables. The coefficient ranges from -1 to +1, with -1 indicating a perfect negative correlation, +1 a perfect positive correlation, and 0 no linear correlation at all. (A variable correlated with itself always has a correlation coefficient of 1.) The table shows that all the independent variables are positively correlated with the yield per hectare, although the coefficient for Rainfall (.067) is close to zero. Rainfall is also almost uncorrelated with the other independent variables: its coefficients range from -.067 to .056, and three of them are slightly negative.
The significance level was taken as 0.05, so every variable with a p-value below 0.05 is considered a significant factor for the research. Rainfall, with a p-value of .813, is well above 0.05 and is therefore an insignificant factor. All the other independent variables have p-values below 0.05 (in fact below 0.01), which makes them significant factors of the study.
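The significance test summarised in Table 1 can be illustrated with a minimal Python sketch (standard library only, not the SPSS procedure). It computes Pearson's r between yield and two of the predictors from the data table, converts r to the test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2), and compares |t| with 2.160, the two-tailed 5% critical value of the t distribution with 13 degrees of freedom. The variable and function names are illustrative assumptions.

```python
import math

# Columns copied from the data table (one value per state, 15 states).
yield_q = [40, 30, 42, 35, 44, 48, 55, 34, 28, 60, 35, 48, 29, 46, 58]
rainfall = [23, 12, 28, 15, 30, 31, 37, 15, 64, 44, 16, 30, 65, 30, 42]
seed_quality = [2, 1, 2, 2, 3, 3, 4, 2, 1, 5, 2, 2, 1, 3, 4]

T_CRIT_05_DF13 = 2.160  # two-tailed 5% critical value of t with 15 - 2 = 13 df


def pearson_r(x, y):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing a population correlation of zero (n - 2 df)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


for name, x in [("Rainfall", rainfall), ("Seed Quality", seed_quality)]:
    r = pearson_r(x, yield_q)
    t = t_statistic(r, len(x))
    verdict = "significant" if abs(t) > T_CRIT_05_DF13 else "not significant"
    print(f"{name}: r = {r:.3f}, t = {t:.2f} -> {verdict} at the 0.05 level")
```

Run on the table above, it should report r of about .067 for Rainfall (not significant) and about .937 for Seed Quality (significant), in line with Table 1.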

Regression:
2.1 ENTER METHOD
Table of Variables Entered/Removed

Model 1 - Variables Entered: Percentage of Automation in the Cultivation Process, Quantity of Fertilizers (in quintals per sq. km. of land), Seed Quality, Percentage of land being irrigated by the State Agriculture Department, Soil Type; Variables Removed: none; Method: Enter

This table lists the independent variables that are used in the regression. Rainfall, which the correlation analysis found to be insignificant, is not among them.

Table of ANOVA

Model  Source      Sum of Squares  df  Mean Square  F        Sig.
1      Regression  1482.693        5   296.539      80.775   .000
       Residual    33.040          9   3.671
       Total       1515.733        14

The ANOVA table tells us whether the model as a whole is a good fit for the data. Its columns can be interpreted as follows:
Model - SPSS allows you to specify multiple models in a single regression command; this output contains one model.
Sum of Squares - the Sum of Squares associated with the Regression (model), Residual and Total sources of variance. The Regression and Residual sums add up to the Total.
df - the degrees of freedom. For the Regression row this is the number of coefficients estimated minus 1 (6 - 1 = 5); for the Total row it is N - 1 (15 - 1 = 14); the Residual row takes the remainder (14 - 5 = 9).
Mean Square - the Sum of Squares divided by its respective df.
F and Sig. - the F-statistic is the Mean Square (Regression) divided by the Mean Square (Residual), here 296.539 / 3.671 ≈ 80.78. Since the significance value (.000) is less than our chosen level of .05, the predictors jointly explain a significant share of the variation in yield, making this model a good fit for the data.
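The columns of the ANOVA table are linked by simple arithmetic. This sketch assumes n = 15 observations and k = 5 predictors, as in the ENTER model, and rebuilds the degrees of freedom, the Mean Squares and F from the two reported sums of squares; differences in the last decimal place arise because SPSS works with unrounded sums.

```python
# Sums of squares reported in the ENTER-method ANOVA table.
ss_regression = 1482.693
ss_residual = 33.040
n, k = 15, 5  # observations (states) and predictors in the model

df_regression = k        # number of coefficients estimated minus 1
df_residual = n - k - 1  # what is left after estimating k slopes and a constant
df_total = n - 1

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual  # Mean Square (Regression) / Mean Square (Residual)

print(f"df: {df_regression}, {df_residual}, {df_total}")
print(f"Mean Squares: {ms_regression:.3f}, {ms_residual:.3f}")
print(f"F = {f_stat:.3f}")
```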

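2.1.3 Table of Model Summary

Model  R     R Square  Adjusted R Square  Std. Error of the Estimate
1      .989  .978      .966               1.916

The table shows the model summary and overall fit statistics of the multiple linear regression. R Square (the coefficient of determination) is .978, which means that the five independent variables together explain about 97.8% of the variation in yield; an R Square well above 50% shows that the variables chosen for the study account for most of that variation. The multiple correlation coefficient is R = .989, and the Adjusted R Square, which corrects R Square for the number of predictors, is .966.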
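The Model Summary figures can also be derived from the ANOVA table. In the minimal sketch below, again assuming n = 15 and k = 5, R Square is the regression share of the total sum of squares, R is its square root, Adjusted R Square rescales the unexplained share by (n - 1)/(n - k - 1), and the Std. Error of the Estimate is the square root of the residual Mean Square.

```python
import math

ss_regression, ss_residual = 1482.693, 33.040  # from the ANOVA table
n, k = 15, 5

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total                          # share of variance explained
r_multiple = math.sqrt(r_squared)                             # multiple correlation R
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)   # penalises extra predictors
std_error = math.sqrt(ss_residual / (n - k - 1))              # root of residual Mean Square

print(f"R = {r_multiple:.3f}, R Square = {r_squared:.3f}, "
      f"Adjusted R Square = {adj_r_squared:.3f}, Std. Error = {std_error:.3f}")
```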

Table 2.1.4 Coefficients

                                         Unstandardized         Standardized
Model                                    B        Std. Error    Beta      t        Sig.
1      (Constant)                        10.415   2.242                   4.644    .001
       Soil Type                         -2.983   1.294         -.401     -2.305   .047
       Quantity of Fertilizers           .310     .443          .071      .701     .501
       Percentage of Land Irrigated      .234     .079          .442      2.964    .016
       Seed Quality                      6.951    1.235         .793      5.627    .000
       Percentage of Automation          .126     .157          .125      .802     .443

Quantity of Fertilizers is measured in quintals per sq. km. of land; Percentage of Land Irrigated is the percentage of land being irrigated by the State Agriculture Department; Percentage of Automation refers to the cultivation process.

This table helps us determine the form of the regression equation and which factors influence the yield of wheat.

Model - the number of the model reported, which in this case is 1.

B - the values for the regression equation for predicting the dependent variable from the independent variables. The regression equation can be presented as:

$$Y_{\text{predicted}} = b_0 + b_1x_1 + b_2x_2 + b_3x_3 + b_4x_4 + b_5x_5$$

An observed yield differs from this prediction by a random error term with mean zero; the Std. Error of the Estimate (1.916) describes the typical size of that error and is not added to the predicted value.
Std. Error - the standard errors associated with each coefficient.
Beta - the coefficients obtained when all the variables in the regression are standardized.
t and Sig. - the t-statistic (B divided by its Std. Error) and its two-tailed p-value, which test whether each coefficient is different from 0.
The constant (10.415, p = .001) and the coefficient for Seed Quality (6.951, p = .000) are significantly different from 0, because their p-values are smaller than 0.05. The coefficients for Percentage of land irrigated (0.234, p = .016) and Soil Type (-2.983, p = .047) are also significant at the 0.05 level. By contrast, the coefficients for Quantity of Fertilizers (0.310, p = .501) and Percentage of Automation in the cultivation process (0.126, p = .443) are not statistically significantly different from 0, because their p-values are larger than 0.05.
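The significance decisions above follow from t = B / Std. Error. The sketch below recomputes t for each coefficient in Table 2.1.4 and compares |t| with 2.262, the two-tailed 5% critical value of t with 15 - 5 - 1 = 9 degrees of freedom. It uses this standard table value in place of exact p-values, so it is an illustration of the decision rule rather than a reproduction of the SPSS Sig. column.

```python
# (B, Std. Error) pairs copied from the ENTER-method coefficients table.
coefficients = {
    "(Constant)": (10.415, 2.242),
    "Soil Type": (-2.983, 1.294),
    "Quantity of Fertilizers": (0.310, 0.443),
    "Percentage of Land Irrigated": (0.234, 0.079),
    "Seed Quality": (6.951, 1.235),
    "Percentage of Automation": (0.126, 0.157),
}
T_CRIT_05_DF9 = 2.262  # two-tailed 5% critical value of t with 9 df

significant = []
for name, (b, se) in coefficients.items():
    t = b / se  # tests H0: the coefficient equals 0
    is_significant = abs(t) > T_CRIT_05_DF9
    if is_significant:
        significant.append(name)
    label = "significant" if is_significant else "not significant"
    print(f"{name:30s} t = {t:7.3f}  {label}")
```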
The variables which make up the regression equation for the yield of wheat per hectare of land are therefore:

Seed Quality
Percentage of land being irrigated by the State Agriculture Department
Soil Type

The regression equation is:

$$Y_{\text{Predicted\_Yield\_Value}} = 10.415 + 6.951\,X_{\text{Seed\_Quality}} + 0.234\,X_{\text{Percentage\_of\_Land\_Irrigated}} - 2.983\,X_{\text{Soil\_Type}}$$
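As a cross-check on the SPSS output, the ENTER model can be refitted by ordinary least squares. This is an illustrative sketch in plain Python, not the SPSS algorithm: it forms the normal equations X'Xb = X'y for a constant plus the five entered predictors and solves them by Gaussian elimination with partial pivoting. If the data table has been transcribed correctly, the coefficients should match the B column of Table 2.1.4 up to rounding, and the residual sum of squares should be close to 33.040. The helper names solve and ols are hypothetical.

```python
# Columns from the data table (15 states). Rainfall is omitted because it was not entered.
yield_q = [40, 30, 42, 35, 44, 48, 55, 34, 28, 60, 35, 48, 29, 46, 58]
soil = [3, 1, 3, 2, 3, 4, 5, 2, 1, 5, 2, 1, 1, 3, 4]
fertilizer = [8, 3, 9, 5, 4, 7, 10, 5, 4, 9, 7, 4, 3, 8, 8]
irrigated = [78, 47, 82, 51, 80, 86, 90, 50, 42, 92, 49, 82, 41, 80, 89]
seed = [2, 1, 2, 2, 3, 3, 4, 2, 1, 5, 2, 2, 1, 3, 4]
automation = [45, 26, 39, 25, 35, 40, 42, 25, 21, 52, 25, 38, 20, 36, 49]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(y, *columns):
    """Return least-squares coefficients [b0, b1, ...] and the residual sum of squares."""
    rows = [[1.0, *values] for values in zip(*columns)]  # leading 1.0 estimates the constant
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    b = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(b, r)) for r in rows]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return b, sse


b, sse = ols(yield_q, soil, fertilizer, irrigated, seed, automation)
names = ["(Constant)", "Soil Type", "Quantity of Fertilizers",
         "Percentage of Land Irrigated", "Seed Quality", "Percentage of Automation"]
for name, value in zip(names, b):
    print(f"{name:30s} B = {value:8.3f}")
print(f"Residual sum of squares = {sse:.3f}")
```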

FORWARD METHOD
Table 2.2.1 Variables Entered/Removed

Model 1 - Variables Entered: Seed Quality; Variables Removed: none; Method: Forward (Criterion: Probability-of-F-to-enter <= .050)
Model 2 - Variables Entered: Percentage of land being irrigated by the State Agriculture Department; Variables Removed: none; Method: Forward (Criterion: Probability-of-F-to-enter <= .050)
Model 3 - Variables Entered: Soil Type; Variables Removed: none; Method: Forward (Criterion: Probability-of-F-to-enter <= .050)

Table 2.2.2 ANOVA

Model  Source      Sum of Squares  df  Mean Square  F        Sig.
1      Regression  1331.027        1   1331.027     93.680   .000
       Residual    184.706         13  14.208
       Total       1515.733        14
2      Regression  1456.843        2   728.422      148.430  .000
       Residual    58.890          12  4.908
       Total       1515.733        14
3      Regression  1477.421        3   492.474      141.397  .000
       Residual    38.312          11  3.483
       Total       1515.733        14

The third model, containing Seed Quality, Percentage of Land Irrigated and Soil Type, is the final model after all the candidate variables have been assessed. Its significance value of .000 makes it a good fit for the data.

Table 2.2.3 Model Summary

Model  R     R Square  Adjusted R Square  Std. Error of the Estimate
1      .937  .878      .869               3.769
2      .980  .961      .955               2.215
3      .987  .975      .968               1.866

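From the above table, the Adjusted R Square of the third model is .968, which is greater than that of the ENTER method (.966). Therefore the three-variable model explains a slightly greater share of the variance in the dependent variable, after adjusting for the number of predictors, even though it uses fewer variables.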
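The comparison of Adjusted R Square values can be extended to every model reported so far. In this sketch the residual sums of squares and predictor counts are copied from the ANOVA tables (the model labels are descriptive, not SPSS names), and Adjusted R Square = 1 - (SSE/(n - k - 1)) / (SST/(n - 1)) is used to rank the models.

```python
SS_TOTAL, N = 1515.733, 15  # total sum of squares and number of states

# (residual sum of squares, number of predictors k) from the ANOVA tables
models = {
    "ENTER (5 predictors)": (33.040, 5),
    "FORWARD model 1 (Seed Quality)": (184.706, 1),
    "FORWARD model 2 (+ Land Irrigated)": (58.890, 2),
    "FORWARD model 3 (+ Soil Type)": (38.312, 3),
}


def adjusted_r2(sse, k, n=N, sst=SS_TOTAL):
    """1 - (SSE / (n - k - 1)) / (SST / (n - 1)): R Square penalised for k predictors."""
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))


ranking = sorted(models, key=lambda name: adjusted_r2(*models[name]), reverse=True)
for name in ranking:
    print(f"{adjusted_r2(*models[name]):.3f}  {name}")
```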
Table 2.2.4 Coefficients

                                         Unstandardized         Standardized
Model                                    B        Std. Error    Beta      t        Sig.
1      (Constant)                        21.875   2.308                   9.477    .000
       Seed Quality                      8.213    .849          .937      9.679    .000
2      (Constant)                        12.425   2.307                   5.385    .000
       Seed Quality                      4.698    .855          .536      5.497    .000
       Percentage of Land Irrigated      .262     .052          .494      5.063    .000
3      (Constant)                        11.316   1.997                   5.667    .000
       Seed Quality                      6.757    1.112         .771      6.078    .000
       Percentage of Land Irrigated      .292     .045          .552      6.450    .000
       Soil Type                         -2.286   .940          -.307     -2.431   .033

The third model shows that all the variables are significant, as they all have p-values less than .050.
The regression equation is:

$$Y_{\text{Predicted\_Yield\_Value}} = 11.316 + 6.757\,X_{\text{Seed\_Quality}} + 0.292\,X_{\text{Percentage\_of\_Land\_Irrigated}} - 2.286\,X_{\text{Soil\_Type}}$$

Table 2.2.5 Excluded Variables

Model  Variable                      Beta In  t       Sig.  Partial Correlation  Collinearity Tolerance
1      Soil Type                     -.080    -.315   .758  -.090                .156
       Quantity of Fertilizers       .067     .471    .646  .135                 .492
       Percentage of Land Irrigated  .494     5.063   .000  .825                 .340
       Percentage of Automation      .449     3.776   .003  .737                 .328
2      Soil Type                     -.307    -2.431  .033  -.591                .144
       Quantity of Fertilizers       -.067    -.770   .457  -.226                .444
       Percentage of Automation      .109     .608    .555  .180                 .106
3      Quantity of Fertilizers       .088     .907    .386  .276                 .249
       Percentage of Automation      .148     .998    .342  .301                 .105
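The FORWARD procedure summarised in Tables 2.2.1 to 2.2.5 can be imitated with a short, self-contained Python sketch. It approximates what SPSS does under stated assumptions rather than reproducing its implementation: at each step every remaining candidate is added in turn, the one with the largest |t| is chosen (for a single added variable F-to-enter equals t squared, so this is the largest F-to-enter), and it is kept only if its two-tailed p-value is at most .050. The p-value comes from numerically integrating Student's t density with Simpson's rule, and the dictionary keys are shorthand names. On this data it should select Seed Quality, then Percentage of Land Irrigated, then Soil Type, and stop because neither Fertilizer (p = .386) nor Automation (p = .342) qualifies.

```python
import math

Y = [40, 30, 42, 35, 44, 48, 55, 34, 28, 60, 35, 48, 29, 46, 58]  # yield per hectare
# Candidate predictors (Rainfall is left out, as in the SPSS runs); keys are shorthand.
CANDIDATES = {
    "Soil Type": [3, 1, 3, 2, 3, 4, 5, 2, 1, 5, 2, 1, 1, 3, 4],
    "Fertilizer": [8, 3, 9, 5, 4, 7, 10, 5, 4, 9, 7, 4, 3, 8, 8],
    "Irrigated": [78, 47, 82, 51, 80, 86, 90, 50, 42, 92, 49, 82, 41, 80, 89],
    "Seed Quality": [2, 1, 2, 2, 3, 3, 4, 2, 1, 5, 2, 2, 1, 3, 4],
    "Automation": [45, 26, 39, 25, 35, 40, 42, 25, 21, 52, 25, 38, 20, 36, 49],
}


def invert(a):
    """Gauss-Jordan inverse of a small square matrix, with partial pivoting."""
    n = len(a)
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        d = m[col][col]
        m[col] = [v / d for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_last_t(names):
    """Least-squares fit of Y on a constant plus `names`.

    Returns the t statistic of the last-named predictor and the residual df.
    """
    rows = [[1.0] + [CANDIDATES[c][i] for c in names] for i in range(len(Y))]
    p = len(rows[0])
    inv = invert([[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)])
    xty = [sum(r[i] * y for r, y in zip(rows, Y)) for i in range(p)]
    b = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    df = len(Y) - p
    sse = sum((y - sum(bj * xj for bj, xj in zip(b, r))) ** 2 for r, y in zip(rows, Y))
    return b[-1] / math.sqrt(sse / df * inv[-1][-1]), df


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|) for Student's t: 1 minus twice the Simpson-rule area on [0, |t|]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps  # steps must be even for Simpson's rule
    inner = sum((4 if i % 2 else 2) * density(i * h) for i in range(1, steps))
    return 1 - 2 * (density(0) + inner + density(abs(t))) * h / 3


def forward_select(alpha_enter=0.05):
    """Repeatedly add the candidate with the largest |t| while its p-value <= alpha_enter."""
    chosen = []
    while len(chosen) < len(CANDIDATES):
        fits = {c: fit_last_t(chosen + [c]) for c in CANDIDATES if c not in chosen}
        # Every candidate leaves the same residual df, so largest |t| means smallest p.
        best = max(fits, key=lambda c: abs(fits[c][0]))
        if two_tailed_p(*fits[best]) > alpha_enter:
            break
        chosen.append(best)
    return chosen


print(forward_select())
```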

BACKWARD METHOD

Table 2.3.1 Variables Entered/Removed

Model 1 - Variables Entered: Percentage of Automation in the Cultivation Process, Quantity of Fertilizers (in quintals per sq. km. of land), Seed Quality, Percentage of land being irrigated by the State Agriculture Department, Soil Type; Variables Removed: none; Method: Enter
Model 2 - Variables Entered: none; Variables Removed: Quantity of Fertilizers (in quintals per sq. km. of land); Method: Backward (criterion: Probability of F-to-remove >= .100)
Model 3 - Variables Entered: none; Variables Removed: Percentage of Automation in the Cultivation Process; Method: Backward (criterion: Probability of F-to-remove >= .100)

In BACKWARD selection, all the variables are first entered simultaneously; the least significant variable is then removed, one at a time, for as long as its probability of F-to-remove is at least .100.
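The BACKWARD procedure can be imitated with a self-contained Python sketch. It is an illustration under stated assumptions, not the SPSS implementation: it starts from all five predictors, finds the slope with the smallest |t| (the largest p-value, since all slopes share the same residual df), and removes it while its two-tailed p-value is at least .100, mirroring the Probability of F-to-remove criterion. The p-value is obtained by integrating Student's t density with Simpson's rule, and the dictionary keys are shorthand names. On this data it should drop Quantity of Fertilizers and then Percentage of Automation, leaving Soil Type, Percentage of Land Irrigated and Seed Quality.

```python
import math

Y = [40, 30, 42, 35, 44, 48, 55, 34, 28, 60, 35, 48, 29, 46, 58]  # yield per hectare
# The five predictors entered in Model 1 (keys are shorthand names).
PREDICTORS = {
    "Soil Type": [3, 1, 3, 2, 3, 4, 5, 2, 1, 5, 2, 1, 1, 3, 4],
    "Fertilizer": [8, 3, 9, 5, 4, 7, 10, 5, 4, 9, 7, 4, 3, 8, 8],
    "Irrigated": [78, 47, 82, 51, 80, 86, 90, 50, 42, 92, 49, 82, 41, 80, 89],
    "Seed Quality": [2, 1, 2, 2, 3, 3, 4, 2, 1, 5, 2, 2, 1, 3, 4],
    "Automation": [45, 26, 39, 25, 35, 40, 42, 25, 21, 52, 25, 38, 20, 36, 49],
}


def invert(a):
    """Gauss-Jordan inverse of a small square matrix, with partial pivoting."""
    n = len(a)
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        d = m[col][col]
        m[col] = [v / d for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def slope_t_values(names):
    """Least-squares fit of Y on a constant plus `names`.

    Returns a {name: t statistic of its slope} dict and the residual df.
    """
    rows = [[1.0] + [PREDICTORS[c][i] for c in names] for i in range(len(Y))]
    p = len(rows[0])
    inv = invert([[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)])
    xty = [sum(r[i] * y for r, y in zip(rows, Y)) for i in range(p)]
    b = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    df = len(Y) - p
    mse = sum((y - sum(bj * xj for bj, xj in zip(b, r))) ** 2 for r, y in zip(rows, Y)) / df
    ts = {c: b[k + 1] / math.sqrt(mse * inv[k + 1][k + 1]) for k, c in enumerate(names)}
    return ts, df


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|) for Student's t: 1 minus twice the Simpson-rule area on [0, |t|]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps  # steps must be even for Simpson's rule
    inner = sum((4 if i % 2 else 2) * density(i * h) for i in range(1, steps))
    return 1 - 2 * (density(0) + inner + density(abs(t))) * h / 3


def backward_eliminate(alpha_remove=0.10):
    """Drop the slope with the smallest |t| while its p-value is >= alpha_remove."""
    kept = list(PREDICTORS)
    while kept:
        ts, df = slope_t_values(kept)
        weakest = min(kept, key=lambda c: abs(ts[c]))  # smallest |t| = largest p-value
        if two_tailed_p(ts[weakest], df) < alpha_remove:
            break
        kept.remove(weakest)
    return kept


print(backward_eliminate())
```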

Table 2.3.2 ANOVA

Model  Source      Sum of Squares  df  Mean Square  F        Sig.
1      Regression  1482.693        5   296.539      80.775   .000
       Residual    33.040          9   3.671
       Total       1515.733        14
2      Regression  1480.891        4   370.223      106.258  .000
       Residual    34.842          10  3.484
       Total       1515.733        14
3      Regression  1477.421        3   492.474      141.397  .000
       Residual    38.312          11  3.483
       Total       1515.733        14

The model containing Seed Quality, Percentage of land irrigated by the State Agriculture Department and Soil Type (Model 3) has the highest F-statistic (141.397).
Coefficients

                                         Unstandardized         Standardized
Model                                    B        Std. Error    Beta      t        Sig.
1      (Constant)                        10.415   2.242                   4.644    .001
       Soil Type                         -2.983   1.294         -.401     -2.305   .047
       Quantity of Fertilizers           .310     .443          .071      .701     .501
       Percentage of Land Irrigated      .234     .079          .442      2.964    .016
       Seed Quality                      6.951    1.235         .793      5.627    .000
       Percentage of Automation          .126     .157          .125      .802     .443
2      (Constant)                        11.013   2.020                   5.452    .000
       Soil Type                         -2.383   .946          -.320     -2.520   .030
       Percentage of Land Irrigated      .230     .077          .435      2.999    .013
       Seed Quality                      6.632    1.119         .757      5.928    .000
       Percentage of Automation          .149     .150          .148      .998     .342
3      (Constant)                        11.316   1.997                   5.667    .000
       Soil Type                         -2.286   .940          -.307     -2.431   .033
       Percentage of Land Irrigated      .292     .045          .552      6.450    .000
       Seed Quality                      6.757    1.112         .771      6.078    .000

The regression model in this scenario is as follows:

$$Y_{\text{Predicted\_Yield\_Value}} = 11.316 + 6.757\,X_{\text{Seed\_Quality}} + 0.292\,X_{\text{Percentage\_of\_Land\_Irrigated}} - 2.286\,X_{\text{Soil\_Type}}$$

Excluded Variables

Model  Variable                      Beta In  t       Sig.  Partial Correlation  Collinearity Tolerance
2      Quantity of Fertilizers       .071     .701    .501  .227                 .237
3      Quantity of Fertilizers       .088     .907    .386  .276                 .249
       Percentage of Automation      .148     .998    .342  .301                 .105

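The equations from the above three processes are:

ENTER Method:
$$Y_{\text{Predicted\_Yield\_Value}} = 10.415 + 6.951\,X_{\text{Seed\_Quality}} + 0.234\,X_{\text{Percentage\_of\_Land\_Irrigated}} - 2.983\,X_{\text{Soil\_Type}}$$
FORWARD Method:
$$Y_{\text{Predicted\_Yield\_Value}} = 11.316 + 6.757\,X_{\text{Seed\_Quality}} + 0.292\,X_{\text{Percentage\_of\_Land\_Irrigated}} - 2.286\,X_{\text{Soil\_Type}}$$
BACKWARD Method:
$$Y_{\text{Predicted\_Yield\_Value}} = 11.316 + 6.757\,X_{\text{Seed\_Quality}} + 0.292\,X_{\text{Percentage\_of\_Land\_Irrigated}} - 2.286\,X_{\text{Soil\_Type}}$$

The Std. Error of the Estimate of each model (1.916 for ENTER, 1.866 for FORWARD and BACKWARD) measures the typical size of the residuals. It is not a term of the equation, so it is not added to the predicted value.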

Now substitute the values into the equations of each method, using the state in the data table with the following values:

Yield  Rainfall  SoilType  Fertilizer  IrrigatedArea  SeedType  Automation
28     64        1         4           42             1         21

From ENTER Method:
$$Y_{\text{Predicted\_Yield\_Value}} = 10.415 + 6.951(1) + 0.234(42) - 2.983(1) = 24.211 \text{ quintals}$$
FORWARD and BACKWARD Method of selection:
$$Y_{\text{Predicted\_Yield\_Value}} = 11.316 + 6.757(1) + 0.292(42) - 2.286(1) = 28.051 \text{ quintals}$$
This value is nearer to the actual yield of 28 quintals.
Therefore it is suitable for the employee to select the regression model generated by either the BACKWARD or the FORWARD selection process.
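The substitution above can be reproduced with a short Python sketch. It evaluates the fitted part of the two equations (intercept plus slope terms, with the coefficients copied from this report) for the state whose observed yield is 28 quintals, and reports how far each prediction is from that observed value. The function names enter_model and stepwise_model are hypothetical labels for the ENTER equation and the FORWARD/BACKWARD equation.

```python
# Observed values for the state used in the substitution (from the data table).
actual_yield = 28
seed_quality, land_irrigated, soil_type = 1, 42, 1


def enter_model(seed, irrigated, soil):
    """ENTER-method equation, keeping only its significant terms."""
    return 10.415 + 6.951 * seed + 0.234 * irrigated - 2.983 * soil


def stepwise_model(seed, irrigated, soil):
    """Equation selected by both the FORWARD and the BACKWARD methods."""
    return 11.316 + 6.757 * seed + 0.292 * irrigated - 2.286 * soil


for name, model in [("ENTER", enter_model), ("FORWARD/BACKWARD", stepwise_model)]:
    predicted = model(seed_quality, land_irrigated, soil_type)
    print(f"{name:17s} predicted = {predicted:.3f} quintals, "
          f"error = {predicted - actual_yield:+.3f}")
```

The equation with the smaller absolute error is the one that should be preferred on this check; a single state is, of course, only a rough test of predictive accuracy.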

After removing Soil Type (the variable with a negative regression coefficient)


ENTER METHOD

Table 4.1.1 ANOVA

Model  Source      Sum of Squares  df  Mean Square  F        Sig.
1      Regression  1463.184        4   365.796      69.610   .000
       Residual    52.550          10  5.255
       Total       1515.733        14

Table 4.1.2 Model Summary

Model  R     R Square  Adjusted R Square  Std. Error of the Estimate
1      .983  .965      .951               2.292

Table 4.1.3 Coefficients

                                         Unstandardized         Standardized
Model                                    B        Std. Error    Beta      t        Sig.
1      (Constant)                        12.578   2.437                   5.162    .000
       Quantity of Fertilizers           -.365    .397          -.083     -.918    .380
       Percentage of Land Irrigated      .215     .094          .406      2.284    .045
       Seed Quality                      4.785    .959          .546      4.988    .001
       Percentage of Automation          .149     .188          .148      .796     .445

The regression equation is:

$$Y_{\text{Predicted\_Yield\_Value}} = 12.578 + 4.785\,X_{\text{Seed\_Quality}} + 0.215\,X_{\text{Percentage\_of\_Land\_Irrigated}}$$

FORWARD METHOD

Table 4.2.1 ANOVA

Model  Source      Sum of Squares  df  Mean Square  F        Sig.
1      Regression  1331.027        1   1331.027     93.680   .000
       Residual    184.706         13  14.208
       Total       1515.733        14
2      Regression  1456.843        2   728.422      148.430  .000
       Residual    58.890          12  4.908
       Total       1515.733        14

Table 4.2.2 Model Summary

Model  R     R Square  Adjusted R Square  Std. Error of the Estimate
1      .937  .878      .869               3.769
2      .980  .961      .955               2.215

Table 4.2.4 Coefficients

                                         Unstandardized         Standardized
Model                                    B        Std. Error    Beta      t        Sig.
1      (Constant)                        21.875   2.308                   9.477    .000
       Seed Quality                      8.213    .849          .937      9.679    .000
2      (Constant)                        12.425   2.307                   5.385    .000
       Seed Quality                      4.698    .855          .536      5.497    .000
       Percentage of Land Irrigated      .262     .052          .494      5.063    .000

Therefore the regression equation is:

$$Y_{\text{Predicted\_Yield\_Value}} = 12.425 + 4.698\,X_{\text{Seed\_Quality}} + 0.262\,X_{\text{Percentage\_of\_Land\_Irrigated}}$$

BACKWARD METHOD

$$Y_{\text{Predicted\_Yield\_Value}} = 12.425 + 4.698\,X_{\text{Seed\_Quality}} + 0.262\,X_{\text{Percentage\_of\_Land\_Irrigated}}$$
