
SMBA-Project Report

Relationship between Literacy and Population Exposed to Primary Schools in Uttar Pradesh
Statistical Modeling for Business Analytics PROJECT REPORT

Submitted By:
Nishu Navneet (12125032)

Sushil Panigrahi (12125048)

Mangesh Dharwad (12125026)


INDEX
Introduction
Data Collection and References
Variables
Tools and Methods
Analysis
Observation and Conclusion
Result
Annexures


Introduction
Uttar Pradesh is the most populous state in India, accounting for 16.4 per cent
of the country's population. It is also the fourth largest state by geographical
area, covering 9.0 per cent of the country's geographical area. It has 83
districts, 901 development blocks and 112,804 inhabited villages. The population
density of the state is 473 persons per square kilometre, against 274 for the
country. The overall literacy rate in Uttar Pradesh stood at 56.27%, with 67% of
males and 43% of females literate.
In India, literacy is defined for people aged seven years and above as the
ability to read and write with understanding in any language. The poor literacy
rate in India is commonly attributed to the following reasons:
1. Absence of adequate school infrastructure
2. Improper facilities
3. Inefficient teaching staff
4. Existence of caste and religion disparity
5. Poverty

The purpose of this report is to find the relationship between the literacy of a
district in Uttar Pradesh and the number of persons exposed to primary schools in
their villages. We have tried to measure its significance while taking many other
factors into account.


Data Collection
The data used in the analysis has been collected from the government census
website. It can be found here:
http://censusindia.gov.in/Tables_Published/Basic_Data_Sheet.aspx

The website provides data for each district. The data was entered into the Excel
sheet manually, and the utmost care was taken while filling in the columns.

The data comes from the 2001 census.

Other references have been taken from here:
http://lawmin.nic.in/ncrwc/finalreport/v2b1-5.htm
http://upgov.nic.in/upecon.aspx


Variables
The collected data has been assigned to the variables listed below.
Primary Variables - Collected in raw form
1. District (String) - Name of the district
2. Literacy (Scale) - Number of literate persons in the district, according to
government standards
3. Population (Scale) - Population of the district
4. Males (Scale) - Male population in the district
5. Hindu (Scale) - Hindu population in the district
6. Muslim (Scale) - Muslim population in the district
7. NoOfHouseholds (Scale) - Number of households in the district
8. TotalVillages (Scale) - Total number of villages in the district
9. PrimarySchoolsAvail (Scale) - Total number of villages with a primary school
10. BusServiceAvail (Scale) - Total number of villages with bus service
Secondary Variables - Variables derived from the primary variables for further
analysis
11. PrimarySchoolExposure (Scale) - Average population exposed to primary
schools
12. BusServiceExposure (Scale) - Average population exposed to bus service
13. PctHindu (Scale) - Percentage of Hindu population in the district
14. PctMuslim (Scale) - Percentage of Muslim population in the district


15. PrimarySchoolExposureSquared (Scale) - Square of average population
exposed to primary schools
16. PrimarySchoolExposureCube (Scale) - Cube of average population exposed to
primary schools
17. NormalLofOfPrimarySchoolExposure (Scale) - Natural logarithm of average
population exposed to primary schools
18. NormalLofOfPrimarySchoolExposureSqaured (Scale) - Square of the natural
logarithm of average population exposed to primary schools
19. NormalLofOfPrimarySchoolExposureCube (Scale) - Cube of the natural
logarithm of average population exposed to primary schools
20. HighPercentageOfMuslimPopulation (Binary) - Value 1 assigned if the
percentage of Muslim population in the district is above the mean percentage of
Muslim population, otherwise 0
21. InteractionBetweenMuslimPopulationAndPopulationExposureToSchool
(Numeric) - Product of the variables HighPercentageOfMuslimPopulation and
PrimarySchoolExposure, to study their interaction effect
22. HouseholdSize (Numeric) - Average number of persons staying in a household
in the district
23. HighPercentHouseHoldSize (Binary) - Value 1 assigned if the household size
of the district is above the mean household size, otherwise 0
24. InteractionBetweenHighHouseholdSizeAndPSExpsoure (Numeric) - Product of
the variables HighPercentHouseHoldSize and PrimarySchoolExposure, to study
their interaction effect
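The secondary variables above are simple transformations, so they can be sketched in a few lines of Python. This is only an illustration of the definitions in the list, not the procedure actually used in SPSS: the helper names (`power_and_log_terms`, `above_mean_indicator`, `interaction`) and the three-district sample values are hypothetical and are not taken from the census data.

```python
import math


def power_and_log_terms(exposure):
    """Square, cube and natural-log transforms of PrimarySchoolExposure."""
    log_x = math.log(exposure)  # natural logarithm; requires exposure > 0
    return {
        "PrimarySchoolExposureSquared": exposure ** 2,
        "PrimarySchoolExposureCube": exposure ** 3,
        "NormalLofOfPrimarySchoolExposure": log_x,
        "NormalLofOfPrimarySchoolExposureSqaured": log_x ** 2,
        "NormalLofOfPrimarySchoolExposureCube": log_x ** 3,
    }


def above_mean_indicator(values):
    """Binary variable: 1 where a value is above the mean of all values, else 0."""
    mean = sum(values) / len(values)
    return [1 if v > mean else 0 for v in values]


def interaction(indicator, exposure):
    """Element-wise product of a binary indicator and a continuous variable."""
    return [d * x for d, x in zip(indicator, exposure)]


# Hypothetical values for three districts (NOT taken from the census data).
pct_muslim = [5.0, 20.0, 35.0]
exposure = [1_000_000.0, 2_000_000.0, 1_500_000.0]

high_muslim = above_mean_indicator(pct_muslim)          # mean is 20.0
muslim_x_exposure = interaction(high_muslim, exposure)  # zero unless indicator is 1
```

The interaction term is zero for every district whose indicator is 0, which is what lets the regression estimate a separate slope for the "high" group.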


Tools and Methods


IBM SPSS Statistics software has been used to do regression and other statistical
analysis.
The analysis includes
1. Scatter Plot
2. Curve Estimation
3. Linear Regression
4. Multiple Linear Regression
5. Polynomial Non-linear Regression
6. Linear-Log Modeling
7. Linear-Log Model with Higher Powers
8. Interaction Between Continuous and Binary Variable


Analysis
The relationship between literacy and the population exposed to primary schools
can be shown with a scatter plot. The scatter plot suggests a linear relationship.

1. Model 1 - Equation 1 in the Result section depicts a linear relationship of
literacy with population exposed to primary schools. The coefficient is
statistically significant at the 5% significance level, and the model has an
Adjusted R² of 71.7%.
2. Model 2 - Equation 2 in the Result section depicts a multiple linear
relationship of literacy with population exposed to primary schools, population
exposed to bus services and percentage of Hindu population. The coefficients
are statistically significant at the 5% significance level for two regressors but
not for the population exposed to bus service (p = 0.701). The Adjusted R² for
the model is 72.9%.
3. Model 3 - Equation 3 in the Result section depicts a non-linear relationship
of literacy with population exposed to primary schools, its square and its cube.
None of the three coefficients is statistically significant at the 5% significance
level (p = 0.105 for the linear term, 0.273 for the square and 0.206 for the
cube). The Adjusted R² for the model is 72.2%.
4. Model 4 - Equation 4 in the Result section depicts a Linear-Log model of
literacy with population exposed to primary schools. The coefficient is
statistically significant at the 5% significance level for the natural log of
population exposed to primary schools. The Adjusted R² for the model is 65.5%.
5. Model 5 - Equation 5 in the Result section depicts a Linear-Log model of
literacy with the natural log of population exposed to primary schools, its
square and its cube. The coefficients are statistically significant at the 5%
significance level for the natural log and for its cube. The square does not
appear among the predictors in the SPSS output (see Annexures), so no
coefficient is reported for it. The Adjusted R² for the model is 71.3%.
6. Model 6 - Equation 6 in the Result section depicts a multiple relationship of
literacy with population exposed to primary schools, high percentage of Muslim
population and their interaction term. The coefficient is statistically
significant at the 5% significance level for population exposed to primary
schools but not for the others. The Adjusted R² for the model is 74.5%.
7. Model 7 - Equation 7 in the Result section depicts a multiple relationship of
literacy with population exposed to primary schools, high household size and
their interaction term. The coefficients are statistically significant at the 5%
significance level for population exposed to primary schools and for high
household size (p = 0.050) but not for their interaction. The Adjusted R² for
the model is 72.7%.
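The Adjusted R² values quoted for the models can be cross-checked from the R Square values in the Annexures with the usual formula, Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1). The sketch below assumes n = 68 observations and takes k as the number of regressors listed for each model. The function name is ours, and small differences in the third decimal are expected because the inputs are already rounded.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


N_OBS = 68  # number of districts in the regressions

# (R Square as reported in the Annexures, number of regressors k)
models = {
    "Model 1": (0.721, 1),
    "Model 2": (0.741, 3),
    "Model 3": (0.734, 3),
    "Model 4": (0.661, 1),
    "Model 6": (0.757, 3),
    "Model 7": (0.739, 3),
}

adjusted = {
    name: adjusted_r_squared(r2, N_OBS, k) for name, (r2, k) in models.items()
}

for name, value in adjusted.items():
    print(f"{name}: adjusted R^2 = {value:.3f}")
```

Because the penalty grows with k, a model with more regressors needs a visibly higher R Square to end up with a higher Adjusted R².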


Observation
1. There is a statistically significant relationship between literacy and
population exposed to primary schools.
2. There is no statistically significant association between literacy and
population exposed to bus service.
3. Percentage of Hindu population is associated with literacy.
4. The relationship between literacy and population exposed to primary schools
is linear in nature.
5. A high percentage of Muslim population is negatively related to literacy, but
the relationship is not statistically significant.
6. High household size is positively related to literacy in Model 7 (B =
365077.504, p = 0.050), and there is no statistically significant interaction
between household size and population exposed to primary schools.

Conclusion
1. From the above analysis we can conclude that the larger the population
exposed to primary schools, the higher the literacy in Uttar Pradesh.
2. From the Linear-Log model (Model 4), a 1% increase in population exposed to
primary schools is associated with an increase of about
[.01 * 998974.215 = 9989.74215] ~ 10000 literates in a district of Uttar Pradesh.
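3. Household size is related to literacy (Model 7). Because the estimated
coefficient of the high-household-size indicator is positive, the regression does
not by itself show that decreasing the household size (number of people per
household) will increase literacy in Uttar Pradesh.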
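The arithmetic behind the second conclusion follows from the Linear-Log form Literacy = β0 + β1 ln(X): raising X by p percent changes the prediction by β1 ln(1 + p/100), which is close to β1 × p/100 for small p. A minimal sketch using the reported coefficient 998974.215 is given below. The function name is ours, and the result is the change in the Literacy variable, which counts literates per district.

```python
import math

BETA_LN = 998974.215  # coefficient on ln(exposure) in the Linear-Log model


def literacy_change(pct_increase, beta=BETA_LN):
    """Exact change in predicted Literacy when exposure rises by pct_increase %."""
    return beta * math.log(1.0 + pct_increase / 100.0)


approx_change = 0.01 * BETA_LN        # rule of thumb used in the conclusion
exact_change = literacy_change(1.0)   # beta * ln(1.01)

print(f"approximate change: {approx_change:.2f}")
print(f"exact change:       {exact_change:.2f}")
```

The two figures differ by about half a percent, so rounding to roughly 10000 literates is reasonable for a 1% change. For larger changes the logarithmic form should be used.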

Result

Regression Model of Literacy in Uttar Pradesh

Dependent Variable: Literacy; 68 Observations
Significance Level = 10%; coefficients with Sig. (p-value) above 0.10 are insignificant
Entries are standardized coefficients with Sig. (p-value) on the line below; the intercept is unstandardized.
X = Population Exposed To Primary Schools; M = High Percentage Of Muslim Population; H = High Percentage Of HouseholdSize

Regressor                                    (1)           (2)           (3)           (4)           (5)           (6)           (7)
Intercept (unstandardized)             96260.478   -599561.359   -207629.731  -13046171.64   70834486.19      73238.91    -80423.001
X                                          0.849         0.877          1.85                                     0.931         0.946
  (Sig.)                                   0.001         0.001         0.105                                     0.001         0.001
X^2                                                                   -2.647
  (Sig.)                                                               0.273
X^3                                                                    1.706
  (Sig.)                                                               0.206
ln(X)                                                                                0.813        -6.402
  (Sig.)                                                                             0.001         0.001
[ln(X)]^2                                                                                             **
[ln(X)]^3                                                                                          7.219
  (Sig.)                                                                                           0.001
Population Exposed To Bus Service                        0.030
  (Sig.)                                                 0.701
Percentage of Hindu Population                           0.151
  (Sig.)                                                 0.033
M                                                                                                               -0.063
  (Sig.)                                                                                                         0.694
M * X                                                                                                           -0.147
  (Sig.)                                                                                                         0.411
H                                                                                                                               0.32
  (Sig.)                                                                                                                        0.05
H * X                                                                                                                         -0.272
  (Sig.)                                                                                                                       0.118
R Square                                   0.721         0.741         0.734         0.661         0.721         0.757         0.739
Adjusted R Square                          0.717         0.729         0.722         0.655         0.713         0.745         0.727

Annexures


Curve Estimation
Model Summary and Parameter Estimates
Dependent Variable: Literacy
(R Square to Sig.: Model Summary; Constant to b3: Parameter Estimates)

Equation     R Square         F  df1  df2  Sig.        Constant          b1         b2         b3
Linear           .721   170.983    1   66  .000       96260.478        .638
Logarithmic      .661   128.472    1   66  .000   -13046171.636  998974.215
Quadratic        .727    86.753    2   65  .000      323034.544        .354   7.442E-8
Cubic            .734    58.944    3   64  .000     -207629.731       1.390  -5.096E-7  9.829E-14

The independent variable is Population Exposed to Primary Schools.
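The F values in the curve-estimation summary are tied to R Square by F = (R²/df1) / ((1 - R²)/df2). The sketch below recomputes them from the rounded R Square values. Here df1 is assumed to be the number of slope parameters in each equation (1, 1, 2, 3) and df2 is taken from the table. The function name is ours, and the results match the reported F values only to within the rounding of R Square.

```python
def f_statistic(r_squared, df1, df2):
    """Overall F statistic of a regression: (R^2 / df1) / ((1 - R^2) / df2)."""
    return (r_squared / df1) / ((1.0 - r_squared) / df2)


# (R Square, df1, df2) for each fitted curve; df1 is an assumption (see lead-in).
curve_fits = {
    "Linear": (0.721, 1, 66),
    "Logarithmic": (0.661, 1, 66),
    "Quadratic": (0.727, 2, 65),
    "Cubic": (0.734, 3, 64),
}

f_values = {name: f_statistic(*args) for name, args in curve_fits.items()}

for name, value in f_values.items():
    print(f"{name}: F = {value:.3f}")
```

The quadratic and cubic fits raise R Square only slightly while using more degrees of freedom, which is why their F values are much lower than that of the linear fit.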

Means
Descriptive Statistics

                                 N   Minimum   Maximum      Mean   Std. Deviation
Percentage Muslim Population    68      2.95     49.14   18.0329         10.63096
Percentage Hindu Population     68     47.05     96.21   81.3617         11.00021
HouseholdSize                   68      5.66      8.36    6.4465           .46777
Valid N (listwise)              68

Regression Results
Linear Regressions
1. Linear Relation b/w Literacy and Primary School Exposure to the Population

$\text{Literacy} = \beta_0 + \beta_1\,\text{PopulationExposedToPrimarySchool}$

Model Summary

Model           R   R Square   Adjusted R Square   Std. Error of the Estimate
1         .849(a)       .721                .717                 305639.51070
a. Predictors: (Constant), Population Exposed to Primary Schools

Coefficients
(B and Std. Error: Unstandardized Coefficients; Beta: Standardized Coefficients; Lower/Upper Bound: 95.0% Confidence Interval for B)

Model                                                     B    Std. Error    Beta        t  Sig.     Lower Bound     Upper Bound
1  (Constant)                                     96260.478     92742.929            1.038  .303      -88906.754      281427.710
   Population Exposed to Primary Schools               .638          .049    .849   13.076  .000            .541            .735
a. Dependent Variable: Literacy
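The columns of the coefficient table are linked: t is B divided by its standard error, and the 95% confidence interval is B plus or minus a critical t value times the standard error. The sketch below checks this for the Model 1 constant using only the reported numbers. Rather than looking up the critical value, it backs it out from the reported upper bound. The function names are ours.

```python
def t_statistic(b, std_error):
    """t value of a coefficient: estimate divided by its standard error."""
    return b / std_error


def implied_critical_value(b, std_error, upper_bound):
    """Critical t value implied by a reported confidence-interval upper bound."""
    return (upper_bound - b) / std_error


# Model 1, (Constant) row as reported in the coefficient table.
B0 = 96260.478
SE0 = 92742.929
LOWER0 = -88906.754
UPPER0 = 281427.710

t0 = t_statistic(B0, SE0)
crit = implied_critical_value(B0, SE0, UPPER0)
lower_check = B0 - crit * SE0  # should reproduce the reported lower bound

print(f"t = {t0:.3f}, implied critical value = {crit:.4f}")
print(f"recomputed lower bound = {lower_check:.3f}")
```

The implied critical value is just under 2, as expected for a two-sided 95% interval with 66 residual degrees of freedom. Because the interval for the constant contains zero, the intercept is not statistically significant.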

2. Multiple Regressions with linear Regressor

$\text{Literacy} = \beta_0 + \beta_1\,\text{PopulationExposedToPrimarySchool} + \beta_2\,\text{PopulationExposedToBusService} + \beta_3\,\text{PercentageHinduPopulation}$

Model Summary

Model           R   R Square   Adjusted R Square   Std. Error of the Estimate
1         .861(a)       .741                .729                 299389.95576
a. Predictors: (Constant), Population Exposed to Primary Schools,
Percentage Hindu Population, Population Exposed to Bus Service

Coefficients
(B and Std. Error: Unstandardized Coefficients; Beta: Standardized Coefficients)

Model                                                     B    Std. Error    Beta        t  Sig.
1  (Constant)                                   -599561.359    332102.285           -1.805  .076
   Population Exposed to Bus Service                   .073          .190    .030     .386  .701
   Percentage Hindu Population                     7875.853      3622.053    .151    2.174  .033
   Population Exposed to Primary Schools               .659          .057    .877   11.526  .000
a. Dependent Variable: Literacy

Non-Linear Regression
1. Polynomial Regression Model

$\text{Literacy} = \beta_0 + \beta_1\,\text{PopulationExposedToPrimarySchool} + \beta_2\,\text{PopulationExposedToPrimarySchool}^2 + \beta_3\,\text{PopulationExposedToPrimarySchool}^3$

Model Summary

Model           R   R Square   Adjusted R Square   Std. Error of the Estimate
1         .857(a)       .734                .722                 303187.19985
a. Predictors: (Constant), Cube of population exposed to primary
school, Population Exposed to Primary Schools, Square of Population
exposed to primary schools

Coefficients
(B and Std. Error: Unstandardized Coefficients; Beta: Standardized Coefficients; Lower/Upper Bound: 95.0% Confidence Interval for B)

Model                                                     B    Std. Error    Beta        t  Sig.     Lower Bound     Upper Bound
1  (Constant)                                   -207629.731    465363.453            -.446  .657    -1137300.100      722040.638
   Population Exposed to Primary Schools              1.390          .846   1.850    1.643  .105           -.300           3.079
   Square of Population exposed to
   primary schools                                -5.096E-7          .000  -2.647   -1.105  .273            .000            .000
   Cube of population exposed to
   primary school                                 9.829E-14          .000   1.706    1.278  .206            .000            .000
a. Dependent Variable: Literacy
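To see how little the higher-order terms change the prediction, the fitted cubic can be evaluated next to the fitted straight line from Model 1. The sketch below uses the reported unstandardized coefficients. The exposure value of 2,000,000 is a hypothetical input chosen only for illustration and is not a figure from the data set. The function names are ours.

```python
# Unstandardized coefficients as reported for Model 1 and for the cubic model.
LINEAR = (96260.478, 0.638)
CUBIC = (-207629.731, 1.390, -5.096e-7, 9.829e-14)


def predict_linear(exposure, coeffs=LINEAR):
    """Predicted Literacy from the straight-line fit."""
    b0, b1 = coeffs
    return b0 + b1 * exposure


def predict_cubic(exposure, coeffs=CUBIC):
    """Predicted Literacy from the cubic fit, evaluated with Horner's scheme."""
    b0, b1, b2, b3 = coeffs
    return b0 + exposure * (b1 + exposure * (b2 + exposure * b3))


HYPOTHETICAL_EXPOSURE = 2_000_000.0  # illustrative input, not from the data

linear_prediction = predict_linear(HYPOTHETICAL_EXPOSURE)
cubic_prediction = predict_cubic(HYPOTHETICAL_EXPOSURE)

print(f"linear: {linear_prediction:,.0f}  cubic: {cubic_prediction:,.0f}")
```

At this input the two fits differ by only a few percent, in line with the small gain in R Square from adding the square and cube terms.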

2. Linear-Log Model

$\text{Literacy} = \beta_0 + \beta_1 \ln(\text{PopulationExposedToPrimarySchool})$

Model Summary

Model           R   R Square   Adjusted R Square   Std. Error of the Estimate
1         .813(a)       .661                .655                 337395.93911
a. Predictors: (Constant), Normal Log of Population Exposed to Primary
School

Coefficients
(B and Std. Error: Unstandardized Coefficients; Beta: Standardized Coefficients; Lower/Upper Bound: 95.0% Confidence Interval for B)

Model                                                     B    Std. Error    Beta        t  Sig.     Lower Bound     Upper Bound
1  (Constant)                                 -13046171.636   1258245.056          -10.369  .000   -15558338.945   -10534004.327
   Normal Log of Population Exposed to
   Primary School                                998974.215     88135.391    .813   11.335  .000      823006.230     1174942.201
a. Dependent Variable: Literacy

3. Linear Log Model with powers

$\text{Literacy} = \beta_0 + \beta_1 \ln(\text{PopulationExposedToPrimarySchool}) + \beta_2 [\ln(\text{PopulationExposedToPrimarySchool})]^2 + \beta_3 [\ln(\text{PopulationExposedToPrimarySchool})]^3$

Model Summary

Model           R   R Square   Adjusted R Square   Std. Error of the Estimate
1         .849(a)       .721                .713                 308180.18029
a. Predictors: (Constant), Cube of Normal Log of Population Exposed to
Primary School, Normal Log of Population Exposed to Primary School

Coefficients
(B and Std. Error: Unstandardized Coefficients; Beta: Standardized Coefficients; Lower/Upper Bound: 95.0% Confidence Interval for B)

Model                                                     B    Std. Error    Beta        t  Sig.     Lower Bound     Upper Bound
1  (Constant)                                  70834486.185   22362519.46            3.168  .002    26173450.826         1.155E8
   Normal Log of Population Exposed to
   Primary School                              -7868268.095   2362248.104  -6.402   -3.331  .001    -12586003.33    -3150532.856
   Cube of Normal Log of Population
   Exposed to Primary School                      14632.725      3895.918   7.219    3.756  .000        6852.039       22413.410
a. Dependent Variable: Literacy

Interaction between Independent Variables
1. Continuous and Binary Variable

$\text{Literacy} = \beta_0 + \beta_1\,\text{PopulationExposedToPrimarySchool} + \beta_2\,\text{HighPercentageOfMuslimPopulation} + \beta_3\,(\text{HighPercentageOfMuslimPopulation} \times \text{PopulationExposedToPrimarySchool})$

Model Summary

Model           R   R Square   Adjusted R Square   Std. Error of the Estimate
1         .870(a)       .757                .745                 290132.86319
a. Predictors: (Constant),
InteractionBetweenMuslimPopulationAndPopulationExposureToSchool,
Population Exposed to Primary Schools, Binary Variable of High
Muslim Population

Coefficients
(B and Std. Error: Unstandardized Coefficients; Beta: Standardized Coefficients; Lower/Upper Bound: 95.0% Confidence Interval for B)

Model                                                     B    Std. Error    Beta        t  Sig.     Lower Bound     Upper Bound
1  (Constant)                                     73238.912    110327.732             .664  .509     -147166.071      293643.894
   Population Exposed to Primary Schools               .700          .062    .931   11.247  .000            .575            .824
   Binary Variable of High Muslim Population     -74442.482    188655.158   -.063    -.395  .694     -451324.484      302439.521
   InteractionBetweenMuslimPopulationAndPopulationExposureToSchool
                                                      -.079          .096   -.147    -.828  .411           -.271            .112
a. Dependent Variable: Literacy
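A model with a binary variable and its interaction with a continuous regressor describes two straight lines: one for districts where the indicator is 0 and one where it is 1. The sketch below derives both lines from the reported unstandardized coefficients of the high-Muslim-population model. The function names and the exposure value used in the consistency check are ours and are illustrative only.

```python
# Unstandardized coefficients reported for the high-Muslim-population model.
B0 = 73238.912    # (Constant)
B1 = 0.700        # Population Exposed to Primary Schools
B2 = -74442.482   # Binary Variable of High Muslim Population
B3 = -0.079       # interaction term


def predict_literacy(exposure, high_muslim):
    """Predicted Literacy; high_muslim is the 0/1 indicator."""
    return B0 + B1 * exposure + B2 * high_muslim + B3 * high_muslim * exposure


def group_line(high_muslim):
    """(intercept, slope) of the regression line for one group of districts."""
    return (B0 + B2 * high_muslim, B1 + B3 * high_muslim)


low_intercept, low_slope = group_line(0)
high_intercept, high_slope = group_line(1)

print(f"indicator = 0: intercept {low_intercept:.3f}, slope {low_slope:.3f}")
print(f"indicator = 1: intercept {high_intercept:.3f}, slope {high_slope:.3f}")
```

Since neither the indicator (Sig. = .694) nor the interaction (Sig. = .411) is statistically significant, the gap between the two lines cannot be distinguished from zero in this sample.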

2. Continuous and Binary Variable

$\text{Literacy} = \beta_0 + \beta_1\,\text{PopulationExposedToPrimarySchool} + \beta_2\,\text{HighPercentageOfHouseholdSize} + \beta_3\,(\text{HighPercentageOfHouseholdSize} \times \text{PopulationExposedToPrimarySchool})$

Model Summary

Model           R   R Square   Adjusted R Square   Std. Error of the Estimate
1         .860(a)       .739                .727                 300233.86148
a. Predictors: (Constant), Interaction Between HighHousehold Size and
Exposure to Primary Schools, Population Exposed to Primary Schools,
Binary Variable of High Household size

Coefficients
(B and Std. Error: Unstandardized Coefficients; Beta: Standardized Coefficients; Lower/Upper Bound: 95.0% Confidence Interval for B)

Model                                                     B    Std. Error    Beta        t  Sig.     Lower Bound     Upper Bound
1  (Constant)                                    -80423.001    126561.439            -.635  .527     -333258.541      172412.539
   Population Exposed to Primary Schools               .711          .066    .946   10.705  .000            .578            .844
   Binary Variable of High Household size        365077.504    182345.455    .320    2.002  .050         800.582      729354.427
   Interaction Between HighHousehold Size
   and Exposure to Primary Schools                    -.152          .096   -.272   -1.586  .118           -.344            .039
a. Dependent Variable: Literacy
