Group No : 03 Section : G
Group members :
SR. NO NAMES ROLL NO
1 Akash Basa 2017232011
2 Aman Srivastava 2017232015
3 Anindhya Sharma 2017232023
4 Anuankit Panda 2017232028
5 Chitwan Singh 2017232039
6 Tejas Bhosale 2017231103
Measures of central tendency & variability for house price data and
built up area (Slide no. 3 and 4)

Measure               House price       Built up area (sq. feet)
Mean                  132207.12         1628.1
Standard Error        4014.57           31.33
Median                124500            1630
Mode                  124600            1950
Standard Deviation    39944.4           313.33
Sample Variance       1595558516.37     98181.2
Kurtosis              0.56              -0.97
Skewness              0.85              0.06
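The measures in this summary can be reproduced with a short script. The sketch below is only an illustration: the function name `describe` and the sample prices are made up, and the skewness and kurtosis use the common bias-adjusted sample formulas, which is an assumption about how the slide values were produced.

```python
import statistics
from math import sqrt


def describe(values):
    """Return the summary measures listed on the slide for one variable.

    Needs at least 4 observations (the kurtosis formula divides by n - 3).
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)          # sample standard deviation (n - 1)
    z = [(v - mean) / sd for v in values]  # standardised values
    # Bias-adjusted sample skewness
    skew = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
    # Bias-adjusted sample excess kurtosis (0 for a normal distribution)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "mean": mean,
        "standard_error": sd / sqrt(n),    # standard error of the mean
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "standard_deviation": sd,
        "sample_variance": statistics.variance(values),
        "kurtosis": kurt,
        "skewness": skew,
    }


# Hypothetical prices, NOT the data set used in the slides
sample_prices = [95000, 110000, 124600, 124600, 140000, 180000]
for name, value in describe(sample_prices).items():
    print(f"{name}: {value:.2f}")
```

A positive skewness, as reported for house price (0.85), indicates a longer right tail; a value near zero, as for built up area (0.06), indicates a roughly symmetric distribution.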
[Figure: scatter plot; x-axis: House price, y-axis: Built up area (sq. feet)]
FINDING:
Coefficient of Correlation (r) : 0.668
INTERPRETATION:
1) Moderate positive correlation exists between House Prices and
Built up area: the X and Y variables tend to move in the same direction.
FINDING:
Coefficient of determination (r²) : 0.45
INTERPRETATION:
1) The model built from the sample data explains only 45% of the
variation in House prices; the remaining 55% is unexplained, i.e. due to
error or to factors not considered (e.g. no. of offers, no. of bedrooms,
no. of bathrooms etc.)
2) 45% of the variation in House prices is explained by Built up area.
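The link between the two findings (r = 0.668 and r² = 0.45) can be checked numerically. The sketch below is a minimal illustration with made-up helper names (`pearson_r`, `simple_r_squared`); it computes r directly and r² from a fitted least-squares line, and the two agree because, in simple regression, r² is the square of r.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson coefficient of correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simple_r_squared(x, y):
    """r² of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))  # unexplained
    sst = sum((b - my) ** 2 for b in y)                                  # total
    return 1 - sse / sst


# Squaring the reported r reproduces the reported r² after rounding
print(round(0.668 ** 2, 3))  # 0.446, reported on the slide as 0.45
```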
Business Statistics: House price data analysis 7
Scatter plot for No. of offers vs. House Prices
[Figure: scatter plot; x-axis: House price, y-axis: No. of offers]
FINDING:
Coefficient of Correlation (r) : 0.125
INTERPRETATION:
1) Very weak (near-zero) positive correlation exists between House
Prices and No. of Offers, i.e. there is almost no linear relationship
between the X and Y variables.
FINDING:
Coefficient of determination (r²) : 0.0157
INTERPRETATION:
1) The model built from the sample data explains only 1.57% of the
variation; the remaining 98.43% is unexplained, i.e. due to error or to
other factors that are not considered.
2) ONLY 1.57% of the variation in House prices is explained by No. of
Offers.
FINDING:
Coefficient of determination (r²) : 0.494
INTERPRETATION:
1) The model built from the sample data explains only 49.4% of the
variation in House prices; the remaining 50.6% is unexplained, i.e. due
to error or to other factors that are not considered.
2) 49.4% of the variation in House prices is explained jointly by Built
up area, no. of bedrooms, no. of bathrooms, no. of offers received,
brick/non brick house and location of house.
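For a model with several independent variables, the coefficient of determination is computed as 1 - SSE/SST from the fitted values. The sketch below is a minimal pure-Python illustration, not the tool used for the slides: the names `fit_ols` and `r_squared` and the example rows are hypothetical, and the coefficients are found by solving the normal equations with Gauss-Jordan elimination.

```python
def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, ..., bk].

    rows: one tuple of predictor values per observation.
    Assumes the predictors are not perfectly collinear.
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    p = len(X[0])
    # Augmented normal-equation matrix [X'X | X'y]
    A = [[sum(xi[a] * xi[b] for xi in X) for b in range(p)]
         + [sum(xi[a] * yi for xi, yi in zip(X, y))] for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * w for v, w in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def r_squared(rows, y, coefs):
    """Share of the variation in y explained by the fitted model."""
    preds = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))  # unexplained variation
    sst = sum((yi - mean_y) ** 2 for yi in y)              # total variation
    return 1 - sse / sst


# Hypothetical observations: (built up area, no. of bedrooms) -> price
rows = [(1500, 3), (1700, 3), (1600, 2), (2000, 4), (1300, 2), (1900, 3)]
price = [120000, 135000, 118000, 170000, 99000, 150000]
coefs = fit_ols(rows, price)
print(round(r_squared(rows, price, coefs), 3))
```

Categorical predictors such as brick/non brick house or location would first have to be coded as 0/1 indicator columns before being passed in as rows.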
Observations:
The goal is NOT simply to get the highest R-square value. Instead, the
goal is to develop a model that is statistically sound and fits the
existing data well.
Other things being equal, the higher the R-square value, the better the
model, because it explains more of the variance in the dependent
variable.
In case of simple regression: when only built up area (sq. feet) is
considered as the independent variable (i.e. X), the model explains 45%
of the variation in the dependent variable (i.e. Y), price of houses.
In case of multiple regression: when built up area, no. of bedrooms,
no. of bathrooms, no. of offers received, brick/non brick house and
location of house are considered as the independent variables, the
model explains 49.4% of the variation in the dependent variable, i.e.
price of houses, an improvement of 4.4 percentage points.
Conclusion: Limiting the no. of independent variables also limits
the usefulness of the model. Hence, to develop a model that fits the
data better, we have to go for the Multiple Regression Model.
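Because R-square never decreases when another independent variable is added, one common way to compare the two models on a fairer footing is the adjusted R-square, which penalises extra variables. This is an addition to the slides, offered as a sketch: the function name is made up, and the sample size n = 100 is an assumption because the slides do not state it.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-square for a model with k independent variables and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 100  # ASSUMED sample size; not given in the slides

# Simple regression: 1 independent variable, R-square 0.45
print(round(adjusted_r_squared(0.45, n, 1), 3))
# Multiple regression: 6 independent variables, R-square 0.494
print(round(adjusted_r_squared(0.494, n, 6), 3))
```

If the adjusted value of the multiple regression model stays above that of the simple model, the extra variables add explanatory power beyond what their number alone would produce.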