
CLUSTERING:

The dataset which I am using now is metals. It contains the combinations of different metals. It consists of 543 observations on three variables, namely the metal, the concentration (conc) and the luminescence induction factor (BIF).
Hierarchical Clustering:
It is the clustering type in which the data is divided into clusters arranged as a hierarchy, in such a manner that intra-cluster similarity is high and inter-cluster differences are high. The following linkage methods are used:
i. Complete & Euclidean method
ii. Average Method
iii. Ward.D method

Code:
install.packages("cluster")
library(cluster)
metals

Here, we first have to install the cluster package using the first step in the code, install.packages("cluster"). Then we have to enable it using the library() step.
Next we need to import the data set which we downloaded earlier. For this we can go directly to the Environment pane and select Import > Browse > metals > Import, and the data set will be imported into R.
For viewing the data set in R we simply need to type the name of the data set and it will be displayed.
I. Method=complete
c1=hclust(dist(metals[,3:4]),method='complete')  # hierarchical clustering, complete linkage on Euclidean distances
plot(c1,cex=0.3)  # dendrogram
cut1=cutree(c1,3)  # cut the tree into 3 clusters
# confusion/error matrix of metal vs. cluster
table(metals$metal,cut1)
rect.hclust(c1,k=3,border=2:5)  # outline the 3 clusters on the dendrogram
II. Method=ward.D
c2=hclust(dist(metals[,3:4]),method='ward.D')
plot(c2,cex=0.3) # dendrogram
cut2=cutree(c2,3)
#confusion/ error matrix
table(metals$metal,cut2)
rect.hclust(c2,k=3,border=2:5)
III.Method=average
c3=hclust(dist(metals[,3:4]),method='average')
plot(c3,cex=0.3) # dendrogram
cut3=cutree(c3,3)
#confusion/ error matrix
table(metals$metal,cut3)
rect.hclust(c3,k=3,border=2:5)
Result:
The model that can be accepted in this case is the ward.D method, as its cluster dendrogram is quite clear and distinct and could therefore give us better results compared to the other methods used.
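To back this choice up numerically, one option (not part of the original workflow) is to compare the three cuts with average silhouette widths from the cluster package loaded above; this assumes the objects cut1, cut2 and cut3 created earlier.

# average silhouette width for each cut; higher means better-separated clusters
d <- dist(metals[, 3:4])
mean(silhouette(cut1, d)[, "sil_width"])   # complete linkage
mean(silhouette(cut2, d)[, "sil_width"])   # ward.D
mean(silhouette(cut3, d)[, "sil_width"])   # average linkage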

Practical Implementation:
Clustering can be used by marketers at the time of segmentation to divide the population into homogeneous clusters in order to understand the preferred segments.
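As a purely hypothetical illustration of this use case, the sketch below clusters invented customer data (annual spend and purchase frequency) into three segments; the data frame and all of its values are made up.

# made-up customer data for illustration only
set.seed(1)
customers <- data.frame(spend = runif(100, 100, 2000),
                        frequency = rpois(100, 6))
seg <- hclust(dist(scale(customers)), method = "ward.D")   # same technique as above
customers$segment <- cutree(seg, k = 3)                    # assign each customer to a segment
aggregate(cbind(spend, frequency) ~ segment, data = customers, FUN = mean)  # profile the segments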

Data set: https://vincentarelbundock.github.io/Rdatasets/datasets.html


LINEAR REGRESSION:
The data set contains various attributes that determine the price of a house, such as the number of bedrooms, the number of bathrooms, the square footage, etc.
Code:
#linear regression using houses
houses.lm=lm(bedrooms~price,data=kc_house_data)  # fit bedrooms on price
names(houses.lm)  # components of the fitted model object
houses.lm$coefficients  # intercept and slope
confint(houses.lm)  # confidence intervals for the coefficients
houses.lm$fitted.values  # fitted values
houses.lm$residuals  # residuals
houses.lm$call  # the model call

plot(bedrooms~price,data=kc_house_data)
plot(bedrooms~price,data=kc_house_data,cex=.7,pch=17,col='green')  # smaller green triangles
abline(houses.lm)  # add the regression line
abline(houses.lm,lwd=2)  # thicker line
abline(houses.lm,lwd=3,col='red')  # thick red regression line
Here linear regression is performed on the houses data. We took price on the x-axis and bedrooms on the y-axis for the plot. The price is basically determined by various things, but the number of bedrooms is a major concern for everyone, so it was taken as the attribute on which price majorly depends.
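As a short follow-up sketch, the fit can be summarised and used for prediction; the price values passed to predict() are arbitrary examples, not from the original analysis.

summary(houses.lm)                        # coefficients, R-squared and p-values
new_prices <- data.frame(price = c(300000, 500000, 750000))   # illustrative prices
predict(houses.lm, newdata = new_prices)  # predicted number of bedrooms at those prices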
Dataset: https://www.kaggle.com/shivam330333/regression-on-kc-house-data/data
Regression (Multiple Linear Regression):
About the dataset (Regression)
Columns - age: age of primary beneficiary
sex: insurance contractor gender, female, male
bmi: body mass index, an objective index of body weight (kg/m^2) based on the ratio of weight to height, indicating weights that are relatively high or low relative to height; ideally 18.5 to 24.9
children: Number of children covered by health insurance / Number of dependents
smoker: Smoking
region: the beneficiary's residential area in the US, northeast, southeast, southwest,
northwest.
charges: Individual medical costs billed by health insurance.
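The model below is fitted on a data frame called train_ins, which is not defined in the original code. A minimal sketch of how such a training set could be built from the downloaded file is given here; the file name insurance.csv and the 75/25 split are assumptions.

# read the Kaggle file and split it into training and test sets (assumed file name and split)
insurance <- read.csv("insurance.csv", stringsAsFactors = TRUE)
set.seed(123)
train_idx <- sample(nrow(insurance), size = floor(0.75 * nrow(insurance)))
train_ins <- insurance[train_idx, ]
test_ins  <- insurance[-train_idx, ]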

Code:
regressor=lm(charges~age+bmi+children+smoker,data=train_ins)
summary(regressor)
In console:

Call:
lm(formula = charges ~ age + bmi + children + smoker, data = train_ins)

Residuals:
Min 1Q Median 3Q Max
-11750 -3040 -1020 1433 26252

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -12185.35    1087.23 -11.208  < 2e-16 ***
age            259.63      13.76  18.873  < 2e-16 ***
bmi            322.30      31.79  10.138  < 2e-16 ***
children       508.16     161.70   3.143  0.00172 **
smokeryes    23670.08     474.63  49.871  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6095 on 1001 degrees of freedom


Multiple R-squared: 0.7488, Adjusted R-squared: 0.7478
F-statistic: 745.9 on 4 and 1001 DF, p-value: < 2.2e-16
Plot:

It can be clearly interpreted that the multiple linear regression model predicts the value of charges more accurately and is more significant than the simple linear model, as the R-squared and adjusted R-squared values are higher for the multiple linear regression model. These values tell us how much of the variance in our target (dependent) variable, charges, can be explained by the independent variables.
In other words, the higher the R-squared value the better; and to nullify the effect of simply adding more variables, which by itself increases R-squared, we check the adjusted R-squared value, which is again higher in the case of multiple linear regression.
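The comparison described above can be sketched as follows: a simple model with a single predictor against the multiple regression, both fitted on the train_ins data assumed earlier. The choice of age as the single predictor is only an illustration.

simple_reg <- lm(charges ~ age, data = train_ins)                            # simple linear model
multi_reg  <- lm(charges ~ age + bmi + children + smoker, data = train_ins)  # multiple linear model
summary(simple_reg)$adj.r.squared   # adjusted R-squared of the simple model
summary(multi_reg)$adj.r.squared    # adjusted R-squared of the multiple model
anova(simple_reg, multi_reg)        # F-test: do the extra predictors help?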

Link of data: https://www.kaggle.com/mirichoi0218/insurance


Practical Applications

➢ Trend Line Analysis:

Linear regression is used in the creation of trend lines, which uses past data to predict
future performance or "trends." Usually, trend lines are used in business to show the
movement of financial or product attributes over time. Stock prices, oil prices, or
product specifications can all be analyzed using trend lines.
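A small illustrative sketch of a trend line in R, using invented monthly revenue figures (all numbers are made up), regresses the value on time and projects the fitted line forward.

# made-up monthly revenue with an upward trend
set.seed(7)
sales <- data.frame(month = 1:24,
                    revenue = 100 + 5 * (1:24) + rnorm(24, sd = 10))
trend <- lm(revenue ~ month, data = sales)            # fit the trend line
plot(revenue ~ month, data = sales)
abline(trend, col = "blue", lwd = 2)                  # draw the trend line
predict(trend, newdata = data.frame(month = 25:30))   # projected values for future months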

➢ Risk Analysis for Investments:

The capital asset pricing model was developed using linear regression analysis, and a
common measure of the volatility of a stock or investment is its beta--which is
determined using linear regression. Linear regression and its use is key in assessing the
risk associated with most investment vehicles.
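A hedged sketch of the beta calculation: regress the stock's periodic returns on the market's returns, and the slope is the beta. The return series below are simulated, not real market data.

# simulated monthly returns for illustration only
set.seed(42)
market_ret <- rnorm(60, mean = 0.01, sd = 0.04)
stock_ret  <- 0.002 + 1.2 * market_ret + rnorm(60, sd = 0.02)
capm_fit   <- lm(stock_ret ~ market_ret)
coef(capm_fit)["market_ret"]   # estimated beta (about 1.2 by construction)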

➢ Sales or Market Forecasts:

Multivariate (having more than two variables) linear regression is a sophisticated method for forecasting sales volumes or market movement to create comprehensive plans for growth. This method is more accurate than trend analysis, as trend analysis only looks at how one variable changes with respect to another, whereas this method looks at how one variable will change when several other variables are modified.

➢ Total Quality Control:

Quality control methods make frequent use of linear regression to analyze key product
specifications and other measurable parameters of product or organizational quality
(such as number of customer complaints over time, etc).

➢ Linear Regression in Human Resources:

Linear regression methods are also used to predict the demographics and types of future
work forces for large companies. This helps the companies to prepare for the needs of
the work force through development of good hiring plans and training plans for the
existing employees.
