
Project report

Course title:
ENGINEERING STATISTICS AND PROBABILITY
Submitted by:
GROUP NO 03
ALI RAZA

UW-13-CE-Bsc-011

NAEEM ZAFAR

UW-11-ME-Bsc-015

MASOOD CHANDIO

UW-13-CE-Bsc-027

RAJA ZULQERNAIN

UW-13-CE-Bsc-044
Submitted to:
SIR TARIQ

Department of Civil Engineering


WEC

APPLICATION OF CORRELATION AND REGRESSION

Acknowledgment:

Countless gratitude to Almighty ALLAH, the Omnipotent and Omnipresent, Who blessed us with the chance and choice, health and courage, and knowledge that enabled us to complete this project.
All respect for the HOLY PROPHET MUHAMMAD (S.A.W.W), who is forever a torch of knowledge and guidance for humanity, who enables us to shape our lives according to the teachings of ISLAM, and who endowed us with exemplary guidance in every sphere of life.
I acknowledge the services of Mr. Tariq Hussain in helping and guiding me in compiling and presenting this report. In fact, it would not have been possible for me to accomplish this task without his help.
I dedicate this work to my parents, to whom I am very thankful, as they encouraged me and provided me with all the necessary resources that made it possible for me to accomplish this task.
Regards
Ali RAZA

Contents
Abstract
Introduction
Brief History of Correlation
Types of Correlation
Correlation coefficient
Covariance
For a population
For a sample
Why Use Correlation?
Regression
History
Uses of Correlation and Regression
Assumptions
Why Use Regression
Application of correlation and regression
Correlation and Regression Conclusion
References

Abstract
The present report introduces methods of analyzing the relationship between two quantitative
variables. The calculation and interpretation of the sample product moment correlation
coefficient and the linear regression equation are discussed and illustrated. Common misuses of
the techniques are considered. Tests and confidence intervals for the population parameters are
described, and failures of the underlying assumptions are highlighted.

Introduction:
The most commonly used techniques for investigating the relationship between two quantitative
variables are correlation and linear regression. Correlation quantifies the strength of the linear
relationship between a pair of variables, whereas regression expresses the relationship in the
form of an equation. For example, in patients attending an accident and emergency unit (A&E),
we could use correlation and regression to determine whether there is a relationship between age
and urea level, and whether the level of urea can be predicted for a given age.

Brief History of Correlation


Sir Francis Galton pioneered correlation (21, 35, 36, 39a, 42, 43). Galton, a cousin of Charles
Darwin, did a lot: he studied medicine, he explored Africa, he published in psychology and
anthropology, he developed graphic techniques to map the weather (39a, 42). And, like others of
his era, Galton strove to understand heredity (13, 14, 17, 20).
In 1877, Galton unveiled reversion, the earliest ancestor of correlation, and described it like this (13): "Reversion is the tendency of that ideal mean type to depart from the parent type, reverting towards what may be roughly and perhaps fairly described as the average ancestral type."
The empirical fodder for this observation? The weights of 490 sweet peas. Nine years later, Galton (14) reported that the offspring did not tend to resemble their parent seeds in size, but to be always more mediocre than they: to be smaller than the parents, if the parents were large; to be larger than the parents, if the parents were very small.
In Galton's subsequent writings (14, 17, 20), reversion evolved into regression.
It was in 1888 that Galton (15) first wrote about correlation: "Two variable organs are said to be co-related when the variation of the one is accompanied on the average by more or less variation of the other, and in the same direction ... It is easy to see that co-relation must be the consequence of the variations of the two organs being partly due to common causes. If they were wholly due to common causes, the co-relation would be perfect, as is approximately the case with the symmetrically disposed parts of the body. If they were in no respect due to common causes, the co-relation would be nil ... The statures of kinsmen are co-related variables; thus, the stature of the father is correlated to that of the adult son; the stature of the uncle to that of the adult nephew, and so on; but the index of co-relation, which is what I there [Ref. 14] called regression, is different in the different cases."
By 1889, Galton was writing co-relation as correlation (42), and he had become fascinated by
fingerprints (16, 19). Galton's 1890 account of his development of correlation (18) would be his
last substantive paper on the subject (43).
Karl Pearson, Galton's colleague and friend, and father of Egon Pearson, pursued the refinement
of correlation (33, 34, 37) with such vigor that the statistic r, a statistic Galton called the index of
co-relation (15) and Pearson called the Galton coefficient of reversion (36), is known today as
Pearson's r.
Correlation
Correlation and regression analysis are related in the sense that both deal with relationships
among variables. The correlation coefficient is a measure of linear association between two
variables. Values of the correlation coefficient are always between -1 and +1. A correlation
coefficient of +1 indicates that two variables are perfectly related in a positive linear sense, a
correlation coefficient of -1 indicates that two variables are perfectly related in a negative linear
sense, and a correlation coefficient of 0 indicates that there is no linear relationship between the
two variables. For simple linear regression, the sample correlation coefficient is the square root
of the coefficient of determination, with the sign of the correlation coefficient being the same as
the sign of b1, the coefficient of x1 in the estimated regression equation.
Neither regression nor correlation analyses can be interpreted as establishing cause-and-effect
relationships. They can indicate only how or to what extent variables are associated with each
other. The correlation coefficient measures only the degree of linear association between two
variables. Any conclusions about a cause-and-effect relationship must be based on the judgment
of the analyst.

Types of Correlation
Positive Correlation
Positive correlation occurs when an increase in one variable increases the value of another.
The line corresponding to the scatter plot is an increasing line.

Negative Correlation
Negative correlation occurs when an increase in one variable decreases the value of another.
The line corresponding to the scatter plot is a decreasing line.

No Correlation
No correlation occurs when there is no linear dependency between the variables.

Perfect Correlation

Perfect correlation occurs when there is a functional dependency between the variables.
In this case all the points are in a straight line.

Strong Correlation
A correlation is stronger the closer the points lie to a straight line.

Weak Correlation
A correlation is weaker the more widely the points are scattered about the line.

Through the coefficient of correlation, we can measure the degree or extent of the correlation between two variables. On the basis of the coefficient of correlation we can also determine whether the correlation is positive or negative, as well as its degree or extent.

Perfect correlation: If two variables change in the same direction and in the same proportion, the correlation between the two is perfect positive.

Absence of correlation: If two series of two variables exhibit no relation between them, or a change in one variable does not lead to a change in the other variable, there is an absence of correlation.

Limited degree of correlation: If two variables are neither perfectly correlated nor completely uncorrelated, then we term the correlation as limited correlation. High degree, moderate degree, and low degree are the three categories of this kind of correlation.

We shall consider the following most commonly used methods:
(1) Scatter plot
(2) Karl Pearson's coefficient of correlation

Correlation coefficient
Pearson's correlation coefficient is the covariance of the two variables divided by the product of
their standard deviations. The form of the definition involves a "product moment", that is, the
mean (the first moment about the origin) of the product of the mean-adjusted random variables;
hence the modifier product-moment in the name.
Covariance

Covariance indicates how two variables are related. A positive covariance means the variables
are positively related, while a negative covariance means the variables are inversely related. The
formula for calculating the covariance of sample data is shown below:

\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}

where:
x = the independent variable
y = the dependent variable
n = the number of data points in the sample
\bar{x} = the mean of the independent variable x
\bar{y} = the mean of the dependent variable y


To understand how covariance is used, consider the data below, which describe the rate of
economic growth (x_i) and the rate of return on the S&P 500 (y_i).

Using the covariance formula, you can determine whether economic growth and S&P 500
returns have a positive or inverse relationship. Before you compute the covariance, calculate the
means of x and y.

Now you can identify the variables for the covariance formula as follows.
x = 2.1, 2.5, 4.0, and 3.6 (economic growth)
y = 8, 12, 14, and 10 (S&P 500 returns)
\bar{x} = 3.1
\bar{y} = 11
Substitute these values into the covariance formula to determine the relationship between
economic growth and S&P 500 returns.

The covariance between the returns of the S&P 500 and economic growth is 1.53. Since the
covariance is positive, the variables are positively related: they move together in the same
direction.
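As a quick arithmetic check, here is a minimal Python sketch that recomputes this sample covariance from the values quoted above; the variable names are illustrative only and not part of any particular library.

```python
# Minimal check of the covariance example above (values taken from the text).
x = [2.1, 2.5, 4.0, 3.6]     # economic growth
y = [8.0, 12.0, 14.0, 10.0]  # S&P 500 returns

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance: sum of products of deviations from the means, divided by n - 1.
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

print(round(cov_xy, 2))  # 1.53, matching the figure quoted in the text
```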

For a population
Pearson's correlation coefficient when applied to a population is commonly represented by the
Greek letter (rho) and may be referred to as the population correlation coefficientor
the population Pearson correlation coefficient. The formula for \rho is [7]:

\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \, \sigma_Y}

where:

\mathrm{cov}(X, Y) is the covariance of X and Y

\sigma_X is the standard deviation of X (and \sigma_Y that of Y)

The formula for \rho can be expressed in terms of mean and expectation. Since

\mathrm{cov}(X, Y) = \mathrm{E}[(X - \mu_X)(Y - \mu_Y)],

the formula for \rho can also be written as

\rho_{X,Y} = \frac{\mathrm{E}[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \, \sigma_Y}

where:

\mathrm{cov}(X, Y), \sigma_X and \sigma_Y are defined as above

\mu_X is the mean of X (and \mu_Y that of Y)

\mathrm{E} is the expectation.

The formula for \rho can also be expressed in terms of uncentred moments. Since

\mu_X = \mathrm{E}[X], \qquad \sigma_X^2 = \mathrm{E}[X^2] - (\mathrm{E}[X])^2, \qquad \mathrm{E}[(X - \mu_X)(Y - \mu_Y)] = \mathrm{E}[XY] - \mathrm{E}[X]\,\mathrm{E}[Y],

the formula for \rho can also be written as

\rho_{X,Y} = \frac{\mathrm{E}[XY] - \mathrm{E}[X]\,\mathrm{E}[Y]}{\sqrt{\mathrm{E}[X^2] - (\mathrm{E}[X])^2}\; \sqrt{\mathrm{E}[Y^2] - (\mathrm{E}[Y])^2}}

For a sample
Pearson's correlation coefficient when applied to a sample is commonly represented by the
letter r and may be referred to as the sample correlation coefficient or the sample Pearson
correlation coefficient. We can obtain a formula for r by substituting estimates of the covariances
and variances based on a sample into the formula above. So if we have one dataset \{x_1, \ldots, x_n\} containing n values and another dataset \{y_1, \ldots, y_n\} containing n values, then the formula for r is:

r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\; \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
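To illustrate the sample formula, here is a small Python sketch that computes r directly from two lists; the data values reuse the hypothetical growth/return figures from the covariance example, and the function name is ours, not from any particular library.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of products of deviations from the means.
    num = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: product of the square roots of the sums of squared deviations.
    den = sqrt(sum((xi - mean_x) ** 2 for xi in x)) * \
          sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return num / den

# Hypothetical growth/return data from the covariance example above:
print(round(pearson_r([2.1, 2.5, 4.0, 3.6], [8.0, 12.0, 14.0, 10.0]), 2))  # about 0.66
```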

Why Use Correlation?


We can use a correlation coefficient, such as the Pearson product moment correlation
coefficient, to test whether there is a linear relationship between the variables. To quantify the
strength of the relationship, we calculate the correlation coefficient r. Its numerical value ranges
from +1.0 to -1.0: r > 0 indicates a positive linear relationship, r < 0 indicates a negative linear
relationship, and r = 0 indicates no linear relationship.

Regression
In statistics, regression is a statistical process for estimating the relationships among variables. It
includes many techniques for modeling and analysing several variables, when the focus is on the
relationship between a dependent variable and one or more independent variables. More
specifically, regression analysis helps one understand how the typical value of the dependent
variable (or 'criterion variable') changes when any one of the independent variables is varied,
while the other independent variables are held fixed. Most commonly, regression analysis
estimates the conditional expectation of the dependent variable given the independent variables;
that is, the average value of the dependent variable when the independent variables are fixed.
Less commonly, the focus is on a quantile, or other location parameter of the conditional
distribution of the dependent variable given the independent variables. In all cases, the
estimation target is a function of the independent variables called the regression function. In
regression analysis, it is also of interest to characterize the variation of the dependent variable
around the regression function, which can be described by a probability distribution.
Regression analysis is widely used for prediction and forecasting, where its use has substantial
overlap with the field of machine learning. Regression analysis is also used to understand which
among the independent variables are related to the dependent variable, and to explore the forms
of these relationships. In restricted circumstances, regression analysis can be used to infer causal
relationships between the independent and dependent variables. However, this can lead to
illusory or spurious relationships, so caution is advisable; for example, correlation does not imply
causation.

History
The earliest form of regression was the method of least squares, which was published
by Legendre in 1805 and by Gauss in 1809. Legendre and Gauss both applied the method to the
problem of determining, from astronomical observations, the orbits of bodies about the Sun
(mostly comets, but also later the then newly discovered minor planets). Gauss published a
further development of the theory of least squares in 1821, including a version of the Gauss-
Markov theorem.
The term "regression" was coined by Francis Galton in the nineteenth century to describe a
biological phenomenon. The phenomenon was that the heights of descendants of tall ancestors
tend to regress down towards a normal average (a phenomenon also known as regression toward
the mean). For Galton, regression had only this biological meaning, but his work was later
extended by Udny Yule and Karl Pearson to a more general statistical context. In the work of

Yule and Pearson, the joint distribution of the response and explanatory variables is assumed to
be Gaussian. This assumption was weakened by R.A. Fisher in his works of 1922 and
1925. Fisher assumed that the conditional distribution of the response variable is Gaussian, but
the joint distribution need not be. In this respect, Fisher's assumption is closer to Gauss's
formulation of 1821.
In the 1950s and 1960s, economists used electromechanical desk calculators to calculate
regressions. Before 1970, it sometimes took up to 24 hours to receive the result from one
regression.
Regression methods continue to be an area of active research. In recent decades, new methods
have been developed for robust regression, regression involving correlated responses such
as time series and growth curves, regression in which the predictor or response variables are
curves, images, graphs, or other complex data objects, regression methods accommodating
various types of missing data, nonparametric regression, Bayesian methods for regression,
regression in which the predictor variables are measured with error, regression with more
predictor variables than observations, and causal inference with regression.

Regression analysis is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data.

Types of Regression
(i) Simple regression (two variables at a time)
(ii) Multiple regression (more than two variables at a time)

Linear regression: If the regression curve is a straight line, then there is a linear regression between the variables.
Non-linear (curvilinear) regression: If the regression curve is not a straight line, then there is a non-linear regression between the variables.

Regression analysis helps in three important ways:
(1) It provides estimates of values of the dependent variable from values of the independent variable.
(2) It can be extended to two or more variables, which is known as multiple regression.
(3) It shows the nature of the relationship between two or more variables.

Algebraic Method
1. Least Squares Method:
The regression equation of X on Y is:
X = a + bY
where
X = the dependent variable
Y = the independent variable
The regression equation of Y on X is:
Y = a + bX
where
Y = the dependent variable
X = the independent variable
The values of a and b in the above equations are found by the method of least squares, using the normal equations given below (written here for the regression of X on Y):

\sum X = na + b \sum Y \qquad \text{(I)}

\sum XY = a \sum Y + b \sum Y^{2} \qquad \text{(II)}

Solution:
X = 0.49 + 0.74Y
Substituting the values from the table into the normal equations, we get:
29 = 5a + 24b ... (i)
168 = 24a + 142b, i.e. 84 = 12a + 71b ... (ii)
Multiplying equation (i) by 12 and equation (ii) by 5:
348 = 60a + 288b ... (iii)
420 = 60a + 355b ... (iv)
Solving equations (iii) and (iv), we get a = 0.66 and b = 1.07.
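As a cross-check on the hand calculation, the following Python sketch solves the same pair of normal equations numerically; the coefficient matrix simply collects the sums quoted in the working above.

```python
import numpy as np

# Normal equations from the worked example above:
#   29  = 5a  + 24b
#   168 = 24a + 142b
A = np.array([[5.0, 24.0],
              [24.0, 142.0]])
rhs = np.array([29.0, 168.0])

a, b = np.linalg.solve(A, rhs)

# a is approximately 0.64 and b approximately 1.07; the hand solution's a = 0.66
# comes from rounding b to 1.07 before back-substituting into equation (i).
print(round(a, 2), round(b, 2))
```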

Uses of Correlation and Regression


There are three main uses for correlation and regression.

One is to test hypotheses about cause-and-effect relationships. In this case, the experimenter determines the values of the X-variable and sees whether variation in X causes variation in Y; for example, giving people different amounts of a drug and measuring their blood pressure.

The second main use for correlation and regression is to see whether two variables are
associated, without necessarily inferring a cause-and-effect relationship. In this case, neither
variable is determined by the experimenter; both are naturally variable. If an association is
found, the inference is that variation in X may cause variation in Y, or variation in Y may
cause variation in X, or variation in some other factor may affect both X and Y.
The third common use of linear regression is estimating the value of one variable
corresponding to a particular value of the other variable.

Assumptions
Some underlying assumptions governing the uses of correlation and regression are as follows.
The observations are assumed to be independent. For correlation, both variables should be
random variables, but for regression only the dependent variable Y must be random. In carrying
out hypothesis tests, the response variable should follow a Normal distribution and the variability
of Y should be the same for each value of the predictor variable. A scatter diagram of the data
provides an initial check of the assumptions for regression.
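Since the text recommends a scatter diagram as an initial check, a minimal plotting sketch might look like the one below; it assumes the matplotlib library is available, and the x and y values are made up purely for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical paired observations; replace with the actual data being analysed.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 8.2, 8.8]

plt.scatter(x, y)                 # each point is one (x, y) observation
plt.xlabel("Predictor variable x")
plt.ylabel("Response variable y")
plt.title("Scatter diagram: initial check of the regression assumptions")
plt.show()
```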

Why Use Regression


In regression analysis, the problem of interest is the nature of the relationship itself between
the dependent variable (response) and the (explanatory) independent variable.
The analysis consists of choosing and fitting an appropriate model, done by the method of least
squares, with a view to exploiting the relationship between the variables to help estimate the
expected response for a given value of the independent variable. For example, if we are
interested in the effect of age on height, then by fitting a regression line we can predict the
height for a given age.
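To make the age-and-height example concrete, here is a short sketch that fits a least-squares line and predicts the expected height at a given age; the numbers are invented purely for illustration and numpy is assumed to be available.

```python
import numpy as np

# Invented (age in years, height in cm) pairs, purely for illustration.
age = np.array([5, 7, 9, 11, 13, 15], dtype=float)
height = np.array([108.0, 121.0, 132.0, 142.0, 153.0, 165.0])

# Fit height = a + b * age by least squares (degree-1 polynomial fit).
b, a = np.polyfit(age, height, 1)   # np.polyfit returns the slope first, then the intercept

new_age = 10.0
predicted_height = a + b * new_age
print(f"Predicted height at age {new_age}: {predicted_height:.1f} cm")
```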

Application of correlation and regression


1. Bridge engineering
2. Construction engineering
3. Environmental engineering
4. Fire protection engineering
5. Geotechnical engineering
6. Hydraulic engineering
7. Materials science
8. Structural engineering
9. Surveying
10. Timber engineering
11. Transportation engineering
12. Water resources engineering
13. Agricultural engineering
14. Civil engineering
15. Chemical engineering
16. Electrical engineering
17. Environmental engineering
18. Industrial engineering
19. Marine engineering
20. Material science
21. Mechanical and industrial engineering
22. Mechanical engineering

Correlation and Regression Conclusion


Although they may not know it, most successful businessmen rely on regression analysis to
predict trends and to ensure the success of their businesses. Consciously or unconsciously, they
rely on regression to ensure that they produce the right products at the right time, and they use it
to measure the success of their marketing and advertising efforts. They also rely on statistical
inference to predict future market trends and react to them. That is why statistical analysis is
gaining in popularity as a career.

References
1. Whitley E, Ball J. Statistics review 1: Presenting and summarising data. Crit Care. 2002;6:66-71. doi: 10.1186/cc1455.
2. Kirkwood BR, Sterne JAC. Essential Medical Statistics. 2nd ed. Oxford: Blackwell Science; 2003.
3. Whitley E, Ball J. Statistics review 2: Samples and populations. Crit Care. 2002;6:143-148. doi: 10.1186/cc1473.
4. Bland M. An Introduction to Medical Statistics. 3rd ed. Oxford: Oxford University Press; 2001.
5. Bland M, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;i:307-310.
6. Zar JH. Biostatistical Analysis. 4th ed. New Jersey, USA: Prentice Hall; 1999.
7. Altman DG. Practical Statistics for Medical Research. London: Chapman & Hall; 1991.
