
CORRELATION AND REGRESSION

OUTLINE
INTRODUCTION

SCATTER PLOTS

CORRELATION

REGRESSION

OBJECTIVES
DRAW A SCATTER PLOT FOR A SET OF ORDERED PAIRS.

FIND THE CORRELATION COEFFICIENT.

FIND THE EQUATION OF THE REGRESSION LINE.

FIND THE COEFFICIENT OF DETERMINATION.

FIND THE STANDARD ERROR OF ESTIMATE.

INTRODUCTION

EVERY DAY WE MAKE PERSONAL AND PROFESSIONAL DECISIONS THAT ARE BASED ON PREDICTIONS OF FUTURE EVENTS.

TO MAKE THESE FORECASTS, WE RELY ON THE RELATIONSHIP BETWEEN WHAT IS ALREADY KNOWN AND WHAT IS TO BE ESTIMATED.

REGRESSION AND CORRELATION ANALYSIS SHOW US HOW TO DETERMINE BOTH THE NATURE AND THE STRENGTH OF A RELATIONSHIP BETWEEN TWO VARIABLES.

SIGNIFICANCE OF THE STUDY OF CORRELATION


MOST OF THE VARIABLES SHOW SOME KIND OF RELATIONSHIP, E.G. BETWEEN PRICE AND SUPPLY, INCOME AND EXPENDITURE, ETC. CORRELATION ANALYSIS GIVES THE DEGREE OF RELATIONSHIP IN ONE FIGURE.

ONCE WE KNOW THE RELATIONSHIP WE CAN ESTIMATE THE VALUE OF ONE VARIABLE GIVEN THE VALUE OF ANOTHER.

CORRELATION ANALYSIS CONTRIBUTES TO THE UNDERSTANDING OF ECONOMIC BEHAVIOUR. IN BUSINESS, CORRELATION ANALYSIS ENABLES THE EXECUTIVE TO ESTIMATE COSTS, PRICE, ETC.

TYPES OF CORRELATION
POSITIVE AND NEGATIVE

SIMPLE, PARTIAL AND MULTIPLE

LINEAR AND NON-LINEAR

POSITIVE AND NEGATIVE CORRELATION


IF TWO VARIABLES VARY TOGETHER IN THE SAME DIRECTION OR IN OPPOSITE DIRECTIONS, THEY ARE SAID TO BE CORRELATED.

IF AS X INCREASES Y INCREASES CONSISTENTLY, X & Y ARE POSITIVELY CORRELATED.

IF AS X INCREASES Y DECREASES, AND AS X DECREASES Y INCREASES, X & Y ARE NEGATIVELY CORRELATED.
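
The direction of a relationship can be checked numerically. The sketch below is only an illustration: it uses the sign of the covariance (the average product of deviations from the two means) as the indicator, and both the data and the function names `covariance` and `direction` are hypothetical, not taken from the slides.

```python
def covariance(xs, ys):
    """Average product of deviations from the means (divisor n)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def direction(xs, ys):
    """Label the direction of co-movement from the sign of the covariance."""
    c = covariance(xs, ys)
    if c > 0:
        return "positive"
    if c < 0:
        return "negative"
    return "none"


# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
print(direction(x, [2, 4, 5, 4, 6]))  # y tends to rise as x rises
print(direction(x, [9, 7, 6, 4, 1]))  # y tends to fall as x rises
```

The sign only gives the direction; how strong the relationship is needs a scaled measure such as the correlation coefficient.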

SIMPLE, PARTIAL AND MULTIPLE CORRELATION


WHEN ONLY TWO VARIABLES ARE STUDIED, IT IS SIMPLE CORRELATION.

WHEN THREE OR MORE VARIABLES ARE STUDIED, IT IS PARTIAL OR MULTIPLE CORRELATION.

IN MULTIPLE CORRELATION THREE OR MORE VARIABLES ARE STUDIED SIMULTANEOUSLY.

IN PARTIAL CORRELATION MORE THAN TWO VARIABLES ARE THERE BUT WE CONSIDER ONLY TWO VARIABLES (KEEPING THE OTHERS CONSTANT).

DEPENDENT & INDEPENDENT VARIABLES

THE KNOWN VARIABLE IS CALLED THE INDEPENDENT VARIABLE AND THE VARIABLE WE ARE TRYING TO PREDICT IS THE DEPENDENT VARIABLE.

IF THE CORRELATION IS PERFECT POSITIVE, ALL THE POINTS WILL LIE IN A STRAIGHT LINE RISING FROM LEFT TO RIGHT, AS SHOWN IN THE FIGURE, AND IF THE CORRELATION IS PERFECT NEGATIVE THEY WILL BE IN A LINE FALLING FROM LEFT TO RIGHT, AS SHOWN IN THE FIGURE.

EXAMPLE
SALES OF MAJOR APPLIANCES VARY WITH THE NEW HOUSING MARKET. WHEN NEW HOME SALES ARE GOOD, SO ARE THE SALES OF DISHWASHERS, WASHING MACHINES, DRYERS AND REFRIGERATORS. A TRADE ASSOCIATION COMPILED THE FOLLOWING HISTORICAL DATA (IN THOUSANDS OF UNITS) ON MAJOR APPLIANCE SALES AND HOUSING STARTS.

In this case, the data points represent the relationship between the housing market and sales of household appliances. The relationship between X & Y is well described by a straight line.
THE DIRECTION OF THE LINE CAN INDICATE WHETHER THE RELATIONSHIP IS DIRECT OR INVERSE.

William C. Andrews, an organizational behavior consultant for Victory Motorcycles, has designed a test to show the company's supervisors the dangers of over-supervising their workers. A worker from the assembly line is given a series of complicated tasks to perform. During the worker's performance, a supervisor constantly interrupts the worker to assist him or her in completing the tasks. The worker, upon completion of the tasks, is then given a psychological test designed to measure the worker's hostility toward authority (a high score equals low hostility). Eight different workers were assigned the tasks and then interrupted for the purpose of instructional assistance various numbers of times. Their corresponding scores on the hostility test are revealed as follows. Predict the expected test score if the worker is interrupted 18 times.

How can we fit a line mathematically?


TO A STATISTICIAN, THE LINE WILL HAVE A GOOD FIT IF IT MINIMIZES THE ERROR BETWEEN THE ESTIMATED POINTS ON THE LINE AND THE ACTUAL OBSERVED POINTS THAT WERE USED TO DRAW IT (METHOD OF LEAST SQUARES).

THE METHOD OF LEAST SQUARES


THE ESTIMATING LINE IS A LINE DRAWN THROUGH THE MIDDLE OF A SET OF POINTS ON THE SCATTER DIAGRAM SUCH THAT THE SUM OF THE SQUARES OF THE ERRORS BETWEEN THE POINTS AND THE LINE IS A MINIMUM.

EQUATION OF THE ESTIMATING LINE

$$\hat{Y} = a + bX$$

Slope of the best-fitting regression line & Y-intercept of the best-fitting regression line

$$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}, \qquad a = \bar{Y} - b\bar{X}$$
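
A minimal sketch of how the slope and intercept of the least-squares line can be computed, assuming the usual textbook formulas b = (ΣXY − n·X̄·Ȳ) / (ΣX² − n·X̄²) and a = Ȳ − b·X̄. The function name `fit_line` and the sample points are illustrative only.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b = (sum_xy - n * mean_x * mean_y) / (sum_xx - n * mean_x ** 2)  # slope
    a = mean_y - b * mean_x                                          # Y-intercept
    return a, b


# Hypothetical points lying exactly on Y = 2 + 3X.
a, b = fit_line([0, 1, 2, 3], [2, 5, 8, 11])
print(a, b)
```

With points that lie exactly on a line, the fit should recover that line; with scattered points it returns the line with the smallest sum of squared vertical errors.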

THE EQUATION Y = A + BX IS THE REGRESSION EQUATION OF Y ON X. IT GIVES THE MOST PROBABLE VALUES OF Y FOR GIVEN VALUES OF X.

THE REGRESSION LINE OF X ON Y GIVES THE MOST PROBABLE VALUES OF X FOR GIVEN VALUES OF Y, SAY X = A + BY.

THE REGRESSION EQUATION OF Y ON X CAN ALSO BE REPRESENTED BY

$$Y - \bar{Y} = r\,\frac{\sigma_Y}{\sigma_X}\,(X - \bar{X})$$

EXAMPLE
THE GENERAL SALES MANAGER OF KIRAN ENTERPRISES, AN ENTERPRISE DEALING IN THE SALE OF READY-MADE MEN'S WEAR, IS TOYING WITH THE IDEA OF INCREASING HIS SALES TO RS. 80,000. ON CHECKING THE RECORDS OF SALES DURING THE LAST 10 YEARS, IT WAS FOUND THAT THE ANNUAL SALE PROCEEDS AND ADVERTISEMENT EXPENDITURE WERE HIGHLY CORRELATED, TO THE EXTENT OF 0.8. IT WAS FURTHER NOTED THAT THE ANNUAL AVERAGE SALE HAS BEEN RS. 45,000 AND THE ANNUAL AVERAGE ADVERTISEMENT EXPENDITURE RS. 30,000, WITH A VARIANCE OF RS. 1600 IN SALES AND RS. 625 IN ADVERTISEMENT EXPENDITURE RESPECTIVELY.

IN VIEW OF THE ABOVE, HOW MUCH EXPENDITURE ON ADVERTISEMENT WOULD YOU SUGGEST THE GENERAL SALES MANAGER OF THE ENTERPRISE TO INCUR TO MEET HIS TARGET OF SALES?

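X: ADVERTISEMENT EXPENDITURE

Y: SALES

REGRESSION EQUATION OF X ON Y:

$$X - \bar{X} = r\,\frac{\sigma_X}{\sigma_Y}\,(Y - \bar{Y}) \quad\Rightarrow\quad X - 30{,}000 = 0.8 \times \frac{25}{40}\,(Y - 45{,}000)$$

WHEN Y = 80,000

X = 30,000 + 0.5 × 35,000 = 47,500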
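
The arithmetic behind this answer can be sketched as follows, assuming the regression line of X on Y in the form X − X̄ = r(σ_X/σ_Y)(Y − Ȳ). One assumption is labelled in the code: the variance of advertisement expenditure is taken as 625 (standard deviation 25), the value that reproduces the answer X = 47,500. The function name `adv_for_sales` is illustrative.

```python
from math import sqrt

r = 0.8                      # correlation between sales and advertisement expenditure
mean_sales = 45_000          # average annual sales (Y)
mean_adv = 30_000            # average annual advertisement expenditure (X)
var_sales = 1600             # variance of Y, so sigma_Y = 40
var_adv = 625                # ASSUMED variance of X (sigma_X = 25); matches X = 47,500


def adv_for_sales(target_sales):
    """Advertisement expenditure suggested by the regression line of X on Y."""
    slope = r * sqrt(var_adv) / sqrt(var_sales)   # b_xy = r * sigma_X / sigma_Y
    return mean_adv + slope * (target_sales - mean_sales)


print(adv_for_sales(80_000))
```

Note that the line of X on Y is used here, not the line of Y on X, because the quantity being estimated is the advertisement expenditure.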

EXAMPLE
SUPPOSE BMC IS INTERESTED IN THE RELATIONSHIP BETWEEN THE AGE OF A GARBAGE TRUCK AND THE ANNUAL REPAIR EXPENSE THEY SHOULD EXPECT TO INCUR. IN ORDER TO DETERMINE THIS RELATIONSHIP, BMC HAS ACCUMULATED INFORMATION CONCERNING FOUR OF THE TRUCKS THE CITY CURRENTLY OWNS.

ORGANIZE THE DATA AS OUTLINED IN TABLE.

USE THE EQUATIONS FOR A & B TO FIND THE NUMERICAL CONSTANTS FOR OUR REGRESSION LINE.

B = 0.75
A = 3.75

Y = 3.75 + 0.75X

BMC CAN ESTIMATE THE ANNUAL REPAIR EXPENSE GIVEN THE AGE OF TRUCK. IF IT IS 4 YEARS OLD, USE THE EQUATION Y = 3.75 + 0.75X TO GET THE ANNUAL EXPENSE AS FOLLOWS:

Y = 3.75 + 0.75 × 4

= 6.75

EXPECTED ANNUAL REPAIR EXPENSE = 6,750 (Y IS MEASURED IN THOUSANDS)
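
The fit can be reproduced in a few lines. The data table for the four trucks is not reproduced in this text, so the ages and expenses below are an assumption: a four-point data set chosen because it yields the stated constants A = 3.75 and B = 0.75. The names `fit_line` and `predict` are illustrative.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum(x * y for x, y in zip(xs, ys)) - n * mean_x * mean_y) / (
        sum(x * x for x in xs) - n * mean_x ** 2
    )
    return mean_y - b * mean_x, b


# ASSUMED data (not from the slides): truck age in years, repair expense in thousands.
ages = [5, 3, 3, 1]
repairs = [7, 7, 6, 4]

a, b = fit_line(ages, repairs)


def predict(age):
    """Estimated annual repair expense (in thousands) for a truck of the given age."""
    return a + b * age


print(a, b, predict(4))
```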

HOW TO MEASURE THE RELIABILITY OF THE ESTIMATING EQUATION?

IT IS MEASURED BY THE STANDARD ERROR OF ESTIMATE.

IT MEASURES THE VARIABILITY, OR SCATTER, OF THE OBSERVED VALUES AROUND THE REGRESSION LINE.

STANDARD ERROR

$$s_e = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$$

For the above example

STANDARD ERROR = 0.866 (IN THOUSANDS), I.E. 866.0/-

IF THE STANDARD ERROR IS ZERO, WE EXPECT THE ESTIMATING EQUATION TO BE A PERFECT ESTIMATOR OF THE DEPENDENT VARIABLE.
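
A short sketch of the standard error of estimate, assuming the usual definition: the square root of the sum of squared errors about the line divided by n − 2. The truck data are again an assumption (the original table is not reproduced in this text), chosen because they are consistent with the line Y = 3.75 + 0.75X and with the stated value 0.866.

```python
from math import sqrt

A, B = 3.75, 0.75            # constants of the estimating equation from the example

# ASSUMED data (not from the slides): truck age in years, repair expense in thousands.
ages = [5, 3, 3, 1]
repairs = [7, 7, 6, 4]


def standard_error(xs, ys, a, b):
    """Standard error of estimate: sqrt(sum of squared errors / (n - 2))."""
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(sse / (len(xs) - 2))


se = standard_error(ages, repairs, A, B)
print(round(se, 3))
```

If every observed point lay exactly on the line, each error would be zero and so would the standard error.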

Assuming that the observed points are normally distributed around the regression line, we can expect
68% OF THE POINTS WITHIN ±1 s_e

95.5% OF THE POINTS WITHIN ±2 s_e AND

99.7% OF THE POINTS WITHIN ±3 s_e
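
As a rough illustration, these percentages translate into bands around a predicted value. The sketch below reuses the constants of the truck example (Y = 3.75 + 0.75X, standard error 0.866); the function name `band` is illustrative, and the bands are approximate because they rely on the normality assumption and ignore the extra uncertainty of a very small sample.

```python
A, B = 3.75, 0.75   # estimating equation from the truck example
SE = 0.866          # standard error of estimate from the same example


def band(x, k):
    """Return (low, high): the predicted value plus or minus k standard errors."""
    y_hat = A + B * x
    return y_hat - k * SE, y_hat + k * SE


for k, share in [(1, "68%"), (2, "95.5%"), (3, "99.7%")]:
    low, high = band(4, k)
    print(f"about {share} of points within {low:.3f} to {high:.3f}")
```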

CORRELATION
THE STATISTICAL TOOL WITH THE HELP OF WHICH THE RELATIONSHIP BETWEEN TWO OR MORE THAN TWO VARIABLES IS STUDIED IS CALLED CORRELATION.

CORRELATION ANALYSIS
CORRELATION ANALYSIS IS THE STATISTICAL TOOL TO DESCRIBE THE DEGREE TO WHICH ONE VARIABLE IS LINEARLY RELATED TO ANOTHER.

The coefficient of determination


$r^2$ MEASURES THE EXTENT, OR STRENGTH, OF THE ASSOCIATION THAT EXISTS BETWEEN TWO VARIABLES X & Y.

SAMPLE COEFFICIENT OF DETERMINATION

$$r^2 = 1 - \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2}$$

$r^2$ = 1 WHEN THERE IS PERFECT CORRELATION

$r^2$ = 0 WHEN THERE IS NO CORRELATION

NOTE: $r^2$ MEASURES ONLY THE STRENGTH OF THE LINEAR RELATIONSHIP BETWEEN TWO VARIABLES.
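
A minimal sketch of the sample coefficient of determination, assuming the definition r² = 1 − Σ(Y − Ŷ)² / Σ(Y − Ȳ)². The data are the same assumed truck figures used for illustration elsewhere (not taken from the slides), with the line Y = 3.75 + 0.75X; the function name `r_squared` is illustrative.

```python
def r_squared(xs, ys, a, b):
    """Share of the variation in Y explained by the line Y = a + b*X."""
    mean_y = sum(ys) / len(ys)
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - unexplained / total


# ASSUMED data (not from the slides): truck age in years, repair expense in thousands.
ages = [5, 3, 3, 1]
repairs = [7, 7, 6, 4]

print(r_squared(ages, repairs, 3.75, 0.75))
```

A value of 1 means the line passes through every point; a value near 0 means the line explains almost none of the variation in Y.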

CORRELATION COEFFICIENT
THE CORRELATION COEFFICIENT COMPUTED FROM THE SAMPLE DATA MEASURES THE STRENGTH AND DIRECTION OF A RELATIONSHIP BETWEEN TWO VARIABLES.

SAMPLE CORRELATION COEFFICIENT, r.

POPULATION CORRELATION COEFFICIENT, ρ.

Range of Values for the Correlation Coefficient

$$-1 \le r \le +1$$


COEFFICIENT OF CORRELATION

$$r = \pm\sqrt{r^2}$$

WHEN THE SLOPE OF THE EQUATION IS POSITIVE, r IS THE POSITIVE SQUARE ROOT, BUT IF b IS NEGATIVE, r IS THE NEGATIVE SQUARE ROOT.

THE SIGN OF r INDICATES THE DIRECTION OF THE RELATIONSHIP BETWEEN THE TWO VARIABLES X & Y.

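KARL PEARSON'S CORRELATION COEFFICIENT


THIS IS ALSO CALLED THE PRODUCT MOMENT COEFFICIENT OF CORRELATION. THE COVARIANCE OF X AND Y IS DEFINED AS

$$\mathrm{Cov}(X, Y) = \frac{1}{n}\sum (X - \bar{X})(Y - \bar{Y})$$

AND THE CORRELATION COEFFICIENT IS

$$r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X\,\sigma_Y}$$

WHERE $\sigma_X$ AND $\sigma_Y$ ARE THE STANDARD DEVIATIONS OF X AND Y, COMPUTED WITH THE SAME DIVISOR n.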
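
The product moment coefficient can be sketched directly from these definitions: the covariance divided by the product of the two standard deviations, all computed with divisor n. The function name `pearson_r` and the sample data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product moment correlation: Cov(X, Y) / (sd_X * sd_Y), all with divisor n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


# Hypothetical data, not taken from the slides.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 60, 68]
print(round(pearson_r(hours, score), 3))
```

Because the same divisor appears in the covariance and in both standard deviations, using n − 1 throughout would give the same r.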

What does r=0.6 mean?


r = 0.6, SO $r^2$ = 0.36

36% OF THE VARIATION IN THE AMOUNT SPENT ON MOVIES IS EXPLAINED BY THE REGRESSION LINE.

FROM r = 0.6, THE AMOUNT SPENT ON MOVIES CORRELATES 0.6 WITH FAMILY INCOME. THIS SEEMS LIKE A FAIRLY STRONG CORRELATION. BUT $r^2$ = 0.36, SO FAMILY INCOME EXPLAINS ONLY 36% OF THE VARIATION IN THE AMOUNT OF MONEY FAMILIES SPEND ON MOVIES.

IF YOU DESIGNED YOUR MARKETING STRATEGY TO APPEAL ONLY TO FAMILIES WITH HIGH INCOMES, YOU'D MISS A LOT OF POTENTIAL CUSTOMERS.

INSTEAD, TRY TO FIND WHAT ELSE IS INFLUENCING FAMILY MOVIE DECISIONS.

RANK CORRELATION COEFFICIENT

RANK CORRELATION IS USED WHEN A QUANTITATIVE MEASURE OF A CERTAIN FACTOR CANNOT BE FIXED, BUT THE INDIVIDUALS IN THE GROUP CAN BE ARRANGED IN ORDER, THEREBY OBTAINING FOR EACH INDIVIDUAL A NUMBER INDICATING HIS RANK IN THE GROUP.

THE RANK CORRELATION COEFFICIENT IS APPLIED TO A SET OF ORDINAL RANK NUMBERS, WITH 1 FOR THE INDIVIDUAL RANKED FIRST IN QUANTITY OR QUALITY, AND SO ON, N FOR THE ONE RANKED LAST. THEN r CAN BE DEFINED AS

$$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

WHERE d IS THE DIFFERENCE BETWEEN THE TWO RANKS OF AN INDIVIDUAL AND n IS THE NUMBER OF INDIVIDUALS.

EXAMPLE
TWO MANAGERS ARE ASKED TO RANK A GROUP OF EMPLOYEES IN ORDER OF POTENTIAL FOR EVENTUALLY BECOMING TOP MANAGERS. THE RANKINGS ARE AS FOLLOWS.

COMPUTE THE COEFFICIENT OF RANK CORRELATION AND COMMENT ON THE VALUE.

r = 1 - 0.085
= 0.915

A VALUE THIS CLOSE TO +1 INDICATES THAT THE TWO MANAGERS' RANKINGS AGREE CLOSELY.
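
The computation can be sketched with the rank-difference formula r = 1 − 6Σd² / (n(n² − 1)). The managers' actual rankings are not reproduced in this text, so the two lists below are hypothetical: ten employees ranked so that Σd² = 14, which gives the same value as the example (6 × 14 / 990 ≈ 0.085). The function name `rank_correlation` is illustrative.

```python
def rank_correlation(ranks_a, ranks_b):
    """Rank correlation from two lists of ranks without ties."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# HYPOTHETICAL rankings (not from the slides), chosen so that sum of d^2 = 14.
manager_1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
manager_2 = [3, 2, 1, 5, 4, 7, 6, 9, 8, 10]

r = rank_correlation(manager_1, manager_2)
print(round(r, 3))
```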

WHERE RANKS ARE NOT GIVEN


ASSIGN RANKS. THEN APPLY THE SAME FORMULA

EQUAL RANKS OR TIE IN RANKS


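ASSIGN EACH INDIVIDUAL OR ENTRY AN AVERAGE RANK.

THUS IF TWO INDIVIDUALS ARE RANKED EQUAL AT 5TH PLACE, GIVE THE RANK (5+6)/2 = 5.5 TO BOTH.

IF m IS THE NUMBER OF ITEMS WHOSE RANKS ARE COMMON, THEN r IS

$$r = 1 - \frac{6\left[\sum d^2 + \sum \frac{1}{12}\left(m^3 - m\right)\right]}{n(n^2 - 1)}$$

WHERE THE CORRECTION TERM $\frac{1}{12}(m^3 - m)$ IS ADDED ONCE FOR EACH GROUP OF TIED RANKS.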
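
A sketch of how tied values might be handled: tied entries share the average of the positions they occupy, and a correction of (m³ − m)/12 is added to Σd² for every group of m tied ranks. The function names and the scores are hypothetical, and ranking from the largest value downward is an assumption of this sketch.

```python
from collections import Counter


def average_ranks(values):
    """Rank from largest (rank 1) downward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + 1 + j + 1) / 2          # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def tie_correction(ranks):
    """Sum of (m^3 - m) / 12 over every group of m equal ranks."""
    return sum((m ** 3 - m) / 12 for m in Counter(ranks).values() if m > 1)


def rank_correlation_with_ties(xs, ys):
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(rx) + tie_correction(ry)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))


# Hypothetical scores: the 5th and 6th largest values are tied, so both get rank 5.5.
print(average_ranks([70, 60, 50, 40, 30, 30, 20]))
```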

MULTIPLE REGRESSION AND CORRELATION ANALYSIS


WE CAN USE MORE THAN ONE INDEPENDENT VARIABLE TO ESTIMATE THE DEPENDENT VARIABLE AND THUS ATTEMPT TO INCREASE THE ACCURACY OF THE ESTIMATE.

THIS PROCESS IS CALLED MULTIPLE REGRESSION ANALYSIS.

EXAMPLE
CONSIDER THE REAL ESTATE AGENT WHO WISHES TO RELATE THE NUMBER OF HOUSES THE FIRM SELLS IN A MONTH TO THE AMOUNT OF HER MONTHLY ADVERTISING.

CERTAINLY WE CAN FIND A SIMPLE ESTIMATING EQUATION THAT RELATES THESE TWO VARIABLES.

COULD WE ALSO IMPROVE THE ACCURACY OF OUR EQUATION BY INCLUDING THE NUMBER OF SALESPEOPLE SHE EMPLOYS EACH MONTH?

THEN WE CAN USE THE NUMBER OF SALES AGENTS AND THE ADVERTISING EXPENDITURES TO PREDICT MONTHLY HOUSE SALES.

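Multiple regression equations

$$\hat{Y} = a + bX_1 + cX_2$$

FOR GETTING A, B & C SOLVE THE NORMAL EQUATIONS

$$\sum Y = na + b\sum X_1 + c\sum X_2$$

$$\sum X_1 Y = a\sum X_1 + b\sum X_1^2 + c\sum X_1 X_2$$

$$\sum X_2 Y = a\sum X_2 + b\sum X_1 X_2 + c\sum X_2^2$$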
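
A minimal sketch of solving the three normal equations for a model with two independent variables, Y = a + b·X1 + c·X2. It builds the sums that appear in the normal equations and solves the 3×3 system by Gauss-Jordan elimination in plain Python. The function names and the data are hypothetical; the data were generated from Y = 1 + 2·X1 + 3·X2 so the result is easy to check.

```python
def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]


def dot(u, v):
    """Sum of pairwise products, e.g. sum of X1*Y."""
    return sum(p * q for p, q in zip(u, v))


def fit_two_predictors(x1, x2, y):
    """Return [a, b, c] for Y = a + b*X1 + c*X2 from the normal equations."""
    n = len(y)
    matrix = [
        [n, sum(x1), sum(x2)],
        [sum(x1), dot(x1, x1), dot(x1, x2)],
        [sum(x2), dot(x1, x2), dot(x2, x2)],
    ]
    rhs = [sum(y), dot(x1, y), dot(x2, y)]
    return solve3(matrix, rhs)


# Hypothetical data generated from Y = 1 + 2*X1 + 3*X2.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [9, 8, 19, 18, 29]
print(fit_two_predictors(x1, x2, y))
```

The system has a unique solution only when X1 and X2 are not perfectly collinear; otherwise a pivot becomes zero and the division fails.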

EXAMPLE

IN TRYING TO EVALUATE THE EFFECTIVENESS OF ITS ADVERTISING CAMPAIGN, A FIRM COMPILED THE FOLLOWING INFORMATION

YEAR    ADV. EXPENDITURE ('000 RS.)    SALES (LAKH RS.)
1996    12                             5.0
1997    15                             5.6
1998    15                             5.8
1999    23                             7.0
2000    24                             7.2
2001    38                             8.8
2002    42                             9.2
2003    48                             9.5

ESTIMATE THE PROBABLE SALES WHEN ADVERTISEMENT EXPENDITURE IS RS. 60 THOUSAND.

Y = 3.8719 + 0.1250 X

WHEN X = 60

Y = 11.37 (LAKH RS.)
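
The estimate can be checked by fitting the least-squares line to the eight observations given in the example. This is a sketch using the standard slope and intercept formulas; the function name `fit_line` is illustrative. A direct fit gives a slope of about 0.1250 and an intercept within about 0.001 of the stated 3.8719, and the prediction at X = 60 rounds to the same 11.37.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum(x * y for x, y in zip(xs, ys)) - n * mean_x * mean_y) / (
        sum(x * x for x in xs) - n * mean_x ** 2
    )
    return mean_y - b * mean_x, b


# Data from the example: advertisement expenditure ('000 Rs.) and sales (lakh Rs.).
adv = [12, 15, 15, 23, 24, 38, 42, 48]
sales = [5.0, 5.6, 5.8, 7.0, 7.2, 8.8, 9.2, 9.5]

a, b = fit_line(adv, sales)
estimate = a + b * 60        # probable sales when expenditure is Rs. 60 thousand
print(round(a, 4), round(b, 4), round(estimate, 2))
```

Note that X = 60 lies outside the observed range of 12 to 48, so the estimate assumes the linear relationship continues to hold beyond the data.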
