
WMBA 11 IBA
Instructor: Dr. Swapan Kumar Dhar

Bivariate Data: Bivariate data refer to data relating to two variables. Statistical data relating to simultaneous measurements of two variables are called bivariate data. The simultaneous measurements on each item are expressed as a pair $(x, y)$, one value for each variable. Thus, for $n$ items we have $n$ pairs of measurements or observations,

\[ (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n). \]
Example: The following bivariate data show the height (cm) and weight (kg) of 12 students.

Height (cm)   122  130  126  120  122  124  125  127  126  123  130  124
Weight (kg)    25   40   32   23   27   35   34   34   32   28   38   30

In the above data, the height (cm) and weight (kg) of the first student are paired as (122, 25), those of the second student as (130, 40), and so on.

Correlation Analysis: Correlation analysis is the study of the relationship between variables; that is, it is a technique for measuring the association between two variables. The usual first step is to plot the data in a scatter diagram. For example, if $x$ and $y$ denote height and weight respectively, then a sample of $n$ individuals gives the heights $x_1, \ldots, x_n$ and the corresponding weights $y_1, \ldots, y_n$. When the points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ are plotted on a rectangular coordinate system, the resulting set of points is called the scatter diagram. From such a diagram one can often visualize a smooth curve, known as the approximating curve. The following figure shows data that are well approximated by a straight line, which indicates a linear relationship between the two variables.

[Figure: Scatter Diagram of Weight (vertical axis, 0 to 50) against Height (horizontal axis, 115 to 135).]

Scatter diagram: The scatter diagram is a graphical representation of simple bivariate data. That is, it is a chart that portrays the relationship between the two variables of interest.

Example: The sales manager of a company randomly selected 10 sales representatives and determined the number of sales calls each one made last month and the number of units of the product he or she sold last month. The sample information is given below. Portray this information in a scatter diagram. Based on the scatter diagram, what observations can you make?

Sales calls and units sold for 10 salespeople.

Sales Representative   Number of Sales Calls   Number of Units Sold
A                      14                      28
B                      35                      66
C                      22                      38
D                      29                      70
E                       6                      22
F                      15                      27
G                      17                      28
H                      20                      47
I                      12                      14
J                      29                      68

Solution:

[Figure: Scatter diagram of Quantity sold (vertical axis, 0 to 80) against Sales Call (horizontal axis, 0 to 40).]

Observations made: From the scatter diagram, it is noted that as the number of sales calls increases, the number of units sold also increases. Therefore, there is a strong positive relationship (correlation) between sales calls and the number of units sold.

Note: It is common practice to put the dependent variable on the Y-axis and the independent variable on the X-axis. In our example, units sold is the dependent variable and number of sales calls is the independent variable.

Dependent variable: The variable that is being predicted or estimated.

Independent variable: A variable that provides the basis for estimation.

Measures of correlation: There are many situations in which the objective in studying the joint behavior of two variables is to see whether they are related, rather than to use one to predict the value of the other. The sample correlation coefficient r measures how strongly related two variables x and y are in a sample.

The coefficient of correlation: A measure of the strength of the linear relationship between two variables. The degree of correlation between two variables X and Y is measured by the coefficient of correlation, or correlation coefficient, denoted by r. The formula for r is

\[ r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right]\left[\sum_{i=1}^{n} (Y_i - \bar{Y})^2\right]}} = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} \]

Interpretation of r: r can take any value from -1.00 to +1.00 inclusive. When it is positive, one variable tends to increase as the other increases. When it is negative, one variable tends to decrease as the other increases. If there is absolutely no relationship between the two variables, r will be zero, i.e. the variables are said to be uncorrelated.
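The two forms of the formula for r can be checked against each other numerically. The following is a minimal Python sketch, not part of the original notes; the function names `pearson_r_deviation` and `pearson_r_sums` are illustrative choices, and the data are the heights and weights of the 12 students from the first example.

```python
from math import sqrt


def pearson_r_deviation(xs, ys):
    """r computed from deviations about the means (first form of the formula)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def pearson_r_sums(xs, ys):
    """r computed from raw sums (second, computational form of the formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Height (cm) and weight (kg) of the 12 students from the first example.
heights = [122, 130, 126, 120, 122, 124, 125, 127, 126, 123, 130, 124]
weights = [25, 40, 32, 23, 27, 35, 34, 34, 32, 28, 38, 30]

r_dev = pearson_r_deviation(heights, weights)
r_sum = pearson_r_sums(heights, weights)
print(round(r_dev, 4), round(r_sum, 4))  # the two forms give the same value
```

Both functions assume that neither variable is constant; otherwise the denominator is zero and r is undefined.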

When r = ±1, the correlation is said to be perfect. The closer the value of r is to either +1 or -1, the stronger the relationship between the two variables; the closer it is to zero, the weaker the relationship.

Characteristics of the coefficient of correlation:
1. The sample coefficient of correlation is identified by the lower-case letter r.
2. It shows the direction and strength of the linear relationship between two variables.
3. It ranges from -1 up to and including +1.
4. A value near 0 indicates there is little association between the variables.
5. A value near +1 indicates a direct or positive association between the variables.
6. A value near -1 indicates an inverse or negative association between the variables.

Scatter diagrams showing perfect positive correlation and perfect negative correlation

[Figure: two scatter diagrams titled Perfect Positive Correlation and Perfect Negative Correlation.]

Scatter diagrams depicting zero, weak and strong correlation

[Figure: scatter diagrams titled r = 0 (No correlation), Weak Negative Correlation, and X and Y strongly linearly related.]

[Figure: scale of the strength and direction of correlation, running from -1.00 (perfect negative correlation) through strong negative correlation, -.50 (moderate negative correlation), weak negative correlation, 0 (no correlation), weak positive correlation, .50 (moderate positive correlation) and strong positive correlation to 1.00 (perfect positive correlation).]

Example: The following data represent the age (in years) and blood pressure (mmHg) of some patients:

Age   25  30  35  30  32  40   45  40  36  35
BP    75  80  85  90  95  85  100  90  85  80

(a) Identify the dependent and independent variables.
(b) Compute the coefficient of correlation between age and blood pressure.

Solution:
(a) Here age is the independent variable and blood pressure is the dependent variable.
(b) Since age is the independent variable, we denote it by X and blood pressure by Y. The correlation coefficient of X and Y is given by

\[ r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\left[\sum (X_i - \bar{X})^2\right]\left[\sum (Y_i - \bar{Y})^2\right]}} = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} \]

Patient   Age (X)   BP (Y)   XY      X^2     Y^2
1         25        75       1875    625     5625
2         30        80       2400    900     6400
3         35        85       2975    1225    7225
4         30        90       2700    900     8100
5         32        95       3040    1024    9025
6         40        85       3400    1600    7225
7         45        100      4500    2025    10000
8         40        90       3600    1600    8100
9         36        85       3060    1296    7225
10        35        80       2800    1225    6400
Total     348       865      30350   12420   75325

\[ r = \frac{10(30350) - (348)(865)}{\sqrt{\left[10(12420) - (348)^2\right]\left[10(75325) - (865)^2\right]}} = \frac{2480}{\sqrt{(3096)(5025)}} = 0.63. \]

The coefficient of correlation between the blood pressure and the age of the patients is 0.63. That is, there is a moderate positive correlation.

Example: From the following data find the association, or correlation, between the values of two currencies, the German mark and the Japanese yen, from 1988 to 1997.

Table 1: Exchange rate of the German mark and the Japanese yen in U.S. dollars

Year   German Mark   Japanese Yen
1988   1.76          128.17
1989   1.88          138.07
1990   1.62          145.00
1991   1.66          134.59
1992   1.56          126.78
1993   1.65          111.20
1994   1.62          102.21
1995   1.50          103.35
1996   1.54          115.87
1997   1.80          130.38

Solution:

Year   Mark (X)   Yen (Y)   (X-XBAR)^2   (Y-YBAR)^2    (X-XBAR)*(Y-YBAR)
1988   1.76       128.17    0.010201     21.233664     0.465408
1989   1.88       138.07    0.048841     210.482064    3.206268
1990   1.62       145.00    0.001521     459.587844    -0.836082
1991   1.66       134.59    0.000001     121.616784    0.011028
1992   1.56       126.78    0.009801     10.355524     -0.318582
1993   1.65       111.20    0.000081     152.819044    0.111258
1994   1.62       102.21    0.001521     455.907904    0.832728
1995   1.50       103.35    0.025281     408.524944    3.213708
1996   1.54       115.87    0.014161     59.166864     0.915348
1997   1.80       130.38    0.019881     46.485124     0.961338
Sum                         0.13129      1946.17976    8.56242

XBAR = 1.659, YBAR = 123.562, SUM 1 = 0.13129, SUM 2 = 1946.17976, SUM 3 = 8.56242

\[ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\left[\sum (X - \bar{X})^2\right]\left[\sum (Y - \bar{Y})^2\right]}} = \frac{8.56242}{\sqrt{(0.13129)(1946.17976)}} = 0.53566 \]

Comment: The coefficient of correlation r = +0.536 between the German mark and the Japanese yen indicates a moderate association. A higher price of the German mark is moderately associated with a higher price of the Japanese yen.

Example: Calculate the coefficient of correlation between the number of sales calls and the number of units sold for the sales calls example given earlier. Comment on the result.

Solution:

Sales Calls (X)   Units Sold (Y)   (X-XBAR)   (Y-YBAR)   (X-Xbar)^2   (Y-Ybar)^2   (X-Xbar)*(Y-Ybar)
14                28               -5.9       -12.8      34.81        163.84       75.52
35                66               15.1       25.2       228.01       635.04       380.52
22                38               2.1        -2.8       4.41         7.84         -5.88
29                70               9.1        29.2       82.81        852.64       265.72
6                 22               -13.9      -18.8      193.21       353.44       261.32
15                27               -4.9       -13.8      24.01        190.44       67.62
17                28               -2.9       -12.8      8.41         163.84       37.12
20                47               0.1        6.2        0.01         38.44        0.62
12                14               -7.9       -26.8      62.41        718.24       211.72
29                68               9.1        27.2       82.81        739.84       247.52

XBAR = 19.9, YBAR = 40.8
SUM 1 = 720.9, SQRT 1 = 26.849581
SUM 2 = 3863.6, SQRT 2 = 62.15786354
SUM 3 = 1541.8, SQRT 1 × SQRT 2 = 1668.912592

r = SUM 3 / (SQRT 1 × SQRT 2) = 1541.8 / 1668.912592 = 0.923835081

Coefficient of correlation is 0.924.
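The calculation table above can be reproduced in a few lines. This is a minimal Python sketch, not part of the original notes; the variable names `sum1`, `sum2` and `sum3` are chosen to mirror the SUM 1, SUM 2 and SUM 3 entries of the table.

```python
from math import sqrt

calls = [14, 35, 22, 29, 6, 15, 17, 20, 12, 29]    # sales calls (X)
units = [28, 66, 38, 70, 22, 27, 28, 47, 14, 68]   # units sold (Y)

n = len(calls)
x_bar = sum(calls) / n   # XBAR
y_bar = sum(units) / n   # YBAR

# Column totals of the calculation table.
sum1 = sum((x - x_bar) ** 2 for x in calls)                         # SUM 1
sum2 = sum((y - y_bar) ** 2 for y in units)                         # SUM 2
sum3 = sum((x - x_bar) * (y - y_bar) for x, y in zip(calls, units)) # SUM 3

r = sum3 / (sqrt(sum1) * sqrt(sum2))

print(f"XBAR = {x_bar:.1f}, YBAR = {y_bar:.1f}")
print(f"SUM 1 = {sum1:.1f}, SUM 2 = {sum2:.1f}, SUM 3 = {sum3:.1f}")
print(f"r = {r:.3f}")
```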

Interpretation: Here the correlation is positive, so we conclude that there is a direct relationship between the two variables. The value 0.924 is close to 1.00, so we conclude that the association between the number of sales calls and the number of units sold is strong.

Coefficient of Determination: It is the square of the correlation coefficient, $r^2$. It measures the proportion of the variation in Y that is explained by the variation in X.

Example: For the sales calls and units sold data, calculate the coefficient of determination $r^2$ and comment.

Solution: We have r = 0.924 (from the solution of the preceding example). Hence

\[ r^2 = (0.924)^2 = 0.854. \]

Comment: This is a proportion or a percent. We can say that 85.4% of the variation in the units sold is explained, or accounted for, by the variation in the number of sales calls.

Example: The city council is considering increasing the number of police in an effort to reduce crime. Before making a final decision, the council asks the Chief of Police to survey other cities of similar size to determine the relationship between the number of police and the number of crimes reported. The Chief gathered the following sample information.

City               A    B    C    D    E    F    G    H
Police             15   17   25   27   17   12   11   22
Number of crimes   17   13   5    7    7    21   19   6

(a) If we want to estimate crimes on the basis of the number of police, which variable is the dependent variable and which is the independent variable?
(b) Draw a scatter diagram.
(c) Determine the coefficient of correlation.
(d) Determine the regression equation and estimate the number of crimes if the number of police is 30.
(e) Interpret these statistical measures.

Solution:
(a) Crime is the dependent variable and police is the independent variable.
(b)

[Figure: Scatter Diagram of Crime (vertical axis, 0 to 25) against Police (horizontal axis, 0 to 30).]

(c)
City    Police (X)   Crime (Y)   X^2    Y^2    XY
A       15           17          225    289    255
B       17           13          289    169    221
C       25           5           625    25     125
D       27           7           729    49     189
E       17           7           289    49     119
F       12           21          144    441    252
G       11           19          121    361    209
H       22           6           484    36     132
Total   146          95          2906   1419   1502

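From the table, we know that

\[ \sum X = 146, \quad \sum Y = 95, \quad \sum X^2 = 2906, \quad \sum Y^2 = 1419, \quad \sum XY = 1502, \quad n = 8. \]

We have

\[ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} = \frac{8(1502) - (146)(95)}{\sqrt{\left[8(2906) - (146)^2\right]\left[8(1419) - (95)^2\right]}} = \frac{-1854}{2120.3217} = -0.8744. \]

(d) The regression equation is $\hat{y} = \hat{a} + \hat{b}x$, where

\[ \hat{b} = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} \quad \text{and} \quad \hat{a} = \frac{\sum Y}{n} - \hat{b}\,\frac{\sum X}{n}. \]

Here

\[ \hat{b} = \frac{-1854}{1932} = -0.9596, \qquad \hat{a} = \frac{95}{8} - (-0.9596)\,\frac{146}{8} = 29.3877. \]

So the estimated regression line is $\hat{y} = 29.3877 - 0.9596X$.

If the number of police were 30, then the estimated number of crimes would be

\[ \hat{y} = 29.3877 - 0.9596 \times 30 = 0.5997. \]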

(e) There is a strong inverse relationship between the number of police and the number of crimes, as we have found r = -0.8744. That is, as the number of police increases, the number of crimes decreases. From the value $\hat{b} = -0.9596$, we can conclude that for each additional police officer, crime is expected to decrease by almost 1 unit.
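The figures in parts (c) and (d) can be checked with a short script. This is a minimal Python sketch, not part of the original notes; the variable names are illustrative. Because it keeps the slope unrounded, the intercept and the prediction differ from the hand calculation in the fourth decimal place.

```python
from math import sqrt

# Police (X) and number of crimes (Y) for cities A to H.
police = [15, 17, 25, 27, 17, 12, 11, 22]
crimes = [17, 13, 5, 7, 7, 21, 19, 6]

n = len(police)
sum_x, sum_y = sum(police), sum(crimes)
sum_x2 = sum(x * x for x in police)
sum_y2 = sum(y * y for y in crimes)
sum_xy = sum(x * y for x, y in zip(police, crimes))

sxy = n * sum_xy - sum_x * sum_y   # numerator shared by r and the slope
sxx = n * sum_x2 - sum_x ** 2
syy = n * sum_y2 - sum_y ** 2

r = sxy / sqrt(sxx * syy)                    # coefficient of correlation
slope = sxy / sxx                            # b-hat
intercept = sum_y / n - slope * sum_x / n    # a-hat
crimes_at_30 = intercept + slope * 30        # estimate for 30 police

print(f"r = {r:.4f}")
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"estimated crimes for 30 police = {crimes_at_30:.4f}")
```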

Linear regression: Linear regression and correlation are two commonly used methods for examining the relationship between quantitative variables and for making predictions. It is necessary to develop an equation to express the relationship between two variables and estimate the value of the dependent variable Y based on a selected value of the independent variable X . The technique used to develop the equation for the straight line and make these predictions is called regression analysis. So, regression is a statistical method to estimate (or predict) the unknown values of one variable (Y ) for specified values of the other variable ( X ). Linear Equation: The general form of a linear equation with one independent variable can be written as

\[ \hat{y} = \hat{a} + \hat{b}x \]

Where:
$\hat{y}$, read as "y hat", is the predicted value of the Y variable for a selected X value.
$\hat{a}$ = the Y-intercept. It is the estimated value of Y when X = 0.
$\hat{b}$ = the slope of the line, or the average change in $\hat{y}$ for each change of one unit (either increase or decrease) in the independent variable X.

Here $\hat{a}$ and $\hat{b}$ are the estimated regression coefficients, or simply the regression coefficients. The formulas for $\hat{b}$ and $\hat{a}$ are

\[ \hat{b} = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}, \qquad \hat{a} = \frac{\sum Y}{n} - \hat{b}\,\frac{\sum X}{n}, \]

where X = independent variable and Y = dependent variable.
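The formulas for the slope and intercept translate directly into code. This is a minimal Python sketch, not part of the original notes; the function names `fit_line` and `predict` are hypothetical helpers chosen for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator   # slope
    a = sum_y / n - b * sum_x / n                    # Y-intercept
    return a, b


def predict(a, b, x):
    """Predicted value of Y for a selected value of X."""
    return a + b * x


# Illustration with made-up points that lie exactly on y = 1 + 2x.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b, predict(a, b, 10))
```

The guard on the denominator reflects the fact that a slope cannot be estimated when every X value is the same.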

Example: A chemical engineer is investigating the effect of process operating temperature on product yield. The study results in the following data:

Temperature, °C (x)   100  110  120  130  140  150  160  170  180  190
Yield, % (y)           45   51   54   61   66   70   74   78   85   89

These pairs of points are plotted in the figure below. Such a display is called a scatter diagram. Examination of this scatter diagram indicates that there is a strong relationship between yield and temperature.

[Figure: Scatter diagram of Yield (vertical axis, 0 to 100) against Temperature (horizontal axis, 0 to 200).]

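The following quantities may be computed:

\[ n = 10, \quad \sum_{i=1}^{10} x_i = 1450, \quad \sum_{i=1}^{10} y_i = 673, \quad \bar{x} = 145, \quad \bar{y} = 67.3, \]

\[ \sum_{i=1}^{10} x_i^2 = 218{,}500, \quad \sum_{i=1}^{10} y_i^2 = 47{,}225, \quad \sum_{i=1}^{10} x_i y_i = 101{,}570. \]

These give

\[ \hat{b} = 0.483 \quad \text{and} \quad \hat{a} = \bar{y} - \hat{b}\bar{x} = 67.3 - (0.483)(145) = -2.739, \]

where the value of $\hat{a}$ is obtained with the unrounded slope 0.48303.

The fitted simple linear regression equation or model is

\[ \hat{y} = -2.739 + 0.483x. \]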
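The summary quantities and the fitted coefficients can be recomputed from the raw data. This is a minimal Python sketch, not part of the original notes; the variable names are illustrative.

```python
# Temperature in degrees C (x) and yield in percent (y) from the table above.
temps = list(range(100, 200, 10))                  # 100, 110, ..., 190
yields = [45, 51, 54, 61, 66, 70, 74, 78, 85, 89]

n = len(temps)
sum_x = sum(temps)
sum_y = sum(yields)
sum_x2 = sum(x * x for x in temps)
sum_y2 = sum(y * y for y in yields)
sum_xy = sum(x * y for x, y in zip(temps, yields))

# Least squares slope and intercept from the summary quantities.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

print(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(f"b = {b:.3f}, a = {a:.3f}")
```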

Example: The following table gives experimental data for force (Y, in N) and velocity (X, in m/s) for an object suspended in a wind tunnel.

Velocity (X)   10   20   30    40    50    60     70    80
Force (Y)      24   68   378   552   608   1218   831   1452

(a) Use linear regression to determine the coefficients $\hat{a}$ and $\hat{b}$.
(b) Estimate the force when the velocity is 55 m/s.

Solution: Here n = 8.

Calculating table

n       x     y      x^2     xy
1       10    24     100     240
2       20    68     400     1360
3       30    378    900     11340
4       40    552    1600    22080
5       50    608    2500    30400
6       60    1218   3600    73080
7       70    831    4900    58170
8       80    1452   6400    116160
Total   360   5131   20400   312830

Now we have $\bar{x} = 45$, $\bar{y} = 641.375$,

\[ \hat{b}_{y/x} = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} = 19.5083 \]

and

\[ \hat{a} = \bar{y} - \hat{b}\bar{x} = -236.50. \]

Hence $\hat{y} = -236.50 + 19.5083x$.

(b) The estimated value of the force when the velocity is 55 m/s is given by

\[ \hat{y} = -236.50 + 19.5083 \times 55 = 836.4583 \text{ N}. \]
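The wind tunnel fit can be verified in the same way. This is a minimal Python sketch, not part of the original notes; `least_squares` is a hypothetical helper name.

```python
def least_squares(xs, ys):
    """Return (a, b) for the fitted line y-hat = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


velocity = [10, 20, 30, 40, 50, 60, 70, 80]          # X, in m/s
force = [24, 68, 378, 552, 608, 1218, 831, 1452]     # Y, in N

a, b = least_squares(velocity, force)
force_at_55 = a + b * 55   # estimated force at a velocity of 55 m/s

print(f"a = {a:.2f}, b = {b:.4f}")
print(f"estimated force at 55 m/s = {force_at_55:.4f} N")
```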

Example: For the data involving sales calls and units sold, obtain the regression equation. Interpret it. What will be the estimated sales if 20 calls are made?

Solution:

Sales representative   Sales calls (X)   Units sold (Y)   X^2    XY     Y^2
A                      14                28               196    392    784
B                      35                66               1225   2310   4356
C                      22                38               484    836    1444
D                      29                70               841    2030   4900
E                      6                 22               36     132    484
F                      15                27               225    405    729
G                      17                28               289    476    784
H                      20                47               400    940    2209
I                      12                14               144    168    196
J                      29                68               841    1972   4624
Total                  199               408              4681   9661   20510

\[ \hat{b} = \frac{10(9661) - (199)(408)}{10(4681) - (199)^2} = 2.1387 \]

\[ \hat{a} = \frac{408}{10} - 2.1387 \times \frac{199}{10} = -1.7601 \]

Hence, the regression equation is

\[ \hat{Y} = -1.7601 + 2.1387X. \]

So, if a sales representative makes 20 calls in a month, the expected number of units sold, or estimated number of sales, is

\[ \hat{Y} = -1.7601 + 2.1387(20) = 41.0139. \]

Interpretation: The value $\hat{b} = 2.1387$ indicates that for each additional sales call made during the month, sales are expected to increase by 2.1387 units. That is, for each sales call made, about two units are sold.

Drawing the line of regression: The least squares equation $\hat{Y} = -1.7601 + 2.1387X$ is used to determine the least squares line of regression to be drawn on the scatter diagram. The first sales representative in the sample is A, who made 14 sales calls. The estimated number of units sold for A is 28.1817, found by $\hat{Y} = -1.7601 + 2.1387X = -1.7601 + 2.1387(14)$. The other points on the regression line can be determined by substituting the particular values of X into the regression equation.

Sales Calls (X)   Equation                     Estimated Monthly Sales (Y)
14                Y = -1.7601 + 2.1387(14)     28.1817
35                Y = -1.7601 + 2.1387(35)     73.0944
22                Y = -1.7601 + 2.1387(22)     45.2913
29                Y = -1.7601 + 2.1387(29)     60.2622
6                 Y = -1.7601 + 2.1387(6)      11.0721
15                Y = -1.7601 + 2.1387(15)     30.3204
17                Y = -1.7601 + 2.1387(17)     34.5978
20                Y = -1.7601 + 2.1387(20)     41.0139
12                Y = -1.7601 + 2.1387(12)     23.9043
29                Y = -1.7601 + 2.1387(29)     60.2622

The point X = 14, Y = 28.1817 is located by moving to 14 on the X-axis and then going vertically to 28.1817. All the other points are located in the same way and connected to give the straight line.
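The column of estimated monthly sales can be generated by substituting each X into the regression equation. This is a minimal Python sketch, not part of the original notes; it uses the rounded coefficients quoted in the text, and the constant names `A_HAT` and `B_HAT` are illustrative.

```python
# Rounded coefficients of the fitted line, as quoted in the text.
A_HAT = -1.7601   # intercept
B_HAT = 2.1387    # slope

calls = [14, 35, 22, 29, 6, 15, 17, 20, 12, 29]   # sales calls (X)

# Point on the regression line for each observed number of calls.
estimated_sales = [round(A_HAT + B_HAT * x, 4) for x in calls]

for x, y_hat in zip(calls, estimated_sales):
    print(f"X = {x:2d}  Y-hat = {y_hat:.4f}")
```

Since a straight line is determined by two points, any two of these fitted points are enough to draw the line on the scatter diagram; the remaining points serve as a check.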

[Figure: three scatter diagrams titled Positive Linear Relationship, Negative Linear Relationship, and No Relationship.]