x | y
Age of a plant | Quantity of fruit produced
Height of students | Weight of students
Weight at the end of a spring | Length of the spring
Diameter of stem of a plant | Average length of leaf of the plant
No. of hours spent studying | Marks achieved
Time | Temperature of cooling object
Scatter Diagram
The most common and convenient method of displaying a set of bivariate data is by means of a scatter diagram.
We treat the bivariate pairs as a set of (x, y) coordinates and plot them on a graph to obtain a set of points. The scatter diagram will reveal the relationship between the two variables.
Eg 2 The marks of a class of 10 students in a Mathematics examination are given in the table below.
Student | A | B | C | D | E | F | G | H | I | J
x (mark in Paper 1) | 12 | 84 | 50 | 42 | 33 | 50 | 69 | 81 | 50 | 15
y (mark in Paper 2) | 73 | 40 | 31 | 83 | 42 | 60 | 63 | 59 | 92
GC:
Step 1: Enter data
<STAT><EDIT><ENTER>
Before plotting, go to the Y= screen and unbold (deselect) all "=" signs.
Step 2: Plot the data
<STAT PLOT>
• Set Plot to "ON": <1:Plot1...><ENTER>, <ON><ENTER>
• Choose the type of graph "scatter plot"
• Xlist: L1 (x-coordinates)
• Ylist: L2 (y-coordinates)
• Mark: any
• <TRACE>
• <ZOOM><9:ZoomStat> (to view the full plot)
[Figure: scatter diagram of the Eg 2 marks, axes from 0 to 100]
Do it yourself
Qn 1: The height and weight of a class of 10 students are given in the table below:
Student | A | B | C | D | E | F | G | H | I | J
x (height in m) | 1.5 | 1.58 | 1.6 | 1.61 | 1.65 | 1.72 | 1.73 | 1.78 | 1.8 | 1.85
y (weight in kg) | 53 | 57 | 62 | 65 | 66 | 70 | 72 | 75 | 90 | 85
Soln:
Analysis of Scatter Diagram
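[Figure: points lying close to a line that rises from left to right]
x and y related in this way are said to have a positive correlation (linear relationship), i.e. as x gets larger, y gets larger.

[Figure: points lying close to a line that falls from left to right]
x and y related in this way are said to have a negative correlation (linear relationship), i.e. as x gets larger, y gets smaller.

[Figure: points scattered with no clear pattern]
No correlation (no clear relationship).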
We will only be dealing with linear relationships. If points in the scatter diagram seem to lie near a
Note:
• Scatter diagrams are used only for quantitative variables (e.g. height, mass, counts).
• Interpreting the strength of a relationship solely from the scatter diagram is subjective, and it can be deceiving when different scales are used for the axes.
To measure the degree of linear relationship between two variables x and y (which is called correlation), a quantity called the product-moment correlation coefficient, r, will be needed.
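The estimated product-moment correlation coefficient of a sample is given by:
$$ r = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2 \sum (y-\bar{y})^2}} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}} $$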
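The formula for r can be checked with a minimal Python sketch (standard library only) that builds r from the same sums Σx, Σy, Σx², Σy² and Σxy. The function name `pmcc` is an illustrative choice, not something defined in these notes; the GC remains the intended tool.

```python
import math

def pmcc(xs, ys):
    """Sample product-moment correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    # Numerator: sum(xy) - (sum x)(sum y)/n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    # Denominator terms: sum(x^2) - (sum x)^2/n and sum(y^2) - (sum y)^2/n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)

# Points on a rising line give r = 1; points on a falling line give r = -1
print(pmcc([1, 2, 3, 4], [2, 4, 6, 8]))   # 1.0
print(pmcc([1, 2, 3, 4], [8, 6, 4, 2]))   # -1.0
```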
r > 0 indicates a positive correlation.
E.g. The correlation between the two sets of variables is said to be curvilinear. There is no linear correlation, and such a scatter diagram will give a very low value of r (i.e. r ≈ 0). But there is a curvilinear (e.g. quadratic) correlation between the variables.
6. r is a measure of the degree of scatter, and r is independent of the units in which the data is measured: r is unaffected by changes in the scale of the axes and changes of units of the variables.
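A short Python sketch can illustrate the curvilinear example and property 6: converting hypothetical heights from metres to centimetres and weights from kilograms to pounds leaves r unchanged, while a perfect quadratic relationship on symmetric x-values gives r = 0. The data and the helper `pmcc` are hypothetical, for illustration only.

```python
import math

def pmcc(xs, ys):
    """Sample product-moment correlation coefficient r."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical heights (m) and weights (kg), for illustration only
heights_m = [1.50, 1.60, 1.65, 1.72, 1.80]
weights_kg = [53, 62, 66, 72, 85]

# Change units: metres -> centimetres, kilograms -> pounds (1 kg is about 2.20462 lb)
heights_cm = [100 * h for h in heights_m]
weights_lb = [2.20462 * w for w in weights_kg]

r_original = pmcc(heights_m, weights_kg)
r_converted = pmcc(heights_cm, weights_lb)
print(abs(r_original - r_converted) < 1e-9)   # True: r does not depend on the units

# A perfect quadratic (curvilinear) relationship with no linear component
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]
print(pmcc(xs, ys))   # 0.0
```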
[Figure: scatter diagrams]
Eg 3a
The marks in Mathematics (x) and Chemistry (y) obtained by ten randomly chosen JC 2 students were taken and the summarised data were given as follows:
Find the product moment correlation coefficient r and comment on the value of r obtained.
Soln: r =
Eg 3b
The data in the above example is given in the table below instead of the summarised statistics. Find the product moment correlation coefficient r and comment on the value of r obtained.
x | 18 | 20 | 30 | 40 | 46 | 54 | 60 | 80 | 88 | 92
y | 42 | 54 | 60 | 54 | 62 | 68 | 80 | 66 | 80 | 100
Soln:
Use of GC to obtain r
Step 1: Key in the data using <STAT> <EDIT>
Step 2: Turn diagnostics on
<CATALOG> <DiagnosticOn>
Step 3: <STAT> <CALC> <8:LinReg(a+bx)> <LIST> <NAMES> <L1> <,> <LIST> <NAMES> <L2> <ENTER>
LinReg
y = a + bx
a ≈ 38.738
b ≈ 0.52770
r² ≈ 0.744
r = 0.8626339159
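As a cross-check on the GC value, the sketch below computes the summary statistics and r for the Eg 3b table in plain Python. It assumes the two hard-to-read table entries are 46 and 92; these readings reproduce Σx = 528 and Σxy = 38640, the figures used in Eg 4b.

```python
import math

# Eg 3b: Mathematics marks (x) and Chemistry marks (y) of ten students
maths = [18, 20, 30, 40, 46, 54, 60, 80, 88, 92]
chem = [42, 54, 60, 54, 62, 68, 80, 66, 80, 100]
n = len(maths)

# Summary statistics, i.e. the "summarised data" form used in Eg 3a
sum_x, sum_y = sum(maths), sum(chem)
sum_x2 = sum(x * x for x in maths)
sum_y2 = sum(y * y for y in chem)
sum_xy = sum(x * y for x, y in zip(maths, chem))

s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
r = s_xy / math.sqrt(s_xx * s_yy)

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)   # 528 666 34464 46820 38640
print(round(r, 4))                             # 0.8626
```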
Do it yourself
Qn 2: The height and weight of a class of 10 students are given in the table below.
Find the product moment correlation coefficient r and comment on the value of r obtained.
Student | A | B | C | D | E | F | G | H | I | J
x (height in m) | 1.5 | 1.58 | 1.6 | 1.63 | 1.65 | 1.72 | 1.73 | 1.78 | 1.8 | 1.85
y (weight in kg) | 53 | 57 | 62 | 65 | 66 | 70 | 72 | 75 | 90 | 85
Soln:
3. Regression Lines
r measures how well the data fits a linear model. If the fit is good, we can consider formulating an equation of a straight line to model the relationship. This straight line is called a regression line.
(a) For any bivariate set of data, connecting variables x and y, there are always two uniquely defined regression lines.
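Equation of the least squares regression line of y on x
y = a + bx
y = a + bx is obtained by finding values of a and b such that Σe² is minimum. (e is the difference between the observed and expected y, also known as residuals.)
$$ b = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (x-\bar{x})^2} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} \quad \text{and} \quad a = \bar{y} - b\bar{x} $$
and $y - \bar{y} = b(x - \bar{x})$ (in MF15).
Thus, $y = \bar{y} + b(x - \bar{x})$.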
Note:
• $\bar{x} = \frac{\sum x}{n}$, $\bar{y} = \frac{\sum y}{n}$
• Regression line passes through $(\bar{x}, \bar{y})$, the mean of the set of bivariate data.
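The least squares idea can be checked numerically. This sketch, on a small hypothetical data set, fits y = a + bx with the formulas for a and b, confirms that nudging a or b away from the fitted values only increases Σe², and checks that the line passes through (x̄, ȳ). The helper names are illustrative.

```python
def fit_y_on_x(xs, ys):
    """Least squares regression line of y on x: returns (a, b) for y = a + bx."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

def sum_sq_residuals(a, b, xs, ys):
    """Sum of e^2, where e = observed y - expected y = y - (a + bx)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b = fit_y_on_x(xs, ys)
best = sum_sq_residuals(a, b, xs, ys)

# Moving a and/or b away from the least squares values increases the sum of e^2
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05), (0.1, -0.05)]:
    assert sum_sq_residuals(a + da, b + db, xs, ys) > best

print(round(a, 2), round(b, 2), round(best, 2))   # 2.2 0.6 2.4
# The line passes through (x_bar, y_bar) = (3, 4)
print(abs(a + b * 3 - 4) < 1e-12)                 # True
```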
Eg 4a The marks in Mathematics (x) and Chemistry (y) obtained by ten randomly chosen JC 2 students were taken and the data were given as follows:
Find the equation of the estimated regression line of y on x.
Soln:
Use of GC to obtain regression line
Step 1: Key in the data using <STAT> <EDIT>
Step 2: Turn diagnostics on
<CATALOG> <DiagnosticOn> <ENTER>
Step 3: <STAT> <CALC> <8:LinReg(a+bx)> <LIST> <NAMES> <L1> <,> <LIST> <NAMES> <L2> <ENTER>
LinReg
y = a + bx
a ≈ 38.738
b ≈ 0.52770
r² ≈ 0.744
r = 0.8626339159
Eg 4b Suppose that the table in Eg 4a is not given and the data is summarised as follows:
$$ \sum xy - \frac{\sum x \sum y}{n} = 38640 - \frac{(528)(666)}{10} = 3475.2 $$
(b) Slope: For an increase of 1 in the Mathematics score, there is an estimated increase of 0.528 in the Chemistry score.
y-intercept: A student is estimated to score 38.7 for Chemistry when he/she scores 0 for Mathematics.
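$$ b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{38640 - \frac{(528)(666)}{10}}{34464 - \frac{(528)^2}{10}} = \frac{3475.2}{6585.6} \approx 0.528 $$
$$ a = \bar{y} - b\bar{x} = 66.6 - b(52.8) \approx 38.7 $$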
Since r ≈ 0.863, it indicates a high positive linear correlation between Mathematics and Chemistry scores. Hence, the predicted score is reliable.
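The Eg 4b working can be reproduced in a few lines of Python. One assumption: the value Σx² = 34464 is not legible in the summary above; it is computed here from the Eg 3b / Eg 4a table (x = 18, 20, 30, 40, 46, 54, 60, 80, 88, 92).

```python
# Eg 4b summarised data for Mathematics (x) and Chemistry (y)
n = 10
sum_x, sum_y = 528, 666
sum_xy = 38640
sum_x2 = 34464   # assumption: sum of x^2, computed from the Eg 3b table

s_xy = sum_xy - sum_x * sum_y / n   # numerator of b: 3475.2
s_xx = sum_x2 - sum_x ** 2 / n      # denominator of b: 6585.6
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n       # a = y_bar - b * x_bar

print(f"y = {a:.1f} + {b:.3f}x")    # y = 38.7 + 0.528x
```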
Do it yourself
Qn 3:
The no. of hours spent studying for a particular subject in a week and the marks obtained for a test for 10 students are given in the table below:
x (no. of hours per week) | 5 | 7 | 8 | 10 | 11 | 12 | 13 | 15 | 20 | 21
(a) Find the equation of the estimated regression line of y on x.
(b) Interpret the slope and y-intercept in the context of the question.
(c) Estimate the no. of hours a student needs to spend in order to achieve a mark of 80 in the test. Comment on the reliability of the value obtained.
Soln:
Equation of the least squares regression line of x on y
x = c + dy
[Figure: least squares regression line of x on y, passing through $(\bar{x}, \bar{y})$]
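x = c + dy is obtained by finding values of c and d such that Σe² is minimum.
(e is the difference between the observed and expected x, also known as residuals.)
$$ d = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (y-\bar{y})^2} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}} \quad \text{and} \quad c = \bar{x} - d\bar{y} $$
and $x - \bar{x} = d(y - \bar{y})$ (in MF15).
Thus, $x = \bar{x} + d(y - \bar{y})$.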
Note:
• $\bar{x} = \frac{\sum x}{n}$, $\bar{y} = \frac{\sum y}{n}$
• Regression line passes through $(\bar{x}, \bar{y})$, the mean of the set of bivariate data.
• d is known as the estimated regression coefficient (slope of graph).
[Figure: regression line of y on x and regression line of x on y]
The larger the numerical value of r, the nearer the lines approach coincidence and
No linear correlation: r = 0
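To see why the two lines draw together as |r| grows, note that the x-on-y line x = c + dy, redrawn in the x-y plane as y = (x − c)/d, has slope 1/d, while the y-on-x line has slope b; their ratio is bd = r². The sketch below compares the two slopes for two hypothetical data sets, one nearly linear and one widely scattered. The function name is illustrative.

```python
import math

def regression_slopes(xs, ys):
    """Return (b, m, r): b is the slope of the y-on-x line, m = 1/d is the slope
    of the x-on-y line x = c + dy when redrawn as y = (x - c)/d, and r is the
    product-moment correlation coefficient."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    b = s_xy / s_xx
    d = s_xy / s_yy
    return b, 1 / d, s_xy / math.sqrt(s_xx * s_yy)

xs = [1, 2, 3, 4, 5]
strong = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical: nearly on a straight line
weak = [3, 7, 2, 8, 4]                # hypothetical: widely scattered

b_s, m_s, r_s = regression_slopes(xs, strong)
b_w, m_w, r_w = regression_slopes(xs, weak)

# b / m = b * d = r^2, so the two slopes agree only when |r| is close to 1
print(round(r_s, 3), round(b_s, 3), round(m_s, 3))
print(round(r_w, 3), round(b_w, 3), round(m_w, 3))
```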
Eg 5a Find the regression lines of y on x and x on y for the data below and also calculate the product moment correlation coefficient.
y | 10 | 14 | 12 | 13 | 15 | 12 | 13
Soln:
Step 1: Place x values in L1 and y values in L2.
<STAT> <EDIT>
Step 2: Turn diagnostics on (to get the product moment correlation coefficient).
<CATALOG> <DiagnosticOn> <ENTER>
LinReg(a+bx) L1, L2, Y1
LinReg
y = a + bx
a = 11.70403587
b = 0.1860986547
r² ≈ 0.1430
r ≈ 0.3782
y = 11.7 + 0.186x
Step 4: To get the regression line of x on y.
<STAT> <CALC> <8:LinReg(a+bx)> <LIST> <NAMES> <L2> <,> <LIST> <NAMES> <L1> <,> <VARS> <Y-VARS> <1:FUNCTION> <Y2> <ENTER>
LinReg(a+bx) L2, L1, Y2
LinReg
y = a + bx
a = −4.342592593
b = 0.7685185185
r² ≈ 0.1430
r ≈ 0.3782
x = −4.34 + 0.769y
$$ y = \frac{x + 4.34}{0.769} $$
(store this as Y2)
Note: All the above regression lines are stored in Y1 and Y2 respectively so that the regression lines can be obtained graphically (not really a must-do).
Soln:
Regression line of y on x is
Regression line of x on y is
Eg 5b Find the regression lines of y on x and x on y for the data below and also calculate the product moment correlation coefficient.
$\sum x = 38$, $\sum y = 89$, $\sum x^2 = 270$, $\sum y^2 = 1147$, $\sum xy = 495$, $n = 7$
Soln:
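$\bar{x} = \frac{38}{7} \approx 5.43$, $\bar{y} = \frac{89}{7} \approx 12.7$

$$ b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{495 - \frac{(38)(89)}{7}}{270 - \frac{38^2}{7}} \approx 0.186 $$

$$ d = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}} = \frac{495 - \frac{(38)(89)}{7}}{1147 - \frac{89^2}{7}} \approx 0.769 $$

$$ r = \frac{495 - \frac{(38)(89)}{7}}{\sqrt{\left(270 - \frac{38^2}{7}\right)\left(1147 - \frac{89^2}{7}\right)}} \approx 0.378 $$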
(Compare these answers with those you obtained using GC.)
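For comparison with the hand working and the GC, here is a minimal Python sketch from the Eg 5b summary statistics (reading Σx² as 270, which reproduces the GC intercept a = 11.70403587 in Eg 5a). It also checks that the product of the two regression slopes equals r².

```python
import math

# Eg 5b summarised data
n = 7
sum_x, sum_y = 38, 89
sum_x2, sum_y2, sum_xy = 270, 1147, 495

s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
x_bar, y_bar = sum_x / n, sum_y / n

b = s_xy / s_xx          # slope of the regression line of y on x
a = y_bar - b * x_bar
d = s_xy / s_yy          # slope of the regression line of x on y
c = x_bar - d * y_bar
r = s_xy / math.sqrt(s_xx * s_yy)

print(f"y = {a:.1f} + {b:.3f}x")      # y = 11.7 + 0.186x
print(f"x = {c:.2f} + {d:.3f}y")      # x = -4.34 + 0.769y
print(f"r = {r:.3f}")                 # r = 0.378
print(math.isclose(b * d, r ** 2))    # True
```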
Eg 6
Soln:
[Figure: scatter diagram with the regression line of y on x and an outlying point $(x_1, y_1)$]
• Identify the outlier data pair $(x_1, y_1)$.
• Remove the data pair $(x_1, y_1)$ from the GC.
• Recalculate the correlation coefficient for the revised data.
• Recalculate the line of regression of y on x for the revised data.
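The four outlier steps can be sketched in Python on hypothetical data. Here the outlier is picked out as the pair with the largest residual from the y-on-x line; that rule is an assumption for illustration, since in practice the outlier is usually spotted from the scatter diagram.

```python
import math

def fit(xs, ys):
    """Least squares line y = a + bx and correlation r; returns (a, b, r)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b, s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: roughly y = 2x, except for one outlying pair (4, 20.0)
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [2.0, 4.1, 5.9, 20.0, 10.2, 11.8, 14.1, 16.0, 18.1]

a, b, r = fit(xs, ys)

# Identify the outlier: here taken as the pair with the largest residual
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
k = max(range(len(xs)), key=lambda i: abs(residuals[i]))

# Remove it, then recalculate r and the regression line of y on x
xs_rev = xs[:k] + xs[k + 1:]
ys_rev = ys[:k] + ys[k + 1:]
a_rev, b_rev, r_rev = fit(xs_rev, ys_rev)

print("outlier:", (xs[k], ys[k]))          # outlier: (4, 20.0)
print(round(r, 3), "->", round(r_rev, 3))
print(f"revised line: y = {a_rev:.2f} + {b_rev:.3f}x")
```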
4. Interpolation and Extrapolation
Once the regression lines are found, we can use them for interpolation.
Extrapolation beyond the range of the sample data should be used with caution, as the relationship between x and y may
Eg 7 (continued from Eg 5a) In the above example 5a, find the value of
(i) y when x = 5 (interpolation: within the range of x)
(ii) x when y = 5 (extrapolation: outside the range of y)
Soln:
(i) From GC, when x = 5, y = 12.6, using the y on x regression line (Y1 graph).
(ii) From GC, when y = 5, x = −0.5, using the x on y regression line (Y2 graph).
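The GC readings in Eg 7 can be confirmed with exact arithmetic. This sketch uses Python's `fractions.Fraction` with the Eg 5b summary statistics (n = 7, Σx = 38, Σy = 89, Σx² = 270, Σy² = 1147, Σxy = 495), which describe the same data as Eg 5a; it shows that the x-on-y line gives exactly x = −0.5 when y = 5.

```python
from fractions import Fraction

# Eg 5b summary statistics for the Eg 5a data
n = 7
sum_x, sum_y = Fraction(38), Fraction(89)
sum_x2, sum_y2, sum_xy = Fraction(270), Fraction(1147), Fraction(495)

s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
x_bar, y_bar = sum_x / n, sum_y / n

b = s_xy / s_xx          # y on x: y = a + bx
a = y_bar - b * x_bar
d = s_xy / s_yy          # x on y: x = c + dy
c = x_bar - d * y_bar

y_at_x5 = a + b * 5      # (i) interpolation with the y-on-x line
x_at_y5 = c + d * 5      # (ii) extrapolation with the x-on-y line (data have 10 <= y <= 15)

print(round(float(y_at_x5), 1), float(x_at_y5))   # 12.6 -0.5
```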
Eg 8 The ages, x years, and heights, y cm, of 10 boys were given as follows:
Soln:
(i) Linear correlation coefficient between x and y:
$$ r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}, \qquad n = 10 $$
(ii) Eqn. of regression line of y on x:
(iii)
Eg 9
The average densities of blackbirds (in pairs per thousand hectares) over very large areas of farmland and of woodland are shown, for the years 1976 to 1982, in the table below.
Year | 1976 | 1977 | 1978 | 1979 | 1980 | 1981 | 1982
Soln: (i)
Manual method
• y = 11.7 + 0.226x
As extrapolation is being carried out in this case, the linear relationship may not hold outside the range of values in the data. Hence, the estimate is unreliable.
Do it yourself
Qn 4:
The no. of hours spent studying for a particular subject in a week and the marks obtained for a test for 10 students are given in the table below:
Student | A | B | C | D | E | F | G | H | I | J
x (no. of hours per week) | 5 | 7 | 8 | 10 | 11 | 12 | 13 | 15 | 20 | 21
y (mark) | 55 | 60 | 62 | 63 | 66 | 73 | 74 | 75 | 84 | 89
(a) Estimate the no. of hours a student needs to spend in order to achieve full marks in the test. Comment on the reliability of the value obtained.
Soln:
Obtain the least squares estimates for α and β using an equation of the form
(i) $y = \alpha + \beta \log_{10} x$ and
(ii) $y = \alpha + \beta x^2$
as a fit for the set of data shown above.
Determine which equation is a better fit, giving reasons to support your answer.
Soln: (i) $y = \alpha + \beta \log_{10} x$
Key in the data for x, y and z, where $z = \log_{10} x$, into L1, L2 and L3 respectively using <STAT> <EDIT> (L3 = log(L1)).
[GC screen: lists L1, L2 and L3]
(ii) $y = \alpha + \beta x^2$
Key in the data for x, y and z, where $z = x^2$, into L1, L2 and L3 respectively using <STAT> <EDIT> (L3 = L1²).
[GC screen: lists L1, L2 and L3]
Using GC,
Since the correlation coefficient for part (i) is larger than that for part (ii), there is a stronger positive linear correlation between y and $\log_{10} x$. Therefore, $y = \alpha + \beta \log_{10} x$ is a better fit.
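The same comparison can be sketched in Python: transform x, fit a straight line of y on the transformed variable z, and compare the correlation coefficients. The data below are hypothetical, generated exactly from y = 2 + 3 log₁₀ x so that model (i) should fit perfectly; the helper name `fit` is illustrative.

```python
import math

def fit(zs, ys):
    """Least squares line y = alpha + beta*z; returns (alpha, beta, r)."""
    n = len(zs)
    z_bar, y_bar = sum(zs) / n, sum(ys) / n
    s_zy = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
    s_zz = sum((z - z_bar) ** 2 for z in zs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    beta = s_zy / s_zz
    alpha = y_bar - beta * z_bar
    return alpha, beta, s_zy / math.sqrt(s_zz * s_yy)

# Hypothetical data generated exactly from y = 2 + 3*log10(x)
xs = [1, 10, 100, 1000]
ys = [2, 5, 8, 11]

# (i) y = alpha + beta*log10(x): regress y on z = log10(x)
alpha1, beta1, r1 = fit([math.log10(x) for x in xs], ys)
# (ii) y = alpha + beta*x^2: regress y on z = x^2
alpha2, beta2, r2 = fit([x ** 2 for x in xs], ys)

print(round(alpha1, 6), round(beta1, 6), round(r1, 6))   # 2.0 3.0 1.0
print(round(r2, 3))
print("model (i) fits better" if r1 > r2 else "model (ii) fits better")
```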
6. Miscellaneous Examples
Eg 11
A random sample of eight pairs of values of x and y is used to obtain the following equations of the regression lines of y on x and of x on y, respectively.
$$ y = -\frac{7}{10}x + \frac{151}{10}, \qquad x = -\frac{7}{6}y + 20 $$
Seven pairs of data are given in the table.
x | 10 | 11 | 12 | 11 | 17 | 14 | 19
y | 9 | 8 | 7 | 6 | 5 | 4 | 1
Find the 8th pair of values of (x, y). Determine the value of the product moment correlation coefficient and comment on what its value implies about the 2 regression lines given above.
Let y' be the value obtained by substituting a sample value of x into the equation of the regression line of y on x. Evaluate y' for each of the eight values of x and verify that $\sum (y - y')^2 = 8.8$.
For each of the sample values of x, y'' is given by $y'' = a + bx$, where $a \neq \frac{151}{10}$ and $b \neq -\frac{7}{10}$. What can you say about the value of $\sum (y - y'')^2$?
Soln:
$$ y = -\frac{7}{10}x + \frac{151}{10} \qquad \text{......(1)} $$
$$ x = -\frac{7}{6}y + 20 \qquad \text{......(2)} $$
Using GC, r =
Since r ≈ −0.904, which is very close to −1, it indicates a high negative linear correlation between x and y. Hence the regression lines are very close.
x | 10 | 11 | 12 | 11 | 17 | 14 | 19 | 10
y | 9 | 8 | 7 | 6 | 5 | 4 | 1 | 8
y' = −0.7x + 15.1 | 8.1 | 7.4 | 6.7 | 7.4 | 3.2 | 5.3 | 1.8 | 8.1
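The whole of Eg 11 can be checked with exact fractions. The sketch finds (x̄, ȳ) as the intersection of the two regression lines, recovers the 8th pair from the totals 8x̄ and 8ȳ, takes r = −√(bd), and confirms Σ(y − y')² = 8.8. The alternative line y'' = 15 − 0.7x is an arbitrary example, chosen only to show that another line gives a larger sum of squares.

```python
from fractions import Fraction

# Eg 11 regression lines: (1) y = -7/10 x + 151/10 and (2) x = -7/6 y + 20
b, a = Fraction(-7, 10), Fraction(151, 10)   # y on x: y = a + bx
d, c = Fraction(-7, 6), Fraction(20)         # x on y: x = c + dy

# Both lines pass through (x_bar, y_bar). Substituting (1) into (2):
# x = c + d(a + bx)  =>  x(1 - db) = c + da
x_bar = (c + d * a) / (1 - d * b)
y_bar = a + b * x_bar

# Seven given pairs; the 8th pair makes the totals equal 8*x_bar and 8*y_bar
xs = [10, 11, 12, 11, 17, 14, 19]
ys = [9, 8, 7, 6, 5, 4, 1]
n = 8
x8 = n * x_bar - sum(xs)
y8 = n * y_bar - sum(ys)

# r^2 = b*d, and r has the same sign as the slopes (negative here)
r = -float(b * d) ** 0.5

# Residuals about the y-on-x line, using all eight pairs
xs_all, ys_all = xs + [x8], ys + [y8]
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs_all, ys_all))

# Another line, e.g. y'' = 15 - 7/10 x, gives a larger sum of squares
sse_other = sum((y - (Fraction(15) + b * x)) ** 2 for x, y in zip(xs_all, ys_all))

print(f"(x_bar, y_bar) = ({x_bar}, {y_bar}), 8th pair = ({x8}, {y8})")
# (x_bar, y_bar) = (13, 6), 8th pair = (10, 8)
print(f"r = {r:.3f}, sum (y - y')^2 = {float(sse)}, other line: {float(sse_other)}")
# r = -0.904, sum (y - y')^2 = 8.8, other line: 8.88
```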
Eg 12
The daily rate charged by a car-hire firm varies with the length of the hire period. The firm's
x (days) |
Daily rate ($y) | 149 | 119 | 115 | 112 | 109 | 105 | 103 | 101
For the appropriate model, calculate the least squares estimates of a and b. Find also the product moment correlation coefficient and comment on the suitability of the model.
Soln:
• Enter the data into the GC as two lists (say x and y).
• With the command DiagnosticOn on the Home Screen, find any regression equation.
(Follow the previous example to find the regression equation.)
Your screenshot should look like this:
LinReg
y = ax + b
a = −0.4986649635
b = 128.5658301
r² ≈ 0.3175
r ≈ −0.5634
(i) The scatter plot of y and x shows that the relationship between y and x is non-linear. Moreover, the r value indicates a low negative linear correlation. Hence, the regression line of y on x is not suitable.
(ii) It can easily be identified as C, since y tends to a limit for larger values of x.
Take $z = \frac{1}{x}$, so that $y = a + bz$. Draw the regression line of y on z.
The screen should look like this:
b ≈ 47.87
r² ≈ 0.9838
r ≈ 0.992
Now r ≈ 0.992, which is close to 1. Therefore there is a very high positive linear correlation, which implies that the model is suitable.
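The transformation step in Eg 12 can be sketched on hypothetical data. It assumes, as the solution suggests, that model C has the form y = a + b/x, so regressing y on z = 1/x should give a straight line. The data are invented from y = 100 + 50/x for illustration only.

```python
import math

def fit(zs, ys):
    """Least squares line y = a + b*z; returns (a, b, r)."""
    n = len(zs)
    z_bar, y_bar = sum(zs) / n, sum(ys) / n
    s_zy = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
    s_zz = sum((z - z_bar) ** 2 for z in zs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    b = s_zy / s_zz
    return y_bar - b * z_bar, b, s_zy / math.sqrt(s_zz * s_yy)

# Hypothetical hire periods (days) and daily rates ($) from y = 100 + 50/x
xs = [1, 2, 5, 10]
ys = [150, 125, 110, 105]

a_lin, b_lin, r_lin = fit(xs, ys)                    # straight line y = a + bx
a_rec, b_rec, r_rec = fit([1 / x for x in xs], ys)   # y = a + bz with z = 1/x

print(round(r_lin, 4))                                      # -0.8367
print(round(a_rec, 6), round(b_rec, 6), round(r_rec, 6))    # 100.0 50.0 1.0
```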
Eg 13 Research is being carried out into how the concentration of a drug in the bloodstream varies with time, measured from when the drug is given. Observations at successive times give the data shown in the following table.
Time (t minutes) | 90
Concentration (x micrograms per litre) |
It is given that the value of the product moment correlation coefficient for this data is −0.912, correct to 3 decimal places. The scatter diagram for the data is shown below.
[Figure: scatter diagram of concentration x (0 to 100) against time t (minutes, 0 to 350)]
Soln:
Equation of the regression line of x on t:
When t = 300, x =
It is not a suitable model, as the concentration cannot be a negative value.
Y= 4.62 0 0123t
As |r| is close to 1, the regression lines of x on t and t on x are almost identical; therefore we can use the regression line of x on t to estimate t.