EXCEL Multiple Regression
EXCEL 2007: Multiple Regression
A. Colin Cameron, Dept. of Economics, Univ. of Calif., Davis
This January 2009 help sheet gives information on:
Multiple regression using the Data Analysis Add-in.
Interpreting the regression statistics.
Interpreting the ANOVA table (often this is skipped).
Interpreting the regression coefficients table.
Confidence intervals for the slope parameters.
Testing for statistical significance of coefficients.
Testing a hypothesis on a slope parameter.
Testing overall significance of the regressors.
Predicting y given values of regressors.
Excel limitations.
There is little extra to know beyond regression with one explanatory variable.
The main addition is the F-test for overall fit.
MULTIPLE REGRESSION USING THE DATA ANALYSIS ADD-IN
This requires the Data Analysis Add-in: see Excel 2007: Access and Activating the Data Analysis Add-in.
The data used are in carsdata.xls.
We then create a new variable in cells C2:C6, cubed household size, as a regressor.
Then in cell C1 give the heading CUBED HH SIZE.
(It turns out that for these data squared HH SIZE has a coefficient of exactly 0.0, so the cube is used.)
The spreadsheet cells A1:C6 should look like:
We have a regression with an intercept and the regressors HH SIZE and CUBED HH SIZE.
The population regression model is: $y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + u$
It is assumed that the error u is independent with constant variance (homoskedastic); see EXCEL LIMITATIONS at the bottom.
We wish to estimate the regression line: $\hat{y} = b_1 + b_2 x_2 + b_3 x_3$
We do this using the Data Analysis Add-in and Regression.
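The following sketch cross-checks the least-squares fit that the Regression tool reports, using Python with only the standard library. It is an illustration, not Excel's algorithm. It assumes carsdata.xls holds CARS = 1, 2, 2, 2, 3 and HH SIZE = 1, 2, 3, 4, 5. That is an assumption: these values reproduce the sums of squares and coefficients reported in this help sheet. The helper `ols` is a hypothetical name. The sketch also fits the squared-HH SIZE version to check that its extra coefficient is zero.

```python
# Sketch: least-squares fit with the standard library only.
# ASSUMPTION: stand-in values for carsdata.xls (not read from the file).
cars = [1, 2, 2, 2, 3]      # y, column A
hh_size = [1, 2, 3, 4, 5]   # x2 = HH SIZE, column B


def ols(y, X):
    """Hypothetical helper: least-squares coefficients from the normal
    equations (X'X) b = X'y, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        # Partial pivoting: bring the largest entry in column c to the diagonal.
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        # Eliminate column c from every other row.
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * v for a, v in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Regressors: intercept, HH SIZE, CUBED HH SIZE (cells C2:C6 hold x^3).
X_cubed = [[1.0, x, x ** 3] for x in hh_size]
b_cubed = ols(cars, X_cubed)

# Same regression with squared HH SIZE in place of the cube.
X_squared = [[1.0, x, x ** 2] for x in hh_size]
b_squared = ols(cars, X_squared)

print("cubed:  ", [round(b, 5) for b in b_cubed])
print("squared:", [round(b, 5) for b in b_squared])
```

With these inputs the cubed model gives roughly b1 = 0.8966, b2 = 0.3365 and b3 = 0.0021. The coefficient on squared HH SIZE is zero up to rounding error.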
http://cameron.econ.ucdavis.edu/excel/ex61multipleregression.html
1/7
12/15/2014
The only change over one-variable regression is to include more than one column in the Input X Range.
Note, however, that the regressors need to be in contiguous columns (here columns B and C).
If this is not the case in the original data, then columns need to be copied to get the regressors in contiguous columns.
Hitting OK, we obtain:
The regression output has three components:
Regression statistics table
ANOVA table
Regression coefficients table
INTERPRET REGRESSION STATISTICS TABLE
This is the following output. Of greatest interest is R Square.
              Value       Explanation
Multiple R    0.895828    R = square root of R^2
R Square      0.802508    R^2
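$R^2 = 0.8025$ means that 80.25% of the variation of $y_i$ around its mean $\bar{y}$ is explained by the regressors $x_{2i}$ and $x_{3i}$.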
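Multiple R is the sample correlation between $y_i$ and the fitted values $\hat{y}_i$, and squaring it gives R Square. The sketch below uses only the standard library. It assumes the stand-in data CARS = 1, 2, 2, 2, 3 and HH SIZE = 1, 2, 3, 4, 5 for carsdata.xls. The helpers `ols` and `corr` are hypothetical names.

```python
import math

# ASSUMPTION: stand-in values for carsdata.xls.
cars = [1, 2, 2, 2, 3]
hh_size = [1, 2, 3, 4, 5]


def ols(y, X):
    """Hypothetical helper: solve (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * v for a, v in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def corr(u, v):
    """Sample correlation coefficient of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (c - mv) for a, c in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((c - mv) ** 2 for c in v)
    return suv / math.sqrt(suu * svv)


X = [[1.0, x, x ** 3] for x in hh_size]   # intercept, HH SIZE, CUBED HH SIZE
b = ols(cars, X)
y_hat = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]

multiple_r = corr(cars, y_hat)   # Multiple R
r_square = multiple_r ** 2       # R Square
print(f"Multiple R = {multiple_r:.6f}, R Square = {r_square:.6f}")
```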
INTERPRET ANOVA TABLE
An ANOVA table is given. This is often skipped.
             df    SS        MS        F         Significance F
Regression   2     1.6050    0.8025    4.0635    0.1975
Residual     2     0.3950    0.1975
Total        4     2.0
The ANOVA (analysis of variance) table splits the sum of squares into its components.
Total sum of squares
= Residual (or error) sum of squares + Regression (or explained) sum of squares.
Thus $\sum_i (y_i - \bar{y})^2 = \sum_i (y_i - \hat{y}_i)^2 + \sum_i (\hat{y}_i - \bar{y})^2$,
where $\hat{y}_i$ is the value of $y_i$ predicted from the regression line
and $\bar{y}$ is the sample mean of y.
For example:
$R^2 = 1 - \text{Residual SS}/\text{Total SS}$ (general formula for $R^2$)
$= 1 - 0.3950/2.0$ (from data in the ANOVA table)
$= 0.8025$ (which equals $R^2$ given in the Regression Statistics table).
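The sum-of-squares split can be checked numerically. This sketch uses only the standard library. It fits the regression, forms the fitted values, and computes the three sums of squares and $R^2 = 1 - \text{Residual SS}/\text{Total SS}$. It assumes the stand-in data CARS = 1, 2, 2, 2, 3 and HH SIZE = 1, 2, 3, 4, 5 for carsdata.xls. The helper `ols` is a hypothetical name.

```python
# ASSUMPTION: stand-in values for carsdata.xls.
cars = [1, 2, 2, 2, 3]
hh_size = [1, 2, 3, 4, 5]


def ols(y, X):
    """Hypothetical helper: solve (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * v for a, v in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


X = [[1.0, x, x ** 3] for x in hh_size]   # intercept, HH SIZE, CUBED HH SIZE
b = ols(cars, X)
y_hat = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
y_bar = sum(cars) / len(cars)

total_ss = sum((y - y_bar) ** 2 for y in cars)                    # sum (y_i - ybar)^2
residual_ss = sum((y - yh) ** 2 for y, yh in zip(cars, y_hat))    # sum (y_i - yhat_i)^2
regression_ss = sum((yh - y_bar) ** 2 for yh in y_hat)            # sum (yhat_i - ybar)^2
r_square = 1.0 - residual_ss / total_ss

print(f"Total SS {total_ss:.4f} = Residual SS {residual_ss:.4f} + Regression SS {regression_ss:.4f}")
print(f"R^2 = {r_square:.4f}")
```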
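The column labeled F gives the overall F-test of $H_0: \beta_2 = 0$ and $\beta_3 = 0$ versus $H_a$: at least one of $\beta_2$ and $\beta_3$ does not equal zero.
Aside: Excel computes F as:
$F = \dfrac{\text{Regression SS}/(k-1)}{\text{Residual SS}/(n-k)} = \dfrac{1.6050/2}{0.39498/2} = 4.0635$.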
The column labeled Significance F has the associated p-value.
Since 0.1975 > 0.05, we do not reject $H_0$ at significance level 0.05.
Note: Significance F in general = FDIST(F, k-1, n-k), where k is the number of regressors including the intercept.
Here FDIST(4.0635, 2, 2) = 0.1975.
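The F statistic and its Significance F can be reproduced from the ANOVA entries. For an F(2, 2) variable the upper tail probability is $1/(1+F)$, and this sketch relies on that closed form. It therefore applies only because here $k-1 = 2$ and $n-k = 2$. For other degrees of freedom you would need a library routine such as SciPy's `scipy.stats.f.sf`. The function name `f_upper_tail_2_2` is hypothetical.

```python
# Overall F-test from the ANOVA table entries.
n, k = 5, 3                # observations; regressors including the intercept
regression_ss = 1.6050
residual_ss = 0.39498


def f_upper_tail_2_2(f):
    """P(F > f) for F ~ F(2, 2). The CDF is f / (1 + f), so the upper tail
    is 1 / (1 + f). Valid ONLY for 2 and 2 degrees of freedom."""
    return 1.0 / (1.0 + f)


assert (k - 1, n - k) == (2, 2), "closed form needs F(2, 2)"
f_stat = (regression_ss / (k - 1)) / (residual_ss / (n - k))
significance_f = f_upper_tail_2_2(f_stat)   # plays the role of FDIST(F, 2, 2)
print(f"F = {f_stat:.4f}, Significance F = {significance_f:.4f}")
```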
INTERPRET REGRESSION COEFFICIENTS TABLE
The regression output of most interest is the following table of coefficients and associated output:
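[Regression coefficients table: columns Coefficients, Standard Error, t Stat, P-value, Lower 95% and Upper 95% for the Intercept, HH SIZE and CUBED HH SIZE; Upper 95% is 2.1552 for HH SIZE and 0.0585 for CUBED HH SIZE.]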
Let $\beta_j$ denote the population coefficient of the jth regressor (intercept, HH SIZE and CUBED HH SIZE).
Then:
Column "Coefficients" gives the least squares estimates $b_j$ of $\beta_j$.
Column "Standard Error" gives the standard errors (i.e. the estimated standard deviations) of the least squares estimates $b_j$ of $\beta_j$.
Column "t Stat" gives the computed t-statistic for $H_0: \beta_j = 0$ against $H_a: \beta_j \neq 0$.
This is the coefficient divided by the standard error. It is compared to a t distribution with $(n-k)$ degrees of freedom, where here n = 5 and k = 3.
Column "P-value" gives the p-value for the test of $H_0: \beta_j = 0$ against $H_a: \beta_j \neq 0$.
This equals $\Pr\{|t| > |\text{t Stat}|\}$, where t is a t-distributed random variable with $n-k$ degrees of freedom and t Stat is the computed value of the t-statistic given in the previous column.
Note that this p-value is for a two-sided test. For a one-sided test, divide this p-value by 2 (also checking the sign of the t Stat).
Columns "Lower 95%" and "Upper 95%" define a 95% confidence interval for $\beta_j$.
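The columns of the coefficients table follow from standard least-squares formulas:
- $b = (X'X)^{-1}X'y$
- $se(b_j) = \sqrt{s^2 [(X'X)^{-1}]_{jj}}$, with $s^2 = \text{Residual SS}/(n-k)$
- t Stat $= b_j/se(b_j)$
- the 95% limits are $b_j \pm t_{.025}(n-k)\, se(b_j)$

The plain-Python sketch below computes all of them. It makes three assumptions:
- carsdata.xls holds CARS = 1, 2, 2, 2, 3 and HH SIZE = 1, 2, 3, 4, 5.
- The helper names are hypothetical.
- The p-values and critical value use t-distribution closed forms that hold only for 2 degrees of freedom, which is n - k here.

It is not Excel's internal routine.

```python
import math

# ASSUMPTION: stand-in values for carsdata.xls.
cars = [1, 2, 2, 2, 3]
hh_size = [1, 2, 3, 4, 5]
names = ["Intercept", "HH SIZE", "CUBED HH SIZE"]
X = [[1.0, x, x ** 3] for x in hh_size]
n, k = len(X), len(X[0])


def invert(M):
    """Inverse of a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    m = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(m)] for i, row in enumerate(M)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [a / pivot for a in A[c]]
        for r in range(m):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * v for a, v in zip(A[r], A[c])]
    return [row[m:] for row in A]


def t2_two_sided_p(t):
    """P(|T| > |t|) for T ~ t(2): 1 - |t| / sqrt(2 + t^2). Valid for 2 df only."""
    return 1.0 - abs(t) / math.sqrt(2.0 + t * t)


def t2_critical(alpha):
    """Two-sided critical value of t(2): the t solving t2_two_sided_p(t) = alpha."""
    c = 1.0 - alpha
    return c * math.sqrt(2.0 / (1.0 - c * c))


assert n - k == 2, "the t(2) closed forms above need n - k = 2"

XtX_inv = invert([[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)])
Xty = [sum(r[i] * yi for r, yi in zip(X, cars)) for i in range(k)]
b = [sum(XtX_inv[i][j] * Xty[j] for j in range(k)) for i in range(k)]   # (X'X)^-1 X'y

residuals = [yi - sum(bj * xj for bj, xj in zip(b, r)) for yi, r in zip(cars, X)]
s2 = sum(e * e for e in residuals) / (n - k)              # Residual SS / (n - k)
se = [math.sqrt(s2 * XtX_inv[j][j]) for j in range(k)]    # Standard Error column
t_stat = [bj / sj for bj, sj in zip(b, se)]               # t Stat column
p_val = [t2_two_sided_p(t) for t in t_stat]               # P-value column
crit = t2_critical(0.05)                                  # like TINV(0.05, 2)
lower = [bj - crit * sj for bj, sj in zip(b, se)]         # Lower 95%
upper = [bj + crit * sj for bj, sj in zip(b, se)]         # Upper 95%

print(f"{'':14}{'Coef':>9}{'SE':>9}{'t Stat':>9}{'P-value':>9}{'Lower95':>9}{'Upper95':>9}")
for row in zip(names, b, se, t_stat, p_val, lower, upper):
    print(f"{row[0]:14}" + "".join(f"{v:9.4f}" for v in row[1:]))
```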
A simple summary of the above output is that the fitted line is
$\hat{y} = 0.8966 + 0.3365 x + 0.0021 z$, where x is HH SIZE and z is CUBED HH SIZE.
CONFIDENCE INTERVALS FOR SLOPE COEFFICIENTS
The 95% confidence interval for slope coefficient $\beta_2$ is, from the Excel output, (-1.4823, 2.1552).
Excel computes this as
$b_2 \pm t_{.025}(2) \times se(b_2)$
$= 0.33647 \pm \text{TINV}(0.05, 2) \times 0.42270$
$= 0.33647 \pm 4.303 \times 0.42270$
$= 0.33647 \pm 1.8189$
$= (-1.4823, 2.1552)$.
Other confidence intervals can be obtained.
For example, to find 99% confidence intervals: in the Regression dialog box (in the Data Analysis Add-in), check the Confidence Level box and set the level to 99%.
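The interval calculation can be written as a small function of the confidence level. This sketch takes the Excel output values b2 = 0.33647 and se(b2) = 0.42270. For the critical value it uses a closed form for t(2), which stands in for TINV(alpha, 2) and is valid only for 2 degrees of freedom. The function names are hypothetical. At the 99% level the result should correspond to what Excel reports when the Confidence Level box is set to 99%.

```python
import math

b2, se_b2 = 0.33647, 0.42270   # HH SIZE coefficient and standard error (Excel output)


def tinv_2df(alpha):
    """Two-sided critical value for t(2), standing in for TINV(alpha, 2).
    From P(|T| > t) = 1 - t / sqrt(2 + t^2); valid ONLY for 2 degrees of freedom."""
    c = 1.0 - alpha
    return c * math.sqrt(2.0 / (1.0 - c * c))


def conf_int(b, se, level):
    """Interval b -/+ t_crit * se at the given confidence level (e.g. 0.95)."""
    t_crit = tinv_2df(1.0 - level)
    return b - t_crit * se, b + t_crit * se


ci95 = conf_int(b2, se_b2, 0.95)
ci99 = conf_int(b2, se_b2, 0.99)
print("95%:", [round(v, 4) for v in ci95])
print("99%:", [round(v, 4) for v in ci99])
```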
TEST HYPOTHESIS OF ZERO SLOPE COEFFICIENT ("TEST OF STATISTICAL SIGNIFICANCE")
The coefficient of HH SIZE has estimated standard error of 0.4227, t-statistic of 0.7960 and p-value of 0.5095.
It is therefore statistically insignificant at significance level $\alpha = .05$, as p > 0.05.
The coefficient of CUBED HH SIZE has estimated standard error of 0.0131, t-statistic of 0.1594 and p-value of 0.8880.
It is therefore statistically insignificant at significance level $\alpha = .05$, as p > 0.05.
There are 5 observations and 3 regressors (intercept, HH SIZE and CUBED HH SIZE), so we use t(5-3) = t(2).
For example, for HH SIZE p = TDIST(0.796, 2, 2) = 0.5095.
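Excel's TDIST(x, deg_freedom, tails) returns the upper-tail probability of the t distribution, doubled when tails = 2, and requires x >= 0. For 2 degrees of freedom that tail has the closed form $(1 - x/\sqrt{2 + x^2})/2$. This lets the p-values above be checked without a statistics library. The function name `tdist_2df` is hypothetical, and the formula covers only the 2-degree-of-freedom case.

```python
import math


def tdist_2df(x, tails):
    """Hypothetical analogue of Excel's TDIST(x, 2, tails), for 2 df only.
    Upper tail P(T > x) = (1 - x / sqrt(2 + x^2)) / 2; tails = 2 doubles it."""
    if x < 0:
        raise ValueError("x must be nonnegative, as in Excel's TDIST")
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    upper_tail = 0.5 * (1.0 - x / math.sqrt(2.0 + x * x))
    return tails * upper_tail


p_hhsize = tdist_2df(0.796, 2)    # two-sided p-value for HH SIZE
p_cubed = tdist_2df(0.1594, 2)    # two-sided p-value for CUBED HH SIZE
print(f"HH SIZE: {p_hhsize:.4f}   CUBED HH SIZE: {p_cubed:.4f}")
```

The one-sided p-value is `tdist_2df(x, 1)`, half the two-sided value, matching the "divide by 2" rule stated for the P-value column.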
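TEST HYPOTHESIS ON A REGRESSION PARAMETER
Here we test whether HH SIZE has coefficient $\beta_2 = 1.0$.
Example: $H_0: \beta_2 = 1.0$ against $H_a: \beta_2 \neq 1.0$ at significance level $\alpha = .05$.
Then
$t = (b_2 - H_0 \text{ value of } \beta_2)/(\text{standard error of } b_2)$
$= (0.33647 - 1.0)/0.42270$
$= -1.570$.
Using the p-value approach
p-value = TDIST(1.570, 2, 2) = 0.257. [Here n = 5 and k = 3 so n - k = 2. TDIST does not accept a negative first argument, so |t| is used.]
Do not reject the null hypothesis at level .05 since the p-value is > 0.05.
Using the critical value approach
We computed t = -1.570.
The critical value is $t_{.025}(2)$ = TINV(0.05, 2) = 4.303. [Here n = 5 and k = 3 so n - k = 2.]
So do not reject the null hypothesis at level .05 since |t| = |-1.570| = 1.570 < 4.303.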
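The sketch below runs both approaches for $H_0: \beta_2 = 1.0$ side by side. The inputs are the Excel output values b2 = 0.33647 and se(b2) = 0.42270. The p-value and critical value use t(2) closed forms, which match TDIST(ABS(t), 2, 2) and TINV(0.05, 2) only for 2 degrees of freedom.

```python
import math

b2, se_b2 = 0.33647, 0.42270   # HH SIZE coefficient and standard error (Excel output)
beta2_null = 1.0               # value of beta_2 under H0
alpha = 0.05

t = (b2 - beta2_null) / se_b2

# t(2) closed forms, valid only because n - k = 2 here.
p_value = 1.0 - abs(t) / math.sqrt(2.0 + t * t)   # like TDIST(ABS(t), 2, 2)
c = 1.0 - alpha
t_crit = c * math.sqrt(2.0 / (1.0 - c * c))       # like TINV(0.05, 2)

reject_by_p = p_value < alpha        # p-value approach
reject_by_crit = abs(t) > t_crit     # critical value approach
print(f"t = {t:.3f}, p = {p_value:.3f}, critical value = {t_crit:.3f}")
print("reject H0 (p-value, critical value):", reject_by_p, reject_by_crit)
```

The two approaches always agree because the p-value falls below alpha exactly when |t| exceeds the critical value.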
OVERALL TEST OF SIGNIFICANCE OF THE REGRESSION PARAMETERS
We test $H_0: \beta_2 = 0$ and $\beta_3 = 0$ versus $H_a$: at least one of $\beta_2$ and $\beta_3$ does not equal zero.
From the ANOVA table the F-test statistic is 4.0635 with p-value of 0.1975.
Since the p-value is not less than 0.05, we do not reject the null hypothesis that the slope parameters $\beta_2$ and $\beta_3$ are both zero at significance level 0.05.
Conclude that the parameters are jointly statistically insignificant at significance level 0.05.
Note: Significance F in general = FDIST(F, k-1, n-k), where k is the number of regressors including the intercept.
Here FDIST(4.0635, 2, 2) = 0.1975.
PREDICTED VALUE OF Y GIVEN REGRESSORS
Consider the case where x = 4, in which case CUBED HH SIZE = x^3 = 4^3 = 64.
$\hat{y} = b_1 + b_2 x_2 + b_3 x_3 = 0.8966 + 0.3365 \times 4 + 0.0021 \times 64 \approx 2.377$
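The prediction is a single sum of coefficients times regressor values. This sketch uses the rounded fitted line from the output. With unrounded estimates the prediction is slightly smaller, about 2.376.

```python
b1, b2, b3 = 0.8966, 0.3365, 0.0021   # rounded fitted line from the Excel output

hh_size = 4
cubed_hh_size = hh_size ** 3          # CUBED HH SIZE = 64
y_hat = b1 + b2 * hh_size + b3 * cubed_hh_size
print(f"predicted cars = {y_hat:.4f}")
```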
EXCEL LIMITATIONS
Excel restricts the number of regressors: the Input X Range of the Regression tool can contain at most 16 columns.
Excel requires that all the regressor variables be in adjoining columns.
You may need to move columns to ensure this.
E.g., if the regressors are in columns B and D, you need to copy at least one of columns B and D so that they are adjacent to each other.
Excel standard errors, t-statistics and p-values are based on the assumption that the error is independent with constant variance (homoskedastic).
Excel does not provide alternatives, such as heteroskedastic-robust or autocorrelation-robust standard errors, t-statistics and p-values.
More specialized software such as STATA, EVIEWS, SAS, LIMDEP, PC-TSP, ... is needed.
For further information on how to use Excel go to
http://cameron.econ.ucdavis.edu/excel/excel.html