
# Excel For Statistical Data Analysis

Excel is a widely used statistical package that serves as a tool for understanding statistical concepts and computation, and for checking your hand-worked calculations when solving homework problems. The site provides an introduction to the basics of working with Excel. Redoing the illustrated numerical examples on this site will help improve your familiarity with Excel and, as a result, increase the effectiveness and efficiency of your work in statistics.
Professor Hossein Arsham
To search the site, try Edit | Find in page [Ctrl + f]. Enter a word or phrase in the dialogue box, e.g. "variance" or "mean". If the first appearance of the word/phrase is not what you are looking for, try Find Next.
MENU
1. Introduction
2. Entering Data
3. Descriptive Statistics
4. Normal Distribution
5. Confidence Interval for the Mean
6. Test of Hypothesis Concerning the Population Mean
7. Difference Between Mean of Two Populations
8. ANOVA: Analysis of Variances
9. Goodness-of-Fit Test for Discrete Random Variables
10. Test of Independence: Contingency Tables
11. Test Hypothesis Concerning the Variance of Two Populations
12. Linear Correlation and Regression Analysis
13. Moving Average and Exponential Smoothing
14. Applications and Numerical Examples
15. E-Labs to Fully Understand Statistical Concepts
16. Interesting and Useful Sites

Companion Sites:
Topics in Statistical Data Analysis
Time Series Analysis and Business Forecasting
Computers and Computational Statistics
Questionnaire Design and Surveys Sampling
Probabilistic Modeling
Systems Simulation
Probability and Statistics Resources
The Business Statistics Online Course
Introduction

This site provides illustrative experience in the use of Excel for data summary, presentation, and other basic statistical analysis. I believe the popular use of Excel is in the areas where Excel really can excel. This includes organizing data, i.e. basic data management, tabulation and graphics. For real statistical analysis one must learn to use the professional commercial statistical packages such as SAS and SPSS.
Microsoft Excel 2000 (version 9) provides a set of data analysis tools called the Analysis ToolPak, which you can use to save steps when you develop complex statistical analyses. You provide the data and parameters for each analysis; the tool uses the appropriate statistical macro functions and then displays the results in an output table. Some tools generate charts in addition to output tables.

If the Data Analysis command is selectable on the Tools menu, then the Analysis ToolPak is installed on your system. However, if the Data Analysis command is not on the Tools menu, you need to install the Analysis ToolPak by doing the following:
Step 1: On the Tools menu, click Add-Ins. If Analysis ToolPak is not listed in the Add-Ins dialog box, click Browse and locate the drive, folder name, and file name for the Analysis ToolPak Add-in (Analys32.xll), usually located in the Program Files\Microsoft Office\Office\Library\Analysis folder. Once you find the file, select it and click OK.

Step 2: If you don't find the Analys32.xll file, then you must install it.
1. Insert your Microsoft Office 2000 Disk 1 into the CD ROM drive.
2. Select Run from the Windows Start menu.
3. Browse and select the drive for your CD. Select Setup.exe, click Open, and click OK.
4. Click the Add or Remove Features button.
5. Click the + next to Microsoft Excel for Windows.
6. Click the + next to Add-ins.
7. Click the down arrow next to Analysis ToolPak.
8. Select Run from My Computer.
9. Select the Update Now button.
10. Excel will now update your system to include Analysis ToolPak.
11. Launch Excel.
12. On the Tools menu, click Add-Ins and select the Analysis ToolPak check box.

Step 3: The Analysis ToolPak Add-In is now installed and Data Analysis will now be selectable on the Tools menu.
Microsoft Excel is a powerful spreadsheet package available for Microsoft Windows and the Apple Macintosh. Spreadsheet software is used to store information in columns and rows which can then be organized and/or processed. Spreadsheets are designed to work well with numbers but often include text. Excel organizes your work into workbooks; each workbook can contain many worksheets; worksheets are used to list and analyze data.

Excel is available on all public-access PCs (for example, those in the Library and PC Labs). It can be opened either by selecting Start - Programs - Microsoft Excel or by clicking on the Excel Short Cut, which is either on your desktop, or on any PC, or on the Office Tool bar.
Opening a Document:
- Click on File-Open (Ctrl+O) to open/retrieve an existing workbook; change the directory area or drive to look for files in other locations.
- To create a new workbook, click on File-New-Blank Document.

Saving and Closing a Document:
To save your document with its current filename, location and file format, click on File - Save. If you are saving for the first time, click File-Save; choose/type a name for your document; then click OK. Use File-Save As if you want to save to a different filename/location.

When you have finished working on a document you should close it. Go to the File menu and click on Close. If you have made any changes since the file was last saved, you will be asked if you wish to save them.
The Excel screen

Workbooks and Worksheets:
When you start Excel, a blank worksheet is displayed which consists of a grid of cells with numbered rows down the page and alphabetically-titled columns across the page. Each cell is referenced by its coordinates (e.g., A3 is used to refer to the cell in column A and row 3; B10:B20 is used to refer to the range of cells in column B and rows 10 through 20).

Your work is stored in an Excel file called a workbook. Each workbook may contain several worksheets and/or charts; the current worksheet is called the active sheet. To view a different worksheet in a workbook click the appropriate Sheet Tab.

You can access and execute commands directly from the main menu or you can point to one of the toolbar buttons (the display box that appears below the button, when you place the cursor over it, indicates the name/action of the button) and click once.
Moving Around the Worksheet:
It is important to be able to move around the worksheet effectively because you can only enter or change data at the position of the cursor. You can move the cursor by using the arrow keys or by moving the mouse to the required cell and clicking. Once selected, the cell becomes the active cell and is identified by a thick border; only one cell can be active at a time.

To move from one worksheet to another click the sheet tabs. (If your workbook contains many sheets, right-click the tab scrolling buttons then click the sheet you want.) The name of the active sheet is shown in bold.
Moving Between Cells:
Here are some keyboard shortcuts to move the active cell:
- Home - moves to the first column in the current row
- Ctrl+Home - moves to the top left corner of the document
- End then Home - moves to the last cell in the document

To move between cells on a worksheet, click any cell or use the arrow keys. To see a different area of the sheet, use the scroll bars and click on the arrows or the area above/below the scroll box in either the vertical or horizontal scroll bars.

Note that the size of a scroll box indicates the proportional amount of the used area of the sheet that is visible in the window. The position of a scroll box indicates the relative location of the visible area within the worksheet.
Entering Data

A new worksheet is a grid of rows and columns. The rows are labeled with numbers, and the columns are labeled with letters. Each intersection of a row and a column is a cell. Each cell has an address, which is the column letter and the row number. The arrow on the worksheet to the right points to cell A1, which is currently highlighted, indicating that it is an active cell. A cell must be active to enter information into it. To highlight (select) a cell, click on it.

To select more than one cell:
- Click on a cell (e.g. A1), then hold the shift key while you click on another (e.g. D4) to select all cells between and including A1 and D4.
- Click on a cell (e.g. A1) and drag the mouse across the desired range, unclicking on another cell (e.g. D4) to select all cells between and including A1 and D4.
- To select several cells which are not adjacent, press "control" and click on the cells you want to select. Click a number or letter labeling a row or column to select that entire row or column.

One worksheet can have up to 256 columns and 65,536 rows, so it'll be a while before you run out of space.
Each cell can contain a label, value, logical value, or formula.
- Labels can contain any combination of letters, numbers, or symbols.
- Values are numbers. Only values (numbers) can be used in calculations. A value can also be a date or a time.
- Logical values are "true" or "false".
- Formulas automatically do calculations on the values in other specified cells and display the result in the cell in which the formula is entered (for example, you can specify that cell D3 is to contain the sum of the numbers in B3 and C3; the number displayed in D3 will then be a function of the numbers entered into B3 and C3).
To enter information into a cell, select the cell and begin typing.

Note that as you type information into the cell, the information you enter also displays in the formula bar. You can also enter information into the formula bar, and the information will appear in the selected cell.

When you have finished entering the label or value:
- Press "Enter" to move to the next cell below (in this case, A2)
- Press "Tab" to move to the next cell to the right (in this case, B1)
- Click in any cell to select it
Entering Labels

Unless the information you enter is formatted as a value or a formula, Excel will interpret it as a label, and by default aligns the text on the left side of the cell.

If you are creating a long worksheet and you will be repeating the same label information in many different cells, you can use the AutoComplete function. This function will look at other entries in the same column and attempt to match a previous entry with your current entry. For example, if you have already typed "Wesleyan" in another cell and you type "W" in a new cell, Excel will automatically enter "Wesleyan". If you intended to type "Wesleyan" into the cell, your task is done, and you can move on to the next cell. If you intended to type something else, e.g. "Williams," into the cell, just continue typing to enter the term.

To turn on the AutoComplete function, click on "Tools" in the menu bar, then select "Options," then select "Edit," and click to put a check in the box beside "Enable AutoComplete for cell values".

Another way to quickly enter repeated labels is to use the Pick List feature. Right click on a cell, then select "Pick From List". This will give you a menu of all other entries in cells in that column. Click on an item in the menu to enter it into the currently selected cell.
Entering Values

A value is a number, date, or time, plus a few symbols if necessary to further define the numbers [such as: . , + - ( ) % \$ /].

Numbers are assumed to be positive; to enter a negative number, use a minus sign "-" or enclose the number in parentheses "()".

Dates are stored as MM/DD/YYYY, but you do not have to enter them precisely in that format. If you enter "jan 9" or "jan-9", Excel will recognize it as January 9 of the current year, and store it as 1/9/2002. Enter the four-digit year for a year other than the current year (e.g. "jan 9, 1999"). To enter the current day's date, press "control" and ";" at the same time.

Times default to a 24 hour clock. Use "a" or "p" to indicate "am" or "pm" if you use a 12 hour clock (e.g. "8:30 p" is interpreted as 8:30 PM). To enter the current time, press "control" and ":" (shift-semicolon) at the same time.

An entry interpreted as a value (number, date, or time) is aligned to the right side of the cell, to reformat a value
Rounding Numbers that Meet Specified Criteria: To apply colors to maximum and/or minimum values:
1. Select a cell in the region, and press Ctrl+Shift+* (in Excel 2003, press this or Ctrl+A) to select the Current Region.
2. From the Format menu, select Conditional Formatting.
3. In Condition 1, select Formula Is, and type =MAX(\$F:\$F)=\$F1.
4. Click Format, select the Font tab, select a color, and then click OK.
5. In Condition 2, select Formula Is, and type =MIN(\$F:\$F)=\$F1.
6. Repeat step 4, select a different color than you selected for Condition 1, and then click OK.

Note: Be sure to distinguish between absolute reference and relative reference when entering the formulas.
Rounding Numbers that Meet Specified Criteria

Problem: Rounding all the numbers in column A to zero decimal places, except for those that have "5" in the first decimal place.

Solution: Use the IF, MOD, and ROUND functions in the following formula:
=IF(MOD(A2,1)=0.5,A2,ROUND(A2,0))
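
For readers who want to check the logic of this formula outside Excel, here is a minimal Python sketch of the same rule. The function name `round_except_half` and the sample values are illustrative assumptions, not part of the original worksheet.

```python
def round_except_half(x: float) -> float:
    """Mirror =IF(MOD(A2,1)=0.5, A2, ROUND(A2,0)) for a single value."""
    # x % 1 is the fractional part; like Excel's MOD with a positive
    # divisor, the result is never negative.
    if x % 1 == 0.5:
        return x  # leave values ending in exactly .5 untouched
    # Exact ties are excluded above, so Python's round() and Excel's
    # ROUND give the same answer for everything that reaches this line.
    return float(round(x))


# Hypothetical contents of column A.
column_a = [2.5, 2.4, 2.6, -1.5, 3.0]
rounded = [round_except_half(v) for v in column_a]
print(rounded)
```

Values such as 2.5 and -1.5 are left as entered, while 2.4 and 2.6 are rounded to the nearest whole number.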
To Copy and Paste All Cells in a Sheet
1. Select the cells in the sheet by pressing Ctrl+A (in Excel 2003, select a cell in a blank area before pressing Ctrl+A, or from a selected cell in a Current Region/List range, press Ctrl+A+A).
OR
Click Select All at the top-left intersection of rows and columns.
2. Press Ctrl+C.
3. Press Ctrl+Page Down to select another sheet, then select cell A1.
4. Press Enter.
To Copy the Entire Sheet

Copying the entire sheet means copying the cells, the page setup parameters, and the defined range Names.

Option 1:
1. Move the mouse pointer to a sheet tab.
2. Press Ctrl, and hold the mouse to drag the sheet to a different location.
3. Release the mouse button and the Ctrl key.

Option 2:
1. Right-click the appropriate sheet tab.
2. From the shortcut menu, select Move or Copy. The Move or Copy dialog box enables one to copy the sheet either to a different location in the current workbook or to a different workbook. Be sure to mark the Create a copy checkbox.

Option 3:
1. From the Window menu, select Arrange.
2. Select Tiled to tile all open workbooks in the window.
3. Use Option 1 (dragging the sheet while pressing Ctrl) to copy or move a sheet.
Sorting by Columns

The default setting for sorting in Ascending or Descending order is by row. To sort by columns:
1. From the Data menu, select Sort, and then Options.
2. Select the Sort left to right option button and click OK.
3. In the Sort by option of the Sort dialog box, select the row number by which the columns will be sorted and click OK.
Descriptive Statistics

The Data Analysis ToolPak has a Descriptive Statistics tool that provides you with an easy way to calculate summary statistics for a set of sample data. Summary statistics include Mean, Standard Error, Median, Mode, Standard Deviation, Variance, Kurtosis, Skewness, Range, Minimum, Maximum, Sum, and Count. This tool eliminates the need to type individual functions to find each of these results. Excel includes elaborate and customisable toolbars, for example the "standard" toolbar shown here:

Some of the icons are useful for mathematical computation:
- the "Autosum" icon, which enters the formula "=sum()" to add up a range of cells
- the "FunctionWizard" icon, which gives you access to all the functions available
- the "GraphWizard" icon, giving access to all graph types available, as shown in this display:
Excel can be used to generate measures of location and variability for a variable. Suppose we wish to find descriptive statistics for the sample data: 2, 4, 6, and 8.

Step 1. Select the Tools pull-down menu. If you see Data Analysis, click on this option; otherwise, click on the Add-Ins option to install the Analysis ToolPak.
Step 2. Click on the Data Analysis option.
Step 3. Choose Descriptive Statistics from the Analysis Tools list.
Step 4. When the dialog box appears: Enter A1:A4 in the input range box. A1 refers to the value in column A and row 1, which in this case is 2. Using the same technique, enter the other values until you reach the last one. If a sample consists of 20 numbers, you can enter them in cells A1, A2, A3, and so on, and use A1:A20 as the input range.
Step 5. Select an output range, in this case B1. Click on summary statistics to see the results. Select OK.

When you click OK, you will see the result in the selected range.

As you will see, the mean of the sample is 5, the median is 5, the standard deviation is 2.581989, the sample variance is 6.666667, the range is 6 and so on. Each of these factors might be important in your calculation of different statistical procedures.
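
As a cross-check on the Excel output, the same summary statistics for the sample 2, 4, 6, 8 can be reproduced with Python's standard `statistics` module. This is only a sketch for verification; the dictionary keys are illustrative names, not Excel's labels.

```python
import statistics

sample = [2, 4, 6, 8]

summary = {
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),
    # stdev/variance use the n - 1 (sample) denominator.
    "standard_deviation": statistics.stdev(sample),
    "sample_variance": statistics.variance(sample),
    "standard_error": statistics.stdev(sample) / len(sample) ** 0.5,
    "range": max(sample) - min(sample),
    "minimum": min(sample),
    "maximum": max(sample),
    "sum": sum(sample),
    "count": len(sample),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The mean, median, standard deviation, sample variance and range agree with the values quoted above.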
Normal Distribution

Consider the problem of finding the probability of getting less than a certain value under any normal probability distribution. As an illustrative example, let us suppose the SAT scores nationwide are normally distributed with a mean and standard deviation of 500 and 100, respectively. Answer the following questions based on the given information:

A: What is the probability that a randomly selected student score will be less than 600 points?
B: What is the probability that a randomly selected student score will exceed 600 points?
C: What is the probability that a randomly selected student score will be between 400 and 600?

Hint: Using Excel you can find the probability of getting a value less than or equal to a given value. In a problem where the mean and the standard deviation of the population are given, you have to use common sense to find the different probabilities the question asks for, since you know the area under a normal curve is 1.
Solution:

Step 1. In the work sheet, select the cell where you want the answer to appear. Suppose you chose cell number one, A1.
Steps 2-3. From the menus, select Insert, then click on the Function option.
Step 4. After clicking on the Function option, the Paste Function dialog appears. From Function Category choose Statistical, then NORMDIST from the Function Name box; click OK.
Step 5. After clicking on OK, the NORMDIST distribution box appears:
i. Enter 600 in X (the value box);
ii. Enter 500 in the Mean box;
iii. Enter 100 in the Standard deviation box;
iv. Type "true" in the cumulative box, then click OK.

As you see, the value 0.841344746 appears in A1, indicating the probability that a randomly selected student's score is below 600 points. Using common sense we can answer part "b" by subtracting 0.841344746 from 1. So the part "b" answer is 1 - 0.841344746, or 0.158655. This is the probability that a randomly selected student's score is greater than 600 points. To answer part "c", use the same technique to find the probabilities, or areas, to the left of the values 600 and 400. Since these areas overlap, to answer the question you should subtract the smaller probability from the larger probability. The answer equals 0.841344746 - 0.158655254, that is, 0.68269. The screen shot should look like the following:
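
The three answers can be verified outside Excel. The sketch below uses `statistics.NormalDist` from the Python standard library, whose `cdf` method plays the role of NORMDIST with the cumulative box set to true; the variable names are illustrative.

```python
from statistics import NormalDist

# SAT scores: mean 500, standard deviation 100 (from the example above).
sat = NormalDist(mu=500, sigma=100)

p_below_600 = sat.cdf(600)               # part A: P(X < 600)
p_above_600 = 1 - p_below_600            # part B: P(X > 600)
p_between = sat.cdf(600) - sat.cdf(400)  # part C: P(400 < X < 600)

print(f"A: {p_below_600:.9f}")
print(f"B: {p_above_600:.9f}")
print(f"C: {p_between:.5f}")
```

Part C subtracts the area to the left of 400 from the area to the left of 600, exactly as described in the solution.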
Inverse Case

Calculating the value of a random variable, often called the "x" value.

You can use NORMINV from the function box to calculate a value for the random variable if the probability to the left side of this variable is given. In fact, you should use this function to calculate different percentiles. In this problem one could ask: what is the score of a student whose percentile is 90? This means approximately 90% of students' scores are less than this number. If we were asked to do this problem by hand, we would have had to calculate the x value using the normal distribution formula x = m + zd (the mean plus z times the standard deviation). Now let's use Excel to calculate P90. In the Paste Function dialog, click on Statistical, then click on NORMINV. The screen shot would look like the following:

When you select NORMINV, the dialog box appears.
i. Enter 0.90 for the probability (this means that approximately 90% of students' scores are less than the value we are looking for)
ii. Enter 500 for the mean (this is the mean of the normal distribution in our case)
iii. Enter 100 for the standard deviation (this is the standard deviation of the normal distribution in our case)

At the end of this screen you will see the formula result, which is approximately 628 points. This means the top 10% of the students scored better than 628.
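
A short Python sketch of the same percentile calculation follows. `NormalDist.inv_cdf` is the standard-library counterpart of NORMINV; the by-hand line is included only to show that x = m + zd gives the same number.

```python
from statistics import NormalDist

mean, sd = 500, 100

# Direct inverse lookup, analogous to NORMINV(0.90, 500, 100).
p90 = NormalDist(mu=mean, sigma=sd).inv_cdf(0.90)

# The hand calculation: find z for the standard normal, then x = m + z*d.
z = NormalDist().inv_cdf(0.90)
p90_by_hand = mean + z * sd

print(round(p90), round(p90_by_hand))
```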
Confidence Interval for the Mean

Suppose we wish to estimate a confidence interval for the mean of a population. Depending on the size of your sample you may use one of the following cases:

Large Sample Size (n is larger than, say 30):
The general formula for developing a confidence interval for a population mean is:

$$\bar{x} \pm Z \frac{S}{\sqrt{n}}$$

In this formula $\bar{x}$ is the mean of the sample; Z is the interval coefficient, which can be found from the normal distribution table (for example, the interval coefficient for a 95% confidence level is 1.96). S is the standard deviation of the sample and n is the sample size.

Now we would like to show how Excel is used to develop a confidence interval for a population mean based on sample information. As you see, in order to evaluate this formula you need "the mean of the sample" $\bar{x}$ and the margin of error $Z \frac{S}{\sqrt{n}}$. Excel will automatically calculate these quantities for you.

The only things you have to do are: add the margin of error to the mean of the sample to find the upper limit of the interval, and subtract the margin of error from the mean to find the lower limit of the interval.
To demonstrate how Excel finds these quantities we will use a data set which contains the hourly income of 36 work-study students here at the University of Baltimore. These numbers appear in cells A1 to A36 on an Excel work sheet.

After entering the data, we follow the descriptive statistics procedure to calculate the unknown quantities. The only additional step is to click on the confidence interval in the descriptive statistics dialog box and enter the given confidence level, in this case 95%.

Here are the above procedures step by step:
Step 1. Enter data in cells A1 to A36 (on the spreadsheet).
Step 2. From the menus select Tools.
Step 3. Click on Data Analysis, then choose the Descriptive Statistics option, then click OK.
On the descriptive statistics dialog, click on Summary Statistics. After you have done that, click on the confidence interval level and type 95%, or in other problems whatever confidence level you desire. In the Output Range box enter B1 or whatever location you desire.
Now click on OK. The screen shot would look like the following:

As you see, the spreadsheet shows that the mean of the sample is $\bar{x}$ = 6.902777778 and the absolute value of the margin of error is 0.231678109. This mean is based on this sample information. A 95% confidence interval for the hourly income of the UB work-study students has an upper limit of 6.902777778 + 0.231678109 and a lower limit of 6.902777778 - 0.231678109.

Put another way, of all the intervals formed this way, 95% contain the mean of the population. Or, for practical purposes, we can be 95% confident that the mean of the population is between 6.902777778 - 0.231678109 and 6.902777778 + 0.231678109. We can be at least 95% confident that the interval [\$6.67, \$7.13] contains the average hourly income of a work-study student.
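
The large-sample formula can be sketched in Python as follows. The function `large_sample_ci` and the `wages` list are assumptions made for illustration (the original 36 observations are not reproduced in the text); the last two lines simply recompute the limits from the mean and margin of error reported above.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def large_sample_ci(data, confidence=0.95):
    """Return (lower, upper) for mean +/- Z * S / sqrt(n)."""
    n = len(data)
    # Interval coefficient Z, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * stdev(data) / sqrt(n)
    center = mean(data)
    return center - margin, center + margin


# Hypothetical hourly incomes for 36 students (not the original data).
wages = [6, 7, 8] * 12
lower, upper = large_sample_ci(wages)
print(f"hypothetical sample: [{lower:.2f}, {upper:.2f}]")

# Limits implied by the mean and margin of error reported in the text.
reported_mean, reported_margin = 6.902777778, 0.231678109
reported_limits = (reported_mean - reported_margin,
                   reported_mean + reported_margin)
print(f"reported output: [{reported_limits[0]:.2f}, {reported_limits[1]:.2f}]")
```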
Small Sample Size (say less than 30): If the sample size n is less than 30, we must use the small sample procedure to develop a confidence interval for the mean of a population. The general formula for developing confidence intervals for the population mean based on a small sample is:

$$\bar{x} \pm t_{\alpha/2} \frac{S}{\sqrt{n}}$$

In this formula $\bar{x}$ is the mean of the sample. $t_{\alpha/2}$ is the interval coefficient providing an area of $\alpha/2$ in the upper tail of a t distribution with n-1 degrees of freedom, which can be found from a t distribution table (for example, the interval coefficient for a 90% confidence level is 1.833 if the sample size is 10). S is the standard deviation of the sample and n is the sample size.

Now you would like to see how Excel is used to develop a confidence interval for a population mean based on this small sample information.

As you see, to evaluate this formula you need "the mean of the sample" $\bar{x}$ and the margin of error $t_{\alpha/2} \frac{S}{\sqrt{n}}$. Excel will automatically calculate these quantities the way it did for large samples.

Again, the only things you have to do are: add the margin of error to the mean of the sample to find the upper limit of the interval, and subtract the margin of error from the mean to find the lower limit of the interval.
To demonstrate how Excel finds these quantities we will use a data set which contains the hourly incomes of 10 work-study students here at the University of Baltimore. These numbers appear in cells A1 to A10 on an Excel work sheet.

After entering the data we follow the descriptive statistics procedure to calculate the unknown quantities (exactly the way we found the quantities for the large sample). Here are the procedures in step-by-step form:
Step 1. Enter data in cells A1 to A10 on the spreadsheet.
Step 2. From the menus select Tools.
Step 3. Click on Data Analysis, then choose the Descriptive Statistics option. Click OK. On the descriptive statistics dialog, click on Summary Statistics, click on the confidence interval level and type in 90%, or in other problems whichever confidence level you desire. In the Output Range box, enter B1 or whatever location you desire. Now click on OK. The screen shot will look like the following:

Now, as in the calculation of the confidence interval for the large sample, calculate the confidence interval of the population based on this small sample information. The confidence interval is:
6.8 ± 0.414426102
or
\$6.39 <===> \$7.21

We can be at least 90% confident that the interval [\$6.39, \$7.21] contains the true mean of the population.
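
A comparable sketch for the small-sample case is given below. Python's standard library has no t distribution, so the interval coefficient is passed in from a t table (1.833 for 90% confidence and n = 10, as quoted above). The function name and the `wages` list are hypothetical; the final lines recompute the limits from the reported 6.8 ± 0.414426102.

```python
from math import sqrt
from statistics import mean, stdev


def small_sample_ci(data, t_coefficient):
    """Return (lower, upper) for mean +/- t * S / sqrt(n).

    t_coefficient must be looked up in a t table for n - 1 degrees
    of freedom and the desired confidence level.
    """
    margin = t_coefficient * stdev(data) / sqrt(len(data))
    center = mean(data)
    return center - margin, center + margin


# Hypothetical hourly incomes for 10 students (not the original data).
wages = [6, 6.5, 7, 7.5, 8, 6, 6.5, 7, 7.5, 6]
lower, upper = small_sample_ci(wages, t_coefficient=1.833)
print(f"hypothetical sample: [{lower:.2f}, {upper:.2f}]")

# Limits implied by the interval reported in the text.
reported = (6.8 - 0.414426102, 6.8 + 0.414426102)
print(f"reported output: [{reported[0]:.2f}, {reported[1]:.2f}]")
```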
Test of Hypothesis Concerning the Population Mean

Again, we must distinguish two cases with respect to the size of your sample.

Large Sample Size (say, over 30): In this section you will see how Excel can be used to conduct a hypothesis test about a population mean. We will use the hourly incomes of different work-study students than those introduced earlier in the confidence interval section. Data are entered in cells A1 to A36. The objective is to test the following Null and Alternative hypotheses:

$$H_0: \mu = 7 \qquad H_a: \mu \neq 7$$

The null hypothesis indicates that the average hourly income of a work-study student is equal to \$7 per hour; the alternative hypothesis indicates that the average hourly income is not equal to \$7 per hour.

I will repeat the steps taken in descriptive statistics and at the very end will show how to find the value of the test statistic, in this case z, using a cell formula.
Step 1. Enter data in cells A1 to A36 (on the spreadsheet).
Step 2. From the menus select Tools.
Step 3. Click on Data Analysis, then choose the Descriptive Statistics option, click OK.
On the descriptive statistics dialog, click on Summary Statistics. Select the Output Range box, enter B1 or whichever location you desire. Now click OK.
(To calculate the value of the test statistic, search for the mean of the sample, then the standard error. In this output, these values are in cells C3 and C4.)
Step 4. Select cell D1 and enter the cell formula = (C3 - 7)/C4. The screen shot should look like the following:

The value in cell D1 is the value of the test statistic. Since this value falls in the acceptance range of -1.96 to 1.96 (from the normal distribution table), we fail to reject the null hypothesis.
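
The cell formula = (C3 - 7)/C4 and the decision rule can be sketched in Python as below. The function name is illustrative, and because the sample mean and standard error for this data set are not quoted in the text, the values passed in are hypothetical placeholders.

```python
from statistics import NormalDist


def z_test_two_tailed(sample_mean, standard_error, mu0=7.0, alpha=0.05):
    """Return (z, critical value, reject?) for H0: mu = mu0, two-tailed."""
    # Same arithmetic as the cell formula = (C3 - 7)/C4.
    z = (sample_mean - mu0) / standard_error
    # About 1.96 for alpha = 0.05.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z, critical, abs(z) > critical


# Hypothetical values standing in for cells C3 (mean) and C4 (standard error).
z, critical, reject = z_test_two_tailed(sample_mean=6.9, standard_error=0.12)
print(f"z = {z:.4f}, critical = +/-{critical:.2f}, reject H0: {reject}")
```

When z falls between the two critical values, as it does for these placeholder inputs, we fail to reject the null hypothesis.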
Small Sample Size (say, less than 30):
Using the steps taken in the large sample size case, Excel can be used to conduct a hypothesis test for the small-sample case. Let's use the hourly income of 10 work-study students at UB to test the following hypotheses:

$$H_0: \mu = 7 \qquad H_a: \mu \neq 7$$

The null hypothesis indicates that the average hourly income of a work-study student is equal to \$7 per hour. The alternative hypothesis indicates that the average hourly income is not equal to \$7 per hour.

I will repeat the steps taken in descriptive statistics and at the very end will show how to find the value of the test statistic, in this case "t", using a cell formula.
Step 1. Enter data in cells A1 to A10 (on the spreadsheet).
Step 2. From the menus select Tools.
Step 3. Click on Data Analysis, then choose the Descriptive Statistics option. Click OK.
On the descriptive statistics dialog, click on Summary Statistics. Select the Output Range box, enter B1 or whatever location you choose. Again, click on OK.
(To calculate the value of the test statistic, search for the mean of the sample, then the standard error. In this output these values are in cells C3 and C4.)
Step 4. Select cell D1 and enter the cell formula = (C3 - 7)/C4. The screen shot would look like the following:

Since the value of the test statistic t = -0.66896 falls in the acceptance range -2.262 to +2.262 (from the t table, where $\alpha/2 = 0.025$ and the degrees of freedom is 9), we fail to reject the null hypothesis.
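
A minimal sketch of the same small-sample test in Python follows. The critical value 2.262 is taken from the t table as in the text, since the standard library does not provide a t distribution; the helper names and the `wages` list are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(data, mu0):
    """(sample mean - mu0) / standard error, i.e. = (C3 - 7)/C4 in the sheet."""
    standard_error = stdev(data) / sqrt(len(data))
    return (mean(data) - mu0) / standard_error


def fail_to_reject(t, t_critical):
    """True when t lies inside the two-tailed acceptance range."""
    return -t_critical <= t <= t_critical


# Decision for the test statistic reported in the text (df = 9).
decision_reported = fail_to_reject(-0.66896, 2.262)

# Hypothetical sample of 10 hourly incomes (not the original data).
wages = [6, 6.5, 7, 7.5, 8, 6, 6.5, 7, 7.5, 6]
t = t_statistic(wages, mu0=7)
print(f"t = {t:.4f}, fail to reject H0: {fail_to_reject(t, 2.262)}")
```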
Difference Between Mean of Two Populations

In this section we will show how Excel is used to conduct a hypothesis test about the difference between two population means, assuming that the populations have equal variances. The data in this case are taken from various offices here at the University of Baltimore. I collected the hourly income data of 36 randomly selected work-study students and 36 student assistants. The hourly income range for work-study students was \$6 - \$8 while the hourly income range for student assistants was \$6 - \$9. The main objective in this hypothesis test is to see whether there is a significant difference between the means of the two populations. The NULL and the ALTERNATIVE hypotheses are that the means are equal and that the means are not equal, respectively.

Referring to the spreadsheet, I chose A1 and B1 as label centers. The work-study students' hourly incomes for a sample of size 36 are shown in cells A2:A37, and the student assistants' hourly incomes for a sample of size 36 are shown in cells B2:B37.
Data for Work Study Student: 6, 6, 6, 6, 6, 6, 6, 6.5, 6.5, 6.5, 6.5, 6.5, 6.5, 7, 7, 7, 7, 7, 7, 7, 7.5, 7.5, 7.5, 7.5, 7.5, 7.5, 8, 8, 8, 8, 8, 8, 8, 8, 8

Data for Student Assistant: 6, 6, 6, 6, 6, 6.5, 6.5, 6.5, 6.5, 6.5, 7, 7, 7, 7, 7, 7.5, 7.5, 7.5, 7.5, 7.5, 7.5, 8, 8, 8, 8, 8, 8, 8, 8.5, 8.5, 8.5, 8.5, 8.5, 9, 9, 9, 9
Use the Descriptive Statistics procedure to calculate the variances of the two samples. The Excel procedure for testing the difference between the two population means will require information on the variances of the two populations. Since the variances of the two populations are unknown, they should be replaced with the sample variances. The descriptive statistics for both samples show that the variance of the first sample is $s_1^2 = 0.55546218$, while the variance of the second sample is $s_2^2 = 0.969748$.

| Statistic | Work-study student | Student assistant |
|---|---|---|
| Mean | 7.05714286 | 7.471429 |
| Standard Error | 0.12597757 | 0.166454 |
| Median | 7 | 7.5 |
| Mode | 8 | 8 |
| Standard Deviation | 0.74529335 | 0.984758 |
| Sample Variance | 0.55546218 | 0.969748 |
| Kurtosis | -1.38870558 | -1.192825 |
| Skewness | -0.09374375 | -0.013819 |
| Range | 2 | 3 |
| Minimum | 6 | 6 |
| Maximum | 8 | 9 |
| Sum | 247 | 261.5 |
| Count | 35 | 35 |

To conduct the desired hypothesis test with Excel the following steps can be taken:

Step 1. From the menus select Tools, then click on the Data Analysis option.
Step 2. When the Data Analysis dialog box appears: choose z-Test: Two Sample for means, then click OK.
Step 3. When the z-Test: Two Sample for means dialog box appears:
- Enter A1:A36 in the variable 1 range box (work-study students' hourly income).
- Enter B1:B36 in the variable 2 range box (student assistants' hourly income).
- Enter 0 in the Hypothesis Mean Difference box (if you desire to test a mean difference other than 0, enter that value).
- Enter the variance of the first sample in the Variable 1 Variance box.
- Enter the variance of the second sample in the Variable 2 Variance box and select Labels.
- Enter 0.05, or whatever level of significance you desire, in the Alpha box.
- Select a suitable Output Range for the results (I chose C19), then click OK.
The value of the test statistic z = -1.9845824 appears, in our case, in cell D24. The rejection
rule for this test is z < -1.96 or z > 1.96 from the normal distribution table. In the Excel
output these values for a two-tail test are z < -1.959961082 and z > +1.959961082. Since the
value of the test statistic z = -1.9845824 is less than -1.959961082, we reject the null
hypothesis. We can also draw this conclusion by comparing the p-value for a two-tail test with
the alpha value. Since the p-value 0.047190813 is less than alpha = 0.05, we reject the null
hypothesis. Overall we can say, based on the sample results, that the two populations' means
are different.
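The large-sample test above can be cross-checked outside Excel. The following is a minimal sketch in Python (standard library only) that recomputes the z statistic, the two-tail p-value and the critical value from the summary figures in the Descriptive Statistics output. The function name two_sample_z is hypothetical, and the sample variances stand in for the unknown population variances, as in the text.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """Return (z, two-tail p-value) for a large-sample test of mean1 - mean2."""
    standard_error = sqrt(var1 / n1 + var2 / n2)
    z = (mean1 - mean2 - hypothesized_diff) / standard_error
    p_two_tail = 2 * NormalDist().cdf(-abs(z))
    return z, p_two_tail


# Summary figures copied from the Descriptive Statistics output
# (work-study students first, student assistants second).
z, p = two_sample_z(7.05714286, 0.55546218, 35, 7.471429, 0.969748, 35)

# Two-tail critical value at alpha = 0.05.
z_critical = NormalDist().inv_cdf(1 - 0.05 / 2)

print(f"z = {z:.4f}, two-tail p = {p:.4f}, critical value = {z_critical:.4f}")
print("reject the null" if abs(z) > z_critical else "fail to reject the null")
```

Because the inputs are rounded summary values, the last digits may differ slightly from the Excel output.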
Small Samples: n_1 or n_2 is less than 30
In this section we will show how Excel is used to conduct a hypothesis test about the
difference between two population means, given that the populations have equal variances,
when two small independent samples are taken from the populations. As in the above case,
the data are taken from various offices here at the University of Baltimore. I collected
hourly income data for 11 randomly selected work-study students and 11 randomly selected
student assistants. The hourly income ranges for the two groups were similar: $6 - $8 and
$6 - $9. The main objective of this hypothesis test is similar too: to see whether there is a
significant difference between the means of the two populations. The NULL and the
ALTERNATIVE hypotheses are that the means are equal and that they are not equal,
respectively.
work-study student    student assistant
6                     6
8                     9
7.5                   8.5
6.5                   7
7                     6.5
6                     7
7.5                   7.5
8                     6
6                     8
6.5                   9
7                     7.5
Referring to the spreadsheet, we chose A1 and B1 as label cells. The work-study students'
hourly incomes for a sample of size 11 are shown in cells A2:A12, and the student assistants'
hourly incomes for a sample of size 11 are shown in cells B2:B12. Unlike the previous case, you
do not have to calculate the variances of the two samples; Excel will automatically calculate
these quantities and use them in the calculation of the value of the test statistic.
Similar to the previous case, but a bit different in step # 2, to conduct the desired test of
hypothesis with Excel the following steps can be taken:
Step 1. From the menus select Tools then click on the Data Analysis option.
Step 2. When the Data Analysis dialog box appears:
Choose t-Test: Two Sample Assuming Equal Variances then click OK
Step 3. When the t-Test: Two Sample Assuming Equal Variances dialog box appears:
Enter A1:A12 in the Variable 1 Range box (work-study student hourly income)
Enter B1:B12 in the Variable 2 Range box (student assistant hourly income)
Enter 0 in the Hypothesis Mean Difference box (if you desire to test a mean difference other
than zero, enter that value) then select Labels
Enter 0.05, or whatever level of significance you desire, in the Alpha box
Select a suitable Output Range for the results (I chose C1), then click OK
The value of the test statistic t = -1.362229828 appears, in our case, in cell D10. The
rejection rule for this test is t < -2.086 or t > +2.086 from the t distribution table, where
the t value is based on a t distribution with n_1 + n_2 - 2 = 20 degrees of freedom and where
the area of the upper one tail is 0.025 (that is, alpha/2).
In the Excel output the values for a two-tail test are t < -2.085962478 and t > +2.085962478.
Since the value of the test statistic, t = -1.362229828, lies in the acceptance range between
-2.085962478 and +2.085962478, we fail to reject the null hypothesis.
We can also draw this conclusion by comparing the p-value for a two-tail test with the alpha
value.
Since the p-value 0.188271278 is greater than alpha = 0.05, again we fail to reject the null
hypothesis.
Overall we can say, based on the sample results, that there is not sufficient evidence that
the two populations' means differ.
                                work-study student    student assistant
Mean                            6.909090909           7.454545455
Variance                        0.590909091           1.172727273
Observations                    11                    11
Pooled Variance                 0.881818182
Hypothesized Mean Difference    0
df                              20
t Stat                          -1.362229828
P(T<=t) one tail                0.094135639
t Critical one tail             1.724718004
P(T<=t) two tail                0.188271278
t Critical two tail             2.085962478
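As a cross-check on the Excel output, here is a minimal sketch in Python (standard library only) of the pooled-variance t statistic for the two samples of eleven. The helper name pooled_t is hypothetical, and the critical value is copied from the output table rather than computed, since the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance

work_study = [6, 8, 7.5, 6.5, 7, 6, 7.5, 8, 6, 6.5, 7]
student_assistant = [6, 9, 8.5, 7, 6.5, 7, 7.5, 6, 8, 9, 7.5]


def pooled_t(sample1, sample2):
    """Return (t statistic, pooled variance, degrees of freedom)."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    t = (mean(sample1) - mean(sample2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, pooled_var, df


t_stat, pooled_var, df = pooled_t(work_study, student_assistant)

T_CRITICAL = 2.085962478  # two-tail critical value copied from the Excel output
reject = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.6f}, pooled variance = {pooled_var:.6f}, df = {df}")
print("reject the null" if reject else "fail to reject the null")
```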
ANOVA: Analysis of Variances
In this section the objective is to see whether or not the means of three or more populations,
based on random samples taken from the populations, are equal. Assuming independent samples
are taken from normally distributed populations with equal variances, Excel will do this
analysis if you choose one-way Anova from the menus. We can also choose the Anova two-factor
option, with or without replication, and see whether there is a significant difference
between means when different factors are involved.
Single-Factor ANOVA Test
In this case we were interested to see whether there is a significant difference among the
hourly wages of student assistants in three different service departments here at the
University of Baltimore. Six student assistants were randomly selected from each of the three
departments and their hourly wages were recorded as follows:
ARC      CSI     TCC
10.00    6.50    9.00
8.00     7.00    7.00
7.50     7.00    7.00
8.00     7.50    7.00
7.00     7.00    6.50
Enter the data in an Excel worksheet starting with cell A2 and ending with cell C8. The
following steps should be taken to find the proper output for interpretation.
Step 1. From the menus select Tools and click on the Data Analysis option.
Step 2. When the Data Analysis dialog appears, choose the Anova: Single Factor option; enter
A2:C8 in the input range box. Select Labels in first row.
Step 3. Select any cell as output (here we selected A11). Click OK.
The general form of the Anova table looks like the following:
Source of Variation    SS      df         MS      F           P-value     F crit
Between Groups         SSTR    K - 1      MSTR    MSTR/MSE    0.046725    3.682316674
Within Groups          SSE     n_t - K    MSE
Total
Suppose the test is done at level of significance alpha = 0.05. Since the P-value 0.046725 is
less than 0.05, we reject the null hypothesis. This means there is a significant difference
between the mean hourly incomes of student assistants in these departments.
The Two-way ANOVA Without Replication
In this section, the study involves six students who were offered different hourly wages in
three different department services here at the University of Baltimore. The objective is to
see whether the hourly incomes are the same. Therefore, we can consider the following:
Factor: Department
Treatment: Hourly payments in the three departments
Blocks: Each student is a block since each student has worked in the three different
departments
Student    ARC      CSI     TCC
1          10.00    7.50    7.00
2          8.00     7.00    6.00
3          7.00     6.00    6.50
4          8.00     6.50    7.00
5          9.00     8.00    6.00
6          8.00     8.00    6.00
The general form of the Anova table would look like:
Source of Variation    Sum of Squares    Degrees of freedom    Mean Squares    F
Treatment              SST               K - 1                 MST             F = MST/MSE
Blocks                 SSB               b - 1                 MSB
Error                  SSE               (K - 1)(b - 1)        MSE
Total                  SSTotal           n_t - 1
To find the Excel output for the above data the following steps can be taken:
Step 1. From the menus select Tools and click on the Data Analysis option.
Step 2. When the Data Analysis box appears, select Anova: Two-Factor Without Replication,
then enter A2:D8 in the input range. Select Labels in first row.
Step 3. Select an output range (here we selected A11), then click OK.
SUMMARY    COUNT    SUM     AVERAGE     VARIANCE
1          3        24.5    8.166667    2.583333
2          3        21      7           1
3          3        19.5    6.5         0.25
4          3        21.5    7.166667    0.583333
5          3        23      7.666667    2.333333
6          3        22      7.333333    1.333333
ARC        6        50      8.333333    1.066667
CSI        6        43      7.166667    0.666667
TCC        6        38.5    6.416667    0.241667

ANOVA
Source of Variation    SS          df    MS          F           P-value     F crit
Rows                   4.902778    5     0.980556    1.972067    0.168792    3.325837
Columns                11.19444    2     5.597222    11.25698    0.002752    4.102816
Error                  4.972222    10    0.497222
Total                  21.06944    17
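NOTE: For the departments (Columns), F = MST/MSE = 5.597222/0.497222 = 11.25698, and
F = 4.10 from the table (2 numerator DF and 10 denominator DF).
Since 11.25698 > 4.10 we reject the null.
For the students (Rows, the blocks), F = MSB/MSE = 0.980556/0.497222 = 1.972067, and
F = 3.33 from the table (5 numerator DF and 10 denominator DF). Since 1.972067 < 3.33 we fail
to reject the null for the blocks.
Conclusion: There is sufficient evidence to conclude that hourly rates differ for the three
departments. There is not sufficient evidence that hourly rates differ among the students.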
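The sums of squares in the ANOVA output can be reproduced by hand. The sketch below (Python, standard library only; the function name is hypothetical) assumes the TCC wage for each student is the value implied by the per-student SUMMARY rows of the Excel output (sums 24.5, 21, 19.5, 21.5, 23 and 22). The Rows F ratio relates to the students (blocks) and the Columns F ratio to the departments.

```python
# Hourly wages, one row per student (block); columns are ARC, CSI, TCC.
# ASSUMPTION: TCC values are those implied by the per-student SUMMARY rows.
wages = [
    [10.0, 7.5, 7.0],
    [8.0, 7.0, 6.0],
    [7.0, 6.0, 6.5],
    [8.0, 6.5, 7.0],
    [9.0, 8.0, 6.0],
    [8.0, 8.0, 6.0],
]


def two_way_anova_no_replication(table):
    """Return sums of squares and F ratios for a blocks-by-treatments table."""
    b, k = len(table), len(table[0])  # b blocks (rows), k treatments (columns)
    grand_mean = sum(sum(row) for row in table) / (b * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / b for j in range(k)]

    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = b * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_error = ss_error / ((b - 1) * (k - 1))
    return {
        "ss_rows": ss_rows,
        "ss_cols": ss_cols,
        "ss_error": ss_error,
        "ss_total": ss_total,
        "f_rows": (ss_rows / (b - 1)) / ms_error,
        "f_cols": (ss_cols / (k - 1)) / ms_error,
    }


result = two_way_anova_no_replication(wages)
for name, value in result.items():
    print(f"{name}: {value:.6f}")
```

The P-values and critical values still have to come from Excel or an F table, since the standard library does not provide the F distribution.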
Two-Way ANOVA with Replication
Referring to the student assistant and the work-study hourly wages here at the University of
Baltimore, the following data show the hourly wages for the two categories in three different
departments:
                     ARC     CSI     TCC
Work Study           6.50    6.10    6.90
                     6.80    6.00    7.20
                     7.10    6.50    7.10
Student Assistant    7.40    6.80    7.50
                     7.50    7.00    7.00
                     8.00    6.60    7.10
Factors
Factor A: Student job category (here there are two different job categories)
Factor B: Departments (here we have three departments)
Replication: The number of students in each experimental condition. In this case there are
three replications.
Interaction: The combined effect of job category and department (Factor A by Factor B)
SUMMARY              ARC        CSI    TCC    Total
Work Study
Count                3          3      3      9
Sum                  20.4       19     21     60.2
Average              6.8        6.2    7.1    6.69
Variance             0.09       0.1    0      0.19
Student Assistant
Count                3          3      3      9
Sum                  22.9       20     22     64.9
Average              7.63333    6.8    7.2    7.21
Variance             0.10333    0      0.1    0.18
Total
Count                6          6      6
Sum                  43.3       39     43
Average              7.21667    6.5    7.1
Variance             0.28567    0.2    0

ANOVA
Source of Variation    SS         df    MS     F       P-value        F crit
Sample (Factor A)      1.22722    1     1.2    18.6    0.001016557    4.747221
Columns (Factor B)     1.84333    2     0.9    13.9    0.000741998    3.88529
Interaction            0.38111    2     0.2    2.88    0.095003443    3.88529
Within                 0.79333    12    0.1
Total                  4.245      17
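Conclusion:
Mean hourly income differs by job category (P-value 0.001016557 is less than 0.05).
Mean hourly income differs by department (P-value 0.000741998 is less than 0.05).
Interaction is not significant (P-value 0.095003443 is greater than 0.05).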
Goodness-of-Fit Test for Discrete Random Variables
The CHI-SQUARE distribution can be used in a hypothesis test involving a population
variance. However, in this section we would like to test how close sample results are to the
expected results.
Example: The Multinomial Random Variable
In this example the objective is to see whether or not, based on randomly selected sample
information, the standards set for a population are met. There are many practical examples
that can be used in this situation. For example, it is assumed that the guidelines for hiring
people with different ethnic backgrounds for the US government are set at 70% (WHITE), 20%
(African American) and 10% (others), respectively. A randomly selected sample of 1000 US
employees shows the following results, summarized in a table.
ETHNIC BACKGROUND    EXPECTED NUMBER OF EMPLOYEES    OBSERVED FROM SAMPLE
WHITE                700 = 70% OF 1000               750
AFRICAN American     200 = 20% OF 1000               170
OTHERS               100 = 10% OF 1000               80
As you see, the observed sample numbers for groups two and three are lower than their
expected values, unlike group one, which has a higher observed value. Is this a clear sign of
discrimination with respect to ethnic background? Well, it depends on how much lower the
observed values are. The lower amount might not be statistically significant. To see whether
these differences are significant we can use Excel and find the value of the CHI-SQUARE. If
this value falls within the acceptance region we can assume that the guidelines are met;
otherwise they are not. Now let's enter these numbers into an Excel spreadsheet. We used cells
B7-B9 for the expected proportions, C7-C9 for the observed values and D7-D9 for the
expected frequencies. To calculate the expected frequency for a category, you can multiply the
proportion of that category by the sample size (here 1000). The formula for the first cell of
the expected value column, D7, is =1000*B7. To find the other entries in the expected value
column, use the copy and paste menus as shown in the following picture. These are
important values for the chi-square test. The observed range in this case is C7:C9 while the
expected range is D7:D9. The null and the alternative hypotheses for this test are as follows:
H0: P_W = 0.70, P_A = 0.20 and P_O = 0.10
HA: The population proportions are not P_W = 0.70, P_A = 0.20 and P_O = 0.10
Now let's use Excel to calculate the p-value in a CHI-SQUARE test.
Step 1. Select a cell in the worksheet where you would like the p-value of the CHI-SQUARE
test to appear. We chose cell D12.
Step 2. From the menus, select Insert then click on the Function option; the Paste Function
dialog box appears.
Step 3. In the function category box choose Statistical, from the function name box select
CHITEST and click on OK.
Step 4. When the CHITEST dialog appears:
Enter C7:C9 in the actual-range box, then enter D7:D9 in the expected-range box, and
finally click on OK.
The p-value will appear in the selected cell, D12.
As you see, the p-value is 0.002392, which is less than the level of significance (in this
case the level of significance is alpha = 0.10). Hence the null hypothesis should be rejected.
This means that, based on the sample information, the guidelines are not met. Notice that if
you type "=CHITEST(C7:C9,D7:D9)" in the formula bar, the p-value will show up in the
designated cell.
NOTE: Excel can actually find the value of the CHI-SQUARE. To find this value, first select
an empty cell on the spreadsheet, then in the formula bar type "=CHIINV(D12,2)". D12
designates the p-value found previously and 2 is the degrees of freedom (number of rows
minus one). The CHI-SQUARE value in this case is 12.07121. If we refer to the CHI-SQUARE
table we will see that the cut-off is 4.60517. Since 12.07121 > 4.60517, we reject the
null. The following screen shot shows you how to find the CHI-SQUARE value.
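The CHITEST result can be checked directly from the definition of the statistic. The sketch below (Python, standard library only; the function name is hypothetical) sums (observed - expected)^2 / expected over the three categories. It relies on the fact that, for exactly 2 degrees of freedom, the chi-square upper-tail probability has the closed form exp(-x/2); this shortcut does not carry over to other degrees of freedom.

```python
from math import exp

observed = [750, 170, 80]          # White, African American, Others
proportions = [0.70, 0.20, 0.10]   # hiring guidelines under the null hypothesis


def chi_square_statistic(observed_counts, expected_counts):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected_counts))


n = sum(observed)
expected = [p * n for p in proportions]
chi2 = chi_square_statistic(observed, expected)

# Upper-tail probability for 2 degrees of freedom only: P(X > x) = exp(-x / 2).
p_value = exp(-chi2 / 2)

print(f"chi-square = {chi2:.4f}, p-value = {p_value:.6f}")
```

Computed this way the statistic is about 12.0714. The 12.07121 shown by CHIINV differs in the last digits because it is recovered from the rounded p-value.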
Test of Independence: Contingency Tables
The CHI-SQUARE distribution is also used to test whether two variables are
independent or not. For example, based on sample data you might want to see whether
smoking and gender are independent events for a certain population. The variables of interest
in this case are smoking and the gender of an individual. Another example in this situation
could involve the age range of an individual and his or her smoking habit. Similar to case one,
the data may appear in a table, but unlike case one this table may contain several columns in
addition to rows. The initial table contains the observed values. To find the expected values
for this table we set up another table similar to this one. To find the value of each cell in
the new table we multiply the sum of the cell's column by the sum of the cell's row and divide
the result by the grand total. The grand total is the total number of observations in a study.
Now, based on the following table, test whether or not the smoking habit and the gender of the
population from which the following sample was taken are independent. In other words, is it
true that males in this population smoke more than females?
You could use the formula bar to calculate the expected values for the expected range. For
example, to find the expected value for cell C5, which is placed in C11, you could click on
the formula bar, enter =C6*D5/D6, and then press Enter in cell C11.
Step 1. Observed Range b4:c5
Smoking and gender
          yes    no     total
male      31     69     100
female    45     122    167
total     76     191    267
Step 2. Expected Range b10:c11
So the observed range is b4:c5 and the expected range is b10:c11.
Step 3. Click on fx (paste function).
Step 4. When the Paste Function dialog box appears, click on Statistical in the function
category and CHITEST in the function name, then click OK.
When the CHITEST box appears, enter b4:c5 for the actual range, then b10:c11 for the
expected range.
Step 5. Click on OK (the p-value appears): 0.477395
Conclusion: Since the p-value is greater than the level of significance (0.05), we fail to
reject the null. This means there is not sufficient evidence that smoking and gender are
dependent. Based on the sample information one cannot conclude that females smoke more than
males or the other way around.
Step 6. To find the chi-square value, use the CHIINV function; when the CHIINV box appears
enter 0.477395 for the probability part, then 1 for the degrees of freedom.
Degrees of freedom = (number of columns - 1) X (number of rows - 1)
CHI-SQUARE = 0.504807
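The expected counts and the test statistic for the smoking-and-gender table can be sketched as follows (Python, standard library only; the function name is hypothetical). Each expected count is row total times column total divided by the grand total, as described in the text. The p-value uses the identity that, for exactly 1 degree of freedom, the chi-square upper-tail probability equals erfc(sqrt(x/2)).

```python
from math import erfc, sqrt

# Observed counts: rows are male, female; columns are yes, no.
observed = [[31, 69], [45, 122]]


def expected_counts(table):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]


expected = expected_counts(observed)
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Upper-tail probability for 1 degree of freedom only.
p_value = erfc(sqrt(chi2 / 2))

print(f"chi-square = {chi2:.6f}, p-value = {p_value:.6f}")
```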
Test Hypothesis Concerning the Variance of Two Populations
In this section we would like to examine whether or not the variances of two populations are
equal. Whenever independent simple random samples of equal or different sizes, such as n_1
and n_2, are taken from two normal distributions with equal variances, the sampling
distribution of s_1^2/s_2^2 has an F distribution with n_1 - 1 degrees of freedom for the
numerator and n_2 - 1 degrees of freedom for the denominator. In the ratio s_1^2/s_2^2 the
numerator s_1^2 and the denominator s_2^2 are the variances of the first and the second
sample, respectively. The following figure shows the graph of an F distribution with 10
degrees of freedom for both the numerator and the denominator. Unlike the normal distribution,
as you see, the F distribution is not symmetric. The shape of an F distribution is positively
skewed and depends on the degrees of freedom for the numerator and the denominator. The value
of F is always positive.
Now let us see whether or not the variances of hourly income of student assistants and
work-study students, based on the samples taken previously from these populations, are equal.
Assume that the hypothesis test in this case is conducted at alpha = 0.10. The null and the
alternative are:
H0: sigma_1^2 = sigma_2^2
HA: sigma_1^2 is not equal to sigma_2^2
Rejection Rule: Reject the null hypothesis if F < F_0.95 or F > F_0.05, where F, the value of
the test statistic, is equal to s_1^2/s_2^2, with 10 degrees of freedom for both the numerator
and the denominator. We can find the value of F_0.05 from the F distribution table. If
s_1^2/s_2^2 > 1, we do not need to know the value of F_0.95; otherwise, F_0.95 = 1/F_0.05
for equal sample sizes.
A survey of eleven student assistants and eleven work-study students shows the following
descriptive statistics. Our objective is to find the value of s_1^2/s_2^2, where s_1^2 is the
variance of the student assistant sample and s_2^2 is the variance of the work-study student
sample. As you see, these values are in cells F8 and D8 of the descriptive statistics output.
To calculate the value of s_1^2/s_2^2, select a cell such as A16, enter the cell formula
=F8/D8 and press Enter. This is the value of F in our problem. Since this value,
F = 1.984615385, falls in the acceptance area, we fail to reject the null hypothesis. Hence,
the sample results are consistent with the student assistants' hourly income variance being
equal to the work-study students' hourly income variance. The following screen shot shows how
to find the F value. We can follow the same format for one-tail test(s).
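The F ratio can be recomputed from the two samples of eleven used in the small-sample t-test. The sketch below (Python, standard library only) takes the upper critical value as an assumption: 2.98 is the F table entry for the upper 5% point with 10 and 10 degrees of freedom, typed in by hand because the standard library has no F distribution.

```python
from statistics import variance

work_study = [6, 8, 7.5, 6.5, 7, 6, 7.5, 8, 6, 6.5, 7]
student_assistant = [6, 9, 8.5, 7, 6.5, 7, 7.5, 6, 8, 9, 7.5]

# Test statistic: student assistant variance over work-study variance.
f_ratio = variance(student_assistant) / variance(work_study)

# ASSUMPTION: F_0.05 with (10, 10) degrees of freedom, read from an F table.
F_UPPER = 2.98
F_LOWER = 1 / F_UPPER  # valid shortcut because both samples have the same size

reject = f_ratio < F_LOWER or f_ratio > F_UPPER

print(f"F = {f_ratio:.9f}")
print("reject the null" if reject else "fail to reject the null")
```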
Linear Correlation and Regression Analysis
In this section the objective is to see whether there is a correlation between two variables
and to find a model that predicts one variable in terms of the other. There are many
examples that we could mention, but we will mention the popular ones in the world of
business. Usually the independent variable is represented by the letter x and the dependent
variable is represented by the letter y. A businessman would like to see whether there is a
relationship between the number of cases of soda sold and the temperature on a hot summer day,
based on information taken from the past. He also would like to estimate the number of cases
of soda which will be sold on a particular hot summer day at a ball game. He carefully recorded
the temperatures and the number of cases of soda sold on those particular days. The following
table shows the recorded data from June 1 through June 13. The weatherman predicts a 94 F
degree temperature for June 14. The businessman would like to meet all demands for the cases
of soda ordered by customers on June 14.
DAY       Cases of Soda    Temperature
1-Jun     57               56
2-Jun     59               58
3-Jun     65               63
4-Jun     67               66
5-Jun     75               73
6-Jun     81               78
7-Jun     86               85
8-Jun     88               85
9-Jun     88               87
10-Jun    84               84
11-Jun    82               88
12-Jun    80               84
13-Jun    83               89
Now let's use Excel to find the linear correlation coefficient and the regression line
equation. The linear correlation coefficient is a quantity between -1 and +1. This quantity is
denoted by R. The closer R is to +1, the stronger the positive (direct) correlation, and
similarly the closer R is to -1, the stronger the negative (inverse) correlation between the
two variables. The general form of the regression line is y = mx + b. In this formula, m is
the slope of the line and b is the y-intercept. You can find these quantities from the Excel
output. In this situation the variable y (the dependent variable) is the number of cases of
soda and x (the independent variable) is the temperature. To find the Excel output the
following steps can be taken:
Step 1. From the menus choose Tools and click on Data Analysis.
Step 2. When the Data Analysis dialog box appears, click on Correlation.
Step 3. When the Correlation dialog box appears, enter B1:C14 in the input range box. Click on
Labels in first row and enter a16 in the output range box. Click on OK.
                 Cases of Soda    Temperature
Cases of Soda    1
Temperature      0.966598577      1
As you see, the correlation between the number of cases of soda demanded and the
temperature is a very strong positive correlation. This means that as the temperature
increases, the demand for cases of soda also increases. The linear correlation coefficient is
0.966598577, which is very close to +1.
Now let's follow the same steps, but a bit differently, to find the regression equation.
Step 1. From the menus choose Tools and click on Data Analysis.
Step 2. When the Data Analysis dialog box appears, click on Regression.
Step 3. When the Regression dialog box appears, enter b1:b14 in the y-range box and c1:c14 in
the x-range box. Click on Labels.
Step 4. Enter a19 in the output range box.
Note: The regression equation in general should look like Y = m X + b. In this equation m is
the slope of the regression line and b is its y-intercept.
SUMMARY OUTPUT
Regression Statistics
Multiple R        0.966598577
R Square          0.934312809
Standard Error    2.919383191
Observations      13

ANOVA
              df    SS             MS             F              Significance F
Regression    1     1333.479989    1333.479989    156.4603497    7.58511E-08
Residual      11    93.75078034    8.522798213
Total         12    1427.230769

               Coefficients    Standard Error    t Stat         P-value        Lower 95%      Upper 95%
Intercept      9.17800767      5.445742836       1.685354587    0.120044801    -2.80799756    21.16401
Temperature    0.879202711     0.07028892        12.50841116    7.58511E-08    0.724497763    1.033908
The relationship between the number of cases of soda and the temperature is:
Y = 0.879202711 X + 9.17800767
The number of cases of soda = 0.879202711*(Temperature) + 9.17800767. Referring to this
expression we can approximately predict the number of cases of soda needed on June 14. The
weather forecast for that day is 94 degrees, hence the number of cases of soda needed is:
The number of cases of soda = 0.879202711*(94) + 9.17800767 = 91.82, or about 92 cases.
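The slope, intercept and correlation coefficient reported by Excel follow from the usual least-squares formulas. The sketch below (Python, standard library only; the function name least_squares is hypothetical) recomputes them from the June 1 to June 13 data and then forecasts the demand at 94 degrees.

```python
from math import sqrt

temperature = [56, 58, 63, 66, 73, 78, 85, 85, 87, 84, 88, 84, 89]  # x
cases = [57, 59, 65, 67, 75, 81, 86, 88, 88, 84, 82, 80, 83]        # y


def least_squares(x, y):
    """Return (slope, intercept, correlation coefficient) for y = slope*x + intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = least_squares(temperature, cases)
forecast = slope * 94 + intercept  # predicted cases of soda at 94 degrees

print(f"slope = {slope:.9f}, intercept = {intercept:.8f}, R = {r:.9f}")
print(f"forecast for 94 degrees: {forecast:.2f} cases")
```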
Moving Average and Exponential Smoothing
Moving Average Models: Use the Add Trendline option to analyze a moving average
forecasting model in Excel. You must first create a graph of the time series you want to
analyze. Select the range that contains your data and make a scatter plot of the data. Once the
chart is created, follow these steps:
1. Click on the chart to select it, and click on any point on the line to select the
data series. When you click on the chart to select it, a new option, Chart, is added to the
menu bar.
2. From the Chart menu, select Add Trendline.
The following is the moving average of order 4 for weekly sales:
Exponential Smoothing Models: The simplest way to analyze a time series using an
Exponential Smoothing model in Excel is to use the Data Analysis tool. This tool works
almost exactly like the one for Moving Average, except that you will need to input the value
of alpha instead of the number of periods, k. Once you have entered the data range and the
damping factor, 1 - alpha, and indicated what output you want and a location, the analysis is
the same as the one for the Moving Average model.
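To make the two smoothing methods concrete, here is a minimal sketch in Python (standard library only). The weekly_sales figures are made up for illustration and are not the data behind the chart. The smoothing recursion starts from the first observation, which is one common convention and may not match the exact layout of the Excel tool's output.

```python
def moving_average(series, k):
    """Average of each run of k consecutive observations (order-k moving average)."""
    return [sum(series[i - k + 1 : i + 1]) / k for i in range(k - 1, len(series))]


def exponential_smoothing(series, alpha):
    """Smoothed value = alpha * current observation + (1 - alpha) * previous smoothed value."""
    smoothed = [series[0]]  # convention: start from the first observation
    for observation in series[1:]:
        smoothed.append(alpha * observation + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical weekly sales, for illustration only.
weekly_sales = [20, 24, 22, 26, 25, 27, 30, 28]

ma4 = moving_average(weekly_sales, 4)

damping_factor = 0.7            # what the Excel dialog asks for
alpha = 1 - damping_factor      # the smoothing constant
smoothed_sales = exponential_smoothing(weekly_sales, alpha)

print("moving average of order 4:", ma4)
print("exponentially smoothed:", [round(s, 3) for s in smoothed_sales])
```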
Applications and Numerical Examples
Descriptive Statistics: Suppose you have the following, n = 10, data:
1.2, 1.5, 2.6, 3.8, 2.4, 1.9, 3.5, 2.5, 2.4, 3.0
1. Type your n data points into the cells A1 through An.
2. Click on the "Tools" menu. (At the bottom of the "Tools" menu will be a
submenu "Data Analysis...", if the Analysis Tool Pack has been properly
installed.)
3. Clicking on "Data Analysis..." will lead to a menu from which "Descriptive
Statistics" is to be selected.
4. Select "Descriptive Statistics" by pointing at it and clicking twice, or by
highlighting it and clicking on the "Okay" button.
5. Within the Descriptive Statistics submenu,
a. for the "input range" enter "A1:An", assuming you typed the data into cells A1 to An.
b. click on the "output range" button and enter the output range "C1:C16".
c. click on the Summary Statistics box.
d. finally, click on "Okay."
The Central Tendency: The data can be sorted in ascending order:
1.2, 1.5, 1.9, 2.4, 2.4, 2.5, 2.6, 3.0, 3.5, 3.8
The mean, median and mode are computed as follows:
Mean: (1.2 + 1.5 + 2.6 + 3.8 + 2.4 + 1.9 + 3.5 + 2.5 + 2.4 + 3.0) / 10 = 2.48
Median: (2.4 + 2.5) / 2 = 2.45
The mode is 2.4, since it is the only value that occurs twice.
The midrange is (1.2 + 3.8) / 2 = 2.5.
Note that the mean, median and mode of this set of data are very close to each other. This
suggests that the data are roughly symmetrically distributed.
Variance: The variance of a set of data is the average of the cumulative measure of the
squares of the differences of all the data values from the mean.
The sample variance and the sample-based estimate of the population variance are computed
differently. The sample variance is simply the arithmetic mean of the squares of the
differences between each data value in the sample and the mean of the sample. On the other
hand, the formula for an estimate of the variance in the population is similar to the formula
for the sample variance, except that the denominator in the fraction is (n-1) instead of n.
However, you should not worry about this difference if the sample size is large, say over 30.
Compute an estimate for the variance of the population, given the following sorted data:
1.2, 1.5, 1.9, 2.4, 2.4, 2.5, 2.6, 3.0, 3.5, 3.8, with mean = 2.48 as computed earlier. An
estimate for the population variance is:
s^2 = 1 / (10-1) [ (1.2 - 2.48)^2 + (1.5 - 2.48)^2 + (1.9 - 2.48)^2 + (2.4 - 2.48)^2
+ (2.4 - 2.48)^2 + (2.5 - 2.48)^2 + (2.6 - 2.48)^2 + (3.0 - 2.48)^2 + (3.5 - 2.48)^2
+ (3.8 - 2.48)^2 ]
= (1 / 9) (1.6384 + 0.9604 + 0.3364 + 0.0064 + 0.0064 + 0.0004 + 0.0144 + 0.2704 + 1.0404
+ 1.7424) = 0.6684
Therefore, the standard deviation is s = (0.6684)^(1/2) = 0.8176
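The hand calculations above can be verified with Python's statistics module. This is a minimal sketch: pvariance divides by n (the "sample variance" in the wording of the text), while variance divides by n - 1 (the estimate of the population variance).

```python
from statistics import mean, median, mode, pvariance, stdev, variance

data = [1.2, 1.5, 2.6, 3.8, 2.4, 1.9, 3.5, 2.5, 2.4, 3.0]

data_mean = mean(data)
data_median = median(data)
data_mode = mode(data)
midrange = (min(data) + max(data)) / 2

variance_n = pvariance(data)          # divides by n
variance_n_minus_1 = variance(data)   # divides by n - 1
standard_deviation = stdev(data)      # square root of the n - 1 version

print(f"mean = {data_mean:.2f}, median = {data_median:.2f}, mode = {data_mode}")
print(f"midrange = {midrange:.2f}")
print(f"variance (n) = {variance_n:.4f}, variance (n - 1) = {variance_n_minus_1:.4f}")
print(f"standard deviation = {standard_deviation:.4f}")
```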
Probability and Expected Values: Newsweek reported that the "average take" for bank
robberies was $3,244 but 85 percent of the robbers were caught. Assuming 60 percent of
those caught lose their entire take and 40 percent lose half, graph the probability mass
function using EXCEL. Calculate the expected take from a bank robbery. Does it pay to be a
bank robber?
To construct the probability function for bank robberies, first define the random variable x,
the bank robbery take. If the robber is not caught, x = $3,244. If the robber is caught and
manages to keep half, x = $1,622. If the robber is caught and loses it all, then x = 0. The
associated probabilities for these x values are 0.15 = (1 - 0.85), 0.34 = (0.85)(0.4), and
0.51 = (0.85)(0.6). After entering the x values in cells A1, A2 and A3 and after entering the
associated probabilities in B1, B2, and B3, the following steps lead to the probability mass
function:
1. Click on ChartWizard. The "ChartWizard Step 1 of 4" screen will appear.
2. Highlight "Column" at "ChartWizard Step 1 of 4" and click "Next."
3. At "ChartWizard Step 2 of 4 Chart Source Data," enter "=B1:B3" for "Data
range," and click the "column" button for "Series in." A graph will appear. Click
on "series" toward the top of the screen to get a new page.
4. At the bottom of the "Series" page is a rectangle for "Category (X) axis
labels:" Click on this rectangle and then highlight A1:A3.
5. At "Step 3 of 4" move on by clicking on "Next," and at "Step 4 of 4" click on
"Finish."
The expected value of a robbery is $1,038.08.
E(X) = (0)(0.51) + (1622)(0.34) + (3244)(0.15) = 0 + 551.48 + 486.60 = 1038.08
The expected return on a bank robbery is positive. On average, bank robbers get $1,038.08
per heist. If criminals make their decisions strictly on this expected value, then it pays to
rob banks. A decision rule based only on an expected value, however, ignores the risks or
variability in the returns. In addition, our expected value calculations do not include the
cost of jail time, which could be viewed by criminals as substantial.
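The expected take is a probability-weighted sum, which the following minimal sketch (Python, standard library only) reproduces. It also computes the standard deviation of the take as one simple measure of the variability that the expected value ignores; that extra figure is an illustration, not part of the original calculation.

```python
from math import sqrt

take = 3244          # average take reported in the example
p_caught = 0.85

# (outcome in dollars, probability) pairs for the random variable x.
outcomes = [
    (take, 1 - p_caught),          # not caught: keeps everything
    (take / 2, p_caught * 0.40),   # caught, keeps half
    (0, p_caught * 0.60),          # caught, loses it all
]

total_probability = sum(p for _, p in outcomes)
expected_take = sum(x * p for x, p in outcomes)

# Spread of the take around its expected value.
variance_take = sum((x - expected_take) ** 2 * p for x, p in outcomes)
sd_take = sqrt(variance_take)

print(f"probabilities sum to {total_probability:.2f}")
print(f"expected take = {expected_take:.2f}, standard deviation = {sd_take:.2f}")
```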
Discrete & Continuous Random Variables:
Binomial Distribution Application: A multiple choice test has four unrelated questions.
Each question has five possible choices but only one is correct. Thus, a person who guesses
randomly has a probability of 0.2 of guessing correctly. Draw a tree diagram showing the
different ways in which a test taker could get 0, 1, 2, 3 and 4 correct answers. Sketch the
probability mass function for this test. What is the probability that a person who guesses
will get two or more correct?
Solution: Letting Y stand for a correct answer and N a wrong answer, where the probability
of Y is 0.2 and the probability of N is 0.8 for each of the four questions, the probability
tree diagram is shown in the textbook on page 182. This probability tree diagram shows the
"branches" that must be followed to show the calculations captured in the binomial mass
function for n = 4 and p = 0.2. For example, the tree diagram shows the six different branch
systems that yield two correct and two wrong answers (which corresponds to 4!/(2!2!) = 6).
The binomial mass function shows the probability of two correct answers as
P(x = 2 | n = 4, p = 0.2) = 6(0.2)^2(0.8)^2 = 6(0.0256) = 0.1536 = P(2)
which is obtained from Excel by using the "BINOMDIST" command, where the first entry is
x, the second is n, the third is p, and the fourth is mass (0) or cumulative (1); that is,
entering
=BINOMDIST(2,4,0.2,0) in any Excel cell yields 0.1536, and
=BINOMDIST(3,4,0.2,0) yields P(x = 3 | n = 4, p = 0.2) = 0.0256
=BINOMDIST(4,4,0.2,0) yields P(x = 4 | n = 4, p = 0.2) = 0.0016
=1-BINOMDIST(1,4,0.2,1) yields P(x >= 2 | n = 4, p = 0.2) = 0.1808
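The BINOMDIST values can be reproduced from the binomial mass function itself. In this minimal sketch (Python 3.8 or later, standard library only) the helper names binom_pmf and binom_cdf are hypothetical stand-ins for the mass (0) and cumulative (1) modes of the Excel command.

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_cdf(x, n, p):
    """P(X <= x), the cumulative form."""
    return sum(binom_pmf(i, n, p) for i in range(x + 1))


n, p = 4, 0.2
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]
p_two_or_more = 1 - binom_cdf(1, n, p)

for x, prob in enumerate(pmf):
    print(f"P(x = {x}) = {prob:.4f}")
print(f"P(x >= 2) = {p_two_or_more:.4f}")
```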
Normal Example: If the time required to complete an examination by those with a certain
learning disability is believed to be distributed normally, with a mean of 65 minutes and a
standard deviation of 15 minutes, then when can the exam be terminated so that 99 percent of
those with the disability can finish?
Solution: Because the average and standard deviation are known, what needs to be
established is the amount of time, above the mean time, such that 99 percent of the
distribution is lower. This is a distance that is measured in standard deviations as given by
the Z value corresponding to the 0.99 probability found in the body of Appendix B, Table 5, as
shown in the textbook, OR the command entered into any cell of Excel to find this Z value is
=NORMINV(0.99,0,1) for 2.326342
The closest cumulative probability that can be found is 0.9901, in the row labeled 2.3 and
column headed by .03, Z = 2.33, which is only an approximation for the more exact 2.326342
found in Excel. Using this more exact value, the calculation with mean and standard
deviation in the following formula would be
Z = (X - mu) / sigma
That is, Z = (x - 65)/15
Thus, x = 65 + 15(2.32634) = 99.9 minutes.
Alternatively, instead of standardizing with the Z distribution, using Excel we can simply
work directly with the normal distribution with a mean of 65 and standard deviation of 15 and
enter "=NORMINV(0.99,65,15)". In general, to obtain the x value for which alpha percent of
a normal random variable's values are lower, the following "NORMINV" command may be
used, where the first entry is alpha, the second is mu, and the third is sigma.
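Python's statistics.NormalDist offers an inverse cumulative function that plays the same role as NORMINV. A minimal sketch for the examination example, showing both the standardized route and the direct route:

```python
from statistics import NormalDist

# Z value with 99 percent of the standard normal distribution below it.
z = NormalDist().inv_cdf(0.99)

# Route 1: unstandardize by hand, x = mu + sigma * z.
cutoff_from_z = 65 + 15 * z

# Route 2: work directly with the normal distribution (mean 65, sd 15).
cutoff_direct = NormalDist(mu=65, sigma=15).inv_cdf(0.99)

print(f"z = {z:.6f}")
print(f"cutoff = {cutoff_from_z:.1f} minutes (direct: {cutoff_direct:.1f})")
```

The two routes agree, which mirrors the equivalence of =NORMINV(0.99,0,1) followed by unstandardizing and =NORMINV(0.99,65,15).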
Anot0er Example% "n the early >4F?s, the Toro &ompany of 'inneapolis, 'innesota,
advertised that it would refund the purchase price of a snow blower if the following winter=s
snowfall was less than :> percent of the local average "f the average snowfall is AB:B
inches, with a standard deviation of >:: inches, what is the likelihood that Toro will have to
make refundsS
Solution: Within limits, snowfall is a continuous random variable that can be expected to
vary symmetrically around its mean, with values closer to the mean occurring most often.
Thus, it seems reasonable to assume that snowfall (x) is approximately normally distributed
with a mean of 45.25 inches and standard deviation of 12.2 inches. Nine and one half inches
is 21 percent of the mean snowfall of 45.25 inches and, with a standard deviation of 12.2
inches, the number of standard deviations between 45.25 inches and 9.5 inches is Z:
Z = (x - μ) / σ = (9.50 - 45.25) / 12.2 = -2.93
Using Appendix B, Table 5, the textbook demonstrates the determination of P(x ≤ 9.50) = P(z ≤
-2.93) = 0.0017, the probability of snowfall less than 9.5 inches. Using Excel, this normal
probability is obtained with the "NORMDIST" command, where the first entry is x, the
second is the mean μ, the third is the standard deviation σ, and the fourth is CUMULATIVE (1).
Entering
=NORMDIST(9.5,45.25,12.2,1) gives P(x ≤ 9.50) = 0.001693
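The same cumulative probability can be reproduced without Excel. This is a small sketch under the assumption that Python 3.8+ is available (`statistics.NormalDist` is part of the standard library); the names used are illustrative.

```python
from statistics import NormalDist

# Values from the snowfall example.
mean, sd = 45.25, 12.2
refund_level = 9.5  # inches; roughly 21 percent of the mean snowfall

# Number of standard deviations between the refund level and the mean.
z = (refund_level - mean) / sd

# Cumulative probability P(x <= 9.5), the analogue of
# =NORMDIST(9.5,45.25,12.2,1) with CUMULATIVE set to 1.
p_refund = NormalDist(mean, sd).cdf(refund_level)

print(round(z, 2), round(p_refund, 4))
```

The small gap between the table answer and the software answer comes from rounding Z to two decimals before the table lookup.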
Sampling Distribution and the Central Limit Theorem: A bakery sells an average of 24
loaves of bread per day. Sales (x) are normally distributed with a standard deviation of 4.
1. If a random sample of size n = 1 (day) is selected, what is the probability this x value will
exceed 28?
2. If a random sample of size n = 4 (days) is selected, what is the probability that xbar > 28?
3. Why does the answer in part 1 differ from that in part 2?
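Solutions:
1. The sampling distribution of the sample mean xbar is normal with a mean of 24 and a
standard error of the mean of 4. Thus, using Excel, 0.15866 = 1-NORMDIST(28,24,4,1)
2. The sampling distribution of the sample mean xbar is normal with a mean of 24 and a
standard error of the mean of 2. Using Excel, 0.02275 = 1-NORMDIST(28,24,2,1)
3. The answers differ because the standard error of the mean shrinks from 4 to 4/√4 = 2 when
n increases from 1 to 4, so a sample mean above 28 becomes less likely.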
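The two bakery probabilities can be checked with a short sketch. It assumes Python 3.8+ and uses the standard result that the standard error of the mean is σ/√n; the helper name `prob_mean_exceeds` is hypothetical and introduced only for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Population values from the bakery example.
mu, sigma = 24, 4

def prob_mean_exceeds(cutoff, n):
    """P(xbar > cutoff) for a sample of size n from a normal population."""
    standard_error = sigma / sqrt(n)
    return 1 - NormalDist(mu, standard_error).cdf(cutoff)

p_n1 = prob_mean_exceeds(28, 1)  # single day, standard error 4
p_n4 = prob_mean_exceeds(28, 4)  # mean of four days, standard error 2

print(round(p_n1, 5), round(p_n4, 5))
```

Raising n narrows the sampling distribution around 24, which is why the second probability is so much smaller than the first.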
Regression Analysis: The highway deaths per 100 million vehicle miles and highway speed
limits for 10 countries are given below:
(Death, Speed) = (3.0, 55), (3.3, 55), (3.4, 55), (3.5, 70), (4.1, 55), (4.3, 60), (4.7, 55), (4.9,
60), (5.1, 60), and (6.1, 75)
From this we can see that five countries with the same speed limit have very different
positions on the safety list. For example, Britain with a speed limit of 70 is demonstrably
safer than Japan, at 55. Can we argue that speed has little to do with safety? Use regression.
Solution: Enter the ten paired y and x data into cells A2 to A11 and B2 to B11, with the
"death" rate label in A1 and the "speed" limits label in B1. The following steps produce the
regression output:
1. Choose "Regression" from "Data Analysis" in the "Tools" menu. The Regression dialog box
will appear.
Note: Use the mouse to move between the boxes and buttons. Click on the desired box or
button. The large rectangular boxes require a range from the worksheet. A range may be
typed in or selected by highlighting the cells with the mouse after clicking on the box. If the
dialog box blocks the data, it can be moved on the screen by clicking on the title bar and
dragging.
2. For the "Input Y Range," enter A1 to A11, and for the "Input X Range" enter B1 to B11.
3. Because the Y and X ranges include the "Death" and "Speed" labels in A1 and B1, select the
"Labels" box with a click.
4. Click the "Output Range" button and type the reference cell, which in this demonstration is A13.
5. To get the predicted values of Y (Death rates) and the residuals, select the "Residuals" box with a
click.
6. Clicking "OK" will give the "SUMMARY OUTPUT," "ANOVA," and "RESIDUAL OUTPUT"
sections.
The first section of the Excel printout gives the "SUMMARY OUTPUT". The "Multiple R" is
the square root of the "R Square," the computation and interpretation of which we have
already discussed. The "Standard Error" of estimate (which will be discussed in the next
chapter) is s = 0.86423, which is the square root of "Residual SS" = 5.97511 divided by its
degrees of freedom, df = 8, as given in the "ANOVA" section. We will also discuss the
adjusted R-square of 0.21325 in the following chapters.
Under the "ANOVA" section are the estimated regression coefficients and related statistics
that will be discussed in detail in the next chapter. For now it is sufficient to recognize that
the calculated coefficient values for the slope and y intercept are provided (b = 0.07556 and a
= -0.29333). Next to these coefficient estimates is information on the variability in the
distribution of the least-squares estimators from which these specific estimates were drawn:
the column titled "Std Error" contains the standard deviations (standard errors) of the
intercept and slope distributions; the "t Stat" and "P-value" columns give the calculated values of
the t statistics and associated p-values. As shown in Chapter 13, the t statistic of 1.85458 and
p-value of 0.10077, for example, indicate that the sample slope (0.07556) differs from zero
at approximately the 0.10 two-tail Type I error level (the p-value lies marginally above 0.10),
which points to a relationship between deaths and speed limits in the population. This
conclusion is contrary to the assertion that "speed has little to do with safety".
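SUMMARY OUTPUT: Multiple R = 0.54833, R Square = 0.30067, Adjusted R Square =
0.21325, Standard Error = 0.86423, Observations = 10

| ANOVA | df | SS | MS | F | P-value |
|---|---|---|---|---|---|
| Regression | 1 | 2.56889 | 2.56889 | 3.43945 | 0.10077 |
| Residual | 8 | 5.97511 | 0.74689 | | |
| Total | 9 | 8.54400 | | | |

| Coeffs | Estimate | Std Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | -0.29333 | 2.45963 | -0.11926 | 0.90801 | -5.96526 | 5.37860 |
| Speed | 0.07556 | 0.04074 | 1.85458 | 0.10077 | -0.01839 | 0.16950 |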
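Residual Output:

| Predicted | Residuals |
|---|---|
| 3.86222 | -0.86222 |
| 3.86222 | -0.56222 |
| 3.86222 | -0.46222 |
| 4.99556 | -1.49556 |
| 3.86222 | 0.23778 |
| 4.24000 | 0.06000 |
| 3.86222 | 0.83778 |
| 4.24000 | 0.66000 |
| 4.24000 | 0.86000 |
| 5.37333 | 0.72667 |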
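To see where the regression numbers come from, the least-squares quantities can be recomputed by hand. The sketch below uses only the ten (Death, Speed) pairs given in the example and the usual textbook formulas for simple linear regression; it is an illustration, not a description of how Excel computes its output internally, and the variable names are assumptions made here. The p-value is omitted because it requires the t distribution, which the Python standard library does not provide.

```python
from math import sqrt

# The ten (Death, Speed) pairs from the example, in the order given.
deaths = [3.0, 3.3, 3.4, 3.5, 4.1, 4.3, 4.7, 4.9, 5.1, 6.1]
speeds = [55, 55, 55, 70, 55, 60, 55, 60, 60, 75]

n = len(deaths)
x_bar = sum(speeds) / n
y_bar = sum(deaths) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - x_bar) ** 2 for x in speeds)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(speeds, deaths))
ss_total = sum((y - y_bar) ** 2 for y in deaths)

# Least-squares slope (b) and intercept (a).
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Predicted values and residuals, as in the "Residuals" option.
predicted = [intercept + slope * x for x in speeds]
residuals = [y - p for y, p in zip(deaths, predicted)]

# Summary quantities.
ss_resid = sum(r ** 2 for r in residuals)
r_square = 1 - ss_resid / ss_total
std_error = sqrt(ss_resid / (n - 2))   # standard error of estimate, df = n - 2
se_slope = std_error / sqrt(sxx)       # standard error of the slope
t_slope = slope / se_slope             # t statistic for the slope

print(f"b = {slope:.5f}, a = {intercept:.5f}")
print(f"R Square = {r_square:.5f}, Standard Error = {std_error:.5f}")
print(f"t (slope) = {t_slope:.5f}")
for p, r in zip(predicted, residuals):
    print(f"{p:.5f} {r:.5f}")
```

Comparing the printed values with the Excel output is a quick way to confirm that the data were entered in the intended cells.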
E-Labs to Fully Understand Statistical Concepts
The Value of Performing Experiment: If the learning environment is focused on
background information, knowledge of terms and new concepts, the learner is likely to learn
that basic information successfully. However, this basic knowledge may not be sufficient to
enable the learner to carry out successfully the on-the-job tasks that require more than basic
knowledge. Thus, the probability of making real errors in the business environment is high.
On the other hand, if the learning environment allows the learner to experience and learn
from failures within a variety of situations similar to what they would experience in the "real
world" of their job, the probability of having similar failures in their business environment is
low. This is the realm of simulations: a safe place to fail.
The appearance of statistical software is one of the most important events in the process of
decision making under uncertainty. Statistical software systems are used to construct
examples, to understand the existing concepts, and to find new statistical properties. On the
other hand, new developments in the process of decision making under uncertainty often
motivate developments of new approaches and revision of the existing software systems.
Statistical software systems rely on the cooperation of statisticians and software developers.
Besides the statistical software, Java Applets, Online statistical computation, and the use of
a scientific calculator are required for the course. A Scientific Calculator is one that has the
capability to give you, say, the square root of 5. Any calculator that goes beyond the
4 basic operations is fine for this course. These calculators allow you to perform the simple
calculations you need in this course, for example, taking a square root or raising e
to the power of, say, 0.36. These types of calculators are called general Scientific
Calculators. There are also more specific and advanced calculators for mathematical
computations in other areas such as Finance, Accounting, Civil Engineering, and even
Statistics. The last one, for example, computes the mean, variance, skewness, and kurtosis of a
sample by simply entering all data one-by-one and then pressing any of the mean, variance,
skewness, and kurtosis keys.
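As a rough illustration of what such calculator keys compute, here is a minimal Python sketch. The sample values are hypothetical, chosen only for the demonstration, and the skewness and kurtosis shown are the simple moment-based versions (m3 / m2^1.5 and m4 / m2^2); statistical calculators and Excel may apply different small-sample adjustments, so their results can differ slightly.

```python
from math import sqrt, exp

# The two scientific-calculator operations mentioned in the text.
root_five = sqrt(5)
e_power = exp(0.36)

# Hypothetical sample, entered "one-by-one" as on a statistical calculator.
sample = [2, 4, 4, 5, 7, 9]
n = len(sample)
mean = sum(sample) / n

# Central moments about the mean (divided by n).
m2 = sum((x - mean) ** 2 for x in sample) / n
m3 = sum((x - mean) ** 3 for x in sample) / n
m4 = sum((x - mean) ** 4 for x in sample) / n

variance = m2 * n / (n - 1)   # sample variance with the n - 1 divisor
skewness = m3 / m2 ** 1.5     # moment-based skewness (assumed definition)
kurtosis = m4 / m2 ** 2       # moment-based kurtosis; 3 for a normal curve

print(round(root_five, 4), round(e_power, 4))
print(round(mean, 4), round(variance, 4), round(skewness, 4), round(kurtosis, 4))
```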
Without a computer one cannot perform any realistic statistical data analysis.
Students who are signing up for the course are expected to know the basics of
As a starting point, you need to visit the Excel Web site created for this
course.
This section is a part of the JavaScript E-labs learning tools for decision
making. The following is a classification of statistical JavaScript by their
application areas:
MENU
1. Summarizing Data
   - Bivariate Sampling Statistics
   - Descriptive Statistics
   - Determination of the Outliers
   - Empirical Distribution Function
   - Histogram
   - The Three Means
2. Computational probability
   - Combinatorial Maths
   - Comparing Two Random Variables
   - Multinomial Distributions
   - P-values for the Popular Distributions
3. Requirements for most tests & estimations
   - Removal of the Outliers
   - Sample Size Determination
   - Test for Homogeneity of Population
   - Testing the Mean
   - Testing the Medians
   - Testing the Variance
4. One population & two or more variables
   - The Before-and-After Test for Means and Variances
   - The Before-and-After Test for Proportions
   - Chi-square Test for Crosstable Relationship
   - Multiple Regressions
   - Polynomial Regressions
   - Simple Regression with Diagnostic Tools
   - Testing the Population Correlation Coefficient
5. Two populations & one variable
   - Confidence Intervals for Two Populations
   - K-S Test for Equality of Two Populations
   - Test for Normality
   - Test for Randomness
6. One population & one variable
   - Binomial Exact Confidence Intervals
   - Goodness-of-Fit for Discrete Variables
   - Mean, and Variance Confidence Intervals
   - Two Populations Testing Means & Variances
7. Several populations & one or more variables
   - ANOVA: Testing Equality of the Means
   - Compatibility of Multi-Counts
   - Equality of Multi-variances: The Bartlett's Test
   - Identical Populations Test for Crosstable Data
   - Testing the Proportions
   - Testing Several Correlation Coefficients
Interesting and Useful Sites
Add-ins for Excel
Analyse-It for Microsoft Excel
A selection of:
BUBL Catalogue | Business and Economics (Biz/ed) | Business & Finance |
Business & Industrial | Business Nation | Education World | Economics LTSN |
MathForum | Maths, Stats & OR Network | MERLOT | Social Science | Statistics
& Operational Research | Statistics Network | Statistics on the Web | SurfStat |
University of Cambridge | Virtual Learning Resource Centre | Virtual Library |
WebEc
The Copyright Statement: The fair use, according to the 1996 Fair Use
Guidelines for Educational Multimedia, of materials presented on this Web
site is permitted for non-commercial and classroom purposes only.
This site may be mirrored intact (including these notices), on any server with
public access. All files are available at
http://www.mirrorservice.org/sites/home.ubalt.edu/ntsbarsh/Business-stat for
mirroring.