
1.

[Diagram relating Population and Sample, with arrows labeled Probability and Stats.]

Stats: The science of gaining information and making inferences
about real world situations based on data.
Quantitative: measures or records a numerical quantity.
Categorical: divides sample points into groups (two or more
categories).
Cases: the units in the sample or population.
Variables: the measurements or data recorded for each case.
1.2 Population into sample.
A sample is representative when the population looks like many copies
of the sample.
Goal: similar to the population but smaller.
SRS (simple random sample): Every set of n units in the population has an
equal chance of being selected.
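
A minimal sketch of drawing an SRS with Python's standard library. The population of unit IDs 1 to 100 and the helper name simple_random_sample are made up for illustration; random.sample draws without replacement, so every set of n units is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; every subset of size n is equally likely."""
    rng = random.Random(seed)  # seed only makes the example repeatable
    return rng.sample(population, n)

# Hypothetical population: unit IDs 1..100
population = list(range(1, 101))
sample = simple_random_sample(population, n=10, seed=1)
print(sample)
```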
Sampling bias: Some elements of the population are more or less likely
to be selected than others.
Undercoverage: Failing to consider some units when selecting a
sample.
Nonresponse: Some units cannot be contacted or refuse to participate.
Response bias: Respondents may give inaccurate answers.
Wording bias: Respondents may be confused or deliberately
manipulated by how the question is asked.
[Diagram: Population, then Sample (sampling bias?), then Data (other bias?).]

Population includes all individuals or objects of interest. Data is
collected from a sample that is a subset of the population. The sample
size is denoted n.

Statistical inference: The process of using data from a sample to gain
information about the population.

[Diagram: Population to Sample by data collection; Sample back to Population by statistical inference.]

1.3
Association: Two variables are associated if values of one tend to be
related to values of the other in a regular way.
Smoking and lung cancer.
Road salt and accidents.
Causation: Two variables are causally associated if changing the value of
one influences the value of the other. Cause and effect.
Colder temperature and higher heating cost.

Association
Two variables are associated if values of one variable tend to be related to the values of
the other variable.

Causation
Two variables are causally associated if changing the value of one variable influences
the value of the other variable.

Explanatory and response variables.

Fix this!!!!

Exercise and mental performance.


Confounding variable: A third variable associated with both the
explanatory and the response variable.
Example: heating and back pain.
Confounder: severe weather.
Statistical experiment: We actively control one or more explanatory
factors to observe the relationship with the response variable.
Observational study: We collect data without attempting to
influence any of the variables.
Experimental units: Students in a class.
Possible explanatory factors: Whether music is playing, the type of
music, or the source of the music.
Response variable: Quiz grade.
Comparative experiments: Look for differences in the response
variable among different treatments.
Randomization: The explanatory treatments are assigned at random
before the response is measured.
Control group: A set of experimental units that get no treatment.
Double-blind experiment: Neither the subjects nor the experimenters know
which treatment each subject receives.
Matched pair design for a factor with two levels:
1. Identify pairs of subjects with similar characteristics.
2. Randomly assign one from each pair to each treatment.
3. Compare the responses within each pair.
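
The matched pair steps can be sketched in Python. The subject names, the function names, and the coin-flip rule are all illustrative assumptions, not part of the notes; the point is only that chance, not the experimenter, decides who in each pair gets which treatment.

```python
import random

def assign_matched_pairs(pairs, seed=None):
    """For each pair, randomly decide which member gets treatment 1."""
    rng = random.Random(seed)
    assignment = []
    for a, b in pairs:
        if rng.random() < 0.5:  # fair coin flip within the pair
            a, b = b, a
        assignment.append({"treatment_1": a, "treatment_2": b})
    return assignment

def within_pair_differences(responses_1, responses_2):
    """Compare responses within each pair (treatment 1 minus treatment 2)."""
    return [r1 - r2 for r1, r2 in zip(responses_1, responses_2)]

# Hypothetical pairs of similar subjects
pairs = [("Ana", "Ben"), ("Cy", "Dee"), ("Eli", "Fay")]
assignment = assign_matched_pairs(pairs, seed=0)
print(assignment)
```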
Chapter 2.1

Notation for a Proportion
The proportion for a sample is denoted $\hat{p}$ and read "p-hat".
The proportion for a population is denoted $p$.

Two-Way Table

A two-way table is used to show the relationship between two categorical variables. The
categories for one variable are listed down the side (rows) and the categories for the
second variable are listed across the top (columns). Each cell of the table contains the
count of the number of data cases that are in both the row and column categories.
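
A small sketch of building a two-way table and a sample proportion from raw cases. The cases (class year, answer) are invented for illustration, and two_way_table is a hypothetical helper name.

```python
from collections import Counter

def two_way_table(cases):
    """Count cases in each (row category, column category) cell."""
    counts = Counter(cases)
    rows = sorted({r for r, _ in cases})
    cols = sorted({c for _, c in cases})
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

# Hypothetical cases: (class year, answer to a yes/no question)
cases = [
    ("first-year", "yes"), ("first-year", "no"), ("first-year", "yes"),
    ("senior", "no"), ("senior", "no"), ("senior", "yes"),
]
table = two_way_table(cases)
print(table)

# Sample proportion of "yes" answers (p-hat)
p_hat = sum(1 for _, answer in cases if answer == "yes") / len(cases)
print(p_hat)  # 0.5
```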

Outliers
An outlier is an observed value that is notably distinct from the other values in a dataset.
Usually, an outlier is much larger or much smaller than the rest of the data values.

Common Shapes for Distributions
A distribution shown in a histogram or dotplot is called:
Symmetric if the two sides approximately match when folded on a vertical center line
Skewed to the right if the data are piled up on the left and the tail extends relatively far
out to the right
Skewed to the left if the data are piled up on the right and the tail extends relatively far
out to the left
Bell-shaped if the data are symmetric and, in addition, have the shape shown in
Figure 2.9(c)
Of course, many other shapes are also possible.

Mean
The mean of the data values for a single quantitative variable is given by
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n}$

Median
The median of a set of data values for a single quantitative variable, denoted $m$, is
the middle entry if an ordered list of the data values contains an odd number of entries, or
the average of the middle two values if an ordered list contains an even number of
entries.
The median splits the data in half.

Resistance
In general, we say that a statistic is resistant if it is relatively unaffected by extreme
values. The median is resistant, while the mean is not.
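
To see resistance in numbers, here is a short sketch using Python's standard statistics module. The data values are made up; the only change between the two lists is one extreme value.

```python
import statistics

data = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 60]  # same data, one extreme value

mean_before = statistics.mean(data)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(data)
median_after = statistics.median(with_outlier)

print(mean_before, mean_after)      # 4 14.8  (the mean is pulled toward 60)
print(median_before, median_after)  # 4 4     (the median does not move)
```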

Definition of Standard Deviation

The standard deviation for a quantitative variable measures the spread of the data in a
sample:
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

The standard deviation gives a rough estimate of the typical distance of a data value from
the mean. The larger the standard deviation, the more variability there is in the data and
the more spread out the data are.

Notation for the Standard Deviation
The standard deviation of a sample is denoted $s$, and measures how spread out the data
are from the sample mean $\bar{x}$.
The standard deviation of a population is denoted $\sigma$, which is the Greek letter sigma,
and measures how spread out the data are from the population mean $\mu$.
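
A minimal sketch of the sample standard deviation, assuming the usual n - 1 divisor for s. The data are invented and sample_sd is a hypothetical helper; statistics.stdev from the standard library is used as a cross-check.

```python
import math
import statistics

def sample_sd(data):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return math.sqrt(squared_deviations / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values with mean 5
s = sample_sd(data)
print(round(s, 3))                       # 2.138
print(round(statistics.stdev(data), 3))  # same value from the library
```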

Number of Standard Deviations from the Mean: z-Scores
The z-score for a data value, $x$, from a sample with mean $\bar{x}$ and standard deviation $s$ is
defined to be
$z = \frac{x - \bar{x}}{s}$
For a population, $\bar{x}$ is replaced with $\mu$ and $s$ is replaced with $\sigma$.
The z-score tells how many standard deviations the value is from the mean and is
independent of the unit of measurement.
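
A quick sketch of the z-score and its unit independence. The heights are invented; the same value is scored once in centimeters and once in meters, and z_score is a hypothetical helper name.

```python
import statistics

def z_score(x, data):
    """Number of sample standard deviations that x lies from the sample mean."""
    return (x - statistics.mean(data)) / statistics.stdev(data)

heights_cm = [150, 160, 170, 180, 190]     # hypothetical sample
heights_m = [h / 100 for h in heights_cm]  # same sample in meters

z_cm = z_score(190, heights_cm)
z_m = z_score(1.90, heights_m)
print(round(z_cm, 4), round(z_m, 4))  # 1.2649 1.2649
```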

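Percentiles
The Pth percentile is the value of a quantitative variable which is greater than P percent
of the data.

Five Number Summary
We define
Five Number Summary = (minimum, $Q_1$, median, $Q_3$, maximum)
where
$Q_1$ = first quartile = 25th percentile
$Q_3$ = third quartile = 75th percentile

The five number summary divides the dataset into fourths: about 25% of the data fall
between any two consecutive numbers in the five number summary.

Range and Interquartile Range
From the five number summary, we can compute the following two statistics:
Range = Maximum $-$ Minimum
Interquartile range (IQR) = $Q_3 - Q_1$

Detection of Outliers
As a general rule of thumb, we call a data value an outlier if it is
Smaller than $Q_1 - 1.5(IQR)$ or Larger than $Q_3 + 1.5(IQR)$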
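
A sketch of the five number summary, IQR, and the 1.5(IQR) rule. Quartile conventions differ between textbooks and software; this example assumes the "inclusive" method of statistics.quantiles, and the data and helper names are made up.

```python
import statistics

def five_number_summary(data):
    """(min, Q1, median, Q3, max), using the 'inclusive' quartile convention."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (min(data), q1, median, q3, max(data))

def find_outliers(data):
    """Values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

data = [1, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical values
summary = five_number_summary(data)
outliers = find_outliers(data)
print(summary)   # (1, 4.0, 6.0, 8.0, 30)
print(outliers)  # [30]
```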
Correlation
The correlation is a measure of the strength and direction of linear association between
two quantitative variables.

Notation for the Correlation
The correlation between two quantitative variables of a sample is denoted $r$.
The correlation between two quantitative variables of a population is denoted $\rho$, which is
the Greek letter rho.
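
The notes do not give a formula for r. As a sketch, the standard sample correlation formula is assumed here: the sum of products of deviations divided by the square root of the product of the sums of squared deviations. The data and the helper name correlation are made up.

```python
import math

def correlation(xs, ys):
    """Sample correlation r between two quantitative variables."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]  # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]  # hypothetical response values
r = correlation(xs, ys)
print(round(r, 4))  # 0.7746
```

The value of r is always between -1 and 1; a perfectly linear increasing relationship gives 1 and a perfectly linear decreasing one gives -1.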
The regression line to predict $y$ from $x$ is NOT the same as the regression line to
predict $x$ from $y$. Be sure to always pay attention to which is the explanatory variable and
which is the response variable! A regression line is always in the form
$\hat{y} = a + bx$
The residual at a data value is the difference between the observed and predicted values
of the response variable:
Residual = Observed $-$ Predicted = $y - \hat{y}$
On a scatterplot, the residual represents the vertical deviation from the line to a data
point. Points above the line will have positive residuals and points below the line will
have negative residuals. If the predicted values closely match the observed data values,
the residuals will be small.
For the regression line $\hat{y} = a + bx$,
The slope $b$ represents the predicted change in the response variable $y$ given a one unit
increase in the explanatory variable $x$.
The intercept $a$ represents the predicted value of the response variable $y$ if the explanatory
variable $x$ is zero. The interpretation may be nonsensical since it is often not reasonable
for the explanatory variable to be zero.
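
A sketch of fitting the line and computing residuals. The notes do not say how a and b are found; the usual least squares formulas are assumed here, and the data and function name are illustrative only.

```python
def least_squares_line(xs, ys):
    """Return intercept a and slope b of the least squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

xs = [1, 2, 3, 4, 5]  # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]  # hypothetical response values

a, b = least_squares_line(xs, ys)
# Residual = observed - predicted, one per data point
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(round(a, 2), round(b, 2))          # 2.2 0.6
print([round(e, 2) for e in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]

# Swapping the roles of x and y gives a different line
a_rev, b_rev = least_squares_line(ys, xs)
print(round(a_rev, 2), round(b_rev, 2))  # -1.0 1.0
```

Solving y-hat = 2.2 + 0.6x for x would give a slope of about 1.67, not the 1.0 found by regressing x on y, which is why the choice of explanatory variable matters.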

Statistical Inference
Statistical inference is the process of drawing conclusions about the entire population
based on the information in a sample.

Parameters vs Statistics
A parameter is a number that describes some aspect of a population.
A statistic is a number that is computed from the data in a sample.

Point Estimate
We use the statistic from a sample as a point estimate for a population parameter.

Sample Size Matters!
As the sample size increases, the variability of sample statistics tends to decrease and
sample statistics tend to be closer to the true value of the population parameter.
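
A small simulation sketch of why sample size matters. It assumes a hypothetical population proportion p = 0.4 and compares the spread of sample proportions for n = 10 and n = 400; the function name and all numbers are made up.

```python
import random
import statistics

def sample_proportion_spread(p, n, num_samples=500, seed=0):
    """Standard deviation of p-hat across many simulated samples of size n."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return statistics.stdev(p_hats)

spread_small = sample_proportion_spread(p=0.4, n=10)
spread_large = sample_proportion_spread(p=0.4, n=400)
print(spread_small, spread_large)  # the second value should be much smaller
```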

Interval Estimate Based on a Margin of Error
An interval estimate gives a range of plausible values for a population parameter. One
common form of an interval estimate is
Point estimate $\pm$ margin of error
where the margin of error is a number that reflects the precision of the sample statistic
as a point estimate for this parameter.

Confidence Interval
A confidence interval for a parameter is an interval computed from sample data by a
method that will capture the parameter for a specified proportion of all samples.
The success rate (proportion of all samples whose intervals contain the parameter) is
known as the confidence level.

95% Confidence Interval Using the Standard Error
If we can estimate the standard error $SE$ and if the sampling distribution is relatively
symmetric and bell-shaped, a 95% confidence interval can be estimated using
Statistic $\pm$ $2 \cdot SE$
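
The rule can be sketched as a tiny helper. The statistic 0.42 and standard error 0.03 are invented numbers, and the function name is hypothetical; how the SE is estimated is outside this sketch.

```python
def confidence_interval_95(statistic, standard_error):
    """Rough 95% interval: statistic plus or minus two standard errors."""
    margin = 2 * standard_error
    return (statistic - margin, statistic + margin)

# Hypothetical sample proportion and estimated standard error
low, high = confidence_interval_95(0.42, 0.03)
print(round(low, 2), round(high, 2))  # 0.36 0.48
```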
