
UNIT 3

SYLLABUS:
• Hypothesis testing: one-tailed and two-tailed tests for means of small samples (t-test) – F-test – one-way and two-way analysis of variance (ANOVA) – chi-square test for a single sample standard deviation, independence of attributes and goodness of fit.
Sample Design

• All the items under consideration in any field constitute a "universe" or "population".
• A complete enumeration of all the items in the population is known as a "census enquiry".
• Since a complete census enquiry is generally not possible, we select a "sample" – a few items from the universe – for our study.
• The researcher selects the sample using a "sampling design" – a definite plan determined before any data are actually collected.
Types of Sampling
Probability sampling techniques:
1. Simple Random Sampling
2. Systematic Sampling
3. Stratified Sampling
4. Cluster/area Sampling
5. Multi-stage Sampling
Types of Sampling
Non-Probability sampling techniques:
1. Deliberate Sampling
2. Quota Sampling
3. Sequential Sampling
4. Snowball sampling
5. Panel samples
Sampling Techniques
• Sample: A sample can be defined as a part of
the target population that represents the total
population.
• Sampling Process:
1. Define the population.
2. Identify the sampling frame.
3. Specify the sampling unit.
4. Selection of sampling method.
5. Determination of sample size.
6. Specify the sampling plan.
7. Selection of samples.
Sources of systematic bias (non-sampling error) in a
survey/research:
1. Inappropriate sampling frame.
2. Defective measuring device.
3. Non-respondents.
4. Indeterminacy principle.
5. Natural bias in the reporting of data.
Sampling errors: the random variations of the
sample estimates around the true population
values. Sampling error decreases as the sample
size increases, and it is of smaller magnitude
for a homogeneous population.
Determination of Sample Size
• To determine the sample size for a pilot study, the
following formula is used:
• Sample size, n = (Zα σ / e)²
Where,
σ represents the SD of the population, Zα is the
critical value for the researcher's chosen level of
confidence, and e represents the tolerable error in
the study.
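The formula above can be sketched in code; this is a minimal illustration (the function name and the example numbers are assumptions, not from the slides), using scipy to look up the two-sided critical value Zα:

```python
from math import ceil
from scipy.stats import norm

def sample_size(sigma, confidence, e):
    """n = (Z_alpha * sigma / e)^2, rounded up to a whole number."""
    z = norm.ppf(1 - (1 - confidence) / 2)  # two-sided critical value
    return ceil((z * sigma / e) ** 2)

# e.g. population SD 15, 95% confidence, tolerable error e = 3
n = sample_size(15, 0.95, 3)  # about 97 respondents
```

Note that n is rounded up, since a sample size must be a whole number at least as large as the formula's value.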
Hypothesis Testing
• Hypothesis: in statistics, a hypothesis is a
statement characterising the population that
the researcher wishes to verify on the basis of
available sample information.
• Hypothesis testing: a process in which a
choice is made between two actions, i.e.,
either accept or reject the presumed
statement.
Hypothesis Testing
• Terminologies:
1. Null hypothesis: a statement about the
population whose credibility or validity the
researcher wants to assess on the basis of the
sample. It is formulated specifically to test
for possible rejection or nullification, and it
always states 'no difference'. The researcher's
main claim is tested against this statement. Eg:
there is no significant difference in the
customers' opinion on opening Walmart
outlets in Chennai city.
2. Alternative hypothesis: the conclusion that
we accept when the data fail to support the null
hypothesis. Eg: the customers prefer kirana
shops rather than established outlets.
3. Significance level: the probability, expressed
as a percentage, of rejecting the null
hypothesis when it is true. Normally, 5% and
1% significance levels are considered for
evaluation.
4. One-tailed test: a hypothesis test in which
there is only one rejection region, i.e., we are
concerned with whether the observed value
deviates from the hypothesised value in one
direction only.
5. Two-tailed test: a hypothesis test in which the
null hypothesis is rejected if the sample
value is significantly higher or lower than the
hypothesised value. It is the test that involves
both rejection regions.
Hypothesis Testing
• Types of hypothesis:
Descriptive hypothesis.
Relational hypothesis.
Working hypothesis.
Null hypothesis.
Analytical hypothesis.
Statistical hypothesis.
Common sense hypothesis.
Simple and composite hypothesis.
Hypothesis Testing
• Sources of hypothesis:
1. Theory.
2. Observation.
3. Past experience.
4. Case studies.
5. Similarity.
Steps involved in Hypothesis Testing
1. Formulate the hypothesis.
2. Select the level of significance.
3. Find the critical region.
4. Select an appropriate test.
5. Calculate the value.
6. Obtain the critical test value.
7. Make decisions.
Errors in hypothesis testing
1. Type I error – the null hypothesis is true, but the
test rejects it.
2. Type II error – the null hypothesis is false, but the
test accepts it.
Level of significance and confidence
• Significance level: the percentage risk of
rejecting a null hypothesis when it is true; it is
denoted by 𝛼. Generally taken as 1%, 5% or 10%.
• (1 − 𝛼) is the confidence level with which the
null hypothesis is accepted when it is true.
Two-tailed test at 5% significance level
• Suitable when H0: μ = μ0 and Ha: μ ≠ μ0.
• [Figure: acceptance region (confidence level (1 − α) = 95%) in the centre, with a rejection region in each tail (significance level α = 0.025 or 2.5% per tail).]
Left-tailed test at 5% significance level
• Suitable when H0: μ = μ0 and Ha: μ < μ0.
• [Figure: rejection region (significance level α = 0.05 or 5%) in the left tail; acceptance region (confidence level (1 − α) = 95%) to its right.]
Right-tailed test at 5% significance level
• Suitable when H0: μ = μ0 and Ha: μ > μ0.
• [Figure: acceptance region (confidence level (1 − α) = 95%) on the left; rejection region (significance level α = 0.05 or 5%) in the right tail.]
HYPOTHESIS TESTING PROCEDURES
Z-test (Large Samples)
• The z-test is a hypothesis test in which the test statistic follows
a normal distribution.
• The z-test is best used for samples larger than 30 because,
under the central limit theorem, the sampling distribution of
the mean becomes approximately normal as the sample size
grows.
• A z-test is a statistical test used to determine whether two
population means differ when the variances are known and
the sample sizes are large.
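A two-sample z-test of the kind just described can be sketched as follows; this is a minimal illustration (the function name and the example figures are assumptions, not from the slides):

```python
from math import sqrt
from scipy.stats import norm

def two_sample_z(mean1, mean2, var1, var2, n1, n2, alpha=0.05):
    """Two-tailed z-test of H0: mu1 = mu2 when the population
    variances var1, var2 are known and samples are large."""
    z = (mean1 - mean2) / sqrt(var1 / n1 + var2 / n2)
    z_crit = norm.ppf(1 - alpha / 2)          # e.g. 1.96 at 5%
    p = 2 * (1 - norm.cdf(abs(z)))            # two-tailed p-value
    return z, p, abs(z) > z_crit              # True => reject H0

# hypothetical: two batches with known variances 16 and 25
z, p, reject = two_sample_z(mean1=52, mean2=50, var1=16, var2=25,
                            n1=100, n2=100)
```

Here |z| exceeds the 5% critical value 1.96, so the null hypothesis of equal means would be rejected.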
t-test (Small Samples)
• A t-test is an analysis of two population means through the
use of statistical examination; a two-sample t-test is
commonly used with small sample sizes, testing the difference
between the samples when the variances/SDs of the two normal
distributions are not known.
• A t-test looks at the t-statistic, the t-distribution and the degrees
of freedom to determine the probability of a difference
between populations for hypothesis testing.
• The t-test is often called Student's t-test after its originator,
who published under the pen name "Student".
t-test : Test for a specified mean
Two tailed test hypothesis:
• H0 : μ=μ0
• H1 : μ≠μ0
• Test Statistic,
t = (x̄ − μ0) / (S / √n)
• Where x̄ is the sample mean, n the sample size, and
S = √{(ns²) / (n−1)}, s being the sample standard deviation.
Inference:
• Table Value: (n-1) is the degrees of freedom for the distribution. This value is used to
find the table value for the given level of significance.
• If the calculated value is less than the table value at 5% or 1% significance value, Null
Hypothesis is accepted.
• If the calculated value is more than the table value at 5% or 1% significance value, Null
Hypothesis is rejected. So, the alternative hypothesis will be accepted in that case.
• Note: A one-tailed test is performed the same way; the difference lies in the
statement of the hypotheses and in the table value for the chosen significance level.
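The one-sample t-test above is available directly in scipy; this is a minimal sketch (the data values are hypothetical, not from the slides):

```python
from scipy import stats

# Hypothetical sample of 10 measurements; test H0: mu = 50
x = [48, 52, 49, 51, 47, 50, 53, 46, 49, 48]
t_stat, p_value = stats.ttest_1samp(x, popmean=50)
# Two-tailed by default; reject H0 at the 5% level if p_value < 0.05
```

For this sample the p-value exceeds 0.05, so the null hypothesis μ = 50 would be accepted.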
t-test : Test of significance for the difference between
two population means when the population SD’s are
not known
Two tailed test hypothesis:
• H0 : μ1=μ2
• H1 : μ1≠μ2
• Test Statistic,
t = (x̄1 − x̄2) / [Sp √(1/n1 + 1/n2)]
Where Sp = √{(n1 s1² + n2 s2²) / (n1 + n2 − 2)}
Inference:
• Table Value: (n1 + n2 − 2) is the degrees of freedom for the distribution. This value is
used to find the table value for the given level of significance.
• If the calculated value is less than the table value at 5% or 1% significance value, Null
Hypothesis is accepted.
• If the calculated value is more than the table value at 5% or 1% significance value, Null
Hypothesis is rejected. So, the alternative hypothesis will be accepted in that case.
• Note: A one-tailed test is performed the same way; the difference lies in the statement
of the hypotheses and in the table value for the chosen significance level.
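The pooled two-sample t-test can be run with scipy's `ttest_ind`, which pools the variances exactly as the Sp formula above does (scipy uses sample variances with divisor n − 1, which equals the slide's n s² when s² uses divisor n). The data here are hypothetical:

```python
from scipy import stats

# Hypothetical marks of two small groups; H0: mu1 = mu2
group1 = [24, 27, 26, 23, 25]
group2 = [29, 30, 28, 31, 27]
t_stat, p_value = stats.ttest_ind(group1, group2)  # equal-variance (pooled) t-test
# Reject H0 at the 5% level if p_value < 0.05
```

For these groups the p-value is well below 0.05, so the null hypothesis of equal means would be rejected.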
t-test – Paired Observations
• The condition of independence may not hold for all samples.
When the samples are related to each other, a t-test can be
performed for small samples by converting the pair of samples into a
single series of differences. The statistic is:

t = d̄ / (Sd / √n)

Where d = x − y, d̄ is the mean of the differences, and Sd represents the S.D. of the differences (treated here as the population value).

Note: If the Sd value is taken from the sample, then the denominator will
be Sd / √(n−1).
• Inference is similar to the previous t-tests discussed.
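The paired t-test is scipy's `ttest_rel`, which forms the differences d = x − y internally; a minimal sketch with hypothetical before/after scores:

```python
from scipy import stats

# Hypothetical before/after scores for the same 6 subjects
before = [12, 15, 11, 14, 10, 13]
after  = [14, 16, 13, 15, 12, 14]
t_stat, p_value = stats.ttest_rel(before, after)
# Equivalent to a one-sample t-test on the differences before - after
```

Because every subject improved by a consistent amount, the p-value is small and the null hypothesis of no difference would be rejected.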
F-test
• This test is based on the test statistic that
follows F-distribution.
• This F-test is used to check the equality of two
population variances.
Two tailed test hypothesis:
• H0 : σ1² = σ2²
• H1 : σ1² ≠ σ2²
F-test
• Test Statistic,
F = S1² / S2²
• Where S1² and S2² are the estimates of the two population
variances, arranged so that the value of F is always
greater than 1 (larger variance in the numerator).
F-test
Table value Calculations:
• The value of (n1 – 1) degrees of freedom represents the
row & the value of (n2 – 1) degrees of freedom
represents the column.
• With this table value the final interpretation is made.
• If the calculated value is less than the table value, Null
Hypothesis is accepted.
• If the calculated value is more than the table value,
Null Hypothesis is rejected. So, the alternative
hypothesis will be accepted in that case.
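The F-test has no single-call function in scipy, but it is short to sketch from the pieces above; the function name and the example data are assumptions, not from the slides:

```python
from statistics import variance  # unbiased sample variance (divisor n-1)
from scipy import stats

def f_test(x, y, alpha=0.05):
    """Two-tailed F-test of H0: the two population variances are equal.
    The larger sample variance goes in the numerator so that F >= 1."""
    s2_x, s2_y = variance(x), variance(y)
    if s2_x >= s2_y:
        f, df1, df2 = s2_x / s2_y, len(x) - 1, len(y) - 1
    else:
        f, df1, df2 = s2_y / s2_x, len(y) - 1, len(x) - 1
    f_crit = stats.f.ppf(1 - alpha / 2, df1, df2)  # upper critical value
    return f, f_crit, f > f_crit                   # True => reject H0

# hypothetical data: second sample is visibly more spread out
f, f_crit, reject = f_test([20, 22, 19, 25, 24], [18, 30, 12, 28, 22])
```

With df1 = df2 = 4 the calculated F falls below the upper 2.5% critical value, so the null hypothesis of equal variances would be accepted here despite the apparent difference in spread.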
ANOVA
• Analysis Of Variance.
• It is a technique to test equality of means when
more than 2 populations are considered.
• Between Sample Variation and within sample
variation.
• There are two types in this:
(i) One-way ANOVA &
(ii) Two-way ANOVA
One-Way Analysis of Variance
Methodology:
• Write down the hypotheses for one-way ANOVA, i.e.,
H0 : all the population means are equal;
H1 : the population means are not all equal.
1. Calculate N (total number of observations).
2. Calculate T (total of all the observations).
3. Calculate the correction factor T²/N.
4. Calculate the sums of squares:
(i) Total sum of squares:
SST = [∑X1² + ∑X2² + … + ∑Xn²] − T²/N
(ii) Column sum of squares:
SSC = [(∑X1)²/N1 + (∑X2)²/N1 + … + (∑Xn)²/N1] − T²/N
Where N1 refers to the number of elements in each column.
One-Way Analysis of Variance
5. Prepare ANOVA table and Calculate F-ratio
(F-value is calculated such that F>1)
ANOVA TABLE

Source of variation  | Sum of squares | Degrees of freedom | Mean sum of squares | Variance ratio
Between columns      | SSC            | c − 1              | MSC = SSC/(c − 1)   | F = MSC/MSE
Within columns       | SSE            | N − c              | MSE = SSE/(N − c)   | (or F = MSE/MSC,
(errors)             |                |                    |                     |  whichever is > 1)
Total                | SST            | N − 1              |                     |
One-Way Analysis of Variance
6. After calculating the F ratio value, final interpretation is
made on comparison with the respective table value.
Finding Table value:
1. (c-1) degrees of freedom value - Column.
2. (N-c) degrees of freedom value – Row.
Compare the respective table value with the calculated
value at 5% or 1% level of significance.
• If calculated value < table value, Null hypothesis is
accepted.
• If calculated value > table value, Null hypothesis is rejected
& Alternate hypothesis is accepted.
• Accordingly, give the final interpretation in words.
Example- one way ANOVA
Example: three samples obtained from normal
populations with equal variances. Test the
hypothesis at the 5% level of significance that the sample
means are equal.
X1: 8, 10, 7, 14, 11
X2: 7, 5, 10, 9, 9
X3: 12, 9, 13, 12, 14
Solution: H0 : μ1 = μ2 = μ3 (the population means are equal)
H1 : the population means are not all equal

X1     (X1)²   X2     (X2)²   X3     (X3)²
8      64      7      49      12     144
10     100     5      25      9      81
7      49      10     100     13     169
14     196     9      81      12     144
11     121     9      81      14     196
Total: 50      530    40      336    60     734

Number of observations, N = 15
Total of all observations, T = 50 + 40 + 60 = 150
Correction factor = T²/N = (150)²/15 = 22500/15 = 1500
Total sum of squares, SST = 530 + 336 + 734 − 1500 = 100
Sum of squares between samples, SSC = (50)²/5 + (40)²/5 + (60)²/5 − 1500 = 40
Sum of squares within samples, SSE = 100 − 40 = 60
ANOVA Table

Source of variation | Sum of squares | Degrees of freedom  | Mean sum of squares | Variance ratio
Between columns     | SSC = 40       | c − 1 = 3 − 1 = 2   | MSC = 40/2 = 20     | F = MSC/MSE = 20/5 = 4
Within columns      | SSE = 60       | N − c = 15 − 3 = 12 | MSE = 60/12 = 5     | (since MSC > MSE)
(errors)            |                |                     |                     |
Total               | SST = 100      | N − 1 = 15 − 1 = 14 |                     |

• F=4 (Calculated Value)


Solution…
• Table Value: V1 = 2 and V2 = 12 at 5% level of
significance = 3.89
• Calculated value>Table value, so Null
Hypothesis is rejected and Alternate
Hypothesis is accepted.
• So, the population means are not equal at 5%
level of significance.
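The hand computation above can be checked with scipy's one-way ANOVA function, using the same three samples:

```python
from scipy import stats

# The three samples from the worked example
x1 = [8, 10, 7, 14, 11]
x2 = [7, 5, 10, 9, 9]
x3 = [12, 9, 13, 12, 14]
f_stat, p_value = stats.f_oneway(x1, x2, x3)
# f_stat reproduces the hand-calculated F = 4; p_value < 0.05 => reject H0
```

The p-value scipy reports is just under 0.05, agreeing with the table-value comparison (4 > 3.89) made above.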
Two-Way Analysis of Variance
Methodology:
• Write down the hypotheses for two-way ANOVA, i.e.,
• H0 : There is no significant difference between the column means, nor between
the row means.
H1 : There is a significant difference between the column means and/or between
the row means.
1. Calculate N (total number of observations).
2. Calculate T (total of all the observations).
3. Calculate the correction factor T²/N.
4. Calculate the sums of squares:
(i) Total sum of squares:
SST = [∑X1² + ∑X2² + … + ∑Xn²] − T²/N
(ii) Column sum of squares:
SSC = [(∑X1)²/N1 + (∑X2)²/N1 + … + (∑Xn)²/N1] − T²/N
Where N1 refers to the number of elements in each column.
(iii) Row sum of squares:
SSR = [(∑Y1)²/N2 + (∑Y2)²/N2 + … + (∑Yn)²/N2] − T²/N
Where N2 refers to the number of elements in each row.
Two-Way Analysis of Variance
5. Prepare ANOVA table and Calculate F-ratio
(F-value is calculated such that F>1)
ANOVA TABLE

Source of variation | Sum of squares | Degrees of freedom | Mean sum of squares       | Variance ratio
Between columns     | SSC            | c − 1              | MSC = SSC/(c − 1)         | Fc = MSC/MSE
Between rows        | SSR            | r − 1              | MSR = SSR/(r − 1)         | FR = MSR/MSE
Residual (errors)   | SSE            | N − c − r + 1      | MSE = SSE/(N − c − r + 1) |
Total               | SST            | N − 1              |                           |
Two-Way Analysis of Variance
6. After calculating the Fc and FR ratio values, the final interpretation is made on
comparison with the respective table values.
Finding Table value:
Fc
1. (c-1) degrees of freedom value - Column.
2. (N-c-r+1) degrees of freedom value – Row.
FR
1. (r-1) degrees of freedom value - Column.
2. (N-c-r+1) degrees of freedom value – Row.
Compare the respective table value with the calculated value at 5% or 1%
level of significance.
• If calculated value < table value, Null hypothesis is accepted.
• If calculated value > table value, Null hypothesis is rejected & Alternate
hypothesis is accepted.
• Accordingly, give the two final interpretations in words.
Two-Way Analysis of Variance
• The coding method is another way of solving
two-way ANOVA.
• In this method, the first step is to subtract a
common constant from all the values in the data set;
the regular methodology discussed
earlier is then followed on the coded values.
NON-PARAMETRIC METHODS
• Non-parametric tests can be applied when:
– Data don’t follow any specific distribution and no assumptions
about the population are made. Distribution-free tests.
– Data measured on any scale.
• Commonly used Non Parametric Tests are:
− Chi Square test
− The Sign Test
− Wilcoxon Signed-Ranks Test
− Mann–Whitney U or Wilcoxon rank sum test
− The Kruskal Wallis or H test
− The Spearman rank correlation test
CHI SQUARE TEST
• First used by Karl Pearson
• Simplest & most widely used non-parametric
test in statistical work.
• Calculated using the formula
χ² = ∑ [(O − E)² / E]
Where O = observed frequencies
and E = expected frequencies.
• The greater the discrepancy between observed and expected frequencies,
the greater the value of χ².
• The calculated value of χ² is compared with the table value of χ² for the given
degrees of freedom.
CHI SQUARE TEST
• Application of chi-square test:
– Test of independence of attributes (disease &
treatment, vaccination & immunity)
– Test of proportions (compare frequencies of groups)
– The test for goodness of fit (determine if actual
numbers are similar to the expected/ theoretical
numbers)
CHI SQUARE TEST OF INDEPENDENCE
• H0: In the population, the two categorical
variables are independent OR there is no
relationship between the two variables.
• H1: In the population, two categorical variables
are dependent OR there is a relationship between
the two variables.
• Summarize the data in a two-way contingency
table, recording the observed count in each cell
alongside the corresponding expected count.
• Calculate the expected count for each cell from the
observed data using a formula.
CHI SQUARE TEST OF INDEPENDENCE
• Expected Count, E=[row total×column total] /
sample size.
• Then the table is expanded to calculate
χ² = ∑ [(O − E)² / E]
• The calculated value is compared with the table
value and the final interpretations are made. For the
table value: degrees of freedom = (r − 1)(c − 1),
which gives the row to be used under the
significance-level columns.
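Scipy bundles the whole procedure, including the expected-count formula E = (row total × column total) / sample size, into one call; the contingency table here is hypothetical:

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table: treatment (rows) vs outcome (columns), observed counts
observed = [[30, 10],
            [20, 40]]
chi2, p_value, dof, expected = chi2_contingency(observed, correction=False)
# dof = (r - 1)(c - 1) = 1; reject independence at 5% if p_value < 0.05
```

`correction=False` disables Yates' continuity correction so the statistic matches the plain ∑(O − E)²/E formula given above; `expected` holds the E = (row total × column total)/N values for each cell.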
CHI SQUARE TEST FOR GOODNESS OF FIT
The chi-square goodness of fit test is appropriate
when the following conditions are met:
• The sampling method is simple random sampling
• The variable under study is categorical
• The expected value of the number of sample
observations in each level of the variable is at
least 5.
• This approach consists of four steps: (1) state the
hypotheses, (2) formulate an analysis plan, (3)
analyze sample data, and (4) interpret results.
CHI SQUARE TEST FOR GOODNESS OF FIT - ANALYSIS
• Degrees of freedom: the degrees of freedom (DF) is equal to the number of
levels (k) of the categorical variable minus 1: DF = k − 1.
• Expected frequency counts: the expected frequency count at each level of
the categorical variable is equal to the sample size times the hypothesized
proportion from the null hypothesis: Ei = n pi,
where Ei is the expected frequency count for the i-th level of the categorical
variable, n is the total sample size, and pi is the hypothesized proportion of
observations in level i.
• Test statistic: the test statistic is a chi-square random variable (χ²) defined
by the equation χ² = Σ [(Oi − Ei)² / Ei],
where Oi is the observed frequency count for the i-th level of the categorical
variable, and Ei is the expected frequency count for the i-th level.
• P-value: the P-value is the probability of observing a sample statistic as
extreme as the test statistic.
• Interpret the final results on comparison with the table value (using the DF
value).
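The four steps above map onto a single scipy call; a minimal sketch with hypothetical die-roll data:

```python
from scipy.stats import chisquare

# Hypothetical die-fairness check: 120 rolls, H0: each face has p_i = 1/6
observed = [25, 17, 15, 23, 24, 16]
expected = [20, 20, 20, 20, 20, 20]   # E_i = n * p_i = 120 * (1/6)
chi2, p_value = chisquare(observed, f_exp=expected)
# DF = k - 1 = 5; fail to reject H0 if p_value exceeds the chosen alpha
```

Every expected count here is 20, comfortably above the minimum of 5 required by the conditions listed earlier, and the p-value is well above 0.05, so the die would be judged fair.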
CHI SQUARE TEST FOR A SPECIFIED POPULATION VARIANCE OR STANDARD DEVIATION
• To test a claim about the value of the variance
or the standard deviation of a population,
the test statistic follows a chi-square
distribution with n − 1 degrees of freedom and
is given by the following formula.
• χ² = (n − 1)s² / σ², where n = sample size, s = sample
S.D. and σ = hypothesized population S.D.
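This test is simple enough to write out directly from the formula; the function name and the example numbers are assumptions, not from the slides:

```python
from scipy import stats

def chi2_variance_test(n, s, sigma0, alpha=0.05):
    """Two-tailed chi-square test of H0: population SD = sigma0,
    given sample size n and sample SD s (computed with divisor n-1)."""
    chi2 = (n - 1) * s**2 / sigma0**2
    lower = stats.chi2.ppf(alpha / 2, n - 1)       # lower critical value
    upper = stats.chi2.ppf(1 - alpha / 2, n - 1)   # upper critical value
    return chi2, not (lower < chi2 < upper)        # True => reject H0

# hypothetical: n = 25, sample SD 12, claimed population SD 9
chi2, reject = chi2_variance_test(n=25, s=12, sigma0=9)
```

Because the distribution is not symmetric, a two-tailed test needs both a lower and an upper critical value, unlike the z- and t-tests earlier.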
Parametric vs Non-parametric
• Parametric tests => we have information about the
population, or can make certain assumptions:
– the population is normally distributed;
– the population variances are equal.
• Non-parametric tests are used when no
assumptions are made about the population distribution:
– also known as distribution-free tests;
– but information is known about the sampling distribution.
