SGTBIMIT 01290201817

GGSIPU SGTBIMIT


LIST OF FIGURES

4. Box Plot

5. Sales Diagram

DATA SET 1: FREQUENCY DISTRIBUTION

Description: Workers

This data set consists of workers employed in small and medium enterprises in a city in India.

Objective:

To calculate the frequency distribution and present a bar chart of the education profile of the workers.

IBM SPSS Statistics Version 21

a) Frequency Distribution

b) Bar Chart

c) Pie Chart

d) Cross tabs

The data set of workers employed in small and medium scale enterprises in a city in India is shown in the table below.

S.No Gender Age group Religion Education S.No Gender Age group Religion Education

1 1 1 3 2 26 1 5 3 2

2 1 4 2 1 27 1 1 1 2

3 1 3 3 4 28 1 5 2 2

4 1 3 1 3 29 1 1 2 4

5 2 4 1 1 30 1 5 2 2

6 1 4 1 1 31 1 2 3 5

7 2 2 1 1 32 1 3 2 1

8 1 2 3 1 33 2 2 2 2

9 1 2 2 1 34 1 5 2 1

10 2 2 2 2 35 2 5 1 2

11 1 3 1 2 36 2 5 2 3

12 1 3 1 3 37 2 2 3 4

13 1 4 1 4 38 2 5 2 3

14 2 1 2 3 39 1 3 3 3

15 1 5 2 2 40 1 5 2 2

16 2 2 2 2 41 1 2 1 1

17 1 1 1 5 42 1 2 3 1

18 1 5 1 5 43 1 3 2 1

19 1 5 2 2 44 1 5 2 5

20 1 2 2 5 45 1 2 1 2

21 2 5 2 2 46 2 5 2 3

22 1 2 2 1 47 2 2 1 2

23 1 2 3 1 48 1 3 3 4

24 1 2 1 5 49 1 4 2 4

25 2 5 2 5 50 2 1 1 1

Table 1.1: Data set of workers employed in small and medium scale enterprises in a city in India.

The coding details of the different variables in the data set are shown below in Table 1.2.

Variable     Code = Meaning
Gender       1 = Male
             2 = Female
Age group    1 = Less than 25 years
             2 = 26-35 years
             3 = 36-45 years
             4 = 46-55 years
             5 = 56 & above
Religion     1 = Hindu
             2 = Muslim
             3 = Other religion
Education    1 = Below 10th
             2 = High school
             3 = Intermediate
             4 = Technical diploma
             5 = Degree level

Table 1.2: Coding details of different variables

STEPS

STEP 1: In VARIABLE VIEW, define the column heads that will appear in DATA VIEW. In this case the first head is ‘Gender’; then choose its type, measure, values, labels, etc. as done below.

STEP 2: Similarly, define all the remaining variables in Variable View as per the data given above.

STEP 3: Final output of variable view.

To display the value labels instead of the numeric codes in Data View, click the Value Labels button on the toolbar (the icon marked with ‘A’, an arrow and ‘1’).

STEP 4: Analyze > Descriptive Statistics > Frequencies

STEP 5: From the new dialogue box, choose any one variable (Education in this case) and transfer it to the Variable(s) box.

RESULT

FREQUENCIES

EDUCATION
                             Frequency   Percent   Valid Percent   Cumulative Percent
Valid   BELOW TENTH GRADE       14         28.0        28.0              28.0
        HIGH SCHOOL             16         32.0        32.0              60.0
        INTERMEDIATE             7         14.0        14.0              74.0
        TECHNICAL DIPLOMA        6         12.0        12.0              86.0
        DEGREE LEVEL             7         14.0        14.0             100.0
        Total                   50        100.0       100.0

Table 1.3: Shows the frequency distribution
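The frequency distribution in Table 1.3 can be cross-checked outside SPSS. The sketch below is only an illustration in Python, not part of the SPSS procedure: the `education` list is the Education column of Table 1.1 typed in by hand, and the helper name `frequency_table` is a hypothetical name chosen for this example.

```python
from collections import Counter

# Education codes for workers 1-50, transcribed by hand from Table 1.1.
education = [
    2, 1, 4, 3, 1, 1, 1, 1, 1, 2, 2, 3, 4, 3, 2, 2, 5, 5, 2, 5, 2, 1, 1, 5, 5,
    2, 2, 2, 4, 2, 5, 1, 2, 1, 2, 3, 4, 3, 3, 2, 1, 1, 1, 5, 2, 3, 2, 4, 4, 1,
]

# Value labels from Table 1.2.
LABELS = {
    1: "Below 10th",
    2: "High school",
    3: "Intermediate",
    4: "Technical diploma",
    5: "Degree level",
}


def frequency_table(codes):
    """Return rows of (code, frequency, percent, cumulative percent)."""
    counts = Counter(codes)
    total = len(codes)
    rows = []
    running = 0
    for code in sorted(counts):
        running += counts[code]
        rows.append((code, counts[code],
                     100 * counts[code] / total,
                     100 * running / total))
    return rows


for code, freq, pct, cum in frequency_table(education):
    print(f"{LABELS[code]:<18}{freq:>4}{pct:>8.1f}{cum:>8.1f}")
```

The printed frequencies and cumulative percentages should agree with the Frequency and Cumulative Percent columns of Table 1.3.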

STEP 7: From the gallery of the new dialogue box, select the type of graph (here a bar graph is selected).

STEP 8: Drag Education to the x-axis; Count will appear on the y-axis.

The bar graph will appear like this.

Fig 1.2: Pie chart of education in count and percentage.

STEP 11: Select the two variables.

RESULT

Count
                                     Gender
                                 Male   Female   Total
Education   Below tenth grade     11       3       14
            High school           10       6       16
            Intermediate           3       4        7
            Technical diploma      5       1        6
            Degree level           6       1        7
Total                             35      15       50

Table 1.4: Shows the cross tab of gender and education.
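The cross tabulation in Table 1.4 is simply a count of every (education, gender) pair. A minimal Python sketch of that counting is given here as an illustration; the two lists are the Gender and Education columns of Table 1.1 typed in by hand, and `crosstab` is a hypothetical helper name.

```python
from collections import Counter

# Gender (1 = Male, 2 = Female) and Education codes for workers 1-50,
# transcribed by hand from Table 1.1.
gender = [
    1, 1, 1, 1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 2,
    1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2,
]
education = [
    2, 1, 4, 3, 1, 1, 1, 1, 1, 2, 2, 3, 4, 3, 2, 2, 5, 5, 2, 5, 2, 1, 1, 5, 5,
    2, 2, 2, 4, 2, 5, 1, 2, 1, 2, 3, 4, 3, 3, 2, 1, 1, 1, 5, 2, 3, 2, 4, 4, 1,
]


def crosstab(row_codes, col_codes):
    """Count how often each (row code, column code) pair occurs."""
    return Counter(zip(row_codes, col_codes))


table = crosstab(education, gender)
print("Education  Male  Female  Total")
for edu in sorted(set(education)):
    male, female = table[(edu, 1)], table[(edu, 2)]
    print(f"{edu:>9}{male:>6}{female:>8}{male + female:>7}")
```

Each printed row should match the corresponding row of Table 1.4, with education codes 1 to 5 standing for the labels in Table 1.2.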

CONCLUSION

A. Table 1.3: The frequency distribution shows the number of workers in each education category in a summarized manner.

B. Fig 1.1: The bar diagram graphically shows the count of workers in each education category.

C. Fig 1.2: The pie chart shows the education level of the 50 workers: Below 10th grade is 28%, High school is 32%, Intermediate is 14%, Technical diploma is 12% and Degree level is 14%.

D. Table 1.4: The cross tabulation of gender and education shows how the two variables are related. For example, of the 14 workers below tenth grade, 11 are male and 3 are female.

Hence, through the frequency distribution we find that the largest group of workers (32%, i.e. 16 of the 50 workers) studied up to high school level.

DATA SET 2: MEAN, MODE & RANGE

Introduction: Ice Melt

In this data set we estimate when the ice on the river will melt next year by observing and understanding past data.

Objectives:

1. To determine mean and mode hour of the day in which ice melts using variable like

2. To determine the hour range for ice melt and to determine in which month ice melts most.

1. Mean

2. Mode

3. Bar chart

4. Pie chart

5. Range

STEPS

STEP 2: To find the mean and mode hour of the day: Analyze > Descriptive Statistics > Frequencies.

STEP 3: Select Mean and Mode (under Statistics) to find the central tendency of the hour.

RESULT

Statistics
hour of the day
N      Valid       90
       Missing      0
Mean            14.60
Mode               13

Table 2.1: Shows the mean and mode hour of the day at which ice melts.

Descriptive Statistics
                      N    Minimum   Maximum   Mean    Std. Deviation
hour of the day       90      5        23      14.60       4.069
Valid N (listwise)    90

Table 2.2: Shows the descriptive statistics of the hour of the day of ice melt.

STEP 4: To determine the hour range using a bar graph: Graphs > Legacy Dialogs > Bar.

STEP 5: For the clustered bar graph, choose hour for the category axis and month for the clusters.

RESULT

Fig 2.1: Clustered bar diagram for hour of ice melt corresponding to months.

STEP 6: To determine in which month ice melts most: Graphs > Legacy Dialogs > Pie.

Pie chart for the hour of the day at which ice melts, shown in counts rather than percentages.

Fig 2.2: Shows the pie chart for hours of ice melt in count.

Key Findings

2. Table 2.2 shows the descriptive statistics: the minimum hour of ice melt is 5 and the maximum is 23. The standard deviation is 4.069, which measures the dispersion of the ice melting hour.

3. The modal month of ice melt is May (month 5) and the modal hour of ice melt is 13, i.e. 1 PM.

DATA SET 3: OUTLIERS

Description: Sports

This data set consists of males and females of different age categories and the time (in hours) they spend playing their favourite outdoor sport.

Outlier:

An outlier may be due to variability in the measurement or it may indicate experimental error; the latter are sometimes excluded from the data set. An outlier can cause serious problems in statistical analysis. Usually the presence of an outlier indicates some sort of problem: either a case that does not fit the model under study or an error in measurement.

Objective:

a) To find the outlier (if any)

b) To understand the effect of the outlier on the measurement and replace it so that the distorted data can be rectified.

c) Correcting the data.

IBM SPSS Statistics Version 21.

Extreme Values and Stem & Leaf

Box Plot

STEPS

STEP 1: In VARIABLE VIEW, define the column heads that will appear in DATA VIEW. In this case the first head is ‘Gender’; then choose its type, measure, values, labels, etc. as done below. The values given are 1 for male and 2 for female.

RESULT as per Data View.

STEP 4: For outliers, Analyze > Descriptive Statistics > Explore

STEP 5: From the new dialogue box, drag a variable (say, hours spent for playing) into the Dependent List. Click Statistics.

STEP 6: From the new dialogue box, tick Outliers. Click Continue.

Result of outliers.

                                          Cases
                            Valid           Missing          Total
                            N    Percent    N    Percent     N    Percent
hours spent for playing     30   100.0%     0    0.0%        30   100.0%

Table 3.1: Shows the summary of data

Descriptives
                                                                  Statistic   Std. Error
hours spent for playing   Mean                                      3.033       .4082
                          95% Confidence Interval   Lower Bound     2.198
                          for Mean                  Upper Bound     3.868
                          5% Trimmed Mean                           2.759
                          Median                                    2.750
                          Variance                                  4.999
                          Std. Deviation                            2.2358
                          Minimum                                    .5
                          Maximum                                  13.0
                          Range                                    12.5
                          Interquartile Range                       2.1
                          Skewness                                  3.146       .427
                          Kurtosis                                 13.662       .833

Table 3.2: Shows all the descriptive statistics of the data.

Extreme Values
                                         Case Number   Value
hours spent for playing   Highest   1        22         13.0
                                    2        29          5.0
                                    3         4          4.5
                                    4        26          4.5
                                    5        27          4.5
                          Lowest    1         8           .5
                                    2        30          1.0
                                    3        11          1.0
                                    4        10          1.0
                                    5        15          1.5a

Table 3.3: Shows the extreme values.

Hours spent for playing

Frequency    Stem &  Leaf
    1.00        0 .  5
    6.00        1 .  000555
    8.00        2 .  00000555
    7.00        3 .  0000055
    6.00        4 .  000555
    1.00        5 .  0
    1.00 Extremes    (>=13.0)

Each leaf: 1 case(s)

BOX PLOT

Fig 3.1: Box plot

Key findings

1. The highest value is 13 at case number 22.

2. The lowest value is 0.5 at case number 8.

3. The extreme value (outlier) is 13 at case number 22.
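The outlier can also be located with a simple rule of thumb. The sketch below is an illustration in Python under two assumptions: the 30 values are read back from the stem-and-leaf plot above (so the case order is lost), and outliers are flagged with the common 1.5 x IQR fence, which is not necessarily the exact hinge-based rule SPSS uses for its box plot. The helper name `iqr_outliers` is hypothetical.

```python
import statistics

# The 30 observations, read back from the stem-and-leaf plot
# (stem = whole hours, leaf = tenths of an hour).
hours = (
    [0.5]
    + [1.0] * 3 + [1.5] * 3
    + [2.0] * 5 + [2.5] * 3
    + [3.0] * 5 + [3.5] * 2
    + [4.0] * 3 + [4.5] * 3
    + [5.0]
    + [13.0]
)


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


print("mean    :", round(statistics.mean(hours), 3))
print("median  :", statistics.median(hours))
print("outliers:", iqr_outliers(hours))
```

With these values the mean and median agree with Table 3.2, and only the value 13.0 falls outside the fences, which is consistent with the key findings above.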


DATA SET 4: NORMALITY TEST

The following data show the number of items sold by an enterprise in India.

Objective:

Normality test

Assumptions/ Hypothesis:


STEPS

STEP 1: Define the variable in Variable View and enter the data; the result is shown in Data View.

STEP 2: For the normality test, click Analyze > Descriptive Statistics > Explore.

STEP 3: Drag Monthly Sales to the Dependent List.

STEP 4: In Explore, click Plots, select “Normality plots with tests”, then click Continue and OK.

RESULT

OUTPUT MONTHLY_SALES.sav

                                Cases
                  Valid           Missing          Total
                  N    Percent    N    Percent     N    Percent
Monthly Sales     50   100.0%     0    0.0%        50   100.0%

Table 4.1: Shows the summary of the data

Descriptives
                                                        Statistic   Std. Error
monthly sales   Mean                                      61.34       4.519
                95% Confidence Interval   Lower Bound     52.26
                for Mean                  Upper Bound     70.42
                5% Trimmed Mean                           59.99
                Median                                    55.00
                Variance                                1021.290
                Std. Deviation                            31.958
                Minimum                                    8
                Maximum                                  150
                Range                                    142
                Interquartile Range                       35
                Skewness                                   .761       .337
                Kurtosis                                   .240       .662

Table 4.2: Shows all the statistical values

Tests of Normality
                  Kolmogorov-Smirnov(a)           Shapiro-Wilk
                  Statistic   df   Sig.      Statistic   df   Sig.
Monthly Sales       .133      50   .027        .951      50   .039

Table 4.3: Test of normality

a. Lilliefors Significance Correction

KEY FINDINGS

1. Table 4.3 shows that the significance values of the Kolmogorov-Smirnov and Shapiro-Wilk tests are 0.027 and 0.039 respectively, both of which are less than 0.05.

2. Hence, the monthly sales of employees are not normally distributed.

CONCLUSION

That means parametric tests (t-test, F-test, Z-test, ANOVA) cannot be applied, and a non-parametric test (e.g. chi-square) should be used instead.

DATA SET 5: ONE SAMPLE t-TEST

Introduction: Healthcare

A healthcare provider claims that, on average, its customers lose 5 kg of weight in a month after joining its weight loss programme. To test the validity of the claim, an independent researcher collects data on the weight lost by 50 customers a month after joining the programme. The researcher has decided to apply a one-sample t-test to test the validity of the claim.

Hypothesis:

Null hypothesis (Ho): the mean of the population is 5.

Alternative hypothesis (Ha/H1): the mean of the population is ≠ 5, i.e. either < 5 or > 5.

Objective:

To check whether the claim of the healthcare provider is right, i.e. whether the null hypothesis is rejected or not. If the null hypothesis is rejected, find a mean of the population that is consistent with the sample.

IBM SPSS Statistics Version 21.

One-sample t-test

P-value approach

STEPS

STEP 1: Define the weight loss variable in Variable View and enter the data.

STEP 3: Analyze > Compare Means > One-Sample T Test

RESULT

t-Test

One-Sample Test
                                                          Test Value = 5
                                              t        df   Sig. (2-tailed)   Mean Difference   95% Confidence Interval of the Difference
                                                                                                 Lower      Upper
Loss in Weight During Weight Loss Program    -6.212    49   .000              -.980              -1.30      -.66

Table 5.2: Shows the result of one sample t-test

Observation 1:

Since the p-value (0.000) is less than the alpha value (0.05), we reject the null hypothesis, which means the healthcare provider's claim of 5 kg weight loss is not supported.

RESULT

t-TEST

One-Sample Statistics
                                              N    Mean   Std. Deviation   Std. Error Mean
Loss in Weight During Weight Loss Program     50   4.02       1.116             .158

Table 5.3: Shows the result for one sample statistics

One-Sample Test
                                                          Test Value = 4
                                              t       df   Sig. (2-tailed)   Mean Difference   95% Confidence Interval of the Difference
                                                                                                Lower      Upper
Loss in Weight During Weight Loss Program     .127    49   .900              .020               -.30       .34

Table 5.4: Shows the (revised) result of one sample t-test

Observation 2

Here the p-value (0.900) is greater than the alpha value (0.05), so we fail to reject the null hypothesis that the population mean is 4. In other words, had the healthcare provider claimed 4 kg as the mean weight loss of the population, the data would be consistent with the claim.
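The two t values reported by SPSS can be reproduced approximately from the summary statistics alone (N = 50, mean = 4.02, standard deviation = 1.116), using t = (sample mean - test value) / (standard deviation / sqrt(N)). The Python sketch below is an illustration only; `one_sample_t` is a hypothetical helper name, and the critical value 2.01 is an approximate two-tailed 5% value for 49 degrees of freedom taken from a t table, used here in place of the exact p-value.

```python
import math


def one_sample_t(sample_mean, sample_sd, n, test_value):
    """t statistic for testing whether the population mean equals test_value."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - test_value) / standard_error


# Approximate two-tailed 5% critical value of t for df = 49 (assumed, from a t table).
T_CRITICAL_DF49 = 2.01

for test_value in (5, 4):
    t = one_sample_t(4.02, 1.116, 50, test_value)
    decision = "reject Ho" if abs(t) > T_CRITICAL_DF49 else "fail to reject Ho"
    print(f"Test value {test_value}: t = {t:.3f} -> {decision}")
```

Small differences from the SPSS output in the third decimal place are expected because the mean and standard deviation used here are already rounded.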

Key findings

a) Table 5.2 shows that the p-value is 0.000, which is less than the alpha value of 0.05. We reject the null hypothesis, which means the healthcare provider's claim of 5 kg weight loss is not supported.

b) Table 5.4 shows that the p-value is 0.900, which is greater than the alpha value. So we conclude that a population mean of 4 is consistent with the data.

c) The null hypothesis is rejected at test value 5 because the p-value is less than the alpha value (0.05).

d) The null hypothesis is not rejected at test value 4, where the p-value (0.900) is greater than the alpha value (0.05).

CONCLUSION

Therefore, there is no significant difference between the sample mean and the hypothesized population mean at test value 4.

The claim of the healthcare provider is not supported. On average, a customer loses about 4 kg of weight instead of 5 kg in a month after joining its weight loss program.

DATA SET 6: PAIRED t-TEST

Introduction: Training program

The HR manager of a business firm wants to analyze the impact of a training program conducted for 30 employees. The purpose of conducting the training program was to improve the performance of employees. The performance scores of the employees are noted before and after the training program. He wants to observe the performance of the same respondents before training (pre sample) and after training (post sample), i.e. bivariate (paired) data.

Assumption:

Null hypothesis (Ho): There is no difference between the pre-training and post-training performance of employees.

Alternative hypothesis (Ha): There is a difference between the pre-training and post-training performance of employees.

Objective:

To record the performance scores of the employees before and after training.

To improve the performance of employees.

To perform the paired t-test.

STEPS

STEP 1: Define the variables (name, type, label, measure) and enter their values.

STEP 2: For the paired-samples t-test: Analyze > Compare Means > Paired-Samples T Test.

STEP 3: Drag both Pre Training Score and Post Training Score to Paired Variables.

RESULT

Output of “Paired t-test.”

                                Mean    N    Std. Deviation   Std. Error Mean
Pair 1   Pre_training_Score     51.43   30      12.792            2.335
         Post_training_Score    68.80   30      12.416            2.267

Table 6.1: Shows Paired Samples Statistics

                                N    Correlation   Sig.
Pair 1   Post_training_Score

Table 6.2: Shows paired sample correlations

Paired Samples Test
                                                                     Paired Differences
                                                    Mean      Std.        Std. Error   95% Confidence Interval     t        df   Sig. (2-tailed)
                                                              Deviation   Mean         of the Difference
                                                                                       Lower        Upper
Pair 1   Pre_training_Score - Post_training_Score   -17.367   9.565       1.746        -20.938      -13.795     -9.945   29   .000

Table 6.3: Shows paired sample test
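The figures in Table 6.3 follow from the mean and standard deviation of the 30 paired differences: the standard error is sd / sqrt(n), and t is the mean difference divided by that standard error. The Python sketch below illustrates this arithmetic from the summary values only; `paired_t` is a hypothetical helper name, and 2.045 is the approximate two-tailed 5% critical value of t for 29 degrees of freedom from a t table.

```python
import math


def paired_t(mean_diff, sd_diff, n):
    """Return (t statistic, standard error) for a paired-samples t-test."""
    standard_error = sd_diff / math.sqrt(n)
    return mean_diff / standard_error, standard_error


MEAN_DIFF, SD_DIFF, N = -17.367, 9.565, 30  # from Table 6.3
T_CRITICAL_DF29 = 2.045                     # assumed, from a t table

t, se = paired_t(MEAN_DIFF, SD_DIFF, N)
ci_lower = MEAN_DIFF - T_CRITICAL_DF29 * se
ci_upper = MEAN_DIFF + T_CRITICAL_DF29 * se

print(f"Std. error = {se:.3f}")
print(f"t = {t:.3f} with df = {N - 1}")
print(f"95% CI of the difference = ({ci_lower:.3f}, {ci_upper:.3f})")
```

Because the whole confidence interval lies below zero, the post-training scores are higher than the pre-training scores by more than chance alone would explain.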

Observation

Since the p-value (0.000) is less than the alpha value (0.05), we reject the null hypothesis.

Key Findings

1. Table 6.3 shows that the significance value 0.000 is less than the alpha value 0.05, so we reject the null hypothesis.

2. Because the null hypothesis is rejected, the alternative hypothesis is accepted. Therefore, there is a significant difference between the mean pre-training and post-training performance of employees.

So, the training program is effective in improving the performance of the employees: the mean score rose from 51.43 before training to 68.80 after training.

DATA SET 7: INDEPENDENT SAMPLES t-TEST

in different demographic profile. He divides the employees on the basis of gender and age group and applies the independent samples t-test to analyze the difference between their performances.

Objective:

Assumptions

For Levene’s homogeneity test

Ho = There is no significant difference between the variances of the two independent samples (equality of variance).

Ha = There is a significant difference between the variances of the two independent samples.

For the independent samples t-test

Ho = There is no significant difference between the performance scores of male and female employees.

Ha = There is a significant difference between the performance scores of male and female employees.

Levene’s Homogeneity Test

STEPS

STEP 1: Define the variables (name, type, label, measure) and enter their values.

STEP 2: Analyze > Compare Means > Independent-Samples T Test

STEP 3: Define the variable list. Drag Performance Score to Test Variable(s) and Gender to Grouping Variable. Click “Define Groups”, enter “Male” and “Female” for Group 1 and Group 2 respectively, then click Continue and OK.

RESULT

Group Statistics
                    Gender   N    Mean    Std. Deviation   Std. Error Mean
Performance_Score   Female   25   60.60      18.949            3.790

Table 7.1: Shows group statistics

Independent Samples Test
                                                  Levene's Test for       t-test for Equality of Means
                                                  Equality of Variances
                                                                          t      df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% Confidence Interval of the Difference
                                                                                                                                                      Lower      Upper
Performance_Score   Equal variances assumed
                    Equal variances not assumed                           .200   47.983   .843              1.080             5.411                   -9.800     11.960

Table 7.2: Shows Levene's test and the independent samples t-test

If Levene’s null hypothesis (Ho) is rejected, i.e. equal variances cannot be assumed, use the latter row (“Equal variances not assumed”).

The first two columns show the result of Levene’s homogeneity test, whose p-value (0.956) is compared with the alpha value of 0.05; the remaining columns give the t-test results.

In this case, p-value (0.937) > 0.05, so accept Ho.

INDEPENDENT SAMPLE T-TEST

STEP 5: Define the variable list on the basis of age group and set the cut point to “40”.

Group Statistics
                    Age     N    Mean    Std. Deviation   Std. Error Mean
Performance Score   >= 40   22   68.86      19.075            4.067
                    < 40    28   55.07      16.777            3.171

Independent Samples Test
                                                  Levene's Test for       t-test for Equality of Means
                                                  Equality of Variances
                                                  F       Sig.            t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% Confidence Interval of the Difference
                                                                                                                                                       Lower     Upper
Performance_Score   Equal variances assumed       1.408   .241            2.717   48       .009              13.792            5.077                   3.585     23.999
                    Equal variances not assumed                           2.675   42.170   .011              13.792            5.157                   3.387     24.197
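Both rows of the table above can be reproduced approximately from the group statistics for the two age groups. The Python sketch below is an illustration only: `pooled_t` uses the pooled variance (the “Equal variances assumed” row) and `welch_t` uses separate variances with the Welch-Satterthwaite degrees of freedom (the “Equal variances not assumed” row). Both helper names are hypothetical.

```python
import math


def pooled_t(m1, s1, n1, m2, s2, n2):
    """t and df assuming equal variances (pooled variance estimate)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


def welch_t(m1, s1, n1, m2, s2, n2):
    """t and df without assuming equal variances (Welch-Satterthwaite)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return (m1 - m2) / se, df


# Group statistics: age >= 40 versus age < 40.
older = (68.86, 19.075, 22)
younger = (55.07, 16.777, 28)

t_pooled, df_pooled = pooled_t(*older, *younger)
t_welch, df_welch = welch_t(*older, *younger)
print(f"Equal variances assumed    : t = {t_pooled:.3f}, df = {df_pooled}")
print(f"Equal variances not assumed: t = {t_welch:.3f}, df = {df_welch:.3f}")
```

Since Levene's test is not significant here (Sig. = .241), the first row is the one to read; the second row is shown only for comparison.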

KEY FINDINGS

1. According to Table 7.2, the p-value of Levene's test on the basis of gender is 0.956, which is greater than 0.05. So the variances of performance for male and female employees are equal (σ₁² = σ₂²).

2. According to Table 7.4, the p-value of Levene's test on the basis of age is 0.241, which is greater than 0.05. So the variances of performance for the two age groups are equal (σ₁² = σ₂²).

3. According to Table 7.2, for the performance of employees on the basis of gender, the p-value is 0.843, which is greater than 0.05, so Ho is accepted.

4. For the performance of employees on the basis of age, the p-value is 0.009, which is less than 0.05, so Ho is rejected and Ha is accepted.

So, there is no significant difference in the average performance of male and female employees.

So, there is a significant difference in the average performance of employees below and above 40 years of age.

DATA SET 8: ANOVA

A researcher wants to compare the sales of three companies, collected from different retail stores. The companies are coded as 1, 2 and 3, and the data on their sales from different retail stores are given in the data set.

Assumption:

Null Hypothesis (Ho): Variances are equal for all three companies.

Alternative Hypothesis (Ha): Variances of the three companies are not all equal.

Null Hypothesis (Ho): Mean sales of all three companies are equal.

Alternative Hypothesis (Ha) of ANOVA, one of the following:

a) Mean sale of comp. 1 = mean sale of comp. 2 ≠ mean sale of comp. 3

b) Mean sale of comp. 1 ≠ mean sale of comp. 2 = mean sale of comp. 3

c) Mean sale of comp. 1 = mean sale of comp. 3 ≠ mean sale of comp. 2

d) Mean sale of comp. 1 ≠ mean sale of comp. 2 ≠ mean sale of comp. 3

Objective:

Apply one-way ANOVA to find out whether the mean sales of the three companies are equal.

IBM SPSS Statistics Version 21

Levene's test

ANOVA test

STEPS

STEP 1: Define the variables in Variable View and enter the data.

STEP 3: Analyze > Compare Means > One-Way ANOVA

STEP 4: Move sales to the Dependent List and company to the Factor box.

STEP 5: Post Hoc > Tukey > Continue

STEP 6: Click Options. From Options, select Descriptive, Homogeneity of variance test and Means plot. Click Continue.

RESULT

Descriptive
Sales
         N    Mean    Std. Deviation   Std. Error   95% Confidence Interval for Mean   Minimum   Maximum
                                                    Lower Bound     Upper Bound
1        18   17.11      15.335          3.615          9.49           24.74               6        76
2        15   22.53      20.525          5.299         11.17           33.90               5        76
3        17   44.47      15.895          3.855         36.30           52.64              20        89
Total    50   28.04      20.767          2.937         22.14           33.94               5        89

Table 8.1: Shows descriptive data.

Sales
Levene Statistic   df1   df2   Sig.
2.098              2     47    .134

Table 8.2: Shows Homogeneity of Variance

ANOVA
Sales
                  Sum of Squares   df   Mean Square   F        Sig.
Between Groups      7194.174        2    3597.087     12.130   .000
Within Groups      13937.746       47     296.548
Total              21131.920       49
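The ANOVA table can be rebuilt approximately from the group sizes, means and standard deviations in Table 8.1: the between-groups sum of squares is the sum of n x (group mean - grand mean)², the within-groups sum of squares is the sum of (n - 1) x sd², and F is the ratio of the two mean squares. The Python sketch below is an illustration only, and `anova_from_summary` is a hypothetical helper name.

```python
def anova_from_summary(groups):
    """One-way ANOVA from a list of (n, mean, sd) tuples.

    Returns (SS between, SS within, df between, df within, F).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio


# (N, Mean, Std. Deviation) for companies 1, 2 and 3, from Table 8.1.
companies = [(18, 17.11, 15.335), (15, 22.53, 20.525), (17, 44.47, 15.895)]

ss_b, ss_w, df_b, df_w, f_ratio = anova_from_summary(companies)
print(f"Between Groups: SS = {ss_b:.1f}, df = {df_b}")
print(f"Within Groups : SS = {ss_w:.1f}, df = {df_w}")
print(f"F = {f_ratio:.2f}")
```

The sums of squares differ slightly from the SPSS output because the means and standard deviations in Table 8.1 are rounded, but F comes out close to 12.13, well above the 5% critical value of F for (2, 47) degrees of freedom, which is roughly 3.2.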

Multiple Comparisons
Tukey HSD
(I) Company   (J) Company   Mean Difference (I-J)   Std. Error   Sig.   95% Confidence Interval
                                                                        Lower Bound   Upper Bound
1             2                  -5.422              6.020       .643     -19.99          9.15
              3                 -27.359*             5.824       .000     -41.45        -13.26
2             1                   5.422              6.020       .643      -9.15         19.99
              3                 -21.937*             6.100       .002     -36.70         -7.17
3             1                  27.359*             5.824       .000      13.26         41.45
              2                  21.937*             6.100       .002       7.17         36.70

*. The mean difference is significant at the 0.05 level.

Table 8.3: Shows ANOVA

Sales
Tukey HSD(a,b)
Company   N    Subset for alpha = 0.05
               1        2
1         18   17.11
2         15   22.53
3         17            44.47
Sig.           .639     1.000

Table 8.5: Shows Tukey comparison of the three companies

Means for groups in homogeneous subsets are displayed.

a. Uses Harmonic Mean Sample Size = 16.570.

b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error levels are not guaranteed.

Fig 8.1: Shows the diagrammatic representation of sales of companies

CONCLUSION

Table 8.2: The homogeneity of variance test (Levene's test) gives a significance value of 0.134, which is more than the alpha value of 0.05. Hence the null hypothesis for Levene's test is accepted, i.e. the variances of sales of the three companies are equal.

Table 8.3: In the ANOVA table the significance value is 0.000, so the null hypothesis (Ho) is rejected: the mean sales of the three companies are not all equal.

Hence, we accept the alternative hypothesis (Ha). To check which of the 4 cases applies, we perform the post hoc test.

Table 8.4: Shows the significance value for each pair of companies. For example, company 1 versus company 2 has a significance value of 0.643, so we accept Ho for this pair, while company 1 versus company 3 has a significance value of 0.000, so we reject Ho.

Fig 8.1: Shows that sales of company 1 are equivalent to those of company 2, but sales of company 3 are different from both.

Lastly, we conclude that the mean sales of Company 1 and Company 2 are equivalent, but the mean sales of Company 3 are significantly different from both.

