
Q.1

Students who miss the midterm test in a statistics course have the opportunity to write a
deferred test provided that they have a valid reason. The students who requested to
write the deferred test over the last four years were categorized as giving one of the
following reasons:
1. Sickness
2. Marriage in the family
3. Out of town because of the university team game
4. Forgot the date or time of the midterm test
The responses were recorded using the numerical codes shown.
a. Produce a frequency distribution and a relative frequency distribution.
b. Draw an appropriate graph to summarize the data. What does the graph tell you?

Ans.

The data give the number of students who missed the midterm for each of the reasons above, so we can build the frequency table with Excel:

REASON                                               NUMBER OF STUDENTS   RELATIVE FREQUENCY (%)
1. Sickness                                          141                  39.17
2. Marriage in the family                            128                  35.56
3. Out of town because of the university team game   32                   8.89
4. Forgot the date or time of the midterm            59                   16.39
TOTAL                                                360                  100.00
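Each relative frequency is the class count divided by the total of 360 students. A minimal Python sketch (standard library only; the dictionary keys are shortened labels chosen for illustration) reproduces the relative frequency column from the counts:

```python
# Counts of students for each reason, restated from the frequency table.
counts = {
    "Sickness": 141,
    "Marriage in the family": 128,
    "Out of town (university team game)": 32,
    "Forgot the date or time": 59,
}

total = sum(counts.values())

# Relative frequency = class count / total, expressed here as a percentage.
relative = {reason: 100 * n / total for reason, n in counts.items()}

for reason, n in counts.items():
    print(f"{reason:<36}{n:>5}{relative[reason]:>9.2f}")
print(f"{'Total':<36}{total:>5}{sum(relative.values()):>9.2f}")
```

Rounding to two decimals (rather than truncating) is what makes the column add up to 100.00.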
To summarize the data, we draw a bar graph showing how the number of students varies by reason:

[Bar chart: number of students for each reason (sickness, marriage, out of town, forgot the date); vertical axis 0 to 160]

The bar graph shows that sickness is the main reason students miss the midterm, accounting for about 39% of the requests, followed by a marriage in the family (about 36%).

Being out of town for a university team game is the least common reason (about 9%).

Q.2

An increasing number of statistics courses use a computer and software rather than
manual calculations. A survey of statistics instructors asked each to report the software his or her
course uses. The responses are
1. Excel, 2. Minitab, 3. SAS, 4. SPSS, and 5. Other
a. Produce a frequency distribution.
b. Graphically summarize the data so that the proportions are depicted.
c. What do the charts tell you about the software choices?
Ans.

First, we build the frequency table from the survey responses, which shows exactly how many users use each software package.

SOFTWARE     NO. OF USERS   RELATIVE FREQUENCY (%)
1. Excel     34             48.57
2. Minitab   17             24.29
3. SAS       3              4.29
4. SPSS      4              5.71
5. Other     12             17.14
TOTAL        70             100.00
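Part (b) asks for a chart that shows proportions, which a pie chart does directly: each slice's angle is the package's share of 360 degrees. The short sketch below, working only from the counts in the table (variable names are illustrative), computes both the percentages and the slice angles a pie chart would use:

```python
# Responses per software package, restated from the frequency table.
counts = {"Excel": 34, "Minitab": 17, "SAS": 3, "SPSS": 4, "Other": 12}
total = sum(counts.values())

# Each package's share of the responses, and the angle its slice would
# occupy in a pie chart (its share of the full 360 degrees).
shares = {name: n / total for name, n in counts.items()}
angles = {name: 360 * s for name, s in shares.items()}

for name in counts:
    print(f"{name:<8}{counts[name]:>4}{100 * shares[name]:>8.2f}%{angles[name]:>8.1f} deg")
```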
To summarize the data graphically, we draw a bar chart with Excel:

[Bar chart: number of users for each software package (Excel, Minitab, SAS, SPSS, Other); vertical axis 0 to 40]

The chart shows that Excel is used by the largest number of users and SAS by the fewest. After Excel, Minitab is the next most popular choice.

About 49% of the respondents in our survey use Excel, by far the largest share of the whole survey.

Q.3

To help determine the need for more golf courses, a survey was undertaken. A sample of
75 self-declared golfers was asked how many rounds of golf they played last year. These
data are as follows:

18 26 16 35 30 15 18 15 18 19 25 30 35 14 20 18 24 21 25 18 29 23 15 19 27 28 9 17 28
25 23 20 24 28 36 20 30 26 12 31 13 26 22 30 29 26 17 32 36 24 29 18 38 31 36 24 30 20
13 23 3 28 5 14 24 13 18 10 14 16 28 19 10 42 22

a. Draw a histogram.
b. Draw a stem-and-leaf display.
c. Draw an ogive.
d. Describe what you have learned.
Ans.

First, we find the minimum (min) and maximum (max) values to see the spread of the
variable in the sample.

Minimum value = 3

Maximum value = 42

Using classes of width 5, where each class includes its lower limit but not its upper limit (for example, 5-10 means 5 to under 10), we build the frequency table:

Class interval   Frequency
0-5              1
5-10             2
10-15            9
15-20            17
20-25            15
25-30            16
30-35            8
35-40            6
40-45            1
TOTAL            75

Using these class intervals, we draw the histogram with MINITAB:

[Histogram of golf round: frequency (vertical axis, 0 to 18) against the number of rounds played (horizontal axis, 0 to 40)]

We can also produce a stem-and-leaf display:

Stem-and-Leaf Display: golf round

Stem-and-leaf of golf round  N = 75
Leaf Unit = 1.0

   1    0  3
   3    0  59
  12    1  002333444
  29    1  55566778888888999
 (15)   2  000012233344444
  31    2  5556666788888999
  15    3  00000112
   7    3  556668
   1    4  2

To draw an ogive with Minitab, we add a cumulative frequency column to the table:

Class interval   Frequency   Cumulative frequency
0-5              1           1
5-10             2           3
10-15            9           12
15-20            17          29
20-25            15          44
25-30            16          60
30-35            8           68
35-40            6           74
40-45            1           75
TOTAL            75
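The frequency and cumulative frequency columns can be checked by tallying the printed data directly. This sketch assumes each class includes its lower limit but not its upper limit; the pairs (upper class limit, cumulative frequency) it produces are the points an ogive joins.

```python
from itertools import accumulate

# Rounds of golf played last year by the 75 golfers, as printed in the question.
rounds = [
    18, 26, 16, 35, 30, 15, 18, 15, 18, 19, 25, 30, 35, 14, 20, 18, 24, 21, 25, 18,
    29, 23, 15, 19, 27, 28, 9, 17, 28, 25, 23, 20, 24, 28, 36, 20, 30, 26, 12, 31,
    13, 26, 22, 30, 29, 26, 17, 32, 36, 24, 29, 18, 38, 31, 36, 24, 30, 20, 13, 23,
    3, 28, 5, 14, 24, 13, 18, 10, 14, 16, 28, 19, 10, 42, 22,
]

WIDTH = 5
lower_limits = range(0, 45, WIDTH)  # classes 0-5, 5-10, ..., 40-45

# Each class counts values from its lower limit up to (but not including) its upper limit.
freq = [sum(lo <= x < lo + WIDTH for x in rounds) for lo in lower_limits]
cumulative = list(accumulate(freq))

print("min =", min(rounds), "max =", max(rounds))
for lo, f, c in zip(lower_limits, freq, cumulative):
    print(f"{lo}-{lo + WIDTH}".ljust(8), str(f).rjust(3), str(c).rjust(4))
```

Run on the 75 values exactly as printed in the question, this gives 16 golfers in the 25-30 class and 6 in the 35-40 class.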

The histogram and the stem-and-leaf display show that the classes 15-20 and 25-30 contain the most golfers, with 20-25 close behind. The number of rounds ranges from a minimum of 3 to a maximum of 42. The frequency table shows that most golfers played between 15 and 30 rounds (roughly two-thirds of the 75 golfers), while the least populated classes are 0-5 and 40-45, with one golfer each.

So the graphical description shows that a typical golfer in the sample played between 15 and 30 rounds last year.
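The stem-and-leaf display can also be reproduced with a few lines of Python. This sketch splits each tens-digit stem into a 0-4 line and a 5-9 line, as the Minitab display does, but leaves out Minitab's depth column; it is an illustration, not Minitab's own algorithm.

```python
from collections import defaultdict

# Rounds of golf played last year by the 75 golfers, as printed in the question.
rounds = [
    18, 26, 16, 35, 30, 15, 18, 15, 18, 19, 25, 30, 35, 14, 20, 18, 24, 21, 25, 18,
    29, 23, 15, 19, 27, 28, 9, 17, 28, 25, 23, 20, 24, 28, 36, 20, 30, 26, 12, 31,
    13, 26, 22, 30, 29, 26, 17, 32, 36, 24, 29, 18, 38, 31, 36, 24, 30, 20, 13, 23,
    3, 28, 5, 14, 24, 13, 18, 10, 14, 16, 28, 19, 10, 42, 22,
]

# Key each value by (stem, is_high_half): leaves 0-4 go on the low line,
# leaves 5-9 on the high line. Lines with no leaves would simply not appear
# here, whereas Minitab prints them empty.
lines = defaultdict(list)
for x in sorted(rounds):
    stem, leaf = divmod(x, 10)
    lines[(stem, leaf >= 5)].append(leaf)

for stem, high in sorted(lines):
    leaves = "".join(str(d) for d in lines[(stem, high)])
    print(f"{stem:>2} | {leaves}")
```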

Q.4
Comparing Returns on Two Investments:
a. Draw histograms for each set of returns using MS Excel, Minitab and SPSS, and report
your findings. Which investment would you choose and why?
b. Also write your interpretation using the numerical measures, viz. mean, median,
variance and standard deviation.

CLASS INTERVAL FOR RETURN A   FREQUENCY
-30 to -10                    10
-10 to 10                     15
10 to 30                      14
30 to 50                      8
50 to 70                      3
TOTAL                         50
[Histogram of RETURN A: frequency (vertical axis, 0 to 16) for return classes from -30 to 70]

CLASS INTERVAL FOR RETURN B   FREQUENCY
-40 to -20                    10
-20 to 0                      2
0 to 20                       18
20 to 40                      12
40 to 60                      5
60 to 80                      3
TOTAL                         50
[Histogram of RETURN B: frequency (vertical axis, 0 to 20) for return classes from -40 to 80]

Descriptive Statistics: RETURN A, RETURN B

                                              N for
Variable   Mean    StDev   Median   Mode      Mode
RETURN A   10.95   21.89   9.88     12.89     2
RETURN B   12.76   28.05   10.75    *         0
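As a rough cross-check of the numerical measures, the mean and standard deviation can also be approximated from the frequency tables alone by placing every observation at its class midpoint. This is a standard grouped-data approximation, not what Minitab computed (Minitab used the raw returns); `grouped_stats` is a hypothetical helper written for this sketch.

```python
from math import sqrt

# (lower limit, upper limit, frequency) for each class, restated from the tables above.
return_a = [(-30, -10, 10), (-10, 10, 15), (10, 30, 14), (30, 50, 8), (50, 70, 3)]
return_b = [(-40, -20, 10), (-20, 0, 2), (0, 20, 18), (20, 40, 12), (40, 60, 5), (60, 80, 3)]

def grouped_stats(table):
    """Approximate n, mean and sample standard deviation of grouped data by
    treating every observation as if it were at its class midpoint."""
    n = sum(f for _, _, f in table)
    mids = [((lo + hi) / 2, f) for lo, hi, f in table]
    mean = sum(m * f for m, f in mids) / n
    variance = sum(f * (m - mean) ** 2 for m, f in mids) / (n - 1)  # sample variance
    return n, mean, sqrt(variance)

for name, table in [("Return A", return_a), ("Return B", return_b)]:
    n, mean, sd = grouped_stats(table)
    print(f"{name}: n = {n}, mean = {mean:.2f}, sd = {sd:.2f}")
```

The midpoint approximation gives means of 11.60 (A) and 13.60 (B) and standard deviations of about 23.24 and 28.48, close to Minitab's raw-data values and in the same order: Return B has the higher average and the larger spread.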

The graphical and descriptive analysis shows that the mean of Return A (10.95) is lower than the mean of Return B (12.76), while the standard deviation of Return B (28.05) is larger than that of Return A (21.89). Return B also has the higher median (10.75 versus 9.88). If I had to choose, I would choose Return B because its average return is higher, while noting that its returns vary more. Return B's histogram also extends further in both directions: its lowest class starts at -40 (versus -30 for Return A) and its highest class ends at 80 (versus 70), so it offers both larger possible gains and larger possible losses. On balance, I choose Return B.
Q.5
Analysis of Long Distance Telephone Bills:
The company’s marketing manager conducted a survey of 200 new residential
subscribers wherein the first month’s bills were recorded.

a. Find the mean long-distance telephone bill.

b. Find the median for the data.

c. Find the mode for the data.

d. Determine the interquartile range for the data.

e. Draw a box plot of the long-distance telephone bills.

Ans.

First, we obtain descriptive statistics for the data with Minitab:

Descriptive Statistics: BILLS

                                                                 N for
Variable   Mean    Minimum   Q1     Median   Q3      Maximum   Mode   Mode
BILLS      43.59   0.00      9.28   26.91    84.94   119.63    0      8
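Part (d) and the box plot both rest on the quartiles. The sketch below computes the interquartile range and, assuming the usual 1.5 × IQR outlier rule that Minitab's boxplot applies, the fences beyond which a bill would be drawn as an outlier:

```python
# Five-number summary of BILLS from the Minitab output.
minimum, q1, median, q3, maximum = 0.00, 9.28, 26.91, 84.94, 119.63

iqr = q3 - q1  # interquartile range

# The 1.5 x IQR rule: values beyond these fences are drawn as outliers,
# and each whisker stops at the most extreme value inside its fence.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
has_outliers = minimum < lower_fence or maximum > upper_fence

print(f"IQR = {iqr:.2f}")
print(f"fences = ({lower_fence:.2f}, {upper_fence:.2f})")
print("whiskers reach min and max:", not has_outliers)
```

Because the maximum (119.63) is well inside the upper fence, the whiskers run all the way to the minimum and maximum, and the box plot shows no outliers.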

[Boxplot of BILLS: vertical axis 0 to 120]
From the Minitab output, the mean bill is 43.59, the median is 26.91, and the mode is 0, which occurs 8 times. The interquartile range is Q3 - Q1 = 84.94 - 9.28 = 75.66.

The box plot of the long-distance telephone bills shows that the distance between the first quartile and the median (17.63) is much smaller than the distance between the median and the third quartile (58.03). The distance between the minimum and the first quartile (9.28) is also smaller than the distance between the third quartile and the maximum (34.69). The bills are therefore skewed to the right, which is consistent with the mean (43.59) lying well above the median (26.91).

So the telephone bill data show a great deal of variation, with a long tail of high bills.

Q.6
The career counseling center at a university wanted to learn more about the starting
salaries of the university’s graduates. They asked each graduate to report the highest
salary offer received. The survey also asked each graduate to report the degree and
starting salary. Draw box plots to compare the four groups of starting salaries. Report
your findings.

Ans.

The survey gives the highest salary offer received by each graduate. Before drawing the box plots, we compute descriptive statistics for each group, which make the box plots easier to interpret:

Descriptive Statistics: BA, BSc, BBA, OTHER

                                                                        N for
Variable   Mean    Minimum   Q1      Median   Q3      Maximum   Mode    Mode
BA         27697   18719     25730   27765    29836   37025     28539   2
BSc        33148   23451     29927   33397    36745   40105     *       0
BBA        35260   23401     31316   34284    39551   47639     *       0
OTHER      30474   21994     28254   29951    32905   38812     32262   2
Next, we draw box plots for each of the four groups of starting salaries.

The box plots show that the highest salary offered is in the BBA group (47,639) and the lowest is in the BA group (18,719). The mean salary offer is also highest for BBA (35,260), as is the median (34,284), so BBA graduates receive the highest salary offers. The BA box plot also shows outliers: by the 1.5 × IQR rule, both its maximum (37,025) and its minimum (18,719) lie outside the fences.

BA has the smallest interquartile range (4,106), so the middle half of the BA offers is tightly clustered, whereas BBA has both the largest interquartile range (8,235) and the largest range (24,238), so its salary offers vary the most.

As a counselor, I would recommend the BBA degree to new students because it has the highest mean and median salary offer, although its offers also vary the most.
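The comparisons of spread can be checked from the Minitab summary alone. This sketch computes each group's range and interquartile range and, assuming the 1.5 × IQR outlier rule used by Minitab's boxplot, whether the minimum or maximum falls beyond the fences:

```python
# (minimum, Q1, median, Q3, maximum) for each group, from the Minitab summary.
summary = {
    "BA": (18719, 25730, 27765, 29836, 37025),
    "BSc": (23451, 29927, 33397, 36745, 40105),
    "BBA": (23401, 31316, 34284, 39551, 47639),
    "OTHER": (21994, 28254, 29951, 32905, 38812),
}

results = {}
for degree, (lo, q1, _median, q3, hi) in summary.items():
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # 1.5 x IQR rule
    # From a summary we can tell whether the extremes lie beyond the fences,
    # but not how many other observations do.
    beyond = [v for v in (lo, hi) if v < lower_fence or v > upper_fence]
    results[degree] = {"range": hi - lo, "iqr": iqr, "extremes_beyond_fences": beyond}
    print(f"{degree:<6} range = {hi - lo:>6}  IQR = {iqr:>5}  beyond fences: {beyond}")
```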
Q.10. Compare the five sets of data using a box plot and interpret the results.

Here we are given service-time data for five restaurants, and we draw a box plot of these data. The box plot shows that the mean service time is highest at Jack and lowest at Wendy's. The longest service time also occurs at Jack and the shortest at Wendy's.

According to the box plot, variation is greatest at Hardee's, because both the distance from the minimum to the first quartile and the distance from the third quartile to the maximum are large. Service at Hardee's is therefore inconsistent, so as a customer I would find it unreliable.

Service at Jack is consistent, because its variation is very low, but both its minimum and maximum service times are very high compared with the other restaurants, so as a customer I would not prefer this restaurant.

Of the other restaurants, Popeyes, Wendy's and McDonald's all have shorter minimum times, but Wendy's has the shortest minimum time and little variation, so as a customer I would choose Wendy's as the best service provider among these restaurants.
Q.17

Do the data indicate that it is possible that you can do extremely well with little
likelihood of a large loss? Is it likely that you could lose money? The returns for the two
types of investments were entered in SPSS, and the descriptive statistics were produced as
given in the table below.
Statistics

                          Return A     Return B
N              Valid      50           50
               Missing    0            0
Mean                      10.9454      12.7602
Median                    9.8800       10.7550
Mode                      12.89        -38.47a
Std. Deviation            21.89401     28.04676
Variance                  479.348      786.621
Range                     84.95        106.47
Minimum                   -21.95       -38.47
Maximum                   63.00        68.00
Sum                       547.27       638.01
Percentiles    25         -7.4175      -1.2150
               50         9.8800       10.7550
               75         25.8625      31.3000
a. Multiple modes exist. The smallest value is shown.
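The SPSS figures can be checked for internal consistency, and the 25th percentiles bear directly on whether you could lose money. A small Python sketch, using only numbers copied from the table (the short key names are illustrative), checks that the variance is the squared standard deviation, that the range is the maximum minus the minimum, and that the mean is the sum divided by N, and flags whether the 25th percentile is negative:

```python
# Figures copied from the SPSS "Statistics" table.
stats = {
    "Return A": dict(n=50, mean=10.9454, sd=21.89401, var=479.348,
                     lo=-21.95, hi=63.00, rng=84.95, total=547.27, p25=-7.4175),
    "Return B": dict(n=50, mean=12.7602, sd=28.04676, var=786.621,
                     lo=-38.47, hi=68.00, rng=106.47, total=638.01, p25=-1.2150),
}

checks = {}
for name, s in stats.items():
    checks[name] = {
        # The variance should equal the square of the standard deviation.
        "var_ok": abs(s["sd"] ** 2 - s["var"]) < 0.001,
        # The range is the maximum minus the minimum.
        "range_ok": abs((s["hi"] - s["lo"]) - s["rng"]) < 1e-9,
        # The mean is the sum divided by the number of valid cases.
        "mean_ok": abs(s["total"] / s["n"] - s["mean"]) < 1e-4,
        # A negative 25th percentile means roughly a quarter or more of the returns were losses.
        "p25_negative": s["p25"] < 0,
    }
    print(name, checks[name])
```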

This question gives descriptive statistics for two returns, RETURN A and RETURN B. The table shows that RETURN B has the higher mean return (12.76 versus 10.95), but it also has the larger variance (786.62 versus 479.35). RETURN B has the higher maximum (68.00 versus 63.00) but also the lower minimum (-38.47 versus -21.95), so its range is wider. Because the 25th percentile is negative for both investments (-7.42 for A and -1.22 for B), roughly a quarter or more of the returns in each sample are losses, so losing money is quite possible with either investment, and the chance of a very high return with RETURN B comes with the risk of the largest loss.

In my view, RETURN A is more consistent than RETURN B, so investing in RETURN A should give more consistent results than investing in RETURN B.
Q.18
A small bank that heretofore did not use a scorecard wanted to determine whether a
scorecard would be advantageous. The bank manager took a random sample of 300
loans that were granted and scored each on a scorecard borrowed from a similar bank.
This scorecard is based on the responses supplied by the applicants to questions such as
age, marital status, and household income. The cutoff is 650, which means that those
scoring above are predicted to repay. Two hundred twenty of the loans were repaid, the
rest were not. The scores of those who repaid and the scores of those who defaulted
were recorded.

a. Use a graphical technique to present the scores of those who repaid.
b. Use a graphical technique to present the scores of those who defaulted.
c. What have you learned about the scorecard?

Ans.
Frequency table of scores of those who repaid

CLASS INTERVAL FREQUENCY

540-560 2
560-580 2
580-600 0
600-620 3
620-640 3
640-660 11
660-680 9
680-700 17
700-720 14
720-740 23
740-760 23
760-780 33
780-800 25
800-820 19
820-840 14
840-860 12
860-880 2
880-900 5
900-920 2
920-940 1
TOTAL 220
[Histogram of REPAID: frequency against score]

This is the histogram of the scores of the customers who repaid. The largest class is 760-780 (33 customers), and most customers who repaid scored between 680 and 860: 180 of the 220, or about 82%.

The cutoff is 650. Only the 10 customers in the classes below 640, plus at most the 11 customers in the 640-660 class, can have scored below 650, so between about 90% and 95% of the 220 customers who repaid scored above the cutoff.

So roughly 90% or more of the customers who repaid were correctly predicted to repay the loan.


Frequency table of scores of those who defaulted

CLASS INTERVAL FREQUENCY

425-450 1
450-475 1
475-500 1
500-525 3
525-550 3
550-575 8
575-600 8
600-625 10
625-650 16
650-675 11
675-700 11
700-725 5
725-750 0
750-775 1
775-800 1
TOTAL 80

[Histogram of DEFAULTED: frequency against score]

This is the histogram of the scores of the 80 customers who defaulted, which range from 425 to 800. Most of these customers scored between 550 and 725: 69 of the 80, or about 86%.
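A useful way to judge the scorecard for part (c) is to count how many scores in each table fall on each side of the 650 cutoff. The sketch below works only from the class counts; because the raw scores are not shown, a class that contains 650 is reported separately rather than guessed, and a class whose lower limit is 650 is counted as at or above the cutoff. `split_at_cutoff` and `share_between` are hypothetical helpers written for this illustration.

```python
# First scorecard: (lower limit, upper limit, frequency) for each class,
# restated from the two frequency tables.
repaid = [
    (540, 560, 2), (560, 580, 2), (580, 600, 0), (600, 620, 3), (620, 640, 3),
    (640, 660, 11), (660, 680, 9), (680, 700, 17), (700, 720, 14), (720, 740, 23),
    (740, 760, 23), (760, 780, 33), (780, 800, 25), (800, 820, 19), (820, 840, 14),
    (840, 860, 12), (860, 880, 2), (880, 900, 5), (900, 920, 2), (920, 940, 1),
]
defaulted = [
    (425, 450, 1), (450, 475, 1), (475, 500, 1), (500, 525, 3), (525, 550, 3),
    (550, 575, 8), (575, 600, 8), (600, 625, 10), (625, 650, 16), (650, 675, 11),
    (675, 700, 11), (700, 725, 5), (725, 750, 0), (750, 775, 1), (775, 800, 1),
]
CUTOFF = 650

def split_at_cutoff(table, cutoff=CUTOFF):
    """Count observations in classes wholly below the cutoff, wholly at or above it,
    and in a class containing the cutoff (these cannot be placed without raw scores)."""
    below = sum(f for lo, hi, f in table if hi <= cutoff)
    above = sum(f for lo, hi, f in table if lo >= cutoff)
    straddling = sum(f for lo, hi, f in table if lo < cutoff < hi)
    return below, above, straddling

def share_between(table, low, high):
    """Fraction of all observations in classes lying entirely inside [low, high]."""
    n = sum(f for _, _, f in table)
    return sum(f for lo, hi, f in table if lo >= low and hi <= high) / n

print("repaid    (below, at/above, straddling):", split_at_cutoff(repaid))
print("defaulted (below, at/above, straddling):", split_at_cutoff(defaulted))
print(f"repaid scoring 680-860:    {share_between(repaid, 680, 860):.1%}")
print(f"defaulted scoring 550-725: {share_between(defaulted, 550, 725):.1%}")
```

For these tables, 29 of the 80 defaulters fall in classes at or above 650, so about 36% of the borrowers who defaulted would have been predicted to repay.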
Q.19
Refer to the question above. The bank decided to try another scorecard, this one based
not on the responses of the applicants, but on credit bureau reports, which list
problems such as late payments and previous defaults. The scores using the new score
card of those who repaid and the scores of those who did not repay were recorded. The
cutoff score is 650.
a. Use a graphical technique to present the scores of those who repaid.
b. Use a graphical technique to present the scores of those who defaulted.
c. What have you learned about the scorecard?
d. Compare the results of this exercise with those of the above exercise. Which scorecard
appears to be better?

Ans.

FREQUENCY TABLE OF REPAID

CLASS INTERVAL FREQUENCY

610-630 2
630-650 3
650-670 10
670-690 17
690-710 8
710-730 25
730-750 34
750-770 44
770-790 28
790-810 17
810-830 15
830-850 9
850-870 8
TOTAL 220
[Histogram of REPAID: frequency against score]

FREQUENCY TABLE OF DEFAULTED

CLASS INTERVAL FREQUENCY

412.5-437.5 1
437.5-462.5 5
462.5-487.5 6
487.5-512.5 14
512.5-537.5 7
537.5-562.5 16
562.5-587.5 13
587.5-612.5 7
612.5-637.5 9
637.5-662.5 2
TOTAL 80
[Histogram of DEFAULTED: frequency against score]
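The same cutoff count, applied to the second scorecard's tables, gives a quantitative basis for parts (c) and (d). As before, this sketch works only from the class counts, a class containing 650 is reported separately, and `split_at_cutoff` is a hypothetical helper written for this illustration.

```python
# Second (credit-bureau) scorecard: (lower limit, upper limit, frequency),
# restated from the two frequency tables.
repaid = [
    (610, 630, 2), (630, 650, 3), (650, 670, 10), (670, 690, 17), (690, 710, 8),
    (710, 730, 25), (730, 750, 34), (750, 770, 44), (770, 790, 28), (790, 810, 17),
    (810, 830, 15), (830, 850, 9), (850, 870, 8),
]
defaulted = [
    (412.5, 437.5, 1), (437.5, 462.5, 5), (462.5, 487.5, 6), (487.5, 512.5, 14),
    (512.5, 537.5, 7), (537.5, 562.5, 16), (562.5, 587.5, 13), (587.5, 612.5, 7),
    (612.5, 637.5, 9), (637.5, 662.5, 2),
]
CUTOFF = 650

def split_at_cutoff(table, cutoff=CUTOFF):
    """Count observations in classes wholly below the cutoff, wholly at or above it,
    and in a class containing the cutoff (these cannot be placed without raw scores)."""
    below = sum(f for lo, hi, f in table if hi <= cutoff)
    above = sum(f for lo, hi, f in table if lo >= cutoff)
    straddling = sum(f for lo, hi, f in table if lo < cutoff < hi)
    return below, above, straddling

for name, table in [("repaid", repaid), ("defaulted", defaulted)]:
    below, above, straddling = split_at_cutoff(table)
    n = below + above + straddling
    print(f"{name:<10} below={below:>3} at/above={above:>3} straddling={straddling:>2} (n={n})")
```

Here only 5 of the 220 repaid loans lie below 650, and at most 2 of the 80 defaults (those in the 637.5-662.5 class) could lie above it, whereas under the first scorecard 29 of the 80 defaulters scored in classes at or above 650. On these counts the credit-bureau scorecard appears to separate the two groups considerably better.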
