Students who miss the midterm test in a statistics course have the opportunity to write a
deferred test provided that they have a valid reason. The students who requested to
write the deferred test over the last four years were categorized as giving one of the
following reasons:
1. Sickness
2. Marriage in the family
3. Out of town because of the university team game
4. Forgot the date or time of the midterm test
The responses were recorded using the numerical codes shown.
a. Produce a frequency distribution and a relative frequency distribution.
b. Draw an appropriate graph to summarize the data. What does the graph tell you?
Ans.
The data give the number of students who missed the midterm for each of the reasons above.
The bar graph shows that sickness is the main reason students miss the midterm. It accounts for about 39% of the responses, and a marriage in the family is the next most common reason.
Being out of town for a university team game accounts for the fewest missed tests.
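Here is a minimal sketch of part (a). It assumes the responses are stored as a list of the integer codes 1 to 4 and uses Python's `collections.Counter` to tabulate the frequency and relative frequency distributions. The actual responses are not reproduced in this answer, so `sample_codes` is a hypothetical placeholder. The names `REASONS` and `frequency_tables` are illustrative only.

```python
from collections import Counter

# Reason codes as defined in the question.
REASONS = {
    1: "Sickness",
    2: "Marriage in the family",
    3: "Out of town (university team game)",
    4: "Forgot the date or time",
}

def frequency_tables(codes):
    """Return {reason: (frequency, relative frequency)} for a list of codes."""
    counts = Counter(codes)
    n = len(codes)
    return {
        REASONS[code]: (counts.get(code, 0), counts.get(code, 0) / n)
        for code in sorted(REASONS)
    }

# Hypothetical placeholder codes; the real survey responses are not listed here.
sample_codes = [1, 1, 2, 4, 1, 3, 2, 1, 4, 2]

for reason, (freq, rel) in frequency_tables(sample_codes).items():
    print(f"{reason:36s} {freq:3d} {rel:6.1%}")
```

The relative frequencies always sum to 1, so the same table also supplies the proportions needed for a pie chart.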
Q.2
An increasing number of statistics courses use a computer and software rather than
manual calculations. A survey of statistics instructors asked each to report the software his or her
course uses. The responses are:
1. Excel, 2. Minitab, 3. SAS, 4. SPSS, and 5. Other
a. Produce a frequency distribution.
b. Graphically summarize the data so that the proportions are depicted.
c. What do the charts tell you about the software choices?
Ans.
First, we build the frequency table from the survey data. It shows exactly how many users chose each software package.
Bar chart of software used (Excel, Minitab, SAS, SPSS, Other; y-axis: frequency)
The chart shows that Excel is used by the largest number of respondents, about 49% of the survey, which is by far the largest share. Minitab is the next most popular choice, and SAS is used by the fewest respondents.
Q.3
To help determine the need for more golf courses, a survey was undertaken. A sample of
75 self-declared golfers was asked how many rounds of golf they played last year. These
data are as follows:
18 26 16 35 30 15 18 15 18 19 25 30 35 14 20 18 24 21 25 18 29 23 15 19 27 28 9 17 28
25 23 20 24 28 36 20 30 26 12 31 13 26 22 30 29 26 17 32 36 24 29 18 38 31 36 24 30 20
13 23 3 28 5 14 24 13 18 10 14 16 28 19 10 42 22
a. Draw a histogram.
b. Draw a stem-and-leaf display.
c. Draw an ogive.
d. Describe what you have learned.
Ans.
First, we find the minimum and maximum values to see the spread of the variable in the sample.
Minimum value = 3
Maximum value = 42
Histogram of golf rounds (x-axis: rounds played, 0 to 40; y-axis: frequency)
Stem-and-leaf display of golf rounds (leaf unit = 1; first column is the Minitab depth)
  1    0 | 3
  3    0 | 59
 12    1 | 002333444
 29    1 | 55566778888888999
(15)   2 | 000012233344444
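The display above stops after the row for 20-24 rounds. Below is a minimal sketch of how the full split-stem display could be rebuilt from the 75 values listed in the question. Each tens digit is split into a line for leaves 0-4 and a line for leaves 5-9, with leaf unit = 1. The sketch omits Minitab's depth column, and `split_stem_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

# The 75 responses, in the order listed in the question.
GOLF_ROUNDS = [
    18, 26, 16, 35, 30, 15, 18, 15, 18, 19, 25, 30, 35, 14, 20, 18, 24, 21, 25, 18,
    29, 23, 15, 19, 27, 28, 9, 17, 28, 25, 23, 20, 24, 28, 36, 20, 30, 26, 12, 31,
    13, 26, 22, 30, 29, 26, 17, 32, 36, 24, 29, 18, 38, 31, 36, 24, 30, 20, 13, 23,
    3, 28, 5, 14, 24, 13, 18, 10, 14, 16, 28, 19, 10, 42, 22,
]

def split_stem_leaf(values):
    """Split-stem display for non-negative integers (leaf unit = 1).

    Each tens digit (stem) gets two lines: leaves 0-4, then leaves 5-9.
    """
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 5].append(v % 10)  # v // 5 indexes the half-stem
    lines = []
    for half in range(min(values) // 5, max(values) // 5 + 1):
        leaves = "".join(str(leaf) for leaf in groups[half])
        lines.append(f"{half // 2} | {leaves}")
    return lines

for line in split_stem_leaf(GOLF_ROUNDS):
    print(line)
```

The first five lines reproduce the rows shown above. The remaining lines cover 25-29, 30-34, 35-39 and the single value 42.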
The histogram and the stem-and-leaf display show that the most common classes are 15-20 and 25-30 rounds. The maximum number of rounds is 42 and the minimum is 3. The frequency table also shows that most golfers played between 15 and 30 rounds: 48 of the 75 golfers (about 64%) played 15 to 29 rounds. The least populated classes are 0-5 and 40-45, with one golfer each.
Overall, the graphical summaries show that a typical golfer in this sample plays between 15 and 30 rounds a year.
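Part (c) asks for an ogive, which plots the cumulative relative frequency at each upper class limit. The sketch below computes the ogive points from the data. It assumes classes of width 5 starting at 0, because the exact bins of the histogram above are not stated. `ogive_points` is a hypothetical helper.

```python
# The 75 responses, in the order listed in the question.
GOLF_ROUNDS = [
    18, 26, 16, 35, 30, 15, 18, 15, 18, 19, 25, 30, 35, 14, 20, 18, 24, 21, 25, 18,
    29, 23, 15, 19, 27, 28, 9, 17, 28, 25, 23, 20, 24, 28, 36, 20, 30, 26, 12, 31,
    13, 26, 22, 30, 29, 26, 17, 32, 36, 24, 29, 18, 38, 31, 36, 24, 30, 20, 13, 23,
    3, 28, 5, 14, 24, 13, 18, 10, 14, 16, 28, 19, 10, 42, 22,
]

def ogive_points(values, start=0, width=5):
    """Return (upper class limit, cumulative relative frequency) pairs.

    Classes are [start, start + width), [start + width, start + 2 * width), ...
    and continue until the class containing max(values).
    """
    n = len(values)
    n_classes = (max(values) - start) // width + 1
    points = []
    cumulative = 0
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width
        cumulative += sum(lower <= v < upper for v in values)
        points.append((upper, cumulative / n))
    return points

for upper, cum in ogive_points(GOLF_ROUNDS):
    print(f"fewer than {upper:2d} rounds: {cum:.3f}")
```

The ogive gives 0.80 at 30 and 0.16 at 15. The difference, 0.80 - 0.16 = 0.64, is the share of golfers who played 15 to 29 rounds.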
Q.4.
Comparing Returns on Two Investments:
a. Draw histograms for each set of returns using MS Excel, Minitab, and SPSS, and report
your findings. Which investment would you choose and why?
b. Also write your interpretation using the numerical measures, viz. mean, median,
variance, and standard deviation.
Histogram of RETURN A (x-axis: return; y-axis: frequency)
Histogram of RETURN B (x-axis: return; y-axis: frequency)
Variable   Mean    StDev   Median   Mode    N for Mode
RETURN A   10.95   21.89   9.88     12.89   2
RETURN B   12.76   28.05   10.75    *       0
The graphical and descriptive analysis shows that the mean of Return A (10.95) is lower than the mean of Return B (12.76). The standard deviation of Return B (28.05) is higher than that of Return A (21.89). If I had to choose, I would choose Return B because its average return is higher, although its returns also vary more. Return B also has the wider spread of outcomes. Its maximum is higher than Return A's, but its minimum is lower (more negative), so it carries a larger potential loss. On balance, I choose Return B.
Q.5
Analysis of Long Distance Telephone Bills:
The company’s marketing manager conducted a survey of 200 new residential
subscribers wherein the first month’s bills were recorded.
Ans.
First, we compute descriptive statistics for the data in Minitab:
Variable  Mean   Minimum  Q1    Median  Q3     Maximum  Mode  N for Mode
BILLS     43.59  0.00     9.28  26.91   84.94  119.63   0     8
Boxplot of BILLS (y-axis: BILLS, 0 to 120)
The box plot of the long-distance telephone bills shows that the distance between the first quartile and the median (26.91 - 9.28 = 17.63) is much smaller than the distance between the median and the third quartile (84.94 - 26.91 = 58.03). The upper half of the middle 50% of bills is therefore far more spread out than the lower half, so the distribution is skewed to the right.
Likewise, the distance between the minimum and the first quartile (9.28) is smaller than the distance between the third quartile and the maximum (34.69).
Overall, the telephone bills vary widely. The interquartile range is 75.66, and the mean (43.59) lies well above the median (26.91).
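The quartile distances used in this argument can be computed directly from the Minitab summary. The sketch below also computes the conventional 1.5 × IQR fences for flagging outliers. `box_summary` is a hypothetical helper, and the fence rule is the usual box-plot convention rather than something stated in the question.

```python
def box_summary(minimum, q1, median, q3, maximum):
    """Gaps between five-number-summary values plus 1.5 * IQR outlier fences."""
    iqr = q3 - q1
    return {
        "min_to_q1": q1 - minimum,
        "q1_to_median": median - q1,
        "median_to_q3": q3 - median,
        "q3_to_max": maximum - q3,
        "iqr": iqr,
        "lower_fence": q1 - 1.5 * iqr,
        "upper_fence": q3 + 1.5 * iqr,
    }

# Five-number summary of BILLS from the Minitab output above.
bills = box_summary(0.00, 9.28, 26.91, 84.94, 119.63)
for name, value in bills.items():
    print(f"{name:13s} {value:8.2f}")
```

With these numbers the upper fence is about 198.43, which is above the maximum bill of 119.63. Under this rule no bill would be flagged as an outlier, so the long upper whisker reflects skewness rather than isolated extreme values.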
Q.6
The career counseling center at a university wanted to learn more about the starting
salaries of the university’s graduates. They asked each graduate to report the highest
salary offer received. The survey also asked each graduate to report the degree and
starting salaries. Draw box plots to compare the four groups of starting salaries. Report
your findings.
Ans.
Before drawing the box plots, note that the data are the highest salary offers reported in the survey. We first compute descriptive statistics so that the box plots are easier to interpret:
Variable  Mean   Minimum  Q1     Median  Q3     Maximum  Mode   N for Mode
BA        27697  18719    25730  27765   29836  37025    28539  2
BSc       33148  23451    29927  33397   36745  40105    *      0
BBA       35260  23401    31316  34284   39551  47639    *      0
OTHER     30474  21994    28254  29951   32905  38812    32262  2
Box plots of the highest salary offers by degree (BA, BSc, BBA, OTHER)
The box plots of the highest salary offers show that the largest offer is in the BBA group (47,639) and the smallest is in the BA group (18,719). The mean offer is also highest for BBA (35,260), so BBA graduates receive the highest offers. The BA box plot also shows an outlying value beyond its upper whisker.
In terms of spread, BA actually has the narrowest interquartile range (29,836 - 25,730 = 4,106). BBA has both the widest interquartile range (39,551 - 31,316 = 8,235) and the widest overall range (47,639 - 23,401 = 24,238).
As a counselor, I would still recommend the BBA degree to new students. Even its first quartile (31,316) is above the median offers for BA (27,765) and OTHER (29,951), although BBA offers vary more than those of the other degrees.
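The spread and possible outliers for the four degree groups can be checked from the five-number summaries in the table, using the usual 1.5 × IQR fence rule. Below is a minimal sketch. `SALARIES` and `describe` are illustrative names. Only the minimum and maximum of each group can be tested, because the individual salaries are not listed here.

```python
# (minimum, Q1, median, Q3, maximum) copied from the Minitab table above.
SALARIES = {
    "BA":    (18719, 25730, 27765, 29836, 37025),
    "BSc":   (23451, 29927, 33397, 36745, 40105),
    "BBA":   (23401, 31316, 34284, 39551, 47639),
    "OTHER": (21994, 28254, 29951, 32905, 38812),
}

def describe(summary):
    """Range, IQR, and whether the extremes fall outside the 1.5 * IQR fences."""
    lo, q1, _median, q3, hi = summary
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "range": hi - lo,
        "iqr": iqr,
        "min_beyond_fence": lo < lower_fence,
        "max_beyond_fence": hi > upper_fence,
    }

for degree, summary in SALARIES.items():
    print(degree, describe(summary))
```

Under this rule, the BA maximum (37,025) lies above its upper fence (35,995), and the BA minimum (18,719) also lies below its lower fence (19,571). No other group has an extreme value beyond its fences, and BBA has the largest interquartile range.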
Q.10. Compare the five sets of data using a box plot and interpret the results.
The box plot summarizes the service times of the five restaurants. The median service time is highest at Jack and lowest at Wendy's. The longest service time is also at Jack, and the shortest is at Wendy's.
Variation is greatest at Hardee's. Both the distance from the minimum to the first quartile and the distance from the third quartile to the maximum are large. Service at Hardee's is therefore the least consistent, so customers cannot predict how long they will wait.
Service at Jack is consistent, because the spread of its service times is small. However, its minimum and maximum service times are both high compared with the other restaurants, so as a customer I would not prefer this restaurant.
Do the data indicate that it is possible that you can do extremely well with little
likelihood of a large loss? Is it likely that you could lose money? The returns for the two
types of investments were entered in SPSS, and the descriptive statistics were produced as
given in the table below.
Statistics
                          Return A    Return B
N               Valid     50          50
                Missing   0           0
Mean                      10.9454     12.7602
Median                    9.8800      10.7550
Mode                      12.89       -38.47 (a)
Std. Deviation            21.89401    28.04676
Variance                  479.348     786.621
Range                     84.95       106.47
Minimum                   -21.95      -38.47
Maximum                   63.00       68.00
Sum                       547.27      638.01
Percentiles     25        -7.4175     -1.2150
                50        9.8800      10.7550
                75        25.8625     31.3000
a. Multiple modes exist. The smallest value is shown.
The table gives a descriptive summary of the two investments, RETURN A and RETURN B. The mean return is higher for RETURN B (12.76 vs. 10.95), but so is the variance (786.62 vs. 479.35). RETURN B also has the higher maximum (68.00 vs. 63.00) and the lower minimum (-38.47 vs. -21.95), so its range is wider (106.47 vs. 84.95).
In my view, RETURN A is therefore more consistent than RETURN B. Investing in RETURN A should give steadier results, while RETURN B offers a somewhat higher average return at the cost of larger swings, including larger losses.
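The two investments have different means, so one way to compare their consistency is the coefficient of variation (standard deviation divided by mean). Below is a minimal sketch computed from the SPSS table. The ratio is meaningful here only because both means are positive. `RETURNS` and `coefficient_of_variation` are illustrative names.

```python
# Summary values copied from the SPSS "Statistics" table above.
RETURNS = {
    "RETURN A": {"mean": 10.9454, "sd": 21.89401, "min": -21.95, "max": 63.00},
    "RETURN B": {"mean": 12.7602, "sd": 28.04676, "min": -38.47, "max": 68.00},
}

def coefficient_of_variation(mean, sd):
    """Standard deviation per unit of mean (only meaningful for a positive mean)."""
    if mean <= 0:
        raise ValueError("coefficient of variation needs a positive mean")
    return sd / mean

for name, s in RETURNS.items():
    cv = coefficient_of_variation(s["mean"], s["sd"])
    spread = s["max"] - s["min"]
    print(f"{name}: CV = {cv:.3f}, range = {spread:.2f}")
```

The coefficient of variation is about 2.00 for RETURN A and about 2.20 for RETURN B. RETURN A therefore also has less variability per unit of average return.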
Q.18
A small bank that heretofore did not use a scorecard wanted to determine whether a
scorecard would be advantageous. The bank manager took a random sample of 300
loans that were granted and scored each on a scorecard borrowed from a similar bank.
This scorecard is based on the responses supplied by the applicants to questions such as
age, marital status, and household income. The cutoff is 650, which means that those
scoring above are predicted to repay. Two hundred twenty of the loans were repaid; the
rest were not. The scores of those who repaid and the scores of those who defaulted
were recorded.
Ans.
Frequency table of the scores of those who repaid (score class, frequency):
540-560 2
560-580 2
580-600 0
600-620 3
620-640 3
640-660 11
660-680 9
680-700 17
700-720 14
720-740 23
740-760 23
760-780 33
780-800 25
800-820 19
820-840 14
840-860 12
860-880 2
880-900 5
900-920 2
920-940 1
TOTAL 220
Histogram of REPAID (x-axis: score, 540 to 900; y-axis: frequency)
This histogram of the customers who repaid shows that the most frequent class is 760-780 (33 customers). Most repaid customers fall between 680 and 860: 180 of the 220, or about 82% of the data.
With a cutoff of 650, only 10 repaid customers scored below 640, and at most 21 scored below 660. More than 90% of those who repaid therefore scored above the cutoff.
Frequency table of the scores of those who defaulted (score class, frequency):
425-450 1
450-475 1
475-500 1
500-525 3
525-550 3
550-575 8
575-600 8
600-625 10
625-650 16
650-675 11
675-700 11
700-725 5
725-750 0
750-775 1
775-800 1
TOTAL 80
Histogram of DEFAULTED (x-axis: score, 450 to 800; y-axis: frequency)
This histogram of the customers who defaulted shows scores ranging from 425 to 800. Most of the 80 defaulted customers fall between 550 and 725: 69 of the 80, or about 86% of the data.
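The percentages quoted for both histograms can be checked directly from the frequency tables. The sketch below stores each class as (lower, upper, frequency) and counts a class toward an interval only if it lies entirely inside that interval. `share_between` is a hypothetical helper.

```python
# (lower, upper, frequency) classes copied from the two frequency tables above.
REPAID = [
    (540, 560, 2), (560, 580, 2), (580, 600, 0), (600, 620, 3), (620, 640, 3),
    (640, 660, 11), (660, 680, 9), (680, 700, 17), (700, 720, 14), (720, 740, 23),
    (740, 760, 23), (760, 780, 33), (780, 800, 25), (800, 820, 19), (820, 840, 14),
    (840, 860, 12), (860, 880, 2), (880, 900, 5), (900, 920, 2), (920, 940, 1),
]
DEFAULTED = [
    (425, 450, 1), (450, 475, 1), (475, 500, 1), (500, 525, 3), (525, 550, 3),
    (550, 575, 8), (575, 600, 8), (600, 625, 10), (625, 650, 16), (650, 675, 11),
    (675, 700, 11), (700, 725, 5), (725, 750, 0), (750, 775, 1), (775, 800, 1),
]

def share_between(classes, low, high):
    """Fraction of observations in classes lying entirely inside [low, high]."""
    total = sum(f for _, _, f in classes)
    inside = sum(f for lo, hi, f in classes if lo >= low and hi <= high)
    return inside / total

print(f"Repaid, 680-860:     {share_between(REPAID, 680, 860):.1%}")
print(f"Defaulted, 550-725:  {share_between(DEFAULTED, 550, 725):.1%}")
print(f"Defaulted below 650: {share_between(DEFAULTED, 0, 650):.1%}")
```

The last line treats the class 625-650 as below the cutoff, which is an assumption about where the boundary value 650 falls. On that basis, 51 of the 80 defaulters (about 64%) scored below 650, and the other 29 would have been predicted to repay.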
Q.19
Refer to the question above. The bank decided to try another scorecard, this one based
not on the responses of the applicants, but on credit bureau reports, which list
problems such as late payments and previous defaults. The scores using the new scorecard
of those who repaid and the scores of those who did not repay were recorded. The
cutoff score is 650.
a. Use a graphical technique to present the scores of those who repaid.
b. Use a graphical technique to present the scores of those who defaulted.
c. What have you learned about the scorecard?
d. Compare the results of this exercise with those of the above exercise. Which scorecard
appears to be better?
Ans.
Frequency table of the new-scorecard scores of those who repaid (score class, frequency):
610-630 2
630-650 3
650-670 10
670-690 17
690-710 8
710-730 25
730-750 34
750-770 44
770-790 28
790-810 17
810-830 15
830-850 9
850-870 8
TOTAL 220
Histogram of REPAID (new scorecard; x-axis: score; y-axis: frequency)
Frequency table of the new-scorecard scores of those who defaulted (score class, frequency):
412.5-437.5 1
437.5-462.5 5
462.5-487.5 6
487.5-512.5 14
512.5-537.5 7
537.5-562.5 16
562.5-587.5 13
587.5-612.5 7
612.5-637.5 9
637.5-662.5 2
TOTAL 80
Histogram of DEFAULTED (new scorecard; x-axis: score; y-axis: frequency)
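To compare the two scorecards, one can count how many customers in each group fall clearly below or clearly above the 650 cutoff. Below is a minimal sketch for the new scorecard. Classes are stored as (lower, upper, frequency), and a class whose upper limit equals 650 is treated as below the cutoff (an assumption). Classes that straddle 650 are reported separately, because they cannot be split without the raw scores. `split_at_cutoff` is a hypothetical helper.

```python
# (lower, upper, frequency) classes copied from the new-scorecard tables above.
REPAID_NEW = [
    (610, 630, 2), (630, 650, 3), (650, 670, 10), (670, 690, 17), (690, 710, 8),
    (710, 730, 25), (730, 750, 34), (750, 770, 44), (770, 790, 28), (790, 810, 17),
    (810, 830, 15), (830, 850, 9), (850, 870, 8),
]
DEFAULTED_NEW = [
    (412.5, 437.5, 1), (437.5, 462.5, 5), (462.5, 487.5, 6), (487.5, 512.5, 14),
    (512.5, 537.5, 7), (537.5, 562.5, 16), (562.5, 587.5, 13), (587.5, 612.5, 7),
    (612.5, 637.5, 9), (637.5, 662.5, 2),
]

def split_at_cutoff(classes, cutoff=650):
    """Return (below, straddling, at or above) counts relative to the cutoff.

    A class counts as 'below' if its upper limit is <= cutoff and as
    'at or above' if its lower limit is >= cutoff; anything else straddles.
    """
    below = sum(f for lo, hi, f in classes if hi <= cutoff)
    above = sum(f for lo, hi, f in classes if lo >= cutoff)
    straddle = sum(f for lo, hi, f in classes if lo < cutoff < hi)
    return below, straddle, above

print("Repaid    (below, straddling, above):", split_at_cutoff(REPAID_NEW))     # (5, 0, 215)
print("Defaulted (below, straddling, above):", split_at_cutoff(DEFAULTED_NEW))  # (78, 2, 0)
```

The same counting can be applied to the first scorecard's tables in the previous question. There, 10 repaid customers fall below, 11 straddle and 199 are above the cutoff, while 51 defaulters fall below and 29 are at or above it. Under these assumptions, the new scorecard separates the two groups much more cleanly: almost no defaulters score above 650, and almost all repaid customers do.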