National College of Ireland

MSc in Data Analytics Full-time Year 1 MSCDAD
MSc in Data Analytics Part-time Year 1 MSCDA
Postgraduate Diploma in Data Analytics Part-time Year 1 PGDSP

Examinations 2015/16
Autumn/Repeat

Wednesday 10th August 2016
6.30pm - 8.30pm
______________________________________________________________________

Statistics for Data Analytics

Dr. Geraldine Gray


Mr. Tony Delaney
Dr. Barry Haycock

Candidates should answer:


Five questions from Section A (40 marks) and
Two questions from Section B (60 marks)

Duration of exam: Two Hours


Attachments: NCI Postgraduate Statistical Tables

Page 1 of 7
SECTION A Answer five questions

Question 1

Critically assess the observation that traditional use of significance testing is an inherently misleading
process that should be abandoned in favour of other approaches (Cohen, 1994).
8 marks

Question 2

Before proceeding to assess a MANOVA, you should check that certain key assumptions are not
violated. What are these and how would you confirm that they are not violated?
8 marks

Question 3

The claims department at Wise Insurance Company believes that younger drivers have
more accidents and, therefore, should be charged higher insurance rates. Investigating a sample of
1,200 Wise policyholders revealed the following breakdown on whether a claim had been filed in the
last three years and the age of the policyholder. Is it reasonable to conclude that there is a relationship
between the age of the policyholder and whether or not the person filed a claim? Use the .05
significance level.

Age Group      No Claim   Claim

16 up to 25       170       74
25 up to 40       240       58
40 up to 55       400       44
55 or older       190       24
TOTAL            1000      200

8 marks
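
The test of independence asked for in Question 3 can be sketched in a few lines. The following is a minimal illustration rather than a model answer: it computes the Pearson chi-square statistic for the table above using only the Python standard library and compares it with the tabulated 5% critical value for 3 degrees of freedom (7.815). The function and variable names are illustrative choices and are not part of the paper.

```python
# Observed counts from the Question 3 table: (No Claim, Claim) per age group.
observed = {
    "16 up to 25": (170, 74),
    "25 up to 40": (240, 58),
    "40 up to 55": (400, 44),
    "55 or older": (190, 24),
}


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand_total = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(rows):
        for c, obs in enumerate(row):
            # Expected count under independence of age group and claim status.
            expected = row_totals[r] * col_totals[c] / grand_total
            stat += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df


# Chi-square table value for df = 3 at the .05 significance level.
CRITICAL_VALUE_DF3_5PCT = 7.815

chi2, df = chi_square_independence(observed)
reject_independence = chi2 > CRITICAL_VALUE_DF3_5PCT
print(f"chi-square = {chi2:.2f}, df = {df}, reject H0: {reject_independence}")
```

If the statistic exceeds the critical value, the hypothesis that age group and claim status are independent is rejected at the .05 level.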

Question 4

Consider a Markov chain with three possible states 1, 2 and 3 and transition probabilities as shown.
What is the state transition matrix?
8 marks

Question 5

Distinguish between situations in which it is acceptable to use a priori tests vs. post-hoc tests when
running an ANOVA procedure. When and why might you make a Bonferroni adjustment?
8 marks

Question 6

Describe what is meant by each of the following terms:

a) Multicollinearity
b) Heteroscedasticity
8 marks

SECTION B Answer two questions

Question 7

a) Briefly outline the type of analytic problem for which logistic regression would be a suitable
procedure. 6 marks

b) Explain the concept of odds ratio encountered in logistic regression. 6 marks


c) Consider the SPSS output provided below. The variables sex, age, problem getting asleep
(getsleprec), problem staying asleep (stayslprec) and hours sleep on weeknight (hourwnit) are
analysed as predictor variables of individuals reporting a SLEEP PROBLEM [0 = no, 1 = yes].
Comment on the key values that you would examine and assess in this output and outline the
conclusions you would draw about the model. 18 marks

Question 8

A researcher is interested in comparing staff satisfaction scores across various levels of length of
service (less than two years, three to five years and six years plus). Based on data collected the
ANOVA summarized below is produced.

a) Comment on the preliminary checks you would undertake before proceeding to interpret the
results.
8 marks
b) Interpret the SPSS output provided.
15 marks
c) Briefly outline how you would report the results in a research report.
7 marks

Oneway

Descriptives
Total Staff Satisfaction Scale

                                                95% Confidence Interval for Mean
         N    Mean   Std. Deviation  Std. Error  Lower Bound  Upper Bound  Minimum  Maximum
<= 2    172  35.57       6.489         .495        34.59        36.55        19       50
3-5     127  33.34       6.558         .582        32.19        34.49        18       48
6+      136  33.18       7.586         .650        31.90        34.47        10       50
Total   435  34.17       6.947         .333        33.52        34.83        10       50

Test of Homogeneity of Variances
Total Staff Satisfaction Scale

Levene Statistic   df1   df2   Sig.
     1.807           2   432   .165

ANOVA
Total Staff Satisfaction Scale

                 Sum of Squares    df   Mean Square     F     Sig.
Between Groups        557.061       2     278.530     5.902   .003
Within Groups       20387.008     432      47.192
Total               20944.069     434
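
The entries in the ANOVA table are related by simple arithmetic, which can be checked with a short sketch. The code below recomputes the mean squares and the F ratio from the sums of squares and degrees of freedom printed above. It also computes eta squared as an effect size; that statistic is an assumed addition for illustration and does not appear in the SPSS output.

```python
# Values copied from the ANOVA table above.
ss_between, df_between = 557.061, 2
ss_within, df_within = 20387.008, 432

# Mean square = sum of squares / degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F ratio = between-groups mean square / within-groups mean square.
f_ratio = ms_between / ms_within

# Eta squared (assumed choice of effect size, not printed in the output):
# proportion of total variation attributable to length-of-service group.
eta_squared = ss_between / (ss_between + ss_within)

print(f"F({df_between}, {df_within}) = {f_ratio:.3f}")
print(f"eta squared = {eta_squared:.3f}")
```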

Post Hoc Tests

Multiple Comparisons
Dependent Variable: Total Staff Satisfaction Scale
Tukey HSD
                                                                          95% Confidence Interval
(I) length of     (J) length of     Mean Difference
service grp 3     service grp 3         (I-J)        Std. Error   Sig.   Lower Bound   Upper Bound
<= 2              3-5                   2.231*          .804      .016       .34          4.12
                  6+                    2.386*          .788      .007       .53          4.24
3-5               <= 2                 -2.231*          .804      .016     -4.12          -.34
                  6+                     .155           .848      .982     -1.84          2.15
6+                <= 2                 -2.386*          .788      .007     -4.24          -.53
                  3-5                   -.155           .848      .982     -2.15          1.84

*. The mean difference is significant at the 0.05 level.

Question 9

a) Doctors find that people with Creutzfeldt-Jakob disease (CJD) almost invariably ate
hamburgers, thus p(Hamburger eater | CJD) = 0.9. The probability of an individual having CJD
is currently rather low, about one in 100,000. Assuming eating lots of hamburgers is rather
widespread, what is the probability that a hamburger eater will have Creutzfeldt-Jakob
disease?
15 marks

b) A computer system can operate in two different modes. Every hour, it remains in the same
mode or switches to a different mode according to the transition probability matrix:

P = [ 0.4  0.6 ]
    [ 0.6  0.4 ]

i. Compute the 2-step transition probability matrix.


7 marks

ii. If the system is in Mode I at 5:30 pm, what is the probability that it will be in Mode I at
8:30 pm on the same day?
8 marks
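
Both parts of Question 9 reduce to short calculations, sketched below using only the Python standard library. Two assumptions are made that the paper does not state. First, part (a) gives no value for p(Hamburger eater), so the sketch uses 0.5 purely as an illustrative stand-in for "rather widespread". Second, in part (b) the first row and column of P are taken to correspond to Mode I. The helper names (posterior, mat_mul, mat_pow) are hypothetical.

```python
def posterior(p_evidence_given_h, p_h, p_evidence):
    """Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E)."""
    return p_evidence_given_h * p_h / p_evidence


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, steps):
    """Return the `steps`-step transition matrix (steps >= 1)."""
    result = m
    for _ in range(steps - 1):
        result = mat_mul(result, m)
    return result


# Part (a). P(Hamburger eater) is NOT given in the paper; 0.5 is an assumed value.
P_HAMBURGER_ASSUMED = 0.5
p_cjd_given_hamburger = posterior(0.9, 1 / 100_000, P_HAMBURGER_ASSUMED)

# Part (b). Index 0 is taken to be Mode I (an assumption about the ordering).
P = [[0.4, 0.6],
     [0.6, 0.4]]
P2 = mat_pow(P, 2)   # 2-step transition matrix for part (i)
P3 = mat_pow(P, 3)   # 5:30 pm to 8:30 pm is three hourly steps
p_mode1_after_3h = P3[0][0]

print(f"P(CJD | hamburger eater) = {p_cjd_given_hamburger:.2e}")
print("2-step matrix:", [[round(x, 4) for x in row] for row in P2])
print(f"P(Mode I at 8:30 pm | Mode I at 5:30 pm) = {p_mode1_after_3h:.3f}")
```

A different assumed value for p(Hamburger eater) changes the answer to part (a) proportionally, so the figure should be read as conditional on that assumption.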
