
Nonparametric methods

Nonparametric tests are often used in place of their parametric counterparts when certain
assumptions about the underlying population are questionable, such as normality of the data,
or when observations may be measured on an ordinal rather than an interval scale, or come
from skewed or multimodal distributions. All tests involving ranked data, i.e., data that can
be put in order, are nonparametric.
Nonparametric methods                       Parametric methods
--------------------------------------------------------------------------------
One sample:
  Sign Test                                 One-sample t test (paired t-test)
  Wilcoxon Signed Ranks Test                One-sample t test (paired t-test)
Two samples:
  Wilcoxon Mann-Whitney Test                Two-sample t test
One-way ANOVA:
  Kruskal-Wallis Test                       One-way ANOVA
Two-way ANOVA:
  Friedman Test                             Two-way ANOVA
Correlation:
  Spearman Rank Correlation Test            Correlation
Goodness-of-fit:
  Kolmogorov-Smirnov Test                   Chi-squared goodness-of-fit test
Regression:
  Nonparametric linear regression           Linear regression

Non-parametric statistics are used if the data are not compatible with the assumptions of
parametric methods such as normality or homogeneous variance.
Advantages of non-parametric methods
1. They are easy to apply.
2. Assumptions, such as normality, can be relaxed.
3. When observations are drawn from non-normal populations, non-parametric
   methods are more reliable.
4. They can be used for rank scores, which are not exact in a numerical sense.
Disadvantages of non-parametric methods
1. When observations are drawn from normal populations, non-parametric
   tests are not as powerful as parametric tests.
2. Parametric methods are sometimes robust to certain types of departure from
   normality (especially as n gets large, by the Central Limit Theorem).
3. Confidence interval construction is difficult with non-parametric methods.

One Sample
Sign Test
The sign test is designed to test a hypothesis about the location of a population
distribution. It is most often used to test the hypothesis about a population median, and
often involves the use of matched pairs, for example, before and after data, in which case
it tests for a median difference of zero.
Example
Out of a population, 10 mentally retarded boys received general appearance scores as
follows: 4, 5, 8, 8, 9, 6, 10, 7, 6, 6.
1. H0: median = 5 vs HA: Not H0.
2. Transform the data into signs: assign + if the observed value > 5,
                                         - if the observed value < 5,
                                         0 if the observed value = 5.
obs    4   5   8   8   9   6  10   7   6   6
Sign   -   0   +   +   +   +   +   +   +   +
Zeros are eliminated from the analysis. Since there is 1 zero, the number of
observations is reduced from 10 to 9.
3. Thus we observed 8 +s out of 9 trials. The probability that we observe as many as 8
or more +s is, in EXCEL, 1-BINOMDIST(7,9,0.5,TRUE) = .0195. Since we perform
the two-sided test, the p-value = 2*.0195 = .039.
Large Sample Approximation for n > 20:

     Z = (T - n/2) / sqrt(n/4)  ~  N(0,1)

In this example, Z = (8 - 9/2) / sqrt(9/4) = 2.33.

From EXCEL, the p-value = 2*[1 - NORMDIST(2.33,0,1,TRUE)] = 0.020 < 0.05.

4. Since the p-value is smaller than 0.05, we conclude that the median score is
not equal to 5.
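The same calculations can be reproduced in SAS. The DATA step below is a minimal sketch
(not part of the original handout); PROBBNML and PROBNORM return the binomial and
standard normal cumulative probabilities used above.

data sign_check;
   nplus = 8;                                /* number of + signs                  */
   n     = 9;                                /* nonzero observations               */
   /* exact two-sided p-value: 2 * P(X >= 8), X ~ Binomial(9, 0.5)                 */
   p_exact  = 2*(1 - probbnml(0.5, n, nplus-1));
   /* large-sample approximation, shown for comparison even though n < 20          */
   z        = (nplus - n/2) / sqrt(n/4);
   p_approx = 2*(1 - probnorm(z));
   put p_exact= p_approx=;                   /* p_exact = .039, p_approx = .020    */
run;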

Wilcoxon Signed Ranks Test


The Wilcoxon Signed Ranks test is designed to test a hypothesis about the median
of a population distribution. It often involves the use of matched pairs, for example,
before and after data, in which case it tests for a median difference of zero. In many
applications, this test is used in place of the one sample t-test when the normality
assumption is questionable. It is a more powerful alternative to the sign test, but does
assume that the population probability distribution is symmetric.
Example
id        1    2    3    4    5    6    7    8    9   10     T+
score     4    5    8    8    9    6   10    7    6    6
x - 5    -1    0    3    3    4    1    5    2    1    1
|x - 5|   1    0    3    3    4    1    5    2    1    1
Rank-   2.5
Rank+          .  6.5  6.5    8  2.5    9    5  2.5  2.5   42.5
1. H0: median = 5 vs HA: Not H0.
2. Find the differences between observed values and the proposed median = 5.
3. Find the absolute values of the differences.
4. Eliminate the observation whose value, after subtracting 5, becomes 0.
5. Rank the absolute values of the differences, assigning average ranks to ties.
6. Add all the ranks for the positive differences. T+ = 42.5
From the statistical table for the Wilcoxon Signed-Rank Test
when n = 9, T+ = 42.5, T- = 2.5, p-value = 2* .008 = .016
For large sample approximation (n > 20):

     Z = (T - n(n+1)/4) / sqrt(n(n+1)(2n+1)/24)  ~  N(0,1)

For this example, Z = (42.5 - 9*10/4) / sqrt(9*10*19/24) = 2.37, and its p-value is .018.

7. Since n < 20, we rely on the exact p-value from the table (.016) and
   conclude that the median is not equal to 5.
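Although n < 20 here, the large-sample formula can still be evaluated as a rough check.
A sketch (not part of the original handout):

data wsr_check;
   Tplus = 42.5;  n = 9;
   mu    = n*(n+1)/4;                        /* 22.5                               */
   sigma = sqrt(n*(n+1)*(2*n+1)/24);         /* sqrt(71.25)                        */
   z     = (Tplus - mu)/sigma;               /* 2.37                               */
   p     = 2*(1 - probnorm(z));              /* approximately .018                 */
   put z= p=;
run;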
SAS program for the sign test and the Wilcoxon signed rank test for 1-sample data
DATA IN;
INPUT X @@;
diff=X-5;   /* test H0: median = 5 by testing the location of diff against 0 */
CARDS;
4 5 8 8 9 6 10 7 6 6
run;
PROC UNIVARIATE;
VAR diff;
run;

Output
                     Univariate Procedure
Variable=DIFF
                          Moments
          N               10    Sum Wgts         10
          Mean           1.9    Sum              19
          Std Dev   1.852926    Variance   3.433333
          Skewness  0.180769    Kurtosis   -0.62777
          USS             67    CSS            30.9
          CV         97.5224    Std Mean   0.585947
          T:Mean=0  3.242617    Pr>|T|       0.0101
          Num ^= 0         9    Num > 0           8
          M(Sign)        3.5    Pr>=|M|      0.0391
          Sgn Rank        20    Pr>=|S|      0.0195

M(Sign)  = (# of +s) - n/2   = 8 - 9/2     = 3.5
Sgn Rank = T+ - n(n+1)/4     = 42.5 - 22.5 = 20

Two Paired Samples


Sign Test
Example (from Table 18-1, p. 489)
Matched-pair design involving change scores in self-perception of health among
hypertensives.
id          1   2   3   4   5   6   7   8   9  10  11  12  13  14  15    T+
Treatment  10  12   8   8  13  11  15  16   4  13   2  15   5   6   8
Control     6   5   7   9  10  12   9   8   3  14   6  10   1   2   1
Sign        +   +   +   -   +   -   +   +   +   -   -   +   +   +   +    11

1. H0: median(Treatment) = median(Control) vs HA: Not H0.
2. The signs are assigned: + if Treatment > Control
                           - if Treatment < Control
                           0 if Treatment = Control.
3. n = 15, T+ = 11.
   The exact p-value = 2*[1 - BINOMDIST(10,15,0.5,TRUE)] = 0.1185.
Large Sample Approximation for n > 20:

     Z = (T - n/2) / sqrt(n/4)  ~  N(0,1)

In this example, Z = (11 - 15/2) / sqrt(15/4) = 1.81.

From EXCEL, the p-value = 2*[1 - NORMDIST(1.81,0,1,TRUE)] = 0.07 > 0.05.


Based on the large sample approximation and the exact p-value, we conclude that
the median of the treatment group is not different from the median of the
control group.
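A minimal SAS sketch (not in the original handout) reproducing both p-values from the
sign counts above:

data pairsign_check;
   nplus = 11;  n = 15;
   p_exact  = 2*(1 - probbnml(0.5, n, nplus-1));   /* 2*P(X >= 11) = 0.1185 */
   z        = (nplus - n/2)/sqrt(n/4);             /* 1.81                  */
   p_approx = 2*(1 - probnorm(z));                 /* approximately 0.07    */
   put p_exact= p_approx=;
run;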

Wilcoxon Signed Rank Test


Example
Matched-pair design involving change scores in self-perception of health among
hypertensives.
id          1     2     3     4     5     6     7     8     9    10    11    12    13    14    15      T+
Treatment  10    12     8     8    13    11    15    16     4    13     2    15     5     6     8
Control     6     5     7     9    10    12     9     8     3    14     6    10     1     2     1
Diff        4     7     1    -1     3    -1     6     8     1    -1    -4     5     4     4     7
|Diff|      4     7     1     1     3     1     6     8     1     1     4     5     4     4     7
Rank-                         3           3                 3   8.5
Rank+     8.5  13.5     3           6          12    15     3          11   8.5   8.5  13.5          102.5
1. H0: median(Treatment) = median(Control) vs HA: Not H0.
2. Find the difference between treatment and control.
3. Rank the absolute differences, assigning average ranks to ties.
4. Add all the ranks for the positive differences: T+ = 102.5.
   From the statistical table for the Wilcoxon Signed-Rank Test,
   when n = 15, T+ = 102.5, T- = 17.5, p-value = 2*.007 = .014.
For large sample approximation (n > 20):

     Z = (T - n(n+1)/4) / sqrt(n(n+1)(2n+1)/24)  ~  N(0,1)

For this example, Z = (102.5 - 15*16/4) / sqrt(15*16*31/24) = 2.41, and its p-value = .016.

5. Since the p-value < .05, we conclude that the medians are different.
Note: the results from the sign test and the Wilcoxon signed ranks test on the paired
sample differ. The Wilcoxon signed ranks test is more powerful.
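A sketch (not in the original handout) of the same large-sample calculation; it also
shows how the Sgn Rank statistic printed by PROC UNIVARIATE below relates to T+:

data pwsr_check;
   Tplus = 102.5;  n = 15;
   z       = (Tplus - n*(n+1)/4) / sqrt(n*(n+1)*(2*n+1)/24);   /* 2.41             */
   p       = 2*(1 - probnorm(z));                              /* about .016       */
   sgnrank = Tplus - n*(n+1)/4;                                /* 42.5, as printed */
   put z= p= sgnrank=;
run;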
SAS program for the sign test for paired data
data hyper;
input treat control @@;
diff = treat-control;
cards;
10 6 12 5 8 7 8 9 13 10 11 12 15 9 16 8
4 3 13 14 2 6 15 10 5 1 6 2 8 1
run;
proc univariate;
var diff;
run;

Output
                     Univariate Procedure
Variable=DIFF
                          Moments
          N               15    Sum Wgts         15
          Mean      2.866667    Sum              43
          Std Dev   3.563038    Variance   12.69524
          Skewness  -0.34413    Kurtosis   -0.82487
          USS            301    CSS        177.7333
          CV         124.292    Std Mean   0.919972
          T:Mean=0  3.116036    Pr>|T|       0.0076   <-- parametric paired t-test
          Num ^= 0        15    Num > 0          11
          M(Sign)        3.5    Pr>=|M|      0.1185
          Sgn Rank      42.5    Pr>=|S|      0.0139

     M(Sign)  = (# of +s) - n/2  = 11 - 15/2  = 3.5
     Sgn Rank = T+ - n(n+1)/4    = 102.5 - 60 = 42.5   (T+ = 102.5)

Two Independent Samples


Wilcoxon Mann-Whitney Test
The Wilcoxon Mann-Whitney Test is one of the most powerful of the nonparametric tests
for comparing two populations. In many applications, the Wilcoxon Mann-Whitney Test
is used in place of the two sample t-test when the normality assumption is questionable.
This test can also be applied when the observations in a sample of data are ranks, that is,
ordinal data rather than direct measurements.
Example
A researcher assesses the effects of prolonged inhalation of cadmium oxide on
hemoglobin levels.
Exposed   Unexposed     Exposed(sorted)  Rank     Unexposed(sorted)  Rank
 14.4       17.4             13.7          1           15.0           8.5
 14.2       16.2             13.8          2           15.0           8.5
 13.8       17.1             14.0          3           16.0          15
 16.5       17.5             14.1          4.5         16.2          16
 14.1       15.0             14.1          4.5         16.3          17
 16.6       16.0             14.2          6           16.8          21
 15.9       16.9             14.4          7           16.9          22
 15.6       15.0             15.3         10.5         17.1          23
 14.1       16.3             15.3         10.5         17.4          24
 15.3       16.8             15.6         12           17.5          25
 15.7                        15.7         13
 16.7                        15.9         14
 13.7                        16.5         18
 15.3                        16.6         19
 14.0                        16.7         20
                             S1 =        145           S2 =         180
1. Hypotheses: H0: median(X) = median(Y) vs HA: Not H0.
2. Sort each column of the variables. Assign the joint ranks to the samples from
   the two variables. Find S1 and S2, the sums of the ranks assigned to each group.
   S = max(S1, S2).
3. The test statistic is T = S - n(n+1)/2, where n = max(n1, n2).

SAS program for the Wilcoxon Mann-Whitney Test
data oxide;

infile cards missover;


input group $ @;
do until (hemo = .);
input hemo @;
if hemo ne .
then output;
end;
cards;
1 13.7 13.8 14 14.1 14.1 14.2 14.4 15.3 15.3 15.6 15.7 15.9 16.5 16.6 16.7
2 15 16 16.2 16.3 16.8 16.9 17.1 17.4 17.5 15
run;
proc npar1way wilcoxon;
class group;
var hemo;
exact;
run;
proc means median;
class group;
var hemo;
run;
                       NPAR1WAY PROCEDURE
          Wilcoxon Scores (Rank Sums) for Variable HEMO
                  Classified by Variable GROUP
               Sum of    Expected    Std Dev         Mean
  GROUP   N    Scores    Under H0    Under H0        Score
  1      15     145.0       195.0   18.0173527    9.6666667
  2      10     180.0       130.0   18.0173527   18.0000000
          Average Scores Were Used for Ties

                   Wilcoxon 2-Sample Test
                        S = 180.000

  Exact P-Values
  (One-sided) Prob >= S          = 0.0021
  (Two-sided) Prob >= |S - Mean| = 0.0042

  Normal Approximation (with Continuity Correction of .5)
  Z = 2.74735     Prob > |Z| = 0.0060
  T-Test Approx. Significance = 0.0112

                   The MEANS Procedure
             Analysis Variable : hemo
                        N
           group      Obs        Median
           1           15    15.3000000
           2           10    16.5500000

Conclusion: Reject H0: median(X) = median(Y) and conclude that the median hemoglobin
levels of Groups 1 and 2 are different; the median hemo for Group 2 is greater than the
median for Group 1.
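The normal approximation reported by PROC NPAR1WAY can be reproduced by hand from the
rank sums. This is a sketch (not part of the original handout); the expected value and
standard deviation under H0 are copied from the output above.

data mw_check;
   S  = 180;              /* rank sum for group 2                  */
   ES = 130;              /* expected rank sum under H0            */
   SD = 18.0173527;       /* std dev under H0 (adjusted for ties)  */
   z  = (S - ES - 0.5)/SD;         /* continuity correction of 0.5 */
   p  = 2*(1 - probnorm(z));       /* two-sided p-value            */
   put z= p=;             /* z = 2.747, p = 0.0060                 */
run;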
One-way ANOVA
Kruskal-Wallis Test
The Kruskal-Wallis test is a nonparametric test used to compare three or more samples.
Data

Levels    Observations                       Sum_i     Mean_i
  1       Y11 Y12 Y13 ... Y1n1               Y1.       Ybar1.
  2       Y21 Y22 Y23 ... Y2n2               Y2.       Ybar2.
  :
  a       Ya1 Ya2 Ya3 ... Yana               Ya.       Ybara.
____________________________________________________________
Total                                        Y..       Ybar..

N = n1 + n2 + ... + na
Converting the original data into ranks, we get

Levels    Observations                       Sum_i     Mean_i
  1       R11 R12 R13 ... R1n1               R1.       Rbar1.
  2       R21 R22 R23 ... R2n2               R2.       Rbar2.
  :
  a       Ra1 Ra2 Ra3 ... Rana               Ra.       Rbara.
____________________________________________________________
Total                                        R..       Rbar..
Hypotheses:  H0: Median1 = Median2 = ... = Mediana  vs  HA: Not H0

Test Statistic:

   T = 12/(N(N+1)) * SUM_{i=1..a} n_i (Rbar_i. - Rbar..)^2
     = 12/(N(N+1)) * SUM_{i=1..a} (R_i.^2 / n_i)  -  3(N+1)   ~  chi-square with (a-1) df

where Rbar.. = (N+1)/2 is the overall mean rank.
SAS program for the Kruskal-Wallis Test


data Kruskal;
infile cards missover;
input treat $ @;
do until (time = .);
input time @;
if time ne . then output;
end;
lines;
A 17 20 40 31 35
B 8 7 9 8
C 2 5 4 3
run;
proc npar1way wilcoxon;

class treat;
var time;
exact;
run;
proc means median;
class treat;
var time;
run;
                       NPAR1WAY PROCEDURE
          Wilcoxon Scores (Rank Sums) for Variable TIME
                  Classified by Variable TREAT
               Sum of    Expected    Std Dev          Mean
  TREAT   N    Scores    Under H0    Under H0         Score
  A       5      55.0        35.0   6.82191040   11.0000000
  B       4      26.0        28.0   6.47183246    6.5000000
  C       4      10.0        28.0   6.47183246    2.5000000
          Average Scores Were Used for Ties

                    Kruskal-Wallis Test
                        S = 10.711

  Exact P-Value
  Prob >= S = 6.66E-05
  Chi-Square Approximation
  DF = 2     Prob > S = 0.0047

                   The MEANS Procedure
             Analysis Variable : time
                        N
           treat      Obs        Median
           A            5    31.0000000
           B            4     8.0000000
           C            4     3.5000000

Conclusion: Reject H0: median(A) = median(B) = median(C) and conclude that the median
times of Treatment groups A, B, and C are not all equal. It seems that the median time
for Treatment A is greater than the medians for Treatments B and C. This must be
followed up by a nonparametric multiple comparison procedure.
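The Kruskal-Wallis statistic S = 10.711 in the output can be reproduced from the rank
sums. The following is a sketch (not part of the original handout); the tie correction
divides by 1 - SUM(t^3 - t)/(N^3 - N), here for the single pair of tied 8's in group B.

data kw_check;
   N = 13;                                            /* total observations        */
   /* rank sums 55, 26, 10 and group sizes 5, 4, 4 from the NPAR1WAY output        */
   H  = 12/(N*(N+1)) * (55**2/5 + 26**2/4 + 10**2/4) - 3*(N+1);
   Hc = H / (1 - (2**3 - 2)/(N**3 - N));              /* tie-corrected = 10.711    */
   p  = 1 - probchi(Hc, 2);                           /* chi-square approx, df = 2 */
   put H= Hc= p=;                                     /* p = 0.0047                */
run;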
Randomized Block Design
Friedman Test
Data

Let the data be in the following format:

                          Blocks
Levels     1     2     3    ...    b           Sum_i     Mean_i
  1       Y11   Y12   Y13   ...   Y1b          Y1.       Ybar1.
  2       Y21   Y22   Y23   ...   Y2b          Y2.       Ybar2.
  :
  a       Ya1   Ya2   Ya3   ...   Yab          Ya.       Ybara.
________________________________________________________________
Sum_j     Y.1   Y.2   Y.3   ...   Y.b          Y..
Mean_j   Ybar.1 Ybar.2 Ybar.3 ... Ybar.b                 Ybar..

where N = ab
Converting the data into separate ranks within each block, we get

                          Blocks
Levels     1     2     3    ...    b           Sum_i     Mean_i
  1       R11   R12   R13   ...   R1b          R1.       Rbar1.
  2       R21   R22   R23   ...   R2b          R2.       Rbar2.
  :
  a       Ra1   Ra2   Ra3   ...   Rab          Ra.       Rbara.
Hypotheses:  H0: Median1 = Median2 = ... = Mediana  vs  HA: Not H0

Test Statistic:

   T = 12/(a*b*(a+1)) * SUM_{i=1..a} R_i.^2  -  3b(a+1)   ~  chi-square with (a-1) df

SAS program for the Friedman Test


DATA Fried;
INPUT BLOCK $ TRTMENT $ YIELD @@;
CARDS;
1 A 32.6
1 B 36.4
1 C 29.5 1 D 29.4
2 A 42.7
2 B 47.1
2 C 32.9 2 D 40.0
3 A 35.3
3 B 40.1
3 C 33.6 3 D 35.0
4 A 35.2
4 B 40.3
4 C 35.7 4 D 40.0
5 A 33.2
5 B 34.3
5 C 33.2 5 D 34.0
6 A 33.1
6 B 34.4
6 C 33.1 6 D 34.1
run;
PROC RANK;
BY BLOCK;
VAR YIELD;

RANKS RYIELD;
RUN;
proc freq;
tables block*trtment*ryield / noprint cmh;
title 'Friedman''s Chi-Square';
run;
proc means median;
class trtment;
var yield;
run;

Output
                    Friedman's Chi-Square
                     The FREQ Procedure
     Cochran-Mantel-Haenszel Statistics (Based on Table Scores)
  Statistic   Alternative Hypothesis    DF      Value     Prob
  --------------------------------------------------------------
  1           Nonzero Correlation        1     0.7448    0.3881
  2           Row Mean Scores Differ     3    12.6207    0.0055
  3           General Association       12    27.7500    0.0060
                 Total Sample Size = 24

                     The MEANS Procedure
               Analysis Variable : YIELD
                          N
           TRTMENT      Obs        Median
           A              6    34.2000000
           B              6    38.2500000
           C              6    33.1500000
           D              6    34.5500000

Conclusion: We reject the null hypothesis and conclude that the median yields of
Treatments A, B, C, and D are not all equal. It seems that the median yield due to
Treatment B is greater than the medians due to Treatments A, C, and D. This must
be followed up by a nonparametric multiple comparison procedure.
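In this setup the "Row Mean Scores Differ" statistic (12.6207, p = 0.0055) is the
Friedman chi-square computed from the within-block ranks, with an adjustment for ties.
The DATA step below is a sketch (not part of the original handout); the within-block
rank sums (A = 13, B = 24, C = 9, D = 14) were obtained by ranking the four yields in
each block by hand, using average ranks for the ties in blocks 5 and 6.

data fried_check;
   a = 4;  b = 6;                            /* treatments, blocks                  */
   T  = 12/(b*a*(a+1)) * (13**2 + 24**2 + 9**2 + 14**2) - 3*b*(a+1);
   /* tie correction: one tied pair in each of blocks 5 and 6                       */
   Tc = T / (1 - 2*(2**3 - 2)/(b*(a**3 - a)));          /* = 12.6207                */
   p  = 1 - probchi(Tc, a - 1);                         /* = 0.0055                 */
   put T= Tc= p=;
run;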
Correlation
The Spearman Rank Correlation Coefficient
The Spearman Rank Correlation test uses the ranks (rather than the actual values) of the
two sets of variables to calculate a statistic, the correlation coefficient: rs.
data rankcorr;
input age EEG @@;
lines;
20 98 21 75 22 95 24 100 27 99 30 65 31 64 33 70 35 85
38 74 40 68 42 66 46 48 51 54 53 63 55 52 58 67 60 55
run;
proc rank;
var age;
ranks rage;
run;
proc rank;
var EEG;
ranks rEEG;
run;

Proc corr;
var rage rEEG;
run;

Output
                      Correlation Analysis
              2 'VAR' Variables:  RAGE  REEG

                       Simple Statistics
  Variable   N    Mean  Std Dev    Sum  Minimum  Maximum  Label
  RAGE      18  9.5000   5.3385  171.0   1.0000  18.0000  RANK FOR VARIABLE AGE
  REEG      18  9.5000   5.3385  171.0   1.0000  18.0000  RANK FOR VARIABLE EEG

  Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / N = 18

                  RAGE        REEG
  RAGE         1.00000    -0.76264     RANK FOR VARIABLE AGE
                   0.0      0.0002
  REEG        -0.76264     1.00000     RANK FOR VARIABLE EEG
                0.0002         0.0

Conclusion: The Spearman rank correlation between age and EEG is -0.76264. We
reject H0: rho = 0 and conclude that the correlation is different from 0, based on the
p-value of 0.0002.
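As an aside (not in the original handout), PROC CORR can produce the same coefficient
directly with the SPEARMAN option, without running PROC RANK first; rankcorr is the data
set read in above.

proc corr data=rankcorr spearman;   /* ranks the variables internally */
   var age EEG;
run;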

Goodness-of-Fit Test

Kolmogorov-Smirnov Goodness-of-Fit Test
1) The Kolmogorov-Smirnov test can be used to test whether data come from a normal
distribution or not.
The test can be visualized by plotting the empirical distribution function together
with a normal cumulative distribution function (for example, for 100 normal random
numbers). The Kolmogorov-Smirnov test is based on the maximum distance between these
two curves: the test statistic, here designated Dmax, is the maximum difference between
the cumulative proportions of the two patterns.

2) Suppose that we observe N = m + n observations, X1,...,Xm and Y1,...,Yn, which are
mutually independent and come from two populations.
The question becomes: are the populations the same or different?
Equivalently: H0: P(X < a) = P(Y < a) for all a.
To test this, define the empirical distribution functions (EDFs)

     Fx(t) = (# of X's <= t) / m     and     Fy(t) = (# of Y's <= t) / n .

The test statistic, here designated Dmax, is the maximum difference between the
cumulative proportions of the two patterns.

PROC NPAR1WAY computes the Kolmogorov-Smirnov statistic as

     KS = max_j sqrt( (1/n) * SUM_i n_i (F_i(x_j) - F(x_j))^2 ),   j = 1,2,...,n,

where F is the pooled EDF of all n observations. The asymptotic Kolmogorov-Smirnov
statistic is computed as

     KSa = KS * sqrt(n) .

If there are only two class levels, PROC NPAR1WAY computes the two-sample
Kolmogorov statistic as

     D = max_j | F1(xj) - F2(xj) |,   j = 1,2,...,n .

SAS program for Kolmogorov-Smirnov Test


data oxide;
infile cards missover;
input group $ @;
do until (hemo = .);
input hemo @;
if hemo ne .
then output;
end;
cards;
1 13.7 13.8 14 14.1 14.1 14.2 14.4 15.3 15.3 15.6 15.7 15.9 16.5 16.6 16.7
2 15 16 16.2 16.3 16.8 16.9 17.1 17.4 17.5 15
run;
/* checking the normality of a single variable using the Kolmogorov-Smirnov test */
proc univariate normal;
var hemo;
run;

SAS output (edited)
                       Tests for Normality

  Test                  --Statistic---     -----p Value------
  Shapiro-Wilk          W      0.943237    Pr < W       0.1925
  Kolmogorov-Smirnov    D      0.123822    Pr > D      >0.1500
  Cramer-von Mises      W-Sq   0.053008    Pr > W-Sq   >0.2500
  Anderson-Darling      A-Sq   0.403511    Pr > A-Sq   >0.2500

Conclusion: The distribution of the variable, hemo, may be considered normal.


/* comparing the distribution of two sample observations */
proc npar1way wilcoxon edf;
class group;
var hemo;
run;

/* edf = empirical distribution function */

SAS output

Kolmogorov-Smirnov 2-Sample Test (Asymptotic)


KS = 0.293939
D = 0.600000
KSa = 1.46969
Prob > KSa = 0.0266

Conclusion: The distributions of the two groups (exposed and unexposed) are different.
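The three statistics in the output are related: with two groups, KS = sqrt(n1*n2)/n * D
and KSa = KS*sqrt(n). The DATA step below is a sketch (not part of the original handout)
verifying this and approximating the asymptotic p-value from the first terms of the
Kolmogorov series.

data ks_check;
   n1 = 15;  n2 = 10;  n = n1 + n2;
   D   = 0.6;                              /* max |F1 - F2| from the output   */
   KS  = sqrt(n1*n2)/n * D;                /* 0.293939                        */
   KSa = KS * sqrt(n);                     /* 1.46969                         */
   /* asymptotic two-sided p-value: 2 * SUM (-1)^(k-1) exp(-2 k^2 KSa^2)      */
   p = 0;
   do k = 1 to 10;
      p = p + 2*((-1)**(k-1))*exp(-2*(k**2)*(KSa**2));
   end;
   put KS= KSa= p=;                        /* p is approximately 0.0266       */
run;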


Regression
Nonparametric linear regression using the Theil estimate.
     X:   X1   X2   ...   Xn
     Y:   Y1   Y2   ...   Yn

Here, we construct estimates of the slope using the n(n-1)/2 pairs of observations,
and use the median of these as the estimate of the slope.
Suppose that we have ordered the data by their x values, so that for (Yi, xi) and
(Yj, xj) satisfying i < j we have xi <= xj. We then compute

     S_ij = (Yj - Yi) / (xj - xi) .

Note that this causes a problem if some of the x values are the same! In that case, one
must use only the finite slopes (i.e., only the values S_ij with xi < xj) and take the
median of this reduced set. Therefore, the slope estimate is median(S_ij), i < j.
When one cannot assume that the error terms are symmetric about 0, find the n terms
Yi - (slope estimate)*Xi, i = 1,...,n. The median of these n terms is the estimate of
the intercept.
In the following example, Y = acid levels and X = exercise times (in minutes). We want
to establish the relationship between these two variables.
Parametric linear regression
DATA npreg1;
input dep indep @@;   /* each pair is (Y = acid level, X = exercise time) */
CARDS;
230 421 175 278 315 618 290 482
275 465 150 105 360 550 425 750
run;
proc reg;
model dep = indep;
run;
Model: MODEL1
Dependent Variable: DEP
                        Analysis of Variance
                          Sum of         Mean
  Source        DF       Squares        Square    F Value   Prob>F
  Model          1   53614.66151   53614.66151     58.115   0.0003
  Error          6    5535.33849     922.55642
  C Total        7   59150.00000

  Root MSE      30.37361     R-square    0.9064
  Dep Mean     277.50000     Adj R-sq    0.8908
  C.V.          10.94545

                        Parameter Estimates
                 Parameter       Standard    T for H0:
  Variable  DF    Estimate          Error    Parameter=0   Prob > |T|
  INTERCEP   1   76.210483    28.50456809          2.674       0.0368
  INDEP      1    0.438898     0.05757290          7.623       0.0003

Nonparametric linear regression


DATA npreg1;
ARRAY X(8) X1-X8;
ARRAY Y(8) Y1-Y8;
DO I = 1 TO 8;
INPUT Y(I) X(I) @@;
END;
OUTPUT;
CARDS;
230 421 175 278 315 618 290 482
275 465 150 105 360 550 425 750
run;

DATA npreg2;
SET npreg1;
ARRAY X(8) X1-X8;
ARRAY Y(8) Y1-Y8;
DO I=1 TO 7;
DO J=I+1 TO 8;
SLOPE = (Y(J)-Y(I))/(X(J)-X(I));
OUTPUT;
END;
END;
KEEP SLOPE;
PROC SORT;
BY SLOPE;
run;
PROC PRINT;
TITLE 'THEIL SLOPE ESTIMATE EXAMPLE';
run;
proc means median;
var slope;
run;


Data npreg3;
set npreg1;
ARRAY X(8) X1-X8;
ARRAY Y(8) Y1-Y8;
/* when one assumes that the error terms are not symmetric about 0 */
DO I = 1 to 8;
inter1 = Y(I)- 0.4878*X(I); /* .4878 is the median of the slopes */
output;
END;
keep inter1;
Proc print;
var inter1;
run;
Proc means median;
var inter1;
run;
THEIL SLOPE ESTIMATE EXAMPLE

  Obs    SLOPE       Obs    SLOPE       Obs    SLOPE       Obs    SLOPE
   1   -0.66176        8   0.34722       15   0.50373       22   0.83333
   2    0.14451        9   0.37135       16   0.52632       23   0.88235
   3    0.18382       10   0.38462       17   0.52966       24   0.98361
   4    0.25316       11   0.41176       18   0.53476       25   1.00000
   5    0.26144       12   0.42636       19   0.56373       26   1.00775
   6    0.32164       13   0.43147       20   0.59271       27   1.02273
   7    0.32500       14   0.47191       21   0.68015       28   1.02941

                  The MEANS Procedure
             Analysis Variable : SLOPE
                      Median
                  ------------
                    0.4878207
                  ------------

                  Obs     inter1
                   1     24.6362
                   2     39.3916
                   3     13.5396
                   4     54.8804
                   5     48.1730
                   6     98.7810
                   7     91.7100
                   8     59.1500

             Analysis Variable : inter1
                      Median
                  ------------
                   51.5267000
                  ------------

Compare the linear regression equations:

     parametric regression:       Yhat = 76.2105 + 0.4389 * X
     nonparametric regression:    Yhat = 51.5267 + 0.4878 * X

To compare the performance of the predicted values, let's compare their
residuals as follows:
DATA npreg1;
input dep indep @@;   /* each pair is (Y = acid level, X = exercise time) */
pred1 = 76.2105 + 0.4389*indep;
resid1 = dep - pred1;
np_pred2 = 51.5267 + 0.4878*indep;
resid2 = dep - np_pred2;
CARDS;
230 421 175 278 315 618 290 482
275 465 150 105 360 550 425 750
run;
proc means;

var resid1 resid2;


run;

                          The MEANS Procedure

  Variable   N          Mean        Std Dev        Minimum        Maximum
  resid1     8    -0.0010125     28.1205022    -32.4507000     42.3945000
  resid2     8     2.2560250     29.7632035    -37.9871000     47.2543000

In this case, it seems that the parametric regression equation predicts the values of Y
slightly more closely than the nonparametric one. Still, the choice of method depends on
the distributional assumptions one is willing to make about the data.

Homework problems
1. A sample of 15 patients suffering from asthma participated in an experiment to study
the effect of a new treatment on pulmonary function. The dependent variable is FEV
(forced expiratory volume, liters, in 1 second) before and after application of the
treatment.
Subject   Before   After        Subject   Before   After
   1       1.69     1.69           9       2.58     2.44
   2       2.77     2.22          10       1.84     4.17
   3       1.00     3.07          11       1.89     2.42
   4       1.66     3.35          12       1.91     2.94
   5       3.00     3.00          13       1.75     3.04
   6        .85     2.74          14       2.46     4.62
   7       1.42     3.61          15       2.35     4.42
   8       2.82     5.14

On the basis of these data, can one conclude that the treatment is effective in increasing
the FEV level? Let α = 0.05 and find the p-value.
a) Perform the sign test.

b) Perform the Wilcoxon signed-rank test.

2. From the same context as Problem 1, subjects 1-8 came from Clinic A and subjects
9-15 from Clinic B. Can one conclude that the FEV levels from these two groups are
different? Let α = 0.05 and find the p-value. Perform the Wilcoxon Mann-Whitney
Test.
Subject   Clinic A        Subject   Clinic B
   1        1.69             9        2.44
   2        2.22            10        4.17
   3        3.07            11        2.42
   4        3.35            12        2.94
   5        3.00            13        3.04
   6        2.74            14        4.62
   7        3.61            15        4.42
   8        5.14

3. From the same context as Problems 1 and 2, subjects 1-8 came from Clinic A,
subjects 9-15 from Clinic B, and subjects 16-20 from Clinic C were added later. Can
one conclude that the FEV levels from these three groups are different? Let α = 0.05
and find the p-value. Perform the Kruskal-Wallis Test.
Subject  Clinic A     Subject  Clinic B     Subject  Clinic C
   1       1.69          9       2.44         16       2.34
   2       2.22         10       4.17         17       3.17
   3       3.07         11       2.42         18       4.42
   4       3.35         12       2.94         19       4.94
   5       3.00         13       3.04         20       5.04
   6       2.74         14       4.62
   7       3.61         15       4.42
   8       5.14

4. The following table shows the scores made by nine randomly selected student nurses
on final examination in three subject areas:
                              Subject Area
Student number   Fundamentals   Physiology   Anatomy
      1               98            95          77
      2               95            71          79
      3               76            80          91
      4               95            81          84
      5               83            77          80
      6               99            70          93
      7               82            80          87
      8               75            72          81
      9               88            81          83

Test the null hypothesis that the student nurses from which the above sample was drawn
perform equally well in all three subject areas against the alternative hypothesis that
they perform better in at least one area. Let α = 0.05 and find the p-value. Perform the
Friedman Test.

5. From Problem 4, find the Spearman rank correlation between the Physiology scores
and the Anatomy scores.
