
Lecture Slides on Mixed Models

Based on

A Course in Mixed Models for Use in


Animal Health and Animal Welfare Research

Søren Højsgaard & Erik Jørgensen

Biometry Research Unit


Danish Institute of Agricultural Sciences
Research Centre Foulum

October 18, 2001


1 Preface

In the spring of 2001 the Biometry Research group at the Danish Institute of Agricultural Sciences
arranged a course in Mixed Models for researchers at the Department of Animal Health and
Animal Welfare at the same institute. The course consisted of a combination of lectures, group
exercises, written assignments and a final project report based on data from experiments that
the project participants were involved in.
During the course, the book SAS System for Mixed Models by Littell et al. (1996) was used,
referred to as LMSW in the present document. It was necessary to supplement the book with
additional theoretical material and examples based on data from the research institute. This
led to a large collection of slides used for the presentations.
This supplementary material is compiled in the present document. We hope the readers will
find it useful. Perhaps the online version1 of this document will be even more useful, because of
the hypertext facilities.

Søren Højsgaard & Erik Jørgensen


sorenh@agrsci.dk Erik.Jorgensen@agrsci.dk

Biometry Research Unit


Danish Institute of Agricultural Sciences
Research Centre Foulum
P.O. Box 50
DK-8830 Tjele

1
http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/HSVmixed2001Slides.pdf

Contents

1 Preface 3

Contents 9

2 Overview of slides 11

3 Basic Concepts from Linear algebra 13


Why Linear Algebra?? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Linear Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
n–dimensional Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Linear Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Linear dependence and independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Projections onto Linear Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

4 Linear normal models 39


Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Linear Normal Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Random Vectors and Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Functions of Random Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
The Multivariate Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
The Distribution of a LNM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
The Expectation in a LNM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Representations of Models in SAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Least Squares Estimation in a LNM . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Estimation on matrix form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
The parameter vector β . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Estimability and Contrasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Estimability in SAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Least Squares Means . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Calculating things in Practice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

5 Some Basic Statistical Concepts 97


Data and Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Why the Normal Distribution is so “Normal” . . . . . . . . . . . . . . . . . . . . . . . 101


The Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102


Some General Principles of Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Method of Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
How good is an estimator? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Consistency of Estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Desirable Properties of Estimators . . . . . . . . . . . . . . . . . . . . . . . . . . 112
The Method of Maximum Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . 113
The Likelihood function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
The Maximum likelihood principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
How Good is the Estimate? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
The Asymptotic Normal Distribution of the MLE . . . . . . . . . . . . . . . . . . . . . 122
Asymptotical normality of transformations of the MLE . . . . . . . . . . . . . . . . . . 125
Tests of Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
How to get the asymptotic normality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

6 An overview 137
Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
Darwin's maize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
Galton's approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
The correct approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
What has happened . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
The 5th pot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
Population genetics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
Population genetics / animal breeding . . . . . . . . . . . . . . . . . . . . . . . . . . 146
Mixed Models in general . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

7 Experimental planning and design 149


Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
The research process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
Darwin's maize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
Lice decision support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
Research decision support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
Design options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

8 Randomized Complete Block Design 157


Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
Linear Normal Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
Random vs. Fixed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
ML - estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
Proc Mixed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
Other examples of RCBD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
Proc Mixed continued . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
IC - options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173


9 Randomized Complete Block Design II 175


Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
BLUEs and BLUPs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
BLUP Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
Model Check . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

10 Split-Plot Experiments 183


The General Idea behind Split–Plot Experiments . . . . . . . . . . . . . . . . . . . . . 184
Variance and Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
Comparing Differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
Inference Issues for Mixed Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
Analysis of the Split–Plot Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
Modelling the Mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
Three Technical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
Back to the Original Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
Unbalanced cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
Satterthwaites approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
How Good is Satterthwaites Approximation . . . . . . . . . . . . . . . . . . . . . . . . 201
Two–sample Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
Split–Plot Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
Making the “right” tests with PROC MIXED . . . . . . . . . . . . . . . . . . . . . . . 204
A Severe Warning!! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
Some Tentative Conclusions on Satterthwaite . . . . . . . . . . . . . . . . . . . . . . . 207
Random or Fixed Effects? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
Multilocation Trials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210

11 Examples of Split-Plot Designs 213


Example: W. Schouten Ph.D. work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
Breed Effect on Production . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
Straw shortener . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
Group Housing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
Herd Investigations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
Multilocation trials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219

12 Estimation and tests in mixed models 221


Maximum Likelihood and Linear Normal Models . . . . . . . . . . . . . . . . . . . . . 222
Maximum Likelihood Estimation in Mixed Models . . . . . . . . . . . . . . . . . . . . 225
Using ML or REML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
Tests in Mixed Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230

13 Complications concerning Variance Components 235


Sugar Beet example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
Reason . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
Likelihood contour plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238


G not positive definite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239


Warning Satterthwaite goes wrong . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
Testing effects of random components . . . . . . . . . . . . . . . . . . . . . . . . . . . 242

14 Repeated Measurements 245


Analyzing Repeated Measurements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
Tacit Assumptions when using the Split–Plot Model . . . . . . . . . . . . . . . . . . . 248
Modelling of Covariances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
Types of random variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
Unstructured Covariance Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
The AR(1)–model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
How to estimate the autocorrelation?? . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
Compound Symmetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
Which Covariance Structure to use? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
Numerical Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
What does the covariance structure mean for the conclusions? . . . . . . . . . . . . . . 263

15 Repeated Measurements: Covariance structures 265


Repeated statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
Types of variance structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
Unstructured . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
Autoregressive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
Antedependence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
Toeplitz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
Heterogeneous . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
AR vs CS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273

16 Random Regression 275


The Basic Idea behind Random Regression . . . . . . . . . . . . . . . . . . . . . . . . 276
Analyzing the Individual Regression Coefficients . . . . . . . . . . . . . . . . . . . . . 279
Random Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
How to ... In SAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
Correlation structure in Random Regression Models . . . . . . . . . . . . . . . . . . . 285

17 Factor Structure Diagrams 289


Factor Structure Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
Two–way ANOVA with Replicates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
Two–way ANOVA without Replicates . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
Block Experiments with Replicates within Blocks . . . . . . . . . . . . . . . . . . . . . 294
Block Experiments without Replicates within Blocks . . . . . . . . . . . . . . . . . . . 296
Split Plot Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297

18 Covariate Models and Multivariate Response 301


Example of the use of covariates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302


Model reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304


Table 5:1 LMSW page 5.2.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
SAS- Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
Feed vs daily gain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
Multivariate Responses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
The Components of a MLNM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
How to ... In SAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
The general setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316

19 Heterogeneous Variance 319


Why Variance Heterogeneity is Important to Recognize . . . . . . . . . . . . . . . . . 320
Graphical Investigation of the Variance Structure . . . . . . . . . . . . . . . . . . . . . 321
Variance Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
The Delta–method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
Taylors Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
Applying Taylors Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
Transformation of Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
Modelling Variance Heterogeneity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
Heterogeneous Variance for Grouped Data . . . . . . . . . . . . . . . . . . . . . . . . . 334
Power–of–Mean for Data with Covariates . . . . . . . . . . . . . . . . . . . . . . . . . 340
On transformations, normal approximation and confidence intervals . . . . . . . . . . 345
Transformations and confidence intervals . . . . . . . . . . . . . . . . . . . . . . . . . 349

20 Variance heterogeneity: Example of the effect of transformation 355


Variance Homogeneity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
Model of Expectations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
Model comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
Treatment differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362
Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
Natural Scales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363

21 Variance Homogeneity: Diurnal Variation 365


Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
Random Regression Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
Model of mean ? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
Modelling variance inhomogeneity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
SAS model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
Experience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370

22 Links to supplementary material 371

Bibliography 373

2 Overview of slides

The course was arranged in three blocks of lectures.

1. Brush-up concerning the necessary prerequisites of statistical concepts, linear algebra and
linear normal models. In addition, a historical review was given and experimental planning
was discussed. This covers Chapters 3-7.

2. This block of lectures covered the basic application of Mixed Models within the experi-
mental designs typically used at the Department of Animal Health and Animal Welfare.
That is

• randomized complete block designs (Chapters 8 and 9),
• split-plot designs (Chapters 10 and 11),
• repeated measurements (Chapters 14 and 15),
• random regression (Chapter 16),
• covariates and multivariate response (Chapter 18).

In addition, the fundamentals concerning estimation and tests in Mixed Models are discussed
in Chapter 12. The two remaining issues, numerical problems (Chapter 13) and factor
structure diagrams (Chapter 17), were included because of questions raised by the
participants. In practical examples some of the variance component estimates were very
often set to 0, leading to problems concerning the calculation of d.f. (i.e., with Satterthwaite's
approximation). This further raised a need for a more 'manual' approach towards
d.f. calculations in different designs.

3. In the final part of the course some additional topics and developments within Mixed
Models were presented, and efforts were made to give a general summary and overview of
the topics. Lectures concerning variance heterogeneity are presented in Chapters 19 and 20.
An example using the presented methods on data concerning diurnal variation is presented
in Chapter 21.
In addition, the preliminary work on the final project report was presented during this
final block.


The final chapter (22) of this book consists of links to supplementary material, mainly SAS
examples.
The exercises used in the course are not included but can be found by visiting the home page of
the course1
Finally, it should be mentioned that each chapter starts with a very short introduction to the
topic. In addition, a link to the full screen version of the presentation can be found.

1
http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/HSVmixed2001.htm

3 Basic Concepts from Linear algebra

Linear algebra is an important prerequisite in order to understand the model formulation and
calculations within Mixed Models. The following slides served as a brush-up on the theory, with
a presentation of the most important concepts and results.
Link to the full screen presentation1

1
http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/LinAlg.f.pdf


Why Linear Algebra??

• Many statistical models used in practice are assumed to have some


kind of a linear structure. (Linear regression and analysis of variance
are classical examples.)

• Linear algebra is the branch of mathematics that deals with linear


structures.

• Linear algebra is a convenient tool for handling models with linear


structures.

• Moreover, many concepts from linear algebra can be given


geometrical interpretation.

• Hence geometry can be a way to understand statistical models with


linear structures

Vectors

Vectors: A column vector is a list of numbers stacked on top of each other, e.g.
$$a = \begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix}$$
A row vector is a list of numbers written one after the other, e.g.
$$b = (2, 1, 3)$$
In both cases, the list is ordered, i.e.
$$(2, 1, 3) \neq (1, 2, 3).$$



• Note In what follows all vectors are column vectors unless otherwise stated.
In general an n–vector has the form
$$a = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}$$
where the $a_i$ are numbers.


Transpose of vectors: Transposing turns a column vector into a row vector and a row vector into a column vector. The transpose is denoted by "$\top$". For example,
$$a^\top = (a_1, a_2, \ldots, a_n)$$
Hence transposing twice takes us back to where we started:
$$a = (a^\top)^\top$$
• Example:
$$\begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix}^\top = (1, 3, 2) \quad \text{and} \quad (1, 3, 2)^\top = \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix}$$


Multiplying a vector by a number: If $a$ is a vector and $\alpha$ is a number, then $\alpha a$ is the vector
$$\alpha a = \begin{pmatrix} \alpha a_1 \\ \alpha a_2 \\ \vdots \\ \alpha a_n \end{pmatrix}$$
• Example:
$$7 \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 7 \\ 21 \\ 14 \end{pmatrix}$$

Sum of vectors: Let $a$ and $b$ be n–vectors. The sum $a + b$ is the n–vector
$$a + b = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} = \begin{pmatrix} a_1 + b_1 \\ a_2 + b_2 \\ \vdots \\ a_n + b_n \end{pmatrix} = b + a$$
• Note Only vectors of the same dimension can be added!
• Example:
$$\begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix} + \begin{pmatrix} 2 \\ 8 \\ 9 \end{pmatrix} = \begin{pmatrix} 1+2 \\ 3+8 \\ 2+9 \end{pmatrix} = \begin{pmatrix} 3 \\ 11 \\ 11 \end{pmatrix}$$


Inner product of vectors: Let $a$ and $b$ be n–vectors. The inner product $a \cdot b$ is the number
$$a \cdot b = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n = \sum_{i=1}^n a_i b_i$$
• Note The product is a number – not a vector.
• Note Only vectors of the same dimension can be multiplied!
• Example:
$$\begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ 8 \\ 9 \end{pmatrix} = 1 \cdot 2 + 3 \cdot 8 + 2 \cdot 9 = 44$$


The length (norm) of a vector: The length (or norm) of a vector $a$ is
$$\|a\| = \sqrt{a \cdot a} = \sqrt{\sum_{i=1}^n a_i^2}$$
The 0–vector and the 1–vector: The 0–vector (1–vector) is a vector with 0 (1) on all entries. The 0–vector (1–vector) is frequently written simply as $0$ ($1$) or as $0_n$ ($1_n$) to emphasize that it is of length $n$.
Orthogonal (perpendicular) vectors: Two vectors $a$ and $b$ with $a \neq 0$ and $b \neq 0$ are orthogonal if their inner product is zero, written
$$a \perp b \Leftrightarrow a \cdot b = 0$$
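These definitions are easy to check numerically. A minimal sketch in PROC IML (assuming SAS/IML is available; the two vectors are chosen only for illustration):

PROC IML;
   a  = {1, 1};                  /* column vectors                    */
   b  = {2, -2};
   ip = a` * b;                  /* inner product a . b               */
   na = sqrt(a` * a);            /* length (norm) of a                */
   PRINT ip na;                  /* ip = 0, so a and b are orthogonal */
QUIT;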

Matrices

Matrix: A matrix $A$ with $r$ rows and $c$ columns is an $r \times c$ table of the form
$$A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1c} \\ a_{21} & a_{22} & \ldots & a_{2c} \\ \vdots & \vdots & \ddots & \vdots \\ a_{r1} & a_{r2} & \ldots & a_{rc} \end{pmatrix}$$
It is said that $A$ has the dimension $r \times c$.
• Note One can regard $A$ as consisting of $c$ column vectors put after each other:
$$A = [a_1 : a_2 : \cdots : a_c]$$
Transpose of matrices: A matrix is transposed by interchanging rows and columns, and the transpose is denoted by "$\top$". That is,
$$A^\top = \begin{pmatrix} a_{11} & a_{21} & \ldots & a_{r1} \\ a_{12} & a_{22} & \ldots & a_{r2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1c} & a_{2c} & \ldots & a_{rc} \end{pmatrix}$$
Example:
$$\begin{pmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 9 \end{pmatrix}^\top = \begin{pmatrix} 1 & 3 & 2 \\ 2 & 8 & 9 \end{pmatrix}$$


• Note If A is an r × c matrix then A> is a c × r matrix.


• Note One can regard a column vector of length r as an r × 1
matrix and a row vector of length c as a 1 × c matrix.


Multiplying a matrix by a number: For a number $\alpha$ and a matrix $A$, the product $\alpha A$ is the matrix
$$\alpha A = \begin{pmatrix} \alpha a_{11} & \alpha a_{12} & \ldots & \alpha a_{1c} \\ \alpha a_{21} & \alpha a_{22} & \ldots & \alpha a_{2c} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha a_{r1} & \alpha a_{r2} & \ldots & \alpha a_{rc} \end{pmatrix}$$
Example:
$$7 \begin{pmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 9 \end{pmatrix} = \begin{pmatrix} 7 & 14 \\ 21 & 56 \\ 14 & 63 \end{pmatrix}$$


Sum of matrices: Let $A = [a_1 : a_2 : \cdots : a_c]$ and $B = [b_1 : b_2 : \cdots : b_c]$ be $r \times c$ matrices.
The sum $A + B$ is the $r \times c$ matrix given by
$$A + B = [a_1 + b_1 : a_2 + b_2 : \cdots : a_c + b_c]$$
$$= \begin{pmatrix} a_{11} & \ldots & a_{1c} \\ \vdots & \ddots & \vdots \\ a_{r1} & \ldots & a_{rc} \end{pmatrix} + \begin{pmatrix} b_{11} & \ldots & b_{1c} \\ \vdots & \ddots & \vdots \\ b_{r1} & \ldots & b_{rc} \end{pmatrix} = \begin{pmatrix} a_{11}+b_{11} & \ldots & a_{1c}+b_{1c} \\ \vdots & \ddots & \vdots \\ a_{r1}+b_{r1} & \ldots & a_{rc}+b_{rc} \end{pmatrix} = B + A$$

• Note Only matrices with the same dimensions can be added.
Example:
$$\begin{pmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 9 \end{pmatrix} + \begin{pmatrix} 5 & 4 \\ 8 & 2 \\ 3 & 7 \end{pmatrix} = \begin{pmatrix} 6 & 6 \\ 11 & 10 \\ 5 & 16 \end{pmatrix}$$


Multiplication of a matrix and a vector: Let $A$ be an $r \times c$ matrix and let $b$ be a $c$–dimensional column vector. The product $Ab$ is the $r \times 1$ matrix
$$Ab = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1c} \\ \vdots & \vdots & \ddots & \vdots \\ a_{r1} & a_{r2} & \ldots & a_{rc} \end{pmatrix} \begin{pmatrix} b_1 \\ \vdots \\ b_c \end{pmatrix} = \begin{pmatrix} a_{11}b_1 + a_{12}b_2 + \cdots + a_{1c}b_c \\ \vdots \\ a_{r1}b_1 + a_{r2}b_2 + \cdots + a_{rc}b_c \end{pmatrix}$$
• Example:
$$\begin{pmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 9 \end{pmatrix} \begin{pmatrix} 5 \\ 8 \end{pmatrix} = \begin{pmatrix} 1\cdot 5 + 2\cdot 8 \\ 3\cdot 5 + 8\cdot 8 \\ 2\cdot 5 + 9\cdot 8 \end{pmatrix} = \begin{pmatrix} 21 \\ 79 \\ 82 \end{pmatrix}$$

Multiplication of matrices: Let $A$ be an $r \times c$ matrix and $B$ a $c \times t$ matrix, i.e. $B = [b_1 : b_2 : \cdots : b_t]$. The product $AB$ is the $r \times t$ matrix given by
$$AB = A[b_1 : b_2 : \cdots : b_t] = [Ab_1 : Ab_2 : \cdots : Ab_t]$$
Example:
$$\begin{pmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 9 \end{pmatrix} \begin{pmatrix} 5 & 4 \\ 8 & 2 \end{pmatrix} = \begin{pmatrix} 1\cdot5+2\cdot8 & 1\cdot4+2\cdot2 \\ 3\cdot5+8\cdot8 & 3\cdot4+8\cdot2 \\ 2\cdot5+9\cdot8 & 2\cdot4+9\cdot2 \end{pmatrix} = \begin{pmatrix} 21 & 8 \\ 79 & 28 \\ 82 & 26 \end{pmatrix}$$


• Note The product $AB$ can only be formed if the number of rows in $B$ equals the number of columns in $A$. In that case, $A$ and $B$ are said to be conformable.
• Note In general $AB$ and $BA$ are not identical.
A mnemonic for matrix multiplication is to write $B$ above and to the right of $A$; the entry in row $i$ and column $j$ of the product is then the inner product of the $i$th row of $A$ and the $j$th column of $B$:
$$\begin{pmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 9 \end{pmatrix} \begin{pmatrix} 5 & 4 \\ 8 & 2 \end{pmatrix} = \begin{pmatrix} 1\cdot5+2\cdot8 & 1\cdot4+2\cdot2 \\ 3\cdot5+8\cdot8 & 3\cdot4+8\cdot2 \\ 2\cdot5+9\cdot8 & 2\cdot4+9\cdot2 \end{pmatrix} = \begin{pmatrix} 21 & 8 \\ 79 & 28 \\ 82 & 26 \end{pmatrix}$$

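The product above can be checked numerically. A small PROC IML sketch (assuming SAS/IML is available), using the matrices from the example:

PROC IML;
   A = {1 2, 3 8, 2 9};          /* 3 x 2 matrix                          */
   B = {5 4, 8 2};               /* 2 x 2 matrix                          */
   C = A * B;                    /* gives {21 8, 79 28, 82 26} as above   */
   PRINT C;
QUIT;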
Special matrices:
• An n × n matrix is said to be a square matrix.
• A matrix with 0 on all entries is the 0–matrix and is often written simply as 0 (or as $0_{r \times c}$ to emphasize the dimension).
• A matrix consisting of 1s in all entries is often written J (or as $J_{r \times c}$ to emphasize the dimension).
• A square matrix with 0 on all off–diagonal entries and elements $d_1, d_2, \ldots, d_n$ on the diagonal is said to be a diagonal matrix and is often written diag$\{d_1, d_2, \ldots, d_n\}$.
• A diagonal matrix with 1s on the diagonal is called the identity matrix and is denoted I (or $I_{n \times n}$ to emphasize the dimension).
• A matrix A is symmetric if $A = A^\top$.


Some rules for matrix operations: For (conformable) matrices


A, B and C the following rules apply

(A + B)> = A> + B >

(AB)> = B >A>
A(B + C) = AB + AC
AB = AC 6⇒ B = C


Inverse of a matrix: The inverse of an n × n matrix A is the matrix


B (which is also n × n) which multiplied with A gives the identity
matrix I. That is,
AB = BA = I.
One says that B is A’s inverse and writes B = A−1.

• Note Only square matrices can have an inverse.

• Note Not all square matrices have an inverse.

• Note When the inverse exists, it is unique.

• Note Finding the inverse of a large matrix A is numerically


complicated.

Example 1. It is easy to find the inverse of a 2 × 2 matrix. When
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
then the inverse is
$$A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$
under the assumption that $ad - bc \neq 0$. The number $ad - bc$ is called the determinant of $A$, sometimes written $\det(A)$.
If the determinant $\det(A) = 0$, then $A$ has no inverse. fin

Example 2. Finding the inverse of a diagonal matrix is easy: Let
$$A = \begin{pmatrix} a_1 & 0 & \ldots & 0 \\ 0 & a_2 & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \ldots & a_n \end{pmatrix}$$
where all $a_i \neq 0$. Then the inverse is
$$A^{-1} = \begin{pmatrix} \frac{1}{a_1} & 0 & \ldots & 0 \\ 0 & \frac{1}{a_2} & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \ldots & \frac{1}{a_n} \end{pmatrix}$$
If one $a_i = 0$ then $A^{-1}$ does not exist. fin


Generalized inverse: Not all square matrices have an inverse.


However all square matrices have a generalized inverse.
A generalized inverse of a square matrix A is a matrix A− satisfying
that
AA−A = A

Any square matrix has an infinite number of generalized inverses.

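As a sketch of how such matrices can be computed in practice (assuming SAS/IML is available), the INV and GINV functions return the inverse and a (Moore–Penrose) generalized inverse, respectively; the matrices below are chosen only for illustration:

PROC IML;
   A    = {1 2, 3 4};            /* det(A) = -2, so the inverse exists    */
   Ainv = inv(A);
   B    = {1 2, 2 4};            /* singular: second row = 2 * first row  */
   Bg   = ginv(B);               /* a generalized inverse of B            */
   chk  = B * Bg * B;            /* equals B, as required of A A- A = A   */
   PRINT Ainv Bg chk;
QUIT;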

Linear Combinations

Let $a_1, a_2, \ldots, a_c$ be r–vectors and let $A = [a_1 : a_2 : \cdots : a_c]$ be the corresponding $r \times c$ matrix.
Let $v = (v_1, v_2, \ldots, v_c)^\top$ be a c–vector and let
$$x = Av = a_1 v_1 + a_2 v_2 + \cdots + a_c v_c = \sum_j a_j v_j$$
Then the r–vector $x$ is said to be a linear combination of $a_1, a_2, \ldots, a_c$.


Let $w = (w_1, w_2, \ldots, w_c)^\top$ be another c–vector and let correspondingly $y = Aw = \sum_j a_j w_j$.
Then the following can be noted:
• For a number $\alpha$ the vector $\alpha x = \alpha(Av) = A(\alpha v)$ is also a linear combination of $a_1, a_2, \ldots, a_c$.
• The sum $x + y = Av + Aw = A(v + w)$ is also a linear combination of $a_1, a_2, \ldots, a_c$.
• Hence if $x$ and $y$ are both linear combinations of $a_1, a_2, \ldots, a_c$ then so is the sum $\alpha x + \beta y$, where $\alpha$ and $\beta$ are numbers.

n–dimensional Spaces

A 2–vector x = (x1, x2) can be regarded as the point with


coordinates (x1, x2) in a 2–dimensional coordinate system, i.e. in the
plane.

Likewise a 3–vector x = (x1, x2, x3) can be regarded as the point


with coordinates (x1, x2, x3) in a 3–dimensional coordinate system,
i.e. in space.

In general an n–vector $x = (x_1, x_2, \ldots, x_n)$ can be regarded as the point with coordinates $(x_1, x_2, \ldots, x_n)$ in an n–dimensional coordinate system, i.e. in an n–dimensional space. Such a space will here be referred to as $R^n$. It is hard to draw!

To justify such n–dimensional spaces, suppose x consists of a


location of an object (that takes 3 coordinates), the temperature of
the object (that occupies one coordinate) and the time (that also
occupies one coordinate). Hence the total information about the
object can be regarded as a point in a 5–dimensional space.

Note that if $x$ and $y$ are both vectors in $R^n$ then so is the sum $\alpha x + \beta y$.


Linear Subspaces

Consider a set a1, a2, . . . , ac of r–vectors.

We can regard these vectors as “building blocks” for creating new


vectors as linear combinations of the building blocks. Any such
vector is an r–vector

The set of vectors which can be created as linear combinations of


the “building blocks” is called a linear subspace of Rr .

Such a space, let us call it L, is said to be spanned by a1, a2, . . . , ac


and we write L = span(a1, a2, . . . , ac).


Example 3. Consider the vectors
$$a_1 = \begin{pmatrix} 2 \\ 6 \\ 4 \end{pmatrix}, \quad a_2 = \begin{pmatrix} 1 \\ 5 \\ 7 \end{pmatrix}$$
Hence span$(a_1, a_2)$ is the set of vectors which can be written as
$$y = \begin{pmatrix} 2 \\ 6 \\ 4 \end{pmatrix} v_1 + \begin{pmatrix} 1 \\ 5 \\ 7 \end{pmatrix} v_2$$
for all possible choices of $v = (v_1, v_2)$. fin

More precisely, L consists of all vectors of the form
$$a_1 v_1 + a_2 v_2 + \cdots + a_c v_c$$
for all possible choices of c–vectors $v = (v_1, \ldots, v_c)$.
It is common to organize the building blocks as a matrix $A = [a_1 : \cdots : a_c]$. Then another way of describing L is as the set of vectors that can be written as $Av$, or more precisely
$$L = \{y \mid y = Av \text{ for all possible vectors } v\}$$
Frequently one uses the name span(A) for L.


There are some additional aspects of subspaces of which a few will


be illustrated:

Example 4. Consider again the subspace L = span(a1, a2) where

a1 = (2, 6, 4)> a2 = (1, 5, 7)>

• A question is whether all vectors y = (y1, y2, y3)> can be written


as y = a1v1 + a2v2?
The answer is “no”, for example y = (1, 5, 3) can not be written
in that form.

• Another question is whether there are other ways of representing


L?
The answer is “yes” – there are infinitely many. To pick one, let
b1 = a1 + a2 and b2 = a1 − a2. Then L = span(b1, b2).

f in

• Note The 0–vector belongs to all linear subspaces. (In the previous example one gets $y = 0$ by choosing $v = (0, 0)$.)


Linear dependence and independence

Linearly dependent vectors: A set of vectors $a_1, \ldots, a_c$ is linearly dependent if one of them can be written as a linear combination of the others, for example if
$$a_c = \sum_{j=1}^{c-1} a_j q_j$$
where the $q_j$s are numbers.
Linearly independent vectors: If none of the vectors $a_1, \ldots, a_c$ can be written as a linear combination of the others, the set is said to be linearly independent.
Throw–out technique: If one vector, say $a_c$, can be written as a linear combination of the other vectors, then it can be thrown away without changing the structure of the space, i.e.
$$\mathrm{span}(a_1, \ldots, a_c) = \mathrm{span}(a_1, \ldots, a_{c-1})$$
This process can go on until one ends up with a set of linearly independent vectors.
This allows us to find a representation of the space which is as simple (economical) as possible.


Example 5. Consider the vectors
$$a_1 = \begin{pmatrix} 2 \\ 6 \\ 4 \end{pmatrix}, \quad a_2 = \begin{pmatrix} 1 \\ 5 \\ 7 \end{pmatrix}, \quad a_3 = \begin{pmatrix} 0 \\ 2 \\ 5 \end{pmatrix} \quad \text{and} \quad x = \begin{pmatrix} 3 \\ 13 \\ 16 \end{pmatrix}$$
1. The vector $x$ is a linear combination of $a_1$, $a_2$ and $a_3$, since $x = a_1 + a_2 + a_3$.
2. Since $a_3 = a_2 - \frac{1}{2} a_1$, the $a_i$–vectors are linearly dependent. Consequently $x$ can be written as a linear combination of only $a_1$ and $a_2$, because $x = \frac{1}{2} a_1 + 2 a_2$.
3. The vectors $a_1, a_2$ are linearly independent, and so are the sets $a_1, a_3$ and $a_2, a_3$.

f in
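The relations in Example 5 can be verified numerically. A minimal PROC IML sketch (assuming SAS/IML is available):

PROC IML;
   a1 = {2, 6, 4};  a2 = {1, 5, 7};  a3 = {0, 2, 5};  x = {3, 13, 16};
   d1 = x  - (a1 + a2 + a3);     /* zero vector: x  = a1 + a2 + a3     */
   d2 = a3 - (a2 - 0.5*a1);      /* zero vector: a3 = a2 - (1/2) a1    */
   d3 = x  - (0.5*a1 + 2*a2);    /* zero vector: x  = (1/2) a1 + 2 a2  */
   PRINT d1 d2 d3;
QUIT;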

Basis of a subspace: If the vectors $a_1, \ldots, a_c$ span a given subspace L and are linearly independent, they are said to be a basis for L. Any linear subspace has infinitely many different bases.
Dimension of a linear subspace: Yet all bases of a linear subspace share a common feature: They have the same number of elements. The number of elements of a basis is the dimension of the subspace.
Throw–away: Given a linearly dependent set of vectors $a_1, \ldots, a_c$ one can always apply the throw–away technique to obtain a linearly independent set of vectors. This set is then a basis for span$(a_1, \ldots, a_c)$.

Example 6. Consider the vectors
$$a_1 = \begin{pmatrix} 2 \\ 6 \\ 4 \end{pmatrix}, \quad a_2 = \begin{pmatrix} 1 \\ 5 \\ 7 \end{pmatrix}, \quad a_3 = \begin{pmatrix} 0 \\ 2 \\ 5 \end{pmatrix}$$
$$b_1 = \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix} \quad \text{and} \quad b_2 = \begin{pmatrix} 2 \\ 8 \\ 9 \end{pmatrix}$$
and the corresponding matrices $A = [a_1 : a_2 : a_3]$, $\tilde{A} = [a_1 : a_2]$ and $B = [b_1 : b_2]$.
1. Since $a_3 = a_2 - \frac{1}{2} a_1$, the $a_i$ vectors are linearly dependent.

f in

• Note Since L = span(A) = span(B) one can think of the


matrices A and B as two different ways of representing the same
linear subspace.


Projections onto Linear Subspaces

Example 7. Consider the vectors $a = (2, 2)$ and $y = (1, 2)$.
Clearly $y$ is not in span$(a)$. In statistics the following question is extremely important: Can we find a vector $\hat{y}$ in span$(a)$ which is as "close to" $y$ as possible?
The answer is "yes": Find the (orthogonal) projection of the point $y$ onto the line going through $a$. There is a simple mathematical expression for obtaining $\hat{y}$, namely
$$\hat{y} = a(a^\top a)^{-1} a^\top y = \begin{pmatrix} 2 \\ 2 \end{pmatrix} \frac{1}{8} (2, 2) \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 3/2 \\ 3/2 \end{pmatrix}$$


The property of ŷ is that the length of y − ŷ is as small as possible.

Moreover, y − ŷ and ŷ are orthogonal. f in

In general let y be an r–vector and let A = [a1 : · · · : ac] be an r × c


matrix.

Then there always exists a vector $\hat{y}$ in span(A) which is as close to $y$ as possible.
If $y$ is in span(A), then $\hat{y} = y$ because in this case the length of $y - \hat{y}$ is zero.

If $y$ is not in span(A) then the expression is as follows: Assume that all columns of A are linearly independent. (Recall that if that is not the case we can throw away redundant columns without changing the space spanned by those remaining.)

Then ŷ = P y where

P = A(A>A)−1A>

is the projection matrix onto span(A).

It then holds that
1. $Py$ is in span(A).
2. $Py$ is the vector in span(A) which is closest to $y$ (in the sense that the length of $y - \hat{y}$ is minimized).
3. $Py = y$ if and only if $y$ is already in span(A).


Example 8. Consider the 3 × 2 matrix $A = [a_1 : a_2]$, where
$$a_1 = \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix} \quad \text{and} \quad a_2 = \begin{pmatrix} 2 \\ 8 \\ 9 \end{pmatrix}$$
Then the projection matrix onto span$(A)$ is $P = A(A^\top A)^{-1}A^\top$. To find $P$ we first calculate
$$A^\top A = \begin{pmatrix} 1 & 3 & 2 \\ 2 & 8 & 9 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 9 \end{pmatrix} = \begin{pmatrix} 14 & 44 \\ 44 & 149 \end{pmatrix}$$
Hence
$$(A^\top A)^{-1} = \frac{1}{150}\begin{pmatrix} 149 & -44 \\ -44 & 14 \end{pmatrix}$$
From this we find
$$(A^\top A)^{-1}A^\top = \frac{1}{150}\begin{pmatrix} 149 & -44 \\ -44 & 14 \end{pmatrix}\begin{pmatrix} 1 & 3 & 2 \\ 2 & 8 & 9 \end{pmatrix} = \frac{1}{150}\begin{pmatrix} 61 & 95 & -98 \\ -16 & -20 & 38 \end{pmatrix}$$
Finally we find
$$P = A(A^\top A)^{-1}A^\top = \frac{1}{150}\begin{pmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 9 \end{pmatrix}\begin{pmatrix} 61 & 95 & -98 \\ -16 & -20 & 38 \end{pmatrix} = \frac{1}{150}\begin{pmatrix} 29 & 55 & -22 \\ 55 & 125 & 10 \\ -22 & 10 & 146 \end{pmatrix}$$
fin

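The hand calculation in Example 8 can be verified with a few lines of PROC IML (a sketch assuming SAS/IML is available; the vector y below is chosen only for illustration):

PROC IML;
   A = {1 2, 3 8, 2 9};
   P = A * inv(A` * A) * A`;     /* projection matrix onto span(A), as in Example 8 */
   y = {1, 2, 3};
   yhat = P * y;                 /* the point in span(A) closest to y               */
   PRINT P yhat;
QUIT;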

Exercises in linear algebra

Exercise 1. 1. Are the vectors (1, 1) and (1, 2) orthogonal?

2. Are (1, 1) and (2, −2) ?

3. Are (1, 1) and (−1, −1) ?

4. Make a drawing which illustrates these vectors

Exercise 2. Let
$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}.$$

1. Is A symmetrical?

2. Is A>A symmetrical?

3. Is AA> symmetrical?

4. What is the result from adding A and A>?

Exercise 3. Let
$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \quad \text{and} \quad B = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}.$$

Calculate AB and BA. What can be concluded from this?

Exercise 4. Let a = (1, 1, 1, 0, 0, 0)> be a 6 × 1 matrix. Find aa>


and a>a.

Exercise 5. Let
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
and
$$B = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$
Calculate AB. What can be concluded from this?

Exercise 6. What is the inverse to the 3 × 3 matrix diag(1, 4, 9)?

Exercise 7. Two equations with two unknowns. Convince yourself


that the system of equations
$$x_1 + 2x_2 = 3$$
$$2x_1 + 3x_2 = 4$$
can be written as
$$\begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 3 \\ 4 \end{pmatrix},$$
i.e. as $Ax = b$. Find $A^{-1}$ and use this for solving the system of equations as follows:
$$x = Ix = A^{-1}Ax = A^{-1}b.$$

Exercise 8. Let
$$A = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}.$$

1. How do vectors of the form Av look when v = (v1, v2)>?

2. Find the projection matrix P = A(A>A)−1A>.

3. Let y = (1, 3, 5, 7)> . Find P y.

4 Linear normal models

Linear normal models serve as a natural starting point for the presentation of Mixed Models
theory. Most researchers within animal science have at least a working knowledge of linear normal
models.
These slides served the purpose of giving an overview of the different concepts, and of linking the
concepts to the underlying statistical theory. Finally, the standard terminology used within
SAS was presented from a theoretical point of view.
Link to the full screen presentation1 .

1
http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/LNM.f.pdf


Introduction

Many well known statistical models used in practice, for example


• linear regression,
• multiple regression,
• analysis of variance,
• analysis of covariance,

can be formulated in the general framework of linear normal models (abbreviated LNM), which undoubtedly is the most important class of models in statistics.


A linear normal model is also sometimes called a


general linear model.

The SAS procedure PROC GLM is designed to deal with the class of
linear normal models.

Any linear normal model can be formulated in matrix form as
$$Y = X\beta + \epsilon$$
where $Y$ is an $n \times 1$ vector of observations, $X$ is an $n \times p$ matrix of covariates, $\beta$ is a $p \times 1$ vector of unknown parameters and $\epsilon$ is an $n \times 1$ vector of unobservable random errors.

Example 1. One–way analysis of variance.
The model
$$Y_{kl} = \alpha_k + \epsilon_{kl}$$
where $\epsilon_{kl} \sim N(0, \sigma^2)$ for $k = 1, 2$ and $l = 1, 2, 3$ can be written in matrix form as
$$\begin{pmatrix} Y_{11} \\ Y_{12} \\ Y_{13} \\ Y_{21} \\ Y_{22} \\ Y_{23} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} + \begin{pmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \epsilon_{13} \\ \epsilon_{21} \\ \epsilon_{22} \\ \epsilon_{23} \end{pmatrix}$$
$$Y = X\beta + \epsilon$$


The vector of expected values $\mu = (\mu_{11}, \mu_{12}, \ldots, \mu_{23})^\top$ is
$$\begin{pmatrix} \mu_{11} \\ \mu_{12} \\ \mu_{13} \\ \mu_{21} \\ \mu_{22} \\ \mu_{23} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_1 \\ \alpha_1 \\ \alpha_2 \\ \alpha_2 \\ \alpha_2 \end{pmatrix}$$
$$\mu = X\beta$$
fin


There are good reasons for dealing with LNMs in general instead of treating regression analysis, analysis of variance etc. separately.
For LNMs in general it is easy to establish how to
• estimate parameters,
• estimate contrasts,
• make significance tests,
• perform model control.
From these general results, it can be deduced how to make the corresponding tests in e.g. regression models and in analysis of variance.


It is also convenient to work with LNMs in matrix terminology,


because any LNM can be formulated generally as

$$y = X\beta + \epsilon$$

Moreover, random effects models (mixed models) are an extension of


linear normal models. I.e. any linear normal model is in a sense also
a mixed model.

Many aspects of mixed models become extremely cumbersome if the


matrix representation is not available.

Example 2. Simple linear regression:
The linear regression model
$$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$
where $\epsilon_i \sim N(0, \sigma^2)$ for $i = 1, \ldots, 6$ can be written in matrix form as
$$\begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \\ Y_5 \\ Y_6 \end{pmatrix} = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \\ 1 & x_4 \\ 1 & x_5 \\ 1 & x_6 \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} + \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \\ \epsilon_5 \\ \epsilon_6 \end{pmatrix}$$
$$Y = X\beta + \epsilon$$


The vector of expected values $\mu = (\mu_1, \mu_2, \ldots, \mu_6)^\top$ is
$$\begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \\ \mu_4 \\ \mu_5 \\ \mu_6 \end{pmatrix} = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \\ 1 & x_4 \\ 1 & x_5 \\ 1 & x_6 \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} = \begin{pmatrix} \beta_0 + \beta_1 x_1 \\ \beta_0 + \beta_1 x_2 \\ \beta_0 + \beta_1 x_3 \\ \beta_0 + \beta_1 x_4 \\ \beta_0 + \beta_1 x_5 \\ \beta_0 + \beta_1 x_6 \end{pmatrix}$$
$$\mu = X\beta$$
fin


Linear Normal Models

A linear normal model (LNM) is defined as follows:

1. The observations y1, . . . , yn come from (are realizations of)


independent random variables Y1, . . . , Yn.

2. Each random variable has a normal distribution
$$Y_i = \mu_i + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2).$$
Hence each $Y_i$ is allowed to have its own mean value, but the variance $\sigma^2$ is the same for all $i = 1, \ldots, n$.

3. To each observation $y_i$ there are covariates (known constants) $x_{i1}, \ldots, x_{ip}$ such that
$$\mu_i = \mu(\beta)_i = x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ip}\beta_p = \sum_{k=1}^p x_{ik}\beta_k.$$
That is, the mean value $\mu_i$ is related to the covariates in a linear way through the parameters $\beta_1, \ldots, \beta_p$.

A practical interpretation of constant variance is that each random


variable Yi has the same tendency to deviate (in a random way)
from its expectation µi.

As has been illustrated, any LNM can be cast in matrix form as
$$Y = X\beta + \epsilon$$
where
Y : is an $n \times 1$ vector of observations,
X : is an $n \times p$ matrix of covariates, whose ith row is $x_{i1}, \ldots, x_{ip}$,
$\beta$ : is a $p \times 1$ vector of unknown parameters, and
$\epsilon$ : is an $n \times 1$ vector of unobservable random errors which are independent and $N(0, \sigma^2)$ distributed.

The matrix X is called the design matrix (or model matrix) because
it contains information about covariates, i.e. about the design of the
study.


Example 3. Polynomial regression:
The polynomial regression model
$$Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \epsilon_i$$
where $\epsilon_i \sim N(0, \sigma^2)$ for $i = 1, \ldots, 6$ can be written in matrix form as
$$\begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \\ Y_5 \\ Y_6 \end{pmatrix} = \begin{pmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ 1 & x_3 & x_3^2 \\ 1 & x_4 & x_4^2 \\ 1 & x_5 & x_5^2 \\ 1 & x_6 & x_6^2 \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix} + \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \\ \epsilon_5 \\ \epsilon_6 \end{pmatrix}$$
$$Y = X\beta + \epsilon$$
fin

Random Vectors and Matrices

A random vector Z = (Z1, . . . , Zn)> is a vector of random variables.

Since we are working with vectors of random variables, it is


convenient to establish the notions of

• expectation vector (or mean vector ) and

• covariance matrix of a vector of random variables.

• Most frequently the interest is in the mean vector.
• Yet, the covariance matrix is of interest when modelling that observations cannot be regarded as coming from independent random variables.
• In fact, one view of mixed models is that mixed models are concerned with modelling the covariance matrix in some structured way.


The mean or expectation of a random vector is the vector of mean values, i.e.
$$E(Z) = \begin{pmatrix} E(Z_1) \\ E(Z_2) \\ \vdots \\ E(Z_n) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix} = \mu$$
For a LNM, we have already seen a use of this, namely through writing
$$\mu = X\beta.$$


The covariance matrix Cov(Z) of a random vector
$$Z = (Z_1, \ldots, Z_n)^\top$$
is the $n \times n$ matrix whose element in the ith row and jth column is the covariance between $Z_i$ and $Z_j$.
Example 4. For example, with $n = 3$ we have
$$\mathrm{Cov}(Z) = \begin{pmatrix} \mathrm{Var}(Z_1) & \mathrm{Cov}(Z_1, Z_2) & \mathrm{Cov}(Z_1, Z_3) \\ \mathrm{Cov}(Z_2, Z_1) & \mathrm{Var}(Z_2) & \mathrm{Cov}(Z_2, Z_3) \\ \mathrm{Cov}(Z_3, Z_1) & \mathrm{Cov}(Z_3, Z_2) & \mathrm{Var}(Z_3) \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_2^2 & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 \end{pmatrix}$$
fin


In general
$$\mathrm{Cov}(Z)_{ij} = \mathrm{Cov}(Z_i, Z_j) = E[(Z_i - \mu_i)(Z_j - \mu_j)].$$
In particular the diagonal elements of Cov(Z) contain the variances,
$$\mathrm{Cov}(Z)_{ii} = \mathrm{Cov}(Z_i, Z_i) = E[(Z_i - \mu_i)^2] = \mathrm{Var}(Z_i).$$
Since $\mathrm{Cov}(Z_i, Z_j) = \mathrm{Cov}(Z_j, Z_i)$, the covariance matrix is symmetric.

Example 5. The error term $\epsilon = (\epsilon_1, \ldots, \epsilon_n)$ from a linear normal model has a very simple covariance matrix:
• $\mathrm{Var}(\epsilon_i) = \sigma^2$ because the variance is the same for all units.
• $\mathrm{Cov}(\epsilon_i, \epsilon_j) = 0$ because $\epsilon_i$ and $\epsilon_j$ are independent.
• Hence
$$\mathrm{Cov}(\epsilon) = \sigma^2 \begin{pmatrix} 1 & 0 & \ldots & 0 \\ 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & 1 \end{pmatrix} = \sigma^2 I_n$$
fin


Functions of Random Vectors

Matrix algebra is useful when dealing with


linear functions of random vectors.

If Z is a random n-vector, A is an r × n matrix and b is an r–vector,


then
U = AZ + b
is also a random vector.


The mean and covariance of linear functions of random vectors are easily calculated using the following:

Result 1.

E(AY + b) = A E(Y ) + b (1)


Cov(AY + b) = Cov(AY ) = A Cov(Y )A> (2)


A particular application of (1) and (2) is the following:
• Let $Z$ be a random vector of length $n$ with mean $E(Z)$ (an n–vector) and covariance matrix Cov(Z) (an $n \times n$ matrix).
• Let $a = (a_1, \ldots, a_n)^\top$ be a vector of numbers and consider the linear combination $U = \sum_i a_i Z_i = a^\top Z$.
• Then (1) and (2) imply that
$$E(U) = E(a^\top Z) = a^\top E(Z)$$
$$\mathrm{Cov}(U) = \mathrm{Cov}(a^\top Z) = a^\top \mathrm{Cov}(Z)\, a$$

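As a small numerical sketch of these formulas (assuming SAS/IML is available; the mean vector, covariance matrix and the vector a below are invented purely for illustration):

PROC IML;
   mu    = {1, 2, 3};                   /* E(Z)                       */
   Sigma = {4 1 0, 1 2 1, 0 1 3};       /* Cov(Z), symmetric          */
   a     = {1, -1, 2};
   EU    = a` * mu;                     /* E(a`Z)   = a` E(Z)         */
   VU    = a` * Sigma * a;              /* Var(a`Z) = a` Cov(Z) a     */
   PRINT EU VU;
QUIT;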
The Multivariate Normal Distribution

So far, we have treated the mean and covariance of a random vector.

We shall now discuss a distribution of a random vector:

Definition 1. It is said that Z follows an n–dimensional


multivariate normal distribution (in short MVN) with mean vector
µ = E(Z) and covariance matrix Σ = Cov(Z), written

Z ∼ Nn(µ, Σ)

if a>Z follows a univariate normal distribution for all possible n-


vectors a.

Without going into detail, we shall just mention that if $\Sigma$ has an inverse, then $Z$ has a density which can be written
$$f(z) = (2\pi)^{-n/2} \det(\Sigma)^{-1/2} \exp\{-\tfrac{1}{2}(z - \mu)^\top \Sigma^{-1}(z - \mu)\}$$

Example 6. For n = 2 the density can be displayed as a bell–shaped surface over the plane (the figure is not reproduced here).

f in


The Distribution of a LNM

For a LNM, the vector of unobservable errors is $\epsilon = (\epsilon_1, \ldots, \epsilon_n)^\top$, where $\epsilon_i \sim N(0, \sigma^2)$ and $\epsilon_1, \ldots, \epsilon_n$ are independent.
Hence we have
$$E(\epsilon) = 0 \quad \text{and} \quad \mathrm{Cov}(\epsilon) = \sigma^2 I$$
Since any linear combination of independent $N(0, \sigma^2)$–variables yields a normal variable, we conclude that
$$\epsilon \sim N_n(0, \sigma^2 I)$$
Hence for the linear normal model $Y = X\beta + \epsilon$ we find that
$$E(Y) = \mu = E(X\beta + \epsilon) = X\beta + E(\epsilon) = X\beta$$
$$\mathrm{Cov}(Y) = \mathrm{Cov}(X\beta + \epsilon) = \mathrm{Cov}(\epsilon) = \sigma^2 I$$
and we can write
$$Y \sim N_n(X\beta, \sigma^2 I).$$


The Expectation in a LNM

Example 7. (Continuation of Example 1).
The one–way analysis of variance model in Example 1 can be formulated in at least three different ways:
1. As $Y_{kl} = \alpha_k + \epsilon_{kl}$, with $\beta = (\alpha_1, \alpha_2)^\top$.
2. As $Y_{kl} = \delta + \gamma_k + \epsilon_{kl}$ where $\gamma_2 = 0$, such that $\gamma_1$ represents the treatment effect. Hence, $\beta_2 = (\delta, \gamma_1)^\top$.
3. As $Y_{kl} = \delta + \rho_k + \epsilon_{kl}$. Thus, $\beta_3 = (\delta, \rho_1, \rho_2)^\top$.


In many ways, the latter formulation is the most natural and conventional, but it poses some problems.
Let
$$X = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{pmatrix} \quad X_2 = \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 0 \\ 1 & 0 \\ 1 & 0 \end{pmatrix} \quad X_3 = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{pmatrix} \qquad (3)$$
Any vector which can be written as $X\beta$ must be of the form $(a, a, a, b, b, b)^\top$ for numbers $a$ and $b$.

But that is also the case for vectors of the form $X_2\beta_2$ and $X_3\beta_3$. From this we conclude that with respect to the mean vector the matrices $X$, $X_2$ and $X_3$ are "all the same".
This leads to
$$\mu = X\beta = X_2\beta_2 = X_3\beta_3.$$
1. $X$ corresponds to writing the model as $Y_{kl} = \alpha_k + \epsilon_{kl}$.
2. $X_2$ corresponds to writing the model as $Y_{kl} = \delta + \gamma_k + \epsilon_{kl}$, with $\gamma_2 = 0$.
3. $X_3$ corresponds to writing the model as $Y_{kl} = \delta + \rho_k + \epsilon_{kl}$.

Consider the mean vector µ = (2, 2, 2, 3, 3, 3)> . The formulation as
µ = X3β3 where β3 = (δ, ρ1, ρ2)> is different from the two others in
an important way:

• Under the representation µ = Xβ, there is only one choice of β


namely β = (2, 3) which yields µ.

• Under the representation µ = X2β2, there is only one choice of β2


namely β2 = (3, −1) which yields µ.

• Under the representation µ = X3β3, there are infinitely many ways


of obtaining µ. Two such are β3 = (1, 1, 2) and β3 = (3, −1, 0).

f in


• Example 7 illustrates that there in general are different


representations of the same model. Corresponding to the different
representations, there are different parameters, with different
interpretations.

• We say that there are different parametrizations of the same model.

• The representation µ = X3β3 is said to be over parametrized –


there are too many parameters in the model.


In many practical situations the models we work with are over


parametrized.

Yet, it does not matter which representation of the model we choose, and it is not really important whether the model is over parametrized, in the following sense:

Any question that can be answered under one representation can


also be answered under another.


To treat these issues in detail, it is necessary to think about what a LNM really says: It says that
$$y = X\beta + \epsilon \quad \text{where} \quad \mu = X\beta.$$
Hence $\beta$ affects the distribution of the observables $y$ only indirectly, namely through $X\beta$.
Therefore, since $y$ is what can be observed, we can only use $y$ for saying "something" about $\beta$ if this "something" can be expressed through $X\beta$.
This observation leads to the important notion of estimability and estimable functions.
The columns of X defines a subspace of Rn which we denote by L,
i.e.
L = span(X).

The statement µ = Xβ simply means that µ can be written as a


linear combination of the column vectors of X, i.e. that µ lies in
span(X).

But as has been illustrated in Example 7, there might be more than


one β vector producing µ.

Hence by saying that µ = Xβ, all one really says is that µ belongs
to L.

Moreover, there are infinitely many different ways of representing L,



because one can always find another matrix, say X2 with


span(X2) = span(X) such that any vector µ = Xβ = X2β2.

Therefore, since the parameter vector β is closely related to the


actual representation of L, and since β might not be uniquely
determined, the value of a parameter vector β is rarely of direct
interest in itself.


Example 8. (Continuation of Example 2)
Let $\bar{x}_. = \frac{1}{n}\sum_i x_i$ denote the average of the $x_i$s. Define new variables $z_i = x_i - \bar{x}_.$ and consider the regression model
$$Y_i = \alpha_0 + \alpha_1 z_i + \epsilon_i.$$
This model corresponds to "centering the $x_i$s around their mean". Not surprisingly, this does not change the fundamental structure of the model – it is still a linear regression model, but with the following new design matrix:
$$\tilde{X} = \begin{pmatrix} 1 & z_1 \\ 1 & z_2 \\ 1 & z_3 \\ 1 & z_4 \\ 1 & z_5 \\ 1 & z_6 \end{pmatrix} = \begin{pmatrix} 1 & x_1 - \bar{x}_. \\ 1 & x_2 - \bar{x}_. \\ 1 & x_3 - \bar{x}_. \\ 1 & x_4 - \bar{x}_. \\ 1 & x_5 - \bar{x}_. \\ 1 & x_6 - \bar{x}_. \end{pmatrix}, \quad \tilde{\beta} = \begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix}$$
fin
Representations of Models in SAS

Here we shall illustrate some of the differences between different


ways of specifying the models in SAS.

The illustration is with PROC MIXED but applies to PROC GLM too.

The model in Example 7 can be analyzed with the SAS program

PROC MIXED;
CLASS TREAT;
MODEL Y = TREAT / SOLUTION;
RUN;

Here TREAT is a variable with levels 1 and 2.



1. First SAS generates the matrix X3.

2. SAS then realizes that the columns of X3 are linearly dependent.

3. SAS therefore proceeds by eliminating columns until a set of linearly independent columns is achieved. This is done in a systematic way: the column corresponding to the highest value of TREAT is removed, which yields X2.

The parameter estimates reported by SAS are therefore (δ, γ1).

Note that it is the option SOLUTION that causes the parameter


estimates to be reported.

October 17, 2001 Mixed Models Course 40

59
4 Linear normal models

The SAS program


PROC MIXED;
CLASS TREAT;
MODEL Y = TREAT / NOINT SOLUTION;
RUN;

on the other hand causes SAS to directly generate X, because the NOINT option specifies that there shall not be a column of 1s in the design matrix. The parameter estimates reported by SAS are therefore (α1, α2).
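With the column for TREAT = 2 removed, the two parameter sets are linked by α1 = δ + γ1 and α2 = δ. As a small illustration of this (and of the ESTIMATE statement treated later), the group means can be recovered from the intercept parametrization. The following is only a sketch, assuming the data set and the CLASS variable TREAT from the program above:

PROC MIXED;
  CLASS TREAT;
  MODEL Y = TREAT / SOLUTION;
  ESTIMATE 'alpha1 = delta + gamma1' INTERCEPT 1 TREAT 1 0;
  ESTIMATE 'alpha2 = delta'          INTERCEPT 1 TREAT 0 1;
RUN;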

October 17, 2001 Mixed Models Course 41

Example 9. Consider the two–way analysis of variance

  Yijk = δ + αi + βj + γij + εijk

where i = 1, 2, j = 1, 2 and k = 1, 2, 3. The mean vector is

  µ = Xβ,   with β = (δ, α1, α2, β1, β2, γ11, γ12, γ21, γ22)>   and

       1 1 0 1 0 1 0 0 0
  X =  1 1 0 0 1 0 1 0 0
       1 0 1 1 0 0 0 1 0
       1 0 1 0 1 0 0 0 1

(where in the design matrix we regard 1 and 0 as vectors of length 3).


October 17, 2001 Mixed Models Course 42

60
This model is highly over parametrized. SAS handles this problem in the way indicated above: A new design matrix giving the same model is created, namely

  µ = X2β2,   where

        1 1 1 1
  X2 =  1 1 0 0      and   β2 = (δ, α1, β1, γ11)>
        1 0 1 0
        1 0 0 0

This corresponds to setting α2 = β2 = γ21 = γ12 = γ22 = 0 beforehand. (That is, every time a parameter contains the level number 2 in its index, it is set to zero.) f in

October 17, 2001 Mixed Models Course 43

This means that SAS solves the problem of an over parametrized model by simply reducing it to a representation which is not over parametrized.

As mentioned previously, this is not a problem, because any question that can be answered under one representation of a model can also be answered under another.

Yet, care should be taken when it comes to interpreting output from


SAS, see Section 18.

October 17, 2001 Mixed Models Course 44

61
4 Linear normal models

Least Squares Estimation in a LNM

In a LNM, the mean µi is a function of the parameter vector β.

One frequently used criterion for estimation is the method of


least squares:

Find the vector µ̂ = (µ̂1, . . . , µ̂n)> which minimizes the sum of squared deviations

  D(β) = Σ_{i=1}^{n} (yi − µi)²

under the restriction that µ̂ = X β̃ for some parameter vector β̃.


October 17, 2001 Mixed Models Course 45

• Such a vector µ̂ always exists and is unique.

• We say that β̃ is a least squares estimate for β. Such an estimate


β̃ also exists, but it is in general not unique.

October 17, 2001 Mixed Models Course 46

62
Example 10. (Continuation of Example 2)

For the regression analysis we find

  D(β) = Σ_{i=1}^{n} (yi − (β0 + β1xi))²

Most standard textbooks on statistics take the following approach to


minimization of D(β):
1) Calculate the derivatives ∂D(β)/∂β0 and ∂D(β)/∂β1,

2) set these equal to zero and

3) solve for β0 and β1.


October 17, 2001 Mixed Models Course 47

This gives

  β̂1 = Σ_i (yi − ȳ.)(xi − x̄.) / Σ_i (xi − x̄.)²
  β̂0 = ȳ. − β̂1 x̄.

f in

October 17, 2001 Mixed Models Course 48

63
4 Linear normal models

Example 11. (Continuation of Example 1) For the one–way analysis


of variance

  D(β) = Σ_{k=1}^{2} Σ_{l=1}^{3} (ykl − αk)²

The values of αk which minimize D(β), where β = (α1, α2)>, are

  αk = (1/3) Σ_{l=1}^{3} ykl = ȳk

The vector µ̂ is in this case (ȳ1, ȳ1, ȳ1, ȳ2, ȳ2, ȳ2)>.

However, if the model is written as Ykl = δ + αk + εkl, i.e. as Y = X3β3 + ε in Example 7, there is no unique least squares estimate
October 17, 2001 Mixed Models Course 49

of β3 = (δ, α1, α2). To see this, just note that

δ = 0, α1 = ȳ1, α2 = ȳ2

and

δ = (ȳ1 + ȳ2)/2, α1 = (ȳ1 − ȳ2)/2, α2 = (−ȳ1 + ȳ2)/2

both result in the same vector µ̂ = (ȳ1, ȳ1, ȳ1, ȳ2, ȳ2, ȳ2)>. f in

October 17, 2001 Mixed Models Course 50

64
Estimation on matrix form

The estimation problem can be formulated very generally in matrix


notation and can be solved generally using projections onto
subspaces:

Using matrix notation the least squares method is:

Find the vector µ̂ = (µ̂1, . . . , µ̂n)> which minimizes

  D(β) = (y − µ)>(y − µ)

under the restriction that µ̂ = X β̃ for some parameter vector β̃.


October 17, 2001 Mixed Models Course 51

Then we have the following results:

1. There always exists a unique vector of expected values µ̂ =


(µ̂1, . . . , µ̂n)> which minimizes D(β).

2. The vector µ̂ is µ̂ = P y, where P is the projection matrix onto span(X).

3. Since µ̂ is in span(X), there exists a vector β̂1 satisfying that


µ̂ = X β̂1. We say that β̂1 is a least squares estimate of β.

4. If the columns of X are linearly independent, there exists only one


vector β̂1 satisfying that µ̂ = X β̂1. In that case the least squares
estimate is unique.
October 17, 2001 Mixed Models Course 52

65
4 Linear normal models

5. If the columns of X are linearly dependent, there exist several least squares estimates, i.e. there is another vector β̂2 with µ̂ = X β̂2 and β̂1 ≠ β̂2.

6. In regression problems, the least squares estimate is typically unique,


whereas in analysis of variance problems, the least squares estimate
is generally not unique.

7. In the case where the least squares estimate is unique, it is given as

  β̂ = (X>X)−1X>y.

It is easy to see why it is so: We know that µ̂ = P y =


X[(X >X)−1X >y]. However, since µ̂ is in span(X), we also
know that µ̂ = X β̂. But both equations can only be true if
β̂ = (X >X)−1X >y.
October 17, 2001 Mixed Models Course 53

The vector e = y − µ̂ is the vector of residuals, reflecting the unobserved error vector ε.

Hence e>e = (y − µ̂)>(y − µ̂) is the residual sum of squares, and if the model fits the data well, e>e should be “small” in some sense.

If there are p linearly independent columns in X, the estimate for the variance σ² is

  σ̂² = e>e/(n − p) = (y − µ̂)>(y − µ̂)/(n − p)

October 17, 2001 Mixed Models Course 54

66
Example 12. (Continuation of Example 7).

With the matrix X as in Example 7, the projection matrix becomes


 
  P = (1/3) ·   1 1 1 0 0 0
                1 1 1 0 0 0
                1 1 1 0 0 0
                0 0 0 1 1 1
                0 0 0 1 1 1
                0 0 0 1 1 1

f in
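As a small numerical illustration, the projection matrix, the fitted values µ̂ = P y and the least squares estimate can be computed directly in PROC IML. The following is only a sketch; the data vector y is invented for the illustration:

proc iml;
  /* design matrix for the one-way layout: two groups with three observations each */
  X = {1 0, 1 0, 1 0, 0 1, 0 1, 0 1};
  P = X * inv(X` * X) * X`;        /* projection onto span(X); equals the matrix above */
  y = {1, 2, 3, 10, 11, 12};       /* hypothetical observations */
  muhat   = P * y;                 /* fitted values: each group mean repeated three times */
  betahat = inv(X` * X) * X` * y;  /* least squares estimate (here the two group means) */
  print P muhat betahat;
quit;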

October 17, 2001 Mixed Models Course 55

The parameter vector β

We shall now assume that the LNM is such that the columns of X are linearly independent, so that the least squares estimate

  β̂ = (X>X)−1X>y

of β is unique.

Letting A = (X>X)−1X>, we note that A is a p × n matrix and see that β̂ = Ay.

October 17, 2001 Mixed Models Course 56

67
4 Linear normal models

Thinking in terms of random variables, the data y is a realization of


a random vector Y with E(Y ) = Xβ and Cov(Y ) = σ 2I. Then

β̂(Y ) = (X >X)−1X >Y = AY

is also a random vector because β̂(Y ) is a function of the random


vector Y .

If the elements of A are denoted aij, we see that the ith component of β̂ is β̂i = Σ_{j=1}^{n} aij yj.

Hence each component β̂i of the vector β̂ is a linear function of the data y. Therefore it is not surprising that the corresponding random variables β̂i(Y ) are dependent in some way.

October 17, 2001 Mixed Models Course 57

Using the relations (1) and (2) we find that

E(β̂(Y )) = AE(Y ) = (X >X)−1X >E(Y )


= (X >X)−1X >Xβ = β (4)

Equation (4) says that the expected value of the least squares
estimator β̂ is simply the true but unknown value β.

October 17, 2001 Mixed Models Course 58

68
Cov(β̂(Y )) = A Cov(Y )A> = σ 2AIA> = σ 2AA>
= σ 2(X >X)−1X >[(X >X)−1X >]>
= σ 2(X >X)−1X >X(X >X)−1
= σ 2(X >X)−1 (5)

Equation (5) says that the covariance of the least squares estimator
β̂ is proportional to the residual variance σ 2. Moreover, the matrix
(X >X)−1 does not depend on the data y but only on the design
matrix X, i.e. on how the study at hand was conducted.

October 17, 2001 Mixed Models Course 59

Recall that on the diagonal of a covariance matrix one finds the variances. Hence, knowing (X>X)−1 and an estimate for σ², we also know the variance estimates for the β̂i.

October 17, 2001 Mixed Models Course 60

69
4 Linear normal models

Example 13. (Continuation of Example 2) Suppose xi = i and


zi = i − 3.5 in the regression example for i = 1, . . . , 6.
Regression of y on x with the program
PROC GLM ;
MODEL y = x / inv;
RUN; QUIT;

gives the result

October 17, 2001 Mixed Models Course 61

The GLM Procedure


X’X Inverse Matrix
Intercept x y
Intercept 0.8666666667 -0.2 -1.286578758
x -0.2 0.0571428571 0.4835938022
y -1.286578758 0.4835938022 3.225955579

Dependent Variable: y Sum of


Source DF Squares Mean Square F Value Pr > F

Model 1 4.09260190 4.09260190 5.07 0.0874


Error 4 3.22595558 0.80648889
Corrected Total 5 7.31855748

Standard
Parameter Estimate Error t Value Pr > |t|
Intercept -1.286578758 0.83603651 -1.54 0.1987
x 0.483593802 0.21467436 2.25 0.0874

October 17, 2001 Mixed Models Course 62

70
The first two diagonal elements of (X>X)−1 times the variance estimate σ̂² (i.e. the Mean Square Error) give variance estimates of the regression parameters.

The square roots of these estimates are the standard errors reported.

Moreover, the off-diagonal element of (X>X)−1 is −0.2, so the estimated covariance between the intercept and the slope is −0.2 σ̂², and these estimates are correlated.
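For instance, using the output above:

  Var̂(β̂0) = 0.8667 · 0.8065 ≈ 0.699,   √0.699 ≈ 0.836
  Var̂(β̂1) = 0.0571 · 0.8065 ≈ 0.046,   √0.046 ≈ 0.215

which agree with the reported standard errors 0.836 and 0.215.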

October 17, 2001 Mixed Models Course 63

Regression of y on z with the program


PROC GLM ;
MODEL y = z / inv;
RUN; QUIT;

gives the result

October 17, 2001 Mixed Models Course 64

71
4 Linear normal models

The GLM Procedure


X’X Inverse Matrix
Intercept z y
Intercept 0.1666666667 0 0.4059995498
z 0 0.0571428571 0.4835938022
y 0.4059995498 0.4835938022 3.225955579

The GLM Procedure


Dependent Variable: y Sum of
Source DF Squares Mean Square F Value Pr > F
Model 1 4.09260190 4.09260190 5.07 0.0874
Error 4 3.22595558 0.80648889
Corrected Total 5 7.31855748

Standard
Parameter Estimate Error t Value Pr > |t|
Intercept 0.4059995498 0.36662626 1.11 0.3302
z 0.4835938022 0.21467436 2.25 0.0874

October 17, 2001 Mixed Models Course 65

In this case we see that centering the x values around their average
(3.5) gives parameter estimates which are uncorrelated. Moreover,
the estimate of the slope (and the associated standard error) is the
same as before. f in

October 17, 2001 Mixed Models Course 66

72
Example 14. (Continuation of Example 2)

With
 
X the 6 × 2 matrix with ith row (1, xi) and β = (β0, β1)>,

we find (when letting n = 6) that

  X>X =  [  n        Σ_i xi  ]
         [  Σ_i xi   Σ_i xi² ]

October 17, 2001 Mixed Models Course 67

Recall that

  A =  [ a  b ]     implies that     A−1 = 1/(ad − bc) · [  d  −b ]
       [ c  d ]                                          [ −c   a ]

(provided that ad − bc ≠ 0). Using this gives

  (X>X)−1 = 1/( n Σ_i xi² − (Σ_i xi)² ) · [  Σ_i xi²   −Σ_i xi ]
                                          [ −Σ_i xi       n    ]

Letting K = n Σ_i xi² − (Σ_i xi)², the variance of the estimator β̂0 for the intercept is

  Var(β̂0) = σ² Σ_i xi² / K
October 17, 2001 Mixed Models Course 68

73
4 Linear normal models

and the variance of the estimator β̂1 for the slope is

  Var(β̂1) = σ² n / K

The estimators β̂0 and β̂1 are correlated, since

  Cov(β̂0, β̂1) = −σ² Σ_i xi / K

f in
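With xi = i for i = 1, . . . , 6, as in Example 13, these formulas give

  Σ_i xi = 21,   Σ_i xi² = 91,   K = 6 · 91 − 21² = 105

  Var(β̂0) = 91 σ²/105 ≈ 0.867 σ²,   Var(β̂1) = 6 σ²/105 ≈ 0.0571 σ²,   Cov(β̂0, β̂1) = −21 σ²/105 = −0.2 σ²

which are exactly the elements of the X’X Inverse Matrix (apart from the factor σ²) reported by PROC GLM in Example 13.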

October 17, 2001 Mixed Models Course 69

Example 15. (Continuation of Example 8)


Since Σ_i (xi − x̄.) = 0 (verify this!), we find that

  X̃>X̃ =  [ n     0       ]  =  [ n     0               ]
          [ 0     Σ_i zi² ]     [ 0     Σ_i (xi − x̄.)²  ]

Since the inverse of a diagonal matrix is also diagonal, we conclude


that the estimators α̂0 and α̂1 are independent. f in

October 17, 2001 Mixed Models Course 70

74
The estimator β̂ has a p–dimensional multivariate normal
distribution (in short MVN), with mean vector β and covariance
matrix σ 2(X >X)−1.

This is written
β̂ ∼ Np(β, σ 2(X >X)−1).

This means that any linear combination λ>β̂ has a univariate normal
distribution
λ>β̂ ∼ N (λ>β, σ 2λ>(X >X)−1λ) (6)
and that is a very important result for practical statistics.

October 17, 2001 Mixed Models Course 71

Estimability and Contrasts

In a LNM with mean vector µ = Xβ one is typically interested in


making statements about (some of) the components of the
parameter vector β.

However, with µ = Xβ we only have indirect knowledge about β, because all we know is that µi = Σ_j xij βj and, as has been illustrated, β is in general not uniquely determined. That is, there can be another vector β2 such that µ = Xβ = Xβ2.

Hence there are some constraints on what can actually be said about
β.

October 17, 2001 Mixed Models Course 72

75
4 Linear normal models

In the one–way analysis of variance of Example 1 one might be


interested in the difference α1 − α2 or in α1 itself and there is no
problem in that. For later purposes it can be noted that

α1 − α2 = (1, −1)(α1, α2)> = (1, −1)β


α1 = (1, 0)(α1, α2)> = (1, 0)β

October 17, 2001 Mixed Models Course 73

Example 16. Consider the two–way analysis of variance

  Yij = δ + αi + βj + εij

where µ = Xβ with

       1 1 0 1 0
  X =  1 1 0 0 1       and   β = (δ, α1, α2, β1, β2)>
       1 0 1 1 0
       1 0 1 0 1

It is clear that this model is grossly over parametrized (why?)

Under this model we can estimate quantities like

  α1 − α2,    δ + α1,    δ + α1 + ½(β1 + β2)
October 17, 2001 Mixed Models Course 74

76
Note that

  α1 − α2 = (0, 1, −1, 0, 0)β,
  δ + α1 + ½(β1 + β2) = (1, 1, 0, ½, ½)β

However, other things like

  α1 = (0, 1, 0, 0, 0)β    or    β1 = (0, 0, 0, 1, 0)β

cannot be estimated under this model.

f in

October 17, 2001 Mixed Models Course 75

In a sense, the only thing uniquely determined in a LNM is µ.

Therefore the only thing one can truly say something about is linear combinations of µ, i.e. linear combinations of the form

a> µ

for some n–vector a.

Most frequently interest is in contrasts of the form λ>β.

Therefore, a natural question is how

a>µ and λ>β

relate to each other?


October 17, 2001 Mixed Models Course 76

77
4 Linear normal models

Since µ = Xβ, we can only say something about β if one can


express it as
a>Xβ.
Note that a>X is a 1 × p vector.

Therefore, we can say something about the contrast λ>β only if one
can find an n–vector a such that

a> X = λ >

If there exists such a vector a, the contrast λ>β is said to be


estimable.

In this case the contrast can be written

λ>β = a>Xβ = a>µ


October 17, 2001 Mixed Models Course 77

After having estimated µ by µ̂, the contrast λ>β is estimated by

  λ>β̂ = a>X β̂ = a>µ̂.

Recall from the section on estimation that there might in general be


many least squares estimates for β. However, the following holds:

Result 2. The least squares estimate of λ>β is unique if and only


if λ>β is estimable.

In other words,

The only thing one can say something about in an unambiguous


way is estimable functions.

October 17, 2001 Mixed Models Course 78

78
From the general result

λ>β̂ ∼ N (λ>β, σ 2λ>(X >X)−1λ) (7)

we know the distribution of the contrast λ>β̂ and hence testing for the contrast being zero is straightforward.

Note that transposing a>X = λ> gives X>a = λ.

Hence the condition for estimability is that λ can be written as a linear combination of the columns of X>, i.e. as a linear combination of the rows of X.

This amounts to solving a set of linear equations – and computers


can do that!

October 17, 2001 Mixed Models Course 79

Example 17. (Continuation of Example 16)

We wish to verify that

  δ + α1 + ½(β1 + β2) = (1, 1, 0, ½, ½)β

is indeed estimable.

That is, we seek a vector a = (a1, a2, a3, a4)> such that

  a>X = (1, 1, 0, ½, ½).

October 17, 2001 Mixed Models Course 80

79
4 Linear normal models

Direct multiplication gives

  a1 + a2 + a3 + a4 = 1
  a1 + a2 = 1
  a3 + a4 = 0
  a1 + a3 = ½
  a2 + a4 = ½

It is not hard to spot that the solution to these equations is

  a1 = a2 = ½   and   a3 = a4 = 0.

f in
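The same check can be left to the computer. The following PROC IML program is only a sketch (it is not part of the original example); it tests whether λ> lies in the row space of X by checking whether λ>ginv(X)X reproduces λ> – if it does, the contrast is estimable and λ>ginv(X) is one choice of the coefficient vector a>:

proc iml;
  /* design matrix of Example 16 */
  X = {1 1 0 1 0,
       1 1 0 0 1,
       1 0 1 1 0,
       1 0 1 0 1};
  lambda = {1 1 0 0.5 0.5};     /* the contrast delta + alpha1 + (beta1 + beta2)/2 */
  a = lambda * ginv(X);         /* candidate coefficient vector a` */
  check = a * X;                /* equals lambda exactly when the contrast is estimable */
  print a check;
quit;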

October 17, 2001 Mixed Models Course 81

Estimability in SAS

In checking whether a specific contrast is estimable, it is


recommended to use PROC GLM.

The following SAS program deals with data from Example 16


proc glm data=a;
class i j;
model y = i j/E;
lsmeans i j /E;
run;

October 17, 2001 Mixed Models Course 82

80
The output caused by the E–option in the MODEL statement is
General Form of Estimable Functions
Effect Coefficients
1 Intercept L1
2 i 1 L2
3 i 2 L1-L2
4 j 1 L4
5 j 2 L1-L4

Recall that β = (δ, α1, α2, β1, β2). The numbers 1,2,3,4,5 identify
the entry of the λ–vector, λ = (λ1, λ2, . . . , λ5), and the Ls specify
the constraints to be satisfied by the λis.

It reads as follows: λ1 can be set to any value L1, and λ2 can be set
to any value L2. But then λ3 is constrained to be equal to L1 − L2.
Likewise, λ4 can be set to any value L4, but then λ5 is constrained

October 17, 2001 Mixed Models Course 83

to be equal to L1 − L4.

From this we see how to specify some contrasts:

  λ = (1, 1, 0, 1, 0) :      λ>β = δ + α1 + β1
  λ = (1, 1, 0, ½, ½) :      λ>β = δ + α1 + ½(β1 + β2)
  λ = (0, 1, −1, 0, 0) :     λ>β = α1 − α2

But we can also see that the contrast δ + ½(α1 + α2) is not estimable: Taking λ1 = 1 and λ2 = λ3 = ½ would give the desired result, but setting λ4 = 0 implies that λ5 = 1, so it is not possible.

The contrasts specified above are constructed as follows in PROC GLM (and in PROC MIXED). Note that we have indicated two ways of
October 17, 2001 Mixed Models Course 84

81
4 Linear normal models

constructing the last contrast.


title ’Estimation of contrasts’;
proc glm data=a;
class i j;
model y = i j /E;
estimate ’Lambda 1’ intercept 1 i 1 0 j 1 0 / E;
estimate ’Lambda 2’ intercept 1 i 1 0 j .5 .5 / E;
estimate ’Lambda 3’ intercept 0 i 1 -1 j 0 0 / E;
estimate ’Lambda 3’ intercept 0 i 1 -1 / E;
run; quit;

October 17, 2001 Mixed Models Course 85

Least Squares Means

The LSMEANS statement in GLM is an attempt to generate meaningful


estimates automatically, sometimes (but not always) with success.
These are denoted least squares means and can be constructed as
title ’Least squares means’;
proc glm data=a;
class i j;
model y = i j ;
lsmeans i j / E stderr;
run; quit;

The output caused by the E–option in the LSMEANS statement is

October 17, 2001 Mixed Models Course 86

82
Least Squares Means
Coefficients for i Least Square Means i Level

Effect 1 2
1 Intercept 1 1
2 i 1 1 0
3 i 2 0 1
4 j 1 0.5 0.5
5 j 2 0.5 0.5

Coefficients for j Least Square Means j Level


Effect 1 2
1 Intercept 1 1
2 i 1 0.5 0.5
3 i 2 0.5 0.5
4 j 1 1 0
5 j 2 0 1

October 17, 2001 Mixed Models Course 87

The interpretation of the columns to the right is exactly as before:

The vector λ = (1, 1, 0, 0.5, 0.5)> gives

  λ>β = δ + α1 + ½(β1 + β2).

From this we see that the LSMEAN for i = 1 is δ + α1 plus the “average effect” of the factor j, i.e. ½(β1 + β2).

October 17, 2001 Mixed Models Course 88

83
4 Linear normal models

Hypothesis Testing

Example 18. The two–way analysis of variance model

  Yij = δ + αi + βj + εij ,    i = 1, 2, j = 1, 2

will in the following be referred to as the large model.

Data is assumed to be in accordance with the large model.

Suppose we are interested in testing whether βj = 0.

October 17, 2001 Mixed Models Course 89

The mean µij of Yij is δ + αi + βj and the mean vector has the form

        µ11        1 1 0 1 0
  µ =   µ12   =    1 1 0 0 1    β  =  Xβ,    where β = (δ, α1, α2, β1, β2)>
        µ21        1 0 1 1 0
        µ22        1 0 1 0 1

Testing βj = 0 corresponds to testing whether the reduced model

  Yij = δ + αi + εij

is in accordance with data.

October 17, 2001 Mixed Models Course 90

84
Under the reduced model, the mean µij of Yij is δ + αi and the mean vector has the form

        µ11        1 1 0
  µ =   µ12   =    1 1 0    β0  =  X0β0,    where β0 = (δ, α1, α2)>
        µ21        1 0 1
        µ22        1 0 1

Hence testing the hypothesis βj = 0 corresponds to testing whether µ = X0β0 when we “know” that µ = Xβ. f in

October 17, 2001 Mixed Models Course 91

Note that any vector µ that can be written as µ = X0β0 can also be
written as µ = Xβ – simply by setting the last two elements of β to
zero.

More generally, any vector in span(X0) is also in span(X), but not


vice versa.

(Recall that span(X0) is the set of vectors that can be written as a


linear combination of the columns of X0.)

Let P and P0 be the projection matrices corresponding to X and

October 17, 2001 Mixed Models Course 92

85
4 Linear normal models

X0. The least squares estimates of µ are

  µ̂ = P y      under the large model
  µ̂0 = P0 y    under the reduced model

How do we judge whether the reduced model is feasible?

The answer lies in the “distance” between the observations and the expected values.

The vector of residuals

e = y − µ̂ = y − P y = (I − P )y
October 17, 2001 Mixed Models Course 93

reflects random deviations from the mean under the large model (in which we “believe”).

Therefore the length of e (and hence the squared length e>e) is expected to be “small” in some sense.

October 17, 2001 Mixed Models Course 94

86
If the reduced model is true then e0 = (I − P0)y is also the vector of
residuals, and the length of the vector should also be small.

On the other hand if the reduced model is not true, then e0 is not
just residuals, because it contains some of the variation due to the
factor βj .

In this case the length of the residual vector is expected to be large.

Consider the difference between the residuals

  D = e0 − e = (y − P0y) − (y − P y) = P y − P0y = (P − P0)y

If the reduced model is true, then this difference is just a difference between residual vectors, and the length of D is expected to be small.

October 17, 2001 Mixed Models Course 95

If we let d and d0 denote the number of independent columns in X


and X0, one can show the following

Result 3.

  E( D>D / (d − d0) ) = (1/(d − d0)) E(D>D) = σ² + k

or equivalently

  E(D>D) = (d − d0)(σ² + k) = (d − d0)σ² + (d − d0)k,

where k ≥ 0 and k = 0 when the reduced model is true.

If σ 2 had been known the result above would be very useful:

If D>D is “much larger” than (d − d0)σ², this would indicate that


October 17, 2001 Mixed Models Course 96

87
4 Linear normal models

k > 0 which in turn causes us to doubt the feasibility of the reduced


model.

October 17, 2001 Mixed Models Course 97

There are two problems in this connection:

1. σ 2 is not known, and

2. what does “much larger” mean...

Yet, in Linear Normal Models there is a simple solution to these two problems, now to be outlined:

October 17, 2001 Mixed Models Course 98

88
Problem 1: σ 2 is not known

Under the large model, the variance estimate is

σ̂ 2 = e>e/(n − d),

i.e. the residual sum of squares divided by the residual degrees of


freedom.

It is well known that E(σ̂ 2) = σ 2, so it is reasonable to assume that


σ̂ 2 ≈ σ 2.

Therefore, if the reduced model is true (and hence k = 0), the ratio

  F = [ D>D/(d − d0) ] / [ e>e/(n − d) ] ≈ 1.
October 17, 2001 Mixed Models Course 99

That takes, to some extent, “care of” the problem that σ 2 is


unknown.

Problem 2: what does “much larger” mean... :

If the reduced model is not true, then the ratio F would tend to be larger than 1. The problem remaining is to define what is meant by “large”. One can show the following:

Result 4. If the reduced model is true then F has an Fd−d0,n−d–


distribution.

Here d − d0 is the number of parameters removed from the model


(i.e. the additional residual degrees of freedom gained by going from
the large to the reduced model), and n − d is the residual degrees of
October 17, 2001 Mixed Models Course 100

89
4 Linear normal models

freedom under the large model.

If the reduced model is not true, then F has an expected value larger
than 1.

Therefore, if F is larger than a pre–specified quantile in the


Fd−d0,n−d–distribution one would doubt the feasibility of the model
reduction, i.e. reject the hypothesis.

October 17, 2001 Mixed Models Course 101

Calculating things in Practice

Consider again the difference between the residuals

  D = e0 − e = (y − P0y) − (y − P y) = P y − P0y = (P − P0)y.

There is an easy way to calculate D >D in practice:

Result 5.

  D>D = e0>e0 − e>e = RSS0 − RSS

where RSS and RSS0 denote the residual (or error) sums of squares under the large and the reduced model respectively.
October 17, 2001 Mixed Models Course 102

90
Tests in LNMs in short form

• Consider a LNM Y ∼ Nn(µ, σ 2I). Hence Y =D µ + e, where


e ∼ Nn(0, σ 2I).

• Consider the models for the mean value

  M : µ ∈ L = C(X)        M0 : µ ∈ L0 = C(X0)        L0 ⊂ L

where M is assumed to hold true, and let M and M0 denote the


corresponding projections of dimension d and d0.

• Under M, M Y = M µ + M e = µ + M e.
October 17, 2001 Mixed Models Course 103

• If M0 is true, then

(M − M0)Y = M µ + M e − M0µ − M0e = (M − M0)e

is only “random noise”. In this case (M − M0)Y is expected to be


small.

• Clearly, M − M0 is the projection onto L ∩ L0⊥.

• Hence

  ||(M − M0)Y ||² / (d − d0) = Y>(M − M0)Y / r(M − M0)

is a measure of how close M0Y is to M Y in relation to the difference in dimensionality of the models.
October 17, 2001 Mixed Models Course 104

91
4 Linear normal models

• We use the results that

  E(Y>AY ) = tr(A Var(Y )) + E(Y )>A E(Y )
  tr(M ) = d,    tr(M − M0) = d − d0

• Assuming only M,

  E( Y>(M − M0)Y / r(M − M0) ) = σ²/(d − d0) · tr(M − M0) + β>X>(M − M0)Xβ / (d − d0)
                                = σ² + β>X>(M − M0)Xβ / (d − d0)
                                = σ² + ||v||²

• If M0 is true, then ||v||² = 0.


October 17, 2001 Mixed Models Course 105

• If we use MSE = Y>(I − M)Y / (n − d) = σ̃² as an estimate for σ², then under M0,

  F = [ Y>(M − M0)Y / (d − d0) ] / [ Y>(I − M)Y / (n − d) ] ≈ 1

• It is clear that numerator and denominator are independent:

  [ (I − M)Y   ]        [ (I − M)µ   ]        [ I − M      0      ]
  [ (M − M0)Y ]  ∼  N(  [ (M − M0)µ  ] ;  σ²  [ 0          M − M0 ] )

• Under M0,

  (1/σ²) Y>(M − M0)Y ∼ χ²( d − d0, β>X>(M − M0)Xβ ),

i.e. a non–central χ² distribution.
October 17, 2001 Mixed Models Course 106

92
• Hence large values of F cause doubt in M0.

October 17, 2001 Mixed Models Course 107

Hypothesis Testing in SAS

In practice SAS performs all relevant calculations (and,


unfortunately, a few more).

Degrees of freedom: A comment regarding the degrees of


freedom reported by SAS is appropriate:

Default in SAS is that all observations are centered around their


average.

This centering “costs” one degree of freedom and therefore SAS


reports the Corrected Total which is n − 1, where n is the
number of observations.

October 17, 2001 Mixed Models Course 108

93
4 Linear normal models

In the large model in Example 18 there are three parameters,


(δ, α1, β1)

Because of the centering of the data, SAS does not regard δ as a


parameter when it comes to reporting degrees of freedom. So the
real number of parameters is the number SAS reports plus 1. Hence
d = 2 + 1 while d0 = 1 + 1.

(Note: If the NOINT option is specified, the model degrees of


freedom become correct.)

In practice it is not a problem whether data are centered or not, because we are mainly interested in differences between the numbers of parameters, i.e. differences in degrees of freedom.

October 17, 2001 Mixed Models Course 109

Example 19. (Continuation of Example 18) Below we find the


output from fitting the large and the reduced model in PROC GLM.
Dependent Variable: y Large model
Sum of
Source DF Squares Mean Square F Value Pr > F

Model 2 3.76999467 1.88499734 2.70 0.3954


Error 1 0.69877998 0.69877998
Corrected Total 3 4.46877465

Source DF Type III SS Mean Square F Value Pr > F


i 1 0.73276693 0.73276693 1.05 0.4924
j 1 3.03722775 3.03722775 4.35 0.2847

Dependent Variable: y Reduced model


Sum of
Source DF Squares Mean Square F Value Pr > F

Model 1 0.73276693 0.73276693 0.39 0.5951


Error 2 3.73600773 1.86800386
Corrected Total 3 4.46877465

October 17, 2001 Mixed Models Course 110

94
In the notation from before

  D>D = RSS0 − RSS = 3.73600773 − 0.69877998 = 3.037
  e>e = RSS = 0.699
  d − d0 = 3 − 2 = 2 − 1 = 1
  n − d = 4 − 3 = 3 − 2 = 1

The F-statistic therefore becomes

  F = (3.037/1) / (0.699/1) = 4.35

This is the statistic reported in the Type III SS–section of the


output. So in most (but not all) cases, SAS does the work for us.
f in
October 17, 2001 Mixed Models Course 111

Example 20. The two–way analysis of variance with interactions

  Yijk = δ + αi + βj + γij + εijk ,    i = 1, 2; j = 1, 2; k = 1, 2, 3

has mean

        µ11        1 1 0 1 0 1 0 0 0
  µ =   µ12   =    1 1 0 0 1 0 1 0 0    β  =  Xβ,    β = (δ, α1, α2, β1, β2, γ11, γ12, γ21, γ22)>
        µ21        1 0 1 1 0 0 0 1 0
        µ22        1 0 1 0 1 0 0 0 1
October 17, 2001 Mixed Models Course 112

95
4 Linear normal models

Here we regard µij , 1 and 0 as vectors of length 3 such that µ


contains 12 elements.

In this form, the model is overparametrized so SAS works with an


equivalent representation, namely

  
  µ = X2β2     (8)

where

        1 1 1 1
  X2 =  1 1 0 0      and    β2 = (δ, α1, β1, γ11)>
        1 0 1 0
        1 0 0 0

f in
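In SAS this model is specified directly with an interaction term; the following is only a sketch, assuming a data set a with variables y, i and j as in the earlier examples:

proc glm data=a;
  class i j;
  model y = i j i*j / solution;
run; quit;

With the SOLUTION option the reported (non-unique) solution corresponds to the representation (8), where the parameters whose index contains level 2 are set to zero.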

October 17, 2001 Mixed Models Course 113

96
5 Some Basic Statistical Concepts

This lecture presented/refreshed basic statistical concepts, such as the central limit theorem, principles of estimation, the likelihood principle and tests of hypotheses.
Link to the full screen presentation1

1
http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/StatTheory.f.pdf

97
5 Some Basic Statistical Concepts

Data and Models

The starting point for a statistical analysis is a set of observations

y = (y1, . . . , yn)

resulting from an experiment (or perhaps an observational study)


conducted in order to gain insight in a specific area.

We shall in general use the term experiment even though the setting
may not be that of a controlled experiment.

October 18, 2001 Mixed Models Course 1

Some Characteristics:

A fundamental characteristic of the experiment is that the outcome


is stochastic rather than deterministic.

Hence, if the experiment is repeated again under similar conditions


the new result would not necessarily be y.
Because of the random/stochastic variation in data, it is natural to
consider models based on probability theory, because this is the
branch of mathematics dealing with random variation. In this
setting, the starting point is the set of possible outcomes

Y = (Y1, . . . , Yn)

of the experiment.

October 18, 2001 Mixed Models Course 2

98
Here Yi could be for example
• the set of all real numbers,
• the set of positive real numbers,
• the set {diseased, not diseased}, or
• the set {low, medium, high}.

The link between the observed value yi and the set of possible values
Yi is established through the notion of a random variable Yi.

A random variable Yi is a function whose values can be in the set


Yi, and the observed value yi is said to be a realization of the
random variable Yi.

October 18, 2001 Mixed Models Course 3

The random variable Yi is a function, but not a deterministic


function such as e.g.
f (x) = x2 + 7.

It is a random function whose outcome on one hand is uncertain but


on the other hand typically governed by some rules. Those rules are
best formulated in terms of a probability distribution.

Example 1. : Binomial Experiment Any animal can be infected


with a specific disease, i.e. it can be diseased or not–diseased.

For the ith animal in the population the state of disease is denoted by
Yi and Yi can therefore take one of the values {diseased, not diseased}
(for brevity written simply as {1, 0}).

f in
October 18, 2001 Mixed Models Course 4

99
5 Some Basic Statistical Concepts

Example 2. : Binomial Experiment If the possible outcomes of Yi are {diseased, not diseased} (for brevity written simply as {1, 0}), the random variable Yi can be either 1 or 0. A statistical model for Yi is obtained by specifying the probability distribution for Yi, for instance

  p(Y = y) = θ^y (1 − θ)^(1−y)

where 0 ≤ θ ≤ 1. f in

Example 3. : Samples from the normal distribution If Yi has a


normal distribution, e.g. Yi ∼ N (θ, 1) the set of possible outcomes
Yi is the real line. f in

October 18, 2001 Mixed Models Course 5

In both examples, the function Yi is specified through a


probability distribution.

The distribution depends on an (unknown) parameter θ. (In the


examples, θ is a single number but more generally the parameter is a
vector θ = (θ1, . . . , θp).)

October 18, 2001 Mixed Models Course 6

100
In statistical terms, one speaks of a parametrical statistical model:

1. It is a statistical model, because the outcome of Yi is described in


terms of a probability distribution.

2. It is a parametrical model because once the parameter θ is known


the distribution is known.

October 18, 2001 Mixed Models Course 7

Why the Normal Distribution is so “Normal”

The most frequently employed distribution is the normal distribution.


Many (but certainly not all) random phenomena encountered in
practice exhibit a certain regularity:

1. Observations have a tendency to be clustered around a “mean


value”.

2. Deviations from the “mean value” are often symmetric.

3. The histogram of observations can be well approximated with the


bell–shaped normal (or Gaussian) distribution

October 18, 2001 Mixed Models Course 8

101
5 Some Basic Statistical Concepts

[Figure: histogram of z.mean (relative frequency), showing a bell–shaped distribution.]

The bell-shaped curve is written

  f(y; µ, σ²) = ( 1/(√(2π) σ) ) exp( −(y − µ)²/(2σ²) )

Why does this bell–shaped curve fit quite well to many


phenomenons encountered in practice??

October 18, 2001 Mixed Models Course 9

The Central Limit Theorem

Parts of the answer is given by the Central Limit Theorem:

Let Z1, . . . , Zn be independent random variables with E(Zi) = µi and Var(Zi) = σi².

Let Y = Σ_{i=1}^{n} Zi.

Then E(Y ) = µ = Σ_i µi and Var(Y ) = σ² = Σ_i σi².

What about the distribution of Y ?

October 18, 2001 Mixed Models Course 10

102
Result 1. The Central Limit Theorem says that

Y ∼approx N (µ, σ 2).

The approximation becomes better as n → ∞.

(Note: We have not made any assumption about the distribution of the Zis – it has only been assumed that they are independent.)

Many things encountered in nature can be regarded as the sum of many small (independent) contributions. That is one explanation why the normal distribution is so “normal”.

October 18, 2001 Mixed Models Course 11

Example 4. Let Zi be uniformly distributed on [0, 1], i.e. all values


in the [0, 1]–interval are “equally likely” for i = 1, . . . , 4.
How does the distribution of Z̄ = (1/n) Σ_{i=1}^{n} Zi look?

Quite normal, actually !


[Figure: histograms of z1 and z2 (uniform), a histogram of z.mean, and a normal Q–Q plot of z.mean.]

f in
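A figure like the one above can be reproduced by simulation. The SAS program below is only a sketch; the data set and variable names are invented for the illustration:

data clt;
  do rep = 1 to 1000;
    /* mean of four independent uniform(0,1) variables */
    z_mean = mean(ranuni(0), ranuni(0), ranuni(0), ranuni(0));
    output;
  end;
run;

proc univariate data=clt noprint;
  histogram z_mean / normal;   /* histogram of the simulated means with a fitted normal curve */
run;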

October 18, 2001 Mixed Models Course 12

103
5 Some Basic Statistical Concepts

Some General Principles of Estimation

After establishing a statistical model a problem is to estimate the


value of the parameter θ. To find this estimate we need to make
some assumptions.

In what follows, a very fundamental assumption will be made:

There exists a true (but unknown) value of θ.

If θ had been known, then the distribution of Yi would be known


too. That is we would know the characteristics of the mechanism
which generated the data y.
October 18, 2001 Mixed Models Course 13

A consequence of this is that the important task is to obtain a good


estimate of θ. Some examples of doing so are given in the following.

October 18, 2001 Mixed Models Course 14

104
Example 5. (Continuation of Example 2) Consider the experiment
of tossing a “pin” n times, giving data y = (y1, . . . , yn). Hence the
possible outcomes are Yi = {up, down} which we write {1, 0}.

It is assumed that
P (Yi = 1) = θ
for all i, such that the probability of observing “pin up” (!) is the
same every time. If we observe that the pin points upwards all together y+ = Σ_i yi times, then it takes only very little creativity to
suggest that the relative frequency

y+/n

is a sensible estimate for θ. f in

October 18, 2001 Mixed Models Course 15

Example 6. : Linear regression Consider the case where a known number xi is associated with each outcome yi of the experiment, and where it is suspected that there might be an approximately linear relationship between xi and yi.

This can lead to the linear regression model

Yi ∼ N (θi, σ 2) where θi = θ0 + θ1xi

This model is fundamentally different from the model in Example 2:


In Example 2, each observation was assumed to have the same
distribution. In the present model, this is not the case as the mean
for each random variable Yi is allowed to depend on the value of xi.

October 18, 2001 Mixed Models Course 16

105
5 Some Basic Statistical Concepts

It is well known from any standard textbook on statistics that the


parameters θ = (θ0, θ1) can be estimated by minimizing the squared
distance between the observed and the expected values, i.e. by
minimizing the function
  D(θ0, θ1) = Σ_i ( yi − (θ0 + θ1xi) )²

f in

October 18, 2001 Mixed Models Course 17

Example 7. (Continuation of Example 3) Suppose we conduct an


experiment where each observation yi is a realization of Yi ∼ N (θ, 1).
Then it takes very little imagination to suggest that the average

  z1 = (1/n) Σ_{i=1}^{n} yi

is a sensible estimate for θ. f in

October 18, 2001 Mixed Models Course 18

106
In the examples above it is easy to suggest ways of estimating the
unknown parameters. These can be described as:

Example 5: Estimation by the relative frequency.

Example 6: Estimation by minimizing the squared distance.

Example 7: Estimation by the average.

However, it is clear that there is a need for:

• General principles for obtaining those estimates.

• Some notion for how “good” an estimate is.


October 18, 2001 Mixed Models Course 19

In the following we present and discuss some of these principles


briefly.

The exposition is by no means intended to be comprehensive or very precise.

The aim is solely to illustrate some of the considerations made in


connection with estimation of unknown parameters on the basis of
data.

Eventually the exposition leads to the method of maximum


likelihood.

October 18, 2001 Mixed Models Course 20

107
5 Some Basic Statistical Concepts

Method of Moments

One approach is to base the estimation on the moments, i.e. the expectation, variance etc. of random variables.

Recall that the first moment of a random variable X is E(X) and the second central moment of X is E(X − E(X))² = Var(X).

For Example 3 with Yi ∼ N(θ, 1) we define a new random variable, say Z1, as the average of the Yis. Then it is well known that

  Z1 = (1/n) Σ_{i=1}^{n} Yi ∼ N(θ, 1/n)

October 18, 2001 Mixed Models Course 21

The estimate z1 = (1/n) Σ_{i=1}^{n} yi can then be regarded as a realization of the random variable Z1 which has mean E(Z1) = θ.

It is important to keep in mind that Z1 is a function of Y1, . . . , Yn, which can be emphasized by writing Z1(Y ). Likewise, z1 is a function of the observed data, which is emphasized by writing z1(y).

We say that

• the random variable Z1(Y ) is an estimator, and

• a specific value of Z1(y) is an estimate.

October 18, 2001 Mixed Models Course 22

108
The method of moments is to consider θ̂(y) = z1(y) as a good estimate of θ, because the corresponding random variable Z1(Y ) has θ as its expectation:

  E(Z1(Y )) = θ     (1)

October 18, 2001 Mixed Models Course 23

How good is an estimator?

An estimator with the property (1) is said to be unbiased.

Unbiasedness seems to be a desirable property of an estimator.

However, there are many estimators with the property (1). Two additional ones are

• the average Z2(Y ) = (Y1 + Y2)/2 of the first two random variables, and

• Z3(Y ) = Y1, i.e. the first random variable itself.


October 18, 2001 Mixed Models Course 24

109
5 Some Basic Statistical Concepts

Yet, intuition indicates that z1 is a “better” estimate of θ than


z2 = (y1 + y2)/2 which in turn is “better” than z3 = y1.

To be precise about what is meant by “better” we consider the


variance of the estimators:

V ar(Z1(Y )) = 1/n
V ar(Z2(Y )) = 1/2
V ar(Z3(Y )) = 1
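The first of these follows from the rule for the variance of a sum of independent variables:

  Var(Z1(Y )) = Var( (1/n) Σ_i Yi ) = (1/n²) Σ_i Var(Yi) = (1/n²) · n · 1 = 1/n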

October 18, 2001 Mixed Models Course 25

Hence (with more than 2 observations), we have

V ar(Z1) < V ar(Z2) < V ar(Z3),

and on the basis of this it is clear that we will consider Z1 to be a


better estimate of θ than Z2 or Z3.

Note: Because estimates are realizations of random variables (their corresponding estimators), it is “a must” always to report a variance, a standard deviation or a related quantity whenever reporting the value of an estimate.

October 18, 2001 Mixed Models Course 26

110
Someone might suggest to estimate θ by Z4(Y ) = Z1(Y ) + 7.

In terms of considering estimators with small variance as being


“good”, one can argue that Z4 is just as good as Z1, because
V ar(Z4) = V ar(Z1).

However, E(Z4) = θ + 7 ≠ θ, so Z4 is not an unbiased estimator of θ.

These considerations suggest that good estimators should be


unbiased and have as small variance as possible.

October 18, 2001 Mixed Models Course 27

These two criteria lead to the theory of Minimum Variance Unbiased Estimation – sometimes written briefly as MVUE. It is not surprising that Z1 is a MVUE (Minimum Variance Unbiased Estimator).

In general, establishing MVUEs can be a complicated task: Finding estimators that are unbiased may not be too hard, but finding one with the smallest possible variance may be very complicated.

October 18, 2001 Mixed Models Course 28

111
5 Some Basic Statistical Concepts

Consistency of Estimators

The estimator Z1 has other nice properties compared with Z2, Z3


and Z4.
When the number of observations n tends to infinity, the variance of Z1 tends to 0. The practical implication of this is straightforward: Z1 becomes indistinguishable from its expectation θ. An estimator with this property is said to be consistent.

Consistency is an attractive feature of an estimator, because it means that the estimate of θ gets better and better the more data we collect.

It is clear that none of Z2, Z3 and Z4 is consistent.

October 18, 2001 Mixed Models Course 29

Desirable Properties of Estimators

From the discussion above we have found that

• Unbiasedness,

• Smallest possible variance, and

• Consistency

are three attractive properties of estimators.

October 18, 2001 Mixed Models Course 30

112
Estimators, whatever kind they are, are functions of the random
variables Y1, . . . , Yn from which data y1, . . . , yn are realizations.
Hence estimators are random variables and as such they have a
distribution. This distribution is needed when drawing inference
about a parameter, e.g. when making a test or constructing a
confidence interval.

Therefore a fourth desirable property of an estimator is that

• The distribution of the estimator is known.

October 18, 2001 Mixed Models Course 31

The Method of Maximum Likelihood

There is a general estimation method called maximum likelihood


estimation to be discussed in the following.

An estimator obtained from this method does not in general have the attractive properties mentioned above – but almost. That is, when the sample size goes to infinity (in a sufficiently well behaved way), then the properties hold.

We say that the estimator is asymptotically unbiased, asymptotically has the smallest possible variance, is consistent and, finally, the distribution of the estimator is asymptotically normal.
October 18, 2001 Mixed Models Course 32

113
5 Some Basic Statistical Concepts

These four properties of maximum likelihood estimators indicate why this is such a powerful method.

Moreover, it turns out that the estimation can be carried out by maximizing a particular function, called the likelihood function.

Maximization of such a function can in practice be complicated, but is in principle not much different from what we all learned in high school: Calculate the derivative, set it equal to zero and solve!

October 18, 2001 Mixed Models Course 33

Example 8. : Binomial Experiment

Consider n throws with a pin where θ = P r(“Falls with pin up”).


Hence the outcome of the ith toss can be {Up, Down}, written briefly as {1, 0}, and

  p(yi; θ) = P(Yi = yi; θ) = θ^yi (1 − θ)^(1−yi)

October 18, 2001 Mixed Models Course 34

114
Suppose the observed data are y = {1, 1, 0, 1, 0, 1, 0, . . . , 0, 0}.

If the outcomes of the tosses are independent, then the probability of


observing y is

p(y; θ) = p(y1; θ)p(y2; θ) . . . p(yn; θ)


= p(1)p(1)p(0)p(1)p(0)p(1)p(0) . . . p(0)p(0)
= θθ(1 − θ)θ(1 − θ)θ(1 − θ) . . . (1 − θ)(1 − θ)
  = θ^y+ (1 − θ)^(n−y+)     (2)

where n is the number of times the pin is thrown and y+ = Σ_i yi is the number of times the pin points up.

f in

October 18, 2001 Mixed Models Course 35

The Likelihood function

When data y is observed, p(y; θ) can be regarded as a function of


θ. This function is called the likelihood function and is denoted by
L(θ).

Hence in the example,

  L(θ) = θ^y+ (1 − θ)^(n−y+).

To be specific, let the pin be thrown n = 25 times, and suppose that


pin up is observed y+ = 10 times. Then we have

  L(θ; y) = θ^10 (1 − θ)^(25−10)


October 18, 2001 Mixed Models Course 36

115
5 Some Basic Statistical Concepts

Figure 1 shows a plot of L(θ) against θ for n = 25 and y+ = 10.

[Figure 1: Likelihood function for n = 25 and y+ = 10.]
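A plot like Figure 1 can be produced by evaluating L(θ) on a grid of θ values. The SAS program below is only a sketch; the data set and variable names are invented:

data lik;
  n = 25; yplus = 10;
  do theta = 0.01 to 0.99 by 0.01;
    L    = theta**yplus * (1 - theta)**(n - yplus);           /* likelihood     */
    logL = yplus*log(theta) + (n - yplus)*log(1 - theta);     /* log-likelihood */
    output;
  end;
run;

proc gplot data=lik;
  plot L*theta logL*theta;
run; quit;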

October 18, 2001 Mixed Models Course 37

116
The Maximum likelihood principle

The principle in maximum likelihood estimation is that

the estimate of θ is the value of θ which maximizes the likelihood


function.

One can think of θ̂ as the value of θ which maximizes the probability


of observing the data which one actually has observed.

October 18, 2001 Mixed Models Course 38

• This value is called the maximum likelihood estimate (MLE) and is often denoted by θ̂.

• The corresponding estimator is called the maximum likelihood estimator.

For clarity one should write θ̂(y) for the estimate and θ̂(Y ) for the corresponding estimator, but this is too cumbersome to do. So, except for special cases, we simply write θ̂ for both entities and then derive from the context whether it is an estimate (a number) or an estimator (the corresponding random variable).

Figure 1 suggests that 0.4 is the maximum likelihood estimate.

October 18, 2001 Mixed Models Course 39

117
5 Some Basic Statistical Concepts

It is often easier to maximize the log-likelihood function often


denoted by l(θ):

l(θ) = log L(θ) = y+ log θ + (n − y+) log(1 − θ)

Since log is a monotone function the value of θ maximizing l(θ) will


also maximize L(θ).

October 18, 2001 Mixed Models Course 40

Figure 2 shows a plot of l(θ) against θ for n = 25 and y+ = 10.


[Figure 2: Log–likelihood function for n = 25 and y+ = 10.]

October 18, 2001 Mixed Models Course 41

118
Maximization of

l(θ) = y+ log θ + (n − y+) log(1 − θ)

is obtained by solving the equation

  S(θ) = l′(θ) = 0,

where l′(θ) denotes the derivative of l(θ).

• The function S(θ) is called the score function.

• The equation S(θ) = 0 is called the likelihood equation.

We find that

  S(θ) = l′(θ) = y+/θ − (n − y+)/(1 − θ) = 0
October 18, 2001 Mixed Models Course 42

119
5 Some Basic Statistical Concepts

which happens if and only if

  θ̂ = y+/n

Hence, the maximum likelihood estimate is just the relative frequency. The corresponding maximum likelihood estimator is

  θ̂(Y+) = Y+/n.

Hence when y+ (= 10) is observed, the observed value of the maximum likelihood estimator (i.e. the maximum likelihood estimate) becomes θ̂(y+) = θ̂(10) = 0.4 – in accordance with Figure 1 and Figure 2.

October 18, 2001 Mixed Models Course 43

How Good is the Estimate?

When y+ = 10 and n = 25 we have θ̂ = 0.4, but the same value is


found if y+ = 2 and n = 5.

However, intuition suggests that with 25 observations we should


have more confidence that θ̂ is a good estimate than with only 5
observations. That is, we would expect that the variance of the
estimator is smaller with 25 observations than with only 5.

It is well known for binomial experiments that V ar(Y+) = nθ(1 − θ)


and hence that V ar(θ̂) = θ(1 − θ)/n which indeed confirms the
intuition.

October 18, 2001 Mixed Models Course 44

120
In Figure 3 the likelihood function is shown for (n = 5, y+ = 2), (n = 10, y+ = 4), (n = 25, y+ = 10) and (n = 50, y+ = 20).

[Figure 3: Likelihood functions for (n = 5, y+ = 2), (n = 10, y+ = 4), (n = 25, y+ = 10) and (n = 50, y+ = 20).]
October 18, 2001 Mixed Models Course 45

It is clear from those graphs that the more observations, the more “peaked” the likelihood function is and the higher its curvature is at its maximum.

That is, the value of L(θ̂) is more and more distinct from the value of L(θ) for θ ≠ θ̂ when more and more observations are made.

It is therefore not surprising that there is a connection (indeed it turns out to be a close connection) between the variance of the maximum likelihood estimator and the curvature of the likelihood function at its maximum.

This connection is presented in the next sections.

October 18, 2001 Mixed Models Course 46

121
5 Some Basic Statistical Concepts

The Asymptotic Normal Distribution of the MLE

In this section we present a very important result:

The maximum likelihood estimator is asymptotically normally


distributed.

This property of the MLE is central to much practical statistical


inference.

October 18, 2001 Mixed Models Course 47

Example 9. Frequently one is interested in making statements about θ on the basis of the experiment. For example one might be interested in whether one can reasonably assume that the true value of θ is 0.5.

The key to answering this question is the random variable θ̂(Y ). Put in a popular way, one has to investigate whether 0.5 is a “likely” outcome of θ̂(Y ). To answer that question, one needs to know the distribution of θ̂(Y ) – and this distribution is in general very complicated to find. f in

October 18, 2001 Mixed Models Course 48

122
Therefore one frequently resorts to an approximate result, on which so much resides in statistics:

When n → ∞ and certain conditions are satisfied, then it holds approximately that

  θ̂ ∼ N( θ, −1/l″(θ̂) )

That is, the distribution of θ̂(Y ) will asymptotically be like a normal distribution with the true (but unknown) parameter θ as expectation and variance −1/l″(θ̂).

October 18, 2001 Mixed Models Course 49

Example 10. For the binomial experiment, it is not hard to see why the MLE is asymptotically normal:

We can regard y+ as a sum of independent random variables yi, where yi = 1 corresponds to pin up and yi = 0 is “pin not up”. Hence the Central Limit Theorem gives that y+ is approximately normally distributed, and hence so is θ̂ = y+/n.

For a single experiment we know that E(yi) = θ and Var(yi) = θ(1 − θ). From this we find that

  E(θ̂) = θ,    Var(θ̂) = θ(1 − θ)/n

so approximately,

  θ̂ ∼ N( θ, θ(1 − θ)/n )

f in

October 18, 2001 Mixed Models Course 50

123
5 Some Basic Statistical Concepts

Example 11. In general, the answer is not so straightforward. We therefore outline the “standard” calculations which one goes through in this connection:

The expression for the variance is obtained as follows: Recall that the log-likelihood and score functions are given by

  l(θ) = x log θ + (n − x) log(1 − θ)

  S(θ) = l′(θ) = x/θ − (n − x)/(1 − θ)

Differentiating the score function and changing sign gives

  −l″(θ) = x/θ² + (n − x)/(1 − θ)²
October 18, 2001 Mixed Models Course 51

In practice θ is unknown. However, it can be justified to plug the estimate θ̂ = x/n into l″(θ), and this gives

  −l″(θ̂) = n / ( θ̂(1 − θ̂) )

Hence, asymptotically,

  θ̂ ∼ N( θ, θ̂(1 − θ̂)/n )

October 18, 2001 Mixed Models Course 52

124
With n = 25, x = 10 we get θ̂ = 0.4 and Var(θ̂) ≈ 0.0096. Hence, an (approximate) 95% confidence interval for θ is

  ( θ̂ − 1.96·√Var(θ̂) ; θ̂ + 1.96·√Var(θ̂) ) = (0.4 − 0.19 ; 0.4 + 0.19) = (0.21 ; 0.59)

f in
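The same interval can be computed in a small SAS data step; this is only a sketch, with the numbers of the example hard-coded:

data ci;
  n = 25; yplus = 10;
  thetahat = yplus / n;
  se = sqrt( thetahat*(1 - thetahat)/n );
  lower = thetahat - 1.96*se;
  upper = thetahat + 1.96*se;
  put thetahat= se= lower= upper=;   /* writes the estimate and the 95% limits to the log */
run;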

October 18, 2001 Mixed Models Course 53

Asymptotic normality of transformations of the MLE

If h is a function of θ, then the distribution of h(θ̂) will, asymptotically, look like a normal distribution with mean h(θ) and a variance which can be estimated by −h′(θ̂)²/l″(θ̂), i.e. asymptotically

  h(θ̂) ∼ N( h(θ), −h′(θ̂)²/l″(θ̂) )

October 18, 2001 Mixed Models Course 54

125
5 Some Basic Statistical Concepts

Example 12. For example, if we are more comfortable with interpreting the odds η = h(θ) = θ/(1 − θ), we find h′(θ) = 1/(1 − θ)². Hence, asymptotically,

  η̂ ∼ N( θ/(1 − θ), θ̂/(n(1 − θ̂)³) ) ≈ N( θ/(1 − θ), 0.074 ).

f in

October 18, 2001 Mixed Models Course 55

Tests of Hypotheses

The final point to touch upon concerns tests of hypotheses regarding


θ.

Suppose interest is in testing whether θ is equal to a specific fixed


value θ0.

The likelihood ratio test

The maximum likelihood estimate θ̂ is the value of θ which gives the


observed data the highest probability which is L(θ̂).

If the value θ0 assigns nearly the same probability L(θ0) as θ̂ does,


we would be tempted to accept the hypothesis that θ = θ0.
October 18, 2001 Mixed Models Course 56

126
In other words, it is tempting to consider the likelihood ratio test statistic Q defined by

  Q = L(θ0) / L(θ̂)

Clearly Q is a number between 0 and 1, and values close to 1 are in favor of the hypothesis.

It can be shown that if the hypothesis is true then

  −2 log Q = 2( l(θ̂) − l(θ0) )

has (when n is large) approximately a χ² distribution with 1 degree of freedom. Large values of −2 log Q lead to rejection of the hypothesis. In Figure 4 it can be seen that −2 log Q is twice the
October 18, 2001 Mixed Models Course 57

vertical distance between the value of l at θ̂ and at θ0.

[Figure 4: Illustration of the likelihood ratio test, the score test and the Wald test: the log-likelihood curve l(θ), the vertical distance l(θ̂) − l(θ0), and the slope l′(θ) at θ0, with θ̂ and θ0 marked on the θ-axis.]

The Score Test

A test statistic equivalent to −2 log Q is obtained by considering the slope of l at the point θ0. It is known that the slope of l at θ̂ is 0
October 18, 2001 Mixed Models Course 58

127
5 Some Basic Statistical Concepts

(l′(θ̂) = 0 by definition of the MLE). Hence values of l′(θ0) near 0 will also speak in favor of the hypothesis.

It can be shown that when n is large and the hypothesis is true, the distribution of the so-called score test statistic

  S = −l′(θ0)² / l″(θ0)

will also look like a χ² distribution with 1 degree of freedom.

Hence when n is large the likelihood ratio test and the score test are
equivalent.

The Wald Test

A third test is the Wald test which compares the values of θ̂ and θ0
October 18, 2001 Mixed Models Course 59

directly, corresponding to the horizontal distance in Figure 4.

It can be shown that when n is large and the hypothesis is true, the distribution of the Wald test statistic

  W = (θ̂ − θ0)² · ( −l″(θ̂) )

will also look like a χ² distribution with 1 degree of freedom.

Note that W is simply the square of the difference (θ̂ − θ0) divided by its estimated variance −1/l″(θ̂). In the literature, one frequently uses the term “Wald test” about the square root of W, which yields a test statistic with approximately a N(0, 1) distribution.

October 18, 2001 Mixed Models Course 60

128
Hence when n is large the likelihood ratio test, the score test and
the Wald test are equivalent.
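As a worked illustration (with the numbers from the pin example), consider testing θ0 = 0.5 with n = 25 and y+ = 10, so that θ̂ = 0.4:

  −2 log Q = 2( l(0.4) − l(0.5) ) = 2( −16.82 − (−17.33) ) ≈ 1.01
  S = −l′(0.5)²/l″(0.5) = (−10)²/100 = 1.00
  W = (0.4 − 0.5)² · 25/(0.4 · 0.6) ≈ 1.04

The three statistics are close to each other and all well below 3.84, the 95% quantile of the χ² distribution with 1 degree of freedom, so the hypothesis θ = 0.5 is not rejected.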

October 18, 2001 Mixed Models Course 61

How to get the asymptotic normality

This section is somewhat theoretical.

Consider the following general setup: Let X be a single random variable. The expectation and variance of X are

  µ = E(X) = ∫ x p(x; θ) dx

  Var(X) = ∫ (x − µ)² p(x; θ) dx.

Since X is a random variable, then so is the score function S(θ; X) = l′(θ; X).
October 18, 2001 Mixed Models Course 62

129
5 Some Basic Statistical Concepts

For later purposes we need the mean and the variance of the score
function.

To obtain these quantities, we use the following facts:

  S(θ) = l′(θ; x) = ( log p(x; θ) )′ = p′(x; θ)/p(x; θ)

  S′(θ) = l″(θ; x) = −p′(x; θ)²/p(x; θ)² + p″(x; θ)/p(x; θ)

  ∫ p(x; θ) dx = 1

The function S′(θ) is called the Hessian (matrix) and is very important in connection with PROC MIXED.
October 18, 2001 Mixed Models Course 63

Moreover, in most cases of practical interest, the order of


differentiation and integration can be interchanged. Hence

  ∫ (d/dθ) p(x; θ) dx = (d/dθ) ∫ p(x; θ) dx = (d/dθ) 1 = 0

Mean of the score function: We shall suppress the dependence on X in the following. We find that

  E(S(θ)) = E(l′(θ)) = ∫ l′(θ) p(x; θ) dx = ∫ p′(x; θ) dx

Interchanging the order of differentiation and integration yields

  E(S(θ)) = ∫ (d/dθ) p(x; θ) dx = (d/dθ) ∫ p(x; θ) dx = (d/dθ) 1 = 0
October 18, 2001 Mixed Models Course 64

130
So the expected value of the score function is zero.

Variance of the score function. The variance of the score function
has a special name, the Fisher information, and is usually
denoted by I(θ). Hence we have

I(θ) = Var(S(θ)) = E(S(θ)²)
     = E([l′(θ)]²)
     = ∫ l′(θ)² p(x; θ) dx = ∫ (1/p(x; θ)) p′(x; θ)² dx

because the expected value is zero.

A more convenient expression for the variance can be found in


October 18, 2001 Mixed Models Course 65

terms of the derivative of the score function:

E(S′(θ)) = E(l″(θ))
         = ∫ [ −(1/p(x; θ)²) (p′(x; θ))² + (1/p(x; θ)) p″(x; θ) ] p(x; θ) dx
         = ∫ [ −(1/p(x; θ)) (p′(x; θ))² + p″(x; θ) ] dx

Interchanging the order of differentiation and integration as before
gives that ∫ p″(x; θ) dx = 0. Hence

E(S′(θ)) = −∫ (1/p(x; θ)) (p′(x; θ))² dx = −Var(S(θ; X)).
October 18, 2001 Mixed Models Course 66

131
5 Some Basic Statistical Concepts

Hence we have for a single observation

E(S(θ)) = 0
I(θ) = Var(S(θ)) = E(S(θ)²) = −E(S′(θ))          (3)
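As a quick check of (3) in a familiar case (a simple illustration, not part of the original example material): for a single observation X ∼ N (θ, σ²) with σ² known,

S(θ) = l′(θ; X) = (X − θ)/σ²,     S′(θ) = −1/σ²

so E(S(θ)) = 0 and I(θ) = Var((X − θ)/σ²) = σ²/σ⁴ = 1/σ² = −E(S′(θ)), in agreement with (3).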

The likelihood for all data

From (2) it is seen that the likelihood for all data is the product of
the likelihood for each observation, i.e.
L(θ; y) = p(y1; θ) · · · p(yn; θ) = ∏i p(yi; θ),

Consequently, the log–likelihood, the score function and the


derivative of the score function for all data are sums of independent
October 18, 2001 Mixed Models Course 67

components:

l(θ) = Σi l(θ; yi) = Σi li(θ)
S(θ) = l′(θ; y) = Σi l′(θ; yi) = Σi S(θ; yi) = Σi Si(θ),
S′(θ) = Σi Si′(θ),          (4)

For a single observation we have

E(Si(θ)) = 0
I(θ) = Var(Si(θ)) = E(Si(θ)²) = −E(Si′(θ))
October 18, 2001 Mixed Models Course 68

132
and correspondingly for all observations

E(S(θ)) = 0
V ar(S(θ)) = nI(θ).

We then need three small results:

Result 1: Since S′(θ; y) = Σi Si′(θ), it is reasonable to assume (using
the law of large numbers) that

(1/n) S′(θ) = (1/n) Σi Si′(θ) ≈ E(Si′(θ)) = −I(θ)

Result 2: S(θ) = Σi Si(θ) is a sum of independent random
variables with E(Si(θ)) = 0 and Var(Si(θ)) = I(θ). Hence by
October 18, 2001 Mixed Models Course 69

the central limit theorem, approximately

S(θ) ∼ N (0, nI(θ))

Result 3: Let θ0 be the true (but unknown to us) value of the


parameter θ. Let us assume that θ̂ is a good estimate, i.e. close to
θ0. Then

0 = S(θ̂) ≈ S(θ0) + S′(θ0)(θ̂ − θ0)

That is

(1/√n) S(θ0) ≈ −(1/n) S′(θ0) · √n(θ̂ − θ0)
             ≈ I(θ0) · √n(θ̂ − θ0)
October 18, 2001 Mixed Models Course 70

133
5 Some Basic Statistical Concepts

The left hand side is approximately N (0, I(θ)) distributed.

Hence, approximately, (1/(√n I(θ0))) S(θ0) ∼ N (0, I(θ)−1). That is,
approximately,

√n(θ̂ − θ0) ∼ N (0, I(θ)−1)

or

θ̂ ∼ N (θ0, (nI(θ))−1),

as desired.

October 18, 2001 Mixed Models Course 71

Likelihood and Linear Normal Models

For a linear normal model maximum likelihood estimation is the


same as least squares estimation. The unknown parameters are β
and σ 2, so let θ = (β, σ 2).

October 18, 2001 Mixed Models Course 72

134
Because the observations are independent, the likelihood becomes

L(θ) = f (y1, . . . , yn; θ)
     = ∏i=1..n f (yi; θ)
     = ∏i=1..n (1/√(2π)) (1/√(σ²)) exp(−(1/(2σ²)) (yi − µi)²)
     = (1/√(2π))ⁿ (1/√(σ²))ⁿ exp(−(1/(2σ²)) Σi (yi − µi)²)
     = (1/√(2π))ⁿ (1/√(σ²))ⁿ exp(−(1/(2σ²)) (y − Xβ)>(y − Xβ))

For the moment, suppose σ is known.


October 18, 2001 Mixed Models Course 73

Maximizing L(θ) = L(β, σ²) is done by minimizing

Σi (yi − µi)² = (y − Xβ)>(y − Xβ). But this is exactly what is
done in least squares estimation.

October 18, 2001 Mixed Models Course 74

135
5 Some Basic Statistical Concepts

Once β has been estimated, it can be verified that the maximum


likelihood estimate for σ is

σ̂² = (1/n) (y − µ̂)>(y − µ̂)

In practice, one never uses this variance estimate. Instead one uses

σ̃² = (1/(n − p)) (y − µ̂)>(y − µ̂)

where p is the number of parameters in β.


October 18, 2001 Mixed Models Course 75

The reason for using the latter estimate is that

E(σ̂²) = ((n − p)/n) σ²
E(σ̃²) = σ²

Hence the latter estimate is unbiased while the former is not.

October 18, 2001 Mixed Models Course 76

136
6 An overview

The purpose of this lecture was to illustrate, how the problems of the research within the
biological sciences is related to the progress within statistical theory both in general, and related
to mixed models.
Starting out with an experiment reported from Darwin, the lecture discussed the state of the art
of experimental design and analysis at Darwin’s time, proceeded with the progress in statistical
theory, very much related to animal breeding, and ended up with the general theory of mixed
models. Important researchers such as F. Galton, R.A. Fisher, S. Wright, C.R.Henderson were
presented.
The slides are in Danish. Link to the full screen presentation1

1
http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/oversigt.f.pdf

137
6 An overview

Outline
• Baggrund for metoder
• Historisk forløb
• Relation til vores fagområder

February 7, 2001

Darwins Majs

• C. Darwin (1876) The effects of cross- and self-fertilisation in the


vegetable Kingdom. John Murray, London.

• c.f. Fisher, R.A. (1935) Design of Experiments. Oliver and Boyd.

February 7, 2001

138
3

Darwins Majs
column I II III
Crossed Self. -fert
Pot I 23 48 17 38
12 20 38
21 20
Pot II 22 20
19 18 18 38
21 48 18 58
Pot III 22 18 18 58
20 38 15 28
18 28 16 48
21 58 18
23 28 16 28
Pot IV 21 18
22 18 12 68
23 15 48
12 18

February 7, 2001

Darwins Majs
column I II III
Crossed Self. -fert
Pot I 23.50 17.38
12.00 20.38
21.00 20.00
Pot II 22.00 20.00
19.13 18.38
21.50 18.63
Pot III 22.13 18.63
20.38 15.25
18.25 16.50
21.63 18.00
23.25 16.25
Pot IV 21.00 18.00
22.13 12.75
23.00 15.50
12.00 18.00

February 7, 2001

139
6 An overview

Darwins Majs
” As only a moderate number of crossed and self-fertilised plants
were measured, it was of great importance to learn, how far the
averages were trustworthy. I therefore asked Mr Galton, who has
much experience in statistical researches, to examine some of my
tables..... I may premise that if we took by chance a dozen score of
men belonging to different nations and measured them, it would I
presume, be very rash to form any judgment from such small
numbers on their average heights. But the case is somewhat
different with my crossed and self-fertilised plants, as they were of
exactly the same age, were subjected from first to last to the same
conditions, and were descended from the same parents”

February 7, 2001

Galtons tilgang
column I II III Sorteret Diff.
Crossed Self. -fert Crossed Self. -fert
Pot I 23.50 17.38 23.50 20.38 3.125
12.00 20.38 23.25 20.00 3.250
21.00 20.00 23.00 20.00 3.000
Pot II 22.00 20.00 22.13 18.63 3.500
19.13 18.38 22.13 18.63 3.500
21.50 18.63 22.00 18.38 3.625
Pot III 22.13 18.63 21.63 18.00 3.625
20.38 15.25 21.50 18.00 3.500
18.25 16.50 21.00 18.00 3.000
21.63 18.00 21.00 17.38 3.625
23.25 16.25 20.38 16.50 3.875
Pot IV 21.00 18.00 19.13 16.25 2.875
22.13 12.75 18.25 15.50 2.750
23.00 15.50 12.00 15.25 -3.250
12.00 18.00 12.00 12.75 -0.750

February 7, 2001

140
7

Galtons Tilgang

• Sortering

• Differencer

• Spredning (Most probable error) – men ikke t-test

February 7, 2001

Hvem var Galton

Anthropologi, Meteorologi, populationsgenetik, Eugenics


(arvehygiejne), fingeraftryk, Korrelation.

Meget interesseret i målemetoder, objektiv kvantificering af


fænomener.

K. Pearson’s Guru

February 7, 2001

141
6 An overview

Korrekt tilgang ?
column I II III Diff.
Crossed Self. -fert
Pot I 23.50 17.38 3.125
12.00 20.38 3.250
21.00 20.00 3.000
Pot II 22.00 20.00 3.500
19.13 18.38 3.500
21.50 18.63 3.625
Pot III 22.13 18.63 3.625
20.38 15.25 3.500
18.25 16.50 3.000
21.63 18.00 3.625
23.25 16.25 3.875
Pot IV 21.00 18.00 2.875
22.13 12.75 2.750
23.00 15.50 -3.250
12.00 18.00 -0.750

February 7, 2001

10

Korrekt tilgang ?
• Differencer

• Spredning + t-test

• Anova. Lineær Normal Model.

• Hypotesetest. Nul hypoteser.

• Uafhængighedsantagelse.

• Randomisering

February 7, 2001

142
11

Korrekt tilgang ?
column I II III Diff.
Crossed Self. -fert
Pot I 23.50 17.38 3.125
12.00 20.38 3.250
21.00 20.00 3.000
Pot II 22.00 20.00 3.500
19.13 18.38 3.500
21.50 18.63 3.625
Pot III 22.13 18.63 3.625
20.38 15.25 3.500
18.25 16.50 3.000
21.63 18.00 3.625
23.25 16.25 3.875
Pot IV 21.00 18.00 2.875
22.13 12.75 2.750
23.00 15.50 -3.250
12.00 18.00 -0.750

February 7, 2001

12

Hvad er sket

• R.A. Fisher
? Rothamstead

• Student (W. Gossett)

February 7, 2001

143
6 An overview

13

Den 5. Potte

• Hvad forventer vi af udslag i potte 5. Hvad


er et gæt på forskellen ?

• Hvorfor ?.

• Hvad er et gæt på niveauet for Self-fertilized.

• Tilfældige effekter,
Populationer,
Stikprøver

February 7, 2001

14

Populationsgenetik

• Population

• P =A+M

• V(P ) = V(A) + V(M )


V(A)
• h2 = V(P )

• Ao = 12 Am + 21 Af

February 7, 2001

144
15

Populationsgenetik

• R.A. Fisher

• Sewall Wright

• (Haldane)

February 7, 2001

16

Hierarkiske populationer

[Diagram: hierarchical population structure with Sires, Females within sires, and Offspring within females.]

February 7, 2001

145
6 An overview

17

Populationsgenetik/ Husdyravl

• R.A. Fisher

• Sewall Wright

• Jay R. Lush

• C.R. Henderson

• S.R. Searle.

February 7, 2001

18

Husdyravl

• Oprindelig Hierarkisk Struktur

• Strukturen bryder ned, specielt pga. KS

• Metoder til krydset klassifikation

• Henderson’s Mixed Model Equations

February 7, 2001

146
19

Husdyravl

• Hovedvægt på estimation (Selektion)

• Afhængighed beskrives ved residual varians og


heretabilitet

• Problem er primært regneteknisk (Matrice-


invertering)

• Normalt MANGE! observationer

• Hypotesetest af mindre interesse

February 7, 2001

20

Mixed Models generelt

• Gentagne målinger/longitudinelle data

• Spatiale observationer

• Hierarkiske forsøgsdesign (e.g. split-plot)

• Mixed Model Equations fælles referenceramme

• Fælles program udvikling

February 7, 2001

147
6 An overview

21

Mixed Models generelt

• Hypotesetest af stor interesse

• Afhængighed beskrives ved mange


variansparametre

• Begrænset antal observationer

• Stadig løse ender

February 7, 2001

148
7 Experimental planning and design

The purpose of the lecture was to refresh the concepts used in experimental planning and design,
i.e., hypothesis, power of designs, blocking. Typical blocking factors were discussed.
Different types of experimental design, such as randomized block, split-plot, latin squares and
factorial designs, were discussed, and examples were sought within the participants areas of
research.
The slides are in Danish. Link to full-screen presentation1

1
http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/Forsplanpl.f.pdf

149
7 Experimental planning and design

Outline
• Hypotheses
• Decision Support
• Need of information for planning
• Restrictions in experimental design
• Different designs

February 12, 2001

Forskningsprocessen

Pakke

Ansøgning
Publicering

Forsøg

February 12, 2001

150
3

Forskningsprocessen

• Få ideer til områder, hvor eksisterende


viden/teori er utilstrækkelig/forkert

• Foretage iagttagelser, så ideerne kan be- eller


afkræftes

• Beslutte om viden/teori skal justeres

• (Kvantificering af viden)

• Gruppearbejde over tid og sted

February 12, 2001

Darwins Majs
column I Height, Inch
Crossed Self. -fert
Pot I 23.50 17.38
12.00 20.38
21.00 20.00
Pot II 22.00 20.00
19.13 18.38
21.50 18.63
Pot III 22.13 18.63
20.38 15.25
18.25 16.50
21.63 18.00
23.25 16.25
Pot IV 21.00 18.00
22.13 12.75
23.00 15.50
12.00 18.00

February 12, 2001

151
7 Experimental planning and design

Hypotheses

Hypothesis A GMO sugar beets are not harmful to cows

Hypothesis B GMO sugar beets are harmful to cows

Hypothesis A Pesticide use reduces fertility

Hypothesis B Pesticide use does not reduce fertility

February 12, 2001

Luse Beslutningsstøtte

Table 1: Sprøjteeksempel – gevinsttabel


Afgrødens tilstand
Beslutning Ingen lus Lus
Sprøjt
Omkostninger til Omkostninger til
sprøjtemiddel og sprøjtemiddel og
arbejde arbejde
Sprøjt ikke 0 Udbytte tab

February 12, 2001

152
7

Forskning Beslutningsstøtte

Table 2: Forskningseksempel – gevinsttabel


’Verdens’ tilstand
Beslutning Hypotese 1 er Hypotese 2 er
sand sand
Accepter hypotese 1 OK Fejl !
Accepter hypotese 2 Dyr fejl ! Gennembrud
!

February 12, 2001

Typer af fejlkonklusion

Hypotese 1 Hypotese 2

Type I fejl Type II fejl

February 12, 2001

153
7 Experimental planning and design

Muligheder i designfase

Hypotese 1 Hypotese 2

Forøg præcision Forøg forsøgsudslag


NB!: Type I fejl er konstant, e.g. 0.05

February 12, 2001

10

Biologisk input

• Måleegenskaber

• Forventede forsøgsudslag

• Mulige konklusioner af forsøg

• Afhængige <> uafhængige hypoteser

• Hypotesegene(re)rende egenskaber

February 12, 2001

154
11

Table 3: Oversigt over forventede forsøgsudslag


Egenskab Hypotese 1 er sand Hypotese 2 er sand
Behandling 1 Behandling 2 Behandling 1 Behandling 2
A 100 100 100 120
B
.. .. .. .. ..

February 12, 2001

12

Typiske blokfaktorer

• Kuld
• Sti, Flok, Bur
• Køn
• Afstamning
• Besætning
• Observatør

February 12, 2001

155
7 Experimental planning and design

13

Begrænsninger i design muligheder

• Blokstørrelse
• Opstaldning/Management
• Ressource kamp

February 12, 2001

14

Designtyper

• Randomiseret Blokforsøg
• Split-Plot forsøg
• Romer Kvadrat
• Ikke komplette blokforsøg
• Faktorielle forsøg
• Fraktionerede designs

February 12, 2001

156
8 Randomized Complete Block Design

These are the first slides in the second block of lectures. They start off with the augmentation
of the linear normal model to a mixed model. Then PROC MIXED in SAS were presented, and
example 1.2.1 in LMSW (Littell et al., 1996) were discussed. The slides can be seen as a summary
of chapter 1 in LMSW.
Link to the full screen presentation1

1
http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/RDBC.f.pdf

157
8 Randomized Complete Block Design

Outline
• Hypotheses

• Udvidelse af LNM

• Introduktion til Proc Mixed

• RCBD eksempel (1.2.4)

February 28, 2001

Linear Normal Model

Y11 = δ + α1 + u1 + ε11
Y12 = δ + α2 + u1 + ε12
Y21 = δ + α1 + u2 + ε21
Y22 = δ + α2 + u2 + ε22

εij ∼ N (0, σ 2)
Y ∼ N (Xβ, σ 2I)

February 28, 2001

158
5

Matrix formulering

(Y11)   (1 1 0) (δ )   (1 0)        (ε11)
(Y12) = (1 0 1) (α1) + (1 0) (u1) + (ε12)
(Y21)   (1 1 0) (α2)   (0 1) (u2)   (ε21)
(Y22)   (1 0 1)        (0 1)        (ε22)

Y = Xβ + Zu + ε

ε ∼ N (0, R), u ∼ N (0, G)

V(Zu) = Z V(u)Z > = ZGZ >

V(Y ) = ZGZ > + R
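For this small example the variance of Y can be written out explicitly. With G = σu² I and R = σ² I (the usual variance-component assumptions) one gets

V(Y) = σu² ZZ> + σ² I,   where

       (1 1 0 0)
ZZ> =  (1 1 0 0)
       (0 0 1 1)
       (0 0 1 1)

so two observations from the same block have covariance σu², and each observation has total variance σu² + σ².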

February 28, 2001

Random vs. Fixed

• Do the levels of the factor come from a probability distribution?


McCulloch & Searle (1997)

• Are Inferences to be drawn from these data about just these level
of the factor ? Searle, (1971)

February 28, 2001

159
8 Randomized Complete Block Design

ML - estimation

Type Distribution Estimate

LNM Y ∼ N (Xβ, σ 2I) β̂ = (X >X)−1X >y


If V is known:
LMM Y ∼ N (Xβ, V ) β̂ = (X >V −1X)−1X >V −1y

V = ZGZ > + R is not known, depends on parameters,


V = f (σ 2, σu2 ).

February 28, 2001

Likelihood function

l(y, β, σ², σu²) = −(1/2) log |V| − (1/2)(y − Xβ)>V −1(y − Xβ) − (n/2) log(2π)

[Figure: profile of the log-likelihood plotted against σ².]

February 28, 2001

160
9

Proc Mixed I

PROC MIXED < options > ;


BY variables ;
ID variables ;
WEIGHT variable ;

February 28, 2001

10

Proc Mixed II

CLASS variables ;

MODEL dependent = < fixed-effects > < / options > ;


RANDOM random-effects < / options > ;
REPEATED < repeated-effect> < / options > ;

PARMS (value-list) ... < / options > ;


PRIOR <distribution > < / options > ;

February 28, 2001

161
8 Randomized Complete Block Design

11

Proc Mixed III

CONTRAST ’label’ < fixed-effect values ... >


< | random-effect values ... > , ... < / options > ;
ESTIMATE ’label’ < fixed-effect values ... >
< | random-effect values ... >< / options > ;
LSMEANS fixed-effects < / options > ;
MAKE ’table’ OUT=SAS-data-set ;

February 28, 2001

12

Proc Mixed

Model concerns Xβ

Random concerns Zu and G = V(u)

Repeated concerns ε and R = V(ε)

February 28, 2001

162
13

Ingot: casting block / metal bar

metal: metal used for bonding (?) the ingot (nickel, iron, copper)
pres: pressure required to break the bond

/*---Data Set 1.2.4---*/


data rcb;
input ingot metal $ pres;
datalines;
1 n 67.0
1 i 71.9
1 c 72.2
.
.

February 28, 2001

14

Design

Ingot no.
Lodning 1 2 3 4 5 6 7
1 n i c c c n n
2 c n i i n c i
3 i c n n i i c

February 28, 2001

163
8 Randomized Complete Block Design

15

Andre eksempler på RCBD

• Parrede observationer
Den rullende Afprøvning
• (Beretning 685) Stigende mængder solsikkefrø (4 niveauer). 20
kuld a 4 grise.
• Beretning 546. Opdrætningsintensitet, Jersey. 10 par enæggede
tvillinger. Høj vs. lav intensitet,
• Forskningsrapport 25. Airwash systemet. Besætning opdeles efter
lige vs ulige konumre.

February 28, 2001

16

Proc Mixed model

proc mixed data=rcb;


class ingot metal;
model pres=metal;
random ingot;

lsmeans metal / pdiff;


estimate ’nickel mean’ intercept 1 metal 0 0 1;
estimate ’copper vs iron’ metal 1 -1 0;
contrast ’copper vs iron’ metal 1 -1 0;
run;

February 28, 2001

164
17

Anden notation

Yij = µ + αi + uj + εij

uj ∼ N (0, σu2 )
εij ∼ N (0, σε2)

February 28, 2001

18

Tredje notation

Y = Xβ + Zu + ε

u ∼ N (0, G)
ε ∼ N (0, R)

February 28, 2001

165
8 Randomized Complete Block Design

19

SAS (8E) Output


The Mixed Procedure

Model Information

Data Set WORK.RCB


Dependent Variable pres
Covariance Structure Variance Components
Estimation Method REML
Residual Variance Method Profile
Fixed Effects SE Method Model-Based
Degrees of Freedom Method Containment

February 28, 2001

20

Class Level Information

Class Levels Values

ingot 7 1 2 3 4 5 6 7
metal 3 c i n

February 28, 2001

166
21

Dimensions

Covariance Parameters 2
Columns in X 4
Columns in Z 7
Subjects 1
Max Obs Per Subject 21
Observations Used 21
Observations Not Used 0
Total Observations 21

February 28, 2001

22

Iteration History

Iteration Evaluations -2 Res Log Like Criterion

0 1 112.40987952
1 1 107.79020201 0.00000000

Convergence criteria met.

February 28, 2001

167
8 Randomized Complete Block Design

23

Estimate of σu2 , σε2

Covariance Parameter
Estimates

Cov Parm Estimate

ingot 11.4478
Residual 10.3716

February 28, 2001

24

Kriterier for fit af model, bruges ved modelsammenligninger.

Fit Statistics

-2 Res Log Likelihood 107.8


AIC (smaller is better) 111.8
AICC (smaller is better) 112.6
BIC (smaller is better) 111.7

February 28, 2001

168
25

Signifikans test

Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

metal 2 12 6.36 0.0131

February 28, 2001

26

Degrees of Freedom

Numerator: H0 : α1 = α2 = α3 = 0

            (0 1 −1  0) (µ )
K>β = 0  ⇔  (0 1  0 −1) (α1) = 0
            (0 0  1 −1) (α2)
                        (α3)

Num DF is rank(K); here rank(K) = 2, matching the output above.

February 28, 2001

169
8 Randomized Complete Block Design

27

Denominator Containment method: ”Denote the fixed effect in


question A, and search the RANDOM effect list for the effects that
syntactically contain A. For example, the RANDOM effect B(A)
contains A, but the RANDOM effect C does not, even if it has the
same levels as B(A).
Among the RANDOM effects that contain A, compute their rank
contribution to the (XZ) matrix. The DDF assigned to A is
the smallest of these rank contributions. If no effects are found,
the DDF for A is set equal to the residual degrees of freedom,
N − rank(XZ)”

Methods: CONTAIN, BETWITHIN, RESIDUAL, SATTERTH, KENWARDROGER.

MODEL .... / DDFM=SATTERTH;

February 28, 2001

28

Output fra Estimate

Estimates Standard
Label Estimate Error DF t Value Pr > |t|

nickel mean 71.1000 1.7655 12 40.27 <.0001


copper vs iron -5.7143 1.7214 12 -3.32 0.0061

Contrasts
Num Den
Label DF DF F Value Pr > F

copper vs iron 1 12 11.02 0.0061

February 28, 2001

170
29

Least Squares Means

Standard
Effect metal Estimate Error DF t Value Pr > |t|

metal c 70.1857 1.7655 12 39.75 <.0001


metal i 75.9000 1.7655 12 42.99 <.0001
metal n 71.1000 1.7655 12 40.27 <.0001

Differences of Least Squares Means

Standard
Effect metal _metal Estimate Error DF t Value Pr > |t|

metal c i -5.7143 1.7214 12 -3.32 0.0061


metal c n -0.9143 1.7214 12 -0.53 0.6050
metal i n 4.8000 1.7214 12 2.79 0.0164

February 28, 2001

30

GLM

GLM:
Source DF Type III SS Mean Square F Value Pr > F

ingot 6 268.2895238 44.7149206 4.31 0.0151


metal 2 131.9009524 65.9504762 6.36 0.0131

Mixed:
Num Den
Effect DF DF F Value Pr >F

metal 2 12 6.36 0.0131

February 28, 2001

171
8 Randomized Complete Block Design

31

GLM:
Standard LSMEAN
metal pres LSMEAN Error Pr > |t| Number

c 70.1857143 1.2172327 <.0001 1


i 75.9000000 1.2172327 <.0001 2
n 71.1000000 1.2172327 <.0001 3

Mixed: Least Squares Means

Standard
Effect metal Estimate Error DF t Value Pr > |t|

metal c 70.1857 1.7655 12 39.75 <.0001


metal i 75.9000 1.7655 12 42.99 <.0001
metal n 71.1000 1.7655 12 40.27 <.0001

February 28, 2001

32

GLM: Standard
Parameter Estimate Error t Value Pr > |t|

nickel mean 71.1000000 1.21723265 58.41 <.0001


copper vs iron -5.7142857 1.72142692 -3.32 0.0061

Mixed: Standard
Label Estimate Error DF t Value Pr > |t|

nickel mean 71.1000 1.7655 12 40.27 <.0001


copper vs iron -5.7143 1.7214 12 -3.32 0.0061

February 28, 2001

172
33

Summary

• Model specification
• Output elements
• Estimation Methods
• Fit Statistics/Information Criterias
• Degrees of freedom, model parameters.
• GLM differs

February 28, 2001

34

IC Options

The IC option displays a table of various information criteria. The


criteria are all in smaller-is-better form, and are described in .
Criteria Formula Reference
AIC −2l + 2d Akaike (1974)
n∗
AICC −2l + 2d n∗−d−1 Burnham and Anderson (1998)
HQIC −2l + 2d log(log(n)) Hannan and Quinn (1979)
BIC −2l + d log(n) Schwarz (1978)
CAIC −2l + d(log(n) + 1) Bozdogan (1987)
Here l denotes the maximum value of the (possibly restricted) log
likelihood, d the dimension of the model, and n the number of
observations. In Version 6 of SAS/STAT software, n equals the
February 28, 2001

173
8 Randomized Complete Block Design

35

number of valid observations for maximum likelihood estimation and


n − p for restricted maximum likelihood estimation, where p equals
the rank of X. In later versions, n equals the number of effective
subjects as displayed in the ”Dimensions” table, unless this value
equals 1, in which case n equals the number of levels of the first
RANDOM effect you specify. If the number of effective subjects
equals 1 and you have no RANDOM statements, then n reverts to
the Version 6 values. For AICC (a finite-sample corrected version of
AIC), n∗ equals the Version 6 values of n, unless this number is less
than d + 2, in which case it equals d + 2.

For restricted likelihood estimation, d equals q the effective number


of estimated covariance parameters. In Version 6, when a parameter
estimate lies on a boundary constraint, then it is still included in the
calculation of d, but in later versions it is not. The most common
February 28, 2001

36

example of this behavior is when a variance component is estimated


to equal zero. For maximum likelihood estimation, d equals q + p.

For ODS purposes, the name of the ”Information Criteria” table is


”InfoCrit.

February 28, 2001

174
9 Randomized Complete Block Design II

These slides discussed the concept of BLUE and BLUP estimates. The question of model control
is addressed.
Link to full-screen presentation presentation1

1
http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/RDBC2SLU.f.pdf

175
9 Randomized Complete Block Design II

Outline
• BLUEs and BLUPs
• Examples of model control

February 28, 2001

BLUEs and BLUPs

• Best Linear Unbiased Estimator l>Xβ0


• Best Predictor: E(u|y)

February 28, 2001

176
3

Linear Regression
[Figure: two scatter plots of x2 against x1, illustrating the regression of x2 on x1.]

February 28, 2001

Linear Regression

(x2)       ( (µx2)   (V2   C21) )
(x1)  ∼  N ( (µx1) , (C12  V1 ) )

E(X2|X1) = µx2 + C21 V1⁻¹ (x1 − µx1)
V(X2|X1) = V2 − C21 V1⁻¹ C21>
V(E(X2|X1)) = C21 V1⁻¹ C21>

February 28, 2001

177
9 Randomized Complete Block Design

(u)       ( (µu)   (G   C) )
(y)  ∼  N ( (µy) , (C>  V) )

u1 = u1
u2 = u2
Y11 = δ + α1 + u1 + ε11
Y12 = δ + α2 + u1 + ε12
Y21 = δ + α1 + u2 + ε21
Y22 = δ + α2 + u2 + ε22

February 28, 2001

BLUEs and BLUPs

• Best Linear Unbiased Estimator l>Xβ0


• Best Predictor: E(u|y)
• Best Linear Predictor: (µu) + CV −1(y − µy )
• Best Linear Unbiased Predictor:
BLUP(t>Xβ + s>u) = t>Xβ0 + s>CV −1(y − Xβ0)
• Estimated Best (?) Linear Unbiased Predictor:
EBLUP(t>Xβ + s>u) = t>X β̂0 + s>Ĉ V̂ −1(y − X β̂0)

EBLUP(t>Xβ + s>u) = t>X β̂0 + s>ĜZ >V̂ −1(y − X β̂0)

February 28, 2001

178
7

Variance in BLUP

u true value, ũ BLUP estimate, εu error in prediction.

u = ũ + εu ⇔ u − ũ = εu

V(u) = V(ũ) + V(εu)

The error of prediction:

V(ũ − u) = G − CV −1C >

The variance in BLUP value:

V(ũ) = CV −1C >

February 28, 2001

Example

One-way classification model: Effect of number of observations per block

ũi = BLUP(ui) = [niσu² / (σ² + niσu²)] (ȳi· − µ)

i: block no, ni: number of observations in block i, ȳi·: block mean.


As ni → ∞ the coefficient niσu²/(σ² + niσu²) → 1 and the variance of the BLUP estimates
V(ũi) → G.
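As a rough numerical illustration, plugging in the ingot estimates from the previous lecture (σ̂u² ≈ 11.45, σ̂² ≈ 10.37) with ni = 3 observations per ingot gives the shrinkage coefficient 3 · 11.45/(10.37 + 3 · 11.45) ≈ 0.77, so the BLUP pulls each ingot mean deviation about 23% of the way back towards the overall mean.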

February 28, 2001

179
9 Randomized Complete Block Design II

Fixed vs. Random


[Figure: histograms of estimated block effects, (c) Ingots and (d) Litter; horizontal axis: Block Effect.]

February 28, 2001

10

BLUP summary

• BLUP corresponds to the conditional expectation of the random effect given


observation

• Under normality assumptions and known variance BP=BLUP

• With unknown variance this no longer holds.

• Variance of BLUPs depends on the precision of information concerning the


random effects

February 28, 2001

180
11

Model check - LNM - model

• εi are independent and identically distributed, εi ∼ N (0, σ 2)

• Residual vs. predicted

• Residual vs. anything else

• Probit plots.

• εi,t vs εi,t−1

• etc.

February 28, 2001

12

Residuals – Mixed Models

Distribution of residuals Mixed Models

(y − Xβ) = (Zu + ε) ∼ N (0, V )

i.e., not iid. ( option OUTPM in PROC MIXED)

Another definition of residuals

(y − Xβ − Z E(u)) = (Z(u − ũ) + ε) ∼ N (0, VG − VGV −1VG> + R)

where VG = ZGZ >. i.e., not iid. ( option OUTP in PROC MIXED)

Standardized residuals ?

February 28, 2001

181
9 Randomized Complete Block Design II

13

Residual vs predicted

[Figure: two residual-versus-predicted plots, (e) residuals r1 against predicted values p1 and (f) residuals r2 against predicted values p2.]

February 28, 2001

182
10 Split-Plot Experiments

These slides present the theoretical background for split-plot designs. The slides augments the
presentation of split-plot designs in chapter 2 in LMSW, (Littell et al., 1996). The concept of
variance-components are presented, and the different variance of different contrast presented. In
addition concepts such the distribution of Sum of Squares, Satterthwaite’s approximation and
the distinction between random and fixed effects are presented.
Link to the full screen presentation1

1
http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/SplitPlot.f.pdf

183
10 Split-Plot Experiments

The General Idea behind Split–Plot Experiments

“Once upon a time there were linear normal models -


systematic effects plus one error term...”
Yet, many experiments and studies have a hierarchical structure

• with respect to treatments

• with respect to error structures

Split–plot models is a very powerful (and early) way of handling such


situations.
April 6, 2001 Mixed Models Course 1

The name “split–plot” comes from the area of field experiements:

• Some treatments (say factor A) are applied to entire plots (parcels).


Those plots are called whole–plot–units and the factor A is the
whole plot factor.

• A plot is sometimes further sub–divided into sub–plots and other


treatments (say factor B) are applied to each of these sub–plots. The
sub–plots are called split–plot–units and the factor B is called the
split–plot factor.

April 6, 2001 Mixed Models Course 2

184
Other examples:

• Treatment A (e.g. feeding) applied to a whole pig pen (the whole–plot)


while treatment B (something...) is applied to pigs within a pen.

• Treatment A is applied to an entire litter of piglets, treatment B is


applied to each piglet in the litter.

• Treatment A is a management strategy applied to a whole farm, while


treatment B is a treatment of each pig pen on the farm.

April 6, 2001 Mixed Models Course 3

The basic property of split–plot experiments is that subjects within a


whole–plot are more similar than subjects in different
whole–plots.

More generally, Subjects/individuals/plots close (in some sense) to each


other are expected to be more similar than if they were further apart.

Split–plot models are sometimes appropriate for analyzing repeated


measurements.

April 6, 2001 Mixed Models Course 4

185
10 Split-Plot Experiments

Example 1. (Example 2.2 from LMSW).

• The effect of 3 bacterial inoculation treatments (INOC, indexed with j)


applied to 2 grass cultivars (CULT, indexed with i).

• There are 4 blocks (BLOCK, indexed with k.) and and CULT is randomly
assigned to each half of the block.

• Half a block is the whole–plot unit. Each whole–plot unit is subdivided


into 3 split plot units and each INOC is applied there.

The statistical model is

yijk = µ + αi + βj + γij + rk + wik + εijk

where rk ∼ N (0, σr²), wik ∼ N (0, σw²) and εijk ∼ N (0, σ²). fin

April 6, 2001 Mixed Models Course 5
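A PROC MIXED specification matching this model could look as sketched below; the data set name and the variable names (block, cult, inoc, drywt) are assumptions and may differ from the actual names used in LMSW:

proc mixed data=cultspd;             /* hypothetical data set name        */
   class block cult inoc;
   model drywt = cult inoc cult*inoc /* fixed effects: alpha, beta, gamma */
                 / ddfm=satterth;
   random block block*cult;          /* r_k and w_ik as random effects    */
run;

The RANDOM statement carries the block and whole–plot error terms, so the whole–plot factor CULT is automatically tested against the whole–plot variation.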

Variance and Correlation

The total variance is

Var(yijk) = Var(rk) + Var(wik) + Var(εijk)
          = σr² + σw² + σ² = σtot²

which justifies the name variance component model:

• The total variance is a sum of individual variance contributions.

• Moreover, each variance contribution can be assigned to a specific


feature of the experiment.

April 6, 2001 Mixed Models Course 6

186
The variance components have implications for the correlation structure
among the variables:

1. Observations within the same block (k) but with different levels
   of factor A (i) are correlated through the block component:
   Corr(yijk, yi′j′k) = Corr(yijk, yi′jk) = Cov(yijk, yi′jk)/Var(yijk) = σr²/σtot²

2. Observations within the same block (k) and with the same level
   of factor A (i) but different levels of factor B (j) are correlated
   through the block component and the whole–plot component:
   Corr(yijk, yij′k) = Cov(yijk, yij′k)/σtot² = (σr² + σw²)/σtot²

April 6, 2001 Mixed Models Course 7

Hence in the split plot model it is assumed that the correlation, when
present, is positive.

The split–plot structure has important implications with respect to the


statistical inference:

1. The effect of the interaction between A and B and treatment B itself


should be compared with the “residual variation” i.e. the variation
between the split–plot units.

2. The effect of treatment A should be compared with the “whole–plot


variation”, i.e. the variation between the whole–plot units.

We shall illustrate these points for a balanced split–plot experiment.

April 6, 2001 Mixed Models Course 8

187
10 Split-Plot Experiments

Comparing Differences

Consider again the model

yijk = µ + αi + βj + γij + rk + wik + εijk

where rk ∼ N (0, σr²), wik ∼ N (0, σw²) and εijk ∼ N (0, σ²), and
i = 1 . . . a, j = 1 . . . b and k = 1 . . . c.

A simple calculation of differences of means illustrates the special issues


arising in a split–plot experiment.

April 6, 2001 Mixed Models Course 9

Different levels of factor A can be compared by

ȳ1.. − ȳ2.. = α1 − α2 + γ̄1. − γ̄2. + (w̄1. − w̄2.) + (ε̄1.. − ε̄2..)

Var(ȳ1.. − ȳ2..) = Var(w̄1. − w̄2.) + Var(ε̄1.. − ε̄2..)
                 = 2σw²/c + 2σ²/(bc) = (2/c)(σw² + σ²/b)

Different levels of factor B can be compared by

ȳ.1. − ȳ.2. = β1 − β2 + γ̄.1 − γ̄.2 + (ε̄.1. − ε̄.2.)

Var(ȳ.1. − ȳ.2.) = Var(ε̄.1. − ε̄.2.) = (2/c)(σ²/a)

April 6, 2001 Mixed Models Course 10

188
Hence Var(ȳ1.. − ȳ2..) is bigger than Var(ȳ.1. − ȳ.2.).

In other words, the effect of the whole–plot–factor is determined less


accurately than the effect of the split–plot factor.
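To fix ideas with purely illustrative numbers (a = 2, b = 3, c = 4, σw² = σ² = 1; these are not values from LMSW): the whole–plot comparison has Var(ȳ1.. − ȳ2..) = (2/4)(1 + 1/3) ≈ 0.67, while the split–plot comparison has Var(ȳ.1. − ȳ.2.) = (2/4)(1/2) = 0.25, so the split–plot factor is estimated considerably more precisely.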

April 6, 2001 Mixed Models Course 11

Inference Issues for Mixed Models

For balanced experiments, inference is based on F–tests.

For unbalanced cases, inference is a delicate issue. Loosely speaking


“What are the denominator degrees of freedom”.

In PROC MIXED one can make “approximate F–tests” (but SAS never
informs you that the tests are only approximate).

Several suggestions have been made regarding this. One such is


Satterthwaites Approximation.

April 6, 2001 Mixed Models Course 12

189
10 Split-Plot Experiments

Analysis of the Split–Plot Experiment

Consider again the model

yijk = µ + αi + βj + γij + rk + wik + εijk

where rk ∼ N (0, σr²), wik ∼ N (0, σw²) and εijk ∼ N (0, σ²), and
i = 1 . . . a, j = 1 . . . b and k = 1 . . . c.

For simplicity suppose that factor B does not represent a treatment but
only replications within each whole–plot. Then the model reduces to

yijk = µ + αi + rk + wik + εijk


April 6, 2001 Mixed Models Course 13

The replicates due to factor B are eliminated by calculating the average


within each block and treatment:

ȳi.k = µ + αi + rk + (wik + ε̄i.k)   where   Var(wik + ε̄i.k) = σw² + σ²/b

• Hence the between whole–plot variation (σw²) remains unchanged while
  the within whole–plot variation σ² is reduced by a factor b.

• Therefore by taking more replicates within a whole–plot unit, parts of


the variation is reduced , while other parts of the variation
remains the same.

April 6, 2001 Mixed Models Course 14

190
Modelling the Mean

Let zik = ȳi.k denote the mean and define uik = wik + ε̄i.k.

Then the model for the means can be written

zik = µ + αi + rk + uik

where uik ∼ N (0, σu²) with Var(uik) = σw² + (1/b)σ² = σu², and
rk ∼ N (0, σr²).

This is an ordinary ANOVA–model with one treatment, one (random)


block effect and no interaction. Analyzing such a model is straight
forward.
April 6, 2001 Mixed Models Course 15

Three Technical Results

In connection with ANOVA calculations, one frequently uses the following results:

ANOVA1: Let X, Y be independent with E(X) = E(Y ) = 0 and let a be a number.
Then

E(a + X + Y )² = Var(a + X + Y ) + [E(a + X + Y )]²
               = Var(X) + Var(Y ) + a² = E(X²) + E(Y²) + a²

ANOVA2: Let Y1, . . . , Yn be independent with Yi ∼ N (µ, σ²), and let
SSD = Σi=1..n (Yi − Ȳ.)². Then

E(SSD) = (n − 1)σ² = (n − 1) Var(Yi)
SSD ∼ σ²χ²(n − 1)

April 6, 2001 Mixed Models Course 16

191
10 Split-Plot Experiments

ANOVA3: Let Y1, . . . , Yn be independent with Yi = µi + εi, where εi ∼ N (0, σ²), and
let

SSD = Σi=1..n (Yi − Ȳ.)²   and   Q(µ) = Σi=1..n (µi − µ̄.)²

Then

E(SSD) = Q(µ) + E(Σi=1..n (εi − ε̄.)²) = Q(µ) + (n − 1)σ²

April 6, 2001 Mixed Models Course 17

With
zik = µ + αi + rk + uik

summation gives

z̄i. = µ + αi + r̄. + ūi.


z̄.. = µ + ᾱ. + r̄. + ū..

The difference
z̄i. − z̄.. = (αi − ᾱ.) + (ūi. − ū..)

is a measurement of the treatment effect, and does not depend on the


block.
April 6, 2001 Mixed Models Course 18

192
Letting SSDA = Σi (z̄i. − z̄..)² we find that

E(SSDA) = Σi (αi − ᾱ.)² + E(Σi (ūi. − ū..)²)
        = Q(α) + (a − 1) σu²/c

and hence

E(c Σi (z̄i. − z̄..)²) = c Q(α) + (a − 1)σu².

• If there is no effect of treatment A then Q(α) = 0 and SSDA has a


χ2–distribution.

• To be able to make the F –test we need to find a quantity which has


σu2 as expected value no matter whether αi = 0 or not.
April 6, 2001 Mixed Models Course 19

1. Let SSDAC = Σik (zik − z̄i. − z̄.k + z̄..)². It is easy to see that

zik − z̄i. − z̄.k + z̄.. = uik − ūi. − ū.k + ū..

2. It is not difficult to verify (and it can be found in any standard text


book on statistics) that

E(SSDAC ) = σu2 (a − 1)(c − 1).

3. Finally it is equally easy to verify that SSDA and SSDAC are


independent.
April 6, 2001 Mixed Models Course 20

193
10 Split-Plot Experiments

4. Therefore the F –statistic for testing αi = 0 becomes

F = [c · SSDA/(a − 1)] / [SSDAC/((a − 1)(c − 1))]
  = [c Σi (z̄i. − z̄..)²/(a − 1)] / [Σik (zik − z̄i. − z̄.k + z̄..)²/((a − 1)(c − 1))]
  ∼ F(a−1, (a−1)(c−1))

Large values of F are critical to the hypothesis.

April 6, 2001 Mixed Models Course 21

• The important point is that the treatment effect of factor A is “tested
  against” the variance σu² = σw² + σ²/b, which largely consists of the
  whole–plot variation (σw²) plus a “minor” contribution from the split–plot
  variation (σ²/b).

• In the balanced case, the test for αi = 0 can be made by simply


analyzing the “means”. That is the reason why PROC GLM in
special (balanced) cases can make the correct tests in certain variance
component models.

April 6, 2001 Mixed Models Course 22

194
Back to the Original Setup

Return to the original model with a treatment effect of factor B, i.e.

yijk = µ + αi + βj + γij + rk + wik + ijk

1. The interaction effect γij is tested exactly as if wik and rk had been
fixed effects. I.e. the test is made “against” the residual variation σ 2.

2. In the absence of γij , the main effect βj is also tested as if wik and rk
had been fixed effects.

3. The main effect of factor A is tested as described previously. Just note


that the effect of B cancels out in all calculations.

April 6, 2001 Mixed Models Course 23

Unbalanced cases

All the nice calculations previously presented break down when the
design is no longer balanced.

Consider again
yijk = µ + αi + rk + wik + εijk
and suppose this time that i = 1 . . . a, k = 1 . . . c and j = 1 . . . bik .

Hence there might not be the same number of replicates (j) within each
whole–plot unit.

April 6, 2001 Mixed Models Course 24

195
10 Split-Plot Experiments

As before, the replicates due to factor B are eliminated by calculating


the average within each block and treatment:

zik = ȳi.k = µ + αi + rk + (wik + ε̄i.k)

But now with uik = wik + ε̄i.k

Var(uik) = σw² + σ²/bik = σu,ik²

That is, the zik s have different variances.

April 6, 2001 Mixed Models Course 25

1. One unpleasant consequence of this is that

z̄i. = µ + αi + r̄. + ūi.

has variance depending on i, since Var(ūi.) = (1/c²) Σk σu,ik² = (1/c²) Σk (σw² + σ²/bik).

2. Another, equally unpleasant, consequence is that SSDAC from before


does not have a χ2 distribution.

3. Consequently, the F –statistic from before does not have an F


distribution.

April 6, 2001 Mixed Models Course 26

196
Some consequences of this:

• Hence we can still calculate the F –statistic, but it has an unknown


distribution in the unbalanced case.

• Hence we have a problem in judging whether an observed F –statistic


is “large”.

• It seems plausible that when the experiment is “nearly balanced”, then


F must “nearly” be F–distributed. But what is “nearly balanced”, and
what to do when the experiment is very unbalanced?

April 6, 2001 Mixed Models Course 27

A related problem:

A related problem arises even in the balanced case. Suppose interest is in


comparing
µ11 − µ21 = α1 − α2 + γ11 − γ21.

The optimal estimate of this contrast is in the balanced case the difference

ȳ11. − ȳ21.

and the variance of that difference is


Var(ȳ11. − ȳ21.) = (2/3)(σw² + σ²)

April 6, 2001 Mixed Models Course 28

197
10 Split-Plot Experiments

• The problem is that to estimate σw² + σ², two sums–of–squares are
  needed.

• To put it in general terms, suppose SSD1 ∼ σ1²χ²(f1) and SSD2 ∼
  σ2²χ²(f2) are needed. The problem arising is that the weighted sum

  SSD = a1 SSD1 + a2 SSD2

  does not have a χ²–distribution unless σ1 = σ2 and a1 = a2.


• Satterthwaites idea was the following: Let us assume that SSD
approximately has a χ2–distribution.
• The problem is then how many degrees of freedom – but this number
can be “estimated” in the following way.

April 6, 2001 Mixed Models Course 29

Satterthwaites approximation

Consider the two–sample problem

Yij ∼ N (µi, σi2), i = 1, 2, j = 1, . . . , ni

Then

Ȳi ∼ N (µi, σi²/ni),     Ȳ1 − Ȳ2 ∼ N (µ1 − µ2, σ1²/n1 + σ2²/n2)

Si² = (1/fi) Σj=1..ni (Yij − Ȳi.)² ∼ (σi²/fi) χ²(fi),     fi = ni − 1

April 6, 2001 Mixed Models Course 30

198
Let σD² = σ1²/n1 + σ2²/n2. A natural and unbiased estimate for σD² is

SD² = S1²/n1 + S2²/n2          (1)

Question: What is the distribution of SD²?

Satterthwaite (worked at General Electric, USA) (approx. 1945): We
don’t know, but let’s approximate the distribution of SD² with a suitable
χ²–distribution:

SD² ∼approx (φ²/η) χ²(η)          (2)

April 6, 2001 Mixed Models Course 31

• With SD² = S1²/n1 + S2²/n2 we have

  E(SD²) = σ1²/n1 + σ2²/n2 = σD²
  Var(SD²) = 2(σ1⁴/(n1²f1) + σ2⁴/(n2²f2))

• Under the approximation SD² ∼approx (φ²/η) χ²(η) we have

  E(SD²) = φ²
  Var(SD²) = 2φ⁴/η
April 6, 2001 Mixed Models Course 32

199
10 Split-Plot Experiments

• Satterthwaite's idea: Match the first two moments:

  φ² = σD²

  η = (σD²)² / (σ1⁴/(n1²f1) + σ2⁴/(n2²f2))

• In real life σi², and hence σD², are unknown. Instead we plug in the
  estimates si² and sD² in the calculation of η:

  η = (sD²)² / (s1⁴/(n1²f1) + s2⁴/(n2²f2))

April 6, 2001 Mixed Models Course 33

Example 2. Let σ1² = 2, σ2² = 10, n1 = n2 = 6, f1 = f2 = 5. Then

σD² = 2/6 + 10/6 = 2

η = 2² / (2²/(6²·5) + 10²/(6²·5)) = 6.9 ≈ 7

Hence

SD² = S1²/n1 + S2²/n2 ∼approx (σD²/7) χ²(7)
fin
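The plug-in computation of η is easy to script. A minimal SAS data step (the data set name satt is made up, and the true variances of Example 2 are used here as if they were sample estimates) reproduces η ≈ 6.9:

data satt;
   s1sq = 2;  s2sq = 10;           /* (estimated) group variances       */
   n1 = 6;    n2 = 6;              /* group sizes                       */
   f1 = n1 - 1;  f2 = n2 - 1;      /* degrees of freedom                */
   sd2 = s1sq/n1 + s2sq/n2;        /* estimate of Var(Ybar1 - Ybar2)    */
   eta = sd2**2 / ( (s1sq/n1)**2/f1 + (s2sq/n2)**2/f2 );  /* approx 6.9 */
run;

proc print data=satt; run;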

April 6, 2001 Mixed Models Course 34

200
Example 3. Let σ1² = 100, σ2² = 90, n1 = 100, n2 = 10, f1 = 99, f2 = 9. Then

σD² = 100/100 + 90/10 = 10

η = (1 + 9)² / (1²/99 + 9²/9) = 11.1

If the variances are assumed equal, then

σD² = [(99·100 + 9·90)/108] (1/100 + 1/10) = 10.9

which has a scaled χ²(108)–distribution.

Quite a difference! fin

April 6, 2001 Mixed Models Course 35

How Good is Satterthwaites Approximation

The 1000 EURO question is now : How good is Satterthwaites


approximation ???

The usual answer : Simulate and calculate coverage percentages !!!

April 6, 2001 Mixed Models Course 36

201
10 Split-Plot Experiments

Two–sample Problem

Model:
Yij = µi + ij , i = 1, 2, j = 1, . . . , ni
where ij ∼ N (0, σi2).

1. Simulate data where µ1 = µ2.

2. Test hypothesis µ1 = µ2 at different significane levels.

- Using Satterthwaites approximation

- Using the Containment method, (default in PROC MIXED).

3. Calculate coverage percentages.


April 6, 2001 Mixed Models Course 37

n1 σ1 n2 σ2 Method DDF F pr0.01 χ2 pr0.01 F pr0.05 χ2 pr0.05 F pr0.10 χ2 pr0.10


3 1 3 20 contain 4 0.047 0.127 0.114 0.204 0.182 0.260
3 1 3 20 satterth 2.16 0.020 0.124 0.056 0.202 0.106 0.258
8 1 3 20 contain 9 0.071 0.110 0.133 0.169 0.187 0.227
8 1 3 20 satterth 2.01 0.013 0.110 0.052 0.169 0.088 0.227
3 1 8 20 contain 9 0.009 0.030 0.053 0.084 0.101 0.134
3 1 8 20 satterth 7.16 0.006 0.030 0.046 0.084 0.093 0.134
8 1 8 20 contain 14 0.010 0.024 0.064 0.084 0.112 0.145
8 1 8 20 satterth 7.04 0.007 0.024 0.038 0.084 0.096 0.145
16 1 16 20 contain 30 0.013 0.025 0.068 0.078 0.119 0.128
16 1 16 20 satterth 15.1 0.010 0.025 0.060 0.078 0.110 0.128
3 1 3 5 contain 4 0.026 0.105 0.090 0.178 0.157 0.235
3 1 3 5 satterth 2.61 0.013 0.105 0.056 0.178 0.107 0.234
8 1 3 5 contain 9 0.078 0.132 0.168 0.210 0.226 0.271
8 1 3 5 satterth 2.62 0.026 0.132 0.070 0.210 0.130 0.271
3 1 8 5 contain 9 0.020 0.046 0.062 0.089 0.117 0.144
3 1 8 5 satterth 7.94 0.016 0.046 0.059 0.089 0.112 0.144
8 1 8 5 contain 14 0.026 0.035 0.056 0.080 0.107 0.131
8 1 8 5 satterth 7.73 0.014 0.035 0.048 0.080 0.090 0.131

Table 1: Two–sample problem - 1000 simulations

April 6, 2001 Mixed Models Course 38

202
Split–Plot Experiment

We consider the model

Yijk = µ + αi + βj + wik + εijk,   i = 1, 2,  k = 1, . . . , ni,  j = 1, . . . , nik

where wik ∼ N (0, σw²) and εijk ∼ N (0, σ²).

• Make simulations for different values of σw².

• In the simulations α1 = α2.

• Test of the hypothesis α1 = α2.

April 6, 2001 Mixed Models Course 39

The design is as follows:

n1 = 3 and n2 = 8

i = 1 : j = 1 . . . n1k = 5

i = 2 : k = 1 . . . 3 : j = 1 . . . n1k = 3

i = 2 : k = 4 . . . 8 : j = 1 . . . n1k = 9

So all problems arise due to unbalancedness (rather than variance


heterogeneity as before).

April 6, 2001 Mixed Models Course 40

203
10 Split-Plot Experiments

σ σw Method DDF F pr0.01 χ2 pr0.01 F pr0.05 χ2 pr0.05 F pr0.10 χ2 pr0.10


1 1 contain 9 0.007 0.030 0.050 0.068 0.086 0.125
1 1 satterth 9.67 0.012 0.030 0.051 0.068 0.088 0.125
3 1 contain 9 0.004 0.018 0.037 0.064 0.083 0.125
3 1 satterth 21.7 0.009 0.018 0.043 0.064 0.098 0.125
6 1 contain 9 0.002 0.014 0.020 0.043 0.057 0.086
6 1 satterth 33.5 0.012 0.014 0.034 0.043 0.072 0.086
9 1 contain 9 0.002 0.020 0.034 0.063 0.083 0.116
9 1 satterth 36.5 0.011 0.020 0.054 0.063 0.097 0.116

Table 2: Split–Plot Experiment - 1000 simulations

April 6, 2001 Mixed Models Course 41

Making the “right” tests with PROC MIXED

A typical SAS program for analyzing the split plot data above is like
proc mixed data=sim noitprint;
class i j k subject;
model y = i j /ddfm=contain chisq;
random i*k;
run;

• The containment method is default in PROC MIXED (but can be


specified explicitely with ddfm=contain) in the MODEL statement.

• This tells SAS that when testing any of the fixed effects in the model,
SAS should look for a random effect which syntactically contains the
April 6, 2001 Mixed Models Course 42

204
fixed effect: Since i is contained in i*j SAS then knows that that it is
this random effect the test should be “made against”.

• It is well known that this is the right thing to do when the experiment
is balanced.

April 6, 2001 Mixed Models Course 43

A Severe Warning!!

A very commonly made mistake in this connection is the following:


Each combination (i, k) often identifies an experimental entity, e.g. an
animal or a (whole) plot in a field. Typically one would have a variable in
the data set identifying such an entity. For illustration we have made a
variable, called subject defined as (i, k). A typical SAS program would
then be:
proc mixed data=sim noitprint;
class i j k subject;
model y = i j /ddfm=contain chisq;
random subject;
run;

Such a program is made under the mistaken impression that since


April 6, 2001 Mixed Models Course 44

205
10 Split-Plot Experiments

subject and (i, k) really identifies the same units in the experiment
then it should be immaterial what one writes.

This is not true, and the reason is the following:

Since i is not syntactically contained in subject the tests (for effect of


the factor i) would be made against the residual variance, which we
know is wrong.

April 6, 2001 Mixed Models Course 45

To emphasize this point, suppose that we declare a new variable icopy


which is just a copy of i. Then writing
proc mixed data=sim noitprint;
class i j k subject icopy;
model y = i j /ddfm=contain chisq;
random icopy*k;
run;

will also make SAS perform the test of effect of the factor i against the
residual variance which, as pointed out above, is wrong.

If, however, we write ddfm=satterth in any of the examples above,


then SAS will actually identify the right variance component to make the
test for effect of factor i against.

April 6, 2001 Mixed Models Course 46

206
Some Tentative Conclusions on Satterthwaite

• For small samples, Satterthwaites method performs much better than


the default Containment method.

• For larger samples, there is not much difference between the two
methods. In practice, this is because the difference between the
quantiles in a F (1, 7) and F (1, 14) distribution is not large whereas the
differences between quantiles in a F (1, 2) and a F (1, 4) distribution can be
substantial.

• Both methods generally perform better than the large sample χ2 tests.

• A drawback of Satterthwaites method is that it is computationally


April 6, 2001 Mixed Models Course 47

somewhat intensive.

• Results suggest the use of Satterthwaites approximation.

April 6, 2001 Mixed Models Course 48

207
10 Split-Plot Experiments

Random or Fixed Effects?

Sometimes it is straight forward to decide on whether a specific effect


should be considered as random or fixed.

In other cases, it is a more delicate issue.

The text below is taken from lecture notes by L. R. Schaeffer, University


of Guelph, Ontario, Canada:
Fixed factors are factors in which the classes comprise all of the possible classes of
interest that could be observed. For example, the sex of an animal is either male,
female, sterilized male, or sterilized female. If the number of classes in a factor is small
and confined to this number even if conceptual resampling were performed an infinite
number of times, then the factor is likely fixed. Other examples are age classes,

April 6, 2001 Mixed Models Course 49

lactation number, management system, cage number, and breed class. Usually if the
sampling were to be repeated a second time, those factors which maintain the same
classes between the two samplings would be fixed factors. For example, a growth trial
on pigs using two diets would probably need to use the same housing facilities, the
same age groups of pigs, and the same diets, but the individual pigs would necessarily
have to be new animals because an animal could not go through the same growth
phase a second time in its life. Pig effects would be considered a random factor while
the other effects would be fixed.

Random factors are factors whose levels are considered to be drawn randomly from an
infinitely large population of levels. As in the previous pig experiment, pigs were
considered random because the pig population of the world is large enough to be
considered infinitely large, and the group that were involved in that experiment were a
random sample from that population. In actual fact, however, the pigs on that
experiment were likely sampled from those relatively few pigs that were available at the
time the trial started, but still they are considered to be a random factor because if the
experiment were to be repeated again, there would likely be a completely different
group of pigs involved.

April 6, 2001 Mixed Models Course 50

208
Another way to determine if a factor is fixed or random is to know how the results will
be used. In a nutrition trial the results infer something about the diets in the trial. The
diets are specific and no inferences should be made about other diets not tested in the
experiment. Hence diet effects would be a fixed factor. In contrast, if animal effects
were in the model, inferences about how any animal might respond to a specific diet
may need to be made. There should not be anything peculiar about the animal on the
trial that would nullify that inference. Animal effects would be a random factor.

In general, a few questions need to be answered to make the correct choice of fixed or
random factor designation. Some of the questions are:

1. How many levels of the factor are in the model? If small, then perhaps this is a
fixed factor. If large, then perhaps this is a random factor.

2. Is the number of levels in the population large enough to be considered infinite? If


yes, then perhaps this factor is random.

3. Would the same levels be used again if the experiment were to be repeated a second

April 6, 2001 Mixed Models Course 51

time? If yes, then perhaps this factor is fixed.

4. Are inferences to be made about levels not included in the experiment? If yes, then
perhaps this factor should be random.

5. Were the levels of a factor determined in a nonrandom manner? If yes, then perhaps
this factor should be treated as fixed.

By studying the scientific literature, a researcher should be able to get some help in this
decision process. If in doubt, then the assistance of an experienced statistician should
be sought.

April 6, 2001 Mixed Models Course 52

209
10 Split-Plot Experiments

Multilocation Trials

Consider the following setup:

• Four treatments, e.g. of housing systems for pigs are to be compared.

• Studies are carried out on 9 farms (locations)

• Within each farm a randomized block design with 3 blocks is employed,


i.e. each treatment is repeated 3 times within each farm, once in each
block.

How to analyze such data?


April 6, 2001 Mixed Models Course 53

Note that since there are replicates within each farm, the
farm–treatment interaction can be estimated.

The following model seems appealing:

yijk = µ + τi + Lj + (RL)jk + (τ L)ij + ijk

where i = 1 . . . 4 is treatment, j = 1 . . . 9 is location and k = 1 . . . 3 is


block.

April 6, 2001 Mixed Models Course 54

210
It is reasonable to assume that (RL)jk and ijk are random. But other
effects need more consideration:

• One can consider Lj and (hence) (τ L)ij as being random.

• Alternatively one can consider Lj and (τ L)ij to be fixed effects.

The effects in question can be considered random if the farms (locations)


are random representatives from the population of farms with specific
characteristics.

But if the farms are selected as e.g. “those 9 farms whose owners
responded to a questionnaire sent out to all farms with given
characteristics”, then the farms are not random representatives from the
April 6, 2001 Mixed Models Course 55

population. In that case, the effects in question should be regarded as


fixed, and one can not extrapolate the conclusions from the study
outside these 9 farms.

What to do if 6 farms are selected randomly, while 3 are not?

What to do if there are only 3 randomly selected farms in the study?

April 6, 2001 Mixed Models Course 56

211
10 Split-Plot Experiments

212
11 Examples of Split-Plot Designs

The purpose of this lecture was to illustrate the kind of problems that may arise, if split-plot
designs are not treated properly. Most of the experiments presented were made at the Danish
Institute of Agricultural Sciences, or rather the National Institute of Animal Science, as it was
called in those days.
Another common aspect of several of the experiments were that they have led to a heated debate.
The pro’s and con’s in those debates were presented.
Link to the full screen presentation1

1
http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/SPLITPLOTExamples.pdf

213
11 Examples of Split-Plot Designs

. . . After reading 50 of these papers in AABS-issues [Applied


Animal Behaviour Science] of 1984 and 1985, we found that in
about 25 cases statistical methods were used incorrectly. The
main defect was that observations entered into test statistics
were not independent. In a number of cases it was totally
unclear how the authors made their computations
Hoekstra & Jansen, AABS 16 (1986) 303-308

March 6, 2001 1

Example: W. Schouten Ph.D. work

Rearing conditions and Behaviour in pigs.

How does early experience influence later behaviour ?

’Barren’ farrowing crates vs. 2 × 2 m2 vs ’enriched’ large straw pens


28 m2.

8 sows (4 sister-pairs). Within each sister-pair the pigs were


assigned to treatment at random. Each litter consisted of 8 pigs,
i.e., a total of 64 piglets.

Detailed behavioural observations

March 6, 2001 2

214
Anovas

                   Reported Model               Litter Averages
Effect             df       SS        F         df      SS        F
Sister-Pair         3      384.2                 3      48.0
Housing System      1      893.3     5.14 ∗      1     117.0    2.424
Residual           59    10253.7                 3     138.2
Total              63                            7

March 6, 2001 3

Mixed model formulation

Reported model:

Yijk = µ + Pi + Hj + εijk

Pi Effect of sister pair i ∈ {1, . . . , 4}. Hj effect of housing. εijk


random residual.
Correct model:

Yijk = µ + Pi + Hj + Sij + εijk

Sij Effect of sow.
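In PROC MIXED the correct model amounts to declaring the whole–plot unit (the sow, i.e. the sister-pair by housing combination) as a random effect. A sketch, with assumed variable names pair, housing and y:

proc mixed data=schouten;       /* hypothetical data set name            */
   class pair housing;
   model y = pair housing / ddfm=satterth;
   random pair*housing;         /* one sow per pair-housing combination, */
                                /* so this term is the random sow effect */
run;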

March 6, 2001 4

215
11 Examples of Split-Plot Designs

Breed effect on production

Are the present feeding standards for essential nutrients per FUp
sufficient for Ad lib feeding ?

Beretning 579. A. Just et al. (1985)

6 litters (YY) and 6 litters of (LL) 6 (7) pigs (boars, gilts,


castrates). Two levels of nutrient concentrations in the feed.

March 6, 2001 5

Model

Yijkl = µ + ai + bj + ck + dl(j) + (ab)ij + (ac)ik + εijkl

• ai: effect of feed nutrient concentration, i ∈ {1, 2}


(Norm vs. Norm +20%).
• bj : effect of breed, j ∈ 1, 2 (LL and YY).
• ck : effect of sex k, k ∈ {1, 2, 3}.
• dl(j): effect of litter l within breed j.
• (ab)ij : interaction between feed concentration and breed.
• (ac)ik : interaction between feed concentration and sex.
• εijkl : random residual.

March 6, 2001 6

216
Similar designs

• Breeding line vs. pecking behaviour

• Rearing Conditions vs. later productivity

• Effect of organic feed.

• Effect of GMO production.

March 6, 2001 7

Straw shortener
A number of sows were fed with either control feed or feed containing straw from
fields treated with straw shortener (CCC). To investigate long term effects the
study covered 4 parities.

Reported model:
Yijk = µ + ti + pj + (tp)ij + εijk
Yijk : Observed variable e.g., litter size. ti: effect of treatment. pj : effect of parity.
(tp)ij : Interaction between parity and treatment.εijk : random residual

Correct model:
Yijk = µ + ti + pj + (tp)ij + Sik + εijk
Sik : Effect of sow k on treatment i, Sik ∼ N (0, σS2 )

March 6, 2001 8

217
11 Examples of Split-Plot Designs

Group housing

Loose housed sows. Automatic feeding systems.

Hypothesis: Pelleted feed reduces aggression compared with mealy


feed.

Hypothesis: Pelleted feed reduces the effect of rank on received


aggression.

March 6, 2001 9

Herd Investigations

Inspired by Nørgård (1999).

Yijklm = µ + ai + sj + Hijk + vl + (vs)jl + εijklm

• Yijklm measurement at slaughter.


• ai : Effect of Abattoir i.
• sj : Effect of herd disease state j.
• Hijk : Random effect of herd, Hijk ∼ N (0, σ²_H).
• vl: Effect of season l.
• (vs)jl: Interaction between season and disease state.
• εijklm: Random residual from mth animal. εijklm ∼ N (0, σ 2)

March 6, 2001 10

218
Multi location trials

Yijk = µ + τi + Lj + R(L)jk + (τ L)ij + εijk

• τi: effect of treatment


• Lj : effect of location
• R(L)jk : random effect of block within location, R(L)jk ∼ N (0, σ²_R)
• (τ L)ij : interaction between treatment and location
• εijk : residual εijk ∼ N (0, σ 2)
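A PROC MIXED sketch of this multi-location model could look as follows (the data set name multiloc and the variable names trt, loc, block and y are assumptions):

proc mixed data=multiloc;
  class trt loc block;
  model y = trt loc trt*loc;
  random block(loc);      /* random block within location */
run;

If the locations themselves were regarded as a random sample, loc and trt*loc would instead be moved to the RANDOM statement.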

March 6, 2001 11

12 Estimation and tests in mixed models

The purpose of this lecture was to give a detailed description of theoretical issues of estimation
and tests in mixed models, i.e. properties of maximum likelihood estimators in the linear normal
model and the mixed linear normal model. Concepts such as ML and REML are introduced.
Link to the full screen presentation1

1
http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/MLMixed.f.pdf

221
12 Estimation and tests in mixed models

Maximum Likelihood and Linear Normal Models

Example 1. Consider the linear regression model

yi = β0 + β1xi + εi

We shall show that the maximum likelihood estimate and the least squares
estimate for
β = (β0, β1)
are identical.

April 6, 2001 Mixed Models Course 1

Because of the independence, the joint density for y1, . . . , yn (and hence
the likelihood function) becomes

f (y1, . . . , yn; β) = ∏_i f (yi; β)
                    = ∏_i (1/(√(2π) σ)) exp(−(1/(2σ²)) (yi − (β0 + β1xi))²)
                    = (1/((√(2π))^n σ^n)) exp(−(1/(2σ²)) Σ_i (yi − (β0 + β1xi))²)
                    = L(β)

April 6, 2001 Mixed Models Course 2

222
The likelihood function is

L(β) = (1/((√(2π))^n σ^n)) exp(−(1/(2σ²)) Σ_i (yi − (β0 + β1xi))²)

• Let D(β0, β1) = Σ_i (yi − (β0 + β1xi))².

• If σ is known then L(β) is maximized by minimizing the sum


of squared deviations D(β0, β1) (because of the “−” sign in the
exponential).

• Therefore the maximum likelihood estimate is the same as the least


squares estimate.

f in

April 6, 2001 Mixed Models Course 3

For a general linear normal model

y = Xβ + ε  where ε ∼ N (0, σ²I)

the likelihood is

L(β, σ²) = (1/((√(2π))^n σ^n)) exp(−(1/(2σ²)) Σ_i (yi − µi)²)
         = (1/((√(2π))^n σ^n)) exp(−(1/(2σ²)) (y − Xβ)>(y − Xβ))

Hence the maximum likelihood estimate for β is found by minimizing

(y − Xβ)>(y − Xβ).

April 6, 2001 Mixed Models Course 4

223
12 Estimation and tests in mixed models

Once β̂ (and hence µ̂) is found, it is not hard to verify that L(β̂, σ²) is
maximized as a function of σ² by

σ̂² = (1/n) (y − X β̂)>(y − X β̂)

However, in practice one never uses the ML estimate for σ². Instead one uses

σ̃² = (1/(n − p)) (y − X β̂)>(y − X β̂)

where p is the number of parameters in the model.


April 6, 2001 Mixed Models Course 5

The reason for using σ̃² instead of σ̂² is that

E(σ̃²) = σ²
E(σ̂²) = ((n − p)/n) σ²

That is σ̃ 2 is an unbiased estimate for σ 2 while σ̂ 2 is biased.
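As a small worked illustration of the size of the bias (the numbers are chosen for illustration only): in a simple linear regression p = 2, so with n = 10 observations

E(σ̂²) = ((n − p)/n) σ² = (8/10) σ²

i.e. the ML estimate underestimates the residual variance by 20% on average, whereas σ̃² is on target.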

April 6, 2001 Mixed Models Course 6

224
It can be noted that

σ̃² = (1/(n − p)) (y − X β̂)>(y − X β̂)

is called the REML estimate for σ², where REML means REstricted or REsidual Maximum Likelihood.

The REML method is frequently applied in connection with mixed


models in an attempt to obtain unbiased variance estimates.

April 6, 2001 Mixed Models Course 7

Maximum Likelihood Estimation in Mixed Models

For a mixed model


y = Xβ + Zu + 
the variance of y is Cov(y) = V = Z Cov(u)Z > + Cov().

• The unknown parameters are in this case (β, V ).

• The typical case is that V itself depends only on a small number of parameters, e.g. on α = (σ²_r, σ²_w, σ²) in a split–plot experiment.

• So we write V = V (α).

April 6, 2001 Mixed Models Course 8

225
12 Estimation and tests in mixed models

In mixed models, maximum likelihood estimation becomes much more


involved.

The likelihood function is


L(β, V ) = (1/(√(2π))^n) det(V )^(−1/2) exp(−(1/2)(y − Xβ)>V^(−1)(y − Xβ))

Here det(V ) is a number, called the determinant of V .


There are two situations to consider: When V is known and when V is
unknown.

April 6, 2001 Mixed Models Course 9

Case 1 - V is known: If V is known then L is maximized by


minimizing
(y − Xβ)>V −1(y − Xβ)

This quantity is minimized by

β̂ = (X >V −1X)−1X >V −1y

which is also the weighted least squares estimate of β.
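A one-line check that this estimator is unbiased (a small supplementary calculation, not part of the original slides):

E(β̂) = (X>V^(−1)X)^(−1)X>V^(−1)E(y) = (X>V^(−1)X)^(−1)X>V^(−1)Xβ = β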

April 6, 2001 Mixed Models Course 10

226
Case 2 - V is unknown: If V is unknown (which of course is
generally the case in practice) things become more complicated.

There are different approaches available. Two of these are

• Maximum Likelihood (ML) and

• Restricted Maximum Likelihood (REML)

April 6, 2001 Mixed Models Course 11

Maximum Likelihood: The expression

β̂(V ) = (X >V −1X)−1X >V −1y

depends on V which is unknown. If the expression for β̂ is substituted


into L we get

L(β̂(V ), V ) = (1/(√(2π))^n) det(V )^(−1/2) exp(−(1/2)(y − X β̂(V ))>V^(−1)(y − X β̂(V )))

This likelihood depends now only on V .

April 6, 2001 Mixed Models Course 12

227
12 Estimation and tests in mixed models

Maximization of L has to be done iteratively.

This gives V̂ and hence

β̂(V̂ ) = (X >V̂ −1X)−1X >V̂ −1y

Typically, V only depends on a few parameters, say α, so we write


V = V (α).

In that case L(β̂(V (α)), V (α)) has to be maximized as a function of α.

April 6, 2001 Mixed Models Course 13

Restricted Maximum Likelihood:

An alternative to ML estimation is REML estimation.

This is the default method in PROC MIXED.

Consider a mixed model

y = Xβ + Zu + , where Var(y) = V

and V and β are unknown.

If β had been known, the residuals would be

ε = y − Xβ ∼ N (0, V )
April 6, 2001 Mixed Models Course 14

228
and one could use the ML method from before for estimating V .

However, β is not known. Therefore one frequently does the following:


The least squares estimate of β is

β̂ls = (X >X)−1X >y

and while not the optimal estimate for β, it is still an unbiased estimate.

One then considers the residuals

εls = y − X β̂ls ∼ N (0, A(X)V A(X)>)

where A(X) is a known matrix which is a function of X.

April 6, 2001 Mixed Models Course 15

The likelihood for the “residuals” εls then depends only on V and one can maximize that likelihood numerically.

This gives the REML estimate V̂reml for V .

When V depends on fewer parameters α the result is the REML estimate


α̂reml.

With this estimate at hand we can estimate β as

β̂reml = β̂(V̂reml) = (X>V̂reml^(−1) X)^(−1) X>V̂reml^(−1) y

April 6, 2001 Mixed Models Course 16

229
12 Estimation and tests in mixed models

Using ML or REML

In practice the ML and the REML estimates do not differ much.

The main argument for REML estimation is that, at least in the balanced
cases, V̂reml is unbiased while V̂ml is not.

Whether V̂reml is always unbiased is not known.
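In PROC MIXED the estimation method is chosen with the METHOD= option; REML is the default. A small sketch (the data set and variable names are generic placeholders):

proc mixed data=mydata method=reml;   /* the default; could be omitted */
  class block trt;
  model y = trt;
  random block;
run;

proc mixed data=mydata method=ml;     /* maximum likelihood instead of REML */
  class block trt;
  model y = trt;
  random block;
run;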

April 6, 2001 Mixed Models Course 17

Tests in Mixed Models

In dealing with tests in mixed models we shall first assume that the
covariance matrix V is known.

Typically we are interested in testing hypotheses of the form λ>β = k for


some vector λ and some number k (often k = 0.)

We know that the contrast λ>β is estimable if and only if there is a


vector a such that a>X = λ>.

The estimate of the contrast λ>β is a>X β̂, where

X β̂ = X(X >V −1X)−1X >V −1y


April 6, 2001 Mixed Models Course 18

230
Standard calculations give that

Var(X β̂) = X(X >V −1X)−1X >V −1X(X >V −1X)−1X >
= X(X >V −1X)−1X >

so
X β̂ ∼ N (Xβ, X(X >V −1X)−1X >).

Hence
a>X β̂ ∼ N (a>Xβ, a>X(X >V −1X)−1X >a)

If the hypothesis λ>β = k is true then

a>X β̂ − k ∼ N (0, a>X(X >V −1X)−1X >a)


April 6, 2001 Mixed Models Course 19

Therefore if V is known the task is to test whether E(a>X β̂ − k) = 0


when Cov(a>X β̂ − k) is known.

This can be done by constructing the statistic

X 2 = (a>X β̂ − k)>[a>X(X >V −1X)−1X >a]−1(a>X β̂ − k)

which under the hypothesis has a χ2(f1)–distribution where f1 is the


number of parameters “eliminated” in the contrast a>X β̂ = k

April 6, 2001 Mixed Models Course 20

231
12 Estimation and tests in mixed models

The problem is what to do when V is unknown?

In some cases (e.g. in a split–plot experiment) the structure of V is such that V = ω²W^(−1), where W is known and ω² is unknown.

In that case, one can construct an F–statistic

F = (a>X β̂ − k)>[a>X(X>W^(−1)X)^(−1)X>a]^(−1)(a>X β̂ − k) / (f1 ω̂²)

which under the hypothesis has an Ff1,f2 –distribution.

How to derive f2 shall not be discussed here. We just note that PROC
MIXED attempts to construct such test statistics and to derive the
appropriate number f2 of denominator degrees of freedom.
April 6, 2001 Mixed Models Course 21

In this connection it is to be pointed out that it is extremely important to


specify the random effects in the RANDOM–statement in the correct way.

April 6, 2001 Mixed Models Course 22

232
Another approach is to construct approximate F–tests by establishing a denominator D, such that

F = [(a>X β̂ − k)>[a>X(X>V^(−1)X)^(−1)X>a]^(−1)(a>X β̂ − k)/f1] / (D/f2)

has an approximate F –distribution when the hypothesis is true.

Adding the option DDFM=SATTERTH to the MODEL–statement causes PROC


MIXED to attempt to construct such tests.

April 6, 2001 Mixed Models Course 23

A final option is the following:

When n → ∞ (in a suitably regular way) then V̂ and V becomes


indistinguishable.

Therefore, one approach is to simply “pretend” that the ML estimate V̂


is the true, but unknown variance V .

One can force PROC MIXED to make such tests by adding the CHISQ option to the MODEL statement.
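For example (a sketch; the effects a and b are placeholders), the two options mentioned above are requested on the MODEL statement as follows, one or the other depending on the approach chosen:

model y = a b a*b / ddfm=satterth;   /* Satterthwaite denominator degrees of freedom */
model y = a b a*b / chisq;           /* additional chi-square (Wald) tests            */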

April 6, 2001 Mixed Models Course 24

13 Complications concerning Variance
Components

This lecture illustrated some of the problems that may arise because of numerical problems in the iterative search for the maximum likelihood, and the reason why some of the variance components are set equal to 0.
Based on an example from one of the exercises, the profile of the likelihood function is illustrated.
A special problem is that Satterthwaite's approximation fails in the cases where a variance component is set to 0 and the G matrix is not positive semidefinite. Rules of thumb are suggested for that case.
Finally, the relevance of a test of a positive variance component is discussed, e.g. compared to a test of a block effect when block is treated as a fixed effect.
Link to the fullscreen presentation1

1
http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/Complicate.pdf

235
13 Complications concerning Variance Components

Sugar beet example

Pct Sukk
Num Den
Effect DF DF F Value Pr > F
OPTAGN 1 2 15.21 0.0599
SAATID 4 16 189.37 <.0001
OPTAGN*SAATID 4 16 5.37 0.0061

Kg
OPTAGN 1 18 336.85 <.0001
SAATID 4 18 408.52 <.0001
OPTAGN*SAATID 4 18 12.70 <.0001

March 13, 2001 1

Inspection of Log

Pct Sukk

NOTE: Convergence criteria met.


NOTE: There were 30 observations read from the data set WORK.ROER.

Kg

NOTE: Convergence criteria met.


NOTE: Estimated G matrix is not positive definite.
NOTE: There were 30 observations read from the data set WORK.ROER.

March 13, 2001 2

236
Sugar beet example

Table 1: Covariance Parameter Estimates


Pct Sukk
Cov Parm Estimate Alpha Lower Upper
BLOK 0.001000 0.05 0.000164 37.9371
BLOK(OPTAGN) 0.001000 0.05 0.000219 0.2840
Residual 0.001333 0.05 0.000740 0.003088

Kg
BLOK 0.05344 0.05 0.01660 3.13E192
BLOK(OPTAGN) 0 . . .
Residual 5.1215 0.05 2.9241 11.2004

March 13, 2001 3

Outline

• Estimation of variance components
  – Why is σ̂²_X = 0 ?
  – Consequences
  – Rules of Thumb

• Are random effects significant ?
  – Are we really interested ?
  – Likelihood ratio tests

March 13, 2001 4

237
13 Complications concerning Variance Components

Reason

The likelihood function is maximized subject to the constraint that the variance component parameters σ²_X ≥ 0.

The precision of numerical optimisation methods depends on the internal representation of numbers in the computer. Proc Mixed solves this by setting σ̂²_X = 0 if it is close to 0.

Other statistical packages (R, S-Plus) handle the constraint by maximising the likelihood as a function of log(σ²_X).

Sometimes (e.g., repeated measurements) the assumption that σ²_X ≥ 0 cannot be justified.

March 13, 2001 5

Likelihood contour plot, Pct Sukk

[Contour plot of the likelihood as a function of log10(σ²_B) and log10(σ²_B(O))]
March 13, 2001 6

238
Likelihood contour plot, Kg

[Contour plot of the likelihood as a function of log10(σ²_B) and log10(σ²_B(O))]
March 13, 2001 7

G Not positive Definite

            [ σ²_B    0      0      0         0      0       ]
            [  0      ...    0      0         0      0       ]
V(u) = G =  [  0      0      σ²_B   0         0      0       ]
            [  0      0      0      σ²_B(O)   0      0       ]
            [  0      0      0      0         ...    0       ]
            [  0      0      0      0         0      σ²_B(O) ]

March 13, 2001 8

239
13 Complications concerning Variance Components

G Not positive Definite

              [ σ̂²_B    0      0      0         0      0       ]
              [  0      ...    0      0         0      0       ]
V̂(u) = Ĝ =    [  0      0      σ̂²_B   0         0      0       ]
              [  0      0      0      σ̂²_B(O)   0      0       ]
              [  0      0      0      0         ...    0       ]
              [  0      0      0      0         0      σ̂²_B(O) ]

March 13, 2001 9

G Not positive Definite

      [ σ̂²_B   0      0      0    0     0 ]
      [  0     ...    0      0    0     0 ]
Ĝ =   [  0     0      σ̂²_B   0    0     0 ]
      [  0     0      0      0    0     0 ]
      [  0     0      0      0    ...   0 ]
      [  0     0      0      0    0     0 ]

Ĝ^(−1) = ???

March 13, 2001 10

240
Warning: Satterthwaite Goes Wrong

Satterthwaite's approximation uses the estimated variance components for the calculation of test degrees of freedom. The calculations include differentiation with respect to σ̂²_X. At boundary values such as 0 this derivative is not defined.

In the PARMS statement a lower bound on the estimated variance components may be specified, e.g.,

PARMS /LBOUND=0.001,0.001,0.001;

This produces the same problems as σ̂²_X = 0.

March 13, 2001 11

Conclusions

• If estimated covariance parameters are > 0, use Satterthwaite's approximation.

• If not
– If model reductions are ”natural”, reestimate parameters using
revised models.
– Nested design should be reformulated to maintain design
– Use containment method but be careful to specify model
syntactically correct. (Compare with random statement in GLM)

March 13, 2001 12

241
13 Complications concerning Variance Components

Testing Effects of Random components

• Why are we interested in testing σ²_B > 0 ?

• Model Reduction

• σ̂²_B = 0 is not a test and may not be used for this purpose.

• Fixed effects vs. Random Effects

• Biological significance, i.e., if we sample x individuals at random, what is the average difference between the lowest and the highest, and what is a confidence interval for the difference? What are the correlation, heritability, repeatability, sensitivity and specificity?

March 13, 2001 13

Model Reduction

Consider a model A and a model B that represents a special case of A, e.g., one where one of the variance components σ²_X = 0. B is said to be nested within A. In this case a likelihood ratio test may be performed.

Then 2(LogLike_A − LogLike_B) is asymptotically χ² distributed with (p_A − p_B) degrees of freedom, where p_A is the number of parameters in model A.

NB! This is not feasible if σ̂²_X = 0.
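A sketch of how such a likelihood ratio test can be carried out in practice (the data set and variable names are placeholders): fit both models with METHOD=ML, read off the −2 log likelihood reported under "Fit Statistics" for each fit, and compare the difference to a χ² distribution.

proc mixed data=mydata method=ml;       /* model A: both variance components */
  class block trt;
  model y = trt;
  random block block*trt;
run;

proc mixed data=mydata method=ml;       /* model B: block*trt removed */
  class block trt;
  model y = trt;
  random block;
run;

/* LR statistic = (-2 log L of model B) - (-2 log L of model A),
   compared to a chi-square distribution with 1 degree of freedom */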

March 13, 2001 14

242
General recommendations

• Using ML any nested models may be compared


• Using REML only nested models with identical fixed effects may be
compared.
• With respect to test for variance components this test is
conservative, i.e., true p-value is smaller than the calculated. Thus
the test results in too few significant findings.
• With respect to test for fixed effects this test is anti conservative,
i.e., true p-value is larger than the calculated. Thus the test results
in too many significant findings. (Therefore likelihood ratio tests
should not be used for fixed effects).

March 13, 2001 15

Fixed Effects

If the variance component is 0, this implies that ui = uj for every i


and j.

i.e., Reformulate model and treat the factor of interest as Fixed.

However:

ui ≈ uj does not imply that σu2 = 0

March 13, 2001 16

243
13 Complications concerning Variance Components

Biologically significant

• Very often the real interest can be formulated as an interval of the


variance component parameter, e.g., is it larger than some preset
’irrelevance’ level ?
• The confidence interval produced with the CL option in the
Proc Mixed statement are often sufficient for this. However,
the general comment about sufficient sample size is VERY relevant
here.
• Many ’biologically’ relevant parameters are combinations of several variance component parameters, e.g., the correlation (repeatability) σ²_A/(σ²_A + σ²_ε). Therefore the joint distribution of the parameter estimates needs to be considered. This is not trivial (Interest ???).
March 13, 2001 17

Covariance Matrix: Sugar beet PCT Sukk

Asymptotic Covariance Matrix of Estimates

Row Cov Parm CovP1 CovP2 CovP3

1 BLOK 3.069E-6 -8.02E-7


2 BLOK(OPTAGN) -8.02E-7 1.613E-6 -4.44E-8
3 Residual -4.44E-8 2.222E-7

March 13, 2001 18

244
14 Repeated Measurements

This lecture gives an introduction to repeated measurements, and is a supplement to Chapter 3


in LMSW (Littell et al., 1996). It illustrates how it is possible to modify the tacit assumptions
of the split-plot design into a more flexible modelling of the variance matrix.
Different variance structures are illustrated graphically, and the use of SAS to compare different structures is presented. The AR(1) and CS structures are discussed in detail. Finally, methods for comparing different structures are shown.
Links to full-screen presentation1

1
http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/Repeated.f.pdf

245
14 Repeated Measurements

Analyzing Repeated Measurements

Consider the setup:

• A treatment factor A with a levels is applied to individuals, e.g.


pigs.

• Within each treatment there are c individuals

• On each individual repeated measurements of the same response is


made at b different time points.

October 11, 2001 Mixed Models Course 1

Example: Exercise Therapy (LMSW p. 88)

• Subjects (SUBJ) were assigned to one of three different training


programs (PROGRAM) on weightlifting.

• The strength (STRENGTH) of the subjects was measured every second day (TIME) for a two-week period from the start of the study.

Some questions:

• Is there a treatment effect?

• Is there an interaction between treatment and time?

October 11, 2001 Mixed Models Course 2

246
Mean profiles

[Plot of the group mean strength against time (1–7) for the three programs C, R and W]

The task: Comparison of the mean profiles

Clear evidence of treatment effect and treat–by–time interaction.

October 11, 2001 Mixed Models Course 3

Individual profiles:

[Plots of the individual strength profiles against time for each program: CONT, RI and WI]

No evidence of non–constant variance!!

Sometimes (but certainly not always!) repeated measurements can


be appropriately dealt with by a split–plot model.

October 11, 2001 Mixed Models Course 4

247
14 Repeated Measurements

• A statistical model for this situation could be

yijk = µ + αi + βj + γij + wik + εijk

where wik ∼ N (0, σ²_w) and εijk ∼ N (0, σ²).

• Here i denotes treatment, k is replications (within treatment) and


j is “time”

• “Time” is called the within–subject factor.

• Note: “Time” can also refer to different locations, e.g. in the


intestine.

• It is the usual split–plot model!

October 11, 2001 Mixed Models Course 5

Tacit Assumptions when using the Split–Plot Model

It is important to realize the assumptions one makes in applying a split–plot model to a repeated measurement problem:

1. It is assumed that the variance is constant.

   This may not be a reasonable assumption: Sometimes the variance increases with the mean, and if the mean changes over “time”, this assumption is violated.
   If time is really location in the intestine, there might be certain segments where the variance of a given response is much larger than in other segments.
October 11, 2001 Mixed Models Course 6

248
2. It is assumed that the correlation between two measurements on the
same individual is the same – no matter how far the measurements
are apart in time.
This may not be a reasonable assumption: Observations close
to each other in time might be expected to be more alike than
observations far from each other.

3. It is assumed that the correlation is positive.


This may not be a reasonable assumption: Consider a feeding
experiment. If the feed intake is lower than expected in one week
because of diseases it may be higher than expected in the next
week. Hence the observations would be negatively correlated.

October 11, 2001 Mixed Models Course 7

4. It is assumed that the biological questions can be answered through the interaction γij and possibly the main effects αi and βj.

   That might be too crude a model. For example, data might indicate that the mean value evolves over time in a specific way, e.g.

   µij = µ + αi + β × j + β2 × j²

October 11, 2001 Mixed Models Course 8

249
14 Repeated Measurements

Modelling of Covariances

A classical way of thinking of a statistical model is as

Observables = Systematic effects + Random effects

Most frequently, the main interest is in the systematic effects, while the random effects are considered a nuisance.

Yet, the random effects are important to understand and to model in


an appropriate way.

October 11, 2001 Mixed Models Course 9

Types of random variation

[Four simulated example series plotted against x: “m + e” (residual variation only), “m + subj” (random subject effect), “m + ser” (serial dependence) and “m + subj + ser + e” (all three combined)]

October 11, 2001 Mixed Models Course 10

250
Can be summarized as:

• Random subject effect

• Serial dependence

• Residual variation

October 11, 2001 Mixed Models Course 11

Unstructured Covariance Matrix

Consider Exercise Therapy data.

A very general model is the model where for each treatment i and time j there is a mean value µij, and the measurements have a completely unstructured covariance matrix:

Yik = (Yi1k, . . . , Yi7k)> ∼ N7(µi = (µi1, . . . , µi7)>, V )

where k refers to subject within treatment, and where V is a 7 × 7 unstructured matrix.
October 11, 2001 Mixed Models Course 12

251
14 Repeated Measurements

Since the subjects are independent the random vector arising after
stacking all Yik s on the top of each other has a covariance matrix
consisting of V ’s on the “diagonal” and 0s outside.

Such a matrix is said to be block diagonal.

Note that in V there are 7 × 8/2 = 28 parameters.

October 11, 2001 Mixed Models Course 13

This model can be fitted with the following SAS program:


proc mixed data=weight2;
class program subj time;
model strength = program time program*time / outP=pred;
repeated time / subject=subj*program type=un r;
ods listing exclude r; ods output r=r rcorr=rcorr;
data r; set r; keep col1-col7;
data rcorr; set rcorr; keep col1-col7;
run;

The data set r contains the estimated covariance matrix, while


rcorr contains the correlation matrix

Note that V is the covariance matrix for Yik. But if we write Yik = µi + εik (note: everything here are vectors) then V is also the covariance matrix for the error terms εik, which have mean 0.

October 11, 2001 Mixed Models Course 14

252
The estimated correlation matrix is
1.0000 0.9602 0.9246 0.8716 0.8421 0.8091 0.7968
0.9602 1.0000 0.9396 0.8770 0.8596 0.8273 0.7917
0.9246 0.9396 1.0000 0.9556 0.9372 0.8975 0.8755
0.8716 0.8770 0.9556 1.0000 0.9601 0.9094 0.8874
0.8421 0.8596 0.9372 0.9601 1.0000 0.9514 0.9165
0.8091 0.8273 0.8975 0.9094 0.9514 1.0000 0.9531
0.7968 0.7917 0.8755 0.8874 0.9165 0.9531 1.0000

October 11, 2001 Mixed Models Course 15

The AR(1)–model

Consider a sequence of measurements z1, z2, . . . , zT made on the


same experimental unit at T time points t = 1, . . . , T .

It is assumed that E(zt) = 0 for all t.

A frequently employed model is the AutoRegressive model of order


1, which states that

zt = ρ zt−1 + εt,   t = 2, . . . , T

where εt ∼ N (0, σ²_z), all independent, and where −1 < ρ < 1.


October 11, 2001 Mixed Models Course 16

253
14 Repeated Measurements

Hence what happens at time t is ρ times what happened at time


t − 1 + some random noise.

The variance of each zt is the same and is denoted ω 2.

This variance can be found as:

ω² = Var(zt) = Var(ρ zt−1 + εt)
   = ρ² Var(zt−1) + Var(εt)
   = ρ² ω² + σ²_z

Hence ω² = σ²_z/(1 − ρ²).

October 11, 2001 Mixed Models Course 17

It is illustrative to investigate the covariance structure of this model.

First consider observations one time–step apart:

Cov(zt, zt−1) = Cov(ρ zt−1 + εt, zt−1)
             = ρ Cov(zt−1, zt−1) = ρ Var(zt−1) = ρ ω²

Next we consider observations two time–steps apart:

Cov(zt, zt−2) = Cov(ρ zt−1 + εt, zt−2)
             = Cov(ρ zt−1, zt−2) = ρ Cov(zt−1, zt−2)
             = ρ² ω²
October 11, 2001 Mixed Models Course 18

254
In general, the covariance between observations k time–steps apart is

Cov(zt, zt−k) = ρ^k ω²

The correlation between observations k time steps apart therefore becomes

γ(k) = Corr(zt, zt−k) = ρ^k ω²/ω² = ρ^k

The number k is called the lag between the observations and γ(k) is called the autocorrelation function.

If the postulated model is correct, the autocorrelation should tend to


0 as the lag increases.
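As a small illustration of the model (not part of the original material), an AR(1) series can be simulated in a SAS data step; the choice ρ = 0.5 is arbitrary:

data ar1;
  rho = 0.5;
  z = 0;                       /* starting value */
  do t = 1 to 50;
    z = rho*z + rannor(0);     /* z_t = rho*z_(t-1) + epsilon_t */
    output;
  end;
run;

Plotting z against t, and the empirical autocorrelations against lag, should show the geometric decay ρ^k.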

Some Autocorrelations
October 11, 2001 Mixed Models Course 19

[Plots of the autocorrelation function ρ^k against lag (0–50) together with a simulated series, for ρ = 0.5 and for ρ = −0.5]

October 11, 2001 Mixed Models Course 20

255
14 Repeated Measurements

[Plots of the autocorrelation function ρ^k against lag (0–50) together with a simulated series, for ρ = 0.9 and for ρ = 0.1]

October 11, 2001 Mixed Models Course 21

How to estimate the autocorrelation??

A very brute–force way of estimating the autocorrelation is the


following: Suppose there are observations from 4 time points, i.e.
t = 1, . . . , 4 on many subjects and assume observations all have zero
mean.

Then the (symmetric) matrix of correlations is

        [ 1    ρ12  ρ13  ρ14 ]
Corr =  [ ρ12  1    ρ23  ρ24 ]
        [ ρ13  ρ23  1    ρ34 ]
        [ ρ14  ρ24  ρ34  1   ]

October 11, 2001 Mixed Models Course 22

256
Simple estimates of the autocorrelation for observations one, two and three time–steps apart are

γ̂(1) = (ρ12 + ρ23 + ρ34)/3
γ̂(2) = (ρ13 + ρ24)/2
γ̂(3) = ρ14

Obviously, for higher values of k, γ(k) will be poorly estimated as it


is the average over few values.

October 11, 2001 Mixed Models Course 23

The autocorrelation can be estimated (as described above) by


invoking the macro:
%autocorr(r);

where r is the covariance matrix estimated in connection with the


model with unstructured covariance matrix.
If the file autocorr.sas is located in e.g. c:\stat then the macro
is included, i.e. made available by submitting the statement
%include ’d:\stat\autocorr.sas’;

This creates the SAS dataset autocorr with autocorrelation and lag.

October 11, 2001 Mixed Models Course 24

257
14 Repeated Measurements

The macro also creates a plot of the autocorrelation against lag:


Autocorrelation for Exercise Therapy data

[Plot of the estimated autocorrelation (ranging from about 0.80 to 1.00) against lag 0–6]

What can be concluded from that?


October 11, 2001 Mixed Models Course 25

• There is a clear indication of positive correlation and that the


correlation decreases with time.

• Whether the correlation structure can be appropriately described


by ρk is another issue. There is not much evidence for or against
that structure.

October 11, 2001 Mixed Models Course 26

258
Since all autocorrelations γ(k) are positive it is tempting to plot
log γ(k) against k as well.

The reason is that if the autocorrelation is γ(k) = ρk then


log γ(k) = k log ρ.

Hence a plot of log γ(k) = k log ρ against k should approximately


yield a straight line with intercept 0 and slope log ρ:

October 11, 2001 Mixed Models Course 27

Log Autocorrelation for Exercise Therapy data

[Plot of log(autocorrelation) against lag 0–6]

October 11, 2001 Mixed Models Course 28

259
14 Repeated Measurements

Again, there is not any strong evidence against the AR(1) structure.

From the graph it follows that the slope is approximately


log ρ ≈ −0.23/6 = −0.038 such that ρ ≈ 0.962.

Hence the correlation between observations does decrease as the


time between them increases – but it decreases very slowly!!

October 11, 2001 Mixed Models Course 29

Compound Symmetry

The Split–plot model can also be formulated using a REPEATED


statement instead of a RANDOM statement.
proc mixed data=weight2;
class program subj time;
model strength = program time program*time;
repeated time / type=cs sub=subj(program) r rcorr;
ods listing exclude r; ods output r=r;
run;

Fortunately, the results using a REPEATED or a RANDOM statement are


the same!
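For reference, a sketch of the corresponding RANDOM-statement formulation of the same split-plot model (variable names as in the REPEATED version above):

proc mixed data=weight2;
  class program subj time;
  model strength = program time program*time;
  random subj(program);    /* random subject effect instead of repeated / type=cs */
run;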

The option type=cs specifies that the covariance matrix for each subject has a compound symmetry structure:

[ σ² + σ²_w   σ²_w        . . .   σ²_w      ]
[ σ²_w        σ² + σ²_w   . . .   σ²_w      ]
[   ...         ...       . . .    ...      ]
[ σ²_w        σ²_w        . . .   σ² + σ²_w ]

October 11, 2001 Mixed Models Course 30

From the SAS output one sees that the correlation between observations on the same subject is estimated to

σ²_w/(σ²_w + σ²) ≈ 0.8892

October 11, 2001 Mixed Models Course 31

Which Covariance Structure to use?

With all this flexibility in choosing the covariance structure, some


guidelines are needed for choosing an appropriate one:

• Parsimony: Covariance structures with few parameters are most


attractive as there are fewer parameters to be estimated from data.

• Exploratory data analysis: A graphical investigation of the data


might suggest an appropriate covariance structure.

• Subject matter considerations: Sometimes the problem at hand


really dictates an appropriate covariance structure
October 11, 2001 Mixed Models Course 32

261
14 Repeated Measurements

• Necessity: Sometimes one is for numerical reasons forced to use a


very simple covariance structure – PROC MIXED might not be able
to fit the complex ones.

• Numerical criteria: There are some numerical criteria, which can


be a guideline.

October 11, 2001 Mixed Models Course 33

Numerical Criteria

AIC and BIC are some criteria to be used. They are both the
log–likelihood + some term penalizing for the number of parameters
used in the model. BIC penalizes the use of many parameters harder
than AIC.

Smaller values of both criteria indicate a good fit.

For the Exercise Therapy the result is

Structure CS AR(1) UN
AIC 1424.9 1270.8 1290.9
BIC 1428.9 1274.9 1348.1

October 11, 2001 Mixed Models Course 34

262
Hence the result is in favor of using the AR(1)–structure.

October 11, 2001 Mixed Models Course 35

What does the covariance structure mean for the


conclusions?

For the Exercise Therapy the p–values for the test of no interaction
effect are:
Structure CS AR(1) UN
Program*Time 0.0005 0.3007 0.1297

Radically different conclusions!

The data really suggests that the interaction is present!

October 11, 2001 Mixed Models Course 36

15 Repeated Measurements: Covariance
structures

This lecture gives an overview of how to specify different covariance structures in SAS via the
REPEATED statement in PROC MIXED. The lecture is based on the description in the on-line SAS-
manual1 .
The most important types of covariance structure are presented.

• Unstructured (UN)

• Autoregressive (AR(1)–SP(POW))

• Antedependence (ANTE(1))

• Toeplitz (TOEP)

• Heterogeneous variance (ARH(1),CSH, etc.)

The pro’s and con’s of the different structures are discussed


Link to full screen Presentation2

1
http://dokumentation.agrsci.dk/sasdocv8/sasdoc/sashtml/onldoc.htm
2
http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/RepeatedType.f.pdf

265
15 Repeated Measurements: Covariance structures

Repeated statement

Y = Xβ + Zu + ε
V(ε) = R

R is an n × n matrix, where n is the number of observations.

In order to handle this, a structure of the matrix is defined with


repeated use of the elements in the structure.

March 21, 2001 1

Repeated Statement

The syntax of the REPEATED statement

REPEATED < repeated-effect > < / options >;

Usually a formulation like:


REPEATED time / subj=animal*treat ;

A good precaution is always to specify the repeated-effect

March 21, 2001 2

266
Missing data: example

Treat Animal Time Y


A 1 1 12.4
A 1 2 .
A 1 3 14.5
B 1 1 14.3
B 1 2 15.3
B 1 3 14.8
.. .. .. ..

March 21, 2001 3

PROC MIXED: REPEATED Statement

REPEATED < repeated-effect > < / options > ;

You can specify the following options in the REPEATED statement


after a slash (/).
GROUP=effect HLM HLPS
LDATA=SAS-data-set LOCAL LOCALW
NONLOCALW R<=value-list> RC<=value-list>
RCI<=value-list> RCORR<=value-list> RI<=value-list>
SSCP SUBJECT=effect TYPE=covariance-structure

March 21, 2001 4

267
15 Repeated Measurements: Covariance structures

Types of variance structure

• Approximately 30 different methods

• ”Time”/”linear” structure vs. spatial structure

• Homogeneous vs. heterogeneous variance

• ”Banded” vs full structure

March 21, 2001 5

Unstructured: type=un

The covariance matrix of the measurements on each subject is

[ σ11  σ12  σ13  σ14 ]
[      σ22  σ23  σ24 ]
[           σ33  σ34 ]
[                σ44 ]

(only the upper triangle is shown; the matrix is symmetric)

Number of parameters: t × (t + 1)/2

March 21, 2001 6

268
Autoregressive: type=AR(1)

The covariance matrix of the measurements on each subject is

      [ 1   ρ    ρ²   ρ³ ]
σ² ×  [     1    ρ    ρ² ]
      [          1    ρ  ]
      [               1  ]

     ρ      ρ      ρ      ρ
Y1 ---- Y2 ---- Y3 ---- Y4 ---- Y5

March 21, 2001 7

Autocovariance

[Plot of the autocovariance function ρ^lag against lag 0–6]
March 21, 2001 8

269
15 Repeated Measurements: Covariance structures

Autocovariance

[Plot of the autocovariance function ρ^lag against lag 0–6]
March 21, 2001 9

Autoregressive: type=SP(POW)

The covariance matrix of the measurements on each subject is

      [ 1   ρ^|t2−t1|   ρ^|t3−t1|   ρ^|t4−t1| ]
σ² ×  [     1           ρ^|t3−t2|   ρ^|t4−t2| ]
      [                 1           ρ^|t4−t3| ]
      [                             1         ]

March 21, 2001 10

270
Ante-Dependence: type=ANTE(1)

AR(1):
     ρ      ρ      ρ      ρ
Y1 ---- Y2 ---- Y3 ---- Y4 ---- Y5

ANTE(1):
     ρ1     ρ2     ρ3     ρ4
Y1 ---- Y2 ---- Y3 ---- Y4 ---- Y5

March 21, 2001 11

Ante-Dependence: type=ANTE(1)

The covariance matrix of the measurements on each subject is

[ σ1²   σ1σ2ρ1   σ1σ3ρ1ρ2   σ1σ4ρ1ρ2ρ3 ]
[       σ2²      σ2σ3ρ2     σ2σ4ρ2ρ3   ]
[                σ3²        σ3σ4ρ3     ]
[                           σ4²        ]

March 21, 2001 12

271
15 Repeated Measurements: Covariance structures

Toeplitz: type=TOEP

The covariance matrix of the measurements on each subject is

[ σ²   σ1   σ2   σ3 ]
[      σ²   σ1   σ2 ]
[           σ²   σ1 ]
[                σ² ]

March 21, 2001 13

Heterogenous variance

Instead of an identical variance at every time point, the variance is estimated at each time point.

In general, the type is obtained by simply adding an H to the type name, i.e., csh, arh(1), toeph.

The structures are preserved as far as the correlation between time


points are concerned

More elaborate parametric techniques are available, e.g., LIN.

March 21, 2001 14

272
Conclusions

• Parsimony !
• Fixed observation times and similar intervals : AR(1)
(2 parms)
• Slightly varying observation times and similar intervals
: SP(POW) (2 parms)
• Fixed observation times but intervals of different type:
ANTE(1) (2t − 1 parms (heterogen. variance))
• Fixed observation times, similar intervals, no simple
lag-structure : TOEP (t − 1 parms)
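The structures listed above are all selected with the TYPE= option of the REPEATED statement. A sketch of how two candidate structures could be fitted and compared on the same data (the data set and variable names follow the exercise-therapy example used earlier):

proc mixed data=weight2;
  class program subj time;
  model strength = program time program*time;
  repeated time / subject=subj(program) type=ar(1);
run;

proc mixed data=weight2;
  class program subj time;
  model strength = program time program*time;
  repeated time / subject=subj(program) type=toep;
run;

The AIC/BIC values reported by the two runs can then be compared, smaller being better.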

March 21, 2001 15

AR vs CS

AR(1): neighbouring observations are linked directly

     ρ      ρ      ρ      ρ
Y1 ---- Y2 ---- Y3 ---- Y4 ---- Y5

CS: all observations are linked through a common subject effect A

Y1   Y2   Y3   Y4   Y5
  \    \   |   /    /
           A

March 21, 2001 16

16 Random Regression

The random regression model is discussed starting with an example from one of the exercises.
The presentation supplements chapter 7: Random Coefficients in LMSW (Littell et al., 1996)
The basic idea behind random regression and the implementation of the model in PROC MIXED
is shown. Finally, the implications for the covariance structure of the observations are presented.
Link to full-screen presentation1

1
http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/RandomRegression.f.pdf

275
16 Random Regression

The Basic Idea behind Random Regression

Feeding pigs with different amounts of vitamin E supplement.

Weights recorded weekly.


[Plots of Weight against Time for the individual pigs, one panel per treatment group Cu = 1, 2, 3]

October 4, 2001 Mixed Models Course 1

• Clearly (random) between–subject (pig) variation.

• Approximately linear increase in weight.

• Slight tendency to larger dispersion between pigs at the end of the


study than at the beginning.

• Repeated measurement problem.

Aims:

• Find a regression model which describes the weight as function of


time.

• Draw inferences about possible treatment effects.

October 4, 2001 Mixed Models Course 2

276
First idea: fit linear regression model (with random pig effect) and
treatment specific parameters:

yijt = αi + βi t + Uij + εijt

Here, i is treatment, j is subject (pig) within treatment, t is time, Uij ∼ N (0, σ²_u) and εijt ∼ N (0, σ²), all independent.
title ’Linear regression (with random Pig effect)’;
title2 ’Treatment specific parameters’;
proc mixed data=CuFeed;
class Cu Pig;
model Weight = Cu Cu*Time /noint solution outp=R1 ;
random Cu*Pig;
run;

October 4, 2001 Mixed Models Course 3

Plot the curves of residuals:


symbol i=j;
proc gplot data=R1;
by Cu;
plot resid*Time=Pig;
run;
[Plots of the residuals against Time for each pig, one panel per treatment group Cu = 1, 2, 3]

The “residual curves” do not look random.


October 4, 2001 Mixed Models Course 4

277
16 Random Regression

Second idea: fit individual linear regression model (with random pig
effect):
yijt = αi + βij t + Uij + εijt

where i is treatment, j is subject (pig) within treatment, t is time, and Uij ∼ N (0, σ²_u) and εijt ∼ N (0, σ²), independent.
title ’Individual linear regressions (with random Pig effect)’;
proc mixed data=CuFeed;
class Cu Pig;
model Weight = Cu Cu*Pig*Time /noint solution outp=R2;
random Cu*Pig;
ods output solutionf=sf2;
proc gplot data=R2;
by Cu;
plot resid*Time=Pig;
run;

October 4, 2001 Mixed Models Course 5

[Plots of the residuals against Time for each pig under the individual-slope model, one panel per treatment group Cu = 1, 2, 3]

The “residual curves” now look much more random.

This approach gives a whole lot of parameter estimates βij , where i


refers to treatment and j to individual within treatment.

How to proceed with the analysis?


October 4, 2001 Mixed Models Course 6

278
Analyzing the Individual Regression Coefficients

Frequently the task is to estimate the effect of time for each


treatment.

A tempting (and classical) way of doing this is to continue analyzing


the βij s.

For example, β̄i· = (1/J) Σ_j βij is the average slope within treatment i.

The analysis could then proceed by comparing β̄1. , β̄2. and β̄3. in
some way.

Yet - it is somewhat unsatisfactory to first estimate the βij s as systematic effects and then afterwards analyze these as if they were random quantities.

October 4, 2001 Mixed Models Course 7

October 4, 2001 Mixed Models Course 8

279
16 Random Regression

Some graphics of the βij s:


[Histograms and normal Q–Q plots of the estimated slopes Time*Cu*Pig, one row per treatment group Cu = 1, 2, 3]

October 4, 2001 Mixed Models Course 9

Random Regression

A random regression model is an alternative:

yijt = αi + βi t + Uij + Bij t + εijt

The systematic effects are as usual.

The random effects are Uij ∼ N (0, σ²_U), Bij ∼ N (0, σ²_B) and εijt ∼ N (0, σ²).

It is assumed that εijt is independent of Uij and of Bij, but it need not be assumed that Uij and Bij are independent.
October 4, 2001 Mixed Models Course 10

280
Hence

• βi is the population slope for pigs receiving the ith treatment.

• Bij describes individual random deviations from the population


slope.

In this way systematic and random variation of the regression


coefficients can be separated.

October 4, 2001 Mixed Models Course 11

Just like the parameter estimates in a regression usually are


correlated, then so might the random effects Uij and Bij also be.

To obtain such flexibility, we assume

( Uij )        ( 0 )   [ σ²_U    σ_UB ]
(     ) ∼ N2 ( (   ) , [               ] )
( Bij )        ( 0 )   [ σ_UB    σ²_B ]

If σU B = 0 then Uij and Bij are independent.

October 4, 2001 Mixed Models Course 12

281
16 Random Regression

How to ... In SAS

Independence:
title ’Random regression model (with random Pig effect)’;
title2’Independent intercepts and slopes’;
proc mixed data=CuFeed;
class Cu Pig;
model Weight = Cu Cu*Time / ddfm=satterth noint solution outp=R3;
random int Time / sub=Pig type=vc solution;
ods output solutionf=sf3;
ods listing exclude solutionr;
ods output solutionr=sr3;
run;

Independence of Uij and Bij is obtained by type=vc in the RANDOM


statement.

October 4, 2001 Mixed Models Course 13

Dependence:
title ’Random regression model (with random Pig effect)’;
title2’Dependent intercepts and slopes’;
proc mixed data=CuFeed;
class Cu Pig;
model Weight = Cu Cu*Time / ddfm=satterth noint solution outp=R4;
random int Time / sub=Pig type=un solution;
ods output solutionf=sf4;
ods listing exclude solutionr;
ods output solutionr=sr4;
run;

Dependence of Uij and Bij is obtained by type=un in the RANDOM


statement.

October 4, 2001 Mixed Models Course 14

282
Inference

In connection with random regression models we recommend always


using the ddfm=satterth option for estimating the degrees of
freedom.

Contrast etc. can be obtained as follows:


proc mixed data=CuFeed;
class Cu Pig;
model Weight = Cu Time Cu*Time / ddfm=satterth solution outp=R3;
random int Time / sub=Pig type=vc solution;
lsmeans Cu / diff;
estimate ’Slope: Cu1 vs Cu2’ Cu*Time 1 -1 0;
estimate ’Slope: Cu1 vs Cu3’ Cu*Time 1 0 -1;
estimate ’Slope: Cu2 vs Cu3’ Cu*Time 0 1 -1;

October 4, 2001 Mixed Models Course 15

run;

October 4, 2001 Mixed Models Course 16

283
16 Random Regression

When a random regression coefficient is present in the model, then


it is important that the model also contains a random intercept.

To see why consider the random regression model

yijt = αi + βi t + Uij + Bij t + εijt

Suppose that the scale of time t is changed to t′ = c1 t + c2. Then it would be very desirable to obtain the same result whether t or t′ was used as time in the regression.

October 4, 2001 Mixed Models Course 17

Now we use t′ in a random regression model without random intercept:

yijt = αi + βi t + Bij t′ + εijt
     = αi + βi t + Bij (c1 t + c2) + εijt
     = αi + βi t + (Bij c1 t) + (Bij c2) + εijt

Hence Bij c2 will play the role of a random intercept.

In other words, the presence of a random intercept is a matter of the scale on which t is measured.

Likewise, in a polynomial regression involving t²: If there is a random regression coefficient for t² then there must also be a random regression coefficient for t and a random intercept.

October 4, 2001 Mixed Models Course 18
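A sketch of how such a quadratic random regression could be specified (the data set and variable names follow the CuFeed example; Time2 is a hypothetical variable created in a preceding data step):

data CuFeed2;
  set CuFeed;
  Time2 = Time*Time;     /* squared time for the quadratic term */
run;

proc mixed data=CuFeed2;
  class Cu Pig;
  model Weight = Cu Cu*Time Cu*Time2 / ddfm=satterth solution;
  random int Time Time2 / sub=Pig type=un solution;
run;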

October 4, 2001 Mixed Models Course 19

Correlation structure in Random Regression Models

Consider again the random regression model

yijt = αi + βi t + Uij + Bij t + εijt

and assume for simplicity that Uij and Bij are independent.

The variance of Yijt is

Var(Yijt) = σ²_U + σ²_B t² + σ²_e

For later use let V_t = σ²_U + σ²_B t².

October 4, 2001 Mixed Models Course 20

285
16 Random Regression

Next consider the variance at time t + k:

Var(Yij(t+k)) = σ²_U + σ²_B (t + k)² + σ²_e = V_{t+k} + σ²_e
             = σ²_U + t²σ²_B + k(2t + k)σ²_B + σ²_e
             = V_t + k(2t + k)σ²_B + σ²_e

The covariance between Yijt and Yij(t+k) is

Cov(Yijt, Yij(t+k)) = Cov(Uij + Bij t + εijt, Uij + Bij (t + k) + εij(t+k))
                   = Var(Uij) + Cov(Bij t, Bij (t + k))
                   = σ²_U + t(t + k)σ²_B
                   = [σ²_U + t²σ²_B] + tkσ²_B = V_t + tkσ²_B

October 4, 2001 Mixed Models Course 21

In total

Var(Yijt) = V_t + σ²_e
Var(Yij(t+k)) = V_t + k(2t + k)σ²_B + σ²_e
Cov(Yijt, Yij(t+k)) = V_t + tkσ²_B

Hence the correlation is

Corr(Yijt, Yij(t+k)) = (V_t + tkσ²_B) / √( (V_t + σ²_e)(V_t + k(2t + k)σ²_B + σ²_e) )

Now consider a fixed t. The numerator is a linear function in k while the denominator is a quadratic function in k.
October 4, 2001 Mixed Models Course 22

286
Hence we know from high school mathematics that

Corr(Yijt, Yij(t+k)) → 0

as k (i.e. the time span between Yijt and Yij(t+k)) goes to infinity.

In other words, under the random regression model, the correlation decreases with distance in time.

That is an appealing property of the model!

October 4, 2001 Mixed Models Course 23

17 Factor Structure Diagrams

The discussion with participants during the previous lectures had shown the need for an independent means of checking the degrees of freedom in the F-tests in PROC MIXED. The methods for calculating degrees of freedom (option ddfm) are not fool-proof. The containment method may lead to errors if the experimental design cannot be deduced from the model specification, and the Satterthwaite method is erroneous if one of the variance components is estimated as 0.
Therefore, the factor structure diagram method was presented, supplemented with an exercise.
Link to the full screen presentation1

1
http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/FactorStructure.f.pdf

289
17 Factor Structure Diagrams

Factor Structure Diagrams

Factor structure diagrams are a way of representing certain factorial designs, including block experiments, split plot experiments etc.

• With such diagrams, it is for certain balanced cases easy to calculate


the correct degrees of freedom for the tests.

• It is also for certain balanced cases easy to identify which “error” an


effect is to be “tested against”.

April 17, 2001 1

However

• it is a somewhat restricted class of models that can be appropriately


represented this way.

• the degree of freedom calculations are not correct in unbalanced cases

• It is a very comprehensive task to describe the class of designs for which


factor structure diagrams can be used

Nonetheless, they are quite useful...

April 17, 2001 2

290
Two–way ANOVA with Replicates

Factors A and B have a and b levels. Replicates within each


combination A × B are denoted by the factor R with r levels.

That is, there are abr units in the experiment

The usual two–factor ANOVA model is

yabr = µ + αa + βb + (αβ)ab + εabr

The model can be represented in a factor structure diagram


April 17, 2001 3

                                            A^{a}_{a-1}
[ABR]^{abr}_{abr-ab} --- AB^{ab}_{ab-a-b+1}                --- O^{1}_{1}
                                            B^{b}_{b-1}

(superscript = number of levels, subscript = degrees of freedom; the lines of the
original diagram connect [ABR] to AB, AB to A and B, and A and B to O)

• The term O is to be identified with µ

• The term A is to be identified with αa

• The term AB is to be identified with (αβ)ab etc.

• Terms in [. . . ] are random effects.


April 17, 2001 4

291
17 Factor Structure Diagrams

Calculating the degrees of freedom

1. Fill in the levels of the factors as superscripts.

2. Then calculate the degrees of freedom (DF) recursively from right to left:
   The DF for O is 1.
   The DF for A is a minus the sum of the DFs of the factors pointing towards A in the diagram, i.e.
a−1=a−1

3. Proceed like this towards left in the diagram: The DF for AB are

ab − (a − 1) − (b − 1) − 1 = ab − a − b + 1
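As a small worked check with the values used in the simulation below (a = 4, b = 2, r = 3): DF(A) = 4 − 1 = 3, DF(B) = 2 − 1 = 1, DF(AB) = 8 − 3 − 1 − 1 = 3, and DF([ABR]) = 24 − 8 = 16, which matches the numerator and denominator degrees of freedom in the PROC MIXED output shown next.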

April 17, 2001 5

“Proof that it works...”


%let a=4; %let b=2; %let r=3;
title ’Two-way ANOVA with replicates’;
data data1;
do A=1 to &a;
do B=1 to &b;
do R=1 to &r;
y=rannor(0);
output;
end; end; end;

proc mixed data=data1 noinfo noclprint;


class A B R;
model Y = A B A*B;
run;

Type 3 Tests of Fixed Effects


Num Den
Effect DF DF F Value Pr > F

A 3 16 0.56 0.6470
B 1 16 1.54 0.2329
A*B 3 16 1.72 0.2021

April 17, 2001 6

292
Two–way ANOVA without Replicates
If there are no replicates within each combination of A and B (i.e.
r = 1), the model is

yab = µ + αa + βb + εab

since the interaction can not be estimated.

Following the lines from before, a diagram is


                                             A^{a}_{a-1}
[ABR]^{ab}_{ab-ab=0} --- AB^{ab}_{ab-a-b+1}                 --- O^{1}_{1}
                                             B^{b}_{b-1}

April 17, 2001 7

Another way of looking at it is by saying that the random error is the


interaction!!

So a more appropriate diagram is

                         A^{a}_{a-1}
[AB]^{ab}_{ab-a-b+1}                  --- O^{1}_{1}
                         B^{b}_{b-1}

April 17, 2001 8

293
17 Factor Structure Diagrams

“Proof that it works...”

title ’Two-way ANOVA without replicates’;


data data2;
do A=1 to &a;
do B=1 to &b;
y=rannor(0);
output;
end; end;
proc mixed data=data2 noinfo noclprint;
class A B;
model Y = A B;
run;

Type 3 Tests of Fixed Effects


Num Den
Effect DF DF F Value Pr > F

A 3 3 0.45 0.7377
B 1 3 0.05 0.8414

April 17, 2001 9

Block Experiments with Replicates within Blocks


If A is a (random) block effect and there are replicates of the factor B
within each block the model is

yabr = µ + Ua + βb + Vab + εabr

The diagram is

                                               [A]^{a}_{a-1}
[ABR]^{abr}_{abr-ab} --- [AB]^{ab}_{ab-a-b+1}                  --- O^{1}_{1}
                                               B^{b}_{b-1}

April 17, 2001 10

294
Note:

• The systematic effect B is to be tested against the random effect


closest to it in the diagram, i.e. [AB]

• Note that since A is a random effect, any factor containing A must


also be random.

April 17, 2001 11

“Proof that it works...”


title ’Block experiment with replicates within blocks’;
data data3;
do A=1 to &a;
U = rannor(0);
do B=1 to &b;
V = rannor(0);
do R=1 to &r;
y=rannor(0) + U + V;
output;
end; end; end;

proc mixed data=data3 noinfo noclprint;


class A B R;
model Y = B;
random A A*B;
run;

Type 3 Tests of Fixed Effects


Num Den
Effect DF DF F Value Pr > F

B 1 3 14.99 0.0305

April 17, 2001 12

295
17 Factor Structure Diagrams

Block Experiments without Replicates within Blocks


If A is a (random) block effect and there are no replicates of the factor
B within each block the model is

yab = µ + Ua + βb + εab

The diagram is

                          [A]^{a}_{a-1}
[AB]^{ab}_{ab-a-b+1}                     --- O^{1}_{1}
                          B^{b}_{b-1}

April 17, 2001 13

“Proof that it works...”


title ’Block experiment without replicates within blocks’;
data data4;
do A=1 to &a;
U = rannor(0);
do B=1 to &b;
y=rannor(0) + U;
output;
end; end;
proc mixed data=data4 noinfo noclprint;
class A B;
model Y = B;
random A;
run;

Type 3 Tests of Fixed Effects


Num Den
Effect DF DF F Value Pr > F

B 1 3 3.30 0.1671

April 17, 2001 14

296
Split Plot Experiment
Let A denote the whole–plot treatment and B the split–plot treatment.
Replicate units within A are denoted by R.

The model is:

yabr = µ + αa + Uar + βb + (αβ)ab + εabr

                              [AR]^{ar}_{ar-a}        A^{a}_{a-1}
[ABR]^{abr}_{abr-ar-ab+a}                                              O^{1}_{1}
                              AB^{ab}_{ab-a-b+1}      B^{b}_{b-1}

(lines connect [ABR] to [AR] and AB, [AR] to A, AB to A and B, and A and B to O)

April 17, 2001 15

“Proof that it works”


title ’Split plot experiment’;
%let a=4; %let b=3; %let r=3;
data data5;
do A=1 to &a;
do R=1 to &r;
U = rannor(0);
do B=1 to &b;
y=rannor(0) + U;
output;
end; end; end;
proc mixed data=data5 noinfo noclprint;
class A B R;
model Y = A B A*B;
random A*R;
run;

Type 3 Tests of Fixed Effects


Num Den
Effect DF DF F Value Pr > F

A 3 8 0.68 0.5901
B 2 16 3.81 0.0444
A*B 6 16 2.57 0.0618

April 17, 2001 16

297
17 Factor Structure Diagrams

Split Plot Experiment – Homework

Let E and C be the vitamin E and copper treatments applied to R pigs


within each combination of E and C.

Let M denote the membrane.

Hence the model is

yecrm = µ + αe + βc + (αβ)ec + Uecr + γm + (αγ)em + (βγ)cm + (αβγ)ecm + εecrm

April 17, 2001 17

The factor structure diagram becomes

                                 CM^{cm}_{cm-c-m+1}            M^{m}_{m-1}
                                 ECM^{ecm}_{(e-1)(c-1)(m-1)}
[ECRM]^{ecrm}_{ec(rm-r-m+1)}     EM^{em}_{em-e-m+1}            E^{e}_{e-1}     O^{1}_{1}
     [ECR]^{ecr}_{ec(r-1)}       EC^{ec}_{ec-e-c+1}            C^{c}_{c-1}

(lines connect [ECRM] to ECM and [ECR]; ECM to CM, EM and EC; [ECR] to EC;
 CM to C and M; EM to E and M; EC to E and C; and E, C and M to O)

April 17, 2001 18

298
“Proof that it works”
title ’Split plot experiment - homework - with 3 membranes’;
%let sigma_G = 2;
%let sigma_M = 6;
%let sigma_E = 1;
data mem;
do cu= 1 to 2;
do e_vit= 1 to 2;
do grnr= 1 to 8;
U_g = &sigma_G * rannor(0);
do membran= 1 to 3;
V_m = &sigma_M * rannor(0);
do muskel= 1 to 2;
E = &sigma_E * rannor(0);
y = U_g + V_m + E;
output;
end;
end;
end;
end;
end;
data mem1; set mem(where=(muskel=1));

April 17, 2001 19

proc mixed data=mem1;


class cu e_vit membran grnr;
model y = cu | e_vit | membran ;
random cu*e_vit*grnr ;
run;

Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

cu 1 28 0.05 0.8316
e_vit 1 28 0.10 0.7489
cu*e_vit 1 28 1.55 0.2230
membran 2 56 0.10 0.9091
cu*membran 2 56 0.57 0.5708
e_vit*membran 2 56 1.26 0.2904
cu*e_vit*membran 2 56 1.16 0.3198

April 17, 2001 20

299
17 Factor Structure Diagrams

A Neat Little Exercise

1. Draw a factor structure diagram for the entire membrane experiment.

2. Compute the degrees of freedom for each test.

3. Verify by simulation that SAS does the right thing.

Hint: Use a BIG sheet of paper!

April 17, 2001 21

300
18 Covariate Models and Multivariate
Response

The use of covariates in mixed models is discussed, initially based on chapter 5 in LMSW (Littell
et al., 1996), i.e., model specification, comparison, and reduction.
Then it is shown that the covariate model may be naturally modified to include several dependent
variables, i.e., to a multivariate response model. The data manipulation steps in SAS are described and the necessary model specification is shown.
Link to full screen presentation1

1
http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/covariate.f.pdf

301
18 Covariate Models and Multivariate Response

Example of use of covariates

Exercise 1: Treatments copper and vitamin E each at three levels.


Litters as blocks. Dependent variables, daily gain (and feed intake).
Weight at start differed.

April 17, 2001 1

Plot

[Scatter plot of daily gain against start weight]
April 17, 2001 2

302
Plot

[Scatter plot of daily gain against start weight]
April 17, 2001 3

Yijk = (αγ)ij + Lk + βij wijk + εijk

• Yijk : Daily gain,
• wijk : weight at start,
• βij : regression coefficient for level ij of treatment,
• (αγ)ij : interaction between copper and vitamin E,
• Lk : random effect of litter (Lk ∼ N (0, σ²_L)),
• εijk : random residual, εijk ∼ N (0, σ²)

Model reduction ?

April 17, 2001 4

303
18 Covariate Models and Multivariate Response

Model reduction

Reformulate as an additive model and remove non-significant terms:

Yijk = (αγ)ij + Lk + βij wijk + εijk

(αγ)ij = µ + αi + γj + (αγ)′ij
βij = β0 + β1i + β2j + β′ij

April 17, 2001 5

Table 5:1 LMSW, page 5.2.2

1. Are all slopes = 0 ? If we fail to reject, go to step 2; else go to step 3.

2. Fit a common slope and test the hypothesis that it is 0. If we fail to reject, compare treatments using ANOVA; else use the parallel-lines model.

3. Test that the slopes are equal. If we fail to reject, use the common-slope model; if we reject, go to step 4.

4. Use the unequal-slopes model.

April 17, 2001 6

304
SAS-code

Step 1:

proc Mixed data=a;


class Kuld Evit Kobber ;
model Tilv= Evit*Kobber
Startv*Evit*Kobber /noint solution ;
random kuld ;

Step 3:

model Tilv= Evit Kobber Evit*Kobber


Startv Startv*Evit Startv*Kobber
Startv*Kobber*Evit ;

April 17, 2001 7

SAS-Anova

Type 3 Tests of Fixed Effects


Num Den
Effect DF DF F Value Pr > F

EVIT 2 34 0.54 0.5905


KOBBER 2 34 0.46 0.6333
EVIT*KOBBER 4 34 1.10 0.3740
STARTV 1 34 27.62 <.0001
STARTV*EVIT 2 34 0.79 0.4627
STARTV*KOBBER 2 34 0.55 0.5829
STARTV*EVIT*KOBBER 4 34 1.13 0.3572

April 17, 2001 8

305
18 Covariate Models and Multivariate Response

Plot

[Plot of daily gain against start weight]
April 17, 2001 9

Final Model
[Plot of daily gain against start weight under the final model]
April 17, 2001 10

306
Feed per day

[Plot of daily gain against feed per day]
April 17, 2001 11

Feed per day

[Plot of daily gain against feed per day]
April 17, 2001 12

307
18 Covariate Models and Multivariate Response

Feed per day

[Plot of daily gain against feed per day]
April 17, 2001 13

SAS-code

Test

proc Mixed data=a;


class Kuld Evit Kobber ;
model Tilv= Kobber
Fedag Fedag*Kobber ;
random kuld ;

Estimation:

model Tilv= Kobber


Fedag*Kobber /noint solution ;

April 17, 2001 14

308
Feed per day

[Plot of daily gain against feed per day]
April 17, 2001 15

The lines actually denote the conditional distribution of the daily gain given the feed intake, i.e.,

Yij = µ + βXij + εij

If both variables measure the effect of the treatment, the joint distribution may be more interesting.
There is a relatively simple relationship between the conditional and the joint distribution:

E(Xij) = µx
E(Yij) = µy = E(µ + βXij) = µ + βµx

April 17, 2001 16

309
18 Covariate Models and Multivariate Response

V(Xij) = σ²_x
V(Yij | Xij) = V(εij) = σ²_y − σ_yx σ_xy/σ²_x
C(Xij, Yij) = C(Xij, µ + βXij + εij) = β V(Xij) = β σ²_x
V(Yij) = σ²_ε + β² σ²_x

i.e., the joint distribution is

( Xij )       ( µx )   [ σ²_x      β σ²_x         ]
(     ) ∼ N ( (    ) , [                          ] )
( Yij )       ( µy )   [ β σ²_x    σ²_ε + β² σ²_x ]

Can this be generalised ?


April 17, 2001 17

Multivariate Responses

Consider a feeding experiment where a treatment factor A (say


supplement of copper) is applied to pigs.

Two responses are measured:

Y 1 : Weight gain

Y 2 : Feed intake

Hence the response is a two–dimensional vector Y = (Y 1, Y 2)>.

April 17, 2001 18

310
Return to the feeding experiment.

A model for each response Y^r, where r = 1, 2, could be

Y^r_ik = µ^r + α^r_i + ε^r_ik

where i = 1, . . . , I is treatment, k = 1, . . . , K is replicates within each treatment, and ε^r_ik ∼ N (0, σ²_r).

Hence all parameters µ^r, α^r_i, σ²_r are specific to the rth response.

April 17, 2001 21

The Components of a MLNM

For each response Y r it is assumed that E(Y r ) can be written as a


linear function of the explanatory variables.

In the example,
E(Yikr ) = δ r + αir

April 17, 2001 22

311
18 Covariate Models and Multivariate Response

It is assumed that the mean value has the same structure for each
response r made on the same unit.

In the example,

E(Yik ) = (E(Yik1 ), E(Yik2 )) = (δ 1 + αi1, δ 2 + αi2) = (µ1i , µ2i )

It is also assumed that the parameters β r and β s , relating to the rth
and the sth response respectively, have nothing in common.

In the example, this means that there are no restrictions on the


parameters of the form that e.g. αi1 and αi2 are restricted to being
identical.

April 17, 2001 23

The responses are possibly correlated. To account for this we allow
for a covariance matrix of the form

\[
\Sigma = C(Y_{ik}) = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{pmatrix}
\]

The model we consider can be briefly written

Yik = (Yik1 , Yik2 ) ∼ N2((µ1i , µ2i ), Σ)

If the vectors are regarded as row vectors, then it just looks like two
linear normal models appended to each other, with the extra finesse
that the two responses are allowed to be non–independent.
And - that is just what it is !
April 17, 2001 24

312
Such models can be dealt with in a mixed model setup.

The trick is to arrange the data in columns.

Suppose there are two treatments, i.e. i = 1, 2 and two pigs per
treatment, i.e. j = 1, 2.

Then there are 4 units in the experiment, each with two measurements,
giving altogether 8 measurements.

April 17, 2001 25

It is not very hard to see that the mean of each of these can be
written in the matrix form

\[
E\begin{pmatrix} Y_{11}^1 \\ Y_{11}^2 \\ Y_{12}^1 \\ Y_{12}^2 \\ Y_{21}^1 \\ Y_{21}^2 \\ Y_{22}^1 \\ Y_{22}^2 \end{pmatrix}
=
\begin{pmatrix}
1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 \\
1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 1
\end{pmatrix}
\begin{pmatrix} \delta^1 \\ \alpha_1^1 \\ \alpha_2^1 \\ \delta^2 \\ \alpha_1^2 \\ \alpha_2^2 \end{pmatrix}
\]

April 17, 2001 26

313
18 Covariate Models and Multivariate Response

The covariance matrix is easy to specify too: The units are assumed
independent, and hence the covariance between measurements on
different units is zero.

The covariance structure for measurements on the same unit


together with the variances are described in the 2 × 2 matrix Σ.

April 17, 2001 27

For all measurements, the covariance matrix is therefore the 8 × 8
matrix

\[
C\begin{pmatrix} Y_{11}^1 \\ Y_{11}^2 \\ Y_{12}^1 \\ Y_{12}^2 \\ Y_{21}^1 \\ Y_{21}^2 \\ Y_{22}^1 \\ Y_{22}^2 \end{pmatrix}
=
\begin{pmatrix}
\Sigma & 0_2 & 0_2 & 0_2 \\
0_2 & \Sigma & 0_2 & 0_2 \\
0_2 & 0_2 & \Sigma & 0_2 \\
0_2 & 0_2 & 0_2 & \Sigma
\end{pmatrix}
\]

where 02 is the 2 × 2 matrix consisting exclusively of 0s.

April 17, 2001 28

314
How to ... In SAS

A brief outline about how to work with such problems in SAS.

The response variables are stacked on top of each other in a variable


called Y.

Let R be another variable with levels, say W and I indicating whether


the corresponding measurement in Y is a measurement of weight or
feed intake.

Let K be a variable identifying the subjects (within the treatment),


and let A be the treatment factor.

Then the following SAS program would do the trick:


April 17, 2001 29

proc mixed data=...;
  class R K A;
  model Y = R R*A / noint ddfm=satterth ...;
  repeated R / subject=K*A type=un;
run;

In the REPEATED statement the subject option specifies the blocks
of the covariance matrix (in the example there are 4 blocks).

The option type=un specifies that the blocks should be completely
unstructured.

The variable R in the REPEATED statement is used for identifying the
different response types.
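
As a supplement (not part of the original slides), a minimal sketch of how the stacking could
be done in a SAS data step is shown below. The dataset and variable names (wide, Pig, Gain,
Intake) are hypothetical; the point is only that each record in the wide data (one per pig)
becomes two records in the long data, with R indicating the response type.

data long;
  set wide;                      /* one record per pig: A, Pig, Gain, Intake */
  K = Pig;                       /* subject identifier used in subject=K*A   */
  R = 'W'; Y = Gain;   output;   /* record for weight gain                   */
  R = 'I'; Y = Intake; output;   /* record for feed intake                   */
  keep A K R Y;
run;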

April 17, 2001 30

315
18 Covariate Models and Multivariate Response

The General Setup

More generally,

  E(Yjr ) = xj> β r

where xj are covariates for the jth experimental unit and β r is a
vector of parameters establishing the connection between E(Yjr ) and xj .

More generally,

  E(Yj ) = (E(Yj1 ), E(Yj2 ), . . . , E(YjR )) = xj> [β 1 : β 2 : · · · : β R ] = xj> B

Hence B = [β 1 : β 2 : · · · : β R ] is now a matrix of parameters where
the rth column contains the parameters associated with the rth response.
April 17, 2001 31

If we let Yj = (Yj1 , . . . , YjR ) be a row vector, then

  E(Yj ) = xj> B

is also a row vector.

If the rows of data from all n units are stacked on top of each other
we obtain an n × R matrix

\[
Y = \begin{pmatrix}
Y_1^1 & Y_1^2 & \dots & Y_1^R \\
Y_2^1 & Y_2^2 & \dots & Y_2^R \\
\vdots & \vdots & & \vdots \\
Y_n^1 & Y_n^2 & \dots & Y_n^R
\end{pmatrix}
\]

Similarly the covariates xj> can be stacked on top of each other to
give a design matrix X (with dimension n × p) in the usual way.
April 17, 2001 32

316
The previous considerations then give that

  E(Y )   =   X       B
  (n × R)    (n × p)  (p × R)

i.e. the mean is now organized as a matrix rather than as a vector.
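
To make the layout concrete (this illustration is not in the original slides), in the feeding
example with two treatments and two responses the parameter matrix is

\[
B = [\,\beta^1 : \beta^2\,] =
\begin{pmatrix}
\delta^1 & \delta^2 \\
\alpha_1^1 & \alpha_1^2 \\
\alpha_2^1 & \alpha_2^2
\end{pmatrix},
\]

so that for a unit j receiving treatment i, xj> is (1, 1, 0) or (1, 0, 1), and xj> B reproduces
the pair of means (δ 1 + αi1 , δ 2 + αi2 ).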

April 17, 2001 33

317
18 Covariate Models and Multivariate Response

318
19 Heterogeneous Variance

The purpose of this lecture was to present why it is important to recognize variance heterogeneity,
how to model such heterogeneity, and the consequences of different modelling approaches. The
lecture extends the description in chapter 8 of LMSW (Littell et al., 1996).
Graphical techniques for finding suitable models of variance heterogeneity are presented, and
variance functions including the power family are introduced. In addition, the effect of
transformation is illustrated.
Link to full-screen presentation1

1
http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/VarianceStructure.f.pdf

319
19 Heterogeneous Variance

Why Variance Heterogeneity is Important to


Recognize

Frequently the usual assumptions about variance homogeneity are


not met in practice. In that case the variance is said to be
heterogeneous.

One reason for incorporating variance heterogeneity in the model is


the ability to

• downweight portions of data which are highly variable, and

• extract more information from portions of the data which are more
precise.
October 18, 2001 Mixed Models Course 1

As always there is a price to pay:

• The models become less parsimonious in terms of the number of


parameters.

• Fitting the models can be more difficult (numerical problems).

• Usually, only asymptotic inference can be carried out (i.e. no exact


F–tests etc.)

• Model control becomes more complicated.

October 18, 2001 Mixed Models Course 2

320
Graphical Investigation of the Variance Structure

Frequently there is some structure in the way in which the variance
is non–constant:

Frequently the variance increases when the mean increases.

That is, the variance is a function of the mean, symbolically

Var(Y ) = f (E(Y ))

With grouped data, the variance function can sometimes be


identified.
October 18, 2001 Mixed Models Course 3

Example 1. One-way ANOVA:

  Ykl = αk + εkl

where εkl ∼ N (0, σk2 ) for some treatments k = 1, 2, . . . , K and
replicates within treatments l = 1, 2, . . . , Lk .

Good estimates for the mean and variance in the kth group are

• Mean: ȳk.
• Variance: s2k = (1/(Lk − 1)) Σl (ykl − ȳk. )2

A reasonable idea is to plot s2k against ȳk. to see if the variance is a


function of the mean. f in
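
A small sketch (not from the original course material) of how such a plot could be produced
in SAS is given below; the dataset and variable names (a, trt, y) are hypothetical.

proc means data=a noprint;
  class trt;                         /* one group per treatment               */
  var y;
  output out=grp mean=ybar var=s2;   /* group means and variances             */
run;

data grp;
  set grp;
  if _type_ = 1;                     /* keep only the per-treatment rows      */
  logm = log(ybar);                  /* log of the group mean                 */
  logv = log(s2);                    /* log of the group variance             */
run;

proc gplot data=grp;                 /* plot log-variance against log-mean    */
  plot logv*logm;
run;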

October 18, 2001 Mixed Models Course 4

321
19 Heterogeneous Variance

Variance Functions

After having found that the variance is non–constant, the next step
is to look for some structure in which it is non–constant.

This is obtained by considering a particular function for the variance


as a function of the mean.

Frequently in practice one works with the variance function

Var(Y ) = σ 2µθ

where µ = E(Y ), and σ 2 and θ are unknown constants.

Variance functions of this form are called the power family.

October 18, 2001 Mixed Models Course 5

With
Var(Y ) = σ 2µθ
we have a linear relationship on the log–scale:

log Var(Y ) = log σ 2 + θ log µ

Therefore, in the ANOVA example the natural thing to do is to plot


log s2k against log ȳk. and see if the relationship is approximately
linear.

If so, it may be reasonable to assume that we are within the power
family of variance functions – and this is a nice family, as shall soon
be shown.
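
Continuing the hypothetical sketch from above (not part of the original slides), a rough
estimate of θ is then simply the slope in a regression of log s2k on log ȳk. :

proc reg data=grp;
  model logv = logm;   /* slope estimates theta, intercept estimates log(sigma^2) */
run;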

October 18, 2001 Mixed Models Course 6

322
Example 2. A substance X14 has been added in the concentration
fod ∈ {0.0, 4.4, 6.2, 9.3} to the food for some pigs. The pigs are
fed (up!) with this food until their weight is 60 kg. From then on,
and until they are slaughtered at 100 kg, their food does not contain
the substance.

At 60kg (sample=1) and 100kg (sample=2) muscle biopsies are made


and the concentration of the substance is determined.
[Figure: concentration of X14 plotted against fod; 1 = 60 kg sample, 2 = 100 kg sample]

October 18, 2001 Mixed Models Course 7

Plots of the individual points and of log–variance against log–mean indicate
that the variance increases with the mean:

[Figure: X14 against fod for Sample 1 and Sample 2, and log-variance against
log-mean with estimated slope 1.23 (0.25)]

• One possibility is a linear increase with the slope being ≈ 1.


• Another is that there are two variances: one when fod = 0 and
another one when fod ≠ 0.

f in

October 18, 2001 Mixed Models Course 8

323
19 Heterogeneous Variance

From hereof there are different possibilities:

• Transform data onto a scale where the variance is (approximately)


constant

• Include the heterogeneous variance explicitly in the model

October 18, 2001 Mixed Models Course 9

The Delta–method

First we consider transformation of data onto a scale where the


variance is approximately constant.

Let Y be a random variable and let h() be a nice function, e.g.

  h(y) = √y,  h(y) = y 2 ,  h(y) = log y.

We shall investigate the properties of the transformed random


variable Z where
Z = h(Y )

October 18, 2001 Mixed Models Course 10

324
Example 3. Let Y ∼ N (µ, σ 2). If h is linear, i.e. h(y) = α + βy,
then it is well known that

Z = h(Y ) ∼ N (α + βµ, β 2σ 2)

If h is non–linear, e.g. if h(y) = log y then Z is not normally


distributed. f in

• However, Z = h(Y ) will in certain cases be approximately normal


if Y is normal.

• Moreover, one can find the approximate mean and variance of Z


independently of whether Y is normal or not.

October 18, 2001 Mixed Models Course 11

Taylors Approximation

The road to these results can be based on the following argument:

Let x0 and x be two numbers (not too far apart) and assume that h
is “nice” (i.e. differentiable).

Then it is well known from high school that

h(x) ≈ h(x0) + h0(x0)(x − x0).

The further x is from x0 the worse is this approximation.

This approximation is frequently called a Taylor expansion of h


around x0.
October 18, 2001 Mixed Models Course 12

325
19 Heterogeneous Variance

First order Taylor approximation

[Figure: a function f(x) and its first-order Taylor approximation around x0 ]

h(x) ≈ h(x0) + h0(x0)(x − x0).

October 18, 2001 Mixed Models Course 13

Applying Taylors Approximation

Taylors approximation is now applied to the random variable Y with


mean µ = E(Y ) and variance σ 2 = Var(Y ).

The approximation is around µ. We then get

Z = h(Y ) ≈ h(µ) + h0(µ)(Y − µ).

• Hence, when Y is “close to” µ, then h(Y ) is approximately a linear


function of Y .

• Y “being close to” µ means basically that σ 2 has to be small.

October 18, 2001 Mixed Models Course 14

326
• From the approximation

Z = h(Y ) ≈ h(µ) + h0(µ)(Y − µ).

we also conclude that

E(Z) = E(h(Y )) ≈ h(µ)


Var(Z) = Var(h(Y )) ≈ h0(µ)2 Var(Y )

• Hence, if Y is normal then it follows that Z must also be


approximately normal since Z is an approximately linear function
of Y . In this case we therefore conclude

Z = h(Y ) ≈ N (h(µ), h0(µ)2σ 2).
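
As an illustration (not in the original slides), the approximation can be checked by a small
simulation in SAS. With h(y) = log y we have h0(µ) = 1/µ, so the approximate variance of
Z = log Y is σ 2 /µ2 . The values µ = 12 and σ = 1 below are chosen arbitrarily.

data sim;
  mu = 12; sigma = 1;
  do i = 1 to 10000;
    y = mu + sigma*rannor(2001);     /* Y ~ N(mu, sigma^2)            */
    z = log(y);                      /* transformed variable Z = h(Y) */
    output;
  end;
run;

proc means data=sim mean var;
  var z;       /* compare with h(mu) = log(12) = 2.48 and (sigma/mu)^2 = 0.0069 */
run;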


October 18, 2001 Mixed Models Course 15

It must be emphasized that these results are asymptotic results.
How good they are depends on many things, including

• the variance of Y , i.e. how close Y –values tend to be to µ

• the form of h – how “smooth” (that is, how close to being linear)
h is.

October 18, 2001 Mixed Models Course 16

327
19 Heterogeneous Variance

Transformation of Data

The previous results can sometimes be used for identifying


transformations of data onto a scale where the variance is constant.

It is assumed in the following that

E(Yi) = µi and V ar(Yi) = σ 2µθi .

By plotting log–variance against log–mean one can frequently get a


good estimate of θ, and from that one can (sometimes) identify an
appropriate transformation.

October 18, 2001 Mixed Models Course 17

We look for a function h such that Z = h(Y ) has constant variance σZ2 :

• From the previous section we have

  σZ2 = Var(h(Y )) ≈ h0(µ)2 Var(Y ) = h0(µ)2 σ 2 µθ

• If we solve for h0 we get

  h0(µ) ≈ √(σZ2 /σ 2 ) µ−θ/2

• For later use let c = √(σZ2 /σ 2 ). Hence we look for a function h which
satisfies that its derivative is

  h0(µ) = c µ−θ/2 .

October 18, 2001 Mixed Models Course 19

Such an equation is called a differential equation.

The search for h has to be taken in two steps:

When θ = 2: Then h0(µ) = c/µ, and high school knowledge tells us
that the solution is the natural logarithm, i.e.

  h(µ) = c log(µ).

When θ ≠ 2: In this case we need the anti–derivative of a simple
power function. It is then well known from high school that

  h(µ) = c (2/(2 − θ)) µ(2−θ)/2 .
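
As a quick check (this remark is not in the original slides), taking θ = 1 – the Poisson–like
case mentioned below – the general formula gives the familiar square–root transformation:

\[
h(\mu) = c\,\frac{2}{2-\theta}\,\mu^{(2-\theta)/2}\Big|_{\theta=1} = 2c\,\mu^{1/2},
\]

i.e. up to the constant 2c the transformation is h(y) = √y.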

October 18, 2001 Mixed Models Course 20

329
19 Heterogeneous Variance

With Var(Y ) = σ 2µθ there are some well known special cases:

• Note that θ = 0 implies that Var(Y ) = σ 2 .
  (As is the case in Linear Normal Models.)

• Note that σ 2 = θ = 1 implies that Var(Y ) = µ.
  (As is the case in the Poisson distribution.)

• Note that θ = 2 implies that Var(Y ) = σ 2 µ2 .
  (I.e. the coefficient of variation is constant, as is the case in the
  Gamma distribution.)

October 18, 2001 Mixed Models Course 21

Modelling Variance Heterogeneity

As has been seen, transformation of data in an attempt to obtain
variance homogeneity can be a mixed blessing:

• the transformation can ruin the linearity of the mean structure.

• it can be very difficult to report contrasts and their standard errors
on the original scale.

An attractive alternative to transformation is therefore to include


variance heterogeneity in the model.
October 18, 2001 Mixed Models Course 22

330
Consider the pig–feeding example from before and the model

  yis = α + βxi + βs xi + εis

where i is pig, s is sample and xi is the dose given to the ith pig.

• if εis ∼ N (0, σ 2 ) then it is a LNM, i.e. variance homogeneity is assumed.

• if εis ∼ N (0, σx2i ) then we accommodate different variances
corresponding to different doses of x. (Recall that xi can assume
4 different values, so there are 4 different variance parameters.)

• if εis ∼ N (0, σ12 ) when xi = 0.0 and εis ∼ N (0, σ22 ) when xi ≠ 0.0
there are two different variance parameters in the model.
October 18, 2001 Mixed Models Course 23

• if εis ∼ N (0, σx2i ,s ) then we accommodate different variances
corresponding to different doses of x and to the different samples
(hence there are 8 different variance parameters).

October 18, 2001 Mixed Models Course 24

331
19 Heterogeneous Variance

Fitting the models in PROC MIXED:

data biopsi; set biopsi;
  fod_c = fod;
  if fod = 0.0 then fod_c2 = 1;
  else fod_c2 = 2;
run;

title ’Variance homogeneity’;


proc mixed data=biopsi;
class sample fod_c fod_c2;
model x14=fod fod*sample / ddfm=satterth chisq solution outp=o1;
ods output solutionf=sf1;

title ’Variance heterogeneity, 4 variances’;


proc mixed data=biopsi;
class sample fod_c fod_c2;
model x14=fod fod*sample / ddfm=satterth chisq solution outp=o2;
ods output solutionf=sf2;
repeated fod_c/ type=un(1);

October 18, 2001 Mixed Models Course 25

title ’Variance heterogeneity, 2 variances’;


proc mixed data=biopsi;
class sample fod_c fod_c2;
model x14=fod fod*sample / ddfm=satterth chisq solution outp=o3;
ods output solutionf=sf3;
repeated fod_c2/ type=un(1);
run;

October 18, 2001 Mixed Models Course 26

332
Parts of the SAS output is
Variance homogeneity: Residual 0.1262
-2 Res Log Likelihood 51.6
AIC (smaller is better) 53.6
AICC (smaller is better) 53.7
BIC (smaller is better) 55.4

Variance heterogeneity, 4 variances: Cov Parm Estimate


-2 Res Log Likelihood 39.1 UN(1,1) 0.02512
AIC (smaller is better) 47.1 UN(2,2) 0.08855
AICC (smaller is better) 48.0 UN(3,3) 0.1491
BIC (smaller is better) 54.6 UN(4,4) 0.2481

Variance heterogeneity, 2 variances: Cov Parm Estimate


-2 Res Log Likelihood 41.8 UN(1,1) 0.02517
AIC (smaller is better) 45.8 UN(2,2) 0.1592
AICC (smaller is better) 46.1
BIC (smaller is better) 49.6

October 18, 2001 Mixed Models Course 27

The parameter estimates are:


Effect sample Estimate StdErr DF tValue Probt model

Intercept 0.3130 0.09145 46 3.42 0.0013 varhomo


fod 0.1453 0.01735 46 8.38 <.0001 varhomo
fod*sample 1 0.2433 0.01689 46 14.40 <.0001 varhomo
fod*sample 2 0 . . . . varhomo

Intercept 0.2608 0.04468 12.1 5.84 <.0001 varhet1


fod 0.1546 0.01552 41 9.96 <.0001 varhet1
fod*sample 1 0.2489 0.01985 33.3 12.54 <.0001 varhet1
fod*sample 2 0 . . . . varhet1

Intercept 0.2620 0.04489 11.9 5.84 <.0001 varhet2


fod 0.1524 0.01466 44.9 10.39 <.0001 varhet2
fod*sample 1 0.2432 0.01897 34.9 12.82 <.0001 varhet2
fod*sample 2 0 . . . . varhet2

October 18, 2001 Mixed Models Course 28

333
19 Heterogeneous Variance

Heterogeneous Variance for Grouped Data

Example 4. Example 8.2 from LMSW, p. 268.

• The response is the ultrafiltration rate UFR (in ml/hr) of 20 high


flux membrane dialyzers measured at 7 different transmembrane
pressures TMP.

• The measurements are made in vivo and the aim is to characterize


the ultrafiltration characteristics of the membranes.

• The dialyzers are evaluated in vitro using bovine blood and flow
rates QB of either 200 or 300 dl/min.

October 18, 2001 Mixed Models Course 29

[Figure: UFR plotted against TMP for QB = 200 (left) and QB = 300 (right)]

• Plots suggest inhomogeneous variance, and more specifically that


variance increases with the mean.

• The plot also suggest that there might be individual curves for each
membrane, i.e. to consider random regression coefficient models.

October 18, 2001 Mixed Models Course 30

334
The starting point is the 4th degree polynomial model

  yimj = β0 + τi + (β1 + δ1i )ximj + (β2 + δ2i )x2imj
         + (β3 + δ3i )x3imj + (β4 + δ4i )x4imj + εimj

where x is TMP, i denotes QB–level, m is membrane within QB–
level, and j is the jth measurement on the membrane to which the
measurement ximj is associated.

There are 7 measurements on each membrane, so a crude starting
point could be to assume that εim = (εim1 , . . . , εim7 ) follows a
7–dimensional normal distribution,

  εim ∼ N (0, R)

where R is an unstructured 7 × 7 covariance matrix.


October 18, 2001 Mixed Models Course 31

The SAS program employed by LMSW for fitting this model is


proc mixed data=dial;
class qb sub;
model ufr = tmp|tmp|tmp|tmp qb|tmp|tmp|tmp|tmp;
repeated / type=un subject=sub r rcorr;
ods output r=r rcorr=rcorr;
run;

With this program the data are treated as being equidistant in TMP, i.e. the
actual difference between two TMP–measurements is not accounted for.

This becomes transparent if the program is rewritten as


proc mixed data=dial;
class qb sub index;
model ufr = tmp|tmp|tmp|tmp qb|tmp|tmp|tmp|tmp;
repeated index / type=un subject=sub r rcorr;
ods output r=r rcorr=rcorr;
run;

October 18, 2001 Mixed Models Course 32

335
19 Heterogeneous Variance

Some of the SAS output is

Estimated Covariance matrix


2.76 2.90 3.57 3.04 0.36 0.46 0.64
2.90 5.10 6.40 6.38 4.13 3.32 1.16
3.57 6.40 11.15 12.46 8.33 5.44 4.02
3.04 6.38 12.46 18.54 13.38 10.90 7.68
0.36 4.13 8.33 13.38 17.71 13.83 12.04
0.46 3.32 5.44 10.90 13.83 20.31 11.33
0.64 1.16 4.02 7.68 12.04 11.33 19.67

Estimated Correlation matrix


1.00 0.77 0.64 0.43 0.05 0.06 0.09
0.77 1.00 0.85 0.66 0.43 0.33 0.12
0.64 0.85 1.00 0.87 0.59 0.36 0.27
0.43 0.66 0.87 1.00 0.74 0.56 0.40
0.05 0.43 0.59 0.74 1.00 0.73 0.65
0.06 0.33 0.36 0.56 0.73 1.00 0.57
0.09 0.12 0.27 0.40 0.65 0.57 1.00

October 18, 2001 Mixed Models Course 33

f in

• Note that with the model above there are 7 × 8/2 = 28 parameters
in the covariance matrix.

• The variances increase with TMP, and hence the covariances increase
with the differences in TMP.

• Yet, the correlations decrease with the difference in TMP.

• We seek a more parsimonious model describing this correlation
structure.

October 18, 2001 Mixed Models Course 34

336
• A simple AR(1) model in which the ijth element of R is

Rij = σ 2ρ|i−j|

(which has 2 parameters) will clearly not fit to these data.

• A more flexible alternative is the heterogeneous AR(1) model (the


ARH(1) model) in which the ijth element of R is

Rij = σiσj ρ|i−j|

(which has 8 parameters). This model is still much more


parsimonious than the unstructured covariance matrix which
requires 28 parameters.
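
To make the structure concrete (this illustration is not in the original slides), for three
measurement occasions the ARH(1) covariance matrix would be

\[
R = \begin{pmatrix}
\sigma_1^2 & \sigma_1\sigma_2\rho & \sigma_1\sigma_3\rho^2 \\
\sigma_2\sigma_1\rho & \sigma_2^2 & \sigma_2\sigma_3\rho \\
\sigma_3\sigma_1\rho^2 & \sigma_3\sigma_2\rho & \sigma_3^2
\end{pmatrix},
\]

so the variances may differ between occasions while the correlations still decay as ρ|i−j| .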

October 18, 2001 Mixed Models Course 35

The ARH(1) model can be fitted using


proc mixed data=dial;
class qb sub index;
model ufr = tmp|tmp|tmp|tmp qb|tmp|tmp|tmp|tmp;
repeated index / type=arh(1) subject=sub r rcorr;
ods output r=r rcorr=rcorr;
run;

October 18, 2001 Mixed Models Course 36

337
19 Heterogeneous Variance

The empirical and estimated correlation matrix from the ARH(1)


model are close:

Estimated Correlation matrix (ARH(1))


1.00 0.76 0.58 0.44 0.34 0.26 0.20
0.76 1.00 0.76 0.58 0.44 0.34 0.26
0.58 0.76 1.00 0.76 0.58 0.44 0.34
0.44 0.58 0.76 1.00 0.76 0.58 0.44
0.34 0.44 0.58 0.76 1.00 0.76 0.58
0.26 0.34 0.44 0.58 0.76 1.00 0.76
0.20 0.26 0.34 0.44 0.58 0.76 1.00

Estimated Correlation matrix (Unstructured)


1.00 0.77 0.64 0.43 0.05 0.06 0.09
0.77 1.00 0.85 0.66 0.43 0.33 0.12
0.64 0.85 1.00 0.87 0.59 0.36 0.27
0.43 0.66 0.87 1.00 0.74 0.56 0.40
0.05 0.43 0.59 0.74 1.00 0.73 0.65
0.06 0.33 0.36 0.56 0.73 1.00 0.57
0.09 0.12 0.27 0.40 0.65 0.57 1.00

October 18, 2001 Mixed Models Course 37

For the model with the unstructured covariance matrix, a plot of the
residuals against TMP gives some insight:

[Figure: residuals against TMP for the unstructured model, QB = 200 and QB = 300]

• The profiles do not vary randomly around 0 – some profiles are
steadily increasing, others steadily decreasing.
October 18, 2001 Mixed Models Course 38

338
• This suggests that maybe we are not faced with variance
heterogeneity but rather with individual regression coefficients.

• (After all, there is likely to be some variation between the


membranes).

The random regression model is fitted by:


proc mixed data=dial ;
class qb sub index;
model ufr = tmp|tmp|tmp|tmp qb|tmp|tmp|tmp|tmp / outp=o2;
random int tmp tmp*tmp / subject=sub type=un;
run;

October 18, 2001 Mixed Models Course 39

Now there is no tendency for the residuals to be steadily increasing


or decreasing when plotted against TMP.
[Figure: residuals against TMP for the random regression model, QB = 200 and QB = 300]

Yet, the curves are still somewhat “smooth” suggesting that some
within subject variation has yet to be accounted for.

October 18, 2001 Mixed Models Course 40

339
19 Heterogeneous Variance

Power–of–Mean for Data with Covariates

Previously it was discussed that the variance can sometimes be


regarded as a function of the mean.

This was used for

• identifying situations where serious variance heterogeneity was


present

• suggesting transformations of data

Yet, until now the actual structure – the variance as a function of
the mean – has never been used directly.
October 18, 2001 Mixed Models Course 41

Usually when estimating variance/covariance parameters this is done


by subtracting estimates for the mean from the observed data to
give residuals. The residuals are then used for estimating the
variance/covariance parameters.

REML estimation is a clear example of this.

• In the setup in this section the mean and variance parameters are
not estimated separately.

• With this setup, one can capture variance heterogeneity together


with having random regression coefficients in the model

October 18, 2001 Mixed Models Course 42

340
• We consider cases where the variance of the residuals is

  Var(εi ) = σ 2 |µi |θ

such that the R–matrix is diagonal with Rii = σ 2 |µi |θ .

• Since µi = xi> β, the mixed model becomes complicated:

  y = Xβ + Zu + ε

where

  E(Y ) = Xβ
  Var(ε) = R(σ 2 , β, θ) = diag(σ 2 |xi> β|θ )

are both functions of β.


October 18, 2001 Mixed Models Course 43

• Consequently, maximizing the likelihood function is going to be a


very complicated task.

October 18, 2001 Mixed Models Course 44

341
19 Heterogeneous Variance

Yet, it is easy to suggest a heuristic solution to the estimation


problem:

• Suppose we have a provisional estimate β p of β.

• If this estimate is plugged into R, i.e.

  R(σ 2 , β p , θ) = diag(σ 2 |xi> β p |θ ) = R̃(σ 2 , θ)

then R is all of a sudden only a function of σ 2 and the power θ.

• These parameters can be estimated, together with β and the


parameters in Var(u) in PROC MIXED.

• The trick is then to set β p equal to the new estimate for β and
repeat the iteration until the parameters stop changing.
October 18, 2001 Mixed Models Course 45

In LMSW, p. 278 a way of doing it is shown. A simpler way is given


here:

1. First the iteration has to be started:

proc mixed data=dial;


class qb sub;
model ufr = tmp|tmp|tmp|tmp qb|tmp|tmp|tmp|tmp / s;
random int tmp tmp*tmp / type=un sub=sub;
repeated / local;
ods output solutionf=sf covparms=cp;
run;

October 18, 2001 Mixed Models Course 46

342
2. Then the estimated parameters β are used as provisional parameters
in the next iteration (this happens in the repeated statement).
The estimated parameters of Var(u) are used as starting points
for the maximization algorithm (this happens in the parms
statement).
This step is not necessary, but it speeds up the procedure
considerably:

proc mixed data=dial;


class qb sub;
model ufr = tmp|tmp|tmp|tmp qb|tmp|tmp|tmp|tmp / s outp=o3;
random int tmp tmp*tmp / type=un sub=sub;
repeated / local=pom(sf);
parms / pdata=cp;
ods output solutionf=sf1 covparms=cp1;
run;

October 18, 2001 Mixed Models Course 47

3. Finally the provisional estimate β p is set to the recent estimate for


β.
Likewise, the starting values for the parameters in Var(u) are set
to the recently estimated values of these:

proc compare brief data=sf compare=sf1;


var estimate;

data sf; set sf1;


data cp; set cp1;
run;

Now iterate between 2. and 3. until convergence, i.e. until the


parameters in sf and sf1 become very similar.
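
For convenience (this is not part of the original slides) steps 2 and 3 can be wrapped in a small
macro loop. The number of iterations (10 below) is chosen arbitrarily and should be large enough
for the estimates in sf and sf1 to stabilize.

%macro pomloop(niter=10);
  %do it = 1 %to &niter;
    proc mixed data=dial;
      class qb sub;
      model ufr = tmp|tmp|tmp|tmp qb|tmp|tmp|tmp|tmp / s;
      random int tmp tmp*tmp / type=un sub=sub;
      repeated / local=pom(sf);      /* power-of-mean with provisional estimates   */
      parms / pdata=cp;              /* start from the previous covariance values  */
      ods output solutionf=sf1 covparms=cp1;
    run;
    proc compare brief data=sf compare=sf1;  /* monitor the change in estimates */
      var estimate;
    run;
    data sf; set sf1; run;           /* update the provisional estimates */
    data cp; set cp1; run;
  %end;
%mend;
%pomloop(niter=10)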

October 18, 2001 Mixed Models Course 48

343
19 Heterogeneous Variance

Parts of the output from the final iteration is


Covariance Parameter Estimates

Cov Parm Subject Estimate

UN(1,1) sub 3.8360


UN(2,1) sub -5.8353
UN(2,2) sub 28.2501
UN(3,1) sub 1.3778
UN(3,2) sub -8.3312
UN(3,3) sub 2.6970
POM 1.9785
Residual 0.001974

The power is estimated to be 1.9785 ≈ 2 which, in a sense, corresponds
to the case of a constant coefficient of variation.

October 18, 2001 Mixed Models Course 49

Now there is no tendency for the residuals to be steadily increasing


or decreasing when plotted against TMP.

Also the curves are less smooth than before, suggesting that more of
the within-subject variation has now been accounted for.
[Figure: residuals against TMP for the power-of-mean model, QB = 200 and QB = 300]

October 18, 2001 Mixed Models Course 50

344
Some remarks on transformations, the normal approximation
and confidence intervals

Based on 250 receipts for purchases of petrol, li , and the corresponding
records of kilometres driven per tank, ki , the fuel economy

  yi = ki /li ,   i = 1, . . . , 250

expressed in kilometres per litre has been computed.

The histogram and the probit diagram in the top row of the figure below
show that it is reasonable to assume that the yi ’s are realisations of
random variables Yi , where

  Yi ∼ N (µ, σ 2 ),   i = 1, . . . , 250

On the basis of the data one can now construct, for example, a confidence
interval for µ.

For various reasons it is decided to sell the car in the USA, where fuel
economy is usually reported as “gallons per 100 miles”. To keep things
simple we instead consider “litres per 100 km”, namely

  zi = 100 li /ki = 100 (1/yi ).

That is, we transform the data as zi = h(yi ) = 100/yi .

October 18, 2001 Mixed Models Course 52

345
19 Heterogeneous Variance

It is well known that if Yi is normally distributed, then 100/Yi is NOT
normally distributed.

Below, histograms and QQ-plots of Y and Z = h(Y ) = 100/Y are shown.

[Figure: histograms and normal QQ-plots of y (km per litre) and z = 100/y (litres per 100 km)]

One can trace a slightly right-skewed distribution for the zi ’s, but otherwise
the data seem to be reasonably well described by a normal distribution. That
is, with some justification one can work with 100/Yi being approximately
normally distributed.

October 18, 2001 Mixed Models Course 53

The data above are in fact 250 observations simulated from an
N (12, 12 ) distribution.

We shall now illustrate that the approximation to the normal distribution
becomes gradually worse as the standard deviation increases.

We have therefore repeated the above for the standard deviations σ = 2 and
σ = 3. The results are shown below:

[Figure: histograms and normal QQ-plots of y and z for σ = 2 (left) and σ = 3 (right)]

We shall now illustrate what happens to the mean and

October 18, 2001 Mixed Models Course 54

346
the variance of the transformed data.

In the following we let E(Z) = η and Var(Z) = τ 2 . We can then
estimate η and τ 2 directly from the transformed data as the average
and the sample variance, respectively.

Next, note that with h(x) = 100/x we have h0(x) = −100/x2 . From
the results

  E(Z) = E(h(Y )) ≈ h(µ)
  Var(Z) = Var(h(Y )) ≈ h0(µ)2 Var(Y )

we therefore have E(Z) ≈ 100/µ and Var(Z) ≈ 10000 σ 2 /µ4 .

For σ = 1, 2, 3 the numbers are given in the table below.


October 18, 2001 Mixed Models Course 55

It is seen that 100/µ̂ is a good approximation to E(Z) = η, and likewise
10000 σ̂ 2 /µ̂ 4 is a reasonable approximation to Var(Z) = τ 2 , when the
standard deviation is small. It also appears that when the standard deviation
becomes large, in particular the approximation to Var(Z) = τ 2 becomes poor.

Quantity                                  σ = 1    σ = 2    σ = 3
µ̂                                        11.968   11.919   11.962
σ̂ 2                                       1.146    4.007    9.475
η̂                                         8.423    8.658    9.261
τ̂ 2                                       0.588    2.834   18.833
E(Z) = E(h(Y )) ≈ 100/µ̂                   8.355    8.389    8.359
Var(Z) = Var(h(Y )) ≈ 10000 σ̂ 2 /µ̂ 4      0.558    1.985    4.627

Finally, it is noted that in this example η and µ express


October 18, 2001 Mixed Models Course 56

347
19 Heterogeneous Variance

the same thing, namely the fuel economy.

Through the transformation of the data, zi = 100/yi , we get that zi is expressed in
“litres per 100 km”, which then also becomes the unit of E(Z) = η.

The unit of µ is “km per litre”, and therefore the unit of 100/µ is
“litres per 100 km”.

One may therefore discuss whether 100/µ or η is the relevant
quantity. They are estimated differently, the first as 100 times a
reciprocal average and the second as 100 times the average of

October 18, 2001 Mixed Models Course 57

the reciprocal data:

  100/µ̂ = 100 ( (1/n) Σi yi )−1
  η̂ = (1/n) Σi zi = 100 (1/n) Σi (1/yi )

If one decides that the unit “litres per 100 km” is the relevant quantity,
then we have two ways of obtaining it: either as an average of the
transformed data or as a transformation of the mean of the original data.

October 18, 2001 Mixed Models Course 58

348
Transformation and confidence intervals

Assume that the observed data are y1 , . . . , yn and that these, e.g. in order
to obtain variance homogeneity, have been transformed to z1 , . . . , zn by the
transformation h, i.e. zi = h(yi ).

On the transformed scale a statistical analysis has been carried out. Let
θ be the quantity we are interested in. On the basis of the (transformed)
data we obtain an estimate θ̂ of θ together with an estimate σ̂θ of the
standard error of θ̂.

For example, θ could be the slope in a linear regression

  Zi = α + θxi + εi .
October 18, 2001 Mixed Models Course 59

In general, a (1 − α) confidence interval for θ is given by two random
variables Zlow and Zhigh such that the probability that θ lies in the
interval [Zlow , Zhigh ] is 100(1 − α)%.

In many classical linear models a (1 − α) confidence interval is
computed as

  Ẑlow = θ̂ − t1−α/2 (d) σ̂θ
  Ẑhigh = θ̂ + t1−α/2 (d) σ̂θ

where t1−α/2 (d) is the 1 − α/2 quantile of a t–distribution with d degrees of freedom.

If, for example, θ is the slope in a regression as above, then θ expresses
the expected increase in Z when x is increased by one unit.

Often one is interested in expressing the expected


October 18, 2001 Mixed Models Course 60

349
19 Heterogeneous Variance

increase in Y , i.e. on the original scale, when x is increased by one unit.
Popularly speaking, one wants to express θ “on the original scale”.

This is often done as follows. Let h−1 be the inverse function of h.
Then h−1 (θ) is taken as an expression of θ “on the original scale”.

One therefore applies h−1 to the estimated value θ̂, which gives
η̂ = h−1 (θ̂). The confidence limits on the transformed scale can
also be transformed back with h−1 :

If h is strictly increasing, then

  Ŷlow = h−1 (Ẑlow )
  Ŷhigh = h−1 (Ẑhigh )
October 18, 2001 Mixed Models Course 61

and if h is strictly decreasing, then

  Ŷlow = h−1 (Ẑhigh )
  Ŷhigh = h−1 (Ẑlow )

If [Ẑlow , Ẑhigh ] is a 100(1 − α)% confidence interval for θ, then
[Ŷlow , Ŷhigh ] is a 100(1 − α)% confidence interval for h−1 (θ).

Note: [Ẑlow , Ẑhigh ] is symmetric around θ̂, but [Ŷlow , Ŷhigh ] is in
general NOT symmetric around h−1 (θ̂).

If h is approximately linear, then so is h−1 , and in that case
[Ŷlow , Ŷhigh ] becomes approximately symmetric around h−1 (θ̂).

An alternative to the above is the following: The mean and variance on


October 18, 2001 Mixed Models Course 62

350
the transformed scale are approximately given by

  E(Z) = E(h(Y )) ≈ h(E(Y ))
  Var(Z) = Var(h(Y )) ≈ h0(E(Y ))2 Var(Y ).

These can now be solved by means of h−1 . One obtains

  E(Y ) ≈ h−1 (E(Z))
  Var(Y ) ≈ Var(Z) / [h0(E(Y ))]2 = Var(Z) / [h0(h−1 (E(Z)))]2 .

These results can be applied to the parameter θ that we are


October 18, 2001 Mixed Models Course 63

interested in. One then obtains

  η̂ = h−1 (θ̂)
  σ̃η = σ̂θ / |h0(η̂)|

It is now tempting to compute confidence limits for h−1 (θ) as

  Ỹlow = η̂ − t1−α/2 (d) σ̃η
  Ỹhigh = η̂ + t1−α/2 (d) σ̃η

This interval becomes symmetric around η̂.

However, as far as we know there are no good formal arguments for calling
October 18, 2001 Mixed Models Course 64

351
19 Heterogeneous Variance

[Ỹlow , Ỹhigh ] a 100(1 − α)% confidence interval for h−1 (θ). Therefore
[Ŷlow , Ŷhigh ] is generally recommended.

In some cases, however, [Ỹlow , Ỹhigh ] and [Ŷlow , Ŷhigh ] will in fact
be very similar.

This happens when the variation in the data is small. Within a
narrow interval h can then be regarded as roughly linear, whereby the
approximations above become good.

October 18, 2001 Mixed Models Course 65

Example: Assume that the data have been transformed as

  zi = h(yi )

On the basis of the transformed data a regression is fitted,

  Zi = α + βxi + εi

We are interested in a confidence interval for h−1 (β).

From the data we estimate β̂ = 0.25 and σ̂β = 0.03.

We will now compare two ways of computing the intervals. For the
sake of the argument we shall carry out corresponding calculations for
σ̂β = 0.06 and σ̂β = 0.09.
October 18, 2001 Mixed Models Course 66

352
For simplicity we assume that there are so many observations
that the t distribution resembles a normal distribution. Thereby
t1−α/2 (d) ≈ 1.96 for α = 0.05.

First note that

  h(y) = √y = y 1/2 , so that
  h−1 (y) = y 2 and
  h0(y) = 1/(2√y),

and that η̂ = h−1 (β̂) = 0.0625 and h0(η̂) = 2 (check for yourself!).


October 18, 2001 Mixed Models Course 67

For σ̂β = 0.03 we now obtain

  Ẑlow = β̂ − 1.96 σ̂β = 0.19
  Ẑhigh = β̂ + 1.96 σ̂β = 0.31

Transforming these limits back with h−1 gives

  Ŷlow = 0.19² = 0.0361
  Ŷhigh = 0.31² = 0.0961

which is not symmetric around η̂ = h−1 (β̂) (but almost!).


October 18, 2001 Mixed Models Course 68

353
19 Heterogeneous Variance

Under the second method sketched above we must compute

  σ̃η = σ̂β / |h0(η̂)|
      = σ̂β / 2 = 0.015

since σ̂β = 0.03. We now get

  Ỹlow = η̂ − 1.96 σ̃η = 0.0331
  Ỹhigh = η̂ + 1.96 σ̃η = 0.0919.

We thus see that the intervals [Ỹlow , Ỹhigh ] and [Ŷlow , Ŷhigh ] are
very similar.

October 18, 2001 Mixed Models Course 69

For σ̂β = 0.06 completely analogous calculations are carried out, and we find
σ̃η = 0.06/2 = 0.03. This gives

  Ẑlow = β̂ − 1.96 σ̂β = 0.13
  Ẑhigh = β̂ + 1.96 σ̂β = 0.37
  Ŷlow = Ẑlow² = 0.0175
  Ŷhigh = Ẑhigh² = 0.1351
  Ỹlow = η̂ − 1.96 σ̃η = 0.0037
  Ỹhigh = η̂ + 1.96 σ̃η = 0.1213.

We now see that the intervals [Ỹlow , Ỹhigh ] and [Ŷlow , Ŷhigh ] become more
different.

October 18, 2001 Mixed Models Course 70

354
20 Variance heterogeneity: Example of the effect of
transformation

This lecture illustrates the consequences of transformation, based on an analysis of an experiment
investigating the effect of feed concentration on the muscle content of a certain ingredient.
Transformation back to the original scale is discussed, both related to the mean level and to
estimates of treatment effects.
Finally, examples are shown of different scales for usual production traits within animal produc-
tion.
Link to full-screen presentation1

1
http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/VariansHetero.f.pdf

355
20 Variansheterogeneity: Example of effect of transformation

Variance homogeneity

• Variance homogeneity

• Transformation as a solution

• Effect of back-transformation.

12. oktober 2001

Variance homogeneity

Yij = µ + αi + εij

εij ∼ N (0, σ 2)

Variance homogeneity is implied by the missing subscript on σ 2

12. oktober 2001

356
3

Variance homogeneity

Herd no. Herd type Observations Herd average


A 1 10 12.3
B 2 10 13.6
C 1 10 10.2
D 2 10 15.0

12. oktober 2001

Variance homogeneity

Herd no. Herd type Observations Herd average


A 1 100 12.3
B 2 100 13.6
C 1 1 10.2
D 2 1 15.0
Weigh according to precision in measurements

12. oktober 2001

357
20 Variansheterogeneity: Example of effect of transformation

Variance of an average

  Ȳ = (1/nobs ) Σi Yi
  V(Ȳ ) = σY2 /nobs

The magnitude of variance inhomogeneity can be assessed by
using this as an analogue.

12. oktober 2001

Example

A certain ingredient is added to the feed ration in the


concentration x, x ∈ {0.0, 4.4, 6.2, 9.3}. The pigs are fed with the
rations until 60 kg. Biopsies are made at 60 kg. Concentration of
the feed ingredient in the biopsy is measured. Let yi denote the
concentration of the ingredient in animal i.

12. oktober 2001

358
7

Mean curve
[Figure: mean curve — y, muscle concentration at 60 kg, against x, feed content]
12. oktober 2001

Transformation ?
[Figure: Variance against Mean (left) and Log(Variance) against Log(Mean) (right)]

12. oktober 2001

359
20 Variansheterogeneity: Example of effect of transformation

Model of expectations

E(y) = µ + αi

E(√y) = µ + αi ⇒ E(y) = µ2 + αi2 + 2µαi
E(log(y)) = µ + αi ⇒ E(y) = exp(µ) exp(αi )

12. oktober 2001

10

Curve fitting

E(y) = µ + β1 x + β2 x2

E(√y) = µ + β1 x + β2 x2
E(log(y)) = µ + β1 x + β2 x2

12. oktober 2001

360
11

Model comparison

Dependent variable          y                        √y
Parameter           Estimate    P-value      Estimate    P-value
β1                   0.438       0.081∗∗∗     0.242       0.026∗∗∗
β2                  -0.007       0.008       -0.010       0.003∗∗

12. oktober 2001

12

Sqrt transformed
[Figure: sqrt(y) (left) and y (right), muscle concentration at 60 kg, against x, feed content]

12. oktober 2001

361
20 Variansheterogeneity: Example of effect of transformation

13

Comparisons
[Figure: two panels comparing fitted curves — y, muscle concentration at 60 kg, against x, feed content]

12. oktober 2001

14

Treatment differences

Very often we are interested in estimating treatment differences, α1 − α2 .

In SAS we may use the PDIFF option in LSMEANS, or ESTIMATE.

How do we transform ??

[Figure: sqrt(y), muscle concentration at 60 kg, against x, feed content]
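
A sketch (not from the original slides) of how the treatment differences could be obtained on
the transformed scale in PROC MIXED is shown below; the variable names (a, trt, sqrty) are
hypothetical, and sqrty = sqrt(y) is assumed to have been computed in a data step. The estimate
statement assumes four treatment levels. Note that a difference of square roots cannot simply
be squared to give a difference on the original scale – which is exactly the problem raised above.

proc mixed data=a;
  class trt;
  model sqrty = trt / solution;
  lsmeans trt / pdiff cl;                  /* all pairwise differences alpha_i - alpha_j */
  estimate 'trt 1 vs trt 2' trt 1 -1 0 0;  /* a single contrast, assuming 4 levels       */
run;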

12. oktober 2001

362
15

Conclusion

• Transformations may achieve variance homogeneity

• Transformations change the model of the mean

• Back-transformation of expected values is OK

• Back-transformation of general estimable functions may cause
problems

12. oktober 2001

16

Natural scales ?

• Geometric cell-count

• Daily gain vs Age at slaughter

• Feed utilisation FU/Gain vs. Gain/FU

• Calvings per cow year vs. Calving interval.

• Feeding interval vs. Feeding frequency

12. oktober 2001

363
20 Variansheterogeneity: Example of effect of transformation

364
21 Variance Homogeneity: Diurnal Variation

The purpose of this lecture was to illustrate the application and combination of some of the
advanced topics presented during the course.
A data set consisting of half-hourly observations of cortisol release in pigs was analysed using a
random regression model to capture the individual differences between pigs in diurnal variation.
The power-of-mean approach was used to model the variance heterogeneity.
The application of such a model requires iterative use of PROC MIXED.
The experience with the model was that it was possible to estimate the model parameters, but
that it was necessary to ’nudge’ the procedure to secure convergence of the iterative calculations,
and that the calculations were very time-consuming. At the current state of the art the
application of such models is not a routine matter.
Link to full-screen presentation1

1
http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/PowerOfMean.f.pdf

365
21 Variance Homogeneity: Diurnal Variation

Example

In an experiment pigs were assigned to two different treatments in
order to study the effect of the treatment on the diurnal release of
cortisol. Cortisol was sampled continuously over a period of
approximately 24 hours for each animal.

  Yijk = µ + αi + Aij + (β1 + B1j ) cos((2π/24) tijk ) + (β2 + B2j ) sin((2π/24) tijk ) + εijk

where Yijk is the logarithmically transformed plasma cortisol, µ the
general mean, αi the effect of treatment, and Aij the random effect of animal j
within treatment i.
May 2, 2001 1

cos((2π/24) tijk ) and sin((2π/24) tijk ) are covariates for estimating the
diurnal variation. βk and Bkj are the corresponding regression
parameters: βk is a systematic effect and Bkj a random deviation
from the line. The random effects (Aij , B1j , B2j )> ∼ N3 (0, V ),
where V is a 3 × 3 variance matrix, and εijk ∼ N (0, σ 2 ).

May 2, 2001 2

366
Random regression model

The model is a random regression model and can be estimated using


the following SAS statements

*Initial model ;
data a ;
....
PI=3.141593 ;
sint=sin(time*2*pi/24) ;
cost=cos(time*2*pi/24) ;

proc mixed CL data=a ;


class beh dyr ;
model Logcort = beh sint cost /ddfm=satterth ;
random intercept sint cost / subject=dyr*kuld*beh type=un ;

May 2, 2001 3

Example results

[Figure: log(Cortisol) against time (hours) for three animals — Dyrnr 17111, 31111 and 35111]

May 2, 2001 4

367
21 Variance Homogeneity: Diurnal Variation

Model of Mean ?

exp(Xβ) = exp( µ + αi + Aij + (β1 + B1j ) cos((2π/24) tijk ) + (β2 + B2j ) sin((2π/24) tijk ) )

May 2, 2001 5

Modelling variance inhomogeneity

The logarithmic transform of cortisol was used because the variance
increased with the mean. Another approach is to model this increase
directly.
Using the so-called power-of-mean method, we use the measured
cortisol level directly, but instead of homogeneous variance we assume

  εijk ∼ N (0, σn2 |Xβ|δ )

and estimate σn2 and δ.

In order to do this it is necessary to perform the calculations with
PROC MIXED iteratively.
May 2, 2001 6

368
SAS Model

*Initial model ;
proc mixed CL data=a ;
class kuld beh dyr ;
model cortisol = beh sint cost /ddfm=satterth s;
random intercept sint cost / subject=dyr*kuld*beh type=un ;
repeated / subject=dyr*kuld*beh local ;
ods output SolutionF=sf ;
ods output Covparms=cp ;
run;

May 2, 2001 7

* Loop ;
proc mixed CL data=a maxiTER=100 CONVH=1e-8;
class kuld beh dyr ;
model cortisol = beh sint cost /ddfm=satterth s;
random intercept sint cost /
subject=dyr*kuld*beh type=un s ;
repeated /local=pom(sf) ;
parms /pdata=cp ;
ods output SolutionF=sf1 ;
ods output SolutionR=Coeff ;
ods output Covparms=cp1 ;
run ;

proc compare brief data=sf compare=sf1 ;


var estimate ;
run;
data sf ; set sf1 ;
data cp ; set cp1 ;
run;

May 2, 2001 8

369
21 Variance Homogeneity: Diurnal Variation

Experience

• δ was estimated as 3.10, indicating that the logarithmic transformation may not be
sufficient to obtain variance homogeneity (y −1/2 )

• Estimation of a single model runs much longer with POM

• It was necessary to adjust the convergence criteria to obtain
convergence

• Approx. 10 iterations were needed.

May 2, 2001 9

370
22 Links to supplementary material

In order to illustrate the underlying principles in linear algebra it was necessary to introduce
a method for performing the calculations. For that purpose the IML procedure of SAS was
introduced using the small program in ImlExample.sas1
Several SAS macros were introduced for performing standard calculations, e.g., a SAS macro
for calculation of autocorrelations2 . The biometry research unit has further SAS macros and
examples on this web-page3 .
The book used for the course, LMSW (Littell et al., 1996), contains a series of program examples.
These examples may be downloaded from SAS Institute's home pages, but they can be found here 4
as well. Another important link is the SAS online manual5 .
Finally, most of the course participants used Word for text processing and SAS for making graphs.
Getting these two programs to interact satisfactorily was clearly a problem. Therefore a short
note, Eksport af grafer fra SAS til Word 6 , was made, with references to the SAS tech. report
ts252x7 , where the export facilities are discussed in detail.

1
http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/ImlExample.sas
2
http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/SAS/autocorr.sas
3
http://www.jbs.agrsci.dk/Biometri/SASmateriale/SASmateriale.html
4
http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/SAS/sasmixed.sas
5
http://dokumentation.agrsci.dk/sasdocv8/sasdoc/sashtml/onldoc.htm
6
http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/SAS2Word.pdf
7
http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/ts252x.pdf

371
22 Links to supplementary material

372
Bibliography

Littell, R.C., G.A. Milliken, W.W. Stroup, & R.D. Wolfinger (1996). SAS System for Mixed
Models. SAS Institute, Inc., Cary, NC.

373
