Measuring Health: A Review of Quality of Life Measurement Scales

Downloaded by [ Faculty of Nursing, Chiangmai University 5.62.156.86] at [07/18/16]. Copyright © McGraw-Hill Global Education Holdings, LLC.
Not to be redistributed or modified in any way without permission.

Third Edition
Ann Bowling

Open University Press
McGraw-Hill Education
McGraw-Hill House
Shoppenhangers Road
Maidenhead
Berkshire
England
SL6 2QL
email: enquiries@openup.co.uk
world wide web: www.openup.co.uk
and
Two Penn Plaza
New York, NY 10121–2289, USA
First published 2005
Copyright © Ann Bowling 2005
All rights reserved. Except for the quotation of short passages for the purposes of criticism and review, no part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the publisher or a licence from the Copyright Licensing Agency Limited. Details of such licences (for reprographic reproduction) may be obtained from the Copyright Licensing Agency Ltd of 90 Tottenham Court Road, London, W1T 4LP.
A catalogue record of this book is available from the British Library
ISBN 0 335 21527 0 (pb) 0 335 21528 9 (hb)
Library of Congress Cataloging-in-Publication Data

CIP data applied for
Typeset by RefineCatch Ltd, Bungay, Suffolk

Printed in the UK by Bell & Bain Ltd, Glasgow
CONTENTS
PREFACE TO REVISED EDITION
LIST OF ABBREVIATIONS
1 CONCEPTS OF FUNCTIONING, HEALTH, WELL-BEING AND QUALITY OF LIFE
Measuring health outcome
The concept of functional ability
The concept of positive health
The concept of social health
The concept of subjective well-being
The concept of quality of life
2 THEORY OF MEASUREMENT
Choice of health indicator
Measurement theory
Measurement problems
Factor structure
Types of instruments
Self-report measures
Scaling item responses
Weighting scale items
Utility rating scales
3 MEASURING FUNCTIONAL ABILITY
The Older Americans’ Resources and Services Schedule (OARS): Multi-Dimensional Functional Assessment Questionnaire (OMFAQ)
The Stanford Arthritis Center Health Assessment Questionnaire (HAQ)
The Arthritis Impact Measurement Scales (AIMS)
The Index of Activities of Daily Living (ADL)
Townsend’s Disability Scale
The Karnofsky Performance Index (KPI)
The Barthel Index
London Handicap Scale (LHS)
The Quality of Well-Being Scale (QWBS)
The Crichton Royal Behaviour Rating Scale (CRBRS)
The Clifton Assessment Procedures for the Elderly (CAPE)
4 MEASURING BROADER HEALTH STATUS
The Sickness Impact Profile (SIP)
The Nottingham Health Profile (NHP)
The McMaster Health Index Questionnaire (MHIQ)
The Rand Health Insurance/Medical Outcomes Study Batteries
Physical Health Battery
Mental Health Battery
Depression Screener
The Social Health Battery
Social Support Scale
General Health Perceptions Battery
The Short Form-36 Health Survey Questionnaire (SF-36)
The Short Form-12 Health Survey Questionnaire (SF-12) and Health Status Questionnaire-12 (HSQ-12)
The Short Form-8 Health Survey Questionnaire (SF-8)
The Dartmouth COOP Function Charts
The Cornell Medical Index (CMI)
The McGill Pain Questionnaire (MPQ)
The EuroQol (EQ-5D)
5 MEASURING PSYCHOLOGICAL WELL-BEING
Depression
Zung’s Self-Rating Depression Scale
Montgomery-Asberg Depression Rating Scale (MADRS)
Hamilton Depression Rating Scale
The Beck Depression Inventory (BDI)
Hospital Anxiety and Depression Scale (HADS)
Goldberg’s General Health Questionnaire (GHQ)
State–Trait Anxiety Inventory (STAI)
The Geriatric Depression Scale (GDS)
The Geriatric Mental State (GMS)
Short Mental-Confusion Scales
The Mental Status Questionnaire (MSQ)
The Abbreviated Mental Test Score (AMTS)
6 MEASURING SOCIAL NETWORKS AND SOCIAL SUPPORT
Social-network analysis
Social capital
Methods of measurement of social networks and social support
Available measures of social network and support
Inventory of Socially Supportive Behaviours (ISSB)
Arizona Social Support Interview Schedule (ASSIS)
Perceived Social Support from Family and Friends
Social Support Questionnaire
Interview Schedule for Social Interaction (ISSI)
The Social Network Scale (SNS)
The Lubben Social Network Scale (LSNS)
The Family Relationship Index (FRI)
The Social Support Appraisals Scale (SS-A) and the Social Support Behaviours Scale (SS-B)
Interpersonal Support Evaluation List (ISEL)
The Network Typology: The Network Assessment Instrument
Loneliness
The Revised University of California at Los Angeles (UCLA) Loneliness Scale
7 MEASURING THE DIMENSIONS OF SUBJECTIVE WELL-BEING
Happiness
Life satisfaction
Morale
Self-esteem and self-concept
Sense of coherence
Global measures
The Life Satisfaction Index A (LSIA) and Index B (LSIB)
The Life Satisfaction Index Z 13-Item Version (LSIZ)
The Affect-Balance Scale (ABS)
The Philadelphia Geriatric Center Morale Scale (PGCMS)
Delighted–Terrible Faces (D–T) Scale
Satisfaction with Life Scale (SWLS)
Scales of Psychological Well-Being (PWB)
The (Psychological) General Well-Being Schedule (GWBS)
Sense of Coherence Scale (SOC)
Scales of Self-Esteem
The Self-Esteem Scale
The Tennessee Self-Concept Scale
Coopersmith Self-Esteem Inventory
8 MEASURES OF BROADER QUALITY OF LIFE
The WHOQOL
LEIPAD Questionnaire
CASP-19
Quality of Life Questionnaire (QLQ)
Linear Analogue Self-Assessment (LASA)
Spitzer’s Quality of Life Index (QL Index)
Individualized measures of quality of life
Schedule for the Evaluation of Individual Quality of Life (SEIQoL)
Patient Generated Index (PGI)
APPENDIX: A SELECTION OF SCALE DISTRIBUTORS AND USEFUL ADDRESSES
REFERENCES
INDEX
PREFACE TO REVISED EDITION

Since the first two editions of Measuring Health were published, the interest in patient-based evaluation of health care has continued. This third edition has been based on extensive searches of internet sites for measurement scales, as well as Medline and PsycINFO. It would not be feasible in a book of this size to include reference to all those published studies utilizing a specific scale. The studies which have been referenced include those that have provided information on the psychometric properties of an instrument, or applied the measure to a different population group.

Most investigators who aim to measure health-related quality of life in generic terms have continued to use broader health status scales as proxy measures, with some justifying this with reference to their overlapping domains. But given the increasing interest in conceptual clarification, and in measuring broader quality of life, an additional chapter on these measures has been included in this edition. Additional popular measures of functioning, anxiety, life satisfaction and well-being, and broader health status have also been included in the remaining chapters.

Readers will note the continued inclusion of the Cornell Medical Index, despite its withdrawal. This is to provide a historical record, and also because the owners of this instrument have not ruled out revising, updating and re-releasing it in the future, and enquiries about it are still received. One scale that has been removed, however, is the Symptoms of Anxiety and Depression Scale (SAD). This is because, while it was a popular tool in the UK for use with older populations, it is no longer used as a stand-alone instrument, and is used mainly in the USA with the SCL-90 subscale on anxiety (the DSSI/sAD) (Bedford et al. 1976; Bedford and Deary 1997).

While this volume includes a wide selection of popularly used generic and domain-specific measurement scales, there are still inevitable omissions. For example, the various scales of stress and coping ability have been omitted (some adjustment and coping scales were reviewed in Bowling’s (2001) Measuring Disease). This is a complex area and interested readers are referred to Kasl and Cooper (1987), Maes et al. (1987) and Cooper and Kasl (1995); electronic searches are recommended for updated information on any instruments of interest.

Copyright permission was sought to reproduce the items from the scales reproduced in this volume, and the author is grateful to the scale developers and distributors for their consent. Some scales are too long to reproduce in full here, and others are strictly copyrighted and only available commercially (which prevents reproduction of full scales). It was, as before, decided to aim for consistency and only reproduce a selection of scale items as examples, except where scales are very brief. Potential users are advised to consult the authors of scales (where a contact address has been given) before use in order to avoid copyright infringement. Informed use also assists authors of scales to compile bibliographies of users and results, and handbooks or user guides may be available.

An updated selection of useful addresses has been incorporated at the end of this revised volume in order to assist readers. Where a contact address has not been given for a particular scale, it is usually because there is no current contact for that scale, and readers should consult the references to the scale, as well as conduct electronic searches, for further details. Where no address has been given for a scale, the scale has usually been reproduced in full in one of the key publications listed in the references (e.g. many of the older social support scales have been reproduced in this way).

The current addresses of scale developers and distributors are not always easy to trace, although web-based information is increasingly helpful in locating them. I would like to repeat the plea made by Wilkin et al. (1992) that authors should be encouraged to publish their scales in full in key journal publications or, where copyright, length or commercial reasons prevent this, at minimum to publish the address of the scale distributor and open an internet site providing up-to-date details. A word of caution is also needed when undertaking web-based searches – scales uploaded onto websites are not always the version in current use, and are sometimes uploaded onto unmaintained sites by investigators other than the developer.

The publication of bibliographies and reviews of broader health status and health-related/disease-specific quality of life research instruments has made it easier to access information about popular scales and their psychometric properties (for example, Spilker et al. 1990, 1992a, 1992b; Mulder and Sluijs 1993; Hubanks and Kuyken 1994; Berzon et al. 1995; McDowell and Newell 1996; Spilker 1996; Salek 1999; Bowling 2001). Overviews of relevant research methods and terminology can be found in the author’s Research Methods in Health (2002) and Earl-Slater’s (2002) Handbook on Clinical Trials. Potential users of measures are advised to contact the developer, current copyright owner, or distributor, or failing this, approach key users of the scale for information.

Finally, as before, section summaries recommending a ‘best buy’ have not been included, as each scale has its strengths and weaknesses, most scales are being continually developed or tested, and ultimately the choice of which scale to use is dependent on the aims of the study, its population type and the judgement of the investigators. While a number of the scales have not been subjected to full or adequate psychometric testing, or they appear to require further refinement or development, it should be remembered that testing can be expensive and can take many years. Investigators are encouraged to include routine tests for reliability and validity in order to develop the knowledge bases of the scales they have selected, and to inform the scale owners of their results.
LIST OF ABBREVIATIONS

ABS Affect-Balance Scale
ADDQoL Audit of Diabetes Dependent Quality of Life
ADL Activities of daily living
AIMS Arthritis Impact Measurement Scales
AMTS Abbreviated Mental Test Score
APA American Psychiatric Association
ASSIS Arizona Social Support Interview Schedule
AUC area under the curve
BDI Beck Depression Inventory
CAL chronic airflow limitation
CAMDEX Cambridge Examination for Mental Disorders of the Elderly
CAPE Clifton Assessment Procedures for the Elderly
CARE Comprehensive Assessment and Referral Evaluation
CASP-19 Control, Autonomy, Self-realisation and Pleasure 19-item questionnaire
CES-D Center for Epidemiologic Studies Depression Scale
CMI Cornell Medical Index
CRBRS Crichton Royal Behaviour Rating Scale
DIS Diagnostic Interview Schedule
D–TFS Delighted–Terrible Faces Scale
DSM Diagnostic and Statistical Manual
EQ-5D EuroQol
ERSS Edinburgh Rehabilitation Status Scale
FAI Functional Assessment Inventory
FIM+FAM Functional Independence Measure and Functional Assessment Measure
FLP Functional Limitations Profile
FRI Family Relationship Index
GDS Geriatric Depression Scale
GHQ General Health Questionnaire
GHRI General Health Rating Index
GHS General Household Survey
GMS Geriatric Mental State
GWBS (Psychological) General Well-Being Schedule
HADS Hospital Anxiety and Depression Scale
HANES Health and Nutritional Examination Survey
HAQ (Stanford Arthritis Center) Health Assessment Questionnaire
HIS Health Insurance Study (Rand)
HSQ-12 Health Status Questionnaire-12
HUI3 Health Utilities Index mark 3
IADL Instrumental activities of daily living
ICD International Classification of Diseases
ICF International Classification of Functioning, Disability and Health
IQOLA International Quality of Life Assessment
ISEL Interpersonal Support Evaluation List
ISSB Inventory of Socially Supportive Behaviours
ISSI Interview Schedule for Social Interaction
KPI Karnofsky Performance Index
LASA Linear Analogue Self Assessment
LEIPAD LEIden and PADua quality of life questionnaire
LHS London Handicap Scale
LSIA Life Satisfaction Index A
LSIB Life Satisfaction Index B
LSIZ Life Satisfaction Index Z
LSNS Lubben Social Network Scale
MACTAR McMaster-Toronto Arthritis and Rheumatism
MADRS Montgomery-Asberg Depression Rating Scale
MCS-36 Mental Component Summary Score
MHIQ McMaster Health Index Questionnaire
MOS Medical Outcomes Study (Rand)
MOT Medical Outcomes (Study) Trust
MPQ McGill Pain Questionnaire
MSE Mental State Examination
MSQ Mental Status Questionnaire
NHP Nottingham Health Profile
OARS Older Americans’ Resources and Services Schedule
OMFAQ OARS Multi-Dimensional Functional Assessment Questionnaire
ONS Office for National Statistics
PCS-36 Physical Component Summary Score
PGCMS Philadelphia Geriatric Center Morale Scale
PGI Patient Generated Index
PGWB Psychological General Well-Being Index or Schedule
PWB Scales of Psychological Well-Being
QALYs quality adjusted life years
QL Index (Spitzer) Quality of Life Index
QLQ Quality of Life Questionnaire
QoL quality of life
QWBS Quality of Well-Being Scale
RA rheumatoid arthritis
RADS Reynolds Adolescent Depression Scale
RCT randomized controlled trial
ROC receiver operating characteristic
SEIQoL Schedule for the Evaluation of Individual Quality of Life
SFMPQ short version of McGill Pain Questionnaire
SF-36 Short Form-36
SF-12 Short Form-12
SF-8 Short Form-8
SIP Sickness Impact Profile
SNS Social Network Scale
SOC Sense of Coherence Scale
SS-A Social Support Appraisals Scale
SS-B Social Support Behaviours Scale
STAI State-Trait Anxiety Inventory
STAI-Y State-Trait Anxiety Inventory – Form Y
SWLS Satisfaction with Life Scale
TAPS Team for the Assessment of Psychiatric Services
UCLA University of California at Los Angeles
VAS Visual Analogue Scale
WHO World Health Organization
WHOQOL World Health Organization Quality of Life
WONCA World Organization of Family Doctors
1
CONCEPTS OF FUNCTIONING, HEALTH, WELL-BEING AND QUALITY OF LIFE
Researchers in health and social care are increasingly focused on the measurement of the outcomes of service provision and interventions. The conceptualization and methods of measurement of outcomes are still controversial, although there is general recognition that meaningful measures of people’s health status and quality of life should be used.

In social care, the measurement of the effectiveness of services was relatively limited for many years. Attention was on performance and activity indicators (inputs and processes of care), rather than the outcomes for service users. But there was increasing concern that, for example, the number of hours of home care allocated to clients did not indicate how effectively their needs had been met (Nocon and Qureshi 1996). Changes in community care arrangements in the UK during the 1990s led to an increasing emphasis on service monitoring, setting service objectives, measuring people’s needs and met needs, and to the broadening of indicators of outcomes to clients’ concerns, rather than just, for example, measuring their physical functioning.

In health care, where clinical interventions are usually more specific and invasive, outcome assessment has a long tradition. Most existing clinical indicators reflect a ‘disease’ model. The ‘disease’ model is a medical conception of pathological abnormality which is indicated by signs and symptoms. But a person’s ‘ill health’ is indicated by feelings of pain and discomfort or perceptions of change in usual functioning and feeling. Illnesses can be the result of pathological abnormality, but not necessarily so. A person can feel ill without medical science being able to detect disease. Measures of health status need to take both concepts into account. What matters in the twenty-first century is how the patient feels, rather than how professionals think they feel. Symptom response or survival rates are no longer enough; and, particularly where people are treated for chronic or life-threatening conditions, the therapy has to be evaluated in terms of whether it is more or less likely to lead to an outcome of a life worth living in social and psychological, as well as physical, terms.

Moreover, there are multiple influences upon patient outcome, and these require a broad model of health to incorporate them. The non-biological factors which can affect recovery and outcome include patient psychology, motivation, adherence to therapy, coping strategies, access to health care, social support networks, individual values, cultural beliefs, health behaviours and socio-economic status. These are recognized in research on health behaviour (Becker 1974), and require recognition in other studies which aim to understand the factors influencing service outcomes.

MEASURING HEALTH OUTCOME

In order to measure health outcomes, a measure of health status is required, and this needs to be based on a concept of health. The limitations of the widely used negative definition of health as the absence of disease, and of the unspecific World Health Organization’s (1948a, 1948b) definition of health as total physical, mental and social well-being, have long been recognized. In the absence of satisfactory definitions of health, the question is: how should the outcome of interventions and services be measured?

One commonly used source of information about the functioning of health services is routinely collected data about processes and outcomes – health service use information. For example, the USA relies heavily on health insurance data for information about service use, while the UK relies on routinely collected information from the National Health Service about deaths and discharges from hospital by the patients’ diagnosis, surgical procedures performed, socio-demographic details, and geographical area. All routinely collected data about usage are subject to problems of inaccuracy. Also, indicators of service use reflect the policies and practices of service providers and their resources, and provide no information about the impact of treatment on the patient’s life. Service-use rates also reflect people’s ‘illness behaviour’ (the extent to which people perceive, react to, and act upon their symptoms of ill health), which varies by their socio-demographic and cultural characteristics (Mechanic 1962, 1978; Kasl and Cobb 1966; Cockerham 1995). While useful as indicators of trends, routine data need to be supplemented with patient-based information to be of value in outcome assessments.

As stated earlier, the traditional indices of outcomes of health care are negative in their focus. The main clinical outcome measures are mortality rates (length of survival), morbidity, complications, biochemical tests, physical condition, symptoms and, in the past, return to work (for example, in cardiology, Wilson Barnett 1981). Clinicians have traditionally judged the value of an intervention mainly in terms of the five-year survival period. While obviously important in the case of life-threatening conditions, this indicator ignores the quality of life of those who survive. Many health care programmes and interventions will have little or no impact on mortality rates (for example, in relation to chronic diseases). Survival needs to be interpreted more broadly in terms of the impact and consequences of treatment.

In recognition of this, there has been an exponential increase over the past two decades in the use of indicators of broader health status and health-related quality of life as additional outcome measures, particularly in trials of cancer, arthritis and rheumatology treatments, although there has been relatively little standardization of measurement instruments (Garratt et al. 2002). But most measures of health status, broader health status and health-related quality of life still take health and life quality as a starting point and measure deviations away from it (deteriorating health and quality of life), rather than also encompassing gradations of healthiness and good quality of life. A perspective which also captures the positive end of the spectrum is required to give a more balanced, less skewed picture. In contrast to the emphasis on negative health, Merrell and Reed (1949) proposed a graded scale of health from positive to negative health. On such a scale people would be classified from those who are in top-notch condition with abundant energy, through to people who are well, to fairly well, down to people who are feeling rather poorly, and finally to the definitely ill. The word ‘health’ rather than ‘illness’ was chosen deliberately to emphasize the positive side of this scale. Despite the existence of single-item rating scales asking people to rate their health from ‘Excellent’ to ‘Very poor’, the development of a broader health status scale along such a continuum is still awaited.

Standardized items or measures of subjective health status are increasingly included in population health surveys, and in evaluations and clinical trials of service interventions. These are measures which ask people to rate their own health status and the impact of their health on various aspects of their lives. They are often referred to as patient-based measures. Detailed information about self-perceived health and its effects can be collected from large numbers of people using self-report questionnaires. Such questionnaires can be administered to the target group of interest by post, telephone or computer, in clinical settings, or in face-to-face home interviews. The group of interest may be a patient or client group or a sample of the general population, depending on the aims of the study. Survey information about health and illness at population level is collected routinely by several governments (e.g. government-sponsored interview surveys in the USA since 1956, in Britain since 1971, and in Finland since 1964).
A wide range of concepts are used interchangeably in the literature on patient-based health and quality of life outcomes research, and there is a general lack of consensus over their definition and measurement. The following sections examine the main concepts that are widely used and which arguably make up broader health status, and, ultimately, health-related and generic quality of life.

THE CONCEPT OF FUNCTIONAL ABILITY

One of the oldest, and most common, methods of assessing outcome of care in a broader sense is in terms of people’s functional ability, especially their performance of tasks of daily living. Such ‘disability’ measures are regarded as more meaningful to people’s lives than objective biochemical measures or measures of timed walking or grip strength.

There are several models of disability (see review by Putnam 2002). Social models of disability attempt to explain what disability is and how it is experienced; they hold that disability is a function of the interaction between a person and the demands of their environment. The original WHO (1980) social model of disability recognized that the terms ‘impairment’, ‘disability’ and ‘handicap’ are often erroneously used interchangeably. The increasing use of the concept of functional dependency has recently added to the confusion. The World Health Organization’s (WHO 1980) International Classification of Impairments, Disabilities and Handicaps provided a consistent, if over-simple, terminology and classification system (since revised to present a more complex, positive model, see later). It defined the terms ‘impairment’, ‘disability’ and ‘handicap’ and linked them together conceptually:

Disease or Disorder → Impairment → Disability → Handicap
e.g.:
Blindness → Vision → Seeing → Orientation
Rheumatism → Skeletal → Walking → Mobility

Impairment was defined as ‘. . . any loss or abnormality of psychological, physiological or anatomical structure or function’. It represents deviation from some norm in the individual’s biomedical status. While impairment is concerned with biological function, disability is concerned with activities expected of the person or the body. Disability was defined as ‘. . . any restriction or lack (resulting from an impairment) of ability to perform an activity in the manner or within the range considered normal for a human being’. Functional handicap, in turn, represents the social consequences of impairments or disabilities. It is a social phenomenon and a relative concept. The attitudes and values of the non-handicapped play a major part in defining a handicap. It was defined as a disadvantage for a given individual, resulting from an impairment or a disability, that limits or prevents the fulfilment of a role that is normal (depending on age, sex and social and cultural factors) for that individual. Although the WHO scheme was welcomed, more detailed classifications have been proposed (Nagi 1965, 1991).

These concepts lead to the concept of dependency on other people or service providers. Impairment and disability may or may not lead to dependency in the same way they lead to handicap. As with the concept of handicap, functional ‘dependency’ is a social consequence – societal attitudes decide on its definition and existence. Wilkin (1987) defined dependency as ‘a state in which an individual is reliant upon other(s) for assistance in meeting recognized needs’. In summary, impairment and disability may lead to dependency in the same way they lead to handicap. However, they cannot be equated with dependency, nor is there a necessary relationship between them. On the basis of the previous definitions, functional status can be defined as the degree to which an individual is able to perform socially allocated roles free of physically (or mentally in the case of mental illness) related limitations.

The WHO classifications constituted useful working definitions of impairment, disability and handicap. A working definition, as distinct from an operational definition, must be precise enough to suggest the content of the indicators but must not be so precise that it cannot be generalized to a variety of contexts. Operational definitions, in contrast, are usually specific to a particular measurement instrument and even to a particular type of study. They define the specific behaviours and the ways in which they are to be classified. Most operational definitions in this area concentrate upon activities of daily living, often subdivided into domestic and self-care activities. Thus a typical operational definition of dependency is failure to perform certain specified activities independently to a predefined standard.

The WHO (1998) has since updated and revised its classification to produce a more complex model, in a move away from a ‘consequence of disease’ classification to a more positive classification of ‘Impairments (of structure), Activities (previously called disabilities), and Participation (previously called handicaps)’, and a ‘components of health’ classification, known as the International Classification of Functioning, Disability and Health (ICF) (World Health Organization 2001). In the latter, the aim was to provide a unified and standard language and framework for the description of health and health-related states. Functioning was described as an umbrella term for all body functions, activities and participation in life situations, and disability as an umbrella term for impairments, activity limitations or restrictions on participation. The first part of the ICF contains two main lists: ‘body functions and structures’ – for example, specific mental functions (such as memory) and structures of the nervous system (for example, the spinal cord), and ‘activities and participation’ (for example, mobility and community recreation). The second part incorporates environmental factors (for example, support and relationships or services, systems and policies). The WHO conceived a person’s functioning and disability as a dynamic interplay between health conditions and environmental and personal factors. There is no formal definition of disability within the revised model; it is the umbrella term for any impairment of body structure or function, limitation of activities, or restriction in participation. It provides a framework of human functioning on a continuum, rather than just at the extreme points.

There is, then, a clear distinction between functioning and general health status. Functioning is directly related to the ability to perform one’s roles and participate in life. As such, functional status is just one component of health – it is a measure of the effects of disease.

THE CONCEPT OF POSITIVE HEALTH

It was pointed out earlier that, in medicine, health is usually referred to negatively as the absence of disease, illness and sickness. A narrow focus solely on objective data has been criticized for its inability to capture factors pertinent to health status, such as the way people feel and the context in which they live (Patrick 2003). Although social scientists view health as a continuum along which people progress and regress, and despite Merrell and Reed’s early (1949) proposal for a graded scale of health from positive to negative health, current measures of health status still take health as a baseline and then measure deviations away from this. They measure ill health because it is easier to measure departures from health than to find indices of health itself. Health has been operationalized (i.e. defined to enable proxy measurement) in most patient-based research in terms of self-reported (mental and physical) health status and broader health status (which includes the ability to continue with everyday social role functioning). But developed scales still tend to measure the negative, rather than the positive, ends of continua.

Of course, when studying severely ill populations, the most effective strategy may be to employ measures of negative health status. However, only approximately 15 per cent of a general population in a Western society will have chronic physical limitations, and some 10–20 per cent will have substantial psychiatric impairment (Stewart et al. 1978; Ware et al. 1979). Large numbers of very old people have also reported in surveys that they are relatively healthy, although this might partly reflect the survival of the fittest (Nybo et al. 2001). Thus reliance on a negative definition of health provides little information about the health of the remaining 80–90 per cent of general populations.

In its 1946 constitution (WHO 1948a, 1948b), the WHO adopted a positive definition of health and specified that ‘Health is a state of complete physical, mental and social well-being and not merely the absence of disease or infirmity.’ However, no conceptual or operational definitions were attempted. Despite the controversy provoked by this utopian definition, it has led to a greater focus on a broader, more positive concept of health, rather than a narrow, negative (disease-based) focus (Seedhouse 1986). The WHO’s concept of health in social, psychological and physical terms has become accepted to the extent that a measure of health status that fails to incorporate one of these dimensions is likely to be judged inadequate (Kaplan 1985).
During the 1980s, as a result of the increasing focus on health promotion, the search for indicators of positive health intensified (Scottish Health Education Group 1984; Abelin et al. 1986; Anderson et al. 1989). The World Health Organization’s (1985) ideal of Health for all by the Year 2000 and the Ottawa Charter for Health Promotion (1986) emphasized assisting the individual to increase control over and improve health, and continued to promote broader, more positive definitions of health (Thuriaux 1988). However, a positive conception of health is still difficult to measure because of the lack of agreement over its definition. Without an operational definition it is not possible to determine if and when a state of health has been achieved by a population. While clinical judgements focus upon the absence of disease, lay people may hold a variety of concepts of health, such as the ability to carry out normal everyday tasks, feeling strong, good, fit and so on (Cox et al. 1987, 1993).

There is broad agreement that the concept of positive health is more than the mere absence of disease or disability and implies ‘completeness’ and ‘full functioning’ or ‘efficiency’ of mind and body and social adjustment. Beyond this there is no one accepted definition. Positive health could be described as the ability to cope with stressful situations, the maintenance of a strong social-support system, integration in the community, high morale and life satisfaction, psychological well-being, and even levels of physical fitness as well as physical health (Lamb et al. 1988). It is composed of distinct components that must be measured and interpreted separately.

The concept of health, even when defined broadly and with a positive slant (for example, in terms of the ability to continue with everyday social role functioning), is theoretically distinct from health-related quality of life. Although health is valued highly by people, it is but one of several components of life (Bowling et al. 2003). And, by definition, health status and health-related quality of life are only part of overall quality of life. There is agreement that quality of life (and health-related quality of life) is more comprehensive than health status, and includes aspects of the environment that may or may not be affected by health, as well as more global evaluations of life (Patrick 2003).

THE CONCEPT OF SOCIAL HEALTH

Donald et al. (1978) have called for a broader view of health than the reporting of symptoms, illness and functional ability, and aimed to measure ‘social health’ in the Rand Health Insurance Study. Social health was viewed as a dimension of individual well-being distinct from both physical and mental health. They conceptualized social health both as a component of health-status outcomes (as a dependent variable) and, following Caplan (1974) and Cassel (1976):

in terms of social support systems that might intervene and modify the effect of the environment and life stress events on physical and mental health (as an intervening variable). Measurement of social health focuses on the individual and is defined in terms of interpersonal interactions (e.g. visits with friends) and social participation (e.g. membership in clubs). Both objective and subjective constructs (e.g. number of friends and a rating of how well one is getting along, respectively) are included in this definition.
(Donald et al. 1978)

Other authors have also conceptualized social health as a separate component of health status, defining it in terms of the degree to which people function adequately as members of the community (Renne 1974; Greenblatt 1975). Lerner (1973) noted that health status may be a function of non-health factors external to the individual, such as the environment, the community and significant social groups. He recommended that social well-being measures focus on constructs such as role-related coping, family health and social participation. He hypothesized that socially healthy persons would be more able to cope successfully with day-to-day challenges arising from performance of major social roles; would live in families that are more stable, integrated and cohesive; would be more likely to participate in community activities; and would be more likely to conform to societal norms. In relation to psychiatric illness, Leighton (1959) has described how individual personalities can be influenced by the quality and quantity of interpersonal relationships. Lack of social integration may produce psychological stress and decrease the
individual’s resources for dealing with it, possibly resulting in psychiatric disorders. Lack of social support has also been implicated in poor outcome of depressive illness (George et al. 1989).

Social support can thus be regarded as a key concept in theory and research on ‘social health’. Kaplan (1975) outlined several areas of social support, including work achievements and position in the hierarchy; family support, social activity and friendships; financial adequacy; personal life (e.g. existence of a confidant(e)); personal achievements and philosophy; and sexual satisfaction.

Most investigators of social health in these areas have focused on individual social network and support systems rather than on community resources and integration. But human ecology theory also holds that the quality of life of humans and the quality of their environment are interdependent, and the former cannot be considered apart from the whole ecosystem (Rettig and Leichtentritt 1999). Cultures, the environment, societal resources and facilities can all contribute to health. Social cohesion and social capital are collective, ecological dimensions of society, distinct from the concepts of social networks and social support, which are measured at the level of the individual (Kawachi and Berkman 2000). As Durkheim (1895, 1897) recognized long ago, society is not simply the sum of individuals, and well-being is influenced by society as a whole. Therefore, in order to understand individuals we must study them in the context of external, societal as well as internal, personal forces. Social cohesion refers to the connectedness and solidarity between groups of people (Kawachi and Berkman 2000). Social capital is a subset of the concept of social cohesion, and refers to the extent to which communities offer members opportunities, through active involvement in social activities, voluntary work, group membership, leisure and recreation facilities, political activism and educational facilities, to increase their personal resources (i.e. their social capital) (Coleman 1984; Putnam 1995; Brissette et al. 2000). It can also be defined as those features of social structures which act as resources for individuals and facilitate collective action, such as high levels of interpersonal trust and mutual aid (Kawachi and Berkman 2000). Social capital needs to be incorporated into a model of social health, and included in broader studies of health and quality of life.

The concept of social health is a dimension of both broader health and quality of life. Having social relationships, being involved in social activities, and living in safe neighbourhoods with good facilities have all been nominated by the public as giving their life quality; conversely, their absence has been said to take quality away from their lives (Bowling et al. 2003).

THE CONCEPT OF SUBJECTIVE WELL-BEING

In contrast to objective indicators, subjective indicators are those which involve some evaluation of one’s circumstances in life. Subjective well-being is more than the absence of physical and mental health problems or psychological morbidity, such as anxiety and depression. It is a more positive concept and comprises dimensions of happiness, life satisfaction, morale, self-esteem and sense of coherence. Psychological models of well-being are distinct but related, and emphasize existential challenges of life: personal growth, control, autonomy, self-efficacy or self-mastery (Larson 1978; Keyes et al. 2002). The distinction is discussed further in Chapter 7. However, few investigators have distinguished between these concepts and they are commonly used interchangeably.

The dimensions of well-being have been measured with what social scientists have called ‘subjective indicators’, on the grounds that it is unlikely that human happiness and satisfaction can be understood without asking people about their feelings. Subjective or experiential social indicators are based on the model of subjective well-being as defined by people’s ‘hedonic feelings or cognitive satisfactions’ (Diener and Suh 1997). People are routinely engaged in evaluating themselves in relation to the life domains they consider to be of relevance, and important, to themselves. Subjective indicators formalize these natural tendencies. Veenhoven (1991) has argued that making an overall judgement about one’s life implies a cognitive, intellectual activity and requires the assessment of past experiences and estimation of future experiences: ‘Both require a marshalling of facts into a convenient number of cognitive categories. It also demands an evaluation of priorities and relative values’ (Veenhoven 1991). Life assessment is also
bipolar, consisting of the independent dimensions of positive and negative affect. The difficulty for research lies in capturing the areas that are relevant and important to most people. While biases inevitably threaten all subjective measures (Veenhoven 2002), researchers have risen to the challenge with exhaustive, now classic, investigations of the validity of measures of reported well-being (Andrews and Crandall 1976).

Reasons for discontent with subjective indicators have been described, and counterargued, by Veenhoven (2002). They include the difficulties of comparing people because of varying standards for comparison; shifting standards over time (e.g. when living standards improve, standards for comparison might rise and lead to increasing dissatisfaction); and the partly unconscious and implicit criteria which underlie subjective appraisals (e.g. people may be able to state how satisfied they are, but be less certain why). While random errors are not always problematic, Veenhoven (2002) admits that social desirability bias can inflate certain self-ratings of circumstances and happiness, and that interviewing biases, question sequence and response format can lead to systematic distortion of data (see Schwartz and Strack 1999). These criticisms apply to all social research with human participants, including the measurement of health status. As Veenhoven (2002) pointed out, despite criticism over the biases inherent in measuring subjective perceptions, subjective indicators are still needed in the setting of policy goals based on what people need and want, and in evaluations of outcome in terms of public support. Objective indicators alone do not provide sufficient information.

THE CONCEPT OF QUALITY OF LIFE

In general terms quality can be defined as a grade of ‘goodness’. Quality of life, then, is about the goodness of life, and in relation to health is about the goodness of those aspects of life affected by health. Health-related quality of life is one dimension of wider quality of life. Quality of life and health-related quality of life are multi-level and amorphous concepts, and both are increasingly popular as end points in the evaluation of public policy, including the outcomes of health and social care. But the wider research community has accepted no common definition or definitive theoretical framework of quality of life.

The wide range of definitions of quality of life and health-related quality of life, and their inconsistent structures, was reviewed by Farquhar (1995a), and the diverse contributions of sociology (functionalism) and psychology (subjective well-being) to the theoretical foundations of the concept of quality of life were described by Patrick and Erickson (1993). Quality of life has been defined in macro (societal, objective) and micro (individual, subjective) terms. The former includes income, employment, housing, education, and other living and environmental circumstances. The latter includes perceptions of overall quality of life, and individuals’ experiences and values. Some definitions overlap. For example, Shin and Johnson (1978) suggested that quality of life consists of ‘the possession of resources necessary to the satisfaction of individual needs, wants and desires, participation in activities enabling personal development and self-actualization and satisfactory comparison between oneself and others’, all of which are dependent on previous experience and knowledge. Veenhoven (2000) also distinguished between opportunities (chances) for a good life and the good life (outcomes) itself. Each area of quality of life can also have knock-on effects on the others. For example, having access to transport may promote independence and social participation, promote life and enhance perceived quality of life (Bowling et al. 2003), but the former are partly dependent on having health and adequate finances. These can also be influenced by local transport facilities, type of housing, and community resources to facilitate social participation and social relationships.

Quality of life appears to be a complex collection of interacting objective and subjective dimensions (Lawton 1991), and most investigators focus on its multi-dimensionality. Beckie and Hayduk (1997) argued, however, that multi-dimensional definitions of quality of life confound the dimensionality of the concept with the multiplicity of the causal sources of that concept. They argued that quality of life could be considered as a unidimensional concept with multiple causes, and that a unidimensional QoL rating, such as ‘How do you feel about your life as a whole?’, could be the consequence of global assessments of a range of diverse and complex factors. Thus it is logical for a unidimensional indicator of quality of life (e.g. a self-rating global QoL uniscale)
to be the dependent variable in analyses, and for the predictor variables to include the range of health, social and psychological variables. The predictor variables in a model of global quality of life self-evaluation would, by necessity, have to include a wide range of life domains if the model is to mirror how those evaluations were made. In addition, these factors may interact, adding to the complexity of the evaluation. Beckie and Hayduk argued that if the QoL evaluation is greater than the sum of its parts, then this can be problematic for causal analyses, but that the diversity, multiplicity and complexity of sources of the concept warrant treating its measurement in terms of a global assessment. Thus it appears reasonable that quality of life is influenced by causal variables, and that the level of quality of life manifests itself in indicator variables. But the traditional approach to its measurement has implicitly assumed only indicator variables. An appreciation of the distinction between these types of variable may lead to more careful definition of concepts and the selection of more appropriate measurement scales (Zissi et al. 1998; Fayers and Hand 2002).

In relation to this, the effects of personality on perceived well-being and quality of life are controversial, partly because of the debate about causal versus mediating variables. Extraversion and neuroticism have been reported to account for a moderate amount of the variation in subjective well-being (the trait of extraversion is associated with positive affect and with well-being; emotionality, or neuroticism, is associated with negative affect and poor well-being) (Costa et al. 1987). Spiro and Bossé (2000), on the basis of their survey of over 2,000 adults in the Normative Aging Study, reported the same association between personality traits and well-being, and also with health-related quality of life. However, these personality factors are highly stable traits, while subjective well-being has been shown to have only moderate stability over time (Headey et al. 1985).

Despite classic work on mediators in the 1980s, theoretical and empirical development has made little progress (Abbey and Andrews 1986). Following Abbey and Andrews (1986), Barry (1997) and Zissi et al. (1998) argued that there is a need for a model of quality of life which focuses on the potential link between psychological factors (e.g. self-esteem or self-worth; self-efficacy, perceived control and self-mastery; and autonomy) and subjective evaluations of quality of life. Their theoretical model, which was supported by their data on people with mental health problems, focused on how subjectively perceived quality of life is mediated by several interrelated variables, including self-related constructs, and how these perceptions are influenced by cognitive mechanisms. Zissi et al. (1998) pointed to the confusion surrounding the many psychological concepts commonly used to denote quality of life, with their potential roles as influences, constituents or mediators of perceived life quality. They argued that perceived quality of life is likely to be mediated by several interrelated variables, including self-related constructs (e.g. self-mastery and self-efficacy, morale and self-esteem, perceived control over life), and that these perceptions are likely to be influenced by cognitive mechanisms (e.g. expectations of life, social values, beliefs, aspirations and social comparison standards). Although the model is attractive, there is still little empirical evidence to support or refute the distinction between psychological constructs as mediating or influencing variables in determining the quality of life.

The main theoretical models of quality of life, as opposed to basic definitions, include needs-based approaches derived from Maslow’s (1954, 1962) hierarchy of human needs (deficiency needs: hunger, thirst, loneliness, security; and growth needs: learning, mastery and self-actualization) (Higgs et al. 2003). Overlapping with this are social-psychological models which emphasize autonomy and control, self-sufficiency, internal control and self-assessed technical performance, and social competence (Abbey and Andrews 1986; Fry 2000); classic models based on subjective well-being, happiness, morale and life satisfaction (Andrews and Withey 1976; Larson 1978; Andrews 1986); social expectations or gap models based on the discrepancy between desired and actual circumstances (Calman 1984; Michalos 1986); and phenomenological models of the individual’s unique perceptions of their circumstances (O’Boyle 1997a, 1997b), based on the concept that quality of life is dependent on the individual who experiences it, and should be measured using their own value systems (Ziller 1974; Benner 1985; Rosenberg 1995). In recognition of the importance of the individual’s perspective, the World Health Organization defined quality of life, in the context of health, in terms of the individual’s perception of their position in life in the context of the culture and value systems in which they live, and in relation to their goals. However, the WHO went on to develop a measure of quality of life (the WHOQOL) using structured scales (WHOQOL Group 1993, 1995).

While the importance of the individual’s perceptions is acknowledged, there is empirical evidence that most people hold a set of common values in relation to what gives quality to life and what makes up the important things in life, although priorities vary by people’s socio-demographic characteristics. Research shows that most people define their quality of life in terms of having a positive psychological outlook and emotional well-being, having good physical and mental health and the physical ability to do the things they want to do, having good relationships with friends and family, participating in social activities and recreation, living in a safe neighbourhood with good facilities and services, having enough money and being independent (Bowling 1995; Farquhar 1995b; Bowling and Windsor 2001; Bowling et al. 2003). Thus the topics covered in the chapters which follow have a logical basis in population values.
2
THEORY OF MEASUREMENT
CHOICE OF HEALTH INDICATOR

When deciding which measure to use, the investigator should assess whether a disease-specific or broad-ranging instrument is required; the type of scoring that the instrument is based on (whether scores can be easily analysed in relation to other variables); the reliability, validity and sensitivity of the scale; the appropriateness of the instrument for the study population; and the acceptability of the instrument to the group under study (Hunt et al. 1986). It is also important to decide whether the study requires measurement on a nominal-, ordinal-, interval- or ratio-scale level; these terms are explained below.

MEASUREMENT THEORY

Descriptions can be placed on a nominal, ordinal, interval or ratio scale. The requisite level of measurement depends on the intended applications of the indicator and on the question that the researcher is attempting to answer.

With a nominal scale, numbers or other symbols are used simply to classify a characteristic or item. This is measurement at its weakest level. For example, functional disability states and perceived health are defined by descriptions and thus a nominal or classification scale is constructed. For the purpose of the comparative evaluation of the outcome of intervention A in comparison with the outcome of intervention B, a nominal scale may be sufficient (for example, ‘died’ or ‘survived’). Hypotheses regarding the distribution of cases among categories can be tested using the nonparametric χ² (chi-square) test, and also Fisher’s exact probability test. The most common measure of association (correlation) for nominal data is the contingency coefficient.

An ordinal (or ranking) scale is applicable where objects in one category of a scale are not simply different from objects in other categories of that scale, but stand in some kind of relation to them. For example, typical relations may be ‘higher than’, ‘more preferred’ or ‘more difficult’ (in effect, ‘greater than’). This is an ordinal scale. Many disability and health-status measures are strictly of this type.

The most appropriate statistic for describing the central tendency of scores in an ordinal scale is the median, since the median is not affected by changes of any scores above or below it, as long as the number of scores above and below it remains the same. Hypotheses can be tested using nonparametric statistics such as correlation coefficients based on rankings (e.g. Spearman’s rho or Kendall’s tau). An ordinal scale is sufficient only for answering basic questions such as ‘How does X compare with Y?’

An interval scale is obtained when a scale has all the characteristics of an ordinal scale and, in addition, the distances between any two numbers on the scale are of known size. Measurement
considerably stronger than ordinality has thus been achieved. An interval scale is characterized by a common and constant unit of measurement which assigns a real number, but the zero point and unit of measurement are arbitrary (e.g. temperature: two scales are in common use, and the zero point differs on each and is arbitrary).

The interval scale is a truly quantitative scale, and all the common parametric statistics (means, standard deviations, Pearson correlations, etc.) are applicable, as are the common statistical tests of significance (t test, F test, etc.). Parametric tests should be used, as nonparametric methods would not usually take advantage of all the information contained in the research data. Interval scales are appropriate if the question is ‘How different is X to Y?’

The ratio scale exists when a scale has all the characteristics of an interval scale and in addition has a true zero point as its origin. The ratio of any two scale points is independent of the unit of measurement. Weight is one example. Any statistical test is usable when ratio measurement has been achieved. Ratio scales are needed if the question is: ‘Proportionately how different is X to Y?’

The most rigorous methods of data analysis require quantitative data. Whenever possible, measures which yield interval or ratio data should be used, although this is often difficult in social science. Measures of functional disability and health status never strictly reach a ratio- or interval-scale level of measurement. However, methods of data transformation do exist which permit even nominal data to be made quantitative for purposes of analysis.

MEASUREMENT PROBLEMS

Measurement problems are rife when attempting to measure health outcomes. Indicators may work well or badly and are usually assessed by tests of validity and reliability.

Authors of the various measures often make claims for reliability and validity based on achieved coefficients without any reference to acceptable levels. Suggestions for acceptable levels for reliability and validity range from 0.85 to 0.94, although often 0.50 is regarded as acceptable for correlation coefficients (Ware et al. 1980). A full discussion of the problems of achieving reliability and validity can be found in Streiner and Norman (2003).

Validity

Validity is concerned with whether or not the indicator actually does measure the underlying attribute. One of the most problematic aspects of assessing validity is the varying terminology. Textbooks have tended to focus on content validity, criterion validity and construct validity. Construct validity is differentiated into convergent and discriminant validity. However, all types of validity address the same issue: the degree of confidence that can be placed in the inferences drawn from scale scores. These issues have been addressed more fully by Messick (1980) and Streiner and Norman (2003).

The assessment of validity involves assessment against a standard criterion. Because there is no ‘gold standard’ of health against which health-status indices can be compared, the validation methods commonly used in the behavioural sciences are the assessment of content and construct validity (American Psychological Association 1974). The criteria of validity which should be met in general are:

Content validity

Content validity refers to whether the components of the scale/item cover all aspects of the attribute to be measured, in a balanced way. At its most basic level, the content of the variable should match the name which it has been given. Each item should fall into at least one of the content areas being tapped. If it does not, then the item is not relevant to the scale’s objectives, or the list of scale objectives is not comprehensive. The number of items in each area should also reflect its importance to the attribute. Content validity is more systematic than face validity, and judgements about these issues are usually made by a panel following literature reviews, focus groups and exploratory interviews with the target population. It is generally agreed that the content validity of subjective indicators should be judged by members of the target group being assessed (Patrick 2003).

Face validity

Face validity is more superficial than content validity. It is a subjective assessment by the investigators about whether the indicator, on the face of
it, is a reasonable one and the items appear to be measuring the variables they claim to measure. The meaning and relevance of the indicator should appear self-evident.

Criterion validity

Criterion validity refers to whether the variable can be measured with accuracy. The traditional definition of criterion validity is the correlation of a scale with some other ‘criterion’ measure of the topic under study, ideally a ‘gold standard’. Criterion validity is usually divided into two types: concurrent and predictive validity.

Concurrent validity

Concurrent validity is the independent corroboration that the instrument is measuring what it purports to measure against a criterion measure (e.g. the corroboration of a physical functioning scale with observable functioning). Thus, when a scale is tested against another scale measuring the same thing, this is called concurrent validity, which refers to a scale’s substitutability. Both scales are administered at the same time. It is most often used when attempting to develop a replacement scale which is simpler or less expensive to administer than an existing scale.

Predictive validity

Predictive validity refers to whether the measure can predict future differences in key variables in the expected direction (e.g. was the health-status scale able to predict self-reported improvements in patients’ health after treatment?). With this type of predictive validity, the criterion will be unavailable until some future end point.

Predictive validity has also been defined in terms of its discriminative ability (i.e. in tests of association, the percentage of respondents who have been correctly classified using the instrument). For example, if there are theoretically sound reasons to hypothesize that people in the lower socio-economic groups are more likely than people in higher socio-economic groups to report poor health status, then the health status of the two groupings can be compared as a check of predictive validity.

Construct validity

Construct validity is corroboration that the instrument is measuring the underlying concept it is intended to measure. This type of validity is relevant in more abstract areas such as psychology and sociology, where the variable of interest cannot be directly observed. Unlike other types of validity testing, testing for construct validity involves assessing both theory and method simultaneously. It necessitates stating a conceptual definition of the construct to be measured, specifying its dimensions, hypothesizing its theoretical relationship with other variables, and then testing it. One problem is that, if the predictions made on the basis of theory are not confirmed, then the problem could lie with the validity of the measure or with the validity of the theory. This type of validity is generally divided into convergent and discriminant validity:

Convergent-discriminant validity

Convergent validity is the extent to which two measures which purport to be measuring the same topic correlate (that is, converge). The hypothesis to be tested is that the measure will correlate with variables which measure the same topic. It thus involves assessing the extent to which the scale is related to other variables and measures of the same construct to which it should be related. The interpretation is sometimes problematic with the measurement of similar concepts, as scale scores that correlate too highly may be measuring the same dimension. Some investigators have interpreted this definition incorrectly and have assumed that convergent validity has been obtained when a measure correlates highly with a measure of a different topic with which it is expected to be associated.

Conversely, discriminant validity (also known as divergent validity) requires that the construct should not correlate with dissimilar (discriminant) variables. Thus, the hypothesis to be tested is that the measure will not correlate with variables which measure a different, unrelated topic.

Convergent-discriminant validity is also referred to as multitrait-multimethod matrix validity (Campbell and Fiske 1959), which simply means that different methods of measuring the same construct should produce similar results (that is, correlate highly), and measures with different
dimensions should produce different results (i.e. not correlate). The validation process thus involves calculation of the inter-correlations between measures.

Precision

This is the ability of an instrument to detect small changes in an attribute.

Responsiveness to change (sensitivity)

This is the ability of an instrument to be responsive (sensitive) to actual changes which occur over time. It is a measure of the association between the change in the observed score and the change in the true value of the construct. This involves correlating the instrument’s scores with other measures which reflect any anticipated changes (e.g. in the case of a depression scale, it will need to be correlated with a standardized, structured psychiatric interview, and any indicated changes in psychological status between the two methods of assessment compared).

Sensitivity

This refers to the proportion of cases (e.g. people with verified diagnoses of depression) who score as positive cases on an instrument (e.g. a scale of depression), and the ability of the gradation in the scale’s scores to adequately reflect actual changes.

Specificity

This is a measure of the probability of the scale correctly identifying non-cases (e.g. people without depression) with the measure (e.g. of depression). It refers to the discriminative ability of the measure. Again, it also refers to the ability of the gradation in the scale’s scores to adequately reflect actual changes.

ROC curves

The discriminant ability of a scale which uses continuous data can be investigated with receiver operating characteristic (ROC) curves. The ROC curve examines the degree of overlap of the distributions of scale scores for all cut-off points for defined groups; the curve itself is a plot of the true positive rate against the false positive rate for each point on the scale (sensitivity plotted against one minus specificity). The degree of overlap between the defined groups is measured by calculating the area under the curve (AUC) and its associated standard error (Hanley and McNeil 1982). The greater the total area under a plotted curve from all cut-off points, the greater the instrument’s responsiveness. An AUC of 0.5 indicates that prediction is no better than chance, and 1.0 represents perfect accuracy.

FACTOR STRUCTURE

Factor analysis is used to identify and define a small number of separate factors (dimensions) that make up an instrument; it describes how the items contained in each dimension group together in a consistent and coherent way (i.e. with sufficient consistency to each other). Thus, factor analysis reduces a large number of interrelated observations to a smaller number of common dimensions (factors). Factor analysis is an increasingly used technique for the assessment of the number of dimensions that underlie a set of variables (the factor structure). For example, if items relate to a single dimension, then the combination of items into a single measure is supported.

The selection of items in a measurement scale also needs to consider the relevance of the items before discarding any of them following factor analyses. Juniper et al. (1997) compared two philosophically different methods for selecting items for a disease-specific quality of life questionnaire: the impact method, which selects items that are most frequently perceived as important by the target population, and the psychometric method (factor analysis), which selects items primarily according to their relationships with one another. Based on research with 150 adult asthma patients, they reported that the impact method resulted in a 32-item instrument and the psychometric method led to a 36-item tool, with 20 items common to both. The psychometric approach had discarded the items relating to emotional function and environment, and included items mainly on fatigue instead. Thus the two approaches lead to important differences. Again, Kane et al. (1998), in a comparative study of the USA and Europe, compared geriatric professionals’ and lay people’s ratings of the importance of 32 items measuring physical functioning.
14 MEASURING HEALTH
While the overall correlation between the groups was 0.82, in general lay people rated instrumental activities of daily living items more highly (e.g. (dis)ability to prepare meals, clean the house, shop). The experts rated the most dysfunctional activities of daily living items higher than the lay people did (e.g. (dis)ability to dress, feed self, get to/use toilet).

Reliability

An instrument will also require testing for reliability. A measure is judged to be reliable when it consistently produces the same results, particularly when applied to the same subjects at different time periods when there is no evidence of change. The methods of testing for reliability include multiple form, basic tests of internal consistency (for example, split-half, item-item correlations and item-total correlations), test-retest, intra-rater and inter-rater agreement, and sensitivity to change. In addition, tests of internal consistency based on statistical models, for example using factor analysis, are becoming widespread (Harman 1976).

Internal consistency

Internal consistency involves testing for homogeneity. This can take the form of correlations between the items in the scale, or within each scale domain, or between the two halves of the scale where the scale can be divided into two equivalent parts (split-half reliability); correlations between the items and the total score are also performed. Cronbach’s alpha should be calculated (Cronbach 1951), which is based on the average correlation among the items and the number of items in the instrument (values range from 0 to 1). A low coefficient alpha (e.g. below 0.50) indicates that the items do not all come from the same conceptual domain. More detailed information on the appropriate statistical methods to employ, and minimum acceptable values, is given in Measuring Disease (Bowling 2001).

As questions that deliberately tap different dimensions within a scale cannot be expected to necessarily have high item-item or item-total correlations, factor analysis should be used to identify the separate factors (e.g. domains) within the scale. Each item within a factor is judged to be worthy of retention in a scale if its eigenvalue (a measure of its power to explain variation between subjects) exceeds a certain value (usually 1.5).

Multiple-form reliability

This refers to the correlation between the subdomains of the scale.

Alternate forms reliability

Where measures have alternative formats – for example a version for interviewer administration and a version for self-administration, or a long and a shorter version – they should be highly correlated.

Test-retest reliability

The test is administered to the same population on two occasions and the results are compared, usually by correlation. The main problem with this is that the first administration may affect responses on the second. There can be problems with interpretation of observed change, given the potential for observer errors with any scale, and the potential for genuine individual change between administrations, which affects the estimate of reliability.

Intra-rater and inter-rater agreements

Intra-rater agreement is the reliability of the same rater’s scores, of the same subjects, on different occasions. Inter-rater agreement is the concordance of scores achieved by different raters on the same occasion.

The achievement of standards of validity and reliability requires time and effort. It is a powerful reason for using existing scales.

TYPES OF INSTRUMENTS

Most instruments used in social science rely on self-reporting of feelings, attitudes and behaviour by people in an interview situation or in response to a self-administered questionnaire. Other measurement approaches include the use of records and the observation of behaviour. Each approach has its strengths and limitations. The optimal measurement strategy is to measure the same phenomenon using several different approaches (Webb et al. 1966).
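The sensitivity, specificity and ROC curve definitions given earlier under ROC curves can be made concrete with a short sketch. It assumes hypothetical scale scores for people already classified as cases or non-cases by a criterion measure, and treats a score at or above the cut-off as a positive screen; the function names and data are invented for illustration. The AUC is computed here through its rank-based (Mann-Whitney) equivalence, the probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counted as one half. This is one standard way of obtaining the area, not necessarily the procedure of Hanley and McNeil, and the sketch omits the standard error.

```python
from typing import List, Tuple


def sens_spec(scores: List[float], is_case: List[bool], cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when a score >= cutoff counts as a positive screen."""
    tp = sum(1 for s, c in zip(scores, is_case) if c and s >= cutoff)
    fn = sum(1 for s, c in zip(scores, is_case) if c and s < cutoff)
    tn = sum(1 for s, c in zip(scores, is_case) if not c and s < cutoff)
    fp = sum(1 for s, c in zip(scores, is_case) if not c and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(scores: List[float], is_case: List[bool]) -> List[Tuple[float, float]]:
    """(1 - specificity, sensitivity) for each observed score used as the cut-off."""
    points = []
    for cutoff in sorted(set(scores), reverse=True):
        sens, spec = sens_spec(scores, is_case, cutoff)
        points.append((1 - spec, sens))
    return points


def auc(scores: List[float], is_case: List[bool]) -> float:
    """Area under the ROC curve via the Mann-Whitney equivalence (ties count one half)."""
    cases = [s for s, c in zip(scores, is_case) if c]
    non_cases = [s for s, c in zip(scores, is_case) if not c]
    wins = 0.0
    for x in cases:
        for y in non_cases:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


if __name__ == "__main__":
    # Hypothetical depression-scale scores; True marks a verified case.
    scores = [12, 15, 9, 20, 7, 5, 11, 3, 8, 14]
    is_case = [True, True, False, True, False, False, True, False, False, False]
    print(sens_spec(scores, is_case, cutoff=10))
    print(roc_points(scores, is_case))
    print(round(auc(scores, is_case), 3))
```

With these invented data, a cut-off of 10 identifies every case (sensitivity 1.0) at the cost of one false positive among six non-cases (specificity 5/6), and the AUC is 22/24, about 0.92.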
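The factor-retention rule described under internal consistency depends on the eigenvalues of the items’ correlation matrix. The following pure-Python sketch computes them with the Jacobi eigenvalue method (our choice of algorithm; the function names and the correlation matrix are hypothetical) and counts how many factors have an eigenvalue above a threshold. The threshold is left as a parameter because conventions differ: the text gives 1.5, while the widely used Kaiser criterion retains factors with eigenvalues above 1.0. This covers only the eigenvalue step of a principal-components-style analysis, not a full factor analysis with loadings and rotation.

```python
import math
from typing import List

Matrix = List[List[float]]


def jacobi_eigenvalues(a: Matrix, tol: float = 1e-12, max_sweeps: int = 100) -> List[float]:
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    n = len(a)
    m = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(m[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(m[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) element becomes zero.
                theta = (m[q][q] - m[p][p]) / (2 * m[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                # m <- m J (rotate columns p and q)
                for k in range(n):
                    mkp, mkq = m[k][p], m[k][q]
                    m[k][p] = c * mkp - s * mkq
                    m[k][q] = s * mkp + c * mkq
                # m <- J^T m (rotate rows p and q)
                for k in range(n):
                    mpk, mqk = m[p][k], m[q][k]
                    m[p][k] = c * mpk - s * mqk
                    m[q][k] = s * mpk + c * mqk
    return sorted((m[i][i] for i in range(n)), reverse=True)


def count_factors(corr: Matrix, threshold: float) -> int:
    """Number of factors whose eigenvalue exceeds the retention threshold."""
    return sum(1 for ev in jacobi_eigenvalues(corr) if ev > threshold)


if __name__ == "__main__":
    # Hypothetical correlations: items 1-2 and items 3-4 form two clusters.
    corr = [
        [1.0, 0.7, 0.1, 0.1],
        [0.7, 1.0, 0.1, 0.1],
        [0.1, 0.1, 1.0, 0.6],
        [0.1, 0.1, 0.6, 1.0],
    ]
    print([round(e, 3) for e in jacobi_eigenvalues(corr)])
    print(count_factors(corr, threshold=1.0))  # Kaiser convention
    print(count_factors(corr, threshold=1.5))  # value quoted in the text
```

With these invented correlations the two conventions disagree: eigenvalues of about 1.86 and 1.44 both exceed 1.0, but only the first exceeds 1.5, so the choice of threshold changes the apparent number of dimensions.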
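Cronbach’s alpha, described under internal consistency, can be computed from raw item scores. The sketch below (hypothetical function names and data) shows the usual variance-based formula, α = k/(k − 1) × (1 − sum of item variances / variance of the total score), alongside the standardized form that, as the text says, depends only on the number of items k and their average inter-item correlation r̄: α = k r̄ / (1 + (k − 1) r̄). The two coincide when all items have equal variances.

```python
from statistics import mean, pvariance
from typing import List

# items[i][j] is respondent j's score on item i (hypothetical layout).
Items = List[List[float]]


def cronbach_alpha(items: Items) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    item_variance = sum(pvariance(scores) for scores in items)
    # Population variances are used throughout; the n vs n-1 choice cancels in the ratio.
    return k / (k - 1) * (1 - item_variance / pvariance(totals))


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def standardized_alpha(items: Items) -> float:
    """alpha = k * r_bar / (1 + (k - 1) * r_bar), r_bar = mean inter-item correlation."""
    k = len(items)
    r_bar = mean(pearson(items[i], items[j]) for i in range(k) for j in range(i + 1, k))
    return k * r_bar / (1 + (k - 1) * r_bar)


if __name__ == "__main__":
    # Three hypothetical items answered by five respondents on a 1-5 scale.
    items = [[2, 3, 4, 4, 5], [1, 3, 3, 4, 5], [2, 2, 4, 5, 5]]
    print(round(cronbach_alpha(items), 3), round(standardized_alpha(items), 3))
```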
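Test-retest reliability is usually summarized by correlating the two administrations, as described above. For inter-rater agreement the text speaks only of the ‘concordance of scores’; the sketch below uses one common choice for categorical ratings, Cohen’s kappa, which is our substitution rather than a statistic named in the text. Kappa compares observed agreement with the agreement expected by chance from each rater’s marginal totals. Function names and data are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import Hashable, List, Sequence


def pearson(x: List[float], y: List[float]) -> float:
    """Test-retest reliability as the correlation between two administrations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Agreement between two raters corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    # Undefined (division by zero) if both raters use one identical category throughout.
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical scale totals for six people on two occasions.
    time1 = [10, 14, 8, 20, 15, 11]
    time2 = [11, 13, 9, 19, 16, 10]
    print(round(pearson(time1, time2), 3))
    # Hypothetical ratings of the same ten patients by two raters.
    rater_1 = ["able"] * 5 + ["unable"] * 5
    rater_2 = ["able"] * 4 + ["unable"] * 5 + ["able"]
    print(round(cohen_kappa(rater_1, rater_2), 3))
```

In the rating example the raters agree on 8 of 10 patients, but with each rater using both categories equally often, chance alone would produce 50 per cent agreement, so kappa is (0.8 − 0.5)/(1 − 0.5) = 0.6. Note also that a Pearson correlation is unaffected by a constant shift between occasions, so a high test-retest correlation does not by itself rule out systematic change.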
SELF-REPORT MEASURES

Self-report measures are essential for much research because of the need to obtain subjective assessments of experiences (e.g. feelings about recovery, level of health and well-being). They have a broad appeal as they are often quick to administer and involve little interpretation by the investigator. Self-report measures may take a variety of forms:

Single-item measures

These are self-report questions which use a single question, rating or item to measure the concept of interest.

Battery

A series of self-report questions, ratings or items used to measure a concept. The responses are not summed or weighted. A battery is like a series of single-item measures, all tapping the same concept.

Scale

A series of self-report questions, ratings or items used to measure a concept. The response categories of the items are all in the same format, are summed and may be weighted.

Sometimes researchers do not wish to use a long scale, because their questionnaires are already fairly lengthy, and they prefer single-item questions. Generally, where questionnaire length permits, scales are preferred because they contain a larger number of items and are suitable for statistical calculations using summed and weighted scores. Single-item measures are least preferable because it is doubtful that one question can effectively tap a given phenomenon, and it is also difficult to assess the adequacy of a single-item instrument.

SCALING ITEM RESPONSES

There is a wide variety of scaling methods for item responses. The finer the distinctions that can be made between subjects’ responses, the greater the precision of the measure. For example, rather than asking a person to simply agree or disagree with a statement (which yields only two (nominal) response categories), it is preferable to ask respondents to indicate their opinions along a continuum of agreement: e.g. ‘strongly disagree, disagree, no opinion, agree or strongly agree’ (Likert 1952). Attitudinal and behavioural issues are not easily dichotomized; they often lie on a continuum. A question about any difficulty in washing oneself can elicit a range of responses from ‘no difficulty’, to ‘slight’, ‘moderate’ or ‘severe’ difficulty, to ‘cannot do this at all’. Offering a wide range of choices is likely to reduce the potential for error due to confusion, although the continuum should not be too great, or meaningless responses will be elicited.

If the aim of the research is to elicit continuous rather than categorical responses, there are several techniques available. One approach is to ask respondents to indicate their replies on a visual analogue scale (for example, a line of 100 mm), with descriptions such as ‘very depressed’ to ‘not at all depressed’ at each end. Respondents are asked to place a mark on the line corresponding to their state. There is little evidence that different methods (e.g. categorical response choices or a visual analogue scale) produce different responses, and the choice of method is ultimately the investigator’s preference.

A common method for developing scales is Thurstone’s method. This involves asking respondents to rank statements relating to the variable of interest, which are typed onto cards, into hierarchical order, from the most to the least desirable. The details and scoring systems of this technique are described by Streiner and Norman (2003) and in most psychology and methodology textbooks.

In relation to functional status, many methods exist of scoring or assessing ‘function’ by scaling, whereby a set of items can be put in a hierarchy of severity. The notion is that patients who can perform a particular task will also be able to perform all tasks lower in the hierarchy (i.e. easier tasks). Conversely, if they cannot perform a particular task, they will be unable to perform tasks rated as higher. Guttman’s (1944) scaling of disability was one of the earliest attempts in this field. He ranked degrees of patient disability in respect of a number of activities, such as feeding, continence, ambulation, dressing and bathing. This method assumes that disabilities can be ordered. Provided that disability progressed steadily from one activity to another (i.e. patients first have difficulty in bathing, then in bathing and dressing,
and so on until they are disabled in respect of all five activities), this method of scaling yields a single rating from 1 (no disability) to 6 (disabled on all five). For example, four disabilities always score worse than three, and a score of 6 would assume that all disability items 2–5 had been affirmed. Examples of well-known measures using this scaling method are the Index of Activities of Daily Living (ADL) and the Arthritis Impact Measurement Scales (AIMS). A more recent Guttman scaling instrument has been developed by Williams et al. (1976). The activities which make up the scale are not comprehensive in terms of describing activities of daily living. The instrument has the advantage of having two scales, one for men and one for women. Work on developing and refining the scale is continuing.

Guttman scaling, although popular, has been criticized for its method of attributing equal weights to item responses. For example, with responses per item ranging from 1 to 6, the higher the score the greater the debility; it cannot be assumed that 6 is six times as bad as 1 (Skinner and Yett 1972).

WEIGHTING SCALE ITEMS

Many scales simply involve summing the item scores, each of which has been given an equal weight. This is the easiest solution to scale scoring. There is a fundamental problem with this method: some items may be more important to the construct underlying the scale than others and should therefore contribute more to the total score. Summing also erroneously converts what is at best ordinal data into interval levels of measurement when applying statistical techniques. Statistical caution is required.

The problem with many scoring methods, particularly when equal weight for each item is applied, is that a given score can be arrived at in different ways. A person who is cheerful and lucid but unable to walk due to arthritis may achieve the same scale score as someone who is physically mobile but disoriented and withdrawn. While this may be useful for assessments of staff workload, it is not useful in assessing patient outcome in any detail. Scoring may exacerbate the instrument’s distortion of the experiences of individuals. The problem may be avoided by treating all aspects of disability and health status as equal contributors to overall severity, and expressing the results for each item separately. The disadvantage of this method is that multiple disabilities are not then evident, and this type of breakdown can be cumbersome in analysis. The alternative is to assign different values (weights) to different scale items for scoring purposes. The Normative Scale relies on the classification of items into major or minor categories; e.g. with disability items, ‘being able to feed oneself’ is given twice the weight of ‘being able to dress oneself’. Principal Component Scaling relies on the internal evidence of the data being scaled, thus calculating the relative weights to be given to each item to construct a linear additive index. Numerous health-index questionnaires fall into these categories. The main methods of weighting have been clearly and fully described by Streiner and Norman (2003). The most common uses a different number of items to measure the various aspects of the trait, proportional to its importance within the construct. The total for any subscale would be the number of items with a response of interest divided by the total number of items within the subscale. The subscale scores can be added together for the scale total. Thus, each subscale contributes equally to the scale’s total score, even though each may consist of a different number of items.

In practice, however, it is frequently found that weighting items makes little difference to subjects’ relative scores, despite the inherent logic of this technique. This is because people who score high on one scale variant often score high on others. Examples of this have been given by Streiner and Norman (2003). On the other hand, weighting items can increase the predictive ability of an index (Perloff and Persons 1988). Streiner and Norman (2003) concluded that when the scale contains at least 40 items, or when the items are fairly homogeneous, differential weighting contributes little (except complexity in scoring).

UTILITY RATING SCALES

Finally, economists have devised a series of econometric scaling techniques in an attempt to assign a numerical value to a health state. These are known as utility ratings, the most well-known application of these being the quality-adjusted life year
(Chiang 1965; Rosser and Watts 1971; Kaplan and Bush 1982). A QALY is a year of full life quality. Poor health may reduce the quality of a year. In QALYs, improvements in the length and quality of life are combined into one single index. Each life year is quality-adjusted with a utility value, where 1 = full health. QALYs are measures of units of benefit from a medical intervention, aiming to reflect the change in survival with a weighting factor for quality of life.

Different types of medical interventions are then compared by calculations of costs per QALY gained (Williams 1985). QALYs can be derived by several different methods (e.g. the Rosser Index of Disability (Rosser and Watts 1972), standard gamble, trade-off and rating scale techniques (Torrance et al. 1972, 1982; Torrance 1986, 1987)). These will be only briefly referred to here; interested readers should consult specialist texts for their evaluation (e.g. Torrance 1986; Teeling Smith 1988).

The rating scale

The rating scale is suitable for measuring preferences for chronic or temporary health states. A typical rating scale consists of a line drawn on a page with clearly defined end points such as ‘death/least desirable’ at one end and ‘healthy/most desirable’ at the other. The remaining health states are then located on the line between these two in order of their preference, such that the intervals between them correspond to the differences in preference between the health states, as perceived by the respondent. This is the interval-scaling technique. The scale is measured from 0, assigned to the worst health state of the group, to 1, assigned to the best. The person is asked to select the best and worst health states from the group and then locate the other states on the scale relative to each other, according to the interval-scaling principle.

The standard gamble

With this technique, people are asked to choose between a gamble, offering a desirable outcome with probability P or a less desirable outcome with probability 1 − P, and a certain option of intermediate desirability. The person is asked what probability of getting the desirable or less desirable outcome will make him/her indifferent between the gamble and the certainty.

An example of the standard gamble is to take a person faced with the choice of remaining in a poor state of health versus taking a gamble on treatment that could fully restore health or result in death (for example, surgery for angina). If the probability of restoring full health is varied, there will be a point where the person is indifferent between his/her current poor state of health and taking the gamble of surgery. If the person perceives his/her poor health state as particularly undesirable, he/she will be more likely to accept a greater probability of death in order to escape it.

Equivalence

A similar technique is the ‘equivalence technique’, whereby respondents are asked to identify their point of indifference between keeping alive a group of people in a state of fairly good health and a larger group, whose size is determined by the respondent, of less well people.

Time trade-off

With this method, the technique is to vary the length of time in each health state with treatment choice. For example, the respondent is presented with two alternatives and asked to select the more preferred. Alternative 1 offers the respondent a particular health outcome for a specified length of time followed by death, and alternative 2 offers a different outcome for a different length of time. The time is varied until the respondent is indifferent between the two alternatives.

This technique then requires people to judge how long a period in one state of health could be ‘traded’ for a different period in another state of health. The assumption underlying this concept is that the better their state of health, the shorter the period of life people would accept as a ‘trade-off’ for longer survival in a less desirable state.

All utility scales achieve interval scale level. Research has found that people find the trade-off techniques the easiest of the utility measures; the standard gamble has been found to be more difficult for people, and the rating scale the most difficult.

One main criticism of these techniques is that disease sufferers probably assign more positive
utilities to states of ill health than healthy people do when rating hypothetical disease states. Very elderly persons may feel that a frail and painful existence is just as valuable to them as someone else’s apparently healthier state.

Such models suffer from several limitations. They have not been adequately tested for validity and reliability, and they rarely ask sufferers themselves to make ratings (Carr-Hill 1989; Carr-Hill and Morris 1991). Judgements are usually made by ‘proxy’ patients or ‘experts’ (e.g. doctors, nurses and medical students). They assume that people are rational when assessing quality of life, and that individual value judgements are not interfering with their ratings. It is also difficult to quantify quality of life, which is a multi-dimensional concept, in terms of one figure.

The following chapters present relatively concise reviews of generic health-status measures, and measures of specific domains of broader health-related, including generic, quality of life. A generic scale is useful when the investigator aims to make comparisons between conditions or population groups; a domain-specific scale should be used when the topic covered is of particular interest to the investigator. Some domain-specific scales overlap with disease-specific scales (e.g. scales of physical functioning and psychological well-being). Disease-specific scales are useful when the attributes of particular diseases or conditions require assessment, as they will usually be more relevant to the condition and more sensitive. Disease-specific scales were reviewed in Measuring Disease (Bowling 2001).
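The Guttman scoring rule described under SCALING ITEM RESPONSES can be sketched as follows. The order in which the five activities are lost (bathing first, then dressing, ambulation, continence and feeding) is an assumption made for illustration, loosely following the text’s example that difficulty with bathing precedes difficulty with dressing; real instruments define their own hierarchy. A pattern fits the scale when every disability is accompanied by all ‘earlier’ ones, and its score is then 1 plus the number of disabilities, giving the 1 (no disability) to 6 (disabled on all five) range. The coefficient of reproducibility (1 minus errors divided by total responses) is a conventional summary of fit; the simple error-counting rule used here is one of several in use, and all names are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical order in which the five activities are assumed to be lost
# (bathing first, feeding last); real instruments define their own hierarchy.
HIERARCHY = ["bathing", "dressing", "ambulation", "continence", "feeding"]


def make(*disabled_in: str) -> Dict[str, bool]:
    """Response pattern: True where the person is disabled in that activity."""
    return {activity: activity in disabled_in for activity in HIERARCHY}


def ideal_pattern(n_disabled: int) -> List[bool]:
    """Perfect Guttman pattern with n_disabled disabilities."""
    return [i < n_disabled for i in range(len(HIERARCHY))]


def guttman_score(person: Dict[str, bool]) -> Optional[int]:
    """1 (no disability) to 6 (disabled on all five); None if the pattern breaks the order."""
    observed = [person[a] for a in HIERARCHY]
    n = sum(observed)
    return n + 1 if observed == ideal_pattern(n) else None


def reproducibility(people: List[Dict[str, bool]]) -> float:
    """1 - errors / responses, errors counted against the ideal pattern with the same total."""
    errors = 0
    for person in people:
        observed = [person[a] for a in HIERARCHY]
        errors += sum(o != e for o, e in zip(observed, ideal_pattern(sum(observed))))
    return 1 - errors / (len(people) * len(HIERARCHY))


if __name__ == "__main__":
    sample = [make(), make("bathing", "dressing"), make("feeding")]
    print([guttman_score(p) for p in sample])  # the last pattern does not fit the order
    print(round(reproducibility(sample), 3))
```

Someone disabled only in feeding breaks the assumed order, so no single Guttman score can be assigned and the pattern counts against the reproducibility of the scale.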
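The scoring options discussed under WEIGHTING SCALE ITEMS can be contrasted in a short sketch. The subscales, items and the ‘response of interest’ (here, reporting a problem, coded 1) are hypothetical. `simple_sum` gives every item equal weight, so the subscale with more items dominates; `subscale_score` follows the approach the text attributes to Streiner and Norman (2003), in which each subscale total is the proportion of its items with the response of interest and the subscale scores are added, so each subscale contributes equally regardless of its length; `weighted_sum` applies explicit weights in the spirit of the Normative Scale example.

```python
from typing import Dict, List

# Hypothetical item responses grouped by subscale: 1 = problem reported, 0 = none.
Responses = Dict[str, List[int]]


def simple_sum(responses: Responses) -> int:
    """Equal weights: every item counts once, so long subscales dominate."""
    return sum(sum(items) for items in responses.values())


def subscale_score(responses: Responses) -> float:
    """Each subscale scored as the proportion of its items with the response of
    interest; the proportions are added, so each subscale contributes at most 1."""
    return sum(sum(items) / len(items) for items in responses.values())


def weighted_sum(item_scores: Dict[str, int], weights: Dict[str, float]) -> float:
    """Explicit item weights, e.g. a 'major' item worth twice a 'minor' one."""
    return sum(weights[name] * score for name, score in item_scores.items())


if __name__ == "__main__":
    person_a = {"physical": [1, 1, 1, 0, 0, 0], "mental": [0, 0]}
    person_b = {"physical": [0, 0, 0, 0, 0, 0], "mental": [1, 1]}
    print(simple_sum(person_a), simple_sum(person_b))          # A appears worse
    print(subscale_score(person_a), subscale_score(person_b))  # B appears worse
    # 'feed' weighted twice 'dress', as in the Normative Scale example.
    print(weighted_sum({"feed": 1, "dress": 1}, {"feed": 2.0, "dress": 1.0}))
```

The two invented respondents swap rank depending on the scoring rule (3 versus 2 items under equal weights, 0.5 versus 1.0 under subscale scoring), which illustrates both the text’s point that a total can be reached in different ways and why the choice of weighting can matter for individuals even when it changes group results little.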
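The rating scale, standard gamble and time trade-off techniques described under UTILITY RATING SCALES each yield a utility on a scale from 0 (death or the worst state) to 1 (full health). The sketch below is illustrative only: the respondent is simulated by a hypothetical function that says whether he/she would take the gamble, and the iterative questioning of a real interview is replaced by simple bisection on the probability. Under the usual expected-utility reading of the standard gamble, the utility of the chronic state equals the probability of full health at which the respondent is indifferent; in the time trade-off it is the number of years in full health judged equivalent to a longer period in the state, divided by that period; on the rating scale it is the position of the mark rescaled between the end points. All names are hypothetical.

```python
from typing import Callable


def rating_scale_utility(mark_mm: float, worst_mm: float, best_mm: float) -> float:
    """Rescale a mark on the line so the worst state is 0 and the best is 1."""
    return (mark_mm - worst_mm) / (best_mm - worst_mm)


def standard_gamble(prefers_gamble: Callable[[float], bool], tol: float = 1e-4) -> float:
    """Bisect on p, the chance that the gamble restores full health (otherwise death).

    prefers_gamble(p) returns True if the respondent would take the gamble at p.
    The indifference value of p is taken as the utility of the current state.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        p = (lo + hi) / 2
        if prefers_gamble(p):
            hi = p  # gamble acceptable: the indifference point is at or below p
        else:
            lo = p  # gamble refused: the indifference point is above p
    return (lo + hi) / 2


def time_trade_off(years_full_health: float, years_in_state: float) -> float:
    """Utility = years in full health judged equivalent / years in the state."""
    return years_full_health / years_in_state


if __name__ == "__main__":
    # Hypothetical respondent who values the chronic state at 0.7 of full health
    # and therefore accepts the gamble whenever p is at least 0.7.
    print(round(standard_gamble(lambda p: p >= 0.7), 3))
    print(time_trade_off(years_full_health=7, years_in_state=10))
    print(rating_scale_utility(mark_mm=70, worst_mm=0, best_mm=100))
```

For a perfectly consistent simulated respondent the three methods agree; the text’s point is that real respondents find the techniques differently difficult, and their answers need not agree.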
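The QALY arithmetic referred to under UTILITY RATING SCALES is straightforward once utilities are available: each period of survival is multiplied by the utility of the health state in which it is spent, and interventions are compared by the extra cost per QALY gained. The figures below are invented for illustration, and the sketch applies no discounting (economic evaluations commonly discount future costs and QALYs, which this sketch omits); the function names are hypothetical.

```python
from typing import List, Tuple

# A profile is a list of (years, utility) periods; utility 1 = full health, 0 = death.
Profile = List[Tuple[float, float]]


def qalys(profile: Profile) -> float:
    """Quality-adjusted life years: sum of years weighted by utility."""
    return sum(years * utility for years, utility in profile)


def cost_per_qaly_gained(cost_new: float, cost_old: float,
                         qalys_new: float, qalys_old: float) -> float:
    """Incremental cost divided by incremental QALYs."""
    gained = qalys_new - qalys_old
    if gained <= 0:
        raise ValueError("no QALY gain; a cost per QALY gained is not meaningful")
    return (cost_new - cost_old) / gained


if __name__ == "__main__":
    usual_care = [(4, 0.5)]           # 4 years at utility 0.5
    surgery = [(1, 0.25), (6, 0.75)]  # 1 poor year of recovery, then 6 years at 0.75
    q_old, q_new = qalys(usual_care), qalys(surgery)
    print(q_old, q_new)
    print(cost_per_qaly_gained(13000, 2000, q_new, q_old))
```

Here surgery yields 4.75 QALYs against 2.0 for usual care, a gain of 2.75 QALYs for an extra cost of 11,000 (in whatever currency), or 4,000 per QALY gained.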
3 MEASURING FUNCTIONAL ABILITY
There are a number of methodological techniques available for measuring function: direct physical tests of function, direct observation of behaviour and interviews with the person concerned or a third party. Each method has its limitations, as has been indicated. Direct observation is rarely used because it is so time consuming. Direct tests of functioning, such as range-of-limb movement, grip strength, walking time; or standards such as joint swelling, pain scores, morning stiffness, erythrocyte sedimentation rate and joint counts, while objective, may not necessarily give an accurate indication of ability or performance. Grip strength will tell you how much a patient can and will squeeze a bag on a particular day. Patients may be more concerned with subjective feelings and reductions in activities associated with daily living.

Most measures of functional disability are self-report methods. Respondents are asked to report limitations on their activities of daily living (ADL). Sometimes researchers do not wish to use a long scale to measure functional status, usually because their questionnaires are already fairly lengthy, and they therefore prefer single-item questions. One well-known example is the British General Household Survey, which limits its measure of functional status to the questions:

Do you have any longstanding illness, disability or infirmity? By longstanding I mean anything that has troubled you over a period of time or that is likely to affect you over a period of time.

IF YES

Does this illness or disability limit your activities in any way?

‘Longstanding illness’ is defined as a positive answer to the first part of the question, and ‘limiting longstanding illness’ as a positive answer to both parts of the question. Respondents are also asked ‘What is the matter with you’ and responses are categorised into disease groups similar to International Classification of Diseases categories.
(Office of Population Censuses and Surveys 1987; Office for National Statistics 2002)

The main criticism of this type of measure is that responses may vary with people’s expectations of health and illness and perceptions of limitations. Subjectivity is involved. People who are short-sighted might reply ‘yes’ or they might not define their condition as a ‘longstanding illness, disability or infirmity’ or as limiting. Also, people who are used to their conditions and the restrictions they impose may have adjusted to them and no longer define them as limiting. However, if a measure of perceived health status is required, rather than objective morbidity indicators, then this inherent subjectivity is the strength of the measure.

There are many measures of functional ability.
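The General Household Survey definitions quoted above amount to a simple classification rule, which the following sketch applies to a set of answers. The function names and the answer coding are hypothetical: the follow-up question is represented as `None` when it was not asked because the first answer was ‘no’. Note that everyone with a limiting longstanding illness is also counted as having a longstanding illness.

```python
from typing import List, Optional, Tuple

# (first question, follow-up question); follow-up is None when not asked.
Answer = Tuple[bool, Optional[bool]]


def ghs_flags(longstanding: bool, limits_activities: Optional[bool]) -> Tuple[bool, bool]:
    """Return (longstanding illness, limiting longstanding illness)."""
    if longstanding and limits_activities is None:
        raise ValueError("IF YES: the follow-up question must be answered")
    limiting = longstanding and bool(limits_activities)
    return longstanding, limiting


def prevalence(answers: List[Answer]) -> Tuple[float, float]:
    """Percentage reporting longstanding illness and limiting longstanding illness."""
    flags = [ghs_flags(q1, q2) for q1, q2 in answers]
    n = len(flags)
    return 100 * sum(f[0] for f in flags) / n, 100 * sum(f[1] for f in flags) / n


if __name__ == "__main__":
    # Hypothetical answers from four respondents.
    sample = [(False, None), (True, False), (True, True), (True, True)]
    print(prevalence(sample))
```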
Some measures focus on basic mobility (e.g. walking indoors/outdoors), although most also include self-care activities of daily living (termed ADL, which include ability to wash self, bathe, use the toilet, eat/drink) along with instrumental activities of daily living (IADL), which encompass the activities that are required for the maintenance of independence and optimum levels of functioning (e.g. do laundry, shop, housework, prepare meals/cook).

Only those measures which are best tested for validity and reliability, are of topical interest, are frequently used, or are potentially applicable in Europe as well as in the USA, are presented in detail. Other scales which are less well tested or older, but which are popular or frequently used, are also presented, but in less detail (e.g. the Index of Activities of Daily Living, Barthel Index and the Karnofsky Performance Index). While Deyo (1993) recommended that these older scales of functioning should be abandoned in favour of the more recently developed scales (e.g. the HAQ and AIMS), these older scales continue to be popular among many investigators, possibly because they are relatively brief and simple to score, comparisons with studies using the same measures over time can be made, and they are simply well known. Readers are referred to Measuring Disease (Bowling 2001) for measures used in a wider variety of generic and disease-specific contexts. There are also several available measures of broader health status which incorporate subscales of functional ability. These are discussed in Chapter 4 on broader measures of health status.

Measures of functional ability are frequently used in population surveys and evaluative studies because they are socially relevant and interpretable. However, self-care limitations are rare in a general population – less than 0.5 per cent are likely to report limitations in eating, dressing, bathing or using the toilet due to poor health. Thus in studies of the general population these items should be selected sparingly, in contrast to studies of the severely ill or very elderly, where self-care measures may be more appropriate. Broad measures of function are also likely to miss specific effects of disease, and generic measures should be supplemented with more highly focused measures of disease impact.

Most measures of functioning focus narrowly on mobility, self-care and instrumental tasks, often ignoring financial, emotional and social needs which may be equally or more important. Measures of physical functioning and activity limitations do not always provide assessments of functioning in everyday social roles, mental functioning, sexual functioning, pain and comfort. More meaningful aspects of household roles are also largely ignored: for example, the effect of the condition on the time taken to perform chores such as cooking, shopping, cleaning, errands, child-care and other roles. Assessments of patients’ satisfaction and choice with regard to level of functioning are seldom made. For example, people may prefer to have a strip wash rather than to risk slipping while getting into the bath. On scales of functioning it is often assumed by investigators that respondents achieving a low score necessarily have a poorer quality of life than a patient with a higher score. Thus someone who is wheelchair bound might have a low functional ability score, and thereby be assumed by investigators to have a poor quality of life, despite the fact that he or she may be receiving good quality social support and rate his or her own quality of life as good.

Most scales have been developed on the basis of professionals’ (e.g. doctors’) judgements about essential abilities for daily living. Berg et al. (1976) asked 150 health workers to assign weights from 0 to 10 to 50 listed abilities or functions; open-ended questions to elicit functions not listed were also used. Serious problems were documented in finding simple and meaningful terms to describe functional loss to many respondents. Respondents assigned the largest average values to the ability to use one’s mental abilities, to see, to think clearly, to love and be loved, to make decisions for oneself, to live at home, to walk, to maintain contact with family and friends, and to talk. Although the sample was limited to health workers, the results indicate a need to consider lay people’s judgements of essential functions and to include these in measures of health outcomes. Also there are often differing viewpoints of how people ought to be performing, for instance, on the part of clinicians and patients. The patient may want to walk without aids or a limp, while the clinician may regard ‘walking with aids and limp’ as indicative of a satisfactory outcome.

The problem of measuring functional disability is compounded by conceptual difficulties and interactive factors. One of the major problems with using a functional index is that different people may react differently to apparently similar levels of
physical impairment, depending on their expectations, priorities, goals, social support networks and so on. Functional disability, like dependency, is a multi-dimensional concept which may relate to physical, mental, cognitive, social, economic or environmental factors (Wilkin 1987). It is thus an interactive concept: it is not a necessary consequence of impairment but may, for example, be a consequence of the siting of bathrooms, toilets and other facilities and the necessity of negotiating stairs. In terms of dependency, severity might be a function of the existence of aids or the frequency and timing of help. Perceptions of the severity of both disability and dependency will also be influenced by previous history and expectations for the future. Quite apart from questions of meaning, most scales are not sensitive enough, as they simply ask respondents whether they have no or some difficulty with a task, or whether they are unable to perform it at all. This raises the question of how limited one has to be to answer in the affirmative. More sensitive scales, with a wider range of response choices based on degrees of severity, require development.
Finally, caution is needed when deciding which measures to use. Measures tend to be developed, administered and validated on one of two types of sample: people living in the community or people living in institutions. The measures are not necessarily interchangeable between samples, and must be appropriate for the population type. Many measures have been developed to be optimally suitable for a particular age group and may be inappropriate for use with other age groups.
THE OLDER AMERICANS’ RESOURCES AND SERVICES SCHEDULE (OARS): MULTI-DIMENSIONAL FUNCTIONAL ASSESSMENT QUESTIONNAIRE (OMFAQ)
The OARS Multi-Dimensional Functional Assessment Questionnaire (OMFAQ) was developed in order to measure the level of functioning and need for services of older people. It was developed at the Duke University Center for the Study of Ageing and Human Development (Fillenbaum 1978, 1988; Fillenbaum and Smyer 1981). The measure was developed for use with adults aged 55 and over and should be restricted to this age group. Gatz et al. (1987) tested OARS on a sample of over 1,000 people aged 26–86 and reported that the global test scores can be misleading when applied to different age groups. It can be used with community or institutional samples. When used with institutionalized samples some questions are omitted, and institution-relevant items are added.
OARS measures five dimensions of personal functioning, including mental impairment. Despite its popularity, little work has been done on the measurement properties of the mental-health scale. The numerous applications of the OARS during its construction are described in the OARS handbook (Fillenbaum 1978). One of the best-known studies using the OARS questionnaire was the survey of social support in relation to mortality carried out in Durham, North Carolina (Blazer 1982). This was based on 331 people aged 65 and over, taken from a wider community sample survey of 997 people. This sub-sample was followed up 30 months after the baseline interview, which assessed functional status, social support, depressive symptoms, physical-health status, cognitive functioning, stressful life events and cigarette smoking. Increased mortality risk was found for those with impaired social support and social interaction.
The OARS questionnaire has been widely used throughout the USA and in other countries, including Australia. Two Spanish translations are available for use with Spanish-speaking people in America. A Brazilian version of the mental health subscale has been developed (Blay et al. 1988). The OARS subscales of mental health and physical functioning (IADL) have been used more commonly than the entire measure (Gatz et al. 1987; Pfeiffer et al. 1989; Doble et al. 1997).
Administration takes 45 minutes on average, although in practice it probably takes about an hour. The main limitation of OARS is its length. A shorter version of OARS is available, known as the Functional Assessment Inventory (FAI), although this still takes approximately 35 minutes to administer. The shorter FAI contains the functional assessment items, but not the detailed service-use items (Pfeiffer et al. 1989). Interviewer training, which takes two days, is recommended. Administration is also explained in a manual which can be purchased from the authors. As with most scales, OARS is strictly copyrighted and permission for its use must be obtained from the authors.
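The OARS Part A summary ratings lend themselves to simple aggregation. As a minimal sketch (not the handbook's procedure), the Python below takes the five domain ratings (1 = excellent to 6 = totally impaired) and computes the two aggregate scores described in the Scoring section below: the summed score (5–30) and the number of impaired areas at either of the two cut-offs mentioned there (1–2 versus 3–6, or 1–3 versus 4–6). The domain keys and function names are hypothetical, and the trichotomized classification is omitted because its full rules are given only in the OARS handbook.

```python
from typing import Dict

# The five Part A domains; the keys are illustrative, not official field names.
DOMAINS = ("social", "economic", "mental_health", "physical_health", "adl")


def check_ratings(ratings: Dict[str, int]) -> None:
    """Raise ValueError unless every domain has a rating from 1 (excellent) to 6 (totally impaired)."""
    for domain in DOMAINS:
        value = ratings.get(domain)
        if value is None or not 1 <= value <= 6:
            raise ValueError(f"{domain}: expected a rating from 1 to 6, got {value!r}")


def summed_score(ratings: Dict[str, int]) -> int:
    """Sum of the five ratings: 5 = excellent everywhere, 30 = total overall impairment.

    Summing treats the five areas as equivalent, which has not been validated.
    """
    check_ratings(ratings)
    return sum(ratings[d] for d in DOMAINS)


def impaired_area_count(ratings: Dict[str, int], cutoff: int = 3) -> int:
    """Number of impaired areas (0-5) for the 6-class system.

    A rating >= cutoff counts as impaired: cutoff=3 contrasts 1-2 with 3-6,
    cutoff=4 contrasts 1-3 with 4-6.
    """
    if cutoff not in (3, 4):
        raise ValueError("cutoff must be 3 (1-2 vs 3-6) or 4 (1-3 vs 4-6)")
    check_ratings(ratings)
    return sum(1 for d in DOMAINS if ratings[d] >= cutoff)


if __name__ == "__main__":
    example = {"social": 2, "economic": 3, "mental_health": 1,
               "physical_health": 4, "adl": 2}
    print(summed_score(example))            # 12
    print(impaired_area_count(example))     # 2 (economic, physical_health)
    print(impaired_area_count(example, 4))  # 1 (physical_health)
```

Keeping the five ratings available, rather than storing only the summed score, preserves the distinctions between areas that may be preferable, since the equivalence of areas assumed by summing has not been established.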
Content
The questionnaire consists of two independent sections. Part A consists of the assessment of functioning in relation to five domains: social, economic, mental health, physical health and activities of daily living. Interviewers also make ratings of ability. The responses to the items in each area are summarized on a six-point scale (e.g. level of functioning: 1 = excellent to 6 = totally impaired). These five ratings yield a profile showing concomitant functioning across the five areas. This scale contains 66 questions, plus 10 questions for completion by an informant. If sub-questions are included, the total number of items asked is 120, plus five interviewer summary ratings. Part B is a services assessment that directs enquiry into 24 generically defined services, determining for each the current use, the extent of use in the past six months, the type of service provider and perceived need. There is also a demographic section. Information is sought from the respondent, although proxy interviews are permitted. Examples of the items are:
Can you get to places out of walking distance?
2 Without help (can travel alone on buses, taxis, or drive your own car).
1 With some help (need someone to help you or go with you when travelling).
0 Or are you unable to travel unless emergency arrangements are made for a specialized vehicle like an ambulance?
– Not answered.
Can you go shopping for groceries or clothes (assuming subject has transportation)?
2 Without help (taking care of all shopping needs yourself, assuming you had transportation).
1 With some help (need someone to go with you on all shopping trips).
0 Or are you completely unable to go shopping?
– Not answered.
Can you prepare your own meals?
2 Without help (plan and cook full meals yourself).
1 With some help (can prepare some things but unable to cook full meals yourself).
0 Or are you completely unable to prepare any meals?
– Not answered.
Can you do your own housework?
2 Without help (can scrub floors, etc.).
1 With some help (can do light housework but need help with heavy work).
0 Or are you completely unable to do any housework?
– Not answered.
Scoring
There are various methods of aggregation. The 1–6 ratings on each of the five scales may be summed (from 5: excellent functioning in each area, to 30: total overall impairment), although summing is problematic, as was indicated earlier. Alternatively, the number of areas of functioning that are impaired can be counted. To do this, it is first necessary to establish which level of functioning indicates impairment. Ratings of 1 and 2 may be combined and compared with ratings of 3–6 (i.e. non-impaired versus impaired), or the contrast may be between ratings of 1–3 and 4–6. Summed over areas, this yields a 6-class system (from 0 areas of functioning impaired to 5 areas impaired). A more complex classification based on a trichotomized scale (ratings 1–2, 3–4 and 5–6), which takes account of both the number of areas of impairment and the severity of impairment, has also been developed; full details are contained in the handbook. It is also possible to examine the responses to each individual question and treat the items as separate units. For clinical purposes it might be important to maintain these distinctions, but not for population purposes.
However, a classification based on summed information assumes that areas are equivalent; at present the validity of this assumption has not been established. Consequently a classification system that maintains distinctions between areas may be preferred.
Validity
OARS and the shorter FAI were extensively tested for validity at the various stages of their development (Fillenbaum 1978; Cairl et al. 1983). Both instruments appear to have face and content validity, although discriminant and predictive validity have not yet been adequately tested.
Criterion validity was tested on 33 patients from a family medical practice. Spearman correlations between separate criterion ratings and the items were 0.68 for economic resources, 0.67 for mental health, 0.82 for physical health and 0.89 for self-care ability. On
another study of 82 community residents, the Spearman correlation between independent psychiatrists’ ratings and the mental-health items was 0.62, and between physician assistants’ ratings and the physical-health items was 0.70. Detailed results for the reliability and validity of OARS can be found in Fillenbaum (1978) and Fillenbaum and Smyer (1981), and results for the FAI can be found in Cairl et al. (1983).
Fillenbaum (1978) analysed the factor structure of the OARS instrument and reported that it represented three factors within social resources, one in economic resources, four in mental health (life satisfaction, psychosomatic symptoms, alienation, cognitive deficit), and one in each of the physical functioning subscales, confirming their division into ADL and IADL. The mental health items have been used separately, in an instrument known as the Short Psychiatric Evaluation Schedule, and factor analyses of this subscale have indicated that its 15 items represent three distinct factors: alienation, somatic symptoms and depression (Gatz et al. 1987).
Reliability
OARS and the FAI have been extensively tested for reliability. Tests of agreement between raters’ assessments have been carried out involving 11 raters who assessed 30 patients. Intraclass correlations ranged from 0.66 for physical health to 0.87 for self-care. Raters were in complete agreement for 74 per cent of the ratings (Fillenbaum 1978). Reliability ratings of the Community Service Questionnaire gave inter-rater Kendall coefficients of concordance of between 0.70 and 0.93. Test-retest reliability, conducted 12–18 months apart, gave correlations of between 0.47 and 1.00. Five-week test-retest correlations based on ratings of 30 elderly people gave results of 0.82 for the physical Activities of Daily Living (ADL) questions, such as personal care, 0.71 for the Instrumental Activities of Daily Living (IADL) questions, such as housework, and 0.79 for the economic resources items. For social resources the correlation was 0.71 and for subjective questions it was 0.53. Coefficients for life satisfaction and mental health were lower: 0.42 and 0.32 respectively.
Although much of the data on reliability and validity refers to previous versions of the instrument, the OARS appears to be a superior measure to most others. There is relatively little evidence of recent use of the full OARS scales, but it was unhesitatingly recommended by McDowell and Newell (1996) in their detailed review of measures, although they called for more, and larger, studies of its psychometric properties. On the other hand, it has been criticized for its excessive number of sub-items and lengthy period of administration (Perlman 1987).
THE STANFORD ARTHRITIS CENTER HEALTH ASSESSMENT QUESTIONNAIRE (HAQ)
Fries et al. (1980) developed the HAQ on the basis that outcome should be measured in terms of the patient’s value system. Functional ability (e.g. the ability to walk) is a component of this but sedimentation rate is not. The framework used for the development of the HAQ was based on the belief that a patient desires to be alive, free of pain, functioning normally, experiencing minimal treatment toxicity, and financially solvent. Patient outcome was thus represented by:
1 death,
2 discomfort,
3 disability,
4 therapeutic toxicity,
5 dollar cost.
The HAQ was one of the first broader, patient-based measures of functioning.
In the process of developing this measure, 62 potential questions were selected from questionnaires in use in the rheumatic-diseases field and elsewhere, including the Uniform Database for Rheumatic Diseases (Fries et al. 1974; Convery et al. 1977), the Barthel Index (Mahoney and Barthel 1965) and the ADL (Katz et al. 1963). Testing the measure for reliability and validity with patients with rheumatoid arthritis reduced the measure to 21 questions, grouped into nine components and graded in ordinal fashion from 0 to 3. Individual items with correlations of 0.85 or higher were eliminated in the interests of conciseness, on the assumption that such high correlations suggested redundancy between components. Correlations of the remaining items ranged between 0.35 and 0.65 (Fries et al. 1980). The
resulting instrument was subsequently administered in more than two dozen settings.
The HAQ is suitable for use in community settings and has been frequently administered to patients with rheumatoid arthritis, osteoarthritis, systemic lupus and ankylosing spondylitis. It is coherent and concise and can be administered face to face or self-administered, so it does not rely on skilled personnel to administer (it is therefore relatively cheap). Administration takes 5–10 minutes and manual scoring can be completed within a minute. The full HAQ is the most commonly used version, although a short two-page HAQ is also available, which includes the functional ability scale (HAQ Disability Index), the visual analogue pain scale and the visual analogue patient global health scale (Bruce and Fries 2003a, 2003b).
Content
Functional ability is measured by 20 questions within eight components relating to movements of the upper extremity, locomotor activities of the lower extremity and activities that involve both extremities: dressing and grooming, rising, eating, walking, hygiene, reach, grip and outside activity. Sexual activity was included in an earlier version. Each of these components consists of two or three relevant questions. Pain, discomfort, drug toxicity and financial costs are also assessed. The functional ability scale is more commonly used than the full HAQ. Examples of questions are:
Are you able to:
Dress yourself, including tying shoelaces and doing buttons?
Stand up from an armless chair?
Get in and out of bed?
Walk outdoors on flat ground?
Do chores such as vacuuming, housework or light gardening?
The response options are ‘without any difficulty’, ‘with some difficulty’, ‘with much difficulty’ and ‘unable to do’.
Scoring
In relation to functional ability, the ordinal scoring of 0–3 is based on the following scale: without difficulty = 0 (no disability), with some difficulty = 1, with much difficulty = 2 and unable to do = 3 (completely disabled). The index is calculated by adding the component scores and then dividing the total by the number of components answered. The authors reported reluctance among patients to report sexual activity (Fries et al. 1980), and some investigators omitted this item (Fitzpatrick et al. 1988); it has therefore been removed from the current version of the HAQ.
The scales of pain and drug toxicity range from 0 to 3. The HAQ VAS pain scale asks about pain over the last week, and consists of a horizontal visual analogue scale (VAS) anchored at 0 (no pain) and 3 (severe pain). A scale of 0 to 100 may be used instead. The global health status scale is a 15 cm VAS anchored at 0 (very well) and 100 (very poor). The drug toxicity index is composed of questions about the adverse effects of drugs and treatment, rated from none = 0 to severe = 3.
In the personal-cost section (applicable to private health care systems), medical and surgical costs are calculated for the past year. The number and type of medications, X-rays, surgery, physician and paramedical visits, appliances, number of laboratory tests, and hospitalizations are detailed. The average cost in the area covered by the research team (Stanford, California) was determined and used to compute the dollar values. This section can be applied only in countries and areas where costs are known. Social costs are calculated by determining changes in employment, income, the need to employ domestic help, the cost of transport for medical care and all arthritis-related costs over the past 12 months (Fries et al. 1980). The cost questions have not yet been satisfactorily tested; initial tests for validity suffered from poor patient recall. Most investigators use only the scales of functioning.
Fitzpatrick et al. (1989), in their comparison of the HAQ and the Functional Limitation Profile (FLP) (derived from the Sickness Impact Profile), reported that the complex scoring system used by the FLP appeared to achieve no gain in precision over the simpler ordinal assumptions of the HAQ.
Validity
The HAQ has been extensively validated. Correlations of the HAQ against observed patient performance ranged
from fair to high (0.47–0.88) (Fries et al. 1980, 1982; Fries 1983; Kirwan and Reeback 1983), and it correlated highly with a range of clinical and laboratory measures (Fries et al. 1980, 1982; Ramey et al. 1992, 1996; Bruce and Fries 2003a).
Several studies have tested the validity of the HAQ, in particular by correlating its results with those of the Arthritis Impact Measurement Scales (AIMS) (Fries et al. 1982; Brown et al. 1984). These two instruments were shown to measure the same dimensions of disability; the correlation coefficient reported by Fries et al. (1982) was 0.91. Inter-correlations within the three parts of each instrument relating to physical disability, psychological state and pain were high, and those across these three dimensions were weak. Patient self-assessed global arthritis scores were also strongly associated with the disability score and less strongly with pain. These correlations are consistent with current knowledge within rheumatology that disability is a large component of arthritic patients’ concerns. HAQ scores correlated in the expected direction with the direct medical costs of treatment for rheumatoid arthritis (Michard et al. 2003), and were able to predict outcome at 6 months in an exercise programme for people with osteoarthritis of the knee (Dias et al. 2003).
The HAQ has been reported by Liang et al. (1985) to correlate well with other well-tested scales of health status and functional ability, including the Sickness Impact Profile (Bergner et al. 1981) and the Functional Status Index (Denniston and Jette 1980; Jette 1980). A study in the UK by Fitzpatrick et al. (1988) of 105 patients with rheumatoid arthritis reported high inter-measure correlations between the HAQ, the Functional Limitations Profile (FLP) (Charlton et al. 1983), observations of grip strength and the articular index (e.g. the correlations between grip strength and the HAQ on two occasions were −0.73 and −0.68). The HAQ appears to be a valid measure of function in rheumatoid arthritis.
HAQ scores have been reported to be predictive of outcome among patients with rheumatoid arthritis and other conditions. Sultan et al. (2004) reported that a baseline HAQ disability score below the median was predictive of at least a 20 per cent improvement in condition at one and two years (odds ratios 1.77 to 5.05) in four out of five outcome measures used in a drug-treatment trial of patients with early diffuse scleroderma. The HAQ was reported in early studies to be sensitive to change in patients’ conditions (Fries 1983). Fitzpatrick et al. (1989), in their UK study, reported that the HAQ performed better than the FLP in relation to specificity and sensitivity, although its performance could at best be described as moderate. The large standard deviations in the scores of both measures indicated the presence of many ‘false positives’ for both improvement and deterioration of patients over time.
The validity of the HAQ pain scale and global health status scale has been demonstrated in many studies (Ramey et al. 1996; Bruce and Fries 2003b). However, initial tests of the validity and reliability of the drug toxicity section revealed weak results, and this component requires further testing (Fries et al. 1980).
Principal components analysis has shown factor loadings along a first ‘disability’ component, which explains 65 per cent of the variance, and a second component with positive loadings for fine activities of the upper extremity and negative loadings for weight-bearing actions of the lower extremity, which explains an additional 10 per cent of the variance (Fries et al. 1980, 1982). From this, it was inferred that the resulting disability index (an equally weighted sum) is well focused and appropriate for measuring overall arthritis severity.
Reliability
The earliest tests for reliability were reported by Fries et al. (1980). In addition to the reporting of mean values, these included correlations of HAQ scores with the results of direct observations by a nurse of patients’ performance of 15 household and personal-care tasks, mobility and grip, and correlations between self-administered and interview-based HAQ completion. These early tests were based on just 20 patient volunteers attending a rheumatoid arthritis clinic. The correlations for individual items between self-administered and interview-administered HAQ ranged from average (or ‘respectable’) at 0.56 to excellent at 0.85. The corresponding correlation for the disability score was 0.85, indicating good reliability for this component.
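The disability score referred to above is the HAQ functional ability index, scored as described under Scoring: responses are coded 0–3 and the index is the sum of component scores divided by the number of components answered. A minimal Python sketch follows. It assumes, following the usual HAQ Disability Index convention rather than anything stated in this text, that a component's score is the highest of its item scores; the component keys and function names are hypothetical, and any further rules in the official HAQ scoring instructions are not modelled.

```python
from typing import Dict, List, Optional

# Ordinal coding of the four response options (0 = no disability, 3 = completely disabled).
RESPONSE_SCORES = {
    "without any difficulty": 0,
    "with some difficulty": 1,
    "with much difficulty": 2,
    "unable to do": 3,
}

# The eight current components; keys are illustrative, not official variable names.
COMPONENTS = ("dressing_grooming", "rising", "eating", "walking",
              "hygiene", "reach", "grip", "outside_activity")


def component_score(answers: List[Optional[str]]) -> Optional[int]:
    """Score one component from its two or three answers.

    ASSUMPTION: the component takes the highest (worst) item score.
    Unanswered items (None) are skipped; if nothing was answered the
    component counts as unanswered and None is returned.
    """
    scores = [RESPONSE_SCORES[a] for a in answers if a is not None]
    return max(scores) if scores else None


def disability_index(responses: Dict[str, List[Optional[str]]]) -> float:
    """Sum of component scores divided by the number of components answered (0-3)."""
    answered = [
        score
        for score in (component_score(responses.get(name, [])) for name in COMPONENTS)
        if score is not None
    ]
    if not answered:
        raise ValueError("no components answered")
    return sum(answered) / len(answered)


if __name__ == "__main__":
    example = {
        "dressing_grooming": ["with some difficulty", "without any difficulty"],
        "rising": ["with much difficulty", "with some difficulty"],
        "eating": ["without any difficulty", "without any difficulty"],
        "walking": ["with some difficulty", "without any difficulty"],
        "hygiene": [None, None],  # left blank: excluded from the denominator
        "reach": ["without any difficulty", "without any difficulty"],
        "grip": ["with some difficulty", "without any difficulty"],
        "outside_activity": ["with much difficulty", "with some difficulty"],
    }
    print(disability_index(example))  # 7 / 7 answered components -> 1.0
```

Dividing by the number of components answered, rather than by eight, keeps the index on the 0–3 scale when some components are missing; a real analysis would also need a rule for the minimum number of components required, which this sketch leaves out.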
The inter-item correlations ranged from average at 0.47 to excellent at 0.88. The weaker items (e.g. reach) were subsequently reworded to minimize variability in responses. For example, the reach question originally read ‘reach and get down heavy objects’; people’s ideas of ‘heavy’ varied, so respondents are now asked about a standardized item: ‘reach and get down a 5 lb. bag of sugar which is above your head’. The authors also compared overall questionnaire and evaluator agreement: these agreed exactly on 59 per cent of the responses and were within one point in 93 per cent of cases (the weighted kappa statistic, using rank disagreement rates, was 0.52, implying ‘moderate’ agreement).
Fries et al. (1982) reported mean values from a diverse English-speaking community population of 331 respondents with rheumatoid arthritis. The authors also reported a test-retest correlation of 0.98, based on this population; the mean values showed stability on repeat testing. Responses are similar whether the instrument is self-completed or administered by a nurse or doctor. Many studies have since replicated or improved upon these correlations, indicating excellent levels of reliability (Ramey et al. 1996).
The HAQ is a good measure of function and has been extensively tested for reliability and validity. It is sensitive to change, can be self- or interviewer-administered and is suitable for use in the community. It is often the preferred tool because it is concise and easy to administer (Lubeck 2002). It has been used in population-based studies as well as evaluations of treatment outcome (Bruce and Fries 2003b). However, it does not capture disability associated with sensory dysfunction, patient satisfaction or social role functioning.
Pincus et al. (1983) developed an abbreviated version of the HAQ with one question for each of the eight disability domains, plus new questions on satisfaction and transition items on changes in ability. Its test-retest reliability was high (0.91), although correlations with grip strength, walking time and functional classification were more moderate (0.44–0.60). Other modified versions have been developed for use in other disease-specific contexts, including AIDS (Lubeck and Fries 1992). The HAQ has been extensively used among clinicians in the USA and the UK (Ramey et al. 1992); it is available in more than 60 languages and is supported by a bibliography of more than 500 references (Bruce and Fries 2003a, 2003b). While freely available and considered to be in the public domain (although independently translated copies may carry a charge), the HAQ is strictly copyrighted by Stanford University (trex@stanford.edu; http://aramis.stanford.edu) in order to prevent unauthorized modifications and to preserve its validity and the standardization of assessments across studies.
THE ARTHRITIS IMPACT MEASUREMENT SCALES (AIMS)
The original AIMS1 and the revised AIMS2 were developed by Meenan et al. (1980) and Meenan and Mason (1990). A shortened version of the AIMS1 was produced with good psychometric properties (Wallston et al. 1989). AIMS2 was developed in an attempt to produce a more comprehensive and sensitive version of AIMS1 (Meenan et al. 1992). A short-form version of AIMS2 (AIMS2-SF) has been developed with good psychometric properties, similar to those of the full AIMS2 (Guillemin et al. 1997).
Both AIMS1 and AIMS2 cover physical, social and emotional well-being, and were developed to assess patient outcome in arthritis and other chronic diseases. AIMS1 was partly adapted from Katz’s Index of Activities of Daily Living and the RAND and BUSH scales (Patrick et al. 1973a; Brook et al. 1979a, 1979b; Ware et al. 1979; Bush 1984). AIMS1 and AIMS2 were well tested for reliability, validity and sensitivity to change. Their applications have been predominantly in clinical settings (with arthritis and rheumatism patients) as an assessment of outcome after therapy. The instrument can be self-completed. The self-completion time for AIMS2 averages 23 minutes (Meenan et al. 1992).
Content
The original AIMS1 had 45 multiple-choice items, with nine subscales. It assessed nine dimensions of health and functional ability: mobility, physical activity (walking, bending, lifting), activities of daily living, dexterity, household activities (management of money, medication, housekeeping), pain, social activity, depression and anxiety. An additional 19 items covered general health, health perceptions
and demographic details (e.g. questions, including a visual analogue item, that assess the effect of arthritis, other medical problems and their treatment).
AIMS2 is a longer, 78-item questionnaire (AIMS1 contained just 45 items). It has some new items, and others have been revised or deleted. The three new sections evaluate arm function, work and social support. Sections were added to assess satisfaction with function, attribution of problems to arthritis, and self-designation of priority areas for improvement. The first 57 items form 12 scales: mobility level, walking and bending, hand and finger function, arm function, self-care tasks, household tasks, social activity, social support, pain from arthritis, work, level of tension and mood. The remaining items relate to satisfaction with health status in each of the areas of functioning measured, functional problems due to arthritis, prioritization of the three areas in which the respondent would most like to see improvement, general health perceptions, overall impact of arthritis in each of the areas of functioning measured, type and duration of arthritis, medication usage, co-morbidity and socio-demographic characteristics (Meenan and Mason 1990, 1994). Most questions refer to problems experienced within the last month.
Examples of AIMS2:
How often were you in a bed or chair for most or all of the day?
Did you have trouble doing vigorous activities such as running, lifting heavy objects, or participating in strenuous sports?
Could you easily button a shirt or blouse?
How often did you get together with friends or relatives?
How often did you have severe pain from your arthritis?
How often were you unable to do any paid work, housework or schoolwork?
All days (1)/most days (2)/some days (3)/few days (4)/no days (5)
How often have you felt tense or strung up?
How often have you been in low or very low spirits?
Always (1)/very often (2)/sometimes (3)/almost never (4)/never (5)
Scoring
The AIMS2, like AIMS1, is self-administered, and takes approximately 20 minutes to complete. Items are listed in Guttman scale order, so that a respondent who indicates a disability on one item will also indicate disability on the items falling below it in the same section. Unlike the AIMS1, which had a combination of dichotomous ‘yes/no’ and scaled response categories, the AIMS2 has mainly scaled response choices; examples include ‘all days’ (1) to ‘no days’ (5), ‘always’ (1) to ‘never’ (5), and ‘very satisfied’ (1) to ‘very dissatisfied’ (5). Item scores are summed to give scale scores, so the range of scores depends on the number of items in the subscale. No item weights are used. A ‘normalization procedure’ converts scores into the range 0–10, with 0 representing good ‘health status’ and 10 representing poor ‘health status’ (in AIMS1 and AIMS2). The scale is ordinal in type. As mentioned earlier, the original AIMS was described as superior to other applications of Guttman scaling (McDowell and Newell 1996).
Validity
Extensive studies of the validity and reliability of AIMS1 were conducted, demonstrating its good psychometric properties (Meenan et al. 1980, 1982, 1984; Meenan 1982, 1985; Brown et al. 1984; Meenan and Mason 1990, 1994; Weinberger et al. 1990). The scaling properties, validity and reliability of the AIMS2 have also been reported to be good. Meenan et al. (1992) reported that subjective patient assessments of problems and areas in need of improvement were significantly associated with a poorer AIMS2 score in that area. Haavardsholm et al. (2000), on the basis of a community survey of over a thousand rheumatoid arthritis patients, reported that the components of the AIMS2 and AIMS2-SF had substantial to near perfect agreement, and that internal consistency was high in all components. Both instruments correlated well, and similarly, with other measures of health status (Medical Outcomes Study Short Form-36 and a modified Health Assessment Questionnaire), supporting their convergent validity, and with an indicator of patient-assessed change in health status. De Joode et al. (2001) also reported that the physical health component of the Dutch version of the AIMS2 scale correlated significantly with clinical data in patients with haemophilia, and also with the Sickness Impact Profile (SIP) (Pearson’s r = 0.53; P < 0.05). However, the psychological health and social interaction components of the Dutch AIMS2 did not
correlate significantly with the psychosocial components of the SIP, possibly because they were tapping different conceptual domains.
Meenan et al. (1992) reported that, in samples of patients with osteoarthritis and rheumatoid arthritis, all within-scale factor analyses produced single factors, except for mobility in patients with osteoarthritis. Salaffi et al. (2000) used the Italian version of AIMS2 in a study of patients with osteoarthritis of the knee, and reported a three-factor health status model explaining 63 per cent of the variance between patients.
Reliability
Meenan et al. (1992) reported, on the basis of a study of 408 respondents with rheumatoid arthritis or osteoarthritis, that the internal consistency coefficients for the 12 AIMS2 scales were 0.72–0.91 in the rheumatoid arthritis group and 0.74–0.96 in the osteoarthritis group. Test-retest reliability at two weeks (postal survey) was 0.78–0.94. Salaffi et al. (2000) reported high internal consistency for the Italian version of AIMS2, and test-retest reliability at 6 months exceeding 0.80 for eight of the 12 subscales. De Joode et al. (2001) reported that the Dutch version of the AIMS2 scale and subscales had moderate to high internal consistency in patients with haemophilia (Cronbach’s alpha = 0.62–0.92).
In sum, the AIMS has good measurement properties, has been extensively tested for validity and reliability, and the identified dimensions explain the majority of illness impact estimated by patients. AIMS2 appears to be superior to AIMS1. Like AIMS1, it has been validated
Living. Katz designed the index in order to describe, for clinical purposes, the functional status of elderly and chronically ill patients.
Content
The index consists of a rating form that is completed by a therapist or other observer on the basis of observation and interview. In each of the activities assessed, the patient is rated by the observer on a three-point scale of independence. The index assesses independence in functioning in six areas: bathing, dressing, toileting, transferring from bed to chair, continence and feeding. On the basis of more than 2,000 evaluations of patients’ states, the authors observed that these functions declined in a consistent order. They claimed to have a measure of fundamental biological functions, a claim questioned by those using Guttman scales (Williams et al. 1976). The evaluation form is shown below:
Bathing (sponge bath, tub bath, or shower)
Receives no assistance (gets in and out of tub by self if tub is usual means of bathing)
Receives assistance in bathing only one part of the body (such as back or a leg)
Receives assistance in bathing more than one part of the body (or not bathed)
Dressing
Gets clothes and gets completely dressed without assistance
Gets clothes and gets dressed without assistance except for assistance in tying shoes
Receives assistance in getting clothes or in getting dressed, or stays partly or completely undressed
Toileting
Goes to toilet room, uses toilet/cleans self, arranges
for use in many countries, including Italy and Hol- clothes, and returns without any assistance (may use
land (Salaffi et al. 2000; de Joode et al. 2001). The cane, walker or wheelchair and may manage night
AIMS scales are strictly copyrighted. A manual is bedpan or commode)
available for the AIMS2 (Meenan and Mason 1990, Receives assistance in going to toilet room or in using
1994). toilet/cleaning self or arranging clothes
Doesn’t go to toilet room
Transfer
THE INDEX OF ACTIVITIES OF DAILY LIVING (ADL) Moves in and out of bed and chair without assistance
(may use cane or walker)
One of the oldest, and best known, of the disability Moves in and out of bed or chair with assistance
scales is the Activities of Daily Living (ADL) index Doesn’t get out of bed
developed by Katz et al. (1963, 1966, 1968, 1970, Continence
1973; Katz and Akpom 1976). This is also known as Controls urination and bowel movement completely by
the Index of Independence in Activities of Daily self
Has occasional accidents
Supervision helps keep urine or bowel control; catheter is used, or is incontinent
Feeding
Feeds self without assistance
Feeds self except for getting assistance in cutting meat or buttering bread
Receives assistance in feeding or is fed partly or completely by using tubes or intravenous fluids

The authors later developed a survey instrument for obtaining health-status data containing questions about the need for and use of health services and attitudes towards medical care. Five categories of ‘need’ were defined and ranked: no disability, restricted activity with no chronic conditions, restricted activity with chronic condition, mobility limitations and bed disability. These were chosen to permit comparisons with existing national surveys (Katz et al. 1973).

Scoring
Patients are graded by interviewers on ordinal three-point scales in relation to their ability in bathing, dressing, transferring, toileting, continence and feeding. Scores on individual scales are translated into ‘dependent’/‘independent’ classifications, and the overall level of functioning is then summarized on an eight-point scale (A–G, plus ‘Other’; see below). An alternative scoring system simply counts the number of activities in which the person is dependent, from 0 (independent in all six functions) to 6 (dependent in all six functions), removing the need for the ‘Other’ category. Some investigators have reversed the scoring so that a score of 6 indicates full functioning, 4 indicates moderate functioning, and 2 or less indicates severe functional impairment. Each method thus produces a single total score in which all items are treated as equal. The use of a single index results in a loss of information about variability, because different patterns of restriction, with different implications, can be reduced to the same score.
The eight-point scale is:
A Independent in feeding, continence, transferring, going to toilet, dressing and bathing
B Independent in all but one of these functions
C Independent in all but bathing and one additional function
D Independent in all but bathing, dressing and one additional function
E Independent in all but bathing, dressing, going to toilet and one additional function
F Independent in all but bathing, dressing, going to toilet, transferring and one additional function
G Dependent in all six functions
Other Dependent in at least two functions, but not classifiable as C, D, E or F.
Full definitions of activities are given by Katz et al. (1970).

Validity
Despite the scale’s widespread popularity among clinicians worldwide, there is little evidence of its validity. Katz et al. (1970) administered the ADL and other instruments to 270 chronically ill patients. The ADL correlated weakly to moderately with a mobility scale (0.50) and with a scale of home-confinement (0.39).
The index of ADL was shown to predict the long-term course and social adaptation of patients with a number of conditions, including strokes and hip fractures, and was used to evaluate out-patient treatment for rheumatoid arthritis (Katz et al. 1966, 1968). It has also been shown to predict mortality (Brorsson and Asberg 1984), and Katz and Akpom (1976) reported other early evidence of its predictive validity. More studies support the ability of the scale to discriminate between groups in the hypothesized direction. For example, in a community survey it distinguished between people aged 77 to under 85, 85 to under 90, and 90 and over, with the oldest members having the poorest ADL scores (von Strauss et al. 2003). However, such concise indices tend to be insensitive to small changes in disease severity and to focus on physical-performance measures. The index is also of limited value in community surveys of elderly people because, like other short scales, it does not take adaptation to environment into account. It apparently underestimates dysfunction in community populations (Spector et al. 1987).

Reliability
Little testing for reliability has been carried out; this is again surprising given the popularity of this scale, particularly among clinicians. Katz et al. (1963)
assessed inter-rater reliability; they reported that discrepancies between raters occurred in 1 in 20 observations.
A Swedish study of Guttman analyses on 100 patients yielded coefficients of scalability of 0.74–0.88, suggesting that the index is a successful cumulative scale (Brorsson and Asberg 1984). More evidence of the ADL’s reliability and validity is required.
The Katz is one of the earliest indices used in the evaluation of the care of elderly and chronically ill patients, and hence one of the best known. The items included in the scale have formed the basis for ADL scales used in major national surveys in Britain (Bridgwood 2000). It is still a popular and useful index with a restricted range of patients, particularly those living in nursing and care homes, or clinical populations. However, the range of disabilities included in the instrument is not comprehensive and thus the populations to which it can be administered are restricted. It focuses on basic activities of daily living. The single index also means that the information derived from the index is limited. Wade (1992) concluded that while this was once the most popular scale used by neurologists, it has since been overtaken by the Barthel.

TOWNSEND’S DISABILITY SCALE
Townsend’s Disability Scale is frequently used in population surveys of older people in the UK. It comprises a list of activities of daily living, derived from early research on disabled people of all ages in the UK and the USA (Haber 1968; Sainsbury 1973) and from Townsend’s own early survey work on elderly people (Townsend 1962; Shanas et al. 1968). It was also used in Townsend’s later poverty survey (Townsend 1979). It is still popular because it is concise and simple, and focused on tasks which are relevant to people living at home. Sainsbury (1973) stated that the list of tasks of daily living initially selected was chosen on the basis of a ‘subjective’ decision ‘of the more important daily and social activities’. Its advantages are its brevity and acceptability to elderly people. Although the scale is also useful in that needs for particular types of health and social services can easily be inferred from the items, it still requires more detailed testing.

Content
The index deliberately focuses on a narrow range of activities, so that the concepts underlying them are generally applicable to a wide section of the population and easy to apply within a survey framework of the general population:

(a) activities which maintain personal existence, such as drinking, eating, evacuating, exercising, sleeping, hearing, washing and dressing;
(b) activities which provide the means to fulfil these personal acts, such as obtaining food, preparing meals, providing and cleaning a home.

The scale asks respondents if they have difficulty doing the following tasks (none = 0, some difficulty = 1, or unable to do alone = 2): washing all over, cutting own toenails, running to catch a bus, carrying heavy shopping, going up/down stairs, doing heavy housework, preparing a hot meal, reaching a jug from an overhead shelf and tying a good knot in a piece of string. It can be self- or interviewer-administered, and is concise and easily completed. For example:

Do you or would you have any difficulty (or find it troublesome, exhausting or worrying):
(a) Washing down (whether in bath or not)?
(b) Removing a jug, say, from an overhead shelf?
(c) Tying a good knot in string?
(d) Cutting toenails?
The remaining questions are not asked of children under the age of 10 or the bedfast:
(e) Running to catch a bus?
(f) Going up/down stairs?
(g) Going shopping and carrying a full basket of shopping in each hand?
(h) Doing heavy housework?
(i) Preparing a hot meal?

Scoring
Difficulty with each activity is given equal weighting; changes in individual capacity from day to day and season to season are ignored. ‘No difficulty’ is scored as 0, ‘with some difficulty’ is scored as 1, and the score is 2 if the reply was ‘unable to do alone’.
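The equal-weight scoring just described can be sketched in a few lines of Python. This is an illustrative sketch only, not a published scoring program: the item keys (such as washing_down) and the response labels are hypothetical names for items (a) to (i), and the bands are the ones Townsend used for the 0–18 total (0 no disability; 1–2 slightly affected; 3–6 some disability; 7–10 appreciable disability; 11–18 severe/very severe). It assumes all nine items were asked, so it does not handle children under 10 or bedfast respondents, who are not asked items (e) to (i).

```python
# Illustrative sketch of Townsend's original 0-18 scoring.
# Assumption: all nine items (a)-(i) were asked; item keys and
# response labels are hypothetical names, not from the published scale.

TOWNSEND_ITEMS = (
    "washing_down",        # (a)
    "jug_from_shelf",      # (b)
    "tying_knot",          # (c)
    "cutting_toenails",    # (d)
    "running_for_bus",     # (e)
    "stairs",              # (f)
    "carrying_shopping",   # (g)
    "heavy_housework",     # (h)
    "hot_meal",            # (i)
)

# Equal weighting: every item is scored 0, 1 or 2.
RESPONSE_SCORES = {
    "no difficulty": 0,
    "some difficulty": 1,
    "unable to do alone": 2,
}

# Townsend's bands for the total score: (lowest, highest, label).
BANDS = (
    (0, 0, "no disability"),
    (1, 2, "slightly affected"),
    (3, 6, "some disability"),
    (7, 10, "appreciable disability"),
    (11, 18, "severe/very severe"),
)


def townsend_total(responses):
    """Sum the nine item scores; raise ValueError if any item is missing."""
    missing = set(TOWNSEND_ITEMS) - set(responses)
    if missing:
        raise ValueError(f"missing items: {sorted(missing)}")
    return sum(RESPONSE_SCORES[responses[item]] for item in TOWNSEND_ITEMS)


def townsend_band(total):
    """Map a 0-18 total onto Townsend's descriptive bands."""
    for lowest, highest, label in BANDS:
        if lowest <= total <= highest:
            return label
    raise ValueError(f"total outside 0-18: {total}")


if __name__ == "__main__":
    answers = {item: "no difficulty" for item in TOWNSEND_ITEMS}
    answers["stairs"] = "some difficulty"
    answers["heavy_housework"] = "unable to do alone"
    total = townsend_total(answers)
    print(total, townsend_band(total))  # 3 some disability
```

Because every item carries the same weight, different patterns of difficulty (for example, ‘unable to do alone’ on one item versus ‘some difficulty’ on two) yield the same total, so the single score does not show which domains are affected.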
The overall score has a range of 0–18. Townsend, as a result of early validation work, regards people with a score of 0 as having no disability, 1–2 as slightly affected, 3–6 as having some disability, 7–10 as having appreciable disability and 11–18 as severe/very severe. The basis for this banding does not appear to have been tested any further.

Validity and reliability
None of the original studies which used the scale provided adequate details of the initial testing of the measure (Haber 1968; Sainsbury 1973). Bowling and Gabriel (2004) used the original version of the scale with a national sample of people aged 65 and over living at home; they reported that it had high internal consistency (Cronbach’s alpha = 0.91).
There are numerous examples in the UK of applications of the scale, or adaptations of it, and population norms are available (e.g. Vetter et al. 1982; Vetter and Ford 1989; Bowling et al. 1992, 2002; McGee et al. 1998). The scale is popular because it is simple and covers a range of activities which are relevant to living at home, although more extensive testing for its reliability, validity and factor structure is still required. It was used on the assumption that it represents one factor, although it covers three domains of functioning (mobility; personal care (ADL); and domestic activities (IADL)). McGee et al. (1998) criticized the method of reporting factor analyses of the instrument.

Adaptations
The limitations of this scale have led many researchers to adapt it, adding other activities of daily living or removing others. For example, some of the items are not appropriate for use with a frail population (e.g. ‘running to catch a bus’). Adaptation of the scoring is also common. Bowling and Grundy (1997) increased the number of items and also the range of responses to: no difficulty with the task, slight, moderate or severe difficulty, unable to do alone, and unable to do at all (even with help). Each response was then scored from 0 (no difficulty) to 5 (unable to do at all) and the scores are summed to produce a total score. The sub-sections representing mobility, ADL and IADL can also be scored separately. It was initially scored from 1 to 6, but a starting score of ‘0’ was judged to be more useful, as the number of people with no problems with any of the tasks is then evident in the raw total score, which is also simpler to manipulate statistically. The scoring methods still require validation. The extended scale is shown below:

Mobility
Getting in/out of bed
Transferring from a chair/wheelchair
Going up/down stairs
Getting on/off toilet
Getting in/out of bath
Getting about indoors
Getting about outdoors
Using public transport
Personal care
Washing self/shaving (men)
Bathing self
Dressing self
Brushing/combing hair
Washing hair
Cutting toenails
Domestic
Cooking/preparing meal
Housework
Laundry
Shopping
Doing odd jobs

Two other items were initially included, brushing or managing teeth and handling money, but these appeared ambiguous and were excluded after factor analysis. On the basis of community surveys with 662 people aged 85 and over living in London, and almost 700 people aged 65+ living in Essex and London, inter-item correlation coefficients between tasks ranged from around 0.13 (for shopping and managing teeth or dentures, which would not necessarily be expected to correlate highly) to around 0.74 (for difficulties with washing self and with dressing self, which would be expected to correlate more highly); split-half reliability was 0.78–0.91 between samples. Internal consistency (alpha) was 0.70–0.75 for the personal-care task section, 0.80–0.85 for the domestic-task section and 0.81–0.89 for the mobility section. Testing of the scale was carried out across the three samples, and results were highly statistically significant, indicating that the scale has good reliability. The ADL scale items correlated moderately to well with comparable items on dressing self, trouble with
steps/stairs, walking outdoors, and walking indoors from the Nottingham Health Profile (Hunt et al. 1986), which was used for a sub-sample (0.635, p < 0.0001; 0.565, p < 0.0001; 0.350, p < 0.004; and 0.472, p < 0.0001 respectively). The scale, and items from it (e.g. difficulties getting about outdoors), were also highly significantly associated with relevant physical health problems (e.g. aches, pains, stiffness in joints, muscles), health and social service use, life satisfaction and mental health, and were predictive of worsening emotional well-being, supporting the scale’s convergent and predictive validity (Bowling et al. 1992, 1993, 1994a, 1994b). The convergent validity of the scale is further supported by its ability to independently predict quality of life rating in the expected direction, in a British survey of 999 people aged 65 and over living at home (Bowling et al. 2002).
A similar scale was developed by the Institute for Economic and Social Research at York University and used in a wide range of surveys (Morton-Williams 1979). Their scale encompasses a wider range of self-care activities than the original Townsend scale, with alternative rankings of: able to do easily, with a little difficulty, with a lot of difficulty, unable to do without someone helping, unable to do even with someone helping. Testing of this modified scale is also incomplete.
Another adaptation of the scale was undertaken by Bond and Carstairs (1982) in their survey of 5,000 elderly people in Scotland. These authors distinguished between functional criteria for dependency (mobility, self-care and home-care capacity) and clinical criteria for dependency (incontinence, mental state), and they attempted to measure these. They selected items from the original Sainsbury scale on the basis of inter-item correlations, and these items were then subjected to a Guttman scaling analysis. The items selected were based on a hierarchical concept: having difficulty with an activity is associated with having difficulty with earlier scale items. The items classified as mobility incapacity were: difficulty/ability travelling by bus, walking outside, getting up from chair (coefficient of reproducibility: 0.97); self-care incapacity included difficulty/ability washing hair, self, bathing; dressing, put on shoes and socks/stockings (coefficient of reproducibility: 0.99); and home-care incapacity included difficulty/ability doing heavy shopping; washing clothes; preparing and cooking meals; ironing clothes; light housework; making bed (coefficient of reproducibility: 0.96).

THE KARNOFSKY PERFORMANCE INDEX (KPI)
The Karnofsky Performance Index (sometimes called the Karnofsky Performance Scale) emphasizes physical performance and dependency. It was originally designed for use with lung cancer patients in relation to assessing palliative treatments (Karnofsky et al. 1948, 1949). It is a simple scale from 0 to 100 (from dead and moribund up to normal), developed for use in clinical settings. The scale is heavily weighted towards the physical dimensions of quality of life, rather than social and psychological dimensions. Patients are assigned to categories by a clinician or other health care professional. It is widely used in the USA and Europe.
An early literature review, examining the frequency of measurement of quality of life in clinical trials of outcome of care in six international cancer journals, showed that only 6 per cent attempted to measure it, and that the vast majority of this 6 per cent used the original performance criteria of Karnofsky (Bardelli and Saracci 1978).

Content
Examples of some of the classifications, which are made by professionals, and their scores are:

Normal; no complaint; no evidence of disease (index: 100).
Requires occasional assistance from others but able to care for most needs (index: 60).
Disabled; requires special care and assistance (index: 40).
Moribund; fatal processes progressing rapidly (index: 10).
Dead (index: 0).

Scoring
Each of the 11 components of the scale is given a notional percentage score (100 = normal; 0 = dead). The scoring procedure has not been assessed for validity. Various categorizations exist.
There are problems with this method, particularly with the assumption that a patient with a low score due to immobility necessarily has a poorer
quality of life than a patient with a higher score, and vice versa. For example, the patient with a poor score may have better social support than a patient with a good score and thus, despite the immobility, may have a better quality of life. Moreover, Karnofsky ratings are often reported as a mean score, yet there is no evidence that the intervals between its 11 categories represent the same degree of dysfunction (O’Brien 1988). Interpretations of the scale’s classifications are likely to vary, particularly as the points cover different conceptual elements.

Validity
The disadvantage of the Karnofsky Performance Index, apart from its limited content, is that it involves categorization of patients by another person. This is a fundamental flaw, given the evidence of discrepancies between patients’ and physicians’ ratings of functioning and quality of life. In Evans et al.’s (1984) study, for example, there were wide discrepancies between patients’ self-assessments, based on the Sickness Impact Profile, and physicians’ assessments, based on the Karnofsky Performance Index, with the latter rating patients as being less impaired than the former.
However, it is still very popular among clinical researchers, especially in oncology (Hwang et al. 2003; Llobera et al. 2003; Tentes et al. 2003). Mor et al. (1984) used the scale in a national hospice evaluation study. They reported that the convergent validity of the scale was achieved: it was strongly related to two other independent measures of patient functioning (Katz’s ADL scale and another quality-of-life assessment). It was also able to predict longevity (0.30) in the population of terminally ill cancer patients, indicating that it has predictive validity. Firat et al. (2002) also reported, on the basis of a sample of patients with lung cancer, that the KPI was independently associated with overall survival, and was a better predictor (along with co-morbidity) than clinical tumour stage. Thus, the Karnofsky Performance Index is a successful predictor of survival. Llobera et al. (2003) also reported that it was predictive of dependence and deterioration of cancer patients during their terminal stages.
Mor (1987), again in a study of cancer patients (total: 2,046), reported a moderate correlation between the Karnofsky and the Spitzer Quality of Life Index; the correlation was probably moderate because the latter is multi-dimensional. One of the best-known applications of the Karnofsky has been in the US National Heart Transplantation Study (Evans et al. 1984). Results from this study showed a marked shift in the distribution of Karnofsky scores before and after transplantation.
Although it is frequently described as an indicator of quality of life (Sitjas et al. 2003), it is a measure of functioning, not of broader quality of life. In a study of 139 lung cancer patients, Schaafsma and Osoba (1994) reported weak associations between observer-rated Karnofsky scores and self-rated quality of life using the core European Organization for Research and Treatment of Cancer 30-item Questionnaire (Aaronson 1993). Although popular among clinicians, particularly in cancer studies (see the author’s Measuring Disease for review in relation to oncology), it is a limited measure. In addition, Yates et al. (1980) carried out the first objective validation of the scale and concluded that the index is not appropriately scaled and that scale values may bear no relation to clinical significance.

Reliability
The results of testing for inter-rater reliability have varied widely, with some studies reporting low reliability (29–43 per cent with Cohen’s kappa), and others reporting higher (Pearson’s) correlations of 0.66–0.69 (Hutchinson et al. 1979; Yates et al. 1980). Mor et al. (1984), on the basis of a national hospice evaluation, reported that the inter-rater reliability coefficient for the 47 interviewers employed, at test-retests at four-month intervals, was 0.97.
Slevin et al. (1988) reported a study involving 108 cancer patients in which two different groups of patients filled in the same collection of instruments, including the Karnofsky index, on a single day, and then daily for five consecutive days. Although the Karnofsky was more robust than the other measures tested (Spitzer’s Quality of Life Index, the Hospital Anxiety and Depression scale and LASA visual analogue scales), the same score was achieved on only 54 per cent of occasions, despite the fact that only the top five points on the Karnofsky were covered.
The reliability and validity of the index have only recently begun to be examined (Hutchinson et al. 1979). The scale’s ‘numeric’ status has not been seriously challenged, and it has generally been
uncritically accepted and applied in a large number of clinical settings (Schag et al. 1984). It has even been used in settings where its applicability must be questioned (bone-marrow transplantation in children) (Hinterberger et al. 1987).
Other versions of the scale have been developed, but with no obvious improvements in effectiveness over the original version (Zubrod et al. 1960; World Health Organization 1979; Nou and Aberg 1980). These are described in Measuring Disease (Bowling 2001).

THE BARTHEL INDEX
The Barthel Index was developed by Mahoney and Barthel (1965). It is a popular measure in neurology, and is widely judged to be useful and able to predict outcomes, particularly in stroke patients. It is not a comprehensive measure of functioning (for example, it omits domestic, social and other role functioning) and is less suitable for use in the community and with less frail or disabled people. It is based on observed functions. The index was developed to measure functional ability before and after intervention treatment, and to indicate the amount of nursing care required. It was designed for use with long-term hospital patients with neuromuscular or musculo-skeletal disorders, and has since been used more generally to evaluate treatment outcomes. The Barthel Index is based on a rating scale of ten domains completed by a therapist or other observer. The scale takes approximately 30 seconds to score.

Content
The scale covers the following dimensions:

Feeding
Mobility from bed to wheelchair
Personal toilet (washing, etc.)
Getting on/off the toilet
Bathing
Walking on level surface (propel wheelchair)
Going up/down stairs
Dressing
Incontinence (bladder and bowel, rated as two separate items)

A score of zero is given where the patient cannot meet the defined criterion. The scale omits tasks of daily living such as cooking and shopping and other everyday tasks essential for life in the community. It thus appears suitable only for institutionalized populations (for whom it was designed).

Scoring
Different values are assigned to different activities. Individuals are scored on ten activities, and these scores are summed to give a total from 0 (totally dependent) to 100 (fully independent). The authors of the scale provide detailed instructions for assessing and scoring patients; for example:

Doing personal toilet: 5 = patient can wash hands and face, comb hair, clean teeth, and shave. He may use any kind of razor but must put in blade or plug in razor without help as well as get it from drawer or cabinet. Female patients must put on own make-up, if used, but need not braid or style hair.

A modified scoring method gives a maximum score of 20 to patients who are continent, able to wash, feed and dress themselves and are independently mobile (Collin et al. 1988).
The scores are intended to reflect the amount of time and assistance a patient requires. However, the scoring method is inconsistent in that changes by a given number of points do not reflect equivalent changes in disability across different activities. Moreover, as its authors point out, the scale is restricted in that changes can occur beyond the end points of the scale.

Validity
The original Barthel Index was tested for validity by Wade et al. (1985, 1992) and Collen et al. (1990), in their evaluations of therapies for stroke patients. The results have been more fully described by Wade (1992). Mattison et al. (1991) used the Barthel Index with 364 patients attending day centres for the physically disabled. They compared it with the PULSES Scale and the Edinburgh Rehabilitation Status Scale (ERSS) (Affleck et al. 1988) and reported that the correlations between the Barthel and these two scales were r = 0.65 and r = −0.69 respectively. Good results were reported by Wade and Collin (1988). Most studies of its validity compare it
with the PULSES profile, yielding correlations of −0.74 to −0.90 (Granger et al. 1979; Mattison et al. 1991). Wilkinson et al.’s (1997) study of the long-term outcome of stroke patients reported that the Barthel Index correlated highly with the physical functioning dimension of the SF-36 (r = 0.810), the physical mobility dimension of the Nottingham Health Profile (r = −0.840), the London Handicap Scale (r = 0.727) and the Frenchay Activities Index (r = 0.826). They concluded that the use of the Barthel Index is still justified in long-term follow-up studies of stroke patients.
The Barthel Index has been reported to have predictive validity, as it correlates well with various prognostic scores of stroke patients (Kalra and Crome 1993), length of hospital stay and mortality (Wylie and White 1964). It has been reported to be sensitive to patients’ recovery (Wade and Langton-Hewer 1987).
It is less appropriate for use with less severely disabled patients. It was insensitive to clinical change among elderly patients attending a day hospital (Rodgers et al. 1993; Parker et al. 1994). As the Barthel Index measures what the patient actually does, rather than ability, scoring may also be location dependent (McMurdo and Rennie 1993). It may not be sensitive to improvements or deteriorations beyond the end points of the scale (‘floor’ and ‘ceiling’ effects) (McDowell and Newell 1996). Yohannes et al. (1998) assessed the sensitivity and specificity of the Barthel Index in a study of people with and without chronic airflow limitation (CAL) (out-patients and people in the community). They reported that the Barthel Index underestimated disability in CAL and that the Nottingham extended ADL index discriminated between groups better than the Barthel. It has also been reported that, when used in clinical trials of stroke patients, the Barthel Index has less powerful end points than the earlier developed Rankin Scale (Rankin 1957), with the implication that, compared with the Rankin Scale, it weakens trial power for a given trial size, or requires a larger sample size to obtain the same statistical power (Young et al. 2003).
De Haan et al. (1993) reported that factor analysis of the Barthel Index showed that the items described one common trait, which explained 81 per cent of the variance in their study of stroke patients. This supports earlier factor analyses which showed that it measures a single domain, although two factors have also been reported: mobility and personal care (Wade and Langton-Hewer 1987).

Reliability
Sherwood et al. (1977) reported alpha reliability coefficients of 0.95 to 0.97 for three samples of hospital patients, suggesting that the scale is internally consistent. Collin et al. (1988) tested it for reliability on 25 stroke patients, and analysed observer agreement, using groups of two, three and four observers. They reported that agreement was poorer for the middle category, and consequently they refined the instructions for observers. De Haan et al. (1993), in a study of neurology out-patients with stroke, also reported that the Barthel Index had high internal consistency (Cronbach’s alpha = 0.96) and concordance of total scores by three observers (mean kappa = 0.88, range 0.82–0.90), and of single-item scores (mean values for kappa = 0.82–1.00). Granger et al. (1979) reported a test-retest reliability of 0.89 with severely disabled adults, and an inter-rater agreement exceeding 0.95. However, other studies of doctors’ and nurses’ ratings of elderly nursing-home patients have reported poor agreement, particularly for patients with some degree of cognitive impairment (Ranhoff and Laake 1993).
Despite its acknowledged limitations, there is a large body of literature supporting its use with specific groups of disabled patients, such as those with neurological disability (Collin et al. 1988; Wade and Collin 1988). No permission is needed to use the Barthel Index.
Several modified versions of the scale have been developed (Granger et al. 1979; Fortinsky et al. 1981; Granger 1982; Granger and McNamara 1984; Shah et al. 1989; Gompertz et al. 1993a, 1993b). However, Granger and his colleagues now regard their early modification of the scale (which was extended to cover 15 topics) as obsolete and have replaced it with the Functional Independence Measure (18 items) and Functional Assessment Measure (12 items) (FIM+FAM) (Stineman et al. 1994). FIM is an 18-item, seven-level ordinal scale covering two dimensions: motor and cognitive dysfunction. The FAM contains 12 items; it is an adjunct to the FIM and includes cognitive, behavioural, communication and community functioning (Hall 1997). The combined scale, with 30 items (18 FIM plus 12 FAM), is
known as the FIM+FAM, and takes approximately 35 minutes to administer. These measures have generally been reported to have good reliability and validity (Ottenbacher et al. 1994; Kidd et al. 1995; Turner-Stokes et al. 1999). Hall (1992) reported rater agreement of 89 per cent and a kappa score of 0.85. However, Riazi et al. (2003) reported that the FAM mobility subscale had lower reliability (alpha 0.78) than the SF-36 physical subscale in a clinical trial of patients with multiple sclerosis. A UK version has been developed, and subsequent modifications to make the scoring more objective have improved team and individual accuracy in scoring (Turner-Stokes et al. 1999). The FIM+FAM is increasingly used, although the original Barthel is still very popular, particularly in studies of stroke, and investigators continue to justify its use (Wilkinson et al. 1997; Fjartoft et al. 2003). Although the Barthel Index is narrow in scope, broader measures have not always been shown to have greater validity or responsiveness to change in condition (Guyatt et al. 1993).

THE LONDON HANDICAP SCALE (LHS)

The London Handicap Scale (Harwood et al. 1994; Harwood and Ebrahim 1995) is a handicap classification questionnaire covering six domains: mobility, physical independence, occupation, social integration, orientation and economic self-sufficiency. It includes a table of severity weights so that an overall interval-level handicap score can be calculated. The scale developers defined handicap as the disadvantage for a person that results from ill-health. The scale was based on the WHO (1980) International Classification of Impairments, Disabilities and Handicaps. The weights for scoring were obtained from two samples of lay people (n = 34 and 79), 97 medical doctors and 14 health professionals, who were asked to rate the severity of hypothetical descriptions.

Content, administration and scoring

Each of the six domains contains a single-item question with standardized six-point response choices, and each response option comes with a detailed description of its meaning. The appropriate score from a matrix is applied to each response and entered into the score formula to obtain the overall handicap score. The score is an estimate of the relative desirability (utility) of the state of health described. A score of 100 indicates no handicap, while a score of 0 represents maximum handicap. The scale is self-completed; a proxy version is available for completion by a third party on behalf of people who are unable to complete the questionnaire themselves. A handbook is available which describes the scale and includes the scoring instructions. Examples from the London Handicap Scale are:

Physical independence
Looking after yourself. Think about things like housework, shopping, looking after money, cooking, laundry, getting dressed, washing, shaving and using the toilet. Does your health stop you looking after yourself?

Not at all: You do everything to look after yourself.
Very slightly: You need a little help now and again.
Quite a lot: You need help with some tasks (such as heavy housework or shopping), but no more than once a day.
Very much: You do some things for yourself, but you need help more than once a day. You can be left alone safely for a few hours.
Almost completely: You need help with everything. You need constant attention, day and night.

Validity and reliability

The London Handicap Scale was administered, together with several other scales, to 89 survivors of stroke at 12 months and between 24 and 36 months after the index stroke. The London Handicap Scale was moderately to strongly associated with the other scales used (r = 0.4 to 0.7). These included the Barthel Index (Mahoney and Barthel 1965), the Nottingham Extended ADL Scale (Nouri and Lincoln 1987), the Nottingham Health Profile (Hunt et al. 1986) and the Geriatric Depression Scale (Yesavage et al. 1983). In a further study of 58 patients, the scale was administered six months after hospital admission for stroke. The scale correlated highly (r = 0.78) with the modified Rankin Scale (Rankin 1957; UK-TIA Study Group 1988), supporting its concurrent validity (Harwood and Ebrahim 1995). In 316 acute stroke patients, the scale was associated with pre-stroke disability, initial stroke severity and mood at follow-up after one year and after two to three years (Harwood et al. 1994, 1997).
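The LHS combines one response per domain into a single total. The weighted version looks each response up in the published matrix of severity weights. That matrix is not reproduced here, so the Python sketch below shows only the simple unweighted summation that Jenkinson et al. (2000) compared with weighted scoring. The 0–5 response coding, the function name `lhs_unweighted_total` and the example responses are illustrative assumptions, not the handbook's scoring instructions.

```python
"""Unweighted London Handicap Scale total: a minimal, hypothetical sketch."""

from typing import Mapping

# The six LHS domains named in the text.
LHS_DOMAINS = (
    "mobility",
    "physical independence",
    "occupation",
    "social integration",
    "orientation",
    "economic self-sufficiency",
)

# ASSUMPTION: each six-point response is coded 0 (no disadvantage) to
# 5 (most severe). This coding is illustrative only; it is NOT the
# published severity-weight matrix.
MIN_CODE, MAX_CODE = 0, 5


def lhs_unweighted_total(responses: Mapping[str, int]) -> int:
    """Simple summation of the six domain codes, with no severity weights."""
    missing = [d for d in LHS_DOMAINS if d not in responses]
    if missing:
        raise ValueError(f"missing domains: {missing}")
    for domain in LHS_DOMAINS:
        code = responses[domain]
        if not MIN_CODE <= code <= MAX_CODE:
            raise ValueError(f"{domain}: code {code} outside {MIN_CODE}-{MAX_CODE}")
    # Only the six named domains contribute; any extra keys are ignored.
    return sum(responses[d] for d in LHS_DOMAINS)


if __name__ == "__main__":
    example = dict.fromkeys(LHS_DOMAINS, 0)
    example["physical independence"] = 2  # hypothetical response code
    print(lhs_unweighted_total(example))  # 2
```

Under this assumed coding, a higher total means more handicap. That is the reverse of the weighted 0–100 score, on which 100 indicates no handicap, so any report should state which direction is used.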
In a study of long-term outcome of stroke by Wilkinson et al. (1997), the LHS was reported to correlate highly with the Barthel Index (r = 0.726). It has been shown to be sensitive to improvement after hip replacement in a study of 81 patients. Principal components analyses showed one- or two-factor solutions, with the first factor representing handicap and no particular pattern to the loadings on the second (Harwood and Ebrahim 1995).

In a study of 37 stroke patients who recompleted the scale two weeks after baseline assessment, the reliability coefficient was 0.91. A three-month follow-up study of 79 rheumatology patients yielded a reliability coefficient of 0.72. The reliability coefficient for a mixed group of patients (13 hip replacement, 4 knee replacement and 6 angioplasty patients) was 0.77 (Harwood and Ebrahim 1995).

Jenkinson et al. (2000) tested weighted and unweighted scale scores (simple summation of scores without weights) in a study of stroke patients. Cronbach’s alpha for the LHS was 0.83, indicating good internal consistency. The weighted and unweighted versions of the LHS correlated highly with each other (r = 0.98). Both versions correlated similarly with the Dartmouth COOP Charts, the Frenchay Activities Index, the Barthel Index and the Hospital Anxiety and Depression Scale, and the correlations supported their convergent validity and discriminative ability. Thus simple summation of LHS scores did not change the measurement properties of the instrument compared with standard weighted scoring. The authors therefore recommended unweighted scores, as these are easier to calculate and interpret.

The scale has been used in several descriptive studies of handicap (Harwood et al. 1996, 1998a, 1998b; Prince et al. 1997, 1998). In sum, it is a relatively recently developed scale that has shown good psychometric properties to date.

THE QUALITY OF WELL-BEING SCALE (QWBS)

The Quality of Well-Being Scale was developed to operationalize ‘wellness’ for a general health-policy model. This was an attempt to develop an alternative to economic cost-benefit analysis for resource allocation. One example is comparing the health status of groups of individuals in order to evaluate health care programmes (Kaplan et al. 1976; Bush 1984). Hence it provides an estimate of the value of health status, which is necessary for cost-utility analyses (Hays et al. 1996). It can be used with general populations and applied to any type of disease. The items in the interview schedule were drawn mainly from the US Health Interview Survey and the Social Security Administration Survey. The schedule has since been extensively tested on large groups of nurses and graduate students and revised.

The QWBS combines mortality with estimates of the quality of life. The scale quantifies the health output of a treatment as the years of life for which the treatment is responsible, adjusted for their diminished quality. First, the assessment of health begins with an assessment of functional status, based on the individual’s performance. Second, a value reflecting its relative desirability (utility) is assigned to each functional level (Fanshel and Bush 1970). Responses to a branching questionnaire are used to assign subjects to one of a number of discrete function states. The scale is based on a model of health which encompasses symptom/problem, mobility, physical activity and social activity.

The QWBS has been used in many evaluative studies of pulmonary disease and in drug trials (Toevs et al. 1984; Bombardier et al. 1986; Kaplan 1994). Bush (1984) has reported several areas of application, most of which are unpublished. The scale includes death. This avoids a problem inherent in other indices: because death is frequently ignored, the death of a disabled person appears to improve the population estimate of health status (Kaplan et al. 1994). Third, information is also collected about future changes (prognosis). This permits a distinction between two people with equal functional disability, one of whom is terminally ill.

The interview schedule is long, and a 30-page manual is purchasable from the authors. The questionnaire takes 10–15 minutes to administer. It can be administered to proxies when people are not able to reply (e.g. stroke victims). A self-administered format is available (QWBS-SA).

Content

The Quality of Well-Being Scale consists of three ordinal scales on dimensions of daily activity.
Combinations of the three scales of mobility, physical activity and social activity were initially taken to define 29 function levels. Subsequent modification has increased the number of function levels to 43. Each function level can be linked with a separate classification of symptoms and problems. Questions are based on performance, not capacity. Four aspects of function are covered: mobility/confinement, physical activity, social activity (e.g. work, housekeeping) and self-care. The categories on each scale range from full independence to death. The physical ability scale has four categories and the others have five.

Respondents are given a list of 22 ‘symptom/problem’ complexes and asked to identify all that applied to them during the preceding six days. They are then asked to indicate which of these has ‘bothered them most’. Next, they are asked about mobility, physical activity and social activity. The questions ask what respondents actually did, rather than what they were capable of doing. Examples of the scale items are:

Mobility
Drove car and used bus or train
Did not drive or had help to use bus or train

Physical activity
Walked without physical problems
Walked with physical limitations

Social activity
Did work, school, or housework and other activities
Limited in amount or kind of work, school or housework

Scoring

Respondents’ functional status is classified for each day on the scales of mobility, physical activity and social activity. The QWBS score is calculated by combining this information with the symptom/problem responses, using a set of preference weights. The weights were developed from interviews with 800 respondents in a household survey, who were asked to rate different health states on a ten-point scale ranging from death to perfect health. The ratings were used in a multiple-regression analysis to obtain weights for different responses. Scores are calculated separately for each of the six days, and the QWBS score is expressed as the average of these. The final score ranges from 0 (equated with death) to 1 (perfect health) (Patrick et al. 1973a, 1973b). For each of the 43 function levels, a ‘preference weight’ has been established empirically, ranging from 1 (complete well-being) to 0 (death). The appropriate preference weight is assigned to the respondent’s functional-ability level, and the resulting score is known as the Quality of Well-Being score. The authors noted that a score below zero can be given to represent a state ‘worse than death’ (Kaplan and Bush 1982). The weights represent the relative importance that members of society assigned to each functional level. The third stage of rating involves adjusting for prognosis (Kaplan et al. 1976, 1979). The authors are currently developing a new self-administered version of the scale. Kaplan and Ernst (1983) have reported further tests validating the scaling technique. The means and variances of preferences for attributes of quality of life did not change over time (Kaplan et al. 1978).

Validity

Kaplan et al. (1976) have asserted the content validity of the QWBS. Its content validity as an index of disability is enhanced by incorporating death. Kaplan et al. (1976) reported correlations of −0.75 between the QWBS and the number of reported symptoms, and of 0.96 between the QWBS and the number of chronic health problems. The correlation between the QWBS and the number of physician contacts in the preceding eight days was 0.55. It correlated well (0.76) with self-rated quality of life. A review of the literature showed that its correlations with functional ability and broader health-status scales ranged from 0.17 to 0.71, with most at 0.50 or more (Revicki and Kaplan 1993). Its authors therefore claimed convergent validity for it. More recent studies have confirmed the good convergent validity of the QWBS when tested against other measures of health status (Groessl et al. 2003). It has been shown to predict outcome among people with HIV infection (Kaplan et al. 1995). It has also been widely reported to have good convergent validity when tested against other HIV health-status measures (Hughes et al. 1997). Data from 1,324 subjects with a range of diseases and injuries put its predictive value at between 0.95 and 0.98 and its sensitivity at 0.90. It is able to detect changes in life quality over time (Kaplan et al. 1976).
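The Scoring section describes two calculations: averaging the six daily scores, and using the resulting score to weight years of life by their quality. The Python sketch below illustrates both. It assumes each daily score has already been derived from the published preference weights, which are not reproduced here. The function names are hypothetical. The period-based total is the conventional score × duration calculation for quality-adjusted years, not a procedure specified by the scale's authors.

```python
"""QWB score as the mean of six daily scores: a minimal sketch."""

from typing import Sequence, Tuple


def qwb_score(daily_scores: Sequence[float]) -> float:
    """Average the six daily scores into a single QWB score.

    ASSUMPTION: each daily score was already derived from the preference
    weights (1 = complete well-being, 0 = death). Values below zero
    ('worse than death') are allowed; values above 1 are rejected.
    """
    if len(daily_scores) != 6:
        raise ValueError("expected one score for each of the six days")
    if any(score > 1 for score in daily_scores):
        raise ValueError("a daily score cannot exceed 1 (complete well-being)")
    return sum(daily_scores) / len(daily_scores)


def quality_adjusted_years(periods: Sequence[Tuple[float, float]]) -> float:
    """Sum of (score x years) over a sequence of (QWB score, years) periods.

    A period spent dead has score 0, so it adds nothing to the total
    instead of being dropped as missing data.
    """
    return sum(score * years for score, years in periods)


if __name__ == "__main__":
    week = [1.0, 1.0, 0.5, 0.5, 0.75, 0.75]
    print(qwb_score(week))  # 0.75
    print(quality_adjusted_years([(0.75, 2), (0.5, 4)]))  # 3.5
```

Because death is scored 0, time after a respondent's death enters the calculation as zero rather than disappearing from the data. This is the distinction Anderson et al. (1998) drew between the QWBS and the SF-36.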
Leighton Read et al. (1987) used the Quality of Well-Being Scale in a study of 400 out-patients in Boston, Massachusetts, USA. The authors tested this scale, the SIP and the Rand Corporation’s General Health Rating Index (GHRI) for convergent, content and discriminant validity. The findings for convergent validity were similar to those achieved by the developers of the scales: the correlations between scales were moderately high (0.46–0.55). They reported the GHRI to be the easiest to use, as they found that both the SIP and the QWB scales required a major commitment to interviewer training. All were equally acceptable to respondents. The QWBS contained more items on specific symptoms than the others; in contrast, the SIP contained more detail on dysfunction. The authors concluded that each scale was a valid measure of health status.

Anderson et al. (1998) compared the results of the QWBS with those of the SF-36 in characterizing the health of patients (mainly with cancer and AIDS) over two and a half years. They reported that the QWBS indicated a decrease in the functioning of patients over time, but the SF-36 did not. Apparently, this was because the QWBS counted patient death as an outcome, whereas the SF-36 counted deaths as missing data. Hence the authors recommended the QWBS as better able than the SF-36 to capture the outcomes of serious illness over time.

Reliability

The reliability coefficient obtained when judges reassessed scale values for 29 function levels was 0.90 (Kaplan and Bush 1982). Test-retest reliability is between 0.93 and 0.98 (Kaplan et al. 1978; Bush 1984). Internal consistency reliability estimates for the QWBS overall score have been reported for four populations and have exceeded 0.90. Kaplan et al. (1988) give the details of the reliability studies.

The QWBS is advantageous in that it incorporates mortality. Kaplan (1994) and Kaplan et al. (1994) recommended the scale for policy analysis and clinical research because it is unidimensional. However, a unidimensional approach makes it less informative for clinical studies, where a multidimensional scoring approach is preferable. Even so, it can be used to estimate QALYs (quality adjusted life years) for policy analysis.

The complexity of the original interviewer-administered scale has inhibited its use in research on health outcomes. At the other extreme, it has also been used with patients for whom its appropriateness is questionable (e.g. paediatric cancer patients; Bradlyn et al. 1993). The self-administered form (QWBS-SA) also has good psychometric properties. It may be preferred to the interviewer-administered version, as a less expensive and less complex instrument, when estimating the effectiveness and cost-effectiveness of treatments (Pyne et al. 2003). The instruments are available free of charge.

THE CRICHTON ROYAL BEHAVIOUR RATING SCALE (CRBRS)

The Crichton Royal Behaviour Rating Scale was developed by Robinson (1968). It was refined and tested for reliability and validity by Wilkin and Jolley (1979) with samples of people in residential homes for the elderly and in geriatric and psychiatric wards. The scale has been widely applied in the UK. It was used to evaluate state-funded nursing homes (Bond et al. 1989; Bowling and Formby 1990) and to assess residential care of the elderly (Willcocks et al. 1987; Jagger and Lindesay 1997).

The CRBRS was designed for use with elderly people living in institutions. It is completed by a third party who knows the respondent well (e.g. a nurse in a hospital ward would complete it on behalf of a patient well known to him/her). The modified CRBRS contains ten items. Five relate to functional ability (mobility, feeding, dressing, bathing, continence) and five to mental disturbance (memory, orientation, communication, cooperation, restlessness). The confusion subscale can be used independently of the functional-ability scale.

As the scale was designed for use with populations resident in institutions, home-care activities were excluded. The research reports containing evidence of the scale’s reliability and validity are now out of print, but Wilkin and Thompson (1989) have published a brief description of the scale. Interviewer training is required. Notes of guidance are provided to help interviewers ask the appropriate questions. The interviewer is expected to probe and ask for examples of
behaviour before a classification is made. The emphasis of the scale is on behaviour (what the person actually does), rather than on ability. Bond et al. (1989) designed a set of structured questions for use with the interviewers’ assessments, and reported high correlations between the two. The interviewer assessment without the structured questions takes approximately three minutes. With the structured questions, the assessment can extend the interview to seven to ten minutes (Wilkin and Thompson 1989; Bowling and Formby 1990).

Content

Examples of the rating scale are:

Dressing
0 Correct
1 Imperfect but adequate
2 Adequate with minimum of supervision
3 Inadequate unless continually supervised
4 Unable to dress or to retain clothing

Feeding
0 Correct unaided at appropriate times
1 Adequate with minimum of supervision
2 Inadequate unless constantly supervised
3 Requires feeding

Memory
0 Complete
1 Occasionally forgetful
2 Short-term loss
3 Short- and long-term loss

Orientation
0 Complete
1 Oriented in ward, identifies people correctly
2 Misidentifies but can find way about
3 Cannot find way to bed or toilet without assistance
4 Completely lost

Wilkin and Thompson (1989) reproduce full details of the CRBRS. These include the full-scale items, interviewer notes and the structured questions designed by Bond et al. (1989) to accompany the rating scale.

Scoring

Each item is scored 0 to 4, except memory and feeding, which are scored 0 to 3; these two items were therefore designed to make a smaller contribution to the overall score. Individual items are added to obtain the total score. Total scores are grouped into six ranges: 0–1, 2–5, 6–10, 11–15, 16–20 and 21–38. However, the authors state that these ranges do not represent levels of dependency, and it is unclear what the different scale scores represent.

A confusion subscale can also be derived from the items relating to memory, orientation and communication. These subscale scores are totalled and grouped into four ranges: 0–1 (lucid); 2–3 (intermediate); 4–6 (moderately confused); 7–11 (severely confused) (Evans et al. 1981).

Validity

Thompson (1984) used factor analysis to assess the construct validity of the CRBRS. She reported that the ten items reflected two dimensions of dependency: capacity for self-care and ability to walk.

The research reports giving results for the validity and reliability of the CRBRS are out of print, but Wilkin and Thompson (1989) have summarized the results. They reported that total scale scores of 10 or more were associated with a diagnosis of psychiatric disorder. The study reported by Evans et al. (1981) suggested that confusion subscale scores of 4 or more were associated with a clinical diagnosis of dementia. Total scale scores and confusion subscale scores have been compared with independent clinical assessments and with the modified Roth-Hopkins mental-state test; correlations with the latter range from 0.75 to 0.82 (Vardon and Blessed 1986; Wilkin and Thompson 1989). There has been little work testing the validity of the CRBRS.

Wilkin and Thompson (1989) caution that the scale is insufficiently sensitive for assessing change over time and should be used to provide population profiles. The problem with using the scale to assess individual change over time is that reliance on different informants to complete it may introduce bias.

Reliability

In theory, the scale's emphasis on behaviour rather than ability minimizes problems with reliability. In practice, however, ratings are likely to reflect the philosophy of the institution; for example, staff working in institutions that
encourage independence are likely to assess patients/residents as less dependent than staff in institutions which encourage dependency.

Thompson (1984) assessed the internal reliability of the capacity-for-self-care dimension of the scale. She reported that reliability can be increased by removing the items relating to feeding, restlessness and cooperation, and by treating mobility as a separate dimension. Wilkin and Jolley (1979) examined inter-rater reliability by having two interviewers assess the same informant; the correlations obtained were greater than 0.90. Far more testing for reliability is required.

Some difficulties within the scale have not yet been resolved. For example, it is unclear how to classify use of stairs for a respondent who lives on the ground floor and does not need to use them.

Despite the limited evidence about its reliability and validity, the CRBRS is still used in the UK as a measure of functioning for frail, elderly people living in institutions (Jagger and Lindesay 1997; Challis et al. 2000). It is fairly concise and easily administered by a trained interviewer. Its popularity in this area is probably partly due to its being specifically designed for completion by a third party.

THE CLIFTON ASSESSMENT PROCEDURES FOR THE ELDERLY (CAPE)

The Clifton Assessment Procedures for the Elderly are the most extensively tested measures of dependency in widespread use in the UK, particularly for psychological assessment of the elderly. The scales were developed for use with elderly people living in institutions, and were tested for validity by Pattie and Gilleard (1975, 1979). A manual is available which provides details of use and normative data for a range of patient populations (Pattie and Gilleard 1979). The CAPE consists of two schedules: the Behaviour Rating Scale, which measures behaviour, and the Cognitive Assessment Scale, which measures cognitive performance. The Cognitive Assessment Scale was originally known as the Clifton Assessment Scale, and can be completed by an elderly person in 5 to 15 minutes. The Behaviour Rating Scale is a short version of the Stockton Geriatric Rating Scale, designed for use with elderly people in hospital. The whole test can take between 5 and 30 minutes, depending on the mental and functional ability of the respondent. It is designed to be completed by a third party who knows the respondent well. Interviewer training is necessary. A shortened version of the two subscales was also developed (Pattie 1981).

Content

The Cognitive Assessment Scale consists of a battery of tests. These include items such as the person tracing a circular route with a pencil while avoiding obstacles. The information/orientation items include the usual memory tests (own place of residence, name of the prime minister, date, etc.) and, more unusually, knowledge of the colours of the British flag.

The Behaviour Rating Scale contains 18 items. Four items relate to mobility, continence and activities of daily living; the remaining items relate to confused behaviour. The scale focuses heavily on the behavioural problems of mentally infirm elderly people. It asks about current functioning: the rater is instructed to rate people according to their current level of functioning, taking into account behaviour over the past two weeks. Its items relate to four subscales: physical disability, apathy, communication difficulties and social disturbance. Examples of the behaviour-rating scale are:

When bathing or dressing, he/she requires:
0 No assistance
1 Some assistance
2 Maximum assistance

With regard to walking, he/she
0 Shows no signs of weakness
1 Walks slowly without aid, or uses a stick
2 Is unable to walk, or if able to walk, needs frame, crutches or someone by his/her side

He/she is confused (unable to find way around, loses possessions, etc.)
0 Almost never confused
1 Sometimes confused
2 Almost always confused

His/her sleep pattern at night is
0 Almost never awake
1 Sometimes awake
2 Often awake
Scoring

The 18 items are added to form a total score, or selected items are added to produce subscale scores. Each item is scored from 0 (no/few problems) to 2 (frequent/constant problems). Four subscales can be created from the items, relating to physical disability, apathy, communication difficulties and social disturbance. Total scores range from 0 to 36, banded as follows:

0–3: independence
4–7: low dependency
8–12: medium dependency
13–17: high dependency
18–36: maximum dependency

No item weights are used.

Validity

Early versions of the scale were shown to correlate with clinicians’ assessments of appropriate levels of care for females but not for males. The sample was small (38). The authors also report correlations with other scales, such as the Wechsler Memory Scale, but sample sizes were too small to be conclusive. Pattie and Gilleard (1975) reported that the CAPE correlated highly with psychiatric diagnoses. It was also able to discriminate between patients who were discharged home and those who were not. Evidence of the scale’s concurrent and convergent validity is still limited (Smith et al. 1981). McPherson et al. (1985) reported that the shortened survey version of the scales was able to distinguish between patients with severe, moderate, mild and no dementia, and patients with physical disability.

Black et al. (1990) compared the CAPE's ability to diagnose dementia with the diagnosis made by the computer program AGECAT and with a clinical diagnosis made by a psychiatrist. The sample consisted of elderly patients from a general practice. The 112 patients were selected from 378 who had been tested three years previously, when aged 70+, with a 13-item mental-function test. The authors reported that the sensitivity of the CAPE was low, probably because it identified only the more severe cases. It detected only about half of the known cases.

In relation to behaviour, Pattie and Gilleard (1979) reported that the CAPE can discriminate between elderly people with different needs. These included people requiring differing degrees of help and people with different levels of social adjustment following admission to a residential home. It could also separate mentally infirm people who survive from those who die.

Factor analyses found two-factor structures for three of the groups of elderly people tested, and no clear factor structure for non-psychiatrically ill elderly people (Pattie and Gilleard 1979). The four subscales suggested by the authors for analysing the Behaviour Rating Scale data were therefore not supported. Twining and Allen (1981) carried out factor analyses of the CAPE on 903 people in residential homes for the elderly. These also failed to support the four suggested subscales.

Reliability

Pattie and Gilleard (1979) reported inter-item correlations, all of them fairly high. This suggests that the items are consistent and measure the same dimensions of dependency. Inter-rater reliability for the four subscales was tested on psychiatric and psychogeriatric patients and people in residential homes for the elderly. The correlations were all 0.70 or higher, except for ‘communication difficulties’, which was low. Tests of inter-rater reliability for the total scale showed wide variations. Smith et al. (1981), in their study of 38 elderly mentally handicapped patients, reported an inter-rater reliability coefficient of 0.58 between two nursing sisters.

Test-retest reliability coefficients ranged from 0.56 to 0.90 in 38 hospital patients aged 65 and over, retested after two to three days and after two to three months. The six-month test-retest reliabilities, based on 39 new admissions to homes for the elderly, ranged from 0.69 to 0.84 (Pattie and Gilleard 1979).

The CAPE has undergone more testing than the Crichton Royal Behaviour Rating Scale, but evidence to support its psychometric properties is still limited. Wilkin and Thompson (1989) have reviewed the scale, along with others in common use in the UK. They drew attention to ambiguities in the scale wording; for example, the terms ‘sometimes’ and ‘frequently’ are not defined, so different raters are likely to interpret them differently. Mulgrave (1985) has also reviewed the CAPE and reported that its main advantages are that its schedules are short and easily administered.
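The CRBRS confusion subscale and the CAPE Behaviour Rating Scale work the same way: each is an unweighted total of item scores, read off against fixed score ranges. The Python sketch below encodes only the cut-points and item ranges quoted in the two Scoring sections (memory 0–3, orientation and communication 0–4, CAPE items 0–2). The function and constant names are hypothetical, and the sketch is not a substitute for the published manuals.

```python
"""Score bands for the CRBRS confusion subscale and CAPE Behaviour Rating Scale."""

from bisect import bisect_right

# Each table lists (lowest total in band, label) in ascending order,
# using the cut-points quoted in the text.
CRBRS_CONFUSION_BANDS = [
    (0, "lucid"),
    (2, "intermediate"),
    (4, "moderately confused"),
    (7, "severely confused"),
]
CRBRS_CONFUSION_MAX = 3 + 4 + 4  # memory 0-3, orientation 0-4, communication 0-4

CAPE_BRS_BANDS = [
    (0, "independence"),
    (4, "low dependency"),
    (8, "medium dependency"),
    (13, "high dependency"),
    (18, "maximum dependency"),
]
CAPE_BRS_MAX = 18 * 2  # 18 items, each scored 0-2


def band(total, bands, max_total):
    """Return the label of the band whose range contains the total."""
    if not 0 <= total <= max_total:
        raise ValueError(f"total {total} outside 0-{max_total}")
    lower_bounds = [low for low, _ in bands]
    # bisect_right finds the first lower bound greater than total;
    # the band we want is the one just before it.
    return bands[bisect_right(lower_bounds, total) - 1][1]


def crbrs_confusion(memory, orientation, communication):
    """Unweighted confusion subscale total from its three items."""
    if not (0 <= memory <= 3 and 0 <= orientation <= 4 and 0 <= communication <= 4):
        raise ValueError("item score outside its permitted range")
    return memory + orientation + communication


def cape_brs_total(item_scores):
    """Unweighted sum of the 18 Behaviour Rating Scale items (0-2 each)."""
    if len(item_scores) != 18 or any(s not in (0, 1, 2) for s in item_scores):
        raise ValueError("expected 18 item scores, each 0, 1 or 2")
    return sum(item_scores)


if __name__ == "__main__":
    print(band(crbrs_confusion(1, 2, 1), CRBRS_CONFUSION_BANDS, CRBRS_CONFUSION_MAX))
    # moderately confused
    print(band(cape_brs_total([1] * 5 + [0] * 13), CAPE_BRS_BANDS, CAPE_BRS_MAX))
    # low dependency
```

Keeping each set of cut-points in a single sorted table means that a revised banding only requires editing the table, not the lookup logic.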
4
MEASURING BROADER HEALTH STATUS
Broader measures of health status generally focus on individuals’ subjective perceptions of their health. Subjective or perceived health may be defined as an individual’s experience of mental, physical and social events as they impinge upon feelings of well-being (Hunt 1988). Many studies in medical sociology have indicated the importance of the perceptual component of illness in determining whether people feel ill or whether they seek help.
Scales of broader-health status are more stable, and have better reliability and validity, than single-item questions, and they are the preferred instruments to use. On the other hand, some single items, despite some instability, are popular in general, multi-topic population surveys. A popular single-item measure consists of simply asking respondents to rate their health as ‘excellent, good, fair or poor’. Most respondents are affected by social desirability bias and rate their health at the ‘good’ end of the scale. To increase the question’s ability to discriminate between groups, researchers therefore now insert a ‘very good’ category between ‘excellent’ and ‘good’ (Ware 1984; Ware et al. 1993). The tradition in gerontology is to ask respondents to rate their health in relation to their age (Cartwright and Anderson 1981; Bowling and Cartwright 1982; Bowling et al. 2002). This encourages older respondents to assess their health with reference to their own age group rather than to younger age groups. This single-item measure of self-perceived health status has long been reported to be significantly and independently associated with use of health services, changes in functional status, mortality and rates of recovery from episodes of ill health (National Heart and Lung Institute 1976; Singer et al. 1976; Kaplan and Camacho 1983; Goldstein et al. 1984; Idler and Kasl 1995; Greiner et al. 1999; Siegel et al. 2003).
Respondents to the 1995 Australian National Health Survey were asked to rate their health in general as ‘excellent, very good, good, fair or poor’. A sub-sample of respondents was asked the question twice: before and after other questions about their health. Crossley and Kennedy (2000) analysed the stability of the question and found that 29 per cent of respondents changed their self-assessed health rating at the repeat question. Of course, this also partly reflects the biasing effects of question order. But while self-assessed health status was associated, in the expected directions, with age and income, response instability was also associated with age, income and occupation.
This measure may be contextual and may vary over time as people’s expectations change. Thus self-ratings of health are often criticized as subjective. Yet their subjectivity is also their strength, because they reflect personal evaluations of health. Outcome assessment needs to incorporate patient-based assessments as well as clinical indicators. It should, however, be cautioned that people with poor mental health (e.g. depression) are more likely to rate their health status, functional ability and social support as low. In some cases poor mental
health will distort perceptions of health and well-being, and poor physical health can also lead to poor mental health and well-being (including reduced social interaction).
Given that subjectivity is a major criticism levelled at this type of indicator, it merits some discussion. Health indicators have largely been developed within the era of science based on the logical positivist paradigm. This inevitably leads to suspicion when data based on subjective experience are presented. This suspicion persists despite research questioning the reliability of ‘objective’ data. Classic studies have shown the concordance rate of clinical and pathological diagnoses to be as low as 45.3 per cent (Heasman and Lipworth 1966). Several early studies reported on the arbitrary nature of normal values in biochemistry (Grasbeck and Saris 1969; Bradwell et al. 1974). Others reported on the problem of establishing a dividing line between sick and healthy in relation to diabetes, hypertension and glaucoma (Cochrane and Holland 1971).
There are also studies reporting wide discrepancies between patients and clinicians in relation to preferences for treatments (McNeil et al. 1978, 1981; Liddle et al. 1993; Schneiderman et al. 1993), and between their ratings of the patients’ outcomes after specific therapies. One Swedish out-patient study found only 50 per cent agreement between doctors and patients on whether treatment had been successful (Orth-Gomer et al. 1979). Similar results have been obtained in relation to low back pain and outcome of surgery, blood-pressure treatment, results of surgery for peptic ulcer, the effect of cancer treatment and other treatments (Hall et al. 1976; Orth-Gomer et al. 1979; Thomas and Lyttle 1980; Jachuck et al. 1982; Slevin et al. 1988). This is because doctors and patients use different criteria in their assessments. Moinpour et al. (2000) also reported poor agreement between cancer patients’ and their families’ ratings of the patient’s quality of life. They concluded that proxies are a poor substitute for capturing patients’ perspective on their quality of life. Proxy measures should be a last resort in research. People’s own self-ratings may be subject to optimism, social desirability and other biases (Brissette et al. 2003), but proxy ratings can carry their own biases.
Another measurement format commonly used in studies of health status is the symptom checklist. Checklists also have their limitations, but are generally considered to be useful tools if used in conjunction with scaled measurement techniques. There are numerous examples of checklists of symptoms presented to respondents in surveys. Respondents are typically asked to indicate which, if any, they currently suffer from. General symptom checklists can be found in George and Bearon (1980), in Cartwright’s classic national surveys (Dunnell and Cartwright 1972; Cartwright and Anderson 1981; Bowling and Cartwright 1982), and in the Rand Health Insurance Study Questionnaires (Stewart et al. 1978). Disease-specific quality of life questionnaires usually contain a list of symptoms relevant to the condition under study (see Measuring Disease, Bowling 2001). However, questions that ask about symptoms tend to produce a high proportion of affirmative responses, given the presence of a large amount of minor morbidity in a population (Dunnell and Cartwright 1972). Items focusing on trivial problems are unlikely to have much discriminatory power for monitoring change between groups over time. Symptom reports may contain response errors. They may also contain diagnostic errors, since many people do not present their symptoms to doctors for investigation, and patients are not always fully investigated by their doctors (Bond et al. 2003). Reporting of morbidity, and consultation patterns, depend on symptom tolerance levels, pain thresholds, attitudes towards illness and self-care, the expectations and demands of others (family, employer, friends), knowledge and understanding of symptoms experienced, and other social and cultural factors.
Given that a subjective measure of health status is required, and single-item measures can be limited, the issue becomes that of which measure to choose. The researcher also has to decide whether a general and/or specific measure is required, depending on the nature of the study. There is little point in including a health measure if it is unlikely to detect the effects of the treatment or symptoms specific to the condition. Some specific disease-related scales do exist. However, the Nottingham Health Profile and the Short Form-36 have also been used as more general measures of health status and are apparently successful at distinguishing between patient types.
The case for using general, rather than specific,
indicators of health status in population surveys has been clearly argued by Kaplan (1988). For example, detailed information about specific disease categories may appear overwhelming to many respondents not suffering from them. Also, the use of disease-specific measures precludes the possibility of comparing the outcomes of services that are directed at different groups suffering from different diseases. Policy analysis requires a general measure of health status. Broader measures of physical, social and psychological functioning exist, but their use in the UK has been limited. If necessary, as in a study of a specific disease group, global measures can be supplemented with disease-effect questions.
The following sections describe the best-known and best-tested measures of broader-health status that are currently available.

THE SICKNESS IMPACT PROFILE (SIP)

One of the best instruments developed in the USA has been the Sickness Impact Profile (SIP) (Deyo et al. 1982, 1983; Bergner 1988, 1993). The Sickness Impact Profile was developed as a measure of perceived health status. It was intended as an outcome measure for health care evaluation across a wide range of health problems and diseases, and across demographic and cultural groups. Sickness is measured in relation to its impact on behaviour. The profile emphasizes sickness-related dysfunction rather than disease. It was designed to be sensitive to differences in health status in terms of minor morbidity (Bergner et al. 1976a, 1976b, 1981; Gilson et al. 1979).
The Sickness Impact Profile concentrates on assessing the impact of sickness on daily activities and behaviour, rather than on feelings and clinical reports. The justification was that feelings are difficult to measure, subjective and thus difficult to validate, and that clinical reports can be provided only where someone has sought medical care. The authors felt that behavioural reports were less subject to bias than feelings. However, reported behaviour can be influenced by perceptual bias, and behaviour itself can be influenced by feelings. The authors acknowledged that the SIP does not measure positive functioning. The SIP was developed on the basis of a literature review and of statements describing ‘sickness related behaviour dysfunction’, which were extracted from health professionals and from healthy and ill lay people (Bergner et al. 1981).
The populations and patient groups involved in its development varied widely. They included in-patients, out-patients and home-care patients with chronic diseases, patients in intensive care units, patients undergoing hip replacements and arthritis patients. The SIP may be self- or interviewer-administered, and it takes 20–30 minutes to complete. Deyo et al. (1983) recognized the problem of its length and suggested that attempts to shorten it might increase patient acceptability. A sub-sample of the participants in their study was asked about their opinions of the SIP. Most said it was acceptable, and the few complaints elicited concerned its length and the fact that it was not disease specific. The authors suggested that the subscales could be used independently if desired (e.g. the physical function subscale consists of only 45 of the 136 items). Read et al. (1987) asked interviewers to rate the ease of application of the SIP. The interviewers reported it to be lengthy and tedious, largely because of the repetition it contained. Read et al. also reported that the SIP involved a major commitment of interviewer training time (at least a week). More optimistic findings were reported by Hall et al. (1987), who used the SIP in a general study of patient outcome based in a general practice setting in Sydney, Australia. The SIP was used along with the General Health Questionnaire and the Rand Health Insurance Battery. Of the 160 questionnaires completed by patients, only 3 per cent had questions left unanswered across the three scales.
There have been many applications of the SIP to a wide range of patient groups, mainly in the USA, but also in the UK and other parts of the world. One application in the UK was by Fletcher et al. (1988) in a randomized controlled trial of outcome of drug treatment of angina patients. The SIP has been widely used in US heart-transplantation evaluations. Many applications of the SIP have been described by Wenger et al. (1984). The advantage of this wide application is that scores for many population groups are available for comparison. Given that the length of the SIP (136 items) has been one barrier to its use, there have been attempts to shorten it. De Bruin et al. (1993)
have constructed a 68-item version using multivariate techniques, claiming that its psychometric properties are similar to those of the longer 136-item version. Further testing is required.

Content

The SIP contains 136 items, covering not only functioning but also feelings of emotional well-being and social functioning. The items refer to illness-related dysfunction in 12 areas: work, recreation, emotion, affect, home life, sleep, rest, eating, ambulation, mobility, communication and social interaction. In a typical set of questions from the SIP, respondents are asked to tick the statements that apply to them on a given day and are related to their state of health:

I spend much of the day lying down in order to rest.
I sit during much of the day.
I am sleeping or dozing most of the time – day and night.
I stand up only with someone’s help.
I kneel, stoop, or bend down only by holding on to something.
I am in a restricted position all the time.
I do work around the house only for short periods of time or rest often.
I am doing less of the regular daily work around the house that I would usually do.
I am not doing any of the house cleaning that I would usually do.
I am going out less to visit people.
I am not going out to visit people at all.
I am doing fewer social activities with groups of people.
(© The Johns Hopkins University 1977)

Only those questions to which the respondents answer ‘yes’ are recorded. In the analysis, therefore, there is no way of knowing whether the columns left blank represent ‘no’ replies or whether the respondent/interviewer omitted them deliberately or in error.

Scoring

The responses to the 136 items can be scored by component, by physical or psychosocial dimension, or as a single score with a range of 0–100. The lower the score, the better the respondent’s health status. The overall score for the SIP is calculated by adding the scale values for each item checked across all categories and dividing by the maximum possible dysfunction score for the SIP. This figure is then multiplied by 100 to obtain the SIP overall score. The two sub-scores (physical and psychosocial) are calculated using a similar formula, but limiting the calculations to the relevant items.
Item weights indicate the relative severity of limitation implied by each statement. The weights were derived from equal-appearing interval-scaling procedures involving more than 100 judges. The judges rated each item on an equal-interval 11-point scale from ‘minimally dysfunctional’ to ‘severely dysfunctional’. The scaling technique has been justified and described by Carter et al. (1976). However, Jenkinson et al. (1991) argued that Thurstone’s scaling method was unsuccessful, as the distributions of the SIP were similar whether the scale items were weighted or simply added.
The study based in Sydney by Hall et al. (1987) reported that results were skewed towards the healthy end of the SIP scale. Eighteen per cent of patients scored 0 on all components. There were no scores above 25 (range of possible scores 0–100). This pattern was repeated within components.
The many studies using the SIP have provided population norms for comparison, but it is still unclear what each score represents; many other scales also suffer from this problem. It is known that a normal population may score only 2 or 3 on the SIP, increasing to the mid-30s for terminally ill cancer and stroke patients. However, the precise definition of scale points has not so far been tackled.

Validity

The early validation studies of the SIP tested it against self-assessments of health status, clinicians’ assessments of health status and functional assessment instruments; 278 people were assessed. SIP scores discriminated between four sub-groups divided according to severity of sickness. Correlations between measures were better for patients’ self-assessments than for physicians’ assessments (0.69 with a self-assessment of limitation; 0.63 with a self-assessment of sickness; 0.50 with a clinician’s assessment of limitation; and 0.40 with a clinician’s assessment of sickness). The combined SIP score was tested against the Katz ADL scale, with a correlation of 0.64. The correlation between the SIP and the National Health Interview Survey Instrument was 0.55 (Bergner et al. 1981).
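The SIP scoring rule described under Scoring above can be made concrete with a short calculation. In that rule, the overall score is the sum of the scale values of the ticked items, divided by the maximum possible dysfunction score and multiplied by 100, and the physical and psychosocial sub-scores apply the same formula to their own items only. What follows is a minimal Python sketch under stated assumptions. The item identifiers, weights and dimension groupings are hypothetical placeholders, not the published SIP weights. The sketch also assumes that an unticked item means ‘no’, which, as noted above, cannot actually be verified from SIP response sheets. An unweighted variant is included only to illustrate the weighted-versus-simply-added comparison made by Jenkinson et al. (1991).

```python
from typing import Iterable, Mapping, Optional, Set


def percent_score(
    weights: Mapping[str, float],
    checked: Set[str],
    items: Optional[Iterable[str]] = None,
) -> float:
    """Weighted percentage score: 0 = no item ticked, 100 = every item ticked.

    `items` limits the calculation to a subset (as for a physical or
    psychosocial sub-score); by default all weighted items are used.
    Unticked items are treated as 'no' (an assumption; see text).
    """
    pool = set(weights) if items is None else set(items)
    unknown = (set(checked) | pool) - set(weights)
    if unknown:
        raise ValueError(f"items without a weight: {sorted(unknown)}")
    # Maximum possible dysfunction score for the items in scope.
    max_possible = sum(weights[i] for i in pool)
    if max_possible <= 0:
        raise ValueError("maximum possible score must be positive")
    # Sum of the scale values of the ticked items in scope.
    observed = sum(weights[i] for i in set(checked) & pool)
    return 100.0 * observed / max_possible


def unweighted_score(items: Iterable[str], checked: Set[str]) -> float:
    """Simple-addition alternative: every item counts equally."""
    pool = set(items)
    return 100.0 * len(set(checked) & pool) / len(pool)


# Hypothetical items and weights for illustration only (not published SIP values).
WEIGHTS = {"amb1": 8.0, "amb2": 4.0, "soc1": 6.0, "soc2": 2.0}
PHYSICAL = {"amb1", "amb2"}
PSYCHOSOCIAL = {"soc1", "soc2"}

answers = {"amb1", "soc2"}  # statements the respondent ticked
print(round(percent_score(WEIGHTS, answers), 1))                # 50.0
print(round(percent_score(WEIGHTS, answers, PHYSICAL), 1))      # 66.7
print(round(percent_score(WEIGHTS, answers, PSYCHOSOCIAL), 1))  # 25.0
print(round(unweighted_score(WEIGHTS, answers), 1))             # 50.0
```

Each sub-score is divided by the maximum possible score of its own dimension, so a respondent's sub-scores need not average to the overall score. In this example they average about 45.8, against an overall score of 50.0.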
Convergent and discriminant validity was evaluated using the multi-trait–multi-method technique. Clinical validity was assessed by comparing clinical judgements with SIP scores, and all of these tests achieved good results. They have been described in some detail by Bergner et al. (1981). More recent evidence pointing to the SIP’s strong validity has been provided by Read et al. (1987). However, the correlations between the SIP score and clinical status have tended to range between 0.40 and 0.60, probably owing to the broad nature of the SIP. This may not be a high enough correlation for the successful use of the scale in studies of the clinical outcome of health care interventions (Anderson et al. 1993).
The Australian study by Hall et al. (1987) reported that, with the use of a correlation threshold of −0.30, the SIP correlated with instruments from Rand. The Rand mental-health index correlated with the SIP items relating to social interaction, emotional behaviour and alertness. Ambulation from the SIP correlated with the Rand physical-abilities scale, and home-management and social-interaction items from the SIP correlated with the role-functioning items on the Rand battery. However, the correlations ranged only from 0.32 to 0.54. Other research has reported moderate correlations (0.40–0.60) between the SIP and scales of anxiety and depression, reflecting the behavioural focus of the SIP (Linzer et al. 1991).
Nevertheless, the Sickness Impact Profile has reportedly been used successfully in clinical trials (Bergner et al. 1976a, 1976b, 1981) and is valuable for assessing the impact of illness on the chronically ill. It has been used in a randomized controlled trial of early exercise and counselling for patients with myocardial infarction. In that trial, SIP scores showed that the group undergoing exercise plus counselling reported better functioning (Ott et al. 1983). It has also been used to evaluate treatment for patients with end-stage renal disease, with the result that transplantation patients had better SIP scores (Hart and Evans 1987). In a cross-sectional longitudinal study of 99 women with rheumatoid arthritis, the SIP was reported to be sensitive to one-year pre- and post-treatment changes, showing both improvement and deterioration (Sullivan et al. 1990). Grady et al. (2003) reported that it was able to discriminate between different groups of heart patients at three months after surgery.
More negative results relating to sensitivity were reported by Hall et al. (1987), who tested the SIP against the Rand Health Insurance Study batteries. The Rand and the SIP were reported to be measuring different aspects of health. The range of scores for the Rand measures was less skewed than for the SIP. An instrument on which most respondents already score at the healthy end of the scale may be unable to measure improvements in health. Thus the Rand measures have better discriminative abilities than the SIP. A follow-up study of 185 stroke patients by Schuling et al. (1993) reported that the SIP was ‘time consuming and tiring’ with these patients. It was also insensitive to improvement in condition at eight weeks, whereas the cruder Barthel Index did detect improvement. Katz et al. (1992) also found that the SIP was less sensitive to clinical change among patients undergoing hip replacement than shorter measures, including the SF-36 and a short version of the Arthritis Impact Measurement Scales. A review of studies using the SIP by de Bruin et al. (1992) reported research demonstrating that the SIP was insensitive to small changes and improvements. The SIP’s responsiveness to change has not yet been satisfactorily demonstrated.
Nanda et al. (2003) assessed the psychometric properties of the short, 68-item version of the SIP (SIP68) in comparison with the full version, in a sample of people with disability. They reported that the correlation between the SIP and SIP68 was 0.94, and that correlations between the SIP68 and comparable subscales of the SF-36 were moderate. A factor analysis of the SIP68 reproduced a factor structure that included 65 of the items. As for the factor structure of the full SIP, de Bruin et al.’s (1992) review also questions the SIP’s construct validity, because the results of factor analyses have been inconsistent.

Reliability

Work began on the SIP in 1972, and its authors continued to conduct tests for reliability and validity for over a decade. Patients studied included those with hypothyroidism, rheumatoid arthritis and hip replacements. The details of the reliability tests conducted on the SIP are reported by Bergner et al. (1981). Earlier reliability tests on SIPs of differing lengths before the development
of the 136-item version are reported by Pollard et al. (1976, 1978).
Test-retest reliability was high (0.88–0.92). Internal consistency was also high (0.81–0.97). The interviewer-administered versions scored better in each case than mailed or self-completed versions. Deyo et al. (1983) applied the SIP to 79 patients with arthritis and found that test-retest reliability for the 23 patients retested was 0.91. The SIP had better reliability than the traditional functional scales of the American Rheumatism Association and patients’ self-ratings of function. Test-retest reliability was better for the overall score than for each of the dimensions. Studies of the reliability of the SIP throughout Europe and the USA indicate similarly high levels of internal consistency (0.91–0.95), test-retest correlations (0.75–0.85) and inter-rater reliability (0.87–0.92) (de Bruin et al. 1992). Nanda et al.’s (2003) study of people with disability found retest correlations greater than 0.75 for all dimensions except physical disability (0.61).

Adaptations

The US authors judged patient acceptance of the SIP to be good. In the UK, however, it has been rejected for some treatment programme evaluations on the grounds of its length, and the more concise Nottingham Health Profile was selected in preference (Buxton 1983; O’Brien 1988).
A modified version of the SIP was developed at St Thomas’s Hospital, London, for a community disability survey. The modified version is called the Functional Limitations Profile (FLP) (Patrick 1982; Charlton et al. 1983). Linguistic changes were made and scale weights were recalculated, although the new weights agreed closely with the original weights of the US version. The translation has been criticized by Hunt et al. (1986) on the grounds that language changes alone do not satisfy the requirements for cross-cultural adaptation. The changes made are fairly minimal. The FLP is still 136 items long and is designed to be interviewer-administered. Its scores have the same range, from 0 (low) to 100 (high), although the weighting is different. The authors later reported that FLP and SIP items grouped satisfactorily into five global measures: physical, psychosocial, eating, communication and work. They found that the items were related to age and number of medical conditions but were poor predictors of service use (Charlton et al. 1983).
Another UK application of the Functional Limitations Profile was a study of 92 patients with chronic obstructive pulmonary disease by Williams and Bury (1989). They reported poor to good correlations (0.38–0.90) between the FLP physical sub-section and clinical data. Fitzpatrick et al. (1988) also used the FLP in their study of 105 patients with rheumatoid arthritis assessed over a 15-month period. They used both the FLP and the HAQ, but reported only modest levels of sensitivity and specificity in relation to data on clinical change for each. The authors acknowledged that the FLP provided more information and was more precise than the short HAQ. However, they cautioned that this has to be weighed against the simpler measurement assumptions and shorter administration time of the HAQ. Fitzpatrick et al. (1988) concluded that the SIP/FLP is less sensitive to improvement than to deterioration. McColl et al. (1995) reported that the FLP had a high level of item non-response and also had ceiling effects.
The conclusion relating to the FLP is that it requires far more testing for reliability and validity before it can be considered the UK alternative to the SIP.
Some investigators have adapted the SIP to specific clinical conditions and used a selection of the items. Ada et al. (2003) used 30 items from the SIP in their study of outcome of rehabilitation programmes for stroke. However, they reported that these items did not detect documented improvements in patients’ walking speed and capacity. Vetter et al. (1989) used their own adaptation of the SIP along with the Barthel Index to assess rehabilitation outcomes among a pilot sample of 59 elderly people receiving either home or day hospital care in Wales. The authors substantially changed the style of the SIP. They reported that respondents found the ‘I’ format (e.g. ‘I dress myself, but do so very slowly’) confusing, because they were unsure to whom the ‘I’ referred. The statements were therefore changed to actual questions (e.g. ‘Do you dress yourself, but do so very slowly?’). A further problem they found with the SIP was that some questions were not applicable to all patients. Thus the question ‘Are you unable to walk up and down hills?’ in the original profile could be answered only
‘yes’ or ‘no’. The researchers therefore added a ‘not applicable’ category to avoid confusion, for example for those who were bed- or chairfast. The physical dimension scale of the SIP was found to have acceptable validity and was judged a suitable measure of outcome (other dimensions are yet to be tested), although the correlations were not reported. The SIP was also found to be a stable measure when repeated eight weeks later with the patients, who were not expected to change in physical state.
The SIP and its adaptations have several advantages. They can be self-administered or interview-based, and they can be used with chronically or acutely ill patients. The SIP has been adapted for use in the UK as the Functional Limitations Profile, and it is well tested for reliability and validity. McDowell and Newell (1996), in their review of health-status measures, noted the thoroughness and care with which the SIP was developed, and its frequent use as a gold standard against which other scales are evaluated. Its limitations are its length and the fact that it can be used only with people who are regarded, or who regard themselves, as ill. Also, its factor structure and responsiveness to change have not been adequately demonstrated. International versions of the SIP have been reviewed by Anderson et al. (1993).

THE NOTTINGHAM HEALTH PROFILE (NHP)

The Nottingham Health Profile was developed in the UK and is based on lay perceptions of health status. The conceptual basis of the NHP was that it should reflect lay rather than professional definitions of health. It was developed after interviews with a large number of lay people about the effects of illness on behaviour. Hunt et al. (1986) commented that their pilot interviewing to develop the NHP demonstrated that lay people had a limited range of language for describing good health and well-being. This would have made the creation of a comprehensive health index difficult. The NHP is not an index of disease, illness or disability; rather, it relates to how people feel when they are experiencing various states of ill health. As a survey tool it is useful in assessing whether people have a (severe) health problem, although diagnostic data would be required to identify the kind of health problem. The measure was not intended by its developers to measure health-related quality of life, nor to detect health conditions or states of mild symptom severity (Hunt 1984). It is too short to assess the impact of a condition on quality of life. For this, combinations of measures are required: a functional-disability scale, symptom and pain indices, a measure of psychological disturbance, and quantitative and more qualitative methods of assessing the impact on social functioning (e.g. work, interpersonal relationships and social support, domestic life). The NHP can provide only a shallow profile of effects on these aspects.
Pilot work with the NHP led to the identification of relevant concepts. Statements that exemplified those concepts were drawn up and, after further piloting, were finally categorized into six areas: physical mobility, pain, sleep, energy, emotional reactions and social isolation. The Nottingham Health Profile is designed for self-completion and is concise and easily administered. It was the first measure of perceived health to be extensively tested and developed for use in Europe. Hunt et al. (1986) reported that, when the NHP is administered in a postal survey, people are unlikely to return it if they have a high number of zero scores (no problems), as they feel they have nothing to contribute to the study. Pilot work has been undertaken with positive items as dummies: ‘I sleep soundly at night’; ‘I am usually free of any pain’.
The NHP is short and simple, and it can be used with groups of patients or a general population. It is suitable for people who are not necessarily unhealthy or ill, but like many other instruments it focuses on negative rather than positive experiences. Population norms exist for the instrument, as do scores for individual patient groups (Hunt et al. 1984b).

Content

Part I measures perceived or subjective health status by asking for yes/no responses to 38 simple statements covering six dimensions: mobility, pain, energy, sleep, emotional reactions and social isolation. Each dimension has a range of possible scores of 0–100. Part II asks about any effects of health on seven areas of daily life: work, looking after the home, social life, home life, sex life, interests and hobbies, and holidays. Part II items are coded 1 for ‘yes’ and 0 for ‘no’ and then scored. Part II is not always a
useful addition. For some groups several items do not apply (e.g. the elderly, the unemployed, the disabled, those on low incomes). The authors carried out later developmental work on Part II and recommended that Part II should no longer be used. Part I asks about the applicability of statements to the respondent at the present time. Examples of its items include:

I’m tired all the time.
I have pain at night.
Things are getting me down.
I have unbearable pain.
I take tablets to help me sleep.

A main advantage of the NHP is that its authors have published a book containing a review of the development of the scale, studies of its reliability and validity, the NHP items and a users’ manual (Hunt et al. 1986).

Scoring

NHP scores range from 0 (no problems) to 100 (all problems in a section affirmed). The weighted scores are summed for each NHP domain, but an overall score is not obtainable. The NHP has been extensively tested for reliability and validity, and the authors reported the results to be good. The weights for the NHP were derived using Thurstone’s method of paired comparisons. Judgements were obtained from several hundred people and converted into weights using the appropriate formula (McKenna et al. 1981). However, Jenkinson et al. (1991) provided evidence that the use of Thurstone’s scaling method was unsuccessful, as the distributions of the NHP were similar whether the scale items were weighted or simply added. They also detected inconsistencies in the scoring system: the score for a person with walking difficulties erroneously exceeded the scores of those who were totally unable to walk.
The NHP scores are not ‘true’ numbers but are obtained from a scaling technique; thus the appropriate statistical tests for testing hypotheses are non-parametric. Its highly skewed distribution of
formations can be applied to skewed data, thus permitting the application of parametric statistics.
Part I of the NHP requires respondents to indicate ‘yes’ or ‘no’ according to whether each statement applies to them ‘in general at the present time’. Relative weights are applied to these responses. All statements relate to limitations on activity or aspects of distress. A dimension score of 100 indicates the presence of all limitations listed, and a zero score the absence of limitations, but these two extremes do not reflect the extremes of death or perfect health. Part II, which has been temporarily withdrawn, relates to seven areas of task performance affected by health. Respondents answer ‘yes’ if their present state of health is causing problems with the activity. Part II has no weights: a count of affirmative responses is used as a summary statistic. It is not possible to calculate an overall health-status score, although aggregation within categories is permitted.
The sensitivity of the NHP as a survey instrument can be limited because the modal response is zero. As a result, the NHP does not discriminate for a substantial proportion of the adult population. The sample in a population survey would include a large number of relatively fit members who would gain low NHP scores. As it is a severe measure, the instrument does not detect minor illnesses, and so minor improvements over time are unlikely to be detected. This problem was acknowledged by its authors: ‘The NHP is clearly tapping only the extreme end of perceived health problems. Such a distribution was built into the NHP during its development. At an early stage it was decided that it would be undesirable to include health problems which would be affirmed by a large proportion of the population’ (Hunt et al. 1986).
Hunt (1988) reported on the proportion of zero scorers from combined data from several community studies using the NHP:

Section    Proportion of zero scores (%)
Pain    83
Social isolation    82
scores (see column 2) means that careful con- Physical mobility 78
sideration needs to be given to the application Energy 75
of statistical tests as many assume that the results Emotional reactions 61
are normally distributed although statistical trans- Sleep 56
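The scoring rules described above (weighted ‘yes’ responses summed within each section to give a score from 0 to 100, with no overall score) and the zero-score proportions reported by Hunt (1988) can be illustrated with a short Python sketch. The item names and weights are hypothetical placeholders chosen so that each section sums to 100; they are not the published NHP weights, and only two of the six sections are shown.

```python
# Minimal sketch of NHP Part I section scoring and zero-score (floor) proportions.
# ASSUMPTION: item names and weights are hypothetical placeholders, not the
# published NHP weights; within each section the weights sum to 100.

WEIGHTS: dict[str, dict[str, float]] = {
    "pain": {"pain_at_night": 40.0, "unbearable_pain": 60.0},
    "sleep": {"tablets_to_sleep": 30.0, "waking_early": 70.0},
}


def section_scores(answers: dict[str, bool]) -> dict[str, float]:
    """Sum the weights of items answered 'yes' within each section.

    0 = no problems affirmed; 100 = every item in the section affirmed.
    Sections are kept separate: no overall score is calculated.
    """
    return {
        section: sum(w for item, w in items.items() if answers.get(item, False))
        for section, items in WEIGHTS.items()
    }


def zero_score_proportions(respondents: list[dict[str, bool]]) -> dict[str, float]:
    """Percentage of respondents who score exactly 0 in each section."""
    all_scores = [section_scores(r) for r in respondents]
    return {
        section: 100.0 * sum(1 for s in all_scores if s[section] == 0) / len(all_scores)
        for section in WEIGHTS
    }


# Four hypothetical respondents; items not mentioned count as 'no'.
sample = [
    {},
    {"waking_early": True},
    {"pain_at_night": True, "unbearable_pain": True, "waking_early": True},
    {},
]
print(section_scores(sample[2]))       # {'pain': 100.0, 'sleep': 70.0}
print(zero_score_proportions(sample))  # {'pain': 75.0, 'sleep': 50.0}
```

Because a section score cannot fall below 0, a respondent who already scores 0 cannot register any improvement, which is why high zero-score proportions matter when the NHP is used as an outcome measure.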
As Hunt acknowledges, this instrument only taps the more extreme end of a distribution.

Kind and Carr-Hill (1987) used the NHP with 1,598 people in a follow-up study of the Rowntree Poverty Survey in York. The sample was weighted towards older and retired adults. Negative responses to all categories of the NHP exceeded 60 per cent, and for social isolation the figure rose to 89 per cent, evidence of its skew towards negative responses. Three items in particular received only 2 per cent endorsement: ‘I find it hard to dress myself’, ‘I’m unable to walk at all’ and ‘I feel that life is not worth living’. Also, three of the most often cited items were drawn from the sleep category and three of the least frequently cited items were from the physical-mobility category. The authors argued that the skewness of the distribution of responses does not appear to be based on any logic. For example, nearly half of those who made just one positive response selected ‘I’m waking up early in the morning’ and 17.5 per cent chose ‘I lose my temper easily’, but no other item was chosen by more than 5 per cent of this group. In contrast, among those who scored heavily (11 or more positive responses), six items were more popular than ‘I lose my temper easily’. They reported no simple relationship between the overall score and the probability of responding to any one item, and found that there appeared to be considerable redundancy among items. The scale and its scoring system have frequently been criticized for inconsistencies and anomalies (Anderson et al. 1993).

Validity

The NHP was well tested for face, content and criterion validity by its developers, and has generally been reported to be a satisfactory measure of subjective health status in physical, social and emotional spheres. The face and content validity of the subjective items were established by the developers on the basis of the method for devising the scale: the items were drawn from lay experiences and respondents were able to relate to them and understand the relevance of the items (Hunt et al. 1986).

Testing for discriminative ability took place with four groups of people aged 65+: 40 people participating in a research exercise programme; 19 patients from a general practitioner’s (GP’s) list who had no known disability or illness and who had not contacted a doctor within the prior two months; 49 people with a variety of health and social problems attending a local authority-run luncheon club; 54 chronically ill patients on GPs’ lists; and 352 randomly selected patients from GPs’ lists. Results showed that the NHP was able to discriminate between groups of ‘well’ and ‘ill’ people, and in terms of physiological fitness, high and low GP consulters, social classes, age groups and sexes, and that the content of the questions was understood by, and acceptable to, elderly people. Perceived health status was also associated with objective health status (Hunt et al. 1980; 1986). Jenkinson et al. (1988) assessed the sensitivity of the NHP with 39 rheumatoid arthritis sufferers and 43 migraine sufferers. The NHP was able to distinguish between the two patient groups: the arthritis patients scored worse in relation to effects on energy levels and mobility. Discriminant analysis showed that the NHP correctly classified 79 per cent of arthritis and 93 per cent of migraine patients. Other investigators have also reported good results with the NHP in coronary patients (Permanyer-Miralda et al. 1991), and have supported the ability of the NHP to discriminate between a normal population and those with a range of serious medical conditions (van Agt et al. 1993).

The NHP has been shown to be sensitive to change. One unpublished study cited by Hunt et al. (1986) was based on a sample of 80 pregnant women. The NHP was administered to the women at three stages during their pregnancy (18, 27 and 37 weeks) and was sensitive to changes during pregnancy. The NHP was also administered to 141 patients attending a fracture clinic and an equal number of control subjects. Scores were obtained soon after the fracture occurred and eight weeks later, and were sensitive to changes in perceived health concomitant with the healing of the fracture (McKenna et al. 1984). O’Brien et al. (1988) used the instrument to measure quality of
life before and after combined lung and heart transplantation in the UK. It was again sensitive to changes and correlated well with clinical measures such as exercise capacity (correlation coefficient for the latter not given). The results indicated significant reductions in NHP scores for patients at three months after surgery; no further differences were recorded at six and twelve months post-operatively. In a study of 196 heart transplant patients, the NHP was reported to be sensitive to deterioration in patients’ condition prior to transplant and to post-transplant improvement. It was able to predict outcome in relation to length of hospital stay, return to work and leisure activity at three months after transplantation (Buxton et al. 1985; Caine et al. 1990). Caine et al. (1991) also used the NHP to study the quality of life of 100 males aged under 60 before and after coronary artery bypass grafting. The NHP was sensitive to improvements in health following the procedure (at three months and one year later). The instrument was reported to be sensitive to improvements in patients’ condition following transplantation and coronary artery bypass surgery (Buxton et al. 1985; Wallwork and Caine 1985).

However, its performance in relation to respiratory disease, and clinical measures of respiratory function, has been variable (Alonso et al. 1992). While the scale appears sensitive to changes following dramatic treatment interventions, its performance with less dramatic, and more minor, treatments is less certain (Hunt et al. 1984a; Brazier et al. 1992). Jenkinson et al. (1988) used both the NHP and Goldberg’s General Health Questionnaire (GHQ) on a population of 39 rheumatoid arthritis and 43 migraine patients and reported a moderate correlation of 0.49 between the GHQ and the emotional-reactions scale of the NHP for both samples. The implication is that the NHP was providing no more than a moderately accurate measure in this domain.

There has been some criticism that each section of the NHP does not represent just one dimension (Kind and Carr-Hill 1987; Kind, Mimeo undated). In particular, fairly high correlations between the pain and physical-mobility categories were reported. While covariation might be expected for items within categories, covariation between items in different categories raises difficulties in interpretation of a cross-category profile.

Reliability

Modest inter-correlations (0.32–0.50) between the NHP dimensions of pain, energy and sleep in over 1,000 patients with zoster (reactivation of the varicella zoster virus) were reported by Mauskopf et al. (1994); inter-correlations with the other dimensions of the NHP were weaker. The developers reported on two studies which tested the reliability of the NHP using the test-retest technique. These were based on 58 patients with osteoarthritis and 93 with peripheral vascular disease. The questionnaires were repeated with these two groups at four and eight weeks after the first administration respectively. Both demonstrated a fairly high level of reliability, with correlations of between 0.71 and 0.88, with the exception of the items on home life (0.64), social life (0.59) and interests and hobbies (0.44) for patients with osteoarthritis, and social life (0.61), looking after the home (0.64) and work (0.55) for patients with vascular disease (Hunt et al. 1981; 1986). Kutlay et al. (2003) tested the reliability of the NHP with haemodialysis patients. They reported that test-retest Pearson’s correlation coefficients (two weeks apart) were 0.61 or greater. Cronbach’s alpha coefficients for the dimensions of the NHP ranged between 0.61 and 0.79 (the energy, sleep and social isolation dimensions were below 0.70). The NHP has been reported to be less stable in relation to rheumatology patients (Fitzpatrick et al. 1992).

Part II of the NHP does not obtain as good results as Part I, which explains its withdrawal. For example, Hunt et al. (1981) reported that the test-retest reliability coefficients for the osteoarthritis patients ranged between 0.77 and 0.85 for Part I, in comparison with a wider range of 0.44 to 0.86 for Part II. The authors note that any changes in perceived health between the two administrations will reduce the correlations (Hunt et al. 1981). The NHP does not meet the requirements for carrying out split-half reliability as it is too short and the items are not homogeneous. The authors also argue that it is not possible to test it against an acceptable gold standard as no suitable measure exists.

A study of its adaptation into French by Bucquet et al. (1990) confirmed the immediate intelligibility of the French version, although it was not without problems. The authors used the same methods of
calculating item weights as McKenna et al. (1981) for the original version, based on Thurstone’s paired comparisons; their study was based on a quota sample of 625 people (judges). However, Bucquet et al. did report that the respondents had difficulties making these ratings, especially in relation to the pain, mobility, energy and sleep sections. Not all of them easily grasped the concept of a ‘general view’ when comparing the health statements. International versions of the scale have been reviewed by Anderson et al. (1993).

In sum, the advantages of the NHP are that it is short, simple and inexpensive to administer; it can be self-administered or interview-based; and it has been well tested for validity. However, it provides only a limited measure of function, and some disabilities are not assessed at all (e.g. sensory defects, incontinence, eating problems). It also lacks an adequate index of mental distress and requires supplementation if used as a broader measure of health-related quality of life. The measure was not intended by its developers to measure health-related quality of life, nor to detect health conditions or states of mild symptom severity (Hunt 1984). It is too short to test for split-half reliability, and it is a severe measure with a highly skewed distribution, so it may not measure minor improvements in health. The result of the focus on severe conditions is that some people who are in distress may not show scores on the profile. Similarly, normally healthy persons or those with few ailments may affirm only a small number of statements on some sections. This makes it difficult to compare their scores over time. People who score zero cannot be shown to improve over time. It is a negative measure of health.

Despite these limitations, there have been numerous applications of the NHP in clinical and community settings. It has been used successfully as an outcome measure with patients undergoing heart surgery (O’Brien 1988; O’Brien et al. 1988), it correlates well with clinical judgements of morbidity and prognosis, and it is simple to administer and analyse. Although it has been a popular measure of health status and outcome in Europe, and is still being adapted for use in other languages (Lovas et al. 2003; Uutela et al. 2003), its use is declining. The measure is strictly copyrighted and there are charges for its use (with some exceptions, e.g. student use).

THE MCMASTER HEALTH INDEX QUESTIONNAIRE (MHIQ)

The McMaster Health Index Questionnaire was developed in Ontario, Canada, as a measure of physical, social and emotional functioning, underpinned by the WHO (1947, 1958) definition of health. The measure was intended to provide independent measurements of these separate areas, given the authors’ recognition that two individuals with the same level of physical disability may differ widely in their social and emotional functioning (Chambers 1998; Chambers et al. 1976). The aim was to produce a health-status questionnaire suitable for administration to general populations that could be used to predict a health professional’s clinical assessment of a person’s health. It was developed on the basis of a literature review of health-status measurement, brainstorming sessions, and consultation of experts.

The Social Function Index section was developed after consideration of existing scales, including the Spitzer Mental Status Schedule, the Cornell Medical Index and the Katz Adjustment Scale (Brodman et al. 1949; Herron et al. 1964; Spitzer et al. 1964), and a range of survey instruments. A review of sociological studies of leisure and social participation produced an additional list of social function questions. The selection criteria for item inclusion were positive as well as negative discriminative ability (i.e. good as well as poor functioning was intended to be identified); general applicability and acceptability; low cost and quick administration; and quantifiable responses.

The Health Index section of the questionnaire included additional items on physical function and items relating to symptoms (respiratory symptoms) and behaviour (cigarette use). Clinical assessments of social function from ‘very poor’ to ‘very good’ were included. The original version contained 172 items. The best 59 items were identified after assessing responsiveness to change, prediction of physicians’ assessments and multivariate analyses, using physicians’ assessments as the criterion variable. These items were tested initially on 70 patients in an acute medical ward, and testing was repeated after their discharge. The draft questionnaire was reported to be quick and simple to administer, acceptable to respondents and sensitive to changes
in health status (values unreported). Clinical assessments were repeated by an independent physician on 54 patients. Consistency ratings resulted in a Goodman-Kruskal Index of Agreement value of 0.90.

Further testing with over 200 patients registered with a family physician reported high consistency between social functioning items (dichotomizing responses into good and poor). The sensitivity of the instrument was assessed to be good. Self-completion of the final version of the McMaster Health Index Questionnaire takes 20 minutes. The McMaster Health Index Questionnaire has been used on a number of different patient populations, including psychiatry out-patients, diabetic patients, respiratory-disease patients, patients with myocardial infarction and patients with rheumatoid arthritis (see review by Chambers 1984).

Content

While the early version of the scale contained 150 items, it was later abbreviated to 59 items. It contains 24 physical-function items (physical activity, mobility, self-care, communication (sight, hearing), global physical function); 25 social-function items (general well-being, work/social role performance/material welfare, family support/participation, friends’ support/participation and global social function); and 25 emotional-function items (self-esteem, feelings toward personal relationships, thoughts about the future, critical life events and global emotional function), some of which overlap with the social-function items. The MHIQ contains only 59 items as some items cover both social and emotional functions. The overlapping items are thus counted twice in the scoring, and carry twice the weight of other items. Validation of this method is required.

All the physical-function items are designed to evaluate the patient’s functional level on the day the MHIQ is administered. The social-function items are explicitly concerned with a specific time period (usually the present). The agree–disagree emotional-function items are phrased in the present tense. Other emotional-function items refer to the recent past as specifically defined within the question (e.g. within the last year). The emphasis is on ability, rather than performance (‘Can you . . .?’ rather than ‘Do you . . .?’). The aim was to elicit information on activities that could be observed at the time the MHIQ was completed. Examples of questions are:

Physical
Today, are you physically able to take part in any sports (hockey, swimming, bowling, golf, and so forth) or exercise regularly?
1 No
2 Yes
Do you have any physical difficulty at all driving a car by yourself?
1 No
2 Yes
Is this because of a physical difficulty?
1 No
2 Yes

Emotional health
(strongly agree 1 to strongly disagree 5)
I sometimes feel that my life is not very useful.
People feel affectionate towards me.

Social health
(good 1 to poor 5)
How would you say your social functioning is today? (By this we mean your ability to work, to have friends, and to get along with your family.)
How much time, in a one-week period do you usually spend watching television? (none to more than two hours a day)

Scoring

A score of 1 is given to ‘good function’ responses for each item, and 0 is assigned to ‘poor functioning’ responses. The scores are summed, although alternative weighting schemes have been developed (Chambers et al. 1982).

Validity

The authors reported, without publishing the values, that the MHIQ correlated well with other scales: the physical-function index correlated well with rheumatologists’ and occupational therapists’ assessments based on the Lee Index of Functional Capacity; the MHIQ index of emotional, physical and social function correlated well with Bradburn’s measures of psychological well-being. The physical-
function index also correlated well with analogue pain scales and with clinical and biological results (Chambers et al. 1982).

Assessments with a group of 96 physiotherapy patients showed a change in MHIQ scores between first visit and discharge, indicating sensitivity to change (Chambers et al. 1982; numbers and values unreported). A subset of the items has been successfully used to predict patient outcome in a randomized controlled trial of patients treated by nurse practitioners and doctors (Sackett et al. 1974). Self-completion of the MHIQ has been reported to be more sensitive to change than other modes of administration (Chambers et al. 1987). Ninety-six patients in a physiotherapy clinic were administered the MHIQ at four points in time. Patients were randomly assigned to different modes of administration: interviewer administered, self-completion, or telephone interview. Self-completion was more sensitive to change than the other methods (Chambers 1984). However, the measure has been reported to be less sensitive to changes in rheumatoid arthritis patients’ conditions than the disease-specific McMaster-Toronto Arthritis and Rheumatism (MACTAR) questionnaire (Tugwell et al. 1987).

Reliability

Chambers (1984) reported that reliability was assessed by asking 30 physiotherapy out-patients to complete the MHIQ on two occasions within a one-week period. Patients were not expected to change in their functional status. The correlation between the physical- and emotional-function values was 0.80. Intraclass correlation coefficients ranged from 0.48 to 0.95 for the physical, emotional and social scores. Internal consistency coefficients for the physical-, emotional- and social-function indices were 0.76, 0.67 and 0.51 respectively. Reliability did not differ between self- and interviewer-administered versions. Inter-rater agreement between four raters for the physical dimension of the early version of the scale was 0.71 (Kendall’s coefficient of concordance) (Chambers 1993).

In sum, while more studies of its reliability and validity are needed, and the use of the scale is declining, the advantages of the scale are its flexibility (it can be used in different settings); it is amenable to mathematical scoring construction; it focuses on present ability; it is positive in orientation; and it is simple and inexpensive to administer, and acceptable to patients. It also has an acceptable level of reliability. However, it is of questionable applicability to frailer older people (e.g. the question on sports participation in the physical index).

THE RAND HEALTH INSURANCE/MEDICAL OUTCOMES STUDY BATTERIES

The Rand Corporation’s health batteries were designed for the Rand Health Insurance Study (HIS), which was an experiment on health outcome following the random allocation of adults to various insurance plans in the USA. The HIS was based on a sample of about 8,000 people in 2,750 families in six sites across the USA. The Medical Outcomes Study (MOS) involved the more detailed assessment of health outcome and led to further development of the batteries (Stewart and Ware 1992). The batteries were developed for use in population surveys, and more specifically as outcome measures to detect changes in health status that might be expected to occur as a result of health-service use within a relatively short period of time. The batteries cover physical health, physiological health, mental health, social health and perceptions of health. The measures were intended to be sensitive to differences in health in general populations (Ware and Karmos 1976; Stewart et al. 1978, 1981, 1989; Ware et al. 1979, 1980; Stewart and Ware 1992). The HIS batteries were developed after extensive research into existing measures and testing of adaptations of existing measures, or of new measures developed on the basis of extensive reviews of the literature. Each section can be used independently. Many investigators across the world have also used the Rand batteries (e.g. Hall et al. 1987). The authors have a wide range of publicly available reports of findings to date, and some published papers which show the interrelationships between the HIS batteries (Stewart et al. 1989; Wells et al. 1989a, 1989b). A short 20-item version of the batteries was developed (Stewart et al. 1988), although this was soon overtaken by a 36-item version (the Short Form-36), which is now used worldwide, with good results for tests of reliability
and validity (Anderson et al. 1990; Stewart and Ware 1992). The International Quality of Life Assessment (IQOLA) Project translated and adapted the widely used SF-36 Health Survey Questionnaire in several countries, and provided norms for the new translations, in order to facilitate their use in international studies of health outcomes (Aaronson et al. 1992).

PHYSICAL HEALTH BATTERY

Physical health in the Health Insurance Study was operationalized in terms of functional status. A review of the literature revealed six categories of functional activity for which performance has been assumed to reflect physical health: self-care (e.g. feeding, bathing), mobility, physical activities, role activities (e.g. employment), household activities and leisure activities. These six areas were thus incorporated into the Rand battery, called the Functional Limitations Battery. The items in the Functional Limitations Battery were based on items from the US National Health Interview Survey and Patrick et al.’s (1973a, 1973b) work on developing functional ability scales. Performance was measured by three separate batteries of items: functional limitations, physical abilities and disability days.

An advantage of the Rand Functional Limitations Battery over a number of other scales is that it relates the limitation to its cause (e.g. asking whether a difficulty is ‘because of your health’). The Rand Physical Health Battery includes the Physical Abilities Battery in addition to the Functional Limitations Battery. These are similar to each other and were intended to be administered to respondents at different stages of the longitudinal HIS.

Content

Fourteen HIS items assess activities in self-care, mobility, physical, household and leisure activities, and role activities. Response choices are either ‘yes’ or ‘no’, or ‘yes, can do’, ‘yes, can do but only slowly’ or ‘no, unable to do’. Positive responses to problems lead to further items on length of restriction. Examples from the (revised) Functional Limitations Battery and Physical Abilities Battery are:

Does your health limit the kind of vigorous activities you can do, such as running, lifting heavy objects, or participating in strenuous sports? Yes 1/No 2
Do you have any trouble either walking several blocks or climbing a few flights of stairs, because of your health? Yes 1/No 2
Do you need help with eating, dressing, bathing, or using the toilet because of your health? Yes 1/No 2
Can you do hard activities at home, heavy work like scrubbing floors, or lifting or moving heavy furniture? Yes 1/Yes, but only slowly 2/No, I can’t do this 3
If you wanted to, could you participate in active sports such as swimming, tennis, basketball, volleyball, or rowing a boat? Yes 1/Yes, but only slowly 2/No, I can’t do this 3

Many of these items are inapplicable to very elderly people (e.g. strenuous sports activity) or are too general (e.g. asking about several limitations in the same question) to be useful in surveys where discriminative ability is important.

Scoring

Item score ranges vary: 0–3 (none to more severe limitations) for mobility; 0–4 for physical limitations; 0–2 for role limitations. Items can be scored in groups: mobility limitations, role limitations and self-care limitations (the latter included only one item). An overall score, the Functional Status Index, uses all items and is an index of the number of categories in which a person has one or more limitations (0–4). This has not been fully tested for validity.

Results from the Rand study showed that the physical health measures yielded skewed distributions: most sample members had no functional limitations or physical disabilities. This limits the battery’s suitability for use in population surveys.

Validity

The authors judged the measure to have acceptable content validity as the content of items included in the physical-health measures reflected the range of content reported in the literature. Construct validity was assumed by the authors as associations between the Functional Limitations Battery and the Physical Abilities Battery were moderate to strong (0.49–0.99), indicating an underlying general construct, presumably physical health.
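The Functional Status Index described under Scoring above is a simple count of limited categories. The Python sketch below illustrates that counting rule under stated assumptions: the four categories are taken to be mobility, physical, role and self-care limitations (the text gives a 0–4 range but does not list the four categories explicitly), and the item codes are hypothetical, with 0 meaning no limitation and any higher code meaning some limitation.

```python
# Minimal sketch of a Functional Status Index-style count (range 0-4).
# ASSUMPTIONS: the four category names and the item codings are hypothetical;
# 0 means 'no limitation' and any higher code means some limitation.

CATEGORIES = ("mobility", "physical", "role", "self_care")


def functional_status_index(item_scores: dict[str, list[int]]) -> int:
    """Count the categories in which at least one item shows a limitation."""
    return sum(
        1
        for category in CATEGORIES
        if any(score > 0 for score in item_scores.get(category, []))
    )


person = {
    "mobility": [0, 2],     # one mobility item shows a limitation
    "physical": [0, 0, 0],  # no physical limitations
    "role": [1],            # a role limitation is present
    "self_care": [0],       # self-care is a single item
}
print(functional_status_index(person))  # 2
print(functional_status_index({}))      # 0: no limitations in any category
```

Because the count only records whether a category contains any limitation, a severe and a mild mobility limitation contribute equally to the index, and in a sample where most people have no limitations, as in the Rand study, most index values will be 0.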
Reliability

Reliability was estimated for the physical-abilities measures using internal-consistency coefficients. The internal-consistency and reproducibility coefficients for HIS measures of physical health were high (0.90 or above); test-retest coefficients (four months apart) were also high (0.92–0.99).

MENTAL HEALTH BATTERY

As a result of an extensive literature review, ranging from assessments of depression scales to general well-being schedules, the measurement of symptoms of affective (mood) disorders (e.g. depression, anxiety) was considered important, as well as well-being and self-control of behaviour, moods, thought and feelings. The HIS Mental Health Battery contains items hypothesized to measure these constructs. It has been revised since its development in order to extend the range of measurement (particularly with anxiety and depression).

Content

The Revised Mental Health Battery is constructed from the General Well-Being Questionnaire. A new subscale has been added defining loss of behavioural/emotional control. The revised version was based on factor analyses. The General Well-Being Questionnaire comprises 46 questions with four to six response choices for each item, ranging from extremely positive to extremely negative evaluations. The Mental Health Battery uses 38 of these questions; the remaining eight items are used to estimate socially desirable responses. Examples of items are:

How much have you been bothered by nervousness or your ‘nerves’ during the past month?
Extremely so – to the point where I could not work or take care of things 1
Very much bothered 2
Bothered quite a bit by nerves 3
Bothered some – enough to bother me 4
Bothered just a little by nerves 5
Not bothered at all by this 6

During the past month, have you had any reason to wonder if you were losing your mind, or losing control over the way you act, talk, think, feel, or of your memory?
No, not at all 1
Maybe a little 2
Yes, but not enough to be concerned or worried about it 3
Yes, and I have been a little concerned 4
Yes, and I am quite concerned 5
Yes, and I am very much concerned about it 6

In general, would you say your morals have been above reproach?
Yes, definitely 1
Yes, probably 2
I don’t know 3
Probably not 4
Definitely not 5

How often have you felt like crying, during the past month?
Always 1
Very often 2
Fairly often 3
Sometimes 4
Almost never 5
Never 6

Scoring

Eight items measure social desirability and should be scored separately from the 38 Mental Health Index items. Some items are recoded for scoring purposes. Subscales can be created relating to a life-satisfaction item, a psychological-distress scale, a psychological well-being scale and the mental-health index. This involves the simple addition of item scores and the recoded scores. High scores are interpreted differently for different subscales, according to the scale name: for example, a high score on a negative scale (e.g. psychological distress) is unfavourable, and a high score on a positive scale (e.g. psychological well-being) is favourable. Full details of the scoring method are available from the authors at Rand Corporation, Santa Monica.

Validity

The content validity of the HIS Mental Health Battery was assumed by the authors as it contained items represented in the literature. Construct
validity is more questionable as correlations between these scales and those constructed for validity studies ranged from 0–0.01 to 0.94. This reflects doubt about the measurement of a common construct within the scale. The authors use these findings to support their decision to score and interpret separately the four construct-specific mental-health scales. An Australian community survey by Hall et al. (1987) reported a correlation of −0.76 between the Rand Mental Health Battery and the General Health Questionnaire, which largely assesses depression and anxiety.

Reliability
The reliability of the HIS mental-health measures was estimated using internal-consistency and test-retest coefficients. Internal consistency estimates were fairly high (0.72–0.94). Test-retest estimates ranged from 0.70 to 0.80, the time period between administrations being generally less than one week.
A more recent development is a brief eight-item depression scale, with initially promising results for reliability and validity (Burnam et al. 1988; Wells et al. 1989a, 1989b).

DEPRESSION SCREENER
This eight-item, short self-report measure was developed to screen for depression (major depression and dysthymia) in the Rand Medical Outcomes Study (MOS) in the USA. It was developed for use in a screening instrument for three chronic diseases, and it was intended that the whole battery should not take more than 10 minutes to complete (Burnam et al. 1988).
The scale was developed on over 5,000 people from a general population sample, mental health service users and primary care users. The study measures included the 20-item Center for Epidemiologic Studies Depression Scale (CES-D) (Radloff 1977), which enquired about symptoms and frequency, and two items from the Diagnostic Interview Schedule (DIS) on duration of symptoms (Robins et al. 1981). The full Diagnostic Interview Schedule was also used to assess psychiatric disorders (as a gold standard). To select the best items for the screener, logistic regression analyses were employed. The final set of items selected for the screener included six CES-D items and the two DIS items. The latter two items were important predictors of depression and had the largest coefficients in the final regression model. A six-item screener, with the latter two items removed, was also tested (the six-item depression screener). This performed very similarly to the eight-item screener, but the eight-item screener performed slightly better overall.

Content
All the questions relate specifically to depressive symptoms. The response format ranges from yes/no choices for the first three depression questions, to a four-point response choice ranging from 0 = rarely or none of the time, to 3 = most or all of the time (the scores are reversed for one positive item in the scale: ‘I enjoyed life’). Examples are:

11 In the past year, have you had two weeks or more during which you felt sad, blue or depressed; or when you lost all interest or pleasure in things that you usually cared about or enjoyed?
Yes/no

12A Have you felt depressed or sad much of the time in the past year?
Yes/no

13 For each statement below, mark one circle that best describes how much of the time you felt or behaved this way during the past week.
During the past week:
(b) I had crying spells
(d) I enjoyed life

Rarely or none of the time (<1 day)/some or a little of the time (1–2 days)/occasionally or a moderate amount of the time (3–4 days)/most or all of the time (5–7 days).

Scoring
The individual items carry different weights, and two of the items relate to diagnostically relevant periods. These features distinguish it from other depression scales. There is, however, a complicated scoring equation because of the differential weights applied to the items (see Burnam et al. 1988 for details).

Validity
The test results showed that the screener had high sensitivity and good positive predictive value for
detecting recent depressive disorders and those that met full DSM-III criteria (American Psychiatric Association 1987, 1994; Burnam et al. 1988). It was better at predicting depressive disorder in the past month than within the past 6 or 12 months. Varying the cut-off point for the screener improved the sensitivity for longer prevalence periods (6 months, 12 months and lifetime). Detailed results for the specificity and sensitivity of the instrument by cut-off points have been published (Burnam et al. 1988).

Reliability
The test-retest reliability of the two DIS items from the screener was tested on 230 adults living in the community (baseline interview with the DIS and telephone follow-up of the depression sub-section of the DIS). The overall agreement between the two DIS items asked on a lifetime basis was 86 per cent for two weeks of feeling depressed, and 91 per cent for two years of feeling depressed.
The authors stated that the screener is suitable for population surveys and surveys of health-service users. It is a promising development; some minor alteration of wording (e.g. ‘blue’) will be required, and testing needed, before adoption elsewhere (Burnam et al. 1988).

Modifications of the screener and similar depression screeners from the Rand batteries
Three key items in the screener have been adapted for inclusion in one of the US versions of the SF-36 (the SF-36D). The items relate to (1) depression in the past year for two weeks or more, (2) depression for most days over a two-year period, and (3) depression for much of the time in the past year. The complex scoring of the screener has not been attempted; instead, the patterns of yes/no responses are used to identify patients at risk for major depression or dysthymia. A ‘yes’ reply to question 1 indicates a risk for major depression, and a ‘yes’ answer to questions 2 and 3 indicates a risk for dysthymia. Similar questions tested in large community studies identified 89 per cent of adults with a psychiatric diagnosis of major depression or dysthymia (Health Outcomes Institute (previously InterStudy) 1990).
There have been other attempts to develop a short screening questionnaire based on the Rand Health Insurance Experiment Mental Health Inventory. Berwick et al. (1991) have developed a five-item screening test which is able to detect the most significant Diagnostic Interview Schedule (DIS) disorders, and it performed as well as the original longer (18-item) version, and as well as Goldberg’s (1978) General Health Questionnaire (30-item version).

THE SOCIAL HEALTH BATTERY
The Social Health Battery was developed alongside the other Rand health measures assessing physical and psychological status.
On the basis of the literature again, social health was operationally defined in terms of interpersonal interactions. The Social Health Battery measures social well-being and support, operationalized by measuring social interaction and resources. The two main dimensions of the 11-item battery are the number of social resources a person has, and the frequency with which he or she has contact with relatives and friends.

Content
The 11-item scale covers home, family, friendships, social and community life. It does not cover satisfaction with relationships. The scale has been described by Donald and Ware (1982). Questions include:

About how many close friends do you have – people you feel at ease with and can talk with about what is on your mind? (You may include relatives.)

During the past month, about how often have you had friends over to your home? (Do not count relatives.)
Every day.
Several days a week.
About once a week.
Two or three times in the past month.
Once in the past month.
Not at all in the past month.

And how often were you on the telephone with close friends or relatives during the past month?
Every day.
Several times a week.
About once a week.
Two or three times.
Once.
Not at all.

An additional item asks about frequency of letter writing, but the authors advise dropping this item as so few people answer in the affirmative at all.

Scoring
An overall social support score utilizes all the items (except letter writing and a general question asking about how the person gets on with others). This scoring has not been fully tested for validity.

Validity
Data on validity were drawn from 4,603 interviews from the Health Insurance Study. The Social Health Battery is judged by its authors to have content validity in so far as it reflects the two major components of social health identified in the literature: interpersonal interaction and social participation. The authors do not attempt to include any of the other areas of social health; thus content validity is only partial.
Criterion validity was tested by using the items in the scale and a nine-item self-rating of health, a measure of emotional ties and a nine-item psychological well-being scale. The correlations were low. Further evidence of the validity of the battery is required.

Reliability
The inter-item correlations are low. Test-retest coefficients are moderate and range from 0.55 to 0.68. Further testing for reliability is required.
The results for reliability and validity are so far not convincing. Further testing is required. The large-scale and longitudinal nature of the Rand HIS study offers exciting possibilities for further testing and development of the measure.

SOCIAL SUPPORT SCALE
The Rand social support questionnaire was developed during the MOS (Sherbourne and Hays 1990). It reflects more recent conceptual thought on the subjective components of social support. An early 17-item version was used on almost 2,000 patients with chronic diseases in the Rand MOS study. For this study, social support was operationalized by four multi-item measures of the availability, if needed, of four distinct types of functional support: tangible support, involving the provision of material aid or behavioural assistance; affectionate support, involving expressions of love and affection; positive social interaction, involving the availability of other persons to do pleasurable things with; and emotional/informational support, involving the expression of positive affect, empathetic understanding, and the offering of advice, guidance or feedback (Sherbourne and Hays 1990). The latter was judged to be important to include because the authors felt that this type of support would be beneficial to the health outcomes of people with chronic illnesses.
Although the authors initially used a 17-item scale to represent these dimensions, they subsequently developed a 19-item version, which involved dividing the emotional/informational support domain into two dimensions, and thus the scale then contained five, rather than four, dimensions of social support (Sherbourne and Stewart 1991; Sherbourne et al. 1992). The emotional and informational domains were later combined back into the emotional/informational support subscale following evaluation by multi-trait correlation matrix, which showed considerable overlap between the items (Sherbourne and Stewart 1991).
In the 19-item version, two single items on the structure of social support are included in order to compensate for the lack of focus on the structure of the network (the number of close friends and relatives, and marital status). The development of the 19-item support scale was based on the same conceptual framework, question type and response format as the 17-item scale, and was described by Sherbourne and Stewart (1991). The 19-item scale is the current version.
The items selected for inclusion were derived from a larger pool of 50 items constructed on the basis of a literature review (Sherbourne and Stewart 1991). The items deliberately reflect subjective impressions of social support, rather than objective network structures (the latter were omitted in order to reduce respondent fatigue).
Content
For each item (in both the 17- and 19-item versions), respondents are asked how often each kind of support was available to them if they needed it. The five-point response scale for each item ranges from ‘none of the time’, ‘a little of the time’, ‘some of the time’, ‘most of the time’ to ‘all of the time’. Five points were chosen by the authors on the basis of their review of the research evidence that five to seven response categories are necessary for optimal assessment. Examples from the scale are:

Next are some questions about the support that is available to you.

1 About how many close friends and close relatives do you have (people you feel at ease with and can talk to about what is on your mind)?
Write in number of close friends and relatives:

Circle one number on each line:
None of the time (1)/A little of the time (2)/Some of the time (3)/Most of the time (4)/All of the time (5)

2 Someone to help you if you were confined to bed
3 Someone you can count on to listen to you when you need to talk
4 Someone to give you good advice about a crisis
5 Someone to take you to the doctor if you needed it
10 Someone who hugs you
15 Someone to help with daily chores if you were sick
20 Someone to love and make you feel wanted

Scoring
Each response (‘none of the time’ to ‘all of the time’) is scored 1 to 5, and summed to provide an overall score of social support (the higher the score, the greater the level of social support).

Sherbourne et al. (1992) used the 19-item scale in a study of the effects of social support and stressful life events on long-term physical functioning and emotional well-being of 1,402 chronically ill people (with hypertension, diabetes, coronary heart disease or depression) participating in the Rand Medical Outcomes Study. The authors reported that patients with high levels of social support had better levels of physical functioning and emotional well-being than those with low levels of support, supporting its discriminative ability. In relation to the same sample, the 19 items were reported by Sherbourne and Stewart (1991) to be correlated weakly to moderately with measures of loneliness, health perceptions, mental health and measures of family and social functioning. Bowling et al. (2002) used the instrument in a survey of the quality of life among people aged 65 and over in Britain. The overall social support score was a significant, independent predictor of self-rated quality of life, further supporting the construct validity of the measure. The internal consistency was fairly high (Cronbach’s alpha: 0.60) (Bowling and Gabriel 2004).

Sherbourne et al.’s (1992) study reported that standardized factor loadings ranged from 0.76 to 0.93 for the tangible support factor, 0.86–0.92 for the affection factor, 0.82–0.92 for the emotional/informational factor, and 0.91–0.93 for the positive interaction factor. Results of principal components factor analysis of the 19 items also supported the construction of the overall index (the first unrotated factor showed high loadings for each of the items, ranging from 0.67 to 0.88). These results support the scale as containing four dimensions and as providing a common measure of overall support.
Sherbourne and Stewart (1991) reported that the correlations (Pearson’s) between the items and the subscales were strong. Item-scale correlations ranged from 0.72 to 0.87 for the tangible support scale, 0.80–0.86 for the affection scale, 0.82–0.90 for the emotional/informational scale, and 0.87–0.88 for the positive interaction scale. The additional item on number of close friends and relatives correlated low to moderately with each of the functional support items (0.18–0.23), indicating its distinct status; marital status was not associated with numbers of close friends or relatives (0.01), but was more highly correlated with the functional support items (0.69–0.82).
Because of its concentration on subjective perceptions, this scale needs to be supplemented with an objective measure of the structure of the network (e.g. size, composition, frequency of contact, geographical proximity, contacts by telephone/mail). Although it requires supplementation, and far more extensive testing, it is a promising scale and merits wider use in order to more fully assess
its psychometric properties and cross-cultural applicability. One advantage of this scale is that it contains more health-specific items than many of the more generic social support scales that have been developed. Most social support scales were developed in the USA and contain culture-specific items which would be unusual in other societies (e.g. about having someone who would loan the respondent a car).

GENERAL HEALTH PERCEPTIONS BATTERY
This asks respondents for an assessment or self-rating of their health in general (Stewart et al. 1978; Davies and Ware 1981). Health perceptions were defined in the HIS with respect to time (perceptions of prior, current and future health) and with respect to three other constructs indicative of general health perceptions: resistance or susceptibility to illness, health worry and concern, and sickness orientation (the extent to which people perceive illness to be a part of their lives).
A major strength of the General Health Perceptions Battery is ease of administration. Self-administration of this section takes approximately seven minutes. It is of potential use in studies attempting to predict the use of medical services.

Content
The battery contains 29 items, 26 of which were taken from the Health Perceptions Questionnaire developed by Ware and Karmos (1976) for the National Center for Health Services Research. The items include statements of opinion about personal health (e.g. ‘I expect to have a very healthy life’), accompanied by five standardized response categories defining a true–false continuum: definitely true, mostly true, don’t know, mostly false, definitely false. These items are used to score six subscales assessing different dimensions of health perceptions: past health, present health, future health, health-related worries and concerns, resistance or susceptibility to illness and the tendency to view illness as part of life. Examples of items are:

I try to avoid letting illness interfere with my life.
I will probably be sick a lot in the future.
I don’t like going to the doctor.
I’m not as healthy as I used to be.
When I’m sick, I try to just keep going as usual.
I expect to have a very healthy life.
When I think I am getting sick, I fight it.

It is apparent from these questions that analyses need to control for age, especially in relation to expectations about the future and current health status.

Scoring
All items are scored to produce a global figure, although the scoring has not been fully tested for validity. Distributions appear to be fairly normal. Hall et al. (1987) reported the Rand measures to be less skewed than the SIP or GHQ.

Validity
The measure of health perceptions was reported by the authors to be correlated with the physical, mental and social health batteries (coefficients unreported). In the Australian study by Hall et al. (1987) the Rand batteries relating to general health perceptions were tested on 160 patients along with the Sickness Impact Profile and the General Health Questionnaire. A correlation of −0.30 was considered to be the threshold for assessment of relationships between the three instruments. The Rand mental-health index was reported to correlate weakly to moderately with three of the SIP components (emotional behaviour): −0.32 to −0.54. To some extent the same constructs are being measured, although these correlations are not high. If 0.50 is taken as the minimum correlation coefficient for validity acceptance for group studies, then the instruments tested do not achieve this (Ware et al. 1980). On the other hand, it was reported previously that the GHQ correlated highly with the Rand Mental Health Battery (−0.76) (Hall et al. 1987). The developers indicated that it can successfully discriminate between those with and without a chronic disease, is sensitive to individual differences in disease severity, and is sensitive to changes over time in both physical and mental functioning. Hall et al. (1987) reported that of the three instruments used in the Australian study (the Rand instrument, the SIP and the GHQ), the Rand measures had the best discriminative ability and were reported by the authors to be
preferred as a general health-status measure in a general population.

Reliability
The reliability of the Health Perceptions Questionnaire scale was estimated using internal consistency and test-retest coefficients. These were tested by its authors using a non-HIS population. Test-retest reliability estimates were based on data collected approximately six weeks apart from the same respondents. Results indicate that the scale was more reliable than single-item measures, although internal-consistency coefficients (unreported) were lower than for test-retest. Internal consistency for the scale generally exceeded 0.50 (sometimes 0.90). The stability of the Health Perceptions Battery has been estimated for intervals of one, two and three years between administrations. The median stability coefficients for one, two and three years for adults are 0.66, 0.59 and 0.56 respectively.

THE SHORT FORM-36 HEALTH SURVEY QUESTIONNAIRE (SF-36)
The Short Form-36 Questionnaire was developed at the Rand Corporation in the USA for use in the Health Insurance Study Experiment/Medical Outcomes Study (HIS/MOS) (Stewart and Ware 1992; Ware et al. 1993). It is a concise 36-item health-status questionnaire, and its use across the world has escalated since 1990. The authors of the SF-36 aimed to develop a short, generic measure of subjective health status that was psychometrically sound, and that could be applied in a wide range of settings. It was constructed with the aim of satisfying the minimum psychometric standards necessary for making comparisons between groups. The eight dimensions it includes were selected from the 40 dimensions included in the MOS, and were selected to represent the most frequently measured concepts in health surveys and those most affected by disease and treatment. Most SF-36 items are based on instruments that had been used since the 1970s and 1980s, including Dupuy’s (1984) Psychological General Well-Being Index.
Initially, the SF-20 (Short Form 20-item version) was designed. For this, 17 items were taken from the questionnaires used in the HIS and three new items were added (Ware et al. 1992). This was used in the MOS. The authors decided to extend the scale to make it more comprehensive and with better psychometric properties. This led to the longer 36-item version, the SF-36. The SF-36 is made up of the items which loaded best on factor analyses from 149 items from the longer batteries, based on the results from over 22,000 patients in the Rand HIS/MOS studies. The SF-36 takes 5–10 minutes to complete and is self-administered.
The SF-36 was initially distributed by the Rand Corporation in Santa Monica, where it was developed, and, formerly, by the Health Outcomes Institute in Bloomington (previously known as InterStudy). Rand still distribute, free of charge, the original SF-36, and it is called the Rand 36-item Health Survey. A similar format of the instrument was distributed solely by the Medical Outcomes (Study) Trust (MOT) of the Health Institute at the New England Medical Center, in Boston, where one of its original developers, John Ware, is based. The instrument is known there as the Short Form-36 Health Survey (SF-36 Health Survey) (Medical Outcomes Trust 1993; Ware et al. 1993), and is strictly copyrighted. Users are requested to complete and submit an online licence application to QualityMetric Incorporated (license@qualitymetric.com). A manual is obtainable from the QualityMetric website. A variable fee is now payable for the use of the MOT instrument (US, UK and other versions), except in cases of unfunded academic research. While the MOT copyrighted instrument is increasingly used worldwide, many researchers, particularly those investigating clinical outcomes, continue to use the original version, which is still freely distributed and available from Rand.
The SF-36 Health Survey is now the most frequently used measure of generic health status across the world. It is also popular among social gerontologists investigating the quality of life of older people (Michalos et al. 2001). Population norms for the SF-36 in many countries have been published (Aaronson et al. 1992; Ware et al. 1993, 1997; Bullinger 1995; Sullivan et al. 1995; Gandek and Ware 1998), including those for the UK (Brazier et al. 1992, 1993; Garratt et al. 1993; Jenkinson et al. 1993, 1996, 1999). The only changes made to the original scale for the UK version include Anglicization of some of the language and a slight
alteration of the positioning and coding of one of the social-functioning items in order to improve reliability in the UK and ease of administration.
A second UK version (Jenkinson et al. 1999), compatible with the US version 2.0 developed by Ware et al. (1997), has been produced. The second version improved the wording and layout of the original scale, and added more sensitive response categories to the role functioning subscale, which has increased its reliability, raised the ceiling and lowered the floor ends of the scale, and improved precision (Jenkinson et al. 1996, 1999). Several investigators have reported notable floor effects for the SF-36 (Riazi et al. 2003). A manual for the UK version can be purchased (Jenkinson et al. 1996).

Content
The SF-36 contains 36 items which measure eight dimensions: physical functioning (10 items), social functioning (2), role limitations due to physical problems (4), role limitations due to emotional problems (3), mental health (5), energy/vitality (4), pain (2) and general health perception (5). There is also a single item about perceptions of health changes over the past 12 months (in effect, providing a ninth domain). It claims to measure positive as well as negative health.
Two versions are available with varying time recall referents in the responses to current health-status questions: respondents are asked about their health over the past four weeks (the most commonly used) or (in the case of acute conditions) over the past one week. The latter should be more responsive to recent changes. Examples are:

1 In general, would you say your health is:

Excellent/Very good/Good/Fair/Poor

3 Health and daily activities. The following questions are about activities you might do during a typical day. Does your health limit you in these activities? If so, how much?
Yes, limited a lot/Yes, limited a little/No, not limited at all
Vigorous activities, such as running, lifting heavy objects, participating in strenuous sports
Moderate activities, such as moving a table, pushing a vacuum, bowling or playing golf
Lifting or carrying groceries
Climbing several flights of stairs
Climbing one flight of stairs
Bending, kneeling or stooping
Walking more than a mile
Walking half a mile
Walking 100 yards
Bathing and dressing yourself

4 During the past four weeks, how much of the time have you had any of the following problems with your work or other regular daily activities as a result of your physical health?
Cut down on the amount of time you spent on work or other activities
Accomplished less than you would like
Were limited in the kind of work or other activities
Had difficulty performing the work or other activities (e.g. it took extra effort)

All of the time/most of the time/some of the time/a little of the time/none of the time

(SF-36 Health Survey, Copyright © Medical Outcomes Trust. All rights reserved. Reproduced with permission of the Medical Outcomes Trust)

Scoring
The previous mix of scaled and dichotomous response formats confused respondents and led to item non-response. Thus the second version has changed the dichotomous response formats at the ‘role-physical’ and ‘role-mental’ dimensions to five-level response choices, as well as changing the six-level response format on the mental health dimension to five levels.
The coding format requires recoding before each subscale can be summed. The subscales are not summed together to produce an overall score; instead, the scores for each of the eight dimensions are reported. The scoring algorithms adopted for the SF-36 were published after careful study of a number of alternatives. The scoring method selected was chosen for its simplicity and to optimize the ability to make comparisons of results across studies (Ware et al. 1993). In the original scoring method, the item scores for each of the eight dimensions are summed and transformed, using a scoring algorithm, into a scale from 0 (poor health) to 100 per cent (good health).
The original 0–100 scoring algorithms of the SF-36 (based on summated ratings) have been improved on by norm-based scoring (the standardization of mean scores and standard deviations for the SF-36 scales), which has facilitated the speed
and ease of interpretation of SF-36 scores. Ware et al. (1994) described how linear transformations were performed to transform the scale scores to a mean of 50 and a standard deviation of 10 (in the general US population). With this method, then, each scale is scored to have the same average (50) and the same standard deviation (10). Without the need to refer to norms, scale values below 50 are interpreted as below average, and each point is one-tenth of a standard deviation.
The results have conventionally been reported as mean scores for each subscale, rather than frequency distributions, despite the well-known tendency of means to distort results by reflecting small numbers of outlying values. While generally accepted, the validity of this method, and the common usage of parametric statistics to analyse the SF-36, has, however, been seriously questioned by Julious et al. (1995), given the non-normal distributions of the data.
Two summary scales can also be obtained, the Physical Component Summary Score (PCS-36) and the Mental Component Summary Score (MCS-36), with the advantage of enabling a reduction in the number of statistical comparisons conducted. These two summary scales capture about 85 per cent of the reliable variance in the eight-scale SF-36 and yield average scores which closely mirror those in the 36-item scale (Ware et al. 1994, 1995, 1996b; Jenkinson et al. 1997).
Another approach to scoring the SF-36 was developed by Brazier et al. (2002). This is a preference-based health utility index, using a six-domain classification of health states, and is called the SF-6D. The SF-6D preferences can be applied to any SF-36 dataset for economic evaluations (e.g. estimation of QALYs). However, comparisons between the SF-6D and the Health Utilities Index mark 3 (HUI3), which is a widely used, valid and reliable, multi-attribute health utility scale, showed that these two instruments differed markedly in their distributions and point estimates of derived utilities (O’Brien et al. 2003). The implication is that utilities and QALYs derived from each instrument are not comparable.

Validity
The results of the studies which tested the longer Rand batteries were published by Stewart and Ware (1992), together with the history of the development of the SF-36 from these instruments. There is a vast number of publications of the psychometric properties of the SF-36. Garratt et al. (2002) judged it to be the most widely evaluated generic health-status instrument. For consistency, and in order to avoid confusion, this review will focus on the results for the SF-36 Health Survey. Equally good psychometric properties have been reported for the original Rand 36-item Health Survey (e.g. King and Roberts 2002; Lowrie et al. 2003; Oga et al. 2003).
The manuals and bibliographies of the SF-36 Health Survey reference the main studies which report the instrument’s psychometric properties (Ware et al. 1993, 1997; Shiely et al. 1996). The International Quality of Life Assessment Project published the psychometric properties of the translated and tested versions of the SF-36 in international use (Gandek and Ware 1998). The results of British studies of the reliability and validity of the SF-36 have been summarized by Jenkinson et al. (1996, 1999). Bowling et al. (1999) compared the various British norms for the SF-36 and critically discussed their variations by mode of administration, survey context and question order. Their conclusions were consistent with those of McHorney et al. (1994) and Lyons et al. (1999), who reported under-reporting of health problems using the SF-36 in personal interviews in comparison with postal approaches, especially in relation to emotional and mental health.
Brazier et al. (1992), on the basis of the results of a postal survey in the UK, reported that the SF-36 was found to be more sensitive to gradations in poor health than the EuroQol (EuroQol Group 1990) and the Nottingham Health Profile (Hunt et al. 1986). The authors cautioned that there was a higher rate of item non-response among older people, a finding confirmed by Sullivan et al. (1995) on the basis of a Swedish postal survey sample (in contrast to interview surveys). However, reports are contradictory. A later postal survey by Walters et al. (2001) of almost 1,000 people aged 65 and over registered with 12 general practices found that the response rate was 82 per cent and dimension completion rates ranged between 86.4 per cent and 97.7 per cent. Research in the USA has found good responses among elderly people (Health Outcomes Institute 1990). Lyons et al. (1994, 1997) reported
that item response was good among elderly people in the UK when the instrument was interviewer- rather than self-administered, and that it distinguished between those with and without markers of poor health, supporting the instrument’s construct validity.
On a more critical level, Hill and Harries (1994) and Hill et al. (1995) reported serious flaws in the instrument when it was used to measure the outcomes of health care for people aged 65 and over in community settings in the UK. The respondents (who tended to have high levels of co-morbidity and poor physical functioning) often reported during interviews that the SF-36 did not reflect their values. In one district, the median physical functioning score for the group was zero (worst functioning), meaning that the group as a whole was pushed to the margins of the scale. While dramatic improvements in physical functioning in this group, and therefore improvements in scores, were unlikely, the measure was reported to miss some changes that were important to people. These findings support the argument of Hunt and McKenna (1993) that the development and testing of the SF-36 has relied too heavily on psychometric techniques at the expense of serious consultation with lay people about their views of the instrument.
Mangione et al. (1993), in a study of 745 patients undergoing major elective (non-cardiac) surgery in Boston, reported that the SF-36 health perception scale had its greatest correlation with the energy and fatigue scale (r = 0.45), correlated moderately with mental health (r = 0.35), social function (r = 0.32) and physical function (r = 0.33), but correlated less well with pain (r = 0.23). These results support the distinctiveness of the components of the health perceptions subscale. On the other hand, these moderate to weak correlations between variables which would be expected to be more highly correlated (health perceptions and the various domains of health status) also suggest that the health perceptions battery might have weak convergent validity.
Mangione et al. (1993) also reported that the SF-36 was able to discriminate between surgical groups (major elective, non-cardiac surgery), and between younger patients and patients aged 70 years and over in relation to role function, energy, fatigue and physical function (the older patients had poorer scores on these domains). Ware et al. (1993), in the manual of the SF-36, reported studies showing that, in tests of validity, the SF-36 was able to discriminate between groups with physical morbidities (the physical functioning subscale performed best, and the mental-health subscale best discriminated between patients with mental-health morbidities). Investigators in Grampian reported that different common medical conditions (low back pain, menorrhagia, suspected peptic ulcer and varicose veins) achieved distinct score profiles, indicating that the SF-36 can discriminate between conditions (Garratt et al. 1993). Further papers by these authors reported the scale means for these patient groups and good responsiveness to change in clinical condition (Garratt et al. 1994; Ruta et al. 1994a). The mental-health subscale has particularly impressive validity. For example, it was reported to correlate at between 0.92 and 0.95 with the full Mental Health Inventory in different samples from the HIS (Davies et al. 1988; Stewart and Ware 1992). Correlations between the SF-36 and the General Psychological Well-Being measure (Dupuy 1984) ranged from r = 0.19 to r = 0.60, with a median of r = 0.36. Ware et al. (1993) presented data from several studies showing that the correlations for the subscales range from weak to strong; strong correlations (0.52 to 0.85) were reported between the physical functioning subscale and the equivalent subscales of the SIP, the AIMS and the NHP. Strong correlations (r = 0.51 to r = 0.82) were also reported between the mental-health subscale and other psychological subscales. Longitudinal population data in the UK showed that the SF-36 was sensitive to respondents’ short-term changes in health status (Hemingway et al. 1997).
Not all results have been good (see review by Anderson et al. 1993). For example, the bodily pain scale has been reported to have poor convergent validity when tested against severity of illness and independent pain scores in the case of knee conditions (McHorney et al. 1993). Other studies have reported ‘floor’ effects in the role functioning scales in severely ill patients, where 25–50 per cent of patients obtained the lowest possible score, with the implication that deterioration in their condition will not be detected by the scale (Kurtin et al. 1992). The comparable percentage in an HIV population is 63 per cent (Watchel et al. 1992). Anderson et al. (1993) suggested that the item codes can be subject to ‘ceiling’ effects as they appear too crude to detect
improvements. The modifications to the item scores in the second version of the SF-36 should alleviate this problem.
Other investigators reported that it has little discriminatory power among women receiving different treatments for Stage II breast cancer (Levine et al. 1988; Guyatt et al. 1989). The physical functioning scale also focuses on mobility at the expense of other pertinent areas of functioning (e.g. domestic), so it needs to be supplemented with other scales in studies of people with chronic conditions which affect their functioning (e.g. rheumatoid arthritis). Its sensitivity may vary with disease type. The SF-36 and both summary scales have been reported to have good validity in other clinical areas: they were able to detect improvement in patients’ conditions after heart surgery (Sedrakyan et al. 2003), and the SF-36 was associated with severity of hearing loss in the expected direction (Dalton et al. 2003) and with the Crohn’s Disease Activity Index (Kiran et al. 2003). The mental health scale and the MCS summary scale have been reported to be of value in screening for psychiatric disorders, particularly depression (Ware et al. 1994).

Reliability
In the UK, Brazier et al. (1992) reported internal consistency coefficients for the eight scales ranging from 0.60 to 0.81, with a median of 0.76. High inter-item correlations were reported for the subscales (e.g. mental health). Jenkinson et al. (1993) reported high internal consistency across dimensions, with Cronbach’s alphas of between 0.76 and 0.90. Garratt et al. (1993) reported that internal consistency (Cronbach’s alpha) exceeded 0.80 and that inter-item correlations ranged from 0.55 to 0.78. All of these values met accepted statistical criteria. Walters et al. (2001), in their survey of people aged 65 and over, reported that Cronbach’s alpha exceeded 0.80 for all dimensions except social functioning. Ware et al. (1993) reviewed 14 studies in the USA which analysed the reliability of the SF-36. The reliability coefficients for internal consistency ranged from 0.62 to 0.94 for the subscales, those for test-retest reliability ranged from 0.43 to 0.90, and the alternate-form reliability coefficient was 0.92. In relation to internal consistency, all but 11 coefficients reported in studies in the USA and the UK exceeded the 0.70 standard suggested by Nunnally (1978). Reliability estimates for the two summary scores generally exceed 0.90 (Ware et al. 1994). Meyer-Rosenberg et al. (2001) reported that the SF-36 had higher internal consistency reliability coefficients than the Nottingham Health Profile (NHP), which was previously used widely in Britain and Europe, and that correlations between the two instruments were moderate, giving some support to their convergent validity. However, the NHP had additional items on pain and sleep which were more relevant to their population of pain patients. Seymour et al. (2001) tested the reliability of the SF-36 in a sample of over 300 elderly rehabilitation patients and found the results to be poorer than those reported in earlier studies. The Cronbach’s alphas for the eight dimensions of the SF-36 ranged from 0.54 (social functioning) to 0.933 (bodily pain) for the cognitively normal patients; they were slightly lower for the cognitively impaired patients (range 0.413–0.861). The values were significantly higher among the normal group for the bodily pain, mental health and role-emotional dimensions. Test-retest reliability coefficients were also higher for the normal group (at least 0.7 was attained for five out of eight dimensions) than for the impaired group (0.7 was attained for four out of eight dimensions).
Ware et al. (1993) also reported the results of a factor analysis of the SF-36, which provided strong evidence for the conceptualization of health underlying the SF-36 and indicated that some scales principally measure physical health, some measure mental health, and others measure both. Garratt et al. (1993) also reported the results of a factor analysis which confirmed the distinct scale dimensions.
The SF-36 is probably the most widely used health-status scale in the world, largely because of its brevity and its broad coverage of health status. This increasing standardization of measurement facilitates comparisons between populations and evaluations of patient outcomes. Because of its coverage of social, physical and emotional health, many investigators use it as a proxy measure of health-related quality of life. Many disease-specific scales also recommend its use as a generic core (see Measuring Disease, Bowling 2001). Translations of the scale are available in over 50 languages, and each
translated scale has conformed to strict protocols for the translation process (information about translations, as well as an online scoring service and a missing data estimator, is available at http://www.SF-36.com or www.qualitymetric.com).
The SF-36 can be self-, interviewer-, telephone- or computer-administered, and takes about five minutes to complete. Ryan et al. (2002) reported that 71 per cent of their respondents (healthy individuals and chronic pain patients) who completed both electronic and paper versions preferred the electronic version.

THE SHORT FORM-12 HEALTH SURVEY QUESTIONNAIRE (SF-12) and HEALTH STATUS QUESTIONNAIRE-12 (HSQ-12)

A 12-item version of the SF-36 (the SF-12) (Ware et al. 1995, 1996a, 1996b; and see http://www.sf-36.org and http://www.qualitymetric.com/sf-36) and a 12-item version of the original Rand SF-36 questionnaire (HSQ-12; redeveloped as Rand-12, see http://www.rand.org) (Radosevich and Pruitt 1995) have been developed. These are one-page versions of the SF-36 which take about a minute to complete. The 12 items in both versions cover self-assessed health, physical functioning, physical role limitation, mental role limitation, social functioning, mental health and pain. Standard four-week and one-week (acute) recall versions are available. These 12 items yield the eight-scale profile of the SF-36, but with fewer levels and less precise scores, as would be expected with fewer items (e.g. some subscales now have just one or two items).
Ware et al. (1996a, 1996b) reported that the 12 items selected for inclusion in the SF-12 on the grounds of their psychometric properties (with an improved scoring algorithm) were able to reproduce at least 90 per cent of the variance in the physical and mental subscales of the SF-36, and reproduced the profile of the eight dimensions of the SF-36. They argued that, consequently, the population norms for the SF-36 summary measures (PCS-36 and MCS-36) can also be used as norms for the SF-12 summary measures. Because the SF-12 is a subset of the SF-36, the many translations available for the SF-36 can be used for the SF-12. Results for its validity and reliability are good. Lim and Fisher (1999) used it with over 2,000 heart and stroke patients in Australia. The SF-12 was able to discriminate between age groups, emergency and planned admission patients, males and females, and patients with varying lengths of stay. However, they did report that it had a high non-completion rate, and cautioned against its use with these patients. A study by Salyers et al. (2000) in the USA of a sample of people with severe mental illness found that the SF-12 distinguished this group from the normal population and was stable over a one-week interval. The Rand version of the 12-item scale (Rand-12) has been redeveloped, and summary scores for the physical and mental health components are available. These performed better than the SF-12 summary scores (licensed by QualityMetric) in distinguishing between patients with different severities of diabetes (Johnson and Maddigan 2004). The SF-12 manual lists over 40 existing translations of the SF-12. A scoring algorithm, involving weighted item responses, is available with the manual. As the SF-12 is shorter, it is less precise than the SF-36, and is suitable for use with larger samples in which broader health outcomes are being monitored (Ware et al. 1996b).
The items included in the HSQ-12 were derived by regression analysis of data from over 4,000 respondents to a 39-item version of the questionnaire. Stepwise linear regression was used to select preliminary sets of questions that accounted for approximately 90 per cent of the variability in overall scale scores (Radosevich and Husnik 1995). The developers decided to emphasize items from the physical functioning and mental-health subscales, as these have been shown to be the most sensitive in distinguishing medical and psychiatric conditions (McHorney et al. 1993). The instrument was then tested against the developers’ longer 39-item version of the SF-36 battery among a group of cardiology patients and in a healthy working population (Radosevich and Pruitt 1995). Over 800 respondents took part in the reliability testing. Replicate reliability coefficients for the short (12-item) and long (39-item) forms were above 0.52 for all scales.
Both the SF-12 and the HSQ-12 have been tested in Britain (Bowling and Windsor 1997; Jenkinson and Layte 1997; Jenkinson et al. 1997; Pettit et al. 2001). In a community psychiatric survey of almost 1,000 people in London, Pettit et al.
(2001) reported that the HSQ-12, but not the SF-12, could distinguish between people with and without dementia. In the UK, as well as the USA, the SF-12 reflects the revised version of the SF-36 (version 2). Jenkinson and Layte (1997) reported that the SF-12 summary scores produced very similar results to the longer SF-36 summary scores.

THE SHORT FORM-8 HEALTH SURVEY QUESTIONNAIRE (SF-8)

The SF-8, as its name indicates, contains just eight items. It relies on a single item to measure each of the eight dimensions of health status contained in the SF-36. It was constructed on the basis of empirical studies linking each questionnaire item to a comprehensive pool of commonly used questions measuring the same health concept, including those of the SF-36. Each item was calibrated on the same metric as the corresponding SF-36 scale. As with the 36- and 12-item versions, four-week and one-week recall versions are available, as well as a 24-hour recall version. Thus the SF-8 is an eight-item version of the SF-36 which yields a comparable eight-dimension health profile and comparable estimates of the physical and mental summary scores. It has been translated and validated for use in over 30 countries. A manual is available (How to score and interpret single-item health status measures: a manual for users of the SF-8 Health Survey; see www.sf-36.org/tools/sf8.shtml). It is regarded as a more practical and efficient instrument for use in population studies (Turner-Bowker et al. 2003).

THE DARTMOUTH COOP FUNCTION CHARTS

These charts were developed by a collaborative body of medical doctors in community settings who aimed to produce a simple, concise, valid and reliable instrument that could be used easily during doctor–patient consultations.
The original US version of the charts contains nine sections (or charts): three on functioning (social, physical and role functioning), two on symptoms (pain and emotional condition), three on perceptions (change in health, overall health and quality of life) and one on social support (Nelson et al. 1983, 1987, 1990a, 1990b; Nelson and Berwick 1989). The World Organization of Family Doctors (WONCA) selected the charts as an international set of approved self-administration instruments for measuring health and functional status in primary care consultations, and called them the Dartmouth COOP Functional Health Assessment Charts/WONCA (abbreviated as the COOP/WONCA Charts) (Scholten and van Weel 1992; van Weel et al. 1995). A feasibility study was launched in seven countries by WONCA (Landgraf and Nelson 1992). The instrument was revised to comprise six, not nine, charts, and each chart was renamed: physical fitness, feelings, daily activities, social activities, change in health and overall health. The pain, quality of life and social support charts were not included in the WONCA version. Other revisions were made to the question wording and response categories (van Weel and Scholten 1992). The history of the development of the charts has been summarized by Anderson et al. (1993). A manual is available from the developer’s website (www.globalfamilydoctor.com/publications/coop-woncacharts/COOP-WONCHACH).

Content
The full, original US version contains three charts on functioning (social, physical and role functioning), two on symptoms (pain and emotional condition), three on perceptions (change in health, overall health and quality of life) and one on social support. As mentioned earlier, the WONCA version contains six, not nine, charts, and each chart was renamed: physical fitness, feelings, daily activities, social activities, change in health and overall health.
Each chart contains a title, a question referring to the status of the patient, and an ordinal five-point response scale, ranging from ‘No limitation at all (1)’ to ‘Severely limited (5)’, each point illustrated with a drawing. Each item is rated on the five-point scale.
The original US charts took the last four weeks as the reference period for each question, but the WONCA version uses two weeks (e.g. ‘During the past two weeks . . . What was the hardest physical activity you could do for at least two minutes?’). Each chart has its own five-point Likert response choices (e.g. for the previous example item, the responses are: ‘very heavy (for example) run, at a fast pace (1)’ to ‘very light
(for example) walk, at a slow pace or not able to walk (5)’; ‘no difficulty at all’ to ‘could not do’; ‘much better’ to ‘much worse’). The problem with this particular chart is that the example activities within each response item (run, jog, walk) do not all fit logically or sensibly with the response item descriptors (very heavy to very light): how does one run in a way that is ‘very heavy’? How can the inability to walk logically equate with the response code ‘very light’? The response codes conjure up images of strength rather than mobility, which leads to conceptual confusion. The other five charts do not suffer from this problem.
Illustrations (stick figures, faces, and symbols such as plus–minus and arrow signs) are included on each chart to illustrate the response choices (Nelson et al. 1987). These were developed for use with the original charts to enhance the attractiveness and appeal of the scale to respondents (Nelson et al. 1987), and were also regarded as helpful in the WONCA version when used with populations with high levels of illiteracy (van Weel et al. 1995). The charts are self-administered. The physical fitness question (WONCA version) is shown below.

Physical fitness chart:
During the past two weeks . . .
What was the hardest physical activity you could do for at least two minutes?
Very heavy (for example) run, at a fast pace
Heavy (for example) jog, at a slow pace
Moderate (for example) walk, at a fast pace
Light (for example) walk, at a medium pace
Very light (for example) walk, at a slow pace or not able to walk
(Dartmouth COOP Functional Health Assessment Charts/WONCA. Copyright © Trustees of Dartmouth/COOP Project 1995.)

Scoring
The scoring on each chart has 1 equalling the best, and 5 equalling the worst, level of functioning. Each chart represents a distinct domain, and the charts are not summed to form an overall score (Nelson et al. 1990a, 1990b). The administration time for the nine charts was five minutes. The charts have been translated into 20 languages.
The developers initially advised investigators to present results for the charts using descriptive statistics (frequency distributions), and not to compute means, standard deviations or other statistics that assume interval-level data, because the charts are based on ordinal scales. However, investigators tend to treat the charts as if they were interval-level scales. The developers now accept this development; it is a common trend in the analysis of all health-status scales.

Validity
The original Dartmouth charts were initially tested on over 1,400 patients with different groups of conditions from four medical centres in the USA (Nelson et al. 1987). Studies undertaken since, based on the Dartmouth version, have reported the charts to be suitable for use as clinical outcome measures and responsive to changes in back pain and in levels of physical fitness and ability in people aged 65 and over (Wollstadt et al. 1997; Bronfort and Bouter 1999; Aure et al. 2003).
In The Netherlands, the WONCA version was tested on over 5,000 patients, and population norms have been published (Scholten and van Weel 1992; van Weel et al. 1995). The testing procedures for the WONCA version have been briefly summarized by van Weel and Scholten (1992) and van Weel et al. (1995). In support of the convergent validity of the WONCA version, the charts on functioning and feelings have been reported to correlate well with other measures of physical and emotional functioning respectively, such as the Barthel Index and the Zung Depression Scale (Schuling and Meyboom-de Jong 1992). The psychometric testing of the WONCA version of the scale among two populations in The Netherlands (one with an average age of 43, and the other aged 60 and over) reported that, in the younger age group, the physical fitness chart and the daily activities chart correlated moderately to moderately
well with the physical mobility subscale of the Nottingham Health Profile (0.53 and 0.66 respectively) and with the Rand SF-36 physical functioning subscale (0.52 and 0.55 respectively). The feelings chart correlated moderately well with Goldberg’s General Health Questionnaire (0.63) and with the mental-health subscale of the Rand SF-36 (0.71). The overall health chart correlated moderately well with the general health subscale of the Rand SF-36 (0.62). The results for the group aged 60 and over were weaker; only the comparisons with the Rand SF-36 were reported for this group. The physical fitness chart and the daily activities chart correlated weakly to moderately with the physical functioning subscale of the SF-36 (0.56 and 0.31 respectively); the feelings chart correlated more strongly with the mental-health subscale of the SF-36 (0.76); and the overall health chart correlated moderately well with the general health subscale of the SF-36 (0.67). Responsiveness to change was investigated using data from four longitudinal Dutch studies, which showed that the charts differ in their responsiveness to change according to diagnosis (van Weel et al. 1995). The evidence on responsiveness to change is limited, although studies have indicated that the charts are responsive to improvements in elderly patients with a range of conditions (van Weel et al. 1995).
Single charts are less precise in detecting differences in functional status than multi-item health-status scales, although the full set of charts performs well in comparison with the Rand Medical Outcomes Study short-form measures (Meyboom-de Jong and Smith 1990) and shows higher levels of sensitivity than Hunt et al.’s (1986) Nottingham Health Profile (Coates and Wilkin 1992). Whether the illustrations increase sensitivity, rather than biasing respondents in some way, particularly in different cultural settings, is uncertain (McHorney et al. 1992).

Reliability
Test-retest correlations at four weeks in a German study were reported to be modest, at 0.42–0.62 (van de Lisdonk and van Weel 1992). These results echo the low test-retest correlations reported for the original charts (Meyboom-de Jong and Smith 1990). In a Dutch study with an interval of three weeks, they were stronger, at 0.67–0.82, with a kappa of 0.49–0.59; over one year they were 0.36–0.72, with a kappa of 0.31–0.38 (Meyboom-de Jong and Smith 1990).
The inter-chart correlations for the WONCA version range between 0.15 and 0.66 (van Weel et al. 1995). Multi-trait multi-method analysis has shown that the physical and role functioning charts are highly correlated, suggesting a common domain, while the emotional status and social support charts are more independent, suggesting that they measure different domains of functioning (Landgraf et al. 1990).
Despite their popularity in studies based in general practice (Harvey et al. 1998), the charts have not yet been satisfactorily tested for their psychometric properties, and details of the research design on which the testing has been based are often lacking (see review by Anderson et al. 1993). The deliberate omission of pain, social support and quality of life from the short WONCA version, in the interests of brevity, is regrettable, particularly in view of the current emphasis on including a specific self-assessment of quality of life in broader health-status scales. The charts also suffer from the same disadvantage as single-item measures: they are limited in content, and therefore their sensitivity is potentially limited. Anderson et al. (1993) concluded that the content validity of the physical fitness chart, for example, is too restricted in scope for it to be sensitive to disability in older people.
The psychometric testing of the adapted charts is an ongoing process under the umbrella of WONCA, which provides a handbook of the scale (van Weel et al. 1995). The charts are available in over 20 languages. The COOP/WONCA charts may be used for research and clinical care. Permission to use the COOP/WONCA charts specifically excludes the right to distribute, reproduce or share them in any form for commercial purposes.

THE CORNELL MEDICAL INDEX (CMI)

The Cornell Medical Index was initially designed for use by physicians in taking medical histories. The aim was to save their time and to ensure, where appropriate, that a large body of medical and psychiatric data could be collected. The CMI was intended to serve as a standardized medical history
(Brodman et al. 1949, 1986). The informal language used in the instrument was intended to be readily translatable into medical terminology. For example: ‘Does your heart often race like mad?’ (tachycardia); ‘Do you have to get up every night and urinate (pass water)?’ (nocturia).
In the process of its development, many variations of the CMI items were tested for validity on more than 1,000 people in several geographic areas (Lowe 1975). Items whose replies were substantiated on interview were included in the final version. The final form was tested on 179 medical out-patients and 191 admitted medical patients whose CMI responses were compared with the medical histories taken by participating physicians. The CMI interpretations identified 94 per cent of the diagnostic categories in which disease was confirmed by clinical investigations (Brodman et al. 1951). The CMI was widely used in the USA until the 1980s, and was considered to be valid and reliable. Some changes to wording were made in an initial attempt to update the instrument, although no substantive changes were made, and the revised version was reprinted and copyrighted in 1986 and sold until 1990, when its use was phased out. It was felt that the CMI required revision and revalidation.
Although the index was intended for use as an aid to medical history taking in clinic and hospital settings, the authors stated that it could also be useful in population health surveys. A one-page diagnostic sheet enables physicians to summarize their diagnostic inferences from the CMI data (eyes or ears, respiratory, cardiovascular, teeth, gastrointestinal, liver and gall bladder, musculoskeletal, skin, nervous system, genital system, urinary tract, endocrine, metabolism, other); a scale for severity of emotional disturbance is also included.
The CMI is self-administered. Individuals usually take 10 to 30 minutes to complete it. The authors report that the time taken depends on people’s level of education and the speed at which they reach decisions. They report the CMI to be acceptable to patients. No training is needed for administration or analysis.

Content
The CMI is a four-page questionnaire which contains over 200 yes–no questions divided into 18 sections (labelled A–R) relating to physical problems (eight sections), personal habits and frequency of illness (four sections), and moods and feelings (six sections). The sections of the CMI are:

A Eyes and ears
B Respiratory system
C Cardiovascular system
D Digestive tract
E Musculoskeletal system
F Skin
G Nervous system
H Genito-urinary system
I Fatigability
J Frequency of illness
K Miscellaneous disease
L Habits
Mood and feelings
M Inadequacy
N Depression
O Anxiety
P Sensitivity
Q Anger
R Tension

Examples of questions are:

Does your face often get badly flushed? Yes/No
Do you suffer badly from frequent severe headaches? Yes/No
Do you usually feel bloated after eating? Yes/No
Has a doctor ever said you have kidney or bladder disease? Yes/No
Did anyone in your family ever have a nervous breakdown? Yes/No
Do you often feel unhappy and depressed? Yes/No

There are two forms of the CMI, one for men and one for women (differing only in the genito-urinary section). The number of problems admitted is scored, and the total indicates the degree of deviancy.

Scoring
Responses indicating the existence of a problem (the ‘yes’ replies) are totalled. More than 25 positive responses indicate the presence of serious disorder. A medically significant emotional disturbance is considered present if the total score is 30 or more. If the ‘yes’ replies are chiefly located
in one or two sections, the medical problem is localized. If they are scattered throughout the index, the problem is likely to be diffuse, usually involving a medical disturbance. More than two or three ‘yes’ answers on the last page (moods, feelings, attitudes and behaviour questions) suggest a psychological disturbance. However, interpretation of symptoms depends on medical knowledge, and thus the index is not a useful diagnostic tool outside medical settings.

Although this was a popular early measure, there are few reports of the instrument’s reliability and validity. One published study supporting the concurrent validity of the CMI, based on 5,119 patients, reported that older people reported more physical problems than younger people, and that women reported more problems than men (Brodman et al. 1953). Abramson et al. (1965) summarized the CMI scores for 16 samples of people in the USA and the UK. The reported proportions scoring 30+ on the CMI ranged between 4 and 82 per cent for female sample members, and were lower, between 0 and 79 per cent, for males. The populations studied ranged widely from psychiatric hospital out-patients to people labelled as ‘healthy’. There are many examples of early applications of the CMI in clinic and institutional settings. Comprehensive references on applications of the index have been compiled by its owners at Cornell University Medical College (Lowe 1975). Brodman et al. (1954a), in a study of 7,527 men undergoing medical examination (including the CMI) for army training, analysed the follow-up records of 900 of them four months after they had commenced training. High CMI scores (50+) were significantly associated with large amounts of reported sickness, hospitalized days, convictions and discharges from the service. The CMI scores for psychiatric and psychosomatic disturbances were reported to predict ability to perform military duties (Brodman et al. 1954a, 1954b): within a year of induction, a third of those with scores in the highest 5 per cent were no longer performing their duties effectively. In a later study of military veterans in the USA, men with experience of combat and non-combat trauma were reported to have 16 per cent more symptoms on the CMI than non-exposed men (Schnurr et al. 1998).

In a large-scale study based in New York, Brodman et al. (1959) compared the results of medical examinations of 5,929 hospital out-patients with computer analyses of the CMI for the same patients. These authors reported that CMI data provided correct diagnoses of 60 common disorders in only 44 per cent of patients known to have these disorders. Abramson et al. (1965), in a study of 120 randomly selected adults from a Jerusalem housing project, tested its sensitivity (the proportion of unhealthy persons who gave the defined CMI response, i.e. ‘yes’ to items) and its specificity (the proportion of healthy persons who did not give this response). Health status was independently assessed by local physicians. The authors reported that the responses to individual CMI questions referring to named disorders were not very valid indicators of the presence of these disorders: of the people whom physicians reported as having a specific condition, only half, or in some cases fewer than half, reported its presence themselves. However, the conditions were rarely reported by people whom physicians assessed as not having them. CMI scores (based on the number of ‘yes’ responses) were significantly correlated with the physicians’ ratings of overall ill health and emotional ill health, although the correlations were weak to moderate (0.26–0.48). Moderate correlations were also obtained between physicians’ ratings of emotional and overall health (0.52 for women and 0.57 for men); 78 per cent of the people rated as ‘ill’ by physicians were also rated by them as emotionally disturbed. The authors also identified the ten key questions which were the best predictors of physician ratings of emotional ill health; there was a moderate correlation (0.63) between the number of these items answered positively and total CMI scores:

Are you easily upset and irritated?
Do you suffer from severe nervous exhaustion?
Do you usually have great difficulty in falling asleep or staying asleep?
Are you definitely underweight?
Do you often become suddenly scared for no good reason?
Are you considered a nervous person?
Are you constantly keyed up and jittery?
Do you usually feel unhappy and depressed?
Does life look entirely hopeless?
Does every little thing get on your nerves and wear you out?

Factor analysis revealed six factors in the psychiatric symptom dimensions of the CMI: irritability, inability to cope, depression, timidity, normal anxiety and clinical anxiety (Costa and McCrae 1977).

The CMI is a medical history tool rather than a patient-based measure of broader health status, although it has also been used as a survey instrument, with good results for reliability (internal consistency) (Osaka et al. 1998), and in studies of health outcomes. In a study of the outcome of vasectomy, symptoms measured with the CMI were associated with a scale measuring neuroticism as well as with post-surgical complications (Wig et al. 1970). However, Abramson et al. (1965) have argued that the index was of little value in comparisons of widely divergent cultures within and between societies. As stated earlier, concerns about the relevance and psychometric properties of the CMI have led to its withdrawal from use by its owners at Cornell University Medical College (now the Weill Medical College of Cornell University), on the grounds that it is outdated and no longer suitable for clinical research. Cornell still retains the copyright to enable it to reinitiate the CMI in the future if revision and revalidation are carried out (Brodman et al. 1986; Weill Cornell Medical Library 2003).

THE MCGILL PAIN QUESTIONNAIRE (MPQ)

This pain scale is not a broader measure of health status, but it covers an important, and often neglected, domain of measurement; it is therefore included here. Many health-status and health-related quality of life measures fail to incorporate a full assessment of pain. Apart from various single-item visual-analogue rating scales to assess pain, the most frequently used and most tested measure is the McGill Pain Questionnaire (MPQ). The McGill Pain Questionnaire consists principally of lists of terms describing the quality and intensity of pain. Melzack and Torgerson (1971) selected 102 words describing pain from the literature and sorted them into the categories of sensory, affective and evaluative words describing intensity; the results were then checked by 20 reviewers. Further testing among 140 students, 20 physicians and 20 patients led to the current 78-item format of the scale and the scale values. The 78 words are grouped in 20 sub-classes of three to five descriptive words, and the 20 sub-classes are grouped in four sections: sensory, affective, evaluative and miscellaneous. Several versions of the scale exist, and a short version consisting of 15 items has also been developed (Melzack 1987). Completion of the full-length scale takes from 5 to 15 minutes, and the short version takes 2 to 5 minutes, depending on the patient’s familiarity with the words.

Content

Examples of the 78 words respondents are asked to choose from to describe their pain are:

Flickering
Quivering
Pulsing
Throbbing
Beating
Pounding

Tight
Numbing
Drawing
Squeezing
Tearing

In addition to the 78 words describing pain, nine words assess the course of the pain over time, and the location of the pain is assessed with a drawing of a body with the words ‘external’ and ‘internal’ added.

The short version (SFMPQ) includes only the items: throbbing, shooting, stabbing, sharp, cramping, gnawing, hot-burning, aching, heavy, tender, splitting, tiring-exhausting, sickening, fearful, punishing-cruel.

Scoring

Each description carries a weight corresponding to the severity of the pain, according to evaluations of appropriate ordinal rankings by panels of doctors, patients and students. People then receive pain scores according to the number of descriptive items they select to describe their pain and their assigned
weights (0 = no pain; 1 = mild pain; 2 = discomforting; 3 = distressing; 4 = horrible; 5 = excruciating). The McGill Pain Questionnaire generates four subscales differentiated by factor analysis. These include sensory and evaluative subscales to designate perception of pain, and an affective subscale to denote emotional response to pain; the fourth is a miscellaneous subscale. These four subscales result in four scores, which add up to a total score (the pain rating index). The scale was developed in recognition of these distinct components of pain.

There are four possible and distinct scoring methods, ranging from the number of words chosen to the sum of the scale values (based on the scale weights) for all the words selected in a given category or across all categories (Melzack 1975). Full details of the scale were published by Melzack and Katz (1992).

Validity

Dubuisson and Melzack (1976) reported that the McGill Pain Questionnaire was able to classify correctly 77 per cent of 95 patients with eight pain syndromes into diagnostic groups on the basis of their verbal description of pain. Other research on patients with facial pain has shown that the scale can correctly predict diagnosis for 90 per cent of patients (Melzack et al. 1986). Correlations between McGill Pain Questionnaire scores and visual analogue ratings for 40 patients ranged from 0.50 to 0.65 (Melzack 1975). The specificity of the measure was demonstrated by Greenwald (1987) in a study of 536 cancer patients.

The short version correlates consistently highly with the long version of the McGill Pain Questionnaire (Melzack 1987). It has also been shown to be sensitive to treatment for pain among patients with sciatic pain (Eisenberg et al. 2003), and was selected as one of the most psychometrically sound measures of pain in a review of instruments by Nemeth et al. (2003).

There is support from results of principal components analysis for the distinction between affective and sensory dimensions of pain, but less for a distinctive evaluative component (Reading 1979; Prieto and Geisinger 1983). Lowe et al. (1991) have more recently supported the basic three-factor structure of the scale, using more sophisticated analytic procedures and a large sample.

Reliability

Evidence of reliability is sparser (Graham et al. 1980). Melzack (1975) reported a test-retest study based on just ten patients who completed the scale three times, at intervals of three to seven days. The average consistency of response was reported to be 70.3 per cent. The test-retest reliability of the 20 categories of pain descriptors, the three specific subscales and the total score has been reported to range from weak to moderately good (Reading et al. 1982; Love et al. 1989).

Although it is difficult to demonstrate that the McGill Pain Questionnaire reflects the conceptual definition of pain as three distinct dimensions, it is still the leading measure of pain. Cross-cultural applications of the full scale have been reviewed by Naughton and Wiklund (1993), and a list of the languages into which it has been translated is given by Melzack and Katz (1992). It is more often used for the measurement of chronic, rather than acute, pain, as it is generally believed that most single-item visual analogue scales work adequately for acute pain (McQuay 1990).

THE EUROQOL (EQ-5D)

The EuroQol is a generic, multi-dimensional health profile, designed to generate a single index value for each health state (EuroQol Group 1990; Kind 1996). The aim of the EuroQol is to provide a standardized, non-disease-specific survey instrument and to generate a cardinal index of health for describing health-related quality of life and for use in economic evaluation (EuroQol Group 1990). It thus provides a simple descriptive health profile and a single index value for health status. It is assumed that the five health dimensions included in the measure equate with health-related quality of life. It was intended to complement other health-related quality of life measures.

An earlier version of the scale contained six dimensions as well as some design faults, which were largely corrected in the revised version, the EQ-5D. Population norms are available (Kind et al. 1999). The instrument is cognitively simple, and it was designed for self-completion.
Content

Section one consists of five single-item dimensions covering mobility, self-care, usual activities, pain/discomfort and anxiety/depression, each with a three-point response scale to indicate the level of problems. Respondents are asked to indicate their health state by ticking the box next to the response statement that applies to them. This produces a single code (number) for each dimension. The self-care dimension still retains some ambiguity, as the scaled response choices do not consistently refer to the same activities. The response choices are ‘I have no problems with self-care’, ‘I have some problems with washing and dressing myself’ and ‘I am unable to wash and dress myself’. The difficulty in interpretation is that ‘self-care’ usually refers to a wider range of activities than simply washing and dressing; respondents are expected to infer that the item refers to washing and dressing by reading down the list of other response choices. The developers have acknowledged that the range of response items in Section one is narrow, and that some respondents may find them too limited. However, one appeal of the scale is its brevity and simplicity.

Section two contains a 100-point VAS (‘thermometer’) scale on which respondents are asked to mark how good or bad their own health is ‘today’. The anchors are 0 (worst imaginable health state) and 100 (best imaginable health state).

Scoring

The descriptive data from Section one can be used to provide health-related quality of life profiles across the five dimensions. They can also be used to generate a weighted health index, based on tables of values derived from samples of the general population. The VAS score can be used to analyse changes in the health status of individuals or groups over time.

The EQ-5D health states may be converted to a single summary (EQ-5D index) by applying scores from a standard set of population values. For example, the UK EQ-5D index tariff, based on data from a population sample in the UK, links a single index value to each health state described by the EQ-5D (Dolan et al. 1996; Dolan 1997). Population values for a subset of health states defined by the EQ-5D are also available for several other countries.

Validity and reliability

Much of the evidence on the psychometric properties of the EuroQol relates to previous versions. A population survey in Sweden showed that the EuroQol mean health state scores were associated with socio-economic status in the expected direction, and discriminated between disease groups (Burström et al. 2001). Dorman et al. (1997a, 1997b) reported results from patients registered in a UK stroke trial, who were randomized to follow-up with either the EuroQol or the SF-36. The response rates to both instruments were about 50 per cent. Respondents to the EuroQol reported dependency in activities of daily living significantly more often than patients responding to the SF-36. While the authors interpreted this as evidence of greater dependency among EuroQol responders, the result may instead reflect the brevity of the EuroQol scale, which inevitably results in loss of information and sensitivity. Insinga and Fryback (2003) reported a lack of correspondence between the EuroQol health state descriptions and their respondents’ actual health experiences, probably because the EQ-5D descriptions are too sparse to capture health states accurately. They suggested that assigning new health state levels or dimensions may improve the scale. Luo et al. (2003) compared the results of the EQ-5D, the SF-36 and the Health Utilities Index mark 3 (HUI3) in patients with rheumatic diseases. They found that patients who reported no problems with mobility on the EQ-5D reported different levels of difficulty with ambulation on the HUI3, and concluded that the instruments measure slightly different, although related, dimensions of health. The EuroQol has suffered from moderate to low response rates in a number of population surveys, is highly skewed and has relatively poor sensitivity (Brazier et al. 1993a, 1993b; Bowling 1998), particularly in relation to disease-based outcomes research (Casellas et al. 2000; Selai et al. 2000).

Test-retest reliability correlations for previous versions ranged between 0.69 and 0.94 (Uyl-de-Groot et al. 1994; Van Agt et al. 1994). Other research indicates that the instrument is highly skewed, with large ceiling effects (Brazier et al. 1992, 1993). It is also possible that the length of the thermometer scale (0–100) is biasing, and the time referent is also very short (‘today’). It is frequently
criticized (Carr-Hill 1992; Jenkinson and McGee 1998). The EuroQol Group has recognized methodological problems with the EuroQol and has worked to improve it (Brooks 1996), although no further revisions are imminent at present.

The literature on the psychometric properties of the EQ-5D is still fairly limited. The EuroQol is available in several languages. It is in the public domain and can be used without charge for non-commercial research. Information about the EuroQol can be found at http://www.euroqol.org/
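The Section one coding described under Scoring (five dimensions, each answered on a three-point scale and recorded as a single code) can be illustrated with a short sketch. It is not the EuroQol Group’s own software: the function names are hypothetical, levels are assumed to be coded 1 = no problems, 2 = some problems and 3 = the most severe response (for self-care, ‘I am unable to wash and dress myself’), and a health state is assumed to be written as the five codes in the order the dimensions are listed above. No real tariff values are included; `index_value` only looks a state up in whatever population value set the user supplies.

```python
from itertools import product

# Dimension order as listed for Section one of the EQ-5D.
DIMENSIONS = (
    "mobility",
    "self-care",
    "usual activities",
    "pain/discomfort",
    "anxiety/depression",
)

# Assumed coding of the three-point response scale:
# 1 = no problems, 2 = some problems, 3 = most severe level (e.g. unable).
LEVELS = (1, 2, 3)


def health_state(responses: dict[str, int]) -> str:
    """Join the five dimension codes into one health-state label, e.g. '12111'."""
    codes = []
    for dimension in DIMENSIONS:
        level = responses[dimension]
        if level not in LEVELS:
            raise ValueError(f"{dimension}: level must be 1, 2 or 3, got {level}")
        codes.append(str(level))
    return "".join(codes)


def all_health_states() -> list[str]:
    """Enumerate every state the descriptive system can express (3**5 = 243)."""
    return ["".join(str(level) for level in combo)
            for combo in product(LEVELS, repeat=len(DIMENSIONS))]


def index_value(state: str, value_set: dict[str, float]) -> float:
    """Look up a state's index in a caller-supplied population value set (tariff).

    No tariff is bundled here; a KeyError means the value set lacks that state.
    """
    return value_set[state]


def check_vas(score: float) -> float:
    """Validate a Section two VAS rating on the 0 (worst) to 100 (best) scale."""
    if not 0 <= score <= 100:
        raise ValueError(f"VAS score must lie between 0 and 100, got {score}")
    return score
```

For example, a respondent with some problems in self-care and none elsewhere is coded `health_state({"mobility": 1, "self-care": 2, "usual activities": 1, "pain/discomfort": 1, "anxiety/depression": 1})`, giving `"12111"`. A dictionary is used for the value set only for simplicity; published tariffs are often expressed as scoring formulas rather than full tables.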
5 MEASURING PSYCHOLOGICAL WELL-BEING
There are numerous scales of psychological well-being, in particular those which are aimed specifically at detecting common psychiatric disorders such as anxiety/depression, dementia and mental confusion. The reliability of the classifications of many of these has been questioned. Gold standards for diagnostic categorization have been developed over the years, including the revised Diagnostic and Statistical Manual (DSM-IV) of the American Psychiatric Association (2000) and the International Classification of Diseases (World Health Organization 1992, 1993, 1994), and some highly regarded diagnostic instruments such as the Present State Examination and the Research Diagnostic Criteria (Wing et al. 1974; Spitzer et al. 1978; Wing 1991). However, these still fail to account realistically for features of personality or physical illness which may intervene. In the UK, the Geriatric Mental State Questionnaire (GMS) (Copeland et al. 1976) has been developed for use with the elderly and is often regarded as a gold-standard survey instrument; the GMS is reviewed in this chapter. A more recently developed and tested assessment schedule for the diagnosis of mental disorder, although again only in relation to elderly people, is the Cambridge Examination for Mental Disorders of the Elderly (CAMDEX) (Roth et al. 1986). This also pays special attention to the detection of dementia. However, it is extremely long and unsuitable for use as a research tool except in a clinical psychiatric survey; it also compares unfavourably with other well-developed scales in relation to cross-cultural applicability and sensitivity (Copeland 1990).

The following sections review the most popular and easily administered scales of common psychiatric disorders in current use in the UK: anxiety/depression and, among the elderly, mental confusion. Some of the broader health-status scales also include sections or batteries on mental health, for example the Rand Batteries (depression screener) and the SF-36 mental functioning dimension and summary scale (see Chapter 4).

Scales of psychiatric status are generally developed and applied within specific countries. It is often difficult for such scales to have cross-cultural applicability without careful adaptation because of culture-specific concepts and values, although some of the more popular scales have been translated and adapted for use in many countries outside the UK (e.g. the General Health Questionnaire and the Geriatric Mental State).

DEPRESSION

The term depression is increasingly used to cover a wider range of psychological disturbances. There is considerable confusion about the variety of meanings and this, in turn, has led to conflicting research findings about aetiology and treatment of choice (Snaith 1987). There is little confusion about the recognition of a severe state, sometimes called a psychosis, but milder degrees of the condition lack
definition. Sometimes the term ‘depressive neurosis’ is used, but a variety of concepts are associated with this term. In the Present State Examination Manual, Wing et al. (1974) provided the most succinct definitions of psychopathological terms; in an attempt to clarify the issues of terminology, their definitions of depressive states related to depressed mood, recent loss of interest, self-depreciation, hopelessness and observed depression. A number of additional definitions were provided for neglect due to brooding, subjective anergia, slowness and under-activity, inefficient thinking, poor concentration, suicidal plans or acts, morning depression, social withdrawal, guilt and other related symptoms.

Wide discrepancies in the early literature on estimates of the prevalence of mental illness in different communities were largely the consequence of the use of different diagnostic criteria. The development and use of standardized research instruments linked to clearly stated diagnostic criteria led to less variation between studies. However, considerable differences still exist between measures, with the implication that comparisons between studies can be made only if two studies have used the same basic measurement scale.

Many research studies have based the definition of a ‘case’ of depression on a certain score having been attained on one of the large number of depression-rating scales. The different construction of these scales and the different individual items they include create major difficulties in comparing studies, unless they have been developed in relation to clear criteria such as the DSM-IV (American Psychiatric Association 2000).

The majority of depression-rating scales contain a diverse collection of symptoms, attitudes and feelings and will produce high prevalence rates and a large proportion of false positives if case detection relies upon them (Snaith 1987). Kutner et al. (1985), who studied depression and anxiety in dialysis patients, found that scales containing disease-related items yielded exaggerated scores.

ZUNG’S SELF-RATING DEPRESSION SCALE

The Self-Rating Scale was developed by Zung (1965). It was constructed on the basis of the clinical diagnostic criteria most commonly used to characterize depressive disorders. It aims to screen for affective, psychological and somatic symptoms of depression. Factor-analysis-derived symptom clusters led to the selection of 20 items covering ‘pervasive affect’, ‘physiological equivalents or concomitants’ and ‘psychological concomitants’. The scale is simple to complete and takes about ten minutes.

Content

The scale contains 20 statements; 10 are worded symptomatically positive and 10 are worded symptomatically negative. Respondents are asked how the listed statements apply to them. Administration is by self-completion, and completion time is ‘a few minutes’ (Zung et al. 1965). Examples are:

I have trouble sleeping at night.
I eat as much as I used to.
I notice I am losing weight.
I have trouble with constipation.
My heart beats faster than usual.
I feel hopeful about the future.
I find it easy to make decisions.

(Copyright © William Zung 1965; 1974. All rights reserved. Reproduced with the permission of the author’s family.)

There are four choices of response category, with numerical values of 1–4 respectively: none or a little of the time; some of the time; a good part of the time; most of the time.

Scoring

Each item scores 1–4, and the score values for the 20 items are summed. An index is derived by dividing the sum of the raw scores by the maximum possible score of 80 and multiplying the resulting decimal by 100 (Zung et al. 1965). The need to transform the scores has been questioned by Gurtman (1985).

The interpretation of scores is based on norms, which are given in the scale’s manual: below 50 = normal; 50–59 = minimal to mild depression; 60–69 = moderate to marked depression; 70+ = severe to extreme depression. The norms for these scores were based on adults aged 20–64 and may not be appropriate for younger or older
people. The cut-off points are not agreed, and investigators have adopted a range of cut-offs from 50 to 60.

Zung et al. (1965) reported that the mean score on the Zung Scale for out-patients diagnosed as having a depressive reaction was 64. Zung (1965) reported that the mean score for hospitalized in-patients diagnosed with depressive disorders was 74. Hagg et al. (2003), on the basis of a clinical trial of treatment and a study of the clinical meaning of score changes, estimated the minimal clinically important difference of Zung scores to be 8 units, which does not exceed the tolerance interval of 8–9 units. They concluded that the Zung might therefore require a larger increase in scores in order to exceed the 95 per cent tolerance interval when used in clinical decision making and in calculating statistical power when estimating sample size.

Over half of the items of this widely used depression scale are composed of feelings or symptoms which do not necessarily indicate the presence of a psychological disorder, and Kutner et al. (1985) found that these disease items led to an exaggerated score.

Validity

Zung et al. (1965) tested the scale for validity using the Minnesota Multiphasic Personality Inventory as a gold standard. They administered the scale to new psychiatric out-patients at Duke University (number not specified). The correlations between their instrument and three subscales of the Minnesota Inventory were 0.70, 0.68 and 0.13; the authors commented that the last correlation was ‘unexpectedly low’. Brown and Zung (1972) tested the scale against the Hamilton Depression Scale and reported a correlation of 0.79.

Davies et al. (1975) reported a correlation between the Zung and the Hamilton Scale of r = 0.62, between the Zung and the Beck Depression Scale of r = 0.73, and between the Zung and a visual analogue scale of r = 0.62. Biggs et al. (1978) reported a correlation between the Zung and the Hamilton of r = 0.80, although the correlation was lowest at the greatest severity levels. In comparison with clinical ratings of severity, Biggs et al. (1978) reported a correlation with the Zung of 0.69, with differences at higher severity. Toner et al. (1988) reported a similar correlation against psychiatric judgements, at r = 0.65. The scale is also able to distinguish between patients with a confirmed diagnosis of a depressive state and patients with an initial diagnosis of depression who were subsequently reviewed and given another psychiatric diagnosis (Zung 1965, 1967). Zung et al. (1965) reported that the scale was able to distinguish reliably between patients with a psychoneurotic depression and those with an anxiety reaction, and found the instrument sensitive to clinical change. Gallegos-Orozco et al. (2003) also reported that patients categorized as depressed had poorer scores on the Short Form-36 questionnaire (a measure of broader health status), thus supporting its construct validity.

Toner et al. (1988) reported that the convergent validity of the Zung Scale, when tested against physicians’ judgements, was lower than that of the well-tested Comprehensive Assessment and Referral Evaluation (CARE) scale: 0.65 (kappa: 0.29) in comparison with 0.73 (kappa: 0.46). Also, Carroll et al. (1973) assessed patients known to have varying degrees of depression and reported that the Zung Scale was unable to distinguish between the groups; they reported a correlation of 0.41 between interview-based and Zung ratings. Carroll et al. (1973) blamed the self-report method rather than the Zung Scale, as the former adopts a different perspective from a psychiatric interview. Other studies have found the Zung Scale to be unresponsive to changes in treatment patterns (Hamilton 1976). Arfwidsson et al. (1974) cast doubt on the validity of the scale, and argued that doctors’ ratings are more valid (i.e. unbiased) as a means of assessing the degree and quality of depressive symptoms.

The instrument was designed to be unidimensional, although a factor analysis by Morris et al. (1975) confirmed two dimensions: agitation and self-satisfaction. Blumenthal’s (1975) factor analysis of the scale yielded four subscales: a well-being index, a depressed mood index, an optimism index and a somatic-symptoms index. Cross-cultural applications have been reviewed by Naughton and Wiklund (1993).
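The Zung scoring rule given under Scoring (20 items each scored 1–4, summed, divided by the maximum of 80 and multiplied by 100) and the manual’s norm bands can be sketched as below. This is a minimal illustration with hypothetical function names, not Zung’s published materials. It assumes each item score has already been keyed so that 4 always indicates the most symptomatic response (because the scale mixes positively and negatively worded statements, positively worded items are conventionally reversed before summing), and it reads the top norm band as an index of 70 and above. Because the index moves in steps of 1.25, values such as 62.5 occur, so the bands are applied as half-open intervals.

```python
def zung_index(item_scores: list[int]) -> float:
    """Zung Self-Rating Depression Scale index: raw sum / 80 * 100.

    Assumes item_scores are already keyed so that 4 = most symptomatic.
    """
    if len(item_scores) != 20:
        raise ValueError(f"expected 20 item scores, got {len(item_scores)}")
    if any(score not in (1, 2, 3, 4) for score in item_scores):
        raise ValueError("each item must be scored 1, 2, 3 or 4")
    raw_total = sum(item_scores)      # possible range 20-80
    # Same as (raw_total / 80) * 100, ordered so the result is exact.
    return raw_total * 100 / 80       # index range 25-100


def zung_band(index: float) -> str:
    """Map an index onto the manual's norm bands (adults aged 20-64)."""
    if index < 50:
        return "normal"
    if index < 60:
        return "minimal to mild depression"
    if index < 70:
        return "moderate to marked depression"
    return "severe to extreme depression"
```

For example, keyed item scores of ten 2s and ten 3s give a raw total of 50 and an index of 62.5, which falls in the ‘moderate to marked depression’ band. Since investigators have used cut-offs anywhere from 50 to 60, the thresholds in `zung_band` are best treated as adjustable rather than fixed.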
Reliability

There is little evidence of the scale’s reliability. Zung (1986) reported split-half correlations of 0.92, and Knight et al. (1983) reported a coefficient of 0.79. The scale has been used successfully with clustering techniques by Byrne (1978). Kaszmak and Allender (1985) suggest that it is unsuitable for use with elderly people because of the large number of somatic items. Also, response rates for self-completed scales may be lower than for administered scales. A comparative study of Zung’s Scale with the short version of CARE showed a response rate of 65 per cent for the Zung and 100 per cent for CARE. Among the reasons for non-response were visual problems (28 per cent), illiteracy (9 per cent) and lack of motivation (34 per cent) (Toner et al. 1988). These response difficulties apply to all self-rating scales.

The scale is popular because it is short and easy to complete. Zung (1972) developed an observer-rated scale to complement the self-rating scale, although Thompson (1989), in line with most reviewers of the scale, reported that it has no advantages.

MONTGOMERY-ASBERG DEPRESSION RATING SCALE (MADRS)

Observer rating scales are more often used than patients’ own self-rating scales in medical treatment trials for depression (Demyttenaere and Fruyt 2003). The Montgomery-Asberg Depression Rating Scale was developed by Montgomery and Asberg (1979) on the basis of 54 English and 52 Swedish patients who completed a 65-item psychopathology scale. Analysis identified the 17 most commonly occurring symptoms of depressive illness in the sample. Subsequent analyses, using 64 patients on different types of anti-depressive drugs, were then used to create a ten-item depression rating scale consisting of those items showing the greatest changes with treatment. The scale can be used for assessing the severity of depression, and changes in it, during medical treatment for depression. This scale is oriented towards psychic symptoms and covers apparent sadness, reported sadness, inability to feel, difficulty in concentration, inner tension, pessimistic thoughts, suicidal thoughts, lassitude, reduced sleep and reduced appetite. The

Content

The scale’s ten items encompass the most commonly occurring symptoms of depressive illness which change in response to treatment (see above). Examples are:

Apparent sadness: representing despondency, gloom and despair (more than just ordinary transient low spirits), reflected in speech, facial expression and posture. Rate by depth and inability to brighten up:
0 No sadness.
1
2 Looks dispirited but does brighten up without difficulty.
3
4 Appears sad and unhappy most of the time.
5
6 Looks miserable all the time. Extremely despondent.

Reduced sleep: representing the experience of reduced duration or depth of sleep compared to the subject’s own normal pattern when well.
0 Sleeps as usual.
1
2 Slight difficulty dropping off to sleep or slightly reduced, light or fitful sleep.
3
4 Sleep reduced or broken up by at least two hours.
5
6 Less than two or three hours’ sleep.

Lassitude: representing a difficulty getting started or slowness initiating and performing everyday activities.
0 Hardly any difficulty in getting started. No sluggishness.
1
2 Difficulties in starting activities.
3
4 Difficulties in starting simple routine activities which are carried out with effort.
5
6 Complete lassitude. Unable to do anything without help.

Pessimistic thoughts: representing thoughts of guilt, inferiority, self-reproach, sinfulness, remorse and ruin.
0 No pessimistic thoughts.
1
2 Fluctuating ideas of failure, self-reproach or self-
scale’s advantage lies in its brevity and ease of use depreciation.
by raters. 3
4 Persistent self-accusation, or definite but still rational ideas of guilt or sin. Increasingly pessimistic about the future.
5
6 Delusions of ruin, remorse or unredeemable sin. Self-accusations which are absurd and unshakeable.

Scoring

Scoring involves summing the scale items. The procedure of arriving at a definition of a case by adding up numbers derived from a severity-grade score of a number of symptoms has a certain attraction until it is realized that two cases with the same total may be very different from one another.

Validity

Montgomery and Asberg (1979) tested scale scores against psychiatrists' judgements, using 18 patients who responded to treatment and 17 who did not. The scale was able to discriminate between the two groups. The scale was also tested against the Hamilton Depression Scale, with a reported correlation of 0.70. Kearns et al. (1982) reported that the scale was able to distinguish between different levels of severity, and performed as well as the Hamilton Depression Scale. Snaith and Taylor (1985) reported a high correlation (0.81) between the scale and the depression subscale of their self-rated Hospital Anxiety and Depression (HAD) scale, although the correlation with the anxiety subscale of the HAD was low (0.37). Cooper and Fairburn (1986) found that a group of bulimic patients and a group of depressed patients had similar scores on the Montgomery-Asberg Depression Rating Scale, but different symptoms, or scale items, contributed strongly to the overall score in the two groups. However, 30 per cent of the total score is apparently accounted for by three items which all commonly occur in physical illnesses and in states of distress other than depression (Williams 1984; Cooper and Fairburn 1986).

Khan et al. (2002) reported on the use of the MADRS and the Hamilton Depression Scale in a retrospective record review of over 200 adults with depression who had participated in clinical trials and been assigned to anti-depressant or placebo treatment. The effect size of the MADRS was 0.53, similar to that of the Hamilton. The authors concluded that the two scales were similar in terms of sensitivity, regardless of type of treatment.

Parker et al. (2003) analysed the factor structure of the MADRS with a sample of older patients with depression, and reported that all ten items loaded <0.60 on a domain. There were three interpretable MADRS factors which reflected geriatric depression dimensions: dysphoric apathy/retardation, comprising five items (apparent sadness, reported sadness, lassitude, reduced concentration, inability to feel); psychic anxiety, comprising three items (inner tension, pessimistic thoughts, suicidal thoughts); and vegetative symptoms, comprising two items (sleep and appetite). They concluded that the scale may be usable for monitoring treatment outcomes among geriatric patients with depression, …

Reliability

Good results for reliability have been reported by the authors, and the scale was found to be robust when used by different professionals in a variety of health care settings (GPs, nurses, psychologists, psychiatrists), producing high inter-rater correlations. Comparisons between two English raters, two Swedish raters, and one English and one Swedish rater, rating 11 to 30 patients, produced correlations of between 0.89 and 0.97 (Montgomery et al. 1978).

The MADRS has been used less often than the Hamilton for rating depression, although it is easy to use and Khan et al. (2002) recommended it as a desirable rating tool in large-scale clinical trials of treatment for depression. The advantage of observer rating scales over patients' self-ratings in treatment trials for depression is that the measure is not biased by the patient's optimism or pessimism, or by other biases (e.g. people in lower spirits tend to perceive other aspects of life in more negative terms than people in higher spirits). However, there is also a danger in assuming that clinical ratings are unbiased; MADRS scores might be more accurately interpreted as clinical impressions of depression and change (Mulder et al. 2003). Demyttenaere and Fruyt (2003) pointed out that the MADRS and the Hamilton Depression Scale were both developed to evaluate medical therapy (e.g. anti-depressant medication) rather than psychotherapy (where patient self-rating scales may be used more often). They also reported that, where a scale exists in both observer and patient rating formats, the ratings obtained from the two formats differ.
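The summed MADRS score described under Scoring can be illustrated with a minimal Python sketch. It assumes that each of the ten items is rated on the 0 to 6 scale shown in the item examples, so totals run from 0 to 60. The item labels are simply the symptom names listed in the introduction to the scale (not an official item coding), and the name `madrs_total` is hypothetical. As the Scoring paragraph warns, the total says nothing about which symptoms produced it.

```python
# Hypothetical sketch: summing ten MADRS item ratings, each assumed to be 0-6.
from typing import Mapping

# Symptom labels as listed in the text; the ordering and spelling are
# illustrative assumptions, not an official item coding.
MADRS_ITEMS = (
    "apparent sadness",
    "reported sadness",
    "inability to feel",
    "difficulty in concentration",
    "inner tension",
    "pessimistic thoughts",
    "suicidal thoughts",
    "lassitude",
    "reduced sleep",
    "reduced appetite",
)


def madrs_total(ratings: Mapping[str, int]) -> int:
    """Return the simple sum of the ten item ratings (range 0-60).

    Raises ValueError if an item is missing, unexpected, or outside 0-6.
    """
    expected = set(MADRS_ITEMS)
    given = set(ratings)
    if given != expected:
        missing = sorted(expected - given)
        unexpected = sorted(given - expected)
        raise ValueError(f"missing items: {missing}; unexpected items: {unexpected}")
    total = 0
    for item in MADRS_ITEMS:
        score = ratings[item]
        # Each item uses a 0-6 scale; in the examples, even levels carry descriptions.
        if not isinstance(score, int) or not 0 <= score <= 6:
            raise ValueError(f"{item!r} must be an integer from 0 to 6, got {score!r}")
        total += score
    return total


if __name__ == "__main__":
    example = {item: 0 for item in MADRS_ITEMS}
    example["apparent sadness"] = 4  # "Appears sad and unhappy most of the time."
    example["reduced sleep"] = 2     # "Slight difficulty dropping off to sleep..."
    print(madrs_total(example))      # 6
```

Validating every item before summing reflects the fact that a total computed from an incomplete set of ratings is not comparable with a full ten-item score.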
HAMILTON DEPRESSION RATING SCALE

This is the most widely used observer rating scale for assessing depression and change after medical treatment. It is also referred to as the Hamilton Depression Scale. It includes assessment of cognitive and behavioural components of depression and is particularly thorough in the assessment of the somatic aspects (Hamilton 1967). Like many depression scales, the Hamilton Depression Scale cannot be used to establish a diagnosis of depression, but only to assess severity once depression has already been diagnosed. It is administered in a semi-structured clinical interview, which takes about 30 minutes. No interview guide is included, although guides have been developed by others and tested with good results (Freedland et al. 2002). Interviewer training is required.

Content

The current version of the Hamilton Depression Scale (Hamilton 1959, 1967) consists of 17 items: depressed mood, feelings of guilt, suicidal ideation, work and activities, insight, retardation, agitation, insomnia (early, middle and late), psychic anxiety, somatic anxiety, gastrointestinal symptoms, general somatic symptoms, genital symptoms (loss of libido or menstrual disturbances), hypochondriasis and loss of weight. The earlier version of the scale, which is still in use, contained 21 items. Other versions containing 23 and 24 items were also developed. Examples of scale items are:

Depressed mood: 0–4
Gloomy attitude, pessimism about the future
Feeling of sadness
Tendency to weep:
sadness and/or mild depression
occasional weeping and/or moderate depression
frequent weeping and/or severe depression
extreme symptoms

Anxiety, psychic: 0–4
Tension and irritability
Worrying about minor matters
Apprehensive attitude
Fears

Somatic symptoms, gastrointestinal: 0–2
Loss of appetite
Heavy feelings in abdomen
Constipation

Insight: 0–2
Loss of insight
Partial or doubtful loss
No loss

The somatic categories of the scale have been used on their own (without the remaining Hamilton items) in conjunction with other depression-rating scales. The somatic items relate to the muscular and sensory systems and to cardiovascular, respiratory, gastrointestinal, genito-urinary and autonomic symptoms. For example, Martin (1987) used these items, without the Hamilton depression (psychosomatic) items, together with the Montgomery-Asberg Depression Rating Scale. Although the scale is one of the most widely used in psychiatric research (Freemantle et al. 1993), many investigators have modified it (Paykel 1985). Potts et al. (1990) developed a fully structured interview version of the 17-item scale (suitable for use with lay interviewers) for use in the Rand Medical Outcomes Study. Inter-rater reliability with two psychiatrists rating 20 subjects was good (Pearson's r = 0.96), a finding confirmed by other studies (Korner et al. 1990). The alpha coefficients for internal consistency were 0.82 to 0.83. The test-retest correlation was high, at 0.65 for the total score, although the item correlations were variable, at −0.04 to 0.77 (15-day retest). They omitted the items with low retest results and drew up a 14-item version to replace the 17-item version.

Scoring

Some items are scored 0–4 and others are scored 0–2. Scale items marked 0–4 give the rater the response choices of absent (0), mild or trivial (1), moderate (2–3) and severe (4); scale items marked 0–2 give the choices of absent (0), slight or doubtful (1) and clearly present (2). Four items, which provide more details of the characteristics of depression, are not included in the original scoring scheme, although some users include them. The interpretation of categories is described by Hamilton (1967) and reproduced by Williams (1984). The total is the simple sum of responses. The total scores range
from 0 to 100 (representing the sum of two raters' scores, or double the score for one rater). Some studies report total scores with a maximum of 50. Hamilton did not provide advice on score cut-offs, although most users take scores below 7 to indicate an absence of depression, 7–17 to indicate mild depression, 18–24 moderate depression, and 25 and above severe depression. Variations exist (Bech et al. 1986).

Validity

The scale is reported to have high concurrent validity, with good agreement with other scales, particularly the Beck; correlations of over 0.70 have been reported (Hamilton 1976). Schwab et al. (1967) compared the Hamilton with the Beck scale on 153 medical in-patients; the correlation between the two scales was 0.75. Knesevich et al. (1977) reported that the Hamilton scale correlated 0.68 with a change in global rating on a 10-point scale. Hamer et al. (1991) reported that a threshold score of 8 gave a sensitivity of 88 per cent and a positive predictive value of 80 per cent in comparison with diagnoses made with the DSM-III. Silverstone et al. (2002) reported that two items from the 17-item scale, Depressed mood (item 1) and Psychic anxiety (item 10), were predictive of treatment response, differentiated between treatment groups, and predicted remission rates in clinical trials of depression.

Tamaklo et al. (1992) reported that it correlated highly with the Montgomery-Asberg Depression Rating Scale. However, Montgomery and Asberg (1979) reported the scale to be less sensitive than their own, especially at the severe end of the scale, a finding confirmed by Knesevich et al. (1977). Carroll et al. (1973) reported that the Hamilton scale was better able than the Beck Depression Inventory to distinguish between groups of patients known to have varying degrees of depression. They argued that self-report scales, such as the Beck, overweight subjective scores. The Hamilton scale has also been reported to have a greater effect size for change than the Beck Depression Inventory (Sayer et al. 1993), and is frequently reported to be sensitive to treatment (Tollefson and Holman 1993). Hamilton (1967) reported that the scale discriminated between men and women; women are generally more likely to have higher depression scores than men.

However, six of the scale items are symptoms of somatic disturbance and account for 31 per cent of the possible total score; if the insomnia items, which so often rate highly in physical illness, are included, then 42 per cent of the total is accounted for (Williams 1984).

Although initial factor analyses gave poor results (Hamilton 1960), later factor analyses were more satisfactory (Hamilton 1967; Mowbray 1972; Bech 1981). A review of factor analyses of the scale showed that it has produced between three and seven factors (Berrios and Bulbena-Villarasa 1990). A subsequent factor analysis yielded a five-dimensional solution, although only the first factor (comprising depressed mood, guilt, suicide, work and interests, agitation, psychic anxiety, somatic anxiety and loss of libido) was well defined and clinically interpretable (Gibbons et al. 1993). Dozois (2003) analysed the factor structure of the 17-item and 23-item versions of the scale with a sample of undergraduate students; both versions yielded four factors. The scale has been criticized for its narrowness, its focus on behaviour, and its neglect of other pertinent areas, including cognitive and affective symptoms (Raskin 1986).

Reliability

Its inter-rater reliability is reported to be good, with high correlations ranging from 0.84 to 0.98 (Hamilton 1976; Knesevich et al. 1977; Rehm 1981). Muller and Dragicevic (2003) tested the inter-rater reliability with clinically inexperienced raters, after three standardized training sessions, and reported moderately high agreement (kappa = 0.57–0.73) for the total and for the items.

Results for the internal consistency of the scale have been variable (Bech 1981). Schwab et al. (1967), on the basis of a study of 153 medical in-patients, reported that it had higher internal consistency than the Beck Depression Inventory. Potts et al. (1990) reported research which showed that the total scale has a high degree of reliability. However, they did point out that it has been criticized for its lower item reliability and its heavy reliance on the expertise of the interviewer, who is psychiatrically trained.

The Hamilton Scale is the observer-rated measure most consistently used by raters (Freemantle et al. 1993), and has been translated into many languages. Although it is popular, the number of items measuring somatic problems requires some caution in interpretation.
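The Hamilton scoring rules and the commonly used severity bands can be sketched in Python as below. This is a minimal illustration under stated assumptions: it sums one rater's ratings (for the 0 to 100 convention described above, a single rater's total would be doubled); the caller supplies which items are scored 0 to 4 and which 0 to 2; and the bands are the ones "most users" apply, not cut-offs provided by Hamilton. The function names are hypothetical, and the counts in the usage example are invented solely to show how a sensitivity of 88 per cent and a positive predictive value of 80 per cent, as reported by Hamer et al. (1991), are calculated.

```python
# Hypothetical sketch: Hamilton total score and commonly used severity bands.
from typing import Mapping, Tuple


def hamd_total(ratings: Mapping[str, int], item_max: Mapping[str, int]) -> int:
    """Simple sum of one rater's item ratings.

    item_max maps each item to its maximum (4 for 0-4 items, 2 for 0-2 items);
    the caller decides which items use which range.
    """
    if set(ratings) != set(item_max):
        raise ValueError("ratings and item_max must cover exactly the same items")
    total = 0
    for item, score in ratings.items():
        top = item_max[item]
        if top not in (2, 4):
            raise ValueError(f"{item!r}: item maximum must be 2 or 4, got {top!r}")
        if not 0 <= score <= top:
            raise ValueError(f"{item!r}: rating {score!r} outside 0-{top}")
        total += score
    return total


def hamd_severity(total: int) -> str:
    """Band a total with the cut-offs most users apply (not Hamilton's own)."""
    if total < 0:
        raise ValueError("total cannot be negative")
    if total < 7:
        return "no depression"
    if total <= 17:
        return "mild"
    if total <= 24:
        return "moderate"
    return "severe"


def sensitivity_ppv(true_pos: int, false_pos: int, false_neg: int) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); positive predictive value = TP / (TP + FP)."""
    return true_pos / (true_pos + false_neg), true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Two example items from the text: depressed mood (0-4) and insight (0-2).
    maxima = {"depressed mood": 4, "insight": 2}
    total = hamd_total({"depressed mood": 3, "insight": 1}, maxima)
    print(total, hamd_severity(total))  # 4 no depression
    # Invented counts for a threshold of 8 checked against a DSM-III diagnosis.
    print(sensitivity_ppv(true_pos=44, false_pos=11, false_neg=6))  # (0.88, 0.8)
```

Keeping the banding in a separate function makes it easy to substitute one of the variant cut-off schemes (Bech et al. 1986) without touching the summing logic.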
THE BECK DEPRESSION INVENTORY (BDI)

If the researcher is interested only in depression, and not more generally in both anxiety and depression, then it is reasonable to use a specially designed scale. The Beck Depression Inventory (amended and revised) (BDI) is such a specific scale, and is the most widely used instrument for detecting depression. It was designed by Beck et al. (1961) because other widely used scales (e.g. the Minnesota Multiphasic Personality Inventory) were not specifically designed for the measurement of depression or were based on old psychiatric nomenclature. It was developed on the basis of the authors' observations of the attitudes and symptoms of two samples of 226 and 183 depressed patients while undergoing psychotherapy, and of clinical consensus regarding the symptoms of depressed patients. These observations led to the creation of a 21-item inventory; each category describes a specific behavioural manifestation of depression and consists of a series of self-evaluation items which are graded and ranked according to severity (neutral to maximum). A short 13-item version of the BDI was developed, although this has been less often used (Beck and Beck 1972; Beck et al. 1974; Reynolds and Gould 1981). A version for children and adolescents was developed (Kovacs and Beck 1977). A new edition of the BDI (BDI-II) has been developed, also consisting of 21 items to assess the intensity of depression (Beck et al. 1996a).

The original amended and revised Beck Depression Inventory was generally regarded as better than the Minnesota Multiphasic Personality Inventory and better than similar scales such as the Zung Scale (Hammen 1981). Originally developed to be interviewer-administered, it is now a self-rating scale. A large proportion of items relate to somatic disturbance, and this has led to some controversy. It cannot be used to diagnose depression in the absence of a prior diagnosis, and is only a measure of severity once the clinical diagnosis has been made. This is to avoid people being diagnosed as depressed when they have high BDI scores for situational reasons (e.g. bereavement). It has been reported to have acceptable levels of reliability for use as a research screening instrument with an elderly population (Gallagher et al. 1982). It takes 10–15 minutes to complete, and can be self- or interviewer-administered.

Content

The amended and revised BDI and the BDI-II each consist of 21 items which stress cognitive symptoms of depression. Each item has four response choices, in the form of statements ranked in order of severity, from which the respondent selects the one that best fits the way he or she feels at 'this moment'. The symptoms and attitudes which the BDI aimed to measure are sadness, pessimism/discouragement, sense of failure, dissatisfaction, guilt, expectation of punishment, self-dislike, self-accusation, suicidal ideation, crying, irritability, social withdrawal, indecisiveness, body-image distortion, work retardation, insomnia, fatigability, anorexia, weight loss, somatic preoccupation and loss of libido. Each of the 21 items on the new BDI-II also consists of a list of four statements about a symptom of depression, in order of increasing severity; this brings it into line with the depression criteria in the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) (American Psychiatric Association 2000). New items on the scale replace the previous items on weight loss, changes in body image and somatic preoccupation. The original item on work difficulty was revised to examine loss of energy. The items on sleep loss and appetite loss were also revised in order to assess increases as well as decreases in these areas. In order to comply with the DSM-IV guidelines, which require symptoms of depression to be assessed over the past two weeks, the time frame was changed from one week (original BDI) to two weeks in the BDI-II.

Examples of items are:

I am not particularly pessimistic or discouraged about the future.
I feel I have nothing to look forward to.
I feel that I won't ever get over my troubles.
I feel that the future is hopeless and that things cannot improve.

I am no more irritated now than I ever am.
I get annoyed or irritated more easily than I used to.
I feel irritated all the time.
I don't get irritated at all by the things that used to irritate me.

I have not lost interest in other people.
I am less interested in other people now than I used to be.
I have lost most of my interest in other people and have little feeling for them.
I have lost all my interest in other people and don't care about them at all.

The Beck Depression Inventory. Reproduced with permission. Copyright on the Beck Depression Inventory is held by the Psychological Corporation (PsychCorp.com), 555 Academic Court, San Antonio, Texas, 78204-2498 USA and 24 Oval Road, London NW1 7DD UK, to whom all enquiries relating to reproduction and use should be directed.

Scoring

The Beck Depression Inventory is based on a Guttman scale. The original amended and revised scale permitted selection from four to seven responses, which were given a weight of 0 to 3. Revisions to the original scale were made in 1974 and in 1978, standardizing the response choices to four for each item; each still carries a weight of 0 to 3.

The numerical values of 0 (low) to 3 (high) assigned to each statement indicate the degree of severity. In some categories two alternative statements (2a and 2b) are presented at a given level but are assigned the same weight. In the revised version there is one statement for each level (so no two statements are assigned the same weight). Item scores are summed, giving a maximum total of 63. The ability to analyse scores as continuous data makes it preferable to critics of cut-off points, who argue that they are artificial (Steer et al. 1986). Several slightly different scoring guides exist. Generally, however, normative data suggest the following categories of severity level: normal 0–9; mild 10–15; mild–moderate 16–19; moderate–severe 20–29; severe 30 and above. Steer et al. (1986) proposed a slightly different scoring scheme. The scale is available in a format suitable for computer scoring from National Computer Systems, Minneapolis. The value of the scale is that it can be analysed as continuous data, without artificial cut-off points.

Validity

A comprehensive review of the literature on the reliability and validity of the original amended and revised scale was published by Beck et al. (1988). Tests to date indicate that it has moderate to good levels of validity and reliability, although most testing has been conducted on psychiatric populations. The scale correlated well with clinicians' ratings of severity of depression and with other depression scales. Investigators have generally reported correlations between the Beck and global severity scores of r = 0.62 to 0.77 (Metcalfe and Goldman 1965; Crawford-Little and McPhail 1973; Bech et al. 1975). However, Kearns et al. (1982) have reported the Beck to be weak in differentiating moderate from severe depression. Beck et al. (1961), on the basis of 226 hospital out-patients and admissions, and 183 patients in a replication group, tested the BDI against independent psychiatric diagnoses made by four psychiatrists. Their agreement with the scale was 56 per cent, and agreement within one degree of specificity was achieved in 97 per cent of cases. The authors reported that the scale was able to discriminate between depth-of-depression categories based on clinical ratings for both original and replication groups, the correlations ranging from 0.59 to 0.68. Correlations of 0.66 were obtained between the Beck and the Depression Adjective Check Lists, and of 0.75 between the Beck and the Minnesota Multiphasic Personality Inventory (Beck 1970). It has been used successfully in clinical and general populations, although most studies have focused on psychiatric patients (Beck et al. 1961; Williams 1984). Against the Hamilton, correlations of 0.58 to 0.82 have been reported (Schwab et al. 1967; Williams et al. 1972; Bech et al. 1975; Davies et al. 1975; Miller et al. 1985). Carroll et al. (1973) argue that these overall correlations camouflage the lack of congruence at severe levels of depression. A review of the literature from 1961 to 1986 by Beck et al. (1988) reported that the concurrent validities of the BDI with respect to clinical ratings and the Hamilton Scale for Depression were high. The mean correlations with the Hamilton Scale and clinical ratings for psychiatric patients were over 0.70. The respective mean correlations for non-psychiatric patients were 0.74 and 0.60. The Beck was also shown to discriminate between subtypes of depression and to distinguish depression from anxiety. The literature reviewed indicated relationships between the Beck and suicidal behaviour and alcoholism. There is other support for its discriminative ability and convergent validity (Moreno et al. 1993). The scale has been reported to be associated with perceived social support in a study of carers for elderly confused people (those with lower BDI scores reported that they received greater social support) (Morris et al. 1989). Novy et al. (1993) reported that it was highly associated with the State-Trait Anxiety Inventory in a study of
people with pain. The validity of the BDI was reviewed by Richter et al. (1998).

The BDI-II was compared with the original amended and revised BDI by Beck et al. (1996b) in a study of general psychiatric out-patients. The scale was self-administered. The mean BDI-II total score was about two points higher than with the original scale, and out-patients endorsed one more symptom on the BDI-II than on the original. The correlations of the BDI with sex, age, ethnicity, the Beck Anxiety Inventory and diagnosis of mood disorder were each within one point of the correlations of the BDI-II with the same variables. The BDI-II has been shown to be able to discriminate between out-patients with a major depressive disorder and those with a dysthymic disorder (Ball and Steer 2003), and between out-patients with mild, moderate and severe major depressive episodes (DSM-IV criteria) (Steer et al. 2001). A study of psychiatric in-patients aged 55 and over suggested that the BDI-II can be used with clinically depressed geriatric in-patients (Steer et al. 2000). A study in primary care by Arnau et al. (2001) showed that the BDI-II was associated with the Short Form-36 questionnaire for measuring broader health status (including a mental health dimension), supporting its convergent validity. They reported that a receiver operating characteristic analysis demonstrated criterion-related validity: BDI scores predicted a standardized primary care diagnosis of major depressive disorder, indicating that it can also be used in primary care settings. Krefetz et al. (2002) used the BDI-II in adolescent psychiatric in-patients and reported that it correlated significantly with the Reynolds Adolescent Depression Scale (RADS) (r = 0.84), supporting its convergent validity with this group. The areas under the receiver operating characteristic (ROC) curves for the BDI-II and the RADS were 0.78 and 0.76 respectively.

The literature reviewed by Beck et al. (1988) also suggested that the original amended and revised BDI represents one underlying general syndrome of depression, comprising three highly intercorrelated factors: negative attitudes to self or suicide, performance impairment and somatic disturbance. When second-order factors are extracted, a single overall depression factor emerges (Beck et al. 1988; Shaver and Brennan 1991). Another factor analysis indicated that it measures depressive severity in an almost unidimensional manner, relying heavily on cognitive symptoms (Louks et al. 1989).

Factor analyses of the BDI-II with adult out-patients with clinical depression and with general psychiatric out-patients showed that two factors were represented: somatic-affective and cognitive dimensions (Beck et al. 1996a; Steer et al. 1999). A subsequent confirmatory factor analysis supported a model in which the BDI-II was composed of one underlying second-order dimension of self-reported depression, which in turn comprised the two first-order factors representing cognitive and non-cognitive symptoms (Steer et al. 1999). These results were confirmed in a study of primary care patients by Arnau et al. (2001).

Reliability

Reliability for the original BDI was tested by the developers on 226 hospital out-patients and admissions and 183 patients in a replication group (all adults). Internal consistency was tested using 200 of the cases. The score for each of the 21 items was tested against the total BDI score for each person; all were associated at the P < 0.0001 level. Split-half reliability was tested using 97 of the cases; the correlation between the two halves of the scale was 0.86. Split-half reliability was also assessed using a group of adults by Weckowicz et al. (1967); these authors reported a lower figure of 0.53. Alternate-forms reliability was demonstrated by testing the original 21-item version against the 13-item short form; correlations have ranged between 0.89 and 0.97, suggesting that the short form can be substituted for the longer version (Beck et al. 1974), although it should be cautioned that short-form versions are usually less consistent. Smith et al. (2000) caution against short versions of scales.

Beck et al. (1961) did not recommend conventional test-retest methods, in case people remembered their scores and biased the retest result (short periods) or there were genuine changes in the intensity of depression over time (long periods). They therefore carried out test-retest correlations alongside repeated psychiatric ratings, to allow for people remembering their scores or genuinely changing between testings. Consistent relationships between the instrument and clinical ratings were reported, using 38 cases, at two- to five-week intervals. Reliability coefficients were above 0.90 (Beck
et al. 1961; Beck 1970). Test-retest correlations at HOSPITAL ANXIETY AND DEPRESSION SCALE (HADS)
6 and 21 days apart were also carried out by
Gallagher et al. (1982) on a sample of 159 patients The Hospital Anxiety and Depression Scale was
and volunteers from a ‘senior centre’ in Los developed by Zigmond and Snaith (1983). It is a
Angeles. Test-retest correlations for the total brief assessment of anxiety and depression, con-
sample, the normal sub-sample and the depressed sisting of 14 items divided into two subscales for
sub-sample were: 0.90, 0.86 and 0.79 respectively. anxiety and depression, in which the patient rates
Groth-Marnat (1990) reported test-retest reliability each item on a four-point scale. As with all self-
ranged from 0.48 to 0.86 depending on the time interval used.

Gallagher et al. (1982) also carried out tests for the internal consistency of the scale and reported coefficient alphas of 0.91, 0.76 and 0.73 for these three groups respectively, indicating high internal consistency. The extensive review of the literature by Beck et al. (1988) reported that a meta-analysis of the BDI’s internal consistency estimates yielded a mean coefficient alpha of 0.86 for psychiatric patients and 0.81 for non-psychiatric respondents. Lower item intercorrelations of 0.32 to 0.62 were reported by Schwab et al. (1967) on the basis of a study of 153 medical in-patients.

The developers of BDI-II tested the new items on 500 patients and compared the item-option characteristic curves. The BDI-II showed improved clinical sensitivity and higher internal consistency than the original BDI (BDI-II: coefficient alpha = 0.92) (Beck et al. 1996a). In Beck et al.’s (1996b) comparative study, the coefficient alphas of the original BDI and the BDI-II were 0.89 and 0.91 respectively. Krefetz et al. (2002) used the BDI-II with adolescent psychiatric in-patients and reported a Cronbach’s alpha of 0.92.

The BDI has long been regarded as the scale of choice for researchers selecting depressed subjects from a larger population. However, as with most scales, its accuracy depends on people’s motivation to report their emotional state accurately (Stehouwer 1985). Reynolds and Gould (1981) reported a correlation of 0.26 between the BDI and the Crowne and Marlowe Social Desirability Scale, suggesting that it suffers from modest social desirability bias. This is likely to be the case with self-report instruments in general.

It has been used across the world, and the psychometric properties of the translated BDI-II are being reported (e.g. in Spanish, by Penley et al. 2003).

assessment instruments, it is useful for screening, not definitive diagnostic, purposes. The term ‘hospital’ in the title is misleading, as many studies have confirmed that the scale is valid when used in community settings and primary health care (Snaith 2003).

Two common problems with questionnaires for the detection of mood disorders are that scores are affected by the physical illness of the patient, and that there is insufficient distinction between one mood disorder and another. Zigmond and Snaith developed the Hospital Anxiety and Depression Scale partly in response to these problems. The scale measures anhedonic depression, which the authors take as the best indicator of hypomelancholia, and they advised the prescription of anti-depressants for those with a high score. Snaith (1987) also recommends the use of a combined researcher-administered and self-assessment scale.

The authors purposefully excluded all items relating to both emotional and physical disorder (e.g. dizziness, headaches), and the items included in the HADS were based solely on the psychic symptoms of neurosis. They also aimed to distinguish between the concepts of anxiety and depression. The seven items comprising the depression subscale were based on the anhedonic state. The authors justified this by reference to the evidence that anhedonia is probably the central psychopathological feature of the form of depression that responds well to anti-depressant drug treatment, and is therefore clinically useful information. The seven items comprising the anxiety subscale were selected after study of the Present State Examination and analysis of the psychic manifestations of anxiety neurosis. Severity scales were developed, with ratings from 0 to 3.

Content

Unlike most other scales, the HADS is not derived from factor analysis but from clinical experience. It
consists of two sections, with four-point response scales. One section contains the seven items on depression and the other contains the seven items on anxiety. The scale assesses emotional state over the ‘past week’. Examples of the scale are:

I feel tense or ‘wound up’:
Most of the time.
A lot of the time.
From time to time, occasionally.

Worrying thoughts go through my mind:
A great deal of the time.
A lot of the time.
From time to time but not too often.
Only occasionally.

I feel as if I am slowed down:
Nearly all the time.
Very often.
Sometimes.
Not at all.

I get sudden feelings of panic:
Very often indeed.
Quite often.
Not very often.
Not at all.

(HADS Copyright © R.P. Snaith and A.S. Zigmond 1983, 1992, 1994. Record form items originally published in Acta Psychiatrica Scandinavica, Volume 67 (1983). Copyright © Munksgaard International Publishers Ltd., Copenhagen 1983. Extract reproduced by permission of the publishers NFER-NELSON Publishing Company Ltd., Darville House, 2 Oxford Road East, Windsor, Berkshire, SL4 1DF, UK. All rights reserved.)

Scoring

Individual items are scored 0–3 or 3–0, depending on the direction of the item wording. The item scores represent the degree of distress: none = 0, a little = 1, a lot = 2, unbearably = 3. Item scores are summed, and higher scores indicate the presence of problems. Using psychiatric diagnoses as a gold standard, HADS depression ratings of 7 or less were considered to be non-cases, scores of 8–10 doubtful cases, and scores of 11 or more definite cases. Various cut-offs have been used by investigators, but the cut-off of 11+ appears to be preferable in sorting cases from non-cases (Carroll et al. 1993).

Validity

The scales were tested for validity on over 100 psychiatric out-patients and hospital staff, with good results (Zigmond and Snaith 1983). Tests on 50 patients showed that the severity ratings correlated highly with psychiatric assessments (r = 0.70 for depression and r = 0.74 for anxiety) (Zigmond and Snaith 1983; and see Snaith and Taylor 1985). There was evidence that the anxiety and depression items were tapping different dimensions. The scale was easily understood and acceptable to patients.

Fallowfield et al. (1987) reported a good level of acceptability among general medical patients. Aylard et al. (1987) reported correlations with other well-known depression and anxiety scales ranging from 0.67 to 0.77. The HADS has been reported to perform better than the General Health Questionnaire (Goldberg 1978) in identifying cases against the criterion of a psychiatric assessment (Wilkinson and Barczak 1988).

It has also been reported to be equal to the General Health Questionnaire in its ability to detect cases of minor psychiatric disorder (Lewis and Wessely 1990). As a screening instrument, it has a sensitivity of 88 per cent (at a threshold score of 8) when compared with the Structured Clinical Interview for DSM-III (Hamer et al. 1991). It was sensitive to change in a study of treatment of patients with neurotic disorders (Tyrer et al. 1988).

Further tests by Zigmond and Snaith (1983) showed that physically ill patients who were not assessed as having mood disorder had similar scores to the normal sample. Scale scores were therefore judged not to be affected by physical illness. In further support of its construct validity, it has successfully discriminated between people with and without epilepsy (Trueman and Duthie 1998), and between patients with primary Sjögren’s syndrome and arthritis patient controls (Valtysdottir et al. 2000). It was sensitive to patients’ improvements in a randomized controlled trial (RCT) of behaviour therapy (Evans et al. 1999).

Le Fevre et al. (1999) compared the HADS with the GHQ. They reported that a combined score cut-off of 20 had a sensitivity of 0.77 and a specificity of 0.85, with a positive predictive value of 0.48. They also suggested that the anxiety and depression subscales should be scored separately. Lloyd-Williams et al.’s (2001) study of palliative care patients reported that the HADS subscales had low efficiency when used singly as a screening tool. They reported an optimum cut-off of 19, which achieved a sensitivity of 0.68 and a specificity of 0.67, with a positive predictive value of 0.36.

A factor analysis of the scale by Andersson (1993)
reported that a two-factor solution did not split the items in the way originally intended. A four-factor solution with three interpretable factors gave a better fit. However, a factor analysis by Moorey et al. (1991) did confirm the two-factor structure of the scale, which proved stable when sub-samples as well as the total sample were analysed. Martin et al. (2003) conducted a factor analysis of the scale with coronary care patients. They reported that the underlying factor structure of the HADS comprised three distinct factors: anhedonia, psychic anxiety and psychomotor agitation. The factor structure requires further investigation with different clinical groups.

The depression subscale has been reported to have reasonable specificity and sensitivity in an Urdu-language version used among Asian people living in Britain (Nayani 1989). However, the scale was simply translated for this study and not suitably modified for the Asian subjects (see critique by Chaturvedi 1990). Subsequent research has tested the translated scale’s conceptual equivalence and reported it to be satisfactory (Mumford et al. 1991). The scale has been used in several countries, including the Netherlands, Sweden and Spain, with good results for reliability and validity (Berglund et al. 2003; Kuijpers et al. 2003; Quintana et al. 2003).

Reliability

Attempts were made to overcome response bias by alternating the order of responses. For some items the first response indicates maximum severity, and for others the last response does. Four possible responses were chosen to prevent people from opting for a middle grade.

The authors tested the internal consistency of the scale using data from 50 patients. The correlations for the anxiety items ranged from 0.41 to 0.76. The analysis of the depression items revealed one weak item (‘I am awake before I need to get up’). This item was removed from the scale, along with the weakest of the anxiety items. The remaining depression items had correlations ranging from 0.30 to 0.60. Higher correlations were reported by Moorey et al. (1991).

The criteria were then tested for reliability with a further 50 patients, and the results were judged to be satisfactory (the false positive/negative rates were between 1 and 5 per cent). The severity ratings correlated with psychiatric judgements (0.70 for depression and 0.74 for anxiety). Correlations between items and interviewer ratings provided some evidence that the depression and anxiety items were tapping different mood dimensions rather than the same thing, although some overlap is inevitable. The scale was apparently easily understood and completed by patients.

The authors presented the HADS as a reliable and valid instrument, although more work on its reliability and validity is still required.

GOLDBERG’S GENERAL HEALTH QUESTIONNAIRE (GHQ)

The General Health Questionnaire is the most widely applied self-completion measure of psychiatric disturbance in the UK, and it also has numerous applications worldwide. A major advantage for potential users of the GHQ is the existence of periodically updated handbooks. These contain its method, a comprehensive review of applications, and studies of reliability and validity (Goldberg 1978; Goldberg and Williams 1988).

The GHQ was developed in London during the 1960s and 1970s and was intended for use in general practice settings. It was derived from various scales, including the Cornell Medical Index. In the construction of the GHQ, the concept of psychiatric disorder used was one thought to be appropriate to general practice settings.

The GHQ is a screening questionnaire for detecting independently verifiable forms of psychiatric illness; it does not make clinical diagnoses. If diagnoses are necessary, a two-stage strategy must be employed. It is not suitable for the assessment of long-standing (chronic) problems, as it does not detect them. It is a pure state measure, assessing present state in relation to usual state (Goldberg and Williams 1988). This question wording is not distortive, as most people see their usual state as a normal state.

The advantage of the GHQ is that it concentrates on broader components of psychiatric morbidity (particularly anxiety and depression) and is designed to be self-administered. It does not attempt to detect mental subnormality, senile dementia or mania (most of the people within these categories
would not be able to complete the questionnaire). It was not intended to be used for the detection of functional psychoses (schizophrenia or psychotic depression), although these conditions are in fact detected. O’Riordan et al. (1990) studied 111 acute geriatric medical in-patients. They found no significant differences on the GHQ when dementia was the variable (e.g. between depressed patients without dementia and depressed patients with dementia), although threshold scores did require raising.

One further advantage of the GHQ is that several shorter versions exist (12, 20, 28 and 30 items). Although these are slightly less valid and sensitive than the long version (60 items), they are more suitable for use with frail older people.

The 28-item version has an additional advantage over the other versions: it also permits analysis by sub-categories. It was developed mainly for research purposes (Goldberg and Hillier 1979).

There have been numerous applications of the GHQ in survey research and in clinical settings (e.g. GPs’ surgeries). It has been used among 662 people aged 85 and over living in London and was found to be acceptable to respondents. However, some required assistance with completion due to poor sight and stiff finger joints (Bowling 1990). The GHQ-30 is popularly used in large social surveys and epidemiological research in the UK (Huppert and Garcia 1991; Stansfeld and Marmot 1992).

Although it has been used extensively in the UK, there have also been many applications of the scale in other countries, particularly in the USA. The 30-item version, for example, was used in a psychological morbidity survey of 1,649 new adult enrollers in a Health Maintenance Organization in the USA (Berwick et al. 1987). Although the GHQ is culture specific in development, it works well in other settings. Examples include white and black residents in Philadelphia, and samples in Calcutta, China, rural Iceland, Brazil and Australia (Tennant 1977; Mari and Williams 1985; for review see Goldberg and Williams 1988). It has been translated into at least 38 languages.

Content

The original version of the GHQ consists of 60 items; shorter versions of 30, 28, 20 and 12 items have also been developed. The 12-item version is apparently as efficient as the 30-item version as a case detector. Examples of questions, all of which appear in both the GHQ-30 and the GHQ-60 and relate to the past few weeks, are:

Have you recently:

Been able to concentrate on whatever you’re doing?
Better than usual, same as usual, less than usual, much less than usual.

Spent much time chatting with people?
More time than usual, about the same as usual, less time than usual, much less than usual.

Felt on the whole you were doing things well?
Better than usual, about the same, less well than usual, much less well.

Been feeling unhappy and depressed?
Not at all, no more than usual, rather more than usual, much more than usual.

Felt that life isn’t worth living?
Not at all, no more than usual, rather more than usual, much more than usual.

(Items extracted from the GHQ-30 Copyright © David Goldberg, 1978 and GHQ-60 Copyright © David Goldberg and the Institute of Psychiatry, 1978, by permission of the publishers, NFER-NELSON Publishing Company Ltd., Darville House, 2 Oxford Road East, Windsor, Berkshire, SL4 1DF, UK. All rights reserved.)

Scoring

The GHQ consists of a checklist of statements asking respondents to compare their recent experience with their usual state on a 4-point scale of severity. The most commonly used scoring method is the 0–0–1–1 scale. Each item is scored 0 or 1, following the sequence of response categories across the page from left to right. Some items are negative, others are positive. The overall GHQ score is the sum of the item scores. The 0–0–1–1 scoring scale is simply a count of the symptoms. It is the simplest method and also avoids the problems of middle-user response bias. An alternative is a Likert-type severity scoring system.

Because of the nature of its response scale, the GHQ is likely to miss very long-standing disorders. Respondents answer ‘same as usual’ (and thus score zero) for symptoms they are experiencing and have been experiencing for a long time. However, Goldberg and Williams (1988) point out that the loss of cases is minimal as many people cling
to a concept of their ‘usual self’ as being without symptoms. To detect chronic patients, they suggest including questions on medication taking and on whether the person thinks he or she has a nervous illness.

Goodchild and Duncan Jones (1985) argued that the response ‘no more than usual’ to an item describing pathology should be treated as an indicator of chronic illness, rather than of good health as is conventional. They reported data showing that scoring revised to reflect this (0–1–1–1) was more stable in repeated testing. It also provided a better prediction of caseness, measured with the Present State Examination, than conventional scoring. The method divides the GHQ questions into two groups:

- items detecting caseness (negative items, e.g. feeling constantly under strain);
- items indicating health (positive items, e.g. enjoying day-to-day activities).

It then assigns a score to those replying ‘same as usual’ to any of the negative items. The scoring for negative items therefore becomes 0–1–1–1, and for positive items 0–0–1–1. A further advantage is that the scores are more normally distributed, and the scale is a more sensitive indicator when used over time. Given the recent development of this method, Goldberg and Williams suggest that it should be used in addition to previous methods rather than instead of them.

Surtees (1987) tested both methods of scoring in a longitudinal study of psychiatric disorder in women. ROC analysis revealed that both scoring methods discriminated affective conditions from others, with no significant difference in their ability to do so. Goldberg and Williams (1988) demonstrated that very similar results are obtained with the different scoring methods in existence, and that little is gained by a Likert severity score.

Threshold scores are defined as equivalent to the concept of ‘caseness’ that corresponds to the average patient referred to psychiatrists. If a population’s GHQ scores are compared with independent psychiatric assessment, it is possible to state the number of symptoms at which the probability that an individual will be thought to be a case exceeds 0.5. This is called the threshold score. The proportion of respondents with scores above this threshold is the probable prevalence of illness. Finlay Jones and Murphy (1979) have shown that the threshold score must be raised in order to identify ‘cases’ that correspond to standards derived from the Present State Examination.

The amount of psychiatric disturbance in two populations can be compared through the central tendency (mean, median) and dispersion (SD, inter-quartile range) of scores in each population. A given population can also be tested on different occasions to assess changes in psychiatric disturbance over time.

Since physically ill people score highly on the GHQ, it is not surprising that they are over-represented among false positives. Goldberg and Williams suggest raising the thresholds for use with people with severe physical illnesses. Goldberg (1978), in the initial manual for the administration of the GHQ, pointed to the necessity of manipulating the threshold score to enhance discrimination in different populations.

Validity

The principal components and item analyses used during the development of the GHQ ensured that it has content validity. Its construct validity was demonstrated by the principal components analysis, which found a large general factor in all the analyses reported. There is good evidence that assessments of the severity of psychiatric illness are directly proportional to the number of symptoms reported on the GHQ (Goldberg and Huxley 1980). The predictive validity of the GHQ in comparison with other well-known depression scales is also good (Goldberg 1985; Williams 1987).

Over 50 validity studies of the GHQ have been conducted. Vieweg and Hedlund (1983) reviewed many of the earlier studies. Although not perfect, the GHQ correlates well with psychiatric diagnoses of morbidity and depression (Finlay Jones and Murphy 1979; Williams 1987). The GHQ-30 has been the most widely validated; for example, Goldberg and Williams (1988) reported 29 such studies.

Correlations with other gold standards have established the criterion validity of the GHQ. Using standardized psychiatric interviews as a gold standard, reported sensitivities range between 0.55 and 0.92, and specificities between 0.80 and 0.99, depending on the choice of threshold score (Vieweg and Hedlund 1983). Comparisons of GHQ scores with a structured clinical interview (e.g. the Present State Examination or the Clinical Interview Schedule) report correlations from 0.45
to 0.83. The GHQ-60 had the highest correlations and the GHQ-30 the poorest (see Goldberg and Williams 1988 for review).

A study in the USA also found a correlation of 0.72 between the Beck Depression Inventory and the GHQ-30 (Cavanaugh 1983). An Australian study reported a correlation of 0.57 between the Zung depression scale and the 30-item GHQ (Henderson et al. 1981b).

Bowling and Browne (1991) surveyed 662 people aged 85 and over and almost 800 aged 65 and over, living at home in London and in Essex. They reported correlations between the GHQ-28 and Neugarten’s Life Satisfaction Scale A of 0.47 in each case. This is a moderate correlation, reflecting the different dimensions of emotional well-being tapped by these two scales. Functional ability was also predictive of changes in GHQ-28 score in a follow-up study by this research team, supporting the scale’s discriminative ability (Bowling et al. 1992). In further support of the discriminative ability of the GHQ, Bowling et al. (2002) conducted a population survey of almost 1,000 people aged 65 and over. They reported that the GHQ-12 was a significant independent predictor of self-rated quality of life.

In support of its convergent validity, Watson and Evans (1986) studied the health of a multi-cultural sample of mothers with young children in London’s East End. They reported that mothers’ GHQ scores correlated well with interviewers’ ratings of the mothers’ distress.

The 30-item GHQ and the Nottingham Health Profile (NHP) were administered to people suffering from either migraine or arthritis by Jenkinson et al. (1988). Correlations between the emotional reactions subsection of the NHP and the GHQ were moderate (0.49) for both groups of patients.

Although the GHQ was not developed as a predictive tool, some studies have reported findings demonstrating predictive validity, while two studies have reported negative results. Criterion validity was established using health services as the criterion. Those with the highest GHQ scores have been reported to have the highest use of services (e.g. general practitioner services) (Goldberg and Williams 1988). Berwick et al. (1987) provided further evidence of the predictive validity of the GHQ. Elevations of GHQ scores, over two administrations at seven-month intervals, were strongly associated with the probability of both mental-health and non-mental-health care within 12 months of enrolment.

The GHQ is sensitive to transient disorders, detecting symptoms of at least two weeks’ duration. It is as sensitive to depressive disorders as any of the specially designed depression scales (such as the Beck or HADS), and it detects anxiety disorders as well. It is therefore suitable for use when the researcher wants a broader measure. Numerous surveys indicate that the GHQ is suitable for use with younger and older men and women in community and primary health care settings (Sims and Salmons 1975; Tarnopolsky et al. 1979; Benjamin et al. 1982; Cleary et al. 1982; Banks 1983; Hobbs et al. 1983, 1984; Goldberg and Williams 1988).

Watson and Evans (1986) used the GHQ with a multi-cultural sample of mothers with young children. Mothers’ GHQ scores correlated well with interviewers’ ratings of the mothers’ distress. However, an analysis of the translation of the questionnaire into Bengali for the Bengali mothers did reveal some problems. The questionnaire was back-translated by an independent translator. There was a high level of agreement between one translation and the original, with one exception. The item ‘Have you recently been feeling nervous and strung-up all the time?’ was translated as ‘Did you suffer from mental breakdown and mental anxiety?’ The other translation had problems with the item ‘Have you recently been finding life a struggle all the time?’, which was translated as ‘Are you thinking yourself a struggler?’

They also found that not all items were suitable for assessing psychiatric disturbance in mothers with young babies. Examples were ‘Have you recently been having restless disturbed nights?’, ‘Have you been getting out of the house as much as usual?’ and ‘Have you spent much time chatting with people?’ Because young mothers are more likely to have false positive scores on these items, the authors recommended raising the threshold score on the 30-item version from 4 or 5 to 8.

There can be other problems in administration which may affect reliability. Very elderly people with failing eyesight and arthritic fingers may have difficulty completing the GHQ independently. They will require varying degrees of assistance from an interviewer (e.g. reading out the items or recording responses) (Bowling 1990).

Berwick et al. (1987) carried out a further factor analysis of responses to the GHQ, which disclosed six factors (anxiety/strain, confidence, depression, energy, social function and insomnia) and a strong
tendency for items of similar wording (positive phrasing) to cluster together. Ohta et al. (1995) tested the factor structure of the GHQ-30 in a population survey of 1,216 people in Japan. They identified eight factors: depression, anxiety and tension, anergia, interpersonal dysfunction, difficulty in coping, insomnia, anhedonia and social avoidance.

Huppert et al. (1989) also tested the factor structure of the GHQ-30 in a population survey of 6,317 people in Britain, using ten random samples of 600 adults each and 12 age–sex groupings. They reported that the factor structure was highly consistent and comprised fewer (five) distinct factors: anxiety, feelings of incompetence, depression, difficulty in coping, and social dysfunction.

Reliability

Split-half and test-retest reliability have been assessed for the GHQ with good results. Split-half reliability was assessed with 853 completed questionnaires, and the correlation achieved was 0.95. Internal consistency, using Cronbach’s alpha, has been reported in a range of studies, with coefficients ranging from 0.77 to 0.93. Bowling and Gabriel (2004) reported that Cronbach’s alpha, based on their national sample of people aged 65 and over, was 0.83. Test-retest reliability correlations have been reported ranging from 0.51 to 0.90. The correlations are higher with clinically defined groups with a high prevalence of disorder (Goldberg and Williams 1988).

Makowska and Merecz (2000) tested the internal consistency of the Polish version of the GHQ-12 in over 2,500 employees, and that of the GHQ-28 in over 1,000 employees. The Cronbach’s alphas of both were high (0.859 and 0.934 respectively) and comparable to the results of the early studies of the GHQ’s internal consistency. The correlation for test-retest reliability was 0.70.

A problem posed by test-retest reliability is one of distinguishing between true change and unreliability. Goldberg and Williams state that the definitive test-retest reliability study of the GHQ remains to be done.

STATE–TRAIT ANXIETY INVENTORY (STAI)

The STAI was developed in the 1960s and revised in 1983 (Spielberger et al. 1983). It measures both an in-built tendency to respond anxiously and current feelings of anxiety. It enables the investigator to distinguish between two forms of anxiety (Chaplin 1984):

- state: temporary or transitory feelings of fear or worry;
- trait: a dispositional or long-standing tendency to respond anxiously to stressful situations, or proneness.

The pre-1983 versions of the STAI are known as X1 and X2, and the revised 1983 versions as Y1 and Y2. Marteau and Bekker (1992) developed a six-item state scale of the STAI, which performed as well as the longer version.

The STAI was developed from an item pool of 177 questions taken from existing anxiety scales. These were subjected to various tests to ensure acceptable levels of consistency and content. This was repeated in order to develop the items for ‘state’ and the items for ‘trait’. Extensive testing of items was carried out, largely on college students (n = 5,000). The final version was subjected to item and factor analyses. The essential qualities measured by the STAI are feelings of apprehension, tension, nervousness and worry.

Content

The STAI consists of 20 items for measuring trait anxiety and 20 items for measuring state anxiety (Spielberger et al. 1970, 1983). It is printed on a single sheet, with the state-anxiety scale on one side and the trait-anxiety scale on the other.

Each state item is rated on a four-point intensity scale, from ‘not at all’ to ‘very much so’. Respondents are asked to blacken the circle of the appropriate response to indicate how they feel ‘right now’. Each trait item is rated on a four-point frequency scale, from ‘almost never’ to ‘almost always’.

The instrument is self-administered and takes less than ten minutes to complete. The state-anxiety scale should be administered first, to avoid bias from anxiety arising from the test conditions.

Scoring

The STAI has two scores. One reflects the current level of state anxiety, with scores between 20 and 80. The other indicates the level of trait anxiety and also ranges from 20 to 80. Higher scores indicate greater anxiety. The items are simply summed to obtain the scores, although the coding of some items requires prior
reversal. A manual is available for purchase, and it includes a computer program for analysis written in SAS (Statistical Analysis System).

Children

There is a version of the STAI for children, which is regarded as one of the best available for research purposes (Walker and Kaufman 1984). It has the same 20:20 item format for trait and state as the adult version and has good results for reliability and validity, although it depends on reading ability (Spielberger et al. 1973, 1983; Walker and Kaufman 1984). One example of a state scale item asks children about the presence of feeling upset (‘I feel . . . very upset/upset/not upset’). An example of a trait scale item asks about the frequency of behaviours such as sweaty hands (‘My hands get sweaty’) and worry (‘I worry about school’).

Examples from the STAI:

Form Y1

I feel calm
I am tense
I feel upset
I feel self-confident
I am worried

Not at all (1) / somewhat (2) / moderately so (3) / very much so (4)

Form Y2

I feel pleasant
I feel satisfied with myself
I worry too much over something that really doesn’t matter
I am happy
Some unimportant thought runs through my mind and bothers me

Almost never (1) / sometimes (2) / often (3) / almost always (4)

Validity

Testing of the original scale for construct validity against other anxiety scales produced correlations of between 0.52 and 0.80. The scale has been successfully used in many clinical studies, and it has been reported to correlate well with other tests of personality. It is able to distinguish between normal adults and different groups of psychiatric patients (e.g. schizophrenia, anxiety reaction). In further support of its construct validity, the STAI-state showed higher mean values in stressful situations than in neutral or relaxed situations (Spielberger et al. 1983; Chaplin 1984). Novy et al. (1993) studied 285 people in anxiety-provoking and stressful situations associated with pain. They reported that the STAI correlated strongly with different depression scales, including the Beck Depression Inventory (Beck et al. 1961).

In a study of 180 randomly sampled people living in Stockholm, Forsberg and Bjorvell (1993) reported that the STAI-state was significantly and negatively associated with reported health symptoms and the Rand Health Perceptions Battery. As would be expected, the better the health and perceptions of health, the lower the rated anxiety. In a review of the scale evidence, Chaplin (1984) concludes that the measure of state anxiety is stronger (in terms of validity) than the measure of trait anxiety. Kvaal et al. (2001) warned against its use with very elderly patients. Their study of geriatric in-patients showed that a high STAI state subscore was biased in this population due to confounding by reduced well-being.

Reliability

The alpha coefficient has been reported to be high for both state (α = 0.93) and trait (α = 0.90) anxiety, indicating internal consistency. Test-retest correlations with college students at intervals from 1 hour to 104 days showed stability for the trait scale (0.65–0.86), but less so for the state scale (0.16–0.62). Lower repeatability of the state scale would be expected, however, as it measures responses to transient situations (Spielberger et al. 1983). The developers reported that factor analyses with 2,000 students and Air Force recruits confirmed the scale’s homogeneity.

The STAI is one of the most widely used measures of anxiety in psychological and clinical research. It was reviewed by Andrews et al. (1994). It can be previewed briefly at http://www.mindgarden.com before deciding whether to purchase it.

THE GERIATRIC DEPRESSION SCALE (GDS)

The Geriatric Depression Scale was developed as a screening instrument in a clinical setting to facilitate
assessment of depression in older adults (Yesavage et al. 1983; Sheikh et al. 1991). Scales for assessing depression in younger people are not necessarily suitable for use with older people because some symptoms indicative of depression may occur as a result of physical illness, resulting in false positive cases if they are included. The GDS does not include somatic symptoms.
It is suitable for assessing severity, and therefore for monitoring the outcome of treatment. A short form was also developed (Sheikh and Yesavage 1986). It can be used in clinical and normal populations, in community, hospital and residential settings. Although it is a self-rating scale, the developers recommend that it be administered orally by an interviewer, on the grounds that cognitive problems can affect the accuracy of self-reported problems. It is simple to administer, and takes about ten minutes.

Content
The GDS was developed from a pool of 100 items generated by clinicians and researchers. The 30 items which formed the original GDS were selected on grounds of their high item-total correlations. The GDS has a dichotomous Yes/No response choice for each item. Later versions retained the most powerful items. The most commonly used version contains 15 items (GDS-15). The time frame is feelings over the past week.
Examples of items in the 15-item version are:
1 Are you basically satisfied with your life?
3 Do you feel happy most of the time?
5 Do you feel that life is empty?
6 Do you often get bored?
9 Do you feel helpless?
14 Do you feel that your situation is hopeless?

Scoring
Responses indicating depression are scored 1 and those indicating no depression 0; the 1s are summed to form a total score, with all items treated as equal in weight. Scores on the original 30-item scale are usually interpreted as 0–9 = no depression; 10–19 = mild depression; and 20–30 = severe depression. Scores on the short 15-item version are interpreted as 0–4 = no depression; 5–9 = mild depression; and 10+ = severe depression. Some variations in score interpretation exist (usually by one point).

Validity
It has been reported to perform comparably with the Hamilton Depression Rating Scale in discriminating between patients with different levels of depression, according to the Research Diagnostic Criteria (Yesavage et al. 1983). It has also been reported to correlate significantly with the Hamilton Depression Rating Scale (0.62–0.81) (Lyons et al. 1989), and with the Beck Depression Inventory (0.85) (Norris et al. 1987). The sensitivity of the GDS has generally been reported to be high, and the specificity somewhat lower, although this has varied between studies (Brink et al. 1983; Koenig et al. 1988; Agrell and Dehlin 1989).
The validity of the short version was assessed with good results against psychiatric diagnoses and against other validated scales among people living in residential care in London (Richardson and Hammond 1996), and its validity approaches that of the original 30-item scale (Van-Marwijk et al. 1995). It was used in a large epidemiological survey of 14,545 people aged 75 and over in the UK, which reported that, as would be expected, women were significantly more likely than men to score over all thresholds (Osborn et al. 2002). Alden et al. (1989) reported a correlation between the long and short versions of the GDS of 0.66, while Sheikh and Yesavage (1986) reported this correlation to be 0.84.

Reliability
Yesavage et al. (1983), in a study of normal and depressed older people, reported coefficient alpha to be 0.94, although other investigators have reported slightly lower coefficients (Lesher 1986; Agrell and Dehlin 1989; Lyons et al. 1989). In a sample of nursing-home residents, Brink et al. (1983) reported the test-retest reliability of the GDS to be between 0.85 and 0.98 at 10 to 12 days, and inter-rater reliability to be 0.85. Richardson and Hammond (1996) also reported the test-retest reliability of the short version to be good among people living in residential care in London. Some studies have reported that the GDS is less psychometrically sound when used with cognitively impaired people (Kafonek et al. 1989), although Sutcliffe et al. (2000) reported that the internal reliability of the shorter GDS-15 was highest if three items were removed, and that the resulting 12
items were a suitable and reliable tool for use with people in long-term care homes.
The scale is in the public domain and has been translated into over 20 languages. On a user-friendly website on the GDS, the scale developers provide the contact addresses for translated versions, but caution that they cannot vouch for their accuracy (http://www.stanford.edu/~yesavage/GDS.html). The short version is also available at http://www.jr2.ox.ac.uk/geratol/GDSdoc.htm

THE GERIATRIC MENTAL STATE (GMS)
The problem of recognizing depression in elderly people is exacerbated because it may present with features similar to dementia; the two conditions may also coexist. This has led many researchers in psychiatry to turn to instruments of high validity and reliability in relation to the elderly (Gurland 1980; Henderson et al. 1983). These include the Geriatric Mental State Examination and the Comprehensive Assessment and Referral Evaluation (CARE) interview schedule, or its short form.
The Geriatric Mental State (GMS) was developed by Copeland et al. (1976) from British and US instruments: the Present State Examination (PSE) (Wing et al. 1974), and the Psychiatric Status Schedule (Spitzer et al. 1970). A version suitable for use in community settings was developed, and used within a broader semi-structured interview schedule called the Comprehensive Assessment and Referral Evaluation (CARE) interview (Gurland et al. 1983; Copeland et al. 1987a, 1987b). The CARE interview is broad and covers psychiatric, medical, nutritional, economic and social problems.
Specialized interviewer training is required for CARE and for the GMS.
The psychiatric diagnosis derived from the GMS has since been standardized by the application of computer methods and called AGECAT (Dewey and Copeland 1986). AGECAT was used on a sample of general practitioners’ elderly patients in Liverpool and produced a diagnosis of depressive neurosis in 8 per cent of cases, and of depressive psychosis in 3 per cent (total 11 per cent). If marginal cases were included, the incidence of mild depression was 22 per cent, which was close to the 20 per cent found in the early study in Newcastle (Kay et al. 1964; Copeland et al. 1987a, 1987b).
The GMS is a well-validated and tested scale but is extremely lengthy, although a shorter community version (GMSA) is available. The short version was developed on the basis of data from a random sample of 396 elderly people living at home in London. There is also an even shorter version of this, the SHORT-CARE, which covers only dementia, depression and disability (Gurland et al. 1984). The full-length GMS is frequently used as a gold standard against which to assess other scales of mental state; it has been chosen as the principal instrument in the three Medical Research Council multi-site studies in the UK and in three similar studies initiated by the World Health and Pan-American Health Organizations (Copeland 1990).
The full GMS takes between 30 and 45 minutes to administer. The CARE, which has 1,500 items concerned with expressed physical symptoms, social problems and the use of services, takes between one-and-a-half and two hours. The short community version (GMSA) takes 15–20 minutes.

Content
The GMS consists of standard tests and interviewer ratings of the degree of confidence in the information. The confusion items are standard (e.g. checks on day of week). Examples of other items are:
Guilt
Do you tend to blame yourself for anything or feel guilty about anything? What?
Code: Obvious self-blame over past and present peccadilloes. Mentions regrets about past which may or may not be justifiable.
Irritability
Have you been more irritable (angry) lately?
Code: Admits to irritability (anger).
Concentration
Can you concentrate on a television (radio, film) programme? Can you watch it (listen to it) all the way through?
Code: Difficulty in concentrating on entertainment.
Perceptual distortion
Is something odd (strange) going on which you cannot explain?
Code: Puzzled by something odd going on.
Scoring
Detailed instructions on the rating, coding and scoring procedures, which are fairly lengthy, are available from the authors in the Department of Psychiatry at the University of Liverpool.

Validity
The full CARE interview was tested against psychiatric judgements on 396 people in Greater London, with high overall agreement: agreement on depression and dementia reached 88 per cent of cases (Cohen’s kappa for overall agreement was 0.7), and AGECAT (a computerized diagnostic system) was judged to provide a reasonable diagnosis (Copeland et al. 1986). Other studies by the authors have also reported good levels of agreement (Copeland et al. 1987a, 1987b). It has been shown to be sensitive to change over time, and agreement with clinical diagnosis is good (Copeland and Gurland 1978). The GMS has been the subject of extensive validity studies with good results (Copeland et al. 2002).
The authors of the instrument have also published a number of papers and reviews based on analyses of their studies using the GMS (Copeland et al. 1986, 2002; Dewey and Copeland 1986; Copeland et al. 1987a, 1987b, 1988; Davidson et al. 1988; McWilliam et al. 1988; Sullivan et al. 1988; Saunders et al. 1989). Copeland et al. (2002) acknowledge that the many studies of the GMS and CARE have also exposed weaknesses which require addressing, such as the over-diagnosis of organic states in populations with poorly developed education systems, which calls for the development of more culture-specific instruments.

Reliability
The full GMS has been shown to be reliable between trained psychiatric interviewers (Copeland and Gurland 1978). Reliability studies were initially undertaken in London and New York, and with the shorter versions in Australia and Liverpool. Inter-rater reliability was 69 per cent for the first study but only 38 per cent in the second, although partial agreement was much higher, at between 80 and 85 per cent. No significant differences were found between interviewers in the USA and the UK in the number of positive ratings made.
In the Australian study, Henderson et al. (1983) reported satisfactory inter-rater reliability of the shortened version, with average correlations of 0.56 and 0.84.
Although lengthy, the GMS is now established as one of the most commonly used mental health assessments for older people (Copeland et al. 2002). Since 1976 the GMS has been translated into French, Spanish, German, Danish, Dutch and Icelandic.

SHORT MENTAL-CONFUSION SCALES
Mental impairment is of such frequent occurrence among very elderly populations that a method of assessing it in survey and evaluative research with this population group is now essential.
Mental confusion tests generally have a fairly similar content and vary in length and complexity. There are several short ten-item tests in use worldwide (see Measuring Disease (Bowling 2001) for a broader review). Each is culturally specific, although adaptations are minor (e.g. US scales ask about the name of the last President; UK scales ask about the name of the last Prime Minister or the name of the Queen).

THE MENTAL STATUS QUESTIONNAIRE (MSQ)
The Mental Status Questionnaire is a concise measure of orientation in time and place, memory and knowledge (Kahn et al. 1960a, 1960b). The items were drawn from standard mental-status examinations and clinical experience. The final version contains ten items. It was also modified in the USA by Pfeiffer (1975) as the Short Portable MSQ. This also contains ten items, five of which are identical to five in the MSQ, with five other items selected; its scoring procedures also differ from those of the MSQ.
The MSQ has been popular and widely used in both community and clinical populations. It was successfully used with older people living at home in the USA by Cornoni-Huntley et al. (1985) and has been judged to be a useful measure in institutional settings (Ebmeier et al. 1988). Wilson and Brass (1973) reported that it could be easily administered to 90 per cent of geriatric-ward patients. They indicate that half the questions could be asked without the patient knowing that he or she
is being tested, and the other half can follow after a brief explanation: ‘How is your memory? I would like to test it.’ They reported that it rarely provoked anxiety or embarrassment. It can be given without causing fatigue in the very ill. It is usually interviewer administered.
In an analysis of the ten-item MSQ, Wilson and Brass (1973) found that the first item (‘Name of town’) could be excluded, as it contributed little to discrimination. Only the remaining nine items are usually used, and it has become the norm to refer to the scale as having nine items. The authors advise caution in its use with people who are deaf and suggest writing down the questions, and acknowledge that it cannot be used with dysphasic patients. There are, however, several versions of the MSQ, and different cut-off points are in use.

Content
The nine questions are:
Address/name of place?
Today’s date (error of three days on either side of correct date allowed)?
Month?
Year?
Age?
Year of birth?
Month of birth?
Name of prime minister?
Name of previous prime minister?

Scoring
The number of correct answers is scored (correct = 1; incorrect = 0). There is no accepted cut-off for normal/abnormal, although 3 or more errors are commonly taken to indicate impairment; opinions also vary over the scores indicating severity (Zarit et al. 1978).

Validity
Kahn et al. (1960b), on the basis of 1,066 patients in homes for the aged, nursing homes and state mental hospitals in New York City, reported that the MSQ was highly associated with psychiatrists’ evaluations of the presence and degree of chronic brain syndrome. However, the authors simply reported percentage differences without significance values. Milne et al. (1972), in a longitudinal survey of mental health, compared results from the MSQ with the Mental Impairment Measurement, designed by Isaacs and Walkey (1964), and found the MSQ to be less sensitive than the latter. Fillenbaum (1980) tested the discriminative ability of the MSQ among people with psychiatric diagnoses of organic brain syndrome and a normal population. Sensitivity was fairly low at 45–55 per cent, while specificity was high at 96–98 per cent. Fillenbaum also reported two studies which showed correlations of 0.88 and 0.97 between the MSQ and the Short Portable MSQ. Wilson and Brass judged the MSQ to be a powerful measure for detecting and quantifying mental impairment.

Reliability
Kahn et al. (1960a, 1960b) reported reliability to be satisfactory. In a series of 55 cases, selected for the likely stability of their physical condition, the MSQ was administered four times at three-week intervals: three-quarters of the scores changed by one or less (Wilson et al. 1973). Scores on five items (town, place, age, month born, year born) correlated fairly strongly with scores on the five test items on the prime minister’s name and the current date (r = 0.68) (Wilson and Brass 1973).

THE ABBREVIATED MENTAL TEST SCORE (AMTS)
The Abbreviated Mental Test Score (AMTS) was developed by Hodkinson (1972) from the Modified Roth Hopkins Test (Blessed et al. 1968; Thompson and Blessed 1987). The most discriminating items were incorporated. There was no evidence that patients’ performance improved with practice.

Content
The test contains ten items. These cover the following:
Age
Time (nearest hour)
Year
Name of place
Recognition of two persons
Birthday (date and month)
Date of World War I
Queen’s name
Counting 20–1 backwards
Five-minute recall: full street address
Scoring
Scores are totalled. Each correct item scores 1; maximum score: 10. The cut-offs for scoring (normal/abnormal) have varied between studies. Jitapunkul et al. (1991) briefly reviewed the literature and reported that investigators have used cut-offs in the range of <6 to <10 (a score of 10 = cognitively normal). They carried out a study of 168 acutely ill patients admitted to a ward for the care of the elderly, validated the AMTS against clinical diagnoses plus medical records (based on DSM-III-R criteria) (tested on admission and one week later), and reported that the best cut-off was 8.

Validity
It has been further tested against the Crichton Royal Behaviour Rating Scale, and clinical diagnosis of dementia, by Vardon and Blessed (1986), using 99 residents of homes for the elderly with a mean age of 82.7. The authors reported that the AMTS does reveal significant cognitive decline which is characteristic of dementias: over 80 per cent of residents allocated a clinical diagnosis of dementia answered 55 per cent or fewer of the items correctly. It produced scores closely similar to those obtained on the Modified Roth Hopkins Test. It has also been tested against a 37-item Roth Hopkins test by Thompson and Blessed (1987). These authors based their research on 52 mentally ill psycho-geriatric day-care patients (mean age: 75). They reported that the ten-item test was better tolerated. The functionally ill scored consistently better than the organically ill. Correlations with longer mental-status scales were 0.91 to 0.96, which were comparable with those reported by Qureshi and Hodkinson (1974) for institutionalized patients (the latter reported correlations with longer mental-test scales of 0.87 to 0.96). Swain et al. (1999), in a study of patients admitted to an elderly medicine unit, reported that cognitive state as determined by the MSE was significantly associated with that determined by the Mini-Mental State Examination, and the predictive efficiency of the AMT was 79 per cent.
Jitapunkul et al. (1991) developed a short version of the AMTS (AMT7). Its validity, internal consistency and coverage of the relevant domains were comparable to those of the AMTS, but it had a slightly higher sensitivity (81 per cent), with acceptable specificity (85 per cent), than the full AMTS. Cronbach’s alpha for the AMT7 was 0.85 (it was 0.89 for the full version). The proportion of patients correctly classified was 89.9 per cent (91.1 per cent for the ten-item test). Swain and Nightingale (1997) tested a four-item version of the AMT (age, date of birth, place and year) among elderly patients seen on domiciliary visits, with impaired cognition indicated by a score of less than four. This correlated highly with the ten-item AMT (0.90), and had little loss of efficiency (predictive efficiency: 91 per cent). Antonelli Incalzi et al. (2003) tested the AMT on over 2,000 acute geriatric in-patients in Italy. They reported that a principal components analysis isolated two components of the AMT: orientation to time and space (which explained 45 per cent of AMT score variance) and memory and attention (which explained 13 per cent). They reported that AMT scores of more than 6 reliably ruled out dementia, but scores of less than 7 required a second-level cognitive assessment.

Reliability
Its performance has been assessed in institutional and community settings, with good results for inter-observer reliability and repeatability (Qureshi and Hodkinson 1974; Vardon and Blessed 1986; Little et al. 1987; Thompson and Blessed 1987), although there are few published data on its reliability.
The AMT is widely used in descriptive (Hamilton et al. 2000) and evaluative research, and has been reported to be a useful measure across Europe (e.g. Sarasqueta et al. 2001; Zuccala et al. 2003).
6
MEASURING SOCIAL NETWORKS AND SOCIAL SUPPORT
SOCIAL-NETWORK ANALYSIS
The largest body of empirical research on the predictors of well-being has focused on the structure, functioning and supportiveness of human relationships, the social context in which people live, and their integration within society. The emphasis on social health is supported by research on the public’s priorities in life, which has reported that social relationships and activities are among the most important areas of life nominated by the public, and a main area that gives quality to the lives of people aged 65 and over (Bowling 1995; Farquhar 1995b; Bowling and Windsor 2001; Bowling et al. 2003).
Network models have been employed in numerous areas of sociology, psychology and anthropology. In research on health, lack of social support, participation and contact has been associated with increased mortality risk, delayed recovery from disease, and poor morale and mental health (e.g. Lowenthal and Haven 1968; Berkman and Syme 1979; Lin et al. 1979; Blazer 1982; House et al. 1982; Welin 1985; Bowling and Charlton 1987; Cohen et al. 1987, 2000b; Maes et al. 1987; Orth-Gomèr and Johnson 1987a; Seeman et al. 1987; Kaplan et al. 1988; Sugisawa et al. 1994; Oman and Reed 1998). The evidence that social support is beneficial to health is considerable (Olsen 1992; Stansfeld 1999), and there is some evidence that this benefit remains influential in very old age (Grundy et al. 1996). However, not everyone fits the same pattern, and research findings are also contradictory (Schoenbach et al. 1986; and see reviews by Bowling 1991, 1994; Bowling and Grundy 1998), which may partly reflect the lack of standardization in the choice of measurement instruments.
Social networks and social support are two different concepts. Network analysis was originally developed by sociologists and social anthropologists, although recent methodological advances have been due to the increasing involvement of social psychologists in this area (Mitchell and Trickett 1980). Sociologists believe that the characteristics of the network have some power to explain the social behaviour of the people involved (Mitchell 1969). The framework most applicable to the study of social support derives from the theory of social networks, which describes transactions among individuals: each individual is a node in the network and each exchange a link. Networks are defined as the web of identified social relationships that surround an individual and the characteristics of those linkages, that is, the set of people with whom one maintains contact and has some form of social bond. Social contacts and relationships are important ways for the individual to influence the environment, and they provide pathways through which the environment influences the individual (Saronson et al. 1977).
The importance of social networks, and their characteristics, lies in the extent to which they fulfil members’ needs. Their functions can be summarized as ‘that set of personal contacts through which the individual maintains his social identity,
and receives emotional support, material aid, services, information and new social contacts’ (Walker et al. 1977). House (1981) has suggested that social support involves emotional concern (liking, love); instrumental aid (services); information (about the environment); and appraisal (information for self-evaluation). One approach to defining social support proceeds from a consideration of its source, such as who provides it; the functions it serves for people (e.g. material aid); and the intimacy characteristics of the relationship (e.g. whether it is a confiding relationship) (Tolsdorf 1976). Thus social support can be defined as the interactive process in which emotional, instrumental or financial aid is obtained from one’s social network. Cobb (1976) defines social support as ‘information leading the subject to believe that he is cared for and loved, esteemed, and a member of a network of mutual obligations’. On this definition, support exists only if it leads to certain beliefs in the recipient. Thoits (1982) expanded this model to include instrumental aid. Despite several attempts to conceptualize social support, no agreement has yet been reached.
Several characteristics of networks appear relevant in terms of support. First, people must have connections with other people (a network) in order to receive social support, but social connections do not guarantee access to social support. Finch (1989) also stressed the importance of the type of genealogical relationship, the past pattern of social exchanges, the balance of dependence and independence in the relationship, timing in life, and the quality of the relationship. Sarason et al. (1994) argued that theories of social support need to incorporate the complexity of the situational, interpersonal and intrapersonal processes that shape people’s perceptions of their interactions with others. Other relevant dimensions are:

Size: the number of people maintaining social contact; this can include those who are only called on when needed.
Geographic dispersion: networks vary from those confined to a household, to those in a single neighbourhood, and those that are more widely dispersed. Transport facilities may influence frequency of contact.
Density/integration: the extent to which network members are in each other’s networks.
Composition and member homogeneity: friend, neighbour, children, sibling, other relatives; similarities between members (age, socio-economic status, etc.).
Frequency of contact between members.
Strength of ties: degree of intimacy, reciprocity, expectation of durability and availability, emotional intensity.
Social participation: involvement in social, political, educational, church and other activities.
Social anchorage: years of residence in, and familiarity with, the neighbourhood; involvement in the community.

These structural characteristics of the network will influence the availability of instrumental and emotional support, its adequacy, satisfaction with and perceptions of the network, and the support/aid obtained.
Networks can thus be operationally defined in terms of size, geographic dispersion, strength of ties, density/integration, composition and member homogeneity (Mitchell 1969; Craven and Wellman 1974; Walker et al. 1977).
These structural characteristics are useful in calculating the number and distribution of relationships within a network and their degree of connection. The emerging patterns can then be studied in relation to the particular life situation of the individual. It can be hypothesized that different types of network structure have differing degrees of significance depending on the nature of the need to be met.
‘Dynamic features of the social network’ refers to the positive or negative nature of network interactions. Analyses need to take account of the nature of the human emotions involved. The size of the network and calculations of frequency of contact between members are of little value if these interactions are negative and stressful. It is also possible that the existence of a single confidant(e) is of greater value in meeting an individual’s emotional needs than a larger number of more superficial friendships. The individual’s subjective ‘view’ of the network takes into account the meaning of relationships and the strength of affectional ties. A different but related dimension is the concept of loneliness, which can be a consequence of perceived or actual poor emotional or social support. This concept, with the measurement of loneliness, is discussed at the end of this chapter.
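Two of the structural dimensions described in this section, size and density/integration, lend themselves to simple calculation once a respondent’s network has been enumerated. The Python sketch below is illustrative only: the data layout (a set of named members plus a set of undirected member-to-member ties), the function names and the convention of excluding the respondent (ego) from the density calculation are assumptions, not a procedure prescribed by the sources cited here.

```python
from itertools import combinations

# Hypothetical layout: network members are named in a set, and a tie between
# two members is stored as an unordered pair (frozenset). The respondent
# (ego) is assumed to be excluded from both.
Tie = frozenset[str]


def network_size(members: set[str]) -> int:
    """Size: the number of people with whom the respondent maintains contact."""
    return len(members)


def network_density(members: set[str], ties: set[Tie]) -> float:
    """Density: observed member-to-member ties divided by all possible ties.

    Returns a value between 0.0 (no members know each other) and 1.0 (every
    member is in every other member's network). Ties involving people
    outside `members` are ignored.
    """
    n = len(members)
    possible = n * (n - 1) // 2
    if possible == 0:
        return 0.0
    observed = sum(
        1 for a, b in combinations(sorted(members), 2)
        if frozenset((a, b)) in ties
    )
    return observed / possible


if __name__ == "__main__":
    # Invented example network of four members.
    members = {"daughter", "son", "neighbour", "friend"}
    ties = {
        frozenset(("daughter", "son")),
        frozenset(("neighbour", "friend")),
        frozenset(("daughter", "neighbour")),
        frozenset(("friend", "GP")),  # the GP is not a member, so ignored
    }
    print(network_size(members))           # 4
    print(network_density(members, ties))  # 0.5 (3 of 6 possible ties)
```

Density on its own says nothing about whether ties are supportive; as noted above, the positive or negative nature of interactions has to be measured separately.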
SOCIAL CAPITAL
There is increasing interest among health researchers and social epidemiologists in broadening social measures to include social capital in order to study its effects on health status and mortality. While the social deprivation of areas has been studied in relation to morbidity and mortality (Yen and Kaplan 1999), broader social capital is still a disputed concept, and its definition and measurement are still evolving. It is generally used to refer to the collective value of all formal and informal social networks. Investigators have used a conceptual mixture of indicators of both the structure and the function of social relations: for example, community membership (structure), and the moral resources of trust, bonding, information flows and cooperation between people, and reciprocity (function, or a by-product of the function) (Putnam 1995, 2000; Coulthard et al. 2001). The latter are said to be influenced, and fostered, by the availability and type of societal, environmental and neighbourhood (community) facilities and resources (i.e. those which enable group membership, community and civic engagement). In theory, these are all factors that can affect both perceived autonomy and self-actualization; they can also act to improve the health, wealth and industry of a community (Putnam 2000).
More specifically, social cohesion can be defined as the connectedness and solidarity between groups of people (Kawachi and Berkman 2000). A cohesive society is marked by its supportiveness, rather than forcing individuals to rely entirely on their own resources (Durkheim 1897), and is well endowed with stocks of social capital (Kawachi and Berkman 2000). The concept incorporates shared value systems and interpretations, perceptions of a common identity, a sense of belonging to the community, and trust and reciprocity between individuals and towards institutions. It is typically measured with questions about feelings of commitment and trust, values and norms, and feelings of belonging.
Social capital can be defined as a subset of the concept of social cohesion: as those features of organizations and social structures which act as resources for individuals to form connections and social networks, and which lead to norms of reciprocity, trust, collective action, beneficial cooperation and organization between members (e.g. high levels of interpersonal trust and mutual aid) (Putnam 1995, 2000; Kawachi and Berkman 2000). It thus refers to the extent to which communities offer members opportunities, through active involvement in social activities, voluntary work, group membership, leisure and recreation facilities, political activism and educational facilities, to increase their personal resources (i.e. their social capital) (Coleman 1984; Putnam 1995; Brissette et al. 2000). Stocks of social capital are said to be cumulative, as collaborations build and extend social connections.
High levels of social capital have been reported to be independently associated with lower mortality rates, with better self-rated health and functional status, and with better quality of life (Kawachi et al. 1997a, 1997b, 1999; Grundy and Bowling 1999; Kawachi and Berkman 2000; Ross and Mirowsky 2001; Bowling et al. 2003; Hyyppa and Maki 2003). Although there is an increasing amount of research on the stock of social capital by the socio-demographic characteristics of individuals (Li 2003; Coulthard et al. 2001), relatively few investigators have fully explored the independent associations between social capital and physical health, mortality, psychological well-being or quality of life (Brissette et al. 2000). This requires careful control of socio-demographic variables and the use of multi-level analysis.
Measures of broader social capital have included objective indices of crime, pollution, cost of living, shopping facilities, access to areas of scenic quality, cost of owner-occupied housing, education facilities, policing, employment levels, wage levels, unemployment levels, climate, access to indoor/outdoor sports, travel to work time, access to leisure facilities, quality of council housing, access to council housing and cost of private rented accommodation (listed in order of their perceived importance to people’s quality of life) (Flax 1972; Rogerson et al. 1989; Rogerson 1995). Other indicators have included access to convenient and affordable transport, various community resources and facilities, and the general characteristics of neighbourhoods. Subjective indicators include public values, perceptions and levels of satisfaction with the area of residence, its facilities, transport and travel to work time, and perceptions of neighbourliness and safety from crime (Rogerson et al. 1989; Cooper et al. 1999; Coulthard et al. 2001).
There has been little standardization of measures of social capital, and researchers have often devised their own items to measure this concept. However, in the UK General Household Survey (GHS), the Office for National Statistics (ONS) has used a set of social capital questions which was developed from a cross-government review, conducted by ONS, of social capital research (Cooper et al. 1999; Walker et al. 2001). They were based on Putnam’s (1995) definition of social capital, and the items were subject to tests of internal consistency (coefficient alpha) and factor analyses before being included in the final questionnaire (Coulthard et al. 2001). The questions cover length of residence in the area, enjoyment of the area, ratings of local services and facilities, safety, perception of power to influence neighbourhood decisions and civic engagement, crime and experience of crime, ratings of anti-social behaviour, litter, dog mess, noise, teenagers hanging around and drug problems, and feelings about their immediate neighbourhood of residence (defined as their ‘street or block’), including neighbourliness, trust in neighbours and social contacts. These questions are displayed in the annex of the report on the 2000 GHS (Walker et al. 2001), and the detailed responses and questions are displayed in the accompanying report on social capital (Coulthard et al. 2001).

METHODS OF MEASUREMENT OF SOCIAL NETWORKS AND SOCIAL SUPPORT

There is currently no assessment scale which comprehensively measures the main components of social network and support with acceptable levels of reliability and validity. Part of the problem stems from lack of agreement on conceptual bases, or even failure to consider these at all.
Many surveys have relied on single-item questions such as marital status, frequency of contact with others and existence of a confidant(e). Single-item measures have been found to be powerful predictors of health status and mortality but alone provide no insight into the dynamics of the network. It is important to match methodology to the empirical issue and to the correct disciplinary approach. Among the studies using only simple measures of social networks was the Alameda County Human Population Laboratory study, which provided the most convincing evidence of a link between social networks and mortality (Berkman and Syme 1979). This measure was modified by Lubben (1988), and labelled the Lubben Social Network Scale. If more studies used adequate and standardized measures, evidence of the links might prove to be stronger and the debate about links less controversial.
There is now a trend away from simple totalling of social contacts and single-item questions towards developing scales. Many earlier approaches simply assessed social network and support by questions on marital status and household composition (see review by Hirsch 1981). The implication from the early studies on social ties and mental and physical health is that social ties are important and merit further study. Studies relying on single-item questions and crude or simplistic measures cannot be used to derive substantive implications for practice. It is necessary to differentiate relationships according to their content, process and development. Failure to obtain these data precludes information on how social networks function as social support systems. Survey questions that do not separate social support from network structure do not permit identification of the social conditions under which help and support are provided. It cannot be assumed, for example, that having a daughter living nearby will necessarily lead to adequate support and help. Most attempts to measure quality of the network consist of questions asking respondents whom they are close to, whom they are in contact with and whether they see enough of the people mentioned. Such single items are inadequate as evaluations of the perceived quality of relationships (Adams 1967). Some assessment of satisfaction in relation to social support is essential, given that this has been found to be related to well-being, and its relationship to frequency of contact or perceived support is inconsistent in the literature (Fiore et al. 1986; Seeman and Berkman 1988).
Seeman and Berkman (1988) found that network characteristics do not appear to be so highly correlated with aspects of social support that they are interchangeable. Cohen et al. (1987), on the basis of their study of 155 elderly residents of midtown Manhattan single-room occupancy hotels, reported the results of a factor analysis showing that only 7 of the 19 network variables utilized had sufficient commonality to form a potential scale, suggesting
that most variables are independent of each other. The authors criticize the use of scales without prior analysis of scale items and the development of scales without the use of parametric approaches such as factor analysis. As they point out, a possible problem with combining all network variables into one scale is the potential for premature treatment of network variables as unidimensional (i.e. representing one underlying construct). Combining variables which may be independent of each other into a simple scale may attenuate true variance and obscure differences that may be important. Much previous research has been limited to bivariate statistical techniques, making it impossible to control for overlap among variables or to determine the relative strength of variables. Measures of network need to be multi-faceted, taking into account structure and dynamic features of the network, as well as the individual’s subjective perceptions of it.
Researchers often assume that measures of network size and frequency of social contacts are fairly ‘objective’ and stable in comparison with measures of the content and quality of relationships, which are likely to be confounded by mental-health status (House and Kahn 1985). Donald and Ware (1982) reported one-year test-retest reliability coefficients of between 0.40 and 0.60 for reports of social contacts. This lack of stability may sometimes reflect the changing nature of relationships, particularly in older populations who experience large network losses through death and illness of (also old) relatives and friends, necessitating caution in the interpretation of results (Bowling and Farquhar 1995).
One problem with network scales is that visits to relatives are so routine that they may be taken for granted and unreported as formal social visits (Stueve and Lein 1979). Spouses may also be so taken for granted as part of the network that they too are unreported in network scales (Bowling et al. 1988). Another problem with asking questions about social support is that feelings about the supportive nature of the network (affective component) are influenced by psychological well-being or depression as well as by network structure and functions. Thus people with adequate support may perceive it to be inadequate because they are feeling depressed. This problem has not been resolved. Subjective perceptions are, of course, important, but a measurement schedule should be able to distinguish between subjective and objective elements accurately. Perhaps additional interviews could be carried out with a member of the respondent’s social network in order to establish the validity of the information provided by the respondent. Sokolovsky (1986) suggests an ethnographic approach to validate information given: participant observation, life histories, genealogies and informal interviews to probe the social support elements of social networks. An example is Francis’s (1984) comparison of the Jewish elderly in Cleveland and England. Francis’s methods were based on a comprehensive set of semi-structured questions centring around practical (transportation, shopping, money, etc.) and emotional (advice, visiting, etc.) services. Another example is Wentowski’s (1982) study of an urban population, in which examples of support were elicited as various events were reported (e.g. critical incidents technique).
In sum, there has been little attempt to test measures of social support for reliability and validity (Tardy 1985). Existing research generally suffers from methodological problems: imprecise definitions, failure to treat social support as a multi-dimensional concept, and confounding by various intervening variables. The main problem with most studies has been the inadequate conceptualization and operationalization of social support. It is common simply to itemize presence or absence of a spouse, confidant(e), household composition and social activities. Most researchers then total respondents’ scores to questions about the structure of social networks and ignore the different dimensions of support. Conceptual definitions are offered less often, although sociologists are beginning to offer more developed conceptual statements: that the respondent believes he or she is cared for/loved/esteemed/valued and belongs to a network of significant others.

AVAILABLE MEASURES OF SOCIAL NETWORK AND SUPPORT

Several chapters in Cohen et al. (2000a) provide brief overviews of measures of social networks, support and social integration; other reviews include those by Payne and Graham Jones (1987) and Orth-Gomer and Unden (1987b). The latter also reviewed a number of the shorter measures,
consisting mainly of single-item questions with little reference to quality of, and satisfaction with, relationships, e.g. Berkman’s Social Network Index (Berkman and Syme 1979); House’s Social Relationships and Activities Scale (House et al. 1982); Lin’s Social Support Scale (Lin et al. 1979) and Orth-Gomer and Johnson’s Social Network Interaction Index (Orth-Gomer and Johnson 1987a). Although these short measures appear to be inadequate in scope and depth, the scale items of Berkman and Syme, House, and Orth-Gomer and Johnson did predict mortality in longitudinal population surveys. On the other hand, the items are insufficiently detailed to indicate precisely which dimensions of poor social support structures are most important. Moreover, there are doubts about the validity of short scales. For example, there is evidence of massive item bias in Berkman’s Social Network Index, indicating that it cannot be used as a valid measure of social network (Dean et al. 1994).
A review by O’Reilly (1988) of 33 instruments purporting to measure social support reported only modest agreement on conceptual definition, and frequently the concepts were not defined or were ill defined. In particular, definitional confusion between social network and social support was apparent. Variables used to operationalize these concepts reflected this conceptual confusion. For example, some of the measures he reviews define social support in terms of social-network characteristics (size, source and frequency of contact). O’Reilly was less optimistic about the value of existing measures and pointed out that more rigorous standards to establish validity and reliability were required. Several scales, including some of those reviewed here, which were judged to be suitable for use with elderly people, have been briefly described by Oxman and Berkman (1990). There are also several longer scales of social support which have been designed specifically for use with people with mental-health problems. For example, Lehman’s (1983) Quality of Life Interview contains a sub-section on frequency and nature of social relationships and contacts, as well as social activities (rating satisfaction using the Delighted–Terrible Faces as the response scale). Bigelow et al.’s (1991) Quality of Life Questionnaire also includes a substantial set of questions on social relationships, support and activities, including negative and positive aspects of relationships. In both scales, some of the items are specific to people with mental-health problems and they are not recommended for other types of populations. The Team for the Assessment of Psychiatric Services (TAPS) have developed the Social Network Scale as part of their battery of outcome measures, but this is a semi-structured instrument and can take up to an hour to administer (Leff et al. 1990; Leff 1993). The full instruments were reviewed by the author in Measuring Disease (Bowling 2001), and as these subscales on social support were developed for use with a specific patient population they are not included in this chapter. Interested readers are referred to the relevant references.
Some of the more general measures of health status and quality of life include items on social support and activities; as these have been reviewed elsewhere in this volume, they will not be reviewed here (e.g. OARS). The Rand Batteries also included two batteries on network structure (the Social Health Battery) and social support (the Social Support Scale). These were reviewed in the section on the Rand Batteries in Chapter 4 and will also not be repeated here.

INVENTORY OF SOCIALLY SUPPORTIVE BEHAVIOURS (ISSB)

The ISSB is a measure designed by Barrera (1981) for use with a wide range of community populations. Social support was conceptualized as the diversity of natural helping behaviours that individuals actually receive, derived from the previous literature on social support. The authors felt that most existing scales concentrated on the structure of the network rather than what the members actually did, especially in view of noted discrepancies between actual amount of help provided and subjective perceptions of the amount of help (Liem and Liem 1978).
This scale is operationalized by 40 items generated from the literature to specify the amount of help received in the past month. The average time taken to complete the questionnaire is about ten minutes.

Content

The ISSB measures four types of support: emotional, instrumental, informational appraisal and
socializing. The index asks respondents to state how people have helped them in the last month and to respond on a five-point Likert-type scale to each of the 40 items as ‘not at all’, ‘once or twice’, ‘about once a week’, ‘several times a week’ or ‘about every day’. It measures the receipt of support, but not the source. Examples of items include:

Emotional
Expressed interest and concern in your well-being.
Listened to you talk about your private feelings.
Was right there with you (physically) in a stressful situation.
Instrumental support
Provided you with a place where you could get away for a while.
Loaned you over $25.
Provided you with some transportation.
Informational appraisal support
Gave you some information on how to do something.
Gave you feedback on how you were doing without saying it was good or bad.
Helped you understand why you didn’t do something well.
Socializing
Talked to you about some interest of yours.
Did some activity together to help you get your mind off things.

There is a problem with all these statements in that they may or may not relate to real events in people’s immediate lives, and some items may measure resources available to the supporter rather than support to the respondents (e.g. items on financial and car loans).

Scoring

The five-point ratings of each item are summed to produce a total score, with higher scores indicating greater support. The author also suggests calculating an average frequency score, as this permits a global score to be produced when some item data are missing.
A weakness with scales of this type is that, without a distinction between available and enacted support, the scale may simply be measuring the number, type and severity of problems recently experienced by the respondent.

Validity

The validity of the ISSB was tested by correlating results, based on 43 students, with a measure of family relations (Family Environment Scale); the correlation, although significant, was not high: 0.35. The author speculated that this was because the two scales were measuring different dimensions of support. Construct validity was inferred from correlations of the index with a measure of life events: 0.38 to 0.41 (Barrera 1981; Sandler and Barrera 1984). Notes on the scale available from the author summarize the correlations between the ISSB and measures of distress from ten published studies. The correlations range from 0.01 to 0.50, although most were fairly weak. The ISSB has been reported to be significantly related to recovery in stroke patients (Glass and Maddox 1992), supporting its predictive ability.
Investigators who have carried out exploratory factor analyses of the scale agree that the ISSB yields three factors (Barrera and Ainlay 1983; Stokes and Wilson 1984; Walkey et al. 1987; Pretorius and Diedricks 1993). A study by Caldwell and Reinhart (1988) is also consistent with these analyses and labels the clusters as guidance, emotional support and tangible support.

Reliability

The test-retest reliability correlation coefficient, based on 71 students tested over two days, was 0.88; test-retest reliability coefficients, again using students, over a one-month period were 0.80 and 0.63 (Barrera 1981; Barrera and Ainlay 1983; Valdenegro and Barrera 1983). Internal consistency reliability coefficients of 0.92 and 0.94 were reported for the first and second administrations of the scale (Barrera 1981). Notes on the ISSB are available from the author at Arizona State University; these report that tests from five studies show the internal consistency coefficients of the scale to be above 0.90.
In sum, there are problems with interpretation of the scale. Also, although preliminary results for reliability testing appear good, more work on both the reliability and validity of the scale is needed before it can be recommended. However, the scale has been popular, and several investigators have adapted it for use with other cultures or with
specific disease groups (e.g. Manne and Schnoll 2001).

ARIZONA SOCIAL SUPPORT INTERVIEW SCHEDULE (ASSIS)

The ISSB (above) was not designed to provide information on the people who supplied resources, nor on respondents’ subjective appraisals of the adequacy of support. Thus Barrera (1980, 1981) subsequently designed the Arizona Social Support Interview Schedule (ASSIS) to address this gap. The ASSIS was developed as an instrument to measure several aspects of support, including procedures for identifying support-network membership and subjects’ satisfaction with and need for support. His conceptual definition of social network, based on previous literature, relates to people who provide the functions defined as support. The scale is based on self-report and takes 15–20 minutes to complete.

Content

The scale is operationalized by asking subjects to identify individuals who provide support in the following areas: private feelings, material aid, advice, positive feedback, physical assistance and social participation. In each area, subjects are asked about such support from the named individuals in the past month, and whether the support was sufficient; ratings of satisfaction are made on a three-point scale. Examples of items are:

Private feelings
1 If you wanted to talk to someone about things that are very personal and private, who would you talk to?
Give me the first names, initials, or nicknames of the people that you would talk to about things that are very personal and private.
Probe: Is there anyone else that you can think of?
2 During the last month, which of these people did you actually talk to about things that were personal and private?
Probe: Ask specifically about people who were listed in response to 1 but not listed in response to 2.
3 During the last month, would you have liked:
1 A lot more opportunities to talk to people about your personal and private feelings?
2 A few more opportunities?
3 Or was this about right?
4 During the past month, how much do you think you needed people to talk about things that were very personal and private?
1 Not at all.
2 A little bit.
3 Quite a bit.

Social participation
1 Who are the people you get together with to have fun or relax? These could be new names or ones you listed before.
Probe: Anyone else?
2 During the past month, which of these people did you actually get together with to have fun or relax?
Probe: Ask about people who were named in 1 but not 2.
3 During the past month, would you have liked:
1 A lot more opportunities to get together with people for fun and relaxation?
2 A few more?
3 Or was it about right?
4 How much do you think you needed to get together with other people for fun and relaxation during the past month?
1 Not at all?
2 A little bit?
3 Quite a bit?

The interview schedule also includes questions concerning negative interactions, on the basis of the psychiatric literature linking these to mental disturbance (identification of people with whom they have had conflicts in the past month). Finally, subjects are asked about the age, sex and ethnicity of the people named.

Scoring

The data obtained allow the calculation of specified scores from the relevant items: total available and total utilized network size, conflicted network size, unconflicted network size, support satisfaction and support need. The author reported that the support-satisfaction measure suffers from an extremely skewed distribution in the direction of high satisfaction scores (Barrera 1981). Notes which identify the item scores which should be used in calculations are available from the author at Arizona State University.
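Because the ASSIS scores listed above are essentially counts of distinct people and sums of short ratings, a small sketch may help show how they fit together. The Python below is illustrative only: the record layout and the names `AreaResponse` and `assis_scores` are hypothetical, and two rules are assumptions rather than Barrera’s published scoring notes: that the conflicted network is the set of support providers who were also named in the conflict question, and that satisfaction and need are simple sums of the three-point ratings across support areas. The author’s notes, which specify which items enter each score, would be needed for real scoring.

```python
from dataclasses import dataclass

VALID_RATINGS = {1, 2, 3}  # both three-point scales are coded 1-3


@dataclass
class AreaResponse:
    """Hypothetical record of the answers for one ASSIS support area."""
    available: set[str]  # people named in item 1 ("who would you ...?")
    utilized: set[str]   # people named in item 2 ("which of these ... actually ...?")
    satisfaction: int    # item 3: 1 = a lot more wanted, 2 = a few more, 3 = about right
    need: int            # item 4: 1 = not at all, 2 = a little bit, 3 = quite a bit


def assis_scores(areas: dict[str, AreaResponse], conflict: set[str]) -> dict[str, int]:
    """Illustrative ASSIS summary scores (assumed rules, not Barrera's scoring notes)."""
    for area_name, area in areas.items():
        if area.satisfaction not in VALID_RATINGS or area.need not in VALID_RATINGS:
            raise ValueError(f"{area_name}: ratings must be 1, 2 or 3")

    # Set unions count a person named in several areas only once.
    available = set().union(*(a.available for a in areas.values()))
    utilized = set().union(*(a.utilized for a in areas.values()))

    return {
        "total_available_network": len(available),
        "total_utilized_network": len(utilized),
        # Assumption: conflicted = supporters who were also named as sources of conflict.
        "conflicted_network": len(available & conflict),
        "unconflicted_network": len(available - conflict),
        # Assumption: satisfaction and need are plain sums across the areas asked.
        "support_satisfaction": sum(a.satisfaction for a in areas.values()),
        "support_need": sum(a.need for a in areas.values()),
    }


if __name__ == "__main__":
    # Invented responses for two of the six areas, for illustration only.
    example = {
        "Private feelings": AreaResponse({"Ann", "Bob"}, {"Ann"}, satisfaction=2, need=3),
        "Social participation": AreaResponse({"Bob", "Cy"}, {"Bob", "Cy"}, satisfaction=3, need=1),
    }
    print(assis_scores(example, conflict={"Bob", "Dee"}))
    # {'total_available_network': 3, 'total_utilized_network': 3, 'conflicted_network': 1,
    #  'unconflicted_network': 2, 'support_satisfaction': 5, 'support_need': 4}
```

Network sizes are built from set unions rather than by adding per-area counts, so a person named in several areas is counted once; under the assumed conflict rule, someone named only in the conflict question (‘Dee’ in the invented example) enters neither network-size score.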
Validity

The ability of the instrument to discriminate between groups was tested on a sample of 86 pregnant adolescents. The ASSIS was administered along with the ISSB. Barrera (1981) reported that conflicted network size correlated significantly with depression and anxiety; satisfaction correlated with depression and anxiety; and expressed need correlated significantly with depression, anxiety and somatization (0.23–0.51). The reported relationship between life events and social support was taken as evidence for the index’s construct validity (0.25–0.38), although these correlations are at best modest. The ISSB showed a modest but significant correlation with total network size (0.32), but was not significantly correlated with satisfaction or need on the ASSIS. The ASSIS has been used successfully in several studies of mental and physical health. Rivera et al. (1991) used the ASSIS in a study of women caring for frail family members. They reported that carers who were depressed reported a higher incidence of negative interactions with others, and non-depressed carers reported greater use of social support resources. Woods et al. (1994), in a study of depression in young women, reported that the ASSIS was able to predict self-esteem and depression. Clinton et al. (1998), based on the results of their longitudinal survey of community adaptation in people with schizophrenia, reported that the perceived social support variables from the ASSIS independently accounted for most of the variance in their four measures of community adaptation.

Reliability

The instrument has been tested for test-retest reliability, and results appear moderately satisfactory to good. It was tested on 45 students, and total network size produced a correlation coefficient of 0.88 over three days; studies with a further group of students produced a test-retest correlation over one month of 0.70 (Barrera 1980; Valdenegro and Barrera 1983). Test-retest correlations were 0.54 for size of conflicted network, 0.69 for satisfaction and 0.80 for support need. There was also modest support for predictive and construct validity (Barrera 1981).
Internal consistency coefficients for support satisfaction and support need were low to moderate: 0.33 and 0.52 respectively.
A study by Barrera et al. (1985) of 36 mental health out-patients who agreed to supply the name of one network member showed significant kappa coefficients between subjects’ and informants’ reports of support. Of the 31 cases of non-agreement on ASSIS items, 24 were due to informants stating that they had provided aid when subjects indicated that they had not, and seven were due to subjects reporting that aid was provided and informants did not.
Further testing for both reliability and validity is required. At best the validity of the scale is modest.

PERCEIVED SOCIAL SUPPORT FROM FAMILY AND FRIENDS

This was devised by Procidano and Heller (1983) as a measure of perceived social support. It was designed to assess the functions of social networks specified by Caplan (1974): ‘the extent to which an individual perceives that his/her needs for support, information, and feedback are fulfilled by friends . . . and by family’. Thoits (1982) has criticized this definition as inadequate on the grounds that it includes the very term to be defined (support). The scale measures available and received support, especially emotional support. It takes about eight minutes to complete.

Content

It comprises two 20-item self-report measures, derived, after testing with students, from an initial pool of 84 items. One 20-item measure is for perceptions of family support and one 20-item measure is for perceptions of support from/given by friends. Responses require a simple ‘yes’, ‘no’, or ‘don’t know’. Examples are:

Family
I rely on my family for emotional support;
My family is sensitive to my personal needs;
My family gives me the moral support I need.
Friends
My friends enjoy hearing about what I think;
I have a deep, sharing relationship with a number of friends;
I feel that I’m on the fringe of my circle of friends.
A few items refer to support given:
My friends come to me for emotional support;
Certain members of my family come to me when they have problems or need advice.

Scoring

Positive item responses are totalled and presented separately for family and friends. An overall score can also be calculated.

Validity and reliability

The developers reported that the scale correlated with measures of psychopathology and distress (0.85), supporting its discriminative ability (Procidano and Heller 1983). A study of adolescent out-patients confirmed the early results for the validity of the scale, and reported that it could predict psychosocial maturity levels (Gavazzi 1994). It has been used successfully in clinical studies. Grummon et al. (1994) used the scale in a study of psychological adjustment among intravenous drug users with AIDS, and reported that perceived social support from family (but not from friends) correlated positively with psychological adjustment. Lin (2002) reported that the Perceived Social Support from Friends subscale moderated the relationship between dysfunctional attitudes and depression in a study of Taiwanese adolescents.
In relation to internal consistency, tests by the developers with 222 students produced coefficients of 0.88 and 0.90 respectively for the items relating to family and those relating to friends. Test-retest reliability, using 222 students, was 0.83 over a one-month period (Procidano and Heller 1983). In a study investigating the reliability of the scale among students, the intercorrelations among the subscales and between the subscales and the total scales were reported to be strong and highly significant (Eskin 1993).
Although the scale has good internal reliability, test-retest reliability and predictive validity, and has been popularly used in social and clinical studies, more evidence of its psychometric properties is required.

SOCIAL SUPPORT QUESTIONNAIRE

This questionnaire was developed by Sarason et al. (1983) to measure the availability of, and satisfaction with, social support. The conceptual definition of support was derived from Bowlby’s theory of attachment and is based on the existence or availability of people who can be relied upon, who care about, love and value the recipient.
In the construction of the scale, which is based on self-report, 61 items were written to sample the situations in which social support might be important to people. They were administered to 602 students who were asked to list people who provided them with such support. Items that showed low correlations with other items were deleted. Scoring methods were also piloted. The scoring method selected was the simplest: a count of supportive persons. The availability index selected was the number of persons listed divided by the number of items.
Most of the items are concerned with emotional support, so this scale is appropriate for measurement of emotional support only. The scale takes an average of 15 minutes to complete. A short three-item version and a six-item version have been developed by Sarason et al. (1987b).

Content

The instrument comprises 27 items which ask the subjects (a) to list all the people to whom they can turn in specific situations and (b) to indicate their satisfaction with each of these identified supports on a scale ranging from very satisfied to very dissatisfied. The scale was published in full in Sarason et al. (1983). Examples are:

Whom can you really count on to listen to you when you need to talk?
Whom could you really count on to help you out in a crisis situation, even though they would have to go out of their way to do so?
With whom can you totally be yourself?
Whom can you really count on to be dependable when you need help?
Who do you feel really appreciates you as a person?
Whom can you count on to console you when you are very upset?

Scoring

Each item is scored (number of persons listed); the satisfaction score ranges from 1 to 6 (very satisfied to not very satisfied). Two scores are computed by dividing the sum of each of the two scores (number
of people; overall satisfaction) by the 27 items: average (per item) number of people, and average level of satisfaction with support.

Validity

Sarason et al. (1983) tested the measure in a study of 227 students who were administered the scale along with personality scales (extraversion and neuroticism). The correlations with the Social Support Questionnaire were weak to moderate: −0.02 and −0.43 for availability and satisfaction with support. Another study they reported, of 295 students, showed an association between positive life events and number of social supports. A study of 440 students showed that people with more social supports had more positive self-concepts. In addition, students who were high in social support were rated as more attractive and more socially skilled (Sarason et al. 1983, 1985). In further support of its construct validity, Swindells et al. (1999), based on a study of patients with HIV, reported a significant correlation between the Social Support Questionnaire and the Short Form-36 measure of broader health status. The developers have published the results of several similar studies supporting the construct validity of the scale (Sarason et al. 1987a, 1987b). Predictive validity was judged to be satisfactory on the basis of correlations with depression (the correlations for 100 men and 127 women for each dimension were from −0.22 to −0.43). Sinha et al. (2002) supported the construct validity of the scale in their study of older people in India. Their prediction that social support, as measured by the scale, would have a modifying effect on positive attitudes to life and perceived control over life was supported. Separate factor analyses were performed
contained one factor that represented the structure of the scale.

Reliability

Test-retest reliability over a four-week period, using 602 students, was 0.90 for the availability of support items, and 0.83 for satisfaction with support, using 105 students. The alpha coefficients for internal reliability were 0.97 for availability and 0.94 for satisfaction with support, using 602 university students. Inter-item correlations ranged from 0.37 to 0.71 for availability, and from 0.21 to 0.74 for satisfaction with support. A second study of 227 students showed a correlation between the availability and satisfaction with support scores of 0.31 for men and 0.21 for women (Sarason et al. 1983).
In sum, it is a viable measure of support, with good results for internal reliability and some support for its construct and predictive validity. A validated German version exists (Franke et al. 2003). The development and testing of the scale has led the authors to emphasize the importance of perceived support, rather than network size per se, as the most relevant variable (Sarason et al. 1987a). This work has led them to develop a further scale: the Quality of Relationship Index (published in full by Pierce et al. 1991). This is not reviewed here and interested readers are referred to the latter reference.

INTERVIEW SCHEDULE FOR SOCIAL INTERACTION (ISSI)

This index was developed by Henderson et al. (1980, 1981a). The conceptual definition of support
by the developers for each of the two scores. was based on the theory that social relations are
Each analysis showed a very strong first (unrotated) based on attachment, social integration, nurturance,
factor. The first factor accounted for 82 per cent reassurance of personal worth, sense of reliability,
of the common variance for the availability score, help and guidance. The scale was developed over a
and 72 per cent for the satisfaction with support year in pilot studies of 130 people in health centres,
score. out-patient departments in a club for the elderly,
All factor loadings exceeded 0.60 and 0.30 for and in a general population sample in Canberra. It
each score respectively. There appeared to be good was reported to be acceptable to both healthy and
evidence that one strong factor underlies each score psychiatrically disturbed respondents.
and that they represent different dimensions of The scale takes approximately 30 minutes to
the concept. Pretorius and Diedricks (1993) also complete, although reports of 60 minutes have also
carried out exploratory factor analysis and con- been published. A short version of the scale has
firmed that the Social Support Questionnaire been developed by Unden and Orth-Gomer
(1989), with similar levels of reliability, validity and AVSI: the availability of more diffuse relationships,
discriminative ability as the full scale. as with friends, work associates and acquaintances
(social integration).
ADSI: the perceived adequacy of these more diffuse
Content relationships.
The scale comprises 52 questions asking about the
availability and adequacy of people in specific roles: The scoring and computing instructions have been
attachment, provided by close affectional relation- published by Henderson et al. (1981a).
ships; social integration, provided by membership
of a network of persons having shared interests
and values; the opportunity for nurturing others; Validity
reassurance of personal worth; a sense of reliable The authors judged the scale to have face validity and
alliance; and obtaining help and guidance from stated that the items effectively tap the constructs
informal advisers in times of difficulty. It does not of availability and adequacy of attachment in adult-
adequately measure the availability and adequacy of hood and social integration. Henderson et al. (1980)
attachment and social integration. established the validity of the scale by analysing its
A question about the availability of provision is four dimensions in relation to personality assess-
immediately followed by a question on adequacy. ments (neuroticism and introversion–extraversion
Examples are: as measured by the Eysenck Personality Inventory).
The sample comprised 225 members of the general
How many friends do you have whom you could visit at
population. The authors hypothesized that a
any time, without waiting for an invitation? You could
arrive without being expected and still be sure you
neurotic person would have problems in forming
would be welcome. and maintaining social relationships and would con-
sequently be dissatisfied with those relationships.
Would you like to have more or fewer friends like this, or Inverse relationships with availability and satisfac-
is it about right for you? tion measures and neuroticism were reported
(−0.18 to −0.31). Results were less consistent with
Three items ask about negative interactions: extraversion. The construct validity of the scale was
How many people whom you have to see regularly do
also supported by findings reported by Magne et al.
you dislike? (1992). They found that, of their sample of in-
patients who had attempted suicide, social support
was associated with marital status and economic
Scoring
activity in the expected directions.
The four principal indices yield four scores: Predictive validity was reported by Henderson
availability of attachment, perceived adequacy of et al. (1980) on the basis of a study of neurosis on
attachment, availability of social integration and 756 residents of Canberra, Australia. The measure
perceived adequacy of social integration. The was associated with Zung’s Self-Rating Depression
following indices are created: Scale and Goldberg’s General Health Question-
naire (−0.15 to −0.96) (Henderson 1981). The
AVAT: the availability of affectionally close relationships study, which was based on baseline data collection
(attachments). from 756 residents in Canberra, with three follow-
ADAT: the perceived adequacy of what comprises these up interviews with a sub-sample over 12 months,
close relationships, expressed as a percentage of what reported that deficiencies in social relationships had
is available.
an effect on the early development of neurotic
NONDAT: those who lack close relationships might not
be unhappy with their situation. The NONDAT index
symptoms. As both availability and adequacy items
is a measure of such satisfaction despite the absence of correlated significantly and negatively with psychi-
attachment. atric disorder and depression, predictive validity was
ATTROWN: the number of attachment persons with judged to be satisfactory. The scale was also able to
whom the respondents has been having rows in the discriminate between new migrants to Canberra
last month. and those who had been living there for seven
months or more. The strength of this study was that it was carried out on a representative sample of the population and was prospective, not cross-sectional; thus the measures of social ties were less likely to be contaminated by already established neurotic symptoms. The correlations between respondents’ ISSI scores and those of someone who was nominated as well informed about their social relationships were weak to moderate, although significant, ranging between 0.26 and 0.59 for the items. This may be an indicator of the difference between actual and perceived support. Persson and Orbaek (2003) also reported that the ISSI was sensitive to differences in trait anxiety levels in a group of healthy women, as defined by the STAI-Y.

Reliability
Test-retest reliability scores for the scale indices over an 18-day period using 51 people from the general population ranged from 0.71 to 0.76; using 756 adults from a general population over an 18-day period, scores ranged from 0.66 to 0.85, and over a 12-month period ranged from 0.66 to 0.85. The problem with the longer 12-month period is that support structures might have changed, and thus a lower correlation may reflect this rather than the reliability of the measure. Internal consistency reliability coefficients range from 0.67 to 0.81 for the indices.
In sum, although this is a promising scale, again more testing is required.

THE SOCIAL NETWORK SCALE (SNS)

The Social Network Scale was adapted by Stokes (1983), based on Hirsch’s (1980) work. This work was based on just 20 recent young widows and 14 older women recently re-entering college studies. Stokes (1983) based the Social Network Scale on four dimensions of network that he judged to be important: network size, number of people the respondent feels close to, number of relatives in the network and network density.
A supplement to the SNS is an eight-item scale which asks people to rate their total social networks, and their networks of friends, on four dimensions: general satisfaction with the network, amount of desired change in the network, satisfaction with assistance in daily activities, and satisfaction with emotional support. The eight ratings are summed to provide a measure of satisfaction.

Content
Hirsch’s (1980) conceptual definition of support was based on interaction that affects coping ability: guidance, social reinforcement, aid, socializing and emotional support. The social support dimension of the scale was operationalized by asking respondents to specify the amount of interaction with network members in five areas of supportive activity, and their degree of satisfaction with the interaction.
The social network dimension of the scale conceptualized social network as a natural support system, based on significant others. This is operationalized by asking people to list in a matrix ‘the initials of up to 20 people who are significant in your life and with whom you have contact at least once a month’. People then put an X in those boxes of the matrix that connect people who are significant in each other’s lives and who have contact with each other at least once a month. They also indicate which persons in their lists are relatives, and which persons they can ‘confide in or turn to for help in an emergency’. Having such confidant(e)s is probably more important in terms of providing effective, satisfying support than network size (Brown et al. 1975; Conner et al. 1979; Stokes 1983). This yields information on:

1 The size of the network, taking the definition of network as people who are significant in the respondent’s life and with whom the respondent interacts regularly.
2 The number of people in the network whom the respondent feels close to – how many one can confide in or turn to for help in an emergency.
3 The number of relatives in the network.
4 Network density (i.e. the degree to which network members are themselves interconnected).

Scoring
Scoring consists of totalling the number of network members, relatives and confidant(e)s. Scoring could also identify the overlap between these categories (e.g. between relatives and confidant(e)s).
A larger number of variables can be computed and scored from the grid:

(i) Network size: number of people listed
(ii) Relatives: number of relatives listed
(iii) Friends: number of people listed who are not relatives
(iv) Confidant(e)s: number of people respondents said they could confide in or turn to for help in an emergency
(v) Relative confidant(e)s: number of confidant(e)s who are relatives
(vi) Friend confidant(e)s: number of confidant(e)s who are not relatives
(vii) Percentage relatives: proportion of the network who are relatives (relatives/size)
(viii) Percentage confidant(e)s: proportion of the network members who are confidant(e)s (confidant(e)s/size)
(ix) Percentage relative confidant(e)s: proportion of confidant(e)s who are relatives (relative confidant(e)s/confidant(e)s)
(x) Density: the proportion of the total possible number of relationships which actually exist among members of a social network, excluding the respondent
(xi) Relative density: density of the relatives in the social network list
(xii) Friend density: density of the friends in the social network list
(xiii) Relative–friend density: number of relationships between relatives and friends as a proportion of the total possible number of such relationships.

Although the scale has been judged to have face validity, it has not been tested fully for reliability and validity. It is also limited in scope; Stokes has himself supplemented the use of the SNS with other scales measuring the frequency of receipt of support (e.g. the Inventory of Socially Supportive Behaviours (Stokes 1985)). Stokes (1983), on the basis of administering the SNS to 82 students, submitted these dimensions to a principal components analysis and four factors emerged which accounted for over 80 per cent of the variance in the original matrix: size, with emphasis on the number of friends in the network; presence of confidant(e)s; dominance of relatives in the network; and density. Stokes (1983) also reported that the number of confidant(e)s in the social network was predictive of perceived-support satisfaction.
It is not clear what the basis is for limiting social network members to 20. Also, a flaw with calculating the percentage of relatives, confidant(e)s or friends in the network is that this does not take account of size in the presentation: for example, a network composed of 50 per cent relatives could be a network of just two members, one of whom is a relative, or it could be a network of 20 members, 10 of whom are relatives. This does not appear to be a satisfactory basis for making comparisons between respondents.
Other problems with this scale are that it does not have an index of geographical proximity, or frequency of contact, of network members, and it does not distinguish between types of relatives (e.g. sons and daughters are not distinguishable) or the nature of the contact (supportive, instrumental, etc.). The value of the scale is the conciseness of the grid, which saves pages of questions, and it is easily supplemented with other structural items and questions on support (Bowling and Farquhar 1995). The structural network features measured by the grid have been reported to have little predictive value in longitudinal studies of psychological morbidity or life satisfaction (Bowling et al. 1992, 1993). Moreover, the author’s own experience with the scale is that respondents tend to omit their spouses from the network scale – their significance and contact being taken for granted (Bowling et al. 1988).
The social support part of the scale has been little tested but was judged to have face reliability; the correlation between respondent and interviewer ratings of satisfaction and support was 0.53 and was judged to be reliable. Inter-item correlations, testing for reliability, ranged from 0.22 to 0.51 (Hirsch 1980).
The sensitivity of the scale is unknown and more evidence of its reliability and validity is required. Generally, it will require supplementation with other items.

THE LUBBEN SOCIAL NETWORK SCALE (LSNS)

Lubben (1984, 1988) developed a composite social network scale for use with older people. It was a modification of the well-known Berkman-Syme Social Network Index, which successfully distinguished between survivors and non-survivors in their longitudinal general population survey in Alameda County (Berkman and Syme 1979).
The LSNS attempted to take into account many of the criticisms levelled at existing social network measures. These included the need to measure
more than household composition (as most older people live alone, this is an inadequate indicator of social isolation or need); the need to distinguish between social support and social networks (to facilitate attempts to distinguish between lack of available support and lack of need for support), and between family, friend and peer interactions and networks (given the literature on the superiority of friendship over family networks for psychological well-being); the need to measure network size, the existence of a confidant(e) relationship, and the existence of reciprocal relationships (given their theoretical importance for the timely provision of help and support when required, and their contribution to psychological well-being); and the unwieldy nature of existing instruments (most were too long).
The original Berkman and Syme indicator consisted of a simple index based on four main components: marital status, nature of relationships with relatives and friends, church membership, and membership of other organizations and clubs. It was able to predict risk of mortality among members of their longitudinal survey. However, Lubben noted that there was little variation among samples of elderly people in relation to the indicators of organizational participation and marital status and therefore deleted these items. This decision needs revising in view of the increasing participation of the current generation of people aged 65 and over in community and voluntary activities, and of the associations between marital status and health, well-being and mortality. Lubben also simplified the original scoring system.

Content
The LSNS is a composite measure of social networks and consists of ten items within three dimensions:

(i) family networks: composed of three areas: size of active family network, size of intimate family network, and frequency of contact with a family member. These are covered by questions on the number seen monthly, the number the respondent feels close to, and frequency of social contact;
(ii) friends networks: composed of three similar areas, and covered by questions on the number the respondent feels close to, number seen monthly, and frequency of social contact;
(iii) interdependent social supports: composed of interdependent relationships covered by four questions (on existence of a confidant, is a confidant, relies upon and helps others, living arrangements).

An additional question covers living arrangements (lives alone, or with spouse, other relatives or friends, or unrelated individuals). A copy of the scale can be found in Lubben (1988). Examples are shown below:

Confidant relationships

When you have an important decision to make, do you have someone you can talk to about it?
5 = always/4 = very often/3 = often/2 = sometimes/1 = seldom/0 = never

When other people you know have an important decision to make, do they talk to you about it?
5 = always/4 = very often/3 = often/2 = sometimes/1 = seldom/0 = never

Helping others

Does anybody rely on you to do something for them each day? For example: shopping, cooking dinner, doing repairs, cleaning house, providing child care, etc.
No/Yes

If No: Do you help anybody with things like shopping, filling out forms, doing repairs, providing child care, etc.?
4 = very often/3 = often/2 = sometimes/1 = seldom/0 = never

Scoring
The LSNS score is obtained from an equally weighted sum of the ten items, each of which ranges from 0 to 5. The total score can therefore range from 0 to 50.

Validity
There is support for the discriminative ability of the scale. Lubben (1988) tested the LSNS with non-institutionalized members of the California Senior Survey, a random sample of elderly Medicaid recipients from eight communities within California. The LSNS correlated significantly with a life satisfaction index, in-patient hospital admissions for six or more days within the last 12 months, and health behaviours. Rutledge et al. (2003), in
their follow-up study of osteoporosis among women aged 65 and over, reported that the LSNS scores were a robust predictor of mortality. Chou and Chi (2001), in their study of a cross-section of elderly people in Hong Kong, also reported that the LSNS mediated the association between everyday competence and depression.

Reliability
All ten items of social networks were reported to inter-correlate highly, with a Cronbach’s alpha of 0.70, indicating high internal consistency (Lubben 1988). Chou and Chi (2001), in their study of a cross-section of elderly people in Hong Kong, reported a slightly higher Cronbach’s alpha of 0.76.
In sum, the scale has been popular due to its simplicity (Chou and Chi 1999, 2001) and its derivation from the well-known Alameda County social network measure, although more evidence of its psychometric properties is needed.

THE FAMILY RELATIONSHIP INDEX (FRI)

The Family Relationship Index attempts to measure support from within the family. It has no clear conceptual basis. It was developed by Moos and Moos (1981) and Billings and Moos (1982). It was derived from a 90-item true–false response choice instrument, with ten subscales, called the Family Environment Scale, also developed by the authors. The FRI is composed of the first three subscales of the Family Environment Scale. The scale was initially developed with a number of normal and distressed families in different ethnic groups. Over 1,000 families were involved.

Content
The FRI subscale is composed of three sections (Moos and Moos 1981), taken from the Family Environment Scale. The three sections relate to ‘cohesion’, ‘expressiveness’ and ‘conflict’. The ‘cohesion’ section assesses the amount of commitment, assistance and sustenance family members contribute to each other; ‘expressiveness’ assesses the extent to which family members are encouraged to express their feelings directly; and ‘conflict’ assesses the extent to which family members express aggression, conflict or anger. The three sections, with example items, are:

Cohesion: the degree to which family members are helpful and supportive of each other (e.g. ‘There is a feeling of togetherness in our family’).
Expressiveness: the extent to which family members are encouraged to act openly and to express their feelings directly (e.g. ‘We tell each other about our personal problems’).
Conflict: the extent to which the open expression of anger and aggression and generally conflictual interactions are characteristic of the family (e.g. ‘We fight a lot in our family’).

Scoring
The standardized scores of each subscale are averaged to produce a composite score. A manual is available (Moos and Moos 1981, 1994).

Validity
Although there is no reference to the predictive validity of the full scale in the manual, the Family Relationship subscale was reported to have predictive validity by Billings and Moos (1982) on the basis of findings from their longitudinal study of social support and functioning among 113 patients who had been treated for alcoholism. An association between depression and low levels of reported support would be expected (due either to actual low support or to the tendency of people in low spirits to perceive their lives more pessimistically). Significant correlations reported were those for depression and level and source of reported support (0.19 to 0.74). Garber et al. (1998), in a longitudinal survey of mothers with depression and their children, reported that the FRI mediated the significant association between maternal depression and adolescent suicide.
There are few published data on the testing of the scale.

Reliability
The FRI was reported by Billings and Moos (1981) and Holahan and Moos (1981) to have a high internal consistency (0.89) and to be related to other measures of social support.
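Internal consistency is reported throughout this chapter, in several cases explicitly as Cronbach’s alpha (for example, 0.70 and 0.76 for the LSNS, and a lowest alpha of 0.82 for the SS-B items). As a minimal sketch of how such a coefficient is obtained from raw item scores, the Python function below applies the standard formula alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The function name cronbach_alpha and the respondents-by-items list layout are hypothetical choices made for illustration; the sketch is not taken from any of the instruments or studies reviewed here, which do not state how their coefficients were computed.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Return Cronbach's alpha for a respondents-by-items score matrix.

    Hypothetical layout: responses[r][i] is respondent r's score on item i.
    Standard formula, using sample variances throughout:
        alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must have a score for every item")

    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the respondents' summed (total) scores.
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: four respondents, three items answered identically.
print(cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]))
```

Because every item in the hypothetical example ranks the four respondents identically, the printed value equals 1.0 up to floating-point rounding; items that are uncorrelated with one another push alpha towards 0. Negatively worded items (such as ‘I am not important to others’ in the SS-A) would normally be reverse-scored before alpha, or any summed score, is computed.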
Test-retest reliabilities for the full Family Environment Scale’s ten subscales at intervals of 8 weeks, 4 months and 12 months ranged between 0.52 and 0.89 (Moos and Moos 1981; Caldwell 1985).
In sum, although tests for reliability appear to be good, the validity of the scale is more questionable, and again the need for further testing is evident. Caldwell (1985), in his critique of the scale, states that the full instrument is useful if used with ‘caution’, given, in particular, the lack of evidence regarding its psychometric properties.

THE SOCIAL SUPPORT APPRAISALS SCALE (SS-A) AND THE SOCIAL SUPPORT BEHAVIOURS SCALE (SS-B)

The Social Support Appraisals Scale was developed by Vaux et al. (1986a, 1986b). The concept of social support employed was based on Cobb’s (1976) definition of social support, and the scale was designed to tap the extent to which individuals believe that they are loved by, esteemed by and involved with family, friends and others. One of the authors has reported a relationship between type of network orientation (e.g. positive or negative) and respondents’ secure or anxious attachment styles (Wallace and Vaux 1993). A main advantage of the scale is its brevity.

Content: SS-A
The SS-A is a 23-item scale consisting of a list of statements about relationships with family and friends. Eight items relate to family relationships, seven to relationships with friends and eight to ‘others’ in a general way. Respondents have four response choices: ‘strongly agree’, ‘agree’, ‘disagree’ or ‘strongly disagree’. There is no middle value. Examples are:

My friends respect me
My family cares for me very much
I am not important to others
I am well liked
I feel a strong bond with my friends
If I died tomorrow very few people would miss me
I feel close to members of my family
My friends and I have done a lot for one another
(Strongly agree/agree/disagree/strongly disagree)

Scoring
Three scores can be computed: SS-A, which is the sum of all 23 items; SS-A (family), which is the sum of the eight family items; and SS-A (friends), which is the sum of the seven friends’ items. A limitation of the scale is that it provides no information about the numbers of people involved in providing supportive behaviour, nor about who they are (e.g. spouse, sister, son, etc.).

Validity
Convergent and divergent validity was tested in relation to five samples of students and five community samples (the total number of respondents was around 1,000). It was tested against other measures of social support (the Perceived Social Support Scale; the Family Relations Index; the Social Support Questionnaire and two other less well-known scales; and three single-item questions on satisfaction with friends). The authors also employed two further scales developed by themselves. One of these was the Social Support Resources Scale. This was designed to tap many aspects of the individual’s social support network. Respondents were asked to list up to ten people who provided them with five types of support: emotional support, practical assistance, financial assistance, socializing and advice/guidance. Respondents provided satisfaction ratings for the five types of assessed support. Respondents were asked further questions about the characteristics of the people identified and the nature of the relationship (e.g. spouse, friend). Mean scores were computed for each variable. The final scale tested against the SS-A was the Social Support Behaviours Scale, also developed by the authors (see below). The authors also tested the SS-A against six personality and depression inventories, including the Affect Balance Scale and the UCLA Revised Loneliness Scale (see later). The authors provided evidence of convergent validity with a variety of the support measures. The correlation coefficients presented by the authors are too numerous to reproduce here. However, none of the correlations with other support or personality measures were very strong (ranging from 0.03 to −0.51, with one correlation coefficient reaching −0.72). This is not surprising given the very different theoretical concepts underlying the different scales. Also in
support of the discriminative ability of the SS-A, Coen et al. (1997) reported, on the basis of their study of the burden on carers of patients with Alzheimer’s disease, that carer burden was independently associated with the SS-A.

Reliability
There is no published evidence of reliability. In sum, the validity of the scale remains questionable, and evidence of reliability is also required.

Content: SS-B
The Social Support Behaviours Scale was developed by Vaux et al. (1987) to tap five modes of supportive behaviour. It is similar to Barrera et al.’s ISSB (1985: 94) but does not suffer from the limitation of asking solely about actual rather than potential support (which may simply reflect the number of problems experienced). It lists 45 specific supportive behaviours, in five subscales, tapping the five modes of support listed in the authors’ Social Support Resources scale: emotional support, socializing, practical assistance, financial assistance and advice/guidance. Respondents indicate how likely their family and friends would be to engage in each of the behaviours in time of need. Examples are:

Would suggest doing something, just to take my mind off my problems.
Would visit with me, or invite me over.
Would give me a ride if I needed one.
Would have lunch or dinner with me.
Would loan me a car if I needed one.
Would go to a movie or concert with me.
(No one would do this/someone might do this/some family member/friends would probably do this/some family member/friend would certainly do this/most family members/friends would certainly do this.)

A problem with these types of scales is that they assume shared social values, i.e. that respondents all equally value ‘a movie or concert’, and that other people are in a financial position to, for example, ‘loan a car’.

Scoring
The choice of five responses for each type of behaviour itemized ranges from ‘no one would do this’ to ‘most family members/friends would certainly do this’ (see above). Items are scored 0 (no one) to 5 (certainly) and summed.

Validity
The predictive validity of the SS-B has been demonstrated in relation to psychological distress, and it correlates moderately with other support measures (including SS-A) (Vaux and Wood 1985; Vaux et al. 1986a, 1986b). In addition, five methods were used to test the validity of the SS-B as a measure of support: the classification of items by judges; an analogue (role-playing) simulation of samples deficient in each mode of support; tests against a related measure of supportive behaviour (ISSB); an examination of levels of each type of support provided for different problems; and confirmatory factor analysis (Vaux et al. 1987). The classification of items according to their content by judges (students) provided some evidence of the content validity of the SS-B. The percentage of items correctly classified by judges ranged from 13 to 100 per cent; in most cases it was over 60 per cent. The unique role-playing exercise with students showed deficits in available support corresponding to their enacted role, providing evidence of subscale sensitivity. However, the correlations with the ISSB were weak. This may be because the two measures are based on different concepts of social support – available and enacted respectively. The factor analysis conducted by the developers demonstrated that the pattern of item convergence and divergence was highly consistent with predictions and confirmed that the SS-B taps the five conceptualized modes of support very well (Vaux et al. 1987). In further psychometric testing of the SS-B, Corcoran et al. (1998) found that factor analyses of the separate subscales produced loadings that were more consistent for ‘family’ than for ‘friends’. They also found the SS-B to be a poor discriminator between adolescents in different ethnic groups who were attending pregnancy prevention programmes, and concluded that the scale needs further development to enhance its sensitivity among people with varying ethnic and socio-economic backgrounds.

Reliability
Excellent internal consistency was obtained for the SS-B items (the lowest alpha was 0.82). In sum, the
SS-A and SS-B scales are promising but have been little tested for reliability and validity.

INTERPERSONAL SUPPORT EVALUATION LIST (ISEL)

The Interpersonal Support Evaluation List was developed by Cohen et al. (1985) on the basis of the theoretical assumption that the buffering effect of social support is cognitively mediated. Testing this assumption required an instrument that measured the perceived availability of support. The items were developed on theoretical grounds to cover the domain of social support that could facilitate coping with stress. The instrument was designed to assess the perceived availability of support in four distinct areas: material aid (tangible subscale); someone to talk to about one’s problems (appraisal subscale); comparisons of self with others (self-esteem subscale); and people to share activities with (belonging subscale).

Content

The ISEL consists of a list of 40 (48 in the version for students) statements about the perceived availability of potential social resources. The items are balanced: half the statements are positive about social relationships, and half are negative. Respondents are asked to indicate whether each statement is ‘probably true’ or ‘probably false’. Examples of the items are:

There is at least one person I know whose advice I really trust.
There are very few people I trust to help solve my problems.
If I decide on a Friday afternoon that I would like to go to a movie that evening, I could find someone to go with me.
Most people I know don’t enjoy the same things that I do.
If I were sick and needed someone to drive me to the doctor, I would have trouble finding someone.
If I had to mail an important letter at the post office by 5.00 and couldn’t make it, there is someone who could do it for me.
In general people don’t have much confidence in me.
Most people I know think highly of me.

Scoring

The ISEL is scored by counting the number of responses indicating support. There is no information on the validity of this method. On the basis of factor analyses of data from 133 college students, Brookings and Bolton (1988) recommended that ISEL subscale scores should be analysed separately, and that a total score should not be calculated, because this results in a loss of unique information.

Validity

Cohen et al. (1985) reviewed the evidence on the reliability and validity of the scale, most of which was previously unpublished. The scale was tested on over 500 students in Oregon; over 100 students at Carnegie-Mellon University; almost 100 students at Delaware University; over 100 students at Guelph University; and over 100 students at Arizona State University. Four studies were also carried out on the general population version of the scale, using student samples at the University of California, Carnegie-Mellon University and the University of Oregon (total: over 200 students), and a general population sample participating in the Oregon Smoking Cessation Program (over 60 adults).

The ISEL student scale was found to correlate moderately with other measures of social support: 0.46 with Barrera’s ISSB; 0.30 with the total score of the Moos Family Environment Scale; 0.39 with reported network size; 0.46 with number of close friends reported; and 0.42 with number of close relatives reported. The general population scale correlated 0.31 with a scale measuring the quality of relationships with partner/spouse. The scale was expected to be associated with self-esteem, as this is influenced by feedback from others; the correlation with the Rosenberg Self-Esteem Scale was 0.74. The scale was reported to be significantly associated with depression; social anxiety was controlled for, to rule out the possibility that the social support concept was merely a proxy measure of personality (e.g. social anxiety or social skills). Longitudinal analyses of the groups in the studies showed that the scale was able to predict changes in depressive symptomatology. The latter association indicated that the scale was not simply measuring symptomatology. There was a small but significant association with physical
symptomatology and changes in these symptoms. The scale was reported to be a good predictor of smoking cessation and was associated with measures of stress in the direction suggestive of the buffering effects of social support. These data suggest the importance of appraisal support as a protector against the pathogenic effects of stressful life events. Espwall and Olofsson (2002) also found that patients with musculo-skeletal disorders reported less emotional support on a condensed version of the scale, supporting its construct validity.

Factor analyses of data from 133 college students by Brookings and Bolton (1988) confirmed a four-factor model, which provided a reasonable fit for the data, although the high correlations between the four factors suggested a general second-order social support factor.

Reliability

Adequate internal and test-retest reliability was reported by the authors. With the scale for students, the internal reliability (alpha) coefficient was 0.86, with subscale coefficients ranging from 0.60 to 0.92. The alpha coefficients for the general population scale ranged from 0.88 to 0.90. The subscale correlations ranged from 0.62 to 0.82.

The student scales were tested for test-retest reliability at a four-week interval; the reported correlations were 0.87 for the total scale and 0.71–0.87 for the subscales. The general population scale was administered twice with a two-day interval; the reported correlation for the total scale was 0.87, and the subscale correlations ranged from 0.67 to 0.84. The six-week test-retest correlations for the smoking-cessation sample were 0.70 overall and 0.63–0.69 for the subscales. The six-month test-retest correlations for this group were 0.74 for the total scale and 0.49–0.68 for the subscales. In sum, although this scale has only recently been developed and further testing is required, it does cover a wide range of supportive relationships, and the results of testing for reliability and validity are fairly good. One limitation is that the scale appears to be orientated towards a young or active population group and so may not be suitable for use with frail elderly people (e.g. it includes questions about advice on job changes and sexual problems, and about the availability of people to give lifts to the airport). The scale has been used successfully with psychiatric patients living in the community, and was able to discriminate between those who lived independently and those who lived with their families or in group homes (Pomeroy et al. 1992).

THE NETWORK TYPOLOGY: THE NETWORK ASSESSMENT INSTRUMENT

Because support can be predictive of outcomes in a range of areas of life, knowing a person’s network type can be valuable to health and social services professionals. Also, the aim of social service providers is to develop a care plan for people which supplements, rather than supplants, the care they receive from members of their social network. Therefore, several investigators have attempted to develop frameworks for assessing the network as a basis for care plans (Kaufman 1990). One of the more successful attempts has been that of Wenger (1989, 1992, 1994) in relation to elderly people. As a result of qualitative, anthropologically oriented research with elderly people, Wenger was able to identify different types of social network, and she constructed a typology of support networks (Wenger 1989, 1994; Wenger and Shahtahmasebi 1991). The typology is based on the theory that the experience of ageing is mediated through, and determined by, the capacity of the support network to respond to change and the nature of the resulting change (Wenger and Shahtahmasebi 1991). The support network is defined by Wenger (1994) as those involved with the person in a significant way: as a member of the household, or in providing or receiving companionship, emotional support, instrumental help, advice or personal care.

The Network Typology is developed from the data collected by a questionnaire, the Network Assessment Instrument, which identifies the availability and proximity of family and other ‘support ties’, the degree of involvement demonstrated by respondents with family, friends, neighbours and community, network density (how many network members know each other), and the size and content of the larger social network. The distinguishing factors between the different types of network are: the availability of local close kin; the level of involvement of family, friends and neighbours; and the level of interaction with the community and
voluntary groups. The collection of these data enabled the investigators to identify five types of support network, based on the lifestyle of the elderly person and their relationship to their network (Wenger 1994):

Local family-dependent support network: This network is mainly focused on close local family ties, with few peripheral friends and neighbours; it is often based on a shared household with, or close to, an adult child, usually a daughter. Nearly all support needs are met by the family. Community involvement is low. The network is small, and the person is more likely to be widowed, older and in poorer health than people with other types of network.
The most numerous members of this network are relatives living within one mile, followed by household members and neighbours.

Locally integrated support network: This includes close relationships with local family, friends and neighbours (the latter two often overlap). These are based on long-term residence and active community involvement in religious/voluntary organizations, currently or in the recent past. These networks are larger than others.
These networks have the highest numbers of neighbours, who are also likely to be friends.

Local self-contained support network: This is characterized by ‘arm’s-length’ relationships or infrequent contact with at least one relative living in the same or adjacent community (usually a sibling, niece or nephew). Childlessness is common. Reliance is mainly on neighbours, and the lifestyle is focused on the household. Community involvement, if any, is low key. Networks are smaller than average.
These networks also have more neighbours than other network categories, and the network is predominantly local.

Wider community focused support network: This network is typified by active relationships with distant relatives, usually children, and high salience of friends and neighbours. The distinction between friends and neighbours is maintained and there is engagement in community or voluntary organizations. There has frequently been retirement migration, local kin are absent, and networks are larger than average. This network is commonly a middle-class or skilled working-class adaptation.
The most numerous members of these networks are friends who live within a one-mile radius, followed by relatives who live more than 50 miles away, and then equal numbers of friends living one to five miles away and neighbours. This membership reflects the absence of local kin.

Private restricted support network: This network is associated with an absence of local kin, although a high proportion of people in this group are married. Contacts with neighbours are minimal, there are few friends nearby and there is a low level of community contacts or involvement. There are two sub-types included in this network: independent married couples, and dependent elderly people who have withdrawn or become isolated from local involvement. A low level of social contact often represents a lifelong adaptation. Networks are smaller than average.
The most numerous members of this type of network are relatives who live more than 50 miles away, followed by members of the household or relatives who live 15–50 miles away, reflecting the absence of local kin.

Wenger (1994) reported that most networks can be categorized into one of these types, and that local self-contained and private restricted networks are less robust than other network types, and more vulnerable in the face of ill health or a crisis. The different lifestyles and social characteristics of people represented by the different network types are also described.

Content

The Network Assessment Instrument was published in full by Wenger (1994), but it can only be used in conjunction with the appropriate training package developed by Wenger. This is because identifying the network is a planning aid and no substitute for professional training in appropriate interventions. A video-based training and resource pack for practitioners and service providers is available (Wenger 1995), and the author has details of training courses. The use of the typology has been piloted with a number of social services teams, and a guide has been published (Wenger 1994).

The Instrument contains eight questions, each with one of three types of response category, scored on either a three-point (coded A, B, C) or a six-point (coded A–F) scale and relating to the distance at which network members live or to the frequency of interaction/activity (e.g. No relatives (A) to (lives) 50+ miles (F); Never/no friends (A) to (chat/do something) Less often (F); Yes, regularly (A) to No (C)). The questionnaire also contains the information the interviewer needs to code the type of network (A–F) immediately from each response code. Examples of the questions are:

How far away, in distance, does your nearest child or other relative live? (Do not include spouse.)
If you have any friends in this community/neighbourhood, how often do you have a chat or do something with one of your friends?
Do you attend meetings of any community/neighbourhood or social groups, such as old people’s clubs, lectures or anything like that?

Scoring

Each item is analysed independently, and the network typology is derived by computer from the response codes and the network types coded by interviewers on the questionnaires.

While much descriptive data has been published based on the Network Typology and the Network Assessment Instrument, little information on its reliability and validity has been published to date. It was piloted with social services practitioners, and Wenger (1994) reported that most members of her sample could be classified by the typology. Network type was associated, as predicted, with levels of dependency, reliance on services, informal help, emotional support and social interaction, and with survival of respondents over an eight-year follow-up period (Wenger and Shahtahmasebi 1991). These associations suggest that the instrument has construct and predictive validity. Despite changes over time in the network sizes of respondents, the overall frequency distribution of network types in the sample remained constant (Wenger and Shahtahmasebi 1991), suggesting that the instrument is robust.

Kirby et al. (2000) used the measure in a study of over 1,000 elderly people living at home. They reported associations between cognitive impairment and network type in the expected direction: those with cognitive impairment had a lower proportion of locally integrated and a higher proportion of private restricted networks. Late-life depression was also associated with low levels of community integration. Data on the psychometric properties of the instrument (validity and reliability correlations, factor analyses) have not yet been published. It has been presented here in view of its potential and its popularity among service providers.

LONELINESS

A further concept requires introduction under the heading of social networks and support: that of loneliness. Loneliness is distinct from social isolation and can be defined as an unwelcome feeling of lack or loss of companionship. Weiss (1973) distinguished between situational and personality theories of loneliness. The former emphasize environmental factors as causes of loneliness (e.g. death of a spouse, moving to a new city). Evidence on the association between the size and quality of social networks and loneliness is unclear. Apart from a number of psychological studies, which are of questionable generalizability because they tend to use student volunteers as subjects, the quality of data on loneliness in relation to the quality of social networks is poor. Most surveys have either been based on small samples or have used crude measures of social network or loneliness (Jones et al. 1985; Bowling et al. 1989).

Although a large number of scales for the assessment of loneliness have been devised, few have been published. The existing scales have been reviewed by Russell (1982). The most widely used scale is the UCLA Loneliness Scale, and this is reviewed below.

THE REVISED UNIVERSITY OF CALIFORNIA AT LOS ANGELES (UCLA) LONELINESS SCALE

The UCLA Loneliness Scale is the best-known and most widely used measure of loneliness. It was developed, and later revised, by Russell et al. (1978, 1980a, 1980b). The authors aimed to identify common themes that characterized the experience of loneliness for a broad spectrum of individuals. The UCLA Loneliness Scale was intended to be a global measure.

The scale was based on statements used by people to describe their loneliness. It began with 25 items and asked individuals to rate how frequently they felt the way described, from ‘never’ to ‘often’, on a four-point scale. The scale was tested on clinic volunteers and students. The final loneliness scale consisted of 20 items; all the selected items had item-total correlations above 0.50.

One problem with the original version of the scale was that all the items were worded in the same (lonely) direction. The implication is that tendencies to respond in a certain way could systematically influence loneliness scores. A second potential problem was social-desirability bias, given the possible stigma associated with admitting to loneliness. Thus Russell et al. (1980a, 1980b) revised the scale to take account of these problems, including ten negatively (lonely) worded and ten positively (non-lonely) worded items. The revised version was tested on 162 students. Testing involved administering the original scale, 19 new items written by the authors, and measures of anxiety and depression. The ten positively and ten negatively worded items that correlated most highly with a set of items asking whether respondents were lonely were included in the scale. All of the item-criterion correlations were above 0.40. Further revisions led to the current version 3 (Russell et al. 1980a, 1980b; Russell and Cutrona 1991; Russell 1996). In the latest version the wording of the items and the response format were simplified to facilitate administration of the measure to less educated populations.

Content

The scale consists of 20 statements, each with a Likert response scale; half are descriptive of feelings of loneliness and half are descriptive of non-loneliness or satisfaction with relationships. A copy of the most recent version can be seen in Russell (1996). Examples of items are:

I lack companionship
It is difficult for me to make friends
There is no one I can turn to
I feel part of a group of friends
There are people I feel close to

There are four choices for replies: never (1), rarely (2), sometimes (3) and often (4).

Scoring

The total score of the scale is the sum of all 20 items. Some items need to be reversed before scoring (i.e. 1 = 4, 2 = 3, 3 = 2, 4 = 1); these are asterisked on the scale.

Validity

Validity was initially assessed by correlating the scale with a single-item self-rating of loneliness (0.79) and by comparing the scores of two samples of participants (the mean of the clinic sample, 60.1, was significantly different from the student group’s mean of 39.1); the loneliness scores were also strongly associated with depression, anxiety, dissatisfaction, unhappiness and shyness. High correlations with other measures of loneliness have been reported (ranging from 0.72 to 0.74); these have been reviewed by Russell (1982). The main limitation of this research is that it has been largely confined to samples of students, raising questions about the validity of the scale in assessing loneliness in other populations. Although the scale needs further testing on other population groups, initial results do appear encouraging.

Early evidence of the discriminative and predictive validity of the revised scale was provided by studies reporting strong relationships between the scale and measures of depression, and earlier research showed strong relationships between loneliness items and anxiety and self-esteem (Russell 1982). The scale was tested for validity on 162 students, including against the Beck Depression Inventory: correlations of 0.32 were obtained with an anxiety index and of 0.62 with the Beck Depression Inventory. Discriminant validity was also assessed in a study of 237 students, with the aim of establishing whether the scale was measuring a different dimension of well-being from the depression scales. Multivariate analysis was used for this purpose, and it was reported that a loneliness index explained an additional 18 per cent of the variance in loneliness scores beyond that accounted for by the mood and personality measures. Concurrent validity was also assessed, and significant associations were reported between loneliness scores and both solitary activities and having fewer friends (Russell 1982).

However, the validity of loneliness scales is difficult to assess. In terms of content and face validity, the most face-valid loneliness measures are questions simply asking ‘are you lonely?’ Such questions, however, are open to social-desirability bias, which could limit their validity. Problems with the assessment of criterion validity also exist: loneliness is not synonymous with isolation, and there is an absence of external validity criteria for loneliness. Even ratings by others who know the person are not highly reliable.
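
Most of the validity and reliability evidence for the scales in this section is expressed as correlation coefficients (e.g. 0.79 between the UCLA scale and a single-item self-rating of loneliness, or the test-retest coefficients reported for each scale). The text does not state which coefficient each study computed; assuming Pearson’s product-moment r, the minimal Python sketch below shows how such a figure is obtained from paired scores. The function name `pearson_r` and the example scores are hypothetical illustrations, not data from any of the studies cited.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two paired score lists.

    Hypothetical helper: x might be total scale scores and y a single-item
    self-rating for the same respondents (or the same scale at time 1 and
    time 2, giving a test-retest coefficient).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for five respondents (not taken from any study).
scale_totals = [28, 35, 41, 52, 60]
self_ratings = [1, 2, 2, 3, 4]
print(round(pearson_r(scale_totals, self_ratings), 2))
```

In Python 3.10 and later, `statistics.correlation` computes the same coefficient; the explicit version is shown so that each step of the formula is visible.
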

The scale has been used in a number of studies of emotional well-being, which have supported its validity. Stokes (1985) analysed social-network structure and personality factors in relation to loneliness, using the earlier version of the UCLA scale, in a sample of 97 male and 82 female students. He reported that network density was the factor most strongly related to loneliness: people with interconnected networks tend to be less lonely. This may be because they have a greater sense of community, of belonging to a group. Extraversion and introversion were also related to loneliness.

In further support of its construct validity, Mellor and Edelmann (1988) used the revised UCLA loneliness scale (version 3) in a study of loneliness in 36 people aged over 65. They found that loneliness was associated with having fewer confidant(e)s and with lack of mobility. Riggio et al. (1993) also used the scale and reported that level of social skills predicted perceptions of loneliness. Mahon et al. (1995), in a study of adolescents, reported that the construct validity of the scale was supported by its associations with theoretically relevant variables, including friend solidarity and dependency. Russell (1996) also reported significant correlations with other measures of loneliness, and with measures of health and well-being.

Factor analyses of the scale have not produced consistent results, with some suggesting a unidimensional structure and others two- or three-factor structures (Cuffel and Akamatsu 1989; Mahon and Yarcheski 1990; McWhirter 1990; Mahon et al. 1995). Russell (1996) reported a confirmatory factor analysis which indicated that a model incorporating a global bipolar loneliness factor, along with two method factors reflecting the direction of item wording, provided a good fit.

Reliability

Studies of both the original version (which had a different scoring method of 0–3) by Russell et al. (1978) and Perlman and Peplau (1981), and of the revised version, all reported good results for the validity and reliability of the scale (Russell et al. 1980a, 1980b; Russell 1982). Russell (1996) cited coefficient alphas ranging from 0.89 to 0.94 for the revised version 3, in support of its internal consistency.

In relation to test-retest consistency, Jones (cited in Russell et al. 1978) found a correlation of 0.73 over a two-month period in a student sample, and Russell (1996) reported a test-retest reliability of r = 0.73. In sum, this scale is the most extensively tested of all the loneliness scales available, and results so far are encouraging. The main limitation of testing to date has been its reliance on student subjects, who are not representative of the general population.

In sum, this is the most widely used scale of loneliness, and it has been found to predict a wide range of mental and physical health outcomes. The scale is reproduced in Russell (1996); while it is freely available, it is copyrighted by Russell, and the developers request that users send them a copy of all results once the work is completed.
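
The UCLA scoring rule described above (the sum of 20 items scored 1 to 4, with asterisked items reversed so that 1 = 4, 2 = 3, 3 = 2, 4 = 1) and the coefficient alphas cited for internal consistency can both be illustrated with a short Python sketch. It assumes responses are stored as integers in questionnaire order. The actual reverse-keyed positions must be taken from the asterisks on the published form (Russell 1996); the positions used in the usage line are hypothetical. The alpha function uses the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), and should be applied to item scores after reversal. All function names are illustrative and not part of any published scoring software.

```python
from statistics import variance

N_ITEMS = 20            # the revised UCLA scale has 20 items
LOWEST, HIGHEST = 1, 4  # never (1), rarely (2), sometimes (3), often (4)


def score_items(responses: list[int], reversed_items: set[int]) -> list[int]:
    """Return item scores with reverse-keyed items recoded (1=4, 2=3, 3=2, 4=1).

    reversed_items holds 1-based item positions; the real positions are the
    asterisked items on the published scale (hypothetical in this sketch).
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    scored = []
    for position, value in enumerate(responses, start=1):
        if not LOWEST <= value <= HIGHEST:
            raise ValueError(f"item {position}: response {value} is outside 1-4")
        # Reversal maps a response v to (1 + 4) - v.
        scored.append(LOWEST + HIGHEST - value if position in reversed_items else value)
    return scored


def ucla_total(responses: list[int], reversed_items: set[int]) -> int:
    """Total score: the sum of all 20 item scores after reversal (range 20-80)."""
    return sum(score_items(responses, reversed_items))


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Coefficient alpha for a respondents-by-items matrix (one row per person)."""
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("need at least two items and the same number in every row")
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical: assume items 1-10 are the reverse-keyed ones.
    all_often = [4] * N_ITEMS
    print(ucla_total(all_often, set(range(1, 11))))  # 10 x 1 + 10 x 4 = 50
```

Keeping the reversal step in `score_items` means the recoded item scores, rather than the raw responses, can be passed to `cronbach_alpha`; alpha computed on unreversed items worded in opposite directions would be misleadingly low.
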
7
MEASURING THE
DIMENSIONS OF
SUBJECTIVE WELL-BEING
Quality of life cannot be equated with just one dimension of well-being; it is the subjective sum of multiple physical, emotional, social and objective dimensions of one’s life. However, it has been argued that, in the developed world where basic human needs have generally been met, quality of life equates with perceived subjective well-being, and is the extent to which pleasure and happiness, and ultimately satisfaction with life, have been obtained (Andrews 1974). This reflects the influence of early Greek and nineteenth-century utilitarian philosophy, with their focus on hedonistic aspects of life: the maximization of well-being, happiness, pleasure and satisfaction. This is also reminiscent of Bentham’s ([1834] 1983) utilitarian philosophy, which regarded well-being as ‘the difference in value between the sum of pleasures of all sorts and the sum of pains of all sorts which a man experienced in a given period of time’, and held that society should aim for the greatest good of the greatest number. Others argue that pleasure and satisfaction are insufficient for a good quality of life, and that a sense of purpose or meaning, self-esteem and self-worth are crucial for good QoL, including QoL in people with dementia (Sarvimäki 1999).

The concept of subjective well-being emerged in the 1950s in attempts to move beyond objective indicators of life quality in the monitoring of social change (e.g. societal levels of income, crime, housing quality), and towards more subjective measures, which more meaningfully reflected people’s lives and experiences (Land 1975; Keyes et al. 2002).

Influential social-science models of quality of life in North America were then based primarily on the related but distinct concepts of ‘the good life’, ‘life satisfaction’, ‘subjective well-being’, ‘social well-being’, ‘morale’, the balance between positive and negative affect, ‘the social temperature’, or ‘happiness’ (Gurin et al. 1960; Cantril 1965; Bradburn 1969; Andrews and Withey 1976; Campbell et al. 1976; Andrews 1986). Of concern, however, both theoretically and methodologically, is the interchangeable use, without justification, of these distinct concepts. For example, morale and well-being are commonly categorized as components of psychological well-being and measured using one of a number of overlapping scales of life satisfaction, well-being, or morale and affect (e.g. Kutner et al. 1956; Neugarten et al. 1961; Cantril 1965; Bradburn 1969; Wood et al. 1969; Lawton 1972, 1975; Andrews and Withey 1976; Campbell et al. 1976; Coleman 1984; Dupuy 1984; Antonovsky 1993).

The terms subjective well-being and psychological well-being are also used interchangeably, although these concepts have been distinguished by Keyes et al. (2002), who defined subjective well-being as the evaluation of life in terms of satisfaction and the balance between positive and negative affect, and psychological well-being, drawing on theories of human development, as the perception of engagement with the existential challenges of life. They argued that while both approaches assess well-being, they address different features of this concept: subjective
well-being was said to involve more global evaluations of affect and life quality, whereas psychological well-being addressed perceived thriving in relation to the existential challenges of life (examples include the pursuit of meaningful goals, personal development, self-actualization, coping strategies, growth and mastery (especially in the face of cumulative adversity), and the development of quality social relationships). Their study of over 3,000 adult Americans provided strong empirical support for their hypothesis that these are related but distinct areas: their factor analyses confirmed the distinction, and also showed that people who were younger and more educated had different combinations of the two (higher psychological than subjective well-being) in comparison with respondents who were older and less educated.

However, the selection of measures is often made by investigators without any theoretical justification or any attempt to fit a predefined definition or model of ‘well-being’, despite the fact that a scale measuring life satisfaction, while overlapping with the other related but distinct concepts, cannot adequately measure them.

The various dimensions of well-being are subjective, and thus their measurement is also based largely on subjective self-ratings (Campbell 1976). The measurement of well-being has been enhanced by the development and use of psychometrically sound measurement scales (Andrews and Crandall 1976). However, a major methodological problem remains the lack of consistency in the usage of the terms ‘happiness’, ‘life satisfaction’ and ‘morale’ and related concepts (Stones and Kozma 1980; Stull 1987). As previously indicated, these concepts are not identical, although many researchers continue to treat them as interchangeable (George and Bearon 1980). For example, while the concepts and measures of the different dimensions of well-being are related, and generally correlate highly, suggesting that they all tap a common underlying construct (Lohmann 1977), ‘life satisfaction’ and ‘morale’ have a more cognitive component, while happiness has a more affective or emotional component (Andrews and McKennel 1980). The cognitive component implies evaluation, while affect refers to positive or negative feeling. Distinguishing between the cognitive and affective components of subjective well-being is particularly useful when interpreting data. For example, elderly people often report lower levels of happiness but higher levels of life satisfaction than younger people (Campbell et al. 1976; Campbell 1981). This suggests, then, that while cognitive evaluations of life as a whole increase with age, positive affect may decline. Perhaps people become more jaded in their emotions with age, but perceive their level of achievement more favourably, or adjust their aspirations (Andrews and Robinson 1991). While the differences could simply be due to birth cohort effects (Campbell 1981; Inglehart and Rabier 1986), the example still illustrates the value of distinguishing between the concepts.

The theoretical literature on subjective well-being also divides this global concept into state (current well-being) and trait (well-being as a feature of character), and postulates that self-reported well-being measures reflect at least four factors: circumstances, aspirations, comparisons with others, and a person’s baseline happiness or disposition (Warr 1999). Some measures of well-being also include individuals’ assessments of their overall past and present life. For conceptual clarity, the following dimensions are defined next: happiness, life satisfaction, morale, self-esteem and self-concept, and sense of coherence. The measures that are reviewed here relate to these concepts of subjective well-being, although some of these measures also include items relating to psychological well-being as defined by Keyes et al. (2002), and not all authors have made their theoretical underpinning (i.e. concepts and definitions) fully explicit.

HAPPINESS

Happiness has an affective or emotional component (Andrews and McKennel 1980). In contrast to morale and life satisfaction, happiness is a short-term affect and can fluctuate on a daily basis; it is a transitory mood of ‘gaiety and elation’ that reflects how people feel towards their current state of affairs (Campbell et al. 1976). Blanchflower and Oswald (2001), following Veenhoven (1991, 1993), defined happiness as the degree to which the individual judges the overall quality of his or her life to be favourable or unfavourable. Some have argued that the achievement of happiness depends on what one
has relative to a norm. However, Veenhoven (1991) has criticized relativity theory and points to data showing that happiness is associated with objective, rather than relative, improvements in one’s circumstances. Heylighen and Bernheim (2000) have incorporated both arguments. They define the dimensions that make up well-being and quality of life, including happiness, as the sum of mainly relative subjective factors, with a small contribution from objective factors. Some investigators also define happiness in terms of life satisfaction, thus confusing the two concepts. For example, Argyle et al. (1989) defined happiness in their studies as the frequency of joy, the average level of satisfaction and the absence of negative feelings.

Reflecting the vogue for self-empowerment and self-improvement, Seligman (2002) coined the term ‘authentic happiness’. He claimed that happiness can be cultivated through self-knowledge and by building on one’s signature (positive, optimistic) characteristics, that is, by knowing one’s greatest strengths and then recrafting one’s life to use them. He also provided over a dozen questionnaires purporting to measure different dimensions of happiness. These include current happiness, general happiness, approaches to happiness, sensitive happiness, positive and negative affect, optimism and purpose in life, as well as Diener et al.’s (1985) Satisfaction with Life Scale.

Happiness, like other subjective measures, is not without measurement problems. One is seasonal variation: Smith (1979) reported that the global measure of happiness appears to vary by season. Happiness is highest in the spring, declines in the summer and autumn, and drops to its lowest point in winter. The other is the possibility of positivism bias among sample members: some people may be reluctant to admit the extent of any unhappiness they feel, on the assumption that it is socially desirable to be ‘happy’. Nonetheless, most investigators have chosen relatively simple measures of happiness. A simple question on happiness has been asked in the US General Social Surveys since 1946: ‘Taken all together, how would you say things are these days – would you say you are very happy, pretty happy, or not too happy?’ (GSS question 157, reproduced by Blanchflower and Oswald 2001; and see the classic analyses by Gurin et al. (1960) and Bradburn (1969)). Responses show stability over time: overall, well-being has not risen systematically across time. European opinion surveys (Eurobarometer surveys) and ad hoc data from the British Household Panel Survey show similar results (Blanchflower and Oswald 2001).

LIFE SATISFACTION

Life satisfaction refers to an overall assessment of one’s life, or a comparison reflecting some perceived discrepancy between one’s aspirations and achievements. Commonly used measures of life satisfaction cover feelings of success in achieving life goals. They usually also include several factors pertaining to the present:

- pleasure from everyday activities;
- perception of life as meaningful;
- positive self-image;
- optimistic outlook;
- satisfaction with different domains of life, such as health, work, relationships, activities and standard of living (Neugarten et al. 1961).

Some explicit or implicit comparison group is usually involved (‘compared to others’). Life satisfaction is thus a long-term cognitive appraisal of past, present and overall life, and it is relatively stable in middle to old age (Campbell et al. 1976; Campbell 1981; Bowling et al. 1996). This stability is the likely explanation for the overall lack of change in life satisfaction measures administered in longitudinal surveys. Some studies report an increase in satisfaction in older age groups, and there are no consistent associations with sex. It is uncertain whether investigators have adequately separated life satisfaction from happiness. Life satisfaction has been widely used as a social indicator of quality of life (e.g. Andrews and Withey 1976; Campbell et al. 1976).

Beyond measures of overall life satisfaction, some investigators have developed measures of satisfaction with illness (Hyland and Kenyon 1992). Others have developed customized measures of life satisfaction to complement symptom-orientated measures (Frisch et al. 1992). These indicators are not included here. They are, however, a welcome development in a field dominated by negative measures of broader health and health-related quality of life.

MORALE

Morale is the most poorly defined of these concepts, despite its importance in older age
(Wenger 1992). In contrast to happiness, which relates more to positive/negative feeling, morale (like life satisfaction) has a more cognitive, evaluative component (Andrews and McKennel 1980). It has been suggested that morale can be measured multi-dimensionally, in relation to a person’s feelings about their life, about themselves and about their relation to the world (Nydegger 1986). It is often defined in terms of:

- a basic sense of satisfaction with oneself;
- a feeling that there is a place in the environment for oneself;
- acceptance of life;
- a generalizable feeling of well-being (Lawton 1972, 1975).

More precisely, it has been defined in terms of confidence and enthusiasm (George 1979; Stones and Kozma 1980). Kutner et al. (1956) defined morale as a mental state, or set of dispositions, which condition one’s response to problems in daily life.

SELF-ESTEEM AND SELF-CONCEPT

Self-esteem is viewed as a component of mental health, as well as a component of general assessment of life (Andrews and Withey 1976), and thus of satisfaction with life. Again, these concepts are distinct, while interlinked. Psychologists define self-esteem in terms of a sense of self-worth: a belief or evaluation that one is a person of value, accepting personal strengths and weaknesses (Rosenberg 1965; Coopersmith 1967; Wells and Marwell 1976). Self-esteem is therefore a self-evaluation. There are several commonly used scales of self-esteem in adults of all ages (e.g. Fitts 1965; Rosenberg 1965; Coopersmith 1967).

Related concepts include self-regard, self-acceptance, self-concept or self-image. These are all based upon the individual’s assessment and evaluation of himself or herself (Crandall 1973; Wells and Marwell 1976). Self-concept is the cognitive component of the self and consists of individuals’ perceptions of themselves (i.e. ‘What am I really like?’). Self-esteem is reflected in one’s self-concept, or self-image, which can be divided into the ideal self (the image aspired to) and the actual self (Coopersmith 1967). The self-concept is multi-dimensional in that people might also view themselves as having multiple selves. For example, different self-related beliefs can emerge in different life domains (family, friends, romantic relationships, work, standard of living/material domains) (Campbell et al. 1976).

Available evidence suggests that positive self-esteem is an important component of general assessment of life (Andrews and Withey 1976). Self-esteem may become increasingly salient with the transition from middle to old age (Schwartz 1975). Most self-esteem theorists suggest, with some evidence, that self-esteem is developed and maintained through a successful process of personal interaction and negotiation with the environment (Rosenberg 1965; Wells and Marwell 1976). In sociology, the critical role of social interaction and the significance of others in developing and maintaining self-esteem are generally emphasized (Wylie 1974). In relation to health and illness, negative self-image has been analysed for its effects on recovery from disease (Wilson Barnett 1981). As successful negotiations are less likely in later life, self-esteem among older people is less likely to be positive. If age changes in self-esteem are to be expected, measures of the impact of a relevant intervention could become confounded with developmental changes over time.

SENSE OF COHERENCE

Antonovsky (1987) coined the concept of a ‘sense of coherence’. It is composed of three elements: comprehensibility, manageability and meaningfulness. These correspond respectively to the three numbered parts of its definition as ‘a global orientation that expresses the extent to which one has a pervasive, enduring though dynamic, feeling of confidence that (1) the stimuli deriving from one’s internal and external environments in the course of living are structured, predictable, and explicable; (2) the resources are available to one to meet the demands posed by these stimuli; and (3) these demands are challenges, worthy of investment and engagement’. These concepts have been little tested, although the inclusion of sense of coherence within a model of quality of life has been given some empirical support (Sarvimäki and Stenbock-Hult 2000).

GLOBAL MEASURES

Life satisfaction, morale and happiness are all global concepts, referring to life as a whole rather than to
specific aspects of it. Global measures are of relevance in assessing well-being, although they may be of limited utility in evaluative research (George and Bearon 1980). Carp (1977) cautions against drawing policy-relevant inferences from data that do not reveal why people are satisfied and with what. Specific measures are an alternative (e.g. housing satisfaction when evaluating the effect of relocation) (Andrews and Withey 1976; Campbell et al. 1976). Global and item-specific measures suit different research questions.

One advantage of using a few global items rather than global or specific scales is brevity: one or two questions rather than a whole battery. However, short items are susceptible to positivism bias, for example respondents’ wish to give socially desirable answers. Short items also lack sensitivity and are therefore of limited predictive value in longitudinal studies. The effect of question order with such items has been relatively unexplored. However, the National Opinion Research Centre found that placing a question about marital happiness immediately before a global question on happiness produced a more positive response on global happiness (Smith 1979).

A review of the literature indicates that well-being has been measured by three major scales as well as by short global items:

- Life Satisfaction Index A (Neugarten et al. 1961);
- the Bradburn Affect-Balance Scale (Bradburn and Caplovitz 1965; Bradburn 1969);
- the Philadelphia Geriatric Center Morale Scale (Lawton 1975);
- global items of happiness and life satisfaction (Robinson and Shaver 1973; Campbell et al. 1976; Smith 1979).

Other popular scales are the Delighted–Terrible Faces Scale (Andrews and Withey 1976) and the Psychological Well-Being Schedule (Dupuy 1978). These scales produce global scores of well-being (life satisfaction or morale). The exception is the Delighted–Terrible Faces Scale, which contains items specific to current life that should be analysed separately. These scales are reviewed in the following sections.

Popular single-item measures and batteries that ask respondents to assess their lives as a whole are the Ladder Scale (Cantril 1965) and the Delighted–Terrible scales (Andrews and Withey 1976). In these, respondents rate themselves on a Likert category scale, mark a rung on a ladder, or select a circle or a face to represent how they feel about a specified area of life. Robinson and Shaver (1973), Sauer and Warland (1982) and Andrews and Robinson (1991) have reviewed a wide range of items and measures of life satisfaction, well-being and happiness.

THE LIFE SATISFACTION INDEX A (LSIA) AND INDEX B (LSIB)

The Life Satisfaction Indexes A and B were developed by Neugarten et al. (1961) to produce a relatively short self-report measure of life satisfaction based on respondents’ feelings. The aim of these scales is to measure general feelings of well-being in order to identify ‘successful’ ageing. Several versions of the LSI exist. All were derived from the five dimensions of past and present life used in life-satisfaction ratings obtained by clinical interview:

- zest versus apathy;
- resolution and fortitude;
- congruence between desired and achieved goals;
- positive self-concept;
- mood tone.

LSIA and LSIB were first developed in 1956, on the basis of item analyses of four rounds of interviews on life satisfaction with people aged 50–70. The samples were drawn from a stratified probability sample of middle- and working-class people living in Kansas City. A second sample, aged 70–90 and based on a quota sample, was interviewed two years later (overall total 177).

All versions of the index are easily administered and rest on a substantial amount of empirical support. The problem with the scale is its global nature, which creates uncertainty about what is being measured. Scale A has been further criticized by Liang (1984) for failing to measure transitory effects.

Content

Scales A and B differ only slightly in content but greatly in form. Life Satisfaction Index A is a checklist of 20 statements with which the respondent either agrees or disagrees. Life Satisfaction Index B has 12 open-ended questions, which are scored according to the content of the answers. The two instruments can be used together or separately (Neugarten et al. 1961). Index A has been used more frequently than B, probably because structured items are easier to administer and quantify.

The original LSIA consists of 20 items (12 positive and 8 negative). A second version, adapted
by Wood et al. (1969), contains 13 of the original 20 items. Another version uses 18 items. An advantage of Scales A and B, and of the shorter versions of A, is that they contain both positive and negative items.

Each of these scales taps a wide variety of content areas, ranging, for example, from happiness to satisfaction with level of activity. Some items look back over past lives or involve comparing the present with the past, while others assess only the present. Examples of Scale A items are:

Positive items
These are the best years of my life.
The things I do today are as interesting to me as they ever were.
I am just as happy as when I was younger.

Negative items
This is the dreariest time of my life.
Most of the things I do are boring and monotonous.
Compared to other people I get down in the dumps too often.

Respondents agree or disagree with the items on Scale A, and they can also classify statements as undecided.

Examples of the open-ended items on B are:

Positive
What are the best things about being the age you are now?
(1 = positive answer; 0 = nothing good about it.)

What is the most important thing in your life at the moment?
(2 = anything outside of self, or pleasant interpretation of future; 1 = hanging on; keeping health or job; 0 = getting out of present difficulty, or nothing now, or reference to past.)

Negative
Do you wish you could see more of your close friends than you do, or would you like more time to yourself?
(2 = OK as it is; 0 = wish could see more of friends; 0 = wish more time to self.)

How much unhappiness would you say you find in your life today?
(2 = almost none; 1 = some; 0 = a great deal.)

Scoring

Scores are summed over all the items, so the ratings of each dimension are combined. The criticism of this approach is that the separate dimensions are confounded.

There are two scoring methods for the LSI. In the original method, a two-point agree/disagree response choice scored dissatisfaction as 0 and satisfaction as 1. Subsequent analysis of the ‘undecided’ responses led to a three-point scale, scoring satisfaction as 2, uncertain as 1 and dissatisfaction as 0 (Wood et al. 1969).

Population norms for North America have been provided for the full and 13-item versions of the scale. However, they vary between studies, probably reflecting the different samples, and should be used with caution (see Harris 1975 and the reviews by George and Bearon 1980, and McDowell and Newell 1996). Comparisons between studies also require caution, given the different scoring systems used.

In sum, because the scale is multi-dimensional, a single score blurs the relationships between the items. A series of subscales should be calculated instead.

Validity

Using a sample of 177 people aged 50 and over, Neugarten et al. (1961) reported a correlation of 0.55 between LSIA and the original clinical life-satisfaction rating instrument. The correlation of scores on LSIA with judges’ ratings was 0.52, and the correlation of LSIB with judges’ ratings was 0.59. These analyses are described in detail by Neugarten et al. (1961).

Criterion validity was established by a clinical psychologist’s assessments of the 80 remaining sample members. However, these took place 18–22 months after the fourth interview and were of questionable value.

Correlations of the LSIA with the Affect-Balance Scale and with the Philadelphia Geriatric Center Morale Scale have been reported as 0.66 and 0.76 respectively (Bild and Havighurst 1976; Lohmann 1977). Bowling and Browne (1991) interviewed 662 people aged 85 and over living at home in London, and their results also supported the convergent validity of the scale. As hypothesized, Neugarten’s Life Satisfaction Scale A correlated significantly with the Delighted–Terrible Faces Scale (life satisfaction – current circumstances) at r = −0.65, and with the General
Health Questionnaire (which assesses mainly anxiety and depression) at −0.47 (i.e. greater life satisfaction was associated with low anxiety and depression). The authors carried out similar studies of almost 300 people aged 65 and over living at home in Essex, and of almost 400 people aged 65–85 living at home in London. Correlations of Neugarten’s Scale with the Delighted–Terrible Faces Scale and with the General Health Questionnaire were −0.24 and −0.47 in Essex, and −0.57 and −0.41 in London, respectively (unpublished data). The correlations with the General Health Questionnaire were very similar across the three studies (−0.41 to −0.47). It is uncertain why the correlations with the Faces Scale vary between studies (−0.24 to −0.65), or why they should be weaker in the younger elderly groups. High correlations between these scales would not be expected, as they tap different dimensions of well-being, although some overlap would be (as is demonstrated). Follow-up studies reported that Scale A was significantly associated with poor functioning and health, supporting the scale’s discriminative ability (Bowling et al. 1993).

The scale was able to predict the outcome of reminiscence therapy with elderly nursing-home residents (Cook 1998). However, it has not always been successful in distinguishing between patients with different conditions (Celiker and Borman 2001). Other inconsistencies in the scale’s performance have been reported. For example, the original authors reported that LSIA did not correlate with demographic factors, but other studies have indicated that it does. In particular, a number of authors have reported positive correlations with socio-economic status (see review by McDowell and Newell 1996).

Several studies using factor analysis and multiple-regression models have confirmed the multi-dimensional nature of the scale and have questioned the original conceptual formulation (Hoyt and Creech 1983).

Reliability

Internal consistency coefficients in various studies range from 0.70 to 0.76 (Dobson et al. 1979) to around 0.79 (Wood et al. 1969; Stock and Okun 1982), depending upon the version used. For inter-rater reliability, seven pairs of judges rated descriptions of 177 cases, and the correlation between the pairs of ratings was 0.79. For the 177 cases, life-satisfaction scores for Scale A ranged from 8 to 25, with a mean of 17.8 and a standard deviation of 4.6. There was no significant difference by sex (Neugarten et al. 1961). The correlation between judges’ and clinical ratings was 0.64, which was regarded as satisfactory in view of these problems. Test-retest reliability coefficients have been reported to range between 0.80 and 0.90 in patients with chronic diseases (Burckhardt et al. 1989).

Bowling et al. (1993, 1996) conducted a longitudinal study of people aged 85 and over in London, and of people aged 65 and over in Essex and London. They found the measure to be stable over two and a half years, but also detected problems with the scale that affect reliability (unpublished information). The final item is ‘In spite of what people say, the life of the average person is getting worse not better’. Respondents often meet it with puzzlement about a stereotyped ‘average person’, because the item assumes agreement over what an ‘average’ person is. Moreover, respondents’ agreement with this item does not necessarily tell us whether their own lives are getting ‘worse not better’. The item also makes the scale too diverse in the number of dimensions of life satisfaction it covers.

Two other items also require refinement:

- ‘I feel my age but it does not bother me’ is difficult to answer for respondents who feel their age and are bothered by it.
- ‘Compared to other people my age, I look smart when I am dressed to go out’ is difficult to answer for respondents who are housebound, or who live in institutions and do not go out.

Bowling et al. (1993, 1996) found that alpha reliability coefficients for the scale ranged from 0.73 to 0.80, and split-half reliability coefficients from 0.65 to 0.74 (unpublished data).

THE LIFE SATISFACTION INDEX Z 13-ITEM VERSION (LSIZ)

Validity and reliability

Wood et al. (1969) derived a 13-item version of Neugarten et al.’s (1961) Scale A, known as the LSIZ, which is probably the most popular version. Scores on this version range from 0 to 26. Item scores are: satisfaction: 2; dissatisfaction: 0;
don’t know: 1. The 13-item LSIZ has been used extensively in the USA.

Moderate correlations were obtained between the LSIZ and a scale of social engagement (0.49), and between the LSIZ and the Symptoms of Anxiety and Depression Scale (0.49). This suggests that the scales assess some common aspect of well-being (Wood et al. 1969). Usui et al. (1985) used the LSIZ in a community survey of people aged 60+ in Jefferson County, Kentucky. It was significantly associated with other variables in the expected direction, supporting its construct validity. The authors reported a correlation of −0.22 between the number of physical health problems and life satisfaction. Life satisfaction showed similar correlations with various social activities and with income level, and these associations were confirmed by multiple-regression analysis. Usui et al. also reported that older respondents had higher life-satisfaction scores, in contrast to Morgan et al.’s (1987) study of over 1,500 people aged 65+ in Nottinghamshire in the UK.

Kozma and Stones (1987) used the LSIZ, along with other measures of well-being, in a study of 150 people aged between 50 and 82 on acute psychiatric or community psychiatric wards in Newfoundland. Correlations between the LSIZ and the Philadelphia Geriatric Center Morale Scale (PGCMS) were high (0.74). The LSIZ correctly identified 72 per cent of the community-hospital or acute-ward samples, and the PGCMS correctly identified 74 per cent. The authors also administered a scale of social desirability. They reported that controlling for social desirability did not enhance the construct validity of the well-being scales.

The LSIZ was found to be sensitive to change in a study of participation in community programmes (Wylie 1970). Kritz-Silverstein et al. (2002) reported that it discriminated between women who had undergone hysterectomy and those who had not, with the former group expressing significantly higher life satisfaction. Dennerstein et al. (2000) used the LSIZ in a longitudinal study of menopausal transitions. Six years after baseline measurement, they found no differences in life satisfaction, irrespective of menopausal status or use of hormone replacement therapy. However, life satisfaction was associated with mood, with earlier attitudes towards the menopause and ageing, and with relationship with partner.

Wood et al. (1969) reported that the refined 13-item version of the Index shows a split-half reliability of 0.79. Edwards and Klemmack (1973) reported an internal consistency reliability coefficient of 0.90.

Although the LSI measures are satisfactory in terms of standard tests of reliability and validity, their global and multi-dimensional nature poses problems. The issue is: what is being predicted? Nevertheless, these scales are the most commonly used measures of well-being in gerontological research (Larson 1978; Stull 1987).

THE AFFECT-BALANCE SCALE (ABS)

Bradburn (1969) described the Affect-Balance Scale as an indicator of happiness or general psychological well-being. Bradburn and Caplovitz (1965) hypothesized that subjective well-being could be indicated by a person’s position on two independent dimensions: positive and negative affect. Well-being is expressed as the balance between the two. Research on the structure of psychological well-being supports the hypothesis that positive and negative well-being are related but do not form a bipolar continuum (Bradburn 1969; Ryff 1989; Ryff and Keyes 1995). A person could therefore experience some aspects of negative well-being and other aspects of positive well-being simultaneously (Marks and Lambert 1999), and positive factors can compensate for negative feelings.

The scale was developed with a sample of 2,006 adults in Illinois (Bradburn and Caplovitz 1965). It was revised following a study of 2,787 adults of mixed socio-economic and ethnic groups, drawn from five probability random samples in Detroit, Chicago, Washington and ten other US cities (Bradburn 1969). In the latter study, respondents were reinterviewed 12 weeks apart.

This scale has been subjected to a great deal of analysis (Knapp 1976). Originally 12 items, it now has ten: five referring to ‘positive affect’ and five referring to ‘negative affect’. The two subscales are independent, although both correlate with happiness (Bradburn 1969). ‘Balance’ refers to the balance between positive and negative affect reflected in an individual’s score on the scale (the balance is the result of an additive process). However, the scale is also complicated by items referring
to activation (e.g. the item ‘excited’ or ‘interested’ in something). In addition, some of the items appear to measure accomplishments. The scale is self-administered.

Content

The wording of the questions appears to have varied between studies. Bradburn specified that the time referent should be ‘the past few weeks’ (originally ‘the past week’). Others have used ‘the past few months’ or no time referent at all (see review by McDowell and Newell 1996). An advantage of the scale is that some items refer to positive psychological states, reflecting the recent interest in positive health. Examples of scale items are:

Positive items include
Things going your way.
Excited, interested in something.
Pleased about having accomplished something.

Negative items include
Upset because someone criticized you.
Very lonely, remote from people.
Bored.

Scoring

Replies are dichotomous (yes/no). Differential weights were tested but did not significantly alter the results, so they are not used. Each ‘yes’ response to the ten items is assigned a value of 1. The five positive-affect items are summed separately from the five negative-affect items. The negative-affect score is then subtracted from the positive-affect score, and the difference is taken as the final score, indicating the level of psychological well-being. Bradburn (1969) suggested adding a constant (+5) to eliminate negative summary scores, so that scores range from 0 to 10 rather than from −5 to +5.

Validity

Correlations with other measures (testing for validity) are around 0.66; this figure was obtained with an 18-item version of Neugarten’s LSIA (Bild and Havighurst 1976). A review of the scale by George and Bearon (1980) reports inter-scale correlations of between 0.61 and 0.64 with other morale scales and with an 18-item version of the LSIA.

Berkanovic et al. (1988) used the scale in an interview study of distress and help-seeking among 950 respondents in Los Angeles. The authors found no relationship between distress and use of medical care, although distressed respondents reported more illnesses. Significant differences in scale scores by sex of respondents have been reported (Kushman and Lane 1980). Possibly the best-known application of the scale was by Berkman (1971) in the Alameda County survey, although she used only eight of the items. She reported a correlation of 0.48 with a 20-item index of neurotic traits. Subsequent studies have reported associations in the expected direction between the ABS and extraversion and neuroticism (Charles et al. 2001; Cheng 2003), and between the ABS and social interactions (Balaswamy and Richardson 2001).

Maitland et al. (2001) tested the factor structure of the scale using data from cross-sectional and longitudinal surveys of adults aged 54+. The stability of the positive and negative affect factors was moderate over a three-year follow-up period. They also found age differences in loadings (for the ‘upset’ item at Time 1) and gender differences in loadings (for the items on ‘feeling on top of the world’ and things ‘going your way’).

Cherlin and Reeder (1975) have criticized the scale, arguing that its two-dimensional structure is not correct. They suggested that it contains a third component, activation level (e.g. ‘particularly excited or interested in something’). Borgatta and Montgomery (1987) have also argued that some items seem to measure instrumental aspects (e.g. accomplishments). The scale has been reported to be sensitive to change (Bradburn 1969).

Reliability

Internal consistency (inter-item) correlations range from 0.47 to 0.73 for the positive scale and from 0.48 to 0.73 for the negative scale (Cherlin and Reeder 1975; Warr 1978). Inter-scale correlations were modest at 0.24 to 0.26 (Warr 1978). These correlations are considerably higher than the early correlations reported by Bradburn (1969).

Bradburn (1969) tested the scale for reliability and reported a test-retest correlation of 0.76 over three days. For nine items the associations exceeded 0.90, and for the item ‘excited or interested’ the test-retest correlation was 0.86.
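The scoring rule for the Affect-Balance Scale described under Scoring above can be shown as a minimal sketch: one point per ‘yes’, positive and negative items summed separately, positive minus negative, plus Bradburn’s optional constant of 5. The Python below assumes the ten answers are already coded as True (yes) or False (no) and split into the five positive-affect and five negative-affect items. The function name and input layout are illustrative assumptions, not part of any published scoring software.

```python
from typing import Sequence


def affect_balance_score(positive: Sequence[bool],
                         negative: Sequence[bool],
                         add_constant: bool = True) -> int:
    """Score the ten-item Affect-Balance Scale (hypothetical helper).

    positive: yes (True) / no (False) answers to the five positive-affect items.
    negative: yes (True) / no (False) answers to the five negative-affect items.
    add_constant: if True, add Bradburn's constant of 5 so the score runs
                  from 0 to 10 instead of from -5 to +5.
    """
    if len(positive) != 5 or len(negative) != 5:
        raise ValueError("expected five positive and five negative items")
    # Each 'yes' scores 1; the two subscales are summed separately.
    positive_affect = sum(1 for answer in positive if answer)
    negative_affect = sum(1 for answer in negative if answer)
    # Balance = positive affect minus negative affect.
    balance = positive_affect - negative_affect
    return balance + 5 if add_constant else balance


# Three 'yes' answers to positive items and one to negative items: (3 - 1) + 5.
print(affect_balance_score([True, True, False, True, False],
                           [False, True, False, False, False]))  # 7
```

The two subscales are reported to be independent, so in practice it may be worth keeping the positive and negative subtotals alongside the balance score rather than reporting the balance alone.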
In sum, the scale has acceptable levels of validity and reliability and has been found to be applicable for use with older people, although it was not developed specifically as a measure for them. It is easily administered. George and Bearon (1980) rate it as the best measure of affect (frequency of experienced feelings and kinds of reported feelings).

THE PHILADELPHIA GERIATRIC CENTER MORALE SCALE (PGCMS)

This scale, and its revised version, were developed on the assumption that morale is a general, multidimensional feeling of well-being (Lawton 1972, 1975). Lawton viewed morale in terms of general well-being. The scale also takes into account two other properties: applicability to older, institutionalized populations, and an optimal scale length that allows reliability without respondent fatigue (Knapp 1976). A preliminary version of the scale with 41 items was tested on 300 people with an average age of 78. The scale was reduced to 22 items and, after subsequent analyses, was revised to its current 17 items (Lawton 1975; Morris and Sherwood 1975). Lawton recommended that these 17 items be referred to as the ‘revised PGC Morale Scale’. It was developed for use with older people and is, therefore, appropriate for these populations. It is easily administered and can be self- or interviewer-administered.

Content

The 17 items form three major dimensions: agitation (six items), attitude towards own ageing (five items) and lonely dissatisfaction (six items). The scale consists largely of attitude statements, for which respondents indicate whether or not each applies to them, plus some specific questions. Examples of scale statements and questions are:

How much do you feel lonely?
Things keep getting worse as I get older.
I see enough of my friends and relatives.
As I get older, things are (better, worse, same) than/as I thought they would be.
I am as happy now as when I was younger.
Life is hard for me much of the time.
How satisfied are you with your life today?

Scoring

Most items are dichotomously coded. One point is scored for each response indicating high morale. Scores range from 0 (low) to 17 (high), with higher scores indicating greater morale. The scale can be treated as three subscales or as an overall scale. The scale, descriptive statistics and details of the scoring are available from Lawton. Liang and Bollen (1983) have reviewed the various scoring methods and recommended that three subscales be calculated: agitation, dissatisfaction, and attitudes towards one’s own ageing.

Validity

Correlations between the PGCMS and Neugarten’s various indexes (reported earlier) range from 0.57 to 0.79 (Lawton 1972; Lohmann 1977). The scale has been reported to discriminate between social groups. The PGCMS has had numerous applications. Ward et al. (1984) used the 17-item version but excluded two items that they were already measuring in a series of questions about social integration: ‘How much do you feel lonely?’ and ‘I see enough of my friends and relatives’. They used the scale with a sample of people with an average age of 70.6, living in Albany-Schenectady-Troy, New York. They reported that morale was associated with satisfaction with frequency of contact with others, thus supporting its discriminative ability. It was also used by Noelker and Harel (1978) in their survey of 14 nursing-home residents in the USA. The average age of respondents was 81. The authors reported that twice as many residents who desired to live in the homes had high morale scores (mean: 12.98; standard deviation 4.25), compared to residents who desired to live elsewhere (mean: 10.33; standard deviation 4.96). Morale was also best predicted by functional health status: 39 per cent of the variance in morale scores was explained by self-rated health. There is limited evidence of sensitivity to change (see Kalson 1976; Morris 1975).

The content validity of the scale has been questioned by Borgatta and Montgomery (1987) because it includes measures of happiness and life satisfaction, which they consider questionable in a scale purporting to measure ‘morale’. This confusion, they argue, is made even more problematic by the
use of different time referents (e.g. ‘I am as happy now as when I was younger’ and ‘How satisfied are you with your life today?’).

While early factor analyses reported that the scale formed six factors, three factors have since been confirmed for the 17-item version of the scale: agitation (six items), attitude towards one’s own ageing (five items) and lonely dissatisfaction (six items) (Lawton 1975; Morris and Sherwood 1975). Most studies of the factor structure have confirmed the three-factor structure. Liang and Bollen (1985) confirmed these three first-order factors. They also identified a second-order factor, global life satisfaction, linked with the three first-order factors. Ranzijn and Luszcz (2000) also confirmed that the scale contained factors of positive and negative affect.

Reliability

Lawton (1972) reported a split-half reliability coefficient of 0.74 and an internal consistency coefficient of 0.81. He also reported test-retest reliability coefficients ranging from 0.75 (after three months) to 0.91 (after five weeks). Lawton (1975) reported alpha coefficients for internal consistency of between 0.81 and 0.85.

In sum, the scale has acceptable levels of reliability and validity and is widely believed to be the best of the existing life-satisfaction and morale scales. It has been used successfully in other cultural groups (e.g. Japanese people in the USA; Liang et al. 1992). One criticism of it is that the inclusion of items measuring both happiness and satisfaction is questionable, given the earlier definition of morale (Stull 1987).

DELIGHTED–TERRIBLE FACES (D–T) SCALE

The D–T scales were developed in response to recognized deficiencies in other scales. They were also based on the view that life satisfaction is subjective and depends on one’s evaluations of the different components of life (Andrews and Withey 1974, 1976). These authors also reported that, in the survey of well-being by Campbell et al. (1976), one-half to two-thirds of respondents selected one of the two most satisfied categories presented to them. They felt that this concentration of responses at the ‘satisfied’ end of the scale posed statistical and conceptual problems. Therefore, they developed their D–T scales with a broader range of response categories. For example, the inclusion of seven faces on the D–T scale was an attempt to reduce the skew of distributions and improve discrimination between respondents. In the Faces Scale, they also offered a neutral face, as they felt it was important for respondents to be able to ‘opt out’ if none of the faces represented their feelings.

The faces show clear expressions, and each face is represented by a letter of the alphabet, ranging from A (delighted) to G (terrible), depending on its expression. This was seen as an improvement on visual analogue scales. These are laid out along a single dimension with only the end categories labelled, leaving the respondent to infer the appropriate meanings for the intermediate categories.

While the D–T faces scale appears to be the most popular and best tested, other D–T scales were developed and tested by the authors. These included:

• a D–T ladder scale. This was originally developed by Cantril (1965). There are many adaptations of this scale, usually with good psychometric properties (Andrews and Robinson 1991; Keyes et al. 2002). Respondents rate their life satisfaction via one of the nine ladder rungs. The top rung is labelled ‘Best I could expect to have’ and the bottom is labelled ‘Worst I could expect to have’. They are told: ‘Here is a picture of a ladder. At the bottom of this ladder is the worst situation you might reasonably expect to have. At the top is the best you might expect to have. The other rungs are in-between . . . Where on the ladder is your . . .? On which rung would you put it?’
• a D–T visual analogue scale with seven boxes at intervals along the line representing statements about life satisfaction (delighted–pleased–mostly satisfied–mixed (about equally satisfied and dissatisfied)–mostly dissatisfied–unhappy–terrible) (Andrews and Withey 1976). Respondents are told: ‘We want to find out how you feel about various parts of your life, and life in this country as you see it. Please indicate the feelings you have now – taking into account what has happened in the last year and what you expect to happen in the near future . . . How do you feel about . . .?’
• a D–T circles scale consisting of nine circles, each divided into eight slices containing a progressive number of plus or minus signs. The extreme left-hand circle (8) contains eight plus signs, one in each slice, and the extreme right-hand circle (0) contains eight minus signs, one in each slice. The middle circle contains four plus and four minus signs. Respondents are told: ‘Here are some circles that we can imagine represent the lives of different people. Circle 0 has all minuses in it, to represent a person who has all bad things in his or her life. Circle 8 has all pluses in it, to represent a person who has all good things in his or her life. Other circles are in-between. Which circle comes closest to matching how you feel about . . .?’

The D–T scales all provide an affective evaluation of quality of life which involves a cognitive evaluation and some degree of positive/negative feeling (affect).

Content

Respondents are shown seven faces ranging from wide smiles to turned-down mouths. They are told: ‘Here are some faces expressing various feelings (delighted, pleased, mostly satisfied, mixed, mostly dissatisfied, unhappy, and terrible). Below each is a letter. Which face comes closest to expressing how you feel about . . .?’ (specific items and/or ‘life as a whole’ are asked about). The faces are shown here:

[Figure: the seven D–T faces, lettered A (delighted) to G (terrible). Reproduced with permission from Andrews and Withey (1976) Social Indicators of Well-being: Americans’ Perceptions of Life Quality.]

Andrews and Withey (1976) published a wide selection of topic items which could be incorporated within a scale measuring life satisfaction. They suggested items that could be included, which were reported to explain 50–62 per cent of the variance in evaluations of life. Examples are self-accomplishment and problem handling, family life, income, fun and enjoyment, accommodation, family togetherness, time to do things one wants to, non-work activities, national government activities, quality of local goods and services, health status and employment.

Scoring

The seven faces are given scores from 1 (delighted) to 7 (terrible). While item responses can be summed, each item can also be analysed independently. Bowling and Browne (1991) and Bowling et al. (1993) studied 662 people aged 85+ in the East End of London. They reported that the D–T Faces Scale was fairly skewed, with about a quarter of respondents choosing the terrible faces, while over half selected a delighted face. Research on stroke patients in the UK by Anderson (1988) also found the measure to be skewed, with 19 per cent of respondents choosing a terrible face and two-thirds choosing a delighted face. However, both studies reported good acceptance of the scale by respondents. Andrews and Withey (1976) had recognized this problem of positive skew, which was their reason for increasing the number of response categories.

Validity and reliability

Comparisons between studies are difficult to make because investigators choose the topic items they wish to include for self-evaluation using the D–T scales. However, the authors of the scale compared the D–T scales with other similar scales and presented evidence that they were more valid measures than most other scales assessed. The exception was their more visually complex ‘circles’ scale. Based on multitrait–multimethod analysis, the median validity coefficients for each of their four D–T scales ranged from 0.70 to 0.82 in the assessment of six areas of life satisfaction. These coefficients were constant when different areas of life were evaluated using the same scale (Andrews and Crandall 1976).

Bowling and Browne (1991) previously reported results from their samples of elderly people living at home in London and Essex (see section on Neugarten Life Satisfaction Scales). The Spearman’s correlations between the D–T Faces Scales used and Neugarten’s Life
Satisfaction Scale A fluctuated fairly widely, from weak to moderate: from r = −0.24 to −0.65. The correlations were in the expected direction; the minus sign reflects the direction of the scoring. Further (unpublished) analyses showed internal consistency for the D–T Faces Scales of around 0.80 (coefficient alpha). Correlations between D–T Faces Scale items were weak to moderate, ranging from 0.30 to 0.59. This is not unexpected, as quite different areas of life were being assessed. Andrews and Withey (1976) reported average test-retest reliabilities for the D–T scales of about 0.70 across several studies, when the scales were administered twice during the same interview.

The Delighted–Terrible Faces Scale and the D–T response scale are popular among investigators in mental health, where they have frequently been adapted and used (e.g. Baker and Intagliata 1982; Lehman 1988; Oliver et al. 1997). Elsewhere, the use of the D–T faces and the category scale as a response format on a range of topics is increasingly common. The authors have published the topic items, and suggested that investigators select relevant items for their own questionnaires. Andrews and Withey’s original work reported good reliability and validity. Apart from that work, there have been relatively few published studies reporting use of the D–T scales in their original format, as opposed to the selection of items.

SATISFACTION WITH LIFE SCALE (SWLS)

The SWLS is a short instrument which assesses a person’s conscious evaluative judgement of domains of life. Respondents use their own criteria and weight each domain themselves (Diener et al. 1985). The developers conceptualized subjective well-being as consisting of an emotional or affective component and a judgemental or cognitive evaluation of life (Diener et al. 2003). Most researchers focus on the former. The Satisfaction with Life Scale was developed to measure the judgemental component of subjective well-being (Pavot et al. 1991). It aimed to improve on existing scales, which appeared to include factors other than life satisfaction. Unlike Neugarten et al.’s (1961) LSI-A, it does not include related concepts such as vigour. It was designed to be appropriate for adults of all ages (Pavot et al. 1991).

Content

The SWLS comprises five items which measure global cognitive judgements about one’s life. It takes about one minute to complete. The scale is reproduced below:

Below are five statements that you may agree or disagree with. Using the 1–7 scale below indicate your agreement with each item by placing the appropriate number on the line preceding that item. Please be open and honest about your responding.

7 – Strongly agree
6 – Agree
5 – Slightly agree
4 – Neither agree nor disagree
3 – Slightly disagree
2 – Disagree
1 – Strongly disagree

___ In most ways my life is close to my ideal.
___ The conditions of my life are excellent.
___ I am satisfied with my life.
___ So far I have gotten the important things I want in life.
___ If I could live my life over, I would change almost nothing.

Strongly disagree (1)/Disagree (2)/Slightly disagree (3)/Neither agree nor disagree (4)/Slightly agree (5)/Agree (6)/Strongly agree (7)

Scoring

Each item is scored 1–7. The items are summed, giving a range of 5 to 35. The scoring guide is: 5–9 extremely dissatisfied, 10–14 dissatisfied, 15–19 slightly dissatisfied, 20 neutral, 21–25 slightly satisfied, 26–30 satisfied, 31–35 extremely satisfied.

Validity

A series of studies with college students by Diener et al. (1985) showed that scores on the SWLS correlated moderately to highly with several other measures of subjective well-being. These included Andrews and Withey’s Delighted–Terrible Scale, Bradburn’s Affect-Balance Scale and Cantril’s Self-Anchoring Ladder. The authors reported that, in two samples of college students, the correlations between all the other measures and the SWLS ranged from −0.37 to 0.68. This supports the convergent validity of the SWLS. Pavot et al. (1991),
in two studies of college students, reported further evidence of the convergent validity of the scale. The SWLS correlated 0.65 with Neugarten et al.’s (1961) Life Satisfaction Scale A and 0.81 with the Philadelphia Geriatric Center Morale Scale. They also reported a significant, moderate correlation (0.54) between respondents’ scores and peer and family reports of their life satisfaction. Criterion validity was assessed in a further study of college students in terms of the correlation of the SWLS with a life-satisfaction rating made by interviewers (0.73) (Diener et al. 1985).

The psychometric properties of the scale were reviewed by Pavot and Diener (1993). They reported that the scale had good convergent validity when compared with other scales and assessments of subjective well-being. It also had good discriminant validity when compared with measures of emotional well-being. It was also shown to be sensitive enough to detect changes in life satisfaction after clinical interventions.

It has been reported to correlate moderately, in the expected direction, with personality characteristics (extraversion) (Emmons and Diener 1985). A study of adolescents and college students in Portugal and Cape Verde found that the scale was able to predict reported loneliness (Neto and Barros 2000). A study of residents in Japan and Australia found that it could predict loneliness among Australian but not Japanese respondents (Schumaker et al. 1993). Harrington and Loffredo (2001), in a study of college students, also reported that extraverts showed higher life satisfaction than introverts, again supporting the discriminative ability of the SWLS.

In a study of college students, Diener et al. (1985) reported that the scale had a single factor, which accounted for 66 per cent of the variance. Studies of college students by Pavot et al. (1991) and Lewis et al. (1995) also confirmed a single-factor model, and thus supported the unidimensionality of the scale. The factor structure and the convergent and discriminant validity of the scale were supported by Clark et al. (1995), again in a study of college students.

Reliability

The series of studies with college students by Diener et al. (1985) demonstrated that the scale had a high level of stability on test-retest at two months (correlation coefficient: 0.82). It also had high internal consistency (coefficient alpha: 0.87). They reported a further study of college students which showed that the item-total correlations for the five SWLS items were 0.81, 0.63, 0.61, 0.75 and 0.66, supporting the internal consistency of the scale. Pavot et al.’s (1991) study of students reported average test-retest coefficients of 0.83 at five days.

The scale is relatively popular because it is short and simple, and quick for respondents to complete. The wording needs minor adaptations for use in the UK (e.g. changing ‘responding’ to ‘responses’, ‘gotten’ to ‘got’). The main disadvantage is that the psychometric testing has been largely conducted on college students in the USA, with unknown generalizability. Pavot et al. (1991) acknowledged that further testing was still needed. The scale is in the public domain and can be used without charge or permission.

SCALES OF PSYCHOLOGICAL WELL-BEING (PWB)

The Scales of Psychological Well-Being were derived from theories of adult development. They also draw on conceptions of positive psychological functioning from existential, humanistic and clinical psychology (Ryff 1989, 1995; Ryff and Essex 1991; Ryff and Keyes 1995). The scales contain both positive and negative items which cover six dimensions: autonomy, environmental mastery, personal growth, positive relations with others, purpose in life, and self-acceptance (Ryff 1989).

Content

The original parent version contains six scales, with 20 items for each of the six dimensions. There are three shorter versions of the scales (14-item, 9-item and 3-item). Each of these covers all six dimensions, but they vary in the number of items per dimension: 14, 9 and 3 respectively. The 14-item version is used by Ryff in her own studies. The items representing each dimension are mixed on the questionnaire. The scales are self-administered. Some examples from the 14-item scale are shown below.
Environmental mastery
In general, I feel I am in charge of the situation in which I live.
I am quite good at managing the many responsibilities of my daily life.
If I were unhappy with my living situation, I would take effective steps to change it.

Autonomy
People rarely talk me into doing things I don’t want to do.
It’s difficult for me to voice my own opinions on controversial matters.
I am not the kind of person who gives in to social pressures to think or act in certain ways.

Strongly disagree (1)/moderately disagree (2)/slightly disagree (3)/slightly agree (4)/moderately agree (5)/strongly agree (6)

Scoring

The items have a six-point response format from strongly disagree (1) to strongly agree (6). Negatively phrased items are reverse coded when the scales are scored. High scores indicate high self-ratings. Definitions of high and low scorers on each scale are provided.

Validity

There is strong support for the validity of the scales, indicating that they are adequate measures of psychological functioning. The results of tests for the validity of the 20-item parent scales were given by Ryff (1989). The validity of the scales is supported by the pattern of correlations between the PWB and various measures of personality in samples of the adult population. These correlations show distinct associations, and also indicate that the association between psychological well-being and personality is complex (Schmutte and Ryff 1997; Ruini et al. 2003). The PWB was reported to be negatively associated with self-consciousness in a study of college students (Harrington and Loffredo 2001).

Factor analyses by Ryff and Keyes (1995) provided evidence that the best-fitting model for the scales was six-dimensional. Clarke et al. (2001) also reported a confirmatory factor analysis which supported a six-factor structure.

Reliability

The results of tests for the reliability of the 20-item parent scales were given by Ryff (1989). The internal consistency coefficient alphas for the scales, and their correlations with the 20-item parent scale, were reported to be high. The values are given as (alpha; correlation): autonomy (0.83; 0.97), environmental mastery (0.86; 0.98), personal growth (0.85; 0.97), positive relations with others (0.88; 0.98), purpose in life (0.88; 0.98), and self-acceptance (0.91; 0.99) (Ryff 1989). Ruini et al. (2003) reported Pearson coefficients for the one-month test-retest reliability of the scales, based on a sample of 450 members of the population. These were judged to be satisfactory for all six scales. However, a version of the scale with 18 items achieved lower internal consistency alphas in the Canadian Study of Health and Aging (Clarke et al. 2001).

The results of tests for reliability and validity of the short 3-item scales were presented by Ryff and Keyes (1995). They reported that the 3-item scales correlated from 0.70 to 0.89 with the original scales. The 3-item version has low internal consistency: autonomy (alpha = 0.43), environmental mastery (alpha = 0.57), personal growth (alpha = 0.50), positive relations with others (alpha = 0.54), purpose in life (alpha = 0.37), and self-acceptance (alpha = 0.53). Thus Ryff does not recommend them for high-quality assessments of well-being. The lower internal consistency of the shorter version reflects Ryff’s decision to create short versions that represented the multifactorial structure of the full-length 20-item version, rather than to aim for high internal consistency.

The scales have been used successfully in large longitudinal population studies in the USA and Canada (Marks and Lambert 1999; Clarke et al. 2001). They have also been translated and used successfully in other languages, including Dutch (Spruytte et al. 1999) and Swedish (Lindfors 2002). More information on their psychometric properties from a wider range of descriptive and outcome studies is still needed. However, existing studies of the reliability and validity of the full-length scales have been good. In particular, the scales make a potentially valuable contribution to research on ageing. This is because social gerontology emphasizes the theoretical importance of autonomy, self-acceptance and self-mastery for ‘successful ageing’
(Baltes and Baltes 1990). The scale is freely available, although the developer requests details of how the instrument will be used and a copy of the results.

THE (PSYCHOLOGICAL) GENERAL WELL-BEING SCHEDULE (GWBS)

The General Well-Being Schedule, sometimes called the Psychological General Well-Being Index or Schedule (PGWB), is a concise multidimensional indicator of subjective feelings of well-being and distress. It was designed for use in the US Health and Nutrition Examination Survey (HANES). Its aim was to provide an index for measuring self-reports of intrapersonal affective or emotional states reflecting a sense of subjective well-being or distress (Dupuy 1973, 1974, 1978, 1984). Population norms for the USA were provided by the HANES.

The best-known application is the modified version incorporated into the Rand Mental Health Inventory, based on a large community sample of people aged 14–60+ (Brook et al. 1979a, 1979b). The Rand Health Insurance Study provided the most recent national reference standards for the GWBS in the USA. In it, 71 per cent of adults fell into the ‘positive well-being’ category (scores of 73–110) and 15.5 per cent showed moderate distress (scores of 61–72). The remaining 13.5 per cent were classified as experiencing ‘severe distress’ (scores of 0–60). Fifteen of the GWBS items were retained for use in the final version of the Rand Mental Health Inventory. Brook et al. (1979a, 1979b) have extensively reviewed the scale.

Content

The initial draft of the instrument contained 68 items, 18 of which were used for the US HANES; these were referred to as the General Well-Being Schedule. A 33-item version was also developed and used (Fazio 1977). One version of the index has 22 items. The items include indicators of both positive and negative affect. It is a self-administered questionnaire; administration time is 12 minutes.

It includes items for six states of being. The subscales used to measure these six states contain three to five items each. The six subscales are:

• anxiety (e.g. bothered by nervousness; generally tense; anxious, worried, upset; relaxed, at ease versus highly strung; felt under strain, stress or pressure)
• depressed mood (e.g. felt depressed; felt downhearted and blue; sad, discouraged, hopeless)
• positive well-being (e.g. general spirits; happy, satisfied with personal life; interesting daily life; felt cheerful, light-hearted)
• self-control (e.g. in firm control; afraid of losing control; felt emotionally stable, sure of self)
• general health (e.g. bothered by illness, bodily disorders, or aches and pains; healthy enough to do things; concerned, worried about health)
• vitality (e.g. energy, pep; waking feeling fresh, rested; felt active, vigorous versus dull, sluggish; felt tired, worn-out, used up).

The frame of reference for questions is ‘during the last month’. The first 14 questions have six response choices, and four questions use rating scales defined by adjectives at each end. Examples of questions are:

How have you been feeling in general (during the past month)?
In excellent spirits
In very good spirits
In good spirits mostly
I have been up and down in spirits a lot
In low spirits mostly
In very low spirits

How happy, satisfied, or pleased have you been with your personal life (during the past month)?
Extremely happy – could not have been more satisfied or pleased
Very happy
Fairly happy
Satisfied . . . pleased
Somewhat dissatisfied
Very dissatisfied

How concerned or worried about your health have you been (during the past month)?
0 1 2 3 4 5 6 7 8 9 10
Not concerned at all (0) to Very concerned (10)

Scoring

The first 14 questions have six response choices which are scored on a scale of 0 to 5. A value of 0 is
allocated for the most negative response and 5 for the most positive response (rating scales are also used; see earlier). The range of scores is from 0 to 110, and the range for the subscales is from 0 to 15, 20 or 25. The overall score or subscale scores can be used for analysis. Dupuy proposed that scores of 0–60 reflect ‘severe distress’, 61–72 ‘moderate distress’, and 73 to 110 ‘positive well-being’.

Validity

Fazio’s (1977) study of 195 students found that the GWBS correlated moderately with interviewers’ ratings of depression (0.47). The average correlation of the GWBS with six other depression scales was 0.69, and with three anxiety scales 0.64. Fazio (1977) and Ware et al. (1979) also reported criterion correlations between the GWBS and interviewers’ ratings generally ranging between 0.65 and 0.90. Nakayama et al. (2000) tested the Japanese version of the scale in over 1,000 adults. They reported that it had good concurrent validity in comparison with five other validated scales of anxiety and depression.

Edwards et al. (1978) showed that the scale was sensitive enough to detect the progress over three weeks of the 21 psychiatric day patients in their study. Kammann and Flett (1983) reported a correlation of 0.74 between the GWBS and their 96-item scale of general happiness and well-being (the Affectometer).

Dupuy (1978) and Wan and Livieratos (1977) reported factor analyses of the GWBS which showed three factors explaining 51 per cent of the variance: anxiety, tension and depression; health and energy; and positive well-being or life satisfaction. A six-factor solution confirming the six subscales was produced using multitrait and factor analysis (Ware et al. 1979). Studies of the factor structure of the scale in other cultures have supported the structural validity of 6-factor, 3-factor and 4-factor models (Nakayama et al. 2000).

Reliability

Extensive scaling tests and tests of reliability and validity were carried out on 1,209 respondents. Internal consistency coefficients for the six subscales ranged between 0.72 and 0.88. Test-retest reliability results were good. The exception was a lower test-retest coefficient of 0.50 when the interval was extended from one week to one month. Test-retest reliability coefficients ranged from 0.50 to 0.86, with a median of 0.66. These results were based on a wide range of studies, from samples of students to the large sample of adults participating in the Rand Health Insurance Study. They have been reviewed by Dupuy (1984). Test-retest reliability coefficients of 0.68 and 0.85 for two sub-samples within the US HANES were also reported by Monk (1981). It is difficult to know whether the lower correlations reflect the instability of the instrument or changes in individuals.

Monk (1981) also reported internal consistency coefficients of 0.93 on the basis of analyses of 6,913 people. Fazio (1977) reported similar internal consistency coefficients of 0.91 for males and 0.95 for females. Correlations among sub-scores ranged from 0.16 to 0.72. Ware et al.’s (1979) review reported three studies using the GWBS which also found internal consistency coefficients of over 0.90. However, Edwards et al. (1978) reported a lower coefficient of 0.69.

On the whole, the scale has good test results. One advantage is that it avoids reference to physical symptoms of emotional distress and so avoids problems of interpretation. Fluctuations in test-retest reliability may be problematic in assessing individuals. A major disadvantage is that most of the early validation studies of it are unpublished, although these have been thoroughly reviewed by Brook et al. (1979a, 1979b). The scale was adapted and tested for use in Britain, as an outcome indicator of depression, by Hunt and McKenna (1992).

SENSE OF COHERENCE SCALE (SOC)

A generic scale of coherence, which overlaps with life satisfaction, is included in this section: the Sense of Coherence Scale (Antonovsky and Sagy 1986; Antonovsky 1987, 1993). The author reviewed more specific scales of adjustment, coping and control in Measuring Disease (Bowling 2001). The Sense of Coherence Scale is global in content. It is increasingly popular in European studies of health outcome (to measure modifying factors).
The development of the concept and the scale was based on intensive interviews with 52 respondents who had suffered major life crises. The Sense of Coherence Scale was derived from a theoretical model designed to explain the maintenance or improvement of one’s position on a health–ease/disease continuum (Antonovsky 1993). The sense of coherence was defined by Antonovsky (1987) as:

a global orientation that expresses the extent to which one has a pervasive, enduring though dynamic feeling of confidence that (1) the stimuli deriving from one’s internal and external environments in the course of living are structured, predictable, and explicable; (2) the resources are available to one to meet the demands posed by these stimuli; and (3) these demands are challenges, worthy of investment and engagement.

These three elements are called the comprehensibility, manageability and meaningfulness of life. In effect, the sense of coherence is a global orientation to one’s inner and outer environments, and it can be used as an indicator of coping capacity in stressful life situations.
Antonovsky argued that a strong sense of coherence is necessary for the successful management of tension due to stress, and for movement towards the healthy end of the ease–disease continuum (Antonovsky 1990).

Content
The scale is a 29-item 7-point numeric scale, comprising 11 items on comprehensibility, 10 on manageability and 8 on meaningfulness. A range of shorter versions has been described. Of these, the 13-item version is the generally accepted short version, with acceptable levels of reliability and validity (Antonovsky 1987; Langius 1995; Gana and Garnier 2001; Pallant and Lae 2002). Examples from the 29-item version are:

Do you have the feeling that you don’t really care about what goes on around you?
1 2 3 4 5 6 7
1 = Very seldom or never; 7 = Very often

Has it happened that people whom you counted on disappointed you?
1 2 3 4 5 6 7
1 = Never happened; 7 = Always happened

Life is:
1 2 3 4 5 6 7
1 = Full of interest; 7 = Completely routine

What best describes how you see life?
1 2 3 4 5 6 7
1 = One can always find a solution to painful things in life; 7 = There is no solution to painful things in life

Doing the things you do every day is:
1 2 3 4 5 6 7
1 = A source of deep pleasure and satisfaction; 7 = Deep source of pain and boredom

You anticipate that your personal life in the future will be:
1 2 3 4 5 6 7
1 = Totally without meaning or purpose; 7 = Full of meaning and purpose

Scoring
Each of the 29 items has a 7-point numeric scale, and the item scores are simply summed to produce the scale’s score. The 29-item version is therefore scored from 29 to 203; the higher the score, the stronger the sense of coherence. The 13-item version is scored in the same way, giving a range of 13 to 91. Langius et al. (1992) reported testing the numeric semantic-differential format of the scale against an alternative linear visual analogue scale response method. They found no significant differences between the two scaling formats.

Validity
The validity and reliability of the SOC have been tested in studies in over 20 countries. It was judged
to be applicable cross-culturally (Antonovsky 1993), but because it uses some culture-specific colloquialisms, care in translation and testing for meaning is required. Antonovsky (1993) presented the evidence for the scale’s validity from all published studies up to 1993. He reported significant correlations between the Sense of Coherence Scale and measures of health, illness, well-being, orientation to self and stress, indicating the scale’s criterion and convergent validity. A study by Dantas et al. (2002) of the outcome of patients undergoing heart surgery supported the discriminative validity of the scale on the basis of its hypothesized independent associations with quality of life ratings. Pallant and Lae (2002) used the 13-item version alongside measures of physical and psychological health, personality and coping, and reported significant associations in the expected directions between these measures and the SOC. In addition, Matsuura et al. (2003), in a study of patients with systemic sclerosis in Japan, reported a high correlation between the Japanese version of the SOC and the Beck Depression Inventory, and multiple-regression analysis confirmed that low SOC was an independent predictor of depression. Björvell et al. (1994), in a study of obese patients, reported a significant correlation of −0.55 between the Sense of Coherence Scale and a measure of motivation, indicating that the stronger the self-rated sense of coherence, the greater the perceived self-motivation. The SOC has also been reported to be significantly associated with measures of life satisfaction (Anke and Fugl-Meyer 2003) and to discriminate between women with and without irritable bowel syndrome (Motzer et al. 2003).
In a study of 189 US veterans, a factor analysis of the scale revealed that all 29 items loaded at 0.40 or above on a single factor (eigenvalue 12.45) (see Antonovsky 1993). Gana and Garnier (2001) evaluated the French version of the 13-item scale in over 600 adults. They reported a model of three correlated factors: manageability, meaningfulness and comprehensibility.

Reliability
Antonovsky (1993) reported 26 studies that tested the reliability of the 29-item version of the scale. The Cronbach alpha measure of internal consistency was 0.82–0.95. The 13-item version was used in 16 studies, and its Cronbach’s alpha was 0.74–0.91. Some of the studies reported test-retest correlations, which were stable at 0.54 over two years (Antonovsky 1993). Langius et al. (1994) and Langius (1995), in studies of patients with oral or pharyngeal cancer, reported Cronbach’s alpha to be 0.88–0.89.
The 13-item version of the scale is increasingly popular in clinical and social research, and it appears to have good psychometric properties.

SCALES OF SELF-ESTEEM
Among the most popular and commonly used measures of self-esteem are: the Self-Esteem Scale (Rosenberg 1965); the Tennessee Self-Concept Scale (Fitts 1965); and the Self-Esteem Inventory (Coopersmith 1967). Several other popular scales have been reviewed by Crandall (1973), Robinson and Shaver (1973), Wylie (1974) and George and Bearon (1980); the reader is referred to these authors for a more detailed review. Most scales appear to warrant further testing on a wider range of population types. The early studies concentrated on students, although the scales are now increasingly used in clinical studies.

THE SELF-ESTEEM SCALE
Rosenberg (1965) described self-esteem as self-acceptance or a basic feeling of self-worth. He developed the Self-Esteem Scale, based on Guttman scaling, for a study of 5,024 students in public schools in New York. Little information exists about the development of the scale. Self-esteem scores were correlated with characteristics such as participation and leadership in school activities.
The measure was intended to be brief, global and unidimensional. It has been widely used in varying settings. Evidence suggests that it is suitable for use with older people. A study by Atchley (1976) of around 5,000 retired teachers and telephone-company employees reported that men had higher self-esteem than women. Kaplan and Pokorny (1969), in a study of 500 adults in Harris County, Texas, reported that age is unrelated to self-esteem. Ward (1977), on the basis of a study of 323 residents of Madison, Wisconsin, reported
that predictors for women’s self-esteem were current activities, age-related deprivations and health. Among the elderly (aged 60–92), attitudes towards old age are predictive of self-esteem, and for men, income and education are also predictive (Ward 1977).

Content
The scale consists of ten items, of which five are positively worded and five are negatively worded. Responses are reported along a four-point continuum from ‘strongly agree’ to ‘strongly disagree’. Examples of items are:

I feel that I’m a person of worth, at least on an equal plane with others.
I feel that I have a number of good qualities.
All in all, I am inclined to feel that I am a failure.
I feel I do not have much to be proud of.
At times I think I am no good at all.
On the whole, I am satisfied with myself.
I take a positive attitude toward myself.

Strongly agree (1)/agree (2)/disagree (3)/strongly disagree (4)

Scoring
The measure was designed as a ten-item Guttman scale. The category responses were originally designed to be scored from 0 to 6 (strongly agree to strongly disagree). However, there is no agreement over the method of scoring: some users score responses dichotomously as ‘agree’ or ‘disagree’, and other researchers use a simple summing scale.

Validity
Rosenberg (1965) explicitly chose items for the scale which he felt had face validity. In assessing its construct validity, Rosenberg (1965) also reported that positive self-esteem was predictive of several social and psychological characteristics, such as reduced shyness and depression, and greater assertiveness and social activity. He reported that the scale had acceptable predictive validity in relation to depression levels among volunteers assessed by nurses. Robinson and Shaver (1973) reported that the scale correlated 0.59–0.60 with Coopersmith’s Self-Esteem Inventory, depending on the scoring method. It has also been reported to be significantly associated with depression (Schmitz et al. 2003), and to perform better than the Coopersmith Self-Esteem Inventory in predicting dieting disorder psychopathology (Griffiths et al. 1999). Kaplan and Pokorny (1969) reported two uncorrelated factors which accounted for 45 per cent of the total variance. They called the items forming the first factor ‘self-derogation’, and they stated that the second factor reflected ‘defense of individual worth’. Kohn (1969) reported similar results. A two-factor model was supported in factor analyses by Greenberger et al. (2003) in a study of over 700 ethnically diverse undergraduate students; the authors reported that the scale had high validity among different ethnic groups.

Reliability
Reliability (internal consistency and test-retest) has been shown to be good. Rosenberg (1965, 1986) reported reproducibility coefficients of 0.85–0.92 and a scalability coefficient of 0.72. Ward (1977) reported a coefficient alpha of 0.74 for internal consistency. Silber and Tippett (1965) administered the scale to 28 students with a two-week interval and reported a test-retest reliability coefficient of 0.85 and item correlations of 0.56 and 0.83.
In sum, the scale is attractive due to its brevity and simplicity, but it still requires further testing on wider populations for validity, reliability and sensitivity to change. Wylie (1974), in her extensive but early review of the scale, concluded that it is worthy of further research and development. Its method of scoring also remains unresolved. It was highly recommended by George and Bearon (1980).

THE TENNESSEE SELF-CONCEPT SCALE
The Tennessee Self-Concept Scale is probably the most popular and most often used scale. It was developed for use in mental-health rehabilitation in 1956 and revised in 1965 (Fitts 1965). Fitts (1965, 1972; Fitts and Warren 1996) based the scale on Maslow’s theory that individuals who are more self-actualizing are more able to realize their true potentialities and to function in a more creative and
effective manner. Fitts saw self-concept as related to performance: the person who has a clear, consistent, positive and realistic self-concept will behave in a healthy, confident, constructive and effective way. This depends on other things being equal, which, of course, is not always the case.
The scale was developed from other scales and open-ended items, and from self-descriptions from samples of psychiatric patients and non-patients. These were placed on an internal–external scale. Seven clinical psychologists independently classified the ninety remaining statements into 15 categories, and there was perfect agreement on their negative and positive content. The scale was initially tested on 626 people, aged 12–68.

Content
The scale consists of 100 self-descriptive items and is fairly complicated. Ninety items are categorized under the following labels: physical self, moral-ethical self, personal self, family self and social self. These labels are divided into statements about internal self-concept: self-identity, self-acceptance and behaviour. The scale is intended to summarize an individual’s feeling of self-worth, and the degree to which the image is realistic or deviant. Ninety items tap both an internal and an external dimension of self-concept. The remaining ten items form a lie scale and measure defensive responses. The response categories for 90 of the items lie along a five-point continuum, ranging from ‘completely false’ to ‘completely true’.
Two versions of the scale are available, depending on the intended use: one for counselling and one for clinical or research purposes. Details of the scale, reviews of unpublished and published studies, and population norms are available in reports by Fitts (1965), Roid and Fitts (1988) and Fitts and Warren (1996). Its commercial nature prohibits the reproduction of more than a few items:

I have a healthy body.
I am satisfied with my moral behaviour.
I am a member of a happy family.
I am as sociable as I want to be.

Completely false (1)/mostly false (2)/partly false and partly true (3)/mostly true (4)/completely true (5)

(Items from the Tennessee Self-Concept Scale, Copyright © 1964 by William H. Fitts. Reprinted by permission of the publisher, Western Psychological Services, 12031 Wilshire Boulevard, Los Angeles, California 90025, USA. All rights reserved.)

Scoring
The total score is a positive self-esteem score. The ten remaining items are the self-criticism (lie) scale, consisting of mildly negative statements. The positive self-esteem score has a potential range of 90–450. Fitts (1965) reported a mean score of 345.57 for his original sample of 628 adults. Scoring is complex, and a set of scoring templates and a scoring matrix are required. A computerized scoring service is available from the publisher. The scale is self-completed, and administration takes approximately 20 minutes.

Validity
Correlations demonstrating convergent, discriminant and predictive validity were reported by Fitts (1965), Thompson (1972) and Roid and Fitts (1988). However, much of the evidence to support the scale comes from relatively early or unpublished studies. A correlation of −0.70 with an anxiety scale supports the discriminant ability of the scale (Fitts 1965), and this was confirmed by Thompson (1972). Fitts (1965) also reports a correlation of 0.68 with a scale of positive affect. Discriminant validity is partly suggested by a correlation of −0.21 with the F scale measure of authoritarianism, and predictive validity is suggested by the scale’s ability to distinguish between mental health and psychopathology (Fitts 1965). However, Wylie (1974) questioned the discriminant validity of the scale on the grounds that Fitts (1965) did not report sufficient information. While there are many relevant variables that correlate significantly with the scale, there are also many that do not (Reed et al. 1980).
The scale has been widely used on samples of juveniles and psychiatric patients, as well as normal adults. Self-esteem has been found to be higher among older people than among adults generally (Grant 1966). Goodrick et al. (1999) reported that the scale was able to predict improvement in severity of binge-eating.
Vincent (1968) undertook a factor analysis of the scale, in which self-acceptance and personal-self loaded with several similar measures. Vacchiano and Strauss (1968) reported that their factor analysis of the scale revealed 20 factors. The factor structure remains
unresolved (see McGuire and Tinsley 1981 and Roid and Fitts 1988). Applications of the scale do not show a pattern of results consistent enough to support Fitts’s definition of self-concept (Walsh 1984).

Reliability
Test-retest reliability tests in a study of 60 students show high correlations over a two-week period: 0.92 for the positive self-esteem score and 0.75 for the self-criticism scale (Fitts 1965). Wylie (1974) again criticized Fitts (1965) for not providing sufficient information to make independent judgements about the reliability of the scale. This has been rectified in a subsequent manual by Roid and Fitts (1988). Estimates of internal consistency (alpha coefficients) for the total scale score and subsets of the scale range between 0.66 and 0.94, with most being between 0.70 and 0.87 (Stanwyck and Garrison 1982; Tzeng et al. 1985; Roid and Fitts 1988). However, inter-item correlations appear to be relatively low for the various subsets of items (from 0.14 to 0.35) (Tzeng et al. 1985), although this is within the range expected for such scales (Roid and Fitts 1988).
In sum, the scale appears usable with older people, although the low self-criticism scores obtained by the elderly should make the user cautious (George and Bearon 1980). It is a popular scale but lengthy to administer.

COOPERSMITH SELF-ESTEEM INVENTORY
The Coopersmith Inventories were well researched (Coopersmith 1975, 1981a, 1981b) and are widely used in social science and by clinicians. Self-esteem is portrayed as a trait that is not evenly distributed in the population, but highly desirable to have. The manual of the scale offers several sources of population norms (Coopersmith 1981a). Coopersmith (1967) defined self-esteem as self-judgements of personal worth, a definition compatible with earlier definitions. The Self-Esteem Inventory measures attitudes towards the self, encompassing several domains: social, academic, family and personal experiences. The scale was devised for use with children by five psychologists, who classified items according to high or low esteem to derive a 50-item scale. After an item analysis based on the responses of 121 children, the items were reduced to 25. The longer and shorter versions correlated at 0.95. The scale is self-administered and takes approximately ten minutes.

Content
The items consist of short statements which the subject rates as either ‘like me’ or ‘unlike me’. The scale is multi-dimensional, covering leadership-popularity, self-derogation, family-parents, and assertiveness-anxiety. Examples are:

I can make up my mind without too much trouble.
I’m a lot of fun to be with.
I’m popular with people my own age.
It’s pretty tough to be me.
Things are all mixed up in my life.
I often feel upset about the work I do.
I’m not as nice looking as most people.
If I have something to say, I usually say it.
Things don’t usually bother me.

The 50-item scale has an additional eight lie-scale items (Coopersmith 1967). A similar 25-item version also exists, and this can be used with adults (aged 16+). The version for adults has been published by Robinson and Shaver (1973).

Scoring
The scoring format remains untested. The item responses ‘like me’ or ‘unlike me’ are allocated a value and simply summed. A score is derived by multiplying X, the raw score, by 4 on the short (25-item) scale and by 2 on the long (50-item) scale, so that a totally positive score is 100 and a totally negative score is 0.

Validity
Convergent validity correlations between the scale and other self-esteem scales, based again on students, vary widely between 0.02 and 0.60 (Taylor and Reitz 1968; Ziller et al. 1969; Crandall 1973). More consistent correlations with the Rosenberg Self-Esteem Scale, from 0.59 to 0.60, were reported by Robinson and Shaver (1973), again using students (total: 300). Hoffmeister (1976) compared two subscales of the Self-Esteem Questionnaire which
he developed with Coopersmith’s Self-Esteem Inventory and reported correlations of 0.40 and 0.61. Correlations with a social desirability scale range between 0.44 and 0.75 (Taylor and Reitz 1968), indicating possible confounding with social desirability.
Correlations that would be expected on theoretical grounds have been reported with scales measuring other concepts. Campbell (1967) reported a correlation of 0.31 with an achievement test; Boshier (1968) reported a correlation of 0.80 between the scale and liking one’s first name; and Wiest (1965) reported a correlation of 0.22 between the scale and the reporting of mutual liking between others and self. Other studies have reported correlations in the expected direction, for example, with psychiatric disorder in adolescents (Guillon et al. 2003) and with binge-eating in obese hospital out-patients (Jirik-Babb and Geliebter 2003).
Not all correlations are in the expected direction. For example, Trowbridge (1970) reported a higher mean on the scale for children who were socio-economically disadvantaged. A gender bias on six of the scale items (25-item version) has been reported (Chapman and Mullis 2002), requiring caution in its use.
Wylie (1974) questioned Coopersmith’s claims of validity for the scale on the grounds of the large number of significance tests undertaken in its development and the failure to report the actual number of such tests. The number reaching statistical significance at the 0.05 level was also unreported, making it impossible to estimate how many could have occurred by chance alone.
Robinson and Shaver (1973), on the basis of two samples of students (total: 500), carried out two factor analyses of the scale which indicated its multi-dimensional nature. Four factors emerged: self-derogation, leadership-popularity, family-parents and assertiveness-anxiety. The family-parents factor was reported to be the most stable and least ambiguous. Kokenes (1974) estimated the validity of the subscales by factor analysis of the responses of 7,600 schoolchildren, and reported that the four bipolar dimensions obtained were highly congruent with the test’s subscales.

Reliability
The scale was originally administered to 87 schoolchildren. Early testing of test-retest reliability reported high coefficients of 0.88 over five weeks and 0.70 over three years, based on samples of pre-adolescent schoolchildren (Coopersmith 1967). It is thus a stable measure over time, and so it is not suitable for longitudinal studies that need to measure change (Coopersmith 1967). Split-half reliability tests also show a high correlation of 0.90 (Taylor and Reitz 1968). Spatz and Johnson (1973) administered the 50-item child version to 600 students and reported internal-consistency coefficients in excess of 0.80. However, internal consistency was reported to be low in another study of 453 college students (Crandall 1973), probably as a consequence of the scale’s multi-dimensionality.
In sum, the scale appears to have been well researched initially, and it is widely used. A major methodological limitation has been the restriction to samples of students when testing the reliability and validity of the adult version (Adair 1984).
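Many of the internal consistency figures quoted for these scales, such as the Cronbach’s alpha values of 0.82–0.95 reported for the 29-item Sense of Coherence Scale, are alpha coefficients. Alpha compares the variances of the individual items with the variance of the summed score: alpha = k/(k − 1) × (1 − sum of the item variances / variance of the total score), where k is the number of items. The Python sketch below is a minimal illustration of that standard formula. It assumes complete data with one row per respondent. The function name and the example responses are hypothetical and are not taken from any study cited here. For dichotomous items, such as the ‘like me’/‘unlike me’ responses of the Coopersmith inventory, the same calculation is equivalent to the Kuder–Richardson formula 20 (KR-20).

```python
from __future__ import annotations

from statistics import pvariance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table of scores.

    `responses` holds one row per respondent; each row lists that person's
    score on every item, in the same item order. Population variances are
    used for both the items and the total; using sample variances for both
    would give the same ratio.
    """
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must have a score for every item")

    # Variance of each item across respondents.
    item_variances = [pvariance([row[i] for row in responses]) for i in range(k)]
    # Variance of the summed (total) scale scores.
    total_variance = pvariance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary, so alpha is undefined")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers from four respondents to three dichotomous items
# (1 = 'like me', 0 = 'unlike me'). The items agree perfectly, so alpha
# should come out at (or within rounding of) 1.
print(cronbach_alpha([[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]))
```

Alpha rises with the number of items and with the average inter-item correlation. A multi-dimensional scale, as suggested for the Coopersmith inventory, can therefore show a modest overall alpha even when each of its subscales is internally consistent.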
8
MEASURES OF BROADER QUALITY OF LIFE
This chapter focuses on broader, generic measures of quality of life. These are theoretically distinct from measures of broader health status and health-related quality of life, which have often dominated the quality of life field. Health is valued highly by people, but good health is just one of the areas nominated by the public as giving their life quality (Bowling 1995; Bowling et al. 2003).
Regardless of conceptual distinctions, many investigators have used measures of broader health status to measure health-related quality of life, or even broader quality of life. This practice has been justified on the basis of the untested assumption that broader measures of health status (e.g. the SF-36) include the main areas in which health can affect one’s life and which are relevant for health services (Ware et al. 1993; Bowling 1995). However, it does lead to conceptual confusion. In addition, a plethora of disease-specific quality of life measures has been developed, with little standardization of measurement approaches between studies (Garratt et al. 2002). A pragmatic approach has prevailed in the literature on quality of life. As a result, the definition of terms has been neglected, and the selection of measurement scales often appears to be ad hoc.
An increasing number of investigators are unhappy with the traditional use of measures, or proxy measures, of quality of life which focus on functioning rather than addressing the complexity of people’s views and expectations, and which are atheoretical. For example, Staniszewska (1999) compared cardiac patients’ expectations of their health care with the domains of the SF-36. She reported that patients adopted a broader approach than that contained within the eight domains of the SF-36 questionnaire. They mentioned expectations of the manageability of their condition and of reassurance, and they emphasized the future (knowing what will happen to their condition, increasing their chances of living). This dissatisfaction has also led to the development of more individualized measures, which are presented in this chapter alongside standardized approaches.
Quality of life spans a broad range of topics and disciplines (see Chapter 1). It is made up of both positive and negative experiences and affect. It is a dynamic concept, which poses further challenges for measurement. When measuring changes in quality of life, several variables need to be taken into account, including actual changes in circumstances of interest (e.g. health) and regression to the mean. Other factors that require consideration include stable personality characteristics. For example, optimism bias might help people to cope with, and adjust to, deteriorating circumstances, leading them to evaluate their quality of life more favourably (Pearlin and Schooler 1978; Scheier and Carver 1985; Diener et al. 1991; Sprangers and Schwartz 1999; Brissette et al. 2002).
Relevant cognitive or affective processes in changing circumstances also include making comparisons of one’s situation with others who are better or worse off, cognitive dissonance reduction
(defensive preference for the circumstances experienced), reordering of goals and values, and response shift. Consciously or unconsciously, people may adjust to deteriorating circumstances because they want to feel as good as possible about themselves. The roots of this process are in control theory, with goals of homeostasis. Response shift refers to the process whereby internal standards and values change, and with them the perception of quality of life (Sprangers and Schwartz 1999). Albrecht and Devlieger (1999) focused on the issue of why so many people with serious and persistent disabilities report their quality of life to be good or excellent, when external observers would view their lives as undesirable. Their in-depth interviews indicated that people’s judgement of their quality of life depended on finding a balance between body, mind and the self (spirit), and on establishing and maintaining harmonious relationships. These findings support the theory of homeostasis.
Some investigators of quality of life use the ‘then-test’ technique to test for changes in internal standards. With this method, respondents are asked about their perceptions of their situation at baseline (Tn), and then again at follow-up (Tn + 1). At follow-up they are also asked retrospective questions about how they perceive themselves to have been at baseline (the then-test for Tn). Analysis and comparison of these scores indicate both the response shift and the change in quality of life (Sprangers et al. 1999; Joore et al. 2002). The reliability and validity of this method have yet to be fully tested.
The subjectivity and complexity of quality of life present a challenge not only to the design of quality of life measurement scales and their composition (content validity), but also to their scoring and/or weighting. If measurement scales give equal weighting to the various sub-domains of quality of life, it is unlikely that the domains will have equal significance to different social groups and to individuals within these groups. Even where scales are weighted, it is unlikely that the weightings will be equally applicable to different social groups and individuals. Moreover, the literature comparing standardized weighted and unweighted cardinal (i.e. summed) scales, whether of life events, life satisfaction or health status, consistently reports no benefit of more complex weighted methods over simple summing of scores in relation to the proportion of explained variance or sensitivity to change over time (Andrews and Crandall 1976; Jenkinson et al. 1991). In addition, little experimental work has been carried out to test the different values that can be attached to weights, such as relative importance, satisfaction or goal achievement, and individuals’ gap ratings (‘social comparisons’ and ‘expectancy’).
This chapter examines broader scales of quality of life that have largely been developed outside disease-specific contexts. The exceptions are the LASA and Spitzer scales, which were developed for use with cancer patients. These are presented here because researchers across disciplines often adapt their format without acknowledgement (i.e. the use of the LASA response scale and the global quality of life Spitzer Uniscale). Broad scales of quality of life have also been developed in mental health. Because they are disease specific and limited to research in mental health, they are not included here (they have been reviewed in Measuring Disease, Bowling 2001).

THE WHOQOL
The Constitution of the WHO (1947a, 1948b) defined health as ‘a state of complete physical, mental and social well-being and not merely the absence of disease . . .’ It followed that a measure of health outcomes should assess not just clinical changes but also broader well-being and health-related quality of life. The WHO, in collaboration with 15 centres worldwide, developed two instruments for measuring quality of life, the WHOQOL-100 and the WHOQOL-BREF. These were intended for use in a wide range of cultural settings, enabling different populations to be compared (WHOQOL Group 1994, 1996). The WHOQOL instruments were developed for use in clinical practice, clinical and policy research on outcomes, and audit. They were designed broadly to reflect the WHOQOL Group’s (1993) definition of quality of life as based on individuals’ perceptions of their position in the context of their culture and value systems, and in relation to their goals, expectations, standards and concerns. The group regarded quality of life as broad ranging rather than confined to domains of health, and as affected by a person’s physical health, psychological state, personal beliefs, social relationships and relationship to their environment.
The WHOQOL was developed from statements collected from patients with a range of conditions, professionals, and healthy people, and was initially piloted by expert review and qualitative fieldwork. It was subsequently tested for validity and reliability on 250 patients and 50 healthy respondents in the 15 participating centres. The original instrument contained 300 items, and this was reduced to the 100 items which form the current instrument, the WHOQOL-100. The WHOQOL-BREF, a 26-item version of the WHOQOL-100, was developed using data from the field trials of the parent instrument (WHOQOL Group 1998b). Work is continuing on the psychometric properties of the instruments.

The core WHOQOL instruments assess quality of life across situations. Modules were also developed which collect more detailed information from specific groups (e.g. elderly people, refugees, people with specific diseases such as cancer, HIV/Aids). The WHOQOL is administered by an interviewer, although the WHOQOL-BREF can be self-administered. The full instrument takes 10–20 minutes to administer. It is available in over 20 different languages, and each translation was tested for cultural equivalence. A methodology is available for further translations. A manual is also available from the developers.

The WHOQOL-100 initially contained six broad domains of quality of life, 24 facets of quality of life (4 items per facet), and four general items covering subjective overall quality of life and overall health. These produced 100 items in total. The domains (and facets where applicable) of the WHOQOL-100 initially comprised an overall domain and six specific domains: overall quality of life and general health; (1) physical health (energy and fatigue; pain and discomfort; sleep and rest); (2) psychological (bodily image and appearance; negative feelings; positive feelings; self-esteem; thinking, learning, memory and concentration); (3) level of independence (mobility; activities of daily living; dependence on medicinal substances and medical aids; work capacity); (4) social relations (personal relationships; social support; sexual activity); (5) environment (financial resources; freedom, physical safety and security; health and social care: accessibility and quality; home environment; opportunities for acquiring new information and skills; participation in, and opportunities for, recreation/leisure; physical environment (pollution/noise/traffic/climate); transport); (6) spirituality/religion/personal beliefs (single facet: spirituality/religion/personal beliefs).

Analyses of the factor structure of the WHOQOL-100 indicated that domains 1 and 3 and domains 2 and 6 could be merged, thereby creating four domains of quality of life instead of six: physical, psychological, social relationships and environment. This reflects the current grouping and scoring, and was supported in cross-cultural studies (WHOQOL Group 1998b; Power et al. 1999).

Content

The WHOQOL-100 now contains four broad domains of quality of life, 24 facets of quality of life (4 items per facet), and four general items covering subjective overall quality of life and overall health. These produce 100 items in total. The domains are: physical, psychological, social relationships and environment. The WHOQOL-BREF contains two items for overall quality of life and general health and one item from each of the 24 facets in the WHOQOL-100. All items are rated on a five-point scale. The time reference for the questions is ‘in the last two weeks’. Examples from the WHOQOL-100 are:

The following questions ask you to say how satisfied, happy or good you have felt about various aspects of your life over the past two weeks. For example, about your family life or the energy that you have. Decide how satisfied or dissatisfied you are with each aspect of your life and circle the number that best fits how you feel about this. Questions refer to the past two weeks.

How satisfied are you with the quality of your life?
In general, how satisfied are you with your life?
How satisfied are you with your health?
How satisfied are you with the energy you have?
Very dissatisfied/dissatisfied/neither satisfied nor dissatisfied/satisfied/very satisfied

The following questions ask about how much you have experienced certain things in the last two weeks, for example, positive feelings such as happiness or contentment. If you have experienced these things an extreme amount circle the number next to ‘An extreme
amount’. If you have not experienced these things at all, circle the number next to ‘Not at all’. You should circle one of the numbers in between if you wish to indicate your answer lies somewhere between ‘Not at all’ and ‘Extremely’. Questions refer to the last two weeks.

How safe do you feel in your daily life?
Not at all/Slightly/Moderately/Very/Extremely
Do you feel you are living in a safe and secure environment?
Not at all/Slightly/Moderately/Very much/Extremely
How much do you worry about your safety and security?
Not at all/A little/A moderate amount/Very much/An extreme amount
How comfortable is the place where you live?
Not at all/Slightly/Moderately/Very/Extremely
How much do you like it where you live?
Not at all/A little/A moderate amount/Very much/An extreme amount

© World Health Organization. Reproduced with permission.
www.who.int/msa/qol

Scoring

The WHOQOL-100 produces scores relating to specific facets of quality of life, as well as for the main domains measured, and a score for overall quality of life and a score for general health. The WHOQOL-BREF produces domain but not facet scores. Scores are produced for: physical, psychological, social relationships and environment.

Validity

The WHOQOL instruments have been shown to have good discriminant and content validity, although work on their validity and reliability is continuing (WHOQOL Group 1998a, 1998b). They appear to be well accepted by respondents. There are too many diverse studies of the psychometric performance of the instruments to report in detail here. The difficulty with reviewing the overall psychometric properties of the instrument is that the studies all relate to different language versions, and different disease or population groups. For example, Pibernik-Okanovic (2001) tested the discriminant validity of the WHOQOL-100 with a small sample of diabetic patients in Croatia and reported that, at two months follow-up, it was sensitive to improvement in condition in patients whose treatment was changed, in comparison with controls. The WHOQOL-100 was reported to have good concurrent validity, greater comprehensiveness and good responsiveness to clinical change in comparison with the SF-36, in a study of over 100 out-patients with chronic pain in the UK (Skevington et al. 2001).

Domain scores produced by the WHOQOL-BREF have been reported to correlate highly (0.89) with the four WHOQOL-100 domain scores; and this shorter instrument was reported to have good discriminant and content validity, internal consistency and test-retest reliability in cross-sectional surveys of adults carried out in 23 countries (WHOQOL Group 1998b; Skevington et al. 2004). The WHOQOL-BREF was tested by de Girolamo et al. (2000) in over 300 people who were in contact with health services in Italy. They reported that only the physical and psychological domains were able to discriminate between healthy and unhealthy respondents.

Although results vary, investigators in various countries have reported that the WHOQOL-100 and WHOQOL-BREF can successfully discriminate between patient groups across a wide range of conditions, and have generally good psychometric properties (e.g. Struttmann et al. 1999; Flec et al. 2000; Leplege et al. 2000; Skevington et al. 2001).

It was reported earlier that factor analysis of the WHOQOL-100 supported four domains of quality of life instead of six: physical, psychological, social relationships and environment.

Reliability

The WHOQOL Group (1998a) reported on the initial psychometric properties of the WHOQOL-100; both published and unpublished data show that the instruments have good test-retest and face reliability, and high internal consistency (Power et al. 1999; Skevington 1999). While tests for reliability are ongoing, there are, again, several studies reporting on the initial results for the reliability of the instrument for the different language versions and in different contexts.

Test-retest reliability of the WHOQOL-BREF was assessed in de Girolamo et al.’s (2000) study in Italy, and they reported correlations of between
0.76 and 0.93 for the domains. They also reported internal consistency with alphas ranging from 0.65 to 0.80. Leplege et al. (2000), on the basis of their study of over 2,000 patients in different types of clinics in France, reported that the homogeneity of the short version was lower than that of the full instrument, but was still acceptable: item-scale correlations were greater than 0.40 for two-thirds of the items, and Cronbach’s alphas for all domains of the WHOQOL-BREF were over 0.65. Pibernik-Okanovic’s (2001) study of diabetic patients in Croatia found that the WHOQOL-100 produced Cronbach’s alphas of 0.76 to 0.95 for the four domains.

While the WHOQOL-100 is lengthy, its advantage is its breadth of scope and applicability in different cultures. A short version is available, also with good psychometric properties, but short forms are always weaker than the full versions.

The manual, appropriate language version of the instrument, scoring instructions and syntax files for their computation, permission for use and other details of the instruments can be obtained from the relevant national centre (see WHO website: http://www.who.int) or from the WHOQOL Group at the WHO in Geneva.

LEIPAD QUESTIONNAIRE

The LEIPAD was developed under the auspices of the European Office of the World Health Organization to assess multi-dimensional quality of life in older people, reflecting various aspects of daily functioning (e.g. physical, mental, social and occupational) (de Leo et al. 1998a, 1998b). The name of the instrument derives from the institutions of its main developers in LEIden and PADua. Field testing of the instrument, on over 500 patients attending general practitioners, was conducted in Italy, the Netherlands and Finland.

The developers recognized that quality of life assessment had become narrow in focus because of the interest in quality of life as a clinical outcome indicator. This narrow focus was not regarded as useful for older people, and a wider measure which incorporated a person’s material, physical, social, emotional and spiritual well-being was said to be preferable. These areas were said to become more closely interrelated with age due to the increased chances of experiencing various adverse events simultaneously. The aim of the developers of LEIPAD was to develop an instrument that was sensitive to change and which assessed quality of life subjectively from the older person’s perspective, as well as containing some more objective items. Thus the core of the LEIPAD is a self-report questionnaire requesting self-evaluations of existing state and effects on daily life. De Leo et al. (1998a) argued that existing multi-dimensional instruments developed for use with older people were too long (although these are assessment instruments, not health-related quality of life questionnaires), and broader health-status scales omitted relevant dimensions (e.g. cognitive status).

The original LEIPAD contained 37 items, which were taken from existing questionnaires or created ‘ad hoc’ by the developers, especially those with expertise in psychogeriatric medicine. Thus it was not developed with older people, but was based on ‘expert’ views of the important items and dimensions to include. These items initially covered 10 areas: self-perceived physical health, mental health, emotional health, self-esteem, expectations for the future, activities and instrumental activities of daily living, interpersonal and social functioning, recreational activities, financial situation and religiousness/spirituality. This early version had a single scoring system for all items: the response categories were Not at all/A little/Somewhat/Much/Very much, with a corresponding 5-point scale ranging from 0 (high level of well-being) to 4 (low level of well-being). Objective data were included in order to collect background information: socio-demographic characteristics, personality characteristics, the Mini-Mental State (Folstein et al. 1975), current illnesses and medications, and indicators of religiousness/life sense. The authors argued that personality characteristics and mental status could be used to distinguish between valid and invalid self-reports.

The instrument was revised and two items were added to make 39 items: sleeping and sexual functioning. The scoring was simplified (a 4-point scale from 0 to 3 was used). Following testing with over 200 people, further revisions were undertaken and the number of items was increased to 51. Questions were included on tiredness/energy, concentration, irritability, temper, tendency to argue, resentment, negative self-concept, satisfaction with relationships, trust in others, sexual interest, finances, satisfaction with health care – the latter question was
later removed along with an earlier question on incontinence. This led to the current 49-item version, which was tested with almost 600 people aged 65 and over in Italy, the Netherlands and Finland.

Content

The current version of the questionnaire comprises 49 self-assessment items, 31 of which form seven core subscales: physical function (5 items), self-care (6 items), depression and anxiety (4 items), cognitive functioning (5 items), social functioning (3 items), sexual functioning (2 items), and life satisfaction (6 items).

The remaining 18 items act as the moderators for assessing the influence of social desirability bias and personality on respondents’ scores: self-perceived personality disorders (5 questions from Hyler and Rieder’s (1987) Personality Diagnostic Questionnaire – Revised); anger, resentment, irritability (4 items); social desirability (3 items from the Crowne and Marlow (1964) questionnaire); religious faith (2 items); self-esteem (3 items). Administration of the total scale takes about 15–20 minutes. Some examples are given below:

Social Functioning Scale
How satisfied are you with your social ties or relationships?
Do you feel emotionally satisfied in your relationships with other people?
Is there someone to talk with about personal affairs when you want to?
Self-Esteem Scale
Taking everything into consideration, do you feel inferior to other people?
How often do you avoid things (refrain from doing things) because you feel inferior?
‘I tend to have a negative opinion of myself.’

Scoring

Each subjective item is scored from 0 to 3, with 0 = the best condition and 3 = the worst. The item scores are summed to form the subscale scores; the subscale scores are: physical function (score: 0–15), self-care (0–18), depression and anxiety (0–12), cognitive functioning (0–15), social functioning (0–9), sexual functioning (0–6), and life satisfaction (0–18). Again, 0 = best and the highest = worst.

The responses to the moderator scale items are dichotomous (yes/no) and scored as 0 or 1 (0 = no problem and highest = worst). The item scores are summed to form the subscale scores; the subscale scores are: self-perceived personality disorders (0–6); anger, resentment, irritability (0–4); social desirability (0–3); religious faith (0–2); self-esteem (0–3). A global index of the scales can be computed, with 0 = best and 93 = worst.

Validity

The field testing was with almost 600 people aged 65 and over in Italy, the Netherlands and Finland (de Leo et al. 1998a). The scale was tested against the Rotterdam questionnaire, which is widely used in Europe and measures psychological stress, physical stress and daily activity. Significant, high correlations were obtained between the LEIPAD subscales and similar subscales on the Rotterdam questionnaire, in support of the validity of the LEIPAD.

The factor structure was tested on the entire sample aged 65 and over, and then in the subsamples from each of the three countries. This showed that the stability of a three-factor solution was insufficient. A two-factor structure of the scale was supported, which accounted for more than half of the total variance: psychosocial functioning (life satisfaction, depression and anxiety, and cognitive functioning) and physical functioning (self-care and physical function).

Reliability

The results of the field testing showed that the scale had fairly high internal consistency for the subscales (de Leo et al. 1998a). The subjective subscale internal consistency coefficients were: physical function (Cronbach’s alpha: 0.74), self-care (Cronbach’s alpha: 0.74), depression and anxiety (Cronbach’s alpha: 0.78), cognitive functioning (Cronbach’s alpha: 0.70), social functioning (Cronbach’s alpha: 0.78), sexual functioning (two items: Pearson’s r = 0.43) and life satisfaction (Cronbach’s alpha: 0.61; the lower internal consistency reflects the diverse life domains asked about).
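The scoring rules and the internal consistency coefficients reported for the LEIPAD can be made concrete with a short sketch. This Python example is illustrative only: it assumes responses are already coded 0–3 and grouped by subscale, the dictionary and function names are hypothetical, treating the 0–93 global index as the sum of the seven core subscales is an inference from 31 core items × 3 points = 93, and Cronbach’s alpha is computed with the standard textbook formula rather than with any software the developers used.

```python
from statistics import pvariance

# Items per LEIPAD core (subjective) subscale, as listed in the text.
# Names and helpers below are hypothetical, not from the LEIPAD developers.
CORE_SUBSCALES = {
    "physical function": 5,
    "self-care": 6,
    "depression and anxiety": 4,
    "cognitive functioning": 5,
    "social functioning": 3,
    "sexual functioning": 2,
    "life satisfaction": 6,
}
MAX_ITEM = 3  # each subjective item: 0 = best condition, 3 = worst


def subscale_score(items: list[int]) -> int:
    """Sum 0-3 item scores; e.g. social functioning (3 items) ranges 0-9."""
    if any(not 0 <= x <= MAX_ITEM for x in items):
        raise ValueError("subjective items must be scored 0-3")
    return sum(items)


def global_index(responses: dict[str, list[int]]) -> int:
    """Sum of the seven core subscales, 0 (best) to 93 (worst).

    Assumption: moderator subscales are left out, because
    31 core items x 3 points = 93.
    """
    total = 0
    for name, n_items in CORE_SUBSCALES.items():
        items = responses[name]
        if len(items) != n_items:
            raise ValueError(f"{name}: expected {n_items} items, got {len(items)}")
        total += subscale_score(items)
    return total


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha for one subscale.

    rows holds one list of item scores per respondent. Uses
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals).
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Usage: a respondent at the worst end of every core item.
worst = {name: [MAX_ITEM] * n for name, n in CORE_SUBSCALES.items()}
print(global_index(worst))  # 93
```

Note that for a two-item subscale such as sexual functioning the text reports a Pearson correlation rather than an alpha; the same `cronbach_alpha` helper would still run on two items, but it is not the statistic the developers quoted there.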
The moderator subscale coefficients were: self-perceived personality disorders (Cronbach’s alpha: 0.63), anger, resentment, irritability (Cronbach’s alpha: 0.62), social desirability (Cronbach’s alpha: 0.60), religious faith (two items; correlation: 0.62), self-esteem (Cronbach’s alpha: 0.63). These coefficients were lower than for the subjective subscales, but the developers considered them to be adequate.

This scale has been developed relatively recently and far more psychometric testing is required. While the subscales are relatively long in total, the attraction of the instrument is that it is simple to understand. While it aimed to be multi-dimensional and relevant for older people, it omitted key domains of relevance to them (e.g. feeling in control, perceptions of independence and autonomy). However, a quality of life questionnaire for older people which is not disease focused is a positive development.

CASP-19

The CASP-19 was designed by Higgs and his colleagues (Higgs et al. 2003; Hyde et al. 2003) to measure quality of life in early old age. The name of the instrument reflects its content: Control, Autonomy, Self-realization and Pleasure. The authors used the following definitions of their concepts: control as the ability to actively intervene in one’s environment; autonomy as the right of an individual to be free from the unwanted interference of others; self-realization and pleasure as capturing the active and reflexive process of being human.

They developed their measure based on Maslow’s needs satisfaction model. Maslow (1954) proposed a hierarchy of shared human needs necessary for maintenance and existence (physiological, safety and security, social and belonging, ego, status and self-esteem, and self-actualization). Maslow (1962) argued that once their basic needs are satisfied, human beings pursue higher needs such as self-actualization and esteem. Hence Hörnquist (1982) argued that as human needs are the foundations for quality of life, quality of life can be defined in terms of human needs and the satisfactory fulfilment of those needs. Some investigators of quality of life in mental health have also incorporated a needs-based satisfaction model.

There were three phases of their scale development. The first phase involved consultation with a panel of experts in gerontology and methodology to assess the face validity of the theoretically relevant items which had been selected by the authors for inclusion. These items were then piloted with focus groups of older people as a check on content validity, structure and duration of completion. The content validity of the measure was also tested in pilot face-to-face interviews. Finally, the 22-item scale was completed by post by almost 300 people aged 65–75. After analysis, the scale was reduced to 19 items.

Content

The current version of the instrument contains 19 items in four domains: control, autonomy, self-realization and pleasure. It is self-administered, and carries a four-point response scale for each question: often/not often/sometimes/never. The ‘not often’ response is likely to carry some ambiguity and needs replacing with a more mutually exclusive code in order to distinguish it from ‘sometimes’ – the often/not often responses are more suitable for a dichotomous than a ranked response format. Some examples from the scale are shown below.

Control
My age prevents me from doing the things I would like to.
I feel that what happens to me is out of my control.
I feel left out of things.
I feel I can do the things that I want to.
Autonomy
Family responsibilities prevent me from doing what I want to.
I feel that I can please myself what I can do.
My health stops me from doing the things I want to do.
Shortage of money stops me from doing the things that I want to.
Often (3)/not often (2)/sometimes (1)/never (0) (some items are reverse coded)

Scoring

Each item carries a score of 0–3 and the 19 items are summed, to make a score range of 0 to 57. The range of the scale is defined at the extremes as 0 = ‘complete absence of quality of life’ and 57 (defined
in different descriptive terms) = ‘total satisfaction of all four domains’. Scores for the 19 items were well distributed along the range of scores, although there was a slight negative skew.

The validity of the response scales and scoring needs further development and testing given that not all of the items are in rank order (e.g. the ‘often/not often/sometimes/never’ response categories include apparently dichotomous response choices (‘often’/‘not often’), and the ‘not often’ choice overlaps ambiguously with ‘sometimes’). The Likert scaled format ‘Very often/often/sometimes/never’ is normally used by other investigators, and would probably be preferable in any further development of the instrument.

Validity

Validity and reliability were assessed using the postal survey data (Hyde et al. 2003). Concurrent validity was partly tested with correlations of the scale with the eight-item index of life satisfaction, which tapped some of the same topics, and a strong, positive association between the two scales was found (r = 0.67). The measure, which was theoretically derived, still needs to be tested against people’s global self-ratings of their quality of life, as well as longer, more comprehensive measures of quality of life.

Exploratory factor analysis of the original 22-item instrument confirmed the pattern of item loadings across the four conceptual domains of the instrument (Higgs et al. 2003). Separate factor analysis of the four summed scores for the four domains revealed strong loadings supporting a single underlying quality of life factor (Hyde et al. 2003).

Reliability

The postal survey indicated that the scale had moderate to high internal consistency. Once two of the original items had been removed from the Control subscale (‘Other people take my opinions seriously’ and ‘I feel that I am a respected person’), the homogeneity of the scale improved from Cronbach’s alpha 0.29 to 0.59. Similarly, once one item (‘At times I think I am no good at all’) had been removed from the Self-realization subscale, the homogeneity improved from 0.59 to 0.77. The coefficients of the remaining two subscales were unchanged by the removal of items (coefficients not given, but the range of the Cronbach’s alphas for all four subscales was given as between 0.6 and 0.8). Thus the scale was reduced from 22 to 19 items across the four subscales. The inter-correlations between the subscales ranged from 0.35 to 0.67.

The CASP-19 is a newly developed measure, hence there is relatively little published information about its psychometric properties. The main attraction of the CASP-19 is its strong conceptual base in a theory of human need. This now needs testing against people’s own definitions of quality of life and more testing of its general validity. Once there is more published evidence of its psychometric properties, and the response scale ambiguity has been removed, the CASP-19 appears to be a potentially useful instrument for measuring the domains of control, autonomy, self-realization and pleasure in older people. It is unlikely that these domains alone make up broader quality of life, and researchers aiming to measure broader aspects of quality of life will need to supplement it with other measures. However, the domains included within the CASP-19 reflect some of the dimensions believed to be important for ‘successful ageing’ (Baltes and Baltes 1990). As few existing measures cover control and autonomy adequately, the CASP-19 is a welcome addition.

QUALITY OF LIFE QUESTIONNAIRE (QLQ)

Evans and Cope (1994) designed a self-report instrument to measure the quality of life of adults aged 18 and over in descriptive research and in research evaluating the outcomes of public services. The authors also used it to analyse quality of life in relation to mental and physical health, and selected scales from it were used to assess the impact of liver transplantation.

Content

The instrument contains 192 items within 5 major domains, 15 scales and a social desirability scale. The five major domains are:
1 General well-being (material well-being, physical well-being, personal growth);
2 Interpersonal relations (marital relations, parent–child relations, extended family relations, extra-familial relations);
3 Organizational activity (altruistic behaviour, political behaviour);
4 Occupational activity (job characteristics, occupational relations, job satisfiers);
5 Leisure and recreational activity (creative/aesthetic behaviour, sports activity, vacation behaviour).

The instrument was designed for self-completion. The time reference is ‘the present time’. Examples from the scale are:

This questionnaire includes a series of statements. Read each statement and answer each one TRUE or FALSE. If a statement is descriptive of you, or if you agree with it, answer TRUE. If a statement is not descriptive of you, or if you do not agree with it, answer FALSE . . .

I seem to be always in a hurry.
There are a lot of things I would like to change about myself.
My job allows me to be creative.
I learn a lot from friends.

The response format is dichotomous: true/false.

Copyright © 1990, 1998, Multi-Health Systems Inc. All rights reserved.
In the USA, PO Box 950, North Tonawanda, New York, 14120–0950; 1–800–456–3003; In Canada, 3770 Victoria Park Ave., Toronto, ON, M2H 3M6. Internationally + 1–416–492–2627. Fax, + 1–416–492–3343.
Reproduced with permission.

The QLQ can produce individual scale scores and an overall quality of life score.

Validity and reliability

The QLQ was reported by the developers to have good predictive and discriminative validity in relation to people with mental health problems, people with substance abuse problems and people requiring stress management. The intra-scale correlations, test-retest reliability and internal reliability coefficients were also high. Normative data were provided by samples of 163 adults and 274 adults. Unfortunately, the results of psychometric testing, while they are summarized in the manual, are not available in publications in accessible scientific journals. This restriction will inevitably limit its use.

The scale is strictly copyrighted and is purchasable, together with a manual, from Multi-Health Systems Inc. A computerized version and a paper-and-pencil ‘Quixote’ version are available.

LINEAR ANALOGUE SELF-ASSESSMENT (LASA)

The Linear Analogue Self-Assessment Scale was developed to assess the quality of life of cancer patients (Priestman and Baum 1976). Because it has been frequently adapted for use in a range of other contexts, it is included here as an example of a measure based on visual analogue scales (VAS) for response categories. Although simple, it can be lengthy in terms of administration time if respondents are not used to VAS scales.

Content

The LASA questionnaire has 25 items, ten of which relate to the symptoms and effects of the disease and treatment (e.g. pain and nausea); five examine psychological consequences (e.g. anxiety and depression); five measure other physical indices (e.g. ability to perform household chores); and five items cover personal relationships.

The LASA tests employ lines, the length of which is taken to denote the continuum of some emotional or physical experience such as tiredness or anxiety (Priestman and Baum 1976). The lines are usually 10 cm long, with stops at right angles to the line at its extremes representing the limits of the experience being measured. It is a technique that has been easily administered to 5-year-old children (Scott et al. 1977). Examples of items are:

Ability to perform shopping
None – Better than ever
Decision making
Impossible – Excellent
Nausea
Constant – None
Appetite
None – Excellent
How would you rate your quality of life?
Very poor – Very good
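Once each mark has been measured, scoring LASA-style lines reduces to simple arithmetic: the distance from the ‘none at all’ end of the line to the respondent’s mark is the item score, and item scores are summed. The Python sketch below assumes each response has already been recorded as that distance in centimetres on a 10 cm line; the function names are hypothetical, and the four-part helper only imitates the comparison, reported by Slevin et al. (1988), between the continuous line and the same line divided into four equal parts. It is not a published LASA scoring program.

```python
import math

LINE_CM = 10.0  # LASA lines are usually 10 cm long

# Hypothetical helper names; marks are assumed to be pre-measured in cm
# from the 'none at all' end of each line.


def item_score(distance_cm: float, line_cm: float = LINE_CM) -> float:
    """Item score = distance (cm) from the 'none at all' end to the mark."""
    if not 0.0 <= distance_cm <= line_cm:
        raise ValueError(f"mark at {distance_cm} cm is off a {line_cm} cm line")
    return distance_cm


def total_score(distances_cm: list[float], line_cm: float = LINE_CM) -> float:
    """Sum the item scores across all completed items."""
    return sum(item_score(d, line_cm) for d in distances_cm)


def four_part_category(distance_cm: float, line_cm: float = LINE_CM) -> int:
    """Collapse a mark into one of four equal parts of the line (1 to 4)."""
    item_score(distance_cm, line_cm)  # reuse the on-the-line check
    part = math.floor(distance_cm / (line_cm / 4)) + 1
    return min(part, 4)  # a mark at the very end falls in part 4, not 5


# Usage: three items marked at 2.5 cm, 7.5 cm and 0 cm.
marks = [2.5, 7.5, 0.0]
print(total_score(marks))                      # 10.0
print([four_part_category(m) for m in marks])  # [2, 4, 1]
```

Keeping the raw distance as the item score mirrors the simple summing rule; any rescaling (for example to 0–100) would be an extra choice made by the investigator, not part of the scoring described here.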
Scoring

The patient is instructed to mark along the line a point that corresponds to his or her perception of the experience. The distance from the ‘none at all’ mark to the patient’s mark provides a numeric score for the item. The items are summed.

Validity

There are many references in the literature to the validity of the VAS technique (Melzack 1983). The LASA has been shown to be able to detect treatment response in cancer patients (Coates et al. 1987; Butow et al. 1991; Demetri et al. 2002). In a drug trial in palliative care, it showed parallel changes to a longer, psychometrically sound cancer-specific quality of life questionnaire (the EORTC QLQ-C30) (Hedley et al. 2002). Slevin et al. (1988) in their testing of the scale, however, reported that the LASA scale correlated poorly with the Hospital Anxiety and Depression (HAD) scale. Slevin et al. suggested that this was due to the HAD items being less applicable to cancer patients.

Gough et al. (1983) assessed 100 patients with advanced cancer in Australia, using a single LASA item (‘How would you rate your general feeling of well-being today?’), a 21-item version of the LASA index, and self- and interviewer-administered versions of Spitzer’s five-item QL Index, which covered activity, daily living, health, support and outlook. Each patient was evaluated four times at four-weekly intervals for 12 weeks. The correlation coefficients for the four methods ranged from 0.38 to 0.86. The single-item LASA question correlated moderately to well with all three questionnaires (0.38 to 0.67). The LASA-21 correlated from 0.46 to −0.65 with the other items. However, shorter scales are inevitably less stable (Bernhard et al. 2001).

Reliability

The test-retest correlations of the LASA were 0.73 (Priestman and Baum 1976). Slevin et al. (1988) tested the LASA against the Hospital Anxiety and Depression scale, the Karnofsky Index and the Spitzer Quality of Life Index with 108 cancer patients in London and reported that the LASA scale showed similar concordance coefficients when taken as a whole, compared with being divided into four equal parts, i.e. the continuous scale is no more sensitive than the four-point scale. Two different groups of 25 patients also filled in the same forms on a single day, and daily for five consecutive days, during a period when their clinical state was expected to remain stable. Professionals also completed the scales in relation to the same patients. The LASA scale was found to be easily reproducible, and had greater reproducibility than the other scales. Little other information about its reliability has been published.

Although the scale is simple and has been reported to be reproducible and able to discriminate between groups, it may be problematic to administer because some patients may take some time to accustom themselves to representing their feelings along a continuum. Investigators have used various modifications of the scale (Pandey et al. 2000), and many have adapted the LASA scales for use in non-clinical studies. More evidence of the psychometric properties of such VASs is required.

SPITZER’S QUALITY OF LIFE INDEX (QL INDEX)

Another scale which involves using a visual analogue scale, in addition to simple category scales, is Spitzer’s Quality of Life Index (Spitzer et al. 1981). It covers dimensions comparable to most broader health-status scales: activity, performance of activities of daily living, perception of health, support from family and friends, and outlook on life. A Uniscale for rating overall quality of life during the previous week is also included, and this is often used separately. Although the developers specified that it was to be used to rate the quality of life of cancer patients, the Index and Uniscale are presented here because they have been widely used in other clinical and non-clinical contexts. In particular, the Uniscale has frequently been used or adapted to rate quality of life.

Spitzer et al. (1981) identified components of health-related quality of life empirically by questioning lay people as well as health professionals. They formed three advisory panels, each with 43 members from Sydney, Australia. These consisted of cancer patients and their relatives, patients with chronic diseases and their relatives, healthy people aged between 20 and 59, and 60+, physicians, nurses, social workers and other health professionals and members of the clergy. One panel received an
open-ended questionnaire designed to elicit spontaneous beliefs about factors that could enhance or decrease the quality of life. The second panel received a more structured questionnaire seeking views on various aspects of defined quality of life. The third panel assessed the results from the first two and the relative importance of the main factors. The factors that were rated as the most important formed the first drafts of the QL Index, which were tested on 339 people from outpatients’ clinics. This resulted in the following dimensions of quality of life being incorporated within the definitive QL Index: activity; performance of activities of daily living; perception of health; support from family and friends; and outlook on life.
Its authors caution that it is not suitable for measuring or classifying the quality of life of ostensibly healthy people (Spitzer et al. 1981), although the Uniscale is often used with this group. The index is short and easily administered; the average completion time is one minute. There are numerous applications of this scale in clinical settings.

Content

The QL Index consists of five items. Each item represents a different domain of life functioning: activity, performance of activities of daily living, perception of health, support from family and friends and outlook on life. Respondents are asked to tick the statements which apply to them, and can give only one tick per statement to indicate that it applies. This creates problems of interpretation if respondents can do some tasks but not others. A version exists for clinicians to complete on behalf of patients. The scale also includes a visual analogue rating Uniscale, on which the respondent places a cross on a horizontal line to indicate their quality of life during the past week (anchored at each end from lowest quality to highest quality). This is repeated by the clinician to provide a proxy rating of the respondent’s quality of life. Standardized descriptions of the anchor terms are provided. Examples of the items are presented below:

Activity
I do not work in any capacity nor do I study nor do I manage my own household.
Daily living
I am able to eat, wash, go to the toilet and dress without assistance. I drive a car or use public transport without assistance.
Health
I lack energy or only feel ‘up to par’ some of the time.
Support
I have good relationships with others and receive strong support from at least one family member and/or friend.
Outlook on life
I generally look forward to things and am able to make my own decisions about my life and surroundings.

These statements are also adapted for the clinician to make proxy ratings of the patient’s status.

Spitzer QoL Uniscale: the respondent is asked to place an X on the line to rate quality of life:

lowest quality of life ______________________________ highest quality of life

Scoring

The scoring of the Spitzer QL Index is simple. The scale consists of five items, each with three response options scored 0–2, giving an overall score of 0 to 10. The item scores can be summed into a single score or presented separately.

Validity

Spitzer et al. (1981) asked 68 lay people and professionals (e.g. physicians) to assess the scope and format of the instrument; most judged it to be satisfactory, and the authors judged the scale to have content validity. They also tested the scale for validity by inviting 150 physicians to rate 879 patients. Fewer than two-thirds of the physicians (59 per cent) reported that they were ‘very confident’ of the accuracy of their scores. However, analysis of the physicians’ scores showed that the close clustering of high scores among healthy subjects, the spread of scores among those who were definitely ill, and the low scores of those who were seriously ill made clinical sense and provided evidence of its discriminative ability. Spitzer et al. (1981) judged convergent validity to be adequate by comparing physicians’ and patients’ ratings. When the physicians’ ratings were compared to patients’ self-ratings, the correlation was moderately high (0.61). The authors reported that the scale was able to discriminate between healthy people and patients with varying conditions. The authors did not intend it to be used for measuring global quality of life in healthy populations, and they reported that it does not discriminate adequately among well people (Spitzer et al. 1981).
Gough et al. (1983) reported that the Uniscale correlated at over r = 0.60 with the Karnofsky Performance Index. The scale and the Uniscale have also correlated moderately to highly with other global-functioning and disease-specific quality of life scales (Spitzer et al. 1981; Mor et al. 1984; Morris et al. 1986; Mor 1987; Sloan et al. 1998). Mor (1987), on the basis of three samples of newly diagnosed cancer patients (total: 2,046), reported a correlation of 0.63 between the QL Index and the Karnofsky Performance Scale; the correlation was probably only moderate because of the multi-dimensional nature of the QL Index. The item correlations of the scale with the Karnofsky Scale ranged from 0.13 to 0.57. The correlation coefficients in these studies were not high enough to be confident that the two scales cover the same dimensions. The QL Index has also been reported to be able to predict mortality among cancer patients (Mor et al. 1984; Morris et al. 1986), although Mor (1987) concluded that it was not sufficiently sensitive for use as an outcome indicator of care. It has been used in an Australian study of outcome in breast-cancer patients and was shown to be capable of discriminating between patients on intermittent therapy and those receiving continuous therapy (Coates et al. 1987). However, Levine et al. (1988) reported that it did not sufficiently discriminate between patients with breast cancer who had completed their treatment and those who had not yet done so.
Morris and Sherwood (1987) in the USA administered the QL Index to different samples of cancer patients at different stages of the disease. Over 2,000 patients were included in the study. A strong correlation was reported between the Karnofsky performance-status rating and the QL Index, in support of construct validity and discriminative ability. The QL Index was also able to distinguish successfully between cancer patients who were newly diagnosed, those under active treatment, and those nearing the terminal stages of the disease. However, the authors did not feel it was sufficiently sensitive for use as an outcome variable in studies evaluating the effect of a treatment or intervention on patients’ lives, mainly because of the insensitivity of the scale’s social functioning index. Wood-Dauphinee and Williams (1991) reviewed the literature on the QL Index and the Uniscale and cited several studies that supported its convergent validity and discriminative ability on the basis of correlations with other scales, its ability to discriminate between healthy and sick people, and its sensitivity to the stages of disease progression.

Reliability

Spitzer et al. (1981), on the basis of 150 physicians’ ratings of 879 patients, reported a high coefficient of internal consistency (0.77) and high inter-rater reliability (0.81). The scale has been shown to have reasonable inter-item reliability (Mor et al. 1984; Morris et al. 1986; Mor 1987). The stability of the scale is more questionable. Slevin et al. (1988) administered the index along with the Linear Analogue Self Assessment (LASA) quality-of-life visual-analogue scales, the Karnofsky Performance Scale, and the Hospital Anxiety and Depression Scale. The scales were completed by 108 patients and their doctors at the same time. Two different groups of 25 patients filled in the same questionnaires on a single day, and daily for five consecutive days. The index’s reproducibility was not as good as that of the Karnofsky Performance Index. The Karnofsky, which correlated with the Spitzer Index (patients’ rating of quality of life: 0.49), demonstrated greater reproducibility than any of the other scales. The variability in their results on repeated testing calls into question the reliability of the Spitzer QL Index. Large discrepancies between patients’ and doctors’ ratings of the patients’ quality of life have been reported by Slevin et al. (1988), questioning the validity of proxy ratings. Sloan et al. (1998) reported, on the basis of patients with advanced cancer participating in a clinical trial of treatment, that doctors’ ratings on the Uniscale were lower than the patients’ own ratings. This indicates that patients with advanced cancer rate their lives more highly than the doctors who rated them. Moinpour et al. (2000) also reported poor agreement between patients’ and families’ ratings of the patient’s quality
of life using the QL Index. They concluded that proxies are a poor substitute for capturing patients’ perspectives on their quality of life. This view has been supported in studies of proxy ratings using other scales. For example, in a study of patients’ and friends’/relatives’ (proxies’) ratings of the patient using the EuroQol, Dorman et al. (1997) reported that while moderate agreement between ratings was found for directly observable domains, agreement was poorer for the more subjective domains.
Slevin et al. (1988) argued that the QL Index contained inappropriate questions for measuring the quality of life of cancer patients, and that the continued popularity of the Spitzer scale, despite its poor reliability and validity, stems from researchers’ tendency to rely on significance values when assessing their scales, rather than on the size of the correlation. The QL Index’s reliance on just five items means that it does not adequately account for the different dimensions of health-related quality of life. It has been criticized for excluding spiritual and financial domains (McMillan 1996). Further testing is required. The scale’s reproducibility requires further investigation, and a major disadvantage is the confusion of several dimensions within one item. However, the scale has been popular despite its limitations (Addington-Hall et al. 1990).

INDIVIDUALIZED MEASURES OF QUALITY OF LIFE

The conceptual and methodological difficulties inherent in quality of life research, and the reliance on psychometric testing for scale development at the expense of the relevance of scale items to the individual (Hunt 1999), support the case for capturing individuals’ own values and experiences. More qualitative or semi-structured methods of exploring quality of life have long existed (e.g. the use of diaries; in-depth interviews with critical incident and life history approaches; repertory grid techniques). Given the need for meaningful quantitative measures in research on health outcomes, some investigators have also attempted to reconcile qualitative and quantitative approaches (Guyatt et al. 1987; O’Boyle et al. 1992; Browne 1999; Ruta et al. 1999). These individualized measures ask people themselves about the most important things in their lives, or about the most important effects of their condition on their lives, and ask them to prioritize or weight the areas mentioned (Joyce et al. 1999).
Individualized measures require complex methods of scoring and weighting. The individualized weighting procedures also require interviewer administration for valid results and optimum response rates. Further supportive evidence is required for the use of individualized measures, given their research and respondent burden. More research is needed to compare the amount of explained variance in quality of life assessments achieved by existing, relatively complex, individualized weighting and scoring methods with that achieved by more basic methods. While individualized measures are a welcome development on a complex topic, Fitzpatrick (1999) argued that the hypothesis that they provide more reliable and valid measures of quality of life than standardized measures has yet to be confirmed. Tugwell et al. (1990) also argued that any superiority of more individualized approaches over standardized approaches can be explained by the fact that standardized instruments contain more ‘noise’ from items that are less relevant or irrelevant to patients.
The two best-known individualized measures, the Schedule for the Evaluation of Individual Quality of Life and the Patient Generated Index, are reviewed next.

SCHEDULE FOR THE EVALUATION OF INDIVIDUAL QUALITY OF LIFE (SEIQoL)

The SEIQoL is a generic, individualized quality of life scale (Browne et al. 1997). It is based on the rationale that it works within the value system of the individual being assessed, rather than the value systems of others. It derived its cognitive aspects from theoretical studies of perception and their extension to Social Judgment Theory (Joyce et al. 2003).
While the SEIQoL is a generic measure of quality of life, and is not a health-related or disease-specific measure, it has been used with many groups of patients in studies of clinical outcome, as well as with older people and carers (e.g. McGee et al. 1991; O’Boyle et al. 1992; Coen et al. 1993, 1999; Browne et al. 1994; Hickey et al. 1996, 1997; O’Boyle 1996, 1997a, 1997b; Waldron et al. 1999; Scholzel-Dorenbos 2000; Tovbin et al. 2003; Wettergren et al. 2003). It also influenced the
development of the ADDQoL, which measures individuals’ perceptions of the impact of their diabetes on their quality of life (Bradley et al. 1999). The SEIQoL was reported to be acceptable to people, although administration by an interviewer is required, which, because of the expense, often limits studies to relatively small numbers.

Content and scoring

The SEIQoL enables individuals to nominate the areas of life they consider to be the most important to their quality of life, based on their own values. Respondents are asked to nominate five areas of quality of life that are important to them; they are then asked to rate their current status in each area, as well as their global quality of life, against vertical visual analogue scales, which the interviewer then tabulates into bar charts using a laptop computer.
In the Direct Weighting (SEIQoL-DW) method, respondents weight the relative importance of each area using a hand-held disc which can be scored and which provides the relative weights for each area. This is known as the direct weighting procedure (a manual of the procedure is available; O’Boyle et al. 1995). The disc consists of five stacked, centrally mounted, interlocking laminated discs. Each disc is a different colour and is labelled by the interviewer with one of the five areas of life selected by the respondent. The coloured discs can be rotated over each other to produce a dynamic pie chart in which the relative size of each coloured ‘area of life’ represents the weight the respondent attaches to that area. There is a 100-point scale on the circumference of the disc, from which the proportion that each coloured area represents can be scored to give the individual weight the respondent attaches to each area of life.

Examples from the SEIQoL-DW
What are the five most important areas of your life at present – the things which make your life a relatively happy or sad one at the moment . . . The things that you feel determine the quality of your life?
Now that you have named the five most important areas in your life, I am going to ask you to rate how each of these are for you at the moment. First I will show you an example of how the rating is done . . .
I would like you to show me how important the five areas of life you have mentioned are in relation to each other, by using this disc (indicate SEIQoL-DW). People often value some areas of life as more important than others. This disc allows you to show me how important each area in your life is by giving the more important areas a larger area of the disc and the less important areas a smaller area of the disc.
© Department of Psychology, Royal College of Surgeons in Ireland, 1993. Reproduced with permission.

Weighting

Quality of life weights for the importance of domains may alternatively be derived using human judgement analysis techniques (based on vignettes of conditions) during the interview, although the shorter direct weighting (DW) procedure described above is generally preferred in order to reduce researcher and respondent burden. It is interviewer administered, and takes up to 30 minutes to explain and administer (Browne et al. 1997).

Validity

Validity has been partly assessed. O’Boyle et al. (1992) indicated that the SEIQoL was sensitive to individuals’ quality of life. In a study of 56 healthy people aged 65 and over, respondents were able to understand and complete the SEIQoL correctly. However, health status was not correlated with the perceived importance of health on the SEIQoL at baseline, and the correlation was low at the 12-month follow-up interview. The weight placed on the importance of health did not increase over the 12-month study period, despite a significant decline in health status (Browne et al. 1994). In research on hip replacement, health was unexpectedly nominated more frequently by controls than by patients (O’Boyle et al. 1992). Among a healthy population, relationships, health, family and finances were the most frequently nominated domains (by over 50 per cent), and among gastro-intestinal clinic attenders, family, work, social and leisure activities, and health were the most frequently nominated domains (McGee et al. 1991). These differences in values appear to justify the use of an individualized scale.
The scale was applied by O’Boyle et al. (1992) in a prospective intervention study of 20 patients undergoing unilateral total hip replacement surgery, with a six-month follow-up. Comparisons
were made with matched, non-patient controls. The scale showed improvements in scores after surgery. However, the SEIQoL can present difficulties when measuring change. Quality of life is a dynamic concept, and patients may rate different areas as important at different stages in their condition. Therefore, in order to assess change in prospective studies, the recommended practice in the user’s manual is that new cues are elicited at each assessment. Cues nominated at baseline should then be given to the respondent and the SEIQoL procedure repeated. This is necessary in order to enable direct comparison between assessments (O’Boyle et al. 1995).
Lintern et al. (2001) compared the results of the PGI, the SEIQoL and the SF-36 in patients with multiple sclerosis. They reported that the SEIQoL scores related more closely to the SF-36 dimensions of health and vitality, and the PGI scores related closely to the SF-36 dimension of physical functioning. They suggested that these differences may reflect the conceptual bases of the instruments. Neudert et al. (2001) asked patients with amyotrophic lateral sclerosis to rate their perceptions of the validity of the SEIQoL, the Sickness Impact Profile (SIP) and the SF-36, using visual analogue scale ratings. The validity of the SEIQoL was rated higher than that of both the SIP and the SF-36, indicating that patients felt that the SEIQoL was more likely to measure their quality of life. A study of cardiac patients by Smith et al. (2000), however, reported that the SEIQoL, the SF-36 and two disease-specific instruments all lacked sensitivity to changes in clinical condition over time.

Reliability

Test-retest reliability of both the direct weighting procedure and the human judgement analysis weightings was assessed using 40 healthy volunteers at baseline, and at 7–10 days and 14–20 days later. The weights produced by the two different methods differed, on average, by 7.2 to 7.8 points over time. The direct weights varied over time by 4.5 points and the human judgement analysis weights varied by 8.4 points (Browne et al. 1997). The authors of the scale pointed out that the results suggested stability among some respondents. The results indicated that most people were consistent but individuals varied. One problem with reliability testing in the case of the SEIQoL is that both weighting methods lead to a total of 100. On repeat testing, if the weight provided for a particular area of life changes, then so must the weight for one or more of the other domains.
The open-ended questions in the SEIQoL have inspired the design of other semi-structured questionnaires for exploring quality of life. For example, Bowling (1995) asked a national sample of adults to list the five most important areas of life, and to prioritize them. Because it was uncertain whether the things people consider important in life equate to quality of life, Bowling et al. (2003) subsequently asked a national sample of people aged 65 and over a series of open-ended questions directly about quality of life, and confirmed that the domains closely overlapped. The questions they used were:

1 Thinking about your life as a whole, what is it that makes your life good – that is, the things that give your life quality? You may mention as many things as you like.
2 What is it that makes your life bad – that is the things that reduce the quality in your life? You may mention as many things as you like.
3 Thinking of all these good and bad things you have just mentioned, which one is the most important to you?
4 What single thing would improve the quality of your life?
5 What single thing would improve the overall quality of life for people of your age?

A strong case could be made for the use of the SEIQoL as the core component of disease-specific measures, although more work is required in view of its potential complexity. A manual and a computerized version of the instrument are available (O’Boyle et al. 1995). It has been translated for use in other countries, including Sweden and Denmark (Scholzel-Dorenbos 2000; Ventegodt et al. 2003; Wettergren et al. 2003).

PATIENT GENERATED INDEX (PGI)

The Patient Generated Index was developed by Ruta and his colleagues, who aimed to ‘develop an instrument that could be used to quantify the difference between an individual’s hopes and expectations and reality in a way that has meaning and relevance in their daily lives’ (Ruta 1992; Ruta et al. 1994b, 1999; Garratt and Ruta 1999). It rests on the recognition that assessment of quality of life is subjective, and the measure is based on ‘gap’ or ‘comparisons’ theory (Calman 1984). Within this model, quality of life is expressed as the extent to which hopes and ambitions are matched by experience, and the aim of an intervention is to narrow the gap between expectations and reality. It was based partly on Guyatt et al.’s (1987) disease-specific scale, in which patients are asked about the five most important activities that are affected by their condition, and partly on the priority evaluation methods used by town planners to assess community preferences, in which people allocate points between a set of characteristics, from shopping facilities to garden size (Ruta et al. 1999).

Content and scoring

The PGI is completed in three steps. In step 1, respondents are asked to specify the five most important areas of their life (‘affected by their condition’ if disease or disease-specific versions are used). Step 2 asks them to rate how bad (badly affected) they are in each chosen area on a scale of 0–100 (0 represents the worst they can imagine for themselves and 100 represents exactly as they would like to be). A sixth box enables them to rate all other areas of life. In step 3, respondents are asked to imagine that they can improve some or all of their selected areas. They are then given ‘points’ to spend across one or more areas that they would most like to improve (the number of points varies with the version of the scale used: generic, health-related or disease-specific). The points, then, represent the relative importance of potential improvements in each area, and constitute the individualized weighting.

Examples from the PGI (Health)
Step 1: Identifying areas
We would like you to think of the most important areas of your life (that are affected by your HEALTH*). Please write up to FIVE areas in the boxes below.
(*Note: Health is referred to in the health-related quality of life version, but not in the generic version; in the disease-specific versions the specific condition is referred to)
Step 2: Scoring each area
In this part we would like you to score the areas you mentioned in Step 1. This score should show how badly affected you were over the past MONTH. Please score each area out of 10 (10 = Exactly as you would like to be . . . 1 = The worst you could imagine).
Step 3: Spending points
We want you to imagine that any or all the areas of your life could be improved. You have 12 imaginary points to spend to show which areas you would like to see improve. Spend more points on areas you would like to see improve and less on areas that are not so important. You don’t have to spend points in every area. You can’t spend more than 12 points in total.
© D. Ruta, University of Newcastle. Reproduced with permission.

To generate an index, the self-ratings for each area are multiplied by the proportion of points awarded to that area and summed to give a score between 0 and 100. The score aims to represent the extent to which reality matches expectations in the areas in which respondents most value improvement. This, then, represents the authors’ construct of quality of life (Ruta et al. 1999). Self-administered and interviewer-administered versions are available, as well as generic, health and disease-specific versions.

Validity

The PGI was initially tested with 20 patients, and a checklist version of areas of life was generated for a postal questionnaire version. This was further tested in a postal survey of 20 more patients, who were subsequently interviewed, and the questionnaire was later modified. A further postal survey of 74 people identified by general practitioners as suffering from low back pain resulted in 47 per cent being returned completed; 27 per cent were returned partly completed and 31 per cent were returned blank, indicating that the exercise is too complex or burdensome for self-completion (Ruta et al. 1999). The response rates of the PGI for self-completion, as opposed to interviewer administration, have been reported to be almost two-thirds of those of Ware et al.’s (1993) SF-36 Health Survey. There is some evidence that non-responders to the postal versions of the PGI are less well educated, in lower socio-economic groups (measured by housing tenure) and more likely to be retired than responders (Ruta
et al. 1999). There is evidence that respondents with other quality of life questionnaires, and it achieved
perceived health problems who complete the PGI moderate levels of responsiveness to changes in
correctly are younger and spent longer in education health. Camilleri-Brennan et al. (2002) adminis-
that those with health problems who do not com- tered the PGI, pre- and 3 months post-operatively,
plete it correctly (Macduff and Russell 1998). to 33 patients with rectal cancer. The PGI corre-
The index correlated well with the Rand SF-36 lated significantly with several domains of other
scales measuring pain, social functioning and role quality of life questionnaires, it was responsive to
limitations attributable to physical problems, and improvement in condition after surgery, and was
with the clinical questionnaire used. The scores more responsive to change than the SF-36.
reflected general practitioners’ assessments of It was stated earlier (see SEIQoL) that Lintern
severity. The PGI was also applied and tested by et al. (2001), in their comparison of scales, found
the developers with patients with low back pain, that the PGI scores related closely to the SF-36
menorrhagia, suspected peptic ulcers and varicose dimension of physical functioning, whereas the
veins (Ruta et al. 1999). In these main validation SEIQoL scores related more closely to SF-36
studies, the correlations between the PGI scores and dimensions of health and vitality. They suggested
the domains of Ware et al.’s (1993) SF-36 scores that these differences may reflect the conceptual
were weak to moderate, although still highly sig- bases of the instruments.
nificant in most cases (r = 0.06 to 0.39). The PGI was reported to detect small to moderate changes in three of the four conditions studied over a 12-month period. It was more responsive to change than the SF-36. Ruta et al. (1999) thus claimed that the PGI is as sensitive to individuals’ quality of life as standardized measures. Further tests by the developers for construct validity showed small or non-significant results. It was unknown whether this reflected the weakness of their hypotheses, the smaller sample sizes, or the weakness of the PGI (Ruta et al. 1999).
Tully and Cantrill (2000) also tested the validity of the PGI with over 1,000 people aged 65 and over with arthritis, using a postal survey and follow-up interviews with a sub-sample. Their response rates were high, at 78 per cent and 83 per cent respectively. They reported that the PGI met four and failed to meet six of their criteria for validity. While their hypotheses were confirmed for associations between the PGI and the Arthritis Impact Measurement Scales, severity of arthritis, and whether or not respondents had sought medical care, the PGI failed to detect changes in health status or to distinguish between respondents who were and were not taking analgesia. Follow-up interviews with a sub-sample of respondents also revealed problems with respondents’ interpretation of the instructions.
In contrast, Haywood et al. (2003), in a postal survey application of the PGI to patients with ankylosing spondylitis, reported that the instrument had acceptable completion rates, data quality was adequate, the PGI correlated significantly with …
Like the SEIQoL, the PGI can present difficulties when measuring change, as quality of life is dynamic, and patients may rate different areas as important at different stages in their condition. Also like the SEIQoL, on repeat testing, if the score (weight) provided for a particular area of life changes, then so must the weight for one or more of the other domains.
Reliability
Test-retest results at two weeks (after return of a 12-month follow-up questionnaire) with patients with low back pain, menorrhagia, suspected peptic ulcers and varicose veins were reported to be adequate by the developers (Ruta et al. 1999). Haywood et al. (2003), in their postal survey of patients with ankylosing spondylitis, reported that the test-retest reliability coefficients of the PGI were high (0.80). Tully and Cantrill (2002), in their postal survey of over 1,000 patients aged over 65 years with arthritis, reported a lower test-retest reliability coefficient of 0.55, increasing to 0.67 when respondents who had misinterpreted the instructions (detected at the follow-up interviews) were excluded. They concluded that the instrument elicited patient concerns about their condition that other, more structured, questionnaires may not identify.
In sum, the PGI is less suitable for self- or postal administration because of its complexity. Many people complete it incorrectly (Macduff and Russell 1998) and, therefore, interviewer administration is required.
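One way to see why, on repeat testing, a change in one area’s weight forces a change elsewhere is to treat the weights as a fixed budget of points spread across the areas a respondent nominates. The Python sketch below illustrates that arithmetic only. The 12-point budget, the 0–10 rating scale, the area names and the functions `pgi_style_index` and `move_points` are hypothetical assumptions for illustration, not the published PGI scoring rules.

```python
# Hypothetical illustration: weights are a fixed budget of points spread
# across the areas a respondent nominates. The 12-point budget and the
# 0-10 rating scale are assumptions, not the published PGI scoring rules.

BUDGET = 12  # assumed total number of points a respondent can allocate


def pgi_style_index(ratings: dict[str, int], weights: dict[str, int],
                    budget: int = BUDGET) -> float:
    """Return the weight-weighted mean of the area ratings.

    Raises ValueError if the weights do not name the same areas as the
    ratings, contain a negative value, or do not add up to the budget.
    """
    if set(ratings) != set(weights):
        raise ValueError("ratings and weights must name the same areas")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if sum(weights.values()) != budget:
        raise ValueError(f"weights must sum to {budget}")
    return sum(ratings[a] * weights[a] for a in ratings) / budget


def move_points(weights: dict[str, int], src: str, dst: str,
                points: int) -> dict[str, int]:
    """Return a copy of `weights` with `points` moved from src to dst.

    Because the total is fixed, one area's weight can only rise if
    another area's weight falls by the same amount.
    """
    if points < 0 or weights[src] < points:
        raise ValueError("invalid number of points to move")
    updated = dict(weights)
    updated[src] -= points
    updated[dst] += points
    return updated


if __name__ == "__main__":
    ratings = {"walking": 3, "sleep": 6, "work": 5}  # hypothetical ratings
    baseline = {"walking": 6, "sleep": 3, "work": 3}  # sums to 12
    print(pgi_style_index(ratings, baseline))  # 4.25

    follow_up = move_points(baseline, "walking", "sleep", 2)
    print(follow_up, pgi_style_index(ratings, follow_up))

    try:
        # Raising one weight without lowering another breaks the budget.
        pgi_style_index(ratings, {"walking": 7, "sleep": 3, "work": 3})
    except ValueError as err:
        print("rejected:", err)
```

With the ratings held constant, shifting two points from ‘walking’ to ‘sleep’ moves the index from 4.25 to 4.75, so a change in priorities alone can alter the score between administrations.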
APPENDIX: A SELECTION OF SCALE DISTRIBUTORS AND USEFUL ADDRESSES
Abbreviated Mental Test Score (AMT) 7-item version Professor S. Ebrahim, Department of Social Medicine, University of Bristol, Canynge Hall, Whiteladies Road, Bristol, BS8 2PR.
Arizona Social Support Interview Schedule (ASSIS) Department of Psychology, Arizona State University, Tempe AZ 85287, USA.
Arthritis Impact Measurement Scales Dr R.F. Meenan, Arthritis Center, Boston University Medical Center, Conte Building, 80 East Concord Street, Boston, MA 02118, USA.
Barthel Index (Granger’s modified version and FIM+FAM replacement scale) Dr C.V. Granger, Center for Functional Assessment, Department of Rehabilitative Medicine, University of Buffalo, 232 Parker Hill, 3435 Main Street, Buffalo, NY 14214-3007, USA.
Beck Depression Inventory (BDI) (revised version) Psychological Corporation, 555 Academic Court, San Antonio, TX 78204–2498, USA. http://www.PsychCorp.com (then search for Beck).
CASP-19 Dr Paul Higgs, Centre for Behavioural and Social Sciences in Medicine, University College London, Wolfson Building, Riding House Street, London W1N 8AA, UK.
Cornell Medical Index Withdrawn from use; owned by New York Weill Cornell Medical Centre Archives, 1300 York Avenue, New York, NY 10021, USA.
Consulting Psychologists Press (scale distributors), 3803 East Bayshore Road, Box 10096, Palo Alto, CA 94303, USA.
COOP/WONCA Charts Northern Centre for Health Care Research, University of Groningen, Ant. Deusinglaan 4, 9713 AW Groningen, The Netherlands. http://www.globalfamilydoctor.com/coop-woncacharts
Crichton Royal Behaviour Rating Scale Dr D.J. Jolley, South Manchester Old Age Psychiatry Service, Withington Hospital, Nell Lane, West Didsbury, Manchester M20 2LR, UK.
Dartmouth COOP Function Charts Dr Deborah Johnson, Dartmouth COOP Project, Dartmouth Medical School, Hinman Box 7265, Hanover, NH 03755–3862, USA.
EuroQol Dr Frank de Charro, EuroQol Business Manager, PO Box 4443, 3006 DR, Rotterdam, The Netherlands.
Family Relationship Index (FRI), Family Environment Scale (FES) Consulting Psychologists Press, 3803 East Bayshore Road, Box 10096, Palo Alto, CA 94303, USA.
General Health Questionnaire (GHQ) NFER-NELSON, Darville House, 2 Oxford Road East, Windsor, Berks SL4 1DF, UK.
Geriatric Depression Scale – accessible on http://www.stanford.edu/~yesavage/GDS
Geriatric Mental State (GMS) and CARE Professor J.R.M. Copeland, Department of Psychiatry, Royal Liverpool Hospital, Prescot Street, Liverpool L7 8XP, UK.
Hamilton Depression Rating Scale Professor M. Hamilton, Department of Psychiatry, University of Leeds, Woodhouse Lane, Leeds, LS2 9JT, UK. Structured interview version: Dr M. Potts, Department of Social Work, California State University, 1250 Bellflower Blvd, Long Beach, CA 90840–0902, USA.
Hospital Anxiety and Depression Scale (HADS) NFER-NELSON, Darville House, 2 Oxford Road East, Windsor, Berks SL4 1DF, UK.
Interview Schedule for Social Interaction Professor A.S. Henderson, NH & MRC Social Psychiatry Research Unit, Australian National University, Canberra ACT 0200, Australia.
Karnofsky Performance Scale To view the scale: http://www.acsu.buffalo.edu/~drstall/assessmenttools
MAPI Research Institute – produces published reviews, QoL newsletter and database of broader health status and health-related quality of life instruments. MAPI Research Institute, 27 rue de la Villette, 69003, Lyon, France. http://www.qolid.org
McGill Pain Questionnaire (MPQ) Professor R. Melzack, Department of Psychology, Stewart Biological Sciences Building, McGill University, 1205 Docteur Penfield Avenue, Montreal, Quebec H3A 1B1, Canada.
McMaster Health Index Questionnaire (MHIQ) Dr L. Chambers, Department of Clinical Epidemiology and Biostatistics, McMaster University Health Sciences Center, 1200 Main Street West, Hamilton, Ontario L8N 3Z5, Canada.
Mind Garden – distributor of many popularly used psychological measures. http://www.mindgarden.com/Assessments/Info/
Montgomery-Asberg Depression Rating Scale Professor S.A. Montgomery, Department of Psychiatry, Paterson Centre for Mental Health, 20 South Wharf Road, London, W2 1PD, UK.
Network Typology: The Network Assessment Instrument Professor G.C. Wenger, Centre for Social Policy Research and Development, University of Wales, Bangor, Gwynedd LL57 2DG, Wales. Training pack for practitioners from: Pavilion Publishers, 8 St George’s Place, Brighton, East Sussex, BN1 4GB, UK.
NFER-Nelson (scale distributors – mainly psychological), Darville House, 2 Oxford Road East, Windsor, Berks SL4 1DF, UK.
Nottingham Health Profile (NHP) Galen Research, Southern Hey, 137 Barlow Moor Road, West Didsbury, Manchester M20 8PW, UK.
Older Americans’ Resources and Services Schedule (OARS) Dr H.J. Cohen, Center for the Study of Aging and Human Development, Box 3003, Duke University Medical Center, Durham, NC 27710, USA.
Patient-assessed Health Instruments Group, Unit of Health-Care Epidemiology, Institute of Health Sciences, Old Road, Oxford, OX3 7LF; http://phi.uhce.ox.ac.uk contains a bibliography of health instruments.
Patient Generated Index (PGI) Dr Danny Ruta, School of Population and Health Sciences, University of Newcastle, William Leech Building, The Medical School, Framlington Place, Newcastle upon Tyne, NE2 4HH, UK.
Philadelphia Geriatric Center Morale Scale Dr M.P. Lawton, the Edward and Esther Polisher Research Institute, Philadelphia Geriatric Center, 5301 Old York Road, Philadelphia, PA 19141–2996, USA.
Psychological Corporation (scale distributors), 555 Academic Court, San Antonio, TX 78204–2498, USA.
Psychological Tests – indexes psychological tests by subject. http://www.nzcer.org.nz/tests/psychologicaltest
Quality of Life Questionnaire In the USA, P.O. Box 950, North Tonawanda, New York, 14120–0950; 1–800–456–3003; In Canada, 3770 Victoria Park Ave., Toronto, ON, M2H 3M6. Internationally: +1–416–492–2627. Fax: +1–416–492–3343.
Quality of Well-Being Scale (QWBS) Professor R.M. Kaplan, Division of Health Care Sciences, School of Medicine, University of California, 9500 Gilman Drive, La Jolla, CA 92093–0622, USA.
Rand Health Insurance/Medical Outcomes Study Batteries and Scales; Rand 36-item Health Survey; Rand-12 Rand Health Sciences Program, Distribution Services, 1700 Main Street, PO Box 2138, Santa Monica, CA 90407–2138, USA; http://www.rand.org
Satisfaction with Life Scale Professor E. Diener, Department of Psychology, University of Illinois at Urbana-Champaign, 603 E. Daniel Street, Champaign, IL 61820, USA.
Scales of Psychological Well-Being Dr Carol Ryff, Department of Psychology, Brogden Hall, University of Wisconsin, Madison, WI 53706, USA.
SEIQoL Professor C.A. O’Boyle, Department of Psychology, Medical School, Royal College of Surgeons in Ireland, Mercer Building, Mercer Street, Dublin 2, Ireland.
Short Form-36, Short Form-12, Short Form-8. For details and permission to use: http://www.qualitymetric.com/SF-36. Technical queries and copies of manuals: The Health Assessment Lab., The Health Institute, NEMCH Box 345, 750 Washington Street, Boston, MA 02111, USA.
Short Form-36, Rand-12 (Rand versions) Rand Health Sciences Programme, 1700 Main Street, PO Box 2138, Santa Monica, CA 90407–2138, USA.
Sickness Impact Profile (SIP) Health Policy and Management, School of Hygiene and Public Health, The Johns Hopkins University, 624 North Broadway, Baltimore, MD 21205–1901, USA.
Social Network Scale (SNS) Department of Psychology, University of Illinois at Chicago, P.O. Box 4348, Chicago, IL 60680, USA.
Social Support Questionnaire Professor I.G. Sarason, Department of Psychology, University of Washington, Mail Stop NI-25, Seattle, WA 98195, USA.
Social Support Scale Dr C.D. Sherbourne, Rand, 1700 Main Street, Santa Monica, CA 90407–2138, USA.
State-Trait Anxiety Inventory purchasable from Mind Garden, http://www.mindgarden.com/Assessments/Info/
Stanford Arthritis Center Health Assessment Questionnaire (HAQ) Dr J.F. Fries, Department of Medicine, Room S-10213, Stanford University School of Medicine, Division of Immunology and Rheumatology, Stanford CA 94305, USA. (trex@stanford.edu; http://aramis.stanford.edu)
Tennessee Self-Concept Scale Western Psychological Services, 12031 Wilshire Blvd, Los Angeles, CA 90025–1251, USA.
Tests and Measures in the Social Sciences – alphabetical index and references for tests; http://www.libraries.uta.edu/helen/tests
Western Psychological Services (scale distributors – psychology), 12031 Wilshire Blvd, Los Angeles, CA 90025–1251, USA.
WHOQOL The WHOQOL Group, Division of Mental Health, World Health Organization, 1211 Geneva 27, Switzerland.
Zung’s Self Rating Depression Scale DISTA Products, Eli Lilly Corporate Center, Indianapolis, IN 46285, USA.
REFERENCES
Aaronson, N.K. (1993) The EORTC QLQ-C30: a quality of life instrument for use in international clinical trials in oncology (abstract). Quality of Life Research, 2: 51.
Aaronson, N.K., Acquadro, C., Alonso, J. et al. (1992) International quality of life assessment (IQOLA) project. Quality of Life Research, 1: 349–51.
Abbey, A. and Andrews, F.M. (1986) Modelling the psychological determinants of life quality, in F.M. Andrews (ed.) Research on the Quality of Life. Michigan, Ann Arbor: Survey Research Center, Institute for Social Research, University of Michigan.
Abelin, T., Brzezinski, Z.J. and Carstairs, V.D.L. (eds) (1986) Measurement in Health Promotion and Protection. Copenhagen: World Health Organization, Regional Office for Europe, European Series no. 22.
Abramson, J.H., Terespolsky, L., Brook, J.G. et al. (1965) Cornell Medical Index as a health measure in epidemiological surveys. British Journal of Preventive and Social Medicine, 19: 103–10.
Ada, L., Dean, C.M., Hall, J.M. et al. (2003) A treadmill and overground walking programme improves walking in persons residing in the community after stroke: a placebo-controlled randomised trial. Archives of Physical Medicine and Rehabilitation, 84: 1486–91.
Adair, F.L. (1984) Coopersmith Self-Esteem Inventories, in D.J. Keyser and R.C. Sweetland (eds) Test Critiques, Vol. I. Kansas City, MO: Test Corporation of America.
Adams, B.N. (1967) Interaction theory and the social network. Sociometry, 30: 64–78.
Addington-Hall, J.M., MacDonald, L.D. and Anderson, H.R. (1990) Can the Spitzer Quality of Life Index help to reduce prognostic uncertainty in terminal cancer? British Journal of Cancer, 62: 695–9.
Affleck, J.W., Aitken, R.C.B., Hunter, J. et al. (1988) Rehabilitation status: a measure of medico-social dysfunction. Lancet, i: 230–3.
Agrell, B. and Dehlin, O. (1989) Comparison of six depression rating scales in geriatric stroke patients. Stroke, 20: 1190–4.
Albrecht, G.L. and Devlieger, P.J. (1999) The disability paradox: high quality of life against all odds. Social Science and Medicine, 48: 977–88.
Alden, D., Austin, C. and Sturgeon, R. (1989) A correlation between the Geriatric Depression Scale long and short forms. Journal of Gerontology, 44: 124–5.
Alonso, J., Anto, J.M., Gonzalez, M. et al. (1992) Measurement of a general health status of non-oxygen-dependent chronic obstructive pulmonary disease patients. Medical Care, 30: 125–35 (suppl. 5).
American Psychiatric Association (1987) Diagnostic and Statistical Manual of Mental Disorders, 3rd edn. Washington, DC: APA.
American Psychiatric Association (1994) Diagnostic and Statistical Manual of Mental Disorders, 4th edn. Washington, DC: APA.
American Psychiatric Association (2000) Diagnostic and Statistical Manual of Mental Disorders, 4th revised edn, DSM-IV-TR (text revision). Washington, DC: APA.
American Psychological Association (1974) Standards for Educational and Psychological Tests. Washington, DC: APA.
Anderson, J., Sullivan, F. and Usherwood, T.P. (1990) The Medical Outcomes Study Instruments (MOSI) – use of a new health status measure in Britain. Family Practice, 7: 205–18.
Anderson, J.P., Kaplan, R.M., Coons, S.J. and Schneiderman, L.J. (1998) Comparison of the quality of well-being scale and the SF-36 results among two samples
of ill adults: AIDS and other illnesses. Journal of Clinical Epidemiology, 51: 755–62.
Anderson, R. (1988) The quality of life of stroke patients and their carers, in R. Anderson and M. Bury (eds) Living with Chronic Illness: The Experience of Patients and Their Families. London: Unwin Hyman.
Anderson, R., Davies, J.K., McQueen, D.V. et al. (1989) Health Behaviour Research and Health Promotion. Oxford: Oxford University Press.
Anderson, R.T., Aaronson, N.K. and Wilkin, D. (1993) Critical review of the international assessments of health-related quality of life. Quality of Life Research, 2: 369–95.
Andersson, E. (1993) The Hospital Anxiety and Depression Scale: homogeneity of the subscales. Journal of Social Behavior and Personality, 21: 197–204.
Andrews, F.M. (1974) Social indicators of perceived life quality. Social Indicators Research, 1: 279–99.
Andrews, F.M. (ed.) (1986) Research on the Quality of Life. University of Michigan: Institute for Social Research, Michigan.
Andrews, F.M. and Crandall, R. (1976) The validity of measures of self-reported well-being. Social Indicators Research, 3: 1–19.
Andrews, F.M. and McKennel, A.C. (1980) Measures of self-reported well-being: their affective, cognitive and other components. Social Indicators Research, 18: 127–55.
Andrews, F.M. and Robinson, J.P. (1991) Measures of subjective well-being, in J.P. Robinson, P.R. Shaver and L.S. Wrightsman (eds) Measures of Personality and Social Psychological Attitudes. London: Academic Press, Inc.
Andrews, F.M. and Withey, S.B. (1974) Developing measures of perceived life quality: results from several national surveys. Social Indicators Research, 1: 1–26.
Andrews, F.M. and Withey, S.B. (1976) Social Indicators of Well-being: Americans’ Perceptions of Life Quality. New York: Plenum Press.
Andrews, G., Peters, L. and Teeson, M. (1994) The Measurement of Consumer Outcome in Mental Health: A Report to the National Health Information Strategy Committee. Sydney: Clinical Research Centre for Anxiety Disorders.
Anke, A.G. and Fugl-Meyer, A.R. (2003) Life satisfaction several years after severe multiple trauma – a retrospective investigation. Clinical Rehabilitation, 17: 431–42.
Antonelli Incalzi, R., Cesari, M., Pedone, C. et al. (2003) Construct validity of the abbreviated mental test in older medical in-patients. Dementia and Geriatric Cognitive Disorders, 15: 199–206.
Antonovsky, A. (1987) Unravelling the Mystery of Health: How People Manage Stress and Stay Well. San Francisco: Jossey-Bass Publishers.
Antonovsky, A. (1990) A somewhat personal odyssey in studying the stress process. Stress Medicine, 6: 71–80.
Antonovsky, A. (1993) The structure and properties of the Sense of Coherence Scale. Social Science and Medicine, 36: 725–33.
Antonovsky, A. and Sagy, S. (1986) The development of a sense of coherence and its impact on responses to stress situations. Journal of Social Psychology, 126: 213–25.
Arfwidsson, L., Elia, G., D’Laurell, B. et al. (1974) Can self-rating replace doctors’ rating in evaluating antidepressive treatment? Acta Psychiatrica Scandinavica, 50: 16–22.
Argyle, M., Martin, M. and Crossland, J. (1989) Happiness as a function of personality and social encounters, in J.P. Forgas and J.M. Innes (eds) Recent Advances in Social Psychology: An International Perspective. North Holland: Elsevier Science Publishers.
Arnau, R.C., Meagher, M.W., Norris, M.P. and Bramson, R. (2001) Psychometric evaluation of the Beck Depression Inventory-II with primary care medical patients. Health Psychology, 20: 112–19.
Atchley, R.C. (1976) Selected social and psychological differences between men and women in later life. Journal of Gerontology, 31: 204–11.
Aure, O.F., Nilsen, J.H. and Vasseljen, O. (2003) Manual therapy and exercise therapy in patients with chronic low back pain: a randomised, controlled trial with 1-year follow-up. Spine, 28: 525–31.
Aylard, P.R., Gooding, J.H., McKenna, P.J. and Snaith, R.P. (1987) A validation study of three anxiety and depression self assessment scales. Journal of Psychosomatic Research, 31: 261–8.
Baker, F. and Intagliata, J. (1982) Quality of life in the evaluation of community support systems. Evaluation and Program Planning, 5: 69–79.
Balaswamy, S. and Richardson, V.E. (2001) The cumulative effects of life events, personal and social resources on subjective well-being of elderly widowers. International Journal of Aging and Human Development, 53: 311–27.
Ball, R. and Steer, R.A. (2003) Mean Beck Depression Inventory-II scores of out-patients with dysthymic or recurrent-episode major depressive disorders. Psychological Reports, 93: 507–12.
Baltes, P. and Baltes, M. (1990) Psychological perspectives on successful aging: the model of selective optimisation with compensation, in P. Baltes and M. Baltes (eds) Successful Aging: Perspectives from the Behavioral Sciences. New York: Cambridge University Press.
Banks, M.H. (1983) Validation of the General Health Questionnaire in a young community sample. Psychological Medicine, 13: 349–53.
Bardelli, D. and Saracci, R. (1978) Measuring the quality of life in cancer clinical trials: a sample survey of
published trials, in P. Armitage and D. Bardelli (eds) Methods and Impact of Controlled Therapeutic Trials in Cancer. Geneva: Union Internationale Contre le Cancer. Technical Report Series, No. 36.
Barrera, M. (1980) A method for the assessment of social support networks in community survey research. Connections, 3: 8–13.
Barrera, M. (1981) Social support in the adjustment of pregnant adolescents: assessment issues, in B.H. Gottlieb (ed.) Social Networks and Social Support. Beverly Hills, CA: Sage Publications.
Barrera, M. and Ainlay, S. (1983) The structure of social support: a conceptual and empirical analysis. Journal of Community Psychology, 11: 133–43.
Barrera, M., Baca, L., Christiansen, J. et al. (1985) Informant corroboration of social support network data. Connections, 8: 9–13.
Barry, M.M. (1997) Well-being and life satisfaction as components of quality of life in mental disorders, in H. Katschnig, H. Freeman and N. Sartorius (eds) Quality of Life in Mental Disorders. Chichester: John Wiley and Sons.
Bech, P. (1981) Rating scales for affective disorder: their validity and consistency. Acta Psychiatrica Scandinavica, suppl. 295: 1–101.
Bech, P., Gram, L.F., Dein, E. et al. (1975) Quantitative rating of depressive states. Acta Psychiatrica Scandinavica, 51: 161–70.
Beck, A.T. (1970) Depression: Causes and Treatment. Philadelphia, PA: University of Pennsylvania Press.
Beck, A.T. and Beck, R.W. (1972) Screening depressed patients in family practice: a rapid technique. Postgraduate Medicine, 52: 81–5.
Beck, A.T., Mendelson, M., Mock, J. et al. (1961) Inventory for measuring depression. Archives of General Psychiatry, 4: 561–71.
Beck, A.T., Rial, W.Y. and Rickels, K. (1974) Short form of depression inventory: cross validation. Psychological Reports, 34: 1184–6.
Beck, A.T., Steer, R.A. and Garbin, M.G. (1988) Psychometric properties of the Beck Depression Inventory: twenty-five years of evaluation. Clinical Psychology Review, 8: 77–100.
Beck, A.T., Steer, R.A. and Brown, G.K. (1996a) Manual for the Beck Depression Inventory-II. San Antonio, Texas: Psychological Corporation.
Beck, A.T., Steer, R.A., Ball, R. and Ranieri, W. (1996b) Comparison of Beck Depression Inventories -IA and -II in psychiatric out-patients. Journal of Personality Assessment, 67: 588–97.
Becker, M. (1974) The health belief model and personal health behaviour. Health Education Monographs, 2: 32–73.
Beckie, T.M. and Hayduk, L.A. (1997) Measuring quality of life. Social Indicators Research, 42: 21–39.
Bedford, A. and Deary, I.J. (1997) The personal disturbance scale (DSSI/SAD): development, use and structure. Journal of Personality and Individual Differences, 22: 493–510.
Bedford, A., Foulds, G.A. and Sheffield, B.F. (1976) A new personal disturbance scale (DSSI/SAD). British Journal of Social and Clinical Psychology, 15: 387–94.
Benjamin, S., Decalmer, P. and Haran, D. (1982) Community screening for mental illness: a validity study of the General Health Questionnaire. British Journal of Psychiatry, 140: 174–80.
Benner, P. (1985) Quality of life: a phenomenological perspective on explanation, prediction, and understanding in nursing science. Advances in Nursing Science, Special Issue: Quality of Life, 8: 1–14.
Bentham, J. (1834) Deontology. Oxford: Clarendon Press, reprinted 1983.
Berg, R.L., Hallauer, D.S. and Berk, S.N. (1976) Neglected aspects of the quality of life. Health Services Research, 11: 391–5.
Berglund, G., Liden, A., Hansson, M.G. et al. (2003) Quality of life in patients with multiple endocrine neoplasia type 1 (MEN 1). Familial Cancer, 2: 27–33.
Bergner, M. (1988) Development, testing and use of the Sickness Impact Profile, in S.R. Walker and R.M. Rosser (eds) Quality of Life Assessment and Application. Lancaster: MTP Press.
Bergner, M. (1993) Development, testing and use of the Sickness Impact Profile, in S.R. Walker and R.M. Rosser (eds) Quality of Life Assessment: Key Issues in the 1990s (2nd edn). Dordrecht: Kluwer Academic.
Bergner, M., Bobbitt, R.A., Kressel, S. et al. (1976a) The Sickness Impact Profile: conceptual formulation and methodology for the development of a health status measure. International Journal of Health Services, 6: 393–415.
Bergner, M., Bobbitt, R.A., Pollard, W.E. et al. (1976b) The Sickness Impact Profile: validation of a health status measure. Medical Care, 14: 57–67.
Bergner, M., Bobbitt, R.A., Carter, W.B. et al. (1981) The Sickness Impact Profile: development and final revision of a health status measure. Medical Care, 19: 787–805.
Berkanovic, E., Hurwicz, M.L. and Landsverk, J. (1988) Psychological distress and the decision to seek medical care. Social Science and Medicine, 27: 1215–21.
Berkman, L.F. and Syme, S.L. (1979) Social networks, host resistance and mortality: a nine-year follow-up study of Alameda County residents. American Journal of Epidemiology, 109: 186–204.
Berkman, P.L. (1971) Life stress and psychological well-being: a replication of Langner’s analysis in the Midtown Manhattan study. Journal of Health and Social Behaviour, 12: 35–45.
Bernhard, J., Sullivan, M., Hurny, C. et al. (2001) Clinical relevance of single item quality of life indicators in cancer clinical trials. British Journal of Cancer, 84: 1156–65.
Berrios, G.E. and Bulbena-Villarasa, A. (1990) The Hamilton Depression Scale and the numerical description of the symptoms of depression, in P. Bech and A. Coppen (eds) The Hamilton Scales. New York: Springer-Verlag.
Berwick, D.M., Budman, S., Damico-White, J. et al. (1987) Assessment of psychological morbidity in primary care: explorations with the General Health Questionnaire. Journal of Chronic Diseases, 40: 71S–79S.
Berwick, D.M., Murphy, J.M., Goldman, P.A. et al. (1991) Performance of a five-item mental health screening test. Medical Care, 29: 169–76.
Berzon, R.A., Simeon, G.P., Simpson, R.L. et al. (1995) Quality of life bibliography and indexes: 1993 update. Quality of Life Research, 4: 53–74.
Bigelow, D.A., McFarland, B.H. and Olson, M.M. (1991) Quality of life of community mental health program clients: validating a measure. Community Mental Health Journal, 27: 43–55.
Biggs, J.T., Wylie, L.T. and Ziegler, V.E. (1978) Validity of the Zung self-rating depression scale. British Journal of Psychiatry, 132: 381–5.
Bild, B.K. and Havighurst, R.J. (1976) Life satisfaction. Gerontologist, 16: 70–5.
Billings, A.G. and Moos, R.H. (1981) The role of coping responses and social resources in attenuating the impact of stressful life events. Journal of Behavioural Medicine, 4: 139–57.
Billings, A.G. and Moos, R.H. (1982) Social support and functioning among community and clinical groups: a panel model. Journal of Behavioural Medicine, 5: 295–311.
Björvell, H., Aly, A., Langius, A. and Nordstrom, G. (1994) Indicators of changes in weight and eating behaviour in severely obese patients treated in a nursing behavioural program. International Journal of Obesity, 18: 521–5.
Black, S.E., Blessed, J.A., Edwardson, J.A. et al. (1990) Prevalence rates of dementia in an ageing population: are low rates due to the use of insensitive instruments? Age and Ageing, 19: 84–90.
Blanchflower, D.G. and Oswald, A.J. (2001) Well-being Over Time in Britain and the USA. Warwick: University of Warwick, Research Paper No. 616, Department of Economics.
Blay, S.L., Ramos, L.R. and Mari-Jde, J. (1988) Validity of a Brazilian version of the Older Americans Resources and Services (OARS) mental health screening questionnaire. Journal of the American Geriatrics Society, 36: 687–92.
Blazer, D.G. (1982) Social support and mortality in an elderly community population. American Journal of Epidemiology, 115: 684–94.
Blessed, G., Tomlinson, B.E. and Roth, M. (1968) The association between quantitative measures of dementia and of senile change in the cerebral grey matter of elderly subjects. British Journal of Psychiatry, 114: 797.
Blumenthal, M.D. (1975) Measuring depressive symptomatology in a general population. Archives of General Psychiatry, 32: 971–8.
Bombardier, C., Ware, J., Russell, J. et al. (1986) Auranofin therapy and quality of life in patients with rheumatoid arthritis. American Journal of Medicine, 81: 565–78.
Bond, J. and Carstairs, V. (1982) Services for the Elderly. Scottish Health Service Studies no. 42. Edinburgh: Scottish Home and Health Department.
Bond, J., Gregson, B., Atkinson, A. et al. (1989) Evaluation of Continuing Care Accommodation for Elderly People, vol. 2. The Randomised Controlled Trial of the Experimental NHS Nursing Homes and Conventional Continuing Care Wards in NHS Hospitals. Report no. 38. Newcastle upon Tyne: University of Newcastle upon Tyne Health Care Research Unit.
Bond, M., Bowling, A., McKee, D. et al. (2003) Is age a predictor of access to cardiac services? Journal of Health Services Research and Policy, 8: 40–7.
Borgatta, E.F. and Montgomery, R.J.V. (1987) Critical Issues in Ageing Policy: Linking Research and Values. Beverly Hills, CA: Sage Publications.
Boshier, R. (1968) Self esteem and first names in children. Psychological Reports, 22: 762.
Bowling, A. (1990) The prevalence of psychiatric morbidity among people aged 85 and over living at home. Social Psychiatry and Psychiatric Epidemiology, 25: 132–40.
Bowling, A. (1991) Social support and social networks: their relationship to the successful and unsuccessful survival of elderly people in the community. An analysis of concepts and a review of the evidence. Family Practice, 8: 68–83.
Bowling, A. (1994) Social networks and social support among older people and implications for emotional well-being and psychiatric morbidity. International Review of Psychiatry, 6: 41–58.
Bowling, A. (1995) What things are important in people’s lives? A survey of the public’s judgements to inform scales of health related quality of life. Social Science and Medicine, Special Issue ‘Quality of Life’, 10: 1447–62.
Bowling, A. (1998) Variations in population health status (letter). British Medical Journal, 317: 601.
Bowling, A. (2001) Measuring Disease: A Review of Disease-specific Quality of Life Measurement Scales. 2nd edn. Buckingham: Open University Press.
Bowling, A. (2002) Research Methods in Health. Investigating Health and Health Services. Buckingham: Open University Press.
Bowling, A. and Browne, P. (1991) Social support and emotional well-being among the oldest old living in London. Journal of Gerontology, 46: S20–32.
Bowling, A. and Cartwright, A. (1982) Life after a Death: A Study of the Elderly Widowed. London: Tavistock Press.
Bowling, A. and Charlton, J. (1987) Risk factors for mortality after bereavement: a logistic regression analysis. Journal of the Royal College of General Practitioners, 37: 551–4.
Bowling, A. and Farquhar, M. (1995) Changes in network composition among older people living in Inner London and Essex. Journal of Health and Place, 3: 149–66.
Bowling, A. and Formby, J. (1990) Evaluation of District Health Authority Funded Nursing Homes and Geriatric Wards in City and Hackney. London: Department of Public Health, City and Hackney District Health Authority.
Bowling, A. and Gabriel, Z. (2004) An integrational model of quality of life in older age. A comparison of analytic and lay models of quality of life. Social Indicators Research, 69: 1–36.
Bowling, A. and Grundy, E. (1997) Activities of daily living: changes in functional activity in three samples of elderly and very elderly people. Age and Ageing, 26: 107–14.
Bowling, A. and Grundy, E. (1998) Longitudinal studies of social networks and mortality in later life. Reviews in Clinical Gerontology, 8: 353–61.
Bowling, A. and Windsor, J. (1997) The discriminative power of the Health Status Questionnaire-12 (HSQ-12) in relation to age, sex and longstanding illness: findings from a survey of households in Britain. Journal of Epidemiology and Community Health, 51: 564–73.
Bowling, A. and Windsor, J. (2001) Towards the good life. A population survey of dimensions of quality of life. Journal of Happiness Studies, 2: 55–81.
Bowling, A., Leaver, J. and Hoeckel, T. (1988) The Needs and Circumstances of People Aged 85+ Living at Home in City and Hackney. London: Department of Public Health, City and Hackney Health Authority.
Bowling, A., Edelmann, R., Leaver, J. et al. (1989) Loneliness, mobility, well-being and social support in a sample of over-65-year-olds. Journal of Personality and Individual Differences, 10: 1189–97.
Bowling, A., Farquhar, M., Grundy, E. and Formby, J. (1992) Psychiatric morbidity among people aged 85+ in 1987. A follow-up study at two and a half years: associations with changes in psychiatric morbidity. International Journal of Geriatric Psychiatry, 7: 307–21.
Bowling, A., Farquhar, M., Grundy, E. and Formby, J. (1993) Changes in life satisfaction over a two and a half year period among very elderly people living in London. Social Science and Medicine, 36: 641–55.
Bowling, A., Farquhar, M. and Grundy, E. (1994a) Associations with changes in level of functional ability. Ageing and Society, 14: 53–73.
Bowling, A., Farquhar, M. and Grundy, E. (1994b) Changes in the ability to get outdoors among a community sample of people aged 85+ in 1987: results from a follow-up study in 1990. International Journal of Health Sciences, 5: 13–23.
Bowling, A., Farquhar, M. and Grundy, E. (1996) Associations with changes in life satisfaction among three samples of elderly people living at home. International Journal of Geriatric Psychiatry, 11: 1077–87.
Bowling, A., Bond, M., Jenkinson, C. and Lamping, D. (1999) Short Form-36 (SF-36) Health Survey Questionnaire: which normative data should be used? Comparisons between the norms provided by the Omnibus Survey in Britain, the Health Survey for England and the Oxford Health and Lifestyle Survey. Journal of Public Health Medicine, 21: 255–70.
Bowling, A., Bannister, D., Sutton, S., Evans, O. and Windsor, J. (2002) A multi-dimensional model of QoL in older age. Ageing and Mental Health, 6: 355–71.
Bowling, A., Gabriel, Z., Dykes, J. et al. (2003) Let’s ask them: definitions of quality of life and its enhancement among people aged 65 and over. International Journal of Aging and Human Development, 56: 269–306.
Bradburn, N.M. (1969) The Structure of Psychological Well-being. Chicago, IL: Aldine Publishing.
Bradburn, N.M. and Caplovitz, D. (1965) Reports on Happiness: A Pilot Study of Behaviour Related to Mental Health. Chicago, IL: Aldine Publishing.
Bradley, C., Tod, C., Gorton, T. et al. (1999) The development of an individualized questionnaire measure of perceived impact of diabetes on quality of life: the ADDQoL. Quality of Life Research, 8: 79–91.
Bradlyn, A.S., Harris, C.V., Warner, J.E. et al. (1993) An investigation of the validity of the Quality of Well-Being Scale with pediatric oncology patients. Health Psychology, 12: 246–50.
Bradwell, A.R., Carmal, M.H. and Whitehead, T.P. (1974) Explaining the unexpected: abnormal results of biochemical profile investigations. Lancet, ii: 1071–4.
Brazier, J.E., Harper, R., Jones, N. et al. (1992) Validating the SF-36 health survey questionnaire: a new outcome measure for primary care. British Medical Journal, 305: 160–4.
Brazier, J., Harper, R., Waterhouse, J. et al. (1993a)
REFERENCES 173
Comparison of outcome measures for patients with chronic obstructive pulmonary disease. Paper presented to the Fifth European Health Services Research Conference, Maastricht, December.
Brazier, J.E., Jones, N. and Kind, P. (1993b) Testing the validity of the EuroQol and comparing it with the SF-36 health survey questionnaire. Quality of Life Research, 2: 169–80.
Brazier, J.E., Roberts, J. and Deverill, M. (2002) The estimation of a preference-based measure of health from the SF-36. Journal of Health Economics, 21: 271–92.
Bridgwood, A. (2000) People Aged 65 and Over. Results of an independent study carried out on behalf of the Department of Health as part of the 1998 General Household Survey. London: Office for National Statistics.
Brink, T.L., Curran, P., Dorr, M.L. et al. (1983) Geriatric Depression Scale reliability: order, examiner and reminiscence effects. Clinical Gerontology, 2: 57–60.
Brissette, I., Cohen, S. and Seeman, T.E. (2000) Measuring social integration and social networks, in S. Cohen, L.G. Underwood and B.H. Gottlieb (eds) Social Support Measurement and Intervention. A Guide for Health and Social Scientists. Oxford: Oxford University Press.
Brissette, I., Scheier, M.F. and Carver, C.S. (2002) The role of optimism in social network development, coping, and psychological adjustment during a life transition. Journal of Personality and Social Psychology, 82: 102–11.
Brissette, I., Leventhal, H. and Leventhal, E.A. (2003) Observer ratings of health and sickness: can other people tell us anything about our health that we don’t already know? Health Psychology, 22: 471–8.
Brodman, K., Erdmann, A.J., Jr and Wolff, H.G. (1949) Cornell Medical Index Health Questionnaire. New York: Cornell University Medical College.
Brodman, K., Erdmann, A.J., Jr, Lorge, I. and Wolff, H.G. (1951) The Cornell Medical Index-Health Questionnaire II. As a diagnostic instrument. Journal of the American Medical Association, 145: 152–7.
Brodman, K., Erdmann, A.J., Jr, Lorge, I. et al. (1953) The Cornell Medical Index-Health Questionnaire VI: the relation of patients’ complaints to age, sex, race and education. Journal of Gerontology, 8: 339–42.
Brodman, K., Erdmann, A.J., Jr, Lorge, I. et al. (1954a) The Cornell Medical Index-Health Questionnaire VII: the prediction of psychosomatic and psychiatric disabilities in army training. American Journal of Psychiatry, 111: 37–40.
Brodman, K., Deutschberger, J., Erdmann, A.J. et al. (1954b) Prediction of adequacy for medical service. U.S. Armed Forces Medical Journal, 5: 1802–8.
Brodman, K., Van Woerkom, A.J., Jr and Goldstein, L.S. (1959) Interpretation of symptoms with a data-processing machine. Archives of Internal Medicine, 103: 776–82.
Brodman, K., Erdmann, A.J., Jr, Wolff, H.G. and Miskovitz, P.F. (1986) Cornell Medical Index Health Questionnaire. 1986 Revision. New York: Cornell University Medical College.
Bronfort, G. and Bouter, L.M. (1999) Responsiveness of general health status in chronic low back pain: a comparison of the COOP charts and the SF-36. Pain, 83: 201–9.
Brook, R.H., Ware, J.E., Davies-Avery, A. et al. (1979a) Overview of adult health status measures fielded in Rand’s health insurance study. Medical Care, 17 (supplement): 1–131.
Brook, R.H., Ware, J.E., Davies-Avery, A. et al. (1979b) Conceptualization and Measurement of Health for Adults in the Health Insurance Study, vol. VIII, Overview. Santa Monica, CA: Rand Corporation, R-1987/8-HEW.
Brookings, J.B. and Bolton, B. (1988) Confirmatory factor analysis of the Interpersonal Support Evaluation List. American Journal of Community Psychology, 16: 137–47.
Brooks, R. (1996) EuroQol: the current state of play. Health Policy, 37: 53–72.
Brorsson, B. and Asberg, K.H. (1984) Katz Index of Independence in ADL: reliability and validity in short-term care. Scandinavian Journal of Rehabilitative Medicine, 16: 125–32.
Brown, B., Bhrolchain, M. and Harris, T. (1975) Social class and psychiatric disturbance among women in an urban population. Sociology, 9: 225–54.
Brown, G.L. and Zung, W.W.K. (1972) Depression scales: self or physician rating? A validation of certain clinically observable phenomena. Comprehensive Psychiatry, 13: 361–7.
Brown, J.H., Lewis, M.D., Kazis, E. et al. (1984) The dimensions of health outcomes: a cross-validated examination of health status measurement. American Journal of Public Health, 74: 159–61.
Browne, J.P. (1999) Selected methods for assessing individual quality of life, in C.R.B. Joyce, H.M. McGee and C.A. O’Boyle (eds) Individual Quality of Life. Approaches to Conceptualisation and Assessment. The Netherlands: Harwood Academic Publishers.
Browne, J.P., O’Boyle, C.A., McGee, H.M. et al. (1994) Individual quality of life in the healthy elderly. Quality of Life Research, 3: 235–44.
Browne, J.P., O’Boyle, C.A., McGee, H.M. et al. (1997) Development of a direct weighting procedure for quality of life domains. Quality of Life Research, 6: 301–9.
Bruce, B. and Fries, J.F. (2003a) The Stanford Health Assessment Questionnaire (HAQ): a review of its
history, issues, progress and documentation. Journal of Rheumatology, 30: 167–78.
Bruce, B. and Fries, J.F. (2003b) The Stanford Health Assessment Questionnaire: dimensions and practical applications. Health and Quality of Life Outcomes (BioMed Central Ltd), 1: 20.
Bucquet, D., Condom, S. and Ritchie, K. (1990) The French version of the Nottingham Health Profile: a comparison of item weights with those of the source version. Social Science and Medicine, 30: 829–35.
Bullinger, M. (1995) German translation and psychometric testing of the SF-36 Health Survey: preliminary results from the IQOLA project. Social Science and Medicine (Special Issue ‘Quality of Life’), 10: 1359–66.
Burckhardt, C.S., Woods, S.L., Schultz, A.A. et al. (1989) Quality of life of adults with chronic illness: a psychometric study. Research on Nursing and Health, 12: 347–54.
Burnam, M.A., Wells, K.B., Leake, B. and Landsverk, J. (1988) Development of a brief screening instrument for detecting depressive disorders. Medical Care, 26: 775–89.
Burström, K., Johannesson, M., Diderichsen, F. et al. (2001) Swedish population health-related quality of life results using the EQ-5D. Quality of Life Research, 10: 621–35.
Bush, J.W. (1984) General health policy model: Quality of well-being (QWB) scale, in N.K. Wenger, M.E. Mattson, C.D. Furberg et al. (eds) Assessment of Quality of Life in Clinical Trials of Cardiovascular Therapies. New York: Le Jacq.
Butow, P., Coates, A., Dunn, S., Bernhard, J. and Hurny, C. (1991) On the receiving end IV: Validation of quality of life indicators. Annals of Oncology, 2: 597–603.
Buxton, M. (1983) The economics of heart transplant programmes: measuring the benefits, in G. Teeling Smith (ed.) Measuring the Social Benefits of Medicine. London: Office of Health Economics.
Buxton, M., Acheson, R.M., Caine, N. et al. (1985) Costs and Benefits of the Heart Transplantation Programmes at Harefield and Papworth Hospitals. London: Her Majesty’s Stationery Office.
Byrne, D.G. (1978) Cluster analysis applied to self-reported depressive symptomatology. Acta Psychiatrica Scandinavica, 57: 1–10.
Caine, N., Sharples, L.D., English, T.A.H. and Wallwork, J. (1990) Prospective study comparing quality of life before and after heart transplantation. Transplantation Proceedings, 22: 1437–9.
Caine, N., Harrison, S.C.W., Sharples, L.D. and Wallwork, J. (1991) Prospective study of quality of life before and after coronary artery bypass grafting. British Medical Journal, 302: 511–16.
Cairl, R.E., Pfeiffer, E., Keller, D.M. et al. (1983) An evaluation of the reliability and validity of the functional assessment inventory. Journal of the American Geriatrics Society, 31: 607–12.
Caldwell, J.R. (1985) Family Environment Scale, in D.J. Keyser and R.C. Sweetland (eds) Test Critiques, Vol. II. Kansas City, MO: Test Corporation of America.
Caldwell, R.A. and Reinhart, M.A. (1988) The relationship of personality to individual differences in the use of type and source of social support. Journal of Social and Clinical Psychology, 6: 140–6.
Calman, K.C. (1984) Quality of life in cancer patients – a hypothesis. Journal of Medical Ethics, 10: 124–7.
Camilleri-Brennan, J., Ruta, D.A. and Steele, R.J. (2002) Patient generated index: new instrument for measuring quality of life in patients with rectal cancer. World Journal of Surgery, 26: 1354–9.
Campbell, A. (1976) Subjective measures of well-being. American Psychologist, 31: 117–24.
Campbell, A. (1981) The Sense of Well-being in America. New York: McGraw-Hill.
Campbell, A., Converse, P.E. and Rogers, W.L. (1976) The Quality of American Life. New York: Russell Sage Foundation.
Campbell, D.T. and Fiske, D.W. (1959) Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56: 81–105.
Campbell, P.B. (1967) School and self concept. Educational Leadership, 24: 510–15.
Cantril, H. (1965) The Pattern of Human Concerns. New Brunswick, NJ: Rutgers University Press.
Caplan, G. (1974) Support Systems and Community Mental Health. New York: Behavioral Publications.
Carp, F.M. (1977) What questions are we asking of whom? in C.N. Nydegger (ed.) Measuring Morale: A Guide to Effective Assessment. Washington, DC: Gerontology Society.
Carr-Hill, R. (1989) Assumption of the QALY procedure. Social Science and Medicine, 29: 469–77.
Carr-Hill, R. (1992) A second opinion: Health related quality of life measurement – Euro style. Health Policy, 20: 321–8.
Carr-Hill, R. and Morris, J. (1991) Current practice in obtaining the ‘Q’ in QALYs – a cautionary note. British Medical Journal, 303: 699–701.
Carroll, B.J., Fielding, J.M. and Blash, T.G. (1973) Depression rating scales: a critical review. Archives of General Psychiatry, 28: 361–6.
Carroll, B.T., Kathol, R.G., Noyes, R. et al. (1993) Screening for depression and anxiety in cancer patients using the Hospital Anxiety and Depression Scale. Journal of General Hospital Psychiatry, 15: 69–74.
Carter, W., Bobbitt, R.A., Bergner, M. et al. (1976) Validation of an interval scaling: the Sickness Impact Profile. Health Services Research, 11: 516–28.
Cartwright, A. and Anderson, R. (1981) General Practice Revisited: A Second Study of Patients and Their Doctors. London: Tavistock Press.
Casellas, F., Lopez-Vivancos, J., Badia, X. et al. (2000) Impact of surgery for Crohn’s disease on health-related quality of life. American Journal of Gastroenterology, 95: 177–82.
Cassel, J. (1976) The contribution of the social environment to host resistance. American Journal of Epidemiology, 104: 107–23.
Cavanaugh, S. (1983) The prevalence of emotional and cognitive dysfunction in a general medical population: using the MMSE, GHQ and BDI. General Hospital Psychiatry, 5: 15–24.
Celiker, R. and Borman, P. (2001) Fibromyalgia versus rheumatoid arthritis: a comparison of psychological disturbance and life satisfaction. Journal of Musculoskeletal Pain, 9: 35–45.
Challis, D., Mozley, C.G., Sutcliffe, C. et al. (2000) Dependency in older people recently admitted to care homes. Age and Ageing, 29: 255–60.
Chambers, L.W. (1984) The McMaster Health Index Questionnaire, in N.K. Wenger, M.E. Mattson, C.D. Furberg et al. (eds) Assessment of Quality of Life in Clinical Trials of Cardiovascular Therapies. New York: Le Jacq.
Chambers, L.W. (1993) The McMaster Health Index Questionnaire: an update, in S.R. Walker and R.M. Rosser (eds) Quality of Life Assessment: Key Issues in the 1990s. The Netherlands: Kluwer Academic Publishers.
Chambers, L.W. (1998) McMaster Health Index Questionnaire. Emotional Function Index scoring method, in S. Salek, Compendium on quality of life instruments. Chichester: Wiley and Sons.
Chambers, L.W., Sackett, D.L., Goldsmith, C.H. et al. (1976) Development and application of an index of social function. Health Services Research, 11: 430–41.
Chambers, L.W., MacDonald, L.A., Tugwell, P. et al. (1982) The McMaster Health Index Questionnaire as a measure of quality of life for patients with rheumatoid disease. Journal of Rheumatology, 9: 780–4.
Chambers, L.W., Haight, M., Norman, G. et al. (1987) Sensitivity to change and the effect of mode of administration on health status measurement. Medical Care, 25: 470–9.
Chaplin, W.F. (1984) State–trait anxiety inventory, in D.J. Keyser and R.C. Sweetland (eds) Test Critiques, Vol. 1. Kansas City, MO: Test Corporation of America.
Chapman, P.L. and Mullis, A.K. (2002) Readdressing gender bias in the Coopersmith Self-Esteem Inventory – Short Form. Journal of Genetic Psychology, 163: 403–9.
Charles, S.T., Reynolds, C.A. and Gatz, M. (2001) Age-related differences and change in positive and negative affect over 23 years. Journal of Personality and Social Psychology, 80: 136–51.
Charlton, J.R.H., Patrick, D.L. and Peach, H. (1983) Use of multivariate measures of disability in health surveys. Journal of Epidemiology and Community Health, 37: 296–304.
Chaturvedi, S.K. (1990) Asian patients and the HAD scale. British Journal of Psychiatry, 156: 133.
Cheng, H. (2003) Personality, self-esteem, and demographic predictions of happiness and depression. Journal of Personality and Individual Differences, 34: 921–42.
Cherlin, A. and Reeder, L.G. (1975) The dimensions of psychological well-being: a critical review. Sociological Methods Research, 4: 189–214.
Chiang, C.L. (1965) An Index of Health: Mathematical Models. Washington, DC: US Government Printing Office, PHS Publication no. 1000, Series 2, no. 5.
Chou, K.L. and Chi, I. (1999) Determinants of life satisfaction among Chinese older adults: a longitudinal study. Ageing and Mental Health, 3: 327–34.
Chou, K.L. and Chi, I. (2001) Stressful life events and depressive symptoms: social support and sense of control as mediators or moderators? International Journal of Aging and Human Development, 52: 155–71.
Clark, K.K., Borman, C.A., Cropanzano, R.S. and James, K. (1995) Validation evidence for three coping measures. Journal of Personality Assessment, 65: 434–55.
Clarke, P.J., Marshall, V.W., Ryff, C.D. and Wheaton, B. (2001) Measuring psychological well-being in the Canadian Study of Health and Aging. International Psychogeriatrics, 13: 79–90.
Cleary, P.D., Goldberg, D.M., Kessler, L.G. et al. (1982) Screening for mental disorder among primary care physicians: usefulness of the General Health Questionnaire. Archives of General Psychiatry, 39: 837–40.
Clinton, M., Lunney, P., Edwards, H. et al. (1998) Perceived social support and community adaptation in schizophrenia. Journal of Advanced Nursing, 27: 955–65.
Coates, A., Gebski, V., Stat, M. et al. (1987) Improving the quality of life during chemotherapy for advanced breast cancer. New England Journal of Medicine, 317: 1490–5.
Coates, A.K. and Wilkin, D. (1992) Comparing the Nottingham Health Profile with the Dartmouth COOP Charts, in J.H.G. Scholten (ed.) Functional Status Assessment in Family Practice. Lelystad: Meditekst.
Cobb, S. (1976) Social support as a moderator of life stress. Psychosomatic Medicine, 38: 300–14.
Cochrane, A.L. and Holland, W.W. (1971) Validation
of screening procedures. British Medical Bulletin, 27: 3–8.
Cockerham, W.C. (1995) Medical Sociology (6th edn). Englewood Cliffs, NJ: Prentice Hall.
Coen, R., O’Mahony, D., O’Boyle, C.A. et al. (1993) Measuring the quality of life of dementia patients using the Schedule for the Evaluation of Individualised Quality of Life. Irish Journal of Psychology, 14: 154–63.
Coen, R.F., Swanwick, G.R., O’Boyle, C.A. and Coakley, D. (1997) Behaviour disturbance and other predictors of carer burden in Alzheimer’s disease. International Journal of Geriatric Psychiatry, 12: 331–6.
Coen, R.F., O’Boyle, C.A., Swanwick, G.R.J. and Coakley, D. (1999) Measuring the impact on relatives of caring for people with Alzheimer’s disease: quality of life, burden and well-being. Psychology and Health, 14: 253–61.
Cohen, S., Mermelstein, R., Kamarck, T. et al. (1985) Measuring the functional components of social support, in I.G. Sarason and B.R. Sarason (eds) Social Support: Theory, Research and Applications. Boston, MA: Martinus Nijhoff.
Cohen, C.I., Teresi, J. and Holmes, D. (1987) Social networks and mortality in an inner-city elderly population. International Journal of Aging and Human Development, 24: 257–69.
Cohen, S., Underwood, L.G. and Gottlieb, B.H. (2000a) Social relationships and health, in Social Support Measurement and Intervention. A Guide for Health and Social Scientists. New York: Oxford University Press.
Cohen, S., Underwood, L.G. and Gottlieb, B.H. (2000b) Social Support Measurement and Intervention. A Guide for Health and Social Scientists. New York: Oxford University Press.
Coleman, P. (1984) Assessing self-esteem and its sources in elderly people. Ageing and Society, 4: 117–35.
Collen, F.M., Wade, D.T. and Bradshaw, C.M. (1990) Mobility after stroke: reliability of measures of impairment and disability. International Disability Studies, 12: 6–9.
Collin, C., Wade, D.T., Davies, D. et al. (1988) The Barthel ADL Index: a reliability study. International Disability Studies, 10: 61–3.
Conner, K.A., Powers, E. and Bultena, G.L. (1979) Social interaction and life satisfaction: an empirical assessment of late-life patterns. Journal of Gerontology, 34: 116–21.
Convery, F.R., Minteer, M.A., Amiel, D. and Connett, K.L. (1977) Polyarticular disability: a functional assessment. Archives of Physical Medicine and Rehabilitation, 58: 494–9.
Cook, E.A. (1998) Effects of reminiscence on life satisfaction of elderly female nursing home residents. Health Care for Women International, 19: 109–18.
Cooper, C.L. and Kasl, S.V. (1995) Research Methods for Stress and Health Psychology. Chichester: John Wiley.
Cooper, K., Arber, S., Fee, L. and Ginn, J. (1999) The Influence of Social Support and Social Capital on Health: A Review and Analysis of British Data. London: Health Education Authority.
Cooper, P.J. and Fairburn, C.G. (1986) The depressive symptoms of bulimia nervosa. British Journal of Psychiatry, 148: 268–74.
Coopersmith, S. (1967) The Antecedents of Self-esteem. San Francisco, CA: W.H. Freeman. Reprinted in 1981.
Coopersmith, S. (1975) Developing motivation in young children. San Francisco, CA: Albion Publishing.
Coopersmith, S. (1981a) The Antecedents of Self-Esteem. Palo Alto, CA: Consulting Psychologists Press.
Coopersmith, S. (1981b) Self-esteem Inventories. Palo Alto, CA: Consulting Psychologists Press.
Copeland, J.R.M. (1990) Suitable instruments for detecting dementia in community samples. Age and Ageing, 19: 81–3.
Copeland, J.R.M. and Gurland, B.J. (1978) Evaluation of diagnostic methods: an international comparison, in A.D. Isaacs and F. Post (eds) Studies in Geriatric Psychiatry. Chichester: John Wiley.
Copeland, J.R.M., Kelleher, M.J., Kellet, J.M. et al. (1976) A semi-structured clinical interview for the assessment of diagnosis and mental state in the elderly: the Geriatric Mental State: Schedule 1. Development and reliability. Psychological Medicine, 6: 439–49.
Copeland, J.R.M., Dewey, M.E. and Griffiths Jones, H.M. (1986) A computerized psychiatric diagnostic system and case nomenclature for elderly subjects: GMS and AGECAT. Psychological Medicine, 16: 89–99.
Copeland, J.R.M., Dewey, M.E., Wood, N. et al. (1987a) Range of mental illness among the elderly in the community: prevalence in Liverpool using the GMS-AGECAT package. British Journal of Psychiatry, 150: 815–23.
Copeland, J.R.M., Gurland, B.J., Dewey, M.E. et al. (1987b) The distribution of dementia, depression and neurosis in elderly men and women in an urban community: assessed using the GMS-AGECAT package. International Journal of Geriatric Psychiatry, 2: 177–84.
Copeland, J.R.M., Dewey, M.E., Henderson, A.S. et al. (1988) The Geriatric Mental State (GMS) used in the community: replication studies of the computerized diagnoses AGECAT. Psychological Medicine, 18: 219–23.
Copeland, J.R., Prince, M., Wilson, K.C. et al. (2002) The Geriatric Mental State Examination in the 21st century. International Journal of Geriatric Psychiatry, 17: 729–32.
Corcoran, J., Franklin, C. and Bennett, P. (2000) The use of the Social Support Behaviors Scale with adolescents. Research on Social Work Practice, 8: 302–14.
Cornoni-Huntley, J.C., Foley, D.J., White, L.R. et al. (1985) Epidemiology of disability in the oldest old: methodologic issues and preliminary findings. Milbank Memorial Fund Quarterly, Health and Society, 63: 350–76.
Costa, P.T. and McCrae, R.R. (1977) Psychiatric symptom dimensions in the Cornell Medical Index among normal adult males. Journal of Clinical Psychology, 33: 941–6.
Costa, P.T. and McCrae, R.R. (1984) Personality as a lifelong determinant of well-being, in C. Lalatesta and C. Izard (eds) Affective process in development and aging. Beverly Hills, CA: Sage.
Costa, P.T., Zonderman, A.B., McCrae, R.R. et al. (1987) Longitudinal analysis of psychological well-being in national samples: stability of mean levels. Journal of Gerontology, 42: 50–5.
Coulthard, M., Walker, A. and Morgan, A. (2001) People’s perceptions of their neighbourhood and community involvement. Results from the social capital module of the General Household Survey 2000. London: The Stationery Office.
Cox, B.D., Blaxter, M., Buckle, A.L.J. et al. (1987) The Health and Lifestyle Survey. London: Health Promotion Research Trust.
Cox, B.D., Huppert, F.A. and Whichelow, M.J. (1993) The Health and Lifestyle Survey: Seven Years On. London: Health Promotion Research Trust.
Crandall, R.C. (1973) The measurement of self-esteem and related constructs, in J. Robinson and P. Shaver (eds) Measures of Social Psychological Attitudes. Ann Arbor, MI: Institute for Social Research.
Craven, P. and Wellman, B. (1974) The network city, in M.P. Effrat (ed.) The Community: Approaches and Applications. New York: Free Press.
Crawford-Little, J. and McPhail, N.I. (1973) Measures of depressive mood at monthly intervals. British Journal of Psychiatry, 122: 447–52.
Cronbach, L.J. (1951) Coefficient alpha and the internal structure of tests. Psychometrika, 16: 297–334.
Crossley, T.F. and Kennedy, S. (2000) The stability of self-assessed health status. Social and Economic Dimensions of an Aging Population. SEDAP Research Paper no. 26. Canberra, ACT: SEDAP, Australian National University.
Crowne, D.P. and Marlowe, D. (1964) The Approval Motive: Studies in Evaluation Dependence. New York: John Wiley and Sons.
Cuffel, B.J. and Akamatsu, T.J. (1989) The structure of loneliness: a factor-analytic investigation. Cognitive Therapy and Research, 13: 459–74.
Dalton, D.S., Cruickshanks, K.J., Klein, B.E. et al. (2003) The impact of hearing loss on quality of life in older adults. The Gerontologist, 43: 661–8.
Dantas, R.A., Motzer, S.A. and Ciol, M.A. (2002) The relationship between quality of life, sense of coherence and self-esteem in persons after coronary artery bypass graft surgery. International Journal of Nursing Studies, 39: 745–55.
Davidson, I.A., Dewey, M. and Copeland, J.R.M. (1988) The relationship between mortality and mental disorder: evidence from the Liverpool longitudinal study. International Journal of Geriatric Psychiatry, 3: 95–8.
Davies, A.R. and Ware, J.E. (1981) Measuring Health Perceptions in the Health Insurance Program. Santa Monica, CA: Rand Corporation, R-2711-HHS.
Davies, B., Burrows, G. and Poynton, C.A. (1975) Comparative study of four depression rating scales. Australian and New Zealand Journal of Psychiatry, 9: 21–4.
Davies, A.R., Sherbourne, C.D., Peterson, J.R. and Ware, J.E. (1988) Scoring Manual: Adult Health Status and Patient Satisfaction Measures Used in RAND’s Health Insurance Experiment. Publication No. N-2190-HHS. Santa Monica, CA: Rand Corporation.
Dean, K., Holst, E., Kremer, S. et al. (1994) The measurement issues in research on social support and health. Journal of Epidemiology and Community Health, 48: 201–6.
de Bruin, A.F., de Witte, L.P., Stevens, F.C. and Diederiks, J.P. (1992) Sickness Impact Profile: the state of the art of a generic functional status measure. Social Science and Medicine, 8: 1003–14.
de Bruin, A.F., Diederiks, J.P.M., de Witte, L.P. and Stevens, F.C.J. (1993) The first testing of the SIP-68, a short generic version of the Sickness Impact Profile. Paper presented to the Fifth European Health Services Research Conference, Maastricht, December.
de Girolamo, G., Rucci, P., Scocco, P. et al. (2000) Quality of life assessment: validation of the Italian version of the WHOQOL-Brief. Epidemiologia e Psichiatria Sociale, 9: 45–55.
de Haan, R., Limburg, M., Schuling, J. et al. (1993) Clinimetric evaluation of the Barthel Index, a measure of limitations in daily activities. Nederlands Tijdschrift voor Geneeskunde, 137: 917–21.
de Joode, E.W., van Meeteren, N.L., van den Berg, H.M. et al. (2001) Validity of health status measurement with the Dutch Arthritis Impact Measurement Scale 2 in individuals with severe haemophilia. Haemophilia, 7: 190–7.
de Leo, D., Diekstra, R.F.W., Lonnqvist, J. et al. (1998a) LEIPAD, an internationally applicable instrument to assess quality of life in the elderly. Behavioural Medicine, 24: 17–27.
de Leo, D., Diekstra, R.F.W., Lonnqvist, J. et al. (1998b) LEIPAD Questionnaire. Compendium of Quality of Life Instruments. Chichester, West Sussex: John Wiley and Sons.
Demetri, G.D., Gabrilove, J.L., Blasi, M.V. et al. (2002) Benefits of epoetin alfa in anemic breast cancer patients receiving chemotherapy. Clinical Breast Cancer, 3: 45–51.
Demyttenaere, K. and De Fruyt, J. (2003) Getting what you asked for: on the selectivity of depression rating scales. Psychotherapy and Psychosomatics, 72: 61–70.
Dennerstein, L., Dudley, E., Guthrie, J. and Barrett-Connor, E. (2000) Life satisfaction, symptoms, and the menopausal transition. Medscape Women’s Health, 5: E4.
Denniston, O.L. and Jette, A.M. (1980) A functional status assessment instrument: validation in an elderly population. Health Services Research, 15: 21–4.
Dewey, M.E. and Copeland, J.R.M. (1986) Computerized psychiatric diagnosis in the elderly: AGECAT. Journal of Microcomputer Applications, 9: 135–40.
Deyo, R.A. (1993) Measuring the quality of life of patients with rheumatoid arthritis, in S.R. Walker and R.M. Rosser (eds) Quality of Life Assessment: Key Issues in the 1990s. Dordrecht: Kluwer Academic.
Deyo, R.A., Inui, T.S., Leininger, J.D. et al. (1982) Physical and psychological functions in rheumatoid arthritis: clinical use of a self-administered instrument. Archives of Internal Medicine, 142: 879–82.
Deyo, R.A., Inui, T.S., Leininger, J.D. et al. (1983) Measuring functional outcomes in chronic disease: a comparison of traditional scales and a self-administered health status questionnaire in patients with rheumatoid arthritis. Medical Care, 21: 180–92.
Dias, R.C., Dias, J.M. and Ramos, L.R. (2003) Impact of an exercise and walking protocol on quality of life in elderly people with OA of the knee. Physiotherapy Research International, 8: 121–30.
Diener, E. and Suh, E. (1997) Measuring quality of life: economic, social and subjective indicators. Social Indicators Research, 40: 189–216.
Diener, E., Emmons, R.A., Larsen, R.J. and Griffin, S. (1985) The Satisfaction with Life Scale. Journal of Personality Assessment, 49: 71–5.
Diener, E., Sandvik, E., Pavot, W. and Gallagher, D. (1991) Response artefacts in the measurement of subjective well-being. Social Indicators Research, 24: 35–56.
Diener, E., Oishi, S. and Lucas, R.E. (2003) Personality, culture, and subjective well-being: emotional and cognitive evaluations of life. Annual Review of Psychology, 54: 403–25.
Doble, S.E., Fisk, J.D., MacPherson, K.M. et al. (1997) Measuring functional competence in older persons with Alzheimer’s disease. International Psychogeriatrics, 9: 25–38.
Dobson, C., Powers, E.A., Keith, P.M. et al. (1979) Anomie, self-esteem, and life satisfaction: interrelationships among three scales of well-being. Journal of Gerontology, 34: 569–72.
Dolan, P. (1997) Modelling valuations for EuroQol health states. Medical Care, 35: 1095–108.
Dolan, P., Gudex, C., Kind, P. and Williams, A. (1996) The time-trade-off method: results from a general population study. Health Economics, 5: 141–54.
Donald, C.A. and Ware, J.E. (1982) The Quantification of Social Contacts and Resources. Santa Monica, CA: Rand Corporation, R-2937-HHS.
Donald, C.A., Ware, J.E., Brook, R.H. et al. (1978) Conceptualization and Measurement of Health for Adults in the Health Insurance Study, vol. IV, Social Health. Santa Monica, CA: Rand Corporation, R-1987/4-HEW.
Dorman, P.J., Slattery, J., Farrell, B. et al. (1997a) A randomised comparison of the EuroQol and Short Form-36 after stroke. British Medical Journal, 315: 416.
Dorman, P.J., Waddell, F., Slattery, J. et al. (1997b) Are proxy assessments of health status after stroke with the EuroQol questionnaire feasible, accurate, and unbiased? Stroke, 28: 1883–7.
Dozois, D.J. (2003) The psychometric characteristics of the Hamilton Depression Inventory. Journal of Personality Assessment, 80: 31–40.
Dubuisson, D. and Melzack, R. (1976) Classification of clinical pain descriptions by multiple group discriminant analysis. Experimental Neurology, 51: 480–7.
Dunnell, K. and Cartwright, A. (1972) Medicine Takers, Prescribers and Hoarders. London: Routledge and Kegan Paul.
Dupuy, H.J. (1973) Developmental Rationale, Substantive, Derivatable, and Conceptual Relevance of the General Well-Being Schedule. Fairfax, VA: National Center for Health Statistics.
Dupuy, H.J. (1974) Utility of the National Center for Health Statistics’ General Well-Being Schedule in the Assessment of Self-Representations of Subjective Well-Being and Distress, in Report of the National Conference on Evaluation in Alcohol, Drug Abuse and Mental Health Programs. Washington, DC: ADAMHA.
Dupuy, H.J. (1978) Self-representations of general psychological well-being of American adults. Paper presented at the American Public Health Association Meeting, Los Angeles, California, 17 October.
Dupuy, H.J. (1984) The Psychological General Well-being Index, in N.K. Wenger, M.E. Mattson, C.D. Furberg et al. (eds) Assessment of Quality of Life in Clinical Trials of Cardiovascular Therapies. New York: Le Jacq.
Durkheim, E. (1895) The Rules of Sociological Method (ed. S. Lukes, trans. W.D. Halls 1938). New York: Free Press. Reprinted in 1982.
Durkheim, E. (1897) Suicide: A Study in Sociology (ed. G. Simpson, trans. J.A. Spaulding and G. Simpson 1951). New York: Free Press. Reprinted in 1997.
Earl-Slater, A. (2002) The Handbook of Clinical Trials and Other Research. Oxford: Radcliffe Medical Press.
Ebmeier, K.P., Beson, J.A.O., Eagles, J.N. et al. (1988) Continuing care of the demented elderly in Inverurie. Health Bulletin, 46: 32–41.
REFERENCES 179
Edwards, D.W., Yarvis, R.M., Mueller, D.P. et al. (1978) Test-taking and the stability of adjustment scales: can we assess patient deterioration? Evaluation Q, 2: 275–91.
Edwards, J.N. and Klemmack, D.L. (1973) Correlates of life satisfaction: a re-examination. Journal of Gerontology, 28: 479–502.
Eisenberg, E., Damunni, G., Hoffer, E. et al. (2003) Lamotrigine for intractable sciatica: correlation between dose plasma concentration and analgesia. European Journal of Pain, 7: 485–91.
Emmons, R.A. and Diener, E. (1985) Personality correlates of subjective well-being. Personality and Social Psychology Bulletin, 11: 89–97.
Eskin, M. (1993) Swedish translations of the Suicide Probability Scale, Perceived Social Support from Family and Friends Scales, and the Scale for Interpersonal Behavior: a reliability analysis. Scandinavian Journal of Psychology, 34: 276–81.
Espwall, M. and Olofsson, N. (2002) Social networks of women with undefined musculoskeletal disorder. Social Work in Health Care, 36: 77–91.
EuroQol Group (1990) EuroQol – a new facility for the measurement of health-related quality of life. Health Policy, 16: 199–208.
Evans, D. and Cope, W. (1994) The Quality of Life Questionnaire–D. Complete Kit. North Tonawanda, New York: Multi-Health Systems.
Evans, G., Hughes, B.C. and Wilkin, D. (1981) The Management of Mental and Physical Impairment in Non-specialist Residential Homes for the Elderly, Research Report no. 4. Manchester: Research Section, Psychiatric Unit, University Hospital of South Manchester.
Evans, K., Tyrer, P., Catalan, J. et al. (1999) Manual assisted cognitive-behavior therapy (MACT): a randomised controlled trial of a brief intervention with bibliotherapy in the treatment of recurrent deliberate self-harm. Psychological Medicine, 29: 19–25.
Evans, R.W., Manninen, D.L., Overcast, T.D. et al. (1984) The National Heart Transplantation Study: Final Report. Seattle, WA: Battelle Human Affairs Research Centre.
Fallowfield, L.J., Baum, M. and Maguire, G.P. (1987) Do psychological studies upset patients? Journal of the Royal Society of Medicine, 80: 59.
Fanshel, S. and Bush, J.W. (1970) A health status index and its applications to health services outcomes. Operations Research, 18: 1021–65.
Farquhar, M. (1995a) Definitions of quality of life: a taxonomy. Journal of Advanced Nursing, 22: 502–8.
Farquhar, M. (1995b) Elderly people’s definitions of quality of life. Special Issue ‘Quality of Life’ in Social Science and Medicine, 10: 1439–46.
Fayers, P.M. and Hand, D.J. (2002) Causal variables, indicator variables and measurement scales: an example from quality of life. Journal of the Royal Statistical Society, 165, Part 2: 1–21.
Fazio, A.F. (1977) A Concurrent Validation Study of the NCHS General Well-Being Schedule. Vital and Health Statistics Series 2, no. 73 DHEW Publication No. (HRA) 78–1347. Hyattsville, MD: US Department of Health, Education and Welfare, National Center for Health Statistics.
Fillenbaum, G.G. (1978) Multi-dimensional functional assessment: The OARS Methodology – A Manual, 2nd edn. Durham, NC: Center for the Study of Aging and Human Development, Duke University.
Fillenbaum, G.G. (1980) Comparison of two brief tests of organic brain impairment, the MSQ and the Short Portable MSQ. Journal of the American Geriatrics Society, 28: 381–4.
Fillenbaum, G.G. (1988) Multi-dimensional Functional Assessment of Older Adults: the Duke Older Americans Resources and Services Procedures. Hillsdale, NJ: Lawrence Erlbaum.
Fillenbaum, G.G. and Smyer, M.A. (1981) The development, validity and reliability of the OARS multidimensional functional assessment questionnaire: disability and pain scales. Journal of Gerontology, 36: 428–33.
Finch, J. (1989) Family Obligations and Social Change. Cambridge: Polity Press.
Finlay-Jones, R.A. and Murphy, E. (1979) Severity of psychiatric disorder and the 30-item General Health Questionnaire. British Journal of Psychiatry, 134: 609–16.
Fiore, J., Coppel, D.B., Becker, J. et al. (1986) Social support as a multifaceted concept: examination of important dimensions for adjustment. American Journal of Community Psychology, 14: 93–111.
Firat, S., Byhardt, R.W. and Gore, E. (2002) Comorbidity and Karnofsky performance score are independent prognostic factors in stage III non-small-cell lung cancer: an institutional analysis of patients treated on four RTOG studies. Radiation Therapy Oncology Group. International Journal of Radiation Oncology Biology Physics, 54: 357–64.
Fitts, W.H. (1965) Tennessee Self-Concept Scale Manual. Nashville, TN: Counselor Recordings and Tests.
Fitts, W.H. (1972) The Self-Concept and Performance, Research Monograph no. 5. Nashville, TN: Social and Rehabilitation Service.
Fitts, W.H. and Warren, W.L. (1996) Tennessee Self-Concept Scale (4th edn). Los Angeles, CA: Western Psychological Corporation.
Fitzpatrick, R. (1999) Assessment of quality of life as an outcome: finding measurements that reflect individuals’ priorities (editorial). Quality in Health Care, 8: 1–2.
Fitzpatrick, R., Newman, S., Lamb, R. and Shipley, M. (1988) Social relationships and psychological wellbeing in rheumatoid arthritis. Social Science and Medicine, 27: 399–403.
Fitzpatrick, R., Newman, S., Lamb, R. and Shipley, M. (1989) A comparison of measures of health status in rheumatoid arthritis. British Journal of Rheumatology, 28: 201–6.
Fitzpatrick, R., Ziebland, S., Jenkinson, C. and Mowat, A. (1992) A generic health status instrument in the assessment of rheumatoid arthritis. British Journal of Rheumatology, 31: 87–90.
Fjartoft, H., Indredavik, B. and Lydersen, S. (2003) Stroke unit care combined with early supported discharge. Long-term follow-up of a randomised controlled trial. Stroke, 34: 2691–2.
Flax, M.J. (1972) A Study in Comparative Urban Indicators: Conditions in 18 Large Metropolitan Areas. Washington, DC: The Urban Institute.
Fleck, M.P., Louzada, S., Xavier, M. et al. (2000) Application of the Portuguese version of the abbreviated instrument of quality of life WHOQOL-Bref. Rev Saude Publica, 34: 178–83.
Fletcher, A., McLoone, P. and Bulpitt, C. (1988) Quality of life on angina therapy: a randomized controlled trial of transdermal glyceryl trinitrate against placebo. Lancet, 2: 4–8.
Folstein, M.F., Folstein, S.E. and McHugh, P.R. (1975) ‘Mini-Mental State’: A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12: 189–98.
Forsberg, C. and Bjorvell, H. (1993) Swedish population norms for the GHRI, HI and STAI-state. Quality of Life Research, 2: 349–56.
Fortinsky, R.H., Granger, C.V. and Selzer, G.B. (1981) The use of functional assessment in understanding home care needs. Medical Care, 19: 489–97.
Francis, D. (1984) Will You Still Need Me, Will You Still Feed Me, When I’m 84? Bloomington, IN: Indiana University Press.
Franke, G.H., Reimer, J., Philipp, T. and Heemann, U. (2003) Aspects of quality of life through end-stage renal disease. Quality of Life Research, 12: 103–15.
Freedland, K.E., Skala, J.A., Carney, R.M. et al. (2002) The Depression Interview and Structured Hamilton (DISH): rationale, development, characteristics, and clinical validity. Psychosomatic Medicine, 64: 897–905.
Freemantle, N., Long, A., Mason, J. et al. (1993) The treatment of depression in primary care. Effective Health Care, 5 (whole issue).
Fries, J.F. (1983) The assessment of disability: from first to future principles. Paper presented at conference, Advances in Assessing Arthritis, held at the London Hospital, March (mimeo).
Fries, J.F., Hess, E.V. and Klinenberg, J. (1974) A standard database for rheumatic disease. Arthritis and Rheumatism, 17: 327–36.
Fries, J.F., Spitz, P.W., Kraines, R.G. and Holman, H.R. (1980) Measurement of patient outcome in arthritis. Arthritis and Rheumatism, 23: 137–45.
Fries, J.F., Spitz, P.W. and Young, D.Y. (1982) The dimensions of health outcomes: the Health Assessment Questionnaire, disability and pain scales. Journal of Rheumatology, 9: 789–93.
Frisch, M.B., Cornell, J., Villanueva, M. and Retzlaff, P.J. (1992) Clinical validation of the Quality of Life Inventory. A measure of life satisfaction for use in treatment planning and outcome assessment. Psychological Assessment, 4: 92–101.
Fry, P.S. (2000) Whose quality of life is it anyway? Why not ask seniors to tell us about it? International Journal of Aging and Human Development, 50: 361–83.
Gallagher, D., Nies, G. and Thompson, L.W. (1982) Reliability of the Beck Depression Inventory with older adults. Journal of Consulting and Clinical Psychology, 50: 152–3.
Gallegos-Orozco, J.F., Fuentes, A.P., Gerardo Argueta, J. et al. (2003) Health related quality of life and depression in patients with chronic hepatitis C. Archives of Medical Research, 34: 124–9.
Gana, K. and Garnier, S. (2001) Latent structure of the sense of coherence scale in a French sample. Journal of Personality and Individual Differences, 31: 1079–90.
Gandek, B. and Ware, J.E. (eds) (1998) Translating functional health and well-being: International quality of life assessment (IQOLA) project studies of the SF-36 Health Survey. Journal of Clinical Epidemiology, 51: 891–1214.
Garber, J., Little, S., Hilsman, R. and Weaver, K.R. (1998) Family predictors of suicidal symptoms in young adolescents. Journal of Adolescence, 21: 445–57.
Garratt, A.M. and Ruta, D.A. (1999) The Patient Generated Index, in C.R.B. Joyce, C.A. O’Boyle and H. McGee (eds) Individual Quality of Life. Approaches to Conceptualisation and Assessment. Amsterdam: Harwood Academic Publishers.
Garratt, A.M., Ruta, D.A., Abdalla, M.I. et al. (1993) The SF-36 health survey questionnaire: an outcome measure suitable for routine use within the NHS? British Medical Journal, 306: 1440–4.
Garratt, A.M., Ruta, D.A., Abdalla, M.I. and Russell, I.T. (1994) SF-36 health survey questionnaire: II. Responsiveness to changes in health status in four common clinical conditions. Quality in Health Care, 3: 186–92.
Garratt, A.M., Schmidt, L., Mackintosh, A. and Fitzpatrick, R. (2002) Quality of life measurement: bibliographic study of patient assessed health outcome measures. British Medical Journal, 324: 1417.
Gatz, M., Pederson, N.L. and Harris, J. (1987) Measurement characteristics of the mental health scale from the OARS. Journal of Gerontology, 42: 332–5.
Gavazzi, S.M. (1994) Perceived social support from family and friends in a clinical sample of adolescents. Journal of Personality Assessment, 62: 465–71.
George, L.K. (1979) The happiness syndrome: methodological and substantive issues in the study of psychological well-being in adulthood. Gerontologist, 19: 210–16.
George, L.K. and Bearon, L.B. (1980) Quality of Life in Older Persons: Meaning and Measurement. New York: Human Sciences Press.
George, L.K., Blazer, D.G., Hughes, D.C. et al. (1989) Social support and the outcome of major depression. British Journal of Psychiatry, 154: 478–85.
Gibbons, R.D., Clark, D.C. and Kupfer, D.J. (1993) Exactly what does the Hamilton Depression Rating Scale measure? Journal of Psychiatric Research, 27: 259–73.
Gilson, B.S., Bergner, M., Bobbitt, R.A. et al. (1979) The Sickness Impact Profile: Final Development and Testing: 1975–1978. Seattle, WA: University of Washington Press.
Glass, T.A. and Maddox, G.L. (1992) The quality and quantity of social support: stroke recovery as psychosocial transition. Social Science and Medicine, 34: 1249–61.
Goldberg, D.P. (1978) Manual of the General Health Questionnaire. Windsor: NFER-Nelson.
Goldberg, D.P. (1985) Identifying psychiatric illness among general medical patients. British Medical Journal, 291: 161–3.
Goldberg, D.P. and Hillier, V.F. (1979) A scaled version of the General Health Questionnaire. Psychological Medicine, 9: 139–45.
Goldberg, D.P. and Huxley, P. (1980) Mental Illness in the Community: The Pathway to Psychiatric Care. London: Tavistock.
Goldberg, D.P. and Williams, P. (1988) A User’s Guide to the General Health Questionnaire. Windsor: NFER-Nelson.
Goldstein, M.S., Siegel, J.M. and Boyer, R. (1984) Predicting changes in perceived health status. American Journal of Public Health, 74: 611–15.
Gompertz, P., Pound, P. and Ebrahim, S. (1993a) The reliability of stroke outcome measurement. Clinical Rehabilitation, 7: 290–6.
Gompertz, P., Pound, P. and Ebrahim, S. (1993b) Kudos: A Kit for Describing the Outcome of Stroke. London: Department of Public Health, Royal Free Hospital Medical School.
Goodchild, M.E. and Duncan-Jones, P. (1985) Chronicity and the General Health Questionnaire. British Journal of Psychiatry, 146: 55–61.
Goodrick, G.K., Pendleton, V.R., Kimball, K.T. et al. (1999) Binge eating severity, self-concept, dieting self-efficacy and social support during treatment of binge eating disorder. International Journal of Eating Disorders, 26: 295–300.
Gough, I.R., Furnival, C.M., Schilder, L. et al. (1983) Assessment of the quality of life of patients with advanced cancer. European Journal of Cancer and Clinical Oncology, 19: 1161–5.
Grady, K.L., Meyer, P.M., Dressler, D. et al. (2003) Change in quality of life from after left ventricular assist device implantation to after heart transplantation. Journal of Heart and Lung Transplantation, 22: 1254–67.
Graham, C., Bond, S.S., Gerkovich, M.M. et al. (1980) Use of the McGill pain questionnaire in the assessment of cancer pain: reliability and consistency. Pain, 6: 377.
Granger, C.V. (1982) Health accounting-functional assessment of the long term patient, in F.J. Kottke, G.K. Stillwell and J.F. Lehmann (eds) Krusen’s Handbook of Physical Medicine and Rehabilitation, 3rd edn. Philadelphia, PA: W.B. Saunders.
Granger, C.V. and McNamara, M.A. (1984) Functional assessment utilization: the long-range evaluation system (LRES), in C.V. Granger and G.E. Gresham (eds) Functional Assessment in Rehabilitation Medicine. Baltimore, MD: Williams & Wilkins.
Granger, C.V., Albrecht, G.L. and Hamilton, B.B. (1979) Outcome of comprehensive medical rehabilitation: measurement by PULSES profile and the Barthel Index. Archives of Physical Medicine and Rehabilitation, 60: 145–54.
Grant, C.R.H. (1966) Age differences in self-concept from early adulthood through old age. Dissertation from the University of Nebraska.
Grasbeck, R. and Saris, N.E. (1969) Establishment and use of normal values. Scandinavian Journal of Clinical and Laboratory Investigation, Supplement no. 110: 62–3.
Greenberger, E., Chen, C., Dmitrieva, J. and Farruggia, S.P. (2003) Item-wording and the dimensionality of the Rosenberg Self-Esteem Scale: do they matter? Journal of Personality and Individual Differences, 35: 1241–54.
Greenblatt, H.N. (1975) Measurement of Social Well-being in a General Population Survey. Berkeley, CA: Human Population Laboratory, California State Department of Health.
Greenwald, H.P. (1987) The specificity of quality of life measures among the seriously ill. Medical Care, 25: 642–51.
Greiner, P.A., Snowdon, D.A. and Greiner, L.H. (1999) Self-rated function, self-rated health, and postmortem evidence of brain infarcts: findings from the Nun study. Journal of Gerontology (B), 54: S219–22.
Griffiths, R.A., Beaumont, P.J., Giannakopoulos, E. et al. (1999) Measuring self-esteem in dieting disordered patients: the validity of the Rosenberg and Coopersmith contrasted. International Journal of Eating Disorders, 25: 227–31.
Groessl, E.J., Kaplan, R.M. and Cronan, T.A. (2003) Quality of well-being in older people with osteoarthritis. Arthritis and Rheumatism, 49: 23–8.
Groth-Marnat, G. (1990) The Handbook of Psychological Assessment, 2nd edn. New York: John Wiley and Sons.
Grummon, K., Rigby, E.D., Orr, D. et al. (1994) Psychosocial variables that affect the psychological adjustment of IVDU patients with AIDS. Journal of Clinical Psychology, 50: 488–502.
Grundy, E. and Bowling, A. (1999) Enhancing the quality of extended life years. Identification of the oldest old with a very good and very poor quality of life. Ageing and Mental Health, 3: 199–212.
Grundy, E., Bowling, A. and Farquhar, M. (1996) Social support, life satisfaction and survival at older ages, in G. Caselli and A. Lopez (eds) Health and Mortality among Elderly Populations. Oxford: Clarendon Press.
Guillemin, F., Coste, J., Pouchot, J. et al. (1997) The AIMS2-SF: a short form of the Arthritis Impact Measurement Scales-2 – French quality of life in rheumatology group. Arthritis and Rheumatism, 40: 1267–74.
Guillon, M.S., Crocq, M.A. and Bailey, P.E. (2003) The relationship between self-esteem and psychiatric disorders in adolescents. European Psychiatry, 18: 59–62.
Gurin, G., Veroff, J. and Feld, S. (1960) Americans View their Mental Health. New York: Basic Books.
Gurland, B.J. (1980) The assessment of the mental health status of older adults, in J.E. Birren and R.B. Sloane (eds) Handbook of Mental Health and Ageing. Englewood Cliffs, NJ: Prentice-Hall.
Gurland, B.J., Copeland, J.R.M., Kelleher, M.J. et al. (1983) The Mind and Mood of Ageing: The Mental Health Problems of the Community Elderly in New York and London. London: Croom Helm.
Gurland, B.J., Golden, R.R., Teresi, J.A. and Challop, J. (1984) The SHORT-CARE: An efficient instrument for the assessment of depression, dementia and disability. Journal of Gerontology, 39: 166–9.
Gurtman, M.B. (1985) Self-rating Depression Scale, in D.J. Keyser and R.C. Sweetland (eds) Test Critiques, Vol. III. Kansas City, MO: Test Corporation of America.
Guttman, L. (1944) A basis for scaling qualitative data. American Sociological Review, 9: 139.
Guyatt, G.H., Berman, L.B., Townsend, M. et al. (1987) A measure of quality of life for clinical trials in chronic lung disease. Thorax, 42: 773–8.
Guyatt, G.H., Nogradi, S., Halcrow, S. et al. (1989) Development and testing of a new measure of health status for clinical trials in heart failure. Journal of General Internal Medicine, 4: 101–7.
Guyatt, G.H., Eagle, D.J., Sackett, B. et al. (1993) Measuring quality of life in the frail elderly. Journal of Clinical Epidemiology, 46: 1433–44.
Haavardsholm, E.A., Kvien, T.K., Uhlig, T. et al. (2000) A comparison of agreement and sensitivity to change between AIMS2 and a short form of AIMS2 (AIMS2-SF) in more than 1,000 rheumatoid arthritis patients. Journal of Rheumatology, 27: 2810–16.
Haber, L.D. (1968) Prevalence of Disability among Non-institutionalized Adults under Age 65: 1966 Survey of Disabled Adults, Research and Statistics Note no. 4. US Department of Health, Education and Welfare, Office of Research and Statistics.
Hagg, O., Fritzell, P., Nordwall, A. and the Swedish Lumbar Spine Study Group (2003) The clinical importance of changes in outcome scores after treatment for low back pain. European Spine Journal, 12: 12–20.
Hall, J., Hall, N., Fisher, E. et al. (1987) Measurement of outcomes of general practice: comparison of three health status measures. Family Practice, 4: 117–23.
Hall, K.M. (1992) Overview of functional assessment scales in brain injury rehabilitation. NeuroRehabilitation, 2: 97–112.
Hall, K.M. (1997) The Functional Assessment Measure (FAM). Journal of Rehabilitation Outcome Measures, 1: 63–5.
Hall, R., Horrocks, J.C., Clamp, S.E. et al. (1976) Observer variation in assessment of results of surgery for peptic ulceration. British Medical Journal, i: 814–16.
Hamer, D., Sanjeev, D., Butterworth, E. and Barzak, P. (1991) Using the Hospital Anxiety and Depression Scale to screen for psychiatric disorders in people presenting with deliberate self-harm. British Journal of Psychiatry, 158: 782–4.
Hamilton, B.A., Primrose, W.R. and Muir, K.T. (2000) Care management in three practices – scope for improvement. Health Bulletin, 58: 380–4.
Hamilton, M. (1959) The assessment of anxiety states by rating. British Journal of Medical Psychology, 32: 50–5.
Hamilton, M. (1960) Rating scale for depression. Journal of Neurology, Neurosurgery and Psychiatry, 23: 56–62.
Hamilton, M. (1967) Development of a rating scale for primary depressive illness. British Journal of Social and Clinical Psychology, 6: 278–96.
Hamilton, M. (1976) Clinical evaluation of depression: clinical criteria and rating scales, including a Guttman Scale, in M. Gallant and G.M. Simpson (eds) Depression: Behavioral, Biochemical Diagnostic and Treatment Concepts. New York: Spectrum Publications.
Hammen, C.L. (1981) Assessment: a clinical and cognitive emphasis, in L.P. Rehm (ed.) Behaviour Therapy for Depression: Present Status and Future Directions. New York: Academic Press.
Hanley, J.A. and McNeil, B.J. (1982) The meaning and use of the area under a Receiver Operating Characteristic (ROC) curve. Radiology, 143: 29–36.
Harman, H.H. (1976) Modern Factor Analysis. Chicago, IL: University of Chicago Press.
Harrington, R. and Loffredo, D.A. (2001) The relationship between life satisfaction, self-consciousness, and the Myers-Briggs type inventory dimensions. Journal of Psychology, 135: 439–50.
Harris, L. (1975) The Myth and Reality of Ageing in America. Washington, DC: National Council on the Ageing.
Hart, G.L. and Evans, R.W. (1987) The functional status of ESRD patients as measured by the Sickness Impact Profile. Journal of Chronic Diseases, 40: 117S–130S (Supplement).
Harvey, I., Nelson, S.J., Lyons, R.A. et al. (1998) A randomised controlled trial and economic evaluation of counselling in primary care. British Journal of General Practice, 48: 1043–8.
Harwood, R.H. and Ebrahim, S. (1995) Manual of the London Handicap Scale. Nottingham: University of Nottingham, Department of Health Care of the Elderly.
Harwood, R.H., Rogers, A., Dickinson, E. and Ebrahim, S. (1994) Measuring handicap: the London Handicap Scale, a new outcome measure for chronic disease. Quality in Health Care, 3: 11–16.
Harwood, R.H., Carr, A.J., Thompson, P.W. and Ebrahim, S. (1996) Handicap in inflammatory arthritis. British Journal of Rheumatology, 35: 891–7.
Harwood, R.H., Gompertz, P., Pound, P. and Ebrahim, S. (1997) Determinants of handicap 1 and 3 years after stroke. Disability and Rehabilitation, 19: 205–11.
Harwood, R.H., Prince, M., Mann, A. and Ebrahim, S. (1998a) Associations between diagnoses, impairments, disability and handicap in a population of elderly people. International Journal of Epidemiology, 27: 261–8.
Harwood, R.H., Prince, M., Mann, A. and Ebrahim, S. (1998b) The prevalence of diagnoses, impairments, disabilities and handicaps in a population of elderly people living in a defined geographical area: the Gospel Oak project. Age and Ageing, 27: 707–14.
Hays, R., Siu, A., Keeler, E. et al. (1996) Long term care residents’ preferences on the QWB scale. Medical Decision Making, 16: 254–61.
Haywood, K.L., Garratt, A.M., Dziedzic, K. and Dawes, P.T. (2003) Patient centred assessment of ankylosing spondylitis-specific health related quality of life: evaluation of the Patient Generated Index. Journal of Rheumatology, 30: 764–73.
Headey, B.W., Glowacki, T., Holstrom, E.L. and Wearing, A.J. (1985) Modelling change in perceived quality of life. Social Indicators Research, 17: 276–98.
Health Outcomes Institute (1990) Report on a Survey of Elderly Rural Residents: Health Status, Use of Health Care Services and Satisfaction with Quality of Care. Bloomington, IN: HOI.
Heasman, M.A. and Lipworth, L. (1966) Accuracy of Certification of Cause of Death. Studies on Medical and Population Subjects no. 20. London: General Register Office.
Hedley, M.M., Oza, D., Feld, R. et al. (2002) The palliative benefit of irinotecan in 5-fluorouracil-refractory colorectal cancer: its prospective evaluation by a multicenter Canadian Trial. Clinical Colorectal Cancer, 2: 93–101.
Hemingway, H., Stafford, M., Stansfeld, S. et al. (1997) Is the SF-36 a valid measure of change in population health? Results from the Whitehall Study. British Medical Journal, 315: 1273–9.
Henderson, A.S., Duncan-Jones, P. and Finlay-Jones, R.A. (1983) The reliability of the Geriatric Mental State Examination. Acta Psychiatrica Scandinavica, 67: 281–9.
Henderson, S. (1981) Social relationships, adversity and neurosis: an analysis of prospective observations. British Journal of Psychiatry, 138: 391–8.
Henderson, S., Duncan-Jones, P., Byrne, D.G. and Scott, R. (1980) Measuring social relationships: the Interview Schedule for Social Interaction. Psychological Medicine, 10: 723–34.
Henderson, S., Byrne, D.G. and Duncan-Jones, P. (1981a) Neurosis and the Social Environment. London: Academic Press.
Henderson, S., Lewis, I.C., Howell, R.H. et al. (1981b) Mental health and the use of alcohol, tobacco, analgesics and vitamins in a secondary school population. Acta Psychiatrica Scandinavica, 63: 186–9.
Herron, M.K., Michaux, W.W., Katz, M.M. et al. (1964) Supplemental Instructions for the Administration of the Katz Adjustment Scales. Baltimore, MD: Spring Grove State Hospital, Research Department.
Heylighen, F. and Bernheim, J. (2000) Global progress I: empirical evidence for ongoing increase in quality of life. Journal of Happiness Studies, 1: 323–49.
Hickey, A.M., Bury, G., O’Boyle, C.A. et al. (1996) A new short form individual quality of life measure (SEIQoL-DW): application in a cohort of individuals with HIV/AIDS. British Medical Journal, 313: 29–33.
Hickey, A., O’Boyle, C.A., McGee, H.M. and McDonald, N.J. (1997) The relationship between post-trauma problem reporting and carer quality of life after severe head injury. Psychology and Health, 12: 827–38.
Higgs, P., Hyde, M., Wiggins, R. and Blane, D. (2003) Researching quality of life in early old age: the importance of the sociological dimension. Social Policy and Administration, 37: 239–52.
Hill, S. and Harries, U. (1994) Assessing the outcome of health care for the older person in community settings: should we use the SF-36? Outcomes Briefing. UK Clearing House for Health Outcomes, 4: 26–7.
Hill, S., Harries, U. and Popay, J. (1995) Is the SF-36 suitable for routine health outcomes assessment in health care for older people? Evidence from preliminary work in community based health services in England. Journal of Epidemiology and Community Health, 50: 94–8.
Hinterberger, W., Gadner, H., Hocker, P. et al. (1987) Survival and quality of life in 23 patients with severe aplastic anaemia treated with BMT. Blut, 54: 137–46.
Hirsch, B.J. (1980) Natural support systems and coping with major life changes. American Journal of Community Psychology, 8: 159–72.
Hirsch, B.J. (1981) Social networks and the coping process: creating personal communities, in B.H. Gottlieb (ed.) Social Networks and Social Support. Beverly Hills, CA: Sage Publications.
Hobbs, P., Ballinger, C.B. and Smith, A.H.W. (1983) Factor analysis and validation of the General Health Questionnaire in women: a general practice survey. British Journal of Psychiatry, 142: 257–64.
Hobbs, P., Ballinger, C.B., Greenwood, C. et al. (1984) Factor analysis and validation of the General Health Questionnaire in men: a general practice survey. British Journal of Psychiatry, 144: 270–5.
Hodkinson, H.M. (1972) Evaluation of a mental test score for the assessment of mental impairment in the elderly. Age and Ageing, 1: 233–8.
Hoffmeister, J.K. (1976) Some Information Regarding the Characteristics of the Two Measures Developed from the Self-Esteem Questionnaire (SEQ-3). Boulder, CO: Test Analysis and Development Corporation.
Holahan, C.J. and Moos, R.H. (1981) Social support and psychological distress: a longitudinal analysis. Journal of Abnormal Psychology, 90: 365–70.
Hörnquist, J.O. (1982) The concept of quality of life. Scandinavian Journal of Social Medicine, 10: 57–61.
House, J.S. (1981) Work, Stress and Social Support. Reading, MA: Addison-Wesley.
House, J.S. and Kahn, R.L. (1985) Measures and concepts of social support, in S. Cohen and S.L. Syme (eds) Social Support and Health. Orlando, FL: Academic Press.
House, J.S., Robbins, C. and Metzner, H.L. (1982) The association of social relationships and activities with mortality: prospective evidence from the Tecumseh Community Health Study. American Journal of Epidemiology, 116: 123–40.
Hoyt, D.R. and Creech, J.C. (1983) The life satisfaction index: a methodological and theoretical critique. Journal of Gerontology, 38: 111–16.
Hubanks, L. and Kuyken, W. (WHOQOL Group) (1994) Quality of Life Assessment. An Annotated Bibliography. Geneva: Division of Mental Health, World Health Organization.
Hughes, T.E., Kaplan, R.M., Cons, S.J. et al. (1997) Construct validities of the Quality of Well-Being Scale and the MOS-HIV-34 Health Survey for HIV-infected patients. Medical Decision Making, 17: 439–46.
Hunt, S.M. (1984) Nottingham Health Profile, in N.K. Wenger, M.E. Mattson, C.D. Furberg et al. (eds) Assessment of Quality of Life in Clinical Trials of Cardiovascular Therapies. New York: Le Jacq.
Hunt, S.M. (1988) Subjective health indicators and health promotion. Health Promotion, 3: 23–34.
Hunt, S.M. (1999) The researcher’s tale: a story of virtue lost and regained, in C.R.B. Joyce, H.M. McGee and C.A. O’Boyle (eds) Individual Quality of Life. Approaches to Conceptualisation and Assessment. The Netherlands: Harwood Academic Publishers.
Hunt, S.M. and McKenna, S.P. (1992) British adaptation of the General Well-Being Index: a new tool for clinical research. British Journal of Medical Economics, 2: 49–60.
Hunt, S.M. and McKenna, S.P. (1993) Measuring patients’ views of their health. SF-36 misses the mark (letter). British Medical Journal, 307: 125.
Hunt, S.M., McKenna, S.P., McEwan, J. et al. (1980) A quantitative approach to perceived health status: a validation study. Journal of Epidemiology and Community Health, 34: 281–6.
Hunt, S.M., McKenna, S.P. and Williams, J. (1981) Reliability of a population survey tool for measuring perceived health problems: a study of patients with osteoarthritis. Journal of Epidemiology and Community Health, 35: 297–300.
Hunt, S.M., McEwan, J., McKenna, S.P. et al. (1984a) Subjective health assessments and the perceived outcome of minor surgery. Journal of Psychosomatic Research, 28: 105–14.
Hunt, S.M., McEwan, J. and McKenna, S.P. (1984b) Perceived health: age and sex comparisons in a community. Journal of Epidemiology and Community Health, 34: 281–6.
Hunt, S.M., McEwan, J. and McKenna, S.P. (1986) Measuring Health Status. London: Croom Helm.
Huppert, F.A. and Garcia, A.W. (1991) Qualitative differences in psychiatric symptoms between high risk groups assessed on a screening test (GHQ–30). Social Psychiatry and Psychiatric Epidemiology, 26: 252–8.
Huppert, F.A., Walters, D.E., Day, N.E. and Elliott, B.J. (1989) The factor structure of the General Health Questionnaire (GHQ–30). A reliability study on 6317 community residents. British Journal of Psychiatry, 155: 178–85.
Hutchinson, T.A., Boyd, N.F. and Feinstein, A.R. (1979) Scientific problems in clinical scales as demonstrated in the Karnofsky index of performance status. Journal of Chronic Diseases, 32: 661–6.
Hwang, S.S., Chang, V.T., Rue, M. and Kasimis, B. (2003) Multi-dimensional independent predictors of cancer-related fatigue. Journal of Pain and Symptom Management, 26: 604–14.
Hyde, M., Wiggins, R.D., Higgs, P. and Blane, D. (2003) A measure of quality of life in early old age: the theory, development and properties of a needs satisfaction model (CASP-19). Ageing and Mental Health, 7: 186–94.
Hyland, M.E. and Kenyon, P. (1992) A measure of positive health-related quality of life: the Satisfaction with Illness Scale. Psychological Reports, 71: 1137–8.
Hyler, S.E. and Rieder, R.O. (1987) Personality Diagnostic Questionnaire-Revised. New York: New York State Psychiatric Institute.
Hyyppa, M.T. and Maki, J. (2003) Social participation and health in a community rich in stock of social capital. Health Education Research, 18: 770–9.
Idler, E.I. and Kasl, S.V. (1995) Self-ratings of health: do they also predict change in functional ability? Journal of Gerontology (B), 50: S344–53.
Inglehart, R. and Rabier, J.R. (1986) Aspirations adapt to situations – but why are the Belgians so much happier than the French? A cross-cultural analysis of the subjective quality of life, in F.M. Andrews (ed.) Research on the Quality of Life. Ann Arbor, MI: Survey Research Center, Institute for Social Research, University of Michigan.
Insinga, R.P. and Fryback, D.G. (2003) Understanding differences between self-ratings and population ratings for health in the EuroQol. Quality of Life Research, 12: 611–19.
Isaacs, B. and Walkey, P.A. (1964) Measurement of mental impairment in geriatric practice. Gerontology Clinics, 6: 114–23.
Jachuck, S.J., Brierley, H., Jachuck, S. et al. (1982) The effect of hypotensive drugs on the quality of life. Journal of the Royal College of General Practitioners, 32: 103–5.
Jagger, C. and Lindesay, J. (1997) Residential care for elderly people: the prevention of cognitive impairment and behavioural problems. Age and Ageing, 26: 475–80.
Jenkinson, C. and Layte, R. (1997) Development and testing of the UK SF-12 (short form health survey). Journal of Health Services Research and Policy, 2: 14–18.
Jenkinson, C. and McGee, H. (1998) Health Status Measurement: A Brief but Critical Introduction. Oxford: Radcliffe Medical Press.
Jenkinson, C., Fitzpatrick, R. and Argyle, M. (1988) The Nottingham Health Profile: an analysis of its sensitivity in differentiating illness groups. Social Science and Medicine, 27: 1411–14.
Jenkinson, C., Ziebland, S., Fitzpatrick, R. et al. (1991) Sensitivity to change of weighted and unweighted versions of two health status measures. International Journal of Health Sciences, 2: 189–94.
Jenkinson, C., Coulter, A. and Wright, L. (1993) Short Form-36 (SF-36) health survey questionnaire: Normative data for adults of working age. British Medical Journal, 306: 1437–40.
Jenkinson, C., Layte, R., Wright, L. and Coulter, A. (1996) The UK SF-36: An Analysis and Interpretation Manual. Oxford: University of Oxford, Health Services Research Unit, Department of Public Health and Primary Care.
Jenkinson, C., Layte, R. and Lawrence, K. (1997) Development and testing of the SF-36 summary scale scores in the United Kingdom: results from a large scale survey and clinical trial. Medical Care, 35: 410–16.
Jenkinson, C., Stewart-Brown, S., Petersen, S. and Paice, C. (1999) Assessment of the SF-36 Mark 2 in the United Kingdom. Journal of Epidemiology and Community Health, 53: 46–50.
Jenkinson, C., Mant, J., Carter, J. et al. (2000) The London handicap scale: a re-evaluation of its validity using standard scoring and simple summation. Journal of Neurology, Neurosurgery and Psychiatry, 68: 365–7.
Jette, A.M. (1980) The Functional Status Index: Reliability of a chronic disease evaluation instrument. Archives of Physical Medicine and Rehabilitation, 61: 395–401.
Jirik-Babb, P. and Geliebter, A. (2003) Comparison of psychological characteristics of binging and non-binging obese, adult, female outpatients. Eating and Weight Disorders, 8: 173–7.
Jitapunkul, S., Pillay, I. and Ebrahim, S. (1991) The Abbreviated Mental Test: its use and validity. Age and Ageing, 20: 332–6.
Jones, D.A., Victor, C.R. and Vetter, N.J. (1985) The problem of loneliness in the elderly in the community: characteristics of those who are lonely and the factors related to the loneliness. Journal of the Royal College of General Practitioners, 35: 136–9.
Joore, M.A., Potjewijd, J., Timmerman, A.A. et al. (2002) Response shift in the measurement of quality of life in hearing impaired adults after hearing aid fitting. Quality of Life Research, 11: 299–307.
Joyce, C.R.B., McGee, H.M. and O’Boyle, C.A. (1999) Individual quality of life: review and outlook, in C.R.B. Joyce, C.A. O’Boyle and H. McGee (eds) Individual Quality of Life: Approaches to Conceptualisation and Assessment. Amsterdam: Harwood Academic Publishers.
Joyce, C.R., Hickey, A., McGee, H.M. and O’Boyle, C.A. (2003) A theory-based method for the evaluation of individual quality of life: the SEIQoL. Quality of Life Research, 12: 275–80.
Julious, S.A., George, S. and Campbell, J. (1995) Sample sizes for studies using the short form 36 (SF-36). Journal of Epidemiology and Community Health, 49: 642–4.
Juniper, E.F., Guyatt, G.H., Streiner, D.L. and King, D.R. (1997) Clinical impact versus factor analysis for quality of life questionnaire construction. Journal of Clinical Epidemiology, 50: 233–8.
Kafonek, S., Ettinger, W.H., Roca, R. et al. (1989) Instruments for screening for depression and dementia in a long term care facility. Journal of the American Geriatrics Society, 37: 29–34.
Kane, R.L., Rockwood, T., Philp, I. and Finch, M. (1998) Differences in valuation of functional status components among consumers and professionals in Europe and the United States. Journal of Clinical Epidemiology, 51: 657–66.
Kahn, R.L., Goldfarb, A.I., Pollack, M. et al. (1960a) The relationship of mental and physical status in institutionalized aged persons. American Journal of Psychiatry, 117: 120–4.
Kahn, R.L., Goldfarb, A.I., Pollack, M. et al. (1960b) Brief objective measures for the determination of mental status in the aged. American Journal of Psychiatry, 117: 326–8.
Khan, A., Khan, S.R., Shankles, E.B. and Polissar, N.L. (2002) Relative sensitivity of the Montgomery-Asberg Depression Rating Scale, the Hamilton Depression rating scale and the Clinical Global Impressions rating scale in antidepressant clinical trials. International Clinical Psychopharmacology, 17: 281–5.
Kalra, L. and Crome, P. (1993) The role of prognostic scores in targeting stroke rehabilitation in elderly patients. Journal of the American Geriatrics Society, 41: 396–400.
Kalson, C. (1976) MASH – a program of social interaction between institutionalised aged and adult mentally retarded persons. The Gerontologist, 16: 340–8.
Kammann, R. and Flett, R. (1983) Affectometer 2: a scale to measure current level of general happiness. Australian Journal of Psychology, 35: 259–65.
Kaplan, B.H. (1975) An epilogue: toward further research on family and health, in B.H. Kaplan and J.C. Cassel (eds) Family and Health: An Epidemiological Approach. Chapel Hill, NC: University of North Carolina, Institute for Research in Social Science.
Kaplan, G.A. and Camacho, T. (1983) Perceived health and mortality: a nine-year follow-up of the Human Population Laboratory Cohort. American Journal of Epidemiology, 117: 292–8.
Kaplan, G.A., Salonen, J.T., Cohen, R.D. et al. (1988) Social connections and mortality from all causes and cardio-vascular disease: prospective evidence from eastern Finland. American Journal of Epidemiology, 128: 370–80.
Kaplan, H.B. and Pokorny, A.D. (1969) Self derogation and psychosocial adjustment. Journal of Nervous and Mental Disease, 149: 421–34.
Kaplan, R. (1985) Social support and social health, in I. Sarason and B. Sarason (eds) Social Support Theory, Research and Application. The Hague: Nijhoff.
Kaplan, R.M. (1988) New health promotion indicators: the general health policy model. Health Promotion, 3: 35–48.
Kaplan, R.M. (1994) Using quality of life information to set priorities in health policy. Social Indicators Research, 33: 121–63.
Kaplan, R.M. and Anderson, J.P. (1988) The quality of well-being scale. Rationale for a single quality of life index, in S.R. Walker and R. Rosser (eds) Quality of Life: Assessment and Application. London: MTP Press.
Kaplan, R.M. and Bush, J.W. (1982) Health-related quality of life measurement for evaluation research analysis. Health Psychology, 1: 61–80.
Kaplan, R.M. and Ernst, J.A. (1983) Do category rating scales produce biased preference weights for a health index? Medical Care, 21: 193–207.
Kaplan, R.M., Bush, J.W. and Berry, C.C. (1976) Health status: types of validity and the Index of Wellbeing. Health Services Research, 11: 478–507.
Kaplan, R.M., Bush, J.W. and Berry, C.C. (1978) The reliability, stability and generalizability of a health status index. American Statistical Association, Proceedings of the Social Statistics Section, 704–9.
Kaplan, R.M., Bush, J.W. and Berry, C.C. (1979) Health Status Index: category rating versus magnitude estimation for measuring levels of well-being. Medical Care, 17: 501–23.
Kaplan, R.M., McCutchan, J.A., Navarro, A.M. et al. (1994) Quality adjusted survival analysis: a neglected application of the quality of well-being scale. Psychology and Health, 9: 131–41.
Kaplan, R.M., Anderson, J.P., Patterson, T.L. et al. (1995) Validity of the Quality of Well-Being Scale for persons with human immunodeficiency virus infection. Psychosomatic Medicine, 57: 138–47.
Karnofsky, D.A. and Burchenal, J.H. (1949) The clinical evaluation of chemotherapeutic agents in cancer, in C.M. MacLeod (ed.) Evaluation of Chemotherapeutic Agents. New York: Columbia University Press.
Karnofsky, D.A., Abelmann, W.H., Craver, L.F. et al. (1948) The use of nitrogen mustards in the palliative treatment of carcinoma. Cancer, 1: 634–56.
Kasl, S. and Cobb, S. (1966) Health behavior, illness behavior and sick role behavior. Archives of Environmental Health, 12: 246–66.
Kasl, S.V. and Cooper, C.L. (1987) Stress and Health Issues in Research Methodology. Chichester: John Wiley.
Kaszniak, A.W. and Allender, J. (1985) Psychological assessment of depression in older adults, in G.M. Chaisson-Stewart (ed.) Depression in the Elderly: An Interdisciplinary Approach. New York: John Wiley.
Katz, J.N., Larson, M.G., Phillips, C.B. et al. (1992) Comparative measurement sensitivity of short and longer health status instruments. Medical Care, 30: 917–25.
Katz, S. and Akpom, C.A. (1976) Index of ADL. Medical Care, 14: 116–18.
Katz, S., Ford, A.B., Moskowitz, R.W. et al. (1963) Studies of illness in the aged: the index of ADL – a standardized measure of biological and psychosocial function. Journal of the American Medical Association, 185: 914–19.
Katz, S., Ford, A.B., Chinn, A.B. et al. (1966) Prognosis after strokes: long-term course of 159 patients with stroke. Medicine, 45: 236–46.
Katz, S., Vignos, P.J., Moskowitz, R.W. et al. (1968) Comprehensive outpatient care in rheumatoid arthritis: a controlled study. Journal of the American Medical Association, 206: 1249.
Katz, S., Downs, T.D., Cash, H.R. et al. (1970) Progress in the development of an index of ADL. Gerontologist, 10: 20–30.
Katz, S., Akpom, C.A., Papsidero, J.A. et al. (1973) Measuring the health status of populations, in R.L. Berg (ed.) Health Status of Populations. Chicago, IL: Hospital Research and Educational Trust.
Kaufman, A.V. (1990) Social network assessment: a critical component in case management for functionally impaired older persons. International Journal of Ageing and Human Development, 30: 63–75.
Kawachi, I. and Berkman, L. (2000) Social cohesion, social capital and health, in L.F. Berkman and I. Kawachi (eds) Social Epidemiology. Oxford: Oxford University Press.
Kawachi, I., Kennedy, B.P., Lochner, K. and Prothrow-Stith, D. (1997a) Social capital, income inequality and mortality. American Journal of Public Health, 87: 1491–8.
Kawachi, I., Kennedy, B.P. and Lochner, K. (1997b) Long live community: social capital as public health. The American Prospect, November/December: 56–9.
Kawachi, I., Kennedy, B.P. and Glass, R. (1999) Social capital and self-rated health: a contextual analysis. American Journal of Public Health, 89: 1187–93.
Kay, D.W.K., Beamish, R. and Roth, M. (1964) Old age mental disorders in Newcastle upon Tyne, Part I: a study of prevalence. British Journal of Psychiatry, 110: 146–58.
Kearns, N.P., Cruickshank, C.A., McGuigan, K.J. et al. (1982) A comparison of depression rating scales. British Journal of Psychiatry, 141: 45–9.
Keyes, C.L., Shmotkin, D. and Ryff, C.D. (2002) Optimizing well-being: the empirical encounter of two traditions. Journal of Personality and Social Psychology, 82: 1007–22.
Kidd, D., Stewart, G., Baldry, J. et al. (1995) The Functional Independence Measure: a comparative validity and reliability study. Disability and Rehabilitation, 17: 10–14.
Kind, P. (1996) The EuroQol instrument: an index of health-related quality of life, in B. Spilker (ed.) Quality of Life and Pharmacoeconomics in Clinical Trials, 2nd edn. Philadelphia, PA: Lippincott-Raven.
Kind, P. (undated) Scaling the Nottingham Health Profile. Mimeo. York: University of York, Centre for Health Economics.
Kind, P. and Carr-Hill, R. (1987) The Nottingham Health Profile: a useful tool for epidemiologists? Social Science and Medicine, 25: 905–10.
Kind, P., Hardman, G. and Macran, S. (1999) Population norms for EQ-5D. York: University of York, Centre for Health Economics, Discussion Paper no. 172, November.
King, J.T. and Roberts, M.S. (2002) Validity and reliability of the Short form-36 in cervical spondylotic myelopathy. Journal of Neurosurgery, 97: 180–5.
Kiran, R.P., Delaney, C.P., Senagore, A.J. et al. (2003) Prospective assessment of Cleveland Global Quality of Life (CGQL) as a novel marker of quality of life and disease activity in Crohn’s disease. American Journal of Gastroenterology, 98: 1783–9.
Kirby, M., Denihan, A., Bruce, I. et al. (2000) The pattern of support networks among the community dwelling elderly in urban Ireland: variations with mental disorder. Irish Journal of Psychological Medicine, 17: 43–9.
Kirwan, R.J. and Reeback, J.S. (1983) Using a modified Stanford Health Assessment Questionnaire to assess disability in UK patients with rheumatoid arthritis. Annals of the Rheumatic Diseases, 42: 219–20.
Knapp, M.R.J. (1976) Predicting the dimensions of life satisfaction. Journal of Gerontology, 31: 595–604.
Knesevich, J.W., Biggs, J.T., Clayton, P.J. and Ziegler, V.E. (1977) Validity of the Hamilton rating scale for depression. British Journal of Psychiatry, 131: 49–52.
Knight, R.G., Waal-Manning, H.J. and Spears, G.F. (1983) Some norms and reliability data for the State-Trait Anxiety Inventory and the Zung Self-Rating Depression Scale. British Journal of Clinical Psychology, 22: 245–9.
Koenig, H.G., Meador, K.G., Cohen, H.J. et al. (1988) Self-rated depression scales and screening for major depression in the older hospitalized patient with medical illness. Journal of the American Geriatrics Society, 36: 699–706.
Kohn, M.L. (1969) Class and Conformity: A Study in Values. Homewood, IL: Dorsey.
Kokenes, B. (1974) Grade level differences in factors of self esteem. Developmental Psychology, 10: 954–8.
Korner, A., Nielsen, B.M., Eschen, F. et al. (1990) Quantifying depressive symptomatology: inter-rater and inter-item correlations. Journal of Affective Disorders, 20: 140–9.
Kovacs, M. and Beck, A.T. (1977) An empirical-clinical approach toward a definition of childhood depression, in J.G. Schulterbrandt and A. Raskin (eds) Depression in Childhood: Diagnosis, Treatment and Conceptual Models. New York: Raven Press.
Kozma, A. and Stones, M.J. (1987) Social desirability in measures of subjective well-being: a systematic evaluation. Journal of Gerontology, 42: 56–9.
Krefetz, D.G., Steer, R.A., Gulab, N.A. and Beck, A.T. (2002) Convergent validity of the Beck depression inventory-II with the Reynolds Adolescent Depression Scale in psychiatric in-patients. Journal of Personality Assessment, 78: 451–60.
Kritz-Silverstein, D., Wingard, D.L. and Barrett-Connor, E. (2002) Hysterectomy status and life satisfaction in older women. Journal of Women’s Health and Gender Based Medicine, 11: 181–90.
Kuijpers, P.M., Denollet, J., Lousberg, R. et al. (2003) Validity of the hospital anxiety and depression scale for use with patients with noncardiac chest pain. Psychosomatics, 44: 329–35.
Kurtin, P.S., Davies, A.R., Meyer, K.B. et al. (1992) Patient-based health status measurements in outpatient dialysis: Early experiences in developing an outcomes assessment program. Medical Care, 30: MS136–MS149 (suppl. 5).
Kushman, J. and Lane, S. (1980) A multivariate analysis of factors affecting perceived life satisfaction and psychological well-being among the elderly. Social Science Quarterly, 61: 264–77.
Kutlay, S., Nergizoglu, G., Kutlay, S. et al. (2003) General or disease specific questionnaire? A comparative study in hemodialysis patients. Renal Failure, 25: 95–103.
Kutner, B., Fanshel, D., Togo, A.M. et al. (1956) Five Hundred Over 60. New York: Russell Sage.
Kutner, N.G., Fair, P.L. and Kutner, M.H. (1985) Assessing depression and anxiety in chronic dialysis patients. Journal of Psychosomatic Research, 29: 23–31.
Kvaal, K., Laake, K. and Engedal, K. (2001) Psychometric properties of the state part of the Spielberger State-Trait Anxiety Inventory (STAI) in geriatric patients. International Journal of Geriatric Psychiatry, 16: 980–6.
Lamb, K.L., Brodie, D.A. and Roberts, K. (1988) Physical fitness and health-related fitness as indicators of a positive health state. Health Promotion, 3: 171–82.
Land, K.C. (1975) Social indicators models: an overview, in K.C. Land and S. Spilerman (eds) Social Indicator Models. New York: Russell Sage Foundation.
Landgraf, J.M. and Nelson, E.C. (1992) Summary of the WONCA/COOP international health assessment field trial. Australian Family Physician, 21: 255–69.
Landgraf, J.M., Nelson, E.C., Hays, R.D. et al. (1990) Assessing function: does it really make a difference? A preliminary evaluation of the acceptability and utility of the COOP function charts, in M. Lipkin (ed.) Functional Status Measurement in Primary Care. New York: Springer-Verlag.
Langius, A. (1995) Quality of life in a group of patients with oral and pharyngeal cancer. Sense of coherence, functional status and well-being. Stockholm: Department of Medicine, Centre of Caring Sciences North, Karolinska Institute.
Langius, A., Björvell, H. and Antonovsky, A. (1992) The sense of coherence concept and its relation to personality traits in Swedish samples. Scandinavian Journal of Caring Science, 6: 165–71.
Langius, A., Björvell, H. and Lind, M. (1994) Functional status and coping in patients with oral and pharyngeal cancer before and after surgery. Head and Neck, 16: 559–68.
Larson, R. (1978) Thirty years of research on the subjective well-being of older Americans. Journal of Gerontology, 33: 109–25.
Lawton, M.P. (1972) The dimensions of morale, in D. Kent, R. Kastenbaum and S. Sherwood (eds) Research, Planning and Action for the Elderly. New York: Behavioral Publications.
Lawton, M.P. (1975) The Philadelphia Geriatric Center Morale Scale: a revision. Journal of Gerontology, 30: 85–9.
Lawton, M.P. (1991) Background. A multidimensional view of quality of life in frail elders, in J.E. Birren, J. Lubben, J. Rowe and D. Deutchman (eds) The Concept and Measurement of Quality of Life in the Frail Elderly. San Diego, CA: Academic Press.
Leff, J. (ed.) (1993) The TAPS project: evaluating community placement of long stay psychiatric patients. British Journal of Psychiatry, 162: 1–56 (suppl. 19).
Leff, J., O’Driscoll, C., Dayson, D. et al. (1990) The TAPS project: V. The structure of social network data obtained from long-stay patients. British Journal of Psychiatry, 157: 848–52.
Lehman, A. (1983) The well-being of chronic mental patients. Archives of General Psychiatry, 40: 369–73.
Lehman, A. (1988) A quality of life interview for the chronically mentally ill. Evaluation and Program Planning, 11: 51–62.
Leighton, A.H. (1959) My Name is Legion: Foundations for a Theory of Man in Relation to Culture. New York: Basic Books.
Leighton Read, J., Quinn, R.J. and Hoefer, M.A. (1987) Measuring overall health: an evaluation of three important approaches. Journal of Chronic Diseases, 40(Supplement 1): 7S–21S.
Leplege, A., Reveilere, C., Ecosse, E. et al. (2000) Psychometric properties of a new instrument for evaluating quality of life, the WHOQOL-26, in a population of patients with neuromuscular diseases. Encephale, 26: 13–22.
Lerner, M. (1973) Conceptualization of health and well-being. Health Services Research, 8: 6–12.
Lesher, E.L. (1986) Validation of the Geriatric Depression Scale among nursing home residents. Clinical Gerontology, 4: 21–8.
Levine, M.N., Guyatt, G.H., Gent, M. et al. (1988) Quality of life in stage II breast cancer: An instrument for clinical trials. Journal of Clinical Oncology, 6: 1798–810.
Le Fevre, P., Devereaux, J., Smith, J. et al. (1999) Screening for psychiatric illness in the palliative care inpatient setting: a comparison between the Hospital Anxiety and Depression Scale and the General Health Questionnaire. Palliative Medicine, 12: 399–407.
Lewis, C.A., Shevlin, M.E., Bunting, B.P. and Joseph, S. (1995) Confirmatory factor analysis of the Satisfaction with Life Scale: replication and methodological refinement. Journal of Perceptual and Motor Skills, 80: 304–6.
Lewis, G. and Wessely, S. (1990) Comparison of the General Health Questionnaire and the Hospital Anxiety and Depression Scale. British Journal of Psychiatry, 157: 860–4.
Li, Y. (2003) Social capital and social exclusion in England and Wales (1972–1999). British Journal of Sociology, 54: 497–526.
Liang, J. (1984) Dimensions of the life satisfaction Index A: a structural formulation. Journal of Gerontology, 39: 613–22.
Liang, J. and Bollen, K.A. (1983) The structure of the Philadelphia Geriatric Center Morale Scale: a reinterpretation. Journal of Gerontology, 30: 77–84.
Liang, J. and Bollen, K.A. (1985) Sex differences in the structure of the Philadelphia Geriatric Center Morale Scale. Journal of Gerontology, 40: 468–77.
Liang, M.H., Larson, M., Cullen, K. and Schwartz, J. (1985) Comparative measurement efficiency and sensitivity of five health status instruments for arthritis research. Arthritis and Rheumatism, 28: 524–47.
Liang, J., Bennett, J., Akiyama, H. and Maeda, D. (1992) The structure of the PGC Morale Scale in American and Japanese aged: a further note. Journal of Cross Cultural Gerontology, 7: 45–68.
Liddle, J., Gilleard, C. and Neil, A. (1993) Elderly patients’ and their relatives’ views on CPR (letter). Lancet, 342: 1055.
Liem, G.R. and Liem, J.H. (1978) Social support and stress. Some general issues and their application to the problems of unemployment. Unpublished manuscript. Boston College and University of Massachusetts.
Likert, R. (1952) A technique for the development of attitude scales. Educational and Psychological Measurement, 12: 313–15.
Lim, L.L. and Fisher, J.D. (1999) Use of the 12-item short form (SF-12) health survey in an Australian heart and stroke population. Quality of Life Research, 8: 1–8.
Lin, N., Simeone, R., Ensel, W. et al. (1979) Social support, stressful life events and illness, a model and an empirical test. Journal of Health and Social Behaviour, 20: 108–19.
Lin, Y.L. (2002) The role of perceived social support and dysfunctional attitudes in predicting Taiwanese adolescents’ depressive tendency. Adolescence, 37: 823–34.
Lindfors, P. (2002) Positive health in a group of Swedish white-collar workers. Psychological Reports, 91: 839–45.
Lintern, T.C., Beaumont, J.C., Kenealy, P.M. and Murrell, R.C. (2001) Quality of life (QoL) in severely disabled multiple sclerosis patients: comparison of three QoL measures using multidimensional scaling. Quality of Life Research, 10: 371–78.
Linzer, M., Pontinen, M., Gold, D. et al. (1991) Impairment of physical and psychological function in recurrent syncope. Journal of Clinical Epidemiology, 44: 1037–43.
Little, A., Hemsley, D., Bergman, K. et al. (1987) Comparison of the sensitivity of three instruments for the detection of cognitive decline in the elderly living at home. British Journal of Psychiatry, 150: 808–14.
Llobera, J., Esteva, M., Benito, E. et al. (2003) Quality of life for oncology patients during the terminal period. Validation of the HRCA-QL index. Support Care Cancer, 11: 294–303.
Lloyd-Williams, M., Friedman, T. and Rudd, N. (2001) An analysis of the validity of the Hospital Anxiety and Depression Scale as a screening tool in patients with advanced metastatic cancer. Journal of Pain and Symptom Management, 22: 990–6.
Lohmann, N. (1977) Correlations of life satisfaction, morale and adjustment measures. Journal of Gerontology, 32: 73–5.
Louks, J., Hayne, C. and Smith, J. (1989) Replicated factor structure of the Beck Depression Inventory. Journal of Nervous and Mental Disease, 177: 473–9.
Lovas, K., Kalo, Z., McKenna, S.P. et al. (2003) Establishing a standard for patient-completed instrument adaptations in eastern Europe: experience with the Nottingham Health Profile in Hungary. Health Policy, 63: 49–61.
Love, A., Loeboeuf, D.C. and Crisp, T.C. (1989) Chiropractic chronic low back pain sufferers and self-report assessment methods. Part 1. A reliability study of the Visual Analogue Scale, the pain drawing and the McGill Pain Questionnaire. Journal of Manipulative and Physiological Therapeutics, 12: 21–5.
Lowe, D.J. (1975) The Cornell Indices: A Bibliography of Health Questionnaires. New York: Cornell University Medical College Library, Reference and Information Services.
Lowe, N.K., Walker, S.N. and McCallum, R.C. (1991) Confirming the theoretical structure of the McGill Pain Questionnaire in acute clinical pain. Pain, 46: 53–60.
Lowenthal, M.F. and Haven, C. (1968) Interaction and adaptation: intimacy as a critical variable. American Sociological Review, 33: 20–30.
Lowrie, E.G., Curtin, R.B., Lepain, N. and Schatell, D. (2003) Medical outcomes study short form-36: a consistent and powerful predictor of morbidity and mortality in dialysis patients. American Journal of Kidney Diseases, 41: 1286–92.
Lubben, J.E. (1985) Health and psychological assessment instruments of community-based long term care: the California Multipurpose Senior Services Project (MSSP) experience. Dissertation, Berkeley, CA: University of California.
Lubben, J.E. (1988) Assessing social networks among elderly populations. Family and Community Health, 11: 42–52.
Lubeck, D.P. (2002) Health-related quality of life measurements and studies in rheumatoid arthritis. American Journal of Managed Care, 8: 811–20.
Lubeck, D.P. and Fries, J.F. (1992) Changes in quality of life among persons with HIV infection. Quality of Life Research, 1: 359–66.
Luo, N., Chew, L.H., Fong, K.Y. et al. (2003) A comparison of the EuroQol-5D and the Health Utilities Index mark 3 in patients with rheumatic disease. Journal of Rheumatology, 30: 2268–74.
Lyons, J.S., Strain, J.J., Hammer, J.S. et al. (1989) Reliability, validity, and temporal stability of the Geriatric Depression Scale in hospitalized elderly. International Journal of Psychiatry in Medicine, 19: 203–9.
Lyons, R.A., Perry, H.M. and Littlepage, B.N.C. (1994) Evidence for the validity of the short-form 36 questionnaire (SF-36) in an elderly population. Age and Ageing, 23: 182–4.
Lyons, R.A., Crome, P., Monaghan, S. et al. (1997) Health status and disability among elderly people in three UK districts. Age and Ageing, 26: 203–9.
Lyons, R.A., Wareham, K., Lucas, M. et al. (1999) SF-36 scores vary by method of administration: implications for study design. Journal of Public Health Medicine, 21: 41–5.
McColl, E., Steen, I.N., Meadows, K.A. et al. (1995) Developing outcome measures for ambulatory care – an application to asthma and diabetes. Special Issue ‘Quality of Life’ in Social Science and Medicine, 10: 1339–48.
McDowell, I. and Newell, C. (1996) Measuring Health: A Guide to Rating Scales and Questionnaires, 2nd edn. New York: Oxford University Press.
McGee, H.M., O’Boyle, C.A., Hickey, A. et al. (1991) Assessing the quality of life of the individual: The SEIQoL with a healthy and a gastroenterology unit population. Psychological Medicine, 21: 749–59.
McGee, M.A., Johnson, T., Kay, D.W.K. and the analysis group of the MRC CFAS study (1998) The descriptions of activities of daily living in five centres in England and Wales. Age and Ageing, 27: 605–13.
McGuire, B. and Tinsley, H.E.A. (1981) A contribution to the construct validity of the Tennessee Self-Concept Scale: a confirmatory factor analysis. Applied Psychological Measurement, 5: 449–57.
McHorney, C.A., Ware, J.E., Rogers, W. et al. (1992) The validity and relative precision of MOS short- and long-form health status scales and Dartmouth COOP Charts: results from the medical outcomes study. Medical Care, 30: MS253–MS265.
McHorney, C.A., Ware, J.E. and Raczek, A.E. (1993) The MOS 36-Item Short Form Health Survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Medical Care, 31: 247–63.
McHorney, C.A., Kosinski, M. and Ware, J.E. (1994) Comparisons of the costs and quality of norms for the SF-36 Health Survey collected by mail versus telephone interview: results from a national survey. Medical Care, 32: 551–67.
McKenna, S.P., Hunt, S.M. and McEwan, J. (1981) Weighting the seriousness of perceived health problems using Thurstone’s method of paired comparisons. International Journal of Epidemiology, 10: 93–7.
McKenna, S.P., McEwan, J., Hunt, S.M. et al. (1984) Changes in the perceived health of patients recovering from fractures. Public Health, 98: 97–102.
McMillan, S.C. (1996) Quality-of-life assessment in palliative care. Cancer Control, 3: 223–9.
McMurdo, M.E.T. and Rennie, L. (1993) A controlled trial of exercise by residents of old people’s homes. Age and Ageing, 22: 11–15.
McNeil, B.J., Weichselbaum, R. and Pauker, S.G. (1978) Fallacy of the five year survival in lung cancer. New England Journal of Medicine, 299: 1397–401.
McNeil, B.J., Weichselbaum, R. and Pauker, S.G. (1981) Speech and Survival: tradeoffs between quality and quantity of life in laryngeal cancer. New England Journal of Medicine, 305: 982–7.
McPherson, F.M., Gamsu, C.V., Kiemle, G. et al. (1985) The concurrent validity of the survey version of the Clifton Assessment Procedures for the Elderly (CAPE). British Journal of Clinical Psychology, 24: 83–91.
McQuay, H.J. (1990) Assessment of pain, and effectiveness of treatment, in A. Hopkins and D. Costain (eds) Measuring the Outcomes of Medical Care. London: Royal College of Physicians.
McWhirter, B.T. (1990) Factor analysis of the Revised UCLA Loneliness Scale. Current Psychology Research and Reviews, 9: 56–68.
McWilliam, C., Copeland, J.R.M., Dewey, M.E. et al. (1988) The geriatric mental state examination as a case finding instrument in the community. British Journal of Psychiatry, 152: 205–8.
Macduff, C. and Russell, E. (1998) The problem of measuring change in individual health-related quality of life by postal questionnaire: use of the patient generated index in a disabled population. Quality of Life Research, 7: 761–9.
Maes, S., Vingerhoets, A. and Van Heck, G. (1987) The study of stress and disease: some developments and requirements. Social Science and Medicine, 25: 567–78.
Magne, I.U., Ojehagen, A. and Traskman, B.L. (1992) The social network of people who attempt suicide. Acta Psychiatrica Scandinavica, 86: 153–8.
Mahon, N.E. and Yarcheski, A. (1990) The dimensionality of the UCLA Loneliness Scale in early adolescents. Research in Nursing and Health, 13: 45–52.
Mahon, N.E., Yarcheski, T.J. and Yarcheski, A. (1995) Validation of the revised UCLA Loneliness Scale for adolescents. Research in Nursing and Health, 18: 263–70.
Mahoney, F.I. and Barthel, D.W. (1965) Functional evaluation: the Barthel Index. Maryland State Medical Journal, 14: 61–5.
Maitland, S.B., Dixon, R.A., Hultsch, D.F. and Hertzog, C. (2001) Well-being as a moving target: measurement equivalence of the Bradburn Affect Balance Scale. Journal of Gerontology (B), 56: 69–77.
Makowska, Z. and Merecz, D. (2000) The usefulness of the Health Status Questionnaire: D. Goldberg’s GHQ-12 and GHQ-28 for diagnosis of mental disorders in workers. Medycyna Pracy, 51: 589–601.
Mangione, C.M., Marcantonio, E.R., Goldman, L. et al. (1993) Influence of age on measurement of health status in patients undergoing elective surgery. Journal of the American Geriatrics Society, 41: 377–83.
Spielberger State-Trait Anxiety Inventory (STAI). British Journal of Clinical Psychology, 31: 301–6.
Martin, A.J. (1987) Patients and presentation: a profile from general practice. Modern Medicine, April: 14–18.
Martin, C.R., Lewin, R.J., Thompson, D.R. et al. (2003) A confirmatory factor analysis of the Hospital Anxiety and Depression Scale in coronary care patients following acute myocardial infarction. Psychiatry Research, 30: 85–94.
Maslow, A.H. (1954) Motivation and Personality. New York: Harper.
Maslow, A.H. (1962) Toward a Psychology of Being, 2nd edn. Princeton, NJ: Van Nostrand.
Matsuura, E., Ohta, A., Kanegae, F. et al. (2003) Frequency and analysis of factors closely associated with the development of depressive symptoms in patients with scleroderma. Journal of Rheumatology, 30: 1782–7.
Mattison, P.G., Aitken, R.C.B. and Prescott, R.J. (1991) Rehabilitation status – the relationship between the Edinburgh Rehabilitation Status Scale (ERSS), Barthel Index and PULSES Profile. International Disability Studies, 13: 9–11.
Mauskopf, J., Austin, R., Dix, L. et al. (1994) The Nottingham Health Profile as a measure of quality of life in zoster patients: convergent and discriminant validity. Quality of Life Research, 3: 431–5.
Mechanic, D. (1962) The concept of illness behaviour. Journal of Chronic Diseases, 15: 189–94.
Mechanic, D. (1978) Medical Sociology, 2nd edn. New York: The Free Press.
Medical Outcomes Trust (1993) How to Score the SF-36 Health Survey. Boston, MA: Medical Outcomes Trust.
Meenan, R.F. (1982) The AIMS approach to health status measurement: conceptual background and measurement properties. Journal of Rheumatology, 9: 785–8.
Manne, S. and Schnoll, R. (2001) Measuring supportive Meenan, R.F. (1985) New approaches to outcome
and unsupportive responses during cancer treatment; assessment: the AIMS questionnaire for arthritis,
a factor analytic assessment of the Partner Responses in G.H. Stollerman (ed.) Advances in Internal Medicine,
to Cancer Inventory. Journal of Behavioral Medicine, vol. 31. New York: Year Book Medical Publishers.
24: 297–321. Meenan, R.F. and Mason, J.H. (1990) AIMS2 users’ guide.
Marks, N.F. and Lambert, J.D. (1999) Transitions to Boston, MA: Boston University School of Medicine,
caregiving, gender, and psychological well-being: Boston University Arthritis Center and Department
prospective evidence from the National Survey of of Public Health.
Families and Households. NSFH Working Paper No. Meenan, R.F. and Mason, J.H. (1994) AIMS2 users’ guide
82. Wisconsin: Center for Demography and Ecology, (revised). Boston, MA: Boston University School
University of Wisconsin-Madison. of Medicine, Boston University Arthritis Center and
Marl, J.D.J. and Williams, P. (1985) A comparison of the Department of Public Health.
validity of two psychiatric screening questionnaires Meenan, R.F., Gertman, P.M. and Mason, J.H. (1980)
(GHQ-12 and SRQ-20) in Brazil, using relative Measuring health status in arthritis: the arthritis
operating characteristics (ROC) analysis. Psychological impact measurement scales. Arthritis and Rheumatism,
Medicine, 15: 651–9. 23: 146–52.
Marteau, T.M. and Bekker, H. (1992) The development Meenan, R.F., Gertman, P.M., Mason, J.H. et al. (1982)
of a six-item short-form of the state scale of the The arthritis impact measurement scales: further
192 REFERENCES
investigations of a health status measure. Arthritis and Michalos, A.C., Hubley, A.M., Zumbo, B.D. et al. (2001)
Rheumatism, 25: 1048–53. Health and other aspects of the quality of life of older
Meenan, R.F., Anderson, J.J., Kazis, L.E. et al. (1984) people. Social Indicators Research, 54: 239–74.
Outcome assessment in clinical trials: evidence for Michaud, K., Messer, J., Choi, H.K. and Wolfe, F. (2003)
the sensitivity of a health status measure. Arthritis and Direct medical costs and their predictors in patients
Rheumatism, 27: 1344–52. with rheumatoid arthritis: a three year study of 7,527
Meenan, R.F., Mason, J.H., Anderson, J.J. et al. (1992) patients. Arthritis and Rheumatism, 48: 2750–62.
AIMS2. The content and properties of a revised and Milne, J.S., Maule, M.M., Cormack, S. et al. (1972) The
expanded Arthritis Impact Measurement Scales design and testing of a questionnaire and examination
Health Status Questionnaires. Arthritis and Rheumatism, to assess physical and mental health in older people
35: 1–10. using a staff nurse as the observer. Journal of Chronic
Mellor, K.S. and Edelmann, R.J. (1988) Mobility, Diseases, 25: 385–405.
social support, loneliness and well-being amongst Mitchell, J.C. (1969) The concept and use of social net-
two groups of older adults. Journal of Personality and works, in J.C. Mitchell (ed.) Social Networks in Urban
Individual Differences, 9: 1–5. Situations: Analysis of Personal Relationships in Central
Melzack, R. (1975) The McGill pain questionnaire: African Towns. Manchester: Manchester University
major properties and scoring methods. Pain, 1: Press.
277–99. Mitchell, R.E. and Trickett, E.J. (1980) Social networks
Melzack, R. (1983) Pain Measurement and Assessment. as mediators of social support: an analysis of the effects
New York: Raven Press. and determinants of social networks. Community
Melzack, R. (1987) The short-form McGill Pain Mental Health Journal, 16: 27–44.
Questionnaire. Pain, 30: 191–7. Moinpour, C.M., Lyons, B., Schmidt, S.P. et al. (2000)
Melzack, R. and Katz, J. (1992) The McGill Pain Substituting proxy ratings for patient ratings in
Questionnaire: appraisal and current status, in cancer clinical trials: an analysis based on Southwest
D.C. Turk and R. Melzack (eds) Handbook of Pain Oncology Group trial patients with brain metastases.
Assessment. New York: The Guilford Press. Quality of Life Research, 9: 219–31.
Melzack, R. and Torgerson, W.S. (1971) On the language Monk, M. (1981) Blood pressure awareness and psycho-
of pain. Anesthesiology, 34: 50. logical well-being in the Health and Nutrition
Melzack, R., Terrence, C., Fromm, G. and Amsel, Examination Survey. Clinical Investigative Medicine,
R. (1986) Trigeminal neuralgia and atypical face 4: 183–9.
pain: use of the McGill Pain Questionnaire for dis- Montgomery, S.A. and Asberg, M. (1979) A new depres-
crimination and diagnosis. Pain, 27: 297–302. sion scale designed to be sensitive to change. British
Merrell, M. and Reed, L.J. (1949) The Epidemiology of Journal of Psychiatry, 134: 382–9.
Health, Social Medicine, its Deviations and Objectives. Montgomery, S.A., Asberg, M., Traskman, L. and
New York: The Commonwealth Fund. Montgomery, D. (1978) Cross cultural studies on the
Messick, S. (1980) Test validity and the ethics of assess- use of the CPRS in English and Swedish depressed
ment. American Psychologist, 35: 1012–27. patients. Acta Psychiatrica Scandinavica, 271: 3–37
Metcalfe, M. and Goldman, E. (1965) Validation of an (suppl.).
inventory for measuring depression. British Journal Moorey, S., Greer, S., Watson, M. et al. (1991) The factor
of Psychiatry, 111: 240–2. structure and factor stability of the Hospital Anxiety
Meyboom-de Jong, B. and Smith, R.J.A. (1990) Studies and Depression Scale in patients with cancer. British
with the Dartmouth COOP Charts in general Journal of Psychiatry, 158: 255–9.
practice: comparison with the Nottingham Health Moos, R.H. and Moos, B.S. (1981) Manual for Family
Profile and the General Health Questionnaire, in Environment Scale. Palo Alto, CA: Consulting Psycho-
M. Lipkin (ed.) Functional Status Measurement in logists Press.
Primary Care. New York: Springer-Verlag. Moos, R.H. and Moos, B.S. (1994) Family Environment
Meyer-Rosenberg, K., Burckhardt, C.S., Huizar, K. et al. Scale (FES) and Manual, 3rd edn. Palo Alto, CA:
(2001) A comparison of the SF-36 and Nottingham Consulting Psychologists Press.
Health Profile in patients with chronic neuropathic Mor, V. (1987) Cancer patients’ quality of life over the
pain. European Journal of Pain, 5: 391–403. disease course: lessons from the real world. Journal
Michalos, A.C. (1986) Job satisfaction, marital satisfaction of Chronic Disease, 40: 535–44.
and the quality of life: a review and preview, in F.M. Mor, V., Laliberte, L., Morris, J.N. et al. (1984) The
Andrews (ed.) Research on the quality of life. Ann Karnofsky performance status scale: an examination
Arbor, MI: Survey Research Center, Institute for of its reliability and validity in a research setting.
Social Research, University of Michigan. Cancer, 53: 2002–7.
Moreno, J.K., Fuhriman, A. and Selby, M.J. (1993) Measurement of hostility, anger, and depression in depressed and non-depressed subjects. Journal of Personality Assessment, 61: 511–23.
Morgan, K., Dallosso, H.M., Arie, T. et al. (1987) Mental health and psychological well-being among the old and the very old living at home. British Journal of Psychiatry, 150: 801–7.
Morris, J.N. (1975) Changes in morale experienced by the elderly institutionalized applicants along the institutional path. Gerontologist, 15: 345–9.
Morris, J.N. and Sherwood, S. (1975) A re-testing and modification of the Philadelphia Geriatric Center Morale Scale. Journal of Gerontology, 30: 77–84.
Morris, J.N. and Sherwood, S. (1987) Quality of life of cancer patients at different stages in the disease trajectory. Journal of Chronic Disease, 40: 545–53.
Morris, J.N., Wolf, R.S. and Klerman, L.V. (1975) Common themes among morale and depression scales. Journal of Gerontology, 30: 209–15.
Morris, J.C.D., Suissa, A., Sherwood, S. et al. (1986) Last days: a study of the quality of life of terminally ill cancer patients. Journal of Chronic Disease, 39: 47–62.
Morris, L.W., Morris, R.G. and Britton, P.G. (1989) Social support networks and formal support as factors influencing the psychological adjustment of spouse caregivers of dementia sufferers. International Journal of Geriatric Psychiatry, 4: 47–51.
Morton-Williams, J. (1979) Alternative patterns of care for the elderly: Methodological report. London: Social and Community Planning Research.
Motzer, S.A., Hertig, V., Jarrett, M. et al. (2003) Sense of coherence and quality of life in women with and without irritable bowel syndrome. Nursing Research, 52: 329–37.
Mowbray, R.M. (1972) The Hamilton Rating Scale for Depression: A factor analysis. Psychological Medicine, 2: 272.
Mulder, P.H. and Sluijs, E.M. (1993) Dependent elderly. Quality of life indicators. Bibliography no. 48. The Netherlands: Netherlands Institute of Primary Health Care (NIVEL).
Mulder, R.T., Joyce, P.R. and Frampton, C. (2003) Relationships among measures of treatment outcome in depressed patients. Journal of Affective Disorders, 76: 127–35.
Mulgrave, N.W. (1985) Clifton Assessment Procedures for the elderly, in D.J. Keyser and R.C. Sweetland (eds) Test Critiques, vol. II. Kansas City, MO: Test Corporation of America.
Muller, M.J. and Dragicevic, A. (2003) Standardized rater training for the Hamilton Depression Rating Scale (HAMD-17) in psychiatric novices. Journal of Affective Disorders, 77: 65–9.
Mumford, D.B., Tareen, I.A., Bajwa, M.A. et al. (1991) The translation and evaluation of an Urdu version of the Hospital Anxiety and Depression Scale. Acta Psychiatrica Scandinavica, 83: 81–5.
Nagi, S. (1965) Some conceptual issues in disability and rehabilitation, in M. Sussman (ed.) Sociology and Rehabilitation. Washington, DC: American Sociological Society.
Nagi, S. (1991) Disability concepts revisited: implications for prevention, in A. Pope and A. Tarlov (eds) Disability in America: Towards a National Agenda for Prevention. Washington, DC: National Academy Press.
Nakayama, T., Toyoda, H., Ohno, K. et al. (2000) Validity, reliability and acceptability of the Japanese version of the General Well-Being Schedule. Quality of Life Research, 9: 529–39.
Nanda, U., McLendon, P.M., Andresen, E.M. and Armbrecht, E. (2003) The SIP68: an abbreviated Sickness Impact Profile for disability outcomes research. Quality of Life Research, 12: 583–95.
National Heart and Lung Institute (1976) Report of a task group on cardiac rehabilitation, in Proceedings of the Heart and Lung Institute Working Conference on Health Behaviour. Bethesda, MD: US Department of Health, Education and Welfare.
Naughton, M.J. and Wiklund, I. (1993) A critical review of dimension-specific measures of health related quality of life in cross-cultural research. Quality of Life Research, 2: 397–432.
Nayani, S. (1989) The evaluation of psychiatric illness in Asian patients by the HAD scale. British Journal of Psychiatry, 155: 545–7.
Nelson, E.C. and Berwick, D.M. (1989) The measurement of health status in clinical practice. Medical Care, 27 (Supplement to no. 3): S77–90.
Nelson, E.C., Conger, R., Douglas, D. et al. (1983) Functional health status levels of primary care patients. Journal of the American Medical Association, 249: 3331–8.
Nelson, E.C., Landgraf, J.M., Hays, R.D. et al. (1990a) The functional status of patients: how can it be measured in physicians’ offices? Medical Care, 28: 1111–26.
Nelson, E.C., Landgraf, R.D., Hays, J.W. et al. (1990b) The COOP Function Charts: a system to measure patient function in physician’s office, in WONCA Classification Committee: Functional status measurement in primary care. New York: Springer-Verlag.
Nelson, E.J., Wasson, J., Kirk, A. et al. (1987) Assessment of function in routine clinical practice: description of the COOP Chart method and preliminary findings. Journal of Chronic Diseases, 40 (Supplement 1): 55S–63S.
Nemeth, K.A., Graham, I.D. and Harrison, M.B. (2003) The measurement of leg ulcer pain: identification and appraisal of pain assessment tools. Advances in Skin and Wound Care, 16: 260–7.
Neto, F. and Barros, J. (2000) Psychosocial concomitants of loneliness among students of Cape Verde and Portugal. Journal of Psychology, 134: 503–14.
Neudert, C., Wasner, M. and Borasio, G.D. (2001) Patients’ assessment of quality of life instruments: a randomized study of SIP, SF-36 and SEIQoL-DW in patients with amyotrophic lateral sclerosis. Journal of the Neurological Sciences, 191: 103–9.
Neugarten, B.L., Havighurst, R.J. and Tobin, S.S. (1961) The measurement of life satisfaction. Journal of Gerontology, 16: 134–43.
Nocon, A. and Qureshi, H. (1996) Outcomes of Community Care for Users and Carers. A Social Services Perspective. Buckingham: Open University Press.
Noelker, L. and Harel, Z. (1978) Predictors of well-being and survival among institutionalized aged. Gerontologist, 18: 562–7.
Norris, J.T., Gallagher, D., Wilson, A. et al. (1987) Assessment of depression in geriatric medical outpatients: the validity of two screening measures. Journal of the American Geriatrics Society, 35: 989–95.
Nou, E. and Aberg, T. (1980) Quality of survival in patients with surgically treated bronchial carcinoma. Thorax, 35: 255–63.
Nouri, F.M. and Lincoln, N.B. (1987) An extended activities of daily living scale for stroke patients. Clinical Rehabilitation, 1: 301–5.
Novy, D.M., Nelson, D.V., Goodwin, J. and Rowze, R.D. (1993) Psychometric comparability of the State-Trait Anxiety Inventory for different ethnic subpopulations. Psychological Assessment, 5: 343–9.
Nunnally, J. (1978) Psychometric Theory, 2nd edn. New York: McGraw-Hill.
Nybo, H., Gaist, D., Jeune, B., McGue, M. et al. (2001) Functional status and self-rated health in 2,262 nonagenarians: the Danish 1905 cohort survey. Journal of the American Geriatrics Society, 49: 601–9.
Nydegger, C. (1986) Measuring morale and life satisfaction, in C.L. Fry and J. Keith (eds) New Methods for Old Age Research: Strategies for Studying Diversity. Boston, MA: Bergin and Garvey.
O’Boyle, C.A. (1996) Quality of life in palliative care, in G. Ford and I. Lewin (eds) Managing Terminal Illness. London: Royal College of Physicians Publications.
O’Boyle, C.A. (1997a) Measuring the quality of later life. Philosophical Transactions of the Royal Society of London, 352: 1871–9.
O’Boyle, C.A. (1997b) Quality of life assessment: a paradigm shift in healthcare? Irish Journal of Psychology, 18: 51–66.
O’Boyle, C.A., McGee, H., Hickey, A. et al. (1992) Individual quality of life in patients undergoing hip replacement. Lancet, 339: 1088–91.
O’Boyle, C.A., Browne, J., Hickey, A. et al. (1995) Schedule for the Evaluation of Individual Quality of Life (SEIQoL): a direct weighting procedure for quality of life domains (SEIQoL-DW). Administration manual. Dublin: Department of Psychology, Royal College of Surgeons in Ireland.
O’Brien, B.J. (1988) Assessment of treatment in heart disease, in G. Teeling Smith (ed.) Measuring Health: A Practical Approach. Chichester: John Wiley.
O’Brien, B.J., Banner, N.R., Gibson, S. et al. (1988) The Nottingham Health Profile as a measure of quality of life following combined heart and lung transplantation. Journal of Epidemiology and Community Health, 42: 232–4.
O’Brien, B.J., Spath, M., Blackhouse, G. et al. (2003) A view from the bridge: agreement between the SF-6D utility algorithm and the Health Utilities Index. Health Economics, 12: 975–81.
Office for National Statistics (2002) Living in Britain: Results from the 2000 General Household Survey. London: ONS web based publication: www.statistics.gov.uk/ssd/surveys/general-household-survey
Office of Population Censuses and Surveys (1987) General Household Survey (1985). London: HMSO.
Oga, T., Nishimura, K., Tsukino, M. et al. (2003) A comparison of the responsiveness of different generic health status measures in patients with asthma. Quality of Life Research, 12: 555–63.
Ohta, Y., Kawasaki, N., Araki, K. et al. (1995) The factor structure of the General Health Questionnaire (GHQ-30) in Japanese middle aged and elderly residents. International Journal of Social Psychiatry, 41: 268–75.
Oliver, J.P., Huxley, P.J., Priebe, S. and Kaiser, W. (1997) Measuring the quality of life of severely mentally ill people using the Lancashire Quality of Life Profile. Social Psychiatry and Psychiatric Epidemiology, 32: 76–83.
Olsen, O. (1992) Impact of social network on cardiovascular mortality in middle-aged Danish men. Journal of Epidemiology and Community Health, 47: 176–80.
Oman, D. and Reed, D. (1998) Religion and mortality among the community-dwelling elderly. American Journal of Public Health, 88: 1469.
O’Reilly, P. (1988) Methodological issues in social support and social network research. Social Science and Medicine, 26: 863–73.
O’Riordan, T.G., Haynes, J.P. and O’Neil, D. (1990) The effect of mild to moderate dementia on the Geriatric Depression Scale and on the General Health Questionnaire. Age and Ageing, 19: 57–61.
Orth-Gomer, K. and Johnson, J. (1987a) Social network interaction and mortality. A six-year follow-up study of a random sample of the Swedish population. Journal of Chronic Diseases, 40: 949–57.
Orth-Gomer, K. and Unden, A.L. (1987b) The measurement of social support in population surveys. Social Science and Medicine, 24: 83–94.
Orth-Gomer, K., Britton, M. and Rehnqvist, N. (1979) Quality of care in an out-patient department: the patient’s view. Social Science and Medicine, 13A: 347–57.
Osaka, R., Nanakorn, S. and Chusilp, K. (1998) Cornell Medical Index: a comparative study on health problems among Thai and Japanese nursing students. Southeast Asian Journal of Tropical Medicine and Public Health, 29: 293–8.
Osborn, D.P., Fletcher, A.E., Smeeth, L. et al. (2002) Geriatric Depression Scale scores in a representative sample of 14,545 people aged 75 and over in the United Kingdom: results from the MRC trial of assessment and management of older people in the community. International Journal of Geriatric Psychiatry, 17: 592.
Ott, C.R., Sivarajan, E.S., Newton, K.M. et al. (1983) A controlled randomized study of early cardiac rehabilitation: the Sickness Impact Profile as an assessment tool. Heart and Lung, 12: 162–70.
Ottawa Charter for Health Promotion (1986) Health Promotion, 14: iii–v.
Ottenbacher, K.S., Mann, W.G., Granger, C.V. et al. (1994) Inter-rater agreement and stability of functional assessment in the community based elderly. Archives of Physical Medicine and Rehabilitation, 75: 1297–301.
Oxman, T.E. and Berkman, L.F. (1990) Assessment of social relationships in elderly patients. International Journal of Psychiatry in Medicine, 20: 65–84.
Pallant, J.F. and Lae, L. (2002) Sense of coherence, well-being, coping and personality factors: further evaluation of the sense of coherence scale. Personality and Individual Differences, 33: 39–48.
Pandey, M., Singh, S.P., Behere, P.B. et al. (2000) Quality of life in patients with early and advanced carcinoma of the breast. European Journal of Surgical Oncology, 26: 20–4.
Parker, R.D., Flint, E.P., Bosworth, H.B. et al. (2003) A three-factor analytic model of the MADRS in geriatric depression. International Journal of Geriatric Psychiatry, 18: 73–7.
Parker, S.G., Du, X., Bardsley, M.J. et al. (1994) Measuring outcomes in care of the elderly. Journal of the Royal College of Physicians of London, 28: 428–33.
Patrick, D.L. (ed.) (1982) Health and Care of the Physically Disabled in Lambeth. Report of Phase II of the Longitudinal Disability Interview Survey. London: St Thomas’s Hospital Medical School, Department of Community Medicine.
Patrick, D.L. (2003) Patient-reported outcomes (PROs). An organizing tool for concepts, measures and applications. Quality of Life Newsletter, 31: 1–5.
Patrick, D.L. and Erickson, P. (1993) Health Status and Health Policy. Quality of Life in Health Care Evaluation and Resource Allocation. New York: Oxford University Press.
Patrick, D.L., Bush, J.W. and Chen, M.M. (1973a) Methods for measuring levels of well-being for a health status index. Health Services Research, 11: 516.
Patrick, D.L., Bush, J.W. and Chen, M.M. (1973b) Toward an operational definition of health. Journal of Health and Social Behaviour, 14: 6–23.
Pattie, A.H. (1981) A survey version of the Clifton Assessment procedures for the Elderly (CAPE). British Journal of Clinical Psychology, 20: 173–8.
Pattie, A.H. and Gilleard, C.J. (1975) A brief psychogeriatric assessment schedule. Validation against psychiatric diagnosis and discharge from hospital. British Journal of Psychiatry, 127: 489–93.
Pattie, A.H. and Gilleard, C.J. (1979) Manual of the Clifton Assessment Procedures for the Elderly. Sevenoaks: Hodder and Stoughton.
Pavot, W. and Diener, E. (1993) Review of the Satisfaction with Life Scale. Psychological Assessment, 5: 164–72.
Pavot, W., Diener, E., Colvin, C.R. and Sandvik, E. (1991) Further validation of the Satisfaction with Life Scale: evidence for the cross-method convergence of well-being measures. Journal of Personality Assessment, 57: 149–61.
Paykel, E.S. (1985) Clinical interview for depression: development, reliability and validity. Journal of Affective Disorders, 9: 85–96.
Payne, R.L. and Graham Jones, J. (1987) Measurement and methodological issues in social support, in S.V. Kasl and C.L. Cooper (eds) Stress and Health: Issues in Research Methodology. Chichester: John Wiley.
Pearlin, L.I. and Schooler, C. (1978) The structure of coping. Journal of Health and Social Behavior, 19: 2–21.
Penley, J.A., Wiebe, J.S. and Nwosu, A. (2003) Psychometric properties of the Spanish Beck Depression Inventory-II in a medical sample. Psychological Assessment, 15: 569–77.
Perlman, D. and Peplau, L.A. (1981) Toward a social psychology of loneliness, in R. Gilmour and S. Duck (eds) Personal Relationships: 3. Personal Relationships in Disorder. London: Academic Press.
Perlman, R.A. (1987) Development of a functional assessment questionnaire for geriatric patients: the comprehensive older persons’ evaluation (COPE). Journal of Chronic Diseases, 40: 85S–94S, Supplement.
Perloff, J.M. and Persons, J.B. (1988) Biases resulting from the use of indexes: an application to attributional style and depression. Psychological Bulletin, 103: 95–104.
Permanyer-Miralda, G., Alonso, J., Anto, J.M. et al. (1991) Comparison of perceived health status and conventional functional evaluation in stable patients with coronary artery disease. Journal of Clinical Epidemiology, 44: 779–86.
Persson, R. and Orbaek, P. (2003) The influence of personality traits on neuropsychological test performance and self-reported health and social contacts in women. Personality and Individual Differences, 34: 295–313.
Pettit, T., Livingston, G., Manela, M. et al. (2001) Validation and normative data of health status measures in older people: the Islington study. International Journal of Geriatric Psychiatry, 16: 1061–70.
Pfeiffer, B.A., McClelland, T. and Lawson, J. (1989) Use of the Functional Assessment Inventory to distinguish among the rural elderly in five service settings. Journal of the American Geriatrics Society, 37: 243–8.
Pfeiffer, E. (1975) A short portable mental status questionnaire for the assessment of organic brain deficit in elderly patients. Journal of the American Geriatrics Society, 23: 433–41.
Pibernik-Okanovic, M. (2001) Psychometric properties of the World Health Organization quality of life questionnaire (WHOQOL-100) in diabetic patients in Croatia. Diabetes Research in Clinical Practice, 51: 133–43.
Pierce, G.R., Sarason, I.G. and Sarason, B.R. (1991) General and relationship-based perceptions of social support: are two constructs better than one? Journal of Personality and Social Psychology, 61: 1028–39.
Pincus, T., Summey, J.A., Soraci, S.A. et al. (1983) Assessment of patient satisfaction in activities of daily living using a modified Stanford Health Assessment Questionnaire. Arthritis and Rheumatism, 26: 1346–53.
Pollard, W.E., Bobbitt, R.A., Bergner, M. et al. (1976) The Sickness Impact Profile: reliability of a health status measure. Medical Care, 14: 57–67.
Pollard, W.E., Bobbitt, R.A. and Bergner, M. (1978) Examination of variable errors of measurement in a survey-based social indicator. Social Indicators Research, 5: 279–301.
Pomeroy, E., Cook, B. and Benjafield, J. (1992) Perceived social support in three residential contexts. Canadian Journal of Community Mental Health, 11: 101–7.
Potts, M.K., Daniels, M., Burnam, A. and Wells, K.B. (1990) A structured interview version of the Hamilton Depression Rating Scale: Evidence of reliability and versatility of administration. Journal of Psychiatric Research, 24: 335–50.
Power, M., Harper, A., Bullinger, M. and the WHO Quality of Life Group (1999) The World Health Organization WHOQOL-100: tests of the universality of quality of life in 15 different cultural groups worldwide. Health Psychology, 18: 495–505.
Pretorius, T.B. and Diedricks, M. (1993) A factorial investigation of the dimensions of social support. South African Journal of Psychology, 23: 32–5.
Priestman, T.J. and Baum, M. (1976) Evaluation of quality of life in patients receiving treatment for advanced breast cancer. Lancet, i: 899–901.
Prieto, E.J. and Geisinger, K.F. (1983) Factor analytic studies of the McGill Pain Questionnaire, in R. Melzack (ed.) Pain Measurement and Assessment. New York: Raven Press.
Prince, M.J., Harwood, R.H., Blizzard, R.A. et al. (1997) Impairment, disability and handicap as risk factors for depression in old age. The Gospel Oak Project. Psychological Medicine, 27: 311–21.
Prince, M.J., Harwood, R.H., Thomas, A. et al. (1998) A prospective population based cohort study of the effects of disablement and social milieu on the onset and maintenance of late life depression. The Gospel Oak Project. Psychological Medicine, 28: 337–50.
Procidano, M.E. and Heller, K. (1983) Measures of perceived social support from friends and from family: three validation studies. American Journal of Community Psychology, 11: 1–24.
Putnam, M. (2002) Linking aging theory and disability models: increasing the potential to explore aging with physical impairment. The Gerontologist, 42: 799–806.
Putnam, R.D. (1995) Bowling alone: America’s declining social capital. Journal of Democracy, 6: 65–78.
Putnam, R.D. (2000) Bowling Alone. The Collapse and Revival of American Community. New York: Simon and Schuster.
Pyne, J.M., Sieber, W.J., David, K. et al. (2003) Use of the quality of well-being self administered version (QWB-SA) in assessing health related quality of life in depressed patients. Journal of Affective Disorders, 76: 237–47.
Quintana, J.M., Padierna, A., Esteban, C. et al. (2003) Evaluation of the psychometric characteristics of the Spanish version of the Hospital Anxiety and Depression Scale. Acta Psychiatrica Scandinavica, 107: 216–21.
Qureshi, K.N. and Hodkinson, H.M. (1974) Evaluation of a 10 question mental test in the institutionalized elderly. Age and Ageing, 3: 152–7.
Radloff, L.S. (1977) Sex differences in depression: the effects of occupation and marital status. Sex Roles, 1: 249–65.
Radosevich, D.M. and Husnik, M.J. (1995) An abbreviated health status questionnaire: the HSQ-12. Update. Bloomington, IN: Newsletter of the Health Outcomes Institute, 2: 1–4.
Radosevich, D.M. and Pruitt, M.J.H. (1995) HSQ-12 Cooperative validation project: phase 1 reliability, validity and comparability. Update. Bloomington, IN: Newsletter of the Health Outcomes Institute, 2: 3.
Ramey, D.R., Raynauld, J.P. and Fries, J.F. (1992) The Health Assessment Questionnaire 1992. Arthritis Care and Research, 5: 119–29.
depression in elderly people. British Journal of Clinical Psychology, 35: 543–51.
Ramey, D.R., Fries, J.F. and Singh, G. (1996) The Health Assessment Questionnaire 1995 – Status and review, in B. Spilker (ed.) Pharmacoeconomics and Quality of Life in Clinical Trials, 2nd edn. Philadelphia: Lippincott-Raven.
Ranhoff, A.H. and Laake, K. (1993) The Barthel ADL Index: scoring by the physician from patient interview is not reliable. Age and Ageing, 22: 171–4.
Rankin, J. (1957) Cerebral vascular accidents in people over the age of 60. II. Prognosis. Scottish Medical Journal, 2: 200–15.
Ranzijn, R. and Luszcz, M. (2000) Measurement of subjective quality of life in elders. International Journal of Aging and Human Development, 50: 263–78.
Raskin, A. (1986) Sensitivity to treatment effects of evaluation instruments completed by psychiatrists, psychologists, nurses, and patients, in N. Sartorius and T. Ban (eds) Assessment of Depression. Berlin: Springer.
Read, L.J., Quinn, R.J. and Hoefer, M.A. (1987) Measuring overall health: an evaluation of three important approaches. Journal of Chronic Diseases, 40: 7S–21S.
Reading, A.E. (1979) The internal structure of the McGill Pain Questionnaire in dysmenorrhoea patients. Pain, 7: 353–8.
Reading, A.E., Everitt, B.S. and Sledmere, C.M. (1982) The McGill Pain Questionnaire: a replication of its construction. British Journal of Clinical Psychology, 21: 339–49.
Reed, P.F., Fitts, W.H. and Boehm, L. (1980) Tennessee Self Concept Scale: Bibliography of Research Studies. Nashville, TN: Councillor Recordings and Tests.
Rehm, L.P. (1981) Behaviour Therapy for Depression. New York: Academic Press.
Renne, K.S. (1974) Measurement of social health in a general population survey. Social Science Research, 3: 25–44.
Rettig, K.D. and Leichtentritt, R.D. (1999) A general theory for perceptual indicators of family life quality. Social Indicators Research, 47: 307–42.
Revicki, D.A. and Kaplan, R.M. (1993) Relationship between psychometric and utility-based approaches to the measurement of health related quality of life. Quality of Life Research, 2: 477–87.
Reynolds, W.M. and Gould, J.W. (1981) A psychometric investigation of the standard and short form Beck Depression Inventory. Journal of Consulting and Clinical Psychology, 49: 306–7.
Riazi, A., Hobart, J.C., Lamping, D.L. et al. (2003) Evidence-based measurement in multiple sclerosis: the psychometric properties of the physical and
Richter, P., Werner, J., Heerlein, A. et al. (1998) On the validity of the Beck Depression Inventory. A review. Psychopathology, 46: 34–43.
Riggio, R.E., Watring, K.P. and Throckmorton, B. (1993) Social skills, social support, and psychosocial adjustment. Personality and Individual Differences, 15: 275–80.
Rivera, P.A., Rose, J.M., Futterman, A. et al. (1991) Dimensions of perceived support in clinically depressed and non-depressed female caregivers. Psychology and Aging, 6: 232–7.
Robins, L.N., Helzer, J.E., Croughan, J.L. and Ratcliff, K. (1981) The NIMH diagnostic interview schedule: Its history, characteristics and validity, in J.K. Wing, P. Bebbington and L.N. Robins (eds) What is a Case? The Problem of Definition in Psychiatric Community Surveys. London: Grant MacIntyre.
Robinson, J.P. and Shaver, P.R. (1973) Measures of Social Psychological Attitudes. Ann Arbor, MI: Survey Research Centre, Institute for Social Research.
Robinson, R.A. (1968) The organisation of a diagnostic and treatment unit for the aged, in UK Geigy, Psychiatric Disorders in the Aged. Manchester: World Psychiatric Association.
Rodgers, H., Curless, R. and James, O.F.W. (1993) Standardized functional assessment scales for elderly patients. Age and Ageing, 22: 161–3.
Rogerson, R.J. (1995) Environmental and health-related quality of life: conceptual and methodological similarities. Social Science and Medicine, 41: 1373–82.
Rogerson, R.J., Findlay, A.M., Coombes, M.G. and Morris, A. (1989) Indicators of quality of life. Environment and Planning, 21: 1655–66.
Roid, G.H. and Fitts, W.H. (1988) Tennessee Self Concept Scale (TSCS). Los Angeles, CA: Western Psychological Services.
Rosenberg, M. (1965) Society and the Adolescent Self Image. Princeton, NJ: Princeton University Press.
Rosenberg, M. (1986) Conceiving the Self, 2nd edn. Malabar, FL: Krieger.
Rosenberg, R. (1995) Health-related quality of life between naturalism and hermeneutics. Special Issue ‘Quality of Life’ in Social Science and Medicine, 10: 1411–15.
Ross, C.E. and Mirowsky, J. (2001) Neighbourhood disadvantage, disorder and health. Journal of Health and Social Behavior, 42: 258–76.
Rosser, R.M. and Watts, V.C. (1971) The sanative outputs of hospitals. Dallas, 39th conference of the
psychological dimensions of three quality of life rating Operational Research Society of America.
scales. Multiple Sclerosis, 9: 411–19. Rosser, R.M. and Watts, V.C. (1972) The measurement
Richardson, C.A. and Hammond, S.M. (1996) A of hospital output. International Journal of Epidemiology,
psychometric analysis of a short device for assessing 1: 361–8.
Roth, M., Tym, E., Mountjoy, C.Q. et al. (1986) CAMDEX: a standardized instrument for the diagnosis of mental disorder in the elderly with special reference to the early detection of dementia. British Journal of Psychiatry, 149: 698–709.
Ruini, C., Ottolini, F., Rafanelli, C. et al. (2003) The relationship of psychological well-being to distress and personality. Psychotherapy and Psychosomatics, 72: 268–75.
Russell, D. (1982) The measurement of loneliness, in L.A. Peplau and D. Perlman (eds) Loneliness: A Sourcebook of Current Theory, Research and Therapy. New York: John Wiley.
Russell, D. (1996) UCLA Loneliness Scale (version 3): reliability, validity, and factor structure. Journal of Personality Assessment, 66: 20–40.
Russell, D. and Cutrona, C.E. (1991) UCLA Loneliness Scale version 3, in J.P. Robinson, P.R. Shaver and L.S. Wrightsman (eds) Measures of Personality and Social Psychological Attitudes. San Diego, CA: Academic Press.
Russell, D., Peplau, L.A. and Ferguson, M.L. (1978) Developing a measure of loneliness. Journal of Personality Assessment, 42: 290–4.
Russell, D., Peplau, L.A. and Cutrona, C.E. (1980a) The revised UCLA Loneliness Scale: concurrent and discriminant validity evidence. Journal of Personality and Social Psychology, 39: 472–80.
Russell, D., Peplau, L.A. and Cutrona, C.E. (1980b) Revised UCLA Loneliness Scale version 3 (RULS), in K. Corcoran and J. Fischer (eds) Measures for Clinical Practice: a Sourcebook. Vol. 2. New York: Free Press.
Ruta, D.A. (1992) A new approach to the measurement of quality of life. The patient generated index. Paper presented to the Workshop on Quality of Life, Society for Social Medicine 36th Annual Conference, Nottingham, September.
Ruta, D.A., Abdalla, M.I., Garratt, A.M. et al. (1994a) SF-36 health survey questionnaire: I. Reliability in two patient based studies. Quality in Health Care, 3: 180–5.
Ruta, D.A., Garratt, A.M., Leng, M. et al. (1994b) A new approach to the measurement of quality of life: the patient-generated index. Medical Care, 32: 1109–26.
Ruta, D.A., Garratt, A.M. and Russell, I.T. (1999) Patient centred assessment of quality of life for patients with four common conditions. Quality in Health Care, 8: 22–9.
Rutledge, T., Matthews, K., Lui, L.Y. et al. (2003) Social networks and marital status predict mortality in older women: prospective evidence from the study of osteoporotic fractures (SOF). Psychosomatic Medicine, 65: 688–94.
Ryan, J.M., Corry, J.R., Attewell, R. and Smithsen, M.J. (2002) A comparison of the electronic version of the SF-36 General Health Questionnaire to the standard paper version. Quality of Life Research, 11: 19–26.
Ryff, C.D. (1989) Beyond Ponce de Leon and Life satisfaction: New directions in quest of successful aging. International Journal of Behavioral Development, 12: 35–55.
Ryff, C.D. (1995) Psychological well-being in adult life. Current Directions in Psychological Science, 4: 99–104.
Ryff, C.D. and Essex, M.J. (1991) Psychological well-being in adulthood and old age: descriptive markers and explanatory processes, in K. Warner Schaie and M. Powell Lawton (eds) Annual Review of Gerontology and Geriatrics, vol. 11: 144–71.
Ryff, C.D. and Keyes, C.L.M. (1995) The structure of psychological well-being revisited. Journal of Personality and Social Psychology, 69: 719–27.
Sackett, D.L., Spitzer, W.O., Gent, M. et al. (1974) The Burlington Randomized Trial of the nurse practitioner: health outcomes of patients. Annals of Internal Medicine, 80: 137–42.
Sainsbury, S. (1973) Measuring Disability. London: Bell.
Salaffi, F., Piva, S., Barreca, C. et al. (2000) Validation of an Italian version of the arthritis impact measurement scales 2 (ITALIAN-AIMS2) for patients with osteoarthritis of the knee. Gonarthrosis and quality of life assessment (GOQOLA) Study Group. Rheumatology, 39: 720–7.
Salek, S. (1999) Compendium of Quality of Life Instruments. Chichester: John Wiley and Sons.
Salyers, M.P., Bosworth, H.B., Swanson, J.W. et al. (2000) Reliability and validity of the SF-12 health survey among people with severe mental illness. Medical Care, 38: 1141–50.
Sandler, I.N. and Barrera, M. (1984) Towards a multi-method approach to assessing the effects of social support. American Journal of Community Psychology, 12: 37–52.
Sarason, B.R., Sarason, I.G., Hacker, T.A. and Basham, R.B. (1985) Concomitants of social support: social skills, physical attractiveness, and gender. Journal of Personality and Social Psychology, 49: 469–80.
Sarason, B.R., Shearin, E.N., Pierce, G.R. and Sarason, I.G. (1987a) Interrelationships of social support measures: theoretical and practical implications. Journal of Personality and Social Psychology, 52: 813–32.
Sarason, I.G., Levine, H.M., Basham, R.B. and Sarason, B.R. (1983) Assessing social support: the social support questionnaire. Journal of Personality and Social Psychology, 44: 127–39.
Sarason, I.G., Sarason, B.R., Shearin, E.N. and Pierce, G.R. (1987b) A brief measure of social support. Practical and theoretical implications. Journal of Social and Personal Relationships, 4: 497–510.
Sarason, I.G., Sarason, B.R. and Pierce, G.R. (1994) Social support: global and relationship-based levels of analysis. Journal of Social and Personal Relationships, 11: 295–312.
Sarasqueta, C., Bergareche, A., Arce, A. et al. (2001) The validity of Hodkinson’s Abbreviated Mental Test for dementia screening in Guipuzcoa, Spain. European Journal of Neurology, 8: 435–40.
Saronson, S.B., Carroll, C., Maton, K. et al. (1977) Human Services and Resource Networks. San Francisco, CA: Jossey-Bass.
Sarvimäki, A. (1999) What do we mean by ‘quality of life’ in our care for people with dementia? Journal of Dementia Care, January/February: 35–7.
Sarvimäki, A. and Stonbock-Hult, B. (2000) Quality of life in old age described as a sense of well-being, meaning and value. Journal of Advanced Nursing, 32: 1025–33.
Sauer, W.J. and Warland, R. (1982) Morale and life satisfaction, in D.J. Mangen and W.A. Peterson (eds) Research Instruments in Social Gerontology, Vol. 1. Clinical and social psychology. Minneapolis, MN: University of Minnesota Press.
Saunders, P.A., Copeland, J.R., Dewey, M.E. et al. (1989) Alcohol use and abuse in the elderly: findings from the Liverpool longitudinal study of continuing health in the community. International Journal of Geriatric Psychiatry, 4: 103–8.
Sayer, N.A., Sackheim, H.A., Moeller, J.R. et al. (1993) The relations between observer-rating and self-report of depressive symptomatology. Journal of Psychological Assessment, 5: 350–60.
Schaafsma, J. and Osoba, D. (1994) The Karnofsky performance status scale re-examined: a crossvalidation with the EORTC-C30. Quality of Life Research, 3: 413–24.
Schag, C.A.C., Heinrich, R.L. and Ganz, P.A. (1984) Karnofsky Performance Status revisited: reliability, validity and guidelines. Journal of Clinical Oncology, 2: 187–93.
Scheier, M.F. and Carver, C.S. (1985) Optimism, coping and health: Assessment and implications of generalised outcome expectancies. Health Psychology, 4: 219–47.
Schmitz, N., Kugler, J. and Rollnik, J. (2003) On the relation between neuroticism, self-esteem, and depression: results from the National Comorbidity Survey. Comprehensive Psychiatry, 44: 169–76.
Schmutte, P.S. and Ryff, C.D. (1997) Personality and well-being: re-examining methods and meanings. Journal of Personality and Social Psychology, 73: 549–59.
Schneiderman, L.J., Kaplan, R.M., Pearlman, R.A. et al. (1993) Do physicians’ own preferences for life-sustaining treatment influence their perceptions of patients’ preferences? Journal of Clinical Ethics, 4: 28–33.
Schnurr, P.P., Spiro, A., Aldwin, C.M. and Stukel, T.A. (1998) Physical symptom trajectories following trauma exposure: longitudinal findings from the normative aging study. Journal of Nervous and Mental Disease, 186: 522–8.
Schoenbach, V., Kaplan, B.H., Fredman, L. and Kleinbaum, D.G. (1986) Social ties and mortality in Evans County, Georgia. American Journal of Epidemiology, 123: 577–91.
Scholten, J.H.G. and van Weel, C. (1992) Manual for the use of the Dartmouth COOP Functional Health Assessment Charts/WONCA in measuring functional status in family practice (Part I), in J.H. Scholten and C. van Weel (eds) Functional Status Assessment in Family Practice. Lelystad: Meditekst.
Scholzel-Dorenbos, C.J. (2000) Measurement of quality of life in patients with dementia of Alzheimer type and their caregivers: Schedule for the Evaluation of Individual Quality of Life (SEIQoL). Tijdschr Gerontology and Geriatrics, 31: 23–6.
Schuling, J. and Meyboom-de Jong, B. (1992) Change in clinical status in patients with stroke, in J.H. Scholten and C. van Weel (eds) Functional Status Assessment in Family Practice. Lelystad: Meditekst.
Schuling, J., Greidanus, J. and Meyboom-de Jong, B. (1993) Measuring functional status of stroke patients with the Sickness Impact Profile. Disability and Rehabilitation, 15: 19–23.
Schumaker, J.F., Shea, J.D., Monfries, M.M. and Groth-Marnat, G. (1993) Loneliness and life satisfaction in Japan and Australia. Journal of Psychology, 127: 65–71.
Schwab, J.J., Brolow, M.R. and Holser, C.E. (1967) A comparison of two rating scales for depression. Journal of Clinical Psychology, 23: 94–6.
Schwartz, A.N. (1975) An observation of self-esteem as the linchpin of quality of life for the aged. An essay. Gerontologist, 15: 470–2.
Schwartz, N. and Strack, F. (1999) Reports of subjective wellbeing: judgemental processes and their methodological implications, in D. Kahneman, E. Diener and N. Schwartz (eds) Wellbeing: the Foundations of Hedonic Psychology. New York: Russell Sage Foundation.
Scott, P.J., Ansell, B.M. and Huskisson, E.C. (1977) The measurement of pain in juvenile chronic polyarthritis. Annals of the Rheumatic Diseases, 36: 186–7.
Scottish Health Education Group (1984) European Monographs in Health Education Research, no. 6. Edinburgh: Scottish Health Education Group.
Sedrakyan, A., Vaccarino, V., Paltiel, A.D. et al. (2003) Age does not limit quality of life improvement in cardiac valve surgery. Journal of the American College of Cardiology, 42: 1215–17.
Seedhouse, D. (1986) Health: the Foundations of Achievement. Chichester: John Wiley.
Seeman, T.E. and Berkman, L.F. (1988) Structural characteristics of social networks and their relationship with social support in the elderly: who provides support? Social Science and Medicine, 26: 737–49.
Seeman, T.E., Kaplan, G.A., Knudsen, L. et al. (1987) Social network ties and mortality among the elderly in the Alameda County study. American Journal of Epidemiology, 126: 714–23.
Selai, C.E., Elstner, K. and Trimble, M.R. (2000) Quality of life pre and post epilepsy surgery. Epilepsy Research, 38: 67–74.
Seligman, M. (2002) Authentic Happiness: Using the New Positive Psychology to Realize Potential for Lasting Fulfilment. New York: Free Press.
Seymour, D.G., Ball, A.E., Russell, E.M. et al. (2001) Problems in using health survey questionnaires in older patients with physical disabilities. The reliability and validity of the SF-36 and the effect of cognitive impairment. Journal of the Evaluation of Clinical Practice, 7: 411–18.
Shah, S., Frank, V. and Cooper, V. (1989) Improving the sensitivity of the Barthel Index for stroke rehabilitation. Journal of Clinical Epidemiology, 42: 703–9.
Shanas, E., Townsend, P., Wedderburn, D. et al. (1968) Old People in Three Industrial Societies. London: Routledge and Kegan Paul.
Shaver, P.R. and Brennan, K.A. (1991) Measures of depression and loneliness, in J.P. Robinson, P.R. Shaver and L.S. Wrightsman (eds) Measures of Personality and Social Psychological Attitudes. San Diego, CA: Academic Press.
Sheikh, R.L. and Yesavage, J.A. (1986) Geriatric Depression Scale (GDS): recent evidence and development of a shorter version. Clinical Gerontologist, 5: 165–73.
Sheikh, R.L., Yesavage, J.A., Brooks, J.O. et al. (1991) Proposed factor structure of the Geriatric Depression Scale. International Psychogeriatrics, 3: 23–8.
Sherbourne, C.D. and Hays, R.D. (1990) Marital status, social support and health transitions in chronic disease patients. Journal of Health and Social Behaviour, 31: 328–43.
Sherbourne, C.D. and Stewart, A.L. (1991) The MOS Social Support Survey. Social Science and Medicine, 32: 705–14.
Sherbourne, C.D., Meredith, L.S., Rogers, W. and Ware, J.E. (1992) Social support and stressful life events: Age differences in their effects on health related quality of life among the chronically ill. Quality of Life Research, 1: 235–46.
Sherwood, S.J., Morris, J., Mor, V. and Gutkin, C. (1977) Compendium of Measures for Describing and Assessing Long Term Care Populations. Boston, MA: Hebrew Rehabilitation Center for the Aged.
Shiely, J.C., Bayliss, M.S., Keller, S.D. et al. (1996) SF-36 Health Survey. Annotated bibliography (1988–1995). Boston, MA: New England Medical Center.
Shin, D.C. and Johnson, D.M. (1978) Avowed happiness as an overall assessment of the quality of life. Social Indicators Research, 5: 475–92.
Siegel, M., Bradley, E.H. and Kasl, S.V. (2003) Self-rated life expectancy as a predictor of mortality: evidence from the HRS and AHEAD surveys. Gerontology, 49: 265–71.
Silber, E. and Tippett, J. (1965) Self esteem: clinical assessment and measurement validation. Psychological Reports, 16: 1017–71.
Silverstone, P.H., Entsuah, R. and Hacket, D. (2002) Two items on the Hamilton Depression rating scale are effective predictors of remission: comparison of selective serotonin reuptake inhibitors with the combined serotonin/norepinephrine reuptake inhibitor, venlafaxine. International Clinical Psychopharmacology, 17: 273–80.
Sims, A.C.P. and Salmons, P.H. (1975) Severity of symptoms of psychiatric out-patients: use of the General Health Questionnaire in hospital and general practice patients. Psychological Medicine, 5: 62–6.
Singer, E., Garfinkel, R., Cohen, S.M. et al. (1976) Mortality and mental health: evidence from the midtown Manhattan re-study. Social Science and Medicine, 10: 517–21.
Sinha, S.P., Nayyar, P. and Sinha, S.P. (2002) Social support and self-control as variables in attitude towards life and perceived control among older people in India. Journal of Social Psychology, 142: 527–40.
Sitjas, M.E., San Jose, L.A., Armadans, G.L., Mundet, T.X. and Vilardell, T.M. (2003) Predictor factors about functional decline in community-dwelling older persons. Atencion Primaria, 32: 282–7.
Skevington, S.M. (1999) Measuring quality of life in Britain: introducing the WHOQOL-100. Journal of Psychosomatic Research, 47: 449–59.
Skevington, S.M., Carse, M.S. and Williams, de C. (2001) Validation of the WHOQOL-100: pain management improves quality of life for chronic pain patients. Clinical Journal of Pain, 17: 264–75.
Skevington, S.M., Lotfy, M. and O’Connell, K.A. (2004) The World Health Organization’s WHOQOL-BREF quality of life assessment: psychometric properties and results of international field trials. A report from the WHOQOL Group. Quality of Life Research, 13: 299–310.
Skinner, D.E. and Yett, D.E. (1972) Debility index for long-term care patients, in R.L. Berg (ed.) Health Status Indexes. Chicago, IL: Hospital Research and Education Trust.
Slevin, M.L., Plant, H., Lynch, D. et al. (1988) Who should measure quality of life, the doctor or the patient? British Journal of Cancer, 57: 109–12.
Sloan, J.A., Loprinzi, C.L., Kuross, S.A. et al. (1998) Randomized comparison of four tools measuring overall quality of life of patients with advanced cancer. Journal of Clinical Oncology, 16: 3662–73.
Smith, A. (1987) Qualms about Qalys. Lancet, 1: 1134–6.
Smith, A.H.W., Ballinger, B.R. and Presley, A.S. (1981) The reliability and validity of two assessment scales for the elderly mentally handicapped. British Journal of Psychiatry, 138: 15–16.
Smith, H.J., Taylor, R. and Mitchell, A. (2000) A comparison of four quality of life instruments in cardiac patients: SF-36, QLI, QLMI, and SEIQoL. Heart, 84: 390–4.
Smith, T.W. (1979) Happiness: time trends, seasonal variations, inter-survey differences and other mysteries. Social Psychology Quarterly, 42: 18–30.
Snaith, R.P. (1987) The concepts of mild depression. British Journal of Psychiatry, 150: 387–93.
Snaith, R.P. (2003) The Hospital Anxiety and Depression Scale. Health and Quality of Life Outcomes, 1: 29.
Snaith, R.P. and Taylor, C.M. (1985) Rating scales for depression and anxiety: a current perspective. British Journal of Clinical Pharmacology, 19: 17S–20S (suppl.).
Sokolovsky, J. (1986) Network methodologies in the study of ageing, in J. Keith (ed.) New Methods for Old Age Research. Westport, CT: Greenwood Press, Bergin and Garvey.
Spatz, K. and Johnson, F. (1973) Internal consistency of the Coopersmith Self Esteem Inventory. Educational and Psychological Measurement, 33: 875–6.
Spector, W.D., Katz, S., Murphy, J.B. et al. (1987) The hierarchical relationship between activities of daily living and instrumental activities of daily living. Journal of Chronic Diseases, 40: 481–9.
Spielberger, C.D., Gorsuch, R.L. and Luchene, R.E. (1970) Manual for the State–Trait Anxiety Inventory. Palo Alto, CA: Consulting Psychologists Press.
Spielberger, C.D., Davidson, K., Lighthall, F. et al. (1973) STAI Preliminary Manual. Palo Alto, CA: Consulting Psychologists Press.
Spielberger, C.D., Gorsuch, R.L., Luchene, R.E. et al. (1983) Manual for the State–Trait Anxiety Inventory (revised edition). Palo Alto, CA: Consulting Psychologists Press.
Spilker, B. (ed.) (1996) Pharmacoeconomics and Quality of Life in Clinical Trials, 2nd edn. Philadelphia: Lippincott-Raven.
Spilker, B., Molinek, F.R., Johnson, K.A. et al. (1990) Quality of life bibliography and indexes. Medical Care, 28 (supplement 12): DS1–DS77.
Spilker, B., Simpson, R. and Tilson, H. (1992a) Quality of life bibliography and indexes: 1991 update. Journal of Clinical Research Pharmacoepidemiology, 6: 205–66.
Spilker, B., White, W.S.A., Simpson, R.L. and Tilson, H.H. (1992b) Quality of life bibliography and indexes: 1990 update. Journal of Clinical Research Pharmacoepidemiology, 6: 87–156.
Spiro, A. and Bossè, R. (2000) Relations between health-related quality of life and well-being: the gerontologist’s new clothes. International Journal of Aging and Human Development, 50: 297–318.
Spitzer, R.L., Burdock, E.I. and Hardesty, A.S. (1964) Mental Status Schedule. New York: Department of Psychiatry, College of Physicians and Surgeons, Columbia University and Biometrics Research Section, New York State Department of Mental Hygiene.
Spitzer, R.L., Endicott, J., Fleiss, J.L. et al. (1970) Psychiatric status schedule: a technique for evaluating psychopathology and impairment in role functioning. Archives of General Psychiatry, 23: 41–55.
Spitzer, R.L., Endicott, J. and Robins, E. (1978) Research Diagnostic Criteria: rationale and reliability. Archives of General Psychiatry, 35: 773–82.
Spitzer, W.O., Dobson, A.J., Hall, J. et al. (1981) Measuring quality of life of cancer patients: a concise QL-index for use by physicians. Journal of Chronic Diseases, 34: 585–97.
Sprangers, M.A.G. and Schwartz, C.E. (1999) Integrating response shift into health-related quality of life research: a theoretical model. Social Science and Medicine, 48: 1507–15.
Sprangers, M.A.G., Van Dam, F.S.A.M., Broersen, J. et al. (1999) Revealing response shift in longitudinal research on fatigue: the use of the thentest approach. Acta Oncologica, 38: 709–18.
Spruytte, N., Verschueren, K. and Marcoen, A. (1999) Grandparents: their experience of the relationship with the oldest grand-child and their psychological well-being. Tijdschr Gerontology and Geriatrics, 30: 21–30.
Staniszewska, S. (1999) Patient expectations and health-related quality of life. Health Expectations, 2: 93–104.
Stansfeld, S.A. (1999) Social support and social cohesion, in M. Marmot and R.G. Wilkinson (eds) Social Determinants of Health. Oxford: Oxford University Press.
Stansfeld, S.A. and Marmot, M.G. (1992) Social class and minor psychiatric disorder in British civil servants: a validated screening survey using the GHQ. Psychological Medicine, 22: 739–49.
Stanwyck, D.J. and Garrison, W.M. (1982) Detection of faking on the Tennessee Self-Concept Scale. Journal of Personality Assessment, 46: 426–31.
Steer, R.A., Beck, A.T. and Garrison, B. (1986) Applications of the Beck Depression Inventory, in N. Sartorius and T.A. Ban (eds) Assessment of Depression. Berlin: Springer-Verlag.
Steer, R.A., Ball, R., Ranieri, W.F. and Beck, A.T. (1999) Dimensions of the Beck Depression Inventory-II in clinically depressed out-patients. Journal of Clinical Psychology, 55: 117–28.
Steer, R.A., Rissmiller, D.J. and Beck, A.T. (2000) Use of the Beck Depression Inventory-II with depressed geriatric inpatients. Behavioral Research and Therapy, 38: 311–18.
Steer, R.A., Brown, G.K., Beck, A.T. and Sanderson, W.C. (2001) Mean Beck Depression Inventory-II scores by severity of major depressive episode. Psychological Reports, 88: 1075–6.
Stehouwer, R.S. (1985) Beck Depression Inventory, in D.J. Keyser and R.C. Sweetland (eds) Test Critiques, vol. II. Kansas City, MO: Test Corporation of America.
Stewart, A.L. and Ware, J.E. (1992) Measuring Functioning and Well-being: The Medical Outcomes Study Approach. Durham, NC: Duke University Press.
Stewart, A.L., Ware, J.E., Brook, R.H. et al. (1978) Conceptualization and Measurement of Health for Adults in the Health Insurance Study: vol. II: Physical Health in Terms of Functioning. Santa Monica, CA: Rand Corporation: R-1987/2-HEW.
Stewart, A.L., Ware, J.E. and Brook, R.H. (1981) Advances in the measurement of functional status: construction of aggregate indexes. Medical Care, 19: 473–88.
Stewart, A.L., Hays, R.D. and Ware, J.E. (1988) The MOS Short-form General Health Survey. Reliability and validity in a patient population. Medical Care, 26: 724–35.
Stewart, A.L., Greenfield, S., Hays, R.D. et al. (1989) Functional status and well-being of patients with chronic conditions: results from the medical outcomes study. Journal of the American Medical Association, 262: 907–13.
Stineman, M.G., Escarce, J.J., Goin, J.E. et al. (1994) A case mix classification system for medical rehabilitation. Medical Care, 32: 366–79.
Stock, W.A. and Okun, M.A. (1982) The construct validity of life satisfaction among the elderly. Journal of Gerontology, 37: 625–7.
Stokes, J.P. (1983) Predicting satisfaction with social support from social network structure. American Journal of Community Psychology, 11: 141–52.
Stokes, J.P. (1985) The relation of social network and individual difference variables to loneliness. Journal of Personality and Social Psychology, 48: 981–90.
Stokes, J.P. and Wilson, D.G. (1984) The Inventory of Socially Supportive Behaviours: dimensionality, prediction and gender differences. American Journal of Community Psychology, 12: 53–70.
Stones, M.L. and Kozma, A. (1980) Issues relating to the usage of conceptualizations of mental constructs employed by gerontologists. International Journal of Ageing and Human Development, 11: 269–81.
Streiner, D.L. and Norman, G.R. (2003) Health Measurement Scales: A Practical Guide to their Development and Use, 3rd edn. Oxford: Oxford University Press.
Struttmann, T., Fabro, M., Romieu, G. et al. (1999) Quality-of-life assessment in the old using the WHOQOL 100: differences between patients with senile dementia and patients with cancer. International Psychogeriatrics, 11: 273–9.
Stueve, A. and Lein, L. (1979) Problems in network analysis: the case of the missing person. Paper presented at 32nd annual general meeting of the Gerontological Society of America.
Stull, D.E. (1987) Conceptualization and measurement of well-being: implications for policy evaluation, in E.F. Borgatta and R.J.V. Montgomery (eds) Critical Issues in Ageing Policy. Beverly Hills, CA: Sage Publications.
Sugisawa, H., Liang, J., Liu, X. et al. (1994) Social networks, social support, and mortality among older people in Japan. Journal of Gerontology, 49: S3–13.
Sullivan, C.F., Copeland, J.R.M., Dewey, M.E. et al. (1988) Benzodiazepine usage amongst the elderly: findings of the Liverpool community survey. International Journal of Geriatric Psychiatry, 3: 289–92.
Sullivan, M., Ahlmen, M. and Bjelle, A. (1990) Health status assessment in rheumatoid arthritis: 1. Further work on the validity of the Sickness Impact Profile. Journal of Rheumatology, 17: 439–47.
Sullivan, M., Karlsson, J. and Ware, J.R. (1995) The Swedish SF-36 Health Survey – 1. Evaluation of data quality, scaling assumptions, reliability and construct validity across general populations in Sweden. Special Issue ‘Quality of Life’ in Social Science and Medicine, 10: 1349–58.
Sultan, N., Pope, J.E. and Clements, P.J. (2004) The health assessment questionnaire (HAQ) is strongly predictive of good outcome in early diffuse scleroderma: results from an analysis of two randomised controlled trials in early diffuse scleroderma. Rheumatology, 43: 472–8.
Surtees, P.G. (1987) Psychiatric disorder in the community and the General Health Questionnaire. British Journal of Psychiatry, 150: 828–35.
Sutcliffe, C., Cordingley, L., Burns, A. et al. (2000) A new version of the Geriatric Depression Scale for nursing and residential home populations: the Geriatric Depression Scale (residential) (GDS-12R). International Psychogeriatrics, 12: 173–81.
Swain, D.G. and Nightingale, P.G. (1997) Evaluation of a shortened version of the Abbreviated Mental Test in a series of elderly patients. Clinical Rehabilitation, 11: 243–8.
Swain, D.G., O’Brien, A.G. and Nightingale, P.G. (1999) Cognitive assessment in elderly patients admitted to hospital: relationship between the abbreviated Mental Test and the Mini-mental State Examination. Clinical Rehabilitation, 13: 503–8.
Swindells, S., Mohr, J., Justis, J.C. et al. (1999) Quality of life in patients with human immunodeficiency virus infection: Impact of social support, coping style and hopelessness. International Journal of STD and AIDS, 10: 383–91.
Tamaklo, W., Schubert, D.S., Mentari, A. et al. (1992) Assessing depression in the medical patient using the MADRS, a sensitive screening scale. Journal of Integrative Psychiatry, 8: 264–70.
Tardy, C.H. (1985) Social support measurement. American Journal of Community Psychology, 13: 187–202.
Tarnopolsky, A., Hand, D.J., McLean, E.K. et al. (1979) Validity and uses of a screening questionnaire (GHQ) in the community. British Journal of Psychiatry, 134: 508–15.
Taylor, J. and Reitz, W. (1968) The Three Faces of Self-esteem. London: University of Western Ontario, Department of Psychology. Research Bulletin no. 80.
Teeling Smith, G. (1988) Measuring Health: A Practical Approach. Chichester: John Wiley.
Tennant, C. (1977) The General Health Questionnaire: a valid index of psychological impairment in Australian populations. Medical Journal of Australia, 2: 392–4.
Tentes, A.A., Tripsiannis, G., Markakidis, S.K. et al. (2003) Peritoneal cancer index: a prognostic indicator of survival in advanced ovarian cancer, 29: 69–73.
Thoits, P.A. (1982) Conceptual, methodological and theoretical problems in studying social support as a buffer against life stress. Journal of Health and Social Behaviour, 23: 145–59.
Thomas, M.R. and Lyttle, D. (1980) Patient expectations about success of treatment and reported relief from low back pain. Journal of Psychosomatic Research, 24: 297–301.
Thompson, C. (1984) The Reliability of a Schedule for Assessing Dependency in the Elderly in Residential Care. Manchester: University of Manchester. Working papers in applied social research, no. 2.
Thompson, P. (1989) Affective disorders, in P. Thompson (ed.) The Instruments of Psychiatric Research. Chichester: John Wiley.
Thompson, P. and Blessed, G. (1987) Correlation between the 37-item Mental Test Score and abbreviated 10-item Mental Test Score by psychogeriatric day patients. British Journal of Psychiatry, 151: 206–9.
Thompson, W. (1972) Correlates of the Self Concept. Nashville, TN: Counselor Recordings and Tests.
Thuriaux, M.C. (1988) Health promotion and indicators for health for all in the European region. Health Promotion, 3: 89–99.
Toevs, C.D., Kaplan, R.M. and Atkins, C.J. (1984) The costs and effects of behavioral programs in chronic obstructive pulmonary disease. Medical Care, 22: 1088–100.
Tollefson, G.D. and Holman, S.L. (1993) Analysis of the Hamilton Depression Rating Scale factors from a double-blind, placebo-controlled trial of fluoxetine in geriatric major depression. International Journal of Clinical Psychopharmacology, 8: 253–9.
Tolsdorf, C.C. (1976) Social networks, support and coping: an exploratory study. Family Process, 15: 407–17.
Toner, J., Gurland, B. and Teresi, J. (1988) Comparison of self-administered and rater-administered methods of assessing levels of severity of depression in elderly patients. Journal of Gerontology, 43: 136–40.
Torrance, G.W. (1986) Measurement of health state utilities for economic appraisal. Journal of Health Economics, 5: 1–30.
Torrance, G.W. (1987) Utility approach to measuring health related quality of life. Journal of Chronic Diseases, 40: 593–600.
Torrance, G.W., Thomas, W.H. and Sackett, D.L. (1972) A utility maximisation model for the evaluation of health care programs. Health Services Research, 7: 118–33.
Torrance, G.W., Boyle, M.H. and Horwood, S.P. (1982) Application of multiattribute utility theory to measure social preferences for health states. Operations Research, 30: 1043–69.
Tovbin, D., Gidron, Y., Jean, T. et al. (2003) Relative importance and interrelations between psychosocial factors and individualized quality of life of hemodialysis patients. Quality of Life Research, 12: 709–17.
Townsend, P. (1962) The Last Refuge. London: Routledge and Kegan Paul.
Townsend, P. (1979) Poverty in the United Kingdom. Harmondsworth: Pelican.
Trowbridge, N. (1970) Effects of socio-economic class on self-concept of children. Psychology in the Schools, 7: 304–6.
Trueman, P. and Duthie, T. (1998) Use of the Hospital Anxiety and Depression Scale (HAD) in a large, general population survey of epilepsy. Quality of Life Newsletter, 19: 9–10.
Tugwell, P., Bombardier, C., Buchanan, W.W. et al. (1987) The MACTAR patient preference disability questionnaire: an individualised functional priority approach for assessing improvement in physical disability in clinical trials in rheumatoid arthritis. Journal of Rheumatology, 14: 446–51.
Tugwell, P., Bombardier, C., Buchanan, W.W. et al. (1990) Methotrexate in rheumatoid arthritis: Impact on quality of life assessed by traditional standard-item and individualised patient preference health status questionnaire. Archives of Internal Medicine, 150: 59–62.
Tully, M. and Cantrill, J. (2000) The validity of the modified Patient Generated Index – a quantitative and qualitative approach. Quality of Life Research, 9: 509–20.
Tully, M. and Cantrill, J. (2002) The test-retest reliability of the modified Patient Generated Index. Journal of Health Services Research and Policy, 7: 81–9.
Turner-Bowker, D.M., Bayliss, M.S., Ware, J.E. and Kosinski, M. (2003) Usefulness of the SF-8 Health Survey for comparing the impact of migraine and other conditions. Quality of Life Research, 12: S1003–
collected with the EuroQol questionnaire. Social Science and Medicine, 39: 1537–44.
Van de Lisdonk, E.H. and van Weel, C. (1992) Cataract and functional status, in J.H. Scholten and C. van Weel (eds) Functional Status Assessment in Family Practice. Lelystad: Meditekst.
1012. Van-Marwijk, H.W., Wallace, P., de-Bock, G.H. et al.
Turner-Stokes, L., Nyein, K., Turner-Stokes, T. and (1995) Evaluation of the feasibility, reliability and
Gatehouse, C. (1999) The FIM and FAM: develop- diagnostic value of shortened versions of the geriatric
ment and evaluation. Clinical Rehabilitation, 13: depression scale. British Journal of General Practice, 45:
277–87. 195–9.
Twining, T.C. and Allen, D.G. (1981) Disability factors van Weel, C. and Scholten, J.H.G. (1992) Report of an
among residents in old people’s homes. Journal of international workshop of the WONCA Research
Epidemiology and Community Health, 36: 303–5. and Classification committee, in J.H. Scholten and
Tyrer, P., Seivewright, N., Murphy, S. et al. (1988) The C. van Weel (eds) Functional Status Assessment in Family
Nottingham study of neurotic disorder: comparison Practice. Lelystad: Meditekst.
of drug and psychological treatments. Lancet, 2: van Weel, C., König-Zahn, C., Touw-Otten, F.W.M.M.
235–40. et al. (1995) COOP/WONCA Charts. A Manual. The
Tzeng, O.C., Maxey, W.A., Fortier, R. and Landis, D. Netherlands: World Organization of Family Doctors,
(1985) Construct evaluation of the Tennessee Self European Research Group on Health Outcomes,
Concept Scale. Educational and Psychological Measure- Northern Centre for Health Care Research: Uni-
ment, 45: 63–78. versity of Groningen.
Unden, A.L. and Orth-Gomer, K. (1989) Development Vardon, V.M. and Blessed, G. (1986) Confusion ratings
of a social support instrument for use in population and abbreviated mental test performance: a com-
surveys. Social Science and Medicine, 29: 1387–92. parison. Age and Ageing, 15: 139–44.
Usui, W.M., Keil, T.J. and Durig, K.R. (1985) Socio- Vaux, A. and Wood, J. (1985) Social support resources,
economic comparisons and life satisfaction of elderly behaviors and appraisals: a path analysis. Paper pre-
adults. Journal of Gerontology, 40: 110–14. sented at the meeting of the Midwestern Psycho-
Uutela, T., Hakala, M. and Kautiainen, H. (2003) Validity logical Association, Chicago.
of the Nottingham Health Profile in a Finnish Vaux, A., Burda, P. and Stewart, D. (1986a) Orientation
out-patient population with rheumatoid arthritis. toward utilizing support resources. Journal of Com-
Rheumatology, 42: 841–5. munity Psychology, 14: 159–70.
Uyl-de-Groot, C.A., Rutten, F.F.H. and Bonsel, G.J. Vaux, A., Phillips, J., Holly, L. et al. (1986b) The social
(1994) Measurement and valuation of quality of life support appraisals (SS-A) scale: studies of reliability
in economic appraisal of cancer treatment. European and validity. American Journal of Community Psychology,
Journal of Cancer, 30A: 111–17. 14: 195–219.
Vacchiano, R.B. and Strauss, P.S. (1968) The construct Vaux, A., Riedel, S. and Stewart, D. (1987) Modes of
validity of the Tennessee Self Concept Scale. Journal social support: the social support behaviours (SS-B)
of Clinical Psychology, 24: 323–6. scale. American Journal of Community Psychology, 15:
Valdenegro, J. and Barrera, M. (1983) Social support as 209–337.
a moderator of life stress: a longitudinal study using a Veenhoven, R. (1991) Is happiness relative? Social
multi-method analysis. Paper presented at the meeting Indicators Research, 24: 1–34.
of the Western Psychological Association, San Veenhoven, R. (1993) Happiness in Nations: Subjective
Francisco, California. Appreciation of Life in 56 Nations 1946–1992.
Valtysdottir, S.T., Gudbjörnsson, B., Lindqvist, U. et al. (Rotterdam, Netherlands: RISBO, Erasmus University
(2000) Anxiety and depression in patients with of Rotterdam).
primary Sjögren’s syndrome. Journal of Rheumatology, Veenhoven, R. (2000) The four qualities of life. Ordering
27: 165–9. concepts and measures of the good life. Journal of
Van Agt, H.M.E., Esssink-Bot, M.L., van der Meer, J.B.W. Happiness Studies, 1: 1–39.
and Bonsel, G.J. (1993) The NHP (Dutch version) Veenhoven, R. (2002) Why social policy needs subjective
in general and specified populations. Paper presented indicators. Social Indicators Research, 58: 33–45.
to the Fifth European Health Services Research Ventegodt, S., Merrick, J. and Andersen, N.J. (2003)
Conference, Maastricht, December. Measurement of quality of life III. From the IQOL
Van Agt, H.M.E., Essinck-Bot, M.L., Krabbe, P.F.M. et al. theory: the global generic SEIQoL questionnaire.
(1994) Test-retest reliability of health state evaluations Scientific World Journal, 3: 972–91.
REFERENCES 205
Vetter, N. and Ford, D. (1989) Anxiety and depression A.J. (1987) Multiple replication of the factor structure
scores in elderly fallers. International Journal of Geriatric of the inventory of socially supportive behaviors.
Psychiatry, 4: 159–63. Journal of Community Psychology, 15: 513–19.
Vetter, N., Jones, D.A. and Victor, C.R. (1982) The Wallace, J.L. and Vaux, A. (1993) Social network orienta-
importance of mental disabilities for the use of services tion: the role of adult attachment style. Journal of Social
by the elderly. Journal of Psychosomatic Research, 26: and Clinical Psychology, 12: 354–65.
607–12. Wallston, K.A., Brown, G.K., Stein, M.J. and Dobbins,
Vetter, N., Smith, A., Sastry, D. and Tinker, G. (1989) C.J. (1989) Comparing the long and short versions of
Day Hospital – Pilot Study Report. Cardiff, S. Wales: the Arthritis Impact Measurement Scales. Journal of
Research Team for the Care of Elderly People, Rheumatology, 16: 1105–9.
Department of Geriatrics, St David’s Hospital. Wallwork, J. and Caine, N. (1985) A comparison of
Vieweg, B.W. and Hedlund, J.L. (1983) The General the quality of life of cardiac transplant patients and
Health Questionnaire: a comprehensive review. Journal coronary artery bypass graft patients before and
of Operational Psychiatry, 14: 74–81. after surgery. Quality of Life and Cardiovascular Care,
Vincent, J. (1968) An explanatory factor analysis 1: 317–31.
relating to the construct validity of self concept Walsh, J.A. (1984) Tennessee Self Concept Scale,
labels. Educational and Psychological Measurement, 28: in D.J. Keyser and R.C. Sweetland (eds) Test
915–21. Critiques, vol. 1. Kansas City, MI: Test Corporation of
von Strauss, E., Aguero-Torres, H., Kareholt, I. et al. America.
(2003) Women are more disabled in basic activities of Walters, S.J., Munro, J.F. and Brazier, J.E. (2001) Using the
daily living than men in very advanced ages: a study SF-36 with older adults: a cross-sectional community
on disability, morbidity, and mortality from the based study. Age and Ageing, 30: 337–43.
Kungsholmen Project. Journal of Clinical Epidemiology, Wan, T.T.H. and Livieratos, B. (1977) A validation of the
56: 669–77. General Well-being Index: a two-stage multivariate
Wade, D.T. (1992) Measurement in Neurological Rehabilita- approach. Paper presented at American Public Health
tion. Oxford: Oxford University Press. Association meeting, Washington, DC.
Wade, D.T. and Collin, C. (1988) The Barthel ADL Ward, R.A. (1977) The impact of subjective age and
Index: a standard measure of physical disability. stigma on older persons. Journal of Gerontology, 32:
International Disability Studies, 10: 64–7. 227–32.
Wade, D.T. and Langton-Hewer, R. (1987) Functional Ward, R.A., Sherman, S.R. and LaGory, M. (1984)
abilities after stroke: Measurement, natural history Subjective network assessments and subjective well-
and prognosis. Journal of Neurology, Neurosurgery and being. Journal of Gerontology, 39: 93–101.
Psychiatry, 50: 177–82. Ware, J.E. (1984) Methodological considerations in the
Wade, D.T., Legh-Smith, G.L. and Langton-Hewer, R. selection of health status assessment procedures, in
(1985) Social activities after stroke: Measurement and N.K. Wenger, M.E. Mattson, C.D. Furberg et al. (eds)
natural history using the Frenchay Activities Index. Assessment of Quality of Life in Clinical Trials of Cardio-
International Rehabilitation Medicine, 7: 176–81. vascular Therapies. New York: Le Jacq.
Wade, D.T., Collen, F.M., Robb, G.F. and Warlow, C.P. Ware, J.E. (1993) Measuring patients’ views: the opti-
(1992) Physiotherapy intervention late after stroke and mum outcome measure. British Medical Journal, 306:
mobility. British Medical Journal, 304: 609–13. 1429–30.
Waldron, D., O’Boyle, C.A., Kearney, M. et al. (1999) Ware, J.E. and Karmos, A.H. (1976) Development and
Quality of life measurement in advanced cancer: Validation of Scales to Measure Perceived Health and
assessing the individual. Journal of Clinical Oncology, 17: Patient Role Propensity, vol. 2 of a final report. Carbon-
3603–11. dale, IL: Southern Illinois University School of
Walker, A., Maher, J., Coulthard, M. et al. (2001) Living in Medicine, publication no. PB 288–331.
Britain. Results from the General Household Survey 2001. Ware, J.E. and Young, J. (1979) Issues in the concep-
London: The Stationery Office. tualization and measurement of value placed on
Walker, C.E. and Kaufman, K. (1984) State–Trait health, in S.J. Mushkin and D.W. Dunlop (eds) Health:
Anxiety Inventory for Children, in D.J. Keyser and What is it Worth? New York: Pergamon Press.
R.C. Sweetland (eds) Test Critiques, Vol. I. Kansas City, Ware, J.E., Johnson, S.A., Davies-Avery, A. et al. (1979)
MO: Test Corporation of America. Conceptualization and Measurement of Health for Adults in
Walker, K., Macbride, A. and Vachon, M.L.S. (1977) the Health Insurance Study, vol. III, Mental Health. Santa
Social support networks and the crisis of bereavement. Monica, CA: Rand Corporation: R1987/3-HEW.
Social Science and Medicine, 11: 34–41. Ware, J.E., Brook, R.H., Davies-Avery, A. et al. (1980)
Walkey, F. H., Seigert, R.J., McCormick, I.A. and Taylor, Conceptualization and Measurement of Health for Adults in
206 REFERENCES
the Health Insurance Study, vol. VI, Analysis of Relation- of the Cornell Medical Index. New York: Weill
ships among Health Status Measures. Santa Monica, CA: Medical College of Cornell University, Weill Cornell
Rand Corporation: R1987/6-HEW. Medical Library (available on line http://
Ware, J.E., Sherbourne, C.D. and Davies, A.R. (1992) library.med.cornell.edu/library).
Developing and testing the MOS 20-item Short Form Weinberger, M., Tierney, W.M., Booher, P. and Hiner,
Health Survey: A general population application, in S.L. (1990) Social support, stress and functional
A.L. Stewart and J.E. Ware (eds) Measuring Functioning status in patients with osteoarthritis. Social Science and
and Well-being: The Medical Outcomes Study Approach. Medicine, 30: 503–8.
Durham, NC: Duke University Press. Weiss, R.S. (1973) Loneliness: The Experience of Emotional
Ware, J.E., Snow, K.K., Kosinski, M. and Gandek, B. and Social Isolation. Cambridge, MA: MIT Press.
(1993) SF-36 Health Survey: Manual and Interpretation Welin, L., Tibblin, G., Svardsudd, K. et al. (1985) Pro-
Guide. Boston, MA: The Health Institute, New spective study of the social influences on mortality.
England Medical Center. The study of men born in 1913 and 1923. Lancet, 1:
Ware, J.E., Kosinski, M. and Keller, S.D. (1994) SF-36 915–18.
Physical and Mental Health Summary Scales: A User’s Wells, E. and Marwell, G. (1976) Self Esteem. Beverly
Manual. Boston, MA: The Health Institute, New Hills, CA: Sage Publications.
England Medical Center. Wells, K.B., Hays, R.D., Burnam, M.A. et al. (1989a)
Ware, J.E., Kosinski, M. and Keller, S.D. (1995) How to Detection of depressive disorder for patients receiving
score the SF-12 Physical and Mental Health Summary pre-paid or fee for service care: results from the
Scales, 2nd edn. Boston, MA: The Health Institute, medical outcomes study. Journal of the American Medical
New England Medical Center. Association, 262: 3298–302.
Ware, J.E., Kosinski, M. and Keller, S.D. (1996a) SF-12. Wells, K.B., Stewart, A., Hays, R.D. et al. (1989b) The
An Even Shorter Health Survey. Boston, MA: Medical functioning and well-being of depressed patients:
Outcomes Trust Bulletin, 4: 2. results from the medical outcomes study. Journal of the
Ware, J.E., Kosinski, M. and Keller, S.D. (1996b) A 12- American Medical Association, 262: 914–19.
item short-form health survey. Construction of Wenger, G.C. (1989) Support networks in old age –
scales and preliminary tests of reliability and validity. constructing a typology, in M. Jefferys (ed.) As Britain
Medical Care, 34: 220–33. Ages. London: Routledge.
Ware, J.E., Snow, K.K., Kosinski, M. and Gandek, B. Wenger, G.C. (1992) Help in Old Age – Facing up to
(1997) SF-36 Health Survey: Manual and Interpretation Change. Liverpool: Liverpool University Press.
Guiide. Revised edition. Boston, MA: The Health Insti- Wenger, G.C. (1994) Support Networks of Older People:
tute, New England Medical Center. A Guide for Practitioners. Bangor: Centre for Social
Warr, P. (1978) A study of psychological well-being. Policy Research and Development, University of
British Journal of Psychology, 69: 111–21. Wales.
Warr, P. (1999) Well-being and the workplace, in Wenger, G.C. (1995) Practitioner Assessment of Network
D. Kahnerman, E. Diener and N. Schwarz (eds) Well- Type (PANT). Training and Resource Pack. Brighton:
being: The Foundations of Hedonic Psychology. New York: Pavilion Press.
Russell Sage. Wenger, G.C. and Shahtahmasebi, S. (1991) Survivors:
Watchel, T., Piette, J., Mor, V. et al. (1992) Quality of support network variation and sources of help in rural
life in persons with human immunodeficiency virus communities. Journal of Cross-Cultural Gerontology, 6:
infection: Measurement by the medical outcomes 41–82.
study instrument. Annals of Internal Medicine, 116: Wenger, N.K., Mattson, M.E., Furberg, C.D. et al. (eds)
129–37. (1984) Assessment of Quality of Life in Clinical Trials of
Watson, E. and Evans, S. (1986) An example of cross- Cardiovascular Therapies. New York, Le Jacq.
cultural measurement of psychological symptoms in Wentowski, G. (1982) Reciprocity and the coping
post-partum mothers. Social Science and Medicine, 23: strategies of older people: cultural dimensions of
869–74. network building. Gerontologist, 21: 600–9.
Webb, E.J., Campbell, D.T., Schwartz, R.D. et al. (1966) Wettergren, L., Björkholm, M., Axdorph, U. et al. (2003)
Unobtrusive Measures: Non-reactive Research in the Social Individual quality of life in long-term survivors of
Sciences. Chicago, IL: Rand McNally College Hodgkin’s lymphoma – a comparative study. Quality
Publishing. of Life Research, 12: 545–54.
Weckowicz, T., Muir, W. and Cropley, A. (1967) A factor WHOQOL Group (1993) Measuring quality of life: the
analysis of the Beck Inventory of Depression. Journal development of the World Health Organization
of Consulting Psychology, 31: 23–8. Quality of Life Instrument (WHOQOL). Geneva:
Weill Cornell Medical Library (2003) A brief history World Health Organization.
REFERENCES 207
WHOQOL Group (1994) The development of the Lives in Public Places. London: Tavistock Press.
WHO quality of life assessment instrument (the Williams, A. (1985) The value of QALYS. Health and
WHOQOL), in J. Orley and W. Kuyken (eds) Quality Social Services Journal, 95: 3–5.
of Life Assessment: International Perspectives. Heidelberg: Williams, J.G., Barlow, D.H. and Agras, W.S. (1972)
Springer-Verlag. Behavioural measurement of severe depression. Arch-
WHOQOL Group (1995) The World Health Organiza- ives of General Psychiatry, 27: 330–3.
tion quality of life assessment (WHOQOL): position Williams, J.M.G. (1984) The Psychology of Depression.
paper from the World Health Organization. Special Beckenham: Croom Helm.
Issue ‘Quality of Life’ in Social Science and Medicine, Williams, P. (1987) Depressive thinking in general
10: 1403–9. practice patients, in P. Freeling, L.J. Downey and
WHOQOL Group (1996) The World Health Organiza- J.C. Malkin (eds) The Presentation of Depression: Current
tion Quality of Life (WHOQOL) Assessment Instru- Approaches. Royal College of General Practitioners,
ment, in B. Spilker (ed.) Quality of Life and Pharma- Occasional Paper, 36: 17–20.
coeconomics in Clinical Trials, 2nd edn. Hagerstown, MD: Williams, R.G.A., Johnston, M., Willis, M. et al. (1976)
Lippincott-Raven Publishers. Disability: a model and measurement technique.
WHOQOL Group (1998a) The World Health Organ- British Journal of Preventive and Social Medicine, 30:
ization Quality of Life Assessment (WHOQOL): 71–8.
development and general psychometric properties. Williams, S.J. and Bury, M.J. (1989) Impairment, dis-
Social Science and Medicine, 46: 1569–85. ability and handicap in chronic respiratory illness.
WHOQOL Group (1998b) Development of the World Social Science and Medicine, 29: 609–16.
Health Organization WHOQOL-BREF Quality of Wilson, L.A. and Brass, W. (1973) Brief assessment of
Life Assessment. Psychological Medicine, 28: 551–8. the mental state in geriatric domiciliary practice: the
Wiest, W.M. (1965) A qualitative extension of Heider’s usefulness of the mental status questionnaire. Age and
theory of cognitive balance applied to interpersonal Ageing, 2: 92–101.
perception and self esteem. Psychological Monographs: Wilson, L.A., Roy, S.K. and Bursill, A.E. (1973) The
General and Applied, 79: 1–20. reliability of the mental status questionnaire in
Wig, N.N., Singh, S., Sahasi, G. et al. (1970) Psychiatric geriatric practice. (Unpublished, cited in Wilson and
symptoms following vasectomy. Indian Journal of Brass.)
Psychiatry, 12: 169–76. Wilson-Barnett, J. (1981) Assessment of recovery:
Wilkin, D. (1987) Conceptual problems in dependency with special reference to a study with post-operative
research. Social Science and Medicine, 24: 867–73. cardiac patients. Journal of Advanced Nursing, 6:
Wilkin, D. and Jolley, D.J. (1979) Behavioural Problems 435–45.
among Old People in Geriatric Wards, Psychogeriatric Wing, J.K. (1991) Measuring and classifying clinical dis-
Wards and Residential Homes, 1976–78. Research orders: learning from the PSE, in P.E. Bebbington
Report no. 1, Research Section, Psychiatric Unit, (ed.) Social Psychiatry: Theory, Methodology and Practice.
University Hospital of South Manchester. London: Transaction Publishers.
Wilkin, D. and Thompson, C. (1989) User’s Guide to Wing, J.K., Cooper, J.E. and Sartorius, N. (1974) The
Dependency Measures for Elderly People. Sheffield Measurement and Classification of Psychiatric Symptoms:
Social Services Monographs: Research in Practice. An Instruction Manual for the PSE and CATEGO
University of Sheffield, Joint Unit for Social Services Program. Cambridge: Cambridge University Press.
Research. Wollstadt, L.J., Glasser, M. and Nutter, T. (1997) Vari-
Wilkin, D., Hallam, L. and Doggett, M. (1992) Measures of ations in functional status among different groups
Need and Outcome for Primary Care. New York: Oxford of elderly people. Family Medicine, 29: 394–9.
Medical Publications. Wood, V., Wylie, M.L. and Scheafor, B. (1969) An analysis
Wilkinson, M.J.B. and Barczak, P. (1988) Psychiatric of a short self-report measure of life satisfaction: corre-
screening in general practice: Comparisons of the lation with rater judgements. Journal of Gerontology,
General Health Questionnaire and the Hospital 24: 465–9.
Anxiety and Depression Scale. Journal of the Royal Wood-Dauphinee, S. and Williams, J.I. (1991) The
College of General Practitioners, 38: 311–13. Spitzer Quality of Life Index: Its performance as a
Wilkinson, P.R., Wolfe, C.D., Warburton, F.G. et al. measure, in D. Osoba (ed.) Effect of Cancer on Quality of
(1997) Longer term quality of life and outcome in Life. Boston, MA: CRC Press.
stroke patients: is the Barthel Index alone an adequate Woods, N.F., Lentz, M., Mitchel, E. and Oakley, L.D.
measure of outcome? Quality in Health Care, 6: (1994) Depressed mood and self-esteem in young
125–30. Asian, black and white women in America. Health
Willcocks, D., Peace, S. and Kellaher, L. (1987) Private Care and Women International, 15: 243–62.
208 REFERENCES
World Health Organization (1948a) Preamble to the Con- the assessment of disability in chronic airflow limita-
stitution of the World Health Organization as adopted by tion in old age. Age and Ageing, 27: 369–74.
the International Health Conference, New York 19–22 June Young, F.B., Lees, K.R. and Weir, C.J. (2003) Strengthen-
1946. Geneva: World Health Organization. ing acute stroke trials through optimal use of disability
World Health Organization (1948b) Official Records of the end points. Stroke, 34: 2676–80.
World Health Organization, no. 2, p. 100. Geneva: World Zarit, S.H., Miller, N.E., Kahn, R.L. (1978) Brain
Health Organization. function, intellectual impairment and education in
World Health Organization (1979) Handbook for the aged. Journal of the American Geriatrics Society, 26:
Reporting Results of Cancer Treatments. WHO Offset 58–67.
Publication No. 48. Geneva: World Health Zigmond, A.S. and Snaith, R.P. (1983) The Hospital
Organization. Anxiety and Depression Scale. Acta Psychiatrica
World Health Organization (1980) International Classifica- Scandinavica, 67: 361–70.
tion of Impairments, Disabilities and Handicaps. Geneva: Ziller, R.C., Hagey, J., Smith, M.D. et al. (1969) Self-
World Health Organization. esteem: a social construct. Journal of Consulting and
World Health Organization (1985) Targets for Health for Clinical Psychology, 33: 84–95.
All by the Year 2000. Copenhagen: World Health Ziller, R.C. (1974) Self-other orientations and quality of
Organization. Regional Office for Europe. life. Social Indicators Research, 1: 301–27.
World Health Organization (1992–4) International Zissi, A., Barry, M.M. and Cochrane, R. (1998) A
Classification of Diseases (10th revision). Vols I–III. mediational model of quality of life for individuals
Geneva: World Health Organization. with severe mental health problems. Psychological
World Health Organization (1998) ICIDH-2. Inter- Medicine, 28: 1221–30.
national Classification of Impairments, Activities and Zubrod, C.G., Schneiderman, M., Frei, E. et al. (1960)
Participation. A Manual of Dimensions of Disablement and Appraisal of methods for the study of chemotherapy
Functioning, Geneva: World Health Organization. of cancer in man: Comparative therapeutic trial of
World Health Organization (2001) International Classifica- nitrogen mustard and triethylene thiophosphoramide.
tion of Functioning, Disability and Health. Geneva: World Journal of Chronic Diseases, 11: 7–33.
Health Organization. Zuccala, G., Pedone, C., Cesari, M. et al. (2003) The
Wylie, M.L. (1970) Life satisfaction as a program impact effects of cognitive impairment on mortality among
criterion. Journal of Gerontology, 25: 36–40. hospitalized patients with heart failure. American
Wylie, R.C. (1974) The Self Concept. Lincoln, NE: Journal of Medicine, 115: 97–103.
University of Nebraska Press. Zung, W.W.K. (1965) A self-rating depression scale.
Yates, J.W., Chalmer, B. and McKegney, F.P. (1980) Archives of General Psychiatry, 12: 63–70.
Evaluation of patients with advanced cancer using Zung, W.W.K. (1967) Depression in the normal aged.
the Karnofsky Performance Status. Cancer, 45: Psychosomatics, 8: 287–92.
2220–4. Zung, W.W.K. (1972) The Depression Status Inventory:
Yen, I.H. and Kaplan, G.A. (1999) Poverty area residence An adjunct to the self-rating depression scale. Journal of
and changes in depression and perceived health Clinical Psychology, 28: 539–43.
status: evidence from the Alameda County Study. Zung, W.W.K. (1986) Zung Self-Rating Depression
International Journal of Epidemiology, 28: 90–4. Scale and Depression Status Inventory, in N. Sartorius
Yesavage, J.A., Brink, T.L., Rose, T.L. et al. (1983) and T.A. Ban (eds) Assessment of Depression. Heidel-
Development and validation of a geriatric depression berg: Springer-Verlag.
screening scale – a preliminary report. Journal of Zung, W.W.K., Richards, C.B. and Short, M.J. (1965)
Psychiatric Research, 17: 37–49. Self rating depression scale in an out-patient clinic:
Yohannes, Y., Roomi, J., Waters, K. and Connolly, M. further validation of the ZDS. Archives of General
(1998) A comparison of the Barthel index and Psychiatry, 13: 508–15.
Nottingham extended activities of daily living scale in
INDEX
Abbreviated Mental Test Score (AMTAS) 99–100
activities of daily living 20
Affect Balance Scale (ABS) 129, 132–4
AGECAT (computerized diagnostic system) 97–8
Arizona Social Support Interview Schedule (ASSIS) 108–9
Arthritis Impact Measurement Scales (AIMS, AIMS1, AIMS2) 20, 26–8

Barthel Index 20, 34–6
Beck Depression Inventory (BDI) 85–8

Cambridge Examination for Mental Disorders of the Elderly 79
Cantril’s Self-Anchoring Ladder 129, 135
Circles Scale 136
Clifton Assessment Procedures for the Elderly (CAPE) 41–2
Comprehensive Assessment and Referral Evaluation (CARE) 97–8
CASP-19 154–5
COOP/WONCA charts 69–71
Coopersmith Self-Esteem Inventory 143, 146–7
Cornell Medical Index (CMI) 71–4
Crichton Royal Behaviour Rating Scale (CRBRS) 39–41
criterion validity 12
Cronbach’s alpha 14

Dartmouth COOP Function Charts 69–71
definitions
  working 3
  operational 3
Delighted–Terrible Faces Scale (D–T) 129, 135–7
Delighted–Terrible Scales 129, 135–7
depression 78–9
disability 3–4, 20–1
disease specific scales 44–5
discriminative ability 13
discriminant validity 12

equivalence rating scales 17
EuroQol 75–7

Faces scales 129, 135–7
factor structure 13
Family Environment Scale 116
Family Relationship Index (FRI) 116–17
Functional Assessment Questionnaire 21–3
Functional Independence/Assessment Measure (FIM/FAM) 35–6
Functional Limitations Profile (FLP) 48–9
functioning 19–21
  activities of daily living (ADL) 20
  disability and health (WHO classifications) 3–4
  instrumental activities of daily living (IADL) 20
  measuring 19–21

gap model 8
General Health Perceptions Battery 62–3
General Health Questionnaire (GHQ) 90–4
General Well-Being Schedule (GWBS) 140–1
Geriatric Depression Scale (GDS) 95–7
Geriatric Mental State (GMS) 97–8
Guttman scales 15

Hamilton Depression Rating Scale 83–4
happiness 6, 126–7, 129
Health Assessment Questionnaire (HAQ) 20, 23–6
health
  mental 2, 4, 78–9, 149
  outcome measures 1–2
  perceptions 9, 43
  physical 1, 2, 4, 149
  positive 2, 4–5
  social 2, 5–6, 149
  status scales 2, 5, 43–5
  subjective 2, 6, 43–4
health related quality of life 3, 5, 7, 149
health status measures 2, 43–5
Health Status Questionnaire-12 (HSQ-12) 68–9
Hospital Anxiety and Depression Scale (HADS) 88–90
human needs 8, 154

Index of Activities of Daily Living (ADL) 20, 28–30
individualised
  model 8
  measures 148, 160
instrumental activities of daily living 20
Interpersonal Support Evaluation List (ISEL) 119–20
Interview Schedule for Social Interaction (ISSI) 111–13
intra- and inter-rater agreement (reliability) 14
Inventory of Socially Supportive Behaviours (ISSB) 106–8

Karnofsky Performance Index 20, 32–4

Ladder Scales 129, 135
LEIPAD Questionnaire 152–4
life satisfaction 6, 125, 127, 129
Life Satisfaction Index A (LSIA) and Index B (LSIB) 129–31
Life Satisfaction Index Z (LSIZ) 131
Likert scales 15
Linear Analogue Self Assessment (LASA) 156–7
London Handicap Scale 36–7
loneliness 102, 122–4
Lubben Social Network Scale (LSNS) 114–16

McGill Pain Questionnaire (MPQ) 74–5
McMaster Health Index Questionnaire (MHIQ) 53–5
measurement scale levels
  interval 11–12
  nominal 11
  ordinal 11
  ratio 12
Mental Health Battery 57–8
Mental Status Questionnaire (MSQ) 98–9
Montgomery-Asberg Depression Rating Scale (MADRS) 81–2
morale 8, 126, 127–8
multiple form reliability 14
multitrait-multimethod matrix 12

Network Typology: The Network Assessment Instrument 120–2
Nottingham Health Profile (NHP) 49–53

Older Americans’ Resources and Services Schedule (OARS) 21–3
outcomes of care 1–2

pain 74–5
Patient Generated Index (PGI) 160, 162–4
Perceived Social Support from Family and Friends 109–10
personality 8
phenomenology 8
Philadelphia Geriatric Center Morale Scale (PGCMS) 129, 134–5
Physical Health Battery 56–7
precision 13
psychological concepts 8, 125–8
(Psychological) General Well-Being Schedule 140–1

Quality-Adjusted Life Year (QALY) 16–17
quality of life 3, 7–9, 125, 148–9
  definitions 5, 7–9, 125, 148–9
  measurement issues 7, 148–9
  models 8–9, 125
Quality of Life Questionnaire 155
Quality of Relationship Index 111
Quality of Well-Being Scale (QWBS) 37–9

Rand Batteries 55–69
  Rand 36-item Health Survey 63–8
  Depression screener 58
  General Health Perceptions Battery 62–3
  Mental Health Battery 57–8, 140
  Physical Health Battery 56–7
  Short Form-12 68–9
  Short Form-36 63–8
  Social Health Battery 59–60
  Social Support Scale 60–2
rating scale 17
receiver operating characteristic (ROC) curves 13
reliability 14
  alternate forms 14
  internal consistency 14
  inter-rater 14
  intra-rater 14
  multiple-form 14
  split half 14
  test-retest 14

Satisfaction with Life Scale 137–8
self-concept 8, 128, 143
self-esteem 8, 128, 143
sense of coherence 128
sensitivity 13
Scales of Psychological Well-Being 138–40
scaling responses 15
scale type
  Guttman 15
  Likert 15
  Thurstone 15
  visual analogue scale 15, 135, 156, 158
Self-Esteem Scale 143–4
Sense of Coherence Scale (SOC) 141–3
Schedule for the Evaluation of Individual Quality of Life (SEIQoL) 160–2
Short Form-8 69
Short Form-12 68–9
Short Form-36 63–8
Sickness Impact Profile (SIP) 45–9
single-item questions 15, 19, 43, 44
social capital 103–4
Social Health Battery 59–60
social network 6, 101–2, 104–7
Social Network Scale (SNS) 113–14
social support 6, 101–2, 104–7
Social Support Appraisals Scale (SS-A) 117–19
Social Support Behaviours Scale (SS-B) 117–19
Social Support Questionnaire 110–11
Social Support Resources Scale 117
Social Support Scale 60–2
specificity 13
Spitzer’s Quality of Life Index (QL Index) 157–60
standard gamble rating scale 17
Stanford Arthritis Center Health Assessment Questionnaire (HAQ) 20, 23–6
State-Trait Anxiety Index (STAI) 94–5

Tennessee Self-Concept Scale 143, 144–6
Thurstone’s scales 15
time trade-off rating scale 15
Townsend’s Disability Scale 30–2

UCLA Loneliness Scale 122–4
utility rating scales
  equivalence 16–17
  quality-adjusted life year (QALY) 16–17
  rating scale 17
  standard gamble 17
  time trade-off 17

validity 11–13
  face 11
  concurrent 12
  construct 12
  content 11
  convergent 12
  criterion 12
  discriminant 12
  discriminant ability 13
  divergent 12
  multitrait-multimethod 12
  precision 13
  predictive 12
  responsiveness 13
  sensitivity 13
  specificity 13
visual analogue scales 15, 135, 156, 158

weighting 16, 149, 160
well-being
  subjective 6–8, 125–6
  psychological 6, 78–9, 125
  physical 4–5, 149
WHOQOL 9, 149–52
WHOQOL-BREF 149–52
WHOQOL Group 9, 149–52
World Health Organization (WHO)
  definition of disability 3–4
  definition of health 4–5, 149
  definition of quality of life 8–9, 149

Zung’s Self Rating Depression Scale 79–81
MEASURING HEALTH
A review of quality of life measurement scales
Third Edition
Reviews of previous editions:
An excellent resource for anyone involved in health research and highly recommended.
Palliative Medicine
A valuable source book for health services researchers, health care providers, and others interested in quantifying quality of life for clinical or research purposes.
The International Journal for Quality in Health Care
Includes accounts of a number of recently developed scales, while retaining the breadth, concision and clarity that marked the first edition.
Medicine, Healthcare and Philosophy
Second Edition Highly Commended BMA Medical Book Competition 1998
This thoroughly revised and updated edition offers a comprehensive guide to measures of health and is an essential reference resource for all health professionals and students. Containing details of the use of most of the major measures of health and functioning, the new edition includes:
• A new chapter on measuring global quality of life
• Updated analysis of measures of subjective well-being
• A revised and up-to-date selection of useful addresses

Measuring Health is key reading for upper level undergraduates and postgraduates in health studies, health sciences, research methods and social sciences.
cover design: Kate Prentice
Ann Bowling is a social scientist and is Professor of Health Services Research in the Department of Primary Care and Population Sciences at University College London. She is also author of Measuring Disease and Research Methods in Health, both published by Open University Press.
ISBN 0-335-21527-0
9 780335 215270