H.P.G. Schneider, M.D., PhD, FRAM
Department of Obstetrics and Gynecology, University of Muenster, Germany
Corresponding Author: Department of Obstetrics and Gynecology, University of Muenster, Von-Esmarch-Str. 56, ZMBE, D-48149 Muenster; phone +49/2 51/83-5 57 10, fax +49/2 51/83-5 57 11, e-mail HPG.Schneider@uni-muenster.de
HPG Schneider Quality-of-life scoring systems 1
Psychometry and the construction of scales and subscales
The standard method used for collecting information on the prevalence and severity of complaints has been a checklist of symptoms. Symptoms are defined as "an indication of a disease or a disorder noticed by the patient himself. A presenting symptom is one that leads a patient to consult a doctor" 1 . Symptoms represent a subjective expression or manifestation of some underlying physical, psychological or social dysfunction. Symptoms are, in effect, evidence of dis-ease 2 . Particular knowledge of symptoms and their effect on the daily lives of women will assist the care-giver in providing competent care together with long-standing professional assistance during the ageing process.
Reliable and valid measures of multi-symptom conditions generally come in the form of scales and subscales, developed on the basis of principles of test construction and scaling 3 . In the field of psychology, the techniques developed to construct such measures became known as psychometrics. The first experimental psychology laboratory was founded by Wilhelm Wundt at the University of Leipzig in 1879. The interest was to establish the general principles of psychological expression. As there is, however, wide variation in individual expression, measures were required that were sensitive enough to distinguish between subjects and between the various items under investigation. This led to attempts to construct "scales". By definition, scales are instruments which measure phenomena on a continuum using ordinal scaling 2 . Scales measuring more complex human characteristics, such as intelligence or personality traits, invariably consist of a number of items which are summated to give an overall score for each person. A number of different symptoms may thus yield a total score which reflects the degree of severity of a condition along a graded continuum for each individual. Moreover, each symptom is usually rated in terms of its frequency of occurrence or severity.
Factor analysis is a multivariate mathematical technique traditionally used in psychometrics to construct measures of psychological and behavioral characteristics, such as intellectual abilities or personality traits 2 . In theory, it addresses the problem of how to analyze the structure of the interrelationships (correlations) among a large number of variables (test scores, questionnaire responses, behavior, symptoms) by identifying a set of underlying dimensions known as factors. The overall objective of factor analysis is data summarization and data reduction, that is, the orderly simplification of a number of interrelated measures.
Factor analysis aims to order and give structure to observed variables and thereby allows for the construction of instruments in the form of scales and subscales. The relationship between a symptom and a factor is measured by a correlation coefficient known as a factor loading. On that basis, an instrument can be constructed which consists of several separate subscales; it will measure different aspects of the symptom picture, based on the way symptoms cluster together on factors and on the size of the factor loadings. The result is a scale which yields a symptom profile for each subject 2 . By identifying symptoms which cluster together or form groups of factors, one may be able to delineate facets of the symptom picture and identify those symptoms that are an essential part of a syndrome and those which are not. Scales for measuring a complex phenomenon or a multifaceted syndrome are generally made up of a number of subscales, each measuring a different facet of the syndrome. Summating symptoms from apparently different domains is very often meaningless. Greene, in his methodological evaluation, has compared this to adding a person's height and waist measurement to give an overall measure of "size". Such a measure would fail to distinguish tall, thin people from short, obese people, because both would tend to have a similar overall "size" score 2 . Similarly, the common practice of reporting symptoms individually is bound to fail because such a measure would not assess a condition comprehensively.
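Greene's "size" analogy can be made concrete with a short sketch. The item names, the grouping into two subscales and the 0 to 3 ratings below are hypothetical and chosen only for illustration; they are not taken from any published scale. The sketch shows two respondents with identical total scores but entirely different subscale profiles.

```python
# Hypothetical subscale layout: item names and groupings are illustrative
# assumptions, not taken from any published instrument.
SUBSCALES = {
    "vasomotor": ["hot_flushes", "night_sweats"],
    "psychological": ["anxiety", "low_mood"],
}


def subscale_profile(ratings, subscales=SUBSCALES):
    """Sum the item ratings (here 0-3 severity) within each subscale."""
    return {
        name: sum(ratings[item] for item in items)
        for name, items in subscales.items()
    }


def total_score(ratings):
    """Sum all item ratings into a single overall score."""
    return sum(ratings.values())


# Two hypothetical respondents with opposite symptom pictures.
woman_a = {"hot_flushes": 3, "night_sweats": 3, "anxiety": 0, "low_mood": 0}
woman_b = {"hot_flushes": 0, "night_sweats": 0, "anxiety": 3, "low_mood": 3}

print(total_score(woman_a), total_score(woman_b))
print(subscale_profile(woman_a))
print(subscale_profile(woman_b))
```

The totals coincide, whereas the subscale profiles separate the two respondents, which is the reason subscales are reported alongside, or instead of, a single sum.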
General scales and condition-specific scales are the two types of scale used to measure human characteristics or conditions. When questionnaires are developed, they focus either on generic or on disease- and treatment-specific aspects. Different generic scales show many similarities, as they assess the ability of patients to cope with their condition physically, emotionally and socially as well as their general performance at work and in daily life 4 . The most commonly used generic measures are the Sickness Impact Profile 5 , the Nottingham Health Profile 6 , the Quality of Well-Being Scale 7 and the Short Form (SF)-36 Health Survey 8 . They all cover the multidimensional aspects of quality of life over a wide range of health problems. These scales may be less responsive to treatment-induced changes and could be considered lengthy and time-consuming.
Disease-specific measures, on the other hand, are more likely to be responsive and to make sense to clinicians as well as to patients. They relate to concepts and domains in specific patient populations, diagnostic groups or diseases. One of the very first was the Women's Health Questionnaire (WHQ) 9 . It was developed to assess a wide range of physical and emotional symptoms in order to study possible health changes in middle-aged women. The WHQ consists of 36 items grouped into nine domains. Self-reported symptoms are scored on a five-point Likert Scale 10 (table 1).
Table 1: Psychometric Response Scales 10
Likert Scale
A typical test item in a Likert Scale is a statement; the respondent is asked to indicate a degree of agreement with the statement.
Example statement: "Ice cream is good for breakfast"
Response options: Strongly disagree / Disagree / Neither agree nor disagree / Agree / Strongly agree
The Likert Scale is a bipolar scaling method, measuring either a positive or a negative response to a statement.
Traditionally a five-point scale is used; many psychometricians advocate the use of seven- or nine-point scales.
After the questionnaire is completed, each item may be analyzed separately, or item responses may be summed to create a score for a group of items.
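The summation of Likert responses described in table 1 can be sketched as follows. The numeric coding of 1 to 5 is an assumption made for illustration; the table itself only lists the five response labels, and other codings (for example 0 to 4) are equally common.

```python
# Response labels as listed in table 1; the 1-5 coding is an assumption.
LIKERT_LABELS = [
    "Strongly disagree",
    "Disagree",
    "Neither agree nor disagree",
    "Agree",
    "Strongly agree",
]
LIKERT_CODES = {label: code for code, label in enumerate(LIKERT_LABELS, start=1)}


def likert_code(label):
    """Translate one response label into its numeric code."""
    try:
        return LIKERT_CODES[label]
    except KeyError:
        raise ValueError(f"unknown response option: {label!r}") from None


def summed_score(labels):
    """Sum the numeric codes of a group of item responses."""
    return sum(likert_code(label) for label in labels)


# Hypothetical answers of one respondent to three items.
answers = ["Agree", "Disagree", "Neither agree nor disagree"]
print(summed_score(answers))
```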
Particular health questionnaires address psychiatric problems; one example is the Beck Depression Inventory 11 . This index was designed to assess clinical depression in psychiatric patients and proved to be much less sensitive to change than other, non-psychiatric measures. Although the depressed mood experienced by climacteric women may be no less severe than that of psychiatric patients, it has different origins and may therefore arise in a different context than psychiatric depression. Other test systems assess pain, sleep disturbances, sexual dysfunction, and mental and cognitive function.
Health-related quality of life
The World Health Organization definition of health as a "state of complete physical, mental, and social well-being and not merely absence of disease or infirmity" has remained unchanged since 1948 12 . Although mortality was previously the measure of choice to reflect population health, the importance of "non-fatal" health outcomes (i.e., functioning and disability in various aspects of life) has recently been recognized. National mortality statistics, reported on the basis of the International Classification of Diseases (ICD) system, were useful for tracking life expectancy and causes of death but failed to reflect health status among the living population. This led to the development of the International Classification of Impairments, Disabilities, and Handicaps (ICIDH) 13 to classify the consequences of diseases. "Impairment" refers to any loss or abnormality of psychological, physiological, or anatomical structure or function at the tissue, organ, or whole body system level (e.g. reduced muscle strength). "Functional limitation" refers to any restriction or inability (resulting from an impairment) to perform an activity in the manner or within the range considered normal for a human being (e.g. limited ability to walk). "Disability" has been subclassified into four categories: physical, mental, social, and emotional disability. Disability represents any restriction or limitation in the fulfillment of a person's normal (depending on age, gender, social and cultural factors) socially defined roles and tasks at work, school, or recreation, or for personal care 14 . Recently, the WHO has created a revised ICIDH model titled International Classification of Functioning, Disability and Health (ICF, also referred to as ICIDH-2), where the domains include impairment, activity and participation 15 . The ICF has been accepted by 191 countries as the international standard to describe and measure health and disability.
WHO estimates that as many as 500 million healthy life years are lost each year due to disability associated with health conditions. This is more than half the years that are lost annually due to premature death. The ICF provides a common metric for this immense problem. While traditional health indicators are based on the mortality rates of populations, the ICF shifts the focus to "life", i.e. how people live with their health conditions and how these can be improved to achieve a productive, fulfilling life. It has implications for medical practice; for law and social policy to improve access and treatments; and for the protection of the rights of individuals and groups. The ICF takes into account the social aspect of disability and provides a mechanism to document the impact of the social and physical environments on a person's functioning (figure 1).
Figure 1: WHO International Classification of Functioning and Disability. Interactions between the components of ICF.
This has been illustrated by the example of a person with a serious disability who finds it difficult to work in a particular building because it does not provide ramps or elevators; the ICF identifies the needed focus of intervention, i.e. that the building should include those facilities and not that the person be forced out of the job because of an inability to work. Thereby, the ICF puts all diseases and health conditions on an equal footing irrespective of their cause. A person may not be able to attend work because of a cold or angina, but also because of depression. This neutral approach puts mental disorders on a par with physical illness and has contributed to the recognition and documentation of the world-wide burden of depressive disorders, which are currently the leading cause, world-wide, of life years lost to disability. The nine domains, together with their qualifiers and scoring system, are listed in tables 2 and 3. Validation studies are under way to ensure that the ICF is applicable across cultures, age groups and genders so as to collect reliable and comparable data on health outcomes of individuals and populations.
[Figure 1, components shown: Health Condition (Disorder or Disease); Body Functions and Structures; Activities; Participation; Environmental Factors; Personal Factors]
Table 2: WHO International Classification of Functioning and Disability 15

Domains*                                            Qualifiers
                                                    Performance    Capacity
d1  Learning and applying knowledge
d2  General tasks and demands
d3  Communication
d4  Mobility
d5  Self-care
d6  Domestic life
d7  Interpersonal interactions and relationships
d8  Major life areas
d9  Community, social and civic life

* Domains cover the full range of life areas. The component can be used to denote activities or participation or both. The domains of this component are qualified by two qualifiers of performance and capacity.
Table 3: Scoring System of the International Classification of Functioning and Disability 15

Qualifier   Problem                                        Range
0           No problem (none, absent, negligible, ...)     0-4 %
1           Mild problem (slight, low, ...)                5-24 %
2           Moderate problem (medium, fair, ...)           25-49 %
3           Severe problem (high, extreme, ...)            50-95 %
4           Complete problem (total, ...)                  96-100 %
Broad ranges of percentages are provided for those cases in which calibrated assessment instruments or other standards are available to quantify the impairment, capacity limitation, performance problem or barrier.
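The percentage bands of table 3 translate directly into a lookup. The following is a minimal sketch; the function name is hypothetical, and the handling of non-integer percentages (for example 4.5 %) by half-open intervals is an assumption, since the table only gives integer bounds.

```python
def icf_qualifier(percent):
    """Map a problem magnitude in percent to the ICF qualifier 0-4 (table 3).

    Assumption: values between the integer bounds of the table (e.g. 4.5)
    are assigned to the lower band.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must lie between 0 and 100")
    if percent < 5:
        return 0  # no problem: 0-4 %
    if percent < 25:
        return 1  # mild problem: 5-24 %
    if percent < 50:
        return 2  # moderate problem: 25-49 %
    if percent < 96:
        return 3  # severe problem: 50-95 %
    return 4      # complete problem: 96-100 %


for value in (3, 20, 40, 80, 99):
    print(value, icf_qualifier(value))
```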
Health-related quality of life refers to the effects of an individual's physical state on all aspects of psycho-social functioning. Generally speaking, quality of life may also be defined as "the extent to which our hopes and ambitions are matched by experience" 16 . Recently, awareness of the relationship between quality of life and aging has been growing. Quality of life is a subjective parameter, and direct questioning is therefore a simple and appropriate way of gathering information about how patients feel and function. Accordingly, measures of quality of life (QOL) attempt to gauge the effect of ill health across a number of physical, psychological and social parameters.
Quality of life and ageing
The years in which a woman passes from the reproductive stage of life to the postmenopausal years form a period marked by waning ovarian function, best referred to as the climacteric. The Massachusetts Women's Health Study has shown that women express either positive or neutral feelings about menopause, with the exception of those who experience surgical menopause 17 . By the same token, the majority of women feel healthy and happy and do not seek contact with physicians. Medical intervention at this point of life should rather be regarded as an opportunity to provide and reinforce a program of preventive healthcare. The issues of preventive healthcare for women include family planning, cessation of smoking, control of bodyweight and alcohol consumption, prevention of heart disease and osteoporosis, maintenance of mental well-being (including sexuality), cancer screening, and treatment of neurological problems. Chronic disease in an ageing population is incremental in nature. The best health strategy would be to change the rate at which illness develops and thus postpone the clinical illness; in the end, if it is postponed long enough, it might effectively be prevented. This postponement of illness has been termed "compression of morbidity" 18,19 . The target is to lead a relatively healthy life and compress illness into a short period of time just before death. Thus, disease is not necessarily best treated by medication or surgery, but by prevention or, more accurately, by postponement. Improvement of quality of life is a primary purpose of health promotion. This can be achieved by preventive health programs, which have a greater impact on morbidity than on mortality 19 . The aim is maximal vigor in life rather than acceptance of linear senescence. Some linear decline is unavoidable, but the slope can be changed by effort and practice.
How to assess quality of life in ageing and climacteric women
An example of a simple symptom inventory constructed without any attempt to standardize it or to apply psychometric methodology is the Kupperman Index 20,21 . This questionnaire focused primarily on symptomatic relief; it relied on the physician's summary of the severity of climacteric complaints, combined by means of a weighting index, rather than letting women assess their perceived symptoms themselves. Some decades later, the time had come to develop more specific symptom lists and other questionnaires as instruments to measure changes, and to validate them in a scientific manner. Psychometric methods were more frequently applied in the 1950s and 60s; this knowledge, however, was largely restricted to psychology and social science and not yet common in medicine. Test theory and test construction developed rapidly in the 1960s and also spread to the medical field. It was during this period that social scientists had to acknowledge the differences between "objective hard data" and "subjective soft data" as different degrees of proof. In particular, awareness grew that subjectively perceived quality of life best serves to describe treatment benefits.
Instruments were utilized to develop scales and to evaluate their basic properties, such as their dimensions (domains). This requires, for example, analyzing the structure of a construct such as menopausal complaints. By analyzing the intercorrelations of all symptom combinations, it was found that symptoms cluster into "factors", which allow variation to be assessed. Factor analysis distinguishes the "domains or subscales" of a complex construct such as menopause. Among clinicians and researchers, there is increasing recognition of the role of patient-reported data as an outcome measure for clinical and drug research. Health authorities support this growing interest. As a result, multiple attempts have been undertaken at a state-of-the-art development of health-related quality of life scales applicable to women in their menopausal transition. There are four criteria by which scales qualify as standardized or disease-specific (adapted from 2 ):
1. They have been constructed on the basis of a factor analysis.
2. They consist of several subscales, each measuring a different aspect of a specific symptomatology.
3. The scales possess sound psychometric properties.
4. They have been standardized using adequate populations of women.
A series of instruments fulfilling these criteria currently dominates international practice. Although some of them do not necessarily meet the criterion of being primarily health-related quality of life (HRQoL) instruments, they are listed because of their extensive clinical usage and the large amount of statistical information collected. The following scales are introduced in the chronological order of their construction. They are short-listed in table 4.
Table 4: Standardized Menopause-Specific QOL Scales*

Name of scale                          Number of items  Rating points  Scoring             Number of subscales (domains)  Reliability of subscales
Greene Climacteric Scale               21               4              Likert Scale        4                              0.83-0.87
Women's Health Questionnaire           32               2              Present / Absent    8                              0.78-0.96
Qualifemme                             15               6              VAS 100 mm          4                              0.84-0.98
Menopause-Specific QOL Questionnaire   16               7              Likert Scale        4                              0.55-0.85
Menopausal Symptom List                25               6              Frequency Severity  3                              0.73-0.83
Menopause Rating Scale                 11               5              Likert Scale        3                              0.74-0.82
Menopause Quality of Life Scale        48               6              Likert Scale & VAS  7                              0.69-0.91
Utian QOL Scale                        23               5              Likert Scale        4                              0.73-0.84

* For more details see text.
Greene Climacteric Scale
This was the first properly analyzed climacteric symptom scale. In 1976, J. G. Greene developed his original 30-item self-administered scale 22 . It was derived from an earlier study by Neugarten and Kraines 23 . Based on endocrine and emotional factors underlying the etiology and dynamics of menopause, Greene investigated the relationship between menopausal symptoms. Factor analysis of climacteric symptoms established independent domains such as vasomotor and physical. The symptoms of the original 50 women aged 40 to 55 years were scored on a four-point Likert scale (0 to 3). The results were inter-correlated using product-moment coefficients, and the resulting matrix was submitted to principal component analysis. The final scale yielded three independent symptom groups or factors, equivalent to subscales. These were psychological, somatic and vasomotor symptoms. Items with factor loadings greater than +0.40 on one factor and less than 0.30 on the other two factors were included in the questionnaire. The resulting 21 items from an initial list of 30 were included in the scale. Those items with a factor loading above +0.50 were given a weighting factor of 2. Gerald Greene's tool represents a pioneering piece of work. While the original scale was never designed to be a genuine HRQoL instrument as defined today, it was the first to apply quantitative techniques to questionnaire construction and marked the beginning of the use of factor analysis in clinical studies with "patient-reported" outcomes as endpoint in the field of women's health. Since then, factor analysis has been applied around the world in order to generate new menopausal scales. Later, Gerald Greene tried to reconcile the findings of seven other factor analytic studies and to meet the demand for a "communal and comprehensive measure" of climacteric symptoms; this revised tool was based on a sample of 200 rather than 50 women. It was published in 1998 24 and looked at the optimum number of factors or domains to be established, with resultant "communal" scales of psychological, somatic and vasomotor symptoms. By selecting only symptoms found to have a factor loading of more than 0.35 in three or more studies, he also determined which symptoms should be included. The revision therefore replaced four items from the original 1976 scale with four new ones. Four other symptoms underwent a change in wording. An additional item on loss of sexual interest was added, and the psychological symptoms domain was broken down into an anxiety and a depressed mood scale. The result is a 21-item, four-level questionnaire. This "standardized" Greene Climacteric Scale of 1998 was employed in a trial of Kliogest 25 .
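The item selection rule reported for the 1976 scale (loading above +0.40 on one factor, below 0.30 on the other two, weight 2 for loadings above +0.50) can be written as a small filter. This is a sketch under stated assumptions: the function name and the example loadings are hypothetical, and the thresholds are applied to the signed loadings exactly as quoted in the text; whether absolute values were intended is not stated.

```python
def select_item(loadings, primary_cut=0.40, secondary_cut=0.30, weight_cut=0.50):
    """Apply the selection rule to one item's loadings on the factors.

    Returns (factor_index, weight) if the item qualifies, otherwise None.
    Signed loadings are compared as quoted in the text (an assumption).
    """
    best = max(range(len(loadings)), key=lambda i: loadings[i])
    primary = loadings[best]
    others = [value for i, value in enumerate(loadings) if i != best]
    if primary > primary_cut and all(value < secondary_cut for value in others):
        weight = 2 if primary > weight_cut else 1
        return best, weight
    return None


# Hypothetical loadings on (psychological, somatic, vasomotor).
print(select_item([0.62, 0.10, 0.05]))  # strong, clean loading on factor 0
print(select_item([0.45, 0.20, 0.10]))  # qualifies, unit weight
print(select_item([0.45, 0.35, 0.10]))  # cross-loading, rejected
```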
Women's Health Questionnaire
The Women's Health Questionnaire (WHQ), developed by Myra Hunter, is a self-administered questionnaire which measures physical and emotional experience and functioning of women aged 45 to 65 years 9 . It was designed specifically to study possible changes in perceptions of health and well-being during the menopausal transition. The questionnaire was initially developed in UK English and is composed of 36 items. Of those, 35 items investigate nine domains providing scale scores: depressed mood, somatic symptoms, memory/concentration, vasomotor symptoms, anxiety/fear, sexual behavior, sleep problems, menstrual symptoms and attractiveness. The WHQ has been used both in epidemiological and in intervention studies. It was employed in the Adelphi Women's Health Program in 1998, with subsequent publications 26,27,28 . A double-blind, randomized, placebo-controlled multi-centre clinical study was performed in 1995 29 . This trial examined the HRQoL response to transdermal estradiol or placebo patches in 223 volunteering Swedish postmenopausal women with mild to severe climacteric symptoms at baseline. Recently, the structure of the WHQ was examined in a UK sample; a revised model was developed and verified for use in multi-center, international studies 30 . The revised WHQ comprises 23 items, investigating six domains. The cross-sectional psychometric properties of the 23-item WHQ were good and better than those of the 36-item version. The 23-item WHQ was assessed with multi-national data to evaluate the cross-cultural equivalence of linguistically adapted versions. Reproducibility and responsiveness still need to be documented.
Qualifemme
The Qualifemme questionnaire was developed in France to measure the impact of menopausal hormone deficiency on a woman's quality of life. The first version consisted of 32 items derived from several other validated and accepted HRQoL instruments. These items were translated and linguistically validated for use in France 31 . The Qualifemme is scored using a visual analogue scale. Item weighting was achieved by a group of menopause experts contributing their clinical experience. The original investigation consisted of a subject pool of 351 women aged 51 to 68. A principal component analysis identified five domains with 32 items: general (9), psychological (12), vasomotor (2), urogenital (6), and a final domain covering pain and problems with hair and skin (3). Internal consistency was demonstrated by a Cronbach's alpha coefficient of 0.87. Subsequently, a reduction process removed 17 items from the original instrument and resulted in the current 15-item questionnaire. This reduction did not alter the psychometric quality of the instrument 32 . Internal consistency (Cronbach's alpha) was 0.73. The Qualifemme was applied in a multi-centre trial in France. HRQoL was compared before and after sequential versus continuous combined application of 17β-oestradiol percutaneous gel and nomegestrol acetate in 141 postmenopausal women from 36 centers during the years 1996 and 1998. The global quality of life score increased by 44.6 % in the sequential treatment group and by 38.3 % in the continuous-combined treatment group 33 . From this experience, Qualifemme appears to be a valid instrument; it also attempts to include the side effects of menopausal hormone therapy such as androgenic skin effects.
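Internal consistency figures such as those quoted for the Qualifemme are Cronbach's alpha coefficients, computed as k/(k-1) times (1 minus the sum of the item variances divided by the variance of the total score), where k is the number of items. The sketch below uses made-up visual analogue ratings in millimetres for three hypothetical items; it illustrates the standard formula and does not reproduce any published data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Variance of each item across respondents.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Variance of the summed score across respondents.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical VAS ratings (0-100 mm) of five respondents on three items.
ratings = [
    [20, 25, 30],
    [40, 35, 50],
    [60, 55, 65],
    [80, 90, 70],
    [10, 15, 20],
]
print(round(cronbach_alpha(ratings), 3))
```

Because the three hypothetical items rise and fall together across respondents, alpha comes out close to 1; items that vary independently of each other would pull it towards 0.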
Menopause-Specific Quality of Life Questionnaire
The Menopause-Specific Quality of Life Questionnaire (MENQOL) was developed by a group of researchers from Canada during the mid-1990s 34 . A list of postmenopausal symptoms was established by extrapolation from the menopause and quality of life literature, from quality of life questionnaires and from the investigators' clinical experience. The final questionnaire collected 106 items. The original five domains (physical, vasomotor, psychosocial, sexual, and working life) were reduced, upon completion of the study, by omission of the domain "working life". The final 32-item menopause-specific HRQoL instrument encompasses four subscales (physical, vasomotor, psychosocial and sexual) plus one overall HRQoL item. Each domain is scored separately within a possible range from 1 (not experiencing a problem) to 8 (extremely bothered). The mean of the items of a subscale serves as the overall subscale score. As with the WHQ, no overall score can be obtained from this questionnaire, as the relative contribution of each domain to an overall score is unknown. Internal consistency (Cronbach's alpha) ranged from 0.81 to 0.89. Construct validity coefficients range between 0.40 and 0.65 (evaluative) and between 0.28 and 0.60 (discriminative). They were determined within a randomized, parallel-group trial of conjugated versus transdermal estrogen, both supplemented with MPA in a sequential fashion. While all domains improved during treatment, there were significant differences between groups 35 . Discriminative power was poor in the vasomotor domain and good in the other domains. Evaluative performance was fair in the vasomotor and libido domains and poor in the global subscale. The lack of a factor analysis is another shortcoming, as the correlation patterns among the items remain unknown. Like most of the other instruments, MENQOL also does not address the full picture of potential side effects of menopausal hormone therapy 36 .
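The MENQOL scoring rule described above (items rated from 1 to 8, one mean per subscale, no overall score) can be sketched as follows. The item identifiers and the assignment of two items per domain are hypothetical placeholders; the real instrument has its own item lists.

```python
from statistics import mean

# Hypothetical item identifiers; the published item lists are not reproduced.
DOMAINS = {
    "vasomotor": ["v1", "v2"],
    "psychosocial": ["p1", "p2"],
    "physical": ["ph1", "ph2"],
    "sexual": ["s1", "s2"],
}


def menqol_subscale_scores(responses, domains=DOMAINS):
    """Return the mean rating per domain; ratings must lie between 1 and 8."""
    scores = {}
    for domain, items in domains.items():
        ratings = [responses[item] for item in items]
        if any(not 1 <= rating <= 8 for rating in ratings):
            raise ValueError(f"rating outside 1-8 in domain {domain!r}")
        scores[domain] = mean(ratings)
    # Deliberately no overall score: the text notes that the relative
    # contribution of each domain to a total is unknown.
    return scores


responses = {"v1": 8, "v2": 6, "p1": 2, "p2": 3, "ph1": 1, "ph2": 1, "s1": 4, "s2": 4}
print(menqol_subscale_scores(responses))
```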
Menopausal Symptom List
The Menopausal Symptom List (MSL) was developed in 1997 to measure the severity of symptoms commonly associated with menopause. The theoretical symptom checklist was sent to 40 women aged 45 to 55 years living in Australia. Following two principal component analyses, 25 significant items emerged in three domains, labeled psychological, vaso-somatic, and general somatic 37 . The latter combines the anxiety and depression subscales of the Greene Climacteric Scale and the Women's Health Questionnaire. The vasomotor subscale, besides two vasomotor symptoms, also includes other somatic symptoms for reasons not quite apparent. The items are scored on a six-point Likert scale of both frequency and severity. The MSL is a symptom inventory in terms of the selection, wording of items and its scoring. Validation experience is limited.
Menopause Rating Scale
The first version of the Menopause Rating Scale (MRS) has been used since 1992 38,39 . It was initially developed to provide the physician with a tool to document specific climacteric symptoms and their changes during treatment and was seen as an improvement over the commonly applied Kupperman Index. A critical assessment of this new scale, however, disclosed methodological deficiencies, which limited its use both in theory and in practice. Accordingly, the original physician-based scale was improved as follows:
1. Application of the scale in a representative sample of women after questionnaire revision.
2. Revision of the questionnaire such that women complete it themselves; first, because self-assessment is more sensitive, and second, because a self-administered questionnaire would not limit future application.
3. Modification of the wording of items to a simple form appropriate for lay persons.
4. Proper psychometric evaluation of the revised scale based on a representative sample, and development of simple-to-use standardized items with clear dimensions.
5. Classification of the severity of complaints based on a normal population sample.
6. Provision of normative data, representative for the climacteric age in the female population.
This new MRS questionnaire was standardized in early 1996 using a representative random sample of 689 German women aged 40 to 60 years 40 . The revision of the questionnaire mainly concerned the layout and some adjustments regarding the number, structure, and wording of items; these were made to support applicability as a self-administered questionnaire. The MRS was formally standardized following up-to-date psychometric rules. Factor analysis of the standardized eleven-item version yielded three domains: a psychological, a somato-vegetative, and a urogenital dimension. Scoring is based on a 5-point Likert scale ranging from no symptoms to mild, moderate, marked or severe complaints.
A follow-up investigation was performed from August to October 1997 in 306 women from the original study. The retest reliability of scores between the two points in time was evaluated using Pearson's correlation coefficient. The results of the follow-up survey demonstrate stability in the individual scores. The total score and the scores of the three defined dimensions showed statistically significant agreement 41 . The validity of the MRS as a measure of HRQoL in postmenopausal women was determined by comparing the instrument to both the Kupperman Index 20,21 and the SF-36 42 . The Kupperman Index introduced weighting factors based on "prevalence and consequence" as perceived by its developer. Thus, the assignment of such weighting factors is not explicit and merges distinct concepts into one coefficient. This rather simple symptom questionnaire of the late 1950s was never subjected to quantitative research or psychometric validation. The MRS proved to be a much more sound and accurate instrument than the Kupperman Index; the differences between the scores could easily be explained by the domains resulting from factor analysis. There was, however, a high degree of association between both instruments as documented by Kendall's tau-b coefficient and Pearson's correlation coefficient 42 . More important were the results of comparing the MRS to the SF-36. The psychological and somato-vegetative MRS subscales did not correlate equally well across all eight domains of the SF-36. However, the pattern of correlation was understandable, as the highest degree of correlation occurred in the domains of the SF-36 that are most relevant to women during the menopausal transition 2 . Thus, the MRS is a reliable, well-defined instrument for measuring the impact of climacteric symptoms on quality of life 43,44 . It should be regarded as a brief and compact instrument, easy to complete and to score, and suitable for routine controls. It covers the key complaints of women during and after menopause. This type of scale is not designed to tailor specific therapies to the needs of each individual woman. The need for cross-nationally and cross-culturally valid, reliable, and responsive HRQoL instruments has never been as great as it is today. Linguistic validation of the MRS created an excellent international response and acceptance. The first translation was into English 45 .
Other translations followed 46, and the following versions are currently available: Brazilian, Bulgarian, Belgian French, Belgian Dutch, Chilean, Chinese, Croatian, English, French, German, Greek, Indonesian, Mexican/Argentinean, Polish, Spanish, Swedish, Romanian, Russian, South African English, South African Afrikaans, Turkish, Ukrainian (Russia), and Ukrainian (Ukraine). Some of these versions are available in published form 46, and all versions, including the unpublished ones, can be downloaded in PDF format from the internet (see reference 43 and www.menopause-rating-scale.info).
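The two association measures named above for the MRS (Pearson's correlation coefficient for retest reliability, and Kendall's tau-b alongside Pearson's coefficient for the comparison with the Kupperman Index) can be illustrated with a minimal Python sketch. The function names and the five pairs of scores are invented for illustration only; they are not data from the cited studies, and the sketch is not the analysis code those studies used.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's product-moment correlation between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def kendall_tau_b(x, y):
    """Kendall's tau-b, the rank correlation variant that adjusts for ties."""
    concordant = discordant = tied_only_x = tied_only_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: excluded from both factors
            if dx == 0:
                tied_only_x += 1
            elif dy == 0:
                tied_only_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied_in_x = concordant + discordant + tied_only_y
    untied_in_y = concordant + discordant + tied_only_x
    return (concordant - discordant) / sqrt(untied_in_x * untied_in_y)

# Hypothetical total scores of five women at the first survey and at retest.
first_survey = [12, 8, 20, 5, 15]
retest = [11, 9, 18, 6, 15]

print(round(pearson_r(first_survey, retest), 3))
print(kendall_tau_b(first_survey, retest))
```

A coefficient close to 1 indicates that women keep nearly the same position relative to one another at both time points; it does not by itself show that the absolute scores are unchanged.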
Menopausal Quality of Life Scale

The Menopausal Quality of Life Scale (MQOL) was developed in 2000 47. It was intended as a condition-specific questionnaire that examines the effects of menopause on HRQoL as well as the impact of employment, age, and medical history; in addition, cross-sectional information on differences in HRQoL following a self-rated change in menopausal status was obtained in a community-based sample of women. The effects of hormone replacement therapy in the early postmenopause were investigated. Based on interviews of 32 and later another 29 women, a pilot questionnaire was developed containing 63 items divided into seven domains: energy, sleep, appetite, cognition, feelings, interactions, and symptoms impact. Each of these items is rated on a six-point Likert scale. The 99 returned questionnaires were used for psychometric analysis, which resulted in a 48-item questionnaire as well as a global HRQoL question to rate the overall quality of life. Oblimin rotation was applied in a second analysis with a resultant seven-factor hierarchical structure, which accounted for 57 % of the data variance. This structure proved unstable across sub-samples. Therefore, the MQOL questionnaire was given a single overall score instead of separate subscale scores for the seven domains 21. Strong correlations between domains were demonstrated. Consequently, the global quality of life index was disregarded as a single factor; all items were given the same importance and were added to form a total score. Attempts to circumvent or mask the psychometric shortcomings in the empirical foundation of this questionnaire have been unsuccessful.
Utian Quality of Life Score

The Utian Quality of Life Score (UQOL) is a modification of the original Utian questionnaire from the 1970s 48. It was developed from the old questionnaire designed to assess the sense of well-being of participants in a treatment study comparing estrogen to placebo 49. The UQOL is focused on general quality of life rather than QOL in menopausal women. Factor analysis was applied in two stages. The 23 items are rated on a five-point Likert scale and form four subscales (occupational, health, emotional, and sexual). A field study was conducted on 327 women aged 46 to 65, recruited from eleven separate communities throughout the east and mid-west of the United States. The resulting 23-item instrument was then administered to a second sample of 270 menopausal women and subsequently re-administered to determine test-retest reliability. The SF-36 was concurrently administered to determine scale validity. The UQOL can measure severity of QOL burden. However, only limited data on reliability and validity are as yet available. Because the UQOL contains few items specific to menopausal symptoms, it may need to be applied in parallel with another, more symptom-related scale in the setting where such scales are most widely used, namely the menopausal transition.
Menopausal hormone therapy and QOL

In 2002, Hogervorst et al. 50 systematically reviewed the effect of menopausal hormone therapy (MHT) on cognitive function. Their study included fifteen publications with a total of 566 postmenopausal women. This meta-analysis did not report any favorable effect of MHT on cognitive functions (verbal measures, spatial measures, speed of reading or memory). Randomized data consistently show that hormone therapy improves quality of life only when quality of life is impaired by climacteric symptoms. When symptoms are not present, hormone therapy does not improve quality of life, and it would not do so in elderly women. This would explain why estrogen plus progestin in the WHI resulted in no significant effects on general health, vitality, mental health, depressive symptoms, or sexual satisfaction. The use of estrogen plus progestin in this large study was associated with a statistically significant but small and not clinically meaningful benefit in terms of sleep disturbance, physical functioning, and bodily pain after one year 51. The postmenopausal women in the WHI had a mean age of 63 years with a range of 50 to 79 years.

An open, uncontrolled post-marketing study of over 9000 women with pre- and post-treatment MRS data was organized to evaluate the capacity of the scale to measure the health-related effects of hormone treatment independent of the severity of complaints at baseline. Hormone therapy consisted of 2 mg estradiol valerate given continuously with the sequential addition of 1 mg cyproterone acetate (Climen). The mean age was 49.8 years (SD 6.4); about half of the participating women were still perimenopausal (51.9 %), the others already postmenopausal (48.1 %). The mean body mass index was 24.7 (SD 3.7). The absolute improvement of symptoms during treatment was 9.3 points of the MRS total score on average. Did treatment effects relate to the severity of complaints at baseline? The answer is documented in figure 2.
The relative improvement of complaints or quality of life increases with the degree of severity of symptoms at baseline. Importantly, the MRS also detects a positive treatment effect in women with few complaints 52. The MRS-assisted assessments of menopausal hormone therapy and the meta-analysis of Eva Hogervorst would both explain why the WHI investigation, with menopausal complaints as an exclusion criterion, did not produce major benefits in terms of quality of life outcomes except small benefits in terms of vasomotor symptoms and sleep disturbance 53. This
may be considered another piece of evidence for the general experience that a study, however well designed and large it may be, will never provide answers to a problem that it was not designed for.
Figure 2: HRT: Relative Change of the MRS Mean Values (SD) in Four Categories of Severity at Baseline
Reproduced from Schneider HPG et al. 43
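The quantity plotted in figure 2, the relative change of the total score grouped by baseline severity, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the cut-offs that define the four severity categories, the assumed 0-44 score range, the function names, and the score pairs are all hypothetical and are not taken from the study.

```python
from statistics import mean

# Assumed upper bounds (inclusive) of the baseline total score per category.
# These cut-offs are illustrative assumptions, not values from the text.
SEVERITY_BANDS = [(4, "no/little"), (8, "mild"), (16, "moderate"), (44, "severe")]

def severity_category(baseline_score):
    """Map a baseline total score to one of the four severity labels."""
    for upper_bound, label in SEVERITY_BANDS:
        if baseline_score <= upper_bound:
            return label
    raise ValueError("score outside the assumed 0-44 range")

def relative_improvement(baseline_score, followup_score):
    """Percentage reduction of the total score relative to baseline."""
    if baseline_score == 0:
        return None  # relative change is undefined without baseline complaints
    return 100.0 * (baseline_score - followup_score) / baseline_score

def mean_improvement_by_category(score_pairs):
    """Average the relative improvement within each baseline category."""
    grouped = {}
    for baseline, followup in score_pairs:
        change = relative_improvement(baseline, followup)
        if change is None:
            continue
        grouped.setdefault(severity_category(baseline), []).append(change)
    return {label: mean(values) for label, values in grouped.items()}

# Invented (baseline, after treatment) total scores for five women.
example_pairs = [(4, 3), (8, 6), (10, 5), (20, 8), (20, 10)]
summary = mean_improvement_by_category(example_pairs)
for label, value in summary.items():
    print(f"{label}: {value:.1f} %")
```

Expressing the change relative to baseline is what allows categories with very different starting scores to be compared on one axis; an absolute difference in points would mechanically favor women who start with high scores.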
Practical considerations

Researchers have been criticized for their failure to use appropriate measures of health-related quality of life when evaluating the impact of an intervention on patient outcome. Trials may either neglect outcomes other than conventional clinical, laboratory and radiological measures or may use limited, inappropriate, or poorly validated indicators as surrogates of the patient's own experience. The recent enthusiasm for the potential of questionnaires to provide accurate evidence of outcomes from the patient's perspective has generated numerous reports, although it is not clear how well developed the applied methods are and whether they are available across the full range of health problems. British authors from the Institute of Health Sciences at Oxford 54 have undertaken an extensive review to describe the extent to which patient-assessed outcome measures have been developed and applied, and have examined whether such instruments are available for all aspects of clinical research. They collected 3,921 reports, of which 46 % were disease- or population-specific, another 22 % were generic, 18 % were dimension-specific, 10 % were utility measures, and 1 % were individualized measures. During 1990 to 1999, the number of new reports of development and evaluation rose from 144 to 650 per year. Over 30 % of evaluations concerned cancer, rheumatology and musculo-skeletal disorders, and older people's health. The generic measures SF-36, Sickness Impact Profile, and Nottingham Health Profile accounted for 16 % of the reports. The authors were not surprised to find evidence of a lack of consistency in the selection of measures for clinical trials, which hinders comparison between studies. In a study of 67 clinical trials, 48 were found to use 62 different existing measures and 13 reported new measures.
[Figure 2, plot: therapeutic improvement (%) by baseline total score category, mean ± SD: no/little symptoms 10.8 ± 10.6; mild symptoms 32.2 ± 9.8; moderate symptoms 43.9 ± 11.8; severe symptoms 55.1 ± 13.8]

For routine application in clinical practice or in clinical trials, it is essential that the instruments employed are simple and comparatively short. The majority of patients or test persons welcome the opportunity to report how symptoms and their subsequent treatment affect daily life. Psychometrically evaluated questionnaires allow uniform administration and unbiased quantification of data as the response options are predetermined and thus equal for all respondents. A core set of questionnaires would allow the comparison of study results in patient populations. This is why such widely used and excellently validated instruments have been introduced in this report. Certain difficulties, however, introduce bias into the interpretation of data. These include the difficulties that some respondents, particularly older ones, may have with reading or writing, and exposure to less experienced interviewers. The expenses involved in gathering quality of life data may also create divergence. Standardization, compatibility, eradication of possible bias and economy are therefore important variables for the validity of any type of quality of life assessment. The application of health-related quality of life instruments requires the same scrutiny and attention as the measurement of physiological outcomes. Random and representative samples of the population should be investigated in sufficient numbers and over prolonged periods of time. In terms of statistics, quality of life is, by definition, an assessment of multiple variables. The use of many measures and multiple statistical tests reduces the statistical power of the analysis. Health-related quality of life certainly is a multi-dimensional concept. Whether or not the aggregation of several dimensions into a summary index is appropriate remains open to continuing debate. A summary score may falsely suggest improvement in one vital area and conceal deterioration in another. Indices, however, are practical and are a convenient method of information transfer.
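The remark above about multiple statistical tests can be made concrete with a small Python sketch. It assumes independent tests and uses the Bonferroni adjustment as one common (and conservative) way of handling the problem; the text does not prescribe a particular correction, and the function names and the choice of eight tests (one per SF-36 domain) are illustrative assumptions.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false-positive result across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level that keeps the overall error rate near alpha."""
    return alpha / n_tests

n_domains = 8  # assumption: one separate test for each of the eight SF-36 domains
uncorrected_risk = familywise_error_rate(0.05, n_domains)
per_test_level = bonferroni_alpha(0.05, n_domains)

print(f"risk of a spurious finding without correction: {uncorrected_risk:.3f}")
print(f"per-test significance level after correction: {per_test_level:.5f}")
```

Testing each domain at the stricter corrected level is what costs statistical power: a real but modest treatment effect in one domain is less likely to reach significance, which is one practical argument for pre-specifying a small number of primary quality of life endpoints.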
In a larger representative Berlin Study 55, important factors for the understanding of well-being in menopausal women were found to be women's self-confidence, the quality of their partner relationship, the re-orientation process initiated by menopause, and their psychosocial condition. Employment is considered to be a protective factor. The experience of relief from several physical and psychosocial conditions has to be considered in the assessment of well-being in menopausal women. Another important example of the application of HRQoL instruments is the finding that the prevalence of individual menopausal symptoms differs among ethnic groups of Asian women 56. Within each ethnic group, the percentage of women reporting items of the MENQOL varied substantially (table 5) 57. Therefore, it may be inappropriate to use the same QOL measuring instrument across continents, and perhaps not even across regional ethnicities, unless linguistic and cultural adaptation is provided. A hypoactive sexual desire disorder causes marked distress or interpersonal difficulty with severe impact on quality of life. In addition to the menopause-related questionnaires and inventories, which more or less consider sexual behavior as a separate domain, a more specific evaluation has emerged 58. The aspects of sexuality and quality of life have been the subject of another report during this Workshop. Human beings are social individuals. If the health status or quality of life of an ageing person changes, the partner might also be affected, sometimes strongly and with positive or negative interaction. This is rarely considered in the development of tools to measure treatment. Interdisciplinary consensus can also help to determine the most suitable measure for a particular application. Researchers should undertake comprehensive literature searches to ascertain whether any suitable measure is available before they decide to develop a new one.
Table 5: PAM Study: Baseline Domain Scores by Ethnic Group 56

MENQOL (29) domain, mean (S.D.)

Ethnic origin   No. of women   Vasomotor     Psychosocial   Physical      Sexual
Chinese         249            3.13 (1.67)   2.84 (1.37)    3.21 (1.15)   4.04 (2.20)
Filipino        199            3.17 (1.60)   3.33 (1.41)    3.20 (1.23)   3.03 (2.03)
Indonesian      60             2.28 (0.87)   2.40 (0.68)    2.66 (0.63)   2.63 (1.18)
Korean          97             2.21 (1.40)   3.06 (1.46)    3.29 (1.24)   3.55 (2.29)
Malay           24             3.02 (1.56)   2.78 (1.11)    2.93 (1.08)   3.14 (1.78)
Pakistani       60             4.96 (2.41)   4.24 (1.64)    4.84 (1.61)   2.90 (1.70)
Taiwanese       81             2.29 (1.39)   2.37 (1.32)    2.84 (1.23)   2.11 (1.32)
Thai            150            2.87 (1.61)   3.10 (1.22)    3.28 (1.08)   2.89 (1.90)
Vietnamese      100            5.71 (1.59)   5.96 (1.48)    5.39 (1.20)   6.55 (1.67)
References:

1. Martin EA. The Oxford Medical Dictionary. Oxford: Oxford University Press, 1994
2. Greene JG. Measuring the symptoms dimension of quality of life: General and menopause-specific scales and their subscale structure. In Schneider HPG, ed. Hormone Replacement Therapy and Quality of Life. Carnforth, New York: Parthenon Publishing, 2002:35-43
3. Peck D, Shapiro C. Measuring Human Problems; A Practical Guide. Chichester: Wiley, 1990
4. Fitzpatrick R, Fletcher A, Gose S, et al. Quality of life measures in health care. I: Applications and issues in assessment. Br Med J 1992;305:1074-1077
5. Bergner M. Development, use and testing of the Sickness Impact Profile. In Walker S, Rosser M, eds. Quality of life assessment: Key issues in the 1990s. Dordrecht: Kluwer Academic Press, 1993:201-209
6. Hunt SM, McKenna SP, McEwen J, et al. The Nottingham Health Profile: Subjective health and medical consultations. Soc Sc Med 1981;15A:221-229
7. Kaplan RM, Anderson JP, Ganiats T. The Quality of Wellbeing Scale: Rationale for a single quality of life index. In: Walker S, Rosser M, eds. Quality of life assessment: Key issues in the 1990s. Dordrecht: Kluwer Academic Press, 1993:65 ff
8. McHorney CA, Ware JE, Raczek AE. The MOS 36-item short-form health status survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care 1993;31:247-263
9. Hunter M. The Women's Health Questionnaire (WHQ): a measure of mid-aged women's perceptions of their emotional and physical health. Psychol Health 1992;7:45-54
10. Likert R. A technique for the measurement of attitudes. Arch Psychol 1932;140:55
11. Beck AT, Ward CH, Mendelson M, et al. An inventory for measuring depression. Arch Gen Psychiatry 1962;4:561-574
12. World Health Organization. Preamble to the Constitution of the World Health Organization. International Health Conference, New York, N.Y., June 19 - July 22, 1946: Report of the U.S. Delegation, Including the Final Act and Related Documents, Department of State publication 2703, Conference Series 91. New York: WHO, 1946
13. World Health Organization. The International Classification of Impairments, Disabilities and Handicaps. Geneva: WHO, 1980
14. Woodhouse LJ, Mukherjee A, Shalet SM, et al. The influence of growth hormone status on physical impairments, functional limitations, and health-related quality of life in adults. Endocr Rev 2006;27:287-317
15. World Health Organization. International Classification of Functioning, Disability, and Health. Geneva: WHO, 2001
16. Calman KC. Quality of life in cancer patients - an hypothesis. J Med Ethics 1984;10:124-127
17. McKinlay SM, Brambilla DJ, Posner JG. The normal menopause transition. Maturitas 1992;14:103-115
18. Fries JF. Aging, natural death and the compression of morbidity. N Engl J Med 1980;303:130-135
19. Fries JF, Green LW, Levine S. Health promotion and the compression of morbidity. Lancet 1989;1:481-483
20. Kupperman HS, Blatt MHG, Wiesbader H, et al. Comparative clinical evaluation of estrogen preparations by the menopausal and amenorrhoea indices. J Clin Endocrinol 1953;13:688-703
21. Kupperman HS, Wetchler BB, Blatt MHG. Contemporary therapy of the menopausal syndrome. JAMA 1959;171:1627-1637
22. Greene JG. A factor analytic study of climacteric symptoms. J Psychosom Res 1976;20:425-430
23. Neugarten BL, Kraines RJ. Menopausal symptoms in women of various ages. Psychosom Med 1965;27:266-273
24. Greene JG. Constructing a standard climacteric scale. Maturitas 1998;29:25-31
25. Ulrich LG, Barlow DH, Sturdee DW, et al. for the UK continuous combined HRT study investigators. Quality of life and patient preference for sequential versus continuous combined HRT: the UK Kliofem multicenter study experience. Int J Gynaecol Obstet 1997;59(Suppl 1):11-17
26. Zöllner Y, Piercy J, Alt J. Mental Health Aspects of Peri- and Post-Menopausal Women. Attitudes, quality of life, and the role of HRT (Poster). Arch Women's Mental Health 2001;3(Suppl 2):68
27. Zöllner Y, Kay S, Abetz L, et al. La qualité de vie sexuelle des européennes. Gyn Info 2001;51:9-11
28. Piercy J, Zöllner Y, Kay S, et al. Quality of life in postmenopausal women in five European countries (Poster). Val Health 2001;4:168
29. Karlberg J, Mattsson LA, Wiklund I. A quality of life perspective on who benefits from estradiol replacement therapy. Acta Obstet Gynecol Scand 1995;74:367-372
30. Girod I, de la Loge C, Keininger D, et al. Development of a revised version of the Women's Health Questionnaire. Climacteric 2006;9:4-12
31. Le Floch JP, Colau JCI, Zartarian M. Validation d'une méthode d'évaluation de la qualité de vie en ménopause. Réfs en Gynécol Obstétr 1994;2:179-188
32. Le Floch JP, Colau JCI, Zartarian M, et al. Réduction d'un questionnaire d'évaluation de la qualité de vie en ménopause. Contracept Fertil Sex 1996;24:238-245
33. Le Floch JP, Chevalier T, Gelas B, et al. Quality of life improvement and hormonal replacement therapy: comparison of sequential versus continuous combined schedules with 17β-estradiol percutaneous gel and nomegestrol acetate. Menopause Rev 1999;4:87-96
34. Hilditch JR, Lewis J, Peter A, et al. A menopause-specific quality of life questionnaire: Development and psychometric properties. Maturitas 1996;24:161-175
35. Hilditch JR, Lewis JE, Ross AH, et al. A comparison of the effects of oral conjugated equine estrogen and transdermal estradiol-17β combined with an oral progestin on the quality of life in postmenopausal women. Maturitas 1996;24:177-184
36. Zöllner YF, Acquadro C, Schaefer M. Literature review of instruments to assess health-related quality of life during and after menopause. Qual Life Res 2005;14:309-327
37. Perz JM. Development of the menopause symptom list: A factor analytic study of menopause associated symptoms. Women Health 1997;25:53-69
38. Hauser GA, Huber IC, Keller PJ, et al. Evaluation der klimakterischen Beschwerden (Menopause Rating Scale [MRS]). Zentralbl Gynakol 1994;116:16-23
39. Schneider HPG, Doeren M. Traits for long-term acceptance of hormone replacement therapy - results of a representative German survey. Eur Menopause J 1996;3:94-98
40. Potthoff P, Heinemann LAJ, Schneider HPG, et al. Menopause-Rating Skala (MRS): Methodische Standardisierung in der deutschen Bevölkerung. Zentralbl Gynakol 2000;122:280-286
41. Schneider HPG, Heinemann LAJ, Rosemeier HP, et al. The Menopause Rating Scale (MRS): Reliability of scores of menopausal complaints. Climacteric 2000;3:59-64
42. Schneider HPG, Heinemann LAJ, Rosemeier HP, et al. The Menopause Rating Scale (MRS): Comparison with Kupperman index and quality-of-life scale SF-36. Climacteric 2000;3:50-58
43. Schneider HPG, Schultz-Zehden B, Rosemeier HP, et al. Assessing well-being in menopausal women. In Studd J, ed. The Management of the Menopause - The Millennium Review 2000. New York, London: Parthenon Publishing, 2000:11-19
44. Wiklund I. Methods of assessing the impact of climacteric complaints on quality of life. Maturitas 1998;29:41-50
45. Schneider HPG, Heinemann LAJ, Thiele K. The Menopause Rating Scale (MRS): Cultural and linguistic validation into English. Life Med Sc Online 2002;3:DOI:10.1072/LO0305326
46. Heinemann LAJ, Potthoff P, Schneider HPG. International versions of the Menopause Rating Scale (MRS). Health Qual Life Outcomes 2003;1:28 http://www.hqlo.com/articles/browse.asp
47. Jacobs P, Hyland ME, Ley A. Self rated menopausal status and quality of life in women aged 40-63 years. Br J Health Psych 2000;5:395-411
48. Utian WH. The mental tonic effect of oestrogens administered to oophorectomised females. S Afr Med J 1972;46:1079-1082
49. Utian WH, Janata JW, Kingsberg SA, et al. The Utian Quality of Life (UQOL) Scale: development and validation of an instrument to quantify quality of life through and beyond menopause. Menopause 2002;9:402-410
50. Hogervorst E, Yaffe K, Richards M, et al. Hormone replacement therapy for cognitive function in postmenopausal women. Cochrane Database Syst Rev 2002;CD003122
51. Shumaker SA, Legault C, Rapp SR, et al.; WHIMS Investigators. Estrogen plus progestin and the incidence of dementia and mild cognitive impairment in postmenopausal women: the Women's Health Initiative Memory Study: a randomized controlled trial. JAMA 2003;289:2651-2662
52. Heinemann LAJ, DoMinh T, Strelow F, et al. The Menopause Rating Scale (MRS) as outcome measure for hormone treatment? A validation study. Health Qual Life Outcomes 2004;2:67
53. Hays J, Ockene JK, Brunner RL, et al. Effects of estrogen plus progestin on health-related quality of life. N Engl J Med 2003;348:1839-1854
54. Garratt A, Schmidt L, Mackintosh A, et al. Quality of life measurement: bibliographic study of patient assessed health outcome measures. BMJ 2002;324:1417-1421
55. Schultz-Zehden B. FrauenGesundheit in und nach den Wechseljahren. Die 1000 Frauenstudie. Gladenbach: Verlag Kempkes, 1998
56. Haines CJ, Xing SM, Park KH, et al. Prevalence of menopausal symptoms in different ethnic groups of Asian women and responsiveness to therapy with three doses of conjugated estrogens/medroxyprogesterone acetate: the Pan-Asia Menopause (PAM) study. Maturitas 2005;52:264-276
57. Limpaphayom KK, Darmasetiawan MS, Hussain RI, et al. Differential prevalence of quality-of-life categories (domains) in Asian women and changes after therapy with three doses of conjugated estrogens/medroxyprogesterone acetate: the Pan-Asia Menopause (PAM) study. Climacteric 2006;9:204-214
58. Derogatis L, Rust J, Golombok S, et al. Validation of the Profile of Female Sexual Function (PFSF) in surgically and naturally menopausal women. J Sex Marital Ther 2004;30:25-36