
Department of Economics
Internt undervisningsmateriale (internal teaching material) K nr. 44

Guidelines for Writing Papers in Descriptive Economics

Prepared by Hans Linderoth and Jan Bentzen

2007

Foreword

Guidelines for Writing Papers in Descriptive Economics is based upon a rewriting and updating of data in Guidelines for Writing Empirical Papers in the Social Sciences Using Statistical Material (HH, 1996). We thank cand. rer. soc. Birgit Nahrstedt for updating some of the data and secretaries Ann-Marie Gabel and Bodil Rasmussen for making the publication ready for press.

Hans Linderoth, Docent, cand. oecon., ph.d.
Jan Bentzen, Lektor, cand. oecon.

1. Introduction
2. Phases of the work
3. Formulating the statement of the problem
   3.1. Base material
   3.2. Comparative material
   3.3. Explanatory material
        A. Correlation
        B. Background factors
        C. The direction of causation
        D. The cause and effect mechanism
        E. Short and long run
        F. The selection of explanatory factors - final comments
   3.4. Examples of the selection of explanatory material
        3.4.1. Economic growth
        3.4.2. Regional differences in income
        3.4.3. Space heating in private households
4. Collecting data and other material
5. Working with the material
   5.1. Table construction
   5.2. Figure construction
        5.2.1. Technical criteria
        5.2.2. Logarithmic scales
        5.2.3. Bar and pie diagrams
        5.2.4. Scatter diagrams
        5.2.5. Lorenz curves
   5.3. Using comparative and explanatory material
   5.4. Standardizing means
        5.4.1. Price and quantity (volume) indexes
   5.5. Analyzing time series data
        5.5.1. The elements of a time series
        5.5.2. Moving averages
        5.5.3. Seasonal correction
        5.5.4. Trends and cycles
6. Making commentaries
7. Construction of the report
References

1. Introduction
The following guidelines are intended for the student writing an empirical paper on a subject in descriptive economics. The guidelines present the means and methods to be used in the preparation of that paper and are applicable to analyses of subjects for which statistical material is the primary source. The techniques presented here are relatively elementary and are intended for students in their first year of study. However, in many cases, these techniques are relevant for the writing of all papers assigned throughout the course of study. The empirical paper has a form that is typically used in short reports that are delivered to, and received from, public authorities or private management. These reports contain issues and problems that are also defined, described, and analysed on the basis of collected statistical material, etc. The following pages contain a description of the work phases one goes through in producing such a report. The content focuses on the methods, based on both technical and analytical principles, for producing reports. The technical principles include techniques for calculating statistical data, for producing appropriate tables, etc., and the analytical principles include techniques for formulating the problem, conclusions, etc. It should be emphasized that not all methods presented here are relevant for all subjects tackled in a paper. The student must decide for him or herself which methods and techniques are appropriate for a successful result.

2. Phases of the work


The empirical paper starts with the idea, the issue and/or the problem one wishes to investigate. These ideas/issues/problems must be precisely defined and set out in the form of a statement of the problem, which in a detailed and concrete manner forms the agenda for the ensuing investigation. Such an agenda demands that all concepts and terms used in the investigation are also precisely defined, that all relevant questions or sub-problems that are to be answered in the investigation are formulated, and that the method to be used in arriving at the solution is presented. A paper in descriptive economics is, for the most part, to be based on statistical material published by institutions in the public sector. The method entails, therefore, an identification of the statistical material that is to be collected.

For example, if the issue or problem is related to the state's future ability to provide for the elderly, the first step in working toward the statement of the problem must be to define the concept "ability to provide". The ability to provide might be defined or measured, for example, as public sector disbursements to the elderly in relation to total public sector disbursements or in relation to tax revenues. In the process of formulating the statement of the problem, one is often led to a narrowing of the issues and problems in the investigation. For example, in referring to the example above, one can choose to ignore service-based public disbursements (disbursements to homes for the elderly, hospitals, etc.) so that the investigation only considers income transfers to the elderly. The definitions or delimitations selected will suggest the formulation of sub-problems: How many persons will, in the future, be counted in the older age groups? How many of the elderly will receive various types of income transfers, etc.? These sub-problems indicate a concrete need for material in the form of population projections, pension payments, early retirement benefits, etc. In general, working toward the statement of the problem requires a kind of brainstorming in which one dissects the problem, sets the delimitations, and presents a method for solving the problem. Formulating the statement of the problem (phase 1) is treated in depth in the next section. When the statement of the problem has been formulated, the collection phase can begin. Then the collected material is processed, analysed and commented on. Finally, all the work is strung together into one report. It is important to take the work phases in the order sketched in Figure 1 and, for example, not to begin immediately with collection of the material without first having the steering mechanism in place. If the order is not followed, there is a danger that you will collect, and waste time on, a great deal of irrelevant material and that your seminar paper will consist of loose, unrelated parts. It is, however, a good idea to read relevant chapters in, for example, The Danish Economy before working out the statement of the problem, because knowledge of your subject is a prerequisite for being able to formulate the statement.

Figure 1. Phases of work for the report.

Idea/issue/problem
Phase 1: Formulating the statement of the problem (Section 3)
Phase 2: Collecting the material (Section 4)
Phase 3: Working with the material (Section 5)
Phase 4: Making commentaries (Section 6)
Phase 5: Construction of the report (Section 7)

When collecting data and information (phase 2), you might encounter previously unnoticed material and points of view that can be relevant to the analysis and which make it necessary to revise the statement of the problem. Therefore, you should be prepared to differentiate between a draft statement of the problem, which results from preliminary investigations during the beginning phase of the work, and the definitive statement of the problem, which gets decided after a substantial amount of material has been collected. You might also encounter new material along the way that suggests a narrowing or broadening of the focus on the issue or problem being investigated. In processing and manipulating the data and information (phase 3), you might feel compelled to work again with earlier phases of the work. For example, if a calculated annual growth rate reveals that a noticeable change has occurred in a particular year, it might be necessary to find relevant explanatory material pertaining to that year. The same kind of need may arise when working in the analysis phase (phase 4). Because the process of discovery and preparation is not necessarily linear, you may wind up cycling around several times between the phases of the work. These various phases will be described in-depth in the following sections.

3. Formulating the statement of the problem


In the previous section, it was indicated that the formulation of the statement of the problem results in a list that serves to identify materials (statistics, legislation, analyses, etc.) needed to carry out the investigation. The formulation itself could also contain a general description of how the issues are to be described, analysed, and judged, given these materials. The following sub-sections will describe three categories of data and information, all of which must be included in producing the report. Base material comprises the data and information that the title of the report directly reflects. If the topic is the deficit in the state budget, basic data and information will comprise statistics that measure this deficit. If the topic is U.S. oil imports, the statistical material will, of course, include quantity and value of these imports. Comparative material comprises the data and information that is used as a standard or scale against which the base material is compared. For example, the deficit in the state budget could be compared with the state's revenues, the deficit in other countries, and/or the GDP, all of which can be used for evaluating the seriousness of this deficit. The third category is explanatory material, which comprises data and information that is used to deepen the analysis by explaining the course of development, movement, and/or changes that have been demonstrated in the base and comparative material. For example, why has the deficit, measured in relation to revenues, risen by a particular amount in a particular period? The following section will treat these categories more thoroughly.

3.1. Base material


If there is any doubt about the meaning of the terms used in the title of the paper, these terms must be clearly and precisely defined. A distinction should be made between theoretical terms and operational terms. Theoretical terms can be defined more precisely using other, more well-known terms, while operational terms must be defined more precisely using measurement methods. Often, operational definitions are provided in the explanations of terms in the texts in which the data is found.

Theoretically, an unemployed person can be defined as a person without work, who wants and can work for a wage that is normally paid to persons with similar qualifications. If one uses statistics that only include individuals eligible to receive unemployment benefits, however, the theoretical and operational definitions will not agree because there will always be a group of out-of-work individuals who would and could work, but who are not eligible to receive unemployment benefits. In such a case, the operational term is not considered adequate for the theoretical term.

If you are writing a paper titled "Market Sensitivity of the Textile Industry", you must decide how to measure market sensitivity. It would be reasonable to use a measure of production that could be related to measures of production used in other industries and/or other branches. Such a measure could be used to determine whether the textile industry is more or less sensitive than other branches to market fluctuations.

A paper titled "Kuwait's Economy" also demands theoretical considerations. It must be decided exactly how to define and measure "economy". Data and information regarding the national accounts, balance of payments, government budgets, etc. might be part of that definition. But in a 15-page paper, there is not enough space for a comprehensive economic description of any country. One must choose the economic factors considered most important for the country of interest and be ready to justify that choice.

GDP per capita in constant prices is the term usually used to indicate development in a country's economy. It is problematic to use the term in certain cases because GDP per capita in constant prices can fall during a period, or the growth rate can fall, even if the country has obviously become much more prosperous. For example, this might be the case for an oil-exporting country after a distinct oil price rise coupled with reduced oil exports. The reduced oil exports will, ceteris paribus, reduce GDP in constant prices, but the increased oil revenues can be used to increase consumption via increased imports.
In this case, it would at least be natural to supplement GDP per capita with consumption per capita as a measure of economic welfare. The Gulf War has also been used as a seminar topic, from an economic perspective. This topic demands a precise discussion of which economic factors might be the most important for the subject and should, therefore, be included. Clarification of the topic "Fuel Oil Consumption" demands neither theoretical nor operational considerations as it deals with the consumption of a well-defined good, which is reported quarterly in the statistics. A long list of examples could be presented here for which the clarification of terms is not necessary. It is the responsibility of the author to decide if the paper title contains problem words requiring clarification.

In addition to clarifying the meaning of the terms used in the base material, clarification should also be made with respect to the choice of time period and the degree of detail. Deciding the time period involves not only the choice of the year in which the investigation begins, but also the interval of years used throughout. If the paper focuses on a 10-year period, it is not in all cases necessary to include material from all 10 years. It can sometimes be a good idea to divide the whole period up into sub-periods, for example. Regarding the end point in the time period, it is ultimately important to use the most recently available material. The degree of detail concerns the division of the base material into sub-groups. For a subject involving age distribution, you must decide how many age groups and what range of ages to use. For a subject involving Denmark's energy consumption, you must decide whether to divide consumption by energy products or consumption sectors. Should consumption, thereafter, be divided into all different forms of energy products, such as petrol, fuel oil, coal, brown coal, etc., or only divided into oil products, solid fuels, etc.? For a subject involving industrial structure, should the manufacturing industry be treated according to its sub-industries, or should the industry be treated as an aggregate? The division of material into sub-groups is usually an essential part of the analysis. The optimal degree of detail is determined by the objective of the paper. The student often uses an unnecessary amount of detail, and this results in a paper with large, unclear tables in which patterns in the material of interest are difficult to figure out. Considerations about the use and definition of terms, time periods, and degree of detail are relevant to not only the base material, but also to comparative and explanatory material. This applies as well to consistency between the three material categories. 
Students often mistakenly use inconsistent time periods when discussing base material, comparative material, and explanatory material: either the intervals are different and/or the beginning and ending dates do not match. As a rule, this does not work, especially because the comparative and explanatory material must relate directly to the base material.

3.2. Comparative material


It was mentioned earlier that comparative material should be used as a context for the base material. For example, an analysis of the wages of primary and lower secondary school teachers might also be compared to the wages of individuals in other occupational groups. Or, an analysis of the history of employment in a particular sector might be related to employment in other sectors and/or to employment in that sector in other countries. Without comparative material, it is not possible to judge if the base material reflects a high, an average, or a low value in a given year, or if a growth rate measuring development, changes, or movement is high, average, or low.

In a paper written about copper, information indicated that the total amount of copper in manganese deposits on the sea floor had been estimated at 3 billion tons. Such information cannot stand alone. It must be related to copper consumption and/or quantities of copper from other sources.

In Meadows (1972), it is stated that: "Given present resource consumption rates and the projected increase in these rates, the great majority of the currently important non-renewable resources will be extremely costly 100 years from now. ... The price of mercury, for example, has gone up 500 percent in the last 20 years; the price of lead has increased 300 percent in the last 30 years." It is clear that the last sentence is included to make more credible the prediction of a steep rise in the future price of raw materials, given that the prices of some raw materials have already begun to rise sharply. However, the question is whether the prices of mercury and lead really have risen very much. That cannot be concluded without comparative material in the form of price movements of other related products, for example. Besides, a price rise of 300% over the course of 30 years is equivalent to an annual rate of about 4%, which is hardly more than the rise in prices for many other products.

In another paper, it was mentioned that cultivated land area in Iraq increased by less than 2% between 1979 and 1989. The paper concluded by saying "there had been very little change in the size of cultivated land".
That seems reasonable given that 2% is a modest number in many cases. But cultivated land area changes very little over a decade, and in most countries, this area actually decreases in size. Therefore, seen in the context of world-wide changes, a 2% increase is relatively large. In a third paper, it was stated that grain was the most important agricultural product. This conclusion was based on the total quantity of production in tons. However, the production of grain should be compared in value terms with the production of other agricultural products if the intention is to identify the most important agricultural product. It would actually be best to use value added as a measure for value.

Comparative material forms the context for evaluation. The better this context, the more in-depth an analysis of the base material can be made, thus leading to a greater understanding of the issue under investigation.
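
The conversion of a total price rise into an annual rate, used in the mercury and lead example above, can be sketched as follows. This is a minimal illustration, not part of the original guidelines; the function name `annual_rate` is an assumption introduced here. Read literally, a 300% increase means the price ends at four times its starting level, which gives roughly 4.7% per year; if the figure is read as a tripling, the rate is roughly 3.7%. Either reading is close to the rounded figure quoted in the text.

```python
def annual_rate(total_increase_pct: float, years: int) -> float:
    """Constant annual growth rate (in percent) that compounds
    to the given total percentage increase over `years` years."""
    growth_factor = 1 + total_increase_pct / 100
    return (growth_factor ** (1 / years) - 1) * 100


# Figures quoted from Meadows (1972) in the text above.
lead = annual_rate(300, 30)     # price ends at 4 times its starting level
mercury = annual_rate(500, 20)  # price ends at 6 times its starting level

print(f"lead: {lead:.1f}% per year, mercury: {mercury:.1f}% per year")
```

Annual rates of this kind only become meaningful when set against comparative material, such as the annual rate of increase in a general price index over the same period.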

3.3. Explanatory material


An analysis including only base material, supplemented possibly by comparative material, can only answer "how", "when", and "what" questions. The purpose is to map out the objects of the analysis using a certain amount of information. For example, in 2006, there were x unemployed persons on average per week, of which y were ... etc. For another example, in the period 1972-2006, oil consumption fell by x PJ, of which y PJ is due to a fall in oil consumption used for space heating, z PJ is due to a fall in oil consumption in the utility sector, etc. Such a decomposition of the total can be said to explain some of the development in the total number. You can, but only to a certain degree, respond to the "why" question. A deeper analysis of "why", however, requires a cause and effect (or causal) analysis.

C (independent): explanatory factor  ->  E (dependent): explained factor

A causal analysis provides the greatest knowledge about the objects of the analysis. The material that supplements the base and comparative material in a causal analysis is called the explanatory material. The purpose of a causal analysis is to establish a factor C as the cause of a particular effect E. E can be the number of unemployed individuals divided into groups by characteristics at different points in time. For example, in a causal analysis, the purpose might be to explain why unemployment is higher in North Jutland than in other parts of Denmark, or why unemployment is higher among women than among men. The purpose might be to establish a causal relationship between occupation and mortality, between marital status and mortality, or between income and mortality.

In the following sub-sections, the discussion covers the considerations one should take into account in making a causal analysis. These considerations are relevant both to the choice of explanatory material and to the conclusion phase of the paper (cf. Section 6).

A. Correlation

The material should reveal a pattern between C and E. If C is occupation and E is mortality, the pattern might consist of a large difference in mortality rates among the occupational groups. Which occupations are hazardous and which are not? To the extent there is no difference, occupation is not an explanatory factor in an analysis of mortality. Occupation is an example of a qualitative variable, the value of which cannot be measured or expressed in numbers. Other examples of qualitative variables include gender, municipalities, countries, and marital status. As opposed to qualitative variables, quantitative variables can be expressed in numbers. Examples of quantitative variables include age, height, product, and income. Instead of using occupation as an explanatory factor in the analysis of mortality, one can choose income, as mentioned earlier. These guidelines do not contain a discussion of the statistical tests used to determine if two or more variables are correlated. You must be content to compare the variables using a sketched figure based on the respective variables' values or by listing these values in a table and looking for the pattern in the material. If large values of C correspond to large values of E, then the correlation is positive. If large values of C correspond to small values of E, then the correlation is negative. For example, if unemployment (E) is larger this year while economic growth (C) is lower, then the correlation is negative.
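
As a numerical complement to reading the pattern off a table or figure, the sign of a correlation can be checked with a few lines of code. The sketch below is an illustration only: the function `pearson_r` and all the figures are hypothetical, invented for this example, and are not taken from any statistical source. For two variables, the square of this coefficient equals the R-squared value of a straight-line fit.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL annual figures, invented for illustration only.
growth = [3.1, 2.4, 0.5, -0.8, 1.2, 2.9]        # C: economic growth, percent
unemployment = [5.2, 5.9, 7.4, 8.8, 7.1, 5.5]   # E: unemployment rate, percent

r = pearson_r(growth, unemployment)
sign = "positive" if r > 0 else "negative"
print(f"r = {r:.2f} ({sign} correlation), r squared = {r * r:.2f}")
```

A coefficient of this kind summarizes the pattern but says nothing about causality, which is the subject of the points that follow.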
As a rule, there is a positive correlation between consumption and income and a negative correlation between consumption and the price of a good.[1] Probing for positive and negative correlations is, of course, only meaningful in an analysis of relationships between quantitative variables. The degree of the relationship or correlation between two variables can be measured by R², which indicates the degree of linearity between the variables in question. In scatter diagrams, Excel can display R² values on charts. The higher the value of R², the stronger is the relationship between the variables. If R² is one, there is a perfect linear relationship between the variables. A value equal to zero means there is no linear relationship at all.

[1] For normal goods, income elasticity is positive and price elasticity is negative.

B. Background factors

The pattern or correlation that is revealed under point A is not necessarily a sign of causality. Correlation can be found among a number of variables for which no causality is present. The correlation can be due to the condition that both C and E are causally connected to a common cause C1 (cf. case 1 in the following figure). Income per capita is correlated with a number of variables that are not necessarily causally connected themselves. For example, there is a positive correlation between GDP per capita and women's participation in the labour force and between GDP per capita and alcohol consumption. The positive correlation between women's participation in the labour force and alcohol consumption that results from these two relationships hardly expresses a causal relationship. At least, this requires a demonstration that women in the labour force, ceteris paribus, drink more alcohol than women who remain at home.

Case 1: Situation with common causes: C <- C1 -> E
Case 2: Situation with contributory causes: C -> E <- B
Case 3: Situation with intermediate causes: C -> B -> E

The usual problem in causal analysis is that factors other than C have significance for E. These other factors are called background factors (B, contributory causes), cf., case 2. In the example of occupation and mortality, the background factors could be age, gender, inclination to smoke, eating habits, alcohol consumption, marital status, etc. In another example, not only is fertility dependent on income, but it is also dependent on occupation, religion, marital status, and residence, among other factors.


Case 3 treats intermediate causes. As an example, alcohol consumption could be an intermediate cause between occupation and mortality. In certain occupations, there may be a tradition of relatively high alcohol consumption. That is, it is not the work itself that is dangerous. Globally, a negative correlation between income and fertility can be displayed, cf. Figure 3.1. Maybe this correlation is based on a positive correlation between income and the mother's level of education and a negative correlation between the mother's level of education and fertility. If this is the correct relationship, fertility will not fall as income rises if women's level of education does not rise as well when income rises.

Figure 3.1. Correlation between fertility and GNI per capita, 2004.
[Scatter diagram. Vertical axis: fertility rate, total (0 to 7). Horizontal axis: GNI per capita, PPP (0 to 60000). Labelled points: Israel, China, Russia, Kuwait, Denmark, Saudi Arabia, Luxembourg, Hong Kong (China). Fitted curve with R² = 0.4641.]

Source: World Development Indicators, 2006.

Figure 3.1 shows a significant spread around the drawn curve. It shows, for example, that the point for China lies significantly under the curve. This is partly explained by a distinct policy China has for limiting fertility, and is also presumably partly explained by the high level of education women receive in China relative to the level of income. In contrast, points for countries in the Middle East lie above the drawn curve, presumably because of the low level of education for women relative to the level of income. And the relatively low level of education for women in the Middle East can possibly be explained on the basis of religious and cultural background. One must remember, however, that income level in the Middle East has increased tremendously over a short period of time as a consequence of the development in the oil market.

Danmarks Statistik[2] has shown that the risk of an accident, and the resulting personal damage, associated with private cars that are 8-11 years old is approximately double that for cars that are only 0-3 years old.[3] These numbers appear to indicate a clear causal relationship between the age of a car and the risk of an accident. But maybe a substantial part of this relationship can be explained by the age of the driver. It has been documented that drivers under 25, and over 65, years of age run, respectively, 4 times and 2 times the accident risk of drivers between 35 and 64 years of age. And due to economic reasons, drivers of older cars are principally under 25, and over 64, years of age! Therefore, it can be the driver's age that is so decisive for accident risk and not that of the car.

In all, it can be said that one faces a complicated network of relationships,[4] where a factor can be explained by a series of other factors which themselves can be explained by a series of other factors, etc. These kinds of networks are called causal chains. This involves explanations of explanations. If the purpose of the paper is not to illustrate the relationship between E and a particular explanatory factor C, then the distinction between C and B has no meaning in the formulation of the statement of the problem, in which one takes into consideration only those explanatory factors that should be brought into the analysis.
On the other hand, where the relationship between E and C is important, this distinction has great significance for the comments on the correlation between two variables. If an important background factor is not accounted for in the analysis, the conclusion will most likely be completely off track.
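
The car age example can be made concrete with a small calculation showing how a background factor alone can produce an apparent effect. In the sketch below, the relative risks by driver age are the ones quoted in the text; the shares of drivers in each car age group are HYPOTHETICAL, invented purely for illustration, and the sketch ignores drivers aged 25-34, for whom the text gives no figure. The car itself is assumed to have no effect on risk at all.

```python
# Relative accident risk by driver age group, as quoted in the text
# (drivers aged 35-64 are the baseline).
relative_risk = {"under 25": 4.0, "35-64": 1.0, "over 65": 2.0}

# HYPOTHETICAL shares of drivers in each age group, invented for illustration.
driver_shares = {
    "cars 0-3 years": {"under 25": 0.05, "35-64": 0.85, "over 65": 0.10},
    "cars 8-11 years": {"under 25": 0.40, "35-64": 0.30, "over 65": 0.30},
}


def average_risk(shares):
    """Weighted average risk when the car itself has no effect on risk."""
    return sum(shares[group] * relative_risk[group] for group in relative_risk)


risk_new = average_risk(driver_shares["cars 0-3 years"])
risk_old = average_risk(driver_shares["cars 8-11 years"])
print(f"old cars vs new cars: {risk_old / risk_new:.2f} times the risk")
```

With these assumed shares, old cars show double the accident risk of new cars even though car age plays no role in the calculation. The whole difference comes from who drives them, which is why the background factor must be accounted for before drawing a causal conclusion.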

[2] This is the name for Denmark's statistical office, Statistics Denmark.
[3] News from Danmarks Statistik (NYT), No. 321, 1993.
[4] The economist attempts to account for this network of relationships by constructing models that build in the causal relationships among a range of economic variables.

C. The direction of causation

Does an occupation result in a particular mortality rate, or does a particular mortality rate lead to a particular occupation? Should the arrow (the direction of causation) be turned around? There is hardly any doubt that some occupations require a particular standard of health and thereby are connected to mortality. Often the timing between factors is not clear. Do increased wages lead to increased prices or vice versa? Has the increased mechanization in agriculture led to the increased exodus of workers or vice versa? Does increased income per capita lead to increased levels of education or vice versa?

There is a negative correlation between income per capita and agriculture's share of GDP at factor prices. This is not the same as saying that "increased income is the cause of a fall in agriculture's share of GDP" because the increased income may possibly be based on a transfer of labour from agriculture to other sectors where the wages to factors of production are higher than in agriculture. If this is the case, a fall in the share of GDP is a contributory cause to an increased GDP per capita. On the other hand, high economic growth in general encourages the migration of workers from agriculture because high economic growth creates relatively good employment possibilities in the manufacturing sector, for example. This transfer of a factor of production to other sectors reduces the agricultural sector's share of GDP, ceteris paribus.

When two factors influence each other (C <-> E), there is mutual causality. One cannot maintain that one factor causes the other. Economic growth and agriculture's decreasing share of GDP are mutually related. In the context of mutual causality, the actual issue being investigated can be decisive for which factor should be treated as the dependent factor (E) and which factor should be treated as the independent factor (C). In statistical analyses, the aim is often to test the explanatory power of a factor.
For example, this can be done by investigating whether a change in an explanatory factor (C) takes effect before a change in the explained factor (E). That is, the data are analyzed closely to determine if changes in C typically precede (in time) subsequent changes in E.

D. The cause and effect mechanism

The relationship between C and E might be based on a sequence of events which can be described in more or less detail. Industry x is characterized by work taking place in shifts and involving hazardous substances, etc. By supplementing the investigation with an explanation of the causal mechanism, one can further establish whether the correlation between C and E is of a causal type. To explain the relationship between economic variables, you must use economic theories. In reality, economic theories are brought in as the first step in trying to decide what the explanatory material should consist of, in that these theories point towards material that is meaningful in a given relationship. When the analysis concerns the consumption of a good, it is natural to bring in disposable income as an explanatory factor, as well as other factors. Disposable income is, in itself, dependent on tax policy. Investments are dependent on interest rate movements and economic development in general, etc. In one paper, the terms of trade (export price index/import price index) were entered as an important economic growth factor. It was hypothesized that improved terms of trade during a period had led to increased growth. The correctness of this hypothesis depends on why the terms of trade had improved. If they had improved because of increased domestic wages, competitive ability would have become worse, ceteris paribus, which would have influenced the quantity of exports negatively. Improved terms of trade based on domestic increases in costs are, therefore, growth reducing. On the other hand, the terms of trade could have improved as a consequence of increased demand for the country's export goods, and this increases growth. Throughout the first two oil crises, the terms of trade fell for a range of industrial countries as a consequence of the increased import prices for energy.
Because energy consumption was/is very price inelastic, at least in the short term, an increased share of income had to be used on energy consumption, which of course reduced the demand for other goods and services, and this reduced growth. Growth was negative in the wake of the energy crises. To summarize, one should be able to justify the choice of explanatory material. The causal connection between C and E must be made plausible.

E. Short and long run

Many examples can be found in economic theory where the effect is first felt after a period of time has passed. A permanent increase in income normally leads to increased consumption, but the full effect is first felt after consumers become used to the higher income. Higher oil prices lead to an increased demand for other energy sources, but the increase in demand for energy products other than oil is greater in the long run than in the short run because substitutability is greater in the long run than in the short run. If one is interested in the effects in the long run, one cannot be satisfied with data that register effects in the short run. An incorrect registration with respect to time can result in an incorrect conclusion concerning the direction of correlation (positive/negative) between C and E as well as the strength of the correlation. If the correlation is negative in the short run and positive in the long run, and one can only determine the short run effects, the chances are high that incorrect conclusions will be drawn. In the example about occupation and mortality, the following time-related sources might incorrectly be drawn in:

occupation x → retirement due to disability → death
occupation x → illness → occupation u → death

Occupation x is the hazardous occupation which a worker leaves after a few years. As a consequence of this hazardous occupation, he or she either retires with disability payments or is so ill that he or she chooses the less hazardous occupation "u" after the illness period. In using the correct data to determine the relationship, one can see that a hypothesis relating mortality and occupation should be rejected; too simple a model overlooks the intervening time variables. A paper on economic growth included a section on basic growth factors in the long run. One discussion centred on increases in factors of production, such as capital, labour, and productivity. But the paper included data for only a few years and was limited to a description of short-term fluctuations in GDP. In reality, the long run explanatory factors were not of interest, since the paper very clearly used only data that referred to the business cycle. In another paper, it was mentioned that increased economic growth implied increased public expenditures because greater growth increased government revenues and, consequently, the possibility of committing to larger expenditures. This positive correlation between economic growth and public expenditures applies in the long run in that a rich country generally makes greater public expenditures than a less rich country does. In the short run, however, increased economic growth will result in a fall in public transfer payments for unemployment benefits and welfare; that is, the correlation is negative in the short run. And since this paper had data for only a few years, neither the long run relationship nor the included hypothesis was relevant.

F. The selection of explanatory factors: final comments

The objective of the paper determines which explanatory material to use. In a descriptive investigation, there is need for only little or no explanatory material. On the other hand, in making a causal analysis, one should not strive to pack in as many explanatory factors as possible, but only to select the presumably most important factors, which can be treated more thoroughly as a result. In the selection of these presumably most important factors, the distinction between short run and long run explanatory factors is especially important. If the analysis is to examine the change in base and comparative data over the short term, the factors having great explanatory power will not be the same as those having explanatory power for long run effects (cf. examples discussed earlier). Often the base material is divided up into groups. As examples, industry in general can be divided up into industry branches, and energy consumption can be divided into sectors as well as energy products. In the selection of explanatory material, one should not select material having great explanatory power for only a small sub-group. This will unbalance the paper. However, in most cases, an analysis will be strengthened if special attention is paid to including explanatory material relevant to analyses of those periods where the changes are distinct. Finally, it should be mentioned that finding a reasonable argument for causality between two variables does not mean that one has proved causality. More sophisticated tests are required. But it can be said that causality is likely, given the arguments made in the paper.

3.4. Examples of the selection of explanatory material

3.4.1. Economic growth


As mentioned earlier, a socio-economic issue most often involves a network of relationships. An analysis of the basis for economic growth can therefore be very complicated, involving a wide range of explanatory variables, cf. Figure 3.4.1.

Figure 3.4.1. Causal network of economic growth

Primarily, growth can be explained by the development in factor inputs and the development in output per unit of factor input. The larger the increase in the effort of production factors, the greater is growth, and the greater output per unit of factor input, the greater is growth. To make the matter more complicated, the included factors or variables are not independent of each other. The development of technological progress is not independent of capital effort and educational level. It is also clear that the development in employment is dependent on economic growth rate and vice versa. It is, in general, difficult to isolate the contribution of the individual factors, so some debatable calculated assumptions must be used. These will not be treated here. The variables discussed above are proximate sources of growth. So-called ultimate sources of growth reflect basic relationships in an economy, such as culture (tradition for education, etc.), demography, history, institutional relationships, economic policy, etc. Even in a comprehensive analysis, which is much more than what is expected in a paper, you can select only a limited number of explanatory variables and be satisfied with that. The choice of variables will depend on the actual issues under investigation. Under all circumstances, it is an advantage to be able to recognize the larger causal network when concluding the paper.

3.4.2. Regional differences in income


It is apparent that average income for the active worker is a function of a region's distribution of industries and industry-determined wages. A region can have a relatively low average income because the region has relatively many employed in industries where the wages, in general, are low. Alternatively, average income may be low because wages themselves are relatively low in the region for a given industry. It would, therefore, be natural to start by investigating the distribution of industries and industry-determined wages by establishing the explanatory material, as in Figure 3.4.2.
Figure 3.4.2. Explanatory material of regional differences in active workers' incomes.
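
The two channels described in this section (the distribution of industries and the industry-determined wage level) can be separated arithmetically. The following is a minimal sketch in Python, not a method prescribed by these guidelines. The industries, employment shares and wages are invented purely for illustration, and the function names are hypothetical.

```python
# Hypothetical data: (employment share, average wage) per industry.
# None of these numbers come from a statistical source.
national = {
    "agriculture": (0.05, 300.0),
    "manufacturing": (0.25, 400.0),
    "services": (0.70, 420.0),
}
region = {
    "agriculture": (0.20, 280.0),
    "manufacturing": (0.30, 380.0),
    "services": (0.50, 410.0),
}


def average_income(data):
    """Average income = sum over industries of share * wage."""
    return sum(share * wage for share, wage in data.values())


def decompose(region, national):
    """Split the regional income gap into two parts.

    composition effect: the region has a different industry mix,
                        valued at national wages.
    wage effect:        the region pays different wages within each
                        industry, weighted by regional shares.
    The two parts add up exactly to the total gap.
    """
    composition = sum(
        (region[i][0] - national[i][0]) * national[i][1] for i in national
    )
    wage = sum(
        region[i][0] * (region[i][1] - national[i][1]) for i in national
    )
    return composition, wage


composition_effect, wage_effect = decompose(region, national)
total_gap = average_income(region) - average_income(national)

print(f"Total gap:          {total_gap:8.1f}")
print(f"Composition effect: {composition_effect:8.1f}")
print(f"Wage effect:        {wage_effect:8.1f}")
```

In this invented example both effects are negative, so the region's low average income stems partly from its industry mix and partly from lower wages within industries. Which reference weights to use (national or regional) is a choice the analyst has to justify.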

The regional distribution of industries is determined by the resource base, among other things. The resource base includes agricultural land, fish stocks, tourist attractions, etc. The distribution of industries is also determined by the age and gender structure in the population. For example, not only is the distribution of jobs held by young women different from that of jobs held by older women, but the rate of participation also differs between the two groups. It must be remembered, however, that the opportunities for working influence the age and gender distribution found in the working population. For example, poor employment opportunities lead to an emigration of young people, especially. The international business cycle influences industry income fluctuations to different degrees. Industries that produce investment goods for export are especially sensitive to these business cycles. The EU's agricultural prices influence earnings in the agricultural sector. The justification for all the arrows in Figure 3.4.2 will not be given here. It is left as an exercise for the student to work out the justification for these sketched relationships, as well as to suggest others. As mentioned in the last section under point F, one should select only those explanatory factors considered most important in relation to the chosen time horizon, among other things. In Figure 3.4.2, one would consider resource base, age, and gender to be long run factors; one would consider agricultural prices and the international business cycle to be short run factors.

3.4.3. Space heating in private households


In Figure 3.4.3, income, prices, temperature, housing area, etc. appear as explanatory factors. Naturally, income influences energy consumption. The higher the income, the higher is energy consumption. High energy consumption, caused by a low outdoor temperature, among other things, will increase insulation activities (the arrow from consumption to insulation) and will lead to a reduced indoor temperature (arrow from consumption to indoor temperature setting) because high consumption means that the proportion of energy consumption in total consumption is high. Desired indoor temperature should, therefore, be seen in relation to prices and income. Changes in outdoor temperature will influence consumption somewhat in the short run, while housing area will influence consumption in the long run. If one is interested in explaining the per capita use of energy for space heating among selected countries, relevant factors will include income, price, housing area, and degree of insulation. Further explanation of Figure 3.4.3 is left to the reader.


Figure 3.4.3. Explanatory elements for energy used for space heating in private households.

Often, one must do one's best without relevant explanatory factors because information about these factors cannot be located or is missing. For example, it would most likely be difficult to obtain information about insulation standards in the majority of countries.

4. Collecting data and other material


When the formulation of the statement of the problem is finalized, one knows to a great extent which material needs to be collected in the libraries. While the library staff can help with search techniques, these guidelines will focus on problems that might occur in the process of collecting data and information. As indicated in Table 4.1, different statistical sources often report different results for presumably identical terms. Several sources are often used when one is working with a period where the oldest data must be taken from one source and the newest data must be taken from another source. One can check to see if the shift in sources creates problems by comparing data for the same year in the two sources. If there are significant differences, one ought to carefully read the explanation/definition of terms usually accompanying statistical material. In the conclusion, then, one can draw attention to the divergence in the statistics and provide an assessment as to whether this divergence is significant for the analysis.
Table 4.1. Denmark's total energy requirements and final energy consumption in 2001, as assessed by various organizations (PJ).

Statistics Denmark
  Gross energy consumption                787
  Gross energy consumption, adjusted      815
Danish Energy Agency
  Total gross energy consumption          829
  Total final energy consumption          642
  Gross energy consumption, adjusted      831
BP                                        779
OECD
  Total supply                            828
  Total final consumption                 635

Sources: Statistical Ten Year Review 2003 (Statistics Denmark), The Danish Energy Agency (2002), Statistical Review of World Energy 2001 (BP, www.bp.com), OECD (2003).
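
The check described above, comparing data for the same year in two sources before joining them into one series, can be sketched in Python as follows. This is only an illustration under stated assumptions: the two series are invented, the function name `overlap_gaps` is hypothetical, and the 1 per cent threshold is an arbitrary choice, not a rule from these guidelines.

```python
def overlap_gaps(source_a, source_b):
    """Relative difference (per cent of source_a) for every year
    that appears in both sources."""
    common_years = sorted(set(source_a) & set(source_b))
    return {
        year: 100 * (source_b[year] - source_a[year]) / source_a[year]
        for year in common_years
    }


# Invented example series (year -> value); not real statistics.
old_source = {1998: 810.0, 1999: 805.0, 2000: 800.0}
new_source = {2000: 820.0, 2001: 829.0, 2002: 825.0}

THRESHOLD = 1.0  # per cent; arbitrary cut-off chosen for illustration

gaps = overlap_gaps(old_source, new_source)
flagged = [year for year, gap in gaps.items() if abs(gap) > THRESHOLD]

for year, gap in gaps.items():
    print(f"{year}: new source differs by {gap:+.1f}% from old source")
if flagged:
    print("Read the definitions for:", flagged)
```

If a year is flagged, the next step is the one the text recommends: read the definitions accompanying both sources and comment on the divergence in the paper.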

Several sources might also be used when one wishes to compare energy consumption in several sectors, for example. That information may not necessarily be found in one source. It should be noted that energy statistics, especially, are plagued by a lack of consistency among sources. In most other statistics, international efforts to work out the inconsistencies found in term definition and structure, etc., have been so comprehensive that many comparisons today can be carried out without a problem. The most typical cases of inconsistency in data arise when you rely on statistical material found in books. The material is often incomplete, and term definition is often lacking. It must be emphasized that you should collect data and information from primary statistical sources and not from books, to as great an extent as possible. Not only may books often be filled with errors, they will not contain the most up-to-date material either. One must also pay close attention to the continuous updating of, and corrections made to, statistical data. For example, the numbers in the national accounts are issued in several versions at different periods of time because the primary material used in creating the national accounts is available at different periods of time. To the extent possible, therefore, one must use numbers from the most recent sources. Breaks in the data will also occur when the methods for constructing that data change. These kinds of breaks occur often in a time series. One must assess, then, if the break in the data has significance for the analysis. If this is the case, then the break in the data must be discussed in the paper. A data break can be caused by changes in administrative structure. For example, the reform of local government structure in 1970, like the recent reform (2007), resulted in a significant change in the number of municipalities and counties. This made statistical comparisons of data collected before and after 1970 either very difficult or virtually impossible. Further, a data break can be caused by a change in the definition of industries and branches. OPEC,5 the EU and EFTA have represented a different number of countries at different points in time. This means that you should not only be aware of data breaks in the data issued by these organizations, but also in the data issued by other organizations where similar changes may have taken place. In summary, it is very important that you are aware of significant changes occurring in a time series due to breaks in the data. The student must closely read, and be familiar with, all footnotes, notes, etc. that accompany data. Warnings about breaks in the data, term redefinition, etc. will usually be found in footnotes, notes, etc. Thus, data should only be collected from primary statistical sources such as the national statistical bureaus, OECD, ECB, etc. and not from a general search on the internet.
Most of the information and data found at different web-sites are not produced in a quality similar to the before-mentioned sources and cannot generally be recommended for use in empirical papers, with exceptions, of course. A problem with the electronic data sources is the limited amount of information directly available when accessing the databases compared to the printed statistical material, and it is often necessary to search for more information about the data, definitions, etc., e.g. on the OECD homepage (or SourceOECD), where a lot of reports etc. are available along with huge amounts of data. Finally, be aware of the different use of 'comma' and 'period' as separators in the databases, where e.g. SourceOECD would list a number as 1,280.00, which would appear as 1280,00 (or 1.280,00) if Statistics Denmark should report such a number.

5 Organization of the Petroleum Exporting Countries.

List of www-addresses

http://www.dst.dk/ (Statistics Denmark)
http://www.statistikbanken.dk/ (Databank at Statistics Denmark)
http://www.sfi.dk/ (National Institute of Social Research)
http://www.akf.dk/ (Institute of Local Government Studies - Denmark)
http://www.fm.dk/ (Ministry of Finance)
http://www.skm.dk/ (Ministry of Taxation)
http://www.sm.dk/ (Ministry of Social Affairs)
http://www.retsinfo.dk/ (Information on Danish Laws)
http://www.oecd.org/ (OECD)
http://www.ssb.no/ (Statistics Norway)
http://www.scb.se/ (Statistics Sweden)
http://www.ae-dk.dk/ (Economic Council of the Labor Movement)
http://www.di.dk/ (Danish Industry)
http://www.dors.dk/ (Economic Council, Denmark)
http://www.ecb.int/ (European Central Bank)
http://www.imf.org/ (IMF)
http://www.nationalbanken.dk/ (The central bank of Denmark)
http://www.undp.org/ (UNDP)
http://www.who.int/ (WHO)
http://www.doe.gov/ (The American Energy Administration)
http://www.iea.org/ (International Energy Agency (IEA))
http://www.iisd.ca/ (International Institute for Sustainable Development)
http://www.ipcc.ch/ (The UN Intergovernmental Panel on Climate Change (IPCC))
http://www.ens.dk/ (Danish Energy Agency)
http://www.da.dk/ (Danish Employers Confederation)
http://www.danmark.dk/ (Rules, transfer income and eligibility)
http://www.saf.se/ (Swedish Employers Confederation)
http://www.europa.int/comm/eurostat (Eurostat)
http://www.europa.eu.int/ (EU)
http://www.wto.org/ (WTO)
http://www.bis.org/ (BIS)
http://www.finansraadet.dk/ (Danish Bankers Association)
http://www.forsikringenshus.dk/ (Danish Insurance Association)
http://www.ftnet.dk/ (Danish Financial Supervisory Authority)
http://www.realkreditraadet.dk/ (The Association of Danish Mortgage Banks)
http://www.xcse.dk/ (Copenhagen Stock Exchange)
http://www.em.dk/ (Danish Ministry of Business and Industry)
http://www.fao.org/ (FAO)
http://www.fvm.dk/ (Danish Ministry of Food, Agriculture and Fisheries)
http://www.landbrug.dk/ (Links to Danish agricultural institutions etc.)


http://www.min.dk/ (Danish Ministry of the Environment)
http://www.fedstats.gov/ (Links to US federal agencies)
http://www.ks.dk/ (Danish Competition Authority)
http://www.worldbank.org/ (The World Bank)
http://www.worldwatch.org/ (Worldwatch Institute)
http://www.wri.org/ (World Resources Institute)
http://www.ilo.org/ (International Labour Organization)
fmwww.bc.edu/ec/data.html (Economic and Financial Data)
http://www.econlinks.com/ (Economics News and Data)
http://www.economagic.com/ (Economic Time Series)
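
The warning given before the list of addresses, about 'comma' and 'period' being used differently as separators, matters in practice when numbers from several databases are read into the same spreadsheet or program. The following is a minimal Python sketch of one way to handle it. The function name `parse_number` is hypothetical, and the sketch assumes that you know from the source's documentation which decimal separator it uses; it does not try to guess.

```python
def parse_number(text, decimal_sep):
    """Convert a formatted number string to a float.

    decimal_sep is the decimal separator used by the source:
    '.' (e.g. 1,280.00) or ',' (e.g. 1.280,00 or 1280,00).
    The other of the two characters is treated as a thousands separator.
    """
    if decimal_sep not in (".", ","):
        raise ValueError("decimal_sep must be '.' or ','")
    thousands_sep = "," if decimal_sep == "." else "."
    cleaned = text.strip().replace(" ", "").replace(thousands_sep, "")
    if decimal_sep == ",":
        cleaned = cleaned.replace(",", ".")
    return float(cleaned)


# The example from the text, read under each convention.
print(parse_number("1,280.00", decimal_sep="."))  # period as decimal separator
print(parse_number("1.280,00", decimal_sep=","))  # comma as decimal separator
print(parse_number("1280,00", decimal_sep=","))
```

All three calls give the same value. Reading "1,280.00" under the wrong convention would be rejected or, in less strict tools, silently produce a number a thousand times too small, so it is worth checking a few values by eye after each import.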

5. Working with the material


Working with the material means producing tables and figures as well as calculating relevant indexes and other data, etc. This chapter covers the techniques for achieving just that. The first two sections treat the techniques for table and figure construction. The sections following apply to how you use comparative and explanatory material as well as how to standardize means and analyze time series data.

5.1. Table construction


The main purpose in using tables is to present statistical material in a clear and succinct form. A text filled with a lot of numbers is difficult to wade through. By creating a clear presentation of the data in a table, the text is no longer cluttered with numbers. A table consists of data presented in a special frame containing all the necessary information for understanding what the data stands for and which sources have been used. Such a frame for a table is illustrated below. The table number is used in the text to refer back to the table. You can use consecutive numbering throughout or use consecutive numbering within each section, as is done here. The title must precisely state what information is found in the table and will most likely consist of three elements. The first element is the statistical unit being counted. The composite whole, also called the population, makes up the sum of all the units in the data. The whole can be, for example, the population of Denmark, and the corresponding unit would be one Dane. The title would begin with "Number of persons in Denmark" or "Population of Denmark".


Table 5.1.1. Title

Heading for the row variables    Headings for the column variables
Row variables                    Data
Total
Notes, if any:
Footnotes, if any:
Sources:

The second element is often one or more identifying variables associated with the units in the table. Identifying variables associated with Danes could be age, gender, income, marital status, etc. If the included variables are age and gender, then, as the second element, they appear in the title as "by age and gender". Note that the categories representing the identifying variables (in the case of gender, these categories are female and male) are not mentioned in the title. The third element included in the title is the time period. In Table 5.1.2 below, the chosen points in time are January 1, 1970 and January 1, 2006. The title in this table thus becomes "Number of persons in Denmark, by gender and age, January 1, 1970 and 2006." The categories representing the identifying variables appear in the column and row headings. In this case, there are two variables plus the time period. With three dimensions, it becomes necessary to assign two of the dimensions to either the column or the row heading. If only two dimensions were included, then there would be no need to divide either the column or the row heading. If four dimensions were included, then it would be necessary to divide up both the column and row headings unless either the row or the column heading could be divided up into three. The student is warned against using more than four dimensions in a table because this would create only confusion, not the clarity which the table is supposed to achieve.


Table 5.1.2. Number of persons in Denmark, by gender and age, January 1, 1970 and 2006.

                        Female          Male           Total
                      1970  2006     1970  2006     1970  2006
                      ------------- 1000 persons -------------
0 - 19 years           743   648      780   682     1523  1330
20 - 39 years          667   699      689   713     1357  1412
40 - 59 years          593   754      575   768     1169  1523
60 years and over      472   640      388   523      859  1163
Total                 2474  2740     2432  2686     4907  5427
Source: www.statistikbanken.dk

Labels are used in both the column and row headings to ease the interpretation of the table. Note that nothing is gained by writing "gender" above the headings "female and male" because it is already clear that the feature is gender. Correspondingly, there is no gain in writing "country" in the heading above if Denmark, Sweden, Norway, etc. appear, or "year" where 1970, 1971, etc. appear. In the selected example, however, it could be of value to include a heading for the included age groups even if you could figure out what the variable is all about just by reading the title. The reason is that it is not immediately clear what these groupings represent. Next, a table requires mention of the measure used in the data material. Is the measure being made in millions of persons, thousands of units, GJ (billion Joules) or something else? In addition, the measure can be placed in a number of places. If there is only one measure represented in the table, it can be placed last in the title or just above where the data is located. The latter is often preferred (see Table 5.1.2). If there are several measures, these must be placed outside or inside the areas of the respective columns or rows. The measures must not be placed in a footnote or a note where they can easily be overlooked. Note in the first column of Table 5.1.2 that, when all the numbers are added up, they do not match the total given below. This occurs because of the practice of rounding off the individual numbers for presentation while the total is based on the sum of the numbers before they are rounded off. As a result, there is often a mismatch between the sum of the numbers in a column and the total given for that column. Rounding off is practiced because it is not always necessary to include all the digits in the data to give a reasonable presentation of the numbers. For example, it is rarely necessary to write the population of Denmark with seven digits. 
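
The rounding mismatch just described can be checked directly. The short Python sketch below takes the rounded figures printed in Table 5.1.2, adds up each column, and compares the result with the total reported in the table. The variable names are chosen for illustration only.

```python
# Rounded figures from Table 5.1.2 (1000 persons).
# Column order: Female 1970, Female 2006, Male 1970, Male 2006,
#               Total 1970, Total 2006.
columns = ["Female 1970", "Female 2006", "Male 1970", "Male 2006",
           "Total 1970", "Total 2006"]
rows = {
    "0 - 19 years":      [743, 648, 780, 682, 1523, 1330],
    "20 - 39 years":     [667, 699, 689, 713, 1357, 1412],
    "40 - 59 years":     [593, 754, 575, 768, 1169, 1523],
    "60 years and over": [472, 640, 388, 523, 859, 1163],
}
reported_total = [2474, 2740, 2432, 2686, 4907, 5427]

# Sum of the rounded entries in each column.
column_sums = [sum(col) for col in zip(*rows.values())]

# Difference between that sum and the total printed in the table.
differences = [s - t for s, t in zip(column_sums, reported_total)]

for name, s, t, d in zip(columns, column_sums, reported_total, differences):
    print(f"{name:12s} sum of rows = {s:5d}  reported total = {t:5d}  diff = {d:+d}")
```

Differences of a single unit in the last digit, as found here, are what one expects from rounding and do not indicate an error in the table. A larger difference would be a reason to re-check the figures.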
In Table 5.1.2, only 4 digits are used and the measure is in 1000 persons. Note the use of lines in the table: the data are not placed in separate windows, but appear instead as a body. There is room in the table for the sum of the columns; and notes and footnotes are placed under the last line of the table, if there are any. Notes are used to provide any supplementary information about the statistical unit being measured or the table in general; footnotes are used to provide supplementary information about individual elements in the table. For example, there might be a need to discuss the definition of a term or the technique used in calculating some of the numbers in a footnote. The table is made complete by a clear citation of the sources used. When a table is used in a report, you need to identify the source for the information under that table with just enough information that the reader can easily locate the full information for that source in the reference list at the end of the report. The information and form requirements for the reference list are discussed in Section 7. You can identify a source by the author and year of publication, for example, Andersen, et al. (2006), if one has used The Danish Economy as the source. If the author appears several times in the reference list for the same year, one should provide a suffix to the year, such as a, b, etc. (for example, 2006a). This suffix must also be used in the report's reference list. In Table 5.1.2, the source is a statistical database. Including the name of the variable (www.statistikbanken.dk/BEF1A) will make it easier to find the data. Diverse organizations, both public and private, often issue reports. When these reports are used as the source for a table, citation of the source should contain either the title and year or number of publication (for example, World Development Report, 2006, which is issued by the World Bank) or the name of the organization and year (for example, The World Bank (2006)).
The citation of the source in the table is made complete by a page or table number. A reference to the pages used makes it easier for the opponent, or other interested readers, to reproduce the material. It is a general requirement for technical reports that the included material can be reproduced. Without proper reference to pages or tables, it can be especially difficult to locate the actual material when books are used as sources. The rules discussed for citing sources for information used in tables also apply to material that is used in the text. References to sources for material used in the text are placed either in the text (for example, Andersen (2006), p. 25), or as a footnote located at the end of the page. The latter is often preferred in that a series of references to sources in the text can make the text cumbersome. It may be necessary to include numbers/data in the text when the amount to be used is too small to establish a table. For example, one would hardly construct a table to present two or three numbers. These numbers can be mentioned in the text without making the text cumbersome. Footnotes can also be used to annotate the text or to make side comments that are not central to the subject, but which can be interesting to the reader anyway. In Table 5.1.2, only the actual numbers are included. It can be seen that the population increased from 1970 to 2006 and that this increase took place only in the age groups 20 years and over. But the structure in the material becomes clearer if the numbers are presented as percentages, as in Table 5.1.3.
Table 5.1.3. Number of persons in Denmark, by gender and age, January 1, 1970 and 2006.

                        Female          Male           Total
                      1970  2006     1970  2006     1970  2006
                      ------------------- % ------------------
0 - 19 years            30    24       32    25       31    25
20 - 39 years           27    25       28    31       28    26
40 - 59 years           24    28       24    26       24    28
60 years and over       19    23       16    18       17    21
Total                  100   100      100   100      100   100
1000 persons          2474  2740     2432  2686     4906  5427
Source: As in Table 5.1.2.

In Table 5.1.3, the proportion of the population in each age group is shown by gender for 1970 and 2006. The actual numbers are included in the last row, distributed by gender and year, making it possible for the reader to calculate back to the original numbers found in Table 5.1.2. The table also reveals the development in the population for each gender and age group. If the focus is to be on growth in the population for each gender and age group, the numbers should be presented as in Table 5.1.4. In this table, the change in the population for each gender and age group can be seen directly by comparing the index numbers in 2006 with the index number for each group in 1970, which is 100. Instead of index numbers, changes in per cent could have been used. In this case, in the location "0-19 years / 2006 / female", -13% would replace the index number 87. The most useful form of presentation depends on the issue being investigated. But it should be emphasized that in the greater number of cases, you will have to work the data you get in order to present them in the form you want.
Tabel 5.1.4. Number of persons in Denmark, by gender and age, January 1, 1972 and 1992.

0 - 19 years 20 - 39 years 40 - 59 years 60 years and over Total

1970 2006 Female Male Total Female Male Total ------------------ 1000 ------------------- ------------------ 1970 = 100 ------------743 780 1523 87 87 87 667 689 1356 105 103 104 593 576 1169 127 133 130 472 388 859 136 135 135 2474 2432 4907 111 110 111

Source: As in Table 5.1.2.
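The conversions between actual numbers, percentage shares and index numbers are simple arithmetic. The following is a minimal sketch, assuming Python as the tool (the paper itself prescribes none); the function names are illustrative only. It recomputes the 1970 percentage distribution of Table 5.1.3 from the 1970 figures in Table 5.1.4 and converts an index number into a change in per cent.

```python
# 1970 population in 1000 persons, by age group, taken from Table 5.1.4.
AGE_GROUPS = ["0-19", "20-39", "40-59", "60+"]
FEMALE_1970 = [743, 667, 593, 472]
MALE_1970 = [780, 689, 576, 388]


def percentage_shares(values):
    """Return each value as a rounded per cent share of the column total."""
    total = sum(values)
    return [round(100 * v / total) for v in values]


def index_to_percent_change(index, base=100):
    """Convert an index number (base year = 100) to a change in per cent."""
    return index - base


female_shares = percentage_shares(FEMALE_1970)
male_shares = percentage_shares(MALE_1970)
change_0_19_female = index_to_percent_change(87)

for group, f, m in zip(AGE_GROUPS, female_shares, male_shares):
    print(f"{group:>6} years: female {f}%, male {m}%")
print(f"Index 87 corresponds to a change of {change_0_19_female}%")
```

The rounded shares agree with the 1970 columns of Table 5.1.3, which is a useful check when a table has been reworked by hand.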

In summary, a good table fulfils two requirements. First, it meets the technical criteria just described. Second, it presents the material in such a way that the relevant patterns become obvious. This second point concerns not only the calculation of per cent values and/or index numbers, but also an optimal degree of detail. In the previous tables, the material was divided into four age groups. If you were interested in the dependence of public sector expenditures on shifts in the age distribution of the population, another division of age groups might be better. However, you should be careful not to use too much detail; relevant patterns can drown in detail. A good table is clear.

5.2. Figure construction


Above all, it should be pointed out that a good figure, like a good table, presents the material clearly. The strength of a figure is that the pattern in the material appears more obvious than in a table, in many cases. This applies especially if one is working with a data series extending over a long period of time. If one wants to compare several data series that only cover 10 years, for example, a figure will also provide greater clarity than a table, as a rule. The next section presents the technical criteria for a figure, while the following sections present various figure configurations.


5.2.1. Technical criteria


In an empirical paper, the curve is the most popular type of figure used for picturing one or several sets of data. That is why the curve will be used as the example for describing the technical criteria associated with figures.

The figure must have a title and source identification analogous to that used for tables. The title can, of course, be placed in other locations, but placement at the top of the figure is most often used. Any notes to be added are usually placed under the abscissa (x-axis).

Axis labels must be included unless what is being measured on the axes is obvious. For example, one does not need to include the word "year" when 1980, 1981, etc. appear on the axis (this is most often the x-axis). Axis labels often contain a scale, for example, 1000 persons or millions of $. Also, it is cumbersome to include too many zeros on the axes. Instead of writing 100,000, 200,000, etc. on the axis, it is better to use 100, 200, etc. and include 1000 in the axis label. The scale can also be placed in the title.

Labels to the curves indicate what the individual curves represent (here A and B) and can be placed in several places. Placement at the end of the respective curves is preferred in most cases. In some cases, the curves run nearly together at the end, and so a placement at the end of the curves can be problematic. You can, instead, place a symbol on the curves at a place where the curves are clearly separate from each other. Labels to the curves can also be placed under the x-axis. However, this placement reduces the clarity a bit, especially if the figure includes several curves. One needs to remember the symbols for these representations to be able to read the figure.

Figure 5.2.1 consists of two figures showing the development in the number of persons employed in Denmark, where the ordinate axis (y-axis) begins at zero in the upper figure and at 2300 in the lower figure.
It seems clear that the curves are very different in the two cases. You might be inclined, therefore, to comment on the two figures differently. In the upper figure, you might note the smaller swings in employment, but would focus more on the rising trend, while in the lower figure, you might be caught by the swings in the curve, for example, the increasing employment from 1983 to 1987 as well as from 1993 to 2001. Which presentation is best depends on the issue one is investigating. Normally, it is best to include the point of origin so as not to exaggerate the swings in the data. But in certain cases, even small swings can be essential for the issue under investigation. If this is the case, these swings should of course be brought out by ignoring the point of origin.
Figure 5.2.1. Number of persons employed in Denmark, 1966-2006, 1000 persons.
[Two curve diagrams, x-axis 1966 to 2006. Upper figure: y-axis from 0 to 3000. Lower figure: y-axis from 2300 to 2900.]
Source: www.statistikbanken.dk/NAT18.

A figure must not be overcrowded, cf. the demand for clarity. In Figure 5.2.2, the world's oil reserves are indicated by country or country group in a bar diagram. But this representation makes it difficult to get an overview of the material. And the overview is not much better in Figure 5.2.3, where the bars from Figure 5.2.2 for each year are stacked on top of one another. This means that the development for the countries in the middle of the columns is difficult to read.
Figure 5.2.2. World oil reserves, by country or country group, 1982-2005, end of year.
[Bar diagram. x-axis: every second year from 1982 to 2004; y-axis: %, 0 to 35. Series: Saudi Arabia, Iran, Iraq, Kuwait, The Emirates, Venezuela, Non-OPEC, Other OPEC countries.]
Source: www.opec.org Annual Statistical Bulletin.

Figure 5.2.3. World oil reserves, by country or country group, 1982-2005, end of year.
[Stacked bar diagram. x-axis: every second year from 1982 to 2004; y-axis: 0% to 100%. Series: Saudi Arabia, Iran, Iraq, Kuwait, The Emirates, Venezuela, Non-OPEC, Other OPEC countries.]
Source: As in Figure 5.2.2.

Figure 5.2.4 shows the development in labour productivity as measured in eight sectors. The figure appears overloaded. It does not help that the rather complicated key to the curves has to be placed under the x-axis because of space limitations. This placement makes the figure even less clear.

Figure 5.2.4. Labour productivity (GDP in 2000 basic prices per worker) for the main sectors of the economy, 1966-2006.
[Curve diagram. x-axis: 1966 to 2006; y-axis: 1000 DKK/worker, 0 to 900. Curves: All sectors; Manufacturing; Trade, hotels, restaurants; Financial intermediation, business activities; Agriculture, fishing and quarrying; Construction; Transport, storage and communication; Public and personal services. The key is placed under the x-axis.]
Source: www.statistikbanken.dk/NATo7 and NAT18.

5.2.2. Logarithmic scales


In the previous sections, examples of curves were used to illustrate the development of one or several data series (time series), where individual observations were connected by straight lines.6 As mentioned earlier, curves are the figure type used most often. Curves drawn with a logarithmic scale on one axis (also called semi-logarithmic) are discussed in this sub-section. The use of the logarithmic scale implies that equal linear distances on the axis correspond to equal percentage changes on a normal scale. This implies that curves with the same slope have the same growth rate, as measured in per cent. Figure 5.2.5 illustrates three cases of varying slopes, where each case represents two curves drawn with the same orientation. In all three cases, the curves run parallel, which means that the two time series depicted by the curves have the same percentage growth rate. In the first case, the curves are straight lines, have constant slopes, and have, therefore, constant growth rates. This is, in other words, an exponential function growing by a constant per cent from period to period. In many cases, it can be expedient to calculate the growth rates and include them in the accompanying discussion.7 Seen over the long run, many time series follow approximately an exponential course, which can be depicted on a semi-logarithmic scale.

Figure 5.2.5. Logarithmic scale and growth rates.
[Three panels, each with a logarithmic y-axis and two parallel curves. Case 1: growth rates identical and constant. Case 2: growth rates identical and increasing. Case 3: growth rates identical and decreasing.]

In the second case, the increasing slopes of the curves mean increasing growth rates; and in the third case, the growth rates fall with time. It is apparent that the logarithmic depiction is especially useful when you wish to compare the growth rates of different time series. A second advantage of semi-logarithmic scales is illustrated in Figure 5.2.6. In the upper figure, a normal scale for GNP per capita is used; in the lower figure, a semi-logarithmic scale for GNP per capita is used. In the upper figure, country names are added in text boxes. In many cases, the location of countries will be of interest. A trendline, the equation of the trendline, as well as the corresponding R2 value are displayed on the chart.

6 Data can be represented either as flow variables or as stock variables. Flow variables concern a period of time, for example, the number of births during 2006, while stock variables concern a value as of a certain point in time, e.g., population January 1, 2006. Actually, when plotting the point for the number of births in 2006 in a figure, the value for 2006 ought to be identified with the middle of the year, i.e., July 1st, but this is not the custom. More usually, the number of births for all of 2006 is indicated on the x-axis. Corresponding indications are used for stock variables in that one as a rule includes the correct date in the title. You could write "Population of Denmark, January 1, 2006" and plot the population value on the x-axis.

7 $y_t = y_0 (1+r)^t$ indicates that y grows exponentially with the growth rate r. $y_t$ is the value at time t, $y_0$ is the value at time 0, and t is the number of time periods between 0 and t. When $y_t$, $y_0$ and t are known, r can of course be determined from the equation by isolating r. It is, however, more normal to use a PC or calculator, where r is found simply by keying in the three known values. The known rules for logarithms transform the previous equation to $\log y_t = \log y_0 + t \log(1+r)$, the equation for the straight line represented in the first case, where $\log y_0$ is the intercept on the y-axis and $\log(1+r)$ is the slope.
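The growth-rate relation given in footnote 7 can be worked through numerically. The sketch below is only an illustration in Python: the series values are hypothetical, and the function names are not taken from the paper. It isolates r from y_t = y_0(1+r)^t and checks that the logarithm of the series rises by the same amount in every period, which is why such a series appears as a straight line on a semi-logarithmic scale.

```python
import math


def growth_rate(y0, yt, t):
    """Solve y_t = y_0 * (1 + r)**t for r."""
    return (yt / y0) ** (1 / t) - 1


def log_line(y0, r, t):
    """Value of log y_t on the straight line log y_0 + t * log(1 + r)."""
    return math.log(y0) + t * math.log(1 + r)


# Hypothetical series: a value that doubles from 100 to 200 over 10 periods.
y0, yt, periods = 100.0, 200.0, 10
r = growth_rate(y0, yt, periods)
print(f"Average growth rate per period: {100 * r:.2f}%")

# On a logarithmic scale the step between neighbouring periods is constant.
steps = [log_line(y0, r, t + 1) - log_line(y0, r, t) for t in range(periods)]
print(f"Slope on the log scale: {steps[0]:.4f}")
```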
Figure 5.2.6. Total fertility rate as related to GNP per capita world-wide, 2004.
[Two scatter diagrams of the total fertility rate (y-axis, 0 to 7) against GNI per capita, PPP (x-axis). Upper figure: normal scale, 0 to 60,000, with Israel, China, Russia, Kuwait, Denmark, Luxembourg, Saudi Arabia and Hong Kong, China labelled; R2 = 0.4641. Lower figure: logarithmic scale, 100 to 100,000, with trendline y = -0.8131 ln(x) + 9.6942; R2 = 0.4641.]
Source: The World Bank: World Development Indicators.

Many countries have a very small per capita income compared with that of western industrialized countries. In a plot of observations of per capita income using a normal scale, those for the poorer countries will lie in a large clump close to the y-axis. If the plot is instead made using a logarithmic scale, the clump will dissolve, and the material will stand out more distinctly. Therefore, the semi-logarithmic scale is often better than a normal scale when comparing numbers that differ by orders of magnitude. This advantage is illustrated even more clearly in Figure 5.2.7. The curve of employment in the electricity, gas, and water sector nearly coincides with the x-axis in the upper figure, while the corresponding curve is clearly represented in the lower figure and lies distinctly separate from the x-axis. The lower part of Figure 5.2.7 gives the impression, in addition, that total employment was constant after 1966. However, this constant level of employment does not appear in the upper figure. This illustrates the weakness of the semi-logarithmic scale. It is often not useful for illustrating smaller changes that can be important in the investigation of certain issues.

Figure 5.2.7. Employment in the electricity, gas, and water sector and in Denmark in general, 1966-2006.
[Two curve diagrams, x-axis 1966 to 2006, with the curves "Total" and "Electricity, gas and water". Upper figure: y-axis in 1000 persons, normal scale, 0 to 3000. Lower figure: y-axis in 1000 persons, log scale, 1 to 10000.]
Source: As in Figure 5.2.4.

5.2.3. Bar and pie diagrams


In bar diagrams, data is represented by a column or part of a column. There are two types of bar diagrams. One type uses qualitative variables and has no scale. Municipalities, gender, countries, etc., are qualitative variables, where specific values for gender are male/female, and specific values for countries are Saudi Arabia, Iran, etc. Normally, a ratio (e.g., male/female) is measured along the y-axis and the qualitative variable is measured along the x-axis. Because there is no scale along the x-axis, the bars can be placed freely, and one normally leaves room between the bars to create clarity. The bars can also be placed as extensions of each other or next to each other. The other type of bar diagram is called a histogram and is used for illustrating quantitative variables. Quantitative variables are age, income, length of marriage, company profits, size of farms, etc. The quantitative variable is divided up into intervals placed on the x-axis only after the scale has been determined. Next, the unit interval must be chosen. This can be seen in the following example.
Table 5.2.1. Number of divorces, by length of marriage, 2005.

Duration of marriage      Number of divorces
Under 1 year                             169
1 year                                   568
2 years                                  872
3 years                                 1088
4 years                                 1277
5 years                                 1107
6-7 years                               1763
8-9 years                               1416
10-14 years                             2816
15-19 years                             1832
20-24 years                             1008
25 years and over                       1383
Total 1)                               15299

1) Excluding 1 case for which duration of marriage was not given.
Source: www.statistikbanken.dk/SKI107.

In Table 5.2.1, the intervals have different widths. This is not accounted for in the upper part of Figure 5.2.8, which gives the impression that a large number of marriages ended in divorce after 10-14 years of marriage.8 This is not the case. The problem is that the width of that interval is not consistent with the unit interval (which is one year) and therefore overrepresents the importance of divorces for the time period. By plotting instead the number of divorces per year of marriage, as in the lower part of Figure 5.2.8, the correct picture of the relationship between the two variables is obtained. It can be seen that the greatest number of divorces occurs after four years of marriage.9

Figure 5.2.8. Number of divorces, by length of marriage, 2005.
[Two histograms, x-axis: duration of marriage, years, 0 to 30. Upper figure: y-axis is the number of divorces, 0 to 3000. Lower figure: y-axis is the number of divorces per year of marriage, 0 to 1400, with a separate rectangle labelled "25 years and over".]
Source: As in Table 5.2.1.

8 The interval measures marriages that have lasted up to 14.999 years; that is, the interval measures up to, but does not include, the 15th year.

9 This conclusion should be taken with reservation because it is not clear how many marriages make up the basis for the divorce data. In other words, one must know the divorce rate for the individual years of the marriage's duration to be able to determine how many years of marriage are associated with the breaking up of the greatest number of marriages.

The frequency in a histogram is determined by the area of the bar; the number of divorces in the interval 10-14 years is the column height (563.2) multiplied by the ratio of the interval width to the unit interval (5/1), which is equal to 2816. The column height can be calculated as the frequency divided by the number of times the interval width is larger than the unit interval.10

It is clear that changing the widths of intervals should not change a figure completely when wishing to illustrate the relationship between two variables. The figure should look like what one would have if all the information about the number of divorces occurring for each length of marriage were available. This ideal is not entirely fulfilled when the distribution is skewed. Figure 5.2.8 is skewed to the right (the tail is on the right). It must, then, be assumed that there are more observations in the first half of an interval (prior to the peak) than in the last half of an interval.

The frequency for the interval at 25 years and over is expressed as a line in the upper part of Figure 5.2.8 without closure to indicate that the interval is open. One could choose to close the interval at, for example, 50 years and then calculate the height of the column by dividing the number of divorces by 25. The open interval can also be represented as a rectangle placed appropriately in the figure, as in the lower part of Figure 5.2.8, where a rectangle is drawn in corresponding to 1383 divorces. This area can be immediately compared with the other areas in the figure.

Another example is the number of tax-paying persons, arranged according to size of taxable income, as in Table 5.2.2. Income is a quantitative variable, so the frequency will also be plotted by unit interval, chosen to be 25,000 DKK in Figure 5.2.9.

10 The various interval widths make it a little difficult to use a graphics programme. One can use a scatter diagram, which represents the relationship between two variables with points, cf. Section 5.2.4. The points are used to draw the columns, and the points are erased after the columns are drawn in.
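The column-height rule for histograms with unequal interval widths can be checked against Table 5.2.1. The following Python sketch is an illustration only; the data layout and the function name `column_height` are assumptions made for the example. It divides each frequency by the number of unit intervals the class spans and reports the class with the highest column.

```python
# Divorces by duration of marriage, Table 5.2.1: (label, width in years, count).
# The open interval "25 years and over" is left out; closing it at 50 years,
# as suggested in the text, would give it a width of 25 years.
DIVORCES = [
    ("under 1", 1, 169), ("1", 1, 568), ("2", 1, 872), ("3", 1, 1088),
    ("4", 1, 1277), ("5", 1, 1107), ("6-7", 2, 1763), ("8-9", 2, 1416),
    ("10-14", 5, 2816), ("15-19", 5, 1832), ("20-24", 5, 1008),
]
UNIT_INTERVAL = 1  # years


def column_height(count, width, unit=UNIT_INTERVAL):
    """Frequency per unit interval: count / (width / unit)."""
    return count / (width / unit)


heights = {label: column_height(count, width) for label, width, count in DIVORCES}
peak = max(heights, key=heights.get)

for label, height in heights.items():
    print(f"{label:>8} years: {height:7.1f} divorces per year of marriage")
print(f"Highest column: {peak} years")
```

The same function covers the income example: with a unit interval of 25,000 DKK, an interval of width 50,000 DKK has its frequency halved.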

Table 5.2.2. Number of tax-paying persons, by size of taxable income, 2005.

Income, DKK          Persons,  Persons,  Persons,  Total income,  Income,  0.5B(A+(A+C))
                     1000      %         acc. %    mill. DKK      acc. %
< 25,000                  312       7.1       7.1          1,975      0.2           0.71
25,000 - 49,999           125       2.9      10.0          4,625      0.8           1.45
50,000 - 74,999           177       4.0      14.1         11,150      2.2           6.00
75,000 - 99,999           355       8.1      22.2         31,463      6.1          33.62
100,000 - 124,999         502      11.5      33.7         57,243     13.2         110.98
125,000 - 149,999         495      11.3      45.0         67,801     21.7         197.19
150,000 - 174,999         426       9.8      54.8         69,008     30.3         254.80
175,000 - 199,999         381       8.7      63.5         71,500     39.1         301.89
200,000 - 224,999         365       8.4      71.8         77,477     48.8         369.18
225,000 - 249,999         309       7.1      78.9         73,302     57.9         378.79
250,000 - 299,999         414       9.5      88.4        112,722     71.9         616.55
300,000 - 349,999         202       4.6      93.0         64,966     80.0         349.37
350,000 - 399,999         107       2.5      95.4         39,904     85.0         206.25
400,000 and over          199       4.6     100.0        120,968    100.0         425.50
Total                    4369     100.0                  804,103                 3252.26

Source: Statistikbanken.dk/IF13 and IF23.

There are a number of persons whose taxable income equals zero. This is without doubt the most typical income to the extent the material is divided up by very small income intervals. Here, an open interval is used for income levels under 25,000 DKK. The interval 250,000 to 299,999 DKK is two times the unit interval. This means that the column height in Figure 5.2.9 is only 207, rather than 414, as is shown in the table. The figure shows a sharp fall in the column height after 250,000 DKK. There is hardly any doubt that there are more taxpayers between 250,000 and 274,999 than between 275,000 and 299,999, so the figure is drawn somewhat incorrectly, cf. the discussion of the example regarding length of marriage and divorce. Therefore, material should be reported with as small an interval width as possible. If the same interval width is used throughout, the unit interval is equal to the width of the interval. There is no need to list the unit interval on the axis label in this case. The remaining data in the table will be used in section 5.2.5.

Figure 5.2.9. Number of tax-paying persons, by size of taxable income, 2005.
[Histogram. x-axis: taxable income in 1000 DKK, 0 to 450; y-axis: 1000 persons, unit interval 25,000 DKK, 0 to 600.]
Source: As in Table 5.2.2.

Population pyramids are a special form of bar diagram, where the frequency is placed on the x-axis instead of on the y-axis. Circle diagrams (pie diagrams) are used for illustrating per cent distributions. Figure 5.2.10 uses a circle diagram to show the distribution of global oil reserves among the eight countries or country groups used in Figures 5.2.2 and 5.2.3. For clarity's sake, it is recommended that labels for the individual sections of the circle be written close to the respective areas, as in the upper part of the figure, instead of written elsewhere, as in the lower part of the figure. In addition, you are warned against using too much detail in the circle diagram and against using too many of them. Using 10 or more circle diagrams in a paper to illustrate the development in global oil reserves since 1960 is not at all sensible. One table or two curve diagrams (four countries/country groups in each diagram) would be far preferable.

Figure 5.2.10. World oil reserves, by country, 2005, in per cent.
[Two circle diagrams with the sections Saudi Arabia, Iran, Iraq, Kuwait, The Emirates, Venezuela, Non-OPEC and Other OPEC countries. Upper figure: labels written next to the sections. Lower figure: labels listed in a separate key.]
Source: www.opec.org Annual Statistical Bulletin.

5.2.4. Scatter diagrams


As mentioned earlier in the section on formulating the statement of the problem, one normally includes explanatory material in a paper. This is often done by comparing the base material (the explained variable) and the explanatory variable in a curve, where time is on the x-axis. One looks for a pattern in the material that indicates a relationship between the two variables. In many cases, this pattern can be illustrated using a plot of the values (most usually values from the same year) of the explained variable and the explanatory variable in a so-called scatter diagram. The scatter diagram is used in Figure 5.2.6 with GNI per capita as the explanatory variable and fertility as the explained variable. Of course, a lot of variables are correlated with income.

5.2.5. Lorenz curves


GNI per capita differs significantly among the countries of the world. This variation or skewness in global income distribution can be illustrated using a histogram, where GNI per capita is divided up into intervals, and the number of countries falling within the individual intervals is, then, the value that determines the size of the columns. One could choose to let the value for each country be determined by population size and assume that all persons in a given country have an income corresponding to the respective country's GNI per capita. In this case, China, with a population approximately 225 times that of Denmark, would be represented by a column that is 225 times larger than that of Denmark.

The skewness in a distribution can also be illustrated in a Lorenz curve, as in Figure 5.2.11. The Lorenz curve is constructed in the following way: First, the countries are arranged according to size of GNI per capita. Next, the countries' per cent shares of world GNI and world population are calculated and, after that, accumulated. Finally, the two accumulated per cent shares are plotted in a scatter diagram. From this, one can read how large a share of the world's GNI accrues altogether to the poorest 50% of world population. The dotted line from 50% on the x-axis to the curve indicates on the y-axis that the poorest 50% receive approximately 5% of the world's GNI. One can also read from the curve that the poorest 80% of the population receive about 16% of the world's GNI. This means that the distribution of the world's GNI is very much skewed. Note that no account is taken of the spread of income within the individual countries.
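The construction steps for the Lorenz curve (sort by GNI per capita, compute shares, accumulate) can be sketched in a few lines of Python. The three countries below are hypothetical and are not World Bank data; the function name `lorenz_points` is likewise only an illustration.

```python
def lorenz_points(countries):
    """Return accumulated (% of population, % of GNI) points.

    `countries` is a list of (name, gni_per_capita, population) tuples.
    Everyone in a country is assumed to receive that country's GNI per capita.
    """
    ordered = sorted(countries, key=lambda c: c[1])  # poorest first
    total_pop = sum(pop for _, _, pop in ordered)
    total_gni = sum(gni_pc * pop for _, gni_pc, pop in ordered)

    points = [(0.0, 0.0)]
    acc_pop = acc_gni = 0.0
    for _, gni_pc, pop in ordered:
        acc_pop += pop
        acc_gni += gni_pc * pop
        points.append((100 * acc_pop / total_pop, 100 * acc_gni / total_gni))
    return points


# Hypothetical countries: (name, GNI per capita, population in millions).
EXAMPLE = [("C", 30000, 50), ("A", 1000, 500), ("B", 5000, 200)]
curve = lorenz_points(EXAMPLE)
for pop_share, gni_share in curve:
    print(f"{pop_share:6.1f}% of population -> {gni_share:6.1f}% of GNI")
```

Plotting these points in a scatter diagram and connecting them gives the Lorenz curve; because the countries are sorted from poorest to richest, the curve never rises above the diagonal.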
Figure 5.2.11. GNI, by world population, 2004, Lorenz Curve.
[Lorenz curve. x-axis: % of population, accumulated, 0 to 100; y-axis: % of GNI, accumulated, 0 to 100. Reference lines: totally even distribution and totally uneven distribution.]
Source: The World Bank: World Development Indicators

A totally even income distribution means that the per cent share of GNI and population are the same over the entire curve, as illustrated by the straight line (totally even distribution) in Figure 5.2.11. A totally uneven income distribution means that the person with the highest income has, in fact, all the income. This distribution is represented by a horizontal curve following the x-axis up to 100% and the vertical line from 100% to the top of the diagram. The closer the Lorenz curve lies to the line illustrating a totally even distribution, the more equal the distribution. By drawing a Lorenz curve for several years of data, one can determine visually whether the direction of change has been toward greater global equality or the opposite. Table 5.2.2 showed the distribution of taxable income for taxpayers and Figure 5.2.9 showed a histogram of this distribution. The data can also be represented in a Lorenz curve, as in Figure 5.2.12.
Figure 5.2.12. Taxable income, by size of taxable income, 2005, Lorenz Curve.
[Lorenz curve. x-axis: % of taxpayers, acc., 0 to 100; y-axis: % of taxable income, acc., 0 to 100. Points A and C are marked in the figure.]
Source: As in Table 5.2.2.

It is apparent that the distribution of data in Figure 5.2.12 is not nearly as skewed as the distribution shown in Figure 5.2.11. The 80% of taxpayers with the lowest income received approximately 60% of taxable income. In Figure 5.2.11, the corresponding amount was approximately 16%. It seems as if world income distribution is more skewed than income distribution in Denmark. Such a conclusion should be taken with caution given that different definitions of income have most likely been used in calculating income distribution. In addition, the different income definitions each have weaknesses that affect a true calculation of income distribution. A substantial criticism can, in any case, be made against using GNI as a measure of prosperity, and taxable income can be reduced when deductions are taken into account. The value of taxable income does not take into consideration that income varies over lifetimes. Two persons who have the same lifetime income will most likely receive largely different taxable incomes for a given year.

The skewness in the distribution can also be illustrated using the Gini coefficient, which measures the area bounded by the line AB (totally even distribution) and the Lorenz curve, divided by the area of the triangle ABC = 0.5 * 100 * 100 = 5000. The larger the Gini coefficient, the larger the skewness in the distribution. It is apparent that an equal income distribution results in a Gini coefficient of 0 and a totally unequal income distribution results in a coefficient of 1. The Gini coefficient for the curve in Figure 5.2.12 is calculated to be 0.35.
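Calculation of the Gini coefficient.
[Diagram of one income interval under the Lorenz curve, with the labels $X_i$, $Y_i$ and $G_i$.]

Start by calculating the area between the Lorenz curve and ACB (totally uneven distribution). As the worked example below shows, $X_i$ is the per cent share of taxpayers in income interval i, $Y_i$ is the accumulated per cent share of income at the start of the interval, and $Y_i + G_i$ is the accumulated share at its end.

Area of income interval i: $Y_i X_i + 0.5 X_i G_i = 0.5 X_i (Y_i + (Y_i + G_i))$.
Income interval 75,000 - 99,999 DKK: $0.5 \cdot 8.1 \cdot (2.2 + 6.1) = 33.62$, cf. Table 5.2.2.
Total area is then the sum of the areas for all the income intervals = 3252.26, cf. Table 5.2.2.
The area bounded by AB and the Lorenz curve: $5000 - 3252.26 = 1747.74$.
Gini coefficient: $1747.74 / 5000 = 0.35$.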
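The calculation above can be reproduced from two columns of Table 5.2.2. The sketch below is an illustration in Python, with function names chosen for the example; it sums the trapezoid areas under the Lorenz curve and derives the Gini coefficient from them.

```python
# Columns from Table 5.2.2, one entry per income interval (lowest income first).
PERSONS_PCT = [7.1, 2.9, 4.0, 8.1, 11.5, 11.3, 9.8, 8.7, 8.4, 7.1, 9.5, 4.6, 2.5, 4.6]
INCOME_ACC_PCT = [0.2, 0.8, 2.2, 6.1, 13.2, 21.7, 30.3, 39.1, 48.8, 57.9,
                  71.9, 80.0, 85.0, 100.0]


def area_under_lorenz(widths, acc_income):
    """Sum the trapezoids 0.5 * X_i * (Y_i + (Y_i + G_i)) over all intervals."""
    area = 0.0
    previous = 0.0  # accumulated income share at the start of the interval
    for width, current in zip(widths, acc_income):
        area += 0.5 * width * (previous + current)
        previous = current
    return area


def gini(widths, acc_income):
    """Area between the diagonal and the Lorenz curve, relative to the triangle."""
    triangle = 0.5 * 100 * 100
    return (triangle - area_under_lorenz(widths, acc_income)) / triangle


area = area_under_lorenz(PERSONS_PCT, INCOME_ACC_PCT)
coefficient = gini(PERSONS_PCT, INCOME_ACC_PCT)
print(f"Area under the Lorenz curve: {area:.2f}")
print(f"Gini coefficient: {coefficient:.2f}")
```

Because the Lorenz curve is approximated by straight segments within each interval, the result slightly understates the skewness; narrower income intervals would reduce this error.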

Deciles and quartiles are often used in income and wealth statistics. As with the construction of Lorenz curves, taxpayers are arranged according to size of income. The first decile (10% fractile) is the value of income below which 10% of the observations lie. The second decile (20% fractile) is the income below which 20% of the observations lie. The first quartile corresponds to the 25% fractile and the upper quartile corresponds to the 75% fractile. The 50% fractile is also called the median. The median is equal to the taxable income of the person who is located just in the middle of the distribution when the taxpayers are arranged according to size of taxable income.

5.3. Using comparative and explanatory material


In discussing the formulation of the statement of the problem, it was seen that it is, in many cases, of great value to put the base material in a context using comparative material. Comparisons can be carried out by assembling the base and comparative data in the same table or figure and by using an accompanying text to highlight the differences between the two. In an analysis of the industry structure in the County of Ringkøbing, it would be natural to compare that with the industry structure in Denmark in general. This comparison could be made based on the per cent distribution of employment within each industry, as in Table 5.3.1. But the use of a simple calculation can often be an advantage when comparing distributions. In Table 5.3.1, the coefficients in the last column are calculated by dividing the per cent value for Ringkøbing by the corresponding value for the whole country. If the coefficient is greater than 1, a relatively large number of persons are employed in the respective industry in the County of Ringkøbing. The table shows that there are relatively many employed in agriculture, fishing and manufacturing (especially manufacture of textiles and leather) in the County of Ringkøbing compared to Denmark as a whole.

In this example, two distributions of the same kind are compared, that is, employment distributed by industry. But these relative coefficients can also be used with advantage when relating distributions of different kinds to each other. In energy analyses, relative energy intensities can be calculated for different industries by dividing the share of energy consumption for the industry (as a proportion of that for all industries) by the share of production for the industry. This results in a measure of the relative energy demand for the production of individual industries. No further comment on Table 5.3.1 will be made here. It should be pointed out, however, that all material used in the paper must be processed and worked on so that the relevant comparisons appear clearly in the report.

Table 5.3.1. Employment, by industry, County of Ringkøbing and Denmark, January 1, 2005.

                                                          Denmark  Ringkøbing  Relative
                                                                               coefficient
                                                          ------- % --------
Agriculture, horticulture and forestry                       3.12        6.01     1.93
Fishing                                                      0.16        0.69     4.44
Mining and quarrying                                         0.14        0.05     0.39
Manufacture of food, beverages and tobacco                   2.72        4.36     1.61
Manufacture of textiles and leather                          0.37        2.51     6.78
Manufacture of wood products, printing and publication       2.10        3.85     1.83
Manufacture of chemicals and plastic products                1.87        1.45     0.77
Manufacture of other non-metallic mineral products           0.57        0.62     1.09
Manufacture of basic metals and fab. of metal products       6.19        9.78     1.58
Manufacture of furniture, manufacturing n.e.c.               0.99        1.94     1.96
Electricity, gas and water supply                            0.53        0.43     0.81
Construction                                                 6.27        6.15     0.98
Sale and repair of motor vehicles, sales of auto. fuel       2.25        2.37     1.05
Wholesale except of motor vehicles                           5.80        6.48     1.12
Retail trade and repair work exc. of m. vehicles             6.96        6.93     1.00
Hotels and restaurants                                       3.07        2.43     0.79
Transport                                                    4.28        3.36     0.79
Post and telecommunications                                  1.87        1.13     0.60
Finance and insurance                                        2.71        1.96     0.73
Letting and sale of real estate                              1.67        1.25     0.75
Business activities                                          9.74        5.72     0.59
Public administration                                        5.46        3.89     0.71
Education                                                    7.54        6.37     0.85
Human health activities                                      5.81        4.57     0.79
Social institutions etc.                                    12.08       11.27     0.93
Associations, culture and refuse disposal                    5.29        4.07     0.77
Activity not stated                                          0.44        0.37     0.83
Total                                                       100.0       100.0     1.0

Source: www.statistikbanken.dk
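The relative coefficient in the last column of Table 5.3.1 is the county's per cent share divided by the national share for the same industry. As a minimal Python sketch (the dictionary layout and function name are assumptions for the example), three rows of the table are recomputed below.

```python
# Per cent of total employment, from Table 5.3.1: (Denmark, Ringkøbing).
EMPLOYMENT_SHARES = {
    "Agriculture, horticulture and forestry": (3.12, 6.01),
    "Manufacture of textiles and leather": (0.37, 2.51),
    "Business activities": (9.74, 5.72),
}


def relative_coefficient(region_share, country_share):
    """Regional share divided by the national share of the same industry."""
    return region_share / country_share


coefficients = {
    industry: round(relative_coefficient(region, country), 2)
    for industry, (country, region) in EMPLOYMENT_SHARES.items()
}
for industry, value in coefficients.items():
    position = "above" if value > 1 else "at or below"
    print(f"{industry}: {value:.2f} ({position} the national share)")
```

The same function could be applied to relative energy intensities by passing an industry's share of energy consumption and its share of production instead of two employment shares.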

In the previous discussion on causal analysis (and in that on formulating the statement of the problem), it was shown that an event or an effect is normally caused by a long series of factors. If the wish is to determine the relationship between C and E, it is apparent that the background factors B should be seriously considered. An example was given where C was occupation and E was mortality. The B's were, then, other factors that had contributed to death, such as gender, age, alcohol misuse, etc.

The background factors must be taken into account when clustering the data. That is, a comparison is made of groups that are equivalent with respect to background factors. In the example with mortality, you would, then, compare mortality between occupations u and x for those groups that are equivalent with respect to gender, age, or alcohol misuse. If the most important B's are included in the clustering, the remaining difference in mortality can probably be ascribed to occupation.

Clustering is based simply on a division of the total into groups. This could be called an additive analytical method. Deaths are divided up into groups in which different values of the characteristics relevant for an analysis of mortality are represented. If you are to analyze energy consumption, it would be natural to divide this consumption up into sectors or purposes consisting of sub-sectors or sub-purposes in which development in consumption is dependent on the same explanatory factors. Industries' energy consumption is dependent on factors other than those affecting energy consumption in the heating of private households. Industries' energy consumption is, to a great extent, dependent on industry production, while private households' energy consumption is very much dependent on explanatory factors such as disposable income, the relative price of energy, etc.

The multiplicative analytical method can be used to advantage for other problems being investigated. If you are analyzing petrol consumption, it is natural to include the following explanatory factors: petrol consumption/mile, miles/car, number of cars/GDP (constant prices) and GDP (constant prices). All these factors multiplied by each other result in petrol consumption.

The first factor is a measure of energy intensity for petrol-driven cars. This factor is dependent on petrol prices, among other things. The higher the price of petrol, the higher is the willingness to drive in cars that get high mileage per gallon. The second factor is a measure of how much the car has been used. This factor can also be assumed to be dependent on petrol prices and disposable income. The third factor links the number of cars together with the current measure for economic development. An increasing GDP will, ceteris paribus, result in a larger fleet of cars.

These explanatory factors can be compared in a table or figure with data for a series of years, and one can calculate the individual factor's contribution to the change in petrol consumption. The multiplicative method of analysis is used also in the calculation of the standardized mean, which is discussed in the next sub-section.
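One possible way of calculating each factor's contribution in the petrol example is sketched below. The text does not prescribe a method, so the logarithmic split used here is an assumption: because the factors multiply, the logarithm of the total change equals the sum of the logarithms of the factor changes. All figures and variable names are hypothetical and serve only as illustration.

```python
import math

# Hypothetical figures for two years, invented purely for illustration.
year_a = {
    "petrol_per_mile": 0.050,  # gallons per mile
    "miles_per_car": 9000.0,
    "cars_per_gdp": 2.0,       # cars per unit of GDP (constant prices)
    "gdp": 1000.0,             # GDP in constant prices
}
year_b = {
    "petrol_per_mile": 0.045,
    "miles_per_car": 9500.0,
    "cars_per_gdp": 2.1,
    "gdp": 1200.0,
}


def petrol_consumption(factors):
    """Petrol consumption is the product of the four factors."""
    return math.prod(factors.values())


def factor_contributions(start, end):
    """Log change of each factor; the terms sum to the total log change."""
    return {name: math.log(end[name] / start[name]) for name in start}


contrib = factor_contributions(year_a, year_b)
total_log_change = math.log(petrol_consumption(year_b) / petrol_consumption(year_a))

for name, value in contrib.items():
    share = 100 * value / total_log_change
    print(f"{name:16s} {share:7.1f}% of the total change")
```

A negative share means that the factor pulled consumption in the opposite direction of the total change, as the falling petrol consumption per mile does in these invented numbers.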


5.4. Standardizing means


Let us say the result, E, can be explained by the multiplication of two factors, B1 and B2, and one wishes to know how much one factor influences the result when the other factor is held constant (i.e., the calculation is standardized).

[Diagram: B1 and B2 (components of the standardized mean) combine to give E (result).]

In a note to Table 5.4.1, it is indicated that the proportion of women employed in manufacturing in 1979 was somewhat larger in the County of Ringkøbing than in the rest of Denmark. The differences in the proportion of women employed can partly be explained by the differences in manufacturing structure and partly by the differences in the proportion of women in the various manufacturing branches. The data can be clustered using the proportion of women employed in the individual manufacturing branches for Denmark as the standard, as in Table 5.4.1. It is calculated, then, how many women would be employed in the County of Ringkøbing if the manufacturing groups in Ringkøbing employed the same proportion of women that the manufacturing groups in Denmark employ in general. This calculation yields 11,635 women, corresponding to a proportion of women equal to 32.9 per cent (100 x 11,635/35,415). But the actual share of women was 29.9.¹¹ That is, the proportion of women employed in the individual branches of manufacturing in the County of Ringkøbing was lower than that for those branches in the rest of Denmark. It is interesting to note, however, that the County of Ringkøbing is the industrial centre for industries in which the share of women employed was (is) high, for example, the textile and leather industry.

¹¹ Statistisk årbog 2006, S 2005 Table 114.


Table 5.4.1. Proportion of women employed in manufacturing, by sector, 2005.


Sector                                      Total employed    Proportion of     Calculated number of
                                            in county of      women,            employed women in the
                                            Ringkøbing        Denmark (1)       county of Ringkøbing (2)
Mining and quarrying                                  78           13.0                   10
Food, beverages and tobacco                        6,287           40.6                2,550
Textiles and leather                               3,626           54.0                1,959
Wood products, printing and publication            5,545           32.3                1,792
Chemicals and plastic products                     2,088           41.6                  869
Other non-metallic mineral products                  898           18.5                  166
Basic metals and fabricated metal products        14,095           23.8                3,360
Furniture, manufacturing n.e.c.                    2,798           33.2                  929
Total                                             35,415           31.4               11,635

1. Number of employed women x 100/total employed.
2. The share of women employed in all of Denmark multiplied by the number of men and women employed in Ringkøbing/100.
Source: Statistikbanken.dk
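The standardized calculation behind Table 5.4.1 can be retraced with a short sketch. The sector figures are taken from the table; the variable names are illustrative assumptions. Because the published proportions are rounded to one decimal, the recalculated total lands a few persons below the 11,635 reported in the table.

```python
# (sector, total employed in Ringkøbing, share of women in Denmark in %),
# figures copied from Table 5.4.1.
sectors = [
    ("Mining and quarrying", 78, 13.0),
    ("Food, beverages and tobacco", 6287, 40.6),
    ("Textiles and leather", 3626, 54.0),
    ("Wood products, printing and publication", 5545, 32.3),
    ("Chemicals and plastic products", 2088, 41.6),
    ("Other non-metallic mineral products", 898, 18.5),
    ("Basic metals and fabricated metal products", 14095, 23.8),
    ("Furniture, manufacturing n.e.c.", 2798, 33.2),
]

# Total employment in the county (the weights of the standardization).
total_employed = sum(employed for _, employed, _ in sectors)

# Women who would be employed if every sector in the county had the
# national proportion of women (the standard).
expected_women = sum(employed * share / 100 for _, employed, share in sectors)

# Standardized proportion of women, in per cent.
standardized_share = 100 * expected_women / total_employed

print(f"Total employed:         {total_employed}")
print(f"Calculated women:       {expected_women:.0f}")
print(f"Standardized share (%): {standardized_share:.1f}")
```

Comparing the standardized share with the actual share of 29.9 isolates the effect of the lower proportion of women within the individual branches, since the manufacturing structure is the same in both figures.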

An analogous example could be made for the changes in energy intensity. Using data from 1973 through 2005, energy intensity is measured by energy consumption divided by GDP in constant basic prices. The change in the total intensity can partly be explained by changes in the intensity for various industries and partly by the shift in the relative significance of industries, as measured by their contribution to GDP at basic prices. GDP at basic prices, distributed by industries in 1973, can be chosen as the standard. By multiplying this standard by the energy intensities in 2005, the energy consumption can be calculated as if there had not been changes in industries' contribution to GDP at basic prices from 1973 to 2005. The calculated energy consumption for 2005 is then compared with the actual energy consumption for 1973, and if the calculated consumption is less than the actual, the energy intensity for industries has fallen during the period. The calculated consumption can be expressed in per cent of the actual consumption, and the conclusion could be that energy consumption has fallen by x% as a result of the industries' falling energy intensity. It might be easier to sketch the explanatory factors and the calculated results in the following:


Diagram for standardized means

GDP at basic prices              Energy intensity
distributed by industries        1973            2005
1973                             Actual          Calculated
2005                             Calculated      Actual

By using the above diagram, it should be clear what the differences are and which factors are responsible for these differences. When comparing the results in the upper row, the standard is GDP at basic prices distributed by industries in 1973. The difference in results should, then, be assigned to the other factor, here energy intensity for the various industries. Energy intensity could also be used as the standard and a calculation could be made for how much energy consumption changed as a result of the development in GDP at basic prices. A corresponding diagram can be worked out for the proportion of women employed in industry in the County of Ringkøbing and in Denmark in general.

An analogous calculation of standardized means can be made using population statistics. The total fertility rate is dependent on both the inclination of women to give birth as well as the number of women in the child-bearing years. When measuring women's fertility, you would need to neutralize the age structure. Said in another way, the age structure must be standardized when illustrating the development in fertility. This is done by calculating total fertility for 1000 women going through the child-bearing years. This measure is, then, independent of the number of women in the child-bearing years.

The national accounts contain a standardized share of wages. The total wage share (total wages/GDP at basic prices) is dependent on both the wage share for the individual industries as well as the industry structure. By standardizing the industry structure, the development in the wage share can be illustrated. There are many examples in which the final value is the result of the multiplication of two factors. In all of these examples, standardized calculations can be used to advantage.


5.4.1. Price and quantity (volume) indexes


The most frequently used techniques for standardization appear in connection with the calculation of price and volume indexes. Movements in values depend on both price and quantity changes. It is interesting to know, for example, if increased domestic consumption is caused by both increased consumer prices and increased consumption in terms of quantity. If you wish to isolate the pure price movement, the quantities must be used as a standard. In this section, index and other related calculations will be demonstrated.

The price index represents a total expression of the movement in prices for several goods or services. In the following, the discussion is limited to goods only. The problem with index calculations is how to determine the weights that appropriately represent price movements for individual goods used in the summary price index. The weight problem is solved in different ways in the following three most popular index formulas.
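Laspeyres price index:

$$P^{LA}_{t:o} = \frac{\sum_i p_{i,t}\, q_{i,o}}{\sum_i p_{i,o}\, q_{i,o}} \cdot 100$$

The budget method:

$$P^{LA}_{t:o} = \sum_i \frac{p_{i,t}}{p_{i,o}}\, B_{i,o} \cdot 100, \qquad B_{i,o} = \frac{p_{i,o}\, q_{i,o}}{\sum_i p_{i,o}\, q_{i,o}}$$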
i = 1, ..., m; p = prices, q = quantity, and Bi,o = the budget share for good i in year o. The numerator indicates the expenditure on the quantities of m goods bought in the index base year (year o) valued at the prices in the final year (year t). When this is expressed in relation to the same quantities valued at the prices of the base year of the index (the denominator), the result is the price increase for the m goods from year o to year t.¹² The Laspeyres index uses the quantities of the base year of the index as the standard, and this means that this index measures price movements for a fixed goods combination. The index can also be calculated using the (equivalent) budget method, in which the price increase for the individual good is weighted by the share of the expenditure on that good in the budget for the base year of the index. The greater the weight given the good consumed, the stronger is the representation of the price increase of this good in the consumer price index.
Paasche price index:

$$P^{PA}_{t:o} = \frac{\sum_i p_{i,t}\, q_{i,t}}{\sum_i p_{i,o}\, q_{i,t}} \cdot 100$$

The Paasche price index uses up-to-date weights, which means the weights derive from the current time period. The calculation of the Laspeyres and Paasche indexes is illustrated in Table 5.4.2.
Table 5.4.2. Calculation of the beer and wine index.
Year      Beer            Wine            P(LA) t:o     P(PA) t:o
          q       p       q       p
0         100     4       50      10      100           100
1         75      4       75      10      100           100
2         100     4       50      12      111           111
3         135     4       30      12      111           107
4         50      4       100     12      111           117

Table 5.4.2 shows that the two indexes do not react to pure quantity changes, e.g., the first year. The table also shows that the two indexes are identical when no quantity changes have taken place, e.g., in the second year. This follows directly from the formulas: the two indexes are identical when qo and qt are identical.

The table further shows that the price increase calculated by the Laspeyres index is larger than that calculated by the Paasche index when consumption of the good that has become relatively cheaper increases (beer consumption increases relative to wine consumption; the beer price has fallen relative to the wine price), e.g., the third year. Normally, consumption of a good will increase relative to the consumption of all other goods when the price of that good decreases relative to the prices of all other goods. The Laspeyres index does not take into consideration this substitution that takes place when relative prices change. This results in a numerator that is too high because prices and quantities relate to different years. Therefore, the index overestimates the real price increase. In contrast, the Paasche index underestimates the real price increase (the denominator is too high).

Table 5.4.2 shows, however, that the two indexes "exchange places" when consumption increases relatively for the good for which the price has increased relatively, e.g., the fourth year.
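The index values in Table 5.4.2 can be retraced with the following sketch, which applies the Laspeyres and Paasche formulas to the beer and wine data. The data come from the table; the function and variable names are illustrative assumptions.

```python
# year: (beer quantity, beer price, wine quantity, wine price), from Table 5.4.2
data = {
    0: (100, 4, 50, 10),
    1: (75, 4, 75, 10),
    2: (100, 4, 50, 12),
    3: (135, 4, 30, 12),
    4: (50, 4, 100, 12),
}


def split(row):
    """Return (prices, quantities) for the two goods in one year."""
    beer_q, beer_p, wine_q, wine_p = row
    return (beer_p, wine_p), (beer_q, wine_q)


def value(prices, quantities):
    """Sum of price times quantity over all goods."""
    return sum(p * q for p, q in zip(prices, quantities))


def laspeyres(base, current):
    """Price index with base-year quantities as the standard."""
    p0, q0 = split(base)
    pt, _ = split(current)
    return 100 * value(pt, q0) / value(p0, q0)


def paasche(base, current):
    """Price index with current-year quantities as the standard."""
    p0, _ = split(base)
    pt, qt = split(current)
    return 100 * value(pt, qt) / value(p0, qt)


la = [round(laspeyres(data[0], data[t])) for t in sorted(data)]
pa = [round(paasche(data[0], data[t])) for t in sorted(data)]

print("Laspeyres:", la)
print("Paasche:  ", pa)
```

In year 3 the Laspeyres index exceeds the Paasche index (normal substitution towards beer), while in year 4 the order is reversed.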
¹² The calculation can, of course, be made for a period of less than 1 year. The time dimension given in the example is used for pedagogical reasons.

It should be stressed that the Laspeyres index only overestimates, and the Paasche index only underestimates, the real price increase when normal substitution takes place, i.e., away from the good that has become relatively expensive. Table 5.4.3 shows the calculation of the price movement from 1996 to 2006 for three components of the consumer price index, using the Laspeyres formula.
Table 5.4.3. Calculation of the "housing index".

                                                      Weight distribution     Consumer price index 1)
                                                      2003, %                 1996        2006
Rent housing                                          22.47                   91          116
Electricity and fuel                                   7.50                   81          121
Furniture, furnishings, household services, etc.       6.22                   93          109

1) Year 2000=100.

$$P^{LA}_{2006:1996} = 100 \cdot \left( \frac{22.47}{22.47 + 7.50 + 6.22} \cdot \frac{116}{91} + \frac{7.50}{22.47 + 7.50 + 6.22} \cdot \frac{121}{81} + \frac{6.22}{22.47 + 7.50 + 6.22} \cdot \frac{109}{93} \right) = 130.25$$
Source: www.statistikbanken.dk
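The "housing index" of Table 5.4.3 is an application of the budget method: each relative price is weighted by the component's share of the combined weight. A minimal sketch, with the figures taken from the table and illustrative variable names:

```python
# (component, weight 2003 in %, CPI 1996, CPI 2006), from Table 5.4.3
components = [
    ("Rent housing", 22.47, 91, 116),
    ("Electricity and fuel", 7.50, 81, 121),
    ("Furniture, furnishings, household services, etc.", 6.22, 93, 109),
]

# The three weights do not sum to 100, so they are rescaled to shares
# of their own total before being applied.
total_weight = sum(weight for _, weight, _, _ in components)

# Budget formula: sum of (budget share x relative price) x 100.
housing_index = 100 * sum(
    (weight / total_weight) * (cpi_2006 / cpi_1996)
    for _, weight, cpi_1996, cpi_2006 in components
)

print(f"Housing index 2006 (1996=100): {housing_index:.2f}")
```

Only the relative prices and the weights enter the calculation, which is why the indexed component prices are sufficient.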

In an empirical paper, it can often be necessary to calculate partial indexes of the price index. The calculation of these partial indexes is normally made using the budget formula of the Laspeyres index in that the prices are indexed and the share of the budget is provided. These elements are sufficient for calculating the price index. In the budget formula, you only need to know the relative price (pt/po) and not the individual prices in the two years.

In Table 5.4.3, the weights are not taken from the base year of the index, the year in which the index equals 100. The weights derive from the values in 2003. When the year from which the weights are taken for the index lies between the base year and the most current year in the data series, it is not possible to claim that the index overestimates or underestimates the real price rise when substitution takes place.

Danmarks Statistik changes the weight basis used in calculating the Laspeyres index on a continuous basis, and they also change the base year of the index once in a while. If you use a price index in a paper covering a longer period of time, a linkage between the indexes will often be necessary. This type of linking is illustrated in Table 5.4.4. If you want the price index for 2006, using 1990 as the base year of the index, you must first calculate the index for the year in which the link is being made (2003), with 1990 set equal to 100. Next, the index for 2006 is calculated, with 2003 set equal to 100. Finally, the two indexes are multiplied, yielding the price rise from 1990 to 2006.
Table 5.4.4. Linking consumer price indexes.

Index base     Year    Index      Year    Index      Calculation
1980=100       1990    177.4      2003    234.6
1990=100       1990    100        2003    132.3      (234.6*100/177.4)
2000=100       2003    107.0      2006    112.3
2003=100       2003    100        2006    104.9      (112.3*100/107.0)
1990=100       1990    100        2006    138.8      (132.3*104.9/100)

Source: www.statistikbanken.dk
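The linking in Table 5.4.4 consists of two rebasing steps followed by a multiplication. The sketch below retraces it with the published index values; the function names are illustrative assumptions. Because the sketch does not round the intermediate results, they can differ from the table in the last decimal.

```python
def rebase(value, base_value):
    """Express an index value with the base observation set to 100."""
    return 100 * value / base_value


def link(index_first_leg, index_second_leg):
    """Multiply two indexes that share a link year (both on base 100)."""
    return index_first_leg * index_second_leg / 100


# Series with 1980=100: 1990 = 177.4 and 2003 = 234.6.
index_2003_base_1990 = rebase(234.6, 177.4)

# Series with 2000=100: 2003 = 107.0 and 2006 = 112.3.
index_2006_base_2003 = rebase(112.3, 107.0)

# Linked index for 2006 with 1990=100.
index_2006_base_1990 = link(index_2003_base_1990, index_2006_base_2003)

print(f"2003 (1990=100): {index_2003_base_1990:.1f}")
print(f"2006 (2003=100): {index_2006_base_2003:.1f}")
print(f"2006 (1990=100): {index_2006_base_1990:.1f}")
```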

Since the Laspeyres index normally overestimates the real price rise and the Paasche index normally underestimates it, it seems natural to calculate an index that lies between the two indexes. One such intermediate index is the Fisher index, which calculates a geometric average of the two other indexes.
Fisher price index:

$$P^{FI}_{t:o} = \sqrt{P^{LA}_{t:o} \cdot P^{PA}_{t:o}}$$

The Fisher index is used for calculating export and import price indexes in trade statistics. Data from these statistics are used for creating Table 5.4.5.
Table 5.4.5. Denmark's import of new petrol-driven cars from Germany, by motor size, 2000-2005.

                  2000                               2005                               p05*q00     p00*q05
                  Quantity   Value      Price        Quantity   Value      Price        (1*6)       (3*4)
                             1000 DKK   DKK                     1000 DKK   DKK          1000 DKK    1000 DKK
Motor size        (1)        (2)        (3)          (4)        (5)        (6)
< 1000 cm3        88         4,115      46,762       131        3,637      27,770       2,444       6,126
1000-1500 cm3     4,127      214,668    52,015       9,493      492,911    51,924       214,289     493,782
> 1500 cm3        446        153,552    344,287      1,071      389,033    363,242      162,006     368,731
Total             4,661      372,335    79,883       10,695     885,582    82,803       378,739     868,640

$$P^{LA}_{05:00} = \frac{\sum p_{05}\, q_{00}}{\sum p_{00}\, q_{00}} \cdot 100 = \frac{378{,}739}{372{,}335} \cdot 100 = 101.72$$

$$P^{PA}_{05:00} = \frac{\sum p_{05}\, q_{05}}{\sum p_{00}\, q_{05}} \cdot 100 = \frac{885{,}582}{868{,}640} \cdot 100 = 101.95$$

$$P^{FI}_{05:00} = \sqrt{P^{LA}_{05:00} \cdot P^{PA}_{05:00}} = \sqrt{101.72 \cdot 101.95} = 101.84$$

Source: www.statistikbanken.dk
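The three price indexes of Table 5.4.5 can be recalculated from the row data. The sketch below uses the quantities and the (rounded) unit prices from the table, so the results agree with the published figures only to about two decimals; the names are illustrative assumptions. It also calculates the "in total" index based on average prices for comparison.

```python
import math

# (motor size, q 2000, p 2000 in DKK, q 2005, p 2005 in DKK), from Table 5.4.5
cars = [
    ("< 1000 cm3", 88, 46_762, 131, 27_770),
    ("1000-1500 cm3", 4_127, 52_015, 9_493, 51_924),
    ("> 1500 cm3", 446, 344_287, 1_071, 363_242),
]

q00 = [row[1] for row in cars]
p00 = [row[2] for row in cars]
q05 = [row[3] for row in cars]
p05 = [row[4] for row in cars]


def value(prices, quantities):
    """Sum of price times quantity over the three motor-size groups."""
    return sum(p * q for p, q in zip(prices, quantities))


# Laspeyres: quantities of 2000 as the standard.
laspeyres = 100 * value(p05, q00) / value(p00, q00)

# Paasche: quantities of 2005 as the standard.
paasche = 100 * value(p05, q05) / value(p00, q05)

# Fisher: geometric average of the two.
fisher = math.sqrt(laspeyres * paasche)

# "In total" index: ratio of average prices, affected by quantity shifts.
unit_value_index = 100 * (value(p05, q05) / sum(q05)) / (value(p00, q00) / sum(q00))

print(f"Laspeyres: {laspeyres:.2f}")
print(f"Paasche:   {paasche:.2f}")
print(f"Fisher:    {fisher:.2f}")
print(f"In total:  {unit_value_index:.2f}")
```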

The table shows a price increase for cars with motor sizes above 1500 cm3 and a price decrease elsewhere. If normal substitution occurred during the period, the import of cars with motor sizes above 1500 cm3 would fall relatively. This is not the case. The import of cars with motor sizes above 1500 cm3 makes up about 10% of the import measured in quantities in 2000 as well as in 2005. The import of small cars decreased from 1.8% to 1.2% of the import measured in quantities even though the price decreased relatively much. An abnormal substitution has taken place. The Paasche index, therefore, increased just a little bit more than the Laspeyres index.

A price index based solely on the import of new cars in total can be calculated using the numbers from Table 5.4.5: 82,803 x 100/79,883 = 103.66. This index shows a larger price rise than the other indexes, which analytically are the best. An "in total" price index is actually not a proper price index because it is influenced also by quantity changes. The index is based on an average price (total import value/quantity of imports) for one year divided by the average price for another year. Therefore, the quantities, as well as the prices, are from two different years.

The price rise for cars with motor sizes above 1500 cm3 may not be real. Given the product groups in the trade statistics, there has, perhaps, been a shift toward the more luxurious cars. In other words, no account has been taken of a shift within the individual product groups. The table illustrates the quality problem in index calculations based on these statistics. The price rise can be based on both price increases and quality changes.

Totally analogous to these price indexes, there are three corresponding quantity or volume indexes. In these indexes, the prices are standardized:

$$Q^{LA}_{t:o} = \frac{\sum_i p_{i,o}\, q_{i,t}}{\sum_i p_{i,o}\, q_{i,o}} \cdot 100, \qquad Q^{PA}_{t:o} = \frac{\sum_i p_{i,t}\, q_{i,t}}{\sum_i p_{i,t}\, q_{i,o}} \cdot 100, \qquad Q^{FI}_{t:o} = \sqrt{Q^{LA}_{t:o} \cdot Q^{PA}_{t:o}}$$

Using base year prices is a problem for periods far apart from the base year. Therefore, one may give preference to chain indices as a measure of real changes in quantities.


Chain Laspeyres volume index:

$$Q^{LA}_{t:0} = Q^{LA}_{1:0} \cdot Q^{LA}_{2:1} \cdots Q^{LA}_{t:t-1}$$

The quantity indexes show the real changes in quantities, or the changes given constant prices. Using the various index formulas, it can easily be shown that:

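$$V_{t:o} = \frac{\sum_i p_{i,t}\, q_{i,t}}{\sum_i p_{i,o}\, q_{i,o}} \cdot 100 = P^{PA}_{t:o}\, Q^{LA}_{t:o} = P^{LA}_{t:o}\, Q^{PA}_{t:o} = P^{FI}_{t:o}\, Q^{FI}_{t:o}$$

(When each index is expressed with the base year equal to 100, each product of a price index and a quantity index must be divided by 100.)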

V is a value index that relates the value in year t to the value in year o. If you know V as well as a price index, the quantity index can be easily calculated. For example, the Fisher price index is calculated in Table 5.4.5 to be 101.84. V can be calculated in the following way: 885,582 x 100/372,335 = 237.85. The Fisher quantity index is then: 237.85 x 100/101.84 = 233.55. If you find the quantity change by considering only the number of cars (each car counting as 1), the result is: 10,695 x 100/4,661 = 229.46, which lies below the result measured using the quantity index.

In the system of national accounts, the material is reported in both constant as well as current prices. Dividing the value index by the quantity index produces the (implicit) price index. You can calculate this implicit price index for many of the indexes presented in the national accounts. If you know the value index and a price index, the quantity index can be calculated by dividing the value index by the price index. Such a calculation is called deflating the index. There is often a need to deflate in a paper because it is the real change or movement that is of interest. If you have, for example, hourly earnings, income, or private consumption in current prices and a Laspeyres price index, real quantity movements can be calculated, as in Table 5.4.6.
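The deflation of the value index for the car imports can be retraced as follows. The totals are taken from Table 5.4.5 and the Fisher price index of 101.84 from the calculation under that table; the variable names are illustrative assumptions. All indexes are on base 100, which is why a factor of 100 appears in each division.

```python
# Totals from Table 5.4.5 (values in 1000 DKK, quantities in number of cars).
value_2000 = 372_335
value_2005 = 885_582
cars_2000 = 4_661
cars_2005 = 10_695

# Fisher price index 2005 (2000=100), as calculated under Table 5.4.5.
fisher_price_index = 101.84

# Value index: value in 2005 relative to value in 2000.
value_index = 100 * value_2005 / value_2000

# Deflating: value index divided by the price index gives the quantity index.
fisher_quantity_index = 100 * value_index / fisher_price_index

# Simple count index, every car counting as 1 regardless of motor size.
count_index = 100 * cars_2005 / cars_2000

print(f"Value index:           {value_index:.2f}")
print(f"Fisher quantity index: {fisher_quantity_index:.2f}")
print(f"Count index:           {count_index:.2f}")
```

The count index is lower than the Fisher quantity index because it ignores that the composition of the import has shifted between the price groups.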
Table 5.4.6. Index of average hourly earnings in Danish manufacturing: nominal and real changes, 1989-2005.

                                                     1989       2005
Index of average hourly earnings:
  1980=100                                           181        323
  1989=100                                           100        178
Consumer price index:
  1900=100                                           4142       5790
  1989=100                                           100        140
Index of real hourly earnings (quantity index):
  1989=100                                           100        127 1)

1) 178x100/140.
Source: www.statistikbanken.dk
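The steps behind Table 5.4.6 are a rebasing of both series to 1989=100 followed by a division. The sketch below follows the table in rounding the rebased indexes to whole numbers before deflating, which is an assumption about how the published 127 was obtained (the table note gives 178x100/140); with unrounded inputs the result would be slightly higher, roughly 127.7. The names are illustrative.

```python
# Index values from Table 5.4.6.
earnings = {1989: 181, 2005: 323}   # hourly earnings, 1980=100
cpi = {1989: 4142, 2005: 5790}      # consumer price index, 1900=100


def rebase(series, base_year):
    """Rebase an index series so that base_year equals 100 (rounded)."""
    return {year: round(100 * v / series[base_year]) for year, v in series.items()}


earnings_1989 = rebase(earnings, 1989)
cpi_1989 = rebase(cpi, 1989)

# Deflating: nominal index divided by the price index, times 100.
real_earnings = {
    year: 100 * earnings_1989[year] / cpi_1989[year] for year in earnings_1989
}

print("Nominal (1989=100):", earnings_1989)
print("CPI (1989=100):    ", cpi_1989)
print("Real (1989=100):   ", {y: round(v) for y, v in real_earnings.items()})
```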


Deflating must be done with careful thought! It is necessary, when deflating, to use a price index that is relevant for the given relationship. The deflating used in empirical papers is often unsuitable. For example, export values in the trade statistics are deflated using the consumer price index. The calculated result cannot be interpreted meaningfully because consumer prices are influenced by the price movements of goods and services that are not at all included in the export of goods and services. Consumer prices are influenced also by indirect taxes, which also do not affect export goods.

If the weight of goods in the consumption basket of employees in manufacturing industries differs substantially from the weights used in the consumer price index, deflating with the consumer price index can be a problem. If the prices for the goods weighted heavily in the consumption basket of employees in manufacturing industries have increased relatively greatly, deflation with the consumer price index will overestimate the movement in the real hourly wage since this division is with a price index that has risen too little with respect to the consumption choices of employees in manufacturing industries. A corresponding problem applies to retired individuals. If the value of goods consumed by retired individuals is deflated using the consumer price index, the result will most likely be incorrect since retired individuals have another consumption pattern than that of the population in general.

Often, a price index will be used to deflate another price index to illustrate the relative or real price movement. For example, a price index for oil can be deflated with a price index for exported manufactures. Such an index can be interpreted meaningfully in that it shows the movement in purchasing power for a barrel of oil measured in manufactured goods.

5.5. Analyzing time series data


Many economic indicators, such as GDP, are reported for a given time period. When these values are available over several time periods, a time series is produced: observations over time for a given variable, where the time distance between observations is identical. For example, GDP is often discussed as if it were only available on a yearly basis, that is, that the time series consisted of only one observation per year. However, some time series are available on a quarterly, monthly, weekly, and daily basis, depending on the frequency with which the data is collected.


For some economic data, the activities behind the data are carried out over a period of time, and the measurement of the data concerns the activity for that entire period. GDP is one example of this type of indicator. There will often be a lower bound with respect to the length of period. For example, Statistics Denmark publishes quarterly data for GDP in addition to the annual data. For other variables, you can imagine that observations relate to a particular point in time, for example, bond interest rates, currency rates, etc., where the price formation through "electronic trades" occurs continuously. Data for economic variables, such as currency rates, will typically appear as daily data. That is, an average of the day's prices or a price at a particular time (for example, the currency rate at 12:00 p.m.) might form the basis for the respective observation value.

5.5.1. The elements of a time series


In a time series, there is often dependency between the observed value in the current time period, Xt, and the value in the previous period, Xt-1. This dependence must be analyzed when the movement of a given economic variable is estimated over time. In general, it is likely that fluctuations in economic time series can result from movements in one or more of the following components:

Trend (T): Long-term movements in the respective variable. It can be either positive or negative, i.e., the values of the variable are either increasing or decreasing in general.

Cycle (C): Movement over the course of the business cycle, i.e., normally over more than one year, where peaks and troughs in the business cycle cause cyclical swings in a number of economic variables. These swings are not necessarily 'even', i.e., the swings are not necessarily identical in magnitude nor in duration. Swings that typically last several years can be difficult to distinguish from a possible trend (you can work with a trend/cycle component; refer to the following section on seasonal corrections).

Seasonal swings (S): Movements that repeat themselves within a given period (typically a year), i.e., a particular pattern in variation emerges over a time period that is observed in other periods too. A pattern observed in a time series based on monthly data will repeat itself every year. For example, sales of certain vegetables are always greatest in the summer months, car sales are largest in the spring months, etc. For these examples, you could possibly work with quarterly or monthly data and still observe the seasonal variations over the year. Moreover, you should consider the number of work days per year when economic conditions are being analyzed. For example, if you have a time series with monthly data, you might choose further to correct for the uneven number of (work) days in each month, given that the number of Sundays and holidays, etc. are unevenly distributed over the months.

Irregular swings (I): Coincidental swings (noise) that appear after consideration is made for the T, C, and S components and get allocated to residual variation. These stochastic fluctuations are unpredictable and can be due to political-economic interference in the economy, natural catastrophes, etc.

A time series (Y) can be modelled with the help of the T, C, S, and I components in two ways:

Multiplicative formula: Y = T x C x S x I
Additive formula: Y = T + C + S + I

In analyzing economic data, the multiplicative model is often used. For example, when the trend is increasing, seasonal swings of 10% in a given month mean that the absolute swing will become larger and larger over time. This can be totally reasonable considering that the increasing trend implies increasing levels for the respective variable. On the other hand, the additive model implies identical, absolute seasonal swings. This can be reasonable in certain cases, for example, with the seasonal correction of unemployment numbers. The purpose of time series analysis is to estimate the dynamic or time structure in the data of interest, i.e., to divide the time series up into the above stated, possible components. It is sensible to start with a graphical analysis of the time series, i.e., construct a figure with the observed values as a function of time, and to make a first assessment. If a given time series is valued in current prices, the data must be deflated, because it is normally the real change in the data that is of interest. The following Section 5.5.2 presents a discussion of the calculation technique for the so-called moving average, which can be used in connection with the determination of the trend component in the above-mentioned models. The moving average is used also as a central element in the seasonal correction of data, which is treated in Section 5.5.3. Finally, the analysis ends with a discussion of the T and C components of the model in Section 5.5.4.
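The difference between the multiplicative and the additive formula described above can be illustrated with a small sketch. The quarterly trend and the seasonal pattern are hypothetical numbers chosen for illustration, and the C and I components are left out (set to their neutral values) to keep the example short.

```python
# Hypothetical rising trend over three years of quarterly data.
trend = [100 + 10 * t for t in range(12)]

# Hypothetical seasonal pattern: factors averaging 1 for the multiplicative
# model, amounts summing to 0 for the additive model.
seasonal_factor = [1.10, 0.95, 0.90, 1.05]
seasonal_amount = [10, -5, -10, 5]

# Multiplicative model: Y = T x S (C and I set to 1).
multiplicative = [T * seasonal_factor[t % 4] for t, T in enumerate(trend)]

# Additive model: Y = T + S (C and I set to 0).
additive = [T + seasonal_amount[t % 4] for t, T in enumerate(trend)]

# Absolute seasonal swing in the first quarter of each of the three years.
first_quarters = range(0, 12, 4)
mult_swing = [multiplicative[t] - trend[t] for t in first_quarters]
add_swing = [additive[t] - trend[t] for t in first_quarters]

print("Multiplicative swings:", [round(s, 1) for s in mult_swing])
print("Additive swings:      ", add_swing)
```

With the rising trend, the 10% seasonal swing grows in absolute terms from year to year in the multiplicative model, while the additive model keeps the swing constant.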


5.5.2. Moving averages


A method for smoothing time series consists of the calculation of a so-called moving average, where the idea is to modify a given period's observation using an average of the observations just before and after the one in focus. Using this method can make it easier to determine a possible trend in the time series because more short-term swings, including coincidental ones, are smoothed out. The method for calculating can be illustrated using numbers for GDP at factor prices (1966-1980) and a moving average that here is based on five terms and calculated as:

Yt' = (Yt-2 + Yt-1 + Yt + Yt+1 + Yt+2) / 5

The first value in the 5-term moving average can be calculated for 1968 (average of 1966-1970), the value for 1969 becomes the average of the next five periods, etc., and the last calculation taken is for 1978. If only data for the period 1966-1980 is available, values in the beginning and at the end of that period will be missing. In the case here, data for the years after 1980 is available, and therefore the values for 1979-1980 can be calculated. Figure 5.5.1 shows the result when the period is extended from 1966 to 2002.
Table 5.5.1. Agriculture's contribution to GDP at factor prices (in millions of 1995-DKK) and the 5-term moving average, 1966-1980.
Year      Original time series Yt      5-term moving average
1966      13662
1967      13473
1968      13469                        13283
1969      13919                        13224
1970      11894                        13281
1971      13363                        13183
1972      13760                        13426
1973      12981                        13775
1974      15133                        13551
1975      13636                        13683
1976      12247                        14038
1977      14416                        13919
1978      14756                        14201
1979      14540
1980      15048

Source: Statistikbanken.dk/NAT07 (Statistics Denmark).
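The moving average column of Table 5.5.1 can be recalculated with a few lines. The data are taken from the table; the function name and its parameter are illustrative assumptions, and the sketch handles only an odd number of terms.

```python
years = list(range(1966, 1981))
gdp = [13662, 13473, 13469, 13919, 11894, 13363, 13760, 12981,
       15133, 13636, 12247, 14416, 14756, 14540, 15048]


def moving_average(values, terms=5):
    """Unweighted moving average for an odd number of terms.

    The result has no values for the first and last terms // 2 periods,
    because the window would reach outside the data.
    """
    half = terms // 2
    return [
        sum(values[i - half:i + half + 1]) / terms
        for i in range(half, len(values) - half)
    ]


ma5 = moving_average(gdp)

# The first value belongs to 1968, the last to 1978.
for year, value in zip(years[2:-2], ma5):
    print(year, round(value))
```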


With yearly data, like that used in Figure 5.5.1, a 5-term moving average will smooth out all swings with respect to those 5 years, and this results in a clearer picture of the long-term trend in the time series. For example, the development in the agricultural sector during the time Denmark joined the EU in 1972 can be clearly seen; a prior falling trend in agriculture's contribution to GDP at factor prices was reversed rather strongly.
Figure 5.5.1. Agriculture's contribution to GDP at factor prices (in billions of 1995-DKK) and the 5-year moving average, 1966-2002.

[Figure: line chart, 1966-2002; vertical axis in billion kr. (8 to 36); series GDP and GDP95.]

Note: GDP95 is a 5-term moving average of real GDP (1995-DKK). Source: Statistikbanken.dk/NAT07 (Statistics Denmark).

Individual observations can strongly influence a moving average; for example, a large fall in agriculture's contribution to GDP 1969-1970 is included in the calculations for all the years 1968-1972. This can be clearly seen in the figure. A correction for this could be to use a weighted moving average where different weights are used for the yearly values.¹³ That is, the greatest weight is given to Yt, and declining weights are given to the remaining values.¹⁴ Simply extending the calculation period to a 7-year moving average, for example, would in this case not greatly change the already shown 5-year average.

When using an even number of periods in the moving average, you must use a technique that centres the calculated average around the observation in focus. Suppose you wish to smooth out a time series based on quarterly data for the period 1990-1992. A 4-term moving average seems most reasonable, since one observation from each of the four quarters will be used in the calculation of a given value in the moving average,¹⁵ e.g., an average where the calculation uses data from the third quarter 1990 through and including the second quarter 1991. In this last mentioned example, the calculated value will reflect a value for the middle of the calculation period, which is January 1, 1991. To obtain a value in the middle of a quarter, however, the calculation can be made using 5 terms and letting the first and last periods enter with a weight equal to 0.5. That is, the first quarter 1991 is calculated using observations from the third quarter 1990 until and including the third quarter 1991 (using the weights 0.5, 1, 1, 1, 0.5). This results in a centred moving average.

¹³ For the earlier shown average all the yearly values have identical weights (0.2 in this case).
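The centred moving average with the weights 0.5, 1, 1, 1, 0.5 can be sketched as follows. The quarterly series for 1990-1992 is hypothetical: it is constructed as a linear trend plus a fixed seasonal pattern that sums to zero over the year, so that the effect of the method is easy to see. The function name is an illustrative assumption.

```python
def centred_moving_average(values, period=4):
    """Centred moving average for an even period.

    Uses period + 1 terms, where the first and the last term enter with
    weight 0.5, and divides by the period (the sum of the weights).
    """
    half = period // 2
    weights = [0.5] + [1.0] * (period - 1) + [0.5]
    result = []
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        result.append(sum(w * v for w, v in zip(weights, window)) / period)
    return result


# Hypothetical quarterly data, 1990:1 to 1992:4.
seasonal = [-8, 4, 10, -6]                       # sums to zero over a year
trend = [100 + 2 * t for t in range(12)]
series = [trend[t] + seasonal[t % 4] for t in range(12)]

smoothed = centred_moving_average(series)

# The first smoothed value belongs to 1990:3, the last to 1992:2.
for t, value in enumerate(smoothed, start=2):
    year, quarter = 1990 + t // 4, t % 4 + 1
    print(f"{year}:{quarter}  original {series[t]:6.1f}  smoothed {value:6.1f}")
```

In this constructed case the seasonal pattern is removed completely and the smoothed values reproduce the underlying trend; with real data, irregular swings are only dampened.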

5.5.3. Seasonal correction


In using time series data where the distance between the observations is less than one year, for example, quarterly, monthly, or daily data, it may be necessary to further process the data to estimate the potential seasonal elements (S-components from the earlier used model). You can carry out seasonal correction using a reasonably simple calculation technique, which will be illustrated in the following sub-section. The purpose is to remove the more or less systematic swings, for example, over the year, when they are irrelevant for an analysis of fundamental long-term trends. On the other hand, the purpose can also be to establish the pattern of seasons. With respect to the course of correction over the months of the year, a calculation is made for the value of a given month as if it were a normal month. Seasonally corrected data will be a big help in judging actual business cycle developments. An overview of the time series for which Statistics Denmark makes and publishes seasonal corrections is found via the home page (www.dst.dk).

¹⁴ The low values for agriculture's GDP in 1970 will then enter with a smaller weight in the calculations for the surrounding periods, but concerning 1970, the observation will enter with larger weight than before.

¹⁵ The length of the calculation period implies here that all swings within a year (four quarters) are smoothed out, that is, the seasonal movements are eliminated, which is why the method is often used in connection with seasonal corrections. The length of the period can in this way be uniquely determined from the purpose (e.g., seasonal correction), whereas the length of the moving average in other contexts (e.g., the five terms in Figure 5.5.1) must be determined from more subjective considerations.

Year to year comparisons and moving averages

A very simple and often used method for assessing, for example, a given monthly value is to compare that value with the value from the same month the previous year. By comparing values from the same month over the period of analysis, the seasonal element ought to be removed. But the method is very crude and sensitive to incidental movements in the months under consideration and to changing growth rates in the trend. This can be illustrated with data for GDP and the seasonally-corrected GDP (quarterly figures, calculated by Statistics Denmark, DS), cf. Figure 5.5.2, which shows these two time series for 1992-1994. If you assess GDP in the third quarter 1993 against the corresponding quarter from the year before, you would conclude that there is a decreasing trend in GDP. If you look, however, at the development from the second to the third quarter 1993 in the seasonally-corrected series, you would reach another conclusion, namely a stable development in the data. Without these seasonally-corrected time series, therefore, you would wind up drawing the wrong conclusions in certain cases.
Figure 5.5.2. GDP and seasonally-corrected GDP (billion 1995-DKK), 1992-1994.
[Figure: line chart, billion kr. (225-260) by quarter, 92:1-94:4; series GDP95, GDP95S, and GDP95(4).]

Note: GDP95S is the seasonally corrected series of the original data (GDP95). GDP95(4) is a 4-term centred moving average. Source: Statistikbanken.dk/NAT07 (Statistics Denmark).


Another simple method that can eliminate or reduce seasonal swings is the calculation of the moving average, as described earlier. With quarterly data, a 4-term moving average (12 terms for monthly data) will smooth out swings over a year, that is, remove the seasonal fluctuations. To illustrate this, a calculation is made on the indicated GDP figures. Here, quarterly data for GDP are available for a period longer than 1992-1994, which is why a 4-term centred moving average can easily be calculated for all the quarters in the designated period, cf. GDP95(4) in Figure 5.5.2. In this case, a 4-term moving average apparently leads to a smoothing of both seasonal and irregular swings that is more powerful than the seasonal corrections made by Statistics Denmark.
Seasonal indices and the X-11 procedure

A seasonal index for the year's 12 months states the seasonal swings over the year in index form and is calculated so that the index's average value is 100. A value for July equal to 96 means that, for that month, the observations are expected to lie 4% under what the trend and cycle components are in general. To establish a seasonal index, you must first estimate the seasonal component in the time series, which is not so easy and can only be approximately determined, as ought to be obvious from the previous discussion. In the following, a multiplicative relationship is assumed among the T, C, S, and I components, and seasonal correction, etc. will be illustrated using monthly data of retailers' sales of food, beverages, and tobacco.16 This is a quantity index, which is why deflating is not necessary. The calculations are made for the period 1990:01-2003:10, and to make the resulting construction of the method easier to follow, individual data from the time series as well as some of the results from the calculations are shown in Table 5.5.2.

16 Statistics Denmark's seasonal correction of this time series is also based on an assumption of a multiplicative relationship.

Table 5.5.2. Calculation of the seasonal index for food, beverages, and tobacco.

                 1990                    1991                    2002                    2003           Monthly  Seasonal
Month      Yt     Yt'     YSI      Yt     Yt'     YSI      Yt     Yt'     YSI      Yt     Yt'     YSI  ave. YSI     index
Jan     90.20       -       -   92.70  100.99   91.79   99.50  107.69   92.40  104.92  109.43   95.87     92.00     92.02
Feb     86.00       -       -   87.60  101.38   86.41   93.80  107.95   86.89   97.04  109.57   88.56     88.06     88.08
Mar     98.80       -       -  103.40  101.53  101.84  110.53  108.10  102.25  103.73  109.53   94.70     99.42     99.44
Apr     99.10       -       -   97.80  101.70   96.16  104.92  108.32   96.86  110.74  109.63  101.01     99.07     99.09
May    106.70       -       -  109.20  101.89  107.17  114.74  108.58  105.67  115.17       -       -    104.01    104.03
Jun    102.90       -       -  101.70  101.98   99.73  107.07  108.60   98.59  107.83       -       -    100.64    100.67
Jul    101.70  100.10  101.59  108.10  102.08  105.89  110.74  108.78  101.80  113.98       -       -    103.80    103.82
Aug    102.30  100.28  102.02  105.10  102.27  102.77  114.52  109.14  104.93  114.63       -       -    102.15    102.17
Sep     93.80  100.53   93.30   94.70  101.96   92.88  102.43  108.99   93.98  101.46       -       -     94.64     94.66
Oct     97.50  100.67   96.85  100.80  101.80   99.02  109.13  108.95  100.16  112.47       -       -     98.48     98.50
Nov    100.00  100.72   99.28  101.20  101.95   99.27  110.96  109.22  101.60       -       -       -     98.40     98.43
Dec    121.00  100.78  120.07  121.80  102.05  119.36  124.34  109.26  113.80       -       -       -    119.06    119.08
Sum         -       -       -       -       -       -       -       -       -       -       -       -   1199.73   1200.00

Note: Yt indicates the quantity index for sales of food, beverages, and tobacco (1990=100). Yt' is a centred 12-month moving average and YSI = (Yt / Yt') x 100. Source: Statistikbanken.dk/DETA2 (Statistics Denmark).

The original time series, as well as the 12-term centred moving average, which is assumed to smooth out seasonal swings, is shown in Figure 5.5.3. There is a clear seasonal pattern in retail sales of food, beverages, and tobacco, where the largest sales occur in December. This pattern disappears totally in the moving average values, which show the course of the trend and cycle.
Figure 5.5.3. Quantity index and the 12-term moving average for sales of food, beverages, and tobacco, January 1990-December 1995.
[Figure: line chart, index (80-140) by month, 90:1-95:7; series Q and Q(12).]

Note: Q(12) is a centred 12-term moving average of the quantity index of the sales of food etc. (Q). Source: Statistikbanken.dk/DETA2 (Statistics Denmark).


If you assume that the moving average only contains the trend and cycle components, the following division17 of the components in Figure 5.5.3 can be shown as:

YSI = (T x C x S x I) / (T x C) = S x I

The total index is divided by the moving average.18 The result is an index (time series), cf. YSI in Table 5.5.2, that contains only the seasonal component and the irregular component. Because the trend and cycle swings are, for the most part, eliminated in the new index, the average for the 12 months in each of the years is approximately 100. Here the calculations are carried out for January 1990 to October 2003, which, after the months lost at each end of the centred moving average, means twelve or thirteen observations for each month. Because YSI can contain irregular elements (I), an average is computed on the basis of these observations for each month so that the final result becomes an index that only represents the S component;19 cf. the seasonal index in Table 5.5.2 and Figure 5.5.4.
Figure 5.5.4. Seasonal index of sales of food, beverages, and tobacco (constant prices).
[Figure: seasonal index (80-130) by month, Jan-Dec.]

Source: Table 5.5.2.
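The steps behind the seasonal index (a centred 12-term moving average, the ratio YSI, the average per calendar month, and the level adjustment to a sum of 1200) can be sketched as follows. The function names and the three-year test series are hypothetical; the series is built from a constant level and made-up seasonal factors, so that the result can be checked by eye.

```python
def centred_ma12(y):
    """Centred 12-term moving average (13 terms, end weights 0.5)."""
    out = [None] * len(y)
    for i in range(6, len(y) - 6):
        w = y[i - 6 : i + 7]
        out[i] = (0.5 * w[0] + sum(w[1:12]) + 0.5 * w[12]) / 12
    return out


def seasonal_index(y, first_month=1):
    """Seasonal index per calendar month (1-12) for a monthly series y.

    first_month is the calendar month of y[0]. At least 24 observations
    are needed so that every calendar month gets at least one ratio.
    """
    if len(y) < 24:
        raise ValueError("need at least 24 monthly observations")
    trend_cycle = centred_ma12(y)
    ratios = {m: [] for m in range(1, 13)}
    for i, (obs, tc) in enumerate(zip(y, trend_cycle)):
        if tc is not None:
            month = (first_month - 1 + i) % 12 + 1
            ratios[month].append(100 * obs / tc)   # YSI = (Yt / Yt') x 100
    monthly_avg = {m: sum(r) / len(r) for m, r in ratios.items()}
    scale = 1200 / sum(monthly_avg.values())       # level adjustment
    return {m: avg * scale for m, avg in monthly_avg.items()}


# Hypothetical series: constant level 100 times made-up seasonal factors
# (the factors average 1.0), repeated for three years.
factors = [0.92, 0.88, 0.99, 0.99, 1.04, 1.01,
           1.04, 1.02, 0.95, 0.98, 0.98, 1.20]
series = [100 * f for _ in range(3) for f in factors]
index = seasonal_index(series)
for month, value in index.items():
    print(month, round(value, 2))
```

With real data the moving average does not remove the trend and cycle perfectly, so the monthly averages will not sum to exactly 1200 before the level adjustment.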


17 An additive relationship between T, C, S, and I means using an addition/subtraction similar to the procedure here.

18 You can multiply by 100 throughout the calculation if you want to maintain an index level in that form. The sketched method is called the "ratio-to-moving-average method".

19 The average for the year comes close to 100 but can deviate a little (due to rounding of numbers and incomplete smoothing of all non-season determined components), in which case the index is level-adjusted as in Table 5.5.2, where the monthly average of YSI is multiplied by 1200 / 1199.7.

Given this seasonal index, the original time series can now be seasonally corrected. By dividing the total index by the seasonal index, the movement is cleared of seasonal swings. This is shown in Figure 5.5.5, together with Statistics Denmark's published seasonally-corrected quantity index for the same data.
Figure 5.5.5. Seasonally-corrected quantity index of sales of food, beverages, and tobacco, "calculated" and DS, 2000-2001.

[Figure: line chart, index (90-130) by month, January 2000-December 2001; series Original data, DS, and Calculated.]

Note: "Calculated" denotes the seasonally-corrected index computed as described in the text. Statistics Denmark's seasonally-corrected index (DS) is shown as well as the original uncorrected time series. Source: Statistikbanken.dk/DETA2 (Statistics Denmark) and Table 5.5.2.
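As a worked illustration of the division, the 2002 values of Yt and the seasonal index from Table 5.5.2 can be combined as in the following sketch. The function and variable names are hypothetical; the figures are those of the table.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Seasonal index and the 2002 quantity index (Yt), both from Table 5.5.2.
seasonal_index = [92.02, 88.08, 99.44, 99.09, 104.03, 100.67,
                  103.82, 102.17, 94.66, 98.50, 98.43, 119.08]
y_2002 = [99.50, 93.80, 110.53, 104.92, 114.74, 107.07,
          110.74, 114.52, 102.43, 109.13, 110.96, 124.34]


def seasonally_correct(values, index):
    """Divide each observation by its month's seasonal index (index 100 = normal)."""
    return [100 * v / s for v, s in zip(values, index)]


corrected_2002 = seasonally_correct(y_2002, seasonal_index)
for month, raw, adj in zip(MONTHS, y_2002, corrected_2002):
    print(f"{month}: original {raw:7.2f}   corrected {adj:7.2f}")
```

The December peak in the original figures largely disappears after the correction, and the spread between the highest and lowest month becomes much smaller.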

For clarity's sake, only the values for 2000-01 are shown. The calculated seasonally-corrected index deviates a little from Statistics Denmark's published index, partly because the method of calculation is relatively simple, but also because the calculations here are carried out on data that only cover the period 1990-2003. Seasonal correction is made at Statistics Denmark (and usually also at other statistical agencies) with the help of a programme called X-11, developed by the U.S. Bureau of the Census in the 1960s, or its later version, X-12. The programme can seasonally correct quarterly and monthly data. It separates a time series into a trend and cycle component (TC), a seasonal component (S), and an irregular component (I). Unevenness in the number of work and business days over the year can also be corrected for.


The calculation procedure builds on the technique using moving averages, with a centred 12-month moving average for establishing the T-C components. This is used, as shown earlier, to obtain a first estimate of the S-I components. Going through several iterations of calculations, using various forms of the moving average, yields an adjusted (final) estimate of the seasonal component. In this connection, an attempt is made to isolate the I component, and extreme values (outliers) are given less weight so that their influence is reduced. Given the output possibilities in X-11, the original time series can be divided up into a trend-cyclical component, a seasonal component, the irregular component, and, of course, a seasonally-corrected time series. As an illustration of the last, the time series used in Figure 5.5.3 is seasonally corrected with the help of X-11. Only the period 1990-2003 is used, and a correction has not been made for the number of work days, which is why the results will deviate from the earlier shown seasonally-corrected numbers from Statistics Denmark. For clarity's sake, only the results for 2000-01 are presented again, cf. Figure 5.5.6.
Figure 5.5.6. Seasonally-corrected quantity index of sales of food, beverages, and tobacco, "calculated" and X-11, 2000-01.
[Figure: line chart, index (103.2-111.6) by month, January 2000-December 2001; series X11 and Calculated.]

Note: The calculations are carried out using the time series programme SAS/ETS. The index "calculated" is as shown in Figure 5.5.5., and X-11 is calculated on data covering the period 1990-2003 (corresponding to the data set that the original 12-month average was computed from). Source: Statistikbanken.dk/DETA2 (Statistics Denmark), and the X11 procedure.


There is close agreement between the result from X-11 and the earlier manually-computed index; the differences have no practical significance. Correspondingly, the seasonal index produced by X-11 (not shown) is nearly identical to that presented in Figure 5.5.4.

5.5.4. Trends and cycles


An example of a trend is seen in Figure 5.5.7, where GDP in constant factor prices is shown from 1900 to the present. For certain sub-periods, the course is somewhat smoothly increasing, that is, at a constant growth rate over time. As shown earlier in Section 5.5.2, a development that can be described by an exponential function, which the curve in Figure 5.5.7 approximately resembles, will have a constant growth rate. It is seen in the figure that, by applying a logarithmic scale to GDP, the graph becomes partly linear; the slope is determined by the GDP growth rate. For the indicated period, there are cyclical swings and reactions to certain events, such as wars and oil price shocks. When yearly data is used, seasonal variation will be eliminated (which can be an advantage in that there is one less component to isolate in a given time series). The trend in a time series can be of different types; the exponential function has already been mentioned. A second type is a simple, linear relationship over time. A third type is the logistic curve, which has an S-shape.
Figure 5.5.7. GDP at factor prices, 1900-2002 (in billions of 1995-kroner).
[Figure: two panels showing GDP in Kr. billion by year, 1900-2000; one panel uses a linear scale (0-1100), the other a logarithmic scale (10-1000).]

Source: Sv. Aa. Hansen: Økonomisk vækst i Danmark (Economic Growth in Denmark), Copenhagen, 1974; Adam's databank; Statistikbanken.dk/NAT07.

If a time series consists of, for example, monthly data for which a seasonal index has been computed, the trend and cycle components can be established based on this seasonal index. This can be illustrated with data used earlier for sales of food, beverages, and tobacco. In order to make a judgement about the trend, the time period must be relatively long, and in the present case data from 1990 has been used. First, a seasonal correction is made on the entire series by dividing the monthly values for the individual years by the seasonal index, as shown earlier in Figure 5.5.4. The result appears in Figure 5.5.8. The seasonally-corrected values exhibit significantly fewer fluctuations than the original series. At the same time, it can be seen that the projection of a trend forward from the period 1990-2003 appears less favourable since the increasing trend here does not seem to apply to the future periods (the upper part of Figure 5.5.8). The seasonally-corrected curve contains only T, C, and I components. The second step in the analysis is to evaluate the trend, which here has led to the assumption of a constant growth rate over the whole period (exponential growth). With the help of regression analysis,20 this trend is determined from the seasonally-corrected values, which are sketched in the lower diagram of Figure 5.5.8.

20 Under the assumption about exponential growth, the trend is determined as the curve for which the sum of the squared distance between the observations and the curve is minimized.
Figure 5.5.8. Quantity index of sales of food, beverages, and tobacco, January 1990 - October 2003 (1990=100).
[Figure: two panels by month, 90:1-03:1. Upper panel: Original data (index 84-132). Lower panel: Seasonally-corrected data (index 92-116).]

Source: Statistikbanken.dk/DETA2 (Statistics Denmark) and own calculations.
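A trend with a constant growth rate can be estimated as in the following sketch. It assumes that the regression is run on the logarithm of the seasonally-corrected values, which minimizes the squared distances on the logarithmic scale; this is one common way of fitting an exponential trend and is not necessarily the exact procedure behind Figure 5.5.8. The function name and the monthly series are hypothetical.

```python
import math


def fit_exponential_trend(values):
    """Fit y_t = a * exp(g * t), t = 0, 1, 2, ..., by least squares on ln(y_t)."""
    n = len(values)
    logs = [math.log(v) for v in values]
    t_mean = (n - 1) / 2
    log_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (ly - log_mean) for t, ly in enumerate(logs))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    g = sxy / sxx                        # growth rate per period (log scale)
    a = math.exp(log_mean - g * t_mean)  # trend level at t = 0
    return a, g


# Hypothetical seasonally-corrected monthly index growing 0.1% per month
# over 166 months (the length of January 1990 - October 2003).
series = [100 * 1.001 ** t for t in range(166)]
a, g = fit_exponential_trend(series)

trend = [a * math.exp(g * t) for t in range(len(series))]
# What is left after dividing by the trend is the cycle and irregular part,
# expressed as an index around 100.
cycle_and_irregular = [100 * y / tr for y, tr in zip(series, trend)]

print(f"monthly growth rate: {100 * (math.exp(g) - 1):.3f}%")
```

Because the hypothetical series contains nothing but the trend, the remaining index is 100 throughout; with real data, the deviations from 100 would show the cyclical and irregular movements.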

A similar exercise can be done with the data from Figure 5.5.3, where a 12-term centred moving average did seem to smooth out the seasonal pattern. With data for the period 1990-2003, the seasonally adjusted data and the trend are exhibited in Figure 5.5.9. One interpretation of the cycles or fluctuations around the linear trend is that they illustrate the business cycle component of the original series.


Figure 5.5.9. Trend and cyclical components of the sales of food, beverages, and tobacco (Index, 1990=100).
[Figure: line chart, index (100-112) by year, 1990-2003.]

Source: Statistikbanken.dk/DETA2 (Statistics Denmark) and own calculations.

With this, the decomposition of the original time series is finished. But you must still remember that behind the calculations are some (self chosen) assumptions, which is why the final decomposition may not represent the "true" picture. This means that the trend in the last figure can be rightfully criticized, because it does not explain much of the variation in the time series. A corresponding trend with a constant growth rate would be a much better fit for the GDP data in Figure 5.5.7.

6. Making commentaries
All tables and figures included in the paper must be discussed in the paper. The purpose of this section is to provide some guidelines as to what types of comments are appropriate. The paper must be written in clear language that is easily read and does not use complicated and twisted sentence construction. Banal language such as slang, catchwords, or clichés is to be avoided, as are the "I" and "we" forms and similar diction. The following sentences have been taken (and translated) from previous empirical papers as examples of what has not worked, mainly because of the lack of substance in the words:

o Everyone knows that sales have been nothing, but ideal until now.
o It has always been a popular, vogue, fashionable, favourite phenomenon to compare us with each other here in Scandinavia.
o Now it all hangs together ...
o One thing is imports and exports - a relatively positive experience - but what about interest rates?

In addition, you must avoid exaggeration and assertions that are not covered in the material used in the paper. The following sentences have been taken (and translated) from previous papers as examples of what has not worked, mainly because of the lack of documentation behind the statements:
o We don't have to go far into the future before energy becomes a scarce commodity.
o It is a known fact that youth are more environmentally-minded than the elderly.
o Bank failure on the Faroe Islands has been an everyday affair.
o The development has been full of swindles and diverse allegations.

With the help of a few well-chosen sentences, comments must point toward the patterns that the tables and figures reflect, without repeating the data itself in the text. An example of an appropriate comment concerning Table 5.1.3 follows:
Table 5.1.3 shows first that, in the period 1970-2006, a shift occurred in age distribution resulting in relatively more elderly individuals and relatively fewer children and young adults, both male and female. Second, the table shows that, during this period, the population increased. And third, the table shows that there are more females than males among the elderly and fewer females than males among the children and young adults.

The comment is short and precise. The type of material used to analyze the issues under investigation influences the type of commentaries that should be made. If the focus of the paper has been, for example, on public sector expenditures, comments should be directed toward the effect the stated changes in age distribution could have on these public sector expenditures. So the comment should not just be short and correct; it must be relevant to the problem under investigation. An important aspect in the comment might be to point toward what might be lacking in terms of available material (or material adequacy) in light of the chosen formulation of the statement of the problem. The material is adequate for the statement of the problem when it allows a substantial analysis of the problem. Material inadequacy can also result from a lack of material concerning explanatory factors. If a decisive explanatory factor has not been accounted for in the analysis, you will most likely draw incorrect conclusions, as mentioned earlier. Therefore, you should mention in the comment the lack of material for an important explanatory factor, if this is the case. In general, the points in the previous discussion of causal analysis are all relevant for evaluating the adequacy of the material used in the paper. In conclusion, a good commentary
1. is written in precise and concise language (remember to use correct punctuation).
2. highlights the patterns in the material without a long-winded discussion of the individual elements.
3. to a greater or lesser extent, addresses the adequacy of the material with respect to both conceptual understanding and a lack of material.
4. contains assessments of the explanatory value of the included material seen in the context of the statement of the problem and therefore contributes to continuity between the sections.

7. Construction of the report


This section covers the formal demands, not previously mentioned, for preparing the empirical paper. The paper begins with a title page. After the title page comes a table of contents, which gives an overview of the sections included in the paper, presenting the section title, number, and the page on which the respective section begins, cf. the table of contents for these guidelines. The first section is called the introduction and is used for a discussion of, and a justification for, the chosen statement of the problem and the chosen delimitations. A start in setting the delimitations might be to define the central concepts. You should not bother to define concepts that the audience is expected to be familiar with. The introduction should also be used to point out aspects of the problem that could possibly be relevant or interesting, but which are not to be treated. The introduction binds the succeeding sections together in that these sections present relevant material that is first introduced in the introduction. As mentioned, the statement of the problem is the control mechanism for the succeeding phases of work. The introduction is, therefore, the control mechanism for all succeeding sections of the report. It should be emphasized that the introduction must not be a verbalization of the table of contents, and you should not start by saying that the purpose of the paper is to give an account of that which stands in the title. This ought to be obvious and is, therefore, unnecessary to mention. Finally, it should be mentioned that data material does not normally appear in the introduction. In a paper 15 pages in length, where the choice of method, etc., does not require an in-depth discussion, the introduction will typically fill one page at most. In the sections following the introduction, the statement of the problem is addressed using the collected data and information. Normally, the base material is located in the second section. The remaining sections are used, then, to account for the development in this material. Both sections and sub-sections can be used. Every section treats a sub-problem of the statement of the problem and will, as a rule, comprise at least one-half of a page. The sections must follow each other in a logical order and with a reasonable weight, determined relative to the problem at hand.
The order and weighting given to the paper is given large consideration in the evaluation of the paper. A sensible weighting also involves the choice of which material is to be visualized and in what form it is to be visualized. Note that it is only by exception that data already presented in one visual form is presented again in another. For example, rather than treating the same data in both figure and table forms, you might instead include additional explanatory material and thereby reach a greater depth in the analysis. Use short and precise section titles and avoid having tables and/or figures follow immediately after each other. Comments should be used, instead, to "encircle" the tables or figures that are being referred to. A section should not, under normal circumstances, begin with a table or figure, but rather with text. Text and tables and figures are separated with an extra line so there is space between, for example, a table's source and the surrounding text. However, to avoid large empty spaces (typically at the bottom of the page), it might be necessary to separate comments and tables or figures from each other in the text.
The final section in the paper is called the conclusion and is used to summarize the most important conclusions reached in the text. Reading just the introduction and the conclusion should be enough to give the reader the essence of the report; this is a great advantage for the busy reader. New aspects or information must not be treated in the conclusion, and all clichés about what the future might bring should be avoided; they are subjective predictions about future development that have no basis in the material. The sections must be numbered, and the section titles must be marked clearly, for example, with underlining or by using bold or italic type.
After the conclusion comes the reference list, which appears on a separate page and presents an unambiguous overview of the utilized sources. That the reference list is unambiguous means that each source must be cited with enough information to uniquely identify it. The following rules should be used: books are written with author(s), title, location of publisher, publisher, year; for example, Andersen, T. M., et al.: The Danish Economy, 2. edition, DJF Publishing, Copenhagen, 2006. Note that the author's name is written last name first and that, as in the example given, only one author's name is written, followed by 'et al.' when three or more authors are associated with a given text. With two or three authors, the first name appears as noted above, and the following names appear with first name first. Note further that the title of the work might appear in italic type (although style may dictate that it appears in bold).
Periodicals are written with author(s), title of work, title of periodical, year, volume (if any), issue, and page number(s); for example, Bentzen, J.: An empirical analysis of gasoline demand in Denmark using cointegration techniques, Energy Economics, Vol. 16, No. 2, 1994, pp. 139-143. Statistics are referenced by organization issuing the statistics, title, year, number; for example, OECD: Annual National Accounts Volume 1 Main aggregates, 2007. Normally, homepage and database addresses are placed at the end of the reference list.
If there are appendices, they come after the reference list. The appendices contain the raw data and other information that was used to establish the base, comparative, and explanatory material used and presented in the tables and figures in the text. The appendices should not include copies of tables from the statistical sources. The appendices might also contain an account of the calculations used to obtain further representations of the data then used in the text. Presentation of these results gives the reader the chance to replicate the presented material. A legal text can also be included in the appendix material. In general, the appendices should be used for the material that ties the material used in the text back to the original form of the data and for that material which was not directly used in analyzing the problem. In this vein, the appendix material is not directly discussed in the text, but perhaps referred to.

References
Adam's databank, Statistics Denmark.
Andersen, T. M., et al.: The Danish Economy, DJF Publishing, Copenhagen, 2006.
Danmarks Statistik: NYT, No. 321, 1993.
Danmarks Statistik: Statistisk tiårsoversigt (Statistical ten-year review) (STO).
Danmarks Statistik: Statistisk årbog (Statistical yearbook) (SÅ), 2006.
Danish Energy Agency: Energistatistik (Energy statistics), 2002.
Hansen, Sv. Aa.: Økonomisk vækst i Danmark (Economic Growth in Denmark), Copenhagen, 1974.
Meadows, D., et al.: The Limits to Growth, The New American Library, Inc., 1972.
Middle East
OECD: Energy Balances of OECD Countries, 2003 Edition.
The World Bank: World Development Report, 2006.
The World Bank: World Development Indicators.
www.statistikbanken.dk
www.opec.org: Annual Statistical Bulletin 2006.
www.bp.com: Review of World Energy, 2001.
