
Original Article
Evaluating State Performance: A Critical View of State Failure and Fragility Indexes

Francisco Gutiérrez Sanín


Instituto de Estudios Políticos y Relaciones Internacionales, Universidad Nacional de Colombia, Carrera 7 no. 83-36 Apartamento 202, Bogotá, Colombia. E-mail: fgutiers2002@yahoo.com
Researcher at the Instituto de Estudios Políticos y Relaciones Internacionales, Universidad Nacional de Colombia. He presents here the results of research conducted under the auspices of the LSE Crisis States Research Centre, funded by UK aid from the Department for International Development.

Abstract The article criticizes poor state performance (PSP) indexes, that is to say, cross-national data sets that mark or rank contemporary states according to their performance. In particular, I claim that current indexes provide very little genuine information about performance orderings. The criticism focuses on structural PSP problems: those that cannot be circumvented, have no obvious solution, and generally stem from the very nature of the exercise. I suggest that there is a generalized failure to acknowledge, let alone solve, fundamental issues in all three stages of index building (conceptualization, codification/operationalization and aggregation), including defining, dealing with intrinsic ambiguity, and dealing with the lack of complete order in the informational domain (that is, the database).

European Journal of Development Research (2011) 23, 20-42. doi:10.1057/ejdr.2010.53; published online 2 December 2010

Keywords: state performance; statehood; classification; order; ambiguity; aggregation

According to the Celestial Emporium of Benevolent Knowledge, animals are divided into: those that belong to the Emperor, embalmed ones, those that are trained, suckling pigs, mermaids, fabulous ones, stray dogs, those included in the present classification, those that tremble as if they were mad, innumerable ones, those drawn with a very fine camelhair brush, others, those that have just broken a flower vase, those that from a long way off look like flies.
Jorge Luis Borges

Introduction
State fragility and failure have constituted core policy and academic concerns in the last decade. OECD leaders of all political shades, from Bush (Taylor, 2008) to Obama (Briscoe, 2009), have underscored their centrality, for example, while state fragility is regularly linked to both security and development issues in mainstream discourse: "Since September 11, 2001, the United States and other governments have frequently asserted that threats to international peace and security often come from the world's weakest states. Such countries can fall prey to and spawn a host of transnational security threats, including terrorism, weapons proliferation, organized crime, infectious disease, environmental degradation, and civil conflicts that spill over borders" (Rice and Stewar, 2009). Indeed, the World Bank has declared that weak states are "the toughest development challenge of our era" (Briscoe, 2009), and substantial economic, political and military efforts have been mobilized in the past few years to shore up (or so it is claimed) fragile states (FS). Correspondingly, major intellectual efforts have also been invested in the classification, description, understanding and evaluation of state weakness, fragility or deficiency (see for example Helman and Ratner, 1993; Dorff, 2000).

One of the main thrusts of such effort has been the creation of what I will term poor state performance (PSP) indexes,1 that is to say, cross-national data sets that, together with a classificatory tool (the index proper), mark or rank states according to their performance. The final product of the index can be, and generally is, as simple2 as a tag that attributes each country/year a number, for example on a scale from 1 to 10. This article makes a critical evaluation of PSPs and of their use in probabilistic models that purport to find the correlates of fragility. My objective is to diagnose and discuss the structural problems of PSPs and their use in both research and policy.

© 2011 European Association of Development Research and Training Institutes 0957-8811 European Journal of Development Research Vol. 23, 1, 20-42 www.palgrave-journals.com/ejdr/
PSPs, in effect, have come to constitute a tool for the allocation of scarce resources, global evaluation and decision making, and also for the analysis and characterization of developing countries. This is not accidental, as will be seen below. I will claim in this article, however, that current PSPs do not take into account the specificities of the problem they are dealing with, and are not usable measurement tools (that is to say, tools producing output that can be used in regressions). In particular, they only generate genuine information at the extremes of their conceptual space, precisely where it is least useful and interesting.

To establish clearly the scope and limits of my discussion, I must first clarify what it does and what it does not include. Many important weaknesses of current PSPs will not be dealt with here, not only because they are not necessary to my argument (see below), but also because they have been treated thoroughly elsewhere. For example, there is a plethora of definitions of PSP that are rarely explicit about theoretical underpinnings (Di John, 2010), and tend to be unaware of the minimal requirements that a well-performing classificatory tool should meet. Indeed, following a careful evaluation, Cammack et al (2006, p. 16) come to the conclusion that both researchers and policymakers in the PSP field tend to work with labels, rather than well-defined PSP concepts, insofar as "fragile states" is a label currently in use by the international community to identify a particular class of states. Actors conceptualize the FS agenda differently according to their concerns and goals. The word "fragile" is often substituted without a precise change in meaning by "failed", "failing", "crisis", "weak", "rogue", "collapsed", "poorly performing", "ineffective", or "shadow"; a fragile state may also be called a "country at risk of instability" or "under stress", or even a "difficult partner".
In most cases, these labels do not have a meaning that is clearly understood far beyond the author who has used them. Moreover, many of the FS definitions mix up the meaning of the word "fragility" with propositions about correlates and causal relations. Further complicating the matter, donor definitions appear to fall into three general but overlapping types: where fragility is defined in terms of the functionality of states, of their outputs (including insecurity), or of their relationship with donors. This dispersion coexists with several forms of hodge-podge categorization (King and Langche, 2001; Cammack et al, 2006; Di John, 2010). Several indexes, furthermore, contain rather obvious biases, and some complex technical issues are dealt with summarily. Some indexes (for example, the Index of State Weakness in the Developing World or the Index of African Governance) average over the existing values; therefore, if country A has five data points and country B has only two, the averages of both are presented as the index. This is likely to introduce obvious biases.

None of this will be the concern of this article. I will assume that each of these problems is "easy" in the following sense: the path towards the solution can be clearly described, even if in practice the concrete implementation is fraught with technical problems. My focus, instead, will be on "hard" or structural problems: those that cannot be circumvented, have no obvious solution, and generally stem from deeply hidden biases and assumptions. Owing to this last factor, they have not been explicitly acknowledged, let alone analyzed or solved. The three main factors are defining, solving intrinsic ambiguity and addressing the issue of order.

My discussion is organized as follows. In the first section, I (briefly) describe some of the main existing PSP indexes, sketch out their specificities and consider some reasons that may explain their growing importance. There are of course many more PSPs than the ones I present here, but the latter are among those that are most institutionally backed, and also most cited, utilized and established. In the second section, I focus on definitional issues. I suggest that definitions of PSP generally stem from prototyping, which in this context has clear downsides. Confusing, sometimes inconsistent, definitions generate haziness, endogeneity and a-historicity.
The third section is dedicated to intrinsic ambiguity. I show that any precise definition of PSP (precise in the sense of admitting crisp coding) is problematic. The fourth section discusses order and aggregation. The main point of this section is that there is a deeply hidden assumption underpinning the operation of PSP databases, namely that there exists a numeraire that allows the agent to substitute units of one variable for units of another. This assumption is highly debatable, and has prevented a fine-grained discussion of the meaning of aggregation functions and multi-attribute decision making in the context of political and social databases. In particular, I show that in a very specific, but important, sense, PSP indexes are conceptually vacuous. Another way of putting this is that political global indexes have some important characteristics that undermine the meaningfulness of routine treatments. The concluding section attempts to pull the pieces together, and suggests that at least some of the most severe problems highlighted in the main discussion are tractable.

Fragility Indexes: An Overview


Clearly, PSPs constitute a heterogeneous family. Its members range from the World Bank's Country Policy and Institutional Assessment (CPIA), which was not originally designed to be a state performance index, to the Bertelsmann Stiftung Transformation Index (BTI) and the Failed States Index of the Fund for Peace, for example. Their definitions diverge, sometimes widely, as do their emphases and policy concerns. Some are global, while others focus on specific regions. Some aspire to mark and rank every single existing country, and some prefer to restrict themselves to the critical cases, and so on.


Table 1: List of some of the main indexes

Index | Institutional affiliation | Number of variables | Appeared in
Bertelsmann Transformation Index (BTI) State Weakness Index | Bertelsmann Stiftung | 2 | 2008
Country Indicators for Foreign Policy | Carleton University | More than 50 | 2005
Country Policy and Institutional Assessment (CPIA)/International Development Association (IDA) Resource Allocation Index (IRAI) | The World Bank | 16 | 2005
Failed States Index | Fund for Peace | 12 | 2005
Index of African Governance | Harvard University | 55 | 2008
Index of State Weakness in the Developing World | Brookings Institution | 20 | 2008
State Fragility Index | George Mason University | 8 | 2008

Despite their differences, however, these tools share some important commonalities. First of all, they have appeared relatively recently (see Table 1). Second, they deal with both count data and verbal variables. In effect, at least certain portions of the variables typically consist of the verbal or numerical assessments of in-house coders about the state of different countries with respect to given criteria. Third, they tend to have a lot of variables, more than 50 in some cases. The final index is almost always (with some exceptions, notably the BTI) the result of a two-step aggregation procedure: the variables are aggregated in boxes, and these in turn are averaged to produce the final index.3 Fourth, they establish a series of critical cut-off points to be able to set apart fragile states from those that are not. I will come back to these crucial characteristics below.

PSPs have appeared and thrived together with a host of political indicators that generally intend to operationalize complex concepts by capturing heterogeneous data through a highly multidimensional procedure. There are at least three reasons that help explain their growing importance. First of all, globalization. Global problems have to be dealt with, and global policies implemented by multilateral agencies have to be evaluated, by advocates, adversaries or third parties. An index plays a crucial role here, because some means of representing the overall impact and evolution of the policy (or of the problem) is necessary. Second, the need to synthesize the massive amounts of information now available as a result of recent technological progress. Indeed, our very notion of an indicator has been transformed thanks to the phenomenal proliferation of information sources, mainly via the Internet, during the last two decades.
Third, and related to the previous two, the increasing awareness that instability in the global South is likely to have serious repercussions in the global North, in forms that have direct economic and political impact: migrations and new forms of violence, for example. This has given state fragility the status of a primary policy concern. In this context, PSPs are seen to help characterize, understand, prevent and perhaps even predict major state breakdowns. For example, the Political Instability Task Force is supposed to forecast, with a two-year lead time, the outbreak of civil wars and other forms of political instability (Goldstone et al, 2010). For this reason, PSP indexes are also routinely used in academic quantitative exercises (see for example Cammack et al, 2006; Bates, 2008).
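The two-step aggregation described in this section, and the practice of averaging only over the values that happen to exist, can be sketched as follows. This is a minimal illustration and not the procedure of any particular index: the box names, the 0-10 marks and the helper functions `box_score` and `two_step_index` are all hypothetical.

```python
from statistics import mean

def box_score(values):
    """Average the values that are present; None marks a missing datum."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None

def two_step_index(boxes):
    """Step 1: average variables inside each box. Step 2: average the boxes."""
    scores = [box_score(values) for values in boxes.values()]
    scores = [s for s in scores if s is not None]
    return mean(scores) if scores else None

# Hypothetical marks on a 0-10 scale, grouped into two hypothetical boxes.
country_a = {"security": [2, 8, 7], "services": [9, 9]}            # five data points
country_b = {"security": [2, None, None], "services": [None, 9]}   # two data points

index_a = two_step_index(country_a)
index_b = two_step_index(country_b)
print(index_a, index_b)
```

Country B agrees with country A on every value it actually reports, yet it receives a different mark, and that mark rests on two data points rather than five. Nothing in the final number signals the difference.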

Defining
To build a database and an index of PSP, it is necessary to have a conceptualization of what a state is, and how to measure its performance. There are a number of critical difficulties related to the process of definition, which precedes measurement and evaluation both operationally and logically.4 Both in the field and in PSP indexes, definition has generally been done by prototyping. This has involved finding typical instantiations of failure, and then trying to extract their common characteristics (see the definitions contained in Appendix A, which basically proceed by example). The State Failure Task Force, for instance, goes as far as to establish one prototype and then search for other key characteristics that may or may not be linked to the phenomenon it is trying to characterize. This shows that although prototyping is a natural enough procedure, indeed a key step in the establishment of typologies, it is fraught with dangers, a fact that the qualitative literature illustrates very well (see for example Zartman, 1995; Gros, 1996).

More specifically, in the field of state performance, at least five ways of being misled by the forthright use of prototyping can be identified. First, prototypes encapsulate syndromes, that is to say, complex overlaps of different states of the world. Through the lens of prototyping, countries can appear to have a (possibly vague) resemblance, or to belong to the same category by analogy. However, analogy is no sound basis for building typologies, as the epigraph at the beginning of this article illustrates well. In particular, when policy concerns are plugged directly into definitions, the characteristic traits of the phenomenon are collapsed with putative causes and consequences. Thus, several PSP indexes include correlates of fragility (such as high infant mortality, for example), possible causes (such as the lack of democracy), and predicated consequences (such as humanitarian disasters).
This is a bit like defining cancer as "consist[ing] of smoking, uncontrolled growth of cells, and family crisis" (Gutierrez et al, 2010, p. 51). Such definitions are clearly a sure way to analytical catastrophe. They additionally undermine one of the key potential uses of an index, namely providing quantitative data for regressions: if everything is plugged into the left-hand side of the equation, then over what will the value of the index be regressed?

Beyond this major sin, there exists a range of further serious issues. There may exist more than one prototype for the same phenomenon (or tag), for example. If there are several ways of falling apart (Bates, 2008), then a single definition of fragility will not necessarily capture them all. Indeed, in recent years we have witnessed several events where either the de facto or the de jure sovereignty of a state is subverted, and where this state as a unitary entity disappears. But such events seem to be anything but homogeneous. For example, Czechoslovakia and the Soviet Union may have collapsed in very tangible ways (see Marples, 2004; Rosenau, 2006), but then we have cases like the Congo and Somalia, where formal state authority ceased to exist during long periods, while more informal forms of governance emerged at various points in time.5 Iraq's recent breakdown, produced by an invasion, was also very different, and its strength or fragility in comparison to its immediate past is clearly a matter of debate. These examples reveal a much more complex landscape than a simple linear progression (retrogression) from strength (or any similar term) to weakness. What we find instead is differential performance in the several dimensions of statehood; spectacular successes punctuated by brutal shocks; critical exogenous events playing a major role; sudden and sharp recuperations.
This picture matches much better both historical evidence and state building theory (see, especially, Skocpol, 1984; Tilly, 1992) than the purely linear story implicit in PSP operational definitions.

In addition, prototyping based on breakdown generates definitions that will fit cases at the borders of the definitional space (that is, extreme ones) but not necessarily elsewhere. In other words, it can produce an overfit to the set of situations in which everything goes awry, because the variables that distinguish the extreme cases well from the others do not necessarily do so properly for cases in the middle. This is an extremely simple, but important, point, to which I will come back below. It also relates to the fact that there are many situations (indeed, the majority) in which states generally perform well in some aspects and rather poorly in others. Though the prototype is abject failure, the typical situation is partial achievement (or, if the glass is half empty, partial failure).

A fourth major issue is that these definitional routines can be associated with a lack of historicity. The basic problem here is that, as Skocpol (1984) for example has shown very clearly in her work on revolutions, strength and fragility are relational, not absolute, terms. Prototypes of failure capture objects, not relations. This point is not acknowledged even by such seasoned researchers as North et al (2009), who in their latest book systematically refer to England in the sixteenth century as a fragile state. It is true that they need a toolkit of terms that are placed high on the ladder of abstraction (see Sartori, 1970) in order to fulfill their promise of building "a conceptual framework for interpreting recorded human history". However, the litmus test of this type of terminology is precisely that it must preserve its basic desirable characteristics (in this case, its relational nature) when applied to different periods. England may have been fragile in some sense then, but certainly not in the sense attributed to Somalia or Afghanistan today. On the other side of the quant-qual barricade, we find exactly the same issues.
Polity, for example, gives Switzerland in the 1960s a mark of 10 out of 10 on its scale of democracy, without taking into consideration that women were disenfranchised.6 Few countries without female electoral participation would be likely to obtain the same generous mark today. The problem is that the index administrators (users) do not have any tools to express that in the 1960s this restriction was already rare, but still acceptable for regimes classified as democratic, while this is no longer the case today. This becomes a fundamental problem when not only cross-sections, but also longitudinal comparisons are incorporated into probabilistic models.

The fifth problem is the failure to capture the interactions between the different dimensions of the definition. Appendix A summarizes the definitions of poor performance of some major PSP databases. The Fund for Peace, for example, presents a list of (negative) attributes: loss of territorial control, loss of legitimacy, inability to provide reasonable services, inability to interact as a full member of the international community. The State Failure Task Force offers a narrow definition consisting of the collapse of state authority, but as these events are rare it has also produced a broader one that includes genocide, disruptive regime transitions and revolutionary wars aimed at displacing the regime.7 On the other hand, Carleton University in Ottawa, Canada, creates an explicit scale that ranges from fragility to failure and collapse. For USAID (2005, p. 1), "it is more important to understand how far and quickly a country is moving from or toward stability than it is to categorize a state as failed or not". Though stability seems to be the most cherished value, the definition is organized around the notion of the direction of change (positive or negative). The CPIA, for its part, includes in its criteria weak institutions, poor governance, political instability and frequent violence (and its delayed effects).
In all of these examples, the different dimensions over which state performance is evaluated are mutually interrelated. In fact, it is difficult to imagine a PSP index in which this would not occur, as the non-independence of the dimensions of the concept of interest is a characteristic of nearly all social scientific databases. This can create problems, however. For example, Polity does not include civil rights in its definition of democracy, as it candidly and explicitly acknowledges in its codebook.8 Apart from being obviously contentious, this fails to take into account the fact that a lack of rights, especially above a certain threshold, can severely affect competitiveness, which Polity does measure. In other words, many PSP databases build their indexes as if their dimensions were completely independent, which is conceptually wrong. This also has negative technical implications: aggregating interrelated variables and boxes as if they were independent is equivalent to introducing hidden weights that change the aggregation procedure. If the latter is theoretically grounded, as it should be, this becomes a source of distortions.

Finally, the logical operators that link the different dimensions of the concept, in this case PSP, are not discussed in the literature. The problem can be seen in the following way. Social scientific categories are generally multidimensional. To deal with them formally, it is necessary to put forward a proposition about the way in which the various dimensions should be assembled into a single notion. This has a logical expression (the discussion of the operators that link the dimensions) and an operational one (the construction of the aggregation function; see the section on order). To ground the discussion, an example from another, related, field may be useful.
In sum, the first step of PSP data set and index building is muddled by definitional confusion, as if putative consequences and causes enjoyed the same status as definitions proper; lack of historicity, as if we were not dealing with relational terms; non-orthogonality of the axes of the definitional space, as if there were no mutual interaction between the dimensions of the concept; and a lack of specification of the logical operators that link the different dimensions of the definition. These challenges have not yet been acknowledged in the literature, let alone resolved.
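Why the choice of logical operator matters can be illustrated with a small sketch. It assumes, purely for illustration, that a conjunction of dimensions is read as the minimum of their marks and a disjunction as the maximum (a common fuzzy-logic convention, not something the indexes discussed here specify), and compares both with the plain average. The states, marks and function names are hypothetical.

```python
# Three candidate operators for combining the dimensions of one concept.
def all_required(scores):
    # Logical AND read as "weakest link" (an assumption of this sketch).
    return min(scores)

def any_sufficient(scores):
    # Logical OR read as "best dimension" (an assumption of this sketch).
    return max(scores)

def compensatory(scores):
    # Plain average: a unit of one dimension substitutes for a unit of another.
    return sum(scores) / len(scores)

# Hypothetical states marked 0-10 on two hypothetical dimensions
# (say, territorial control and service provision).
states = {"X": [9, 2], "Y": [5, 5], "Z": [8, 4]}

def ranking(operator):
    """Order the states from best to worst under the given operator."""
    return sorted(states, key=lambda name: operator(states[name]), reverse=True)

rank_and = ranking(all_required)
rank_or = ranking(any_sufficient)
rank_mean = ranking(compensatory)
print(rank_and, rank_or, rank_mean)
```

With the same data, each operator puts a different state at the top of the ranking. An index that averages without stating which operator its definition implies has therefore made a substantive choice silently.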

Coding: Dealing With Intrinsic Ambiguity


After defining the structure of the database and the relevant dimensions/variables, the team that intends to build a PSP index needs to gather the data. This implies, among other things, operationalizing the definition and translating information and cues into tractable tags (numbers or scales), that is, coding. In their current form, codes are built under the assumption that they can be precise (the level of democracy is a 4, the degree of provision of services a 5) and objective. In this context, I will mean by an objective coding one that fulfills criteria approximately equal to what is referred to as reliability in the psychological literature: thousands of coders of different backgrounds and experience, but with an identical level of information, repeatedly marking the same event or state of the world in the same way, meaning that the variance will be very low.9

For PSPs, the whole operation is extremely problematic. Simply, the operational expressions offered by the database codebook are not worded in a form that allows for either objectivity or precision. The PSP definitions are routinely heavily hedged, full of modifiers and of imprecise terms. Haziness of language is not an exception, but rather the rule (see Appendix A). Furthermore, the definiens is often as obscure and impenetrable as the definiendum. For example, explaining fragility in terms of legitimacy can hardly be seen as a solution, as legitimacy is as complex, ambiguous and multidimensional a concept as fragility. This is a very serious problem, because the gist of the exercise, if it expects to have any credibility, is to replace terms that are not directly observable by others that are. In reality, we confront here a double problem: analytical and empirical. Analytically, if we cannot consider the expression "state failure/collapse/fragility" obvious, then we cannot consider as obvious the linked expressions "good governance", "legitimacy", "adequate provision of vital services", among others. Empirically, it is not clear that legitimacy can be observed directly. Certainly, no clear instructions are to be found in the codebooks of the PSP databases.

Now put yourself in the shoes of a coder, who has to translate this very vague terminology into a string of 0s and 1s, or into numbers on a scale. What is she or he expected to do? Note that she or he faces at least the following dilemmas:
(a) The need to establish cut-offs: for example, deciding above which point crime is considered to have spun out of control.10 Without a clear specification of where the cut-off should be placed, precision is spurious and objectivity illusory.
(b) The need to administer linguistic hedges: all the databases we are referring to are heavily hedged. Part of the definition of the phenomenon is the degree to which its symptoms are observed, or sometimes it is its nature that is qualified. The two modalities frequently appear together. For USAID (2005, p. 1, my italics), for example, failure is related to the following circumstance: "the central government does not exert effective control over significant parts of its own territory". Who is to decide how much is significant and what is effective? This hedged and symptomatic nature of social scientific definitions is simply ignored in the PSP operationalization process, and coders have to deal with this as best they can (and probably idiosyncratically).
(c) The need to mark without clear evidence: some PSP criteria are not even testable, because by definition they cannot be observed. According to USAID (2005, p. 1), vulnerable states are those "unable or unwilling to adequately assure the provision of security". The rather metaphysical reference to the unwillingness of states is widespread, yet the notion that unwillingness (of whom?) can be calculated in a precise and objective manner is by no means obvious.

These issues take us directly to the problem of the quality of the data, another big source of uncertainty in the social sciences, especially in the PSP field. It is in the nature of things that the availability of state performance information is directly proportional to the level of development and the strength of the state. Indeed, there is a pretty good correlation between the quantity of missing data in a database and GDP per capita.11 This means that generally the data will be scarcest, and poorest, precisely in the contexts for which they are most needed. This is not a fully intractable problem, but it is not trivial either. It is not very relevant when there are steep differentials in state capacity, because the fact that data are difficult to retrieve does not mean that anything goes. Afghanistan's GDP per capita is missing for several years, but nobody would claim that those missing values could be equal to or higher than, say, that of Singapore. On the other hand, if we want to compare Uganda with Rwanda, and we do not have reliable data about their GDP per capita, then things become more difficult. Note that the presence of poor data increases the problems associated with the issue of cut-offs. If the criterion to decide that a country is in a state of war is 1000 casualties but data sources are shaky, what should be done with uncertain counts of 990 or of 1010?
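The cut-off problem raised by the counts of 990 and 1010 can be made concrete with a small sketch. It assumes, hypothetically, that each reported count is accurate only to within a fixed margin; the margin of 50, the labels and the functions `crisp` and `classify` are illustrative inventions and are not taken from any codebook.

```python
def crisp(count, cutoff=1000):
    """Crisp coding: treats the reported count as if it were exact."""
    return "war" if count >= cutoff else "no war"

def classify(count, margin, cutoff=1000):
    """Code a count whose true value lies in [count - margin, count + margin]."""
    low, high = count - margin, count + margin
    if low >= cutoff:
        return "war"
    if high < cutoff:
        return "no war"
    return "undetermined"  # the interval straddles the cut-off

MARGIN = 50  # hypothetical measurement margin for shaky sources
reports = [100, 990, 1010, 5000]
crisp_labels = {c: crisp(c) for c in reports}
hedged_labels = {c: classify(c, MARGIN) for c in reports}
print(crisp_labels)
print(hedged_labels)
```

Crisp coding places 990 and 1010 in opposite categories although the data cannot tell them apart, while the interval version leaves both undetermined. The extreme cases (100 and 5000) are coded identically under both rules, which is consistent with the claim that the indexes are informative mainly at the extremes.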

One direct way of attacking this problem is by transforming numerical into categorical data.12 This is a clear improvement with respect to spurious precision. You know that you can put country A in the lower development category, and so on.13 However, this does not eliminate ambiguity. There is a trade-off, of course: the issue of cut-off points persists, and actually increases (when does level 1 of category A start and end?). On the other hand, the index will now combine numerical and linguistic labels. These are two quite different types of labels. A person counting trade, or homicides, or transit accidents, and producing a numerical label, bases his or her result on some kind of written record, and operates under the assumption that if somebody else makes the same count she or he will (almost) always come to (nearly) the same result. However far we are from this ideal in practice, the operational assumption is reasonable enough.14 In other terms, in this context the coder can genuinely aspire to objectivity, and even to precision (within certain boundaries).

Linguistic labels, on the other hand, are fundamentally different in nature. A coder produces a subjective evaluation based on an unspecified corpus of evidence. Here the notion of objectivity, even in the very limited sense in which I am using it, loses part of its meaning, even if once again there are several situations in which nobody would seriously put in doubt what the correct label is. For example, does North Korea have extensive formal constraints on its executive? No. Are there many constraints in Sweden? Yes. But the majority of comparisons (indeed, almost all the really interesting ones) are much less clear cut. Is political participation sharply restricted in Singapore, in the Philippines, in Venezuela or in Colombia? Arguably, yes. But, and this is especially important for PSPs, as their ultimate product will be a ranking: where is it more sharply restricted?
We know that even in evaluations provided by experts, the answers are likely to diverge, sometimes greatly. It would be an error to assert that linguistic tags only reveal the preferences of the coder; it is correct, instead, to say that they are different from numerical ones, and that they should not be treated on the same footing. PSP databases, however, generally treat them as if they were identical, which is highly misleading. At the same time, a very substantial portion of the data offered by PSPs consists of linguistic, not numerical, labels. Despite this, in econometric and quantitative exercises (but also in policymaking and lay discussions) they are treated as if they were bona fide hard data. This gives rise to a circular fiction: we speak to produce the facts, and then let the facts speak. This of course is intimately related to the political economy of knowledge. You do not have to be an extreme relativist to understand that the authoritative definition of a state of the world can be part of that state of the world. Using hazy notions as if they were genuine solutions to definitional problems can have direct political and economic consequences. For example, the LICUS and CPIA ratings used to be instrumental in deciding on credits and access to other resources that low development countries desperately need (Carvalho, 2006; World Bank, 2006). All this simply underscores that social science quantification faces several types of uncertainty (and not only probabilistic ones). PSP indexes are not wrong because they are tainted by ambiguity, but because they fail to take inevitable ambiguity fully on board (Gutierrez and Gonzalez, 2009). Social conflicts, ambitions, aspirations and cravings are expressed in natural language; so are social science concepts. Thus, as long as we are talking about history and macro interactions, total disambiguation is doomed to failure.
I believe that it can be comfortably claimed that ANY viable social scientific definition is heavily hedged. This, of course, includes state performance in an eminent degree. For example, Douglass et al (2009, p. 41) assert that there are no sharp borders between their ideal types of states, and permanently stress the fluidity of their categories at the borders of the classificatory space. I read this in the following sense: in this case disambiguation is not only impossible, but if it were possible it would be incorrect (it would introduce an uncalled-for assumption of crisp borders between state types). Index builders and users should take this observation very seriously. An additional source of uncertainty is the inverse relation between the quality of the observation and the interest of the event: PSP and conflict researchers tend to focus their attention on situations where data are critically poor.15 Finally, indexes participate in the constitution of the very state of the world that they aspire to define. This makes some definitions an issue of complex political debate (Gutierrez and Gonzalez, 2009). Attempting to achieve precision and objectivity without taking into consideration these various types of uncertainty is arguably a vain enterprise.

Order
The last necessary step in producing a PSP index (and, more generally, any index useful for cross-national research) is aggregation. For example, you have many variables with different numerical and possibly linguistic tags (see Table 2), and a certain number of cases (typically, countries). You want to put all your cases on a scale, perhaps allowing for ties. This is a typical multi-attribute decision-making operation (Zanaky et al, 1998; Ehrgott and Gandibleux, 2002; Kahraman, 2008). There are many choices of this type in everyday life. The typical example used by engineers in the multi-attribute decision-making literature concerns buying a car, where one normally takes into consideration security, petrol efficiency, power and price (see for example Lootsma, 1997). There can be cars that perform very well with respect to one or two variables, but poorly with respect to others (see Table 3, where higher marks mean better), yet the operation appears relatively unproblematic insofar as you aggregate all the variables to get a single mark, or rank, on which to base your decision, and then choose the car that is rated highest.
Table 2: Typical PSP information setting

          Variable 1   Variable 2   Variable 3   Variable 4
Case 1        1            5            6            8
Case 2        2            2            3            4
Case 3        5            3            2            1
Case N        4            3            2            1

Table 3: A very simple multi-attribute decision

          Security   Efficiency   Power   Price
Car 1        5           5          1       4
Car 2        3           3          3       5
Car 3        2           5          2       5
Car 4        5           3          5       2
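The car-buying aggregation can be made concrete with the marks of Table 3. What follows is a minimal Python sketch, not part of the original analysis: it computes an equally weighted sum for each car and then repeats the exercise with a second weight vector. The names `CARS`, `aggregate` and `ranking`, and the alternative weights `power_first`, are illustrative assumptions.

```python
# Marks from Table 3 (higher is better): security, efficiency, power, price.
CARS = {
    "Car 1": (5, 5, 1, 4),
    "Car 2": (3, 3, 3, 5),
    "Car 3": (2, 5, 2, 5),
    "Car 4": (5, 3, 5, 2),
}


def aggregate(marks, weights):
    """Weighted sum of the marks: one number per car."""
    return sum(w * m for w, m in zip(weights, marks))


def ranking(cars, weights):
    """Car names sorted from the highest to the lowest aggregate mark."""
    return sorted(cars, key=lambda name: aggregate(cars[name], weights), reverse=True)


equal = (1, 1, 1, 1)        # every attribute counts the same
power_first = (1, 1, 3, 1)  # hypothetical buyer who triples the weight of power

for label, weights in (("equal", equal), ("power first", power_first)):
    totals = {name: aggregate(marks, weights) for name, marks in CARS.items()}
    print(label, totals, ranking(CARS, weights))
```

With equal weights Cars 1 and 4 tie at 15 and Cars 2 and 3 tie at 14. Tripling the weight of power leaves Car 4 on top but sends Car 1 from a tie for first place to last, which is the kind of sensitivity the rest of the section examines.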

Ranking state performance is heavily multi-attribute. Generally, state performance is evaluated by 10 or more variables, thematically clustered in boxes (or baskets) of variables.16 The index is generated from the boxes. Thus, there are two sub-steps before producing the final mark: aggregating variables and aggregating boxes. But the previous step (specifying the logical operators that link the dimensions of the concept) has been omitted, and thus we can already anticipate problems. To start with the simplest possible scenario, suppose you have to make a final binary decision (the state is failed or not), and you have three boxes, A, B, C, all three also binary. How many ones (successes) will you require to classify the state as failed? All three? Two of three? One? Perhaps only the first two, or the third? This is the logical-conceptual dimension: will you base your definition on a logical AND, a logical OR, or a combination of both, in order to decide when you are in the presence of the phenomenon of interest? As happens in many other fields, PSP indexes generally ignore the issue. They do not try to establish conceptually what combination of boxes, or of variables, will produce enough evidence to assert that we are in the presence of the phenomenon of interest (for example, state failure). Instead, PSP index managers tend to proceed directly to the operational aggregation, of both variables and boxes. Here there are two fundamental and interrelated issues that deserve careful consideration. The first is a typically deeply hidden assumption about the nature of the variables. The aggregation proceeds as if there were a common abstract counting item, what economists call a numeraire, that enables the decision maker to count how many units of variable A substitute for one unit of variable B.
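The logical-conceptual choice among AND, OR and mixed rules can be written down explicitly. A minimal sketch follows, assuming three binary boxes A, B and C in which a 1 means the box signals the phenomenon; the rule labels in `RULES` are hypothetical and none of them is taken from an existing index.

```python
from itertools import product

# Candidate definitions of "failed" for three binary boxes (1 = box signals failure).
RULES = {
    "AND (all three)": lambda a, b, c: bool(a and b and c),
    "OR (any one)": lambda a, b, c: bool(a or b or c),
    "two of three": lambda a, b, c: a + b + c >= 2,
    "(A and B) or C": lambda a, b, c: bool((a and b) or c),
}


def failed_profiles(rule):
    """All box profiles (a, b, c) that the rule classifies as failed."""
    return [p for p in product((0, 1), repeat=3) if rule(*p)]


for name, rule in RULES.items():
    print(f"{name}: {len(failed_profiles(rule))} of 8 profiles count as failed")
```

The same eight possible profiles yield anywhere from one to seven "failed" classifications depending on the rule, so the choice of logical operator is part of the definition and not a technical afterthought.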
In the example of the car buyer of Table 3, he gives every variable exactly the same weight; he will tolerate a loss of quality in one dimension (say, security) if there is an identical improvement in another (say, price). The assumption that there exists a numeraire is an essential cog in the machine of economic theory,17 and it is both powerful and (perhaps with a bit of optimism) reasonable for the type of problems for which it was crafted. The question, however, is whether it holds in the context of cross-national research and social and political variables, and more specifically, those relating to state fragility. As all the PSP databases take for granted the existence of the numeraire, they do not discuss explicitly their aggregation function; there is no theory behind the choice of one or another function. An aggregation function is a rule that attributes to every vector of data (case) a single number in the range (in this case, a mark, or a position in the rank). Both the LICUS/CPIA and the FFP's aggregation functions are straightforward: they sum up the values of their variables (see Appendix B). To understand the concrete meaning of this, consider the following exercise: take two variables that appear in two different boxes in the Fund for Peace Fragile States Index. On the one hand, 'Brain drain of professionals, intellectuals and political dissidents fearing persecution or repression', and on the other, 'Outbreak of politically inspired (as opposed to criminal) violence against innocent civilians'. These variables are considered to be equivalent in numerical terms. But what does this mean? Would the repatriation of one (or more) scientist(s) compensate for, say, one massacre? I do not believe that anybody would be ready to say as much (and of course there is no theory to sustain such a proposition), but this is arguably the concrete meaning of that particular aggregation function in relation to that particular situation.
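The substitution implied by a plain sum can be checked mechanically. The sketch below uses invented scores for a hypothetical country on three indicators; it is not FFP data, and the indicator keys and the function name `additive_index` are assumptions made only for illustration.

```python
# Invented indicator scores for one hypothetical country (higher = worse).
scores = {
    "brain_drain": 6.0,
    "political_violence": 4.0,
    "public_services": 5.0,
}


def additive_index(indicators):
    """Unweighted sum of the indicator scores."""
    return sum(indicators.values())


baseline = additive_index(scores)

# One point better on brain drain, one point worse on political violence.
traded = dict(
    scores,
    brain_drain=scores["brain_drain"] - 1,
    political_violence=scores["political_violence"] + 1,
)

print(baseline, additive_index(traded))  # the two totals are identical
```

Under an additive rule the index cannot tell the two situations apart: a one-point change in one indicator is, by construction, worth exactly a one-point change in any other.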
Note that I am not suggesting that a weighted average is incorrect in general; this would be pure obscurantism. What I am saying is that the implicit assumption of weighted averages is that there is a substitution rate between the variables, and that as soon as one unpacks that assumption it becomes evident that it does not always hold in the context of social and political databases.18 Sometimes it does, but sometimes it does not (for further details, see Bouyssou and Vansnick, 1986; Gutierrez et al, 2010). How much of a monopoly over violence by the state would one be ready to sacrifice for democracy, or vice versa? How much security for liberty? I imagine that the immediate reaction of many readers will be to retort that, if there is an answer (and many times there isn't), it will depend on 'for whom?'. This is completely correct. But when there is a numeraire, by a series of reductions and reasonable assumptions, the existence of a homogeneous decision maker in a concrete domain of activity is hypothesized.19 Multi-attribute choice therefore typically tries to express in the most faithful way possible the preferences of the decision maker (Kahraman, 2008). Indeed, one of the critical aspects of multi-attribute choice is its capacity to elicit explicit criteria from those who decide and whose interests are being represented in the process of index creation: for example, the car buyer. In the PSP field, however, we have no homogeneity, only the fiction of it, which produces serious anomalies. This is especially clear when we take into account the fact that in PSP measures, a substantial part (in some, the totality) of the variables are subjective evaluations of the state of the world by in-house coders (who are subject to very tenuous controls, if any). Ignoring the issue of their preferences in multi-attribute decision making allows for hundreds of hidden biases to creep into the final ordering of the countries. This brings us to the second fundamental ordering issue. In the absence of a numeraire, a new, fundamental problem appears: order. Order issues have not been acknowledged as yet, but they are essential when a numeraire does not exist, because then each country lives in a space of many dimensions.
The crucial change when moving from one to many dimensions is that total order is lost: not all strings of numbers can be compared. As seen in Figure 1, in a one-dimensional world any two numbers (cases, countries) can be compared: we know that 1 is less than 3, or that a is less than b. Instead, in the two-dimensional space of Figure 1, we are unable to establish whether a is superior to b or vice versa. a and b are incomparable, until we make explicit which of the two dimensions is more important for us. If it were the first, then a would beat b; otherwise b would come on top of a. Current PSP indexes do not acknowledge the problem of the absence of a numeraire, and thus of total order. In consequence, they only stand on solid ground when comparing countries where one of them performs better than the other in all dimensions. All other comparisons can be seen as an artefact of the arbitrary choice of the aggregation function.

Figure 1: Loss of order when moving from one to more dimensions.

I will present this proposition very informally, as it is quite straightforward. The idea is to show that for any good aggregation function that takes two non-comparable countries a and b and ranks a as equivalent to or better than b, you can find an equally legitimate aggregation function that ranks b above a. If this is the case, then unless one has shown that for some reason or another the choice of that particular aggregation function is theoretically or formally superior to other available options, the relative position attributed to a and b is a result of the arbitrary choice of the function. Three concepts are necessary to capture the illustration: order, comparability and aggregation function. First of all, suppose we transform all the variables and boxes so that they go in the same direction and have the same top and bottom values.20 Then, we create a relation of precedence (≼), which works as the standard 'less than or equal to' (worse than or equivalent to). Case A precedes case B, or in other words A ≼ B, if every number contained in A is less than or equal to the corresponding number contained in B. For example, if we are speaking of state strength, Haiti precedes Norway. Note that some countries cannot be compared by this precedence relation, because neither is better or worse than the other in every aspect, as was the case of cars 2 and 3 in Table 3, or countries a and b in the bi-dimensional space of Figure 1. Then we need to count all the possible pair-wise comparisons between cases,21 and put into one category (call it set C) those pair-wise comparisons that can be made according to the precedence relation ≼, and in another category (say, set INC) those that cannot. The last piece of the jigsaw is the aggregation function.
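The precedence relation and the two sets C and INC can be computed directly. The following sketch applies them to the four cars of Table 3; the function names `precedes` and `split_pairs` are illustrative choices, not the author's notation.

```python
from itertools import combinations
from math import comb

# Marks from Table 3 (higher is better): security, efficiency, power, price.
CARS = {
    "Car 1": (5, 5, 1, 4),
    "Car 2": (3, 3, 3, 5),
    "Car 3": (2, 5, 2, 5),
    "Car 4": (5, 3, 5, 2),
}


def precedes(a, b):
    """True if a is less than or equal to b in every dimension."""
    return all(x <= y for x, y in zip(a, b))


def split_pairs(cases):
    """Partition all pairs of cases into comparable (C) and non-comparable (INC)."""
    comparable, incomparable = [], []
    for (name1, v1), (name2, v2) in combinations(cases.items(), 2):
        if precedes(v1, v2) or precedes(v2, v1):
            comparable.append((name1, name2))
        else:
            incomparable.append((name1, name2))
    return comparable, incomparable


C, INC = split_pairs(CARS)
# The two sets together contain Binomial(N, 2) pairs.
assert len(C) + len(INC) == comb(len(CARS), 2)
print(len(C), len(INC))
```

For the four cars of Table 3 none of the six pairs is comparable, so every position in a ranking of them depends on the aggregation rule that is chosen.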
A correct aggregation function is a set of rules that attributes to a vector of characteristics a single element in the range, and additionally: (a) behaves monotonically (if case A precedes case B, that is, A ≼ B, then A is not ranked above B); and (b) respects boundary conditions (if A has the highest marks in all its characteristics, then it will be attributed the highest rank; if it gets the lowest ones, it is bottom ranked; see Beliakov et al, 2007; Bustince et al, 2008). Now we have all the elements to represent visually the problem of the arbitrariness of the ranking of two non-comparable cases. Let us go to Figure 2. There are four cases, R, T, V, S (Figure 2(a)), and we want to represent them on a scale from 1 to 3. We know that R is better than all the others, and S is worse than all the others. What will happen with T and V? Suppose we create an aggregation function, F, that, for one reason or another, puts T above V (see Figure 2(b)). R and T will get a 3, V a 2, and S a 1. Note that F fulfills all the conditions (monotonicity and boundary) of a correct aggregation function. Now imagine I have a change of heart, and decide that V suits me better than T. As Figure 2(c) shows, I can create a new aggregation function, G, that puts V above T. Then R and V will get a 3, T a 2 and S a 1. Note that, once again, G is correct: it is monotonic and preserves boundary conditions. This shows that comparisons between the members of INC are basically arbitrary (that is, unless the aggregation function is explicitly discussed and substantiated). It only remains for me to say that this illustration refers to a phenomenon that can be demonstrated formally, in a more or less straightforward manner (Gutierrez and Argoty, 2010). The possibility of finding alternative rankings for members of the set INC is very general, and grows very fast in the number of dimensions. Let us ground the discussion on concrete examples.
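The construction of F and G can be reproduced in a few lines. The article gives no coordinates for R, T, V and S, so the two-dimensional positions below are hypothetical, chosen only so that R dominates everything, S is dominated by everything, and T and V are non-comparable. F and G are taken here to be the two coordinate projections, which is one possible choice among many; monotonicity is checked in its non-strict form.

```python
from itertools import product

# Hypothetical coordinates on a 1-3 scale; the article gives none.
CASES = {"R": (3, 3), "T": (3, 2), "V": (2, 3), "S": (1, 1)}
GRID = list(product((1, 2, 3), repeat=2))


def F(case):
    """Aggregation that looks only at the first dimension."""
    return case[0]


def G(case):
    """Aggregation that looks only at the second dimension."""
    return case[1]


def precedes(a, b):
    """True if a is less than or equal to b in every dimension."""
    return all(x <= y for x, y in zip(a, b))


def is_correct(agg):
    """Check non-strict monotonicity and boundary conditions on the whole grid."""
    monotone = all(agg(a) <= agg(b) for a in GRID for b in GRID if precedes(a, b))
    boundary = agg((3, 3)) == 3 and agg((1, 1)) == 1
    return monotone and boundary


for agg in (F, G):
    print(agg.__name__, is_correct(agg), {name: agg(c) for name, c in CASES.items()})
```

Both functions pass the two checks, yet F gives T a 3 and V a 2 while G gives V a 3 and T a 2, matching the marks described for Figure 2(b) and 2(c).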
Figure 2: (a) Creating two different well-formed aggregation functions with different rankings. (b) The function F. It ranks T better than V. It is a correct aggregation function. (c) The function G. It ranks V better than T. It is also a correct aggregation function.

Suppose we want to know whether Colombia is a stronger (less fragile, better performing) state than Bolivia. Now it is easy to understand why this comparison is difficult. It is a multi-attribute decision, without a numeraire or a homogeneous, abstract decision maker. Colombia has not had for a long time either the monopoly over legitimate violence or full control of its territory; on the other hand, it is a much better provider of basic services and has less ethnic fragmentation than Bolivia. In the FFP, Colombia is systematically rated higher than Bolivia, but the spread of the marks of the former is wider than that of the latter. For example, for 2009 Colombia obtained a rating of 89.2, but its variance (with respect to the variables that constitute the index) was 1.684, whereas Bolivia's mark was 86.3, but its variance was 0.953. To visualize this, cases are not represented in Figure 3 as numbers, but as intervals, where the left and right ends are the minimum and maximum score, respectively.22 The figure shows that there is no obvious way of comparing state strength (fragility) in Bolivia and Colombia in 2005, or more generally any pair belonging to set INC.

Figure 3: Broad and narrow performances. Note: The thick line represents Colombia, and the thin one Bolivia.

As there is a well-formed aggregation function for every possible ranking of non-comparable cases, we face a significant dilemma. Either the overwhelming majority of cases are comparable, and then ranking them says nothing new, or there is a substantial portion of non-comparable cases, in which case ranking will inevitably be arbitrary: it predicates nothing about the relative position of the non-comparable cases, but rather about the rules of the game you have established. This is clearly not an optimal solution, insofar as if the administrator has full control over the rules of the game she or he also has control over the results. Another, more positive, way of wording this latter idea is that the relative positions (in the final rank, for example, in the range of the aggregation function) of the pair-wise comparisons pertaining to INC are wholly determined by the function itself.23 One might expect a mass of intellectual resources to be mobilized in order to theoretically support the specific aggregation function that is being used, and to explain why it is better than others: if for substantive reasons we can show that one aggregation function is preferable to another, then the arbitrariness of the exercise vanishes. This, however, has not happened with regard to PSPs (at least, not formally; informally, all PSPs have their vociferous defenders). Quite the contrary: there is not a single discussion (not one!) of the issue in the literature, nor in the codebooks. In fact, there are codebooks that simply skip the theme. The deeply hidden assumption of the existence of a numeraire has clearly played a role in preserving this lacuna.

It might be argued that multidimensionality can be flattened out. For example, factorial and principal component analyses, or multidimensional scaling, allow researchers to see beneath an apparently baffling multidimensionality (see Lebart, 1995; Izenman, 2008). Approaches such as structural equations permit the incorporation of multidimensionality directly into the model, assuming that a complex concept is a latent variable regulated by other, more explicit ones (on this point, see Lebart, 1995). The approach favored by the State Failure Task Force (Goldstone et al, 2000) is the following: reducing dimensionality through multivariate statistics, and then using a neural network to make the final aggregation.
It is easy to see that none of this can solve the problem of order and aggregation. For such a complex concept as state performance, the reduction of dimensionality is only a necessary but not a sufficient step in the process of aggregation; after reduction, more than one dimension will always remain. On the other hand, the downside of neural networks (which in many respects are superior to the aggregation functions I considered above) is that they are black boxes, as King and Langche (2000) note. They are semantically vacuous. In the context of our discussion, this is a highly undesirable characteristic: for PSP indexes, the aggregation function is a substantial part of the theory.

Conclusion
Owing to globalization and technological change, global political indexes have thrived. Global policies and phenomena have to be followed, evaluated and assessed. Any suggestion that we reject them in toto would have to deal with the problems that incited indicator building in the first place: the need to have a general landscape of the effects of policy A or of the evolution of phenomenon B. One hundred first-class monographs about 100 different countries will certainly find many excellent uses, and will probably help comparative understanding much more than the vast majority of quantifications, but they will not produce a common, panoramic view. The same can be said about dropping indexes and going only for single variables, or focusing only on the evolution of particular countries and sacrificing the comparative perspective. All these options have been tried and tested at one moment or another, but none solves the key issue of aggregation. Thus, it can comfortably be hypothesized that political indicators will remain with us for the foreseeable future. Currently, however, they appear contrived and weak, and potentially misleading. Their main problem is a lack of awareness of their own specificity. They have been built as if they were any other indicator. But this is not a good departure point. Political indicators are different in some crucial senses. First, the concepts that they try to operationalize deal with many sorts of uncertainty, and generally are heavily hedged and have fuzzy boundaries. Second, they are highly multidimensional. We do not have tools that would allow us to reduce all political variables to a common counting unit. Suppose we want to aggregate women's rights and violence. How many deaths would be equivalent to a given improvement in the situation of women of any given country? The question seems senseless; it is clear we would not know how to answer it.
The lack of a numeraire implies that political indicators have to deal with the sticky problem of order, which is absent in other domains. Third, political indexes deal with corrupt, deteriorated and essentially verbal data (in some cases deserving the tag of data only by analogy). If political indicators, in particular PSP indexes, are here to stay, and if in their present form they are highly imperfect, perhaps unusable (see Munck and Verkuilen, 2002; Vreeland, 2008), then it is fundamental that we seek to improve them. A basic message is that an index that is not informed by a substantive and meaningful theory will hardly be a coherent construct. As for other political indicators, no amount of numerical cunning will replace theoretical clarity. The theory materializes both in definitions and in the aggregation procedure. The latter is the key tool for dealing with the sticky issue of order (which appears when high dimensionality is irreducible). If aggregation and order are not dealt with explicitly, the final ranking will convey information about the ad hoc chosen procedure, not necessarily about any characteristics of the countries (although this argument does not apply to the fully comparable cases, where one country is better off in all dimensions than another). Discussing and exploring aggregations, thus, is a key task for anybody who expects to build, or to read, a political index (for a first interesting step in this direction, see OECD/DAC, 2007). Even then, however, the nature and quality of the data will remain a problem, although not necessarily an intractable one. The same phenomena that made political indexes possible and desirable (a new wave of globalization; technological change) generated a serious informational overflow. We have access to huge masses of data, which is better than noise but which is not fully reliable (and which is very malleable). How to use it? How to interpret it?
Though these are still open questions, there is a growing literature that may help us find at least some provisional answers (Bouyssou and Vansnick, 1986; Bouyssou and Perny, 1992; Lootsma, 1997; Bossert and Peters, 2000; Al-Awadhi and Garthwaite, 2006; Hiroshi, 2009). The bottom line is that there is no bullet-proof indicator (Bouyssou et al, 2000). As in other aspects of the discussion, this does not mean that anything goes, however. In particular, substantial improvements can be attained if:
(a) The quality of data is explicitly and thoroughly discussed. There are more and less reliable counts, counts are different from expert opinion assessments, and these differ from in-house coding with loose and informal constraints. The following questions should be answered: Where do the data come from? Which desirable conditions do they fulfill? Could a third party reproduce the generation of the data?
(b) Political economy issues are addressed. In general, it is probably the case that global agencies would be better off following external indexes, over which they have no influence; in the long term, this is more credible and better helps decision makers to orient themselves in the world. Typical questions in this field: Does the decision maker have control over the outcome of the index? If the data come from in-house coding or expert opinion, how big would be the variance of points of view (marks) held by experts taken from a broad sample?
(c) Aggregation and order problems are explicitly acknowledged. Where does the aggregation function come from? Why this one and not another one? How robust is it? What kind of story do the weights imputed to the variables tell?
(d) Gradually, new representations (not only numerical, but also through intervals) are imagined and put in operational form.
Typically, political indicators and formal decision-making tools (Bouyssou et al, 2000) are based on highly dimensional and ambiguous concepts, and deal with very noisy, although not necessarily meaningless, data. Mechanically applying to this world the techniques used for worlds that are much less noisy and that are reducible to one dimension is severely misleading.
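The interval representation mentioned in point (d) can be sketched as follows. Each case is summarized by the minimum and maximum of its marks, and two cases are ordered only when one interval lies at or below the other at both ends. This comparison rule is an assumption made here for illustration (the article does not fix one), and the marks are invented, not FFP data.

```python
def to_interval(marks):
    """Summarize a case by its lowest and highest mark."""
    return (min(marks), max(marks))


def compare(a, b):
    """Return '<', '>', '=' or None (non-comparable) for two intervals."""
    if a == b:
        return "="
    if a[0] <= b[0] and a[1] <= b[1]:
        return "<"
    if a[0] >= b[0] and a[1] >= b[1]:
        return ">"
    return None


# Invented marks for three hypothetical countries.
broad = to_interval([2.0, 5.5, 9.0, 7.0])   # wide spread of marks
narrow = to_interval([5.0, 6.0, 6.5, 5.5])  # narrow spread of marks
low = to_interval([1.0, 2.0, 3.0])          # uniformly low marks

print(broad, narrow, compare(broad, narrow))
print(low, narrow, compare(low, narrow))
```

The broad and the narrow case come out as non-comparable, because one interval contains the other, while the uniformly low case is ordered below the narrow one. A representation of this kind reports non-comparability instead of hiding it behind a single number.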

Acknowledgement
I present here results of the Crisis States Research Programme, sponsored by DFID. I wish to thank James Putzel, the director of the program, and my colleague Andrea Gonzalez, for their support and contributions to the reflection developed here. Thanks also to the anonymous reviewers and the editor, whose comments helped improve substantially a previous version of this article.

Notes
1. I use this umbrella term because there is a huge terminological dispersion in the literature.
2. In the context of this article, 'simple' should not be taken as synonymous with 'bad'; rather the contrary.
3. There are some exceptions (such as the BTI), but this 'many variables, several aggregation steps' situation is the standard one.
4. In reality, there are many more issues, as defining state and statehood is not a trivial task. But these are beyond the scope of this article.
5. It is important to note, however, that different instances of state fragility or failure can produce events that involve similar policy challenges (for example, developed countries will face a wave of migration and so on).
6. Examples from Polity are useful to invoke for several reasons. In many ways, it is an exemplary database, but a number of PSP exercises include democracy as defined by Polity as a dimension of performance, associating it with state strength, with obvious conceptual problems as a result (see Inkeles, 1993; Sørensen, 1993).
7. See the State Failure's codebook at http://globalpolicy.gmu.edu/pitf/.
8. Other aspects of plural democracy, such as the rule of law, systems of checks and balances, freedom of the press, and so on are means to, or specific manifestations of, these general principles. We do not include coded data on civil liberties (see www.systemicpeace.org/inscr/p4manualv2009.pdf).
9. Note that precision and objectivity are different from each other (and from correctness).
10. This threshold, additionally, may vary from country to country.
11. The correlation is high but not perfect. For example, European centrally planned economies in the 1960s had high intermediate levels of development and strong states, but produced almost no data for researchers.
12. The other way is imputation of missing data. There are already very sophisticated imputation techniques, which are, however, rarely used in PSP indexes; many of the latter simply delete cases with missing data, which introduces serious distortions. At the same time, it is important to note that beyond a certain percentage of missing data, the usefulness of imputation techniques falls sharply.
13. For example, the World Bank's GNI categorical data are extremely useful in this respect. Note that here we are dealing with intervals: we know that country A's GNI per capita is between 0 and 200. See http://data.worldbank.org/about/country-classifications/country-and-lendinggroups or http://data.worldbank.org/node/207.
14. The issue of whether the record is accurate is a wholly different problem.
15. Polity has advanced in this regard, introducing a variable about the quality of data (see Polity IV Project: Dataset Users' Manual, p. 31. See http://www.systemicpeace.org/inscr/p4manualv2009.pdf). Given the complexity of the problem involved, this is a very good practice.
16.
17. And of course it need not be money. The underlying counting units are von Neumann utilities, but with a series of additional and also very reasonable assumptions, the observable numeraire becomes in effect money.
18. There are other aggregation functions that do not beg the substitution assumption.
19. For example, THE consumer, THE producer, THE politician, THE bus customer. We can suppose that, ceteris paribus, THE bus customer prefers a cheaper to a more expensive ride, and a shorter to a longer trip; the genuine exceptions are few, and contrived. On this subject, see Przeworski (2004, p. 86), for whom 'the theory [rational choice or strategic action] works only if we can identify classes of individuals in some structure of conflict and plausibly attribute to them some objectives. To put it differently, the political economy approach works only when it is imbued with sociology. This is why it is hard to say anything about individuals or even voters. They are heterogeneous. Some want one thing, some want another ... The more sociology we can build into theories, the greater the benefit of the economic approach.'
20. For example, the higher the mark the better.
21. There are Binomial(N, 2) of these, N being the total number of cases. This is the last formula I plug in; the rest of the discussion proceeds strictly verbally.
22. Note that this interval representation does not assume that there is a numeraire that makes dimensions interchangeable; its assumption, instead, is that the most relevant characteristics of each case are captured by its extreme values. I believe that in many cases (for example, PSP databases) this assumption is much more reasonable than the numeraire one.
23. This is not true for comparable cases (owing to the restriction of monotonicity).

References
Al-Awadhi, S. and Garthwaite, P. (2006) Quantifying expert opinion for modelling fauna habitat distributions. Computational Statistics 21(1): 121-140.
Bates, R. (2008) When Things Fell Apart: State Failure in Late-Century Africa. New York: Cambridge University Press.
© 2011 European Association of Development Research and Training Institutes 0957-8811 European Journal of Development Research Vol. 23, 1, 20-42

Beliakov, G., Pradera, A. and Calvo, T. (2007) Aggregation Functions. A Guide for Practitioners. New York: Springer.
Bossert, W. and Peters, H. (2000) Multi-attribute decision-making in individual and social choice. Mathematical Social Sciences 40: 327-339.
Bouyssou, D., Marchant, T., Pirlot, M., Perny, P., Tsoukias, A. and Vincke, P. (2000) Evaluation and Decision Models. A Critical Perspective. Massachusetts: Kluwer Academic Publishers.
Bouyssou, D. and Perny, P. (1992) Ranking methods for valued preference relations: A characterization of a method based on entering and leaving flows. European Journal of Operational Research 61: 186-194.
Bouyssou, D. and Vansnick, J. (1986) Noncompensatory and generalized noncompensatory preference structures. Theory and Decision 21: 251-266.
Briscoe, I. (2009) isis Europe, http://www.isis-europe.org/pdf/2008_artrel_227_esr42eufragilestates.pdf, accessed June 2009.
Bustince, H., Herrera, F. and Montero, J. (2008) Fuzzy Sets and Their Extensions. Representation, Aggregation and Models. New York: Springer.
Cammack, D., Mcleod, D., Menocal, A. and Christiansen, K. (2006) Donors and Fragile States Agenda: A Survey of Current Thinking and Practice. Report submitted to the Japan International Cooperation Agency. London: ODI-JICA.
Carvalho, S. (2006) Engaging with Fragile States: An IEG Review of World Bank Support to Low-income Countries Under Stress. Washington DC: Independent Evaluation Group, World Bank.
Christenson, M., Dabelko, G.D., Esty, D.C. and Parris, T.M. (2000) State Failure Task Force Report: Phase III Findings. McLean, VA: Science Applications International Corporation, http://globalpolicy.gmu.edu/pitf/SFTF%20Phase%20III%20Report%20Final.pdf, accessed 30 September 2000.
Di John, J. (2010) The concept, causes and consequences of failed states: A critical review of the literature and agenda for research with specific reference to Sub-Saharan Africa. European Journal of Development Research 22(1): 10-30.
Dorff, R. (2000) Addressing the challenges of failed states. Paper presented at a conference on failed states; 7-10 April, Florence, Italy.
Ehrgott, M. and Gandibleux, X. (2002) Multiple Criteria Optimization. State of the Art Annotated Bibliographic Survey. Massachusetts: Kluwer Academic Publishers.
Goldstone, J. et al (2010) A global model for forecasting political instability. American Journal of Political Science 54(1): 190-208.
Goldstone, J.A. et al, in consultation with Christenson, M., Dabelko, G.D., Esty, D.C. and Parris, T.M. (2000) State Failure Task Force Report: Phase III Findings. McLean, VA: Science Applications International Corporation.
Gros, J.G. (1996) Toward a taxonomy of failed states in the new world order: Decaying Somalia, Liberia, Rwanda, and Haiti. Third World Quarterly 17(3): 455-471.
Gutierrez, F. and Gonzalez, A. (2009) Force and ambiguity: Evaluating sources for cross-national research. The case of military intervention. Crisis States Working Papers (Crisis States Research Centre) 50: 1-35.
Gutierrez, F. and Argoty, A. (2010) Order preserving functions from finite posets to R: How many are there? In: F. Gutierrez, D. Buitrago, A. Gonzalez and C. Lozano (eds.) Measuring Poor State Performance. Problems, Perspectives, and Paths Ahead. London: LSE UKAID, pp. 136-146.
Gutierrez, F., Gonzalez, A., Buitrago, D. and Lozano, C. (2010) Measuring poor state performance: Problems, perspectives and paths ahead. Paper presented to the workshop on Measuring State Performance; 20-21 May, London School of Economics and Political Science, London, United Kingdom.
Helman, G. and Ratner, S. (1993) Saving failed states. Foreign Policy 89: 3-20.
Hiroshi, S. (2009) Rough Sets, Fuzzy Sets, Data Mining and Granular Computing. India: Springer.
Inkeles, A. (1993) On Measuring Democracy: Its Consequences and Concomitants. New Jersey: The State University.
Izenman, A.J. (2008) Modern Multivariate Statistical Techniques. Philadelphia, PA: Springer.
Kahraman, C. (2008) Fuzzy Multi-criteria Decision Making. Theory and Applications with Recent Developments. Istanbul: Springer.
King, G. and Langche, Z. (2000) Improving quantitative studies of international conflict: A conjecture. American Political Science Review 94(1): 21-36.
King, G. and Langche, Z. (2001) Improving forecasts of state failure. World Politics 53: 623-658.
Lebart, L. (1995) Multivariate Descriptive Statistical Analysis. Correspondence Analysis and Related Techniques for Large Matrices. New York: John Wiley & Sons.
Lootsma, F. (1997) Fuzzy Logic for Planning and Decision Making. London: Kluwer Academic Publishers.
Marples, D. (2004) The Collapse of the Soviet Union, 1985-1991. London: Longman.
Munck, G. and Verkuilen, J. (2002) Conceptualizing and measuring democracy: Evaluating alternative indices. Comparative Political Studies 35(1): 5-34.
North, D., Wallis, J. and Weingast, B. (2009) Violence and Social Orders. A Conceptual Framework for Interpreting Recorded Human History. New York: Cambridge University Press.
OECD/DAC. (2007) Fragile States: Policy Commitment and Principles for Good International Engagement in Fragile States and Situations: DAC High. Paris: OECD/DAC (Organisation for Economic Co-operation and Development/Development Assistance Committee).
Przeworski, A. (2004) States and Markets. A Primer in Political Economy. New York: Cambridge University Press.
Rice, S. and Stewar, P. (2009) Index on State Weakness in the Developing World. Washington: The Brookings Institution.
Rosenau, J. (2006) The Study of World Politics: Globalization and Governance. New York: Routledge.
Sartori, G. (1970) Concept misformation in comparative politics. American Political Science Review 64(4): 1033-1053.
Skocpol, T. (1984) Los estados y las revoluciones sociales. Un análisis comparativo de Francia, Rusia y China. Mexico: Fondo de Cultura Económica.
Sørensen, G. (1993) Democracy, authoritarianism and state strength. The European Journal of Development Research 5(1): 6-34.
Taylor, D. (2008) Security expert says new US administration must develop fragile states. Voice of America, http://www.voanews.com/english/archive/2008-11/Security-Expert-says-New-USAdministration-must-Develop-Fragile-States-PART-2-of-5.cfm?CFID=304592121&CFTOKEN=22882106&jsessionid=8430d822b05914a49a8a686e356711325247, accessed August 2009.
Tilly, C. (1992) Coerción, capital y los Estados europeos (990-1990). Madrid, Spain: Alianza.
USAID. (2005) Fragile states strategy. Washington, DC: USAID. Document no. PD-ACA-999.
Vreeland, J.R. (2008) Research note: A problem with polity. Unpacking anocracy. Journal of Conflict Resolution 52(3): 401-425.
World Bank. (2006) Engaging with Fragile States. An IEG Review of World Bank Support to Low Income Countries under Stress. Washington: World Bank.
Zanaky, S., Solomon, A., Wishart, N. and Dublish, S. (1998) Multi-attribute decision making. A simulation comparison of select methods. European Journal of Operational Research 107(3): 507-529.
Zartman, W. (1995) Collapsed States: The Disintegration and Restoration of Legitimate Authority. London: Lynne Rienner Publishers.


Appendix A
See Table A1.
Table A1: Synoptic panorama of selected PSP definitions

Source: Fund for Peace
Concept: State failure (a)
Definition: A state that is failing has several attributes. One of the most common is the loss of physical control of its territory or a monopoly on the legitimate use of force. Other attributes of state failure include the erosion of legitimate authority to make collective decisions, an inability to provide reasonable public services and the inability to interact with other states as a full member of the international community. The 12 indicators cover a wide range of state failure risk elements such as extensive corruption and criminal behavior, inability to collect taxes or otherwise draw on citizen support, large-scale involuntary dislocation of the population, sharp economic decline, group-based inequality, institutionalized persecution or discrimination, severe demographic pressures, brain drain, and environmental decay. States can fail at varying rates through explosion, implosion, erosion, or invasion over different time periods.
Linguistic hedges: Reasonable, full, extensive, sharp

Source: USAID
Concept: Fragile state (b)
Definition: USAID uses the term "fragile states" to refer generally to a broad range of failing, failed and recovering states. However, the distinction among them is not always clear in practice, as fragile states rarely travel a predictable path of failure and recovery, and the labels may mask sub-state and regional conditions (insurgencies, factions and so on) that may be important factors in conflict and fragility. It is more important to understand how far and quickly a country is moving from or toward stability than it is to categorize a state as failed or not. Therefore, the strategy distinguishes between fragile states that are vulnerable and those that are already in crisis.
Linguistic hedges: Generally, rarely, may be

Source: USAID
Concept: Vulnerable (b)
Definition: USAID is using "vulnerable" to refer to those states unable or unwilling to adequately assure the provision of security and basic services to significant portions of their populations and where the legitimacy of the government is in question. This includes states that are failing or recovering from crisis.
Linguistic hedges: Adequately

Source: USAID
Concept: Crisis (b)
Definition: USAID is using "crisis" to refer to those states where the central government does not exert effective control over its own territory or is unable or unwilling to assure the provision of vital services to significant parts of its territory, where legitimacy of the government is weak or non-existent, and where violent conflict is a reality or a great risk.
Linguistic hedges: (none listed)

Source: State Failure Task Force
Concept: State failure (c)
Definition: Narrowly defined, state failures consist of instances in which central state authority collapses for several years. Fewer than 20 such episodes occurred globally between 1955 and 1998, however, too few for robust statistical analysis. Furthermore, events that fall beneath this total-collapse threshold often pose challenges to US foreign policy as well. For these reasons, the task force broadened its definition of state failure to include a wider range of civil conflicts, political crises and massive human-rights violations that are typically associated with state breakdown.
Linguistic hedges: (none listed)

Source: Carleton University
Concept: Fragile states (d)
Definition: Fragile states lack the functional authority to provide basic security within their borders, the institutional capacity to provide basic social needs for their populations, and/or the political legitimacy to effectively represent their citizens at home and abroad.
Linguistic hedges: Basic, effectively

Source: Carleton University
Concept: Weak states (d)
Definition: Weak states are susceptible to fragility or failure because of limited governance capacity, economic stagnation, and/or an inability to ensure the security of their borders and sovereign domestic territory.
Linguistic hedges: Limited

Source: Carleton University
Concept: Failing states (d)
Definition: Failing states exhibit key elements of fragility, and are experiencing organized political violence. Peace processes are weak or non-existent.
Linguistic hedges: Key

Source: Carleton University
Concept: Failed states (d)
Definition: Failed states are states characterized by conflict, humanitarian crises, and economic collapse. Government authority, legitimacy and capacity no longer extend throughout the state, but instead are limited either to specific regions or groups.
Linguistic hedges: (none listed)

Source: Carleton University
Concept: Collapsed states (d)
Definition: Collapsed states possess no meaningful central governments. These nations exist purely as geographical expressions, lacking any characteristics of state authority, legitimacy, or capacity.
Linguistic hedges: (none listed)

Source: Carleton University
Concept: Recovering states (d)
Definition: Recovering states are states that exhibit key elements of fragility, but where substantial and at least partially successful nation building efforts are present.
Linguistic hedges: At least partially

(a) Definition taken from http://www.fundforpeace.org/web/index.php?option=com_content&task=view&id=102&Itemid=327#5.
(b) Definitions taken from http://www.usaid.gov/policy/2005_fragile_states_strategy.pdf.
(c) Definition taken from Jack A. Goldstone, Ted Robert Gurr, Barbara Harff, Marc A. Levy, Monty G. Marshall, Robert H. Bates, David L. Epstein, Colin H. Kahl, Pamela T. Surko, John C. Ulfelder and Alan N. Unger, in consultation with Christenson (2000).
(d) Definitions taken from http://www.carleton.ca/cifp/app/serve.php/1138.pdf.


Appendix B

Examples of Aggregation Functions


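LICUS

\[
f(x_i) = \frac{1}{4}\left[\frac{x_{1i} + x_{2i} + x_{3i}}{3} + \frac{x_{4i} + x_{5i} + x_{6i}}{3} + \frac{x_{7i} + x_{8i} + x_{9i} + x_{10i} + x_{11i}}{5} + \frac{x_{12i} + x_{13i} + x_{14i} + x_{15i} + x_{16i}}{5}\right]
\]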
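As a rough illustration of the LICUS aggregation above, the following sketch averages four cluster means over one country's 16 indicator scores. The cluster sizes (3, 3, 5, 5) are read off the formula; the function name `licus_score` and the sample scores are hypothetical and are not taken from any World Bank data.

```python
from statistics import mean

# Cluster sizes read off the formula: indicators 1-3, 4-6, 7-11 and 12-16.
CLUSTER_SIZES = (3, 3, 5, 5)


def licus_score(x):
    """Return the mean of the four cluster means for 16 indicator scores."""
    if len(x) != sum(CLUSTER_SIZES):
        raise ValueError("expected 16 indicator scores")
    cluster_means = []
    start = 0
    for size in CLUSTER_SIZES:
        cluster_means.append(mean(x[start:start + size]))
        start += size
    return mean(cluster_means)


# Hypothetical scores, invented for illustration only.
example = [3.0, 3.0, 3.0, 4.0, 4.0, 4.0] + [2.0] * 5 + [5.0] * 5
uniform = [3.5] * 16

print(licus_score(example))  # (3 + 4 + 2 + 5) / 4 = 3.5
print(licus_score(uniform))  # 3.5 as well
```

The two hypothetical profiles receive the same score although one is weak in its third cluster and strong in its fourth. This is the substitution between dimensions that an averaging function builds in.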

Fund For Peace

\[
f(x_1, x_2, \ldots, x_{12}) = \sum_{i=1}^{12} x_i
\]

After the sum has been computed, the following If-Then rules are applied:

If \(\sum_{i=1}^{12} x_i \geq 90\) then Alert
If \(60 \leq \sum_{i=1}^{12} x_i < 90\) then Danger
If \(30 \leq \sum_{i=1}^{12} x_i < 60\) then Moderate danger
If \(\sum_{i=1}^{12} x_i < 30\) then Sustainable state
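The Fund For Peace rule can be sketched in the same way: sum the 12 indicator scores, then map the total to a category with the thresholds listed above. The function names `ffp_total` and `ffp_category` and the sample scores are hypothetical; only the thresholds and labels come from the rules in this appendix.

```python
def ffp_total(x):
    """Sum the 12 indicator scores of one country."""
    if len(x) != 12:
        raise ValueError("expected 12 indicator scores")
    return sum(x)


def ffp_category(total):
    """Map a total score to a category using the If-Then rules above."""
    if total >= 90:
        return "Alert"
    if total >= 60:
        return "Danger"
    if total >= 30:
        return "Moderate danger"
    return "Sustainable state"


# Hypothetical scores, invented for illustration only.
country_a = [7] * 12               # total 84
country_b = [8] * 12               # total 96
country_c = [10] * 6 + [4] * 6     # total 84, with a very uneven profile

for name, scores in [("A", country_a), ("B", country_b), ("C", country_c)]:
    total = ffp_total(scores)
    print(name, total, ffp_category(total))
```

Countries A and C land in the same category with the same total despite very different profiles, because the sum treats all 12 indicators as interchangeable.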
