
LANGUAGE IDEOLOGIES, PUBLIC DISCOURSES, AND ETHNONATIONALISM

IN THE BALKANS: A CORPUS-BASED STUDY

By Adnan Ajšić

A Dissertation

Submitted in Partial Fulfillment

of the Requirements for the Degree of

Doctor of Philosophy

in Applied Linguistics

Northern Arizona University

May 2015

Approved:

Douglas Biber, PhD, Co-Chair

Mary McGroarty, PhD, Co-Chair

Randi Reppen, PhD

James Wilce, PhD


Abstract

LANGUAGE IDEOLOGIES, PUBLIC DISCOURSES, AND ETHNONATIONALISM

IN THE BALKANS: A CORPUS-BASED STUDY

ADNAN AJŠIĆ

Language ideologies have been closely related to nationalist discourses since the

inception of nationalism and the one-nation-one-language-one-territory trope, and

continue to be important for the construction and maintenance of national identities in

Europe and elsewhere. Although recent research has examined language debates and the

links between language ideologies and national identities in plurilingual and multicultural

societies (e.g., Canada, Vessey, 2013a; Spain/Catalonia, Pujolar, 2007), little attention has

been paid to contexts with minimal linguistic differences between groups such as the

West Central Balkans. Public language-related discourse in the Central South Slavic area

in the last twenty years has been dominated by a fierce debate over the ownership of the

common language (formerly known as Serbo-Croatian) and the concomitant contestation

of ethnolinguistic identities. The principal goal of this study, therefore, was to identify

dominant language-related discourses and language ideologies on the basis of an

empirical, mixed methods approach, and investigate the links between language-related

discourses, language ideologies, and ethnonationalist discourse in the mainstream press

published in Serbia as the largest Central South Slavic nation.

To investigate language-related discourses and language ideologies in the

mainstream Serbian press, two comprehensive, specialized research (11,656,247 words

from 16,148 articles) and comparator (22,493,804 words from 37,227 articles) corpora

were compiled from relevant articles published in four leading Serbian dailies and

weeklies. Following recent developments in mixed methods research into discourses and

ideologies (Baker et al., 2008), the data were analyzed using a combination of

quantitative (corpus linguistics) and qualitative (critical discourse analysis/discourse-

historical approach) methods. The second major goal of this study, therefore, was to

compare the quantitative methods employed in terms of their usefulness and effectiveness for

the identification of language-related discourses and language ideologies.

The findings suggest the existence of pervasive language-related discourses of

endangerment and contestation which are based on an essentialist language ideology with

a long history and crucial function in Serbian nationalism. The methodological

comparison suggests different roles for different quantitative methods (e.g., micro- and

macroscopic analysis), as well as an overall complementarity of the quantitative and

qualitative methods. Crucially, however, exploratory factor analysis is shown to be the

most effective analytical method for the purposes of corpus-based investigations of

discourses and ideologies. Finally, despite some synchronic and diachronic variation in

(small ‘d’) discourses suggested by factors, the discursive and ideological profiles of the

mainstream Serbian press are shown to be fairly uniform and stable, suggesting broad

acceptance and naturalization of dominant language-related discourses and language

ideologies in Serbian society.

Adnan Ajšić

© 2015

Acknowledgments

I would like to thank my committee members, Doug Biber, Mary McGroarty,

Randi Reppen, and Jim Wilce for their unwavering support and endless patience. When I

was embarking upon this journey, I thought each one of you would be able to provide a

unique perspective I could rely on. I chose wisely. Thank you.

I am also grateful to my wife, Deniza, and my son, Aiden Mak, for endurance and

inspiration during a very difficult time. Hvala oboma. Gotovo je, slobodni smo. I’d also

like to thank my mom, Elbisa, and mother-in-law, Ajša, who both did what Bosnian moms

do. Hvala objema. Finally, thank you to my sister, Amra, for proving me right, and my

brother-in-law, Nebojša, for having a rare combination of intelligence, skill, and patience.

Fala ti, učitelju.

Meši

Hvala ti na nesebičnosti i dostojanstvu u plavom mantilu. Htjelo je ovako da bude.

Contents

List of Tables
List of Figures
1. Introduction
1.1 Definition of Problem
1.2 Sociolinguistic History and the Role of Language Ideologies in Contemporary Balkans
1.2.1 The symbolic importance of language.
1.2.2 A brief sociolinguistic history of West Central Balkans.
1.2.3 Language, identity and ethnonationalism in contemporary Balkans.
1.3 Study Outline
2. Literature Review
2.1 Theoretical Approaches to Ideology
2.1.1 Historical development.
2.1.2 Theoretical approaches.
2.1.3 Definitions.
2.1.4 Conceptualizations.
2.2 Theoretical Approaches to Language Ideology
2.2.1 Historical development.
2.2.2 Theoretical approaches.
2.2.3 Definitions and conceptualizations.
2.3 Empirical Approaches to Language Ideology
2.3.1 Theoretical and methodological contexts in language ideology research.
2.3.2 Research questions in language ideology research.
2.3.3 Types of data used in language ideology research.
2.3.4 Corpus linguistics in research on discourse and language ideology.
2.4 Gaps
3. Study Overview
3.1 Research Questions
3.1.1 Research question 1.
3.1.2 Research question 2.
3.1.3 Research question 3.
3.1.4 Research question 4.
3.2 Research Design
3.3 Construct Definitions and Operationalizations
3.3.1 Core concepts.
3.3.2 Keywords.
3.3.3 Relevant collocates.
3.3.4 Dominant language-related discourses.
3.3.5 Dominant language ideologies.
3.3.6 Ethnonationalism.
3.4 Coding Procedures
4. Data
5. Methods
5.1 Keyword Analysis
5.1.1 Theoretical background.
5.1.2 Analytical parameters and procedures.
5.2 Collocation Analysis
5.2.1 Theoretical background.
5.2.2 Analytical parameters and procedures.
5.3 Exploratory Factor Analysis
5.3.1 Theoretical background.
5.3.2 Analytical parameters and procedures.
5.3 Synchronic and Diachronic Variation (Analysis of Variance)
5.3.1 Theoretical background.
5.3.2 Analytical parameters and procedures.
5.4 Cluster Analysis
5.4.1 Theoretical background.
5.4.2 Analytical parameters and procedures.
5.5 Critical Discourse Analysis: Discourse-historical Approach
5.5.1 Theoretical background.
5.5.2 Analytical parameters and procedures.
6. Results
6.1 Keyword Analysis
6.1.1 Keyword analysis (5+ hits section of SERBCORP).
6.1.2 Keyword analysis (5+ hits section of SERBCORP vs. 1-4 hits section of SERBCORP).
6.1.3 Keyword associates.
6.2 Collocation Analysis
6.2.1 Collocation analysis (5+ hits section of SERBCORP).
6.2.2 N-grams.
6.3 Exploratory Factor Analysis
6.3.1 Factor 1: Language education.
6.3.2 Factor 3: Entrance exams.
6.3.3 Factor 9: Foreign language education.
6.3.4 Factor 11: Officialization of Bosnian.
6.3.5 Factor 2: Cyrillic-only.
6.3.6 Factor 5: Minority language rights.
6.3.7 Factor 4: Officialization of Montenegrin 1.
6.3.8 Factor 8: Officialization of Montenegrin 2.
6.3.9 Factor 6: Contestation over language ownership and name.
6.3.10 Factor 10: Linguistics as a science, lexicography, standardization and contestation.
6.3.11 Factor 7: Literature and publishing.
6.3.12 Factor 12: Linguacultural diplomacy, language, and culture.
6.4 Synchronic and Diachronic Variation in Language-related Discourses (Analysis of Variance)
6.4.1 Variation by publication (synchronic).
6.4.2 Summary of variation by publication.
6.4.3 Variation by year of publication (diachronic).
6.4.4 Summary of variation by year of publication.
6.4.5 Variation by type of article (synchronic).
6.4.6 Summary of variation by type of article.
6.5 Cluster Analysis
6.5.1 Preferred cluster solution and scoring patterns by factor and cluster.
6.5.2 Synchronic and diachronic clustering patterns.
6.6 Critical Discourse Analysis/Discourse-historical Approach
6.6.1 Excerpts from texts representative of Factor 2.
6.6.2 Excerpts from texts representative of Factors 4 and 8.
6.6.3 Excerpts from texts representative of Factor 11.
6.6.4 Excerpts from texts representative of Factors 6 and 10.
6.6.5 Topoi.
7. Discussion
7.1 Research Question 1
7.2 Research Question 2
7.3 Research Question 3
7.4 Research Question 4
8. Conclusion
8.1 Methodological Comparison
8.2 Language-related Discourses, Language Ideologies, and Ethnonationalism: What It All Means
8.3 Implications
8.4 Limitations
8.5 Future Research
References
Appendices
Appendix A: Sampling Procedures
Appendix B: Comparative Analyses of Comparator Corpora
Appendix C: Keyword Analysis (SERBCORP)
Appendix D: Keyword Analysis (5+ Hits Section of SERBCORP)
Appendix E: Keyword Analysis (5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus)
Appendix F: Collocation Analysis (SERBCORP)
Appendix G: Collocation Analysis (5+ Hits Section of SERBCORP)

List of Tables

Table 1 Research Design: CL and CDA Investigation of Language-related Newspaper Discourse
Table 2 Composition of SERBCORP (by Publication)
Table 3 Number of Articles in SERBCORP (by Year and Publication)
Table 4 Article Mean Lengths, Standard Deviations, and STTR (by Publication)
Table 5 Composition of SERBCOMP (by Publication)
Table 6 Articles in SERBCORP (by Hit Count for the Lemma JEZIK and Percentage)
Table 7 Composition of the 5+ Hits Section of SERBCORP (by Publication)
Table 8 Number of Articles in the 5+ Hits Section of SERBCORP (by Year and Publication)
Table 9 Article Means, SD, and STTR in the 5+ Hits Section of SERBCORP (by Publication)
Table 10 Top 50 Positive Key Lemmas in the 5+ Hits Section of SERBCORP (by Keyness Score)
Table 11 Negative Key Lemmas in the 5+ Hits Section of SERBCORP (by Keyness Score)
Table 12 Top 50 Positive Key Lemmas in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus (by Keyness Score)
Table 13 Negative Key Lemmas in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus (by Keyness Score)
Table 14 Positive Key Semantic Domains in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus (by Rank)
Table 15 Ethnolinguistic Identity-related Key-keywords and Key-keyword Associates in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus (by Rank/Number of Texts)
Table 16 Factor-related Key-keywords and Key-keyword Associates in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus (by Rank)
Table 17 Top 50 Lemma Collocates of the Lemma JEZIK ‘Language’ in the 5+ Hits Section of SERBCORP (by Frequency)
Table 18 Top 50 Lemma Collocates of the Lemma JEZIK ‘Language’ in the 5+ Hits Section of SERBCORP (by MI Score)
Table 19 Sample of the Most Frequent N-grams in the 5+ Hits Section of SERBCORP (by Number of Constituents and Frequency)
Table 20 Descriptive Statistics for the Variables in the 12-factor Solution (N = 943, k = 107)
Table 21 First 13 Eigenvalues of the Unrotated Factor Analysis (N = 943, k = 107)
Table 22 Rotated Factor Pattern for the 12-factor Solution (Varimax Rotation)
Table 23 Summary of the Factorial Structure (Collocates in Parentheses were not Used in the Calculation of Factor Scores)
Table 24 Top 20 Highest Scoring Articles on Factor 1 (MV Outliers are in Bold)
Table 25 Top 20 Highest Scoring Articles on Factor 3 (MV Outliers are in Bold)
Table 26 Top 20 Highest Scoring Articles on Factor 9 (MV Outliers are in Bold)
Table 27 Top 20 Highest Scoring Articles on Factor 11 (MV Outliers are in Bold)
Table 28 Top 20 Highest Scoring Articles on Factor 2 (MV Outliers are in Bold)
Table 29 Top 20 Highest Scoring Articles on Factor 5 (MV Outliers are in Bold)
Table 30 Top 20 Highest Scoring Articles on Factor 4 (MV Outliers are in Bold)
Table 31 Top 20 Highest Scoring Articles on Factor 8 (MV Outliers are in Bold)
Table 32 Top 20 Highest Scoring Articles on Factor 6 (MV Outliers are in Bold)
Table 33 Top 20 Highest Scoring Articles on Factor 10 (MV Outliers are in Bold)
Table 34 Top 20 Highest Scoring Articles on Factor 7 (MV Outliers are in Bold)
Table 35 Top 20 Highest Scoring Articles on Factor 12 (MV Outliers are in Bold)
Table 36 Descriptive Statistics for Language-related Discourse Factor Scores by Publication
Table 37 Descriptive Statistics for Language-related Discourse Factor Scores by Year of Publication
Table 38 Descriptive Statistics for Language-related Discourse Factor Scores by Type of Article
Table 39 Descriptive Statistics for Twelve Factors (Predictor Variables) in a Six-cluster Solution
Table 40 Results of ANOVA for Twelve Factors (Predictor Variables) in a Six-cluster Solution
Table 41 Discursive Links Between Twelve Factors Based on Highest Mean Scores for Six Clusters
Table 42 Cluster Membership by Publication for the Six-cluster Solution
Table 43 Cluster Membership by Year of Publication for the Six-cluster Solution
Table 44 Cluster Membership by Type of Article for the Six-cluster Solution

List of Figures

Figure 1. Diagram of the analytical process
Figure 2. Distribution of 5+ hit articles (by year, all publications)
Figure 3. Concordance lines for postoji ‘exists’ in the 5+ hits section of SERBCORP
Figure 4. Scree plot of eigenvalues
Figure 5. Concordance lines for rasparčavanje ‘partitioning’ in SERBCORP

1. Introduction

1.1 Definition of Problem

Although the standard varieties of the Central South Slavic diasystem are virtually

fully mutually intelligible, in this region, as elsewhere, language has long been a primary

tool in the construction and maintenance of separate ethnonational identities and hence

highly contested (Greenberg, 2004). After a period of relative political stability and

formal linguistic union under the label of Serbo-Croatian in the former Yugoslavia (1945-

1991), the contestation reemerged and intensified with the dissolution of the federal state

and the initiation of the concomitant projects of (re-)construction of ethnonational

identities and states. The linguistic consequences of the dissolution include the “nominal

language death” of Serbo-Croatian, the emergence of four successor varieties bearing

ethnic labels (Bosnian, Croatian, Montenegrin, Serbian), and the formulation of

considerably different language policies in the new states (Bugarski, 2004). Despite this,

or perhaps precisely because of it, the contestation continues as identity and nationhood

continue to be negotiated. Most recently, in the summer months of 2013, three fierce

public language debates took place in three of the four successor states. In Bosnia-

Herzegovina, the debate, which took place in the period leading up to the country’s first

postwar census (October 2013), centered on the legitimacy of the name of one of the

three official languages, Bosnian. In Croatia, the debate centered on the reintroduction of

biscriptal public signs including the Cyrillic alphabet and the sometimes violent protests

against it in the easternmost Croatian city of Vukovar, which has a sizeable Serb minority.

Finally, in Serbia the debate centered on the recognition of Bosnian as a minority

language, which came as a direct consequence of the country’s accession negotiations

with the European Union.

Underlying these and similar debates are language ideologies, for the present

purposes best defined as “the cultural system[s] of ideas about social and linguistic

relationships, together with their loading of moral and political interests” (Irvine, 1989, p.

255). Because they function as a mediating link between linguistic and social practices,

language ideologies are “not about language alone” (Woolard, 1998, p. 3). Rather, they

often serve as tools for the invention of tradition which makes possible the “imagined

community” of the nation (Anderson, 1983), especially through deployment in unifying

public institutions such as print. For this reason, Silverstein (1998), for example, points

to the discursive practices in institutions as especially productive for the study of

language ideology. Available research suggests that mass media, and newspapers in

particular, are a primary site for discursive and ideological reproduction in modern

societies (e.g., Fowler, 1991; see also papers in Johnson & Ensslin, 2007). Studies of

language ideology thus often focus on public institutional discourses and those of

newspapers in particular, as “newspapers are self-conscious loci of ideology production”

(DiGiacomo, 1999, p. 105) as well as “key sites for language ideological debates between

various kinds of social actors” (Ensslin & Johnson, 2006, p. 155). Furthermore, if we

accept the view of newspapers as a discourse community with which the audience

identifies, it follows that “their average lexicon shapes, describes and expresses what is

accepted by [the] community” itself (Bassi, 2010, p. 209). This latter point is particularly

important for a lexical approach to dominant public language-related discourses and

language ideologies proposed in this study.

The principal goal of this study is to investigate the links between language-

related discourse and language ideologies and ethnonationalist discourse in mainstream

newspapers published in Serbia as the largest Central South Slavic nation. Language

ideologies have been closely related to nationalist discourses since the inception of

nationalism and the one-nation-one-language-one-territory trope (e.g., Bauman & Briggs,

2000). They continue to be important for the construction of national identities in Europe

and elsewhere, and are in evidence in news writing (e.g., Blommaert & Verschueren, 1998).

Language-related discourse in the West Central Balkans in the last twenty years has been

dominated by a debate over the ownership of the common language and a concomitant

contestation of ethnonational identities, which are still widely regarded in this area as resting

on linguistic distinctiveness. Although recent research has examined language debates

(e.g., Blommaert, 1999) and the links between language ideology and national identity in

plurilingual and multicultural societies (e.g., Canada, Vessey, 2013a; Spain/Catalonia,

Pujolar, 2007), little attention has been paid to contexts with minimal linguistic

differences between groups such as the West Central Balkans, particularly from a

quantitative or mixed methods perspective. This study is an attempt to close that gap.

1.2 Sociolinguistic History and the Role of Language Ideologies in Contemporary

Balkans

A nation has nothing holier nor dearer than its natural language, for it is only
through language that a nation, as a particular society, continues or vanishes.

Ljudevit Gaj (1835)

1.2.1 The symbolic importance of language. Language is one of the key

defining characteristics of Homo sapiens as a species1 and as such it has always been of

paramount importance to humans: as a means of communication, as an identity marker,

as a cultural tool. Its importance as an ideological tool in the struggle for hegemonic

power in late modernity, however, has been growing further still (Bourdieu, 1991;

Fairclough, 2001; Foucault, 1972; Habermas, 1989; Skutnabb-Kangas, 2000). Although

the societies of the western and central Balkans, now collectively known as the former

Yugoslavia, do not always fit neatly in the category of late modernity, the significance of

language in both the traditional and postmodern senses is perhaps nowhere as great.

Indeed, as Robert Greenberg (2004) notes in the conclusion to his book Language and

identity in the Balkans: Serbo-Croatian and its disintegration, “in the former Yugoslavia

the power of language has at times reached absurd proportions” (p. 159). In the former

Yugoslavia, one might add, echoing Ljudevit Gaj’s words from the epigraph above,

language ideology produced shibboleths that at times meant the difference between life

and death.

Although such language (and language-ideological) conflict is not unique but

represents a possible “sociolinguistic universal” (Ford, 2001), the language situation in

the former Yugoslavia is complex and can only be understood with reference to the

historical and sociopolitical trajectories of the region. Current academic treatments of

language in late modernity, especially vis-à-vis globalization, almost invariably discuss

colonialism as the backdrop for sociolinguistic developments globally (e.g., Wright,

2004). Ironically, despite the oft-repeated references to the region’s history, discussions

of language in the former Yugoslavia, academic and otherwise, often fail to appreciate the

importance of the region’s own colonial history, which, it is sometimes forgotten, is

longer than most and, unlike most, is the product of both Western and Eastern

imperialisms. Beyond the direct physical, political, legal, cultural, linguistic, etc.,

impacts of the dueling colonialist enterprises, and at least as important, is the impact of

the seventeenth- and eighteenth-century Western language ideologies, which were

received “enthusiastically” in Eastern Europe (Edwards, 1985; for an illustrative

example, see Gal, 2001), and as I hope to show, especially so in the Balkans (cf. Irvine &

Gal, 2000, especially pp. 60-71). Here, two strands of thought are particularly important:

the Lockean rationalization of language in the empiricist tradition of the seventeenth-

century English Enlightenment, and even more so, the Herderian idealization of folk

language in the spirit of the eighteenth-century German Romanticism (Bauman & Briggs,

2000). Finally, it is especially important to recognize the characteristic

instrumentalization of language in ethnonationalist projects that these ideologies made

possible. Anachronistic yet coinciding with “the rise of small nations” (Wright, 2004) at

the end of the twentieth century and the beginning of the twenty-first, the Balkan

ethnonationalisms revived the Herderian ideal of a one-to-one equation between nation,

language, and territory and employed it in their projects of “contrastive self-

identification” (Fishman, 1972, p. 58), which have themselves been characterized by an

obsession with (purported) linguistic authenticity. The outcome has been what Greenberg

(2004), following Heinz Kloss, called the “nominal language death” of Serbo-Croatian,

the erstwhile common yet polycentric standard (Kordić, 2010), as well as a multiplicity

of mutually contested language ideologies (cf. Gal, 1998), waving the successor

languages as “flags” (Friedman, 1999). It is these resulting language ideologies, deeply

involved in the ethnonationalist projects through their deployment in nation-building via

public discourses, that I propose to examine in this dissertation.

As already noted, for a proper understanding it is necessary to anchor any analysis

of language ideology in its historical and sociopolitical contexts (cf. Irvine & Gal, 2000).

I will therefore first provide a brief discussion of the language history, language situation,

and language politics of the region, before moving on to offer a rationale and delimitation

for this study. The following account draws on Greenberg (2004), a rare comprehensive

and fairly neutral treatment of the topic,2 and to a lesser extent on Katičić (1997),

Carmichael (2000), and Dronjic (2011), as well as my own emic perspective.

1.2.2 A brief sociolinguistic history of the West Central Balkans. The Slavic

languages of the former Yugoslavia (with the addition of Bulgarian) form the southern

part of the Slavic language group. Slovenian at the northwest and Macedonian at the

southeast ends of the area are separate languages of the Abstand type (Kloss, 1967),

whereas the larger central part of the area (Central South Slavic), spanning Croatia,

Bosnia-Herzegovina, Serbia, and Montenegro, features a continuum of a small number of

mutually comprehensible dialects, with one particular dialect (the Neo-Štokavian)

spanning all four countries and all four ethnic groups (i.e., Bosniaks, Croats,

Montenegrins, and Serbs). Considering the long and varied colonial history of the

region3 and the concomitant differences in culture and religion,4 it is remarkable that a

common dialect would develop and survive over the centuries. This became especially

important in the nineteenth century as the peoples of this region sought independence

and, following the trend in the rest of Europe, the formation of their own nation-states.

Concurrently with the drive for independence, however, a pan-Slavic linguapolitical

movement came into existence in Croatia (the Illyrians) which sought the unification of

South Slavs and their language based on a common dialect. Linguistic unification being

a more realistic goal than political unification at the time,5 a group of Serbian and

Croatian linguists and literary figures met in 1850 in Vienna, Austria to produce what

would become known as the Vienna Literary Agreement, which is widely considered to

be the inception of a common language standard for the Central South Slavic area (but

see Katičić, 1997 for a problematization of this view). The agreement was non-binding,

however, and it did not venture far beyond the status planning decision to base the

common standard on the “southern dialect” (i.e., the Neo-Štokavian) rather than on an

artificial amalgam of existing dialects (for the original text and an English translation of

the agreement, see Greenberg, 2004, pp. 168-171). Crucially, the name for this new

common literary standard was left unspecified.6

It is a truism, as Ricento (2006a), for example, notes, that language is inextricably

linked to power. The common standard that had been agreed upon in 1850 in Vienna

would therefore have to wait for the political circumstances to change to be implemented.

However, by the time such circumstances materialized at the end of World War I, the

politics of language in the region had also changed. Despite an initial defeat in the war

with Austria that began in the wake of the assassination of Archduke Franz Ferdinand of

Austria in Sarajevo, Serbia eventually recovered and joined her Triple Entente allies in

victory over Germany and Austria-Hungary. Owing to the war effort and continued

Russian support, Serbia was then granted its own regional sphere of influence, which

resulted in the formation of the Kingdom of Serbs, Croats, and Slovenes in 1918, the first

joint South Slavic state. Importantly, Serbia was the only South Slavic nation that

entered the union as a military power and an independent state,7 while ethnic groups

other than Serbs, Croats, and Slovenes (i.e., Bosniaks, Macedonians, and Montenegrins)

did not receive any political recognition at this time. Much like the Vienna Literary

Agreement, the new state turned out to be largely a Serbo-Croatian affair, wherein the

Croats fought to resist the sometimes real, sometimes perceived Serbian hegemony.8 The

political bickering destabilized the country, so in 1929 the reigning Serbian monarch,

King Alexander, seized the opportunity to change the constitution and with it,

symbolically, the country’s name into Yugoslavia.9

The period between 1850 and 1929, on the other hand, saw more status and

corpus planning work (see Greenberg, 2004, p. 54, for an overview of landmark events).

But, despite a shared dialect, the region harbored a number of quite disparate linguistic

traditions: there were two different alphabets in use (Latin and Cyrillic) and three

alternative pronunciations (Ekavian, Ikavian, and Ijekavian), as well as differences in

lexis and linguistic culture (e.g., in attitudes toward popular speech as the basis for a

standard). In addition to this, the standardization in Serbia had proceeded along divergent

lines since its independence in 1878. In 1913 Jovan Skerlić, a Serbian linguist, thus

attempted to resolve the major differences by proposing a compromise whereby the Serbs

would give up Cyrillic for the Latin alphabet while the Croats would switch from the

Ijekavian to Ekavian (i.e., Serbian) pronunciation, but this was never seriously

entertained. Furthermore, concurrently with King Alexander’s drive for a tighter union

and with government support, another prominent Serbian linguist, Aleksandar Belić,

published in 1930 an orthographic manual for Serbo-Croatian as an attempt to implement

Skerlić’s proposal with regard to pronunciation choice in Serbian favor. Needless to say,

this was opposed and even resented by most Ijekavian speakers (Bosniaks, Bosnian and

Croatian Serbs, Croats, and Montenegrins), but only the Croats had any political clout to

resist the Serbs. This was a prelude to a period of rapidly worsening inter-ethnic

relations, particularly between the Serbs and Croats, which culminated in the Yugoslav

capitulation after a brief war with Nazi Germany and the formation of a Croatian Nazi

puppet state (NDH) in 1941. Eager to dissociate Croatian from Serbian because of the

(perceived) implications of a close association for Croatian national identity, the ultra-

nationalist NDH government moved immediately upon its establishment to declare

Croatian as a separate language and embarked upon an aggressive program of re-

standardization, switching from the common phonetic to an etymological writing system

and introducing numerous archaisms and neologisms alike, among other innovations.

Subsequently, it also embarked upon an equally aggressive campaign of ethnic cleansing

which culminated in the genocide against the Croatian Serbs (and Roma).

At the same time, the Yugoslav Communist Party guerilla force, which was

composed of members from all ethnic groups, was gaining strength; headed by Josip

Broz Tito, the Partisans first founded the second Yugoslav state in 1943 and then defeated

both the German and Italian occupiers and their mostly Croatian and Serbian

ultranationalist collaborators (Ustashas and Chetniks). The subsequent language policy

of the Communist government is widely considered to have been committed to equality

among the constituent peoples (i.e., ethnic groups) and tolerance of minorities, although it

was not immune to certain problematic compromises. In accordance with its ideology of

“brotherhood and unity”, the second Yugoslavia eventually reinstated the common Serbo-

Croatian standard as the official language, while also recognizing Slovene and

Macedonian, as well as a number of minority languages such as Albanian; Bosniaks and

Montenegrins were left out again, however.10 In order to resolve some of the old issues

and chart a new course, a new meeting of linguists was called in 1954 in Novi Sad,

Serbia. Again, as in 1850, the meeting in Novi Sad included only Serb and Croat

linguists. The compromise agreement they reached, known as the Novi Sad Agreement,

consisted of ten conclusions concerning status and corpus planning (for the original text

and an English translation of the agreement, see Greenberg, 2004, pp. 172-174). The

agreement established a bi-centric new standard: now officially named Serbo-Croatian or

Croato-Serbian, it would have two equal “variants”, an Eastern/Ekavian and a

Western/Ijekavian one (i.e., Serbian and Croatian), with equal use of the two alphabets

(Latin and Cyrillic) throughout. Also agreed was joint codification work on new

orthographic manuals, grammars, and dictionaries, as well as terminology development.

Some Croatian linguists now argue that Serbo-Croatian never really existed (e.g.,

Barić et al., 1999; Katičić, 1997), whereas Serbian linguists generally reject this thesis.

Greenberg (2004), for his part, points to the fact that linguistic unification, at least the

original one in 1850, was not forced upon the Serbs and Croats by anyone, and this is

certainly true (even though this view ignores the fact of imposition of the linguistic union

on Bosniaks and Montenegrins). However, it is important to note that the historical and

political circumstances in the region, the relations between the different ethnic groups,

again especially Serbs and Croats, as well as the significance of the language issue, had

all drastically changed by 1954. This time, the Serb and Croat linguists produced what

now seems a mere tactical agreement, likely expecting that it would not last. And, of

course, it didn’t. Some twelve years later, in 1966, first an unauthorized dictionary of

Serbian with clear nationalist and anti-Communist overtones was published

(Moskovljević, 1966), then, a year later, the Croats responded by issuing the “Declaration

on the Name and Position of the Croatian Literary Language” which was a direct

challenge to the Serbo-Croatian common standard; the joint codification work that was

underway would soon stop. Despite a swift intervention by the federal authorities, who

rightly saw this turn of events as a danger to ethnic relations in Yugoslavia, the writing

was on the wall: the project of unification was over and it was only a matter of time

before Serbian and Croatian parted ways once more.

Greenberg (2004, p. 32) cogently notes that this language conflict was

symptomatic of a more general restructuring of the federal state, which was moving

toward decentralization; de jure devolution of a number of important powers from the

federal to the republic (i.e., state) level was finally enshrined in the 1974 rewriting of the

constitution. In terms of language policy, this is particularly significant because the new

constitution also devolved language policy to the republic level, effectively opening the

door to the introduction of a polycentric standard. This was, of course, seized upon by

Croatia, but, equally importantly, also by the authorities in Bosnia-Herzegovina and

Montenegro, who had had no voice in any of the previous decisions.11 Of course, official

federal policy, which in the Communist-run Yugoslavia had the force of a dogma, made it

impossible to change the name of the language from Serbo-Croatian or Croato-Serbian

into anything else, but new standard varieties were nevertheless introduced under the

euphemistic label “standard linguistic idiom”. Mindful of the ethnic heterogeneity of

Bosnia-Herzegovina, the republic’s authorities opted for a pan-ethnic standard which

included elements from the idioms of all three major ethnic groups (Bosniaks, Croats,

and Serbs) but was anchored in the idiom of the Bosniaks as the largest group, some

elements of which in their turn had come to be shared by Bosnian Serbs and Bosnian

Croats (e.g., frequent use of Turkish loans in everyday speech); both alphabets remained

in use and retained an equal status.12 However, the Serbian intellectual elite largely

interpreted this development as a threat to both the integrity of Yugoslavia and the ethnic

identity of their co-ethnics outside of Serbia, who made up a sizeable minority in Croatia

and roughly a third of the population of Bosnia-Herzegovina. This tension would simmer

more or less quietly until the resurgence of open nationalism after the Yugoslav

Communist Party had relinquished power and called free elections in 1990.

The “contrastive self-identification” between the different ethnic groups

mentioned above was on full display during the election campaigns of 1990, while

languages as the cornerstones of ethnonational identities were, indeed, waved as “flags”.

Needless to say, such a development did not bode well for the federation and the issue of

the political future of the country became “the question of all questions” during the

campaigns and particularly after the elections. As the largest ethnic group spread over

several republics and one that was effectively in control of the federal state as well as

overrepresented in the oversized federal military, Serbs stood to lose the most from a

dissolution of Yugoslavia; everyone else stood to gain their independence. These

positions proved to be irreconcilable in the ensuing round of negotiations on the future of

the federation by the newly elected presidents of the republics. The resulting stalemate

was broken by declarations of independence by Slovenia, Croatia, and Macedonia in

1991, followed by Bosnia-Herzegovina in 1992; Montenegro remained in a federation

with Serbia until 2006 when it too regained independence. With the exception of Bosnia-

Herzegovina where this issue was more complicated on account of its ethnic

composition,13 each newly independent country declared its majority language official

and the unified Serbo-Croatian standard thus formally ceased to exist (see Sudetic, 1993

for a contemporary report). Having firm control over the powerful Yugoslav military, the

Serbs rejected these declarations of independence, turning political into military conflicts

with war flaring up first in Slovenia, then in Croatia, and finally in Bosnia-Herzegovina.14

After a series of wars, including the longest and particularly vicious Bosnian War which

culminated in a twin aggression against Bosnia-Herzegovina by Serbia and Croatia and

genocide against the Bosniaks by the Serbs, the former Yugoslavia metamorphosed into

seven independent states, each with its own language policy, replacing Serbo-Croatian

with Bosnian, Croatian, and Serbian as official languages in Bosnia-Herzegovina,

Croatian in Croatia, Montenegrin in Montenegro, and Serbian in Serbia (see Bugarski,

2004).

1.2.3 Language, identity and ethnonationalism in the contemporary Balkans. It is

clear from the discussion above that despite a roughly 150-year-long history of

unification attempts, Central South Slavic has always been and remains a polycentric

language (cf. Kordić, 2010). At the same time, this polycentricity and the right to codify

and particularly name varieties has been fiercely contested since the nineteenth century

by the Serbian and Croatian intellectual elites, which have, with varying degrees of

success, continually and intensely ideologized the “common” language in order to

manipulate ethnonational identities on the basis of purportedly objective scientific

theories of language and nationalism. Hence, it has been of little consequence that,

A potentially classical example to disprove the existence of objective criteria of
nationhood is a comparison between the Serbs and Croats, on the one hand, and
the Flemish and the Dutch, on the other. In the Serbian-Croat case, existing
linguistic differences (underscored by a different orthography) have become
highly symbolic for the discontinuity, whereas in the Flemish-Dutch case (where
the linguistic differences are of almost exactly the same type and degree)
language is the main symbol of cultural unity. On all other accounts, the
differences are completely analogous as well – history (Ottoman rule for Serbia
versus Spanish rule for Flanders, resulting in long periods of political separation
from Croatia and Holland, respectively) and religion (Orthodox versus Catholic in
the one case, Catholic versus Protestant/Calvinist in the other) (Blommaert &
Verschueren, 1998, p. 199).

As noted above, similar to the rest of Eastern Europe the peoples of the Balkans have

embraced the essentialist Western language ideology which views language as the

embodiment of the character of the “natural” group that produced it, i.e. the nation (see

Bauman & Briggs, 2000). Consequently, language has been a primary site for the

construction of and struggle over ethnonational identity and the concomitant group rights

as “ideologies that appear to be about language, when carefully reread, are revealed to be

coded stories about political, religious, or scientific conflicts” (Gal, 1998, p. 323). This is

evident in public and scholarly discourses on language both in Europe and in the Balkans.

In a study of the role of language in European nationalist ideologies, Blommaert and

Verschueren (1998), for example, note that “the absence of the feature ‘distinct language’

tends to cast doubts on the legitimacy of claims to nationhood” (p. 192). Furthermore, as

Irvine and Gal (2000) argue, in “the political contestation surrounding contrasting

scholarly claims [i]n Macedonia, linguistic relationships came to be used as authorization

for political and military action that changed sociolinguistic practices, thereby bringing

into existence patterns of language use that more closely matched the ideology of

Western Europe” (p. 60). But it would be naïve, of course, to think that Western ideology

has been merely passively adopted. As the recent localizations of the contemporary

Western discourse on terrorism in the Serbian and Croatian press (Erjavec, 2009) show,

Western ideologies are also appropriated and function on the basis of “fractal recursivity”

(Irvine & Gal, 2000, p. 38) to serve specific local purposes. As Irvine and Gal (2000)

further contend, “[t]he continuing intensity of contestation” over language and

ethnonational identity is thus “hardly surprising, given the consequences envisaged and

authorized by the reigning language ideology and occasionally enacted under its auspices.

It is an ideology in which claims of linguistic affiliation are crucial and exclusivist

because they are also claims to territory and sovereignty” (p. 72, my italics). Seen in this

light, the continuing Serbian (and in the case of Bosnian, also Croatian) refusal to

recognize other groups’ ownership rights, including the right to name their language (see

Tošović, n.d., for a rationalization of this contestation), becomes rather transparent.

Although, as we have seen, language (ideological) debates have continued throughout the

period of the “joint” language development, the period between 1990 and now is

especially significant because it has seen the end of Serbian (and Croatian) lingua-

cultural hegemony and an unraveling of the “common” standard, particularly in Bosnia-

Herzegovina and Montenegro. These processes have been reflected in and partly

constituted through discourses produced for and directed at the public “as a language-

based form of political legitimation” (Gal & Woolard, 2001, p. 4). What is still missing

in this particular case, however, is empirical attestation of these dominant ideologies in

order to determine their specific contents and modi operandi. But, before we turn to the

methodological approach to the identification of language ideologies in this dissertation,

let us take a detailed look at the concepts of ideology and discourse, which are often

contested themselves.

1.3 Study Outline

The remainder of this study is organized as follows. Chapter 2 presents a

literature review, divided into sections about theoretical approaches to ideology and

language ideology, and empirical approaches to language ideology. In Chapter 3 a

detailed overview of the study is given, including the research questions, research design,

construct definitions, and gaps. Chapters 4 and 5 discuss data and methods employed,

while Chapters 6 and 7 present the results and discussion (by research question). The

conclusion, limitations, and suggestions for future research are given in Chapter 8.

Appendices A-G detail relevant preliminary analyses and show full lists of keywords and

collocations.

2. Literature Review

2.1 Theoretical Approaches to Ideology

2.1.1 Historical development. As is well known, the concept of ideology

originates from the late eighteenth century. The term was coined by the French

philosopher Destutt de Tracy who, in typical Enlightenment fashion, conceived of

ideology as a positivistic science of ideas based in physiological sensations (a branch of

zoölogy), optimistically hoping to arrive at a full understanding of the human mind and

achieve a rational organization of society (Eagleton, 1991; Silverstein, 1998; Woolard,

1998; see also Aarsleff, 1982). As both Silverstein and Eagleton note, however, similar to

many other terms ending in “-ology” the meaning of ideology quickly shifted from

“field-of-scientific-study” to “object-of-scientific-study” (Silverstein, 1998, p. 139, Note

1) and thus from “scientific study of human ideas” to “systems of ideas themselves”

(Eagleton, 1991, p. 63). Complicating matters further, over time the term has developed

“a whole range of useful meanings, not all of which are compatible with each other”

(Eagleton, 1991, p. 1; see also Blommaert, 2005).

2.1.2 Theoretical approaches. Different authors emphasize different aspects of

this semantic and conceptual quagmire. The literature on ideology is vast, spanning

many different fields and research traditions, so what follows is perforce a selective but,

for present purposes hopefully, functional treatment (for surveys, see, for example,

Eagleton, 1991; Thompson, 1984; and van Dijk, 1998). Woolard and Schieffelin (1994)

point to two basic divisions in the study of ideology, which are also applicable to the

study of language ideology. The first concerns the truth value of ideology and

differentiates between neutral and critical uses of the term, whereby neutral views of

ideology are ideational and all-encompassing (i.e., “all cultural systems of

representation”), whereas critical views of ideology are instrumental and particularistic

(i.e., “aspects of representation and social cognition with particular social origins”). This

is paralleled by Blommaert’s similarly theoretical distinction between ideology as “a

generalizing phenomenon characterizing the totality of a particular social or political

system, and operated by every member or actor in that system” and ideology as “a

specific set of symbolic representations […] serving a specific purpose, and operated by

specific groups or actors” (2005, p. 158, italics in the original). Recognizing this basic

division between “the wider and narrower senses of ideology”, Eagleton (1991, p. 3), on

the other hand, points also to interpretations of ideology as “illusion, distortion and

mystification” (i.e., concerned with the truth value of ideology) and those instead

“concerned with the function of ideas within social life” (i.e., the metapragmatics of

ideology). Most authors seem to agree that there is a further general distinction to be

made between the competing views of ideology: that between views of ideology as a

cognitive, ideational phenomenon (i.e., “group-schemata”, Blommaert, 2005, p. 162)

existing below the level of discursive consciousness, and those which see ideology as a

materialist phenomenon, a set of signifying practices arising from lived relations

(Althusser, 1971).

This latter point is particularly important for the second division in the study of

ideology as noted by Woolard and Schieffelin, which is epistemological and concerns

what they call “the siting of ideology”. In their own words, “[a]lthough ideology in

general is often taken as explicitly discursive, influential theorists have seen it as

behavioral, pre-reflective, or structural, that is, an organization of signifying practices

[which need not be linguistic] not in consciousness but in lived relations” (Woolard &

Schieffelin, 1994, p. 58). Put differently, depending on one’s viewpoint, traces of

ideology can be located either in explicit manifestations of discursive consciousness (i.e.,

metapragmatic commentary, cf. Silverstein, 1993), or they must be inferred from the

totality of lived relations in a particular community (e.g., linguistic usage or other

signifying practices). It will be noted that this distinction has methodological

implications for the study of ideology as well as language ideology which are most

pertinent for this dissertation; I shall therefore return to this issue toward the end of this

chapter.

2.1.3 Definitions. As a consequence of the multiplicity of contributions from

different fields, research traditions, and political positions noted above, definitions of

ideology abound. I rely here on the account provided by Terry Eagleton in his widely

cited book Ideology: An introduction (1991). Following Raymond Geuss, Eagleton

makes a distinction between “descriptive”, “pejorative” (cf. neutral vs. critical above) and

“positive” definitions of ideology. He goes on to formulate “descriptive” definitions of

ideology as those that view ideology as “belief-systems characteristic of certain social

groups or classes, composed of both discursive and non-discursive elements”;

“pejorative” definitions of ideology as those that view ideology as “a set of values,

meanings and beliefs which is to be viewed critically or negatively” because it

legitimates an unjust social order; and “positive” definitions of ideology as those that

view ideology as “a set of beliefs which coheres and inspires a specific group or class in

the pursuit of political interests judged to be desirable” (Eagleton, 1991, pp. 43-44).

Woolard (1998) cites the anthropologist Clifford Geertz and sociologist Karl

Mannheim as two of the most prominent advocates of a “descriptive” understanding of

ideology as the totality of social knowledge and a medium of meaning for social

purposes. But, as she also notes, the chief criticism of such conceptions of ideology is

that they neglect power relations, which is precisely the focus of the “pejorative”

definitions. Perhaps the most widely cited, but now also most widely rejected pejorative

definition (see Eagleton, 1991, especially pp. 10-26), is of ideology as “false

consciousness”, originally put forward by Karl Marx and Friedrich Engels in their book

The German Ideology. This understanding of ideology implies an accommodation and

acquiescence on the part of the proletariat to the bourgeois hegemony (Gramsci, 1971)

such that members of the working class are unable to identify their true class interests due

to their adherence to the bourgeois worldview. Similar to this is Thompson’s (1984, p. 4)

definition of ideology as “essentially linked to the process of sustaining asymmetrical

relations of power – to maintaining domination […] by disguising, legitimating, or

distorting those relations.” Eagleton finally cites Lenin’s approval of the term “socialist

ideology” as well as its acceptance by other “radical” theorists such as Sorel and

Althusser as an example of a “positive” definition of ideology. Interestingly, while the

“pejorative” and “positive” conceptions of ideology derive from the Marxist tradition and

the work on the political left, the “neutral” conceptions of ideology are ascribed to the

work deriving from the neo-Kantian tradition in philosophy and Durkheimian tradition

in sociology (Blommaert, 2006a). Importantly for the present purposes, however, one

should note that what is understood as “an unjust social order” and “political interests

judged to be desirable” will, of course, depend on one’s ideological position and thus

presents a problem of its own, especially in an environment of pathological

ethnonationalist contestation of identity such as that of the contemporary Balkans.

2.1.4 Conceptualizations. Theoretical accounts of ideology, finally, often speak

of conceptual “strands” or “themes” (Woolard, 1998) and “layers” (Blommaert, 2005)

organized in “a progressive sharpening of focus” (Eagleton, 1991). Woolard (1998) thus

identifies four such strands which can be summed up as follows: (1) ideology as

ideational or conceptual, mental rather than social phenomena; (2) ideology as reflective

of particular social experiences or interests, dependent on the material aspects of human

life; (3) ideology as signifying practices in the struggle for power; and (4) ideology as

distortion which can, but need not, derive from an interest in the legitimation of power.

Somewhat similarly, Eagleton (1991) conceptualizes ideology in a series of six steps,

proceeding from the very general and neutral definition of ideology as “the whole

complex of signifying practices and symbolic processes in a particular society” (p. 28) to

the specific and critical definitions of ideology as “ideas and beliefs which help to

legitimate the interests of a ruling group or class specifically by distortion” (p. 30), and

similarly false beliefs arising from the material structure of society rather than the

interests of a dominant class (e.g., commodity fetishism). Blommaert (2005), on the

other hand, points to the synchronization of different layers of ideology, i.e., “different

ideologies […] operat[ing] at different levels of historicity” (p. 175) but at the same

historical moment, as the likely cause of the “terminological muddle” (p. 161).

According to Blommaert, then, the multifacetedness of ideology is primarily a

consequence of the concurrent and, more often than not, opaque operation of multiple

ideologies of different orders and varying historical trajectories. But perhaps the most

important, if also the most general, conceptualization of ideology for the purposes of this

dissertation is that of ideology as the fundamental element in the triumvirate it forms with

discourse and power (e.g., Blommaert, 2005; Eagleton, 1991), which points to the

importance of the linguistic dimension of ideology as well as the ideological dimension

of language, and to which we now turn.

2.2 Theoretical Approaches to Language Ideology

2.2.1 Historical development. Although, as Silverstein (e.g., 2000) himself has

noted, the leading early twentieth century American anthropologists and linguists such as

Franz Boas, Leonard Bloomfield, and Edward Sapir did consider the issue of language

ideology only to dismiss it as inconsequential (Benjamin Whorf, according to Silverstein,

is the exception here), the emergence of the study of language ideology is now commonly

dated back to his own work, and his seminal 1979 paper “Language structure and

linguistic ideology” in particular (see Blommaert, 2006b; Kroskrity, 2004; Woolard,

1998). Kroskrity (2004) outlines briefly but usefully the history of twentieth-century

structuralist neglect of speaker awareness and the non-referential functions of language in

both anthropology and linguistics, pointing to the work in the fields of ethnography of

communication and interactional sociolinguistics by prominent figures such as Dell

Hymes and John Gumperz as an important precedent for the later work in language

ideology. On account of this neglect (which de Beaugrande, 1999, contends is due to

“scientism”, itself an ideology) and in contrast to ideology, then, language ideology as a

subfield of academic inquiry emerged only in the last two decades of the twentieth

century and is “still under construction” (Blommaert, 2006a). Furthermore, it does not,

as Woolard and Schieffelin (1994) noted, yet have a single core literature, although one is

beginning to coalesce primarily around linguistic anthropological attention to the topic

(see essays in Gal & Woolard, 2001; Schieffelin, Woolard & Kroskrity, 1998, 2000;

Wilce, 2000). There has also been a parallel and related research program in the

framework of critical discourse analysis, which has examined the role of language in

contemporary capitalist societies from a critical perspective (e.g., Fairclough, 2001;

Blackledge, 2005; Blommaert, 2005; Wodak, 2012; van Dijk, 1998), as well as a number

of contributions from applied linguistics generally and language policy and planning in

particular (e.g., Blackledge & Pavlenko, 2002; Blommaert, 1999; Jaffe, 1999; Lippi-

Green, 2007; McGroarty, 2008, 2010; Ricento, 2000).

2.2.2 Theoretical approaches. Research into language ideology has been as

multifarious as research into ideology itself, similarly spanning many different fields and

research traditions. Woolard and Schieffelin (1994) and Woolard (1998) provide wide-

ranging overviews of some of the most important work in the contributing fields and

research traditions. These include ethnography of speaking, multilingualism, literacy

studies, historiography of linguistics, as well as linguistic anthropology, language contact,

colonial studies, sociology, and, perhaps most importantly for present purposes, language

policy, language politics, and studies of identity and nationalism. An important, albeit

sometimes fuzzy, dividing line here has been between research foregrounding the

dialectic between language ideology and social life, referred to above as the

metapragmatics of (language) ideology, and research foregrounding the dialectic between

language ideology and the linguistic system itself (e.g., Dirven, Hawkins & Sandikcioglu,

2001; Seargeant, 2009; Silverstein, 1979). Note that, on account of my focus on public

discourses and the role of language ideology in the (re)construction and maintenance of

contemporary ethnolinguistic identities, I am primarily concerned here with the former

and will not be devoting much attention to the latter, even if the two are certainly

complementary and could be fruitfully combined (for an example, see Irvine & Gal,

2000).

2.2.3 Definitions and conceptualizations. Again, similar to ideology, definitions

of language ideology abound, reflecting different conceptualizations and emphases.

Silverstein (1979, p. 193) originally defined language ideologies as “sets of beliefs about

language articulated by users as a rationalization or justification of perceived language

structure and use.” Most other definitions are less concerned with language structure,

however, while beliefs about language, more often than not, are tacit and can only be

inferred by analyzing actual language-related practices and decisions (McGroarty, 2008).

Definitions of language ideology also differ in scope. Rumsey (1990) thus offers a very

general definition of language ideologies as “shared bodies of commonsense notions

about the nature of language in the world” (p. 346). Somewhat more narrowly, Seargeant

understands language ideology as “entrenched beliefs about the nature, function, and

symbolic value of language” (2009, p. 346). Spolsky (2004), on the other hand, focuses

on the pragmatic aspects of language ideologies as belief systems which determine

language attitudes, judgments, and behavior. Curiously, then, many definitions of

language ideology seem to be in the “neutral” camp (see the conceptualizations of

ideology above), seemingly eschewing the issue of power relations and its implications

for the understanding of language ideology. Contrary to this trend, and more in

line with my own view, Irvine (1989) sees language ideology as “the cultural system of

ideas about social and linguistic relationships, together with their loading of moral and

political interests” (p. 255), while Errington (2001) makes this even more explicit by

referring to language ideologies as “situated, partial, and interested […] conceptions and

uses of language” (p. 110). Interestingly, although these last two definitions are not quite

neutral in the above sense, exhibiting as they do a critical awareness of the necessary

social situatedness of any ideology, they nevertheless stop short of fully articulating this

particular aspect of the concept. Further, perhaps due to the general poststructuralist

distaste for any even remotely essentialist notions, there is no mention here of “illusion,

distortion and mystification” or “false consciousness” (but see Spitulnik, 1998, especially

p. 164).

This, however, is not to say that research into language ideology is generally

unaware or neglectful of the power aspect of it. On the contrary, it seems fair to say that

virtually all treatments of language ideology, whatever their foci, pay careful attention to

power and its implications in their analyses. A good example of this is the collection of

essays in Blommaert (1999), which anchors analysis in the power-laden concept of

(political) debate. This is further illustrated by some of the conclusions reached thus far

in the research on language ideology. Where syntheses of research on ideology identify

“strands” and “layers” (see above), syntheses of research into language ideology, more

specifically, speak of “features” or “dimensions” and “levels of organization” (Kroskrity,

2000b, 2004). Kroskrity (2004, pp. 501-509) identified five such levels of organization

of language ideology that emerged from the existing literature, three of which clearly

indicate the centrality of the concern with power relations in language-ideological

research: (1) group or individual interests (i.e., “language ideologies represent the

perception of language and discourse that is constructed in the interest of a specific social

or cultural group”); (2) multiplicity of ideologies (i.e., “language ideologies are profitably

conceived as multiple because of the plurality of meaningful social divisions”); and (5)

role of language ideology in identity construction (i.e., “language ideologies are

productively used in the creation and representation of various social and cultural

identities [such as] nationality and ethnicity”).

But even after we recognize that despite the apparent definitional shortcomings,

language-ideological research does, of course, incorporate power relations, the fact

remains that there is a tendency, at least in definitions, to treat language ideology as a

static phenomenon (a set of existing, given beliefs, notions, ideas or conceptions), a fait

accompli as it were. Thus we may know what a particular language ideology is (if, of

course, we can agree on a shared understanding of it), but we are less sure of how it

operates. However, language ideology, as Spitulnik (1998) points out, is both conceptual

and processual, and so our attention must be redirected to specific language-ideological

practices. When we adopt such a sociolinguistically dynamic view, we begin to

understand language ideologies not only as “ideas with which participants and observers

frame their understandings of linguistic varieties” but also as processes through which

they “map those understandings onto people, events, and activities that are significant to

them” (Irvine & Gal, 2000, p. 35), at which point we can finally consider their effects and

consequences.

2.3 Empirical Approaches to Language Ideology

This section provides a brief look at the origins and the development of the study

of language ideology, and the theoretical and methodological contexts for the current

investigation. This is followed by a discussion of the gaps in the existing literature and

the research questions addressed by the present study.

2.3.1 Theoretical and methodological contexts in language ideology research.

Existing research into public discourses and language ideologies is largely

interdisciplinary and can be classified broadly into one of three main thematic foci:

language ideology and language education (e.g., Hornberger & McKay, 2010); language

ideology and identity, ethnicity, and nationalism (e.g., Kroskrity, 2000a); and language

ideology and social justice (e.g., Blackledge & Pavlenko, 2002). However, it should be

noted that these often overlap as language-in-education policies, for example, can have

implications for ethnic and national identities as well as social justice (e.g., Lippi-Green,

2007). The theoretical and methodological orientations are similarly heterogeneous.

Perhaps due to the influence of (critical) discourse analysis, the theoretical approaches

have been eclectic and often unsystematic, drawing on and combining a wide range of

theoretical frameworks from various disciplines. The methodological approaches have

been predominantly qualitative (ethnography, historiography, discourse analysis, and

linguistic-philosophical and linguistic-theoretical analyses), but, pertinently for this

dissertation, an increasing number of studies are taking a mixed methods approach,

combining corpus-linguistic techniques and forms of (critical) discourse analysis (see,

e.g., Baker et al., 2008; Partington, 2010).

Despite the theoretical and methodological heterogeneity, however, even a

cursory glance at the available literature reveals a heavy reliance on readily available

textual materials, particularly newspaper articles, although this is less so in studies

focusing on language ideology and language education, which rely on surveys, and

observational (e.g., speech recordings) and experimental (e.g., matched-guise technique)

methods and data as much as on textual pedagogic materials and policy documents. The

choice of context for language-ideological research thus depends to a considerable degree

on its thematic focus. In the case of research into language ideology and language

education typical contexts include classrooms as well as educational contexts beyond the

classroom such as school districts or university departments (for an overview, see

McGroarty, 2010), whereas research into language ideology and identity, ethnicity, and

nationalism, as well as social justice, relies more on politically framed contexts (e.g.,

state or regional and national entities, with some studies taking a comparative, cross-

national approach). Further, unlike the studies of educational contexts, studies with one

of the other two foci strongly favor textual materials such as newspaper discourse and

corpora of official policy documents or historical documents created in particular

institutional contexts in a variety of genres and registers (e.g., Fitzsimmons-Doolan,

2009; Ricento, 2003). Other contexts include translation work (e.g., Kuo & Nakamura,

2005) and, most recently, computer games (Ensslin, 2010).

Regardless of the thematic focus and methodological orientation, however, most

research into public discourses and language ideologies has been limited to institutional

contexts of one kind or another (e.g., the educational system). The primary reason for this, of

course, is that language ideologies are sociocognitive phenomena and so, as noted above,

institutions (whether cultural, political, or religious) as major discourse nodes are more

likely sites to contain traces of ideologies but also to have an impact on their

reproduction. And yet the problem with an exclusive focus on institutional contexts is

that they tend to be dominated by official, top-down discourses and ideologies, often to

the point of exclusion or at least obscuration of any bottom-up, that is counter-

hegemonic, discourses and ideologies. More research is therefore needed into contexts

which are more likely to show traces of unofficial or alternative discourses and ideologies

such as, for example, (subaltern) language activism (Jaffe, 1999), and online reader

commentary (Vessey, 2013b) and debates in cyberspace (i.e., online fora and social

networks, e.g., Johnson, Milani & Upton, 2010). A note of caution is in order here,

however: although anonymized online communication has the (often dubious) advantage

of being free from many constraints of face-to-face interactions (in addition to ease of

access) and thus can offer an insight into attitudes and beliefs devoid of certain pragmatic

considerations, the representativeness of such data is nearly always an issue, and so they

should be treated with caution and matched against data from other sources whenever

possible.

2.3.2 Research questions in language ideology research. Partly on account of

the research traditions in the contributing disciplines (i.e., anthropology, critical discourse

analysis) and partly on account of the theoretical and methodological nascence of the

field, research questions in the existing studies are not always stated explicitly and can

thus be implicit or even difficult to identify. The two early foci on language attitudes and

the role of language ideologies in the construction and reproduction of (ethno)nationalist

identities remain prominent in recent studies (Fitzsimmons-Doolan, 2011; Fleischer,

2007; Vessey, 2013a; O’Rourke & Ramallo, 2013); other contributions have attended to

the specific role of public institutions in the production of language ideologies (Spitulnik,

1998), and globalization-related issues such as the manipulation of language in

ideological public discourses on economic neoliberalism, political correctness, and

religious fundamentalism (Crapanzano, 2000; de Beaugrande, 1999; Johnson & Suhr,

2003; Salama, 2011), as well as migration and diasporic communities (Baker et al., 2008;

Fraysse-Kim, 2010).

Unsurprisingly, approaches to the formulation of research questions differ widely,

but here again we can discern some major trends. Earlier studies, which relied more on

ethnographic and historiographic methods as well as various forms of discourse analysis,

tend to ask two types of questions both of which often include a methodological

component. Studies based on ethnographic and historiographic analyses thus ask macro-

level questions such as the following: What is the structure of language ideology? What

are the consequences (for politics, for research) of language ideologies? (Irvine & Gal,

2000); How have ideologies of language, nation and state been connected to each other

and the practice of sociolinguistics? (Heller, 1999). Studies based on discourse analysis,

on the other hand, tend to focus more on micro-level or localized issues, asking questions

such as: What discourse prosodies (and ideologies) do the term “liberal” and its

derivatives “liberalism” and “liberalization” have? (de Beaugrande, 1999); and, What is

the role of language ideology in the conflict of Catalan and Spanish nationalisms over the

ownership of the 1992 Olympic games in Barcelona? (DiGiacomo, 1999). However,

Blommaert and Verschueren (1998), for example, rely on discourse analysis but ask

macro-level questions (What is the specific role of language in current nationalist

ideologies? What is an adequate methodology for ideology research?), while Jaffe (1999)

relies mainly on ethnography to ask both local and more global questions (What are the

ideological underpinnings of the strategies of language planners in Corsica? How are

their language ideologies rooted in European and French political economies?).

In contrast to these, more recent studies tend to be more synchronically-oriented,

localized, and textually-based, increasingly relying on mixed methods designs which

combine various corpus-linguistic techniques such as keyword and collocation analyses

and critical discourse analysis (CDA). Here, too, there is often a methodological

component to research questions. Examples of research questions in the more recent

studies include: What are the similarities and differences between Anglo-American and

German discourses of “political correctness”? (Johnson & Suhr, 2003); What are the

common categories of representation of refugees, asylum seekers, immigrants and

migrants in the British press? Which texts are representative? (Baker et al., 2008); and,

Can the Canadian context provide a useful site for cross-linguistic corpus-assisted

discourse studies (CADS)? How can cross-linguistic CADS shed light on language

ideologies in Canadian newspapers? (Freake, 2011).

Overall, then, research into language ideology seems to be moving away from

predominantly qualitative methods and theoretical and macro-level questions toward

mixed methods and methodological and micro-level questions focusing on localized

contexts. Most pertinently for the field of language policy and planning (LPP), there is

an emerging trend of using mixed methods approaches to examine corpora of official

language policy documents focusing on questions of localized state or national interest

(e.g., Fitzsimmons-Doolan, 2011; Freake, Gentil, & Sheyholislami, 2011). Interestingly,

although the question of the siting of ideology (Woolard & Schieffelin, 1994) is of central

importance to the identification and interpretation of language ideologies, language-

ideological research has so far largely failed to produce integrated accounts which would

examine and compare data from different sites of ideological (re)production (but see

Blackledge, 2005; Jaffe, 1999; and Vessey, 2013b for steps in this direction). At the same

time, the availability of data from various social media, as well as academic sources,

offers an opportunity for more integrated and innovative perspectives to consider and

compare sites of ideological (re)production which demonstrate varying levels of

discursive consciousness.

2.3.3 Types of data used in language ideology research. The different types of

data typically relied upon in language-ideological research have been hinted at above.

Depending on the methods used, they can be qualitative or quantitative and include

survey, observational, and experimental data, as well as data in the form of pedagogic

materials, official policy documents, newspaper and other media discourse, and historical

documents. In addition to these, some data types that have received comparatively less

attention include texts produced under experimental conditions (Wallis, 1998), time

allocation for different languages in broadcast media in multilingual contexts (Spitulnik,

1998), as well as data obtained by way of discussion groups (O’Rourke & Ramallo,

2013).

Newspaper discourse is probably the most frequently used type of data in

language ideology research. The reasons for a relative overemphasis on newspapers and

other print periodicals such as magazines over other potential sources of data are several.

First, and perhaps foremost, as already noted, “newspapers are self-conscious loci of

ideology production” (DiGiacomo, 1999, p. 105) as well as “key sites for language

ideological debates between various kinds of social actors” (Ensslin & Johnson, 2006, p.

155). Second, although they have been losing ground to newer media (e.g., television,

the Internet) for a long time, newspapers remain an influential institution in most

societies around the world, offering researchers a wealth of information in an increasingly

easily accessible and manipulable format (i.e., electronic text). Third, because of the

relative similarity of news discourse organization strategies across newspapers,

and often also across languages, newspaper data allow for effective synchronic comparisons of

discourses and ideologies based on independent variables such as political affiliation and

(ethno)national identity. Fourth, the relative constancy of newspaper formats across time

allows for equally effective diachronic comparisons, which are particularly useful in

demonstrating the changing nature of discourses and ideologies over time and their

dialogic relationship with the cultural, social, and economic conditions in which they are

embedded.

2.3.4 Corpus linguistics in research on discourse and language ideology.

Although there is a growing body of research that relies on corpus linguistics to study

discourse (e.g., Baker, 2006; Baker et al., 2008; Baker, 2010; Baker, Gabrielatos &

McEnery, 2013; Mautner, 2007; Partington, 2003; Partington, 2010), corpus-based (or

corpus-assisted or corpus-driven) studies explicitly concerned with language ideology are

still rare (e.g., Fitzsimmons-Doolan, 2009, 2011, 2014; Ensslin & Johnson, 2006; Freake,

Gentil & Sheyholislami, 2011; Subtirelu, 2015; Vessey, 2013a,b). Corpus-based

discourse and language ideology studies make use of a wide variety of corpus-linguistic

tools such as wordlists, clusters, concordances, dispersion plots, collocates, and keywords

(for an excellent introductory overview, see Baker, 2006). Keyword analysis, a statistical

approach to contrastive analysis of word frequencies (for a detailed explanation, see

further below) is used to identify the lexical features characteristic of research corpora

and thus potentially interesting foci for follow-up discourse analysis. Scott (1997), for

example, uses this approach to identify lexical patterns suggesting gender bias in

contemporary English. Similarly, Subtirelu (2015) compares student evaluations of

university instructors with US and Korean names, finding a bias against the latter

rooted in a language ideology of nativism. Scott’s original approach focusing on

individual keywords has been broadened in recent years to include part-of-speech (POS)

and semantic category/field/domain analyses (Culpeper, 2009; Rayson, 2008), as well as

“aboutgrams” (Sinclair, 2006, cited in Warren, 2010, p. 118), the frequently occurring

lexical phrases which point to a text’s (or a corpus’) “aboutness”. While keywords point

to the “aboutness” of a text or corpus, key semantic fields and aboutgrams are taken to be

more directly suggestive of discourses and ideologies extant in the corpora under

investigation.
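The keyword procedure described above can be illustrated with a minimal sketch. The passage does not name the keyness statistic, so log-likelihood (G2) is assumed here as one commonly used option. The function names, the filtering rule (only words relatively more frequent in the study corpus are kept), and the toy token lists are hypothetical and serve illustration only.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 keyness for a word occurring a times in a study corpus of c tokens
    and b times in a reference corpus of d tokens (assumed statistic)."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study_tokens, reference_tokens, top_n=10):
    """Rank words that are relatively more frequent in the study corpus."""
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for word, a in study.items():
        b = ref.get(word, 0)
        if a / c > b / d:  # positive keywords only
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy data, invented for illustration only
study = "jezik je srpski jezik i jezik je identitet".split()
reference = "vlada je danas usvojila budžet i plan je usvojen".split()
top = keywords(study, reference, top_n=3)
```

In practice the resulting list would be cut off at a significance threshold and a minimum frequency before any follow-up discourse analysis; both choices are left open here.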

In addition to keyword analysis, corpus-based discourse studies often rely on

collocation analysis as well. Baker (2004, p. 347) argues that keyness and collocation

patterns can alert the researcher to “the existence of types of (embedded) discourse or

ideology.” Baker et al. (2008), for example, base their examination of the lexical patterns

around four core concepts (i.e., search terms) in their study of the discourse on

immigration in Britain on a combination of keyword analysis and collocation analysis.

Baker, Gabrielatos and McEnery (2013) take this approach a step further by using an

online corpus query system called Sketch Engine to grammatically tag their corpus and

then analyze not only lexical but also grammatical co-occurrence patterns between

collocates. Similarly, Freake, Gentil and Sheyholislami (2011) rely on collocation

analysis in addition to keyword analysis to contrast English- and French-language

discourses on the construction of nationhood in Quebec, while Vessey (2013b) adopts a

similar approach to study the language ideological debate on the use of French during the

Vancouver Olympics. Most recently, lexical patterns resulting from collocation analysis

have been used as data in quantitatively more sophisticated analyses.

Fitzsimmons-Doolan (2011, 2014) used factor analysis, a multivariate statistical technique which

groups variables on the basis of their covariance (for details, see further below), on the

collocates of three core concepts identified using a 1.4 million-word corpus of language

policy documents, finding five factors interpretable as language ideologies.
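As a rough illustration of the collocation analysis discussed in this paragraph, the sketch below counts the words that occur within a fixed window around a node word and scores them with mutual information (MI). The window size, the minimum frequency, the MI formula with a window-size correction, and the toy sentence are assumptions made for the example; the studies cited above may use different settings and statistics.

```python
import math
import re
from collections import Counter


def collocates(tokens, node, span=4, min_freq=2):
    """Return (collocate, co-occurrence count, MI) for words found within
    `span` tokens to the left or right of each occurrence of `node`."""
    freq = Counter(tokens)
    n = len(tokens)
    co = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        co.update(w for w in window if w != node)
    results = []
    for word, observed in co.items():
        if observed < min_freq:
            continue
        # Expected co-occurrences if node and collocate were independent,
        # corrected for the total window size (2 * span)
        expected = freq[node] * freq[word] * 2 * span / n
        results.append((word, observed, math.log2(observed / expected)))
    return sorted(results, key=lambda r: r[2], reverse=True)


# Toy sentence, invented for illustration only
text = ("srpski jezik i hrvatski jezik su jedan jezik "
        "ali srpski jezik ima dva pisma")
tokens = re.findall(r"\w+", text.lower())
result = collocates(tokens, "jezik", span=1, min_freq=2)
```

With a one-word window, only "srpski" co-occurs with the node at least twice in the toy sentence, so it is the single collocate returned.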

Typically, the results of quantitative, corpus-based analysis are used to identify

patterns in the data which are worth pursuing further as well as to “downsample” the

corpus data to a size manageable by a human analyst. Quantitative analysis is then

combined with appropriate micro and macro discourse-analytic techniques in a

hermeneutic circle whereby the results of quantitative analysis lead to qualitative

analysis, the results of which are then in turn checked for reliability using quantitative

methods (Baker et al., 2008, p. 295; Fairclough, 2010; Partington, 2010; van Dijk, 2006;

Wodak, 2001; for examples, see Mautner, 2007 and Vessey, 2013a).

2.4 Gaps

This study aims to address several theoretical and methodological gaps in the

literature. It is well known that corpus linguistics research, including studies of discourse

and ideology, was developed in English and has also disproportionately focused on

English and a small number of major European languages. At the same time, extensive

inflectional morphology presents a number of corpus-linguistic challenges that are largely

absent from studies of languages such as English. Therefore, the first gap has to do with

the paucity of corpus-based research into smaller languages such as Central South Slavic

which are considerably structurally different from English.

The second gap is related to the first and has to do with geopolitical focus. Much

of recent research has examined language debates (e.g., Blommaert, 1999) and the links

between language ideology and national identity in plurilingual and multicultural

societies (e.g., Canada, Vessey, 2013a; Finland, Hult & Pietikäinen, 2014; Spain/Catalonia,

Pujolar, 2007), but little attention has been paid to contexts with minimal linguistic

differences between groups, particularly in languages other than major European

languages (but see Wilce, 2010). This study will therefore contribute to the growing

body of literature on language ideology by focusing on a hitherto unexamined case

(closely related language varieties) and context (West Central Balkans).

Third, although several quantitative methods (keyword analysis, collocation

analysis, exploratory factor analysis) have been used in discourse and ideology studies,

there have as yet been no attempts to compare them. This study will therefore compare and

contrast the results obtained through the application of these methods in terms of their

usefulness and effectiveness for the study of language-related discourses and language

ideologies.

Finally, the fourth gap has to do with synchronic variation between different

discursive sites. This study will therefore compare language-related discourses and

language ideologies based on the distinction between ‘discursive’ and ‘practical’

consciousness (Kroskrity, 1998, 2004 following Giddens, 1979) by contrasting general

newspaper articles (written by journalists and/or experts) with letters-to-the-editor

(largely lay opinion).

3. Study Overview

This chapter presents the research questions, research design, construct definitions

and operationalizations, and coding procedures.

3.1 Research Questions

3.1.1 Research question 1. Can corpus linguistics-based quantitative methods

(keyword, collocation, exploratory factor, and cluster analyses) be used to identify lexical

patterns suggestive of dominant language-related discourses and language ideologies in

Central South Slavic and what similarities/differences are there between them?

3.1.2 Research question 2. What language-related discourses and language

ideologies relevant to Central South Slavic ethnolinguistic identities can be identified in

the 5+ hits section of SERBCORP?

3.1.3 Research question 3. What links can be identified between the language-

related discourses and language ideologies relevant to Central South Slavic

ethnolinguistic identities and ethnonationalism?

3.1.4 Research question 4. Is there synchronic and diachronic variation in the

identified language-related discourses and language ideologies relevant to Central South

Slavic ethnolinguistic identities?

3.2 Research Design

Table 1 presents the overall research design employed in the study. Data,

methods, and analytical procedures are listed by research question. Figure 1 shows a

diagram outlining the analytical process employed in the study.

Table 1

Research Design: CL and CDA Investigation of Language-related Newspaper Discourse

Research Question: RQ1. Can corpus linguistics-based quantitative methods (keyword, collocation, exploratory factor, and cluster analyses) be used to identify lexical patterns suggestive of dominant language-related discourses and language ideologies in Central South Slavic and what similarities/differences are there between them?

Data: SERBCORP; 5+ hits section of SERBCORP; SERBCOMP; WaC corpora

Methods/Analyses Conducted: Identification of keywords in SERBCORP and the 5+ hits section of SERBCORP; identification of significant collocates of the core concept lemma JEZIK ‘language’; exploratory factor analysis using collocates as variables and texts as observations; analysis of variance comparing mean text scores; cluster analysis using factors as predictor variables

Research Question: RQ2. What language-related discourses and language ideologies relevant to Central South Slavic ethnolinguistic identities can be identified in the 5+ hits section of SERBCORP?

Data: 5+ hits section of SERBCORP; excerpts from individual texts identified as representative by EFA

Methods/Analyses Conducted: CDA/DHA-based interpretation of language-related discourses and language ideologies (with references to the results of quantitative analyses)

Research Question: RQ3. What links can be identified between the language-related discourses and language ideologies relevant to Central South Slavic ethnolinguistic identities and ethnonationalism?

Data: Excerpts from individual texts identified as representative by EFA; excerpts from the 1986 SANU Memorandum (see footnote 15)

Methods/Analyses Conducted: CDA/DHA-based interpretation of identified language ideologies as they pertain to Central South Slavic ethnolinguistic identities (including topoi and references to the results of quantitative analyses)

Research Question: RQ4. Is there synchronic and diachronic variation in the identified language-related discourses and language ideologies relevant to Central South Slavic ethnolinguistic identities?

Data: 5+ hits section of SERBCORP

Methods/Analyses Conducted: Analysis of variance comparing mean scores of texts grouped by publication, year of publication, and type of article; cluster analysis grouping texts by publication, year of publication, and type of article

[Flowchart. Elements, from top to bottom: corpus compilation (relevant publications); sampling phase I, yielding the research corpus (articles with 1+ hits for jezi*, SERBCORP) and the reference corpora (articles without hits for jezi*, SERBCOMP; WaC); corpus comparisons and sampling phase II (articles with 5+ hits for jezi*, the 5+ hits section of SERBCORP); keyword analysis and collocation analysis; exploratory factor analysis; analysis of variance and CDA/DHA; cluster analysis.]

Figure 1. Diagram of the analytical process

3.3 Construct Definitions and Operationalizations

This section presents the definitions and operationalizations of the key constructs

investigated in the present study, as well as the coding procedures used.

3.3.1 Core concepts. The core concepts (lemma JEZIK ‘language’ for the purposes

of corpus compilation and initial collocation analysis, but also lemmas BOSANSKI JEZIK

‘Bosnian language’, CRNOGORSKI JEZIK ‘Montenegrin language’, HRVATSKI JEZIK

‘Croatian language’, SRPSKI JEZIK ‘Serbian language’, and

SRPSKOHRVATSKI/HRVATSKOSRPSKI JEZIK ‘Serbo-Croatian/Croato-Serbian language’ for

the purposes of follow-up analysis) are simply those lexical items and phrases whose

patterning in the corpus is most likely to lead to the identification of dominant language-

related discourses and language ideologies relevant to the maintenance and (re-)

construction of Central South Slavic ethnolinguistic identities and thus ethnonationalisms

in the West Central Balkans.

3.3.2 Keywords. Here I use Scott’s (1997, p. 236) definition of a key word “as a

word which occurs with unusual frequency in a given text.” More importantly, keywords

are understood here also as “pointers to complex lexical objects which represent the

shared beliefs and values of a culture” (Stubbs, 2010, p. 23).

3.3.3 Relevant collocates. The relevant collocates are all lexical and some

function words (e.g., possessive pronouns) that are shown to be statistically significant

collocates of the core concepts and which, upon further analysis (i.e., concordancing,

factor analysis), are determined to pattern in ways suggestive of language-related

discourses and language ideologies. Although such collocates are determined by setting a

search span around a core concept (e.g., L5-R5), relevant collocates are defined here also

more broadly as “textual collocates” (Mason & Platt, 2006, cited in Stubbs, 2010, p. 27)

such that all their textual occurrences are counted in each text (for purposes of EFA) and

not only those that appear within a certain span of the node word.

3.3.4 Dominant language-related discourses. Discourse is a multifaceted term

with a range of divergent meanings in social sciences and the humanities. The traditional

definition of discourse in linguistics is simply “language above the sentence or above the

clause” (Stubbs, 1983, p. 1) or “language in use” (Brown & Yule, 1983). Michel

Foucault, however, added a socio-cognitive aspect to the term, defining it as “practices

which systematically form the objects of which they speak” (1972, p. 49). It is this

difference that prompted James Gee (2010, p. 34) to make a distinction between

discourses with a small ‘d’ (“language-in-use”) and discourses with a big ‘D’ (language-

in-use plus “socially accepted associations among ways of using language”) which is

sometimes used to differentiate between the two main understandings of discourse.

Foucault’s definition was elaborated by Norman Fairclough who conceptualizes

discourse in terms of three interrelated dimensions: text, discoursal practice (text

production, distribution and consumption), and social practice (e.g., 2010, p. 59). In

addition, as can be seen from Gee’s definition, it is possible to conceive of discourse, not

as a more or less coherent product of social and semiotic “practices” and thus a singular

noun as in Foucault’s case, but as a phenomenon with a multitude of manifestations and

therefore a plural noun. Although this plural understanding of discourse has gained

currency in both social sciences and the humanities, more often than not it is implicit and

left undefined. Most pertinently for my purposes here, Baker (2006) draws on several

sources to expand upon Foucault’s original definition and add a plural dimension to

it, so I reproduce his account at length here:

[D]iscourse is a “system of statements which constructs an object” (Parker, 1992,
p. 5) or language-in-action (Blommaert, 2005, p. 2). It is further categorized by
Burr (1995, p. 48) as a “set of meanings, metaphors, representations, images,
stories, statements and so on that in some way together produce a particular
version of events… Surrounding any one object, event, person, etc., there may be
a variety of different discourses, each with a different story to tell about the world,
a different way of representing it to the world.” Because of Foucault’s notion of
practices, discourse therefore becomes a countable noun: discourses (Cameron,
2001, p. 15). So around any given object or concept there are likely to be
multiple ways of constructing it […] (Baker, 2006, p. 4).

Taking this definition as a starting point, discourses are understood here as more

or less coherent systems of statements which construct an object of which they speak

(e.g., language) or an aspect thereof from a particular social or cultural position with the

goal of upholding the interests associated with this position. A discourse may be said to

be ideological to the extent that it reproduces or challenges unequal relations of power

between social subjects (Fairclough, 2010). Depending on their social effects, discourses

can thus be either dominant/hegemonic (reproducing domination) or subaltern/counter-

hegemonic (challenging domination). Language-related discourses are discourses thus

defined which pertain to language rather than other aspects of social reality. They are

operationalized as a) individual factors resulting from factor analysis, and sets of factors

clustering together (small ‘d’ discourses), and b) broader sets of statements about

linguistic and social relationships that extend across factor, cluster, and textual

boundaries and have identifiable ideological functions (big ‘D’ discourses).

3.3.5 Dominant language ideologies. Similar to discourse, language ideology is a

contested concept with a range of divergent meanings in the humanities and social

sciences (e.g., Eagleton, 1991). Following Irvine (1989, p. 255), language ideologies are

understood here as “the cultural system[s] of ideas about social and linguistic

relationships, together with their loading of moral and political interests.” More

specifically, “language ideologies represent the perception of language and discourse that

is constructed in the [political-economic] interest of a specific social or cultural group”

(Kroskrity, 2000b, p. 8, my emphasis). Here too, the use of the plural reflects a concern

with different social positions from which language ideologies emanate as well as the

different aspects of the object of ideologization. As with discourses, dominant language

ideologies are understood to be hegemonic, i.e. espoused from positions of power for the

purpose of maintenance of that power, whether naturalized or contested (cf. Kroskrity,

1998). Language ideologies are operationalized as implicit or explicit beliefs about

language and its relationship to society underlying dominant language-related discourses

identified here.

3.3.6 Ethnonationalism. Similar to discourse and ideology, nationalism is a

problematic concept and a complex social phenomenon that defies simple definitions. To

make things even more difficult, definitions and understanding of nationalism depend on

definitions and understanding of related concepts such as nation, nation-state, and

ethnicity, which are similarly problematic (for comprehensive discussions of these

concepts in relation to language, see, e.g., Barbour, 2000; Fishman, 1997; Fought, 2006;

May, 2001; Safran, 1999). However, theorists often distinguish between ‘civic’ or ‘state’

and ‘ethnic’ nationalisms, i.e. nationalisms defined by affiliation with nations and nation-

states which are not necessarily culturally homogeneous (cf. Staats- or Willensnation; for

a discussion, see Wodak, de Cillia, Reisigl & Liebhart, 1999) and nationalisms defined by

affiliation with (or aspirations toward) nation-states based on ethnies or pre-existing

cultural groups that have a common myth of origin and share history, basic cultural

practices, and often language and religion (cf. Kulturnation, ibid.). Historically, ethnic

nationalism has clearly been the more important of the two in the Balkans (Carmichael,

2000), examples of temporarily successful civic or state nationalism such as the

(Communist) Yugoslav nationalism of the second Yugoslavia (1945-1991)

notwithstanding. Ethnonationalism is thus here understood simply as an intellectual

movement to define and pursue the political interests of a nation (whether self-declared

or recognized by others as such) and/or its state defined in ethnic terms.

3.4 Coding Procedures

Each text in the 5+ hits section of SERBCORP was coded for: a) publication

(Blic, NIN, Politika, Vreme); b) year of publication (2003, 2004, 2005, 2006, 2008); and

c) type of article (general newspaper articles, letters-to-the-editor).

4. Data

SERBCORP, the specialized research corpus compiled to represent general

language-related discourse in the mainstream Serbian press, consists of articles containing

one or more instances of any of the lemma forms of the word jezik ‘language’ from four

leading national newspapers, two dailies (Blic, a tabloid, and Politika, a broadsheet) and

two weeklies published in Serbia (NIN, Vreme) in the period between 2003 and 2008.16

The publications were chosen based on three criteria: type of publication (broadsheets vs.

tabloids, dailies vs. weeklies), circulation figures,17 and relative standing in the Serbian

and regional publics (for details about the Serbian media market, see Đoković, Hrvatin &

Petković, 2004).18 Because full data sets were only available for the period between 2003

and 2008 (with the exception of Politika for the year 2007),19 the data set is limited to the

years 2003-2006 and 2008. Similarly, SERBCOMP, the reference (or, rather,

comparator) corpus used here, comprises articles from Politika, Blic, NIN, and Vreme, as

well as Večernje Novosti, published in the period between 2003 and 2014.

The two corpora were compiled by downloading the relevant articles from the

Serbian online media archive Ebart (www.arhiv.rs)20 as follows. After the target

publications had been identified, publication-specific searches were run for articles

containing any of the inflectional forms of the core concept lemma JEZIK ‘language’21

(and, perforce, the lemma JEZIČKI ‘linguistic’) by using the search term “jezi*” and the

given timeframe. Using a custom Python application, relevant articles thus identified

were then automatically downloaded, formatted22 and saved in separate folders according

to corpus, publication, and year and month of publication (e.g., SERBCORP>Politika>

2006>July). The application also automatically named the files according to publication

(e.g., POL for Politika), date of publication (e.g., POL-22-7-2006 for July 22, 2006), and

download rank for their given month (e.g., POL-22-7-2006-55 for the 55th article

downloaded from Politika for July 2006) or publication (in the case of publications with

lower numbers of articles, e.g. BLI-30-3-2004-544) to give them unique identifiers.

Similarly, using the search term “NOT jezi*”, SERBCOMP was compiled from randomly

chosen articles not containing any forms of either one of the two core concept lemmas

(JEZIK ‘language’ and JEZIČKI ‘linguistic’). Once compiled, both corpora were checked

for errors and duplicates. Finally, a frequency-based wordlist was used to identify and

exclude from SERBCORP a small number of articles which formally met the search

criteria (“jezi*”) but were nevertheless irrelevant (e.g., those containing words such as

jezičak ‘little tongue’, ježičak ‘little hedgehog’ and jezivo ‘horrible’, or last names such as

Ježić but not forms of the lemmas JEZIK or JEZIČKI).
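
The file-naming and folder conventions described above can be illustrated with a minimal Python sketch. This is not the application used in the study: the function names are hypothetical, only the two abbreviations attested above (POL and BLI) are included, and English month names for folders are assumed on the basis of the example path.

```python
from datetime import date
from pathlib import PurePosixPath

# Only the abbreviations mentioned in the text; codes for the other
# publications are not given there, so none are assumed here.
ABBREVIATIONS = {"Politika": "POL", "Blic": "BLI"}

# Spelled out rather than taken from the locale, so folder names are stable.
MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def article_id(publication: str, published: date, rank: int) -> str:
    """Build an identifier of the form CODE-day-month-year-rank."""
    code = ABBREVIATIONS[publication]
    return f"{code}-{published.day}-{published.month}-{published.year}-{rank}"


def article_folder(corpus: str, publication: str, published: date) -> PurePosixPath:
    """Build a folder path of the form corpus/publication/year/month."""
    month_name = MONTHS[published.month - 1]
    return PurePosixPath(corpus, publication, str(published.year), month_name)


if __name__ == "__main__":
    day = date(2006, 7, 22)
    print(article_id("Politika", day, 55))
    print(article_folder("SERBCORP", "Politika", day))
```

Unpadded day and month numbers follow the examples given above (e.g., 22-7-2006); whether the original application padded single-digit values elsewhere is not stated.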

SERBCORP comprises a total of 11,656,247 words from 16,148 articles, with the

largest share of both (49.88% of words and 61.48% of articles) coming from the daily

Politika, the oldest and arguably most influential daily in Serbia (Tables 2 and 3).

Expectedly, dailies (Blic, Politika) contribute shorter articles as compared to weeklies

(NIN, Vreme), while standardized type-to-token ratios (Scott, 2014a) are similar across

the board (Table 4).
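
For readers unfamiliar with the measure, a standardized type-to-token ratio can be sketched as follows. This is a minimal illustration rather than the WST implementation: it assumes the common convention of averaging the type/token ratio over successive 1,000-token chunks and discarding a trailing partial chunk, and the function name is hypothetical.

```python
def sttr(tokens: list[str], chunk_size: int = 1000) -> float:
    """Mean type/token ratio (as a percentage) over successive full chunks."""
    ratios = []
    # Step through the text chunk by chunk; a trailing partial chunk is ignored.
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        chunk = tokens[start:start + chunk_size]
        ratios.append(100 * len(set(chunk)) / chunk_size)
    if not ratios:
        raise ValueError("text is shorter than one chunk")
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    sample = ["a", "b", "a", "c", "d", "d", "e"]
    print(round(sttr(sample, chunk_size=3), 2))
```

Because each chunk has the same length, the measure is less sensitive to text length than a raw type-to-token ratio, which is why it permits comparison across publications with very different article lengths.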

Table 2

Composition of SERBCORP (by Publication)

Publication No. of words % of words No. of articles % of articles


Blic 2,000,579 17.16 3,437 21.28
NIN 2,286,320 19.61 1,761 10.91
Politika 5,813,618 49.88 9,928 61.48
Vreme 1,555,730 13.35 1,022 6.33
Total 11,656,247 100.00 16,148 100.00

Table 3

Number of Articles in SERBCORP (by Year and Publication)

Year/Publication Blic NIN Politika Vreme Total by year

2003 698 364 2,000 183 3,245
2004 670 376 1,848 224 3,118
2005 553 350 1,902 194 2,999
2006 674 332 2,151 176 3,333
2008 842 339 2,027 245 3,453
Total by publication 3,437 1,761 9,928 1,022 16,148

SERBCOMP comprises a total of 22,493,804 words from 37,227 articles from all five

publications from the period between 2003 and 2014, as mentioned above (Table 5).

Table 4

Article Means, SD, and STTR in SERBCORP (by Publication)

Publication Mean length in words SD STTR


Blic 572.63 563.55 55.42
NIN 1,283.81 916.95 56.90
Politika 576.58 335.53 56.69
Vreme 1,505.12 1,384.88 56.55

Table 5

Composition of SERBCOMP (by Publication)

Publication No. of words % of words No. of articles % of articles


Blic 3,015,901 13.41 11,703 31.44
NIN 8,460,519 37.61 9,907 26.61
Politika 1,506,347 6.70 3,688 9.91
Večernje Novosti 910,588 4.05 2,889 7.76
Vreme 8,600,449 38.23 9,040 24.28
Total 22,493,804 100.00 37,227 100.00

Preliminary keyword and collocation analyses performed on SERBCORP (for

results and discussion, see Appendices A-C) suggested a section of SERBCORP

comprising articles with 5 or more hits for the lemma JEZIK ‘language’ as the optimal

research corpus for present purposes (see Table 6 for a breakdown of articles by hit count).

Table 6

Articles in SERBCORP (by Hit Count for the Lemma JEZIK and Percentage)

Hits No. of files per hit count %


1 10,616 65.75
2-4 4,275 26.47
5-9 843 5.22
10+ 414 2.56

The 5+ hits section of SERBCORP thus comprises a total of 1,118,454 words from 1,257

articles, with a majority of both from Politika (52.62% of words and 67.38% of articles,

see Tables 7 and 8). Similar to SERBCORP, the dailies contributed larger numbers of

shorter articles while standardized type-to-token ratios remain similar (Table 9).

Interestingly, the total number of 5+ hits articles decreased during this period in a linear

fashion (Figure 2), suggesting a gradual shift of focus away from an explicit thematization of

language, arguably owing to changing sociopolitical circumstances (see Section 6.6). All

subsequent analyses were performed on the 5+ hits section of SERBCORP.
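
The hit-count criterion can be sketched as follows, assuming that hits are counted with a simple wildcard match on the raw article text. The pattern and function names are illustrative assumptions, not the procedure actually used; note that a raw match on jezi* also captures irrelevant forms such as jezivo, which were removed in a separate step.

```python
import re

# Case-insensitive equivalent of the search term "jezi*" (an assumption about
# how hits were counted; the study does not give the matching rules).
HIT_PATTERN = re.compile(r"\bjezi\w*", re.IGNORECASE)


def count_hits(text: str) -> int:
    """Count word forms beginning with 'jezi' in one article."""
    return len(HIT_PATTERN.findall(text))


def hit_bin(text: str) -> str:
    """Assign an article to one of the hit-count bands used in Table 6."""
    hits = count_hits(text)
    if hits == 0:
        return "0"
    if hits == 1:
        return "1"
    if hits <= 4:
        return "2-4"
    if hits <= 9:
        return "5-9"
    return "10+"


def in_five_plus_section(text: str) -> bool:
    """True if the article would qualify for the 5+ hits section."""
    return count_hits(text) >= 5


if __name__ == "__main__":
    print(hit_bin("Srpski jezik i jezika ima mnogo."))
```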

Table 7

Composition of the 5+ Hits Section of SERBCORP (by Publication)

Publication No. of words % of words No. of articles % of articles


Blic 148,492 13.28 164 13.05
NIN 280,281 25.06 184 14.64
Politika 588,605 52.62 847 67.38
Vreme 101,076 9.04 62 4.93
Total 1,118,454 100.00 1,257 100.00

Table 8

Number of Articles in the 5+ Hits Section of SERBCORP (by Year and Publication)

Year/Publication Blic NIN Politika Vreme Total by year


2003 49 42 196 10 297
2004 39 48 171 15 273
2005 20 36 170 13 239
2006 18 31 172 14 235
2008 38 27 138 10 213
Total by publication 164 184 847 62 1,257

Table 9

Article Means, SD, and STTR in the 5+ Hits Section of SERBCORP (by Publication)

Publication Mean length in words SD STTR


Blic 893.79 949.33 55.07
NIN 1,508.79 989.70 56.48
Politika 685.78 376.97 55.09
Vreme 1,613.16 1,164.43 55.89

[Bar chart. Vertical axis: No. of articles; horizontal axis: Year of publication. Values: 2003, 297; 2004, 273; 2005, 239; 2006, 235; 2008, 213.]

Figure 2. Distribution of 5+ hit articles (by year, all publications)

5. Methods

This study takes a mixed methods, lexical approach to the identification of

language-related discourses and language ideologies, combining corpus linguistics (CL)

and critical discourse analysis (CDA) in a manner similar to that originally proposed by

Baker et al. (2008). The initial, largely quantitative phase relies on five distinctly

different methodological approaches in the process of identification of pertinent lexis and

lexical patterns: keyword analysis, collocation analysis, exploratory factor analysis,

analysis of variance, and cluster analysis. All quantitative analyses were conducted with

the help of WST and the Statistical Package for the Social Sciences 21.0 (SPSS; IBM,

2012), as well as several custom Python and PERL applications. The follow-up, largely

qualitative phase in turn relies on analytical techniques developed within the discourse-

historical approach (DHA) to CDA (Reisigl & Wodak, 2009; Wodak, 2001). It should be

noted, however, that the research design is not purely sequential (quantitative-to-

qualitative) but rather hermeneutic (i.e., moving between quantitative and qualitative

techniques as necessary, cf. Baker et al., 2008; Reisigl & Wodak, 2009) as results of both

quantitative and qualitative analytical procedures are examined from both perspectives

and thus further focused and refined. What follows is a discussion of the theoretical

background and a step-by-step explanation of the relevant parameters and procedures

used in the analysis.

5.1 Keyword Analysis

5.1.1 Theoretical background. Corpus-based discourse and ideology research

has mostly relied on keyword analysis (in addition to basic corpus-linguistic techniques

such as frequency, concordance, and collocation analysis, see below). Keyword analysis

has thus been used in a wide variety of discourse studies (see, for example, the essays in

Bondi & Scott, 2010) to identify what characterizes a certain text or corpus, as well as to

look for differences between parallel texts or corpora. The goal of keyword analysis

(Scott, 1997) is the identification of words “which occur with unusual frequency in a

given text [or corpus]” (p. 236), i.e. lexical features characteristic of research corpora and

thus potentially interesting as foci for follow-up discourse analysis. It requires a

reference corpus in addition to a research corpus and can be carried out automatically

using WST. An appropriate reference corpus should be composed of texts in the same

language as the research corpus, and is typically expected to be larger than the research

corpus, although what its optimal size may be is as yet unclear (Scott, 2009, 2010).

The reference corpora of choice have often been large general corpora (i.e., those

comprising different registers) such as the BNC. However, in the absence of such

reference corpora (e.g., for languages other than English) and depending on research

questions, comparator corpora (corpora of similar size and register as the research

corpus) have been used. Examples include corpora compiled from texts on the same

topic reflecting different political or other orientations or corpora compiled from the same

types of texts excluding those with the same focus as the research corpus (see, for

example, Baker, 2006; Subtirelu, 2015; Vessey, 2013b). Once wordlists for both the

research and reference/comparator corpus have been compiled, keyword analysis uses

either the chi-square or log-likelihood statistic to cross-tabulate each word’s observed

frequency and the number of running words in the research corpus with its observed

frequency and the number of running words in the reference corpus (Scott, 2014a). This

procedure determines which words appear statistically more (or less) frequently in the

research corpus as compared to the reference corpus.
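
The cross-tabulation just described can be illustrated with a minimal Python sketch of the log-likelihood statistic. It uses the two-term form of the formula that is common in corpus linguistics; WST's own implementation may differ in detail, and the function name and the sign convention (negative values for words that are relatively less frequent in the research corpus) are assumptions made for illustration.

```python
import math


def log_likelihood(freq_research: int, size_research: int,
                   freq_reference: int, size_reference: int) -> float:
    """Signed log-likelihood keyness for one word in two corpora."""
    total_size = size_research + size_reference
    combined_freq = freq_research + freq_reference
    # Expected frequencies if the word were equally likely in both corpora.
    expected_research = size_research * combined_freq / total_size
    expected_reference = size_reference * combined_freq / total_size
    ll = 0.0
    for observed, expected in ((freq_research, expected_research),
                               (freq_reference, expected_reference)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    ll *= 2
    # Positive keyness: relatively more frequent in the research corpus.
    more_frequent = freq_research / size_research >= freq_reference / size_reference
    return ll if more_frequent else -ll


if __name__ == "__main__":
    # Hypothetical word frequencies; the corpus sizes are those reported above.
    print(log_likelihood(500, 1_118_454, 2_000, 22_493_804))
```

A word whose relative frequency is identical in the two corpora receives a score of zero; the larger the absolute value, the stronger the evidence that the frequencies differ.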

The result is a statistical measure of a word’s salience in the research corpus

reflected in a keyness score which is based on the statistic chosen (i.e., chi-square or log-

likelihood). A list of keywords (KWs) calculated for a corpus thus suggests the

“aboutness” of that corpus, i.e. what a corpus is about. KWs can be positive (when they

are significantly more frequent in the research as compared to the reference corpus) or

negative (when they are significantly less frequent in the research corpus). Whereas

positive KWs suggest what a corpus is about, negative KWs can be used as an indicator

of what may be missing from it. Finally, the resulting KWs can be grouped into semantic

fields intuitively by the researcher in order to identify any patterns for further analysis

(cf. Baker, 2004; Ensslin & Johnson, 2006).

Keyword analysis has been the object of widespread criticism on several grounds

and particularly for its dependence on the size and type of reference corpus chosen, as

well as the choice of statistic (i.e., questionable reliability). Some researchers argue that

larger reference corpora are generally better (e.g., Scott, 2010), while others have

suggested that the optimal size for a (specialized) reference corpus may be five times the

size of the research corpus (Berber Sardinha, 1999, 2004). In contrast, Xiao & McEnery

(2005, p. 70) contend that “the size of the reference corpus is not very important in

making a keyword list,” particularly when dealing with sufficiently large corpora (cf.

Scott & Tribble, 2006, p. 64). Similarly, despite the wide reliance on large general

corpora, Culpeper (2009, p. 35) argues that it is better to use a reference corpus that is as

close as possible to the research corpus since this approach to keyword analysis avoids a

focus on irrelevant stylistic differences between registers and is more likely to produce a

keyword list which “reflect[s] something specific to the target [i.e., research] corpus.” At

the same time, as Rayson (2008, p. 527) notes, because of the independence assumptions

built into the procedure there should be no overlap between the research and reference

corpora. Finally, although Scott (2014a) suggests that the chi-square test “gives a better

estimate of keyness” in longer texts or entire corpora than the log-likelihood, Culpeper

(2009, p. 36) found that the two tests produce “only minor and occasional differences in

the ranking of words.”

5.1.2 Analytical parameters and procedures. Having decided on the optimal

reference corpus (see Appendix B for details of the procedure), several separate KW runs

were performed (retaining the parameters detailed above). First, to get a discursive

profile of the research corpus, the 5+ hits section of SERBCORP was compared to

SERBCOMP (the top 50 positive and all negative KWs are shown in Tables 12 and 13;

the full list of positive KWs is presented in Appendix D). Second, to get a discursive

profile of the 5+ hits section of SERBCORP vis-à-vis the 1-4 hits section of SERBCORP

and further test the validity of the sampling criterion (see discussion in Appendix A), the

5+ hits section of SERBCORP was compared to the 1-4 hits section of SERBCORP (the

top 50 positive and all negative KWs are shown in Tables 14 and 15; the full list of

positive KWs is presented in Appendix E). Third, KWs in the 5+ hits section of

SERBCORP were organized into semantic domains on the basis of their prevalent

meanings in this section of the corpus, as attested by concordance lines (Table 14).

Fourth, using WST a KW database was compiled to calculate key-KWs (KKWs, KWs

that are key in several texts) and KKW associates (KWs appearing in the same texts as

KKWs). The parameters used were as follows: p = .0000000001; minimum KKW

frequency = 2; minimum number of texts for database = 3; statistic for the calculation of

associates = MI3 (≥ 3); minimum number of associate texts for database = 3; and

minimum number of KWs per text for database = 3. Fifth, KKWs potentially related to

ethnolinguistic identities (e.g., glottonyms and ethnonyms) were identified and their

associates examined (Table 15). Sixth, and last, the KKW equivalents of the highest-

loading items from all 12 factors resulting from EFA (see Section 6.3) were examined for

associates in order to compare the results of the keyword analysis associates procedure and

EFA (Table 16).

5.2 Collocation Analysis

5.2.1 Theoretical background. In contrast to keyword analysis, collocation

analysis examines the co-occurrence patterns between words and does not require a

reference corpus. The strength of association between two words is measured by various

statistical techniques such as the t-test, and z- and mutual information (MI) scores

(McEnery, Xiao & Tono, 2006). MI score, the preferred technique in analyses focusing

on relatively infrequent items, is calculated by comparing “the probability of observing

the two words together with the probability of observing each word independently, based

on the frequencies of the words” (Biber, Conrad & Reppen, 1998, p. 266). A score of 0

means that there is no association between the words, while a score higher than 0

suggests positive association; scores lower than 0 suggest negative association. An MI

score of 3 or higher is considered to indicate a significant association (Hunston, 2002, p.

71). Unlike keyword analysis, which represents a more general lexical (and discursive)

characterization of a corpus, collocation analysis provides an indication of how individual

words are used in a corpus. Such patterns can be suggestive of particular discourses and

underlying ideologies as “[n]o words are neutral [and] [c]hoice of words represents an

ideological position” (Stubbs, 1996, p. 107).
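
The MI calculation described above can be sketched in a few lines of Python. This is the basic textbook form, in which the observed co-occurrence frequency is compared with the frequency expected if the two words were independent; concordancing software may additionally adjust for the size of the collocation span, so the values need not match those produced by WST. The function name is hypothetical.

```python
import math


def mutual_information(freq_pair: int, freq_node: int,
                       freq_collocate: int, corpus_size: int) -> float:
    """MI = log2(observed co-occurrences / co-occurrences expected by chance)."""
    if min(freq_pair, freq_node, freq_collocate, corpus_size) <= 0:
        raise ValueError("all frequencies must be positive")
    expected = freq_node * freq_collocate / corpus_size
    return math.log2(freq_pair / expected)


if __name__ == "__main__":
    # Hypothetical counts: the pair occurs eight times more often than chance.
    print(mutual_information(64, 100, 80, 1000))
```

With these hypothetical counts the expected co-occurrence frequency is 8, so the score is log2(64/8) = 3, which is the conventional threshold mentioned above.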

Further, in line with the recent shift in focus in corpus linguistics and applied

linguistics research generally to phraseology (see, for example, Biber, Conrad & Cortes,

2004; Gray & Biber, 2013; Chen & Baker, 2010), corpus-based research into discourses

and ideologies has examined n-grams (also known as lexical bundles or clusters, i.e.

recurring word combinations with n number of constituents, e.g. jezik i književnost

‘language and literature’; see Cheng & Lam, 2013 for a discourse analysis application).

N-gram analysis, as it will be referred to here, is useful as recurrent word combinations

can be more informative in semantic (and discursive) terms than individual collocates

considered in isolation.

5.2.2 Analytical parameters and procedures. Collocation analysis was run

separately on SERBCORP and the 5+ hits section of SERBCORP. It was conducted with

the help of the ‘concordance’ tool in WST, using the span of five words to the left of the

node word (lemma JEZIK ‘language’) and five words to the right (L5-R5), and cutoff

points for item frequency (≥ 20), number of texts (≥ 20), and strength of association (MI

≥ 5). Although these cutoff points are somewhat arbitrary, they ensured that the analysis

produced a manageable number of significant collocates that are sufficiently well

distributed throughout the corpus (cf. Biber, 1993). In the next step, the collocate lists

thus produced were scanned for the presence of irrelevant items such as function words, a

small number of which was then deleted from both lists.23 The results of collocation

analyses are presented in parallel lists ordered by frequency and MI score. The full lists

of the lemma collocates of JEZIK in SERBCORP (again, by frequency and MI score) are

shown in Appendix F (Tables F1 and F2). The top fifty lemma collocates of JEZIK in the

5+ hits section of SERBCORP (by frequency and MI score) are shown in Tables 17 and

18. The full list of collocates for the 5+ hits section of the corpus is given in Appendix G

(Tables G). (Note that these are the collocates that were used in EFA.) Finally, n-gram

analysis was conducted using the ‘clusters’ function in the ‘concordance’ tool in WST

(not to be confused with cluster analysis discussed in Section 6.5). The parameters used

were 2-6-constituent n-grams (to cover a wide range of frequently occurring phrasal

patterns), with a minimum item frequency of five in the span of five words to both left

and right of the node word (L5-R5); analysis was conducted separately for each of the

forms of the node lemma JEZIK ‘language’. A sample of the most frequent n-grams in the

5+ hits section of SERBCORP is shown in Table 19 (Section 6.2.2).
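
The L5-R5 span used in the collocation analysis can be illustrated with a minimal Python sketch. It assumes a pre-tokenized text and a set of node word forms, both hypothetical inputs; it does not reproduce WST's treatment of sentence boundaries or overlapping spans, and a word falling within the spans of two node occurrences is counted twice.

```python
from collections import Counter


def span_collocates(tokens: list[str], node_forms: set[str],
                    left: int = 5, right: int = 5) -> Counter:
    """Count words occurring within the given span of any node form."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token in node_forms:
            # Up to `left` words before the node and `right` words after it.
            window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
            counts.update(w for w in window if w not in node_forms)
    return counts


if __name__ == "__main__":
    text = "nas srpski jezik i pismo".split()
    print(span_collocates(text, {"jezik"}, left=1, right=1))
```

Frequency and distribution cutoffs of the kind reported above would then be applied to the resulting counts before any association score is calculated.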

5.3 Exploratory Factor Analysis

5.3.1 Theoretical background. Exploratory factor analysis is a multivariate

statistical technique which groups variables into sets (called factors) based on their

covariance (for a detailed explanation of EFA, see Tabachnick & Fidell, 2007). It is

particularly useful for explorations of large data sets with numerous variables because it

can suggest patterns of variation and thus constructs underlying multiple variables, which

makes interpretation of complex patterns of variation possible. The application of EFA in

linguistics was pioneered and popularized by Douglas Biber (e.g., 1988, 2006), whose

methodology for the analysis of language use based on function-related patterning

between large numbers of grammatical and other variables (called multidimensional

analysis, MD) has had a significant impact on the study of grammar as well as

composition pedagogy, second language acquisition, and other related areas. Although

EFA-based multidimensional analysis is used in an increasing number of subfields of

applied linguistics (see, for example, the papers in Cortes & Csomay, 2015), it has not,

with one exception, been used in studies of discourse and ideology. Fitzsimmons Doolan

(2011, 2014), however, adapted MD by focusing on the co-occurrence patterns among

lexical rather than grammatical features. In her study of language ideologies in the

educational sphere in Arizona, she compiled and analyzed a corpus of official language

policy documents to identify the collocates of the core concepts language, literacy and

English. In the next step, she counted the frequencies of these collocates in all of the

texts in her corpus and then subjected those counts to EFA. This resulted in five factors

(i.e., groups of collocates that systematically co-occur throughout the corpus)

interpretable as different language ideologies on account of their indexical links to

language-related beliefs and attitudes existing in the social realm. Similar to EFA, cluster

analysis is a multivariate statistical technique which can be used to group objects or cases

such as individual texts within a data set. It has been recommended as a follow-up

procedure to EFA because of its ability to identify hitherto unidentified patterns in data

(Biber & Staples, in press). In this study, it is used to explore the differences and

similarities between texts based on their variation on three independent variables

(publication, year of publication, and article vs. letter-to-the-editor).

5.3.2 Analytical parameters and procedures. The application of EFA here is

based on Biber (1988) and Fitzsimmons Doolan (2011, 2014). However, instead of

limiting the collocates used in the analysis to those occurring in the premodifier position

(i.e., L1) as in Fitzsimmons Doolan (2011, 2014), all 305 collocates of the core concept

JEZIK ‘language’ in the 5+ hits section of SERBCORP were included, regardless of their

syntactic function or position vis-à-vis the node (for collocation analysis parameters and

procedures, see Section 5.2.2 above; for a full list of collocates, see Appendix G). Also, it

is important to reiterate at this point that collocates identified through collocation analysis

(i.e., micro-collocates) have a broader definition in their use in EFA as they are

considered and counted even when they occur outside of the ‘horizon’ of five word-slots

to the left or right of the node word (cf. macro- or textual collocates, Mason & Platt,

2006, cited in Stubbs, 2010, p. 27). Put simply, all textual appearances of the relevant

collocates are counted in each text to be included in the analysis and not only those that

appear within a certain span of the node word as is normally done in collocation analysis.

After the list of relevant collocates had been identified, a custom Perl program

was used to count the frequency of each collocate (a variable) in each text (an

observation) and to normalize the counts to a text length of 1,000 words. This

normalization enabled comparisons of frequency counts across texts of different lengths.

Normalized frequency counts were then entered into SPSS to check assumptions and

factorability, following the procedure outlined in Tabachnick and Fidell (2007).
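The counting and normalization step can be illustrated with a minimal Python sketch. It is not the Perl program used in the study: the function name `normalized_counts`, the simple word-form tokenization (no lemmatization), and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def normalized_counts(text, collocates, per=1000):
    """Return the frequency of each collocate per `per` running words of `text`.

    Assumption: tokens are lower-cased word forms; no lemmatization is applied.
    """
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return {c: 0.0 for c in collocates}
    counts = Counter(tokens)
    # Every occurrence in the text is counted, not only those near the node word.
    return {c: counts[c] * per / len(tokens) for c in collocates}


# Hypothetical ten-word "text" used only to show the arithmetic.
sample = "srpski jezik je jezik naroda a narod čuva svoj jezik"
rates = normalized_counts(sample, ["srpski", "narod", "pismo"])
print(rates)
```

Each text yields one row of such rates, so a corpus of texts becomes a texts-by-collocates matrix that a statistics package can read.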

The data were first checked for multivariate outliers by examining each text’s

score on the Mahalanobis variable. With α = .001 and df = 306 (the number of variables),

the critical value of χ² was 388.178; 314 texts had values in excess of the critical value

and were therefore excluded from further analysis.24 The remaining steps in the

procedure were thus performed on the resulting smaller data set (n = 943). The deletion

of multivariate outliers also resulted in the removal of all occurrences of the variable

jednom ‘once’, so it too was removed leaving the number of collocate variables at k =

305. Next, assumptions for factor analysis were checked. Multicollinearity and

singularity were assessed first by examining tolerance values, condition indexes, and

variance proportions. Singularity was found for

two items, Monte and negro,25 so negro was excluded from further consideration, leaving

the number of variables at k = 304. Normality and linearity were not examined because

the results are used descriptively.
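The outlier screening described at the start of this procedure compares each text's squared Mahalanobis distance with a χ² critical value. The sketch below illustrates that comparison only. It assumes the distances have already been computed (for example, by a statistics package), and it approximates the critical value with the Wilson-Hilferty formula rather than an exact χ² quantile; the function names and the example distances are hypothetical.

```python
from statistics import NormalDist


def chi2_critical_approx(df, alpha):
    """Approximate the upper-tail chi-square critical value (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1 - alpha)
    a = 2.0 / (9.0 * df)
    return df * (1 - a + z * a ** 0.5) ** 3


def flag_outliers(d2_by_text, df, alpha=0.001):
    """Return ids of texts whose squared Mahalanobis distance exceeds the cutoff."""
    cutoff = chi2_critical_approx(df, alpha)
    return [text for text, d2 in d2_by_text.items() if d2 > cutoff]


# With df = 306 and alpha = .001 the approximation lands close to the
# reported critical value of 388.178.
cutoff = chi2_critical_approx(306, 0.001)

# Hypothetical squared distances for three texts.
d2 = {"text_a": 120.5, "text_b": 402.3, "text_c": 350.0}
outliers = flag_outliers(d2, df=306)
print(round(cutoff, 1), outliers)
```

An exact quantile function, where available, is preferable for borderline cases, because the approximation differs slightly from the tabled value.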

Once it was determined that the data set met the assumptions, factor analysis was

run using principal axis factoring (n = 943, k = 304). To assess the

factorability of the data set, the correlation matrix (several bivariate correlations were ≥

.30), KMO value (middling at .648), and Bartlett’s test of sphericity (significant, χ²

[46665] = 100910.536, p < .001) were examined. Based on these results, the data set was

considered to be factorable.

The number of factors was determined by examining a) the scree plot (which

seemed to flatten out between Factors 12 and 14), and b) the number of factors with

initial eigenvalues over 1.0 (108) and 2.0 (30), neither of which was considered

parsimonious; the number of factors with initial eigenvalues over 3.0 was twelve. The

range of solutions between 12 and 14 factors was next explored using the Varimax

rotation.26 Fewer than five variables loaded highly on the thirteenth and fourteenth

factors of the thirteen- and fourteen-factor solutions, so those two solutions were

discarded as over-factoring. Although all twelve factors of the twelve-factor solution

were represented by at least 6 salient loadings (≥ |.30|), a large number of variables had

communalities lower than .2, while the solution accounted for only 17.46% of the total

variance in the data. To determine the optimal factor solution, a series of rotations was

performed, removing variables with communalities < .2 and re-examining factorability at

each step.
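The two screening criteria used in these rotations, communalities below .2 and salient loadings of at least |.30|, can be read directly off a rotated loading matrix. The following sketch assumes an orthogonal rotation (such as Varimax), so that a variable's communality is the sum of its squared loadings. The loading values and function names are invented for illustration, and in the actual procedure the factor analysis is re-run after each round of removals, which this sketch does not do.

```python
def communalities(loadings):
    """Communality of each variable: the sum of its squared loadings across factors."""
    return {var: sum(l * l for l in row) for var, row in loadings.items()}


def low_communality(loadings, threshold=0.2):
    """Variables that would be candidates for removal at this step."""
    return sorted(v for v, h2 in communalities(loadings).items() if h2 < threshold)


def salient(loadings, cutoff=0.30):
    """Map each factor index to the variables with |loading| >= cutoff on it."""
    n_factors = len(next(iter(loadings.values())))
    return {
        f: sorted(v for v, row in loadings.items() if abs(row[f]) >= cutoff)
        for f in range(n_factors)
    }


# Hypothetical two-factor loading matrix (variable -> loadings per factor).
loadings = {
    "pismo": [0.62, 0.05],
    "narod": [0.55, 0.10],
    "nastava": [0.08, 0.48],
    "dan": [0.12, 0.09],
}

to_remove = low_communality(loadings)
by_factor = salient(loadings)
print(to_remove, by_factor)
```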

The preferred solution was the twelve-factor solution with k = 107 collocate

variables, which a) accounted for the most total variance in the data (34.13%), b)

produced factors consisting of positively loading variables only, c) did not include any

item communalities < .2, and d) had the highest KMO value (.801). This factor solution

was further assessed for internal consistency, which was measured by examining the

Cronbach’s alpha of all items loading highly on each factor. The internal consistency

analysis produced the following results: Factor 1 (α = .864), Factor 2 (α = .768), Factor 3

(α = .734), Factor 4 (α = .679), Factor 5 (α = .678), Factor 6 (α = .534), Factor 7 (α =

.684), Factor 8 (α = .720), Factor 9 (α = .777), Factor 10 (α = .501), Factor 11 (α = .572),

Factor 12 (α = .554).
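For reference, Cronbach's alpha for the set of variables loading on one factor can be computed as in the sketch below. It uses the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the summed scores). The function name and the toy data are assumptions; the values reported above come from SPSS, not from this code.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` holds one list per variable, each with one value per text
    (all lists must have the same length).
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    totals = [sum(values) for values in zip(*items)]
    item_variance = sum(variance(values) for values in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


# Two toy cases: identical items (alpha = 1) and uncorrelated items (alpha = 0).
alpha_identical = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]])
alpha_unrelated = cronbach_alpha([[1, 2, 1, 2], [1, 1, 2, 2]])
print(alpha_identical, alpha_unrelated)
```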

The stability of the solution was investigated by comparing the factors and items

with salient loadings on those factors between the different rotations. All factors

appeared in all rotated solutions (with minor differences in composition and order), while

Factor 7 changed in the preferred solution from a factor consisting mostly of negatively

loading variables to one in which the same variables all had positive loadings.

Interpretation of the factors in the preferred solution was conducted by examining

the collocate variables with salient loadings (≥ .30) on each factor individually. To

identify texts representative of each factor, factor scores were estimated for each text

using regression analysis. This was followed by a qualitative analysis of texts with top

factor scores to confirm and elaborate the interpretations.

However, it should be noted here that a separate analysis suggested that texts

initially identified as multivariate outliers (i.e., texts with ‘extreme’ scores on multiple

variables) would be among the most representative texts for all factors. Factor scores

were therefore also calculated for the full data set (i.e., including multivariate outliers),

this time by first converting the normalized frequencies into z-scores to standardize them,

and then summing the standardized frequencies of all variables with salient loadings on a

factor for a factor score for each text.27 Variables with salient loadings on more than one

factor were only included in the computation of the factor score for the factor on which

they had the highest loading (for a rationale for this procedure, see Biber, 1988, pp. 93-

95). An examination of the highest factor scores based on z-scores revealed that the

multivariate outlier texts indeed had many of the highest factor scores on all factors. A

comparison of factor scores estimated by regression analysis and those produced by z-

scores, however, showed that the two methods of computation were highly comparable.

Because these texts are outliers only in an abstract statistical sense and certainly belong to

the ‘population’ of texts sampled here, a decision was made to retain them in the

remainder of the analysis.
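The z-score method of computing factor scores described above can be sketched as follows. The sketch assumes that all salient loadings are positive, as in the preferred solution, so standardized frequencies are simply summed. The data, the loading values, and the function names are hypothetical.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize one variable across all texts."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s if s else 0.0 for v in values]


def assign_to_factors(loadings, cutoff=0.30):
    """Assign each variable to the single factor on which it loads highest."""
    assignment = {}
    for var, row in loadings.items():
        best = max(range(len(row)), key=lambda f: abs(row[f]))
        if abs(row[best]) >= cutoff:  # non-salient variables are left out
            assignment[var] = best
    return assignment


def factor_scores(freqs, loadings, cutoff=0.30):
    """Sum the z-scores of each factor's variables to get one score per text."""
    n_texts = len(next(iter(freqs.values())))
    n_factors = len(next(iter(loadings.values())))
    scores = {f: [0.0] * n_texts for f in range(n_factors)}
    for var, f in assign_to_factors(loadings, cutoff).items():
        for i, z in enumerate(zscores(freqs[var])):
            scores[f][i] += z
    return scores


# Hypothetical normalized frequencies for three texts and two factors.
freqs = {"pismo": [1, 2, 3], "narod": [2, 4, 6], "nastava": [3, 2, 1], "dan": [4, 4, 4]}
loadings = {
    "pismo": [0.62, 0.35],   # salient on both factors, counted on factor 0 only
    "narod": [0.55, 0.10],
    "nastava": [0.08, 0.48],
    "dan": [0.12, 0.09],     # not salient, excluded
}
scores = factor_scores(freqs, loadings)
print(scores)
```

Because the scores are sums of standardized frequencies, they can be computed for any text, including those set aside as multivariate outliers.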

5.3 Synchronic and Diachronic Variation (Analysis of Variance)

5.3.1 Theoretical background. The purpose of the analysis of variation here is to

determine whether there are any statistically significant differences a) between the

publications in this sample, b) between different types of articles approximating different

types of consciousness (see Section 2.4 and discussion at the end of Section 6.3.1), and

c) diachronically, between individual years of publication over the subject period.

Fitzsimmons Doolan (2011, 2014), for example, has shown that mean factor scores can

be used to compare observations grouped by what she calls ‘registers’ (i.e., text types) in

order to examine any variation between them. Similar to this, I compare publications,

years of publication, and types of articles in terms of how they score on individual

factors, and thus discourses, in an attempt to determine whether there are any significant

differences in the discursive constructions of language between publications, ‘lay’ people

and ‘experts’, and over time.

5.3.2 Analytical parameters and procedures. Synchronic and diachronic

variation in language-related discourses were examined through a comparison of the

factor scores for each of the six selected language-related discourses (i.e., factors) of texts

grouped by a) publication: Blic, NIN, Politika, and Vreme; b) year of publication: 2003,

2004, 2005, 2006, 2008; and c) type of article: general newspaper articles vs. letters-to-

the-editor. The distribution and variation of factor scores for each group of texts were

examined to determine the appropriate statistical procedures to be used. To examine the

distribution, histograms and the Shapiro-Wilk test were used. To examine homogeneity

of variance, the Levene test was used. Normality assumptions were violated for all

factors and homogeneity assumptions were violated for all factors except Factor 8 on

‘publication’, and Factors 4, 8, 10, and 11 on ‘type of article’, so appropriate non-

parametric tests (Kruskal-Wallis, Mann-Whitney U) were chosen. The results of the

statistical comparisons among groups are presented by factor in Sections 6.4.1-6.4.3. For

all tests, α = .017, reflecting a Bonferroni adjustment that divides the standard α = .05 by

three because the same data were used to run three separate analyses.
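As an illustration of the non-parametric comparison and the adjusted significance level, the sketch below computes the Kruskal-Wallis H statistic for groups of factor scores together with the Bonferroni-adjusted α. It is a simplified version: it omits the tie correction that statistical packages normally apply and does not compute a p-value. The function names and the toy scores are assumptions.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of groups."""
    pooled = [v for group in groups for v in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum * rank_sum / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


# Bonferroni adjustment for three analyses run on the same data.
alpha = 0.05 / 3

# Hypothetical factor scores for three groups of texts.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(alpha, 3), round(h, 2))
```

H is then referred to a χ² distribution with (number of groups − 1) degrees of freedom, and the resulting p-value is compared with the adjusted α.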

5.4 Cluster Analysis

5.4.1 Theoretical background. Cluster analysis is a multivariate exploratory

statistical procedure used to group within a data set cases/observations (e.g., texts)

defined in terms of categorical variables (Biber & Staples, in press). Clustering texts into

groups based on similarity in scores on quantitative measures such as factors/discourses

offers an insight into discursive patterning independent of researcher inference. Thus,

cluster analysis is used here to examine the patterning of texts and factors/discourses with

respect to three independent variables (as in the synchronic and diachronic analyses

above): publication, year of publication, and type of article.

5.4.2 Analytical parameters and procedures. Cluster analysis was conducted

using the twelve factors and the agglomerative hierarchical cluster analysis (HCA)

method (for a detailed, step-by-step explanation, see Biber & Staples, in press). Once the

optimal number of clusters was determined, a one-way ANOVA was used to compare the

mean scores of the predictor variables (i.e., factors) and check for statistical significance.

Next, the mean scores of the twelve factors for each of the six identified clusters were

examined to determine which factors scored most highly on which clusters. Lastly, the

composition of each cluster was investigated by using the crosstabs function in SPSS and

the three independent variables (publication, year of publication, and general newspaper

articles vs. letters-to-the-editor).
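To make the agglomerative procedure concrete, the sketch below clusters a handful of texts by their factor scores. The section does not state which distance measure or linkage rule was used, so Euclidean distance and average linkage are assumptions here, as are the function names and the two-factor toy data.

```python
from math import dist


def average_linkage(a, b, points):
    """Mean Euclidean distance between all pairs drawn from clusters a and b."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerative(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_linkage(clusters[x], clusters[y], points)
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical factor scores (two factors) for five texts.
texts = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (0.1, 0.3)]
groups = agglomerative(texts, n_clusters=2)
print(groups)
```

Once each text has a cluster label, the labels can be cross-tabulated against publication, year, and type of article, as in the final step described above.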

5.5 Critical Discourse Analysis: Discourse-historical Approach

5.5.1 Theoretical background. Texts, as Fairclough (2010, p. 57) notes, “bear

the imprint of ideological processes and structures.” Although, as he further argues, it

may not be possible “to ‘read off’ ideologies from texts […] because meanings are

produced through interpretations of texts and texts are open to diverse interpretations”

(ibid.; see also van Dijk, 2006), large-scale corpus-based analysis of frequency can reveal

patterns indicative of the cumulative effect of the media representations of particular

topics and point to the beliefs and assumptions (i.e., ideologies) that underlie them.

According to Stubbs (1996, p. 196), “the study of recurrent wordings is of central

importance in the study of language and ideology and can provide empirical evidence

[of] how culture is expressed in lexical patterns.” Corpus-linguistic tools can thus

provide a map of a corpus based on lexical patterns suggesting discourses and underlying

ideologies and “pinpointing areas of interest for a subsequent close analysis” (Baker et

al., 2008, p. 284).

However, as numerous researchers have noted (e.g., Baker et al., 2008;

Blackledge & Pavlenko, 2002; Partington, 2010; Ricento, 2006; Vessey, 2013a), corpus

linguistic analysis, powerful though it is, does not in itself constitute discourse or

ideological analysis. Discourses and ideologies do not exist in a vacuum, but rather

‘work’ by establishing links with social structures and practices, as well as by making

explicit or implicit references to other texts (intertextuality) or other discourses

(interdiscursivity). Understanding and interpreting discourses and ideologies, therefore,

requires a social, cultural, historical, and political contextualization of the lexical patterns

uncovered with the help of corpus linguistic tools.

With its focus on discourse, ideology and power, as well as a flexible, eclectic

methodology (for an overview, see Wodak & Meyer, 2009), critical discourse analysis

(CDA) is ideally suited for such analysis and thus as a complement to quantitative lexical

analysis. In the simplest of terms, CDA relies on linguistic analysis of discourse to

uncover and expose relations of unequal power in society. In CDA, text is “conceived as

a semiotic entity, embedded in an immediate, text-internal co-text as well as intertextual

and sociopolitical context,” while discursive and linguistic data are seen “as a social

practice, both reflecting and producing ideologies in society” (Baker et al., 2008, pp. 279-

280). Similarly, discourse is conceptualized as “a complex of three elements: social

practice, discoursal practice (text production, distribution and consumption), and text”

(Fairclough, 2010, p. 59). The ultimate goal of CDA, then, is to move from a micro-

analytic perspective of text to a macro-analytic perspective of social practice to

demonstrate “how language functions in constituting and transmitting knowledge, in

organizing social institutions or in exercising power in different domains/fields in our

societies” (Wodak, 2004).

CDA, of course, has been criticized, sometimes severely, on a number of grounds,

but particularly for its methodological shortcomings such as selectivity or potential bias

in data collection procedures, small samples, and a lack of concern for replicability (e.g.,

Blommaert, 2005; Stubbs, 1997). Recognizing this, Baker et al. (2008) have proposed a

methodological ‘synergy’ between corpus linguistics and critical discourse analysis,

whereby the two methodological approaches complement one another and thus cancel out

each other’s limitations. Although such a synergy does not necessarily guarantee

research entirely free of researcher inference (Baker, 2011 cited in Fitzsimmons Doolan

2014, p. 61), if applied in a principled manner it can demonstrably minimize it (e.g.,

Fitzsimmons Doolan, 2014). In any case, all language use and all analysis are perforce

ideological in the sense that, arguably, ideologically neutral positions are impossible, and

as a ‘critical’ approach, CDA has always refused to claim ‘objectivity’ (Fairclough, 2001,

p. 5). Further, as Vessey (2013a) notes, the principle of researcher self-reflexivity (e.g.,

Pennycook, 2001) applies.

5.5.2 Analytical parameters and procedures. Although many of the analytical

techniques available from the different approaches developed within CDA since its

inception could find application here (again, see Wodak & Meyer, 2009 for a

methodological overview), they are not all equally useful for our present purpose, which

is to examine language ideologies identifiable from language-related public discourse

and, particularly, the argumentation strategies deployed in the negotiation of contested

ethnolinguistic identities. For example, micro-analytical categories developed within

systemic-functional linguistics (SFL) such as passivization and agentivity (Halliday &

Matthiessen, 2004) do not seem to have a clear direct application here. The

discourse-historical approach (DHA, Wodak, 2001, 2004), developed to trace the constitution of a

particular stereotypical image in public discourse, however, offers several macro-

analytical tools that are potentially useful (cf. Vessey, 2013a). DHA draws on

argumentation theory to identify several discursive strategies of relevance to analysis of

identity-related discourse such as the referential/nomination strategy (the construction of

in- and out-groups through membership categorization by metaphor and metonymy), and

predication (justification of positive or negative attributions given to social actors)

(Wodak & Meyer, 2009, pp. 319-320). However, preliminary analysis suggested that the

discursive strategy of argumentation (i.e., topoi) is particularly relevant and useful.

Topoi are defined as explicit or inferable obligatory premises which make it possible to

connect arguments with the conclusion (Wodak & Meyer, 2009), or simply “the common-

sense reasoning typical for specific issues” (van Dijk, 2000 cited in Baker et al., 2008, p.

299). In line with the methodological synergy explicated above, representative texts

identified through quantitative analysis are subjected to DHA with the goal of identifying

and describing the argumentation strategies (i.e., topoi) and their common frame of

reference (i.e., their associated language ideologies).

Finally, a note on expectations of findings. Hardt-Mautner (1995), for example,

has argued that researchers need to do background research and form hypotheses prior to

carrying out CL-informed discourse analysis. In purely corpus-linguistic terms, this

means that CL-informed discourse analysis should be corpus-based rather than corpus-

driven (cf. Tognini-Bonelli, 2001). Indeed, in one sense, it would of course be difficult to

evaluate, much less interpret any patterns resulting from CL analysis without relevant

background knowledge. However, although (unlike Baker et al., 2008) I do not think

hypotheses in corpus-informed discourse analysis are always necessary (e.g., in corpus

data mining studies such as Mautner, 2007, to point to an obvious example), I do think it

is useful to comment on my expectations of findings here, if for no other reason than

because what a researcher ultimately decides to focus on in a study is to a certain extent

conditioned by his or her own ideological commitments.

Language-related discourse in the Balkans in the last twenty-five years has been

primarily concerned with the symbolic value of language in the processes of construction

and maintenance of ethnolinguistic identities and attainment of sovereignty. The breakup

of former Yugoslavia, as I have already noted above (Chapter 1), produced a climate of

pervasive, pathological contestation which continues, albeit with diminishing intensity, to

this day. I therefore expect to find evidence of a discourse of contestation focusing on

linguacultural authenticity and ethnolinguistic identities. In addition to the

methodological approach outlined above, my analysis will be informed by the language-

ideological theoretical framework of linguistic differentiation, i.e., the “similarities [and

differences] in the ways ideologies ‘recognize’ or misrecognize linguistic differences:

how they locate, interpret, and rationalize sociolinguistic complexity, identifying

linguistic varieties with ‘typical’ persons and activities and accounting for the

differentiations among them” (Irvine & Gal, 2000, p. 36). Irvine and Gal (2000) identify

three processes by which this differentiation works, which they call iconization, fractal

recursivity, and erasure. Iconization refers to the mapping of linguistic features onto

social images, positing a direct link between one or more linguistic features and (an

essentialist conceptualization of) the nature of the persons or social groups who display

them. Fractal recursivity involves the projection of binary oppositions (e.g.,

existence/non-existence) from one level of relationship to another (e.g., from local to

regional). Erasure here refers to the simplification of a sociolinguistic field through

which some persons, social groups, or sociolinguistic phenomena are rendered invisible

in ideologically and politically convenient ways. A central theme which I expect to

emerge is that of “imagined inherent, natural links between a unitary mother tongue, a

territory, and an ethnonational identity” (Irvine & Gal, 2000, p. 60), or rather how such

links are used as arguments in the discourse of contestation around ethnolinguistic

identity. In addition, reference will be made to Blommaert and Verschueren’s (1998)

concept of “homogeneism” which refers to a belief in the “impossibility of heterogeneous

communities and the naturalness of homogeneous communities” (p. 207), a belief which

is a corollary of essentialist discourses and ideologies.

6. Results

This chapter is divided into three sections. The first section presents the results of

the application of different quantitative methods and statistical analyses (keyword analysis,

collocation analysis, factor analysis, analysis of variance, and cluster analysis). The second section

presents the results of qualitative analysis (supported by relevant quantitative evidence).

The results include observations about the relative effectiveness of individual methods;

patterns in synchronic and diachronic variation in language-related discourses as well as

variation between different sites of discursive (re)production; and ethnolinguistic

identity-related discourses and language ideologies identified in the mainstream Serbian

press from the subject period.

6.1 Keyword Analysis

6.1.1 Keyword analysis (5+ hits section of SERBCORP). As explained above,

several keyword analyses were conducted in this study. The first keyword analysis

involved a comparison between the 5+ hits section of SERBCORP as the research corpus

and SERBCOMP as the comparator corpus. This analysis identified a total of 151

positive and 40 negative key lemmas. Tables 10 and 11 show the top 50 positive key

lemmas and all negative key lemmas in the 5+ hits section of SERBCORP (the full list of

positive key lemmas is shown in Appendix D).
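Keyness scores of the kind reported in the tables below compare a lemma's frequency in the research corpus with its frequency in the comparator corpus. This section does not name the statistic behind the scores, so the sketch below is an assumption: it uses a widely used two-term formulation of the log-likelihood statistic and attaches a negative sign when the lemma is relatively less frequent in the research corpus. The function name and all frequencies and corpus sizes in the example are hypothetical.

```python
from math import log


def keyness_ll(freq_rc, size_rc, freq_cc, size_cc):
    """Signed log-likelihood keyness of a lemma.

    freq_rc, size_rc: lemma frequency and token count of the research corpus.
    freq_cc, size_cc: lemma frequency and token count of the comparator corpus.
    """
    total = freq_rc + freq_cc
    expected_rc = size_rc * total / (size_rc + size_cc)
    expected_cc = size_cc * total / (size_rc + size_cc)
    ll = 0.0
    if freq_rc:
        ll += freq_rc * log(freq_rc / expected_rc)
    if freq_cc:
        ll += freq_cc * log(freq_cc / expected_cc)
    ll *= 2
    # Positive when over-represented in the research corpus, negative otherwise.
    overused = freq_rc / size_rc >= freq_cc / size_cc
    return ll if overused else -ll


# Hypothetical counts: same relative frequency, over-use, and under-use.
same = keyness_ll(10, 1000, 100, 10000)
positive = keyness_ll(50, 1000, 100, 10000)
negative = keyness_ll(1, 1000, 100, 10000)
print(same, positive > 0, negative < 0)
```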

Unsurprisingly, the top positive key lemma is JEZIK ‘language’, which of course

simply reflects the selection criterion used to create the research corpus (5+ hits section

of SERBCORP). Even a cursory glance at the remainder of the top 50 positive key

lemmas shows that the discursive profile here is similar to that of SERBCORP as a whole

(Appendix C), with numerous references to semantic fields such as education and

literature. However, it is equally clear that there is one major difference between the two

lists: whereas the SERBCORP key lemma list includes few items referring to regional

(i.e., Central South Slavic) ethnolinguistic identities, the list of key lemmas in the 5+ hits

section of SERBCORP includes items referring to all major regional (as well as other)

ethnonyms and glottonyms: srpski ‘Serbian’ (7,309 occurrences), now the second-ranked

key lemma, as well as crnogorski ‘Montenegrin’ (rank 28, 696 occurrences),

srpskohrvatski ‘Serbo-Croatian’ (rank 29, 226 occurrences), Srbi ‘Serbs’ (rank 35, 1,432

occurrences), hrvatski ‘Croatian’ (rank 36, 684 occurrences), bosanski ‘Bosnian’ (rank

46, 266 occurrences), and bošnjački ‘Bosniak’ (rank 47, 223 occurrences). In addition,

there is a set of other lexical items suggested as potentially relevant by perusal of

randomly selected articles such as ćirilica ‘Cyrillic’ (rank 11, 590 occurrences), pismo

‘alphabet’ (rank 19, 1,243 occurrences), narod ‘people’ (rank 21, 1,510 occurrences),

manjina ‘minority’ (rank 37, 491 occurrences), and nacionalni ‘national’ (rank 40, 1,361

occurrences). Further, the remainder of the 151 key lemmas (Table D1, Appendix D)

includes a considerable number of similarly pertinent items such as Crnogorci

‘Montenegrins’ (rank 60, 213 occurrences), Vuk (Karadžić)28 (rank 64, 482 occurrences),

Hrvati ‘Croats’ (rank 65, 315 occurrences), SANU ‘Serbian Academy of Arts and

Sciences’ (rank 66, 235 occurrences), identitet ‘identity’ (rank 68, 331 occurrences), naziv

‘(language) label’ (rank 70, 442 occurrences), ime ‘name’ (rank 73, 943 occurrences),

politika ‘politics’ (rank 74, 1,475 occurrences), Crna Gora ‘Montenegro’ (ranks 76 and

77, 1,072 occurrences), tradicija ‘tradition’ (rank 92, 252 occurrences), and nacija

‘nation’ (rank 93, 305 occurrences). This confirms that the 5+ hits section of

SERBCORP may indeed be a better target for analysis here. Also, note that the negative

key lemmas (Table 11) exhibit semantic patterns very similar to those identified for

SERBCORP as a whole (Table C2, Appendix C), i.e., lack of references to national

political and state institutions, as well as finances.

Table 10

Top 50 Positive Key Lemmas in the 5+ Hits Section of SERBCORP (by Keyness Score)

N Keyword (English) Keyword (Serbian) Freq. % Texts RC. Freq. RC. % Keyness P
1 language jezik 12530 1.12 1118 1 76538.41 0.0000000000
2 Serbian srpski 7309 0.65 670 32565 0.14 9779.72 0.0000000000
3 linguistic jezički 901 0.08 112 10 5387.05 0.0000000000
4 school škola 2593 0.23 237 7491 0.03 5050.31 0.0000000000
5 English engleski 1220 0.11 271 722 4949.45 0.0000000000
6 literature književnost 1305 0.12 269 1158 4667.73 0.0000000000
7 mother (adj.) maternji 611 0.05 197 1 3712.29 0.0000000000
8 book knjiga 2499 0.22 371 10369 0.05 3583.33 0.0000000000
9 dictionary rečnik 801 0.07 135 320 3576.38 0.0000000000
10 literary književni 1027 0.09 128 1288 3210.07 0.0000000000
11 Cyrillic ćirilica 590 0.05 74 188 2756.74 0.0000000000
12 learn učiti 825 0.07 70 1245 2369.50 0.0000000000
13 professor profesor 1524 0.14 307 5952 0.03 2313.27 0.0000000000
14 instruction nastava 710 0.06 107 1016 2091.33 0.0000000000
15 writer pisac 998 0.09 182 2633 0.01 2073.11 0.0000000000
16 grade razred 609 0.05 87 650 2033.88 0.0000000000
17 word reč 2633 0.24 502 18651 0.08 1941.47 0.0000000000
18 poetry poezija 543 0.05 63 526 1881.56 0.0000000000
19 alphabet pismo 1243 0.11 169 5401 0.02 1702.15 0.0000000000
20 translator prevodilac 395 0.04 102 232 1605.55 0.0000000000
21 people narod 1510 0.13 234 8257 0.04 1601.00 0.0000000000
22 education obrazovanje 893 0.08 187 2969 0.01 1558.60 0.0000000000
23 culture kultura 1485 0.13 230 8355 0.04 1519.44 0.0000000000
24 education (profession) prosvete 563 0.05 219 965 1516.59 0.0000000000
25 students (K-12) učenici 637 0.06 120 1397 1492.42 0.0000000000
26 linguist lingvista 260 0.02 81 12 1488.69 0.0000000000
27 translation prevod 478 0.04 111 629 1462.75 0.0000000000
28 Montenegrin crnogorski 696 0.06 106 2006 1357.09 0.0000000000
29 Serbo-Croatian srpskohrvatski 226 0.02 70 6 1323.38 0.0000000000
30 novel roman 678 0.06 122 2166 1221.90 0.0000000000
31 learning učenje 352 0.03 129 343 1217.01 0.0000000000
32 subject predmet 770 0.07 147 2938 0.01 1193.64 0.0000000000
33 poet pesnik 402 0.04 89 600 1160.62 0.0000000000
34 school (university) fakultet 995 0.09 89 5250 0.02 1101.39 0.0000000000
35 Serbs Srbi 1432 0.13 250 9918 0.04 1093.62 0.0000000000
36 Croatian hrvatski 684 0.06 163 2749 0.01 1010.51 0.0000000000
37 minority manjina 491 0.04 111 1320 1006.48 0.0000000000
38 speak govoriti 1764 0.16 90 14569 0.06 992.19 0.0000000000
39 use (n.) upotreba 585 0.05 72 2054 975.53 0.0000000000
40 national nacionalni 1361 0.12 88 9893 0.04 961.57 0.0000000000
41 French francuski 477 0.04 140 1488 875.85 0.0000000000
42 elementary osnovni 870 0.08 87 5077 0.02 849.01 0.0000000000
43 speech govor 487 0.04 111 1636 842.68 0.0000000000
44 students (K-8) đaci 262 0.02 110 326 821.57 0.0000000000
45 science nauka 668 0.06 189 3468 0.02 753.63 0.0000000000
46 Bosnian bosanski 266 0.02 64 432 736.64 0.0000000000
47 Bosniak bošnjački 223 0.02 70 254 725.60 0.0000000000
48 children deca 1294 0.12 220 10786 0.05 714.75 0.0000000000
49 wrote pisali 1012 0.09 76 7364 0.03 713.61 0.0000000000
50 edition izdanje 437 0.04 72 1700 665.47 0.0000000000

Table 11

Negative Key Lemmas in the 5+ Hits Section of SERBCORP (by Keyness Score)

N Keyword (English) Keyword (Serbian) Freq. % Texts RC. Freq. RC. % Keyness P
1 government vlada 412 0.04 138 26794 0.12 -844.02 0.0000000000
2 Serbia Srbija 2922 0.26 180 93159 0.41 -700.54 0.0000000000
3 millions miliona 177 0.02 104 15619 0.07 -654.02 0.0000000000
4 president predsednik 469 0.04 184 23415 0.10 -518.40 0.0000000000
5 year godina 5350 0.48 591 139140 0.62 -371.61 0.0000000000
6 parties stranke 148 0.01 69 10165 0.05 -339.42 0.0000000000
7 against protiv 419 0.04 240 18154 0.08 -312.14 0.0000000000
8 day dan 912 0.08 151 30134 0.13 -256.64 0.0000000000
9 authorities vlast 510 0.05 106 17482 0.08 -167.75 0.0000000000
10 director direktor 321 0.03 139 11611 0.05 -130.52 0.0000000000
11 Kosovo Kosovu 106 69 5254 0.02 -114.91 0.0000000000
12 last prošle 169 0.02 138 7041 0.03 -111.63 0.0000000000
13 citizens građani 412 0.04 70 13444 0.06 -109.59 0.0000000000
14 time vreme 1681 0.15 523 43233 0.19 -106.00 0.0000000000
15 public javnost 294 0.03 84 10303 0.05 -105.73 0.0000000000
16 solution rešenje 193 0.02 84 7465 0.03 -99.91 0.0000000000
17 law zakon 687 0.06 112 19899 0.09 -99.40 0.0000000000
18 percent odsto 816 0.07 193 22713 0.10 -92.36 0.0000000000
19 after posle 953 0.09 490 25886 0.12 -91.46 0.0000000000
20 now sad 396 0.04 237 12467 0.06 -89.12 0.0000000000
21 affairs poslova 92 66 4199 0.02 -79.70 0.0000000000
22 choice izbor 309 0.03 110 9934 0.04 -76.78 0.0000000000
23 moment trenutku 153 0.01 124 5648 0.03 -67.10 0.0000000000
24 expect očekuje 89 64 3828 0.02 -64.83 0.0000000000
25 week nedelje 104 86 4240 0.02 -64.12 0.0000000000
26 larger veći 410 0.04 108 11978 0.05 -62.32 0.0000000000
27 decision odluka 463 0.04 85 13130 0.06 -58.97 0.0000000000
28 political politički 856 0.08 131 22155 0.10 -56.87 0.0000000000
29 yesterday juče 206 0.02 148 6719 0.03 -54.67 0.0000000000
30 group grupa 463 0.04 110 12934 0.06 -53.63 0.0000000000
31 case slučaj 499 0.04 125 13753 0.06 -52.89 0.0000000000
32 say reći 1087 0.10 247 27137 0.12 -52.38 0.0000000000
33 five pet 320 0.03 227 9446 0.04 -51.56 0.0000000000
34 place mesto 799 0.07 222 20390 0.09 -47.07 0.0000000000
35 problem problem 824 0.07 250 20854 0.09 -45.08 0.0000000000
36 parliament skupštine 134 0.01 71 4590 0.02 -43.92 0.0000000000
37 city grada 159 0.01 104 5193 0.02 -42.45 0.0000000000
38 end kraj 588 0.05 102 15327 0.07 -41.39 0.0000000000
39 immediately odmah 193 0.02 148 5868 0.03 -36.46 0.0000000001
40 six šest 201 0.02 152 6055 0.03 -36.17 0.0000000001

6.1.2 Keyword analysis (5+ hits section of SERBCORP vs. 1-4 hits section of

SERBCORP). The second keyword analysis involved a comparison between the 5+ hits

section of SERBCORP as the research corpus and the 1-4 hits section of SERBCORP as

the comparator corpus. This analysis identified a total of 90 positive and 14 negative key

lemmas (the full list of positive key lemmas is shown in Table E1 in Appendix E). The

top 50 positive key lemmas and all negative key lemmas in the 5+ hits section of

SERBCORP with 1-4 hits section of SERBCORP as the comparator corpus are presented

in Tables 12 and 13.

As expected, JEZIK is again the top key lemma here. Compared to the 1-4 hits section of

SERBCORP, the discursive profile of the 5+ hits section of SERBCORP

is defined by items related to regional ethnolinguistic identities and education, with some

(albeit considerably fewer than in SERBCORP) references to literature (including

translation) and culture. Importantly, however, items referring to the major regional

ethnolinguistic identities are now all in the top 30 key lemmas, while most other relevant

items identified toward the end of the previous section have moved up in the list.

Interestingly, the lemma zakon ‘law’ is now identified as a positive keyword (rank 90,

687 occurrences). Note also that this prominence of the relevant (i.e., Central South

Slavic) ethnonyms and glottonyms in the list further validates the sampling criterion as

well as the cutoff point of 5 hits for the lemma JEZIK per article (see Section 4 and

Appendix A). As before, the negative key lemmas (now considerably fewer

in number on account of the smaller size of the comparator corpus) indicate a consistent

underrepresentation of items referring to national political and state institutions.

Table 12

Top 50 Positive Key Lemmas in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus

(by Keyness Score)

N Keyword (English) Keyword (Serbian) Freq. % Texts RC. Freq. RC. % Keyness P
1 language jezik 12530 1.12 1118 21304 0.20 18516.54 0.0000000000
2 Serbian srpski 7309 0.65 670 28440 0.27 3799.77 0.0000000000
3 linguistic jezički 901 0.08 112 792 2043.82 0.0000000000
4 dictionary rečnik 801 0.07 135 783 1717.42 0.0000000000
5 school škola 2593 0.23 237 10392 0.10 1269.04 0.0000000000
6 mother (adj.) maternji 611 0.05 197 649 1249.68 0.0000000000
7 alphabet pismo 1243 0.11 169 3424 0.03 1108.27 0.0000000000
8 Cyrillic ćirilica 590 0.05 74 905 942.80 0.0000000000
9 word reč 2633 0.24 502 12494 0.12 879.24 0.0000000000
10 instruction nastava 710 0.06 107 1467 0.01 875.24 0.0000000000
11 learn učiti 825 0.07 70 2015 0.02 851.36 0.0000000000
12 Montenegrin crnogorski 696 0.06 106 1452 0.01 849.83 0.0000000000
13 English engleski 1220 0.11 271 4033 0.04 839.02 0.0000000000
14 linguist lingvista 260 0.02 81 96 823.13 0.0000000000
15 professor profesor 1524 0.14 307 5987 0.06 775.36 0.0000000000
16 literature književnost 1305 0.12 269 4765 0.05 760.37 0.0000000000
17 subject predmet 770 0.07 147 1997 0.02 740.22 0.0000000000
18 Croatian hrvatski 684 0.06 163 1623 0.02 729.29 0.0000000000
19 use (n.) upotreba 585 0.05 72 1249 0.01 697.87 0.0000000000
20 Serbo-Croatian srpskohrvatski 226 0.02 70 142 597.25 0.0000000000
21 education (profession) prosvete 563 0.05 219 1388 0.01 574.70 0.0000000000
22 education obrazovanje 893 0.08 187 3078 0.03 574.07 0.0000000000
23 grade razred 609 0.05 87 1672 0.02 545.16 0.0000000000
24 literary književni 1027 0.09 128 3959 0.04 541.68 0.0000000000
25 people narod 1510 0.13 234 7019 0.07 530.86 0.0000000000
26 Bosniak bošnjački 223 0.02 70 181 526.18 0.0000000000
27 learning učenje 352 0.03 129 627 497.70 0.0000000000
28 foreign strani 2004 0.18 265 10904 0.10 449.44 0.0000000000
29 Bosnian bosanski 266 0.02 64 383 445.70 0.0000000000
30 students (K-12) učenici 637 0.06 120 2163 0.02 419.59 0.0000000000
31 elementary osnovni 870 0.08 87 3686 0.03 378.99 0.0000000000
32 national nacionalni 1361 0.12 88 6963 0.07 369.42 0.0000000000
33 Vuk (Karadžić) Vuk 482 0.04 103 1540 0.01 349.22 0.0000000000
34 culture kultura 1485 0.13 230 8098 0.08 330.50 0.0000000000
35 speech govor 487 0.04 111 1685 0.02 311.04 0.0000000000
36 speak govoriti 1764 0.16 90 10282 0.10 309.94 0.0000000000

37 minority manjina 491 0.04 111 1758 0.02 295.94 0.0000000000
38 translator prevodilac 395 0.04 102 1249 0.01 290.68 0.0000000000
39 Croats Hrvati 315 0.03 117 929 256.25 0.0000000000
40 science nauka 668 0.06 189 3063 0.03 242.74 0.0000000000
41 Serbs Srbi 1432 0.13 250 8442 0.08 240.77 0.0000000000
42 translation prevod 478 0.04 111 1998 0.02 214.29 0.0000000000
43 class period čas 542 0.05 63 2441 0.02 205.62 0.0000000000
44 Montenegrins Crnogorci 213 0.02 62 564 199.56 0.0000000000
45 doctor dr 843 0.08 314 4519 0.04 198.29 0.0000000000
46 second drugi 3666 0.33 534 26947 0.26 187.05 0.0000000000
47 expression izraz 310 0.03 103 1128 0.01 181.64 0.0000000000
48 meaning značenje 262 0.02 79 867 179.80 0.0000000000
49 school (university) fakultet 995 0.09 89 5774 0.05 177.70 0.0000000000
50 label naziv 442 0.04 119 1995 0.02 166.80 0.0000000000

Table 13

Negative Key Lemmas in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus (by

Keyness Score)

N Keyword (English) Keyword (Serbian) Freq. % Texts RC. Freq. RC. % Keyness P
1 year godina 5350 0.48 591 61158 0.58 -195.54 0.0000000000
2 Kosovo Kosovu 106 69 2830 0.03 -155.68 0.0000000000
3 president predsednik 469 0.04 184 7431 0.07 -139.42 0.0000000000
4 day dan 912 0.08 151 12274 0.12 -119.94 0.0000000000
5 government vlada 412 0.04 138 6396 0.06 -112.15 0.0000000000
6 Belgrade Beograd 1480 0.13 258 17753 0.17 -85.56 0.0000000000
7 after posle 953 0.09 490 12093 0.11 -85.48 0.0000000000
8 millions miliona 177 0.02 104 2933 0.03 -63.14 0.0000000000
9 city grada 159 0.01 104 2584 0.02 -52.49 0.0000000000
10 during tokom 299 0.03 202 4110 0.04 -44.45 0.0000000000
11 now sad 396 0.04 237 5147 0.05 -41.82 0.0000000000
12 last prošle 169 0.02 138 2524 0.02 -38.56 0.0000000000
13 time put 943 0.08 341 10847 0.10 -36.64 0.0000000000
14 saw video 104 80 1711 0.02 -36.06 0.0000000001

Table 14

Positive Key Semantic Domains in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference

Corpus (by Rank)

Rank Ethnolinguistic identities (Etnolingvistički identiteti) Rank Education & science (Obrazovanje i nauka) Rank Literature & translation (Književnost i prevođenje) Rank Foreign languages (Strani jezici)
1 language jezik 3 linguistic jezički 9 word reč 13 English engleski
2 Serbian srpski 5 school škola 16 literature književnost 28 foreign strani
4 dictionary rečnik 6 mother (adj.) maternji 24 literary književni 46 second drugi
7 alphabet pismo 10 instruction nastava 27 learning učenje 52 German nemački
8 Cyrillic ćirilica 11 learn učiti 34 culture kultura 53 Spanish španski
12 Montenegrin crnogorski 14 linguist lingvista 35 speech govor 61 French francuski
18 Croatian hrvatski 15 professor profesor 36 speak govoriti 62 Russian ruski
19 use (n.) upotreba 17 subject predmet 38 translator prevodilac 68 understand razumeti
20 Serbo-Croatian srpskohrvatski 21 education (pro.) prosvete 42 translation prevod 69 be able to moći
25 people narod 22 education obrazovanje 47 expression izraz
26 Bosniak bošnjački 23 grade razred 48 meaning značenje
29 Bosnian bosanski 30 students (K-12) učenici 51 wrote pisali
32 national nacionalni 31 elementary osnovni 63 poetry poezija
33 Vuk (Karadžić) Vuk 40 science nauka 66 writer pisac
37 minority manjina 43 class period čas 72 cultural kulturni
39 Croats Hrvati 45 doctor dr 81 poet pesnik
41 Serbs Srbi 49 school (univ.) fakultet
44 Montenegrins Crnogorci 54 exam ispit
50 label naziv 58 children deca
55 SANU SANU 59 knowledge znanje
56 name ime 60 scientific naučni
57 identity identitet 71 academician akademik
64 introduction uvođenje 73 example primer
65 percent odsto 76 sentence rečenica
67 Monte(negro) Gora 78 letters (a, b, c) slova
70 (Monte)negro Crna 79 students (K-8) đaci
74 nation nacija 82 schooling školovanje
75 Vojvodina Vojvodini 83 program/curric. program
77 today danas 85 book knjiga
80 same isti 87 parents roditelji
84 difference razlika
86 century vek
88 history istorija
89 change (v.) menja
90 law zakon

Based on the patterns identified so far in this section, it is clear that the 5+ hits

section of SERBCORP represents a concentrated discourse exhibiting numerous lexical

items and patterns relevant to an exploration of links among language-related discourses,

language ideologies, ethnolinguistic identities, and ethnonationalism. However, it

is also quite clear that keywords identified for topically heterogeneous research corpora

such as SERBCORP as a whole or the 5+ hits section of SERBCORP are not as insightful

as those identified for topically homogeneous research corpora such as, for example,

parliamentary debates on a single issue (e.g., Baker, 2006) or student evaluations of

university instructors (e.g., Subtirelu, 2015).29 In other words, despite the identification

of promising lexical items and patterns demonstrated above, a decision about where to

begin analysis or what to focus on would still have to depend on researcher inference.

Therefore, in order to get a better sense of the discursive profile of the 5+ hits

section of SERBCORP, I classified all 90 key lemmas into semantic fields based on their

predominant semantic values in the corpus (confirmed by concordance lines in

ambiguous cases). This semantic classification resulted in four distinct semantic fields

with different numbers of items in each: ethnolinguistic identities (the largest), education

and science, literature and translation, and foreign languages (the smallest; see Table 14).

Based on this semantic patterning, we can conclude that Serbian newspaper discourse

explicitly focused on language is dominated by references to Central South Slavic

ethnolinguistic identities, education, and, to a lesser extent, literature and translation and

foreign languages. From this, it is possible to further extrapolate that this general

language-related discourse is focused on contested (ethnolinguistic identities) and

uncontested (foreign languages, translation) differences and identities, as well as

education and literature as the primary sociocultural domains with respect to which

language is explicitly and overtly thematized. This is an important finding not only

because it gives us a sense of the general language-related (small ‘d’) discourses in

circulation here, but also because a very similar discursive profile emerged from

exploratory factor analysis (see Section 6.3).
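
The relative sizes of the four semantic fields can be checked with a short tally. The sketch below records only the keyword ranks assigned to each field in Table 14 (the field labels are shortened for the illustration) and verifies that the four fields together cover each of the 90 key lemmas exactly once. It illustrates the bookkeeping only; the classification itself was done manually.

```python
# Keyword ranks per semantic field, transcribed from Table 14.
FIELDS = {
    "ethnolinguistic identities": [
        1, 2, 4, 7, 8, 12, 18, 19, 20, 25, 26, 29, 32, 33, 37, 39, 41, 44,
        50, 55, 56, 57, 64, 65, 67, 70, 74, 75, 77, 80, 84, 86, 88, 89, 90,
    ],
    "education and science": [
        3, 5, 6, 10, 11, 14, 15, 17, 21, 22, 23, 30, 31, 40, 43, 45, 49, 54,
        58, 59, 60, 71, 73, 76, 78, 79, 82, 83, 85, 87,
    ],
    "literature and translation": [
        9, 16, 24, 27, 34, 35, 36, 38, 42, 47, 48, 51, 63, 66, 72, 81,
    ],
    "foreign languages": [13, 28, 46, 52, 53, 61, 62, 68, 69],
}


def field_sizes(fields):
    """Number of key lemmas assigned to each semantic field."""
    return {name: len(ranks) for name, ranks in fields.items()}


def is_partition(fields, n_items):
    """True if every rank 1..n_items is assigned to exactly one field."""
    all_ranks = sorted(r for ranks in fields.values() for r in ranks)
    return all_ranks == list(range(1, n_items + 1))


sizes = field_sizes(FIELDS)
for name, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(f"{name}: {size}")
print("covers all 90 key lemmas once:", is_partition(FIELDS, 90))
```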

Here, it would be possible, as is typically done, to use researcher inference to select

items to pursue further, for example through concordance

analysis. Let us briefly illustrate the problems with this approach using a small set of

lexical items identified by both keyword and collocation analysis as potentially

interesting examples. In addition to the obviously important Central South Slavic ethno-

and glottonyms, in Section 6.1.1 the following items were identified as some of the

pertinent key lemmas in this corpus: Vuk (Karadžić) (rank 64, 482 occurrences), SANU

‘Serbian Academy of Arts and Sciences’ (rank 66, 235 occurrences), ime ‘name’ (rank 73,

943 occurrences), and nacija ‘nation’ (rank 93, 305 occurrences). Again, the same items

were also identified as significant collocates of the lemma JEZIK: Vuk (Karadžić) (rank

113, 75 occurrences, 50 texts), SANU ‘Serbian Academy of Arts and Sciences’ (rank 120,

70 occurrences, 40 texts), ime ‘name’ (rank 36, 208 occurrences, 96 texts), and nacija

‘nation’ (rank 165, 51 occurrences, 37 texts).

As mentioned above, this selection is based both on researcher inference (which is

in turn itself based on background knowledge and a close reading of large numbers of

texts in the corpus) and quantitative evidence (results of keyword and collocation

analyses). Once identified by quantitative analyses, Vuk (Karadžić) and SANU were thus

deemed to be of potential interest because both Vuk Karadžić as an individual and SANU

as an institution have been historically closely linked to issues of language and

ethnolinguistic identity in the Central South Slavic area, which are the primary focus of

this study (for a detailed discussion, see Section 7.3). Similarly, the lexical item ime

‘name’ was deemed to be of potential interest because the naming of the different

varieties of Central South Slavic has been at the center of the public debate and

contestation related to ethnolinguistic identities since the breakup of Yugoslavia. The

lexical item nacija ‘nation’, finally, is an obvious choice of lexical item to investigate in a

study of links between language ideologies and ethnonationalism.

However, as can be seen, even a small set of relevant items presents the

researcher with thousands of occurrences (and hundreds of concordance lines) of

potential interest. The problem of how to deal with large numbers of potentially

interesting occurrences and concordance lines is exacerbated by the extensive inflectional

morphology of Central South Slavic in that, unlike in English for example, semantic

patterns are broken up into numerous subpatterns corresponding to the individual inflected

forms of both the node word and any collocates (e.g., nacija, nacije, naciji, etc.). This

atomizes the overall semantic profile of the lexical item and also renders lexical analysis

software designed primarily with languages with simpler inflectional morphologies in mind,

such as WST, much less useful. Furthermore, in contrast to topically homogeneous corpora,

the 5+ hits section of SERBCORP features articles that are topically rather

heterogeneous, which presents more of a challenge for pattern analysis based on

concordance lines. Thus, in addition to a total of 482 concordance lines exhibiting

minimally informative lexical patterns, the lexical item Vuk, for example, has a mere 17

significant lexical collocates which show no obvious or easily discernible discursive

patterns of import for either language ideology or ethnonationalism.30 As will be shown

in Section 7.3, it is the fact that Vuk Karadžić as a historical figure is featured so

prominently in this corpus rather than any particular concordance or collocational

patterns associated with this lexical item that is important here. Put differently, lexical

items can have high discursive and ideological significance without exhibiting any

explicit collocational (or other) patterns. Concordance analysis therefore does not seem

to be a particularly effective way to either identify or present macroscopic lexical patterns

that can help profile a corpus and capture discourses.
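
The fragmentation caused by inflectional morphology can be illustrated with a small sketch. It counts the inflected forms of nacija ‘nation’ separately and then pools them under the lemma. The form-to-lemma mapping and the sample sentences are hypothetical and included for illustration only; they are not drawn from SERBCORP, and the sketch is not the lemmatization procedure used in this study.

```python
import re
from collections import Counter

# Hypothetical mini lemma list: inflected forms of nacija 'nation'.
FORM_TO_LEMMA = {
    "nacija": "nacija",
    "nacije": "nacija",
    "naciji": "nacija",
    "naciju": "nacija",
    "nacijom": "nacija",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def form_and_lemma_counts(text, form_to_lemma):
    """Return (counts per inflected form, counts pooled per lemma)."""
    forms = Counter(t for t in tokenize(text) if t in form_to_lemma)
    lemmas = Counter()
    for form, count in forms.items():
        lemmas[form_to_lemma[form]] += count
    return forms, lemmas


# Invented sample text, not corpus data.
sample = "Nacija i jezik. Jezik nacije. O naciji i o jeziku nacije."
forms, lemmas = form_and_lemma_counts(sample, FORM_TO_LEMMA)

print(dict(forms))   # occurrences split across inflected forms
print(dict(lemmas))  # the same occurrences pooled under one lemma
```

A tool that treats each inflected form as a separate node word sees several low-frequency items, whereas the pooled lemma count reflects the item's overall prominence.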

It should be noted, however, that concordance analysis can still be useful for

microscopic lexical analysis during the preliminary stages of discursive corpus profiling

or when confirmation or elaboration of macroscopic lexical patterns is required. For

example, the verb postoji ‘exists’ (rank 48, 161 occurrences, 126 texts) was identified as a

significant collocate of the lemma JEZIK in the 5+ hits section of SERBCORP (see Table

17). Postoji was deemed to be of potential interest because it can be an explicit

expression of contestation as it is often used with a negator (ne postoji ‘does not exist’)

and applied to non-Serb varieties of Central South Slavic. The ‘concordance’ tool in

WST showed that, as a collocate of JEZIK, postoji was most often found in the R2

position.

As can be seen from Figure 3, (ne) postoji ‘(does not) exist’ is indeed applied to

Bosnian (lines 2 and 3) and Montenegrin (lines 4, 5, 6, 7) in an explicit manifestation of

the discourse of contestation.31 Interestingly, (ne) postoji is also applied to Serbian (line

16), but this was the only occurrence in conjunction with Serbian, and it failed to show up

in any of the subsequent analyses, including qualitative analyses of integral texts.

Figure 3. Concordance lines for postoji ‘exists’ in the 5+ hits section of SERBCORP

The conclusion we can draw from this brief demonstration, therefore, is that, if we

are interested in principled decision making and effective methods, analysis of

concordance lines of limited sets of lexical items selected on the basis of researcher

inference is clearly unsatisfactory as a tool for macroscopic discursive corpus profiling or

identification of representative texts. Instead, what is needed is an objective method of

analysis based on statistically significant patterns that can identify not only lexical foci

for analysis (i.e., discourses) but also individual representative texts for follow-up

qualitative analysis.

6.1.3 Keyword associates. The ‘keyword’ function in WST offers a technique

that represents a step in this direction. Based on keyword analysis, a database can be

created of items that are key in several texts in the corpus (key-keywords). This is done

by running separate keyword analyses for each individual text rather than the corpus as a

whole, and the result is information on which keywords are key in a researcher-

determined minimum number of texts. In addition to this, the function computes which

keywords (associates) co-occur with each of the key-keywords, forming lexical sets

(clumps) which can be suggestive of discourses and potentially also ideologies. Table 15

shows a list of 20 key-keywords most directly relevant to ethnolinguistic identities and

their clumps (top ten most frequent associates with MI scores ≥ 3). As can be seen, this

is quite an improvement over the keyword list as most key-keywords have discursively

indicative sets of associates. For example, the key-keyword Srbi ‘Serbs’ co-occurs with

the following keywords: Serbian, Croats, Croatian, people, name, literature, national,

academy, Croatia, book, professor, school, linguistic, literary, war, and learn.32 Clearly,

then, texts in which ‘Serbs’ appears as a keyword tend to discuss the Croats and Croatian

language, the language’s name and the (1990s’) war, all of which point to the (big ‘D’)

discourse of contestation mentioned toward the end of the methods section. Further,

there are indications of a discussion of the national academy of sciences and arts (i.e.,

SANU) and linguistics, as well as of education and literature more generally. Similarly,

the key-keywords Crna and Gora ‘Montenegro’, for example, co-occur with

Montenegrin, Serbian, mother (tongue), Montenegrins, label, and official, which suggests

a discourse (of contestation) pertaining to the recent change in language policy in

Montenegro, whereby the erstwhile official language, Serbian, was first replaced by an

identity-neutral label ‘mother tongue’ and then, upon independence from Serbia, by

‘Montenegrin’.
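
The logic of key-keywords and associates described above can be sketched as follows. The sketch assumes that a keyword analysis has already been run for each text, so that every text is represented by its set of keywords. The text identifiers and keyword sets are invented for illustration. The sketch also simplifies the procedure: it counts raw co-occurrence only and omits the statistical filtering (e.g., the MI threshold) applied to the associates reported in Table 15.

```python
from collections import Counter


def key_keywords(keywords_by_text, min_texts):
    """Keywords that are key in at least `min_texts` texts, with text counts."""
    counts = Counter(
        kw for kws in keywords_by_text.values() for kw in set(kws)
    )
    return {kw: n for kw, n in counts.items() if n >= min_texts}


def associates(keywords_by_text, key_keyword):
    """Other keywords found in the texts where `key_keyword` is key."""
    co_occurring = Counter()
    for kws in keywords_by_text.values():
        kws = set(kws)
        if key_keyword in kws:
            co_occurring.update(kws - {key_keyword})
    return co_occurring


# Invented per-text keyword sets (hypothetical text IDs).
texts = {
    "t1": {"Srbi", "Hrvati", "ime", "rat"},
    "t2": {"Srbi", "Hrvati", "književnost"},
    "t3": {"pismo", "ćirilica", "latinica"},
    "t4": {"pismo", "ćirilica", "Srbi"},
}

kkw = key_keywords(texts, min_texts=2)
clump = associates(texts, "Srbi")

print(kkw)
print(clump.most_common(3))
```

The `min_texts` parameter corresponds to the researcher-determined minimum number of texts, and the counter returned by `associates` corresponds to a clump.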

To facilitate the methodological comparison further, I also checked if the most

salient (i.e., highest-loading) variables in the 12 factors identified by exploratory factor

analysis (see Section 6.3) showed up as key-keywords. Perhaps unsurprisingly, they all

do (Table 16). Thus, the most salient variable in Factor 2 (Cyrillic-Only), alphabet, co-occurs

with Cyrillic, Serbian, Latin, official, professor, use (n.), school, English, high

school, and book here and with Cyrillic (n./adj.), use (n.), official, Latin, constitution,

protection, association, and law in Factor 2. The most salient variable in Factor 11

(Officialization of Bosnian), Bosnian, co-occurs with Bosniak, national, minority,

subject, and education here and with Bosniak, elective, element, national, and board in

Factor 11. Thus, the associates of alphabet and Factor 2 both seem to suggest a discourse

of endangerment concerned with the protection of Serbian Cyrillic from the

(perceived) threat posed by the widespread use of the Latin alphabet in Serbia. The

associates of Bosnian and Factor 11, similarly, suggest a discourse (of contestation) of

minority language rights concerned with the recent official recognition of Bosnian as a

minority language in Serbia and its introduction in schools. The overlap between the

associates and factors, as can be seen, is considerable.

The problem with this analytical technique, however, arises when one decides to

explore these sets of associates further. The number of associates, for one, can be very

large, depending on their overall frequency in the corpus, which means that it may be

necessary to start focusing on the most frequent items as with keyword analysis proper.

Further, it would, for instance, be interesting to look up some (or all) of the texts in which

associates co-occur with a key-keyword for in-depth qualitative analysis. Unfortunately,

this is not possible as there is currently no way to obtain this information automatically

using the ‘associates’ tool (Mike Scott, personal communication). One could, of course,

look for this information manually, but with a research corpus of this size, that is clearly

undesirable.
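
For illustration, the kind of lookup that is missing from the ‘associates’ tool can be sketched outside WST, assuming that the per-text keyword lists have been exported. The sketch builds an inverted index from keywords to text identifiers and intersects the entries for a key-keyword and one of its associates. The identifiers and keyword sets are hypothetical, and this is not a feature of WST or a procedure used in this study.

```python
from collections import defaultdict


def build_index(keywords_by_text):
    """Inverted index: keyword -> set of IDs of texts in which it is key."""
    index = defaultdict(set)
    for text_id, kws in keywords_by_text.items():
        for kw in kws:
            index[kw].add(text_id)
    return index


def texts_with_pair(index, key_keyword, associate):
    """IDs of texts in which both items are key, in sorted order."""
    return sorted(index.get(key_keyword, set()) & index.get(associate, set()))


# Invented per-text keyword sets (hypothetical text IDs).
exported = {
    "a01": {"bosanski", "bošnjački", "predmet"},
    "a02": {"bosanski", "manjina", "nacionalni"},
    "a03": {"manjina", "bosanski", "škola"},
    "a04": {"pismo", "ćirilica"},
}

index = build_index(exported)
hits = texts_with_pair(index, "bosanski", "manjina")
print(hits)
```

The returned identifiers would point to the texts to retrieve for in-depth qualitative analysis.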

Table 15

Ethnolinguistic Identity-related Key-keywords and Key-keyword Associates in the 5+ Hits Section of SERBCORP with the 1-4 Hits

Section of SERBCORP as the Reference Corpus (by Rank/Number of Texts)

N KW (English) KW (Serbian) Texts % Overall Freq. No. Ass. Associates (English) Associates (Serbian)
2 Serbian srpski 150 18.27 3051 210 literature·linguistic·Cyrillic·literary·alphabet·dictionary·book·Serbs·mother (adj.)·school  književnost·jezički·ćirilica·književni·pismo·rečnik·knjiga·Srbi·maternji·škola
18 Montenegrin crnogorski 38 4.63 393 98 (Monte)negro·Serbian·Monte(negro)·Montenegrins·mother (adj.)·literature·linguistic·literary·poetry·professor·nation·national·poet·orthography·renaming  Crna·srpski·Gora·Crnogorci·maternji·književnost·jezički·književni·poezija·profesor·nacija·nacionalni·pesnik·pravopis·preimenovanje
22 Serbs Srbi 34 4.14 506 101 Serbian·Croats·Croatian·people·name·literature·national·academy·Croatia·book·professor·school·linguistic·literary·war·learn  srpski·Hrvati·hrvatski·narod·ime·književnost·nacionalni·akademija·Hrvatska·knjiga·profesor·škola·jezički·književni·rat·učiti
31 (Monte)negro Crna 29 3.53 382 90 Monte(negro)·Montenegrin·Serbian·mother (adj.)·Montenegrins·label·official  Gora·crnogorski·srpski·maternji·Crnogorci·naziv·službeni
33 Monte(negro) Gora 27 3.29 381 89 (Monte)negro·Montenegrin·Serbian·mother (adj.)·Montenegrins·label·official  Crna·crnogorski·srpski·maternji·Crnogorci·engleski·naziv·službeni
41 national nacionalni 24 2.92 319 91 minority·Bosnian·literature·Serbs·Serbian·learn·Croats·school·Bosniak·Montenegrin·and·identity·mother (adj.)·nation·people·subject  manjina·bosanski·književnost·Srbi·srpski·učiti·Hrvati·škola·bošnjački·crnogorski·i·identitet·maternji·nacija·narod·predmet
46 Croatian hrvatski 22 2.68 234 93 Serbian·Serbs·linguistic·Bosniak·Croats·Croatia·literary·name·literature·poetry·school·dialect·English·and·book·minority·mother (adj.)·poem·Serbo-Croatian·learn  srpski·Srbi·jezički·bošnjački·Hrvati·Hrvatska·književni·ime·književnost·poezija·škola·dijalekat·engleski·i·knjiga·manjina·maternji·pesma·srpskohrvatski·učiti
53 Bosnian bosanski 16 1.95 143 49 Bosniak·national·minority·subject·education (profession)·Serbian·learn·literary·school  bošnjački·nacionalni·manjina·predmet·prosvete·srpski·učiti·književni·škola
58 Montenegrins Crnogorci 15 1.83 113 55 Montenegrin·Serbian·(Monte)negro·Monte(negro)·nation  crnogorski·srpski·Crna·Gora·nacija
63 Serbo-Croatian srpskohrvatski 15 1.83 73 52 Croatian·dictionary·linguistic·Serbian·academy·language  srpski·jezički·rečnik·akademija·hrvatski·književni
83 Bosniak bošnjački 11 1.34 85 56 Serbian·Bosnian·Croatian·Bosniaks·linguistic·minority·mother (adj.)·national·standardization·school  srpski·bosanski·hrvatski·Bošnjaci·jezički·manjina·maternji·nacionalni·standardizacija·škola
85 name ime 11 1.34 138 43 Serbs·Serbian·Croatian·Croatia·literature  Srbi·srpski·hrvatski·Hrvatska·književnost
87 nation nacija 11 1.34 91 50 minority·Serbian·Montenegrins·Montenegrin·national  manjina·srpski·Crnogorci·crnogorski·nacionalni
93 Croats Hrvati 10 1.22 72 37 Serbs·Croatian·Serbian·national  Srbi·hrvatski·srpski·nacionalni

101 identity identitet 9 1.10 72 46 national nacionalni
106 Croatia Hrvatska 8 0.97 107 30 language·I·Serbs·name·Croatian jezik·ja·Srbi·ime·hrvatski
110 renaming preimenovanje 8 0.97 41 45 high school·professor·Montenegrin·mother (adj.) gimnazija·profesor·crnogorski·maternji
111 Serbia Srbija 8 0.97 417 49
161 Bosnia Bosna 5 0.61 35 19
164 Herzegovina Hercegovina 5 0.61 37 17
169 nationalism nacionalizam 5 0.61 54 17
195 ethnic etnički 4 0.49 23 15 minority manjina

Table 16

Factor-related Key-keywords and Key-keyword Associates in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of

SERBCORP as the Reference Corpus (by Rank)

F KKW (English) KKW (Serbian) Texts % Overall Freq. No. Ass. Associates (English) Associates (Serbian)
1 grade razred 47 5.72 378 117 school·subject·learn·instructor·education (profession)·instruction·English·first-graders·class period·students (K-8)  škola·predmet·učiti·nastavnik·prosvete·nastava·engleski·prvaci·čas·đaci
2 alphabet pismo 49 5.97 657 108 Cyrillic·Serbian·Latin·official·professor·use (n.)·school·English·high school·book  ćirilica·srpski·latinica·službeni·profesor·upotreba·škola·engleski·gimnazija·knjiga
3 exam ispit 24 2.92 189 69 school·mathematics·students (K-12)·students (K-8)·English·high (school)·mother (adj.)·Serbian·high school·instructor·entrance  škola·matematika·učenici·đaci·engleski·maternji·srpski·gimnazija·nastavnik·prijemni
4 school (university) fakultet 25 3.05 433 87 professor·instruction·instructor·university·school·English·education·student·literature·program of study  profesor·nastava·nastavnik·univerzitet·škola·engleski·obrazovanje·student·književnost·studija
5 minority (n.) manjina 29 3.53 284 101 mother (adj.)·national·school·Bosnian·minority (adj.)·nation·education·rights·subject·learn  maternji·nacionalni·škola·bosanski·manjinski·nacija·obrazovanje·prava·predmet·učiti
6 Croatia Hrvatska 8 0.97 107 30 Croatian·Serbs·name·I·literature  hrvatski·Srbi·ime·ja·književnost
7 book knjiga 66 8.04 997 156 Serbian·literature·writer·literary·poetry·poem·poet·professor·dictionary·award  srpski·književnost·pisac·književni·poezija·pesma·pesnik·profesor·rečnik·nagrada
8 Montenegrin crnogorski 38 4.63 393 98 (Monte)negro·Serbian·Monte(negro)·Montenegrins·mother (adj.)·literature·linguistics·literary·poetry·professor  crna·srpski·gora·crnogorci·maternji·književnost·jezički·književni·poezija·profesor
9 teach predavati 6 0.73 45 30 instructor  nastavnik
10 linguistic jezički 60 7.31 367 122 Serbian·dictionary·literary·dialect·English·speech·literature·people·Croatian·school  srpski·rečnik·književni·dijalekat·engleski·govor·književnost·narod·hrvatski·škola
11 Bosnian bosanski 16 1.95 143 49 Bosniak·national·minority·subject·education (profession)·Serbian·learn·literary·school  bošnjački·nacionalni·manjina·predmet·prosvete·srpski·učiti·književni·škola
12 center centar 4 0.49 56 13

WST includes several other tools for the exploration of co-occurrences among

keywords (see Scott, 2014a, for details) such as the ‘keywords plot’ and ‘links’, which

calculate and plot a keyword’s collocates (i.e., keywords that occur within a researcher-

defined collocation span of the chosen keyword). However, this analysis only works with

individual texts, so its usefulness for our purposes is limited. Another option is to take a

phrasal approach to keywords and calculate keyword clusters (i.e., n-grams), but this

technique only uses keywords, which makes it highly unlikely to produce a sufficient

number of observations for analysis. With this, my exploration of keyword analysis here

is complete. In the next section, I examine the results of collocation analysis as applied

in this study.

6.2 Collocation Analysis

This section presents the results of collocation analyses of the lemma JEZIK

conducted on the 5+ hits section of SERBCORP (for results and discussion of collocation

analysis performed on SERBCORP as a whole, see Appendix F). Lists of collocates are

presented first by frequency and then also by MI score. As with keywords, only the top

50 collocates are shown in the tables in the body of the text; full lists are presented in

appendices. Lastly, a sample of the most frequent n-grams in the 5+ hits section of

SERBCORP is shown and examined.
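
As a reminder of what an MI score expresses, the following minimal sketch computes a standard (pointwise) mutual information value for a node-collocate pair: the base-2 logarithm of the ratio between the observed and the expected co-occurrence frequency. Implementations differ in detail, for example in whether and how the collocation span enters the expected frequency, so this should not be read as the exact formula used by WST. All numbers in the example are invented.

```python
import math


def mutual_information(joint, f_node, f_collocate, corpus_size, span=1):
    """Pointwise MI: log2 of observed over expected co-occurrence frequency.

    `span` scales the expected frequency by the number of positions
    searched around the node (an assumption; conventions vary).
    """
    expected = f_node * f_collocate * span / corpus_size
    return math.log2(joint / expected)


def is_significant(mi_score, minimum=3.0):
    """Apply the conventional MI >= 3 cutoff."""
    return mi_score >= minimum


# Invented frequencies for illustration only.
mi = mutual_information(joint=80, f_node=1000, f_collocate=200,
                        corpus_size=1_000_000)
print(f"MI = {mi:.2f}, significant: {is_significant(mi)}")
```

Because MI is a ratio-based measure, it favors collocates that are rare elsewhere in the corpus, which is why the lists ranked by frequency and by MI score differ.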

6.2.1 Collocation analysis (5+ hits section of SERBCORP). Collocation

analysis of the lemma JEZIK conducted on the 5+ hits section of SERBCORP produced a

total of 305 lemma collocates of the lemma JEZIK (Appendix G, Tables G1 and G2).

Table 17 shows the top 50 lemma collocates by frequency. As in SERBCORP (Appendix

F, Table F1), the most frequent lemma collocate of the lemma JEZIK is srpski ‘Serbian’

with 3,449 occurrences in 802 texts.

The prominent semantic fields are similar to those in SERBCORP, with most

high-ranking items suggesting a discourse of construction and maintenance of national

identity (see Appendix F). However, items referring to literature are now unaccompanied

by items referring to translation, while the semantic field of culture remains marginal

(one item). The semantic fields of school and foreign languages are also somewhat less

prominent. Considering the prominence of items referring to translation and

education above (particularly among key lemmas in SERBCORP), it seems safe to

conclude at this point that these two fields account for much of the information ‘loss’ due

to sampling. On the other hand, in line with the above demonstrated trend of increased

relevance of articles with higher numbers of hits for the lemma JEZIK, srpski ‘Serbian’,

hrvatski ‘Croatian’ (rank 15, 363 occurrences), and crnogorski ‘Montenegrin’ (rank 16,

351 occurrences) are joined by srpskohrvatski ‘Serbo-Croatian’ (rank 40, 182

occurrences) and bosanski ‘Bosnian’ (rank 46, 165 occurrences) in the top 50. Other

pertinent items remain: narod ‘people’ (rank 27, 252 occurrences), nacionalni

‘national’ (rank 31, 222 occurrences), ime ‘name’ (rank 36, 208 occurrences), and postoji

‘exists’ (rank 48, 161 occurrences), with the addition of nov ‘new’ (rank 25, 254

occurrences) and pitanje ‘question’ (rank 45, 168 occurrences). Pismo ‘alphabet’ (rank

10, 457 occurrences) and rečnik ‘dictionary’ (rank 50, 158 occurrences), finally, suggest

a discourse on language policy, and specifically selection and codification (Haugen,

1972).

Table 17

Top 50 Lemma Collocates of the Lemma JEZIK ‘Language’ in the 5+ Hits Section of

SERBCORP (by Frequency)

N Collocate (English) Collocate (Serbian) MI score Texts Total


1 Serbian srpski 8.80 802 3449
2 foreign strani 13.02 346 1011
3 that taj 7.64 484 791
4 English engleski 12.63 323 693
5 mother maternji 8.86 296 636
6 own svoj 7.70 360 608
7 speak govoriti 10.26 321 552
8 second drugi 11.05 307 507
9 itself sam 5.11 338 494
10 alphabet pismo 9.19 165 457
11 literature književnost 7.58 220 456
12 one jedan 6.22 281 454
13 all svi 8.24 290 406
14 our naš 8.91 257 401
15 Croatian hrvatski 8.10 157 363
16 Montenegrin crnogorski 12.77 112 351
17 learn učiti 9.51 175 340
18 literary književni 9.16 176 335
19 school škola 6.97 179 318
20 they oni 5.86 245 310
21 official službeni 14.15 107 285
22 use upotreba 9.35 136 274
23 this ovaj 6.53 211 265
24 instruction nastava 8.30 149 262
25 new nov 10.26 172 254
26 word reč 5.86 180 253
27 people narod 7.09 153 252
28 year godina 5.36 186 239
29 professor profesor 8.14 145 239
30 culture kultura 8.74 153 231
31 national nacionalni 8.64 126 222
32 first prvi 6.08 154 222
33 learning učenje 7.28 136 222
34 French francuski 11.78 115 215
35 his njegov 7.45 165 213
36 name ime 10.61 96 208
37 two dva 5.72 138 206
38 Russian ruski 7.98 89 202
39 say kazati 9.73 152 187
40 Serbo-Croatian srpskohrvatski 8.76 96 182
41 German nemački 7.66 112 177
42 subject predmet 11.06 103 170
43 book knjiga 5.41 118 169
44 their njihov 7.48 135 168
45 question pitanje 5.88 108 168
46 Bosnian bosanski 9.30 54 165
47 written pisan 11.83 119 163
48 exists postoji 9.38 126 161
49 know znati 9.44 122 159
50 dictionary rečnik 6.66 77 158

Table 18

Top 50 Lemma Collocates of the Lemma JEZIK ‘Language’ in the 5+ Hits Section of

SERBCORP (by MI Score)

N Collocate (English) Collocate (Serbian) MI score Texts Total


1 official službeni 14.15 107 285
2 different različit 13.87 61 83
3 Slovene slovenski 13.86 58 101
4 learn naučiti 13.36 66 78
5 department katedra 13.25 57 91
6 foreign strani 13.02 346 1011
7 use koristiti 12.79 55 76
8 Montenegrin crnogorski 12.77 112 351
9 English engleski 12.63 323 693
10 introduce uvesti 12.45 41 55
11 special poseban 11.94 71 93
12 written pisan 11.83 119 163
13 French francuski 11.78 115 215
14 people’s narodni 11.67 76 143
15 say reći 11.64 64 74
16 nation nacija 11.23 37 51
17 Latin latinica 11.10 27 41
18 subject predmet 11.06 103 170
19 second drugi 11.05 307 507
20 official zvaničan 10.91 72 112
21 common zajednički 10.79 64 83
22 group grupa 10.75 43 65
23 difference razlika 10.73 35 44
24 standardization standardizacija 10.72 56 117
25 name ime 10.61 96 208
26 poetry poezija 10.49 63 50
27 see videti 10.42 23 27
28 section odeljenje 10.40 38 59
29 use služiti 10.39 29 34
30 speak govoriti 10.26 321 552
31 new nov 10.26 172 254
32 contemporary savremeni 10.23 73 106
33 spoken govorni 10.14 28 39
34 violence nasilje 10.11 23 28
35 desire želeti 10.03 49 53
36 Hungarian mađarski 9.85 51 108
37 Albanian albanski 9.84 37 76
38 say kazati 9.73 152 187
39 engage baviti 9.66 39 46
40 elementary osnovni 9.64 107 138
41 system sistem 9.58 29 34
42 renaming preimenovanje 9.54 45 94
43 learn učiti 9.51 175 340
44 title naslov 9.50 26 38
45 dialect dijalekat 9.50 46 78
46 her njen 9.47 29 33
47 written napisan 9.45 29 35
48 know znati 9.44 122 159
49 basis osnov 9.41 60 72
50 translate prevoditi 9.40 42 56

The most significant collocates by MI score (Table 18) present a more opaque pattern, with

considerably more attributive adjectives in the list, as above: službeni ‘official’ (rank 1,

285 occurrences) and zvaničan ‘official’ (rank 20, 112 occurrences), različit ‘different’

(rank 2, 83 occurrences), strani ‘foreign’ (rank 6, 1,011 occurrences), poseban ‘special’

(rank 11, 93 occurrences), pisan ‘written’ (rank 12, 163 occurrences), zajednički

‘common’ (rank 21, 83 occurrences), nov ‘new’ (rank 31, 254 occurrences), savremeni

‘contemporary’ (rank 32, 106 occurrences), govorni ‘spoken’ (rank 33, 39 occurrences).

However, we do note references to minorities: albanski ‘Albanian’ (rank 37, 76

occurrences) and mađarski ‘Hungarian’ (rank 36, 108 occurrences), and a set of most

pertinent items suggesting identity contestation such as crnogorski ‘Montenegrin’ (rank

8, 351 occurrences), ime ‘name’ (rank 25, 208 occurrences), and preimenovanje

‘renaming’ (rank 42, 94 occurrences). New, previously unattested items include nacija

‘nation’ (rank 16, 51 occurrences), latinica ‘Latin (alphabet)’ (rank 17, 41 occurrences),

grupa ‘group’ (rank 22, 65 occurrences), and nasilje ‘violence’ (rank 34, 28 occurrences).
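
The divergence between the MI-based and the frequency-based rankings can be illustrated with a short sketch. It re-sorts a handful of rows copied from Table 18 by MI score and by total occurrences. The selection of rows and the helper name `ranked` are illustrative choices of mine and not part of the original analysis.

```python
# (English gloss, Serbian lemma, MI score, total occurrences), copied from Table 18
collocates = [
    ("official", "službeni", 14.15, 285),
    ("different", "različit", 13.87, 83),
    ("foreign", "strani", 13.02, 1011),
    ("Montenegrin", "crnogorski", 12.77, 351),
    ("English", "engleski", 12.63, 693),
    ("nation", "nacija", 11.23, 51),
    ("speak", "govoriti", 10.26, 552),
    ("violence", "nasilje", 10.11, 28),
]


def ranked(rows, key_index):
    """Return the Serbian lemmas sorted in descending order of the chosen column."""
    ordered = sorted(rows, key=lambda row: row[key_index], reverse=True)
    return [row[1] for row in ordered]


by_mi = ranked(collocates, 2)    # column 2 holds the MI score
by_freq = ranked(collocates, 3)  # column 3 holds the total occurrences

# Print the two rankings side by side for comparison
for rank, (mi_item, freq_item) in enumerate(zip(by_mi, by_freq), start=1):
    print(f"{rank:>2}  MI: {mi_item:<12} frequency: {freq_item}")
```

On this subset, the frequency ranking is led by strani and engleski, whereas the MI ranking is headed by službeni and različit, which shows how differently the two measures order the same items.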

In conclusion, similar to keywords, collocation patterns present the analyst with a

wealth of information which is difficult to explore in an efficient but principled manner,

although, again, frequency does seem to be a better guide to insights into discursive

patterns than MI scores. However, similar to keyword analysis, collocation analysis

offers another analytical technique33 which has the potential to increase its usefulness for

the identification of discourses and ideologies, to which we now turn.

6.2.2 N-grams. Given the demonstrated higher relevance of the 5+ hits articles

for our purposes here, n-gram analysis was conducted on the 5+ hits section of

SERBCORP only. As expected, the total number of identified recurrent phrases was

large (3,753), and most were bigrams (2,381). The list of bigrams presented here (Table

19) is a researcher-selected sample from the top 100 most frequent phrases in every

category.
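
A minimal sketch of recurrent-phrase extraction may help make the procedure concrete. It assumes whitespace-tokenized concordance lines and a simple frequency cut-off. The three input lines are invented for illustration only, and the function names and the `min_freq` threshold are hypothetical rather than the settings of the software used in this study.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield every contiguous n-token sequence in a list of tokens."""
    return (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def recurrent_ngrams(lines, sizes=range(2, 7), min_freq=2):
    """Count n-grams of each size across all lines and keep the recurrent ones."""
    counts = {n: Counter() for n in sizes}
    for line in lines:
        tokens = line.lower().split()  # naive whitespace tokenization (assumption)
        for n in sizes:
            counts[n].update(ngrams(tokens, n))
    return {
        n: [(" ".join(gram), freq)
            for gram, freq in counter.most_common()
            if freq >= min_freq]
        for n, counter in counts.items()
    }


# Hypothetical concordance lines, invented for illustration only
lines = [
    "profesor za srpski jezik i književnost",
    "nastava za srpski jezik i književnost",
    "odbor za standardizaciju srpskog jezika",
]

result = recurrent_ngrams(lines)
print(result[2])  # recurrent bigrams
print(result[5])  # recurrent 5-grams
```

Because longer phrases contain shorter ones, the same lexical material resurfaces at several n-gram sizes, which is why the longer n-grams discussed below often read as more complete versions of the shorter ones.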

Again, similar to keyword associates, this represents an improvement on collocate

lists as we can now see the node word (here, different forms of the lemma JEZIK) in a

variety of phrasal contexts. Further, although this n-gram analysis is based on the

concordance lines of different forms of the lemma JEZIK, we can see a large number of

relevant phrases that do not contain any of the forms of this lemma. Because n-grams are

based on lexical patterns identified by collocation analysis, we are likely to see a lot of

the same items, only contextualized. Indeed, if we look at the bigrams here, we can see

many of the same lexical items and traces of that same discourse of construction and

maintenance of national identity in the (implied) opposition of phrases such as srpski

jezik ‘Serbian language’, maternji jezik ‘mother tongue’, svoj jezik ‘own language’, naš

jezik ‘our language’ (ranks 1, 2, 5, 7), on the one hand, and the phrases strani jezik

‘foreign language’, engleski jezik ‘English language’, and drugi (strani) jezik ‘second

(foreign) language’ (ranks 3, 4, 11), on the other. We also note the high prominence of

phrases pointing to regional ethnolinguistic identities: srpski jezik ‘Serbian language’,

crnogorski jezik ‘Montenegrin language’, hrvatski jezik ‘Croatian language’, and

bosanski jezik ‘Bosnian language’ (ranks 1, 6, 9, 10). The top trigrams, for example,

confirm the association between language and literature (jezik i književnost ‘language and

literature’, rank 15) and language and culture (jezik i kulturu ‘language and culture’, rank

26), but also point to a discourse related to language policy (jezik i pismo ‘language and

alphabet’, rečnik srpskog jezika ‘dictionary [of the] Serbian language’, odbor za

standardizaciju ‘board for standardization’, ranks 16, 18, 19) and a discourse of

contestation (preimenovanje srpskog jezika ‘renaming [of the] Serbian language’, o

preimenovanju jezika ‘about [the] renaming [of] language’, ranks 22, 24).

The top n-grams with four, five and six constituents confirm these and other

already attested patterns. Thus, in the 4-gram section, we see more evidence of a

discourse on minority language rights (na jezicima nacionalnih manjina ‘in [the]

languages [of] national minorities’, rank 32), endangerment (za zaštitu srpskog jezika ‘for

[the] protection [of the] Serbian language’, za odbranu srpskog jezika ‘for [the] defense

[of the] Serbian language’, ranks 35, 38), and contestation (preimenovanje srpskog jezika

u ‘renaming [of the] Serbian language into’, ne postojanju crnogorskog književnog ‘non-

existence [of the] Montenegrin literary’, o ne postojanju crnogorskog ‘about [the] non-

existence [of] Montenegrin’, ranks 34, 39, 42). In the 5-gram section, we see traces of a

discourse of language policy involving institutional control over language (odbora za

standardizaciju srpskog jezika ‘board for [the] standardization [of the] Serbian language’,

zakon o službenoj upotrebi jezika ‘law on [the] official use [of] language’, institut za

srpski jezik SANU ‘SANU institute for [the] Serbian language’, ranks 45, 54, 59), as well

as discourses of endangerment (e.g., primena latiničnog pisma srpskog jezika ‘use [of

the] Latin alphabet [of the] Serbian language’, sačuvati sopstveni jezik [i] njegovu

posebnost ‘preserve [one’s] own language [and] its autonomy’, rat za srpski jezik i ‘war

for [the] Serbian language and’, ranks 58, 60, 65), contestation (e.g., srpski jezik

preimenuju u crnogorski ‘rename [the] Serbian language into [the] Montenegrin’, rank

63), and ethnolinguistic identity (svoju nacionalnost i svoj jezik ‘[one’s] own nationality

and [one’s] own language’, rank 64).

Table 19

Sample of the Most Frequent N-grams in the 5+ Hits Section of SERBCORP (by Number

of Constituents and Frequency)

N N-gram (English) N-gram (Serbian) Freq.


2-grams
1 Serbian language srpski jezik 955
2 mother tongue maternji jezik 267
3 foreign language strani jezik 258
4 English language engleski jezik 173
5 own language svoj jezik 120
6 Montenegrin language crnogorski jezik 96
7 our language naš jezik 92
8 literary language književni jezik 89
9 Croatian language hrvatski jezik 81
10 Bosnian language bosanski jezik 72
11 second language drugi jezik 69
12 one language jedan jezik 68
13 Russian language ruski jezik 59
14 official language službeni jezik 57
3-grams
15 language and literature jezik i književnost 149
16 language and alphabet jezik i pismo 55
17 second foreign language drugi strani jezik 31
18 dictionary (of the) Serbian language rečnik srpskog jezika 23
19 board for standardization odbor za standardizaciju 21
20 two foreign languages dva strana jezika 19
21 Serbian and Croatian srpskog i hrvatskog 18
22 renaming (of the) Serbian language preimenovanje srpskog jezika 17
23 as an elective kao izborni predmet 17
24 about (the) renaming (of) language o preimenovanju jezika 15
25 science of language nauka o jeziku 15
26 language and culture jezik i kulturu 15
4-grams
27 Serbian language and literature srpski jezik i književnost 84
28 for (the) standardization (of the) Serbian language za standardizaciju srpskog jezika 63
29 Institute for (the) Serbian language instituta za srpski jezik 36
30 board for (the) standardization (of) Serbian odbora za standardizaciju srpskog 32
31 about (the) official use (of the) language o službenoj upotrebi jezika 28
32 in (the) languages (of) national minorities na jezicima nacionalnih manjina 26
33 (of the) literary and people’s language književnog i narodnog jezika 21
34 renaming (of the) Serbian language into preimenovanje srpskog jezika u 15
35 for (the) protection (of the) Serbian language za zaštitu srpskog jezika 12
36 mother tongue and literature maternji jezik i književnost 12
37 law on official use zakon o službenoj upotrebi 11
38 for (the) defense (of the) Serbian language za odbranu srpskog jezika 10
39 non-existence (of the) Montenegrin literary ne postojanju crnogorskog književnog 8
40 existence (of the) Montenegrin literary language postojanju crnogorskog književnog jezika 8
41 Serbian science (of) language srpska nauka o jeziku 7
42 about (the) non-existence (of) Montenegrin o ne postojanju crnogorskog 7
43 foreign language and mathematics strani jezik i matematika 7
5-grams
44 for (the) Serbian language and literature za srpski jezik i književnost 45
45 board for (the) standardization (of the) Serbian language odbora za standardizaciju srpskog jezika 32
46 official use (of) language and alphabet službenoj upotrebi jezika i pisma 26
47 language with elements (of) national culture jezik sa elementima nacionalne kulture 25
48 (of the) department (of) Serbian language and odseka za srpski jezik i 21
49 (of the) SANU institute for (the) Serbian language instituta za srpski jezik SANU 21
50 (of the) Serbo-Croatian literary and people’s language srpskohrvatskog književnog i narodnog jezika 14
51 in mathematics and mother tongue iz matematike i maternjeg jezika 13
52 Serbian language and (the) Cyrillic alphabet srpski jezik i ćirilično pismo 11
53 Serbian language in official use u službenoj upotrebi srpski jezik 11
54 law on (the) official use (of) language zakon o službenoj upotrebi jezika 11
55 for (the) protection (of the) Cyrillic (of the) Serbian language za zaštitu ćirilice srpskog jezika 11

56 association for (the) protection (of the) Cyrillic (of) Serbian udruženja za zaštitu ćirilice srpskog 8
57 non-existence (of the) Montenegrin literary language ne postojanju crnogorskog književnog jezika 8
58 use (of the) Latin alphabet (of the) Serbian language primena latiničnog pisma srpskog jezika 7
59 SANU institute for (the) Serbian language institut za srpski jezik SANU 6
60 preserve (one’s) own language and (its) autonomy sačuvati sopstveni jezik njegovu posebnost 6
61 rename (the) language into (the) Montenegrin language jezik preimenuju u crnogorski jezik 6
62 Serbian language into mother tongue srpski jezik u maternji jezik 6
63 rename (the) Serbian language into Montenegrin srpski jezik preimenuju u crnogorski 6
64 (one’s) own nationality and (one’s) own language svoju nacionalnost i svoj jezik 5
65 war for (the) Serbian language and rat za srpski jezik i 5
66 professor (of the) Serbian language and literature profesor srpskog jezika i književnosti 5
6-grams
67 about (the) official use (of) language and alphabet o službenoj upotrebi jezika i pisma 25
68 (of the) department of (the) Serbian language and literature odseka za srpski jezik i književnost 18
69 Bosnian language with elements of national culture bosanski jezik sa elementima nacionalne kulture 13
70 association for protection (of) Cyrillic (of the) Serbian language udruženja za zaštitu ćirilice srpskog jezika 8
71 dictionary (of) Serbo-Croatian literary and people’s language rečnik srpskohrvatskog knjiž. i narodnog jezika 8
72 students (in the) department (of the) Serbian language and studenti odseka za srpski jezik i 7
73 rename (the) Serbian language into (the) Montenegrin language srpski jezik preimenuju u crnogorski jezik 6
74 subject (of) Serbian language into mother tongue predmeta srpski jezik u maternji jezik 6
75 in (the) languages (of) national minorities and for na jezicima nacionalnih manjina i za 5
76 official use (of) other languages and alphabets službena upotreba drugih jezika i pisama 5
77 Serbian language and literature into mother srpski jezik i književnost u maternji 5
78 chair (of) board for (the) standardization (of) Serbian language preds. odbora za standardizaciju srpskog jezika 5
79 fellow (of the) SANU institute for (the) Serbian language saradnik instituta za srpski jezik SANU 5
80 war for (the) Serbian language and orthography rat za srpski jezik i pravopis 5

Finally, in the 6-gram section, we see many of the same phrases, only more

complete (e.g., udruženja za zaštitu ćirilice srpskog jezika ‘association for [the]

protection [of the] Cyrillic [of the] Serbian language’, srpski jezik preimenuju u

crnogorski jezik ‘rename [the] Serbian language into [the] Montenegrin language’, rat za

srpski jezik i pravopis ‘war for [the] Serbian language and orthography’, ranks 70, 73, 80)

and a phrase pointing to a conception of language in terms of minority ethnocultural

rights (bosanski jezik sa elementima nacionalne kulture ‘Bosnian language with elements

of national culture’, rank 69).

As can be seen, however, even this sample of n-grams (80/3,753, or about 2%)

presents an amount of information which is not easily dealt with by an analyst. In other

words, although we have been able to identify a certain number of patterns pointing to

language-related discourses with clear ideological implications such as those of

endangerment, institutional control, minority rights, and contestation, we cannot be sure

we are not missing anything and, of course, it is still difficult to make a principled

decision about what to actually focus our analysis on. Also, even if we decided that this

was enough information to choose a focus, how do we identify the most representative

texts for qualitative analysis, for example? As mentioned above, available research

favors examination of concordance lines at this point, but again we are dealing with

thousands of occurrences of potentially relevant lexical items and hundreds of

concordance lines. I noted above that Hunston (2002), for instance, suggests

concentrating on a random sample of concordance lines to get around this problem, but

that hardly solves it. Similarly, using the ‘plot’ function to identify texts with

the highest numbers of hits for a particular lemma form of the node word (as in Vessey,

2013a) seems inadequate and ineffective if we are looking to account for the corpus as a

whole. Fortunately, exploratory factor analysis and cluster analysis seem to provide

solutions for these and a number of other issues in corpus-based discourse and ideology

research.

6.3 Exploratory Factor Analysis

Factor analysis resulted in the adoption of a 12-factor solution accounting for

34.13% of the total variance in the data. In contrast to Fitzsimmons-Doolan (2011, 2014),

however, the factors were not interpreted as language ideologies, but rather as indicators

of the most salient topics and thus discourses in the data. The factors were labeled as

follows: Language education (5.40 %), Cyrillic-only (3.18 %), Entrance exams (3.16 %),

Officialization of Montenegrin 1 (2.83 %), Minority language rights (2.72 %),

Contestation over language ownership and name (2.67 %), Literature and publishing

(2.61 %), Officialization of Montenegrin 2 (2.61 %), Foreign language education (2.60

%), Linguistics as a science, lexicography, standardization and contestation (2.55 %),

Officialization of Bosnian (2.10 %), and Linguacultural diplomacy, language, and culture

(1.73 %). Descriptive statistics for all of the variables in the preferred factor solution are

presented in Table 20. The eigenvalues of the unrotated factor analysis are shown in

Table 21. Figure 4 shows the scree plot of eigenvalues. Table 22 presents the rotated

factor patterns for the 12-factor solution using the Varimax rotation. Finally, Table 23

shows a summary of the factorial structure, with the salient collocates for each factor

(those loading at ≥ .30) and their factor loadings.
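
The selection of salient collocates can be sketched as follows, using four rows copied from Table 22. Two points are assumptions on my part rather than statements from the analysis: the cut-off is applied to the absolute value of each loading, and a collocate's primary factor is taken to be the one on which its absolute loading is largest. The function names `salient` and `primary` are hypothetical.

```python
FACTORS = [f"F{i}" for i in range(1, 13)]

# Rows copied from Table 22: Serbian collocate -> loadings on F1..F12
loadings = {
    "razred": [.797, -.015, .176, -.049, -.025, -.048,
               -.055, .010, .322, -.048, .005, -.085],
    "izborni": [.501, -.016, -.033, -.043, .035, -.039,
                -.042, .009, -.046, -.024, .523, -.018],
    "pismo": [-.045, .817, -.021, -.022, -.072, .063,
              -.030, -.079, -.028, .000, .005, .024],
    "Nikšić": [-.034, .027, .111, .454, -.032, -.038,
               -.024, .466, -.123, -.002, -.012, -.213],
}


def salient(row, threshold=0.30):
    """Return {factor: loading} for loadings whose absolute value reaches the cut-off."""
    return {f: x for f, x in zip(FACTORS, row) if abs(x) >= threshold}


def primary(row):
    """Return the factor with the largest absolute loading (assumed rule)."""
    return max(zip(FACTORS, row), key=lambda pair: abs(pair[1]))[0]


for collocate, row in loadings.items():
    print(collocate, salient(row), "primary:", primary(row))
```

For these four rows the assumed rule agrees with Table 23: izborni and Nikšić load saliently on two factors each and appear in parentheses under the factor with the lower loading. This should not be read as the exact criterion used to exclude collocates from the factor scores, since not every row of Table 23 follows it.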

Table 20

Descriptive Statistics for the Variables in the 12-factor Solution (N = 943, k = 107)

Collocate of JEZIK (English) Collocate of JEZIK (Serbian) Mean Minimum value Maximum value Range Standard deviation
academy akademija 0.3 0.0 14.6 14.6 1.1
association udruženje 0.2 0.0 16.0 16.0 1.2
attend pohađati 0.1 0.0 14.4 14.4 0.8
authorities vlast 0.4 0.0 11.8 11.8 1.2
be able to moći 0.4 0.0 11.5 11.5 1.1
begin početi 0.6 0.0 23.5 23.5 1.5
Belgrade Beograd 1.1 0.0 35.3 35.3 2.3
board odbor 0.5 0.0 29.4 29.4 2.2
book knjiga 2.1 0.0 33.1 33.1 4.1
Bosniak bošnjački 0.3 0.0 25.3 25.3 1.7
Bosnian bosanski 0.3 0.0 28.6 28.6 2.0
call zvati 0.3 0.0 11.8 11.8 0.9
center centar 0.5 0.0 20.2 20.2 1.7
children deca 0.3 0.0 11.1 11.1 0.9
class period čas 0.6 0.0 24.4 24.4 2.0
common zajednički 0.3 0.0 19.6 19.6 0.9
community zajednica 0.4 0.0 18.9 18.9 1.5
compulsory obavezan 0.2 0.0 12.2 12.2 1.0
constitution ustav 0.5 0.0 30.5 30.5 2.4
course kurs 0.2 0.0 22.0 22.0 1.3
Croatia Hrvatska 0.4 0.0 22.2 22.2 1.6
Croatian hrvatski 0.8 0.0 21.4 21.4 2.4
Croats Hrvati 0.3 0.0 16.7 16.7 1.2
cultural kulturni 0.6 0.0 13.2 13.2 1.4
culture kultura 1.5 0.0 31.8 31.8 2.9
curriculum program 1.0 0.0 37.8 37.8 2.7
Cyrillic (n.) ćirilica 0.7 0.0 32.7 32.7 3.0
Cyrillic (adj.) ćirilično 0.1 0.0 10.9 10.9 0.6
decision odluka 0.5 0.0 18.8 18.8 1.8
department odsek 0.1 0.0 16.4 16.4 0.8
department katedra 0.3 0.0 15.3 15.3 1.3
dictionary rečnik 0.8 0.0 42.0 42.0 3.5
edition izdanje 0.3 0.0 14.1 14.1 1.1
education obrazovanje 0.8 0.0 27.0 27.0 2.2
elective izborni 0.1 0.0 15.6 15.6 0.8
element element 0.1 0.0 6.8 6.8 0.5
elementary osnovna 1.0 0.0 21.5 21.5 2.2
exam ispit 0.4 0.0 19.6 19.6 2.0

expression izraz 0.3 0.0 10.4 10.4 1.0
first prvi 2.0 0.0 30.9 30.9 2.5
foreign strani 2.2 0.0 31.4 31.4 4.0
framework okvir 0.3 0.0 11.8 11.8 0.9
grade razred 0.7 0.0 24.4 24.4 2.5
high school (gen.) srednja 0.3 0.0 12.2 12.2 1.0
high school (acad.) gimnazija 0.4 0.0 19.7 19.7 1.7
Hungarian mađarski 0.2 0.0 47.1 47.1 1.7
institute institut 0.3 0.0 24.8 24.8 1.7
instruction nastava 1.0 0.0 36.2 36.2 3.0
instructors nastavnici 0.6 0.0 23.4 23.4 2.1
interest interesovanje 0.2 0.0 7.8 7.8 0.6
introduction uvođenje 0.2 0.0 14.3 14.3 1.0
knowledge znanje 0.6 0.0 16.6 16.6 1.7
Latin latinica 0.5 0.0 36.1 36.1 2.5
law zakon 0.6 0.0 29.4 29.4 2.1
level nivo 0.3 0.0 22.1 22.1 1.2
linguist lingvista 0.3 0.0 13.7 13.7 1.1
linguistic jezički 0.9 0.0 19.8 19.8 2.0
linguistic lingvistički 0.2 0.0 10.1 10.1 0.7
linguistics lingvistika 0.1 0.0 12.5 12.5 0.7
literary književni 0.9 0.0 20.0 20.0 2.1
literature književnost 1.4 0.0 34.4 34.4 3.2
mathematics matematika 0.3 0.0 19.3 19.3 1.5
minority manjina 0.6 0.0 31.1 31.1 2.8
Montenegrin crnogorski 0.9 0.0 40.0 40.0 3.4
Montenegrins Crnogorci 0.2 0.0 19.3 19.3 1.2
Montenegro Crna 1.2 0.0 36.1 36.1 3.8
mother (adj.) maternji 0.8 0.0 24.6 24.6 2.2
name ime 0.9 0.0 22.5 22.5 2.2
national nacionalni 1.4 0.0 27.6 27.6 3.4
Nikšić Nikšić 0.1 0.0 15.3 15.3 0.9
official službeni 0.6 0.0 43.2 43.2 2.9
part deo 1.2 0.0 20.7 20.7 2.9
philology filološki 0.2 0.0 9.9 9.9 0.9
philosophy filozofski 0.2 0.0 8.7 8.7 0.8
poem pesma 0.4 0.0 21.3 21.3 1.8
poetry poezija 0.5 0.0 24.5 24.5 2.0
professor profesor 1.8 0.0 39.2 39.2 3.9
protection zaštita 0.2 0.0 10.5 10.5 0.8
publish objaviti 0.4 0.0 14.7 14.7 1.1
published objavljen 0.3 0.0 8.3 8.3 0.8
renaming preimenovanje 0.1 0.0 15.3 15.3 0.9
rights prava 0.9 0.0 33.3 33.3 2.2
Romanian rumunski 0.1 0.0 15.0 15.0 0.8
Ruthenian rusinski 0.1 0.0 12.7 12.7 0.8
SANU SANU 0.2 0.0 19.6 19.6 1.3
school (K-12) škola 2.8 0.0 35.8 35.8 6.0
school (univ.) fakultet 1.1 0.0 38.8 38.8 3.3
science nauka 0.7 0.0 23.6 23.6 1.8
scientific naučni 0.2 0.0 11.1 11.1 0.9
alphabet pismo 1.7 0.0 53.8 53.8 5.4
section odeljenje 0.4 0.0 21.9 21.9 1.7
Serbian srpski 8.5 0.0 68.6 68.6 10.2
Serbo-Croatian srpskohrvatski 0.2 0.0 29.1 29.1 1.3
Serbs Srbi 1.2 0.0 35.1 35.1 3.1
Slovak slovački 0.1 0.0 12.7 12.7 0.7
standard standardni 0.1 0.0 7.5 7.5 0.5
students (K-12) učenici 0.7 0.0 19.4 19.4 2.4
students (univ.) studenti 0.7 0.0 32.8 32.8 2.9
study učiti 0.9 0.0 27.8 27.8 2.6
subject predmet 0.9 0.0 32.8 32.8 2.9
teach predavati 0.2 0.0 12.1 12.1 1.0
teachers učitelji 0.2 0.0 15.7 15.7 0.8
use (n.) upotreba 0.8 0.0 31.8 31.8 2.5
war rat 0.4 0.0 16.3 16.3 1.2
word reč 2.6 0.0 53.5 53.5 4.6
work delo 1.3 0.0 25.9 25.9 2.2

writer pisac 0.8 0.0 30.1 30.1 2.1

Table 21

First 13 Eigenvalues of the Unrotated Factor Analysis (N = 943, k = 107)

Factor number Eigenvalue % of shared variance


1 9.213 8.610
2 5.352 5.002
3 4.297 4.016
4 3.846 3.595
5 3.376 3.155
6 3.124 2.920
7 2.955 2.762
8 2.781 2.599
9 2.547 2.380
10 2.430 2.271
11 2.033 1.900
12 1.951 1.823
13 1.795 1.678
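
The relationship between the two columns of Table 21 can be checked with a short sketch. It assumes that each percentage is the eigenvalue divided by the number of variables (k = 107), as is conventional when eigenvalues are extracted from a correlation matrix. That the analysis was run this way is an inference from the table and not something stated above, and the function name is hypothetical.

```python
K = 107  # number of variables (collocates), from the caption of Table 21

# (eigenvalue, reported % of shared variance) for the 13 factors in Table 21
table_21 = [
    (9.213, 8.610), (5.352, 5.002), (4.297, 4.016), (3.846, 3.595),
    (3.376, 3.155), (3.124, 2.920), (2.955, 2.762), (2.781, 2.599),
    (2.547, 2.380), (2.430, 2.271), (2.033, 1.900), (1.951, 1.823),
    (1.795, 1.678),
]


def percent_of_variance(eigenvalue, k=K):
    """Express an eigenvalue as a percentage of k units of variance (assumed convention)."""
    return 100.0 * eigenvalue / k


# Compare the computed percentage with the reported one, factor by factor
for number, (eigenvalue, reported) in enumerate(table_21, start=1):
    computed = percent_of_variance(eigenvalue)
    print(f"{number:>2}  {eigenvalue:.3f}  computed {computed:.3f}  reported {reported:.3f}")

# Largest discrepancy between computed and reported percentages
max_gap = max(abs(percent_of_variance(e) - r) for e, r in table_21)
print(f"largest discrepancy: {max_gap:.4f}")
```

Under this assumption the computed values match the reported percentages to within rounding of the three-decimal eigenvalues.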

Figure 4. Scree plot of eigenvalues

Table 22

Rotated Factor Patterns for the 12-factor Solution (Varimax Rotation)

English Serbian F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12


academy akademija .035 -.038 -.022 .027 .018 .463 .047 .004 -.029 .191 -.107 -.014
association udruženje .001 .356 -.004 .002 -.032 .043 .059 .112 -.014 -.072 -.028 .096
attend pohađati .353 -.037 .309 .006 .064 -.032 -.093 .025 .000 -.056 .093 .101
authorities vlast .152 .058 -.030 .096 .073 -.002 -.127 .392 -.072 -.059 .079 -.127
be able to moći .131 -.028 .262 -.057 .059 -.083 -.009 -.003 .482 -.020 .035 -.114
begin početi .194 -.091 -.018 -.001 -.010 .112 .033 .019 .375 -.047 -.120 .013
Belgrade Beograd .002 .010 .061 .209 -.065 -.055 .020 -.134 -.046 -.023 -.014 .349
board odbor .116 .054 -.005 .155 .076 -.008 -.042 .010 -.055 .220 .351 .024
book knjiga -.089 -.027 -.064 -.027 -.032 -.046 .562 -.067 -.059 .014 -.020 .016
Bosniak bošnjački .047 -.009 -.015 .054 .002 .174 -.046 .143 -.029 .051 .669 -.028
Bosnian bosanski .080 -.025 -.020 -.043 -.008 .094 -.025 .048 -.006 -.003 .804 -.015
call (v.) zvati -.052 -.049 -.053 -.003 -.066 .334 -.068 .102 -.018 .033 .065 -.127
center centar -.017 -.023 -.022 .079 -.030 -.055 -.055 -.018 .003 -.021 -.002 .469
children deca .527 -.064 .184 .001 -.023 -.077 -.157 .010 .010 -.125 -.022 .031
class period čas .611 -.041 .142 -.020 -.043 -.044 -.073 .026 .123 -.056 .038 .053
common zajednički .007 -.012 -.046 .003 -.007 .136 -.033 .000 .461 .095 .016 -.049
community zajednica -.038 .043 -.040 .001 .384 .021 -.097 .124 -.002 -.027 .013 -.014
compulsory obavezan .648 .017 .027 -.040 -.014 -.061 -.069 -.017 -.018 -.020 .092 -.045
constitution ustav -.040 .464 -.048 .090 .117 -.052 -.116 .186 -.064 .001 -.033 -.142
course kurs .123 -.048 .005 .109 .004 -.065 -.155 -.037 .045 -.056 -.051 .394
Croatia Hrvatska -.043 -.012 .000 .005 .047 .660 -.042 -.075 .002 .035 .011 -.057
Croatian hrvatski -.067 .099 .005 .013 .117 .622 -.085 -.026 .011 .136 .121 -.070
Croats Hrvati -.041 .034 -.005 -.021 .137 .659 -.091 -.030 -.007 .130 .081 -.028
cultural kulturni -.094 .134 -.044 -.033 -.005 .029 .109 .045 -.028 .004 .005 .421
culture kultura -.101 .029 -.071 .022 -.020 -.040 .163 -.001 -.004 .044 .096 .377
curriculum program .353 -.040 -.038 .112 .063 -.034 -.161 -.047 .092 -.028 .005 .175
Cyrillic (n.) ćirilica -.019 .735 .002 -.053 -.135 .125 .024 -.016 -.023 -.064 .016 .084
Cyrillic (adj.) ćirilično -.024 .518 -.023 .018 .055 -.078 -.058 -.023 .002 .071 -.023 -.066
decision odluka .018 .057 .046 .382 .044 -.045 -.066 .219 -.106 .013 .054 -.212
department odsek -.022 -.015 .014 .459 -.026 .046 .003 .160 -.051 .064 -.018 .052
department katedra .032 -.015 -.077 .387 .026 -.010 .065 -.105 .314 .037 -.007 .134
dictionary rečnik .011 -.042 -.023 -.060 -.019 -.044 .104 -.060 -.100 .485 -.058 .011
edition izdanje .008 -.045 -.035 -.050 -.017 -.124 .409 -.056 -.088 .314 -.054 .017
education obrazovanje .302 .010 .096 .277 .112 -.104 -.151 -.013 .028 -.031 .015 -.008
elective izborni .501 -.016 -.033 -.043 .035 -.039 -.042 .009 -.046 -.024 .523 -.018
element element .116 -.002 -.035 -.051 .022 -.042 -.002 -.012 -.026 -.031 .477 -.005
elementary osnovna .608 .011 .255 -.003 .022 -.072 -.084 -.020 .287 -.002 .017 -.058
exam ispit -.006 -.029 .758 -.032 -.029 -.042 -.047 -.035 .233 -.050 -.035 .030
expression izraz -.075 .003 -.063 -.088 -.077 -.020 .016 -.085 -.043 .311 .012 -.128

first prvi .477 -.004 .100 -.030 -.063 -.002 .127 -.088 .093 -.026 -.007 -.071
foreign strani .328 -.035 .054 -.048 -.039 -.112 -.141 -.045 .450 .017 -.102 .061
framework okvir .245 .000 .006 .043 -.014 -.080 -.026 -.024 .408 .091 .008 .018
grade razred .797 -.015 .176 -.049 -.025 -.048 -.055 .010 .322 -.048 .005 -.085
high school (gen.) srednja .084 .026 .631 .006 .008 -.031 -.061 -.033 .016 -.081 -.023 -.025
high school (acad.) gimnazija .115 -.042 .532 .206 -.030 -.061 -.052 .075 -.033 -.031 -.015 -.107
Hungarian mađarski -.015 -.001 .034 .000 .458 .013 -.017 -.039 .007 -.033 -.009 -.029
institute institut .008 -.055 -.020 .014 -.010 -.070 -.043 -.029 -.070 .183 -.039 .362
instruction nastava .448 -.023 .233 .185 .018 -.082 -.108 .031 .453 -.055 .083 .006
instructors nastavnici .570 -.026 .074 .086 -.030 -.047 -.074 -.012 .306 -.058 -.066 -.030
interest interesovanje .042 -.059 .048 .034 -.006 -.067 .052 -.024 .004 -.029 -.023 .391
introduction uvođenje .068 .077 -.023 .021 .014 -.029 -.065 .370 .111 .013 .136 .062
knowledge znanje .207 -.063 .303 .074 -.057 -.120 -.109 -.063 .362 -.026 -.055 .103
Latin latinica -.030 .547 -.004 -.066 -.130 .135 -.008 -.043 -.006 -.043 .037 .060
law zakon .104 .348 -.067 .095 .270 -.106 -.187 -.078 -.032 -.004 .103 -.143
level nivo .103 .008 .007 .003 .088 -.075 -.116 -.018 .577 .024 -.036 .021
linguist lingvista -.063 .015 -.025 .027 -.007 .219 -.038 .060 .041 .496 .081 .041
linguistic jezički -.103 .005 .010 -.076 -.024 .027 .003 .026 .139 .576 .045 -.053
linguistic lingvistički -.073 -.007 -.020 .078 -.005 .115 -.018 .157 .080 .438 .045 .037
linguistics lingvistika -.087 -.022 .007 .067 -.023 .103 -.056 .024 .060 .373 .048 .007
literary književni -.088 -.008 -.055 .000 -.044 .128 .464 -.012 .002 .147 -.006 .038
literature književnost -.020 -.036 .024 .237 -.033 .075 .443 .005 .033 .002 -.008 .124
mathematics matematika .210 -.036 .638 -.045 -.012 -.041 -.009 -.017 .012 -.047 -.044 .008
minority manjina -.028 .098 .018 -.037 .785 .011 -.077 -.026 .039 -.056 .246 -.055
Montenegrin crnogorski -.088 .024 -.044 .032 -.001 .078 -.066 .765 .012 .073 .001 -.004
Montenegrins Crnogorci -.064 -.021 -.046 -.046 .056 .101 -.059 .428 .032 .011 -.017 .015
Montenegro Crna -.095 .075 -.048 .075 -.002 .033 -.075 .744 -.029 .003 -.002 .006
mother (adj.) maternji .100 .045 .297 .185 .083 .008 -.098 .401 -.066 -.066 .022 -.091
name ime -.079 -.003 -.064 .000 -.013 .385 -.042 .146 -.041 .077 .027 -.100
national nacionalni -.024 .149 -.028 -.078 .459 .138 -.072 .027 .041 -.020 .395 -.017
Nikšić Nikšić -.034 .027 .111 .454 -.032 -.038 -.024 .466 -.123 -.002 -.012 -.213
official službeni -.039 .558 -.022 .021 .179 -.037 -.096 .118 -.003 .028 .058 -.087
part deo -.066 -.019 .000 -.042 .004 .011 .413 -.062 -.003 .014 -.031 .133
philology filološki .030 -.029 -.030 .426 -.033 .051 .040 -.113 .141 .069 .025 .261
philosophy filozofski -.013 -.009 .023 .550 -.026 .016 -.023 .204 -.049 .038 .013 -.030
poem pesma -.040 -.062 -.071 -.027 -.030 -.084 .358 -.049 -.009 -.072 -.030 -.153
poetry poezija -.057 -.062 -.060 .012 -.039 -.086 .316 -.059 -.013 -.066 -.023 -.142
professor profesor .096 .047 .122 .578 -.071 -.093 .035 .089 .094 .078 -.008 .018
protection zaštita -.036 .388 .005 -.024 .171 .019 -.033 .096 -.017 -.056 -.025 .042
publish objaviti -.029 .064 -.054 .016 -.010 .005 .514 -.048 -.053 -.025 -.004 -.010
published objavljen -.040 -.023 .057 -.029 -.031 -.044 .440 -.052 -.037 .124 -.032 -.014
renaming preimenovanje -.038 .001 .079 .327 -.040 .003 -.037 .483 -.112 -.008 -.003 -.210
rights prava -.067 .113 -.048 .049 .401 -.005 -.142 .130 .156 -.025 .100 -.109
Romanian rumunski .014 -.006 .011 -.017 .528 .050 .052 -.029 -.031 -.067 -.051 .009
Ruthenian rusinski .026 .006 .019 -.021 .709 .001 -.012 -.006 -.025 -.036 -.053 .014

SANU SANU .009 -.029 .004 -.026 .006 .130 .032 -.046 -.055 .487 -.028 .054
school (K-12) škola .570 -.040 .561 .021 -.027 -.083 -.198 .007 .200 -.100 .024 .096
school (univ.) fakultet .089 -.027 -.045 .657 .039 -.028 -.046 -.109 .174 -.020 -.007 .170
science nauka .043 -.005 -.043 .143 .000 .074 .079 .052 -.011 .352 -.063 .063
scientific naučni -.019 -.010 -.013 .165 -.011 .060 .093 .032 -.001 .365 -.016 .057
alphabet pismo -.045 .817 -.021 -.022 -.072 .063 -.030 -.079 -.028 .000 .005 .024
section odeljenje .146 -.039 .480 -.007 .026 -.015 -.042 -.036 .042 .032 -.014 .001
Serbian srpski -.063 .259 -.008 .165 .000 .351 .222 .230 -.044 .363 .011 .103
Serbo-Croatian srpskohrvatski -.034 .043 -.007 .036 -.018 .348 .047 .018 .025 .317 .029 .026
Serbs Srbi -.085 .162 -.057 -.066 .040 .501 -.016 .080 -.003 .034 .009 .003
Slovak slovački .029 .001 .024 -.023 .615 .012 .003 .035 -.026 -.007 -.055 .037
standard standardni -.048 .047 -.023 .000 -.048 .009 -.037 -.027 .057 .317 .046 -.014
students (K-12) učenici .374 -.020 .629 -.006 .025 -.054 -.082 -.002 -.015 -.058 -.025 -.014
students (univ.) studenti -.007 -.035 -.040 .589 -.017 -.040 -.050 .026 .005 -.050 -.025 .215
study učiti .594 -.040 .137 -.021 -.046 -.043 -.132 .019 .086 -.094 .183 .094
subject predmet .761 -.013 .116 .060 .037 -.057 -.029 .085 -.002 -.036 .197 -.059
teach predavati .293 .002 .125 .136 -.042 -.009 -.008 .056 .606 -.038 .036 -.070
teachers učitelji .442 -.035 -.036 .084 -.003 -.019 -.046 -.042 .094 -.065 .024 -.040
use (n.) upotreba -.061 .647 -.047 -.014 .199 -.101 -.125 -.081 -.004 .212 .044 -.140
war rat -.077 -.065 -.049 -.067 -.040 .338 .004 -.016 -.015 -.058 -.013 -.002
word reč -.106 -.014 -.082 -.124 -.081 -.133 .028 -.131 -.054 .328 -.050 -.113
work delo -.061 -.060 -.047 -.009 -.016 -.053 .423 -.040 -.037 .073 .009 .052
writer pisac -.110 -.073 -.073 -.071 -.042 -.037 .473 -.002 -.029 -.111 .025 .017

Table 23

Summary of the Factorial Structure (Collocates in Parentheses were not Used in the

Calculation of Factor Scores)

Factor Collocate (English) Collocate (Serbian) Factor loading Factor/discourse label
Factor 1 (5.40 %) grade razred .797 Language education
subject predmet .761
compulsory obavezan .648
class period čas .611
elementary osnovna .608
study (v.) učiti .594
school (K-12) škola .570
instructors nastavnici .570
children deca .527
(elective) (izborni) .501
first prvi .477
instruction nastava .448
teachers učitelji .442
(students [K-12]) (učenici) .374
curriculum program .353
attend pohađati .353
(foreign) (strani) .328
education obrazovanje .302
Factor 2 (3.18 %) alphabet pismo .817 Cyrillic-only
Cyrillic (n.) ćirilica .735
use (n.) upotreba .647
official službeni .558
Latin latinica .547
Cyrillic (adj.) ćirilično .518
constitution ustav .464
protection zaštita .388
association udruženje .356
law zakon .348
Factor 3 (3.16 %) exam ispit .758 Entrance exams
mathematics matematika .638
high school (gen.) srednja (škola) .631
students (K-12) učenici .629
(school [K-12]) (škola) .561
high school (acad.) gimnazija .532
section odeljenje .480
(attend) (pohađati) .309
(knowledge) (znanje) .303
Factor 4 (2.83 %) school (univ.) fakultet .657 Officialization of Montenegrin 1
students (univ.) studenti .589
professor profesor .578
philosophy filozofski .550
department odsek .459
(Nikšić) (Nikšić) .454
philology filološki .426
department katedra .387
decision odluka .382
(renaming) (preimenovanje) .327
Factor 5 (2.72 %) minority manjina .785 Minority language rights
Ruthenian rusinski .709
Slovak slovački .615
Romanian rumunski .528
national nacionalni .459
Hungarian mađarski .458
rights prava .401
community zajednica .384
Factor 6 (2.67 %) Croatia Hrvatska .660 Contestation over language ownership and name
Croats Hrvati .659
Croatian hrvatski .622

Serbs Srbi .501
academy akademija .463
name ime .385
(Serbian) (srpski) .351
Serbocroatian srpskohrvatski .348
war rat .338
call (v.) zvati .334
Factor 7 (2.61 %) book knjiga .562 Literature and publishing
publish objaviti .514
writer pisac .473
literary književni .464
literature književnost .443
published objavljen .440
work delo .423
part deo .413
edition izdanje .409
poem pesma .358
poetry poezija .316
Factor 8 (2.61 %) Montenegrin crnogorski .765 Officialization of Montenegrin 2
Montenegro Crna (Gora) .744
renaming preimenovanje .483
Nikšić Nikšić .466
Montenegrins Crnogorci .428
mother (adj.) maternji .401
authorities vlast .392
introduction uvođenje .370
Factor 9 (2.60 %) teach predavati .606 Foreign language education
level nivo .577
be able to moći .482
common zajednički .461
(instruction) (nastava) .453
foreign strani .450
framework okvir .408
begin početi .375
knowledge znanje .362
(grade) (razred) .322
(department) (katedra) .314
(instructors) (nastavnici) .306
Factor 10 (2.55 %) linguistic jezički .576 Linguistics as a science, lexicography, standardization and contestation
linguist lingvista .496
SANU SANU .487
dictionary rečnik .485
linguistic lingvistički .438
linguistics lingvistika .373
scientific naučni .365
Serbian srpski .363
science nauka .352
word reč .328
standard standardni .317
(Serbocroatian) (srpskohrvatski) .317
(edition) (izdanje) .314
expression izraz .311
Factor 11 (2.10 %) Bosnian bosanski .804 Officialization of Bosnian
Bosniak bošnjački .669
elective izborni .523
element element .477
(national) (nacionalni) .395
board odbor .351
Factor 12 (1.73 %) center centar .469 Linguacultural diplomacy, language, and culture
cultural kulturni .421
course kurs .394
interest interesovanje .391
culture kultura .377
institute institut .362
Belgrade Beograd .349
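
The parenthesization in Table 23 can be illustrated with a short sketch. It assumes, hypothetically, that a collocate counts as salient on a factor when its absolute loading is at least .30, and that it contributes to the factor score only on the factor where it loads most strongly; neither rule is stated in this section. The loadings are copied from Table 23, whereas the names LOADINGS, CUTOFF, and assign_collocates are illustrative only. Note that instruction (.448 on Factor 1, .453 on Factor 9, yet parenthesized on Factor 9) does not follow the highest-loading rule, so the criterion actually applied may have differed.

```python
# Loadings copied from Table 23 for four collocates that are salient on two factors.
LOADINGS = {
    "elective": {1: 0.501, 11: 0.523},
    "school (K-12)": {1: 0.570, 3: 0.561},
    "students (K-12)": {1: 0.374, 3: 0.629},
    "foreign": {1: 0.328, 9: 0.450},
}

CUTOFF = 0.30  # assumed salience threshold; not confirmed by this section


def assign_collocates(loadings, cutoff=CUTOFF):
    """Map each collocate to (scoring_factor, [factors where it would be parenthesized]).

    Assumption: the scoring factor is the one with the largest absolute loading.
    """
    result = {}
    for collocate, by_factor in loadings.items():
        salient = {f: l for f, l in by_factor.items() if abs(l) >= cutoff}
        if not salient:
            continue  # not salient on any factor
        scoring = max(salient, key=lambda f: abs(salient[f]))
        others = sorted(f for f in salient if f != scoring)
        result[collocate] = (scoring, others)
    return result


assignment = assign_collocates(LOADINGS)
for collocate, (scoring, others) in assignment.items():
    print(f"{collocate}: scored on Factor {scoring}, parenthesized on Factor(s) {others}")
```

For these four collocates the sketch reproduces the pattern in Table 23 (for example, elective is parenthesized under Factor 1 and counted under Factor 11).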

6.3.1 Factor 1: Language education. The texts with the top twenty factor scores

on Factor 1 are listed in Table 24 (for a key, see Section 4). It should be noted that the

top three and a total of seven of the twenty top scoring articles here were originally

identified as multivariate outliers (see Section 5.3.2). Importantly, there was some

overlap with Factor 11 (Officialization of Bosnian) as the two top scoring articles on

Factor 1, for example, also had high factor scores on Factor 11. Discursive links between

individual factors were confirmed by the results of cluster analysis (see Section 6.5), so

factors are presented in groups according to their discursive links (Table 41) rather than

according to the amount of variation they account for alone (Table 23). This language-

related (small ‘d’) discourse included the following salient collocates: grade, subject,

compulsory, class period, elementary, study, school (K-12), instructors, children, elective,

first, instruction, teachers, students (K-12), curriculum, attend, foreign, and education,

and accounted for the most variation (5.40%). Note that all text excerpts were taken from the

top scoring texts as the most representative of a factor/discourse.

Table 24

Top 20 Highest Scoring Articles on Factor 1 (MV Outliers are in Bold)

Rank Article Factor score


1 POL-20-8-2004-55.txt 53.51
2 POL-13-11-2004-108.txt 50.36
3 BLI-19-8-2005-242.txt 49.16
4 POL-13-4-2003-87.txt 43.78
5 POL-25-12-2004-31.txt 42.28
6 POL-17-1-2008-99.txt 41.84
7 POL-27-4-2004-28.txt 39.88
8 POL-1-4-2003-171.txt 39.62
9 BLI-11-8-2006-324.txt 39.22
10 POL-8-4-2004-123.txt 36.36
11 POL-16-9-2004-104.txt 36.16
12 POL-21-10-2004-52.txt 35.49
13 BLI-26-3-2003-586.txt 34.42
14 POL-30-6-2006-2.txt 34.11
15 BLI-14-9-2004-223.txt 34.07
16 POL-26-3-2003-39.txt 32.59
17 POL-15-12-2004-99.txt 32.30
18 POL-19-12-2003-87.txt 31.54
19 POL-22-9-2004-57.txt 30.19
20 POL-29-9-2004-9.txt 29.41
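
A ranking such as the one in Table 24 can be sketched as follows. In multidimensional analyses of this kind, a text's factor score is commonly computed by standardizing the frequency of each salient variable across all texts and summing the resulting z-scores; whether exactly this procedure was used here is not stated in this section, so the sketch should be read as an assumption. The frequencies and text names below are invented for illustration, as are the function names factor_scores and top_n, and the use of the population standard deviation is an arbitrary choice.

```python
from statistics import mean, pstdev

# Hypothetical normalized frequencies of three Factor 1 collocates in four texts.
FREQS = {
    "text_a": {"grade": 12.0, "subject": 8.0, "compulsory": 3.0},
    "text_b": {"grade": 2.0, "subject": 1.0, "compulsory": 0.0},
    "text_c": {"grade": 0.0, "subject": 0.0, "compulsory": 0.0},
    "text_d": {"grade": 2.0, "subject": 3.0, "compulsory": 1.0},
}
SALIENT = ["grade", "subject", "compulsory"]  # unparenthesized collocates only


def zscores(values):
    """Standardize a list of values (population SD); constant columns map to 0."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd else 0.0 for v in values]


def factor_scores(freqs, salient):
    """Sum the z-scores of the salient collocates for each text."""
    texts = list(freqs)
    scores = dict.fromkeys(texts, 0.0)
    for collocate in salient:
        column = [freqs[t][collocate] for t in texts]
        for text, z in zip(texts, zscores(column)):
            scores[text] += z
    return scores


def top_n(scores, n=20):
    """Return the n highest scoring texts as (text, score) pairs."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


scores = factor_scores(FREQS, SALIENT)
for text, score in top_n(scores, n=3):
    print(f"{text}\t{score:.2f}")
```

Under this assumption, a text with unusually high frequencies of several salient collocates at once receives a large positive score, which is consistent with multivariate outliers appearing at the top of such rankings.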

Based on the salient variables and a qualitative examination of representative, i.e.

top scoring, texts (see Section 6.6.1), Factor 1 was interpreted as a general discourse

about language education. Keyword and collocation analyses above showed that

education was one of the most prominent semantic fields in this corpus and this is

reflected in Factor 1. Text excerpt 1 (from POL-13-4-2003-87, ranked 4 in Table 24)

illustrates the discourse typically found in articles scoring highly on Factor 1.³⁴

Uz šest do sedam osnovnih predmeta, deca će pohađati izbornu i fakultativnu


nastavu. Svi budući đaci prvaci obavezno će učiti matematiku, srpski jezik i
književnost, umetnost, fizičko i zdravstveno vaspitanje, strani jezik i umesto
prirode i društva – svet oko nas. Oni kojima srpski nije maternji učiće i maternji
jezik. Uz to će pohađati i nastavu iz dva predmeta koja sami izaberu od onoga
što im škola ponudi. […] Samo naredne školske godine, da podsetimo, neće svi
đaci prvaci učiti strani jezik. Učiće oni koji pohađaju škole u kojima je moguće
organizovati nastavu iz stranih jezika. Naime, u nekim školama nema dovoljno
nastavnika stranih jezika, ali će se taj problem, obećavaju nadležni, rešiti već do
septembra sledeće godine. Jer, po programima devetoletke obavezno je učenje
stranog jezika od prvog razreda.

In addition to up to seven compulsory subjects, children will take electives and


optional instruction. All future first-grade students will take compulsory math,
Serbian language and literature, art, physical and health education, foreign
language and, instead of nature and society, the world around us. Those who do
not speak Serbian as a mother tongue will also receive mother tongue instruction.
In addition, they will also receive instruction in two subjects they choose
themselves from what is offered by the school. […] Reminder: not all first-grade
students will be required to take foreign language classes during the next school
year. Only those who attend schools in which it is possible to provide foreign
language instruction will be required to do so. Namely, some schools do not have
a sufficient number of foreign language teachers. But that problem, according to
the authorities, will be resolved as early as September next year because the nine-
year elementary curriculum requires foreign language instruction from grade one.

In this excerpt from a Politika article from April 13, 2003, we see an example of the

general discourse about language education thematizing the then topical changes to

aspects of (foreign language) education in Serbia. Serbia is a republic and, despite the

existence of an autonomous region (Vojvodina), its education system and the government

as a whole are fairly highly centralized, which means that policy and curriculum

decisions are made in the republic Ministry of Education in Belgrade. This excerpt refers

to a change in policy necessitated by a country-wide shortage of qualified foreign

language instructors.

Despite some overlap with several other factors noted above, texts scoring highly

on Factor 1 typically discuss language education in the context of a broader discourse on

education (e.g., educational reform topical during the subject period). Texts scoring

highly on Factor 1 thus typically do not thematize Central South Slavic (or any other)

ethnolinguistic identities. However, there are exceptions to this as this general language-

related educational (small ‘d’, see Section 3.3.4) discourse is sometimes permeated by a

(big ‘D’) discourse of contestation which is directly related to Central South Slavic

ethnolinguistic identities and thus ethnonationalism (for a description of this discourse,

see Section 6.6), as in text excerpt 2 from a Politika article from November 13, 2004

(POL-13-11-2004-108, ranked 2 in Table 24).

Bošnjaci se bore da prosvetne vlasti u Srbiji njihovoj deci u Tutinu, Sjenici i


Novom Pazaru omoguće da u prvom i drugom razredu kao izborni predmet uče svoj
maternji jezik sa elementima nacionalne kulture. Iz Bošnjačkog nacionalnog vijeća u
SCG kažu da đaci u nekim tamošnjim osnovnim školama već pohađaju časove iz svog
maternjeg jezika ali samo u okviru fakultativne nastave. Imaju samo jedan čas sedmično.
Ako bi predmet postao izborni, onda bi deca dobila još dva časa nedeljno rezervisana
za njihovu tradiciju, kulturu i jezik. Na prvi pogled ništa sporno. Ne bi ni bilo problema
da oni, Bošnjaci u Srbiji, svoj maternji jezik ne nazivaju bosanski. Prosvetne vlasti u
Srbiji ne mogu da im dopuste da u školama uče bosanski sve dok jezikoslovci ne
priznaju postojanje tog jezika.

Bosniaks are fighting to get the educational authorities in Serbia to allow their
children in Tutin, Sjenica, and Novi Pazar [municipalities with a Bosniak majority in
southwest Serbia] to study their own language with elements of national culture as an
elective in the first and second grades [of elementary school]. Officials of the Bosniak
National Council in Serbia and Montenegro [the country’s official name before
Montenegrin independence] say that elementary school children in this area are already
taking classes in their mother tongue but only as an optional subject, which means only

one class per week. If this subject became an elective, then the children would be able to
take two classes per week which would be reserved for their tradition, culture, and
language. At first glance, there is nothing problematic about this. And there wouldn’t
be anything problematic about this if they, the Bosniaks in Serbia, did not call their
mother tongue Bosnian. Educational authorities in Serbia cannot allow them to study
Bosnian in schools until linguists recognize the existence of that language.

As can be seen, then, language is conceptualized in terms of (ethno-) national

identity, tradition and culture. The contestation of a self-ascribed language name which is

evident here is thus more about an attempt at delegitimization of ethnolinguistic identity

and the collective rights that come with a separate ethnolinguistic identity (such as a

group’s right to name its own language) than it is about language itself. As we will see

further below, this discourse of contestation is pervasive in this corpus, but at the same

time, some factors are much more parsimonious pointers to it than others, so follow-up

discussion (Sections 6.4-6.6) will be focused on a sub-section of six factors (Factors 2, 4,

6, 8, 10, and 11) which arise from texts that routinely thematize Central South Slavic

ethnolinguistic identities explicitly and thus are much more pertinent to an analysis of

links between language-related discourses and language ideologies, and

ethnonationalism.

6.3.2 Factor 3: Entrance exams. This language-related discourse included the

following salient collocates: exam, mathematics, high school (general and academic),

students (K-12), school (K-12), section, attend and knowledge, and accounted for 3.16%

of the variation. The texts with the top twenty factor scores on Factor 3 are listed in

Table 25. Note that only six of the twenty top scoring articles here were originally

identified as multivariate outliers. There was also some overlap with Factors 1, 9 and 11

as several texts scored highly on all or most of these factors (which, again, was confirmed

by Factors 1, 3, 9, and 11 clustering together, see Table 41).

Table 25

Top 20 Highest Scoring Articles on Factor 3 (MV Outliers are in Bold)

Rank Article Factor score


1 POL-15-5-2003-92.txt 33.56
2 POL-23-4-2005-48.txt 29.34
3 POL-4-4-2005-157.txt 27.37
4 POL-12-4-2003-104.txt 26.80
5 BLI-17-6-2003-397.txt 26.71
6 POL-15-4-2003-82.txt 26.41
7 BLI-31-3-2003-575.txt 25.92
8 POL-23-5-2008-67.txt 24.79
9 POL-30-6-2008-5.txt 24.65
10 POL-24-4-2004-47.txt 23.67
11 BLI-8-9-2006-260.txt 22.76
12 POL-17-8-2005-63.txt 21.03
13 POL-4-2-2003-161.txt 20.30
14 POL-19-4-2005-73.txt 20.10
15 POL-18-6-2004-73.txt 19.62
16 POL-26-11-2003-29.txt 19.61
17 POL-8-5-2003-129.txt 19.36
18 BLI-19-6-2008-503.txt 19.32
19 POL-12-5-2003-104.txt 19.29
20 BLI-22-6-2004-387.txt 19.07

Based on the salient variables and a qualitative examination of representative

texts, Factor 3 was interpreted as a general educational discourse which thematized high

school entrance exams typically consisting of tests of skills in mathematics and (foreign)

languages. Text excerpt 3 (from POL-15-5-2003-92, ranked 1 in Table 25) illustrates the

discourse typically found in articles scoring highly on Factor 3.

Za 168 mesta u beogradskoj Filološkoj gimnaziji za sada se kandiduje 255


učenika. Toliko je njih od ukupno 342 prijavljena položilo specifični prijemni
ispit koji je bio identičan za sve jezičke srednje škole i odeljenja u Srbiji.
Kažemo za sada, jer se za beogradske klupe mogu u junu prijaviti i učenici koji su
ispite iz srpskog jezika i književnosti i stranih jezika polagali u Karlovačkoj
gimnaziji ili u gimnazijama u Smederevu, Kruševcu i Kragujevcu koje imaju po
jedno englesko odeljenje. I, mogu da „istisnu" đaka koji ima manje poena od
njih. - Svi ovi učenici polovinom juna polažu, kao i ostali osmaci, kvalifikacione
ispite iz maternjeg jezika i matematike. Oni za upis u filološke škole nisu
eliminacioni, ali će deci doneti dodatne bodove.

A total of 255 students have applied for the 168 available slots in the Belgrade
philological high school, for now. Of the total number of 342 candidates, this is
how many passed the special entrance exam which was identical for all linguistic
high schools and sections in Serbia. We say ‘for now’ because students who took
the Serbian language and literature and foreign language exams at the
Karlovac, Smederevo, Kruševac or Kragujevac high schools (which have one
English-language section each) will also be eligible to apply to the Belgrade school
in June. And they can ‘squeeze out’ a student who has fewer points than they do.
Similar to other eighth-graders, all these students take qualification exams in
mother tongue and mathematics in mid-June. Failing these exams does not
disqualify a student from the process of admission to the philological schools, but
the exams do mean extra points for children who do well on them.

Here, it is important to note that, depending on the type of school, the process of

admission to high schools in Serbia, and elsewhere in the Balkans, requires applicants to

pass several entrance exams which include mother tongue (i.e., Serbian), math, and a

foreign language, most often English. Texts scoring highly on Factor 3 thus typically

discuss the process of admission to high schools and the relevant entrance exams; texts

scoring highly on this factor exhibit an administrative educational discourse and do not

typically thematize ethnolinguistic identity, either explicitly or implicitly.

6.3.3 Factor 9: Foreign language education. This language-related discourse

included the following salient collocates: teach, level, be able to, common, instruction,

foreign, framework, begin, knowledge, grade, department (katedra) and instructors, and

accounted for 2.60% of the variation. The texts with the top twenty factor scores on

Factor 9 are listed in Table 26. Here, as many as thirteen of the twenty top scoring

articles were originally identified as multivariate outliers. As noted above, the principal

area of overlap was with Factors 1 and 3 (Language education, Entrance exams), as well

as, to a lesser degree, Factor 11 (Officialization of Bosnian).

Table 26

Top 20 Highest Scoring Articles on Factor 9 (MV Outliers are in Bold)

Rank Article Factor score


1 POL-23-8-2003-48.txt 65.71
2 POL-27-8-2003-28.txt 33.80
3 BLI-23-9-2003-191.txt 29.12
4 POL-10-1-2003-106.txt 26.40
5 POL-25-5-2005-44.txt 24.77
6 POL-15-5-2005-103.txt 23.37
7 POL-17-8-2005-63.txt 20.93
8 POL-19-9-2004-79.txt 20.53
9 BLI-4-8-2003-273.txt 20.16
10 BLI-9-3-2005-499.txt 19.22
11 BLI-29-3-2003-580.txt 18.76
12 BLI-12-5-2003-487.txt 18.35
13 POL-25-9-2004-41.txt 18.28
14 POL-16-9-2004-104.txt 17.56
15 POL-21-2-2003-42.txt 17.55
16 POL-17-7-2006-94.txt 15.39
17 BLI-27-10-2006-140.txt 14.97
18 POL-9-9-2006-163.txt 14.68
19 POL-17-9-2003-91.txt 14.49
20 POL-30-5-2004-16.txt 14.31

Based on the salient variables and a qualitative examination of representative

texts, Factor 9 was interpreted as a general discourse on foreign language education. Text

excerpt 4 (from POL-23-8-2003-48, ranked 1 in Table 26) provides an illustration of the

discourse found in articles scoring highly on Factor 9.

Ministarstvo prosvete Srbije odlučilo je juče da proširi listu predavača koji imaju
pravo da predaju strani jezik od prvog do šestog razreda osnovne škole, ako
poseduju znanje stranog jezika najmanje na nivou B2 zajedničkog evropskog
okvira. To znači da će strane jezike u nižim razredima moći da predaju i
profesori razredne nastave, diplomirani filolozi, psiholozi, pedagozi i druga lica
koja su završila neki nastavnički fakultet, saopštilo je Ministarstvo prosvete. Nivo
znanja stranog jezika dokazuje se polaganjem odgovarajućeg ispita na nekoj od
filoloških katedri univerziteta u Srbiji ili "međunarodno priznatom javnom
ispravom čiju valjanost utvrđuje Ministarstvo prosvete". Ministarstvo se odlučilo
na ovaj korak zbog nedostatka nastavnika za nastavu stranih jezika koja u
predstojećoj školskoj godini treba da počne u svim prvim razredima osnovne
škole.

The Serbian Ministry of Education decided yesterday to broaden the qualification


criteria for eligibility to teach foreign languages in elementary grades 1-6 to
instructors who demonstrate foreign language skills equivalent to at least the B2
level of the Common European Framework. This means that general
education professors, professors of philology, psychology and pedagogy, and
others who have some kind of pedagogical degree will also be eligible to teach
foreign languages to lower-level classes, according to the statement by the Ministry
of Education. The level of foreign language skill can be proven by passing an
appropriate exam at one of the philology departments at universities in Serbia or
by “an internationally recognized certificate accepted by the Ministry of
Education.” The ministry made this decision because of a lack of instructors
qualified for foreign language instruction which will begin in first grades in all
elementary schools starting next school year.

Similar to the excerpt used to illustrate the discourse identified by Factor 1, this example

also points to a shortage of foreign language instructors in Serbian schools during the first

decade of the twenty-first century. Texts scoring highly on Factor 9, such as this Politika

article from August 23, 2003, typically discuss foreign language education issues in

Serbian elementary schools and high schools. Texts representative of this discourse thus

do not typically or significantly thematize ethnolinguistic identities, and so are of

marginal interest in terms of language ideologies and ethnonationalism.

6.3.4 Factor 11: Officialization of Bosnian. This language-related discourse

included the following salient collocates: Bosnian, Bosniak, elective, element, national

and board, and accounted for 2.10% of the variation. The texts with the top twenty factor

scores on Factor 11 are listed in Table 27. Note that as many as sixteen of the twenty top

scoring articles were originally identified as multivariate outliers here. As noted above,

the principal area of overlap was with Factors 1, 3, and 9 (see discussion of these factors

above and Table 41 below). There were also minor areas of overlap with Factors 5

(Minority language rights) and 10 (Linguistics as a science, lexicography, standardization

and contestation), although this latter overlap was not attested by cluster analysis; a

possible reason for this is that Bosnian is sometimes discussed as a minority language in

Serbia, as well as one of the contested Central South Slavic varieties.

Table 27

Top 20 Highest Scoring Articles on Factor 11 (MV Outliers are in Bold)

Rank Article Factor score


1 POL-15-12-2004-99.txt 39.91
2 POL-15-1-2005-107.txt 37.19
3 POL-12-11-2004-113.txt 33.23
4 POL-13-11-2004-108.txt 30.79
5 POL-10-1-2005-134.txt 29.08
6 POL-11-11-2004-127.txt 28.13
7 POL-26-10-2004-29.txt 26.68
8 POL-16-2-2005-82.txt 25.26
9 POL-12-3-2003-119.txt 19.45
10 POL-10-12-2004-127.txt 18.95
11 POL-14-1-2005-115.txt 17.46
12 POL-28-1-2006-16.txt 16.88
13 POL-19-2-2005-60.txt 14.52
14 POL-27-5-2004-29.txt 14.47
15 POL-7-3-2005-163.txt 13.82
16 BLI-31-10-2003-118.txt 13.12
17 BLI-24-1-2008-913.txt 12.98
18 POL-9-3-2005-148.txt 12.45
19 POL-25-9-2004-41.txt 12.32
20 POL-9-6-2006-165.txt 11.75

Based on the salient variables and a qualitative examination of representative

texts, Factor 11 was interpreted as a discourse on the officialization of Bosnian. Text

excerpt 5 (from POL-12-11-2004-113, ranked 3 in Table 27) provides an illustration of

the discourse typically found in articles scoring highly on Factor 11.

ZAŠTO BOŠNJACI U SRBIJI NE MOGU DA UČE BOSANSKI Možda će se u


našim školama sledeće školske godine izučavati i bosanski jezik, ukoliko taj jezik
priznaju jezikoslovci Na sednici Prosvetnog odbora Skupštine Srbije čula se
duhovita opaska da se više od sat vremena raspravlja o nečemu što ne postoji – o
bosanskom jeziku. Ministar prosvete dr Slobodan Vuksanović je više puta
ponovio da bosanski jezik za sada zvanično ne postoji, a poslanik Milan
Veselinović iz Novog Pazara (SRS) je istakao da učenici u nekim osnovnim
školama u novopazarskoj, tutinskoj i sjeničkoj opštini u prvom i drugom razredu
uče bosanski jezik iz udžbenika za bosanski jezik sa elementima nacionalne
kulture.

WHY BOSNIAKS IN SERBIA CANNOT STUDY BOSNIAN The Bosnian


language may be introduced [as a subject] in our schools next school year if that
language is recognized by linguists. During the session of the Pedagogical board
of the Serbian parliament a witty remark was made that something which doesn’t
exist, the Bosnian language, had been discussed for an hour. Minister of
Education, Dr. Slobodan Vuksanović, repeated several times that the Bosnian
language does not officially exist for now, but member of parliament Milan
Veselinović from Novi Pazar (SRS [Serbian Radical Party]) said that first- and
second-grade students in some elementary schools in the Novi Pazar, Tutin, and
Sjenica municipalities were studying Bosnian from a textbook for the Bosnian
language with elements of national culture.

Similar to the second excerpt used to illustrate the discourse identified by Factor 1 above,

language is conceptualized in terms of collective identity and culture (“Bosnian language

with elements of national culture”). Texts scoring highly on Factor 11, such as this

Politika article from November 12, 2004, typically discuss the then topical introduction

(and thus recognition/officialization) of Bosnian as a minority language in schools in

southwest Serbia, an area where the Bosniak minority has traditionally been in the

majority. Note the explicit link between minority language rights and ethnocultural

rights. Texts representative of this discourse also typically thematize Central South

Slavic ethnolinguistic identities and show traces of the discourse of contestation related to

these identities.

6.3.5 Factor 2: Cyrillic-only. This language-related discourse included the

following salient collocates: alphabet, Cyrillic (adj./n.), use (n.), official, Latin,

constitution, protection, association and law, and accounted for 3.18% of the variation.

The texts with the top twenty factor scores on Factor 2 are listed in Table 28. Note that as

many as twelve of the twenty top scoring articles here were originally identified as

multivariate outliers. Similar to Factor 1, there was some overlap with Factor 5 (Minority

language rights) as three texts scored highly on both factors.

Table 28

Top 20 Highest Scoring Articles on Factor 2 (MV Outliers are in Bold)

Rank Article Factor score


1 POL-11-2-2005-108.txt 52.37
2 POL-16-12-2006-100.txt 50.71
3 POL-17-6-2004-76.txt 50.32
4 POL-16-3-2003-93.txt 46.21
5 POL-3-10-2008-168.txt 41.91
6 POL-22-8-2006-59.txt 37.87
7 POL-21-9-2004-72.txt 36.00
8 POL-25-8-2005-27.txt 35.78
9 POL-8-4-2005-138.txt 32.52
10 POL-16-4-2005-91.txt 30.42
11 POL-29-5-2003-20.txt 30.21
12 POL-26-7-2006-36.txt 29.39
13 BLI-28-8-2008-343.txt 27.56
14 POL-18-11-2005-74.txt 26.46
15 BLI-2-9-2006-272.txt 25.39
16 POL-2-6-2004-157.txt 24.42
17 POL-29-9-2006-20.txt 24.00
18 POL-11-10-2008-124.txt 23.90
19 POL-17-3-2008-114.txt 23.36
20 POL-4-3-2005-193.txt 23.20

Based on the salient variables and a qualitative examination of representative

texts, Factor 2 was interpreted as a classic discourse of endangerment (cf. Duchêne &

Heller, 2007), here referring to a (perceived) threat to the Cyrillic alphabet from the

widespread use of the Latin alphabet in Serbia. Text excerpt 6 (from POL-11-2-2005-

108, ranked 1 in Table 28) illustrates the discourse typically found in articles scoring

highly on Factor 2.

Udruženje građana za zaštitu srpskog pisma „Srpska ćirilica", zatražilo je da se


predsednik Srbije izvini srpskom narodu, jer se u predlogu Ustava ekspertske
grupe koju je on formirao, uz ćirilicu, navodi i latinica kao srpsko pismo.
Članovi Udruženja, u jučerašnjem saopštenju, postavljaju pitanje da li je
autorima ovog predloga poznat još neki narod na svetu koji za svoj jezik koristi
tuđe pismo. […] Pre svega, autorima predloga Ustava, navodi se dalje, nije
poznato da srpski jezik nikada nije pisan latinicom, sve do trenutka kada je
zloupotrebljena dobra volja da se južnoslovenskim saplemenicima Hrvatima
pomogne da i oni konačno dobiju svoje pismo.

The association for the protection of the Serbian alphabet “Serbian Cyrillic”
demands that the President of Serbia apologize to the Serbian people because a
constitution draft submitted by an expert group he formed treats both Latin and
Cyrillic as Serbian alphabets. In a statement issued yesterday, members of the
association ask if the authors of this draft know of any other people in the world
who use a foreign alphabet in their language. […] Above all, the statement further
reads, the authors of this constitution draft do not know that the Latin alphabet
had not been used in the Serbian language before a time during which the good
will to help the Southern Slavic co-tribesmen Croats to finally get their own
alphabet was abused.

It should be noted here that both Latin and Cyrillic alphabets are in use in Serbia (for a

discussion of the significance of this issue, see Section 7). Although the two scripts are

equally functional, Cyrillic is widely seen as autochthonous and thus as closely linked to

the Serbian ethnonational identity. This has made the use of the Latin alphabet a target

for Serbian ultranationalists who argue that it represents a threat to Serbian Cyrillic and

thus to Serbs themselves.

Texts scoring highly on Factor 2 thus typically discuss a (perceived) threat to the

Cyrillic alphabet in the context of available legal protections, sometimes making

references to minority language rights and regional ethnolinguistic identities. Texts

representative of this discourse typically thematize Central South Slavic ethnolinguistic

identities as the Latin alphabet is sometimes linked to Croats, as in the excerpt from a

Politika article from February 11, 2005 above.

6.3.6 Factor 5: Minority language rights. This language-related discourse

included the following salient collocates: minority, Ruthenian, Slovak, Romanian,

national, Hungarian, rights and community, and accounted for 2.72% of the variation.

The texts with the top twenty factor scores on Factor 5 are listed in Table 29. Again, as

many as fifteen of the twenty top scoring articles here were originally identified as

multivariate outliers. The principal area of overlap was with Factor 2 (Cyrillic-only).

Table 29

Top 20 Highest Scoring Articles on Factor 5 (MV Outliers are in Bold)

Rank Article Factor score


1 POL-10-1-2003-106.txt 71.03
2 POL-16-4-2005-91.txt 53.05
3 NIN-27-10-2005-73.txt 39.97
4 POL-6-6-2006-188.txt 36.82
5 POL-3-10-2008-168.txt 27.06
6 POL-29-11-2004-10.txt 25.84
7 POL-9-1-2006-149.txt 24.34
8 POL-13-11-2008-111.txt 24.21
9 POL-26-2-2003-16.txt 24.19
10 POL-16-6-2006-110.txt 22.31
11 POL-9-6-2006-171.txt 22.22
12 POL-11-11-2005-114.txt 21.13
13 POL-13-7-2006-125.txt 19.95
14 POL-9-6-2006-164.txt 18.96
15 POL-5-4-2006-133.txt 17.95
16 POL-3-2-2006-142.txt 17.21
17 POL-28-1-2006-16.txt 16.95
18 POL-1-6-2003-210.txt 16.79
19 POL-6-6-2006-190.txt 16.39
20 POL-10-12-2003-160.txt 16.21
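
The overlap between Factor 2 and Factor 5 can be checked directly against the rankings in Tables 28 and 29. The sketch below is limited to the ten highest scoring articles in each table (article names copied from the tables); the helper shared_texts is illustrative only. Because it sees only tabulated ranks, texts that score highly on both factors but fall outside these lists would not be detected.

```python
# Ten highest scoring articles on Factor 2 (Table 28), in rank order.
TOP10_FACTOR_2 = [
    "POL-11-2-2005-108", "POL-16-12-2006-100", "POL-17-6-2004-76",
    "POL-16-3-2003-93", "POL-3-10-2008-168", "POL-22-8-2006-59",
    "POL-21-9-2004-72", "POL-25-8-2005-27", "POL-8-4-2005-138",
    "POL-16-4-2005-91",
]

# Ten highest scoring articles on Factor 5 (Table 29), in rank order.
TOP10_FACTOR_5 = [
    "POL-10-1-2003-106", "POL-16-4-2005-91", "NIN-27-10-2005-73",
    "POL-6-6-2006-188", "POL-3-10-2008-168", "POL-29-11-2004-10",
    "POL-9-1-2006-149", "POL-13-11-2008-111", "POL-26-2-2003-16",
    "POL-16-6-2006-110",
]


def shared_texts(list_a, list_b):
    """Return (article, rank_in_a, rank_in_b) for articles present in both lists."""
    rank_b = {article: rank for rank, article in enumerate(list_b, start=1)}
    return [
        (article, rank, rank_b[article])
        for rank, article in enumerate(list_a, start=1)
        if article in rank_b
    ]


overlap = shared_texts(TOP10_FACTOR_2, TOP10_FACTOR_5)
for article, rank_f2, rank_f5 in overlap:
    print(f"{article}: rank {rank_f2} on Factor 2, rank {rank_f5} on Factor 5")
```

Both shared articles (POL-3-10-2008-168 and POL-16-4-2005-91) are Politika texts, and the second of them is the source of excerpt 8.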

Based on the salient variables and a qualitative examination of representative

texts, Factor 5 was interpreted as a discourse on minority language rights. It should be

noted that, although varieties of what used to be called Serbo-Croatian such as Croatian

and Bosnian are sometimes mentioned in the general discourse on minority rights (as in

the excerpt from a Politika article from June 6, 2006 below), there is a clear distinction

between them and the non-Serbo-Croatian minority languages such as Hungarian or

Slovak (hence the separate factors). Text excerpt 7 (from POL-6-6-2006-188, ranked 4 in

Table 29) illustrates the discourse typically found in articles scoring highly on Factor 5.

Kako je najavljeno iz Pokrajinskog sekretarijata za upravu, propise i prava


nacionalnih manjina, pripadnici nacionalnih manjina u Vojvodini mogu
pribaviti dvojezična lična dokumenta. Prošle sedmice je završeno štampanje
dvojezičnih ličnih dokumenata, a juče je počela distribucija po policijskim
stanicama. - Lične karte su štampane u kombinaciji srpski-mađarski, srpski-
hrvatski, srpski-rumunski, srpski-rusinski, i srpski-češki, pa će pripadnici ovih
zajednica moći da zatraže dvojezična dokumenta od nadležnih organa već od
danas - izjavio je pokrajinski sekretar za upravu, propise i nacionalne manjine
Tamaš Korhec. Štampanje dvojezičnih dokumenata je obavljeno na osnovu
republičkog Zakona o upotrebi jezika i pisma iz 1991. godine. Ovaj zakon,

podsetimo, daje pravo pripadnicima nacionalnih manjina na dvojezične
dokumente u mestima u kojima je njihov maternji jezik u službenoj upotrebi.

As announced by the Provincial secretariat for administration, regulations and


national minority rights, members of the national minorities in Vojvodina will be
able to obtain bilingual personal IDs. The printing of bilingual personal IDs was
finished last week, and the distribution to police stations began yesterday.
Personal IDs were printed in the following [language] combinations:
Serbian/Hungarian, Serbian/Croatian, Serbian/Romanian, Serbian/Ruthenian and
Serbian/Czech. The members of these communities will be able to apply for
bilingual IDs starting as early as today, said provincial secretary for
administration, regulations and national minorities, Tamaš Korhec. The legal
basis for the printing of bilingual IDs is the republic Law on the use of language
and alphabet from 1991. As a reminder, this law gives the members of national
minorities the right to bilingual IDs in those places where their mother tongue is
in official use.

Texts scoring highly on Factor 5 typically discuss minority language rights in Serbia, and

particularly in the northern province of Vojvodina where most minority populations are

concentrated. Texts representative of this discourse typically thematize minority

ethnolinguistic identities, but, as noted above, because non-Serb Central South Slavic

minorities are considered to be (ethnolinguistically) different from non-Central South

Slavic minorities, texts representative of Factor 5 do not typically treat what I have been

referring to as ‘regional’ (i.e., Central South Slavic) ethnolinguistic identities. However,

there are often traces of a discourse of endangerment (for a description of this discourse,

see Section 6.6) in texts representative of Factor 5, which refers to the perceived

endangerment of the Cyrillic alphabet in Serbia, as in excerpt 8 from a Politika article

from April 16, 2005 (POL-16-4-2005-91, ranked 2 in Table 29).

Pokrajinski sekretarijat za propise, upravu i nacionalne manjine izrazio je


zabrinutost zbog najnovije odluke Skupštine opštine Šid kojom se ukida službena
upotreba slovačkog i rusinskog jezika i latiničnog pisma na teritoriji te opštine.
Ovaj sekretarijat je saopštio je da je to „u direktnoj suprotnosti sa Ustavom Srbije,
Ustavnom poveljom SCG i Zakonom o zaštiti prava nacionalnih manjina”. […]
Međutim, bez obzira na to što se Avramov uplašio za ćirilično pismo i što nije
znao da nacionalne manjine imaju pravo na službenu upotrebu maternjeg jezika,
odluka o proterivanju rusinskog, slovačkog i zabrani latiničnog pisma uzbunila je
javnost.

The Provincial secretariat for administration, regulations and national minorities


has expressed concern over the latest decision by the Šid [an urban area in the
province of Vojvodina] municipal council which removed from official use in the
territory of this municipality the Slovak and Ruthenian languages and the Latin
alphabet. This secretariat said that this decision is “in direct violation of the
Serbian Constitution, the constitutional declaration of Serbia and Montenegro,
and the Law on the protection of minority rights.” […] However, although
Avramov [the mayor of Šid] was apprehensive about the status of the Cyrillic
alphabet and although he was not aware that national minorities are entitled to the
official use of their mother tongues, the decision about the removal from official
use of the Slovak and Ruthenian languages and the Latin alphabet still alarmed
the public.

6.3.7 Factor 4: Officialization of Montenegrin 1. This language-related

discourse included the following salient collocates: school (university), students

(university), professor, philosophy, department (odsek/katedra), Nikšić, philology,

decision and renaming, and accounted for 2.83% of the variation. The texts with the top

twenty factor scores on Factor 4 are listed in Table 30. Note that as many as fifteen of the

twenty top scoring articles here were originally identified as multivariate outliers. The

principal area of overlap was with Factor 8 (Officialization of Montenegrin 2) as nine

texts scored highly on both factors.

Table 30

Top 20 Highest Scoring Articles on Factor 4 (MV Outliers are in Bold)

Rank Article Factor score


1 BLI-30-3-2004-544.txt 50.56
2 POL-15-4-2004-89.txt 36.23
3 POL-12-7-2004-89.txt 32.47
4 POL-5-4-2004-141.txt 30.70
5 POL-16-4-2004-86.txt 30.14
6 POL-13-12-2006-119.txt 29.96
7 POL-2-4-2004-166.txt 26.72
8 POL-8-9-2004-157.txt 26.67
9 NIN-18-12-2003-10.txt 26.10
10 POL-1-9-2003-173.txt 24.72
11 POL-7-7-2004-109.txt 24.17
12 POL-5-9-2008-165.txt 24.16
13 POL-29-3-2004-25.txt 23.86
14 POL-25-11-2003-34.txt 23.62
15 POL-22-1-2008-70.txt 21.84
16 POL-17-11-2003-84.txt 21.50
17 POL-27-8-2003-28.txt 19.32
18 POL-2-9-2003-166.txt 18.93
19 NIN-13-3-2003-381.txt 18.30
20 POL-11-7-2003-114.txt 18.16

Based on the salient variables and a qualitative examination of representative

texts, Factor 4 was interpreted as one of the two discourses on the officialization of

Montenegrin, which proceeded in two distinct phases with Serbian first being renamed

into mother tongue and then into Montenegrin. Text excerpt 9 (from BLI-30-3-2004-544,

ranked 1 in Table 30) illustrates the discourse typically found in articles scoring highly on

Factor 4.

Studenti Odseka za srpski jezik i književnost Filozofskog fakulteta u Nikšiću


[major urban area in northern Montenegro] juče popodne zamrzli su štrajk glađu
koji su počeli pre podne zahtevajući od crnogorskog ministra prosvete da povuče
odluku o preimenovanju srpskog jezika u maternji u osnovnim i srednjim
školama. “Profesori podržavaju naš stav, pa ćemo sačekati rezultate sednice
Saveta za opšte obrazovanje Crne Gore zakazane za petak, ali ukoliko naš zahtev
za vraćanje imena srpskog jezika ne bude podržan nastavićemo štrajk glađu od
ponedeljka“, rekao je agenciji Beta predsednik Štrajkačkog odbora studenata
Bojan Strunjaš. Studenti Odseka za srpski jezik i književnosti organizovali su i
potpisivanje peticije za odbranu srpskog jezika, koju je do 15 sati potpisalo oko
1.000 studenata Filosofskog fakulteta.

Demanding that the Montenegrin minister of education withdraw his decision


about the renaming of the Serbian language into mother tongue in elementary
schools and high schools, the students from the Department of Serbian language
and literature at the School of Philosophy at Nikšić suspended yesterday
afternoon the hunger strike they had started that morning. “The professors
support our demand, so we will wait for the outcome of the session of the Council
for general education of Montenegro, which is scheduled for Friday. However, if our demand
for the reinstatement of the name of the Serbian language is not accepted, we will
continue our hunger strike beginning on Monday,” Bojan Strunjaš, the president
of the student strike board, told BETA agency. The students from the Department
of Serbian language and literature also organized the signing of a petition for the
defense of the Serbian language which was signed by about 1,000 students of the
School of Philosophy by 3 PM.

As can be seen in this excerpt from a Blic article from March 30, 2004, texts scoring

highly on Factor 4 typically report on the protests against the new policy by professors

and students of Serbian in Montenegro. It should be noted that although Serbs (who were

and continue to be vehemently against this policy) represent a sizeable minority in

Montenegro with strong political representation in the Montenegrin parliament, they were

unable to stop the implementation of this policy because ethnic Montenegrins and their

political parties received support from all other minority groups (Bosniaks, Albanians,

etc.). Texts representative of this discourse are directly relevant to Central South Slavic

ethnolinguistic identities and are linked to texts representative of the discourse identified

by Factor 8 (see Section 6.3.8), so they will be discussed in conjunction with one another

(see Section 6.6).

6.3.8 Factor 8: Officialization of Montenegrin 2. This language-related

discourse included the following salient collocates: Montenegrin, Montenegro, renaming,

Nikšić, Montenegrins, mother (adj.), authorities and introduction, and accounted for

2.61% of the variation. The texts with the top twenty factor scores on Factor 8 are listed

in Table 31. Here, seven of the twenty top scoring articles were originally identified as

multivariate outliers. As noted above, the principal overlap was with Factor 4

(Officialization of Montenegrin 1).

Table 31

Top 20 Highest Scoring Articles on Factor 8 (MV Outliers are in Bold)

Rank Article Factor score


1 POL-26-7-2005-39.txt 39.05
2 VRE-17-7-2003-115.txt 27.85
3 POL-9-11-2004-137.txt 24.49
4 BLI-30-3-2004-544.txt 24.00
5 POL-29-3-2004-25.txt 22.69
6 BLI-9-10-2004-161.txt 22.53
7 POL-31-3-2004-2.txt 22.24
8 BLI-20-9-2004-204.txt 22.17
9 POL-11-4-2003-113.txt 21.85
10 POL-11-4-2003-111.txt 21.85
11 POL-17-7-2006-92.txt 20.87
12 POL-7-7-2004-109.txt 20.69
13 BLI-24-3-2003-593.txt 20.64
14 POL-15-12-2004-98.txt 20.25
15 NIN-30-9-2004-112.txt 20.23
16 POL-2-9-2004-192.txt 19.77
17 POL-7-12-2003-177.txt 19.54
18 POL-27-3-2004-34.txt 19.34
19 POL-20-10-2004-57.txt 18.73
20 POL-23-7-2003-45.txt 18.61

Based on the salient variables and a qualitative examination of representative

texts, Factor 8 was interpreted as the second of the two discourses on the officialization

of Montenegrin. However, in contrast to the first discourse on the officialization of

Montenegrin, which was predominantly focused on the protests against the new policy by

students and professors of Serbian in Montenegro, this discourse also comprised the more

general views opposing the policy, particularly those espoused by nationalist intellectuals.

Text excerpt 10 (VRE-17-7-2003-115, ranked 2 in Table 31) provides an illustration of

the discourse representative of articles scoring highly on Factor 8.

Matija Bećković, “dežurni branilac svesrpstva u Crnoj Gori, hitro je, u svom
poznatom stilu, reagovao na ideju dr Vukotića”, o uvođenju engleskog kao
službenog jezika u Crnu Goru: “Bilo bi veoma korisno da se engleski jezik
uvede kao drugi službeni jezik u Crnu Goru, jer bi se malo odmorio crnogorski
jezik koji je to odavno zaslužio... Ako umesto srpskog jezika i srpske azbuke
uvedu maternji, onda bi možda bilo pravednije da ga nazovu maćehinskim.”

Matija Bećković, “the on-duty defender of Serbhood in Montenegro,” reacted


quickly, in his well-known style, to Dr. Vukotić’s idea about the introduction of
English as an official language in Montenegro: “It would be very useful to
introduce English as a second official language in Montenegro because the
Montenegrin language would get some well-deserved rest… If mother tongue is
introduced in place of the Serbian language and Serbian alphabet, then it would
perhaps be more just to call it step-mother tongue.”

Texts scoring highly on Factor 8, such as this Vreme article from July 17, 2003, typically

discuss the fallout around the officialization of Montenegrin in Montenegro as well as its

links to the then impending declaration of independence of Montenegro. In this example,

Bećković, a Serbian writer and member of the SANU Department for language and

literature, first ironizes a proposal for the introduction of English as a second official

language in Montenegro by Veselin Vukotić, a Montenegrin economist, by playing on the

stereotype of Montenegrins as lazy (“Montenegrin language would get some

well-deserved rest”). Following this, Bećković, as is typical of Serbian objections to the change

in language policy in Montenegro at the time, also denounces the change in the official

name of the language in Montenegro from Serbian into mother tongue by proposing that

it be called “step-mother tongue” instead. Both of these proposals were (rightly) seen by

Serbs in general and Serbian nationalists in particular, in both Montenegro and Serbia, as

a precursor to Montenegro’s eventual political independence. As can be seen, texts

representative of this discourse thematize ethnolinguistic identity and are directly relevant

to a discussion of ethnonationalism.
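
The article identifiers cited in the tables and excerpts appear to encode the publication, the publication date, and a running number. The following Python sketch shows one way such an identifier could be decoded. The day-month-year order is an assumption, although it is consistent with the Vreme article from July 17, 2003 being labeled VRE-17-7-2003-115. The prefix mapping, the reading of the final number as a serial number, and the names `ArticleId` and `parse_article_id` are likewise illustrative assumptions rather than part of the study's procedure.

```python
from datetime import date
from typing import NamedTuple

# Prefix-to-publication mapping inferred from the excerpts (an assumption).
PUBLICATIONS = {"BLI": "Blic", "NIN": "NIN", "POL": "Politika", "VRE": "Vreme"}


class ArticleId(NamedTuple):
    publication: str
    published: date
    serial: int  # meaning of the trailing number is assumed, not documented


def parse_article_id(identifier: str) -> ArticleId:
    """Decode an identifier such as 'VRE-17-7-2003-115.txt'."""
    # Drop the file extension used in the tables, if present.
    if identifier.endswith(".txt"):
        identifier = identifier[: -len(".txt")]
    # Assumed layout: PREFIX-day-month-year-serial.
    prefix, day, month, year, serial = identifier.split("-")
    return ArticleId(
        publication=PUBLICATIONS[prefix],
        published=date(int(year), int(month), int(day)),
        serial=int(serial),
    )


if __name__ == "__main__":
    print(parse_article_id("VRE-17-7-2003-115"))
```

Decoding identifiers in this way makes it straightforward to cross-check the dates given in the running text against the file names listed in the tables.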

6.3.9 Factor 6: Contestation over language ownership and name. This

language-related discourse included the following salient collocates: Croatia, Croats,

Croatian, Serbs, academy, name, Serbian, Serbo-Croatian, war and call, and accounted

for 2.67% of the variation. The texts with the top twenty factor scores on Factor 6 are

listed in Table 32. Here, nine of the twenty top scoring articles were originally identified

as multivariate outliers. The principal area of overlap was with Factor 10 (Linguistics as

a science, lexicography, standardization and contestation).

Table 32

Top 20 Highest Scoring Articles on Factor 6 (MV Outliers are in Bold)

Rank Article Factor score


1 POL-24-9-2005-45.txt 50.58
2 POL-20-8-2003-80.txt 30.11
3 POL-20-1-2006-76.txt 28.09
4 POL-1-10-2003-185.txt 28.07
5 POL-3-7-2006-192.txt 28.01
6 POL-10-7-2006-142.txt 26.19
7 POL-22-7-2006-55.txt 25.75
8 POL-11-3-2005-141.txt 25.75
9 POL-10-9-2005-127.txt 24.46
10 POL-25-4-2005-36.txt 21.04
11 POL-29-1-2005-21.txt 20.49
12 POL-13-8-2003-113.txt 20.05
13 POL-26-5-2008-48.txt 19.68
14 POL-15-7-2006-104.txt 19.56
15 POL-17-9-2005-80.txt 18.42
16 POL-7-2-2004-117.txt 18.42
17 POL-31-1-2003-1.txt 17.30
18 POL-27-10-2005-23.txt 16.98
19 POL-29-7-2006-13.txt 16.38
20 POL-18-1-2005-92.txt 15.92

Based on the salient variables and a qualitative examination of representative

texts, Factor 6 was interpreted as a discourse of contestation over language ownership

and name. The existence and prominence of this discourse were noted at several points

above. Note that although contestation here is mainly between Serbs and Croats,

Bosnians and Montenegrins also feature prominently in most relevant texts. Similarly,

although this contestation can be multilateral, it is mostly directed at non-Serb Central

South Slavic ethnolinguistic identities. Text excerpt 11 (from POL-1-10-2003-185,

ranked 4 in Table 32) illustrates the discourse typical of articles scoring highly on

Factor 6.

Povodom izjave Stjepana Mesića koja je objavljena u štampi da Srbi u Hrvatskoj


ne govore srpski nego hrvatski, i to bolje od Hrvata, "Ćirilica" smatra da ta
izjava zaslužuje objašnjenje. Šta je za g. Mesića "hrvatski jezik": ono što su Ilirci
pozajmili od Vukovog (štokavskog, srpskog) jezičkog standarda prema
poznatom Bečkom književnom dogovoru (1850) ili ono što su Hrvati na osnovu
tog izvornog srpskog jezičkog standarda sačinili kao varijantu tog jezika za
današnji "hrvatski jezik"? Nesumnjivo je da Srbi u Hrvatskoj znaju da govore i
da pišu osim svog srpskog jezika i "novohrvatski", koji je izveden iz spomenutog
srpskog jezičkog standarda, a mnogi znaju i raniji hrvatski (čakavski i
kajkavski). Ako g. Mesić misli da Srbi uopšte ne govore svoj srpski jezik, to
može jedino da znači da su tamo srpski jezik i pismo zabranjeni. Ako misli da
Srbi u Hrvatskoj i nemaju svog srpskog jezika, to je u domenu šovinističke
farse, a ako su srpski jezik i ćirilica tamo i dalje zabranjeni, to bi moglo biti da on
tu istinu potvrđuje, pa bi mu trebalo odati neku vrstu priznanja.

In response to a statement by Stjepan Mesić [former President of Croatia]


published in the [Serbian] press that Serbs in Croatia do not speak Serbian but
Croatian, and rather better than Croats, “Cyrillic” [an association] contends that
that statement requires an explanation. What is “Croatian language” for Mr.
Mesić? That which the Illyrians borrowed from Vuk’s [Karadžić] (Štokavian,
Serbian) standard language according to the well-known Vienna literary
agreement (1850) or that which the Croats, based on this original Serbian standard
language, turned into present-day “Croatian language” as a variant of that
language? Undoubtedly, Serbs in Croatia, in addition to their Serbian language,
also know how to speak and write “New-Croatian”, which derives from the
aforementioned Serbian standard language, while many know also the earlier
Croatian (Čakavian and Kajkavian). If Mr. Mesić thinks that Serbs do not speak
their Serbian language at all, that can only mean that the Serbian language and
alphabet are forbidden over there. If he thinks that Serbs in Croatia do not have
their Serbian language, that is a Chauvinist farce, and if the Serbian language and
Cyrillic are still forbidden over there, that could mean that he is simply
confirming that truth, so he should be given some sort of recognition for it.

Here we see a typical example of the discourse of contestation mentioned above, with the

exception that this particular text also shows that contestation can go in the opposite

direction as well. In other words, rather than Serbs contesting other Central South Slavs’

ethnolinguistic identity as usual, we see a reaction to an apparent attempt to contest the

Serbian minority’s ethnolinguistic identity in Croatia by the then Croatian president.

Texts scoring highly on Factor 6, such as this Politika article from October 1, 2003, thus

typically discuss language and identity-related contestation between the different Central

South Slavic communities (and, again, most often Serbs and Croats) that has been going

on since the nineteenth century but which, for obvious reasons, has been particularly

intense since the breakup of Yugoslavia. Historically, the Croats were the only non-Serb

Central South Slavic ethnic group allowed to officially name and standardize their

language during the Serbo-Croatian era, so they have borne the brunt of Serbian

nationalist wrath since the breakup of Yugoslavia (as well as before) as they are seen as

the precursor/precedent that led the other two ethnic groups (Bosniaks and Montenegrins)

to demand linguistic separation. Evidently, texts representative of this discourse

thematize ethnolinguistic identity in a most pertinent way and are directly discursively

linked to texts representative of Factor 10, so they will be treated together in the

qualitative part of the discussion (see Section 6.6).

6.3.10 Factor 10: Linguistics as a science, lexicography, standardization and

contestation. This language-related discourse included the following salient collocates:

linguistic (jezički/lingvistički), linguist, SANU (Serbian Academy of Sciences and Arts),

dictionary, linguistics, scientific, Serbian, science, word, standard, Serbo-Croatian,

edition and expression, and accounted for 2.55% of the variation. The texts with the top

twenty factor scores on Factor 10 are listed in Table 33. Nine of the twenty top scoring

articles were originally identified as multivariate outliers. As noted above, the principal

area of overlap was with Factor 6 (Contestation over language ownership and name).

Table 33

Top 20 Highest Scoring Articles on Factor 10 (MV Outliers are in Bold)

Rank Article Factor score


1 POL-9-2-2005-121.txt 35.57
2 POL-17-6-2006-95.txt 34.38
3 POL-1-10-2005-165.txt 30.87
4 POL-9-4-2005-127.txt 30.23
5 POL-5-1-2008-166.txt 27.77
6 POL-20-8-2003-80.txt 27.68
7 POL-25-10-2003-33.txt 27.36
8 POL-12-5-2008-144.txt 27.15
9 BLI-10-2-2005-552.txt 27.09
10 POL-14-9-2006-130.txt 26.90
11 POL-2-12-2005-197.txt 24.03
12 POL-6-6-2006-189.txt 22.39
13 POL-24-1-2006-47.txt 22.03
14 POL-24-6-2006-48.txt 21.53
15 POL-2-4-2004-169.txt 21.53
16 POL-12-4-2003-91.txt 21.20
17 POL-24-12-2006-45.txt 20.99
18 POL-2-6-2006-209.txt 20.10
19 POL-11-3-2005-141.txt 19.80
20 NIN-3-7-2008-182.txt 19.57

Based on the salient variables and a qualitative examination of representative

texts, Factor 10 was interpreted as a complex technical discourse on linguistics as a

science, lexicography, standardization and contestation. Text excerpt 12 (from

POL-17-6-2006-95, ranked 2 in Table 33) provides an illustration of the discourse typically found

in articles scoring highly on Factor 10.

Srpska lingvistika, oslobođena ranije obaveze o negovanju jezičkog zajedništva,


našla se pred važnim zadacima. Jezičke raspre kojima se stručna, a i laička
javnost ovih dana ponovo bavi (podstaknuta besedom akademika Dragoslava
Mihailovića) zapravo nisu nove. One datiraju od Vukovog vremena, ali, eto,
dosežu i do naših dana. Međutim, suština „problema" mahom se svodi(la)
najedno: da li je jezik kojim govore Srbi, Hrvati, Bošnjaci, Bosanci, Crnogorci
jedan, ali nejedinstven, i(li) jedinstven, ali nejednak i različit. Pri tome se obično
ističu dva kriterijuma: naučni (lingvistički) i nacionalni (socioemotivni). U tim
okvirima zastupaju se najčešće dva stava: (1) govornici datih jezičkih
(nacionalnih) entiteta imaju (moraju imati) svoj, autohtoni jezik i (2) jezik (pored
pisma i religijske pripadnosti) jeste osnovno obeležje nacionalnog legitimiteta i
identiteta naroda. Polazeći od nacionalnog principa, spori se mahom oko
imenovanja jezika, a kao argumenti naglašavaju - leksičke posebnosti. […] Nema
otuda (niti je bilo) ijednog ozbiljnijeg naučnog autoriteta koji bi mogao dokazati
da su jezici: srpski, hrvatski, bosanski/bošnjački, crnogorski ili dr. posebni,
autohtoni jezici. A u stvari je reč o varijetetima (varijantama) istog jezičkog
sistema, samim tim i modelima istog jezika. Isti jezički sistem I novija kritička
misao (osobito danas) najčešće se ne bavi razlikama u okviru jezičkog sistema,
već imenovanjima jezika, pri čemu je kritici osobito podvrgnut naziv „srpsko-
hrvatski". […] Neodrživa je njegova ocena da u Rečniku SANU „ima malo
leksike iz Srbije", mada nje zapravo ima najviše. Ono čega nema jeste - dovoljan
broj saradnika i savremenih sredstava za dalji rad na Rečniku SANU.

Freed from its earlier obligation to cultivate linguistic unity, Serbian


linguistics is facing important tasks. The linguistic discussions (initiated by
academician Dragoslav Mihailović’s lecture) in which both the expert and lay
publics are again engaged these days are not really new. They go back to Vuk’s
time, but have extended to our days. However, the gist of the “problem” is:
whether the language spoken by Serbs, Croats, Bosniaks, Bosnians, Montenegrins
is one but not unified, or whether it is unified but unequal and different. Here,
two criteria are usually considered: the scientific (linguistic) one and the national
(socio-emotional) one. In this framework, there are usually two views: (1) the
speakers of the given linguistic (national) entities have (i.e., must have) their own
autochthonous language and (2) language (along with alphabet and religious
affiliation) is a basic symbol of national legitimacy and people’s identity.
Beginning with the national principle, the contestation mainly revolves around the
naming of the language, while the main arguments are lexical differences. […]
Therefore, there is no (nor has there ever been) credible scientific authority which
could prove that Serbian, Croatian, Bosnian/Bosniak, Montenegrin or other
varieties are separate, autochthonous languages. In fact, they are varieties
(variants) of the same linguistic system and thus models of the same language.
[Subheading: A single linguistic system] More recent critical thought (especially today) does
not ponder the differences within the linguistic system, but rather the naming of
the language, whereby the label “Serbo-Croatian” is especially criticized. […] His
opinion that the SANU dictionary contains “too little lexis from Serbia” is
unacceptable because it is in fact dominant. What is missing is a sufficient
number of research assistants and contemporary tools for further work on the
SANU dictionary.

As this excerpt suggests, texts scoring highly on Factor 10, such as this Politika article

from June 17, 2006, are typically longer and more complex than average newspaper texts

in this corpus. Top scoring texts here are characterized by a combination of references to

linguistics as a science, lexicography (e.g., the SANU unabridged dictionary of Serbian)

and language standardization issues, which often serve as arguments in the contestation

of authenticity of non-Serbian Central South Slavic glottonyms and separate

ethnolinguistic identities. Texts representative of this discourse often thematize Central

South Slavic ethnolinguistic identities overtly, and so are directly relevant to an analysis

of links between language-related discourses, language ideologies, and ethnonationalism.

6.3.11 Factor 7: Literature and publishing. This language-related discourse

included the following salient collocates: book, publisher, writer, literary, literature,

published, work, part, edition, poem and poetry, and accounted for 2.61% of the

variation. The texts with the top twenty factor scores on Factor 7 are listed in Table 34.

Note that only three of the twenty top scoring articles here were originally identified as

multivariate outliers. There was also no identifiable overlap between Factor 7 and any

other factors in this solution, suggesting a discourse separate from others identified here.

Table 34

Top 20 Highest Scoring Articles on Factor 7 (MV Outliers are in Bold)

Rank Article Factor score


1 POL-29-11-2004-10.txt 30.36
2 POL-24-1-2005-54.txt 30.24
3 POL-12-8-2006-121.txt 28.62
4 BLI-16-3-2008-788.txt 27.03
5 POL-28-3-2008-31.txt 25.85
6 POL-30-8-2008-12.txt 23.06
7 POL-26-7-2004-20.txt 22.03
8 POL-22-3-2008-74.txt 21.54
9 POL-5-8-2005-129.txt 21.10
10 POL-25-7-2005-41.txt 21.09
11 NIN-25-3-2004-285.txt 20.69
12 POL-30-11-2008-4.txt 20.66
13 POL-8-8-2008-115.txt 20.43
14 POL-17-8-2003-92.txt 20.36
15 POL-3-9-2003-163.txt 19.76
16 POL-24-5-2005-50.txt 19.52
17 POL-24-5-2004-41.txt 19.50
18 POL-21-3-2004-75.txt 19.26
19 POL-31-10-2008-7.txt 18.75
20 BLI-17-8-2008-372.txt 18.36

Based on the salient variables and a qualitative examination of representative

texts, Factor 7 was interpreted as a general discourse on literature and publishing. The

existence and prominence of this discourse were noted above. Text excerpt 13 (from

NIN-25-3-2004-285, ranked 11 in Table 34) illustrates the discourse typically found in

articles scoring highly on Factor 7.

Protekla decenija je jedan od najsnažnijih književnih odgovora dobila upravo u


poeziji Duška Novakovića, dobitnika nagrade “Vasko Popa” […]. Nagrada
“Vasko Popa” ušla je u desetu godinu postojanja tako što je jednoglasnom
odlukom žirija dodeljena pesniku Dušku Novakoviću za najbolju knjigu pesama
štampanu u 2003. godini. Nagrada je Novakoviću dodeljena za knjigu Izabrao
sam mesec koja je objavljena u izdanju Gradske biblioteke “Vladislav Petković
Dis” iz Čačka. Pre pisca knjige Izabrao sam mesec, nagradu “Vasko Popa” […]
dobili su Borislav Radović […]. Prvu knjigu pesama, Znalac ogledala,
Novaković je objavio 1976. godine. Ta knjiga je pre desetak godina doživela
drugo izdanje (Znalac ogledala i pridružene pesme) u kome su neke pesme
objavljene u drugoj, manje ili više izmenjenoj verziji […].

The poetry of Duško Novaković, the laureate of the “Vasko Popa” award, is one
of the most potent literary contributions of the last decade. […]. The tenth annual
“Vasko Popa” award was given to the poet Duško Novaković for the best book of
poetry printed in 2003 by a unanimous jury decision. Novaković received the
award for his book I chose the moon which was published by the city library
“Vladislav Petković Dis” in Čačak [an urban area in Serbia]. Before the author of
I chose the moon, the “Vasko Popa” award […] was given to Borislav Radović [a
Serbian poet] […]. Novaković published his first book of poetry, The mirror
connoisseur, in 1976. The second edition of that book (The mirror connoisseur
and associated poems), in which some poems were published in new, more or less
changed versions […], was published about ten years ago.

Texts scoring highly on Factor 7, such as this NIN article from March 25, 2004, typically

feature reports on new literary editions, literary awards or events. Texts representative of

this discourse do not typically thematize ethnolinguistic identities or ethnonationalism,

and they do not exhibit any discursive links with texts representative of other

factors/discourses.

6.3.12 Factor 12: Linguacultural diplomacy, language, and culture. This

language-related discourse included the following salient collocates: center, cultural,

course, interest, culture, institute and Belgrade, and accounted for 1.73% of the variation.

The texts with the top twenty factor scores on Factor 12 are listed in Table 35. Eleven of

the twenty top scoring articles were originally identified as multivariate outliers here.

Similar to Factor 7, there were no identifiable discursive links between this discourse and

any other discourses identified here.

Table 35

Top 20 Highest Scoring Articles on Factor 12 (MV Outliers are in Bold)

Rank Article Factor score


1 BLI-21-11-2004-76.txt 35.09
2 POL-20-9-2005-64.txt 24.51
3 POL-9-4-2004-117.txt 24.48
4 VRE-8-1-2004-312.txt 22.21
5 POL-23-8-2006-50.txt 21.97
6 POL-25-11-2003-34.txt 19.77
7 POL-21-11-2004-60.txt 19.25
8 BLI-27-12-2003-8.txt 18.85
9 POL-8-2-2003-129.txt 17.93
10 POL-18-12-2003-94.txt 17.53
11 POL-31-1-2005-1.txt 17.44
12 POL-8-8-2004-111.txt 16.87
13 POL-2-9-2004-188.txt 16.81
14 POL-2-9-2003-166.txt 16.62
15 BLI-8-3-2006-608.txt 16.01
16 POL-9-9-2006-163.txt 15.82
17 POL-22-12-2004-54.txt 15.65
18 POL-4-11-2008-180.txt 14.66
19 POL-28-3-2008-31.txt 13.46
20 BLI-7-7-2005-302.txt 13.20

Based on the salient variables and a qualitative examination of representative

texts, Factor 12 was interpreted as a discourse on linguacultural diplomacy, language, and

culture. Text excerpt 14 (from BLI-21-11-2004-76, ranked 1 in Table 35) provides an

illustration of the discourse typically found in articles scoring highly on Factor 12.

Strani kulturni centri oduvek su bili prozor u svet i prilika da se ne samo nauče
strani jezici, već i da se bolje upozna kultura velikih nacija. Strani kulturni
centri su kod nas već decenijama sastavni deo kulturne ponude. […] Naši
sagovornici kažu da je najveći broj korisnika među mladima, naročito onih koji
pohađaju kurseve jezika. […] Po obimu literature, beogradski institut spada u
prvih pet naših instituta u svetu, a našu publiku čine većinom visokoobrazovani
ljudi - kaže Gudrun Krivokapić [head librarian, Belgrade Goethe Institute library].
Zbog velikog interesovanja, zaposleni u centrima imaju problem s prostorom,
ali su zato prezadovoljni odzivom.

Foreign cultural centers have always been a window onto the world and an
opportunity to not only learn foreign languages but also acquaint oneself better
with the cultures of the great nations. For decades, foreign cultural centers have
also been an integral part of our cultural scene. […] Our interlocutors say that most
users are young people, particularly those attending language courses. […] By
library size, the Belgrade institute is in the top five of our major institutes
globally, and most of our audience is made up of highly educated people, says
Gudrun Krivokapić. Due to enormous interest, the staff at these centers are
dealing with a lack of space, but they are also very happy with the number of
visitors they get.

As can be seen from this excerpt from a Blic article from November 21, 2004, texts

scoring highly on Factor 12 typically discuss the offerings of embassy-sponsored cultural

centers in Belgrade and around Serbia, language courses and cultural events in particular.

Although texts representative of this discourse sometimes make references to nationalism

(e.g., “the cultures of the great nations”), they do not thematize ethnolinguistic identities

in the sense employed in this study.

To conclude this section, it has been shown that the twelve factors suggest twelve

(small ‘d’) language-related discourses, some of which are linked. Further, it has been

shown that, although most discourses identified here do feature references to Central

South Slavic identities, six of the twelve factors/discourses are clearly more pertinent for

an analysis of links between language-related discourse and language ideologies, on the

one hand, and ethnonationalism, on the other. Those six factors/(small ‘d’) discourses (2,

4, 6, 8, 10, 11) are further analyzed using quantitative and qualitative methods in the

following three sections (6.4-6.6).

6.4 Synchronic and Diachronic Variation in Language-related Discourses (Analysis

of Variance)

This section presents the results of analysis of synchronic and diachronic variation

in language-related discourses based on analysis of variance. Here, we focus on

individual factors interpreted as (small ‘d’) discourses above. Note, again, that only the

six factors (2, 4, 6, 8, 10, and 11) whose comparatively greater relevance for Central

South Slavic ethnonationalism(s) was established in Section 6.3 are analyzed here.

6.4.1 Variation by publication (synchronic). To examine variation by

publication the factor scores for each of the six selected language-related discourses (i.e.,

factors) of texts grouped by publication (Blic, NIN, Politika, and Vreme) were compared.

Descriptive statistics for each language-related discourse by publication are presented in

Table 36.

Table 36

Descriptive Statistics for Language-related Discourse Factor Scores by Publication

Factor   Blic               NIN                Politika           Vreme
         Mean     SD        Mean     SD        Mean     SD        Mean     SD
2        -0.170   0.420     -0.134   0.412      0.090   1.126     -0.177   0.352
4        -0.030   0.699     -0.048   0.780      0.037   0.995     -0.172   0.555
6        -0.201   0.438      0.034   0.606      0.049   1.046     -0.180   0.449
8        -0.015   0.828     -0.061   0.804      0.021   0.936     -0.006   1.072
10       -0.168   0.637     -0.043   0.770      0.074   0.971     -0.271   0.410
11       -0.131   0.497     -0.068   0.342      0.053   1.090     -0.075   0.304

To evaluate the hypothesis that ‘publication’ could differentiate factor scores on

Cyrillic-only (Factor 2), the Kruskal-Wallis test was conducted and indicated a significant

difference, χ²(3, N = 943) = 11.902, p = .008. Pairwise comparisons showed that texts from

Politika oriented more positively on this language-related discourse than did texts from

Blic, although the difference did not reach the adjusted significance level.

To evaluate the hypothesis that ‘publication’ could differentiate factor scores on

Officialization of Montenegrin 1 (Factor 4), the Kruskal-Wallis test was conducted and

indicated no significant difference, χ²(3, N = 943) = 3.013, p = .390.

To evaluate the hypothesis that ‘publication’ could differentiate factor scores on

Contestation over language ownership and name (Factor 6), the Kruskal-Wallis test was

conducted and indicated a significant difference, χ²(3, N = 943) = 21.471, p < .001.

Pairwise comparisons showed that texts from NIN and Politika oriented significantly

more positively on this language-related discourse than did texts from Blic, while

texts from NIN also oriented significantly more positively on this discourse than did texts

from Vreme.

To evaluate the hypothesis that ‘publication’ could differentiate factor scores on

Officialization of Montenegrin 2 (Factor 8), the Kruskal-Wallis test was conducted and

indicated no significant difference, χ²(3, N = 943) = 1.321, p = .724.

To evaluate the hypothesis that ‘publication’ could differentiate factor scores on

Linguistics as a science, lexicography, standardization and contestation (Factor 10), the

Kruskal-Wallis test was conducted and indicated a significant difference, χ²(3, N = 943) =

16.707, p = .001. Pairwise comparisons showed that texts from Politika oriented

significantly more positively on this language-related discourse than did texts from Blic

and Vreme.

To evaluate the hypothesis that ‘publication’ could differentiate factor scores on

Officialization of Bosnian (Factor 11), the Kruskal-Wallis test was conducted and

indicated a significant difference, χ²(3, N = 943) = 18.149, p < .001. Pairwise comparisons

showed that texts from Politika oriented significantly more positively on this language-

related discourse than did texts from Blic.
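
The test procedure reported in this section can be illustrated with a minimal sketch. The code below computes the tie-corrected Kruskal-Wallis H statistic for factor scores grouped by publication, together with a significance level adjusted for the six possible pairwise comparisons among four publications. The scores are hypothetical, the function names are illustrative, and the Bonferroni adjustment is an assumption, since the adjustment method is not specified here; the sketch does not reproduce the values reported above.

```python
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks for `values`, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, df) for a list of samples, with the usual tie correction."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    # Accumulate (rank sum)^2 / n for each group in pooled order.
    h_sum, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h_sum += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * h_sum - 3 * (n_total + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    h /= 1 - tie_term / (n_total ** 3 - n_total)
    return h, len(groups) - 1


# Hypothetical factor scores, NOT taken from the corpus.
scores = {
    "Blic": [-0.4, -0.2, -0.1],
    "NIN": [0.0, 0.1, 0.3],
    "Politika": [0.5, 0.9, 1.4],
    "Vreme": [-0.5, -0.3, 0.2],
}
h_stat, df = kruskal_wallis(list(scores.values()))
n_pairs = len(list(combinations(scores, 2)))
adjusted_alpha = 0.05 / n_pairs  # Bonferroni adjustment (an assumption)
print(f"H = {h_stat:.3f}, df = {df}, adjusted alpha = {adjusted_alpha:.4f}")
```

With k groups, H is referred to a chi-square distribution with k - 1 degrees of freedom, which corresponds to the 3 degrees of freedom reported for the four publications.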

6.4.2 Summary of variation by publication. Discursive uniformity across

publications was thus attested only for Factors 4 and 8 (Officialization of Montenegrin 1

and 2) as texts from different publications were shown not to be significantly different in

this respect (i.e., they treat the issue of officialization of Montenegrin in similar ways).

Analysis of mean differences on all other factors showed that texts from Politika tended

to score significantly more highly than texts from Blic on all factors except Factor 2

(where the difference failed to reach the significance level adjusted for multiple pairwise

comparisons). Texts from Politika also scored significantly more highly than texts from

Vreme on Factor 10, while texts from NIN scored significantly more highly than texts

from either Blic or Vreme on Factor 6. This suggests some differences in the discursive

treatment of ethnolinguistic identities between broadsheets (Politika) and tabloids (Blic),

broadsheet dailies and some periodicals (Politika vs. Vreme), tabloid dailies and some

periodicals (Blic vs. NIN), as well as between periodicals themselves (NIN vs. Vreme).

Blic, and to a lesser extent Vreme, thus seem either to have devoted less attention to the

discourses suggested by Factors 2, 6, 10, and 11 than Politika and NIN, or to have treated

the theme of ethnolinguistic identities underlying the discourses suggested by these

factors in ways undetected by factor analysis here. Note, however, that this does not

necessarily indicate that significant ideological differences exist as well.

6.4.3 Variation by year of publication (diachronic). To examine variation by

year of publication, the factor scores for each of the six selected language-related

discourses (i.e., factors) of texts grouped by year of publication (2003, 2004, 2005, 2006,

and 2008) were compared. Descriptive statistics for each language-related discourse by

year of publication are presented in Table 37.

Table 37

Descriptive Statistics for Language-related Discourse Factor Scores by Year of

Publication

Factor   2003            2004            2005            2006            2008
         Mean    SD      Mean    SD      Mean    SD      Mean    SD      Mean    SD
2         0.01   0.97     0.01   0.74    -0.01   0.95     0.11   1.28    -0.14   0.55
4        -0.03   1.00     0.20   1.28    -0.03   0.55    -0.13   0.61    -0.06   0.66
6        -0.11   0.52    -0.04   0.67     0.23   1.41     0.07   1.06    -0.10   0.63
8         0.07   1.12     0.16   1.17    -0.11   0.63    -0.05   0.73    -0.15   0.44
10        0.02   0.76    -0.10   0.63     0.04   0.97     0.08   1.08    -0.03   0.96
11       -0.08   0.35     0.14   1.68     0.00   0.62     0.00   0.51    -0.07   0.36

To evaluate the hypothesis that ‘year of publication’ could differentiate factor

scores on Cyrillic-only (Factor 2), the Kruskal-Wallis test was conducted and indicated a

significant difference, χ²(4, N = 943) = 17.181, p = .001. Pairwise comparisons showed

that texts from 2003, 2004, and 2006 oriented significantly more positively on this

language-related discourse than did texts from 2008.

To evaluate the hypothesis that ‘year of publication’ could differentiate factor

scores on Officialization of Montenegrin 1 (Factor 4), the Kruskal-Wallis test was

conducted and indicated no significant difference, χ²(4, N = 943) = 7.285, p = .122.

To evaluate the hypothesis that ‘year of publication’ could differentiate factor

scores on Contestation over language ownership and name (Factor 6), the Kruskal-Wallis

test was conducted and indicated no significant difference, χ²(4, N = 943) = 2.939, p = .568.

To evaluate the hypothesis that ‘year of publication’ could differentiate factor

scores on Officialization of Montenegrin 2 (Factor 8), the Kruskal-Wallis test was

conducted and indicated no significant difference, χ²(4, N = 943) = 4.020, p = .403.

To evaluate the hypothesis that ‘year of publication’ could differentiate factor

scores on Linguistics as a science, lexicography, standardization and contestation (Factor

10), the Kruskal-Wallis test was conducted and indicated no significant difference, χ²(4, N

= 943) = 6.078, p = .193.

To evaluate the hypothesis that ‘year of publication’ could differentiate factor

scores on Officialization of Bosnian (Factor 11), the Kruskal-Wallis test was conducted

and indicated no significant difference, χ²(4, N = 943) = 2.843, p = .584.

6.4.4 Summary of variation by year of publication. As might be expected,

then, a period of five to six years may not be long enough for many significant diachronic

differences to emerge. At the same time, a high degree of stability in a majority of

discourses identified here seems entirely plausible. Hence, we see very little change in

the discourses suggested by Factors 4, 6, 8, 10, and 11 during this period of time. The

only significant difference here is on Factor 2 between the years 2003, 2004, and 2006,

on the one hand, and the year 2008, on the other, which suggests a possible abatement of

interest in the issue of alphabet and therefore this facet of the discourse of endangerment

toward the end of this period. But this finding should be taken with caution as the

dynamics of discourse are volatile and interest in a particular issue can be quickly

rekindled by important events even after long periods of relative dormancy.

6.4.5 Variation by type of article (synchronic). To examine variation by type of

article, the factor scores for each of the six selected language-related discourses (i.e.,

factors) of texts grouped by type of article (general newspaper articles vs. letters-to-the-

editor) were compared. Descriptive statistics for each language-related discourse by type

of article are presented in Table 38.

Table 38

Descriptive Statistics for Language-related Discourse Factor Scores by Type of Article

Factor   Factor/Discourse Label                                     Newspaper articles   Letters-to-the-editor
                                                                    Mean    SD           Mean    SD
2        Cyrillic-only                                              -0.06   0.84          0.87   1.57
4        Officialization of Montenegrin 1                            0.01   0.92         -0.19   0.69
6        Contestation over language ownership & name                -0.02   0.88          0.25   1.04
8        Officialization of Montenegrin 2                            0.02   0.93         -0.27   0.46
10       Ling. as a science, lexicography, stand. & contestation     0.00   0.89         -0.01   0.70
11       Officialization of Bosnian                                  0.00   0.92         -0.04   0.56

To evaluate the hypothesis that ‘type of article’ could differentiate factor scores

on Cyrillic-only (Factor 2), the Mann-Whitney U test was conducted and indicated a

significant difference, U = 10.852, p < .001. This result indicates that letters-to-the-editor

oriented significantly more positively on this language-related discourse than did general

newspaper texts.

To evaluate the hypothesis that ‘type of article’ could differentiate factor scores

on Officialization of Montenegrin 1 (Factor 4), the Mann-Whitney U test was conducted

and indicated a significant difference, U = 21.607, p = .002. This result indicates that

general newspaper texts oriented significantly more positively on this language-related

discourse than did letters-to-the-editor.

To evaluate the hypothesis that ‘type of article’ could differentiate factor scores

on Contestation over language ownership and name (Factor 6), the Mann-Whitney U test

was conducted and indicated a significant difference, U = 22.885, p = .013. This result

indicates that letters-to-the-editor oriented significantly more positively on this language-

related discourse than did general newspaper texts.

To evaluate the hypothesis that ‘type of article’ could differentiate factor scores

on Officialization of Montenegrin 2 (Factor 8), the Mann-Whitney U test was conducted

and indicated a significant difference, U = 20.793, p < .001. This result indicates that

general newspaper texts oriented significantly more positively on this language-related

discourse than did letters-to-the-editor.

To evaluate the hypothesis that ‘type of article’ could differentiate factor scores

on Linguistics as a science, lexicography, standardization and contestation (Factor 10),

the Mann-Whitney U test was conducted and indicated no significant difference, U =

26.159, p = .349.

To evaluate the hypothesis that ‘type of article’ could differentiate factor scores

on Officialization of Bosnian (Factor 11), the Mann-Whitney U test was conducted and

indicated no significant difference, U = 27.452, p = .748.
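
The two-group comparisons reported in this section can be illustrated with a minimal sketch of the Mann-Whitney U statistic, computed from rank sums with mean ranks assigned to ties. The scores and the function name are hypothetical, and the sketch is not intended to reproduce the U values reported above.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) from rank sums, using mean ranks for tied values."""
    pooled = list(sample_a) + list(sample_b)
    order = sorted(range(len(pooled)), key=lambda i: pooled[i])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1

    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[:n_a])
    # U_a counts the (a, b) pairs in which a outranks b (ties count one half).
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical factor scores, NOT taken from the corpus.
newspaper_articles = [-0.3, -0.1, 0.0, 0.2]
letters_to_the_editor = [0.4, 0.9, 1.6]
u_articles, u_letters = mann_whitney_u(newspaper_articles, letters_to_the_editor)
print(f"U (articles) = {u_articles}, U (letters) = {u_letters}")
```

The two U values always sum to the product of the group sizes, so a value close to zero for one group indicates that its scores rank almost uniformly below those of the other group.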

6.4.6 Summary of variation by type of article. Newspaper articles and letters-

to-the-editor are not homogeneous categories. In addition to journalists, newspaper

articles are often written by public figures and professionals or experts. Similarly, in

addition to the general readership, letters-to-the-editor are often written by public figures

and concerned professionals. Therefore, they cannot be taken as authentic or reliable

expressions of discursive and practical consciousness (in the sense of Kroskrity, 1998).

However, to the extent that newspaper articles tend to be written by journalists and other

figures arguably exhibiting discursive consciousness, and letters-to-the-editor tend to be

written by the general readership arguably exhibiting practical consciousness, these two

text types can be considered as an approximation of discourses and ideologies

characteristic of either group.

Here, we see a convergence between discourses in circulation in the two groups in

Factors 10 and 11, and significant differences in Factors 2, 4, 6, and 8. Letters-to-the-

editor scored significantly more highly than did newspaper articles on Factor 2,

suggesting a heightened interest in alphabet and thus greater currency of this facet of the

discourse of endangerment among the general readership (and the readers of dailies in

particular, see above). It is also worth noting that some of this variation is due to the

reliance of fringe groups on letters-to-the-editor to air their views, as in the case of

associations demanding more stringent legal protections for the Cyrillic alphabet (for an

example of an alternative discourse unmasking these, see Section 6.6.1). Similarly,

letters-to-the-editor scored significantly more highly than did newspaper articles on

Factor 6, suggesting an internalization of the discourse of contestation of ethnolinguistic

identity purveyed primarily by linguists, as will be shown further below. Newspaper

articles, on the other hand, scored significantly more highly on Factors 4 and 8 which

suggests that the change of language policy in Montenegro attracted comparatively more

interest among journalists and language experts and which may be another indicator of a

dwindling interest in language-related issues among the public. Lastly, there was no

significant difference between the groups in terms of Factors 10 and 11. Arguably, this is

somewhat surprising considering that Factor 10 suggests a technical discourse, whereas

Factor 11 suggests an administrative issue, but it is also an indication of just how

dominant the (big ‘D’) discourse of contestation is, as well as to what extent it is

internalized.

However, it should again be noted that discursive differences attested here do not

necessarily mean that significant ideological differences exist also, as examples arising

from qualitative analysis below suggest that the same essentialist language ideology can

be identified in the dominant language-related discourses regardless of publication, year

of publication, or type of article. The results of the final quantitative technique applied

here, cluster analysis, are reported in the following section.

6.5 Cluster Analysis

This section presents the results of cluster analysis which was used to further

analyze covariance patterns. First, the discursive links between factors (e.g., Factors 4

and 8 both suggest a common discourse on the officialization of Montenegrin) were

tested by examining mean factor scores for each cluster. Second, the composition of each

cluster was examined with respect to three categorical independent variables: publication,

year of publication, and type of article.

6.5.1 Preferred cluster solution and scoring patterns by factor and cluster.

After a range of cluster solutions was examined, a six-cluster solution was identified as

optimal for this data set. Table 39 shows the descriptive statistics for all twelve factors as

predictor variables in a six-cluster solution.

Table 39

Descriptive Statistics for Twelve Factors (Predictor Variables) in a Six-cluster Solution

Factor Cluster N Mean SD Std. Error 95% CI Lower Bound 95% CI Upper Bound Minimum Maximum
F1 1 775 -2.0691 3.41779 .12277 -2.3101 -1.8281 -4.91 20.84
2 72 -3.2274 2.11004 .24867 -3.7233 -2.7316 -4.91 9.29
3 134 18.7734 10.77776 .93106 16.9318 20.6150 -1.11 53.51
4 159 -3.3504 1.55064 .12297 -3.5932 -3.1075 -4.91 3.62
5 78 -.3074 3.96494 .44894 -1.2013 .5866 -4.91 17.30
6 39 -3.1540 2.23610 .35806 -3.8789 -2.4291 -4.91 7.60
Total 1257 .0000 7.98397 .22519 -.4418 .4418 -4.91 53.51
F2 1 775 -.5533 3.63599 .13061 -.8097 -.2969 -2.34 41.91
2 72 -.9219 2.42671 .28599 -1.4921 -.3516 -2.34 11.25
3 134 -1.6581 1.38071 .11927 -1.8940 -1.4221 -2.34 4.94
4 159 -1.6931 1.46543 .11622 -1.9226 -1.4636 -2.34 9.43
5 78 -.2607 2.79064 .31598 -.8899 .3685 -2.34 9.36
6 39 25.8176 10.02103 1.60465 22.5691 29.0660 12.72 52.37
Total 1257 .0000 5.83630 .16462 -.3230 .3230 -2.34 52.37
F3 1 775 -.7485 1.84445 .06625 -.8785 -.6184 -1.37 15.49
2 72 -.8432 1.67400 .19728 -1.2366 -.4498 -1.37 10.42
3 134 6.5618 8.48330 .73285 5.1123 8.0114 -1.37 33.56
4 159 -1.1899 .51443 .04080 -1.2705 -1.1093 -1.37 1.49
5 78 -.0936 2.46480 .27908 -.6493 .4621 -1.37 10.53
6 39 -1.0775 .81042 .12977 -1.3402 -.8148 -1.37 2.17
Total 1257 .0000 3.93664 .11103 -.2178 .2178 -1.37 33.56
F4 1 775 -.8464 2.34708 .08431 -1.0119 -.6809 -2.16 12.00
2 72 -.3007 2.79879 .32984 -.9584 .3570 -2.16 14.47
3 134 .7560 5.01901 .43358 -.1016 1.6136 -2.16 26.10
4 159 -1.0648 2.10815 .16719 -1.3950 -.7346 -2.16 9.26
5 78 10.1722 11.15590 1.26316 7.6570 12.6875 -2.16 50.56
6 39 -1.2256 1.59152 .25485 -1.7416 -.7097 -2.16 3.76
Total 1257 .0000 4.67919 .13198 -.2589 .2589 -2.16 50.56
F5 1 775 .3354 5.46140 .19618 -.0497 .7205 -1.82 71.03
2 72 -.3526 2.49785 .29437 -.9396 .2343 -1.82 12.39
3 134 -.5388 2.44067 .21084 -.9559 -.1218 -1.82 12.50
4 159 -1.0269 2.45412 .19462 -1.4113 -.6425 -1.82 25.84
5 78 -.6604 1.85520 .21006 -1.0786 -.2421 -1.82 7.09
6 39 1.3449 4.27696 .68486 -.0416 2.7313 -1.82 15.75
Total 1257 .0000 4.60547 .12990 -.2548 .2548 -1.82 71.03
F6 1 775 -.2630 3.29035 .11819 -.4950 -.0310 -2.64 15.92
2 72 8.6821 12.20590 1.43848 5.8138 11.5503 -2.64 50.58
3 134 -2.2035 .99503 .08596 -2.3735 -2.0334 -2.64 2.15
4 159 -1.2229 1.46333 .11605 -1.4521 -.9937 -2.64 4.67
5 78 .1517 2.98166 .33761 -.5206 .8240 -2.64 10.43
6 39 1.4513 3.97872 .63711 .1615 2.7410 -2.64 12.49
Total 1257 .0000 4.65077 .13118 -.2573 .2573 -2.64 50.58
F7 1 775 -1.7200 2.38766 .08577 -1.8884 -1.5517 -4.03 6.94
2 72 1.2139 4.61241 .54358 .1300 2.2977 -4.03 18.13
3 134 -2.8041 1.57486 .13605 -3.0731 -2.5350 -4.03 4.27
4 159 11.3777 5.60051 .44415 10.5005 12.2550 3.22 30.36
5 78 -1.0456 3.20802 .36324 -1.7689 -.3223 -4.03 11.92
6 39 -2.7216 3.23525 .51805 -3.7703 -1.6728 -4.03 15.47
Total 1257 .0000 5.41351 .15269 -.2996 .2996 -4.03 30.36

F8 1 775 -.7218 2.35914 .08474 -.8882 -.5555 -2.03 17.95
2 72 -1.1108 1.81867 .21433 -1.5381 -.6834 -2.03 6.00
3 134 -.2289 2.00141 .17290 -.5709 .1131 -2.03 11.20
4 159 -1.6072 1.28089 .10158 -1.8078 -1.4065 -2.03 8.63
5 78 11.5404 9.81900 1.11178 9.3265 13.7542 -2.03 39.05
6 39 .6529 4.92769 .78906 -.9444 2.2503 -2.03 16.33
Total 1257 .0000 4.46109 .12583 -.2469 .2469 -2.03 39.05
F9 1 775 -.6976 2.76180 .09921 -.8923 -.5028 -3.00 26.40
2 72 -.1311 2.70570 .31887 -.7669 .5047 -3.00 9.13
3 134 6.8956 9.77446 .84438 5.2254 8.5658 -3.00 65.71
4 159 -1.5884 1.21759 .09656 -1.7792 -1.3977 -3.00 2.68
5 78 -.4929 2.47942 .28074 -1.0520 .0661 -3.00 7.75
6 39 -2.1271 1.47721 .23654 -2.6060 -1.6483 -3.00 2.89
Total 1257 .0000 4.65937 .13142 -.2578 .2578 -3.00 65.71
F10 1 775 -.9072 3.50589 .12594 -1.1544 -.6599 -4.03 17.62
2 72 14.9900 8.98762 1.05920 12.8780 17.1020 -4.03 35.57
3 134 -2.7043 1.63691 .14141 -2.9840 -2.4246 -4.03 5.18
4 159 -.5382 3.21347 .25485 -1.0415 -.0348 -3.89 10.35
5 78 .8504 4.05708 .45937 -.0643 1.7652 -4.03 17.53
6 39 .1380 3.73877 .59868 -1.0740 1.3499 -4.03 11.72
Total 1257 .0000 5.42277 .15295 -.3001 .3001 -4.03 35.57
F11 1 775 -.1661 2.48777 .08936 -.3415 .0093 -.86 25.26
2 72 .0630 1.70718 .20119 -.3382 .4641 -.86 9.25
3 134 1.7206 7.44665 .64329 .4482 2.9930 -.86 39.91
4 159 -.6243 1.11762 .08863 -.7994 -.4493 -.86 10.02
5 78 .0578 1.64304 .18604 -.3127 .4282 -.86 6.72
6 39 -.2975 1.29823 .20788 -.7183 .1234 -.86 6.01
Total 1257 .0000 3.25725 .09187 -.1802 .1802 -.86 39.91
F12 1 775 .0491 3.88165 .13943 -.2246 .3228 -2.32 35.09
2 72 .1018 2.56701 .30253 -.5015 .7050 -2.32 10.16
3 134 -.5944 2.43468 .21032 -1.0104 -.1784 -2.32 9.73
4 159 .1818 2.69452 .21369 -.2402 .6039 -2.32 13.46
5 78 .5309 4.56334 .51670 -.4980 1.5598 -2.32 19.77
6 39 -.9244 1.97424 .31613 -1.5643 -.2844 -2.32 4.80
Total 1257 .0000 3.56106 .10044 -.1971 .1971 -2.32 35.09

Table 40

Results of ANOVA for Twelve Factors (Predictor Variables) in a Six-cluster Solution

Factor Source Sum of Squares df Mean Square F Sig.
F1 Between Groups 53475.049 5 10695.010 503.230 .000
Within Groups 26587.142 1251 21.253
Total 80062.191 1256
F2 Between Groups 27123.226 5 5424.645 433.370 .000
Within Groups 15659.203 1251 12.517
Total 42782.429 1256
F3 Between Groups 6526.184 5 1305.237 126.204 .000
Within Groups 12938.200 1251 10.342
Total 19464.384 1256
F4 Between Groups 8948.203 5 1789.641 120.681 .000
Within Groups 18551.713 1251 14.830
Total 27499.916 1256
F5 Between Groups 407.264 5 81.453 3.884 .002
Within Groups 26232.951 1251 20.970
Total 26640.215 1256
F6 Between Groups 6453.179 5 1290.636 77.948 .000
Within Groups 20713.632 1251 16.558
Total 27166.811 1256
F7 Between Groups 24409.693 5 4881.939 492.572 .000
Within Groups 12398.816 1251 9.911
Total 36808.509 1256
F8 Between Groups 11315.020 5 2263.004 206.930 .000
Within Groups 13681.049 1251 10.936
Total 24996.069 1256
F9 Between Groups 7346.547 5 1469.309 92.270 .000
Within Groups 19920.849 1251 15.924
Total 27267.396 1256
F10 Between Groups 17899.320 5 3579.864 235.270 .000
Within Groups 19035.179 1251 15.216
Total 36934.499 1256
F11 Between Groups 484.071 5 96.814 9.431 .000
Within Groups 12841.680 1251 10.265
Total 13325.751 1256
F12 Between Groups 110.525 5 22.105 1.748 .121
Within Groups 15816.960 1251 12.643
Total 15927.485 1256

Table 41

Discursive Links Between Twelve Factors Based on Highest Mean Scores for Six Clusters

Cluster Cluster Label Factors


1 Other (no factors)
2 Contestation over language ownership & name F6 F10
3 Language education incl. Offic. of Bosnian F1 F3 F9 F11
4 Literature & publishing F7
5 Officialization of Montenegrin 1 & 2 F4 F8 F12*
6 Cyrillic-only & Minority language rights F2 F5
* p = .121

As can be seen from Table 39, then, the mean factor scores are different for each

of the six clusters. Also, there is a significant difference between the mean factor scores

for all factors except Factor 12 (Table 40). The mean plots further show that none of the

factors has a high mean score for Cluster 1, while the highest Factor 12 mean score is for

Cluster 5 but without a statistically significant difference. Table 41 shows how the

remaining factors grouped by cluster based on their highest mean scores.
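
The F ratios in Table 40 can be checked directly from the reported sums of squares, since each F is the between-groups mean square divided by the within-groups mean square. The following minimal sketch does this for three of the twelve factors; the figures are copied from Table 40, while the variable and function names are illustrative.

```python
# (between-groups SS, within-groups SS) copied from Table 40.
sums_of_squares = {
    "F1": (53475.049, 26587.142),
    "F5": (407.264, 26232.951),
    "F12": (110.525, 15816.960),
}

N_TEXTS, N_CLUSTERS = 1257, 6
df_between = N_CLUSTERS - 1        # 5
df_within = N_TEXTS - N_CLUSTERS   # 1251


def f_ratio(ss_between, ss_within):
    """F = (SS_between / df_between) / (SS_within / df_within)."""
    return (ss_between / df_between) / (ss_within / df_within)


for factor, (ssb, ssw) in sums_of_squares.items():
    print(factor, round(f_ratio(ssb, ssw), 3))
```

The recomputed ratios agree with the tabled values (503.230, 3.884, and 1.748), and only the ratio for Factor 12 is small enough to fall short of significance.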

As mentioned in the previous section, texts loading on Factors 6 and 10

(Contestation over language ownership and name; Linguistics as a science, lexicography,

standardization and contestation) share a focus on the contestation of Central South

Slavic ethnolinguistic identities and thus these two factors are grouped in Cluster 2.

Similarly, texts loading on Factors 1, 3, 9 and 11 (Language education; Entrance exams;

Foreign language education; Officialization of Bosnian) all share a focus on (language)

education and thus are grouped in Cluster 3. Here, three generally ethnolinguistically

neutral language education-related factors (Factors 1, 3, and 9) are joined by

ethnolinguistically-specific Factor 11 because the officialization of Bosnian is discussed

in the context of language education and the (small ‘d’) discourses represented by all four

factors share language education-related lexis. Factor 7 (Literature and publishing) with

the highest mean score for Cluster 4 is not grouped with any other factors. This was

expected as texts loading on this factor typically dealt with subject matter that was

unrelated to subject matter in texts loading on other factors. Factors 4 and 8

(Officialization of Montenegrin 1 and 2), grouped in Cluster 5, make a logical set since

both deal with the same issue, albeit with slightly different foci, as explained above.

Finally, Factors 2 and 5 (Cyrillic-only, Minority language rights) are grouped in Cluster 6

on account of a tendency in the discourse on the endangerment of the Cyrillic alphabet to

include references to minority rights (as illustrated in Section 6.3.5).
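
The grouping in Table 41 follows from a simple rule: each factor is assigned to the cluster in which its mean score is highest. A minimal sketch of that rule is given below, using the cluster means copied from Table 39; the variable names are illustrative.

```python
# Mean factor scores for Clusters 1-6, copied from Table 39.
cluster_means = {
    "F1": [-2.0691, -3.2274, 18.7734, -3.3504, -0.3074, -3.1540],
    "F2": [-0.5533, -0.9219, -1.6581, -1.6931, -0.2607, 25.8176],
    "F3": [-0.7485, -0.8432, 6.5618, -1.1899, -0.0936, -1.0775],
    "F4": [-0.8464, -0.3007, 0.7560, -1.0648, 10.1722, -1.2256],
    "F5": [0.3354, -0.3526, -0.5388, -1.0269, -0.6604, 1.3449],
    "F6": [-0.2630, 8.6821, -2.2035, -1.2229, 0.1517, 1.4513],
    "F7": [-1.7200, 1.2139, -2.8041, 11.3777, -1.0456, -2.7216],
    "F8": [-0.7218, -1.1108, -0.2289, -1.6072, 11.5404, 0.6529],
    "F9": [-0.6976, -0.1311, 6.8956, -1.5884, -0.4929, -2.1271],
    "F10": [-0.9072, 14.9900, -2.7043, -0.5382, 0.8504, 0.1380],
    "F11": [-0.1661, 0.0630, 1.7206, -0.6243, 0.0578, -0.2975],
    "F12": [0.0491, 0.1018, -0.5944, 0.1818, 0.5309, -0.9244],
}

# Assign each factor to the cluster (numbered 1-6) with its highest mean.
grouping = {cluster: [] for cluster in range(1, 7)}
for factor, means in cluster_means.items():
    best_cluster = max(range(len(means)), key=lambda i: means[i]) + 1
    grouping[best_cluster].append(factor)

for cluster, factors in grouping.items():
    print(cluster, factors if factors else "(no factors)")
```

Applied to the Table 39 means, the rule leaves Cluster 1 without any factors and places Factor 12 in Cluster 5, as in Table 41.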

The results of cluster analysis shown above thus suggest two conclusions. First,

lexical covariance extends beyond the individual factors and suggests the existence of

discursive links between factors, i.e. small ‘d’ discourses shared by two or more factors.

The most obvious example of this is the discourse on the officialization of Montenegrin

(Cluster 5), but other (and perhaps qualitatively somewhat different) discursive links

between factors can be inferred from the other clusters as well (e.g., language education

discourse in Factors 1, 3, 9, and 11). Second, despite its obvious usefulness, lexical

covariance alone is not sufficient to identify (big ‘D’) discourses and particularly

ideologies. This can be seen in the way Factors 4, 6, 8, 10, and 11, all of which bear

traces of an overarching (big ‘D’) discourse of contestation (see Section 6.6), are grouped

in separate clusters. Although each of these factors points to an aspect of the discourse of

contestation suggested by previous analyses and thus confirms its existence and extent,

quantitative correlational analysis seems unable to capture the underlying link and thus

the (big ‘D’) discursive (and ultimately ideological) construct itself. This, as has been

suggested widely, must be done through qualitative analysis.

6.5.2 Synchronic and diachronic clustering patterns. Cluster analysis can also

help us focus qualitative analysis further by cross-tabulating data to examine whether

texts cluster in any particular patterns with respect to independent variables such as, in

our case, publication, year of publication, and type of article (general newspaper articles

vs. letters-to-the-editor, Tables 42-44).

Table 42

Cluster Membership by Publication for the Six-cluster Solution

Cluster Cluster label Blic NIN Politika Vreme Total
1 Other (no factors) 104 147 477 47 775
2 Contestation over lang. owner. & name 3 5 63 1 72
3 Lang. educ. incl. Offic. of Bosnian 35 3 89 7 134
4 Literature & publishing 8 22 123 6 159
5 Officialization of Montenegrin 1 & 2 10 7 60 1 78
6 Cyrillic-only & Minority language rights 4 0 35 0 39
Total 164 184 847 62 1257

Table 43

Cluster Membership by Year of Publication for the Six-cluster Solution

Cluster Cluster label 2003 2004 2005 2006 2008 Total
1 Other (no factors) 173 157 145 158 142 775
2 Contestation over lang. owner. & name 12 5 24 20 11 72
3 Lang. educ. incl. Offic. of Bosnian 49 42 20 9 14 134
4 Literature & publishing 36 28 33 31 31 159
5 Officialization of Montenegrin 1 & 2 21 34 7 6 10 78
6 Cyrillic-only & Minority language rights 6 7 10 11 5 39
Total 297 273 239 235 213 1257

Table 44

Cluster Membership by Type of Article for the Six-cluster Solution

Cluster Cluster label Newspaper articles Letters-to-the-editor Total
1 Other (no factors) 685 90 775
2 Contestation over lang. owner. & name 63 9 72
3 Lang. educ. incl. Offic. of Bosnian 125 9 134
4 Literature & publishing 156 3 159
5 Officialization of Montenegrin 1 & 2 76 2 78
6 Cyrillic-only & Minority language rights 23 16 39
Total 1128 129 1257

Expectedly, a majority of texts (775/1,257 or 61.7%) are grouped in Cluster 1 for

which none of the twelve factors had high mean scores; this cluster thus probably

represents the majority of the variance in the data (65.87%) unaccounted for by this

factor solution. Total cluster membership further shows that the largest number of the

remaining texts (159) are grouped in Cluster 4, which is not surprising considering the

general nature of Factor 7 (Literature and publishing) which had the highest mean score

for this cluster. The second largest cluster was Cluster 3 (134 texts) with four of the

twelve factors (1, 3, 9, 11) having their highest mean scores for that cluster as well. This

was followed by Cluster 5 (grouping Factors 4, 8, and 12, and 78 texts), Cluster 2

(grouping Factors 6 and 10, and 72 texts), and Cluster 6 (grouping Factors 2 and 5, and

39 texts).

Table 42 shows that there is no clear relationship between ‘publication’ and

cluster membership, although it should be noted that Cluster 6 comprises no texts from

the weeklies (NIN, Vreme). This suggests either that weeklies were less interested in the

Cyrillic alphabet and minority language rights (Factors 2 and 5) as compared with the

dailies (Blic, Politika), or that the readers of weeklies themselves had less interest in these

issues and hence did not write any letters-to-the-editor pertaining to them; it is also

possible that fringe groups’ activists were less interested in the weeklies as a vehicle for

their message on the endangerment of the Cyrillic alphabet so they did not contribute any

letters-to-the-editor either. Further, the largest numbers of Blic and Vreme articles

grouped outside of Cluster 1 (35 and 7, respectively) were concerned with language

education (Cluster 3, Factors 1, 3, 9, and 11), whereas the largest numbers of NIN and

Politika articles (22 and 123, respectively) were concerned with literature and publishing

(Cluster 4, Factor 7). Interestingly, Politika articles contributed the largest proportion of

texts pertaining to Central South Slavic identities and suggesting discourses of

endangerment and contestation (63/72 for Cluster 2/Factors 6 and 10, 60/78 for Cluster

5/Factors 4 and 8, and 35/39 for Cluster 6/Factors 2 and 5), but this finding is hardly

surprising considering that most 5+ hits articles (847/1,257) come from Politika.
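
The comparison made in this paragraph can be illustrated by setting Politika's share of each cluster against its share of the data set as a whole. The following minimal sketch uses the counts from Table 42; the function and variable names are illustrative.

```python
# Rows of Table 42: cluster -> counts for (Blic, NIN, Politika, Vreme).
PUBLICATIONS = ("Blic", "NIN", "Politika", "Vreme")
table_42 = {
    1: (104, 147, 477, 47),
    2: (3, 5, 63, 1),
    3: (35, 3, 89, 7),
    4: (8, 22, 123, 6),
    5: (10, 7, 60, 1),
    6: (4, 0, 35, 0),
}


def share_by_cluster(table, publication):
    """Proportion of each cluster's texts that come from `publication`."""
    col = PUBLICATIONS.index(publication)
    return {cluster: row[col] / sum(row) for cluster, row in table.items()}


def overall_share(table, publication):
    """Proportion of all texts in the table that come from `publication`."""
    col = PUBLICATIONS.index(publication)
    total = sum(sum(row) for row in table.values())
    return sum(row[col] for row in table.values()) / total


politika_by_cluster = share_by_cluster(table_42, "Politika")
politika_overall = overall_share(table_42, "Politika")
for cluster, share in politika_by_cluster.items():
    print(f"Cluster {cluster}: {share:.1%} (overall {politika_overall:.1%})")
```

Read against the overall share of roughly two thirds, Politika's shares of Clusters 2, 5, and 6 are somewhat higher than its share of the data set, although the small cluster sizes call for caution in interpreting the difference.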

Table 43 shows that there is no clear relationship between ‘year of publication’

and cluster membership, although, based on the numbers of articles, discourses

manifested in texts grouped in Clusters 3 and 5 (pertaining to language education and the

officialization of Bosnian and Montenegrin) seem to have been more pertinent in 2003

and 2004, while discourses manifested in texts grouped in Clusters 2 and 6 (pertaining to

contestation, Cyrillic alphabet, and minority language rights) seem to have been more

pertinent in 2005 and 2006. Arguably, these patterns simply reflect peaks in (public)

interest in these issues and the concomitant fluctuation in discursive activity and numbers

of articles (rather than any qualitative differences in terms of the big ‘D’ discourses).

Finally, Table 44 shows that there is no clear relationship between ‘type of article’

and cluster membership, although Clusters 4 and 5 (pertaining to literature and

publishing, and the officialization of Montenegrin, respectively) do exhibit the lowest

numbers of letters-to-the-editor, whereas discourses manifested in texts grouped in

Cluster 6 (pertaining to the Cyrillic alphabet and minority language rights) again seem to

have been comparatively more pertinent in letters-to-the-editor, as indicated by analysis

of variance in Section 6.4.5 also. These patterns suggest that writers of the letters-to-the-

editor thematizing language (whether lay people, activists, or experts) seem to have had

surprisingly little interest in the officialization of Montenegrin, but this is perhaps another

indicator of a general waning of public (and particularly lay) interest in issues of

ethnolinguistic identity toward the end of this period (see Figure 2), possibly due to a

fatigue with nationalism and the inevitability of Montenegrin independence; that writers

of letters-to-the-editor would also have little interest in the comparatively less

controversial (and more technical) issues of literature and publishing is less surprising.

The status of the Cyrillic alphabet and minority language rights, on the other hand, are

issues that seem to have been closer to home for many writers of letters-to-the-editor, and

particularly fringe groups’ activists, so the comparatively high degree of interest in issues

represented by Cluster 6 exhibited in this type of text is expected here. Arguably, the

discourse on the endangerment of the Cyrillic alphabet is part of a broader discourse of

declining language standards (arising from standard language ideology) which is widely

attested in the public (and, again, particularly lay) language-related discourses in many

other societies (see, e.g., Johnson & Ensslin, 2007, especially Part II), and Serbian society

does not seem to be an exception in this respect.

Based on a lack of any clear-cut patterns here, we can conclude that, despite some

observed differences in small ‘d’ discourses represented by factors, there is a degree of

overlap between the discursive profiles of the different publications, and between the

general newspaper articles and letters-to-the-editor, as well as relative stability in small

‘d’ discourses during this period. This is coupled with a high degree of both synchronic

and diachronic stability in (big ‘D’) language-related discourses (endangerment and

contestation, see Section 6.6). With the presentation of the results of quantitative

analyses complete, we now turn to a qualitative analysis of texts identified as

representative by factor scores.

6.6 Critical Discourse Analysis/Discourse-historical Approach

This section presents the results of a qualitative analysis of representative texts.

As noted above, qualitative analysis was based on and informed by the results of

quantitative analysis; the results of qualitative analysis were in turn checked against the

results of quantitative analysis. Initial qualitative analysis consisted of basic content and

thematic analysis informed by the results of quantitative analyses above; this was

followed by an analysis of topoi in the CDA/DHA tradition. The findings are presented

by factor (i.e., small ‘d’ language-related discourse); again, factors grouped in the same

clusters are treated together (with the exception of Factors 2 and 11 as their ‘partner’

factors, Factor 5 and Factors 1, 3, and 9, respectively, were shown to be marginal in terms

of Central South Slavic ethnolinguistic identities and ethnonationalism). Note that the

texts cited below can be identified as representative by looking up their file codes in the

tables showing top scoring texts for individual factors in Section 6.3 (for a key to the

tables, see Section 4). Presentation is organized as follows: presented first are excerpts

in the original Serbian, followed by English translations; file codes are only given with the

original text (and translations when repeated without the original text).

6.6.1 Excerpts from texts representative of Factor 2. Factor 2 (Cyrillic-only)

points to texts discussing the issues of alphabet choice and status in Serbia. These are

most often discussed in the context of changes to the constitution that were under

consideration during this period,

Usvajanje novog ustava Srbije, o kome se već dugo priča, podstaklo je i raspravu
o službenom pismu naše države. (BLI-2-9-2006-272)

The adoption of the new constitution of Serbia, which has been discussed for a
long time now, initiated a discussion about the official alphabet in our country.

as well as with respect to the process of accession to the European Union,

Ima mišljenja, veli on, da sa svojim pismom “ne možemo u Evropu i svet” i da
stoga moramo preći na latinicu. (POL-16-3-2003-93)

Some people think, he says, that “we cannot join Europe and the world” with our
alphabet so we have to switch to the Latin [alphabet].

Političke potrebe danas zahtevaju unifikaciju latiničkog pisma svugde u svetu.
(POL-22-8-2006-59)

Today, political needs demand a unification of the Latin alphabet everywhere in
the world.

Pobornici stava da srpski jezik treba da ima dva službena pisma, latinicu i
ćirilicu, smatraju se naprednijim, ističući da je upotreba latinice ono što će nas
približiti Zapadu. (BLI-2-9-2006-272)

The proponents of a two-alphabet (Latin and Cyrillic) solution for the Serbian
language consider themselves more progressive, insisting that the use of the Latin
alphabet is what will bring us closer to the West.

As noted above, Factor 2 suggests a discourse and ideology of endangerment,

Nema naroda u svetu koji danas drži do sebe i do svojih kulturnih i nacionalnih
korena a da toliko zapostavlja svoje pismo koliko to čini srpski narod. (POL-16-
3-2003-93)

Today, no nation in the world which cares about its identity and its cultural and
national roots neglects its alphabet as much as the Serbian nation does.

objasnio da je Avramov na ideju da se ukine latinica došao zbog toga što se
uplašio za ćirilicu. (POL-16-4-2005-91)

explained that Avramov [the mayor of Šid, an urban area in the province of
Vojvodina] came up with the idea to outlaw the Latin alphabet because he was
afraid for the Cyrillic.

which is supported by frequent (selective and flawed) comparisons to language situations

elsewhere in the world,

[d]a li je autorima ovog predloga poznat još neki narod na svetu koji za svoj jezik
koristi tuđe pismo (POL-11-2-2005-108)

do the authors of this proposal know of any other nation in the world which uses
somebody else’s alphabet in its own language

opšte pravilo: jedan jezik – jedno pismo, jer na dva pisma u svom jeziku ni jedan
drugi narod u svetu ne piše (POL-21-9-2004-72)

general rule: one language – one alphabet, because no nation in the world uses
two alphabets to write its language.

The importance of alphabet to the national identity is made explicit,

Ćirilično pismo je Srbima deo identiteta, to je njihova važna odrednica bez koje
oni ne bi više bili ono što su bili i što jesu. […] nije reč o ličnoj upotrebi pisma,
već je reč o kolektivnom i osnovnom ljudskom pravu Srba na svoj jezik i svoje
pismo. (POL-16-3-2003-93)

For Serbs the Cyrillic alphabet is part of their identity, it is their important
determiner without which they would not have been who they have been and
without which they would not be who they are […] this is not about personal use
of alphabet, but rather about a collective and basic human right of Serbs to their
language and their alphabet.

as is the ultimate goal,

Svuda u svetu pismo i jezik većinskog naroda mora biti na prvom mestu, i mi
tražimo da tako bude i kod nas. (POL-21-9-2004-72)

Everywhere in the world the language and the alphabet of the majority must come
first and we ask that this be so here also.

In addition, the Latin alphabet is often, implicitly or explicitly, identified with the Croats

(and more rarely, Bosniaks),

srpski jezik nikada nije pisan latinicom, sve do trenutka kada je zloupotrebljena
dobra volja da se južnoslovenskim saplemenicima Hrvatima pomogne da i oni
konačno dobiju svoje pismo (POL-11-2-2005-108)

Serbian language had never been written using the Latin alphabet until the
moment when the good will to help the South Slavic co-tribesmen Croats finally
get their own alphabet was abused

as well as non-South Slavic minorities which are routinely discussed in terms of their

relative population sizes,

Avramov je mislio da nacionalne manjine koje ne prelaze 15 odsto ukupnog broja
stanovništva nemaju pravo na službenu upotrebu maternjeg jezika. (POL-16-4-
2005-91)

Avramov [the mayor of Šid, an urban area in the province of Vojvodina] thought
that national minorities which do not cross the threshold of 15 percent of the total
population do not have a right to official use of their mother tongue [and
alphabet].

Interestingly, however, there is also evidence of discourses offering alternative

argumentation (which nevertheless subscribe to the view of endangerment),

U poslednje vreme u “Politici” je objavljeno nekoliko tekstova u kojima se tvrdi
da latinica koju Srbi koriste u svom jeziku nije srpsko, već “hrvatsko pismo”. To
je, nesumnjivo, pokušaj da se ponovo pokrene kampanja protiv latinice i protiv
“zatiranja srpske ćirilice”, koju predvode članovi nekoliko udruženja za zaštitu
ćirilice i njihovi istomišljenici. Mada sam Srbin i na srpskom pišem isključivo
ćirilicom, smatram da su njihovi stavovi i rad pogrešni, nekorisni, pa čak i štetni
za srpski jezik i Srbiju. […] Članovi “ćiriličarskih” udruženja i njihovi
istomišljenici beskorisno mašu parolama, kao što su “srpska latinica ne postoji”,
“latinica nije srpsko već hrvatsko pismo” i sl. i postavljaju pitanje postoji li još
neki narod na svetu koji za svoj jezik koristi tuđe pismo? Naravno da ne postoji,
pošto svaki narod pismo koje koristi smatra svojim, bez obzira na to gde, kada i
kako je ono nastalo. Poznato je, na primer, da su simboli japanskog pisma nastali
u Kini, ali Japanci i drugi narodi kažu da oni pišu japanskim pismom. Uostalom,
francusko, englesko i holandsko pismo je identično, pa niko nikog ne optužuje za
korišćenje tuđeg pisma. Oni koji uporno tvrde da je latinica koju koriste Srbi
“tuđe pismo”, ne znaju ili ne žele da znaju da u svetskoj lingvistici ne postoji
ekskluzivno vlasništvo nad bilo kojim pismom. Pisma pripadaju svim jezicima
koji ih koriste, bilo u celosti, ili samo delimično. Stoga latinica koju Srbijanci uče
i koriste već 90 godina, a ostali Srbi i znatno duže i kojom se danas stalno ili
pretežno služi 80 odsto Srba, ne može biti tuđe, već samo srpsko pismo. […] Na
zaštiti jezika i pisma moraju se pokrenuti i učiniti odgovornim mnogi segmenti
društva. Tek onda se može očekivati da će kroz određeno vreme ćirilica postati ne
jedino, već prvo pismo srpskog jezika. Inače, satanizacijom latinice etiketama
“tuđa” i “hrvatska” neće se zaštititi srpska ćirilica. (POL-8-4-2005-138)

Recently, “Politika” published several texts which claim that the Latin alphabet
which Serbs use in their language is not a Serbian but rather “a Croatian
alphabet”. This is, undoubtedly, an attempt to again start a campaign against the
Latin alphabet and against the “destruction of the Serbian Cyrillic”, which has
been led by members of several associations for the protection of the Cyrillic and
their supporters. Although I am a Serb and although I use only Cyrillic to write in
Serbian, I think that their attitude and work are wrong, useless, and even
damaging to the Serbian language and Serbia. […] The members of the
“Cyrillic” associations and their supporters uselessly throw slogans around such
as that “Serbian Latin alphabet does not exist”, “Latin is not a Serbian but
Croatian alphabet” and so on, and ask if there is another nation in the world which
uses somebody else’s alphabet in its language. Of course, there isn’t because
every nation considers its own the alphabet it uses regardless of where, when and
how it came to be. It is well known, for example, that the characters of the
Japanese alphabet originate from China, but the Japanese and others say they use
the Japanese alphabet to write. Besides, the French, English and Dutch alphabets
are identical and yet no one accuses anyone of using somebody else’s alphabet.
Those who insist that the Latin alphabet used by Serbs is “somebody else’s
alphabet” do not know or do not want to know that exclusive ownership over any
alphabet does not exist in world linguistics. Alphabets belong to all languages
that use them, either in whole or in part. Therefore, the Latin alphabet learned
and used for as long as 90 years by Serbians and even longer by other Serbs, and
by 80 percent of Serbs all the time or part of the time, cannot be somebody else’s
but only a Serbian alphabet. […] In order to protect a language and alphabet
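many different segments of society must be activated and made responsible.
Only then can it be expected that, in time, the Cyrillic will become not the only
but the first alphabet of the Serbian language. Using labels such as “somebody
else’s” and “Croatian” to satanize the Latin alphabet will not protect the
Serbian Cyrillic.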

6.6.2 Excerpts from texts representative of Factors 4 and 8. Factors 4 and 8

(Officialization of Montenegrin 1 and 2) point to a discourse on the issue of change in

language policy in Montenegro whereby the name of the official language was first

changed from Serbian to mother tongue (prior to independence) and then from mother

tongue to Montenegrin (upon independence). Factor 4 points to texts discussing protests

against the change in policy by students and professors of Serbian in Montenegro, some

of whom ultimately lost their jobs because of their refusal to implement the new policy,

Profesori srpskog jezika i književnosti nikšićke gimnazije, koji već šesti dan
bojkotuju izvođenje nastave zbog preimenovanja nastavnog predmeta u
“maternji”, dobili su podršku kolega iz škole, koji su u pismu upućenom
Ministarstvu prosvete Crne Gore zapretili opštim bojkotom nastave, ukoliko se
njihovim kolegama uruče najavljeni otkazi. (POL8-9-2004-157)

The professors of the Serbian language and literature at the Nikšić [an urban area
in Montenegro] high school, who have boycotted instruction in protest against the
renaming of their subject to ‘mother tongue’ for six days now, received support
from their colleagues from the school, who in their letter sent to the Ministry of
Education of Montenegro threatened a general boycott if their colleagues are fired
as has been announced.

The protesters and various other actors in Serbia itself such as journalists, linguists, and

politicians object to the policy on historical, cultural, practical, and (pseudo-) scientific

grounds,

Proteste su uputili i prosvetni radnici iz Kotora, ističući da se na tim prostorima
vekovima govori srpski. (POL-2-2004-166)

Education workers from Kotor [an urban area in Montenegro] also protested,
emphasizing that Serbian has been spoken in their area for centuries.

taj čin doprineo ostvarivanju težnji vlasti i dela ljudi u Crnoj Gori da se odavde
protera sve što bi moglo da asocira na srpstvo (POL-15-4-2004-89)

that act contributed to the realization of the plan of the authorities and a part of the
people in Montenegro to banish from here all associations to Serbhood

Štrajkači glađu su pozvali sve studente, prosvetne radnike i đake Nikšića da im
se pridruže i zajednički dignu glas u „odbrani onog što nam je sveto”. (POL-15-4-
2004-89)

The hunger strikers called on all students and education workers of Nikšić [an
urban area in Montenegro] to join them and raise their voices in “the defense of
what is sacred”.

nije im problem žrtvovati struku i nauku, istoriju i tradiciju, uneti zabunu i haos u
nastavni i školski sistem, a time i u društvo u celini. (POL-7-7-2004-109)

they have no problem sacrificing profession and science, history and tradition, or
introducing confusion and chaos into the educational system and therefore the
society as a whole.

Vijeće smatra da u programima za osnovnu i srednju školu ne može stajati naziv
maternji jezik, zbog toga što je neprecizan, lingvistički problematičan i
neutemeljen i izazvao bi brojne naučno-stručne i praktične probleme... (POL-2-
2004-166)

The council holds that the label mother tongue cannot be introduced into curricula
for elementary schools and high schools because it is imprecise, linguistically
problematic and baseless and because it would cause numerous scientific and
professional as well as practical problems…

portraying it as mere ‘politicking’,

Naš Odsek broji oko 300 studenata i svi smo jedinstveni da ne dozvolimo
mešanje najprizemnije politike u fundamentalne naučne i lingvističke principe.
(POL8-9-2004-157)

Our department has around 300 students and we are all of one mind not to allow
the meddling of basest politics in fundamental scientific and linguistic principles.

pokušaj normiranja crnogorskog jezika „politikantski projekat, proizvod
političke manipulacije, koji nema istorijsku, tradicionalno-kulturnu, simboličku,
naučnu, niti jezičku zasnovanost” (POL-5-9-2008-165)

the attempt at norming the Montenegrin language “a politicking project, a product
of a political manipulation which has no historical, traditional, cultural,
symbolic, scientific, or linguistic basis”

In addition, objections are also raised by relying on (again, selective and flawed)

comparisons with the outside world,

Ako Amerikancima ne smeta engleski jezik, ne vidim razloga zbog čega bi nekom
u Crnoj Gori smetalo ime jezika srpski, ili srpskohrvatski. (POL-7-7-2004-109)

If Americans are OK with calling their language English, I can’t see any reason
why someone in Montenegro would have a problem with the name Serbian, or
Serbo-Croatian.

In addition to texts about the protests, Factor 8 points also to texts which discuss the

policy in a wider societal context. The argumentation, however, is similar. There is a

discourse of historical (in-) authenticity,

Svi istorijski izvori, književnost i celokupna kulturna baština Crne Gore ćiriličke
su provinijencije i svedoče da je jezik Crnogoraca bio i jeste srpski jezik. (POL-
31-3-2004-2)

All historical sources, literature and the entire cultural heritage of Montenegro are
Cyrillic and testify that the language of Montenegrins has always been and is the
Serbian language.

U Zakonu kralja Nikole, koji ima 83 člana, navedeno je 13 predmeta koji će se
izučavati u osnovnoj školi. Na prvom mestu je nauka hrišćanska, na drugom
srpska istorija i na trećem predmet srpski jezik. (POL-29-3-2004-25)

In King Nicholas I’s Law,35 which has 83 articles, 13 elementary school subjects
are mentioned. First is Christian doctrine, second is Serbian history, and third is
the subject of Serbian language.

Pokušaj je to i uvođenja nepostojećeg jezika, takozvanog crnogorskog. (BLI-20-
9-2004-204)

This is an attempt to introduce a non-existing language, the so-called
Montenegrin.

which includes explicit comparisons of the Montenegrin authorities to historical invaders

and colonial powers,

Prvi put je Austrougarska, ukinula srpski i ćirilicu, drugi put su Italijani 1941.
godine naložili da se uvede maternji jezik, a sada to čini crnogorska vlast.
(POL-31-3-2004-2)

The first time Serbian and Cyrillic were outlawed by Austria-Hungary, the
second time in 1941 the Italians ordered a change to mother tongue, and now this
is being done by the Montenegrin authorities.

Ono što Đukanović hoće do sada niko nije ostvario, ni turski osvajači. (BLI-20-9-
2004-204)

What Đukanović [then Prime Minister of Montenegro] wants has never been
accomplished by anyone, not even the Turkish invaders.

The theme of (symbolic) historical violence is also carried through to the present,

to nosi opasnost prekrajanja istorije, s jedne, i ubijanje duha, bića i stvaralaštva
naroda, s druge strane (POL-31-3-2004-2)

this means a danger of changing history, on the one hand, and of killing the spirit,
the being and the creativity of the people, on the other

protest zbog kršenja Ustava i nasilja nad jezikom (BLI-9-10-2004-161)

protest against the violations of the Constitution and violence against language

Očito, čine to nasilnom promjenom identiteta Crne Gore. (BLI-20-9-2004-204)

Obviously, they are doing this through a forced change of the identity of
Montenegro.

Similar to texts loading highly on Factor 4, argumentation is partly (pseudo-) scientific,

Poslednjih godina žestoke euforije crnogorskog ultranacionalizma, događanja su
dovela i do poremećaja, u izgovoru, a stvoren je i crnogorski “knjiški” jezik,
autora profesora dr Vojislava Nikčevića. On je napravio i nova dva slova za dva
glasa, koji se mogu čuti u crnogorskim lokalizmima. To je, međutim, uzeto kao
osnov političke ujdurme za stvaranje samostalnosti, odnosno potpunog ukidanja
jezičkog rodoslova sa srpstvom. [...] Kompetentni stručnjaci, lingvisti od
naučnog autoriteta, još od vremena Vuka Karadžića, tvorca književnog srpskog
jezika tvrde da je jezik Crnogoraca, odnosno Srba jedinstven jezik – objašnjava
akademik Dašić. – Jezik može biti, a ne mora, nazvan i po državi. Ne sporim
pravo onima koji žele da svoj jezik nazivaju crnogorski, jer svako ima pravo da
jezik kojim govori naziva po svom uverenju i osećanju. Samo tvrdim, a to naučno
argumentovano dokazuju i lingvisti, da ne postoje naučni, lingvistički, istorijski i
socio-kulturni razlozi za preimenovanje srpskog u crnogorski jezik. Srpskim
jezikom u Crnoj Gori govore, osim Crnogoraca i Srba i muslimani i Bošnjaci.
(POL-31-3-2004-2)

The recent years of euphoria of Montenegrin ultra-nationalism brought deviations
in accent, while also a Montenegrin “literary” language has been created by
author professor Dr. Vojislav Nikčević [well-known Montenegrin linguist]. He
created two new letters for two phonemes which can be heard in Montenegrin
localisms. However, that was taken as a basis for political shenanigans around
independence, that is a complete erasure of linguistic kin with Serbhood. […]
Competent experts, linguists with scientific authority, have claimed since the time
of Vuk Karadžić, the creator of the literary Serbian language, that the language of
Montenegrins and Serbs is the same, academician Dašić [of the Montenegrin
Academy of Sciences and Arts] explained. – A language can be, but doesn’t have
to be, named after a state. I do not deny the right to those who want to name their
language Montenegrin, because everyone has the right to name their language
according to their beliefs and feelings. I’m only saying, and linguists have
provided scientific argumentation for this, that there are no scientific, linguistic,
historical or sociocultural reasons to rename Serbian into Montenegrin. Besides
Serbs and Montenegrins, Serbian is spoken in Montenegro also by Muslims and
Bosniaks.

„crnogorski planeri i programeri nesrećno odabrali termin maternji jezik” (POL-
29-3-2004-25)

“Montenegrin planners and programmers made an unfortunate choice by opting
for the term mother tongue”

Ultimately, as can be seen below, discourses and ideologies ostensibly about

language are often revealed to be about conflicts, societal, political, religious, cultural,

scientific or otherwise (Gal, 1998, p. 323),

„veštačke montaže identiteta naroda sa ovih prostora” (POL-17-7-2006-92)

“artificial montage of identity of the people in this area”

„otvara proces asimilacije Srba i da će vlast, kroz otvaranje jezika i pitanje
položaja crkve, kroz diskriminaciju prema srpskom narodu, želeti, u narednom
periodu da taj narod prevede u ono u što ona želi – da postanu ljudi koji će se
nacionalno iskazivati kao Crnogorci, koji govore crnogorskim jezikom i
pripadaju nepostojećoj crnogorskoj crkvi” (POL-17-7-2006-92)

“a process of assimilation of Serbs has begun and the authorities will, through the
raising of the questions of language and the status of the church, through
discrimination against the Serbian people, want to transform that people into what
they want it to be – to become people who declare their nationality to be
Montenegrin, who speak the Montenegrin language and belong to the non-
existing Montenegrin church”

6.6.3 Excerpts from texts representative of Factor 11. Similar to Factors 4 and

8, Factor 11 (Officialization of Bosnian) suggests a discourse on a change in language

policy. The difference is that, unlike Montenegrin in Montenegro, Bosnian was

introduced (and thus recognized/officialized) in Serbia as a minority language, largely as

a result of the Council of Europe and European Union requirements pertaining to

minority rights. Factor 11 thus points to texts discussing the then pending recognition of

Bosnian as a minority language and its introduction as a subject in elementary schools in

areas with a Bosniak majority, as well as the resistance to this on the part of various

political and academic actors in Serbia. Similar to the officialization of Montenegrin, the

discussions around the introduction of Bosnian are characterized primarily by a discourse

of contestation,

Prosvetni odbor Skupštine Srbije je zaključio da je ministar prosvete Srbije
Slobodan Vuksanović prekoračio zakonska ovlašćenja i protivno zaključcima
ovog odbora i stavu stručne javnosti odobrio izvođenje nastave iz predmeta
bosanski jezik sa elementima nacionalne kulture. (POL-15-1-2005-107)

The educational board of the Serbian parliament has concluded that the minister
of education Slobodan Vuksanović [then Serbian Minister of Education] had
exceeded his legal authority and, contrary to the conclusions of this board and
the position of the expert public, approved instruction in the subject Bosnian
language with elements of national culture.

Ministar i njegov pomoćnik su istakli da bosanski jezik ne postoji, da propisi ne
predviđaju službenu upotrebu tog jezika, da nastavni planovi i programi i
udžbenici za taj predmet nisu odobreni, da taj predmet, po slovu zakona i
Pravilnika o nastavnim planovima i programima, ne može u ovoj školskoj godini
da bude ni redovni (obavezni), ni izborni, niti fakultativni predmet. (POL-11-11-
2004-127)

The minister and his assistant said that the Bosnian language did not exist, that
regulations did not foresee official use of that language, that curricula and
textbooks for that subject had not been approved, that that subject, according to
law and the Rulebook on curricula, could be neither a compulsory nor an elective
nor an optional subject in this school year.

which is mostly about the name of the language rather than the right to a minority status

itself,

Njihov zahtev bi mogao da se okarakteriše i kao manji od onoga što već uživaju
Albanci, Hrvati, Mađari... Samo kada bi bosanski jezik postojao. Greška ili
namera – Ne znam zašto su tražili da se uči bosanski, a ne bošnjački. Možda je
greška. (POL-12-11-2004-113)

Their request could also be characterized as less than what is already enjoyed by
Albanians, Croats, Hungarians… If only the Bosnian language existed. Error or
intent – I don’t know why they requested that Bosnian and not Bosniak be taught.
Perhaps it’s an error.

Zvanično ime jezika može da bude samo bošnjački jezik, odnosno da proizilazi iz
priznatog etnonima Bošnjaci, a ne “bosanski”. BiH je zemlja u kojoj žive tri
ravnopravna naroda i Bošnjaci ne treba da uzurpiraju pravo na bosansko ime.
Samo nekoliko primera, koji pokazuju da se u svetu koriste prvobitni nazivi za
jezike koji su u upotrebi dva ili više naroda. Tako, austrijski narod, koji ima
državu hiljadu godina, govori nemačkim jezikom, a ne austrijanskim ili
austrijskim. Švajcarci nemačkog porekla govore nemačkim, a ne švajcarskim
jezikom. Američki narod svoj jezik naziva engleskim, a ne angloameričkim ili
američkim. (POL-10-1-2005-134)

The official name of the language can only be Bosniak, deriving from the
recognized ethnonym Bosniaks, and not ‘Bosnian’. Bosnia-Herzegovina is a
country with three equal nations and Bosniaks should not usurp the right to the
Bosnian name. Just a couple of examples which show that for languages used by
two or more nations the original name is used around the world. Thus, the
Austrian nation, which has had a state for a thousand years, speaks German and
not Austrian. The Swiss of Germanic origin speak German, not Swiss. American
people call their language English, not Anglo-American or American.

Potom je na sednici pročitano stručno mišljenje koje je Prosvetnom odboru
dostavio Odbor za standardizaciju srpskog jezika SANU, iz koje se između
ostalog može zaključiti da se na srpskom bosanski jezik kaže bošnjački, a na
bosanskom – bosanski. (POL-15-1-2005-107)

After that an expert opinion was read in the meeting which had been provided to
the Educational board by the SANU Board for the standardization of the Serbian
language from which it can be concluded that in Serbian Bosnian means Bosniak,
and in Bosnian – Bosnian.

There is also evidence of a similar (pseudo-) scientific discourse,

Ako jezikoslovci kažu da bosanski jezik postoji (kao da je nauka o jeziku od juče
pa se ne zna koji jezici postoje na Balkanu i u svetu), onda pravnici treba da ga
pretoče u paragrafe. (POL-10-1-2005-134)

If linguists say that the Bosnian language exists (as if linguistics was a recent
development so we didn’t know which languages existed in the Balkans and
around the world), then lawyers need to turn it into paragraphs.

Podsetili su da je ministar svojevremeno rekao da bosanski jezik ne postoji, da
mora da sačeka stručnjake za jezik da kažu o tome kojim jezikom govore
Bošnjaci. (POL-15-1-2005-107)

They recalled that the minister had said at one time that the Bosnian language did
not exist, that he had to wait for an expert opinion on what language was spoken
by Bosniaks.

Jezik bošnjačke nacionalne zajednice u Srbiji, čije je uvođenje kao izbornog
predmeta u prva dva razreda osnovne škole najavilo republičko Ministarstvo
prosvete, može se, prema srpskom jezičkom standardu, zvati isključivo bošnjački
jezik, poručio je Odbor za standardizaciju srpskog jezika. Srpska nauka o jeziku
je nedvosmislena u tome da se jezički standard Bošnjaka u srpskom jeziku može
označiti samo sintagmom bošnjački jezik, istakao je predsednik Odbora za
standardizaciju srpskog jezika akademik Ivan Klajn, u odgovoru skupštinskom
Odboru za prosvetu. On je naglasio da srpska nauka, međutim, ne može, ni kada
bi htela, utvrđivati naziv jezika u bošnjačkom jezičkom standardu. “Ona to ne
može činiti uprkos tome što su se čelnici bošnjačkog naroda i bošnjačkog
jezičkog standarda, uvodeći naziv jezika u raskoraku s nazivom naroda, opredelili
za zbunjivanje dobronamernih ljudi u zemlji i inostranstvu, koji se, i preko naziva
jezika, mogu, koliko-toliko obavestiti o jasnoći kategorija i meritumu stvari”,
rekao je profesor Klajn. Prema njegovim rečima, nelogično bi bilo da se za
građane Bosne i Hercegovine, Bosance i Hercegovce, svrstane u tri nacionalne
zajednice – Srbe, Hrvate i Bošnjake, koji govore trima jezicima – srpskim,
hrvatskim i bosanskim uvodi zvanični “bosanski jezik”. To bi značilo
uspostavljanje “bosanskog” kao državnog jezika, dok bi srpski i hrvatski, po toj
logici, imali status manjinskih jezika. (POL-16-2-2005-82)

The language of the Bosniak national minority in Serbia, whose introduction as an
elective in the first two grades of elementary school has been announced by the
republic Ministry of Education, can, according to the Serbian standard language,
be called exclusively Bosniak language, the Board for the standardization of the
Serbian language said. Serbian linguistics clearly says that only the syntagma
Bosniak language can be used in Serbian for the standard language used by
Bosniaks, Chair of the Board for the standardization of the Serbian language,
academician Ivan Klajn [well-known Serbian linguist and member of SANU],
said in response to the parliamentary Education board’s inquiry. He also said that
Serbian linguistics could not, even if it wanted to, determine the name of the
language in the Bosniak standard language. “Serbian linguistics cannot do this
despite the fact that the leaders of the Bosniak people and Bosniak standard
language decided to confuse people in the country and abroad by opting for a
language name that was in discrepancy with the name of the people, but people
can nevertheless inform themselves about the clarity of categories and the
meritum of things via the name of the language”, said professor Klajn. According
to him, it would be illogical to introduce ‘Bosnian’ as the official language for the
citizens of Bosnia-Herzegovina, Bosnians and Herzegovinans, members of three
[ethnic] national communities who speak three languages – Serbian, Croatian and
Bosnian. That would mean an introduction of Bosnian as the state language,
while Serbian and Croatian, according to this logic, would have the status of
minority languages.

However, the Bosniak minority is also sometimes given a voice, albeit very rarely, as in

the following example,

Ovo je za našu kulturu i za Bošnjake Sandžaka izuzetan istorijski događaj – rekao
je tim povodom autor Alija Džogović, podsećajući da je “bosanski jezik
zabranjen pre skoro sto godina”. (POL-26-10-2004-29)

For our culture and for the Bosniaks of Sanjak [area in Southwest Serbia with a
Bosniak majority] this is an exceptional historical event – said author Alija
Džogović [Bosniak textbook author in Sanjak/Serbia] on the occasion, recalling
that “the Bosnian language was outlawed almost a hundred years ago.”

while the discourse of contestation is sometimes, if very rarely, subverted by politicians

such as the then Minister of Education, Slobodan Vuksanović,

Bosanski, a ne bošnjački zbog toga što su se, objasnio je, građani izjasnili da je
njihov jezik bosanski, što je to tradicija i zato što se u Sarajevu, gde je bio sa
predsednikom Srbije Borisom Tadićem, uverio da bosanski jezik postoji na
lingvističkoj karti. (POL-10-12-2004-127)

Bosnian and not Bosniak because, as he explained, the citizens opted for Bosnian
as their language, because that’s the traditional name and because he had an
opportunity during a trip to Sarajevo with Serbian President Boris Tadić to see for
himself that the Bosnian language existed on the linguistic map.

6.6.4 Excerpts from texts representative of Factors 6 and 10. Finally in this

section, Factors 6 and 10 (Contestation over language ownership and name, Linguistics

as a science, lexicography, standardization and contestation) suggest a more general

discourse of Central South Slavic ethnolinguistic identity-related contestation, with

particular emphasis on the role of the Serbian Academy of Science and Arts (SANU).

Factor 6 points to texts discussing contestation over language ownership and its name as

well as linguacultural and ethnic authenticity, primarily involving Serbs and Serbian on

the one hand, and Croats and Croatian on the other, as in the following excerpts,

Društvo srpske slovesnosti, Srpsko učeno društvo i Srpska kraljevska
akademija predstavljaju tri čina u izrastanju najautoritativnije srpske naučne
institucije. Za sve te tri institucije važili su stavovi da su Srbi južnoslovenski
narod koji govori svojim jezikom, srpskim, dakle koji je blizak drugim
slovenskim jezicima ali i različit od njih i da Srba ima tri vere: pravoslavne,
rimokatoličke i muhamedanske. (POL-10-9-2005-127)

The Serbian Linguistic Culture Society, Serbian Learned Society and the Serbian
Royal Academy represent three acts in the creation of the most authoritative
Serbian scientific institution [SANU]. All three institutions held that Serbs are a
South Slavic people who speaks its own, Serbian language, which is close to other
Slavic languages, but also different from them, as well as that Serbs had three
faiths: [Eastern] Orthodox, Roman-Catholic and Mohammedan.

Sve srpsko i hrvatsko bilo je pomešano. Pokazaće se kasnije da je to bila velika
greška. Jer će se sve to zajedničko, na novim i neprirodnim osnovama, ubrzo
početi da se deli. Ta deoba, projektovana sa hrvatske strane, prirodno išla je na
srpsku štetu. Ona je podrazumevala da Srbima u kulturi ostaje samo ono što su
stvorili pravoslavci srpskohrvatskog jezika. (POL-24-9-2005-45)

Everything Serbian and Croatian was mixed together. This will turn out to be a
big mistake because everything that was common, on a new and unnatural basis,
would soon begin to divide. That division, projected from the Croatian side,
naturally was at Serbian expense. It meant that Serbian culture could keep only
what had been created by Orthodox speakers of the Serbo-Croatian language.

Najmanji je po štetnosti problem što se neslućeno kasni u izdavanju tomova
Rečnika. Mnogo teže od toga pogađa što se posrnuće koje je zahvatilo srpsku
jezičku nauku odmah po smrti Vuka Karadžića (1864) nastavlja i danas. Srpski
lingvisti su, primenjujući srpski jezik (koji se tako zvao i u vreme poslednjeg
njegovog reformatora) u srpskohrvatski, ušli u period u kome je napušten
vukovski put u nazivanju i izgrađivanju jezika srpskog naroda. A da taj period još
traje potvrđuje činjenica da srpski lingvisti nastavljaju da u Rečniku zovu svoj
jezik “srpskohrvatski” i posle nestanka “srpskohrvatskog jezika”. (POL-08-8-
2003-80)

The delays in the publishing of the different volumes of the Dictionary are the
least problem in terms of damage. It is much more of a problem that the downfall
of the Serbian linguistic science that began immediately after the death of Vuk
Karadžić (1864) continues today. Renaming the language from Serbian (which is
what it was called during the time of its last reformer) into Serbo-Croatian,
Serbian linguists entered a period during which Vuk’s path in the naming and
development of the language of the Serbian people was abandoned. That that
period still continues is confirmed by the fact that Serbian linguists continue to
call their language “Serbo-Croatian” in the Dictionary even after the demise of the
“Serbo-Croatian language”.

The name of the language is the most prominent point of contention, while the

discourses attested above (a discourse of ethnolinguistic identity-related contestation, a

[pseudo-] scientific discourse on language and linguistics) are merged with other similar

discursive elements into a discursive formation contesting the linguacultural authenticity

and ethnolinguistic identity (and thus, implicitly, political legitimacy) which rests on

questionable historical narratives, pseudo-scientific linguistic arguments, and selective

and transparently flawed comparisons to language situations elsewhere in the world.

Kada je na Saboru Hrvatske 1861. godine pokrenuto pitanje naziva službenog
jezika, predlagano je da bude: „hrvatsko-slavonsko-srbski”, „hrvatsko-
slavonski”, „hrvatsko-srbski”, „hrvatski ili srbski”, „hrvatski”, „srbski” i
„narodni u trojednoj kraljevini jezik”. […] Sabor je izglasao Zakonski članak po
kojem je službeni jezik nazvan „jugoslavenskim”. Srbi nisu bili zadovoljni
takvim rešenjem. […] U jugoslovenstvu koje im je ponuđeno u nazivu jezika
nepogrešivo su sagledali vid velikohrvatstva, kojim je trebalo izbrisati srpsko
ime, srpsko nacionalno osećanje, pa i samo srpsko nacionalno biće. […]
Privlačnim i prividno zadovoljavajućim nazivom i za Srbe i za Hrvate, iz već
pomenutih razloga, izostavljeno je srpsko ime. Jugoslovenskim imenom
prikrivena je velikohrvatska težnja. Tim imenom Srbe je žedne trebalo prevesti
preko vode, trebalo ih je postepeno, ali dosledno brisati iz svakodnevnog života
Hrvatske, lišiti ih političke individualnosti i učiniti ih sastavnim delom
hrvatskog „političkog” naroda. (POL-3-7-2006-192).

When, at the 1861 Croatian Assembly, the issue of the name of the official
language was brought up, it was suggested that it be: “Croato-Slavonic-Serbian”,
“Croato-Slavonic”, “Croato-Serbian”, “Croatian or Serbian”, “Croatian”,
“Serbian” or “people’s language in the three-nation Kingdom”. […] The
Assembly adopted a law according to which the official language was called
“Yugoslav”. Serbs were not satisfied with such a solution. […] In the
Yugoslavhood that was offered them, they unmistakably detected a form of
Greater-Croatianhood, the aim of which was to erase the Serbian name, Serbian
national identity, and even the Serbian national being itself. […] For the reasons
mentioned, the Serbian name was thus excluded via a proposal that was attractive
and seemingly satisfying for both Serbs and Croats. The Greater-Croatian
tendency was thus concealed by a Yugoslav name. That name was supposed to
trick the Serbs, they were supposed to be slowly but steadily erased from the
everyday life of Croatia, to deprive them of political individuality and make them
an integral part of a Croatian “political” people.

Po toj tezi, Vuk Karadžić je za osnov standardnog srpskog književnog jezika uzeo
jezik kojim su govorili pravoslavni istočni Hercegovci. A oni su, po toj hrvatskoj
nacionalističkoj teoriji, u stvari Hrvati prevedeni u pravoslavlje, tako da su Srbi
“ukrali” Hrvatima jezik koji danas zovu srpski, pa zato sada ima toliko problema
sa tim jezicima. (POL-10-7-2006-142)

According to this thesis, Vuk Karadžić took the language spoken by [Eastern]
Orthodox East-Herzegovinans as the basis of the standard Serbian literary
language. And they, according to this nationalist theory, were in fact Croats
converted to [Eastern] Orthodox Christianity, so Serbs “stole” the language they
call Serbian today from Croats, which is why there are so many problems with
these languages now.

Bilo je u tom njihovom nazivlju, imenovanju i prikazivanju jezika mnogo
dvosmislica, smicalica i podvala. Govorili su da je to jedan, jedinstven, isti i
zajednički jezik. Ali to još ništa ne govori čiji je to jezik. Da, jeste jedan, ali
srpski, jeste jedinstven, ali ne i hrvatski, jeste zajednički, ali samo po upotrebi,
no nikako ne i po pripadnosti i poreklu. Ali sve to (što je isti, zajednički i jedan)
ne može biti razlog za dvočlano ili višečlano imenovanje jezika, ili za potpuno
preimenovanje srpskog jezika u hrvatski. Jezik može dobiti ime samo po narodu
čiji je to jezik, ali ne i po imenima naroda koji se tim jezikom služe. Engleski
jezik takođe je jedan, zajednički i isti jezik za sve narode koji njime govore, ali se
zna čiji je taj jezik i kako se on zove, bez obzira ko i gde govori njime. On je uvek
samo engleski i kad se njime govori u Sjedinjenim Državama, Kanadi, Australiji,
Novom Zelandu ili na bilo kojem kraju sveta. Takvi su još nemački, španski i
portugalski jezik. Jezik nije onoga ko tim jezikom govori, nego onoga ko je taj
jezik stvarao i stvorio. Srpski narod je vekovima stvarao svoj jezik. Hrvati nisu
stvarali taj jezik. Oni su ga dobili i preuzeli gotovog, sa svim odlikama koje je
srpski jezik već imao. (POL-22-7-2006-55)

There were many tricks in their labeling, naming and representing the language.
They said it was one, unified, the same and common language. But that doesn’t
say whose language that is. Yes, it is one, but Serbian, it is unified, but not also
Croatian, it is common, but only in terms of use, not in terms of affiliation and
origin. But all this (it being the same, common and one) cannot be reason enough
to dual-label or multiple-label the language, or to entirely rename the Serbian
language into Croatian. A language can only be named after the people it belongs
to, but not after the names of the peoples who also use it. English is also one,
common and the same for all people who speak it, but it is well known whose
language it is and what its name is, regardless of who speaks it and where. It is
always only English even when it is spoken in the United States, Canada,
Australia, New Zealand or anywhere else in the world. Such are also the German,
Spanish, and Portuguese languages. A language does not belong to him who
speaks it, but to him who created it. Serbian people have created their language
for centuries. The Croats did not create that language. They got it and took it
over ready-made, with all the characteristics that the Serbian language already
had.

Nauka i politika Tako se u nauku o jeziku umešala politika: Nauka je neporecivo
utvrdila da je štokavski Vukov jezik srpski, politika je tražila da bude i hrvatski.
Ali kada su se hrvatski lingvisti osilili da drsko odbace srpski jezik iz naziva i da
taj srpski jezik nazovu hrvatskim, i samo hrvatskim, srpski lingvisti, pod
pritiskom već preživelih političkih ideja i shvatanja, i dalje uporno i tvrdoglavo
nazivaju svoj jezik i srpskim i hrvatskim (srpskohrvatskim). (POL-22-7-2006-
55)

Science and politics Thus politics began to meddle in science: Science undeniably
determined that Vuk’s Štokavian language is Serbian, politics demanded that
it also be Croatian. But when Croatian linguists tyrannically throw the Serbian
language out of the name and call that Serbian language Croatian, and Croatian
alone, Serbian linguists, under the pressure of outdated political ideas and
philosophies, continue to stubbornly call their language both Serbian and Croatian
(Serbo-Croatian).

Srpski lingvisti, dakle, nikako da shvate da više nema Jugoslavije, ni kraljeve ni
Brozove, u kojoj je manipulacija i u nauci bila sve, da više za sva vremena nema
“srpskohrvatskog jezika”, dvočlanog imena jezika nema više nigde u Evropi i
svetu (ni engleski se ne zove, niti se ikad zvao “američkoengleski”). Srpski
lingvisti ne razumeju ni čemu je poslužio naziv “srpskohrvatski/
hrvatskosrpski”, ili “hrvatski ili srpski” jezik (jedino tome da se “hrvatski
književni jezik” izdvoji iz srpskog jezika, da se izdvoji “bošnjački” ili “bosanski
jezik”, da se sada planira izdvajanje i “crnogorskog jezika”). (POL-08-8-2003-
80)

Serbian linguists seem to have a hard time understanding that there is no
Yugoslavia any more, neither King’s nor Broz’s, in which manipulation also in
science was everything, that the “Serbo-Croatian language” is gone for good, that
there are no dual-label language names anymore anywhere in Europe or the world
(even English is not called, nor has it ever been called, “American-English”).
Serbian linguists also do not understand what the purpose of the label “Serbo-
Croatian/Croato-Serbian”, or “Croatian or Serbian” language was (only to
separate the “Croatian literary language” from the Serbian language, to separate
“Bosniak” or “Bosnian language”, to now plan the separation of the “Montenegrin
language”).

Našim lingvistima ostaje da po volji biraju BHMS ili srpski. Vreme je da se naši
lingvisti potpuno okrenu nauci i da već jednom perstanu da strahuju od politike.
Bio bi to čin ne samo prihvatanja naučnih vrednosti nego i moralni čin pokajanja
i izvinjenja srpskoj nauci i srpskom narodu. Neka mirno i slobodno nazovu
veliki rečnik SANU srpskim rečnikom. (POL-22-7-2006-55).

Our linguists are left with a choice between BHMS
[Bosnian/Croatian/Montenegrin/Serbian] or Serbian. It is time that our linguists
turn entirely to science and stop fearing politics already. It would be not only an
act of acceptance of scientific values but also a moral act of repentance and
apology to Serbian science and Serbian people. Let them peacefully and freely
call the unabridged SANU dictionary a Serbian dictionary.

Factor 10, on the other hand, suggests a more technical discourse pertaining to

lexicography and language standardization. Texts scoring highly on this factor typically

discuss language standardization issues, linguistic studies or book editions, most of which

were produced in the framework of SANU.

Nedavno objavljena “Sintaksa srpskog jezika", u izdanju “Beogradske knjige",
predstavlja suštinsku novinu u naučnoj lingvističko-gramatičkoj obradi srpskog
jezika U izdanju “Beogradske knjige", Instituta za srpski jezik SANU i Matice
srpske upravo je iz štampe izašla monumentalna edicija (oko 1600 stranica), pod
naslovom “Sintaksa savremenog srpskog jezika. (POL-1-10-2005-165)

The recently published [book] “Serbian Syntax” (publisher: “Beogradska knjiga”
[Belgrade book]) represents a novelty in the scientific linguistic-grammatical
study of the Serbian language. The monumental edition (around 1,600 pages) of
the [book] titled “Syntax of the contemporary Serbian language” was published
jointly by ‘Beogradska knjiga’, SANU Institute for the Serbian language, and
Matica Srpska [Serbian Language Association].

[Milan Šipka] Već sam jednom prilikom rekao da se nakon disolucije zajedničkog
srpskohrvatskog standardnog jezika srpski lingvisti, i Srbi kao narod, nisu
jasno odredili prema novonastaloj situaciji. Zbog toga znatno zaostajemo u
lingvističkim aktivnostima, što je posledica i odsustva šire društvene podrške
negovanju srpskog standardnog jezika i jezičke kulture. (POL-5-1-2008-166)

[Milan Šipka, well-known Bosnian Serb linguist] I’ve already said on one
occasion that Serbian linguists, as well as Serbs as a people, never adopted a clear
position toward the situation that came about after the dissolution of the common
Serbo-Croatian standard language. This is why we lag behind in terms of
linguistic activity, which is also a consequence of a lack of broader societal
support for the development of the Serbian standard language and the linguistic
culture.

The unabridged SANU dictionary of Serbian (in preparation since the 1960s) is the most

frequent reference in these texts,

nismo mnogo pažnje posvećivali takvim rečnicima, jer je glavni projekat
decenijama bio veliki Rečnik SANU za koji se smatralo da će, tako veliki,
zadovoljiti sve leksikografske potrebe (NIN-3-7-2008-182)

we never paid much attention to such dictionaries because our main project for
decades was the unabridged SANU dictionary of Serbian which, large as it was,
was supposed to meet all lexicographic needs

Lastly, the (pseudo-) scientific arguments and a discourse of contestation are present but

marginal compared to other relevant factors,

A to što se oni danas (sinonimno) nazivaju nacionalnim imenima: srpski,
hrvatski, bošnjački/bosanski, crnogorski, bunjevački ili maternji i dr. - zapravo je
time jedan naučni, lingvistički princip pretvoren u sociolingvstički ili
nacionalnopolitički koncept. (POL-17-6-2006-95)

That they today bear (synonymous) national names: Serbian, Croatian,
Bosniak/Bosnian, Montenegrin, Bunjevački, or mother tongue and so on – is
really a transformation of a scientific, linguistic principle into a sociolinguistic or
national-political concept.

6.6.5 Topoi. In addition to the basic content and thematic analysis above,

representative texts were examined also for evidence of argumentation strategies in the

DHA tradition (Wodak, 2001; topoi, explicit or inferable obligatory premises which make

it possible to connect arguments with the conclusion, or simply “the common-sense

reasoning typical for specific issues,” van Dijk, 2000 cited in Baker et al., 2008, p. 299).

Two particularly prominent and recurrent relevant topoi were identified:

Topos 1: (‘hard’, i.e., structural) linguistics provides irrefutable scientific

evidence that the language spoken by Central South Slavs is Serbian in origin

(and should therefore be called Serbian only), and

Topos 2: the polycentricity of Central South Slavic is comparable to the

polycentricity of languages such as English, Spanish, and Portuguese (as well as

German) all of which bear a single, original label.

The pseudo-scientific arguments which form the basis of these two topoi were

already noted above. Nauka ‘science’ and naučni ‘scientific’, for example, were

identified by both keyword (Tables 12, 14 and E1) and collocation (Table F1) analysis as

items of potential discursive and ideological interest; they also both loaded highly on

Factor 10 (Table 23) which was shown by cluster analysis to be discursively linked with

Factor 6 (Table 41), perhaps the single most representative factor of the discourse of

contestation. In the excerpts from representative texts above, we saw the following

examples of these pseudo-scientific arguments,

Science undeniably determined that Vuk’s Štokavian language is Serbian, politics
demanded that it also be Croatian. (POL-22-7-2006-55)

Competent experts, linguists with scientific authority, have claimed since the time
of Vuk Karadžić, the creator of the literary Serbian language, that the language of
Montenegrins and Serbs is the same (POL-31-3-2004-2)

Serbian linguistics clearly says that only the syntagma Bosniak language can be
used in Serbian for the standard language used by Bosniaks (POL-16-2-2005-82)

linguists have provided scientific argumentation for this, that there are no
scientific, linguistic, historical or sociocultural reasons to rename Serbian into
Montenegrin (POL-31-3-2004-2)

as if linguistics was a recent development so we didn’t know which languages
existed in the Balkans and around the world (POL-10-1-2005-134)

a transformation of a scientific, linguistic principle into a sociolinguistic or
national-political concept (POL-17-6-2006-95)

Our linguists are left with a choice between BHMS
[Bosnian/Croatian/Montenegrin/Serbian] or Serbian. It is time that our linguists
turn entirely to science and stop fearing politics already. It would be not only an
act of acceptance of scientific values but also a moral act of repentance and
apology to Serbian science and Serbian people. (POL-22-7-2006-55)

The pseudo-scientific argumentation captured in Topos 2, on the other hand, is

less obvious in the results of the quantitative analyses (partly, perhaps, on account of the

omnipresence of references to English many of which do not pertain to this topos), but

quite prominent in the representative texts. The examples we saw above include,

If Americans are OK with calling their language English, I can’t see any reason
why someone in Montenegro would have a problem with the name Serbian, or
Serbo-Croatian. (POL-7-7-2004-109)

Just a couple of examples which show that for languages used by two or more
nations the original name is used around the world. Thus, the Austrian nation,
which has had a state for a thousand years, speaks German and not Austrian. The
Swiss of Germanic origin speak German, not Swiss. American people call their
language English, not Anglo-American or American. (POL-10-1-2005-134)

A language can only be named after the people it belongs to, but not after the
names of the peoples who also use it. English is also one, common and the same
for all people who speak it, but it is well known whose language it is and what its
name is, regardless of who speaks it and where. It is always only English even
when it is spoken in the United States, Canada, Australia, New Zealand or
anywhere else in the world. Such are also German, Spanish, and Portuguese
languages. (POL-22-7-2006-55)

Similar pseudo-scientific arguments could also be seen in the discourse of endangerment

in the calls for the defense of the Cyrillic alphabet,

Today, no nation in the world which cares about its identity and its cultural and
national roots neglects its alphabet as much as the Serbian nation does. (POL-16-
3-2003-93)

do the authors of this proposal know of any other nation in the world which uses
somebody else’s alphabet in its own language (POL-11-2-2005-108)

general rule: one language – one alphabet, because no nation in the world uses
two alphabets to write its language. (POL-21-9-2004-72)

Finally, it should be noted that both of these topoi are in evidence in language-related

discourses coming from the leading academic linguists, as well as the more marginal

figures, civic associations, and private citizens (in letters-to-the-editor).

7. Discussion

7.1 Research Question 1

The first research question was: Can corpus linguistics-based quantitative

methods (keyword, collocation, exploratory factor, and cluster analyses) be used to

identify lexical patterns suggestive of dominant language-related discourses and

language ideologies in Central South Slavic and what similarities/differences are there

between them? This question was addressed through a continuing comparison between

the different methods and techniques throughout the presentation of results (Section 6).

Despite all the challenges that extensive inflectional morphology presents for

corpus-based analysis in general and of discourses and ideologies in particular, it is clear

that all four methods can be successfully applied to identify lexical patterns suggestive of

dominant language-related discourses and language ideologies. Keyword lemmas and

key-keywords and their associates provided a macroscopic view of the characteristic lexis

and lexical patterns in the research corpus and hinted at their covariance and thus the

discursive profile of the corpus as a whole, as well as individual dominant discourses.

Significant collocates and n-grams provided complementary evidence that confirmed and

supplemented the patterns identified by keyword analysis and added a phrasal dimension

to the discursive profile; collocation analysis also supplied data for exploratory factor

analysis and cluster analysis. Most importantly, exploratory factor analysis took the

somewhat amorphous collocate data and turned them into a detailed discursive profile of

the corpus based on covariance, providing an objective, replicable way of identifying

representative texts in the form of factor scores unavailable from any other methods.

(Analysis of variance, further, showed that, though sometimes difficult to interpret,

synchronic and diachronic variation do exist and can be used in conjunction with the

results of other methods.) Finally, cluster analysis built on both the collocate data and the

factorial structure to provide an account of the patterning in the data with respect to the

discursive links between factors, and the three independent variables for a more fine-

tuned discursive profile of the corpus.

Further, lexical patterns identified by keyword, collocation, factor, and cluster

analyses proved to be congruent and complementary. As might be expected, the

differences between the lexical patterns identified by each method were largely a product

of their different approaches to the data. For example, where keyword analysis focuses

on lexical items that are significantly more frequent in the research corpus, collocation

analysis focuses on the lexis co-occurring with the core concept(s). Both analyses thus

produce patterns that are characteristic of the corpus, but from different perspectives.

This has been shown to involve a great deal of overlap as well as some differences.

Keyword and collocation analyses thus sometimes pointed to two different sides of the

same coin, as it were. A good example here is the relative prominence of the item

latinica ‘Latin (alphabet)’ in the results of collocation analysis, and its almost complete

absence from the results of keyword analysis. Similarly, ćirilica ‘Cyrillic’ is considerably

more prominent in the results of keyword analysis. These two lexical items are both very

prominent in the discourse of endangerment around the Serbian Cyrillic alphabet and so

reliance on either keyword or collocation analysis alone would have presented an

incomplete picture, even if it would have been quite possible to identify this pattern through

follow-up qualitative analysis.
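
The contrast between the two perspectives described above can be illustrated with a minimal sketch. It assumes log-likelihood (G2) as the keyness statistic and raw co-occurrence counts within a fixed window for collocation; both are common choices in corpus linguistics, but they are assumptions made for illustration and are not a restatement of the exact procedures or software used in this study. The function names and the toy token lists are hypothetical and invented for the example.

```python
import math
from collections import Counter


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood (G2) for one item observed in two corpora."""
    total = size_study + size_ref
    combined = freq_study + freq_ref
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def keywords(study_tokens, ref_tokens, threshold=3.84):
    """Items relatively more frequent in the study corpus, scored by G2."""
    study, ref = Counter(study_tokens), Counter(ref_tokens)
    n_study, n_ref = len(study_tokens), len(ref_tokens)
    scored = {}
    for word, freq in study.items():
        # Positive keyness only: skip items not over-represented in the study corpus.
        if freq / n_study > ref[word] / n_ref:
            g2 = log_likelihood(freq, n_study, ref[word], n_ref)
            if g2 >= threshold:
                scored[word] = g2
    return scored


def collocates(tokens, node, span=4):
    """Count items occurring within `span` tokens to either side of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            counts.update(tokens[max(0, i - span):i])
            counts.update(tokens[i + 1:i + 1 + span])
    return counts


# Invented toy data (not drawn from SERBCORP): 'latinica' is equally frequent
# in both lists, so it cannot be key, yet it co-occurs with the node 'jezik'.
study = ["jezik", "latinica", "jezik", "ćirilica",
         "ćirilica", "ćirilica", "jezik", "latinica"]
ref = ["latinica", "vreme", "grad", "latinica",
       "vreme", "grad", "sport", "ćirilica"]

# threshold=0.0 because lists this small cannot reach conventional significance.
print(sorted(keywords(study, ref, threshold=0.0)))
print(collocates(study, "jezik", span=1).most_common())
```

In this toy setting an item can be a frequent collocate of the node without being a keyword at all, which is the kind of divergence noted above for latinica; with real corpora a conventional significance threshold and a lemmatized token stream would be needed.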

Systematic similarities and differences can also be seen in the way these analyses

can combine pertinent lexical items into groups for higher-order analysis. Keyword

analysis is in this respect somewhat similar to factor analysis as it can take a text-based

(i.e., macroscopic) view of covariance between individual lexical items to produce sets

indicative of themes, discourses, and ideologies. A comparison between keyword

associates and factors illustrated how similar the results of these two analyses can be.

However, unlike keyword associates, factor analysis offers a parsimonious way to

identify representative texts in an objective, reliable, and replicable manner. This is

another clear indication of the superiority of factor analysis to the other methods

employed here, particularly because of the widespread politicization of and biases in the

linguistic research in this area.
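
How factor scores can make the selection of representative texts replicable may be sketched as follows. The sketch assumes a simple scoring scheme in which each text's score on a factor is the sum of its standardized frequencies for the items with salient loadings on that factor (with negative loadings subtracted); this scheme, the 0.3 cutoff, the function names, and the toy frequencies and loadings are all assumptions introduced for illustration, not values or procedures reported in this study.

```python
from statistics import mean, pstdev


def standardize(column):
    """Convert raw values to z-scores; a constant column becomes all zeros."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma if sigma else 0.0 for x in column]


def factor_scores(counts, loadings, cutoff=0.3):
    """Score each text on one factor.

    counts:   {text_id: {item: normalized frequency}}
    loadings: {item: loading on the factor}
    """
    text_ids = list(counts)
    scores = dict.fromkeys(text_ids, 0.0)
    for item, loading in loadings.items():
        if abs(loading) < cutoff:
            continue  # ignore items without a salient loading
        z_values = standardize([counts[t].get(item, 0.0) for t in text_ids])
        sign = 1 if loading > 0 else -1
        for text_id, z in zip(text_ids, z_values):
            scores[text_id] += sign * z
    return scores


def representative_texts(scores, n=3):
    """Return the n highest-scoring text identifiers."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Invented toy data: three hypothetical texts and hypothetical loadings.
counts = {
    "text_a": {"nauka": 5.0, "naučni": 3.0},
    "text_b": {"nauka": 1.0, "naučni": 0.0},
    "text_c": {"nauka": 0.0, "naučni": 0.0},
}
loadings = {"nauka": 0.6, "naučni": 0.5, "sport": 0.1}

scores = factor_scores(counts, loadings)
print(representative_texts(scores, n=2))
```

Because the ranking follows mechanically from the frequencies and the loadings, two analysts working from the same data would arrive at the same set of top-scoring texts, which is the sense of "objective, reliable, and replicable" intended above.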

Collocation analysis, on the other hand, does not offer a way of combining items

into groups, except for those that repeatedly occur together in more or less fixed ways

(i.e., phrases or n-grams). Further, similar to keyword analysis, collocation analysis does

not provide an objective way to identify representative texts. However, unlike both

keyword and factor analyses, collocation analysis offers concordance lines which can be

used to quickly assess lexical patterns in actual use in different texts, but which have been

shown to be of limited use here. Finally, unlike perhaps all three of the other methods, cluster analysis accounts

for all of the data, and provides a way of testing relationships between the factorial

structure and independent categorical variables (which was also done using analysis of

variance).

7.2 Research Question 2

The second research question was: What language-related discourses and

language ideologies relevant to Central South Slavic ethnolinguistic identities can be

identified in the 5+ hits section of SERBCORP? This question was addressed through

examination of top scoring (i.e., representative) texts for evidence of explicit or implicit

references to Central South Slavic ethnolinguistic identities and topoi. The findings were

presented by factor (i.e., language-related discourse); factors identified by cluster analysis

as similar were treated together.

The quantitative evidence from keyword and collocation analysis showed that, at

the most general level, that of distinct and remote cultural identities (i.e., in SERBCORP,

see Appendix C), language is routinely conceptualized in terms of binary oppositions of

implicitly monolithic codes (cf. standard language ideology, Milroy, 2001). This was

indicated by frequent use of glottonyms which imply monolithic language varieties with

clearly demarcated boundaries and associated national identities (e.g., Serbian, English),

as well as sets of possessive pronouns constructing in- and out-groups and implying

ownership of language (e.g., our, own). At the level of SERBCORP as a whole, then, the

dominant language ideology in evidence is one of societal monolingualism and a natural

one-to-one correspondence between language and national identity, which at the same

time seems to be an expression of a belief in the “impossibility of heterogeneous

communities and the naturalness of homogeneous communities” (i.e., homogeneism,

Blommaert & Verschueren, 1998, p. 207). However, despite the binary difference and the

emphasis on an “us and them” view of collective identity, differences between what are

understood as distinct language varieties are internalized and taken for granted and so

there is very little evidence of identity-related contestation. In other words, only intra-

linguistic (i.e., Central South Slavic) identities are contested here.

This is, of course, entirely different at the level of less distinct and geographically

and culturally closer regional (i.e., Central South Slavic) ethno-cultural identities (5+ hits

section of SERBCORP). Here, even the most basic quantitative analysis pointed to the

prominence of lexical items such as, for example, name, label, renaming and (does not)

exist and thus a tendency toward negation of separateness and contestation of separate

names and identities. Keyword associates (Section 6.1.3) and n-grams (Section 6.2.2)

confirmed this tendency and showed that it pertained to a limited set of Central South

Slavic ethnolinguistic identities (e.g., the renaming of the Serbian language into

Montenegrin), while factor analysis showed the (big ‘D’) discourses of endangerment and

particularly contestation to be the most dominant, extending across six of the twelve

identified factors (i.e., small ‘d’ discourses). The dominant conceptualization of language

here is still one of natural one-to-one correspondence between language and (ethno-)

national identity and thus also homogeneism, but now the boundaries between in- and

out-groups are much less clearly defined as the lack of linguistic distinctiveness is used to

undermine claims to separate identity (as elsewhere in Europe, cf. Blommaert &

Verschueren, 1998). Note further that there is a familiar tendency to emphasize

differences projected outwardly and minimize (or erase) differences projected inwardly,

typical of nationalism (see, e.g., Hobsbawm, 1990). Ultimately, the dominant language

ideology in the mainstream Serbian newspaper discourse is an essentialist one, conflating

language, alphabet, and literature, and insisting on language as an embodiment of the

putative immutable, primordial character of the nation that created it, e.g.,

For Serbs the Cyrillic alphabet is part of their identity, it is their important
determiner without which they would not have been who they have been and
without which they would not be who they are […]. (POL-16-3-2003-93)

Profesor Lompar veruje da je reč o “ideološkom zahvatu”: “Cilj je da se
književnost svede na puku umetnost, na likovno i muzičko, a ona nije samo
umetnost. To znači zanemarivanje njenog kulturnog, istorijskog, antropološkog
aspekta. Jer, za razliku od drugih naroda, književnost je u Srba presudni
konstituent nacionalnog identiteta, beleg postojanja u dugom trajanju turskih
vekova. Zato je prevođenje književnosti na medijski, funkcionalni aspekt za nas
pogubno i isto što i brisanje identiteta.” (NIN-13-3-2003-381)

Professor Lompar [Professor, School of Philology, University of Belgrade]
believes this is about an “ideological project”: “The aim here is to reduce
literature to a mere art form, to a fine-art status such as that of painting or music,
but it is not only an art form. This would mean a neglect of its cultural, historical,
anthropological aspects. Because, in contrast to other nations, for Serbs literature
is a crucial constituent of the national identity, proof of existence throughout the
long duration of the Turkish centuries. This is why the reduction of literature to
its media, functional aspect is detrimental for us and equal to an erasure of
identity.”

But, how are we to understand the function of such a conceptualization of language and

the use of specific argumentation strategies (i.e., topoi), particularly with respect to the

discourses of endangerment and ethnolinguistic identity-related contestation which have

been shown to be so pervasive here?

7.3 Research Question 3

The third research question was: What links can be identified between the

language-related discourses and language ideologies relevant to Central South Slavic

ethnolinguistic identities and ethnonationalism? This question was addressed through a

historical and sociopolitical contextualization of the findings obtained through

quantitative and qualitative methods.

Contestation is not a new phenomenon in the Balkans, nor is it limited to the

Central South Slavic area. In addition to the contestation we note today, there have been

earlier historical examples, sometimes making equally absurd claims, such as the theory

developed by German nationalists in the nineteenth century (“Windischentheorie”)

according to which Carinthian Slovene dialects were more closely related to Germanic

than to Slavic languages, or the orchestrated negation of Macedonian identity by the

Greeks, Bulgarians, and Serbs, particularly in the latter half of the twentieth century,

which still continues today (for details about these, see Voss, 2006). Arguably, these

represent examples of what Irvine and Gal (2000) call ‘fractal recursivity’ whereby (inter-

linguistic) binary oppositions are used for the specific local purposes of delegitimation of

one (intra-linguistic) ethnolinguistic identity or another. All this contestation has two

things in common. The first is a focus on language. As the French-Serbian scholar, Yves

Tomić, notes in his expert report on the ideology of Greater Serbia in the nineteenth and

twentieth centuries written for the United Nations International Criminal Tribunal for the

former Yugoslavia (UN ICTY) in The Hague (Tomić, n.d.), at the root of the Greater

Serbian ideology is the Herderian language ideology according to which language is the

only valid criterion for the determination of national identity (see also Carmichael, 2000).

The second feature shared by the Balkan contestations is the instrumentalization of

linguistics in nationalist projects. According to Friedman (1999, p. 20),

At the end of the nineteenth and beginning of the twentieth centuries (and even
today, see, e.g. Glenny, 1995), linguists were putting their knowledge at the
service of politicians by choosing one or another isogloss as the definitive
justification for their ethnic identity – and therefore nationality […]. [L]inguistic
features become ‘flags’ that are manipulated to represent territorial claims. […].
The claims about nationality [a]re then translated into claims for the territory to be
included in the nation-state.

In his study of the negations of the Macedonian ethnolinguistic identity, Voss (2006, pp.

120-122) thus writes that “even in Yugoslav times we notice the coincidence of national

language ideology and ethnic identity ideology”, a contradiction which “becomes even

sharper after 1991” when “cultural policy became a tool in the rivalry of post-communist

elites” and which “remains unresolved today”.

However, cultural policy has been a favorite tool in the nationalist projects for

much longer. In her book, Yugoslavia’s implosion: The fatal attraction of Serbian

nationalism, Sonja Biserko, a former Yugoslav diplomat and the president of the Helsinki

Committee for Human Rights in Serbia, traces the origins of contemporary Serbian

nationalism back to the beginnings of the nineteenth century, “the formative period of

Serbia as a nation-state” (Biserko, 2012, p. 34), and the idea of resurrection of the

fourteenth-century Serbian medieval empire, “a patriarchal, Orthodox, ethnically

homogeneous state” (p. 33). This idea, known as “Greater Serbia” throughout the

twentieth century, was first formulated into a national strategy in 1844 in a work by Ilija

Garašanin, Serbian minister of internal affairs from 1843 to 1852, famously titled

“Načertanije” (‘draft plan’). The plan envisaged a resurrection of the medieval Serbian

state which had been destroyed by the Turks by integrating all Balkan territories in which

Serbs lived, either as a majority or as a minority, into a single state. These included large

parts of Croatia, Vojvodina (part of Hungary at the time), Bosnia-Herzegovina,

Montenegro, and northern parts of Albania (Tomić, n.d., p. 13). The plan has also been

widely known by an oft-repeated formula which summarizes it as “all Serbs in one state”.

However, much like elsewhere in Europe, this was a time of Romanticism and inception

of the national consciousness,36 when collective identities were much less clearly

delineated and much more fluid than they are today, so it was not always clear who Serbs,

for example, were. But this dilemma would be conclusively solved for Serbian

nationalists by the Serbian linguist and ethnographer, Vuk Karadžić, who created the

Serbian Cyrillic alphabet and initiated the standardization of modern Serbian. In his

book, indicatively titled Serbs, all and everywhere, written in 1836 and published in

1849, Karadžić demarcated the national Serbian territories and launched the theory of

Serbs as a people of several faiths (i.e., Orthodox, Catholic, and ‘Mohammedan’) unified

by a common language (see, for example, Tomić, n.d., pp. 8-9). Indeed, Western analysts

of nineteenth-century Balkan nationalist ideologies, such as Behschnitt (1980, p. 71,

cited in Tomić, n.d., p. 10, Note 13), consider the ideas of Vuk Karadžić to be the

“linguistic and cultural ideology of Greater Serbia.”

As Biserko (2012) notes, there is a clear ideological continuity in Serbian

nationalism in the last two centuries, from the formation of Serbia as a nation-state in the

first half of the nineteenth century, to the two world wars and two Yugoslav states in the

first half of the twentieth century, to the breakup of Yugoslavia and the ensuing Yugoslav

wars at the end of the twentieth century. However, for our purposes here, one other

historical moment is particularly important. As noted in the introduction, Yugoslavia was

showing signs of internal struggles and instability already before Josip Broz Tito’s death

in 1980. This trend was accelerated by the political and economic uncertainties in the

period following Tito’s death. Again, as mentioned above, Serbian elites tended to view

Yugoslavia as a form of Greater Serbia and therefore tried to impose a centralization of

the state which was opposed primarily by Slovenes and Croats (cf. Biserko, 2012). In

this climate and very much in the Serbian tradition of drafting conspiratorial nationalist

strategies, a group of Serbian intellectuals, members of the Serbian Academy of Sciences

and Arts, at least one of whom was a leading linguist (Pavle Ivić), drafted the infamous

SANU Memorandum in the fall of 1986 (SANU, 1986). The memorandum alleged Serbs

and Serbia to be in an “unequal position” and “threatened” in Yugoslavia, and blamed the

1974 constitution which decentralized the country and gave greater rights to individual

republics, arguably a historically and politically valid arrangement but one in which Serbs

were a minority everywhere outside of Serbia itself. As Biserko (2012, p. 82) notes, “[i]n

essence, the Memorandum reiterated the Serbian national agenda from the late nineteenth

and early twentieth century, calling for ‘the liberation and unification of the entire Serb

people and the establishment of a Serb national and state community on the whole Serb

territory’.” Needless to say, the Memorandum was and continues to be widely regarded in

the former Yugoslavia as the definitive statement of the Serbian nationalist program, the

Greater Serbia, which was the principal cause of the 1990s wars.

Most interestingly, the Memorandum mentions the noun ‘language’ ten times, the

noun ‘linguists’ once, and the adjective ‘linguistic’ three times in its thirty-two pages

of text, purveying discourses of endangerment and contestation and a language ideology

of essentialism rather similar to those attested above,

Manipulacije sa jezikom […]

Manipulation of language […]

Delovi srpskog naroda, koji u znatnom broju žive u drugim republikama, nemaju
prava, za razliku od nacionalnih manjina, da se služe svojim jezikom i pismom, da
se politički i kulturno organizuju, da zajednički razvijaju jedinstvenu kulturu svog
naroda.

Unlike national minorities, parts of the Serbian people, living in other republics in
considerable numbers, do not have the right to use their own language and
alphabet, to organize politically and culturally, to participate in the joint
development of a unified culture of their people.

nametanja službenog jezika koji nosi ime drugog naroda (hrvatskog) oličavajući
time nacionalnu neravnopravnost

imposition of an official language which bears the name of another people
(Croatian), which illustrates national inequality

Taj je jezik ustavnom odredbom učinjen obaveznim i za Srbe u Hrvatskoj, a
nacionalistički nastrojeni hrvatski jezikoslovci sistematskom i odlično
organizovanom akcijom sve ga više udaljavaju od jezika u ostalim republikama
srpskohrvatskog jezičkog područja, što doprinosi slabljenju veza Srba u Hrvatskoj
sa ostalim Srbima.

That language was made compulsory also for Serbs in Croatia through a
constitutional decree, while the nationalist Croatian linguists continue to distance
it from the language in other republics of the Serbo-Croatian language area
through systematic and well-organized actions, which contributes to the
weakening of the links between Serbs in Croatia and other Serbs.

Praktično značenje izjava: „moramo brinuti“, „treba se boriti“, „više treba učiti
ćirilicu“ itd. može se procenjivati samo u njihovom suočenju sa stvarnom
jezičkom politikom koja se vodi u SRH. Ostrašćena revnost kojoj je cilj
konstituisanje zasebnog hrvatskog jezika što se izgrađuje u protivstavu prema
svakoj ideji o zajedničkom jeziku Hrvata i Srba ne ostavlja dugoročno mnogo
izgleda srpskom narodu u Hrvatskoj da očuva svoj nacionalni identitet.

The practical meaning of statements such as “we must take care of”, “we need to
fight”, “Cyrillic should be taught more often”, etc., can be evaluated only against
the real language policy in the Socialist Republic of Croatia. The zeal whose aim
is to create a separate Croatian language, opposed to the idea of a common
language of Croats and Serbs, does not leave much long-term prospect to the
Serbian people in Croatia of preserving their national identity.

Pod dejstvom vladajuće ideologije kulturne tekovine srpskog naroda otuđuju se,
prisvajaju ili obezvređuju, zanemaruju ili propadaju, jezik se potiskuje, a ćirilsko
pismo postepeno gubi.

As a consequence of the ruling ideology, the cultural inheritance of the Serbian
people is being alienated, appropriated, or devalued, neglected or left to ruin, the
language is being suppressed, and the Cyrillic alphabet is being gradually lost.

Although the topoi featuring pseudo-scientific arguments on language attested above are

missing here, the discourse of endangerment is present and is more pronounced than

above, while the discourse of contestation is implicit rather than explicit. Apparently, the

discourse of Serbian linguistic nationalism evolved between 1986 and the early 2000s,

adapting to the changing circumstances and replacing the alarmist, mobilizing discourse

of endangerment of the pre-war and war periods with pseudo-scientific argumentation

which is arguably more likely to be effective in a post-war period characterized by

widespread conflict fatigue. Further, the Croats are clearly labeled as the enemy through

their discursive construction as an out-group and by way of predication strategies such as

labeling them “nationalist” and “zeal(ous)” which are then used to justify the proposed

action (i.e., a recentralization of the Yugoslav state). In other words, all the main

elements of the discursive complex of Serbian linguistic nationalism which we saw above

are on display here also. This is confirmed by the results of quantitative analysis which

identified Vuk Karadžić and SANU as some of the pertinent lexical items in the research

corpus: both Vuk Karadžić and SANU appear in the key lemma (Tables 12, 14, and E1)

and collocation (Tables G1 and G2) lists, while SANU also appears numerous times in n-

grams (Table 19) and Factor 10 in EFA (Table 23). Similarly, the results of qualitative

analysis exemplify the routine intertextual references to Vuk Karadžić and the work done

within SANU and thus the interdiscursivity between language and nationalism in Serbia,

as in the following example,

It is much more of a problem that the downfall of the Serbian linguistic science
that began immediately after the death of Vuk Karadžić (1864) continues today.
Renaming the language from Serbian (which is what it was called during the time
of its last reformer) into Serbo-Croatian, Serbian linguists entered a period during
which Vuk’s path in the naming and development of the language of the
Serbian people was abandoned. That that period still continues is confirmed by
the fact that Serbian linguists continue to call their language “Serbo-Croatian” in
the Dictionary even after the demise of the “Serbo-Croatian language”. (POL-08-
8-2003-80)

The Serbian Linguistic Culture Society, Serbian Learned Society and the Serbian
Royal Academy represent three acts in the creation of the most authoritative
Serbian scientific institution [SANU]. All three institutions held that Serbs are a
South Slavic people who speak their own, Serbian language, which is close to
other Slavic languages, but also different from them, as well as that Serbs had
three faiths: Orthodox, Roman-Catholic and Mohammedan. (POL-10-9-2005-
127)

The dominant contemporary (and hegemonic) language-related discourses and

language ideologies in evidence in the mainstream Serbian press therefore seem largely

to derive from the revived Serbian nationalist program first articulated in the nineteenth-

century discursive and ideological work by Vuk Karadžić and Ilija Garašanin, as well as

the 1986 SANU Memorandum, and are employed as a cultural policy tool in various

aspects of the realization of this program and the establishment of a Greater Serbian

hegemony in the South Slavic area in the Balkans. The alternative, and sometimes

counter-hegemonic, language-related discourses and language ideologies, on the other

hand, though present (as in the critique of the work of the “Cyrillic” associations

presented at the end of Section 6.6.1), are marginal and not easily detectable by either

quantitative or qualitative methods. In short, in the mainstream Serbian press, language-

related discourses and language ideologies are largely discourses and ideologies of

Serbian ethnonationalism, which, in the wake of the breakup of Yugoslavia, can be

considered to be part of the Serbian nationalists’ “strategies of perpetuation” which

“attempt to maintain or reproduce a threatened national identity” (Wodak, de Cillia,

Reisigl & Liebhart, 1999, p. 33).

7.4 Research Question 4

The fourth research question was: Is there synchronic and diachronic variation in

the identified language-related discourses and language ideologies relevant to Central

South Slavic ethnolinguistic identities? This question was addressed through a

comparison of the factor scores for each of the six selected language-related discourses

(i.e., factors) of texts grouped by a) publication: Blic, NIN, Politika, and Vreme; b) year

of publication: 2003, 2004, 2005, 2006, 2008; and c) type of article: general newspaper

articles vs. letters-to-the-editor. Synchronic and diachronic variation were also examined

using cluster analysis.
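
The grouping step behind this comparison can be illustrated with a minimal sketch, assuming that each text carries a factor score together with its publication and year. The function name, the factor label "F2", and all score values below are hypothetical and invented purely for illustration; they are not results from SERBCORP, and no significance testing is included.

```python
from collections import defaultdict
from statistics import mean


def mean_factor_scores(texts, group_key, factor):
    """Return the mean score on `factor` for each value of `group_key`."""
    groups = defaultdict(list)
    for text in texts:
        groups[text[group_key]].append(text["scores"][factor])
    return {group: mean(values) for group, values in groups.items()}


# Hypothetical factor scores, invented for illustration only.
texts = [
    {"publication": "Politika", "year": 2003, "scores": {"F2": 1.0}},
    {"publication": "Politika", "year": 2004, "scores": {"F2": 3.0}},
    {"publication": "Blic", "year": 2003, "scores": {"F2": -1.0}},
    {"publication": "Blic", "year": 2004, "scores": {"F2": 0.0}},
]

# Synchronic comparison (by publication) and diachronic comparison (by year).
by_publication = mean_factor_scores(texts, "publication", "F2")
by_year = mean_factor_scores(texts, "year", "F2")
```

The same function serves both comparisons because only the grouping variable changes; whether group differences are statistically significant would have to be established separately.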

The patterns of synchronic variation (between different publications) in language-

related discourses identified by factors suggest differences between the broadsheet daily

Politika (est. in 1904) as the oldest and most prestigious daily in Serbia and the weekly

NIN (est. in 1935) as the oldest weekly in Serbia, on the one hand, and the tabloid daily

Blic (est. in 1996) and the weekly Vreme (est. in 1990), on the other. Politika and NIN

articles, it will be recalled, scored significantly higher than Blic or Vreme on all

factors except Factors 4 and 8 (Officialization of Montenegrin). Although this pattern is

difficult to interpret with any degree of certainty, it seems safe to conclude that the older,

more conservative Politika and NIN offer better representations of dominant discourses

and thus of linguistic nationalism in Serbia. However, it should be noted that the

qualitative examination of representative texts suggested a high degree of congruence in

the actual (big ‘D’) discursive and ideological content of texts across all four publications

(see Section 6.6), so this difference may be quantitative rather than qualitative in nature.

The dominance of the identified discourses is underscored by their relative

stability over time. Although a period of five to six years arguably is not long enough for

significant differences to emerge, it is indicative that there were no statistically significant

diachronic differences in language-related discourses in five of the six examined factors,

particularly because the one factor that actually showed diachronic variation (Factor 2:

Cyrillic-only) was the most marginal to the contestation of Central South Slavic

ethnolinguistic identities. This finding is corroborated by the qualitative examination of

representative texts which suggested a high degree of congruence in the actual (big ‘D’)

discursive and ideological content of texts across all five years of publication considered

here. Finally, although the patterns of synchronic variation between different types of

article (general newspaper articles vs. letters-to-the-editor) presented a more complicated

picture, with some congruence (Factors 10 and 11) as well as significant differences

(Factors 2, 4, 6 and 8), also here the qualitative examination of representative texts

suggested a high degree of congruence in the actual (big ‘D’) discursive and ideological

content of texts.

8. Conclusion

This study had two major goals. The first was to determine whether a corpus-

linguistic methodology could be effectively applied to a language featuring extensive

inflectional morphology, and, if yes, to compare the different quantitative methods in

terms of their usefulness and effectiveness for identification of lexical patterns suggestive

of language-related discourses and, ultimately, language ideologies. The second goal was

to identify and describe dominant language-related discourses and language ideologies in

the mainstream Serbian press and to then examine those for any links with

ethnonationalism. Because the methodological comparison was the relatively more

straightforward of the two goals, it was dealt with first throughout the dissertation and it

is also dealt with first in this final section (Section 8.1). Conclusions about language-

related discourses and language ideologies identified in this study are offered in Section

8.2. Implications, limitations, and directions for future research are discussed in Sections

8.3, 8.4 and 8.5, respectively.

8.1 Methodological Comparison

It has been noted already that most studies of language-related discourses and

language ideologies have relied on qualitative methods only, while those that also use

quantitative methods tend to rely on keyword and collocation analysis. The reasons for

this include difficulties with the operationalization of discourse and ideology in

quantitative terms, as well as the relative ease of use and effectiveness of basic corpus-

linguistic methods such as keyword and collocation analysis. However, as this study has

demonstrated, despite the difficulties with the application of a quantitative methodology

to slippery concepts such as discourse and ideology, this approach offers novel insights

unavailable through a qualitative-only approach. Further, despite their mutual similarities

and differences, the quantitative methods applied here were shown to each offer a unique,

complementary angle from which to consider the data, which ensures both a more

comprehensive analysis and a higher degree of reliability. For example, keyword

analysis was shown to be much more effective than collocation analysis at corpus

profiling for sampling purposes. In other words, the decision to focus on the 5+ hits

section of SERBCORP would have been more difficult to justify based on the results of

collocation analysis alone. Perhaps most interestingly, it was demonstrated that reliance

on keyword and collocation analysis is comparatively less effective in research dealing

with large, topically heterogeneous corpora. As noted above, topically homogeneous

corpora such as single-issue parliamentary debates or student evaluations of university

instructors, regardless of their size, tend to have more homogeneous discursive and

ideological profiles also which are easier to identify using basic techniques. Topically

heterogeneous data sets, on the other hand, present a challenge on account of their more

heterogeneous and therefore more complex discursive and ideological profiles. Again,

this is where exploratory factor analysis proved to be far superior to other methods.
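
For readers less familiar with keyword analysis, the following is a minimal sketch of one common keyness statistic, the log-likelihood ratio, which compares the frequency of a word in a study corpus with its frequency in a reference corpus. This passage does not state which keyness statistic was used, so the choice of log-likelihood, the function name, and the frequencies in the example are all assumptions made for illustration.

```python
import math


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness of a word, given its frequency in a study
    corpus and in a reference corpus, and the token counts of both corpora."""
    total = freq_study + freq_ref
    # Frequencies expected if the word were equally common in both corpora.
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    ll = 0.0
    # A zero frequency contributes nothing to the sum (0 * log 0 is taken as 0).
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll


# Hypothetical counts: same relative frequency in both corpora vs. overuse
# in the study corpus.
no_difference = log_likelihood(10, 1000, 20, 2000)
overused = log_likelihood(50, 1000, 20, 2000)
```

A word with the same relative frequency in both corpora receives a score of zero, and the score grows as the relative frequencies diverge; the statistic itself does not indicate the direction of the difference.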

One conclusion, therefore, is that these methods are complementary in that they

provide both unique (e.g., micro- vs. macroscopic) and common (e.g., frequent, recurrent

lexis) perspectives, so they are best applied in conjunction with one another for the

purposes of triangulation. Perhaps most importantly, however, exploratory factor

analysis (based though it is on the results of collocation analysis) is the only method that

can effectively take researcher inference out of the process of identification both of (small

‘d’) discourses (i.e., factors) and representative texts (by providing factor scores for each

text). This finding is of paramount importance for research into discourses and

ideologies, particularly for research focusing on areas where linguistic research is

traditionally politicized and biased. Similarly, this is important for critical discourse

analysis which has been in need of research guided and supported by the results of

transparent, replicable procedures. Nevertheless, it is clear that researcher inference is

ultimately necessary for any meaningful interpretations, as quantitative procedures can

only identify salient lexical patterns and small ‘d’ discourses. This study thus also

confirms the effectiveness of a mixed methods approach whereby quantitative and

qualitative methods are combined in a hermeneutic fashion, i.e., qualitative analysis is

based on and guided by the results of quantitative analysis and vice versa.

Lastly, a note on the challenges presented by extensive inflectional morphology.

Despite its dangers, lemmatization may be a necessary evil as its advantages (e.g.,

aggregation of semantically closely related and highly correlated variables) seem to

outweigh its drawbacks (e.g., conflation of lexical items with potentially different

collocational or other patterning). Beyond a multiplicity of semantically related forms,

extensive inflectional morphology also has considerable potential for polysemy in the

form of homonyms/homographs. The solution, which unfortunately was not available

during the research reported on in this study, is reliable grammatical differentiation

through part-of-speech tagging. Similarly, extensive inflectional morphology presents a

problem for concordance analysis also because it breaks up the semantic unity of a

lemma (and with it its concordance patterns) into a myriad forms which concordancing

software such as WST is unable to deal with at the present moment. Here, the solution is

less clear, but will involve the development of software more capable of dealing with the

challenges of extensive inflectional morphology.
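
One way such software might regroup inflected forms is sketched below: a concordancer that matches on the lemma rather than on the surface form. This is only an illustration under stated assumptions. The form-to-lemma table is a tiny hand-built example, the function name is hypothetical, the token string is an invented illustration, and a real tool would need a full morphological lexicon and part-of-speech tagging to separate homographs.

```python
def kwic_by_lemma(tokens, lemma, form_to_lemma, window=3):
    """Return (left context, node, right context) for every token whose
    lemma equals `lemma`, so that all inflected forms share one concordance."""
    lines = []
    for i, token in enumerate(tokens):
        # Forms missing from the table are treated as their own lemma.
        if form_to_lemma.get(token.lower(), token.lower()) == lemma:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


# Hypothetical, hand-built mapping for a few case forms of jezik 'language'.
form_to_lemma = {
    "jezik": "jezik",
    "jezika": "jezik",
    "jeziku": "jezik",
    "jezikom": "jezik",
}

# Invented token string used only to exercise the function.
tokens = "Manipulacije sa jezikom i nametanja službenog jezika".split()
lines = kwic_by_lemma(tokens, "jezik", form_to_lemma, window=2)
```

Matching on the lemma brings the instrumental form jezikom and the genitive form jezika into a single set of concordance lines, at the cost of the conflation risks noted above for lemmatization in general.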

8.2 Language-related Discourses, Language Ideologies, and Ethnonationalism:

What It All Means

Two principal, dominant language-related discourses were identified in this study.

The first, a discourse of endangerment, was itself attested in two related forms each of

which focuses on a different aspect of language as well as a different ‘threat’. The first

form focuses on the perceived danger to the Cyrillic alphabet that a widespread use of the

Latin alphabet in Serbia is deemed to represent. Perhaps somewhat unusually among the

world’s languages, Central South Slavic uses both the Latin and the Cyrillic alphabets.

While the alphabets themselves are fully equivalent, the difference between them is in

their sociohistorical origins and thus sociolinguistic in nature. Despite the equivalence,

the Cyrillic alphabet has historically been strongly associated with the Serbian

ethnonational identity, whereas the Latin alphabet, largely because it provides a means of

distinction from the Serbs, is strongly preferred by the Bosniaks, Croats, and now

Montenegrins also. Here, then, we see an example of what Irvine and Gal (2000) call

‘iconization’ or mapping of linguistic features onto social images, positing a direct link

between one or more linguistic features and (an essentialist conceptualization of) the

nature of the persons or social groups who display them (for a reverse example, see

Section 6.6.1). Interestingly, however, the Latin alphabet is also in widespread use

among the Serbs themselves, both in Serbia and elsewhere, mostly because it has come to

be associated with modernity and the Western civilization (another example of

iconization). However, the Cyrillic alphabet is preferred in official contexts and has

never been in true danger of being phased out. There rather seems to exist a kind of

alphabetical diglossia, whereby the Cyrillic alphabet, in further examples of iconization,

is used in most official contexts and is associated with political conservatism and cultural

traditionalism, while the Latin script is mainly used in popular culture and is associated

with political liberalism and modernity (but see Herzfeld, 1987 for a problematization of

the concept of diglossia). The assessment of the endangerment of the Cyrillic is thus

(grossly) exaggerated, and the calls for its defense have primarily ideological and

political motivations. Note here that the calls to banish the Latin alphabet from Serbian

altogether represent an example of ‘erasure’ as the simplification of a sociolinguistic field

through which some persons, social groups, or sociolinguistic phenomena are rendered

invisible in ideologically and politically convenient ways (Irvine & Gal, 2000). Serbian

society has traditionally harbored a cult of victimhood since the defeat by the Ottoman

Turks in 1389 in Kosovo (see, e.g., Biserko, 2012), while vocal self-interested public

proclamations of endangerment of all things Serbian have become so pervasive (and

profitable) since the breakup of Yugoslavia that there are now widely used expressions

such as “to Serb” and “a professional Serb” to (mockingly) refer to the phenomenon.

Most proclamations of the endangerment of the Cyrillic in this data set (and, arguably, in

general) thus come from the fringe (minor civic associations and minor-league

academics) that harbors extremist political views; nevertheless, they are given enormous

(undue) attention and space in the mainstream media.

The second form of the discourse of endangerment focuses on the perceived threat

to the Serbian language and ethnonational identity posed by the post-Yugoslav political

and cultural independence of other Central South Slavs and their concomitant exercise of

their right to name the common language to reflect their own separate identities (i.e.,

Bosnian, Croatian, Montenegrin). Though equally baseless and ultimately absurd, this

second form of the discourse of endangerment is purveyed largely by leading academic

linguists, many of whom, such as Ivan Klajn for instance, are members of SANU, as well

as other academics and prominent writers (again, often members of SANU) and, more

often than not, Serbian politicians and, sometimes (though not attested here), also

members of the clergy in the Serbian Orthodox Church. As could be seen from the many

samples of this discourse cited above, there is an insistence on (and perhaps also a belief

in) the Serbian ethnic origins of all Central South Slavs (as well as the Macedonians who

speak a related but separate South Slavic language but are predominantly Eastern

Orthodox). The argument, based on this theory of ethnic origins and the tradition started

by Vuk Karadžić, but also the (politically convenient) Herderian language ideology

according to which language is the only valid criterion for ethnonational affiliation, is

that the “renaming” of the language represents rasparčavanje ‘partitioning’ (a recurrent

theme in public discourse in Serbia since the breakup of Yugoslavia and a keyword in the

sense of Williams, 1976) of the Serbian Volk and therefore a step towards its ultimate

destruction. Figure 5 shows the discourse prosody and a range of applications for (the

decidedly negative term) rasparčavanje, from the partitioning of the Byzantine empire

(line 1) to the partitioning of the Balkans (lines 2, 7) to the partitioning of Yugoslavia

(lines 3, 4, 6) to the partitioning of the (Serbian) language and literature (lines 5, 8, 10-

18) to the partitioning of the Vinča Institute of Nuclear Sciences at the University of

Belgrade (9, 19, 20). Note that a similar discourse of endangerment was evident in the

SANU Memorandum, although the focus there was on the ‘threat’ from the Croats since

Bosniaks and Montenegrins had not yet asserted themselves by the mid-1980s when the

memorandum was drafted.

The second dominant language-related discourse identified in this study is a

discourse of contestation. This discourse is directly related to the second form of the

discourse of endangerment and is routinely purveyed by the same set of actors, only here

emphasis is on arguments whose ultimate aim is to delegitimize non-Serb Central South

Slavic identities and with them their political legitimacy and ultimately their rights to

territory and sovereignty.

Figure 5. Concordance lines for rasparčavanje ‘partitioning’ in SERBCORP

The discourse of contestation relies on pseudo-scientific arguments and an

instrumentalization of linguistics for political purposes, which is widely noted in the

literature. Typically, the argumentation rests on either Topos 1 or Topos 2, or both.

Topos 1 presents the issue of language name as a “scientific” problem which has been

conclusively settled, although it is never quite explained what this exactly means or what

the scientific methodology used to come to this conclusion was, except for occasional

vague references to dialectological studies. Regardless, the claim is repeatedly made that

the only “scientifically” justified name for Central South Slavic is Serbian. Topos 2,

similarly, draws a selective, flawed comparison between the polycentricity of Central

South Slavic and colonial languages such as English, Spanish and Portuguese (but also

German), arguing that in cases of such polycentricity the “original” name is always

preserved even if the language comes to be used by different nations. The fact that

colonial languages were transplanted to the new nations largely via colonialist enterprises

is ignored, and is in fact rather indicative of the conceptualization behind this argument;

that Serbian was not transplanted from Serbia to other Central South Slavic nations in

similar fashion is also ignored. Simultaneously, inconvenient examples of polycentricity

such as, for instance, that in Scandinavia (see, e.g., Vikør, 2000) or the situation with

Hindi/Urdu, which is nearly identical to that with Central South Slavic in several respects

(e.g., a high degree of mutual comprehensibility, different alphabets, correlation between

linguistic, ethnic and religious affiliations), are entirely ignored.

It has thus been shown that underlying the discourses of endangerment and

contestation here is an imported essentialist Western language ideology which derives

from the language philosophy of Johann Gottfried Herder as well as the Romantic

movement (Bauman & Briggs, 2000, 2003), coupled with a Slavic language ideology

termed slovesnost in Serbian which encompasses language, alphabet, and literature and

treats them as different aspects of a unified linguistic entity. It has also been shown that

there exist intertextual and interdiscursive links between language-related discourses and

language ideologies in the mainstream Serbian press, on the one hand, and Serbian

ethnonationalism, on the other, as well as a common institutional site, SANU. Finally, it

has been shown that the dominant language-related discourses and language ideologies

were widely accepted in Serbian society during this period. Based on these findings, we

can conclude that, in the case of Serbia and the Balkans, language ideologies (and

language-related discourses) are indeed not “about language alone” (Woolard, 1998, p.

3), as well as that “[t]he continuing intensity of contestation” over language and

ethnonational identity is “hardly surprising, given the consequences envisaged and

authorized by the reigning language ideology and occasionally enacted under its auspices.

It is an ideology in which claims of linguistic affiliation are crucial and exclusivist

because they are also claims to territory and sovereignty” (Irvine and Gal, 2000, p. 72,

my italics). In other words, to make a reference to a somewhat similarly intractable

conflict situation of the Middle East, “[t]he political and military conflicts have stopped

but the linguistic conflict goes on” (Abd-el-Jawad & Al-Haq, 1997, p. 439). The ultimate

aim of prolonging this artificial and unnecessary conflict is clear:

Constructing language and tradition and placing them in relationship to
nature/science and society/politics continues to play a key role in producing and
naturalizing new modernist projects, new sets of legislators, and new forms of
social inequality. Which is not to say that the “new” does not often bear a
remarkable resemblance to what has come before, often centuries earlier. Indeed,
it would be difficult to imagine a time that the power of this process was more
apparent than at the end of the twentieth century and the beginning of the twenty-
first (Bauman & Briggs, 2003, p. 301).

What remains to be done is to take a stand with respect to such ideologies, fully

recognizing the impossibility of ideology-free positions. Here, the best course of action

seems to be to follow theorists such as Gramsci (1971) and Gee (2010) in their appeal to

judge ideologies in terms of their social effects rather than their truth values.

8.3 Implications

The findings of this study should be informative and useful to several different

audiences. The results of the comparison between the quantitative methodologies should

be of interest to scholars interested in mixed methods approaches to language ideologies,

but also those interested in similar approaches to discourses. The discursive and

language-ideological profiles presented here should be of interest to scholars in

sociolinguistics, as well as those in related fields such as linguistic anthropology, political

science, and sociology, as such profiles are, to the best of my knowledge, currently unavailable.

More specifically, regional linguists and other public figures with an interest in language

such as politicians, academicians, and religious officials will be interested in how their

contributions influence the public discourse on language and dominant language

ideologies. Similarly, journalists and others working in and with the media will be

interested in their impact as purveyors of discourses and language ideologies. Finally,

members of the public in the four states of the Central South Slavic area, as well as

regionally, should find the language-ideological profile of the mainstream Serbian press

of some interest, especially because language continues to be a highly contested issue

which is often exploited for political purposes. It is hoped that empirical contributions

based on transparent, replicable procedures such as this one can help deconstruct the

complex of ethnonationalist discourses and ultimately contribute to regional

reconciliation.

8.4 Limitations

Despite the effort to triangulate the findings by employing multiple

methodological (keyword, collocation, factor, and cluster analysis, analysis of variance,

CDA/DHA) and epistemological (quantitative and qualitative, macro- and microscopic

analysis) perspectives, as well as an exhaustive data set (including all relevant articles

from the given time-frame), the language-ideological profile presented here must be

understood as tentative. The primary reason for this is that manifestations of public

language-related discourses and language ideologies are not limited to the press, but can

be found throughout the public sphere. Furthermore, while the press is an excellent

source of data on dominant discourses and ideologies purveyed by the elites, it is clear

that it does not represent equally well what the linguistic anthropologist Paul Kroskrity

calls “practical consciousness” (Kroskrity, 2004, p. 505), i.e., that of the so-called ordinary

people who tend to accept and naturalize dominant ideologies. In addition to this, there

are further aspects of language-related discourses that merit consideration such as, for

example, their extended diachronic development or their possible correlation with

political affiliation, which are considered only in part here. Similarly, the concepts of

discourse and (language) ideology are notoriously problematic and the difficulties

associated with them are necessarily carried over into the present study. Although

language ideology, more often than not, is a linguistic discursive phenomenon, to treat it

as a solely linguistic discursive phenomenon, as is done here, is to give but a partial

account of it. In other words, although the lexical approach to the identification of

language ideologies is well established, language ideologies are by no means limited to

their lexical manifestations and so should be examined from multiple other perspectives

(see below).

8.5 Future Research

This study was conceived as part of a larger project of identification and

description of dominant language-related discourses and language ideologies in the entire

Central South Slavic area, i.e., Bosnia-Herzegovina, Croatia, Montenegro, and Serbia.

The rationale for such a broad, comparative examination is that language-related

discourses and language ideologies in these four states are intimately linked on account

of the nations’ shared history. So, the first task for future research would be to produce

individual accounts of language-related discourses and language ideologies for the

remaining three national contexts. The second task would be to subject the findings of

individual case studies to a comparative analysis which would make possible a

comprehensive view of language-related discourses and language ideologies and their

links to ethnonationalism in this area. Simultaneously with this, future research would do

well also to include data from other discursive sites such as various kinds of institutional

documentation, popular culture, and language-related discussions in social media. Also,

consideration of independent variables such as political affiliation would provide further

distinctions in our attempt to account for the totality of language-related discourses and

ideologies and ethnonationalism in the West Central Balkans as well as elsewhere. Finally,

in addition to other kinds of data from different discursive sites, the approach to the

identification of language-related discourses and language ideologies could be extended

to non-lexical aspects of language (e.g., grammatical relationships, semiotic

manifestations), as well as relevant non-linguistic social and discursive practices (e.g.,

allocation of material and symbolic resources).

1
For a critical review of the literature on the evolution of a symbol system that is arguably what makes

Homo sapiens unique, see Luuk 2013.


2
The demise of the former Yugoslavia has unleashed a regional culture of contestation so pervasive that it

can, not unreasonably, be characterized as pathological. This has meant the politicization of virtually

everything from strategically insignificant borderland areas between Slovenia and Croatia to the very name

of Macedonia which continues to be contested by Greece (see Voss, 2006); the virus, to continue the

medical metaphor, has easily infected the regional academic production, particularly in linguistics.

Greenberg (2004) himself thus notes that often such works, “given the ethnic affiliation of their authors, are

subjective and at times lack the scholarly rigor required in the study of linguistics” (p. 4; see also Irvine &

Gal, 2000, pp. 67-68). This situation has necessitated a reliance on outside sources, unaffiliated with any of

the parties in the region, so Kordić (2010), for example, relies heavily on extra-regional sources,

particularly those originating in the German-speaking world. However, as Greenberg also notes, little

relevant material is available in English, while treatments in other languages (e.g., Gröschel, 2009),

curiously enough, often exhibit obvious and disqualifying biases and are therefore not relied upon here (see

Katičić, 1997 for a possible explanation).


3
As is well known, Slovenia and Croatia were part of various Austrian and Hungarian kingdoms between

roughly the twelfth century and 1918, Serbia and Montenegro were part of the Ottoman empire between the

middle of the fifteenth century and 1878, while Bosnia-Herzegovina was part of both, belonging to the

Ottoman empire between the middle of the fifteenth century and 1878, and to the Austro-Hungarian empire

comparatively briefly between 1878 and 1918.


4
Partly owing to the policies of the Ottoman Empire which recognized faiths rather than ethnicities among

its subjects, there is a close association between religious and ethnic identities in the Balkans. Most

(religious) Croats are therefore Catholic, most (religious) Serbs and Montenegrins Eastern Orthodox, while

most (religious) Bosniaks are Muslim. This was further reinforced by the 1974 Yugoslav constitution

which recognized the non-Christian Bosnians as a separate ethnonational group but one defined by its

religious affiliation rather than ethnic identity, Muslims (as opposed to their now official ethnonym,

Bosniaks).
5
During that time, the area was still divided between the Austro-Hungarian and Ottoman empires, but

Serbia, in particular, exploited the increasing weakness of the Ottoman Empire to quickly move towards

full independence. Serbia and Montenegro were finally granted independence by the great European

powers at the Berlin Congress in 1878, while Bosnia-Herzegovina was placed under the administrative rule

of the Austro-Hungarian Empire to be annexed in 1908; Croatia remained part of Austria-Hungary until the

end of World War I in 1918.


6
The first mention of the name Serbo-Croatian in a codification work, according to Greenberg (2004, p.

54) came in 1867, in Pero Budmani’s Serbo-Croatian Grammar (see also Katičić, 1997, p. 171).
7
Although Montenegro had been independent since 1878, it was annexed by Serbia shortly after the end of

World War I.
8
Advancing her thesis that Serbo-Croatian/Croato-Serbian is one polycentric language rather than several

different languages, Kordić (2010), counter to virtually all Croatian linguists, contends that Serbian

domination of the common standard is “a myth”. However, while this may be true to a large extent for

Croatia, it is patently untrue for Bosnia-Herzegovina and Montenegro (see Endnote 10). In addition, her

insistence on the exclusionary, if historical, name for the language, i.e. Serbo-Croatian/Croato-Serbian, is

rather telling.
9
I.e., South-Slavia, a name which symbolically incorporates all Southern Slavs (except Bulgarians). One

should note, however, that for many non-Serb former Yugoslavs this name has come to index Serbian

domination and hegemony.


10
The existence of separate Bosniak and Montenegrin (ethnonational) identities was disputed by the Croats

and particularly by the Serbs, who since at least the time of Vuk Karadžić, the nineteenth-century Serbian

language reformer, had disputed the existence of any other separate Central South Slavic ethnolinguistic

identities including the Croatian, arguing as Karadžić did that the Croats and Bosniaks were “Serbs of the

Catholic and Islamic faiths”, respectively. For details on Vuk Karadžić, see Endnote 28.
11
Not only did no Bosniak or Montenegrin linguists or literary figures participate in either the Vienna or

Novi Sad agreements (see, e.g., Völkl, 2002, p. 216), but Conclusion 7 of the Novi Sad agreement literally

reads, “[…] A mutually (sic!) agreed-upon Commission of Serb and Croat experts will develop a draft of

the Orthographic manual. […]” (Greenberg, 2004, p. 172, my italics).


12
This was strictly enforced as all children were taught and required to use both alphabets in elementary

schools, while all public institutions were required to display bi-alphabetal signs and use both alphabets in

their day-to-day operation. An illustrative example of this latter practice is the alternating use of the two

alphabets by the leading national daily Oslobodjenje, which published in the Latin alphabet one day and in

the Cyrillic alphabet the next.


13
After Bosnian Serbs and Bosnian Croats declared their languages to be Serbian and Croatian,

respectively, Bosniaks reverted to the historical designation of language in Bosnia-Herzegovina which had

first been upheld then abruptly ended by Austria-Hungary, declaring their language to be Bosnian in the

face of opposition from the other two groups which continues to this day.
14
Montenegro chose not to declare independence at that time and instead sided with Serbia in the

subsequent wars. Macedonia was spared a military conflict until a brief civil war with its Albanian

minority in 2001. Kosovo, long a southern Serbian province with a large Albanian majority, was

recognized as an independent state in 2008 after a brief 1999 war with Serbia and a NATO intervention

which forced the Serbian military out.


15
Srpska akademija nauka i umetnosti ‘Serbian Academy of Science and Arts’

(https://www.sanu.ac.rs/English/Index.aspx). For a discussion of the 1986 SANU Memorandum, see

Section 7.3.
16
The period 2003-2008 was chosen on the basis of data availability at the time of corpus compilation.
17
Precise circulation figures are somewhat difficult to come by in the Balkans. Independent auditors such

as ABC Srbija (Audit Bureau of Circulations, www.abcsrbija.com) do keep track of circulation figures for

marketing purposes, but their reports are proprietary and require a costly subscription for access. However,

information on circulation figures can also be obtained from occasional press reports issued by publishing

houses such as the Color Press Group (e.g., http://www.color.rs/novosti120.html).


18
Although objective measures of the standing of newspapers in a society (other than circulation figures)

are currently unavailable, the Serbian newspaper market is fairly small and centralized, so there is little doubt

as to which publications are generally considered to be authoritative. It should also be noted that, in

addition to the four publications included in this study, the broadsheet daily Večernje Novosti would also

merit inclusion here on account of its relatively high circulation and standing; however, complete data sets

for the subject period were unavailable at the time of corpus compilation, so the Večernje Novosti data were

excluded from the research corpus.


19
The data for the year 2007 were excluded because the Politika data set for 2007 was incomplete due to a

download limit on the source website at the time of compilation.


20
Ebart (est. 2000) is a privately-owned, subscription-based commercial service archiving Serbian media

content.
21
JEZIK = jezik, jezika, jeziku, jezikom, jezici, jezike, jezicima (lemma forms by number and case).
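The lemma definition above can be illustrated with a minimal sketch of how the listed forms might be matched as whole words. The function and variable names are hypothetical, and the sketch assumes Latin-script, already extracted article text; it is not the procedure used to compile the corpus.

```python
import re

# Inflected forms of the lemma JEZIK, exactly as listed in this endnote.
JEZIK_FORMS = ["jezik", "jezika", "jeziku", "jezikom", "jezici", "jezike", "jezicima"]

# Whole-word, case-insensitive pattern; longer forms are listed first for readability.
JEZIK_PATTERN = re.compile(
    r"\b(?:" + "|".join(sorted(JEZIK_FORMS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)

def count_lemma(text):
    """Return the number of tokens in `text` that are forms of the lemma JEZIK."""
    return len(JEZIK_PATTERN.findall(text))
```

Related words such as the adjective jezički are deliberately not matched, because only the listed forms count as instances of the lemma.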
22
The text of the articles was saved in plain txt format using Unicode (UTF-16) encoding and formatted

according to the TEI-guidelines for electronic text encoding and interchange (http://www.tei-

c.org/Guidelines/).
23
Pronouns, numbers, and quantifiers were excepted from deletion on account of their potential functions

in discourse strategies such as the referential/nomination strategy (i.e., the construction of in- and out-

groups, Wodak, 2001, pp. 72-74) as well as other, as-of-yet undetermined discursive functions.
24
Following recommendations in Tabachnick and Fidell (2007), the full data set was subjected to a log

transformation in an attempt to retain the cases previously identified as multivariate outliers and enhance

the factorability. However, the log transformation did not produce significantly better results, so the

remainder of the analysis was performed on the original data set.


25
Similar to ‘United States’, the country name ‘Montenegro’, in Serbian, consists of two words, Crna

(negro) and Gora (monte). Consequently, the two words were treated separately by collocation analysis

and were identified as two different collocates of the lemma JEZIK. Unsurprisingly, they turned out to be

correlated with one another to the point of singularity, so a decision was made to exclude one of the two

from further analysis and treat the remaining one as the full country name.
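The redundancy described in this endnote can be sketched as follows, assuming per-text frequency counts for each collocate. The function names, the 0.99 threshold, and the toy counts are illustrative assumptions rather than the procedure or data used in this study.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def drop_redundant(columns, threshold=0.99):
    """Keep a variable only if it is not near-perfectly correlated with one already kept."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept

# Toy per-text counts (hypothetical): 'crna' and 'gora' always co-occur.
counts = {"crna": [0, 2, 1, 3], "gora": [0, 2, 1, 3], "srpski": [5, 1, 4, 2]}
reduced = drop_redundant(counts)
```

In this toy case the second of the two perfectly correlated variables is dropped, which mirrors the decision to let one word stand for the full country name.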
26
Although Biber (1988, p. 85, Note 2) correctly notes that “oblique solutions might be generally

preferable in studies of language use and acquisition, since it is unlikely that orthogonal, uncorrelated

factors actually occur as components of the communication process,” Varimax (orthogonal) and Promax

(oblique) rotations produced virtually the same results on this data set.
27
Z-scores were preferred to regression analysis here because the addition of multivariate outliers produced

a slightly different and inferior factor solution and thus factors and factor scores which were not directly

comparable to the preferred solution above and the regression analysis estimates of factor scores in it.
28
Vuk Stefanović Karadžić (1787-1864) was a Serbian language and literary scholar who created the

spelling system in use in contemporary Serbian and other Central South Slavic varieties and published

several early Serbian dictionaries and editions of Southern Slavic folk literature (see

http://www.britannica.com/EBchecked/topic/311960/Vuk-Stefanovic-Karadzic). He remains one of the

most revered figures of Serbian history.


29
A similar remark can be made with respect to the concepts of semantic and discourse prosody. Although

both concepts have been found to be useful in analysis of discourse, lexical patterns here are seemingly too

heterogeneous for any clear prosodies to emerge on account of a) the topical heterogeneity of the research

corpus, and b) the multifariousness of the concept of language compared to concepts such as age, for

example (see Mautner, 2007).


30
The seventeen significant collocates of Vuk (ordered by total number of occurrences) are: Karadžić,

reforma ‘reform’, Stefanović, jezik ‘language’, srpski ‘Serbian’, sabor ‘assembly’, prvi ‘first’, [Petar II

Petrović] Njegoš [1813-1851, poet, philosopher, a Prince-Bishop of Montenegro], [Vuk] Drašković [a

Serbian writer and government minister in the early 2000s], Dositej [Obradović, 1739-1811, Serbian writer,

Enlightenment philosopher, and the first Minister of Education of Serbia], zadužbina ‘endowment’, reč

‘word’, nagrada ‘award’, jezički ‘linguistic’, delo ‘work’, ministar ‘minister’, and knjiga ‘book’.
31
The verb postoji ‘exists’ was identified as a shared significant collocate of the core concept lemmas

bosanski jezik ‘Bosnian language’ (8 occurrences, MI score = 6.696) and crnogorski jezik ‘Montenegrin

language’ (18 occurrences, MI score = 7.223), but not hrvatski jezik ‘Croatian language’. Although it is

clear from other evidence (e.g., excerpts from texts representative of factors/discourses) that Croatian also

is routinely discursively constructed as non-existent, Croatian, as noted already, has historically enjoyed a

more or less equal status with Serbian and is often implicitly treated as more legitimate than either Bosnian

or Montenegrin. It should further be noted that other lexical items are also used for the same purpose (e.g.,

the gerund postojanje ‘existence’; for examples, see Table 19).
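As a rough illustration of the MI scores reported in this endnote, the sketch below uses one common formulation of mutual information, the base-2 logarithm of observed over expected co-occurrences, with a window of five tokens on either side of the node. It assumes single-token nodes and a pre-tokenized text; the function names are hypothetical, and the exact computation in WordSmith Tools may differ (e.g., in how the span enters the expected frequency).

```python
import math

def cooccurrences(tokens, node, collocate, span=5):
    """Count `collocate` tokens within `span` positions left or right of each `node`."""
    count = 0
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = tokens[max(0, i - span):i]
        right = tokens[i + 1:i + 1 + span]
        count += (left + right).count(collocate)
    return count

def mi_score(observed, f_node, f_collocate, n_tokens):
    """MI = log2(observed / expected), with expected = f_node * f_collocate / n_tokens."""
    expected = f_node * f_collocate / n_tokens
    return math.log2(observed / expected)
```

Because MI rewards exclusivity rather than frequency, low-frequency pairs can receive high scores, so MI values are best read together with the raw occurrence counts, as they are reported above.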


32
Most key-keywords, particularly the more frequent ones, also have the keyword language as an

associate.
33
Following previous research (e.g., Vessey, 2013a), the ‘plot’ and ‘patterns’ functions of the WST

concordancer were also considered. The ‘plot’ function calculates the total number of hits for each text (but, as

already mentioned above, only for individual lemma forms) as well as their dispersion throughout a text.

The ‘patterns’ function presents identified collocates in a table ordered by their frequencies in each slot in

the collocation horizon around the node word (L5-R5). Unfortunately, both proved to be only marginally useful

with a data set of this size, so they were excluded from further analysis.
34
In the original text, the salient collocates are in bold. Because literal translations into English make for poor

readability, the translated text may include equivalents which are not identified as salient collocates in the

original text. In the translated text, both salient collocates and equivalents are underlined.
35
Nicholas I, Nikola Petrović (1841-1921), prince and king of Montenegro (see

http://www.britannica.com/EBchecked/topic/414057/Nicholas-I).
36
In a comprehensive study of nationalism in popular Serbian literature from the critical period between

1985 and 1995, Žunić (1999) suggests that Serbian literary Romanticism played a formative role in the

development of Serbian nationalism. Furthermore, he finds evidence of an instrumentalization of literature

in the Serbian nationalist project around the breakup of Yugoslavia. Quite in line with the prevalent

understanding of the role of literature in Serbian society (briefly exemplified in Section 7), some of the

most prominent popular literary figures of this period whose works Žunić (1999) studies were or still are

members of SANU (e.g., former President of the Federal Republic of Yugoslavia and one of the authors of

the SANU Memorandum, Dobrica Ćosić).

References

Aarsleff, H. (1982). From Locke to Saussure: Essays on the study of language and
intellectual history. Minneapolis: University of Minnesota Press.
Abd-el-Jawad, H. R. S., & Al-Haq, F. A. A. (1997). The impact of the peace process in
the Middle East on Arabic. In Clyne, M. (Ed.), Undoing and redoing corpus
planning (pp. 415-444). Berlin, New York: Mouton de Gruyter.
Althusser, L. (1971). Lenin and philosophy and other essays. London: New Left Books.
Anderson, B. (1983). Imagined communities. London: Verso.
Baker, P. (2004). Querying keywords: Questions of difference, frequency and sense in
keywords analysis. Journal of English Linguistics, 32(4), 346-359.
Baker, P. (2006). Using corpora in discourse analysis. London: Continuum.
Baker, P. (2010). Sociolinguistics and corpus linguistics. Edinburgh: Edinburgh
University Press.
Baker, P., Gabrielatos, C., KhosraviNik, M., Krzyzanowski, M., McEnery, T., &
Wodak, R. (2008). A useful methodological synergy? Combining critical
discourse analysis and corpus linguistics to examine discourses of refugees and
asylum seekers in UK press. Discourse & Society, 19(3), 273-306.
Baker, P., Gabrielatos, C., & McEnery, T. (2013). Sketching Muslims: A corpus driven
analysis of representations around the word ‘Muslim’ in the British Press 1998-
2009. Applied Linguistics, 34(3), 255-278.
Barbour, S. (2000). Nationalism, language, Europe. In S. Barbour & C. Carmichael
(Eds.), Language and nationalism in Europe (pp. 1-17). Oxford: Oxford
University Press.
Barić, E., Hudeček, L., Koharović, N., Lončarić, M., Lukenda, M., Mamić, M.,
Mihaljević, M., Šarić, Lj., Švaćko, V., Vukojević, L., Zečević, V., & Žagar, M.
(1999). Hrvatski jezični savjetnik [Croatian language handbook]. Zagreb: Školske
Novine.
Bassi, E. (2010). A contrastive analysis of keywords in newspaper articles on the “Kyoto
Protocol”. In M. Bondi & M. Scott (Eds.), Keyness in texts (pp. 207-218).
Amsterdam: John Benjamins Publishing.
Bauman, R., & Briggs, C. L. (2000). Language philosophy as language ideology: John
Locke and Johann Gottfried Herder. In P. V. Kroskrity (Ed.), Language regimes:
Ideologies, polities, and identities (pp. 139-204). Santa Fe, New Mexico: School
of American Research Press.
Bauman, R., & Briggs, C. L. (2003). Voices of modernity: Language ideologies and the
politics of inequality. Cambridge: Cambridge University Press.
Berber Sardinha, T. (1999). Using key words in text analysis: Practical aspects. Direct
Papers 42, 1-9. ISSN 1413-442x.
Berber Sardinha, T. (2004). Linguistica de corpus. Barueri: São Paulo.
Biber, D. (1988). Variation across speech and writing. Cambridge: Cambridge
University Press.
Biber, D. (1993). Representativeness in corpus design. Literary and Linguistic
Computing, 8(4), 243-257.
Biber, D. (2006). University language: A corpus-based study of spoken and written
registers. Amsterdam: John Benjamins Publishing.

Biber, D., Conrad, S., & Cortes, V. (2004). If you look at…: Lexical bundles in
university teaching and textbooks. Applied Linguistics, 25(3), 371-405.
Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language
structure and use. Cambridge: Cambridge University Press.
Biber, D., & Staples, S. (in press). Cluster analysis. In L. Plonsky (Ed.), Advancing
quantitative methods in second language learning. Routledge.
Biserko, S. (2012). Yugoslavia’s implosion: The fatal attraction of Serbian nationalism.
Belgrade: The Norwegian Helsinki Committee. Retrieved from
http://www.helsinki.org.rs/doc/yugoslavias%20implosion.pdf [Last accessed May
5, 2015]
Blackledge, A. (2005). Discourse and power in a multilingual world. Amsterdam: John
Benjamins Publishing.
Blackledge, A., & Pavlenko, A. (Eds.) (2002). Language ideologies in multilingual
contexts [Special Issue]. Multilingua 21(2/3).
Blommaert, J. (Ed.) (1999). Language ideological debates. Berlin, New York: Mouton de
Gruyter.
Blommaert, J. (2005). Discourse: A critical introduction. Cambridge: Cambridge
University Press.
Blommaert, J. (2006a). Language ideology. In K. Brown (Ed.), Encyclopedia of language
& linguistics (pp. 510-522). Boston: Elsevier.
Blommaert, J. (2006b). Language policy and national identity. In T. Ricento (Ed.),
Introduction to language policy: Theory and method (pp. 238-253). Malden, MA:
Blackwell Publishing.
Blommaert, J., & Verschueren, J. (1998). The role of language in European nationalist
ideologies. In B. B. Schieffelin, K. A. Woolard & P. V. Kroskrity (Eds.),
Language ideologies: Practice and theory (pp. 189-210). New York: Oxford
University Press.
Bondi, M., & Scott, M. (Eds.) (2010). Keyness in texts. Amsterdam: John Benjamins.
Bourdieu, P. (1991). Language and symbolic power. Cambridge, MA: Harvard
University Press.
Brown, G., & Yule, G. (1983). Discourse analysis. Cambridge: Cambridge University
Press.
Bugarski, R. (2004). Language policies in the successor states of former Yugoslavia.
Journal of Language and Politics, 3(2), 189-207.
Carmichael, C. (2000). ‘A people exists and that people has its language’: Language and
nationalism in the Balkans. In S. Barbour & C. Carmichael (Eds.), Language and
nationalism in Europe (pp. 220-239). Oxford: Oxford University Press.
Chen, Y., & Baker, P. (2010). Lexical bundles in L1 and L2 academic writing. Language
Learning & Technology, 14(2), 30-49.
Cheng, W., & Lam, P. W. Y. (2013). Western perceptions of Hong Kong ten years on: A
corpus driven critical discourse study. Applied Linguistics, 34(2), 173-190.
Cortes, V., & Csomay, E. (Eds.) (2015). Corpus-based research in applied linguistics:
Studies in honor of Doug Biber. Amsterdam: John Benjamins.
Crapanzano, V. (2000). Serving the word: Literalism in America from the pulpit to the
bench. New York: New Press (distributed by W.W. Norton).

Culpeper, J. (2009). Keyness: Words, parts-of-speech and semantic categories in the
character-talk of Shakespeare’s Romeo and Juliet. International Journal of
Corpus Linguistics, 14(1), 29–59.
de Beaugrande, R. (1999). Discourse studies and the ideology of ‘liberalism’. Discourse
Studies, 1(3), 259-295.
DiGiacomo, (1999). Language ideological debates in an Olympic city: Barcelona 1992-
1996. In J. Blommaert (Ed.), Language ideological debates (pp. 105-142). Berlin,
New York: Mouton de Gruyter.
Dirven, R., Hawkins, B., & Sandikcioglu, E. (Eds.) (2001). Language and ideology (Vols.
1 & 2). Philadelphia: John Benjamins Publishing Company.
Đoković, D., Hrvatin, S. B., & Petković, B. (2004). Media ownership and its influence on
independence and pluralism of media in Serbia and the region. Belgrade: Medija
centar. Retrieved from
http://www.mc.rs/upload/documents/biblioteka/vlasnistvomedija1.pdf [Last
accessed May 5, 2015]
Dronjic, V. (2011). Serbo-Croatian: The making and breaking of an ausbausprache.
Language Problems & Language Planning, 35(1), 1-14.
Durrant, P. (2009). Investigating the viability of a collocation list for students of English
for Academic Purposes. English for Specific Purposes, 28(3), 157-169.
Eagleton, T. (1991). Ideology: An introduction. London: Verso.
Edwards, J. (1985). Language, society, and identity. Oxford: Blackwell.
Ensslin, A. (2010). ‘Black and white’: Language ideologies in computer game discourse.
In S. Johnson, & T. M. Milani (Eds.), Language ideologies and media discourse:
Texts, practices, politics (pp. 205-222). London: Continuum.
Ensslin, A., & Johnson, S. (2006). Language in the news: Investigating representations of
“Englishness” using WordSmith Tools. Corpora, 1(2), 153-185.
Erjavec, K. (2009). The Bosnian “war on terrorism”. Journal of Language and Politics,
8(1), 5-27.
Fairclough, N. (2001). Language and power (2nd ed.). Essex: Pearson Education
Limited.
Fairclough, N. (2010). Critical discourse analysis: The critical study of language (2nd
ed.). Harlow: Longman.
Fishman, J. A. (1972). Language and nationalism. Rowley: Newbury House Publishers.
Fishman, J. A. (1997). Language and ethnicity: The view from within. In Coulmas, F.
(Ed.), The handbook of sociolinguistics (pp. 327-343). Oxford: Blackwell.
Fitzsimmons Doolan, S. (2009). Is public discourse about language policy really public
discourse about immigration? A corpus-based study. Language Policy, 8(4), 377-
402.
Fitzsimmons Doolan, S. (2011). Identifying and describing language ideologies related
to Arizona educational language policy (Unpublished doctoral dissertation).
Northern Arizona University, Flagstaff, AZ. (UMI No. 3467048)
Fitzsimmons Doolan, S. (2014). Using lexical variables to identify language ideologies in
a policy corpus. Corpora, 9(1), 57-82.
Fleischer, A. A. (2007). The politics of language in Quebec: Language policy and
language ideologies in a pluriethnic society (Unpublished doctoral dissertation).
Georgetown University, Washington, D.C.

Ford, C. (2001). The (re-)birth of Bosnian: Comparative perspectives on language
planning in Bosnia-Herzegovina (Unpublished doctoral dissertation). University
of North Carolina at Chapel Hill, Chapel Hill, NC.
Foucault, M. (1972). The archaeology of knowledge. London: Tavistock.
Fought, C. (2006). Language and ethnicity. Cambridge: Cambridge University Press.
Fowler, R. (1991). Language in the news. London: Routledge.
Fraysse-Kim, S. H. (2010). Keywords in Korean national consciousness: A corpus-based
analysis of school textbooks. In M. Bondi & M. Scott (Eds.), Keyness in texts (pp.
219-234). Amsterdam: John Benjamins Publishing.
Freake, R. (2011). A cross-linguistic corpus-assisted discourse study of language
ideology in Canadian newspapers. Paper presented at the Corpus Linguistics
Conference, Birmingham, England. Retrieved from http://www.birmingham.ac.uk
/documents/college-artslaw/corpus/conference-archives/2011/paper-17.pdf [Last
accessed May 5, 2015]
Freake, R., Gentil, G., & Sheyholislami, J. (2011). A bilingual corpus-assisted discourse
study of the construction of nationhood and belonging in Quebec. Discourse &
Society, 22(1), 21-47.
Friedman, V. (1999). Linguistic emblems and emblematic languages: On language as
flag in the Balkans. Columbus: Department of Slavic and East European
Languages and Literatures at the Ohio State University.
Gal, S. (1998). Multiplicity and contention among language ideologies: A commentary.
In B. B. Schieffelin, K. A. Woolard & P. V. Kroskrity (Eds.), Language
ideologies: Practice and theory (pp. 317-331). New York: Oxford University
Press.
Gal, S. (2001). Linguistic theories and national images in nineteenth-century Hungary. In
S. Gal, & K. A. Woolard (Eds.), Languages and publics: The making of authority
(pp. 30-45). Manchester: St. Jerome.
Gal, S., & Woolard, K. A. (Eds.) (2001). Languages and publics: The making of
authority. Manchester: St. Jerome.
Gee, J. P. (2010). An introduction to discourse analysis: Theory and method. London:
Routledge.
Giddens, A. (1979). Central problems in social theory: Action, structure and
contradiction in social analysis. Berkeley and Los Angeles: University of
California Press.
Gramsci, A. (1971). Selections from the prison notebooks of Antonio Gramsci. London:
Lawrence & Wishart.
Gray, B., & Biber, D. (2013). Lexical frames in academic prose and
conversation. International Journal of Corpus Linguistics, 18(1), 109-135.
Greenberg, R. D. (2004). Language and identity in the Balkans: Serbo-Croatian and its
disintegration. Oxford: Oxford University Press.
Gröschel, B. (2009). Das Serbokroatische zwischen Linguistik und Politik: Mit einer
Bibliographie zum postjugoslawischen Sprachstreit [Serbo-Croatian between
linguistics and politics: With a bibliography on the post-Yugoslav language
conflict]. München: Lincom Europa.
Habermas, J. (1989). The structural transformation of the public sphere. Cambridge,
MA: MIT Press.

Halliday, M. A. K., & Matthiessen, C. M. I. M. (2004). Introduction to functional
grammar. New York: Routledge.
Hardt-Mautner, G. (1995). ‘Only connect’: Critical discourse analysis and corpus
linguistics. Retrieved from http://ucrel.lancs.ac.uk/papers/techpaper/vol6.pdf
[Last accessed May 5, 2015]
Haugen, E. (1972). Dialect, language, nation. In J. B. Pride & J. Holmes
(Eds.), Sociolinguistics (pp. 97-111). Harmondsworth: Penguin. (Originally
published in American Anthropologist 68 (1966): 922-935.)
Heller, M. (1999). Heated language in a cold climate. In J. Blommaert (Ed.), Language
ideological debates (pp. 143-172). Berlin, New York: Mouton de Gruyter.
Herzfeld, M. (1987). Anthropology through the looking-glass: Critical ethnography in
the margins of Europe. Cambridge: Cambridge University Press.
Hobsbawm, E. (1990). Nations and nationalism since 1780. Cambridge: Cambridge
University Press.
Hornberger, N. H., & McKay, S. L. (Eds.) (2010). Sociolinguistics and language
education. Bristol: Multilingual Matters.
Hult, F. M., & Pietikäinen, S. (2014). Shaping discourses of multilingualism through a
language ideological debate: The case of Swedish in Finland. Journal of
Language and Politics, 13(1), 1-20.
Hunston, S. (2002). Corpora in applied linguistics. Cambridge: Cambridge University
Press.
IBM (2012). Statistical Package for the Social Sciences.
Irvine, J. T. (1989). When talk isn’t cheap: Language and political economy. American
Ethnologist, 16(2), 248-267.
Irvine, J. T., & Gal, S. (2000). Language ideology and linguistic differentiation. In P. V.
Kroskrity (Ed.), Language regimes: Ideologies, polities, and identities (pp. 35-
83). Santa Fe, New Mexico: School of American Research Press.
Jaffe, A. (1999). Ideologies in action: Language politics on Corsica. Berlin: Mouton de
Gruyter.
Johnson, S., & Ensslin, A. (2007). Language in the media: Representations, identities,
ideologies. New York: Continuum.
Johnson, S., & Milani, T. M. (Eds.) (2010). Language ideologies and media discourse:
Texts, practices, politics. London: Continuum.
Johnson, S., Milani, T. M., & Upton, C. (2010). Language ideological debates on the
BBC ‘Voices’ website: Hypermodality in theory and practice. In S. Johnson, & T.
M. Milani (Eds.), Language ideologies and media discourse: Texts, practices,
politics (pp. 223-251). London: Continuum.
Johnson, S., & Suhr, S. (2003). From ‘political correctness’ to ‘politische Korrektheit’:
Discourses of ‘PC’ in the German newspaper, Die Welt. Discourse & Society,
14(1), 49-68.
Katičić, R. (1997). Undoing a ‘unified language’: Bosnian, Croatian, Serbian. In M.
Clyne (Ed.), Undoing and redoing corpus planning (pp. 269-289). Berlin: Mouton
de Gruyter.
Kloss, H. (1967). ‘Abstand languages’ and ‘Ausbau languages’. Anthropological
Linguistics, 9(7), 29-41.
Kordić, S. (2010). Jezik i nacionalizam [Language and nationalism]. Zagreb: Durieux.

Kroskrity, P. V. (1998). Arizona Tewa Kiva speech as a manifestation of a dominant
language ideology. In B. B. Schieffelin, K. A. Woolard & P. V. Kroskrity (Eds.),
Language ideologies: Practice and theory. New York: Oxford University Press.
Kroskrity, P. V. (Ed.) (2000a). Language regimes: Ideologies, polities, and identities.
Santa Fe, New Mexico: School of American Research Press.
Kroskrity, P. V. (2000b). Regimenting languages: Language ideological perspectives. In
P. V. Kroskrity (Ed.), Language regimes: Ideologies, polities, and identities (pp.
1-34). Santa Fe, New Mexico: School of American Research Press.
Kroskrity, P. V. (2004). Language ideologies. In A. Duranti (Ed.), A companion to
linguistic anthropology (pp. 496-517). Malden, MA: Blackwell Publishing.
Kuo, S., & Nakamura, M. (2005). Translation or transformation? A case study of
language and ideology in the Taiwanese press. Discourse & Society, 16(3), 393-
417.
Lippi-Green, R. (2007). English with an accent: Language, ideology, and discrimination
in the United States. London: Routledge.
Luuk, E. (2013). The structure and evolution of symbol. New Ideas in Psychology, 31(2),
87-97.
Mautner, G. (2007). Mining large corpora for social information: The case of elderly.
Language in Society, 36, 51-72.
May, S. (2001). Language and minority rights: Ethnicity, nationalism and the politics of
language. Harlow: Pearson.
McEnery, T., Xiao, R., & Tono, Y. (2006). Corpus-based language studies: An advanced
resource book. New York: Routledge.
McGroarty, M. (2008). The political matrix of linguistic ideologies. In B. Spolsky & F.
M. Hult (Eds.), The handbook of educational linguistics (pp. 98-112). Malden,
MA: Blackwell Publishing.
McGroarty, M. (2010). Language and ideologies. In N. H. Hornberger & S. L. McKay
(Eds.), Sociolinguistics and language education (pp. 3-39). Bristol: Multilingual
Matters.
Milroy, J. (2001). Language ideologies and the consequences of standardization. Journal
of Sociolinguistics, 5(4), 530-555.
Moskovljević, M. (1966). Rečnik savremenog srpskohrvatskog jezika s jezičkim
savetnikom [Dictionary of contemporary Serbo-Croatian with a language
handbook]. Beograd: Tehnička Knjiga i Nolit.
O’Rourke, B., & Ramallo, F. (2013). Competing ideologies of linguistic authority
amongst new speakers in contemporary Galicia. Language in Society, 42(3), 287-
305.
Partington, A. (2003). The linguistics of political argument: The spin-doctor and the
wolf-pack at the White House. London: Routledge.
Partington, A. (2010). Modern Diachronic Corpus-Assisted Discourse Studies (MD-
CADS) [Special Issue]. Corpora, 5(2).
Pennycook, A. (2001). Critical applied linguistics: A critical introduction. Mahwah, NJ:
Routledge.

Pujolar, J. (2007). The future of Catalan: Language endangerment and nationalist
discourses in Catalonia. In A. Duchêne & M. Heller (Eds.), Discourses of
endangerment: Ideology and interest in the defence of languages (pp. 121-148).
London: Continuum.
Rayson, P. (2008). From key words to key semantic domains. International Journal of
Corpus Linguistics, 13(4), 519-549.
Reisigl, M., & Wodak, R. (2009). The discourse-historical approach (DHA). In R. Wodak
& M. Meyer (Eds.), Methods of critical discourse analysis: Theory and method
(pp. 87-121). London: SAGE.
Ricento, T. (Ed.) (2000). Ideology, politics and language policies: Focus on English.
Philadelphia: John Benjamins Publishing.
Ricento, T. (2003). The discursive construction of Americanism. Discourse & Society,
14(5), 611-637.
Ricento, T. (2006). Americanization, language ideologies and the construction of
European identities. In C. Mar-Molinero, & P. Stevenson (Eds.), Language
ideologies, policies and practices. Language and the future of Europe (pp. 44-57).
Basingstoke: Palgrave Macmillan.
Rumsey, A. (1990). Wording, meaning and linguistic ideology. American Anthropologist,
92(2), 346-361.
Safran, W. (1999). Nationalism. In Fishman, J. A. (Ed.), Handbook of language & ethnic
identity (pp. 77-93). Oxford: Oxford University Press.
Salama, A. H. Y. (2011). Ideological collocation and the recontextualization of Wahhabi-
Saudi Islam post-9/11: A synergy of corpus linguistics and critical discourse
analysis. Discourse & Society, 22(3), 315-342.
SANU (1986). Memorandum srpske akademije nauka i umetnosti (nacrt) [A draft
memorandum of the Serbian Academy of Sciences and Arts]. Retrieved from
http://www.helsinki.org.rs/serbian/doc/memorandum%20sanu.pdf [Last accessed
May 5, 2015]
Schieffelin, B. B., Woolard, K. A., & Kroskrity, P. V. (Eds.) (1998). Language
ideologies: Practice and theory. New York: Oxford University Press.
Scott, M. (1997). PC analysis of key words – and key key words. System, 25(2), 233-245.
Scott, M. (2009). In search of a bad reference corpus. In D. Archer (Ed.), What’s in a word-
list? Investigating word frequency and keyword extraction (pp. 79-92). Oxford:
Ashgate.
Scott, M. (2010). Problems in investigating keyness, or clearing the undergrowth and
marking out our trails… In M. Bondi & M. Scott (Eds.), Keyness in texts (pp. 43-
58). Amsterdam: John Benjamins Publishing.
Scott, M. (2014a). WordSmith Tools Help Manual. Version 6.0. Liverpool: Lexical
Analysis Software.
Scott, M. (2014b). WordSmith Tools. Liverpool: Lexical Analysis Software.
Scott, M. R., & Tribble, C. (2006). Key words and corpus analysis in language education.
Amsterdam: John Benjamins Publishing.
Seargeant, P. (2009). Language ideology, language theory, and the regulation of
linguistic behavior. Language Sciences, 31, 345-359.

Silverstein, M. (1979). Language structure and linguistic ideology. In R. Clyne, W.
Hanks & C. Hofbauer (Eds.), The elements: A parasession on linguistic units and
levels (pp. 193-247). Chicago: Chicago Linguistic Society.
Silverstein, M. (1993). Metapragmatic discourse and metapragmatic function. In J. A.
Lucy (Ed.), Reflexive language: Reported speech and metapragmatics (pp. 33-
58). Cambridge: Cambridge University Press.
Silverstein, M. (1998). The uses and utility of ideology: A commentary. In B. B.
Schieffelin, K. A. Woolard & P. V. Kroskrity (Eds.), Language ideologies:
Practice and theory (pp. 123-145). New York: Oxford University Press.
Silverstein, M. (2000). Whorfianism and the linguistic imagination of nationality. In P. V.
Kroskrity, (Ed.), Language regimes: Ideologies, polities, and identities (pp. 85-
138). Santa Fe, New Mexico: School of American Research Press.
Sinclair, J. M. (1991). Corpus, concordance, collocation. Oxford: Oxford University
Press.
Skutnabb-Kangas, T. (2000). Linguistic genocide in education or worldwide diversity
and human rights? Mahwah, NJ: Lawrence Erlbaum Associates, Inc., Publishers.
Spitulnik, D. (1998). Mediating unity and diversity: The production of language
ideologies in Zambian broadcasting. In B. B. Schieffelin, K. A. Woolard & P. V.
Kroskrity (Eds.), Language ideologies: Practice and theory (pp. 163-188). New
York: Oxford University Press.
Spolsky, B. (2004). Language policy. Cambridge: Cambridge University Press.
Stubbs, M. (1983). Discourse analysis: The sociolinguistic analysis of natural language.
Chicago: University of Chicago Press.
Stubbs, M. (1996). Text and corpus analysis. London: Blackwell.
Stubbs, M. (2010). Three concepts of keywords. In M. Bondi & M. Scott (Eds.), Keyness
in texts (pp. 21-42). Amsterdam: John Benjamins Publishing.
Subtirelu, N. C. (2015). “She does have an accent but…”: Race and language ideology in
students’ evaluations of mathematics instructors on RateMyProfessors.com.
Language in Society, 44(1), 35-62.
Sudetic, C. (1993, December 26). Balkan conflicts are uncoupling Serbo-Croatian. The
New York Times. Retrieved from
http://www.nytimes.com/1993/12/26/world/balkan-conflicts-are-uncoupling-
serbo-croatian.html [Last accessed May 5, 2015]
Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics. Boston, MA:
Pearson Education, Inc.
Thompson, J. B. (1984). Studies in the theory of ideology. Cambridge: Polity Press.
Tognini-Bonelli, E. (2001). Corpus linguistics at work. Amsterdam: John Benjamins
Publishing Company.
Tomić, Y. (n.d.). The ideology of Greater Serbia in the nineteenth and twentieth
centuries: An expert report. Paris: Bibliothèque de documentation internationale
contemporaine, Université de Paris X-Nanterre. Retrieved from
http://www.helsinki.org.rs/serbian/doc/expert%20report%20-
%20yves%20tomic.pdf [Last accessed May 5, 2015]

Tošović, B. (n.d.). Herausbildung des Bosnischen/Bosniakischen [The development of
Bosnian/Bosniak]. Retrieved from http://www-gewi.uni-
graz.at/gralis/Slawistikarium/BKS/Herausbildung_Bosnisch-Bosniakisch.pdf
[Last accessed May 5, 2015]
van Dijk, T. A. (1998). Ideology: A multidisciplinary approach. London: SAGE.
van Dijk, T. A. (2006). Ideology and discourse analysis. Journal of Political Ideologies,
11(2), 115-140.
Vessey, R. (2013a). Language ideologies and discourses of national identity in Canadian
newspapers: A cross-linguistic corpus-assisted discourse study (Unpublished
doctoral dissertation). University of London, London.
Vessey, R. (2013b). Too much French? Not enough French?: The Vancouver Olympics
and a very Canadian language ideological debate. Multilingua, 32(5), 659-682.
Vikør, L. S. (2000). Northern Europe: Languages as prime markers of ethnic and national
identity. In S. Barbour & C. Carmichael (Eds.), Language and nationalism in
Europe (pp. 105-129). Oxford: Oxford University Press.
Völkl, S. D. (2002). Bosnisch [Bosnian]. In M. Okuka (Ed.), Lexikon der Sprachen des
europäischen Ostens [The lexicon of East European languages] (Vol. 10, pp. 209-
218). Klagenfurt: Wieser. Retrieved from
http://wwwg.uni-klu.ac.at/eeo/Bosnisch.pdf [Last accessed May 5, 2015]
Voss, C. (2006). The Macedonian standard language: Tito-Yugoslav experiment or
symbol of ‘Great Macedonian’ ethnic inclusion? In C. Mar-Molinero, & P.
Stevenson (Eds.), Language ideologies, policies and practices. Language and the
future of Europe (pp. 118-132). Basingstoke: Palgrave Macmillan.
Wallis, D. A. (1998). Language, attitude, and ideology: An experimental social-
psychological study. Journal of Pragmatics, 30, 21-48.
Warren, M. (2010). Identifying aboutgrams in engineering texts. In M. Bondi & M. Scott
(Eds.), Keyness in texts (pp. 113-126). Amsterdam: John Benjamins Publishing.
Wilce, J. (2010). Society, language, history and religion: A perspective on Bangla from
linguistic anthropology. In T. Omoniyi (Ed.), The sociology of language and
religion: Change, conflict, and accommodation (pp. 126-155). London: Palgrave/
Macmillan.
Williams, R. (1976). Keywords: A vocabulary of culture and society. New York: Oxford
University Press.
Wodak, R. (2001). The discourse-historical approach. In R. Wodak & M. Meyer (Eds.),
Methods of critical discourse analysis (pp. 63-94). London: SAGE.
Wodak, R. (2004). Critical discourse analysis. In C. Seale, G. Gobo, J. F. Gubrium & D.
Silverman (Eds.), Qualitative research practice (pp. 197-213). London: SAGE.
Wodak, R. (2012). Language, power and identity. Language Teaching, 45(2), 215-233.
Wodak, R., de Cillia, R., Reisigl, M., & Liebhart, K. (1999). The discursive
construction of national identity. Edinburgh: Edinburgh University Press.
Wodak, R., & Meyer, M. (Eds.) (2009). Methods of critical discourse analysis: Theory
and method. London: SAGE.
Woolard, K. A. (1998). Introduction: Language ideology as a field of inquiry. In B. B.
Schieffelin, K. A. Woolard & P. V. Kroskrity (Eds.), Language ideologies:
Practice and theory (pp. 3-47). New York: Oxford University Press.

Woolard, K. A., & Schieffelin, B. B. (1994). Language ideology. Annual Review of
Anthropology, 23, 55-82.
Wright, S. (2004). Language policy and language planning: From nationalism to
globalization. New York: Palgrave Macmillan.
Xiao, R., & McEnery, T. (2005). Two approaches to genre analysis: Three genres in
Modern American English. Journal of English Linguistics, 33(1), 62-82.
Žunić, D. (1999). Nacionalizam i književnost: Srpska književnost 1985-1995
[Nationalism and literature: Serbian literature 1985-1995]. Budapest, Hungary: Open
Society Institute. Retrieved from
http://rss.archives.ceu.hu/archive/00001127/01/133.pdf [Last accessed May 5,
2015]

Appendices

Appendix A: Sampling Procedures

Although one of the main goals of the methodological synergy between CL and

CDA is to provide an objective and reliable sampling procedure, as in many other

corpus-based discourse studies, the data set compiled here is too large for a

comprehensive analysis, even with the help of quantitative CL techniques. Existing

research deals with this issue in a variety of ways. Studies based on keyword analysis

typically focus on a limited number of high-scoring items, regardless of the actual

number of items identified as key. Studies based on collocation analysis, on the other

hand, focus on limited sets of search terms and collocates, as well as random samples of

concordance lines. To solve the problem of oversampling, Hunston (2002), for example,

suggests a downsampling technique based on a random selection of concordance lines.

Somewhat similarly, Baker et al. (2008) suggest a focus on what they call consistent

collocates (i.e., items that appear as significant collocates of the target core concept(s)

throughout the timeframe rather than over isolated periods within it). Vessey (2013a), in

contrast, concentrates precisely on such peaks in discursive activity (e.g., around

significant events), focusing her CL searches on a small set of theoretically-relevant items

resulting from previous research. Despite the relative practical and theoretical merits of

such downsampling procedures, however, it is clear that they all incur a loss of

information without necessarily demonstrating the validity of the approaches.

Exploratory factor analysis (EFA), on the other hand, is ideally suited for principled

analyses of large data sets with numerous variables, as it employs covariance among

variables to produce sets of mutually positively correlated variables that can help the

researcher identify discourses and ideologies in the data in an objective manner and with

the promise of minimal loss of information. For EFA to identify meaningful covariance

and thus produce useful results, however, variables must meet certain minimal

requirements such as sufficient frequency per observation (e.g., individual text; Douglas

Biber, personal communication).

In order to find a solution to this problem, the data were carefully examined from

a variety of standpoints, paying particular attention to the hit count (i.e., number of

occurrences of forms of the lemma JEZIK ‘language’) in individual articles and its

relationship to article content. Although there is a variety of ways to count hits and then

examine articles based on this information, the ‘plot’ function of the concordancer in

WordSmith Tools 6.0 (WST; Scott, 2014b) was used initially to sort articles according to the

number of hits as the quickest and most practical solution. The top ten and bottom ten

articles on the list were then examined closely in their entirety with an eye to potential

usefulness of their content for analysis of language-related discourses and ideologies.

Quite expectedly, this examination revealed that higher numbers of hits meant a higher

likelihood of an article potentially containing relevant material. For example, while the

top ten articles tended to have high numbers of hits (e.g., between 22 and 45 for the

nominative form of the lemma JEZIK) and content in which language was explicitly

discussed, the bottom ten articles (and roughly 66% of all articles, see Table 4.6) all had a

single hit and tended to mention language only in passing. Interestingly, Vessey (2013a)

found a similar pattern in Canadian newspaper data in English and French; for illustrative

examples of the pattern here, see Examples 1 and 2 below. Based on this, hit count was

taken as a reasonable indicator of content relevance and thus as a valid sampling

criterion. Further, based in part on the methodological constraints of EFA (Douglas

Biber, personal communication), the cutoff point for inclusion in the research sample was

set at 5 hits for the lemma JEZIK per article.

Example 1 (excerpt from “The restoration of Serbian studies,” Politika, July 22,

2006, 45 hits)

Jezik jeste jedan, ali srpski, jeste jedinstven, ali ne i hrvatski, jeste zajednički, ali
samo po upotrebi, no nikako ne i po pripadnosti i poreklu. Akademik Pavle Ivić, u
svojim dijalektološkim studijama, dao je tvrde i nepobitne naučne dokaze da
štokavski dijalekt, po svome poreklu i svojoj prvobitnoj teritorijalnoj
rasprostranjenosti, obuhvata oblasti srednjevekovne srpske države (uglavnom do
reke Cetine). Nema nikakve sumnje da je tim jezikom govorio srpski narod i da je
to jezik srpskog naroda. Nauci je takođe dobro poznato da su čakavski i kajkavski
dijalekti nastali na tlu Hrvatske i da oni predstavljaju izvorni hrvatski jezik.
Hrvati su se svoga čakavskog i kajkavskog jezika odrekli u prvoj polovini 19.
veka i prihvatili su Vukov, srpski, štokavski govor. Tako je srpski jezik postao i
hrvatski, zajednički, srpski i hrvatski, srpski ili hrvatski, srpskohrvatski i
hrvatskosrpski. […]

The language [Central South Slavic] is one, but Serbian, it is unified, but not also
Croatian, it is common, but only in terms of use, not in terms of affiliation and
origin. Academician Pavle Ivić, in his dialectological studies, has given hard and
irrefutable evidence that the Štokavian dialect, in terms of its origin and its
original territorial spread, covers the area of the medieval Serbian state (mostly to
the river Cetina). There is absolutely no doubt that this was the language spoken
by the Serbian people and belonging to the Serbian people. At the same time, it
has been scientifically established that the Čakavian and Kajkavian dialects came
into being in the territory of Croatia and that they represent the original Croatian
language. The Croats gave up on their Čakavian and Kajkavian language (sic) in
the first half of the nineteenth century, accepting Vuk’s [Vuk Stefanović Karadžić,
nineteenth century Serbian grammarian and language reformer] Serbian,
Štokavian speech. Thus, the Serbian language also became Croatian, shared, Serbian
and Croatian, Serbian or Croatian, Serbo-Croatian and Croato-Serbian. […]

Example 2 (no title, Politika, February 17, 2003, 1 hit)

Večeras je u Novom Sadu održana svečana sednica Matice srpske povodom 177.
godišnjice postojanja ovog hrama naše kulture. Povodom 50. godišnjice Zbornika
Matice srpske za književnost i jezik besedio je prof. dr Jovan Delić. Na
večerašnjoj svečanosti pesniku Nikoli Vujičiću uručena je Zmajeva nagrada za
2002. godinu za knjigu pesama „Prepoznavanje" koju je izdalo Kulturno društvo
„Prosvjeta" iz Zagreba. Stihove laureata kazivao je dramski umetnik Ivan

Jagodić.

The Matica Srpska [Serbian Language and Literary Society] held a celebratory
session tonight in Novi Sad to mark the 177th anniversary of this temple of our
culture. To mark the 50th anniversary of the Matica Srpska Journal of Literature
and Language an address was delivered by Dr. Jovan Delić. Also at tonight’s
ceremony poet Nikola Vujičić received the Zmaj [Jovan Jovanović Zmaj,
nineteenth century Serbian poet] Award for the year 2002 for his book of poetry
titled “Recognition” which was published by the Zagreb-based “Enlightenment”
Cultural Society. A selection of the poet laureate’s verses was performed by
theater actor Ivan Jagodić.

The ‘plot’ function in the WST concordancer (see above), however, is unable to

calculate the total number of hits for any given lemma (i.e., it is only capable of

calculating the number of hits for individual lemma forms separately). In order to arrive

at the total number of hits per article for all forms of the lemma JEZIK, a custom Python

application was used to simultaneously compute the total number of hits for all lemma

forms per article and sort articles into separate folders according to the number of hits.

Following this, another custom Python program was used to calculate the total number of

articles per hit category (1, 2-4, 5-9, and 10+ hits) and publication. Using the above-

mentioned cutoff point of 5 hits per article (again, for all forms of the lemma JEZIK

combined), a total of 1,257 articles were identified (with 5+ hits, see Tables 6-9 in

Section 4).
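
The counting and binning steps just described can be illustrated with a minimal Python sketch. This is not the custom application used in the study, whose code is not reproduced here. The list of forms is taken from Table A1 and is assumed to be the full set searched; the function names (count_hits, hit_category, tally, in_sample) and the regular-expression tokenizer are hypothetical choices made for illustration only.

```python
import re
from collections import Counter

# Inflected forms of the lemma JEZIK as listed in Table A1 (assumed here to
# be the full set searched; the study's actual form list may differ).
JEZIK_FORMS = {"jezik", "jezika", "jeziku", "jezikom",
               "jezici", "jezike", "jezicima"}

# Hypothetical tokenizer: runs of word characters (Unicode-aware in Python 3).
TOKEN_RE = re.compile(r"\w+")


def count_hits(text):
    """Return the total number of JEZIK forms in one article."""
    tokens = (t.lower() for t in TOKEN_RE.findall(text))
    return sum(1 for t in tokens if t in JEZIK_FORMS)


def hit_category(hits):
    """Map a hit count onto the categories 1, 2-4, 5-9, and 10+."""
    if hits >= 10:
        return "10+"
    if hits >= 5:
        return "5-9"
    if hits >= 2:
        return "2-4"
    # "0" is added only for completeness; it is not a category in the study.
    return "1" if hits == 1 else "0"


def tally(articles):
    """Count articles per (publication, hit category) pair."""
    counts = Counter()
    for publication, text in articles:
        counts[(publication, hit_category(count_hits(text)))] += 1
    return counts


def in_sample(text, cutoff=5):
    """Apply the cutoff of 5 hits per article used for sampling."""
    return count_hits(text) >= cutoff
```

Here tally expects an iterable of (publication, text) pairs, so reading the article files and moving them into per-category folders would be handled separately.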

Another way to test hit count as a sampling criterion is to compare the frequencies

in the two sections of forms of the lemma JEZIK and its collocates directly related to the

pertinent ethnolinguistic identities such as Bosnian, Bosniak, Croatian, Montenegrin, and

Serbian as the most obvious pointers to discourses relevant to an analysis of links

between language ideologies and ethnonationalism (Table A1). (I am also including the

verb postoji ‘exists’, which here suggests a pervasive discourse and concomitant

ideology of contestation, as another example).

Table A1

Comparison between the 1 Hit and 5+ Hits Sections of SERBCORP in Terms of

Language- and Identity-related Collocates

                               1 hit section             5+ hits section

Size (words)                   7,141,589                 1,118,529
Size (articles)                10,616                    1,257
Mean length in words (SD)      662.70 (621.03)           879.19 (728.90)

Lemma JEZIK                    Raw freq./per 1,000,000   Raw freq./per 1,000,000

jezik (language)               3,103/434                 4,644/4,151
jezika                         2,778/388                 4,565/4,081
jeziku                         3,020/422                 1,931/1,726
jezikom                        985/137                   577/515
jezici (languages)             151/21                    230/205
jezike                         313/43                    316/282
jezicima                       268/37                    267/238
Total                          10,168/1,423              12,530/11,202

Identity-related collocates    Raw freq./per 1,000,000   Raw freq./per 1,000,000

bosanski (Bosnian)             39/5.5                    165/147.5
bošnjački (Bosniak)            14/1.9                    116/103.7
crnogorski (Montenegrin)       34/4.75                   351/313.8
hrvatski (Croatian)            93/13                     363/324.5
srpski (Serbian)               2,122/297                 3,449/3,083
postoji (exists)               63/8.8                    161/143.9

Even a quick glance at the results of frequency and collocation analyses on the 1

hit and 5+ hits sections shows that, while forms of the lemma JEZIK can be found in both

sections of the corpus, they are roughly eight times more frequent in the 5+ hits section

overall (11,202 vs. 1,423 occurrences per million words), which of course is expected

given the selection criterion here. More importantly, the pertinent ethnolinguistic

collocates and the verb postoji ‘exists’ as another indicator of relevant discourses (all

significant at MI > 5) have considerably higher normalized frequencies in the 5+ hits

section, again suggesting a comparatively greater relevance of articles featuring higher

numbers of explicit references to language. Perhaps the starkest example here is the

collocate ‘Bosniak’ (103.7 vs. 1.9 occurrences per million words) which as a language

label is an indicator of a pervasive discourse of contestation (as many Serbian but also

Croatian linguists, politicians, and public figures argue that the language of Bosniaks can

only be called Bosniak, after the people, and not Bosnian, after the country, as that

supposedly represents an attack on separate Serbian and Croatian ethnolinguistic

identities within Bosnia, even if all three languages are official according to the country’s

constitution).
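
The per-million figures compared above are obtained by dividing a raw frequency by the size of the corpus section and multiplying by 1,000,000. The short Python sketch below reproduces that arithmetic with figures copied from Table A1; the function name per_million and the variable names are illustrative assumptions rather than part of the study's tooling.

```python
def per_million(raw_freq, corpus_size):
    """Normalize a raw frequency to occurrences per 1,000,000 words."""
    return raw_freq / corpus_size * 1_000_000


# Section sizes (in words) and raw frequencies copied from Table A1.
SIZE_1_HIT = 7_141_589
SIZE_5_PLUS = 1_118_529

bosniak_1_hit = per_million(14, SIZE_1_HIT)      # bošnjački, 1 hit section
bosniak_5_plus = per_million(116, SIZE_5_PLUS)   # bošnjački, 5+ hits section

jezik_total_1_hit = per_million(10_168, SIZE_1_HIT)
jezik_total_5_plus = per_million(12_530, SIZE_5_PLUS)

# Ratio of the normalized frequencies of JEZIK between the two sections.
ratio = jezik_total_5_plus / jezik_total_1_hit

print(f"bošnjački: {bosniak_1_hit:.1f} vs. {bosniak_5_plus:.1f} per million")
print(f"JEZIK ratio (5+ hits / 1 hit): {ratio:.1f}")
```

Normalizing in this way is what allows the two sections to be compared despite their very different sizes.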

A final and perhaps most convincing piece of evidence of the greater relevance of

articles with a higher hit count, and thus of the validity of the hit count as a sampling

criterion, can be obtained from keyword analysis. The keyword list resulting from a

comparison between the 1-4 hits and 5+ hits sections of SERBCORP thus includes the

lemma JEZIK and the relevant ethnonyms and glottonyms, as well as a considerable

number of other potentially relevant items such as narod ‘people’, nacionalni ‘national’,

naziv ‘[language] label’, identitet ‘identity’, nacija ‘nation’, etc., which indicates their

greater salience in the 5+ hits section (see Table 6.5). It therefore seems reasonable to

conclude that, while traces of relevant discourses and ideologies can be found also in

texts that are not equally language-focused (i.e., articles with a lower hit count), the

sampled data set represents a concentrated discourse of higher relevance to the study of

the links between language-related discourses and language ideologies, on the one hand,

and ethnolinguistic identities and ethnonationalism, on the other.
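
Keyness in a comparison of this kind is commonly scored with the log-likelihood statistic, which Appendix B reports as the test used in the study's analyses. The Python sketch below implements the usual two-corpus form of that statistic. Whether it matches the exact computation performed by WordSmith Tools is an assumption. The input figures (the form jezik in the 1 hit and 5+ hits sections, from Table A1) serve only as an illustration, since the keyword analysis itself compared the 1-4 hits and 5+ hits sections.

```python
import math


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Two-corpus log-likelihood keyness score for a single word."""
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    score = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


# Illustration only: jezik in the 5+ hits section vs. the 1 hit section.
ll_jezik = log_likelihood(4_644, 1_118_529, 3_103, 7_141_589)
print(round(ll_jezik, 1))
```

A word with identical relative frequency in both sections scores zero, and the score grows as the two relative frequencies diverge.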

Appendix B: Comparative Analyses of Comparator Corpora

In order to assess the relative usefulness of different types of reference corpora, I

also compiled a set of three wordlists from the very large, newly available web-as-corpus

(WaC) corpora of Serbian1 to use as reference corpora and compare to SERBCOMP

(Table B1).

Table B1

Sizes of SERBCOMP and the WaC Reference Corpora

Corpus        Size (in words)

SERBCOMP      22,493,804
SETIMES2      43,482,838
OPUS2         198,141,613
srWaC14       561,529,963

To determine the optimal reference corpus, keyword analysis was conducted with

SERBCOMP as well as the WaC corpora as reference corpora, using both the chi-square

and log-likelihood tests. Minimum KW frequency (5) and the p value (.0000000001)

were kept the same for all tests and corpora combinations. In a discussion of measures of

similarity between corpora, Baker (2010, pp. 91-93) suggests the use of frequency and

rank information and the Spearman rank correlation statistic as a way to assess the degree

of similarity between multiple corpora. In an adaptation of this technique to comparisons

between reference corpora, I used keyness scores rather than item frequency to assess

similarity. Thus, in order to compare the results of keyword analyses conducted using

SERBCOMP and WaC corpora, I compared the (keyness-based) ranks of the top 100

KWs resulting from keyword analysis with SERBCOMP as the reference corpus to the

ranks of the same KWs resulting from keyword analyses with the WaC corpora as

reference corpora. The correlation coefficients produced by the Spearman rank correlation

statistic (p < .01) were .62 for the largest but most heterogeneous corpus (srWaC14), .61

for the somewhat smaller but more coherent OPUS2, and .71 for the smallest and most

coherent SETIMES2. Since SETIMES2 is a corpus compiled from news articles

published in an online news outlet (The Southeast European Times, www.setimes.com)

and is thus the most similar to SERBCOMP of the three WaC corpora, this result is

expected.
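
The rank comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually run for the study: the function names (average_ranks, pearson, keyness_rank_correlation) and the toy keyness scores are hypothetical, and KWs that are missing from the second keyword list are simply skipped, a choice the text does not specify.

```python
from statistics import mean


def average_ranks(values):
    """Rank values in descending order (1 = highest), averaging tied ranks."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; applied to ranks it yields Spearman's rho."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def keyness_rank_correlation(keyness_a, keyness_b, top_n=100):
    """Spearman's rho between the keyness-based ranks of the top_n KWs
    in list A and the ranks of the same KWs in list B."""
    top = sorted(keyness_a, key=keyness_a.get, reverse=True)[:top_n]
    # Assumption: KWs absent from list B are dropped from the comparison.
    shared = [kw for kw in top if kw in keyness_b]
    ranks_a = average_ranks([keyness_a[kw] for kw in shared])
    ranks_b = average_ranks([keyness_b[kw] for kw in shared])
    return pearson(ranks_a, ranks_b)


# Invented keyness scores, for illustration only (not values from the study).
toy_a = {"jezik": 900.0, "knjiga": 500.0, "srpski": 300.0, "škola": 200.0, "pisac": 100.0}
toy_b = {"jezik": 800.0, "knjiga": 250.0, "srpski": 400.0, "škola": 150.0, "pisac": 90.0}
rho = keyness_rank_correlation(toy_a, toy_b)
```

With real data, keyness_a would hold the scores obtained with SERBCOMP as the reference corpus and keyness_b those obtained with one of the WaC wordlists.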

A careful analysis of the keyword lists produced by the different reference corpora

revealed that, despite their very different sizes (see Table B1), different reference corpora

yield comparable results regardless of the statistic chosen. This lends support to Xiao &

McEnery’s (2005) claim that the size of the reference corpus may not be important,2

particularly for keyword analyses of large research corpora. Similarly confirmed is

Culpeper’s (2009) finding that the chi-square and log-likelihood tests produce negligibly

different keyword lists, so log-likelihood was used in all subsequent analyses. The most

notable difference between the keyword lists thus produced was in the noise levels, which

can be defined as the proportion of function words and semantically generic lexical

material, such as prepositions (e.g., by) and time adverbs with context-specific

reference (e.g., yesterday), identified as key. Unsurprisingly, the keyword list

produced by SERBCOMP as the reference corpus exhibited the lowest noise level. An

additional difficulty in dealing with the WaC corpora as reference corpora was that

keyword analyses were based on comparisons of wordlists (rather than the corpora

themselves),3 which presented problems for lemmatization4 in WST. SERBCOMP as a

comparator corpus was thus determined to have two principal advantages. First, because

both SERBCORP and SERBCOMP comprise newspaper register, the resulting keyword

list is largely free from items that characterize newspaper language in general as well as

other items contributing to noise, and second, items identified as key can be expected to

be characteristic of language-related discourses. In other words, because SERBCORP

and SERBCOMP are very similar except in whether or not they contain texts mentioning

language, the KWs resulting from keyword analysis based on SERBCOMP as the

reference corpus are more likely to be characteristic of newspaper articles mentioning or

discussing language (rather than newspaper discourse in general) and so are more likely

to be useful for the identification of any discourses and ideologies pertaining to language

(cf. “irrelevant stylistic differences” in Culpeper, 2009 above). Based on these findings, a

decision was made to conduct all keyword analyses with SERBCOMP as the reference

corpus and to exclude the WaC corpora from further consideration.
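
The noise level as defined above is a simple proportion and could be computed along the following lines. This is a minimal sketch, not the tool used in the study: the function noise_level, the small noise list, and both keyword lists are hypothetical, and a real analysis would need a much fuller inventory of function words and generic items.

```python
def noise_level(keywords, noise_items):
    """Share of keywords that are function words or semantically generic items."""
    if not keywords:
        return 0.0
    noise = {item.lower() for item in noise_items}
    flagged = [kw for kw in keywords if kw.lower() in noise]
    return len(flagged) / len(keywords)


# Hypothetical noise list: a preposition (od 'from, by') and two time adverbs
# with context-specific reference (juče 'yesterday', danas 'today').
NOISE_ITEMS = {"od", "juče", "danas"}

# Invented keyword lists, for illustration only (not results from the study).
kws_with_newspaper_reference = ["jezik", "knjiga", "pisac", "danas"]
kws_with_web_reference = ["jezik", "od", "juče", "danas"]

low = noise_level(kws_with_newspaper_reference, NOISE_ITEMS)
high = noise_level(kws_with_web_reference, NOISE_ITEMS)
```

Comparing the two proportions across reference corpora gives a rough, list-dependent indication of which comparator yields the cleaner keyword list.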

Appendix C: Keyword Analysis (SERBCORP)

To get a discursive profile of the research corpus as a whole, SERBCORP was

compared to SERBCOMP (the lists of positive and negative KWs are presented in Tables

C1 and C2). This analysis identified a total of 111 positive and 77 negative key

lemmas.41 As expected, the top positive key lemma in SERBCORP is jezik ‘language’,

which simply confirms that SERBCORP is about language. The top ten positive key

lemmas also include knjiga ‘book’, srpski ‘Serbian’, književnost ‘literature’, engleski

‘English’, škola ‘school’, pisac ‘writer’, roman ‘novel’, pesnik ‘poet’, and kultura

‘culture’. This suggests that SERBCORP is predominantly about discussions of or

references to language in the contexts of literature, education, and culture, with a

prominent pair of identity-related items, Serbian and English, further suggesting a

pervasive discourse of national identity based on group membership (‘us’ and ‘them’).

Other top fifty positive key lemmas suggest similar semantic fields: ethnonyms and

glottonyms as well as other identity-related nouns and pronouns (naš ‘our’, Srbi ‘Serbs’,

narod ‘people’, istorija ‘history’, francuski ‘French’, ja ‘I’, svoj ‘own’, moj ‘my’, ona

‘she’, and Crna Gora ‘Montenegro’); education (e.g., profesor ‘professor’, obrazovanje

‘education’, deca ‘children’, prosvete ‘education [profession]’, znanje ‘knowledge’,

nauka ‘science’, program ‘curriculum’); literature and publishing (e.g., izdavač

‘publisher’, nagrada ‘award’, autor ‘author’, urednik ‘editor’); and theater and film (e.g.,

umetnost ‘art’, predstava ‘[theater] play’, film ‘film’, pozorišta ‘theater’). Most

pertinently for our purposes, there is a remarkable absence of references to other regional

ethnolinguistic identities (with the possible exception of the key lemma ime ‘name’),

which confirms that general language-related newspaper discourse may not be ideally

suited for the study of links between language-related discourses and language ideologies

and ethnolinguistic identities.

The top ten negative key lemmas include Srbija ‘Serbia’, vlada ‘government’,

evra ‘Euros’, miliona ‘millions’, odsto ‘percent’, dinara ‘Dinars’, zakon ‘law’, stranke

‘(political) parties’, protiv ‘against’, and predsednik ‘president’. This suggests that,

compared to general newspaper discourse (i.e., SERBCOMP), articles mentioning

language (SERBCORP) tend to include considerably fewer references to national

political and state institutions, as well as finances. Similar to the positive key lemmas

above, the remaining negative key lemmas confirm the relative absence of references to

national political and state institutions.
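
For readers who want to see where a keyness score of this kind comes from, the sketch below computes signed log-likelihood and chi-square values from a 2×2 contingency table. It is an illustration under stated assumptions rather than a description of the software used in the study: the function name keyness is hypothetical, the SERBCORP size of 11,656,247 words is assumed here (the SERBCOMP size is taken from Table B1), and the negative sign for underused items is a presentational convention. Under these assumptions, the log-likelihood values for knjiga and Srbija should come out close to those reported in Tables C1 and C2.

```python
import math


def keyness(freq_study, size_study, freq_ref, size_ref):
    """Return (log_likelihood, chi_square) for one item.

    Both scores are signed: positive if the item is relatively more frequent
    in the study corpus, negative if it is relatively more frequent in the
    reference corpus.
    """
    total = size_study + size_ref
    hits = freq_study + freq_ref
    # Observed 2x2 table: item vs. all other tokens, study vs. reference corpus.
    observed = [freq_study, freq_ref, size_study - freq_study, size_ref - freq_ref]
    # Expected counts if the item were equally frequent in both corpora.
    expected = [
        size_study * hits / total,
        size_ref * hits / total,
        size_study * (total - hits) / total,
        size_ref * (total - hits) / total,
    ]
    ll = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    sign = 1 if freq_study / size_study >= freq_ref / size_ref else -1
    return sign * ll, sign * chi2


SERBCORP_SIZE = 11_656_247  # assumed size of the research corpus
SERBCOMP_SIZE = 22_493_804  # from Table B1

# Frequencies taken from Tables C1 (knjiga) and C2 (Srbija).
ll_knjiga, chi2_knjiga = keyness(22712, SERBCORP_SIZE, 10369, SERBCOMP_SIZE)
ll_srbija, chi2_srbija = keyness(33742, SERBCORP_SIZE, 93159, SERBCOMP_SIZE)
```

Computing both statistics for the same items is one simple way to check how far the chi-square and log-likelihood rankings diverge for a given pair of corpora.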

Table C1

Positive Key Lemmas in SERBCORP (by Keyness Score)

N Keyword (English) Keyword (Serbian) Freq. % Texts RC. Freq. RC. % Keyness P
1 language jezik 33834 0.29 6517 1 72780.53 0.0000000000
2 book knjiga 22712 0.19 3535 10369 0.05 16361.50 0.0000000000
3 Serbian srpski 35745 0.31 3720 32565 0.14 9509.99 0.0000000000
4 literature književnost 6070 0.05 1280 1158 7657.32 0.0000000000
5 English engleski 5252 0.05 1361 722 7490.84 0.0000000000
6 school škola 12984 0.11 1362 7491 0.03 7281.08 0.0000000000
7 writer pisac 7477 0.06 1687 2633 0.01 6678.77 0.0000000000
8 novel roman 5908 0.05 1287 2166 5120.75 0.0000000000
9 poet pesnik 2907 0.02 857 600 3541.36 0.0000000000
10 culture kultura 9544 0.08 1172 8355 0.04 2762.23 0.0000000000
11 professor profesor 7511 0.06 1931 5952 0.03 2636.04 0.0000000000
12 our naš 32116 0.28 3284 43596 0.19 2242.57 0.0000000000
13 world svet 14916 0.13 2489 17017 0.08 2148.98 0.0000000000
14 Serbs Srbi 9874 0.08 1669 9918 0.04 2073.39 0.0000000000
15 life život 12633 0.11 3228 14019 0.06 1991.74 0.0000000000
16 people narod 8529 0.07 1667 8257 0.04 1966.10 0.0000000000
17 history istorija 6990 0.06 1015 6267 0.03 1922.79 0.0000000000
18 French francuski 2943 0.03 881 1488 1913.89 0.0000000000
19 publisher izdavač 2454 0.02 1020 1065 1850.34 0.0000000000
20 I ja 22907 0.20 3820 30315 0.13 1817.01 0.0000000000
21 art umetnost 5876 0.05 991 5011 0.02 1793.73 0.0000000000
22 award nagrada 5714 0.05 1037 4956 0.02 1685.40 0.0000000000
23 own svoj 43119 0.37 4169 64375 0.29 1673.43 0.0000000000
24 author autor 5520 0.05 1948 4725 0.02 1672.44 0.0000000000
25 word reč 15127 0.13 4482 18651 0.08 1638.93 0.0000000000
26 my moj 9172 0.08 1790 9866 0.04 1590.91 0.0000000000
27 education obrazovanje 3953 0.03 1001 2969 0.01 1522.34 0.0000000000
28 children deca 9665 0.08 1456 10786 0.05 1496.38 0.0000000000
29 education (profession) prosvete 1951 0.02 978 965 1297.91 0.0000000000
30 love ljubav 3050 0.03 978 2236 1222.29 0.0000000000
31 theater play predstava 4601 0.04 857 4259 0.02 1178.86 0.0000000000
32 work delo 3289 0.03 2119 2736 0.01 1054.13 0.0000000000
33 knowledge znanje 3355 0.03 858 2910 0.01 989.41 0.0000000000
34 science nauka 3731 0.03 1204 3468 0.02 946.90 0.0000000000
35 story priča 9748 0.08 3126 12432 0.06 916.19 0.0000000000
36 film film 7155 0.06 1105 8872 0.04 757.21 0.0000000000
37 doctor dr 5362 0.05 2203 6243 0.03 719.92 0.0000000000
38 program of study studija 2937 0.03 882 2775 0.01 717.56 0.0000000000
39 text tekst 5182 0.04 1413 5996 0.03 711.01 0.0000000000

40 wrote napisao 1844 0.02 1399 1451 655.18 0.0000000000
41 alphabet pismo 4667 0.04 865 5401 0.02 639.96 0.0000000000
42 Monte(negro) Crna 7474 0.06 824 9931 0.04 580.66 0.0000000000
43 (Monte)negro Gora 8250 0.07 812 11299 0.05 548.57 0.0000000000
44 one jedan 35933 0.31 7258 59267 0.26 545.43 0.0000000000
45 program/curriculum program 7711 0.07 1570 10476 0.05 535.17 0.0000000000
46 editor urednik 1600 0.01 1085 1307 530.83 0.0000000000
47 theater pozorišta 2067 0.02 988 2036 456.21 0.0000000000
48 she ona 12336 0.11 4913 18787 0.08 410.14 0.0000000000
49 name ime 6627 0.06 2440 9292 0.04 386.30 0.0000000000
50 here ovde 6153 0.05 3213 8589 0.04 368.01 0.0000000000
51 his njegov 20865 0.18 2728 34010 0.15 363.81 0.0000000000
52 part deo 14419 0.12 3981 22718 0.10 357.13 0.0000000000
53 experience iskustvo 3092 0.03 1135 3766 0.02 351.32 0.0000000000
54 picture slika 3168 0.03 1261 3970 0.02 320.85 0.0000000000
55 youth mladi 3415 0.03 1340 4380 0.02 312.91 0.0000000000
56 always uvek 7295 0.06 4195 10746 0.05 310.79 0.0000000000
57 born rođen 1174 0.01 949 1081 304.37 0.0000000000
58 many mnogi 7031 0.06 2371 10358 0.05 299.35 0.0000000000
59 her njen 7002 0.06 1551 10316 0.05 297.97 0.0000000000
60 live žive 2338 0.02 1729 2775 0.01 292.91 0.0000000000
61 woman žena 3900 0.03 1149 5254 0.02 282.68 0.0000000000
62 Belgrade (adj.) beogradski 4401 0.04 850 6097 0.03 274.72 0.0000000000
63 first prvi 21664 0.19 5387 36303 0.16 267.38 0.0000000000
64 today danas 10922 0.09 5569 17349 0.08 250.07 0.0000000000
65 man čovek 23017 0.20 2663 38926 0.17 249.37 0.0000000000
66 topic tema 4097 0.04 1184 5723 0.03 244.03 0.0000000000
67 father otac 1602 0.01 1017 1823 232.58 0.0000000000
68 age doba 1666 0.01 1277 1962 214.73 0.0000000000
69 war rat 6446 0.06 1078 9871 0.04 204.88 0.0000000000
70 death smrti 1762 0.02 1192 2165 193.38 0.0000000000
71 community zajednica 4381 0.04 868 6445 0.03 188.31 0.0000000000
72 common zajednički 2242 0.02 946 2937 0.01 186.44 0.0000000000
73 generation generacije 1096 889 1187 186.16 0.0000000000
74 sometimes ponekad 1462 0.01 1162 1751 177.12 0.0000000000
75 America Americi 1319 0.01 854 1557 168.56 0.0000000000
76 past prošlosti 1372 0.01 981 1651 163.28 0.0000000000
77 foreign strani 12907 0.11 1688 21659 0.10 156.14 0.0000000000
78 Yugoslavia Jugoslavije 2462 0.02 1531 3409 0.02 154.11 0.0000000000
79 person ličnosti 1474 0.01 1113 1835 153.41 0.0000000000
80 idea ideja 3168 0.03 1387 4585 0.02 151.94 0.0000000000
81 second drugi 30613 0.26 4548 54094 0.24 150.84 0.0000000000
82 abroad inostranstvu 1298 0.01 935 1605 138.88 0.0000000000
83 scene sceni 1723 0.01 1163 2276 0.01 137.69 0.0000000000
84 space prostor 5020 0.04 1297 7859 0.03 131.81 0.0000000000

85 this ovaj 49317 0.42 4897 89477 0.40 120.73 0.0000000000
86 every svaki 12309 0.11 2478 20959 0.09 120.22 0.0000000000
87 house kuća 1897 0.02 1386 2621 0.01 120.17 0.0000000000
88 Belgrade Beograd 19233 0.17 3386 33602 0.15 120.01 0.0000000000
89 you vi 7922 0.07 1306 13072 0.06 119.35 0.0000000000
90 voice/vote glas 1151 857 1439 117.70 0.0000000000
91 opinion mišljenje 3248 0.03 946 4940 0.02 109.05 0.0000000000
92 large veliki 18380 0.16 3411 32248 0.14 105.32 0.0000000000
93 world svetski 5387 0.05 814 8708 0.04 102.92 0.0000000000
94 right pravo 9921 0.09 2455 16852 0.07 100.55 0.0000000000
95 desire želja 1375 0.01 809 1893 88.83 0.0000000000
96 Serbia and Montenegro SCG 2523 0.02 1078 3843 0.02 83.71 0.0000000000
97 work rad 8687 0.07 2785 14832 0.07 81.27 0.0000000000
98 never nikad 5219 0.04 1290 8643 0.04 75.14 0.0000000000
99 most often najčešće 1445 0.01 1145 2067 74.65 0.0000000000
100 carry nosi 1048 935 1420 73.72 0.0000000000
101 society društvo 5464 0.05 1323 9127 0.04 70.33 0.0000000000
102 truth istina 2307 0.02 1148 3596 0.02 62.99 0.0000000000
103 city grad 4841 0.04 1385 8095 0.04 61.42 0.0000000000
104 find naći 3008 0.03 1266 4924 0.02 49.86 0.0000000000
105 conversation razgovor 1051 843 1536 47.25 0.0000000000
106 little mali 9949 0.09 1240 17645 0.08 45.01 0.0000000000
107 task zadatak 1458 0.01 866 2247 43.93 0.0000000000
108 decade decenije 1066 916 1588 41.91 0.0000000000
109 there tamo 4117 0.04 2543 6997 0.03 41.36 0.0000000000
110 emphasize ističe 1895 0.02 1472 3039 0.01 39.37 0.0000000000
111 all the time stalno 1776 0.02 1375 2851 0.01 36.52 0.0000000001

Table C2

Negative Key Lemmas in SERBCORP (by Keyness Score)

N Keyword (English) Keyword (Serbian) Freq. % Texts RC. Freq. RC. % Keyness P
1 Serbia Srbija 33742 0.29 2431 93159 0.41 -3359.86 0.0000000000
2 government vlada 6808 0.06 1877 26794 0.12 -3142.76 0.0000000000
3 Euros evra 3045 0.03 1325 15874 0.07 -3107.69 0.0000000000
4 millions miliona 3110 0.03 1511 15619 0.07 -2889.99 0.0000000000
5 percent odsto 5838 0.05 1923 22713 0.10 -2594.47 0.0000000000
6 Dinars dinara 2690 0.02 1060 11867 0.05 -1759.88 0.0000000000
7 law zakon 5689 0.05 1069 19899 0.09 -1733.10 0.0000000000
8 parties stranke 2288 0.02 1047 10165 0.05 -1527.36 0.0000000000
9 against protiv 5439 0.05 2939 18154 0.08 -1376.63 0.0000000000
10 president predsednik 7900 0.07 2829 23415 0.10 -1162.70 0.0000000000
11 authorities vlast 5740 0.05 1262 17482 0.08 -966.86 0.0000000000
12 money novac 3009 0.03 1163 10517 0.05 -913.74 0.0000000000
13 minister ministar 4453 0.04 1485 13981 0.06 -865.05 0.0000000000
14 decision odluka 4129 0.04 912 13130 0.06 -849.51 0.0000000000
15 citizens građani 4283 0.04 873 13444 0.06 -831.09 0.0000000000
16 EU EU 3393 0.03 875 10873 0.05 -722.24 0.0000000000
17 state država 10880 0.09 2251 26849 0.12 -484.33 0.0000000000
18 case slučaj 5020 0.04 1289 13753 0.06 -475.39 0.0000000000
19 public javnost 3582 0.03 938 10303 0.05 -449.71 0.0000000000
20 larger veći 4342 0.04 1194 11978 0.05 -428.86 0.0000000000
21 system sistem 3988 0.03 1152 10931 0.05 -378.75 0.0000000000
22 director direktor 4308 0.04 2047 11611 0.05 -368.04 0.0000000000
23 prime-minister premijera 1298 0.01 877 4429 0.02 -358.90 0.0000000000
24 former bivši 1163 896 4033 0.02 -342.56 0.0000000000
25 problem problem 8628 0.07 2392 20854 0.09 -318.88 0.0000000000
26 year godina 66508 0.57 7372 139140 0.62 -298.02 0.0000000000
27 time vreme 19337 0.17 6382 43233 0.19 -295.25 0.0000000000
28 solution rešenje 2675 0.02 1011 7465 0.03 -282.97 0.0000000000
29 day dan 13186 0.11 2168 30134 0.13 -268.22 0.0000000000
30 clearly jasno 2464 0.02 1852 6790 0.03 -241.75 0.0000000000
31 week nedelje 1401 0.01 1148 4240 0.02 -228.71 0.0000000000
32 development razvoj 3534 0.03 1185 9132 0.04 -226.23 0.0000000000
33 group grupa 5265 0.05 1301 12934 0.06 -225.25 0.0000000000
34 result rezultat 1052 875 3307 0.01 -205.43 0.0000000000
35 now sada 13005 0.11 6602 29006 0.13 -191.79 0.0000000000
36 publicly javno 1200 0.01 927 3603 0.02 -188.37 0.0000000000
37 last prošle 2693 0.02 2056 7041 0.03 -187.51 0.0000000000
38 expect očekuje 1334 0.01 1106 3828 0.02 -165.29 0.0000000000
39 parliament skupštine 1684 0.01 1024 4590 0.02 -154.44 0.0000000000

40 choice izbor 4185 0.04 1285 9934 0.04 -129.72 0.0000000000
41 political politički 10025 0.09 1651 22155 0.10 -129.10 0.0000000000
42 number broj 8152 0.07 3245 18272 0.08 -128.79 0.0000000000
43 nobody niko 3674 0.03 2526 8821 0.04 -127.41 0.0000000000
44 means sredstva 1190 0.01 871 3301 0.01 -121.47 0.0000000000
45 earlier ranije 2036 0.02 1659 5184 0.02 -116.69 0.0000000000
46 be able to moći 42024 0.36 1782 86394 0.38 -114.45 0.0000000000
47 process proces 2077 0.02 994 5254 0.02 -113.17 0.0000000000
48 momentarily trenutno 1688 0.01 1364 4354 0.02 -106.62 0.0000000000
49 account računa 1052 875 2903 0.01 -104.07 0.0000000000
50 moment trenutku 2289 0.02 1711 5648 0.03 -101.71 0.0000000000
51 affairs poslova 1634 0.01 1023 4199 0.02 -100.40 0.0000000000
52 now sad 5543 0.05 2973 12467 0.06 -91.76 0.0000000000
53 possible moguće 2430 0.02 1830 5834 0.03 -84.22 0.0000000000
54 five pet 4137 0.04 2844 9446 0.04 -83.18 0.0000000000
55 less manje 3841 0.03 2753 8824 0.04 -83.17 0.0000000000
56 help pomoći 1236 0.01 986 3173 0.01 -75.37 0.0000000000
57 reason razlog 3839 0.03 1096 8736 0.04 -74.00 0.0000000000
58 situation situacija 1300 0.01 1023 3281 0.01 -69.57 0.0000000000
59 state stanje 3319 0.03 1042 7590 0.03 -68.01 0.0000000000
60 six šest 2608 0.02 1993 6055 0.03 -63.82 0.0000000000
61 ministry ministarstvo 4059 0.03 1124 9085 0.04 -62.85 0.0000000000
62 institution institucija 3133 0.03 953 7145 0.03 -62.06 0.0000000000
63 question pitanje 12767 0.11 3667 26711 0.12 -57.01 0.0000000000
64 Kosovo Kosova 3086 0.03 1165 6961 0.03 -53.10 0.0000000000
65 say reći 13062 0.11 2110 27137 0.12 -48.48 0.0000000000
66 power/forces snage 1148 866 2813 0.01 -48.01 0.0000000000
67 goal cilj 2369 0.02 1461 5394 0.02 -45.99 0.0000000000
68 nothing ništa 4976 0.04 3212 10772 0.05 -45.60 0.0000000000
69 get dobiti 8347 0.07 888 17602 0.08 -45.05 0.0000000000
70 persons lica 1410 0.01 927 3352 0.01 -44.36 0.0000000000
71 plan plan 2555 0.02 861 5743 0.03 -41.93 0.0000000000
72 largest najveći 5070 0.04 1664 10897 0.05 -40.71 0.0000000000
73 bad loše 1050 877 2540 0.01 -39.08 0.0000000000
74 far daleko 1596 0.01 1298 3696 0.02 -37.94 0.0000000000
75 ten deset 2621 0.02 1946 5838 0.03 -37.88 0.0000000000
76 immediately odmah 2636 0.02 2028 5868 0.03 -37.78 0.0000000000
77 last poslednji 4936 0.04 1049 10577 0.05 -37.39 0.0000000000

Appendix D: Keyword Analysis (5+ Hits Section of SERBCORP)

Table D1

Positive Key Lemmas in the 5+ Hits Section of SERBCORP (by Keyness Score)

N Keyword (English) Keyword (Serbian) Freq. % Texts RC. Freq. RC. % Keyness P
1 language jezik 12530 1.12 1118 1 76538.41 0.0000000000
2 Serbian srpski 7309 0.65 670 32565 0.14 9779.72 0.0000000000
3 linguistic jezički 901 0.08 112 10 5387.05 0.0000000000
4 school škola 2593 0.23 237 7491 0.03 5050.31 0.0000000000
5 English engleski 1220 0.11 271 722 4949.45 0.0000000000
6 literature književnost 1305 0.12 269 1158 4667.73 0.0000000000
7 mother (adj.) maternji 611 0.05 197 1 3712.29 0.0000000000
8 book knjiga 2499 0.22 371 10369 0.05 3583.33 0.0000000000
9 dictionary rečnik 801 0.07 135 320 3576.38 0.0000000000
10 literary književni 1027 0.09 128 1288 3210.07 0.0000000000
11 Cyrillic ćirilica 590 0.05 74 188 2756.74 0.0000000000
12 learn učiti 825 0.07 70 1245 2369.50 0.0000000000
13 professor profesor 1524 0.14 307 5952 0.03 2313.27 0.0000000000
14 instruction nastava 710 0.06 107 1016 2091.33 0.0000000000
15 writer pisac 998 0.09 182 2633 0.01 2073.11 0.0000000000
16 grade razred 609 0.05 87 650 2033.88 0.0000000000
17 word reč 2633 0.24 502 18651 0.08 1941.47 0.0000000000
18 poetry poezija 543 0.05 63 526 1881.56 0.0000000000
19 alphabet pismo 1243 0.11 169 5401 0.02 1702.15 0.0000000000
20 translator prevodilac 395 0.04 102 232 1605.55 0.0000000000
21 people narod 1510 0.13 234 8257 0.04 1601.00 0.0000000000
22 education obrazovanje 893 0.08 187 2969 0.01 1558.60 0.0000000000
23 culture kultura 1485 0.13 230 8355 0.04 1519.44 0.0000000000
24 education (profession) prosvete 563 0.05 219 965 1516.59 0.0000000000
25 students (K-12) učenici 637 0.06 120 1397 1492.42 0.0000000000
26 linguist lingvista 260 0.02 81 12 1488.69 0.0000000000
27 translation prevod 478 0.04 111 629 1462.75 0.0000000000
28 Montenegrin crnogorski 696 0.06 106 2006 1357.09 0.0000000000
29 Serbo-Croatian srpskohrvatski 226 0.02 70 6 1323.38 0.0000000000
30 novel roman 678 0.06 122 2166 1221.90 0.0000000000
31 learning učenje 352 0.03 129 343 1217.01 0.0000000000
32 subject predmet 770 0.07 147 2938 0.01 1193.64 0.0000000000
33 poet pesnik 402 0.04 89 600 1160.62 0.0000000000
34 school (university) fakultet 995 0.09 89 5250 0.02 1101.39 0.0000000000
35 Serbs Srbi 1432 0.13 250 9918 0.04 1093.62 0.0000000000
36 Croatian hrvatski 684 0.06 163 2749 0.01 1010.51 0.0000000000

37 minority manjina 491 0.04 111 1320 1006.48 0.0000000000
38 speak govoriti 1764 0.16 90 14569 0.06 992.19 0.0000000000
39 use (n.) upotreba 585 0.05 72 2054 975.53 0.0000000000
40 national nacionalni 1361 0.12 88 9893 0.04 961.57 0.0000000000
41 French francuski 477 0.04 140 1488 875.85 0.0000000000
42 elementary osnovni 870 0.08 87 5077 0.02 849.01 0.0000000000
43 speech govor 487 0.04 111 1636 842.68 0.0000000000
44 students (K-8) đaci 262 0.02 110 326 821.57 0.0000000000
45 science nauka 668 0.06 189 3468 0.02 753.63 0.0000000000
46 Bosnian bosanski 266 0.02 64 432 736.64 0.0000000000
47 Bosniak bošnjački 223 0.02 70 254 725.60 0.0000000000
48 children deca 1294 0.12 220 10786 0.05 714.75 0.0000000000
49 wrote pisali 1012 0.09 76 7364 0.03 713.61 0.0000000000
50 edition izdanje 437 0.04 72 1700 665.47 0.0000000000
51 century vek 855 0.08 72 6062 0.03 628.97 0.0000000000
52 cultural kulturni 655 0.06 95 3896 0.02 623.19 0.0000000000
53 meaning značenje 262 0.02 79 578 611.55 0.0000000000
54 foreign strani 2004 0.18 265 21659 0.10 598.10 0.0000000000
55 knowledge znanje 542 0.05 123 2910 0.01 587.42 0.0000000000
56 history istorija 848 0.08 125 6267 0.03 582.63 0.0000000000
57 exam ispit 305 0.03 63 910 579.48 0.0000000000
58 doctor dr 843 0.08 314 6243 0.03 577.16 0.0000000000
59 expression izraz 310 0.03 103 999 554.76 0.0000000000
60 Montenegrins Crnogorci 213 0.02 62 397 548.47 0.0000000000
61 German nemački 460 0.04 123 2301 0.01 541.69 0.0000000000
62 class period čas 542 0.05 63 3156 0.01 530.37 0.0000000000
63 world svet 1624 0.15 235 17017 0.08 528.78 0.0000000000
64 Vuk (Karadžić) Vuk 482 0.04 103 2575 0.01 525.54 0.0000000000
65 Croats hrvati 315 0.03 117 1111 523.22 0.0000000000
66 SANU SANU 235 0.02 84 582 509.44 0.0000000000
67 Spanish španski 201 0.02 63 414 488.93 0.0000000000
68 identity identitet 331 0.03 94 1356 480.11 0.0000000000
69 academician akademik 210 0.02 74 558 433.97 0.0000000000
70 label naziv 442 0.04 119 2660 0.01 413.93 0.0000000000
71 scientific naučni 255 0.02 65 947 404.94 0.0000000000
72 our naš 3182 0.28 341 43596 0.19 392.44 0.0000000000
73 name ime 943 0.08 252 9292 0.04 360.34 0.0000000000
74 politics politika 1475 0.13 897 17075 0.08 355.97 0.0000000000
75 own svoj 4336 0.39 440 64375 0.29 343.75 0.0000000000
76 Monte(negro) Gora 1072 0.10 93 11299 0.05 343.32 0.0000000000
77 (Monte)negro Crna 971 0.09 86 9931 0.04 337.32 0.0000000000
78 second drugi 3666 0.33 534 54094 0.24 301.76 0.0000000000
79 my/mine moj 929 0.08 165 9866 0.04 291.30 0.0000000000
80 author autor 540 0.05 170 4725 0.02 270.30 0.0000000000
81 schooling školovanje 160 0.01 62 558 268.30 0.0000000000

82 published objavljen 273 0.02 89 1595 265.93 0.0000000000
83 Russian ruski 525 0.05 106 4619 0.02 259.79 0.0000000000
84 writing pisanje 172 0.02 75 708 248.32 0.0000000000
85 program/curriculum program 931 0.08 161 10476 0.05 245.99 0.0000000000
86 publisher izdavač 211 0.02 82 1065 245.91 0.0000000000
87 understand razumeti 287 0.03 62 1847 244.68 0.0000000000
88 literature literature 96 83 188 240.43 0.0000000000
89 spirit duh 275 0.02 64 1755 237.30 0.0000000000
90 I ja 2152 0.19 346 30315 0.13 230.32 0.0000000000
91 letters (a, b, c…) slova 100 72 227 229.30 0.0000000000
92 tradition tradicija 252 0.02 64 1559 227.21 0.0000000000
93 nation nacija 305 0.03 66 2155 225.56 0.0000000000
94 parents roditelji 314 0.03 106 2264 0.01 224.71 0.0000000000
95 text tekst 584 0.05 128 5996 0.03 200.77 0.0000000000
96 sentence rečenica 153 0.01 64 737 187.91 0.0000000000
97 today danas 1306 0.12 567 17349 0.08 185.88 0.0000000000
98 study studija 333 0.03 101 2775 0.01 183.93 0.0000000000
99 award nagrada 490 0.04 80 4956 0.02 175.20 0.0000000000
100 title naslov 246 0.02 63 1783 174.54 0.0000000000
101 lectures predavanja 100 72 394 150.48 0.0000000000
102 reality stvarnost 201 0.02 65 1413 149.85 0.0000000000
103 art umetnost 473 0.04 104 5011 0.02 149.29 0.0000000000
104 life život 1036 0.09 249 14019 0.06 135.40 0.0000000000
105 wrote napisao 194 0.02 141 1451 130.56 0.0000000000
106 one jedan 3595 0.32 684 59267 0.26 126.54 0.0000000000
107 work delo 288 0.03 168 2736 0.01 120.16 0.0000000000
108 Vojvodina Vojvodini 171 0.02 72 1316 109.51 0.0000000000
109 people's narodni 435 0.04 65 4981 0.02 108.71 0.0000000000
110 love ljubav 241 0.02 79 2236 106.18 0.0000000000
111 here ovde 664 0.06 306 8589 0.04 106.04 0.0000000000
112 many mnogi 768 0.07 238 10358 0.05 101.94 0.0000000000
113 self sebe 752 0.07 271 10149 0.05 99.50 0.0000000000
114 notion pojam 86 64 457 94.36 0.0000000000
115 Italian italijanski 85 67 452 93.18 0.0000000000
116 part deo 1475 0.13 378 22718 0.10 91.27 0.0000000000
117 interest interesovanje 160 0.01 80 1336 88.02 0.0000000000
118 example primer 571 0.05 342 7455 0.03 87.65 0.0000000000
119 special poseban 633 0.06 75 8470 0.04 87.14 0.0000000000
120 every svaki 1363 0.12 261 20959 0.09 85.31 0.0000000000
121 she ona 1227 0.11 449 18787 0.08 79.15 0.0000000000
122 form oblik 183 0.02 64 1734 76.81 0.0000000000
123 born rođen 131 0.01 77 1081 73.74 0.0000000000
124 often često 426 0.04 275 5427 0.02 72.44 0.0000000000
125 that onaj 1511 0.14 167 24063 0.11 72.35 0.0000000000
126 sense smisao 404 0.04 74 5093 0.02 71.65 0.0000000000

127 common zajednički 263 0.02 72 2937 0.01 71.12 0.0000000000
128 opinion mišljenje 392 0.04 109 4940 0.02 69.61 0.0000000000
129 your(s) vaš 215 0.02 68 2431 0.01 55.93 0.0000000000
130 age doba 183 0.02 131 1962 55.83 0.0000000000
131 both oba 149 0.01 121 1491 54.75 0.0000000000
132 introduction uvođenje 185 0.02 74 2005 54.71 0.0000000000
133 same isti 1104 0.10 171 17538 0.08 53.93 0.0000000000
134 live žive 235 0.02 167 2775 0.01 53.01 0.0000000000
135 creation stvaranje 187 0.02 65 2057 52.93 0.0000000000
136 generation generacije 125 0.01 98 1187 52.20 0.0000000000
137 phenomenon pojava 164 0.01 76 1759 49.98 0.0000000000
138 experience iskustvo 293 0.03 97 3766 0.02 48.04 0.0000000000
139 difference razlika 419 0.04 101 5836 0.03 47.50 0.0000000000
140 change menja 138 0.01 107 1433 46.01 0.0000000000
141 sometimes ponekad 159 0.01 121 1751 44.85 0.0000000000
142 story priča 794 0.07 237 12432 0.06 43.48 0.0000000000
143 newspaper novina 101 76 953 42.81 0.0000000000
144 community zajednica 446 0.04 95 6445 0.03 41.44 0.0000000000
145 their(s) njihov 1265 0.11 188 21010 0.09 41.29 0.0000000000
146 population stanovništva 156 0.01 91 1766 40.43 0.0000000000
147 they oni 5019 0.45 579 92128 0.41 38.69 0.0000000000
148 most often najčešće 173 0.02 127 2067 37.46 0.0000000000
149 first prvi 2077 0.19 480 36303 0.16 37.10 0.0000000000
150 past prošlosti 145 0.01 95 1651 36.89 0.0000000000
151 topic tema 395 0.04 107 5723 0.03 36.15 0.0000000001

Appendix E: Keyword Analysis (5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus)

Table E1

Positive Key Lemmas in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus (by Keyness Score)

N Keyword (English) Keyword (Serbian) Freq. % Texts RC. Freq. RC. % Keyness P
1 language jezik 12530 1.12 1118 21304 0.20 18516.54 0.0000000000
2 Serbian srpski 7309 0.65 670 28440 0.27 3799.77 0.0000000000
3 linguistic jezički 901 0.08 112 792 2043.82 0.0000000000
4 dictionary rečnik 801 0.07 135 783 1717.42 0.0000000000
5 school škola 2593 0.23 237 10392 0.10 1269.04 0.0000000000
6 mother (adj.) maternji 611 0.05 197 649 1249.68 0.0000000000
7 alphabet pismo 1243 0.11 169 3424 0.03 1108.27 0.0000000000
8 Cyrillic ćirilica 590 0.05 74 905 942.80 0.0000000000
9 word reč 2633 0.24 502 12494 0.12 879.24 0.0000000000
10 instruction nastava 710 0.06 107 1467 0.01 875.24 0.0000000000
11 learn učiti 825 0.07 70 2015 0.02 851.36 0.0000000000
12 Montenegrin crnogorski 696 0.06 106 1452 0.01 849.83 0.0000000000
13 English engleski 1220 0.11 271 4033 0.04 839.02 0.0000000000
14 linguist lingvista 260 0.02 81 96 823.13 0.0000000000
15 professor profesor 1524 0.14 307 5987 0.06 775.36 0.0000000000
16 literature književnost 1305 0.12 269 4765 0.05 760.37 0.0000000000
17 subject predmet 770 0.07 147 1997 0.02 740.22 0.0000000000
18 Croatian hrvatski 684 0.06 163 1623 0.02 729.29 0.0000000000
19 use (n.) upotreba 585 0.05 72 1249 0.01 697.87 0.0000000000
20 Serbo-Croatian srpskohrvatski 226 0.02 70 142 597.25 0.0000000000
21 education (profession) prosvete 563 0.05 219 1388 0.01 574.70 0.0000000000
22 education obrazovanje 893 0.08 187 3078 0.03 574.07 0.0000000000
23 grade razred 609 0.05 87 1672 0.02 545.16 0.0000000000
24 literary književni 1027 0.09 128 3959 0.04 541.68 0.0000000000
25 people narod 1510 0.13 234 7019 0.07 530.86 0.0000000000
26 Bosniak bošnjački 223 0.02 70 181 526.18 0.0000000000
27 learning učenje 352 0.03 129 627 497.70 0.0000000000
28 foreign strani 2004 0.18 265 10904 0.10 449.44 0.0000000000
29 Bosnian bosanski 266 0.02 64 383 445.70 0.0000000000
30 students (K-12) učenici 637 0.06 120 2163 0.02 419.59 0.0000000000

31 elementary osnovni 870 0.08 87 3686 0.03 378.99 0.0000000000
32 national nacionalni 1361 0.12 88 6963 0.07 369.42 0.0000000000
33 Vuk (Karadžić) Vuk 482 0.04 103 1540 0.01 349.22 0.0000000000
34 culture kultura 1485 0.13 230 8098 0.08 330.50 0.0000000000
35 speech govor 487 0.04 111 1685 0.02 311.04 0.0000000000
36 speak govoriti 1764 0.16 90 10282 0.10 309.94 0.0000000000
37 minority manjina 491 0.04 111 1758 0.02 295.94 0.0000000000
38 translator prevodilac 395 0.04 102 1249 0.01 290.68 0.0000000000
39 Croats hrvati 315 0.03 117 929 256.25 0.0000000000
40 science nauka 668 0.06 189 3063 0.03 242.74 0.0000000000
41 Serbs Srbi 1432 0.13 250 8442 0.08 240.77 0.0000000000
42 translation prevod 478 0.04 111 1998 0.02 214.29 0.0000000000
43 class period čas 542 0.05 63 2441 0.02 205.62 0.0000000000
44 Montenegrins Crnogorci 213 0.02 62 564 199.56 0.0000000000
45 doctor dr 843 0.08 314 4519 0.04 198.29 0.0000000000
46 second drugi 3666 0.33 534 26947 0.26 187.05 0.0000000000
47 expression izraz 310 0.03 103 1128 0.01 181.64 0.0000000000
48 meaning značenje 262 0.02 79 867 179.80 0.0000000000
49 school (university) fakultet 995 0.09 89 5774 0.05 177.70 0.0000000000
50 label naziv 442 0.04 119 1995 0.02 166.80 0.0000000000
51 wrote pisali 1012 0.09 76 6015 0.06 164.69 0.0000000000
52 German nemački 460 0.04 123 2186 0.02 152.83 0.0000000000
53 Spanish španski 201 0.02 63 630 149.85 0.0000000000
54 exam ispit 305 0.03 63 1222 0.01 149.17 0.0000000000
55 SANU SANU 235 0.02 84 827 145.87 0.0000000000
56 name ime 943 0.08 252 5684 0.05 144.97 0.0000000000
57 identity identitet 331 0.03 94 1400 0.01 144.68 0.0000000000
58 children deca 1294 0.12 220 8371 0.08 144.52 0.0000000000
59 knowledge znanje 542 0.05 123 2813 0.03 140.91 0.0000000000
60 scientific naučni 255 0.02 65 1005 128.83 0.0000000000
61 French francuski 477 0.04 140 2466 0.02 125.47 0.0000000000
62 Russian ruski 525 0.05 106 2827 0.03 121.69 0.0000000000
63 poetry poezija 543 0.05 63 2988 0.03 117.18 0.0000000000
64 introduction uvođenje 185 0.02 74 645 116.65 0.0000000000
65 percent odsto 816 0.07 193 5022 0.05 114.84 0.0000000000
66 writer pisac 998 0.09 182 6480 0.06 109.40 0.0000000000
67 Monte(negro) Gora 1072 0.10 93 7178 0.07 99.97 0.0000000000
68 understand razumeti 287 0.03 62 1339 0.01 99.90 0.0000000000
69 be able to moći 4619 0.41 186 37405 0.35 90.76 0.0000000000
70 (Monte)negro Crna 971 0.09 86 6503 0.06 90.45 0.0000000000
71 academician akademik 210 0.02 74 900 89.19 0.0000000000
72 cultural kulturni 655 0.06 95 4147 0.04 81.12 0.0000000000
73 example primer 571 0.05 342 3576 0.03 74.36 0.0000000000
74 nation nacija 305 0.03 66 1622 0.02 73.55 0.0000000000
75 Vojvodina Vojvodini 171 0.02 72 740 71.08 0.0000000000

76 sentence rečenica 153 0.01 64 640 68.47 0.0000000000
77 today danas 1306 0.12 567 9616 0.09 65.65 0.0000000000
78 letters (a, b, c…) slova 100 72 344 64.47 0.0000000000
79 students (K-8) đaci 262 0.02 110 1426 0.01 58.63 0.0000000000
80 same isti 1104 0.10 171 8156 0.08 54.05 0.0000000000
81 poet pesnik 402 0.04 89 2505 0.02 53.54 0.0000000000
82 schooling školovanje 160 0.01 62 769 51.61 0.0000000000
83 program/curriculum program 931 0.08 161 6780 0.06 50.85 0.0000000000
84 difference razlika 419 0.04 101 2665 0.03 50.77 0.0000000000
85 book knjiga 2499 0.22 371 20214 0.19 49.74 0.0000000000
86 century vek 855 0.08 72 6192 0.06 48.64 0.0000000000
87 parents roditelji 314 0.03 106 1893 0.02 48.21 0.0000000000
88 history istorija 848 0.08 125 6142 0.06 48.20 0.0000000000
89 change (v.) menja 138 0.01 107 674 42.65 0.0000000000
90 law zakon 687 0.06 112 5002 0.05 37.58 0.0000000000

Appendix F: Collocation Analysis (SERBCORP)

Collocation analysis of the lemma JEZIK conducted on SERBCORP as a whole produced a
total of 368 lemma collocates (Tables F1 and F2). Table F1 shows the lemma collocates by
frequency. Unsurprisingly for a corpus of Serbian, the most frequent lemma collocate of
JEZIK is srpski ‘Serbian’ with 8,063 occurrences in 4,486 texts. Equally predictably, the
second most frequent lemma collocate is engleski ‘English’ (3,167 occurrences in 2,500
texts), which, in Serbia as in many other countries around the world, is seen as the most
important foreign language (followed here by French, German, Russian, and Spanish). The
rest of the ten most frequent lemma collocates of JEZIK are strani ‘foreign’, govoriti
‘speak’, svoj ‘own’, maternji ‘mother (tongue)’, naš ‘our’, književnost ‘literature’, svi
‘all’, and francuski ‘French’. As with keyword analysis of SERBCORP above, the most
frequent collocates of JEZIK suggest a pervasive discourse of national identity based on
the routinized construction of in- and out-groups (Serbian, own, mother tongue, and our
vs. English, foreign, and French). Further, similar to the results of keyword analysis, the
remainder of the top 50 collocates indicates semantic fields of education (učiti ‘learn’,
škola ‘school’, profesor ‘professor’, učenje ‘learning’, nastava ‘instruction’, znanje
‘knowledge’, fakultet ‘school [university]’, and matematika ‘mathematics’), literature and
translation (književnost ‘literature’, preveden ‘translated’, književni ‘literary’, prevod
‘translation’, and prevoditi ‘translate’), and culture (kultura ‘culture’).

Interestingly, however, collocates seem to be more sensitive to identity-related
discourses than keywords even at the level of SERBCORP. Thus, we see hrvatski
‘Croatian’, zajednički ‘common’, nacionalni ‘national’, crnogorski ‘Montenegrin’, narod
‘people’, istorija ‘history’, postoji ‘exists’, and ime ‘name’, hinting at the discourse of
contestation mentioned above. The presence of identity-related discourses is further
suggested by the collocate manjina ‘minority’ as well as the glottonyms referring to the
two largest minority groups in Serbia, albanski ‘Albanian’ and mađarski ‘Hungarian’.

The top ten most significant collocates (see Table F2), on the other hand, exhibit a
more opaque pattern. Zli ‘evil’ refers to the common metonymy zli jezici ‘evil tongues’
(or, rather, ‘malicious tongues’). We also see an eclectic mix of items such as the verb
izučavati ‘to study’, the attributive adjective razumljiv ‘comprehensible’, the plural noun
brojki ‘numerals’, the singular noun geografija ‘geography’, and the glottonym švedski
‘Swedish’. More interestingly, the most significant collocates of the lemma JEZIK in
SERBCORP also include službeni and zvaničan, both meaning ‘official’, as well as hrvatski
‘Croatian’ and južnoslovenski ‘South-Slavic’. The remainder of the top 50 most
significant collocates is similarly opaque in terms of patterns, but we do see a number of
potentially interesting items such as manjinski ‘minority (adj.)’, različit ‘different’,
ravnopravan ‘equal’, zajednički ‘common’, maternji ‘mother (tongue)’, jedinstvo ‘unity’,
preimenovanje ‘renaming’, standardizacija ‘standardization’, etnički ‘ethnic’, and uvesti
‘introduce’, as well as bosanski ‘Bosnian’, jugoslovenske ‘Yugoslav’, and a rare reference
to bilingualism, dvojezično ‘bilingual’. Finally, one entirely new pattern suggests the
presence of a discourse of linguistic transparency and perhaps purity/authenticity with the
attributive adjectives jednostavnim ‘simple’, jasnim ‘clear’, and čisti ‘pure’. In sum, then,
the top collocates of the lemma JEZIK in SERBCORP seem to show patterns similar to
those exhibited by key lemmas, while frequency seems to offer a better insight into the
discursive profile than statistical significance.
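
The gap between the frequency ranking and the MI ranking can be illustrated with a small calculation. The sketch below assumes the common pointwise definition of mutual information, log2 of observed over expected co-occurrence, with no span adjustment; this appendix does not state the exact formula used by the concordancing software, so the function `mi_score` and all counts in the example are hypothetical and serve only to show why a rare but exclusive collocate can outrank a frequent one.

```python
import math


def mi_score(observed: int, node_freq: int, collocate_freq: int, corpus_size: int) -> float:
    """Pointwise mutual information (assumed formula, no span adjustment).

    expected = how often node and collocate would co-occur by chance,
    given their overall frequencies in a corpus of `corpus_size` tokens.
    """
    expected = node_freq * collocate_freq / corpus_size
    return math.log2(observed / expected)


# Hypothetical counts, not taken from SERBCORP.
CORPUS_SIZE = 1_000_000
NODE_FREQ = 10_000  # occurrences of the node lemma

# A very frequent collocate that also occurs widely elsewhere in the corpus.
frequent = mi_score(observed=2_000, node_freq=NODE_FREQ,
                    collocate_freq=50_000, corpus_size=CORPUS_SIZE)

# A rare collocate that occurs almost only next to the node.
rare = mi_score(observed=60, node_freq=NODE_FREQ,
                collocate_freq=70, corpus_size=CORPUS_SIZE)

print(f"frequent collocate: MI = {frequent:.2f}")
print(f"rare collocate:     MI = {rare:.2f}")
```

Under these assumptions the rare collocate receives the higher score even though it co-occurs with the node far less often, which is consistent with low-frequency items such as zli or švedski appearing near the top of Table F2.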

Table F1

Lemma Collocates of the Lemma JEZIK ‘Language’ in SERBCORP (by Frequency)

N Collocate (English) Collocate (Serbian) MI score Texts Total


1 Serbian srpski 8.42 4486 8063
2 English engleski 9.84 2500 3167
3 foreign strani 5.94 1448 2293
4 speak govoriti 5.82 1609 1977
5 own svoj 5.67 1080 1442
6 mother (adj.) maternji 10.12 801 1265
7 our naš 5.90 1026 1262
8 literature književnost 7.64 761 1092
9 all svi 5.08 902 1068
10 French francuski 8.63 857 1052
11 second drugi 5.21 781 1022
12 German nemački 8.52 739 877
13 culture kultura 6.54 676 818
14 alphabet pismo 7.54 435 785
15 Russian ruski 7.89 518 717
16 learn učiti 6.41 519 714
17 translated preveden 7.57 660 696
18 school škola 5.85 501 673
19 word reč 6.02 543 644
20 professor profesor 6.52 503 619
21 Croatian hrvatski 12.53 354 593
22 use upotreba 8.35 366 538
23 literary književni 5.15 308 514
24 common zajednički 10.21 465 506
25 official službeni 13.30 276 493
26 national nacionalni 6.02 356 488
27 two dva 5.02 381 484
28 say kazati 6.78 436 478
29 learning učenje 8.79 328 471
30 instruction nastava 7.53 312 464
31 Montenegrin crnogorski 8.34 196 459
32 Albanian albanski 9.26 374 455
33 people narod 5.89 340 455
34 knowledge znanje 7.23 391 453
35 Hungarian mađarski 9.47 319 421
36 translation prevod 5.84 369 413
37 translate prevoditi 7.96 359 404
38 same isti 5.59 321 402
39 wrote pisali 6.44 337 402
40 their njihov 7.41 328 374
41 history istorija 5.88 315 372
42 school (university) fakultet 5.85 298 351
43 man čovek 6.30 316 347
44 exist postojati 5.64 278 328
45 Spanish španski 8.77 253 326
46 mathematics matematika 8.40 219 323
47 world (adj.) svetski 6.06 301 323
48 name ime 5.74 198 320
49 minority manjina 7.27 209 312
50 good dobar 5.80 284 301
51 subject predmet 5.21 207 297
52 Serbo-Croatian srpskohrvatski 8.07 196 287
53 people's narodni 6.37 194 278
54 published objavljen 6.84 252 274
55 Roma (adj.) romski 7.17 146 273
56 dictionary rečnik 7.54 171 265
57 three tri 5.05 217 259
58 Bosnian bosanski 9.23 129 257
59 Cyrillic ćirilica 5.88 173 256
60 contemporary savremeni 7.10 213 256
61 label naziv 6.86 179 255
62 learn naučiti 5.76 229 252
63 several nekoliko 5.11 227 238

64 Serbs Srbi 6.95 176 234
65 knowledge poznavanje 8.85 208 231
66 poem pesma 5.94 205 229
67 course kurs 8.11 184 226
68 different različit 10.44 192 222
69 many mnogi 8.77 204 220
70 Greek grčki 7.80 173 219
71 official zvaničan 11.18 166 218
72 must morati 6.70 195 207
73 edition izdanje 6.43 187 205
74 mean (v.) značiti 5.81 194 204
75 class period čas 6.20 164 197
76 use (v.) koristiti 9.73 172 197
77 translator prevodilac 5.37 178 197
78 understand razumeti 5.34 178 191
79 student (university) student 5.52 157 189
80 media mediji 6.22 161 187
81 poetry poezija 6.31 152 184
82 science nauka 5.76 146 182
83 text tekst 8.81 162 181
84 number broj 6.84 152 179
85 special poseban 5.51 152 177
86 education obrazovanje 5.63 156 176
87 four četiri 5.49 154 174
88 department katedra 6.66 130 174
89 European evropski 8.57 149 173
90 Italian italijanski 6.81 156 168
91 Slavic slovenski 8.27 114 168
92 others ostali 5.16 148 164
93 Romanian rumunski 7.08 126 160
94 state država 7.67 135 159
95 hair dlake 10.56 151 158
96 speech govor 6.31 127 155
97 instructor nastavnik 6.63 113 154
98 Bosniak bošnjački 7.02 87 152
99 exam ispit 6.79 117 152
100 introduction uvođenje 7.65 112 150
101 group grupa 8.61 120 148
102 textbook udžbenik 6.32 122 146
103 Latin latinski 8.81 115 143
104 written pisan 7.93 135 143
105 teach predavati 10.71 112 143
106 philological filološki 7.80 133 142
107 publish objaviti 5.39 136 141
108 novel roman 6.95 129 140
109 best najbolji 8.65 130 136
110 standardization standardizacija 9.41 69 133
111 five pet 5.15 123 132
112 said rečeno 9.00 125 129
113 music muzika 5.48 118 127
114 Vuk (Karadžić) Vuk 6.58 100 127
115 orthography pravopis 8.68 85 125
116 law zakon 5.05 100 124
117 faith vera 6.51 108 122
118 Slovene slovenački 5.97 112 121
119 program of study studija 5.52 105 121
120 written napisan 9.66 113 119
121 renaming preimenovanje 9.76 66 117
122 life život 5.60 111 117
123 university univerzitet 5.04 108 116
124 Bulgarian bugarski 8.36 83 115
125 serve služiti 9.32 105 115
126 represent predstavljati 5.07 106 114
127 film (adj.) filmski 5.48 101 112
128 customs običaji 8.21 107 112
129 area oblast 5.71 100 111
130 SANU SANU 6.86 77 111
131 hatred mržnje 7.80 91 109
132 call zvati 8.19 81 109

133 board odbor 5.60 65 107
134 various razni 9.14 99 107
135 dialect dijalekat 7.12 73 106
136 communication komunikacija 5.02 89 106
137 students (K-12) učenici 5.39 89 106
138 protection zaštita 5.50 94 106
139 Japanese japanski 8.28 79 105
140 Slovak slovački 7.43 87 105
141 grammar gramatika 8.65 82 104
142 think misliti 7.37 97 104
143 self sebe 7.59 92 101
144 Arabic arapski 9.31 86 100
145 church crkva 6.89 86 99
146 excellently odlično 5.82 96 98
147 poet pesnik 8.67 90 98
148 know poznavati 9.47 88 95
149 six šest 5.34 89 95
150 doctor dr 6.52 86 94
151 Chinese kineski 8.02 76 94
152 compulsory obavezan 6.78 78 93
153 level nivo 5.25 78 92
154 standard standardni 7.63 55 92
155 desire (v.) želeti 5.09 84 92
156 literature literatura 6.71 84 88
157 expression izraz 6.04 74 85
158 title naslov 9.11 70 85
159 introduce uvesti 8.97 70 85
160 territory prostor 7.66 80 84
161 attend pohađati 7.42 66 83
162 poetic pesnički 5.89 62 82
163 Croats Hrvati 6.16 52 80
164 department odsek 7.77 63 80
165 high school gimnazija 5.78 72 79
166 need (n.) potreba 5.03 71 79
167 framework okvir 5.03 66 76
168 style stil 6.81 71 76
169 so-called takozvani 6.05 59 75
170 instructional nastavni 5.54 55 73
171 translation prevođenje 11.04 64 73
172 belong pripadati 6.18 60 73
173 evil zli 13.67 68 73
174 against protiv 8.25 64 72
175 spoken govorni 8.62 56 71
176 Turkish turski 5.95 60 71
177 minority manjinski 11.05 48 69
178 both oba 6.77 58 69
179 Latin (alphabet) latinica 6.82 51 68
180 Polish poljski 8.55 62 68
181 needed potreban 5.33 64 67
182 exclusively isključivo 6.37 63 66
183 everyday svakodnevni 6.21 62 66
184 study (v.) izučavati 12.16 53 65
185 paper list 5.51 64 65
186 Macedonian makedonskom 9.40 58 64
187 Ruthenian rusinski 7.87 42 64
188 read čitati 6.05 59 63
189 ordinary običan 5.66 62 63
190 preservation očuvanje 7.34 59 63
191 comprehensible razumljiv 11.12 62 63
192 study (v.) studirati 5.10 53 63
193 fluently tečno 7.64 63 63
194 third treći 5.14 53 62
195 simultaneously istovremeno 5.02 57 61
196 classical klasični 7.27 40 61
197 appear pojaviti 6.24 58 61
198 defense odbrana 5.05 44 60
199 geography geografija 11.13 52 59
200 students (K-8) đaci 5.61 55 58
201 beautiful lep 5.58 53 58

202 listen slušati 7.12 50 58
203 test test 6.99 44 58
204 nature priroda 5.16 54 57
205 high (school) srednji 5.50 53 56
206 Cyrillic (adj.) ćirilično 7.12 47 55
207 hear čuti 10.20 54 55
208 unique jedinstven 7.26 35 54
209 novelty novina 7.30 51 54
210 create stvarati 5.62 43 53
211 association udruženje 5.01 44 53
212 public javni 5.18 42 52
213 writing pisanje 5.68 46 52
214 computer računaru 8.93 50 52
215 element element 5.55 35 51
216 informing informisanje 5.49 46 51
217 necessary neophodan 5.30 46 51
218 again ponovo 6.46 48 50
219 follow pratiti 5.49 49 50
220 make praviti 7.53 45 50
221 picture slika 7.78 47 50
222 hundred sto 6.75 45 50
223 help pomoć 6.60 45 49
224 printed štampan 7.60 48 49
225 scientific naučni 7.64 43 48
226 significance značaj 5.33 44 48
227 magazine/journal časopis 6.01 39 47
228 abroad (n.) inostranstvu 5.78 41 47
229 interest interesovanje 5.42 45 47
230 Karadžić Karadžić 5.63 45 47
231 private privatni 9.29 33 47
232 environment sredina 5.01 34 47
233 election (adj.) izborni 5.67 32 46
234 universal univerzalni 9.34 42 46
235 pure čisti 9.80 42 45
236 Priština (adj.) prištinski 9.43 45 45
237 Roma Roma 6.25 20 45
238 influence uticaj 5.57 35 45
239 religion religija 8.77 41 44
240 schooling školovanje 5.72 42 44
241 teachers učitelji 6.24 35 44
242 readers čitaoci 5.16 43 43
243 philosophy filozofija 7.76 30 43
244 institute matica 5.47 38 43
245 show pokazivati 5.06 40 43
246 structure struktura 5.51 34 43
247 task zadatak 5.03 35 43
248 keep držati 5.14 40 42
249 comes out izlazi 6.87 41 42
250 persons lica 5.05 21 42
251 linguistic lingvistički 8.26 31 42
252 form oblik 5.16 35 42
253 syntax sintaksa 8.66 29 42
254 stand stajati 6.85 38 42
255 easier lakše 5.44 40 41
256 editor lektor 6.82 37 41
257 none nijedan 5.74 39 41
258 lectures predavanja 6.19 38 41
259 differentiate razlikovati 5.86 36 41
260 symbol simbola 7.04 35 41
261 variant varijanta 6.75 33 41
262 future (adj.) budući 5.33 38 40
263 rule pravilo 8.03 37 40
264 similar sličan 8.60 35 40
265 written ispisan 10.21 34 39
266 linguistics lingvistika 5.90 31 39
267 mother (n.) majka 6.91 37 39
268 Nikšić Nikšić 5.67 32 39
269 area područje 5.45 36 39
270 take (exams) polagati 5.40 32 39

271 existence postojanje 6.26 31 39
272 including uključujući 5.73 37 39
273 speaking govoreći 8.48 38 38
274 violence nasilje 7.41 33 38
275 news vesti 6.24 34 38
276 Ijekavian ijekavski 7.02 29 37
277 call nazivati 6.82 32 37
278 organize organizovati 5.70 33 37
279 grade book dnevnik 6.78 36 36
280 rename preimenovati 8.24 24 36
281 reality stvarnost 6.96 35 36
282 computer science informatike 8.14 30 35
283 simple jednostavnim 11.02 34 35
284 encompass obuhvata 7.07 23 35
285 momentarily trenutno 6.76 33 35
286 thirty trideset 5.46 34 35
287 Ukrainian ukrajinski 8.97 30 35
288 purity čistota 8.59 26 34
289 reads glasi 6.79 32 34
290 linguist lingvista 6.73 33 34
291 first-graders prvaci 5.97 29 34
292 time slot termin 6.10 25 34
293 numerals brojki 11.62 32 33
294 use (n.) korišćenje 6.63 32 33
295 local lokalni 5.70 29 33
296 understood podrazumeva 5.04 33 33
297 research istraživanja 8.49 29 32
298 accent izgovor 5.52 27 32
299 adequate odgovarajući 5.85 29 32
300 sings peva 6.14 32 32
301 sentence rečenica 5.94 31 32
302 regional regionalni 10.31 24 32
303 study (n.) izučavanje 8.44 29 31
304 performs izvodi 7.26 30 31
305 minister ministar 5.19 29 31
306 enable omogućavati 5.52 27 31
307 jargon žargon 5.88 24 31
308 broadcast emituje 6.96 27 30
309 first najpre 8.26 28 30
310 paper papiru 7.80 30 30
311 master (v.) savladati 6.20 28 30
312 letter (a, b, c…) slovo 7.21 25 30
313 governing vladaju 7.32 27 30
314 rich bogat 5.56 28 29
315 diploma diploma 6.01 27 29
316 continue nastaviti 5.48 29 29
317 suits (v.) odgovara 7.05 27 29
318 discussion rasprava 5.29 25 29
319 equal ravnopravan 10.34 24 29
320 preserve sačuvati 5.25 26 29
321 population stanovništva 7.16 28 29
322 Subotica Subotica 6.14 24 29
323 love (v.) volim 7.39 29 29
324 additional dodatni 5.41 27 28
325 bilingual dvojezično 9.00 28 28
326 offer (v.) nuditi 8.14 28 28
327 training obuku 9.65 26 28
328 count (n.) računa 7.12 25 28
329 walls zidova 6.49 22 28
330 optional fakultativni 8.59 24 27
331 South-Slavic južnoslovenski 11.88 22 27
332 phenomenon pojava 5.15 27 27
333 defend braniti 5.60 22 26
334 ethnic etnički 9.40 24 26
335 Hebrew hebrejskom 9.39 25 26
336 this (way) ovako 6.66 22 26
337 Swedish švedski 11.43 22 26
338 body tela 5.05 22 26
339 show (n.) emisije 5.79 24 25

340 voice glas 5.05 25 25
341 past prošlosti 6.57 21 25
342 understanding razumevanje 5.94 22 25
343 choose birati 5.55 21 24
344 document dokumenta 5.97 23 24
345 Yugoslavia Jugoslavije 7.09 23 24
346 find pronađu 5.12 24 24
347 across širom 8.17 24 24
348 perfecting usavršavanje 6.57 24 24
349 connoisseur znalac 8.38 23 24
350 Czech češki 6.38 20 23
351 twenty dvadeset 5.12 23 23
352 clear jasnim 10.26 23 23
353 unity jedinstvo 9.91 21 23
354 ignorance nepoznavanje 8.57 23 23
355 try (v.) pokuša(va)ti 6.68 22 23
356 little pomalo 7.06 23 23
357 declare izjasniti 5.04 20 22
358 Yugoslav jugoslovenske 8.94 21 22
359 dialect narečja 7.30 20 22
360 connoisseur poznavalac 7.27 22 22
361 whole (n.) celina 5.12 20 21
362 contribution doprinos 8.69 21 21
363 about twenty dvadesetak 5.27 21 21
364 thousands hiljada 5.88 21 21
365 learn about upoznaju 6.09 20 21
366 works delima 5.16 20 20
367 notion pojam 5.94 20 20
368 taking (exams) polaganje 6.76 20 20

Table F2

Lemma Collocates of the Lemma JEZIK ‘Language’ in SERBCORP (by MI Score)

N Collocate (English) Collocate (Serbian) MI score Texts Total


1 evil zli 13.67 68 73
2 official službeni 13.30 276 493
3 Croatian hrvatski 12.53 354 593
4 study (v.) izučavati 12.16 53 65
5 South-Slavic južnoslovenski 11.88 22 27
6 numerals brojki 11.62 32 33
7 Swedish švedski 11.43 22 26
8 official zvaničan 11.18 166 218
9 geography geografija 11.13 52 59
10 comprehensible razumljiv 11.12 62 63
11 minority manjinski 11.05 48 69
12 translation prevođenje 11.04 64 73
13 simple jednostavnim 11.02 34 35
14 teach predavati 10.71 112 143
15 hair dlake 10.56 151 158
16 different različit 10.44 192 222
17 equal ravnopravan 10.34 24 29
18 regional regionalni 10.31 24 32
19 clear jasnim 10.26 23 23
20 written ispisan 10.21 34 39
21 common zajednički 10.21 465 506
22 hear čuti 10.20 54 55
23 mother (adj.) maternji 10.12 801 1265
24 unity jedinstvo 9.91 21 23
25 English engleski 9.84 2500 3167
26 pure čisti 9.80 42 45
27 renaming preimenovanje 9.76 66 117
28 use (v.) koristiti 9.73 172 197
29 written napisan 9.66 113 119
30 training obuku 9.65 26 28
31 Hungarian mađarski 9.47 319 421
32 know poznavati 9.47 88 95
33 Priština (adj.) prištinski 9.43 45 45
34 standardization standardizacija 9.41 69 133
35 ethnic etnički 9.40 24 26
36 Macedonian makedonskom 9.40 58 64
37 Hebrew hebrejskom 9.39 25 26
38 universal univerzalni 9.34 42 46
39 serve služiti 9.32 105 115
40 Arabic arapski 9.31 86 100
41 private privatni 9.29 33 47
42 Albanian albanski 9.26 374 455
43 Bosnian bosanski 9.23 129 257
44 various razni 9.14 99 107
45 title naslov 9.11 70 85
46 bilingual dvojezično 9.00 28 28
47 said rečeno 9.00 125 129
48 introduce uvesti 8.97 70 85
49 Ukrainian ukrajinski 8.97 30 35
50 Yugoslav jugoslovenske 8.94 21 22
51 computer računaru 8.93 50 52
52 knowledge poznavanje 8.85 208 231
53 Latin latinski 8.81 115 143
54 text tekst 8.81 162 181
55 learning učenje 8.79 328 471
56 religion religija 8.77 41 44
57 many mnogi 8.77 204 220
58 Spanish španski 8.77 253 326
59 contribution doprinos 8.69 21 21
60 orthography pravopis 8.68 85 125
61 poet pesnik 8.67 90 98
62 syntax sintaksa 8.66 29 42
63 grammar gramatika 8.65 82 104

64 best najbolji 8.65 130 136
65 French francuski 8.63 857 1052
66 spoken govorni 8.62 56 71
67 group grupa 8.61 120 148
68 similar sličan 8.60 35 40
69 purity čistota 8.59 26 34
70 optional fakultativni 8.59 24 27
71 European evropski 8.57 149 173
72 ignorance nepoznavanje 8.57 23 23
73 Polish poljski 8.55 62 68
74 German nemački 8.52 739 877
75 research istraživanja 8.49 29 32
76 speaking govoreći 8.48 38 38
77 study (n.) izučavanje 8.44 29 31
78 Serbian srpski 8.42 4486 8063
79 mathematics matematika 8.40 219 323
80 connoisseur znalac 8.38 23 24
81 Bulgarian bugarski 8.36 83 115
82 use upotreba 8.35 366 538
83 Montenegrin crnogorski 8.34 196 459
84 Japanese japanski 8.28 79 105
85 Slavic slovenski 8.27 114 168
86 first najpre 8.26 28 30
87 linguistic lingvistički 8.26 31 42
88 against protiv 8.25 64 72
89 rename preimenovati 8.24 24 36
90 customs običaji 8.21 107 112
91 call zvati 8.19 81 109
92 across širom 8.17 24 24
93 offer (v.) nuditi 8.14 28 28
94 computer science informatike 8.14 30 35
95 course kurs 8.11 184 226
96 Serbo-Croatian srpskohrvatski 8.07 196 287
97 rule pravilo 8.03 37 40
98 Chinese kineski 8.02 76 94
99 translate prevoditi 7.96 359 404
100 written pisan 7.93 135 143
101 Russian ruski 7.89 518 717
102 Ruthenian rusinski 7.87 42 64
103 paper papiru 7.80 30 30
104 Greek grčki 7.80 173 219
105 philological filološki 7.80 133 142
106 hatred mržnje 7.80 91 109
107 picture slika 7.78 47 50
108 department odsek 7.77 63 80
109 philosophy filozofija 7.76 30 43
110 state država 7.67 135 159
111 territory prostor 7.66 80 84
112 introduction uvođenje 7.65 112 150
113 literature književnost 7.64 761 1092
114 scientific naučni 7.64 43 48
115 fluently tečno 7.64 63 63
116 standard standardni 7.63 55 92
117 printed štampan 7.60 48 49
118 self sebe 7.59 92 101
119 translated preveden 7.57 660 696
120 alphabet pismo 7.54 435 785
121 dictionary rečnik 7.54 171 265
122 instruction nastava 7.53 312 464
123 make praviti 7.53 45 50
124 Slovak slovački 7.43 87 105
125 attend pohađati 7.42 66 83
126 violence nasilje 7.41 33 38
127 their njihov 7.41 328 374
128 love (v.) volim 7.39 29 29
129 think misliti 7.37 97 104
130 preservation očuvanje 7.34 59 63
131 governing vladaju 7.32 27 30
132 dialect narečja 7.30 20 22

133 novelty novina 7.30 51 54
134 classical klasični 7.27 40 61
135 connoisseur poznavalac 7.27 22 22
136 minority manjina 7.27 209 312
137 performs izvodi 7.26 30 31
138 unique jedinstven 7.26 35 54
139 knowledge znanje 7.23 391 453
140 letter (a, b, c…) slovo 7.21 25 30
141 Roma (adj.) romski 7.17 146 273
142 population stanovništva 7.16 28 29
143 Cyrillic (adj.) ćirilično 7.12 47 55
144 count (n.) računa 7.12 25 28
145 dialect dijalekat 7.12 73 106
146 listen slušati 7.12 50 58
147 contemporary savremeni 7.10 213 256
148 Yugoslavia Jugoslavije 7.09 23 24
149 Romanian rumunski 7.08 126 160
150 encompass obuhvata 7.07 23 35
151 little pomalo 7.06 23 23
152 suits (v.) odgovara 7.05 27 29
153 symbol simbola 7.04 35 41
154 Ijekavian ijekavski 7.02 29 37
155 Bosniak bošnjački 7.02 87 152
156 test test 6.99 44 58
157 broadcast emituje 6.96 27 30
158 reality stvarnost 6.96 35 36
159 Serbs Srbi 6.95 176 234
160 novel roman 6.95 129 140
161 mother (n.) majka 6.91 37 39
162 church crkva 6.89 86 99
163 comes out izlazi 6.87 41 42
164 label naziv 6.86 179 255
165 SANU SANU 6.86 77 111
166 stand stajati 6.85 38 42
167 number broj 6.84 152 179
168 published objavljen 6.84 252 274
169 Latin (alphabet) latinica 6.82 51 68
170 editor lektor 6.82 37 41
171 call nazivati 6.82 32 37
172 style stil 6.81 71 76
173 Italian italijanski 6.81 156 168
174 exam ispit 6.79 117 152
175 reads glasi 6.79 32 34
176 grade book dnevnik 6.78 36 36
177 compulsory obavezan 6.78 78 93
178 say kazati 6.78 436 478
179 both oba 6.77 58 69
180 momentarily trenutno 6.76 33 35
181 taking (exams) polaganje 6.76 20 20
182 hundred sto 6.75 45 50
183 variant varijanta 6.75 33 41
184 linguist lingvista 6.73 33 34
185 literature literatura 6.71 84 88
186 must morati 6.70 195 207
187 try (v.) pokuša(va)ti 6.68 22 23
188 this (way) ovako 6.66 22 26
189 department katedra 6.66 130 174
190 use (n.) korišćenje 6.63 32 33
191 instructor nastavnik 6.63 113 154
192 help pomoć 6.60 45 49
193 Vuk (Karadžić) Vuk 6.58 100 127
194 past prošlosti 6.57 21 25
195 perfecting usavršavanje 6.57 24 24
196 culture kultura 6.54 676 818
197 doctor dr 6.52 86 94
198 professor profesor 6.52 503 619
199 faith vera 6.51 108 122
200 walls zidova 6.49 22 28
201 again ponovo 6.46 48 50

202 wrote pisali 6.44 337 402
203 edition izdanje 6.43 187 205
204 learn učiti 6.41 519 714
205 Czech češki 6.38 20 23
206 people's narodni 6.37 194 278
207 exclusively isključivo 6.37 63 66
208 textbook udžbenik 6.32 122 146
209 poetry poezija 6.31 152 184
210 speech govor 6.31 127 155
211 man čovek 6.30 316 347
212 existence postojanje 6.26 31 39
213 Roma Roma 6.25 20 45
214 appear pojaviti 6.24 58 61
215 news vesti 6.24 34 38
216 teachers učitelji 6.24 35 44
217 media mediji 6.22 161 187
218 everyday svakodnevni 6.21 62 66
219 class period čas 6.20 164 197
220 master (v.) savladati 6.20 28 30
221 lectures predavanja 6.19 38 41
222 belong pripadati 6.18 60 73
223 Croats Hrvati 6.16 52 80
224 sings peva 6.14 32 32
225 Subotica Subotica 6.14 24 29
226 time slot termin 6.10 25 34
227 learn about upoznaju 6.09 20 21
228 world (adj.) svetski 6.06 301 323
229 read čitati 6.05 59 63
230 so-called takozvani 6.05 59 75
231 expression izraz 6.04 74 85
232 national nacionalni 6.02 356 488
233 word reč 6.02 543 644
234 magazine/journal časopis 6.01 39 47
235 diploma diploma 6.01 27 29
236 Slovene slovenački 5.97 112 121
237 document dokumenta 5.97 23 24
238 first-graders prvaci 5.97 29 34
239 Turkish turski 5.95 60 71
240 understanding razumevanje 5.94 22 25
241 sentence rečenica 5.94 31 32
242 poem pesma 5.94 205 229
243 foreign strani 5.94 1448 2293
244 notion pojam 5.94 20 20
245 our naš 5.90 1026 1262
246 linguistics lingvistika 5.90 31 39
247 poetic pesnički 5.89 62 82
248 people narod 5.89 340 455
249 history istorija 5.88 315 372
250 jargon žargon 5.88 24 31
251 Cyrillic ćirilica 5.88 173 256
252 thousands hiljada 5.88 21 21
253 differentiate razlikovati 5.86 36 41
254 adequate odgovarajući 5.85 29 32
255 school (university) fakultet 5.85 298 351
256 school škola 5.85 501 673
257 translation prevod 5.84 369 413
258 excellently odlično 5.82 96 98
259 speak govoriti 5.82 1609 1977
260 mean (v.) značiti 5.81 194 204
261 good dobar 5.80 284 301
262 show (n.) emisije 5.79 24 25
263 abroad (n.) inostranstvu 5.78 41 47
264 high school gimnazija 5.78 72 79
265 learn naučiti 5.76 229 252
266 science nauka 5.76 146 182
267 name ime 5.74 198 320
268 none nijedan 5.74 39 41
269 including uključujući 5.73 37 39
270 schooling školovanje 5.72 42 44

271 area oblast 5.71 100 111
272 local lokalni 5.70 29 33
273 organize organizovati 5.70 33 37
274 writing pisanje 5.68 46 52
275 Nikšić Nikšić 5.67 32 39
276 election (adj.) izborni 5.67 32 46
277 own svoj 5.67 1080 1442
278 ordinary običan 5.66 62 63
279 exist postojati 5.64 278 328
280 education obrazovanje 5.63 156 176
281 Karadžić Karadžić 5.63 45 47
282 create stvarati 5.62 43 53
283 students (K-8) đaci 5.61 55 58
284 defend braniti 5.60 22 26
285 board odbor 5.60 65 107
286 life život 5.60 111 117
287 same isti 5.59 321 402
288 beautiful lep 5.58 53 58
289 influence uticaj 5.57 35 45
290 rich bogat 5.56 28 29
291 choose birati 5.55 21 24
292 element element 5.55 35 51
293 instructional nastavni 5.54 55 73
294 enable omogućavati 5.52 27 31
295 student (university) student 5.52 157 189
296 accent izgovor 5.52 27 32
297 program of study studija 5.52 105 121
298 paper list 5.51 64 65
299 structure struktura 5.51 34 43
300 special poseban 5.51 152 177
301 high (school) srednji 5.50 53 56
302 protection zaštita 5.50 94 106
303 follow pratiti 5.49 49 50
304 informing informisanje 5.49 46 51
305 four četiri 5.49 154 174
306 music muzika 5.48 118 127
307 continue nastaviti 5.48 29 29
308 film (adj.) filmski 5.48 101 112
309 institute matica 5.47 38 43
310 thirty trideset 5.46 34 35
311 area područje 5.45 36 39
312 easier lakše 5.44 40 41
313 interest interesovanje 5.42 45 47
314 additional dodatni 5.41 27 28
315 take (exams) polagati 5.40 32 39
316 students (K-12) učenici 5.39 89 106
317 publish objaviti 5.39 136 141
318 translator prevodilac 5.37 178 197
319 understand razumeti 5.34 178 191
320 six šest 5.34 89 95
321 future (adj.) budući 5.33 38 40
322 significance značaj 5.33 44 48
323 needed potreban 5.33 64 67
324 necessary neophodan 5.30 46 51
325 discussion rasprava 5.29 25 29
326 about twenty dvadesetak 5.27 21 21
327 preserve sačuvati 5.25 26 29
328 level nivo 5.25 78 92
329 second drugi 5.21 781 1022
330 subject predmet 5.21 207 297
331 minister ministar 5.19 29 31
332 public javni 5.18 42 52
333 others ostali 5.16 148 164
334 nature priroda 5.16 54 57
335 form oblik 5.16 35 42
336 readers čitaoci 5.16 43 43
337 works delima 5.16 20 20
338 literary književni 5.15 308 514
339 phenomenon pojava 5.15 27 27

340 five pet 5.15 123 132
341 keep držati 5.14 40 42
342 third treći 5.14 53 62
343 find pronađu 5.12 24 24
344 twenty dvadeset 5.12 23 23
345 whole (n.) celina 5.12 20 21
346 several nekoliko 5.11 227 238
347 study (v.) studirati 5.10 53 63
348 desire (v.) želeti 5.09 84 92
349 all svi 5.08 902 1068
350 represent predstavljati 5.07 106 114
351 show pokazivati 5.06 40 43
352 defense odbrana 5.05 44 60
353 body tela 5.05 22 26
354 law zakon 5.05 100 124
355 three tri 5.05 217 259
356 persons lica 5.05 21 42
357 voice glas 5.05 25 25
358 university univerzitet 5.04 108 116
359 understood podrazumeva 5.04 33 33
360 declare izjasniti 5.04 20 22
361 need (n.) potreba 5.03 71 79
362 framework okvir 5.03 66 76
363 task zadatak 5.03 35 43
364 two dva 5.02 381 484
365 simultaneously istovremeno 5.02 57 61
366 communication komunikacija 5.02 89 106
367 environment sredina 5.01 34 47
368 association udruženje 5.01 44 53
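
Tables F1 and F2 list the same 368 collocates under two orderings. As a minimal sketch of that relationship, the snippet below takes six rows copied from the tables and sorts them once by total occurrences and once by MI score. The `Collocate` record type and the variable names are illustrative assumptions, not part of the original analysis.

```python
from typing import NamedTuple


class Collocate(NamedTuple):
    gloss: str   # English gloss
    lemma: str   # Serbian lemma
    mi: float    # MI score as reported in Tables F1/F2
    texts: int   # number of texts containing the collocation
    total: int   # total occurrences of the collocation


# Six rows copied from Tables F1 and F2.
ROWS = [
    Collocate("Serbian", "srpski", 8.42, 4486, 8063),
    Collocate("English", "engleski", 9.84, 2500, 3167),
    Collocate("foreign", "strani", 5.94, 1448, 2293),
    Collocate("evil", "zli", 13.67, 68, 73),
    Collocate("official", "službeni", 13.30, 276, 493),
    Collocate("Croatian", "hrvatski", 12.53, 354, 593),
]

# Table F1 ordering: most frequent collocates first.
by_frequency = sorted(ROWS, key=lambda row: row.total, reverse=True)

# Table F2 ordering: strongest association (highest MI) first.
by_mi = sorted(ROWS, key=lambda row: row.mi, reverse=True)

for rank, (freq_row, mi_row) in enumerate(zip(by_frequency, by_mi), start=1):
    print(f"{rank}  {freq_row.lemma:<10} {freq_row.total:>5}    "
          f"{mi_row.lemma:<10} {mi_row.mi:>5.2f}")
```

In this subset the two orderings diverge sharply: srpski leads by frequency but falls below zli, službeni, hrvatski, and engleski once the rows are ranked by MI score.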

Appendix G: Collocation Analysis (5+ Hits Section of SERBCORP)

Table G1

Lemma Collocates of the Lemma JEZIK ‘Language’ in the 5+ hits section of SERBCORP

(by frequency)

N Collocate (English) Collocate (Serbian) MI score Texts Total


1 Serbian srpski 8.80 802 3449
2 foreign strani 13.02 346 1011
3 that taj 7.64 484 791
4 English engleski 12.63 323 693
5 mother (adj.) maternji 8.86 296 636
6 own svoj 7.70 360 608
7 speak govoriti 10.26 321 552
8 second drugi 11.05 307 507
9 itself sam 5.11 338 494
10 alphabet pismo 9.19 165 457
11 literature književnost 7.58 220 456
12 one jedan 6.22 281 454
13 all svi 8.24 290 406
14 our naš 8.91 257 401
15 Croatian hrvatski 8.10 157 363
16 Montenegrin crnogorski 12.77 112 351
17 learn (v.) učiti 9.51 175 340
18 literary književni 9.16 176 335
19 school (K-12) škola 6.97 179 318
20 they oni 5.86 245 310
21 official službeni 14.15 107 285
22 use (n.) upotreba 9.35 136 274
23 this ovaj 6.53 211 265
24 instruction nastava 8.30 149 262
25 new nov 10.26 172 254
26 word reč 5.86 180 253
27 people narod 7.09 153 252
28 year godina 5.36 186 239
29 professor profesor 8.14 145 239
30 culture kultura 8.74 153 231
31 national nacionalni 8.64 126 222
32 first prvi 6.08 154 222
33 learning učenje 7.28 136 222
34 French francuski 11.78 115 215
35 his njegov 7.45 165 213
36 name ime 10.61 96 208
37 two dva 5.72 138 206
38 Russian ruski 7.98 89 202
39 say kazati 9.73 152 187
40 Serbo-Croatian srpskohrvatski 8.76 96 182
41 German nemački 7.66 112 177
42 subject predmet 11.06 103 170
43 book knjiga 5.41 118 169
44 their njihov 7.48 135 168
45 question pitanje 5.88 108 168
46 Bosnian bosanski 9.30 54 165
47 written pisan 11.83 119 163
48 exists postoji 9.38 126 161
49 know znati 9.44 122 159
50 dictionary rečnik 6.66 77 158
51 he on 5.39 133 156
52 same isti 7.39 112 154
53 name naziv 7.72 89 151
54 world (n.) svet 6.43 112 150
55 Serbia Srbija 6.54 109 148
56 Monte(negro) Gora 8.92 80 144
57 people's narodni 11.67 76 143

58 elementary osnovni 9.64 107 138
59 Serbs Srbi 5.74 86 138
60 Cyrillic (n.) ćirilica 7.61 70 135
61 mathematics matematika 9.29 68 135
62 Roma romski 8.49 34 135
63 (Monte)negro Crna 7.75 81 134
64 history istorija 7.43 97 134
65 part deo 5.62 95 129
66 minority manjina 7.89 64 128
67 knowledge znanje 8.76 90 124
68 country zemlja 7.03 86 122
69 curriculum program 7.89 85 121
70 school (university) fakultet 7.22 86 120
71 standardization standardizacija 10.72 56 117
72 Bosniak bošnjački 7.75 54 116
73 children deca 7.21 75 113
74 every svaki 6.34 94 113
75 official zvaničan 10.91 72 112
76 institute institut 8.02 54 110
77 little mali 7.28 75 110
78 big veliki 5.85 82 110
79 man čovek 6.13 86 109
80 Hungarian mađarski 9.85 51 108
81 that is odnosno 5.51 81 108
82 contemporary savremeni 10.23 73 106
83 grade razred 7.20 65 103
84 Slovene slovenski 13.86 58 101
85 science nauka 7.56 69 100
86 instructor nastavnik 8.59 62 99
87 Spanish španski 7.60 47 98
88 renaming preimenovanje 9.54 45 94
89 that onaj 6.63 74 93
90 special poseban 11.94 71 93
91 introduction uvođenje 7.73 58 92
92 department katedra 13.25 57 91
93 orthography pravopis 7.82 53 90
94 student (university) student 8.57 62 90
95 board odbor 7.74 48 88
96 number broj 9.08 61 84
97 class period čas 7.76 64 84
98 linguistic jezički 7.09 63 83
99 begin početi 7.97 33 83
100 different različit 13.87 61 83
101 common zajednički 10.79 64 83
102 good dobar 6.75 75 82
103 rights prava 5.28 65 82
104 standard (adj.) standardni 9.12 45 81
105 state država 8.45 61 79
106 dialect dijalekat 9.50 46 78
107 many mnogi 7.18 68 78
108 learn (v.) naučiti 13.36 66 78
109 call zvati 9.15 51 78
110 cultural kulturni 7.13 70 77
111 Albanian albanski 9.84 37 76
112 use (v.) koristiti 12.79 55 76
113 Vuk (Karadžić) Vuk 7.57 50 75
114 say reći 11.64 64 74
115 basis osnov 9.41 60 72
116 speech govor 6.51 49 71
117 Belgrade Beograd 5.46 64 70
118 Greek grčki 7.76 34 70
119 problem problem 5.37 59 70
120 SANU SANU 6.19 40 70
121 Croats Hrvati 6.48 42 69
122 translation prevod 6.20 54 69
123 law zakon 6.13 47 69
124 teach predavati 8.05 49 68
125 European evropski 7.12 48 67
126 course kurs 7.98 43 67

127 grammar gramatika 8.34 44 65
128 group grupa 10.75 43 65
129 my moj 5.67 44 65
130 relation odnos 6.56 50 65
131 become postati 6.94 50 65
132 political politički 6.23 52 64
133 writer pisac 9.14 46 63
134 education obrazovanje 5.37 49 62
135 own sopstveni 8.14 46 62
136 that tim 8.73 53 62
137 get dobiti 7.31 52 60
138 only jedini 9.34 50 60
139 decision odluka 6.81 41 60
140 translated preveden 9.30 49 60
141 Latin latinski 7.87 35 59
142 section odeljenje 10.40 38 59
143 textbook udžbenik 8.23 44 59
144 four četiri 5.08 46 58
145 exam ispit 6.69 38 57
146 living živ 8.50 47 57
147 media mediji 8.43 35 56
148 translate prevoditi 9.40 42 56
149 students (K-12) učenici 5.80 42 56
150 percent odsto 7.03 32 55
151 translator prevodilac 6.20 42 55
152 world (adj.) svetski 7.40 45 55
153 introduce uvesti 12.45 41 55
154 I ja 7.89 41 54
155 these ovi 9.36 51 54
156 represent predstavljati 5.53 48 54
157 work (v.) raditi 5.80 48 54
158 constitution ustav 6.69 40 54
159 others ostali 8.06 46 53
160 case slučaj 7.62 45 53
161 desire (v.) želeti 10.03 49 53
162 Europe Evropa 7.33 41 52
163 always uvek 7.22 49 52
164 Bulgarian bugarski 6.96 24 51
165 nation nacija 11.23 37 51
166 need (n.) potreba 7.05 43 51
167 study (n.) studija 6.37 40 51
168 make (v.) čini 6.02 43 50
169 identity identitet 6.31 37 50
170 compulsory obavezan 8.16 36 50
171 poetry poezija 10.49 63 50
172 Romanian rumunski 8.65 33 50
173 so-called takozvani 8.37 36 50
174 be able to moći 7.20 35 49
175 communication komunikacija 8.06 33 48
176 level nivo 6.70 38 48
177 area oblast 7.54 38 48
178 instructional nastavni 7.63 31 47
179 engage baviti 9.66 39 46
180 Montenegrins Crnogorci 6.48 22 46
181 department odsek 9.27 30 46
182 consider smatrati 5.01 42 46
183 art umetnost 6.15 42 46
184 day dan 5.70 41 45
185 defense odbrana 8.67 30 45
186 possibility mogućnost 5.67 38 44
187 war rat 5.95 27 44
188 difference razlika 10.73 35 44
189 thing stvar 5.86 38 44
190 society društvo 5.84 32 43
191 think misliti 6.60 37 43
192 change (n.) promena 6.62 33 43
193 Ruthenian rusinski 8.88 24 43
194 text tekst 5.74 30 43
195 protection zaštita 7.41 39 43

196 old stari 7.37 33 42
197 lead (v.) voditi 6.63 39 42
198 spirit duh 6.60 34 41
199 philological filološki 7.97 35 41
200 Latin (adj.) latinica 11.10 27 41
201 best najbolji 6.37 39 41
202 poem pesma 7.04 35 41
203 majority većina 9.34 36 41
204 link veza 7.32 35 41
205 framework okvir 8.61 30 40
206 remain ostati 6.85 33 40
207 knowledge poznavanje 7.61 34 40
208 Slovak slovački 8.65 27 40
209 faith vera 6.89 32 40
210 come doći 6.61 37 39
211 spoken govorni 10.14 28 39
212 understand razumeti 7.05 33 39
213 such takav 5.61 36 39
214 center centar 5.70 29 38
215 title naslov 9.50 26 38
216 attend pohađati 7.86 23 38
217 belong pripadati 6.04 27 38
218 element element 8.90 23 37
219 biggest najveći 5.51 33 37
220 write pisati 8.37 30 37
221 both oba 5.89 26 36
222 self sebe 5.62 32 36
223 study (v.) izučavati 9.23 26 35
224 linguistic lingvistički 8.38 25 35
225 written napisan 9.45 29 35
226 beginning početak 5.95 32 35
227 development razvoj 5.57 28 35
228 influence (n.) uticaj 6.82 25 35
229 community zajednica 6.66 27 35
230 elective izborni 6.37 20 34
231 published objavljen 6.39 30 34
232 system sistem 9.58 29 34
233 use (v.) služiti 10.39 29 34
234 expert stručnjak 7.47 30 34
235 tradition tradicija 6.62 29 34
236 third treći 6.46 28 34
237 philosophical filozofski 6.39 24 33
238 Croatia Hrvatska 6.48 26 33
239 Italian italijanski 6.55 27 33
240 expression izraz 8.84 25 33
241 to not have nemati 6.57 32 33
242 her njen 9.47 29 33
243 publish objaviti 8.07 32 33
244 council savet 5.80 24 33
245 Belgrade (adj.) beogradski 6.52 29 32
246 edition izdanje 5.95 25 32
247 existence postojanje 8.03 24 32
248 Cyrillic (adj.) ćirilično 7.91 23 31
249 citizens građani 9.02 25 31
250 linguistics lingvistika 8.17 23 31
251 change menjati 5.76 22 31
252 needed potreban 7.06 28 31
253 republic republika 6.29 25 31
254 Ijekavian (dialect) ijekavski 7.26 22 30
255 plan plan 6.20 24 30
256 association udruženje 6.96 23 30
257 linguist lingvista 5.77 28 29
258 reason razlog 5.64 20 29
259 six šest 5.14 25 29
260 high school gimnazija 5.14 24 28
261 violence nasilje 10.11 23 28
262 scientific naučni 6.33 25 28
263 necessary neophodan 8.16 25 28
264 never nikad 5.50 26 28

265 standard (n.) standard 7.19 21 28
266 exclusively isključivo 8.51 25 27
267 Karadžić (Vuk) Karadžić 6.33 25 27
268 come into being nastati 6.76 22 27
269 explain objašnjavati 9.21 23 27
270 preservation očuvanje 6.67 23 27
271 accept prihvatiti 7.65 25 27
272 structure struktura 7.05 20 27
273 claim (v.) tvrditi 7.58 26 27
274 teachers (K-8) učitelji 6.27 20 27
275 see videti 10.42 23 27
276 state (adj.) državni 7.00 20 26
277 Nikšić Nikšić 7.59 20 26
278 form oblik 6.23 20 26
279 concern (v.) ticati 5.34 23 26
280 often često 5.09 24 25
281 name (v.) nazvati 7.03 20 25
282 nature priroda 6.54 23 25
283 sense smisao 6.11 24 25
284 high (school) srednji 8.12 22 25
285 creation stvaranje 6.13 20 25
286 topic tema 5.49 23 25
287 last poslednji 6.27 23 24
288 means (n.) sredstvo 6.50 22 24
289 everyday svakodnevni 8.18 20 24
290 difficult (adv.) teško 8.10 24 24
291 authorities vlast 5.07 20 24
292 significance značaj 5.70 20 24
293 academy akademija 5.99 22 23
294 institution institucija 5.39 21 23
295 opinion mišljenje 5.26 21 23
296 consideration obzir 5.07 20 23
297 bigger veći 5.69 21 23
298 work (n.) delo 5.47 20 22
299 less/er manje 5.15 21 22
300 origin poreklo 6.67 21 22
301 project (n.) projekat 8.95 20 22
302 hundred sto 5.55 20 22
303 interest interesovanje 5.74 20 21
304 studying izučavanje 7.87 20 21
305 government vlada 5.66 20 20

Table G2

Lemma Collocates of the Lemma JEZIK ‘Language’ in the 5+ hits section of SERBCORP (by MI score)

N Collocate (English) Collocate (Serbian) MI score Texts Total


1 official službeni 14.15 107 285
2 different različit 13.87 61 83
3 Slovene slovenski 13.86 58 101
4 learn (v.) naučiti 13.36 66 78
5 department katedra 13.25 57 91
6 foreign strani 13.02 346 1011
7 use (v.) koristiti 12.79 55 76
8 Montenegrin crnogorski 12.77 112 351
9 English engleski 12.63 323 693
10 introduce uvesti 12.45 41 55
11 special poseban 11.94 71 93
12 written pisan 11.83 119 163
13 French francuski 11.78 115 215
14 people's narodni 11.67 76 143
15 say reći 11.64 64 74
16 nation nacija 11.23 37 51
17 Latin (adj.) latinica 11.10 27 41
18 subject predmet 11.06 103 170
19 second drugi 11.05 307 507
20 official zvaničan 10.91 72 112
21 common zajednički 10.79 64 83
22 group grupa 10.75 43 65
23 difference razlika 10.73 35 44
24 standardization standardizacija 10.72 56 117
25 name ime 10.61 96 208
26 poetry poezija 10.49 63 50
27 see videti 10.42 23 27
28 section odeljenje 10.40 38 59
29 use (v.) služiti 10.39 29 34
30 speak govoriti 10.26 321 552
31 new nov 10.26 172 254
32 contemporary savremeni 10.23 73 106
33 spoken govorni 10.14 28 39
34 violence nasilje 10.11 23 28
35 desire (v.) želeti 10.03 49 53
36 Hungarian mađarski 9.85 51 108
37 Albanian albanski 9.84 37 76
38 say kazati 9.73 152 187
39 engage baviti 9.66 39 46
40 elementary osnovni 9.64 107 138
41 system sistem 9.58 29 34
42 renaming preimenovanje 9.54 45 94
43 learn (v.) učiti 9.51 175 340
44 dialect dijalekat 9.50 46 78
45 title naslov 9.50 26 38
46 her njen 9.47 29 33
47 written napisan 9.45 29 35
48 know znati 9.44 122 159
49 basis osnov 9.41 60 72
50 translate prevoditi 9.40 42 56
51 exists postoji 9.38 126 161
52 these ovi 9.36 51 54
53 use (n.) upotreba 9.35 136 274
54 only jedini 9.34 50 60
55 majority većina 9.34 36 41
56 Bosnian bosanski 9.30 54 165
57 translated preveden 9.30 49 60
58 mathematics matematika 9.29 68 135
59 department odsek 9.27 30 46
60 study (v.) izučavati 9.23 26 35

61 explain objašnjavati 9.21 23 27
62 alphabet pismo 9.19 165 457
63 literary književni 9.16 176 335
64 call zvati 9.15 51 78
65 writer pisac 9.14 46 63
66 standard (adj.) standardni 9.12 45 81
67 number broj 9.08 61 84
68 citizens građani 9.02 25 31
69 project (n.) projekat 8.95 20 22
70 Monte(negro) Gora 8.92 80 144
71 our naš 8.91 257 401
72 element element 8.90 23 37
73 Ruthenian rusinski 8.88 24 43
74 mother (adj.) maternji 8.86 296 636
75 expression izraz 8.84 25 33
76 Serbian srpski 8.80 802 3449
77 Serbo-Croatian srpskohrvatski 8.76 96 182
78 knowledge znanje 8.76 90 124
79 culture kultura 8.74 153 231
80 that tim 8.73 53 62
81 defense odbrana 8.67 30 45
82 Romanian rumunski 8.65 33 50
83 Slovak slovački 8.65 27 40
84 national nacionalni 8.64 126 222
85 framework okvir 8.61 30 40
86 instructor nastavnik 8.59 62 99
87 student (university) student 8.57 62 90
88 exclusively isključivo 8.51 25 27
89 living živ 8.50 47 57
90 Roma romski 8.49 34 135
91 state država 8.45 61 79
92 media mediji 8.43 35 56
93 linguistic lingvistički 8.38 25 35
94 so-called takozvani 8.37 36 50
95 write pisati 8.37 30 37
96 grammar gramatika 8.34 44 65
97 instruction nastava 8.30 149 262
98 all svi 8.24 290 406
99 textbook udžbenik 8.23 44 59
100 everyday svakodnevni 8.18 20 24
101 linguistics lingvistika 8.17 23 31
102 compulsory obavezan 8.16 36 50
103 necessary neophodan 8.16 25 28
104 professor profesor 8.14 145 239
105 own sopstveni 8.14 46 62
106 high (school) srednji 8.12 22 25
107 Croatian hrvatski 8.10 157 363
108 difficult (adv.) teško 8.10 24 24
109 publish objaviti 8.07 32 33
110 others ostali 8.06 46 53
111 communication komunikacija 8.06 33 48
112 teach predavati 8.05 49 68
113 existence postojanje 8.03 24 32
114 institute institut 8.02 54 110
115 Russian ruski 7.98 89 202
116 course kurs 7.98 43 67
117 begin početi 7.97 33 83
118 philological filološki 7.97 35 41
119 Cyrillic (adj.) ćirilično 7.91 23 31
120 minority manjina 7.89 64 128
121 curriculum program 7.89 85 121
122 I ja 7.89 41 54
123 Latin latinski 7.87 35 59
124 studying izučavanje 7.87 20 21
125 attend pohađati 7.86 23 38
126 orthography pravopis 7.82 53 90
127 class period čas 7.76 64 84
128 Greek grčki 7.76 34 70
129 (Monte)negro Crna 7.75 81 134

130 Bosniak bošnjački 7.75 54 116
131 board odbor 7.74 48 88
132 introduction uvođenje 7.73 58 92
133 name naziv 7.72 89 151
134 own svoj 7.70 360 608
135 German nemački 7.66 112 177
136 accept prihvatiti 7.65 25 27
137 that taj 7.64 484 791
138 instructional nastavni 7.63 31 47
139 case slučaj 7.62 45 53
140 Cyrillic (n.) ćirilica 7.61 70 135
141 knowledge poznavanje 7.61 34 40
142 Spanish španski 7.60 47 98
143 Nikšić Nikšić 7.59 20 26
144 literature književnost 7.58 220 456
145 claim (v.) tvrditi 7.58 26 27
146 Vuk (Karadžić) Vuk 7.57 50 75
147 science nauka 7.56 69 100
148 area oblast 7.54 38 48
149 their njihov 7.48 135 168
150 expert stručnjak 7.47 30 34
151 his njegov 7.45 165 213
152 history istorija 7.43 97 134
153 protection zaštita 7.41 39 43
154 world (adj.) svetski 7.40 45 55
155 same isti 7.39 112 154
156 old stari 7.37 33 42
157 Europe Evropa 7.33 41 52
158 link veza 7.32 35 41
159 get dobiti 7.31 52 60
160 learning učenje 7.28 136 222
161 little mali 7.28 75 110
162 Ijekavian (dialect) ijekavski 7.26 22 30
163 school (university) fakultet 7.22 86 120
164 always uvek 7.22 49 52
165 children deca 7.21 75 113
166 grade razred 7.20 65 103
167 be able to moći 7.20 35 49
168 standard (n.) standard 7.19 21 28
169 many mnogi 7.18 68 78
170 cultural kulturni 7.13 70 77
171 European evropski 7.12 48 67
172 people narod 7.09 153 252
173 linguistic jezički 7.09 63 83
174 needed potreban 7.06 28 31
175 need (n.) potreba 7.05 43 51
176 understand razumeti 7.05 33 39
177 structure struktura 7.05 20 27
178 poem pesma 7.04 35 41
179 country zemlja 7.03 86 122
180 percent odsto 7.03 32 55
181 name (v.) nazvati 7.03 20 25
182 state (adj.) državni 7.00 20 26
183 school (K-12) škola 6.97 179 318
184 Bulgarian bugarski 6.96 24 51
185 association udruženje 6.96 23 30
186 become postati 6.94 50 65
187 faith vera 6.89 32 40
188 remain ostati 6.85 33 40
189 influence (n.) uticaj 6.82 25 35
190 decision odluka 6.81 41 60
191 come into being nastati 6.76 22 27
192 good dobar 6.75 75 82
193 level nivo 6.70 38 48
194 exam ispit 6.69 38 57
195 constitution ustav 6.69 40 54
196 preservation očuvanje 6.67 23 27
197 origin poreklo 6.67 21 22
198 dictionary rečnik 6.66 77 158

199 community zajednica 6.66 27 35
200 that onaj 6.63 74 93
201 lead (v.) voditi 6.63 39 42
202 change (n.) promena 6.62 33 43
203 tradition tradicija 6.62 29 34
204 come doći 6.61 37 39
205 think misliti 6.60 37 43
206 spirit duh 6.60 34 41
207 to not have nemati 6.57 32 33
208 relation odnos 6.56 50 65
209 Italian italijanski 6.55 27 33
210 Serbia Srbija 6.54 109 148
211 nature priroda 6.54 23 25
212 this ovaj 6.53 211 265
213 Belgrade (adj.) beogradski 6.52 29 32
214 speech govor 6.51 49 71
215 means (n.) sredstvo 6.50 22 24
216 Croats Hrvati 6.48 42 69
217 Montenegrins Crnogorci 6.48 22 46
218 Croatia Hrvatska 6.48 26 33
219 third treći 6.46 28 34
220 world (n.) svet 6.43 112 150
221 published objavljen 6.39 30 34
222 philosophical filozofski 6.39 24 33
223 study (n.) studija 6.37 40 51
224 best najbolji 6.37 39 41
225 elective izborni 6.37 20 34
226 every svaki 6.34 94 113
227 scientific naučni 6.33 25 28
228 Karadžić (Vuk) Karadžić 6.33 25 27
229 identity identitet 6.31 37 50
230 republic republika 6.29 25 31
231 teachers (K-8) učitelji 6.27 20 27
232 last poslednji 6.27 23 24
233 political politički 6.23 52 64
234 form oblik 6.23 20 26
235 one jedan 6.22 281 454
236 translation prevod 6.20 54 69
237 translator prevodilac 6.20 42 55
238 plan plan 6.20 24 30
239 SANU SANU 6.19 40 70
240 art umetnost 6.15 42 46
241 man čovek 6.13 86 109
242 law zakon 6.13 47 69
243 creation stvaranje 6.13 20 25
244 sense smisao 6.11 24 25
245 first prvi 6.08 154 222
246 belong pripadati 6.04 27 38
247 make (v.) čini 6.02 43 50
248 academy akademija 5.99 22 23
249 war rat 5.95 27 44
250 beginning početak 5.95 32 35
251 edition izdanje 5.95 25 32
252 both oba 5.89 26 36
253 question pitanje 5.88 108 168
254 they oni 5.86 245 310
255 word reč 5.86 180 253
256 thing stvar 5.86 38 44
257 big veliki 5.85 82 110
258 society društvo 5.84 32 43
259 students (K-12) učenici 5.80 42 56
260 work (v.) raditi 5.80 48 54
261 council savet 5.80 24 33
262 linguist lingvista 5.77 28 29
263 change menjati 5.76 22 31
264 Serbs Srbi 5.74 86 138
265 text tekst 5.74 30 43
266 interest interesovanje 5.74 20 21
267 two dva 5.72 138 206

268 day dan 5.70 41 45
269 center centar 5.70 29 38
270 significance značaj 5.70 20 24
271 bigger veći 5.69 21 23
272 my moj 5.67 44 65
273 possibility mogućnost 5.67 38 44
274 government vlada 5.66 20 20
275 reason razlog 5.64 20 29
276 part deo 5.62 95 129
277 self sebe 5.62 32 36
278 such takav 5.61 36 39
279 development razvoj 5.57 28 35
280 hundred sto 5.55 20 22
281 represent predstavljati 5.53 48 54
282 that is odnosno 5.51 81 108
283 biggest najveći 5.51 33 37
284 never nikad 5.50 26 28
285 topic tema 5.49 23 25
286 work (n.) delo 5.47 20 22
287 Belgrade Beograd 5.46 64 70
288 book knjiga 5.41 118 169
289 he on 5.39 133 156
290 institution institucija 5.39 21 23
291 problem problem 5.37 59 70
292 education obrazovanje 5.37 49 62
293 year godina 5.36 186 239
294 concern (v.) ticati 5.34 23 26
295 rights prava 5.28 65 82
296 opinion mišljenje 5.26 21 23
297 less/er manje 5.15 21 22
298 six šest 5.14 25 29
299 high school gimnazija 5.14 24 28
300 itself sam 5.11 338 494
301 often često 5.09 24 25
302 four četiri 5.08 46 58
303 authorities vlast 5.07 20 24
304 consideration obzir 5.07 20 23
305 consider smatrati 5.01 42 46
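
Tables G1 and G2 list the same collocates in two different orders. The relationship can be sketched in Python as follows. This is a minimal illustration, not the procedure used in the study: the `Collocate` type and the helper names are hypothetical, and only five rows copied from the tables above are included. The last helper assumes that "Texts" counts the texts containing at least one co-occurrence, so that it should never exceed "Total"; under that assumption the row for poezija (Texts 63, Total 50) would merit a check against the original output.

```python
from typing import NamedTuple


class Collocate(NamedTuple):
    """One table row (hypothetical container, not part of the study's tooling)."""
    english: str
    serbian: str
    mi: float
    texts: int
    total: int


# Five rows copied from Tables G1/G2; the full tables have 305 rows.
ROWS = [
    Collocate("Serbian", "srpski", 8.80, 802, 3449),
    Collocate("foreign", "strani", 13.02, 346, 1011),
    Collocate("English", "engleski", 12.63, 323, 693),
    Collocate("official", "službeni", 14.15, 107, 285),
    Collocate("poetry", "poezija", 10.49, 63, 50),
]


def by_frequency(rows):
    """Order rows as in Table G1: highest total co-occurrence count first."""
    return sorted(rows, key=lambda r: r.total, reverse=True)


def by_mi(rows):
    """Order rows as in Table G2: highest MI score first."""
    return sorted(rows, key=lambda r: r.mi, reverse=True)


def texts_exceed_total(rows):
    """Return rows where Texts > Total (assumed to be impossible; see lead-in)."""
    return [r for r in rows if r.texts > r.total]


if __name__ == "__main__":
    print([r.serbian for r in by_frequency(ROWS)])
    print([r.serbian for r in by_mi(ROWS)])
    print([r.serbian for r in texts_exceed_total(ROWS)])
```

The two orderings diverge sharply: srpski leads by frequency but falls to the bottom of this sample by MI score, which is why the appendix reports both rankings.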

1 SETIMES2, OPUS2, and srWaC14 are available at www.sketchengine.co.uk.
2 The caveat here, of course, is that the reference/comparator corpus should be at least the size of the research corpus.
3 Because of their large sizes and limited availability, the WaC corpora could only be used as reference corpora by first downloading their full wordlists in txt format and then converting these into WST wordlists for the purposes of keyword analysis. The alternative solution, uploading the entire SERBCORP onto the SketchEngine website to conduct keyword analysis there, was technically demanding and prohibitively expensive.
4 It should be noted that Serbian is a heavily inflectional language and so all search terms, keywords, collocates, and n-grams are likely to (and do) show up in multiple inflectional forms in the corpus. Although lemmatization can be problematic because it “has the potential to disguise important differences in collocational preferences between different forms of a lemma” (Durrant, 2009, p. 162; see also Sinclair, 1991), it was consistently applied to all quantitative CL analyses (with the exception of n-gram analysis) in this study to reduce the impact of inflectional morphology on statistical analyses (cf. Baker, Gabrielatos & McEnery, 2013; Partington, 2010). For example, treating individual lemma forms separately often meant that obviously important lexical items either fell (well) below the frequency threshold or appeared to be less salient than they are. Lemmatization solved this problem by adding up the frequencies of all individual lemma forms for a total lemma frequency. Similarly, treating individual lemma forms separately would have multiplied the sometimes already large numbers of keywords, collocates, and n-grams. As a corollary, many collocate variables based on individual lemma forms in EFA would have likely failed to load on any factors due to their considerably lower frequencies. Thus, even though different lemma forms do often exhibit different, and sometimes complementary, collocational preferences, this does not appear to be of paramount importance in the context of a macroscopic approach as employed in this study.
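
The frequency aggregation described in this note can be sketched in Python as follows. This is only an illustration under stated assumptions: the form-to-lemma map `LEMMA_OF`, the helper names, the counts, and the threshold of 20 are all invented for the example and are not taken from the study, which relied on its own lemma lists and software.

```python
from collections import Counter

# Hypothetical form-to-lemma map covering a few inflected forms only;
# a real analysis would need a full Serbian lemma list.
LEMMA_OF = {
    "jezik": "jezik", "jezika": "jezik", "jeziku": "jezik", "jezikom": "jezik",
    "srpski": "srpski", "srpskog": "srpski", "srpskom": "srpski",
}


def lemma_frequencies(form_counts):
    """Sum the counts of all inflected forms that share a lemma."""
    totals = Counter()
    for form, count in form_counts.items():
        # Forms missing from the map are treated as their own lemma.
        lemma = LEMMA_OF.get(form.lower(), form.lower())
        totals[lemma] += count
    return totals


def at_or_above(counts, threshold):
    """Return the items whose frequency meets the threshold."""
    return {item for item, n in counts.items() if n >= threshold}


# Invented counts, for illustration only.
form_counts = {
    "jezik": 12, "jezika": 9, "jeziku": 7, "jezikom": 4,
    "srpski": 6, "srpskog": 5, "srpskom": 3,
}
lemma_counts = lemma_frequencies(form_counts)

if __name__ == "__main__":
    print(at_or_above(form_counts, 20))   # no single form reaches the threshold
    print(at_or_above(lemma_counts, 20))  # the summed lemma does
```

With these toy numbers no individual form reaches the threshold, whereas the summed lemma does, which mirrors the problem the note says lemmatization was meant to solve.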


41 For presentation purposes, the forms of lemmatized keywords are standardized to the nominative case of the predominant number (singular or plural) for nouns and pronouns, the nominative singular masculine for adjectives, and the infinitive for verbs. Keywords that appeared in only one of their possible lemma forms are presented in their original form.
