Adnan Ajšić - LANGUAGE IDEOLOGIES, PUBLIC DISCOURSES, AND ETHNONATIONALISM IN THE BALKANS: A CORPUS-BASED STUDY

LANGUAGE IDEOLOGIES, PUBLIC DISCOURSES, AND ETHNONATIONALISM
IN THE BALKANS: A CORPUS-BASED STUDY
By Adnan Ajšić
A Dissertation
Submitted in Partial Fulfillment
of the Requirements for the Degree of
Doctor of Philosophy
in Applied Linguistics
Northern Arizona University
May 2015
Approved:
Douglas Biber, PhD, Co-Chair
Mary McGroarty, PhD, Co-Chair
Randi Reppen, PhD
James Wilce, PhD

Abstract
LANGUAGE IDEOLOGIES, PUBLIC DISCOURSES, AND ETHNONATIONALISM
IN THE BALKANS: A CORPUS-BASED STUDY
ADNAN AJŠIĆ
Language ideologies have been closely related to nationalist discourses since the
inception of nationalism and the one-nation-one-language-one-territory trope, and
continue to be important for the construction and maintenance of national identities in
Europe and elsewhere. Although recent research has examined language debates and the
links between language ideologies and national identities in plurilingual and multicultural
societies (e.g., Canada, Vessey, 2013a; Spain/Catalonia, Pujolar, 2007), little attention has
been paid to contexts with minimal linguistic differences between groups such as the
West Central Balkans. Public language-related discourse in the Central South Slavic area
in the last twenty years has been dominated by a fierce debate over the ownership of the
common language (formerly known as Serbo-Croatian) and the concomitant contestation
of ethnolinguistic identities. The principal goal of this study, therefore, was to identify
dominant language-related discourses and language ideologies on the basis of an
empirical, mixed methods approach, and investigate the links between language-related
discourses, language ideologies, and ethnonationalist discourse in the mainstream press
published in Serbia, the largest Central South Slavic nation.
To investigate language-related discourses and language ideologies in the
mainstream Serbian press two comprehensive, specialized research (11,656,247 words
from 16,148 articles) and comparator (22,493,804 words from 37,227 articles) corpora
were compiled from relevant articles published in four leading Serbian dailies and
weeklies. Following recent developments in mixed methods research into discourses and
ideologies (Baker et al., 2008), the data were analyzed using a combination of
quantitative (corpus linguistics) and qualitative (critical discourse analysis/discourse-
historical approach) methods. The second major goal of this study, therefore, was to
compare the quantitative methods employed in terms of their usefulness and effectiveness for
the identification of language-related discourses and language ideologies.
The findings suggest the existence of pervasive language-related discourses of
endangerment and contestation which are based on an essentialist language ideology with
a long history and crucial function in Serbian nationalism. The methodological
comparison suggests different roles for different quantitative methods (e.g., micro- and
macroscopic analysis), as well as an overall complementarity of the quantitative and
qualitative methods. Crucially, however, exploratory factor analysis is shown to be the
most effective analytical method for the purposes of corpus-based investigations of
discourses and ideologies. Finally, despite some synchronic and diachronic variation in
(small ‘d’) discourses suggested by factors, the discursive and ideological profiles of the
mainstream Serbian press are shown to be fairly uniform and stable, suggesting broad
acceptance and naturalization of dominant language-related discourses and language
ideologies in Serbian society.
Adnan Ajšić
© 2015
Acknowledgments
I would like to thank my committee members, Doug Biber, Mary McGroarty,
Randi Reppen, and Jim Wilce for their unwavering support and endless patience. When I
was embarking upon this journey, I thought each one of you would be able to provide a
unique perspective I could rely on. I chose wisely. Thank you.
I am also grateful to my wife, Deniza, and my son, Aiden Mak, for endurance and
inspiration during a very difficult time. Hvala oboma (‘Thank you both’). Gotovo je, slobodni smo (‘It is over, we are free’). I’d also
like to thank my mom, Elbisa, and mother-in-law, Ajša, who both did what Bosnian moms
do. Hvala objema (‘Thank you both’). Finally, thank you to my sister, Amra, for proving me right, and my
brother-in-law, Nebojša, for having a rare combination of intelligence, skill, and patience.
Fala ti, učitelju (‘Thank you, teacher’).
Meši
Hvala ti na nesebičnosti i dostojanstvu u plavom mantilu. Htjelo je ovako da bude. (‘Thank you for your selflessness and dignity in the blue coat. It was meant to be this way.’)
Contents
List of Tables
List of Figures
1. Introduction
1.1 Definition of Problem
1.2 Sociolinguistic History and the Role of Language Ideologies in Contemporary Balkans
1.2.1 The symbolic importance of language.
1.2.2 A brief sociolinguistic history of West Central Balkans.
1.2.3 Language, identity and ethnonationalism in contemporary Balkans.
1.3 Study Outline
2. Literature Review
2.1 Theoretical Approaches to Ideology
2.1.1 Historical development.
2.1.2 Theoretical approaches.
2.1.3 Definitions.
2.1.4 Conceptualizations.
2.2 Theoretical Approaches to Language Ideology
2.2.1 Historical development.
2.2.2 Theoretical approaches.
2.2.3 Definitions and conceptualizations.
2.3 Empirical Approaches to Language Ideology
2.3.1 Theoretical and methodological contexts in language ideology research.
2.3.2 Research questions in language ideology research.
2.3.3 Types of data used in language ideology research.
2.3.4 Corpus linguistics in research on discourse and language ideology.
2.4 Gaps
3. Study Overview
3.1 Research Questions
3.1.1 Research question 1.
3.2 Research Design
3.3 Construct Definitions and Operationalizations
3.3.1 Core concepts.
3.3.2 Keywords.
3.3.3 Relevant collocates.
3.3.4 Dominant language-related discourses.
3.3.5 Dominant language ideologies.
3.3.6 Ethnonationalism.
3.4 Coding Procedures
4. Data
5. Methods
5.1 Keyword Analysis
5.1.1 Theoretical background.
5.1.2 Analytical parameters and procedures.
5.2 Collocation Analysis
5.3 Exploratory Factor Analysis
5.3 Synchronic and Diachronic Variation (Analysis of Variance)
5.4 Cluster Analysis
5.5 Critical Discourse Analysis: Discourse-historical Approach
6. Results
6.1 Keyword Analysis
6.1.1 Keyword analysis (5+ hits section of SERBCORP).
6.1.2 Keyword analysis (5+ hits section of SERBCORP vs. 1-4 hits section of SERBCORP).
6.1.3 Keyword associates.
6.2 Collocation Analysis
6.2.1 Collocation analysis (5+ hits section of SERBCORP).
6.2.2 N-grams.
6.3 Exploratory Factor Analysis
6.3.1 Factor 1: Language education.
6.3.2 Factor 3: Entrance exams.
6.3.3 Factor 9: Foreign language education.
6.3.4 Factor 11: Officialization of Bosnian.
6.3.5 Factor 2: Cyrillic-only.
6.3.6 Factor 5: Minority language rights.
6.3.7 Factor 4: Officialization of Montenegrin 1.
6.3.8 Factor 8: Officialization of Montenegrin 2.
6.3.9 Factor 6: Contestation over language ownership and name.
6.3.10 Factor 10: Linguistics as a science, lexicography, standardization and contestation.
6.3.11 Factor 7: Literature and publishing.
6.3.12 Factor 12: Linguacultural diplomacy, language, and culture.
6.4 Synchronic and Diachronic Variation in Language-related Discourses (Analysis of Variance)
6.4.1 Variation by publication (synchronic).
6.4.2 Summary of variation by publication.
6.4.3 Variation by year of publication (diachronic).
6.4.4 Summary of variation by year of publication.
6.4.5 Variation by type of article (synchronic).
6.4.6 Summary of variation by type of article.
6.5 Cluster Analysis
6.5.1 Preferred cluster solution and scoring patterns by factor and cluster.
6.5.2 Synchronic and diachronic clustering patterns.
6.6 Critical Discourse Analysis/Discourse-historical Approach
6.6.1 Excerpts from texts representative of Factor 2.
6.6.2 Excerpts from texts representative of Factors 4 and 8.
6.6.3 Excerpts from texts representative of Factor 11.
6.6.4 Excerpts from texts representative of Factors 6 and 10.
6.6.5 Topoi.
7. Discussion
7.1 Research Question 1
8. Conclusion
8.1 Methodological Comparison
8.2 Language-related Discourses, Language Ideologies, and Ethnonationalism: What It All Means
8.3 Implications
8.4 Limitations
8.5 Future Research
References
Appendices
Appendix A: Sampling Procedures
Appendix B: Comparative Analyses of Comparator Corpora
Appendix C: Keyword Analysis (SERBCORP)
Appendix D: Keyword Analysis (5+ Hits Section of SERBCORP)
Appendix E: Keyword Analysis (5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus)
Appendix F: Collocation Analysis (SERBCORP)
Appendix G: Collocation Analysis (5+ Hits Section of SERBCORP)
List of Tables
Table 1 Research Design: CL and CDA Investigation of Language-related Newspaper Discourse
Table 2 Composition of SERBCORP (by Publication)
Table 3 Number of Articles in SERBCORP (by Year and Publication)
Table 4 Article Mean Lengths, Standard Deviations, and STTR (by Publication)
Table 5 Composition of SERBCOMP (by Publication)
Table 6 Articles in SERBCORP (by Hit Count for the Lemma JEZIK and Percentage)
Table 7 Composition of the 5+ Hits Section of SERBCORP (by Publication)
Table 8 Number of Articles in the 5+ Hits Section of SERBCORP (by Year and Publication)
Table 9 Article Means, SD, and STTR in the 5+ Hits Section of SERBCORP (by Publication)
Table 10 Top 50 Positive Key Lemmas in the 5+ Hits Section of SERBCORP (by Keyness Score)
Table 11 Negative Key Lemmas in the 5+ Hits Section of SERBCORP (by Keyness Score)
Table 12 Top 50 Positive Key Lemmas in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus (by Keyness Score)
Table 13 Negative Key Lemmas in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus (by Keyness Score)
Table 14 Positive Key Semantic Domains in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus (by Rank)
Table 15 Ethnolinguistic Identity-related Key-keywords and Key-keyword Associates in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus (by Rank/Number of Texts)
Table 16 Factor-related Key-keywords and Key-keyword Associates in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus (by Rank)
Table 17 Top 50 Lemma Collocates of the Lemma JEZIK ‘Language’ in the 5+ Hits Section of SERBCORP (by Frequency)
Table 18 Top 50 Lemma Collocates of the Lemma JEZIK ‘Language’ in the 5+ Hits Section of SERBCORP (by MI Score)
Table 19 Sample of the Most Frequent N-grams in the 5+ Hits Section of SERBCORP (by Number of Constituents and Frequency)
Table 20 Descriptive Statistics for the Variables in the 12-factor Solution (N = 943, k = 107)
Table 21 First 13 Eigenvalues of the Unrotated Factor Analysis (N = 943, k = 107)
Table 22 Rotated Factor Pattern for the 12-factor Solution (Varimax Rotation)
Table 23 Summary of the Factorial Structure (Collocates in Parentheses were not Used in the Calculation of Factor Scores)
Table 24 Top 20 Highest Scoring Articles on Factor 1 (MV Outliers are in Bold)
Table 36 Descriptive Statistics for Language-related Discourse Factor Scores by Publication
Table 37 Descriptive Statistics for Language-related Discourse Factor Scores by Year of Publication
Table 38 Descriptive Statistics for Language-related Discourse Factor Scores by Type of Article
Table 39 Descriptive Statistics for Twelve Factors (Predictor Variables) in a Six-cluster Solution
Table 40 Results of ANOVA for Twelve Factors (Predictor Variables) in a Six-cluster Solution
Table 41 Discursive Links Between Twelve Factors Based on Highest Mean Scores for Six Clusters
Table 42 Cluster Membership by Publication for the Six-cluster Solution
Table 43 Cluster Membership by Year of Publication for the Six-cluster Solution
Table 44 Cluster Membership by Type of Article for the Six-cluster Solution
List of Figures
Figure 1. Diagram of the analytical process
Figure 2. Distribution of 5+ hit articles (by year, all publications)
Figure 3. Concordance lines for postoji ‘exists’ in the 5+ hits section of SERBCORP
Figure 4. Scree plot of eigenvalues
Figure 5. Concordance lines for rasparčavanje ‘partitioning’ in SERBCORP
1. Introduction
1.1 Definition of Problem
Although the standard varieties of the Central South Slavic diasystem are virtually
fully mutually intelligible, in this region, as elsewhere, language has long been a primary
tool in the construction and maintenance of separate ethnonational identities and hence
highly contested (Greenberg, 2004). After a period of relative political stability and
formal linguistic union under the label of Serbo-Croatian in the former Yugoslavia (1945-
1991), the contestation reemerged and intensified with the dissolution of the federal state
and the initiation of the concomitant projects of (re-)construction of ethnonational
identities and states. The linguistic consequences of the dissolution include the “nominal
language death” of Serbo-Croatian, the emergence of four successor varieties bearing
ethnic labels (Bosnian, Croatian, Montenegrin, Serbian), and the formulation of
considerably different language policies in the new states (Bugarski, 2004). Despite this,
or perhaps precisely because of it, the contestation continues as identity and nationhood
continue to be negotiated. Most recently, in the summer months of 2013, three fierce
public language debates took place in three of the four successor states. In Bosnia-
Herzegovina, the debate, which took place in the period leading up to the country’s first
postwar census (October 2013), centered on the legitimacy of the name of one of the
three official languages, Bosnian. In Croatia, the debate centered on the reintroduction of
biscriptal (Latin and Cyrillic) public signs, and on the sometimes violent protests
against it in the easternmost Croatian city of Vukovar, which has a sizeable Serb minority.
Finally, in Serbia the debate centered on the recognition of Bosnian as a minority
language, which came as a direct consequence of the country’s accession negotiations
with the European Union.
Underlying these and similar debates are language ideologies, for the present
purposes best defined as “the cultural system[s] of ideas about social and linguistic
relationships, together with their loading of moral and political interests” (Irvine, 1989, p.
255). Because they function as a mediating link between linguistic and social practices,
language ideologies are “not about language alone” (Woolard, 1998, p. 3). Rather, they
often serve as tools for the invention of tradition which makes possible the “imagined
community” of the nation (Anderson, 1983), especially through deployment in unifying
public institutions such as print. For this reason, Silverstein (1998), for example, points
to the discursive practices in institutions as especially productive for the study of
language ideology. Available research suggests that mass media, and newspapers in
particular, are a primary site for discursive and ideological reproduction in modern
societies (e.g., Fowler, 1991; see also papers in Johnson & Ensslin, 2007). Studies of
language ideology thus often focus on public institutional discourses and those of
newspapers in particular, as “newspapers are self-conscious loci of ideology production”
(DiGiacomo, 1999, p. 105) as well as “key sites for language ideological debates between
various kinds of social actors” (Ensslin & Johnson, 2006, p. 155). Furthermore, if we
accept the view of newspapers as a discourse community with which the audience
identifies, it follows that “their average lexicon shapes, describes and expresses what is
accepted by [the] community” itself (Bassi, 2010, p. 209). This latter point is particularly
important for the lexical approach to dominant public language-related discourses and
language ideologies proposed in this study.
The principal goal of this study is to investigate the links between language-related
discourse, language ideologies, and ethnonationalist discourse in mainstream
newspapers published in Serbia, the largest Central South Slavic nation. Language
ideologies have been closely related to nationalist discourses since the inception of
nationalism and the one-nation-one-language-one-territory trope (e.g., Bauman & Briggs,
2000). They continue to be important for the construction of national identities in Europe
and elsewhere, and are in evidence in news writing (e.g., Blommaert & Verschueren, 1998).
Language-related discourse in the West Central Balkans in the last twenty years has been
dominated by a debate over the ownership of the common language and a concomitant
contestation of ethnonational identities, which are still widely regarded in this area as resting
on linguistic distinctiveness. Although recent research has examined language debates
(e.g., Blommaert, 1999) and the links between language ideology and national identity in
plurilingual and multicultural societies (e.g., Canada, Vessey, 2013a; Spain/Catalonia,
Pujolar, 2007), little attention has been paid to contexts with minimal linguistic
differences between groups such as the West Central Balkans, particularly from a
quantitative or mixed methods perspective. This study is an attempt to close that gap.
1.2 Sociolinguistic History and the Role of Language Ideologies in Contemporary Balkans
A nation has nothing holier nor dearer than its natural language, for it is only
through language that a nation, as a particular society, continues or vanishes.
Ljudevit Gaj (1835)
1.2.1 The symbolic importance of language. Language is one of the key
defining characteristics of Homo sapiens as a species1 and as such it has always been of
paramount importance to humans: as a means of communication, as an identity marker,
as a cultural tool. Its importance as an ideological tool in the struggle for hegemonic
power in late modernity, however, has been growing further still (Bourdieu, 1991;
Fairclough, 2001; Foucault, 1972; Habermas, 1989; Skutnabb-Kangas, 2000). Although
the societies of the western and central Balkans, now collectively known as the former
Yugoslavia, do not always fit neatly into the category of late modernity, the significance of
language in both the traditional and postmodern senses is perhaps nowhere greater.
Indeed, as Robert Greenberg (2004) notes in the conclusion to his book Language and
identity in the Balkans: Serbo-Croatian and its disintegration, “in the former Yugoslavia
the power of language has at times reached absurd proportions” (p. 159). In the former
Yugoslavia, one might add, echoing Ljudevit Gaj’s words from the epigraph above,
language ideology produced shibboleths that at times meant the difference between life
and death.
Although such language (and language-ideological) conflict is not unique and may even
represent a “sociolinguistic universal” (Ford, 2001), the language situation in
the former Yugoslavia is complex and can only be understood with reference to the
historical and sociopolitical trajectories of the region. Current academic treatments of
language in late modernity, especially vis-à-vis globalization, almost invariably discuss
colonialism as the backdrop for sociolinguistic developments globally (e.g., Wright,
2004). Ironically, despite the oft-repeated references to the region’s history, discussions
of language in the former Yugoslavia, academic and otherwise, often fail to appreciate the
importance of the region’s own colonial history, which, it is sometimes forgotten, is
longer than most and, unlike most, is the product of both Western and Eastern
imperialisms. Beyond the direct physical, political, legal, cultural, linguistic, etc.,
impacts of the dueling colonialist enterprises, and at least as important, is the impact of
the seventeenth- and eighteenth-century Western language ideologies, which were
received “enthusiastically” in Eastern Europe (Edwards, 1985; for an illustrative
example, see Gal, 2001), and as I hope to show, especially so in the Balkans (cf. Irvine &
Gal, 2000, especially pp. 60-71). Here, two strands of thought are particularly important:
the Lockean rationalization of language in the empiricist tradition of the seventeenth-
century English Enlightenment and, even more so, the Herderian idealization of folk
language in the spirit of eighteenth-century German Romanticism (Bauman & Briggs,
2000). Finally, it is especially important to recognize the characteristic
instrumentalization of language in ethnonationalist projects that these ideologies made
possible. Anachronistic yet coinciding with “the rise of small nations” (Wright, 2004) at
the end of the twentieth century and the beginning of the twenty-first, the Balkan
ethnonationalisms revived the Herderian ideal of a one-to-one equation between nation,
language, and territory and employed it in their projects of “contrastive self-
identification” (Fishman, 1972, p. 58), which have themselves been characterized by an
obsession with (purported) linguistic authenticity. The outcome has been what Greenberg
(2004), following Heinz Kloss, called the “nominal language death” of Serbo-Croatian,
the erstwhile common yet polycentric standard (Kordić, 2010), as well as a multiplicity
of mutually contested language ideologies (cf. Gal, 1998) that wave the successor
languages as “flags” (Friedman, 1999). It is these resulting language ideologies, deeply
involved in the ethnonationalist projects through their deployment in nation-building via
public discourses, that I propose to examine in this dissertation.
As already noted, for a proper understanding it is necessary to anchor any analysis
of language ideology in its historical and sociopolitical contexts (cf. Irvine & Gal, 2000).
I will therefore first provide a brief discussion of the language history, language situation,
and language politics of the region, before moving on to offer a rationale and delimitation
for this study. The following account draws on Greenberg (2004), a rare comprehensive
and fairly neutral treatment of the topic,2 and to a lesser extent on Katičić (1997),
Carmichael (2000), and Dronjic (2011), as well as my own emic perspective.
1.2.2 A brief sociolinguistic history of West Central Balkans. The Slavic
languages of the former Yugoslavia (with the addition of Bulgarian) form the southern
part of the Slavic language group. Slovenian at the northwest and Macedonian at the
southeast ends of the area are separate languages of the Abstand type (Kloss, 1967),
whereas the larger central part of the area (Central South Slavic), spanning Croatia,
Bosnia-Herzegovina, Serbia, and Montenegro, features a continuum of a small number of
mutually comprehensible dialects, with one particular dialect (the Neo-Štokavian)
spanning all four countries and all four ethnic groups (i.e., Bosniaks, Croats,
Montenegrins, and Serbs). Considering the long and varied colonial history of the
region3 and the concomitant differences in culture and religion,4 it is remarkable that a
common dialect would develop and survive over the centuries. This became especially
important in the nineteenth century as the peoples of this region sought independence
and, following the trend in the rest of Europe, the formation of their own nation-states.
Concurrently with the drive for independence, however, a pan-Slavic linguapolitical
movement came into existence in Croatia (the Illyrians) which sought the unification of
South Slavs and their language based on a common dialect. Linguistic unification being
a more realistic goal than political unification at the time,5 a group of Serbian and
Croatian linguists and literary figures met in 1850 in Vienna, Austria, to produce what
would become known as the Vienna Literary Agreement, which is widely considered to
be the inception of a common language standard for the Central South Slavic area (but
see Katičić, 1997, for a problematization of this view). The agreement was non-binding,
however, and it did not venture far beyond the status planning decision to base the
common standard on the “southern dialect” (i.e., the Neo-Štokavian) rather than on an
artificial amalgam of existing dialects (for the original text and an English translation of
the agreement, see Greenberg, 2004, pp. 168-171). Crucially, the name for this new
common literary standard was left unspecified.6
It is a truism, as Ricento (2006a), for example, notes, that language is inextricably
linked to power. The common standard that had been agreed upon in 1850 in Vienna
would therefore have to wait for a change in political circumstances before it could be implemented.
However, by the time such circumstances materialized at the end of World War I, the
politics of language in the region had also changed. Despite an initial defeat in the war
with Austria that began in the wake of the assassination of Archduke Franz Ferdinand of
Austria in Sarajevo, Serbia eventually recovered and joined her Triple Entente allies in
victory over Germany and Austria-Hungary. Owing to the war effort and continued
Russian support, Serbia was then granted its own regional sphere of influence, which
resulted in the formation of the Kingdom of Serbs, Croats, and Slovenes in 1918, the first
joint South Slavic state. Importantly, Serbia was the only South Slavic nation that
entered the union as a military power and an independent state,7 while ethnic groups
other than Serbs, Croats, and Slovenes (i.e., Bosniaks, Macedonians, and Montenegrins)
did not receive any political recognition at this time. Much like the Vienna Literary
Agreement, the new state turned out to be largely a Serbo-Croatian affair, wherein the
Croats fought to resist the sometimes real, sometimes perceived Serbian hegemony.8 The
political bickering destabilized the country, so in 1929 the reigning Serbian monarch,
King Alexander, seized the opportunity to change the constitution and with it,
symbolically, the country’s name to Yugoslavia.9
The period between 1850 and 1929, on the other hand, saw more status and
corpus planning work (see Greenberg, 2004, p. 54, for an overview of landmark events).
But, despite a shared dialect, the region harbored a number of quite disparate linguistic
traditions: there were two different alphabets in use (Latin and Cyrillic) and three
alternative pronunciations (Ekavian, Ikavian, and Ijekavian), as well as differences in
lexis and linguistic culture (e.g., in attitudes toward popular speech as the basis for a
standard). In addition to this, the standardization in Serbia had proceeded along divergent
lines since its independence in 1878. In 1913 Jovan Skerlić, a Serbian linguist, thus
attempted to resolve the major differences by proposing a compromise whereby the Serbs
would give up Cyrillic for the Latin alphabet while the Croats would switch from the
Ijekavian to Ekavian (i.e., Serbian) pronunciation, but this was never seriously
entertained. Furthermore, concurrently with King Alexander’s drive for a tighter union
and with government support, another prominent Serbian linguist, Aleksandar Belić,
published in 1930 an orthographic manual for Serbo-Croatian in an attempt to implement
Skerlić’s proposal regarding the choice of pronunciation, in Serbia’s favor. Needless to say,
this was opposed and even resented by most Ijekavian speakers (Bosniaks, Bosnian and
Croatian Serbs, Croats, and Montenegrins), but only the Croats had any political clout to
resist the Serbs. This was a prelude to a period of rapidly worsening inter-ethnic
relations, particularly between the Serbs and Croats, which culminated in the Yugoslav
capitulation after a brief war with Nazi Germany and the formation of a Croatian Nazi
puppet state (NDH) in 1941. Eager to dissociate Croatian from Serbian because of the
(perceived) implications of a close association for Croatian national identity, the ultra-
nationalist NDH government moved immediately upon its establishment to declare
Croatian as a separate language and embarked upon an aggressive program of re-
standardization, switching from the common phonetic to an etymological writing system
and introducing numerous archaisms and neologisms alike, among other innovations.
Subsequently, it also embarked upon an equally aggressive campaign of ethnic cleansing
which culminated in genocide against the Croatian Serbs (as well as Jews and Roma).
At the same time, the Yugoslav Communist Party guerrilla force, which was
composed of members from all ethnic groups, was gaining strength; headed by Josip
Broz Tito, the Partisans first founded the second Yugoslav state in 1943 and then defeated
both the German and Italian occupiers and their mostly Croatian and Serbian
ultranationalist collaborators (Ustashas and Chetniks). The subsequent language policy
of the Communist government is widely considered to have been committed to equality
among the constituent peoples (i.e., ethnic groups) and tolerance of minorities, although it
was not immune to certain problematic compromises. In accordance with its ideology of
“brotherhood and unity”, the second Yugoslavia eventually reinstated the common Serbo-
Croatian standard as the official language, while also recognizing Slovene and
Macedonian, as well as a number of minority languages such as Albanian; Bosniaks and
Montenegrins were left out again, however.10 In order to resolve some of the old issues
and chart a new course, a new meeting of linguists was called in 1954 in Novi Sad,
Serbia. Again, as in 1850, the meeting in Novi Sad included only Serb and Croat
linguists. The compromise agreement they reached, known as the Novi Sad Agreement,
consisted of ten conclusions concerning status and corpus planning (for the original text
and an English translation of the agreement, see Greenberg, 2004, pp. 172-174). The
agreement established a new bicentric standard: now officially named Serbo-Croatian or
Croato-Serbian, it would have two equal “variants”, an Eastern/Ekavian and a
Western/Ijekavian one (i.e., Serbian and Croatian), with equal use of the two alphabets
(Latin and Cyrillic) throughout. Also agreed was joint codification work on new
orthographic manuals, grammars, and dictionaries, as well as terminology development.
Some Croatian linguists now argue that Serbo-Croatian never really existed (e.g.,
Barić et al., 1999; Katičić, 1997), whereas Serbian linguists generally reject this thesis.
Greenberg (2004), for his part, points to the fact that linguistic unification, at least the
original one in 1850, was not forced upon the Serbs and Croats by anyone, and this is
certainly true (even though this view ignores the imposition of the linguistic union
on Bosniaks and Montenegrins). However, it is important to note that the historical and
political circumstances in the region, the relations between the different ethnic groups,
again especially Serbs and Croats, as well as the significance of the language issue, had
all drastically changed by 1954. This time, the Serb and Croat linguists produced what
now seems a mere tactical agreement, likely expecting that it would not last. And, of
course, it didn’t. Some twelve years later, in 1966, an unauthorized dictionary of
Serbian with clear nationalist and anti-Communist overtones was published
(Moskovljević, 1966). A year later, the Croats responded by issuing the “Declaration
on the Name and Position of the Croatian Literary Language”, a direct challenge to the
Serbo-Croatian common standard, and the joint codification work that was
underway would soon stop. Despite a swift intervention by the federal authorities, who
rightly saw this turn of events as a danger to ethnic relations in Yugoslavia, the writing
was on the wall: the project of unification was over and it was only a matter of time
before Serbian and Croatian parted ways once more.
Greenberg (2004, p. 32) cogently notes that this language conflict was
symptomatic of a more general restructuring of the federal state, which was moving
toward decentralization; de jure devolution of a number of important powers from the
federal to the republic (i.e., state) level was finally enshrined in the 1974 rewriting of the
constitution. In terms of language policy, this is particularly significant because the new
constitution also devolved language policy to the republic level, effectively opening the
door to the introduction of a polycentric standard. This was, of course, seized upon by
Croatia but, equally importantly, also by the authorities in Bosnia-Herzegovina and
Montenegro, who had had no voice in any of the previous decisions.11 To be sure, official
federal policy, which in the Communist-run Yugoslavia had the force of dogma, made it
impossible to change the name of the language from Serbo-Croatian or Croato-Serbian
into anything else, but new standard varieties were nevertheless introduced under the
euphemistic label “standard linguistic idiom”. Mindful of the ethnic heterogeneity of
Bosnia-Herzegovina, the republic’s authorities opted for a pan-ethnic standard which
included elements from the idioms of all three major ethnic groups (Bosniaks, Croats,
and Serbs) but was anchored in the idiom of the Bosniaks as the largest group, some
elements of which in their turn had come to be shared by Bosnian Serbs and Bosnian
Croats (e.g., frequent use of Turkish loans in everyday speech); both alphabets remained
in use and retained an equal status.12 However, the Serbian intellectual elite largely
interpreted this development as a threat to both the integrity of Yugoslavia and the ethnic
identity of their co-ethnics outside of Serbia, who made up a sizeable minority in Croatia
and roughly a third of the population of Bosnia-Herzegovina. This tension would simmer
more or less quietly until the resurgence of open nationalism after the Yugoslav
Communist Party had relinquished power and called free elections in 1990.
The “contrastive self-identification” between the different ethnic groups
mentioned above was on full display during the election campaigns of 1990, while
languages as the cornerstones of ethnonational identities were, indeed, waved as “flags”.
Needless to say, such a development did not bode well for the federation and the issue of
the political future of the country became “the question of all questions” during the
campaigns and particularly after the elections. As the largest ethnic group spread over
several republics and one that was effectively in control of the federal state as well as
overrepresented in the oversized federal military, Serbs stood to lose the most from a
dissolution of Yugoslavia; everyone else stood to gain their independence. These
positions proved to be irreconcilable in the ensuing negotiations among the newly elected
presidents of the republics on the future of the federation. The resulting stalemate
was broken by declarations of independence by Slovenia, Croatia, and Macedonia in
1991, followed by Bosnia-Herzegovina in 1992; Montenegro remained in a federation
with Serbia until 2006, when it too regained independence. With the exception of Bosnia-
Herzegovina, where this issue was more complicated on account of its ethnic
composition,13 each newly independent country declared its majority language official,
and the unified Serbo-Croatian standard thus formally ceased to exist (see Sudetic, 1993,
for a contemporary report). Having firm control over the powerful Yugoslav military, the
Serbs rejected these declarations of independence, turning political into military conflicts,
with war flaring up first in Slovenia, then in Croatia, and finally in Bosnia-Herzegovina.14
After a series of wars, including the Bosnian War, the longest and a particularly vicious one, which
culminated in a twin aggression against Bosnia-Herzegovina by Serbia and Croatia and
genocide against the Bosniaks by the Serbs, the former Yugoslavia metamorphosed into
seven independent states, each with its own language policy, replacing Serbo-Croatian
with Bosnian, Croatian, and Serbian as official languages in Bosnia-Herzegovina,
Croatian in Croatia, Montenegrin in Montenegro, and Serbian in Serbia (see Bugarski,
2004).
1.2.3 Language, identity, and ethnonationalism in the contemporary Balkans. It is
clear from the discussion above that despite a roughly 150-year-long history of
unification attempts, Central South Slavic has always been and remains a polycentric
language (cf. Kordić, 2010). At the same time, this polycentricity and the right to codify
and particularly name varieties have been fiercely contested since the nineteenth century
by the Serbian and Croatian intellectual elites, which have, with varying degrees of
success, continually and intensely ideologized the “common” language in order to
manipulate ethnonational identities on the basis of purportedly objective scientific
theories of language and nationalism. Hence, it has been of little consequence that,
“A potentially classical example to disprove the existence of objective criteria of
nationhood is a comparison between the Serbs and Croats, on the one hand, and
the Flemish and the Dutch, on the other. In the Serbian-Croat case, existing
linguistic differences (underscored by a different orthography) have become
highly symbolic for the discontinuity, whereas in the Flemish-Dutch case (where
the linguistic differences are of almost exactly the same type and degree)
language is the main symbol of cultural unity. On all other accounts, the
differences are completely analogous as well – history (Ottoman rule for Serbia
versus Spanish rule for Flanders, resulting in long periods of political separation
from Croatia and Holland, respectively) and religion (Orthodox versus Catholic in
the one case, Catholic versus Protestant/Calvinist in the other)” (Blommaert &
Verschueren, 1998, p. 199).
As noted above, like the rest of Eastern Europe, the peoples of the Balkans have
embraced the essentialist Western language ideology which views language as the
embodiment of the character of the “natural” group that produced it, i.e., the nation (see
Bauman & Briggs, 2000). Consequently, language has been a primary site for the
construction of and struggle over ethnonational identity and the concomitant group rights
since “ideologies that appear to be about language, when carefully reread, are revealed to be
coded stories about political, religious, or scientific conflicts” (Gal, 1998, p. 323). This is
evident in public and scholarly discourses on language both in Europe and in the Balkans.
In a study of the role of language in European nationalist ideologies, Blommaert and
Verschueren (1998), for example, note that “the absence of the feature ‘distinct language’
tends to cast doubts on the legitimacy of claims to nationhood” (p. 192). Furthermore, as
Irvine and Gal (2000) argue, in “the political contestation surrounding contrasting
scholarly claims [i]n Macedonia, linguistic relationships came to be used as authorization
for political and military action that changed sociolinguistic practices, thereby bringing
into existence patterns of language use that more closely matched the ideology of
Western Europe” (p. 60). But it would be naïve, of course, to think that Western ideology
has been merely passively adopted. As the recent localizations of the contemporary
Western discourse on terrorism in the Serbian and Croatian press (Erjavec, 2009) show,
Western ideologies are also appropriated and function on the basis of “fractal recursivity”
(Irvine & Gal, 2000, p. 38) to serve specific local purposes. As Irvine and Gal (2000)
further contend, “[t]he continuing intensity of contestation” over language and
ethnonational identity is thus “hardly surprising, given the consequences envisaged and
authorized by the reigning language ideology and occasionally enacted under its auspices.
It is an ideology in which claims of linguistic affiliation are crucial and exclusivist
because they are also claims to territory and sovereignty” (p. 72, my italics). Seen in this
light, the continuing Serbian (and in the case of Bosnian, also Croatian) refusal to
recognize other groups’ ownership rights, including the right to name their language (see
Tošović, n.d., for a rationalization of this contestation), becomes rather transparent.
Although, as we have seen, language (ideological) debates have continued throughout the
period of the “joint” language development, the period since 1990 is
especially significant because it has seen the end of Serbian (and Croatian) lingua-
cultural hegemony and an unraveling of the “common” standard, particularly in Bosnia-
Herzegovina and Montenegro. These processes have been reflected in and partly
constituted through discourses produced for and directed at the public “as a language-
based form of political legitimation” (Gal & Woolard, 2001, p. 4). What is still missing
in this particular case, however, is empirical attestation of these dominant ideologies in
order to determine their specific contents and modi operandi. But, before we turn to the
methodological approach to the identification of language ideologies in this dissertation,
let us take a detailed look at the concepts of ideology and discourse, which are often
contested themselves.
1.3 Study Outline
The remainder of this study is organized as follows. Chapter 2 presents a
literature review, divided into sections about theoretical approaches to ideology and
language ideology, and empirical approaches to language ideology. In Chapter 3 a
detailed overview of the study is given, including the research questions, research design,
construct definitions, and gaps. Chapters 4 and 5 discuss the data and methods employed,
while Chapters 6 and 7 present the results and discussion (by research question). The
conclusion, limitations, and suggestions for future research are given in Chapter 8.
Appendices A-G detail relevant preliminary analyses and show full lists of keywords and
collocations.
2. Literature Review
2.1 Theoretical Approaches to Ideology
2.1.1 Historical development. As is well known, the concept of ideology
dates from the late eighteenth century. The term was coined by the French
philosopher Destutt de Tracy who, in typical Enlightenment fashion, conceived of
ideology as a positivistic science of ideas based in physiological sensations (a branch of
zoölogy), optimistically hoping to arrive at a full understanding of the human mind and
achieve a rational organization of society (Eagleton, 1991; Silverstein, 1998; Woolard,
1998; see also Aarsleff, 1982). As both Silverstein and Eagleton note, however, like
many other terms ending in “-ology”, ideology quickly shifted in meaning from
“field-of-scientific-study” to “object-of-scientific-study” (Silverstein, 1998, p. 139, Note
1) and thus from “scientific study of human ideas” to “systems of ideas themselves”
(Eagleton, 1991, p. 63). Complicating matters further, over time the term has developed
“a whole range of useful meanings, not all of which are compatible with each other”
(Eagleton, 1991, p. 1; see also Blommaert, 2005).
2.1.2 Theoretical approaches. Different authors emphasize different aspects of
this semantic and conceptual quagmire. The literature on ideology is vast, spanning
many different fields and research traditions, so what follows is perforce a selective but,
for present purposes, hopefully functional treatment (for surveys, see, for example,
Eagleton, 1991; Thompson, 1984; and van Dijk, 1998). Woolard and Schieffelin (1994)
point to two basic divisions in the study of ideology, which are also applicable to the
study of language ideology. The first concerns the truth value of ideology and
differentiates between neutral and critical uses of the term, whereby neutral views of
ideology are ideational and all-encompassing (i.e., “all cultural systems of
representation”), whereas critical views of ideology are instrumental and particularistic
(i.e., “aspects of representation and social cognition with particular social origins”). This
is paralleled by Blommaert’s similarly theoretical distinction between ideology as “a
generalizing phenomenon characterizing the totality of a particular social or political
system, and operated by every member or actor in that system” and ideology as “a
specific set of symbolic representations […] serving a specific purpose, and operated by
specific groups or actors” (2005, p. 158, italics in the original). Recognizing this basic
division between “the wider and narrower senses of ideology”, Eagleton (1991, p. 3), on
the other hand, also points to interpretations of ideology as “illusion, distortion and
mystification” (i.e., concerned with the truth value of ideology) and those instead
“concerned with the function of ideas within social life” (i.e., the metapragmatics of
ideology). Most authors seem to agree that there is a further general distinction to be
made between the competing views of ideology: that between views of ideology as a
cognitive, ideational phenomenon (i.e., “group-schemata”, Blommaert, 2005, p. 162)
existing below the level of discursive consciousness, and those which see ideology as a
materialist phenomenon, a set of signifying practices arising from lived relations
(Althusser, 1971).
This latter point is particularly important for the second division in the study of
ideology as noted by Woolard and Schieffelin, which is epistemological and concerns
what they call “the siting of ideology”. In their own words, “[a]lthough ideology in
general is often taken as explicitly discursive, influential theorists have seen it as
behavioral, pre-reflective, or structural, that is, an organization of signifying practices
[which need not be linguistic] not in consciousness but in lived relations” (Woolard &
Schieffelin, 1994, p. 58). Put differently, depending on one’s viewpoint, traces of
ideology can be located either in explicit manifestations of discursive consciousness (i.e.,
metapragmatic commentary, cf. Silverstein, 1993), or they must be inferred from the
totality of lived relations in a particular community (e.g., linguistic usage or other
signifying practices). This distinction has methodological implications for the study of
ideology as well as of language ideology that are most pertinent to this dissertation; I shall
therefore return to this issue toward the end of this chapter.
2.1.3 Definitions. As a consequence of the multiplicity of contributions from
different fields, research traditions, and political positions noted above, definitions of
ideology abound. I rely here on the account provided by Terry Eagleton in his widely
cited book Ideology: An introduction (1991). Following Raymond Geuss, Eagleton
makes a distinction between “descriptive”, “pejorative” (cf. neutral vs. critical above) and
“positive” definitions of ideology. He goes on to formulate “descriptive” definitions of
ideology as those that view ideology as “belief-systems characteristic of certain social
groups or classes, composed of both discursive and non-discursive elements”;
“pejorative” definitions of ideology as those that view ideology as “a set of values,
meanings and beliefs which is to be viewed critically or negatively” because it
legitimates an unjust social order; and “positive” definitions of ideology as those that
view ideology as “a set of beliefs which coheres and inspires a specific group or class in
the pursuit of political interests judged to be desirable” (Eagleton, 1991, pp. 43-44).
Woolard (1998) cites the anthropologist Clifford Geertz and the sociologist Karl
Mannheim as two of the most prominent advocates of a “descriptive” understanding of
ideology as the totality of social knowledge and a medium of meaning for social
purposes. But, as she also notes, the chief criticism of such conceptions of ideology is
that they neglect power relations, which is precisely the focus of the “pejorative”
definitions. Perhaps the most widely cited, but now also most widely rejected pejorative
definition (see Eagleton, 1991, especially pp. 10-26), is of ideology as “false
consciousness”, originally put forward by Karl Marx and Friedrich Engels in their book
The German Ideology. This understanding of ideology implies an accommodation and
acquiescence on the part of the proletariat to the bourgeois hegemony (Gramsci, 1971)
such that members of the working class are unable to identify their true class interests due
to their adherence to the bourgeois worldview. Similar to this is Thompson’s (1984, p. 4)
definition of ideology as “essentially linked to the process of sustaining asymmetrical
relations of power – to maintaining domination […] by disguising, legitimating, or
distorting those relations.” Eagleton finally cites Lenin’s approval of the term “socialist
ideology” as well as its acceptance by other “radical” theorists such as Sorel and
Althusser as an example of a “positive” definition of ideology. Interestingly, while the
“pejorative” and “positive” conceptions of ideology derive from the Marxist tradition and
from work on the political left, the “neutral” conceptions of ideology are ascribed to
work deriving from the neo-Kantian tradition in philosophy and the Durkheimian tradition
in sociology (Blommaert, 2006a). Importantly for present purposes, however, one
should note that what is understood as “an unjust social order” and “political interests
judged to be desirable” will, of course, depend on one’s ideological position and thus
presents a problem of its own, especially in an environment of pathological
ethnonationalist contestation of identity such as that of the contemporary Balkans.
2.1.4 Conceptualizations. Theoretical accounts of ideology, finally, often speak
of conceptual “strands” or “themes” (Woolard, 1998) and “layers” (Blommaert, 2005)
organized in “a progressive sharpening of focus” (Eagleton, 1991). Woolard (1998) thus
identifies four such strands which can be summed up as follows: (1) ideology as
ideational or conceptual, mental rather than social phenomena; (2) ideology as reflective
of particular social experiences or interests, dependent on the material aspects of human
life; (3) ideology as signifying practices in the struggle for power; and (4) ideology as
distortion which can, but need not, derive from an interest in the legitimation of power.
Somewhat similarly, Eagleton (1991) conceptualizes ideology in a series of six steps,
proceeding from the very general and neutral definition of ideology as “the whole
complex of signifying practices and symbolic processes in a particular society” (p. 28) to
the specific and critical definitions of ideology as “ideas and beliefs which help to
legitimate the interests of a ruling group or class specifically by distortion” (p. 30), and
similarly false beliefs arising from the material structure of society rather than the
interests of a dominant class (e.g., commodity fetishism). Blommaert (2005), on the
other hand, points to the synchronization of different layers of ideology, i.e., “different
ideologies […] operat[ing] at different levels of historicity” (p. 175) but at the same
historical moment, as the likely cause of the “terminological muddle” (p. 161).
According to Blommaert, then, the multifacetedness of ideology is primarily a
consequence of the concurrent and, more often than not, opaque operation of multiple
ideologies of different orders and varying historical trajectories. But perhaps the most
important, if also the most general, conceptualization of ideology for the purposes of this
dissertation is that of ideology as the fundamental element in the triumvirate it forms with
discourse and power (e.g., Blommaert, 2005; Eagleton, 1991), which points to the
importance of the linguistic dimension of ideology as well as the ideological dimension
of language, and to which we now turn.
2.2 Theoretical Approaches to Language Ideology
2.2.1 Historical development. Although, as Silverstein (e.g., 2000) himself has
noted, the leading early-twentieth-century American anthropologists and linguists such as
Franz Boas, Leonard Bloomfield, and Edward Sapir did consider the issue of language
ideology only to dismiss it as inconsequential (Benjamin Whorf, according to Silverstein,
is the exception here), the emergence of the study of language ideology is now commonly
dated back to his own work, and his seminal 1979 paper “Language structure and
linguistic ideology” in particular (see Blommaert, 2006b; Kroskrity, 2004; Woolard,
1998). Kroskrity (2004) outlines briefly but usefully the history of twentieth-century
structuralist neglect of speaker awareness and the non-referential functions of language in
both anthropology and linguistics, pointing to the work in the fields of ethnography of
communication and interactional sociolinguistics by prominent figures such as Dell
Hymes and John Gumperz as an important precedent for the later work in language
ideology. On account of this neglect (which de Beaugrande, 1999, contends is due to
“scientism”, itself an ideology) and in contrast to ideology, then, language ideology as a
subfield of academic inquiry emerged only in the last two decades of the twentieth
century and is “still under construction” (Blommaert, 2006a). Furthermore, it does not,
as Woolard and Schieffelin (1994) noted, yet have a single core literature, although one is
beginning to coalesce primarily around linguistic anthropological attention to the topic
(see essays in Gal & Woolard, 2001; Schieffelin, Woolard & Kroskrity, 1998, 2000;
Wilce, 2000). There has also been a parallel and related research program in the
framework of critical discourse analysis, which has examined the role of language in
contemporary capitalist societies from a critical perspective (e.g., Fairclough, 2001;
Blackledge, 2005; Blommaert, 2005; Wodak, 2012; van Dijk, 1998), as well as a number
of contributions from applied linguistics generally and language policy and planning in
particular (e.g., Blackledge & Pavlenko, 2002; Blommaert, 1999; Jaffe, 1999; Lippi-
Green, 2007; McGroarty, 2008, 2010; Ricento, 2000).
2.2.2 Theoretical approaches. Research into language ideology has been as
multifarious as research into ideology itself, similarly spanning many different fields and
research traditions. Woolard and Schieffelin (1994) and Woolard (1998) provide wide-
ranging overviews of some of the most important work in the contributing fields and
research traditions. These include ethnography of speaking, multilingualism, literacy
studies, historiography of linguistics, as well as linguistic anthropology, language contact,
colonial studies, sociology, and, perhaps most importantly for present purposes, language
policy, language politics, and studies of identity and nationalism. An important, albeit
sometimes fuzzy, dividing line here has been between research foregrounding the
dialectic between language ideology and social life, referred to above as the
metapragmatics of (language) ideology, and research foregrounding the dialectic between
language ideology and the linguistic system itself (e.g., Dirven, Hawkins & Sandikcioglu,
2001; Seargeant, 2009; Silverstein, 1979). Note that, on account of my focus on public
discourses and the role of language ideology in the (re)construction and maintenance of
contemporary ethnolinguistic identities, I am primarily concerned here with the former
and will not be devoting much attention to the latter, even if the two are certainly
complementary and could be fruitfully combined (for an example, see Irvine & Gal,
2000).
2.2.3 Definitions and conceptualizations. Again, as with ideology, definitions
of language ideology abound, reflecting different conceptualizations and emphases.
Silverstein (1979, p. 193) originally defined language ideologies as “sets of beliefs about
language articulated by users as a rationalization or justification of perceived language
structure and use.” Most other definitions, however, are less concerned with language
structure; moreover, beliefs about language are, more often than not, tacit and can only be
inferred by analyzing actual language-related practices and decisions (McGroarty, 2008).
Definitions of language ideology also differ in scope. Rumsey (1990) thus offers a very
general definition of language ideologies as “shared bodies of commonsense notions
about the nature of language in the world” (p. 346). Somewhat more narrowly, Seargeant
understands language ideology as “entrenched beliefs about the nature, function, and
symbolic value of language” (2009, p. 346). Spolsky (2004), on the other hand, focuses
on the pragmatic aspects of language ideologies as belief systems which determine
language attitudes, judgments, and behavior. Curiously, then, many definitions of
language ideology appear to fall in the “neutral” camp (see the conceptualizations of
ideology above), eschewing the issue of power relations and its implications for the
understanding of language ideology. Contrary to this trend, and more in
line with my own view, Irvine (1989) sees language ideology as “the cultural system of
ideas about social and linguistic relationships, together with their loading of moral and
political interests” (p. 255), while Errington (2001) makes this even more explicit by
referring to language ideologies as “situated, partial, and interested […] conceptions and
uses of language” (p. 110). Interestingly, although these last two definitions are not quite
neutral in the above sense, exhibiting as they do a critical awareness of the necessary
social situatedness of any ideology, they nevertheless stop short of fully articulating this
particular aspect of the concept. Further, perhaps due to the general poststructuralist
distaste for even remotely essentialist notions, there is no mention here of “illusion,
distortion and mystification” or “false consciousness” (but see Spitulnik, 1998, especially
p. 164).
This, however, is not to say that research into language ideology is generally
unaware or neglectful of its power dimension. On the contrary, it seems fair to say that
virtually all treatments of language ideology, whatever their foci, pay careful attention to
power and its implications in their analyses. A good example of this is the collection of
essays in Blommaert (1999), which anchors analysis in the power-laden concept of
(political) debate. This is further illustrated by some of the conclusions reached thus far
in the research on language ideology. Where syntheses of research on ideology identify
“strands” and “layers” (see above), syntheses of research into language ideology, more
specifically, speak of “features” or “dimensions” and “levels of organization” (Kroskrity,
2000b, 2004). Kroskrity (2004, pp. 501-509) identified five such levels of organization
of language ideology that emerged from the existing literature, three of which clearly
indicate the centrality of the concern with power relations in language-ideological
research: (1) group or individual interests (i.e., “language ideologies represent the
perception of language and discourse that is constructed in the interest of a specific social
or cultural group”); (2) multiplicity of ideologies (i.e., “language ideologies are profitably
conceived as multiple because of the plurality of meaningful social divisions”); and (5)
role of language ideology in identity construction (i.e., “language ideologies are
productively used in the creation and representation of various social and cultural
identities [such as] nationality and ethnicity”).
Yet even once we recognize that, despite the apparent definitional shortcomings,
language-ideological research does, of course, incorporate power relations, the fact
remains that there is a tendency, at least in definitions, to treat language ideology as a
static phenomenon (a set of existing, given beliefs, notions, ideas, or conceptions), a fait
accompli, as it were. Thus we may know what a particular language ideology is (if, of
course, we can agree on a shared understanding of it), but we are less sure of how it
operates. However, language ideology, as Spitulnik (1998) points out, is both conceptual
and processual, and so our attention must be redirected to specific language-ideological
practices. When we adopt such a sociolinguistically dynamic view, we begin to
understand language ideologies not only as “ideas with which participants and observers
frame their understandings of linguistic varieties” but also as processes through which
they “map those understandings onto people, events, and activities that are significant to
them” (Irvine & Gal, 2000, p. 35), at which point we can finally consider their effects and
consequences.
2.3 Empirical Approaches to Language Ideology
This section provides a brief look at the origins and the development of the study
of language ideology, and the theoretical and methodological contexts for the current
investigation. This is followed by a discussion of the gaps in the existing literature and
the research questions addressed by the present study.
2.3.1 Theoretical and methodological contexts in language ideology research.
Existing research into public discourses and language ideologies is largely
interdisciplinary and can be classified broadly into one of three main thematic foci:
language ideology and language education (e.g., Hornberger & McKay, 2010); language
ideology and identity, ethnicity, and nationalism (e.g., Kroskrity, 2000a); and language
ideology and social justice (e.g., Blackledge & Pavlenko, 2002). However, it should be
noted that these often overlap, as language-in-education policies, for example, can have
implications for ethnic and national identities as well as social justice (e.g., Lippi-Green,
2007). The theoretical and methodological orientations are similarly heterogeneous.
Perhaps due to the influence of (critical) discourse analysis, the theoretical approaches
have been eclectic and often unsystematic, drawing on and combining a wide range of
theoretical frameworks from various disciplines. The methodological approaches have
been predominantly qualitative (ethnography, historiography, discourse analysis, and
linguistic-philosophical and linguistic-theoretical analyses), but, pertinently for this
dissertation, an increasing number of studies are taking a mixed methods approach,
combining corpus-linguistic techniques and forms of (critical) discourse analysis (see,
e.g., Baker et al., 2008; Partington, 2010).
Despite the theoretical and methodological heterogeneity, however, even a
cursory glance at the available literature reveals a heavy reliance on readily available
textual materials, particularly newspaper articles, although this is less so in studies
focusing on language ideology and language education, which rely on surveys, and
observational (e.g., speech recordings) and experimental (e.g., matched-guise technique)
methods and data as much as on textual pedagogic materials and policy documents. The
choice of context for language-ideological research thus depends to a considerable degree
on its thematic focus. In the case of research into language ideology and language
education, typical contexts include classrooms as well as educational contexts beyond the
classroom such as school districts or university departments (for an overview, see
McGroarty, 2010), whereas research into language ideology and identity, ethnicity, and
nationalism, as well as social justice, relies more on politically framed contexts (e.g.,
state or regional and national entities, with some studies taking a comparative, cross-
national approach). Further, unlike the studies of educational contexts, studies with one
of the other two foci strongly favor textual materials such as newspaper discourse and
corpora of official policy documents or historical documents created in particular
institutional contexts in a variety of genres and registers (e.g., Fitzsimmons-Doolan,
2009; Ricento, 2003). Other contexts include translation work (e.g., Kuo & Nakamura,
2005) and, most recently, computer games (Ensslin, 2010).
Regardless of the thematic focus and methodological orientation, however, most
research into public discourses and language ideologies has been limited to institutional
contexts of one kind or another (e.g., educational system). The primary reason for this, of
course, is that language ideologies are sociocognitive phenomena and so, as noted above,
institutions (whether cultural, political, or religious), as major discourse nodes, are more
likely sites both to contain traces of ideologies and to have an impact on their
reproduction. And yet the problem with an exclusive focus on institutional contexts is
that they tend to be dominated by official, top-down discourses and ideologies, often to
the point of excluding, or at least obscuring, any bottom-up (that is, counter-hegemonic)
discourses and ideologies. More research is therefore needed into contexts that are more
likely to show traces of unofficial or alternative discourses and ideologies, such as
(subaltern) language activism (Jaffe, 1999), online reader commentary (Vessey, 2013b),
and debates in cyberspace (i.e., online fora and social networks; e.g., Johnson, Milani &
Upton, 2010). A note of caution is in order here, however: although anonymized online
communication has the (often dubious) advantage of being free from many constraints of
face-to-face interaction (in addition to ease of access), and thus can offer insight into
attitudes and beliefs unaffected by certain pragmatic considerations, the representativeness
of such data is nearly always an issue, so they should be matched against data from other
sources whenever possible.
2.3.2 Research questions in language ideology research. Partly on account of
the research traditions in the contributing disciplines (i.e., anthropology, critical discourse
analysis) and partly on account of the theoretical and methodological nascence of the
field, research questions in the existing studies are not always stated explicitly and can
thus be implicit or even difficult to identify. The two early foci on language attitudes and
the role of language ideologies in the construction and reproduction of (ethno)nationalist
identities remain prominent in recent studies (Fitzsimmons-Doolan, 2011; Fleischer,
2007; Vessey, 2013a; O’Rourke & Ramallo, 2013); other contributions have attended to
the specific role of public institutions in the production of language ideologies (Spitulnik,
1998), and globalization-related issues such as the manipulation of language in
ideological public discourses on economic neoliberalism, political correctness, and
religious fundamentalism (Crapanzano, 2000; de Beaugrande, 1999; Johnson & Suhr,
2003; Salama, 2011), as well as migration and diasporic communities (Baker et al., 2008;
Fraysee-Kim, 2010).
Unsurprisingly, approaches to the formulation of research questions differ widely,
but here again we can discern some major trends. Earlier studies, which relied more on
ethnographic and historiographic methods as well as various forms of discourse analysis,
tend to ask two types of questions, both of which often include a methodological
component. Studies based on ethnographic and historiographic analyses thus ask macro-
level questions such as the following: What is the structure of language ideology? What
are the consequences (for politics, for research) of language ideologies? (Irvine & Gal,
2000); How have ideologies of language, nation and state been connected to each other
and the practice of sociolinguistics? (Heller, 1999). Studies based on discourse analysis,
on the other hand, tend to focus more on micro-level or localized issues, asking questions
such as: What discourse prosodies (and ideologies) do the term “liberal” and its
derivatives “liberalism” and “liberalization” have? (de Beaugrande, 1999); and, What is
the role of language ideology in the conflict of Catalan and Spanish nationalisms over the
ownership of the 1992 Olympic games in Barcelona? (DiGiacomo, 1999). However,
Blommaert and Verschueren (1998), for example, rely on discourse analysis but ask
macro-level questions (What is the specific role of language in current nationalist
ideologies? What is an adequate methodology for ideology research?), while Jaffe (1999)
relies mainly on ethnography to ask both local and more global questions (What are the
ideological underpinnings of the strategies of language planners in Corsica? How are
their language ideologies rooted in European and French political economies?).
In contrast to these, more recent studies tend to be more synchronically oriented,
localized, and textually based, increasingly relying on mixed methods designs which
combine various corpus-linguistic techniques such as keyword and collocation analyses
and critical discourse analysis (CDA). Here, too, research questions often include a
methodological component. Examples of research questions in the more recent
studies include: What are the similarities and differences between Anglo-American and
German discourses of “political correctness”? (Johnson & Suhr, 2003); What are the
common categories of representation of refugees, asylum seekers, immigrants and
migrants in the British press? Which texts are representative? (Baker et al., 2008); and,
Can the Canadian context provide a useful site for cross-linguistic corpus-assisted
discourse studies (CADS)? How can cross-linguistic CADS shed light on language
ideologies in Canadian newspapers? (Freake, 2011).
Overall, then, research into language ideology seems to be moving away from
predominantly qualitative methods and theoretical and macro-level questions toward
mixed methods and methodological and micro-level questions focusing on localized
contexts. Most pertinently for the field of language policy and planning (LPP), there is
an emerging trend of using mixed methods approaches to examine corpora of official
language policy documents focusing on questions of localized state or national interest
(e.g., Fitzsimmons-Doolan, 2011; Freake, Gentil, & Sheyholislami, 2011). Interestingly,
although the question of the siting of ideology (Woolard & Schieffelin, 1994) is of central
importance to the identification and interpretation of language ideologies, language-
ideological research has so far largely failed to produce integrated accounts which would
examine and compare data from different sites of ideological (re)production (but see
Blackledge, 2005; Jaffe, 1999; and Vessey, 2013b for steps in this direction). At the same
time, the availability of data from various social media, as well as academic sources,
offers an opportunity for more integrated and innovative perspectives to consider and
compare sites of ideological (re)production which demonstrate varying levels of
discursive consciousness.
2.3.3 Types of data used in language ideology research. The different types of
data typically relied upon in language-ideological research have been hinted at above.
Depending on the methods used, they can be qualitative or quantitative and include
survey, observational, and experimental data, as well as data in the form of pedagogic
materials, official policy documents, newspaper and other media discourse, and historical
documents. In addition to these, some data types that have received comparatively less
attention include texts produced under experimental conditions (Wallis, 1998), time
allocation for different languages in broadcast media in multilingual contexts (Spitulnik,
1998), as well as data obtained by way of discussion groups (O’Rourke & Ramallo,
2013).
Newspaper discourse is probably the most frequently used type of data in
language ideology research. The reasons for a relative overemphasis on newspapers and
other print periodicals such as magazines over other potential sources of data are several.
First, and perhaps foremost, as already noted, “newspapers are self-conscious loci of
ideology production” (DiGiacomo, 1999, p. 105) as well as “key sites for language
ideological debates between various kinds of social actors” (Ensslin & Johnson, 2006, p.
155). Second, although they have been losing ground to newer media (e.g., television,
the Internet) for a long time, newspapers remain an influential institution in most
societies around the world, offering researchers a wealth of information in an increasingly
easily accessible and manipulable format (i.e., electronic text). Third, because news
discourse tends to be organized in relatively similar ways across newspapers, and often
across languages as well, newspaper data allow for effective synchronic comparisons of
discourses and ideologies based on independent variables such as political affiliation and
(ethno)national identity. Fourth, the relative constancy of newspaper formats across time
allows for equally effective diachronic comparisons, which are particularly useful in
demonstrating the changing nature of discourses and ideologies over time and their
dialogic relationship with the cultural, social, and economic conditions in which they are
embedded.
2.3.4 Corpus linguistics in research on discourse and language ideology.
Although there is a growing body of research that relies on corpus linguistics to study
discourse (e.g., Baker, 2006; Baker et al., 2008; Baker, 2010; Baker, Gabrielatos &
McEnery, 2013; Mautner, 2007; Partington, 2003; Partington, 2010), corpus-based (or
corpus-assisted or corpus-driven) studies explicitly concerned with language ideology are
still rare (e.g., Fitzsimmons-Doolan, 2009, 2011, 2014; Ensslin & Johnson, 2006; Freake,
Gentil & Sheyholislami, 2011; Subtirelu, 2015; Vessey, 2013a,b). Corpus-based
discourse and language ideology studies make use of a wide variety of corpus-linguistic
tools such as wordlists, clusters, concordances, dispersion plots, collocates, and keywords
(for an excellent introductory overview, see Baker, 2006). Keyword analysis, a statistical
approach to the contrastive analysis of word frequencies (for a detailed explanation, see
further below), is used to identify the lexical features characteristic of research corpora
and thus potentially interesting foci for follow-up discourse analysis. Scott (1997), for
example, uses this approach to identify lexical patterns suggesting gender bias in
contemporary English. Similarly, Subtirelu (2015) compares student evaluations of
university instructors with US and Korean names, finding a bias against instructors with
Korean names rooted in a language ideology of nativism. Scott’s original approach focusing on
individual keywords has been broadened in recent years to include part-of-speech (POS)
and semantic category/field/domain analyses (Culpeper, 2009; Rayson, 2008), as well as
“aboutgrams” (Sinclair, 2006, cited in Warren, 2010, p. 118), the frequently occurring
lexical phrases which point to a text’s (or a corpus’) “aboutness”. While keywords point
to the “aboutness” of a text or corpus, key semantic fields and aboutgrams are taken to be
more directly suggestive of discourses and ideologies extant in the corpora under
investigation.
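To make the keyword procedure concrete, a minimal Python sketch follows. It assumes log-likelihood (G²) as the keyness statistic, a common choice in corpus tools; the statistic actually used in this study is explained further below and may differ. The function names, the toy token lists, and the p < 0.0001 cut-off (G² ≥ 15.13, 1 d.f.) are illustrative assumptions, and tokenization is assumed to have been done beforehand.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 for a word seen a times in a research corpus of c tokens
    and b times in a reference corpus of d tokens."""
    total = a + b
    e1 = c * total / (c + d)  # expected frequency in the research corpus
    e2 = d * total / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a > 0:  # 0 * log(0) is taken as 0
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(research_tokens, reference_tokens, min_g2=15.13):
    """Positive keywords of the research corpus, sorted by descending G2.

    min_g2=15.13 corresponds to p < 0.0001 (1 d.f.); this cut-off is an
    illustrative assumption, not necessarily the one used in the study.
    """
    research, reference = Counter(research_tokens), Counter(reference_tokens)
    c, d = sum(research.values()), sum(reference.values())
    results = []
    for word, a in research.items():
        b = reference.get(word, 0)
        if a / c <= b / d:  # skip words not overused in the research corpus
            continue
        g2 = log_likelihood(a, b, c, d)
        if g2 >= min_g2:
            results.append((word, g2))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

In an application like the present one, the research tokens would come from SERBCORP (or its 5+ hits section) and the reference tokens from a corpus such as SERBCOMP; for a highly inflected language, counting lemmas rather than word forms would likely give more interpretable keyword lists.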
In addition to keyword analysis, corpus-based discourse studies often rely on
collocation analysis as well. Baker (2004, p. 347) argues that keyness and collocation
patterns can alert the researcher to “the existence of types of (embedded) discourse or
ideology.” Baker et al. (2008), for example, base their examination of the lexical patterns
around four core concepts (i.e., search terms) in their study of the discourse on
immigration in Britain on a combination of keyword analysis and collocation analysis.
Baker, Gabrielatos and McEnery (2013) take this approach a step further by using an
online corpus query system called Sketch Engine to grammatically tag their corpus and
then analyze not only lexical but also grammatical co-occurrence patterns between
collocates. Similarly, Freake, Gentil and Sheyholislami (2011) rely on collocation
analysis in addition to keyword analysis to contrast English- and French-language
discourses on the construction of nationhood in Quebec, while Vessey (2013b) adopts a
similar approach to study the language ideological debate on the use of French during the
Vancouver Olympics. Most recently, lexical patterns resulting from collocation analysis
have been used as data in quantitatively more sophisticated analyses.
Fitzsimmons-Doolan (2011, 2014) used factor analysis, a multivariate statistical technique which
groups variables on the basis of their covariance (for details, see further below), on the
collocates of three core concepts identified using a 1.4 million-word corpus of language
policy documents, finding five factors interpretable as language ideologies.
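A span-based collocation search of the kind described above can be sketched as follows. This is a hypothetical illustration, not the procedure of any of the cited studies: it counts words within an L5-R5 window around the node and scores them with one common variant of mutual information, MI = log2(O × N / (f_node × f_collocate × span)). The set of node forms is an assumed stand-in for proper lemmatization of a lemma such as JEZIK, and the minimum frequency threshold is arbitrary.

```python
import math
from collections import Counter


def span_collocates(tokens, node_forms, left=5, right=5, min_freq=2):
    """Score words co-occurring with any node form within an L5-R5 window.

    node_forms is a set of word forms standing in for a lemma (a hypothetical
    substitute for lemmatization). Scores use one common MI variant:
    log2(observed / expected), expected = f(node) * f(word) * span / N.
    """
    n = len(tokens)
    freq = Counter(tokens)
    node_freq = sum(freq[form] for form in node_forms)
    observed = Counter()
    for i, token in enumerate(tokens):
        if token in node_forms:
            window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
            observed.update(window)
    span = left + right
    scores = {}
    for word, o in observed.items():
        if word in node_forms or o < min_freq:
            continue  # ignore the node itself and very rare co-occurrences
        expected = node_freq * freq[word] * span / n
        scores[word] = math.log2(o / expected)
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

Because MI tends to reward rare co-occurrences, a minimum frequency or a second statistic (e.g., log-likelihood or t-score) is usually applied alongside it before collocates are passed on to concordancing or factor analysis.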
Typically, the results of quantitative, corpus-based analysis are used to identify
patterns in the data which are worth pursuing further as well as to “downsample” the
corpus data to a size manageable by a human analyst. Quantitative analysis is then
combined with appropriate micro- and macro-level discourse-analytic techniques in a
hermeneutic circle whereby the results of quantitative analysis lead to qualitative
analysis, the results of which are then in turn checked for reliability using quantitative
methods (Baker et al., 2008, p. 295; Fairclough, 2010; Partington, 2010; van Dijk, 2006;
Wodak, 2001; for examples, see Mautner, 2007 and Vessey, 2013a).
2.4 Gaps
This study aims to address several theoretical and methodological gaps in the
literature. It is well known that corpus linguistics research, including studies of discourse
and ideology, was developed primarily with English data and has disproportionately focused
on English and a small number of major European languages. At the same time, extensive
inflectional morphology presents a number of corpus-linguistic challenges that are largely
absent from studies of languages such as English. Therefore, the first gap has to do with
the paucity of corpus-based research into smaller languages, such as Central South Slavic,
that differ considerably in structure from English.
The second gap is related to the first and has to do with geopolitical focus. Much
of recent research has examined language debates (e.g., Blommaert, 1999) and the links
between language ideology and national identity in plurilingual and multicultural
societies (e.g., Canada, Vessey, 2013a; Finland, Hult & Pietikäinen, 2014; Spain/Catalonia,
Pujolar, 2007), but little attention has been paid to contexts with minimal linguistic
differences between groups, particularly in languages other than major European
languages (but see Wilce, 2010). This study will therefore contribute to the growing
body of literature on language ideology by focusing on a hitherto unexamined case
(closely related language varieties) and context (West Central Balkans).
Third, although several quantitative methods (keyword analysis, collocation
analysis, exploratory factor analysis) have been used in discourse and ideology studies,
there have as yet been no attempts to compare them. Accordingly, this study will compare and
contrast the results obtained through the application of these methods in terms of their
usefulness and effectiveness for the study of language-related discourses and language
ideologies.
Finally, the fourth gap has to do with synchronic variation between different
discursive sites. This study will therefore compare language-related discourses and
language ideologies based on the distinction between ‘discursive’ and ‘practical’
consciousness (Kroskrity, 1998, 2004, following Giddens, 1979) by contrasting general
newspaper articles (written by journalists and/or experts) with letters-to-the-editor
(largely lay opinion).
3. Study Overview
This chapter presents the research questions, research design, construct definitions
and operationalizations, and coding procedures.
3.1 Research Questions
3.1.1 Research question 1. Can corpus linguistics-based quantitative methods
(keyword, collocation, exploratory factor, and cluster analyses) be used to identify lexical
patterns suggestive of dominant language-related discourses and language ideologies in
Central South Slavic and what similarities/differences are there between them?
3.1.2 Research question 2. What language-related discourses and language
ideologies relevant to Central South Slavic ethnolinguistic identities can be identified in
the 5+ hits section of SERBCORP?
3.1.3 Research question 3. What links can be identified between the language-
related discourses and language ideologies relevant to Central South Slavic
ethnolinguistic identities and ethnonationalism?
3.1.4 Research question 4. Is there synchronic and diachronic variation in the
identified language-related discourses and language ideologies relevant to Central South
Slavic ethnolinguistic identities?
3.2 Research Design
Table 1 presents the overall research design employed in the study. Data,
methods, and analytical procedures are listed by research question. Figure 1 shows a
diagram outlining the analytical process employed in the study.
Table 1
Research Design: CL and CDA Investigation of Language-related Newspaper Discourse

RQ1. Can corpus linguistics-based quantitative methods (keyword, collocation, exploratory factor, and cluster analyses) be used to identify lexical patterns suggestive of dominant language-related discourses and language ideologies in Central South Slavic and what similarities/differences are there between them?
Data: SERBCORP; 5+ hits section of SERBCORP; SERBCOMP; WaC corpora
Methods/analyses conducted: identification of keywords in SERBCORP and the 5+ hits section of SERBCORP; identification of significant collocates of the core concept lemma JEZIK ‘language’; exploratory factor analysis using collocates as variables and texts as observations; analysis of variance comparing mean text scores; cluster analysis using factors as predictor variables

RQ2. What language-related discourses and language ideologies relevant to Central South Slavic ethnolinguistic identities can be identified in the 5+ hits section of SERBCORP?
Data: 5+ hits section of SERBCORP; excerpts from individual texts identified as representative by EFA
Methods/analyses conducted: CDA/DHA-based interpretation of language-related discourses and language ideologies (with references to the results of quantitative analyses)

RQ3. What links can be identified between the language-related discourses and language ideologies relevant to Central South Slavic ethnolinguistic identities and ethnonationalism?
Data: excerpts from individual texts identified as representative by EFA; excerpts from the 1986 SANU¹⁵ Memorandum
Methods/analyses conducted: CDA/DHA-based interpretation of identified language ideologies as they pertain to Central South Slavic ethnolinguistic identities (including topoi and references to the results of quantitative analyses)

RQ4. Is there synchronic and diachronic variation in the identified language-related discourses and language ideologies relevant to Central South Slavic ethnolinguistic identities?
Data: 5+ hits section of SERBCORP
Methods/analyses conducted: analysis of variance comparing mean scores of texts grouped by publication, year of publication, and type of article; cluster analysis grouping texts by publication, year of publication, and type of article
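The analyses of variance listed for RQ1 and RQ4 compare mean text scores across groups of texts. A minimal sketch of the one-way ANOVA F statistic is given below, assuming that each text already has a score on a given factor and that texts are grouped by a single variable such as publication; the function name and the group labels in the usage example are hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """One-way ANOVA on text scores.

    groups maps a grouping label (e.g., a publication) to the scores of the
    texts in that group on one factor. Returns (F, df_between, df_within).
    Requires at least two groups, more texts than groups, and some
    within-group variation (otherwise the ratio is undefined).
    """
    all_scores = [score for scores in groups.values() for score in scores]
    grand_mean = mean(all_scores)
    group_means = {label: mean(scores) for label, scores in groups.items()}
    ss_between = sum(len(scores) * (group_means[label] - grand_mean) ** 2
                     for label, scores in groups.items())
    ss_within = sum((score - group_means[label]) ** 2
                    for label, scores in groups.items() for score in scores)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

For instance, `one_way_anova({"A": [1, 2, 3], "B": [4, 5, 6]})` returns F = 13.5 with 1 and 4 degrees of freedom. The p-value would then come from the F distribution with those degrees of freedom (for example via a statistics package), and post hoc comparisons between individual publications or years are outside the scope of this sketch.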
[Diagram: corpus compilation (relevant publications) → sampling phase I → research corpus (articles with 1+ hits for jezi*: SERBCORP) and reference corpora (articles without hits for jezi*: SERBCOMP; WaC) → sampling phase II (corpus comparisons with SERBCOMP; articles with 5+ hits for jezi*: 5+ hits section of SERBCORP) → keyword analysis and collocation analysis → exploratory factor analysis → analysis of variance, cluster analysis, and CDA/DHA]
Figure 1. Diagram of the analytical process
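The two sampling phases summarized in Figure 1 rest on counting hits for the search string jezi* in each article. The rough Python sketch below illustrates that split; it is not the study's actual script. It assumes Latin-script texts (Cyrillic-script articles would need transliteration first) and treats jezi* as any word beginning with "jezi"; the regular expression and function name are hypothetical, while the 1+ and 5+ thresholds come from the figure.

```python
import re

# Hypothetical stand-in for the jezi* query: any word beginning with "jezi"
# (jezik, jezika, jeziku, jezici, jezički, ...). It also matches unrelated
# words such as "jeziv" ('creepy'), which would need manual filtering.
JEZI = re.compile(r"\bjezi\w*", re.IGNORECASE)


def sample_articles(articles, min_hits_section=5):
    """Split Latin-script articles into a research corpus (1+ hits), its
    5+ hits section, and a comparison corpus (no hits)."""
    research, section, comparison = [], [], []
    for text in articles:
        hits = len(JEZI.findall(text))
        if hits == 0:
            comparison.append(text)
        else:
            research.append(text)
            if hits >= min_hits_section:
                section.append(text)
    return research, section, comparison
```

Keeping the 5+ hits section as a subset of the research corpus, rather than as a separate pool, mirrors the figure, where it is drawn from SERBCORP in the second sampling phase.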
3.3 Construct Definitions and Operationalizations
This section presents the definitions and operationalizations of the key constructs
investigated in the present study, as well as the coding procedures used.
3.3.1 Core concepts. The core concepts (lemma JEZIK ‘language’ for the purposes
of corpus compilation and initial collocation analysis, but also lemmas BOSANSKI JEZIK
‘Bosnian language’, CRNOGORSKI JEZIK ‘Montenegrin language’, HRVATSKI JEZIK
‘Croatian language’, SRPSKI JEZIK ‘Serbian language’, and
SRPSKOHRVATSKI/HRVATSKOSRPSKI JEZIK ‘Serbo-Croatian/Croato-Serbian language’ for
the purposes of follow-up analysis) are simply those lexical items and phrases whose
patterning in the corpus is most likely to lead to the identification of dominant
language-related discourses and language ideologies relevant to the maintenance and
(re)construction of Central South Slavic ethnolinguistic identities, and thus of
ethnonationalisms, in the West Central Balkans.
3.3.2 Keywords. Here I use Scott’s (1997, p. 236) definition of a key word “as a
word which occurs with unusual frequency in a given text.” More importantly, keywords
are understood here also as “pointers to complex lexical objects which represent the
shared beliefs and values of a culture” (Stubbs, 2010, p. 23).
3.3.3 Relevant collocates. The relevant collocates are all lexical and some
function words (e.g., possessive pronouns) that are shown to be statistically significant
collocates of the core concepts and which, upon further analysis (i.e., concordancing,
factor analysis), are determined to pattern in ways suggestive of language-related
discourses and language ideologies. Although such collocates are determined by setting a
search span around a core concept (e.g., L5-R5), relevant collocates are defined here also
more broadly as “textual collocates” (Mason & Platt, 2006, cited in Stubbs, 2010, p. 27)
such that all their textual occurrences are counted in each text (for purposes of EFA) and
not only those that appear within a certain span of the node word.
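Because relevant collocates are counted as textual collocates for the EFA, each text contributes a row of collocate rates rather than span-limited counts. The sketch below builds such a text-by-collocate matrix. Normalizing to rates per 1,000 words is an assumption (a common step when texts differ in length), the input is assumed to be already tokenized and lemmatized, and the function name is hypothetical.

```python
from collections import Counter


def textual_collocate_matrix(texts, collocates, per=1000):
    """Build a texts-by-collocates matrix for factor analysis.

    Each cell counts every occurrence of a collocate anywhere in the text
    (a "textual collocate"), normalized to a rate per `per` tokens.
    texts: list of token lists (assumed already lemmatized).
    """
    matrix = []
    for tokens in texts:
        counts = Counter(tokens)
        length = len(tokens)
        # empty texts get zero rates instead of dividing by zero
        matrix.append([counts[c] / length * per if length else 0.0
                       for c in collocates])
    return matrix
```

Rows correspond to texts (observations) and columns to collocates (variables), which is the layout an exploratory factor analysis routine in a statistics package would expect.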
3.3.4 Dominant language-related discourses. Discourse is a multifaceted term
with a range of divergent meanings in social sciences and the humanities. The traditional
definition of discourse in linguistics is simply “language above the sentence or above the
clause” (Stubbs, 1983, p. 1) or “language in use” (Brown & Yule, 1983). Michel
Foucault, however, added a socio-cognitive aspect to the term, defining it as “practices
which systematically form the objects of which they speak” (1972, p. 49). It is this
difference that prompted James Gee (2010, p. 34) to distinguish between discourse with
a small ‘d’ (“language-in-use”) and Discourse with a big ‘D’ (language-in-use plus
“socially accepted associations among ways of using language”), a distinction that is
sometimes used to differentiate between the two main understandings of discourse.
Foucault’s definition was elaborated by Norman Fairclough, who conceptualizes
discourse in terms of three interrelated dimensions: text, discoursal practice (text
production, distribution and consumption), and social practice (see, e.g., Fairclough, 2010, p. 59). In
addition, as can be seen from Gee’s definition, it is possible to conceive of discourse, not
as a more or less coherent product of social and semiotic “practices” and thus a singular
noun as in Foucault’s case, but as a phenomenon with a multitude of manifestations and
therefore a plural noun. Although this plural understanding of discourse has gained
currency in both social sciences and the humanities, more often than not it is implicit and
left undefined. Most pertinently for my purposes here, Baker (2006) draws on several
sources to expand upon the original Foucault’s definition and add a plural dimension to
40
it, so I reproduce his account at length here,
[D]iscourse is a “system of statements which constructs an object” (Parker, 1992,
p. 5) or language-in-action (Blommaert, 2005, p. 2). It is further categorized by
Burr (1995, p. 48) as a “set of meanings, metaphors, representations, images,
stories, statements and so on that in some way together produce a particular
version of events… Surrounding any one object, event, person, etc., there may be
a variety of different discourses, each with a different story to tell about the world,
a different way of representing it to the world.” Because of Foucault’s notion of
practices, discourse therefore becomes a countable noun: discourses (Cameron,
2001, p. 15). So around any given object or concept there are likely to be
multiple ways of constructing it […] (Baker, 2006, p. 4).
Taking this definition as a starting point, discourses are understood here as more
or less coherent systems of statements which construct an object of which they speak
(e.g., language) or an aspect thereof from a particular social or cultural position with the
goal of upholding the interests associated with this position. A discourse may be said to
be ideological to the extent that it reproduces or challenges unequal relations of power
between social subjects (Fairclough, 2010). Depending on their social effects, discourses
can thus be either dominant/hegemonic (reproducing domination) or subaltern/counter-
hegemonic (challenging domination). Language-related discourses are discourses, so
defined, that pertain to language rather than to other aspects of social reality. They are
operationalized as a) individual factors resulting from factor analysis and sets of factors
that cluster together (small ‘d’ discourses), and b) broader sets of statements about
linguistic and social relationships that extend across factor, cluster, and textual
boundaries and have identifiable ideological functions (big ‘D’ discourses).
3.3.5 Dominant language ideologies. Like discourse, language ideology is a
contested concept with a range of divergent meanings in the humanities and social
sciences (e.g., Eagleton, 1991). Following Irvine (1989, p. 255), language ideologies are
understood here as “the cultural system[s] of ideas about social and linguistic
relationships, together with their loading of moral and political interests.” More
specifically, “language ideologies represent the perception of language and discourse that
is constructed in the [political-economic] interest of a specific social or cultural group”
(Kroskrity, 2000b, p. 8, my emphasis). Here, too, the use of the plural reflects a concern
with the different social positions from which language ideologies emanate as well as the
different aspects of the object of ideologization. As with discourses, dominant language
ideologies are understood to be hegemonic, i.e., espoused from positions of power for the
purpose of maintaining that power, whether naturalized or contested (cf. Kroskrity,
1998). Language ideologies are operationalized as implicit or explicit beliefs about
language and its relationship to society underlying dominant language-related discourses
identified here.
3.3.6 Ethnonationalism. Like discourse and ideology, nationalism is a
problematic concept and a complex social phenomenon that defies simple definition. To
make things even more difficult, definitions and understandings of nationalism depend on
definitions and understandings of related concepts such as nation, nation-state, and
ethnicity, which are similarly problematic (for comprehensive discussions of these
concepts in relation to language, see, e.g., Barbour, 2000; Fishman, 1997; Fought, 2006;
May, 2001; Safran, 1999). However, theorists often distinguish between ‘civic’ or ‘state’
and ‘ethnic’ nationalisms, i.e., nationalisms defined by affiliation with nations and
nation-states that are not necessarily culturally homogeneous (cf. Staats- or Willensnation; for
a discussion, see Wodak, de Cillia, Reisigl & Liebhart, 1999) and nationalisms defined by
affiliation with (or aspirations toward) nation-states based on ethnies or pre-existing
cultural groups that have a common myth of origin and share history, basic cultural
practices, and often language and religion (cf. Kulturnation, ibid.). Historically, ethnic
nationalism has clearly been the more important of the two in the Balkans (Carmichael,
2000), examples of temporarily successful civic or state nationalism such as the
(Communist) Yugoslav nationalism of the second Yugoslavia (1945-1991)
notwithstanding. Ethnonationalism is thus understood here simply as an intellectual
movement to define and pursue the political interests of a nation defined in ethnic terms
(whether self-declared or recognized as such by others) and/or of its state.
3.4 Coding Procedures
Each text in the 5+ hits section of SERBCORP was coded for a) publication
(Blic, NIN, Politika, Vreme); b) year of publication (2003, 2004, 2005, 2006, 2008); and
c) type of article (general newspaper articles, letters to the editor).
4. Data
SERBCORP, the specialized research corpus compiled to represent general
language-related discourse in the mainstream Serbian press, consists of articles containing
one or more instances of any of the inflected forms of the word jezik ‘language’ from four
leading national newspapers, two dailies (Blic, a tabloid, and Politika, a broadsheet) and
two weeklies published in Serbia (NIN, Vreme) in the period between 2003 and 2008.16
The publications were chosen based on three criteria: type of publication (broadsheets vs.
tabloids, dailies vs. weeklies), circulation figures,17 and relative standing in the Serbian
and regional publics (for details about the Serbian media market, see Đoković, Hrvatin &
Petković, 2004).18 Because full data sets were only available for the period between 2003
and 2008 (with the exception of Politika for the year 2007),19 the data set is limited to the
years 2003-2006 and 2008. Similarly, SERBCOMP, the reference (or, rather,
comparator) corpus used here, comprises articles from Politika, Blic, NIN, and Vreme, as
well as Večernje Novosti, published in the period between 2003 and 2014.
The two corpora were compiled by downloading the relevant articles from the
Serbian online media archive Ebart (www.arhiv.rs)20 as follows. After the target
publications had been identified, publication-specific searches were run for articles
containing any of the inflectional forms of the core concept lemma JEZIK ‘language’21
(and, perforce, the lemma JEZIČKI ‘linguistic’), using the search term “jezi*” and the
given time frame. Using a custom Python application, the relevant articles thus identified
were then automatically downloaded, formatted22 and saved in separate folders according
to corpus, publication, and year and month of publication (e.g., SERBCORP>Politika>
2006>July). The application also automatically named the files according to publication
(e.g., POL for Politika), date of publication (e.g., POL-22-7-2006 for July 22, 2006), and
download rank for their given month (e.g., POL-22-7-2006-55 for the 55th article
downloaded from Politika for July 2006) or for their publication (in the case of publications
with lower numbers of articles, e.g., BLI-30-3-2004-544), giving each file a unique identifier.
Similarly, using the search term “NOT jezi*”, SERBCOMP was compiled from randomly
chosen articles containing no forms of either of the two core concept lemmas
(JEZIK ‘language’ and JEZIČKI ‘linguistic’). Once compiled, both corpora were checked
for errors and duplicates. Finally, a frequency-based wordlist was used to identify and
exclude from SERBCORP a small number of articles which formally met the search
criteria (“jezi*”) but were nevertheless irrelevant (e.g., those containing words such as
jezičak ‘little tongue’, ježičak ‘little hedgehog’ and jezivo ‘horrible’, or last names such as
Ježić but not forms of the lemmas JEZIK or JEZIČKI).
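The file-naming scheme described above can be sketched in Python as follows. This is not the custom application used to compile the corpora; it only reproduces the identifier format shown in the examples. The codes for NIN and Vreme, the function names, and the English month names used for folders are assumptions. Because the identifiers encode publication and date, the publication and year codes described in Section 3.4 could also be recovered from them.

```python
from datetime import date
from pathlib import PurePosixPath

# POL (Politika) and BLI (Blic) are attested in the text; the NIN and Vreme
# codes are hypothetical placeholders.
PUBLICATION_CODES = {"Politika": "POL", "Blic": "BLI", "NIN": "NIN", "Vreme": "VRE"}
CODE_TO_PUBLICATION = {code: name for name, code in PUBLICATION_CODES.items()}
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def make_identifier(publication, pub_date, rank):
    """Build an ID like POL-22-7-2006-55 (day and month are not zero-padded)."""
    code = PUBLICATION_CODES[publication]
    return f"{code}-{pub_date.day}-{pub_date.month}-{pub_date.year}-{rank}"


def make_folder(corpus, publication, pub_date):
    """Build a folder path like SERBCORP/Politika/2006/July."""
    return PurePosixPath(corpus, publication, str(pub_date.year),
                         MONTHS[pub_date.month - 1])


def parse_identifier(identifier):
    """Recover publication, date, and download rank from an ID."""
    code, day, month, year, rank = identifier.split("-")
    return {"publication": CODE_TO_PUBLICATION[code],
            "date": date(int(year), int(month), int(day)),
            "rank": int(rank)}


print(make_identifier("Politika", date(2006, 7, 22), 55))  # POL-22-7-2006-55
print(parse_identifier("BLI-30-3-2004-544")["date"].year)  # 2004
```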
SERBCORP comprises a total of 11,656,247 words from 16,148 articles, with the largest
share of both (49.88% of words and 61.48% of articles) coming from Politika, the oldest
and arguably most influential daily in Serbia (Tables 2 and 3).
As expected, the dailies (Blic, Politika) contribute shorter articles than the weeklies
(NIN, Vreme), while standardized type-to-token ratios (Scott, 2014a) are similar across
the board (Table 4).
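Standardized type-to-token ratios are used because a raw type-to-token ratio falls as texts grow longer, which would make dailies and weeklies hard to compare. A minimal Python sketch of the idea, assuming a chunk-based procedure of the kind described in the WordSmith Tools documentation (a ratio computed for each consecutive 1,000-word chunk and then averaged), might look like this. Ignoring a shorter final chunk is an assumption of this sketch, and WST's tokenization may differ.

```python
def standardized_ttr(tokens, chunk_size=1000):
    """Mean type-token ratio (as a percentage) over consecutive full chunks.

    The leftover tokens after the last complete chunk are ignored (an
    assumption of this sketch); returns None if no complete chunk exists.
    """
    ratios = []
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        chunk = tokens[start:start + chunk_size]
        ratios.append(len(set(chunk)) / chunk_size)
    if not ratios:
        return None
    return 100 * sum(ratios) / len(ratios)


# Toy example with a chunk size of 4: chunk "a a b b" gives 2/4, chunk
# "a b c d" gives 4/4, and the trailing "z" is ignored.
print(standardized_ttr(list("aabb") + list("abcd") + ["z"], chunk_size=4))  # 75.0
```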
Table 2
Composition of SERBCORP (by Publication)
Publication No. of words % of words No. of articles % of articles
Blic 2,000,579 17.16 3,437 21.28
NIN 2,286,320 19.61 1,761 10.91
Politika 5,813,618 49.88 9,928 61.48
Vreme 1,555,730 13.35 1,022 6.33
Total 11,656,247 100.00 16,148 100.00
Table 3
Number of Articles in SERBCORP (by Year and Publication)
Year/Publication Blic NIN Politika Vreme Total by year
2003 698 364 2,000 183 3,245
2004 670 376 1,848 224 3,118
2005 553 350 1,902 194 2,999
2006 674 332 2,151 176 3,333
2008 842 339 2,027 245 3,453
Total by publication 3,437 1,761 9,928 1,022 16,148
SERBCOMP comprises a total of 22,493,804 words from 37,227 articles from all five
publications, covering the period between 2003 and 2014, as mentioned above (Table 5).
Table 4
Article Means, SD, and STTR in SERBCORP (by Publication)
Publication Mean length in words SD STTR
Blic 572.63 563.55 55.42
NIN 1,283.81 916.95 56.90
Politika 576.58 335.53 56.69
Vreme 1,505.12 1,384.88 56.55
Table 5
Composition of SERBCOMP (by Publication)
Publication No. of words % of words No. of articles % of articles
Blic 3,015,901 13.41 11,703 31.44
NIN 8,460,519 37.61 9,907 26.61
Politika 1,506,347 6.70 3,688 9.91
Večernje Novosti 910,588 4.05 2,889 7.76
Vreme 8,600,449 38.23 9,040 24.28
Total 22,493,804 100.00 37,227 100.00
Preliminary keyword and collocation analyses performed on SERBCORP (for
results and discussion, see Appendices A-C) suggested a section of SERBCORP
comprising articles with 5 or more hits for the lemma JEZIK ‘language’ as the optimal
research corpus for present purposes (see Table 6 for a breakdown of articles by hit count).
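The hit-count bands in Table 6 amount to counting forms of the lemma JEZIK per article and binning the counts. The Python sketch below illustrates this under a stated assumption: JEZIK_FORMS is a hypothetical, Latin-script list of the lemma's inflected forms that a real run would need to verify (for example, against the frequency-based wordlist mentioned above). False positives such as jezičak or jezivo are not counted.

```python
import re

# Hypothetical inventory of Latin-script forms of JEZIK ‘language’;
# a real run would need the full, verified paradigm.
JEZIK_FORMS = {"jezik", "jezika", "jeziku", "jezikom", "jeziče",
               "jezici", "jezike", "jezicima"}
TOKEN_RE = re.compile(r"\w+")  # \w matches letters such as č, ć, ž, š, đ


def jezik_hits(text):
    """Count tokens in `text` that belong to the (assumed) JEZIK paradigm."""
    return sum(1 for tok in TOKEN_RE.findall(text.lower()) if tok in JEZIK_FORMS)


def hit_band(hits):
    """Assign an article to one of the hit-count bands used in Table 6."""
    if hits >= 10:
        return "10+"
    if hits >= 5:
        return "5-9"
    if hits >= 2:
        return "2-4"
    if hits == 1:
        return "1"
    return "0"  # no form of JEZIK found


text = "Srpski jezik i jezici manjina; jezičak i jezivo."
print(jezik_hits(text), hit_band(jezik_hits(text)))  # 2 2-4
```

Under this sketch, an article would belong to the research corpus used below when `jezik_hits(text) >= 5`.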
Table 6
Articles in SERBCORP (by Hit Count for the Lemma JEZIK and Percentage)
Hits No. of articles % of articles
1 10,616 65.75
2-4 4,275 26.47
5-9 843 5.22
10+ 414 2.56
The 5+ hits section of SERBCORP thus comprises a total of 1,118,454 words from 1,257
articles, with a majority of both from Politika (52.62% of words and 67.38% of articles,
see Tables 7 and 8). As in SERBCORP as a whole, the dailies contributed larger numbers of
shorter articles, while standardized type-to-token ratios remain similar (Table 9).
Interestingly, the total number of 5+ hits articles decreased during this period in a roughly
linear fashion (Figure 2), suggesting a gradual shift away from the explicit thematization of
language, arguably owing to changing sociopolitical circumstances (see Section 6.6). All
subsequent analyses were performed on the 5+ hits section of SERBCORP.
Table 7
Composition of the 5+ Hits Section of SERBCORP (by Publication)
Publication No. of words % of words No. of articles % of articles
Blic 148,492 13.28 164 13.05
NIN 280,281 25.06 184 14.64
Politika 588,605 52.62 847 67.38
Vreme 101,076 9.04 62 4.93
Total 1,118,454 100.00 1,257 100.00
Table 8
Number of Articles in the 5+ Hits Section of SERBCORP (by Year and Publication)
Year/Publication Blic NIN Politika Vreme Total by year
2003 49 42 196 10 297
2004 39 48 171 15 273
2005 20 36 170 13 239
2006 18 31 172 14 235
2008 38 27 138 10 213
Total by publication 164 184 847 62 1,257
Table 9
Article Means, SD, and STTR in the 5+ Hits Section of SERBCORP (by Publication)
Publication Mean length in words SD STTR
Blic 893.79 949.33 55.07
NIN 1,508.79 989.70 56.48
Politika 685.78 376.97 55.09
Vreme 1,613.16 1,164.43 55.89
[Figure 2: bar chart; x-axis: Year of publication (2003, 2004, 2005, 2006, 2008); y-axis: No. of articles (0-400); bar values: 297, 273, 239, 235, 213]
Figure 2. Distribution of 5+ hit articles (by year, all publications)
5. Methods
This study takes a mixed methods, lexical approach to the identification of
language-related discourses and language ideologies, combining corpus linguistics (CL)
and critical discourse analysis (CDA) in a manner similar to that originally proposed by
Baker et al. (2008). The initial, largely quantitative phase relies on five distinct
methodological approaches to identifying pertinent lexis and
lexical patterns: keyword analysis, collocation analysis, exploratory factor analysis,
analysis of variance, and cluster analysis. All quantitative analyses were conducted with
the help of WST and the Statistical Package for the Social Sciences 21.0 (SPSS; IBM,
2012), as well as several custom Python and PERL applications. The follow-up, largely
qualitative phase in turn relies on analytical techniques developed within the discourse-
historical approach (DHA) to CDA (Reisigl & Wodak, 2009; Wodak, 2001). It should be
noted, however, that the research design is not purely sequential (quantitative-to-
qualitative) but rather hermeneutic (i.e., moving between quantitative and qualitative
techniques as necessary, cf. Baker et al., 2008; Reisigl & Wodak, 2009), since the results of
both quantitative and qualitative analytical procedures are examined from both perspectives
and thereby further focused and refined. What follows is a discussion of the theoretical
background and a step-by-step explanation of the relevant parameters and procedures
used in the analysis.
5.1 Keyword Analysis
5.1.1 Theoretical background. Corpus-based discourse and ideology research
has mostly relied on keyword analysis (in addition to basic corpus-linguistic techniques
such as frequency, concordance, and collocation analysis, see below). Keyword analysis
has thus been used in a wide variety of discourse studies (see, for example, the essays in
Bondi & Scott, 2010) to identify what characterizes a certain text or corpus, as well as to
look for differences between parallel texts or corpora. The goal of keyword analysis
(Scott, 1997) is the identification of words “which occur with unusual frequency in a
given text [or corpus]” (p. 236), i.e. lexical features characteristic of research corpora and
thus potentially interesting as foci for follow-up discourse analysis. It requires a
reference corpus in addition to a research corpus and can be carried out automatically
using WST. An appropriate reference corpus should be composed of texts in the same
language as the research corpus, and is typically expected to be larger than the research
corpus, although its optimal size is as yet unclear (Scott, 2009, 2010).
The reference corpora of choice have often been large general corpora (i.e., those
comprising different registers) such as the BNC. However, in the absence of such
reference corpora (e.g., for languages other than English) and depending on research
questions, comparator corpora (corpora of similar size and register as the research
corpus) have been used. Examples include corpora compiled from texts on the same
topic reflecting different political or other orientations or corpora compiled from the same
types of texts excluding those with the same focus as the research corpus (see, for
example, Baker, 2006; Subtirelu, 2015; Vessey, 2013b). Once wordlists for both the
research and reference/comparator corpus have been compiled, keyword analysis uses
either the chi-square or log-likelihood statistic to cross-tabulate each word’s observed
frequency and the number of running words in the research corpus with its observed
frequency and the number of running words in the reference corpus (Scott, 2014a). This
procedure determines which words appear statistically more (or less) frequently in the
research corpus as compared to the reference corpus.
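To make the cross-tabulation concrete, the following Python sketch computes a signed log-likelihood keyness score using the widely used two-cell formulation (observed versus expected frequency in each corpus). It is an illustration only: WST's own implementation details and significance thresholds are not reproduced here. The corpus sizes in the example are those of the 5+ hits section of SERBCORP and of SERBCOMP; the word frequencies are hypothetical.

```python
import math


def log_likelihood(freq_research, size_research, freq_reference, size_reference):
    """Signed two-cell log-likelihood (G2) keyness score.

    Positive when the word is relatively more frequent in the research corpus
    (a positive keyword candidate), negative when it is relatively less frequent
    there (a negative keyword candidate).
    """
    total_freq = freq_research + freq_reference
    total_size = size_research + size_reference
    expected_research = size_research * total_freq / total_size
    expected_reference = size_reference * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_research, expected_research),
                               (freq_reference, expected_reference)):
        if observed > 0:  # the term 0 * ln(0) is taken to be 0
            g2 += observed * math.log(observed / expected)
    g2 *= 2
    overused = freq_research / size_research > freq_reference / size_reference
    return g2 if overused else -g2


# Hypothetical frequencies: 100 hits in the research corpus, 50 in the comparator.
score = log_likelihood(100, 1_118_454, 50, 22_493_804)
print(score > 0)  # True: a positive keyword candidate
```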
The result is a statistical measure of a word’s salience in the research corpus
reflected in a keyness score which is based on the statistic chosen (i.e., chi-square or log-
likelihood). A list of keywords (KWs) calculated for a corpus thus suggests the
“aboutness” of that corpus, i.e., what the corpus is about. KWs can be positive (when they
are significantly more frequent in the research corpus than in the reference corpus) or
negative (when they are significantly less frequent in the research corpus than in the
reference corpus). Whereas
positive KWs suggest what a corpus is about, negative KWs can be used as an indicator
of what may be missing from it. Finally, the resulting KWs can be grouped into semantic
fields intuitively by the researcher in order to identify any patterns for further analysis
(cf. Baker, 2004; Ensslin & Johnson, 2006).
Keyword analysis has been widely criticized on several grounds, particularly because
its results (and hence their reliability) depend on the size and type of the reference corpus
chosen, as well as on the choice of statistic. Some researchers argue that
larger reference corpora are generally better (e.g., Scott, 2010), while others have
suggested that the optimal size for a (specialized) reference corpus may be five times the
size of the research corpus (Berber Sardinha, 1999, 2004). In contrast, Xiao and McEnery
(2005, p. 70) contend that “the size of the reference corpus is not very important in
making a keyword list,” particularly when dealing with sufficiently large corpora (cf.
Scott & Tribble, 2006, p. 64). Similarly, despite the wide reliance on large general
corpora, Culpeper (2009, p. 35) argues that it is better to use a reference corpus that is as
close as possible to the research corpus since this approach to keyword analysis avoids a
focus on irrelevant stylistic differences between registers and is more likely to produce a
keyword list which “reflect[s] something specific to the target [i.e., research] corpus.” At
the same time, as Rayson (2008, p. 527) notes, because of the independence assumptions
built into the procedure, there should be no overlap between the research and reference
corpora. Finally, although Scott (2014a) suggests that the log-likelihood test “gives a better
estimate of keyness” in longer texts or entire corpora than the chi-square test, Culpeper
(2009, p. 36) found that the two tests produce “only minor and occasional differences in
the ranking of words.”
5.1.2 Analytical parameters and procedures. Having decided on the optimal
reference corpus (see Appendix B for details of the procedure), several separate KW runs
were performed (retaining the parameters detailed above). First, to get a discursive
profile of the research corpus, the 5+ hits section of SERBCORP was compared to
SERBCOMP (the top 50 positive and all negative KWs are shown in Tables 12 and 13;
the full list of positive KWs is presented in Appendix D). Second, to get a discursive
profile of the 5+ hits section of SERBCORP vis-à-vis the 1-4 hits section of SERBCORP
and further test the validity of the sampling criterion (see discussion in Appendix A), the
5+ hits section of SERBCORP was compared to the 1-4 hits section of SERBCORP (the
top 50 positive and all negative KWs are shown in Tables 14 and 15; the full list of
positive KWs is presented in Appendix E). Third, KWs in the 5+ hits section of
SERBCORP were organized into semantic domains on the basis of their prevalent
meanings in this section of the corpus, as attested by concordance lines (Table 14).
Fourth, a KW database was compiled in WST to calculate key keywords (KKWs, i.e., KWs
that are key in several texts) and KKW associates (KWs appearing in the same texts as
KKWs). The parameters used were as follows: p = .0000000001; minimum KKW
frequency = 2; minimum number of texts for database = 3; statistic for the calculation of
associates = MI3 (≥ 3); minimum number of associate texts for database = 3; and
minimum number of KWs per text for database = 3. Fifth, KKWs potentially related to
ethnolinguistic identities (e.g., glottonyms and ethnonyms) were identified and their
associates examined (Table 15). Sixth, and last, the KKW equivalents of the
highest-loading items from all 12 factors resulting from EFA (see Section 6.3) were examined
for associates in order to compare the results of the keyword associates procedure with
those of EFA (Table 16).
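The logic of key keywords and their associates can be sketched in Python as below. This is a hypothetical reconstruction, not WST's algorithm: each text is represented by the set of words that are key in it, a KKW is a word that is key in at least a minimum number of texts, and an associate is scored with an MI3 statistic computed over text counts (MI3 = log2(O³/E)). The thresholds loosely mirror the parameters listed above, but how WST applies them internally is an assumption.

```python
import math
from collections import Counter


def key_keywords(keywords_by_text, min_texts=2):
    """Return {keyword: number of texts in which it is key} for KKWs only."""
    counts = Counter(kw for kws in keywords_by_text for kw in set(kws))
    return {kw: n for kw, n in counts.items() if n >= min_texts}


def associates(keywords_by_text, kkw, min_texts=3, min_mi3=3.0):
    """Hypothetical text-level MI3 scoring of the associates of one KKW.

    O = texts in which both kkw and the candidate are key;
    E = (texts with kkw) * (texts with candidate) / (all texts).
    """
    n_texts = len(keywords_by_text)
    text_freq = Counter(kw for kws in keywords_by_text for kw in set(kws))
    joint = Counter(kw for kws in keywords_by_text if kkw in kws
                    for kw in set(kws) if kw != kkw)
    result = {}
    for candidate, observed in joint.items():
        if observed < min_texts:
            continue
        expected = text_freq[kkw] * text_freq[candidate] / n_texts
        mi3 = math.log2(observed ** 3 / expected)
        if mi3 >= min_mi3:
            result[candidate] = round(mi3, 2)
    return result


# Hypothetical per-text keyword sets for five texts.
texts = [{"srpski", "ćirilica"}, {"srpski", "ćirilica", "pismo"},
         {"srpski", "ćirilica"}, {"hrvatski", "jezik"}, {"srpski", "pismo"}]
print(key_keywords(texts, min_texts=3))
print(associates(texts, "srpski"))
```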
5.2 Collocation Analysis
5.2.1 Theoretical background. In contrast to keyword analysis, collocation
analysis examines the co-occurrence patterns between words and does not require a
reference corpus. The strength of association between two words can be measured by various
statistics, such as t-scores, z-scores, and mutual information (MI) scores
(McEnery, Xiao & Tono, 2006). The MI score, the preferred measure in analyses focusing
on relatively infrequent items, is calculated by comparing “the probability of observing
the two words together with the probability of observing each word independently, based
on the frequencies of the words” (Biber, Conrad & Reppen, 1998, p. 266). A score of 0
means that there is no association between the words, while a score higher than 0
suggests positive association; scores lower than 0 suggest negative association. An MI
score of 3 or higher is considered to indicate a significant association (Hunston, 2002, p.
71). Unlike keyword analysis, which represents a more general lexical (and discursive)
characterization of a corpus, collocation analysis provides an indication of how individual
words are used in a corpus. Such patterns can be suggestive of particular discourses and
underlying ideologies as “[n]o words are neutral [and] [c]hoice of words represents an
ideological position” (Stubbs, 1996, p. 107).
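The MI calculation can be written out as a short Python function. The sketch assumes one common formulation in which the expected co-occurrence frequency is adjusted for the size of the collocation window (10 word slots for L5-R5); other tools may define the expected value slightly differently, and the frequencies in the example are invented for illustration.

```python
import math


def mutual_information(observed, node_freq, collocate_freq, corpus_size, span=10):
    """MI = log2(observed / expected) for a node-collocate pair.

    expected = node_freq * collocate_freq * span / corpus_size, i.e. the number
    of co-occurrences predicted by chance within `span` word slots around the
    node (10 for L5-R5).
    """
    expected = node_freq * collocate_freq * span / corpus_size
    return math.log2(observed / expected)


# Invented counts: node 2,000 times, collocate 500 times, in a corpus of
# 1,000,000 words; chance predicts 10 co-occurrences within L5-R5.
print(mutual_information(80, 2_000, 500, 1_000_000))  # 3.0
print(mutual_information(10, 2_000, 500, 1_000_000))  # 0.0 (observed = expected)
```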
Further, in line with the recent shift of focus toward phraseology in corpus linguistics and
applied linguistics research generally (see, for example, Biber, Conrad & Cortes,
2004; Chen & Baker, 2010; Gray & Biber, 2013), corpus-based research into discourses
and ideologies has examined n-grams (also known as lexical bundles or clusters, i.e.,
recurring word combinations of n constituents, e.g., jezik i književnost
‘language and literature’; see Cheng & Lam, 2013, for a discourse analysis application).
N-gram analysis, as it will be referred to here, is useful as recurrent word combinations
can be more informative in semantic (and discursive) terms than individual collocates
considered in isolation.
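Counting n-grams is straightforward once texts are tokenized. The minimal Python sketch below (illustrative only; the study used WST) counts contiguous n-grams and keeps those reaching a minimum frequency. The example sentence is invented.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield contiguous n-grams (as tuples) from a list of tokens."""
    return zip(*(tokens[i:] for i in range(n)))


def ngram_counts(tokens, n, min_freq=1):
    """Frequencies of n-grams occurring at least `min_freq` times."""
    counts = Counter(ngrams(tokens, n))
    return {gram: freq for gram, freq in counts.items() if freq >= min_freq}


tokens = "srpski jezik i književnost i srpski jezik i kultura".split()
print(ngram_counts(tokens, 3, min_freq=2))  # {('srpski', 'jezik', 'i'): 2}
```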
5.2.2 Analytical parameters and procedures. Collocation analysis was run
separately on SERBCORP and the 5+ hits section of SERBCORP. It was conducted with
the help of the ‘concordance’ tool in WST, using the span of five words to the left of the
node word (lemma JEZIK ‘language’) and five words to the right (L5-R5), and cutoff
points for item frequency (≥ 20), number of texts (≥ 20), and strength of association (MI
≥ 5). Although these cutoff points are somewhat arbitrary, they ensured that the analysis
produced a manageable number of significant collocates that are sufficiently well
distributed throughout the corpus (cf. Biber, 1993). In the next step, the collocate lists
thus produced were scanned for irrelevant items such as function words, a
small number of which were then deleted from both lists.23 The results of collocation
analyses are presented in parallel lists ordered by frequency and MI score. The full lists
of the lemma collocates of JEZIK in SERBCORP (again, by frequency and MI score) are
shown in Appendix F (Tables F1 and F2). The top fifty lemma collocates of JEZIK in the
5+ hits section of SERBCORP (by frequency and MI score) are shown in Tables 17 and
18. The full list of collocates for the 5+ hits section of the corpus is given in Appendix G
(Tables G). (Note that these are the collocates that were used in EFA.) Finally, n-gram
analysis was conducted using the ‘clusters’ function in the ‘concordance’ tool in WST
(not to be confused with the cluster analysis discussed in Section 6.5). The parameters used
were n-grams of two to six constituents (to cover a wide range of frequently occurring phrasal
patterns) and a minimum item frequency of five within a span of five words to the left
and right of the node word (L5-R5); the analysis was conducted separately for each of the
forms of the node lemma JEZIK ‘language’. A sample of the most frequent n-grams in the
5+ hits section of SERBCORP is shown in Table 19 (Section 6.2.2).
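The collocation step, with its L5-R5 window and its three cutoffs (frequency, number of texts, and MI), can be approximated in Python as follows. This is a sketch under stated assumptions rather than a reproduction of WST's concordance tool: every node occurrence contributes the words in its window (so overlapping windows can count a token twice), node forms are excluded as collocates, and MI uses a window-adjusted expected frequency. The function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def collocates(texts, node_forms, left=5, right=5,
               min_freq=20, min_texts=20, min_mi=5.0):
    """Span-based collocates of a node lemma, filtered by frequency, range, and MI.

    texts: list of token lists; node_forms: set of word forms of the node.
    Returns (collocate, frequency, number of texts, MI) tuples by frequency.
    """
    corpus_size = sum(len(tokens) for tokens in texts)
    word_freq = Counter(tok for tokens in texts for tok in tokens)
    node_freq = sum(word_freq[form] for form in node_forms)
    span = left + right
    pair_freq, pair_texts = Counter(), Counter()
    for tokens in texts:
        seen_in_text = set()
        for i, tok in enumerate(tokens):
            if tok not in node_forms:
                continue
            window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
            for coll in window:
                if coll not in node_forms:
                    pair_freq[coll] += 1
                    seen_in_text.add(coll)
        pair_texts.update(seen_in_text)
    results = []
    for coll, observed in pair_freq.items():
        expected = node_freq * word_freq[coll] * span / corpus_size
        mi = math.log2(observed / expected)
        if observed >= min_freq and pair_texts[coll] >= min_texts and mi >= min_mi:
            results.append((coll, observed, pair_texts[coll], round(mi, 2)))
    return sorted(results, key=lambda row: row[1], reverse=True)


# Toy data with relaxed cutoffs; "x" is frequent but negatively associated.
toy = [["srpski", "jezik"] + ["x"] * 20 for _ in range(3)]
print(collocates(toy, {"jezik"}, min_freq=3, min_texts=3, min_mi=0.0))
# [('srpski', 3, 3, 1.14)]
```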
5.3 Exploratory Factor Analysis
5.3.1 Theoretical background. Exploratory factor analysis is a multivariate
statistical technique which groups variables into sets (called factors) based on their
covariance (for a detailed explanation of EFA, see Tabachnick & Fidell, 2007). It is
particularly useful for exploring large data sets with numerous variables because it
can reveal patterns of variation, and thus the constructs underlying multiple variables, which
makes complex patterns of variation interpretable. The application of EFA in
linguistics was pioneered and popularized by Douglas Biber (e.g., 1988, 2006), whose
methodology for the analysis of language use based on function-related patterning
between large numbers of grammatical and other variables (called multidimensional
analysis, MD) has had a significant impact on the study of grammar as well as
composition pedagogy, second language acquisition, and other related areas. Although
EFA-based multidimensional analysis is used in an increasing number of subfields of
applied linguistics (see, for example, the papers in Cortes & Csomay, 2015), it has not,
with one exception, been used in studies of discourse and ideology. The exception is
Fitzsimmons Doolan (2011, 2014), who adapted MD by focusing on the co-occurrence patterns among
lexical rather than grammatical features. In her study of language ideologies in the
educational sphere in Arizona, she compiled and analyzed a corpus of official language
policy documents to identify the collocates of the core concepts language, literacy and
English. In the next step, she counted the frequencies of these collocates in all of the
texts in her corpus and then subjected those counts to EFA. This resulted in five factors
(i.e., groups of collocates that systematically co-occur throughout the corpus)
interpretable as different language ideologies on account of their indexical links to
language-related beliefs and attitudes existing in the social realm. Like EFA, cluster
analysis is a multivariate statistical technique; it can be used to group objects or cases,
such as individual texts, within a data set. It has been recommended as a follow-up
procedure to EFA because of its ability to identify previously undetected patterns in data
(Biber & Staples, in press). In this study, it is used to explore the differences and
similarities between texts with respect to three independent variables
(publication, year of publication, and general article vs. letter to the editor).
5.3.2 Analytical parameters and procedures. The application of EFA here is
based on Biber (1988) and Fitzsimmons Doolan (2011, 2014). However, instead of
limiting the collocates used in the analysis to those occurring in the premodifier position
(i.e., L1) as in Fitzsimmons Doolan (2011, 2014), all 305 collocates of the core concept
JEZIK ‘language’ in the 5+ hits section of SERBCORP were included, regardless of their
syntactic function or position vis-à-vis the node (for collocation analysis parameters and
procedures, see Section 5.2.2 above; for a full list of collocates, see Appendix G). It is
also worth reiterating at this point that collocates identified through collocation analysis
(i.e., micro-collocates) are defined more broadly for EFA, since they are
counted even when they occur outside the ‘horizon’ of five word slots
to the left or right of the node word (cf. macro- or textual collocates, Mason & Platt,
2006, cited in Stubbs, 2010, p. 27). Put simply, every occurrence of a relevant collocate
in a text is counted for the analysis, not only those occurrences that appear within a
certain span of the node word, as is normally done in collocation analysis.
After the list of relevant collocates had been identified, a custom PERL program
was used to count the frequency of each collocate (a variable) in each text (an observation)
and to normalize it to a text length of 1,000 words. This normalization enabled
comparisons of frequency counts across texts of different lengths. The normalized frequency
counts were then entered into SPSS to check assumptions and factorability, following
the procedure outlined in Tabachnick and Fidell (2007).
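The counting and normalization step was done with a custom PERL program; the Python sketch below merely illustrates the same arithmetic (count in a text divided by text length, times 1,000) on a hypothetical pair of texts. Every occurrence of a collocate in a text is counted, in line with the textual definition of collocates adopted here.

```python
from collections import Counter


def normalized_counts(texts, collocates, per=1000):
    """Text-by-collocate matrix of frequencies normalized per `per` words.

    Rows are texts (observations); columns follow the order of `collocates`
    (variables). All occurrences in a text are counted, not only those near
    the node word.
    """
    matrix = []
    for tokens in texts:
        counts = Counter(tokens)
        length = len(tokens)
        matrix.append([counts[c] * per / length if length else 0.0
                       for c in collocates])
    return matrix


# Hypothetical texts of 500 and 1,000 tokens.
texts = [["srpski"] * 2 + ["x"] * 498, ["hrvatski"] + ["x"] * 999]
print(normalized_counts(texts, ["srpski", "hrvatski"]))  # [[4.0, 0.0], [0.0, 1.0]]
```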
The data were first checked for multivariate outliers by examining each text’s
score on the Mahalanobis variable. With α = .001 and df = 306 (the number of variables),
the critical value of χ² was 388.178; 314 texts had values in excess of the critical value
and were therefore excluded from further analysis.24 The remaining steps in the
procedure were thus performed on the resulting smaller data set (n = 943). The deletion
of multivariate outliers also resulted in the removal of all occurrences of the variable
jednom ‘once’, so this variable was also removed, leaving the number of collocate variables at
k = 305. Next, the assumptions for factor analysis were checked. Multicollinearity and
singularity were assessed by examining tolerance, condition indexes, and variance
proportions. Singularity was found for
two items, Monte and negro,25 so negro was excluded from further consideration, leaving
the number of variables at k = 304. Normality and linearity were not examined because
the results are used descriptively.
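Screening for multivariate outliers was done in SPSS. As an illustration only, the numpy sketch below computes each text's squared Mahalanobis distance from the centroid and flags texts exceeding a chi-square critical value supplied by the caller (388.178 for α = .001 and df = 306 in this study; scipy.stats.chi2.ppf(0.999, df) is one way to obtain such a value). The use of a pseudo-inverse, which avoids an error when the covariance matrix is singular, is a design choice of this sketch.

```python
import numpy as np


def mahalanobis_outliers(data, critical_value):
    """Squared Mahalanobis distances and a boolean outlier flag per row.

    data: (n_texts, n_variables) array of normalized frequencies.
    critical_value: chi-square cutoff chosen by the caller.
    """
    x = np.asarray(data, dtype=float)
    centered = x - x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    # Pseudo-inverse so that a singular covariance matrix does not raise an error.
    inv_cov = np.linalg.pinv(cov)
    d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return d2, d2 > critical_value


# Simulated data with one planted outlier in row 0; 16.266 is the chi-square
# critical value for alpha = .001 with df = 3 variables.
rng = np.random.default_rng(0)
demo = rng.normal(size=(200, 3))
demo[0] = [10.0, 10.0, 10.0]
distances, is_outlier = mahalanobis_outliers(demo, critical_value=16.266)
print(bool(is_outlier[0]), int(is_outlier.sum()))
```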
Once it was determined that the data set met the assumptions, factor
analysis was run using principal axis factoring (n = 943, k = 304). To assess the
factorability of the data set, the correlation matrix (several bivariate correlations were ≥
.30), the KMO value (middling at .648), and Bartlett’s test of sphericity (significant, χ²
[46665] = 100910.536, p < .001) were examined. Based on these results, the data set was
considered factorable.
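For readers who want to see what the two factorability checks compute, the numpy sketch below implements Bartlett's test statistic, χ² = -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom, and the overall KMO measure, which compares squared correlations with squared partial correlations. It is an illustrative reimplementation, not SPSS output, and it leaves the p-value to a chi-square distribution function (for example, scipy.stats.chi2.sf).

```python
import numpy as np


def bartlett_sphericity(data):
    """Bartlett's test of sphericity: returns (chi_square, df)."""
    x = np.asarray(data, dtype=float)
    n, p = x.shape
    corr = np.corrcoef(x, rowvar=False)
    _, log_det = np.linalg.slogdet(corr)  # ln|R|, computed stably
    chi_square = -(n - 1 - (2 * p + 5) / 6) * log_det
    df = p * (p - 1) // 2
    return chi_square, df


def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations derived from the inverse correlation matrix.
    partial = -inv / np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    off_diagonal = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diagonal] ** 2)
    p2 = np.sum(partial[off_diagonal] ** 2)
    return r2 / (r2 + p2)


# Simulated data: four variables driven by one common factor.
rng = np.random.default_rng(4)
demo = rng.normal(size=(300, 1)) + 0.5 * rng.normal(size=(300, 4))
print(bartlett_sphericity(demo), round(kmo(demo), 3))
```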
The number of factors was determined by examining a) the scree plot (which
seemed to flatten out between Factors 12 and 14), and b) the number of factors with
initial eigenvalues over 1.0 (108) and 2.0 (30), neither of which was considered
parsimonious; the number of factors with initial eigenvalues over 3.0 was twelve. The
range of solutions between 12 and 14 factors was next explored using the Varimax
rotation.26 Fewer than five variables loaded highly on the thirteenth and fourteenth
factors of the thirteen- and fourteen-factor solutions, so those two solutions were
discarded as overfactored. Although all twelve factors of the twelve-factor solution
were represented by at least six salient loadings (≥ |.30|), a large number of variables had
communalities lower than .2, and the solution accounted for only 17.46% of the total
variance in the data. To determine the optimal factor solution, a series of rotations was
performed, removing variables with communalities < .2 and re-examining factorability at
each step.
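The factor-extraction logic can be sketched with numpy as below: counting initial eigenvalues above the thresholds mentioned earlier, iterated principal axis factoring starting from squared multiple correlations, and a varimax rotation. This is a simplified illustration, not SPSS's procedure; in particular, SPSS applies Kaiser normalization to varimax by default, which this sketch omits, and its convergence criteria differ. The communalities returned by `principal_axis` could be used to drop variables below .2 before refitting, as described above.

```python
import numpy as np


def count_eigenvalues_over(data, thresholds=(1.0, 2.0, 3.0)):
    """Count initial eigenvalues of the correlation matrix above each threshold."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)
    return {t: int(np.sum(eigenvalues > t)) for t in thresholds}


def principal_axis(data, n_factors, max_iter=100, tol=1e-6):
    """Iterated principal axis factoring; returns (loadings, communalities)."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    # Initial communalities: squared multiple correlations.
    h2 = 1 - 1 / np.diag(np.linalg.inv(corr))
    for _ in range(max_iter):
        reduced = corr.copy()
        np.fill_diagonal(reduced, h2)  # replace the 1s with current communalities
        values, vectors = np.linalg.eigh(reduced)
        top = np.argsort(values)[::-1][:n_factors]
        loadings = vectors[:, top] * np.sqrt(np.clip(values[top], 0, None))
        new_h2 = np.sum(loadings ** 2, axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2


def varimax(loadings, max_iter=100, tol=1e-6):
    """Raw varimax rotation (without Kaiser normalization)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag(np.sum(rotated ** 2, axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = np.sum(s)
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation
```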
The preferred solution was the twelve-factor solution with k = 107 collocate
variables, which a) accounted for the most total variance in the data (34.13%), b)
produced factors consisting of positively loading variables only, c) did not include any
item communalities < .2, and d) had the highest KMO value (.801). This factor solution
was further assessed for internal consistency, measured by Cronbach’s alpha for
the items loading highly on each factor. The internal consistency
analysis produced the following results: Factor 1 (α = .864), Factor 2 (α = .768), Factor 3
(α = .734), Factor 4 (α = .679), Factor 5 (α = .678), Factor 6 (α = .534), Factor 7 (α =
.684), Factor 8 (α = .720), Factor 9 (α = .777), Factor 10 (α = .501), Factor 11 (α = .572),
Factor 12 (α = .554).
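Cronbach's alpha for a factor is computed from the items that load saliently on it, here the collocate variables: α = k/(k - 1) · (1 - Σ s²ᵢ / s²ₜ). In this formula, k is the number of items, s²ᵢ is the variance of item i across texts, and s²ₜ is the variance of the per-text item totals. The following minimal Python sketch implements that formula. It assumes the unstandardized input values (the normalized frequencies), and the function name is hypothetical. This is an assumption because the section does not say whether raw or standardized alpha was reported.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a set of items.

    rows -- one list per text, holding that text's values on the k items
            (e.g., the collocate variables loading saliently on one factor)
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Sample variance of each item across texts.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the per-text totals over all items.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical values of three texts on two items.
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))
```

Items that rise and fall together inflate the variance of the totals relative to the summed item variances, which pushes alpha toward 1.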
The stability of the solution was investigated by comparing the factors, and the items
with salient loadings on those factors, across the different rotations. All factors
appeared in all rotated solutions (with minor differences in composition and order), while
Factor 7 changed in the preferred solution from a factor consisting mostly of negatively
loading variables to one in which the same variables all had positive loadings.
Interpretation of the factors in the preferred solution was conducted by examining
the collocate variables with salient loadings (≥ .30) on each factor individually. To
identify texts representative of each factor, factor scores were estimated for each text
using regression analysis. This was followed by a qualitative analysis of texts with top
factor scores to confirm and elaborate the interpretations.
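Regression-method factor scores are commonly described as F = Z R⁻¹ A. Here Z holds the standardized variable values for each text, R is the correlation matrix of the variables, and A is the rotated loading matrix. I assume this is the estimator behind the "regression analysis" mentioned above, since it is the usual one for orthogonal (e.g., Varimax) solutions. The standard-library Python sketch below uses hypothetical function names. It solves R W = A instead of inverting R explicitly, and then multiplies Z by W.

```python
def solve(a, b):
    """Solve A X = B by Gauss-Jordan elimination (A: n x n, B: n x m; lists of lists)."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented matrix [A | B]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]  # scale pivot row to 1
        for r in range(n):
            if r != col:  # clear this column in every other row
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def regression_factor_scores(z, corr, loadings):
    """Regression-method factor scores F = Z R^-1 A.

    z        -- texts x variables, standardized (z-scored) values
    corr     -- variables x variables correlation matrix R
    loadings -- variables x factors rotated loading matrix A
    """
    weights = solve(corr, loadings)  # factor score coefficients W = R^-1 A
    n_factors = len(loadings[0])
    return [
        [sum(z_val * w_row[f] for z_val, w_row in zip(text, weights)) for f in range(n_factors)]
        for text in z
    ]


# Hypothetical example: one text, two correlated variables, one factor.
print(regression_factor_scores([[1.0, 1.0]], [[1.0, 0.5], [0.5, 1.0]], [[1.0], [0.0]]))
```

Using R⁻¹ rather than the raw loadings means correlated variables are not double-counted when the score is built.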
However, it should be noted here that a separate analysis suggested that texts
initially identified as multivariate outliers (i.e., texts with ‘extreme’ scores on multiple
variables) would be among the most representative texts for all factors. Factor scores
were therefore also calculated for the full data set (i.e., including multivariate outliers),
this time by first converting the normalized frequencies into z-scores to standardize them,
and then summing the standardized frequencies of all variables with salient loadings on a
factor for a factor score for each text.27 Variables with salient loadings on more than one
factor were only included in the computation of the factor score for the factor on which
they had the highest loading (for a rationale for this procedure, see Biber, 1988, pp. 93-
95). An examination of the highest factor scores based on z-scores revealed that the
multivariate outlier texts indeed had many of the highest factor scores on all factors. A
comparison of factor scores estimated by regression analysis and those produced by z-
scores, however, showed that the two methods of computation were highly comparable.
Because these texts are outliers only in an abstract statistical sense and certainly belong to
the ‘population’ of texts sampled here, a decision was made to retain them in the
remainder of the analysis.
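The z-score procedure can be sketched as follows, under these assumptions:

- each variable is standardized across all texts;
- each variable counts only toward the factor on which its loading is largest in absolute value (following the Biber, 1988, procedure cited above);
- variables whose largest loading falls below the .30 salience cut-off are ignored.

Following Biber's convention, a variable with a negative salient loading would be subtracted rather than added. Because the preferred solution here contains positive loadings only, this reduces to a simple sum. The function and variable names are hypothetical.

```python
from statistics import mean, stdev


def summed_z_factor_scores(columns, loadings, threshold=0.30):
    """Factor scores as sums of the z-scores of each factor's salient variables.

    columns  -- {variable: [normalized frequency per text]}
    loadings -- {variable: [loading on factor 1, factor 2, ...]}
    Each variable counts only toward the factor with its largest absolute
    loading, and only if that loading reaches the salience threshold.
    """
    n_factors = len(next(iter(loadings.values())))
    n_texts = len(next(iter(columns.values())))
    scores = [[0.0] * n_factors for _ in range(n_texts)]
    for var, row in loadings.items():
        best = max(range(n_factors), key=lambda f: abs(row[f]))
        if abs(row[best]) < threshold:
            continue  # not salient on any factor
        values = columns[var]
        m, s = mean(values), stdev(values)  # standardize across all texts
        sign = 1.0 if row[best] > 0 else -1.0  # negative loadings are subtracted
        for t, v in enumerate(values):
            scores[t][best] += sign * (v - m) / s
    return scores


# Hypothetical data: three texts, three variables, two factors.
cols = {"a": [1, 2, 3], "b": [2, 4, 6], "c": [3, 2, 1]}
loads = {"a": [0.7, 0.4], "b": [0.1, 0.5], "c": [0.2, 0.1]}
print(summed_z_factor_scores(cols, loads))
```

In this toy example, variable "a" contributes only to the first factor even though it also loads .40 on the second. Variable "c" is dropped because no loading reaches .30.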
5.3 Synchronic and Diachronic Variation (Analysis of Variance)
5.3.1 Theoretical background. The rationale for the analysis of variation here is to
determine whether there are any statistically significant differences between a)
different publications in this sample, b) different types of articles approximating different
types of consciousness (see Section 2.4 and discussion at the end of Section 6.3.1), as
well as c) diachronically between individual years of publication over the subject period.
Fitzsimmons Doolan (2011, 2014), for example, has shown that mean factor scores can
be used to compare observations grouped by what she calls ‘registers’ (i.e., text types) in
order to examine any variation between them. Similarly, I compare publications,
years of publication, and types of articles in terms of how they score on individual
factors, and thus discourses, in an attempt to determine whether there are any significant
differences in the discursive constructions of language between publications, ‘lay’ people
and ‘experts’, and over time.
5.3.2 Analytical parameters and procedures. Synchronic and diachronic
variation in language-related discourses was examined through a comparison of the
factor scores for each of the six selected language-related discourses (i.e., factors) of texts
grouped by a) publication: Blic, NIN, Politika, and Vreme; b) year of publication: 2003,
2004, 2005, 2006, 2008; and c) type of article: general newspaper articles vs. letters-to-
the-editor. The distribution and variation of factor scores for each group of texts were
examined to determine the appropriate statistical procedures to be used. To examine the
distribution, histograms and the Shapiro-Wilk test were used. To examine homogeneity
of variance, the Levene test was used. Normality assumptions were violated for all
factors and homogeneity assumptions were violated for all factors except Factor 8 on
‘publication’, and Factors 4, 8, 10, and 11 on ‘type of article’, so appropriate non-
parametric tests (Kruskal-Wallis, Mann-Whitney U) were chosen. The results of the
statistical comparisons among groups are presented by factor in Sections 6.4.1-6.4.3. For
all tests, α was set at .017, a Bonferroni adjustment dividing the standard α = .05 by
three, because the same data were used to run three separate analyses.
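The Kruskal-Wallis comparison and the adjusted significance level can be illustrated with a minimal, standard-library Python sketch. The function names are hypothetical, and in practice a statistics package would be used. The steps are:

1. Pool the texts' factor scores and rank them, giving tied values average ranks.
2. Compute H from the group rank sums and correct it for ties.
3. Refer H to a chi-square distribution with (number of groups - 1) degrees of freedom.

The two-group comparison of article types would instead use the Mann-Whitney U test, which is not sketched here.

```python
import math
from collections import Counter

BONFERRONI_ALPHA = 0.05 / 3  # about .0167, reported as .017


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df.

    Uses Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1) for the
    regularized upper incomplete gamma function, with y = x / 2.
    """
    if x <= 0:
        return 1.0
    y = x / 2
    s, q = (1.0, math.exp(-y)) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while s < df / 2:
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1))
        s += 1
    return q


def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H and its chi-square p-value."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)
    return h, chi2_sf(h, len(groups) - 1)


# Toy factor scores for three hypothetical groups of texts.
h, p = kruskal_wallis([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f"H = {h:.2f}, p = {p:.4f}, significant: {p < BONFERRONI_ALPHA}")
```

In this toy example, three fully separated groups of three values give H ≈ 7.2 and p ≈ .027. That result would count as significant at α = .05 but not at the Bonferroni-adjusted α ≈ .017.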
5.4 Cluster Analysis
5.4.1 Theoretical background. Cluster analysis is a multivariate exploratory
statistical procedure used to group within a data set cases/observations (e.g., texts)
defined in terms of categorical variables (Biber & Staples, in press). Clustering texts into
groups based on similarity in scores on quantitative measures such as factors/discourses
offers an insight into discursive patterning independent of researcher inference. Thus,
cluster analysis is used here to examine the patterning of texts and factors/discourses with
respect to three independent variables (as in the synchronic and diachronic analyses
above): publication, year of publication, and type of article.
5.4.2 Analytical parameters and procedures. Cluster analysis was conducted
using the twelve factors and the agglomerative hierarchical cluster analysis (HCA)
method (for a detailed, step-by-step explanation, see Biber & Staples, in press). Once the
optimal number of clusters was determined, a one-way ANOVA was used to compare the
mean scores of the predictor variables (i.e., factors) and check for statistical significance.
Next, the mean scores of the twelve factors for each of the six identified clusters were
examined to determine which factors scored most highly on which clusters. Lastly, the
composition of each cluster was investigated by cross-tabulating cluster membership
against the three independent variables (publication, year of publication, and general
newspaper articles vs. letters-to-the-editor) using the Crosstabs function in SPSS.
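Agglomerative hierarchical clustering starts with every text as its own cluster. It then repeatedly merges the two closest clusters until the desired number of clusters remains. The section does not name the linkage criterion, so the sketch below assumes Ward's method. Ward's method merges the pair whose fusion least increases the within-cluster sum of squares; for clusters a and b, that increase equals nₐn_b/(nₐ + n_b) times the squared distance between their centroids. The sketch then computes each cluster's mean factor scores, corresponding to the step of examining which factors score highly on which clusters. It is a naive, standard-library illustration with hypothetical names, suitable only for small inputs, and it is not the SPSS procedure used in the study.

```python
def centroid(points):
    """Mean vector of a list of equal-length numeric vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    squared_distance = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared_distance


def ward_clusters(points, k):
    """Naive agglomerative clustering with Ward linkage; returns k lists of row indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([points[p] for p in clusters[i]],
                                 [points[p] for p in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] += clusters[j]  # merge j into i (j > i, so i keeps its position)
        del clusters[j]
    return [sorted(c) for c in clusters]


def cluster_factor_means(points, clusters):
    """Mean score on each factor for each cluster."""
    return [centroid([points[p] for p in c]) for c in clusters]


# Hypothetical scores of four texts on two factors.
texts = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
groups = ward_clusters(texts, k=2)
print(groups, cluster_factor_means(texts, groups))
```

Cluster membership from a function like this could then be cross-tabulated against publication, year, or article type, as described above.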
5.5 Critical Discourse Analysis: Discourse-historical Approach
5.5.1 Theoretical background. Texts, as Fairclough (2010, p. 57) notes, “bear
the imprint of ideological processes and structures.” Although, as he further argues, it
may not be possible “to ‘read off’ ideologies from texts […] because meanings are
produced through interpretations of texts and texts are open to diverse interpretations”
(ibid.; see also van Dijk, 2006), large-scale corpus-based analysis of frequency can reveal
patterns indicative of the cumulative effect of the media representations of particular
topics and point to the beliefs and assumptions (i.e., ideologies) that underlie them.
According to Stubbs (1996, p. 196), “the study of recurrent wordings is of central
importance in the study of language and ideology and can provide empirical evidence
[of] how culture is expressed in lexical patterns.” Corpus-linguistic tools can thus
provide a map of a corpus based on lexical patterns suggesting discourses and underlying
ideologies and “pinpointing areas of interest for a subsequent close analysis” (Baker et
al., 2008, p. 284).
However, as numerous researchers have noted (e.g., Baker et al., 2008;
Blackledge & Pavlenko, 2002; Partington, 2010; Ricento, 2006; Vessey, 2013a), corpus
linguistic analysis, powerful though it is, does not in itself constitute discourse or
ideological analysis. Discourses and ideologies do not exist in a vacuum, but rather
‘work’ by establishing links with social structures and practices, as well as by making
explicit or implicit references to other texts (intertextuality) or other discourses
(interdiscursivity). Understanding and interpreting discourses and ideologies, therefore,
requires a social, cultural, historical, and political contextualization of the lexical patterns
uncovered with the help of corpus linguistic tools.
With its focus on discourse, ideology, and power, as well as its flexible, eclectic
methodology (for an overview, see Wodak & Meyer, 2009), critical discourse analysis
(CDA) is ideally suited to such analysis and thus well placed to complement quantitative lexical
analysis. In the simplest of terms, CDA relies on linguistic analysis of discourse to
uncover and expose relations of unequal power in society. In CDA, text is “conceived as
a semiotic entity, embedded in an immediate, text-internal co-text as well as intertextual
and sociopolitical context,” while discursive and linguistic data are seen “as a social
practice, both reflecting and producing ideologies in society” (Baker et al., 2008, pp. 279-
280). Similarly, discourse is conceptualized as “a complex of three elements: social
practice, discoursal practice (text production, distribution and consumption), and text”
(Fairclough, 2010, p. 59). The ultimate goal of CDA, then, is to move from a micro-
analytic perspective of text to a macro-analytic perspective of social practice to
demonstrate “how language functions in constituting and transmitting knowledge, in
organizing social institutions or in exercising power in different domains/fields in our
societies” (Wodak, 2004).
CDA, of course, has been criticized, sometimes severely, on a number of grounds,
but particularly for its methodological shortcomings such as selectivity or potential bias
in data collection procedures, small samples, and a lack of concern for replicability (e.g.,
Blommaert, 2005; Stubbs, 1997). Recognizing this, Baker et al. (2008) have proposed a
methodological ‘synergy’ between corpus linguistics and critical discourse analysis,
whereby the two methodological approaches complement one another and thus cancel out
each other’s limitations. Although such a synergy does not necessarily guarantee
research entirely free of researcher inference (Baker, 2011, cited in Fitzsimmons Doolan,
2014, p. 61), if applied in a principled manner, it can demonstrably minimize it (e.g.,
Fitzsimmons Doolan, 2014). In any case, all language use and all analysis are perforce
ideological in the sense that, arguably, ideologically neutral positions are impossible, and
as a ‘critical’ approach, CDA has always refused to claim ‘objectivity’ (Fairclough, 2001,
p. 5). Further, as Vessey (2013a) notes, the principle of researcher self-reflexivity (e.g.,
Pennycook, 2001) applies.
5.5.2 Analytical parameters and procedures. Although many of the analytical
techniques available from the different approaches developed within CDA since its
inception could find application here (again, see Wodak & Meyer, 2009 for a
methodological overview), they are not all equally useful for our present purpose, which
is to examine language ideologies identifiable from language-related public discourse
and, particularly, the argumentation strategies deployed in the negotiation of contested
ethnolinguistic identities. For example, micro-analytical categories developed within
systemic-functional linguistics (SFL), such as passivization and agentivity (Halliday &
Matthiessen, 2004), do not seem to have a clear, direct application here. The discourse-historical
approach (DHA; Wodak, 2001, 2004), developed to trace the constitution of a
particular stereotypical image in public discourse, does, however, offer several macro-analytical
tools that are potentially useful (cf. Vessey, 2013a). DHA draws on
argumentation theory to identify several discursive strategies relevant to the analysis of
identity-related discourse, such as the referential/nomination strategy (the construction of
in- and out-groups through membership categorization by metaphor and metonymy) and
predication (justification of positive or negative attributions given to social actors)
(Wodak & Meyer, 2009, pp. 319-320). However, preliminary analysis suggested that the
discursive strategy of argumentation (i.e., topoi) was particularly relevant and useful.
Topoi are defined as explicit or inferable obligatory premises which make it possible to
connect arguments with the conclusion (Wodak & Meyer, 2009), or simply “the common-
sense reasoning typical for specific issues” (van Dijk, 2000 cited in Baker et al., 2008, p.
299). In line with the methodological synergy explicated above, representative texts
identified through quantitative analysis are subjected to DHA with the goal of identifying
and describing the argumentation strategies (i.e., topoi) and their common frame of
reference (i.e., their associated language ideologies).
Finally, a note on expectations of findings. Hardt-Mautner (1995), for example,
has argued that researchers need to do background research and form hypotheses prior to
carrying out CL-informed discourse analysis. In purely corpus-linguistic terms, this
means that CL-informed discourse analysis should be corpus-based rather than corpus-driven
(cf. Tognini-Bonelli, 2001). Indeed, in one sense, it would of course be difficult to
evaluate, much less interpret, any patterns resulting from CL analysis without relevant
background knowledge. However, although (unlike Baker et al., 2008) I do not think
hypotheses in corpus-informed discourse analysis are always necessary (e.g., in corpus
data mining studies such as Mautner, 2007, to point to an obvious example), I do think it
is useful to comment on my expectations of findings here, if for no other reason than
because what a researcher ultimately decides to focus on in a study is to a certain extent
conditioned by his or her own ideological commitments.
Language-related discourse in the Balkans in the last twenty-five years has been
primarily concerned with the symbolic value of language in the processes of construction
and maintenance of ethnolinguistic identities and attainment of sovereignty. The breakup
of the former Yugoslavia, as I have already noted above (Chapter 1), produced a climate of
pervasive, pathological contestation which continues, albeit with diminishing intensity, to
this day. I therefore expect to find evidence of a discourse of contestation focusing on
linguacultural authenticity and ethnolinguistic identities. In addition to the
methodological approach outlined above, my analysis will be informed by the language-
ideological theoretical framework of linguistic differentiation, i.e., the “similarities [and
differences] in the ways ideologies ‘recognize’ or misrecognize linguistic differences:
how they locate, interpret, and rationalize sociolinguistic complexity, identifying
linguistic varieties with ‘typical’ persons and activities and accounting for the
differentiations among them” (Irvine & Gal, 2000, p. 36). Irvine and Gal (2000) identify
three processes by which this differentiation works, which they call iconization, fractal
recursivity, and erasure. Iconization refers to the mapping of linguistic features onto
social images, positing a direct link between one or more linguistic features and (an
essentialist conceptualization of) the nature of the persons or social groups who display
them. Fractal recursivity involves the projection of binary oppositions (e.g.,
existence/non-existence) from one level of relationship to another (e.g., from local to
regional). Erasure here refers to the simplification of a sociolinguistic field through
which some persons, social groups, or sociolinguistic phenomena are rendered invisible
in ideologically and politically convenient ways. A central theme which I expect to
emerge is that of “imagined inherent, natural links between a unitary mother tongue, a
territory, and an ethnonational identity” (Irvine & Gal, 2000, p. 60), or rather how such
links are used as arguments in the discourse of contestation around ethnolinguistic
identity. In addition, reference will be made to Blommaert and Verschueren’s (1998)
concept of “homogeneism” which refers to a belief in the “impossibility of heterogeneous
communities and the naturalness of homogeneous communities” (p. 207), a belief which
is a corollary of essentialist discourses and ideologies.
6. Results
This chapter is divided into three sections. The first section presents the results of
the application of different quantitative methods and statistical analyses (keyword,
collocation, factor, analysis of variance, and cluster analysis). The second section
presents the results of qualitative analysis (supported by relevant quantitative evidence).
The results include observations about the relative effectiveness of individual methods;
patterns in synchronic and diachronic variation in language-related discourses as well as
variation between different sites of discursive (re)production; and ethnolinguistic
identity-related discourses and language ideologies identified in the mainstream Serbian
press from the subject period.
6.1 Keyword Analysis
6.1.1 Keyword analysis (5+ hits section of SERBCORP). As explained above,
several keyword analyses were conducted in this study. The first keyword analysis
involved a comparison between the 5+ hits section of SERBCORP as the research corpus
and SERBCOMP as the comparator corpus. This analysis identified a total of 151
positive and 40 negative key lemmas. Tables 10 and 11 show the top 50 positive key
lemmas and all negative key lemmas in the 5+ hits section of SERBCORP (the full list of
positive key lemmas is shown in Appendix D).
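The section does not specify the keyness statistic. However, the Keyness values in Tables 10 and 12 appear consistent with the log-likelihood (G²) statistic commonly used in keyword analysis. A rough check on the JEZIK row of Table 10, using corpus sizes back-calculated from the percentage columns, gives a value close to the reported 76538.41. Treating log-likelihood as an assumption, the following minimal Python sketch (hypothetical names) does three things for a single lemma:

1. It computes log-likelihood keyness from the lemma's frequency in the research and comparator corpora and from the two corpus sizes.
2. It signs the result negative for relatively underused (negative) key items.
3. It derives a p-value from the chi-square distribution with one degree of freedom.

```python
import math


def log_likelihood_keyness(freq_study, freq_ref, size_study, size_ref):
    """Signed log-likelihood (G2) keyness and its p-value (chi-square, 1 df).

    freq_study / freq_ref -- frequency of the lemma in the research / comparator corpus
    size_study / size_ref -- total size (tokens) of each corpus
    """
    combined = freq_study + freq_ref
    total = size_study + size_ref
    # Expected frequencies if the lemma were equally frequent in both corpora.
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # the term 0 * log(0) is taken to be 0
            g2 += observed * math.log(observed / expected)
    g2 = max(2 * g2, 0.0)  # guard against tiny negative rounding error
    p = math.erfc(math.sqrt(g2 / 2))  # upper tail of chi-square with 1 df
    # Negative keyness: the lemma is relatively less frequent in the research corpus.
    if freq_study / size_study < freq_ref / size_ref:
        g2 = -g2
    return g2, p


# Hypothetical counts: 20 vs. 10 occurrences in two corpora of 1,000 tokens each.
print(log_likelihood_keyness(20, 10, 1000, 1000))
```

A lemma with identical relative frequencies in both corpora gets a keyness of 0 and p = 1. The sign only records the direction of the difference, while the p-value depends on its magnitude.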
Unsurprisingly, the top positive key lemma is JEZIK ‘language’, which of course
simply reflects the selection criterion used to create the research corpus (5+ hits section
of SERBCORP). Even a cursory glance at the remainder of the top 50 positive key
lemmas shows that the discursive profile here is similar to that of SERBCORP as a whole
(Appendix C), with numerous references to semantic fields such as education and
literature. However, it is equally clear that there is one major difference between the two
lists: whereas the SERBCORP key lemma list includes few items referring to regional
(i.e., Central South Slavic) ethnolinguistic identities, the list of key lemmas in the 5+ hits
section of SERBCORP includes items referring to all major regional (as well as other)
ethnonyms and glottonyms: srpski ‘Serbian’ (7,309 occurrences), now the second-ranked
key lemma, as well as crnogorski ‘Montenegrin’ (rank 28, 696 occurrences),
srpskohrvatski ‘Serbo-Croatian’ (rank 29, 226 occurrences), Srbi ‘Serbs’ (rank 35, 1,432
occurrences), hrvatski ‘Croatian’ (rank 36, 684 occurrences), bosanski ‘Bosnian’ (rank
46, 266 occurrences), and bošnjački ‘Bosniak’ (rank 47, 223 occurrences). In addition,
a perusal of randomly selected articles suggested a set of other potentially relevant
lexical items, such as ćirilica ‘Cyrillic’ (rank 11, 590 occurrences), pismo
‘alphabet’ (rank 19, 1,243 occurrences), narod ‘people’ (rank 21, 1,510 occurrences),
manjina ‘minority’ (rank 37, 491 occurrences), and nacionalni ‘national’ (rank 40, 1,361
occurrences). Further, the remainder of the 151 key lemmas (Table D1, Appendix D)
includes a considerable number of similarly pertinent items such as Crnogorci
‘Montenegrins’ (rank 60, 213 occurrences), Vuk (Karadžić)28 (rank 64, 482 occurrences),
Hrvati ‘Croats’ (rank 65, 315 occurrences), SANU ‘Serbian Academy of Arts and
Sciences’ (rank 66, 235 occurrences), identitet ‘identity’ (rank 68, 331 occurrences), naziv
‘(language) label’ (rank 70, 442 occurrences), ime ‘name’ (rank 73, 943 occurrences),
politika ‘politics’ (rank 74, 1,475 occurrences), Crna Gora ‘Montenegro’ (ranks 76 and
77, 1,072 occurrences), tradicija ‘tradition’ (rank 92, 252 occurrences), and nacija
‘nation’ (rank 93, 305 occurrences). This suggests that the 5+ hits section of
SERBCORP is indeed a better target for analysis here. Also, note that the negative
key lemmas (Table 11) exhibit semantic patterns very similar to those identified for
SERBCORP as a whole (Table C2, Appendix C), i.e., a relative underrepresentation of
references to national political and state institutions, as well as to finances.
Table 10
Top 50 Positive Key Lemmas in the 5+ Hits Section of SERBCORP (by Keyness Score)
N Keyword (English) Keyword (Serbian) Freq. % Texts RC. Freq. RC. % Keyness P
1 language jezik 12530 1.12 1118 1 76538.41 0.0000000000
2 Serbian srpski 7309 0.65 670 32565 0.14 9779.72 0.0000000000
3 linguistic jezički 901 0.08 112 10 5387.05 0.0000000000
4 school škola 2593 0.23 237 7491 0.03 5050.31 0.0000000000
5 English engleski 1220 0.11 271 722 4949.45 0.0000000000
6 literature književnost 1305 0.12 269 1158 4667.73 0.0000000000
7 mother (adj.) maternji 611 0.05 197 1 3712.29 0.0000000000
8 book knjiga 2499 0.22 371 10369 0.05 3583.33 0.0000000000
9 dictionary rečnik 801 0.07 135 320 3576.38 0.0000000000
10 literary književni 1027 0.09 128 1288 3210.07 0.0000000000
11 Cyrillic ćirilica 590 0.05 74 188 2756.74 0.0000000000
12 learn učiti 825 0.07 70 1245 2369.50 0.0000000000
13 professor profesor 1524 0.14 307 5952 0.03 2313.27 0.0000000000
14 instruction nastava 710 0.06 107 1016 2091.33 0.0000000000
15 writer pisac 998 0.09 182 2633 0.01 2073.11 0.0000000000
16 grade razred 609 0.05 87 650 2033.88 0.0000000000
17 word reč 2633 0.24 502 18651 0.08 1941.47 0.0000000000
18 poetry poezija 543 0.05 63 526 1881.56 0.0000000000
19 alphabet pismo 1243 0.11 169 5401 0.02 1702.15 0.0000000000
20 translator prevodilac 395 0.04 102 232 1605.55 0.0000000000
21 people narod 1510 0.13 234 8257 0.04 1601.00 0.0000000000
22 education obrazovanje 893 0.08 187 2969 0.01 1558.60 0.0000000000
23 culture kultura 1485 0.13 230 8355 0.04 1519.44 0.0000000000
24 education (profession) prosvete 563 0.05 219 965 1516.59 0.0000000000
25 students (K-12) učenici 637 0.06 120 1397 1492.42 0.0000000000
26 linguist lingvista 260 0.02 81 12 1488.69 0.0000000000
27 translation prevod 478 0.04 111 629 1462.75 0.0000000000
28 Montenegrin crnogorski 696 0.06 106 2006 1357.09 0.0000000000
29 Serbo-Croatian srpskohrvatski 226 0.02 70 6 1323.38 0.0000000000
30 novel roman 678 0.06 122 2166 1221.90 0.0000000000
31 learning učenje 352 0.03 129 343 1217.01 0.0000000000
32 subject predmet 770 0.07 147 2938 0.01 1193.64 0.0000000000
33 poet pesnik 402 0.04 89 600 1160.62 0.0000000000
34 school (university) fakultet 995 0.09 89 5250 0.02 1101.39 0.0000000000
35 Serbs Srbi 1432 0.13 250 9918 0.04 1093.62 0.0000000000
36 Croatian hrvatski 684 0.06 163 2749 0.01 1010.51 0.0000000000
37 minority manjina 491 0.04 111 1320 1006.48 0.0000000000
38 speak govoriti 1764 0.16 90 14569 0.06 992.19 0.0000000000
39 use (n.) upotreba 585 0.05 72 2054 975.53 0.0000000000
40 national nacionalni 1361 0.12 88 9893 0.04 961.57 0.0000000000
41 French francuski 477 0.04 140 1488 875.85 0.0000000000
42 elementary osnovni 870 0.08 87 5077 0.02 849.01 0.0000000000
43 speech govor 487 0.04 111 1636 842.68 0.0000000000
44 students (K-8) đaci 262 0.02 110 326 821.57 0.0000000000
45 science nauka 668 0.06 189 3468 0.02 753.63 0.0000000000
46 Bosnian bosanski 266 0.02 64 432 736.64 0.0000000000
47 Bosniak bošnjački 223 0.02 70 254 725.60 0.0000000000
48 children deca 1294 0.12 220 10786 0.05 714.75 0.0000000000
49 wrote pisali 1012 0.09 76 7364 0.03 713.61 0.0000000000
50 edition izdanje 437 0.04 72 1700 665.47 0.0000000000
Table 11
Negative Key Lemmas in the 5+ Hits Section of SERBCORP (by Keyness Score)
N Keyword (English) Keyword (Serbian) Freq. % Texts RC. Freq. RC. % Keyness P
1 government vlada 412 0.04 138 26794 0.12 -844.02 0.0000000000
2 Serbia Srbija 2922 0.26 180 93159 0.41 -700.54 0.0000000000
3 millions miliona 177 0.02 104 15619 0.07 -654.02 0.0000000000
4 president predsednik 469 0.04 184 23415 0.10 -518.40 0.0000000000
5 year godina 5350 0.48 591 139140 0.62 -371.61 0.0000000000
6 parties stranke 148 0.01 69 10165 0.05 -339.42 0.0000000000
7 against protiv 419 0.04 240 18154 0.08 -312.14 0.0000000000
8 day dan 912 0.08 151 30134 0.13 -256.64 0.0000000000
9 authorities vlast 510 0.05 106 17482 0.08 -167.75 0.0000000000
10 director direktor 321 0.03 139 11611 0.05 -130.52 0.0000000000
11 Kosovo Kosovu 106 69 5254 0.02 -114.91 0.0000000000
12 last prošle 169 0.02 138 7041 0.03 -111.63 0.0000000000
13 citizens građani 412 0.04 70 13444 0.06 -109.59 0.0000000000
14 time vreme 1681 0.15 523 43233 0.19 -106.00 0.0000000000
15 public javnost 294 0.03 84 10303 0.05 -105.73 0.0000000000
16 solution rešenje 193 0.02 84 7465 0.03 -99.91 0.0000000000
17 law zakon 687 0.06 112 19899 0.09 -99.40 0.0000000000
18 percent odsto 816 0.07 193 22713 0.10 -92.36 0.0000000000
19 after posle 953 0.09 490 25886 0.12 -91.46 0.0000000000
20 now sad 396 0.04 237 12467 0.06 -89.12 0.0000000000
21 affairs poslova 92 66 4199 0.02 -79.70 0.0000000000
22 choice izbor 309 0.03 110 9934 0.04 -76.78 0.0000000000
23 moment trenutku 153 0.01 124 5648 0.03 -67.10 0.0000000000
24 expect očekuje 89 64 3828 0.02 -64.83 0.0000000000
25 week nedelje 104 86 4240 0.02 -64.12 0.0000000000
26 larger veći 410 0.04 108 11978 0.05 -62.32 0.0000000000
27 decision odluka 463 0.04 85 13130 0.06 -58.97 0.0000000000
28 political politički 856 0.08 131 22155 0.10 -56.87 0.0000000000
29 yesterday juče 206 0.02 148 6719 0.03 -54.67 0.0000000000
30 group grupa 463 0.04 110 12934 0.06 -53.63 0.0000000000
31 case slučaj 499 0.04 125 13753 0.06 -52.89 0.0000000000
32 say reći 1087 0.10 247 27137 0.12 -52.38 0.0000000000
33 five pet 320 0.03 227 9446 0.04 -51.56 0.0000000000
34 place mesto 799 0.07 222 20390 0.09 -47.07 0.0000000000
35 problem problem 824 0.07 250 20854 0.09 -45.08 0.0000000000
36 parliament skupštine 134 0.01 71 4590 0.02 -43.92 0.0000000000
37 city grada 159 0.01 104 5193 0.02 -42.45 0.0000000000
38 end kraj 588 0.05 102 15327 0.07 -41.39 0.0000000000
39 immediately odmah 193 0.02 148 5868 0.03 -36.46 0.0000000001
40 six šest 201 0.02 152 6055 0.03 -36.17 0.0000000001
6.1.2 Keyword analysis (5+ hits section of SERBCORP vs. 1-4 hits section of
SERBCORP). The second keyword analysis involved a comparison between the 5+ hits
section of SERBCORP as the research corpus and the 1-4 hits section of SERBCORP as
the comparator corpus. This analysis identified a total of 90 positive and 14 negative key
lemmas (the full list of positive key lemmas is shown in Table E1 in Appendix E). The
top 50 positive key lemmas and all negative key lemmas in the 5+ hits section of
SERBCORP with 1-4 hits section of SERBCORP as the comparator corpus are presented
in Tables 12 and 13.
As expected, JEZIK is again the top lemma. Compared to the 1-4 hits section of
SERBCORP, the discursive profile of the 5+ hits section of SERBCORP
is defined by items related to regional ethnolinguistic identities and education, with some
(albeit considerably fewer than in SERBCORP) references to literature (including
translation) and culture. Importantly, however, items referring to the major regional
ethnolinguistic identities are now all in the top 30 key lemmas, while most other relevant
items identified toward the end of the previous section have moved up in the list.
Interestingly, the lemma zakon ‘law’ is now identified as a positive keyword (rank 90,
687 occurrences). Note also that this prominence of the relevant (i.e., Central South
Slavic) ethnonyms and glottonyms in the list further validates the sampling criterion as
well as the cutoff point of 5 hits for the lemma JEZIK per article (see Section 4 and
Appendix A). As before, the negative key lemmas (now considerably fewer
in number on account of the smaller size of the comparator corpus) indicate a consistent
absence of items referring to national political and state institutions.
Table 12
Top 50 Positive Key Lemmas in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus (by Keyness Score)
N Keyword (English) Keyword (Serbian) Freq. % Texts RC. Freq. RC. % Keyness P
1 language jezik 12530 1.12 1118 21304 0.20 18516.54 0.0000000000
2 Serbian srpski 7309 0.65 670 28440 0.27 3799.77 0.0000000000
3 linguistic jezički 901 0.08 112 792 2043.82 0.0000000000
5 school škola 2593 0.23 237 10392 0.10 1269.04 0.0000000000
7 alphabet pismo 1243 0.11 169 3424 0.03 1108.27 0.0000000000
9 word reč 2633 0.24 502 12494 0.12 879.24 0.0000000000
10 instruction nastava 710 0.06 107 1467 0.01 875.24 0.0000000000
11 learn učiti 825 0.07 70 2015 0.02 851.36 0.0000000000
12 Montenegrin crnogorski 696 0.06 106 1452 0.01 849.83 0.0000000000
13 English engleski 1220 0.11 271 4033 0.04 839.02 0.0000000000
16 literature književnost 1305 0.12 269 4765 0.05 760.37 0.0000000000
17 subject predmet 770 0.07 147 1997 0.02 740.22 0.0000000000
19 use (n.) upotreba 585 0.05 72 1249 0.01 697.87 0.0000000000
21 education (profession) prosvete 563 0.05 219 1388 0.01 574.70 0.0000000000
23 grade razred 609 0.05 87 1672 0.02 545.16 0.0000000000
24 literary književni 1027 0.09 128 3959 0.04 541.68 0.0000000000
25 people narod 1510 0.13 234 7019 0.07 530.86 0.0000000000
27 learning učenje 352 0.03 129 627 497.70 0.0000000000
28 foreign strani 2004 0.18 265 10904 0.10 449.44 0.0000000000
30 students (K-12) učenici 637 0.06 120 2163 0.02 419.59 0.0000000000
33 Vuk (Karadžić) Vuk 482 0.04 103 1540 0.01 349.22 0.0000000000
34 culture kultura 1485 0.13 230 8098 0.08 330.50 0.0000000000
35 speech govor 487 0.04 111 1685 0.02 311.04 0.0000000000
36 speak govoriti 1764 0.16 90 10282 0.10 309.94 0.0000000000
37 minority manjina 491 0.04 111 1758 0.02 295.94 0.0000000000
38 translator prevodilac 395 0.04 102 1249 0.01 290.68 0.0000000000
39 Croats Hrvati 315 0.03 117 929 256.25 0.0000000000
40 science nauka 668 0.06 189 3063 0.03 242.74 0.0000000000
41 Serbs Srbi 1432 0.13 250 8442 0.08 240.77 0.0000000000
42 translation prevod 478 0.04 111 1998 0.02 214.29 0.0000000000
43 class period čas 542 0.05 63 2441 0.02 205.62 0.0000000000
44 Montenegrins Crnogorci 213 0.02 62 564 199.56 0.0000000000
45 doctor dr 843 0.08 314 4519 0.04 198.29 0.0000000000
46 second drugi 3666 0.33 534 26947 0.26 187.05 0.0000000000
47 expression izraz 310 0.03 103 1128 0.01 181.64 0.0000000000
48 meaning značenje 262 0.02 79 867 179.80 0.0000000000
50 label naziv 442 0.04 119 1995 0.02 166.80 0.0000000000
Table 13
Negative Key Lemmas in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus (by Keyness Score)
N Keyword (English) Keyword (Serbian) Freq. % Texts RC. Freq. RC. % Keyness P
1 year godina 5350 0.48 591 61158 0.58 -195.54 0.0000000000
2 Kosovo Kosovu 106 69 2830 0.03 -155.68 0.0000000000
4 day dan 912 0.08 151 12274 0.12 -119.94 0.0000000000
6 Belgrade Beograd 1480 0.13 258 17753 0.17 -85.56 0.0000000000
7 after posle 953 0.09 490 12093 0.11 -85.48 0.0000000000
9 city grada 159 0.01 104 2584 0.02 -52.49 0.0000000000
10 during tokom 299 0.03 202 4110 0.04 -44.45 0.0000000000
11 now sad 396 0.04 237 5147 0.05 -41.82 0.0000000000
12 last prošle 169 0.02 138 2524 0.02 -38.56 0.0000000000
13 time put 943 0.08 341 10847 0.10 -36.64 0.0000000000
14 saw video 104 80 1711 0.02 -36.06 0.0000000001
Table 14
Positive Key Semantic Domains in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus (by Rank)
Rank Ethnolinguistic identities (Etnolingvistički identiteti) Rank Education & science (Obrazovanje i nauka) Rank Literature & translation (Književnost i prevođenje) Rank Foreign languages (Strani jezici)
1 language jezik 3 linguistic jezički 9 word reč 13 English engleski
2 Serbian srpski 5 school škola 16 literature književnost 28 foreign strani
4 dictionary rečnik 6 mother (adj.) maternji 24 literary književni 46 second drugi
7 alphabet pismo 10 instruction nastava 27 learning učenje 52 German nemački
8 Cyrillic ćirilica 11 learn učiti 34 culture kultura 53 Spanish španski
12 Montenegrin crnogorski 14 linguist lingvista 35 speech govor 61 French francuski
18 Croatian hrvatski 15 professor profesor 36 speak govoriti 62 Russian ruski
19 use (n.) upotreba 17 subject predmet 38 translator prevodilac 68 understand razumeti
20 Serbo-Croatian srpskohrvatski 21 education (pro.) prosvete 42 translation prevod 69 be able to moći
25 people narod 22 education obrazovanje 47 expression izraz
26 Bosniak bošnjački 23 grade razred 48 meaning značenje
29 Bosnian bosanski 30 students (K-12) učenici 51 wrote pisali
32 national nacionalni 31 elementary osnovni 63 poetry poezija
33 Vuk (Karadžić) Vuk 40 science nauka 66 writer pisac
37 minority manjina 43 class period čas 72 cultural kulturni
39 Croats Hrvati 45 doctor dr 81 poet pesnik
41 Serbs Srbi 49 school (univ.) fakultet
44 Montenegrins Crnogorci 54 exam ispit
50 label naziv 58 children deca
55 SANU SANU 59 knowledge znanje
56 name ime 60 scientific naučni
57 identity identitet 71 academician akademik
64 introduction uvođenje 73 example primer
65 percent odsto 76 sentence rečenica
67 Monte(negro) Gora 78 letters (a, b, c) slova
70 (Monte)negro Crna 79 students (K-8) đaci
74 nation nacija 82 schooling školovanje
75 Vojvodina Vojvodini 83 program/curric. program
77 today danas 85 book knjiga
80 same isti 87 parents roditelji
84 difference razlika
86 century vek
88 history istorija
89 change (v.) menja
90 law zakon
Based on the patterns identified so far in this section, it is clear that the 5+ hits
section of SERBCORP represents a concentrated discourse exhibiting numerous lexical
items and patterns relevant to an exploration of the links among language-related discourses,
language ideologies, ethnolinguistic identities, and ethnonationalism. However, it
is also quite clear that keywords identified for topically heterogeneous research corpora
such as SERBCORP as a whole or the 5+ hits section of SERBCORP are not as insightful
as those identified for topically homogeneous research corpora such as, for example,
parliamentary debates on a single issue (e.g., Baker, 2006) or student evaluations of
university instructors (e.g., Subtirelu, 2015).29 In other words, despite the identification
of promising lexical items and patterns demonstrated above, a decision about where to
begin analysis or what to focus on would still have to depend on researcher inference.
Therefore, in order to get a better sense of the discursive profile of the 5+ hits
section of SERBCORP, I classified all 90 key lemmas into semantic fields based on their
predominant semantic values in the corpus (confirmed by concordance lines in
ambiguous cases). This semantic classification resulted in four distinct semantic fields
with different numbers of items in each: ethnolinguistic identities (the largest), education
and science, literature and translation, and foreign languages (the smallest; see Table 14).
Based on this semantic patterning, we can conclude that Serbian newspaper discourse
explicitly focused on language is dominated by references to Central South Slavic
ethnolinguistic identities, education, and, to a lesser extent, literature and translation and
foreign languages. From this, it is possible to further extrapolate that this general
language-related discourse is focused on contested (ethnolinguistic identities) and
uncontested (foreign languages, translation) differences and identities, as well as
education and literature as the primary sociocultural domains with respect to which
language is explicitly and overtly thematized. This is an important finding not only
because it gives us a sense of the general language-related (small ‘d’) discourses in
circulation here, but also because a very similar discursive profile emerged from
exploratory factor analysis (see Section 6.3).
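The tallying step that turns a manual classification into a field profile like the one in Table 14 can be sketched as follows. This is a minimal illustration that assumes the researcher's lemma-to-field assignments are stored in a dictionary. The excerpt below is a hypothetical subset of eight lemmas from Table 14, not the full mapping. The classification itself remains a concordance-checked human judgment, and the code does not automate it.

```python
from collections import Counter

# Hypothetical excerpt of the researcher's manual classification:
# key lemma -> semantic field (a small subset of the lemmas in Table 14).
FIELD_OF = {
    "jezik": "ethnolinguistic identities",
    "srpski": "ethnolinguistic identities",
    "ćirilica": "ethnolinguistic identities",
    "škola": "education and science",
    "nastava": "education and science",
    "književnost": "literature and translation",
    "prevod": "literature and translation",
    "engleski": "foreign languages",
}


def field_profile(key_lemmas, field_of):
    """Count key lemmas per semantic field, largest field first.

    Lemmas missing from the mapping are counted as 'unclassified' so that
    gaps in the manual classification remain visible.
    """
    counts = Counter(field_of.get(lemma, "unclassified") for lemma in key_lemmas)
    return counts.most_common()


profile = field_profile(list(FIELD_OF), FIELD_OF)
for field, n in profile:
    print(f"{field}: {n}")
```

Applied to all 90 key lemmas, the same tally would yield the field sizes reported in Table 14.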
At this point, it would be possible, as is typically done, to select items to pursue further
on the basis of researcher inference, for example through concordance
analysis. Let us briefly illustrate the problems with this approach using a small set of
lexical items identified by both keyword and collocation analysis as potentially
interesting examples. In addition to the obviously important Central South Slavic ethno-
and glottonyms, in Section 6.1.1 the following items were identified as some of the
pertinent key lemmas in this corpus: Vuk (Karadžić) (rank 64, 482 occurrences), SANU
‘Serbian Academy of Arts and Sciences’ (rank 66, 235 occurrences), ime ‘name’ (rank 73,
943 occurrences), and nacija ‘nation’ (rank 93, 305 occurrences). Again, the same items
were also identified as significant collocates of the lemma JEZIK: Vuk (Karadžić) (rank
113, 75 occurrences, 50 texts), SANU ‘Serbian Academy of Arts and Sciences’ (rank 120,
70 occurrences, 40 texts), ime ‘name’ (rank 36, 208 occurrences, 96 texts), and nacija
‘nation’ (rank 165, 51 occurrences, 37 texts).
As mentioned above, this selection is based both on researcher inference (which is
in turn itself based on background knowledge and a close reading of large numbers of
texts in the corpus) and quantitative evidence (results of keyword and collocation
analyses). Once identified by quantitative analyses, Vuk (Karadžić) and SANU were thus
deemed to be of potential interest because both Vuk Karadžić as an individual and SANU
as an institution have been historically closely linked to issues of language and
ethnolinguistic identity in the Central South Slavic area, which are the primary focus of
this study (for a detailed discussion, see Section 7.3). Similarly, the lexical item ime
‘name’ was deemed to be of potential interest because the naming of the different
varieties of Central South Slavic has been at the center of the public debate and
contestation related to ethnolinguistic identities since the breakup of Yugoslavia. The
lexical item nacija ‘nation’, finally, is an obvious choice of lexical item to investigate in a
study of links between language ideologies and ethnonationalism.
However, as can be seen, even a small set of relevant items presents the
researcher with thousands of occurrences (and hundreds of concordance lines) of
potential interest. The problem of how to deal with large numbers of potentially
interesting occurrences and concordance lines is exacerbated by the extensive inflectional
morphology of Central South Slavic: unlike in English, for example, semantic
patterns are broken up into numerous subpatterns corresponding to the individual inflected
forms of both the node word and any collocates (e.g., nacija, nacije, naciji, etc.). This
not only atomizes the overall semantic profile of the lexical item but also renders lexical
software designed primarily for languages with simpler inflectional morphology, such
as WST, much less useful. Furthermore, in contrast to topically homogeneous corpora,
the 5+ hits section of SERBCORP features articles that are topically rather
heterogeneous, which presents more of a challenge for pattern analysis based on
concordance lines. Thus, in addition to a total of 482 concordance lines exhibiting
minimally informative lexical patterns, the lexical item Vuk, for example, has a mere 17
significant lexical collocates which show no obvious or easily discernible discursive
patterns of import for either language ideology or ethnonationalism.30 As will be shown
in Section 7.3, it is the fact that Vuk Karadžić as a historical figure is featured so
prominently in this corpus rather than any particular concordance or collocational
patterns associated with this lexical item that is important here. Put differently, lexical
items can have high discursive and ideological significance without exhibiting any
explicit collocational (or other) patterns. Concordance analysis therefore does not seem
to be a particularly effective way to either identify or present macroscopic lexical patterns
that can help profile a corpus and capture discourses.
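The fragmentation caused by inflection can be illustrated with a minimal sketch. It assumes a hypothetical hand-written form-to-lemma lookup; a real analysis would draw on a Serbian morphological lexicon or tagger. Counting word forms spreads nacija over several rows, whereas counting lemmas pools those rows into one.

```python
from collections import Counter

# Hypothetical form-to-lemma lookup; a real study would use a Serbian
# morphological lexicon or tagger rather than a hand-written dictionary.
FORM_TO_LEMMA = {
    "nacija": "nacija", "nacije": "nacija", "naciji": "nacija",
    "naciju": "nacija", "nacijom": "nacija", "nacijama": "nacija",
    "jezik": "jezik", "jezika": "jezik", "jeziku": "jezik", "jezikom": "jezik",
}


def count_forms_and_lemmas(tokens, form_to_lemma):
    """Return (word-form counts, lemma counts) for lowercase tokens.

    Tokens not found in the lookup are treated as their own lemma.
    """
    form_counts = Counter(tokens)
    lemma_counts = Counter(form_to_lemma.get(t, t) for t in tokens)
    return form_counts, lemma_counts


tokens = "nacija nacije naciji nacijom jezika jeziku srpskog jezika".split()
forms, lemmas = count_forms_and_lemmas(tokens, FORM_TO_LEMMA)
print(forms.most_common(3))   # the node is scattered across several forms
print(lemmas.most_common(2))  # pooled: nacija 4, jezik 3
```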
It should be noted, however, that concordance analysis can still be useful for
microscopic lexical analysis during the preliminary stages of discursive corpus profiling
or when confirmation or elaboration of macroscopic lexical patterns is required. For
example, the verb postoji ‘exists’ (rank 48, 161 occurrences, 126 texts) was identified as a
significant collocate of the lemma JEZIK in the 5+ hits section of SERBCORP (see Table
17). Postoji was deemed to be of potential interest because it can be an explicit
expression of contestation as it is often used with a negator (ne postoji ‘does not exist’)
and applied to non-Serb varieties of Central South Slavic. The ‘concordance’ tool in
WST showed that, as a collocate of JEZIK, postoji was most often found in the R2
position.
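The positional information reported by the concordance tool can be approximated with a short sketch. It records where a collocate falls relative to each node form (L1 to L4 on the left, R1 to R4 on the right) and prints key-word-in-context lines. The span of four tokens and the example sentence are assumptions made for illustration. The sentence is invented, not corpus data, and means "they say that the Montenegrin language does not exist and that the Bosnian language does not exist".

```python
from collections import Counter


def collocate_positions(tokens, node_forms, collocate, span=4):
    """Count where `collocate` occurs relative to any node form.

    Positions are labelled L1..L<span> (left of the node) and
    R1..R<span> (right of the node), as in concordance software.
    """
    positions = Counter()
    for i, tok in enumerate(tokens):
        if tok not in node_forms:
            continue
        for offset in range(-span, span + 1):
            j = i + offset
            if offset == 0 or j < 0 or j >= len(tokens):
                continue
            if tokens[j] == collocate:
                label = f"L{-offset}" if offset < 0 else f"R{offset}"
                positions[label] += 1
    return positions


def kwic(tokens, i, width=5):
    """Format one key-word-in-context line centred on token i."""
    left = " ".join(tokens[max(0, i - width):i])
    right = " ".join(tokens[i + 1:i + 1 + width])
    return f"{left:>40} [{tokens[i]}] {right}"


# Invented example sentence (not corpus data).
tokens = "kažu da crnogorski jezik ne postoji i da bosanski jezik ne postoji".split()
node_forms = {"jezik", "jezika", "jeziku"}
pos = collocate_positions(tokens, node_forms, "postoji")
print(pos.most_common())  # [('R2', 2), ('L4', 1)]
for i, tok in enumerate(tokens):
    if tok in node_forms:
        print(kwic(tokens, i))
```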
As can be seen from Figure 3, (ne) postoji ‘(does not) exist’ is indeed applied to
Bosnian (lines 2 and 3) and Montenegrin (lines 4, 5, 6, 7) in an explicit manifestation of
the discourse of contestation.31 Interestingly, (ne) postoji is also applied to Serbian (line
16), but this is the only such occurrence with Serbian, and this usage did not show up
in any of the subsequent analyses, including qualitative analyses of integral texts.
Figure 3. Concordance lines for postoji ‘exists’ in the 5+ hits section of SERBCORP
The conclusion we can draw from this brief demonstration, therefore, is that, if we
are interested in principled decision making and effective methods, analysis of
concordance lines of limited sets of lexical items selected on the basis of researcher
inference is clearly unsatisfactory as a tool for macroscopic discursive corpus profiling or
identification of representative texts. Instead, what is needed is an objective method of
analysis based on statistically significant patterns that can identify not only lexical foci
for analysis (i.e., discourses) but also individual representative texts for follow-up
qualitative analysis.
6.1.3 Keyword associates. The ‘keyword’ function in WST offers a technique
that represents a step in this direction. Based on keyword analysis, a database can be
created of items that are key in several texts in the corpus (key-keywords). This is done
by running separate keyword analyses for each individual text rather than the corpus as a
whole, and the result is information on which keywords are key in a researcher-determined
minimum number of texts. In addition to this, the function computes which
keywords (associates) co-occur with each of the key-keywords, forming lexical sets
(clumps) which can be suggestive of discourses and potentially also ideologies. Table 15
shows a list of 20 key-keywords most directly relevant to ethnolinguistic identities and
their clumps (top ten most frequent associates with MI scores ≥ 3). As can be seen, this
is quite an improvement over the keyword list as most key-keywords have discursively
indicative sets of associates. For example, the key-keyword Srbi ‘Serbs’ co-occurs with
the following keywords: Serbian, Croats, Croatian, people, name, literature, national,
academy, Croatia, book, professor, school, linguistic, literary, war, and learn.32 Clearly,
then, texts in which ‘Serbs’ appears as a keyword tend to discuss the Croats and Croatian
language, the language’s name and the (1990s’) war, all of which point to the (big ‘D’)
discourse of contestation mentioned toward the end of the methods section. Further,
there are indications of a discussion of the national academy of sciences and arts (i.e.,
SANU) and linguistics, as well as of education and literature more generally. Similarly,
the key-keywords Crna and Gora ‘Montenegro’, for example, co-occur with
Montenegrin, Serbian, mother (tongue), Montenegrins, label, and official, which suggests
a discourse (of contestation) pertaining to the recent change in language policy in
Montenegro, whereby the erstwhile official language, Serbian, was first replaced by an
identity-neutral label ‘mother tongue’ and then, upon independence from Serbia, by
‘Montenegrin’.
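The key-keyword and associate procedure described above can be approximated outside WST. The sketch below is a simplified analogue built on stated assumptions, not a reproduction of WST's algorithm:

- per-text keyness is computed with the log-likelihood statistic, a common keyness measure, against a reference token list;
- 15.13 (p < 0.0001, 1 df) is an assumed cut-off;
- associates are counted simply as other keywords that are key in the same texts.

The thresholds applied to the associates in Table 15 (e.g., MI ≥ 3) are not modelled. All data below are invented.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Signed log-likelihood keyness.

    a: frequency in the text, c: text size in tokens;
    b: frequency in the reference corpus, d: reference size in tokens.
    Negative values mark words that are relatively rarer in the text.
    """
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    ll *= 2
    return ll if a / c >= b / d else -ll


def keywords_per_text(texts, reference, threshold=15.13):
    """Run a separate keyword analysis for every text (assumed cut-off)."""
    ref_counts, ref_size = Counter(reference), len(reference)
    per_text = {}
    for tid, tokens in texts.items():
        counts, size = Counter(tokens), len(tokens)
        per_text[tid] = {
            w for w, a in counts.items()
            if log_likelihood(a, ref_counts[w], size, ref_size) >= threshold
        }
    return per_text


def key_keywords(per_text, min_texts=2):
    """Keywords that are key in at least `min_texts` texts."""
    doc_freq = Counter(w for kws in per_text.values() for w in kws)
    return {w: n for w, n in doc_freq.items() if n >= min_texts}


def associates(per_text, kkw, min_texts=2):
    """Other keywords that are key in the same texts as key-keyword `kkw`."""
    co = Counter()
    for kws in per_text.values():
        if kkw in kws:
            co.update(w for w in kws if w != kkw)
    return [(w, n) for w, n in co.most_common() if n >= min_texts]


# Invented toy data: a 1,000-token reference corpus and three short texts.
reference = ["i"] * 990 + ["srbi"] * 5 + ["jezik"] * 5
texts = {
    "t1": ["srbi"] * 10 + ["jezik"] * 10 + ["i"] * 80,
    "t2": ["srbi"] * 10 + ["i"] * 90,
    "t3": ["jezik"] * 10 + ["i"] * 90,
}
per_text = keywords_per_text(texts, reference)
print(key_keywords(per_text))           # both words are key in two texts
print(associates(per_text, "srbi", 1))  # jezik is key alongside srbi once
```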
To facilitate the methodological comparison further, I also checked if the most
salient (i.e., highest-loading) variables in the 12 factors identified by exploratory factor
analysis (see Section 6.3) showed up as key-keywords. Perhaps unsurprisingly, they all
do (Table 16). For example, the most salient variable in Factor 2 (Cyrillic-Only), alphabet,
has as its key-keyword associates Cyrillic, Serbian, Latin, official, professor, use (n.), school,
English, high school, and book, while in Factor 2 it co-occurs with Cyrillic (n./adj.), use (n.),
official, Latin, constitution, protection, association, and law. The most salient variable in
Factor 11 (Officialization of Bosnian), Bosnian, has as its associates Bosniak, national, minority,
subject, and education, while in Factor 11 it co-occurs with Bosniak, elective, element, national,
and board. The associates of alphabet and Factor 2 thus both seem to suggest a discourse
of endangerment concerned with the protection of Serbian Cyrillic from the
(perceived) threat posed by the widespread use of the Latin alphabet in Serbia. The
associates of Bosnian and Factor 11, similarly, suggest a discourse of (contestation over)
minority language rights concerned with the recent official recognition of Bosnian as a
minority language in Serbia and its introduction in schools. The overlap between the
associates and factors, as can be seen, is considerable.
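The degree of overlap can be made somewhat more concrete by counting shared labels. The sketch below uses the Jaccard index (shared items divided by all distinct items) as an illustrative measure; it was not part of the original analysis. The label sets are limited to the items named in the paragraph above, and "Cyrillic (n./adj.)" from Factor 2 is simplified to "Cyrillic", which is itself an assumption.

```python
def overlap(a, b):
    """Return the shared labels and the Jaccard index |a & b| / |a | b|."""
    shared = a & b
    return shared, len(shared) / len(a | b)


# Labels as listed in the text; "Cyrillic (n./adj.)" from Factor 2 is
# simplified to "Cyrillic" (an assumption of this sketch).
alphabet_associates = {"Cyrillic", "Serbian", "Latin", "official", "professor",
                       "use (n.)", "school", "English", "high school", "book"}
factor2_variables = {"Cyrillic", "use (n.)", "official", "Latin",
                     "constitution", "protection", "association", "law"}
bosnian_associates = {"Bosniak", "national", "minority", "subject", "education"}
factor11_variables = {"Bosniak", "elective", "element", "national", "board"}

pairs = {
    "alphabet / Factor 2": (alphabet_associates, factor2_variables),
    "Bosnian / Factor 11": (bosnian_associates, factor11_variables),
}
for name, (a, b) in pairs.items():
    shared, jaccard = overlap(a, b)
    print(f"{name}: {sorted(shared)} (Jaccard = {jaccard:.2f})")
```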
The problem with this analytical technique, however, arises when one decides to
explore these sets of associates further. The number of associates, for one, can be very
large, depending on their overall frequency in the corpus, which means that it may be
necessary to start focusing on the most frequent items as with keyword analysis proper.
Further, it would, for instance, be interesting to look up some (or all) of the texts in which
associates co-occur with a key-keyword for in-depth qualitative analysis. Unfortunately,
this is not possible as there is currently no way to obtain this information automatically
using the ‘associates’ tool (Mike Scott, personal communication). One could, of course,
look for this information manually, but with a research corpus of this size, that is clearly
undesirable.
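Outside WST, the missing lookup is straightforward once per-text keyword lists are available in machine-readable form. The minimal sketch below assumes that such lists have been collected, for instance from separate keyword runs on individual articles, into a mapping from text ids to keyword sets. The export step, the ids, and the contents are all hypothetical.

```python
def texts_with_pair(per_text_keywords, key_keyword, associate):
    """Return sorted ids of texts in which both items are key."""
    return sorted(
        tid for tid, keywords in per_text_keywords.items()
        if key_keyword in keywords and associate in keywords
    )


# Hypothetical per-text keyword sets (ids and contents are invented).
per_text_keywords = {
    "article_001": {"Srbi", "hrvatski", "rat"},
    "article_002": {"Srbi", "ime"},
    "article_003": {"hrvatski", "rat"},
}
print(texts_with_pair(per_text_keywords, "Srbi", "rat"))  # ['article_001']
```

The returned ids would then point directly to candidate texts for in-depth qualitative analysis.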
Table 15
Ethnolinguistic Identity-related Key-keywords and Key-keyword Associates in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus (by Rank/Number of Texts)
N Key-keyword (English) Key-keyword (Serbian) Texts % of texts Overall freq. No. of associates Associates (English) Associates (Serbian)
2 Serbian srpski 150 18.27 3051 210 literature·linguistic·Cyrillic·literary·alphabet·dictionary·book·Serbs·mother (adj.)·school književnost·jezički·ćirilica·književni·pismo·rečnik·knjiga·Srbi·maternji·škola
18 Montenegrin crnogorski 38 4.63 393 98 (Monte)negro·Serbian·Monte(negro)·Montenegrins·mother (adj.)·literature·linguistic·literary·poetry·professor·nation·national·poet·orthography·renaming Crna·srpski·Gora·Crnogorci·maternji·književnost·jezički·književni·poezija·profesor·nacija·nacionalni·pesnik·pravopis·preimenovanje
22 Serbs Srbi 34 4.14 506 101 Serbian·Croats·Croatian·people·name·literature·national·academy·Croatia·book·professor·school·linguistic·literary·war·learn srpski·Hrvati·hrvatski·narod·ime·književnost·nacionalni·akademija·Hrvatska·knjiga·profesor·škola·jezički·književni·rat·učiti
31 (Monte)negro Crna 29 3.53 382 90 Monte(negro)·Montenegrin·Serbian·mother (adj.)·Montenegrins·label·official Gora·crnogorski·srpski·maternji·Crnogorci·naziv·službeni
33 Monte(negro) Gora 27 3.29 381 89 (Monte)negro·Montenegrin·Serbian·mother (adj.)·Montenegrins·label·official Crna·crnogorski·srpski·maternji·Crnogorci·engleski·naziv·službeni
41 national nacionalni 24 2.92 319 91 minority·Bosnian·literature·Serbs·Serbian·learn·Croats·school·Bosniak·Montenegrin·and·identity·mother (adj.)·nation·people·subject manjina·bosanski·književnost·Srbi·srpski·učiti·Hrvati·škola·bošnjački·crnogorski·i·identitet·maternji·nacija·narod·predmet
46 Croatian hrvatski 22 2.68 234 93 Serbian·Serbs·linguistic·Bosniak·Croats·Croatia·literary·name·literature·poetry·school·dialect·English·and·book·minority·mother (adj.)·poem·Serbo-Croatian·learn srpski·Srbi·jezički·bošnjački·Hrvati·Hrvatska·književni·ime·književnost·poezija·škola·dijalekat·engleski·i·knjiga·manjina·maternji·pesma·srpskohrvatski·učiti
53 Bosnian bosanski 16 1.95 143 49 Bosniak·national·minority·subject·education (profession)·Serbian·learn·literary·school bošnjački·nacionalni·manjina·predmet·prosvete·srpski·učiti·književni·škola
58 Montenegrins Crnogorci 15 1.83 113 55 Montenegrin·Serbian·(Monte)negro·Monte(negro)·nation crnogorski·srpski·Crna·Gora·nacija
63 Serbo-Croatian srpskohrvatski 15 1.83 73 52 Croatian·dictionary·linguistic·Serbian·academy·language srpski·jezički·rečnik·akademija·hrvatski·književni
83 Bosniak bošnjački 11 1.34 85 56 Serbian·Bosnian·Croatian·Bosniaks·linguistic·minority·mother (adj.)·national·standardization·school srpski·bosanski·hrvatski·Bošnjaci·jezički·manjina·maternji·nacionalni·standardizacija·škola
85 name ime 11 1.34 138 43 Serbs·Serbian·Croatian·Croatia·literature Srbi·srpski·hrvatski·Hrvatska·književnost
87 nation nacija 11 1.34 91 50 minority·Serbian·Montenegrins·Montenegrin·national manjina·srpski·Crnogorci·crnogorski·nacionalni
93 Croats Hrvati 10 1.22 72 37 Serbs·Croatian·Serbian·national Srbi·hrvatski·srpski·nacionalni
101 identity identitet 9 1.10 72 46 national nacionalni
106 Croatia Hrvatska 8 0.97 107 30 language·I·Serbs·name·Croatian jezik·ja·Srbi·ime·hrvatski
110 renaming preimenovanje 8 0.97 41 45 high school·professor·Montenegrin·mother (adj.) gimnazija·profesor·crnogorski·maternji
111 Serbia Srbija 8 0.97 417 49
161 Bosnia Bosna 5 0.61 35 19
164 Herzegovina Hercegovina 5 0.61 37 17
169 nationalism nacionalizam 5 0.61 54 17
195 ethnic etnički 4 0.49 23 15 minority manjina
Table 16
Factor-related Key-keywords and Key-keyword Associates in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus (by Rank)
Factor Key-keyword (English) Key-keyword (Serbian) Texts % of texts Overall freq. No. of associates Associates (English) Associates (Serbian)
1 grade razred 47 5.72 378 117 school·subject·learn·instructor·education (profession)·instruction·English·first-graders·class period·students (K-8) škola·predmet·učiti·nastavnik·prosvete·nastava·engleski·prvaci·čas·đaci
2 alphabet pismo 49 5.97 657 108 Cyrillic·Serbian·Latin·official·professor·use (n.)·school·English·high school·book ćirilica·srpski·latinica·službeni·profesor·upotreba·škola·engleski·gimnazija·knjiga
3 exam ispit 24 2.92 189 69 school·mathematics·students (K-12)·students (K-8)·English·high (school)·mother (adj.)·Serbian·high school·instructor·entrance škola·matematika·učenici·đaci·engleski·maternji·srpski·gimnazija·nastavnik·prijemni
4 school (university) fakultet 25 3.05 433 87 professor·instruction·instructor·university·school·English·education·student·literature·program of study profesor·nastava·nastavnik·univerzitet·škola·engleski·obrazovanje·student·književnost·studija
5 minority (n.) manjina 29 3.53 284 101 mother (adj.)·national·school·Bosnian·minority (adj.)·nation·education·rights·subject·learn maternji·nacionalni·škola·bosanski·manjinski·nacija·obrazovanje·prava·predmet·učiti
6 Croatia Hrvatska 8 0.97 107 30 Croatian·Serbs·name·I·literature hrvatski·Srbi·ime·ja·književnost
7 book knjiga 66 8.04 997 156 Serbian·literature·writer·literary·poetry·poem·poet·professor·dictionary·award srpski·književnost·pisac·književni·poezija·pesma·pesnik·profesor·rečnik·nagrada
8 Montenegrin crnogorski 38 4.63 393 98 (Monte)negro·Serbian·Monte(negro)·Montenegrins·mother (adj.)·literature·linguistic·literary·poetry·professor Crna·srpski·Gora·Crnogorci·maternji·književnost·jezički·književni·poezija·profesor
9 teach predavati 6 0.73 45 30 instructor nastavnik
10 linguistic jezički 60 7.31 367 122 Serbian·dictionary·literary·dialect·English·speech·literature·people·Croatian·school srpski·rečnik·književni·dijalekat·engleski·govor·književnost·narod·hrvatski·škola
11 Bosnian bosanski 16 1.95 143 49 Bosniak·national·minority·subject·education (profession)·Serbian·learn·literary·school bošnjački·nacionalni·manjina·predmet·prosvete·srpski·učiti·književni·škola
12 center centar 4 0.49 56 13
WST includes several other tools for the exploration of co-occurrences among
keywords (see Scott, 2014a, for details) such as the ‘keywords plot’ and ‘links’, which
calculate and plot a keyword’s collocates (i.e., keywords that occur within a researcher-
defined collocation span of the chosen keyword). However, this analysis works only with
individual texts, so its usefulness for our purposes is limited. Another option is to take a
phrasal approach to keywords and calculate keyword clusters (i.e., n-grams), but because this
technique uses only keywords, it is highly unlikely to produce a sufficient
number of observations for analysis. With this, my exploration of keyword analysis here
is complete. In the next section, I examine the results of collocation analysis as applied
in this study.
6.2 Collocation Analysis
This section presents the results of collocation analyses of the lemma JEZIK
conducted on the 5+ hits section of SERBCORP (for results and discussion of collocation
analysis performed on SERBCORP as a whole, see Appendix F). Lists of collocates are
presented first by frequency and then also by MI score. As with keywords, only the top
50 collocates are shown in the tables in the body of the text; full lists are presented in
appendices. Lastly, a sample of the most frequent n-grams in the 5+ hits section of
SERBCORP is shown and examined.
6.2.1 Collocation analysis (5+ hits section of SERBCORP). Collocation
analysis of the lemma JEZIK conducted on the 5+ hits section of SERBCORP produced a
total of 305 lemma collocates of the lemma JEZIK (Appendix G, Tables G1 and G2).
Table 17 shows the top 50 lemma collocates by frequency. As in SERBCORP (Appendix
F, Table F1), the most frequent lemma collocate of the lemma JEZIK is srpski ‘Serbian’
with 3,449 occurrences in 802 texts.
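The Total and Texts columns of Table 17 can be understood through a minimal sketch of frequency-based collocation counting. It makes three assumptions. First, the input is already lemmatised. Second, the window is four lemmas on either side of the node. Third, a collocate token that falls within the windows of two node occurrences is counted twice. WST's default span and its handling of overlapping windows may differ, so this illustrates the idea rather than reproducing the tables. The texts are invented.

```python
from collections import Counter


def collocate_table(texts, node, span=4):
    """Frequency-ranked collocates of `node` in lemmatised texts.

    texts: {text_id: list of lemmas}. For every occurrence of `node`, each
    other lemma within +/- `span` positions is counted once. Returns
    (collocate, total co-occurrences, number of texts) rows, most
    frequent first, ties broken alphabetically.
    """
    total, in_texts = Counter(), Counter()
    for lemmas in texts.values():
        seen = set()
        for i, lemma in enumerate(lemmas):
            if lemma != node:
                continue
            lo, hi = max(0, i - span), min(len(lemmas), i + span + 1)
            for j in range(lo, hi):
                if j != i and lemmas[j] != node:
                    total[lemmas[j]] += 1
                    seen.add(lemmas[j])
        in_texts.update(seen)
    return sorted(((c, total[c], in_texts[c]) for c in total),
                  key=lambda row: (-row[1], row[0]))


# Invented, already lemmatised toy texts.
texts = {
    "t1": "srpski jezik i hrvatski jezik".split(),
    "t2": "srpski jezik".split(),
}
for row in collocate_table(texts, "jezik"):
    print(row)
```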
The prominent semantic fields are similar to those in SERBCORP, with most
high-ranking items suggesting a discourse of construction and maintenance of national
identity (see Appendix F). However, items referring to literature are now unaccompanied
by items referring to translation, while the semantic field of culture remains marginal
(one item). The semantic fields of school and foreign languages are also somewhat less
prominent. Considering the prominence of items referring to translation and
education above (particularly among key lemmas in SERBCORP), it seems safe to
conclude at this point that these two fields account for much of the information ‘loss’ due
to sampling. On the other hand, in line with the trend demonstrated above toward increased
relevance of articles with higher numbers of hits for the lemma JEZIK, srpski ‘Serbian’,
hrvatski ‘Croatian’ (rank 15, 363 occurrences), and crnogorski ‘Montenegrin’ (rank 16,
351 occurrences) are joined by srpskohrvatski ‘Serbo-Croatian’ (rank 40, 182
occurrences) and bosanski ‘Bosnian’ (rank 46, 165 occurrences) in the top 50. Other
pertinent items remain: narod ‘people’ (rank 27, 252 occurrences), nacionalni
‘national’ (rank 31, 222 occurrences), ime ‘name’ (rank 36, 208 occurrences), and postoji
‘exists’ (rank 48, 161 occurrences), with the addition of nov ‘new’ (rank 25, 254
occurrences) and pitanje ‘question’ (rank 45, 168 occurrences). Pismo ‘alphabet’ (rank
10, 457 occurrences) and rečnik ‘dictionary’ (rank 50, 158 occurrences), finally, suggest
a discourse on language policy, and specifically selection and codification (Haugen,
1972).
Table 17
Top 50 Lemma Collocates of the Lemma JEZIK ‘Language’ in the 5+ Hits Section of
SERBCORP (by Frequency)
N Collocate (English) Collocate (Serbian) MI score Texts Total

1 Serbian srpski 8.80 802 3449
2 foreign strani 13.02 346 1011
3 that taj 7.64 484 791
4 English engleski 12.63 323 693
5 mother maternji 8.86 296 636
6 own svoj 7.70 360 608
7 speak govoriti 10.26 321 552
8 second drugi 11.05 307 507
9 itself sam 5.11 338 494
10 alphabet pismo 9.19 165 457
11 literature književnost 7.58 220 456
12 one jedan 6.22 281 454
13 all svi 8.24 290 406
14 our naš 8.91 257 401
15 Croatian hrvatski 8.10 157 363
16 Montenegrin crnogorski 12.77 112 351
17 learn učiti 9.51 175 340
18 literary književni 9.16 176 335
19 school škola 6.97 179 318
20 they oni 5.86 245 310
21 official službeni 14.15 107 285
22 use upotreba 9.35 136 274
23 this ovaj 6.53 211 265
24 instruction nastava 8.30 149 262
25 new nov 10.26 172 254
26 word reč 5.86 180 253
27 people narod 7.09 153 252
28 year godina 5.36 186 239
29 professor profesor 8.14 145 239
30 culture kultura 8.74 153 231
31 national nacionalni 8.64 126 222
32 first prvi 6.08 154 222
33 learning učenje 7.28 136 222
34 French francuski 11.78 115 215
35 his njegov 7.45 165 213
36 name ime 10.61 96 208
37 two dva 5.72 138 206
38 Russian ruski 7.98 89 202
39 say kazati 9.73 152 187
40 Serbo-Croatian srpskohrvatski 8.76 96 182
41 German nemački 7.66 112 177
42 subject predmet 11.06 103 170
43 book knjiga 5.41 118 169
44 their njihov 7.48 135 168
45 question pitanje 5.88 108 168
46 Bosnian bosanski 9.30 54 165
47 written pisan 11.83 119 163
48 exists postoji 9.38 126 161
49 know znati 9.44 122 159
50 dictionary rečnik 6.66 77 158
Table 18
Top 50 Lemma Collocates of the Lemma JEZIK ‘Language’ in the 5+ Hits Section of
SERBCORP (by MI Score)
N Collocate (English) Collocate (Serbian) MI score Texts Total
1 official službeni 14.15 107 285
2 different različit 13.87 61 83
3 Slovene slovenski 13.86 58 101
4 learn naučiti 13.36 66 78
5 department katedra 13.25 57 91
6 foreign strani 13.02 346 1011
7 use koristiti 12.79 55 76
8 Montenegrin crnogorski 12.77 112 351
9 English engleski 12.63 323 693
10 introduce uvesti 12.45 41 55
11 special poseban 11.94 71 93
12 written pisan 11.83 119 163
13 French francuski 11.78 115 215
14 people’s narodni 11.67 76 143
15 say reći 11.64 64 74
16 nation nacija 11.23 37 51
17 Latin latinica 11.10 27 41
18 subject predmet 11.06 103 170
19 second drugi 11.05 307 507
20 official zvaničan 10.91 72 112
21 common zajednički 10.79 64 83
22 group grupa 10.75 43 65
23 difference razlika 10.73 35 44
24 standardization standardizacija 10.72 56 117
25 name ime 10.61 96 208
26 poetry poezija 10.49 63 50
27 see videti 10.42 23 27
28 section odeljenje 10.40 38 59
29 use služiti 10.39 29 34
30 speak govoriti 10.26 321 552
31 new nov 10.26 172 254
32 contemporary savremeni 10.23 73 106
33 spoken govorni 10.14 28 39
34 violence nasilje 10.11 23 28
35 desire želeti 10.03 49 53
36 Hungarian mađarski 9.85 51 108
37 Albanian albanski 9.84 37 76
38 say kazati 9.73 152 187
39 engage baviti 9.66 39 46
40 elementary osnovni 9.64 107 138
41 system sistem 9.58 29 34
42 renaming preimenovanje 9.54 45 94
43 learn učiti 9.51 175 340
44 title naslov 9.50 26 38
45 dialect dijalekat 9.50 46 78
46 her njen 9.47 29 33
47 written napisan 9.45 29 35
48 know znati 9.44 122 159
49 basis osnov 9.41 60 72
50 translate prevoditi 9.40 42 56
The most significant collocates (Table 18) present a more opaque pattern, with
considerably more attributive adjectives in the list, as above: službeni ‘official’ (rank 1,
285 occurrences) and zvaničan ‘official’ (rank 20, 112 occurrences), različit ‘different’
(rank 2, 83 occurrences), strani ‘foreign’ (rank 6, 1,011 occurrences), poseban ‘special’
(rank 11, 93 occurrences), napisan ‘written’ (rank 47, 35 occurrences), zajednički
‘common’ (rank 21, 83 occurrences), nov ‘new’ (rank 31, 254 occurrences), savremeni
‘contemporary’ (rank 32, 106 occurrences), govorni ‘spoken’ (rank 33, 39 occurrences).
However, we do note references to minorities: albanski ‘Albanian’ (rank 37, 76
occurrences) and mađarski ‘Hungarian’ (rank 36, 108 occurrences), and a set of most
pertinent items suggesting identity contestation such as crnogorski ‘Montenegrin’ (rank
8, 351 occurrences), ime ‘name’ (rank 25, 208 occurrences), and preimenovanje
‘renaming’ (rank 42, 94 occurrences). New, previously unattested items include nacija
‘nation’ (rank 16, 51 occurrences), latinica ‘Latin (alphabet)’ (rank 17, 41 occurrences),
grupa ‘group’ (rank 22, 65 occurrences), and nasilje ‘violence’ (rank 34, 28 occurrences).
In conclusion, similar to keywords, collocation patterns present the analyst with a
wealth of information which is difficult to explore in an efficient but principled manner,
although, again, frequency does seem to be a better guide to insights into discursive
patterns than MI scores. However, similar to keyword analysis, collocation analysis
offers another analytical technique33 which has the potential to increase its usefulness for
the identification of discourses and ideologies, to which we now turn.
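One reason why frequency may be the better guide lies in how MI is computed. The minimal sketch below uses the widely cited span-adjusted MI formula: log2 of observed over expected co-occurrences, with expected co-occurrences estimated from node frequency, collocate frequency, window size, and corpus size. Both the exact formula WST applies and the window of eight token slots are assumptions here, and all figures are invented. The example shows that MI ranks a collocate seen only five times above one that co-occurs with the node 400 times.

```python
import math


def mutual_information(observed, f_node, f_coll, corpus_size, window=8):
    """Span-adjusted pointwise MI: log2(observed / expected co-occurrences).

    expected = f_node * f_coll * window / corpus_size, where `window` is the
    number of token slots examined around each node occurrence (assumed).
    """
    expected = f_node * f_coll * window / corpus_size
    return math.log2(observed / expected)


# Invented figures: node frequency 1,000 in a 1,000,000-token corpus.
candidates = {
    # collocate: (co-occurrences with the node, overall frequency)
    "rare": (5, 5),           # co-occurs with the node every time it appears
    "frequent": (400, 5000),  # many co-occurrences, but common elsewhere too
}
scores = {c: mutual_information(o, 1000, f, 1_000_000)
          for c, (o, f) in candidates.items()}
by_mi = sorted(scores, key=scores.get, reverse=True)
by_freq = sorted(candidates, key=lambda c: candidates[c][0], reverse=True)
print(by_mi)    # ['rare', 'frequent']
print(by_freq)  # ['frequent', 'rare']
```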
6.2.2 N-grams. Given the demonstrated higher relevance of the 5+ hits articles
for our purposes here, n-gram analysis was conducted on the 5+ hits section of
SERBCORP only. As expected, the total number of identified recurrent phrases was
large (3,753), and most were bigrams (2,381). The list of n-grams presented here (Table
19) is a researcher-selected sample from the 100 most frequent phrases in each
category (i.e., for each n-gram length).
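The n-gram extraction described here can be sketched as follows. The sketch assumes tokenised, lowercase concordance lines as input. The concordance window, tokenisation, and frequency thresholds used in this study are not specified here, so the function name and parameters are illustrative, and the lines are invented. Because all n-grams in a qualifying line are counted, phrases without any form of JEZIK can also surface.

```python
from collections import Counter


def ngrams_in_concordance(lines, node_forms, n, min_freq=1):
    """Count n-grams (token tuples) in concordance lines containing a node form.

    The n-grams themselves need not include the node. Returns a dict ordered
    from most to least frequent, keeping n-grams with count >= min_freq.
    """
    counts = Counter()
    for tokens in lines:
        if not any(t in node_forms for t in tokens):
            continue
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {g: c for g, c in counts.most_common() if c >= min_freq}


# Invented concordance lines (tokenised, lowercase).
lines = [
    "odbor za standardizaciju srpskog jezika".split(),
    "institut za srpski jezik".split(),
    "srpski jezik i književnost".split(),
]
node_forms = {"jezik", "jezika", "jeziku"}
bigrams = ngrams_in_concordance(lines, node_forms, 2)
print(list(bigrams)[:3])
```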
Again, similar to keyword associates, this represents an improvement on collocate
lists as we can now see the node word (here, different forms of the lemma JEZIK) in a
variety of phrasal contexts. Further, although this n-gram analysis is based on the
concordance lines of different forms of the lemma JEZIK, we can see a large number of
relevant phrases that do not contain any of the forms of this lemma. Because n-grams are
based on lexical patterns identified by collocation analysis, we are likely to see a lot of
the same items, only contextualized. Indeed, if we look at the bigrams here, we can see
many of the same lexical items and traces of that same discourse of construction and
maintenance of national identity in the (implied) opposition of phrases such as srpski
jezik ‘Serbian language’, maternji jezik ‘mother tongue’, svoj jezik ‘own language’, naš
jezik ‘our language’ (ranks 1, 2, 5, 7), on the one hand, and the phrases strani jezik
‘foreign language’, engleski jezik ‘English language’, and drugi (strani) jezik ‘second
(foreign) language’ (ranks 3, 4, 11), on the other. We also note the high prominence of
phrases pointing to regional ethnolinguistic identities: srpski jezik ‘Serbian language’,
crnogorski jezik ‘Montenegrin language’, hrvatski jezik ‘Croatian language’, and
bosanski jezik ‘Bosnian language’ (ranks 1, 6, 9, 10). The top trigrams, for example,
confirm the association between language and literature (jezik i književnost ‘language and
literature’, rank 15) and language and culture (jezik i kulturu ‘language and culture’, rank
26), but also point to a discourse related to language policy (jezik i pismo ‘language and
alphabet’, rečnik srpskog jezika ‘dictionary [of the] Serbian language’, odbor za
standardizaciju ‘board for standardization’, ranks 16, 18, 19) and a discourse of
contestation (preimenovanje srpskog jezika ‘renaming [of the] Serbian language’, o
preimenovanju jezika ‘about [the] renaming [of] language’, ranks 22, 24).
The top n-grams with four, five and six constituents confirm these and other
already attested patterns. Thus, in the 4-gram section, we see more evidence of a
discourse on minority language rights (na jezicima nacionalnih manjina ‘in [the]
languages [of] national minorities’, rank 32), endangerment (za zaštitu srpskog jezika ‘for
[the] protection [of the] Serbian language’, za odbranu srpskog jezika ‘for [the] defense
[of the] Serbian language’, ranks 35, 38), and contestation (preimenovanje srpskog jezika
u ‘renaming [of the] Serbian language into’, ne postojanju crnogorskog književnog ‘non-
existence [of the] Montenegrin literary’, o ne postojanju crnogorskog ‘about [the] non-
existence [of] Montenegrin’, ranks 34, 39, 42). In the 5-gram section, we see traces of a
discourse of language policy involving institutional control over language (odbora za
standardizaciju srpskog jezika ‘board for [the] standardization [of the] Serbian language’,
zakon o službenoj upotrebi jezika ‘law on [the] official use [of] language’, institut za
srpski jezik SANU ‘SANU institute for [the] Serbian language’, ranks 45, 54, 59), as well
as discourses of endangerment (e.g., primena latiničnog pisma srpskog jezika ‘use [of
the] Latin alphabet [of the] Serbian language’, sačuvati sopstveni jezik [i] njegovu
posebnost ‘preserve [one’s] own language [and] its autonomy’, rat za srpski jezik i ‘war
for [the] Serbian language and’, ranks 58, 60, 65), contestation (e.g., srpski jezik
preimenuju u crnogorski ‘rename [the] Serbian language into [the] Montenegrin’, rank
63), and ethnolinguistic identity (svoju nacionalnost i svoj jezik ‘[one’s] own nationality
and [one’s] own language’, rank 64).
Table 19
Sample of the Most Frequent N-grams in the 5+ Hits Section of SERBCORP (by Number
of Constituents and Frequency)
N N-gram (English) N-gram (Serbian) Freq.

2-grams
1 Serbian language srpski jezik 955
2 mother tongue maternji jezik 267
3 foreign language strani jezik 258
4 English language engleski jezik 173
5 own language svoj jezik 120
6 Montenegrin language crnogorski jezik 96
7 our language naš jezik 92
8 literary language književni jezik 89
9 Croatian language hrvatski jezik 81
10 Bosnian language bosanski jezik 72
11 second language drugi jezik 69
12 one language jedan jezik 68
13 Russian language ruski jezik 59
14 official language službeni jezik 57
3-grams
15 language and literature jezik i književnost 149
16 language and alphabet jezik i pismo 55
17 second foreign language drugi strani jezik 31
18 dictionary (of the) Serbian language rečnik srpskog jezika 23
19 board for standardization odbor za standardizaciju 21
20 two foreign languages dva strana jezika 19
21 Serbian and Croatian srpskog i hrvatskog 18
22 renaming (of the) Serbian language preimenovanje srpskog jezika 17
23 as an elective kao izborni predmet 17
24 about (the) renaming (of) language o preimenovanju jezika 15
25 science of language nauka o jeziku 15
26 language and culture jezik i kulturu 15
4-grams
27 Serbian language and literature srpski jezik i književnost 84
28 for (the) standardization (of the) Serbian language za standardizaciju srpskog jezika 63
29 Institute for (the) Serbian language instituta za srpski jezik 36
30 board for (the) standardization (of) Serbian odbora za standardizaciju srpskog 32
31 about (the) official use (of the) language o službenoj upotrebi jezika 28
32 in (the) languages (of) national minorities na jezicima nacionalnih manjina 26
33 (of the) literary and people’s language književnog i narodnog jezika 21
34 renaming (of the) Serbian language into preimenovanje srpskog jezika u 15
35 for (the) protection (of the) Serbian language za zaštitu srpskog jezika 12
36 mother tongue and literature maternji jezik i književnost 12
37 law on official use zakon o službenoj upotrebi 11
38 for (the) defense (of the) Serbian language za odbranu srpskog jezika 10
39 non-existence (of the) Montenegrin literary ne postojanju crnogorskog književnog 8
40 existence (of the) Montenegrin literary language postojanju crnogorskog književnog jezika 8
41 Serbian science (of) language srpska nauka o jeziku 7
42 about (the) non-existence (of) Montenegrin o ne postojanju crnogorskog 7
43 foreign language and mathematics strani jezik i matematika 7
5-grams
44 for (the) Serbian language and literature za srpski jezik i književnost 45
45 board for (the) standardization (of the) Serbian language odbora za standardizaciju srpskog jezika 32
46 official use (of) language and alphabet službenoj upotrebi jezika i pisma 26
47 language with elements (of) national culture jezik sa elementima nacionalne kulture 25
48 (of the) department (of) Serbian language and odseka za srpski jezik i 21
49 (of the) SANU institute for (the) Serbian language instituta za srpski jezik SANU 21
50 (of the) Serbo-Croatian literary and people’s language srpskohrvatskog književnog i narodnog jezika 14
51 in mathematics and mother tongue iz matematike i maternjeg jezika 13
52 Serbian language and (the) Cyrillic alphabet srpski jezik i ćirilično pismo 11
53 Serbian language in official use u službenoj upotrebi srpski jezik 11
54 law on (the) official use (of) language zakon o službenoj upotrebi jezika 11
55 for (the) protection (of the) Cyrillic (of the) Serbian language za zaštitu ćirilice srpskog jezika 11
56 association for (the) protection (of the) Cyrillic (of) Serbian udruženja za zaštitu ćirilice srpskog 8
57 non-existence (of the) Montenegrin literary language ne postojanju crnogorskog književnog jezika 8
58 use (of the) Latin alphabet (of the) Serbian language primena latiničnog pisma srpskog jezika 7
59 SANU institute for (the) Serbian language institut za srpski jezik SANU 6
60 preserve (one’s) own language and (its) autonomy sačuvati sopstveni jezik njegovu posebnost 6
61 rename (the) language into (the) Montenegrin language jezik preimenuju u crnogorski jezik 6
62 Serbian language into mother tongue srpski jezik u maternji jezik 6
63 rename (the) Serbian language into Montenegrin srpski jezik preimenuju u crnogorski 6
64 (one’s) own nationality and (one’s) own language svoju nacionalnost i svoj jezik 5
65 war for (the) Serbian language and rat za srpski jezik i 5
66 professor (of the) Serbian language and literature profesor srpskog jezika i književnosti 5
6-grams
67 about (the) official use (of) language and alphabet o službenoj upotrebi jezika i pisma 25
68 (of the) department of (the) Serbian language and literature odseka za srpski jezik i književnost 18
69 Bosnian language with elements of national culture bosanski jezik sa elementima nacionalne kulture 13
70 association for protection (of) Cyrillic (of the) Serbian language udruženja za zaštitu ćirilice srpskog jezika 8
71 dictionary (of) Serbo-Croatian literary and people’s language rečnik srpskohrvatskog knjiž. i narodnog jezika 8
72 students (in the) department (of the) Serbian language and studenti odseka za srpski jezik i 7
73 rename (the) Serbian language into (the) Montenegrin language srpski jezik preimenuju u crnogorski jezik 6
74 subject (of) Serbian language into mother tongue predmeta srpski jezik u maternji jezik 6
75 in (the) languages (of) national minorities and for na jezicima nacionalnih manjina i za 5
76 official use (of) other languages and alphabets službena upotreba drugih jezika i pisama 5
77 Serbian language and literature into mother srpski jezik i književnost u maternji 5
78 chair (of) board for (the) standardization (of) Serbian language preds. odbora za standardizaciju srpskog jezika 5
79 fellow (of the) SANU institute for (the) Serbian language saradnik instituta za srpski jezik SANU 5
80 war for (the) Serbian language and orthography rat za srpski jezik i pravopis 5
Finally, in the 6-gram section, we see many of the same phrases, only more
complete (e.g., udruženja za zaštitu ćirilice srpskog jezika ‘association for [the]
protection [of the] Cyrillic [of the] Serbian language’, srpski jezik preimenuju u
crnogorski jezik ‘rename [the] Serbian language into [the] Montenegrin language’, rat za
srpski jezik i pravopis ‘war for [the] Serbian language and orthography’, ranks 70, 73, 80)
and a phrase pointing to a conception of language in terms of minority ethnocultural
rights (bosanski jezik sa elementima nacionalne kulture ‘Bosnian language with elements
of national culture’, rank 69).
As can be seen, however, even this sample of n-grams (80 of 3,753, or about 2%)
presents more information than an analyst can easily deal with. In other words,
although we have been able to identify a number of patterns pointing to language-related
discourses with clear ideological implications, such as those of endangerment,
institutional control, minority rights, and contestation, we cannot be sure that we are not
missing anything, and it is still difficult to make a principled decision about what to
focus the analysis on. Moreover, even if we decided that this was enough information to
choose a focus, how would we identify the most representative texts for qualitative
analysis? As mentioned above, available research favors the examination of concordance
lines at this point, but again we are dealing with thousands of occurrences of potentially
relevant lexical items and hundreds of concordance lines. I noted above that Hunston
(2002), for instance, suggests concentrating on a random sample of concordance lines to
get around this problem, but that hardly solves it: a random sample reduces the number
of lines to be read without indicating which lines, or which texts, matter most. Similarly,
using the ‘plot’ function to identify texts with the highest numbers of hits for a particular
lemma form of the node word (as in Vessey, 2013a) seems inadequate if we aim to
account for the corpus as a whole. Fortunately, exploratory factor analysis and cluster
analysis appear to provide solutions to these and a number of other issues in
corpus-based discourse and ideology research.
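To make the critique concrete, the random-sampling strategy attributed to Hunston (2002) can be sketched as follows. This is a minimal illustration under stated assumptions: the key-word-in-context (KWIC) builder, its default five-token window, the fixed seed, and all function names are hypothetical choices, not taken from Hunston or from any particular concordancer. The node forms passed in stand for inflected forms of JEZIK.

```python
import random


def kwic(tokens: list[str], node_forms: set[str], window: int = 5) -> list[tuple[str, str, str]]:
    """Return (left context, node, right context) for every hit of a node form."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() in node_forms:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


def sample_lines(lines: list, k: int, seed: int = 0) -> list:
    """Draw a reproducible random sample of at most k concordance lines."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-examined
    return rng.sample(lines, min(k, len(lines)))


if __name__ == "__main__":
    toks = "o preimenovanju jezika i o jeziku".split()
    hits = kwic(toks, {"jezika", "jeziku"}, window=2)
    print(sample_lines(hits, 1))
```

Whatever value of k is chosen, the sample only shrinks the reading load; nothing in the procedure ranks lines or texts by how central they are to a discourse, which is the gap the factor and cluster analyses are meant to fill.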
6.3 Exploratory Factor Analysis
Factor analysis resulted in the adoption of a 12-factor solution accounting for
34.13% of the total variance in the data. In contrast to Fitzsimmons Doolan (2011, 2014),
the factors were interpreted not as language ideologies but as indicators of the most
salient topics, and thus discourses, in the data. The factors were labeled as follows:
Language education (5.40%), Cyrillic-only (3.18%), Entrance exams (3.16%),
Officialization of Montenegrin 1 (2.83%), Minority language rights (2.72%),
Contestation over language ownership and name (2.67%), Literature and publishing
(2.61%), Officialization of Montenegrin 2 (2.61%), Foreign language education (2.60%),
Linguistics as a science, lexicography, standardization and contestation (2.55%),
Officialization of Bosnian (2.10%), and Linguacultural diplomacy, language, and culture
(1.73%). Descriptive statistics for all of the variables in the preferred factor solution are
presented in Table 20. The eigenvalues of the unrotated factor analysis are shown in
Table 21, and Figure 4 shows the corresponding scree plot. Table 22 presents the rotated
factor patterns for the 12-factor solution using Varimax rotation. Finally, Table 23
summarizes the factorial structure, listing the salient collocates for each factor (those
loading at ≥ .30) and their factor loadings.
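Varimax is an orthogonal rotation: it redistributes loadings so that each factor has a few high and many near-zero loadings, while leaving each variable's communality (its row sum of squared loadings) unchanged. The NumPy sketch below implements the standard SVD-based varimax iteration. It is an illustration only: this section does not say which software performed the rotation, and the sketch omits the Kaiser row normalization that some statistical packages apply by default. The helper `percent_variance` (a hypothetical name) shows one common way of expressing each factor's share of variance, namely the column sum of squared loadings divided by the number of variables (k = 107 here) and multiplied by 100; whether the percentages reported above were computed exactly this way is an assumption.

```python
import numpy as np


def varimax(loadings: np.ndarray, max_iter: int = 100, tol: float = 1e-6):
    """Rotate a (variables x factors) loading matrix with the varimax criterion.

    Returns the rotated loadings and the orthogonal rotation matrix.
    No Kaiser normalization is applied (an assumption of this sketch).
    """
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Target matrix for the varimax criterion (gamma = 1).
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # orthogonal polar factor of loadings.T @ target
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break  # no meaningful improvement: treat as converged
        criterion = new_criterion
    return loadings @ rotation, rotation


def percent_variance(loadings: np.ndarray) -> np.ndarray:
    """Per-factor share of total variance: column sums of squared loadings / k * 100."""
    return 100 * (loadings ** 2).sum(axis=0) / loadings.shape[0]
```

Because the rotation matrix is orthogonal, rotation changes how variance is shared among the factors but not the total explained by all of them together, which is why per-factor percentages are reported for the rotated solution while the overall figure stays fixed.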
Table 20
Descriptive Statistics for the Variables in the 12-factor Solution (N = 943, k = 107)
Collocate of JEZIK (English) Collocate of JEZIK (Serbian) Mean Minimum value Maximum value Range Standard deviation
academy akademija 0.3 0.0 14.6 14.6 1.1
association udruženje 0.2 0.0 16.0 16.0 1.2
attend pohađati 0.1 0.0 14.4 14.4 0.8
authorities vlast 0.4 0.0 11.8 11.8 1.2
be able to moći 0.4 0.0 11.5 11.5 1.1
begin početi 0.6 0.0 23.5 23.5 1.5
Belgrade Beograd 1.1 0.0 35.3 35.3 2.3
board odbor 0.5 0.0 29.4 29.4 2.2
book knjiga 2.1 0.0 33.1 33.1 4.1
Bosniak bošnjački 0.3 0.0 25.3 25.3 1.7
Bosnian bosanski 0.3 0.0 28.6 28.6 2.0
call zvati 0.3 0.0 11.8 11.8 0.9
center centar 0.5 0.0 20.2 20.2 1.7
children deca 0.3 0.0 11.1 11.1 0.9
class period čas 0.6 0.0 24.4 24.4 2.0
common zajednički 0.3 0.0 19.6 19.6 0.9
community zajednica 0.4 0.0 18.9 18.9 1.5
compulsory obavezan 0.2 0.0 12.2 12.2 1.0
constitution ustav 0.5 0.0 30.5 30.5 2.4
course kurs 0.2 0.0 22.0 22.0 1.3
Croatia Hrvatska 0.4 0.0 22.2 22.2 1.6
Croatian hrvatski 0.8 0.0 21.4 21.4 2.4
Croats Hrvati 0.3 0.0 16.7 16.7 1.2
cultural kulturni 0.6 0.0 13.2 13.2 1.4
culture kultura 1.5 0.0 31.8 31.8 2.9
curriculum program 1.0 0.0 37.8 37.8 2.7
Cyrillic (n.) ćirilica 0.7 0.0 32.7 32.7 3.0
Cyrillic (adj.) ćirilično 0.1 0.0 10.9 10.9 0.6
decision odluka 0.5 0.0 18.8 18.8 1.8
department odsek 0.1 0.0 16.4 16.4 0.8
department katedra 0.3 0.0 15.3 15.3 1.3
dictionary rečnik 0.8 0.0 42.0 42.0 3.5
edition izdanje 0.3 0.0 14.1 14.1 1.1
education obrazovanje 0.8 0.0 27.0 27.0 2.2
elective izborni 0.1 0.0 15.6 15.6 0.8
element element 0.1 0.0 6.8 6.8 0.5
elementary osnovna 1.0 0.0 21.5 21.5 2.2
exam ispit 0.4 0.0 19.6 19.6 2.0
expression izraz 0.3 0.0 10.4 10.4 1.0
first prvi 2.0 0.0 30.9 30.9 2.5
foreign strani 2.2 0.0 31.4 31.4 4.0
framework okvir 0.3 0.0 11.8 11.8 0.9
grade razred 0.7 0.0 24.4 24.4 2.5
high school (gen.) srednja 0.3 0.0 12.2 12.2 1.0
high school (acad.) gimnazija 0.4 0.0 19.7 19.7 1.7
Hungarian mađarski 0.2 0.0 47.1 47.1 1.7
institute institut 0.3 0.0 24.8 24.8 1.7
instruction nastava 1.0 0.0 36.2 36.2 3.0
instructors nastavnici 0.6 0.0 23.4 23.4 2.1
interest interesovanje 0.2 0.0 7.8 7.8 0.6
introduction uvođenje 0.2 0.0 14.3 14.3 1.0
knowledge znanje 0.6 0.0 16.6 16.6 1.7
Latin latinica 0.5 0.0 36.1 36.1 2.5
law zakon 0.6 0.0 29.4 29.4 2.1
level nivo 0.3 0.0 22.1 22.1 1.2
linguist lingvista 0.3 0.0 13.7 13.7 1.1
linguistic jezički 0.9 0.0 19.8 19.8 2.0
linguistic lingvistički 0.2 0.0 10.1 10.1 0.7
linguistics lingvistika 0.1 0.0 12.5 12.5 0.7
literary književni 0.9 0.0 20.0 20.0 2.1
literature književnost 1.4 0.0 34.4 34.4 3.2
mathematics matematika 0.3 0.0 19.3 19.3 1.5
minority manjina 0.6 0.0 31.1 31.1 2.8
Montenegrin crnogorski 0.9 0.0 40.0 40.0 3.4
Montenegrins Crnogorci 0.2 0.0 19.3 19.3 1.2
Montenegro Crna 1.2 0.0 36.1 36.1 3.8
mother (adj.) maternji 0.8 0.0 24.6 24.6 2.2
name ime 0.9 0.0 22.5 22.5 2.2
national nacionalni 1.4 0.0 27.6 27.6 3.4
Nikšić Nikšić 0.1 0.0 15.3 15.3 0.9
official službeni 0.6 0.0 43.2 43.2 2.9
part deo 1.2 0.0 20.7 20.7 2.9
philology filološki 0.2 0.0 9.9 9.9 0.9
philosophy filozofski 0.2 0.0 8.7 8.7 0.8
poem pesma 0.4 0.0 21.3 21.3 1.8
poetry poezija 0.5 0.0 24.5 24.5 2.0
professor profesor 1.8 0.0 39.2 39.2 3.9
protection zaštita 0.2 0.0 10.5 10.5 0.8
publish objaviti 0.4 0.0 14.7 14.7 1.1
published objavljen 0.3 0.0 8.3 8.3 0.8
renaming preimenovanje 0.1 0.0 15.3 15.3 0.9
rights prava 0.9 0.0 33.3 33.3 2.2
Romanian rumunski 0.1 0.0 15.0 15.0 0.8
Ruthenian rusinski 0.1 0.0 12.7 12.7 0.8
SANU SANU 0.2 0.0 19.6 19.6 1.3
school (K-12) škola 2.8 0.0 35.8 35.8 6.0
school (univ.) fakultet 1.1 0.0 38.8 38.8 3.3
science nauka 0.7 0.0 23.6 23.6 1.8
scientific naučni 0.2 0.0 11.1 11.1 0.9
alphabet pismo 1.7 0.0 53.8 53.8 5.4
section odeljenje 0.4 0.0 21.9 21.9 1.7
Serbian srpski 8.5 0.0 68.6 68.6 10.2
Serbo-Croatian srpskohrvatski 0.2 0.0 29.1 29.1 1.3
Serbs Srbi 1.2 0.0 35.1 35.1 3.1
Slovak slovački 0.1 0.0 12.7 12.7 0.7
standard standardni 0.1 0.0 7.5 7.5 0.5
students (K-12) učenici 0.7 0.0 19.4 19.4 2.4
students (univ.) studenti 0.7 0.0 32.8 32.8 2.9
study učiti 0.9 0.0 27.8 27.8 2.6
subject predmet 0.9 0.0 32.8 32.8 2.9
teach predavati 0.2 0.0 12.1 12.1 1.0
teachers učitelji 0.2 0.0 15.7 15.7 0.8
use (n.) upotreba 0.8 0.0 31.8 31.8 2.5
war rat 0.4 0.0 16.3 16.3 1.2
word reč 2.6 0.0 53.5 53.5 4.6
work delo 1.3 0.0 25.9 25.9 2.2
writer pisac 0.8 0.0 30.1 30.1 2.1
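The summary statistics in Table 20 are per-column descriptions of a texts-by-collocates matrix (943 rows, 107 columns). A minimal NumPy sketch follows. The normalization of the underlying collocate frequencies is not described in this section, so the input is simply assumed to hold whatever per-text values entered the factor analysis; the use of the sample standard deviation (ddof = 1) and the function name are likewise assumptions. Since every minimum in Table 20 is 0.0, each range there equals the corresponding maximum.

```python
import numpy as np


def describe_columns(values, names):
    """Mean, minimum, maximum, range, and SD for each column of a texts x variables matrix."""
    values = np.asarray(values, dtype=float)
    table = {}
    for j, name in enumerate(names):
        col = values[:, j]
        table[name] = {
            "mean": col.mean(),
            "min": col.min(),
            "max": col.max(),
            "range": col.max() - col.min(),
            "sd": col.std(ddof=1),  # sample standard deviation (assumption)
        }
    return table
```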
Table 21
First 13 Eigenvalues of the Unrotated Factor Analysis (N = 943, k = 107)
Factor number Eigenvalue % of shared variance

1 9.213 8.610
2 5.352 5.002
3 4.297 4.016
4 3.846 3.595
5 3.376 3.155
6 3.124 2.920
7 2.955 2.762
8 2.781 2.599
9 2.547 2.380
10 2.430 2.271
11 2.033 1.900
12 1.951 1.823
13 1.795 1.678
Figure 4. Scree plot of eigenvalues
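The third column of Table 21 can be checked against the second. Assuming the factor analysis was run on the correlation matrix of the k = 107 collocates (whose eigenvalues sum to k), each percentage equals the eigenvalue divided by 107 and multiplied by 100 (e.g., 9.213 / 107 × 100 ≈ 8.610). The short Python check below reproduces the column from the printed eigenvalues; discrepancies in the third decimal (e.g., 3.594 computed vs. 3.595 reported for Factor 4) presumably reflect rounding of the eigenvalues as printed.

```python
K = 107  # number of collocate variables (k in Table 21)

# Eigenvalues and reported percentages as printed in Table 21.
EIGENVALUES = [9.213, 5.352, 4.297, 3.846, 3.376, 3.124, 2.955,
               2.781, 2.547, 2.430, 2.033, 1.951, 1.795]
REPORTED = [8.610, 5.002, 4.016, 3.595, 3.155, 2.920, 2.762,
            2.599, 2.380, 2.271, 1.900, 1.823, 1.678]


def percent_of_variance(eigenvalues, k):
    """Each eigenvalue as a percentage of the total variance of k standardized variables."""
    return [100 * e / k for e in eigenvalues]


if __name__ == "__main__":
    computed = percent_of_variance(EIGENVALUES, K)
    for factor, (c, r) in enumerate(zip(computed, REPORTED), start=1):
        print(f"Factor {factor:2d}: computed {c:.3f}, reported {r:.3f}")
```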
Table 22
Rotated Factor Patterns for the 12-factor Solution (Varimax Rotation)
English Serbian F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12

academy akademija .035 -.038 -.022 .027 .018 .463 .047 .004 -.029 .191 -.107 -.014
association udruženje .001 .356 -.004 .002 -.032 .043 .059 .112 -.014 -.072 -.028 .096
attend pohađati .353 -.037 .309 .006 .064 -.032 -.093 .025 .000 -.056 .093 .101
authorities vlast .152 .058 -.030 .096 .073 -.002 -.127 .392 -.072 -.059 .079 -.127
be able to moći .131 -.028 .262 -.057 .059 -.083 -.009 -.003 .482 -.020 .035 -.114
begin početi .194 -.091 -.018 -.001 -.010 .112 .033 .019 .375 -.047 -.120 .013
Belgrade Beograd .002 .010 .061 .209 -.065 -.055 .020 -.134 -.046 -.023 -.014 .349
board odbor .116 .054 -.005 .155 .076 -.008 -.042 .010 -.055 .220 .351 .024
book knjiga -.089 -.027 -.064 -.027 -.032 -.046 .562 -.067 -.059 .014 -.020 .016
Bosniak bošnjački .047 -.009 -.015 .054 .002 .174 -.046 .143 -.029 .051 .669 -.028
Bosnian bosanski .080 -.025 -.020 -.043 -.008 .094 -.025 .048 -.006 -.003 .804 -.015
call (v.) zvati -.052 -.049 -.053 -.003 -.066 .334 -.068 .102 -.018 .033 .065 -.127
center centar -.017 -.023 -.022 .079 -.030 -.055 -.055 -.018 .003 -.021 -.002 .469
children deca .527 -.064 .184 .001 -.023 -.077 -.157 .010 .010 -.125 -.022 .031
class period čas .611 -.041 .142 -.020 -.043 -.044 -.073 .026 .123 -.056 .038 .053
common zajednički .007 -.012 -.046 .003 -.007 .136 -.033 .000 .461 .095 .016 -.049
community zajednica -.038 .043 -.040 .001 .384 .021 -.097 .124 -.002 -.027 .013 -.014
compulsory obavezan .648 .017 .027 -.040 -.014 -.061 -.069 -.017 -.018 -.020 .092 -.045
constitution ustav -.040 .464 -.048 .090 .117 -.052 -.116 .186 -.064 .001 -.033 -.142
course kurs .123 -.048 .005 .109 .004 -.065 -.155 -.037 .045 -.056 -.051 .394
Croatia Hrvatska -.043 -.012 .000 .005 .047 .660 -.042 -.075 .002 .035 .011 -.057
Croatian hrvatski -.067 .099 .005 .013 .117 .622 -.085 -.026 .011 .136 .121 -.070
Croats Hrvati -.041 .034 -.005 -.021 .137 .659 -.091 -.030 -.007 .130 .081 -.028
cultural kulturni -.094 .134 -.044 -.033 -.005 .029 .109 .045 -.028 .004 .005 .421
culture kultura -.101 .029 -.071 .022 -.020 -.040 .163 -.001 -.004 .044 .096 .377
curriculum program .353 -.040 -.038 .112 .063 -.034 -.161 -.047 .092 -.028 .005 .175
Cyrillic (n.) ćirilica -.019 .735 .002 -.053 -.135 .125 .024 -.016 -.023 -.064 .016 .084
Cyrillic (adj.) ćirilično -.024 .518 -.023 .018 .055 -.078 -.058 -.023 .002 .071 -.023 -.066
decision odluka .018 .057 .046 .382 .044 -.045 -.066 .219 -.106 .013 .054 -.212
department odsek -.022 -.015 .014 .459 -.026 .046 .003 .160 -.051 .064 -.018 .052
department katedra .032 -.015 -.077 .387 .026 -.010 .065 -.105 .314 .037 -.007 .134
dictionary rečnik .011 -.042 -.023 -.060 -.019 -.044 .104 -.060 -.100 .485 -.058 .011
edition izdanje .008 -.045 -.035 -.050 -.017 -.124 .409 -.056 -.088 .314 -.054 .017
education obrazovanje .302 .010 .096 .277 .112 -.104 -.151 -.013 .028 -.031 .015 -.008
elective izborni .501 -.016 -.033 -.043 .035 -.039 -.042 .009 -.046 -.024 .523 -.018
element element .116 -.002 -.035 -.051 .022 -.042 -.002 -.012 -.026 -.031 .477 -.005
elementary osnovna .608 .011 .255 -.003 .022 -.072 -.084 -.020 .287 -.002 .017 -.058
exam ispit -.006 -.029 .758 -.032 -.029 -.042 -.047 -.035 .233 -.050 -.035 .030
expression izraz -.075 .003 -.063 -.088 -.077 -.020 .016 -.085 -.043 .311 .012 -.128
first prvi .477 -.004 .100 -.030 -.063 -.002 .127 -.088 .093 -.026 -.007 -.071
foreign strani .328 -.035 .054 -.048 -.039 -.112 -.141 -.045 .450 .017 -.102 .061
framework okvir .245 .000 .006 .043 -.014 -.080 -.026 -.024 .408 .091 .008 .018
grade razred .797 -.015 .176 -.049 -.025 -.048 -.055 .010 .322 -.048 .005 -.085
high school (gen.) srednja .084 .026 .631 .006 .008 -.031 -.061 -.033 .016 -.081 -.023 -.025
high school (acad.) gimnazija .115 -.042 .532 .206 -.030 -.061 -.052 .075 -.033 -.031 -.015 -.107
Hungarian mađarski -.015 -.001 .034 .000 .458 .013 -.017 -.039 .007 -.033 -.009 -.029
institute institut .008 -.055 -.020 .014 -.010 -.070 -.043 -.029 -.070 .183 -.039 .362
instruction nastava .448 -.023 .233 .185 .018 -.082 -.108 .031 .453 -.055 .083 .006
instructors nastavnici .570 -.026 .074 .086 -.030 -.047 -.074 -.012 .306 -.058 -.066 -.030
interest interesovanje .042 -.059 .048 .034 -.006 -.067 .052 -.024 .004 -.029 -.023 .391
introduction uvođenje .068 .077 -.023 .021 .014 -.029 -.065 .370 .111 .013 .136 .062
knowledge znanje .207 -.063 .303 .074 -.057 -.120 -.109 -.063 .362 -.026 -.055 .103
Latin latinica -.030 .547 -.004 -.066 -.130 .135 -.008 -.043 -.006 -.043 .037 .060
law zakon .104 .348 -.067 .095 .270 -.106 -.187 -.078 -.032 -.004 .103 -.143
level nivo .103 .008 .007 .003 .088 -.075 -.116 -.018 .577 .024 -.036 .021
linguist lingvista -.063 .015 -.025 .027 -.007 .219 -.038 .060 .041 .496 .081 .041
linguistic jezički -.103 .005 .010 -.076 -.024 .027 .003 .026 .139 .576 .045 -.053
linguistic lingvistički -.073 -.007 -.020 .078 -.005 .115 -.018 .157 .080 .438 .045 .037
linguistics lingvistika -.087 -.022 .007 .067 -.023 .103 -.056 .024 .060 .373 .048 .007
literary književni -.088 -.008 -.055 .000 -.044 .128 .464 -.012 .002 .147 -.006 .038
literature književnost -.020 -.036 .024 .237 -.033 .075 .443 .005 .033 .002 -.008 .124
mathematics matematika .210 -.036 .638 -.045 -.012 -.041 -.009 -.017 .012 -.047 -.044 .008
minority manjina -.028 .098 .018 -.037 .785 .011 -.077 -.026 .039 -.056 .246 -.055
Montenegrin crnogorski -.088 .024 -.044 .032 -.001 .078 -.066 .765 .012 .073 .001 -.004
Montenegrins Crnogorci -.064 -.021 -.046 -.046 .056 .101 -.059 .428 .032 .011 -.017 .015
Montenegro Crna -.095 .075 -.048 .075 -.002 .033 -.075 .744 -.029 .003 -.002 .006
mother (adj.) maternji .100 .045 .297 .185 .083 .008 -.098 .401 -.066 -.066 .022 -.091
name ime -.079 -.003 -.064 .000 -.013 .385 -.042 .146 -.041 .077 .027 -.100
national nacionalni -.024 .149 -.028 -.078 .459 .138 -.072 .027 .041 -.020 .395 -.017
Nikšić Nikšić -.034 .027 .111 .454 -.032 -.038 -.024 .466 -.123 -.002 -.012 -.213
official službeni -.039 .558 -.022 .021 .179 -.037 -.096 .118 -.003 .028 .058 -.087
part deo -.066 -.019 .000 -.042 .004 .011 .413 -.062 -.003 .014 -.031 .133
philology filološki .030 -.029 -.030 .426 -.033 .051 .040 -.113 .141 .069 .025 .261
philosophy filozofski -.013 -.009 .023 .550 -.026 .016 -.023 .204 -.049 .038 .013 -.030
poem pesma -.040 -.062 -.071 -.027 -.030 -.084 .358 -.049 -.009 -.072 -.030 -.153
poetry poezija -.057 -.062 -.060 .012 -.039 -.086 .316 -.059 -.013 -.066 -.023 -.142
professor profesor .096 .047 .122 .578 -.071 -.093 .035 .089 .094 .078 -.008 .018
protection zaštita -.036 .388 .005 -.024 .171 .019 -.033 .096 -.017 -.056 -.025 .042
publish objaviti -.029 .064 -.054 .016 -.010 .005 .514 -.048 -.053 -.025 -.004 -.010
published objavljen -.040 -.023 .057 -.029 -.031 -.044 .440 -.052 -.037 .124 -.032 -.014
renaming preimenovanje -.038 .001 .079 .327 -.040 .003 -.037 .483 -.112 -.008 -.003 -.210
rights prava -.067 .113 -.048 .049 .401 -.005 -.142 .130 .156 -.025 .100 -.109
Romanian rumunski .014 -.006 .011 -.017 .528 .050 .052 -.029 -.031 -.067 -.051 .009
Ruthenian rusinski .026 .006 .019 -.021 .709 .001 -.012 -.006 -.025 -.036 -.053 .014
SANU SANU .009 -.029 .004 -.026 .006 .130 .032 -.046 -.055 .487 -.028 .054
school (K-12) škola .570 -.040 .561 .021 -.027 -.083 -.198 .007 .200 -.100 .024 .096
school (univ.) fakultet .089 -.027 -.045 .657 .039 -.028 -.046 -.109 .174 -.020 -.007 .170
science nauka .043 -.005 -.043 .143 .000 .074 .079 .052 -.011 .352 -.063 .063
scientific naučni -.019 -.010 -.013 .165 -.011 .060 .093 .032 -.001 .365 -.016 .057
alphabet pismo -.045 .817 -.021 -.022 -.072 .063 -.030 -.079 -.028 .000 .005 .024
section odeljenje .146 -.039 .480 -.007 .026 -.015 -.042 -.036 .042 .032 -.014 .001
Serbian srpski -.063 .259 -.008 .165 .000 .351 .222 .230 -.044 .363 .011 .103
Serbo-Croatian srpskohrvatski -.034 .043 -.007 .036 -.018 .348 .047 .018 .025 .317 .029 .026
Serbs Srbi -.085 .162 -.057 -.066 .040 .501 -.016 .080 -.003 .034 .009 .003
Slovak slovački .029 .001 .024 -.023 .615 .012 .003 .035 -.026 -.007 -.055 .037
standard standardni -.048 .047 -.023 .000 -.048 .009 -.037 -.027 .057 .317 .046 -.014
students (K-12) učenici .374 -.020 .629 -.006 .025 -.054 -.082 -.002 -.015 -.058 -.025 -.014
students (univ.) studenti -.007 -.035 -.040 .589 -.017 -.040 -.050 .026 .005 -.050 -.025 .215
study učiti .594 -.040 .137 -.021 -.046 -.043 -.132 .019 .086 -.094 .183 .094
subject predmet .761 -.013 .116 .060 .037 -.057 -.029 .085 -.002 -.036 .197 -.059
teach predavati .293 .002 .125 .136 -.042 -.009 -.008 .056 .606 -.038 .036 -.070
teachers učitelji .442 -.035 -.036 .084 -.003 -.019 -.046 -.042 .094 -.065 .024 -.040
use (n.) upotreba -.061 .647 -.047 -.014 .199 -.101 -.125 -.081 -.004 .212 .044 -.140
war rat -.077 -.065 -.049 -.067 -.040 .338 .004 -.016 -.015 -.058 -.013 -.002
word reč -.106 -.014 -.082 -.124 -.081 -.133 .028 -.131 -.054 .328 -.050 -.113
work delo -.061 -.060 -.047 -.009 -.016 -.053 .423 -.040 -.037 .073 .009 .052
writer pisac -.110 -.073 -.073 -.071 -.042 -.037 .473 -.002 -.029 -.111 .025 .017
Table 23
Summary of the Factorial Structure (Collocates in Parentheses were not Used in the
Calculation of Factor Scores)
Factor Collocate (English) Collocate (Serbian) Factor loading Factor/discourse label
Factor 1 (5.40 %) grade razred .797 Language education
subject predmet .761
compulsory obavezan .648
class period čas .611
elementary osnovna .608
study (v.) učiti .594
school (K-12) škola .570
instructors nastavnici .570
children deca .527
(elective) (izborni) .501
first prvi .477
instruction nastava .448
teachers učitelji .442
(students [K-12]) (učenici) .374
curriculum program .353
attend pohađati .353
(foreign) (strani) .328
education obrazovanje .302
Factor 2 (3.18 %) alphabet pismo .817 Cyrillic-only
Cyrillic (n.) ćirilica .735
use (n.) upotreba .647
official službeni .558
Latin latinica .547
Cyrillic (adj.) ćirilično .518
constitution ustav .464
protection zaštita .388
association udruženje .356
law zakon .348
Factor 3 (3.16 %) exam ispit .758 Entrance exams
mathematics matematika .638
high school (gen.) srednja (škola) .631
students (K-12) učenici .629
(school [K-12]) (škola) .561
high school (acad.) gimnazija .532
section odeljenje .480
(attend) (pohađati) .309
(knowledge) (znanje) .303
Factor 4 (2.83 %) school (univ.) fakultet .657 Officialization of Montenegrin 1
students (univ.) studenti .589
professor profesor .578
philosophy filozofski .550
department odsek .459
(Nikšić) (Nikšić) .454
philology filološki .426
department katedra .387
decision odluka .382
(renaming) (preimenovanje) .327
Factor 5 (2.72 %) minority manjina .785 Minority language rights
Ruthenian rusinski .709
Slovak slovački .615
Romanian rumunski .528
national nacionalni .459
Hungarian mađarski .458
rights prava .401
community zajednica .384
Factor 6 (2.67 %) Croatia Hrvatska .660 Contestation over language
Croats Hrvati .659 ownership and name
Croatian hrvatski .622
Serbs Srbi .501
academy akademija .463
name ime .385
(Serbian) (srpski) .351
Serbo-Croatian srpskohrvatski .348
war rat .338
call (v.) zvati .334
Factor 7 (2.61 %) book knjiga .562 Literature and publishing
publish objaviti .514
writer pisac .473
literary književni .464
literature književnost .443
published objavljen .440
work delo .423
part deo .413
edition izdanje .409
poem pesma .358
poetry poezija .316
Factor 8 (2.61 %) Montenegrin crnogorski .765 Officialization of Montenegrin 2
Montenegro Crna (Gora) .744
renaming preimenovanje .483
Nikšić Nikšić .466
Montenegrins Crnogorci .428
mother (adj.) maternji .401
authorities vlast .392
introduction uvođenje .370
Factor 9 (2.60 %) teach predavati .606 Foreign language education
level nivo .577
be able to moći .482
common zajednički .461
(instruction) (nastava) .453
foreign strani .450
framework okvir .408
begin početi .375
knowledge znanje .362
(grade) (razred) .322
(department) (katedra) .314
(instructors) (nastavnici) .306
Factor 10 (2.55 %) linguistic jezički .576 Linguistics as a science,
linguist lingvista .496 lexicography, standardization
SANU SANU .487 and contestation
dictionary rečnik .485
linguistic lingvistički .438
linguistics lingvistika .373
scientific naučni .365
Serbian srpski .363
science nauka .352
word reč .328
standard standardni .317
(Serbo-Croatian) (srpskohrvatski) .317
(edition) (izdanje) .314
expression izraz .311
Factor 11 (2.10 %) Bosnian bosanski .804 Officialization of Bosnian
Bosniak bošnjački .669
elective izborni .523
element element .477
(national) (nacionalni) .395
board odbor .351
Factor 12 (1.73 %) center centar .469 Linguacultural diplomacy,
cultural kulturni .421 language, and culture
course kurs .394
interest interesovanje .391
culture kultura .377
institute institut .362
Belgrade Beograd .349
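The title of Table 23 implies a scoring rule: a collocate contributes to the score of only one factor, namely the one on which it loads most strongly. This is why, for instance, elective (.501 on Factor 1, .523 on Factor 11) appears in parentheses under Factor 1. The NumPy sketch below implements one common version of this rule (unit weights, in the manner of Biber's multidimensional analysis), assuming a salience cut-off of |loading| ≥ .30 and that negatively loading collocates, had there been any, would be subtracted. Whether the collocate values were standardized before summing is not stated in this section, so scaling is left to the caller; all function names are hypothetical.

```python
import numpy as np

SALIENCE = 0.30  # cut-off used for salient loadings in Table 23


def assign_factors(loadings) -> np.ndarray:
    """Index of the factor each variable scores on, or -1 if no loading is salient."""
    absolute = np.abs(np.asarray(loadings, dtype=float))
    best = absolute.argmax(axis=1)
    best_value = absolute[np.arange(absolute.shape[0]), best]
    return np.where(best_value >= SALIENCE, best, -1)


def factor_scores(values, loadings) -> np.ndarray:
    """Unit-weighted factor scores for a texts x variables matrix.

    Each salient variable is added (positive loading) or subtracted
    (negative loading) on the single factor it is assigned to.
    """
    values = np.asarray(values, dtype=float)
    loadings = np.asarray(loadings, dtype=float)
    owner = assign_factors(loadings)
    scores = np.zeros((values.shape[0], loadings.shape[1]))
    for f in range(loadings.shape[1]):
        cols = owner == f
        signs = np.sign(loadings[cols, f])
        scores[:, f] = values[:, cols] @ signs
    return scores
```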
6.3.1 Factor 1: Language education. The texts with the top twenty factor scores
on Factor 1 are listed in Table 24 (for a key, see Section 4). It should be noted that the
top three, and a total of seven, of the twenty top-scoring articles here were originally
identified as multivariate outliers (see Section 5.3.2). Importantly, there was some
overlap with Factor 11 (Officialization of Bosnian): the two top-scoring articles on
Factor 1, for example, also had high factor scores on Factor 11. Discursive links between
individual factors were confirmed by the results of the cluster analysis (see Section 6.5),
so factors are presented in groups according to their discursive links (Table 41) rather
than according to the amount of variance they account for alone (Table 23). This
language-related (small ‘d’) discourse, which accounted for the largest share of variance
(5.40%), included the following salient collocates: grade, subject, compulsory, class
period, elementary, study, school (K-12), instructors, children, elective, first, instruction,
teachers, students (K-12), curriculum, attend, foreign, and education. Note that all text
excerpts were taken from the top-scoring texts as the most representative of a
factor/discourse.
Table 24
Top 20 Highest Scoring Articles on Factor 1 (MV Outliers are in Bold)
Rank Article Factor score

1 POL-20-8-2004-55.txt 53.51
2 POL-13-11-2004-108.txt 50.36
3 BLI-19-8-2005-242.txt 49.16
4 POL-13-4-2003-87.txt 43.78
5 POL-25-12-2004-31.txt 42.28
6 POL-17-1-2008-99.txt 41.84
7 POL-27-4-2004-28.txt 39.88
8 POL-1-4-2003-171.txt 39.62
9 BLI-11-8-2006-324.txt 39.22
10 POL-8-4-2004-123.txt 36.36
11 POL-16-9-2004-104.txt 36.16
12 POL-21-10-2004-52.txt 35.49
13 BLI-26-3-2003-586.txt 34.42
14 POL-30-6-2006-2.txt 34.11
15 BLI-14-9-2004-223.txt 34.07
16 POL-26-3-2003-39.txt 32.59
17 POL-15-12-2004-99.txt 32.30
18 POL-19-12-2003-87.txt 31.54
19 POL-22-9-2004-57.txt 30.19
20 POL-29-9-2004-9.txt 29.41
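Table 24 and the overlap with Factor 11 noted above both rest on ranking texts by factor score. A minimal Python sketch follows, assuming factor scores are stored per text ID in a dictionary with 0-based factor indices (a hypothetical data layout; the IDs and scores in the example are invented for illustration, not data from this study). It returns the top-n texts on one factor and the texts that rank among the top n on two factors at once.

```python
def top_texts(scores: dict[str, list[float]], factor: int, n: int = 20) -> list[str]:
    """Text IDs with the n highest scores on one factor (0-based index), highest first."""
    return sorted(scores, key=lambda tid: scores[tid][factor], reverse=True)[:n]


def shared_top_texts(scores: dict[str, list[float]], f_a: int, f_b: int, n: int = 20) -> list[str]:
    """Texts in the top n on both factors, listed in their rank order on f_a."""
    top_b = set(top_texts(scores, f_b, n))
    return [tid for tid in top_texts(scores, f_a, n) if tid in top_b]


if __name__ == "__main__":
    # Hypothetical texts and scores on two factors.
    demo = {"A": [53.5, 40.0], "B": [50.4, 38.0], "C": [10.0, 45.0], "D": [30.0, 1.0]}
    print(top_texts(demo, factor=0, n=2))     # ['A', 'B']
    print(shared_top_texts(demo, 0, 1, n=2))  # ['A']
```

With twelve factors stored this way, Factor 1 and Factor 11 would be indices 0 and 10.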
Based on the salient variables and a qualitative examination of representative (i.e.,
top-scoring) texts (see Section 6.6.1), Factor 1 was interpreted as a general discourse
about language education. The keyword and collocation analyses above showed that
education was one of the most prominent semantic fields in this corpus, and this is
reflected in Factor 1. Text excerpt 1 (from POL-13-4-2003-87, ranked 4 in Table 24)
illustrates the discourse typically found in articles scoring highly on Factor 1.³⁴
Uz šest do sedam osnovnih predmeta, deca će pohađati izbornu i fakultativnu
nastavu. Svi budući đaci prvaci obavezno će učiti matematiku, srpski jezik i
književnost, umetnost, fizičko i zdravstveno vaspitanje, strani jezik i umesto
prirode i društva – svet oko nas. Oni kojima srpski nije maternji učiće i maternji
jezik. Uz to će pohađati i nastavu iz dva predmeta koja sami izaberu od onoga
što im škola ponudi. […] Samo naredne školske godine, da podsetimo, neće svi
đaci prvaci učiti strani jezik. Učiće oni koji pohađaju škole u kojima je moguće
organizovati nastavu iz stranih jezika. Naime, u nekim školama nema dovoljno
nastavnika stranih jezika, ali će se taj problem, obećavaju nadležni, rešiti već do
septembra sledeće godine. Jer, po programima devetoletke obavezno je učenje
stranog jezika od prvog razreda.
In addition to six or seven core subjects, children will take elective and
optional instruction. All future first-grade students will take compulsory math,
Serbian language and literature, art, physical and health education, foreign
language and, instead of nature and society, the world around us. Those who do
not speak Serbian as a mother tongue will also receive mother tongue instruction.
In addition, they will also receive instruction in two subjects they choose
themselves from what is offered by the school. […] Let us remind readers that in the
next school year only, not all first-grade students will take foreign language classes.
Only those attending schools that are able to provide foreign language instruction
will do so. Namely, some schools do not have
a sufficient number of foreign language teachers. But that problem, according to
the authorities, will be resolved as early as September next year because the nine-
year elementary curriculum requires foreign language instruction from grade one.
In this excerpt from a Politika article from April 13, 2003, we see an example of the
general discourse about language education thematizing the then-topical changes to
(foreign language) education in Serbia. Serbia is a republic and, despite the existence of
an autonomous province (Vojvodina), its education system and the government as a
whole are highly centralized, which means that policy and curriculum decisions are made
by the republic's Ministry of Education in Belgrade. This excerpt refers to a change in
policy necessitated by a country-wide shortage of qualified foreign language instructors.
Despite some overlap with several other factors noted above, texts scoring highly
on Factor 1 typically discuss language education in the context of a broader discourse on
education (e.g., the educational reform that was topical during the period under study).
They thus typically do not thematize Central South Slavic (or any other) ethnolinguistic
identities. There are, however, exceptions: this general language-related educational
discourse (small ‘d’; see Section 3.3.4) is sometimes permeated by a (big ‘D’) discourse
of contestation that is directly related to Central South Slavic ethnolinguistic identities
and thus to ethnonationalism (for a description of this discourse, see Section 6.6), as in
text excerpt 2 from a Politika article from November 13, 2004 (POL-13-11-2004-108,
ranked 2 in Table 24).
Bošnjaci se bore da prosvetne vlasti u Srbiji njihovoj deci u Tutinu, Sjenici i
Novom Pazaru omoguće da u prvom i drugom razredu kao izborni predmet uče svoj
maternji jezik sa elementima nacionalne kulture. Iz Bošnjačkog nacionalnog vijeća u
SCG kažu da đaci u nekim tamošnjim osnovnim školama već pohađaju časove iz svog
maternjeg jezika ali samo u okviru fakultativne nastave. Imaju samo jedan čas sedmično.
Ako bi predmet postao izborni, onda bi deca dobila još dva časa nedeljno rezervisana
za njihovu tradiciju, kulturu i jezik. Na prvi pogled ništa sporno. Ne bi ni bilo problema
da oni, Bošnjaci u Srbiji, svoj maternji jezik ne nazivaju bosanski. Prosvetne vlasti u
Srbiji ne mogu da im dopuste da u školama uče bosanski sve dok jezikoslovci ne
priznaju postojanje tog jezika.
Bosniaks are fighting to get the educational authorities in Serbia to allow their
children in Tutin, Sjenica, and Novi Pazar [municipalities with a Bosniak majority in
southwest Serbia] to study their mother tongue with elements of national culture as an
elective in the first and second grades [of elementary school]. Officials of the Bosniak
National Council in Serbia and Montenegro [the country’s official name before
Montenegrin independence] say that elementary school children in this area are already
taking classes in their mother tongue, but only as an optional subject, which means only
one class per week. If this subject became an elective, the children would get two more
classes per week reserved for their tradition, culture, and language. At first glance,
there is nothing problematic about this. Nor would there be anything problematic about it
if they, the Bosniaks in Serbia, did not call their mother tongue Bosnian. Educational
authorities in Serbia cannot allow them to study Bosnian in schools until linguists
recognize the existence of that language.
As can be seen, then, language is conceptualized in terms of (ethno)national
identity, tradition, and culture. The contestation of a self-ascribed language name evident
here is thus less about language itself than about an attempt to delegitimize an
ethnolinguistic identity and the collective rights that come with a separate ethnolinguistic
identity (such as a group’s right to name its own language). As we will see further below,
this discourse of contestation is pervasive in the corpus. At the same time, some factors
point to it much more directly than others, so the follow-up discussion (Sections 6.4-6.6)
will focus on a subset of six factors (Factors 2, 4, 6, 8, 10, and 11). These factors arise
from texts that routinely and explicitly thematize Central South Slavic ethnolinguistic
identities and are thus much more pertinent to an analysis of the links between
language-related discourses, language ideologies, and ethnonationalism.
6.3.2 Factor 3: Entrance exams. This language-related discourse included the
following salient collocates: exam, mathematics, high school (general and academic),
students (K-12), school (K-12), section, attend, and knowledge. It accounted for 3.16%
of the variation. The texts with the top twenty factor scores on Factor 3 are listed in
Table 25. Note that only six of the twenty top-scoring articles here were originally
identified as multivariate outliers. There was also some overlap with Factors 1, 9, and 11,
as several texts scored highly on all or most of these factors (which, again, was confirmed
by Factors 1, 3, 9, and 11 clustering together; see Table 41).
Table 25
Texts with the top twenty factor scores on Factor 3 (Entrance exams)
Rank Text file Factor score
1 POL-15-5-2003-92.txt 33.56
2 POL-23-4-2005-48.txt 29.34
3 POL-4-4-2005-157.txt 27.37
4 POL-12-4-2003-104.txt 26.80
5 BLI-17-6-2003-397.txt 26.71
6 POL-15-4-2003-82.txt 26.41
7 BLI-31-3-2003-575.txt 25.92
8 POL-23-5-2008-67.txt 24.79
9 POL-30-6-2008-5.txt 24.65
10 POL-24-4-2004-47.txt 23.67
11 BLI-8-9-2006-260.txt 22.76
12 POL-17-8-2005-63.txt 21.03
13 POL-4-2-2003-161.txt 20.30
14 POL-19-4-2005-73.txt 20.10
15 POL-18-6-2004-73.txt 19.62
16 POL-26-11-2003-29.txt 19.61
17 POL-8-5-2003-129.txt 19.36
18 BLI-19-6-2008-503.txt 19.32
19 POL-12-5-2003-104.txt 19.29
20 BLI-22-6-2004-387.txt 19.07
Based on the salient variables and a qualitative examination of representative
texts, Factor 3 was interpreted as a general educational discourse which thematized high
school entrance exams typically consisting of tests of skills in mathematics and (foreign)
languages. Text excerpt 3 (from POL-15-5-2003-92, ranked 1 in Table 25) illustrates the
discourse typically found in articles scoring highly on Factor 3.
Za 168 mesta u beogradskoj Filološkoj gimnaziji za sada se kandiduje 255
učenika. Toliko je njih od ukupno 342 prijavljena položilo specifični prijemni
ispit koji je bio identičan za sve jezičke srednje škole i odeljenja u Srbiji.
Kažemo za sada, jer se za beogradske klupe mogu u junu prijaviti i učenici koji su
ispite iz srpskog jezika i književnosti i stranih jezika polagali u Karlovačkoj
gimnaziji ili u gimnazijama u Smederevu, Kruševcu i Kragujevcu koje imaju po
jedno englesko odeljenje. I, mogu da „istisnu" đaka koji ima manje poena od
njih. - Svi ovi učenici polovinom juna polažu, kao i ostali osmaci, kvalifikacione
ispite iz maternjeg jezika i matematike. Oni za upis u filološke škole nisu
eliminacioni, ali će deci doneti dodatne bodove.
For now, a total of 255 students are competing for the 168 available places at the
Belgrade Philological High School. Of the 342 candidates who applied, this is how many
passed the special entrance exam, which was identical for all language-oriented high
schools and sections in Serbia. We say for now because students who took the Serbian
language and literature and foreign language exams at the Karlovci Gymnasium or at the
high schools in Smederevo, Kruševac, and Kragujevac (which have one English-language
section each) will also be eligible to apply to the Belgrade school in June. And they
can ‘squeeze out’ a student who has fewer points than they do. Like all other
eighth-graders, these students take qualification exams in mother tongue and
mathematics in mid-June. Failing these exams does not disqualify a student from
admission to the philological schools, but the exams do bring extra points to children
who do well on them.
Here, it is important to note that, depending on the type of school, admission to high
schools in Serbia, and elsewhere in the Balkans, requires applicants to take several
entrance exams, which include mother tongue (i.e., Serbian), mathematics, and a foreign
language, most often English. Texts scoring highly on Factor 3 thus typically discuss the
process of admission to high schools and the relevant entrance exams. They exhibit an
administrative educational discourse and typically do not thematize ethnolinguistic
identity, either explicitly or implicitly.
6.3.3 Factor 9: Foreign language education. This language-related discourse
included the following salient collocates: teach, level, be able to, common, instruction,
foreign, framework, begin, knowledge, grade, department (katedra), and instructors. It
accounted for 2.60% of the variation. The texts with the top twenty factor scores on
Factor 9 are listed in Table 26. Here, as many as thirteen of the twenty top-scoring
articles were originally identified as multivariate outliers. As noted above, the principal
area of overlap was with Factors 1 and 3 (Language education, Entrance exams) and, to a
lesser degree, Factor 11 (Officialization of Bosnian).
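Table 26
Texts with the top twenty factor scores on Factor 9 (Foreign language education)
Rank Text file Factor score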
1 POL-23-8-2003-48.txt 65.71
2 POL-27-8-2003-28.txt 33.80
3 BLI-23-9-2003-191.txt 29.12
4 POL-10-1-2003-106.txt 26.40
5 POL-25-5-2005-44.txt 24.77
6 POL-15-5-2005-103.txt 23.37
7 POL-17-8-2005-63.txt 20.93
8 POL-19-9-2004-79.txt 20.53
9 BLI-4-8-2003-273.txt 20.16
10 BLI-9-3-2005-499.txt 19.22
11 BLI-29-3-2003-580.txt 18.76
12 BLI-12-5-2003-487.txt 18.35
13 POL-25-9-2004-41.txt 18.28
14 POL-16-9-2004-104.txt 17.56
15 POL-21-2-2003-42.txt 17.55
16 POL-17-7-2006-94.txt 15.39
17 BLI-27-10-2006-140.txt 14.97
18 POL-9-9-2006-163.txt 14.68
19 POL-17-9-2003-91.txt 14.49
20 POL-30-5-2004-16.txt 14.31
Based on the salient variables and a qualitative examination of representative
texts, Factor 9 was interpreted as a general discourse on foreign language education. Text
excerpt 4 (from POL-23-8-2003-48, ranked 1 in Table 26) provides an illustration of the
discourse found in articles scoring highly on Factor 9.
Ministarstvo prosvete Srbije odlučilo je juče da proširi listu predavača koji imaju
pravo da predaju strani jezik od prvog do šestog razreda osnovne škole, ako
poseduju znanje stranog jezika najmanje na nivou B2 zajedničkog evropskog
okvira. To znači da će strane jezike u nižim razredima moći da predaju i
profesori razredne nastave, diplomirani filolozi, psiholozi, pedagozi i druga lica
koja su završila neki nastavnički fakultet, saopštilo je Ministarstvo prosvete. Nivo
znanja stranog jezika dokazuje se polaganjem odgovarajućeg ispita na nekoj od
filoloških katedri univerziteta u Srbiji ili "međunarodno priznatom javnom
ispravom čiju valjanost utvrđuje Ministarstvo prosvete". Ministarstvo se odlučilo
na ovaj korak zbog nedostatka nastavnika za nastavu stranih jezika koja u
predstojećoj školskoj godini treba da počne u svim prvim razredima osnovne
škole.
The Serbian Ministry of Education decided yesterday to broaden the list of instructors
eligible to teach foreign languages in grades 1-6 of elementary school to include those
who demonstrate foreign language skills at least at the B2 level of the Common European
Framework [of Reference]. This means that lower-grade classroom teachers, philology
graduates, psychologists, pedagogues, and others who have graduated from a teacher
education faculty will also be able to teach foreign languages in the lower grades, the
Ministry of Education said in a statement. The level of foreign language skill is proven
by passing an appropriate exam at one of the philology departments at universities in
Serbia or by “an internationally recognized public document whose validity is determined
by the Ministry of Education.” The ministry made this decision because of a shortage of
instructors qualified to teach foreign languages, instruction in which is to begin in all
first grades of elementary school in the coming school year.
Like the first excerpt used to illustrate the discourse identified by Factor 1, this example
points to a shortage of foreign language instructors in Serbian schools during the first
decade of the twenty-first century. Texts scoring highly on Factor 9, such as this Politika
article from August 23, 2003, typically discuss foreign language education issues in
Serbian elementary schools and high schools. Texts representative of this discourse do
not typically thematize ethnolinguistic identities in any significant way and are therefore
of marginal interest in terms of language ideologies and ethnonationalism.
6.3.4 Factor 11: Officialization of Bosnian. This language-related discourse
included the following salient collocates: Bosnian, Bosniak, elective, element, national,
and board. It accounted for 2.10% of the variation. The texts with the top twenty factor
scores on Factor 11 are listed in Table 27. Note that here as many as sixteen of the twenty
top-scoring articles were originally identified as multivariate outliers. As noted above,
the principal area of overlap was with Factors 1, 3, and 9 (see the discussion of these
factors above and Table 41 below). There were also minor areas of overlap with Factors 5
(Minority language rights) and 10 (Linguistics as a science, lexicography, standardization,
and contestation), although the latter overlap was not confirmed by the cluster analysis. A
possible reason for these overlaps is that Bosnian is sometimes discussed as a minority
language in Serbia and sometimes as one of the contested Central South Slavic varieties.
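Table 27
Texts with the top twenty factor scores on Factor 11 (Officialization of Bosnian)
Rank Text file Factor score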
1 POL-15-12-2004-99.txt 39.91
2 POL-15-1-2005-107.txt 37.19
3 POL-12-11-2004-113.txt 33.23
4 POL-13-11-2004-108.txt 30.79
5 POL-10-1-2005-134.txt 29.08
6 POL-11-11-2004-127.txt 28.13
7 POL-26-10-2004-29.txt 26.68
8 POL-16-2-2005-82.txt 25.26
9 POL-12-3-2003-119.txt 19.45
10 POL-10-12-2004-127.txt 18.95
11 POL-14-1-2005-115.txt 17.46
12 POL-28-1-2006-16.txt 16.88
13 POL-19-2-2005-60.txt 14.52
14 POL-27-5-2004-29.txt 14.47
15 POL-7-3-2005-163.txt 13.82
16 BLI-31-10-2003-118.txt 13.12
17 BLI-24-1-2008-913.txt 12.98
18 POL-9-3-2005-148.txt 12.45
19 POL-25-9-2004-41.txt 12.32
20 POL-9-6-2006-165.txt 11.75
Based on the salient variables and a qualitative examination of representative
texts, Factor 11 was interpreted as a discourse on the officialization of Bosnian. Text
excerpt 5 (from POL-12-11-2004-113, ranked 3 in Table 27) provides an illustration of
the discourse typically found in articles scoring highly on Factor 11.
ZAŠTO BOŠNJACI U SRBIJI NE MOGU DA UČE BOSANSKI Možda će se u
našim školama sledeće školske godine izučavati i bosanski jezik, ukoliko taj jezik
priznaju jezikoslovci. Na sednici Prosvetnog odbora Skupštine Srbije čula se
duhovita opaska da se više od sat vremena raspravlja o nečemu što ne postoji – o
bosanskom jeziku. Ministar prosvete dr Slobodan Vuksanović je više puta
ponovio da bosanski jezik za sada zvanično ne postoji, a poslanik Milan
Veselinović iz Novog Pazara (SRS) je istakao da učenici u nekim osnovnim
školama u novopazarskoj, tutinskoj i sjeničkoj opštini u prvom i drugom razredu
uče bosanski jezik iz udžbenika za bosanski jezik sa elementima nacionalne
kulture.
WHY BOSNIAKS IN SERBIA CANNOT STUDY BOSNIAN The Bosnian
language may be taught [as a subject] in our schools next school year if that language is
recognized by linguists. During the session of the Education Committee of the Serbian
Parliament, a witty remark was made that more than an hour had been spent discussing
something that does not exist: the Bosnian language. Minister of Education Dr. Slobodan
Vuksanović repeated several times that, for now, the Bosnian language does not officially
exist, while member of parliament Milan Veselinović from Novi Pazar (SRS [Serbian
Radical Party]) pointed out that first- and second-grade students in some elementary
schools in the Novi Pazar, Tutin, and Sjenica municipalities were studying Bosnian from
a textbook for the Bosnian language with elements of national culture.
As in the second excerpt used to illustrate the discourse identified by Factor 1 above,
language is conceptualized here in terms of collective identity and culture (“Bosnian
language with elements of national culture”). Texts scoring highly on Factor 11, such as
this Politika article from November 12, 2004, typically discuss the then-topical
introduction (and thus recognition/officialization) of Bosnian as a minority language in
schools in southwest Serbia, an area where the Bosniak minority has traditionally been in
the majority. Note the explicit link between minority language rights and ethnocultural
rights. Texts representative of this discourse also typically thematize Central South
Slavic ethnolinguistic identities and show traces of the discourse of contestation related to
these identities.
6.3.5 Factor 2: Cyrillic-only. This language-related discourse included the
following salient collocates: alphabet, Cyrillic (adj./n.), use (n.), official, Latin,
constitution, protection, association, and law. It accounted for 3.18% of the variation.
The texts with the top twenty factor scores on Factor 2 are listed in Table 28. Note that as
many as twelve of the twenty top-scoring articles here were originally identified as
multivariate outliers. Similar to Factor 1, there was some overlap with Factor 5 (Minority
language rights), as three texts scored highly on both factors.
Table 28
Texts with the top twenty factor scores on Factor 2 (Cyrillic-only)
Rank Text file Factor score
1 POL-11-2-2005-108.txt 52.37
2 POL-16-12-2006-100.txt 50.71
3 POL-17-6-2004-76.txt 50.32
4 POL-16-3-2003-93.txt 46.21
5 POL-3-10-2008-168.txt 41.91
6 POL-22-8-2006-59.txt 37.87
7 POL-21-9-2004-72.txt 36.00
8 POL-25-8-2005-27.txt 35.78
9 POL-8-4-2005-138.txt 32.52
10 POL-16-4-2005-91.txt 30.42
11 POL-29-5-2003-20.txt 30.21
12 POL-26-7-2006-36.txt 29.39
13 BLI-28-8-2008-343.txt 27.56
14 POL-18-11-2005-74.txt 26.46
15 BLI-2-9-2006-272.txt 25.39
16 POL-2-6-2004-157.txt 24.42
17 POL-29-9-2006-20.txt 24.00
18 POL-11-10-2008-124.txt 23.90
19 POL-17-3-2008-114.txt 23.36
20 POL-4-3-2005-193.txt 23.20
Based on the salient variables and a qualitative examination of representative
texts, Factor 2 was interpreted as a classic discourse of endangerment (cf. Duchêne &
Heller, 2007), here referring to a (perceived) threat to the Cyrillic alphabet from the
widespread use of the Latin alphabet in Serbia. Text excerpt 6 (from POL-11-2-2005-108,
ranked 1 in Table 28) illustrates the discourse typically found in articles scoring
highly on Factor 2.
Udruženje građana za zaštitu srpskog pisma „Srpska ćirilica“, zatražilo je da se
predsednik Srbije izvini srpskom narodu, jer se u predlogu Ustava ekspertske
grupe koju je on formirao, uz ćirilicu, navodi i latinica kao srpsko pismo.
Članovi Udruženja, u jučerašnjem saopštenju, postavljaju pitanje da li je
autorima ovog predloga poznat još neki narod na svetu koji za svoj jezik koristi
tuđe pismo. […] Pre svega, autorima predloga Ustava, navodi se dalje, nije
poznato da srpski jezik nikada nije pisan latinicom, sve do trenutka kada je
zloupotrebljena dobra volja da se južnoslovenskim saplemenicima Hrvatima
pomogne da i oni konačno dobiju svoje pismo.
The Citizens’ Association for the Protection of the Serbian Script “Serbian Cyrillic” has
demanded that the President of Serbia apologize to the Serbian people because a draft
constitution prepared by an expert group he formed lists the Latin alphabet, alongside
Cyrillic, as a Serbian script. In a statement issued yesterday, members of the association
ask whether the authors of this draft know of any other people in the world that uses
someone else’s alphabet for its language. […] Above all, the statement further reads, the
authors of the draft constitution are unaware that the Serbian language had never been
written in the Latin alphabet until the moment when the goodwill to help the South Slavic
co-tribesmen, the Croats, finally get an alphabet of their own was abused.
It should be noted here that both the Latin and Cyrillic alphabets are in use in Serbia (for
a discussion of the significance of this issue, see Section 7). Although the two scripts are
equally functional, Cyrillic is widely seen as autochthonous and thus as closely linked to
Serbian ethnonational identity. This has made the use of the Latin alphabet a target for
Serbian ultranationalists, who argue that it represents a threat to Serbian Cyrillic and
thus to Serbs themselves.
Texts scoring highly on Factor 2 thus typically discuss a (perceived) threat to the
Cyrillic alphabet in the context of available legal protections, sometimes making
references to minority language rights and regional ethnolinguistic identities. Texts
representative of this discourse typically thematize Central South Slavic ethnolinguistic
identities because the Latin alphabet is sometimes linked to Croats, as in the excerpt from
the Politika article from February 11, 2005 above.
6.3.6 Factor 5: Minority language rights. This language-related discourse
included the following salient collocates: minority, Ruthenian, Slovak, Romanian,
national, Hungarian, rights, and community. It accounted for 2.72% of the variation.
The texts with the top twenty factor scores on Factor 5 are listed in Table 29. Again, as
many as fifteen of the twenty top-scoring articles here were originally identified as
multivariate outliers. The principal area of overlap was with Factor 2 (Cyrillic-only).
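Table 29
Texts with the top twenty factor scores on Factor 5 (Minority language rights)
Rank Text file Factor score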
1 POL-10-1-2003-106.txt 71.03
2 POL-16-4-2005-91.txt 53.05
3 NIN-27-10-2005-73.txt 39.97
4 POL-6-6-2006-188.txt 36.82
5 POL-3-10-2008-168.txt 27.06
6 POL-29-11-2004-10.txt 25.84
7 POL-9-1-2006-149.txt 24.34
8 POL-13-11-2008-111.txt 24.21
9 POL-26-2-2003-16.txt 24.19
10 POL-16-6-2006-110.txt 22.31
11 POL-9-6-2006-171.txt 22.22
12 POL-11-11-2005-114.txt 21.13
13 POL-13-7-2006-125.txt 19.95
14 POL-9-6-2006-164.txt 18.96
15 POL-5-4-2006-133.txt 17.95
16 POL-3-2-2006-142.txt 17.21
17 POL-28-1-2006-16.txt 16.95
18 POL-1-6-2003-210.txt 16.79
19 POL-6-6-2006-190.txt 16.39
20 POL-10-12-2003-160.txt 16.21
Based on the salient variables and a qualitative examination of representative
texts, Factor 5 was interpreted as a discourse on minority language rights. It should be
noted that, although varieties of what used to be called Serbo-Croatian, such as Croatian
and Bosnian, are sometimes mentioned in the general discourse on minority rights (as in
the excerpt from a Politika article from June 6, 2006 below), there is a clear distinction
between them and non-Serbo-Croatian minority languages such as Hungarian or Slovak
(hence the separate factors). Text excerpt 7 (from POL-6-6-2006-188, ranked 4 in
Table 29) illustrates the discourse typically found in articles scoring highly on Factor 5.
Kako je najavljeno iz Pokrajinskog sekretarijata za upravu, propise i prava
nacionalnih manjina, pripadnici nacionalnih manjina u Vojvodini mogu
pribaviti dvojezična lična dokumenta. Prošle sedmice je završeno štampanje
dvojezičnih ličnih dokumenata, a juče je počela distribucija po policijskim
stanicama. - Lične karte su štampane u kombinaciji srpski-mađarski, srpski-
hrvatski, srpski-rumunski, srpski-rusinski, i srpski-češki, pa će pripadnici ovih
zajednica moći da zatraže dvojezična dokumenta od nadležnih organa već od
danas - izjavio je pokrajinski sekretar za upravu, propise i nacionalne manjine
Tamaš Korhec. Štampanje dvojezičnih dokumenata je obavljeno na osnovu
republičkog Zakona o upotrebi jezika i pisma iz 1991. godine. Ovaj zakon,
podsetimo, daje pravo pripadnicima nacionalnih manjina na dvojezične
dokumente u mestima u kojima je njihov maternji jezik u službenoj upotrebi.
As announced by the Provincial Secretariat for Administration, Regulations and
National Minority Rights, members of national minorities in Vojvodina can now
obtain bilingual personal IDs. The printing of bilingual personal IDs was
finished last week, and distribution to police stations began yesterday.
“ID cards were printed in the following [language] combinations:
Serbian/Hungarian, Serbian/Croatian, Serbian/Romanian, Serbian/Ruthenian, and
Serbian/Czech, so members of these communities will be able to apply to the
relevant authorities for bilingual documents starting today,” said provincial secretary
for administration, regulations and national minorities Tamaš Korhec. The legal
basis for the printing of bilingual IDs is the republic-level Law on the Use of Language
and Script of 1991. This law, we should recall, gives members of national
minorities the right to bilingual IDs in those places where their mother tongue is
in official use.
Texts scoring highly on Factor 5 typically discuss minority language rights in Serbia,
particularly in the northern province of Vojvodina, where most minority populations are
concentrated. Texts representative of this discourse typically thematize minority
ethnolinguistic identities. However, as noted above, because non-Serb Central South
Slavic minorities are considered to be (ethnolinguistically) different from non-Central
South Slavic minorities, texts representative of Factor 5 do not typically treat what I have
been referring to as ‘regional’ (i.e., Central South Slavic) ethnolinguistic identities. They
do, however, often show traces of a discourse of endangerment (for a description of this
discourse, see Section 6.6), here referring to the perceived endangerment of the Cyrillic
alphabet in Serbia, as in excerpt 8 from a Politika article from April 16, 2005
(POL-16-4-2005-91, ranked 2 in Table 29).
Pokrajinski sekretarijat za propise, upravu i nacionalne manjine izrazio je
zabrinutost zbog najnovije odluke Skupštine opštine Šid kojom se ukida službena
upotreba slovačkog i rusinskog jezika i latiničnog pisma na teritoriji te opštine.
Ovaj sekretarijat je saopštio da je to „u direktnoj suprotnosti sa Ustavom Srbije,
Ustavnom poveljom SCG i Zakonom o zaštiti prava nacionalnih manjina”. […]
Međutim, bez obzira na to što se Avramov uplašio za ćirilično pismo i što nije
znao da nacionalne manjine imaju pravo na službenu upotrebu maternjeg jezika,
odluka o proterivanju rusinskog, slovačkog i zabrani latiničnog pisma uzbunila je
javnost.
The Provincial Secretariat for Administration, Regulations and National Minorities
has expressed concern over the latest decision by the Šid [an urban area in the
province of Vojvodina] municipal assembly, which abolished the official use of the
Slovak and Ruthenian languages and the Latin alphabet in the territory of this
municipality. The secretariat said that this decision is “in direct violation of the
Serbian Constitution, the Constitutional Charter of Serbia and Montenegro,
and the Law on the Protection of the Rights of National Minorities.” […] However,
regardless of the fact that Avramov [the mayor of Šid] feared for the Cyrillic
alphabet and did not know that national minorities are entitled to the official use of
their mother tongues, the decision to banish Ruthenian and Slovak and to ban the Latin
alphabet alarmed the public.
6.3.7 Factor 4: Officialization of Montenegrin 1. This language-related
discourse included the following salient collocates: school (university), students
(university), professor, philosophy, department (odsek/katedra), Nikšić, philology,
decision, and renaming. It accounted for 2.83% of the variation. The texts with the top
twenty factor scores on Factor 4 are listed in Table 30. Note that as many as fifteen of the
twenty top-scoring articles here were originally identified as multivariate outliers. The
principal area of overlap was with Factor 8 (Officialization of Montenegrin 2), as nine
texts scored highly on both factors.
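Table 30
Texts with the top twenty factor scores on Factor 4 (Officialization of Montenegrin 1)
Rank Text file Factor score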
1 BLI-30-3-2004-544.txt 50.56
2 POL-15-4-2004-89.txt 36.23
3 POL-12-7-2004-89.txt 32.47
4 POL-5-4-2004-141.txt 30.70
5 POL-16-4-2004-86.txt 30.14
6 POL-13-12-2006-119.txt 29.96
7 POL-2-4-2004-166.txt 26.72
8 POL-8-9-2004-157.txt 26.67
9 NIN-18-12-2003-10.txt 26.10
10 POL-1-9-2003-173.txt 24.72
11 POL-7-7-2004-109.txt 24.17
12 POL-5-9-2008-165.txt 24.16
13 POL-29-3-2004-25.txt 23.86
14 POL-25-11-2003-34.txt 23.62
15 POL-22-1-2008-70.txt 21.84
16 POL-17-11-2003-84.txt 21.50
17 POL-27-8-2003-28.txt 19.32
18 POL-2-9-2003-166.txt 18.93
19 NIN-13-3-2003-381.txt 18.30
20 POL-11-7-2003-114.txt 18.16
Based on the salient variables and a qualitative examination of representative
texts, Factor 4 was interpreted as one of the two discourses on the officialization of
Montenegrin, which proceeded in two distinct phases: the Serbian language was first
renamed ‘mother tongue’ and then ‘Montenegrin.’ Text excerpt 9 (from BLI-30-3-2004-544,
ranked 1 in Table 30) illustrates the discourse typically found in articles scoring highly on
Factor 4.
Studenti Odseka za srpski jezik i književnost Filozofskog fakulteta u Nikšiću
[the second-largest city in Montenegro] juče popodne zamrzli su štrajk glađu
koji su počeli pre podne zahtevajući od crnogorskog ministra prosvete da povuče
odluku o preimenovanju srpskog jezika u maternji u osnovnim i srednjim
školama. “Profesori podržavaju naš stav, pa ćemo sačekati rezultate sednice
Saveta za opšte obrazovanje Crne Gore zakazane za petak, ali ukoliko naš zahtev
za vraćanje imena srpskog jezika ne bude podržan nastavićemo štrajk glađu od
ponedeljka“, rekao je agenciji Beta predsednik Štrajkačkog odbora studenata
Bojan Strunjaš. Studenti Odseka za srpski jezik i književnosti organizovali su i
potpisivanje peticije za odbranu srpskog jezika, koju je do 15 sati potpisalo oko
1.000 studenata Filosofskog fakulteta.
Yesterday afternoon, students from the Department of Serbian Language and Literature
at the School of Philosophy in Nikšić suspended the hunger strike they had begun that
morning to demand that the Montenegrin minister of education withdraw his decision
to rename the Serbian language ‘mother tongue’ in elementary schools and high schools.
“The professors support our position, so we will wait for the results of the session of the
Council for General Education of Montenegro scheduled for Friday. However, if our
demand for the reinstatement of the name of the Serbian language is not supported, we
will continue our hunger strike on Monday,” Bojan Strunjaš, the president of the student
strike board, told the Beta news agency. The students from the Department of Serbian
Language and Literature also organized a petition for the defense of the Serbian
language, which about 1,000 students of the School of Philosophy had signed by 3 PM.
As can be seen in this excerpt from a Blic article from March 30, 2004, texts scoring
highly on Factor 4 typically report on the protests by professors and students of Serbian in
Montenegro against the new policy. It should be noted that although Serbs (who were
and continue to be vehemently against this policy) represent a sizeable minority in
Montenegro with strong political representation in the Montenegrin parliament, they were
unable to stop the implementation of this policy because ethnic Montenegrins and their
political parties received support from all other minority groups (Bosniaks, Albanians,
etc.). Texts representative of this discourse are directly relevant to Central South Slavic
ethnolinguistic identities and are linked to texts representative of the discourse identified
by Factor 8 (see Section 6.3.8), so the two will be discussed in conjunction with one
another (see Section 6.6).
6.3.8 Factor 8: Officialization of Montenegrin 2. This language-related
discourse included the following salient collocates: Montenegrin, Montenegro, renaming,
Nikšić, Montenegrins, mother (adj.), authorities, and introduction. It accounted for
2.61% of the variation. The texts with the top twenty factor scores on Factor 8 are listed
in Table 31. Here, seven of the twenty top-scoring articles were originally identified as
multivariate outliers. As noted above, the principal overlap was with Factor 4
(Officialization of Montenegrin 1).
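The overlap between Factors 4 and 8 can be checked against the two top-twenty lists (Table 30 above and Table 31 below). The minimal Python sketch that follows is an illustration added for this purpose and is not part of the original analysis. It intersects the two lists of file names, which are copied from the tables, and orders the shared texts by their combined rank. The names FACTOR_4_TOP20, FACTOR_8_TOP20, and shared_top_texts are hypothetical. Appearing in both top-twenty lists is a stricter criterion than scoring highly on both factors, so this check finds only three shared texts. The figure of nine given above presumably rests on a broader threshold, which is not specified in this section.

```python
from __future__ import annotations

# File names in rank order, copied from Table 30 (Factor 4) and Table 31 (Factor 8).
# The variable names are hypothetical; only the IDs come from the tables.
FACTOR_4_TOP20 = [
    "BLI-30-3-2004-544", "POL-15-4-2004-89", "POL-12-7-2004-89", "POL-5-4-2004-141",
    "POL-16-4-2004-86", "POL-13-12-2006-119", "POL-2-4-2004-166", "POL-8-9-2004-157",
    "NIN-18-12-2003-10", "POL-1-9-2003-173", "POL-7-7-2004-109", "POL-5-9-2008-165",
    "POL-29-3-2004-25", "POL-25-11-2003-34", "POL-22-1-2008-70", "POL-17-11-2003-84",
    "POL-27-8-2003-28", "POL-2-9-2003-166", "NIN-13-3-2003-381", "POL-11-7-2003-114",
]
FACTOR_8_TOP20 = [
    "POL-26-7-2005-39", "VRE-17-7-2003-115", "POL-9-11-2004-137", "BLI-30-3-2004-544",
    "POL-29-3-2004-25", "BLI-9-10-2004-161", "POL-31-3-2004-2", "BLI-20-9-2004-204",
    "POL-11-4-2003-113", "POL-11-4-2003-111", "POL-17-7-2006-92", "POL-7-7-2004-109",
    "BLI-24-3-2003-593", "POL-15-12-2004-98", "NIN-30-9-2004-112", "POL-2-9-2004-192",
    "POL-7-12-2003-177", "POL-27-3-2004-34", "POL-20-10-2004-57", "POL-23-7-2003-45",
]


def shared_top_texts(*rankings: list[str]) -> list[str]:
    """Return the text IDs present in every ranking, best combined rank first."""
    common = set(rankings[0]).intersection(*rankings[1:])
    # Ranks are 1-based list positions; a lower sum means a higher placement on all factors.
    return sorted(common, key=lambda text_id: sum(r.index(text_id) + 1 for r in rankings))


if __name__ == "__main__":
    for text_id in shared_top_texts(FACTOR_4_TOP20, FACTOR_8_TOP20):
        f4_rank = FACTOR_4_TOP20.index(text_id) + 1
        f8_rank = FACTOR_8_TOP20.index(text_id) + 1
        print(f"{text_id}: rank {f4_rank} on Factor 4, rank {f8_rank} on Factor 8")
```

Run as a script, it lists BLI-30-3-2004-544 (ranks 1 and 4), POL-29-3-2004-25 (ranks 13 and 5), and POL-7-7-2004-109 (ranks 11 and 12), in that order. Summing ranks is only one simple way of ordering the shared texts. Comparing the factor scores themselves would be an equally reasonable choice.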
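Table 31
Texts with the top twenty factor scores on Factor 8 (Officialization of Montenegrin 2)
Rank Text file Factor score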
1 POL-26-7-2005-39.txt 39.05
2 VRE-17-7-2003-115.txt 27.85
3 POL-9-11-2004-137.txt 24.49
4 BLI-30-3-2004-544.txt 24.00
5 POL-29-3-2004-25.txt 22.69
6 BLI-9-10-2004-161.txt 22.53
7 POL-31-3-2004-2.txt 22.24
8 BLI-20-9-2004-204.txt 22.17
9 POL-11-4-2003-113.txt 21.85
10 POL-11-4-2003-111.txt 21.85
11 POL-17-7-2006-92.txt 20.87
12 POL-7-7-2004-109.txt 20.69
13 BLI-24-3-2003-593.txt 20.64
14 POL-15-12-2004-98.txt 20.25
15 NIN-30-9-2004-112.txt 20.23
16 POL-2-9-2004-192.txt 19.77
17 POL-7-12-2003-177.txt 19.54
18 POL-27-3-2004-34.txt 19.34
19 POL-20-10-2004-57.txt 18.73
20 POL-23-7-2003-45.txt 18.61
Based on the salient variables and a qualitative examination of representative
texts, Factor 8 was interpreted as the second of the two discourses on the officialization
of Montenegrin. However, in contrast to the first discourse on the officialization of
Montenegrin, which was predominantly focused on the protests against the new policy by
students and professors of Serbian in Montenegro, this discourse also comprised more
general views opposing the policy, particularly those espoused by nationalist intellectuals.
Text excerpt 10 (from VRE-17-7-2003-115, ranked 2 in Table 31) provides an illustration
of the discourse representative of articles scoring highly on Factor 8.
Matija Bećković, “dežurni branilac svesrpstva u Crnoj Gori, hitro je, u svom
poznatom stilu, reagovao na ideju dr Vukotića”, o uvođenju engleskog kao
službenog jezika u Crnu Goru: “Bilo bi veoma korisno da se engleski jezik
uvede kao drugi službeni jezik u Crnu Goru, jer bi se malo odmorio crnogorski
jezik koji je to odavno zaslužio... Ako umesto srpskog jezika i srpske azbuke
uvedu maternji, onda bi možda bilo pravednije da ga nazovu maćehinskim.”
Matija Bećković, “the on-duty defender of Serbhood in Montenegro,” reacted
quickly, in his well-known style, to Dr. Vukotić’s idea about the introduction of
English as an official language in Montenegro: “It would be very useful to
introduce English as a second official language in Montenegro because the
Montenegrin language would get some well-deserved rest… If ‘mother tongue’ is
introduced in place of the Serbian language and Serbian alphabet, then it would
perhaps be more just to call it ‘step-mother tongue.’”
Texts scoring highly on Factor 8, such as this Vreme article from July 17, 2003, typically
discuss the fallout around the officialization of Montenegrin in Montenegro as well as its
links to the then-impending declaration of independence of Montenegro. In this example,
Bećković, a Serbian writer and member of the SANU Department of Language and
Literature, first ironizes a proposal by Veselin Vukotić, a Montenegrin economist, to
introduce English as a second official language in Montenegro. He does so by playing on
the stereotype of Montenegrins as lazy (“Montenegrin language would get some
well-deserved rest”). He then denounces the change in the official name of the language
in Montenegro from ‘Serbian’ to ‘mother tongue’ by proposing that it be called
“step-mother tongue” instead. This objection is typical of Serbian responses to the change
in language policy in Montenegro at the time. Both of these proposals were (rightly) seen by
Serbs in general and Serbian nationalists in particular, in both Montenegro and Serbia, as
a precursor to Montenegro’s eventual political independence. As can be seen, texts
representative of this discourse thematize ethnolinguistic identity and are directly relevant
to a discussion of ethnonationalism.
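The dates given for the excerpts suggest that corpus file names follow a PUBLICATION-day-month-year-number pattern. For example, VRE-17-7-2003-115 is cited as a Vreme article from July 17, 2003. The Python sketch below recovers the publication and date from such a name. The pattern is inferred from the tables and the text rather than documented in this section. The reading of the final number as a running text ID is an assumption, as are the names `parse_filename` and `CorpusText`.

```python
from dataclasses import dataclass
from datetime import date

# Publication codes as they appear in the tables; the expansions are the
# four publications named in Section 6.4.1.
PUBLICATIONS = {"BLI": "Blic", "NIN": "NIN", "POL": "Politika", "VRE": "Vreme"}


@dataclass(frozen=True)
class CorpusText:
    """Hypothetical record for one corpus file."""
    publication: str
    published: date
    text_id: int  # assumed to be a running ID within the corpus


def parse_filename(filename: str) -> CorpusText:
    """Parse a name such as 'VRE-17-7-2003-115.txt' (PUB-day-month-year-ID)."""
    stem = filename.removesuffix(".txt")
    code, day, month, year, text_id = stem.split("-")
    return CorpusText(
        publication=PUBLICATIONS[code],
        published=date(int(year), int(month), int(day)),
        text_id=int(text_id),
    )
```

For instance, `parse_filename("POL-1-10-2003-185.txt").published` evaluates to `date(2003, 10, 1)`. Zero-padded variants such as `NIN-25-03-2004-285.txt` parse to the same date as their unpadded form.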
6.3.9 Factor 6: Contestation over language ownership and name. This
language-related discourse included the following salient collocates: Croatia, Croats,
Croatian, Serbs, academy, name, Serbian, Serbo-Croatian, war and call, and accounted
for 2.67% of the variation. The texts with the top twenty factor scores on Factor 6 are
listed in Table 32. Here, nine of the twenty top scoring articles were originally identified
as multivariate outliers. The principal area of overlap was with Factor 10 (Linguistics as
a science, lexicography, standardization and contestation).
Table 32
Rank  Text (file name)  Factor 6 score
1 POL-24-9-2005-45.txt 50.58
2 POL-20-8-2003-80.txt 30.11
3 POL-20-1-2006-76.txt 28.09
4 POL-1-10-2003-185.txt 28.07
5 POL-3-7-2006-192.txt 28.01
6 POL-10-7-2006-142.txt 26.19
7 POL-22-7-2006-55.txt 25.75
8 POL-11-3-2005-141.txt 25.75
9 POL-10-9-2005-127.txt 24.46
10 POL-25-4-2005-36.txt 21.04
11 POL-29-1-2005-21.txt 20.49
12 POL-13-8-2003-113.txt 20.05
13 POL-26-5-2008-48.txt 19.68
14 POL-15-7-2006-104.txt 19.56
15 POL-17-9-2005-80.txt 18.42
16 POL-7-2-2004-117.txt 18.42
17 POL-31-1-2003-1.txt 17.30
18 POL-27-10-2005-23.txt 16.98
19 POL-29-7-2006-13.txt 16.38
20 POL-18-1-2005-92.txt 15.92
On the basis of the salient collocates and the top scoring texts, Factor 6 was interpreted as a discourse of contestation over language ownership
and name. The existence and prominence of this discourse were noted at several points
above. Note that although contestation here is mainly between Serbs and Croats,
Bosnians and Montenegrins also feature prominently in most relevant texts. Similarly,
although this contestation can be multilateral, it is mostly directed at non-Serb Central
South Slavic ethnolinguistic identities. Text excerpt 11 (from POL-1-10-2003-185,
ranked 4 in Table 32) illustrates the discourse typical of articles scoring highly on
Factor 6.
Povodom izjave Stjepana Mesića koja je objavljena u štampi da Srbi u Hrvatskoj
ne govore srpski nego hrvatski, i to bolje od Hrvata, "Ćirilica" smatra da ta
izjava zaslužuje objašnjenje. Šta je za g. Mesića "hrvatski jezik": ono što su Ilirci
pozajmili od Vukovog (štokavskog, srpskog) jezičkog standarda prema
poznatom Bečkom književnom dogovoru (1850) ili ono što su Hrvati na osnovu
tog izvornog srpskog jezičkog standarda sačinili kao varijantu tog jezika za
današnji "hrvatski jezik"? Nesumnjivo je da Srbi u Hrvatskoj znaju da govore i
da pišu osim svog srpskog jezika i "novohrvatski", koji je izveden iz spomenutog
srpskog jezičkog standarda, a mnogi znaju i raniji hrvatski (čakavski i
kajkavski). Ako g. Mesić misli da Srbi uopšte ne govore svoj srpski jezik, to
može jedino da znači da su tamo srpski jezik i pismo zabranjeni. Ako misli da
Srbi u Hrvatskoj i nemaju svog srpskog jezika, to je u domenu šovinističke
farse, a ako su srpski jezik i ćirilica tamo i dalje zabranjeni, to bi moglo biti da on
tu istinu potvrđuje, pa bi mu trebalo odati neku vrstu priznanja.
In response to a statement by Stjepan Mesić [former President of Croatia]
published in the [Serbian] press that Serbs in Croatia do not speak Serbian but
Croatian, and rather better than Croats, “Cyrillic” [an association] contends that
that statement requires an explanation. What is “Croatian language” for Mr.
Mesić? That which the Illyrians borrowed from Vuk’s [Karadžić] (Štokavian,
Serbian) standard language according to the well-known Vienna literary
agreement (1850) or that which the Croats, based on this original Serbian standard
language, turned into present-day “Croatian language” as a variant of that
language? Undoubtedly, Serbs in Croatia, in addition to their Serbian language,
also know how to speak and write “New-Croatian”, which derives from the
aforementioned Serbian standard language, while many know also the earlier
Croatian (Čakavian and Kajkavian). If Mr. Mesić thinks that Serbs do not speak
their Serbian language at all, that can only mean that the Serbian language and
alphabet are forbidden over there. If he thinks that Serbs in Croatia do not have
their Serbian language, that is a chauvinist farce, and if the Serbian language and
Cyrillic are still forbidden over there, that could mean that he is simply
confirming that truth, so he should be given some sort of recognition for it.
Here we see a typical example of the discourse of contestation mentioned above. This
particular text, however, shows that contestation can also go in the opposite
direction. In other words, rather than Serbs contesting other Central South Slavs’
ethnolinguistic identity as usual, we see a reaction to an apparent attempt by the then
Croatian president to contest the ethnolinguistic identity of the Serbian minority in Croatia.
Texts scoring highly on Factor 6, such as this Politika article from October 1, 2003, thus
typically discuss language- and identity-related contestation between the different Central
South Slavic communities (and, again, most often Serbs and Croats). This contestation has been going
on since the nineteenth century but, for obvious reasons, has been particularly
intense since the breakup of Yugoslavia. Historically, the Croats were the only non-Serb
Central South Slavic ethnic group allowed to officially name and standardize their
language during the Serbo-Croatian era, so they have borne the brunt of Serbian
nationalist wrath since the breakup of Yugoslavia (as well as before) as they are seen as
the precursor/precedent that led the other two ethnic groups (Bosniaks and Montenegrins)
to demand linguistic separation. Evidently, texts representative of this discourse
thematize ethnolinguistic identity in a most pertinent way and are directly discursively
linked to texts representative of Factor 10, so they will be treated together in the
qualitative part of the discussion (see Section 6.6).
6.3.10 Factor 10: Linguistics as a science, lexicography, standardization and
contestation. This language-related discourse included the following salient collocates:
linguistic (jezički/lingvistički), linguist, SANU (Serbian Academy of Sciences and Arts),
dictionary, linguistics, scientific, Serbian, science, word, standard, Serbo-Croatian,
edition and expression, and accounted for 2.55% of the variation. The texts with the top
twenty factor scores on Factor 10 are listed in Table 33. Nine of the twenty top scoring
articles were originally identified as multivariate outliers. As noted above, the principal
area of overlap was with Factor 6 (Contestation over language ownership and name).
Table 33
Rank  Text (file name)  Factor 10 score
1 POL-9-2-2005-121.txt 35.57
2 POL-17-6-2006-95.txt 34.38
3 POL-1-10-2005-165.txt 30.87
4 POL-9-4-2005-127.txt 30.23
5 POL-5-1-2008-166.txt 27.77
6 POL-20-8-2003-80.txt 27.68
7 POL-25-10-2003-33.txt 27.36
8 POL-12-5-2008-144.txt 27.15
9 BLI-10-2-2005-552.txt 27.09
10 POL-14-9-2006-130.txt 26.90
11 POL-2-12-2005-197.txt 24.03
12 POL-6-6-2006-189.txt 22.39
13 POL-24-1-2006-47.txt 22.03
14 POL-24-6-2006-48.txt 21.53
15 POL-2-4-2004-169.txt 21.53
16 POL-12-4-2003-91.txt 21.20
17 POL-24-12-2006-45.txt 20.99
18 POL-2-6-2006-209.txt 20.10
19 POL-11-3-2005-141.txt 19.80
20 NIN-3-7-2008-182.txt 19.57
On the basis of the salient collocates and the top scoring texts, Factor 10 was interpreted as a complex technical discourse on linguistics as a
science, lexicography, standardization and contestation. Text excerpt 12 (from POL-17-
6-2006-95, ranked 2 in Table 33) provides an illustration of the discourse typically found
in articles scoring highly on Factor 10.
Srpska lingvistika, oslobođena ranije obaveze o negovanju jezičkog zajedništva,
našla se pred važnim zadacima. Jezičke raspre kojima se stručna, a i laička
javnost ovih dana ponovo bavi (podstaknuta besedom akademika Dragoslava
Mihailovića) zapravo nisu nove. One datiraju od Vukovog vremena, ali, eto,
dosežu i do naših dana. Međutim, suština „problema“ mahom se svodi(la)
na jedno: da li je jezik kojim govore Srbi, Hrvati, Bošnjaci, Bosanci, Crnogorci
jedan, ali nejedinstven, i(li) jedinstven, ali nejednak i različit. Pri tome se obično
ističu dva kriterijuma: naučni (lingvistički) i nacionalni (socioemotivni). U tim
okvirima zastupaju se najčešće dva stava: (1) govornici datih jezičkih
(nacionalnih) entiteta imaju (moraju imati) svoj, autohtoni jezik i (2) jezik (pored
pisma i religijske pripadnosti) jeste osnovno obeležje nacionalnog legitimiteta i
identiteta naroda. Polazeći od nacionalnog principa, spori se mahom oko
imenovanja jezika, a kao argumenti naglašavaju - leksičke posebnosti. […] Nema
otuda (niti je bilo) ijednog ozbiljnijeg naučnog autoriteta koji bi mogao dokazati
da su jezici: srpski, hrvatski, bosanski/bošnjački, crnogorski ili dr. posebni,
autohtoni jezici. A u stvari je reč o varijetetima (varijantama) istog jezičkog
sistema, samim tim i modelima istog jezika. [subheading: Isti jezički sistem] I novija kritička
misao (osobito danas) najčešće se ne bavi razlikama u okviru jezičkog sistema,
već imenovanjima jezika, pri čemu je kritici osobito podvrgnut naziv „srpsko-
hrvatski". […] Neodrživa je njegova ocena da u Rečniku SANU „ima malo
leksike iz Srbije", mada nje zapravo ima najviše. Ono čega nema jeste - dovoljan
broj saradnika i savremenih sredstava za dalji rad na Rečniku SANU.
Freed from the earlier obligation of sustaining linguistic unity, Serbian
linguistics is facing important tasks. The linguistic discussions (initiated by
academician Dragoslav Mihailović’s lecture) in which both the expert and lay
publics are again engaged these days are not really new. They go back to Vuk’s
time, but have extended to our days. However, the gist of the “problem” is:
whether the language spoken by Serbs, Croats, Bosniaks, Bosnians, Montenegrins
is one but not unified, or whether it is unified but unequal and different. Here,
two criteria are usually considered: the scientific (linguistic) one and the national
(socio-emotional) one. In this framework, there are usually two views: (1) the
speakers of the given linguistic (national) entities have (i.e., must have) their own
autochthonous language and (2) language (along with alphabet and religious
affiliation) is a basic symbol of national legitimacy and people’s identity.
Beginning with the national principle, the contestation mainly revolves around the
naming of the language, while the main arguments are lexical differences. […]
Therefore, there is no (nor has there ever been) credible scientific authority which
could prove that Serbian, Croatian, Bosnian/Bosniak, Montenegrin or other
varieties are separate, autochthonous languages. In fact, they are varieties
(variants) of the same linguistic system and thus models of the same language.
[subheading: A single linguistic system] The more recent critical thought (especially
today) does not ponder the differences within the linguistic system, but rather the naming of
the language, whereby the label “Serbo-Croatian” is especially criticized. […] His
opinion that the SANU dictionary contains “too little lexis from Serbia” is
untenable because such lexis in fact predominates. What is missing is a sufficient
number of research assistants and contemporary tools for further work on the
SANU dictionary.
As this excerpt suggests, texts scoring highly on Factor 10, such as this Politika article
from June 17, 2006, are typically longer and more complex than average newspaper texts
in this corpus. Top scoring texts here are characterized by a combination of references to
linguistics as a science, lexicography (e.g., the SANU unabridged dictionary of Serbian)
and language standardization issues, which often serve as arguments in the contestation
of the authenticity of non-Serbian Central South Slavic glottonyms and separate
ethnolinguistic identities. Texts representative of this discourse often thematize Central
South Slavic ethnolinguistic identities overtly, and so are directly relevant to an analysis
of links between language-related discourses, language ideologies, and ethnonationalism.
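One way to illustrate the overlap between Factors 6 and 10 noted above is to check which texts appear in both top-twenty lists. The Python sketch below assumes that overlap means shared top-scoring texts. The study may have used a different criterion. The lists are copied from Tables 32 and 33, and `shared_top_texts` is a hypothetical helper.

```python
# File names from Table 32 (Factor 6) and Table 33 (Factor 10), in rank order.
FACTOR_6_TOP20 = [
    "POL-24-9-2005-45.txt", "POL-20-8-2003-80.txt", "POL-20-1-2006-76.txt",
    "POL-1-10-2003-185.txt", "POL-3-7-2006-192.txt", "POL-10-7-2006-142.txt",
    "POL-22-7-2006-55.txt", "POL-11-3-2005-141.txt", "POL-10-9-2005-127.txt",
    "POL-25-4-2005-36.txt", "POL-29-1-2005-21.txt", "POL-13-8-2003-113.txt",
    "POL-26-5-2008-48.txt", "POL-15-7-2006-104.txt", "POL-17-9-2005-80.txt",
    "POL-7-2-2004-117.txt", "POL-31-1-2003-1.txt", "POL-27-10-2005-23.txt",
    "POL-29-7-2006-13.txt", "POL-18-1-2005-92.txt",
]
FACTOR_10_TOP20 = [
    "POL-9-2-2005-121.txt", "POL-17-6-2006-95.txt", "POL-1-10-2005-165.txt",
    "POL-9-4-2005-127.txt", "POL-5-1-2008-166.txt", "POL-20-8-2003-80.txt",
    "POL-25-10-2003-33.txt", "POL-12-5-2008-144.txt", "BLI-10-2-2005-552.txt",
    "POL-14-9-2006-130.txt", "POL-2-12-2005-197.txt", "POL-6-6-2006-189.txt",
    "POL-24-1-2006-47.txt", "POL-24-6-2006-48.txt", "POL-2-4-2004-169.txt",
    "POL-12-4-2003-91.txt", "POL-24-12-2006-45.txt", "POL-2-6-2006-209.txt",
    "POL-11-3-2005-141.txt", "NIN-3-7-2008-182.txt",
]


def shared_top_texts(first: list[str], second: list[str]) -> list[str]:
    """Texts present in both lists, kept in the rank order of the first list."""
    in_second = set(second)
    return [name for name in first if name in in_second]


shared = shared_top_texts(FACTOR_6_TOP20, FACTOR_10_TOP20)
print(shared)
```

For these two tables, the shared texts are POL-20-8-2003-80 (ranked 2 on Factor 6 and 6 on Factor 10) and POL-11-3-2005-141 (ranked 8 and 19).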
6.3.11 Factor 7: Literature and publishing. This language-related discourse
included the following salient collocates: book, publisher, writer, literary, literature,
published, work, part, edition, poem and poetry, and accounted for 2.61% of the
variation. The texts with the top twenty factor scores on Factor 7 are listed in Table 34.
Note that only three of the twenty top scoring articles here were originally identified as
multivariate outliers. There was also no identifiable overlap between Factor 7 and any
other factors in this solution, suggesting a discourse separate from others identified here.
Table 34
Rank  Text (file name)  Factor 7 score
1 POL-29-11-2004-10.txt 30.36
2 POL-24-1-2005-54.txt 30.24
3 POL-12-8-2006-121.txt 28.62
4 BLI-16-3-2008-788.txt 27.03
5 POL-28-3-2008-31.txt 25.85
6 POL-30-8-2008-12.txt 23.06
7 POL-26-7-2004-20.txt 22.03
8 POL-22-3-2008-74.txt 21.54
9 POL-5-8-2005-129.txt 21.10
10 POL-25-7-2005-41.txt 21.09
11 NIN-25-3-2004-285.txt 20.69
12 POL-30-11-2008-4.txt 20.66
13 POL-8-8-2008-115.txt 20.43
14 POL-17-8-2003-92.txt 20.36
15 POL-3-9-2003-163.txt 19.76
16 POL-24-5-2005-50.txt 19.52
17 POL-24-5-2004-41.txt 19.50
18 POL-21-3-2004-75.txt 19.26
19 POL-31-10-2008-7.txt 18.75
20 BLI-17-8-2008-372.txt 18.36
On the basis of the salient collocates and the top scoring texts, Factor 7 was interpreted as a general discourse on literature and publishing. The
existence and prominence of this discourse were noted above. Text excerpt 13 (from
NIN-25-3-2004-285, ranked 11 in Table 34) illustrates the discourse typically found in
articles scoring highly on Factor 7.
Protekla decenija je jedan od najsnažnijih književnih odgovora dobila upravo u
poeziji Duška Novakovića, dobitnika nagrade “Vasko Popa” […]. Nagrada
“Vasko Popa” ušla je u desetu godinu postojanja tako što je jednoglasnom
odlukom žirija dodeljena pesniku Dušku Novakoviću za najbolju knjigu pesama
štampanu u 2003. godini. Nagrada je Novakoviću dodeljena za knjigu Izabrao
sam mesec koja je objavljena u izdanju Gradske biblioteke “Vladislav Petković
Dis” iz Čačka. Pre pisca knjige Izabrao sam mesec, nagradu “Vasko Popa” […]
dobili su Borislav Radović […]. Prvu knjigu pesama, Znalac ogledala,
Novaković je objavio 1976. godine. Ta knjiga je pre desetak godina doživela
drugo izdanje (Znalac ogledala i pridružene pesme) u kome su neke pesme
objavljene u drugoj, manje ili više izmenjenoj verziji […].
The poetry of Duško Novaković, the laureate of the “Vasko Popa” award, is one
of the most potent literary contributions of the last decade. […]. The tenth annual
“Vasko Popa” award was given to the poet Duško Novaković for the best book of
poetry printed in 2003 by a unanimous jury decision. Novaković received the
award for his book I chose the moon which was published by the city library
“Vladislav Petković Dis” in Čačak [an urban area in Serbia]. Before the author of
I chose the moon, the “Vasko Popa” award […] was given to Borislav Radović [a
Serbian poet] […]. Novaković published his first book of poetry, The mirror
connoisseur, in 1976. The second edition of that book (The mirror connoisseur
and associated poems), in which some poems were published in new, more or less
changed versions […], was published about ten years ago.
Texts scoring highly on Factor 7, such as this NIN article from March 25, 2004, typically
feature reports on new literary editions, literary awards or events. Texts representative of
this discourse do not typically thematize ethnolinguistic identities or ethnonationalism,
and they do not exhibit any discursive links with texts representative of other
factors/discourses.
6.3.12 Factor 12: Linguacultural diplomacy, language, and culture. This
language-related discourse included the following salient collocates: center, cultural,
course, interest, culture, institute and Belgrade, and accounted for 1.73% of the variation.
The texts with the top twenty factor scores on Factor 12 are listed in Table 35. Eleven of
the twenty top scoring articles were originally identified as multivariate outliers here.
Similar to Factor 7, there were no identifiable discursive links between this discourse and
any other discourses identified here.
Table 35
Rank  Text (file name)  Factor 12 score
1 BLI-21-11-2004-76.txt 35.09
2 POL-20-9-2005-64.txt 24.51
3 POL-9-4-2004-117.txt 24.48
4 VRE-8-1-2004-312.txt 22.21
5 POL-23-8-2006-50.txt 21.97
6 POL-25-11-2003-34.txt 19.77
7 POL-21-11-2004-60.txt 19.25
8 BLI-27-12-2003-8.txt 18.85
9 POL-8-2-2003-129.txt 17.93
10 POL-18-12-2003-94.txt 17.53
11 POL-31-1-2005-1.txt 17.44
12 POL-8-8-2004-111.txt 16.87
13 POL-2-9-2004-188.txt 16.81
14 POL-2-9-2003-166.txt 16.62
15 BLI-8-3-2006-608.txt 16.01
16 POL-9-9-2006-163.txt 15.82
17 POL-22-12-2004-54.txt 15.65
18 POL-4-11-2008-180.txt 14.66
19 POL-28-3-2008-31.txt 13.46
20 BLI-7-7-2005-302.txt 13.20
On the basis of the salient collocates and the top scoring texts, Factor 12 was interpreted as a discourse on linguacultural diplomacy, language, and
culture. Text excerpt 14 (from BLI-21-11-2004-76, ranked 1 in Table 35) provides an
illustration of the discourse typically found in articles scoring highly on Factor 12.
Strani kulturni centri oduvek su bili prozor u svet i prilika da se ne samo nauče
strani jezici, već i da se bolje upozna kultura velikih nacija. Strani kulturni
centri su kod nas već decenijama sastavni deo kulturne ponude. […] Naši
sagovornici kažu da je najveći broj korisnika među mladima, naročito onih koji
pohađaju kurseve jezika. […] Po obimu literature, beogradski institut spada u
prvih pet naših instituta u svetu, a našu publiku čine većinom visokoobrazovani
ljudi - kaže Gudrun Krivokapić [head librarian, Belgrade Goethe Institute library].
Zbog velikog interesovanja, zaposleni u centrima imaju problem s prostorom,
ali su zato prezadovoljni odzivom.
Foreign cultural centers have always been a window onto the world and an
opportunity to not only learn foreign languages but also acquaint oneself better
with the cultures of the great nations. For decades, foreign cultural centers have
also been an integral part of our cultural scene. […] Our interlocutors say that most
users are young people, particularly those attending language courses. […] In terms of
library holdings, the Belgrade institute ranks among the top five of our institutes
worldwide, and most of our audience is made up of highly educated people, says
Gudrun Krivokapić. Due to enormous interest, the staff at these centers are
dealing with a lack of space, but they are also very happy with the number of
visitors they get.
As can be seen from this excerpt from a Blic article from November 21, 2004, texts
scoring highly on Factor 12 typically discuss the offerings of embassy-sponsored cultural
centers in Belgrade and around Serbia, language courses and cultural events in particular.
Although texts representative of this discourse sometimes make references to nationalism
(e.g., “the cultures of the great nations”), they do not thematize ethnolinguistic identities
in the sense employed in this study.
To conclude this section, it has been shown that the twelve factors suggest twelve
(small ‘d’) language-related discourses, some of which are linked. Further, it has been
shown that, although most discourses identified here do feature references to Central
South Slavic identities, six of the twelve factors/discourses are clearly more pertinent for
an analysis of links between language-related discourse and language ideologies, on the
one hand, and ethnonationalism, on the other. Those six factors/(small ‘d’) discourses (2,
4, 6, 8, 10, 11) are further analyzed using quantitative and qualitative methods in the
following three sections (6.4-6.6).
6.4 Synchronic and Diachronic Variation in Language-related Discourses (Analysis of Variance)
This section presents the results of the analysis of synchronic and diachronic variation
in language-related discourses. The analysis is based on nonparametric analysis of variance
(Kruskal-Wallis and Mann-Whitney U tests). Here, we focus on individual factors interpreted
as (small ‘d’) discourses above. Note, again, that only the six factors (2, 4, 6, 8, 10, and 11)
whose comparatively greater relevance for Central South Slavic ethnonationalism(s) was
established in Section 6.3 are analyzed here.
6.4.1 Variation by publication (synchronic). To examine variation by
publication, the factor scores for each of the six selected language-related discourses (i.e.,
factors) of texts grouped by publication (Blic, NIN, Politika, and Vreme) were compared.
Descriptive statistics for each language-related discourse by publication are presented in
Table 36.
Table 36
Descriptive Statistics for Language-related Discourse Factor Scores by Publication
Factor   Blic             NIN              Politika         Vreme
         M       SD       M       SD       M       SD       M       SD
2        -0.170  0.420    -0.134  0.412    0.090   1.126    -0.177  0.352
4        -0.030  0.699    -0.048  0.780    0.037   0.995    -0.172  0.555
6        -0.201  0.438    0.034   0.606    0.049   1.046    -0.180  0.449
8        -0.015  0.828    -0.061  0.804    0.021   0.936    -0.006  1.072
10       -0.168  0.637    -0.043  0.770    0.074   0.971    -0.271  0.410
11       -0.131  0.497    -0.068  0.342    0.053   1.090    -0.075  0.304
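Table 36, like Tables 37 and 38 below, reports the mean and standard deviation of factor scores within each group. The Python sketch below shows this computation. It assumes that the sample (n − 1) standard deviation was used. It runs on invented toy scores, because the corpus factor scores are not reproduced here, and `describe_by_group` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean, stdev


def describe_by_group(records: list[tuple[str, float]]) -> dict[str, tuple[float, float]]:
    """Map each group label to (mean, sample SD) of its factor scores.

    Each group needs at least two scores, because stdev() divides by n - 1.
    """
    scores_by_group: dict[str, list[float]] = defaultdict(list)
    for group, score in records:
        scores_by_group[group].append(score)
    return {group: (mean(s), stdev(s)) for group, s in scores_by_group.items()}


# Toy data only (not corpus values): (publication, factor score) pairs.
toy_records = [("Blic", -0.2), ("Blic", 0.0), ("Politika", 0.5), ("Politika", 1.5)]
for group, (m, sd) in describe_by_group(toy_records).items():
    print(f"{group}: M = {m:.3f}, SD = {sd:.3f}")
```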
To evaluate the hypothesis that ‘publication’ could differentiate factor scores, a
Kruskal-Wallis test was conducted for each of the six selected factors.
For Cyrillic-only (Factor 2), the test indicated a significant difference, χ²(3, N = 943) =
11.902, p = .008. Pairwise comparisons showed that texts from Politika oriented more
positively on this language-related discourse than did texts from Blic, although the
difference did not reach the adjusted significance level.
For Officialization of Montenegrin 1 (Factor 4), the test indicated no significant
difference, χ²(3, N = 943) = 3.013, p = .390.
For Contestation over language ownership and name (Factor 6), the test indicated a
significant difference, χ²(3, N = 943) = 21.471, p < .001. Pairwise comparisons showed
that texts from NIN and Politika oriented significantly more positively on this
language-related discourse than did texts from Blic. Texts from NIN also oriented
significantly more positively on it than did texts from Vreme.
For Officialization of Montenegrin 2 (Factor 8), the test indicated no significant
difference, χ²(3, N = 943) = 1.321, p = .724.
For Linguistics as a science, lexicography, standardization and contestation (Factor 10),
the test indicated a significant difference, χ²(3, N = 943) = 16.707, p = .001. Pairwise
comparisons showed that texts from Politika oriented significantly more positively on
this language-related discourse than did texts from Blic and Vreme.
For Officialization of Bosnian (Factor 11), the test indicated a significant difference,
χ²(3, N = 943) = 18.149, p < .001. Pairwise comparisons showed that texts from
Politika oriented significantly more positively on this language-related discourse than
did texts from Blic.
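The test statistics reported above are Kruskal-Wallis H values. The test compares groups on mean ranks rather than on raw means. The Python sketch below computes H from scratch, including the standard correction for tied values. It is a textbook formulation, not necessarily the routine of the software used in the study, and `average_ranks` and `kruskal_wallis_h` are hypothetical helper names.

```python
from collections import Counter
from itertools import chain


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(*groups: list[float]) -> tuple[float, int]:
    """Return the tie-corrected Kruskal-Wallis H and its degrees of freedom (k - 1).

    Assumes the pooled values are not all identical.
    """
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, offset = 0.0, 0
    for group in groups:
        group_ranks = ranks[offset:offset + len(group)]
        rank_term += sum(group_ranks) ** 2 / len(group)
        offset += len(group)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tied groups of size t.
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - tie_sum / (n ** 3 - n)), len(groups) - 1


h, df = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h, 3), df)
```

H is referred to a χ² distribution with k − 1 degrees of freedom. This is why the four publications give df = 3 above.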
6.4.2 Summary of variation by publication. Discursive uniformity across
publications was thus attested only for Factors 4 and 8 (Officialization of Montenegrin 1
and 2) as texts from different publications were shown not to be significantly different in
this respect (i.e., they treat the issue of officialization of Montenegrin in similar ways).
Pairwise comparisons on the remaining factors (2, 6, 10, and 11) showed that texts from
Politika tended to score more highly than texts from Blic. The difference was significant on
all of these factors except Factor 2, where it failed to reach the significance level adjusted
for multiple pairwise comparisons. Texts from Politika also scored significantly more highly than texts from
Vreme on Factor 10, while texts from NIN scored significantly more highly than texts
from either Blic or Vreme on Factor 6. This suggests some differences in the discursive
treatment of ethnolinguistic identities between broadsheets (Politika) and tabloids (Blic),
broadsheet dailies and some periodicals (Politika vs. Vreme), tabloid dailies and some
periodicals (Blic vs. NIN), as well as between periodicals themselves (NIN vs. Vreme).
Blic, and to a lesser extent Vreme, thus seem either to have devoted less attention to the
discourses suggested by Factors 2, 6, 10, and 11 than Politika and NIN, or to have treated
the theme of ethnolinguistic identities underlying the discourses suggested by these
factors in ways undetected by factor analysis here. Note, however, that this does not
necessarily imply significant ideological differences as well.
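The summary above refers to a significance level adjusted for multiple pairwise comparisons. This section does not name the adjustment. The Python sketch below therefore assumes the Bonferroni correction, a common choice. With k groups there are k(k − 1)/2 pairs, and each pairwise p-value is multiplied by that number (capped at 1) before it is compared with α = .05. Both helper names are hypothetical.

```python
from itertools import combinations


def pairwise_bonferroni(groups: list[str], alpha: float = 0.05) -> tuple[list[tuple[str, str]], float]:
    """List every pair of groups and the per-comparison alpha under Bonferroni."""
    pairs = list(combinations(groups, 2))
    return pairs, alpha / len(pairs)


def bonferroni_adjust(p: float, n_comparisons: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_comparisons)


publications = ["Blic", "NIN", "Politika", "Vreme"]
pairs, per_test_alpha = pairwise_bonferroni(publications)
print(len(pairs), round(per_test_alpha, 4))

years = ["2003", "2004", "2005", "2006", "2008"]
print(len(pairwise_bonferroni(years)[0]))
```

Under this assumption, an unadjusted pairwise p of .02 among the four publications becomes min(1, .02 × 6) = .12. This shows how a significant omnibus test can coexist with no significant pairwise difference, as reported for Factor 2.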
6.4.3 Variation by year of publication (diachronic). To examine variation by
year of publication, the factor scores for each of the six selected language-related
discourses (i.e., factors) of texts grouped by year of publication (2003, 2004, 2005, 2006,
and 2008) were compared. Descriptive statistics for each language-related discourse by
year of publication are presented in Table 37.
Table 37
Descriptive Statistics for Language-related Discourse Factor Scores by Year of Publication
Factor   2003            2004            2005            2006            2008
         M       SD      M       SD      M       SD      M       SD      M       SD
2        0.01    0.97    0.01    0.74    -0.01   0.95    0.11    1.28    -0.14   0.55
4        -0.03   1.00    0.20    1.28    -0.03   0.55    -0.13   0.61    -0.06   0.66
6        -0.11   0.52    -0.04   0.67    0.23    1.41    0.07    1.06    -0.10   0.63
8        0.07    1.12    0.16    1.17    -0.11   0.63    -0.05   0.73    -0.15   0.44
10       0.02    0.76    -0.10   0.63    0.04    0.97    0.08    1.08    -0.03   0.96
11       -0.08   0.35    0.14    1.68    0.00    0.62    0.00    0.51    -0.07   0.36
To evaluate the hypothesis that ‘year of publication’ could differentiate factor scores, a
Kruskal-Wallis test was conducted for each of the six selected factors.
For Cyrillic-only (Factor 2), the test indicated a significant difference, χ²(4, N = 943) =
17.181, p = .001. Pairwise comparisons showed that texts from 2003, 2004, and 2006
oriented significantly more positively on this language-related discourse than did texts
from 2008.
For Officialization of Montenegrin 1 (Factor 4), the test indicated no significant
difference, χ²(4, N = 943) = 7.285, p = .122.
For Contestation over language ownership and name (Factor 6), the test indicated no
significant difference, χ²(4, N = 943) = 2.939, p = .568.
For Officialization of Montenegrin 2 (Factor 8), the test indicated no significant
difference, χ²(4, N = 943) = 4.020, p = .403.
For Linguistics as a science, lexicography, standardization and contestation (Factor 10),
the test indicated no significant difference, χ²(4, N = 943) = 6.078, p = .193.
For Officialization of Bosnian (Factor 11), the test indicated no significant difference,
χ²(4, N = 943) = 2.843, p = .584.
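The Kruskal-Wallis statistic is evaluated against a χ² distribution with k − 1 degrees of freedom: 3 for the four publications and 4 for the five years. The p-values can therefore be checked from the reported statistics alone. The Python sketch below computes the upper-tail χ² probability for integer degrees of freedom using only the standard library. It relies on the recurrence Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(−x/2) / Γ(k/2 + 1). The name `chi2_sf` is hypothetical, and a statistics package would normally be used instead.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2
    # Closed forms to start from: Q(x; 2) = exp(-x/2) and Q(x; 1) = erfc(sqrt(x/2)).
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    # Step up two degrees of freedom at a time:
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


# Some statistics reported in Sections 6.4.1 and 6.4.3, as (value, df).
for stat, df in [(3.013, 3), (1.321, 3), (7.285, 4), (2.939, 4)]:
    print(stat, df, round(chi2_sf(stat, df), 3))
```

For example, `chi2_sf(3.013, 3)` is approximately 0.390, in line with the p = .390 reported for Factor 4 by publication.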
6.4.4 Summary of variation by year of publication. As might be expected,
then, a period of five to six years may not be long enough for many significant diachronic
differences to emerge. At the same time, a high degree of stability in most of the
discourses identified here seems entirely plausible. Accordingly, we see very little change in
the discourses suggested by Factors 4, 6, 8, 10, and 11 during this period. The
only significant difference here is on Factor 2 between the years 2003, 2004, and 2006,
on the one hand, and the year 2008, on the other. This suggests a possible abatement of
interest in the alphabet issue, and therefore in this facet of the discourse of endangerment,
toward the end of this period. But this finding should be taken with caution as the
dynamics of discourse are volatile and interest in a particular issue can be quickly
rekindled by important events even after long periods of relative dormancy.
6.4.5 Variation by type of article (synchronic). To examine variation by type of
article, the factor scores for each of the six selected language-related discourses (i.e.,
factors) of texts grouped by type of article (general newspaper articles vs. letters-to-the-
editor) were compared. Descriptive statistics for each language-related discourse by type
of article are presented in Table 38.
Table 38
Descriptive Statistics for Language-related Discourse Factor Scores by Type of Article
Factor  Newspaper articles   Letters-to-the-editor   Factor/discourse label
        M       SD           M       SD
2       -0.06   0.84         0.87    1.57            Cyrillic-only
4       0.01    0.92         -0.19   0.69            Officialization of Montenegrin 1
6       -0.02   0.88         0.25    1.04            Contestation over language ownership & name
8       0.02    0.93         -0.27   0.46            Officialization of Montenegrin 2
10      0.00    0.89         -0.01   0.70            Linguistics as a science, lexicography, standardization & contestation
11      0.00    0.92         -0.04   0.56            Officialization of Bosnian
To evaluate the hypothesis that ‘type of article’ could differentiate factor scores, a
Mann-Whitney U test was conducted for each of the six selected factors.
For Cyrillic-only (Factor 2), the test indicated a significant difference, U = 10,852,
p < .001. This result indicates that letters-to-the-editor oriented significantly more
positively on this language-related discourse than did general newspaper texts.
For Officialization of Montenegrin 1 (Factor 4), the test indicated a significant
difference, U = 21,607, p = .002. This result indicates that general newspaper texts
oriented significantly more positively on this language-related discourse than did
letters-to-the-editor.
For Contestation over language ownership and name (Factor 6), the test indicated a
significant difference, U = 22,885, p = .013. This result indicates that letters-to-the-editor
oriented significantly more positively on this language-related discourse than did
general newspaper texts.
For Officialization of Montenegrin 2 (Factor 8), the test indicated a significant
difference, U = 20,793, p < .001. This result indicates that general newspaper texts
oriented significantly more positively on this language-related discourse than did
letters-to-the-editor.
For Linguistics as a science, lexicography, standardization and contestation (Factor 10),
the test indicated no significant difference, U = 26,159, p = .349.
For Officialization of Bosnian (Factor 11), the test indicated no significant difference,
U = 27,452, p = .748.
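The Mann-Whitney U test compares two groups, here newspaper articles and letters-to-the-editor, on ranks rather than raw scores. The Python sketch below computes U with a tie-corrected normal approximation and no continuity correction, using only the standard library. It is a textbook formulation, not the study's software routine, and `average_ranks` and `mann_whitney_u` are hypothetical helper names.

```python
import math
from collections import Counter


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return U for sample x and a two-sided p-value from the normal approximation.

    Assumes the pooled values are not all identical (otherwise the variance is zero).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie-corrected variance of U.
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_sum / (n * (n - 1)))
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_two_sided


u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u, round(p, 4))
```

The U for one group and n1·n2 − U for the other carry the same information, so software may report either value.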
6.4.6 Summary of variation by type of article. Newspaper articles and letters-
to-the-editor are not homogeneous categories. In addition to journalists, newspaper
articles are often written by public figures and professionals or experts. Similarly, in
addition to the general readership, letters-to-the-editor are often written by public figures
and concerned professionals. Therefore, neither category can be taken as an authentic or
reliable expression of discursive or practical consciousness (in the sense of Kroskrity, 1998).
However, to the extent that newspaper articles tend to be written by journalists and other
figures arguably exhibiting discursive consciousness, and letters-to-the-editor tend to be
written by the general readership arguably exhibiting practical consciousness, these two
text types can be considered an approximation of the discourses and ideologies
characteristic of each group.
Here, we see a convergence between discourses in circulation in the two groups in
Factors 10 and 11, and significant differences in Factors 2, 4, 6, and 8. Letters-to-the-
editor scored significantly more highly than did newspaper articles on Factor 2,
suggesting a heightened interest in the alphabet and thus greater currency of this facet of the
discourse of endangerment among the general readership (and the readers of dailies in
particular, see above). It is also worth noting that some of this variation is due to the
reliance of fringe groups on letters-to-the-editor to air their views, as in the case of
associations demanding more stringent legal protections for the Cyrillic alphabet (for an
example of an alternative discourse unmasking these, see Section 6.6.1). Similarly,
letters-to-the-editor scored significantly more highly than did newspaper articles on
Factor 6, suggesting an internalization of the discourse of contestation of ethnolinguistic
identity purveyed primarily by linguists, as will be shown further below. Newspaper
articles, on the other hand, scored significantly more highly on Factors 4 and 8, which
suggests that the change of language policy in Montenegro attracted comparatively more
interest among journalists and language experts; this may be another indicator of a
dwindling interest in language-related issues among the general public. Lastly, there was no
significant difference between the groups in terms of Factors 10 and 11. This is arguably
somewhat surprising, considering that Factor 10 suggests a technical discourse and
Factor 11 an administrative issue, but it also indicates just how dominant the (big ‘D’)
discourse of contestation is and to what extent it has been internalized.
However, it should again be noted that discursive differences attested here do not
necessarily mean that significant ideological differences also exist, as examples arising
from qualitative analysis below suggest that the same essentialist language ideology can
be identified in the dominant language-related discourses regardless of publication, year
of publication, or type of article. The results of the final quantitative technique applied
here, cluster analysis, are reported in the following section.
6.5 Cluster Analysis
This section presents the results of cluster analysis, which was used to further
analyze covariance patterns. First, the discursive links between factors (e.g., Factors 4
and 8 both suggest a common discourse on the officialization of Montenegrin) were
tested by examining mean factor scores for each cluster. Second, the composition of each
cluster was examined with respect to three categorical independent variables: publication,
year of publication, and type of article.
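The clustering procedure itself is not specified in this section. A minimal sketch, assuming k-means (Lloyd's algorithm) applied to each text's vector of factor scores, is given below; the function names and toy data are hypothetical. Running it for several values of k and comparing the within-cluster sum of squares is one possible way (again an assumption) to examine a range of cluster solutions before settling on one. Because the result depends on the randomly chosen starting centroids, several starts would normally be compared in practice.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two factor-score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def k_means(points, k, max_iter=100, seed=0):
    """Lloyd's k-means. Returns (labels, centroids); labels are 0-based."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k random texts as seeds
    labels = None
    for _ in range(max_iter):
        # Assignment step: each text joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # no text changed cluster: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its member texts.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_vector(members)
    return labels, centroids


def within_cluster_ss(points, labels, centroids):
    """Total squared distance of texts to their own cluster centroids."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


# Hypothetical toy data: two factor scores per text, two obvious groups.
scores = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
for k in (1, 2, 3):
    labels, centroids = k_means(scores, k)
    print(k, labels, round(within_cluster_ss(scores, labels, centroids), 2))
```

The within-cluster sum of squares can only fall as k grows, so it is read for the point where adding clusters stops paying off rather than minimized outright.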
6.5.1 Preferred cluster solution and scoring patterns by factor and cluster.
After a range of cluster solutions was examined, a six-cluster solution was identified as
optimal for this data set. Table 39 shows the descriptive statistics for all twelve factors as
predictor variables in a six-cluster solution.
Table 39
Descriptive Statistics for Twelve Factors (Predictor Variables) in a Six-cluster Solution
Factor Cluster N Mean SD Std. Error 95% CI Lower Bound 95% CI Upper Bound Minimum Maximum
F1 1 775 -2.0691 3.41779 .12277 -2.3101 -1.8281 -4.91 20.84
2 72 -3.2274 2.11004 .24867 -3.7233 -2.7316 -4.91 9.29
3 134 18.7734 10.77776 .93106 16.9318 20.6150 -1.11 53.51
4 159 -3.3504 1.55064 .12297 -3.5932 -3.1075 -4.91 3.62
5 78 -.3074 3.96494 .44894 -1.2013 .5866 -4.91 17.30
6 39 -3.1540 2.23610 .35806 -3.8789 -2.4291 -4.91 7.60
Total 1257 .0000 7.98397 .22519 -.4418 .4418 -4.91 53.51
F2 1 775 -.5533 3.63599 .13061 -.8097 -.2969 -2.34 41.91
2 72 -.9219 2.42671 .28599 -1.4921 -.3516 -2.34 11.25
3 134 -1.6581 1.38071 .11927 -1.8940 -1.4221 -2.34 4.94
4 159 -1.6931 1.46543 .11622 -1.9226 -1.4636 -2.34 9.43
5 78 -.2607 2.79064 .31598 -.8899 .3685 -2.34 9.36
6 39 25.8176 10.02103 1.60465 22.5691 29.0660 12.72 52.37
Total 1257 .0000 5.83630 .16462 -.3230 .3230 -2.34 52.37
F3 1 775 -.7485 1.84445 .06625 -.8785 -.6184 -1.37 15.49
2 72 -.8432 1.67400 .19728 -1.2366 -.4498 -1.37 10.42
3 134 6.5618 8.48330 .73285 5.1123 8.0114 -1.37 33.56
4 159 -1.1899 .51443 .04080 -1.2705 -1.1093 -1.37 1.49
5 78 -.0936 2.46480 .27908 -.6493 .4621 -1.37 10.53
6 39 -1.0775 .81042 .12977 -1.3402 -.8148 -1.37 2.17
Total 1257 .0000 3.93664 .11103 -.2178 .2178 -1.37 33.56
F4 1 775 -.8464 2.34708 .08431 -1.0119 -.6809 -2.16 12.00
2 72 -.3007 2.79879 .32984 -.9584 .3570 -2.16 14.47
3 134 .7560 5.01901 .43358 -.1016 1.6136 -2.16 26.10
4 159 -1.0648 2.10815 .16719 -1.3950 -.7346 -2.16 9.26
5 78 10.1722 11.15590 1.26316 7.6570 12.6875 -2.16 50.56
6 39 -1.2256 1.59152 .25485 -1.7416 -.7097 -2.16 3.76
Total 1257 .0000 4.67919 .13198 -.2589 .2589 -2.16 50.56
F5 1 775 .3354 5.46140 .19618 -.0497 .7205 -1.82 71.03
2 72 -.3526 2.49785 .29437 -.9396 .2343 -1.82 12.39
3 134 -.5388 2.44067 .21084 -.9559 -.1218 -1.82 12.50
4 159 -1.0269 2.45412 .19462 -1.4113 -.6425 -1.82 25.84
5 78 -.6604 1.85520 .21006 -1.0786 -.2421 -1.82 7.09
6 39 1.3449 4.27696 .68486 -.0416 2.7313 -1.82 15.75
Total 1257 .0000 4.60547 .12990 -.2548 .2548 -1.82 71.03
F6 1 775 -.2630 3.29035 .11819 -.4950 -.0310 -2.64 15.92
2 72 8.6821 12.20590 1.43848 5.8138 11.5503 -2.64 50.58
3 134 -2.2035 .99503 .08596 -2.3735 -2.0334 -2.64 2.15
4 159 -1.2229 1.46333 .11605 -1.4521 -.9937 -2.64 4.67
5 78 .1517 2.98166 .33761 -.5206 .8240 -2.64 10.43
6 39 1.4513 3.97872 .63711 .1615 2.7410 -2.64 12.49
Total 1257 .0000 4.65077 .13118 -.2573 .2573 -2.64 50.58
F7 1 775 -1.7200 2.38766 .08577 -1.8884 -1.5517 -4.03 6.94
2 72 1.2139 4.61241 .54358 .1300 2.2977 -4.03 18.13
3 134 -2.8041 1.57486 .13605 -3.0731 -2.5350 -4.03 4.27
4 159 11.3777 5.60051 .44415 10.5005 12.2550 3.22 30.36
5 78 -1.0456 3.20802 .36324 -1.7689 -.3223 -4.03 11.92
6 39 -2.7216 3.23525 .51805 -3.7703 -1.6728 -4.03 15.47
Total 1257 .0000 5.41351 .15269 -.2996 .2996 -4.03 30.36
F8 1 775 -.7218 2.35914 .08474 -.8882 -.5555 -2.03 17.95
2 72 -1.1108 1.81867 .21433 -1.5381 -.6834 -2.03 6.00
3 134 -.2289 2.00141 .17290 -.5709 .1131 -2.03 11.20
4 159 -1.6072 1.28089 .10158 -1.8078 -1.4065 -2.03 8.63
5 78 11.5404 9.81900 1.11178 9.3265 13.7542 -2.03 39.05
6 39 .6529 4.92769 .78906 -.9444 2.2503 -2.03 16.33
Total 1257 .0000 4.46109 .12583 -.2469 .2469 -2.03 39.05
F9 1 775 -.6976 2.76180 .09921 -.8923 -.5028 -3.00 26.40
2 72 -.1311 2.70570 .31887 -.7669 .5047 -3.00 9.13
3 134 6.8956 9.77446 .84438 5.2254 8.5658 -3.00 65.71
4 159 -1.5884 1.21759 .09656 -1.7792 -1.3977 -3.00 2.68
5 78 -.4929 2.47942 .28074 -1.0520 .0661 -3.00 7.75
6 39 -2.1271 1.47721 .23654 -2.6060 -1.6483 -3.00 2.89
Total 1257 .0000 4.65937 .13142 -.2578 .2578 -3.00 65.71
F10 1 775 -.9072 3.50589 .12594 -1.1544 -.6599 -4.03 17.62
2 72 14.9900 8.98762 1.05920 12.8780 17.1020 -4.03 35.57
3 134 -2.7043 1.63691 .14141 -2.9840 -2.4246 -4.03 5.18
4 159 -.5382 3.21347 .25485 -1.0415 -.0348 -3.89 10.35
5 78 .8504 4.05708 .45937 -.0643 1.7652 -4.03 17.53
6 39 .1380 3.73877 .59868 -1.0740 1.3499 -4.03 11.72
Total 1257 .0000 5.42277 .15295 -.3001 .3001 -4.03 35.57
F11 1 775 -.1661 2.48777 .08936 -.3415 .0093 -.86 25.26
2 72 .0630 1.70718 .20119 -.3382 .4641 -.86 9.25
3 134 1.7206 7.44665 .64329 .4482 2.9930 -.86 39.91
4 159 -.6243 1.11762 .08863 -.7994 -.4493 -.86 10.02
5 78 .0578 1.64304 .18604 -.3127 .4282 -.86 6.72
6 39 -.2975 1.29823 .20788 -.7183 .1234 -.86 6.01
Total 1257 .0000 3.25725 .09187 -.1802 .1802 -.86 39.91
F12 1 775 .0491 3.88165 .13943 -.2246 .3228 -2.32 35.09
2 72 .1018 2.56701 .30253 -.5015 .7050 -2.32 10.16
3 134 -.5944 2.43468 .21032 -1.0104 -.1784 -2.32 9.73
4 159 .1818 2.69452 .21369 -.2402 .6039 -2.32 13.46
5 78 .5309 4.56334 .51670 -.4980 1.5598 -2.32 19.77
6 39 -.9244 1.97424 .31613 -1.5643 -.2844 -2.32 4.80
Total 1257 .0000 3.56106 .10044 -.1971 .1971 -2.32 35.09
Table 40
Results of ANOVA for Twelve Factors (Predictor Variables) in a Six-cluster Solution
Factor Source Sum of Squares df Mean Square F Sig.
F1 Between Groups 53475.049 5 10695.010 503.230 .000
Within Groups 26587.142 1251 21.253
Total 80062.191 1256
F2 Between Groups 27123.226 5 5424.645 433.370 .000
Within Groups 15659.203 1251 12.517
Total 42782.429 1256
F3 Between Groups 6526.184 5 1305.237 126.204 .000
Within Groups 12938.200 1251 10.342
Total 19464.384 1256
F4 Between Groups 8948.203 5 1789.641 120.681 .000
Within Groups 18551.713 1251 14.830
Total 27499.916 1256
F5 Between Groups 407.264 5 81.453 3.884 .002
Within Groups 26232.951 1251 20.970
Total 26640.215 1256
F6 Between Groups 6453.179 5 1290.636 77.948 .000
Within Groups 20713.632 1251 16.558
Total 27166.811 1256
F7 Between Groups 24409.693 5 4881.939 492.572 .000
Within Groups 12398.816 1251 9.911
Total 36808.509 1256
F8 Between Groups 11315.020 5 2263.004 206.930 .000
Within Groups 13681.049 1251 10.936
Total 24996.069 1256
F9 Between Groups 7346.547 5 1469.309 92.270 .000
Within Groups 19920.849 1251 15.924
Total 27267.396 1256
F10 Between Groups 17899.320 5 3579.864 235.270 .000
Within Groups 19035.179 1251 15.216
Total 36934.499 1256
F11 Between Groups 484.071 5 96.814 9.431 .000
Within Groups 12841.680 1251 10.265
Total 13325.751 1256
F12 Between Groups 110.525 5 22.105 1.748 .121
Within Groups 15816.960 1251 12.643
Total 15927.485 1256
Table 41
Discursive Links Between Twelve Factors Based on Highest Mean Scores for Six Clusters
Cluster Cluster label Factors
1 Other (no factors)
2 Contestation over language ownership & name F6 F10
3 Language education incl. Offic. of Bosnian F1 F3 F9 F11
4 Literature & publishing F7
5 Officialization of Montenegrin 1 & 2 F4 F8 F12*
6 Cyrillic-only & Minority language rights F2 F5
* p = .121
As can be seen from Table 39, then, the mean factor scores differ across the six
clusters. Also, there is a significant difference between the mean factor scores for all
factors except Factor 12 (Table 40). The mean plots further show that none of the
factors has a high mean score for Cluster 1, while the highest Factor 12 mean score is for
Cluster 5, although the differences in Factor 12 across clusters are not statistically
significant. Table 41 shows how the remaining factors grouped by cluster based on their
highest mean scores.
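Each F ratio in Table 40 is the between-groups mean square divided by the within-groups mean square, each mean square being a sum of squares divided by its df (k - 1 = 5 and n - k = 1,251 for k = 6 clusters and n = 1,257 texts). The sketch below computes a one-way ANOVA from raw scores and, as a check, recovers the F values for Factors 1 and 12 from the sums of squares in Table 40; the function names are illustrative. The p-values would additionally require the F distribution (e.g., SciPy's `scipy.stats.f.sf`), which is omitted here to keep the example dependency-free.

```python
def one_way_anova(groups):
    """One-way ANOVA on lists of scores, one list per cluster.
    Returns (ss_between, ss_within, df_between, df_within, F)."""
    scores = [x for g in groups for x in g]
    n, k = len(scores), len(groups)
    grand_mean = sum(scores) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f


def f_from_sums_of_squares(ss_between, ss_within, k, n):
    """F ratio from the sums of squares reported in an ANOVA table."""
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Rows F1 and F12 of Table 40: k = 6 clusters, n = 1,257 texts.
f1 = f_from_sums_of_squares(53475.049, 26587.142, k=6, n=1257)
f12 = f_from_sums_of_squares(110.525, 15816.960, k=6, n=1257)
print(round(f1, 2), round(f12, 3))
```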
As mentioned in the previous section, texts loading on Factors 6 and 10
(Contestation over language ownership and name; Linguistics as a science, lexicography,
standardization and contestation) share a focus on the contestation of Central South
Slavic ethnolinguistic identities and thus these two factors are grouped in Cluster 2.
Similarly, texts loading on Factors 1, 3, 9 and 11 (Language education; Entrance exams;
Foreign language education; Officialization of Bosnian) all share a focus on (language)
education and thus are grouped in Cluster 3. Here, three generally ethnolinguistically
neutral language education-related factors (Factors 1, 3, and 9) are joined by
ethnolinguistically-specific Factor 11 because the officialization of Bosnian is discussed
in the context of language education and the (small ‘d’) discourses represented by all four
factors share language education-related lexis. Factor 7 (Literature and publishing) with
the highest mean score for Cluster 4 is not grouped with any other factors. This was
expected, as texts loading on this factor typically dealt with subject matter unrelated
to that of texts loading on other factors. Factors 4 and 8
(Officialization of Montenegrin 1 and 2), grouped in Cluster 5, make a logical set since
both deal with the same issue, albeit with slightly different foci, as explained above.
Finally, Factors 2 and 5 (Cyrillic-only, Minority language rights) are grouped in Cluster 6
on account of a tendency in the discourse on the endangerment of the Cyrillic alphabet to
include references to minority rights (as illustrated in Section 6.3.5).
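The grouping rule behind Table 41, assigning each factor to the cluster in which its mean score is highest, can be reproduced from the cluster means in Table 39, as in the following sketch (the function name is illustrative). The rule does not itself check significance: Factor 12 lands in Cluster 5 even though its ANOVA result in Table 40 is not significant, which is why it is flagged in Table 41.

```python
# Mean factor scores by cluster (clusters 1-6), copied from Table 39.
CLUSTER_MEANS = {
    "F1": [-2.0691, -3.2274, 18.7734, -3.3504, -0.3074, -3.1540],
    "F2": [-0.5533, -0.9219, -1.6581, -1.6931, -0.2607, 25.8176],
    "F3": [-0.7485, -0.8432, 6.5618, -1.1899, -0.0936, -1.0775],
    "F4": [-0.8464, -0.3007, 0.7560, -1.0648, 10.1722, -1.2256],
    "F5": [0.3354, -0.3526, -0.5388, -1.0269, -0.6604, 1.3449],
    "F6": [-0.2630, 8.6821, -2.2035, -1.2229, 0.1517, 1.4513],
    "F7": [-1.7200, 1.2139, -2.8041, 11.3777, -1.0456, -2.7216],
    "F8": [-0.7218, -1.1108, -0.2289, -1.6072, 11.5404, 0.6529],
    "F9": [-0.6976, -0.1311, 6.8956, -1.5884, -0.4929, -2.1271],
    "F10": [-0.9072, 14.9900, -2.7043, -0.5382, 0.8504, 0.1380],
    "F11": [-0.1661, 0.0630, 1.7206, -0.6243, 0.0578, -0.2975],
    "F12": [0.0491, 0.1018, -0.5944, 0.1818, 0.5309, -0.9244],
}


def group_factors_by_top_cluster(means):
    """Map each cluster (1-based) to the factors whose mean score peaks there."""
    groups = {}
    for factor, by_cluster in means.items():
        top = max(range(len(by_cluster)), key=by_cluster.__getitem__) + 1
        groups.setdefault(top, []).append(factor)
    return dict(sorted(groups.items()))


print(group_factors_by_top_cluster(CLUSTER_MEANS))
# {2: ['F6', 'F10'], 3: ['F1', 'F3', 'F9', 'F11'], 4: ['F7'], 5: ['F4', 'F8', 'F12'], 6: ['F2', 'F5']}
```

Cluster 1 receives no factor, matching the ‘Other (no factors)’ row of Table 41.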
The results of cluster analysis shown above thus suggest two conclusions. First,
lexical covariance extends beyond the individual factors and suggests the existence of
discursive links between factors, i.e., small ‘d’ discourses shared by two or more factors.
The most obvious example of this is the discourse on the officialization of Montenegrin
(Cluster 5), but other (and perhaps qualitatively somewhat different) discursive links
between factors can be inferred from the other clusters as well (e.g., language education
discourse in Factors 1, 3, 9, and 11). Second, despite its obvious usefulness, lexical
covariance alone is not sufficient to identify (big ‘D’) discourses and particularly
ideologies. This can be seen in the way Factors 4, 6, 8, 10, and 11, all of which bear
traces of an overarching (big ‘D’) discourse of contestation (see Section 6.6), are grouped
in separate clusters. Although each of these factors points to an aspect of the discourse of
contestation suggested by previous analyses and thus confirms its existence and extent,
quantitative correlational analysis seems unable to capture the underlying link and thus
the (big ‘D’) discursive (and ultimately ideological) construct itself. As has been widely
suggested, this must be done through qualitative analysis.
6.5.2 Synchronic and diachronic clustering patterns. Cluster analysis can also
help us focus qualitative analysis further by cross-tabulating data to examine whether
texts cluster in any particular patterns with respect to independent variables such as, in
our case, publication, year of publication, and type of article (general newspaper articles
vs. letters-to-the-editor, Tables 42-44).
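A cross-tabulation simply counts texts for each combination of cluster and category of an independent variable. The following minimal sketch builds such a table, with row and column totals, from (cluster, category) pairs; the function name is illustrative and the records are hypothetical stand-ins for the corpus metadata.

```python
from collections import Counter


def crosstab(records):
    """Cross-tabulate (row, column) label pairs, e.g. (cluster, type of article).
    Returns (table, row_totals, col_totals); table[row][column] is a count."""
    counts = Counter(records)
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    row_totals = {r: sum(table[r].values()) for r in rows}
    col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
    return table, row_totals, col_totals


# Hypothetical toy records: (cluster number, type of article) for six texts.
records = [(1, "article"), (1, "letter"), (4, "article"),
           (6, "article"), (6, "letter"), (6, "letter")]
table, row_totals, col_totals = crosstab(records)
print(table)
```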
Table 42
Cluster Membership by Publication for the Six-cluster Solution
Cluster Cluster label Blic NIN Politika Vreme Total
1 Other (no factors) 104 147 477 47 775
2 Contestation over lang. owner. & name 3 5 63 1 72
3 Lang. educ. incl. Offic. of Bosnian 35 3 89 7 134
4 Literature & publishing 8 22 123 6 159
5 Officialization of Montenegrin 1 & 2 10 7 60 1 78
6 Cyrillic-only & Minority language rights 4 0 35 0 39
Total 164 184 847 62 1257
Table 43
Cluster Membership by Year of Publication for the Six-cluster Solution
Cluster Cluster label 2003 2004 2005 2006 2008 Total
1 Other (no factors) 173 157 145 158 142 775
2 Contestation over lang. owner. & name 12 5 24 20 11 72
3 Lang. educ. incl. Offic. of Bosnian 49 42 20 9 14 134
4 Literature & publishing 36 28 33 31 31 159
5 Officialization of Montenegrin 1 & 2 21 34 7 6 10 78
6 Cyrillic-only & Minority language rights 6 7 10 11 5 39
Total 297 273 239 235 213 1257
Table 44
Cluster Membership by Type of Article for the Six-cluster Solution
Cluster Cluster label Newspaper articles Letters-to-the-editor Total
1 Other (no factors) 685 90 775
2 Contestation over lang. owner. & name 63 9 72
3 Lang. educ. incl. Offic. of Bosnian 125 9 134
4 Literature & publishing 156 3 159
5 Officialization of Montenegrin 1 & 2 76 2 78
6 Cyrillic-only & Minority language rights 23 16 39
Total 1128 129 1257
As expected, a majority of texts (775/1,257 or 61.7%) are grouped in Cluster 1, for
which none of the twelve factors had high mean scores; this cluster thus probably
represents the majority of the variance in the data (65.87%) unaccounted for by this
factor solution. Total cluster membership further shows that the largest share of the
remaining texts (159) is grouped in Cluster 4, which is not surprising considering the
general nature of Factor 7 (Literature and publishing), which had the highest mean score
for this cluster. The second largest cluster was Cluster 3 (134 texts), for which four of the
twelve factors (1, 3, 9, 11) had their highest mean scores. This was followed by
Cluster 5 (78 texts, grouping Factors 4, 8, and 12), Cluster 2 (72 texts, grouping
Factors 6 and 10), and Cluster 6 (39 texts, grouping Factors 2 and 5).
Table 42 shows that there is no clear relationship between ‘publication’ and
cluster membership, although it should be noted that Cluster 6 comprises no texts from
the weeklies (NIN, Vreme). This suggests either that weeklies were less interested in the
Cyrillic alphabet and minority language rights (Factors 2 and 5) as compared with the
dailies (Blic, Politika), or that the readers of weeklies themselves had less interest in these
issues and hence did not write any letters-to-the-editor pertaining to them; it is also
possible that fringe groups’ activists were less interested in the weeklies as a vehicle for
their message on the endangerment of the Cyrillic alphabet and so did not contribute any
letters-to-the-editor either. Further, the largest numbers of Blic and Vreme articles
grouped outside of Cluster 1 (35 and 7, respectively) were concerned with language
education (Cluster 3, Factors 1, 3, 9, and 11), whereas the largest numbers of NIN and
Politika articles (22 and 123, respectively) were concerned with literature and publishing
(Cluster 4, Factor 7). Interestingly, Politika articles contributed the largest proportion of
texts pertaining to Central South Slavic identities and suggesting discourses of
endangerment and contestation (63/72 for Cluster 2/Factors 6 and 10, 60/78 for Cluster
5/Factors 4 and 8, and 35/39 for Cluster 6/Factors 2 and 5), but this finding is hardly
surprising considering that most 5+ hits articles (847/1,257) come from Politika.
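The observations on Table 42 can be checked directly against its counts. The sketch below, using counts copied from Table 42, finds each publication's largest cluster outside Cluster 1 and compares Politika's share of selected clusters with its overall total; the helper names are illustrative.

```python
PUBLICATIONS = ("Blic", "NIN", "Politika", "Vreme")
TABLE_42 = {  # cluster -> counts per publication, copied from Table 42
    1: (104, 147, 477, 47),
    2: (3, 5, 63, 1),
    3: (35, 3, 89, 7),
    4: (8, 22, 123, 6),
    5: (10, 7, 60, 1),
    6: (4, 0, 35, 0),
}


def modal_cluster(table, publications, exclude=(1,)):
    """For each publication, the cluster (outside `exclude`) with the most texts."""
    result = {}
    for i, pub in enumerate(publications):
        counts = {c: row[i] for c, row in table.items() if c not in exclude}
        best = max(counts, key=counts.get)
        result[pub] = (best, counts[best])
    return result


def column_share(table, cluster, column):
    """Share of a cluster's texts that come from one publication (column index)."""
    row = table[cluster]
    return row[column] / sum(row)


politika = PUBLICATIONS.index("Politika")
politika_total = sum(row[politika] for row in TABLE_42.values())
print(modal_cluster(TABLE_42, PUBLICATIONS))
print(politika_total, {c: round(column_share(TABLE_42, c, politika), 3) for c in (2, 5, 6)})
```

Politika's shares of Clusters 2, 5, and 6 (roughly 88%, 77%, and 90%) can then be set against its share of all 1,257 texts (847/1,257, about 67%).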
Table 43 shows that there is no clear relationship between ‘year of publication’
and cluster membership, although, based on the numbers of articles, discourses
manifested in texts grouped in Clusters 3 and 5 (pertaining to language education and the
officialization of Bosnian and Montenegrin) seem to have been more pertinent in 2003
and 2004, while discourses manifested in texts grouped in Clusters 2 and 6 (pertaining to
contestation, Cyrillic alphabet, and minority language rights) seem to have been more
pertinent in 2005 and 2006. Arguably, these patterns simply reflect peaks in (public)
interest in these issues and the concomitant fluctuation in discursive activity and numbers
of articles (rather than any qualitative differences in terms of the big ‘D’ discourses).
Finally, Table 44 shows that there is no clear relationship between ‘type of article’
and cluster membership, although Clusters 4 and 5 (pertaining to literature and
publishing, and the officialization of Montenegrin, respectively) do exhibit the lowest
numbers of letters-to-the-editor, whereas discourses manifested in texts grouped in
Cluster 6 (pertaining to the Cyrillic alphabet and minority language rights) again seem to
have been comparatively more pertinent in letters-to-the-editor, as also indicated by the
analysis of variation by type of article in Section 6.4.5. These patterns suggest that writers
of the letters-to-the-editor thematizing language (whether lay people, activists, or experts)
seem to have had
surprisingly little interest in the officialization of Montenegrin, but this is perhaps another
indicator of a general waning of public (and particularly lay) interest in issues of
ethnolinguistic identity toward the end of this period (see Figure 2), possibly due to
fatigue with nationalism and the inevitability of Montenegrin independence; that writers
of letters-to-the-editor would also have little interest in the comparatively less
controversial (and more technical) issues of literature and publishing is less surprising.
The status of the Cyrillic alphabet and minority language rights, on the other hand, are
issues that seem to have been closer to home for many writers of letters-to-the-editor, and
particularly fringe groups’ activists, so the comparatively high degree of interest in issues
represented by Cluster 6 exhibited in this type of text is expected here. Arguably, the
discourse on the endangerment of the Cyrillic alphabet is part of a broader discourse of
declining language standards (arising from standard language ideology) which is widely
attested in the public (and, again, particularly lay) language-related discourses in many
other societies (see, e.g., Johnson & Ensslin, 2007, especially Part II), and Serbian society
does not seem to be an exception in this respect.
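Because the clusters differ greatly in size, the letters-to-the-editor counts in Table 44 are easier to compare as proportions. The sketch below computes each cluster's share of letters and, for comparison, the count that would be expected if cluster membership were independent of type of article (row total times column total, divided by the grand total). It is a descriptive check only, with illustrative function names; it does not replace a formal test of association.

```python
TABLE_44 = {  # cluster -> (newspaper articles, letters-to-the-editor), from Table 44
    1: (685, 90), 2: (63, 9), 3: (125, 9),
    4: (156, 3), 5: (76, 2), 6: (23, 16),
}


def letter_shares(table):
    """Proportion of each cluster's texts that are letters-to-the-editor."""
    return {c: letters / (articles + letters)
            for c, (articles, letters) in table.items()}


def expected_letters(table):
    """Expected letters per cluster under independence:
    row total * letters total / grand total."""
    grand_total = sum(a + lt for a, lt in table.values())
    letters_total = sum(lt for _, lt in table.values())
    return {c: (a + lt) * letters_total / grand_total for c, (a, lt) in table.items()}


shares = letter_shares(TABLE_44)
expected = expected_letters(TABLE_44)
for c in TABLE_44:
    print(c, f"{shares[c]:.1%}", TABLE_44[c][1], round(expected[c], 1))
```

Cluster 6, for instance, has 16 letters where about 4 would be expected under independence, while Cluster 5 has 2 where about 8 would be expected.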
Based on a lack of any clear-cut patterns here, we can conclude that, despite some
observed differences in small ‘d’ discourses represented by factors, there is a degree of
overlap between the discursive profiles of the different publications, and between the
general newspaper articles and letters-to-the-editor, as well as relative stability in small
‘d’ discourses during this period. This is coupled with a high degree of both synchronic
and diachronic stability in (big ‘D’) language-related discourses (endangerment and
contestation, see Section 6.6). With the presentation of the results of quantitative
analyses complete, we now turn to a qualitative analysis of texts identified as
representative by factor scores.
6.6 Critical Discourse Analysis/Discourse-historical Approach
This section presents the results of a qualitative analysis of representative texts.
As noted above, qualitative analysis was based on and informed by the results of
quantitative analysis; the results of qualitative analysis were in turn checked against the
results of quantitative analysis. Initial qualitative analysis consisted of basic content and
thematic analysis informed by the results of quantitative analyses above; this was
followed by an analysis of topoi in the CDA/DHA tradition. The findings are presented
by factor (i.e., small ‘d’ language-related discourse); again, factors grouped in the same
clusters are treated together (with the exception of Factors 2 and 11, whose ‘partner’
factors, Factor 5 and Factors 1, 3, and 9, respectively, were shown to be marginal in terms
of Central South Slavic ethnolinguistic identities and ethnonationalism). Note that the
texts cited below can be identified as representative by looking up their file codes in the
tables showing top scoring texts for individual factors in Section 6.3 (for a key to the
tables, see Section 4). Each excerpt is presented first in the original Serbian and then in
English translation; file codes are given only with the original text (and with
translations when these are repeated without the original text).
6.6.1 Excerpts from texts representative of Factor 2. Factor 2 (Cyrillic-only)
points to texts discussing the issues of alphabet choice and status in Serbia. These are
most often discussed in the context of changes to the constitution that were under
consideration during this period,
Usvajanje novog ustava Srbije, o kome se već dugo priča, podstaklo je i raspravu
o službenom pismu naše države. (BLI-2-9-2006-272)
The adoption of the new constitution of Serbia, which has been discussed for a
long time now, initiated a discussion about the official alphabet in our country.
as well as with respect to the process of accession to the European Union,
Ima mišljenja, veli on, da sa svojim pismom “ne možemo u Evropu i svet” i da
stoga moramo preći na latinicu. (POL-16-3-2003-93)
Some people think, he says, that “we cannot join Europe and the world” with our
alphabet so we have to switch to the Latin [alphabet].
Političke potrebe danas zahtevaju unifikaciju latiničkog pisma svugde u svetu.
(POL-22-8-2006-59)
Today, political needs demand a unification of the Latin alphabet everywhere in
the world.
Pobornici stava da srpski jezik treba da ima dva službena pisma, latinicu i
ćirilicu, smatraju se naprednijim, ističući da je upotreba latinice ono što će nas
približiti Zapadu. (BLI-2-9-2006-272)
The proponents of a two-alphabet (Latin and Cyrillic) solution for the Serbian
language consider themselves more progressive, insisting that the use of the Latin
alphabet is what will bring us closer to the West.
As noted above, Factor 2 suggests a discourse and ideology of endangerment,
Nema naroda u svetu koji danas drži do sebe i do svojih kulturnih i nacionalnih
korena a da toliko zapostavlja svoje pismo koliko to čini srpski narod. (POL-16-
3-2003-93)
Today, no nation in the world which cares about its identity and its cultural and
national roots neglects its alphabet as much as the Serbian nation does.
objasnio da je Avramov na ideju da se ukine latinica došao zbog toga što se
uplašio za ćirilicu. (POL-16-4-2005-91)
explained that Avramov [the mayor of Šid, an urban area in the province of
Vojvodina] came up with the idea to outlaw the Latin alphabet because he was
afraid for the Cyrillic.
which is supported by frequent (selective and flawed) comparisons to language situations
elsewhere in the world,
[d]a li je autorima ovog predloga poznat još neki narod na svetu koji za svoj jezik
koristi tuđe pismo (POL-11-2-2005-108)
do the authors of this proposal know of any other nation in the world which uses
somebody else’s alphabet in its own language
opšte pravilo: jedan jezik – jedno pismo, jer na dva pisma u svom jeziku ni jedan
drugi narod u svetu ne piše (POL-21-9-2004-72)
general rule: one language – one alphabet, because no nation in the world uses
two alphabets to write its language.
The importance of the alphabet to national identity is made explicit,
Ćirilično pismo je Srbima deo identiteta, to je njihova važna odrednica bez koje
oni ne bi više bili ono što su bili i što jesu. […] nije reč o ličnoj upotrebi pisma,
već je reč o kolektivnom i osnovnom ljudskom pravu Srba na svoj jezik i svoje
pismo. (POL-16-3-2003-93)
For Serbs the Cyrillic alphabet is part of their identity, it is their important
determiner without which they would not have been who they have been and
without which they would not be who they are […] this is not about personal use
of alphabet, but rather about a collective and basic human right of Serbs to their
language and their alphabet.
as is the ultimate goal,
Svuda u svetu pismo i jezik većinskog naroda mora biti na prvom mestu, i mi
tražimo da tako bude i kod nas. (POL-21-9-2004-72)
Everywhere in the world the language and the alphabet of the majority must come
first and we ask that this be so here also.
In addition, the Latin alphabet is often, implicitly or explicitly, identified with the Croats
(and more rarely, Bosniaks),
srpski jezik nikada nije pisan latinicom, sve do trenutka kada je zloupotrebljena
dobra volja da se južnoslovenskim saplemenicima Hrvatima pomogne da i oni
konačno dobiju svoje pismo (POL-11-2-2005-108)
Serbian language had never been written using the Latin alphabet until the
moment when the good will to help the South Slavic co-tribesmen Croats finally
get their own alphabet was abused
as well as non-South Slavic minorities which are routinely discussed in terms of their
relative population sizes,
Avramov je mislio da nacionalne manjine koje ne prelaze 15 odsto ukupnog broja
stanovništva nemaju pravo na službenu upotrebu maternjeg jezika. (POL-16-4-
2005-91)
Avramov [the mayor of Šid, an urban area in the province of Vojvodina] thought
that national minorities which do not cross the threshold of 15 percent of the total
population do not have a right to official use of their mother tongue [and
alphabet].
Interestingly, however, there is also evidence of discourses offering alternative
argumentation (which nevertheless subscribe to the view of endangerment),
U poslednje vreme u “Politici” je objavljeno nekoliko tekstova u kojima se tvrdi
da latinica koju Srbi koriste u svom jeziku nije srpsko, već “hrvatsko pismo”. To
je, nesumnjivo, pokušaj da se ponovo pokrene kampanja protiv latinice i protiv
“zatiranja srpske ćirilice”, koju predvode članovi nekoliko udruženja za zaštitu
ćirilice i njihovi istomišljenici. Mada sam Srbin i na srpskom pišem isključivo
ćirilicom, smatram da su njihovi stavovi i rad pogrešni, nekorisni, pa čak i štetni
za srpski jezik i Srbiju. […] Članovi “ćiriličarskih” udruženja i njihovi
istomišljenici beskorisno mašu parolama, kao što su “srpska latinica ne postoji”,
“latinica nije srpsko već hrvatsko pismo” i sl. i postavljaju pitanje postoji li još
neki narod na svetu koji za svoj jezik koristi tuđe pismo? Naravno da ne postoji,
pošto svaki narod pismo koje koristi smatra svojim, bez obzira na to gde, kada i
kako je ono nastalo. Poznato je, na primer, da su simboli japanskog pisma nastali
u Kini, ali Japanci i drugi narodi kažu da oni pišu japanskim pismom. Uostalom,
francusko, englesko i holandsko pismo je identično, pa niko nikog ne optužuje za
korišćenje tuđeg pisma. Oni koji uporno tvrde da je latinica koju koriste Srbi
“tuđe pismo”, ne znaju ili ne žele da znaju da u svetskoj lingvistici ne postoji
ekskluzivno vlasništvo nad bilo kojim pismom. Pisma pripadaju svim jezicima
koji ih koriste, bilo u celosti, ili samo delimično. Stoga latinica koju Srbijanci uče
i koriste već 90 godina, a ostali Srbi i znatno duže i kojom se danas stalno ili
pretežno služi 80 odsto Srba, ne može biti tuđe, već samo srpsko pismo. […] Na
zaštiti jezika i pisma moraju se pokrenuti i učiniti odgovornim mnogi segmenti
društva. Tek onda se može očekivati da će kroz određeno vreme ćirilica postati ne
jedino, već prvo pismo srpskog jezika. Inače, satanizacijom latinice etiketama
“tuđa” i “hrvatska” neće se zaštititi srpska ćirilica. (POL-8-4-2005-138)
Recently, “Politika” published several texts which claim that the Latin alphabet
which Serbs use in their language is not a Serbian but rather “a Croatian
alphabet”. This is, undoubtedly, an attempt to again start a campaign against the
Latin alphabet and against the “destruction of the Serbian Cyrillic”, which has
been led by members of several associations for the protection of the Cyrillic and
their supporters. Although I am a Serb and although I use only Cyrillic to write in
Serbian, I think that their attitude and work are wrong, useless, and even
damaging to the Serbian language and Serbia. […] The members of the
“Cyrillic” associations and their supporters uselessly throw slogans around such
as that “Serbian Latin alphabet does not exist”, “Latin is not a Serbian but
Croatian alphabet” and so on, and ask if there is another nation in the world which
uses somebody else’s alphabet in its language. Of course, there isn’t because
every nation considers its own the alphabet it uses regardless of where, when and
how it came to be. It is well known, for example, that the characters of the
Japanese alphabet originate from China, but the Japanese and others say they use
the Japanese alphabet to write. Besides, the French, English and Dutch alphabets
are identical and yet no one accuses anyone of using somebody else’s alphabet.
Those who insist that the Latin alphabet used by Serbs is “somebody else’s
alphabet” do not know or do not want to know that exclusive ownership over any
alphabet does not exist in world linguistics. Alphabets belong to all languages
that use them, either in whole or in part. Therefore, the Latin alphabet learned
and used for as long as 90 years by Serbians and even longer by other Serbs, and
which 80 percent of Serbs today use always or predominantly, cannot be somebody else’s
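but only a Serbian alphabet. […] In order to protect a language and alphabet
many different segments of society must be activated and made responsible.
Only then can it be expected that, over time, the Cyrillic will become not the only,
but the first alphabet of the Serbian language. Otherwise, using labels such as
“somebody else’s” and “Croatian” to satanize the Latin alphabet will not protect
the Serbian Cyrillic.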
6.6.2 Excerpts from texts representative of Factors 4 and 8. Factors 4 and 8
(Officialization of Montenegrin 1 and 2) point to a discourse on the issue of change in
language policy in Montenegro whereby the name of the official language was first
changed from Serbian to mother tongue (prior to independence) and then from mother
tongue to Montenegrin (upon independence). Factor 4 points to texts discussing protests
by students and professors of Serbian in Montenegro against the policy change, some
of whom ultimately lost their jobs because they refused to implement the new policy,
Profesori srpskog jezika i književnosti nikšićke gimnazije, koji već šesti dan
bojkotuju izvođenje nastave zbog preimenovanja nastavnog predmeta u
“maternji”, dobili su podršku kolega iz škole, koji su u pismu upućenom
Ministarstvu prosvete Crne Gore zapretili opštim bojkotom nastave, ukoliko se
njihovim kolegama uruče najavljeni otkazi. (POL-8-9-2004-157)
The professors of the Serbian language and literature at the Nikšić [an urban area
in Montenegro] high school, who have been boycotting instruction for six days now in
protest against the renaming of their subject to ‘mother tongue’, have received support
from their colleagues at the school, who, in a letter sent to the Ministry of
Education of Montenegro, threatened a general boycott of instruction if their
colleagues are dismissed as announced.
The protesters and various other actors in Serbia itself such as journalists, linguists, and
politicians object to the policy on historical, cultural, practical, and (pseudo-) scientific
grounds,
Proteste su uputili i prosvetni radnici iz Kotora, ističući da se na tim prostorima
vekovima govori srpski. (POL-2-2004-166)
The education workers of Kotor [an urban area in Montenegro] also protested,
emphasizing that Serbian has been spoken in their area for centuries.
taj čin doprineo ostvarivanju težnji vlasti i dela ljudi u Crnoj Gori da se odavde
protera sve što bi moglo da asocira na srpstvo (POL-15-4-2004-89)
that act contributed to the realization of the aspirations of the authorities and some
of the people in Montenegro to banish from here everything that might evoke Serbhood
Štrajkači glađu su pozvali sve studente, prosvetne radnike i đake Nikšića da im
se pridruže i zajednički dignu glas u „odbrani onog što nam je sveto”. (POL-15-4-2004-89)
The hunger strikers called on all university students, education workers and pupils
of Nikšić [an urban area in Montenegro] to join them and together raise their voices
in “the defense of what is sacred to us”.
nije im problem žrtvovati struku i nauku, istoriju i tradiciju, uneti zabunu i haos u
nastavni i školski sistem, a time i u društvo u celini. (POL-7-7-2004-109)
they have no problem sacrificing profession and science, history and tradition, or
introducing confusion and chaos into the educational system and therefore the
society as a whole.
Vijeće smatra da u programima za osnovnu i srednju školu ne može stajati naziv
maternji jezik, zbog toga što je neprecizan, lingvistički problematičan i
neutemeljen i izazvao bi brojne naučno-stručne i praktične probleme... (POL-2-2004-166)
The council holds that the label mother tongue cannot be introduced into curricula
for elementary schools and high schools because it is imprecise, linguistically
problematic and baseless and because it would cause numerous scientific and
professional as well as practical problems…
portraying it as mere ‘politicking’,
Naš Odsek broji oko 300 studenata i svi smo jedinstveni da ne dozvolimo
mešanje najprizemnije politike u fundamentalne naučne i lingvističke principe.
(POL-8-9-2004-157)
Our department has around 300 students and we are all of one mind not to allow
the meddling of basest politics in fundamental scientific and linguistic principles.
pokušaj normiranja crnogorskog jezika „politikantski projekat, proizvod
političke manipulacije, koji nema istorijsku, tradicionalno-kulturnu, simboličku,
naučnu, niti jezičku zasnovanost” (POL-5-9-2008-165)
the attempt to standardize the Montenegrin language [is] “a politicking project, a
product of political manipulation, which has no historical, traditional-cultural,
symbolic, scientific, or linguistic basis”
Objections are also raised on the basis of (again, selective and flawed)
comparisons with the outside world,
Ako Amerikancima ne smeta engleski jezik, ne vidim razloga zbog čega bi nekom
u Crnoj Gori smetalo ime jezika srpski, ili srpskohrvatski. (POL-7-7-2004-109)
If Americans are OK with calling their language English, I can’t see any reason
why someone in Montenegro would have a problem with the name Serbian, or
Serbo-Croatian.
In addition to texts about the protests, Factor 8 points also to texts which discuss the
policy in a wider societal context. The argumentation, however, is similar. There is a
discourse of historical (in-) authenticity,
Svi istorijski izvori, književnost i celokupna kulturna baština Crne Gore ćiriličke
su provinijencije i svedoče da je jezik Crnogoraca bio i jeste srpski jezik. (POL-31-3-2004-2)
All historical sources, literature and the entire cultural heritage of Montenegro are
of Cyrillic provenance and testify that the language of Montenegrins has always been
and still is the Serbian language.
U Zakonu kralja Nikole, koji ima 83 člana, navedeno je 13 predmeta koji će se
izučavati u osnovnoj školi. Na prvom mestu je nauka hrišćanska, na drugom
srpska istorija i na trećem predmet srpski jezik. (POL-29-3-2004-25)
In King Nicholas I’s Law,35 which has 83 articles, 13 elementary school subjects
are mentioned. First is Christian doctrine, second is Serbian history, and third is
the subject of Serbian language.
Pokušaj je to i uvođenja nepostojećeg jezika, takozvanog crnogorskog. (BLI-20-9-2004-204)
This is also an attempt to introduce a non-existent language, the so-called
Montenegrin.
which includes explicit comparisons of the Montenegrin authorities to historical invaders
and colonial powers,
Prvi put je Austrougarska, ukinula srpski i ćirilicu, drugi put su Italijani 1941.
godine naložili da se uvede maternji jezik, a sada to čini crnogorska vlast.
(POL-31-3-2004-2)
The first time, Serbian and Cyrillic were outlawed by Austria-Hungary; the
second time, in 1941, the Italians ordered a change to mother tongue; and now this
is being done by the Montenegrin authorities.
Ono što Đukanović hoće do sada niko nije ostvario, ni turski osvajači. (BLI-20-9-2004-204)
What Đukanović [then Prime Minister of Montenegro] wants has never been
accomplished by anyone, not even the Turkish invaders.
The theme of (symbolic) historical violence is also carried through to the present,
to nosi opasnost prekrajanja istorije, s jedne, i ubijanje duha, bića i stvaralaštva
naroda, s druge strane (POL-31-3-2004-2)
this means a danger of changing history, on the one hand, and of killing the spirit,
the being and the creativity of the people, on the other
protest zbog kršenja Ustava i nasilja nad jezikom (BLI-9-10-2004-161)
protest against the violations of the Constitution and violence against language
Očito, čine to nasilnom promjenom identiteta Crne Gore. (BLI-20-9-2004-204)
Obviously, they are doing this through a forced change of the identity of
Montenegro.
Similar to texts loading highly on Factor 4, the argumentation is partly (pseudo-) scientific,
Poslednjih godina žestoke euforije crnogorskog ultranacionalizma, događanja su
dovela i do poremećaja, u izgovoru, a stvoren je i crnogorski “knjiški” jezik,
autora profesora dr Vojislava Nikčevića. On je napravio i nova dva slova za dva
glasa, koji se mogu čuti u crnogorskim lokalizmima. To je, međutim, uzeto kao
osnov političke ujdurme za stvaranje samostalnosti, odnosno potpunog ukidanja
jezičkog rodoslova sa srpstvom. [...] Kompetentni stručnjaci, lingvisti od
naučnog autoriteta, još od vremena Vuka Karadžića, tvorca književnog srpskog
jezika tvrde da je jezik Crnogoraca, odnosno Srba jedinstven jezik – objašnjava
akademik Dašić. – Jezik može biti, a ne mora, nazvan i po državi. Ne sporim
pravo onima koji žele da svoj jezik nazivaju crnogorski, jer svako ima pravo da
jezik kojim govori naziva po svom uverenju i osećanju. Samo tvrdim, a to naučno
argumentovano dokazuju i lingvisti, da ne postoje naučni, lingvistički, istorijski i
socio-kulturni razlozi za preimenovanje srpskog u crnogorski jezik. Srpskim
jezikom u Crnoj Gori govore, osim Crnogoraca i Srba i muslimani i Bošnjaci.
(POL-31-3-2004-2)
In recent years of fierce euphoria of Montenegrin ultra-nationalism, events have also
led to disruptions in pronunciation, and a Montenegrin “literary” language has been
created, authored by Professor Dr. Vojislav Nikčević [well-known Montenegrin linguist].
He created two new letters for two phonemes which can be heard in Montenegrin
localisms. However, that was taken as a basis for political shenanigans around
independence, that is, for a complete erasure of linguistic kinship with Serbhood. […]
Competent experts, linguists with scientific authority, have claimed since the time
of Vuk Karadžić, the creator of the literary Serbian language, that the language of
Montenegrins and Serbs is one and the same, academician Dašić [of the Montenegrin
Academy of Sciences and Arts] explained. – A language can be, but doesn’t have
to be, named after a state. I do not deny those who want to call their language
Montenegrin the right to do so, because everyone has the right to name their language
according to their beliefs and feelings. I’m only saying, and linguists have
provided scientific argumentation for this, that there are no scientific, linguistic,
historical or sociocultural reasons to rename Serbian as Montenegrin. Besides
Serbs and Montenegrins, Serbian is spoken in Montenegro also by Muslims and
Bosniaks.
„crnogorski planeri i programeri nesrećno odabrali termin maternji jezik” (POL-29-3-2004-25)
“Montenegrin planners and programmers made an unfortunate choice by opting
for the term mother tongue”
Ultimately, as can be seen below, discourses and ideologies ostensibly about
language often turn out to be about other conflicts, whether societal, political,
religious, cultural, scientific or otherwise (Gal, 1998, p. 323),
„veštačke montaže identiteta naroda sa ovih prostora” (POL-17-7-2006-92)
“artificial montage of identity of the people in this area”
„otvara proces asimilacije Srba i da će vlast, kroz otvaranje jezika i pitanje
položaja crkve, kroz diskriminaciju prema srpskom narodu, želeti, u narednom
periodu da taj narod prevede u ono u što ona želi – da postanu ljudi koji će se
nacionalno iskazivati kao Crnogorci, koji govore crnogorskim jezikom i
pripadaju nepostojećoj crnogorskoj crkvi” (POL-17-7-2006-92)
“a process of assimilation of Serbs has begun and the authorities will, through the
raising of the questions of language and the status of the church, through
discrimination against the Serbian people, want to transform that people into what
they want it to be – to become people who declare their nationality to be
Montenegrin, who speak the Montenegrin language and belong to the non-
existing Montenegrin church”
6.6.3 Excerpts from texts representative of Factor 11. Similar to Factors 4 and
8, Factor 11 (Officialization of Bosnian) suggests a discourse on a change in language
policy. The difference is that, unlike Montenegrin in Montenegro, Bosnian was
introduced (and thus recognized/officialized) in Serbia as a minority language, largely as
a result of the Council of Europe and European Union requirements pertaining to
minority rights. Factor 11 thus points to texts discussing the then pending recognition of
Bosnian as a minority language and its introduction as a subject in elementary schools in
areas with a Bosniak majority, as well as the resistance to this on the part of various
political and academic actors in Serbia. Similar to the officialization of Montenegrin, the
discussions around the introduction of Bosnian are characterized primarily by a discourse
of contestation,
Prosvetni odbor Skupštine Srbije je zaključio da je ministar prosvete Srbije
Slobodan Vuksanović prekoračio zakonska ovlašćenja i protivno zaključcima
ovog odbora i stavu stručne javnosti odobrio izvođenje nastave iz predmeta
bosanski jezik sa elementima nacionalne kulture. (POL-15-1-2005-107)
The educational board of the Serbian parliament has concluded that the minister
of education Slobodan Vuksanović [then Serbian Minister of Education] had
exceeded his legal authority and, contrary to the conclusions of this board and the
position of the expert community, approved instruction in the subject Bosnian
language with elements of national culture.
Ministar i njegov pomoćnik su istakli da bosanski jezik ne postoji, da propisi ne
predviđaju službenu upotrebu tog jezika, da nastavni planovi i programi i
udžbenici za taj predmet nisu odobreni, da taj predmet, po slovu zakona i
Pravilnika o nastavnim planovima i programima, ne može u ovoj školskoj godini
da bude ni redovni (obavezni), ni izborni, niti fakultativni predmet. (POL-11-11-2004-127)
The minister and his assistant said that the Bosnian language did not exist, that
regulations did not foresee official use of that language, that curricula and
textbooks for that subject had not been approved, that that subject, according to
law and the Rulebook on curricula, could be neither a compulsory nor an elective
nor an optional subject in this school year.
which is mostly about the name of the language rather than the right to a minority status
itself,
Njihov zahtev bi mogao da se okarakteriše i kao manji od onoga što već uživaju
Albanci, Hrvati, Mađari... Samo kada bi bosanski jezik postojao. Greška ili
namera – Ne znam zašto su tražili da se uči bosanski, a ne bošnjački. Možda je
greška. (POL-12-11-2004-113)
Their request could also be characterized as less than what is already enjoyed by
Albanians, Croats, Hungarians… If only the Bosnian language existed. Error or
intent – I don’t know why they requested that Bosnian and not Bosniak be taught.
Perhaps it’s an error.
Zvanično ime jezika može da bude samo bošnjački jezik, odnosno da proizilazi iz
priznatog etnonima Bošnjaci, a ne “bosanski”. BiH je zemlja u kojoj žive tri
ravnopravna naroda i Bošnjaci ne treba da uzurpiraju pravo na bosansko ime.
Samo nekoliko primera, koji pokazuju da se u svetu koriste prvobitni nazivi za
jezike koji su u upotrebi dva ili više naroda. Tako, austrijski narod, koji ima
državu hiljadu godina, govori nemačkim jezikom, a ne austrijanskim ili
austrijskim. Švajcarci nemačkog porekla govore nemačkim, a ne švajcarskim
jezikom. Američki narod svoj jezik naziva engleskim, a ne angloameričkim ili
američkim. (POL-10-1-2005-134)
The official name of the language can only be Bosniak, deriving from the
recognized ethnonym Bosniaks, and not ‘Bosnian’. Bosnia-Herzegovina is a
country with three equal nations and Bosniaks should not usurp the right to the
Bosnian name. Here are just a few examples which show that, around the world,
languages used by two or more nations keep their original names. Thus, the
Austrian nation, which has had a state for a thousand years, speaks German and
not Austrian. The Swiss of Germanic origin speak German, not Swiss. American
people call their language English, not Anglo-American or American.
Potom je na sednici pročitano stručno mišljenje koje je Prosvetnom odboru
dostavio Odbor za standardizaciju srpskog jezika SANU, iz koje se između
ostalog može zaključiti da se na srpskom bosanski jezik kaže bošnjački, a na
bosanskom – bosanski. (POL-15-1-2005-107)
An expert opinion submitted to the Educational board by the SANU Board for the
standardization of the Serbian language was then read at the session; among other
things, it can be concluded from it that in Serbian the Bosnian language is called
Bosniak, and in Bosnian, Bosnian.
There is also evidence of a similar (pseudo-) scientific discourse,
Ako jezikoslovci kažu da bosanski jezik postoji (kao da je nauka o jeziku od juče
pa se ne zna koji jezici postoje na Balkanu i u svetu), onda pravnici treba da ga
pretoče u paragrafe. (POL-10-1-2005-134)
If linguists say that the Bosnian language exists (as if linguistics were a recent
development and we did not know which languages exist in the Balkans and
around the world), then lawyers need to turn it into legal provisions.
Podsetili su da je ministar svojevremeno rekao da bosanski jezik ne postoji, da
mora da sačeka stručnjake za jezik da kažu o tome kojim jezikom govore
Bošnjaci. (POL-15-1-2005-107)
They recalled that the minister had once said that the Bosnian language did
not exist and that he had to wait for language experts to say which language was
spoken by Bosniaks.
Jezik bošnjačke nacionalne zajednice u Srbiji, čije je uvođenje kao izbornog
predmeta u prva dva razreda osnovne škole najavilo republičko Ministarstvo
prosvete, može se, prema srpskom jezičkom standardu, zvati isključivo bošnjački
jezik, poručio je Odbor za standardizaciju srpskog jezika. Srpska nauka o jeziku
je nedvosmislena u tome da se jezički standard Bošnjaka u srpskom jeziku može
označiti samo sintagmom bošnjački jezik, istakao je predsednik Odbora za
standardizaciju srpskog jezika akademik Ivan Klajn, u odgovoru skupštinskom
Odboru za prosvetu. On je naglasio da srpska nauka, međutim, ne može, ni kada
bi htela, utvrđivati naziv jezika u bošnjačkom jezičkom standardu. “Ona to ne
može činiti uprkos tome što su se čelnici bošnjačkog naroda i bošnjačkog
jezičkog standarda, uvodeći naziv jezika u raskoraku s nazivom naroda, opredelili
za zbunjivanje dobronamernih ljudi u zemlji i inostranstvu, koji se, i preko naziva
jezika, mogu, koliko-toliko obavestiti o jasnoći kategorija i meritumu stvari”,
rekao je profesor Klajn. Prema njegovim rečima, nelogično bi bilo da se za
građane Bosne i Hercegovine, Bosance i Hercegovce, svrstane u tri nacionalne
zajednice – Srbe, Hrvate i Bošnjake, koji govore trima jezicima – srpskim,
hrvatskim i bosanskim uvodi zvanični “bosanski jezik”. To bi značilo
uspostavljanje “bosanskog” kao državnog jezika, dok bi srpski i hrvatski, po toj
logici, imali status manjinskih jezika. (POL-16-2-2005-82)
The language of the Bosniak national community in Serbia, whose introduction as an
elective in the first two grades of elementary school has been announced by the
republic Ministry of Education, can, according to the Serbian language standard,
only be called Bosniak, the Board for the standardization of the Serbian language
said. Serbian linguistics is unequivocal that only the phrase Bosniak language can
be used in Serbian for the standard language of Bosniaks, the Chair of the Board
for the standardization of the Serbian language, academician Ivan Klajn
[well-known Serbian linguist and member of SANU], said in response to the
parliamentary Education board. He also said that Serbian linguistics could not,
even if it wanted to, determine the name of the language in the Bosniak language
standard. “Serbian linguistics cannot do this despite the fact that the leaders of
the Bosniak people and of the Bosniak language standard, by introducing a language
name at odds with the name of the people, chose to confuse well-meaning people at
home and abroad, who could otherwise learn something about the clarity of categories
and the merits of the matter from the name of the language as well”, said professor
Klajn. According to him, it would be illogical to introduce an official ‘Bosnian
language’ for the citizens of Bosnia-Herzegovina, Bosnians and Herzegovinians,
grouped into three [ethnic] national communities – Serbs, Croats and Bosniaks – who
speak three languages – Serbian, Croatian and Bosnian. That would mean establishing
Bosnian as the state language, while Serbian and Croatian, according to this logic,
would have the status of minority languages.
However, the Bosniak minority is also sometimes given a voice, albeit very rarely, as in
the following example,
Ovo je za našu kulturu i za Bošnjake Sandžaka izuzetan istorijski događaj – rekao
je tim povodom autor Alija Džogović, podsećajući da je “bosanski jezik
zabranjen pre skoro sto godina”. (POL-26-10-2004-29)
For our culture and for the Bosniaks of Sanjak [area in Southwest Serbia with a
Bosniak majority] this is an exceptional historical event – said author Alija
Džogović [Bosniak textbook author in Sanjak/Serbia] on the occasion, reminding
that “the Bosnian language was outlawed almost a hundred years ago.”
while the discourse of contestation is sometimes, if very rarely, subverted by politicians
such as the then Minister of Education, Slobodan Vuksanović,
Bosanski, a ne bošnjački zbog toga što su se, objasnio je, građani izjasnili da je
njihov jezik bosanski, što je to tradicija i zato što se u Sarajevu, gde je bio sa
predsednikom Srbije Borisom Tadićem, uverio da bosanski jezik postoji na
lingvističkoj karti. (POL-10-12-2004-127)
Bosnian and not Bosniak because, as he explained, the citizens opted for Bosnian
as their language, because that’s the traditional name and because he had an
opportunity during a trip to Sarajevo with Serbian President Boris Tadić to see for
himself that the Bosnian language existed on the linguistic map.
6.6.4 Excerpts from texts representative of Factors 6 and 10. Finally in this
section, Factors 6 and 10 (Contestation over language ownership and name, Linguistics
as a science, lexicography, standardization and contestation) suggest a more general
discourse of Central South Slavic ethnolinguistic identity-related contestation, with
particular emphasis on the role of the Serbian Academy of Sciences and Arts (SANU).
Factor 6 points to texts discussing contestation over language ownership and its name as
well as linguacultural and ethnic authenticity, primarily involving Serbs and Serbian on
the one hand, and Croats and Croatian on the other, as in the following excerpts,
Društvo srpske slovesnosti, Srpsko učeno društvo i Srpska kraljevska
akademija predstavljaju tri čina u izrastanju najautoritativnije srpske naučne
institucije. Za sve te tri institucije važili su stavovi da su Srbi južnoslovenski
narod koji govori svojim jezikom, srpskim, dakle koji je blizak drugim
slovenskim jezicima ali i različit od njih i da Srba ima tri vere: pravoslavne,
rimokatoličke i muhamedanske. (POL-10-9-2005-127)
The Serbian Linguistic Culture Society, Serbian Learned Society and the Serbian
Royal Academy represent three acts in the creation of the most authoritative
Serbian scientific institution [SANU]. All three institutions held that Serbs are a
South Slavic people that speaks its own language, Serbian, which is close to other
Slavic languages but also different from them, and that Serbs are of three
faiths: [Eastern] Orthodox, Roman Catholic and Mohammedan.
Sve srpsko i hrvatsko bilo je pomešano. Pokazaće se kasnije da je to bila velika
greška. Jer će se sve to zajedničko, na novim i neprirodnim osnovama, ubrzo
početi da se deli. Ta deoba, projektovana sa hrvatske strane, prirodno išla je na
srpsku štetu. Ona je podrazumevala da Srbima u kulturi ostaje samo ono što su
stvorili pravoslavci srpskohrvatskog jezika. (POL-24-9-2005-45)
Everything Serbian and Croatian was mixed together. This would later turn out to be
a big mistake, because everything that was common would soon begin to be divided on
a new and unnatural basis. That division, projected from the Croatian side,
naturally came at Serbian expense. It meant that Serbian culture could keep only
what had been created by Orthodox speakers of the Serbo-Croatian language.
Najmanji je po štetnosti problem što se neslućeno kasni u izdavanju tomova
Rečnika. Mnogo teže od toga pogađa što se posrnuće koje je zahvatilo srpsku
jezičku nauku odmah po smrti Vuka Karadžića (1864) nastavlja i danas. Srpski
lingvisti su, primenjujući srpski jezik (koji se tako zvao i u vreme poslednjeg
njegovog reformatora) u srpskohrvatski, ušli u period u kome je napušten
vukovski put u nazivanju i izgrađivanju jezika srpskog naroda. A da taj period još
traje potvrđuje činjenica da srpski lingvisti nastavljaju da u Rečniku zovu svoj
jezik “srpskohrvatski” i posle nestanka “srpskohrvatskog jezika”. (POL-08-8-2003-80)
The delays in the publishing of the different volumes of the Dictionary are the
least problem in terms of damage. It is much more of a problem that the downfall
of the Serbian linguistic science that began immediately after the death of Vuk
Karadžić (1864) continues today. By renaming the language from Serbian (which is
what it was called in the time of its last reformer) to Serbo-Croatian,
Serbian linguists entered a period during which Vuk’s path in the naming and
development of the language of the Serbian people was abandoned. That this
period still continues is confirmed by the fact that Serbian linguists continue to
call their language “Serbo-Croatian” in the Dictionary even after the demise of the
“Serbo-Croatian language”.
The name of the language is the most prominent point of contention, while the
discourses attested above (a discourse of ethnolinguistic identity-related contestation
and a [pseudo-] scientific discourse on language and linguistics) merge with other
similar discursive elements into a discursive formation that contests the linguacultural
authenticity and ethnolinguistic identity (and thus, implicitly, the political legitimacy)
of others, and that rests on questionable historical narratives, pseudo-scientific
linguistic arguments, and selective and transparently flawed comparisons to language
situations elsewhere in the world.
Kada je na Saboru Hrvatske 1861. godine pokrenuto pitanje naziva službenog
jezika, predlagano je da bude: „hrvatsko-slavonsko-srbski”, „hrvatsko-
slavonski”, „hrvatsko-srbski”, „hrvatski ili srbski”, „hrvatski”, „srbski” i
„narodni u trojednoj kraljevini jezik”. […] Sabor je izglasao Zakonski članak po
kojem je službeni jezik nazvan „jugoslavenskim”. Srbi nisu bili zadovoljni
takvim rešenjem. […] U jugoslovenstvu koje im je ponuđeno u nazivu jezika
nepogrešivo su sagledali vid velikohrvatstva, kojim je trebalo izbrisati srpsko
ime, srpsko nacionalno osećanje, pa i samo srpsko nacionalno biće. […]
Privlačnim i prividno zadovoljavajućim nazivom i za Srbe i za Hrvate, iz već
pomenutih razloga, izostavljeno je srpsko ime. Jugoslovenskim imenom
prikrivena je velikohrvatska težnja. Tim imenom Srbe je žedne trebalo prevesti
preko vode, trebalo ih je postepeno, ali dosledno brisati iz svakodnevnog života
Hrvatske, lišiti ih političke individualnosti i učiniti ih sastavnim delom
hrvatskog „političkog” naroda. (POL-3-7-2006-192).
When, at the 1861 Croatian Assembly, the issue of the name of the official
language was brought up, it was suggested that it be: “Croato-Slavonic-Serbian”,
“Croato-Slavonic”, “Croato-Serbian”, “Croatian or Serbian”, “Croatian”,
“Serbian” or “the people’s language in the Triune Kingdom”. […] The
Assembly adopted a law according to which the official language was called
“Yugoslav”. Serbs were not satisfied with such a solution. […] In the
Yugoslavhood offered to them in the name of the language, they unmistakably
detected a form of Greater-Croatianhood, the aim of which was to erase the Serbian
name, Serbian national feeling, and even the Serbian national being itself. […]
For the reasons mentioned, the Serbian name was thus left out by means of a name
that was attractive and seemingly satisfying for both Serbs and Croats. The
Greater-Croatian tendency was concealed by the Yugoslav name. That name was meant
to dupe the Serbs: they were to be gradually but consistently erased from the
everyday life of Croatia, deprived of political individuality and made an integral
part of a Croatian “political” people.
Po toj tezi, Vuk Karadžić je za osnov standardnog srpskog književnog jezika uzeo
jezik kojim su govorili pravoslavni istočni Hercegovci. A oni su, po toj hrvatskoj
nacionalističkoj teoriji, u stvari Hrvati prevedeni u pravoslavlje, tako da su Srbi
“ukrali” Hrvatima jezik koji danas zovu srpski, pa zato sada ima toliko problema
sa tim jezicima. (POL-10-7-2006-142)
According to this thesis, Vuk Karadžić took the language spoken by [Eastern]
Orthodox East-Herzegovinans as the basis of the standard Serbian literary
language. And they, according to this Croatian nationalist theory, were in fact Croats
converted to [Eastern] Orthodox Christianity, so Serbs “stole” the language they
now call Serbian from Croats, which is why there are so many problems with
these languages now.
Bilo je u tom njihovom nazivlju, imenovanju i prikazivanju jezika mnogo
dvosmislica, smicalica i podvala. Govorili su da je to jedan, jedinstven, isti i
zajednički jezik. Ali to još ništa ne govori čiji je to jezik. Da, jeste jedan, ali
srpski, jeste jedinstven, ali ne i hrvatski, jeste zajednički, ali samo po upotrebi,
no nikako ne i po pripadnosti i poreklu. Ali sve to (što je isti, zajednički i jedan)
ne može biti razlog za dvočlano ili višečlano imenovanje jezika, ili za potpuno
preimenovanje srpskog jezika u hrvatski. Jezik može dobiti ime samo po narodu
čiji je to jezik, ali ne i po imenima naroda koji se tim jezikom služe. Engleski
jezik takođe je jedan, zajednički i isti jezik za sve narode koji njime govore, ali se
zna čiji je taj jezik i kako se on zove, bez obzira ko i gde govori njime. On je uvek
samo engleski i kad se njime govori u Sjedinjenim Državama, Kanadi, Australiji,
Novom Zelandu ili na bilo kojem kraju sveta. Takvi su još nemački, španski i
portugalski jezik. Jezik nije onoga ko tim jezikom govori, nego onoga ko je taj
jezik stvarao i stvorio. Srpski narod je vekovima stvarao svoj jezik. Hrvati nisu
stvarali taj jezik. Oni su ga dobili i preuzeli gotovog, sa svim odlikama koje je
srpski jezik već imao. (POL-22-7-2006-55)
There were many ambiguities, tricks and deceptions in their labeling, naming and
representing of the language. They said it was one, unified, the same and common
language. But that doesn’t say whose language it is. Yes, it is one, but Serbian;
it is unified, but not also Croatian; it is common, but only in terms of use, not in
terms of affiliation and origin. But all this (it being the same, common and one)
cannot be reason enough to give the language a two-part or multi-part name, or to
entirely rename the Serbian language as Croatian. A language can only be named after
the people it belongs to, not after the names of the peoples who use it. English is
also one, common and the same for all peoples who speak it, but it is well known
whose language it is and what its name is, regardless of who speaks it and where. It
is always only English, whether it is spoken in the United States, Canada,
Australia, New Zealand or anywhere else in the world. The same is true of German,
Spanish and Portuguese. A language does not belong to those who speak it, but to
those who created it. The Serbian people created their language over centuries. The
Croats did not create that language. They received it and took it over ready-made,
with all the characteristics that the Serbian language already had.
Nauka i politika Tako se u nauku o jeziku umešala politika: Nauka je neporecivo
utvrdila da je štokavski Vukov jezik srpski, politika je tražila da bude i hrvatski.
Ali kada su se hrvatski lingvisti osilili da drsko odbace srpski jezik iz naziva i da
taj srpski jezik nazovu hrvatskim, i samo hrvatskim, srpski lingvisti, pod
pritiskom već preživelih političkih ideja i shvatanja, i dalje uporno i tvrdoglavo
nazivaju svoj jezik i srpskim i hrvatskim (srpskohrvatskim). (POL-22-7-2006-55)
Science and politics. Thus politics meddled in the science of language: science
undeniably determined that Vuk’s Štokavian language is Serbian, while politics
demanded that it also be Croatian. But now that Croatian linguists have grown so
arrogant as to brazenly throw the Serbian language out of the name and call that
Serbian language Croatian, and Croatian alone, Serbian linguists, under the pressure
of outdated political ideas and views, still stubbornly call their language both
Serbian and Croatian (Serbo-Croatian).
Srpski lingvisti, dakle, nikako da shvate da više nema Jugoslavije, ni kraljeve ni
Brozove, u kojoj je manipulacija i u nauci bila sve, da više za sva vremena nema
“srpskohrvatskog jezika”, dvočlanog imena jezika nema više nigde u Evropi i
svetu (ni engleski se ne zove, niti se ikad zvao “američkoengleski”). Srpski
lingvisti ne razumeju ni čemu je poslužio naziv “srpskohrvatski/hrvatskosrpski”,
ili “hrvatski ili srpski” jezik (jedino tome da se “hrvatski
književni jezik” izdvoji iz srpskog jezika, da se izdvoji “bošnjački” ili “bosanski
jezik”, da se sada planira izdvajanje i “crnogorskog jezika”). (POL-08-8-2003-80)
Serbian linguists seem to have a hard time understanding that there is no
Yugoslavia any more, neither the King’s nor Broz’s, in which manipulation was
everything, even in science, that the “Serbo-Croatian language” is gone for good, that
there are no dual-label language names anymore anywhere in Europe or the world
(even English is not called, nor has it ever been called, “American-English”).
Serbian linguists also do not understand what purpose the label
“Serbo-Croatian/Croato-Serbian”, or “Croatian or Serbian” language served (only to
separate the “Croatian literary language” from the Serbian language, to separate
the “Bosniak” or “Bosnian language”, and now to plan the separation of the
“Montenegrin language”).
Našim lingvistima ostaje da po volji biraju BHMS ili srpski. Vreme je da se naši
lingvisti potpuno okrenu nauci i da već jednom perstanu da strahuju od politike.
Bio bi to čin ne samo prihvatanja naučnih vrednosti nego i moralni čin pokajanja
i izvinjenja srpskoj nauci i srpskom narodu. Neka mirno i slobodno nazovu
veliki rečnik SANU srpskim rečnikom. (POL-22-7-2006-55).
Our linguists are left with a choice between BHMS
[Bosnian/Croatian/Montenegrin/Serbian] or Serbian. It is time that our linguists
turned entirely to science and finally stopped fearing politics. It would be not only an
act of acceptance of scientific values but also a moral act of repentance and
apology to Serbian science and the Serbian people. Let them peacefully and freely
call the unabridged SANU dictionary a Serbian dictionary.
Factor 10, on the other hand, suggests a more technical discourse pertaining to
lexicography and language standardization. Texts scoring highly on this factor typically
discuss language standardization issues, linguistic studies or book editions, most of which
were produced in the framework of SANU.
Nedavno objavljena “Sintaksa srpskog jezika", u izdanju “Beogradske knjige",
predstavlja suštinsku novinu u naučnoj lingvističko-gramatičkoj obradi srpskog
jezika U izdanju “Beogradske knjige", Instituta za srpski jezik SANU i Matice
srpske upravo je iz štampe izašla monumentalna edicija (oko 1600 stranica), pod
naslovom “Sintaksa savremenog srpskog jezika. (POL-1-10-2005-165)
The recently published [book] “Syntax of the Serbian Language” (publisher:
“Beogradska knjiga” [Belgrade book]) represents a fundamental novelty in the
scientific linguistic-grammatical study of the Serbian language. The monumental
edition (around 1,600 pages) of the [book] titled “Syntax of the Contemporary
Serbian Language” was published jointly by ‘Beogradska knjiga’, the SANU
Institute for the Serbian Language, and Matica srpska [a Serbian cultural and
scientific society].
[Milan Šipka] Već sam jednom prilikom rekao da se nakon disolucije zajedničkog
srpskohrvatskog standardnog jezika srpski lingvisti, i Srbi kao narod, nisu
jasno odredili prema novonastaloj situaciji. Zbog toga znatno zaostajemo u
lingvističkim aktivnostima, što je posledica i odsustva šire društvene podrške
negovanju srpskog standardnog jezika i jezičke kulture. (POL-5-1-2008-166)
[Milan Šipka, well-known Bosnian Serb linguist] I’ve already said on one
occasion that Serbian linguists, as well as Serbs as a people, never adopted a clear
position toward the situation that came about after the dissolution of the common
Serbo-Croatian standard language. This is why we lag behind in terms of
linguistic activity, which is also a consequence of a lack of broader societal
support for the development of the Serbian standard language and the linguistic
culture.
The unabridged SANU dictionary of Serbian (in preparation since the 1960s) is the most
frequent reference in these texts:
nismo mnogo pažnje posvećivali takvim rečnicima, jer je glavni projekat
decenijama bio veliki Rečnik SANU za koji se smatralo da će, tako veliki,
zadovoljiti sve leksikografske potrebe (NIN-3-7-2008-182)
we never paid much attention to such dictionaries because our main project for
decades was the unabridged SANU dictionary of Serbian which, large as it was,
was supposed to meet all lexicographic needs
Lastly, (pseudo-)scientific arguments and a discourse of contestation are present in
these texts but marginal compared to the other relevant factors:
A to što se oni danas (sinonimno) nazivaju nacionalnim imenima: srpski,
hrvatski, bošnjački/bosanski, crnogorski, bunjevački ili maternji i dr. - zapravo je
time jedan naučni, lingvistički princip pretvoren u sociolingvstički ili
nacionalnopolitički koncept. (POL-17-6-2006-95)
The fact that today they bear (synonymous) national names: Serbian, Croatian,
Bosniak/Bosnian, Montenegrin, Bunjevački, or mother tongue and so on – is
really a transformation of a scientific, linguistic principle into a sociolinguistic or
national-political concept.
6.6.5 Topoi. In addition to the basic content and thematic analysis above,
representative texts were also examined for evidence of argumentation strategies in the
DHA tradition (Wodak, 2001), specifically topoi: explicit or inferable obligatory premises
that connect arguments with the conclusion, or simply “the common-sense reasoning
typical for specific issues” (van Dijk, 2000, cited in Baker et al., 2008, p. 299).
Two particularly prominent and recurrent topoi were identified:
Topos 1: (‘hard’, i.e., structural) linguistics provides irrefutable scientific
evidence that the language spoken by Central South Slavs is Serbian in origin
(and should therefore be called Serbian only), and
Topos 2: the polycentricity of Central South Slavic is comparable to the
polycentricity of languages such as English, Spanish, and Portuguese (as well as
German) all of which bear a single, original label.
The pseudo-scientific arguments which form the basis of these two topoi were
already noted above. Nauka ‘science’ and naučni ‘scientific’, for example, were
identified by both keyword (Tables 12, 14 and E1) and collocation (Table F1) analysis as
items of potential discursive and ideological interest; they also both loaded highly on
Factor 10 (Table 23), which cluster analysis showed to be discursively linked with
Factor 6 (Table 41), perhaps the single most representative factor of the discourse of
contestation. The excerpts from representative texts above included the following
examples of these pseudo-scientific arguments:
Science undeniably determined that Vuk’s Štokavian language is Serbian, politics
demanded that it also be Croatian. (POL-22-7-2006-55)
Competent experts, linguists with scientific authority, have claimed since the time
of Vuk Karadžić, the creator of the literary Serbian language, that the language of
Montenegrins and Serbs is the same (POL-31-3-2004-2)
Serbian linguistics clearly says that only the syntagma Bosniak language can be
used in Serbian for the standard language used by Bosniaks (POL-16-2-2005-82)
linguists have provided scientific argumentation for this, that there are no
scientific, linguistic, historical or sociocultural reasons to rename Serbian into
Montenegrin (POL-31-3-2004-2)
as if linguistics was a recent development so we didn’t know which languages
existed in the Balkans and around the world (POL-10-1-2005-134)
a transformation of a scientific, linguistic principle into a sociolinguistic or
national-political concept (POL-17-6-2006-95)
Our linguists are left with a choice between BHMS
[Bosnian/Croatian/Montenegrin/Serbian] or Serbian. It is time that our linguists
turned entirely to science and finally stopped fearing politics. It would be not only an
act of acceptance of scientific values but also a moral act of repentance and
apology to Serbian science and the Serbian people. (POL-22-7-2006-55)
The pseudo-scientific argumentation captured in Topos 2, on the other hand, is
less obvious in the results of the quantitative analyses (partly, perhaps, because of the
omnipresence of references to English, many of which do not pertain to this topos), but
quite prominent in the representative texts. The examples we saw above include:
If Americans are OK with calling their language English, I can’t see any reason
why someone in Montenegro would have a problem with the name Serbian, or
Serbo-Croatian. (POL-7-7-2004-109)
Just a couple of examples which show that for languages used by two or more
nations the original name is used around the world. Thus, the Austrian nation,
which has had a state for a thousand years, speaks German and not Austrian. The
Swiss of Germanic origin speak German, not Swiss. American people call their
language English, not Anglo-American or American. (POL-10-1-2005-134)
A language can only be named after the people it belongs to, but not after the
names of the peoples who also use it. English is also one, common and the same
for all people who speak it, but it is well known whose language it is and what its
name is, regardless of who speaks it and where. It is always only English even
when it is spoken in the United States, Canada, Australia, New Zealand or
anywhere else in the world. Such are also German, Spanish, and Portuguese
languages. (POL-22-7-2006-55)
Similar pseudo-scientific arguments could also be seen in the discourse of endangerment,
in the calls for the defense of the Cyrillic alphabet:
Today, no nation in the world which cares about its identity and its cultural and
national roots neglects its alphabet as much as the Serbian nation does. (POL-16-
3-2003-93)
do the authors of this proposal know of any other nation in the world which uses
somebody else’s alphabet in its own language (POL-11-2-2005-108)
general rule: one language – one alphabet, because no nation in the world uses
two alphabets to write its language. (POL-21-9-2004-72)
Finally, it should be noted that both of these topoi are in evidence in language-related
discourses coming from the leading academic linguists, as well as the more marginal
figures, civic associations, and private citizens (in letters-to-the-editor).
7. Discussion
7.1 Research Question 1
The first research question was: Can corpus linguistics-based quantitative
methods (keyword, collocation, exploratory factor, and cluster analyses) be used to
identify lexical patterns suggestive of dominant language-related discourses and
language ideologies in Central South Slavic and what similarities/differences are there
between them? This question was addressed through a continuing comparison between
the different methods and techniques throughout the presentation of results (Section 6).
Despite all the challenges that extensive inflectional morphology presents for
corpus-based analysis in general, and for the analysis of discourses and ideologies in
particular, it is clear that all four methods can be successfully applied to identify lexical
patterns suggestive of dominant language-related discourses and language ideologies.
Keyword lemmas, key-keywords, and their associates provided a macroscopic view of the
characteristic lexis and lexical patterns in the research corpus and hinted at their
covariance and thus at the discursive profile of the corpus as a whole, as well as of
individual dominant discourses.
Significant collocates and n-grams provided complementary evidence that confirmed and
supplemented the patterns identified by keyword analysis and added a phrasal dimension
to the discursive profile; collocation analysis also supplied data for exploratory factor
analysis and cluster analysis. Most importantly, exploratory factor analysis took the
somewhat amorphous collocate data and turned them into a detailed discursive profile of
the corpus based on covariance, providing, in the form of factor scores, an objective and
replicable way of identifying representative texts that none of the other methods offered.
(Analysis of variance, further, showed that synchronic and diachronic variation, though
sometimes difficult to interpret, do exist and can be used in conjunction with the
results of other methods.) Finally, cluster analysis built on both the collocate data and the
factorial structure to account for the patterning in the data with respect to both the
discursive links between factors and the three independent variables, yielding a more
fine-grained discursive profile of the corpus.
Further, lexical patterns identified by keyword, collocation, factor, and cluster
analyses proved to be congruent and complementary. As might be expected, the
differences between the lexical patterns identified by each method were largely a product
of their different approaches to the data. For example, where keyword analysis focuses
on lexical items that are significantly more frequent in the research corpus, collocation
analysis focuses on the lexis co-occurring with the core concept(s). Both analyses thus
produce patterns that are characteristic of the corpus, but from different perspectives.
This has been shown to involve a great deal of overlap as well as some differences.
Keyword and collocation analyses thus sometimes pointed to two different sides of the
same coin, as it were. A good example here is the relative prominence of the item
latinica ‘Latin (alphabet)’ in the results of collocation analysis, and its almost complete
absence from the results of keyword analysis. Conversely, ćirilica ‘Cyrillic’ is considerably
more prominent in the results of keyword analysis. These two lexical items are both very
prominent in the discourse of endangerment around the Serbian Cyrillic alphabet, so
reliance on either keyword or collocation analysis alone would have presented an
incomplete picture, even though it would still have been possible to identify this pattern
through follow-up qualitative analysis.
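The difference between the two perspectives can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the study: it assumes Dunning's log-likelihood (G2) as the keyness statistic and a simple ±span window around a node word for collocation, since this section does not specify the statistics, reference corpus, or window size. The function names (`log_likelihood`, `keywords`, `collocates`) and the toy tokens are hypothetical. Keyness compares an item's relative frequency in the research corpus with its frequency in a reference corpus, whereas collocation only counts what surrounds a node word; an item can therefore co-occur frequently with a node without being over-represented relative to the reference corpus, which is one possible way an item such as latinica can be prominent in one analysis and nearly absent from the other.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Dunning's G2 (assumed statistic) for an item with frequency a in a study
    corpus of c tokens and frequency b in a reference corpus of d tokens."""
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_study)
    if b > 0:
        g2 += b * math.log(b / expected_ref)
    return 2 * g2


def keywords(study, reference, top=10):
    """Rank items over-represented in the study corpus by G2 (positive keywords only)."""
    f_study, f_ref = Counter(study), Counter(reference)
    c, d = len(study), len(reference)
    scores = {}
    for item, a in f_study.items():
        b = f_ref.get(item, 0)
        # Skip items that are relatively as frequent or more frequent in the reference corpus.
        if a / c > b / d:
            scores[item] = log_likelihood(a, b, c, d)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


def collocates(tokens, node, span=4):
    """Count every token within +/- span positions of each occurrence of node."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        for j in range(max(0, i - span), min(len(tokens), i + span + 1)):
            if j != i:
                counts[tokens[j]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy data, not drawn from SERBCORP.
    study = ["jezik", "srpski", "jezik", "ćirilica", "naziv"]
    reference = ["jezik", "grad", "vlada", "grad", "sport"]
    print(keywords(study, reference))
    print(collocates(["naš", "srpski", "jezik", "i", "ćirilica"], "jezik", span=1))
```

In the toy run, jezik is frequent in both toy corpora and so ranks below srpski, ćirilica, and naziv as a keyword, even though it is the node around which collocates are counted.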
Systematic similarities and differences can also be seen in the way these analyses
combine pertinent lexical items into groups for higher-order analysis. Keyword
analysis is in this respect somewhat similar to factor analysis, since it can take a text-based
(i.e., macroscopic) view of covariance between individual lexical items to produce sets
indicative of themes, discourses, and ideologies. A comparison between keyword
associates and factors illustrated how similar the results of these two analyses can be.
However, unlike keyword associates, factor analysis offers a parsimonious way to
identify representative texts in an objective, reliable, and replicable manner. This is
another clear indication of the superiority of factor analysis over the other methods
employed here, which matters all the more given the widespread politicization of, and
biases in, linguistic research in this area.
Collocation analysis, on the other hand, does not offer a way of combining items
into groups, except for those that repeatedly occur together in more or less fixed ways
(i.e., phrases or n-grams). Further, similar to keyword analysis, collocation analysis does
not provide an objective way to identify representative texts. However, unlike both
keyword and factor analyses, collocation analysis offers concordance lines, which can be
used to quickly assess lexical patterns in actual use in different texts but which proved
to be of limited use here. Finally, and perhaps unlike all three, cluster analysis accounts
for all of the data and provides a way of testing relationships between the factorial
structure and the independent categorical variables (which was also done using analysis
of variance).
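The use of factor scores to select representative texts can likewise be illustrated with a short sketch. It assumes, rather than reports, the multidimensional-analysis convention of scoring each text by summing the standardized (z-score) frequencies of items with salient loadings on a factor and subtracting those of negatively loading items; this section does not state how the study computed its factor scores. The salience cut-off of 0.35, the function names, and all frequencies and loadings are hypothetical, and the item labels nauka and naučni are borrowed from the discussion of Factor 10 only for readability. Because the ranking is a deterministic function of the data, rerunning it yields the same top-scoring texts, which is the kind of replicability claimed for factor analysis above.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize one item's per-text frequencies (population SD)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]


def factor_scores(freqs, loadings, cutoff=0.35):
    """Score each text on one factor (assumed MD-style procedure).

    freqs: item -> list of normalized frequencies, one per text (same order).
    loadings: item -> loading of that item on the factor.
    Items whose absolute loading is below the hypothetical cutoff are ignored.
    """
    n_texts = len(next(iter(freqs.values())))
    scores = [0.0] * n_texts
    for item, loading in loadings.items():
        if abs(loading) < cutoff:
            continue
        sign = 1.0 if loading > 0 else -1.0  # negative loadings are subtracted
        for i, z in enumerate(z_scores(freqs[item])):
            scores[i] += sign * z
    return scores


def representative_texts(text_ids, scores, k=2):
    """Return the k highest-scoring text IDs, i.e., the most representative texts."""
    ranked = sorted(zip(text_ids, scores), key=lambda pair: pair[1], reverse=True)
    return [text_id for text_id, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical normalized frequencies in three hypothetical texts T1-T3.
    freqs = {"nauka": [5.0, 1.0, 0.0], "naučni": [3.0, 1.0, 0.0], "sport": [0.0, 2.0, 4.0]}
    loadings = {"nauka": 0.70, "naučni": 0.60, "sport": 0.10}
    scores = factor_scores(freqs, loadings)
    print(representative_texts(["T1", "T2", "T3"], scores, k=1))
```

Here sport falls below the cut-off and is ignored, so T1, with the highest standardized frequencies of the two salient items, emerges as the most representative text.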
The second research question was: What language-related discourses and
language ideologies relevant to Central South Slavic ethnolinguistic identities can be
identified in the 5+ hits section of SERBCORP? This question was addressed through
an examination of top-scoring (i.e., representative) texts for evidence of explicit or implicit
references to Central South Slavic ethnolinguistic identities and topoi. The findings were
presented by factor (i.e., language-related discourse); factors identified by cluster analysis
as similar were treated together.
The quantitative evidence from keyword and collocation analysis showed that, at
the most general level, that of distinct and remote cultural identities (i.e., in SERBCORP
as a whole; see Appendix C), language is routinely conceptualized in terms of binary
oppositions of implicitly monolithic codes (cf. standard language ideology, Milroy, 2001). This was
indicated by frequent use of glottonyms which imply monolithic language varieties with
clearly demarcated boundaries and associated national identities (e.g., Serbian, English),
as well as sets of possessive pronouns constructing in- and out-groups and implying
ownership of language (e.g., our, own). At the level of SERBCORP as a whole, then, the
dominant language ideology in evidence is one of societal monolingualism and a natural
one-to-one correspondence between language and national identity, which at the same
time seems to be an expression of a belief in the “impossibility of heterogeneous
communities and the naturalness of homogeneous communities” (i.e., homogeneism,
Blommaert & Verschueren, 1998, p. 207). However, despite the binary oppositions and the
emphasis on an “us and them” view of collective identity, differences between what are
understood as distinct language varieties are internalized and taken for granted, and so
there is very little evidence of identity-related contestation. In other words, the only
identities contested here are intra-linguistic (i.e., Central South Slavic) ones.
The situation is, of course, entirely different at the level of less distinct and geographically
and culturally closer regional (i.e., Central South Slavic) ethno-cultural identities (the 5+ hits
section of SERBCORP). Here, even the most basic quantitative analysis pointed to the
prominence of lexical items such as name, label, renaming, and (does not) exist, and thus
to a tendency toward the negation of separateness and the contestation of separate
names and identities. Keyword associates (Section 6.1.3) and n-grams (Section 6.2.2)
confirmed this tendency and showed that it pertained to a limited set of Central South
Slavic ethnolinguistic identities (e.g., the renaming of the Serbian language into
Montenegrin), while factor analysis showed the (big ‘D’) discourses of endangerment and
particularly contestation to be the most dominant, extending across six of the twelve
identified factors (i.e., small ‘d’ discourses). The dominant conceptualization of language
here is still one of natural one-to-one correspondence between language and (ethno-)
national identity and thus also homogeneism, but now the boundaries between in- and
out-groups are much less clearly defined as the lack of linguistic distinctiveness is used to
undermine claims to separate identity (as elsewhere in Europe, cf. Blommaert &
Verschueren, 1998). Note further the familiar tendency, typical of nationalism, to emphasize
differences projected outwardly and to minimize (or erase) differences projected inwardly
(see, e.g., Hobsbawm, 1990). Ultimately, the dominant language
ideology in the mainstream Serbian newspaper discourse is an essentialist one, conflating
language, alphabet, and literature, and insisting on language as an embodiment of the
putative immutable, primordial character of the nation that created it, as in the following
examples:
For Serbs the Cyrillic alphabet is part of their identity, it is their important
determiner without which they would not have been who they have been and
without which they would not be who they are […]. (POL-16-3-2003-93)
Profesor Lompar veruje da je reč o “ideološkom zahvatu”: “Cilj je da se
književnost svede na puku umetnost, na likovno i muzičko, a ona nije samo
umetnost. To znači zanemarivanje njenog kulturnog, istorijskog, antropološkog
aspekta. Jer, za razliku od drugih naroda, književnost je u Srba presudni
konstituent nacionalnog identiteta, beleg postojanja u dugom trajanju turskih
vekova. Zato je prevođenje književnosti na medijski, funkcionalni aspekt za nas
pogubno i isto što i brisanje identiteta.” (NIN-13-3-2003-381)
Professor Lompar [Professor, School of Philology, University of Belgrade]
believes this is about an “ideological project”: “The aim here is to reduce
literature to a mere art form, to a fine-art status such as that of painting or music,
but it is not only an art form. This would mean a neglect of its cultural, historical,
and anthropological aspects. Because, in contrast to other nations, for Serbs literature
is a crucial constituent of the national identity, proof of existence throughout the
long duration of the Turkish centuries. This is why the reduction of literature to
its media, functional aspect is detrimental for us and equal to an erasure of
identity.”
But how are we to understand the function of such a conceptualization of language and
the use of specific argumentation strategies (i.e., topoi), particularly with respect to the
discourses of endangerment and ethnolinguistic identity-related contestation which have
been shown to be so pervasive here?
The third research question was: What links can be identified between the
language-related discourses and language ideologies relevant to Central South Slavic
ethnolinguistic identities and ethnonationalism? This question was addressed through a
historical and sociopolitical contextualization of the findings obtained through
quantitative and qualitative methods.
Contestation is not a new phenomenon in the Balkans, nor is it limited to the
Central South Slavic area. In addition to the contestation we note today, there have been
earlier historical examples, sometimes making equally absurd claims, such as the theory
developed by German nationalists in the nineteenth century (“Windischentheorie”)
according to which Carinthian Slovene dialects were more closely related to Germanic
than to Slavic languages, or the orchestrated negation of Macedonian identity by the
Greeks, Bulgarians, and Serbs, particularly in the latter half of the twentieth century,
which still continues today (for details about these, see Voss, 2006). Arguably, these
represent examples of what Irvine and Gal (2000) call ‘fractal recursivity’ whereby (inter-
linguistic) binary oppositions are used for the specific local purposes of delegitimation of
one (intra-linguistic) ethnolinguistic identity or another. All this contestation has two
things in common. The first is a focus on language. As the French-Serbian scholar Yves
Tomić notes in his expert report on the ideology of Greater Serbia in the nineteenth and
twentieth centuries, written for the United Nations International Criminal Tribunal for the
former Yugoslavia (UN ICTY) in The Hague (Tomić, n.d.), at the root of the Greater
Serbian ideology is the Herderian language ideology according to which language is the
only valid criterion for the determination of national identity (see also Carmichael, 2000).
The second feature shared by the Balkan contestations is the instrumentalization of
linguistics in nationalist projects. According to Friedman (1999, p. 20):
At the end of the nineteenth and beginning of the twentieth centuries (and even
today, see, e.g. Glenny, 1995), linguists were putting their knowledge at the
service of politicians by choosing one or another isogloss as the definitive
justification for their ethnic identity – and therefore nationality […]. [L]inguistic
features become ‘flags’ that are manipulated to represent territorial claims. […].
The claims about nationality [a]re then translated into claims for the territory to be
included in the nation-state.
In his study of the negations of the Macedonian ethnolinguistic identity, Voss (2006, pp.
120-122) thus writes that “even in Yugoslav times we notice the coincidence of national
language ideology and ethnic identity ideology”, a contradiction which “becomes even
sharper after 1991” when “cultural policy became a tool in the rivalry of post-communist
elites” and which “remains unresolved today”.
However, cultural policy has been a favorite tool of nationalist projects for
much longer. In her book, Yugoslavia’s implosion: The fatal attraction of Serbian
nationalism, Sonja Biserko, a former Yugoslav diplomat and the president of the Helsinki
Committee for Human Rights in Serbia, traces the origins of contemporary Serbian
nationalism back to the beginning of the nineteenth century, “the formative period of
Serbia as a nation-state” (Biserko, 2012, p. 34), and to the idea of resurrecting the
fourteenth-century medieval Serbian empire, “a patriarchal, Orthodox, ethnically
homogeneous state” (p. 33). This idea, known as “Greater Serbia” throughout the
twentieth century, was first formulated into a national strategy in 1844 in a work by Ilija
Garašanin, Serbian minister of internal affairs from 1843 to 1852, famously titled
“Načertanije” (‘draft plan’). The plan envisaged resurrecting the medieval Serbian
state, which had been destroyed by the Turks, by integrating all Balkan territories in which
Serbs lived, either as a majority or as a minority, into a single state. These included large
parts of Croatia, Vojvodina (part of Hungary at the time), Bosnia-Herzegovina,
Montenegro, and northern parts of Albania (Tomić, n.d., p. 13). The plan has also been
widely known by an oft-repeated formula which summarizes it as “all Serbs in one state”.
However, much like elsewhere in Europe, this was a time of Romanticism and the inception
of national consciousness,36 when collective identities were much less clearly
delineated and much more fluid than they are today, so it was not always clear who Serbs,
for example, were. But this dilemma would be conclusively resolved for Serbian
nationalists by the Serbian linguist and ethnographer Vuk Karadžić, who reformed the
Serbian Cyrillic alphabet and initiated the standardization of modern Serbian. In his
book, indicatively titled Serbs, all and everywhere, written in 1836 and published in
1849, Karadžić demarcated the national Serbian territories and launched the theory of
Serbs as a people of several faiths (i.e., Orthodox, Catholic, and ‘Mohammedan’) unified
by a common language (see, for example, Tomić, n.d., pp. 8-9). Indeed, Western analysts
of nineteenth-century Balkan nationalist ideologies, such as Behschnitt (1980, p. 71,
cited in Tomić, n.d., p. 10, Note 13), consider the ideas of Vuk Karadžić to be the
“linguistic and cultural ideology of Greater Serbia.”
As Biserko (2012) notes, there is a clear ideological continuity in Serbian
nationalism over the last two centuries, from the formation of Serbia as a nation-state in the
first half of the nineteenth century, to the two world wars and two Yugoslav states in the
first half of the twentieth century, to the breakup of Yugoslavia and the ensuing Yugoslav
wars at the end of the twentieth century. However, for our purposes here, one other
historical moment is particularly important. As noted in the introduction, Yugoslavia was
already showing signs of internal struggles and instability before Josip Broz Tito’s death
in 1980. This trend was accelerated by the political and economic uncertainties in the
period following Tito’s death. Again, as mentioned above, Serbian elites tended to view
Yugoslavia as a form of Greater Serbia and therefore tried to impose a centralization of
the state which was opposed primarily by Slovenes and Croats (cf. Biserko, 2012). In
this climate and very much in the Serbian tradition of drafting conspiratorial nationalist
strategies, a group of Serbian intellectuals, members of the Serbian Academy of Sciences
and Arts, at least one of whom was a leading linguist (Pavle Ivić), drafted the infamous
SANU Memorandum in the fall of 1986 (SANU, 1986). The Memorandum alleged that Serbs
and Serbia were in an “unequal position” and “threatened” in Yugoslavia, and blamed the
1974 constitution, which decentralized the country and gave greater rights to individual
republics, arguably a historically and politically valid arrangement but one in which Serbs
were a minority everywhere outside of Serbia itself. As Biserko (2012, p. 82) notes, “[i]n
essence, the Memorandum reiterated the Serbian national agenda from the late nineteenth
and early twentieth century, calling for ‘the liberation and unification of the entire Serb
people and the establishment of a Serb national and state community on the whole Serb
territory’.” Needless to say, the Memorandum was and continues to be widely regarded in
the former Yugoslavia as the definitive statement of the Serbian nationalist program, the
Greater Serbia, which was the principal cause of the 1990s wars.
Most interestingly, the Memorandum mentions the noun ‘language’ ten times, the
noun ‘linguists’ once, and the adjective ‘linguistic’ three times in its thirty-two pages
of text, purveying discourses of endangerment and contestation and a language ideology
of essentialism rather similar to those attested above:
Manipulacije sa jezikom […]
Manipulation of language […]
Delovi srpskog naroda, koji u znatnom broju žive u drugim republikama, nemaju
prava, za razliku od nacionalnih manjina, da se služe svojim jezikom i pismom, da
se politički i kulturno organizuju, da zajednički razvijaju jedinstvenu kulturu svog
naroda.
Unlike national minorities, the parts of the Serbian people living in considerable
numbers in other republics do not have the right to use their own language and
alphabet, to organize politically and culturally, or to participate in the joint
development of a unified culture of their people.
nametanja službenog jezika koji nosi ime drugog naroda (hrvatskog) oličavajući
time nacionalnu neravnopravnost
imposition of an official language which bears the name of another people
(Croatian), which illustrates national inequality
Taj je jezik ustavnom odredbom učinjen obaveznim i za Srbe u Hrvatskoj, a
nacionalistički nastrojeni hrvatski jezikoslovci sistematskom i odlično
organizovanom akcijom sve ga više udaljavaju od jezika u ostalim republikama
srpskohrvatskog jezičkog područja, što doprinosi slabljenu veza Srba u Hrvatskoj
sa ostalim Srbima.
That language was made compulsory also for Serbs in Croatia through a
constitutional decree, while the nationalist Croatian linguists continue to distance
it from the language in other republics of the Serbo-Croatian language area
through systematic and well-organized actions, which contributes to the
weakening of the links between Serbs in Croatia and other Serbs.
Praktično značenje izjava: „moramo brinuti“, „treba se boriti“, „više treba učiti
ćirilicu“ itd. može se procenjivati samo u njihovom suočenju sa stvarnom
jezičkom politikom koja se vodi u SRH. Ostrašćena revnost kojoj je cilj
konstituisanje zasebnog hrvatskog jezika što se izgranuje u protivstavu prema
svakoj ideji o zajedničkom jeziku Hrvata i Srba ne ostavlja dugoročno mnogo
izgleda srpskom narodu u Hrvatskoj da očuva svoj nacionalni identitet.
The practical meaning of statements such as “we must take care of”, “we need to
fight”, “Cyrillic should be taught more often”, etc., can be evaluated only against
the real language policy in the Socialist Republic of Croatia. The impassioned zeal
whose aim is to create a separate Croatian language, opposed to any idea of a common
language of Croats and Serbs, leaves the Serbian people in Croatia little long-term
prospect of preserving their national identity.
Pod dejstvom vladajuće ideologije kulturne tekovine srpskog naroda otuđuju se,
prisvajaju ili obezvređuju, zanemaruju ili propadaju, jezik se potiskuje, a ćirilsko
pismo postepeno gubi.
As a consequence of the ruling ideology, the cultural inheritance of the Serbian
people is being alienated, appropriated, or devalued, neglected or left to ruin, the
language is being suppressed, and the Cyrillic alphabet is being gradually lost.
Although the topoi featuring pseudo-scientific arguments about language attested above are
missing here, the discourse of endangerment is present and is more pronounced than
above, while the discourse of contestation is implicit rather than explicit. Apparently, the
discourse of Serbian linguistic nationalism evolved between 1986 and the early 2000s,
adapting to changing circumstances and replacing the alarmist, mobilizing discourse
of endangerment of the pre-war and war periods with pseudo-scientific argumentation,
which is arguably more likely to be effective in a post-war period characterized by
widespread conflict fatigue. Further, the Croats are clearly labeled as the enemy through
their discursive construction as an out-group and through predication strategies such as
the labels “nationalist” and “zeal(ous)”, which are then used to justify the proposed
action (i.e., a recentralization of the Yugoslav state). In other words, all the main
elements of the discursive complex of Serbian linguistic nationalism seen above
are also on display here. This is confirmed by the results of the quantitative analysis, which
identified Vuk Karadžić and SANU as pertinent lexical items in the research
corpus: both Vuk Karadžić and SANU appear in the key lemma (Tables 12, 14, and E1)
and collocation (Tables G1 and G2) lists, while SANU also appears numerous times in the
n-grams (Table 19) and in Factor 10 of the EFA (Table 23). Similarly, the results of the qualitative
analysis exemplify the routine intertextual references to Vuk Karadžić and to the work done
within SANU, and thus the interdiscursivity between language and nationalism in Serbia,
as in the following examples:
It is much more of a problem that the downfall of the Serbian linguistic science
that began immediately after the death of Vuk Karadžić (1864) continues today.
Renaming the language from Serbian (which is what it was called during the time
of its last reformer) into Serbo-Croatian, Serbian linguists entered a period during
which Vuk’s path in the naming and development of the language of the
Serbian people was abandoned. That this period still continues is confirmed by
the fact that Serbian linguists continue to call their language “Serbo-Croatian” in
the Dictionary even after the demise of the “Serbo-Croatian language”. (POL-08-
8-2003-80)
The Serbian Linguistic Culture Society, Serbian Learned Society and the Serbian
Royal Academy represent three acts in the creation of the most authoritative
Serbian scientific institution [SANU]. All three institutions held that Serbs are a
South Slavic people who speak their own, Serbian language, which is close to
other Slavic languages, but also different from them, as well as that Serbs had
three faiths: Orthodox, Roman-Catholic and Mohammedan. (POL-10-9-2005-
127)
The dominant contemporary (and hegemonic) language-related discourses and
language ideologies in evidence in the mainstream Serbian press therefore seem largely
to derive from the revived Serbian nationalist program first articulated in the nineteenth-
century discursive and ideological work by Vuk Karadžić and Ilija Garašanin, as well as
the 1986 SANU Memorandum, and are employed as a cultural policy tool in various
aspects of the realization of this program and the establishment of a Greater Serbian
hegemony in the South Slavic area in the Balkans. The alternative, and sometimes
counter-hegemonic, language-related discourses and language ideologies, on the other
hand, though present (as in the critique of the work of the “Cyrillic” associations
presented at the end of Section 6.6.1), are marginal and not easily detectable by either
quantitative or qualitative methods. In short, in the mainstream Serbian press, language-
related discourses and language ideologies are largely discourses and ideologies of
Serbian ethnonationalism, which, in the wake of the breakup of Yugoslavia, can be
considered to be part of the Serbian nationalists’ “strategies of perpetuation” which
“attempt to maintain or reproduce a threatened national identity” (Wodak, de Cillia,
Reisigl & Liebhart, 1999, p. 33).
The fourth research question was: Is there synchronic and diachronic variation in
the identified language-related discourses and language ideologies relevant to Central
South Slavic ethnolinguistic identities? This question was addressed through a
comparison of the factor scores for each of the six selected language-related discourses
(i.e., factors) of texts grouped by a) publication: Blic, NIN, Politika, and Vreme; b) year
of publication: 2003, 2004, 2005, 2006, 2008; and c) type of article: general newspaper
articles vs. letters-to-the-editor. Synchronic and diachronic variation were also examined
using cluster analysis.
The patterns of synchronic variation (between different publications) in the
language-related discourses identified by factors suggest differences between the broadsheet daily
Politika (est. 1904), the oldest and most prestigious daily in Serbia, and the weekly
NIN (est. 1935), the oldest weekly in Serbia, on the one hand, and the tabloid daily
Blic (est. 1996) and the weekly Vreme (est. 1990), on the other. Politika and NIN
articles, it will be remembered, scored significantly higher than Blic or Vreme articles on all
factors except Factors 4 and 8 (Officialization of Montenegrin). Although this pattern is
difficult to interpret with any degree of certainty, it seems reasonable to conclude that the older,
more conservative Politika and NIN offer better representations of dominant discourses
and thus of linguistic nationalism in Serbia. However, it should be noted that the
qualitative examination of representative texts suggested a high degree of congruence in
the actual (big ‘D’) discursive and ideological content of texts across all four publications
(see Section 6.6), so this difference may be quantitative rather than qualitative in nature.
The dominance of the identified discourses is underscored by their relative
stability over time. Although a period of five to six years arguably is not long enough for
significant differences to emerge, it is indicative that there were no statistically significant
diachronic differences in language-related discourses in five of the six examined factors,
particularly because the one factor that actually showed diachronic variation (Factor 2:
Cyrillic-only) was the most marginal to the contestation of Central South Slavic
ethnolinguistic identities. This finding is corroborated by the qualitative examination of
representative texts which suggested a high degree of congruence in the actual (big ‘D’)
discursive and ideological content of texts across all five years of publication considered
here. Finally, although the patterns of synchronic variation between different types of
article (general newspaper articles vs. letters-to-the-editor) presented a more complicated
picture, with some congruence (Factors 10 and 11) as well as significant differences
(Factors 2, 4, 6, and 8), here, too, the qualitative examination of representative texts
suggested a high degree of congruence in the actual (big ‘D’) discursive and ideological
content of texts.
8. Conclusion
This study had two major goals. The first was to determine whether a corpus-
linguistic methodology could be effectively applied to a language featuring extensive
inflectional morphology, and, if so, to compare the different quantitative methods in
terms of their usefulness and effectiveness for identification of lexical patterns suggestive
of language-related discourses and, ultimately, language ideologies. The second goal was
to identify and describe dominant language-related discourses and language ideologies in
the mainstream Serbian press and to then examine those for any links with
ethnonationalism. Because the methodological comparison was the relatively more
straightforward of the two goals, it was dealt with first throughout the dissertation and it
is also dealt with first in this final section (Section 8.1). Conclusions about language-
related discourses and language ideologies identified in this study are offered in Section
8.2. Implications, limitations, and directions for future research are discussed in Sections
8.3, 8.4 and 8.5, respectively.
8.1 Methodological Comparison
It has been noted already that most studies of language-related discourses and
language ideologies have relied on qualitative methods only, while those that also use
quantitative methods tend to rely on keyword and collocation analysis. The reasons for
this include difficulties with the operationalization of discourse and ideology in
quantitative terms, as well as the relative ease of use and effectiveness of basic corpus-
linguistic methods such as keyword and collocation analysis. However, as this study has
demonstrated, despite the difficulties with the application of a quantitative methodology
to slippery concepts such as discourse and ideology, this approach offers novel insights
unavailable through a qualitative-only approach. Further, despite their similarities
and differences, each of the quantitative methods applied here was shown to offer a unique,
complementary angle from which to consider the data, which ensures both a more
comprehensive analysis and a higher degree of reliability. For example, keyword
analysis was shown to be much more effective than collocation analysis at corpus
profiling for sampling purposes. In other words, the decision to focus on the 5+ hits
section of SERBCORP would have been more difficult to justify based on the results of
collocation analysis alone. Perhaps most interestingly, it was demonstrated that reliance
on keyword and collocation analysis is comparatively less effective in research dealing
with large, topically heterogeneous corpora. As noted above, topically homogeneous
corpora such as single-issue parliamentary debates or student evaluations of university
instructors, regardless of their size, also tend to have more homogeneous discursive and
ideological profiles, which are easier to identify using basic techniques. Topically
heterogeneous data sets, on the other hand, present a challenge on account of their more
heterogeneous and therefore more complex discursive and ideological profiles. Again,
this is where exploratory factor analysis proved to be far superior to other methods.
One conclusion, therefore, is that these methods are complementary in that they
provide both unique (e.g., micro- vs. macroscopic) and common (e.g., frequent, recurrent
lexis) perspectives, so they are best applied in conjunction with one another for the
purposes of triangulation. Perhaps most importantly, however, exploratory factor
analysis (based though it is on the results of collocation analysis) is the only method that
can effectively take researcher inference out of the process of identification both of (small
‘d’) discourses (i.e., factors) and representative texts (by providing factor scores for each
text). This finding is of paramount importance for research into discourses and
ideologies, particularly for research focusing on areas where linguistic research is
traditionally politicized and biased. This is equally important for critical discourse
analysis, which has been in need of research guided and supported by the results of
transparent, replicable procedures. Nevertheless, it is clear that researcher inference is
ultimately necessary for any meaningful interpretations, as quantitative procedures can
only identify salient lexical patterns and small ‘d’ discourses. This study thus also
confirms the effectiveness of a mixed methods approach whereby quantitative and
qualitative methods are combined in a hermeneutic fashion, i.e., qualitative analysis is
based on and guided by the results of quantitative analysis and vice versa.
Lastly, a note on the challenges presented by extensive inflectional morphology.
Despite its dangers, lemmatization may be a necessary evil as its advantages (e.g.,
aggregation of semantically closely related and highly correlated variables) seem to
outweigh its drawbacks (e.g., conflation of lexical items with potentially different
collocational or other patterning). Beyond a multiplicity of semantically related forms,
extensive inflectional morphology also creates considerable potential for ambiguity in the
form of homonyms/homographs. The solution, which unfortunately was not available
during the research reported on in this study, is reliable grammatical differentiation
through part-of-speech tagging. Extensive inflectional morphology also presents a
problem for concordance analysis because it breaks up the semantic unity of a
lemma (and with it its concordance patterns) into myriad forms, which concordancing
software such as WST is currently unable to handle. Here, the solution is
less clear, but will involve the development of software more capable of dealing with the
challenges of extensive inflectional morphology.
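The aggregation problem described above can be illustrated with a short sketch. The following Python example is a minimal, hypothetical illustration, not the procedure used in this study. It assumes a hand-built form-to-lemma table (LEMMA_TABLE, invented here; a real application would require a full morphological lexicon or tagger) and an invented sample sentence that is not drawn from SERBCORP. It shows how raw word-form counts split a lemma such as rasparčavanje across its case forms, whereas lemma-based counts and a lemma-aware concordance reunite them.

```python
import re
from collections import Counter

# Hypothetical form-to-lemma table for illustration only; a real study would
# need a full morphological lexicon or lemmatizer for Serbian.
LEMMA_TABLE = {
    "rasparčavanje": "rasparčavanje",   # nominative/accusative singular
    "rasparčavanja": "rasparčavanje",   # genitive singular
    "rasparčavanju": "rasparčavanje",   # dative/locative singular
    "rasparčavanjem": "rasparčavanje",  # instrumental singular
    "jezik": "jezik",
    "jezika": "jezik",
    "jeziku": "jezik",
    "jezikom": "jezik",
}

# In Python 3, \w in a str pattern matches Unicode letters such as č and ć.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def lemmatize(token: str) -> str:
    """Map a word form to its lemma, falling back to the form itself."""
    return LEMMA_TABLE.get(token, token)


def lemma_frequencies(tokens: list[str]) -> Counter:
    """Count tokens by lemma rather than by surface form."""
    return Counter(lemmatize(t) for t in tokens)


def lemma_concordance(tokens: list[str], lemma: str, width: int = 3) -> list[str]:
    """Return KWIC lines for every form of `lemma`, with `width` tokens of context."""
    lines = []
    for i, tok in enumerate(tokens):
        if lemmatize(tok) == lemma:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


# Invented sample text (not from SERBCORP).
SAMPLE = "Rasparčavanje jezika vodi ka rasparčavanju naroda. O jeziku se ne govori."
tokens = tokenize(SAMPLE)

raw_counts = Counter(tokens)               # form-based: the lemma is split across forms
lemma_counts = lemma_frequencies(tokens)   # lemma-based: the forms are reunited

print(raw_counts["rasparčavanje"], lemma_counts["rasparčavanje"])  # prints: 1 2
for line in lemma_concordance(tokens, "rasparčavanje"):
    print(line)
```

A simple lookup table of this kind cannot resolve homographs, since it maps each surface form to a single lemma regardless of context; that is the gap part-of-speech tagging would fill. It also illustrates the trade-off noted above, because every form it groups under one lemma loses any distinct collocational patterning.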
8.2 Language-related Discourses, Language Ideologies, and Ethnonationalism:
What It All Means
Two principal, dominant language-related discourses were identified in this study.
The first, a discourse of endangerment, was itself attested in two related forms each of
which focuses on a different aspect of language as well as a different ‘threat’. The first
form focuses on the perceived danger to the Cyrillic alphabet that a widespread use of the
Latin alphabet in Serbia is deemed to represent. Perhaps somewhat unusually among the
world’s languages, Central South Slavic uses both the Latin and the Cyrillic alphabets.
While the alphabets themselves are fully equivalent, the difference between them is in
their sociohistorical origins and thus sociolinguistic in nature. Despite the equivalence,
the Cyrillic alphabet has historically been strongly associated with the Serbian
ethnonational identity, whereas the Latin alphabet, largely because it provides a means of
distinction from the Serbs, is strongly preferred by the Bosniaks, Croats, and now also
Montenegrins. Here, then, we see an example of what Irvine and Gal (2000) call
‘iconization’ or mapping of linguistic features onto social images, positing a direct link
between one or more linguistic features and (an essentialist conceptualization of) the
nature of the persons or social groups who display them (for a reverse example, see
Section 6.6.1). Interestingly, however, the Latin alphabet is also in widespread use
among the Serbs themselves, both in Serbia and elsewhere, mostly because it has come to
be associated with modernity and Western civilization (another example of
iconization). However, the Cyrillic alphabet is preferred in official contexts and has
never been in true danger of being phased out. There rather seems to exist a kind of
alphabetical diglossia, whereby the Cyrillic alphabet, in further examples of iconization,
is used in most official contexts and is associated with political conservatism and cultural
traditionalism, while the Latin script is mainly used in popular culture and is associated
with political liberalism and modernity (but see Herzfeld, 1987 for a problematization of
the concept of diglossia). The assessment of the endangerment of the Cyrillic is thus
(grossly) exaggerated, and the calls for its defense have primarily ideological and
political motivations. Note here that the calls to banish the Latin alphabet from Serbian
altogether represent an example of ‘erasure’ as the simplification of a sociolinguistic field
through which some persons, social groups, or sociolinguistic phenomena are rendered
invisible in ideologically and politically convenient ways (Irvine & Gal, 2000). Serbian
society has traditionally harbored a cult of victimhood since the defeat by the Ottoman
Turks in 1389 in Kosovo (see, e.g., Biserko, 2012), while vocal self-interested public
proclamations of endangerment of all things Serbian have become so pervasive (and
profitable) since the breakup of Yugoslavia that there are now widely used expressions
such as “to Serb” and “a professional Serb” to (mockingly) refer to the phenomenon.
Most proclamations of the endangerment of the Cyrillic in this data set (and, arguably, in
general) thus come from the fringe (minor civic associations and minor-league
academics) that harbors extremist political views; nevertheless, they are given enormous
(undue) attention and space in the mainstream media.
The second form of the discourse of endangerment focuses on the perceived threat
to the Serbian language and ethnonational identity posed by the post-Yugoslav political
and cultural independence of other Central South Slavs and their concomitant exercise of
their right to name the common language to reflect their own separate identities (i.e.,
Bosnian, Croatian, Montenegrin). Though equally baseless and ultimately absurd, this
second form of the discourse of endangerment is purveyed largely by leading academic
linguists, many of whom, such as Ivan Klajn, are members of SANU, as well
as other academics and prominent writers (again, often members of SANU) and, more
often than not, Serbian politicians and, sometimes (though not attested here), also
members of the clergy in the Serbian Orthodox Church. As could be seen from the many
samples of this discourse cited above, there is an insistence on (and perhaps also a belief
in) the Serbian ethnic origins of all Central South Slavs (as well as the Macedonians, who,
although they speak a related but separate South Slavic language, are predominantly Eastern
Orthodox). The argument, based on this theory of ethnic origins and the tradition started
by Vuk Karadžić, but also the (politically convenient) Herderian language ideology
according to which language is the only valid criterion for ethnonational affiliation, is
that the “renaming” of the language represents rasparčavanje ‘partitioning’ (a recurrent
theme in public discourse in Serbia since the breakup of Yugoslavia and a keyword in the
sense of Williams, 1976) of the Serbian Volk and therefore a step towards its ultimate
destruction. Figure 5 shows the discourse prosody and a range of applications for (the
decidedly negative term) rasparčavanje, from the partitioning of the Byzantine empire
(line 1) to the partitioning of the Balkans (lines 2, 7) to the partitioning of Yugoslavia
(lines 3, 4, 6) to the partitioning of the (Serbian) language and literature (lines 5, 8, 10-
18) to the partitioning of the Vinča Institute of Nuclear Sciences at the University of
Belgrade (lines 9, 19, 20). Note that a similar discourse of endangerment was evident in the
SANU Memorandum, although the focus there was on the ‘threat’ from the Croats, since
Bosniaks and Montenegrins had not yet asserted themselves by the mid-1980s, when the
memorandum was drafted.
The second dominant language-related discourse identified in this study is a
discourse of contestation. This discourse is directly related to the second form of the
discourse of endangerment and is routinely purveyed by the same set of actors, only here the
emphasis is on arguments whose ultimate aim is to delegitimize non-Serb Central South
Slavic identities and with them their political legitimacy and ultimately their rights to
territory and sovereignty.
Figure 5. Concordance lines for rasparčavanje ‘partitioning’ in SERBCORP
The discourse of contestation relies on pseudo-scientific arguments and an
instrumentalization of linguistics for political purposes, which is widely noted in the
literature. Typically, the argumentation rests on either Topos 1 or Topos 2, or both.
Topos 1 presents the issue of the language name as a “scientific” problem which has been
conclusively settled, although it is never quite explained what this exactly means or what
the scientific methodology used to come to this conclusion was, except for occasional
vague references to dialectological studies. Regardless, the claim is repeatedly made that
the only “scientifically” justified name for Central South Slavic is Serbian. Topos 2,
similarly, draws a selective, flawed comparison between the polycentricity of Central
South Slavic and colonial languages such as English, Spanish and Portuguese (but also
German), arguing that in cases of such polycentricity the “original” name is always
preserved even if the language comes to be used by different nations. The fact that
colonial languages were transplanted to the new nations largely via colonialist enterprises
is ignored, and is in fact rather indicative of the conceptualization behind this argument;
that Serbian was not transplanted from Serbia to other Central South Slavic nations in
similar fashion is also ignored. Simultaneously, inconvenient examples of polycentricity
such as, for instance, that in Scandinavia (see, e.g., Vikør, 2000) or the situation with
Hindi/Urdu, which is nearly identical to that with Central South Slavic in several respects
(e.g., a high degree of mutual comprehensibility, different alphabets, correlation between
linguistic, ethnic and religious affiliations), are entirely ignored.
It has thus been shown that underlying the discourses of endangerment and
contestation here is an imported essentialist Western language ideology which derives
from the language philosophy of Johann Gottfried Herder as well as the Romantic
movement (Bauman & Briggs, 2000, 2003), coupled with a Slavic language ideology
termed slovesnost in Serbian which encompasses language, alphabet, and literature and
treats them as different aspects of a unified linguistic entity. It has also been shown that
there exist intertextual and interdiscursive links between language-related discourses and
language ideologies in the mainstream Serbian press, on the one hand, and Serbian
ethnonationalism, on the other, as well as a common institutional site, SANU. Finally, it
has been shown that the dominant language-related discourses and language ideologies
were widely accepted in Serbian society during this period. Based on these findings, we
can conclude that, in the case of Serbia and the Balkans, language ideologies (and
language-related discourses) are indeed not “about language alone” (Woolard, 1998, p.
3), as well as that “[t]he continuing intensity of contestation” over language and
ethnonational identity is “hardly surprising, given the consequences envisaged and
authorized by the reigning language ideology and occasionally enacted under its auspices.
It is an ideology in which claims of linguistic affiliation are crucial and exclusivist
because they are also claims to territory and sovereignty” (Irvine & Gal, 2000, p. 72,
my italics). In other words, to make a reference to a somewhat similarly intractable
conflict situation of the Middle East, “[t]he political and military conflicts have stopped
but the linguistic conflict goes on” (Abd-el-Jawad & Al-Haq, 1997, p. 439). The ultimate
aim of prolonging this artificial and unnecessary conflict is clear:
Constructing language and tradition and placing them in relationship to
nature/science and society/politics continues to play a key role in producing and
naturalizing new modernist projects, new sets of legislators, and new forms of
social inequality. Which is not to say that the “new” does not often bear a
remarkable resemblance to what has come before, often centuries earlier. Indeed,
it would be difficult to imagine a time that the power of this process was more
apparent than at the end of the twentieth century and the beginning of the twenty-
first (Bauman & Briggs, 2003, p. 301).
What remains to be done is to take a stand with respect to such ideologies, fully
recognizing the impossibility of ideology-free positions. Here, the best course of action
seems to be to follow theorists such as Gramsci (1971) and Gee (2010) in their appeal to
judge ideologies in terms of their social effects rather than their truth values.
8.3 Implications
The findings of this study should be informative and useful to several different
audiences. The results of the comparison between the quantitative methodologies should
be of interest to scholars interested in mixed methods approaches to language ideologies,
but also those interested in similar approaches to discourses. The discursive and
language-ideological profiles presented here should be of interest to scholars in
sociolinguistics, as well as those in related fields such as linguistic anthropology, political
science, and sociology, as such profiles are, to the best of my knowledge, currently unavailable.
More specifically, regional linguists and other public figures with an interest in language
such as politicians, academicians, and religious officials will be interested in how their
contributions influence the public discourse on language and dominant language
ideologies. Similarly, journalists and others working in and with the media will be
interested in their impact as purveyors of discourses and language ideologies. Finally,
members of the public in the four states of the Central South Slavic area, as well as
regionally, should find the language-ideological profile of the mainstream Serbian press
of some interest, especially because language continues to be a highly contested issue
which is often exploited for political purposes. It is hoped that empirical contributions
based on transparent, replicable procedures such as this one can help deconstruct the
complex of ethnonationalist discourses and ultimately contribute to regional
reconciliation.
8.4 Limitations
Despite the effort to triangulate the findings by employing multiple
methodological (keyword, collocation, factor, and cluster analysis, analysis of variance,
CDA/DHA) and epistemological (quantitative and qualitative, macro- and microscopic
analysis) perspectives, as well as an exhaustive data set (including all relevant articles
from the given time-frame), the language-ideological profile presented here must be
understood as tentative. The primary reason for this is that manifestations of public
language-related discourses and language ideologies are not limited to the press, but can
be found throughout the public sphere. Furthermore, while the press is an excellent
source of data on dominant discourses and ideologies purveyed by the elites, it is clear
that it does not represent equally well what the linguistic anthropologist Paul Kroskrity
calls “practical consciousness” (Kroskrity, 2004, p. 505), i.e., the largely unreflective
language awareness of so-called ordinary people, who tend to accept and naturalize
dominant ideologies. In addition to this, there
are further aspects of language-related discourses that merit consideration such as, for
example, their extended diachronic development or their possible correlation with
political affiliation, which are considered only in part here. Similarly, the concepts of
discourse and (language) ideology are notoriously problematic and the difficulties
associated with them are necessarily carried over into this study. Although
language ideology, more often than not, is a linguistic discursive phenomenon, to treat it
as a solely linguistic discursive phenomenon as proposed here is to give but a partial
account of it. In other words, although the lexical approach to the identification of
language ideologies is well established, language ideologies are by no means limited to
their lexical manifestations and so should be examined from multiple other perspectives
(see below).
8.5 Future Research
This study was conceived as part of a larger project of identification and
description of dominant language-related discourses and language ideologies in the entire
Central South Slavic area, i.e., Bosnia-Herzegovina, Croatia, Montenegro, and Serbia.
The rationale for such a broad, comparative examination is that language-related
discourses and language ideologies in these four states are intimately linked on account
of the nations’ shared history. So, the first task for future research would be to produce
individual accounts of language-related discourses and language ideologies for the
remaining three national contexts. The second task would be to subject the findings of
individual case studies to a comparative analysis which would make possible a
comprehensive view of language-related discourses and language ideologies and their
links to ethnonationalism in this area. Simultaneously with this, future research would do
well to also include data from other discursive sites such as various kinds of institutional
documentation, popular culture, and language-related discussions in social media. Also,
consideration of independent variables such as political affiliation would provide further
distinctions in our attempt to account for the totality of language-related discourses and
ideologies and ethnonationalism in the West Central Balkans as well as elsewhere. Finally,
in addition to other kinds of data from different discursive sites, the approach to the
identification of language-related discourses and language ideologies could be extended
to non-lexical aspects of language (e.g., grammatical relationships, semiotic
manifestations), as well as relevant non-linguistic social and discursive practices (e.g.,
allocation of material and symbolic resources).
1
For a critical review of the literature on the evolution of a symbol system that is arguably what makes
Homo sapiens unique, see Luuk 2013.

2
The demise of the former Yugoslavia has unleashed a regional culture of contestation so pervasive that it
can, not unreasonably, be characterized as pathological. This has meant the politicization of virtually
everything from strategically insignificant borderland areas between Slovenia and Croatia to the very name
of Macedonia, which continues to be contested by Greece (see Voss, 2006); the virus, to continue the
medical metaphor, has easily infected the regional academic production, particularly in linguistics.
Greenberg (2004) himself thus notes that often such works, “given the ethnic affiliation of their authors, are
subjective and at times lack the scholarly rigor required in the study of linguistics” (p. 4; see also Irvine &
Gal, 2000, pp. 67-68). This situation has necessitated a reliance on outside sources, unaffiliated with any of
the parties in the region, so Kordić (2010), for example, relies heavily on extra-regional sources,
particularly those originating in the German-speaking world. However, as Greenberg also notes, little
relevant material is available in English, while treatments in other languages (e.g., Gröschel, 2009),
curiously enough, often exhibit obvious and disqualifying biases and are therefore not relied upon here (see
Katičić, 1997 for a possible explanation).

3
As is well known, Slovenia and Croatia were part of various Austrian and Hungarian kingdoms between
roughly the twelfth century and 1918, Serbia and Montenegro were part of the Ottoman empire between the
middle of the fifteenth century and 1878, while Bosnia-Herzegovina was part of both, belonging to the
Ottoman empire between the middle of the fifteenth century and 1878, and to the Austro-Hungarian empire
comparatively briefly between 1878 and 1918.

4
Partly owing to the policies of the Ottoman Empire which recognized faiths rather than ethnicities among
its subjects, there is a close association between religious and ethnic identities in the Balkans. Most
(religious) Croats are therefore Catholic, most (religious) Serbs and Montenegrins Eastern Orthodox, while
most (religious) Bosniaks are Muslim. This was further reinforced by the 1974 Yugoslav constitution,
which recognized the non-Christian Bosnians as a separate ethnonational group but one defined by its
religious affiliation rather than ethnic identity, Muslims (as opposed to their now official ethnonym,
Bosniaks).
5
During that time, the area was still divided between the Austro-Hungarian and Ottoman empires, but
Serbia, in particular, exploited the increasing weakness of the Ottoman Empire to move quickly toward
full independence. Serbia and Montenegro were finally granted independence by the Great Powers at the
Congress of Berlin in 1878, while Bosnia-Herzegovina was placed under Austro-Hungarian administration
and later annexed by Austria-Hungary in 1908; Croatia remained part of Austria-Hungary until the end of
World War I in 1918.

6
According to Greenberg (2004, p. 54), the name Serbo-Croatian was first mentioned in a codification work
in 1867, in Pero Budmani’s Serbo-Croatian Grammar (see also Katičić, 1997, p. 171).
7
Although Montenegro had been independent since 1878, it was annexed by Serbia shortly after the end of
World War I.
8
Advancing her thesis that Serbo-Croatian/Croato-Serbian is one polycentric language rather than several
different languages, Kordić (2010), contrary to virtually all Croatian linguists, contends that Serbian
domination of the common standard is “a myth”. However, while this may be largely true for Croatia, it
is patently untrue for Bosnia-Herzegovina and Montenegro (see Endnote 10). In addition, her insistence on
the historical but exclusionary name for the language, i.e., Serbo-Croatian/Croato-Serbian, is rather
telling.
9
I.e., South-Slavia, a name which symbolically incorporates all Southern Slavs (except Bulgarians). One
should note, however, that for many non-Serb former Yugoslavs this name has come to index Serbian
domination and hegemony.

10
The existence of separate Bosniak and Montenegrin (ethnonational) identities was disputed by the Croats
and particularly by the Serbs, who since at least the time of Vuk Karadžić, the nineteenth-century Serbian
language reformer, had denied the existence of any other separate Central South Slavic ethnolinguistic
identities, including the Croatian one, arguing, as Karadžić did, that the Croats and Bosniaks were “Serbs
of the Catholic and Islamic faiths”, respectively. For details on Vuk Karadžić, see Endnote 28.
11
Not only did no Bosniak or Montenegrin linguists or literary figures participate in either the Vienna or the
Novi Sad agreement (see, e.g., Völkl, 2002, p. 216), but Conclusion 7 of the Novi Sad agreement also
literally reads, “[…] A mutually (sic!) agreed-upon Commission of Serb and Croat experts will develop a
draft of the Orthographic manual. […]” (Greenberg, 2004, p. 172, my italics).

12
This was strictly enforced: all children were taught, and required, to use both alphabets in elementary
school, and all public institutions were required to display signs in both alphabets and to use both
alphabets in their day-to-day operation. An illustrative example of the latter practice is the leading
national daily Oslobodjenje, which alternated between the two alphabets, appearing in the Latin alphabet
one day and in the Cyrillic alphabet the next.

13
After Bosnian Serbs and Bosnian Croats declared their languages to be Serbian and Croatian,
respectively, Bosniaks reverted to the historical name for the language of Bosnia-Herzegovina, which
Austria-Hungary had first upheld and then abruptly abandoned, and declared their language to be Bosnian
in the face of opposition from the other two groups that continues to this day.
14
Montenegro chose not to declare independence at that time and instead sided with Serbia in the
subsequent wars. Macedonia was spared a military conflict until a brief civil war with its Albanian
minority in 2001. Kosovo, long a southern Serbian province with a large Albanian majority, declared
independence in 2008 (since recognized by many, though not all, states) after a brief 1999 war with Serbia
and a NATO intervention that forced the Serbian military out.

15
Srpska akademija nauka i umetnosti ‘Serbian Academy of Sciences and Arts’
(https://www.sanu.ac.rs/English/Index.aspx). For a discussion of the 1986 SANU Memorandum, see
Section 7.3.
16
The period 2003-2008 was chosen on the basis of data availability at the time of corpus compilation.
17
Precise circulation figures are somewhat difficult to come by in the Balkans. Independent auditors such
as ABC Srbija (Audit Bureau of Circulations, www.abcsrbija.com) do keep track of circulation figures for
marketing purposes, but their reports are proprietary and require a costly subscription for access. However,
information on circulation figures can also be obtained from occasional press reports issued by publishing
houses such as the Color Press Group (e.g., http://www.color.rs/novosti120.html).

18
Although objective measures of the standing of newspapers in a society (other than circulation figures)
are currently unavailable, the Serbian newspaper market is fairly small and centralized, so there is little
doubt as to which publications are generally considered authoritative. It should also be noted that, in
addition to the four publications included in this study, the broadsheet daily Večernje Novosti would also
merit inclusion on account of its relatively high circulation and standing; however, complete data sets
for the subject period were unavailable at the time of corpus compilation, so the Večernje Novosti data
were excluded from the research corpus.

19
The data for the year 2007 were excluded because the Politika data set for 2007 was incomplete due to a
download limit on the source website at the time of compilation.

20
Ebart (est. 2000) is a privately owned, subscription-based commercial service that archives Serbian
media content.
21
JEZIK = jezik, jezika, jeziku, jezikom, jezici, jezike, jezicima (lemma forms by number and case).
22
The text of the articles was saved in plain-text (.txt) format using Unicode (UTF-16) encoding and
formatted according to the TEI Guidelines for Electronic Text Encoding and Interchange
(http://www.tei-c.org/Guidelines/).
23
Pronouns, numbers, and quantifiers were exempted from deletion on account of their potential functions
in discourse strategies such as the referential/nomination strategy (i.e., the construction of in- and
out-groups; Wodak, 2001, pp. 72-74), as well as other, as yet undetermined discursive functions.
24
Following recommendations in Tabachnick and Fidell (2007), the full data set was subjected to a log
transformation in an attempt to retain the cases previously identified as multivariate outliers and to
enhance factorability. However, the log transformation did not produce noticeably better results, so the
remainder of the analysis was performed on the original data set.

25
Like ‘United States’, the country name ‘Montenegro’ consists of two words in Serbian, Crna ‘black’
(cf. negro) and Gora ‘mountain’ (cf. monte). Consequently, the two words were treated separately in the
collocation analysis and were identified as two different collocates of the lemma JEZIK. Unsurprisingly,
they turned out to be correlated with one another to the point of singularity, so one of the two was
excluded from further analysis and the remaining one was treated as representing the full country name.
26
Although Biber (1988, p. 85, Note 2) correctly notes that “oblique solutions might be generally
preferable in studies of language use and acquisition, since it is unlikely that orthogonal, uncorrelated
factors actually occur as components of the communication process,” Varimax (orthogonal) and Promax
(oblique) rotations produced virtually the same results on this data set.
27
Z-scores were preferred to regression-based estimates of factor scores here because the addition of the
multivariate outliers produced a slightly different and inferior factor solution, whose factors and factor
scores were not directly comparable to the preferred solution above or to the regression-based factor score
estimates derived from it.
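To make the contrast concrete, the sketch below computes factor scores by the z-score method as it is commonly described in multi-dimensional analysis (Biber, 1988). Each variable's frequencies are standardized across texts, and a text's score on a factor is the sum of the z-scores of variables with salient positive loadings minus the sum for variables with salient negative loadings. It is an assumption that this matches the exact procedure used in this study. The salience cutoff of 0.35, the function names, and the loadings are hypothetical and are not taken from the analysis.

```python
import statistics


def z_scores(values):
    """Standardize per-text frequencies to mean 0 and (population) SD 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # A constant variable carries no information; score it as zero everywhere.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def factor_scores(freqs, loadings, cutoff=0.35):
    """Hypothetical z-score method for one factor.

    freqs:    variable name -> list of normalized frequencies, one per text
    loadings: variable name -> loading of that variable on the factor
    cutoff:   hypothetical salience threshold on the absolute loading
    """
    n_texts = len(next(iter(freqs.values())))
    scores = [0.0] * n_texts
    for var, loading in loadings.items():
        if abs(loading) < cutoff:
            continue  # non-salient variables do not contribute
        sign = 1.0 if loading > 0 else -1.0
        for i, z in enumerate(z_scores(freqs[var])):
            scores[i] += sign * z
    return scores


# Invented example: three texts, two salient variables and one non-salient one.
freqs = {
    "srpski": [1.0, 2.0, 3.0],
    "hrvatski": [3.0, 2.0, 1.0],
    "knjiga": [5.0, 5.0, 6.0],
}
loadings = {"srpski": 0.62, "hrvatski": -0.48, "knjiga": 0.10}
print(factor_scores(freqs, loadings))
```

Unlike regression-based estimates, these scores depend only on the standardized frequencies and the signs of the salient loadings. Texts can therefore be scored on a solution's factors even if they were not part of the data from which that solution was derived.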
28
Vuk Stefanović Karadžić (1787-1864) was a Serbian language and literary scholar who created the
spelling system in use in contemporary Serbian and other Central South Slavic varieties and published
several early Serbian dictionaries and editions of Southern Slavic folk literature (see
http://www.britannica.com/EBchecked/topic/311960/Vuk-Stefanovic-Karadzic). He remains one of the
most revered figures of Serbian history.

29
A similar remark can be made with respect to the concepts of semantic and discourse prosody. Although
both concepts have been found useful in discourse analysis, the lexical patterns here seem too
heterogeneous for any clear prosodies to emerge, owing to a) the topical heterogeneity of the research
corpus and b) the multifariousness of the concept of language compared to concepts such as age (see
Mautner, 2007).

30
The seventeen significant collocates of Vuk (ordered by total number of occurrences) are: Karadžić,
reforma ‘reform’, Stefanović, jezik ‘language’, srpski ‘Serbian’, sabor ‘assembly’, prvi ‘first’, [Petar II
Petrović] Njegoš [1813-1851, poet, philosopher, a Prince-Bishop of Montenegro], [Vuk] Drašković [a
Serbian writer and government minister in the early 2000s], Dositej [Obradović, 1739-1811, Serbian writer,
Enlightenment philosopher, and the first Minister of Education of Serbia], zadužbina ‘endowment’, reč
‘word’, nagrada ‘award’, jezički ‘linguistic’, delo ‘work’, ministar ‘minister’, and knjiga ‘book’.
31
The verb postoji ‘exists’ was identified as a shared significant collocate of the core concept lemmas
bosanski jezik ‘Bosnian language’ (8 occurrences, MI score = 6.696) and crnogorski jezik ‘Montenegrin
language’ (18 occurrences, MI score = 7.223), but not of hrvatski jezik ‘Croatian language’. Although
other evidence (e.g., excerpts from texts representative of factors/discourses) makes clear that Croatian,
too, is routinely constructed in discourse as non-existent, Croatian, as noted already, has historically
enjoyed more or less equal status with Serbian and is often implicitly treated as more legitimate than
either Bosnian or Montenegrin. It should further be noted that other lexical items are also used for the
same purpose (e.g., the verbal noun postojanje ‘existence’; for examples, see Table 19).
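The MI scores cited above measure how much more often two items co-occur than would be expected by chance. The sketch below implements the common textbook formula, MI = log2(O × N / (F1 × F2)). Here O is the observed number of co-occurrences within the collocation span, N is the corpus size in words, and F1 and F2 are the corpus frequencies of the node and the collocate. WordSmith Tools may compute MI somewhat differently (for instance, by taking the span size into account), so this sketch is not guaranteed to reproduce the figures reported above. The counts in the example are invented.

```python
import math


def mutual_information(observed, node_freq, collocate_freq, corpus_size):
    """Pointwise mutual information (base 2) for a node-collocate pair.

    observed:       co-occurrences of node and collocate within the span
    node_freq:      corpus frequency of the node
    collocate_freq: corpus frequency of the collocate
    corpus_size:    total number of running words in the corpus
    """
    # Co-occurrences expected if the two items were distributed independently.
    expected = node_freq * collocate_freq / corpus_size
    return math.log2(observed / expected)


# Invented counts: 8 co-occurrences of items with frequencies 1,000 and 50
# in a corpus of 1,000,000 words.
print(round(mutual_information(8, 1_000, 50, 1_000_000), 3))
```

Because MI compares observed with expected co-occurrence, it rewards exclusive pairings even when the raw counts are small. This is why relatively rare combinations, such as 8 occurrences of a verb with bosanski jezik, can still receive high scores.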

32
Most key-keywords, particularly the more frequent ones, also have the keyword language as an
associate.
33
Following previous research (e.g., Vessey, 2013a), the ‘plot’ and ‘patterns’ functions of the WST
concordancer were also considered. The ‘plot’ function calculates the total number of hits for each text (but,
as already mentioned above, only for individual lemma forms) as well as their dispersion throughout a text.
The ‘patterns’ function presents identified collocates in a table ordered by their frequencies in each slot in
the collocation horizon around the node word (L5-R5). Unfortunately, both proved only marginally useful
with a data set of this size, so they were excluded from further analysis.
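The slot-by-slot counting performed by the ‘patterns’ function can be sketched as follows. For every occurrence of any form of the node lemma (here, the forms of JEZIK listed in Endnote 21), the word at each position is counted separately, from five words to the left (L5) to five words to the right (R5). This is a simplified illustration that assumes already tokenized, lowercased text. It is not WordSmith's implementation, and the function name and example phrase are hypothetical.

```python
from collections import Counter

# Forms of the lemma JEZIK by number and case (Endnote 21).
JEZIK_FORMS = {"jezik", "jezika", "jeziku", "jezikom", "jezici", "jezike", "jezicima"}


def slot_patterns(tokens, node_forms, span=5):
    """Count the words found in each slot L{span}..L1 and R1..R{span} around the node."""
    slots = {f"L{i}": Counter() for i in range(span, 0, -1)}
    slots.update({f"R{i}": Counter() for i in range(1, span + 1)})
    for pos, token in enumerate(tokens):
        if token not in node_forms:
            continue
        for offset in range(1, span + 1):
            if pos - offset >= 0:  # stay inside the text on the left
                slots[f"L{offset}"][tokens[pos - offset]] += 1
            if pos + offset < len(tokens):  # and on the right
                slots[f"R{offset}"][tokens[pos + offset]] += 1
    return slots


# 'Serbian language and Croatian language' (invented example phrase)
tokens = "srpski jezik i hrvatski jezik".split()
for slot, counts in slot_patterns(tokens, JEZIK_FORMS).items():
    print(slot, counts.most_common())
```

Keeping one counter per slot preserves word order around the node, which a single pooled L5-R5 count would discard.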
34
In the original text, the salient collocates are in bold. Because literal translations into English make for
poor readability, the translated text may include equivalents that are not identified as salient collocates in
the original text. In the translated text, both salient collocates and equivalents are underlined.
35
Nicholas I, Nikola Petrović (1841-1921), prince and king of Montenegro (see
http://www.britannica.com/EBchecked/topic/414057/Nicholas-I).
36
In a comprehensive study of nationalism in popular Serbian literature from the critical period between
1985 and 1995, Žunić (1999) suggests that Serbian literary Romanticism played a formative role in the
development of Serbian nationalism. Furthermore, he finds evidence of an instrumentalization of literature
in the Serbian nationalist project around the breakup of Yugoslavia. In line with the prevalent
understanding of the role of literature in Serbian society (briefly exemplified in Section 7), some of the
most prominent popular literary figures of this period whose works Žunić (1999) studies were or still are
members of SANU (e.g., Dobrica Ćosić, former President of the Federal Republic of Yugoslavia and one
of the authors of the SANU Memorandum).
References
Aarsleff, H. (1982). From Locke to Saussure: Essays on the study of language and
intellectual history. Minneapolis: University of Minnesota Press.
Abd-el-Jawad, H. R. S., & Al-Haq, F. A. A. (1997). The impact of the peace process in
the Middle East on Arabic. In M. Clyne (Ed.), Undoing and redoing corpus
planning (pp. 415-444). Berlin, New York: Mouton de Gruyter.
Althusser, L. (1971). Lenin and philosophy and other essays. London: New Left Books.
Anderson, B. (1983). Imagined communities. London: Verso.
Baker, P. (2004). Querying keywords: Questions of difference, frequency and sense in
keywords analysis. Journal of English Linguistics, 32(4), 346-359.
Baker, P. (2006). Using corpora in discourse analysis. London: Continuum.
Baker, P. (2010). Sociolinguistics and corpus linguistics. Edinburgh: Edinburgh
University Press.
Baker, P., Gabrielatos, C., KhosraviNik, M., Krzyzanowski, M., McEnery, T., &
Wodak, R. (2008). A useful methodological synergy? Combining critical
discourse analysis and corpus linguistics to examine discourses of refugees and
asylum seekers in the UK press. Discourse & Society, 19(3), 273-306.
Baker, P., Gabrielatos, C., & McEnery, T. (2013). Sketching Muslims: A corpus driven
analysis of representations around the word ‘Muslim’ in the British Press 1998-
2009. Applied Linguistics, 34(3), 255-278.
Barbour, S. (2000). Nationalism, language, Europe. In S. Barbour & C. Carmichael
(Eds.), Language and nationalism in Europe (pp. 1-17). Oxford: Oxford
University Press.
Barić, E., Hudeček, L., Koharović, N., Lončarić, M., Lukenda, M., Mamić, M.,
Mihaljević, M., Šarić, Lj., Švaćko, V., Vukojević, L., Zečević, V., & Žagar, M.
(1999). Hrvatski jezični savjetnik [Croatian language handbook]. Zagreb: Školske
Novine.
Bassi, E. (2010). A contrastive analysis of keywords in newspaper articles on the “Kyoto
Protocol”. In M. Bondi & M. Scott (Eds.), Keyness in texts (pp. 207-218).
Amsterdam: John Benjamins Publishing.
Bauman, R., & Briggs, C. L. (2000). Language philosophy as language ideology: John
Locke and Johann Gottfried Herder. In P. V. Kroskrity (Ed.), Language regimes:
Ideologies, polities, and identities (pp. 139-204). Santa Fe, New Mexico: School
of American Research Press.
Bauman, R., & Briggs, C. L. (2003). Voices of modernity: Language ideologies and the
politics of inequality. Cambridge: Cambridge University Press.
Berber Sardinha, T. (1999). Using key words in text analysis: Practical aspects. Direct
Papers 42, 1-9. ISSN 1413-442x.
Berber Sardinha, T. (2004). Lingüística de corpus. Barueri, São Paulo: Manole.
Biber, D. (1988). Variation across speech and writing. Cambridge: Cambridge
University Press.
Biber, D. (1993). Representativeness in corpus design. Literary and Linguistic
Computing, 8(4), 243-257.
Biber, D. (2006). University language: A corpus-based study of spoken and written
registers. Amsterdam: John Benjamins Publishing.
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at…: Lexical bundles in
university teaching and textbooks. Applied Linguistics, 25(3), 371-405.
Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language
structure and use. Cambridge: Cambridge University Press.
Biber, D., & Staples, S. (in press). Cluster analysis. In L. Plonsky (Ed.), Advancing
quantitative methods in second language learning. Routledge.
Biserko, S. (2012). Yugoslavia’s implosion: The fatal attraction of Serbian nationalism.
Belgrade: The Norwegian Helsinki Committee. Retrieved from
http://www.helsinki.org.rs/doc/yugoslavias%20implosion.pdf [Last accessed May
5, 2015]
Blackledge, A. (2005). Discourse and power in a multilingual world. Amsterdam: John
Benjamins Publishing.
Blackledge, A., & Pavlenko, A. (Eds.) (2002). Language ideologies in multilingual
contexts [Special Issue]. Multilingua, 21(2/3).
Blommaert, J. (Ed.) (1999). Language ideological debates. Berlin, New York: Mouton de
Gruyter.
Blommaert, J. (2005). Discourse: A critical introduction. Cambridge: Cambridge
University Press.
Blommaert, J. (2006a). Language ideology. In K. Brown (Ed.), Encyclopedia of language
& linguistics (pp. 510-522). Boston: Elsevier.
Blommaert, J. (2006b). Language policy and national identity. In T. Ricento (Ed.),
Introduction to language policy: Theory and method (pp. 238-253). Malden, MA:
Blackwell Publishing.
Blommaert, J., & Verschueren, J. (1998). The role of language in European nationalist
ideologies. In B. B. Schieffelin, K. A. Woolard & P. V. Kroskrity (Eds.),
Language ideologies: Practice and theory (pp. 189-210). New York: Oxford
University Press.
Bondi, M., & Scott, M. (Eds.) (2010). Keyness in texts. Amsterdam: John Benjamins.
Bourdieu, P. (1991). Language and symbolic power. Cambridge, MA: Harvard
University Press.
Brown, G., & Yule, G. (1983). Discourse analysis. Cambridge: Cambridge University
Press.
Bugarski, R. (2004). Language policies in the successor states of former Yugoslavia.
Journal of Language and Politics, 3(2), 189-207.
Carmichael, C. (2000). ‘A people exists and that people has its language’: Language and
nationalism in the Balkans. In S. Barbour & C. Carmichael (Eds.), Language and
nationalism in Europe (pp. 220-239). Oxford: Oxford University Press.
Chen, Y., & Baker, P. (2010). Lexical bundles in L1 and L2 academic writing. Language
Learning & Technology, 14(2), 30-49.
Cheng, W., & Lam, P. W. Y. (2013). Western perceptions of Hong Kong ten years on: A
corpus driven critical discourse study. Applied Linguistics, 34(2), 173-190.
Cortes, V., & Csomay, E. (Eds.) (2015). Corpus-based research in applied linguistics:
Studies in honor of Doug Biber. Amsterdam: John Benjamins.
Crapanzano, V. (2000). Serving the word: Literalism in America from the pulpit to the
bench. New York: New Press (distributed by W.W. Norton).
Culpeper, J. (2009). Keyness: Words, parts-of-speech and semantic categories in the
character-talk of Shakespeare’s Romeo and Juliet. International Journal of
Corpus Linguistics, 14(1), 29-59.
de Beaugrande, R. (1999). Discourse studies and the ideology of ‘liberalism’. Discourse
Studies, 1(3), 259-295.
DiGiacomo, S. M. (1999). Language ideological debates in an Olympic city: Barcelona
1992-1996. In J. Blommaert (Ed.), Language ideological debates (pp. 105-142). Berlin,
New York: Mouton de Gruyter.
Dirven, R., Hawkins, B., & Sandikcioglu, E. (Eds.) (2001). Language and ideology (Vols.
1 & 2). Philadelphia: John Benjamins Publishing Company.
Đoković, D., Hrvatin, S. B., & Petković, B. (2004). Media ownership and its influence on
independence and pluralism of media in Serbia and the region. Belgrade: Medija
centar. Retrieved from
http://www.mc.rs/upload/documents/biblioteka/vlasnistvomedija1.pdf [Last
accessed May 5, 2015]
Dronjic, V. (2011). Serbo-Croatian: The making and breaking of an ausbausprache.
Language Problems & Language Planning, 35(1), 1-14.
Durrant, P. (2009). Investigating the viability of a collocation list for students of English
for Academic Purposes. English for Specific Purposes, 28(3), 157-169.
Eagleton, T. (1991). Ideology: An introduction. London: Verso.
Edwards, J. (1985). Language, society, and identity. Oxford: Blackwell.
Ensslin, A. (2010). ‘Black and white’: Language ideologies in computer game discourse.
In S. Johnson & T. M. Milani (Eds.), Language ideologies and media discourse:
Texts, practices, politics (pp. 205-222). London: Continuum.
Ensslin, A., & Johnson, S. (2006). Language in the news: Investigating representations of
“Englishness” using WordSmith Tools. Corpora, 1(2), 153-185.
Erjavec, K. (2009). The Bosnian “war on terrorism”. Journal of Language and Politics,
8(1), 5-27.
Fairclough, N. (2001). Language and power (2nd ed.). Essex: Pearson Education
Limited.
Fairclough, N. (2010). Critical discourse analysis: The critical study of language (2nd
ed.). Harlow: Longman.
Fishman, J. A. (1972). Language and nationalism. Rowley: Newbury House Publishers.
Fishman, J. A. (1997). Language and ethnicity: The view from within. In F. Coulmas
(Ed.), The handbook of sociolinguistics (pp. 327-343). Oxford: Blackwell.
Fitzsimmons Doolan, S. (2009). Is public discourse about language policy really public
discourse about immigration? A corpus-based study. Language Policy, 8(4), 377-402.
Fitzsimmons Doolan, S. (2011). Identifying and describing language ideologies related
to Arizona educational language policy (Unpublished doctoral dissertation).
Northern Arizona University, Flagstaff, AZ. (UMI No. 3467048)
Fitzsimmons Doolan, S. (2014). Using lexical variables to identify language ideologies in
a policy corpus. Corpora, 9(1), 57-82.
Fleischer, A. A. (2007). The politics of language in Quebec: Language policy and
language ideologies in a pluriethnic society (Unpublished doctoral dissertation).
Georgetown University, Washington, D.C.
Ford, C. (2001). The (re-)birth of Bosnian: Comparative perspectives on language
planning in Bosnia-Herzegovina (Unpublished doctoral dissertation). University
of North Carolina at Chapel Hill, Chapel Hill, NC.
Foucault, M. (1972). The archaeology of knowledge. London: Tavistock.
Fought, C. (2006). Language and ethnicity. Cambridge: Cambridge University Press.
Fowler, R. (1991). Language in the news. London: Routledge.
Fraysee-Kim, S. H. (2010). Keywords in Korean national consciousness: A corpus-based
analysis of school textbooks. In M. Bondi & M. Scott (Eds.), Keyness in texts (pp.
219-234). Amsterdam: John Benjamins Publishing.
Freake, R. (2011). A cross-linguistic corpus-assisted discourse study of language
ideology in Canadian newspapers. Paper presented at the Corpus Linguistics
Conference, Birmingham, England. Retrieved from
http://www.birmingham.ac.uk/documents/college-artslaw/corpus/conference-archives/2011/paper-17.pdf
[Last accessed May 5, 2015]
Freake, R., Gentil, G., & Sheyholislami, J. (2011). A bilingual corpus-assisted discourse
study of the construction of nationhood and belonging in Quebec. Discourse &
Society, 22(1), 21-47.
Friedman, V. (1999). Linguistic emblems and emblematic languages: On language as
flag in the Balkans. Columbus: Department of Slavic and East European
Languages and Literatures at the Ohio State University.
Gal, S. (1998). Multiplicity and contention among language ideologies: A commentary.
In B. B. Schieffelin, K. A. Woolard & P. V. Kroskrity (Eds.), Language
ideologies: Practice and theory (pp. 317-331). New York: Oxford University
Press.
Gal, S. (2001). Linguistic theories and national images in nineteenth-century Hungary. In
S. Gal, & K. A. Woolard (Eds.), Languages and publics: The making of authority
(pp. 30-45). Manchester: St. Jerome.
Gal, S., & Woolard, K. A. (Eds.) (2001). Languages and publics: The making of
authority. Manchester: St. Jerome.
Gee, J. P. (2010). An introduction to discourse analysis: Theory and method. London:
Routledge.
Giddens, A. (1979). Central problems in social theory: Action, structure and
contradiction in social analysis. Berkeley and Los Angeles: University of
California Press.
Gramsci, A. (1971). Selections from the prison notebooks of Antonio Gramsci. London:
Lawrence & Wishart.
Gray, B., & Biber, D. (2013). Lexical frames in academic prose and
conversation. International Journal of Corpus Linguistics, 18(1), 109-135.
Greenberg, R. D. (2004). Language and identity in the Balkans: Serbo-Croatian and its
disintegration. Oxford: Oxford University Press.
Gröschel, B. (2009). Das Serbokroatische zwischen Linguistik und Politik: Mit einer
Bibliographie zum postjugoslawischen Sprachstreit [Serbo-Croatian between
linguistics and politics: With a bibliography on the post-Yugoslav language
conflict]. München: Lincom Europa.
Habermas, J. (1989). The structural transformation of the public sphere. Cambridge,
MA: MIT Press.
Halliday, M. A. K., & Matthiessen, C. M. I. M. (2004). Introduction to functional
grammar. New York: Routledge.
Hardt-Mautner, G. (1995). ‘Only connect’: Critical discourse analysis and corpus
linguistics. Retrieved from http://ucrel.lancs.ac.uk/papers/techpaper/vol6.pdf
[Last accessed May 5, 2015]
Haugen, E. (1972). Dialect, language, nation. In J. B. Pride & J. Holmes
(Eds.), Sociolinguistics (pp. 97-111). Harmondsworth: Penguin. (Originally
published in American Anthropologist 68 (1966): 922-935.)
Heller, M. (1999). Heated language in a cold climate. In J. Blommaert (Ed.), Language
ideological debates (pp. 143-172). Berlin, New York: Mouton de Gruyter.
Herzfeld, M. (1987). Anthropology through the looking-glass: Critical ethnography in
the margins of Europe. Cambridge: Cambridge University Press.
Hobsbawm, E. (1990). Nations and nationalism since 1780. Cambridge: Cambridge
University Press.
Hornberger, N. H., & McKay, S. L. (Eds.) (2010). Sociolinguistics and language
education. Bristol: Multilingual Matters.
Hult, F. M., & Pietikäinen, S. (2014). Shaping discourses of multilingualism through a
language ideological debate: The case of Swedish in Finland. Journal of
Language and Politics, 13(1), 1-20.
Hunston, S. (2002). Corpora in applied linguistics. Cambridge: Cambridge University
Press.
IBM (2012). Statistical Package for the Social Sciences.
Irvine, J. T. (1989). When talk isn’t cheap: Language and political economy. American
Ethnologist, 16(2), 248-267.
Irvine, J. T., & Gal, S. (2000). Language ideology and linguistic differentiation. In P. V.
Kroskrity (Ed.), Language regimes: Ideologies, polities, and identities (pp. 35-
83). Santa Fe, New Mexico: School of American Research Press.
Jaffe, A. (1999). Ideologies in action: Language politics on Corsica. Berlin: Mouton de
Gruyter.
Johnson, S., & Ensslin, A. (2007). Language in the media: Representations, identities,
ideologies. New York: Continuum.
Johnson, S., & Milani, T. M. (Eds.) (2010). Language ideologies and media discourse:
Texts, practices, politics. London: Continuum.
Johnson, S., Milani, T. M., & Upton, C. (2010). Language ideological debates on the
BBC ‘Voices’ website: Hypermodality in theory and practice. In S. Johnson & T.
M. Milani (Eds.), Language ideologies and media discourse: Texts, practices,
politics (pp. 223-251). London: Continuum.
Johnson, S., & Suhr, S. (2003). From ‘political correctness’ to ‘politische Korrektheit’:
Discourses of ‘PC’ in the German newspaper, Die Welt. Discourse & Society,
14(1), 49-68.
Katičić, R. (1997). Undoing a ‘unified language’: Bosnian, Croatian, Serbian. In M.
Clyne (Ed.), Undoing and redoing corpus planning (pp. 269-289). Berlin: Mouton
de Gruyter.
Kloss, H. (1967). ‘Abstand languages’ and ‘Ausbau languages’. Anthropological
Linguistics, 9(7), 29-41.
Kordić, S. (2010). Jezik i nacionalizam [Language and nationalism]. Zagreb: Durieux.
Kroskrity, P. V. (1998). Arizona Tewa Kiwa speech as a manifestation of a dominant
language ideology. In B. B. Schieffelin, K. A. Woolard & P. V. Kroskrity (Eds.),
Language ideologies: Practice and theory. New York: Oxford University Press.
Kroskrity, P. V. (Ed.) (2000a). Language regimes: Ideologies, polities, and identities.
Santa Fe, New Mexico: School of American Research Press.
Kroskrity, P. V. (2000b). Regimenting languages: Language ideological perspectives. In
P. V. Kroskrity (Ed.), Language regimes: Ideologies, polities, and identities (pp.
1-34). Santa Fe, New Mexico: School of American Research Press.
Kroskrity, P. V. (2004). Language ideologies. In A. Duranti (Ed.), A companion to
linguistic anthropology (pp. 496-517). Malden, MA: Blackwell Publishing.
Kuo, S., & Nakamura, M. (2005). Translation or transformation? A case study of
language and ideology in the Taiwanese press. Discourse & Society, 16(3), 393-
417.
Lippi-Green, R. (2007). English with an accent: Language, ideology, and discrimination
in the United States. London: Routledge.
Luuk, E. (2013). The structure and evolution of symbol. New Ideas in Psychology, 31(2),
87-97.
Mautner, G. (2007). Mining large corpora for social information: The case of elderly.
Language in Society, 36, 51-72.
May, S. (2001). Language and minority rights: Ethnicity, nationalism and the politics of
language. Harlow: Pearson.
McEnery, T., Xiao, R., & Tono, Y. (2006). Corpus-based language studies: An advanced
resource book. New York: Routledge.
McGroarty, M. (2008). The political matrix of linguistic ideologies. In B. Spolsky & F.
M. Hult (Eds.), The handbook of educational linguistics (pp. 98-112). Malden,
MA: Blackwell Publishing.
McGroarty, M. (2010). Language and ideologies. In N. H. Hornberger & S. L. McKay
(Eds.), Sociolinguistics and language education (pp. 3-39). Bristol: Multilingual
Matters.
Milroy, J. (2001). Language ideologies and the consequences of standardization. Journal
of Sociolinguistics, 5(4), 530-555.
Moskovljević, M. (1966). Rečnik savremenog srpskohrvatskog jezika s jezičkim
savetnikom [Dictionary of contemporary Serbo-Croatian with a language
handbook]. Beograd: Tehnička Knjiga i Nolit.
O’Rourke, B., & Ramallo, F. (2013). Competing ideologies of linguistic authority
amongst new speakers in contemporary Galicia. Language in Society, 42(3), 287-
305.
Partington, A. (2003). The linguistics of political argument: The spin-doctor and the
wolf-pack at the White House. London: Routledge.
Partington, A. (2010). Modern Diachronic Corpus-Assisted Discourse Studies (MD-
CADS) [Special Issue]. Corpora, 5(2).
Pennycook, A. (2001). Critical applied linguistics: A critical introduction. Mahwah, NJ:
Lawrence Erlbaum.
Pujolar, J. (2007). The future of Catalan: Language endangerment and nationalist
discourses in Catalonia. In A. Duchêne & M. Heller (Eds.), Discourses of
endangerment: Ideology and interest in the defence of languages (pp. 121-148).
London: Continuum.
Rayson, P. (2008). From key words to key semantic domains. International Journal of
Corpus Linguistics, 13(4), 519-549.
Reisigl, M., & Wodak, R. (2009). The discourse-historical approach (DHA). In R. Wodak
& M. Meyer (Eds.), Methods of critical discourse analysis: Theory and method
(pp. 87-121). London: SAGE.
Ricento, T. (Ed.) (2000). Ideology, politics and language policies: Focus on English.
Philadelphia: John Benjamins Publishing.
Ricento, T. (2003). The discursive construction of Americanism. Discourse & Society,
14(5), 611-637.
Ricento, T. (2006). Americanization, language ideologies and the construction of
European identities. In C. Mar-Molinero & P. Stevenson (Eds.), Language
ideologies, policies and practices: Language and the future of Europe (pp. 44-57).
Basingstoke: Palgrave Macmillan.
Rumsey, A. (1990). Wording, meaning and linguistic ideology. American Anthropologist,
92(2), 346-361.
Safran, W. (1999). Nationalism. In J. A. Fishman (Ed.), Handbook of language & ethnic
identity (pp. 77-93). Oxford: Oxford University Press.
Salama, A. H. Y. (2011). Ideological collocation and the recontextualization of Wahhabi-
Saudi Islam post-9/11: A synergy of corpus linguistics and critical discourse
analysis. Discourse & Society, 22(3), 315-342.
SANU (1986). Memorandum srpske akademije nauka i umetnosti (nacrt) [A draft
memorandum of the Serbian Academy of Sciences and Arts]. Retrieved from
http://www.helsinki.org.rs/serbian/doc/memorandum%20sanu.pdf [Last accessed
May 5, 2015]
Schieffelin, B. B., Woolard, K. A., & Kroskrity, P. V. (Eds.) (1998). Language
ideologies: Practice and theory. New York: Oxford University Press.
Scott, M. (1997). PC analysis of key words – and key key words. System, 25(2), 233-245.
Scott, M. (2009). In search of a bad reference corpus. In D. Archer (Ed.), What’s in a
word-list? Investigating word frequency and keyword extraction (pp. 79-92). Oxford:
Ashgate.
Scott, M. (2010). Problems in investigating keyness, or clearing the undergrowth and
marking out our trails… In M. Bondi & M. Scott (Eds.), Keyness in texts (pp. 43-
58). Amsterdam: John Benjamins Publishing.
Scott, M. (2014a). WordSmith Tools Help Manual. Version 6.0. Liverpool: Lexical
Analysis Software.
Scott, M. (2014b). WordSmith Tools. Liverpool: Lexical Analysis Software.
Scott, M. R., & Tribble, C. (2006). Key words and corpus analysis in language education.
Amsterdam: John Benjamins Publishing.
Seargeant, P. (2009). Language ideology, language theory, and the regulation of
linguistic behavior. Language Sciences, 31, 345-359.
Silverstein, M. (1979). Language structure and linguistic ideology. In R. Clyne, W.
Hanks & C. Hofbauer (Eds.), The elements: A parasession on linguistic units and
levels (pp. 193-247). Chicago: Chicago Linguistic Society.
Silverstein, M. (1993). Metapragmatic discourse and metapragmatic function. In J. A.
Lucy (Ed.), Reflexive language: Reported speech and metapragmatics (pp. 33-
58). Cambridge: Cambridge University Press.
Silverstein, M. (1998). The uses and utility of ideology: A commentary. In B. B.
Schieffelin, K. A. Woolard & P. V. Kroskrity (Eds.), Language ideologies:
Practice and theory (pp. 123-145). New York: Oxford University Press.
Silverstein, M. (2000). Whorfianism and the linguistic imagination of nationality. In P. V.
Kroskrity (Ed.), Language regimes: Ideologies, polities, and identities (pp. 85-
138). Santa Fe, New Mexico: School of American Research Press.
Sinclair, J. M. (1991). Corpus, concordance, collocation. Oxford: Oxford University
Press.
Skutnabb-Kangas, T. (2000). Linguistic genocide in education or worldwide diversity
and human rights? Mahwah, NJ: Lawrence Erlbaum Associates.
Spitulnik, D. (1998). Mediating unity and diversity: The production of language
ideologies in Zambian broadcasting. In B. B. Schieffelin, K. A. Woolard & P. V.
Kroskrity (Eds.), Language ideologies: Practice and theory (pp. 163-188). New
York: Oxford University Press.
Spolsky, B. (2004). Language policy. Cambridge: Cambridge University Press.
Stubbs, M. (1983). Discourse analysis: The sociolinguistic analysis of natural language.
Chicago: University of Chicago Press.
Stubbs, M. (1996). Text and corpus analysis. London: Blackwell.
Stubbs, M. (2010). Three concepts of keywords. In M. Bondi & M. Scott (Eds.), Keyness
in texts (pp. 21-42). Amsterdam: John Benjamins Publishing.
Subtirelu, N. C. (2015). “She does have an accent but…”: Race and language ideology in
students’ evaluations of mathematics instructors on RateMyProfessors.com.
Language in Society, 44(1), 35-62.
Sudetic, C. (1993, December 26). Balkan conflicts are uncoupling Serbo-Croatian. The
New York Times. Retrieved from
http://www.nytimes.com/1993/12/26/world/balkan-conflicts-are-uncoupling-
serbo-croatian.html [Last accessed May 5, 2015]
Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics. Boston, MA:
Pearson Education, Inc.
Thompson, J. B. (1984). Studies in the theory of ideology. Cambridge: Polity Press.
Tognini-Bonelli, E. (2001). Corpus linguistics at work. Amsterdam: John Benjamins
Publishing Company.
Tomić, Y. (n.d.). The ideology of Greater Serbia in the nineteenth and twentieth
centuries: An expert report. Paris: Bibliothèque de documentation internationale
contemporaine, Université de Paris X-Nanterre. Retrieved from
http://www.helsinki.org.rs/serbian/doc/expert%20report%20-
%20yves%20tomic.pdf [Last accessed May 5, 2015]
Tošović, B. (n.d.). Herausbildung des Bosnischen/Bosniakischen [The development of
Bosnian/Bosniak]. Retrieved from http://www-gewi.uni-
graz.at/gralis/Slawistikarium/BKS/Herausbildung_Bosnisch-Bosniakisch.pdf
[Last accessed May 5, 2015]
van Dijk, T. A. (1998). Ideology: A multidisciplinary approach. London: SAGE.
van Dijk, T. A. (2006). Ideology and discourse analysis. Journal of Political Ideologies,
11(2), 115-140.
Vessey, R. (2013a). Language ideologies and discourses of national identity in Canadian
newspapers: A cross-linguistic corpus-assisted discourse study (Unpublished
doctoral dissertation). University of London, London.
Vessey, R. (2013b). Too much French? Not enough French?: The Vancouver Olympics
and a very Canadian language ideological debate. Multilingua, 32(5), 659-682.
Vikør, L. S. (2000). Northern Europe: Languages as prime markers of ethnic and national
identity. In S. Barbour & C. Carmichael (Eds.), Language and nationalism in
Europe (pp. 105-129). Oxford: Oxford University Press.
Völkl, S. D. (2002). Bosnisch [Bosnian]. In M. Okuka (Ed.), Lexikon der Sprachen des
europäischen Ostens [The lexicon of East European languages] (Vol. 10, pp. 209-
218). Klagenfurt: Wieser. Retrieved from
http://wwwg.uni-klu.ac.at/eeo/Bosnisch.pdf [Last accessed May 5, 2015]
Voss, C. (2006). The Macedonian standard language: Tito-Yugoslav experiment or
symbol of ‘Great Macedonian’ ethnic inclusion? In C. Mar-Molinero & P.
Stevenson (Eds.), Language ideologies, policies and practices. Language and the
future of Europe (pp. 118-132). Basingstoke: Palgrave Macmillan.
Wallis, D. A. (1998). Language, attitude, and ideology: An experimental social-
psychological study. Journal of Pragmatics, 30, 21-48.
Warren, M. (2010). Identifying aboutgrams in engineering texts. In M. Bondi & M. Scott
(Eds.), Keyness in texts (pp. 113-126). Amsterdam: John Benjamins Publishing.
Wilce, J. (2010). Society, language, history and religion: A perspective on Bangla from
linguistic anthropology. In T. Omoniyi (Ed.), The sociology of language and
religion: Change, conflict, and accommodation (pp. 126-155). London: Palgrave
Macmillan.
Williams, R. (1976). Keywords: A vocabulary of culture and society. New York: Oxford
University Press.
Wodak, R. (2001). The discourse-historical approach. In R. Wodak & M. Meyer (Eds.),
Methods of critical discourse analysis (pp. 63-94). London: SAGE.
Wodak, R. (2004). Critical discourse analysis. In C. Seale, G. Gobo, J. F. Gubrium & D.
Silverman (Eds.), Qualitative research practice (pp. 197-213). London: SAGE.
Wodak, R. (2012). Language, power and identity. Language Teaching, 45(2), 215-233.
Wodak, R., de Cillia, R., Reisigl, M., & Liebhart, K. (1999). The discursive
construction of national identity. Edinburgh: Edinburgh University Press.
Wodak, R., & Meyer, M. (Eds.) (2009). Methods of critical discourse analysis: Theory
and method. London: SAGE.
Woolard, K. A. (1998). Introduction: Language ideology as a field of inquiry. In B. B.
Schieffelin, K. A. Woolard & P. V. Kroskrity (Eds.), Language ideologies:
Practice and theory (pp. 3-47). New York: Oxford University Press.
Woolard, K. A., & Schieffelin, B. B. (1994). Language ideology. Annual Review of
Anthropology, 23, 55-82.
Wright, S. (2004). Language policy and language planning: From nationalism to
globalization. New York: Palgrave Macmillan.
Xiao, R., & McEnery, T. (2005). Two approaches to genre analysis: Three genres in
Modern American English. Journal of English Linguistics, 33(1), 62-82.
Žunić, D. (1999). Nacionalizam i književnost: Srpska književnost 1985-1995
[Nationalism and literature: Serbian literature 1985-1995]. Budapest, Hungary: Open
Society Institute. Retrieved from
http://rss.archives.ceu.hu/archive/00001127/01/133.pdf [Last accessed May 5,
2015]
Appendices
Appendix A: Sampling Procedures
Although one of the main goals of the methodological synergy between CL and
CDA is to provide an objective and reliable sampling procedure, the data set compiled
here, as in many other corpus-based discourse studies, is too large for a comprehensive
analysis, even with the help of quantitative CL techniques. Existing research deals with
this issue in a variety of ways. Studies based on keyword analysis typically focus on a
limited number of high-scoring items, regardless of the actual number of items identified
as key. Studies based on collocation analysis, on the other hand, focus on limited sets of
search terms and collocates, as well as on random samples of concordance lines. To
reduce an unmanageably large number of concordance lines, Hunston (2002), for
example, suggests a downsampling technique based on a random selection of lines.
Somewhat similarly, Baker et al. (2008) suggest a focus on what they call consistent
collocates (i.e., items that appear as significant collocates of the target core concept(s)
throughout the timeframe rather than over isolated periods within it). Vessey (2013a), in
contrast, concentrates precisely on such peaks in discursive activity (e.g., around
significant events), focusing her CL searches on a small set of theoretically relevant items
identified in previous research. Whatever their practical and theoretical merits, however,
all such downsampling procedures incur a loss of information, and none of them
necessarily demonstrates its own validity.
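For illustration, a reproducible random downsample of concordance lines, of the kind Hunston (2002) proposes, takes only a few lines of Python. This is a minimal sketch rather than the procedure of any study cited here; the function name, the fixed seed, the sample size of 100, and the concordance lines themselves are hypothetical.

```python
import random

def downsample_concordance(lines, k, seed=2015):
    """Return a reproducible random sample of k concordance lines.

    If k is at least the number of available lines, all lines are
    returned unchanged. The fixed (hypothetical) seed lets the same
    sample be re-drawn later.
    """
    if k >= len(lines):
        return list(lines)
    rng = random.Random(seed)
    return rng.sample(lines, k)

# Hypothetical concordance lines for the node word "jezik"
lines = [f"... context {i} jezik context ..." for i in range(1, 501)]
sample = downsample_concordance(lines, 100)
print(len(sample))  # 100
```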
Exploratory factor analysis (EFA), on the other hand, is ideally suited for principled
analyses of large data sets with numerous variables, as it uses the covariance among
variables to produce sets of mutually correlated variables (factors) that can help the
researcher identify discourses and ideologies in the data objectively and with the promise
of minimal loss of information. For EFA to identify meaningful covariance and thus
produce useful results, however, variables must meet certain minimal requirements, such
as sufficient frequency per observation (here, per individual text; Douglas Biber, personal
communication).
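EFA itself is normally run in a statistics package, but the input it works from can be sketched: a text-by-variable matrix of normalized frequencies, screened for sparse variables, and the correlation matrix computed from it. In this minimal sketch the normalization per 1,000 words, the minimum mean count of 1.0 per text, and the variable names are illustrative assumptions; the source does not give a numeric threshold, and factor extraction and rotation are not shown.

```python
from statistics import mean

def build_matrix(texts, variables, min_mean_count=1.0):
    """Build a text-by-variable matrix of rates per 1,000 words.

    texts: list of (token_count, {variable: raw_count}) pairs, one per text.
    Variables whose mean raw count per text falls below min_mean_count
    are dropped (a hypothetical sparsity threshold).
    """
    retained = [v for v in variables
                if mean(counts.get(v, 0) for _, counts in texts) >= min_mean_count]
    matrix = [[1000 * counts.get(v, 0) / n_tokens for v in retained]
              for n_tokens, counts in texts]
    return retained, matrix

def pearson(x, y):
    """Pearson correlation; NaN if either variable has zero variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5

def correlation_matrix(matrix):
    """Correlations between all pairs of columns (variables)."""
    columns = list(zip(*matrix))
    return [[pearson(a, b) for b in columns] for a in columns]

# Three hypothetical texts with raw counts for three hypothetical variables
texts = [
    (500, {"narod": 3, "identitet": 2, "retko": 0}),
    (800, {"narod": 5, "identitet": 4, "retko": 1}),
    (400, {"narod": 1, "identitet": 1}),
]
retained, matrix = build_matrix(texts, ["narod", "identitet", "retko"])
corr = correlation_matrix(matrix)
print(retained)  # ['narod', 'identitet']
```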
In order to address this problem, the data were carefully examined from a variety of
standpoints, with particular attention to the hit count (i.e., the number of occurrences of
forms of the lemma JEZIK ‘language’) in individual articles and its relationship to article
content. Although hits can be counted, and articles examined on that basis, in various
ways, the ‘plot’ function in the WordSmith Tools 6.0 (WST; Scott, 2014b) concordancer
was initially used to sort articles by number of hits, as the quickest and most practical
solution. The top ten and bottom ten articles on the resulting list were then read in their
entirety with an eye to the potential usefulness of their content for the analysis of
language-related discourses and ideologies. As expected, this examination revealed that
the more hits an article had, the more likely it was to contain relevant material. For
example, while the top ten articles tended to have high numbers of hits (e.g., between 22
and 45 for the nominative form of the lemma JEZIK) and content in which language was
explicitly discussed, the bottom ten articles (like roughly 66% of all articles; see Table
4.6) all had a single hit and tended to mention language only in passing. Interestingly,
Vessey (2013a) found a similar pattern in Canadian newspaper data in English and
French; for illustrative examples of the pattern here, see Examples 1 and 2 below. On this
basis, hit count was taken as a reasonable indicator of content relevance and thus as a
valid sampling criterion. Further, based in part on the methodological constraints of EFA
(Douglas Biber, personal communication), the cutoff point for inclusion in the research
sample was set at 5 hits for the lemma JEZIK per article.
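The ranking step can be approximated outside WST as follows. This sketch is an assumption, not WST's algorithm: like the ‘plot’ function, it counts a single exact word form per article, then sorts articles from most to fewest hits so that the top and bottom ten can be read. Tokenization with the regular expression \w+ is a simplification, and the article IDs and texts are hypothetical.

```python
import re

TOKEN = re.compile(r"\w+")  # \w matches Unicode letters, so š, č, ć, ž, đ are kept

def count_form(text, form):
    """Count case-insensitive occurrences of one exact word form."""
    form = form.lower()
    return sum(1 for tok in TOKEN.findall(text.lower()) if tok == form)

def rank_articles(articles, form):
    """Return (article_id, hits) pairs sorted from most to fewest hits."""
    hits = {aid: count_form(text, form) for aid, text in articles.items()}
    return sorted(hits.items(), key=lambda item: item[1], reverse=True)

# Hypothetical articles keyed by ID
articles = {
    "a1": "Jezik je jedan, ali jezik nije samo sredstvo.",
    "a2": "Zbornik za književnost i jezik.",
    "a3": "jezik jezik jezik jezik jezik jezik",
}
ranking = rank_articles(articles, "jezik")
top_ten, bottom_ten = ranking[:10], ranking[-10:]
print(ranking)  # [('a3', 6), ('a1', 2), ('a2', 1)]
```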
Example 1 (excerpt from “The restoration of Serbian studies,” Politika, July 22,
2006, 45 hits)
Jezik jeste jedan, ali srpski, jeste jedinstven, ali ne i hrvatski, jeste zajednički, ali
samo po upotrebi, no nikako ne i po pripadnosti i poreklu. Akademik Pavle Ivić, u
svojim dijalektološkim studijama, dao je tvrde i nepobitne naučne dokaze da
štokavski dijalekt, po svome poreklu i svojoj prvobitnoj teritorijalnoj
rasprostranjenosti, obuhvata oblasti srednjevekovne srpske države (uglavnom do
reke Cetine). Nema nikakve sumnje da je tim jezikom govorio srpski narod i da je
to jezik srpskog naroda. Nauci je takođe dobro poznato da su čakavski i kajkavski
dijalekti nastali na tlu Hrvatske i da oni predstavljaju izvorni hrvatski jezik.
Hrvati su se svoga čakavskog i kajkavskog jezika odrekli u prvoj polovini 19.
veka i prihvatili su Vukov, srpski, štokavski govor. Tako je srpski jezik postao i
hrvatski, zajednički, srpski i hrvatski, srpski ili hrvatski, srpskohrvatski i
hrvatskosrpski. […]
The language [Central South Slavic] is one, but Serbian, it is unified, but not also
Croatian, it is common, but only in terms of use, not in terms of affiliation and
origin. Academician Pavle Ivić, in his dialectological studies, has given hard and
irrefutable evidence that the Štokavian dialect, in terms of its origin and its
original territorial spread, covers the area of the medieval Serbian state (mostly to
the river Cetina). There is absolutely no doubt that this was the language spoken
by the Serbian people and belonging to the Serbian people. At the same time, it
has been scientifically established that the Čakavian and Kajkavian dialects came
into being in the territory of Croatia and that they represent the original Croatian
language. The Croats gave up on their Čakavian and Kajkavian language (sic) in
the first half of the nineteenth century, accepting Vuk’s [Vuk Stefanović Karadžić,
nineteenth century Serbian grammarian and language reformer] Serbian,
Štokavian speech. Thus, the Serbian language also became Croatian, shared, Serbian
and Croatian, Serbian or Croatian, Serbo-Croatian or Croato-Serbian. […]
Example 2 (no title, Politika, February 17, 2003, 1 hit)
Večeras je u Novom Sadu održana svečana sednica Matice srpske povodom 177.
godišnjice postojanja ovog hrama naše kulture. Povodom 50. godišnjice Zbornika
Matice srpske za književnost i jezik besedio je prof. dr Jovan Delić. Na
večerašnjoj svečanosti pesniku Nikoli Vujičiću uručena je Zmajeva nagrada za
2002. godinu za knjigu pesama „Prepoznavanje" koju je izdalo Kulturno društvo
„Prosvjeta" iz Zagreba. Stihove laureata kazivao je dramski umetnik Ivan
Jagodić.
The Matica Srpska [Serbian Language and Literary Society] held a celebratory
session tonight in Novi Sad to mark the 177th anniversary of this temple of our
culture. To mark the 50th anniversary of the Matica Srpska Journal of Literature
and Language, an address was delivered by Dr. Jovan Delić. Also at tonight’s
ceremony, poet Nikola Vujičić received the Zmaj [Jovan Jovanović Zmaj,
nineteenth century Serbian poet] Award for the year 2002 for his book of poetry
titled “Recognition” which was published by the Zagreb-based “Enlightenment”
Cultural Society. A selection of the poet laureate’s verses was performed by
theater actor Ivan Jagodić.
The ‘plot’ function in the WST concordancer (see above), however, cannot calculate
the total number of hits for a lemma; it can only count hits for each lemma form
separately. In order to arrive at the total number of hits per article for all forms of the
lemma JEZIK, a custom Python application was used to compute the combined number of
hits for all lemma forms in each article and to sort articles into separate folders according
to that number. Following this, another custom Python program was used to calculate the
total number of articles per hit category (1, 2-4, 5-9, and 10+ hits) and publication. Using
the above-mentioned cutoff point of 5 hits per article (again, for all forms of the lemma
JEZIK combined), a total of 1,257 articles with 5+ hits were identified (see Tables 6-9 in
Section 4).
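The two custom Python programs themselves are not reproduced in the source. The following is a minimal sketch of the same steps under stated assumptions: the seven forms listed in Table A1 stand in for the full paradigm of JEZIK, tokenization uses \w+, publication labels are hypothetical, and results are returned in memory rather than written to folders.

```python
import re
from collections import Counter

# The seven forms of JEZIK listed in Table A1 (assumed to cover the paradigm)
JEZIK_FORMS = {"jezik", "jezika", "jeziku", "jezikom", "jezici", "jezike", "jezicima"}
TOKEN = re.compile(r"\w+")  # Unicode-aware word characters

def lemma_hits(text, forms=JEZIK_FORMS):
    """Total hits for all forms of the lemma in one article."""
    return sum(1 for tok in TOKEN.findall(text.lower()) if tok in forms)

def hit_category(n):
    """Map a hit count to the categories 1, 2-4, 5-9, and 10+."""
    if n >= 10:
        return "10+"
    if n >= 5:
        return "5-9"
    if n >= 2:
        return "2-4"
    return "1" if n == 1 else "0"

def tabulate(articles, cutoff=5):
    """articles: iterable of (publication, text) pairs.

    Returns a Counter of articles per (publication, category) and the
    list of articles at or above the sampling cutoff.
    """
    table = Counter()
    sample = []
    for publication, text in articles:
        n = lemma_hits(text)
        table[(publication, hit_category(n))] += 1
        if n >= cutoff:
            sample.append((publication, text))
    return table, sample

articles = [
    ("pub_a", "Jezik, jezika, jeziku, jezikom i jezici."),
    ("pub_a", "Govorio je o jeziku."),
    ("pub_b", "jezik, jezike i jezicima"),
]
table, sample = tabulate(articles)
print(table[("pub_a", "5-9")], len(sample))  # 1 1
```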
Another way to test hit count as a sampling criterion is to compare the 1-hit and 5+
hits sections directly in terms of the frequencies of the forms of the lemma JEZIK and of
its collocates that refer to the pertinent ethnolinguistic identities (Bosnian, Bosniak,
Croatian, Montenegrin, and Serbian), as these are the most obvious pointers to discourses
relevant to an analysis of the links between language ideologies and ethnonationalism
(Table A1). I also include the verb postoji ‘exists’, which here points to a pervasive
discourse and concomitant ideology of contestation.
Table A1
Comparison between the 1 Hit and 5+ Hits Sections of SERBCORP in Terms of
Language- and Identity-related Collocates
                             1-hit section             5+ hits section
Size (words)                 7,141,589                 1,118,529
Size (articles)              10,616                    1,257
Mean length in words (SD)    662.70 (621.03)           879.19 (728.90)
Lemma JEZIK                  Raw freq./per 1,000,000   Raw freq./per 1,000,000
jezik (language)             3,103/434                 4,644/4,151
jezika                       2,778/388                 4,565/4,081
jeziku                       3,020/422                 1,931/1,726
jezikom                      985/137                   577/515
jezici (languages)           151/21                    230/205
jezike                       313/43                    316/282
jezicima                     268/37                    267/238
Total                        10,168/1,423              12,530/11,202
Identity-related collocates  Raw freq./per 1,000,000   Raw freq./per 1,000,000
bosanski (Bosnian)           39/5.5                    165/147.5
bošnjački (Bosniak)          14/1.9                    116/103.7
crnogorski (Montenegrin)     34/4.75                   351/313.8
hrvatski (Croatian)          93/13                     363/324.5
srpski (Serbian)             2,122/297                 3,449/3,083
postoji (exists)             63/8.8                    161/143.9
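The normalized figures in Table A1 follow from dividing each raw frequency by the size of the section in words and multiplying by one million. A minimal sketch using the section sizes and two raw frequencies from the table is given here; the table appears to truncate rather than round (e.g., 4,644 in 1,118,529 words is 4,151.88 per million, reported as 4,151), so rounded output may differ in the last digit.

```python
def per_million(raw_freq, corpus_size):
    """Normalize a raw frequency to occurrences per 1,000,000 words."""
    return raw_freq / corpus_size * 1_000_000

# Section sizes in words, from Table A1
ONE_HIT_WORDS = 7_141_589
FIVE_PLUS_WORDS = 1_118_529

# bošnjački 'Bosniak': 14 hits in the 1-hit section, 116 in the 5+ hits section
low = per_million(14, ONE_HIT_WORDS)
high = per_million(116, FIVE_PLUS_WORDS)
print(f"{low:.2f} vs. {high:.1f} per million words (about {high / low:.0f} times)")
# prints: 1.96 vs. 103.7 per million words (about 53 times)
```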
Even a quick glance at the results of the frequency and collocation analyses of the
1-hit and 5+ hits sections shows that, while forms of the lemma JEZIK occur in both
sections of the corpus, they are roughly eight times more frequent overall in the 5+ hits
section (11,202 vs. 1,423 occurrences per million words), which is expected given the
selection criterion. More importantly, the pertinent ethnolinguistic collocates and the
verb postoji ‘exists’, as another indicator of relevant discourses (all with MI scores above
5), have considerably higher normalized frequencies in the 5+ hits section, again
suggesting that articles with more explicit references to language are comparatively more
relevant. Perhaps the starkest example here is the collocate ‘Bosniak’ (103.7 vs. 1.9
occurrences per million words), which, as a language label, is an indicator of a pervasive
discourse of contestation: many Serbian, but also Croatian, linguists, politicians, and
public figures argue that the language of Bosniaks can only be called Bosniak, after the
people, and not Bosnian, after the country, as the latter supposedly represents an attack
on separate Serbian and Croatian ethnolinguistic identities within Bosnia, even though all
three languages are official under the country’s constitution.
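The MI threshold used above refers to mutual information, a measure of collocational strength. A minimal sketch of the common pointwise formulation, log2 of observed over expected co-occurrence, is given below. Whether WST applies exactly this formula (in particular, whether and how it scales the expected frequency by the collocation window) is an assumption here, so the optional span parameter is illustrative and the counts are hypothetical.

```python
import math

def mutual_information(joint, node_freq, coll_freq, corpus_size, span=1):
    """Pointwise mutual information (base 2) for a node-collocate pair.

    joint: co-occurrences of node and collocate within the window
    node_freq: corpus frequency of the node (e.g., forms of JEZIK)
    coll_freq: corpus frequency of the collocate
    span: window size used to scale the expected co-occurrence;
          span=1 gives the unscaled formulation
    """
    expected = node_freq * coll_freq * span / corpus_size
    return math.log2(joint / expected)

# Hypothetical counts: 64 co-occurrences where 1 would be expected by chance
mi = mutual_information(64, 1024, 1024, 1_048_576)
print(mi, mi > 5)  # 6.0 True
```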
A final and perhaps most convincing piece of evidence of the greater relevance of
articles with a higher hit count, and thus of the validity of hit count as a sampling
criterion, comes from keyword analysis. The keyword list resulting from a comparison
between the 1-4 hits and 5+ hits sections of SERBCORP includes the lemma JEZIK and
the relevant ethnonyms and glottonyms, as well as a considerable number of other
potentially relevant items such as narod ‘people’, nacionalni ‘national’, naziv ‘[language]
label’, identitet ‘identity’, and nacija ‘nation’, which indicates their greater salience in the
5+ hits section (see Table 6.5). It therefore seems reasonable to conclude that, while traces
of relevant discourses and ideologies can also be found in texts that are less
language-focused (i.e., articles with a lower hit count), the sampled data set represents a
concentrated discourse of higher relevance to the study of the links between
language-related discourses and language ideologies, on the one hand, and ethnolinguistic
identities and ethnonationalism, on the other.
Appendix B: Comparative Analyses of Comparator Corpora
In order to assess the relative usefulness of different types of reference corpora, I
also compiled wordlists from three very large, newly available web-as-corpus (WaC)
corpora of Serbian1 for use as reference corpora, and compared them to SERBCOMP
(Table B1).
Table B1
Sizes of SERBCOMP and the WaC Reference Corpora
Corpus Size (in words)

SERBCOMP 22,493,804
SETIMES2 43,482,838
OPUS2 198,141,613
srWaC14 561,529,963
To determine the optimal reference corpus, keyword analyses were conducted with
SERBCOMP and with each of the WaC corpora as the reference corpus, using both the
chi-square and log-likelihood tests. The minimum KW frequency (5) and the p value
(.0000000001) were kept the same for all tests and corpus combinations. In a discussion
of measures of similarity between corpora, Baker (2010, pp. 91-93) suggests using
frequency and rank information and the Spearman rank correlation statistic to assess the
degree of similarity between multiple corpora. Adapting this technique to comparisons
between reference corpora, I used keyness scores rather than item frequency to assess
similarity. Thus, in order to compare the results of keyword analyses conducted using
SERBCOMP and the WaC corpora, I compared the (keyness-based) ranks of the top 100
KWs resulting from keyword analysis with SERBCOMP as the reference corpus to the
ranks of the same KWs resulting from keyword analyses with the WaC corpora as
reference corpora. The Spearman rank correlation coefficients (p < .01) were .62 for the
largest but most heterogeneous corpus, srWaC14, .61 for the somewhat smaller but more
coherent OPUS2, and .71 for the smallest and most coherent, SETIMES2. Since
SETIMES2 is a corpus compiled from news articles published in an online news outlet
(The Southeast European Times, www.setimes.com) and is thus the most similar to
SERBCOMP of the three WaC corpora, this result is to be expected.
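The rank comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: ranks are recomputed within the shared set of top-N keywords, keywords absent from the second list are skipped, ties receive average ranks, and Spearman's rho is obtained as the Pearson correlation of those ranks. No p value is computed. The first keyness list uses five scores from Table C1; the second list is hypothetical.

```python
def average_ranks(values):
    """Rank values from highest (rank 1) to lowest; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_top_n(ref_keyness, other_keyness, n=100):
    """Spearman's rho between the keyness ranks of the top-n keywords in
    ref_keyness and the ranks of the same keywords in other_keyness."""
    top = sorted(ref_keyness, key=ref_keyness.get, reverse=True)[:n]
    shared = [w for w in top if w in other_keyness]  # missing keywords skipped
    x = average_ranks([ref_keyness[w] for w in shared])
    y = average_ranks([other_keyness[w] for w in shared])
    return pearson(x, y)  # Pearson's r on ranks equals Spearman's rho

# Five keyness scores from Table C1 (SERBCOMP as the reference corpus)
serbcomp = {"jezik": 72780.53, "knjiga": 16361.50, "srpski": 9509.99,
            "škola": 7281.08, "pisac": 6678.77}
# Hypothetical scores for the same lemmas against another reference corpus
other = {"jezik": 50000.0, "knjiga": 9000.0, "srpski": 12000.0,
         "škola": 5000.0, "pisac": 4000.0}
print(round(spearman_top_n(serbcomp, other), 2))  # 0.9
```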
A careful analysis of the keyword lists produced with the different reference corpora
revealed that, despite their very different sizes (see Table B1), they yield comparable
results regardless of the statistic chosen. This lends support to Xiao & McEnery’s
(2005) claim that the size of the reference corpus may not be important,2 particularly for
keyword analyses of large research corpora. The analysis also confirmed Culpeper’s
(2009) finding that the chi-square and log-likelihood tests produce only negligibly
different keyword lists; log-likelihood was therefore used in all subsequent analyses. The
most notable difference between the keyword lists thus produced was in their noise levels,
which can be defined as the proportion of function words and semantically generic lexical
material, such as prepositions (e.g., by) and time adverbs with context-specific reference
(e.g., yesterday), identified as key. Unsurprisingly, the keyword list produced with
SERBCOMP as the reference corpus exhibited the lowest noise level. An additional
difficulty in using the WaC corpora as reference corpora was that keyword analyses were
based on comparisons of wordlists (rather than the corpora themselves),3 which presented
problems for lemmatization4 in WST. SERBCOMP as a comparator corpus was thus
determined to have two principal advantages. First, because both SERBCORP and
SERBCOMP consist of newspaper texts, the resulting keyword list is largely free from
items that characterize newspaper language in general, as well as from other items
contributing to noise; second, items identified as key can be expected to be characteristic
of language-related discourses. In other words, because SERBCORP and SERBCOMP
are very similar except in whether or not they contain texts mentioning language, the KWs
resulting from keyword analysis with SERBCOMP as the reference corpus are more likely
to be characteristic of newspaper articles mentioning or discussing language (rather than
of newspaper discourse in general), and so are more likely to be useful for the
identification of any discourses and ideologies pertaining to language (cf. “irrelevant
stylistic differences” in Culpeper, 2009, above). Based on these findings, a decision was
made to conduct all keyword analyses with SERBCOMP as the reference corpus and to
exclude the WaC corpora from further consideration.
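Culpeper's (2009) observation that the two statistics produce nearly identical keyword lists can be illustrated with a minimal sketch of both for a single word. The log-likelihood below is the common two-cell formulation for comparing a word's frequency in a study and a reference corpus, and the chi-square is Pearson's statistic for the full 2x2 table without Yates's correction. Whether WST computes either statistic in exactly this way is an assumption, and the counts are hypothetical.

```python
import math

def log_likelihood(a, b, c, d):
    """Two-cell log-likelihood (G2) for one word.

    a, b: frequency of the word in the study and reference corpus
    c, d: total size (in words) of the study and reference corpus
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def chi_square(a, b, c, d):
    """Pearson chi-square for the 2x2 table (word vs. other words) x (corpus),
    without Yates's correction."""
    n = c + d
    observed = [[a, b], [c - a, d - b]]
    row_totals = [a + b, n - a - b]
    col_totals = [c, d]
    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            total += (observed[i][j] - expected) ** 2 / expected
    return total

# Hypothetical word: 100 hits in one 10,000-word corpus, 50 in another
print(round(log_likelihood(100, 50, 10_000, 10_000), 2),
      round(chi_square(100, 50, 10_000, 10_000), 2))  # 16.99 16.79
```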
Appendix C: Keyword Analysis (SERBCORP)
To obtain a discursive profile of the research corpus as a whole, SERBCORP was
compared to SERBCOMP (the lists of positive and negative KWs are presented in Tables
C1 and C2). This analysis identified a total of 111 positive and 77 negative key
lemmas.41 As expected, the top positive key lemma in SERBCORP is jezik ‘language’,
which simply confirms that SERBCORP is about language. The top ten positive key
lemmas also include knjiga ‘book’, srpski ‘Serbian’, književnost ‘literature’, engleski
‘English’, škola ‘school’, pisac ‘writer’, roman ‘novel’, pesnik ‘poet’, and kultura
‘culture’. This suggests that SERBCORP predominantly consists of discussions of, or
references to, language in the contexts of literature, education, and culture, with a
prominent pair of identity-related items, Serbian and English, further suggesting a
pervasive discourse of national identity based on group membership (‘us’ and ‘them’).
The other top fifty positive key lemmas point to similar semantic fields: ethnonyms and
glottonyms as well as other identity-related nouns and pronouns (naš ‘our’, Srbi ‘Serbs’,
narod ‘people’, istorija ‘history’, francuski ‘French’, ja ‘I’, svoj ‘own’, moj ‘my’, ona
‘she’, and Crna Gora ‘Montenegro’); education (e.g., profesor ‘professor’, obrazovanje
‘education’, deca ‘children’, prosvete ‘education [profession]’, znanje ‘knowledge’,
nauka ‘science’, program ‘curriculum’); literature and publishing (e.g., izdavač
‘publisher’, nagrada ‘award’, autor ‘author’, urednik ‘editor’); and theater and film (e.g.,
umetnost ‘art’, predstava ‘[theater] play’, film ‘film’, pozorišta ‘theater’). Most
pertinently for our purposes, there is a remarkable absence of references to other regional
ethnolinguistic identities (with the possible exception of the key lemma ime ‘name’),
which suggests that general language-related newspaper discourse may not be ideally
suited for the study of the links between language-related discourses and language
ideologies, on the one hand, and ethnolinguistic identities, on the other.
The top ten negative key lemmas include Srbija ‘Serbia’, vlada ‘government’, evra
‘Euros’, miliona ‘millions’, odsto ‘percent’, dinara ‘Dinars’, zakon ‘law’, stranke
‘(political) parties’, protiv ‘against’, and predsednik ‘president’. This suggests that,
compared to general newspaper discourse (i.e., SERBCOMP), articles mentioning
language (SERBCORP) tend to include considerably fewer references to national
political and state institutions, as well as to finances. The remaining negative key
lemmas (e.g., ministar ‘minister’, država ‘state’, EU) likewise point to a relative absence
of references to national political and state institutions.
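The split into positive and negative key lemmas rests on the sign given to the keyness score: positive when a lemma is relatively more frequent in SERBCORP than in SERBCOMP, negative when it is relatively less frequent. A minimal sketch of that split follows, assuming the two-cell log-likelihood, a p value from the chi-square distribution with one degree of freedom, erfc(√(LL/2)), and the minimum frequency (5) and p threshold (.0000000001) given in Appendix B. It is not WST's implementation and need not reproduce the p values in Tables C1 and C2; the frequencies and corpus sizes are hypothetical, as is the reading of the minimum frequency as applying to the study corpus.

```python
import math

def signed_ll(a, b, c, d):
    """Log-likelihood keyness, negative when the word is relatively
    less frequent in the study corpus than in the reference corpus."""
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 2 * sum(o * math.log(o / e) for o, e in ((a, e1), (b, e2)) if o > 0)
    return ll if a / c >= b / d else -ll

def p_value(ll):
    """Upper-tail p for chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(abs(ll) / 2))

def key_items(study, ref, c, d, min_freq=5, alpha=1e-10):
    """Split words into positive and negative key items.

    study, ref: {word: frequency} for the study and reference corpus
    c, d: sizes of the study and reference corpus
    min_freq: minimum frequency in the study corpus (assumed reading)
    """
    positive, negative = [], []
    for word in set(study) | set(ref):
        a, b = study.get(word, 0), ref.get(word, 0)
        if a < min_freq:
            continue
        ll = signed_ll(a, b, c, d)
        if p_value(ll) < alpha:
            (positive if ll > 0 else negative).append((word, ll))
    positive.sort(key=lambda item: item[1], reverse=True)  # strongest first
    negative.sort(key=lambda item: item[1])                 # most negative first
    return positive, negative

# Hypothetical frequencies in two corpora of 100,000 words each
study = {"jezik": 200, "vlada": 10, "retko": 2}
ref = {"jezik": 10, "vlada": 400, "retko": 1}
positive, negative = key_items(study, ref, 100_000, 100_000)
print([w for w, _ in positive], [w for w, _ in negative])  # ['jezik'] ['vlada']
```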
Table C1
Positive Key Lemmas in SERBCORP (by Keyness Score)
N Keyword (English) Keyword (Serbian) Freq. % Texts RC Freq. RC % Keyness P
1 language jezik 33834 0.29 6517 1 72780.53 0.0000000000
2 book knjiga 22712 0.19 3535 10369 0.05 16361.50 0.0000000000
3 Serbian srpski 35745 0.31 3720 32565 0.14 9509.99 0.0000000000
6 school škola 12984 0.11 1362 7491 0.03 7281.08 0.0000000000
7 writer pisac 7477 0.06 1687 2633 0.01 6678.77 0.0000000000
8 novel roman 5908 0.05 1287 2166 5120.75 0.0000000000
9 poet pesnik 2907 0.02 857 600 3541.36 0.0000000000
10 culture kultura 9544 0.08 1172 8355 0.04 2762.23 0.0000000000
12 our naš 32116 0.28 3284 43596 0.19 2242.57 0.0000000000
13 world svet 14916 0.13 2489 17017 0.08 2148.98 0.0000000000
14 Serbs Srbi 9874 0.08 1669 9918 0.04 2073.39 0.0000000000
15 life život 12633 0.11 3228 14019 0.06 1991.74 0.0000000000
16 people narod 8529 0.07 1667 8257 0.04 1966.10 0.0000000000
17 history istorija 6990 0.06 1015 6267 0.03 1922.79 0.0000000000
19 publisher izdavač 2454 0.02 1020 1065 1850.34 0.0000000000
20 I ja 22907 0.20 3820 30315 0.13 1817.01 0.0000000000
21 art umetnost 5876 0.05 991 5011 0.02 1793.73 0.0000000000
22 award nagrada 5714 0.05 1037 4956 0.02 1685.40 0.0000000000
23 own svoj 43119 0.37 4169 64375 0.29 1673.43 0.0000000000
24 author autor 5520 0.05 1948 4725 0.02 1672.44 0.0000000000
25 word reč 15127 0.13 4482 18651 0.08 1638.93 0.0000000000
26 my moj 9172 0.08 1790 9866 0.04 1590.91 0.0000000000
28 children deca 9665 0.08 1456 10786 0.05 1496.38 0.0000000000
30 love ljubav 3050 0.03 978 2236 1222.29 0.0000000000
31 theater play predstava 4601 0.04 857 4259 0.02 1178.86 0.0000000000
32 work delo 3289 0.03 2119 2736 0.01 1054.13 0.0000000000
33 knowledge znanje 3355 0.03 858 2910 0.01 989.41 0.0000000000
34 science nauka 3731 0.03 1204 3468 0.02 946.90 0.0000000000
35 story priča 9748 0.08 3126 12432 0.06 916.19 0.0000000000
36 film film 7155 0.06 1105 8872 0.04 757.21 0.0000000000
37 doctor dr 5362 0.05 2203 6243 0.03 719.92 0.0000000000
38 program of study studija 2937 0.03 882 2775 0.01 717.56 0.0000000000
39 text tekst 5182 0.04 1413 5996 0.03 711.01 0.0000000000
40 wrote napisao 1844 0.02 1399 1451 655.18 0.0000000000
41 alphabet pismo 4667 0.04 865 5401 0.02 639.96 0.0000000000
42 Monte(negro) Crna 7474 0.06 824 9931 0.04 580.66 0.0000000000
43 (Monte)negro Gora 8250 0.07 812 11299 0.05 548.57 0.0000000000
44 one jedan 35933 0.31 7258 59267 0.26 545.43 0.0000000000
45 program/curriculum program 7711 0.07 1570 10476 0.05 535.17 0.0000000000
46 editor urednik 1600 0.01 1085 1307 530.83 0.0000000000
47 theater pozorišta 2067 0.02 988 2036 456.21 0.0000000000
48 she ona 12336 0.11 4913 18787 0.08 410.14 0.0000000000
49 name ime 6627 0.06 2440 9292 0.04 386.30 0.0000000000
50 here ovde 6153 0.05 3213 8589 0.04 368.01 0.0000000000
51 his njegov 20865 0.18 2728 34010 0.15 363.81 0.0000000000
52 part deo 14419 0.12 3981 22718 0.10 357.13 0.0000000000
53 experience iskustvo 3092 0.03 1135 3766 0.02 351.32 0.0000000000
54 picture slika 3168 0.03 1261 3970 0.02 320.85 0.0000000000
55 youth mladi 3415 0.03 1340 4380 0.02 312.91 0.0000000000
56 always uvek 7295 0.06 4195 10746 0.05 310.79 0.0000000000
57 born rođen 1174 0.01 949 1081 304.37 0.0000000000
58 many mnogi 7031 0.06 2371 10358 0.05 299.35 0.0000000000
59 her njen 7002 0.06 1551 10316 0.05 297.97 0.0000000000
60 live žive 2338 0.02 1729 2775 0.01 292.91 0.0000000000
61 woman žena 3900 0.03 1149 5254 0.02 282.68 0.0000000000
62 Belgrade (adj.) beogradski 4401 0.04 850 6097 0.03 274.72 0.0000000000
63 first prvi 21664 0.19 5387 36303 0.16 267.38 0.0000000000
64 today danas 10922 0.09 5569 17349 0.08 250.07 0.0000000000
65 man čovek 23017 0.20 2663 38926 0.17 249.37 0.0000000000
66 topic tema 4097 0.04 1184 5723 0.03 244.03 0.0000000000
67 father otac 1602 0.01 1017 1823 232.58 0.0000000000
68 age doba 1666 0.01 1277 1962 214.73 0.0000000000
69 war rat 6446 0.06 1078 9871 0.04 204.88 0.0000000000
70 death smrti 1762 0.02 1192 2165 193.38 0.0000000000
71 community zajednica 4381 0.04 868 6445 0.03 188.31 0.0000000000
72 common zajednički 2242 0.02 946 2937 0.01 186.44 0.0000000000
73 generation generacije 1096 889 1187 186.16 0.0000000000
74 sometimes ponekad 1462 0.01 1162 1751 177.12 0.0000000000
75 America Americi 1319 0.01 854 1557 168.56 0.0000000000
76 past prošlosti 1372 0.01 981 1651 163.28 0.0000000000
77 foreign strani 12907 0.11 1688 21659 0.10 156.14 0.0000000000
78 Yugoslavia Jugoslavije 2462 0.02 1531 3409 0.02 154.11 0.0000000000
79 person ličnosti 1474 0.01 1113 1835 153.41 0.0000000000
80 idea ideja 3168 0.03 1387 4585 0.02 151.94 0.0000000000
81 second drugi 30613 0.26 4548 54094 0.24 150.84 0.0000000000
82 abroad inostranstvu 1298 0.01 935 1605 138.88 0.0000000000
83 scene sceni 1723 0.01 1163 2276 0.01 137.69 0.0000000000
84 space prostor 5020 0.04 1297 7859 0.03 131.81 0.0000000000
85 this ovaj 49317 0.42 4897 89477 0.40 120.73 0.0000000000
86 every svaki 12309 0.11 2478 20959 0.09 120.22 0.0000000000
87 house kuća 1897 0.02 1386 2621 0.01 120.17 0.0000000000
88 Belgrade Beograd 19233 0.17 3386 33602 0.15 120.01 0.0000000000
89 you vi 7922 0.07 1306 13072 0.06 119.35 0.0000000000
90 voice/vote glas 1151 857 1439 117.70 0.0000000000
91 opinion mišljenje 3248 0.03 946 4940 0.02 109.05 0.0000000000
92 large veliki 18380 0.16 3411 32248 0.14 105.32 0.0000000000
93 world (adj.) svetski 5387 0.05 814 8708 0.04 102.92 0.0000000000
94 right pravo 9921 0.09 2455 16852 0.07 100.55 0.0000000000
95 desire želja 1375 0.01 809 1893 88.83 0.0000000000
96 Serbia and Montenegro SCG 2523 0.02 1078 3843 0.02 83.71 0.0000000000
97 work rad 8687 0.07 2785 14832 0.07 81.27 0.0000000000
98 never nikad 5219 0.04 1290 8643 0.04 75.14 0.0000000000
99 most often najčešće 1445 0.01 1145 2067 74.65 0.0000000000
100 carry nosi 1048 935 1420 73.72 0.0000000000
101 society društvo 5464 0.05 1323 9127 0.04 70.33 0.0000000000
102 truth istina 2307 0.02 1148 3596 0.02 62.99 0.0000000000
103 city grad 4841 0.04 1385 8095 0.04 61.42 0.0000000000
104 find naći 3008 0.03 1266 4924 0.02 49.86 0.0000000000
105 conversation razgovor 1051 843 1536 47.25 0.0000000000
106 little mali 9949 0.09 1240 17645 0.08 45.01 0.0000000000
107 task zadatak 1458 0.01 866 2247 43.93 0.0000000000
108 decade decenije 1066 916 1588 41.91 0.0000000000
109 there tamo 4117 0.04 2543 6997 0.03 41.36 0.0000000000
110 emphasize ističe 1895 0.02 1472 3039 0.01 39.37 0.0000000000
111 all the time stalno 1776 0.02 1375 2851 0.01 36.52 0.0000000001
Table C2
Negative Key Lemmas in SERBCORP (by Keyness Score)
N Keyword (English) Keyword (Serbian) Freq. % Texts RC. Freq. RC. % Keyness P
1 Serbia Srbija 33742 0.29 2431 93159 0.41 -3359.86 0.0000000000
3 Euros evra 3045 0.03 1325 15874 0.07 -3107.69 0.0000000000
5 percent odsto 5838 0.05 1923 22713 0.10 -2594.47 0.0000000000
6 Dinars dinara 2690 0.02 1060 11867 0.05 -1759.88 0.0000000000
7 law zakon 5689 0.05 1069 19899 0.09 -1733.10 0.0000000000
8 parties stranke 2288 0.02 1047 10165 0.05 -1527.36 0.0000000000
9 against protiv 5439 0.05 2939 18154 0.08 -1376.63 0.0000000000
11 authorities vlast 5740 0.05 1262 17482 0.08 -966.86 0.0000000000
12 money novac 3009 0.03 1163 10517 0.05 -913.74 0.0000000000
13 minister ministar 4453 0.04 1485 13981 0.06 -865.05 0.0000000000
14 decision odluka 4129 0.04 912 13130 0.06 -849.51 0.0000000000
15 citizens građani 4283 0.04 873 13444 0.06 -831.09 0.0000000000
16 EU EU 3393 0.03 875 10873 0.05 -722.24 0.0000000000
17 state država 10880 0.09 2251 26849 0.12 -484.33 0.0000000000
18 case slučaj 5020 0.04 1289 13753 0.06 -475.39 0.0000000000
19 public javnost 3582 0.03 938 10303 0.05 -449.71 0.0000000000
20 larger veći 4342 0.04 1194 11978 0.05 -428.86 0.0000000000
21 system sistem 3988 0.03 1152 10931 0.05 -378.75 0.0000000000
22 director direktor 4308 0.04 2047 11611 0.05 -368.04 0.0000000000
23 prime minister premijera 1298 0.01 877 4429 0.02 -358.90 0.0000000000
24 former bivši 1163 896 4033 0.02 -342.56 0.0000000000
25 problem problem 8628 0.07 2392 20854 0.09 -318.88 0.0000000000
26 year godina 66508 0.57 7372 139140 0.62 -298.02 0.0000000000
27 time vreme 19337 0.17 6382 43233 0.19 -295.25 0.0000000000
28 solution rešenje 2675 0.02 1011 7465 0.03 -282.97 0.0000000000
29 day dan 13186 0.11 2168 30134 0.13 -268.22 0.0000000000
30 clearly jasno 2464 0.02 1852 6790 0.03 -241.75 0.0000000000
31 week nedelje 1401 0.01 1148 4240 0.02 -228.71 0.0000000000
32 development razvoj 3534 0.03 1185 9132 0.04 -226.23 0.0000000000
33 group grupa 5265 0.05 1301 12934 0.06 -225.25 0.0000000000
34 result rezultat 1052 875 3307 0.01 -205.43 0.0000000000
35 now sada 13005 0.11 6602 29006 0.13 -191.79 0.0000000000
36 publicly javno 1200 0.01 927 3603 0.02 -188.37 0.0000000000
37 last prošle 2693 0.02 2056 7041 0.03 -187.51 0.0000000000
38 expect očekuje 1334 0.01 1106 3828 0.02 -165.29 0.0000000000
39 parliament skupštine 1684 0.01 1024 4590 0.02 -154.44 0.0000000000
40 choice izbor 4185 0.04 1285 9934 0.04 -129.72 0.0000000000
41 political politički 10025 0.09 1651 22155 0.10 -129.10 0.0000000000
42 number broj 8152 0.07 3245 18272 0.08 -128.79 0.0000000000
43 nobody niko 3674 0.03 2526 8821 0.04 -127.41 0.0000000000
44 means sredstva 1190 0.01 871 3301 0.01 -121.47 0.0000000000
45 earlier ranije 2036 0.02 1659 5184 0.02 -116.69 0.0000000000
46 be able to moći 42024 0.36 1782 86394 0.38 -114.45 0.0000000000
47 process proces 2077 0.02 994 5254 0.02 -113.17 0.0000000000
48 momentarily trenutno 1688 0.01 1364 4354 0.02 -106.62 0.0000000000
49 account računa 1052 875 2903 0.01 -104.07 0.0000000000
50 moment trenutku 2289 0.02 1711 5648 0.03 -101.71 0.0000000000
51 affairs poslova 1634 0.01 1023 4199 0.02 -100.40 0.0000000000
52 now sad 5543 0.05 2973 12467 0.06 -91.76 0.0000000000
53 possible moguće 2430 0.02 1830 5834 0.03 -84.22 0.0000000000
54 five pet 4137 0.04 2844 9446 0.04 -83.18 0.0000000000
55 less manje 3841 0.03 2753 8824 0.04 -83.17 0.0000000000
56 help pomoći 1236 0.01 986 3173 0.01 -75.37 0.0000000000
57 reason razlog 3839 0.03 1096 8736 0.04 -74.00 0.0000000000
58 situation situacija 1300 0.01 1023 3281 0.01 -69.57 0.0000000000
59 state stanje 3319 0.03 1042 7590 0.03 -68.01 0.0000000000
60 six šest 2608 0.02 1993 6055 0.03 -63.82 0.0000000000
61 ministry ministarstvo 4059 0.03 1124 9085 0.04 -62.85 0.0000000000
62 institution institucija 3133 0.03 953 7145 0.03 -62.06 0.0000000000
63 question pitanje 12767 0.11 3667 26711 0.12 -57.01 0.0000000000
64 Kosovo Kosova 3086 0.03 1165 6961 0.03 -53.10 0.0000000000
65 say reći 13062 0.11 2110 27137 0.12 -48.48 0.0000000000
66 power/forces snage 1148 866 2813 0.01 -48.01 0.0000000000
67 goal cilj 2369 0.02 1461 5394 0.02 -45.99 0.0000000000
68 nothing ništa 4976 0.04 3212 10772 0.05 -45.60 0.0000000000
69 get dobiti 8347 0.07 888 17602 0.08 -45.05 0.0000000000
70 persons lica 1410 0.01 927 3352 0.01 -44.36 0.0000000000
71 plan plan 2555 0.02 861 5743 0.03 -41.93 0.0000000000
72 largest najveći 5070 0.04 1664 10897 0.05 -40.71 0.0000000000
73 bad loše 1050 877 2540 0.01 -39.08 0.0000000000
74 far daleko 1596 0.01 1298 3696 0.02 -37.94 0.0000000000
75 ten deset 2621 0.02 1946 5838 0.03 -37.88 0.0000000000
76 immediately odmah 2636 0.02 2028 5868 0.03 -37.78 0.0000000000
77 last poslednji 4936 0.04 1049 10577 0.05 -37.39 0.0000000000
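The Keyness column in Tables C1 and C2 is signed: positive key lemmas are relatively more frequent in SERBCORP than in the reference corpus, and negative key lemmas are relatively less frequent. The appendix does not state which keyness statistic the software applied, so the following is only a minimal sketch assuming Dunning's log-likelihood (G2), a common choice for keyword analysis. The function name and both corpus sizes are hypothetical, chosen for illustration rather than taken from the study, so the sketch is not expected to reproduce the scores printed above; the raw frequencies come from the Freq. and RC. Freq. columns.

```python
import math


def signed_log_likelihood(freq_study: int, freq_ref: int,
                          size_study: int, size_ref: int) -> float:
    """Dunning's log-likelihood (G2) for one word, signed by direction.

    Positive: the word is relatively more frequent in the study corpus
    (a positive key lemma). Negative: it is relatively less frequent
    (a negative key lemma).
    """
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected frequencies if the word had the same relative frequency in both corpora
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size

    g2 = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # the term 0 * log(0) is taken as 0
            g2 += observed * math.log(observed / expected)
    g2 *= 2.0

    # Attach a negative sign when the study corpus underuses the word
    if freq_study / size_study < freq_ref / size_ref:
        g2 = -g2
    return g2


# Hypothetical corpus sizes, for illustration only (not the study's figures)
STUDY_SIZE = 11_600_000
REF_SIZE = 22_700_000

# Raw frequencies from the Freq. and RC. Freq. columns of Tables C1 and C2
examples = [
    ("strani 'foreign'", 12907, 21659),   # positive key lemma in Table C1
    ("Srbija 'Serbia'", 33742, 93159),    # negative key lemma in Table C2
]

for lemma, f_study, f_ref in examples:
    score = signed_log_likelihood(f_study, f_ref, STUDY_SIZE, REF_SIZE)
    print(f"{lemma}: {score:.2f}")
```

With these assumed sizes, strani receives a positive score and Srbija a negative one, matching the direction (though not the magnitude) reported in the tables.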
Appendix D: Keyword Analysis (5+ Hits Section of SERBCORP)
Table D1
Positive Key Lemmas in the 5+ Hits Section of SERBCORP (by Keyness Score)
N Keyword (English) Keyword (Serbian) Freq. % Texts RC. Freq. RC. % Keyness P
1 language jezik 12530 1.12 1118 1 76538.41 0.0000000000
2 Serbian srpski 7309 0.65 670 32565 0.14 9779.72 0.0000000000
3 linguistic jezički 901 0.08 112 10 5387.05 0.0000000000
4 school škola 2593 0.23 237 7491 0.03 5050.31 0.0000000000
8 book knjiga 2499 0.22 371 10369 0.05 3583.33 0.0000000000
10 literary književni 1027 0.09 128 1288 3210.07 0.0000000000
12 learn učiti 825 0.07 70 1245 2369.50 0.0000000000
14 instruction nastava 710 0.06 107 1016 2091.33 0.0000000000
15 writer pisac 998 0.09 182 2633 0.01 2073.11 0.0000000000
16 grade razred 609 0.05 87 650 2033.88 0.0000000000
17 word reč 2633 0.24 502 18651 0.08 1941.47 0.0000000000
18 poetry poezija 543 0.05 63 526 1881.56 0.0000000000
19 alphabet pismo 1243 0.11 169 5401 0.02 1702.15 0.0000000000
20 translator prevodilac 395 0.04 102 232 1605.55 0.0000000000
21 people narod 1510 0.13 234 8257 0.04 1601.00 0.0000000000
23 culture kultura 1485 0.13 230 8355 0.04 1519.44 0.0000000000
25 students (K-12) učenici 637 0.06 120 1397 1492.42 0.0000000000
27 translation prevod 478 0.04 111 629 1462.75 0.0000000000
28 Montenegrin crnogorski 696 0.06 106 2006 1357.09 0.0000000000
30 novel roman 678 0.06 122 2166 1221.90 0.0000000000
31 learning učenje 352 0.03 129 343 1217.01 0.0000000000
32 subject predmet 770 0.07 147 2938 0.01 1193.64 0.0000000000
33 poet pesnik 402 0.04 89 600 1160.62 0.0000000000
35 Serbs Srbi 1432 0.13 250 9918 0.04 1093.62 0.0000000000
37 minority manjina 491 0.04 111 1320 1006.48 0.0000000000
38 speak govoriti 1764 0.16 90 14569 0.06 992.19 0.0000000000
39 use (n.) upotreba 585 0.05 72 2054 975.53 0.0000000000
43 speech govor 487 0.04 111 1636 842.68 0.0000000000
44 students (K-8) đaci 262 0.02 110 326 821.57 0.0000000000
45 science nauka 668 0.06 189 3468 0.02 753.63 0.0000000000
48 children deca 1294 0.12 220 10786 0.05 714.75 0.0000000000
49 wrote pisali 1012 0.09 76 7364 0.03 713.61 0.0000000000
50 edition izdanje 437 0.04 72 1700 665.47 0.0000000000
51 century vek 855 0.08 72 6062 0.03 628.97 0.0000000000
52 cultural kulturni 655 0.06 95 3896 0.02 623.19 0.0000000000
54 foreign strani 2004 0.18 265 21659 0.10 598.10 0.0000000000
57 exam ispit 305 0.03 63 910 579.48 0.0000000000
58 doctor dr 843 0.08 314 6243 0.03 577.16 0.0000000000
59 expression izraz 310 0.03 103 999 554.76 0.0000000000
61 German nemački 460 0.04 123 2301 0.01 541.69 0.0000000000
62 class period čas 542 0.05 63 3156 0.01 530.37 0.0000000000
63 world svet 1624 0.15 235 17017 0.08 528.78 0.0000000000
64 Vuk (Karadžić) Vuk 482 0.04 103 2575 0.01 525.54 0.0000000000
65 Croats hrvati 315 0.03 117 1111 523.22 0.0000000000
66 SANU SANU 235 0.02 84 582 509.44 0.0000000000
67 Spanish španski 201 0.02 63 414 488.93 0.0000000000
68 identity identitet 331 0.03 94 1356 480.11 0.0000000000
69 academician akademik 210 0.02 74 558 433.97 0.0000000000
70 label naziv 442 0.04 119 2660 0.01 413.93 0.0000000000
71 scientific naučni 255 0.02 65 947 404.94 0.0000000000
72 our naš 3182 0.28 341 43596 0.19 392.44 0.0000000000
73 name ime 943 0.08 252 9292 0.04 360.34 0.0000000000
74 politics politika 1475 0.13 897 17075 0.08 355.97 0.0000000000
75 own svoj 4336 0.39 440 64375 0.29 343.75 0.0000000000
76 Monte(negro) Gora 1072 0.10 93 11299 0.05 343.32 0.0000000000
77 (Monte)negro Crna 971 0.09 86 9931 0.04 337.32 0.0000000000
78 second drugi 3666 0.33 534 54094 0.24 301.76 0.0000000000
79 my/mine moj 929 0.08 165 9866 0.04 291.30 0.0000000000
80 author autor 540 0.05 170 4725 0.02 270.30 0.0000000000
81 schooling školovanje 160 0.01 62 558 268.30 0.0000000000
82 published objavljen 273 0.02 89 1595 265.93 0.0000000000
83 Russian ruski 525 0.05 106 4619 0.02 259.79 0.0000000000
84 writing pisanje 172 0.02 75 708 248.32 0.0000000000
86 publisher izdavač 211 0.02 82 1065 245.91 0.0000000000
87 understand razumeti 287 0.03 62 1847 244.68 0.0000000000
88 literature literature 96 83 188 240.43 0.0000000000
89 spirit duh 275 0.02 64 1755 237.30 0.0000000000
90 I ja 2152 0.19 346 30315 0.13 230.32 0.0000000000
91 letters (a, b, c…) slova 100 72 227 229.30 0.0000000000
92 tradition tradicija 252 0.02 64 1559 227.21 0.0000000000
93 nation nacija 305 0.03 66 2155 225.56 0.0000000000
94 parents roditelji 314 0.03 106 2264 0.01 224.71 0.0000000000
95 text tekst 584 0.05 128 5996 0.03 200.77 0.0000000000
96 sentence rečenica 153 0.01 64 737 187.91 0.0000000000
97 today danas 1306 0.12 567 17349 0.08 185.88 0.0000000000
98 study studija 333 0.03 101 2775 0.01 183.93 0.0000000000
99 award nagrada 490 0.04 80 4956 0.02 175.20 0.0000000000
100 title naslov 246 0.02 63 1783 174.54 0.0000000000
101 lectures predavanja 100 72 394 150.48 0.0000000000
102 reality stvarnost 201 0.02 65 1413 149.85 0.0000000000
103 art umetnost 473 0.04 104 5011 0.02 149.29 0.0000000000
104 life život 1036 0.09 249 14019 0.06 135.40 0.0000000000
105 wrote napisao 194 0.02 141 1451 130.56 0.0000000000
106 one jedan 3595 0.32 684 59267 0.26 126.54 0.0000000000
107 work delo 288 0.03 168 2736 0.01 120.16 0.0000000000
108 Vojvodina Vojvodini 171 0.02 72 1316 109.51 0.0000000000
109 people's narodni 435 0.04 65 4981 0.02 108.71 0.0000000000
110 love ljubav 241 0.02 79 2236 106.18 0.0000000000
111 here ovde 664 0.06 306 8589 0.04 106.04 0.0000000000
112 many mnogi 768 0.07 238 10358 0.05 101.94 0.0000000000
113 self sebe 752 0.07 271 10149 0.05 99.50 0.0000000000
114 notion pojam 86 64 457 94.36 0.0000000000
115 Italian italijanski 85 67 452 93.18 0.0000000000
116 part deo 1475 0.13 378 22718 0.10 91.27 0.0000000000
117 interest interesovanje 160 0.01 80 1336 88.02 0.0000000000
118 example primer 571 0.05 342 7455 0.03 87.65 0.0000000000
119 special poseban 633 0.06 75 8470 0.04 87.14 0.0000000000
120 every svaki 1363 0.12 261 20959 0.09 85.31 0.0000000000
121 she ona 1227 0.11 449 18787 0.08 79.15 0.0000000000
122 form oblik 183 0.02 64 1734 76.81 0.0000000000
123 born rođen 131 0.01 77 1081 73.74 0.0000000000
124 often često 426 0.04 275 5427 0.02 72.44 0.0000000000
125 that onaj 1511 0.14 167 24063 0.11 72.35 0.0000000000
126 sense smisao 404 0.04 74 5093 0.02 71.65 0.0000000000
127 common zajednički 263 0.02 72 2937 0.01 71.12 0.0000000000
128 opinion mišljenje 392 0.04 109 4940 0.02 69.61 0.0000000000
129 your(s) vaš 215 0.02 68 2431 0.01 55.93 0.0000000000
130 age doba 183 0.02 131 1962 55.83 0.0000000000
131 both oba 149 0.01 121 1491 54.75 0.0000000000
132 introduction uvođenje 185 0.02 74 2005 54.71 0.0000000000
133 same isti 1104 0.10 171 17538 0.08 53.93 0.0000000000
134 live žive 235 0.02 167 2775 0.01 53.01 0.0000000000
135 creation stvaranje 187 0.02 65 2057 52.93 0.0000000000
136 generation generacije 125 0.01 98 1187 52.20 0.0000000000
137 phenomenon pojava 164 0.01 76 1759 49.98 0.0000000000
138 experience iskustvo 293 0.03 97 3766 0.02 48.04 0.0000000000
139 difference razlika 419 0.04 101 5836 0.03 47.50 0.0000000000
140 change menja 138 0.01 107 1433 46.01 0.0000000000
141 sometimes ponekad 159 0.01 121 1751 44.85 0.0000000000
142 story priča 794 0.07 237 12432 0.06 43.48 0.0000000000
143 newspaper novina 101 76 953 42.81 0.0000000000
144 community zajednica 446 0.04 95 6445 0.03 41.44 0.0000000000
145 their(s) njihov 1265 0.11 188 21010 0.09 41.29 0.0000000000
146 population stanovništva 156 0.01 91 1766 40.43 0.0000000000
147 they oni 5019 0.45 579 92128 0.41 38.69 0.0000000000
148 most often najčešće 173 0.02 127 2067 37.46 0.0000000000
149 first prvi 2077 0.19 480 36303 0.16 37.10 0.0000000000
150 past prošlosti 145 0.01 95 1651 36.89 0.0000000000
151 topic tema 395 0.04 107 5723 0.03 36.15 0.0000000001
Appendix E: Keyword Analysis (5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus)
Table E1
Positive Key Lemmas in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus (by Keyness Score)
N Keyword (English) Keyword (Serbian) Freq. % Texts RC. Freq. RC. % Keyness P
1 language jezik 12530 1.12 1118 21304 0.20 18516.54 0.0000000000
2 Serbian srpski 7309 0.65 670 28440 0.27 3799.77 0.0000000000
3 linguistic jezički 901 0.08 112 792 2043.82 0.0000000000
5 school škola 2593 0.23 237 10392 0.10 1269.04 0.0000000000
7 alphabet pismo 1243 0.11 169 3424 0.03 1108.27 0.0000000000
9 word reč 2633 0.24 502 12494 0.12 879.24 0.0000000000
10 instruction nastava 710 0.06 107 1467 0.01 875.24 0.0000000000
11 learn učiti 825 0.07 70 2015 0.02 851.36 0.0000000000
12 Montenegrin crnogorski 696 0.06 106 1452 0.01 849.83 0.0000000000
13 English engleski 1220 0.11 271 4033 0.04 839.02 0.0000000000
16 literature književnost 1305 0.12 269 4765 0.05 760.37 0.0000000000
17 subject predmet 770 0.07 147 1997 0.02 740.22 0.0000000000
19 use (n.) upotreba 585 0.05 72 1249 0.01 697.87 0.0000000000
21 education (profession) prosvete 563 0.05 219 1388 0.01 574.70 0.0000000000
23 grade razred 609 0.05 87 1672 0.02 545.16 0.0000000000
24 literary književni 1027 0.09 128 3959 0.04 541.68 0.0000000000
25 people narod 1510 0.13 234 7019 0.07 530.86 0.0000000000
27 learning učenje 352 0.03 129 627 497.70 0.0000000000
28 foreign strani 2004 0.18 265 10904 0.10 449.44 0.0000000000
30 students (K-12) učenici 637 0.06 120 2163 0.02 419.59 0.0000000000
33 Vuk (Karadžić) Vuk 482 0.04 103 1540 0.01 349.22 0.0000000000
34 culture kultura 1485 0.13 230 8098 0.08 330.50 0.0000000000
35 speech govor 487 0.04 111 1685 0.02 311.04 0.0000000000
36 speak govoriti 1764 0.16 90 10282 0.10 309.94 0.0000000000
37 minority manjina 491 0.04 111 1758 0.02 295.94 0.0000000000
38 translator prevodilac 395 0.04 102 1249 0.01 290.68 0.0000000000
39 Croats hrvati 315 0.03 117 929 256.25 0.0000000000
40 science nauka 668 0.06 189 3063 0.03 242.74 0.0000000000
41 Serbs Srbi 1432 0.13 250 8442 0.08 240.77 0.0000000000
42 translation prevod 478 0.04 111 1998 0.02 214.29 0.0000000000
43 class period čas 542 0.05 63 2441 0.02 205.62 0.0000000000
45 doctor dr 843 0.08 314 4519 0.04 198.29 0.0000000000
46 second drugi 3666 0.33 534 26947 0.26 187.05 0.0000000000
47 expression izraz 310 0.03 103 1128 0.01 181.64 0.0000000000
50 label naziv 442 0.04 119 1995 0.02 166.80 0.0000000000
51 wrote pisali 1012 0.09 76 6015 0.06 164.69 0.0000000000
52 German nemački 460 0.04 123 2186 0.02 152.83 0.0000000000
53 Spanish španski 201 0.02 63 630 149.85 0.0000000000
54 exam ispit 305 0.03 63 1222 0.01 149.17 0.0000000000
55 SANU SANU 235 0.02 84 827 145.87 0.0000000000
56 name ime 943 0.08 252 5684 0.05 144.97 0.0000000000
57 identity identitet 331 0.03 94 1400 0.01 144.68 0.0000000000
58 children deca 1294 0.12 220 8371 0.08 144.52 0.0000000000
60 scientific naučni 255 0.02 65 1005 128.83 0.0000000000
61 French francuski 477 0.04 140 2466 0.02 125.47 0.0000000000
62 Russian ruski 525 0.05 106 2827 0.03 121.69 0.0000000000
63 poetry poezija 543 0.05 63 2988 0.03 117.18 0.0000000000
64 introduction uvođenje 185 0.02 74 645 116.65 0.0000000000
65 percent odsto 816 0.07 193 5022 0.05 114.84 0.0000000000
66 writer pisac 998 0.09 182 6480 0.06 109.40 0.0000000000
67 Monte(negro) Gora 1072 0.10 93 7178 0.07 99.97 0.0000000000
68 understand razumeti 287 0.03 62 1339 0.01 99.90 0.0000000000
69 be able to moći 4619 0.41 186 37405 0.35 90.76 0.0000000000
70 (Monte)negro Crna 971 0.09 86 6503 0.06 90.45 0.0000000000
71 academician akademik 210 0.02 74 900 89.19 0.0000000000
72 cultural kulturni 655 0.06 95 4147 0.04 81.12 0.0000000000
73 example primer 571 0.05 342 3576 0.03 74.36 0.0000000000
74 nation nacija 305 0.03 66 1622 0.02 73.55 0.0000000000
75 Vojvodina Vojvodini 171 0.02 72 740 71.08 0.0000000000
76 sentence rečenica 153 0.01 64 640 68.47 0.0000000000
77 today danas 1306 0.12 567 9616 0.09 65.65 0.0000000000
78 letters (a, b, c…) slova 100 72 344 64.47 0.0000000000
79 students (K-8) đaci 262 0.02 110 1426 0.01 58.63 0.0000000000
80 same isti 1104 0.10 171 8156 0.08 54.05 0.0000000000
81 poet pesnik 402 0.04 89 2505 0.02 53.54 0.0000000000
82 schooling školovanje 160 0.01 62 769 51.61 0.0000000000
84 difference razlika 419 0.04 101 2665 0.03 50.77 0.0000000000
85 book knjiga 2499 0.22 371 20214 0.19 49.74 0.0000000000
86 century vek 855 0.08 72 6192 0.06 48.64 0.0000000000
87 parents roditelji 314 0.03 106 1893 0.02 48.21 0.0000000000
89 change (v.) menja 138 0.01 107 674 42.65 0.0000000000
90 law zakon 687 0.06 112 5002 0.05 37.58 0.0000000000
Appendix F: Collocation Analysis (SERBCORP)
Collocation analysis of the lemma JEZIK conducted on SERBCORP as a whole
produced a total of 368 lemma collocates of the lemma JEZIK (Tables F1 and F2). Table
F1 shows the lemma collocates by frequency. Unsurprisingly for a corpus of Serbian, the
most frequent lemma collocate of the lemma JEZIK is srpski ‘Serbian’ with 8,063
occurrences in 4,486 texts. Equally unsurprisingly, the second most frequent lemma
collocate is engleski ‘English’ (3,167 occurrences in 2,500 texts), which, in Serbia as in
many other countries around the world, is seen as the most important foreign language
(followed here by French, German, Russian, and Spanish). The remaining top ten
lemma collocates of JEZIK by frequency are strani ‘foreign’, govoriti ‘speak’, svoj
‘own’, maternji ‘mother (tongue)’, naš ‘our’, književnost ‘literature’, svi ‘all’, and
francuski ‘French’. As with the keyword analysis of SERBCORP above, the most frequent
collocates of JEZIK suggest a pervasive discourse of national identity based on the
routinized construction of in- and out-groups (Serbian, own, mother tongue, and our vs.
English, foreign, and French). Further, as in the keyword analysis, the
remainder of the top 50 collocates indicates the semantic fields of education (učiti ‘learn’,
škola ‘school’, profesor ‘professor’, učenje ‘learning’, nastava ‘instruction’, znanje
‘knowledge’, fakultet ‘school [university]’, and matematika ‘mathematics’), literature and
translation (književnost ‘literature’, preveden ‘translated’, književni ‘literary’, prevod
‘translation’, and prevoditi ‘translate’), and culture (kultura ‘culture’).
Interestingly, however, collocates seem to be more sensitive to identity-related
discourses than keywords even at the level of SERBCORP. Thus, we see hrvatski
‘Croatian’, zajednički ‘common’, nacionalni ‘national’, crnogorski ‘Montenegrin’, narod
‘people’, istorija ‘history’, postoji ‘exists’, and ime ‘name’, hinting at the discourse of
contestation mentioned above. The presence of identity-related discourses is further
suggested by the collocate manjina ‘minority’ as well as the glottonyms referring to the
two largest minority groups in Serbia, albanski ‘Albanian’ and mađarski ‘Hungarian’.
The ten collocates with the highest MI scores (see Table F2), on the other hand, exhibit a
more opaque pattern. Zli ‘evil’ refers to the common metonymy zli jezici ‘evil tongues’
(or, rather, ‘malicious tongues’). We also see an eclectic mix of items such as the verb
izučavati ‘to study’, the attributive adjective razumljiv ‘comprehensible’, the plural noun
brojki ‘numerals’, the singular noun geografija ‘geography’, and the glottonym švedski
‘Swedish’. More interestingly, the most significant collocates of the lemma JEZIK in
SERBCORP also include službeni and zvaničan, both meaning ‘official’, as well as hrvatski
‘Croatian’ and južnoslovenski ‘South-Slavic’. The remainder of the top 50 most
significant collocates is similarly opaque in terms of patterns, but we do see a number of
potentially interesting items such as manjinski ‘minority (adj.)’, različit ‘different’,
ravnopravan ‘equal’, zajednički ‘common’, maternji ‘mother (tongue)’, jedinstvo ‘unity’,
preimenovanje ‘renaming’, standardizacija ‘standardization’, etnički ‘ethnic’, and uvesti
‘introduce’, as well as bosanski ‘Bosnian’, jugoslovenske ‘Yugoslav’, and a rare reference
to bilingualism, dvojezično ‘bilingual’. Finally, one entirely new pattern suggests the
presence of a discourse of linguistic transparency and perhaps purity/authenticity, signaled by the
attributive adjectives jednostavnim ‘simple’, jasnim ‘clear’, and čisti ‘pure’. In sum, then,
the top collocates of the lemma JEZIK in SERBCORP seem to show patterns similar to
those exhibited by the key lemmas, while ranking by frequency seems to offer better insight into the
discursive profile of the corpus than ranking by statistical significance (MI score).
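Why the two rankings diverge can be illustrated with the MI statistic itself. Zli ‘evil’ tops Table F2 with only 73 co-occurrences, whereas a high-frequency collocate such as svoj ‘own’ (1,442 co-occurrences) has an MI score below 6. The sketch below uses the basic pointwise formulation MI = log2(O / E); some concordancers also scale the expected value by the width of the collocation window, so this is an assumption rather than the exact computation behind Tables F1 and F2. The function name and all counts are hypothetical, chosen only to show how a rare collocate can outrank a frequent one on MI.

```python
import math


def mutual_information(cooccurrences: int, node_freq: int,
                       collocate_freq: int, corpus_size: int) -> float:
    """Basic pointwise MI (log base 2) of a node word and a collocate.

    MI = log2(O / E), with E = f(node) * f(collocate) / N.
    Window-span adjustments used by some concordancers are omitted.
    """
    expected = node_freq * collocate_freq / corpus_size
    return math.log2(cooccurrences / expected)


# Hypothetical counts, for illustration only (not taken from SERBCORP)
CORPUS_SIZE = 10_000_000
NODE_FREQ = 20_000  # frequency of the node lemma, e.g., JEZIK

# (label, corpus frequency of the collocate, co-occurrences with the node)
candidates = [
    ("frequent collocate", 60_000, 8_000),
    ("rare collocate", 150, 70),
]

by_frequency = sorted(candidates, key=lambda c: c[2], reverse=True)
by_mi = sorted(
    candidates,
    key=lambda c: mutual_information(c[2], NODE_FREQ, c[1], CORPUS_SIZE),
    reverse=True,
)

print("Ranked by co-occurrence frequency:", [c[0] for c in by_frequency])
print("Ranked by MI score:", [c[0] for c in by_mi])
```

Here the frequent collocate co-occurs about 67 times more often than chance predicts (MI of roughly 6.1), while the rare one co-occurs about 233 times more often (MI of roughly 7.9), so the rare item ranks first by MI but last by frequency. Because MI rewards exclusivity rather than volume, it tends to surface idioms and low-frequency items such as zli jezici, which is consistent with the more opaque pattern observed in Table F2.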
Table F1
Lemma Collocates of the Lemma JEZIK ‘Language’ in SERBCORP (by Frequency)
N Collocate (English) Collocate (Serbian) MI Score Texts Freq.
5 own svoj 5.67 1080 1442
6 mother (adj.) maternji 10.12 801 1265
7 our naš 5.90 1026 1262
9 all svi 5.08 902 1068
11 second drugi 5.21 781 1022
16 learn učiti 6.41 519 714
17 translated preveden 7.57 660 696
18 school škola 5.85 501 673
19 word reč 6.02 543 644
22 use upotreba 8.35 366 538
27 two dva 5.02 381 484
28 say kazati 6.78 436 478
33 people narod 5.89 340 455
34 knowledge znanje 7.23 391 453
36 translation prevod 5.84 369 413
38 same isti 5.59 321 402
39 wrote pisali 6.44 337 402
40 their njihov 7.41 328 374
41 history istorija 5.88 315 372
42 school (university) fakultet 5.85 298 351
43 man čovek 6.30 316 347
44 exist postojati 5.64 278 328
45 Spanish španski 8.77 253 326
46 mathematics matematika 8.40 219 323
47 world (adj.) svetski 6.06 301 323
48 name ime 5.74 198 320
49 minority manjina 7.27 209 312
50 good dobar 5.80 284 301
53 people's narodni 6.37 194 278
54 published objavljen 6.84 252 274
55 Roma (adj.) romski 7.17 146 273
57 three tri 5.05 217 259
59 Cyrillic ćirilica 5.88 173 256
61 label naziv 6.86 179 255
63 several nekoliko 5.11 227 238
64 Serbs Srbi 6.95 176 234
65 knowledge poznavanje 8.85 208 231
66 poem pesma 5.94 205 229
67 course kurs 8.11 184 226
69 many mnogi 8.77 204 220
70 Greek grčki 7.80 173 219
72 must morati 6.70 195 207
73 edition izdanje 6.43 187 205
74 mean (v.) značiti 5.81 194 204
75 class period čas 6.20 164 197
76 use (v.) koristiti 9.73 172 197
77 translator prevodilac 5.37 178 197
78 understand razumeti 5.34 178 191
79 student (university) student 5.52 157 189
80 media mediji 6.22 161 187
82 science nauka 5.76 146 182
83 text tekst 8.81 162 181
84 number broj 6.84 152 179
86 education obrazovanje 5.63 156 176
87 four četiri 5.49 154 174
89 European evropski 8.57 149 173
90 Italian italijanski 6.81 156 168
92 others ostali 5.16 148 164
93 Romanian rumunski 7.08 126 160
94 state država 7.67 135 159
95 hair dlake 10.56 151 158
96 speech govor 6.31 127 155
97 instructor nastavnik 6.63 113 154
98 Bosniak bošnjački 7.02 87 152
99 exam ispit 6.79 117 152
100 introduction uvođenje 7.65 112 150
101 group grupa 8.61 120 148
102 textbook udžbenik 6.32 122 146
103 Latin latinski 8.81 115 143
105 teach predavati 10.71 112 143
106 philological filološki 7.80 133 142
107 publish objaviti 5.39 136 141
108 novel roman 6.95 129 140
109 best najbolji 8.65 130 136
111 five pet 5.15 123 132
112 said rečeno 9.00 125 129
113 music muzika 5.48 118 127
114 Vuk (Karadžić) Vuk 6.58 100 127
115 orthography pravopis 8.68 85 125
116 law zakon 5.05 100 124
117 faith vera 6.51 108 122
118 Slovene slovenački 5.97 112 121
119 program of study studija 5.52 105 121
122 life život 5.60 111 117
123 university univerzitet 5.04 108 116
124 Bulgarian bugarski 8.36 83 115
125 serve služiti 9.32 105 115
126 represent predstavljati 5.07 106 114
127 film (adj.) filmski 5.48 101 112
128 customs običaji 8.21 107 112
129 area oblast 5.71 100 111
130 SANU SANU 6.86 77 111
131 hatred mržnje 7.80 91 109
132 call zvati 8.19 81 109
133 board odbor 5.60 65 107
134 various razni 9.14 99 107
136 communication komunikacija 5.02 89 106
137 students (K-12) učenici 5.39 89 106
138 protection zaštita 5.50 94 106
139 Japanese japanski 8.28 79 105
140 Slovak slovački 7.43 87 105
141 grammar gramatika 8.65 82 104
142 think misliti 7.37 97 104
143 self sebe 7.59 92 101
144 Arabic arapski 9.31 86 100
145 church crkva 6.89 86 99
146 excellent odlično 5.82 96 98
147 poet pesnik 8.67 90 98
148 know poznavati 9.47 88 95
149 six šest 5.34 89 95
150 doctor dr 6.52 86 94
151 Chinese kineski 8.02 76 94
152 compulsory obavezan 6.78 78 93
153 level nivo 5.25 78 92
154 standard standardni 7.63 55 92
155 desire (v.) želeti 5.09 84 92
156 literature literatura 6.71 84 88
157 expression izraz 6.04 74 85
160 territory prostor 7.66 80 84
161 attend pohađati 7.42 66 83
162 poetic pesnički 5.89 62 82
163 Croats Hrvati 6.16 52 80
164 department odsek 7.77 63 80
165 high school gimnazija 5.78 72 79
166 need (n.) potreba 5.03 71 79
167 framework okvir 5.03 66 76
168 style stil 6.81 71 76
169 so-called takozvani 6.05 59 75
170 instructional nastavni 5.54 55 73
171 translation prevođenje 11.04 64 73
172 belong pripadati 6.18 60 73
173 evil zli 13.67 68 73
174 against protiv 8.25 64 72
176 Turkish turski 5.95 60 71
177 minority manjinski 11.05 48 69
178 both oba 6.77 58 69
179 Latin (alphabet) latinica 6.82 51 68
180 Polish poljski 8.55 62 68
181 needed potreban 5.33 64 67
182 exclusively isključivo 6.37 63 66
183 everyday svakodnevni 6.21 62 66
184 study (v.) izučavati 12.16 53 65
185 paper list 5.51 64 65
186 Macedonian makedonskom 9.40 58 64
187 Ruthenian rusinski 7.87 42 64
188 read čitati 6.05 59 63
189 ordinary običan 5.66 62 63
190 preservation očuvanje 7.34 59 63
191 comprehensible razumljiv 11.12 62 63
192 study (v.) studirati 5.10 53 63
193 fluently tečno 7.64 63 63
194 third treći 5.14 53 62
195 simultaneously istovremeno 5.02 57 61
196 classical klasični 7.27 40 61
197 appear pojaviti 6.24 58 61
198 defense odbrana 5.05 44 60
199 geography geografija 11.13 52 59
200 students (K-8) đaci 5.61 55 58
201 beautiful lep 5.58 53 58
202 listen slušati 7.12 50 58
203 test test 6.99 44 58
204 nature priroda 5.16 54 57
205 high (school) srednji 5.50 53 56
206 Cyrillic (adj.) ćirilično 7.12 47 55
207 hear čuti 10.20 54 55
208 unique jedinstven 7.26 35 54
209 novelty novina 7.30 51 54
210 create stvarati 5.62 43 53
211 association udruženje 5.01 44 53
212 public javni 5.18 42 52
213 writing pisanje 5.68 46 52
214 computer računaru 8.93 50 52
215 element element 5.55 35 51
216 informing informisanje 5.49 46 51
217 necessary neophodan 5.30 46 51
218 again ponovo 6.46 48 50
219 follow pratiti 5.49 49 50
220 make praviti 7.53 45 50
221 picture slika 7.78 47 50
222 hundred sto 6.75 45 50
223 help pomoć 6.60 45 49
224 printed štampan 7.60 48 49
225 scientific naučni 7.64 43 48
226 significance značaj 5.33 44 48
227 magazine/journal časopis 6.01 39 47
228 abroad (n.) inostranstvu 5.78 41 47
229 interest interesovanje 5.42 45 47
230 Karadžić Karadžić 5.63 45 47
231 private privatni 9.29 33 47
232 environment sredina 5.01 34 47
233 election (adj.) izborni 5.67 32 46
234 universal univerzalni 9.34 42 46
235 pure čisti 9.80 42 45
236 Priština (adj.) prištinski 9.43 45 45
237 Roma Roma 6.25 20 45
238 influence uticaj 5.57 35 45
239 religion religija 8.77 41 44
240 schooling školovanje 5.72 42 44
241 teachers učitelji 6.24 35 44
242 readers čitaoci 5.16 43 43
243 philosophy filozofija 7.76 30 43
244 institute matica 5.47 38 43
245 show pokazivati 5.06 40 43
246 structure struktura 5.51 34 43
247 task zadatak 5.03 35 43
248 keep držati 5.14 40 42
249 comes out izlazi 6.87 41 42
250 persons lica 5.05 21 42
251 linguistic lingvistički 8.26 31 42
252 form oblik 5.16 35 42
253 syntax sintaksa 8.66 29 42
254 stand stajati 6.85 38 42
255 easier lakše 5.44 40 41
256 editor lektor 6.82 37 41
257 none nijedan 5.74 39 41
258 lectures predavanja 6.19 38 41
259 differentiate razlikovati 5.86 36 41
260 symbol simbola 7.04 35 41
261 variant varijanta 6.75 33 41
262 future (adj.) budući 5.33 38 40
263 rule pravilo 8.03 37 40
264 similar sličan 8.60 35 40
265 written ispisan 10.21 34 39
266 linguistics lingvistika 5.90 31 39
267 mother (n.) majka 6.91 37 39
268 Nikšić Nikšić 5.67 32 39
269 area područje 5.45 36 39
270 take (exams) polagati 5.40 32 39
271 existence postojanje 6.26 31 39
272 including uključujući 5.73 37 39
273 speaking govoreći 8.48 38 38
275 news vesti 6.24 34 38
276 Ijekavian ijekavski 7.02 29 37
277 call nazivati 6.82 32 37
278 organize organizovati 5.70 33 37
279 grade book dnevnik 6.78 36 36
280 rename preimenovati 8.24 24 36
281 reality stvarnost 6.96 35 36
282 computer science informatike 8.14 30 35
283 simple jednostavnim 11.02 34 35
284 encompass obuhvata 7.07 23 35
285 momentarily trenutno 6.76 33 35
286 thirty trideset 5.46 34 35
287 Ukrainian ukrajinski 8.97 30 35
288 purity čistota 8.59 26 34
289 reads glasi 6.79 32 34
290 linguist lingvista 6.73 33 34
291 first-graders prvaci 5.97 29 34
292 time slot termin 6.10 25 34
293 numerals brojki 11.62 32 33
294 use (n.) korišćenje 6.63 32 33
295 local lokalni 5.70 29 33
296 understood podrazumeva 5.04 33 33
297 research istraživanja 8.49 29 32
298 accent izgovor 5.52 27 32
299 adequate odgovarajući 5.85 29 32
300 sings peva 6.14 32 32
301 sentence rečenica 5.94 31 32
302 regional regionalni 10.31 24 32
303 study (n.) izučavanje 8.44 29 31
304 performs izvodi 7.26 30 31
305 minister ministar 5.19 29 31
306 enable omogućavati 5.52 27 31
307 jargon žargon 5.88 24 31
308 broadcast emituje 6.96 27 30
309 first najpre 8.26 28 30
310 paper papiru 7.80 30 30
311 master (v.) savladati 6.20 28 30
312 letter (a, b, c…) slovo 7.21 25 30
313 governing vladaju 7.32 27 30
314 rich bogat 5.56 28 29
315 diploma diploma 6.01 27 29
316 continue nastaviti 5.48 29 29
317 suits (v.) odgovara 7.05 27 29
318 discussion rasprava 5.29 25 29
319 equal ravnopravan 10.34 24 29
320 preserve sačuvati 5.25 26 29
321 population stanovništva 7.16 28 29
322 Subotica subotica 6.14 24 29
323 love (v.) volim 7.39 29 29
324 additional dodatni 5.41 27 28
325 bilingual dvojezično 9.00 28 28
326 offer (v.) nuditi 8.14 28 28
327 training obuku 9.65 26 28
328 count (n.) računa 7.12 25 28
329 walls zidova 6.49 22 28
330 optional fakultativni 8.59 24 27
331 South-Slavic južnoslovenski 11.88 22 27
332 phenomenon pojava 5.15 27 27
333 defend braniti 5.60 22 26
334 ethnic etnički 9.40 24 26
335 Hebrew hebrejskom 9.39 25 26
336 this (way) ovako 6.66 22 26
337 Swedish švedski 11.43 22 26
338 body tela 5.05 22 26
339 show (n.) emisije 5.79 24 25
340 voice glas 5.05 25 25
341 past prošlosti 6.57 21 25
342 understanding razumevanje 5.94 22 25
343 choose birati 5.55 21 24
344 document dokumenta 5.97 23 24
345 Yugoslavia Jugoslavije 7.09 23 24
346 find pronađu 5.12 24 24
347 across širom 8.17 24 24
348 perfecting usavršavanje 6.57 24 24
349 connoisseur znalac 8.38 23 24
350 Czech češki 6.38 20 23
351 twenty dvadeset 5.12 23 23
352 clear jasnim 10.26 23 23
353 unity jedinstvo 9.91 21 23
354 ignorance nepoznavanje 8.57 23 23
355 try (v.) pokuša(va)ti 6.68 22 23
356 little pomalo 7.06 23 23
357 declare izjasniti 5.04 20 22
358 Yugoslav jugoslovenske 8.94 21 22
359 dialect narečja 7.30 20 22
360 connoisseur poznavalac 7.27 22 22
361 whole (n.) celina 5.12 20 21
362 contribution doprinos 8.69 21 21
363 twenty dvadesetak 5.27 21 21
364 thousands hiljada 5.88 21 21
365 learn about upoznaju 6.09 20 21
366 works delima 5.16 20 20
367 notion pojam 5.94 20 20
368 taking (exams) polaganje 6.76 20 20
Table F2
Lemma Collocates of the Lemma JEZIK ‘Language’ in SERBCORP (by MI Score)

1 evil zli 13.67 68 73
5 South-Slavic južnoslovenski 11.88 22 27
6 numerals brojki 11.62 32 33
7 Swedish švedski 11.43 22 26
9 geography geografija 11.13 52 59
10 comprehensible razumljiv 11.12 62 63
11 minority manjinski 11.05 48 69
12 translation prevođenje 11.04 64 73
13 simple jednostavnim 11.02 34 35
15 hair dlake 10.56 151 158
17 equal ravnopravan 10.34 24 29
18 regional regionalni 10.31 24 32
19 clear jasnim 10.26 23 23
20 written ispisan 10.21 34 39
22 hear čuti 10.20 54 55
24 unity jedinstvo 9.91 21 23
26 pure čisti 9.80 42 45
28 use (v.) koristiti 9.73 172 197
30 training obuku 9.65 26 28
32 know poznavati 9.47 88 95
33 Priština (adj.) prištinski 9.43 45 45
35 ethnic etnički 9.40 24 26
36 Macedonian makedonskom 9.40 58 64
37 Hebrew hebrejskom 9.39 25 26
38 universal univerzalni 9.34 42 46
39 serve služiti 9.32 105 115
40 Arabic arapski 9.31 86 100
41 private privatni 9.29 33 47
44 various razni 9.14 99 107
46 bilingual dvojezično 9.00 28 28
47 said rečeno 9.00 125 129
49 Ukrainian ukrajinski 8.97 30 35
50 Yugoslav jugoslovenske 8.94 21 22
51 computer računaru 8.93 50 52
54 text tekst 8.81 162 181
56 religion religija 8.77 41 44
57 many mnogi 8.77 204 220
59 contribution doprinos 8.69 21 21
61 poet pesnik 8.67 90 98
62 syntax sintaksa 8.66 29 42
67 group grupa 8.61 120 148
68 similar sličan 8.60 35 40
69 purity čistota 8.59 26 34
70 optional fakultativni 8.59 24 27
72 ignorance nepoznavanje 8.57 23 23
73 Polish poljski 8.55 62 68
75 research istraživanja 8.49 29 32
76 speaking govoreći 8.48 38 38
77 study (n.) izučavanje 8.44 29 31
80 connoisseur znalac 8.38 23 24
82 use upotreba 8.35 366 538
84 Japanese japanski 8.28 79 105
86 first najpre 8.26 28 30
88 against protiv 8.25 64 72
89 rename preimenovati 8.24 24 36
90 customs običaji 8.21 107 112
91 call zvati 8.19 81 109
92 across širom 8.17 24 24
93 offer (v.) nuditi 8.14 28 28
94 computer science informatike 8.14 30 35
95 course kurs 8.11 184 226
97 rule pravilo 8.03 37 40
98 Chinese kineski 8.02 76 94
103 paper papiru 7.80 30 30
104 Greek grčki 7.80 173 219
106 hatred mržnje 7.80 91 109
107 picture slika 7.78 47 50
109 philosophy filozofija 7.76 30 43
110 state država 7.67 135 159
111 territory prostor 7.66 80 84
115 fluently tečno 7.64 63 63
116 standard standardni 7.63 55 92
117 printed štampan 7.60 48 49
118 self sebe 7.59 92 101
123 make praviti 7.53 45 50
127 their njihov 7.41 328 374
128 love (v.) volim 7.39 29 29
131 governing vladaju 7.32 27 30
132 dialect narečja 7.30 20 22
133 novelty novina 7.30 51 54
134 classical klasični 7.27 40 61
135 connoisseur poznavalac 7.27 22 22
137 performs izvodi 7.26 30 31
138 unique jedinstven 7.26 35 54
140 letter (a, b, c…) slovo 7.21 25 30
141 Roma (adj.) romski 7.17 146 273
142 population stanovništva 7.16 28 29
144 count (n.) računa 7.12 25 28
146 listen slušati 7.12 50 58
148 Yugoslavia Jugoslavije 7.09 23 24
150 encompass obuhvata 7.07 23 35
151 little pomalo 7.06 23 23
152 suits (v.) odgovara 7.05 27 29
153 symbol simbola 7.04 35 41
154 Ijekavian ijekavski 7.02 29 37
156 test test 6.99 44 58
157 broadcast emituje 6.96 27 30
158 reality stvarnost 6.96 35 36
159 Serbs Srbi 6.95 176 234
160 novel roman 6.95 129 140
161 mother (n.) majka 6.91 37 39
162 church crkva 6.89 86 99
163 comes out izlazi 6.87 41 42
164 label naziv 6.86 179 255
165 SANU SANU 6.86 77 111
166 stand stajati 6.85 38 42
167 number broj 6.84 152 179
169 Latin (alphabet) latinica 6.82 51 68
170 editor lektor 6.82 37 41
171 call nazivati 6.82 32 37
172 style stil 6.81 71 76
174 exam ispit 6.79 117 152
175 reads glasi 6.79 32 34
176 grade book dnevnik 6.78 36 36
178 say kazati 6.78 436 478
179 both oba 6.77 58 69
180 currently trenutno 6.76 33 35
181 taking (exams) polaganje 6.76 20 20
182 hundred sto 6.75 45 50
183 variant varijanta 6.75 33 41
185 literature literatura 6.71 84 88
186 must morati 6.70 195 207
187 try (v.) pokuša(va)ti 6.68 22 23
188 this (way) ovako 6.66 22 26
190 use (n.) korišćenje 6.63 32 33
192 help pomoć 6.60 45 49
194 past prošlosti 6.57 21 25
195 perfecting usavršavanje 6.57 24 24
197 doctor dr 6.52 86 94
199 faith vera 6.51 108 122
200 walls zidova 6.49 22 28
201 again ponovo 6.46 48 50
202 wrote pisali 6.44 337 402
204 learn učiti 6.41 519 714
205 Czech češki 6.38 20 23
210 speech govor 6.31 127 155
211 man čovek 6.30 316 347
213 Roma Roma 6.25 20 45
214 appear pojaviti 6.24 58 61
215 news vesti 6.24 34 38
216 teachers učitelji 6.24 35 44
217 media mediji 6.22 161 187
220 master (v.) savladati 6.20 28 30
221 lectures predavanja 6.19 38 41
224 sings peva 6.14 32 32
225 Subotica Subotica 6.14 24 29
226 time slot termin 6.10 25 34
227 learn about upoznaju 6.09 20 21
229 read čitati 6.05 59 63
233 word reč 6.02 543 644
234 magazine/journal časopis 6.01 39 47
235 diploma diploma 6.01 27 29
236 Slovene slovenački 5.97 112 121
237 document dokumenta 5.97 23 24
238 first-graders prvaci 5.97 29 34
239 Turkish turski 5.95 60 71
240 understanding razumevanje 5.94 22 25
241 sentence rečenica 5.94 31 32
242 poem pesma 5.94 205 229
243 foreign strani 5.94 1448 2293
244 notion pojam 5.94 20 20
245 our naš 5.90 1026 1262
247 poetic pesnički 5.89 62 82
248 people narod 5.89 340 455
250 jargon žargon 5.88 24 31
251 Cyrillic ćirilica 5.88 173 256
252 thousands hiljada 5.88 21 21
253 differentiate razlikovati 5.86 36 41
254 adequate odgovarajući 5.85 29 32
256 school škola 5.85 501 673
258 excellently odlično 5.82 96 98
259 speak govoriti 5.82 1609 1977
260 mean (v.) značiti 5.81 194 204
261 good dobar 5.80 284 301
262 show (n.) emisije 5.79 24 25
263 abroad (n.) inostranstvu 5.78 41 47
267 name ime 5.74 198 320
268 none nijedan 5.74 39 41
269 including uključujući 5.73 37 39
270 schooling školovanje 5.72 42 44
271 area oblast 5.71 100 111
272 local lokalni 5.70 29 33
273 organize organizovati 5.70 33 37
274 writing pisanje 5.68 46 52
276 election (adj.) izborni 5.67 32 46
277 own svoj 5.67 1080 1442
278 ordinary običan 5.66 62 63
279 exist postojati 5.64 278 328
281 Karadžić Karadžić 5.63 45 47
282 create stvarati 5.62 43 53
283 students (K-8) đaci 5.61 55 58
284 defend braniti 5.60 22 26
285 board odbor 5.60 65 107
286 life život 5.60 111 117
287 same isti 5.59 321 402
288 beautiful lep 5.58 53 58
289 influence uticaj 5.57 35 45
290 rich bogat 5.56 28 29
291 choose birati 5.55 21 24
294 enable omogućavati 5.52 27 31
296 accent izgovor 5.52 27 32
297 program of study studija 5.52 105 121
298 paper list 5.51 64 65
303 follow pratiti 5.49 49 50
304 informing informisanje 5.49 46 51
305 four četiri 5.49 154 174
306 music muzika 5.48 118 127
307 continue nastaviti 5.48 29 29
308 film (adj.) filmski 5.48 101 112
309 institute matica 5.47 38 43
310 thirty trideset 5.46 34 35
311 area područje 5.45 36 39
312 easier lakše 5.44 40 41
314 additional dodatni 5.41 27 28
315 take (exams) polagati 5.40 32 39
320 six šest 5.34 89 95
321 future (adj.) budući 5.33 38 40
325 discussion rasprava 5.29 25 29
326 twenty dvadesetak 5.27 21 21
327 preserve sačuvati 5.25 26 29
328 level nivo 5.25 78 92
329 second drugi 5.21 781 1022
331 minister ministar 5.19 29 31
332 public javni 5.18 42 52
335 form oblik 5.16 35 42
336 readers čitaoci 5.16 43 43
337 works delima 5.16 20 20
339 phenomenon pojava 5.15 27 27
340 five pet 5.15 123 132
341 keep držati 5.14 40 42
342 third treći 5.14 53 62
343 find pronađu 5.12 24 24
344 twenty dvadeset 5.12 23 23
345 whole (n.) celina 5.12 20 21
346 several nekoliko 5.11 227 238
347 study (v.) studirati 5.10 53 63
348 desire (v.) želeti 5.09 84 92
349 all svi 5.08 902 1068
351 show pokazivati 5.06 40 43
353 body tela 5.05 22 26
354 law zakon 5.05 100 124
355 three tri 5.05 217 259
356 persons lica 5.05 21 42
357 voice glas 5.05 25 25
358 university univerzitet 5.04 108 116
359 understood podrazumeva 5.04 33 33
360 declare izjasniti 5.04 20 22
361 need (n.) potreba 5.03 71 79
363 task zadatak 5.03 35 43
364 two dva 5.02 381 484
365 simultaneously istovremeno 5.02 57 61
367 environment sredina 5.01 34 47
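The MI scores in Tables F1 and F2 can be illustrated with a short sketch. It assumes the basic pointwise mutual information formulation, MI = log2(O/E), where O is the observed co-occurrence count of node and collocate and E is the count expected if the two were independent (node frequency times collocate frequency divided by corpus size). The span-size correction that some concordancers apply is omitted, and the thresholds (min_joint, min_mi) and toy counts are hypothetical; they are not the settings or data used for SERBCORP. The two sort orders mirror the "by frequency" and "by MI Score" versions of the tables.

```python
import math

def mi_score(joint, node_freq, coll_freq, corpus_size):
    """Pointwise mutual information (log base 2) for a node-collocate pair.

    joint is the observed co-occurrence count; the expected count assumes the
    two words are independent. No span-size correction is applied (assumption).
    """
    expected = node_freq * coll_freq / corpus_size
    return math.log2(joint / expected)

def rank_collocates(counts, node_freq, corpus_size, min_joint=5, min_mi=3.0, by="mi"):
    """Filter and sort the collocates of a single node lemma.

    counts maps collocate -> (joint frequency, collocate frequency).
    Returns (collocate, MI, joint frequency) tuples that pass both
    hypothetical thresholds, sorted by MI or by joint frequency.
    """
    rows = []
    for coll, (joint, coll_freq) in counts.items():
        if joint < min_joint:
            continue  # too rare to trust, regardless of its MI score
        mi = mi_score(joint, node_freq, coll_freq, corpus_size)
        if mi >= min_mi:
            rows.append((coll, round(mi, 2), joint))
    sort_key = (lambda r: r[1]) if by == "mi" else (lambda r: r[2])
    return sorted(rows, key=sort_key, reverse=True)

# Hypothetical counts for a node lemma occurring 1,000 times in a 1,000,000-token corpus.
toy = {"rare_pair": (50, 500), "common_pair": (200, 20000), "too_rare": (3, 10)}
print(rank_collocates(toy, 1000, 1_000_000))             # [('rare_pair', 6.64, 50), ('common_pair', 3.32, 200)]
print(rank_collocates(toy, 1000, 1_000_000, by="freq"))  # [('common_pair', 3.32, 200), ('rare_pair', 6.64, 50)]
```

Because MI rewards exclusive co-occurrence, it tends to favour relatively infrequent pairs, which is one reason a frequency floor and a frequency-ordered listing are commonly reported alongside MI.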
Appendix G: Collocation Analysis (5+ Hits Section of SERBCORP)
Table G1
Lemma Collocates of the Lemma JEZIK ‘Language’ in the 5+ hits section of SERBCORP
(by frequency)

3 that taj 7.64 484 791
6 own svoj 7.70 360 608
8 second drugi 11.05 307 507
9 itself sam 5.11 338 494
12 one jedan 6.22 281 454
13 all svi 8.24 290 406
14 our naš 8.91 257 401
17 learn (v.) učiti 9.51 175 340
19 school (K-12) škola 6.97 179 318
20 they oni 5.86 245 310
22 use (n.) upotreba 9.35 136 274
23 this ovaj 6.53 211 265
25 new nov 10.26 172 254
26 word reč 5.86 180 253
27 people narod 7.09 153 252
28 year godina 5.36 186 239
32 first prvi 6.08 154 222
35 his njegov 7.45 165 213
36 name ime 10.61 96 208
37 two dva 5.72 138 206
39 say kazati 9.73 152 187
43 book knjiga 5.41 118 169
44 their njihov 7.48 135 168
49 know znati 9.44 122 159
51 he on 5.39 133 156
52 same isti 7.39 112 154
53 name naziv 7.72 89 151
54 world (n.) svet 6.43 112 150
55 Serbia Srbija 6.54 109 148
56 Monte(negro) Gora 8.92 80 144
59 Serbs Srbi 5.74 86 138
60 Cyrillic (n.) ćirilica 7.61 70 135
62 Roma (adj.) romski 8.49 34 135
63 (Monte)negro Crna 7.75 81 134
65 part deo 5.62 95 129
68 country zemlja 7.03 86 122
69 curriculum program 7.89 85 121
73 children deca 7.21 75 113
74 every svaki 6.34 94 113
76 institute institut 8.02 54 110
77 little mali 7.28 75 110
78 big veliki 5.85 82 110
79 man čovek 6.13 86 109
81 that is odnosno 5.51 81 108
83 grade razred 7.20 65 103
89 that onaj 6.63 74 93
95 board odbor 7.74 48 88
96 number broj 9.08 61 84
98 linguistic jezički 7.09 63 83
99 begin početi 7.97 33 83
102 good dobar 6.75 75 82
103 rights prava 5.28 65 82
104 standard (adj.) standardni 9.12 45 81
105 state država 8.45 61 79
107 many mnogi 7.18 68 78
108 learn (v.) naučiti 13.36 66 78
109 call zvati 9.15 51 78
110 cultural kulturni 7.13 70 77
112 use (v.) koristiti 12.79 55 76
114 say reći 11.64 64 74
115 basis osnov 9.41 60 72
116 speech govor 6.51 49 71
117 Belgrade Beograd 5.46 64 70
118 Greek grčki 7.76 34 70
119 problem problem 5.37 59 70
120 SANU SANU 6.19 40 70
123 law zakon 6.13 47 69
126 course kurs 7.98 43 67
128 group grupa 10.75 43 65
129 my moj 5.67 44 65
130 relation odnos 6.56 50 65
131 become postati 6.94 50 65
132 political politički 6.23 52 64
133 writer pisac 9.14 46 63
135 own sopstveni 8.14 46 62
136 that tim 8.73 53 62
137 get dobiti 7.31 52 60
138 only jedini 9.34 50 60
139 decision odluka 6.81 41 60
144 four četiri 5.08 46 58
145 exam ispit 6.69 38 57
146 living živ 8.50 47 57
147 media mediji 8.43 35 56
150 percent odsto 7.03 32 55
154 I ja 7.89 41 54
155 these ovi 9.36 51 54
157 work (v.) raditi 5.80 48 54
158 constitution ustav 6.69 40 54
160 case slučaj 7.62 45 53
161 desire (v.) želeti 10.03 49 53
162 Europe Evropa 7.33 41 52
163 always uvek 7.22 49 52
166 need (n.) potreba 7.05 43 51
167 study (n.) studija 6.37 40 51
168 make (v.) čini 6.02 43 50
169 identity identitet 6.31 37 50
174 be able to moći 7.20 35 49
176 level nivo 6.70 38 48
177 area oblast 7.54 38 48
180 Montenegrins Crnogorci 6.48 22 46
182 consider smatrati 5.01 42 46
183 art umetnost 6.15 42 46
184 day dan 5.70 41 45
186 possibility mogućnost 5.67 38 44
187 war rat 5.95 27 44
189 thing stvar 5.86 38 44
190 society društvo 5.84 32 43
192 change (n.) promena 6.62 33 43
194 text tekst 5.74 30 43
196 old stari 7.37 33 42
197 lead (v.) voditi 6.63 39 42
198 spirit duh 6.60 34 41
200 Latin (alphabet) latinica 11.10 27 41
202 poem pesma 7.04 35 41
203 majority većina 9.34 36 41
204 link veza 7.32 35 41
206 remain ostati 6.85 33 40
209 faith vera 6.89 32 40
210 come doći 6.61 37 39
213 such takav 5.61 36 39
214 center centar 5.70 29 38
219 biggest najveći 5.51 33 37
220 write pisati 8.37 30 37
221 both oba 5.89 26 36
222 self sebe 5.62 32 36
226 beginning početak 5.95 32 35
227 development razvoj 5.57 28 35
228 influence (n.) uticaj 6.82 25 35
229 community zajednica 6.66 27 35
230 elective izborni 6.37 20 34
233 use (v.) služiti 10.39 29 34
234 expert stručnjak 7.47 30 34
235 tradition tradicija 6.62 29 34
236 third treći 6.46 28 34
237 philosophical filozofski 6.39 24 33
238 Croatia Hrvatska 6.48 26 33
241 to not have nemati 6.57 32 33
242 her njen 9.47 29 33
244 council savet 5.80 24 33
245 Belgrade (adj.) beogradski 6.52 29 32
249 citizens građani 9.02 25 31
251 change menjati 5.76 22 31
253 republic republika 6.29 25 31
254 Ijekavian (dialect) ijekavski 7.26 22 30
255 plan plan 6.20 24 30
258 reason razlog 5.64 20 29
259 six šest 5.14 25 29
264 never nikad 5.50 26 28
265 standard (n.) standard 7.19 21 28
267 Karadžić (Vuk) Karadžić 6.33 25 27
268 come into being nastati 6.76 22 27
269 explain objašnjavati 9.21 23 27
271 accept prihvatiti 7.65 25 27
273 claim (v.) tvrditi 7.58 26 27
274 teachers (K-8) učitelji 6.27 20 27
275 see videti 10.42 23 27
276 state (adj.) državni 7.00 20 26
278 form oblik 6.23 20 26
279 concern (v.) ticati 5.34 23 26
280 often često 5.09 24 25
281 name (v.) nazvati 7.03 20 25
283 sense smisao 6.11 24 25
285 creation stvaranje 6.13 20 25
286 topic tema 5.49 23 25
287 last poslednji 6.27 23 24
288 means (n.) sredstvo 6.50 22 24
290 difficult (adv.) teško 8.10 24 24
291 authorities vlast 5.07 20 24
293 academy akademija 5.99 22 23
294 institution institucija 5.39 21 23
295 opinion mišljenje 5.26 21 23
296 consideration obzir 5.07 20 23
297 bigger veći 5.69 21 23
298 work (n.) delo 5.47 20 22
299 less/er manje 5.15 21 22
300 origin poreklo 6.67 21 22
301 project (n.) projekat 8.95 20 22
302 hundred sto 5.55 20 22
304 studying izučavanje 7.87 20 21
305 government vlada 5.66 20 20
Table G2
Lemma Collocates of the Lemma JEZIK ‘Language’ in the 5+ hits section of SERBCORP
(by MI Score)

4 learn (v.) naučiti 13.36 66 78
7 use (v.) koristiti 12.79 55 76
15 say reći 11.64 64 74
17 Latin (alphabet) latinica 11.10 27 41
19 second drugi 11.05 307 507
22 group grupa 10.75 43 65
25 name ime 10.61 96 208
27 see videti 10.42 23 27
29 use (v.) služiti 10.39 29 34
31 new nov 10.26 172 254
35 desire (v.) želeti 10.03 49 53
38 say kazati 9.73 152 187
43 learn (v.) učiti 9.51 175 340
46 her njen 9.47 29 33
48 know znati 9.44 122 159
49 basis osnov 9.41 60 72
52 these ovi 9.36 51 54
53 use (n.) upotreba 9.35 136 274
54 only jedini 9.34 50 60
55 majority većina 9.34 36 41
61 explain objašnjavati 9.21 23 27
64 call zvati 9.15 51 78
65 writer pisac 9.14 46 63
66 standard (adj.) standardni 9.12 45 81
67 number broj 9.08 61 84
68 citizens građani 9.02 25 31
69 project (n.) projekat 8.95 20 22
70 Monte(negro) Gora 8.92 80 144
71 our naš 8.91 257 401
80 that tim 8.73 53 62
89 living živ 8.50 47 57
90 Roma (adj.) romski 8.49 34 135
91 state država 8.45 61 79
92 media mediji 8.43 35 56
95 write pisati 8.37 30 37
98 all svi 8.24 290 406
105 own sopstveni 8.14 46 62
108 difficult (adv.) teško 8.10 24 24
114 institute institut 8.02 54 110
116 course kurs 7.98 43 67
117 begin početi 7.97 33 83
121 curriculum program 7.89 85 121
122 I ja 7.89 41 54
124 studying izučavanje 7.87 20 21
128 Greek grčki 7.76 34 70
129 (Monte)negro Crna 7.75 81 134
131 board odbor 7.74 48 88
133 name naziv 7.72 89 151
134 own svoj 7.70 360 608
136 accept prihvatiti 7.65 25 27
137 that taj 7.64 484 791
139 case slučaj 7.62 45 53
140 Cyrillic (n.) ćirilica 7.61 70 135
145 claim (v.) tvrditi 7.58 26 27
148 area oblast 7.54 38 48
149 their njihov 7.48 135 168
150 expert stručnjak 7.47 30 34
151 his njegov 7.45 165 213
155 same isti 7.39 112 154
156 old stari 7.37 33 42
157 Europe Evropa 7.33 41 52
158 link veza 7.32 35 41
159 get dobiti 7.31 52 60
161 little mali 7.28 75 110
162 Ijekavian (dialect) ijekavski 7.26 22 30
164 always uvek 7.22 49 52
165 children deca 7.21 75 113
166 grade razred 7.20 65 103
167 be able to moći 7.20 35 49
168 standard (n.) standard 7.19 21 28
169 many mnogi 7.18 68 78
170 cultural kulturni 7.13 70 77
172 people narod 7.09 153 252
173 linguistic jezički 7.09 63 83
175 need (n.) potreba 7.05 43 51
178 poem pesma 7.04 35 41
179 country zemlja 7.03 86 122
180 percent odsto 7.03 32 55
181 name (v.) nazvati 7.03 20 25
182 state (adj.) državni 7.00 20 26
183 school (K-12) škola 6.97 179 318
186 become postati 6.94 50 65
187 faith vera 6.89 32 40
188 remain ostati 6.85 33 40
189 influence (n.) uticaj 6.82 25 35
190 decision odluka 6.81 41 60
191 come into being nastati 6.76 22 27
192 good dobar 6.75 75 82
193 level nivo 6.70 38 48
194 exam ispit 6.69 38 57
195 constitution ustav 6.69 40 54
197 origin poreklo 6.67 21 22
199 community zajednica 6.66 27 35
200 that onaj 6.63 74 93
201 lead (v.) voditi 6.63 39 42
202 change (n.) promena 6.62 33 43
203 tradition tradicija 6.62 29 34
204 come doći 6.61 37 39
206 spirit duh 6.60 34 41
207 to not have nemati 6.57 32 33
208 relation odnos 6.56 50 65
210 Serbia Srbija 6.54 109 148
212 this ovaj 6.53 211 265
213 Belgrade (adj.) beogradski 6.52 29 32
214 speech govor 6.51 49 71
215 means (n.) sredstvo 6.50 22 24
217 Montenegrins Crnogorci 6.48 22 46
218 Croatia Hrvatska 6.48 26 33
219 third treći 6.46 28 34
220 world (n.) svet 6.43 112 150
222 philosophical filozofski 6.39 24 33
223 study (n.) studija 6.37 40 51
225 elective izborni 6.37 20 34
226 every svaki 6.34 94 113
228 Karadžić (Vuk) Karadžić 6.33 25 27
229 identity identitet 6.31 37 50
230 republic republika 6.29 25 31
231 teachers (K-8) učitelji 6.27 20 27
232 last poslednji 6.27 23 24
233 political politički 6.23 52 64
234 form oblik 6.23 20 26
235 one jedan 6.22 281 454
238 plan plan 6.20 24 30
239 SANU SANU 6.19 40 70
240 art umetnost 6.15 42 46
241 man čovek 6.13 86 109
242 law zakon 6.13 47 69
243 creation stvaranje 6.13 20 25
244 sense smisao 6.11 24 25
245 first prvi 6.08 154 222
247 make (v.) čini 6.02 43 50
248 academy akademija 5.99 22 23
249 war rat 5.95 27 44
250 beginning početak 5.95 32 35
252 both oba 5.89 26 36
254 they oni 5.86 245 310
255 word reč 5.86 180 253
256 thing stvar 5.86 38 44
257 big veliki 5.85 82 110
258 society društvo 5.84 32 43
260 work (v.) raditi 5.80 48 54
261 council savet 5.80 24 33
263 change menjati 5.76 22 31
264 Serbs Srbi 5.74 86 138
265 text tekst 5.74 30 43
267 two dva 5.72 138 206
268 day dan 5.70 41 45
269 center centar 5.70 29 38
271 bigger veći 5.69 21 23
272 my moj 5.67 44 65
273 possibility mogućnost 5.67 38 44
274 government vlada 5.66 20 20
275 reason razlog 5.64 20 29
276 part deo 5.62 95 129
277 self sebe 5.62 32 36
278 such takav 5.61 36 39
279 development razvoj 5.57 28 35
280 hundred sto 5.55 20 22
282 that is odnosno 5.51 81 108
283 biggest najveći 5.51 33 37
284 never nikad 5.50 26 28
285 topic tema 5.49 23 25
286 work (n.) delo 5.47 20 22
287 Belgrade Beograd 5.46 64 70
288 book knjiga 5.41 118 169
289 he on 5.39 133 156
290 institution institucija 5.39 21 23
291 problem problem 5.37 59 70
293 year godina 5.36 186 239
294 concern (v.) ticati 5.34 23 26
295 rights prava 5.28 65 82
296 opinion mišljenje 5.26 21 23
297 less/er manje 5.15 21 22
298 six šest 5.14 25 29
300 itself sam 5.11 338 494
301 often često 5.09 24 25
302 four četiri 5.08 46 58
303 authorities vlast 5.07 20 24
304 consideration obzir 5.07 20 23
305 consider smatrati 5.01 42 46
1
SETIMES2, OPUS2, and srWaC14 are available at www.sketchengine.co.uk.
2
The caveat here, of course, is that the reference/comparator corpus should be at least the size of the
research corpus.
3
Because of their large sizes and limited availability, the WaC corpora could only be used as reference
corpora by first downloading their full wordlists in txt format and then converting these into WST wordlists
for the purposes of keyword analysis. The alternative solution, uploading the entire SERBCOMP onto the
SketchEngine website to conduct keyword analysis there, was technically demanding and prohibitively
expensive.
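The procedure in note 3 works because keyword analysis needs only frequency lists, not full texts: a keyness statistic requires each word's frequency and the total token count in the study and reference corpora. The sketch below assumes a tab-separated "word<TAB>frequency" text format and the log-likelihood (G2) statistic, one of the keyness measures WordSmith Tools offers; the file format, the 15.13 cut-off (p < 0.0001, 1 df), and the toy counts are illustrative assumptions rather than the settings used in this study.

```python
import math

def read_wordlist(lines):
    """Parse a plain-text frequency list with one 'word<TAB>frequency' per line.
    The format is an assumption; real wordlist exports may differ."""
    freqs = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        word, count = line.rsplit("\t", 1)
        freqs[word] = freqs.get(word, 0) + int(count)
    return freqs

def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) for a word with frequency a in a study corpus of
    c tokens and frequency b in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # a zero count contributes nothing (0 * ln 0 taken as 0)
            g2 += observed * math.log(observed / expected)
    return 2 * g2

def keywords(study, reference, min_g2=15.13):
    """Positive keywords of the study corpus, sorted by G2 (descending).
    15.13 is the chi-square critical value for p < 0.0001 with 1 df."""
    c = sum(study.values())      # study corpus size in tokens
    d = sum(reference.values())  # reference corpus size in tokens
    result = []
    for word, a in study.items():
        b = reference.get(word, 0)
        if a / c <= b / d:
            continue  # not relatively more frequent in the study corpus
        g2 = log_likelihood(a, b, c, d)
        if g2 >= min_g2:
            result.append((word, round(g2, 2)))
    return sorted(result, key=lambda item: item[1], reverse=True)

# Hypothetical toy wordlists, not SERBCORP or WaC data.
study = read_wordlist(["jezik\t100", "i\t900"])
reference = read_wordlist(["jezik\t10", "i\t990"])
print(keywords(study, reference))  # [('jezik', 85.47)]
```

Since the reference corpus enters the calculation only through its per-word frequencies and total size, a converted wordlist is a sufficient stand-in for the full reference corpus.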
4
It should be noted that Serbian is a heavily inflectional language and so all search terms, keywords,
collocates, and n-grams are likely to (and do) show up in multiple inflectional forms in the corpus.
Although lemmatization can be problematic because it “has the potential to disguise important differences
in collocational preferences between different forms of a lemma” (Durrant, 2009, p. 162; see also Sinclair,
1991), it was consistently applied to all quantitative CL analyses (with the exception of n-gram analysis) in
this study to reduce the impact of inflectional morphology on statistical analyses (cf. Baker, Gabrielatos &
McEnery, 2013; Partington, 2010). For example, treating individual lemma forms separately often meant
that obviously important lexical items either fell (well) below the frequency threshold or appeared to be
less salient than they are. Lemmatization solved this problem by adding up the frequencies of all individual
lemma forms for a total lemma frequency. Similarly, treating individual lemma forms separately would
have multiplied the sometimes already large numbers of keywords, collocates, and n-grams. As a
corollary, many collocate variables based on individual lemma forms in EFA would have likely failed to
load on any factors due to their considerably lower frequencies. Thus, even though different lemma forms
do often exhibit different, and sometimes complementary, collocational preferences, this does not appear to
be of paramount importance in the context of a macroscopic approach as employed in this study.
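The frequency-pooling step described in note 4 can be illustrated as follows. The form-to-lemma mapping, the threshold of 20, and the counts are hypothetical; in practice the mapping would come from a lemmatizer or tagger, which this sketch does not implement.

```python
from collections import Counter

def lemma_frequencies(form_freqs, form_to_lemma):
    """Pool the frequencies of inflected forms under their lemma.
    Forms missing from the mapping are kept as their own lemma."""
    totals = Counter()
    for form, freq in form_freqs.items():
        totals[form_to_lemma.get(form, form)] += freq
    return totals

def above_threshold(freqs, min_freq):
    """Items whose frequency reaches the (hypothetical) cut-off, sorted."""
    return sorted(item for item, freq in freqs.items() if freq >= min_freq)

# Hypothetical counts: no single form reaches 20, but both lemmas do.
forms = {"jezik": 8, "jezika": 7, "jeziku": 6, "ćirilica": 12, "ćirilicu": 9}
mapping = {"jezik": "jezik", "jezika": "jezik", "jeziku": "jezik",
           "ćirilica": "ćirilica", "ćirilicu": "ćirilica"}

print(above_threshold(forms, 20))                              # []
print(above_threshold(lemma_frequencies(forms, mapping), 20))  # ['jezik', 'ćirilica']
```

The toy case reproduces the problem the note describes: counted form by form, neither item clears the frequency threshold, whereas the pooled lemma totals do.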

41
For presentation purposes, the forms of lemmatized keywords are standardized to the nominative case of
the predominant number (singular or plural) for nouns and pronouns, the masculine singular nominative for
adjectives, and the infinitive for verbs. Keywords that appeared in only one of their possible lemma
forms are presented in their original form.


