By Adnan Ajšić
A Dissertation
Doctor of Philosophy
in Applied Linguistics
May 2015
Approved:
ADNAN AJŠIĆ
Language ideologies have been closely related to nationalist discourses since the
Europe and elsewhere. Although recent research has examined language debates and the
links between language ideologies and national identities in plurilingual and multicultural
societies (e.g., Canada, Vessey, 2013a; Spain/Catalonia, Pujolar, 2007), little attention has
been paid to contexts with minimal linguistic differences between groups such as the
West Central Balkans. Public language-related discourse in the Central South Slavic area
in the last twenty years has been dominated by a fierce debate over the ownership of the
of ethnolinguistic identities. The principal goal of this study, therefore, was to identify
empirical, mixed methods approach, and investigate the links between language-related
from 16,148 articles) and comparator (22,493,804 words from 37,227 articles) corpora
were compiled from relevant articles published in four leading Serbian dailies and
weeklies. Following recent developments in mixed methods research into discourses and
ideologies (Baker et al., 2008), the data were analyzed using a combination of
historical approach) methods. The second major goal of this study, therefore, was to
compare quantitative methods employed in terms of their usefulness and effectiveness for
endangerment and contestation which are based on an essentialist language ideology with
comparison suggests different roles for different quantitative methods (e.g., micro- and
discourses and ideologies. Finally, despite some synchronic and diachronic variation in
(small ‘d’) discourses suggested by factors, the discursive and ideological profiles of the
mainstream Serbian press are shown to be fairly uniform and stable, suggesting broad
Adnan Ajšić
© 2015
Acknowledgments
Randi Reppen, and Jim Wilce for their unwavering support and endless patience. When I
was embarking upon this journey, I thought each one of you would be able to provide a
I am also grateful to my wife, Deniza, and my son, Aiden Mak, for endurance and
inspiration during a very difficult time. Hvala oboma (Thank you both). Gotovo je, slobodni smo (It’s over, we’re free). I’d also
like to thank my mom, Elbisa, and mother-in-law, Ajša, who both did what Bosnian moms
do. Hvala objema (Thank you both). Finally, thank you to my sister, Amra, for proving me right, and my
brother-in-law, Nebojša, for having a rare combination of intelligence, skill, and patience.
Meši
Contents
List of Tables
1. Introduction
Balkans
2.1.3 Definitions
2.3.3 Types of data used in language ideology research
4. Data
5. Methods
5.2 Collocation Analysis
6. Results
6.1.2 Keyword analysis (5+ hits section of SERBCORP vs. 1-4 hits section of SERBCORP)
6.2.2 N-grams
6.3.9 Factor 6: Contestation over language ownership and name
6.3.12 Factor 12: Linguacultural diplomacy, language, and culture
6.5 Cluster Analysis
6.5.1 Preferred cluster solution and scoring patterns by factor and cluster
6.6.4 Excerpts from texts representative of Factors 6 and 10
7. Discussion
Appendix A: Sampling Procedures
Appendix E: Keyword Analysis (5+ Hits Section of SERBCORP with the 1-4 Hits
List of Tables
Table 6 Articles in SERBCORP (by Hit Count for the Lemma JEZIK and Percentage)
Table 9 Article Means, SD, and STTR in the 5+ Hits Section of SERBCORP (by Publication)
Table 15 Ethnolinguistic Identity-related Key-keywords and Key-keyword Associates in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus (by Rank/Number of Texts)
Table 22 Rotated Factor Pattern for the 12-factor Solution (Varimax Rotation)
Table 30 Top 20 Highest Scoring Articles on Factor 4 (MV Outliers are in Bold)
List of Figures
1. Introduction
Although the standard varieties of the Central South Slavic diasystem are virtually
fully mutually intelligible, in this region, as elsewhere, language has long been a primary
tool in the construction and maintenance of separate ethnonational identities and hence
highly contested (Greenberg, 2004). After a period of relative political stability and
formal linguistic union under the label of Serbo-Croatian in the former Yugoslavia (1945-
1991), the contestation reemerged and intensified with the dissolution of the federal state
identities and states. The linguistic consequences of the dissolution include the “nominal
considerably different language policies in the new states (Bugarski, 2004). Despite this,
or perhaps precisely because of it, the contestation continues as identity and nationhood
continue to be negotiated. Most recently, in the summer months of 2013, three fierce
public language debates took place in three of the four successor states. In Bosnia-
Herzegovina, the debate, which took place in the period leading up to the country’s first
postwar census (October 2013), centered on the legitimacy of the name of one of the
three official languages, Bosnian. In Croatia, the debate centered on the reintroduction of
biscriptal public signs including the Cyrillic alphabet and the sometimes violent protests
against it in the easternmost Croatian city of Vukovar, which has a sizeable Serb minority.
with the European Union.
Underlying these and similar debates are language ideologies, for the present
purposes best defined as “the cultural system[s] of ideas about social and linguistic
relationships, together with their loading of moral and political interests” (Irvine, 1989, p.
255). Because they function as a mediating link between linguistic and social practices,
language ideologies are “not about language alone” (Woolard, 1998, p. 3). Rather, they
often serve as tools for the invention of tradition which makes possible the “imagined
public institutions such as print. For this reason, Silverstein (1998), for example, points
language ideology. Available research suggests that mass media, and newspapers in
particular, are a primary site for discursive and ideological reproduction in modern
societies (e.g., Fowler, 1991; see also papers in Johnson & Ensslin, 2007). Studies of
language ideology thus often focus on public institutional discourses and those of
(DiGiacomo, 1999, p. 105) as well as “key sites for language ideological debates between
various kinds of social actors” (Ensslin & Johnson, 2006, p. 155). Furthermore, if we
accept the view of newspapers as a discourse community with which the audience
identifies, it follows that “their average lexicon shapes, describes and expresses what is
accepted by [the] community” itself (Bassi, 2010, p. 209). This latter point is particularly
The principal goal of this study is to investigate the links between language-related
discourse and language ideologies and ethnonationalist discourse in mainstream
newspapers published in Serbia, the largest Central South Slavic nation. Language
ideologies have been closely related to nationalist discourses since the inception of
2000). They continue to be important for the construction of national identities in Europe
and elsewhere and are evident in news writing (e.g., Blommaert & Verschueren, 1998).
Language-related discourse in the West Central Balkans in the last twenty years has been
dominated by a debate over the ownership of the common language and a concomitant
contestation of ethnonational identities which are still widely regarded in this area to rest
(e.g., Blommaert, 1999) and the links between language ideology and national identity in
plurilingual and multicultural societies (e.g., Canada, Vessey, 2013a; Spain/Catalonia,
Pujolar, 2007), little attention has been paid to contexts with minimal linguistic
differences between groups such as the West Central Balkans, particularly from a
quantitative or mixed methods perspective. This study is an attempt to close that gap.
Balkans
A nation has nothing holier nor dearer than its natural language, for it is only
through language that a nation, as a particular society, continues or vanishes.
defining characteristics of Homo sapiens as a species1 and as such it has always been of
as a cultural tool. Its importance as an ideological tool in the struggle for hegemonic
power in late modernity, however, has been growing further still (Bourdieu, 1991;
the societies of the western and central Balkans, now collectively known as the former
Yugoslavia, do not always fit neatly into the category of late modernity, the significance of
language in both the traditional and postmodern senses is perhaps nowhere greater.
Indeed, as Robert Greenberg (2004) notes in the conclusion to his book Language and
identity in the Balkans: Serbo-Croatian and its disintegration, “in the former Yugoslavia
the power of language has at times reached absurd proportions” (p. 159). In the former
Yugoslavia, one might add, echoing Ljudevit Gaj’s words from the epigraph above,
language ideology produced shibboleths that at times meant the difference between life
and death.
the former Yugoslavia is complex and can only be understood with reference to the
2004). Ironically, despite the oft-repeated references to the region’s history, discussions
of language in the former Yugoslavia, academic and otherwise, often fail to appreciate the
longer than most and, unlike most, is the product of both Western and Eastern
imperialisms. Beyond the direct physical, political, legal, cultural, linguistic, and other
impacts of the dueling colonialist enterprises, and at least as important, is the impact of
the seventeenth- and eighteenth-century Western language ideologies, which were
example, see Gal, 2001), and as I hope to show, especially so in the Balkans (cf. Irvine &
Gal, 2000, especially pp. 60-71). Here, two strands of thought are particularly important:
century English Enlightenment, and even more so, the Herderian idealization of folk
language in the spirit of eighteenth-century German Romanticism (Bauman & Briggs,
possible. Anachronistic yet coinciding with “the rise of small nations” (Wright, 2004) at
the end of the twentieth century and the beginning of the twenty-first, the Balkan
obsession with (purported) linguistic authenticity. The outcome has been what Greenberg
(2004), following Heinz Kloss, called the “nominal language death” of Serbo-Croatian,
the erstwhile common yet polycentric standard (Kordić, 2010), as well as a multiplicity
of mutually contested language ideologies (cf. Gal, 1998), waving the successor
of language ideology in its historical and sociopolitical contexts (cf. Irvine & Gal, 2000).
I will therefore first provide a brief discussion of the language history, language situation,
and language politics of the region, before moving on to offer a rationale and delimitation
for this study. The following account draws on Greenberg (2004), a rare comprehensive
and fairly neutral treatment of the topic,2 and to a lesser extent on Katičić (1997),
languages of the former Yugoslavia (with the addition of Bulgarian) form the southern
part of the Slavic language group. Slovenian at the northwestern and Macedonian at the
southeastern ends of the area are separate languages of the Abstand type (Kloss, 1967),
whereas the larger central part of the area (Central South Slavic), spanning Croatia,
spanning all four countries and all four ethnic groups (i.e., Bosniaks, Croats,
Montenegrins, and Serbs). Considering the long and varied colonial history of the
region3 and the concomitant differences in culture and religion,4 it is remarkable that a
common dialect developed and survived over the centuries. This became especially
important in the nineteenth century as the peoples of this region sought independence
and, following the trend in the rest of Europe, the formation of their own nation-states.
movement came into existence in Croatia (the Illyrians), which sought the unification of
South Slavs and their language based on a common dialect. Linguistic unification being
a more realistic goal than political unification at the time,5 a group of Serbian and
Croatian linguists and literary figures met in 1850 in Vienna, Austria, to produce what
would become known as the Vienna Literary Agreement, which is widely considered to
be the inception of a common language standard for the Central South Slavic area (but
see Katičić, 1997, for a problematization of this view). The agreement was non-binding,
however, and it did not venture far beyond the status planning decision to base the
common standard on the “southern dialect” (i.e., the Neo-Štokavian) rather than on an
artificial amalgam of existing dialects (for the original text and an English translation of
the agreement, see Greenberg, 2004, pp. 168-171). Crucially, the name for this new
linked to power. The common standard that had been agreed upon in 1850 in Vienna
would therefore have to wait for a change in political circumstances before it could be implemented.
However, by the time such circumstances materialized at the end of World War I, the
politics of language in the region had also changed. Despite an initial defeat in the war
with Austria that began in the wake of the assassination of Archduke Franz Ferdinand of
Austria in Sarajevo, Serbia eventually recovered and joined its Triple Entente allies in
victory over Germany and Austria-Hungary. Owing to the war effort and continued
Russian support, Serbia was then granted its own regional sphere of influence, which
resulted in the formation of the Kingdom of Serbs, Croats, and Slovenes in 1918, the first
joint South Slavic state. Importantly, Serbia was the only South Slavic nation that
entered the union as a military power and an independent state,7 while ethnic groups
other than Serbs, Croats, and Slovenes (i.e., Bosniaks, Macedonians, and Montenegrins)
did not receive any political recognition at this time. Much like the Vienna Literary
Agreement, the new state turned out to be largely a Serbo-Croatian affair, wherein the
Croats fought to resist the sometimes real, sometimes perceived Serbian hegemony.8 The
political bickering destabilized the country, so in 1929 the reigning Serbian monarch,
King Alexander, seized the opportunity to change the constitution and with it,
The period between 1850 and 1929, on the other hand, saw more status and
corpus planning work (see Greenberg, 2004, p. 54, for an overview of landmark events).
But, despite a shared dialect, the region harbored a number of quite disparate linguistic
traditions: there were two different alphabets in use (Latin and Cyrillic) and three
lexis and linguistic culture (e.g., in attitudes toward popular speech as the basis for a
standard). In addition, standardization in Serbia had proceeded along divergent
lines since its independence in 1878. In 1913, Jovan Skerlić, a Serbian linguist, thus
attempted to resolve the major differences by proposing a compromise whereby the Serbs
would give up Cyrillic for the Latin alphabet while the Croats would switch from the
Ijekavian to Ekavian (i.e., Serbian) pronunciation, but this was never seriously
entertained. Furthermore, concurrently with King Alexander’s drive for a tighter union
and with government support, another prominent Serbian linguist, Aleksandar Belić,
Skerlić’s proposal with regard to pronunciation choice in Serbian favor. Needless to say,
this was opposed and even resented by most Ijekavian speakers (Bosniaks, Bosnian and
Croatian Serbs, Croats, and Montenegrins), but only the Croats had any political clout to
resist the Serbs. This was a prelude to a period of rapidly worsening inter-ethnic
relations, particularly between the Serbs and Croats, which culminated in the Yugoslav
capitulation after a brief war with Nazi Germany and the formation of a Croatian Nazi
puppet state (NDH) in 1941. Eager to dissociate Croatian from Serbian because of the
(perceived) implications of a close association for Croatian national identity, the ultra-
and introducing numerous archaisms and neologisms alike, among other innovations.
which culminated in the genocide against the Croatian Serbs (and Roma).
At the same time, the Yugoslav Communist Party guerrilla force, which was
composed of members from all ethnic groups, was gaining strength; headed by Josip
Broz Tito, the Partisans first founded the second Yugoslav state in 1943 and then defeated
both the German and Italian occupiers and their mostly Croatian and Serbian
among the constituent peoples (i.e., ethnic groups) and tolerance of minorities, although it
was not immune to certain problematic compromises. In accordance with its ideology of
“brotherhood and unity”, the second Yugoslavia eventually reinstated the common Serbo-
Croatian standard as the official language, while also recognizing Slovene and
Montenegrins were left out again, however.10 In order to resolve some of the old issues
and chart a new course, a new meeting of linguists was called in 1954 in Novi Sad,
Serbia. Again, as in 1850, the meeting in Novi Sad included only Serb and Croat
linguists. The compromise agreement they reached, known as the Novi Sad Agreement,
consisted of ten conclusions concerning status and corpus planning (for the original text
and an English translation of the agreement, see Greenberg, 2004, pp. 172-174). The
Western/Ijekavian one (i.e., Serbian and Croatian), with equal use of the two alphabets
(Latin and Cyrillic) throughout. Also agreed was joint codification work on new
Some Croatian linguists now argue that Serbo-Croatian never really existed (e.g.,
Barić et al., 1999; Katičić, 1997), whereas Serbian linguists generally reject this thesis.
Greenberg (2004), for his part, points to the fact that linguistic unification, at least the
original one in 1850, was not forced upon the Serbs and Croats by anyone, and this is
certainly true (even though this view ignores the imposition of the linguistic union
on Bosniaks and Montenegrins). However, it is important to note that the historical and
political circumstances in the region, the relations between the different ethnic groups,
again especially Serbs and Croats, as well as the significance of the language issue, had
all drastically changed by 1954. This time, the Serb and Croat linguists produced what
now seems a mere tactical agreement, likely expecting that it would not last. And, of
course, it did not. Some twelve years later, in 1966, first an unauthorized dictionary of
(Moskovljević, 1966), then, a year later, the Croats responded by issuing the “Declaration
on the Name and Position of the Croatian Literary Language”, which was a direct
challenge to the Serbo-Croatian common standard; the joint codification work that was
underway would soon stop. Despite a swift intervention by the federal authorities, who
rightly saw this turn of events as a danger to ethnic relations in Yugoslavia, the writing
was on the wall: the project of unification was over, and it was only a matter of time
Greenberg (2004, p. 32) cogently notes that this language conflict was
symptomatic of a more general restructuring of the federal state, which was moving
federal to the republic (i.e., state) level was finally enshrined in the 1974 rewriting of the
constitution. In terms of language policy, this is particularly significant because the new
constitution also devolved language policy to the republic level, effectively opening the
door to the introduction of a polycentric standard. This was, of course, seized upon by
Montenegro, who had had no voice in any of the previous decisions.11 Of course, official
federal policy, which in the Communist-run Yugoslavia had the force of a dogma, made it
into anything else, but new standard varieties were nevertheless introduced under the
included elements from the idioms of all three major ethnic groups (Bosniaks, Croats,
and Serbs) but was anchored in the idiom of the Bosniaks as the largest group, some
elements of which in their turn had come to be shared by Bosnian Serbs and Bosnian
Croats (e.g., frequent use of Turkish loans in everyday speech); both alphabets remained
in use and retained an equal status.12 However, the Serbian intellectual elite largely
interpreted this development as a threat to both the integrity of Yugoslavia and the ethnic
identity of their co-ethnics outside of Serbia, who made up a sizeable minority in Croatia
and roughly a third of the population of Bosnia-Herzegovina. This tension would simmer
more or less quietly until the resurgence of open nationalism after the Yugoslav
Communist Party had relinquished power and called free elections in 1990.
mentioned above was on full display during the election campaigns of 1990, while
Needless to say, such a development did not bode well for the federation, and the issue of
the political future of the country became “the question of all questions” during the
campaigns and particularly after the elections. As the largest ethnic group spread over
several republics and one that was effectively in control of the federal state as well as
overrepresented in the oversized federal military, Serbs stood to lose the most from a
the federation by the newly elected presidents of the republics. The resulting stalemate
with Serbia until 2006, when it too regained independence. With the exception of Bosnia-
Herzegovina, where this issue was more complicated on account of its ethnic
composition,13 each newly independent country declared its majority language official,
and the unified Serbo-Croatian standard thus formally ceased to exist (see Sudetic, 1993,
for a contemporary report). Having firm control over the powerful Yugoslav military, the
Serbs rejected these declarations of independence, turning political into military conflicts,
with war flaring up first in Slovenia, then in Croatia, and finally in Bosnia-Herzegovina.14
After a series of wars, including the longest and particularly vicious Bosnian War which
genocide against the Bosniaks by the Serbs, the former Yugoslavia metamorphosed into
seven independent states, each with its own language policy, replacing Serbo-Croatian
2004).
clear from the discussion above that despite a roughly 150-year-long history of
unification attempts, Central South Slavic has always been and remains a polycentric
language (cf. Kordić, 2010). At the same time, this polycentricity and the right to codify
and particularly name varieties have been fiercely contested since the nineteenth century
by the Serbian and Croatian intellectual elites, which have, with varying degrees of
theories of language and nationalism. Hence, it has been of little consequence that,
the one case, Catholic versus Protestant/Calvinist in the other” (Blommaert &
Verschueren, 1998, p. 199).
As noted above, the peoples of the Balkans, like those of the rest of Eastern Europe, have
embraced the essentialist Western language ideology, which views language as the
embodiment of the character of the “natural” group that produced it, i.e., the nation (see
Bauman & Briggs, 2000). Consequently, language has been a primary site for the
construction of and struggle over ethnonational identity and the concomitant group rights
as “ideologies that appear to be about language, when carefully reread, are revealed to be
coded stories about political, religious, or scientific conflicts” (Gal, 1998, p. 323). This is
evident in public and scholarly discourses on language both in Europe and in the Balkans.
Verschueren (1998), for example, note that “the absence of the feature ‘distinct language’
tends to cast doubts on the legitimacy of claims to nationhood” (p. 192). Furthermore, as
Irvine and Gal (2000) argue, in “the political contestation surrounding contrasting
for political and military action that changed sociolinguistic practices, thereby bringing
into existence patterns of language use that more closely matched the ideology of
Western Europe” (p. 60). But it would be naïve, of course, to think that Western ideology
has been merely passively adopted. As the recent localizations of the contemporary
Western discourse on terrorism in the Serbian and Croatian press (Erjavec, 2009) show,
Western ideologies are also appropriated and function on the basis of “fractal recursivity”
(Irvine & Gal, 2000, p. 38) to serve specific local purposes. As Irvine and Gal (2000)
ethnonational identity is thus “hardly surprising, given the consequences envisaged and
authorized by the reigning language ideology and occasionally enacted under its auspices.
because they are also claims to territory and sovereignty” (p. 72, my italics). Seen in this
light, the continuing Serbian (and in the case of Bosnian, also Croatian) refusal to
recognize other groups’ ownership rights, including the right to name their language (see
Although, as we have seen, language (ideological) debates have continued throughout the
period of the “joint” language development, the period between 1990 and now is
especially significant because it has seen the end of Serbian (and Croatian) lingua-
Herzegovina and Montenegro. These processes have been reflected in and partly
constituted through discourses produced for and directed at the public “as a language-
based form of political legitimation” (Gal & Woolard, 2001, p. 4). What is still missing
order to determine their specific contents and modi operandi. But, before we turn to the
let us take a detailed look at the concepts of ideology and discourse, which are often
contested themselves.
literature review, divided into sections about theoretical approaches to ideology and
detailed overview of the study is given, including the research questions, research design,
construct definitions, and gaps. Chapters 4 and 5 discuss the data and methods employed,
while Chapters 6 and 7 present the results and discussion (by research question). The
conclusion, limitations, and suggestions for future research are given in Chapter 8.
Appendices A-G detail relevant preliminary analyses and show full lists of keywords and
collocations.
2. Literature Review
originates from the late eighteenth century. The term was coined by the French
zoölogy), optimistically hoping to arrive at a full understanding of the human mind and
1998; see also Aarsleff, 1982). As both Silverstein and Eagleton note, however, like
many other terms ending in “-ology”, the meaning of ideology quickly shifted from
1) and thus from “scientific study of human ideas” to “systems of ideas themselves”
(Eagleton, 1991, p. 63). Complicating matters further, over time the term has developed
“a whole range of useful meanings, not all of which are compatible with each other”
this semantic and conceptual quagmire. The literature on ideology is vast, spanning
many different fields and research traditions, so what follows is perforce a selective but,
for present purposes, hopefully functional treatment (for surveys, see, for example,
Eagleton, 1991; Thompson, 1984; and van Dijk, 1998). Woolard and Schieffelin (1994)
point to two basic divisions in the study of ideology, which are also applicable to the
study of language ideology. The first concerns the truth value of ideology and
differentiates between neutral and critical uses of the term, whereby neutral views of
(i.e., “aspects of representation and social cognition with particular social origins”). This
system, and operated by every member or actor in that system” and ideology as “a
specific set of symbolic representations […] serving a specific purpose, and operated by
specific groups or actors” (2005, p. 158, italics in the original). Recognizing this basic
division between “the wider and narrower senses of ideology”, Eagleton (1991, p. 3), on
the other hand, also points to interpretations of ideology as “illusion, distortion and
mystification” (i.e., concerned with the truth value of ideology) and those instead
“concerned with the function of ideas within social life” (i.e., the metapragmatics of
ideology). Most authors seem to agree that there is a further general distinction to be
made between the competing views of ideology: that between views of ideology as a
existing below the level of discursive consciousness, and those which see ideology as a
(Althusser, 1971).
This latter point is particularly important for the second division in the study of
what they call “the siting of ideology”. In their own words, “[a]lthough ideology in
[which need not be linguistic] not in consciousness but in lived relations” (Woolard &
metapragmatic commentary, cf. Silverstein, 1993), or they must be inferred from the
implications for the study of ideology as well as language ideology which are most
pertinent for this dissertation; I shall therefore return to this issue toward the end of this
chapter.
different fields, research traditions, and political positions noted above, definitions of
ideology abound. I rely here on the account provided by Terry Eagleton in his widely
makes a distinction between “descriptive”, “pejorative” (cf. neutral vs. critical above) and
meanings and beliefs which is to be viewed critically or negatively” because it
legitimates an unjust social order; and “positive” definitions of ideology as those that
view ideology as “a set of beliefs which coheres and inspires a specific group or class in
the pursuit of political interests judged to be desirable” (Eagleton, 1991, pp. 43-44).
Woolard (1998) cites the anthropologist Clifford Geertz and sociologist Karl
ideology as the totality of social knowledge and a medium of meaning for social
purposes. But, as she also notes, the chief criticism of such conceptions of ideology is
that they neglect power relations, which is precisely the focus of the “pejorative”
definitions. Perhaps the most widely cited, but now also the most widely rejected, pejorative
consciousness”, originally put forward by Karl Marx and Friedrich Engels in their book
acquiescence on the part of the proletariat to the bourgeois hegemony (Gramsci, 1971)
such that members of the working class are unable to identify their true class interests due
distorting those relations.” Eagleton finally cites Lenin’s approval of the term “socialist
ideology” as well as its acceptance by other “radical” theorists such as Sorel and
“pejorative” and “positive” conceptions of ideology derive from the Marxist tradition and
the work on the political left, the “neutral” conceptions of ideology are ascribed to the
work deriving from the neo-Kantian tradition in philosophy and the Durkheimian tradition
in sociology (Blommaert, 2006a). Importantly for the present purposes, however, one
should note that what is understood as “an unjust social order” and “political interests
judged to be desirable” will, of course, depend on one’s ideological position and thus
identifies four such strands which can be summed up as follows: (1) ideology as
ideational or conceptual, mental rather than social phenomena; (2) ideology as reflective
life; (3) ideology as signifying practices in the struggle for power; and (4) ideology as
distortion which can, but need not, derive from an interest in the legitimation of power.
proceeding from the very general and neutral definition of ideology as “the whole
complex of signifying practices and symbolic processes in a particular society” (p. 28) to
the specific and critical definitions of ideology as “ideas and beliefs which help to
legitimate the interests of a ruling group or class specifically by distortion” (p. 30), and
similarly false beliefs arising from the material structure of society rather than the
other hand, points to the synchronization of different layers of ideology, i.e., “different
ideologies […] operat[ing] at different levels of historicity” (p. 175) but at the same
historical moment, as the likely cause of the “terminological muddle” (p. 161).
consequence of the concurrent and, more often than not, opaque operation of multiple
ideologies of different orders and varying historical trajectories. But perhaps the most
important, if also the most general, conceptualization of ideology for the purposes of this
dissertation is that of ideology as the fundamental element in the triumvirate it forms with
discourse and power (e.g., Blommaert, 2005; Eagleton, 1991), which points to the
noted, the leading early twentieth century American anthropologists and linguists such as
Franz Boas, Leonard Bloomfield, and Edward Sapir did consider the issue of language
is the exception here), the emergence of the study of language ideology is now commonly
dated back to his own work, and his seminal 1979 paper “Language structure and
1998). Kroskrity (2004) outlines briefly but usefully the history of twentieth-century
both anthropology and linguistics, pointing to the work in the fields of ethnography of
Hymes and John Gumperz as an important precedent for the later work in language
“scientism”, itself an ideology) and in contrast to ideology, then, language ideology as a
subfield of academic inquiry emerged only in the last two decades of the twentieth
century and is “still under construction” (Blommaert, 2006a). Furthermore, it does not,
as Woolard and Schieffelin (1994) noted, yet have a single core literature, although one is
(see essays in Gal & Woolard, 2001; Schieffelin, Woolard & Kroskrity, 1998, 2000;
Wilce, 2000). There has also been a parallel and related research program in the
framework of critical discourse analysis, which has examined the role of language in
Blackledge, 2005; Blommaert, 2005; Wodak, 2012; van Dijk, 1998), as well as a number
of contributions from applied linguistics generally and language policy and planning in
particular (e.g., Blackledge & Pavlenko, 2002; Blommaert, 1999; Jaffe, 1999; Lippi-
multifarious as research into ideology itself, similarly spanning many different fields and
research traditions. Woolard and Schieffelin (1994) and Woolard (1998) provide wide-
ranging overviews of some of the most important work in the contributing fields and
colonial studies, sociology, and, perhaps most importantly for present purposes, language
policy, language politics, and studies of identity and nationalism. An important, albeit
sometimes fuzzy, dividing line here has been between research foregrounding the
dialectic between language ideology and social life, referred to above as the
metapragmatics of (language) ideology, and research foregrounding the dialectic between
language ideology and the linguistic system itself (e.g., Dirven, Hawkins & Sandikcioglu,
2001; Seargeant, 2009; Silverstein, 1979). Note that, on account of my focus on public
discourses and the role of language ideology in the (re)construction and maintenance of
and will not be devoting much attention to the latter, even if the two are certainly
complementary and could be fruitfully combined (for an example, see Irvine & Gal,
2000).
Silverstein (1979, p. 193) originally defined language ideologies as “sets of beliefs about
structure and use.” Most other definitions are less concerned with language structure,
however, while beliefs about language, more often than not, are tacit and can only be
Definitions of language ideology also differ in scope. Rumsey (1990) thus offers a very
about the nature of language in the world” (p. 346). Somewhat more narrowly, Seargeant
understands language ideology as “entrenched beliefs about the nature, function, and
symbolic value of language” (2009, p. 346). Spolsky (2004), on the other hand, focuses
ideology above), seemingly eschewing the issue of power relations and its implications
for the understanding of language ideology. Contrary to this trend, and more in
line with my own view, Irvine (1989) sees language ideology as “the cultural system of
ideas about social and linguistic relationships, together with their loading of moral and
political interests” (p. 255), while Errington (2001) makes this even more explicit by
referring to language ideologies as “situated, partial, and interested […] conceptions and
uses of language” (p. 110). Interestingly, although these last two definitions are not quite
neutral in the above sense, exhibiting as they do a critical awareness of the necessary
social situatedness of any ideology, they nevertheless stop short of fully articulating this
particular aspect of the concept. Further, perhaps due to the general poststructuralist
distaste for any even remotely essentialist notions, there is no mention here of “illusion,
distortion and mystification” or “false consciousness” (but see Spitulnik, 1998, especially
p. 164).
This, however, is not to say that research into language ideology is generally
unaware or neglectful of the power aspect of it. On the contrary, it seems fair to say that
virtually all treatments of language ideology, whatever their foci, pay careful attention to
power and its implications in their analyses. A good example of this is the collection of
(political) debate. This is further illustrated by some of the conclusions reached thus far
“strands” and “layers” (see above), syntheses of research into language ideology, more
2000b, 2004). Kroskrity (2004, pp. 501-509) identified five such levels of organization
of language ideology that emerged from the existing literature, three of which clearly
research: (1) group or individual interests (i.e., “language ideologies represent the
perception of language and discourse that is constructed in the interest of a specific social
or cultural group”); (2) multiplicity of ideologies (i.e., “language ideologies are profitably
conceived as multiple because of the plurality of meaningful social divisions”); and (5)
productively used in the creation and representation of various social and cultural
But even after we recognize that despite the apparent definitional shortcomings,
static phenomenon (a set of existing, given beliefs, notions, ideas or conceptions), a fait
accompli as it were. Thus we may know what a particular language ideology is (if, of
course, we can agree on a shared understanding of it), but we are less sure of how it
operates. However, language ideology, as Spitulnik (1998) points out, is both conceptual
understand language ideologies not only as “ideas with which participants and observers
frame their understandings of linguistic varieties” but also as processes through which
they “map those understandings onto people, events, and activities that are significant to
them” (Irvine & Gal, 2000, p. 35), at which point we can finally consider their effects and
consequences.
2.3 Empirical Approaches to Language Ideology
This section provides a brief look at the origins and the development of the study
of language ideology, and the theoretical and methodological contexts for the current
investigation. This is followed by a discussion of the gaps in the existing literature and
interdisciplinary and can be classified broadly into one of three main thematic foci:
language ideology and language education (e.g., Hornberger & McKay, 2010); language
ideology and identity, ethnicity, and nationalism (e.g., Kroskrity, 2000a); and language
ideology and social justice (e.g., Blackledge & Pavlenko, 2002). However, it should be
noted that these often overlap as language-in-education policies, for example, can have
implications for ethnic and national identities as well as social justice (e.g., Lippi-Green,
Perhaps due to the influence of (critical) discourse analysis, the theoretical approaches
have been eclectic and often unsystematic, drawing on and combining a wide range of
cursory glance at the available literature reveals a heavy reliance on readily available
focusing on language ideology and language education, which rely on surveys, and
methods and data as much as on textual pedagogic materials and policy documents. The
on its thematic focus. In the case of research into language ideology and language
education, typical contexts include classrooms as well as educational contexts beyond the
McGroarty, 2010), whereas research into language ideology and identity, ethnicity, and
nationalism, as well as social justice, relies more on politically framed contexts (e.g.,
state or regional and national entities, with some studies taking a comparative, cross-
national approach). Further, unlike the studies of educational contexts, studies with one
of the other two foci strongly favor textual materials such as newspaper discourse and
2009; Ricento, 2003). Other contexts include translation work (e.g., Kuo & Nakamura,
research into public discourses and language ideologies has been limited to institutional
contexts of one kind or another (e.g., educational system). The primary reason for this, of
course, is that language ideologies are sociocognitive phenomena and so, as noted above,
institutions (whether cultural, political, or religious) as major discourse nodes are more
likely sites to contain traces of ideologies but also to have an impact on their
reproduction. And yet the problem with an exclusive focus on institutional contexts is
that they tend to be dominated by official, top-down discourses and ideologies, often to
hegemonic, discourses and ideologies. More research is therefore needed into contexts
which are more likely to show traces of unofficial or alternative discourses and ideologies
such as (subaltern) language activism (Jaffe, 1999), online reader
commentary (Vessey, 2013b), and debates in cyberspace (i.e., online fora and social
networks, e.g., Johnson, Milani & Upton, 2010). A note of caution is in order here,
however: although anonymized online communication has the (often dubious) advantage
of being free from many constraints of face-to-face interactions (in addition to ease of
access) and thus can offer an insight into attitudes and beliefs devoid of certain pragmatic
should be treated with caution and matched against data from other sources whenever
possible.
the research traditions in the contributing disciplines (i.e., anthropology, critical discourse
analysis) and partly on account of the theoretical and methodological nascence of the
field, research questions in the existing studies are not always stated explicitly and can
thus be implicit or even difficult to identify. The two early foci on language attitudes and
2007; Vessey, 2013a; O’Rourke & Ramallo, 2013); other contributions have attended to
the specific role of public institutions in the production of language ideologies (Spitulnik,
2003; Salama, 2011), as well as migration and diasporic communities (Baker et al., 2008;
Fraysee-Kim, 2010).
so here again we can discern some major trends. Earlier studies, which relied more on
tend to ask two types of questions both of which often include a methodological
component. Studies based on ethnographic and historiographic analyses thus ask macro-
level questions such as the following: What is the structure of language ideology? What
are the consequences (for politics, for research) of language ideologies? (Irvine & Gal,
2000); How have ideologies of language, nation and state been connected to each other
and the practice of sociolinguistics? (Heller, 1999). Studies based on discourse analysis,
on the other hand, tend to focus more on micro-level or localized issues, asking questions
such as: What discourse prosodies (and ideologies) do the term “liberal” and its
derivatives “liberalism” and “liberalization” have? (de Beaugrande, 1999); and, What is
the role of language ideology in the conflict of Catalan and Spanish nationalisms over the
Blommaert and Verschueren (1998), for example, rely on discourse analysis but ask
ideologies? What is an adequate methodology for ideology research?), while Jaffe (1999)
relies mainly on ethnography to ask both local and more global questions (What are the
and critical discourse analysis (CDA). Also here, there is often a methodological
studies include: What are the similarities and differences between Anglo-American and
German discourses of “political correctness”? (Johnson & Suhr, 2003); What are the
migrants in the British press? Which texts are representative? (Baker et al., 2008); and,
Can the Canadian context provide a useful site for cross-linguistic corpus-assisted
discourse studies (CADS)? How can cross-linguistic CADS shed light on language
Overall, then, research into language ideology seems to be moving away from
contexts. Most pertinently for the field of language policy and planning (LPP), there is
although the question of the siting of ideology (Woolard & Schieffelin, 1994) is of central
importance to the identification and interpretation of language ideologies, language-
ideological research has so far largely failed to produce integrated accounts which would
examine and compare data from different sites of ideological (re)production (but see
Blackledge, 2005; Jaffe, 1999; and Vessey, 2013b for steps in this direction). At the same
time, the availability of data from various social media, as well as academic sources,
offers an opportunity for more integrated and innovative perspectives to consider and
discursive consciousness.
2.3.3 Types of data used in language ideology research. The different types of
data typically relied upon in language-ideological research have been hinted at above.
Depending on the methods used, they can be qualitative or quantitative and include
survey, observational, and experimental data, as well as data in the form of pedagogic
materials, official policy documents, newspaper and other media discourse, and historical
documents. In addition to these, some data types that have received comparatively less
attention include texts produced under experimental conditions (Wallis, 1998), time
1998), as well as data obtained by way of discussion groups (O’Rourke & Ramallo,
2013).
language ideology research. The reasons for a relative overemphasis on newspapers and
other print periodicals such as magazines over other potential sources of data are several.
First, and perhaps foremost, as already noted, “newspapers are self-conscious loci of
ideology production” (DiGiacomo, 1999, p. 105) as well as “key sites for language
ideological debates between various kinds of social actors” (Ensslin & Johnson, 2006, p.
155). Second, although they have been losing ground to newer media (e.g., television,
the Internet) for a long time, newspapers remain an influential institution in most
easily accessible and manipulable format (i.e., electronic text). Third, because of the
but often also languages, newspaper data allow for effective synchronic comparisons of
discourses and ideologies based on independent variables such as political affiliation and
(ethno)national identity. Fourth, the relative constancy of newspaper formats across time
allows for equally effective diachronic comparisons, which are particularly useful in
demonstrating the changing nature of discourses and ideologies over time and their
dialogic relationship with the cultural, social, and economic conditions in which they are
embedded.
Although there is a growing body of research that relies on corpus linguistics to study
discourse (e.g., Baker, 2006; Baker et al., 2008; Baker, 2010; Baker, Gabrielatos &
McEnery, 2013; Mautner, 2007; Partington, 2003; Partington, 2010), corpus-based (or
still rare (e.g., Fitzsimmons-Doolan, 2009, 2011, 2014; Ensslin & Johnson, 2006; Freake,
discourse and language ideology studies make use of a wide variety of corpus-linguistic
tools such as wordlists, clusters, concordances, dispersion plots, collocates, and keywords
(for an excellent introductory overview, see Baker, 2006). Keyword analysis, a statistical
approach to contrastive analysis of word frequencies (for a detailed explanation, see
further below) is used to identify the lexical features characteristic of research corpora
and thus potentially interesting foci for follow-up discourse analysis. Scott (1997), for
example, uses this approach to identify lexical patterns suggesting gender bias in
university instructors with US and Korean names, finding a bias against the Koreans
individual keywords has been broadened in recent years to include part-of-speech (POS)
“aboutgrams” (Sinclair, 2006, cited in Warren, 2010, p. 118), the frequently occurring
lexical phrases which point to a text’s (or a corpus’) “aboutness”. While keywords point
to the “aboutness” of a text or corpus, key semantic fields and aboutgrams are taken to be
more directly suggestive of discourses and ideologies extant in the corpora under
investigation.
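Aboutgrams are frequently recurring lexical phrases, so the first step in finding them is simply counting them. The Python example below is a minimal n-gram counter under several assumptions: tokenization is lowercase and keeps only runs of letters, and the cut-off `min_freq` is a hypothetical parameter. It is not Sinclair's or Warren's procedure for identifying aboutgrams. It shows only the frequency step that such a procedure could start from.

```python
import re
from collections import Counter

# Runs of letters only; Python's Unicode-aware \w keeps č, ć, đ, š, ž intact.
WORD = re.compile(r"[^\W\d_]+")

def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return WORD.findall(text.lower())

def frequent_ngrams(texts, n=3, min_freq=2):
    """Count contiguous n-grams across all texts; keep those seen at least min_freq times."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return [(gram, freq) for gram, freq in counts.most_common() if freq >= min_freq]

if __name__ == "__main__":
    # Invented examples: "Serbian language is an official language",
    # "Serbian language and Bosnian language".
    sample = ["Srpski jezik je službeni jezik.", "Srpski jezik i bosanski jezik."]
    print(frequent_ngrams(sample, n=2))
```

On the invented sample, only the bigram `("srpski", "jezik")` reaches the cut-off, occurring twice.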
collocation analysis as well. Baker (2004, p. 347) argues that keyness and collocation
patterns can alert the researcher to “the existence of types of (embedded) discourse or
ideology.” Baker et al. (2008), for example, base their examination of the lexical patterns
around four core concepts (i.e., search terms) in their study of the discourse on
Baker, Gabrielatos and McEnery (2013) take this approach a step further by using an
online corpus query system called Sketch Engine to grammatically tag their corpus and
then analyze not only lexical but also grammatical co-occurrence patterns between
collocates. Similarly, Freake, Gentil and Sheyholislami (2011) rely on collocation
similar approach to study the language ideological debate on the use of French during the
Vancouver Olympics. Most recently, lexical patterns resulting from collocation analysis
Doolan (2011, 2014) used factor analysis, a multivariate statistical technique which
groups variables on the basis of their covariance (for details, see further below), on the
collocates of three core concepts identified using a 1.4 million-word corpus of language
patterns in the data which are worth pursuing further as well as to “downsample” the
analysis, the results of which are then in turn checked for reliability using quantitative
methods (Baker et al., 2008, p. 295; Fairclough, 2010; Partington, 2010; van Dijk, 2006;
Wodak, 2001; for examples, see Mautner, 2007 and Vessey, 2013a).
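Factor analysis groups variables by their shared variance, so its input is a matrix of correlations among those variables. Here the variables are imagined as collocate frequencies counted per text. The sketch below computes only that input matrix, in plain Python, and does not reproduce factor extraction or rotation. The collocate names and counts are invented for illustration. Raw counts are used for simplicity; normalizing counts by text length is a common alternative.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length frequency vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A variable that never varies carries no covariance; in practice it would be dropped.
        return 0.0
    return sxy / sqrt(sxx * syy)

def correlation_matrix(texts, variables):
    """texts: one dict per text mapping collocate -> count (missing means 0).
    Returns {(v1, v2): r} for every ordered pair of variables."""
    columns = {v: [t.get(v, 0) for t in texts] for v in variables}
    return {(a, b): pearson(columns[a], columns[b]) for a in variables for b in variables}

if __name__ == "__main__":
    # Hypothetical counts: srpski 'Serbian', pravopis 'orthography', ugrožen 'endangered'.
    texts = [{"srpski": 1, "pravopis": 2, "ugrožen": 3},
             {"srpski": 2, "pravopis": 4, "ugrožen": 2},
             {"srpski": 3, "pravopis": 6, "ugrožen": 1}]
    r = correlation_matrix(texts, ["srpski", "pravopis", "ugrožen"])
    print(round(r[("srpski", "pravopis")], 3), round(r[("srpski", "ugrožen")], 3))
```

A statistics package would then extract factors from a matrix like this one. Variables that load on the same factor are the co-varying collocates that can be read as candidate discourses.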
2.4 Gaps
This study aims to address several theoretical and methodological gaps in the
literature. It is well known that corpus linguistics research, including studies of discourse
and ideology, was developed in English and has also disproportionately focused on
English and a small number of major European languages. At the same time, extensive
inflectional morphology presents a number of corpus-linguistic challenges that are largely
absent from studies of languages such as English. Therefore, the first gap has to do with
the paucity of corpus-based research into smaller languages such as Central South Slavic
The second gap is related to the first and has to do with geopolitical focus. Much
of recent research has examined language debates (e.g., Blommaert, 1999) and the links
societies (e.g., Canada, Vessey, 2013a; Finland, Hult & Pietikäinen, 2014; Spain/Catalonia,
Pujolar, 2007), but little attention has been paid to contexts with minimal linguistic
languages (but see Wilce, 2010). This study will therefore contribute to the growing
analysis, exploratory factor analysis) have been used in discourse and ideology studies,
there have as yet been no attempts to compare them. This study will therefore compare and
contrast the results obtained through the application of these methods in terms of their
usefulness and effectiveness for the study of language-related discourses and language
ideologies.
Finally, the fourth gap has to do with synchronic variation between different
discursive sites. This study will therefore compare language-related discourses and
newspaper articles (written by journalists and/or experts) with letters-to-the-editor
3. Study Overview
This chapter presents the research questions, research design, construct definitions
(keyword, collocation, exploratory factor, and cluster analyses) be used to identify lexical
Central South Slavic and what similarities/differences are there between them?
3.1.3 Research question 3. What links can be identified between the language-
Table 1 presents the overall research design employed in the study. Data,
methods, and analytical procedures are listed by research question. Figure 1 shows a
Table 1
Research design overview (content only partly recoverable; surviving labels: corpus compilation (relevant publications); sampling phase I; sampling phase II; corpus comparisons with SERBCOMP; exploratory factor analysis; cluster analysis)
3.3 Construct Definitions and Operationalizations
This section presents the definitions and operationalizations of the key constructs
3.3.1 Core concepts. The core concepts (lemma JEZIK ‘language’ for the purposes
of corpus compilation and initial collocation analysis, but also lemmas BOSANSKI JEZIK
the purposes of follow-up analysis) are simply those lexical items and phrases whose
patterning in the corpus is most likely to lead to the identification of dominant language-
related discourses and language ideologies relevant to the maintenance and (re-)
3.3.2 Keywords. Here I use Scott’s (1997, p. 236) definition of a key word “as a
word which occurs with unusual frequency in a given text.” More importantly, keywords
are understood here also as “pointers to complex lexical objects which represent the
3.3.3 Relevant collocates. The relevant collocates are all lexical and some
function words (e.g., possessive pronouns) that are shown to be statistically significant
collocates of the core concepts and which, upon further analysis (i.e., concordancing,
discourses and language ideologies. Although such collocates are determined by setting a
search span around a core concept (e.g., L5-R5), relevant collocates are defined here also
more broadly as “textual collocates” (Mason & Platt, 2006, cited in Stubbs, 2010, p. 27)
such that all their textual occurrences are counted in each text (for purposes of EFA) and
not only those that appear within a certain span of the node word.
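A few lines of code can show how span-based collocates differ from textual collocates. The Python sketch below rests on several assumptions:

- Texts are already tokenized and lemmatized, so every inflected form of JEZIK appears as `jezik`.
- Collocates are counted within an L5-R5 span.
- Collocates are scored with one common formulation of mutual information. Variants differ, for example, in whether the expected count is scaled by the span size.

Finally, the sketch counts textual occurrences per text, as they would be entered into EFA. The function names and the MI variant are illustrative, not the settings of the software used in the study.

```python
import math
from collections import Counter

def span_collocates(tokens, node, left=5, right=5):
    """Count every token within `left` words before and `right` words after each node occurrence."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for j in range(max(0, i - left), min(len(tokens), i + right + 1)):
            if j != i:
                counts[tokens[j]] += 1
    return counts

def mutual_information(observed, f_node, f_coll, corpus_size, span=10):
    """log2(observed / expected), with expected = f_node * f_coll * span / corpus_size."""
    expected = f_node * f_coll * span / corpus_size
    return math.log2(observed / expected)

def textual_counts(tokenized_texts, collocates):
    """Count all occurrences of each collocate in each text, regardless of distance from the node."""
    return [{c: tokens.count(c) for c in collocates} for tokens in tokenized_texts]
```

Consider the token list `"srpski jezik i hrvatski jezik".split()` ("Serbian language and Croatian language") with an L1-R1 span. Each of `srpski`, `i`, and `hrvatski` is counted once as a span collocate of `jezik`. By contrast, `textual_counts` would count every occurrence of a collocate anywhere in the text.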
with a range of divergent meanings in social sciences and the humanities. The traditional
definition of discourse in linguistics is simply “language above the sentence or above the
clause” (Stubbs, 1983, p. 1) or “language in use” (Brown & Yule, 1983). Michel
which systematically form the objects of which they speak” (1972, p. 49). It is this
difference that prompted James Gee (2010, p. 34) to make a distinction between
discourses with a small ‘d’ (“language-in-use”) and discourses with a big ‘D’ (language-
in-use plus “socially accepted associations among ways of using language”) which is
production, distribution and consumption), and social practice (e.g., 2010, p. 59). In
addition, as can be seen from Gee’s definition, it is possible to conceive of discourse, not
as a more or less coherent product of social and semiotic “practices” and thus a singular
therefore a plural noun. Although this plural understanding of discourse has gained
currency in both social sciences and the humanities, more often than not it is implicit and
left undefined. Most pertinently for my purposes here, Baker (2006) draws on several
sources to expand upon Foucault’s original definition and add a plural dimension to
it, so I reproduce his account at length here:
Taking this definition as a starting point, discourses are understood here as more
or less coherent systems of statements which construct an object of which they speak
(e.g., language) or an aspect thereof from a particular social or cultural position with the
goal of upholding the interests associated with this position. A discourse may be said to
between social subjects (Fairclough, 2010). Depending on their social effects, discourses
defined which pertain to language rather than other aspects of social reality. They are
operationalized as a) individual factors resulting from factor analysis, and sets of factors
clustering together (small ‘d’ discourses), and b) broader sets of statements about
linguistic and social relationships that extend across factor, cluster, and textual
contested concept with a range of divergent meanings in the humanities and social
sciences (e.g., Eagleton, 1991). Following Irvine (1989, p. 255), language ideologies are
understood here as “the cultural system[s] of ideas about social and linguistic
relationships, together with their loading of moral and political interests.” More
specifically, “language ideologies represent the perception of language and discourse that
(Kroskrity, 2000b, p. 8, my emphasis). Also here, the use of the plural reflects a concern
with different social positions from which language ideologies emanate as well as the
ideologies are understood to be hegemonic, i.e. espoused from positions of power for the
identified here.
problematic concept and a complex social phenomenon that defies simple definitions. To
make things even more difficult, definitions and understanding of nationalism depend on
concepts in relation to language, see, e.g., Barbour, 2000; Fishman, 1997; Fought, 2006;
May, 2001; Safran, 1999). However, theorists often distinguish between ‘civic’ or ‘state’
and ‘ethnic’ nationalisms, i.e. nationalisms defined by affiliation with nations and nation-
states which are not necessarily culturally homogeneous (cf. Staats- or Willensnation; for
a discussion, see Wodak, de Cillia, Reisigl & Liebhart, 1999) and nationalisms defined by
cultural groups that have a common myth of origin and share history, basic cultural
practices, and often language and religion (cf. Kulturnation, ibid.). Historically, ethnic
nationalism has clearly been the more important of the two in the Balkans (Carmichael,
movement to define and pursue the political interests of a nation (whether self-declared
Each text in the 5+ hits section of SERBCORP was coded for: a) publication
(Blic, NIN, Politika, Vreme); b) year of publication (2003, 2004, 2005, 2006, 2008), and
4. Data
one or more instances of any of the lemma forms of the word jezik ‘language’ from four
leading national newspapers, two dailies (Blic, a tabloid, and Politika, a broadsheet) and
two weeklies published in Serbia (NIN, Vreme) in the period between 2003 and 2008.[16]
The publications were chosen based on three criteria: type of publication (broadsheets vs.
tabloids, dailies vs. weeklies), circulation figures,[17] and relative standing in the Serbian
and regional publics (for details about the Serbian media market, see Đoković, Hrvatin &
Petković, 2004).[18] Because full data sets were only available for the period between 2003
and 2008 (with the exception of Politika for the year 2007),[19] the data set is limited to the
years 2003-2006 and 2008. Similarly, SERBCOMP, the reference (or, rather,
comparator) corpus used here, comprises articles from Politika, Blic, NIN, and Vreme, as
well as Večernje Novosti, published in the period between 2003 and 2014.
The two corpora were compiled by downloading the relevant articles from the
Serbian online media archive Ebart (www.arhiv.rs)[20] as follows. After the target
publications had been identified, publication-specific searches were run for articles
containing any of the inflectional forms of the core concept lemma JEZIK ‘language’[21]
(and, perforce, the lemma JEZIČKI ‘linguistic’) by using the search term “jezi*” and the
given timeframe. Using a custom Python application, relevant articles thus identified
were then automatically downloaded, formatted,[22] and saved in separate folders according
2006>July). The application also automatically named the files according to publication
(e.g., POL for Politika), date of publication (e.g., POL-22-7-2006 for July 22, 2006), and
download rank for their given month (e.g., POL-22-7-2006-55 for the 55th article
downloaded from Politika for July 2006) or publication (in the case of publications with
Similarly, using the search term “NOT jezi*”, SERBCOMP was compiled from randomly
chosen articles not containing any forms of either one of the two core concept lemmas
(JEZIK ‘language’ and JEZIČKI ‘linguistic’). Once compiled, both corpora were checked
for errors and duplicates. Finally, a frequency-based wordlist was used to identify and
exclude from SERBCORP a small number of articles which formally met the search
criteria (“jezi*”) but were nevertheless irrelevant (e.g., those containing words such as
jezičak ‘little tongue’, ježičak ‘little hedgehog’, and jezivo ‘horrible’, or last names such as
Ježić but not forms of the lemmas JEZIK or JEZIČKI).
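The matching, filtering, and file-naming steps described above can be sketched as follows. This is not the author's custom application, whose code is not given here. It is a minimal Python illustration resting on three assumptions:

- The archive search was diacritic-insensitive. The false positives ježičak and Ježić suggest this, but the text does not state it.
- The exclusion list is compiled by hand from the wordlist.
- File names follow the POL-22-7-2006-55 pattern quoted above.

The helper names and the `EXCLUDED` set are hypothetical.

```python
import re
import unicodedata
from collections import Counter

WORD = re.compile(r"[^\W\d_]+")  # runs of letters, Unicode-aware

def fold(word):
    """Lowercase and strip diacritics (č -> c, ž -> z); đ has no decomposition and is kept."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def jezi_matches(text):
    """Lowercased word forms that would match the wildcard search term jezi* (assumed diacritic-insensitive)."""
    return [w.lower() for w in WORD.findall(text) if fold(w).startswith("jezi")]

def jezi_wordlist(articles):
    """Frequency wordlist of jezi* matches across {filename: text}."""
    counts = Counter()
    for text in articles.values():
        counts.update(jezi_matches(text))
    return counts

# Hypothetical exclusion list, compiled by hand after inspecting the wordlist.
EXCLUDED = {"jezičak", "ježičak", "jezivo", "ježić"}

def irrelevant_articles(articles, excluded=EXCLUDED):
    """Articles that met the search criterion only through excluded forms."""
    flagged = []
    for name, text in articles.items():
        forms = set(jezi_matches(text))
        if forms and forms <= excluded:
            flagged.append(name)
    return sorted(flagged)

def article_filename(pub, day, month, year, rank):
    """E.g. POL-22-7-2006-55: Politika, 22 July 2006, 55th article downloaded that month."""
    return f"{pub}-{day}-{month}-{year}-{rank}"
```

An article whose only jezi* matches are jezivo and Ježić would be flagged for exclusion. An article containing jezik or jezička would be kept.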
majority of both (49.88% of words and 61.48% of articles) coming from the daily
Politika as the oldest and arguably most influential daily in Serbia (Tables 2 and 3).
(NIN, Vreme), while standardized type-to-token ratios (Scott, 2014a) are similar across
Table 2
Table 3
SERBCOMP comprises a total of 22,493,804 words from 37,227 articles from all five
publications from the period between 2003 and 2014, as mentioned above (Table 5).
Table 4
Table 5
comprising articles with 5 or more hits for the lemma JEZIK ‘language’ as the optimal
research corpus for present purposes (see Table 6 for a breakdown of articles by hit count).
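Selecting the 5+ hits section means counting, for each article, the tokens that are forms of the lemma JEZIK and keeping the articles at or above the threshold. The Python sketch below also tallies articles by hit count, which is the kind of breakdown reported in Table 6. It assumes a hand-listed set of the noun's case forms rather than the study's own lemmatization. The function names are hypothetical.

```python
import re
from collections import Counter

WORD = re.compile(r"[^\W\d_]+")

# Assumed case forms of the noun jezik 'language' (singular and plural).
JEZIK_FORMS = {"jezik", "jezika", "jeziku", "jeziče", "jezikom",
               "jezici", "jezike", "jezicima"}

def jezik_hits(text):
    """Number of tokens in a text that are forms of the lemma JEZIK."""
    return sum(1 for w in WORD.findall(text.lower()) if w in JEZIK_FORMS)

def hit_breakdown(articles):
    """How many articles have 1, 2, 3, ... hits for JEZIK."""
    return Counter(jezik_hits(text) for text in articles.values())

def five_plus_section(articles, threshold=5):
    """Keep only articles with at least `threshold` hits for JEZIK."""
    return {name: text for name, text in articles.items()
            if jezik_hits(text) >= threshold}
```

The adjective jezički is deliberately not counted, because the threshold refers to the lemma JEZIK only.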
Table 6
Articles in SERBCORP (by Hit Count for the Lemma JEZIK and Percentage)
The 5+ hits section of SERBCORP thus comprises a total of 1,118,454 words from 1,257
articles, with a majority of both from Politika (52.62% of words and 67.38% of articles,
see Tables 7 and 8). Similar to SERBCORP, the dailies contributed larger numbers of
shorter articles while standardized type-to-token ratios remain similar (Table 9).
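A standardized type-to-token ratio controls for text length by averaging the ratio over equal-sized stretches of text. The minimal Python sketch below makes three assumptions about the convention: chunks of 1,000 tokens, ratios expressed as percentages, and an incomplete final chunk that is ignored. It does not describe how WST implements the measure.

```python
def sttr(tokens, chunk=1000):
    """Standardized type/token ratio: the mean TTR (in %) over consecutive
    full chunks of `chunk` tokens; an incomplete final chunk is ignored."""
    ratios = [
        100 * len(set(tokens[start:start + chunk])) / chunk
        for start in range(0, len(tokens) - chunk + 1, chunk)
    ]
    if not ratios:
        raise ValueError("text is shorter than one chunk")
    return sum(ratios) / len(ratios)
```

With a chunk size of 4, the tokens `a b c d a a b b x` give one chunk at 100% and one at 50%. The leftover `x` is dropped, so the STTR is 75.0.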
Interestingly, the total number of 5+ hits articles decreased during this period in a linear
fashion (Figure 2), suggesting a gradual shift in focus away from an explicit thematization of
language, arguably owing to changing sociopolitical circumstances (see Section 6.6). All
Table 7
Table 8
Number of Articles in the 5+ Hits Section of SERBCORP (by Year and Publication)
Table 9
Article Means, SD, and STTR in the 5+ Hits Section of SERBCORP (by Publication)
Figure 2. Number of articles in the 5+ hits section of SERBCORP by year of publication (2003: 297; 2004: 273; 2005: 239; 2006: 235; 2008: 213)
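An ordinary least-squares fit can check whether the decline in Figure 2 is close to linear. The Python sketch below uses the yearly counts recoverable from the figure, which sum to the 1,257 articles reported above. It is an illustration only, not an analysis reported in the study.

```python
from math import sqrt

# Yearly article counts read off Figure 2 (2007 is missing from the data set).
years = [2003, 2004, 2005, 2006, 2008]
counts = [297, 273, 239, 235, 213]

def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept, plus Pearson's r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept, sxy / sqrt(sxx * syy)

if __name__ == "__main__":
    slope, _, r = linear_fit(years, counts)
    print(f"slope = {slope:.1f} articles/year, r = {r:.2f}")
```

With these counts, the fitted slope is about -16.5 articles per year and r is about -0.95. This fits the description of the decrease as broadly linear.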
5. Methods
and critical discourse analysis (CDA) in a manner similar to that originally proposed by
Baker et al. (2008). The initial, largely quantitative phase relies on five distinctly
analysis of variance, and cluster analysis. All quantitative analyses were conducted with
the help of WST and the Statistical Package for the Social Sciences 21.0 (SPSS; IBM,
2012), as well as several custom Python and Perl applications. The follow-up, largely
qualitative phase in turn relies on analytical techniques developed within the
discourse-historical approach (DHA) to CDA (Reisigl & Wodak, 2009; Wodak, 2001). It
should be noted, however, that the research design is not purely sequential (quantitative-
to-qualitative) but rather hermeneutic (i.e., it moves between quantitative and qualitative
techniques as necessary; cf. Baker et al., 2008; Reisigl & Wodak, 2009): the results of
both quantitative and qualitative procedures are examined from both perspectives and
thereby further focused and refined. What follows is a discussion of the theoretical
background and a step-by-step explanation of the relevant parameters and procedures
has mostly relied on keyword analysis (in addition to basic corpus-linguistic techniques
such as frequency, concordance, and collocation analysis; see below). Keyword analysis
has been used in a wide variety of discourse studies (see, for example, the essays in
Bondi & Scott, 2010) to identify what characterizes a certain text or corpus, as well as to
look for differences between parallel texts or corpora. The goal of keyword analysis
(Scott, 1997) is the identification of words “which occur with unusual frequency in a
given text [or corpus]” (p. 236), i.e., lexical features characteristic of research corpora and
reference corpus in addition to a research corpus and can be carried out automatically
using WST. An appropriate reference corpus should be composed of texts in the same
language as the research corpus and is typically expected to be larger than the research
corpus, although its optimal size remains unclear (Scott, 2009, 2010).
The reference corpora of choice have often been large general corpora (i.e., corpora
comprising different registers) such as the British National Corpus (BNC). However, in
the absence of such reference corpora (e.g., for languages other than English), and
depending on the research questions, comparator corpora (corpora of similar size and
register to the research corpus) have been used instead. Examples include corpora of
texts on the same topic that reflect different political or other orientations, and corpora of
the same types of texts that exclude those sharing the focus of the research corpus (see, for
example, Baker, 2006; Subtirelu, 2015; Vessey, 2013b). Once wordlists for both the
research corpus and the reference/comparator corpus have been compiled, keyword analysis uses
frequency and the number of running words in the research corpus with its observed
frequency and the number of running words in the reference corpus (Scott, 2014a). This
procedure determines which words appear statistically more (or less) frequently in the
reflected in a keyness score, which is based on the statistic chosen (i.e., chi-square or
log-likelihood). A list of keywords (KWs) calculated for a corpus thus suggests the
“aboutness” of that corpus, i.e., what the corpus is about. KWs can be positive (when they
are significantly more frequent in the research corpus than in the reference corpus) or
negative (when they are significantly less frequent in the research corpus than in the
reference corpus). Whereas positive KWs suggest what a corpus is about, negative KWs
can indicate what may be missing from it. Finally, the researcher can intuitively group the
resulting KWs into semantic fields in order to identify patterns for further analysis.
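As an illustration of the keyness computation described above, the following minimal Python sketch computes a signed log-likelihood (G²) score for a single word from its frequency in a research corpus and in a reference corpus. It assumes the common two-cell formulation of the statistic and signs the score negative when the word is relatively less frequent in the research corpus (a negative keyword). The function name is hypothetical, and WordSmith's own token counts and settings may differ, so its output need not match the keyness values reported in this study exactly.

```python
import math


def log_likelihood_keyness(freq_research, size_research,
                           freq_reference, size_reference):
    """Signed log-likelihood (G2) keyness of one word.

    Positive scores: relatively more frequent in the research corpus
    (positive keyword). Negative scores: relatively less frequent in the
    research corpus than in the reference corpus (negative keyword).
    """
    total_freq = freq_research + freq_reference
    total_size = size_research + size_reference
    # Expected frequencies if the word were equally common in both corpora.
    expected_research = size_research * total_freq / total_size
    expected_reference = size_reference * total_freq / total_size
    g2 = 0.0
    # An observed frequency of 0 contributes nothing (x * ln x -> 0).
    if freq_research > 0:
        g2 += freq_research * math.log(freq_research / expected_research)
    if freq_reference > 0:
        g2 += freq_reference * math.log(freq_reference / expected_reference)
    g2 *= 2
    # Sign the score by comparing relative frequencies in the two corpora.
    if freq_research / size_research < freq_reference / size_reference:
        g2 = -g2
    return g2
```

For example, `log_likelihood_keyness(412, 1118454, 26794, 22493804)`, using the frequencies of vlada ‘government’ reported in Table 11 and the word counts of the two corpora, yields a negative score, identifying it as a negative keyword.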
Keyword analysis has been widely criticized on several grounds, particularly for the
dependence of its results on the size and type of the reference corpus chosen and on the
choice of statistic (i.e., for questionable reliability). Some researchers argue that
larger reference corpora are generally better (e.g., Scott, 2010), while others have
suggested that the optimal size for a (specialized) reference corpus may be five times the
size of the research corpus (Berber Sardinha, 1999, 2004). In contrast, Xiao and McEnery
(2005, p. 70) contend that “the size of the reference corpus is not very important in
making a keyword list,” particularly when dealing with sufficiently large corpora (cf.
Scott & Tribble, 2006, p. 64). Similarly, despite the wide reliance on large general
corpora, Culpeper (2009, p. 35) argues that it is better to use a reference corpus that is as
close as possible to the research corpus since this approach to keyword analysis avoids a
focus on irrelevant stylistic differences between registers and is more likely to produce a
keyword list which “reflect[s] something specific to the target [i.e., research] corpus.” At
the same time, as Rayson (2008, p. 527) notes, because of the independence assumptions
built into the procedure, there should be no overlap between the research and reference
corpora. Finally, although Scott (2014a) suggests that the chi-square test “gives a better
estimate of keyness” in longer texts or entire corpora than the log-likelihood, Culpeper
(2009, p. 36) found that the two tests produce “only minor and occasional differences in
reference corpus (see Appendix B for details of the procedure), several separate KW runs
were performed (retaining the parameters detailed above). First, to get a discursive
profile of the research corpus, the 5+ hits section of SERBCORP was compared to
SERBCOMP (the top 50 positive and all negative KWs are shown in Tables 10 and 11;
the full list of positive KWs is presented in Appendix D). Second, to obtain a discursive
profile of the 5+ hits section of SERBCORP vis-à-vis its 1-4 hits section and to further
test the validity of the sampling criterion (see discussion in Appendix A), the 5+ hits
section was compared to the 1-4 hits section (the top 50 positive and all negative KWs
are shown in Tables 12 and 13; the full list of positive KWs is presented in Appendix E).
Third, KWs in the 5+ hits section of
SERBCORP were organized into semantic domains on the basis of their prevalent
meanings in this section of the corpus, as attested by concordance lines (Table 14).
Fourth, a KW database was compiled in WST to calculate key-KWs (KKWs, i.e., KWs
that are key in several texts) and KKW associates (KWs appearing in the same texts as
frequency = 2; minimum number of texts for database = 3; statistic for the calculation of
associates = MI3 (≥ 3); minimum number of associate texts for database = 3; and
minimum number of KWs per text for database = 3. Fifth, KKWs potentially related to
ethnolinguistic identities (e.g., glottonyms and ethnonyms) were identified and their
associates examined (Table 15). Sixth, and last, the KKW equivalents of the highest-
loading items from all 12 factors resulting from EFA (see Section 6.3) were examined for
associates in order to compare the results of keyword analysis associates procedure and
analysis examines the co-occurrence patterns between words and does not require a
reference corpus. The strength of association between two words can be measured by
various statistical techniques, such as the t-test, the z-score, and the mutual information (MI) score
(McEnery, Xiao & Tono, 2006). MI score, the preferred technique in analyses focusing
the two words together with the probability of observing each word independently, based
on the frequencies of the words” (Biber, Conrad & Reppen, 1998, p. 266). A score of 0
means that there is no association between the words, while a score higher than 0
71). Unlike keyword analysis, which represents a more general lexical (and discursive)
words are used in a corpus. Such patterns can be suggestive of particular discourses and
underlying ideologies as “[n]o words are neutral [and] [c]hoice of words represents an
Further, in line with the recent shift of focus toward phraseology in corpus linguistics and
in applied linguistics research generally (see, for example, Biber, Conrad & Cortes, 2004;
Chen & Baker, 2010; Gray & Biber, 2013), corpus-based research into discourses and
ideologies has examined n-grams (also known as lexical bundles or clusters, i.e.,
‘language and literature’; see Cheng & Lam, 2013 for a discourse analysis application).
can be more informative in semantic (and discursive) terms than individual collocates
considered in isolation.
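The MI-based collocation procedure used in this study can be sketched as follows. This is a hedged Python illustration, not a reimplementation of the WST ‘concordance’ tool: it assumes pre-lemmatized texts given as lists of tokens, a window that does not cross text boundaries, and the basic formula MI = log2(observed / expected) with expected = f(node) × f(collocate) / N. Some tools additionally adjust the expected value for window size, so scores may differ from those produced by WST. The function name is hypothetical, and its defaults simply mirror the L5-R5 span and the cutoffs described below (≥ 20 occurrences, ≥ 20 texts, MI ≥ 5).

```python
import math
from collections import Counter


def collocates_by_mi(texts, node, left=5, right=5,
                     min_freq=20, min_texts=20, min_mi=5.0):
    """Return {collocate: (co-occurrence count, MI)} for a node lemma.

    `texts` is a list of token lists (assumed to be lemmatized already).
    """
    word_freq = Counter()    # corpus frequency of every lemma
    cooccur = Counter()      # co-occurrences with the node inside the window
    text_range = Counter()   # number of texts in which each collocate co-occurs
    corpus_size = 0
    for tokens in texts:
        corpus_size += len(tokens)
        word_freq.update(tokens)
        seen_in_text = set()
        for i, token in enumerate(tokens):
            if token != node:
                continue
            # `left` words before and `right` words after the node (no text crossing).
            window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
            for word in window:
                if word != node:
                    cooccur[word] += 1
                    seen_in_text.add(word)
        text_range.update(seen_in_text)

    results = {}
    for word, observed in cooccur.items():
        if observed < min_freq or text_range[word] < min_texts:
            continue
        expected = word_freq[node] * word_freq[word] / corpus_size
        mi = math.log2(observed / expected)
        if mi >= min_mi:
            results[word] = (observed, mi)
    return results
```

Note that further occurrences of the node lemma inside a window are not counted as its own collocates.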
separately on SERBCORP and the 5+ hits section of SERBCORP. It was conducted with
the help of the ‘concordance’ tool in WST, using a span of five words to the left of the
node word (lemma JEZIK ‘language’) and five words to the right (L5-R5), and cutoff
points for item frequency (≥ 20), number of texts (≥ 20), and strength of association (MI
≥ 5). Although these cutoff points are somewhat arbitrary, they ensured that the analysis
produced a manageable number of significant collocates that are sufficiently well
distributed throughout the corpus (cf. Biber, 1993). Next, the resulting collocate lists
were scanned for irrelevant items such as function words, a small number of which were
then deleted from both lists.23 The results of collocation analyses are presented in parallel
lists ordered by frequency and MI score. The full lists
of the lemma collocates of JEZIK in SERBCORP (again, by frequency and MI score) are
shown in Appendix F (Tables F1 and F2). The top fifty lemma collocates of JEZIK in the
5+ hits section of SERBCORP (by frequency and MI score) are shown in Tables 17 and
18. The full list of collocates for the 5+ hits section of the corpus is given in Appendix G
(Tables G). (Note that these are the collocates that were used in EFA.) Finally, n-gram
analysis was conducted using the ‘clusters’ function in the ‘concordance’ tool in WST
(not to be confused with the cluster analysis discussed in Section 6.5). The parameters used
were 2-6-constituent n-grams (to cover a wide range of frequently occurring phrasal
patterns), with a minimum item frequency of five in the span of five words to both left
and right of the node word (L5-R5); analysis was conducted separately for each of the
forms of the node lemma JEZIK ‘language’. A sample of the most frequent n-grams in the
statistical technique which groups variables into sets (called factors) based on their
covariance (for a detailed explanation of EFA, see Tabachnick & Fidell, 2007). It is
particularly useful for exploring large data sets with numerous variables because it can
suggest patterns of variation, and thus the constructs underlying multiple variables, which
makes it possible to interpret complex patterns of variation. The application of EFA in
linguistics was pioneered and popularized by Douglas Biber (e.g., 1988, 2006), whose
analysis, MD) has had a significant impact on the study of grammar as well as
composition pedagogy, second language acquisition, and other related areas. Although
applied linguistics (see, for example, the papers in Cortes & Csomay, 2015), it has not,
with one exception, been used in studies of discourse and ideology. Fitzsimmons Doolan
lexical rather than grammatical features. In her study of language ideologies in the
educational sphere in Arizona, she compiled and analyzed a corpus of official language
policy documents to identify the collocates of the core concepts language, literacy, and
English. In the next step, she counted the frequencies of these collocates in all of the
texts in her corpus and then subjected those counts to EFA. This resulted in five factors
language-related beliefs and attitudes existing in the social realm. Like EFA, cluster
analysis is a multivariate statistical technique; it can be used to group objects or cases,
such as individual texts, within a data set. It has been recommended as a follow-up
procedure to EFA because of its ability to identify previously undetected patterns in data
(Biber & Staples, in press). In this study, it is used to explore the differences and
(publication, year of publication, and article vs. letter-to-the-editor).
based on Biber (1988) and Fitzsimmons Doolan (2011, 2014). However, instead of
limiting the collocates used in the analysis to those occurring in the premodifier position
(i.e., L1), as in Fitzsimmons Doolan (2011, 2014), all 305 collocates of the core concept
JEZIK ‘language’ in the 5+ hits section of SERBCORP were included, regardless of their
syntactic function or position vis-à-vis the node (for collocation analysis parameters and
procedures, see Section 5.2.2 above; for a full list of collocates, see Appendix G). It is
also important to reiterate at this point that collocates identified through collocation
analysis (i.e., micro-collocates) are defined more broadly in EFA: they are counted even
when they occur outside the ‘horizon’ of five word-slots to the left or right of the node
word (cf. macro- or textual collocates; Mason & Platt, 2006, cited in Stubbs, 2010, p. 27).
Put simply, every occurrence of a relevant collocate in a text is counted, not only those
that appear within a certain span of the node word, as is normally done in collocation
analysis.
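The counting step, in which every occurrence of a collocate in a text is counted regardless of its distance from the node and then normalized to a text length of 1,000 words, can be sketched as below. This Python function is a hypothetical stand-in for the custom PERL program mentioned in the next paragraph, assuming lemmatized token lists as input; it returns a texts-by-variables matrix of the kind entered into SPSS for EFA.

```python
from collections import Counter


def normalized_counts(texts, variables, per=1000):
    """Rows = texts, columns = variables: occurrences per `per` words.

    Every occurrence anywhere in the text is counted (no span limit).
    """
    rows = []
    for tokens in texts:
        counts = Counter(tokens)
        length = len(tokens)
        if length == 0:
            # Guard against empty texts to avoid division by zero.
            rows.append([0.0] * len(variables))
            continue
        rows.append([counts[v] / length * per for v in variables])
    return rows
```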
After the list of relevant collocates had been identified, a custom PERL program
was used to count (and normalize to a text length of 1,000 words) the frequency of each
counts were then entered into SPSS to check for assumptions and factorability (using
The data were first checked for multivariate outliers by examining each text’s
Mahalanobis distance. With α = .001 and df = 306 (the number of variables), the critical
value of χ² was 388.178; 314 texts had values exceeding the critical value and were
therefore excluded from further analysis.24 The remaining steps in the procedure were
thus performed on the resulting smaller data set (n = 943). The deletion
of multivariate outliers also resulted in the removal of all occurrences of the variable
jednom ‘once’, so this variable was also removed, leaving the number of collocate
variables at k = 305. Next, assumptions for factor analysis were checked. This was done by first
tolerance, condition indexes, and variance proportion items. Singularity was found for
two items, Monte and negro,25 so negro was excluded from further consideration, leaving
the number of variables at k = 304. Normality and linearity were not examined because
Once it was determined that the data set met assumptions, factor analysis was run using
principal axis factoring (n = 943, k = 304). To assess the factorability of the data set, the
correlation matrix (several bivariate correlations were ≥ .30), the KMO value (mediocre,
at .648), and Bartlett’s test of sphericity (significant, χ²[46665] = 100,910.536, p < .001)
were examined. Based on these results, the data set was considered to be factorable.
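The two screening steps just described can be sketched with NumPy and SciPy: flagging multivariate outliers whose squared Mahalanobis distance exceeds a χ² critical value, and testing factorability with Bartlett's test of sphericity. This is a rough approximation of the corresponding SPSS output, not the procedure actually run in this study. It assumes a texts-by-variables array of normalized frequencies and uses a pseudo-inverse in case the covariance matrix is singular; the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def mahalanobis_outliers(data, alpha=0.001):
    """Flag rows whose squared Mahalanobis distance exceeds the chi-square cutoff."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    inv_cov = np.linalg.pinv(np.cov(data, rowvar=False))
    # Squared distance of each row from the centroid.
    d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    # df = number of variables; chi2.ppf(.999, 306) is about 388.18.
    critical = stats.chi2.ppf(1 - alpha, df=data.shape[1])
    return d2 > critical, d2, critical


def bartlett_sphericity(data):
    """Bartlett's test that the correlation matrix is an identity matrix."""
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)  # log-determinant, numerically stable
    statistic = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) / 2
    return statistic, df, stats.chi2.sf(statistic, df)
```

With 306 variables, the degrees of freedom of Bartlett's test are 306 × 305 / 2 = 46,665.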
The number of factors was determined by examining a) the scree plot, which seemed to
flatten out between Factors 12 and 14, and b) the number of factors with initial
eigenvalues over 1.0 (108) and over 2.0 (30), neither of which was considered
parsimonious; twelve factors had initial eigenvalues over 3.0. Solutions with 12 to 14
factors were next explored using Varimax rotation.26 Fewer than five variables loaded
highly on the thirteenth and fourteenth factors of the thirteen- and fourteen-factor
solutions, so those two solutions were
were represented by at least 6 salient loadings (≥ |.30|), a large number of variables had
communalities lower than .2, while the solution accounted for only 17.46% of the total
variance in the data. To determine the optimal factor solution, a series of rotations was
each step.
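The two criteria used to choose the number of factors, counting initial eigenvalues of the correlation matrix above a threshold and then rotating candidate solutions with Varimax, can be illustrated with the following NumPy sketch. It is an assumption-laden stand-in for SPSS, not the procedure used in this study: it does not implement principal axis factoring itself (it only rotates a loading matrix it is given), and it omits the Kaiser normalization that SPSS applies to Varimax by default, so rotated loadings may differ slightly. The function names are hypothetical.

```python
import numpy as np


def count_eigenvalues_above(data, thresholds=(1.0, 2.0, 3.0)):
    """Count initial eigenvalues of the correlation matrix above each threshold."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)
    return {t: int(np.sum(eigenvalues > t)) for t in thresholds}


def varimax(loadings, max_iter=100, tol=1e-6):
    """Varimax rotation (without Kaiser normalization) of a loading matrix."""
    loadings = np.asarray(loadings, dtype=float)
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        column_ss = np.sum(rotated ** 2, axis=0)
        # Target matrix derived from the Varimax criterion.
        target = rotated ** 3 - rotated * column_ss / n_vars
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # nearest orthogonal rotation matrix
        new_criterion = np.sum(s)
        if criterion != 0 and new_criterion < criterion * (1 + tol):
            break  # converged: the criterion no longer increases meaningfully
        criterion = new_criterion
    return loadings @ rotation
```

Because Varimax is an orthogonal rotation, each variable's communality (the row sum of squared loadings) is unchanged, which offers a simple check on any implementation.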
The preferred solution was the twelve-factor solution with k = 107 collocate
variables, which a) accounted for the most total variance in the data (34.13%), b)
produced factors consisting of positively loading variables only, c) did not include any
item communalities < .2, and d) had the highest KMO value (.801). This factor solution
was further assessed for internal consistency, measured with Cronbach’s alpha for all
items loading highly on each factor. The internal consistency
analysis produced the following results: Factor 1 (α = .864), Factor 2 (α = .768), Factor 3
Factor 12 (α = .554).
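The internal-consistency coefficient reported above can be computed directly from its standard definition, α = k/(k − 1) × (1 − Σσ²ᵢ / σ²ₜ), where k is the number of items, σ²ᵢ is the variance of item i, and σ²ₜ is the variance of the summed scale. The short Python function below is an illustrative sketch with a hypothetical name. It assumes that the items are the normalized frequencies of the variables loading highly on a factor and computes raw (unstandardized) alpha; the choice of sample or population variance cancels out in the ratio.

```python
import numpy as np


def cronbach_alpha(items):
    """Raw Cronbach's alpha for an (observations x items) array."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)
```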
The stability of the solution was investigated by comparing the factors and items
with salient loadings on those factors between the different rotations. All factors
appeared in all rotated solutions (with minor differences in composition and order), while
Factor 7 changed in the preferred solution from a factor consisting mostly of negatively
loading variables to one in which the same variables all had positive loadings.
the collocate variables with salient loadings (≥ .30) on each factor individually. To
identify texts representative of each factor, factor scores were estimated for each text
using the regression method. This was followed by a qualitative analysis of texts with top
However, it should be noted here that a separate analysis suggested that texts
initially identified as multivariate outliers (i.e., texts with ‘extreme’ scores on multiple
variables) would be among the most representative texts for all factors. Factor scores
were therefore also calculated for the full data set (i.e., including multivariate outliers),
this time by first converting the normalized frequencies into z-scores and then, for each
text, summing the z-scores of all variables with salient loadings on a factor.27 Variables
with salient loadings on more than one factor were included only in the computation of
the score for the factor on which
they had the highest loading (for a rationale for this procedure, see Biber, 1988, pp. 93-
95). An examination of the highest factor scores based on z-scores revealed that the
multivariate outlier texts indeed had many of the highest factor scores on all factors. A
scores, however, showed that the two methods of computation were highly comparable.
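The alternative computation of factor scores described above can be sketched as follows: normalized frequencies are converted to z-scores and then summed over the variables with salient loadings, with each variable assigned only to the factor on which it loads highest. This Python function is a hypothetical illustration under stated assumptions. The z-scores use the sample standard deviation, the salience cutoff is .30, and only positive loadings are considered, which fits the all-positive factors of the preferred solution; a solution with negative salient loadings would require subtracting those variables instead.

```python
import numpy as np


def summed_z_factor_scores(freqs, loadings, salient=0.30):
    """Factor scores as sums of z-scores of salient variables.

    freqs: (texts x variables) normalized frequencies.
    loadings: (variables x factors) rotated loadings (assumed all positive).
    """
    freqs = np.asarray(freqs, dtype=float)
    loadings = np.asarray(loadings, dtype=float)
    sd = freqs.std(axis=0, ddof=1)
    # Guard against zero-variance variables (their z-scores become 0).
    z = (freqs - freqs.mean(axis=0)) / np.where(sd == 0, 1.0, sd)
    scores = np.zeros((freqs.shape[0], loadings.shape[1]))
    # Each variable contributes only to the factor with its highest loading.
    best = np.argmax(loadings, axis=1)
    for var, factor in enumerate(best):
        if loadings[var, factor] >= salient:
            scores[:, factor] += z[:, var]
    return scores
```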
Because these texts are outliers only in an abstract statistical sense and certainly belong to
the ‘population’ of texts sampled here, a decision was made to retain them in the
try to determine whether there are any statistically significant differences a) between the
different publications in this sample, b) between different types of articles approximating
different types of consciousness (see Section 2.4 and the discussion at the end of Section
6.3.1), and c) diachronically, between individual years of publication over the period studied.
Fitzsimmons Doolan (2011, 2014), for example, has shown that mean factor scores can
be used to compare observations grouped by what she calls ‘registers’ (i.e., text types) in
order to examine any variation between them. Similarly, I compare publications,
years of publication, and types of articles in terms of how they score on individual
factors, and thus discourses, in an attempt to determine whether there are any significant
factor scores for each of the six selected language-related discourses (i.e., factors) of texts
grouped by a) publication: Blic, NIN, Politika, and Vreme; b) year of publication: 2003,
2004, 2005, 2006, 2008; and c) type of article: general newspaper articles vs. letters-to-
the-editor. The distribution and variation of factor scores for each group of texts were
distribution, histograms and the Shapiro-Wilk test were used. To examine homogeneity
of variance, Levene’s test was used. Normality assumptions were violated for all
factors, and homogeneity assumptions were violated for all factors except Factor 8 on
statistical comparisons among groups are presented by factor in Sections 6.4.1-6.4.3. For
all tests, α = .017, reflecting a Bonferroni adjustment that divides the standard α = .05 by
three because the same data were used in three separate analyses.
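The assumption checks and the Bonferroni-adjusted significance level described above can be sketched with SciPy as follows. This is an illustration under assumptions, not the SPSS procedure used in the study. Groups are passed as a dictionary of factor-score sequences (e.g., keyed by publication); Levene's test is requested with `center="mean"`, since SciPy's default (`center="median"`) is the Brown-Forsythe variant; and the adjusted α is applied to the assumption tests as well, which is itself an assumption. The function name is hypothetical.

```python
from scipy import stats


def check_assumptions(groups, alpha=0.05, n_analyses=3):
    """Shapiro-Wilk per group and Levene across groups at a Bonferroni alpha.

    groups: dict mapping a group label to a 1-D sequence of factor scores.
    """
    adjusted_alpha = alpha / n_analyses  # .05 / 3 is about .017
    # True means normality is not rejected for that group.
    normal = {label: bool(stats.shapiro(scores).pvalue > adjusted_alpha)
              for label, scores in groups.items()}
    levene_p = stats.levene(*groups.values(), center="mean").pvalue
    homogeneous = bool(levene_p > adjusted_alpha)
    return adjusted_alpha, normal, homogeneous
```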
statistical procedure used to group cases/observations (e.g., texts) within a data set
defined in terms of categorical variables (Biber & Staples, in press). Clustering texts into
cluster analysis is used here to examine the patterning of texts and factors/discourses with
respect to three independent variables (as in the synchronic and diachronic analyses
using the twelve factors and the agglomerative hierarchical cluster analysis (HCA)
method (for a detailed, step-by-step explanation, see Biber & Staples, in press). Once the
optimal number of clusters was determined, a one-way ANOVA was used to compare the
mean scores of the predictor variables (i.e., factors) and check for statistical significance.
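The clustering and follow-up comparison steps can be illustrated with SciPy's hierarchical clustering routines. In this sketch, Ward's linkage on Euclidean distances between texts' factor scores is an assumption (the linkage criterion is not specified here), as are the function names. The tree is cut into a fixed number of clusters (six in this study), and a one-way ANOVA is then run separately for each factor across the resulting clusters.

```python
import numpy as np
from scipy import stats
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_texts(factor_scores, n_clusters=6, method="ward"):
    """Agglomerative hierarchical clustering of texts on their factor scores."""
    tree = linkage(np.asarray(factor_scores, dtype=float), method=method)
    # Cut the tree so that at most `n_clusters` clusters remain.
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def anova_by_cluster(factor_scores, labels):
    """One-way ANOVA of each factor across clusters: list of (F, p)."""
    factor_scores = np.asarray(factor_scores, dtype=float)
    labels = np.asarray(labels)
    results = []
    for f in range(factor_scores.shape[1]):
        groups = [factor_scores[labels == c, f] for c in np.unique(labels)]
        result = stats.f_oneway(*groups)
        results.append((result.statistic, result.pvalue))
    return results
```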
Next, the mean scores of the twelve factors for each of the six identified clusters were
examined to determine which factors scored most highly in which clusters. Lastly, the
composition of each cluster was investigated by using the crosstabs function in SPSS and
the three independent variables (publication, year of publication, and general newspaper
the imprint of ideological processes and structures.” Although, as he further argues, it
may not be possible “to ‘read off’ ideologies from texts […] because meanings are
produced through interpretations of texts and texts are open to diverse interpretations”
(ibid.; see also van Dijk, 2006), large-scale corpus-based analysis of frequency can reveal
topics and point to the beliefs and assumptions (i.e., ideologies) that underlie them.
importance in the study of language and ideology and can provide empirical evidence
[of] how culture is expressed in lexical patterns.” Corpus-linguistic tools can thus
provide a map of a corpus based on lexical patterns suggesting discourses and underlying
ideologies and “pinpointing areas of interest for a subsequent close analysis” (Baker et
Blackledge & Pavlenko, 2002; Partington, 2010; Ricento, 2006; Vessey, 2013a), corpus-
linguistic analysis, powerful though it is, does not in itself constitute discourse or
ideological analysis. Discourses and ideologies do not exist in a vacuum, but rather
‘work’ by establishing links with social structures and practices, as well as by making
requires a social, cultural, historical, and political contextualization of the lexical patterns
With its focus on discourse, ideology and power, as well as a flexible, eclectic
methodology (for an overview, see Wodak & Meyer, 2009), critical discourse analysis
(CDA) is ideally suited for such analysis and thus serves well as a complement to quantitative lexical
uncover and expose relations of unequal power in society. In CDA, text is “conceived as
and sociopolitical context,” while discursive and linguistic data are seen “as a social
practice, both reflecting and producing ideologies in society” (Baker et al., 2008, pp. 279-
practice, discoursal practice (text production, distribution and consumption), and text”
(Fairclough, 2010, p. 59). The ultimate goal of CDA, then, is to move from a micro-
but particularly for its methodological shortcomings such as selectivity or potential bias
in data collection procedures, small samples, and a lack of concern for replicability (e.g.,
Blommaert, 2005; Stubbs, 1997). Recognizing this, Baker et al. (2008) have proposed a
whereby the two methodological approaches complement one another and thus cancel out
each other’s limitations. Although such a synergy does not necessarily guarantee
research entirely free of researcher inference (Baker, 2011 cited in Fitzsimmons Doolan
Fitzsimmons Doolan, 2014). In any case, all language use and all analysis are perforce
ideological in the sense that, arguably, ideologically neutral positions are impossible, and
as a ‘critical’ approach, CDA has always refused to claim ‘objectivity’ (Fairclough, 2001,
p. 5). Further, as Vessey (2013a) notes, the principle of researcher self-reflexivity (e.g.,
techniques available from the different approaches developed within CDA since its
inception could find application here (again, see Wodak & Meyer, 2009 for a
methodological overview), they are not all equally useful for our present purpose, which
Matthiessen, 2004) do not seem to have a clear direct application here. The
discourse-historical approach (DHA, Wodak, 2001, 2004), developed to trace the constitution of a
analytical tools that are potentially useful (cf. Vessey, 2013a). DHA draws on
in- and out-groups through membership categorization by metaphor and metonymy), and
(Wodak & Meyer, 2009, pp. 319-320). However, preliminary analysis suggested that the
discursive strategy of argumentation (i.e., topoi) was particularly relevant and useful.
Topoi are defined as explicit or inferable obligatory premises that make it possible to
connect arguments with the conclusion (Wodak & Meyer, 2009), or simply as “the
common-sense reasoning typical for specific issues” (van Dijk, 2000, cited in Baker et al.,
2008, p. 299). In line with the methodological synergy explicated above, representative texts
identified through quantitative analysis are subjected to DHA with the goal of identifying
and describing the argumentation strategies (i.e., topoi) and their common frame of
has argued that researchers need to do background research and form hypotheses prior to
means that CL-informed discourse analysis should be corpus-based rather than corpus-
driven (cf. Tognini-Bonelli, 2001). Indeed, in one sense, it would of course be difficult to
evaluate, much less interpret, any patterns resulting from CL analysis without relevant
background knowledge. However, although (unlike Baker et al., 2008), I do not think
data mining studies such as Mautner, 2007, to point to an obvious example), I do think it
Language-related discourse in the Balkans in the last twenty-five years has been
primarily concerned with the symbolic value of language in the processes of construction
of former Yugoslavia, as I have already noted above (Chapter 1), produced a climate of
this day. I therefore expect to find evidence of a discourse of contestation focusing on
linguistic varieties with ‘typical’ persons and activities and accounting for the
differentiations among them” (Irvine & Gal, 2000, p. 36). Irvine and Gal (2000) identify
three processes by which this differentiation works, which they call iconization, fractal
recursivity, and erasure. Iconization refers to the mapping of linguistic features onto
social images, positing a direct link between one or more linguistic features and (an
essentialist conceptualization of) the nature of the persons or social groups who display
which some persons, social groups, or sociolinguistic phenomena are rendered invisible
emerge is that of “imagined inherent, natural links between a unitary mother tongue, a
territory, and an ethnonational identity” (Irvine & Gal, 2000, p. 60), or rather how such
communities and the naturalness of homogeneous communities” (p. 207), a belief which
is a corollary of essentialist discourses and ideologies.
6. Results
This chapter is divided into three sections. The first section presents the results of
collocation, factor, analysis of variance, and cluster analysis). The second section
The results include observations about the relative effectiveness of individual methods;
several keyword analyses were conducted in this study. The first keyword analysis
involved a comparison between the 5+ hits section of SERBCORP as the research corpus
and SERBCOMP as the comparator corpus. This analysis identified a total of 151
positive and 40 negative key lemmas. Tables 10 and 11 show the top 50 positive key
lemmas and all negative key lemmas in the 5+ hits section of SERBCORP (the full list of
Unsurprisingly, the top positive key lemma is JEZIK ‘language’, which simply reflects
the selection criterion used to create the research corpus (the 5+ hits section of
SERBCORP). Even a cursory glance at the remainder of the top 50 positive key lemmas
shows that the discursive profile here is similar to that of SERBCORP as a whole
(Appendix C), with numerous references to semantic fields such as education and
literature. However, it is equally clear that there is one major difference between the two
lists: whereas the SERBCORP key lemma list includes few items referring to regional
(i.e., Central South Slavic) ethnolinguistic identities, the list of key lemmas in the 5+ hits
section of SERBCORP includes all major regional (as well as other) ethnonyms and
glottonyms: srpski ‘Serbian’ (7,309 occurrences), now the second-ranked key lemma,
as well as crnogorski ‘Montenegrin’ (rank 28, 696 occurrences),
srpskohrvatski ‘Serbo-Croatian’ (rank 29, 226 occurrences), Srbi ‘Serbs’ (rank 35, 1,432
occurrences), hrvatski ‘Croatian’ (rank 36, 684 occurrences), bosanski ‘Bosnian’ (rank
46, 266 occurrences), and bošnjački ‘Bosniak’ (rank 47, 223 occurrences). In addition,
randomly selected articles such as ćirilica ‘Cyrillic’ (rank 11, 590 occurrences), pismo
‘alphabet’ (rank 19, 1,243 occurrences), narod ‘people’ (rank 21, 1,510 occurrences),
manjina ‘minority’ (rank 37, 491 occurrences), and nacionalni ‘national’ (rank 40, 1,361
occurrences). Further, the remainder of the 151 key lemmas (Table D1, Appendix D)
‘Montenegrins’ (rank 60, 213 occurrences), Vuk (Karadžić)28 (rank 64, 482 occurrences),
Hrvati ‘Croats’ (rank 65, 315 occurrences), SANU ‘Serbian Academy of Arts and
Sciences’ (rank 66, 235 occurrences), identitet ‘identity’ (rank 68, 331 occurrences), naziv
‘(language) label’ (rank 70, 442 occurrences), ime ‘name’ (rank 73, 943 occurrences),
politika ‘politics’ (rank 74, 1,475 occurrences), Crna Gora ‘Montenegro’ (ranks 76 and
77, 1,072 occurrences), tradicija ‘tradition’ (rank 92, 252 occurrences), and nacija
‘nation’ (rank 93, 305 occurrences). This supports the choice of the 5+ hits section of
SERBCORP as the better target for analysis here. Note also that the negative
key lemmas (Table 11) exhibit semantic patterns very similar to those identified for
SERBCORP as a whole (Table C2, Appendix C), i.e., lack of references to national
69
Table 10
Top 50 Positive Key Lemmas in the 5+ Hits Section of SERBCORP (by Keyness Score)
N Keyword (English) Keyword (Serbian) Freq. % Texts RC. Freq. RC. % Keyness P
1 language jezik 12530 1.12 1118 1 76538.41 0.0000000000
2 Serbian srpski 7309 0.65 670 32565 0.14 9779.72 0.0000000000
3 lingustic jezički 901 0.08 112 10 5387.05 0.0000000000
4 school škola 2593 0.23 237 7491 0.03 5050.31 0.0000000000
5 English engleski 1220 0.11 271 722 4949.45 0.0000000000
6 literature književnost 1305 0.12 269 1158 4667.73 0.0000000000
7 mother (adj.) maternji 611 0.05 197 1 3712.29 0.0000000000
8 book knjiga 2499 0.22 371 10369 0.05 3583.33 0.0000000000
9 dictionary rečnik 801 0.07 135 320 3576.38 0.0000000000
10 literary književni 1027 0.09 128 1288 3210.07 0.0000000000
11 Cyrillic ćirilica 590 0.05 74 188 2756.74 0.0000000000
12 learn učiti 825 0.07 70 1245 2369.50 0.0000000000
13 professor profesor 1524 0.14 307 5952 0.03 2313.27 0.0000000000
14 instruction nastava 710 0.06 107 1016 2091.33 0.0000000000
15 writer pisac 998 0.09 182 2633 0.01 2073.11 0.0000000000
16 grade razred 609 0.05 87 650 2033.88 0.0000000000
17 word reč 2633 0.24 502 18651 0.08 1941.47 0.0000000000
18 poetry poezija 543 0.05 63 526 1881.56 0.0000000000
19 alphabet pismo 1243 0.11 169 5401 0.02 1702.15 0.0000000000
20 translator prevodilac 395 0.04 102 232 1605.55 0.0000000000
21 people narod 1510 0.13 234 8257 0.04 1601.00 0.0000000000
22 education obrazovanje 893 0.08 187 2969 0.01 1558.60 0.0000000000
23 culture kultura 1485 0.13 230 8355 0.04 1519.44 0.0000000000
24 education (profession) prosvete 563 0.05 219 965 1516.59 0.0000000000
25 students (K-12) učenici 637 0.06 120 1397 1492.42 0.0000000000
26 linguist lingvista 260 0.02 81 12 1488.69 0.0000000000
27 translation prevod 478 0.04 111 629 1462.75 0.0000000000
28 Montenegrin crnogorski 696 0.06 106 2006 1357.09 0.0000000000
29 Serbo-Croatian srpskohrvatski 226 0.02 70 6 1323.38 0.0000000000
30 novel roman 678 0.06 122 2166 1221.90 0.0000000000
31 learning učenje 352 0.03 129 343 1217.01 0.0000000000
32 subject predmet 770 0.07 147 2938 0.01 1193.64 0.0000000000
33 poet pesnik 402 0.04 89 600 1160.62 0.0000000000
34 school (university) fakultet 995 0.09 89 5250 0.02 1101.39 0.0000000000
35 Serbs Srbi 1432 0.13 250 9918 0.04 1093.62 0.0000000000
36 Croatian hrvatski 684 0.06 163 2749 0.01 1010.51 0.0000000000
37 minority manjina 491 0.04 111 1320 1006.48 0.0000000000
38 speak govoriti 1764 0.16 90 14569 0.06 992.19 0.0000000000
39 use (n.) upotreba 585 0.05 72 2054 975.53 0.0000000000
40 national nacionalni 1361 0.12 88 9893 0.04 961.57 0.0000000000
41 French francuski 477 0.04 140 1488 875.85 0.0000000000
42 elementary osnovni 870 0.08 87 5077 0.02 849.01 0.0000000000
43 speech govor 487 0.04 111 1636 842.68 0.0000000000
44 students (K-8) đaci 262 0.02 110 326 821.57 0.0000000000
45 science nauka 668 0.06 189 3468 0.02 753.63 0.0000000000
46 Bosnian bosanski 266 0.02 64 432 736.64 0.0000000000
47 Bosniak bošnjački 223 0.02 70 254 725.60 0.0000000000
48 children deca 1294 0.12 220 10786 0.05 714.75 0.0000000000
49 wrote pisali 1012 0.09 76 7364 0.03 713.61 0.0000000000
50 edition izdanje 437 0.04 72 1700 665.47 0.0000000000
Table 11
Negative Key Lemmas in the 5+ Hits Section of SERBCORP (by Keyness Score)
N Keyword (English) Keyword (Serbian) Freq. % Texts RC. Freq. RC. % Keyness P
1 government vlada 412 0.04 138 26794 0.12 -844.02 0.0000000000
2 Serbia Srbija 2922 0.26 180 93159 0.41 -700.54 0.0000000000
3 millions miliona 177 0.02 104 15619 0.07 -654.02 0.0000000000
4 president predsednik 469 0.04 184 23415 0.10 -518.40 0.0000000000
5 year godina 5350 0.48 591 139140 0.62 -371.61 0.0000000000
6 parties stranke 148 0.01 69 10165 0.05 -339.42 0.0000000000
7 against protiv 419 0.04 240 18154 0.08 -312.14 0.0000000000
8 day dan 912 0.08 151 30134 0.13 -256.64 0.0000000000
9 authorities vlast 510 0.05 106 17482 0.08 -167.75 0.0000000000
10 director direktor 321 0.03 139 11611 0.05 -130.52 0.0000000000
11 Kosovo Kosovu 106 69 5254 0.02 -114.91 0.0000000000
12 last prošle 169 0.02 138 7041 0.03 -111.63 0.0000000000
13 citizens građani 412 0.04 70 13444 0.06 -109.59 0.0000000000
14 time vreme 1681 0.15 523 43233 0.19 -106.00 0.0000000000
15 public javnost 294 0.03 84 10303 0.05 -105.73 0.0000000000
16 solution rešenje 193 0.02 84 7465 0.03 -99.91 0.0000000000
17 law zakon 687 0.06 112 19899 0.09 -99.40 0.0000000000
18 percent odsto 816 0.07 193 22713 0.10 -92.36 0.0000000000
19 after posle 953 0.09 490 25886 0.12 -91.46 0.0000000000
20 now sad 396 0.04 237 12467 0.06 -89.12 0.0000000000
21 affairs poslova 92 66 4199 0.02 -79.70 0.0000000000
22 choice izbor 309 0.03 110 9934 0.04 -76.78 0.0000000000
23 moment trenutku 153 0.01 124 5648 0.03 -67.10 0.0000000000
24 expect očekuje 89 64 3828 0.02 -64.83 0.0000000000
25 week nedelje 104 86 4240 0.02 -64.12 0.0000000000
26 larger veći 410 0.04 108 11978 0.05 -62.32 0.0000000000
27 decision odluka 463 0.04 85 13130 0.06 -58.97 0.0000000000
28 political politički 856 0.08 131 22155 0.10 -56.87 0.0000000000
29 yesterday juče 206 0.02 148 6719 0.03 -54.67 0.0000000000
30 group grupa 463 0.04 110 12934 0.06 -53.63 0.0000000000
31 case slučaj 499 0.04 125 13753 0.06 -52.89 0.0000000000
32 say reći 1087 0.10 247 27137 0.12 -52.38 0.0000000000
33 five pet 320 0.03 227 9446 0.04 -51.56 0.0000000000
34 place mesto 799 0.07 222 20390 0.09 -47.07 0.0000000000
35 problem problem 824 0.07 250 20854 0.09 -45.08 0.0000000000
36 parliament skupštine 134 0.01 71 4590 0.02 -43.92 0.0000000000
37 city grada 159 0.01 104 5193 0.02 -42.45 0.0000000000
38 end kraj 588 0.05 102 15327 0.07 -41.39 0.0000000000
39 immediately odmah 193 0.02 148 5868 0.03 -36.46 0.0000000001
40 six šest 201 0.02 152 6055 0.03 -36.17 0.0000000001
6.1.2 Keyword analysis (5+ hits section of SERBCORP vs. 1-4 hits section of
SERBCORP). The second keyword analysis involved a comparison between the 5+ hits
section of SERBCORP as the research corpus and the 1-4 hits section of SERBCORP as
the comparator corpus. This analysis identified a total of 90 positive and 14 negative key
lemmas (the full list of positive key lemmas is shown in Table E1 in Appendix E). The
top 50 positive key lemmas and all negative key lemmas in the 5+ hits section of
SERBCORP with the 1-4 hits section of SERBCORP as the comparator corpus are presented
in Tables 12 and 13, respectively.
As expected, JEZIK is again the top lemma. Compared to the 1-4 hits section of
SERBCORP, the discursive profile of the 5+ hits section of SERBCORP is defined by items
related to regional ethnolinguistic identities and education, with some items related to
literature (and translation) and culture. Importantly, however, items referring to the major
regional ethnolinguistic identities now all appear among the top 30 key lemmas, while most
other relevant items identified toward the end of the previous section have moved up the list.
Interestingly, the lemma zakon ‘law’ is now identified as a positive keyword (rank 90,
687 occurrences). Note also that this prominence of the relevant (i.e., Central South
Slavic) ethnonyms and glottonyms in the list further validates the sampling criterion as
well as the cutoff point of 5 hits for the lemma JEZIK per article (see Section 4 and
Appendix A). As before, the negative key lemmas (now considerably fewer
in number on account of the smaller size of the comparator corpus) indicate a consistent
Table 12
Top 50 Positive Key Lemmas in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus
N Keyword (English) Keyword (Serbian) Freq. % Texts RC. Freq. RC. % Keyness P
1 language jezik 12530 1.12 1118 21304 0.20 18516.54 0.0000000000
2 Serbian srpski 7309 0.65 670 28440 0.27 3799.77 0.0000000000
3 linguistic jezički 901 0.08 112 792 2043.82 0.0000000000
4 dictionary rečnik 801 0.07 135 783 1717.42 0.0000000000
5 school škola 2593 0.23 237 10392 0.10 1269.04 0.0000000000
6 mother (adj.) maternji 611 0.05 197 649 1249.68 0.0000000000
7 alphabet pismo 1243 0.11 169 3424 0.03 1108.27 0.0000000000
8 Cyrillic ćirilica 590 0.05 74 905 942.80 0.0000000000
9 word reč 2633 0.24 502 12494 0.12 879.24 0.0000000000
10 instruction nastava 710 0.06 107 1467 0.01 875.24 0.0000000000
11 learn učiti 825 0.07 70 2015 0.02 851.36 0.0000000000
12 Montenegrin crnogorski 696 0.06 106 1452 0.01 849.83 0.0000000000
13 English engleski 1220 0.11 271 4033 0.04 839.02 0.0000000000
14 linguist lingvista 260 0.02 81 96 823.13 0.0000000000
15 professor profesor 1524 0.14 307 5987 0.06 775.36 0.0000000000
16 literature književnost 1305 0.12 269 4765 0.05 760.37 0.0000000000
17 subject predmet 770 0.07 147 1997 0.02 740.22 0.0000000000
18 Croatian hrvatski 684 0.06 163 1623 0.02 729.29 0.0000000000
19 use (n.) upotreba 585 0.05 72 1249 0.01 697.87 0.0000000000
20 Serbo-Croatian srpskohrvatski 226 0.02 70 142 597.25 0.0000000000
21 education (profession) prosvete 563 0.05 219 1388 0.01 574.70 0.0000000000
22 education obrazovanje 893 0.08 187 3078 0.03 574.07 0.0000000000
23 grade razred 609 0.05 87 1672 0.02 545.16 0.0000000000
24 literary književni 1027 0.09 128 3959 0.04 541.68 0.0000000000
25 people narod 1510 0.13 234 7019 0.07 530.86 0.0000000000
26 Bosniak bošnjački 223 0.02 70 181 526.18 0.0000000000
27 learning učenje 352 0.03 129 627 497.70 0.0000000000
28 foreign strani 2004 0.18 265 10904 0.10 449.44 0.0000000000
29 Bosnian bosanski 266 0.02 64 383 445.70 0.0000000000
30 students (K-12) učenici 637 0.06 120 2163 0.02 419.59 0.0000000000
31 elementary osnovni 870 0.08 87 3686 0.03 378.99 0.0000000000
32 national nacionalni 1361 0.12 88 6963 0.07 369.42 0.0000000000
33 Vuk (Karadžić) Vuk 482 0.04 103 1540 0.01 349.22 0.0000000000
34 culture kultura 1485 0.13 230 8098 0.08 330.50 0.0000000000
35 speech govor 487 0.04 111 1685 0.02 311.04 0.0000000000
36 speak govoriti 1764 0.16 90 10282 0.10 309.94 0.0000000000
37 minority manjina 491 0.04 111 1758 0.02 295.94 0.0000000000
38 translator prevodilac 395 0.04 102 1249 0.01 290.68 0.0000000000
39 Croats Hrvati 315 0.03 117 929 256.25 0.0000000000
40 science nauka 668 0.06 189 3063 0.03 242.74 0.0000000000
41 Serbs Srbi 1432 0.13 250 8442 0.08 240.77 0.0000000000
42 translation prevod 478 0.04 111 1998 0.02 214.29 0.0000000000
43 class period čas 542 0.05 63 2441 0.02 205.62 0.0000000000
44 Montenegrins Crnogorci 213 0.02 62 564 199.56 0.0000000000
45 doctor dr 843 0.08 314 4519 0.04 198.29 0.0000000000
46 second drugi 3666 0.33 534 26947 0.26 187.05 0.0000000000
47 expression izraz 310 0.03 103 1128 0.01 181.64 0.0000000000
48 meaning značenje 262 0.02 79 867 179.80 0.0000000000
49 school (university) fakultet 995 0.09 89 5774 0.05 177.70 0.0000000000
50 label naziv 442 0.04 119 1995 0.02 166.80 0.0000000000
Table 13
Negative Key Lemmas in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus (by Keyness Score)
N Keyword (English) Keyword (Serbian) Freq. % Texts RC. Freq. RC. % Keyness P
1 year godina 5350 0.48 591 61158 0.58 -195.54 0.0000000000
2 Kosovo Kosovu 106 69 2830 0.03 -155.68 0.0000000000
3 president predsednik 469 0.04 184 7431 0.07 -139.42 0.0000000000
4 day dan 912 0.08 151 12274 0.12 -119.94 0.0000000000
5 government vlada 412 0.04 138 6396 0.06 -112.15 0.0000000000
6 Belgrade Beograd 1480 0.13 258 17753 0.17 -85.56 0.0000000000
7 after posle 953 0.09 490 12093 0.11 -85.48 0.0000000000
8 millions miliona 177 0.02 104 2933 0.03 -63.14 0.0000000000
9 city grada 159 0.01 104 2584 0.02 -52.49 0.0000000000
10 during tokom 299 0.03 202 4110 0.04 -44.45 0.0000000000
11 now sad 396 0.04 237 5147 0.05 -41.82 0.0000000000
12 last prošle 169 0.02 138 2524 0.02 -38.56 0.0000000000
13 time put 943 0.08 341 10847 0.10 -36.64 0.0000000000
14 saw video 104 80 1711 0.02 -36.06 0.0000000001
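The keyness scores in the tables above are signed: positive for key lemmas that are relatively more frequent in the research corpus, negative for those that are relatively less frequent. The sketch below assumes these are Dunning-style log-likelihood (G2) values, the usual keyness statistic in WordSmith Tools; the function names are hypothetical and WST's own implementation may differ in detail. The p-value uses the chi-square distribution with one degree of freedom.

```python
import math


def signed_log_likelihood(freq_research, freq_reference, size_research, size_reference):
    """Signed log-likelihood (G2) keyness of one lemma in two corpora (assumed statistic).

    Positive: relatively more frequent in the research corpus (positive key lemma).
    Negative: relatively less frequent there (negative key lemma).
    """
    total_freq = freq_research + freq_reference
    total_size = size_research + size_reference
    # Expected frequencies if the lemma were equally common in both corpora
    expected_research = size_research * total_freq / total_size
    expected_reference = size_reference * total_freq / total_size

    def term(observed, expected):
        # By convention 0 * log(0 / E) counts as 0
        return observed * math.log(observed / expected) if observed > 0 else 0.0

    g2 = 2.0 * (term(freq_research, expected_research)
                + term(freq_reference, expected_reference))
    # Attach a minus sign when the research corpus under-uses the lemma
    if freq_research / size_research < freq_reference / size_reference:
        g2 = -g2
    return g2


def p_value(g2):
    """Upper-tail probability of chi-square with 1 df: P(X >= |g2|)."""
    return math.erfc(math.sqrt(abs(g2) / 2.0))


if __name__ == "__main__":
    # Toy counts, not taken from SERBCORP
    print(signed_log_likelihood(20, 10, 1000, 1000))  # about 3.398 (over-used: positive)
    print(signed_log_likelihood(10, 20, 1000, 1000))  # about -3.398 (under-used: negative)
```

Applied to the lemma counts and token totals of the two corpora being compared, a function of this kind would be expected to approximate the Keyness column; the token totals are not restated in this section, so they are left as parameters here.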
Table 14
Positive Key Semantic Domains in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus
Rank Ethnolinguistic identities Etnolingvistički identiteti Rank Education & science Obrazovanje i nauka Rank Literature & translation Književnost i prevođenje Rank Foreign languages Strani jezici
1 language jezik 3 linguistic jezički 9 word reč 13 English engleski
2 Serbian srpski 5 school škola 16 literature književnost 28 foreign strani
4 dictionary rečnik 6 mother (adj.) maternji 24 literary književni 46 second drugi
7 alphabet pismo 10 instruction nastava 27 learning učenje 52 German nemački
8 Cyrillic ćirilica 11 learn učiti 34 culture kultura 53 Spanish španski
12 Montenegrin crnogorski 14 linguist lingvista 35 speech govor 61 French francuski
18 Croatian hrvatski 15 professor profesor 36 speak govoriti 62 Russian ruski
19 use (n.) upotreba 17 subject predmet 38 translator prevodilac 68 understand razumeti
20 Serbo-Croatian srpskohrvatski 21 education (pro.) prosvete 42 translation prevod 69 be able to moći
25 people narod 22 education obrazovanje 47 expression izraz
26 Bosniak bošnjački 23 grade razred 48 meaning značenje
29 Bosnian bosanski 30 students (K-12) učenici 51 wrote pisali
32 national nacionalni 31 elementary osnovni 63 poetry poezija
33 Vuk (Karadžić) Vuk 40 science nauka 66 writer pisac
37 minority manjina 43 class period čas 72 cultural kulturni
39 Croats Hrvati 45 doctor dr 81 poet pesnik
41 Serbs Srbi 49 school (univ.) fakultet
44 Montenegrins Crnogorci 54 exam ispit
50 label naziv 58 children deca
55 SANU SANU 59 knowledge znanje
56 name ime 60 scientific naučni
57 identity identitet 71 academician akademik
64 introduction uvođenje 73 example primer
65 percent odsto 76 sentence rečenica
67 Monte(negro) Gora 78 letters (a, b, c) slova
70 (Monte)negro Crna 79 students (K-8) đaci
74 nation nacija 82 schooling školovanje
75 Vojvodina Vojvodini 83 program/curric. program
77 today danas 85 book knjiga
80 same isti 87 parents roditelji
84 difference razlika
86 century vek
88 history istorija
89 change (v.) menja
90 law zakon
Based on the patterns identified so far in this section, it is clear that the 5+ hits
is also quite clear that keywords identified for topically heterogeneous research corpora
such as SERBCORP as a whole or the 5+ hits section of SERBCORP are not as insightful
as those identified for topically homogeneous research corpora such as, for example,
university instructors (e.g., Subtirelu, 2015).29 In other words, despite the promising
lexical items and patterns identified above, decisions about where to begin the analysis
and what to focus on would still depend on researcher inference.
Therefore, in order to get a better sense of the discursive profile of the 5+ hits
section of SERBCORP, I classified all 90 key lemmas into semantic fields based on their
ambiguous cases). This semantic classification resulted in four distinct semantic fields
with different numbers of items in each: ethnolinguistic identities (the largest), education
and science, literature and translation, and foreign languages (the smallest; see Table 14).
Based on this semantic patterning, we can conclude that Serbian newspaper discourse
ethnolinguistic identities, education, and, to a lesser extent, literature and translation and
foreign languages. From this, it is possible to further extrapolate that this general
education and literature as the primary sociocultural domains with respect to which
language is explicitly and overtly thematized. This is an important finding not only
circulation here, but also because a very similar discursive profile emerged from
analysis. Let us briefly illustrate the problems with this approach using a small set of
interesting examples. In addition to the obviously important Central South Slavic ethno-
and glottonyms, in Section 6.1.1 the following items were identified as some of the
pertinent key lemmas in this corpus: Vuk (Karadžić) (rank 64, 482 occurrences), SANU
‘Serbian Academy of Arts and Sciences’ (rank 66, 235 occurrences), ime ‘name’ (rank 73,
943 occurrences), and nacija ‘nation’ (rank 93, 305 occurrences). These same items
were also identified as significant collocates of the lemma JEZIK: Vuk (Karadžić) (rank
113, 75 occurrences, 50 texts), SANU ‘Serbian Academy of Arts and Sciences’ (rank 120,
70 occurrences, 40 texts), ime ‘name’ (rank 36, 208 occurrences, 96 texts), and nacija
in turn itself based on background knowledge and a close reading of large numbers of
texts in the corpus) and quantitative evidence (results of keyword and collocation
analyses). Once identified by quantitative analyses, Vuk (Karadžić) and SANU were thus
deemed to be of potential interest because both Vuk Karadžić as an individual and SANU
as an institution have historically been closely linked to issues of language and
ethnolinguistic identity in the Central South Slavic area, which are the primary focus of
this study (for a detailed discussion, see Section 7.3). Similarly, the lexical item ime
‘name’ was deemed to be of potential interest because the naming of the different
varieties of Central South Slavic has been at the center of the public debate and
lexical item nacija ‘nation’, finally, is an obvious candidate for investigation in a
However, as can be seen, even a small set of relevant items presents the
potential interest. The problem of how to deal with large numbers of potentially
morphology of Central South Slavic in that, unlike English for example, semantic
forms of both the node word and any collocates (e.g., nacija, nacije, naciji, etc.). This
not only atomizes the overall semantic profile of the lexical item but also renders lexical
software designed primarily for languages with simpler inflectional morphologies, such
the 5+ hits section of SERBCORP features articles that are topically rather
minimally informative lexical patterns, the lexical item Vuk, for example, has a mere 17
patterns of import for either language ideology or ethnonationalism.30 As will be shown
in Section 7.3, it is the fact that Vuk Karadžić as a historical figure is featured so
patterns associated with this lexical item that is important here. Put differently, lexical
items can have high discursive and ideological significance without exhibiting any
explicit collocational (or other) patterns. Concordance analysis therefore does not seem
It should be noted, however, that concordance analysis can still be useful for
microscopic lexical analysis during the preliminary stages of discursive corpus profiling
example, the verb postoji ‘exists’ (rank 48, 161 occurrences, 126 texts) was identified as a
significant collocate of the lemma JEZIK in the 5+ hits section of SERBCORP (see Table
expression of contestation as it is often used with a negator (ne postoji ‘does not exist’)
and applied to non-Serb varieties of Central South Slavic. The ‘concordance’ tool in
WST showed that, as a collocate of JEZIK, postoji was most often found in the R2
position.
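A concordance of this kind, together with a count of the positions (L1, R2, and so on) at which a collocate occurs relative to the node, can be sketched as follows. Because Central South Slavic nouns inflect, the node is passed as a set of word forms rather than a single string. The function names, the window width, the partial list of forms of jezik, and the example sentence (constructed for illustration, not taken from SERBCORP) are all assumptions; this is not how WST implements its concordancer.

```python
from collections import Counter


def kwic(tokens, node_forms, width=4):
    """Yield (left context, node, right context) for every occurrence of a node form."""
    nodes = {form.lower() for form in node_forms}
    for i, token in enumerate(tokens):
        if token.lower() in nodes:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            yield left, token, right


def collocate_positions(tokens, node_forms, collocate, width=4):
    """Count the positions (L1..Ln, R1..Rn) at which `collocate` occurs around the node."""
    nodes = {form.lower() for form in node_forms}
    positions = Counter()
    for i, token in enumerate(tokens):
        if token.lower() not in nodes:
            continue
        for offset in range(-width, width + 1):
            j = i + offset
            if offset == 0 or not 0 <= j < len(tokens):
                continue
            if tokens[j].lower() == collocate.lower():
                label = f"L{-offset}" if offset < 0 else f"R{offset}"
                positions[label] += 1
    return positions


# Constructed example: 'they say that the Montenegrin language does not exist'
tokens = "kažu da crnogorski jezik ne postoji".split()
# Partial, illustrative list of inflected forms of the lemma JEZIK
jezik_forms = {"jezik", "jezika", "jeziku", "jezikom", "jezici", "jezike", "jezicima"}
for left, node, right in kwic(tokens, jezik_forms):
    print(f"{left:>30} [{node}] {right}")
print(collocate_positions(tokens, jezik_forms, "postoji"))  # Counter({'R2': 1})
```

In this toy sentence the negator ne fills R1, so postoji surfaces at R2, the same configuration that makes (ne) postoji a marker of contestation.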
As can be seen from Figure 3, (ne) postoji ‘(does not) exist’ is indeed applied to
the discourse of contestation.31 Interestingly, (ne) postoji is also applied to Serbian (line
16), but this was the only occurrence in conjunction with Serbian which failed to show up
Figure 3. Concordance lines for postoji ‘exists’ in the 5+ hits section of SERBCORP
The conclusion we can draw from this brief demonstration, therefore, is that, if we
concordance lines of limited sets of lexical items selected on the basis of researcher
analysis based on statistically significant patterns that can identify not only lexical foci
for analysis (i.e., discourses) but also individual representative texts for follow-up
qualitative analysis.
that represents a step in this direction. Based on keyword analysis, it is possible to create
a database of items that are key in several texts in the corpus (key-keywords). This is done
by running a separate keyword analysis for each individual text rather than for the corpus
as a whole; the result shows which keywords are key in at least a researcher-determined
minimum number of texts. In addition, the function computes which keywords (associates)
co-occur with each of the key-keywords, forming lexical sets (clumps) which can be
suggestive of discourses and potentially also ideologies. Table 15
their clumps (top ten most frequent associates with MI scores ≥ 3). As can be seen, this
is a considerable improvement over the keyword list, as most key-keywords have discursively
indicative sets of associates. For example, the key-keyword Srbi ‘Serbs’ co-occurs with
the following keywords: Serbian, Croats, Croatian, people, name, literature, national,
academy, Croatia, book, professor, school, linguistic, literary, war, and learn.32 Clearly,
then, texts in which ‘Serbs’ appears as a keyword tend to discuss the Croats and the Croatian
language, the language’s name, and the (1990s’) war, all of which point to the (big ‘D’)
discourse of contestation mentioned toward the end of the methods section. Further,
there are indications of a discussion of the national academy of sciences and arts (i.e.,
SANU) and linguistics, as well as of education and literature more generally. Similarly,
the key-keywords Crna and Gora ‘Montenegro’ co-occur with
Montenegrin, Serbian, mother (tongue), Montenegrins, label, and official, which suggests
Montenegro, whereby the erstwhile official language, Serbian, was first replaced by an
identity-neutral label ‘mother tongue’ and then, upon independence from Serbia, by
‘Montenegrin’.
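The key-keyword procedure described above can be approximated in a few lines, assuming that a separate keyword list has already been computed for each text (for instance with a signed log-likelihood test against the comparator corpus). The sketch counts in how many texts each keyword is key, keeps those reaching a researcher-chosen minimum, and lists as associates the other keywords that are key in the same texts. It is a simplified approximation with hypothetical names and invented toy data, not WST's KeyWords algorithm; in particular, it omits the MI ≥ 3 filter applied to the associates in Table 15.

```python
from collections import Counter


def unique(items):
    """Drop duplicates while keeping the first-seen order."""
    return list(dict.fromkeys(items))


def key_keywords(keywords_per_text, min_texts):
    """Keywords that are key in at least `min_texts` texts, with their text counts."""
    text_counts = Counter()
    for keywords in keywords_per_text:
        text_counts.update(unique(keywords))  # count each keyword once per text
    return {kw: n for kw, n in text_counts.most_common() if n >= min_texts}


def associates(keywords_per_text, key_keyword, min_texts=1):
    """Keywords that are key in the same texts as `key_keyword` (its 'clump')."""
    co_counts = Counter()
    for keywords in keywords_per_text:
        keywords = unique(keywords)
        if key_keyword in keywords:
            co_counts.update(kw for kw in keywords if kw != key_keyword)
    return {kw: n for kw, n in co_counts.most_common() if n >= min_texts}


# Toy per-text keyword lists (invented, not from SERBCORP)
texts = [
    ["Srbi", "hrvatski", "ime"],
    ["Srbi", "hrvatski", "rat"],
    ["pismo", "ćirilica"],
    ["Srbi", "ime"],
]
print(key_keywords(texts, min_texts=2))        # {'Srbi': 3, 'hrvatski': 2, 'ime': 2}
print(associates(texts, "Srbi", min_texts=2))  # {'hrvatski': 2, 'ime': 2}
```

Note that associates are defined here at the level of texts, which is also why, as discussed below, tracing an associate back to the specific texts it comes from requires keeping the per-text lists.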
analysis (see Section 6.3) showed up as key-keywords. Perhaps unsurprisingly, they all
do (Table 16). For example, alphabet, the most salient variable in Factor 2 (Cyrillic-Only),
co-occurs with Cyrillic, Serbian, Latin, official, professor, use (n.), school, English, high
school, and book here and with Cyrillic (n./adj.), use (n.), official, Latin, constitution,
protection, association, and law in Factor 2. The most salient variable in Factor 11
subject, and education here and with Bosniak, elective, element, national, and board in
Factor 11. Thus, the associates of alphabet and Factor 2 both seem to suggest a discourse
of endangerment concerned with the protection of the Serbian Cyrillic from the
(perceived) threat posed by the widespread use of the Latin alphabet in Serbia. The
minority language rights concerned with the recent official recognition of Bosnian as a
minority language in Serbia and its introduction in schools. The overlap between the
The problem with this analytical technique, however, arises when one decides to
explore these sets of associates further. The number of associates, for one, can be very
large, depending on their overall frequency in the corpus, which means that it may be
necessary to focus on the most frequent items, as with keyword analysis proper.
Further, it would, for instance, be interesting to look up some (or all) of the texts in which
this is not possible as there is currently no way to obtain this information automatically
using the ‘associates’ tool (Mike Scott, personal communication). One could, of course,
look for this information manually, but with a research corpus of this size, that is clearly
undesirable.
Table 15
Ethnolinguistic Identity-related Key-keywords and Key-keyword Associates in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus
N KW (English) KW (Serbian) Texts % Overall Freq. No. Ass. Associates (English) Associates (Serbian)
101 identity identitet 9 1.10 72 46 national nacionalni
106 Croatia Hrvatska 8 0.97 107 30 language·I·Serbs·name·Croatian jezik·ja·Srbi·ime·hrvatski
110 renaming preimenovanje 8 0.97 41 45 high school·professor·Montenegrin·mother (adj.) gimnazija·profesor·crnogorski·maternji
111 Serbia Srbija 8 0.97 417 49
161 Bosnia Bosna 5 0.61 35 19
164 Herzegovina Hercegovina 5 0.61 37 17
169 nationalism nacionalizam 5 0.61 54 17
195 ethnic etnički 4 0.49 23 15 minority manjina
Table 16
Factor-related Key-keywords and Key-keyword Associates in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus
F KKW (English) KKW (Serbian) Texts % Overall Freq. No. Ass. Associates (English) Associates (Serbian)
1 grade razred 47 5.72 378 117 school·subject·learn·instructor·education (profession)·instruction·English·first-graders·class period·students (K-8) škola·predmet·učiti·nastavnik·prosvete·nastava·engleski·prvaci·čas·đaci
2 alphabet pismo 49 5.97 657 108 Cyrillic·Serbian·Latin·official·professor·use (n.)·school·English·high school·book ćirilica·srpski·latinica·službeni·profesor·upotreba·škola·engleski·gimnazija·knjiga
3 exam ispit 24 2.92 189 69 school·mathematics·students (K-12)·students (K-8)·English·high (school)·mother (adj.)·Serbian·high school·instructor·entrance škola·matematika·učenici·đaci·engleski·maternji·srpski·gimnazija·nastavnik·prijemni
4 school (university) fakultet 25 3.05 433 87 professor·instruction·instructor·university·school·English·education·student·literature·program of study profesor·nastava·nastavnik·univerzitet·škola·engleski·obrazovanje·student·književnost·studija
5 minority (n.) manjina 29 3.53 284 101 mother (adj.)·national·school·Bosnian·minority (adj.)·nation·education·rights·subject·learn maternji·nacionalni·škola·bosanski·manjinski·nacija·obrazovanje·prava·predmet·učiti
6 Croatia Hrvatska 8 0.97 107 30 Croatian·Serbs·name·I·literature hrvatski·Srbi·ime·ja·književnost
7 book knjiga 66 8.04 997 156 Serbian·literature·writer·literary·poetry·poem·poet·professor·dictionary·award srpski·književnost·pisac·književni·poezija·pesma·pesnik·profesor·rečnik·nagrada
8 Montenegrin crnogorski 38 4.63 393 98 (Monte)negro·Serbian·Monte(negro)·Montenegrins·mother (adj.)·literature·linguistics·literary·poetry·professor crna·srpski·gora·crnogorci·maternji·književnost·jezički·književni·poezija·profesor
9 teach predavati 6 0.73 45 30 instructor nastavnik
10 linguistic jezički 60 7.31 367 122 Serbian·dictionary·literary·dialect·English·speech·literature·people·Croatian·school srpski·rečnik·književni·dijalekat·engleski·govor·književnost·narod·hrvatski·škola
11 Bosnian bosanski 16 1.95 143 49 Bosniak·national·minority·subject·education (profession)·Serbian·learn·literary·school bošnjački·nacionalni·manjina·predmet·prosvete·srpski·učiti·književni·škola
12 center centar 4 0.49 56 13
WST includes several other tools for exploring co-occurrences among keywords, such as
the ‘keywords plot’ and ‘links’ (see Scott, 2014a, for details), which calculate and plot a
keyword’s collocates (i.e., keywords that occur within a researcher-defined collocation
span of the chosen keyword). However, this analysis works only with individual texts, so
its usefulness for our purposes is limited. Another option is to take a phrasal approach to
keywords and calculate keyword clusters (i.e., n-grams), but this technique uses only
keywords, which makes it highly unlikely to produce a sufficient number of observations
for analysis. This concludes my exploration of keyword analysis. In the next section, I
examine the results of collocation analysis as applied in this study.
This section presents the results of collocation analyses of the lemma JEZIK
conducted on the 5+ hits section of SERBCORP (for results and discussion of collocation
analysis performed on SERBCORP as a whole, see Appendix F). Lists of collocates are
presented first by frequency and then by MI score. As with keywords, only the top
50 collocates are shown in the tables in the body of the text; full lists are presented in the
appendices. Lastly, a sample of the most frequent n-grams in the 5+ hits section of
analysis of the lemma JEZIK conducted on the 5+ hits section of SERBCORP produced a
total of 305 lemma collocates of the lemma JEZIK (Appendix G, Tables G1 and G2).
F, Table F1), the most frequent lemma collocate of the lemma JEZIK is srpski ‘Serbian’
with 3,449 occurrences in 802 texts.
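Collocate frequencies of this kind come from counting every item that occurs within a fixed span around the node lemma; an MI score then compares that observed co-occurrence count with the count expected by chance. The sketch below assumes the tokens are already lemmatized and uses one common formulation, MI = log2(O / E) with E = f(node) × f(collocate) × window size / N. WST's exact formula and span settings may differ, and the function names, the five-word span, and the toy data are assumptions.

```python
import math
from collections import Counter


def span_collocates(tokens, node, span=5):
    """Count tokens within `span` positions left and right of each node occurrence.

    Assumes `tokens` are already lemmatized and lowercased.
    """
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts


def mutual_information(observed, node_freq, collocate_freq, corpus_size, span=5):
    """MI = log2(O / E), with the chance expectation E scaled by the window size (2 * span)."""
    expected = node_freq * collocate_freq * (2 * span) / corpus_size
    return math.log2(observed / expected)


# Invented toy data (lemmatized), not from SERBCORP
tokens = "srpski jezik i književnost i srpski jezik".split()
print(span_collocates(tokens, "jezik", span=2))
print(round(mutual_information(8, 100, 50, 1_000_000, span=5), 2))  # 7.32
```

A cut-off such as MI ≥ 3 (used for the associates in Table 15) can then be applied; because MI tends to favour rare collocates, it is usually combined with a minimum frequency, which is consistent with the observation that frequency-ordered lists are the better guide here.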
The prominent semantic fields are similar to those in SERBCORP, with most
identity (see Appendix F). However, items referring to literature are now unaccompanied
by items referring to translation, while the semantic field of culture remains marginal
(one item). The semantic fields of school and foreign languages are somewhat less
conclude at this point that these two fields account for much of the information ‘loss’ due
to sampling. On the other hand, in line with the trend, demonstrated above, toward greater
relevance of articles with more hits for the lemma JEZIK, srpski ‘Serbian’,
hrvatski ‘Croatian’ (rank 15, 363 occurrences), and crnogorski ‘Montenegrin’ (rank 16,
occurrences) and bosanski ‘Bosnian’ (rank 46, 165 occurrences) in the top 50. Other
pertinent items remain: narod ‘people’ (rank 27, 252 occurrences), nacionalni
‘national’ (rank 31, 222 occurrences), ime ‘name’ (rank 36, 208 occurrences), and postoji
‘exists’ (rank 48, 161 occurrences), with the addition of novi ‘new’ (rank 25, 254
occurrences) and pitanje ‘question’ (rank 45, 168 occurrences). Pismo ‘alphabet’ (rank
10, 457 occurrences) and rečnik ‘dictionary’ (rank 50, 158 occurrences), finally, suggest
1972).
Table 17
Top 50 Lemma Collocates of the Lemma JEZIK ‘Language’ in the 5+ Hits Section of
Table 18
Top 50 Lemma Collocates of the Lemma JEZIK ‘Language’ in the 5+ Hits Section of
The most significant collocates (Table 18) present a more opaque pattern, with
considerably more attributive adjectives in the list, as above: službeni ‘official’ (rank 1,
285 occurrences) and zvaničan ‘official’ (rank 20, 112 occurrences), različit ‘different’
‘common’ (rank 21, 83 occurrences), nov ‘new’ (rank 31, 254 occurrences), savremeni
‘contemporary’ (rank 32, 106 occurrences), govorni ‘spoken’ (rank 33, 39 occurrences).
occurrences) and mađarski ‘Hungarian’ (rank 36, 108 occurrences), and a set of most
8, 351 occurrences), ime ‘name’ (rank 25, 208 occurrences), and preimenovanje
‘renaming’ (rank 42, 94 occurrences). New, previously unattested items include nacija
‘nation’ (rank 16, 51 occurrences), latinica ‘Latin (alphabet)’ (rank 17, 41 occurrences),
grupa ‘group’ (rank 22, 65 occurrences), and nasilje ‘violence’ (rank 34, 28 occurrences).
although, again, frequency does seem to be a better guide to insights into discursive
offers another analytical technique33 which has the potential to increase its usefulness for
6.2.2 N-grams. Given the demonstrated higher relevance of the 5+ hits articles
for our purposes here, n-gram analysis was conducted on the 5+ hits section of
SERBCORP only. As expected, the total number of identified recurrent phrases was
large (3,753), and most were bigrams (2,381). The list of n-grams presented here (Table
19) is a researcher-selected sample from the top 100 most frequent phrases in each
category.
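N-gram extraction of this kind can be sketched as counting every contiguous sequence of two to six tokens in the concordance lines of the node lemma, without letting sequences cross line boundaries, and then ranking them by frequency within each length. The function names, the minimum frequency, and the two example lines (invented for illustration) are assumptions; WST's own cluster tool may tokenize and count differently.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-token sequences in one line."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_ngrams(lines, sizes=range(2, 7), min_freq=2):
    """Count 2- to 6-grams over concordance lines, most frequent first per length."""
    counts = {n: Counter() for n in sizes}
    for line in lines:
        tokens = line.lower().split()
        for n in sizes:
            counts[n].update(ngrams(tokens, n))  # n-grams never cross lines
    return {
        n: [(" ".join(gram), freq) for gram, freq in c.most_common() if freq >= min_freq]
        for n, c in counts.items()
    }


# Two invented concordance lines
lines = [
    "o službenoj upotrebi jezika i pisma",
    "zakon o službenoj upotrebi jezika i pisma",
]
for n, grams in frequent_ngrams(lines).items():
    print(n, grams[:3])
```

Because every line is a window around a form of JEZIK, frequent n-grams can still include phrases without the node itself (here, službenoj upotrebi), which mirrors the observation below.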
lists as we can now see the node word (here, different forms of the lemma JEZIK) in a
variety of phrasal contexts. Further, although this n-gram analysis is based on the
concordance lines of different forms of the lemma JEZIK, we can see a large number of
relevant phrases that do not contain any of the forms of this lemma. Because n-grams are
based on lexical patterns identified by collocation analysis, we are likely to see largely
the same items, only now in context. Indeed, the bigrams here show
many of the same lexical items and traces of that same discourse of construction and
jezik ‘Serbian language’, maternji jezik ‘mother tongue’, svoj jezik ‘own language’, naš
jezik ‘our language’ (ranks 1, 2, 5, 7), on the one hand, and the phrases strani jezik
‘foreign language’, engleski jezik ‘English language’, and drugi (strani) jezik ‘second
(foreign) language’ (ranks 3, 4, 11), on the other. We also note the high prominence of
bosanski jezik ‘Bosnian language’ (ranks 1, 6, 9, 10). The top trigrams, for example,
confirm the association between language and literature (jezik i književnost ‘language and
literature’, rank 15) and language and culture (jezik i kulturu ‘language and culture’, rank
26), but also point to a discourse related to language policy (jezik i pismo ‘language and
alphabet’, rečnik srpskog jezika ‘dictionary [of the] Serbian language’, odbor za
standardizaciju ‘board for standardization’, ranks 16, 18, 19) and a discourse of
preimenovanju jezika ‘about [the] renaming [of] language’, ranks 22, 24).
The top n-grams with four, five, and six constituents confirm these and other
already attested patterns. Thus, in the 4-gram section, we see more evidence of a
discourse on minority language rights (na jezicima nacionalnih manjina ‘in [the]
languages [of] national minorities’, rank 32), endangerment (za zaštitu srpskog jezika ‘for
[the] protection [of the] Serbian language’, za odbranu srpskog jezika ‘for [the] defense
[of the] Serbian language’, ranks 35, 38), and contestation (preimenovanje srpskog jezika
u ‘renaming [of the] Serbian language into’, ne postojanju crnogorskog književnog ‘non-
existence [of the] Montenegrin literary’, o ne postojanju crnogorskog ‘about [the] non-
existence [of] Montenegrin’, ranks 34, 39, 42). In the 5-gram section, we see traces of a
standardizaciju srpskog jezika ‘board for [the] standardization [of the] Serbian language’,
zakon o službenoj upotrebi jezika ‘law on [the] official use [of] language’, institut za
srpski jezik SANU ‘SANU institute for [the] Serbian language’, ranks 45, 54, 59), as well
as discourses of endangerment (e.g., primena latiničnog pisma srpskog jezika ‘use [of
the] Latin alphabet [of the] Serbian language’, sačuvati sopstveni jezik [i] njegovu
posebnost ‘preserve [one’s] own language [and] its autonomy’, rat za srpski jezik i ‘war
for [the] Serbian language and’, ranks 58, 60, 65), contestation (e.g., srpski jezik
preimenuju u crnogorski ‘rename [the] Serbian language into [the] Montenegrin’, rank
63), and ethnolinguistic identity (svoju nacionalnost i svoj jezik ‘[one’s] own nationality
Table 19
Sample of the Most Frequent N-grams in the 5+ Hits Section of SERBCORP (by Number
N N-gram (English) N-gram (Serbian) Freq.
56 association for (the) protection (of the) Cyrillic (of) Serbian udruženja za zaštitu ćirilice srpskog 8
57 non-existence (of the) Montenegrin literary language ne postojanju crnogorskog književnog jezika 8
58 use (of the) Latin alphabet (of the) Serbian language primena latiničnog pisma srpskog jezika 7
59 SANU institute for (the) Serbian language institut za srpski jezik SANU 6
60 preserve (one’s) own language and (its) autonomy sačuvati sopstveni jezik njegovu posebnost 6
61 rename (the) language into (the) Montenegrin language jezik preimenuju u crnogorski jezik 6
62 Serbian language into mother tongue srpski jezik u maternji jezik 6
63 rename (the) Serbian language into Montenegrin srpski jezik preimenuju u crnogorski 6
64 (one’s) own nationality and (one’s) own language svoju nacionalnost i svoj jezik 5
65 war for (the) Serbian language and rat za srpski jezik i 5
66 professor (of the) Serbian language and literature profesor srpskog jezika i književnosti 5
6-grams
67 about (the) official use (of) language and alphabet o službenoj upotrebi jezika i pisma 25
68 (of the) department of (the) Serbian language and literature odseka za srpski jezik i književnost 18
69 Bosnian language with elements of national culture bosanski jezik sa elementima nacionalne kulture 13
70 association for protection (of) Cyrillic (of the) Serbian language udruženja za zaštitu ćirilice srpskog jezika 8
71 dictionary (of) Serbo-Croatian literary and people’s language rečnik srpskohrvatskog knjiž. i narodnog jezika 8
72 students (in the) department (of the) Serbian language and studenti odseka za srpski jezik i 7
73 rename (the) Serbian language into (the) Montenegrin language srpski jezik preimenuju u crnogorski jezik 6
74 subject (of) Serbian language into mother tongue predmeta srpski jezik u maternji jezik 6
75 in (the) languages (of) national minorities and for na jezicima nacionalnih manjina i za 5
76 official use (of) other languages and alphabets službena upotreba drugih jezika i pisama 5
77 Serbian language and literature into mother srpski jezik i književnost u maternji 5
78 chair (of) board for (the) standardization (of) Serbian language preds. odbora za standardizaciju srpskog jezika 5
79 fellow (of the) SANU institute for (the) Serbian language saradnik instituta za srpski jezik SANU 5
80 war for (the) Serbian language and orthography rat za srpski jezik i pravopis 5
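The kind of n-gram counts summarized in Table 19 can be approximated with a few lines of code. The following is a minimal sketch, not the procedure actually used in this study. It assumes that each article has already been tokenized into a list of word forms. It counts contiguous n-grams within articles, never across them, and keeps only those at or above a frequency threshold. The function names and the `min_freq` parameter are hypothetical and chosen for illustration.

```python
from collections import Counter


def count_ngrams(tokens, n):
    """Count contiguous n-grams in one tokenized text, joined by single spaces."""
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def frequent_ngrams(texts, n, min_freq=5):
    """Pool n-gram counts over texts and keep those with at least min_freq hits.

    Counting text by text ensures that no n-gram spans two different articles.
    Results are sorted from most to least frequent.
    """
    total = Counter()
    for tokens in texts:
        total.update(count_ngrams(tokens, n))
    return [(gram, freq) for gram, freq in total.most_common() if freq >= min_freq]


# Toy example with two hypothetical "articles" (surface forms, not lemmas).
articles = [
    "rat za srpski jezik i".split(),
    "rat za srpski jezik i pravopis".split(),
]
print(frequent_ngrams(articles, 5, min_freq=2))  # → [('rat za srpski jezik i', 2)]
```

Each window is counted separately, so overlapping windows of one longer phrase appear as separate entries. This explains why pairs of rows such as 56 and 70, or 63 and 73, in Table 19 overlap.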
Finally, in the 6-gram section, we see many of the same phrases, only more
complete (e.g., udruženja za zaštitu ćirilice srpskog jezika ‘association for [the]
protection [of the] Cyrillic [of the] Serbian language’, srpski jezik preimenuju u
crnogorski jezik ‘rename [the] Serbian language into [the] Montenegrin language’, rat za
srpski jezik i pravopis ‘war for [the] Serbian language and orthography’, ranks 70, 73, 80)
rights (bosanski jezik sa elementima nacionalne kulture ‘Bosnian language with elements
presents an amount of information which is not easily dealt with by an analyst. In other
words, although we have been able to identify a certain number of patterns pointing to
we are not missing anything and, of course, it is still difficult to make a principled
decision about what to actually focus our analysis on. Also, even if we decided that this
was enough information to choose a focus, how do we identify the most representative
texts for qualitative analysis, for example? As mentioned above, available research
favors examination of concordance lines at this point, but again we are dealing with an
unmanageably large number of concordance lines. I noted above that Hunston (2002), for instance, suggests
concentrating on a random sample of concordance lines to get around this problem, but
that hardly solves the problem. Similarly, using the ‘plot’ function to identify texts with
the highest numbers of hits for a particular lemma form of the node word (as in Vessey,
2013a) seems inadequate and ineffective if we are looking to account for the corpus as a
whole. Fortunately, exploratory factor analysis and cluster analysis seem to provide
solutions for these and a number of other issues in corpus-based discourse and ideology
research.
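The sketch below makes the sampling option discussed above concrete. It builds simple key-word-in-context (KWIC) lines for inflected forms of jezik and then draws a reproducible random sample of them, as suggested by Hunston (2002). It is illustrative only. The regular expression for the forms of jezik, the context width, and the function names are assumptions, not the concordancer settings used in this study. The sketch also shows the limitation noted above: a random sample of lines says nothing about which whole texts are representative of the corpus.

```python
import random
import re

# Hypothetical pattern covering common inflected forms of JEZIK 'language'.
JEZIK_FORMS = r"\bjezi(?:k|ka|ku|kom|ke|ci|cima)\b"


def concordance(texts, pattern=JEZIK_FORMS, width=40):
    """Return (left context, node, right context) tuples for every match."""
    rx = re.compile(pattern, re.IGNORECASE)
    lines = []
    for text in texts:
        for m in rx.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            lines.append((left, m.group(0), right))
    return lines


def sample_lines(lines, n, seed=0):
    """Draw a reproducible random sample of at most n concordance lines."""
    rng = random.Random(seed)
    return rng.sample(lines, min(n, len(lines)))


texts = ["Srpski jezik je u službenoj upotrebi.", "o jeziku i pismu", "na jezicima manjina"]
for left, node, right in sample_lines(concordance(texts, width=15), 2):
    print(f"{left:>15} [{node}] {right}")
```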
The preferred 12-factor solution accounted for
34.13% of the total variance in the data. In contrast to Fitzsimmons-Doolan (2011, 2014),
however, the factors were not interpreted as language ideologies, but rather as indicators
of the most salient topics and thus discourses in the data. The factors were labeled as
follows: Language education (5.40 %), Cyrillic-only (3.18 %), Entrance exams (3.16 %),
Officialization of Montenegrin 1 (2.83 %), Minority language rights (2.72 %),
Contestation over language ownership and name (2.67 %), Literature and publishing
(2.61 %), Officialization of Montenegrin 2 (2.61 %), Foreign language education (2.60 %),
Linguistics as a science, lexicography, standardization and contestation (2.55 %),
Officialization of Bosnian (2.10 %), and Linguacultural diplomacy, language, and culture
(1.73 %). Descriptive statistics for all of the variables in the preferred factor solution are
presented in Table 20. The eigenvalues of the unrotated factor analysis are shown in
Table 21. Figure 4 shows the scree plot of eigenvalues. Table 22 presents the rotated
factor patterns for the 12-factor solution using the Varimax rotation. Finally, Table 23
shows a summary of the factorial structure, with the salient collocates for each factor.
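For readers less familiar with orthogonal rotation, the sketch below shows what a Varimax rotation does to a loading matrix. It also shows how a factor's share of the total variance can be derived from its loadings. This is a minimal pure-Python illustration using Kaiser's pairwise-angle method with Kaiser row normalization, not the software routine used in this study, and the function names are hypothetical. It assumes, as is standard when a correlation matrix is factored, that a factor's percentage of variance equals the sum of its squared loadings divided by the number of variables k. On that assumption, the 5.40 % reported for Factor 1 with k = 107 would correspond to a sum of squared loadings of about 0.054 × 107 ≈ 5.78.

```python
import math


def varimax(loadings, normalize=True, tol=1e-10, max_sweeps=100):
    """Varimax-rotate a loading matrix given as a list of rows (variables x factors).

    Kaiser's method: each pair of factor columns is rotated by the angle that
    maximizes the varimax criterion for that pair, sweeping until no angle exceeds tol.
    With normalize=True, rows are scaled to unit length first (Kaiser normalization)
    and rescaled afterwards.
    """
    p, k = len(loadings), len(loadings[0])
    # Row lengths = square roots of the communalities (1.0 guards all-zero rows).
    h = [math.sqrt(sum(v * v for v in row)) or 1.0 for row in loadings]
    L = [[v / h[i] if normalize else v for v in row] for i, row in enumerate(loadings)]
    for _ in range(max_sweeps):
        largest = 0.0
        for j in range(k - 1):
            for m in range(j + 1, k):
                u = [L[i][j] ** 2 - L[i][m] ** 2 for i in range(p)]
                w = [2 * L[i][j] * L[i][m] for i in range(p)]
                a, b = sum(u), sum(w)
                c = sum(x * x - y * y for x, y in zip(u, w))
                d = 2 * sum(x * y for x, y in zip(u, w))
                # Optimal pairwise angle: tan(4*theta) = (d - 2ab/p) / (c - (a^2 - b^2)/p)
                theta = math.atan2(d - 2 * a * b / p, c - (a * a - b * b) / p) / 4
                largest = max(largest, abs(theta))
                cos_t, sin_t = math.cos(theta), math.sin(theta)
                for i in range(p):
                    x, y = L[i][j], L[i][m]
                    L[i][j] = x * cos_t + y * sin_t
                    L[i][m] = -x * sin_t + y * cos_t
        if largest < tol:
            break
    if normalize:
        L = [[v * h[i] for v in row] for i, row in enumerate(L)]
    return L


def percent_variance(loadings):
    """Per-factor sum of squared loadings as a percentage of the number of variables."""
    p = len(loadings)
    return [100 * sum(row[j] ** 2 for row in loadings) / p for j in range(len(loadings[0]))]


# Toy example: a clean two-factor structure, artificially rotated by 30 degrees.
simple = [[0.8, 0.0], [0.7, 0.0], [0.6, 0.0], [0.0, 0.9], [0.0, 0.5], [0.0, 0.75]]
t = math.radians(30)
mixed = [[x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t)] for x, y in simple]
rotated = varimax(mixed)
print([[round(v, 3) for v in row] for row in rotated])
print([round(v, 2) for v in percent_variance(rotated)])
```

In the toy example, Varimax recovers the clean structure up to rounding. Because the rotation is orthogonal, each variable's communality and the total variance explained by all factors together stay unchanged. The rotation only redistributes that variance among the factors, so the percentages of individual rotated factors can differ from the eigenvalue-based figures of the unrotated solution.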
Table 20
Descriptive Statistics for the Variables in the 12-factor Solution (N = 943, k = 107)
Collocate of JEZIK (English, Serbian) Mean Minimum value Maximum value Range Standard deviation
expression izraz 0.3 0.0 10.4 10.4 1.0
first prvi 2.0 0.0 30.9 30.9 2.5
foreign strani 2.2 0.0 31.4 31.4 4.0
framework okvir 0.3 0.0 11.8 11.8 0.9
grade razred 0.7 0.0 24.4 24.4 2.5
high school (gen.) srednja 0.3 0.0 12.2 12.2 1.0
high school (acad.) gimnazija 0.4 0.0 19.7 19.7 1.7
Hungarian mađarski 0.2 0.0 47.1 47.1 1.7
institute institut 0.3 0.0 24.8 24.8 1.7
instruction nastava 1.0 0.0 36.2 36.2 3.0
instructors nastavnici 0.6 0.0 23.4 23.4 2.1
interest interesovanje 0.2 0.0 7.8 7.8 0.6
introduction uvođenje 0.2 0.0 14.3 14.3 1.0
knowledge znanje 0.6 0.0 16.6 16.6 1.7
Latin latinica 0.5 0.0 36.1 36.1 2.5
law zakon 0.6 0.0 29.4 29.4 2.1
level nivo 0.3 0.0 22.1 22.1 1.2
linguist lingvista 0.3 0.0 13.7 13.7 1.1
linguistic jezički 0.9 0.0 19.8 19.8 2.0
linguistic lingvistički 0.2 0.0 10.1 10.1 0.7
linguistics lingvistika 0.1 0.0 12.5 12.5 0.7
literary književni 0.9 0.0 20.0 20.0 2.1
literature književnost 1.4 0.0 34.4 34.4 3.2
mathematics matematika 0.3 0.0 19.3 19.3 1.5
minority manjina 0.6 0.0 31.1 31.1 2.8
Montenegrin crnogorski 0.9 0.0 40.0 40.0 3.4
Montenegrins Crnogorci 0.2 0.0 19.3 19.3 1.2
Montenegro Crna 1.2 0.0 36.1 36.1 3.8
mother (adj.) maternji 0.8 0.0 24.6 24.6 2.2
name ime 0.9 0.0 22.5 22.5 2.2
national nacionalni 1.4 0.0 27.6 27.6 3.4
Nikšić Nikšić 0.1 0.0 15.3 15.3 0.9
official službeni 0.6 0.0 43.2 43.2 2.9
part deo 1.2 0.0 20.7 20.7 2.9
philology filološki 0.2 0.0 9.9 9.9 0.9
philosophy filozofski 0.2 0.0 8.7 8.7 0.8
poem pesma 0.4 0.0 21.3 21.3 1.8
poetry poezija 0.5 0.0 24.5 24.5 2.0
professor profesor 1.8 0.0 39.2 39.2 3.9
protection zaštita 0.2 0.0 10.5 10.5 0.8
publish objaviti 0.4 0.0 14.7 14.7 1.1
published objavljen 0.3 0.0 8.3 8.3 0.8
renaming preimenovanje 0.1 0.0 15.3 15.3 0.9
rights prava 0.9 0.0 33.3 33.3 2.2
Romanian rumunski 0.1 0.0 15.0 15.0 0.8
Ruthenian rusinski 0.1 0.0 12.7 12.7 0.8
SANU SANU 0.2 0.0 19.6 19.6 1.3
school (K-12) škola 2.8 0.0 35.8 35.8 6.0
school (univ.) fakultet 1.1 0.0 38.8 38.8 3.3
science nauka 0.7 0.0 23.6 23.6 1.8
scientific naučni 0.2 0.0 11.1 11.1 0.9
alphabet pismo 1.7 0.0 53.8 53.8 5.4
section odeljenje 0.4 0.0 21.9 21.9 1.7
Serbian srpski 8.5 0.0 68.6 68.6 10.2
Serbo-Croatian srpskohrvatski 0.2 0.0 29.1 29.1 1.3
Serbs Srbi 1.2 0.0 35.1 35.1 3.1
Slovak slovački 0.1 0.0 12.7 12.7 0.7
standard standardni 0.1 0.0 7.5 7.5 0.5
students (K-12) učenici 0.7 0.0 19.4 19.4 2.4
students (univ.) studenti 0.7 0.0 32.8 32.8 2.9
study učiti 0.9 0.0 27.8 27.8 2.6
subject predmet 0.9 0.0 32.8 32.8 2.9
teach predavati 0.2 0.0 12.1 12.1 1.0
teachers učitelji 0.2 0.0 15.7 15.7 0.8
use (n.) upotreba 0.8 0.0 31.8 31.8 2.5
war rat 0.4 0.0 16.3 16.3 1.2
word reč 2.6 0.0 53.5 53.5 4.6
work delo 1.3 0.0 25.9 25.9 2.2
writer pisac 0.8 0.0 30.1 30.1 2.1
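The columns of Table 20 are standard summary statistics. The sketch below shows how they can be computed for one collocate variable across the 943 texts. It assumes that each value is the normalized frequency of the collocate in one text; the normalization basis is not restated here. It also uses the sample standard deviation (n - 1 denominator), which may differ from the default of the software actually used. The function name and toy values are hypothetical. Because every minimum in Table 20 is 0.0, the range of each variable equals its maximum.

```python
import statistics


def describe(values):
    """Mean, minimum, maximum, range, and sample standard deviation of one variable."""
    lo, hi = min(values), max(values)
    return {
        "mean": statistics.mean(values),
        "minimum": lo,
        "maximum": hi,
        "range": hi - lo,
        "sd": statistics.stdev(values),  # n - 1 denominator (an assumption)
    }


# Hypothetical per-text values for one collocate; most texts lack it entirely.
toy = [0.0, 0.0, 0.0, 2.0, 4.0, 0.0]
print({name: round(value, 2) for name, value in describe(toy).items()})
# → {'mean': 1.0, 'minimum': 0.0, 'maximum': 4.0, 'range': 4.0, 'sd': 1.67}
```

As in the toy data, most variables in Table 20 have a small mean, a standard deviation larger than the mean, and a much larger maximum. This is the zero-heavy, right-skewed profile expected of collocate frequencies in individual newspaper articles.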
Table 21
Table 22
English Serbian F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12
first prvi .477 -.004 .100 -.030 -.063 -.002 .127 -.088 .093 -.026 -.007 -.071
foreign strani .328 -.035 .054 -.048 -.039 -.112 -.141 -.045 .450 .017 -.102 .061
framework okvir .245 .000 .006 .043 -.014 -.080 -.026 -.024 .408 .091 .008 .018
grade razred .797 -.015 .176 -.049 -.025 -.048 -.055 .010 .322 -.048 .005 -.085
high school (gen.) srednja .084 .026 .631 .006 .008 -.031 -.061 -.033 .016 -.081 -.023 -.025
high school (acad.) gimnazija .115 -.042 .532 .206 -.030 -.061 -.052 .075 -.033 -.031 -.015 -.107
Hungarian mađarski -.015 -.001 .034 .000 .458 .013 -.017 -.039 .007 -.033 -.009 -.029
institute institut .008 -.055 -.020 .014 -.010 -.070 -.043 -.029 -.070 .183 -.039 .362
instruction nastava .448 -.023 .233 .185 .018 -.082 -.108 .031 .453 -.055 .083 .006
instructors nastavnici .570 -.026 .074 .086 -.030 -.047 -.074 -.012 .306 -.058 -.066 -.030
interest interesovanje .042 -.059 .048 .034 -.006 -.067 .052 -.024 .004 -.029 -.023 .391
introduction uvođenje .068 .077 -.023 .021 .014 -.029 -.065 .370 .111 .013 .136 .062
knowledge znanje .207 -.063 .303 .074 -.057 -.120 -.109 -.063 .362 -.026 -.055 .103
Latin latinica -.030 .547 -.004 -.066 -.130 .135 -.008 -.043 -.006 -.043 .037 .060
law zakon .104 .348 -.067 .095 .270 -.106 -.187 -.078 -.032 -.004 .103 -.143
level nivo .103 .008 .007 .003 .088 -.075 -.116 -.018 .577 .024 -.036 .021
linguist lingvista -.063 .015 -.025 .027 -.007 .219 -.038 .060 .041 .496 .081 .041
linguistic jezički -.103 .005 .010 -.076 -.024 .027 .003 .026 .139 .576 .045 -.053
linguistic lingvistički -.073 -.007 -.020 .078 -.005 .115 -.018 .157 .080 .438 .045 .037
linguistics lingvistika -.087 -.022 .007 .067 -.023 .103 -.056 .024 .060 .373 .048 .007
literary književni -.088 -.008 -.055 .000 -.044 .128 .464 -.012 .002 .147 -.006 .038
literature književnost -.020 -.036 .024 .237 -.033 .075 .443 .005 .033 .002 -.008 .124
mathematics matematika .210 -.036 .638 -.045 -.012 -.041 -.009 -.017 .012 -.047 -.044 .008
minority manjina -.028 .098 .018 -.037 .785 .011 -.077 -.026 .039 -.056 .246 -.055
Montenegrin crnogorski -.088 .024 -.044 .032 -.001 .078 -.066 .765 .012 .073 .001 -.004
Montenegrins Crnogorci -.064 -.021 -.046 -.046 .056 .101 -.059 .428 .032 .011 -.017 .015
Montenegro Crna -.095 .075 -.048 .075 -.002 .033 -.075 .744 -.029 .003 -.002 .006
mother (adj.) maternji .100 .045 .297 .185 .083 .008 -.098 .401 -.066 -.066 .022 -.091
name ime -.079 -.003 -.064 .000 -.013 .385 -.042 .146 -.041 .077 .027 -.100
national nacionalni -.024 .149 -.028 -.078 .459 .138 -.072 .027 .041 -.020 .395 -.017
Nikšić Nikšić -.034 .027 .111 .454 -.032 -.038 -.024 .466 -.123 -.002 -.012 -.213
official službeni -.039 .558 -.022 .021 .179 -.037 -.096 .118 -.003 .028 .058 -.087
part deo -.066 -.019 .000 -.042 .004 .011 .413 -.062 -.003 .014 -.031 .133
philology filološki .030 -.029 -.030 .426 -.033 .051 .040 -.113 .141 .069 .025 .261
philosophy filozofski -.013 -.009 .023 .550 -.026 .016 -.023 .204 -.049 .038 .013 -.030
poem pesma -.040 -.062 -.071 -.027 -.030 -.084 .358 -.049 -.009 -.072 -.030 -.153
poetry poezija -.057 -.062 -.060 .012 -.039 -.086 .316 -.059 -.013 -.066 -.023 -.142
professor profesor .096 .047 .122 .578 -.071 -.093 .035 .089 .094 .078 -.008 .018
protection zaštita -.036 .388 .005 -.024 .171 .019 -.033 .096 -.017 -.056 -.025 .042
publish objaviti -.029 .064 -.054 .016 -.010 .005 .514 -.048 -.053 -.025 -.004 -.010
published objavljen -.040 -.023 .057 -.029 -.031 -.044 .440 -.052 -.037 .124 -.032 -.014
renaming preimenovanje -.038 .001 .079 .327 -.040 .003 -.037 .483 -.112 -.008 -.003 -.210
rights prava -.067 .113 -.048 .049 .401 -.005 -.142 .130 .156 -.025 .100 -.109
Romanian rumunski .014 -.006 .011 -.017 .528 .050 .052 -.029 -.031 -.067 -.051 .009
Ruthenian rusinski .026 .006 .019 -.021 .709 .001 -.012 -.006 -.025 -.036 -.053 .014
SANU SANU .009 -.029 .004 -.026 .006 .130 .032 -.046 -.055 .487 -.028 .054
school (K-12) škola .570 -.040 .561 .021 -.027 -.083 -.198 .007 .200 -.100 .024 .096
school (univ.) fakultet .089 -.027 -.045 .657 .039 -.028 -.046 -.109 .174 -.020 -.007 .170
science nauka .043 -.005 -.043 .143 .000 .074 .079 .052 -.011 .352 -.063 .063
scientific naučni -.019 -.010 -.013 .165 -.011 .060 .093 .032 -.001 .365 -.016 .057
alphabet pismo -.045 .817 -.021 -.022 -.072 .063 -.030 -.079 -.028 .000 .005 .024
section odeljenje .146 -.039 .480 -.007 .026 -.015 -.042 -.036 .042 .032 -.014 .001
Serbian srpski -.063 .259 -.008 .165 .000 .351 .222 .230 -.044 .363 .011 .103
Serbo-Croatian srpskohrvatski -.034 .043 -.007 .036 -.018 .348 .047 .018 .025 .317 .029 .026
Serbs Srbi -.085 .162 -.057 -.066 .040 .501 -.016 .080 -.003 .034 .009 .003
Slovak slovački .029 .001 .024 -.023 .615 .012 .003 .035 -.026 -.007 -.055 .037
standard standardni -.048 .047 -.023 .000 -.048 .009 -.037 -.027 .057 .317 .046 -.014
students (K-12) učenici .374 -.020 .629 -.006 .025 -.054 -.082 -.002 -.015 -.058 -.025 -.014
students (univ.) studenti -.007 -.035 -.040 .589 -.017 -.040 -.050 .026 .005 -.050 -.025 .215
study učiti .594 -.040 .137 -.021 -.046 -.043 -.132 .019 .086 -.094 .183 .094
subject predmet .761 -.013 .116 .060 .037 -.057 -.029 .085 -.002 -.036 .197 -.059
teach predavati .293 .002 .125 .136 -.042 -.009 -.008 .056 .606 -.038 .036 -.070
teachers učitelji .442 -.035 -.036 .084 -.003 -.019 -.046 -.042 .094 -.065 .024 -.040
use (n.) upotreba -.061 .647 -.047 -.014 .199 -.101 -.125 -.081 -.004 .212 .044 -.140
war rat -.077 -.065 -.049 -.067 -.040 .338 .004 -.016 -.015 -.058 -.013 -.002
word reč -.106 -.014 -.082 -.124 -.081 -.133 .028 -.131 -.054 .328 -.050 -.113
work delo -.061 -.060 -.047 -.009 -.016 -.053 .423 -.040 -.037 .073 .009 .052
writer pisac -.110 -.073 -.073 -.071 -.042 -.037 .473 -.002 -.029 -.111 .025 .017
Table 23
Summary of the Factorial Structure (Collocates in Parentheses were not Used in the
Factor Collocate (English) Collocate (Serbian) Factor loading Factor/discourse label
Serbs Srbi .501
academy akademija .463
name ime .385
(Serbian) (srpski) .351
Serbo-Croatian srpskohrvatski .348
war rat .338
call (v.) zvati .334
Factor 7 (2.61 %) book knjiga .562 Literature and publishing
publish objaviti .514
writer pisac .473
literary književni .464
literature književnost .443
published objavljen .440
work delo .423
part deo .413
edition izdanje .409
poem pesma .358
poetry poezija .316
Factor 8 (2.61 %) Montenegrin crnogorski .765 Officialization of Montenegrin 2
Montenegro Crna (Gora) .744
renaming preimenovanje .483
Nikšić Nikšić .466
Montenegrins Crnogorci .428
mother (adj.) maternji .401
authorities vlast .392
introduction uvođenje .370
Factor 9 (2.60 %) teach predavati .606 Foreign language education
level nivo .577
be able to moći .482
common zajednički .461
(instruction) (nastava) .453
foreign strani .450
framework okvir .408
begin početi .375
knowledge znanje .362
(grade) (razred) .322
(department) (katedra) .314
(instructors) (nastavnici) .306
Factor 10 (2.55 %) linguistic jezički .576 Linguistics as a science,
linguist lingvista .496 lexicography, standardization
SANU SANU .487 and contestation
dictionary rečnik .485
linguistic lingvistički .438
linguistics lingvistika .373
scientific naučni .365
Serbian srpski .363
science nauka .352
word reč .328
standard standardni .317
(Serbo-Croatian) (srpskohrvatski) .317
(edition) (izdanje) .314
expression izraz .311
Factor 11 (2.10 %) Bosnian bosanski .804 Officialization of Bosnian
Bosniak bošnjački .669
elective izborni .523
element element .477
(national) (nacionalni) .395
board odbor .351
Factor 12 (1.73 %) center centar .469 Linguacultural diplomacy,
cultural kulturni .421 language, and culture
course kurs .394
interest interesovanje .391
culture kultura .377
institute institut .362
Belgrade Beograd .349
6.3.1 Factor 1: Language education. The texts with the top twenty factor scores
on Factor 1 are listed in Table 24 (for a key, see Section 4). It should be noted that the
top three, and a total of seven, of the twenty top-scoring articles here were originally
identified as multivariate outliers (see Section 5.3.2). Importantly, there was some
overlap with Factor 11 (Officialization of Bosnian): the two top-scoring articles on
Factor 1, for example, also had high factor scores on Factor 11. Discursive links between
individual factors were confirmed by the results of cluster analysis (see Section 6.5), so
factors are presented in groups according to their discursive links (Table 41) rather than
according to the amount of variation each accounts for alone (Table 23). This
language-related (small ‘d’) discourse included the following salient collocates: grade, subject,
compulsory, class period, elementary, learn, school (K-12), instructors, children, elective,
first, instruction, teachers, students (K-12), curriculum, attend, foreign, and education,
and accounted for the largest share of the variation (5.40%). Note that all text excerpts were taken from the
Table 24
Based on the salient variables and a qualitative examination of representative, i.e.,
top-scoring, texts (see Section 6.6.1), Factor 1 was interpreted as a general discourse
about language education. Keyword and collocation analyses above showed that
education was one of the most prominent semantic fields in this corpus and this is
illustrates the discourse typically found in articles scoring highly on Factor 1.³⁴
In this excerpt from a Politika article from April 13, 2003, we see an example of the
general discourse about language education thematizing the then-topical changes to
aspects of (foreign language) education in Serbia. Serbia is a republic and, despite the
existence of an autonomous province (Vojvodina), its education system and the government
as a whole are fairly highly centralized, which means that policy and curriculum
decisions are made in the republic’s Ministry of Education in Belgrade. This excerpt refers
language instructors.
Despite some overlap with several other factors noted above, texts scoring highly
education (e.g., educational reform, a topical issue during the period under study). Texts scoring
highly on Factor 1 thus typically do not thematize Central South Slavic (or any other)
ethnolinguistic identities. However, there are exceptions, as this general language-related
educational (small ‘d’, see Section 3.3.4) discourse is sometimes permeated by a
(big ‘D’) discourse of contestation which is directly related to Central South Slavic
see Section 6.6), as in text excerpt 2 from a Politika article from November 13, 2004
Bosniaks are fighting to get the educational authorities in Serbia to allow their
children in Tutin, Sjenica, and Novi Pazar [municipalities with a Bosniak majority in
southwest Serbia] to study their own language with elements of national culture as an
elective in the first and second grades [of elementary school]. Officials of the Bosniak
National Council in Serbia and Montenegro [the country’s official name before
Montenegrin independence] say that elementary school children in this area are already
taking classes in their mother tongue but only as an optional subject, which means only
one class per week. If this subject became an elective, then the children would be able to
take two classes per week which would be reserved for their tradition, culture, and
language. At first glance, there is nothing problematic about this. And there wouldn’t
be anything problematic about this if they, the Bosniaks in Serbia, did not call their
mother tongue Bosnian. Educational authorities in Serbia cannot allow them to study
Bosnian in schools until linguists recognize the existence of that language.
identity, tradition and culture. The contestation of a self-ascribed language name which is
and the collective rights that come with a separate ethnolinguistic identity (such as a
group’s right to name its own language) than it is about language itself. As we will see
further below, this discourse of contestation is pervasive in this corpus, but at the same
time, some factors are much more parsimonious pointers to it than others, so follow-up
6, 8, 10, and 11) which arise from texts that routinely thematize Central South Slavic
ethnolinguistic identities explicitly and thus are much more pertinent to an analysis of
ethnonationalism.
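Ranking texts by factor score, as in Table 24, presupposes that each text has a score on each factor. One common corpus-linguistic way of computing such scores, used in Biber-style multidimensional analysis, is to standardize each salient variable across texts and sum the z-scores of the variables that load on a factor. The sketch below follows that approach purely for illustration. This is an assumption, not necessarily how the factor scores in this study were computed; statistical packages can also produce regression-based scores. The variable names, function names, and toy values are hypothetical.

```python
import statistics


def zscores(values):
    """Standardize one variable across texts (population SD; constant columns give 0)."""
    mean, sd = statistics.mean(values), statistics.pstdev(values)
    return [(v - mean) / sd if sd else 0.0 for v in values]


def factor_scores(data, salient):
    """Sum the z-scores of a factor's salient variables for every text.

    data maps variable name -> list of per-text values (same text order throughout);
    salient maps variable name -> +1 or -1 (sign of its loading on the factor).
    """
    z = {name: zscores(data[name]) for name in salient}
    n_texts = len(next(iter(data.values())))
    return [sum(sign * z[name][t] for name, sign in salient.items()) for t in range(n_texts)]


def top_texts(text_ids, scores, n=20):
    """Return the n (text ID, score) pairs with the highest scores, best first."""
    return sorted(zip(text_ids, scores), key=lambda pair: pair[1], reverse=True)[:n]


# Toy data: three hypothetical texts and two Factor 1 collocates.
data = {"razred": [0.0, 2.0, 4.0], "predmet": [1.0, 1.0, 4.0]}
scores = factor_scores(data, {"razred": +1, "predmet": +1})
print(top_texts(["text-a", "text-b", "text-c"], scores, n=2))
```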
following salient collocates: exam, mathematics, high school (general and academic),
students (K-12), school (K-12), section, attend, and knowledge, and accounted for 3.16%
of the variation. The texts with the top twenty factor scores on Factor 3 are listed in
Table 25. Note that only six of the twenty top-scoring articles here were originally
identified as multivariate outliers. There was also some overlap with Factors 1, 9, and 11,
as several texts scored highly on all or most of these factors (which, again, was confirmed
by Factors 1, 3, 9, and 11 clustering together; see Table 41).
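The kind of overlap described here, where texts rank highly on more than one factor, can be quantified by comparing the top-scoring texts of two factors. The sketch below finds the shared texts and computes their Jaccard proportion for two hypothetical score dictionaries. It is an illustration only and is not the cluster analysis reported in Section 6.5. The function names and data are assumptions.

```python
def top_ids(scores, n=20):
    """IDs of the n texts with the highest scores; scores maps text ID -> factor score."""
    return set(sorted(scores, key=scores.get, reverse=True)[:n])


def top_overlap(scores_a, scores_b, n=20):
    """Texts among the top n on both factors, and their Jaccard proportion."""
    a, b = top_ids(scores_a, n), top_ids(scores_b, n)
    shared = a & b
    union = a | b
    return shared, (len(shared) / len(union) if union else 0.0)


# Hypothetical factor scores for four texts on two factors.
factor_1 = {"t1": 3.0, "t2": 2.0, "t3": 1.0, "t4": 0.0}
factor_11 = {"t1": 2.5, "t4": 2.0, "t2": -1.0, "t3": 0.0}
shared, jaccard = top_overlap(factor_1, factor_11, n=2)
print(sorted(shared), round(jaccard, 2))  # → ['t1'] 0.33
```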
Table 25
texts, Factor 3 was interpreted as a general educational discourse which thematized high
school entrance exams typically consisting of tests of skills in mathematics and (foreign)
languages. Text excerpt 3 (from POL-15-5-2003-92, ranked 1 in Table 25) illustrates the
A total of 255 students have applied for the 168 available slots in the Belgrade
philological high school, for now. This is how many of the 342 candidates passed
the special entrance exam, which was identical for all linguistic
high schools and sections in Serbia. We say ‘for now’ because the students
who took the Serbian language and literature and foreign languages exams at the
Karlovac, Smederevo, Kruševac or Kragujevac high schools (which have one
English-language section each) will also be eligible to apply to the Belgrade school in
June. And they can ‘squeeze out’ a student who has fewer points than them.
Similar to other eighth-graders, all these students take qualification exams in
mother tongue and mathematics in mid-June. Failing these exams does not
disqualify a student from the process of admission to the philological schools, but
they do mean extra points for children who do well on them.
Here, it is important to note that, depending on the type of school, the process of
admission to high schools in Serbia, and elsewhere in the Balkans, requires applicants to
pass several entrance exams which include mother tongue (i.e., Serbian), math, and a
foreign language, most often English. Texts scoring highly on Factor 3 thus typically
discuss the process of admission to high schools and the relevant entrance exams; they
exhibit an administrative educational discourse and do not
included the following salient collocates: teach, level, be able to, common, instruction,
foreign, framework, begin, knowledge, grade, department (katedra), and instructors, and
accounted for 2.60% of the variation. The texts with the top twenty factor scores on
Factor 9 are listed in Table 26. Here, as many as thirteen of the twenty top scoring
articles were originally identified as multivariate outliers. As noted above, the principal
area of overlap was with Factors 1 and 3 (Language education, Entrance exams), as well
Table 26
texts, Factor 9 was interpreted as a general discourse on foreign language education. Text
Yesterday, the Serbian Ministry of Education decided to expand the list of instructors
who are entitled to teach a foreign language from the first to the sixth grade of
elementary school, provided that they know the foreign language at least at the B2
level of the Common European Framework. This means that class teachers, philology
graduates, psychologists, pedagogues, and other persons who have completed a
teacher-training faculty will also be able to teach foreign languages in the lower
grades, the Ministry of Education announced. The level of foreign language knowledge
is proven by passing the relevant exam at one of the philology departments of Serbian
universities or by "an internationally recognized public document whose validity is
determined by the Ministry of Education". The Ministry decided on this step because
of a shortage of teachers for foreign language instruction, which is due to begin in
all first grades of elementary school in the upcoming school year.
Similar to the excerpt used to illustrate the discourse identified by Factor 1, this example
also points to a shortage of foreign language instructors in Serbian schools during the first
decade of the twenty-first century. Texts scoring highly on Factor 9, such as this Politika
article from August 23, 2003, typically discuss foreign language education issues in
Serbian elementary schools and high schools. Texts representative of this discourse thus
included the following salient collocates: Bosnian, Bosniak, elective, element, national,
and board, and accounted for 2.10% of the variation. The texts with the top twenty factor
scores on Factor 11 are listed in Table 27. Note that here as many as sixteen of the twenty
top-scoring articles were originally identified as multivariate outliers. As noted above,
the principal area of overlap was with Factors 1, 3, and 9 (see discussion of these factors
above and Table 41 below). There were also minor areas of overlap with Factors 5
and contestation), although this latter overlap was not attested by cluster analysis; a
possible reason for this is that Bosnian is sometimes discussed as a minority language in
Table 27
Similar to the second excerpt used to illustrate the discourse identified by Factor 1 above,
with elements of national culture”). Texts scoring highly on Factor 11, such as this
Politika article from November 12, 2004, typically discuss the then-topical introduction
southwest Serbia, an area where the Bosniak minority has traditionally been in the
majority. Note the explicit link between minority language rights and ethnocultural
rights. Texts representative of this discourse also typically thematize Central South
Slavic ethnolinguistic identities and show traces of the discourse of contestation related to
these identities.
following salient collocates: alphabet, Cyrillic (adj./n.), use (n.), official, Latin,
constitution, protection, association, and law, and accounted for 3.18% of the variation.
The texts with the top twenty factor scores on Factor 2 are listed in Table 28. Note that as
many as twelve of the twenty top scoring articles here were originally identified as
multivariate outliers. Similar to Factor 1, there was some overlap with Factor 5 (Minority
Table 28
texts, Factor 2 was interpreted as a classic discourse of endangerment (cf. Duchêne &
Heller, 2007), here referring to a (perceived) threat to the Cyrillic alphabet from the
widespread use of the Latin alphabet in Serbia. Text excerpt 6 (from POL-11-2-2005-
108, ranked 1 in Table 28) illustrates the discourse typically found in articles scoring
highly on Factor 2.
The association for the protection of the Serbian alphabet “Serbian Cyrillic”
demands that the President of Serbia apologize to the Serbian people because a
constitution draft submitted by an expert group he formed treats both Latin and
Cyrillic as Serbian alphabets. In a statement released yesterday, members of the
association ask if the authors of this draft know of any other people in the world
who use a foreign alphabet in their language. […] Above all, the statement further
reads, the authors of this constitution draft do not know that the Latin alphabet
had not been used in the Serbian language before a time during which the good
will to help the Southern Slavic co-tribesmen Croats to finally get their own
alphabet was abused.
It should be noted here that both Latin and Cyrillic alphabets are in use in Serbia (for a
discussion of the significance of this issue, see Section 7). Although the two scripts are
equally functional, Cyrillic is widely seen as autochthonous and thus as closely linked to
the Serbian ethnonational identity. This has made the use of the Latin alphabet a target
for Serbian ultranationalists who argue that it represents a threat to Serbian Cyrillic and
Texts scoring highly on Factor 2 thus typically discuss a (perceived) threat to the
identities as the Latin alphabet is sometimes linked to Croats, as in the excerpt from a
national, Hungarian, rights, and community, and accounted for 2.72% of the variation.
The texts with the top twenty factor scores on Factor 5 are listed in Table 29. Again, as
many as fifteen of the twenty top scoring articles here were originally identified as
multivariate outliers. The principal area of overlap was with Factor 2 (Cyrillic-only).
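Many top-scoring texts were flagged as multivariate outliers, so it may help to recall how such outliers are commonly detected. The usual measure is the squared Mahalanobis distance of each text from the multivariate mean, compared against a critical value. A common convention takes the chi-square value at p < .001, with as many degrees of freedom as there are variables. The sketch below is a minimal pure-Python version under that assumption. The procedure actually used is the one reported in Section 5.3.2. Here the critical value is simply passed in rather than derived, and the function names and toy data are hypothetical.

```python
def mean_vector(rows):
    """Column means of a list of observation rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows):
    """Sample covariance matrix (n - 1 denominator) of a list of observation rows."""
    mu, n = mean_vector(rows), len(rows)
    k = len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1) for j in range(k)]
            for i in range(k)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a must be non-singular)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mahalanobis_sq(rows):
    """Squared Mahalanobis distance of every row from the mean vector."""
    mu, s = mean_vector(rows), covariance(rows)
    out = []
    for r in rows:
        d = [x - mu_i for x, mu_i in zip(r, mu)]
        out.append(sum(di * yi for di, yi in zip(d, solve(s, d))))
    return out


def outliers(ids, rows, critical):
    """IDs whose squared distance exceeds the supplied critical value."""
    return [i for i, d2 in zip(ids, mahalanobis_sq(rows)) if d2 > critical]


ids = list("abcdefg")
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 0], [0, 2], [9, 9]]
print(outliers(ids, rows, critical=4.0))  # → ['g']
```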
Table 29
noted that, although varieties of what used to be called Serbo-Croatian, such as Croatian
and Bosnian, are sometimes mentioned in the general discourse on minority rights (as in
the excerpt from a Politika article from June 6, 2006 below), there is a clear distinction
Slovak (hence the separate factors). Text excerpt 7 (from POL-6-6-2006-188, ranked 4 in
Table 29) illustrates the discourse typically found in articles scoring highly on Factor 5.
let us recall, gives members of national minorities the right to bilingual
documents in places where their mother tongue is in official use.
Texts scoring highly on Factor 5 typically discuss language minority rights in Serbia,
particularly in the northern province of Vojvodina, where most minority populations are
ethnolinguistic identities, but, as noted above, because non-Serb Central South Slavic
Slavic minorities, texts representative of Factor 5 do not typically treat what I have been
there are often traces of a discourse of endangerment (for a description of this discourse,
see Section 6.6) in texts representative of Factor 5, which refers to the perceived
the decision on the expulsion of Ruthenian and Slovak and the ban on the Latin alphabet
has alarmed the public.
decision and renaming, and accounted for 2.83% of the variation. The texts with the top
twenty factor scores on Factor 4 are listed in Table 30. Note that as many as fifteen of the
twenty top scoring articles here were originally identified as multivariate outliers. The
Table 30
texts, Factor 4 was interpreted as one of the two discourses on the officialization of
Montenegrin, which proceeded in two distinct phases, with Serbian first being renamed
‘mother tongue’ and then ‘Montenegrin’. Text excerpt 9 (from BLI-30-3-2004-544,
ranked 1 in Table 30) illustrates the discourse typically found in articles scoring highly on
Factor 4.
As can be seen in this excerpt from a Blic article from March 30, 2004, texts scoring
highly on Factor 4 typically report on the protests against the new policy by professors
and students of Serbian in Montenegro. It should be noted that although Serbs (who were
Montenegro with strong political representation in the Montenegrin parliament, they were
unable to stop the implementation of this policy because ethnic Montenegrins and their
political parties received support from all other minority groups (Bosniaks, Albanians,
etc.). Texts representative of this discourse are directly relevant to Central South Slavic
ethnolinguistic identities and are linked to texts representative of the discourse identified
by Factor 8 (see Section 6.3.8), so they will be discussed in conjunction with one another
Nikšić, Montenegrins, mother (adj.), authorities, and introduction, and accounted for
2.61% of the variation. The texts with the top twenty factor scores on Factor 8 are listed
in Table 31. Here, seven of the twenty top scoring articles were originally identified as
multivariate outliers. As noted above, the principal overlap was with Factor 4
(Officialization of Montenegrin 1).
Table 31
texts, Factor 8 was interpreted as the second of the two discourses on the officialization
Montenegrin, which was predominantly focused on the protests against the new policy by
students and professors of Serbian in Montenegro, this discourse also comprised the more
general views opposing the policy, particularly those espoused by nationalist intellectuals.
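Matija Bećković, “dežurni branilac svesrpstva u Crnoj Gori, hitro je, u svom
poznatom stilu, reagovao na ideju dr Vukotića”, o uvođenju engleskog kao
službenog jezika u Crnu Goru: “Bilo bi veoma korisno da se engleski jezik
uvede kao drugi službeni jezik u Crnu Goru, jer bi se malo odmorio crnogorski
jezik koji je to odavno zaslužio... Ako umesto srpskog jezika i srpske azbuke
uvedu maternji, onda bi možda bilo pravednije da ga nazovu maćehinskim.”
Matija Bećković, “the perennial defender of pan-Serbism in Montenegro, promptly
reacted, in his well-known style, to Dr Vukotić’s idea” of introducing English as
an official language in Montenegro: “It would be very useful to introduce English
as a second official language in Montenegro, because the Montenegrin language,
which has long deserved it, would then get a little rest... If instead of the
Serbian language and the Serbian alphabet they introduce the mother tongue, then
it would perhaps be fairer to call it the step-mother tongue.”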
Texts scoring highly on Factor 8, such as this Vreme article from July 17, 2003, typically
discuss the fallout around the officialization of Montenegrin in Montenegro as well as its
Bećković, Serbian writer and member of the SANU Department for language and
literature, first ironizes a proposal for the introduction of English as a second official
deserved rest”). Following this, Bećković, typically of Serbian objections to the change
in language policy in Montenegro at the time, also denounces the change in the official
name of the language in Montenegro from Serbian to “mother tongue” by proposing that
it be called “step-mother tongue” instead. Both of these proposals were (rightly) seen by
Serbs in general and Serbian nationalists in particular, in both Montenegro and Serbia, as
representative of this discourse thematize ethnolinguistic identity and are directly relevant
to a discussion of ethnonationalism.
Croatian, Serbs, academy, name, Serbian, Serbo-Croatian, war and call, and accounted
for 2.67% of the variation. The texts with the top twenty factor scores on Factor 6 are
listed in Table 32. Here, nine of the twenty top scoring articles were originally identified
as multivariate outliers. The principal area of overlap was with Factor 10 (Linguistics as
Table 32
and name. The existence and prominence of this discourse were noted at several points
above. Note that although contestation here is mainly between Serbs and Croats,
Bosnians and Montenegrins also feature prominently in most relevant texts. Similarly,
ranked 4 in Table 32) illustrates the discourse typical of articles scoring highly on
Factor 6.
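poznatom Bečkom književnom dogovoru (1850) ili ono što su Hrvati na osnovu
tog izvornog srpskog jezičkog standarda sačinili kao varijantu tog jezika za
današnji "hrvatski jezik"? Nesumnjivo je da Srbi u Hrvatskoj znaju da govore i
da pišu osim svog srpskog jezika i "novohrvatski", koji je izveden iz spomenutog
srpskog jezičkog standarda, a mnogi znaju i raniji hrvatski (čakavski i
kajkavski). Ako g. Mesić misli da Srbi uopšte ne govore svoj srpski jezik, to
može jedino da znači da su tamo srpski jezik i pismo zabranjeni. Ako misli da
Srbi u Hrvatskoj i nemaju svog srpskog jezika, to je u domenu šovinističke
farse, a ako su srpski jezik i ćirilica tamo i dalje zabranjeni, to bi moglo biti da on
tu istinu potvrđuje, pa bi mu trebalo odati neku vrstu priznanja.
[…] well-known Vienna Literary Agreement (1850), or what the Croats, on the
basis of that original Serbian language standard, compiled as a variant of that
language for today’s "Croatian language"? There is no doubt that Serbs in
Croatia can speak and write not only their own Serbian language but also
"Neo-Croatian", which is derived from the aforementioned Serbian language
standard, and many also know the earlier Croatian (Čakavian and Kajkavian). If
Mr. Mesić thinks that Serbs do not speak their Serbian language at all, that can
only mean that the Serbian language and alphabet are banned there. If he thinks
that Serbs in Croatia do not even have their own Serbian language, that belongs to
the realm of chauvinist farce, and if the Serbian language and the Cyrillic
alphabet are still banned there, then he may be confirming that truth, and he
ought to be given some kind of credit for it.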
Here we see a typical example of the discourse of contestation mentioned above, with the
exception that this particular text also shows that contestation can go in the opposite
direction as well. In other words, rather than Serbs contesting other Central South Slavs’
Texts scoring highly on Factor 6, such as this Politika article from January 1, 2003, thus
typically discuss language and identity-related contestation between the different Central
South Slavic communities (and, again, most often Serbs and Croats) that has been going
on since the nineteenth century but which, for obvious reasons, has been particularly
intense since the breakup of Yugoslavia. Historically, the Croats were the only non-Serb
Central South Slavic ethnic group allowed to officially name and standardize their
language during the Serbo-Croatian era, so they have borne the brunt of Serbian
nationalist wrath since the breakup of Yugoslavia (as well as before) as they are seen as
the precursor/precedent that led the other two ethnic groups (Bosniaks and Montenegrins)
thematize ethnolinguistic identity in a most pertinent way and are directly discursively
linked to texts representative of Factor 10, so they will be treated together in the
edition and expression, and accounted for 2.55% of the variation. The texts with the top
twenty factor scores on Factor 10 are listed in Table 33. Nine of the twenty top scoring
articles were originally identified as multivariate outliers. As noted above, the principal
area of overlap was with Factor 6 (Contestation over language ownership and name).
Table 33
6-2006-95, ranked 2 in Table 33) provides an illustration of the discourse typically found
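sistema, samim tim i modelima istog jezika. Isti jezički sistem I novija kritička
misao (osobito danas) najčešće se ne bavi razlikama u okviru jezičkog sistema,
već imenovanjima jezika, pri čemu je kritici osobito podvrgnut naziv „srpsko-
hrvatski". […] Neodrživa je njegova ocena da u Rečniku SANU „ima malo
leksike iz Srbije", mada nje zapravo ima najviše. Ono čega nema jeste - dovoljan
broj saradnika i savremenih sredstava za dalji rad na Rečniku SANU.
[…] system, and thereby also models of the same language. The same language
system. More recent critical thought, too (especially today), is mostly concerned
not with differences within the language system but with the naming of the
language, the name “Serbo-Croatian” being a particular target of criticism. […]
His assessment that the SANU Dictionary “has little lexis from Serbia” is
untenable, although lexis from Serbia in fact predominates in it. What is lacking
is a sufficient number of contributors and modern resources for further work on
the SANU Dictionary.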
As this excerpt suggests, texts scoring highly on Factor 10, such as this Politika article
from June 17, 2006, are typically longer and more complex than average newspaper texts
in this corpus. Top scoring texts here are characterized by a combination of references to
and language standardization issues, which often serve as arguments in the contestation
South Slavic ethnolinguistic identities overtly, and so are directly relevant to an analysis
included the following salient collocates: book, publisher, writer, literary, literature,
published, work, part, edition, poem and poetry, and accounted for 2.61% of the
variation. The texts with the top twenty factor scores on Factor 7 are listed in Table 34.
Note that only three of the twenty top scoring articles here were originally identified as
multivariate outliers. There was also no identifiable overlap between Factor 7 and any
other factors in this solution, suggesting a discourse separate from others identified here.
Table 34
texts, Factor 7 was interpreted as a general discourse on literature and publishing. The
existence and prominence of this discourse were noted above. Text excerpt 13 (from
articles scoring highly on Factor 7.
The poetry of Duško Novaković, the laureate of the “Vasko Popa” award, is one
of the most potent literary contributions of the last decade. […]. The tenth annual
“Vasko Popa” award was given to the poet Duško Novaković for the best book of
poetry printed in 2003 by a unanimous jury decision. Novaković received the
award for his book I chose the moon which was published by the city library
“Vladislav Petković Dis” in Čačak [an urban area in Serbia]. Before the author of
I chose the moon, the “Vasko Popa” award […] was given to Borislav Radović [a
[Serbian poet] […]. Novaković published his first book of poetry, The mirror
connoisseur, in 1976. The second edition of that book (The mirror connoisseur
and associated poems), in which some poems were published in new, more or less
changed versions […], was published about ten years ago.
Texts scoring highly on Factor 7, such as this NIN article from March 25, 2004, typically
feature reports on new literary editions, literary awards or events. Texts representative of
and they do not exhibit any discursive links with texts representative of other
factors/discourses.
course, interest, culture, institute and Belgrade, and accounted for 1.73% of the variation.
The texts with the top twenty factor scores on Factor 12 are listed in Table 35. Here, eleven of
the twenty top scoring articles were originally identified as multivariate outliers.
Similar to Factor 7, there were no identifiable discursive links between this discourse and
Table 35
illustration of the discourse typically found in articles scoring highly on Factor 12.
Strani kulturni centri oduvek su bili prozor u svet i prilika da se ne samo nauče
strani jezici, već i da se bolje upozna kultura velikih nacija. Strani kulturni
centri su kod nas već decenijama sastavni deo kulturne ponude. […] Naši
sagovornici kažu da je najveći broj korisnika među mladima, naročito onih koji
pohađaju kurseve jezika. […] Po obimu literature, beogradski institut spada u
prvih pet naših instituta u svetu, a našu publiku čine većinom visokoobrazovani
ljudi - kaže Gudrun Krivokapić [head librarian, Belgrade Goethe Institute library].
Zbog velikog interesovanja, zaposleni u centrima imaju problem s prostorom,
ali su zato prezadovoljni odzivom.
Foreign cultural centers have always been a window onto the world and an
opportunity to not only learn foreign languages but also acquaint oneself better
with the cultures of the great nations. For decades, foreign cultural centers have
also been an integral part of our cultural scene. […] Our interlocutors say that most
users are young people, particularly those attending language courses. […] By
library size, the Belgrade institute ranks among the top five of our institutes
globally, and most of our audience is made up of highly educated people, says
Gudrun Krivokapić. Due to enormous interest, the staff at these centers are
dealing with a lack of space, but they are also very happy with the number of
visitors they get.
As can be seen from this excerpt from a Blic article from November 21, 2004, texts
centers in Belgrade and around Serbia, language courses and cultural events in particular.
(e.g., “the cultures of the great nations”), they do not thematize ethnolinguistic identities
To conclude this section, it has been shown that the twelve factors suggest twelve
(small ‘d’) language-related discourses, some of which are linked. Further, it has been
shown that, although most discourses identified here do feature references to Central
South Slavic identities, six of the twelve factors/discourses are clearly more pertinent for
one hand, and ethnonationalism, on the other. Those six factors/(small ‘d’) discourses (2,
4, 6, 8, 10, 11) are further analyzed using quantitative and qualitative methods in the
of Variance)
This section presents the results of analysis of synchronic and diachronic variation
individual factors interpreted as (small ‘d’) discourses above. Note, again, that only the
six factors (2, 4, 6, 8, 10, and 11) whose comparatively greater relevance for Central
South Slavic ethnonationalism(s) was established in Section 6.3 are analyzed here.
publication the factor scores for each of the six selected language-related discourses (i.e.,
factors) of texts grouped by publication (Blic, NIN, Politika, and Vreme) were compared.
Table 36.
Table 36
Cyrillic-only (Factor 2), the Kruskal-Wallis test was conducted and indicated a significant
difference, χ²(3, N = 943) = 11.902, p = .008. Pairwise comparisons showed that texts from
Politika oriented more positively on this language-related discourse than did texts from
Blic, although the difference did not reach the adjusted significance level.
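The text does not name the procedure behind the “adjusted significance level”. A minimal sketch, assuming Dunn’s pairwise z-tests on mean ranks with a Bonferroni correction (SPSS, for instance, reports this after a Kruskal-Wallis test), shows how a pairwise difference can be nominally significant yet fail the adjusted level: with four publications there are six pairs, so each raw p-value is multiplied by six. The function name is hypothetical, the tie correction of the standard error is omitted for brevity, and the mean ranks in the usage line are invented.

```python
import math
from itertools import combinations
from statistics import NormalDist


def dunn_bonferroni(mean_ranks, sizes):
    """Dunn's pairwise z-tests on mean ranks with Bonferroni-adjusted p-values.

    mean_ranks and sizes map a group label to its mean rank and its n in the
    pooled ranking. Ties are ignored in the standard error (a simplification).
    Returns {(a, b): (z, raw_p, adjusted_p)}.
    """
    n = sum(sizes.values())
    pairs = list(combinations(sorted(mean_ranks), 2))
    m = len(pairs)  # number of comparisons, e.g. 6 for four publications
    results = {}
    for a, b in pairs:
        se = math.sqrt(n * (n + 1) / 12 * (1 / sizes[a] + 1 / sizes[b]))
        z = (mean_ranks[a] - mean_ranks[b]) / se
        raw_p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
        results[(a, b)] = (z, raw_p, min(1.0, raw_p * m))
    return results


# Invented mean ranks for four groups of 10 texts each (40 texts in total).
res = dunn_bonferroni({"A": 25.5, "B": 14.5, "C": 20.5, "D": 21.5},
                      {"A": 10, "B": 10, "C": 10, "D": 10})
z, raw_p, adj_p = res[("A", "B")]
print(f"z = {z:.3f}, raw p = {raw_p:.3f}, adjusted p = {adj_p:.3f}")
```

In this toy case the A/B difference falls below .05 before adjustment but well above it after, which is the pattern reported for Politika and Blic on Factor 2.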
Officialization of Montenegrin 1 (Factor 4), the Kruskal-Wallis test was conducted and
Contestation over language ownership and name (Factor 6), the Kruskal-Wallis test was
Pairwise comparisons showed that texts from NIN and Politika oriented significantly
more positively on this language-related discourse than did texts from Blic, while
texts from NIN also oriented significantly more positively on this discourse than did texts
from Vreme.
Officialization of Montenegrin 2 (Factor 8), the Kruskal-Wallis test was conducted and
Kruskal-Wallis test was conducted and indicated a significant difference, χ²(3, N = 943) =
16.707, p = .001. Pairwise comparisons showed that texts from Politika oriented
significantly more positively on this language-related discourse than did texts from Blic
and Vreme.
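The Kruskal-Wallis statistics reported in this section are chi-square approximations with k − 1 degrees of freedom (3 for four publications, 4 for five years of publication). A minimal standard-library sketch of the test follows, with average ranks for tied scores and the usual tie correction; the function names are illustrative, and the per-text factor scores it would take as input are not reproduced in this chapter. The upper-tail chi-square probability uses the closed-form recurrence for integer degrees of freedom, and the sketch assumes the pooled scores are not all identical.

```python
import math
from itertools import chain
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of groups; returns (H, df)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:  # ranks are sliced back out in the same group order
        total += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    return h / (1 - ties / (n ** 3 - n)), len(groups) - 1


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (recurrence on df)."""
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = 2 * (1 - NormalDist().cdf(math.sqrt(x))), 1
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q
```

As a check on the p-value step alone, chi2_sf(11.902, 3) and chi2_sf(16.707, 3) round to the p = .008 and p = .001 reported above, and chi2_sf(2.939, 4) gives the p = .568 reported for Factor 6 by year of publication.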
Officialization of Bosnian (Factor 11), the Kruskal-Wallis test was conducted and
showed that texts from Politika oriented significantly more positively on this language-
publications was thus attested only for Factors 4 and 8 (Officialization of Montenegrin 1
and 2) as texts from different publications were shown not to be significantly different in
this respect (i.e., they treat the issue of officialization of Montenegrin in similar ways).
Analysis of mean differences on all other factors showed that texts from Politika tended
to score significantly more highly than texts from Blic on all factors except Factor 2
(where the difference failed to reach the significance level adjusted for multiple pairwise
comparisons). Texts from Politika also scored significantly more highly than texts from
Vreme on Factor 10, while texts from NIN scored significantly more highly than texts
from either Blic or Vreme on Factor 6. This suggests some differences in the discursive
broadsheet dailies and some periodicals (Politika vs. Vreme), tabloid dailies and some
periodicals (Blic vs. NIN), as well as between periodicals themselves (NIN vs. Vreme).
Blic, and to a lesser extent Vreme, thus seem either to have devoted less attention to the
discourses suggested by Factors 2, 6, 10, and 11 than Politika and NIN, or to have treated
factors in ways undetected by factor analysis here. Note, however, that this does not
year of publication the factor scores for each of the six selected language-related
discourses (i.e., factors) of texts grouped by year of publication (2003, 2004, 2005, 2006,
and 2008) were compared. Descriptive statistics for each language-related discourse by
Table 37
Publication
To evaluate the hypothesis that ‘year of publication’ could differentiate factor
scores on Cyrillic-only (Factor 2), the Kruskal-Wallis test was conducted and indicated a
that texts from 2003, 2004, and 2006 oriented significantly more positively on this
scores on Contestation over language ownership and name (Factor 6), the Kruskal-Wallis
test was conducted and indicated no significant difference, χ²(4, N = 943) = 2.939, p = .568.
10), the Kruskal-Wallis test was conducted and indicated no significant difference, χ²(4, N
scores on Officialization of Bosnian (Factor 11), the Kruskal-Wallis test was conducted
then, a period of five to six years may not be long enough for many significant diachronic
differences to emerge. At the same time, a high degree of stability in a majority of
discourses identified here seems entirely plausible. Hence, we see very little change in
the discourses suggested by Factors 4, 6, 8, 10, and 11 during this period of time. The
only significant difference here is on Factor 2 between the years 2003, 2004, and 2006,
on the one hand, and the year 2008, on the other, which suggests a possible abatement of
interest in the alphabet issue, and therefore in this facet of the discourse of endangerment,
toward the end of this period. However, this finding should be treated with caution, as the
dynamics of discourse are volatile and interest in a particular issue can be quickly
article the factor scores for each of the six selected language-related discourses (i.e.,
factors) of texts grouped by type of article (general newspaper articles vs. letters-to-the-
editor) were compared. Descriptive statistics for each language-related discourse by type
Table 38
To evaluate the hypothesis that ‘type of article’ could differentiate factor scores
on Cyrillic-only (Factor 2), the Mann-Whitney U test was conducted and indicated a
oriented significantly more positively on this language-related discourse than did general
newspaper texts.
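The comparisons by type of article involve two groups, so the Mann-Whitney U test is used instead of Kruskal-Wallis. A minimal sketch with the normal approximation and a tie-corrected variance follows; the function names are illustrative, and it assumes the two inputs are the factor scores of the two groups of texts. It reports the smaller of the two possible U values, a common convention (packages differ), so z is never positive and the two-sided p-value is 2Φ(z).

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Returns (U, z, two-sided p) using the tie-corrected normal approximation."""
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u - n1 * n2 / 2) / sigma
    return u, z, min(1.0, 2 * NormalDist().cdf(z))


u, z, p = mann_whitney_u([1, 2, 3], [4, 5, 6])  # toy data: complete separation
print(u, round(z, 3), round(p, 4))
```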
To evaluate the hypothesis that ‘type of article’ could differentiate factor scores
and indicated a significant difference, U = 21.607, p = .002. This result indicates that
To evaluate the hypothesis that ‘type of article’ could differentiate factor scores
on Contestation over language ownership and name (Factor 6), the Mann-Whitney U test
was conducted and indicated a significant difference, U = 22.885, p = .013. This result
To evaluate the hypothesis that ‘type of article’ could differentiate factor scores
and indicated a significant difference, U = 20.793, p < .001. This result indicates that
To evaluate the hypothesis that ‘type of article’ could differentiate factor scores
26.159, p = .349.
To evaluate the hypothesis that ‘type of article’ could differentiate factor scores
on Officialization of Bosnian (Factor 11), the Mann-Whitney U test was conducted and
6.4.6 Summary of variation by type of article. Newspaper articles and letters-
articles are often written by public figures and professionals or experts. Similarly, in
addition to the general readership, letters-to-the-editor are often written by public figures
expressions of discursive and practical consciousness (in the sense of Kroskrity, 1998).
However, to the extent that newspaper articles tend to be written by journalists and other
written by the general readership arguably exhibiting practical consciousness, these two
editor scored significantly more highly than did newspaper articles on Factor 2,
suggesting a heightened interest in alphabet and thus greater currency of this facet of the
discourse of endangerment among the general readership (and the readers of dailies in
particular, see above). It is also worth noting that some of this variation is due to the
associations demanding more stringent legal protections for the Cyrillic alphabet (for an
articles, on the other hand, scored significantly more highly on Factors 4 and 8 which
suggests that the change of language policy in Montenegro attracted comparatively more
interest among journalists and language experts and which may be another indicator of a
dwindling interest in language-related issues among the public. Lastly, there was no
significant difference between the groups in terms of Factors 10 and 11. Arguably, this is
dominant the (big ‘D’) discourse of contestation is, as well as to what extent it is
internalized.
However, it should again be noted that discursive differences attested here do not
necessarily mean that significant ideological differences exist also, as examples arising
from qualitative analysis below suggest that the same essentialist language ideology can
of publication, or type of article. The results of the final quantitative technique applied
This section presents the results of cluster analysis which was used to further
analyze covariance patterns. First, the discursive links between factors (e.g., Factors 4
tested by examining mean factor scores for each cluster. Second, the composition of each
cluster was examined with respect to three categorical independent variables: publication,
6.5.1 Preferred cluster solution and scoring patterns by factor and cluster.
After a range of cluster solutions was examined, a six-cluster solution was identified as
optimal for this data set. Table 39 shows the descriptive statistics for all twelve factors as
Table 39
Factor Cluster      N      Mean        SD  Std. Error  95% CI Lower  95% CI Upper  Minimum  Maximum
F8           1    775    -.7218   2.35914      .08474        -.8882        -.5555    -2.03    17.95
             2     72   -1.1108   1.81867      .21433       -1.5381        -.6834    -2.03     6.00
             3    134    -.2289   2.00141      .17290        -.5709         .1131    -2.03    11.20
             4    159   -1.6072   1.28089      .10158       -1.8078       -1.4065    -2.03     8.63
             5     78   11.5404   9.81900     1.11178        9.3265       13.7542    -2.03    39.05
             6     39     .6529   4.92769      .78906        -.9444        2.2503    -2.03    16.33
         Total   1257     .0000   4.46109      .12583        -.2469         .2469    -2.03    39.05
F9           1    775    -.6976   2.76180      .09921        -.8923        -.5028    -3.00    26.40
             2     72    -.1311   2.70570      .31887        -.7669         .5047    -3.00     9.13
             3    134    6.8956   9.77446      .84438        5.2254        8.5658    -3.00    65.71
             4    159   -1.5884   1.21759      .09656       -1.7792       -1.3977    -3.00     2.68
             5     78    -.4929   2.47942      .28074       -1.0520         .0661    -3.00     7.75
             6     39   -2.1271   1.47721      .23654       -2.6060       -1.6483    -3.00     2.89
         Total   1257     .0000   4.65937      .13142        -.2578         .2578    -3.00    65.71
F10          1    775    -.9072   3.50589      .12594       -1.1544        -.6599    -4.03    17.62
             2     72   14.9900   8.98762     1.05920       12.8780       17.1020    -4.03    35.57
             3    134   -2.7043   1.63691      .14141       -2.9840       -2.4246    -4.03     5.18
             4    159    -.5382   3.21347      .25485       -1.0415        -.0348    -3.89    10.35
             5     78     .8504   4.05708      .45937        -.0643        1.7652    -4.03    17.53
             6     39     .1380   3.73877      .59868       -1.0740        1.3499    -4.03    11.72
         Total   1257     .0000   5.42277      .15295        -.3001         .3001    -4.03    35.57
F11          1    775    -.1661   2.48777      .08936        -.3415         .0093     -.86    25.26
             2     72     .0630   1.70718      .20119        -.3382         .4641     -.86     9.25
             3    134    1.7206   7.44665      .64329         .4482        2.9930     -.86    39.91
             4    159    -.6243   1.11762      .08863        -.7994        -.4493     -.86    10.02
             5     78     .0578   1.64304      .18604        -.3127         .4282     -.86     6.72
             6     39    -.2975   1.29823      .20788        -.7183         .1234     -.86     6.01
         Total   1257     .0000   3.25725      .09187        -.1802         .1802     -.86    39.91
F12          1    775     .0491   3.88165      .13943        -.2246         .3228    -2.32    35.09
             2     72     .1018   2.56701      .30253        -.5015         .7050    -2.32    10.16
             3    134    -.5944   2.43468      .21032       -1.0104        -.1784    -2.32     9.73
             4    159     .1818   2.69452      .21369        -.2402         .6039    -2.32    13.46
             5     78     .5309   4.56334      .51670        -.4980        1.5598    -2.32    19.77
             6     39    -.9244   1.97424      .31613       -1.5643        -.2844    -2.32     4.80
         Total   1257     .0000   3.56106      .10044        -.1971         .1971    -2.32    35.09
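The Std. Error and confidence-interval columns of Table 39 follow from N, Mean, and SD alone: SE = SD/√N, and the 95% interval is Mean ± t(.975, N − 1)·SE. The sketch below recomputes them with the Python standard library only. Because the standard library has no Student-t quantile function, it approximates one with the series expansion given as formula 26.7.5 in Abramowitz and Stegun’s handbook, which is accurate to about four decimals for the degrees of freedom involved here (38 and above). The function names are illustrative, and last-digit discrepancies against the table are expected because the tabled means are themselves rounded.

```python
import math
from statistics import NormalDist


def t_quantile(p, df):
    """Approximate Student-t quantile: normal quantile plus an expansion in 1/df."""
    x = NormalDist().inv_cdf(p)
    g1 = (x**3 + x) / 4
    g2 = (5 * x**5 + 16 * x**3 + 3 * x) / 96
    g3 = (3 * x**7 + 19 * x**5 + 17 * x**3 - 15 * x) / 384
    g4 = (79 * x**9 + 776 * x**7 + 1482 * x**5 - 1920 * x**3 - 945 * x) / 92160
    return x + g1 / df + g2 / df**2 + g3 / df**3 + g4 / df**4


def mean_ci(mean, sd, n, level=0.95):
    """Standard error and t-based confidence interval for a mean; returns (se, lower, upper)."""
    se = sd / math.sqrt(n)
    half_width = t_quantile(0.5 + level / 2, n - 1) * se
    return se, mean - half_width, mean + half_width


# Factor 8, Cluster 6 row of Table 39: N = 39, Mean = .6529, SD = 4.92769.
se, lower, upper = mean_ci(0.6529, 4.92769, 39)
print(f"SE = {se:.5f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
```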
Table 40
Table 41
Discursive Links Between Twelve Factors Based on Highest Mean Scores for Six Clusters
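Table 41 groups the factors by the cluster for which each has its highest mean score. The rule can be sketched as follows, using only the F8 to F12 rows of Table 39 that survive in this chapter; the dictionary and function names are illustrative.

```python
# Mean factor scores for Clusters 1-6, copied from the F8-F12 rows of Table 39.
CLUSTER_MEANS = {
    "F8": [-.7218, -1.1108, -.2289, -1.6072, 11.5404, .6529],
    "F9": [-.6976, -.1311, 6.8956, -1.5884, -.4929, -2.1271],
    "F10": [-.9072, 14.9900, -2.7043, -.5382, .8504, .1380],
    "F11": [-.1661, .0630, 1.7206, -.6243, .0578, -.2975],
    "F12": [.0491, .1018, -.5944, .1818, .5309, -.9244],
}


def group_by_peak_cluster(cluster_means):
    """Map each cluster number (1-based) to the factors whose highest mean falls there."""
    groups = {}
    for factor, means in cluster_means.items():
        peak = max(range(len(means)), key=lambda i: means[i]) + 1
        groups.setdefault(peak, []).append(factor)
    return groups


print(group_by_peak_cluster(CLUSTER_MEANS))
```

For these five factors the result agrees with the groupings described in the text: F8 and F12 peak in Cluster 5, F9 and F11 in Cluster 3, and F10 in Cluster 2.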
As can be seen from Table 39, then, the mean factor scores are different for each
of the six clusters. Also, there is a significant difference between the mean factor scores
for all factors except Factor 12 (Table 40). The mean plots further show that none of the
factors has a high mean score for Cluster 1, while the highest Factor 12 mean score is for
Cluster 5 but without a statistically significant difference. Table 41 shows how the
Slavic ethnolinguistic identities and thus these two factors are grouped in Cluster 2.
education and thus are grouped in Cluster 3. Here, three generally ethnolinguistically
in the context of language education and the (small ‘d’) discourses represented by all four
factors share language education-related lexis. Factor 7 (Literature and publishing) with
the highest mean score for Cluster 4 is not grouped with any other factors. This was
expected as texts loading on this factor typically dealt with subject matter that was
(Officialization of Montenegrin 1 and 2), grouped in Cluster 5, make a logical set since
both deal with the same issue, albeit with slightly different foci, as explained above.
Finally, Factors 2 and 5 (Cyrillic-only, Minority language rights) are grouped in Cluster 6
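Table 40 is not reproduced here, but if it reports a standard one-way ANOVA of each factor’s scores across the six clusters (an assumption), its F statistic can be recovered from the N, Mean, and SD columns of Table 39 alone. The sketch below does this for Factor 8 and Factor 12; the function and variable names are illustrative. For Factor 12 it gives F(5, 1251) ≈ 1.75, below the conventional .05 critical value of roughly 2.22 for those degrees of freedom, which is consistent with the non-significant result reported for Factor 12; Factor 8 gives a value well over 100. If the clusters were formed by an algorithm that maximizes between-cluster differences on these same scores, such F values are best read descriptively rather than as formal tests.

```python
# Per-cluster (N, Mean, SD) for Clusters 1-6, copied from Table 39.
F8_SUMMARY = [(775, -.7218, 2.35914), (72, -1.1108, 1.81867), (134, -.2289, 2.00141),
              (159, -1.6072, 1.28089), (78, 11.5404, 9.81900), (39, .6529, 4.92769)]
F12_SUMMARY = [(775, .0491, 3.88165), (72, .1018, 2.56701), (134, -.5944, 2.43468),
               (159, .1818, 2.69452), (78, .5309, 4.56334), (39, -.9244, 1.97424)]


def one_way_anova_f(groups):
    """One-way ANOVA F from (n, mean, sd) summaries; returns (F, df_between, df_within)."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)  # pooled within-cluster SS
    df_between, df_within = len(groups) - 1, n_total - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


for name, summary in [("F8", F8_SUMMARY), ("F12", F12_SUMMARY)]:
    f, dfb, dfw = one_way_anova_f(summary)
    print(f"{name}: F({dfb}, {dfw}) = {f:.2f}")
```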
The results of cluster analysis shown above thus suggest two conclusions. First,
lexical covariance extends beyond the individual factors and suggests the existence of
discursive links between factors, i.e. small ‘d’ discourses shared by two or more factors.
The most obvious example of this is the discourse on the officialization of Montenegrin
(Cluster 5), but other (and perhaps qualitatively somewhat different) discursive links
between factors can be inferred from the other clusters as well (e.g., language education
discourse in Factors 1, 3, 9, and 11). Second, despite its obvious usefulness, lexical
covariance alone is not sufficient to identify (big ‘D’) discourses and particularly
ideologies. This can be seen in the way Factors 4, 6, 8, 10, and 11, all of which bear
traces of an overarching (big ‘D’) discourse of contestation (see Section 6.6), are grouped
in separate clusters. Although each of these factors points to an aspect of the discourse of
contestation suggested by previous analyses and thus confirms its existence and extent,
quantitative correlational analysis seems unable to capture the underlying link and thus
the (big ‘D’) discursive (and ultimately ideological) construct itself. This, as has been
6.5.2 Synchronic and diachronic clustering patterns. Cluster analysis can also
texts cluster in any particular patterns with respect to independent variables such as, in
our case, publication, year of publication, and type of article (general newspaper articles
Table 42
Table 43
Table 44
which none of the twelve factors had high mean scores; this cluster thus probably
represents the majority of the variance in the data (65.87%) unaccounted for by this
factor solution. Total cluster membership further shows that the largest number of the
remaining texts (159) are grouped in Cluster 4, which is not surprising considering the
general nature of Factor 7 (Literature and publishing) which had the highest mean score
for this cluster. The second largest cluster was Cluster 3 (134 texts), for which four of the
twelve factors (1, 3, 9, 11) had their highest mean scores. This was followed by
Cluster 5 (78 texts; Factors 4, 8, and 12), Cluster 2 (72 texts; Factors 6 and 10),
and Cluster 6 (39 texts; Factors 2 and 5).
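This chapter does not state which clustering algorithm produced the six-cluster solution. Purely as an illustration, and assuming k-means (Lloyd’s algorithm) applied to each text’s vector of factor scores, cluster membership counts such as those just reported would be obtained roughly as follows. The function name, the deterministic choice of starting centroids, and the toy data are all hypothetical; in practice k-means is usually run from several starting points and for several numbers of clusters before a solution is chosen.

```python
import math
from collections import Counter


def kmeans(points, init_idx, max_iter=100):
    """Lloyd's k-means.

    points: equal-length numeric sequences (e.g. one row of factor scores per text);
    init_idx: indices of the points used as starting centroids.
    Returns (labels, centroids), with labels as 0-based cluster indices.
    """
    centroids = [list(points[i]) for i in init_idx]
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Invented two-dimensional scores; real input would have one column per factor.
toy = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)]
labels, _ = kmeans(toy, init_idx=[0, 2, 4])
print(Counter(labels))  # cluster sizes, analogous to the membership totals above
```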
cluster membership, although it should be noted that Cluster 6 comprises no texts from
the weeklies (NIN, Vreme). This suggests either that weeklies were less interested in the
Cyrillic alphabet and minority language rights (Factors 2 and 5) as compared with the
dailies (Blic, Politika), or that the readers of weeklies themselves had less interest in these
issues and hence did not write any letters-to-the-editor pertaining to them; it is also
possible that fringe groups’ activists were less interested in the weeklies as a vehicle for
their message on the endangerment of the Cyrillic alphabet, so they did not contribute any
letters-to-the-editor either. Further, the largest numbers of Blic and Vreme articles
grouped outside of Cluster 1 (35 and 7, respectively) were concerned with language
education (Cluster 3, Factors 1, 3, 9, and 11), whereas the largest numbers of NIN and
Politika articles (22 and 123, respectively) were concerned with literature and publishing
(Cluster 4, Factor 7). Interestingly, Politika articles contributed the largest proportion of
endangerment and contestation (63/72 for Cluster 2/Factors 6 and 10, 60/78 for Cluster
5/Factors 4 and 8, and 35/39 for Cluster 6/Factors 2 and 5), but this finding is hardly
surprising considering that most 5+ hits articles (847/1,257) come from Politika.
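The proportions just cited can be set against Politika’s overall share of the 5+ hits articles (847/1,257) to show how far each exceeds what the composition of the corpus alone would lead one to expect. The arithmetic is simple but is sketched here for transparency; the names are illustrative and all counts come from the text above.

```python
# Politika texts / all texts per cluster, and Politika's share of all 5+ hits articles.
POLITIKA_COUNTS = {"Cluster 2": (63, 72), "Cluster 5": (60, 78), "Cluster 6": (35, 39)}
BASELINE = (847, 1257)


def share_vs_baseline(counts, baseline):
    """Return {cluster: (share, share / baseline share)}."""
    base = baseline[0] / baseline[1]
    return {name: (k / n, (k / n) / base) for name, (k, n) in counts.items()}


for name, (share, ratio) in share_vs_baseline(POLITIKA_COUNTS, BASELINE).items():
    print(f"{name}: {share:.1%} from Politika ({ratio:.2f} times its overall share)")
```

All three shares (87.5%, 76.9%, and 89.7%) exceed Politika’s overall share of about 67.4%.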
Table 43 shows that there is no clear relationship between ‘year of publication’
manifested in texts grouped in Clusters 3 and 5 (pertaining to language education and the
officialization of Bosnian and Montenegrin) seem to have been more pertinent in 2003
and 2004, while discourses manifested in texts grouped in Clusters 2 and 6 (pertaining to
contestation, Cyrillic alphabet, and minority language rights) seem to have been more
pertinent in 2005 and 2006. Arguably, these patterns simply reflect peaks in (public)
interest in these issues and the concomitant fluctuation in discursive activity and numbers
of articles (rather than any qualitative differences in terms of the big ‘D’ discourses).
Finally, Table 44 shows that there is no clear relationship between ‘type of article’
Cluster 6 (pertaining to the Cyrillic alphabet and minority language rights) again seem to
of variance in Section 6.4.5 also. These patterns suggest that writers of the letters-to-the-
editor thematizing language (whether lay people, activists, or experts) seem to have had
surprisingly little interest in the officialization of Montenegrin, but this is perhaps another
ethnolinguistic identity toward the end of this period (see Figure 2), possibly due to a
fatigue with nationalism and the inevitability of Montenegrin independence; that writers
controversial (and more technical) issues of literature and publishing is less surprising.
The status of the Cyrillic alphabet and minority language rights, on the other hand, are
issues that seem to have been closer to home for many writers of letters-to-the-editor, and
particularly fringe groups’ activists, so the comparatively high degree of interest in issues
represented by Cluster 6 exhibited in this type of text is expected here. Arguably, the
declining language standards (arising from standard language ideology) which is widely
attested in the public (and, again, particularly lay) language-related discourses in many
other societies (see, e.g., Johnson & Ensslin, 2007, especially Part II), and Serbian society
Based on a lack of any clear-cut patterns here, we can conclude that, despite some
overlap between the discursive profiles of the different publications, and between the
‘d’ discourses during this period. This is coupled with a high degree of both synchronic
contestation, see Section 6.6). With the presentation of the results of quantitative
As noted above, qualitative analysis was based on and informed by the results of
quantitative analysis; the results of qualitative analysis were in turn checked against the
results of quantitative analysis. Initial qualitative analysis consisted of basic content and
thematic analysis informed by the results of quantitative analyses above; this was
followed by an analysis of topoi in the CDA/DHA tradition. The findings are presented
by factor (i.e., small ‘d’ language-related discourse); again, factors grouped in the same
clusters are treated together (with the exception of Factors 2 and 11 as their ‘partner’
factors, Factor 5 and Factors 1, 3, and 9, respectively, were shown to be marginal in terms
of Central South Slavic ethnolinguistic identities and ethnonationalism). Note that the
texts cited below can be identified as representative by looking up their file codes in the
tables showing top scoring texts for individual factors in Section 6.3 (for a key to the
tables, see Section 4). Presentation is organized as follows: excerpts in the original
Serbian are presented first, followed by English translations; file codes are given only with the
original text (and with translations when these are repeated without the original text).
points to texts discussing the issues of alphabet choice and status in Serbia. These are
most often discussed in the context of changes to the constitution that were under
Usvajanje novog ustava Srbije, o kome se već dugo priča, podstaklo je i raspravu
o službenom pismu naše države. (BLI-2-9-2006-272)
The adoption of the new constitution of Serbia, which has been discussed for a
long time now, initiated a discussion about the official alphabet in our country.
Ima mišljenja, veli on, da sa svojim pismom “ne možemo u Evropu i svet” i da
stoga moramo preći na latinicu. (POL-16-3-2003-93)
Some people think, he says, that “we cannot join Europe and the world” with our
alphabet so we have to switch to the Latin [alphabet].
Političke potrebe danas zahtevaju unifikaciju latiničkog pisma svugde u svetu.
(POL-22-8-2006-59)
Political needs today demand unification on the Latin alphabet everywhere in the
world.
Pobornici stava da srpski jezik treba da ima dva službena pisma, latinicu i
ćirilicu, smatraju se naprednijim, ističući da je upotreba latinice ono što će nas
približiti Zapadu. (BLI-2-9-2006-272)
The proponents of a two-alphabet (Latin and Cyrillic) solution for the Serbian
language consider themselves more progressive, insisting that the use of the Latin
alphabet is what will bring us closer to the West.
Nema naroda u svetu koji danas drži do sebe i do svojih kulturnih i nacionalnih
korena a da toliko zapostavlja svoje pismo koliko to čini srpski narod. (POL-16-
3-2003-93)
Today, no nation in the world which cares about its identity and its cultural and
national roots neglects its alphabet as much as the Serbian nation does.
explained that Avramov [the mayor of Šid, an urban area in the province of
Vojvodina] came up with the idea to outlaw the Latin alphabet because he was
afraid for the Cyrillic.
[d]a li je autorima ovog predloga poznat još neki narod na svetu koji za svoj jezik
koristi tuđe pismo (POL-11-2-2005-108)
do the authors of this proposal know of any other nation in the world which uses
somebody else’s alphabet in its own language
opšte pravilo: jedan jezik – jedno pismo, jer na dva pisma u svom jeziku ni jedan
drugi narod u svetu ne piše (POL-21-9-2004-72)
general rule: one language – one alphabet, because no nation in the world uses
two alphabets to write its language.
Ćirilično pismo je Srbima deo identiteta, to je njihova važna odrednica bez koje
oni ne bi više bili ono što su bili i što jesu. […] nije reč o ličnoj upotrebi pisma,
već je reč o kolektivnom i osnovnom ljudskom pravu Srba na svoj jezik i svoje
pismo. (POL-16-3-2003-93)
For Serbs the Cyrillic alphabet is part of their identity, it is their important
determiner without which they would not have been who they have been and
without which they would not be who they are […] this is not about personal use
of alphabet, but rather about a collective and basic human right of Serbs to their
language and their alphabet.
Svuda u svetu pismo i jezik većinskog naroda mora biti na prvom mestu, i mi
tražimo da tako bude i kod nas. (POL-21-9-2004-72)
Everywhere in the world the language and the alphabet of the majority must come
first and we ask that this be so here also.
In addition, the Latin alphabet is often, implicitly or explicitly, identified with the Croats
srpski jezik nikada nije pisan latinicom, sve do trenutka kada je zloupotrebljena
dobra volja da se južnoslovenskim saplemenicima Hrvatima pomogne da i oni
konačno dobiju svoje pismo (POL-11-2-2005-108)
The Serbian language had never been written in the Latin alphabet until the
moment when the goodwill to help the South Slavic co-tribesmen, the Croats,
finally get their own alphabet was abused
as well as non-South Slavic minorities which are routinely discussed in terms of their
Avramov je mislio da nacionalne manjine koje ne prelaze 15 odsto ukupnog broja
stanovništva nemaju pravo na službenu upotrebu maternjeg jezika. (POL-16-4-
2005-91)
Avramov [the mayor of Šid, an urban area in the province of Vojvodina] thought
that national minorities which do not cross the threshold of 15 percent of the total
population do not have a right to official use of their mother tongue [and
alphabet].
Recently, “Politika” published several texts which claim that the Latin alphabet
which Serbs use in their language is not a Serbian but rather “a Croatian
alphabet”. This is, undoubtedly, an attempt to again start a campaign against the
Latin alphabet and against the “destruction of the Serbian Cyrillic”, which has
been led by members of several associations for the protection of the Cyrillic and
their supporters. Although I am a Serb and although I use only Cyrillic to write in
Serbian, I think that their attitude and work are wrong, useless, and even
damaging to the Serbian language and Serbia. […] The members of the
“Cyrillic” associations and their supporters uselessly throw slogans around such
as that “Serbian Latin alphabet does not exist”, “Latin is not a Serbian but
Croatian alphabet” and so on, and ask if there is another nation in the world which
uses somebody else’s alphabet in its language. Of course, there isn’t because
every nation considers its own the alphabet it uses regardless of where, when and
how it came to be. It is well known, for example, that the characters of the
Japanese alphabet originate from China, but the Japanese and others say they use
the Japanese alphabet to write. Besides, the French, English and Dutch alphabets
are identical and yet no one accuses anyone of using somebody else’s alphabet.
Those who insist that the Latin alphabet used by Serbs is “somebody else’s
alphabet” do not know or do not want to know that exclusive ownership over any
alphabet does not exist in world linguistics. Alphabets belong to all languages
that use them, either in whole or in part. Therefore, the Latin alphabet learned
and used for as long as 90 years by Serbians and even longer by other Serbs, and
by 80 percent of Serbs all the time or part of the time, cannot be somebody else’s
but only a Serbian alphabet. […] In order to protect a language and alphabet
many different segments of society must be activated and made responsible.
Using labels such as “somebody else’s” and “Croatian” to satanize the Latin
alphabet will not protect the Serbian Cyrillic.
language policy in Montenegro whereby the name of the official language was first
changed from Serbian to mother tongue (prior to independence) and then from mother
against the change in policy by students and professors of Serbian in Montenegro, some
of whom ultimately lost their jobs because of their refusal to implement the new policy,
Profesori srpskog jezika i književnosti nikšićke gimnazije, koji već šesti dan
bojkotuju izvođenje nastave zbog preimenovanja nastavnog predmeta u
“maternji”, dobili su podršku kolega iz škole, koji su u pismu upućenom
Ministarstvu prosvete Crne Gore zapretili opštim bojkotom nastave, ukoliko se
njihovim kolegama uruče najavljeni otkazi. (POL-8-9-2004-157)
The professors of the Serbian language and literature at the Nikšić [an urban area
in Montenegro] high school, who have been boycotting instruction for six days now
in protest against the renaming of their subject to ‘mother tongue’, received
support from their colleagues at the school, who, in a letter sent to the Ministry
of Education of Montenegro, threatened a general boycott of instruction if their
colleagues were dismissed as announced.
The protesters and various other actors in Serbia itself such as journalists, linguists, and
politicians object to the policy on historical, cultural, practical, and (pseudo-) scientific
grounds,
The education workers of Kotor [an urban area in Montenegro] also protested,
emphasizing that Serbian has been spoken in their area for centuries.
taj čin doprineo ostvarivanju težnji vlasti i dela ljudi u Crnoj Gori da se odavde
protera sve što bi moglo da asocira na srpstvo (POL-15-4-2004-89)
that act contributed to the realization of the plan of the authorities and a part of the
people in Montenegro to banish from here all associations to Serbhood
The hunger strikers called on all students and education workers of Nikšić [an
urban area in Montenegro] to join them and raise their voices in “the defense of
what is sacred”.
nije im problem žrtvovati struku i nauku, istoriju i tradiciju, uneti zabunu i haos u
nastavni i školski sistem, a time i u društvo u celini. (POL-7-7-2004-109)
they have no problem sacrificing profession and science, history and tradition, or
introducing confusion and chaos into the educational system and therefore the
society as a whole.
The council holds that the label mother tongue cannot be introduced into curricula
for elementary schools and high schools because it is imprecise, linguistically
problematic and baseless and because it would cause numerous scientific and
professional as well as practical problems…
Naš Odsek broji oko 300 studenata i svi smo jedinstveni da ne dozvolimo
mešanje najprizemnije politike u fundamentalne naučne i lingvističke principe.
(POL-8-9-2004-157)
Our department has around 300 students and we are all of one mind not to allow
the meddling of basest politics in fundamental scientific and linguistic principles.
In addition, objections are also raised by relying on (again, selective and flawed)
Ako Amerikancima ne smeta engleski jezik, ne vidim razloga zbog čega bi nekom
u Crnoj Gori smetalo ime jezika srpski, ili srpskohrvatski. (POL-7-7-2004-109)
If Americans are OK with calling their language English, I can’t see any reason
why someone in Montenegro would have a problem with the name Serbian, or
Serbo-Croatian.
In addition to texts about the protests, Factor 8 points also to texts which discuss the
Svi istorijski izvori, književnost i celokupna kulturna baština Crne Gore ćiriličke
su provinijencije i svedoče da je jezik Crnogoraca bio i jeste srpski jezik. (POL-
31-3-2004-2)
All historical sources, literature and the entire cultural heritage of Montenegro are
of Cyrillic provenance and testify that the language of Montenegrins has always been
and is the Serbian language.
In King Nicholas I’s Law,35 which has 83 articles, 13 elementary school subjects
are mentioned. First is Christian doctrine, second is Serbian history, and third is
the subject of Serbian language.
Prvi put je Austrougarska, ukinula srpski i ćirilicu, drugi put su Italijani 1941.
godine naložili da se uvede maternji jezik, a sada to čini crnogorska vlast.
(POL-31-3-2004-2)
The first time, Serbian and Cyrillic were outlawed by Austria-Hungary; the
second time, in 1941, the Italians ordered a change to mother tongue; and now this
is being done by the Montenegrin authorities.
Ono što Đukanović hoće do sada niko nije ostvario, ni turski osvajači. (BLI-20-9-
2004-204)
What Đukanović wants, no one has achieved so far, not even the Turkish
conquerors.
The theme of (symbolic) historical violence is also carried through to the present,
this means a danger of changing history, on the one hand, and of killing the spirit,
the being and the creativity of the people, on the other
protest against the violations of the Constitution and violence against language
Obviously, they are doing this through a forced change of the identity of
Montenegro.
provided scientific argumentation for this, that there are no scientific, linguistic,
historical or sociocultural reasons to rename Serbian into Montenegrin. Besides
Serbs and Montenegrins, Serbian is spoken in Montenegro also by Muslims and
Bosniaks.
language are often revealed to be about conflicts, societal, political, religious, cultural,
“a process of assimilation of Serbs has begun and the authorities will, through the
raising of the questions of language and the status of the church, through
discrimination against the Serbian people, want to transform that people into what
they want it to be – to become people who declare their nationality to be
Montenegrin, who speak the Montenegrin language and belong to the non-
existing Montenegrin church”
6.6.3 Excerpts from texts representative of Factor 11. Similar to Factors 4 and
a result of the Council of Europe and European Union requirements pertaining to
minority rights. Factor 11 thus points to texts discussing the then pending recognition of
areas with a Bosniak majority, as well as the resistance to this on the part of various
political and academic actors in Serbia. Similar to the officialization of Montenegrin, the
of contestation,
The educational board of the Serbian parliament has concluded that the minister
of education Slobodan Vuksanović [then Serbian Minister of Education] had
exceeded his legal authority by approving instruction in the subject Bosnian
language with elements of national culture.
The minister and his assistant said that the Bosnian language did not exist, that
regulations did not foresee official use of that language, that curricula and
textbooks for that subject had not been approved, that that subject, according to
law and the Rulebook on curricula, could be neither a compulsory nor an elective
nor an optional subject in this school year.
which is mostly about the name of the language rather than the right to a minority status
itself,
Njihov zahtev bi mogao da se okarakteriše i kao manji od onoga što već uživaju
Albanci, Hrvati, Mađari... Samo kada bi bosanski jezik postojao. Greška ili
namera – Ne znam zašto su tražili da se uči bosanski, a ne bošnjački. Možda je
greška. (POL-12-11-2004-113)
Their request could also be characterized as less than what is already enjoyed by
Albanians, Croats, Hungarians… If only the Bosnian language existed. Error or
intent – I don’t know why they requested that Bosnian and not Bosniak be taught.
Perhaps it’s an error.
Zvanično ime jezika može da bude samo bošnjački jezik, odnosno da proizilazi iz
priznatog etnonima Bošnjaci, a ne “bosanski”. BiH je zemlja u kojoj žive tri
ravnopravna naroda i Bošnjaci ne treba da uzurpiraju pravo na bosansko ime.
Samo nekoliko primera, koji pokazuju da se u svetu koriste prvobitni nazivi za
jezike koji su u upotrebi dva ili više naroda. Tako, austrijski narod, koji ima
državu hiljadu godina, govori nemačkim jezikom, a ne austrijanskim ili
austrijskim. Švajcarci nemačkog porekla govore nemačkim, a ne švajcarskim
jezikom. Američki narod svoj jezik naziva engleskim, a ne angloameričkim ili
američkim. (POL-10-1-2005-134)
The official name of the language can only be Bosniak, deriving from the
recognized ethnonym Bosniaks, and not ‘Bosnian’. Bosnia-Herzegovina is a
country with three equal nations and Bosniaks should not usurp the right to the
Bosnian name. Just a couple of examples which show that for languages used by
two or more nations the original name is used around the world. Thus, the
Austrian nation, which has had a state for a thousand years, speaks German and
not Austrian. The Swiss of Germanic origin speak German, not Swiss. American
people call their language English, not Anglo-American or American.
After that an expert opinion was read in the meeting which had been provided to
the Educational board by the SANU Board for the standardization of the Serbian
language from which it can be concluded that in Serbian Bosnian means Bosniak,
and in Bosnian – Bosnian.
Ako jezikoslovci kažu da bosanski jezik postoji (kao da je nauka o jeziku od juče
pa se ne zna koji jezici postoje na Balkanu i u svetu), onda pravnici treba da ga
pretoče u paragrafe. (POL-10-1-2005-134)
If linguists say that the Bosnian language exists (as if linguistics was a recent
development so we didn’t know which languages existed in the Balkans and
around the world), then lawyers need to turn it into paragraphs.
They recalled that the minister had said at one time that the Bosnian language
did not exist, and that he had to wait for an expert opinion on what language was
spoken by Bosniaks.
can nevertheless inform themselves about the clarity of categories and the
meritum of things via the name of the language, said professor Klajn. According
to him, it would be illogical to introduce ‘Bosnian’ as the official language for the
citizens of Bosnia-Herzegovina, Bosnians and Herzegovinans, members of three
[ethnic] national communities who speak three languages – Serbian, Croatian and
Bosnian. That would mean an introduction of Bosnian as the state language,
while Serbian and Croatian, according to this logic, would have the status of
minority languages.
However, the Bosniak minority is also sometimes given a voice, albeit very rarely, as in
For our culture and for the Bosniaks of Sanjak [area in Southwest Serbia with a
Bosniak majority] this is an exceptional historical event – said author Alija
Džogović [Bosniak textbook author in Sanjak/Serbia] on the occasion, reminding
that “the Bosnian language was outlawed almost a hundred years ago.”
Bosanski, a ne bošnjački zbog toga što su se, objasnio je, građani izjasnili da je
njihov jezik bosanski, što je to tradicija i zato što se u Sarajevu, gde je bio sa
predsednikom Srbije Borisom Tadićem, uverio da bosanski jezik postoji na
lingvističkoj karti. (POL-10-12-2004-127)
Bosnian and not Bosniak because, as he explained, the citizens opted for Bosnian
as their language, because that’s the traditional name and because he had an
opportunity during a trip to Sarajevo with Serbian President Boris Tadić to see for
himself that the Bosnian language existed on the linguistic map.
6.6.4 Excerpts from texts representative of Factors 6 and 10. Finally in this
section, Factors 6 and 10 (Contestation over language ownership and name, Linguistics
particular emphasis on the role of the Serbian Academy of Science and Arts (SANU).
Factor 6 points to texts discussing contestation over language ownership and its name as
well as linguacultural and ethnic authenticity, primarily involving Serbs and Serbian on
the one hand, and Croats and Croatian on the other, as in the following excerpts,
The Serbian Linguistic Culture Society, Serbian Learned Society and the Serbian
Royal Academy represent three acts in the creation of the most authoritative
Serbian scientific institution [SANU]. All three institutions held that Serbs are a
South Slavic people who speaks its own, Serbian language, which is close to other
Slavic languages, but also different from them, as well as that Serbs had three
faiths: [Eastern] Orthodox, Roman-Catholic and Mohammedan.
Everything Serbian and Croatian was mixed together. This will turn out to be a
big mistake because everything that was common, on a new and unnatural basis,
would soon begin to divide. That division, projected from the Croatian side,
naturally was at Serbian expense. It meant that Serbian culture could keep only
what had been created by Orthodox speakers of the Serbo-Croatian language.
The delays in the publishing of the different volumes of the Dictionary are the
least problem in terms of damage. It is much more of a problem that the downfall
of the Serbian linguistic science that began immediately after the death of Vuk
Karadžić (1864) continues today. By renaming the language from Serbian (which
is what it was called during the time of its last reformer) to Serbo-Croatian,
Serbian linguists entered a period during which Vuk’s path in the naming and
development of the language of the Serbian people was abandoned. That that
period still continues is confirmed by the fact that Serbian linguists continue to
call their language “Serbo-Croatian” in the Dictionary even after the demise of the
“Serbo-Croatian language”.
The name of the language is the most prominent point of contention, while the
[pseudo-] scientific discourse on language and linguistics) are merged with other similar
and ethnolinguistic identity (and thus, implicitly, political legitimacy) which rests on
When, at the 1861 Croatian Assembly, the issue of the name of the official
language was brought up, it was suggested that it be: “Croato-Slavonic-Serbian”,
“Croato-Slavonic”, “Croato-Serbian”, “Croatian or Serbian”, “Croatian”,
“Serbian” or “people’s language in the three-nation Kingdom”. […] The
Assembly adopted a law according to which the official language was called
“Yugoslav”. Serbs were not satisfied with such a solution. […] In the
Yugoslavhood that was offered them, they unmistakably detected a form of
Greater-Croatianhood, the aim of which was to erase the Serbian name, Serbian
national identity, and even the Serbian national being itself. […] For the reasons
mentioned, the Serbian name was thus excluded via a proposal that was attractive
and seemingly satisfying for both Serbs and Croats. The Greater-Croatian
tendency was thus concealed by a Yugoslav name. That name was supposed to
trick the Serbs, they were supposed to be slowly but steadily erased from the
everyday life of Croatia, to deprive them of political individuality and make them
an integral part of a Croatian “political” people.
Po toj tezi, Vuk Karadžić je za osnov standardnog srpskog književnog jezika uzeo
jezik kojim su govorili pravoslavni istočni Hercegovci. A oni su, po toj hrvatskoj
nacionalističkoj teoriji, u stvari Hrvati prevedeni u pravoslavlje, tako da su Srbi
“ukrali” Hrvatima jezik koji danas zovu srpski, pa zato sada ima toliko problema
sa tim jezicima. (POL-10-7-2006-142)
According to this thesis, Vuk Karadžić took the language spoken by [Eastern]
Orthodox East-Herzegovinans as the basis of the standard Serbian literary
language. And they, according to this nationalist theory, were in fact Croats
converted to [Eastern] Orthodox Christianity, so Serbs “stole” the language they
call Serbian today from Croats, which is why there are so many problems with
these languages now.
There were many tricks in their labeling, naming and representing the language.
They said it was one, unified, the same and common language. But that doesn’t
say whose language that is. Yes, it is one, but Serbian, it is unified, but not also
Croatian, it is common, but only in terms of use, not in terms of affiliation and
origin. But all this (it being the same, common and one) cannot be reason enough
to dual-label or multiple-label the language, or to entirely rename the Serbian
language into Croatian. A language can only be named after the people it belongs
to, but not after the names of the peoples who also use it. English is also one,
common and the same for all people who speak it, but it is well known whose
language it is and what its name is, regardless of who speaks it and where. It is
always only English even when it is spoken in the United States, Canada,
Australia, New Zealand or anywhere else in the world. Such are also German,
Spanish, and Portuguese languages. A language does not belong to him who
speaks it, but to him who created it. Serbian people have created their language
for centuries. The Croats did not create that language. They got it and took it
over ready-made, with all the characteristics that the Serbian language already
had.
Science and politics
Thus politics began to meddle in science: science undeniably
determined that Vuk’s Štokavian language is Serbian, politics demanded that
it also be Croatian. But when Croatian linguists tyrannically throw the Serbian
language out of the name and call that Serbian language Croatian, and Croatian
alone, Serbian linguists, under the pressure of outdated political ideas and
philosophies, continue to stubbornly call their language both Serbian and Croatian
(Serbo-Croatian).
there are no dual-label language names anymore anywhere in Europe or the world
(even English is not called, nor has it ever been called, “American-English”).
Serbian linguists also do not understand what the purpose of the label “Serbo-
Croatian/Croato-Serbian”, or “Croatian or Serbian” language was (only to
separate the “Croatian literary language” from the Serbian language, to separate
“Bosniak” or “Bosnian language”, and now to plan the separation of the “Montenegrin
language”).
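Našim lingvistima ostaje da po volji biraju BHMS ili srpski. Vreme je da se naši
lingvisti potpuno okrenu nauci i da već jednom perstanu da strahuju od politike.
Bio bi to čin ne samo prihvatanja naučnih vrednosti nego i moralni čin pokajanja
i izvinjenja srpskoj nauci i srpskom narodu. Neka mirno i slobodno nazovu
veliki rečnik SANU srpskim rečnikom. (POL-22-7-2006-55)
Our linguists are left to choose, as they please, between BHMS and Serbian. It is
time for our linguists to turn fully to science and to stop fearing politics once
and for all. This would be not only an act of accepting scientific values but also
a moral act of repentance and apology to Serbian science and the Serbian people.
Let them calmly and freely call the large SANU dictionary a Serbian dictionary.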
Factor 10, on the other hand, suggests a more technical discourse pertaining to
lexicography and language standardization. Texts scoring highly on this factor typically
discuss language standardization issues, linguistic studies or book editions, most of which
[Milan Šipka] Već sam jednom prilikom rekao da se nakon disolucije zajedničkog
srpskohrvatskog standardnog jezika srpski lingvisti, i Srbi kao narod, nisu
jasno odredili prema novonastaloj situaciji. Zbog toga znatno zaostajemo u
lingvističkim aktivnostima, što je posledica i odsustva šire društvene podrške
negovanju srpskog standardnog jezika i jezičke kulture. (POL-5-1-2008-166)
[Milan Šipka, well-known Bosnian Serb linguist] I’ve already said on one
occasion that Serbian linguists, as well as Serbs as a people, never adopted a clear
position toward the situation that came about after the dissolution of the common
Serbo-Croatian standard language. This is why we lag behind in terms of
linguistic activity, which is also a consequence of a lack of broader societal
support for the development of the Serbian standard language and the linguistic
culture.
The unabridged SANU dictionary of Serbian (in preparation since the 1960s) is the most
we never paid much attention to such dictionaries because our main project for
decades was the unabridged SANU dictionary of Serbian which, large as it was,
was supposed to meet all lexicographic needs
Lastly, the (pseudo-) scientific arguments and a discourse of contestation are present but
6.6.5 Topoi. In addition to the basic content and thematic analysis above,
representative texts were examined also for evidence of argumentation strategies in the
DHA tradition (Wodak, 2001; topoi, explicit or inferable obligatory premises which make
it possible to connect arguments with the conclusion, or simply “the common-sense
reasoning typical for specific issues,” van Dijk, 2000 cited in Baker et al., 2008, p. 299).
evidence that the language spoken by Central South Slavs is Serbian in origin
The pseudo-scientific arguments which form the basis of these two topoi were
already noted above. Nauka ‘science’ and naučni ‘scientific’, for example, were
identified by both keyword (Tables 12, 14 and E1) and collocation (Table F1) analysis as
items of potential discursive and ideological interest; they also both loaded highly on
Factor 10 (Table 23) which was shown by cluster analysis to be discursively linked with
Factor 6 (Table 41), perhaps the single most representative factor of the discourse of
contestation. In the excerpts from representative texts above, we saw the following
Competent experts, linguists with scientific authority, have claimed since the time
of Vuk Karadžić, the creator of the literary Serbian language, that the language of
Montenegrins and Serbs is the same (POL-31-3-2004-2)
Serbian linguistics clearly says that only the syntagma Bosniak language can be
used in Serbian for the standard language used by Bosniaks (POL-16-2-2005-82)
linguists have provided scientific argumentation for this, that there are no
scientific, linguistic, historical or sociocultural reasons to rename Serbian into
Montenegrin (POL-31-3-2004-2)
less obvious in the results of the quantitative analyses (partly, perhaps, on account of the
omnipresence of references to English, many of which do not pertain to this topos), but
quite prominent in the representative texts. The examples we saw above include,
If Americans are OK with calling their language English, I can’t see any reason
why someone in Montenegro would have a problem with the name Serbian, or
Serbo-Croatian. (POL-7-7-2004-109)
Just a couple of examples which show that for languages used by two or more
nations the original name is used around the world. Thus, the Austrian nation,
which has had a state for a thousand years, speaks German and not Austrian. The
Swiss of Germanic origin speak German, not Swiss. American people call their
language English, not Anglo-American or American. (POL-10-1-2005-134)
A language can only be named after the people it belongs to, but not after the
names of the peoples who also use it. English is also one, common and the same
for all people who speak it, but it is well known whose language it is and what its
name is, regardless of who speaks it and where. It is always only English even
when it is spoken in the United States, Canada, Australia, New Zealand or
anywhere else in the world. Such are also German, Spanish, and Portuguese
languages. (POL-22-7-2006-55)
Today, no nation in the world which cares about its identity and its cultural and
national roots neglects its alphabet as much as the Serbian nation does. (POL-16-
3-2003-93)
do the authors of this proposal know of any other nation in the world which uses
somebody else’s alphabet in its own language (POL-11-2-2005-108)
general rule: one language – one alphabet, because no nation in the world uses
two alphabets to write its language. (POL-21-9-2004-72)
Finally, it should be noted that both of these topoi are in evidence in language-related
discourses coming from the leading academic linguists, as well as the more marginal
7. Discussion
language ideologies in Central South Slavic and what similarities/differences are there
between them? This question was addressed through a continuing comparison between
the different methods and techniques throughout the presentation of results (Section 6).
Despite all the challenges that extensive inflectional morphology presents for
that all four methods can be successfully applied to identify lexical patterns suggestive of
key-keywords and their associates provided a macroscopic view of the characteristic lexis
and lexical patterns in the research corpus and hinted at their covariance and thus the
Significant collocates and n-grams provided complementary evidence that confirmed and
supplemented the patterns identified by keyword analysis and added a phrasal dimension
to the discursive profile; collocation analysis also supplied data for exploratory factor
analysis and cluster analysis. Most importantly, exploratory factor analysis took the
somewhat amorphous collocate data and turned them into a detailed discursive profile of
representative texts in the form of factor scores unavailable from any of the other methods.
synchronic and diachronic variation do exist and can be used in conjunction with the
results of other methods.) Finally, cluster analysis built on both the collocate data and the
factorial structure to provide an account of the patterning in the data with respect to the
discursive links between factors, and the three independent variables for a more fine-
differences between the lexical patterns identified by each method were largely a product
of their different approaches to the data. For example, where keyword analysis focuses
on lexical items that are significantly more frequent in the research corpus, collocation
analysis focuses on the lexis co-occurring with the core concept(s). Both analyses thus
produce patterns that are characteristic of the corpus, but from different perspectives.
This has been shown to involve a great deal of overlap as well as some differences.
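The contrast between these two perspectives can be illustrated with a minimal Python sketch. It assumes Dunning's log-likelihood (G2) as the keyness statistic and pointwise mutual information (PMI) over a symmetric window as the collocation statistic; both are common choices in corpus linguistics, but they are stand-ins here and are not necessarily the measures, thresholds, or preprocessing (e.g., lemmatization of inflected forms) used in this study. The function names, parameters, and the invented toy tokens are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Dunning's log-likelihood (G2) for a word occurring a times in a
    research corpus of c tokens and b times in a comparator of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency, research corpus
    e2 = d * (a + b) / (c + d)  # expected frequency, comparator corpus
    g2 = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # 0 * log(0) is taken to be 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def keywords(research, comparator, top=10):
    """Keyword analysis (sketch): words relatively more frequent in the
    research corpus than in the comparator, ranked by G2."""
    r, k = Counter(research), Counter(comparator)
    c, d = len(research), len(comparator)
    scores = {
        w: log_likelihood(a, k[w], c, d)
        for w, a in r.items()
        if a / c > k[w] / d  # positive keywords only
    }
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top]


def collocates(tokens, node, span=2, min_cooc=2, top=10):
    """Collocation analysis (sketch): PMI between a node word and the
    words occurring within +/- span tokens of it."""
    freq = Counter(tokens)
    n = len(tokens)
    cooc = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            cooc.update(w for w in window if w != node)
    scores = {}
    for w, observed in cooc.items():
        if observed < min_cooc:  # discard rare, unreliable pairs
            continue
        # expected co-occurrence if node and w were independent,
        # scaled by window size (edge effects are ignored)
        expected = freq[node] * freq[w] * 2 * span / n
        scores[w] = math.log2(observed / expected)
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top]


if __name__ == "__main__":
    research = "srpski jezik ćirilica pismo srpski jezik ćirilica ćirilica".split()
    comparator = "grad vlada jezik grad ekonomija vlada grad sport".split()
    print(keywords(research, comparator))
    text = "srpski jezik i srpski jezik i ćirilica i srpski narod".split()
    print(collocates(text, "srpski", span=1))
```

Keyword analysis here compares the whole research corpus against the comparator, whereas collocation analysis looks only inside a window around a node word, which is why the two can rank the same item quite differently. PMI is known to favour rare co-occurrences, hence the illustrative (and arbitrary) minimum co-occurrence threshold of 2.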
Keyword and collocation analyses thus sometimes pointed to two different sides of the
same coin, as it were. A good example here is the relative prominence of the item
latinica ‘Latin (alphabet)’ in the results of collocation analysis, and its almost complete
absence from the results of keyword analysis. Similarly, ćirilica ‘Cyrillic’ is considerably
more prominent in the results of keyword analysis. These two lexical items are both very
prominent in the discourse of endangerment around the Serbian Cyrillic alphabet and so
incomplete picture, even though it would have been quite possible to identify this pattern through
Systematic similarities and differences can also be seen in the way these analyses
can combine pertinent lexical items into groups for higher-order analysis. Keyword
analysis is in this respect somewhat similar to factor analysis as it can take a text-based
(i.e., macroscopic) view of covariance between individual lexical items to produce sets
associates and factors illustrated how similar the results of these two analyses can be.
another clear indication of the superiority of factor analysis to the other methods
employed here, particularly because of the widespread politicization of and biases in the
Collocation analysis, on the other hand, does not offer a way of combining items
into groups, except for those that repeatedly occur together in more or less fixed ways
(i.e., phrases or n-grams). Further, similar to keyword analysis, collocation analysis does
not provide an objective way to identify representative texts. However, unlike both
keyword and factor analyses, collocation analysis offers concordance lines which can be
used to quickly assess lexical patterns in actual use in different texts, but which have been
shown to be of limited use here. Also, unlike all three perhaps, cluster analysis accounts
for all of the data, and provides a way of testing relationships between the factorial
structure and independent categorical variables (which was also done using analysis of
variance).
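Concordance lines of the kind mentioned above are usually displayed in key-word-in-context (KWIC) format. The following minimal Python sketch, with a hypothetical `concordance` function and invented toy tokens, shows the basic idea; real concordancers add sorting by left or right context, lemma-based search, and links back to the source texts, none of which is implemented here.

```python
def concordance(tokens, node, width=5, pad=40):
    """Return key-word-in-context (KWIC) lines for every occurrence of
    `node`, with `width` tokens of context on each side. The left context
    is right-aligned to `pad` characters so the node words line up."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[i + 1:i + 1 + width])
        lines.append(f"{left:>{pad}} [{tok}] {right}")
    return lines


if __name__ == "__main__":
    text = "ko brani ćirilicu od latinice a ko ćirilicu zaboravlja".split()
    for line in concordance(text, "ćirilicu", width=2, pad=20):
        print(line)
```

Because the search is an exact string match, an inflected form such as ćirilicu will not retrieve ćirilica or ćirilice; with heavily inflected Serbian data, searching by lemma or by a list of word forms would be necessary.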
identified in the 5+ hits section of SERBCORP? This question was addressed through
examination of top scoring (i.e., representative) texts for evidence of explicit or implicit
references to Central South Slavic ethnolinguistic identities and topoi. The findings were
The quantitative evidence from keyword and collocation analysis showed that, at
the most general level, one of distinct and remote cultural identities (i.e., in SERBCORP,
implicitly monolithic codes (cf. standard language ideology, Milroy, 2001). This was
indicated by frequent use of glottonyms which imply monolithic language varieties with
clearly demarcated boundaries and associated national identities (e.g., Serbian, English),
as well as sets of possessive pronouns constructing in- and out-groups and implying
ownership of language (e.g., our, own). At the level of SERBCORP as a whole, then, the
one-to-one correspondence between language and national identity, which at the same
time seems to be an expression of a belief in the “impossibility of heterogeneous
Blommaert & Verschueren, 1998, p. 207). However, despite the binary difference and the
emphasis on an “us and them” view of collective identity, differences between what are
understood as distinct language varieties are internalized and taken for granted and so
there is very little evidence of identity-related contestation. In other words, only intra-
This is, of course, entirely different at the level of less distinct and geographically
and culturally closer regional (i.e., Central South Slavic) ethno-cultural identities (5+ hits
section of SERBCORP). Here, even the most basic quantitative analysis pointed to the
prominence of lexical items such as name, label, renaming, and (does not) exist, and thus to
a tendency toward negation of separateness and contestation of separate
names and identities. Keyword associates (Section 6.1.3) and n-grams (Section 6.2.2)
confirmed this tendency and showed that it pertained to a limited set of Central South
Slavic ethnolinguistic identities (e.g., the renaming of the Serbian language into
Montenegrin), while factor analysis showed the (big ‘D’) discourses of endangerment and
particularly contestation to be the most dominant, extending across six of the twelve
identified factors (i.e., small ‘d’ discourses). The dominant conceptualization of language
here is still one of natural one-to-one correspondence between language and (ethno-)
national identity and thus also homogeneism, but now the boundaries between in- and
out-groups are much less clearly defined as the lack of linguistic distinctiveness is used to
undermine claims to separate identity (as elsewhere in Europe, cf. Blommaert &
differences projected outwardly and minimize (or erase) differences projected inwardly,
typical of nationalism (see, e.g., Hobsbawm, 1990). Ultimately, the dominant language
putative immutable, primordial character of the nation that created it, e.g.,
For Serbs the Cyrillic alphabet is part of their identity, it is their important
determiner without which they would not have been who they have been and
without which they would not be who they are […]. (POL-16-3-2003-93)
But how are we to understand the function of such a conceptualization of language and
the use of specific argumentation strategies (i.e., topoi), particularly with respect to the
The third research question was: What links can be identified between the
language-related discourses and language ideologies relevant to Central South Slavic
Central South Slavic area. In addition to the contestation we note today, there have been
earlier historical examples, sometimes making equally absurd claims, such as the theory
according to which Carinthian Slovene dialects were more closely related to Germanic
Greeks, Bulgarians, and Serbs, particularly in the latter half of the twentieth century,
which still continues today (for details about these, see Voss, 2006). Arguably, these
represent examples of what Irvine and Gal (2000) call ‘fractal recursivity’ whereby (inter-
linguistic) binary oppositions are used for the specific local purposes of delegitimation of
one (intra-linguistic) ethnolinguistic identity or another. All this contestation has two
things in common. The first is a focus on language. As the French-Serbian scholar Yves
Tomić notes in his expert report on the ideology of Greater Serbia in the nineteenth and
twentieth centuries, written for the United Nations’ International Criminal Tribunal for the
former Yugoslavia (UN ICTY) in The Hague (Tomić, n.d.), at the root of the Greater
Serbian ideology is the Herderian language ideology according to which language is the
only valid criterion for the determination of national identity (see also Carmichael, 2000).
At the end of the nineteenth and beginning of the twentieth centuries (and even
today, see, e.g. Glenny, 1995), linguists were putting their knowledge at the
service of politicians by choosing one or another isogloss as the definitive
justification for their ethnic identity – and therefore nationality […]. [L]inguistic
features become ‘flags’ that are manipulated to represent territorial claims. […].
The claims about nationality [a]re then translated into claims for the territory to be
included in the nation-state.
In his study of the negations of the Macedonian ethnolinguistic identity, Voss (2006, pp.
120-122) thus writes that “even in Yugoslav times we notice the coincidence of national
language ideology and ethnic identity ideology”, a contradiction which “becomes even
sharper after 1991” when “cultural policy became a tool in the rivalry of post-communist
However, cultural policy has been a favorite tool of nationalist projects for
much longer. In her book, Yugoslavia’s implosion: The fatal attraction of Serbian
nationalism, Sonja Biserko, a former Yugoslav diplomat and the president of the Helsinki
Committee for Human Rights in Serbia, traces the origins of contemporary Serbian
nationalism back to the beginnings of the nineteenth century, “the formative period of
Serbia as a nation-state” (Biserko, 2012, p. 34), and the idea of resurrection of the
homogeneous state” (p. 33). This idea, known as “Greater Serbia” throughout the
twentieth century, was first formulated into a national strategy in 1844 in a work by Ilija
Garašanin, Serbian minister of internal affairs from 1843 to 1852, famously titled
“Načertanije” (‘draft plan’). The plan envisaged the resurrection of the medieval Serbian
state, which had been destroyed by the Turks, through the integration of all Balkan
territories in which Serbs lived, either as a majority or as a minority, into a single state.
These included large
Montenegro, and northern parts of Albania (Tomić, n.d., p. 13). The plan has also become
widely known through an oft-repeated formula that summarizes it as “all Serbs in one state”.
However, much like elsewhere in Europe, this was a time of Romanticism and of the
inception of national consciousness,36 when collective identities were much less clearly
delineated and much more fluid than they are today, so it was not always clear who, for
example, the Serbs were. But this dilemma would be conclusively resolved for Serbian
nationalists by the Serbian linguist and ethnographer Vuk Karadžić, who reformed the
Serbian Cyrillic alphabet and initiated the standardization of modern Serbian. In his
book, indicatively titled Serbs, all and everywhere, written in 1836 and published in
1849, Karadžić demarcated the national Serbian territories and launched the theory of
Serbs as a people of several faiths (i.e., Orthodox, Catholic, and ‘Mohammedan’) unified
by a common language (see, for example, Tomić, n.d., pp. 8-9). Indeed, Western analysts
cited in Tomić, n.d., p. 10, Note 13), consider the ideas of Vuk Karadžić to be the
nationalism in the last two centuries, from the formation of Serbia as a nation-state in the
first half of the nineteenth century, to the two world wars and two Yugoslav states in the
first half of the twentieth century, to the breakup of Yugoslavia and the ensuing Yugoslav
wars at the end of the twentieth century. However, for our purposes here, one other
showing signs of internal struggles and instability already before Josip Broz Tito’s death
in 1980. This trend was accelerated by the political and economic uncertainties in the
period following Tito’s death. Again, as mentioned above, Serbian elites tended to view
the state which was opposed primarily by Slovenes and Croats (cf. Biserko, 2012). In
this climate and very much in the Serbian tradition of drafting conspiratorial nationalist
and Arts, at least one of whom was a leading linguist (Pavle Ivić), drafted the infamous
SANU Memorandum in the fall of 1986 (SANU, 1986). The memorandum alleged Serbs
and Serbia to be in an “unequal position” and “threatened” in Yugoslavia, and blamed the
1974 constitution which decentralized the country and gave greater rights to individual
republics, arguably a historically and politically valid arrangement but one in which Serbs
were a minority everywhere outside of Serbia itself. As Biserko (2012, p. 82) notes, “[i]n
essence, the Memorandum reiterated the Serbian national agenda from the late nineteenth
and early twentieth century, calling for ‘the liberation and unification of the entire Serb
people and the establishment of a Serb national and state community on the whole Serb
territory’.” Needless to say, the Memorandum was and continues to be widely regarded in
the former Yugoslavia as the definitive statement of the Serbian nationalist program of
Greater Serbia, which was the principal cause of the 1990s wars.
Most interestingly, the Memorandum mentions the noun ‘language’ ten times, the
noun ‘linguists’ once, and the adjective ‘linguistic’ three times in its thirty-two pages
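Delovi srpskog naroda, koji u znatnom broju žive u drugim republikama, nemaju
prava, za razliku od nacionalnih manjina, da se služe svojim jezikom i pismom, da
se politički i kulturno organizuju, da zajednički razvijaju jedinstvenu kulturu svog
naroda.
Parts of the Serbian people who live in considerable numbers in other republics do
not have the right, unlike national minorities, to use their own language and
alphabet, to organize politically and culturally, or to jointly develop the unified
culture of their people.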
That language was made compulsory also for Serbs in Croatia through a
constitutional decree, while the nationalist Croatian linguists continue to distance
it from the language in other republics of the Serbo-Croatian language area
through systematic and well-organized actions, which contributes to the
weakening of the links between Serbs in Croatia and other Serbs.
Praktično značenje izjava: „moramo brinuti“, „treba se boriti“, „više treba učiti
ćirilicu“ itd. može se procenjivati samo u njihovom suočenju sa stvarnom
jezičkom politikom koja se vodi u SRH. Ostrašćena revnost kojoj je cilj
konstituisanje zasebnog hrvatskog jezika što se izgranuje u protivstavu prema
svakoj ideji o zajedničkom jeziku Hrvata i Srba ne ostavlja dugoročno mnogo
izgleda srpskom narodu u Hrvatskoj da očuva svoj nacionalni identitet.
The practical meaning of statements such as “we must take care of”, “we need to
fight”, “Cyrillic should be taught more often”, etc., can be evaluated only against
the real language policy in the Socialist Republic of Croatia. The zeal whose aim
is to create a separate Croatian language, opposed to the idea of a common
language of Croats and Serbs, does not leave the Serbian people in Croatia much
long-term prospect of preserving their national identity.
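Pod dejstvom vladajuće ideologije kulturne tekovine srpskog naroda otuđuju se,
prisvajaju ili obezvređuju, zanemaruju ili propadaju, jezik se potiskuje, a ćirilsko
pismo postepeno gubi.
Under the influence of the ruling ideology, the cultural achievements of the Serbian
people are being alienated, appropriated or devalued, neglected or left to decay; the
language is being suppressed, and the Cyrillic alphabet is gradually being lost.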
Although the topoi featuring pseudo-scientific arguments on language attested above are
missing here, the discourse of endangerment is present and is more pronounced than
above, while the discourse of contestation is implicit rather than explicit. Apparently, the
discourse of Serbian linguistic nationalism evolved between 1986 and the early 2000s,
adapting to the changing circumstances and replacing the alarmist, mobilizing discourse
widespread conflict fatigue. Further, the Croats are clearly cast as the enemy by being
labeled “nationalist” and “zeal(ous)”, labels which are then used to justify the proposed
action (i.e., a recentralization of the Yugoslav state). In other words, all the main
elements of the discursive complex of Serbian linguistic nationalism which we saw above
are on display here as well. This is confirmed by the results of quantitative analysis, which
identified Vuk Karadžić and SANU as pertinent lexical items in the research
corpus: both Vuk Karadžić and SANU appear in the key lemma (Tables 12, 14, and E1)
and collocation (Tables G1 and G2) lists, while SANU also appears numerous times in n-
grams (Table 19) and Factor 10 in EFA (Table 23). Similarly, the results of qualitative
analysis exemplify the routine intertextual references to Vuk Karadžić and the work done
within SANU and thus the interdiscursivity between language and nationalism in Serbia,
It is much more of a problem that the downfall of the Serbian linguistic science
that began immediately after the death of Vuk Karadžić (1864) continues today.
Renaming the language from Serbian (which is what it was called during the time
of its last reformer) into Serbo-Croatian, Serbian linguists entered a period during
which the Vuk’s path in the naming and development of the language of the
Serbian people was abandoned. That that period still continues is confirmed by
the fact that Serbian linguists continue to call their language “Serbo-Croatian” in
the Dictionary even after the demise of the “Serbo-Croatian language”. (POL-08-
8-2003-80)
The Serbian Linguistic Culture Society, Serbian Learned Society and the Serbian
Royal Academy represent three acts in the creation of the most authoritative
Serbian scientific institution [SANU]. All three institutions held that Serbs are a
South Slavic people who speak their own, Serbian language, which is close to
other Slavic languages, but also different from them, as well as that Serbs had
three faiths: Orthodox, Roman-Catholic and Mohammedan. (POL-10-9-2005-
127)
language ideologies in evidence in the mainstream Serbian press therefore seem largely
to derive from the revived Serbian nationalist program first articulated in the nineteenth-
century discursive and ideological work by Vuk Karadžić and Ilija Garašanin, as well as
the 1986 SANU Memorandum, and are employed as a cultural policy tool in various
aspects of the realization of this program and the establishment of a Greater Serbian
hegemony in the South Slavic area in the Balkans. The alternative, and sometimes
hand, though present (as in the critique of the work of the “Cyrillic” associations
presented at the end of Section 6.6.1), are marginal and not easily detectable by either
related discourses and language ideologies are largely discourses and ideologies of
“attempt to maintain or reproduce a threatened national identity” (Wodak, de Cillia,
The fourth research question was: Is there synchronic and diachronic variation in
comparison of the factor scores for each of the six selected language-related discourses
(i.e., factors) of texts grouped by a) publication: Blic, NIN, Politika, and Vreme; b) year
of publication: 2003, 2004, 2005, 2006, 2008; and c) type of article: general newspaper
articles vs. letters-to-the-editor. Synchronic and diachronic variation were also examined
related discourses identified by factors suggest differences between the broadsheet daily
Politika (est. in 1904) as the oldest and most prestigious daily in Serbia and the weekly
NIN (est. in 1935) as the oldest weekly in Serbia, on the one hand, and the tabloid daily
Blic (est. in 1996) and the weekly Vreme (est. in 1990), on the other. Politika and NIN
articles, it will be remembered, scored significantly more highly than Blic or Vreme on all
difficult to interpret with any degree of certainty, it seems safe to conclude that the older,
more conservative Politika and NIN offer better representations of dominant discourses
and thus of linguistic nationalism in Serbia. However, it should be noted that the
the actual (big ‘D’) discursive and ideological content of texts across all four publications
(see Section 6.6), so this difference may be quantitative rather than qualitative in nature.
stability over time. Although a period of five to six years arguably is not long enough for
particularly because the one factor that actually showed diachronic variation (Factor 2:
Cyrillic-only) was the most marginal to the contestation of Central South Slavic
representative texts which suggested a high degree of congruence in the actual (big ‘D’)
discursive and ideological content of texts across all five years of publication considered
here. Finally, although the patterns of synchronic variation between different types of
picture, with some congruence (Factors 10 and 11) as well as significant differences
(Factors 2, 4, 6, and 8), here, too, the qualitative examination of representative texts
suggested a high degree of congruence in the actual (big ‘D’) discursive and ideological
content of texts.
8. Conclusion
This study had two major goals. The first was to determine whether a corpus-
terms of their usefulness and effectiveness for identification of lexical patterns suggestive
of language-related discourses and, ultimately, language ideologies. The second goal was
the mainstream Serbian press and to then examine those for any links with
straightforward of the two goals, it was dealt with first throughout the dissertation and it
is also dealt with first in this final section (Section 8.1). Conclusions about language-
related discourses and language ideologies identified in this study are offered in Section
8.2. Implications, limitations, and directions for future research are discussed in Sections
It has been noted already that most studies of language-related discourses and
language ideologies have relied on qualitative methods only, while those that also use
quantitative methods tend to rely on keyword and collocation analysis. The reasons for
quantitative terms, as well as the relative ease of use and effectiveness of basic corpus-
linguistic methods such as keyword and collocation analysis. However, as this study has
to slippery concepts such as discourse and ideology, this approach offers novel insights
and differences, the quantitative methods applied here were shown to each offer a unique,
complementary angle from which to consider the data, which ensures both a more
analysis was shown to be much more effective than collocation analysis at corpus
profiling for sampling purposes. In other words, the decision to focus on the 5+ hits
section of SERBCORP would have been more difficult to justify based on the results of
collocation analysis alone. Perhaps most interestingly, it was demonstrated that reliance
instructors, regardless of their size, tend to have more homogeneous discursive and
ideological profiles, which are also easier to identify using basic techniques. Topically
heterogeneous data sets, on the other hand, present a challenge on account of their more
heterogeneous and therefore more complex discursive and ideological profiles. Again,
this is where exploratory factor analysis proved to be far superior to other methods.
One conclusion, therefore, is that these methods are complementary in that they
provide both unique (e.g., micro- vs. macroscopic) and common (e.g., frequent, recurrent
lexis) perspectives, so they are best applied in conjunction with one another for the
analysis (based though it is on the results of collocation analysis) is the only method that
can effectively take researcher inference out of the identification of both (small
‘d’) discourses (i.e., factors) and representative texts (by providing factor scores for each
text). This finding is of paramount importance for research into discourses and
traditionally politicized and biased. Similarly, this is important for critical discourse
analysis, which has been in need of research guided and supported by the results of
only identify salient lexical patterns and small ‘d’ discourses. This study thus also
confirms the effectiveness of a mixed methods approach whereby quantitative and
based on and guided by the results of quantitative analysis and vice versa.
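Factor analysis assigns every text a score on each factor, so picking representative texts can be a mechanical step rather than an inferential one. The sketch below assumes the scores have already been computed by statistical software and stored per text. The function name, text identifiers, and scores are hypothetical and do not come from SERBCORP.

```python
def top_scoring_texts(scores: dict[str, float], n: int = 3) -> list[str]:
    """Return the IDs of the n texts with the highest scores on one factor.

    Ties are broken alphabetically by text ID so that the selection is
    reproducible. Hypothetical helper for illustration only.
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [text_id for text_id, _ in ranked[:n]]


# Hypothetical scores of five texts on a single factor.
factor_scores = {
    "text-A": 2.31,
    "text-B": -0.45,
    "text-C": 1.87,
    "text-D": 0.12,
    "text-E": 3.05,
}
print(top_scoring_texts(factor_scores, n=2))  # ['text-E', 'text-A']
```

Repeating the selection for each factor gives a set of top-scoring texts per (small ‘d’) discourse, which can then be read qualitatively.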
Despite its dangers, lemmatization may be a necessary evil as its advantages (e.g.,
outweigh its drawbacks (e.g., conflation of lexical items with potentially different
extensive inflectional morphology also has considerable potential for polysemy in the
problem for concordance analysis also because it breaks up the semantic unity of a
lemma (and with it its concordance patterns) into a myriad forms which concordancing
software such as WST is unable to deal with at the present moment. Here, the solution is
less clear, but will involve the development of software more capable of dealing with the
The first, a discourse of endangerment, was itself attested in two related forms, each of
which focuses on a different aspect of language as well as a different ‘threat’. The first
form focuses on the perceived danger to the Cyrillic alphabet that a widespread use of the
Latin alphabet in Serbia is deemed to represent. Perhaps somewhat unusually among the
world’s languages, Central South Slavic uses both the Latin and the Cyrillic alphabets.
While the alphabets themselves are fully equivalent, the difference between them lies in
their sociohistorical origins and is thus sociolinguistic in nature. Despite this equivalence,
the Cyrillic alphabet has historically been strongly associated with the Serbian
ethnonational identity, whereas the Latin alphabet, largely because it provides a means of
distinction from the Serbs, is strongly preferred by the Bosniaks, the Croats, and now also
the Montenegrins. Here, then, we see an example of what Irvine and Gal (2000) call
‘iconization’ or mapping of linguistic features onto social images, positing a direct link
between one or more linguistic features and (an essentialist conceptualization of) the
nature of the persons or social groups who display them (for a reverse example, see
Section 6.6.1). Interestingly, however, the Latin alphabet is also in widespread use
among the Serbs themselves, both in Serbia and elsewhere, mostly because it has come to
iconization). However, the Cyrillic alphabet is preferred in official contexts and has
never been in true danger of being phased out. There rather seems to exist a kind of
is used in most official contexts and is associated with political conservatism and cultural
traditionalism, while the Latin script is mainly used in popular culture and is associated
with political liberalism and modernity (but see Herzfeld, 1987 for a problematization of
the concept of diglossia). The assessment of the endangerment of the Cyrillic is thus
(grossly) exaggerated, and the calls for its defense have primarily ideological and
political motivations. Note here that the calls to banish the Latin alphabet from Serbian
altogether represent an example of ‘erasure’ as the simplification of a sociolinguistic field
through which some persons, social groups, or sociolinguistic phenomena are rendered
invisible in ideologically and politically convenient ways (Irvine & Gal, 2000). Serbian
society has traditionally harbored a cult of victimhood since the defeat by the Ottoman
Turks in 1389 in Kosovo (see, e.g., Biserko, 2012), while vocal self-interested public
profitable) since the breakup of Yugoslavia that there are now widely used expressions
such as “to Serb” and “a professional Serb” to (mockingly) refer to the phenomenon.
Most proclamations of the endangerment of the Cyrillic in this data set (and, arguably, in
general) thus come from the fringe (minor civic associations and minor-league
academics) that harbors extremist political views; nevertheless, they are given enormous
The second form of the discourse of endangerment focuses on the perceived threat
to the Serbian language and ethnonational identity posed by the post-Yugoslav political
and cultural independence of other Central South Slavs and their concomitant exercise of
their right to name the common language to reflect their own separate identities (i.e.,
Bosnian, Croatian, Montenegrin). Though equally baseless and ultimately absurd, this
linguists, many of whom, such as Ivan Klajn, are members of SANU, as well
as other academics and prominent writers (again, often members of SANU) and, more
often than not, Serbian politicians and, sometimes (though not attested here), also
members of the clergy in the Serbian Orthodox Church. As can be seen from the many
samples of this discourse cited above, there is an insistence on (and perhaps also a belief
in) the Serbian ethnic origins of all Central South Slavs (as well as of the Macedonians, who,
although they speak a related but separate South Slavic language, are predominantly Eastern
Orthodox). The argument, based on this theory of ethnic origins and the tradition started
by Vuk Karadžić, but also the (politically convenient) Herderian language ideology
according to which language is the only valid criterion for ethnonational affiliation, is
theme in public discourse in Serbia since the breakup of Yugoslavia and a keyword in the
sense of Williams, 1976) of the Serbian Volk and therefore a step towards its ultimate
destruction. Figure 5 shows the discourse prosody and a range of applications for (the
decidedly negative term) rasparčavanje ‘partitioning’, from the partitioning of the
Byzantine empire (lines 3, 4, 6) to the partitioning of the (Serbian) language and literature
(lines 5, 8, 10-18) to the partitioning of the Vinča Institute of Nuclear Sciences at the
University of Belgrade (lines 9, 19, 20). Note that a similar discourse of endangerment was evident in the
SANU Memorandum, although the focus there was on the ‘threat’ from the Croats since
Bosniaks and Montenegrins had not yet asserted themselves by the mid-1980s, when the
discourse of contestation. This discourse is directly related to the second form of the
discourse of endangerment and is routinely purveyed by the same set of actors, only here
Slavic identities and with them their political legitimacy and ultimately their rights to
Figure 5. Concordance lines for rasparčavanje ‘partitioning’ in SERBCORP
Topos 1 presents the issue of the language name as a “scientific” problem that has been
conclusively settled, although it is never quite explained what exactly this means or what
scientific methodology was used to reach this conclusion, except for occasional
vague references to dialectological studies. Regardless, the claim is repeatedly made that
the only “scientifically” justified name for Central South Slavic is Serbian. Topos 2,
South Slavic and colonial languages such as English, Spanish and Portuguese (but also
German), arguing that in cases of such polycentricity the “original” name is always
preserved even if the language comes to be used by different nations. The fact that
these languages were transplanted to the new nations largely via colonial enterprises
is ignored, and is in fact rather indicative of the conceptualization behind this argument:
that Serbian was not transplanted from Serbia to other Central South Slavic nations in
that Serbian was not transplanted from Serbia to other Central South Slavic nations in
such as, for instance, that in Scandinavia (see, e.g., Vikør, 2000) or the situation with
Hindi/Urdu, which is nearly identical to that with Central South Slavic in several respects
It has thus been shown that underlying the discourses of endangerment and
from the language philosophy of Johann Gottfried Herder as well as the Romantic
movement (Bauman & Briggs, 2000, 2003), coupled with a Slavic language ideology
termed slovesnost in Serbian which encompasses language, alphabet, and literature and
treats them as different aspects of a unified linguistic entity. It has also been shown that
there exist intertextual and interdiscursive links between language-related discourses and
language ideologies in the mainstream Serbian press, on the one hand, and Serbian
has been shown that the dominant language-related discourses and language ideologies
were widely accepted in Serbian society during this period. Based on these findings, we
can conclude that, in the case of Serbia and the Balkans, language ideologies (and
language-related discourses) are indeed not “about language alone” (Woolard, 1998, p.
3), and that “[t]he continuing intensity of contestation” over language and
ethnonational identity is “hardly surprising, given the consequences envisaged and
authorized by the reigning language ideology and occasionally enacted under its auspices.
because they are also claims to territory and sovereignty” (Irvine & Gal, 2000, p. 72,
conflict situation of the Middle East, “[t]he political and military conflicts have stopped
but the linguistic conflict goes on” (Abd-el-Jawad & Al-Haq, 1997, p. 439). The ultimate
What remains to be done is to take a stand with respect to such ideologies, fully
recognizing the impossibility of ideology-free positions. Here, the best course of action
seems to be to follow theorists such as Gramsci (1971) and Gee (2010) in their appeal to
judge ideologies in terms of their social effects rather than their truth values.
8.3 Implications
The findings of this study should be informative and useful to several different
audiences. The results of the comparison between the quantitative methodologies should
but also those interested in similar approaches to discourses. The discursive and
sociolinguistics, as well as those in related fields such as linguistic anthropology, political
science, and sociology, as this is, to the best of my knowledge, currently unavailable.
More specifically, regional linguists and other public figures with an interest in language
such as politicians, academicians, and religious officials will be interested in how their
ideologies. Similarly, journalists and others working in and with the media will be
members of the public in the four states of the Central South Slavic area, as well as
regionally, should find the language-ideological profile of the mainstream Serbian press
which is often exploited for political purposes. It is hoped that empirical contributions
based on transparent, replicable procedures such as this one can help deconstruct the
reconciliation.
8.4 Limitations
analysis) perspectives, as well as an exhaustive data set (including all relevant articles
from the given time-frame), the language-ideological profile presented here must be
understood as tentative. The primary reason for this is that manifestations of public
language-related discourses and language ideologies are not limited to the press, but can
be found throughout the public sphere. Furthermore, while the press is an excellent
source of data on dominant discourses and ideologies purveyed by the elites, it is clear
that it does not represent equally well what the linguistic anthropologist Paul Kroskrity
calls “practical consciousness” (Kroskrity, 2004, p. 505), i.e., that of so-called ordinary
people, who tend to accept and naturalize dominant ideologies. In addition to this, there
are further aspects of language-related discourses that merit consideration such as, for
political affiliation, which are considered only in part here. Similarly, the concepts of
discourse and (language) ideology are notoriously problematic and the difficulties
associated with them are necessarily carried over into the present study. Although
language ideology, more often than not, is a linguistic discursive phenomenon, to treat it
account of it. In other words, although the lexical approach to the identification of
their lexical manifestations and so should be examined from multiple other perspectives
(see below).
Central South Slavic area, i.e., Bosnia-Herzegovina, Croatia, Montenegro, and Serbia.
discourses and language ideologies in these four states are intimately linked on account
of the nations’ shared history. So, the first task for future research would be to produce
remaining three national contexts. The second task would be to subject the findings of
links to ethnonationalism in this area. Simultaneously with this, future research would do
well to include also data from other discursive sites such as various kinds of institutional
distinctions in our attempt to account for the totality of language-related discourses and
in addition to other kinds of data from different discursive sites, the approach to the
1
For a critical review of the literature on the evolution of a symbol system that is arguably what makes
can, not unreasonably, be characterized as pathological. This has meant the politicization of virtually
everything from strategically insignificant borderland areas between Slovenia and Croatia to the very name
of Macedonia which continues to be contested by Greece (see Voss, 2006); the virus, to continue the
medical metaphor, has easily infected the regional academic production, particularly in linguistics.
Greenberg (2004) himself thus notes that often such works, “given the ethnic affiliation of their authors, are
subjective and at times lack the scholarly rigor required in the study of linguistics” (p. 4; see also Irvine &
Gal, 2000, pp. 67-68). This situation has necessitated a reliance on outside sources, unaffiliated with any of
the parties in the region, so Kordić (2010), for example, relies heavily on extra-regional sources,
particularly those originating in the German-speaking world. However, as Greenberg also notes, little
relevant material is available in English, while treatments in other languages (e.g., Gröschel, 2009),
curiously enough, also often exhibit obvious and disqualifying biases and are therefore not relied upon here (see
roughly the twelfth century and 1918, Serbia and Montenegro were part of the Ottoman Empire between the
middle of the fifteenth century and 1878, while Bosnia-Herzegovina was part of both, belonging to the
Ottoman Empire between the middle of the fifteenth century and 1878, and to the Austro-Hungarian Empire
its subjects, there is a close association between religious and ethnic identities in the Balkans. Most
(religious) Croats are therefore Catholic, most (religious) Serbs and Montenegrins Eastern Orthodox, while
most (religious) Bosniaks are Muslim. This was further reinforced by the 1974 Yugoslav constitution,
which recognized the non-Christian Bosnians as a separate ethnonational group, albeit one defined by
religious affiliation rather than ethnic identity: Muslims (as opposed to their now official ethnonym,
Bosniaks).
5
During that time, the area was still divided between the Austro-Hungarian and Ottoman empires, but
Serbia, in particular, exploited the increasing weakness of the Ottoman Empire to quickly move towards
full independence. Serbia and Montenegro were finally granted independence by the great European
powers at the Berlin Congress in 1878, while Bosnia-Herzegovina was placed under the administrative rule
of the Austro-Hungarian Empire to be annexed in 1908; Croatia remained part of Austria-Hungary until the
54) came in 1867, in Pero Budmani’s Serbo-Croatian Grammar (see also Katičić, 1997, p. 171).
7
Although Montenegro had been independent since 1878, it was annexed by Serbia shortly after the end of
World War I.
8
Advancing her thesis that Serbo-Croatian/Croato-Serbian is one polycentric language rather than several
different languages, Kordić (2010), contrary to virtually all Croatian linguists, contends that Serbian
domination of the common standard is “a myth”. However, while this may be true to a large extent for
Croatia, it is patently untrue for Bosnia-Herzegovina and Montenegro (see Endnote 10). In addition, her
insistence on the exclusionary, if historical, name for the language, i.e., Serbo-Croatian/Croato-Serbian, is
rather telling.
9
I.e., South-Slavia, a name which symbolically incorporates all Southern Slavs (except Bulgarians). One
should note, however, that for many non-Serb former Yugoslavs this name has come to index Serbian
and particularly by the Serbs, who since at least the time of Vuk Karadžić, the nineteenth-century Serbian
language reformer, had disputed the existence of any other separate Central South Slavic ethnolinguistic
identities including the Croatian, arguing as Karadžić did that the Croats and Bosniaks were “Serbs of the
Catholic and Islamic faiths”, respectively. For details on Vuk Karadžić, see Endnote 28.
11
Not only did no Bosniak or Montenegrin linguists or literary figures participate in either the Vienna or
Novi Sad agreements (see, e.g., Völkl, 2002, p. 216), Conclusion 7 of the Novi Sad agreement literally
reads, “[…] A mutually (sic!) agreed-upon Commission of Serb and Croat experts will develop a draft of
schools, while all public institutions were required to display bialphabetic signs and use both alphabets in
their day-to-day operation. An illustrative example of this latter practice is the alternating use of the two
alphabets by the leading national daily Oslobodjenje, which published in the Latin alphabet one day and in
respectively, Bosniaks reverted to the historical designation of language in Bosnia-Herzegovina which had
first been upheld and then abruptly abandoned by Austria-Hungary, declaring their language to be Bosnian in the
face of opposition from the other two groups which continues to this day.
14
Montenegro chose not to declare independence at that time and instead sided with Serbia in the
subsequent wars. Macedonia was spared a military conflict until a brief civil war with its Albanian
minority in 2001. Kosovo, long a southern Serbian province with a large Albanian majority, was
recognized as an independent state in 2008 after a brief 1999 war with Serbia and a NATO intervention
Section 7.3.
16
The period 2003-2008 was chosen on the basis of data availability at the time of corpus compilation.
17
Precise circulation figures are somewhat difficult to come by in the Balkans. Independent auditors such
as ABC Srbija (Audit Bureau of Circulations, www.abcsrbija.com) do keep track of circulation figures for
marketing purposes, but their reports are proprietary and require a costly subscription for access. However,
information on circulation figures can also be obtained from occasional press reports issued by publishing
are currently unavailable, the Serbian newspaper market is fairly small and centralized, so there is little doubt
as to which publications are generally considered authoritative. It should also be noted that, in
addition to the four publications included in this study, the broadsheet daily Večernje Novosti would also
merit inclusion here on account of its relatively high circulation and standing; however, complete data sets
for the subject period were unavailable at the time of corpus compilation, so the Večernje Novosti data were
content.
21
JEZIK = jezik, jezika, jeziku, jezikom, jezici, jezike, jezicima (lemma forms by number and case).
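The lemma definition above can be illustrated with a minimal Python sketch that pools the listed forms of JEZIK when counting occurrences in a text. The tokenizer (lowercasing and keeping runs of letters) and the names LEMMA_FORMS, tokenize, and lemma_counts are illustrative assumptions; this is not the lemmatization procedure of WordSmith Tools used in the study.

```python
import re
from collections import Counter

# Forms of the lemma JEZIK by number and case, as listed in the note.
LEMMA_FORMS = {
    "JEZIK": ["jezik", "jezika", "jeziku", "jezikom", "jezici", "jezike", "jezicima"],
}

# Invert the lemma list so that each word form points to its lemma.
FORM_TO_LEMMA = {form: lemma for lemma, forms in LEMMA_FORMS.items() for form in forms}


def tokenize(text):
    """Hypothetical tokenizer: lowercase, then keep runs of Unicode letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def lemma_counts(text):
    """Count occurrences of each lemma, pooling all of its listed forms."""
    counts = Counter()
    for token in tokenize(text):
        lemma = FORM_TO_LEMMA.get(token)
        if lemma is not None:
            counts[lemma] += 1
    return counts


# 'The Serbian language and the languages of the region: the language is often debated.'
sample = "Srpski jezik i jezici regiona: o jeziku se često raspravlja."
print(lemma_counts(sample))  # Counter({'JEZIK': 3})
```

Pooling the forms in this way is what allows a single node (JEZIK) to stand in for seven distinct inflected strings in frequency and collocation counts.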
22
The text of the articles was saved as plain text (.txt) files using Unicode (UTF-16) encoding and formatted
according to the TEI Guidelines for Electronic Text Encoding and Interchange (http://www.tei-c.org/Guidelines/).
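For readers unfamiliar with the format, the following Python sketch shows one way an article could be wrapped in a minimal TEI document and written to disk in UTF-16. The header contents, the one-paragraph-per-<p> layout, and the helper names to_tei and save_utf16 are assumptions made for illustration; they do not reproduce the study's actual markup.

```python
from xml.sax.saxutils import escape


def to_tei(title, paragraphs):
    """Wrap an article in a minimal TEI document (hypothetical header contents)."""
    body = "\n".join(f"      <p>{escape(p)}</p>" for p in paragraphs)
    return (
        '<?xml version="1.0" encoding="UTF-16"?>\n'
        '<TEI xmlns="http://www.tei-c.org/ns/1.0">\n'
        "  <teiHeader>\n"
        "    <fileDesc>\n"
        f"      <titleStmt><title>{escape(title)}</title></titleStmt>\n"
        "      <publicationStmt><p>Research corpus (not for distribution)</p></publicationStmt>\n"
        "      <sourceDesc><p>Newspaper article</p></sourceDesc>\n"
        "    </fileDesc>\n"
        "  </teiHeader>\n"
        "  <text>\n"
        "    <body>\n"
        f"{body}\n"
        "    </body>\n"
        "  </text>\n"
        "</TEI>\n"
    )


def save_utf16(path, xml_text):
    """Write the document in UTF-16; Python's 'utf-16' codec prepends a byte-order mark."""
    with open(path, "w", encoding="utf-16") as f:
        f.write(xml_text)
```

Escaping the title and paragraph text keeps characters such as & and < from breaking the XML, while the byte-order mark lets XML parsers detect the UTF-16 byte order.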
23
Pronouns, numbers, and quantifiers were exempted from deletion on account of their potential functions
in discourse strategies such as the referential/nomination strategy (i.e., the construction of in- and
out-groups; Wodak, 2001, pp. 72-74), as well as other, as yet undetermined, discursive functions.
24
Following recommendations in Tabachnick and Fidell (2007), the full data set was subjected to a log
transformation in an attempt to retain the cases previously identified as multivariate outliers and enhance
the factorability. However, the log transformation did not produce significantly better results, so the
(negro) and Gora (monte). Consequently, the collocation analysis treated the two words separately
and identified them as two different collocates of the lemma JEZIK. Unsurprisingly, they turned out to be
correlated with one another to the point of singularity, so a decision was made to exclude one of the two
from further analysis and treat the remaining one as the full country name.
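The kind of redundancy described above can be screened for before factor analysis by inspecting the correlation matrix. The Python sketch below, assuming a texts-by-variables matrix of collocate frequencies, flags variable pairs whose absolute correlation exceeds a cut-off. The 0.9 threshold, the data, and the function name near_singular_pairs are illustrative assumptions rather than the criteria applied in the study.

```python
import numpy as np


def near_singular_pairs(X, names, threshold=0.9):
    """List variable pairs whose absolute Pearson correlation exceeds `threshold`.

    X is a texts-by-variables array (e.g., collocate frequencies per text);
    the 0.9 default is an illustrative cut-off, not the study's criterion.
    """
    r = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > threshold:
                flagged.append((names[i], names[j], round(float(r[i, j]), 3)))
    return flagged


# Hypothetical counts in which "gora" always co-occurs with "crna".
X = np.array([[1, 1, 5], [2, 2, 1], [3, 3, 4], [4, 4, 2]])
print(near_singular_pairs(X, ["crna", "gora", "srpski"]))  # [('crna', 'gora', 1.0)]
# A determinant of the correlation matrix at or very near zero signals singularity.
print(np.linalg.det(np.corrcoef(X, rowvar=False)))
```

Once a pair is flagged, dropping one member, as was done with Crna and Gora, removes the redundancy without losing information.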
26
Although Biber (1988, p. 85, Note 2) correctly notes that “oblique solutions might be generally
preferable in studies of language use and acquisition, since it is unlikely that orthogonal, uncorrelated
factors actually occur as components of the communication process,” Varimax (orthogonal) and Promax
(oblique) rotations produced virtually the same results on this data set.
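To make the orthogonal case concrete, the Python sketch below implements a Varimax rotation of a loading matrix using the commonly cited SVD-based iteration. It is a hypothetical illustration, not the routine of any particular statistics package. It also does not implement Promax, which builds a target by raising the Varimax loadings to a power and then lets the factors correlate.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonally rotate a variables-by-factors loading matrix.

    gamma=1.0 gives Varimax; returns the rotated loadings and the rotation matrix.
    """
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Direction of steepest increase of the Varimax criterion at the current rotation
        gradient = loadings.T @ (
            rotated ** 3 - (gamma / p) * rotated @ np.diag(np.sum(rotated ** 2, axis=0))
        )
        # The nearest orthogonal matrix to the gradient becomes the new rotation
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt
        new_criterion = np.sum(s)
        if criterion != 0 and new_criterion / criterion < 1 + tol:
            break  # converged: negligible improvement
        criterion = new_criterion
    return loadings @ rotation, rotation


# Hypothetical unrotated loadings for five variables on two factors
L = np.array([[0.7, 0.3], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8], [0.6, 0.5]])
rotated, R = varimax(L)
print(np.round(rotated, 2))
```

An orthogonal rotation only redistributes variance among the factors, so each variable's communality (the row sum of squared loadings) is unchanged. Comparing such a solution with an oblique one is one way to check whether the choice of rotation affects the results for a given data set.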
27
Z-scores were preferred to regression-based estimates of factor scores here because adding the multivariate
outliers back into the data produced a slightly different and inferior factor solution, and thus factors and factor
scores that were not directly comparable to the preferred solution above or to the regression-based factor score
estimates derived from it.
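One widely used way of computing factor scores from z-scores, following Biber (1988), works in two steps. Each variable is first standardized across texts. Then, for each factor, the z-scores of variables with salient positive loadings are added and those of variables with salient negative loadings are subtracted. The Python sketch below assumes that procedure, a 0.30 salience cut-off, and hypothetical function names. The note above does not specify these details, so the sketch should not be read as the study's exact computation. Biber (1988) additionally counts each variable only on the factor where it loads most strongly, a refinement omitted here.

```python
import numpy as np


def z_scores(X):
    """Standardize each column (variable) to mean 0 and SD 1; assumes no constant column."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def factor_scores(X, loadings, salience=0.30):
    """Sum z-scores of salient positive loaders minus those of salient negative loaders.

    X: texts-by-variables frequencies; loadings: variables-by-factors matrix.
    The 0.30 cut-off is an illustrative assumption.
    """
    loadings = np.asarray(loadings, dtype=float)
    weights = np.where(loadings >= salience, 1.0,
                       np.where(loadings <= -salience, -1.0, 0.0))
    return z_scores(X) @ weights  # texts-by-factors


# Hypothetical frequencies (3 texts, 3 variables) and loadings on one factor
X = np.array([[1.0, 10.0, 3.0], [2.0, 20.0, 1.0], [3.0, 30.0, 2.0]])
loadings = np.array([[0.8], [0.6], [-0.5]])
print(factor_scores(X, loadings))
```

Because these scores depend only on the loadings and on each text's standardized frequencies, they can be computed for any text, including cases excluded from the factor extraction.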
28
Vuk Stefanović Karadžić (1787-1864) was a Serbian language and literary scholar who created the
spelling system in use in contemporary Serbian and other Central South Slavic varieties and published
several early Serbian dictionaries and editions of Southern Slavic folk literature (see
both concepts have been found to be useful in the analysis of discourse, lexical patterns here are seemingly too
heterogeneous for any clear prosodies to emerge on account of a) the topical heterogeneity of the research
corpus, and b) the multifariousness of the concept of language compared to concepts such as age, for
reforma ‘reform’, Stefanović, jezik ‘language’, srpski ‘Serbian’, sabor ‘assembly’, prvi ‘first’, [Petar II
Serbian writer and government minister in the early 2000s], Dositej [Obradović, 1739-1811, Serbian writer,
Enlightenment philosopher, and the first Minister of Education of Serbia], zadužbina ‘endowment’, reč
‘word’, nagrada ‘award’, jezički ‘linguistic’, delo ‘work’, ministar ‘minister’, and knjiga ‘book’.
31
The verb postoji ‘exists’ was identified as a shared significant collocate of the core concept lemmas
bosanski jezik ‘Bosnian language’ (8 occurrences, MI score = 6.696) and crnogorski jezik ‘Montenegrin
language’ (18 occurrences, MI score = 7.223), but not hrvatski jezik ‘Croatian language’. Although it is
clear from other evidence (e.g., excerpts from texts representative of factors/discourses) that Croatian also
is routinely discursively constructed as non-existent, Croatian, as noted already, has historically enjoyed a
more or less equal status with Serbian and is often implicitly treated as more legitimate than either Bosnian
or Montenegrin. It should further be noted that other lexical items are also used for the same purpose (e.g.,
associate.
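The MI scores reported in Endnote 31 compare how often a collocate actually occurs near the node (the observed frequency O) with how often it would be expected to do so by chance. In the standard pointwise formula, E = f(node) x f(collocate) / N, where N is the corpus size, and MI = log2(O / E). The Python sketch below implements this formula with an optional span factor that some tools use to scale E. The argument names and that scaling are assumptions. The exact formula used by WordSmith Tools is documented in Scott (2014a), and the sketch is not expected to reproduce the reported values.

```python
import math


def mutual_information(observed, node_freq, collocate_freq, corpus_size, span=1):
    """Pointwise mutual information (in bits) of a node-collocate pair.

    observed: co-occurrences of the collocate near the node.
    span: optional factor scaling the expected frequency to the window size
    (span=1 gives the unscaled formula); how a given tool scales it is an assumption.
    """
    expected = node_freq * collocate_freq * span / corpus_size
    return math.log2(observed / expected)


# Hypothetical frequencies: 8 co-occurrences, node 100, collocate 50, corpus 1,000,000
print(round(mutual_information(8, 100, 50, 1_000_000), 3))  # 10.644
```

Because the expected frequency sits in the denominator, MI rewards pairs involving rare words. This is one reason MI is often reported together with a minimum co-occurrence frequency.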
33
Following previous research (e.g., Vessey, 2013a), the ‘plot’ and ‘patterns’ functions of the WST
concordancer were also considered. The ‘plot’ function calculates the total number of hits for each text (but, as
already mentioned above, only for individual lemma forms) as well as their dispersion throughout a text.
The ‘patterns’ function presents identified collocates in a table ordered by their frequencies in each slot in
the collocation horizon around the node word (L5-R5). Unfortunately, both proved to be of only marginal use
with a data set of this size, so they were excluded from further analysis.
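The ‘patterns’ display described above can be approximated by tabulating, for each slot from L5 to R5, how often each word occurs at that distance from the node. The Python sketch below is a hypothetical reconstruction for illustration, not WordSmith’s implementation. It takes a token list and a set of node forms (for example, the forms of JEZIK). The function name and the horizon parameter are assumptions.

```python
from collections import Counter, defaultdict


def positional_collocates(tokens, node_forms, horizon=5):
    """Count words by slot (L5..L1, R1..R5 for horizon=5) around each node occurrence."""
    slots = defaultdict(Counter)
    for i, token in enumerate(tokens):
        if token not in node_forms:
            continue
        for offset in range(-horizon, horizon + 1):
            j = i + offset
            if offset == 0 or not 0 <= j < len(tokens):
                continue  # skip the node itself and positions outside the text
            label = f"L{-offset}" if offset < 0 else f"R{offset}"
            slots[label][tokens[j]] += 1
    return slots


# 'the Serbian language is rich and the Serbian language is old'
tokens = "srpski jezik je bogat a srpski jezik je star".split()
table = positional_collocates(tokens, {"jezik"}, horizon=2)
for slot in ["L2", "L1", "R1", "R2"]:
    print(slot, table[slot].most_common())
```

With a large corpus, each slot holds a long frequency list, which illustrates why such a table becomes hard to read at scale.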
34
In the original text, the salient collocates are in bold. Because literal translations into English make for poor
readability, the translated text may include equivalents which are not identified as salient collocates in the
original text. In the translated text, both salient collocates and equivalents are underlined.
35
Nicholas I, Nikola Petrović (1841-1921), prince and king of Montenegro (see
http://www.britannica.com/EBchecked/topic/414057/Nicholas-I).
36
In a comprehensive study of nationalism in popular Serbian literature from the critical period between
1985 and 1995, Žunić (1999) suggests that Serbian literary Romanticism played a formative role in the
development of Serbian nationalism. Furthermore, he finds evidence of an instrumentalization of literature
in the Serbian nationalist project around the breakup of Yugoslavia. In line with the prevalent
understanding of the role of literature in Serbian society (briefly exemplified in Section 7), some of the
most prominent popular literary figures of this period whose works Žunić (1999) studies were or still are
members of SANU (e.g., former President of the Federal Republic of Yugoslavia and one of the authors of
References
Aarsleff, H. (1982). From Locke to Saussure: Essays on the study of language and
intellectual history. Minneapolis: University of Minnesota Press.
Abd-el-Jawad, H. R. S., & Al-Haq, F. A. A. (1997). The impact of the peace process in
the Middle East on Arabic. In M. Clyne (Ed.), Undoing and redoing corpus
planning (pp. 415-444). Berlin, New York: Mouton de Gruyter.
Althusser, L. (1971). Lenin and philosophy and other essays. London: New Left Books.
Anderson, B. (1983). Imagined communities. London: Verso.
Baker, P. (2004). Querying keywords: Questions of difference, frequency and sense in
keywords analysis. Journal of English Linguistics, 32(4), 346-359.
Baker, P. (2006). Using corpora in discourse analysis. London: Continuum.
Baker, P. (2010). Sociolinguistics and corpus linguistics. Edinburgh: Edinburgh
University Press.
Baker, P., Gabrielatos, C., KhosraviNik, M., Krzyzanowski, M., McEnery, T., &
Wodak, R. (2008). A useful methodological synergy? Combining critical
discourse analysis and corpus linguistics to examine discourses of refugees and
asylum seekers in the UK press. Discourse & Society, 19(3), 273-306.
Baker, P., Gabrielatos, C., & McEnery, T. (2013). Sketching Muslims: A corpus driven
analysis of representations around the word ‘Muslim’ in the British press 1998-
2009. Applied Linguistics, 34(3), 255-278.
Barbour, S. (2000). Nationalism, language, Europe. In S. Barbour & C. Carmichael
(Eds.), Language and nationalism in Europe (pp. 1-17). Oxford: Oxford
University Press.
Barić, E., Hudeček, L., Koharović, N., Lončarić, M., Lukenda, M., Mamić, M.,
Mihaljević, M., Šarić, Lj., Švaćko, V., Vukojević, L., Zečević, V., & Žagar, M.
(1999). Hrvatski jezični savjetnik [Croatian language handbook]. Zagreb: Školske
Novine.
Bassi, E. (2010). A contrastive analysis of keywords in newspaper articles on the “Kyoto
Protocol”. In M. Bondi & M. Scott (Eds.), Keyness in texts (pp. 207-218).
Amsterdam: John Benjamins Publishing.
Bauman, R., & Briggs, C. L. (2000). Language philosophy as language ideology: John
Locke and Johann Gottfried Herder. In P. V. Kroskrity (Ed.), Language regimes:
Ideologies, polities, and identities (pp. 139-204). Santa Fe, New Mexico: School
of American Research Press.
Bauman, R., & Briggs, C. L. (2003). Voices of modernity: Language ideologies and the
politics of inequality. Cambridge: Cambridge University Press.
Berber Sardinha, T. (1999). Using key words in text analysis: Practical aspects. Direct
Papers 42, 1-9. ISSN 1413-442x.
Berber Sardinha, T. (2004). Lingüística de corpus. Barueri: Manole.
Biber, D. (1988). Variation across speech and writing. Cambridge: Cambridge
University Press.
Biber, D. (1993). Representativeness in corpus design. Literary and Linguistic
Computing, 8(4), 243-257.
Biber, D. (2006). University language: A corpus-based study of spoken and written
registers. Amsterdam: John Benjamins Publishing.
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at…: Lexical bundles in
university teaching and textbooks. Applied Linguistics, 25(3), 371-405.
Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language
structure and use. Cambridge: Cambridge University Press.
Biber, D., & Staples, S. (in press). Cluster analysis. In L. Plonsky (Ed.), Advancing
quantitative methods in second language learning. Routledge.
Biserko, S. (2012). Yugoslavia’s implosion: The fatal attraction of Serbian nationalism.
Belgrade: The Norwegian Helsinki Committee. Retrieved from
http://www.helsinki.org.rs/doc/yugoslavias%20implosion.pdf [Last accessed May
5, 2015]
Blackledge, A. (2005). Discourse and power in a multilingual world. Amsterdam: John
Benjamins Publishing.
Blackledge, A., & Pavlenko, A. (Eds.) (2002). Language ideologies in multilingual
contexts [Special Issue]. Multilingua, 21(2/3).
Blommaert, J. (Ed.) (1999). Language ideological debates. Berlin, New York: Mouton de
Gruyter.
Blommaert, J. (2005). Discourse: A critical introduction. Cambridge: Cambridge
University Press.
Blommaert, J. (2006a). Language ideology. In K. Brown (Ed.), Encyclopedia of language
& linguistics (pp. 510-522). Boston: Elsevier.
Blommaert, J. (2006b). Language policy and national identity. In T. Ricento (Ed.),
Introduction to language policy: Theory and method (pp. 238-253). Malden, MA:
Blackwell Publishing.
Blommaert, J., & Verschueren, J. (1998). The role of language in European nationalist
ideologies. In B. B. Schieffelin, K. A. Woolard & P. V. Kroskrity (Eds.),
Language ideologies: Practice and theory (pp. 189-210). New York: Oxford
University Press.
Bondi, M., & Scott, M. (Eds.) (2010). Keyness in texts. Amsterdam: John Benjamins.
Bourdieu, P. (1991). Language and symbolic power. Cambridge, MA: Harvard
University Press.
Brown, G., & Yule, G. (1983). Discourse analysis. Cambridge: Cambridge University
Press.
Bugarski, R. (2004). Language policies in the successor states of former Yugoslavia.
Journal of Language and Politics, 3(2), 189-207.
Carmichael, C. (2000). ‘A people exists and that people has its language’: Language and
nationalism in the Balkans. In S. Barbour & C. Carmichael (Eds.), Language and
nationalism in Europe (pp. 220-239). Oxford: Oxford University Press.
Chen, Y., & Baker, P. (2010). Lexical bundles in L1 and L2 academic writing. Language
Learning & Technology, 14(2), 30-49.
Cheng, W., & Lam, P. W. Y. (2013). Western perceptions of Hong Kong ten years on: A
corpus driven critical discourse study. Applied Linguistics, 34(2), 173-190.
Cortes, V., & Csomay, E. (Eds.) (2015). Corpus-based research in applied linguistics:
Studies in honor of Doug Biber. Amsterdam: John Benjamins.
Crapanzano, V. (2000). Serving the word: Literalism in America from the pulpit to the
bench. New York: New Press (distributed by W.W. Norton).
Culpeper, J. (2009). Keyness: Words, parts-of-speech and semantic categories in the
character-talk of Shakespeare’s Romeo and Juliet. International Journal of
Corpus Linguistics, 14(1), 29–59.
de Beaugrande, R. (1999). Discourse studies and the ideology of ‘liberalism’. Discourse
Studies, 1(3), 259-295.
DiGiacomo, (1999). Language ideological debates in an Olympic city: Barcelona 1992-
1996. In J. Blommaert (Ed.), Language ideological debates (pp. 105-142). Berlin,
New York: Mouton de Gruyter.
Dirven, R., Hawkins, B., & Sandikcioglu, E. (Eds.) (2001). Language and ideology (Vols.
1 & 2). Philadelphia: John Benjamins Publishing Company.
Đoković, D., Hrvatin, S. B., & Petković, B. (2004). Media ownership and its influence on
independence and pluralism of media in Serbia and the region. Belgrade: Medija
centar. Retrieved from
http://www.mc.rs/upload/documents/biblioteka/vlasnistvomedija1.pdf [Last
accessed May 5, 2015]
Dronjic, V. (2011). Serbo-Croatian: The making and breaking of an ausbausprache.
Language Problems & Language Planning, 35(1), 1-14.
Durrant, P. (2009). Investigating the viability of a collocation list for students of English
for Academic Purposes. English for Specific Purposes, 28(3), 157-169.
Eagleton, T. (1991). Ideology: An introduction. London: Verso.
Edwards, J. (1985). Language, society, and identity. Oxford: Blackwell.
Ensslin, A. (2010). ‘Black and white’: Language ideologies in computer game discourse.
In S. Johnson, & T. M. Milani (Eds.), Language ideologies and media discourse:
Texts, practices, politics (pp. 205-222). London: Continuum.
Ensslin, A., & Johnson, S. (2006). Language in the news: Investigating representations of
“Englishness” using WordSmith Tools. Corpora, 1(2), 153-185.
Erjavec, K. (2009). The Bosnian “war on terrorism”. Journal of Language and Politics,
8(1), 5-27.
Fairclough, N. (2001). Language and power (2nd ed.). Essex: Pearson Education
Limited.
Fairclough, N. (2010). Critical discourse analysis: The critical study of language (2nd
ed.). Harlow: Longman.
Fishman, J. A. (1972). Language and nationalism. Rowley, MA: Newbury House Publishers.
Fishman, J. A. (1997). Language and ethnicity: The view from within. In F. Coulmas
(Ed.), The handbook of sociolinguistics (pp. 327-343). Oxford: Blackwell.
Fitzsimmons Doolan, S. (2009). Is public discourse about language policy really public
discourse about immigration? A corpus-based study. Language Policy, 8(4), 377-
402.
Fitzsimmons Doolan, S. (2011). Identifying and describing language ideologies related
to Arizona educational language policy (Unpublished doctoral dissertation).
Northern Arizona University, Flagstaff, AZ. (UMI No. 3467048)
Fitzsimmons Doolan, S. (2014). Using lexical variables to identify language ideologies in
a policy corpus. Corpora, 9(1), 57-82.
Fleischer, A. A. (2007). The politics of language in Quebec: Language policy and
language ideologies in a pluriethnic society (Unpublished doctoral dissertation).
Georgetown University, Washington, D.C.
Ford, C. (2001). The (re-)birth of Bosnian: Comparative perspectives on language
planning in Bosnia-Herzegovina (Unpublished doctoral dissertation). University
of North Carolina at Chapel Hill, Chapel Hill, NC.
Foucault, M. (1972). The archeology of knowledge. London: Tavistock.
Fought, C. (2006). Language and ethnicity. Cambridge: Cambridge University Press.
Fowler, R. (1991). Language in the news. London: Routledge.
Fraysee-Kim, S. H. (2010). Keywords in Korean national consciousness: A corpus-based
analysis of school textbooks. In M. Bondi & M. Scott (Eds.), Keyness in texts (pp.
219-234). Amsterdam: John Benjamins Publishing.
Freake, R. (2011). A cross-linguistic corpus-assisted discourse study of language
ideology in Canadian newspapers. Paper presented at the Corpus Linguistics
Conference, Birmingham, England. Retrieved from http://www.birmingham.ac.uk
/documents/college-artslaw/corpus/conference-archives/2011/paper-17.pdf [Last
accessed May 5, 2015]
Freake, R., Gentil, G., & Sheyholislami, J. (2011). A bilingual corpus-assisted discourse
study of the construction of nationhood and belonging in Quebec. Discourse &
Society, 22(1), 21-47.
Friedman, V. (1999). Linguistic emblems and emblematic languages: On language as
flag in the Balkans. Columbus: Department of Slavic and East European
Languages and Literatures at the Ohio State University.
Gal, S. (1998). Multiplicity and contention among language ideologies: A commentary.
In B. B. Schieffelin, K. A. Woolard & P. V. Kroskrity (Eds.), Language
ideologies: Practice and theory (pp. 317-331). New York: Oxford University
Press.
Gal, S. (2001). Linguistic theories and national images in nineteenth-century Hungary. In
S. Gal, & K. A. Woolard (Eds.), Languages and publics: The making of authority
(pp. 30-45). Manchester: St. Jerome.
Gal, S., & Woolard, K. A. (Eds.) (2001). Languages and publics: The making of
authority. Manchester: St. Jerome.
Gee, J. P. (2010). An introduction to discourse analysis: Theory and method. London:
Routledge.
Giddens, A. (1979). Central problems in social theory: Action, structure and
contradiction in social analysis. Berkeley and Los Angeles: University of
California Press.
Gramsci, A. (1971). Selections from the prison notebooks of Antonio Gramsci. London:
Lawrence & Wishart.
Gray, B., & Biber, D. (2013). Lexical frames in academic prose and
conversation. International Journal of Corpus Linguistics, 18(1), 109-135.
Greenberg, R. D. (2004). Language and identity in the Balkans: Serbo-Croatian and its
disintegration. Oxford: Oxford University Press.
Gröschel, B. (2009). Das Serbokroatische zwischen Linguistik und Politik: Mit einer
Bibliographie zum postjugoslawischen Sprachstreit [Serbo-Croatian between
linguistics and politics: With a bibliography on the post-Yugoslav language
conflict]. München: Lincom Europa.
Habermas, J. (1989). The structural transformation of the public sphere. Cambridge,
MA: MIT Press.
Halliday, M. A. K., & Matthiessen, C. M. I. M. (2004). Introduction to functional
grammar. New York: Routledge.
Hardt-Mautner, G. (1995). ‘Only connect’: Critical discourse analysis and corpus
linguistics. Retrieved from http://ucrel.lancs.ac.uk/papers/techpaper/vol6.pdf
[Last accessed May 5, 2015]
Haugen, E. (1972). Dialect, language, nation. In J. B. Pride & J. Holmes
(Eds.), Sociolinguistics (pp. 97-111). Harmondsworth: Penguin. (Originally
published in American Anthropologist 68 (1966): 922-935.)
Heller, M. (1999). Heated language in a cold climate. In J. Blommaert (Ed.), Language
ideological debates (pp. 143-172). Berlin, New York: Mouton de Gruyter.
Herzfeld, M. (1987). Anthropology through the looking-glass: Critical ethnography in
the margins of Europe. Cambridge: Cambridge University Press.
Hobsbawm, E. (1990). Nations and nationalism since 1780. Cambridge: Cambridge
University Press.
Hornberger, N. H., & McKay, S. L. (Eds.) (2010). Sociolinguistics and language
education. Bristol: Multilingual Matters.
Hult, F. M., & Pietikäinen, S. (2014). Shaping discourses of multilingualism through a
language ideological debate: The case of Swedish in Finland. Journal of
Language and Politics, 13(1), 1-20.
Hunston, S. (2002). Corpora in applied linguistics. Cambridge: Cambridge University
Press.
IBM (2012). Statistical Package for the Social Sciences.
Irvine, J. T. (1989). When talk isn’t cheap: Language and political economy. American
Ethnologist, 16(2), 248-267.
Irvine, J. T., & Gal, S. (2000). Language ideology and linguistic differentiation. In P. V.
Kroskrity (Ed.), Language regimes: Ideologies, polities, and identities (pp. 35-
83). Santa Fe, New Mexico: School of American Research Press.
Jaffe, A. (1999). Ideologies in action: Language politics on Corsica. Berlin: Mouton de
Gruyter.
Johnson, S., & Ensslin, A. (2007). Language in the media: Representations, identities,
ideologies. New York: Continuum.
Johnson, S., & Milani, T. M. (Eds.) (2010). Language ideologies and media discourse:
Texts, practices, politics. London: Continuum.
Johnson, S., Milani, T. M., & Upton, C. (2010). Language ideological debates on the
BBC ‘Voices’ website: Hypermodality in theory and practice. In S. Johnson, & T.
M. Milani (Eds.), Language ideologies and media discourse: Texts, practices,
politics (pp. 223-251). London: Continuum.
Johnson, S., & Suhr, S. (2003). From ‘political correctness’ to ‘politische Korrektheit’:
Discourses of ‘PC’ in the German newspaper, Die Welt. Discourse & Society,
14(1), 49-68.
Katičić, R. (1997). Undoing a ‘unified language’: Bosnian, Croatian, Serbian. In M.
Clyne (Ed.), Undoing and redoing corpus planning (pp. 269-289). Berlin: Mouton
de Gruyter.
Kloss, H. (1967). ‘Abstand languages’ and ‘Ausbau languages’. Anthropological
Linguistics, 9(7), 29-41.
Kordić, S. (2010). Jezik i nacionalizam [Language and nationalism]. Zagreb: Durieux.
Kroskrity, P. V. (1998). Arizona Tewa Kiva speech as a manifestation of a dominant
language ideology. In B. B. Schieffelin, K. A. Woolard & P. V. Kroskrity (Eds.),
Language ideologies: Practice and theory. New York: Oxford University Press.
Kroskrity, P. V. (Ed.) (2000a). Language regimes: Ideologies, polities, and identities.
Santa Fe, New Mexico: School of American Research Press.
Kroskrity, P. V. (2000b). Regimenting languages: Language ideological perspectives. In
P. V. Kroskrity (Ed.), Language regimes: Ideologies, polities, and identities (pp.
1-34). Santa Fe, New Mexico: School of American Research Press.
Kroskrity, P. V. (2004). Language ideologies. In A. Duranti (Ed.), A companion to
linguistic anthropology (pp. 496-517). Malden, MA: Blackwell Publishing.
Kuo, S., & Nakamura, M. (2005). Translation or transformation? A case study of
language and ideology in the Taiwanese press. Discourse & Society, 16(3), 393-
417.
Lippi-Green, R. (2007). English with an accent: Language, ideology, and discrimination
in the United States. London: Routledge.
Luuk, E. (2013). The structure and evolution of symbol. New Ideas in Psychology, 31(2),
87-97.
Mautner, G. (2007). Mining large corpora for social information: The case of elderly.
Language in Society, 36, 51-72.
May, S. (2001). Language and minority rights: Ethnicity, nationalism and the politics of
language. Harlow: Pearson.
McEnery, T., Xiao, R., & Tono, Y. (2006). Corpus-based language studies: An advanced
resource book. New York: Routledge.
McGroarty, M. (2008). The political matrix of linguistic ideologies. In B. Spolsky & F.
M. Hult (Eds.), The handbook of educational linguistics (pp. 98-112). Malden,
MA: Blackwell Publishing.
McGroarty, M. (2010). Language and ideologies. In N. H. Hornberger & S. L. McKay
(Eds.), Sociolinguistics and language education (pp. 3-39). Bristol: Multilingual
Matters.
Milroy, J. (2001). Language ideologies and the consequences of standardization. Journal
of Sociolinguistics, 5(4), 530-555.
Moskovljević, M. (1966). Rečnik savremenog srpskohrvatskog jezika s jezičkim
savetnikom [Dictionary of contemporary Serbo-Croatian with a language
handbook]. Beograd: Tehnička Knjiga i Nolit.
O’Rourke, B., & Ramallo, F. (2013). Competing ideologies of linguistic authority
amongst new speakers in contemporary Galicia. Language in Society, 42(3), 287-
305.
Partington, A. (2003). The linguistics of political argument: The spin-doctor and the
wolf-pack at the White House. London: Routledge.
Partington, A. (2010). Modern Diachronic Corpus-Assisted Discourse Studies (MD-
CADS) [Special Issue]. Corpora, 5(2).
Pennycook, A. (2001). Critical applied linguistics: A critical introduction. Mahwah, NJ:
Routledge.
Pujolar, J. (2007). The future of Catalan: Language endangerment and nationalist
discourses in Catalonia. In A. Duchêne & M. Heller (Eds.), Discourses of
endangerment: Ideology and interest in the defence of languages (pp. 121-148).
London: Continuum.
Rayson, P. (2008). From key words to key semantic domains. International Journal of
Corpus Linguistics, 13(4), 519-549.
Reisigl, M., & Wodak, R. (2009). The discourse-historical approach (DHA). In R. Wodak
& M. Meyer (Eds.), Methods of critical discourse analysis: Theory and method
(pp. 87-121). London: SAGE.
Ricento, T. (Ed.) (2000). Ideology, politics and language policies: Focus on English.
Philadelphia: John Benjamins Publishing.
Ricento, T. (2003). The discursive construction of Americanism. Discourse & Society,
14(5), 611-637.
Ricento, T. (2006). Americanization, language ideologies and the construction of
European identities. In C. Mar-Molinero, & P. Stevenson (Eds.), Language
ideologies, policies and practices: Language and the future of Europe (pp. 44-57).
Basingstoke: Palgrave Macmillan.
Rumsey, A. (1990). Wording, meaning and linguistic ideology. American Anthropologist,
92(2), 346-361.
Safran, W. (1999). Nationalism. In J. A. Fishman (Ed.), Handbook of language & ethnic
identity (pp. 77-93). Oxford: Oxford University Press.
Salama, A. H. Y. (2011). Ideological collocation and the recontextualization of Wahhabi-
Saudi Islam post-9/11: A synergy of corpus linguistics and critical discourse
analysis. Discourse & Society, 22(3), 315-342.
SANU (1986). Memorandum srpske akademije nauka i umetnosti (nacrt) [A draft
memorandum of the Serbian Academy of Sciences and Arts]. Retrieved from
http://www.helsinki.org.rs/serbian/doc/memorandum%20sanu.pdf [Last accessed
May 5, 2015]
Schieffelin, B. B., Woolard, K. A., & Kroskrity, P. V. (Eds.) (1998). Language
ideologies: Practice and theory. New York: Oxford University Press.
Scott, M. (1997). PC analysis of key words – and key key words. System, 25(2), 233-245.
Scott, M. (2009). In search of a bad reference corpus. In D. Archer (Ed.), What’s in a word-
list? Investigating word frequency and keyword extraction (pp. 79-92). Oxford:
Ashgate.
Scott, M. (2010). Problems in investigating keyness, or clearing the undergrowth and
marking out our trails… In M. Bondi & M. Scott (Eds.), Keyness in texts (pp. 43-
58). Amsterdam: John Benjamins Publishing.
Scott, M. (2014a). WordSmith Tools Help Manual. Version 6.0. Liverpool: Lexical
Analysis Software.
Scott, M. (2014b). WordSmith Tools. Liverpool: Lexical Analysis Software.
Scott, M. R., & Tribble, C. (2006). Key words and corpus analysis in language education.
Amsterdam: John Benjamins Publishing.
Seargeant, P. (2009). Language ideology, language theory, and the regulation of
linguistic behavior. Language Sciences, 31, 345-359.
Silverstein, M. (1979). Language structure and linguistic ideology. In P. R. Clyne, W.
Hanks & C. Hofbauer (Eds.), The elements: A parasession on linguistic units and
levels (pp. 193-247). Chicago: Chicago Linguistic Society.
Silverstein, M. (1993). Metapragmatic discourse and metapragmatic function. In J. A.
Lucy (Ed.), Reflexive language: Reported speech and metapragmatics (pp. 33-
58). Cambridge: Cambridge University Press.
Silverstein, M. (1998). The uses and utility of ideology: A commentary. In B. B.
Schieffelin, K. A. Woolard & P. V. Kroskrity (Eds.), Language ideologies:
Practice and theory (pp. 123-145). New York: Oxford University Press.
Silverstein, M. (2000). Whorfianism and the linguistic imagination of nationality. In P. V.
Kroskrity (Ed.), Language regimes: Ideologies, polities, and identities (pp. 85-138).
Santa Fe, New Mexico: School of American Research Press.
Sinclair, J. M. (1991). Corpus, concordance, collocation. Oxford: Oxford University
Press.
Skutnabb-Kangas, T. (2000). Linguistic genocide in education or worldwide diversity
and human rights? Mahwah, NJ: Lawrence Erlbaum Associates.
Spitulnik, D. (1998). Mediating unity and diversity: The production of language
ideologies in Zambian broadcasting. In B. B. Schieffelin, K. A. Woolard & P. V.
Kroskrity (Eds.), Language ideologies: Practice and theory (pp. 163-188). New
York: Oxford University Press.
Spolsky, B. (2004). Language policy. Cambridge: Cambridge University Press.
Stubbs, M. (1983). Discourse analysis: The sociolinguistic analysis of natural language.
Chicago: University of Chicago Press.
Stubbs, M. (1996). Text and corpus analysis. London: Blackwell.
Stubbs, M. (2010). Three concepts of keywords. In M. Bondi & M. Scott (Eds.), Keyness
in texts (pp. 21-42). Amsterdam: John Benjamins Publishing.
Subtirelu, N. C. (2015). “She does have an accent but…”: Race and language ideology in
students’ evaluations of mathematics instructors on RateMyProfessors.com.
Language in Society, 44(1), 35-62.
Sudetic, C. (1993, December 26). Balkan conflicts are uncoupling Serbo-Croatian. The
New York Times. Retrieved from
http://www.nytimes.com/1993/12/26/world/balkan-conflicts-are-uncoupling-serbo-croatian.html
[Last accessed May 5, 2015]
Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics. Boston, MA:
Pearson Education, Inc.
Thompson, J. B. (1984). Studies in the theory of ideology. Cambridge: Polity Press.
Tognini-Bonelli, E. (2001). Corpus linguistics at work. Amsterdam: John Benjamins
Publishing Company.
Tomić, Y. (n.d.). The ideology of Greater Serbia in the nineteenth and twentieth
centuries: An expert report. Paris: Bibliothèque de documentation internationale
contemporaine, Université de Paris X-Nanterre. Retrieved from
http://www.helsinki.org.rs/serbian/doc/expert%20report%20-%20yves%20tomic.pdf
[Last accessed May 5, 2015]
Tošović, B. (n.d.). Herausbildung des Bosnischen/Bosniakischen [The development of
Bosnian/Bosniak]. Retrieved from
http://www-gewi.uni-graz.at/gralis/Slawistikarium/BKS/Herausbildung_Bosnisch-Bosniakisch.pdf
[Last accessed May 5, 2015]
van Dijk, T. A. (1998). Ideology: A multidisciplinary approach. London: SAGE.
van Dijk, T. A. (2006). Ideology and discourse analysis. Journal of Political Ideologies,
11(2), 115-140.
Vessey, R. (2013a). Language ideologies and discourses of national identity in Canadian
newspapers: A cross-linguistic corpus-assisted discourse study (Unpublished
doctoral dissertation). University of London, London.
Vessey, R. (2013b). Too much French? Not enough French?: The Vancouver Olympics
and a very Canadian language ideological debate. Multilingua, 32(5), 659-682.
Vikør, L. S. (2000). Northern Europe: Languages as prime markers of ethnic and national
identity. In S. Barbour & C. Carmichael (Eds.), Language and nationalism in
Europe (pp. 105-129). Oxford: Oxford University Press.
Völkl, S. D. (2002). Bosnisch [Bosnian]. In M. Okuka (Ed.), Lexikon der Sprachen des
europäischen Ostens [The lexicon of East European languages] (Vol. 10, pp. 209-
218). Klagenfurt: Wieser. Retrieved from
http://wwwg.uni-klu.ac.at/eeo/Bosnisch.pdf [Last accessed May 5, 2015]
Voss, C. (2006). The Macedonian standard language: Tito-Yugoslav experiment or
symbol of ‘Great Macedonian’ ethnic inclusion? In C. Mar-Molinero & P.
Stevenson (Eds.), Language ideologies, policies and practices. Language and the
future of Europe (pp. 118-132). Basingstoke: Palgrave Macmillan.
Wallis, D. A. (1998). Language, attitude, and ideology: An experimental social-
psychological study. Journal of Pragmatics, 30, 21-48.
Warren, M. (2010). Identifying aboutgrams in engineering texts. In M. Bondi & M. Scott
(Eds.), Keyness in texts (pp. 113-126). Amsterdam: John Benjamins Publishing.
Wilce, J. (2010). Society, language, history and religion: A perspective on Bangla from
linguistic anthropology. In T. Omoniyi (Ed.), The sociology of language and
religion: Change, conflict, and accommodation (pp. 126-155). London: Palgrave Macmillan.
Williams, R. (1976). Keywords: A vocabulary of culture and society. New York: Oxford
University Press.
Wodak, R. (2001). The discourse-historical approach. In R. Wodak & M. Meyer (Eds.),
Methods of critical discourse analysis (pp. 63-94). London: SAGE.
Wodak, R. (2004). Critical discourse analysis. In C. Seale, G. Gobo, J. F. Gubrium & D.
Silverman (Eds.), Qualitative research practice (pp. 197-213). London: SAGE.
Wodak, R. (2012). Language, power and identity. Language Teaching, 45(2), 215-233.
Wodak, R., de Cillia, R., Reisigl, M., & Liebhart, K. (1999). The discursive
construction of national identity. Edinburgh: Edinburgh University Press.
Wodak, R., & Meyer, M. (Eds.) (2009). Methods of critical discourse analysis: Theory
and method. London: SAGE.
Woolard, K. A. (1998). Introduction: Language ideology as a field of inquiry. In B. B.
Schieffelin, K. A. Woolard & P. V. Kroskrity (Eds.), Language ideologies:
Practice and theory (pp. 3-47). New York: Oxford University Press.
Woolard, K. A., & Schieffelin, B. B. (1994). Language ideology. Annual Review of
Anthropology, 23, 55-82.
Wright, S. (2004). Language policy and language planning: From nationalism to
globalization. New York: Palgrave Macmillan.
Xiao, R., & McEnery, T. (2005). Two approaches to genre analysis: Three genres in
Modern American English. Journal of English Linguistics, 33(1), 62-82.
Žunić, D. (1999). Nacionalizam i književnost: Srpska književnost 1985-1995
[Nationalism and literature: Serbian literature 1985-1995]. Budapest, Hungary: Open
Society Institute. Retrieved from
http://rss.archives.ceu.hu/archive/00001127/01/133.pdf [Last accessed May 5,
2015]
Appendices
Although one of the main goals of the methodological synergy between CL and
CDA is to provide an objective and reliable sampling procedure, similar to many other
corpus-based discourse studies, the data set compiled here is too large for a
research deals with this issue in a variety of ways. Studies based on keyword analysis
number of items identified as key. Studies based on collocation analysis, on the other
hand, focus on limited sets of search terms and collocates, as well as random samples of
concordance lines. To solve the problem of oversampling, Hunston (2002), for example,
Somewhat similarly, Baker et al. (2008) suggest a focus on what they call consistent
collocates (i.e., items that appear as significant collocates of the target core concept(s)
throughout the timeframe rather than over isolated periods within it). Vessey (2013a), in
resulting from previous research. Despite the relative practical and theoretical merits of
such downsampling procedures, however, it is clear that they all incur a loss of
Exploratory factor analysis (EFA), on the other hand, is ideally suited for principled
analyses of large data sets with numerous variables, as it employs covariance among
variables to produce sets of mutually positively correlated variables that can help the
researcher identify discourses and ideologies in the data in an objective manner and with
the promise of minimal loss of information. For EFA to identify meaningful covariance
and thus produce useful results, however, variables must meet certain minimal
requirements such as sufficient frequency per observation (e.g., individual text; Douglas
In order to find a solution to this problem, the data were carefully examined from
a variety of standpoints, paying particular attention to the hit count (i.e., number of
occurrences of forms of the lemma JEZIK ‘language’) in individual articles and its
relationship to article content. Although there are various ways to count hits and then
examine articles based on this information, the ‘plot’ function of the concordancer tool in
WordSmith Tools 6.0 (WST; Scott, 2014b) was initially used to sort articles by number of
hits, as this was the quickest and most practical solution. The top ten and bottom ten
articles on the list were then examined closely in their entirety with an eye to potential
Quite expectedly, this examination revealed that higher numbers of hits meant a higher
likelihood of an article potentially containing relevant material. For example, while the
top ten articles tended to have high numbers of hits (e.g., between 22 and 45 for the
nominative form of the lemma JEZIK) and content in which language was explicitly
discussed, the bottom ten articles (and roughly 66% of all articles, see Table 4.6) all had a
single hit and tended to mention language only in passing. Interestingly, Vessey (2013a)
found a similar pattern in Canadian newspaper data in English and French; for illustrative
examples of the pattern here, see Examples 1 and 2 below. Based on this, hit count was
taken as a reasonable indicator of content relevance and thus as a valid sampling
criterion. Further, based in part on the methodological constraints of EFA (Douglas
Biber, personal communication), the cutoff point for inclusion in the research sample was
set at 5 hits per article.
Example 1 (excerpt from “The restoration of Serbian studies,” Politika, July 22,
2006, 45 hits)
Jezik jeste jedan, ali srpski, jeste jedinstven, ali ne i hrvatski, jeste zajednički, ali
samo po upotrebi, no nikako ne i po pripadnosti i poreklu. Akademik Pavle Ivić, u
svojim dijalektološkim studijama, dao je tvrde i nepobitne naučne dokaze da
štokavski dijalekt, po svome poreklu i svojoj prvobitnoj teritorijalnoj
rasprostranjenosti, obuhvata oblasti srednjevekovne srpske države (uglavnom do
reke Cetine). Nema nikakve sumnje da je tim jezikom govorio srpski narod i da je
to jezik srpskog naroda. Nauci je takođe dobro poznato da su čakavski i kajkavski
dijalekti nastali na tlu Hrvatske i da oni predstavljaju izvorni hrvatski jezik.
Hrvati su se svoga čakavskog i kajkavskog jezika odrekli u prvoj polovini 19.
veka i prihvatili su Vukov, srpski, štokavski govor. Tako je srpski jezik postao i
hrvatski, zajednički, srpski i hrvatski, srpski ili hrvatski, srpskohrvatski i
hvratskosrpski. […]
The language [Central South Slavic] is one, but Serbian, it is unified, but not also
Croatian, it is common, but only in terms of use, not in terms of affiliation and
origin. Academician Pavle Ivić, in his dialectological studies, has given hard and
irrefutable evidence that the Štokavian dialect, in terms of its origin and its
original territorial spread, covers the area of the medieval Serbian state (mostly to
the river Cetina). There is absolutely no doubt that this was the language spoken
by the Serbian people and belonging to the Serbian people. At the same time, it
has been scientifically established that the Čakavian and Kajkavian dialects came
into being in the territory of Croatia and that they represent the original Croatian
language. The Croats gave up on their Čakavian and Kajkavian language (sic) in
the first half of the nineteenth century, accepting Vuk’s [Vuk Stefanović Karadžić,
nineteenth century Serbian grammarian and language reformer] Serbian,
Štokavian speech. Thus, Serbian language became also Croatian, shared, Serbian
and Croatian, Serbian or Croatian, Serbo-Croatian or Croato-Serbian. […]
Example 2
Večeras je u Novom Sadu održana svečana sednica Matice srpske povodom 177.
godišnjice postojanja ovog hrama naše kulture. Povodom 50. godišnjice Zbornika
Matice srpske za književnost i jezik besedio je prof. dr Jovan Delić. Na
večerašnjoj svečanosti pesniku Nikoli Vujičiću uručena je Zmajeva nagrada za
2002. godinu za knjigu pesama „Prepoznavanje" koju je izdalo Kulturno društvo
„Prosvjeta" iz Zagreba. Stihove laureata kazivao je dramski umetnik Ivan
Jagodić.
The Matica Srpska [Serbian Language and Literary Society] held a celebratory
session tonight in Novi Sad to mark the 177th anniversary of this temple of our
culture. To mark the 50th anniversary of the Matica Srpska Journal of Literature
and Language, an address was delivered by Dr. Jovan Delić. Also at tonight’s
ceremony, poet Nikola Vujičić received the Zmaj [Jovan Jovanović Zmaj,
nineteenth century Serbian poet] Award for the year 2002 for his book of poetry
titled “Recognition” which was published by the Zagreb-based “Enlightenment”
Cultural Society. A selection of the poet laureate’s verses was performed by
theater actor Ivan Jagodić.
The ‘plot’ function in the WST concordancer (see above), however, is unable to
calculate the total number of hits for any given lemma (i.e., it is only capable of
calculating the number of hits for individual lemma forms separately). In order to arrive
at the total number of hits per article for all forms of the lemma JEZIK, a custom Python
application was used to simultaneously compute the total number of hits for all lemma
forms per article and sort articles into separate folders according to the number of hits.
Following this, another custom Python program was used to calculate the total number of
articles per hit category (1, 2-4, 5-9, and 10+ hits) and publication. Using the above-
mentioned cutoff point of 5 hits per article (again, for all forms of the lemma JEZIK
combined), a total of 1,257 articles with 5+ hits were identified (see Tables 6-9 in
Section 4).
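Because the custom Python programs are not shown in this section, the following is only a minimal sketch of the counting and binning logic they are described as performing. It assumes a hypothetical list of JEZIK forms (`JEZIK_FORMS`), Latin-script article text, and an in-memory input of (publication, text) pairs; all names are illustrative, and the sketch tallies counts rather than moving files into folders.

```python
import re
from collections import Counter

# Hypothetical: inflected forms of JEZIK 'language' (Latin script); the
# study's actual form list is not given in the text.
JEZIK_FORMS = ("jezik", "jezika", "jeziku", "jezikom",
               "jezici", "jezike", "jezicima")

# Whole-word, case-insensitive match of any listed form, so that forms of
# other lemmas (e.g. the adjective jezički 'linguistic') are not counted.
JEZIK_RE = re.compile(r"\b(?:" + "|".join(JEZIK_FORMS) + r")\b", re.IGNORECASE)


def total_hits(text):
    """Combined number of hits for all listed forms of the lemma in one article."""
    return len(JEZIK_RE.findall(text))


def hit_category(hits):
    """Bin a hit count into the categories 1, 2-4, 5-9 and 10+."""
    if hits >= 10:
        return "10+"
    if hits >= 5:
        return "5-9"
    if hits >= 2:
        return "2-4"
    if hits == 1:
        return "1"
    return "0"  # should not occur: every corpus article mentions JEZIK


def tally(articles):
    """Count articles per (publication, hit category).

    `articles` is an iterable of (publication, text) pairs (hypothetical format).
    """
    return Counter((pub, hit_category(total_hits(text))) for pub, text in articles)


def sample(articles, cutoff=5):
    """Keep articles whose combined hit count reaches the cutoff (5+ hits)."""
    return [(pub, text) for pub, text in articles if total_hits(text) >= cutoff]
```

Matching all forms with a single whole-word pattern is what yields one combined total per article, the figure the WST ‘plot’ function could not provide; any real form list would need to be checked against the corpus.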
Another way to test hit count as a sampling criterion is to compare, across the two sections,
the frequencies of forms of the lemma JEZIK and of its collocates directly related to the
between language ideologies and ethnonationalism (Table A1). As another example, I also
include the verb postoji ‘exists’, which here suggests a pervasive discourse and concomitant
ideology of contestation.
Table A1
Even a quick glance at the results of frequency and collocation analyses on the 1
hit and 5+ hits sections shows that while forms of the lemma JEZIK can be found in both
sections of the corpus, they are roughly eight times more frequent in the 5+ hits section
overall (11,202 vs. 1,423 occurrences per million words), which of course is expected
given the selection criterion here. More importantly, the pertinent ethnolinguistic
collocates and the verb postoji ‘exists’ as another indicator of relevant discourses (all
numbers of explicit references to language. Perhaps the starkest example here is the
collocate ‘Bosniak’ (103.7 vs. 1.9 occurrences per million words), which as a language
label is an indicator of a pervasive discourse of contestation (as many Serbian but also
Croatian linguists, politicians, and public figures argue that the language of Bosniaks can
only be called Bosniak, after the people, and not Bosnian, after the country, as that
identities within Bosnia, even if all three languages are official according to the country’s
constitution).
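The comparisons in this passage rest on simple arithmetic: raw counts are scaled to occurrences per million words so that sections of different sizes can be compared, and statements such as ‘roughly eight times more frequent’ are ratios of the normalized values. A minimal sketch with illustrative function names, applied to the per-million figures reported above:

```python
def per_million(count, corpus_words):
    """Raw frequency normalized to occurrences per million words."""
    return count * 1_000_000 / corpus_words


def times_more_frequent(norm_a, norm_b):
    """Ratio of two normalized frequencies (section A relative to section B)."""
    return norm_a / norm_b


# Normalized values reported in the text (occurrences per million words)
jezik_ratio = times_more_frequent(11_202, 1_423)  # about 7.9, i.e. roughly eight times
bosniak_ratio = times_more_frequent(103.7, 1.9)   # about 54.6
```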
A final and perhaps most convincing piece of evidence of the greater relevance of
articles with a higher hit count, and thus of the validity of the hit count as a sampling
criterion, can be obtained from keyword analysis. The keyword list resulting from a
comparison between the 1-4 hits and 5+ hits sections of SERBCORP thus includes the
lemma JEZIK and the relevant ethnonyms and glottonyms, as well as a considerable
number of other potentially relevant items such as narod ‘people’, nacionalni ‘national’,
naziv ‘[language] label’, identitet ‘identity’, nacija ‘nation’, etc., which indicates their
greater salience in the 5+ hits section (see Table 6.5). It therefore seems reasonable to
conclude that, while traces of relevant discourses and ideologies can be found also in
texts that are not equally language-focused (i.e., articles with a lower hit count), the
sampled data set represents a concentrated discourse of higher relevance to the study of
the links between language-related discourses and language ideologies, on the one hand,
Appendix B: Comparative Analyses of Comparator Corpora
also compiled a set of three wordlists from the very large, newly available web-as-corpus
(Table B1).
Table B1
To determine the optimal reference corpus, keyword analysis was conducted with
SERBCOMP as well as the WaC corpora as reference corpora, using both the chi-square
and log-likelihood tests. The minimum keyword frequency (5) and the p value threshold
(.0000000001) were kept the same for all combinations of tests and corpora. In a discussion of measures of
similarity between corpora, Baker (2010, pp. 91-93) suggests the use of frequency and
rank information and the Spearman rank correlation statistic as a way to assess the degree
between reference corpora, I used keyness scores rather than item frequency to assess
similarity. Thus, in order to compare the results of keyword analyses conducted using
SERBCOMP and WaC corpora, I compared the (keyness-based) ranks of the top 100
KWs resulting from keyword analysis with SERBCOMP as the reference corpus to the
ranks of the same KWs resulting from keyword analyses with the WaC corpora as
reference corpora. The correlation scores produced by the Spearman rank correlation
statistic (p < .01) ranged from .62 for the largest but most heterogeneous srWaC14 to .61
for the somewhat smaller but more coherent OPUS to .71 for the smallest and most
and is thus the most similar to SERBCOMP of the three WaC corpora, the result of
A careful analysis of the keyword lists produced by the different reference corpora
revealed that, despite their very different sizes (see Table B1), different reference corpora
yield comparable results regardless of the statistic chosen. This lends support to Xiao &
McEnery’s (2005) claim that the size of the reference corpus may not be important,2
Culpeper’s (2009) finding that the chi-square and log-likelihood tests produce negligibly
different keyword lists, so log-likelihood was used in all subsequent analyses. The most
notable difference between the keyword lists thus produced was in the noise levels, which
can be defined as the proportion of functional words and semantically generic lexical
material, such as prepositions (e.g., by) and time adverbs with context-specific reference
(e.g., yesterday), identified as key. Unsurprisingly, the keyword list
produced by SERBCOMP as the reference corpus exhibited the lowest noise level. An
additional difficulty in dealing with the WaC corpora as reference corpora was that
keyword analyses were based on comparisons of wordlists (rather than the corpora
comparator corpus was thus determined to have two principal advantages. First, because
both SERBCORP and SERBCOMP represent the newspaper register, the resulting keyword
list is largely free from items that characterize newspaper language in general as well as
other items contributing to noise, and second, items identified as key can be expected to
and SERBCOMP are very similar except in whether or not they contain texts mentioning
language, the KWs resulting from keyword analysis based on SERBCOMP as the
discussing language (rather than newspaper discourse in general) and so are more likely
to be useful for the identification of any discourses and ideologies pertaining to language
(cf. “irrelevant stylistic differences” in Culpeper, 2009 above). Based on these findings, a
decision was made to conduct all keyword analyses with SERBCOMP as the reference
Appendix C: Keyword Analysis (SERBCORP)
compared to SERBCOMP (the lists of positive and negative KWs are presented in Tables
C1 and C2). This analysis identified a total of 111 positive and 77 negative key
lemmas.41 Expectedly, the top positive key lemma in SERBCORP is jezik ‘language’,
which simply confirms that SERBCORP is about language. The top ten positive key
lemmas also include knjiga ‘book’, srpski ‘Serbian’, književnost ‘literature’, engleski
‘English’, škola ‘school’, pisac ‘writer’, roman ‘novel’, pesnik ‘poet’, and kultura
pervasive discourse of national identity based on group membership (‘us’ and ‘them’).
Other top fifty positive key lemmas suggest similar semantic fields: ethnonyms and
glottonyms as well as other identity-related nouns and pronouns (naš ‘our’, Srbi ‘Serbs’,
narod ‘people’, istorija ‘history’, francuski ‘French’, ja ‘I’, svoj ‘own’, moj ‘my’, ona
‘she’, and Crna Gora ‘Montenegro’); education (e.g., profesor ‘professor’, obrazovanje
‘publisher’, nagrada ‘award’, autor ‘author’, urednik ‘editor’), and theater and film (e.g.,
umetnost ‘art’, predstava ‘[theater] play’, film ‘film’, pozorišta ‘theater’). Most
pertinently for our purposes, there is a remarkable absence of references to other regional
ethnolinguistic identities (with the possible exception of the key lemma ime ‘name’),
which confirms that general language-related newspaper discourse may not be ideally
suited for the study of links between language-related discourses and language ideologies
The top ten negative key lemmas include Srbija ‘Serbia’, vlada ‘government’,
evra ‘Euros’, miliona ‘millions’, odsto ‘percent’, dinara ‘Dinars’, zakon ‘law’, stranke
‘(political) parties’, protiv ‘against’, and predsednik ‘president’. This suggests that,
political and state institutions, as well as finances. Similar to the positive key lemmas
above, the remaining negative key lemmas confirm the relative absence of references to
Table C1
Positive Key Lemmas in SERBCORP (by Keyness Score)
N Keyword (English) Keyword (Serbian) Freq. % Texts RC. Freq. RC. % Keyness P
1 language jezik 33834 0.29 6517 1 72780.53 0.0000000000
2 book knjiga 22712 0.19 3535 10369 0.05 16361.50 0.0000000000
3 Serbian srpski 35745 0.31 3720 32565 0.14 9509.99 0.0000000000
4 literature književnost 6070 0.05 1280 1158 7657.32 0.0000000000
5 English engleski 5252 0.05 1361 722 7490.84 0.0000000000
6 school škola 12984 0.11 1362 7491 0.03 7281.08 0.0000000000
7 writer pisac 7477 0.06 1687 2633 0.01 6678.77 0.0000000000
8 novel roman 5908 0.05 1287 2166 5120.75 0.0000000000
9 poet pesnik 2907 0.02 857 600 3541.36 0.0000000000
10 culture kultura 9544 0.08 1172 8355 0.04 2762.23 0.0000000000
11 professor profesor 7511 0.06 1931 5952 0.03 2636.04 0.0000000000
12 our naš 32116 0.28 3284 43596 0.19 2242.57 0.0000000000
13 world svet 14916 0.13 2489 17017 0.08 2148.98 0.0000000000
14 Serbs Srbi 9874 0.08 1669 9918 0.04 2073.39 0.0000000000
15 life život 12633 0.11 3228 14019 0.06 1991.74 0.0000000000
16 people narod 8529 0.07 1667 8257 0.04 1966.10 0.0000000000
17 history istorija 6990 0.06 1015 6267 0.03 1922.79 0.0000000000
18 French francuski 2943 0.03 881 1488 1913.89 0.0000000000
19 publisher izdavač 2454 0.02 1020 1065 1850.34 0.0000000000
20 I ja 22907 0.20 3820 30315 0.13 1817.01 0.0000000000
21 art umetnost 5876 0.05 991 5011 0.02 1793.73 0.0000000000
22 award nagrada 5714 0.05 1037 4956 0.02 1685.40 0.0000000000
23 own svoj 43119 0.37 4169 64375 0.29 1673.43 0.0000000000
24 author autor 5520 0.05 1948 4725 0.02 1672.44 0.0000000000
25 word reč 15127 0.13 4482 18651 0.08 1638.93 0.0000000000
26 my moj 9172 0.08 1790 9866 0.04 1590.91 0.0000000000
27 education obrazovanje 3953 0.03 1001 2969 0.01 1522.34 0.0000000000
28 children deca 9665 0.08 1456 10786 0.05 1496.38 0.0000000000
29 education (profession) prosvete 1951 0.02 978 965 1297.91 0.0000000000
30 love ljubav 3050 0.03 978 2236 1222.29 0.0000000000
31 theater play predstava 4601 0.04 857 4259 0.02 1178.86 0.0000000000
32 work delo 3289 0.03 2119 2736 0.01 1054.13 0.0000000000
33 knowledge znanje 3355 0.03 858 2910 0.01 989.41 0.0000000000
34 science nauka 3731 0.03 1204 3468 0.02 946.90 0.0000000000
35 story priča 9748 0.08 3126 12432 0.06 916.19 0.0000000000
36 film film 7155 0.06 1105 8872 0.04 757.21 0.0000000000
37 doctor dr 5362 0.05 2203 6243 0.03 719.92 0.0000000000
38 program of study studija 2937 0.03 882 2775 0.01 717.56 0.0000000000
39 text tekst 5182 0.04 1413 5996 0.03 711.01 0.0000000000
40 wrote napisao 1844 0.02 1399 1451 655.18 0.0000000000
41 alphabet pismo 4667 0.04 865 5401 0.02 639.96 0.0000000000
42 Monte(negro) Crna 7474 0.06 824 9931 0.04 580.66 0.0000000000
43 (Monte)negro Gora 8250 0.07 812 11299 0.05 548.57 0.0000000000
44 one jedan 35933 0.31 7258 59267 0.26 545.43 0.0000000000
45 program/curriculum program 7711 0.07 1570 10476 0.05 535.17 0.0000000000
46 editor urednik 1600 0.01 1085 1307 530.83 0.0000000000
47 theater pozorišta 2067 0.02 988 2036 456.21 0.0000000000
48 she ona 12336 0.11 4913 18787 0.08 410.14 0.0000000000
49 name ime 6627 0.06 2440 9292 0.04 386.30 0.0000000000
50 here ovde 6153 0.05 3213 8589 0.04 368.01 0.0000000000
51 his njegov 20865 0.18 2728 34010 0.15 363.81 0.0000000000
52 part deo 14419 0.12 3981 22718 0.10 357.13 0.0000000000
53 experience iskustvo 3092 0.03 1135 3766 0.02 351.32 0.0000000000
54 picture slika 3168 0.03 1261 3970 0.02 320.85 0.0000000000
55 youth mladi 3415 0.03 1340 4380 0.02 312.91 0.0000000000
56 always uvek 7295 0.06 4195 10746 0.05 310.79 0.0000000000
57 born rođen 1174 0.01 949 1081 304.37 0.0000000000
58 many mnogi 7031 0.06 2371 10358 0.05 299.35 0.0000000000
59 her njen 7002 0.06 1551 10316 0.05 297.97 0.0000000000
60 live žive 2338 0.02 1729 2775 0.01 292.91 0.0000000000
61 woman žena 3900 0.03 1149 5254 0.02 282.68 0.0000000000
62 Belgrade (adj.) beogradski 4401 0.04 850 6097 0.03 274.72 0.0000000000
63 first prvi 21664 0.19 5387 36303 0.16 267.38 0.0000000000
64 today danas 10922 0.09 5569 17349 0.08 250.07 0.0000000000
65 man čovek 23017 0.20 2663 38926 0.17 249.37 0.0000000000
66 topic tema 4097 0.04 1184 5723 0.03 244.03 0.0000000000
67 father otac 1602 0.01 1017 1823 232.58 0.0000000000
68 age doba 1666 0.01 1277 1962 214.73 0.0000000000
69 war rat 6446 0.06 1078 9871 0.04 204.88 0.0000000000
70 death smrti 1762 0.02 1192 2165 193.38 0.0000000000
71 community zajednica 4381 0.04 868 6445 0.03 188.31 0.0000000000
72 common zajednički 2242 0.02 946 2937 0.01 186.44 0.0000000000
73 generation generacije 1096 889 1187 186.16 0.0000000000
74 sometimes ponekad 1462 0.01 1162 1751 177.12 0.0000000000
75 America Americi 1319 0.01 854 1557 168.56 0.0000000000
76 past prošlosti 1372 0.01 981 1651 163.28 0.0000000000
77 foreign strani 12907 0.11 1688 21659 0.10 156.14 0.0000000000
78 Yugoslavia Jugoslavije 2462 0.02 1531 3409 0.02 154.11 0.0000000000
79 person ličnosti 1474 0.01 1113 1835 153.41 0.0000000000
80 idea ideja 3168 0.03 1387 4585 0.02 151.94 0.0000000000
81 second drugi 30613 0.26 4548 54094 0.24 150.84 0.0000000000
82 abroad inostranstvu 1298 0.01 935 1605 138.88 0.0000000000
83 scene sceni 1723 0.01 1163 2276 0.01 137.69 0.0000000000
84 space prostor 5020 0.04 1297 7859 0.03 131.81 0.0000000000
85 this ovaj 49317 0.42 4897 89477 0.40 120.73 0.0000000000
86 every svaki 12309 0.11 2478 20959 0.09 120.22 0.0000000000
87 house kuća 1897 0.02 1386 2621 0.01 120.17 0.0000000000
88 Belgrade Beograd 19233 0.17 3386 33602 0.15 120.01 0.0000000000
89 you vi 7922 0.07 1306 13072 0.06 119.35 0.0000000000
90 voice/vote glas 1151 857 1439 117.70 0.0000000000
91 opinion mišljenje 3248 0.03 946 4940 0.02 109.05 0.0000000000
92 large veliki 18380 0.16 3411 32248 0.14 105.32 0.0000000000
93 world svetski 5387 0.05 814 8708 0.04 102.92 0.0000000000
94 right pravo 9921 0.09 2455 16852 0.07 100.55 0.0000000000
95 desire želja 1375 0.01 809 1893 88.83 0.0000000000
96 Serbia and Montenegro SCG 2523 0.02 1078 3843 0.02 83.71 0.0000000000
97 work rad 8687 0.07 2785 14832 0.07 81.27 0.0000000000
98 never nikad 5219 0.04 1290 8643 0.04 75.14 0.0000000000
99 most often najčešće 1445 0.01 1145 2067 74.65 0.0000000000
100 carry nosi 1048 935 1420 73.72 0.0000000000
101 society društvo 5464 0.05 1323 9127 0.04 70.33 0.0000000000
102 truth istina 2307 0.02 1148 3596 0.02 62.99 0.0000000000
103 city grad 4841 0.04 1385 8095 0.04 61.42 0.0000000000
104 find naći 3008 0.03 1266 4924 0.02 49.86 0.0000000000
105 conversation razgovor 1051 843 1536 47.25 0.0000000000
106 little mali 9949 0.09 1240 17645 0.08 45.01 0.0000000000
107 task zadatak 1458 0.01 866 2247 43.93 0.0000000000
108 decade decenije 1066 916 1588 41.91 0.0000000000
109 there tamo 4117 0.04 2543 6997 0.03 41.36 0.0000000000
110 emphasize ističe 1895 0.02 1472 3039 0.01 39.37 0.0000000000
111 all the time stalno 1776 0.02 1375 2851 0.01 36.52 0.0000000001
Table C2
Negative Key Lemmas in SERBCORP (by Keyness Score)
N Keyword (English) Keyword (Serbian) Freq. % Texts RC. Freq. RC. % Keyness P
1 Serbia Srbija 33742 0.29 2431 93159 0.41 -3359.86 0.0000000000
2 government vlada 6808 0.06 1877 26794 0.12 -3142.76 0.0000000000
3 Euros evra 3045 0.03 1325 15874 0.07 -3107.69 0.0000000000
4 millions miliona 3110 0.03 1511 15619 0.07 -2889.99 0.0000000000
5 percent odsto 5838 0.05 1923 22713 0.10 -2594.47 0.0000000000
6 Dinars dinara 2690 0.02 1060 11867 0.05 -1759.88 0.0000000000
7 law zakon 5689 0.05 1069 19899 0.09 -1733.10 0.0000000000
8 parties stranke 2288 0.02 1047 10165 0.05 -1527.36 0.0000000000
9 against protiv 5439 0.05 2939 18154 0.08 -1376.63 0.0000000000
10 president predsednik 7900 0.07 2829 23415 0.10 -1162.70 0.0000000000
11 authorities vlast 5740 0.05 1262 17482 0.08 -966.86 0.0000000000
12 money novac 3009 0.03 1163 10517 0.05 -913.74 0.0000000000
13 minister ministar 4453 0.04 1485 13981 0.06 -865.05 0.0000000000
14 decision odluka 4129 0.04 912 13130 0.06 -849.51 0.0000000000
15 citizens građani 4283 0.04 873 13444 0.06 -831.09 0.0000000000
16 EU EU 3393 0.03 875 10873 0.05 -722.24 0.0000000000
17 state država 10880 0.09 2251 26849 0.12 -484.33 0.0000000000
18 case slučaj 5020 0.04 1289 13753 0.06 -475.39 0.0000000000
19 public javnost 3582 0.03 938 10303 0.05 -449.71 0.0000000000
20 larger veći 4342 0.04 1194 11978 0.05 -428.86 0.0000000000
21 system sistem 3988 0.03 1152 10931 0.05 -378.75 0.0000000000
22 director direktor 4308 0.04 2047 11611 0.05 -368.04 0.0000000000
23 prime-minister premijera 1298 0.01 877 4429 0.02 -358.90 0.0000000000
24 former bivši 1163 896 4033 0.02 -342.56 0.0000000000
25 problem problem 8628 0.07 2392 20854 0.09 -318.88 0.0000000000
26 year godina 66508 0.57 7372 139140 0.62 -298.02 0.0000000000
27 time vreme 19337 0.17 6382 43233 0.19 -295.25 0.0000000000
28 solution rešenje 2675 0.02 1011 7465 0.03 -282.97 0.0000000000
29 day dan 13186 0.11 2168 30134 0.13 -268.22 0.0000000000
30 clearly jasno 2464 0.02 1852 6790 0.03 -241.75 0.0000000000
31 week nedelje 1401 0.01 1148 4240 0.02 -228.71 0.0000000000
32 development razvoj 3534 0.03 1185 9132 0.04 -226.23 0.0000000000
33 group grupa 5265 0.05 1301 12934 0.06 -225.25 0.0000000000
34 result rezultat 1052 875 3307 0.01 -205.43 0.0000000000
35 now sada 13005 0.11 6602 29006 0.13 -191.79 0.0000000000
36 publicly javno 1200 0.01 927 3603 0.02 -188.37 0.0000000000
37 last prošle 2693 0.02 2056 7041 0.03 -187.51 0.0000000000
38 expect očekuje 1334 0.01 1106 3828 0.02 -165.29 0.0000000000
39 parliament skupštine 1684 0.01 1024 4590 0.02 -154.44 0.0000000000
40 choice izbor 4185 0.04 1285 9934 0.04 -129.72 0.0000000000
41 political politički 10025 0.09 1651 22155 0.10 -129.10 0.0000000000
42 number broj 8152 0.07 3245 18272 0.08 -128.79 0.0000000000
43 nobody niko 3674 0.03 2526 8821 0.04 -127.41 0.0000000000
44 means sredstva 1190 0.01 871 3301 0.01 -121.47 0.0000000000
45 earlier ranije 2036 0.02 1659 5184 0.02 -116.69 0.0000000000
46 be able to moći 42024 0.36 1782 86394 0.38 -114.45 0.0000000000
47 process proces 2077 0.02 994 5254 0.02 -113.17 0.0000000000
48 momentarily trenutno 1688 0.01 1364 4354 0.02 -106.62 0.0000000000
49 account računa 1052 875 2903 0.01 -104.07 0.0000000000
50 moment trenutku 2289 0.02 1711 5648 0.03 -101.71 0.0000000000
51 affairs poslova 1634 0.01 1023 4199 0.02 -100.40 0.0000000000
52 now sad 5543 0.05 2973 12467 0.06 -91.76 0.0000000000
53 possible moguće 2430 0.02 1830 5834 0.03 -84.22 0.0000000000
54 five pet 4137 0.04 2844 9446 0.04 -83.18 0.0000000000
55 less manje 3841 0.03 2753 8824 0.04 -83.17 0.0000000000
56 help pomoći 1236 0.01 986 3173 0.01 -75.37 0.0000000000
57 reason razlog 3839 0.03 1096 8736 0.04 -74.00 0.0000000000
58 situation situacija 1300 0.01 1023 3281 0.01 -69.57 0.0000000000
59 state stanje 3319 0.03 1042 7590 0.03 -68.01 0.0000000000
60 six šest 2608 0.02 1993 6055 0.03 -63.82 0.0000000000
61 ministry ministarstvo 4059 0.03 1124 9085 0.04 -62.85 0.0000000000
62 institution institucija 3133 0.03 953 7145 0.03 -62.06 0.0000000000
63 question pitanje 12767 0.11 3667 26711 0.12 -57.01 0.0000000000
64 Kosovo Kosova 3086 0.03 1165 6961 0.03 -53.10 0.0000000000
65 say reći 13062 0.11 2110 27137 0.12 -48.48 0.0000000000
66 power/forces snage 1148 866 2813 0.01 -48.01 0.0000000000
67 goal cilj 2369 0.02 1461 5394 0.02 -45.99 0.0000000000
68 nothing ništa 4976 0.04 3212 10772 0.05 -45.60 0.0000000000
69 get dobiti 8347 0.07 888 17602 0.08 -45.05 0.0000000000
70 persons lica 1410 0.01 927 3352 0.01 -44.36 0.0000000000
71 plan plan 2555 0.02 861 5743 0.03 -41.93 0.0000000000
72 largest najveći 5070 0.04 1664 10897 0.05 -40.71 0.0000000000
73 bad loše 1050 877 2540 0.01 -39.08 0.0000000000
74 far daleko 1596 0.01 1298 3696 0.02 -37.94 0.0000000000
75 ten deset 2621 0.02 1946 5838 0.03 -37.88 0.0000000000
76 immediately odmah 2636 0.02 2028 5868 0.03 -37.78 0.0000000000
77 last poslednji 4936 0.04 1049 10577 0.05 -37.39 0.0000000000
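This part of the appendix does not say how the Keyness column is computed. Dunning's log-likelihood (G2) is a widely used keyness statistic. It is usually signed so that negative keywords, like those in the rows above, get negative scores. The sketch below computes such a signed G2 from a word's frequency in a study corpus and a reference corpus. Treat it as an assumption about the statistic, not a confirmed description of how these tables were produced. The function name and the counts in the demo are hypothetical.

```python
import math


def signed_log_likelihood(freq_study: int, size_study: int,
                          freq_ref: int, size_ref: int) -> float:
    """Dunning's log-likelihood (G2) for one word, signed by direction.

    Positive: the word is relatively more frequent in the study corpus.
    Negative: it is relatively less frequent there (a negative keyword).
    """
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected counts if the word were spread evenly over both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size

    g2 = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # the term 0 * log(0) is treated as 0
            g2 += observed * math.log(observed / expected)
    g2 *= 2.0

    # Attach a minus sign when the study corpus under-uses the word.
    if freq_study / size_study < freq_ref / size_ref:
        return -g2
    return g2


if __name__ == "__main__":
    # Hypothetical counts: 50 hits per million words vs 80 per million.
    print(round(signed_log_likelihood(50, 1_000_000, 160, 2_000_000), 2))
```

Swapping the two corpora leaves the magnitude of G2 unchanged and only flips the sign. This is why one statistic can rank both positive and negative keywords.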
Appendix D: Keyword Analysis (5+ Hits Section of SERBCORP)
Table D1
Positive Key Lemmas in the 5+ Hits Section of SERBCORP (by Keyness Score)
N Keyword (English) Keyword (Serbian) Freq. % Texts RC. Freq. RC. % Keyness P
1 language jezik 12530 1.12 1118 1 76538.41 0.0000000000
2 Serbian srpski 7309 0.65 670 32565 0.14 9779.72 0.0000000000
3 linguistic jezički 901 0.08 112 10 5387.05 0.0000000000
4 school škola 2593 0.23 237 7491 0.03 5050.31 0.0000000000
5 English engleski 1220 0.11 271 722 4949.45 0.0000000000
6 literature književnost 1305 0.12 269 1158 4667.73 0.0000000000
7 mother (adj.) maternji 611 0.05 197 1 3712.29 0.0000000000
8 book knjiga 2499 0.22 371 10369 0.05 3583.33 0.0000000000
9 dictionary rečnik 801 0.07 135 320 3576.38 0.0000000000
10 literary književni 1027 0.09 128 1288 3210.07 0.0000000000
11 Cyrillic ćirilica 590 0.05 74 188 2756.74 0.0000000000
12 learn učiti 825 0.07 70 1245 2369.50 0.0000000000
13 professor profesor 1524 0.14 307 5952 0.03 2313.27 0.0000000000
14 instruction nastava 710 0.06 107 1016 2091.33 0.0000000000
15 writer pisac 998 0.09 182 2633 0.01 2073.11 0.0000000000
16 grade razred 609 0.05 87 650 2033.88 0.0000000000
17 word reč 2633 0.24 502 18651 0.08 1941.47 0.0000000000
18 poetry poezija 543 0.05 63 526 1881.56 0.0000000000
19 alphabet pismo 1243 0.11 169 5401 0.02 1702.15 0.0000000000
20 translator prevodilac 395 0.04 102 232 1605.55 0.0000000000
21 people narod 1510 0.13 234 8257 0.04 1601.00 0.0000000000
22 education obrazovanje 893 0.08 187 2969 0.01 1558.60 0.0000000000
23 culture kultura 1485 0.13 230 8355 0.04 1519.44 0.0000000000
24 education (profession) prosvete 563 0.05 219 965 1516.59 0.0000000000
25 students (K-12) učenici 637 0.06 120 1397 1492.42 0.0000000000
26 linguist lingvista 260 0.02 81 12 1488.69 0.0000000000
27 translation prevod 478 0.04 111 629 1462.75 0.0000000000
28 Montenegrin crnogorski 696 0.06 106 2006 1357.09 0.0000000000
29 Serbo-Croatian srpskohrvatski 226 0.02 70 6 1323.38 0.0000000000
30 novel roman 678 0.06 122 2166 1221.90 0.0000000000
31 learning učenje 352 0.03 129 343 1217.01 0.0000000000
32 subject predmet 770 0.07 147 2938 0.01 1193.64 0.0000000000
33 poet pesnik 402 0.04 89 600 1160.62 0.0000000000
34 school (university) fakultet 995 0.09 89 5250 0.02 1101.39 0.0000000000
35 Serbs Srbi 1432 0.13 250 9918 0.04 1093.62 0.0000000000
36 Croatian hrvatski 684 0.06 163 2749 0.01 1010.51 0.0000000000
37 minority manjina 491 0.04 111 1320 1006.48 0.0000000000
38 speak govoriti 1764 0.16 90 14569 0.06 992.19 0.0000000000
39 use (n.) upotreba 585 0.05 72 2054 975.53 0.0000000000
40 national nacionalni 1361 0.12 88 9893 0.04 961.57 0.0000000000
41 French francuski 477 0.04 140 1488 875.85 0.0000000000
42 elementary osnovni 870 0.08 87 5077 0.02 849.01 0.0000000000
43 speech govor 487 0.04 111 1636 842.68 0.0000000000
44 students (K-8) đaci 262 0.02 110 326 821.57 0.0000000000
45 science nauka 668 0.06 189 3468 0.02 753.63 0.0000000000
46 Bosnian bosanski 266 0.02 64 432 736.64 0.0000000000
47 Bosniak bošnjački 223 0.02 70 254 725.60 0.0000000000
48 children deca 1294 0.12 220 10786 0.05 714.75 0.0000000000
49 wrote pisali 1012 0.09 76 7364 0.03 713.61 0.0000000000
50 edition izdanje 437 0.04 72 1700 665.47 0.0000000000
51 century vek 855 0.08 72 6062 0.03 628.97 0.0000000000
52 cultural kulturni 655 0.06 95 3896 0.02 623.19 0.0000000000
53 meaning značenje 262 0.02 79 578 611.55 0.0000000000
54 foreign strani 2004 0.18 265 21659 0.10 598.10 0.0000000000
55 knowledge znanje 542 0.05 123 2910 0.01 587.42 0.0000000000
56 history istorija 848 0.08 125 6267 0.03 582.63 0.0000000000
57 exam ispit 305 0.03 63 910 579.48 0.0000000000
58 doctor dr 843 0.08 314 6243 0.03 577.16 0.0000000000
59 expression izraz 310 0.03 103 999 554.76 0.0000000000
60 Montenegrins Crnogorci 213 0.02 62 397 548.47 0.0000000000
61 German nemački 460 0.04 123 2301 0.01 541.69 0.0000000000
62 class period čas 542 0.05 63 3156 0.01 530.37 0.0000000000
63 world svet 1624 0.15 235 17017 0.08 528.78 0.0000000000
64 Vuk (Karadžić) Vuk 482 0.04 103 2575 0.01 525.54 0.0000000000
65 Croats hrvati 315 0.03 117 1111 523.22 0.0000000000
66 SANU SANU 235 0.02 84 582 509.44 0.0000000000
67 Spanish španski 201 0.02 63 414 488.93 0.0000000000
68 identity identitet 331 0.03 94 1356 480.11 0.0000000000
69 academician akademik 210 0.02 74 558 433.97 0.0000000000
70 label naziv 442 0.04 119 2660 0.01 413.93 0.0000000000
71 scientific naučni 255 0.02 65 947 404.94 0.0000000000
72 our naš 3182 0.28 341 43596 0.19 392.44 0.0000000000
73 name ime 943 0.08 252 9292 0.04 360.34 0.0000000000
74 politics politika 1475 0.13 897 17075 0.08 355.97 0.0000000000
75 own svoj 4336 0.39 440 64375 0.29 343.75 0.0000000000
76 Monte(negro) Gora 1072 0.10 93 11299 0.05 343.32 0.0000000000
77 (Monte)negro Crna 971 0.09 86 9931 0.04 337.32 0.0000000000
78 second drugi 3666 0.33 534 54094 0.24 301.76 0.0000000000
79 my/mine moj 929 0.08 165 9866 0.04 291.30 0.0000000000
80 author autor 540 0.05 170 4725 0.02 270.30 0.0000000000
81 schooling školovanje 160 0.01 62 558 268.30 0.0000000000
82 published objavljen 273 0.02 89 1595 265.93 0.0000000000
83 Russian ruski 525 0.05 106 4619 0.02 259.79 0.0000000000
84 writing pisanje 172 0.02 75 708 248.32 0.0000000000
85 program/curriculum program 931 0.08 161 10476 0.05 245.99 0.0000000000
86 publisher izdavač 211 0.02 82 1065 245.91 0.0000000000
87 understand razumeti 287 0.03 62 1847 244.68 0.0000000000
88 literature literature 96 83 188 240.43 0.0000000000
89 spirit duh 275 0.02 64 1755 237.30 0.0000000000
90 I ja 2152 0.19 346 30315 0.13 230.32 0.0000000000
91 letters (a, b, c…) slova 100 72 227 229.30 0.0000000000
92 tradition tradicija 252 0.02 64 1559 227.21 0.0000000000
93 nation nacija 305 0.03 66 2155 225.56 0.0000000000
94 parents roditelji 314 0.03 106 2264 0.01 224.71 0.0000000000
95 text tekst 584 0.05 128 5996 0.03 200.77 0.0000000000
96 sentence rečenica 153 0.01 64 737 187.91 0.0000000000
97 today danas 1306 0.12 567 17349 0.08 185.88 0.0000000000
98 study studija 333 0.03 101 2775 0.01 183.93 0.0000000000
99 award nagrada 490 0.04 80 4956 0.02 175.20 0.0000000000
100 title naslov 246 0.02 63 1783 174.54 0.0000000000
101 lectures predavanja 100 72 394 150.48 0.0000000000
102 reality stvarnost 201 0.02 65 1413 149.85 0.0000000000
103 art umetnost 473 0.04 104 5011 0.02 149.29 0.0000000000
104 life život 1036 0.09 249 14019 0.06 135.40 0.0000000000
105 wrote napisao 194 0.02 141 1451 130.56 0.0000000000
106 one jedan 3595 0.32 684 59267 0.26 126.54 0.0000000000
107 work delo 288 0.03 168 2736 0.01 120.16 0.0000000000
108 Vojvodina Vojvodini 171 0.02 72 1316 109.51 0.0000000000
109 people's narodni 435 0.04 65 4981 0.02 108.71 0.0000000000
110 love ljubav 241 0.02 79 2236 106.18 0.0000000000
111 here ovde 664 0.06 306 8589 0.04 106.04 0.0000000000
112 many mnogi 768 0.07 238 10358 0.05 101.94 0.0000000000
113 self sebe 752 0.07 271 10149 0.05 99.50 0.0000000000
114 notion pojam 86 64 457 94.36 0.0000000000
115 Italian italijanski 85 67 452 93.18 0.0000000000
116 part deo 1475 0.13 378 22718 0.10 91.27 0.0000000000
117 interest interesovanje 160 0.01 80 1336 88.02 0.0000000000
118 example primer 571 0.05 342 7455 0.03 87.65 0.0000000000
119 special poseban 633 0.06 75 8470 0.04 87.14 0.0000000000
120 every svaki 1363 0.12 261 20959 0.09 85.31 0.0000000000
121 she ona 1227 0.11 449 18787 0.08 79.15 0.0000000000
122 form oblik 183 0.02 64 1734 76.81 0.0000000000
123 born rođen 131 0.01 77 1081 73.74 0.0000000000
124 often često 426 0.04 275 5427 0.02 72.44 0.0000000000
125 that onaj 1511 0.14 167 24063 0.11 72.35 0.0000000000
126 sense smisao 404 0.04 74 5093 0.02 71.65 0.0000000000
127 common zajednički 263 0.02 72 2937 0.01 71.12 0.0000000000
128 opinion mišljenje 392 0.04 109 4940 0.02 69.61 0.0000000000
129 your(s) vaš 215 0.02 68 2431 0.01 55.93 0.0000000000
130 age doba 183 0.02 131 1962 55.83 0.0000000000
131 both oba 149 0.01 121 1491 54.75 0.0000000000
132 introduction uvođenje 185 0.02 74 2005 54.71 0.0000000000
133 same isti 1104 0.10 171 17538 0.08 53.93 0.0000000000
134 live žive 235 0.02 167 2775 0.01 53.01 0.0000000000
135 creation stvaranje 187 0.02 65 2057 52.93 0.0000000000
136 generation generacije 125 0.01 98 1187 52.20 0.0000000000
137 phenomenon pojava 164 0.01 76 1759 49.98 0.0000000000
138 experience iskustvo 293 0.03 97 3766 0.02 48.04 0.0000000000
139 difference razlika 419 0.04 101 5836 0.03 47.50 0.0000000000
140 change menja 138 0.01 107 1433 46.01 0.0000000000
141 sometimes ponekad 159 0.01 121 1751 44.85 0.0000000000
142 story priča 794 0.07 237 12432 0.06 43.48 0.0000000000
143 newspaper novina 101 76 953 42.81 0.0000000000
144 community zajednica 446 0.04 95 6445 0.03 41.44 0.0000000000
145 their(s) njihov 1265 0.11 188 21010 0.09 41.29 0.0000000000
146 population stanovništva 156 0.01 91 1766 40.43 0.0000000000
147 they oni 5019 0.45 579 92128 0.41 38.69 0.0000000000
148 most often najčešće 173 0.02 127 2067 37.46 0.0000000000
149 first prvi 2077 0.19 480 36303 0.16 37.10 0.0000000000
150 past prošlosti 145 0.01 95 1651 36.89 0.0000000000
151 topic tema 395 0.04 107 5723 0.03 36.15 0.0000000001
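Suppose the % column gives each lemma's frequency as a percentage of all running words in the 5+ Hits section, rounded to two decimals. That reading is an assumption, not stated in this appendix. Under it, every row with a non-blank % cell constrains the size of the section. The hypothetical helpers below turn a (Freq., %) pair into the range of corpus sizes consistent with the rounding. They then intersect that range across a few rows copied from Table D1.

```python
def size_bounds(freq: int, pct: float, decimals: int = 2) -> tuple[float, float]:
    """Corpus sizes consistent with `freq` at `pct` percent after rounding.

    Assumes pct = round(freq / size * 100, decimals).
    """
    half_step = 0.5 * 10 ** (-decimals)
    # A higher true percentage means a smaller corpus, so the bounds swap.
    smallest = freq / ((pct + half_step) / 100)
    largest = freq / ((pct - half_step) / 100)
    return smallest, largest


def intersect(bounds: list[tuple[float, float]]) -> tuple[float, float]:
    """Narrowest size range consistent with every row."""
    lows, highs = zip(*bounds)
    return max(lows), min(highs)


# (Freq., %) pairs copied from Table D1: language, Serbian, school, word.
ROWS_D1 = [(12530, 1.12), (7309, 0.65), (2593, 0.23), (2633, 0.24)]

low, high = intersect([size_bounds(f, p) for f, p in ROWS_D1])

if __name__ == "__main__":
    print(f"5+ Hits section: roughly {low:,.0f} to {high:,.0f} running words")
```

For these four rows, the assumption narrows the section to roughly 1.116 to 1.120 million running words. Rows whose % cell is blank give no constraint and are left out.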
Appendix E: Keyword Analysis (5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus)
Table E1
Positive Key Lemmas in the 5+ Hits Section of SERBCORP with the 1-4 Hits Section of SERBCORP as the Reference Corpus (by Keyness Score)
N Keyword (English) Keyword (Serbian) Freq. % Texts RC. Freq. RC. % Keyness P
1 language jezik 12530 1.12 1118 21304 0.20 18516.54 0.0000000000
2 Serbian srpski 7309 0.65 670 28440 0.27 3799.77 0.0000000000
3 linguistic jezički 901 0.08 112 792 2043.82 0.0000000000
4 dictionary rečnik 801 0.07 135 783 1717.42 0.0000000000
5 school škola 2593 0.23 237 10392 0.10 1269.04 0.0000000000
6 mother (adj.) maternji 611 0.05 197 649 1249.68 0.0000000000
7 alphabet pismo 1243 0.11 169 3424 0.03 1108.27 0.0000000000
8 Cyrillic ćirilica 590 0.05 74 905 942.80 0.0000000000
9 word reč 2633 0.24 502 12494 0.12 879.24 0.0000000000
10 instruction nastava 710 0.06 107 1467 0.01 875.24 0.0000000000
11 learn učiti 825 0.07 70 2015 0.02 851.36 0.0000000000
12 Montenegrin crnogorski 696 0.06 106 1452 0.01 849.83 0.0000000000
13 English engleski 1220 0.11 271 4033 0.04 839.02 0.0000000000
14 linguist lingvista 260 0.02 81 96 823.13 0.0000000000
15 professor profesor 1524 0.14 307 5987 0.06 775.36 0.0000000000
16 literature književnost 1305 0.12 269 4765 0.05 760.37 0.0000000000
17 subject predmet 770 0.07 147 1997 0.02 740.22 0.0000000000
18 Croatian hrvatski 684 0.06 163 1623 0.02 729.29 0.0000000000
19 use (n.) upotreba 585 0.05 72 1249 0.01 697.87 0.0000000000
20 Serbo-Croatian srpskohrvatski 226 0.02 70 142 597.25 0.0000000000
21 education (profession) prosvete 563 0.05 219 1388 0.01 574.70 0.0000000000
22 education obrazovanje 893 0.08 187 3078 0.03 574.07 0.0000000000
23 grade razred 609 0.05 87 1672 0.02 545.16 0.0000000000
24 literary književni 1027 0.09 128 3959 0.04 541.68 0.0000000000
25 people narod 1510 0.13 234 7019 0.07 530.86 0.0000000000
26 Bosniak bošnjački 223 0.02 70 181 526.18 0.0000000000
27 learning učenje 352 0.03 129 627 497.70 0.0000000000
28 foreign strani 2004 0.18 265 10904 0.10 449.44 0.0000000000
29 Bosnian bosanski 266 0.02 64 383 445.70 0.0000000000
30 students (K-12) učenici 637 0.06 120 2163 0.02 419.59 0.0000000000
31 elementary osnovni 870 0.08 87 3686 0.03 378.99 0.0000000000
32 national nacionalni 1361 0.12 88 6963 0.07 369.42 0.0000000000
33 Vuk (Karadžić) Vuk 482 0.04 103 1540 0.01 349.22 0.0000000000
34 culture kultura 1485 0.13 230 8098 0.08 330.50 0.0000000000
35 speech govor 487 0.04 111 1685 0.02 311.04 0.0000000000
36 speak govoriti 1764 0.16 90 10282 0.10 309.94 0.0000000000
37 minority manjina 491 0.04 111 1758 0.02 295.94 0.0000000000
38 translator prevodilac 395 0.04 102 1249 0.01 290.68 0.0000000000
39 Croats hrvati 315 0.03 117 929 256.25 0.0000000000
40 science nauka 668 0.06 189 3063 0.03 242.74 0.0000000000
41 Serbs Srbi 1432 0.13 250 8442 0.08 240.77 0.0000000000
42 translation prevod 478 0.04 111 1998 0.02 214.29 0.0000000000
43 class period čas 542 0.05 63 2441 0.02 205.62 0.0000000000
44 Montenegrins Crnogorci 213 0.02 62 564 199.56 0.0000000000
45 doctor dr 843 0.08 314 4519 0.04 198.29 0.0000000000
46 second drugi 3666 0.33 534 26947 0.26 187.05 0.0000000000
47 expression izraz 310 0.03 103 1128 0.01 181.64 0.0000000000
48 meaning značenje 262 0.02 79 867 179.80 0.0000000000
49 school (university) fakultet 995 0.09 89 5774 0.05 177.70 0.0000000000
50 label naziv 442 0.04 119 1995 0.02 166.80 0.0000000000
51 wrote pisali 1012 0.09 76 6015 0.06 164.69 0.0000000000
52 German nemački 460 0.04 123 2186 0.02 152.83 0.0000000000
53 Spanish španski 201 0.02 63 630 149.85 0.0000000000
54 exam ispit 305 0.03 63 1222 0.01 149.17 0.0000000000
55 SANU SANU 235 0.02 84 827 145.87 0.0000000000
56 name ime 943 0.08 252 5684 0.05 144.97 0.0000000000
57 identity identitet 331 0.03 94 1400 0.01 144.68 0.0000000000
58 children deca 1294 0.12 220 8371 0.08 144.52 0.0000000000
59 knowledge znanje 542 0.05 123 2813 0.03 140.91 0.0000000000
60 scientific naučni 255 0.02 65 1005 128.83 0.0000000000
61 French francuski 477 0.04 140 2466 0.02 125.47 0.0000000000
62 Russian ruski 525 0.05 106 2827 0.03 121.69 0.0000000000
63 poetry poezija 543 0.05 63 2988 0.03 117.18 0.0000000000
64 introduction uvođenje 185 0.02 74 645 116.65 0.0000000000
65 percent odsto 816 0.07 193 5022 0.05 114.84 0.0000000000
66 writer pisac 998 0.09 182 6480 0.06 109.40 0.0000000000
67 Monte(negro) Gora 1072 0.10 93 7178 0.07 99.97 0.0000000000
68 understand razumeti 287 0.03 62 1339 0.01 99.90 0.0000000000
69 be able to moći 4619 0.41 186 37405 0.35 90.76 0.0000000000
70 (Monte)negro Crna 971 0.09 86 6503 0.06 90.45 0.0000000000
71 academician akademik 210 0.02 74 900 89.19 0.0000000000
72 cultural kulturni 655 0.06 95 4147 0.04 81.12 0.0000000000
73 example primer 571 0.05 342 3576 0.03 74.36 0.0000000000
74 nation nacija 305 0.03 66 1622 0.02 73.55 0.0000000000
75 Vojvodina Vojvodini 171 0.02 72 740 71.08 0.0000000000
76 sentence rečenica 153 0.01 64 640 68.47 0.0000000000
77 today danas 1306 0.12 567 9616 0.09 65.65 0.0000000000
78 letters (a, b, c…) slova 100 72 344 64.47 0.0000000000
79 students (K-8) đaci 262 0.02 110 1426 0.01 58.63 0.0000000000
80 same isti 1104 0.10 171 8156 0.08 54.05 0.0000000000
81 poet pesnik 402 0.04 89 2505 0.02 53.54 0.0000000000
82 schooling školovanje 160 0.01 62 769 51.61 0.0000000000
83 program/curriculum program 931 0.08 161 6780 0.06 50.85 0.0000000000
84 difference razlika 419 0.04 101 2665 0.03 50.77 0.0000000000
85 book knjiga 2499 0.22 371 20214 0.19 49.74 0.0000000000
86 century vek 855 0.08 72 6192 0.06 48.64 0.0000000000
87 parents roditelji 314 0.03 106 1893 0.02 48.21 0.0000000000
88 history istorija 848 0.08 125 6142 0.06 48.20 0.0000000000
89 change (v.) menja 138 0.01 107 674 42.65 0.0000000000
90 law zakon 687 0.06 112 5002 0.05 37.58 0.0000000000
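Tables D1 and E1 score the same 5+ Hits section against two different reference corpora. The difference in a lemma's N between them therefore shows how sensitive its rank is to the choice of reference corpus. The sketch below computes that difference for a few lemmas whose ranks are copied from the two tables. The choice of lemmas and the helper name are illustrative.

```python
# N values copied from Table D1 (comparator corpus as reference) and
# Table E1 (1-4 Hits section as reference) for lemmas present in both.
RANK_D1 = {"language": 1, "Serbian": 2, "English": 5, "literature": 6, "book": 8,
           "dictionary": 9, "alphabet": 19, "Montenegrin": 28, "Croatian": 36}
RANK_E1 = {"language": 1, "Serbian": 2, "English": 13, "literature": 16, "book": 85,
           "dictionary": 4, "alphabet": 7, "Montenegrin": 12, "Croatian": 18}


def rank_shifts(before: dict[str, int], after: dict[str, int]) -> list[tuple[str, int]]:
    """Return (lemma, shift) pairs; a positive shift means a higher rank in `after`."""
    shared = before.keys() & after.keys()
    shifts = [(lemma, before[lemma] - after[lemma]) for lemma in shared]
    # Biggest gains first; ties are ordered by lemma for reproducible output.
    return sorted(shifts, key=lambda pair: (-pair[1], pair[0]))


result = rank_shifts(RANK_D1, RANK_E1)

if __name__ == "__main__":
    for lemma, shift in result:
        print(f"{lemma:<12} {shift:+d}")
```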
Appendix F: Collocation Analysis (SERBCORP)
produced a total of 368 lemma collocates of the lemma JEZIK (Tables F1 and F2). Table
F1 shows the lemma collocates by frequency. Unsurprisingly for a corpus of Serbian, the
most frequent lemma collocate of the lemma JEZIK is srpski ‘Serbian’ with 8,063
occurrences in 4,486 texts. Perhaps equally unsurprisingly, the second most frequent lemma
collocate is engleski ‘English’, which, as in
many other countries around the world, is seen as the most important foreign language
(followed here by French, German, Russian, and Spanish). The other top-ten lemma
collocates of the lemma JEZIK are strani ‘foreign’, govoriti ‘speak’, svoj
‘own’, maternji ‘mother (tongue)’, naš ‘our’, književnost ‘literature’, svi ‘all’, and
francuski ‘French’. As with the keyword analysis of SERBCORP above, the most frequent
routinized construction of in- and out-groups (Serbian, own, mother tongue, and our vs.
English, foreign, and French). Further, similar to the results of keyword analysis, the
remainder of the top 50 collocates indicate semantic fields of education (učiti ‘learn’,
discourses than keywords even at the level of SERBCORP. Thus, we see hrvatski
‘people’, istorija ‘history’, postoji ‘exists’, and ime ‘name’, hinting at the discourse of
suggested by the collocate manjina ‘minority’ as well as the glottonyms referring to the
two largest minority groups in Serbia, albanski ‘Albanian’ and mađarski ‘Hungarian’.
The top ten most significant collocates (see Table F2), on the other hand, exhibit a
more opaque pattern. Zli ‘evil’ refers to the common metonymy zli jezici ‘evil tongues’
(or, rather, ‘malicious tongues’). We also see an eclectic mix of items such as the verb
izučavati ‘to study’, the attributive adjective razumljiv ‘comprehensible’, the plural noun
brojki ‘numerals’, the singular noun geografija ‘geography’, and the glottonym švedski
‘Swedish’. More interestingly, the most significant collocates of the lemma JEZIK in
SERBCORP also include službeni and zvaničan, both meaning ‘official’, and hrvatski
to bilingualism, dvojezično ‘bilingual’. Finally, one entirely new pattern is suggested by the
attributive adjectives jednostavnim ‘simple’, jasnim ‘clear’, and čisti ‘pure’. In sum, then,
the top collocates of the lemma JEZIK in SERBCORP seem to show patterns similar to
those exhibited by key lemmas, while frequency seems to offer better insight into the
Table F1
N Collocate (English) Collocate (Serbian) MI score Texts Total
64 Serbs Srbi 6.95 176 234
65 knowledge poznavanje 8.85 208 231
66 poem pesma 5.94 205 229
67 course kurs 8.11 184 226
68 different različit 10.44 192 222
69 many mnogi 8.77 204 220
70 Greek grčki 7.80 173 219
71 official zvaničan 11.18 166 218
72 must morati 6.70 195 207
73 edition izdanje 6.43 187 205
74 mean (v.) značiti 5.81 194 204
75 class period čas 6.20 164 197
76 use (v). koristiti 9.73 172 197
77 translator prevodilac 5.37 178 197
78 understand razumeti 5.34 178 191
79 student (university) student 5.52 157 189
80 media mediji 6.22 161 187
81 poetry poezija 6.31 152 184
82 science nauka 5.76 146 182
83 text tekst 8.81 162 181
84 number broj 6.84 152 179
85 special poseban 5.51 152 177
86 education obrazovanje 5.63 156 176
87 four četiri 5.49 154 174
88 department katedra 6.66 130 174
89 European evropski 8.57 149 173
90 Italian italijanski 6.81 156 168
91 Slavic slovenski 8.27 114 168
92 others ostali 5.16 148 164
93 Romanian rumunski 7.08 126 160
94 state država 7.67 135 159
95 hair dlake 10.56 151 158
96 speech govor 6.31 127 155
97 instructor nastavnik 6.63 113 154
98 Bosniak bošnjački 7.02 87 152
99 exam ispit 6.79 117 152
100 introduction uvođenje 7.65 112 150
101 group grupa 8.61 120 148
102 textbook udžbenik 6.32 122 146
103 Latin latinski 8.81 115 143
104 written pisan 7.93 135 143
105 teach predavati 10.71 112 143
106 philological filološki 7.80 133 142
107 publish objaviti 5.39 136 141
108 novel roman 6.95 129 140
109 best najbolji 8.65 130 136
110 standardization standardizacija 9.41 69 133
111 five pet 5.15 123 132
112 said rečeno 9.00 125 129
113 music muzika 5.48 118 127
114 Vuk (Karadžić) Vuk 6.58 100 127
115 orthography pravopis 8.68 85 125
116 law zakon 5.05 100 124
117 faith vera 6.51 108 122
118 Slovene slovenački 5.97 112 121
119 program of study studija 5.52 105 121
120 written napisan 9.66 113 119
121 renaming preimenovanje 9.76 66 117
122 life život 5.60 111 117
123 university univerzitet 5.04 108 116
124 Bulgarian bugarski 8.36 83 115
125 serve služiti 9.32 105 115
126 represent predstavljati 5.07 106 114
127 film (adj.) filmski 5.48 101 112
128 customs običaji 8.21 107 112
129 area oblast 5.71 100 111
130 SANU SANU 6.86 77 111
131 hatred mržnje 7.80 91 109
132 call zvati 8.19 81 109
133 board odbor 5.60 65 107
134 various razni 9.14 99 107
135 dialect dijalekat 7.12 73 106
136 communication komunikacija 5.02 89 106
137 students (K-12) učenici 5.39 89 106
138 protection zaštita 5.50 94 106
139 Japanese japanski 8.28 79 105
140 Slovak slovački 7.43 87 105
141 grammar gramatika 8.65 82 104
142 think misliti 7.37 97 104
143 self sebe 7.59 92 101
144 Arabic arapski 9.31 86 100
145 church crkva 6.89 86 99
146 excellent odlično 5.82 96 98
147 poet pesnik 8.67 90 98
148 know poznavati 9.47 88 95
149 six šest 5.34 89 95
150 doctor dr 6.52 86 94
151 Chinese kineski 8.02 76 94
152 compulsory obavezan 6.78 78 93
153 level nivo 5.25 78 92
154 standard standardni 7.63 55 92
155 desire (v.) želeti 5.09 84 92
156 literature literatura 6.71 84 88
157 expression izraz 6.04 74 85
158 title naslov 9.11 70 85
159 introduce uvesti 8.97 70 85
160 territory prostor 7.66 80 84
161 attend pohađati 7.42 66 83
162 poetic pesnički 5.89 62 82
163 Croats Hrvati 6.16 52 80
164 department odsek 7.77 63 80
165 high school gimnazija 5.78 72 79
166 need (n.) potreba 5.03 71 79
167 framework okvir 5.03 66 76
168 style stil 6.81 71 76
169 so-called takozvani 6.05 59 75
170 instructional nastavni 5.54 55 73
171 translation prevođenje 11.04 64 73
172 belong pripadati 6.18 60 73
173 evil zli 13.67 68 73
174 against protiv 8.25 64 72
175 spoken govorni 8.62 56 71
176 Turkish turski 5.95 60 71
177 minority manjinski 11.05 48 69
178 both oba 6.77 58 69
179 Latin (alphabet) latinica 6.82 51 68
180 Polish poljski 8.55 62 68
181 needed potreban 5.33 64 67
182 exclusively isključivo 6.37 63 66
183 everyday svakodnevni 6.21 62 66
184 study (v.) izučavati 12.16 53 65
185 paper list 5.51 64 65
186 Macedonian makedonskom 9.40 58 64
187 Ruthenian rusinski 7.87 42 64
188 read čitati 6.05 59 63
189 ordinary običan 5.66 62 63
190 preservation očuvanje 7.34 59 63
191 comprehensible razumljiv 11.12 62 63
192 study (v.) studirati 5.10 53 63
193 fluently tečno 7.64 63 63
194 third treći 5.14 53 62
195 simultaneously istovremeno 5.02 57 61
196 classical klasični 7.27 40 61
197 appear pojaviti 6.24 58 61
198 defense odbrana 5.05 44 60
199 geography geografija 11.13 52 59
200 students (K-8) đaci 5.61 55 58
201 beautiful lep 5.58 53 58
202 listen slušati 7.12 50 58
203 test test 6.99 44 58
204 nature priroda 5.16 54 57
205 high (school) srednji 5.50 53 56
206 Cyrillic (adj.) ćirilično 7.12 47 55
207 hear čuti 10.20 54 55
208 unique jedinstven 7.26 35 54
209 novelty novina 7.30 51 54
210 create stvarati 5.62 43 53
211 association udruženje 5.01 44 53
212 public javni 5.18 42 52
213 writing pisanje 5.68 46 52
214 computer računaru 8.93 50 52
215 element element 5.55 35 51
216 informing informisanje 5.49 46 51
217 necessary neophodan 5.30 46 51
218 again ponovo 6.46 48 50
219 follow pratiti 5.49 49 50
220 make praviti 7.53 45 50
221 picture slika 7.78 47 50
222 hundred sto 6.75 45 50
223 help pomoć 6.60 45 49
224 printed štampan 7.60 48 49
225 scientific naučni 7.64 43 48
226 significance značaj 5.33 44 48
227 magazine/journal časopis 6.01 39 47
228 abroad (n.) inostranstvu 5.78 41 47
229 interest interesovanje 5.42 45 47
230 Karadžić Karadžić 5.63 45 47
231 private privatni 9.29 33 47
232 environment sredina 5.01 34 47
233 election (adj.) izborni 5.67 32 46
234 universal univerzalni 9.34 42 46
235 pure čisti 9.80 42 45
236 Priština (adj.) prištinski 9.43 45 45
237 Roma Roma 6.25 20 45
238 influence uticaj 5.57 35 45
239 religion religija 8.77 41 44
240 schooling školovanje 5.72 42 44
241 teachers učitelji 6.24 35 44
242 readers čitaoci 5.16 43 43
243 philosophy filozofija 7.76 30 43
244 institute matica 5.47 38 43
245 show pokazivati 5.06 40 43
246 structure struktura 5.51 34 43
247 task zadatak 5.03 35 43
248 keep držati 5.14 40 42
249 comes out izlazi 6.87 41 42
250 persons lica 5.05 21 42
251 linguistic lingvistički 8.26 31 42
252 form oblik 5.16 35 42
253 syntax sintaksa 8.66 29 42
254 stand stajati 6.85 38 42
255 easier lakše 5.44 40 41
256 editor lektor 6.82 37 41
257 none nijedan 5.74 39 41
258 lectures predavanja 6.19 38 41
259 differentiate razlikovati 5.86 36 41
260 symbol simbola 7.04 35 41
261 variant varijanta 6.75 33 41
262 future (adj.) budući 5.33 38 40
263 rule pravilo 8.03 37 40
264 similar sličan 8.60 35 40
265 written ispisan 10.21 34 39
266 linguistics lingvistika 5.90 31 39
267 mother (n.) majka 6.91 37 39
268 Nikšić Nikšić 5.67 32 39
269 area područje 5.45 36 39
270 take (exams) polagati 5.40 32 39
271 existence postojanje 6.26 31 39
272 including uključujući 5.73 37 39
273 speaking govoreći 8.48 38 38
274 violence nasilje 7.41 33 38
275 news vesti 6.24 34 38
276 Ijekavian ijekavski 7.02 29 37
277 call nazivati 6.82 32 37
278 organize organizovati 5.70 33 37
279 grade book dnevnik 6.78 36 36
280 rename preimenovati 8.24 24 36
281 reality stvarnost 6.96 35 36
282 computer science informatike 8.14 30 35
283 simple jednostavnim 11.02 34 35
284 encompass obuhvata 7.07 23 35
285 momentarily trenutno 6.76 33 35
286 thirty trideset 5.46 34 35
287 Ukrainian ukrajinski 8.97 30 35
288 purity čistota 8.59 26 34
289 reads glasi 6.79 32 34
290 linguist lingvista 6.73 33 34
291 first-graders prvaci 5.97 29 34
292 time slot termin 6.10 25 34
293 numerals brojki 11.62 32 33
294 use (n.) korišćenje 6.63 32 33
295 local lokalni 5.70 29 33
296 understood podrazumeva 5.04 33 33
297 research istraživanja 8.49 29 32
298 accent izgovor 5.52 27 32
299 adequate odgovarajući 5.85 29 32
300 sings peva 6.14 32 32
301 sentence rečenica 5.94 31 32
302 regional regionalni 10.31 24 32
303 study (n.) izučavanje 8.44 29 31
304 performs izvodi 7.26 30 31
305 minister ministar 5.19 29 31
306 enable omogućavati 5.52 27 31
307 jargon žargon 5.88 24 31
308 broadcast emituje 6.96 27 30
309 first najpre 8.26 28 30
310 paper papiru 7.80 30 30
311 master (v.) savladati 6.20 28 30
312 letter (a, b, c…) slovo 7.21 25 30
313 governing vladaju 7.32 27 30
314 rich bogat 5.56 28 29
315 diploma diploma 6.01 27 29
316 continue nastaviti 5.48 29 29
317 suits (v.) odgovara 7.05 27 29
318 discussion rasprava 5.29 25 29
319 equal ravnopravan 10.34 24 29
320 preserve sačuvati 5.25 26 29
321 population stanovništva 7.16 28 29
322 Subotica subotica 6.14 24 29
323 love (v.) volim 7.39 29 29
324 additional dodatni 5.41 27 28
325 bilingual dvojezično 9.00 28 28
326 offer (v.) nuditi 8.14 28 28
327 training obuku 9.65 26 28
328 count (n.) računa 7.12 25 28
329 walls zidova 6.49 22 28
330 optional fakultativni 8.59 24 27
331 South-Slavic južnoslovenski 11.88 22 27
332 phenomenon pojava 5.15 27 27
333 defend braniti 5.60 22 26
334 ethnic etnički 9.40 24 26
335 Hebrew hebrejskom 9.39 25 26
336 this (way) ovako 6.66 22 26
337 Swedish švedski 11.43 22 26
338 body tela 5.05 22 26
339 show (n.) emisije 5.79 24 25
340 voice glas 5.05 25 25
341 past prošlosti 6.57 21 25
342 understanding razumevanje 5.94 22 25
343 choose birati 5.55 21 24
344 document dokumenta 5.97 23 24
345 Yugoslavia Jugoslavije 7.09 23 24
346 find pronađu 5.12 24 24
347 across širom 8.17 24 24
348 perfecting usavršavanje 6.57 24 24
349 connoisseur znalac 8.38 23 24
350 Czech češki 6.38 20 23
351 twenty dvadeset 5.12 23 23
352 clear jasnim 10.26 23 23
353 unity jedinstvo 9.91 21 23
354 ignorance nepoznavanje 8.57 23 23
355 try (v.) pokuša(va)ti 6.68 22 23
356 little pomalo 7.06 23 23
357 declare izjasniti 5.04 20 22
358 Yugoslav jugoslovenske 8.94 21 22
359 dialect narečja 7.30 20 22
360 connoisseur poznavalac 7.27 22 22
361 whole (n.) celina 5.12 20 21
362 contribution doprinos 8.69 21 21
363 twenty dvadesetak 5.27 21 21
364 thousands hiljada 5.88 21 21
365 learn about upoznaju 6.09 20 21
366 works delima 5.16 20 20
367 notion pojam 5.94 20 20
368 taking (exams) polaganje 6.76 20 20
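Tables F1 and F2 report an MI score for each collocate of JEZIK. This appendix does not define the score. A common definition, assumed here, is pointwise mutual information: the base-2 logarithm of the observed co-occurrence count divided by the count expected by chance. Some tools also scale the expected count by the size of the collocation window. The smallest scores visible in the two tables are just above 5, and the smallest totals are 20. That suggests, but does not confirm, cut-offs of MI ≥ 5 and at least 20 co-occurrences. The hypothetical helpers below compute MI and apply those inferred thresholds.

```python
import math


def mutual_information(cooccurrences: int, freq_node: int, freq_collocate: int,
                       corpus_size: int, window: int = 1) -> float:
    """Pointwise MI in bits: log2(observed / expected).

    expected = freq_node * freq_collocate * window / corpus_size.
    window=1 gives the textbook form; passing the number of positions
    searched (e.g. 10 for five words either side) is one common adjustment.
    """
    expected = freq_node * freq_collocate * window / corpus_size
    return math.log2(cooccurrences / expected)


def keep_collocate(mi: float, total: int,
                   min_mi: float = 5.0, min_total: int = 20) -> bool:
    """Inferred (not documented) cut-offs: MI >= 5 and at least 20 co-occurrences."""
    return mi >= min_mi and total >= min_total


if __name__ == "__main__":
    # Toy counts: 8 observed against 1 expected gives log2(8) = 3 bits.
    print(mutual_information(8, 100, 100, 10_000))
    # Rows from Table F1: zli 'evil' (MI 13.67, Total 73) and polaganje (6.76, 20).
    print(keep_collocate(13.67, 73), keep_collocate(6.76, 20))
```

Because MI compares observed with expected counts, a rare collocate that almost always appears near JEZIK can outscore a very frequent one. This is why the MI ordering in Table F2 differs from the frequency ordering in Table F1.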
Table F2
N Collocate (English) Collocate (Serbian) MI score Texts Total
64 best najbolji 8.65 130 136
65 French francuski 8.63 857 1052
66 spoken govorni 8.62 56 71
67 group grupa 8.61 120 148
68 similar sličan 8.60 35 40
69 purity čistota 8.59 26 34
70 optional fakultativni 8.59 24 27
71 European evropski 8.57 149 173
72 ignorance nepoznavanje 8.57 23 23
73 Polish poljski 8.55 62 68
74 German nemački 8.52 739 877
75 research istraživanja 8.49 29 32
76 speaking govoreći 8.48 38 38
77 study (n.) izučavanje 8.44 29 31
78 Serbian srpski 8.42 4486 8063
79 mathematics matematika 8.40 219 323
80 connoisseur znalac 8.38 23 24
81 Bulgarian bugarski 8.36 83 115
82 use upotreba 8.35 366 538
83 Montenegrin crnogorski 8.34 196 459
84 Japanese japanski 8.28 79 105
85 Slavic slovenski 8.27 114 168
86 first najpre 8.26 28 30
87 linguistic lingvistički 8.26 31 42
88 against protiv 8.25 64 72
89 rename preimenovati 8.24 24 36
90 customs običaji 8.21 107 112
91 call zvati 8.19 81 109
92 across širom 8.17 24 24
93 offer (v.) nuditi 8.14 28 28
94 computer science informatike 8.14 30 35
95 course kurs 8.11 184 226
96 Serbo-Croatian srpskohrvatski 8.07 196 287
97 rule pravilo 8.03 37 40
98 Chinese kineski 8.02 76 94
99 translate prevoditi 7.96 359 404
100 written pisan 7.93 135 143
101 Russian ruski 7.89 518 717
102 Ruthenian rusinski 7.87 42 64
103 paper papiru 7.80 30 30
104 Greek grčki 7.80 173 219
105 philological filološki 7.80 133 142
106 hatred mržnje 7.80 91 109
107 picture slika 7.78 47 50
108 department odsek 7.77 63 80
109 philosophy filozofija 7.76 30 43
110 state država 7.67 135 159
111 territory prostor 7.66 80 84
112 introduction uvođenje 7.65 112 150
113 literature književnost 7.64 761 1092
114 scientific naučni 7.64 43 48
115 fluently tečno 7.64 63 63
116 standard standardni 7.63 55 92
117 printed štampan 7.60 48 49
118 self sebe 7.59 92 101
119 translated preveden 7.57 660 696
120 alphabet pismo 7.54 435 785
121 dictionary rečnik 7.54 171 265
122 instruction nastava 7.53 312 464
123 make praviti 7.53 45 50
124 Slovak slovački 7.43 87 105
125 attend pohađati 7.42 66 83
126 violence nasilje 7.41 33 38
127 their njihov 7.41 328 374
128 love (v.) volim 7.39 29 29
129 think misliti 7.37 97 104
130 preservation očuvanje 7.34 59 63
131 governing vladaju 7.32 27 30
132 dialect narečja 7.30 20 22
133 novelty novina 7.30 51 54
134 classical klasični 7.27 40 61
135 connoisseur poznavalac 7.27 22 22
136 minority manjina 7.27 209 312
137 performs izvodi 7.26 30 31
138 unique jedinstven 7.26 35 54
139 knowledge znanje 7.23 391 453
140 letter (a, b, c…) slovo 7.21 25 30
141 Roma (adj.) romski 7.17 146 273
142 population stanovništva 7.16 28 29
143 Cyrillic (adj.) ćirilično 7.12 47 55
144 count (n.) računa 7.12 25 28
145 dialect dijalekat 7.12 73 106
146 listen slušati 7.12 50 58
147 contemporary savremeni 7.10 213 256
148 Yugoslavia Jugoslavije 7.09 23 24
149 Romanian rumunski 7.08 126 160
150 encompass obuhvata 7.07 23 35
151 little pomalo 7.06 23 23
152 suits (v.) odgovara 7.05 27 29
153 symbol simbola 7.04 35 41
154 Ijekavian ijekavski 7.02 29 37
155 Bosniak bošnjački 7.02 87 152
156 test test 6.99 44 58
157 broadcast emituje 6.96 27 30
158 reality stvarnost 6.96 35 36
159 Serbs Srbi 6.95 176 234
160 novel roman 6.95 129 140
161 mother (n.) majka 6.91 37 39
162 church crkva 6.89 86 99
163 comes out izlazi 6.87 41 42
164 label naziv 6.86 179 255
165 SANU SANU 6.86 77 111
166 stand stajati 6.85 38 42
167 number broj 6.84 152 179
168 published objavljen 6.84 252 274
169 Latin (alphabet) latinica 6.82 51 68
170 editor lektor 6.82 37 41
171 call nazivati 6.82 32 37
172 style stil 6.81 71 76
173 Italian italijanski 6.81 156 168
174 exam ispit 6.79 117 152
175 reads glasi 6.79 32 34
176 grade book dnevnik 6.78 36 36
177 compulsory obavezan 6.78 78 93
178 say kazati 6.78 436 478
179 both oba 6.77 58 69
180 currently trenutno 6.76 33 35
181 taking (exams) polaganje 6.76 20 20
182 hundred sto 6.75 45 50
183 variant varijanta 6.75 33 41
184 linguist lingvista 6.73 33 34
185 literature literatura 6.71 84 88
186 must morati 6.70 195 207
187 try (v.) pokuša(va)ti 6.68 22 23
188 this (way) ovako 6.66 22 26
189 department katedra 6.66 130 174
190 use (n.) korišćenje 6.63 32 33
191 instructor nastavnik 6.63 113 154
192 help pomoć 6.60 45 49
193 Vuk (Karadžić) Vuk 6.58 100 127
194 past prošlosti 6.57 21 25
195 perfecting usavršavanje 6.57 24 24
196 culture kultura 6.54 676 818
197 doctor dr 6.52 86 94
198 professor profesor 6.52 503 619
199 faith vera 6.51 108 122
200 walls zidova 6.49 22 28
201 again ponovo 6.46 48 50
202 wrote pisali 6.44 337 402
203 edition izdanje 6.43 187 205
204 learn učiti 6.41 519 714
205 Czech češki 6.38 20 23
206 people's narodni 6.37 194 278
207 exclusively isključivo 6.37 63 66
208 textbook udžbenik 6.32 122 146
209 poetry poezija 6.31 152 184
210 speech govor 6.31 127 155
211 man čovek 6.30 316 347
212 existence postojanje 6.26 31 39
213 Roma Roma 6.25 20 45
214 appear pojaviti 6.24 58 61
215 news vesti 6.24 34 38
216 teachers učitelji 6.24 35 44
217 media mediji 6.22 161 187
218 everyday svakodnevni 6.21 62 66
219 class period čas 6.20 164 197
220 master (v.) savladati 6.20 28 30
221 lectures predavanja 6.19 38 41
222 belong pripadati 6.18 60 73
223 Croats Hrvati 6.16 52 80
224 sings peva 6.14 32 32
225 Subotica Subotica 6.14 24 29
226 time slot termin 6.10 25 34
227 learn about upoznaju 6.09 20 21
228 world (adj.) svetski 6.06 301 323
229 read čitati 6.05 59 63
230 so-called takozvani 6.05 59 75
231 expression izraz 6.04 74 85
232 national nacionalni 6.02 356 488
233 word reč 6.02 543 644
234 magazine/journal časopis 6.01 39 47
235 diploma diploma 6.01 27 29
236 Slovene slovenački 5.97 112 121
237 document dokumenta 5.97 23 24
238 first-graders prvaci 5.97 29 34
239 Turkish turski 5.95 60 71
240 understanding razumevanje 5.94 22 25
241 sentence rečenica 5.94 31 32
242 poem pesma 5.94 205 229
243 foreign strani 5.94 1448 2293
244 notion pojam 5.94 20 20
245 our naš 5.90 1026 1262
246 linguistics lingvistika 5.90 31 39
247 poetic pesnički 5.89 62 82
248 people narod 5.89 340 455
249 history istorija 5.88 315 372
250 jargon žargon 5.88 24 31
251 Cyrillic ćirilica 5.88 173 256
252 thousands hiljada 5.88 21 21
253 differentiate razlikovati 5.86 36 41
254 adequate odgovarajući 5.85 29 32
255 school (university) fakultet 5.85 298 351
256 school škola 5.85 501 673
257 translation prevod 5.84 369 413
258 excellently odlično 5.82 96 98
259 speak govoriti 5.82 1609 1977
260 mean (v.) značiti 5.81 194 204
261 good dobar 5.80 284 301
262 show (n.) emisije 5.79 24 25
263 abroad (n.) inostranstvu 5.78 41 47
264 high school gimnazija 5.78 72 79
265 learn naučiti 5.76 229 252
266 science nauka 5.76 146 182
267 name ime 5.74 198 320
268 none nijedan 5.74 39 41
269 including uključujući 5.73 37 39
270 schooling školovanje 5.72 42 44
271 area oblast 5.71 100 111
272 local lokalni 5.70 29 33
273 organize organizovati 5.70 33 37
274 writing pisanje 5.68 46 52
275 Nikšić Nikšić 5.67 32 39
276 election (adj.) izborni 5.67 32 46
277 own svoj 5.67 1080 1442
278 ordinary običan 5.66 62 63
279 exist postojati 5.64 278 328
280 education obrazovanje 5.63 156 176
281 Karadžić Karadžić 5.63 45 47
282 create stvarati 5.62 43 53
283 students (K-8) đaci 5.61 55 58
284 defend braniti 5.60 22 26
285 board odbor 5.60 65 107
286 life život 5.60 111 117
287 same isti 5.59 321 402
288 beautiful lep 5.58 53 58
289 influence uticaj 5.57 35 45
290 rich bogat 5.56 28 29
291 choose birati 5.55 21 24
292 element element 5.55 35 51
293 instructional nastavni 5.54 55 73
294 enable omogućavati 5.52 27 31
295 student (university) student 5.52 157 189
296 accent izgovor 5.52 27 32
297 program of study studija 5.52 105 121
298 paper list 5.51 64 65
299 structure struktura 5.51 34 43
300 special poseban 5.51 152 177
301 high (school) srednji 5.50 53 56
302 protection zaštita 5.50 94 106
303 follow pratiti 5.49 49 50
304 informing informisanje 5.49 46 51
305 four četiri 5.49 154 174
306 music muzika 5.48 118 127
307 continue nastaviti 5.48 29 29
308 film (adj.) filmski 5.48 101 112
309 institute matica 5.47 38 43
310 thirty trideset 5.46 34 35
311 area područje 5.45 36 39
312 easier lakše 5.44 40 41
313 interest interesovanje 5.42 45 47
314 additional dodatni 5.41 27 28
315 take (exams) polagati 5.40 32 39
316 students (K-12) učenici 5.39 89 106
317 publish objaviti 5.39 136 141
318 translator prevodilac 5.37 178 197
319 understand razumeti 5.34 178 191
320 six šest 5.34 89 95
321 future (adj.) budući 5.33 38 40
322 significance značaj 5.33 44 48
323 needed potreban 5.33 64 67
324 necessary neophodan 5.30 46 51
325 discussion rasprava 5.29 25 29
326 twenty dvadesetak 5.27 21 21
327 preserve sačuvati 5.25 26 29
328 level nivo 5.25 78 92
329 second drugi 5.21 781 1022
330 subject predmet 5.21 207 297
331 minister ministar 5.19 29 31
332 public javni 5.18 42 52
333 others ostali 5.16 148 164
334 nature priroda 5.16 54 57
335 form oblik 5.16 35 42
336 readers čitaoci 5.16 43 43
337 works delima 5.16 20 20
338 literary književni 5.15 308 514
339 phenomenon pojava 5.15 27 27
340 five pet 5.15 123 132
341 keep držati 5.14 40 42
342 third treći 5.14 53 62
343 find pronađu 5.12 24 24
344 twenty dvadeset 5.12 23 23
345 whole (n.) celina 5.12 20 21
346 several nekoliko 5.11 227 238
347 study (v.) studirati 5.10 53 63
348 desire (v.) želeti 5.09 84 92
349 all svi 5.08 902 1068
350 represent predstavljati 5.07 106 114
351 show pokazivati 5.06 40 43
352 defense odbrana 5.05 44 60
353 body tela 5.05 22 26
354 law zakon 5.05 100 124
355 three tri 5.05 217 259
356 persons lica 5.05 21 42
357 voice glas 5.05 25 25
358 university univerzitet 5.04 108 116
359 understood podrazumeva 5.04 33 33
360 declare izjasniti 5.04 20 22
361 need (n.) potreba 5.03 71 79
362 framework okvir 5.03 66 76
363 task zadatak 5.03 35 43
364 two dva 5.02 381 484
365 simultaneously istovremeno 5.02 57 61
366 communication komunikacija 5.02 89 106
367 environment sredina 5.01 34 47
368 association udruženje 5.01 44 53
Appendix G: Collocation Analysis (5+ Hits Section of SERBCORP)
Table G1
Lemma Collocates of the Lemma JEZIK ‘Language’ in the 5+ hits section of SERBCORP
(by frequency)
N Collocate (English) Collocate (Serbian) MI score Texts Total
58 elementary osnovni 9.64 107 138
59 Serbs Srbi 5.74 86 138
60 Cyrillic (n.) ćirilica 7.61 70 135
61 mathematics matematika 9.29 68 135
62 Roma romski 8.49 34 135
63 (Monte)negro Crna 7.75 81 134
64 history istorija 7.43 97 134
65 part deo 5.62 95 129
66 minority manjina 7.89 64 128
67 knowledge znanje 8.76 90 124
68 country zemlja 7.03 86 122
69 curriculum program 7.89 85 121
70 school (university) fakultet 7.22 86 120
71 standardization standardizacija 10.72 56 117
72 Bosniak bošnjački 7.75 54 116
73 children deca 7.21 75 113
74 every svaki 6.34 94 113
75 official zvaničan 10.91 72 112
76 institute institut 8.02 54 110
77 little mali 7.28 75 110
78 big veliki 5.85 82 110
79 man čovek 6.13 86 109
80 Hungarian mađarski 9.85 51 108
81 that is odnosno 5.51 81 108
82 contemporary savremeni 10.23 73 106
83 grade razred 7.20 65 103
84 Slovene slovenski 13.86 58 101
85 science nauka 7.56 69 100
86 instructor nastavnik 8.59 62 99
87 Spanish španski 7.60 47 98
88 renaming preimenovanje 9.54 45 94
89 that onaj 6.63 74 93
90 special poseban 11.94 71 93
91 introduction uvođenje 7.73 58 92
92 department katedra 13.25 57 91
93 orthography pravopis 7.82 53 90
94 student (university) student 8.57 62 90
95 board odbor 7.74 48 88
96 number broj 9.08 61 84
97 class period čas 7.76 64 84
98 linguistic jezički 7.09 63 83
99 begin početi 7.97 33 83
100 different različit 13.87 61 83
101 common zajednički 10.79 64 83
102 good dobar 6.75 75 82
103 rights prava 5.28 65 82
104 standard (adj.) standardni 9.12 45 81
105 state država 8.45 61 79
106 dialect dijalekat 9.50 46 78
107 many mnogi 7.18 68 78
108 learn (v.) naučiti 13.36 66 78
109 call zvati 9.15 51 78
110 cultural kulturni 7.13 70 77
111 Albanian albanski 9.84 37 76
112 use (v.) koristiti 12.79 55 76
113 Vuk (Karadžić) Vuk 7.57 50 75
114 say reći 11.64 64 74
115 basis osnov 9.41 60 72
116 speech govor 6.51 49 71
117 Belgrade Beograd 5.46 64 70
118 Greek grčki 7.76 34 70
119 problem problem 5.37 59 70
120 SANU SANU 6.19 40 70
121 Croats Hrvati 6.48 42 69
122 translation prevod 6.20 54 69
123 law zakon 6.13 47 69
124 teach predavati 8.05 49 68
125 European evropski 7.12 48 67
126 course kurs 7.98 43 67
127 grammar gramatika 8.34 44 65
128 group grupa 10.75 43 65
129 my moj 5.67 44 65
130 relation odnos 6.56 50 65
131 become postati 6.94 50 65
132 political politički 6.23 52 64
133 writer pisac 9.14 46 63
134 education obrazovanje 5.37 49 62
135 own sopstveni 8.14 46 62
136 that tim 8.73 53 62
137 get dobiti 7.31 52 60
138 only jedini 9.34 50 60
139 decision odluka 6.81 41 60
140 translated preveden 9.30 49 60
141 Latin latinski 7.87 35 59
142 section odeljenje 10.40 38 59
143 textbook udžbenik 8.23 44 59
144 four četiri 5.08 46 58
145 exam ispit 6.69 38 57
146 living živ 8.50 47 57
147 media mediji 8.43 35 56
148 translate prevoditi 9.40 42 56
149 students (K-12) učenici 5.80 42 56
150 percent odsto 7.03 32 55
151 translator prevodilac 6.20 42 55
152 world (adj.) svetski 7.40 45 55
153 introduce uvesti 12.45 41 55
154 I ja 7.89 41 54
155 these ovi 9.36 51 54
156 represent predstavljati 5.53 48 54
157 work (v.) raditi 5.80 48 54
158 constitution ustav 6.69 40 54
159 others ostali 8.06 46 53
160 case slučaj 7.62 45 53
161 desire (v.) želeti 10.03 49 53
162 Europe Evropa 7.33 41 52
163 always uvek 7.22 49 52
164 Bulgarian bugarski 6.96 24 51
165 nation nacija 11.23 37 51
166 need (n.) potreba 7.05 43 51
167 study (n.) studija 6.37 40 51
168 make (v.) čini 6.02 43 50
169 identity identitet 6.31 37 50
170 compulsory obavezan 8.16 36 50
171 poetry poezija 10.49 63 50
172 Romanian rumunski 8.65 33 50
173 so-called takozvani 8.37 36 50
174 be able to moći 7.20 35 49
175 communication komunikacija 8.06 33 48
176 level nivo 6.70 38 48
177 area oblast 7.54 38 48
178 instructional nastavni 7.63 31 47
179 engage baviti 9.66 39 46
180 Montenegrins Crnogorci 6.48 22 46
181 department odsek 9.27 30 46
182 consider smatrati 5.01 42 46
183 art umetnost 6.15 42 46
184 day dan 5.70 41 45
185 defense odbrana 8.67 30 45
186 possibility mogućnost 5.67 38 44
187 war rat 5.95 27 44
188 difference razlika 10.73 35 44
189 thing stvar 5.86 38 44
190 society društvo 5.84 32 43
191 think misliti 6.60 37 43
192 change (n.) promena 6.62 33 43
193 Ruthenian rusinski 8.88 24 43
194 text tekst 5.74 30 43
195 protection zaštita 7.41 39 43
196 old stari 7.37 33 42
197 lead (v.) voditi 6.63 39 42
198 spirit duh 6.60 34 41
199 philological filološki 7.97 35 41
200 Latin (alphabet) latinica 11.10 27 41
201 best najbolji 6.37 39 41
202 poem pesma 7.04 35 41
203 majority većina 9.34 36 41
204 link veza 7.32 35 41
205 framework okvir 8.61 30 40
206 remain ostati 6.85 33 40
207 knowledge poznavanje 7.61 34 40
208 Slovak slovački 8.65 27 40
209 faith vera 6.89 32 40
210 come doći 6.61 37 39
211 spoken govorni 10.14 28 39
212 understand razumeti 7.05 33 39
213 such takav 5.61 36 39
214 center centar 5.70 29 38
215 title naslov 9.50 26 38
216 attend pohađati 7.86 23 38
217 belong pripadati 6.04 27 38
218 element element 8.90 23 37
219 biggest najveći 5.51 33 37
220 write pisati 8.37 30 37
221 both oba 5.89 26 36
222 self sebe 5.62 32 36
223 study (v.) izučavati 9.23 26 35
224 linguistic lingvistički 8.38 25 35
225 written napisan 9.45 29 35
226 beginning početak 5.95 32 35
227 development razvoj 5.57 28 35
228 influence (n.) uticaj 6.82 25 35
229 community zajednica 6.66 27 35
230 elective izborni 6.37 20 34
231 published objavljen 6.39 30 34
232 system sistem 9.58 29 34
233 use (v.) služiti 10.39 29 34
234 expert stručnjak 7.47 30 34
235 tradition tradicija 6.62 29 34
236 third treći 6.46 28 34
237 philosophical filozofski 6.39 24 33
238 Croatia Hrvatska 6.48 26 33
239 Italian italijanski 6.55 27 33
240 expression izraz 8.84 25 33
241 to not have nemati 6.57 32 33
242 her njen 9.47 29 33
243 publish objaviti 8.07 32 33
244 council savet 5.80 24 33
245 Belgrade (adj.) beogradski 6.52 29 32
246 edition izdanje 5.95 25 32
247 existence postojanje 8.03 24 32
248 Cyrillic (adj.) ćirilično 7.91 23 31
249 citizens građani 9.02 25 31
250 linguistics lingvistika 8.17 23 31
251 change menjati 5.76 22 31
252 needed potreban 7.06 28 31
253 republic republika 6.29 25 31
254 Ijekavian (dialect) ijekavski 7.26 22 30
255 plan plan 6.20 24 30
256 association udruženje 6.96 23 30
257 linguist lingvista 5.77 28 29
258 reason razlog 5.64 20 29
259 six šest 5.14 25 29
260 high school gimnazija 5.14 24 28
261 violence nasilje 10.11 23 28
262 scientific naučni 6.33 25 28
263 necessary neophodan 8.16 25 28
264 never nikad 5.50 26 28
265 standard (n.) standard 7.19 21 28
266 exclusively isključivo 8.51 25 27
267 Karadžić (Vuk) Karadžić 6.33 25 27
268 come into being nastati 6.76 22 27
269 explain objašnjavati 9.21 23 27
270 preservation očuvanje 6.67 23 27
271 accept prihvatiti 7.65 25 27
272 structure struktura 7.05 20 27
273 claim (v.) tvrditi 7.58 26 27
274 teachers (K-8) učitelji 6.27 20 27
275 see videti 10.42 23 27
276 state (adj.) državni 7.00 20 26
277 Nikšić Nikšić 7.59 20 26
278 form oblik 6.23 20 26
279 concern (v.) ticati 5.34 23 26
280 often često 5.09 24 25
281 name (v.) nazvati 7.03 20 25
282 nature priroda 6.54 23 25
283 sense smisao 6.11 24 25
284 high (school) srednji 8.12 22 25
285 creation stvaranje 6.13 20 25
286 topic tema 5.49 23 25
287 last poslednji 6.27 23 24
288 means (n.) sredstvo 6.50 22 24
289 everyday svakodnevni 8.18 20 24
290 difficult (adv.) teško 8.10 24 24
291 authorities vlast 5.07 20 24
292 significance značaj 5.70 20 24
293 academy akademija 5.99 22 23
294 institution institucija 5.39 21 23
295 opinion mišljenje 5.26 21 23
296 consideration obzir 5.07 20 23
297 bigger veći 5.69 21 23
298 work (n.) delo 5.47 20 22
299 less/er manje 5.15 21 22
300 origin poreklo 6.67 21 22
301 project (n.) projekat 8.95 20 22
302 hundred sto 5.55 20 22
303 interest interesovanje 5.74 20 21
304 studying izučavanje 7.87 20 21
305 government vlada 5.66 20 20
Table G2
Lemma Collocates of the Lemma JEZIK ‘Language’ in the 5+ hits section of SERBCORP
(by MI Score)
N Collocate (English) Collocate (Serbian) MI score Texts Total
61 explain objašnjavati 9.21 23 27
62 alphabet pismo 9.19 165 457
63 literary književni 9.16 176 335
64 call zvati 9.15 51 78
65 writer pisac 9.14 46 63
66 standard (adj.) standardni 9.12 45 81
67 number broj 9.08 61 84
68 citizens građani 9.02 25 31
69 project (n.) projekat 8.95 20 22
70 Monte(negro) Gora 8.92 80 144
71 our naš 8.91 257 401
72 element element 8.90 23 37
73 Ruthenian rusinski 8.88 24 43
74 mother (adj.) maternji 8.86 296 636
75 expression izraz 8.84 25 33
76 Serbian srpski 8.80 802 3449
77 Serbo-Croatian srpskohrvatski 8.76 96 182
78 knowledge znanje 8.76 90 124
79 culture kultura 8.74 153 231
80 that tim 8.73 53 62
81 defense odbrana 8.67 30 45
82 Romanian rumunski 8.65 33 50
83 Slovak slovački 8.65 27 40
84 national nacionalni 8.64 126 222
85 framework okvir 8.61 30 40
86 instructor nastavnik 8.59 62 99
87 student (university) student 8.57 62 90
88 exclusively isključivo 8.51 25 27
89 living živ 8.50 47 57
90 Roma romski 8.49 34 135
91 state država 8.45 61 79
92 media mediji 8.43 35 56
93 linguistic lingvistički 8.38 25 35
94 so-called takozvani 8.37 36 50
95 write pisati 8.37 30 37
96 grammar gramatika 8.34 44 65
97 instruction nastava 8.30 149 262
98 all svi 8.24 290 406
99 textbook udžbenik 8.23 44 59
100 everyday svakodnevni 8.18 20 24
101 linguistics lingvistika 8.17 23 31
102 compulsory obavezan 8.16 36 50
103 necessary neophodan 8.16 25 28
104 professor profesor 8.14 145 239
105 own sopstveni 8.14 46 62
106 high (school) srednji 8.12 22 25
107 Croatian hrvatski 8.10 157 363
108 difficult (adv.) teško 8.10 24 24
109 publish objaviti 8.07 32 33
110 others ostali 8.06 46 53
111 communication komunikacija 8.06 33 48
112 teach predavati 8.05 49 68
113 existence postojanje 8.03 24 32
114 institute institut 8.02 54 110
115 Russian ruski 7.98 89 202
116 course kurs 7.98 43 67
117 begin početi 7.97 33 83
118 philological filološki 7.97 35 41
119 Cyrillic (adj.) ćirilično 7.91 23 31
120 minority manjina 7.89 64 128
121 curriculum program 7.89 85 121
122 I ja 7.89 41 54
123 Latin latinski 7.87 35 59
124 studying izučavanje 7.87 20 21
125 attend pohađati 7.86 23 38
126 orthography pravopis 7.82 53 90
127 class period čas 7.76 64 84
128 Greek grčki 7.76 34 70
129 (Monte)negro Crna 7.75 81 134
130 Bosniak bošnjački 7.75 54 116
131 board odbor 7.74 48 88
132 introduction uvođenje 7.73 58 92
133 name naziv 7.72 89 151
134 own svoj 7.70 360 608
135 German nemački 7.66 112 177
136 accept prihvatiti 7.65 25 27
137 that taj 7.64 484 791
138 instructional nastavni 7.63 31 47
139 case slučaj 7.62 45 53
140 Cyrillic (n.) ćirilica 7.61 70 135
141 knowledge poznavanje 7.61 34 40
142 Spanish španski 7.60 47 98
143 Nikšić Nikšić 7.59 20 26
144 literature književnost 7.58 220 456
145 claim (v.) tvrditi 7.58 26 27
146 Vuk (Karadžić) Vuk 7.57 50 75
147 science nauka 7.56 69 100
148 area oblast 7.54 38 48
149 their njihov 7.48 135 168
150 expert stručnjak 7.47 30 34
151 his njegov 7.45 165 213
152 history istorija 7.43 97 134
153 protection zaštita 7.41 39 43
154 world (adj.) svetski 7.40 45 55
155 same isti 7.39 112 154
156 old stari 7.37 33 42
157 Europe Evropa 7.33 41 52
158 link veza 7.32 35 41
159 get dobiti 7.31 52 60
160 learning učenje 7.28 136 222
161 little mali 7.28 75 110
162 Ijekavian (dialect) ijekavski 7.26 22 30
163 school (university) fakultet 7.22 86 120
164 always uvek 7.22 49 52
165 children deca 7.21 75 113
166 grade razred 7.20 65 103
167 be able to moći 7.20 35 49
168 standard (n.) standard 7.19 21 28
169 many mnogi 7.18 68 78
170 cultural kulturni 7.13 70 77
171 European evropski 7.12 48 67
172 people narod 7.09 153 252
173 linguistic jezički 7.09 63 83
174 needed potreban 7.06 28 31
175 need (n.) potreba 7.05 43 51
176 understand razumeti 7.05 33 39
177 structure struktura 7.05 20 27
178 poem pesma 7.04 35 41
179 country zemlja 7.03 86 122
180 percent odsto 7.03 32 55
181 name (v.) nazvati 7.03 20 25
182 state (adj.) državni 7.00 20 26
183 school (K-12) škola 6.97 179 318
184 Bulgarian bugarski 6.96 24 51
185 association udruženje 6.96 23 30
186 become postati 6.94 50 65
187 faith vera 6.89 32 40
188 remain ostati 6.85 33 40
189 influence (n.) uticaj 6.82 25 35
190 decision odluka 6.81 41 60
191 come into being nastati 6.76 22 27
192 good dobar 6.75 75 82
193 level nivo 6.70 38 48
194 exam ispit 6.69 38 57
195 constitution ustav 6.69 40 54
196 preservation očuvanje 6.67 23 27
197 origin poreklo 6.67 21 22
198 dictionary rečnik 6.66 77 158
199 community zajednica 6.66 27 35
200 that onaj 6.63 74 93
201 lead (v.) voditi 6.63 39 42
202 change (n.) promena 6.62 33 43
203 tradition tradicija 6.62 29 34
204 come doći 6.61 37 39
205 think misliti 6.60 37 43
206 spirit duh 6.60 34 41
207 to not have nemati 6.57 32 33
208 relation odnos 6.56 50 65
209 Italian italijanski 6.55 27 33
210 Serbia Srbija 6.54 109 148
211 nature priroda 6.54 23 25
212 this ovaj 6.53 211 265
213 Belgrade (adj.) beogradski 6.52 29 32
214 speech govor 6.51 49 71
215 means (n.) sredstvo 6.50 22 24
216 Croats Hrvati 6.48 42 69
217 Montenegrins Crnogorci 6.48 22 46
218 Croatia Hrvatska 6.48 26 33
219 third treći 6.46 28 34
220 world (n.) svet 6.43 112 150
221 published objavljen 6.39 30 34
222 philosophical filozofski 6.39 24 33
223 study (n.) studija 6.37 40 51
224 best najbolji 6.37 39 41
225 elective izborni 6.37 20 34
226 every svaki 6.34 94 113
227 scientific naučni 6.33 25 28
228 Karadžić (Vuk) Karadžić 6.33 25 27
229 identity identitet 6.31 37 50
230 republic republika 6.29 25 31
231 teachers (K-8) učitelji 6.27 20 27
232 last poslednji 6.27 23 24
233 political politički 6.23 52 64
234 form oblik 6.23 20 26
235 one jedan 6.22 281 454
236 translation prevod 6.20 54 69
237 translator prevodilac 6.20 42 55
238 plan plan 6.20 24 30
239 SANU SANU 6.19 40 70
240 art umetnost 6.15 42 46
241 man čovek 6.13 86 109
242 law zakon 6.13 47 69
243 creation stvaranje 6.13 20 25
244 sense smisao 6.11 24 25
245 first prvi 6.08 154 222
246 belong pripadati 6.04 27 38
247 make (v.) čini 6.02 43 50
248 academy akademija 5.99 22 23
249 war rat 5.95 27 44
250 beginning početak 5.95 32 35
251 edition izdanje 5.95 25 32
252 both oba 5.89 26 36
253 question pitanje 5.88 108 168
254 they oni 5.86 245 310
255 word reč 5.86 180 253
256 thing stvar 5.86 38 44
257 big veliki 5.85 82 110
258 society društvo 5.84 32 43
259 students (K-12) učenici 5.80 42 56
260 work (v.) raditi 5.80 48 54
261 council savet 5.80 24 33
262 linguist lingvista 5.77 28 29
263 change menjati 5.76 22 31
264 Serbs Srbi 5.74 86 138
265 text tekst 5.74 30 43
266 interest interesovanje 5.74 20 21
267 two dva 5.72 138 206
268 day dan 5.70 41 45
269 center centar 5.70 29 38
270 significance značaj 5.70 20 24
271 bigger veći 5.69 21 23
272 my moj 5.67 44 65
273 possibility mogućnost 5.67 38 44
274 government vlada 5.66 20 20
275 reason razlog 5.64 20 29
276 part deo 5.62 95 129
277 self sebe 5.62 32 36
278 such takav 5.61 36 39
279 development razvoj 5.57 28 35
280 hundred sto 5.55 20 22
281 represent predstavljati 5.53 48 54
282 that is odnosno 5.51 81 108
283 biggest najveći 5.51 33 37
284 never nikad 5.50 26 28
285 topic tema 5.49 23 25
286 work (n.) delo 5.47 20 22
287 Belgrade Beograd 5.46 64 70
288 book knjiga 5.41 118 169
289 he on 5.39 133 156
290 institution institucija 5.39 21 23
291 problem problem 5.37 59 70
292 education obrazovanje 5.37 49 62
293 year godina 5.36 186 239
294 concern (v.) ticati 5.34 23 26
295 rights prava 5.28 65 82
296 opinion mišljenje 5.26 21 23
297 less/er manje 5.15 21 22
298 six šest 5.14 25 29
299 high school gimnazija 5.14 24 28
300 itself sam 5.11 338 494
301 often često 5.09 24 25
302 four četiri 5.08 46 58
303 authorities vlast 5.07 20 24
304 consideration obzir 5.07 20 23
305 consider smatrati 5.01 42 46
1
SETIMES2, OPUS2, and srWaC14 are available at www.sketchengine.co.uk.
2
The caveat here, of course, is that the reference/comparator corpus should be at least the size of the
research corpus.
3
Because of their large sizes and limited availability, the WaC corpora could only be used as reference
corpora by first downloading their full wordlists in txt format and then converting these into WST wordlists
for the purposes of keyword analysis. The alternative solution, uploading the entire SERBCOMP onto the
SketchEngine website to conduct keyword analysis there, was technically demanding and prohibitively
expensive.
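The keyword analysis mentioned in this note compares how often each item occurs in the research corpus with how often it occurs in a reference wordlist. The Python sketch below illustrates one common way of doing this. It assumes Dunning's log-likelihood (G2) as the keyness statistic, approximates corpus sizes by wordlist totals, and uses a hypothetical form-to-lemma mapping to mirror the lemma-based counting described in note 4. The function names are illustrative only; they are not WordSmith Tools or SketchEngine features, and the sketch is not the exact procedure used in this study.

```python
import math
from collections import Counter


def lemma_frequencies(form_counts, lemma_of):
    """Sum the frequencies of inflected word forms under their lemma.

    form_counts: dict mapping word form -> raw frequency.
    lemma_of: hypothetical dict mapping word form -> lemma; forms not
    listed are treated as their own lemma.
    """
    totals = Counter()
    for form, freq in form_counts.items():
        totals[lemma_of.get(form, form)] += freq
    return totals


def log_likelihood(freq_study, freq_ref, size_study, size_ref):
    """Dunning's log-likelihood (G2) for one item in two corpora."""
    combined = freq_study + freq_ref
    total = size_study + size_ref
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # the O * ln(O/E) term is 0 when O == 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def keywords(study, ref, min_freq=5):
    """Return (item, G2) pairs for items overused in the study corpus.

    Corpus sizes are approximated by the wordlist totals (an assumption).
    Items below min_freq in the study corpus are skipped.
    """
    size_study = sum(study.values())
    size_ref = sum(ref.values())
    results = []
    for item, f_study in study.items():
        if f_study < min_freq:
            continue
        f_ref = ref.get(item, 0)
        # Keep positive keywords only: relatively more frequent in the study corpus.
        if f_study / size_study <= f_ref / size_ref:
            continue
        results.append((item, log_likelihood(f_study, f_ref, size_study, size_ref)))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy counts invented for illustration; these are not SERBCORP data.
    forms = {"jezik": 3, "jezika": 3, "jeziku": 3, "i": 40}
    lemma_of = {"jezika": "jezik", "jeziku": "jezik"}  # hypothetical lemma map
    reference = {"jezik": 2, "i": 400}
    print(keywords(forms, reference))  # inflected forms counted separately
    print(keywords(lemma_frequencies(forms, lemma_of), reference))  # lemma totals
```

With these toy counts, no single form of jezik reaches the threshold of 5 on its own, whereas the lemma total of 9 does, which mirrors the point about frequency thresholds made in note 4.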
4
It should be noted that Serbian is a highly inflected language, so all search terms, keywords,
collocates, and n-grams are likely to (and do) appear in multiple inflected forms in the corpus.
Although lemmatization can be problematic because it “has the potential to disguise important differences
in collocational preferences between different forms of a lemma” (Durrant, 2009, p. 162; see also Sinclair,
1991), it was consistently applied to all quantitative CL analyses (with the exception of n-gram analysis) in
this study to reduce the impact of inflectional morphology on statistical analyses (cf. Baker, Gabrielatos &
McEnery, 2013; Partington, 2010). For example, treating individual lemma forms separately often meant
that obviously important lexical items either fell (well) below the frequency threshold or appeared to be
less salient than they actually were. Lemmatization solved this problem by summing the frequencies of all individual
lemma forms into a total lemma frequency. Similarly, treating individual lemma forms separately would
have multiplied the sometimes already large numbers of keywords, collocates, and n-grams. As a
corollary, many collocate variables based on individual lemma forms in EFA would likely have failed to
load on any factors due to their considerably lower frequencies. Thus, even though different lemma forms
do often exhibit different, and sometimes complementary, collocational preferences, this does not appear to
the predominant number (singular or plural) for nouns and pronouns, the masculine nominative singular for
adjectives, and the infinitive for verbs. Keywords that appeared in only one of their possible lemma