1093/ijl/eci051
Abstract
The present article starts from a broad definition of collocations as holistic
lexico-grammatical or semantic units (see Part I for full details), asking how such
units can be adequately represented in bilingual and monolingual encoding dictionaries.
It is found that an onomasiological approach to dictionary making is better suited to
this task than a semasiological, framework-based methodology whereby individual
lexicographers work on small, alphabetically classified sections of the dictionary.
Typically, semasiological dictionaries and corresponding methodologies have difficulty
in arranging items in a clear and memorable way, give patchy or inadequate coverage to
semantic-pragmatic collocations, cannot provide adequate cross-referencing between
synonymous items and are prone to translation errors. It is shown how onomasiological
dictionaries and methodologies can remedy such deficiencies. The Bilexicon project,
aimed at creating thematic learners' dictionaries, is the main source drawn on
to illustrate the suggestions made.
1. Introduction
There is growing recognition that both structurally simple (i.e. (bound)
morphemes, lexemes) and structurally complex units (i.e. collocations or
colligational patterns) are linguistic signs (Feilke 2003). If the dictionary is
meant to be a record of such signs, the task of the lexicographer is to gather
together evidence of both types of sign. So far it has been lexemes, noncompositional idioms and morphemes that have received the bulk of
lexicographic attention, but the future clearly belongs to collocation and
colligation in the widest possible sense. However, most linguistic models of
collocation are too limited (e.g. Hausmann 1999), too formalist (e.g. Mel'čuk
1998) or too broad (e.g. Kjellmer 1994) to be readily adaptable to lexicographic
practice (see the first part of this article, IJL 18/4).
International Journal of Lexicography, Vol. 19 No. 1. Advance access publication 29 November 2005
© 2005 Oxford University Press. All rights reserved. For permissions,
please email: journals.permissions@oxfordjournals.org
Dirk Siepmann
2.1 Rationale
The rationale behind the Bilexicon project proceeds from a paradox about
foreign language learning in higher education: language teaching specialists
have long demanded that university graduates in modern languages should
have a native-like lexical competence in their L2 (e.g. Meiner et al. 2001);
in practice, however, such a competence is seldom attained, and few serious
It should have become clear that, despite its deficiencies, the second
alternative is more promising than an approach based on frequency alone,
especially if the point of departure is a clearly delimited area of the vocabulary,
such as the language of motoring or the vocabulary relating to feelings. First,
a very large corpus of subject-specific material is assembled from the Internet and
other sources, such as existing corpora and published dictionaries. In constructing
such a corpus, it is important to include Internet genres that are lexically close
to real-life speech, such as news forums, e-mail, fan fiction, and film and soap
opera scripts. A further means of reducing the inevitable bias towards
writing in corpus construction is to elicit judgements from native speakers on
the currency of particular words and collocations in speech. It is to be expected,
however, that such tests will produce tangible results in only a few vocabulary
areas, such as proverbs (Arnaud 1992) or idioms. In others, such as motoring,
the sheer size of the lexical material precludes any detailed investigation of
native-speaker judgements.
The third alternative is some sort of combination of the frequency-based
approach and the approach drawing on economy effects, which could,
for example, be applied in succession. Economy effects may also be taken
into consideration in determining proficiency levels below the near-native level.
The subsequent procedure involves three major steps:
(1) Corpora and dictionary sources are tapped to identify all the individual
word-forms and words belonging to the vocabulary area in question. This
involves compiling a corpus-based word list (using, for example,
the WordSmith tool of the same name) and consulting dictionaries which
allow full-text searches or searches by subject area, such as TLF, DO, PR
or CIDE.
(2) In the next step, programs such as WordSmith and Collocate are used
to determine the collocations and patterns entered by the items on the
word list.
(3) The third step is to eliminate redundant collocations on the basis of the
aforementioned economy effects.
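Steps (1) and (2) can be illustrated with a minimal Python sketch. It is not a reimplementation of WordSmith or Collocate: it assumes a plain-text corpus, a naive regular-expression tokenizer and raw co-occurrence counts within a fixed window (the `window` value is an illustrative choice), whereas dedicated tools usually add association measures. Step (3) is not modelled, since the economy effects it relies on are not spelled out here.

```python
import re
from collections import Counter

def tokenize(text):
    """Naive tokenizer (assumption): lower-cased word forms, hyphens/apostrophes kept."""
    return re.findall(r"[\w'-]+", text.lower())

def word_list(tokens):
    """Step 1 (sketch): frequency list of the word forms in the corpus."""
    return Counter(tokens)

def collocates(tokens, node, window=3):
    """Step 2 (sketch): raw counts of words within +/- `window` tokens of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start, end = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[tokens[j]] += 1
    return counts

if __name__ == "__main__":
    corpus = "We joined the motorway near the junction. The motorway was congested."
    toks = tokenize(corpus)
    print(word_list(toks).most_common(2))
    print(collocates(toks, "motorway", window=2).most_common(3))
```

Raw window counts favour function words such as *the*; in practice these would be filtered or weighted before the list is inspected.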
In a fourth, optional step, various proficiency levels might be distinguished
on the basis of the frequency of collocations and single words or on the
basis of the transparency of items for particular user groups (cf. Hausmann,
forthcoming).
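The optional fourth step could, for instance, be approximated by banding items according to corpus frequency. The sketch below is illustrative only: the cut-off values and the frequencies are invented for the example, and transparency for particular user groups, the other criterion mentioned above, is not modelled.

```python
def proficiency_bands(freqs, cutoffs):
    """Assign each item a band, 1 being the most frequent.

    `cutoffs` are minimum frequencies in descending order (hypothetical values);
    items below the last cut-off go into band len(cutoffs) + 1.
    """
    bands = {}
    for item, freq in freqs.items():
        band = len(cutoffs) + 1
        for level, minimum in enumerate(cutoffs, start=1):
            if freq >= minimum:
                band = level
                break
        bands[item] = band
    return bands

# Invented frequencies for three motorway collocations.
example = {"join the motorway": 120, "clogged motorway": 15, "motorway madness": 4}
print(proficiency_bands(example, cutoffs=[100, 10]))
```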
2.3 Macrostructure
The project stands in the long tradition of what, borrowing from McArthur
(1986), we might call thematic learner lexicography, a tradition that goes
[Figure: schematic macrostructure (Topic Area 1 → Situation Type 1; Topic Area 2 → Situation Type 2), with the sample areas Banking, Emotions: Impatience and Emotions: Care (or Caution), and the sample items money/funds/a sum/etc.; leave account/bank/etc.; Tu craches ta valda ?]
setting up and internal structuring of sub-areas and situation types. This stands
in contrast with traditional approaches to thesaurus building, where terms were
inserted into a fully pre-determined ontological structure. There are, of course,
obvious limitations to such an approach in that some words and collocations
have both general and topic-specific uses. A case in point is the vocabulary
relating to damage, which is important in such situation types as car accidents
but may also apply to a wide range of other situations (any kind of accident,
intention to harm, legal terminology, etc.).
Underlying this thematic organization in the electronic version will be a layer
of semantic links inspired by such work as Francis, Hunston and Manning
(1996, 1998), who have shown that words entering similar patterns usually
share an aspect of meaning. This will enable users to extend their vocabulary
along a non-thematic route and will raise their awareness of the close link
between sense and syntax.
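One way to picture such a layer of pattern-based links is as an index from grammar patterns to the words that enter them, inverted so that any word leads to the words with which it shares a pattern. The Python sketch assumes a hand-made inventory; the patterns and word lists below are illustrative and are not taken from the published lists of Francis, Hunston and Manning.

```python
from collections import defaultdict

# Illustrative pattern inventory (hypothetical): pattern -> words entering it.
PATTERNS = {
    "V n into -ing": {"persuade", "talk", "coax", "trick"},
    "V n out of -ing": {"talk", "coax", "trick"},
    "V that": {"admit", "concede", "acknowledge"},
}

def index_by_word(patterns):
    """Invert the inventory: word -> set of patterns it enters."""
    by_word = defaultdict(set)
    for pattern, words in patterns.items():
        for word in words:
            by_word[word].add(pattern)
    return by_word

def pattern_neighbours(word, patterns):
    """Map each word sharing at least one pattern with `word` to the shared patterns."""
    by_word = index_by_word(patterns)
    neighbours = {}
    for pattern in by_word.get(word, set()):
        for other in patterns[pattern]:
            if other != word:
                neighbours.setdefault(other, set()).add(pattern)
    return neighbours

print(pattern_neighbours("talk", PATTERNS))
```

Words sharing several patterns (here *coax* and *trick* with *talk*) are natural candidates for the strongest links.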
tradition of recording single words which has existed at least since Babylonian
antiquity.
There is, of course, no denying the fact that speakers can isolate words
from context and thus arrive at a definition of word meanings. However, since
the definition of word meaning requires the speaker to engage in a process of
abstraction, it is at least debatable whether it is word meanings that underlie
the speaker's competence. Even the elicitability of paradigmatic relations
between the meanings of individual words does not allow us to conclude
that word meanings are stored in paradigmatic networks in what is often
called the mental lexicon (cf. Aitchison 1994). It is equally conceivable that
subjects in psychological experiments respond with particular paradigmatic
associations because they have repeatedly met the associated items in
syntagmatic strings (cf. Rapp and Wettler 1992, Rapp 1995); as Jones (2002)
has shown, antonyms, for example, tend to co-occur syntagmatically (good or
bad, rich and poor).
The crucial factor in the acquisition of meanings thus seems to be the
primary association between lexical units of varying length³ and their extralinguistic and/or intralingual context of occurrence rather than the secondary
paradigmatic connections between two or more words that speakers can
establish when prompted or the word meaning which they can abstract out of
context when asked. Put another way, when unprompted, speakers produce
meanings by syntagmatically associating and/or modifying lexical chunks
which they have previously encountered in contexts similar to the current one.
Our own practices of dictionary making have blinded us to the fact that we do
not communicate by stringing together individual words, but rather by means
of semi-prefabricated lexico-grammatical units.
This view, first proposed in outline by Bally (1909), has recently come to the
fore again in the Firthian tradition. Meaning is seen as residing in typical
combinations of lexical choices or collocability on the one hand, and typical
combinations of grammatical choices or colligation on the other (Hunston
2001). A crucial aspect of an item's meaning is its semantic prosody, a term
which reflects the realisation that lexical items become infused with particular
connotations due to their typical linguistic environment (Sinclair 1991, Louw
1993, Stubbs 1995).
The implications of the above for lexicography, especially learner lexicography, are clear: if a) meaning is considered to be inherent in collocation (under
which term I here subsume colligation) and b) the dictionary is intended to
provide a record of the units of meaning in a language, then future dictionaries
will have to provide a full account of collocational meaning units and their
typical contexts of occurrence.⁴ One of the most obvious desiderata, then, is for
collocations, as defined in the introduction, to be given entry status. Rather
than appear in the exemplificatory material, collocations of this type should
themselves be illustrated with examples as necessary.
type of syntactic relationship, after the manner of OC, for example. But then
again such clustering may be difficult to justify with clearly motivated multiword units like there is good reason to INF; there is a strong case here for
treatment under the relevant sense division of reason.
There are, of course, equally good reasons for giving collocations main-entry
status as for recording them under a sub-entry, whether this be
a separate entry or a sense division of a particular headword (cf. Burger 1998:
172 on multi-word units). However, if we decide to give collocations main
entry status, this will entail an even more complex macrostructure. To take but
one example, multi-word collocations serving a pragmatic or text-structuring
function and beginning with the pronoun it (it behoves us to INF, it is worth
bearing in mind that/wh-clause, etc.) or the preposition to (to give an example,
to this end, to return to NP) would fill dozens of pages, and so would two-item
collocations beginning either with common nodes or common collocates
(such as increase or give).
From all this it seems reasonable to conclude, as most theorists do (cf. for
example Burger 1989: 595 on phrasemes), that there is no ready-made solution
for the positioning of collocational units in semasiological dictionaries.
Each case has to be considered on its own merits, and the preferences of
particular user groups have to be taken into account (Bogaards 1990, 1991);
there should be neither consistent conflation into end-of-article nests nor
arbitrary allocation to a particular sense division. Rather, as with derivatives
and compounds (which have traditionally been conceived of as distinct from
collocations), a middle course has to be steered between considerations
of semantic relatedness, user convenience and economy of treatment (cf. Cowie
1999: 150 on derivatives and compounds). In any case, collocations should
be highlighted typographically, and, if necessary, attention should be drawn
to their special pragmatic and/or text-structuring functions. However, given
the sheer size of the class of collocations, alphabetical access seems an
unmanageable solution in the long run.
3.2.2 Representation of semantic-pragmatic collocations. If we now examine the
relationship between types of collocations and the problems associated with
recording them, it turns out that the semasiological dictionary experiences
the greatest difficulty in adequately representing purely semantic-pragmatic
collocations occurring in specific situation-types or topic areas. A pertinent
example is afforded by semantic-pragmatic collocations based around mordre
sur (overlap into, go over into, cut into, veer off course into/onto), which
occur in three main topic areas, viz. a) geography (e.g. une région mord sur une
autre), b) medicine (une partie du corps mord sur une autre) and c) motoring
(une voiture mord sur une partie de la route).
The bilingual semasiological encoding dictionary has two options to
represent such information: by adapting PGF style: une voiture mord sur qc
for users but it becomes one in the case of collocations which appear to have
been freely put together by the application of general semantic and syntactic
rules. This can be illustrated with two examples, one from an unabridged
monolingual dictionary (GR) and one from a monolingual learners' dictionary
(CCED).
GR, which offers a sprinkling of extended collocations, will serve to
illustrate the haphazard nature of current practice (for further detail, see
Siepmann 2005). Thus, the exemplificatory infinitive clause pour n'en citer
qu'un exemple, a collocation of type 2 common in academic writing, is found
as the second example under sub-entry II.2:
(XIVe). Cas, événement particulier, chose précise qui entre dans
(une catégorie, un genre . . .) et qui sert à confirmer, illustrer, préciser
(un concept). Voici un exemple de sa bêtise. Pour ne (n'en) citer qu'un
(seul) exemple. Aperçu, échantillon, spécimen. Ce cas offre un exemple
typique de telle maladie. ⇒ Type. C'est un bel exemple de présence
d'esprit ! Alléguer, apporter des exemples à l'appui d'une assertion, d'une
affirmation. ⇒ Preuve. Exemple concret illustrant une idée abstraite.
Appuyer (cit. 5) d'un exemple. Exemples donnés dans un manuel de physique,
de chimie. Exemple bien, mal choisi. Donnez-moi un exemple de volcan
éteint, de plissement tertiaire. Exemples à l'appui d'un raisonnement,
d'une démonstration. Exemple qui prouve que . . . Il m'a cité l'exemple de
ce chanteur (→ 1. Basse, cit. 7). Puiser ses exemples dans l'histoire
(→ Égoïsme, cit. 1). (GR, s.v. exemple)
The multi-word collocation in question has been entered as an example
sentence followed by a full stop. This implies that the phrase can stand on its
own, thus obscuring its textual function of introducing an example, and
potentially leading the non-native user, at least, astray.
With a collocation such as we (now) turn (now) to, the situation is even less
clear. In CCED it appears in the exemplificatory material at sub-entry 12 for
turn and is not explicitly marked as a collocational unit:
We turn now to the British news.
This example sentence may not, however, be very useful to learners, since it
neglects to highlight that we are dealing with a transitional device that can be
employed in both spoken and written English rather than an ad-hoc formation.
The drawbacks of such practice should by now be obvious. For one thing,
neither the native nor the non-native user will be sensitised to the holistic
nature of multi-word units. For another, the non-native user in particular
will have difficulty locating variants of a particular collocation, such as pour ne
donner qu'un exemple or pour prendre un seul exemple in the case of the example
from GR; this is due to the lack of synonymic links in the mediostructure
already touched upon. One reason for the lack of cross-referencing with
regard to synonyms is what may be termed the alphabetical framework
approach to dictionary making. In the compilation of large-scale dictionaries
one commonly starts by drawing up an alphabetical list, or framework,
of the major sense divisions before assigning one small section of the
alphabetical list to the individual lexicographer, who will identify and enter
collocations of individual lexemes without much regard to the findings of his
or her colleagues.
As can also be inferred from the above examples, another serious
disadvantage of current practice is that common collocations tend to be
submerged amid a welter of detail. Thus, in GR, it takes a considerable amount
of searching to locate the concessive discourse marker il faut bien reconnaître
que within one of the sub-entries for reconnaître. The specific pragmatic
function of the marker is not made explicit; rather, it must be inferred from
the general definition given under sense division 4 of reconnaître or from its
synonymy with the evidence marker il faut se rendre à l'évidence, to which the
reader is cross-referred.
4. (XIVe). Admettre pour vrai après avoir nié, ou après avoir douté,
accepter malgré des réticences. ⇒ Admettre, avérer, déclarer . . . On a fini
par reconnaître son innocence. ⇒ Croire (à); → aussi Rendre hommage*
à . . . On est forcé de reconnaître des divergences (cit. 1) entre certains
textes . . . Maintes fois, il le reconnaît lui-même, il manquait de bon sens
(→ Grain, cit. 26). Reconnaître la supériorité de qqn. ⇒ Céder (3.: le
céder à); proclamer . . . Amener qqn à reconnaître. ⇒ Convaincre.
Reconnaître que. ⇒ Admettre, avouer, convenir (de); → Boiteux, cit. 7;
démarche, cit. 4; Dieu, cit. 47; malheur, cit. 39; oracle, cit. 4. Ils ont tous
reconnu qu'il a fait ce qu'il a pu. ⇒ Tomber (d'accord). Vous n'hésiterez
(cit. 14) pas à reconnaître que. . . Je reconnais que . . . ⇒ Accorder; entendre
(j'entends bien). - Quoi qu'on dise, on doit reconnaître que . . . (→ Canaille,
cit. 12). Force (cit. 58) lui était de reconnaître que . . . (→ Exciter, cit. 32).
Il faut bien, on doit reconnaître que . . . ⇒ Évidence (se rendre à l'évidence);
→ Mélodique, cit. 1.
Turning now to colligational patterns, we find that quite a number of these
have found their way into the dictionaries, but that they are usually treated by
way of lexical exemplification. Here are a few examples from PR:
un mécanicien en herbe (PR; underlying colligation: NP [vocation]
en herbe)
de la graine de voyou (PR; underlying colligation: de la graine de NP)
être musicien dans l'âme (PR; underlying colligation: NP dans l'âme)
Note that such treatment is doubly limiting. For one thing, it conceals the
generativity of the patterns as well as the limits of such generativity; for
another, it omits to signal typical textual embeddings. Thus, a colligational
pattern such as NP/ADJ à ses heures tends to occur as an appositive (often
clause-initial), and this information must be made available to the dictionary
user. Cf. for example:
Poète à ses heures, Guillaume improvisait des vers.
Nicolas, jardinier à ses heures, dispose d'une plantation qui lui fournit la
matière première de ses pétards.
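Recording the pattern itself, rather than one lexical instance, makes its generativity explicit and allows typical embeddings to be checked against corpus data. The following Python sketch turns a pattern with an NP slot into a regular expression and flags instances followed by a comma as candidate appositives. Both simplifications are assumptions made for illustration: the slot is filled by a single word, and a following comma is only a rough proxy for appositive use.

```python
import re

def pattern_to_regex(pattern, slot="NP"):
    """Compile a pattern with exactly one slot, e.g. 'NP à ses heures', into a regex.

    Simplification (assumption): the slot is filled by a single word.
    """
    fixed_parts = pattern.split(slot)
    if len(fixed_parts) != 2:
        raise ValueError("pattern must contain exactly one slot")
    before, after = (re.escape(part) for part in fixed_parts)
    return re.compile(before + r"(?P<filler>\w+)" + after)

def find_instances(pattern, text):
    """Return (filler, followed_by_comma) for each match of `pattern` in `text`."""
    regex = pattern_to_regex(pattern)
    results = []
    for match in regex.finditer(text):
        next_char = text[match.end():match.end() + 1]
        results.append((match.group("filler"), next_char == ","))
    return results

examples = [
    "Poète à ses heures, Guillaume improvisait des vers.",
    "Nicolas, jardinier à ses heures, dispose d'une plantation qui lui fournit "
    "la matière première de ses pétards.",
]
for sentence in examples:
    print(find_instances("NP à ses heures", sentence))
```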
German
der Verkehrsstrom / die Verkehrsflut
die kontinuierliche Verkehrsflut in Richtung St. Sampsons (die sich nach St Sampsons ergießende Blechlawine)
schauen Sie sich frühzeitig um und ordnen Sie sich bei einer günstigen Gelegenheit in den fließenden Verkehr ein
die Blechlawine*

English | French | German
I couldn't wait very long | je ne pouvais pas rester en stationnement très longtemps | ich konnte nicht lange halten / ich konnte nicht lange anhalten
German compound noun Blechlawine. See the entry from the projected
English-German bilingual thesaurus in Table 2.
To take but one more example, neither the big four monolingual learners'
dictionaries⁸ nor CR recognize the specific sense that wait assumes in the area
of traffic; a bilingual methodology would reveal this sense since it requires nonliteral renditions such as rester en stationnement in French and stehen or halten
in German (see Table 3). This shows that, in a bilingual thesaurus, explicitness
can be achieved almost automatically by recording all possible variants of
a collocation along with its topic-specific or situation-specific translations,
e.g. magazine féminin / magazine pour femmes (women's magazine).
Likewise, the principle of internal coherence (Mel'čuk et al. 1995: 36 ff.) can
be readily adhered to in a bilingual thesaurus based on collocations rather than
lexemes (or lexemes and collocations). This principle states that there should
be perfect correspondence between the definition (i.e., in the case of a bilingual
thesaurus, the translation), the syntactic patterns and the lexical patterns
entered by a lexeme or phraseme; the only problem here is the directionality
of translation, which may lead to a larger number of entries in a bilingual
dictionary, as illustrated by the aforementioned collocation stream of traffic.
When used on its own, this collocation can be translated almost literally into
German in the form of the compound nouns Verkehrsstrom or Verkehrsflut.
When modified by the adjective endless, however, it can be rendered more
elegantly by the colloquial compound Blechlawine.
The problems with the definition of lexemes which arise from the inclusion
of such collocations as célibataire endurci do not occur in bilingual dictionaries
and are in fact purely theoretical, since collocations should be considered as
holistic meaning units. As Mel'čuk et al. (1995: 37) rightly conclude, the lexeme
célibataire on its own can never have the meaning 'homme en âge d'être marié
qui n'a jamais été marié et qui veut rester tel', although the above collocation
would seem to suggest just that.
Two additional principles proposed by Mel'čuk et al. (1995) are the principle
of exhaustiveness and that of compulsory consultation of databases.
As outlined in Section 2, the fulfilment of these principles can be greatly aided
by a bilingual or multilingual approach which should proceed in an
iterative cycle:
compilation of subject-specific corpora in at least two languages →
compilation of subject-specific word and collocation lists → analysis of the
contextual embedding of collocations with the help of the Internet →
additions to corpora from Internet sources used in context analysis (etc.)
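The cycle can be expressed as a loop that stops once a round of Internet look-ups brings in no new texts. In the Python sketch below, `extract` and `fetch` are hypothetical callables standing in for collocation extraction and for retrieving Internet contexts; no particular tool or search API is implied, and the toy data are invented.

```python
def build_iteratively(corpus, extract, fetch, max_rounds=10):
    """Corpus -> collocation list -> Internet contexts -> corpus, repeated.

    extract(texts) -> set of collocations    (hypothetical)
    fetch(collocation) -> iterable of texts  (hypothetical)
    Stops when a round adds no new text or after `max_rounds` rounds.
    """
    corpus = list(corpus)
    seen = set(corpus)
    for _ in range(max_rounds):
        added = 0
        for collocation in sorted(extract(corpus)):
            for text in fetch(collocation):
                if text not in seen:
                    seen.add(text)
                    corpus.append(text)
                    added += 1
        if added == 0:
            break
    # Re-extract so that the collocation list reflects the final corpus.
    return corpus, extract(corpus)

def motorway_bigrams(texts):
    """Toy extractor: adjacent word pairs containing 'motorway'."""
    pairs = set()
    for text in texts:
        words = text.split()
        for left, right in zip(words, words[1:]):
            if "motorway" in (left, right):
                pairs.add(f"{left} {right}")
    return pairs

# Invented stand-in for Internet sources.
WEB = {
    "join motorway": ["join motorway congested motorway"],
    "congested motorway": ["congested motorway"],
}

corpus, collocations = build_iteratively(
    ["join motorway"], motorway_bigrams, lambda c: WEB.get(c, []))
print(len(corpus), sorted(collocations))
```

The `max_rounds` cap is a practical safeguard, since real Internet look-ups need not converge.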
In summary, it could be said that future lexicography should pursue
a methodology which is diametrically opposed to the framework approach
outlined above. Rather than proceeding from alphabetical lists of individual
lexical units based on monolingual dictionaries, it would be grounded in topic-specific lists of collocations. The methodology of monolingual dictionary
making would thus also be turned on its head, since monolingual dictionaries
would benefit from the more detailed sense divisions established by bilingual
onomasiological lexicography.
donner exemple, which can be used in three different types of situation with
two different meanings (see Siepmann 2003):
(1) a situation where the speaker/writer wishes to cite another author: Miller
(1995) donne un exemple de . . .
(2) a situation where the speaker/writer introduces an example of his or her
own: pour donner un exemple, je vais vous donner un exemple
(3) a situation where the speaker/writer gives an actual example: l'Arabie
Saoudite donne un exemple d'État islamique moderne (meaning 'is an example')
The collocation would thus be given at least three entries in different subsections of an onomasiological dictionary. Similar considerations hold true for
English collocations such as avoid an accident (cf. French empêcher un accident
vs. éviter un accident) or leave the road (cf. German von der Straße abfahren
[intentional] vs. von der Straße abkommen [accidental]). It is the contrastive
background of a foreign language that allows the lexicographer to uncover the
polysemy of such items.⁹
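In data terms, the consequence is that an onomasiological entry is keyed by collocation and situation, not by collocation alone, while mere variants of one rendering share a single entry. The Python sketch below reuses examples from this article; the situation labels for the traffic items are hypothetical, and the structure is an illustration rather than the Bilexicon format.

```python
from collections import defaultdict

# (collocation, situation, renderings); variants of one rendering share an entry.
# The situation labels for the traffic items are hypothetical.
ENTRIES = [
    ("leave the road", "leaving the road intentionally", ["von der Straße abfahren"]),
    ("leave the road", "leaving the road accidentally", ["von der Straße abkommen"]),
    ("stream of traffic", "traffic flow", ["der Verkehrsstrom", "die Verkehrsflut"]),
    ("endless stream of traffic", "traffic flow", ["die Blechlawine"]),
]

def entries_by_collocation(entries):
    """Group entries so that each collocation lists its situation-specific renderings."""
    grouped = defaultdict(list)
    for collocation, situation, renderings in entries:
        grouped[collocation].append((situation, renderings))
    return dict(grouped)

def polysemous(entries):
    """Collocations needing more than one entry, i.e. used in more than one situation."""
    grouped = entries_by_collocation(entries)
    return sorted(c for c, senses in grouped.items() if len(senses) > 1)

print(polysemous(ENTRIES))
```

Note that stream of traffic, despite having two German renderings, needs only one entry because the renderings are variants, not distinct senses.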
Another problem noted above was the placement of collocations within
the dictionary; this can be resolved quite elegantly in an onomasiological
dictionary (or hybrid electronic dictionaries) such as the projected English-French Bilingual Thesaurus (Bilexicon), where topic area and situation type
are the decisive factors in determining the place of entry.
Likewise, in an onomasiological dictionary semantically related or synonymic expressions do not need to be cross-referenced, as they will appear at
the same place in the dictionary. Examples are given in Table 4.
Table 4: Synonymic collocations in an onomasiological dictionary
[Table rows (topic area: situation type): Discourse Markers: Reformulation; Noise: Telling people to be quiet; Hobbies: Describing amateurs; Timing: Right moment. Column: synonymic or semantically related collocations (cell contents not recoverable).]
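The point can be made concrete with a store keyed by topic area and situation type: looking up any member of a node yields its synonyms without explicit cross-references. In the Python sketch below, the entries for Hobbies: Describing amateurs reuse English-French pairs cited elsewhere in this article; the reformulation markers and their French equivalents are illustrative additions, and the store is a hypothetical sketch rather than the Bilexicon data model.

```python
# (topic area, situation type) -> list of (English collocation, French equivalent)
THESAURUS = {
    ("Hobbies", "Describing amateurs"): [
        ("a budding N", "un N en herbe"),
        ("an amateur N", "un N à ses heures"),
    ],
    ("Discourse Markers", "Reformulation"): [  # illustrative entries
        ("in other words", "en d'autres termes"),
        ("put another way", "autrement dit"),
    ],
}

def synonyms(collocation, thesaurus=THESAURUS):
    """Return the node holding `collocation` and the other collocations stored there."""
    for node, items in thesaurus.items():
        english = [source for source, _ in items]
        if collocation in english:
            return node, [e for e in english if e != collocation]
    return None, []

print(synonyms("a budding N"))
```

In an alphabetical dictionary, a budding N and an amateur N would be filed under different headwords and linked only if a cross-reference were added by hand.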
5. Coverage
This section is meant to illustrate by example how the onomasiological
approach can close some of the gaps found in current encoding dictionaries.
It will be seen that even the best collocational dictionaries are far from covering
anything like the entire range of collocation described in Part I of this article.
The section is divided into three parts. The first deals with breadth of coverage,
the second with depth, while the third offers suggestions for improvement.
Collocations of motorway:
N ADJ: busy, four-lane (etc.), orbital, urban
N V: join, leave, turn off, build
N N: driving, traffic, network, system, bridge, junction, service area, service station, crash, pile-up
N Prep.: along the motorway, down the motorway, off the motorway, onto the motorway, on the motorway, up the motorway, motorway from, motorway to

Additional collocations from trilingual analysis:
N ADJ: big, large, major (→ Fr. grande autoroute); clear (→ G. frei); clogged; congested; controlled; deserted; elevated; empty; toll-free (→ G. gebührenfrei, mautfrei)
N V: block, come off, cruise, get onto, go onto, go on, turn off, get off, pull off, open, reopen
N motorway: toll (→ Fr. à péage, G. gebührenpflichtig, mautpflichtig)
motorway N: access, bridge, company (→ Fr. société d'autoroute), intersection, journey (→ G. Autobahnfahrt), lay-by, madness, maintenance, miles, project (→ Fr. projet d'autoroute), trip
N Prep.: (be) beside the motorway (→ F. border l'autoroute)
triples: electronic motorway tolls (elektronische Mauterhebung), on a clear motorway, on clear motorway (→ G. auf freier (Auto-)Bahn, auf einer freien Autobahn), excellent motorway access, turn a trunk road into a motorway (enlarge a trunk road into a motorway) (→ G. eine Bundesstraße zur Autobahn ausbauen), widen a motorway to four lanes (→ G. vierspurig ausbauen), to do a lot of motorway driving, the motorway links A with B (→ F. relie A à B)
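Gaps of this kind can be quantified by comparing, pattern by pattern, the collocates a dictionary records with those yielded by corpus and trilingual analysis. The Python sketch below uses a small subset of the motorway collocates listed above; which of them a given dictionary records is assumed purely for illustration.

```python
def coverage(recorded, found):
    """Per pattern: collocates found but not recorded, and the share already recorded."""
    report = {}
    for pattern, found_items in found.items():
        listed = recorded.get(pattern, set())
        missing = found_items - listed
        share = len(found_items & listed) / len(found_items) if found_items else 1.0
        report[pattern] = (missing, share)
    return report

# Hypothetical split between recorded and corpus-derived collocates of "motorway".
recorded = {"N ADJ": {"busy", "four-lane", "orbital", "urban"},
            "N V": {"join", "leave", "turn off", "build"}}
found = {"N ADJ": {"busy", "clogged", "congested", "toll-free"},
         "N V": {"join", "block", "cruise"}}

for pattern, (missing, share) in coverage(recorded, found).items():
    print(pattern, sorted(missing), f"{share:.0%}")
```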
English | French | German
there's no discussion | il n'y a rien à discuter | da gibt es nichts zu diskutieren
I wouldn't wish it on anyone | c'est quelque chose que je ne souhaiterais pas à mon pire ennemi | das würde ich niemandem wünschen (wollen) / das würde ich nicht einmal meinem ärgsten Feind wünschen
just being friendly | j'ai seulement voulu être (me montrer) aimable avec (pour) toi/vous | ich meine es ja nur gut
this isn't really about NP | (pour toi) il ne s'agit pas de INF / NP | Dir geht es ja gar nicht um NP
and Bob's your uncle | et le tour est joué / et voilà le travail | und fertig ist die Laube
I wouldn't kick him/her out of the bed. | Je ne coucherais pas dans le porte-savon. | Ich würde ihn/sie nicht von der Bettkante stoßen.
so /səʊ/
(...)
12 You can use not so to say that what you have just stated is untrue although it may have seemed probable at first sight. This use is particularly common in written English. [PHR as sentence; PRAGMATICS]
Some might think Volkswagen, which now owns 70 per cent of the Czech company, would have thought the Skoda's identity problematic. Not so. VW sees Skoda as one of the most recognised brand names in advertising.
Unit | Lemma type
morpheme | morphematic lemma
lexeme | one-item lemma
collocations of type 1, 2 and 3 | multi-item lemma: a) colligational, b) collocational
long-distance collocations of type 3 | separable lemmas
of each collocation should be given for the user to form a correct understanding of its use and to be able to use it productively in a new context.
Accordingly, unabridged dictionaries of the future should contain at least
the three major types of lemmas (one-item lemmas, multi-item lemmas
and morphematic lemmas)¹⁰; to this we might add separable lemmas as
representations of long-distance collocations and some collocations of type 3
(see Table 14). As seen in Tables 5 and 6, complementation patterns can be
shown using placeholders such as so or sth or typical representatives of the
semantic class which can be inserted into a particular slot, such as abeille
in Table 5.
7. The limits of translatability
Opponents of bilingual dictionaries or vocabulary lists for encoding purposes
have often argued that such learning materials encourage the erroneous
assumption of one-to-one equivalences between items. The argument is
clearly valid if we equate one-word items such as house and maison or
English population and French population, but it falls apart in the case of
French: il a dû avoir son permis dans une pochette surprise

English | French
a budding N | un N en herbe
an amateur N | un N à ses heures
similarly with NP | il en va semblablement pour NP
an NP that exceeds expectations | un NP supérieur aux attentes
rather than citation forms. The same goes for collocations where one language
uses an implicit form of words which the other tends to make explicit. Thus,
imagine a car parked alongside a fence, so that little space is left between the
passenger door and the fence. The typical question German drivers put to their
passengers in such a situation will go something like this: Soll ich ein Stück
vorsetzen? An English driver might prefer a more explicit wording along
the lines of: Do you want me to move the car / it forward a bit? (alongside Shall I
go forward a bit?)
Exceptions of type 2 occur when the languages under survey do not offer
the same number of collocations for a particular idea. Such differences
have frequently been noted in the area of single-word lexemes: it has long
been known, for example, that English has more verbs of movement than
either French or German. Similar observations can now be made for
collocations. Thus, English resemblance collocates with a wider variety of
adjectives denoting strangeness than its French and German counterparts
(cf. Siepmann 2003).
Both types of exceptions require special attention on the part of the lexicographer. It is particularly dangerous to resort to intuitive translations, as
a number of defective translations from published dictionaries (e.g. weiträumige
Umleitung → *diversion covering a wide area [PGE]) readily attest.
Sometimes such translation errors occur because there is a genuine
collocational gap, but the translator nevertheless wishes to provide a
collocation at all costs. The best strategy to follow in such cases is to study
parallel texts and to offer a suitable paraphrase which should be marked
as such (e.g. by using the tilde).
8. Conclusion
The broadly based definition of collocation adopted in this article
opens up new perspectives for both monolingual and bilingual lexicography.
Future dictionaries will need to record any type of structurally complex unit,
paying increased attention to collocational frameworks (my NP exactly) and
fixed expressions of regular syntactic composition (I've got eyes in my head,
there are good reasons for believing that, I couldn't agree more, etc.). It has been
shown that bilingual or multilingual onomasiological lexicography is set to lead
the way in this endeavour, since it has obvious advantages over monolingual
and semasiological approaches; bilingual dictionaries should no longer be
based on monolingual dictionaries, but rather the other way round. It has
also emerged that the onomasiological dictionary of the future will constitute
a new kind of dictionary of synonyms to the extent that it will contain
collocational rather than one-word synonyms, along the lines of Schemann's
(1991) dictionary of German idioms (SR).
Notes
1
Hoey (1998) defines colligation thus: (a) the grammatical company a word keeps
(or avoids keeping) either within its own group or at a higher rank; (b) the grammatical
functions that the word's group prefers; (c) the place in a sequence that a word prefers
(or avoids).
2
Note, however, that there is much less non-native material to be found on the
Internet for languages such as French, German or Italian, so that a more reliable picture
of native language use can be built up.
3
Of course, meaning arises through the interaction of mother and child long before
it can be represented linguistically (cf. Nelson 1998, Stern 1998). It is commonly
assumed that babies who are not yet able to speak assign meaning to the different
phases of a proto-narrative sequence. The first meanings acquired in early language
acquisition are therefore of a holistic nature; the words "bath" or "bathroom", for
example, will be associated with the relevant proto-narrative sequence (entering the
room, opening the tap, feeling the warmth of the water, the stinging sensation of soap
in the baby's eyes, etc.) rather than with a room containing a toilet, a shower, a bathtub and
a washbasin. It thus appears that meaning is created by the repeated connection between
feelings and/or lexical units on the one hand and contexts on the other hand.
4
It will be noted that the underlying assumption here is that more is better. Active
users such as advanced foreign language learners and translators working into a foreign
language require the most detailed and comprehensive information possible. It might
be argued that such users should turn directly to corpora instead, but the advantage of
a good dictionary is that it provides a ready-made account of the significant features
of a lexical item in a clear and memorable way.
5
Another problem attendant upon automatic extraction is the lack of an adequate
corpus base for collocations typical of spoken language.
6
It may be noted in passing that most complete utterances which consist of an
individual word are, in fact, collocational in nature, cf. help!; blood!; bed!; they are
holistic, situation-specific units (cf. González-Rey 2002: 95, 101).
7
I do not wish to suggest that contrastive lexicology and bilingual or multilingual
lexicography can take account of all possible distinctions arising from cross-linguistic
comparison. As Hausmann (1995: 23) notes, such comparison could only be exhaustive
if it were restricted to lexical units with a relative degree of semantic autonomy; Hausmann
argues, for example, that lexical units exhibiting a high degree of context-dependence,
such as the French adjective sauvage, would give rise to an endless multiplication
of potential equivalences. Arbitrary limits must therefore be set on the number of
languages to be compared as well as on equivalences and sense distinctions. The number
of languages will usually be restricted to two, i.e. the language pair treated in the
dictionary, since sense distinctions that are useful to, say, Italians using English are
not relevant to a French-English dictionary. It should also be noted, however, that
Hausmann overstates his case by focussing too much on the language of literature,
where creativity is at a premium. We will soon be able to cover exhaustively the ordinary
patterns, collocations and sense distinctions found in conversation and pragmatic
text types.
8
OALD is the only monolingual dictionary to record a similar sense (stop a vehicle
at the side of the road), which is too specific (cf. waiting at the traffic lights).
Dirk Siepmann
9
This does not mean that the question of an item's polysemy is decided by applying
interlingual criteria; rather, cross-linguistic comparison should be viewed as a useful
heuristic for discovering language-internal polysemy, which could theoretically also be
detected through monolingual investigation. It is also worth bearing in mind that
polysemy is an extremely relative notion, and that the spectrum of meanings covered by
a large number of words can give rise to an almost infinite number of context-dependent
sense divisions (cf. footnote 4 above).
10
It may be misleading to speak of multi-word lemmas, as Steyer (2000) does,
since colligational patterns contain slots filled by particular categories rather than
a specific word.
11
Technically, of course, bees do not collect honey, but the collocation is often
used in everyday language.
References
1. Dictionaries
Atkins, B. T. et al. 1993. Collins Robert French-English English-French Dictionary.
Unabridged. (3rd ed.). Glasgow: HarperCollins. (CR)
Atkins, B. T. et al. 1994. Le Robert & Collins. Vocabulaire anglais et américain. Paris:
Le Robert. (VAEA)
Binon, J. et al. 2000. Dictionnaire d'apprentissage du français des affaires. Paris: Didier.
(DAFA)
Dendien, J. 2004. Trésor de la Langue Française Informatisé. Paris: CNRS. (TLF)
Cop, M. et al. 2001. PONS Großwörterbuch Englisch. Stuttgart: Klett. (PGE)
Corréard, M. (ed.) 1994. Oxford/Hachette French Dictionary. French-English/
English-French, Oxford: Oxford University Press. (OH)
Crowther, J. et al. 2002. Oxford Collocations Dictionary for Students of English. Oxford:
Oxford University Press. (OC)
Chapman, R. L. (ed.) 1996. Roget's International Thesaurus. Glasgow: HarperCollins.
(RO)
Collins Cobuild English Dictionary for Advanced Learners (3rd ed. 2001). Glasgow:
HarperCollins. (CCED)
Dornseiff, F. and Quasthoff, U. 2004. Der deutsche Wortschatz nach Sachgruppen. Berlin:
De Gruyter. (DO)
Hamblock, D. and Wessels, D. 1999. Großwörterbuch Wirtschaftsenglisch
Deutsch-Englisch/Englisch-Deutsch (5th ed.). Berlin: Cornelsen. (GW)
Knight, L. S. et al. 1999. Collins German-English English-German Dictionary. Unabridged
(4th ed.). Glasgow: HarperCollins. (CG)
McArthur, T. 1981. Longman Lexicon of Contemporary English. London: Longman.
(LLCE)
Quasthoff, U. (ed.) 2003. Franz Dornseiff: Der deutsche Wortschatz nach Sachgruppen
(CD-ROM). (DO)
Procter, P. (ed.) 2001. Cambridge International Dictionary of English on CD-ROM.
Cambridge: Cambridge University Press. (CIDE)
Rey, A. (ed.) 1993. Le nouveau Petit Robert. Paris: Le Robert. (PR)
Rey, A. (ed.) 1985. Le Grand Robert de la langue française sur CD-ROM. Paris:
Le Robert. (GR)
Schnorr, V. et al. 1996. PONS Großwörterbuch Französisch. Stuttgart: Klett. (PGF)
Schemann, H. 1991. Synonymwörterbuch der deutschen Redensarten. Stuttgart:
Klett. (SR)
2. Other literature
Aitchison, J. 1994. Words in the mind. An Introduction to the Mental Lexicon. Oxford:
Blackwell.
Arnaud, P. J. L. 1992. La connaissance des proverbes français par les locuteurs natifs
et leur sélection didactique. Cahiers de Lexicologie 1: 195–238.
Baker, M., Francis, G. and Tognini-Bonelli, E. 1993. Text and Technology: In Honour
of John Sinclair. Amsterdam/Philadelphia: Benjamins.
Bally, C. 1909/1951. Traité de Stylistique Française (Vol. 1). Geneva: Librairie
Georg & Cie.
Biber, D. et al. 1999. Longman Grammar of Spoken and Written English. London:
Longman.
Bogaards, P. 1990. Où cherche-t-on dans le dictionnaire? International Journal of
Lexicography 3: 79–102.
Bogaards, P. 1991. Word frequency in the Search Strategies of French Dictionary
Users. Lexicographica 7: 202–212.
Burger, H. 1989. Phraseologismen im allgemeinen einsprachigen Wörterbuch
in F. J. Hausmann et al. (eds.), Wörterbücher: Ein internationales
Handbuch zur Lexikographie. Vol. 1 (Handbücher zur Sprach- und
Kommunikationswissenschaft; Vol. 5). Berlin/New York: De Gruyter, 593–599.
Burger, H. 1998. Phraseologie: Eine Einführung am Beispiel des Deutschen. Berlin:
Schmidt.
Church, K. W. and Hanks, P. 1990. Word Association Norms, Mutual Information
and Lexicography. Computational Linguistics 1: 22–29.
Council of Europe 2001. Common European Framework of Reference for Languages:
Learning, Teaching, Assessment. Cambridge: Cambridge University Press.
Cowie, A. 1999. English Dictionaries for Foreign Learners. Oxford: Oxford
University Press.
Cummins, S. and Desjardins, I. 2002. A Case Study in Lexical Research for
Translation. International Journal of Lexicography 2: 139–156.
de Florio-Hansen, I. 2004. Wortschatzerwerb und Wortschatzlernen von
Fremdsprachenstudierenden. Erste Ergebnisse einer empirischen Untersuchung.
Fremdsprachen Lehren und Lernen 33: 83–113.
Dunning, T.E. 1993. Accurate Methods for the Statistics of Surprise and Coincidence.
Computational Linguistics 1: 61–74.
Feilke, H. 1996. Sprache als soziale Gestalt. Frankfurt: Suhrkamp.
Feilke, H. 2003. Kontext – Zeichen – Kompetenz. Wortverbindungen unter
sprachtheoretischem Aspekt in K. Steyer (ed.), 41–64.
Firth, J. R. 1957. Papers in Linguistics. London: Oxford University Press.
Fontenelle, T. 2003. Collocations et traitement automatique du langage naturel
in F. Grossmann and A. Tutin (eds.), 75–88.
Francis, G., Hunston, S. and Manning, E. 1998. Collins Cobuild Grammar Patterns 2:
Nouns and Adjectives. London: HarperCollins.
Gates, E. 1988. The treatment of multi-word lexemes in some current dictionaries of
English in M. Snell-Hornby (ed.) (1986), ZüriLEX '86 Proceedings. Papers
read at the Euralex International Congress. Tübingen: Francke, 99–106.
Götze, L. 1999. Der Zweitspracherwerb aus der Sicht der Hirnforschung in Deutsch
als Fremdsprache 1: 10–16.
González-Rey, I. 2002. La phraséologie du français. Toulouse: Presses Universitaires
du Mirail.
Grossmann, F. and Tutin, A. (eds.) 2003. Les collocations: analyse et traitement. Travaux
et recherches en linguistique appliquée, Série E. Amsterdam: De Werelt.
Harras, G. 1989. Zu einer Theorie des lexikographischen Beispiels in Hausmann et al.
(eds.), 607–614.
Hartmann, R. R. K. 2001. Teaching and Researching Lexicography. London: Longman.
Hausmann, F. J. et al. (eds.) 1989–1991. Dictionaries: An International Encyclopedia
of Lexicography (3 Vols.). Berlin: Walter de Gruyter.
Hausmann, F. J. 1995. Von der Unmöglichkeit der kontrastiven Lexikologie in
H.-P. Kromann and A. L. Kjær (eds.), Von der Allgegenwart der Lexikologie.
Kontrastive Lexikologie als Vorstufe zur zweisprachigen Lexikographie. Tübingen:
Niemeyer, 19–23.
Hausmann, F. J. 1999. Le dictionnaire de collocations: critères de son organisation
in N. Greiner et al., Texte und Kontexte in Sprachen und Kulturen. Festschrift für
Jörn Albrecht. Trier: Wissenschaftlicher Verlag Trier, 121–140.
Hausmann, F. J. 2002. La lexicographie bilingue en Europe: peut-on l'améliorer? in
La Lessicografia Bilingue tra presente e avvenire, Atti del Convegno Vercelli, 4–5
maggio 2000, a cura di Elena Ferrario e Virginia Pulcini, Vercelli: Mercurio, 11–32.
Hausmann, F. J. 2003. Was sind eigentlich Kollokationen? in K. Steyer (ed.), 309–334.
Hausmann, F. J. forthcoming. Der undurchsichtige Wortschatz des Französischen.
Lernwortlisten für Schule und Studium.
Hoey, M. 1998. Introducing Applied Linguistics: 25 Years On. Plenary Paper in the
31st BAAL Annual Meeting: Language and Literacies, University of Manchester,
September 1998.
Howarth, P. 1996. Phraseology in English Academic Writing. Some Implications for
Language Learning and Dictionary Making. Tübingen: Niemeyer.
Hunston, S. 2001. Colligation, Lexis, Pattern and Text in M. Scott and G. Thompson
(eds.), Patterns of Text. In honour of Michael Hoey. Amsterdam: Benjamins, 13–34.
Jones, S. 2002. Antonymy: A Corpus-based Perspective. London: Routledge.
Kjellmer, G. 1994. A Dictionary of English Collocations. Oxford: Clarendon Press.
Kocourek, R. 1991. La langue française de la technique et de la science. Wiesbaden:
Brandstetter.
Kromann, H.-P. 1991. Principles of Bilingual Lexicography in F. J. Hausmann et al.,
2711–2728.
Laffling, J. 1991. Towards High-Precision Machine Translation. Based on Contrastive
Textology. Berlin: Foris Publications.
Louw, B. 1993. Irony in the text or insincerity in the writer? The diagnostic potential
of semantic prosodies in M. Baker, G. Francis and E. Tognini-Bonelli (eds.),
157–176.
Lyne, A. A. 1985. The vocabulary of French business correspondence. Word frequencies,
collocations and problems of lexicographic method. Genève/Paris: Slatkine-Champion.
McArthur, T. 1981. Longman Lexicon of Contemporary English. London: Longman.
McArthur, T. 1986. Thematic Lexicography in R. R. K. Hartmann, The History
of Lexicography. Papers from the Dictionary Research Centre Seminar at Exeter,
March 1986. Amsterdam: Benjamins, 157–166.
McArthur, T. 1998. Living Words: Language, Lexicography and the Knowledge
Revolution, Exeter: University of Exeter Press.