
Article

Testing of second language pragmatics: Past and future

Language Testing
XX(X) 1–19
© The Author(s) 2011
Reprints and permission:
sagepub.co.uk/journalsPermissions.nav
DOI: 10.1177/0265532210394633
http://ltj.sagepub.com

Carsten Roever
University of Melbourne, Australia

Corresponding author:
Carsten Roever, Linguistics & Applied Linguistics, School of Languages & Linguistics, The University of Melbourne, Vic 3010, Australia.
Email: carsten@unimelb.edu.au

Abstract
Testing of second language pragmatic competence is an underexplored but growing area of second
language assessment. Tests have focused on assessing learners’ sociopragmatic and pragmalinguistic
abilities, but the speech act framework informing most current productive testing instruments in
interlanguage pragmatics has been criticized for under-representing the construct. In particular,
the assessment of learners’ ability to produce extended monologic and dialogic discourse is a
missing component in existing assessments. This paper reviews existing tests and argues for a
discursive re-orientation of pragmatics tests. Suggestions for tasks and scoring approaches to
assess discursive abilities while maintaining practicality are provided, and the problematic
nature of native speaker benchmarking is discussed.

Keywords
discourse completion test, interlanguage pragmatics, language testing, oral proficiency, role play,
second language pragmatics

Testing of second language pragmatics is a growing area of research and practice in L2
assessment, and 15 years after Hudson, Detmer, and Brown’s (1995) seminal study, it is
time to take stock of what has been achieved and to look ahead at ways to assess the
construct of L2 pragmatic competence more comprehensively. This review starts out by
delineating the construct of pragmatic ability and outlining an interpretive argument and
a validity argument to underpin assessment.

Construct and interpretive validation argument


In a broad definition, Crystal (1997) characterizes pragmatics as ‘the study of language
from the point of view of users, especially of the choices they make, the constraints they
encounter in using language in social interaction and the effects their use of language
has on other participants in the act of communication’ (p. 301). More technical and
descriptive definitions following a similar approach have been provided by Levinson
(1983) and Mey (2001).
In a description of the major components of language users’ pragmatic competence,
Leech (1983) distinguishes between pragmalinguistics, the linguistic tools necessary to
express and comprehend speech intentions, and sociopragmatics, the social rules that
constrain speakers’ linguistic choices and hearers’ possible interpretations. Both are
tightly connected, as a speaker’s sociopragmatic analysis of a situation (in terms of
politeness, possible meanings, and cultural norms and prohibitions) is linguistically
encoded through pragmalinguistic choices.
Pragmatics is part of all major models of communicative competence (Bachman,
1990; Bachman & Palmer, 2010; Canale, 1983; Canale & Swain, 1980). In the most
recent formulation of their model, Bachman and Palmer (2010) describe pragmatic
knowledge as one of the two components of language ability, and parallel Leech (1983)
in distinguishing between functional knowledge and sociolinguistic knowledge.
Based on the above conceptualizations, interpretive and validity arguments (Kane,
2006) for tests of second language pragmatics can be outlined. Following Kane (2006),
an interpretive argument explicates the chain of inferences which lead from observed
performances via scores and score interpretations to conclusions and decisions. A valid-
ity argument provides an evaluation of each component of the interpretive argument,
supporting it or challenging it by means of critical examination and analysis of empirical
data. The interpretive argument parallels Bachman and Palmer’s (2010) Assessment Use
Argument, and the validity argument parallels their concept of assessment justification.
In a first step, observations of learners’ pragmatic knowledge and ability for use can
be generated by probing their sociopragmatic and pragmalinguistic knowledge through
proficiency tests, and assessing their ability for use of such knowledge in social interac-
tion through performance tests.
The resulting sample of responses to test tasks is interpreted as an observed test score,
and validated by investigating the rating scale or the defensibility of a native-speaker
benchmark. In pragmatics assessment, rating scales have been used in scoring conven-
tional and appropriate use (Hudson, Detmer, & Brown, 1995; Roever, 2005), whereas
NS benchmark performances underlie the scoring of tests of implicature and routine
formulae (Bouton, 1988, 1994; Roever, 2005; Taguchi, 2008) and recognition of appro-
priate politeness level (Liu, 2006).
The observed score is generalized as a universe score, and interpreted as covering the
whole universe of possible items and responses. This interpretation can be supported and
validated by a reliability coefficient like Cronbach’s alpha (Brown, 2001; Roever, 2005),
interrater reliability (Brown, 2001; Roever, 2008; Walters, 2007) or generalizability
(Brown, 2008).
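
The reliability coefficients reported throughout this line of work are worth unpacking briefly: Cronbach’s alpha estimates internal consistency from item variances relative to total-score variance. In standard notation,

\[
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^2_{i}}{\sigma^2_{X}}\right)
\]

where k is the number of items, \sigma^2_{i} is the variance of scores on item i, and \sigma^2_{X} is the variance of total scores. Higher values indicate that items covary strongly, so that the observed score generalizes more safely to the universe score.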
The universe score is then extrapolated to a target score across the target domain,
defined as the ‘range of observations associated with the attribute of interest’ (Kane,
2006, p. 31). For example, the target domain for most pragmatics tests, although not usu-
ally explicitly formulated, tends to be social language use in everyday settings (small
talk, shopping, dinner table talk) and, to a lesser degree, academic settings. This crucial
step of extrapolation interprets test results as indicative of the level of the targeted trait
of L2 pragmatic knowledge and ability for use, and validation relies on empirical and
analytic arguments, for example criterion measures (Liu, 2006), comparing different
populations (Roever, 2005), and convergent validity evidence like the multitrait-
multimethod matrix (Campbell & Fiske, 1959). A validation argument for extrapolation
inferences is not usually explicitly made in pragmatics assessment (but see Roever, 2005,
for an exception), and the limited breadth of the construct of second language pragmatics
operationalized in current tests makes such an extrapolation questionable.
Finally, the implications of the target score for the characteristics of the trait need to
be explicated. Such implications might involve decisions about learners, such as place-
ment at different levels of instructional treatment, giving credit for study abroad, or
accepting test results for admission to academic programs. These interpretations can be
validated by investigating the consequences of test use but this has not yet been done in
pragmatics assessment since only one pragmatics assessment instrument has been used
operationally to date in a very limited, low-stakes context (Roever, 2010b).
No current tests of second language pragmatics have been designed following an
explicit interpretive argument. Most problematically, they tend to under-represent the
construct and sample observations too narrowly, breaking the inferential chain at the
point of extrapolation from a universe score to a target score.
In this paper, I will argue for a broadening of the evidence base that allows extrapola-
tion inferences to a target domain of social language use in academic and non-institutional
contexts, which is implicit in most second language pragmatics tests. The next section
gives a brief overview of the research traditions underlying assessment in second lan-
guage pragmatics.

Interlanguage pragmatics research


Current tests of pragmatics have been informed by research in interlanguage pragmatics
and cross-cultural pragmatics in the late 1980s and early 1990s, especially the Cross-
Cultural Speech Act Realization Project (CCSARP) (Blum-Kulka, House, & Kasper,
1989). CCSARP chose the speech act (Austin, 1962; Searle, 1969) as its unit of
analysis, and focused on cross-cultural differences in the realization of requests and
apologies under different settings of the three context factors identified by Brown
and Levinson (1987): relative Power of the interlocutors, degree of Social Distance
(i.e. acquaintanceship and membership in a similar social group), and degree of
Imposition (i.e. the cost to the hearer of carrying out the interlocutor’s request or the
damage caused by an action requiring an apology). CCSARP researchers collected
responses by means of a Discourse Completion Test (DCT), in which individual
tasks contain a prompt with a situation description and participants are instructed to
write what they would say in the situation. An example of a DCT item eliciting a
high-imposition request with an interlocutor of equal power and low social distance
is shown in Figure 1.
CCSARP classified response strategies according to a coding scheme and investi-
gated cross-cultural differences based on frequencies of strategy occurrence.


You are traveling overseas and arranged for a friend to take you to the airport but at the last
minute, she called and cancelled. Your only option is to ask your housemate Jack to give you a lift.
Jack is in the living room reading the paper.

What would you say in this situation?

You say: __________________________________

Figure 1.  DCT item

A great deal of research in interlanguage pragmatics has been conducted under the
speech act paradigm, investigating learners’ knowledge of a variety of speech acts with
request and apology being the most widely researched (e.g. recently by Biesenbach-Lucas,
2007; Byon, 2004; Cohen & Shively, 2007; Félix-Brasdefer, 2007; Jung, 2004; Warga &
Schölmberger, 2007) in addition to refusals, compliments, advice, agreements/disagree-
ments, suggestions, complaints, and gratitude (e.g. Barron, 2003; Eisenstein & Bodman,
1993; Matsumura, 2001, 2003, 2007; Lorenzo-Dus, 2001; Salsbury & Bardovi-Harlig,
2001; Trosborg, 1995). DCTs are the most widely used type of research instrument, but
role plays, sociopragmatic judgment tasks, and multiple-choice tests have also been
employed (see Martínez-Flor & Usó-Juan, 2010, for a recent overview of speech act
research methodology).
While speech act research has been by far the largest component of cross-cultural and
interlanguage pragmatics research, research in some other areas of second language
pragmatics also exists. Bouton (1988, 1994, 1999) assessed ESL learners’ knowledge of
implicature, that is, indirect uses of language. Broadly following Grice (1975), he distin-
guished between idiosyncratic implicature, which is general conversational implicature
(e.g. ‘Can I borrow your car?’ – ‘It’s in the shop.’), and several types of ‘formulaic impli-
cature’, for example the Pope Q (‘Is the Pope Catholic?’), topic change (‘How was the
job interview?’ – ‘This curry is really good.’), and indirect criticism through praise of an
irrelevant aspect of the whole (‘Did you like the dessert?’ – ‘Let’s just say it was color-
ful.’). Bouton focused on learners’ longitudinal development of implicature comprehen-
sion, and Taguchi (2005, 2008) has continued this work and added investigations of
reaction time.
Another area of research concerns routine formulae (Bardovi-Harlig, 2006, 2008,
2009; House, 1996; Nattinger & DeCarrico, 1992; Kanagy, 1999; Roever, 1996, in press;
Scarcella, 1979; Wildner-Bassett, 1986), which are conceptualized as formulaic expres-
sions bound to specific social settings and fulfilling a specific purpose (Coulmas, 1981).
Research has focused on the effects of exposure, on teachability, and on the socializing
effects of routine use.
Much less interlanguage pragmatics research has been conducted investigating the
use of language in extended social interaction and even less on the effect of such lan-
guage use on interlocutors. Several studies have used extended discourse as data,
mostly collected through role plays (e.g. Félix-Brasdefer, 2004, 2007; Gass & Houck,
1999; Kobayashi & Rinnert, 2003; Taguchi, 2007; Trosborg, 1995) but occasionally
also in naturalistic settings (Achiba, 2003; Bardovi-Harlig & Hartford, 1993, 1996;
Ishida, 2009). However, data analysis then applied a traditional speech act framework,
coding speech act strategies, and did not analyze the sequential organization of the
discourse, although some attempts have been made at segmenting interactions (Gass &
Houck, 1999; Félix-Brasdefer, 2010). Very few studies have taken the effect on the
interlocutor into account (Al-Gahtani & Roever, 2010, and van Lier & Matsuo, 2000,
are rare exceptions).
Of course, the analysis of sequential organization is central to Conversation Analysis,
which in principle allows investigations of L2 development as comparisons between
groups of participants (Sidnell, 2009; Zimmerman, 1999) but has only recently started
generating developmental work (Al-Gahtani, 2010; Al-Gahtani & Roever, 2010),
although some contrastive studies exist (Gardner & Wagner, 2004; Günthner, 2008;
Pavlidou, 2008; Taleghani-Nikazm, 2002).
On the whole, research in interlanguage pragmatics has generally focused on the
investigation of isolated aspects of language learners’ pragmatic competence but under-
emphasized ability for use in social interaction. This tendency is equally apparent in tests
of second language pragmatics.

Tests of L2 pragmatics
Hudson, Detmer, and Brown (1992, 1995) took the theoretical framework underlying
CCSARP and its methodological approach as the starting point for their own test battery.
They focused on politeness and directness levels and employed five different test instru-
ments to measure ESL learners’ knowledge of the speech acts request, apology, and refusal:

• oral DCTs, where test takers spoke their utterance into a microphone;
• traditional written DCTs, where they wrote ‘what they would say’;
• multiple choice DCTs, where they marked the appropriate utterance for a given
situation among three response options;
• role plays, where they produced a request, apology, and refusal in the same inter-
action with a role play conductor; and
• self-assessments where they evaluated their own performance on the DCTs and
the role plays.

All instruments (except the self-assessments) were designed around high/low set-
tings of the contextual variables Power, Social Distance, and Imposition (Brown &
Levinson, 1987), rendering eight different combinations of context variables. The test
was specifically designed for L1 Japanese learners of English, and trialed with 25 partici-
pants. Raters assessed test taker performance on six dimensions: ability to use the correct
speech act, formulaic expressions, amount of speech used and information given, for-
mality, directness, and politeness. They used a five-step scale from ‘very unsatisfactory’
to ‘completely appropriate’.
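
To make the design concrete, the eight cells follow mechanically from crossing high and low settings of the three context variables. The short Python sketch below enumerates them; the identifier names and printed output are illustrative only and are not drawn from the original battery.

```python
# Enumerate the 2 x 2 x 2 design of a Hudson, Detmer, and Brown (1995)-style
# instrument: every high/low setting of Power, Social Distance, and Imposition.
# Identifier names here are illustrative, not taken from the original battery.
from itertools import product

CONTEXT_VARIABLES = ("power", "social_distance", "imposition")

# The six rating dimensions reported for the battery, each judged on a
# five-step scale from 'very unsatisfactory' to 'completely appropriate'.
RATING_DIMENSIONS = (
    "correct speech act",
    "formulaic expressions",
    "amount of speech and information",
    "formality",
    "directness",
    "politeness",
)

def design_cells():
    """Yield all eight high/low combinations of the three context variables."""
    for settings in product(("high", "low"), repeat=len(CONTEXT_VARIABLES)):
        yield dict(zip(CONTEXT_VARIABLES, settings))

if __name__ == "__main__":
    for cell in design_cells():
        print(cell)  # e.g. {'power': 'high', 'social_distance': 'high', ...}
```

Each cell would then be instantiated as a DCT prompt or role-play situation for each target speech act and rated on the six dimensions above.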


Hudson, Detmer, and Brown (1995) describe the development of the instrument in
detail, and Hudson (2001a) reports reliabilities of .75 for the role play ratings, .86 for the
written DCTs, and .78 for the oral DCTs. Correlations among the DCTs were higher
than between the DCTs and the role play, which points to a DCT method effect. Hudson (2001b)
also shows similarly high reliabilities for the self-assessments.
This test battery stimulated a good deal of research and led to three spin-offs.
Yamashita (1996) adapted it for L1 English-speaking learners of Japanese, Yoshitake
(1997) used it with EFL learners in Japan, and Ahn (2005) adapted the written and oral
DCTs as well as the role plays for Korean as a target language. In reviews of the three
spin-off studies, Brown (2001, 2008) found good reliabilities for all sub-tests except the
multiple-choice DCT, which is unfortunate, as this is the component of the battery with
the greatest practicality and the only one that is receptive rather than productive.
In an attempt to design a reliable multiple-choice DCT, Liu (2006) undertook a
focused development effort, also integrating written DCTs and self-assessments. He
found a significant correlation between the two DCT types, and obtained alpha reliabili-
ties around .9. This indicates high internal consistency of his instruments, but since he
used native speaker utterances as correct response options and non-native speaker ones
as incorrect options, McNamara and Roever (2006) question whether it was possibly the
idiomaticity of the native speaker responses in the multiple-choice DCT to which the
more able test takers reacted.
Tada (2005) developed another, smaller-scale instrument in the tradition of Hudson,
Detmer, and Brown, consisting of an oral DCT and a multiple-choice DCT, supported by
video prompts. He obtained reliabilities for both instruments around .75, and found
significant group differences between high- and low-proficiency learners on his
productive tasks.
While the instruments discussed so far focused exclusively on speech acts and
politeness, Roever’s (2005, 2006) test was additionally informed by research on impli-
cature and routine formulae. His battery consisted of three sections assessing interpre-
tation of idiosyncratic and formulaic implicature, recognition of situational routine
formulae, and production of the speech acts of request, apology, and refusal. The
implicature and routines sections of his test were multiple-choice and self-scoring,
while the speech act section consisted of DCT items with rejoinders (responses by the
imaginary interlocutor) and was dichotomously rater-scored. Roever obtained an over-
all alpha reliability of .91, and noted the high practicality of the instrument, in particu-
lar the ease of scoring speech act items due to the use of rejoinders (Roever, 2008).
Unlike tests in the Hudson, Detmer, and Brown (1995) tradition, Roever’s instrument
did not target a specific native language, and Roever (2007, 2010a) demonstrated
through differential item functioning (DIF) analyses that neither test takers from
European nor Asian L1s were advantaged overall.
A spin-off from Roever’s instrument is Roever, Elder, Harding, Knoch, McNamara,
Ryan, and Wigglesworth’s (2009) speech act and implicature sections, which were devel-
oped as part of a larger, computer-based but not yet operational proficiency battery. In
pilot testing, Roever et al. obtained Cronbach’s alpha reliabilities of .77 for the implica-
ture section and .97 for the speech act section.
Another spin-off is Roever’s (2010b) implicature section as part of a post-admission
ESL diagnostic screening instrument for health-sciences students. The section assessed
idiosyncratic implicature and two types of formulaic implicature, topic change and indirect
criticism. Due to the very high overall proficiency of the sample, over half of which were
native speakers, alpha reliability was fairly low at .52.
A third proficiency test project followed a very different theoretical orientation to the
previous ones, and is notable as a first comprehensive attempt at a pragmatics test that
was not informed by speech act theory. Walters (2004, 2007, 2009), working in a conver-
sation analytic framework, used a listening comprehension test, role play, and DCT to
test ESL learners’ comprehension and production of assessment responses (i.e. an evalu-
ative response to a previous turn), compliment responses, and responses to pre-sequences.
Interestingly, despite the theoretical framework he adopted that would have allowed
investigations of sequential organization, Walters also focused exclusively on responses
as isolated utterances. His instrument was hampered by low reliabilities for all sub-tests,
possibly due to the small and highly homogeneous group of participants. However, inter-
rater reliabilities for the role play were in the .7 region.
Beyond these proficiency-oriented test batteries and the instruments developed in
their wake, DCTs, multiple-choice instruments, and occasionally role plays have been
used to trace acquisition, ascertain effects of instruction in the classroom (Ishihara &
Cohen, 2010) or investigate the effect of the learning environment or other learner back-
ground variables. Most of these studies have used self-made instruments whose psycho-
metric characteristics are often not well known. A notable exception is Takimoto (2009),
who employed role plays, DCTs, and aural and written appropriateness judgment tasks
to assess the effect of various approaches to teaching ESL request strategies. He obtained
high reliabilities in the .9 region.

Constructs, competencies, and inferences


It is readily apparent from this review of assessment studies in L2 pragmatics research
that the majority of studies have been conducted within the theoretical tradition of speech
act theory (Austin, 1962; Searle, 1969, 1975) and politeness theory (Brown & Levinson,
1987) with the exception of Walters’ (2004, 2007) and some aspects of Roever’s (2005,
2006) test. The main construct under investigation has been learners’ ability to produce
language that is appropriate and polite enough given the context (as defined by Brown &
Levinson, 1987), or their ability to judge whether exemplars of language are appropriate
and polite in context. Productive studies have frequently employed DCTs, which elicit
the target speech act with some external and internal modifications.
However, speech act theory and politeness theory as theoretical frameworks for prag-
matics research and the use of DCTs as research instruments have come under increasing
criticism for under-representing the discursive side of pragmatics. More than twenty
years ago, Schegloff (1988) attacked speech act analysis for not taking into account the
sequential organization of discourse, and ignoring that the function of an utterance is
conditioned by its placement in the turn-by-turn conduct of discourse. Politeness theory
too has been criticized with regard to its applicability to non-Western cultures (Hill, Ide,
Ikuta, Kawasaki, & Ogino, 1986; Mao, 1994; Matsumoto, 1988) and its deterministic
understanding of context as being entirely constituted by Brown and Levinson’s (1987)
situational variables of Power, Distance, and Imposition (Heritage, 1997; Kasper, 2006;
Schegloff, 1991), which does not take into account the dynamic discourse-internal con-
text of conversation (Heritage, 1984). The use of DCTs has become questionable after
Golato (2003) showed that DCTs elicit responses that do not occur in natural discourse
and fail to elicit some that do. Kasper (2006) delivers a strident critique of the speech act
research tradition as a whole, arguing that speech act analyses lead analysts to impute
meaning to utterances without regard to participants’ emic understanding, develop
etic and at times contradictory taxonomies (see also Meier, 1998), employ unreliable
research instruments, and ignore discourse-internal sequential context. She argues for
a re-orientation of pragmatics in a more discursive direction, employing the analysis of
extended interaction to uncover conversational actors’ interactional competencies as
they are brought to bear on the moment-by-moment conduct of talk.
While Kasper’s (2006) view concerns pragmatics research and especially the use of
data, the overall critique of the speech act framework, politeness theory, and the use of
DCTs also calls into question the constructs and methodology used in testing L2 prag-
matics. Just as interlanguage pragmatics research has underemphasized investigations of
conversational actors’ interactional ability as it is displayed through extended discourse,
so has work in the assessment of L2 pragmatics. It has primarily elicited isolated speech
acts and then judged their appropriateness, ignoring learners’ ability to participate in
extended interactions or structure monologic discourse in a target-like manner.
Furthermore, language use in interactions requires online processing and allows conclu-
sions as to ability for use (Hymes, 1972; Widdowson, 1989), whereas the commonly
used DCT involves offline processing and only allows conclusions as to knowledge.
A speech act approach therefore underrepresents the construct of pragmatic ability, and
tests need to include real-time measures of learners’ interactional abilities to allow defen-
sible extrapolation to a target domain of social language use. Of course, it is not claimed
here that pragmatics tests need to consist exclusively of dialogic interaction – language
users also engage in oral or written monologic discourse as well as offline comprehension
and judgment, and this needs to be tested as well. The following section outlines possible
approaches to the measurement of an extended construct of second language pragmatic
ability.

Measuring a broader construct of pragmatic ability


As argued above, tests of L2 pragmatics need to include assessment of learners’ partici-
pation in extended discourse, both monologic and dialogic. The elicitation of extended
monologic and dialogic discourse is nothing new in language testing research and prac-
tice, but such discourse is not generally used to assess learners’ pragmatic abilities. In
tests that use monologic discourse like the TOEFL iBT, no social purpose is indicated in the
tasks, and the scoring rubrics (Educational Testing Service, 2004) do not take into
account pragmatic features relevant to social language use.
Tests that employ dialogs between a tester and a test taker, like IELTS or the ACTFL
OPI and its variants, do not currently have as their purpose the assessment of interactional
abilities (Halleck, 2005), and have been shown to elicit language different from ordinary
conversation (Johnson, 2001; Kasper & Ross, 2007; Lazaraton, 2002; Young & He,
1998). They do not explicitly rate test takers’ interactional language use and actually
encourage extensive monologic production to obtain a ratable sample of candidate per-
formance. While the OPI often contains a role play, which does allow test takers to
deploy some of the interactional abilities that they need in real-world interactions (Okada,
2009), it would need to be investigated further (Kormos, 1999; van Lier, 1989) if it were
to be used as a tool for assessing interactional abilities.
Employing monologic and dialogic discourse for the assessment of interactional
abilities in a social setting requires further specification of the construct that is to be
assessed. With regard to monologic discourse, the use of contextualization cues to index
discourse structure and speech styles causes problems for learners, as research in
Interactional Sociolinguistics has shown (Gumperz, 1982, 1996), and can be a focal
area of assessment of pragmatic ability. Gumperz (1982) defines a contextualization cue
as ‘any feature of linguistic form which contributes to the signalling of contextual pre-
suppositions’ (p. 131) including prosody, facial expression, gesture, self- and other-
reference, formality level, word choice, conventional expressions, argument structure,
choice of content, and adherence to or flouting of social norms. All of these features
index the speaker’s analysis of the social situation and in sum establish a speech style.
Competent speakers of a target language can recognize a situationally appropriate
speech style and produce it, indicating through their use of linguistic features that they
recognize the social rules and norms of the speech event (Cook, 2001). This is similar
to the concept of ‘interactional competence’ (Young, 2002; Young & He, 1998; Young
& Miller, 2004), which describes the ability to configure interactional resources for a
specific social and discursive practice.
Test takers’ ability to use contextualization cues and their repertoire of speech styles
can be measured in its breadth by requiring them to communicate in very different situa-
tions; for example, a formal presentation in a job interview, a voicemail requesting
that a professor re-mark an assignment, or a casual report of a trip to friends in an
email. It would be relatively easy to integrate assessment of discourse structure and
establishment of a situationally appropriate speech style with current tests that elicit
monologic discourse but this requires a clearly defined target situation which must be a
social situation with an imaginary audience, a purpose for speaking, and an imaginary
physical context.
Contextualization cues and intuition about appropriate speech styles can also profit-
ably be tested receptively as Cook (2001) has shown. In her classroom-based study,
native speakers of American English learning Japanese as a foreign language over-
whelmingly did not recognize a highly inappropriate speech style for the speech event of
a job interview. They focused instead on propositional content, ignoring the contextual-
ization cues that indicated to native speaker and high-proficiency listeners that the focal
job applicant’s speech sample evidenced very poor control of contextually appropriate
language in Japanese.
While the measurement of learners’ ability to produce situationally appropriate mono-
logic discourse and to judge the appropriateness of such discourse is important partial

Downloaded from ltj.sagepub.com at PENNSYLVANIA STATE UNIV on March 4, 2016


10 Language Testing XX(X)

evidence of their pragmatic competence, it does not provide information about their
interactional abilities, to which we will now turn.
Just like in monologic discourse, test takers need to indicate their understanding of
the social situation through their choice of contextualization cues but interactive dis-
course additionally requires them to display their ability to sequentially organize the
interaction as they co-construct it with the interlocutor (Jacoby & Ochs, 1995), which
has been investigated extensively in Conversation Analysis. Aspects of sequential
organization include openings and closings, which can be general (Schegloff & Sacks,
1973) or specific to the institutional setting, for example in calls to an IT helpline
(Baker, Emmison, & Firth, 2001). These opening sequences may be followed by a pre-
sequence that projects the upcoming core business of the talk, for example pre-offers,
pre-invitations, pre-tellings, pre-announcements, pre-requests, and so on (Terasaki,
2004). Pre-sequences can include prefatory moves (pre-pres) (Schegloff, 1980, 2007)
and pre-expansions (Schegloff, 2007), and research suggests that presence, length, and
sequencing of pre-expansions differentiates learners of different proficiency levels
(Al-Gahtani & Roever, 2010). The core sequence, be it request-acceptance/refusal,
complaint-acknowledgement/rejection, or compliment-acceptance/rejection, shows
preference organization, with preferred responses being short, simple, and immediate,
whereas dispreferred responses can include mitigation, elaboration, and non-contiguity
(Schegloff, 2007). In addition, post-expansions following the core sequence can occur
(Schegloff, 2007); for example when a request is rejected or conditions need to be
negotiated. Testers can score the resulting production based on test takers’ ability to
react to first-pair parts (i.e. utterances that open an exchange such as questions,
requests, offers, etc.), adequate sequencing of the interaction in terms of use of pre-
expansions or projection of the upcoming core sequence, use of contextualization cues
to index the relationship with the interlocutor and establish an appropriate speech
style, and the effect of the test taker’s production on the interlocutor, that is, the amount
of repair or need for the interlocutor to take control of the interaction (Al-Gahtani &
Roever, 2010; van Lier & Matsuo, 2000).
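
As a minimal illustration of how these scoring criteria could be operationalized for raters, the sketch below records analytic bands over the four criterion clusters just described. The class, the 1 to 5 band range, and the criterion labels are hypothetical assumptions, not an operational rubric.

```python
# Hypothetical analytic rating record for a single role-play performance,
# organized around the scoring criteria discussed above. The criterion
# labels and the 1-5 band range are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Dict

CRITERIA = (
    "response_to_first_pair_parts",  # reacting to questions, requests, offers
    "sequencing",                    # pre-expansions, projecting the core sequence
    "contextualization_cues",        # indexing the relationship and speech style
    "effect_on_interlocutor",        # repair volume, interlocutor taking control
)

@dataclass
class RolePlayRating:
    test_taker: str
    bands: Dict[str, int] = field(default_factory=dict)  # criterion -> band

    def rate(self, criterion: str, band: int) -> None:
        """Assign a band (1 = lowest, 5 = highest) to one criterion."""
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion}")
        if not 1 <= band <= 5:
            raise ValueError("band must be between 1 and 5")
        self.bands[criterion] = band

    def total(self) -> int:
        """Sum of bands once every criterion has been rated."""
        missing = [c for c in CRITERIA if c not in self.bands]
        if missing:
            raise ValueError(f"unrated criteria: {missing}")
        return sum(self.bands.values())

rating = RolePlayRating("TT01")
for criterion in CRITERIA:
    rating.rate(criterion, 3)
print(rating.total())  # 12
```

Keeping the criteria separate in this way would also let analysts examine, for example, whether the effect-on-interlocutor criterion behaves differently from the purely candidate-internal ones.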
While assessment of discursive and interactional abilities should be central to the
assessment of pragmatic competence, other aspects of pragmatics can also be assessed.
Recognition and production of routine formulae as well as comprehension of implicature
are other areas where a great deal of groundwork has been done, and which second lan-
guage speakers arguably need to control. Figure 2 shows a diagrammatic summary of a
comprehensive construct of second language pragmatics for measurement. Monologic
and dialogic abilities are primary components; control of routine formulae and implica-
ture is less important.

Challenges and future research


Two challenges facing the assessment of a broader construct of L2 pragmatic ability are
the design of assessment instruments that are practical while providing the relevant evi-
dence for claims, and the role of the native speaker standard.

Monologic (extended monolog): production and recognition of:
• Speech styles
• Contextualization cues
• Discourse structure

Dialogic (participation in interaction): production and recognition of:
• Speech styles
• Contextualization cues
• Sequence organization: pre-sequences, core sequences, post-sequences
• Openings & closings
• Repair
• Response to first-pair parts
• Effect on interlocutor

Routine formulae: production and recognition of routine formulae

Implicature: comprehension of implicature

Figure 2.  Components of L2 pragmatic ability with sub-constructs

Assessment instruments
The choice and design of assessment tasks affects the generalization and extrapolation
phases of validation. Generalization makes the observed score independent of the actual
tasks used, and extrapolation applies it to the target domain, implying that the test actu-
ally does measure the targeted skills. To allow generalization, tasks must measure reli-
ably, and to allow extrapolation, they must measure comprehensively. Kane (2006)
discusses the tension between using a few comprehensive but resource-intensive perfor-
mance tasks that cover the target domain broadly but may be less dependable, or using
many highly standardized, narrow measures that have high reliability but only represent
a sliver of the target domain. A trade-off is unavoidable here, and this problem is particu-
larly acute when it comes to the measurement of interactive abilities. DCTs are clearly
too limited in their representation of the target domain, and observing learners in natural
discourse settings is often not feasible, so the compromise assessment approach is the
use of open role plays (Kasper & Dahl, 1991; Kasper & Rose, 2002). Role plays elicit
interactive, extended discourse, combine external and discourse-internal context, and
allow some degree of standardization through the design of the role play situation (Félix-
Brasdefer, 2010). The resulting interactions cannot be considered equivalent to real-
world language use (Aston, 1993; Félix-Brasdefer, 2007; Kasper & Rose, in press), but
test takers do need to deploy interactional competencies similar to those they need in
real-world interaction (Okada, 2009). In fact, it appears that role plays elicit a broader
range of interactional performances than natural interaction (Al-Gahtani, 2010), making
them particularly suitable for assessment purposes.
However, besides being attenuated versions of real-world language use, role plays are
problematic in terms of scoring. The rating of interactive performances is complicated by
the co-constructed nature of interaction, which makes it difficult to extract the contribu-
tion of one participant (the test taker) from the larger interaction (McNamara, 1997), so
interlocutor variations may affect test taker ratings. This has been shown in the context
of oral proficiency interviews (Brown, 2003, 2005) as well as paired learner–learner
interactions (Brooks, 2009; Davis, 2009; Lazaraton & Davis, 2008). Future research on
role plays as assessment tools in second language pragmatics will need to investigate
how a degree of standardization between interlocutors can be achieved to ensure compa-
rability and reliability without compromising the emergent and situated nature of talk,
and how rating guides can be designed to focus on the test taker’s performance while still
taking the effect on the interlocutor into account.
Role plays are also challenging with regard to practicality. They are resource inten-
sive because they require one-on-one interaction with an interlocutor and rating by
human raters, making them expensive to conduct. This is a concern since practicality is
an important aspect of validity: if an assessment is not practical, it is less likely to be
used, and decisions are more likely to be made without it, lowering their defensibility
(Ebel, 1964). To improve practicality, it would be interesting to experiment with ‘online
interlocutors’, located in a central ‘role play call center’, who interact with test takers via
the Internet, for example on a platform like Skype. Empirical research would need to
investigate to what extent the communication channel affects interactions and the defen-
sibility of extrapolation from computer-based talk to face-to-face talk, although a recent
study by Yanguas (2010) indicates a high degree of similarity between Skype video chats
and face-to-face interactions.

The elusive native speaker


Most tests and research instruments in second language pragmatics are referenced against
a native speaker norm, which is problematic in several respects. For one, different types
of pragmatics tasks lend themselves differentially well to benchmarking against an NS
norm. Recognition of routine formulae and interpretation of implicature have high
degrees of native speaker agreement (Roever, 2005) but there is far less agreement for
sociopragmatic judgment (Matsumura, 2001). Benchmarking is also unlikely to be fea-
sible for interactive discourse, where differences have been identified between first and
second language interactions (e.g. Wong, 2004) but these are due to different configura-
tions of interactional resources rather than to measurable gaps in language learners’ prag-
matic abilities (Wagner & Gardner, 2004).
Besides issues of benchmarking and scoring, the concept of ‘native speaker’ itself is
problematic, which has been discussed in applied linguistics in general (e.g. Davies,
2004) as well as pragmatics (e.g. Kasper, 1995). Native speakers are far from a homog-
enous group, but differ along lines of socio-economic status, ethnicity, age, gender,
regional origin, employment, education, and so on. However, in pragmatics assessment
as well as interlanguage and cross-cultural pragmatics studies, both the test taker popula-
tion and the NS comparison group usually consist of convenience samples of university
students, who tend to share a common socio-economic, educational, geographic, and age
background. Where the target domain for a test or research instrument is broader than
just interaction in academic contexts with other language users from the same socio-
economic background, extrapolation is threatened.


Finally, performing like a native speaker may not be appropriate for learners at all
where the target speech community does not expect them to accommodate to its norms
(Hassall, 2004) but rather to follow foreigner-specific norms (McNamara & Roever,
2006). An even more complex issue, which is beyond the scope of the present study, is
lingua franca communication, where norms are situated, emergent and practice-based
(Canagarajah, 2007), although recent research highlights some interesting systematic
tendencies (Baumgarten & House, 2010).
While it is unavoidable for assessments to be benchmarked against some sort of norm,
be it a native speaker or second language user norm, the current reliance on narrow
benchmarking samples deserves critical examination.

Conclusion
Testing of second language pragmatic ability is an important part of the overall construct
of second language communicative competence. However, tests of second language
pragmatics need to include monologic and dialogic extended discourse to allow infer-
ences as to learners’ ability to use language in real time. The integration of extended
discourse tasks is the next research frontier in second language pragmatics assessment.

References
Achiba, M. (2003). Learning to request in a second language: Child interlanguage pragmatics.
Clevedon, UK: Multilingual Matters.
Ahn, R. C. (2005). Five measures of interlanguage pragmatics in KFL (Korean as a foreign
language) learners. Unpublished PhD thesis, University of Hawai‘i at Manoa.
Al-Gahtani, S. (2010). Requests made by L2 learners of Arabic: Pragmatic development,
methodological comparison, and politeness. Unpublished PhD thesis, University of Melbourne.
Al-Gahtani, S., & Roever, C. (2010). Role-playing L2 requests: Proficiency and discursive develop-
ment. Unpublished manuscript, University of Melbourne.
Aston, G. (1993). Notes on the interlanguage of comity. In G. Kasper & S. Blum-Kulka (Eds).
Interlanguage pragmatics (pp. 224–250). Oxford: Oxford University Press.
Austin, J. L. (1962). How to do things with words. Oxford: Oxford University Press.
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University
Press.
Bachman, L. F., & Palmer, A. (2010). Language assessment in practice. Oxford: Oxford University
Press.
Baker, C., Emmison, M., & Firth, A. (2001). Discovering order in opening sequences: Calls to a
software helpline. In A. McHoul & M. Rapley (Eds.), How to analyse talk in institutional
settings (pp. 41–56). London: Continuum.
Bardovi-Harlig, K. (2006). On the role of formulas in the acquisition of L2 pragmatics. In
K. Bardovi-Harlig, C. Félix-Brasdefer, & A. S. Omar (Eds.), Pragmatics and language learning
(Vol. 11, pp. 1–28). Honolulu: University of Hawai‘i, National Foreign Language Resource Center.
Bardovi-Harlig, K. (2008). Recognition and production of formulas in L2 pragmatics. In Z.-H.
Han (Ed.), Understanding second language process (pp. 205–222). Clevedon, UK: Multilin-
gual Matters.
Bardovi-Harlig, K. (2009). Conventional expressions as a pragmalinguistics resource: Recognition
and production of conventional expressions in L2 pragmatics. Language Learning, 59(4),
755–795.
Bardovi-Harlig, K., & Hartford, B. S. (1993). Learning the rules of academic talk: A longitudinal
study of pragmatic development. Studies in Second Language Acquisition, 15, 279–304.
Bardovi-Harlig, K., & Hartford, B. (1996). Input in an institutional setting. Studies in Second Lan-
guage Acquisition, 18, 171–188.
Barron, A. (2003). Acquisition in interlanguage pragmatics: Learning how to do things with words
in a study abroad context. Amsterdam: John Benjamins.
Baumgarten, N., & House, J. (2010). I think and I don’t know in English as lingua franca and
native English discourse. Journal of Pragmatics, 42, 1184–1200.
Biesenbach-Lucas, S. (2007). Students writing emails to faculty: An examination of E-politeness
among native and non-native speakers of English. Language Learning & Technology, 11(2),
59–81.
Blum-Kulka, S., House, J., & Kasper, G. (Eds.) (1989). Cross-cultural pragmatics: Requests and
apologies. Norwood, NJ: Ablex.
Bouton, L. F. (1988). A cross-cultural study of ability to interpret implicatures in English. World
Englishes, 7, 183–196.
Bouton, L. F. (1994). Conversational implicature in the second language: Learned slowly when not
deliberately taught. Journal of Pragmatics, 22, 157–167.
Bouton, L. F. (1999). Developing non-native speaker skills in interpreting conversational implica-
tures in English: Explicit teaching can ease the process. In E. Hinkel (Ed.), Culture in second
language teaching and learning (pp. 47–70). Cambridge: Cambridge University Press.
Brooks, L. (2009). Interacting in pairs in a test of oral proficiency: Co-constructing a better perfor-
mance. Language Testing, 26(3), 341–366.
Brown, A. (2003). Interviewer variation and the co-construction of speaking proficiency. Lan-
guage Testing, 20(1), 1–25.
Brown, A. (2005). Interviewer variability in oral proficiency interviews. Frankfurt: Peter Lang.
Brown, J. D. (2001). Six types of pragmatics tests in two different contexts. In K. Rose & G. Kasper
(Eds.), Pragmatics in language teaching (pp. 301–325). New York: Cambridge University Press.
Brown, J. D. (2008). Raters, functions, item types and the dependability of L2 pragmatics tests.
In E. Alcón Soler & A. Martínez-Flor (Eds.), Investigating pragmatics in foreign language
learning, teaching and testing (pp. 224–248). Clevedon: Multilingual Matters.
Brown, P., & Levinson, S. C. (1987). Politeness: Some universals in language usage. Cambridge:
Cambridge University Press.
Byon, A. S. (2004). Sociopragmatic analysis of Korean requests: Pedagogical settings. Journal of
Pragmatics, 36(9), 1673–1704.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-
multimethod matrix. Psychological Bulletin, 56, 81–105.
Canagarajah, S. (2007). Lingua franca English, multilingual communities, and language acquisi-
tion. Modern Language Journal, 91, 923–939.
Canale, M. (1983). From communicative competence to communicative language pedagogy. In
J. Richards & R. Schmidt (Eds.), Language and communication (pp. 2–27). London:
Longman.
Canale, M., & Swain, M. (1980). Theoretical bases of communicative approaches to second
language teaching and testing. Applied Linguistics, 1, 1–47.
Chapelle, C., & Chung, Y. R. (2010). The promise of NLP and speech processing technologies in
language assessment. Language Testing, 27(3), 301–315.
Cohen, A. D., & Shively, R. L. (2007). Acquisition of requests and apologies in Spanish and
French: Impact of study abroad and strategy-building intervention.  The Modern Language
Journal, 91(2), 189–212.
Cook, H. M. (2001). Why can’t learners of JFL distinguish polite from impolite speech styles?
In K. Rose & G. Kasper (Eds.), Pragmatics in language teaching (pp. 80–102). Cambridge:
Cambridge University Press.
Coulmas, F. (Ed.) (1981). Conversational routine. The Hague: Mouton.
Crystal, D. (1997). A dictionary of linguistics and phonetics. Oxford: Basil Blackwell.
Davies, A. (2004). The native speaker in applied linguistics. In A. Davies & C. Elder (Eds.), Hand-
book of applied linguistics (pp. 431–450). Oxford: Blackwell.
Davis, L. (2009). The influence of interlocutor proficiency in a paired oral assessment. Language
Testing, 26(3), 367–396.
Ebel, R. L. (1964). The social consequences of educational testing. In Proceedings of the 1963
invitational conference on testing problems (pp. 130–143). Princeton, NJ: Educational Testing
Service.
Educational Testing Service (2004). iBT/Next Generation TOEFL Test: Independent speaking
rubrics (scoring standards). Retrieved from www.ets.org/Media/Tests/TOEFL/pdf/Speaking_
Rubrics.pdf.
Eisenstein, M., & Bodman, J. (1993). Expressing gratitude in American English. In G. Kasper &
S. Blum-Kulka (Eds.), Interlanguage pragmatics (pp. 64–81). Oxford: Oxford University Press.
Félix-Brasdefer, J. C. (2004). Interlanguage refusals: Linguistic politeness and length of residence
in the target community. Language Learning, 54(4), 587–653.
Félix-Brasdefer, J. C. (2007). Natural speech vs. elicited data: A comparison of natural and role
play requests in Mexican Spanish. Spanish in Context, 4(2), 159–185.
Félix-Brasdefer, J. C. (2010). Data collection methods in speech acts performance: DCTs, role
plays, and verbal reports. In A. Martínez-Flor & E. Usó-Juan (Eds.) Speech act performance:
Theoretical, empirical, and methodological issues (pp. 41–56). Amsterdam: John Benjamins.
Gardner, R., & Wagner, J. (Eds.) (2004). Second language conversations. London: Continuum.
Gass, S. M., & Houck, N. (1999). Interlanguage refusals. Berlin: Mouton de Gruyter.
Golato, A. (2003). Studying compliment responses: A comparison of DCTs and recordings of natu-
rally occurring talk. Applied Linguistics, 24(1), 90–121.
Grice, H. P. (1975). Logic and conversation. In P. Cole & J. Morgan (Eds.), Syntax and semantics
(Vol. 3, pp. 41–58). New York: Academic Press.
Gumperz, J. (1982). Discourse strategies. Cambridge: Cambridge University Press.
Gumperz, J. (1996). The linguistic and cultural relativity of conversational inference. In J. Gumperz
& S. Levinson (Eds.), Rethinking linguistic relativity (pp. 374–406). Cambridge: Cambridge
University Press.
Günthner, S. (2008). Negotiating rapport in German-Chinese conversation. In H. Spencer-Oatey
(Ed.), Culturally speaking (pp. 207–226). London: Continuum.
Halleck, G.  B. (2005). Unsubstantiated claims about the oral proficiency interview. Language
Assessment Quarterly, 2(4), 315–319.
Hassall, T. (2004). Through a glass, darkly: When learner pragmatics is misconstrued. Journal of
Pragmatics, 36, 997–1002.
Heritage, J. (1984). Garfinkel and ethnomethodology. Cambridge: Polity Press.
Heritage, J. (1997). Conversation analysis and institutional talk: Analyzing data. In D. Silverman
(Ed.), Qualitative research (pp. 161–182). London: Sage.
Hill, B., Ide, S., Ikuta, S., Kawasaki, A., & Ogino, T. (1986). Universals of linguistic politeness:
Qualitative evidence from Japanese and American English. Journal of Pragmatics, 10(3),
347–371.
House, J. (1996). Developing pragmatic fluency in English as a foreign language: Routines and
metapragmatic awareness. Studies in Second Language Acquisition, 18, 225–252.
Hudson, T. (2001a). Indicators for cross-cultural pragmatic instruction: Some quantitative tools.
In K. Rose & G. Kasper (Eds.), Pragmatics in language teaching (pp. 283–300). Cambridge:
Cambridge University Press.
Hudson, T. (2001b). Self-assessment methods in cross-cultural pragmatics. In T. Hudson &
J. D. Brown (Eds.), A focus on language test development (Technical Report 21, pp. 57–74).
Honolulu: University of Hawai‘i, Second Language Teaching & Curriculum Center.
Hudson, T., Detmer, E., & Brown, J. D. (1992). A framework for testing cross-cultural pragmatics
(Technical Report 2). Honolulu: University of Hawai‘i, Second Language Teaching and Cur-
riculum Center.
Hudson, T., Detmer, E., & Brown, J. D. (1995). Developing prototypic measures of cross-cultural
pragmatics (Technical Report 7). Honolulu: University of Hawai‘i, Second Language Teach-
ing and Curriculum Center.
Hymes, D. (1972) On communicative competence. In J. Pride & J. Holmes (Eds.), Sociolinguis-
tics: Selected readings (pp. 269–293). Harmondsworth: Penguin.
Ishida, M. (2009). Development of interactional competence: Changes in the use of ne during
Japanese study abroad. In H. thi Nguyen & G. Kasper (Eds.), Talk-in-interaction: Multilin-
gual perspectives (pp. 351–385). Honolulu, HI: National Foreign Language Resource Center,
University of Hawai‘i.
Ishihara, N., & Cohen, A. (2010). Teaching and learning pragmatics. Harlow, UK: Pearson.
Jacoby, S., & Ochs, E. (1995). Co-construction: An introduction.  Research on Language and
Social Interaction, 2(3), 171–183.
Johnson, M. (2001). The art of non-conversation. New Haven, CT: Yale University Press.
Jung, E. H. (2004). Interlanguage pragmatics: Apology speech acts. In C. L. Moder & A. Martinovic-Zic
(Eds.), Discourse across languages and cultures (pp. 99–116). Amsterdam: John Benjamins.
Kanagy, R. (1999). Interactional routines as a mechanism for L2 acquisition and socialization in an
immersion context. Journal of Pragmatics, 31(11), 1467–1492.
Kane, M. T. (2006).  Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed.,
pp. 17–64). Westport, CT: American Council on Education/Praeger.
Kasper, G. (1995). Wessen Pragmatik? Für eine Neubestimmung sprachlicher Handlungskompe-
tenz. Zeitschrift für Fremdsprachenforschung, 6, 1–25.
Kasper, G. (2006). Speech acts in interaction: Towards discursive pragmatics. In K. Bardovi-
Harlig, J. C. Félix-Brasdefer, & A. S. Omar (Eds.), Pragmatics & Language Learning, Vol. 11
(pp. 281–314). University of Hawai‘i at Manoa: National Foreign Language Resource Center.
Kasper, G., & Dahl, M. (1991). Research methods in interlanguage pragmatics. Studies in Second
Language Acquisition, 13, 215–247.
Kasper, G., & Rose, K. R. (2002). Pragmatic development in a second language. Oxford: Basil
Blackwell.
Kasper, G., & Rose, K. R. (in press). Research methods in interlanguage pragmatics. Mahwah,
NJ: Lawrence Erlbaum.
Kasper, G., & Ross, S. J. (2007). Multiple questions in language proficiency interviews. Journal
of Pragmatics, 39, 2045–2070.
Kobayashi, H., & Rinnert, C. (2003). Coping with high imposition requests: High vs. low pro-
ficiency EFL students in Japan. In A. Martínez Flor, E. Usó Juan, & A. Fernández Guerra
(Eds.), Pragmatic competence and foreign language teaching (pp. 161–184). Castelló de la
Plana, Spain: Publicacions de la Universitat Jaume I.
Kormos, J. (1999). Simulating conversations in oral-proficiency assessment: A conversation
analysis of role-plays and non-scripted interviews in language exams. Language Testing, 16,
163–188.
Lazaraton, A. (2002). A qualitative approach to the validation of oral language tests. Cambridge:
Cambridge University Press.
Lazaraton, A., & Davis, L. (2008). A microanalytic perspective on discourse, proficiency, and
identity in paired oral assessment. Language Assessment Quarterly, 5(4), 313–335.
Leech, G. (1983). Principles of pragmatics. London: Longman.
Levinson, S. (1983). Pragmatics. Cambridge: Cambridge University Press.
Liu, J. (2006). Measuring interlanguage pragmatic knowledge of EFL learners. Frankfurt: Peter
Lang.
Lorenzo-Dus, N. (2001). Compliment responses among British and Spanish university students:
A contrastive study. Journal of Pragmatics, 33(1), 107–127.
McNamara, T. F. (1997). ‘Interaction’ in second language performance assessment: Whose perfor-
mance? Applied Linguistics, 18(4), 446–466.
McNamara, T. F., & Roever, C. (2006). Language testing: The social dimension. Oxford: Basil
Blackwell.
Mao, L. R. (1994). Beyond politeness theory: ‘face’ revisited and renewed. Journal of Pragmatics,
21(5), 451–486.
Martínez-Flor, A., & Usó-Juan, E. (Eds.) (2010). Speech act performance: Theoretical, empirical,
and methodological issues. Amsterdam: John Benjamins.
Matsumoto, Y. (1988). Reexamination of the universality of face: Politeness phenomena in Japa-
nese. Journal of Pragmatics, 12(4), 403–426.
Matsumura, S. (2001). Learning the rules for offering advice: A quantitative approach to second
language socialization. Language Learning, 51(4), 635–679.
Matsumura, S. (2003). Modelling the relationships among interlanguage pragmatic development,
L2 proficiency, and exposure to L2. Applied Linguistics, 24(4), 465–491.
Matsumura, S. (2007). Exploring the aftereffects of study abroad on interlanguage pragmatic
development. Intercultural Pragmatics, 4(2), 167–192.
Meier, A. J. (1998). Apologies: What do we know? International Journal of Applied Linguistics,
8, 215–231.
Mey, J. L. (2001). Pragmatics: An introduction (2nd ed.). Oxford: Blackwell.
Nattinger, J., & DeCarrico, J. (1992). Lexical phrases and language teaching. Oxford: Oxford
University Press.
Okada, Y. (2009). Role-play in oral proficiency interviews: Interactive footing and interactional
competencies. Journal of Pragmatics. DOI: 10.1016/j.pragma.2009.11.002
Pavlidou, T. (2008). Interactional work in Greek-German telephone conversations. In H. Spencer-
Oatey (Ed.), Culturally speaking (pp. 118–135). London: Continuum.
Roever, C. (1996). Linguistische Routinen: Systematische, psycholinguistische und fremdspra-
chendidaktische Überlegungen. Fremdsprachen und Hochschule, 46, 43–60.
Roever, C. (2005). Testing ESL pragmatics. Frankfurt: Peter Lang.
Roever, C. (2006). Validation of a web-based test of ESL pragmalinguistics. Language Testing,
23(2), 229–256.
Roever, C. (2007). DIF in the assessment of second language pragmatics. Language Assessment
Quarterly, 4(2), 165–189.
Roever, C. (2008). Rater, item, and candidate effects in discourse completion tests: A FACETS approach. In A. Martínez-Flor & E. Alcón (Eds.), Investigating pragmatics in foreign language learning, teaching, and testing (pp. 249–266). Clevedon, UK: Multilingual Matters.
Roever, C. (2010a). Effects of native language in a test of ESL pragmatics: A DIF approach. In
G. Kasper, H. thi Nguyen, D. R. Yoshimi, & J. Yoshioka (Eds.), Pragmatics and language
learning (Vol. 12, pp. 187–212). Honolulu, HI: National Foreign Language Resource Center.
Roever, C. (2010b). Testing implicature under operational conditions. Unpublished manuscript, University of Melbourne.
Roever, C. (in press). What learners get for free (and when): Learning of routine formulae in ESL and EFL environments. ELT Journal.
Roever, C., Elder, C., Harding, L. W., Knoch, U., McNamara, T. F., Ryan, K., & Wigglesworth,
G. (2009). Social language tasks: Speech acts and implicatures. Unpublished manuscript,
University of Melbourne.
Salsbury, T., & Bardovi-Harlig, K. (2001). ‘I know your mean, but I don’t think so.’ Disagreements
in L2 English. In L. F. Bouton (Ed.), Pragmatics and language learning (Vol. 10, pp. 131–151).
University of Illinois, Urbana-Champaign: Division of English as an International Language.
Scarcella, R. (1979). On speaking politely in a second language. In C. A. Yorio, K. Perkins, &
J. Schachter (Eds.) On TESOL ’79 (pp. 275–287). Washington, DC: TESOL.
Schegloff, E. A. (1980). Preliminaries to preliminaries: ‘Can I ask you a question?’ Sociological
Inquiry, 50 (3/4), 104–152.
Schegloff, E. A. (1988). Presequences and indirection: Applying speech act theory to ordinary
conversation. Journal of Pragmatics, 12, 55–62.
Schegloff, E. A. (1991). Reflections on talk and social structure. In D. Boden & D. H. Zimmerman
(Eds.), Talk and social structure (pp. 44–71). Cambridge: Polity Press.
Schegloff, E. A. (2007). Sequence organization in interaction: A primer in conversation analysis.
Cambridge: Cambridge University Press.
Schegloff, E. A., & Sacks, H. (1973). Opening up closings. Semiotica, 8(4), 289–327.
Searle, J. (1969). Speech acts. Cambridge, UK: Cambridge University Press.
Searle, J. (1975). Indirect speech acts. In P. Cole & J. L. Morgan (Eds.), Syntax and semantics: Vol. 3. Speech acts (pp. 59–82). New York: Academic Press.
Shimizu, T. (2009). Influence of learning context on L2 pragmatic realization: A comparison between JSL and JFL learners’ compliment responses. In N. Taguchi (Ed.), Pragmatic competence (pp. 167–198). Berlin: Mouton de Gruyter.
Sidnell, J. (Ed.) (2009). Conversation analysis: Comparative perspectives. Cambridge: Cambridge University Press.
Tada, M. (2005). Assessment of EFL pragmatic production and perception using video prompts.
Unpublished doctoral dissertation, Temple University.
Taguchi, N. (2005). Comprehending implied meaning in English as a foreign language. The Modern Language Journal, 89(4), 543–562.
Taguchi, N. (2007). Development of speed and accuracy in pragmatic comprehension in English
as a foreign language. TESOL Quarterly, 41(2), 313–338.
Taguchi, N. (2008). Pragmatic comprehension in Japanese as a foreign language. The Modern Language Journal, 92(4), 558–576.
Takimoto, M. (2009). Exploring the effects of input-based treatment and test on the development
of learners’ pragmatic proficiency. Journal of Pragmatics, 41, 1029–1046.
Taleghani-Nikazm, C. (2002). A conversation analytical study of telephone conversation openings
between native and non-native speakers. Journal of Pragmatics, 34, 1807–1832.
Terasaki, A. (2004). Pre-announcement sequences in conversation. In G. Lerner (Ed.), Conversation analysis: Studies from the first generation (pp. 171–224). Amsterdam: John Benjamins.
Trosborg, A. (1995). Interlanguage pragmatics: Requests, complaints and apologies. Berlin:
Mouton de Gruyter.
van Lier, L. (1989). Reeling, writhing, drawing, stretching, and fainting in coils: Oral proficiency interviews as conversation. TESOL Quarterly, 23, 489–508.
van Lier, L., & Matsuo, N. (2000). Varieties of conversational experience: Looking for learning
opportunities. Applied Language Learning, 11, 265–287.
Wagner, J., & Gardner, R. (2004). Introduction. In R. Gardner & J. Wagner (Eds.), Second language conversations (pp. 1–17). London: Continuum.
Walters, F. S. (2004). An application of conversation analysis to the development of a test of second language pragmatic competence. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign.
Walters, F. S. (2007). A conversation-analytic hermeneutic rating protocol to assess L2 oral pragmatic competence. Language Testing, 24(2), 155–183.
Walters, F. S. (2009). A conversation analysis-informed test of L2 aural pragmatic comprehension. TESOL Quarterly, 43(1), 29–54.
Warga, M., & Schölmberger, U. (2007). The acquisition of French apologetic behavior in a study abroad context. Intercultural Pragmatics, 4(2), 221–251.
Widdowson, H. G. (1989). Knowledge of language and ability for use. Applied Linguistics, 10(2),
128–137.
Wildner-Bassett, M. (1986). Teaching and learning ‘polite noises’: Improving pragmatic aspects of advanced adult learners’ interlanguage. In G. Kasper (Ed.), Learning, teaching and communication in the foreign language classroom (pp. 163–178). Aarhus, Denmark: Aarhus University Press.
Wong, J. (2004). Some preliminary thoughts on delay as an interactional resource. In R. Gardner &
J. Wagner (Eds.), Second language conversations (pp. 114–131). London: Continuum.
Yamashita, S. O. (1996). Six measures of JSL pragmatics (Technical Report 14). Honolulu: University of Hawai‘i, Second Language Teaching and Curriculum Center.
Yanguas, Í. (2010). Oral computer-mediated interaction between L2 learners: It’s about time! Language Learning & Technology, 14(3), 79–93.
Yoshitake, S. S. (1997). Measuring interlanguage pragmatic competence of Japanese students of English as a foreign language: A multi-test framework evaluation. Unpublished doctoral dissertation, Columbia Pacific University, Novato, CA.
Young, R. (2002). Discourse approaches to oral language assessment. Annual Review of Applied
Linguistics, 22, 243–262.
Young, R., & He, A. (Eds.) (1998). Talking and testing. Amsterdam: John Benjamins.
Young, R., & Miller, E. R. (2004). Learning as changing participation: Discourse roles in ESL writing conferences. Modern Language Journal, 88(4), 519–535.
Zechner, K., Higgins, D., Xi, X., & Williamson, D. M. (2009). Automatic scoring of non-native
spontaneous speech in tests of spoken English. Speech Communication, 51, 883–895.
Zimmerman, D. H. (1999). Horizontal and vertical comparative research in language and social
interaction. Research on Language and Social Interaction, 32, 195–203.