Language Testing
http://ltj.sagepub.com/

How assessing reading comprehension with multiple-choice questions shapes the construct: a cognitive processing perspective
André A. Rupp, Tracy Ferne and Hyeran Choi
Language Testing 2006 23: 441
DOI: 10.1191/0265532206lt337oa
The online version of this article can be found at:
http://ltj.sagepub.com/content/23/4/441

Published by:
http://www.sagepublications.com


>> Version of Record - Oct 1, 2006



Downloaded from ltj.sagepub.com at Istanbul Universitesi on May 4, 2014

How assessing reading comprehension with multiple-choice questions shapes the construct: a cognitive processing perspective
André A. Rupp, Humboldt University of Berlin
Tracy Ferne and Hyeran Choi, University of Ottawa

This article provides renewed converging empirical evidence for the hypothesis
that asking test-takers to respond to text passages with multiple-choice questions induces response processes that are strikingly different from those that
respondents would draw on when reading in non-testing contexts. Moreover,
the article shows that the construct of reading comprehension is assessment
specific and is fundamentally determined through item design and text selection. The data come from qualitative analyses of 10 cognitive interviews conducted with non-native adult English readers who were given three passages
with several multiple-choice questions from the CanTEST, a large-scale
language test used for admission and placement purposes in Canada, in a
partially counter-balanced design. The analyses show that:

• There exist multiple different representations of the construct of reading comprehension that are revealed through the characteristics of the items;
• Learners view responding to multiple-choice questions as a problem-solving task rather than a comprehension task;
• Learners select a variety of unconditional and conditional response strategies to deliberately select choices; and
• Learners combine a variety of mental resources interactively when determining an appropriate choice.

These findings support the development of response process models that are
specific to different item types, the design of further experimental studies of test
method effects on response processes, and the development of questionnaires
that profile response processes and strategies specific to different item types.

Address for correspondence: André A. Rupp, Institut zur Qualitätsentwicklung im Bildungswesen,
Humboldt Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany; email:
Andre.Rupp@IQB.hu-berlin.de
Language Testing 2006 23 (4) 441–474 10.1191/0265532206lt337oa
© 2006 SAGE Publications

I Introduction
When people read in a non-testing context, they do not answer
multiple-choice (MC) questions in their heads. While this statement
may appear trivial to some, its counterpart has given rise to
numerous research programs in the applied linguistic and language
testing literature over several decades. That is, if readers do not
answer MC questions in their head when they read in non-testing
contexts, what kinds of evidence are needed to claim that, when
test-takers do answer MC questions about a passage they have just
read, the assessment offers evidence of an understanding of the
passage?
This article shows that one of the fundamental challenges in
answering this question lies with the fact that reading
comprehension is a complex construct and that the process of making sense of printed text in non-testing contexts is a complex, fluid,
and purpose-driven process. However, the design of items and the
selection of texts on a reading comprehension assessment
operationalize this construct in a very particular way. It is shown in
this article that the action of assessing a certain level of textual
comprehension with MC questions changes the process itself and
induces supplementary processes that are, in their intensity, unique
to the testing context.
To support these theses with empirical evidence, results from
cognitive interviews with 10 adult readers whose native language is not
English (i.e. ESL readers) are presented. The participants were
observed and interviewed on various aspects of their response
processes when answering MC comprehension questions on a standardized large-scale language test in Canada (see Farr et al., 1990).
Based on qualitative analyses of the interviews, characteristics of
their response processes for MC items were analysed to compare and
contrast them with theoretical predictions about reading strategies
and reading processes in a non-testing context. The objective of the
analyses was not only to develop response process profiles for
examinees, but, specifically, to also link the qualitative findings to
empirical quantitative findings in the reading assessment literature
on strategy selection, test method effects, and item characteristic
prediction. In addition, a strategy selection and response process
inventory was developed for administration with large-scale
language tests of reading comprehension that utilize an MC format
and is available on request from the first author.
The article is divided into the following sections. In the next
section, the theoretical foundations for investigations of response
processes to MC items are reviewed. This includes a discussion on
the construct of reading comprehension, different models of reading
comprehension for mature readers, and different models of responding to MC reading comprehension questions for mature readers. The
subsequent section describes the participants, the methodology, and
the instruments that were used to collect and analyse the data in this
study. Following this, results from qualitative analyses of the interview transcripts are presented and are linked to relevant findings in
the reading assessment literature. The article then closes with a section that presents directions for future research and reflects on the
strengths and limitations of this study.
II Deconstructing reading comprehension
Without doubt, there is no such thing as the comprehension of a
text. As Kintsch (1998: 2) noted in his book entitled Comprehension:
a paradigm for cognition:
The terms understanding and comprehension are not scientific terms but are
commonsense expressions. As with other such expressions, their meaning is
fuzzy and imprecise . . . What seems most helpful here is to contrast
understanding with perception on the one hand and with problem solving on
the other.

Similarly, in the book Reading and understanding, the authors state:


this is not a chapter about reasoning, although the point it makes is that in
order to understand [a text] we often need to make inferences about the relationships of the agents and events. These inferences often require the use of
our existing knowledge, and are often calculated as we read them; they are
processed on-line. (Underwood and Batt, 1996: 190)

Both of these quotes can be used to underline two key facts. First,
comprehension is more complex than mere perception even
though comprehension processes draw upon perception processes.
Second, and most relevant for the purpose of this article,
comprehension of texts in non-testing contexts is not simply the
result of a continual, conscious, and linear engagement in problem-solving activities, which, as will be shown in the article, contrasts
sharply with responding to MC questions about a passage in a testing context.
To provide a detailed theoretical context for the qualitative results
used to corroborate this contention later, the following three sections
of the article discuss a processing, reader-purpose, and genre
perspective on reading comprehension assessment (see Enright
et al., 2000).
1 Foundational component processes in reading comprehension
Reading is, first and foremost, a purposeful activity (Alderson,
2000). However, while the purpose for reading influences a reader's
type of engagement with a text (which is discussed in more detail
in the next section of the article), there are several component
processes of reading that are essential for succeeding at almost any
purpose of reading that can be imagined. The complexity of the
reading comprehension process has been echoed over several
decades by numerous reading researchers who have shown that an
integrated comprehension of a text relies heavily on the fluid, accurate, and efficient application of bottom-up processes (e.g.
Rumelhart, 1980; Stanovich, 1980; Anderson and Pearson, 1984;
Carrell, 1984; Aebersold and Field, 1997). In a recent synthesis of a
large body of research on reading comprehension, Carver (1997)
proposed a model where reading comprehension is, at its most fundamental level, composed of fluency, word recognition accuracy, and
rate of processing, the latter of which might combine processing efficiency and reading rate.
Similarly, to highlight the importance of the efficiency and accuracy of bottom-up skills, the term 'Matthew Effect' was coined by
Stanovich (1986) to represent the notion that automaticity of lower-level reading skills, in particular phonological awareness and word
recognition skills, is crucial to the future development of successful reading comprehension. The term captures the notion that those
readers who have poorly automatized low-level reading skills and
who, by implication, will read less than more proficient readers will
eventually lose even their limited reading ability.
One of the major objectives of reading connected text in non-testing contexts, however, is not merely the decoding of individual
words but the development of a coherent understanding of texts. To
describe and model the processes that lead to such a level of comprehension, the verbal efficiency theory (e.g. Perfetti, 1985; 1997) can
be utilized, which postulates that a reader is proficient when each
of the component processes in reading is as efficient as possible. It
specifically postulates three key processing levels required for successful text comprehension: lexical access, propositional encoding,
and text modeling. Lexical access refers to the process where words
are recognized and matched to both a concept and a phonological representation. Propositional encoding takes place when the recognized
meanings of individual words are integrated with the meanings of
other words in the immediate context to form units of meaning.
Finally, text modeling refers to the integration of propositions into a
coherent mental representation of the text. Therefore, if the goal of
reading is elaborative and efficient text modeling, lexical access has
to be automatic to support the encoding of propositions and their
integration.
To further describe, operationalize, and model the comprehension
process, the construction-integration (CI) model (e.g. Kintsch, 1998)
has proven to be a useful frame for investigating reading comprehension. It operationalizes propositions as main idea units of a text that
are generated by the relational properties between nouns, predicates,
and modifiers. In addition to formalizing textual features, a strong
emphasis is placed on the interaction between the reader and the text.
A distinction is made between the microstructure of a text, which
refers to local properties of the text, and the macrostructure of a text,
which refers to global properties of the text, both of which can either
facilitate or debilitate comprehension processes depending on the
reader who interacts with the text. Similarly, the theory distinguishes
between a textbase, which is a literal and exact structural as well as
semantic representation of the text, and a situation model, which is a
representation of the text that is achieved through integrating the
textbase with prior knowledge through elaboration and inference
processes. In other words:
neither the micro- nor the macrostructure of the situation model is necessarily
the same as the micro- or macrostructure of the textbase, for the reader may
deviate from the author's design and restructure a text both locally and globally according to his or her own knowledge and beliefs. (Kintsch, 1998: 50).

Indeed, text features can have an impact on how a textbase and a
situation model are formed for a specific purpose for reading such as
learning from text (e.g. McNamara and Kintsch, 1996).
The impact of text features on comprehension processes, specifically those features that facilitate coherent mental representations of
text, is similarly elaborated in the structure-building framework
(e.g. Gernsbacher, 1990; 1997). In this framework, comprehension is
viewed as a process of building mental representations of text in the
form of a nested architecture that consists of three fundamental
processes: activation, enhancement, and suppression. Comprehension
of a text is fundamentally aided by coherence cues, which help readers to map new information onto existing mental structures. This
process can be viewed through a neurolinguistic information-processing lens, where similar levels of a mental representation are
associated with clusters of similarly activated memory cells. In this
manner, memory cells can either enhance or suppress the activation
level of other cells, the fluidity of which represents the building and
rearrangement of mental structures, which similarly requires the
automaticity of lower-level skills.
To illustrate how these theories and models have informed work in
reading comprehension assessment, one can look at a prominent
model for responding to MC reading comprehension questions by
Embretson and Wetzel (1987). The model can be viewed as consisting of two components, a component for the comprehension of the
text itself, driven by the CI theory, and a component for the selection
of an answer to an item, driven by two processes of falsification and
confirmation. In essence, the model assumes that comprehension of
the text precedes the answer selection process, despite the fact that
there can be some back and forth between the text and the answers
during the selection process, and that selection of a choice is a logical process of elimination of incorrect distractors.
Consequently, the theory predicts that it is possible to code different
features of the text, the question stem, the correct answer, and the distractors to operationalize the intensity with which test-takers engage in
each of the component processes and to empirically verify hypotheses
about which features affect response processes in which manner
through item characteristics. In such studies, typical response variables
are item difficulty, measured either by the percentage of test-takers who
respond correctly to an item or by an IRT parameter estimate, and
response time. The goal of studies that explicitly draw on this model
has been to predict item difficulty and to capture as well as to better
understand the response process itself (e.g. Gorin, 2002; 2005; Gorin
and Embretson, in press; see also Enright et al., 2000).
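As a minimal illustration of the first response variable mentioned above, the following Python sketch computes classical item difficulty as the proportion of test-takers who answer each item correctly. The function name and the scored response matrix are hypothetical and are not taken from the studies cited; IRT-based difficulty estimates would require a fitted measurement model and are not shown.

```python
from typing import List


def proportion_correct(responses: List[List[int]]) -> List[float]:
    """Return the proportion of correct responses for each item.

    `responses` is a hypothetical scored matrix: one row per test-taker,
    one column per item, with 1 for a correct and 0 for an incorrect answer.
    """
    if not responses:
        raise ValueError("at least one test-taker is required")
    n_takers = len(responses)
    n_items = len(responses[0])
    if any(len(row) != n_items for row in responses):
        raise ValueError("every test-taker must have a score for every item")
    # Sum each column (item) and divide by the number of test-takers.
    return [sum(row[j] for row in responses) / n_takers for j in range(n_items)]


# Invented data: four test-takers, three items.
scores = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
print(proportion_correct(scores))  # [0.75, 0.25, 0.5]
```

Under this convention a higher value indicates an easier item, so the second invented item would be the most difficult of the three.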
All of the theories and models of reading comprehension and
responding to MC reading comprehension questions described above
suggest that the key to successful comprehension is a reader's ability
to efficiently, accurately, and automatically extract and organize information from texts and to integrate it with existing knowledge to form
a coherent mental representation of the text. It is generally assumed
that readers engage in similar processes both in non-testing and
in test-taking situations when they respond to either selected- or
constructed-response items about passages. However, the purpose of
reading in a testing situation is not the same as that of other situations
in which readers read for personal interest, read to gain pleasure, or
read to participate in society. It is, thus, necessary to understand how
the purpose of reading can impact the types of strategies and skills
that readers utilize, which is the objective of the next section.
2 Effects of reading purpose on strategies and skills
One of the main implications of a purpose for reading is that it guides
readers in the selection of their strategies, the range of skills they
draw on, and the intensity with which they draw on each skill. This is
particularly pertinent for MC reading comprehension tests, because
they present a context in which readers apply a variety of strategies
that are unique and different from those utilized in a non-testing context. Importantly, it is first necessary to distinguish between reading
strategies and reading skills, because there is a great deal of overlap
in what is determined to be a skill and that which is a strategy (e.g.
Wenden and Rubin, 1987; Alderson, 2000; Grabe, 2000).
Skills are typically considered to be automatic internalized
reading abilities possessed by learners, which unconsciously facilitate their reading comprehension in both non-testing and testing situations. As discussed in the previous section, these range from abilities
that could be termed lower-level, such as the phonological
processing of symbols and the disambiguation of competing word
meanings, to those that could be termed higher-level, such as
the utilization of background knowledge and the anticipation of
future propositions.
Strategies, by contrast, refer to conscious techniques and tactics
deliberately employed by a reader for successful reading such as the
use of a dictionary, the underlining of key words, or the skimming
and scanning of certain sections (e.g. Clarke, 1979; Barnett, 1989;
Anderson et al., 1991; Aebersold and Field, 1997; Alderson, 2000;
Birch, 2002; Kitao and Kitao, 2002).
Research demonstrates that readers adjust their strategies and
engage in the type of comprehension process that best suits their purpose for reading (e.g. van Dijk, 1985; Goldman, 1997; Alderson,
2000). The main purpose of responding to MC questions about
reading passages is, undoubtedly, to answer them correctly, and so
test-takers select their strategies accordingly to optimize their
chances for success. A variety of factors have a potential influence on
the selection of strategies in testing situations such as the level of linguistic difficulty of the text, the topic of the text, the linguistic level
of the questions, the content and phrasing of the questions, the location of information from the correct answers and the distractors, as
well as the level of cognitive activity required of the respondent
(Nevo, 1989: 202).
One can further argue that strategy selection in test-taking situations
is also guided by the specific testing format that is used (e.g. MC,
constructed response, paragraph sorting, oral cloze). Consequently,
test-takers often rely on test preparation materials, which suggest the
most efficient strategies for successful performance. In this regard,
test-taking strategies for reading comprehension tests can be classified
into general, text-related, and item-related strategies, and these are differentiated in this context from general test-taking strategies (see
Allan, 1992). We compiled a list of commonly recommended strategies from representative test preparation materials published during
the period of 2000–05 for large-scale standardized tests that contain
reading comprehension sections such as the SAT (Scholastic Aptitude
Test), the TOEFL, the GRE (Graduate Record Examinations), and the
TOEIC; for illustration purposes, commonly recommended strategies
for the GRE are presented in Table 1 and commonly recommended
strategies for the TOEFL are presented in Table 2.
Both tables show that test-taking guides are not consistent in the
kinds of recommendations that are made for test-takers, either
within or across the tests that the guides are developed for. For example, some guides recommend reading the text first while others recommend reading the questions first, and while some recommend
looking for key words, others do not recommend this strategy at all.
This diversity in recommended strategies may partially explain the
range of strategies that test-takers use to respond to MC questions,
the lack of confidence they often feel in their preparation, and the
lack of consensus, on the part of the test developers, about what cognitive
processes are actually involved in responding to the MC questions on
their tests. Understanding the types of strategies test-takers use and
the skills that they draw on when they respond to test questions can,
therefore, provide important information about how their response
processes likely differ from reading comprehension processes in a
non-testing context. This would, in turn, aid test preparation developers to be more consistent in their recommendations and to base
these on converging empirical evidence.

Table 1 List of representative recommended strategies for reading comprehension sections of the GRE
Source material: Lurie et al. (2005); Goodman et al. (2005); Martinson (2005); Green and Wolf (2000); Goodman et al. (2004)
[The asterisks marking which source recommends each strategy are not recoverable in this copy.]

Recommended strategy

General strategies:
Budget your time
Read the questions first before reading the text
Read the text first before reading the questions
Identify major reading question types
Look for key words
Remember that the questions follow the order of the passage
Don't try to read every word
Try to summarize after you read

Text-related strategies:
Read the first sentence of each paragraph for the main idea
Look for how the text is organized and ignore details
Try to predict where the author's points are leading
Get the gist of each paragraph
Pay special attention to the first part of the passage
Find short sentences within paragraphs
Preview key sentences
When reading, form ideas about the text
Relate what you read to what you already know
Look for context clues for the meaning of an unfamiliar word

Question-related strategies:
Answer the questions you know first
Use the process of elimination to make the best educated guess
Avoid choices that are too specific or too broad
Always look for choices that sound consistent with the main idea
Use prior knowledge to answer questions

Table 2 List of representative recommended test-taking strategies for reading comprehension sections of the TOEFL
Source material: Sullivan et al. (2004); Hinkel (2004); Lougheed (2003); Rymniak and Shanks (2002); Rogers (2005); Shmailo (2002); Gallagher (2000); Sullivan et al. (2000)
[The asterisks marking which source recommends each strategy are not recoverable in this copy.]

Recommended strategy

General strategies:
Budget your time
Read the text first before reading the questions
Read the questions first before reading the text
Identify major reading question types
Look for key words
Remember that the questions follow the order of the passage
Don't try to read every word
Try to summarize after you read

Text-related strategies:
Read the first sentence of each paragraph for the main idea
Look for how the text is organized and ignore details
Try to predict where the author's points are leading
Get the gist of each paragraph
Pay special attention to the first part of the passage
Find short sentences within paragraphs
Preview key sentences
When reading, form ideas about the text
Relate what you read to what you already know
Look for context clues for the meaning of an unfamiliar word

Question-related strategies:
Answer the questions you know first
Use the process of elimination to make the best educated guess
Avoid choices that are too specific or too broad
Always look for choices that sound consistent with the main idea
Use prior knowledge to answer questions

3 The relationship between genre conventions and reading purpose
If reading comprehension is guided by purpose, it is necessary to
also discuss the notion of genres, text types, or registers, which are
similar, albeit not identical notions, representing differences in text
organization and function (see Enright et al., 2000). Texts within
different genres fulfill different purposes and are characterized by
different linguistic conventions, which, in turn, are recognized by
social groups who share such knowledge of use in appropriate contexts (Grabe, 2000; 2002). Therefore, questions about a text from a
given genre A are likely to engage learners in very different response
processes than questions for a text from an alternative genre B due to
the inherent characteristics of the genres.
A recent study by Kobayashi (2002) investigated the impact of
different response formats and genre types on the performance of
Japanese adult ESL readers. The author found that text organization,
which is related to text type or genre, did not lead to strong performance differences for test formats that measured less integrative comprehension such as cloze tests or for learners of limited ESL
proficiency. In contrast, stronger performance differences due to
organizational differences in texts were observed for testing formats
that measure more integrative forms of comprehension, especially
for learners with higher levels of ESL proficiency. This interaction
effect between learner competency and testing format highlights,
again, that the construct of reading comprehension that is assessed
and the processes that learners engage in will change as a result of
the testing format and text types used.
4 Levels of reading comprehension assessed with MC questions
The discussions so far have not addressed differences in item formats
and their influence on the response process. This article focuses
specifically on MC questions, but, even within an item format such
as MC, the quality and intensity of the reading comprehension
process that test-takers engage in can vary considerably across items.
Specifically, it appears reasonable to state that an analysis of the
structure and content of MC questions on any reading comprehension test will typically reveal that very different levels of reading
comprehension are assessed with different items. This becomes evident in typologies of such questions. A glance at the types of MC
questions published in testing preparation manuals for tests like the
TOEFL or the SAT reveals that they are likely to assess a mixture of
what could be termed local and global comprehension processes,
which will force readers to draw on component and integrative
processes to different degrees.
For example, using the TOEFL reading comprehension section as
a basis, Sheehan et al. (1999) differentiate between MC questions
that ask readers to identify the main idea of a passage, to directly
search for a specific piece of information, to disambiguate vocabulary in
context, and to resolve anaphoric or pronominal references. As one
reviewer pointed out, however, the newest version of the TOEFL
contains a wider range of item types and includes three basic comprehension, three inference, and one reading-to-learn item types (see
also Enright et al., 2000). Moreover, Sheehan and Ginther (2001)
illustrate how MC questions can be classified by whether they ask
test-takers to either augment or reduce information directly stated in
the text (i.e. an augmentation or reduction of the textbase in CI terminology) or to either augment or reduce information resulting from
an integrated understanding of the text (i.e. an augmentation or
reduction of the situation model in CI terminology). All of these
typologies underscore that different facets of reading comprehension
are typically assessed by different item types. Unfortunately, all
typologies discussed above appear to be better known to test
developers and researchers than to the test users and test-takers for
whom reading comprehension often remains an undifferentiated
whole.
5 Predicting MC item characteristics through cognitive psychometric approaches
The need and desire for more fine-grained differentiations can be
illuminated by modern psychometric research paradigms aimed at
providing fine-grained cognitive diagnostic information to test-takers and other stakeholders (see, e.g. Nichols et al., 1995; Junker,
1999; Leighton, 2004), which can be used to explicitly detail componential skills that make up reading comprehension for sets of
items (e.g., Buck et al., 1997). As a result, research has shown that
separate processing models for the different types of MC reading
comprehension items seem to be necessary to properly explain
response processes for reading comprehension tests (e.g., Sheehan
and Ginther, 2001).
Interestingly, there is a rather long history of studies that predict
item difficulty empirically. However, many of them do so without a
detailed account of the processes that may have given rise to such
differences (e.g., Drum et al., 1981; Freedle and Kostin, 1991; 1992;
1993; Rupp et al., 2001). While the definition of variables in these
studies is undoubtedly driven by substantive linguistic theories of
reading such as those discussed earlier, a specific model of how
learners respond to the questions is not explicated. As a result, from
the perspective of a more elaborated model of reading comprehension, the variables in these studies operationalize a myriad of components of higher-order comprehension processes for texts from
different genres so that, not surprisingly, results are difficult to combine
across studies. One of the persistent findings across studies is, perhaps, that variables coding item features such as the plausibility of
the distractors appear to be more predictive of item difficulty than the
characteristics of the texts that the items refer to, as there is variation
in the types of tasks these represent and, consequently, the response
processes they engage test-takers in.
These results are in alignment with findings from a series of studies
that have investigated how important the text itself is for responding
correctly to MC questions on standardized tests, which have shown
consistent and systematic above-chance performance for conditions
when items were answered without the accompanying passage (Powers and de Leung, 1995; Katz
et al., 1990; 1991; Katz and Lautenschlager, 1994; 2001). These
studies provide more evidence that responding to MC questions on a
reading comprehension test might draw much more on verbal reasoning abilities relevant to a problem-solving context than on general
higher-order comprehension abilities.
6 Summary
In conclusion, the previous discussions have highlighted that
responding to MC questions on reading comprehension tests is a
complex process, which might be very consistent with reading comprehension in non-testing contexts as far as the requirements for isolated lower-level skills are concerned, but which might be very
different as far as the engagement of higher-level skills is concerned. Since testing provides a unique purpose for reading, it
impacts the strategies that test-takers draw on when responding to
questions, which are mediated by characteristics of the test input
such as text type and question type. In other words, it is reasonable
to hypothesize that responding to MC reading comprehension questions on many standardized reading comprehension tests is much
more a problem-solving process relying heavily on verbal reasoning
than a fluid process of integrating propositions to arrive at a connected mental representation of a text. Thus, if we used the term
reading comprehension broadly without detailing the types and
intensities of comprehension processes that readers actually engage
in, we would, indeed, be using a vague term that is meaningless
without further explication.
The following section now presents results from cognitive interviews with mature adult ESL readers who responded to MC reading
comprehension questions from a large-scale assessment. The major
purpose of this study was to investigate, through think-aloud
segments and elicited feedback from respondents about their
response processes:

• what conscious strategies test-takers deliberately select when
  they respond to questions;
• what unconscious skills the test-takers draw on when they respond
  to questions;
• how characteristics of the passages and the questions influence
  these conscious choices and unconscious engagements.

Therefore, apart from gaining a better understanding of how test-takers interact with the materials to successfully answer questions,
the goal of this study was to illustrate how reading comprehension is
a construct with many fine-grained differentiations and shades that
become operationalized differentially across passages and questions.
As stated earlier, a secondary goal of this study was to develop a
strategy inventory and a questionnaire that could be administered in
both small-scale and large-scale testing settings, which is available
from the first author on request and is currently being field tested.

III Method
The following sections describe the participants that were recruited
for this study, the instruments that were used to elicit data on
response processes and perceptions of texts and MC questions, and
the procedure for administering these instruments.
1 Participants
The 10 participants in this study were recruited from second-language courses at a large Canadian university in Ontario; the
participants were selected because they had either recently taken the
CanTEST or were planning on taking it soon. Moreover, as confirmed in dialogue
with the CanTEST administration office, the characteristics of our
participants matched those of typical test-takers in Ontario; Table 3
shows the characteristics of the participants.
The participants came from Argentina, Brazil, Canada, China,
Sri Lanka, and Syria (Damascus); 3 participants were male and 7
were female. They had generally taken several MC tests on other


27

34

19

33

19

20

18

26

25

10

Female

Female

Female

Female

Female

Male

Gender

Female

Brazil

Male

Argentina Female

Syria

Sri Lanka Male

Canada

China

Canada

China

China

Canada

Country

1.3

2.1

6.2

6.9

19.0

3.8

19.4

3.0

1.7

12.0

Time in
Canada
(in years)

Notes : i.p. in progress; n/p not provided

21

Age

Description of participants

Participant

Table 3

18.1

23.3

13.0

20.3

15.0

25.0

14.1

19.2

17.0

15.0

Time in
school
(in years)

8.0

19.0

8.0

n/p

4.0

23.0

7.1

14.2

10.0

15.0

Time studying
English
(in years)
Bachelors
degree (i.p.)
Masters
degree
Masters
degree (i.p.)
Bachelors
degree (i.p.)
Masters
degree
Bachelors
degree (i.p.)
Bachelors
degree (i.p.)
High school
diploma
Masters
degree
Bachelors
degree (i.p.)

Education

Somewhat

Very much

Very much

Very much

Little

Very much

Much

Very much

Very much

Much

Experience
with MC
test

Experience
with MC
reading test

Little

Somewhat

Much

Somewhat

Very much Very much

Much

Very much Little

Much

Very much Very much

Very much Very much

Very much Much

Very much Very much

Somewhat Much

Comfort
level with
MC tests


subjects in their lives and, as Table 3 shows, 8 of them reported having
much or very much experience with those tests; 9 participants
reported being much or very much comfortable with this testing
format, but only 6 reported having much or very much experience
with reading comprehension tests that utilize MC questions. In addition, they reported that they generally prepare for reading comprehension tests through test preparation materials, test preparation
courses, and by reading authentic materials. They cited reading books,
magazines, and newspapers in their first language and in English, but
they read more in English, which is probably due to the immersion
context. However, only a few of them work specifically on improving their general reading comprehension skills, and they do so primarily by learning vocabulary through the use of a dictionary.
2 Qualitative data collection methodology
In order to elicit verbal reports, the participants were prompted using
a semi-structured interview format while responding to reading comprehension questions, as verbal reports play a central role in collecting qualitative data (Creswell, 1998). In addition, the participants
were asked to think aloud while responding to MC questions as a
tool for tapping into their higher-order comprehension and response
processes. The think-aloud methodology is one of the major tools used
in qualitative studies investigating reading comprehension, since it
provides a way to obtain an indirect view of a reader's mental
processes, which are unobservable during silent reading (e.g.,
Hosenfeld, 1977; Block, 1986; Sarig, 1987; Farr et al., 1990).
Moreover, the recent literature argues for exactly such data to
strengthen the development of cognitive psychometric research on
achievement tests for enhancing, in part, their construct validity and
diagnostic value (Leighton, 2004).
3 Instrument
The source materials for this study were items from previously operational forms of the CanTEST, which is a large-scale paper-and-pencil test developed in Canada that comprises four different sections
on reading and listening comprehension as well as speaking and
writing. It is taken annually by more than 500 people across seven sites
in Manitoba, Alberta, Saskatchewan, and Ontario, who are typically
foreign nationals wanting to be admitted to undergraduate and graduate programs at Canadian universities. The reading comprehension
section on each form of the test consists of a paragraph comprehension section and a cloze test, the latter not being of importance for
this study. The paragraph comprehension section consists of three
passages with about six items per passage, most of which are MC
questions with a few of them being open-ended questions. For the
purpose of this study, we utilized only the MC questions on passages
and classified the passages by content domain; we eventually chose
11 passages from 6 content domains. According to the schema presented in Enright et al. (2000: 30), the passages on the CanTEST are
all expository texts and the items required the test-takers to find
information, read for basic comprehension, and read to learn.
In addition to the CanTEST we also piloted a questionnaire that
profiles test-takers in MC reading comprehension contexts, a revised
version of which is to be administered with the CanTEST in the
future. The questionnaire contained many open-ended questions that
allowed us to fine-tune answer choices for the operational version,
which will be in a selected-response format due to time constraints
in the administration.
4 Procedure of administration
The participants were given three texts in different sequences. The
passages chosen as the first text were sampled from two domains
only whereas the other two passages were sampled from four
domains with the order of presentation partially counterbalanced.
Specifically, the first text was considered a preparation text where we
observed the participants while they were responding to the questions and recorded their behavior such as how long they spent on
each question or whether they seemed to focus on specific sections
of the text. For the second text, participants were allowed to read the
text or the questions first and, after they responded to all questions,
we asked them, question by question, how they selected their answer,
how they would rate the difficulty of the question on a five-point
Likert scale from 'very easy' to 'very difficult', and what made the
questions difficult or easy; the median perceived difficulty ratings of
the items given by the participants are presented in Table 4.
As can be seen in Table 4, most items were rated by the participants as either 'very easy' or 'easy', even though an inspection of the
answers showed that the difficulty ratings did not correlate highly
with the answer scores. For the third text, we asked the participants
to think aloud while responding to each of the questions and asked
them similar questions for clarification.



Table 4  Median examinee item difficulty ratings

Content domain   Item 1  Item 2  Item 3  Item 4  Item 5  Item 6  Item 7  Median per text
Environment       3.0     2.5     3.0     3.5     2.0     2.0     1.0     2.50
Statistics        1.0     1.5     2.5     1.0     1.5     1.5     1.5     1.50
Statistics        2.5     2.0     2.0     2.5     3.0     2.0     n/a     2.25
Food              1.5     2.0     2.5     4.0     3.0     1.5     1.5     2.00
Food              1.5     2.0     2.5     1.5     3.0     1.5     1.0     1.50
Technology        2.0     2.5     2.5     2.5     2.5     2.5     n/a     2.50
Technology        1.5     2.0     1.0     2.5     2.5     n/a     n/a     2.00
Language          1.0     1.0     2.5     1.5     2.0     2.5     n/a     1.75
Language          2.5     1.0     1.0     2.5     3.0     2.0     n/a     2.25
Other academic    2.5     1.0     1.5     3.5     1.5     2.0     n/a     1.75
Other academic    2.0     4.0     2.0     2.0     4.0     n/a     n/a     2.00

Notes: The scale ranges from 1 = very easy to 5 = very difficult; n/a = not
applicable because the text had fewer items
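
As a reading aid for Table 4, the following minimal sketch recomputes the 'Median per text' column from the item-level medians reported in the table. It assumes that the per-text value is simply the median of the available item medians for that text, with n/a entries left out; the variable and function names are illustrative only and are not part of the study's materials.

```python
from statistics import median

# Item-level median difficulty ratings transcribed from Table 4
# (1 = very easy, 5 = very difficult). None stands for "n/a".
table4 = [
    ("Environment",    [3.0, 2.5, 3.0, 3.5, 2.0, 2.0, 1.0]),
    ("Statistics",     [1.0, 1.5, 2.5, 1.0, 1.5, 1.5, 1.5]),
    ("Statistics",     [2.5, 2.0, 2.0, 2.5, 3.0, 2.0, None]),
    ("Food",           [1.5, 2.0, 2.5, 4.0, 3.0, 1.5, 1.5]),
    ("Food",           [1.5, 2.0, 2.5, 1.5, 3.0, 1.5, 1.0]),
    ("Technology",     [2.0, 2.5, 2.5, 2.5, 2.5, 2.5, None]),
    ("Technology",     [1.5, 2.0, 1.0, 2.5, 2.5, None, None]),
    ("Language",       [1.0, 1.0, 2.5, 1.5, 2.0, 2.5, None]),
    ("Language",       [2.5, 1.0, 1.0, 2.5, 3.0, 2.0, None]),
    ("Other academic", [2.5, 1.0, 1.5, 3.5, 1.5, 2.0, None]),
    ("Other academic", [2.0, 4.0, 2.0, 2.0, 4.0, None, None]),
]

# 'Median per text' column as printed in Table 4.
reported = [2.50, 1.50, 2.25, 2.00, 1.50, 2.50, 2.00, 1.75, 2.25, 1.75, 2.00]


def median_per_text(item_medians):
    """Median of the available item medians, ignoring n/a (None) entries."""
    return median(r for r in item_medians if r is not None)


computed = [median_per_text(items) for _, items in table4]

for (domain, _), value in zip(table4, computed):
    print(f"{domain:<15} {value:.2f}")
```

Under this assumption, the recomputed values agree with the printed column for all 11 texts.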

Finally, participants were given the profiling questionnaire and
were invited to ask about any questions that appeared to be unclear.
The participants were fluent and comfortable enough in English so
that responding in English did not pose a problem for them. The sessions lasted approximately 2 hours and participants were nominally
reimbursed for their time and efforts with $30.
5 Procedure of coding and analysis
Interviews were digitally recorded, transcribed, and read into the
NVivo software (2005). Subsequently, responses were coded according to the kinds of strategies participants used to answer the items,
the ratings of perceived difficulty, sources of the perceived difficulty,
processes for selecting answers, and related characteristics of their
response processes. Based on the coding, holistic profiles of the participants were developed and supplemented with information from
the questionnaire data to help with extracting process information
from the data. The following section now describes the key results of
these analyses for the purposes of this article.
IV Results
In alignment with our expectations, we found that the behavior predicted by models about reading in a non-testing context differed
from the response behavior for MC questions about passages in a
testing context in striking ways.
1 Strategy selection
Based on their previous experience with MC questions in general
and reading comprehension tests with MC questions in particular,
test-takers in our study had developed various general strategies for
responding. These strategies appeared to be separable into macro- and micro-level strategies. Macro-level strategies are concerned with
general approaches that test-takers choose to employ, either consciously or subconsciously, when they are given a MC reading comprehension test. On the other hand, test-takers tend to use micro-level
strategies when they respond to each individual item.
a Macro-level strategies: Macro-level strategies varied considerably
according to personal experiences and individual characteristics. Our
participants used either unconditional strategies that were always
employed for a MC reading comprehension test or conditional strategies that were employed depending on the perceived characteristics
of the passages and the questions.
Essentially, test-takers who employed unconditional strategies had
only one major strategy that they relied on in order to have a good
sense of what the text and questions were about before they
responded to individual items. We were able to abstract the following four major unconditional strategies from the responses of our
participants:
1) Scan or read the first paragraph of the text to get an idea of the
topic and type of text. Then scan or read the questions first and
look for or underline key words that help to locate information
in related paragraphs. Then answer the questions sequentially.
2) Scan or read the entire text first to get an idea of the topic
and type of text and look for or underline key words that might
help to answer questions later. Then scan or read the questions
and look for or underline key words that help to locate information in relevant paragraphs. Then answer the questions
sequentially.
3) Scan or read the questions first to assess the general topic of the
text implied in the questions as well as the question foci looking
for or underlining key words. Then scan or read the whole text
and look for or underline key words that help to answer questions. Then answer the questions sequentially.
4) Scan or read the questions first to assess the general topic of the
text implied in the questions as well as the question foci looking
for or underlining key words. Then answer the questions sequentially by reading the text in chunks.
In particular, participants' experiences with MC tests played an
important role in developing their unconditional strategies. One participant, who had taken about 15 to 20 MC tests since he was 14
years old, expressed his unconditional strategy as follows:
So basically what I did is first I read the question pretty fast (to have a sense
of) what they (the questions) looking for, like, (and) what kind of answer
they looking for. So I just read (questions) fastly. And then I directly to the
answer sheet, I mean to the test (text) and so I just can (read it) until something match like some words match and then I try to find out I see some
similarities in the text I try to find out the right answer inside, so that is the
way I use . . . I mean, when the meaning of the question is clear, I go right
away I know it is somewhere here. Usually it is somewhere here I know the
main idea and the main idea should be somewhere here, and I can, like,
answer the question . . .

These unconditional strategies had been developed by the test-takers
based not only on their previous experiences with MC reading comprehension tests and other MC tests but also on their experience with
teachers. One participant expressed this as follows:
I remember sometimes they would tell us in school read the question first
and this way I guess that I have time to first . . . read the questions so I know
about what they are asking approximately.

Other test-takers, however, appeared to employ conditional strategies
by altering their repertoire of strategies that they rely on depending
on the characteristics of the input.
Specifically, the conditional strategies appeared to be dependent
on two different factors: the perceived characteristics of the text and
the perceived characteristics of the questions with the former appearing
to be more influential than the latter. We were able to abstract the
following four major conditional strategies from the responses of our
participants:
1) Scan or read the first paragraph of the text to assess the difficulty
   of the text.
   • If the text is perceived to be easy, read the text first completely and look for or underline key words. Then answer the
     questions sequentially.
   • If the text is perceived to be difficult, scan the questions first
     and look for or underline key words.
2) Scan or read the questions first before reading the text to assess
   their difficulty.
   • If the questions are perceived to be easy, answer the first
     question first and then proceed sequentially.
   • If the questions are perceived to be difficult, scan the text and
     look for or underline key words first. Then answer the questions sequentially.
3) Scan or read the questions first before reading the text looking
   for or underlining key words.
   • If the text is perceived to be short, read the entire text
     looking for or underlining key words and answer the questions sequentially.
   • If the text is perceived to be long, immediately begin to
     answer the questions sequentially.
4) Scan the text for length.
   • If the text is perceived to be long, start to answer questions
     sequentially.
   • If the text is perceived to be short, read the text first looking
     for or underlining key words. Then answer the questions
     sequentially.
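
The branching structure of these conditional strategies can be made explicit in a short sketch. The code below is only an illustration of conditional strategies 1 and 4 as worded above; the function names and the Boolean arguments, which stand in for a test-taker's subjective perception of the text, are hypothetical and were not part of the study's coding scheme.

```python
from typing import List


def conditional_strategy_1(text_seems_easy: bool) -> List[str]:
    """Conditional strategy 1: branch on the perceived difficulty of the text."""
    steps = ["scan or read the first paragraph to assess text difficulty"]
    if text_seems_easy:
        steps.append("read the text completely, looking for or underlining key words")
        steps.append("answer the questions sequentially")
    else:
        steps.append("scan the questions first, looking for or underlining key words")
    return steps


def conditional_strategy_4(text_seems_long: bool) -> List[str]:
    """Conditional strategy 4: branch on the perceived length of the text."""
    steps = ["scan the text for length"]
    if not text_seems_long:
        steps.append("read the text first, looking for or underlining key words")
    steps.append("answer the questions sequentially")
    return steps


if __name__ == "__main__":
    for step in conditional_strategy_4(text_seems_long=True):
        print(step)
```

Written this way, the contrast with the unconditional strategies is that the sequence of steps depends on an initial appraisal of the input rather than being fixed in advance.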
A few clear patterns emerge from the above strategies. First and foremost, for both unconditional and conditional strategies, the process
of responding to MC questions in our sample and context heavily
relied on key word matching, a process which test-takers often
facilitate by underlining or highlighting individual words or phrases
considered to be pertinent for understanding the text or answering
the questions. As one participant stated in response to how he located
relevant information for answering a question:
So they are talking about device here so I see the same word here, device,
so I stop over here and I know, like, the answers over here. So I just stop by
and try to like look around and I am pretty sure it is maybe one sentence
before or maybe one sentence after so that is how I proceed.

Yet, while constructing a coherent mental representation of a text
relies on coherence cues that link propositions (which can be key
words, phrases, or structures), the key words that are relevant for
responding to MC questions may not necessarily be coherently connected and mentally stored. Since our
participants always segmented the text when answering MC questions, the linking of individual
propositions through the linking of key words to form coherent mental representations is achieved only when an individual question
requires such a linking or, alternatively, when different questions
semantically or logically build on each other and induce such a linking. However, it is doubtful that those instances were frequent.
Second, the unconditional strategies were essentially variants of
the conditional strategies. The main deciding factor in choosing a
particular conditional strategy appeared to be the perceived difficulty
of the text or the questions, which are complex perceptions consisting of the perceived length of the text, the perceived familiarity with
the text type, the perceived familiarity with its topic, the perceived
familiarity with its vocabulary, and the perceived complexity of the
processes required to answer a question correctly.
Third, responding to MC questions was rarely a linear process in
which a text was read first to form a textbase with an integrated
situation model as suggested by the CI framework or to build a
coherent mental representation of nested structures as suggested by
the structure-building framework and then the set of MC questions
was answered. More commonly, texts were merely scanned for key
words and this took place either in chunks or after questions had
been scanned or had been read.
b Micro-level strategies: Our study showed that the response
process to individual items was a complex process with certain common features that could be abstracted.

2 The role of item order and perceived item difficulty


Most notably, participants all clearly assumed that the questions
were ordered sequentially. As one participant stated:
Well, in general, almost of the essays to they put the questions in order [. . .]
so I remember that, knowing better read the first question before reading the
passage [. . .] since [for] most of these tests the questions come in order.

It further appeared that the response process was largely influenced
by how difficult or easy an individual item was perceived to be,
which, just like the perception of the difficulty of the text, turned out
to be a complex assessment on the part of the test-taker. While it was
generally assumed that all response options had to be read, understood, and eliminated before the correct answer choice could be
selected, an answer choice was quickly selected when test-takers
perceived the item to be easy, because they could clearly remember
key text information. From an information-processing perspective,
this can be explained by inferring that the information in the selected
choice was much more highly activated than competing pieces of
information in other choices (see Sheehan and Ginther, 2001).


One participant stated that she did not even attempt to eliminate
choices more carefully because:
the main idea was who can get this information I think when I read I got the
idea that, the information was already . . . what I remembered from the paragraph on, it's like an answer while I was reading this was much more quickly
so I didn't need to eliminate, but there were three and like uh, no, uh, no, uh,
no. It was easy because in some way it's like I knew what I was looking for.

3 The role of logical elimination of distractors and informed guessing
In those cases when items were perceived to be increasingly difficult, the solution process became increasingly characterized by
a continual back and forth between the question and relevant text
sections in order to logically eliminate (i.e., 'falsify' in Embretson
and Wetzel's (1987) terminology) potentially incorrect choices.
This process was continued until the potentially correct option could
be selected (i.e., 'confirmed' in Embretson and Wetzel's
(1987) terminology) or until fewer options remained, at which point
a final choice was made by guessing. One participant described why
she decided to start eliminating choices with respect to the perceived
difficulty:
When I don't understand very well the text I start to eliminate, but if I have
a clear idea of the text I just go for the answer. This kind of question is kind
of difficult because it's not that easy to get this idea from paper or this answer
from our experience our background so I will use the elimination answer
method to answer this question.

Yet, importantly, guessing could not generally be characterized as
an uninformed process whereby a random selection is made
among the possible choices. Rather, the process could be best
described as conditional informed guessing, which is to say that
guessing was usually seen only as a last resort. Moreover, it was
not exerted upon all choices but only those that were left after
knowledge-based or logic-based elimination of a few had already
taken place. As a side comment at this point, whenever sufficient
data are available and fit a unidimensional model, the magnitude
of the estimated lower-asymptote parameters in a three-parameter
logistic IRT model can provide quantitative information about the
degree to which test-takers, on average, engage in informed guessing for certain items.
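
To make this side comment concrete, the following minimal sketch evaluates the response function of a three-parameter logistic model in its common textbook form, P(theta) = c + (1 - c) / (1 + exp(-a(theta - b))), where c is the lower-asymptote parameter. The parameter values are hypothetical and are not estimates from the CanTEST; some formulations also include a scaling constant of 1.7, which is omitted here. The second function is a simple illustration, not part of the IRT model, of why elimination before guessing would be expected to raise the lower asymptote.

```python
import math


def p_correct_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL model.

    theta: ability, a: discrimination, b: difficulty,
    c: lower asymptote (probability of success at very low ability).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def guessing_success_rate(n_options: int, n_eliminated: int) -> float:
    """Chance of picking the key when guessing evenly among the options
    left after n_eliminated incorrect options have been ruled out."""
    return 1.0 / (n_options - n_eliminated)


# Hypothetical four-option item: blind guessing versus guessing after
# two distractors have been eliminated.
blind = guessing_success_rate(4, 0)     # 0.25
informed = guessing_success_rate(4, 2)  # 0.5

# At very low ability the response probability approaches c.
low_ability = p_correct_3pl(theta=-10.0, a=1.2, b=0.5, c=informed)
print(round(low_ability, 3))
```

For a four-option item, an estimated lower asymptote well above 0.25 would thus be consistent with, though not proof of, test-takers guessing among a reduced set of options.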
The following quote illustrates the
complexity and artificiality of the logical elimination process. The
participant responded to the following item:

You might want to use census data if you were interested in . . .
a) designing new homes for large families;
b) building components for computers;
c) moving your business to a new area;
d) finding a more favorable climatic zone.

as follows:
I picked C, because A, designing new homes for large families, you'd need
demographics data to do that really not necessarily census data for new
homes. There's always going to be expansion within a city now whether you are
going to build homes for large families, small families, all depending, would
you really rely on census data? Like federal government census data to do that?
No. Building components for computers, well, that just doesn't . . . why. . . I
don't know why we'd use census data. It's about people not things it's about
things that they need, but really how important is, like, a hard drive to somebody, you know, depending. Now finding a more favorable climatic zone, that
one was a good one, but no. Why would you? You don't need census data to figure out it's cold there it's warm here, you know, that just does not strike me.
Moving your business to a new area, yes. If you have a candy store, you're moving to a city with a lot of children, or if you have a toy store or something like
that or if you're moving to, like, a retirement community more like pharmacy
or, you know, chiropractors or something like that. See, that makes more sense
to me that you'd want to know what kind of population you are going into.

This description underscores clearly that the reasoning process
through these options is induced by the MC question itself and, thus,
unique to a testing context.
4 The role of prior knowledge
The previous verbal report furthermore highlights the importance of
prior knowledge, which our respondents frequently drew upon to
eliminate choices. While prior knowledge may certainly also be
utilized to aid in general text comprehension (i.e., in creating a
micro- or macrostructure situation model of the text in Kintsch's
(1998) terminology), which is used to make a knowledge-based
choice for questions, in this study it was more consciously drawn
upon when logical reasoning became the last resort to eliminate
incorrect answers. As one participant stated clearly:
And I also tried to use common sense because, why the statistical date, and
it, generally, the idea of its, like, I know what in regards to this kind of thing
racial anonymity, and in some way I use a lot of common sense and so I try
to what I do when I go back, its to go control what I did was try. I think
thats in some way, this is easier for me than that one, because in that one I
was not able to use common sense.


5 Factors affecting the perceived difficulty of items


In general, it was found that the factors that impacted the participants' perceived difficulty of individual items were those that had
been frequently described in the literature, which provided evidence
for the continual coding or consideration of certain types of item features when developing tests and analysing test data. Of particular
relevance were the semantic similarity and plausibility of the distractors, the length and vocabulary of the stem and choices, the difficulty
of the text, and the item type.
Semantically similar or plausible distractors caused problems for
test-takers, a phenomenon that is reflected in the following comment:
So that, I think that's what made it confusing. [The options] are so close
together that when you are reading you're like associating the ideas together
as you go. You understand, but you're kind of sticking things together that
don't necessarily, like, shouldn't go together most of the time.

With respect to the perceived difficulty of choices, the length of the
options also increased difficulty, since longer options required more information to be processed in order for the test-takers to falsify incorrect
options and to confirm the correct option. One participant noted:
There more confusing, the longer, the answers are longer so that there is more
information in them. It is easy just like A blue B red, pink or stuff like that . . .

Also, the vocabulary of the answer options or of a question, as well as the
wording of a question, was a source of the perceived difficulty of
reading comprehension questions. One participant explained why:
Yes, sometimes when they are tricky with the words but in some way you
know that when you are not sure about two answers there is some thing in
the formulation of the answer that you have to identify - the tricky one and
so the other. Maybe there was one or two words but what I did is reading
comprehension because what I was thinking was the main idea not in particular words so in some words that I may not understand I still try to get the
context meaning.

The perceived difficulty of the text that an item referred to was
another major factor influencing the perceived difficulty of the item
itself, which consequently influenced how questions were perceived
and how response strategies were chosen, at least for some test-takers. For instance, inference items that asked test-takers to engage
with the whole text were perceived as very difficult, particularly
when the text was also perceived as very difficult. One participant
described it this way:
Because you couldn't find the answer directly from the text, so you make
assumption but actually whether your assumption is right or wrong you have
to really think about it, that's why after I finished all the questions I review
the whole text again.

On the other hand, items that asked for the main idea of a passage were usually perceived as easy, because test-takers could skip
local detailed information without necessarily losing key
information required to respond correctly (see Rupp et al., 2001).
One participant clearly stated her perception of the easiness of a main
idea item:
I think it's the main idea and it's very declared in the first paragraph so that's
also the reason why I think this is actually very easy for me.

Accordingly, test-takers in our study often just scanned the text to
locate local features that a question asked for without an effort to
understand paragraphs; this may make such questions easy. One participant noted:
No, for this one we don't need to understand [the paragraph]. We just know
the place and we can find the exact words, exactly all these things for the
answer.

Paradoxically, however, the assessment of reading comprehension at
a local level may also make such items more difficult for some test-takers. For example, MC questions asking about local information in
the text can be perceived as relatively difficult due to the time constraints imposed by longer and more difficult texts, which make
locating the appropriate passage section more difficult:
More difficult in the way that it requires more time, generally you have a lot
of questions in this sort of evaluations and you need, you know what you
have to go quickly through the questions, so the questions requires more time
means that in that way for me they are more difficult in the way that I know
I need more time to read that question.

Because of these conflicting results, we deduced that local features
of a text became more or less relevant depending on the MC questions that referred to them. Consequently, they appeared to be more
important for test-takers for determining the perceived difficulty of an
MC question that refers to them rather than for assessing the perceived difficulty of a text per se.
Many MC questions on the CanTEST and, as we would hypothesize, on other reading comprehension tests currently in use (particularly the non-standardized ones that have to be developed under
enormous resource constraints) are 'highlighters' for local features
of texts, so that the microstructure characteristics of the textbase and
the situation model become proportionally more relevant.


6 Shades of reading comprehension


In conclusion, all of the above statements are vivid reminders that it
always remains to be shown empirically what type and level of reading comprehension is being assessed with a particular MC question,
because characteristics of the questions and the text interact with characteristics of the test-takers to induce response processes that are mediated by prior experience with such tests. Specifically, our data
indicated that test-takers first tended to apply macro-level strategies
in order to get an overall idea of what the given text and the related
questions were about. The macro-level strategies were either unconditional or conditional on test-takers' own previous experiences. Then, regarding the micro-level strategies, test-takers
selected, consciously or subconsciously, particular strategies from
their repertoire of item response strategies, a choice that depended mostly on
the perceived difficulty of the text and questions.
In particular, during the process of logically finding the correct
option under time constraints, the degree of interaction between the
text and questions seemed to have been influenced largely by the perceived difficulty of a question type and the plausibility of distractors.
Therefore, for questions with distractors that are very close in meaning and plausibility, comprehension of the text content or general
argument structure might have been subordinate to logical reasoning,
which changes the process of reading comprehension in a testing
context compared to non-testing contexts. Consequently, MC questions
might function well as separable measures of how difficult different
aspects of texts are for test-takers or of how well test-takers engage
in lower-order component processes rather than as composite measures of higher-order reading comprehension, which they are
sometimes colloquially assumed to be.

V Limitations and future directions
This study provided a brief qualitative perspective on the processes
that test-takers engage in when they respond to MC items. As such,
this study is limited by the number of participants, the texts and
questions chosen, and the way in which the think-aloud and reflection questions were posed to the participants. Moreover, it is limited
because it only compares the processes of responding to MC items
to theoretical predictions about processes in non-testing contexts but
does not compare the actual behavior of the participants in such contexts. Furthermore, additional systematic analyses that link the text
types, item types, test-taker characteristics, and response processes
across MC reading comprehension tests for various populations of
test-takers more closely would be desirable. It would also be important to investigate some of the hypotheses about the effects of text
and item characteristics on response processes through experimental
means; to this end, an experimental study that investigates how text
length and different combinations of administering items and passages from the new SAT impact item characteristics and response
strategies is currently underway.

VI Conclusions
Our findings strongly suggest that the sequence and structure of MC
questions provide important cues for test-takers that allow
or lead them to select response strategies, which may result in
response processes that deviate significantly from those predicted by
a model of reading comprehension in non-testing contexts. Despite
newer types of MC questions that focus more strongly on higher-level
reading comprehension, which can be found on the newer TOEFL or
SAT versions, for example, we hypothesize that test-takers frequently
segment a text into chunks that are aligned with individual questions
and focus predominantly on the microstructure representation of a
textbase rather than the macrostructure of a situation model. As a
result, higher-order inferences that may lead to an integrated
macrostructure situation model in a non-testing context are often suppressed or are limited to grasping the main idea of a text.
It is indisputable that theoretical models of reading comprehension are often well suited to explain the cognitive processes of reading comprehension in non-testing contexts, but they neglect the
segmentation and localization functions of many types of MC questions. Put differently, it is certainly possible to link observations such
as responses to MC questions and verbal responses to interviewer
questions to individual components of theoretical models for higher-order reading comprehension such as the CI model, the structure-building framework, or a general information-processing model.
However, models for MC responding that postulate an integrated
reading comprehension process that is linearly followed by a decision-making process neglect testing format effects, which manifest
themselves in variation amongst response processes. That variation
is, in turn, grounded in the variation among test-takers, text types,
question types, and question formats.


As a result, we specifically endorse the approach discussed in
Sheehan and Ginther (2001) whereby different theoretical processing
models would be developed for different question types (e.g.
vocabulary-in-context, reading to learn), even if these question types
are developed within the same testing format (e.g. MC). If reading
specialists and assessment experts want to explain large proportions
of variance in measures of item characteristics such as item difficulty, it appears to be pertinent to replace general predictive statistical models for responding to certain question formats by specific
predictive statistical models for responding to certain sub-types
within a given format.
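
The contrast between one general predictive model and several sub-type-specific models can be sketched numerically. The following is only a minimal illustration under stated assumptions and is not the authors' analysis: the item difficulties, the single predictor (passage length in words), the two question types, and the names `ITEMS`, `fit_line` and `explained_variance` are all invented for this example. It compares the proportion of variance in item difficulty explained by one regression line fitted to all items with that explained by separate lines fitted within each question type.

```python
from statistics import mean

# Hypothetical items: (question type, passage length in words, item difficulty).
# All values are made up for illustration; they are not data from the study.
ITEMS = [
    ("main_idea", 200, 0.30), ("main_idea", 400, 0.32),
    ("main_idea", 600, 0.35), ("main_idea", 800, 0.36),
    ("detail", 200, 0.25), ("detail", 400, 0.45),
    ("detail", 600, 0.60), ("detail", 800, 0.80),
]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def sse(xs, ys, a, b):
    """Sum of squared residuals of the line y = a + b * x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def explained_variance(items):
    """Return (general, specific): share of difficulty variance explained by
    one pooled line versus by one line per question type."""
    xs = [x for _, x, _ in items]
    ys = [y for _, _, y in items]
    total = sum((y - mean(ys)) ** 2 for y in ys)

    # General model: a single line for all items, ignoring question type.
    a, b = fit_line(xs, ys)
    general = 1 - sse(xs, ys, a, b) / total

    # Specific models: a separate line for each question type.
    specific_sse = 0.0
    for qtype in sorted({t for t, _, _ in items}):
        gx = [x for t, x, _ in items if t == qtype]
        gy = [y for t, _, y in items if t == qtype]
        ga, gb = fit_line(gx, gy)
        specific_sse += sse(gx, gy, ga, gb)
    specific = 1 - specific_sse / total
    return general, specific


general, specific = explained_variance(ITEMS)
print(f"general model:   {general:.2f}")
print(f"specific models: {specific:.2f}")
```

Because the pooled line is always one admissible candidate within each sub-type, the type-specific fits can never explain less variance in the fitted sample than the general fit; whether the gain justifies the additional parameters is an empirical question that would require real item data and cross-validation.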
In sum, different MC questions do not merely tap but, indeed,
create very particular comprehension and response processes.
Therefore, a blanket statement such as 'MC questions assess reading
comprehension' is nonsensical for any test. If anything, through
detailed diagnostic information, a bridge between response
processes for MC items and reading comprehension processes
generally needs to be built both rationally and empirically so that
results from MC tests about some form(s) of reading comprehension
can be more effectively and clearly communicated to test-takers and
educational decision-makers. For this purpose, models in cognitive
psychometric research should prove to be useful and further
inquiries along the lines of Buck et al. (1997) with the rule-space
methodology or Hartz (2002) with the fusion model could open up
important pathways to construct validation of reading comprehension tests with MC items (see also Rupp, in press).

Acknowledgements
The authors wish to gratefully acknowledge that this research was
funded through the standard research grant #0410-2004-0114 from
the Social Sciences and Humanities Research Council (SSHRC) in
Canada. Moreover, the authors wish to thank Amelia Hope and
Mary-Ruth at the CanTEST office at the University of Ottawa in
Canada, who graciously and patiently provided testing materials and
insights into the test development process.

VII References
Aebersold, J. and Field, M. 1997: From reader to reading teacher: issues and
strategies for second language classrooms. Cambridge: Cambridge
University Press.
Alderson, J.C. 2000: Assessing reading. Cambridge: Cambridge University
Press.
Allan, A. 1992: Development and validation of a scale to measure test-wiseness in EFL/ESL reading test takers. Language Testing 9, 101–22.
Anderson, R. 1991: Individual differences in strategy use in second language
reading and testing. Modern Language Journal 75, 460–72.
Anderson, R. and Pearson, P. 1984: A schema-theoretic view of basic
processes in reading comprehension. In Pearson, P.D., editor, Handbook
of reading research. New York: Longman, 255–91.
Barnett, M. 1989: More than meets the eye: foreign language reading theory
and practice. Englewood Cliffs, NJ: CAL and Prentice-Hall.
Birch, B. 2002: English L2 reading: getting to the bottom. Mahwah, NJ:
Erlbaum.
Block, E.L. 1986: The comprehension strategies of second language readers.
TESOL Quarterly 20, 463–94.
Brownstein, S.C., Weiner, M., Green, S.W. and Hilbert, S. 1999: How to prepare for the GRE. Hauppauge, NY: Barron.
Buck, G., Tatsuoka, K. and Kostin, I. 1997: The subskills of reading:
Rule-space analysis of a MC test of second language reading comprehension. Language Learning 47, 423–66.
Carrell, P. 1984: Schema theory and ESL reading: classroom implications and
applications. The Modern Language Journal 68, 332–43.
Carver, R.P. 1997: Reading for one second, one minute, or one year from the
perspective of rauding theory. Scientific Studies of Reading 1, 3–43.
Clarke, A. 1979: Reading in Spanish and English. Language Learning 29,
121–50.
Creswell, J.W. 1998: Qualitative inquiry and research design: choosing
among five traditions. Thousand Oaks, CA: Sage.
Drum, P.A., Calfee, R.C. and Cook, L.K. 1981: The effects of surface structure variables on performance in reading comprehension tests. Reading
Research Quarterly 16, 486–514.
Embretson, S.E. and Wetzel, D. 1987: Component latent trait models for paragraph comprehension tests. Applied Psychological Measurement 11,
175–93.
Enright, M.K., Grabe, W., Koda, K., Mosenthal, P., Mulcahy-Ernt, P. and
Schedl, M. 2000: TOEFL 2000 Reading Framework: a working paper.
TOEFL Monograph Series MS-17. Princeton, NJ: Educational Testing
Service.
Farr, R., Pritchard, R. and Smitten, B. 1990: A description of what happens
when an examinee takes a multiple-choice reading comprehension test.
Journal of Educational Measurement 27, 209–26.
Freedle, R. and Kostin, I. 1991: The prediction of SAT reading comprehension
item difficulty for expository prose passages. Research Report RR 91-29.
Princeton, NJ: Educational Testing Service.
Freedle, R. and Kostin, I. 1992: The prediction of GRE reading comprehension item difficulty for
expository prose passages for each of three item types: main ideas, inferences, and explicit statements. Research Report RR 91-59. Princeton,
NJ: Educational Testing Service.
Freedle, R. and Kostin, I. 1993: The prediction of TOEFL reading item difficulty: implications for
construct validity. Language Testing 10, 133–70.
Gallagher, N. 2000: DELTA's key to the TOEFL test. McHenry, IL: Delta.
Gernsbacher, M.A. 1990: Language comprehension as structure building.
Hillsdale, NJ: Erlbaum.
Gernsbacher, M.A. 1997: Two decades of structure building. Discourse Processes 23,
265–304.
Goldman, S.R. 1997: Learning from text: Reflections on the past and suggestions for the future. Discourse Processes 23, 357–98.
Goodman, E., Ojserkis, R. and Verini, B. 2004: GRE exam. New York: Simon
and Schuster.
Goodman, E., Ojserkis, R. and Verini, B. 2005: GRE exam. New York: Kaplan.
Gorin, J.S. 2002: Cognitive and psychometric modeling of text-based reading
comprehension GRE-V test items. Unpublished doctoral dissertation,
University of Kansas.
Gorin, J.S. 2005: Manipulation of processing difficulty of reading comprehension
test questions: the feasibility of verbal item generation. Journal of
Educational Measurement 42, 351–73.
Gorin, J.S. and Embretson, S.E. in press: Predicting item properties without
tryout: cognitive modeling of paragraph comprehension items. Applied
Psychological Measurement.
Grabe, W. 2000: Developments in reading research and their implication for
computer-adaptive reading assessment. In Chalhoub-Deville, M., editor,
Issues in computer-adaptive tests of reading, Cambridge: Cambridge
University Press, 11–47.
Grabe, W. 2002: Narrative and expository macro-genres. In Johns, A.M., editor,
Genre in the classroom: multiple perspectives. Mahwah, NJ: Erlbaum,
249–67.
Green, S.W. and Wolf, I.K. 2000: How to prepare for the GRE. Hauppauge,
NY: Barron.
Hartz, S.M. 2002: A Bayesian guide for the Unified Model for assessing cognitive abilities: blending theory with practicality. Unpublished doctoral
dissertation, University of Illinois, Urbana-Champaign.
Hinkel, E. 2004: TOEFL test strategies. Hauppauge, NY: Barron.
Hosenfeld, C. 1977: A preliminary investigation of the reading strategies of
successful and nonsuccessful second language learners. System 5,
110–23.
Junker, B.W. 1999: Some statistical models and computational methods that
may be useful for cognitively-relevant assessment. Unpublished manuscript. Available online at http://www.stat.cmu.edu/~brian/nrc/cfa (June
2006).
Katz, S., Blackburn, A.B. and Lautenschlager, G. 1991: Answering reading
comprehension items without passages on the SAT when the items are
quasi-randomized. Educational and Psychological Measurement 51,
747–54.
Katz, S. and Lautenschlager, G.J. 1994: Answering reading comprehension
questions without passages on the SAT-I, ACT, and GRE. Educational
Assessment 2, 295–308.
Katz, S. and Lautenschlager, G.J. 2001: The contribution of passage and no-passage factors to item
performance on the SAT reading task. Educational Assessment 7,
165–76.
Katz, S., Lautenschlager, G.J., Blackburn, A.B. and Harris, F. 1990:
Answering reading comprehension items without passages on the SAT.
Psychological Science 1, 122–27.
Kintsch, W. 1998: Comprehension: a paradigm for cognition. New York:
Cambridge University Press.
Kitao, S.K. and Kitao, K. 2002: Testing reading. Asian Journal of English
Language Teaching 12, 161–78.
Kobayashi, M. 2002: Method effects on reading comprehension test performance: Text organization and response format. Language Testing 19,
193–220.
Leighton, J.P. 2004: Avoiding misconception, misuse, and missed opportunities: the collection of verbal reports in educational achievement testing.
Educational Measurement: Issues and Practice 23, 6–15.
Lougheed, L. 2003: How to prepare for the TOEFL test. Hauppauge, NY:
Barron.
Lurie, K., Pecsenye, M. and Robinson, A. 2005: Cracking the GRE. New
York: Random House.
Martinson, T.H. 2005: Master the GRE. Lawrenceville, NJ: Thomson
Peterson's.
McNamara, D.S. and Kintsch, W. 1996: Learning from texts: effects of prior
knowledge and text coherence. Discourse Processes 22, 247–88.
Nevo, N. 1989: Test-taking strategies on a MC test of reading comprehension.
Language Testing 6, 199–215.
Nichols, P.D., Chipman, S.F. and Brennan, R.L. editors, 1995: Cognitively
diagnostic assessment. Hillsdale, NJ: Erlbaum.
NVivo [Software Program] 2005: Doncaster, Victoria, Australia: QSR
International.
Perfetti, C.A. 1985: Reading ability. New York: Oxford University Press.
Perfetti, C.A. 1997: Sentences, individual differences, and multiple texts: three issues
in text comprehension. Discourse Processes 23, 337–55.
Powers, D.E. and Leung, S.W. 1995: Answering the new SAT reading comprehension questions without the passages. Journal of Educational
Measurement 32, 105–29.
Rogers, B. 2005: TOEFL success. Lawrenceville, NJ: Thomson Peterson's.
Rumelhart, D. 1980: Schemata: the building blocks of cognition. In Spiro,
R.J., Bruce, B.C. and Brewer, B.W., editors, Theoretical issues in reading comprehension. Hillsdale, NJ: Erlbaum, 33–58.
Rupp, A.A. in press: The answer is in the question: a guide to describing and
investigating the conceptual foundations and statistical properties of cognitive psychometric models. International Journal of Testing.
Rupp, A.A., Garcia, P. and Jamieson, J. 2001: Combining multiple regression
and CART to understand difficulty in second language reading and listening comprehension test items. International Journal of Testing 1,
185–216.
Rymniak, M. and Shanks, J. 2002: TOEFL CBT exam. New York: Kaplan.
Sarig, G. 1987: High-level reading in first and in the foreign language: some
comparative process data. In Devine, J., Carrell, P.L. and Eskey, D.E.,
editors, Research in reading as a second language. Washington, DC:
Teachers of English to Speakers of Other Language 2, 105–20.
Sheehan, K.M. and Ginther, A. 2001: What do passage-based MC verbal reasoning items really measure? An analysis of the cognitive skills underlying performance on the current TOEFL reading section. Paper presented
at the 2000 Annual Meeting of the National Council of Measurement in
Education.
Sheehan, K.M., Ginther, A. and Schedl, M. 1999: Development of a proficiency scale for the TOEFL reading comprehension section. TOEFL
Research Report. Princeton, NJ: Educational Testing Service.
Shmailo, L. 2002: Mastering the TOEFL CBT. New York: Kaplan.
Stanovich, K. 1980: Toward an interactive-compensatory model of individual
differences in the development of reading fluency. Reading Research
Quarterly 16, 32–71.
Stanovich, K. 1986: Matthew effects in reading: Some consequences of individual differences in the acquisition of literacy. Reading Research Quarterly 21,
360–406.
Sullivan, P.N., Brenner, G.A. and Zhong, G.L.Q. 2004: Master the TOEFL
2005. Lawrenceville, NJ: Thomson/Peterson.
Sullivan, P.N., Zhong, G.L.Q. and Brenner, G.A. 2000: Everything you need
to score high on the TOEFL. New York: Pearson Education.
Underwood, G. and Batt, V. 1996: Reading and understanding: an introduction to the psychology of reading. Cambridge, MA: Blackwell.
Van Dijk, T. 1985: Strategic discourse comprehension. In Ballmer, T., editor,
Linguistic dynamics: Discourses, procedures, and evolution. Berlin:
Walter de Gruyter, 30–61.
Wenden, A. and Rubin, J. editors. 1987: Learner strategies in language learning. London: Prentice Hall International.
