
# Purposes and Methods of Research in Mathematics Education

Alan H. Schoenfeld

Bertrand Russell has defined mathematics as the science in which we never know what we are talking about or
whether what we are saying is true. Mathematics has been shown to apply widely in many other scientific fields.
Hence, most other scientists do not know what they are talking about or whether what they are saying is true.
—Joel Cohen, “On the nature of mathematical proofs”

There are no proofs in mathematics education.
—Henry Pollak

The first quotation above is humorous; the second serious. Both, however, serve to highlight some of the major differences between mathematics and mathematics education, differences that must be understood if one is to understand the nature of methods and results in mathematics education.

The Cohen quotation does point to some serious aspects of mathematics. In describing various geometries, for example, we start with undefined terms. Then, following the rules of logic, we prove that if certain things are true, other results must follow. On the one hand, the terms are undefined; i.e., “we never know what we are talking about.” On the other hand, the results are definitive. As Gertrude Stein might have said, a proof is a proof is a proof.

Other disciplines work in other ways. Pollak’s statement was not meant as a dismissal of mathematics education, but as a pointer to the fact that the nature of evidence and argument in mathematics education is quite unlike the nature of evidence and argument in mathematics. Indeed, the kinds of questions one can ask (and expect to be able to answer) in educational research are not the kinds of questions that mathematicians might expect. Beyond that, mathematicians and education researchers tend to have different views of the purposes and goals of research in mathematics education.

This article begins with an attempt to lay out some of the relevant perspectives and to provide background regarding the nature of inquiry within mathematics education. Among the questions explored are the following: Just what is the enterprise? That is, what are the purposes of research in mathematics education? What do theories and models look like in education as opposed to those in mathematics and the physical sciences? What kinds of questions can educational research answer? Given such questions, what constitute reasonable answers? What kinds of evidence are appropriate to back up educational claims? What kinds of methods can generate such evidence? What standards might one have for judging claims, models, and theories? As will be seen, there are significant differences between mathematics and education with regard to all of these questions.

Alan H. Schoenfeld is Elizabeth and Edward Conner Professor of Education at the University of California, Berkeley. His e-mail address is alans@socrates.berkeley.edu.

## Purposes

Research in mathematics education has two main purposes, one pure and one applied:

• Pure (Basic Science): To understand the nature of mathematical thinking, teaching, and learning;
• Applied (Engineering): To use such understandings to improve mathematics instruction.

These are deeply intertwined, with the first at least as important as the second. The reason is simple: without a deep understanding of thinking,

## JUNE/JULY 2000 NOTICES OF THE AMS 641

fea-schoenfeld.qxp 5/10/00 9:57 AM Page 642

teaching, and learning, no sustained progress on the “applied front” is possible. A useful analogy is the relationship between medical research and practice. There is a wide range of medical research. Some is done urgently, with potential applications in the immediate future. Some is done with the goal of understanding basic physiological mechanisms. Over the long run the two kinds of work live in synergy. This is because basic knowledge is of intrinsic interest and because it establishes and strengthens the foundations upon which applied work is based.

These dual purposes must be understood. They contrast rather strongly with the single purpose of research in mathematics education, as seen from the perspective of many mathematicians:

• “Tell me what works in the classroom.”

Saying this does not imply that mathematicians are not interested at some abstract level in basic research in mathematics education, but that their primary expectation is usefulness in rather direct and practical terms. Of course, the educational community must provide useful results (indeed, usefulness motivates the vast majority of educational work), but it is a mistake to think that direct applications (curriculum development, “proof” that instructional treatments work, etc.) are the primary business of research in mathematics education.

## On Questions

A major issue that needs to be addressed when thinking about what mathematics education can offer is, What kinds of questions can research in [mathematics education answer?]

Simply put, the most typical educational questions asked by mathematicians (“What works?” and “Which approach is better?”) tend to be unanswerable in principle. The reason is that what a person will think works will depend on what that person values. Before one tries to decide whether some instructional approach is successful, one has to address questions such as: Just what do you want to achieve? What understandings, for what students, under what conditions, with what constraints? Consider the following examples.

One question asked with some frequency by faculty and administrators is, “Are large classes as good as small classes?” I hope it is clear that this question cannot be answered in the abstract. How satisfied one is with large classes depends on the consequences one thinks are important. How much does students’ sense of engagement matter? Are students’ feelings about the course and toward the department important? Is there concern about the percentage of students who go on to enroll in subsequent mathematics courses? The conclusions that one might draw regarding the utility of large classes could vary substantially, depending on how much weight these outcomes are given.

Similar issues arise even if one focuses solely on the mathematics being taught. Suppose one wants to address the question, Do students learn as much mathematics in large classes as in small classes? One must immediately ask, “What counts as mathematics? How much weight will be placed (say) on problem solving, on modeling, or on the ability to communicate mathematically?” Judgments concerning the effectiveness of one form of instruction over another will depend on the answers to these questions. To put things bluntly, a researcher has to know what to look for and what to take as evidence of it before being able to determine whether it is there.

The fact that one’s judgments reflect one’s values also applies to questions of the type, Which approach works better (or best)? This may seem obvious, but often it is not. Consider calculus reform. Soon after the Tulane “Lean and Lively” conference, whose proceedings appeared in Douglas [5], the National Science Foundation (NSF) funded a major calculus reform initiative. By the mid-1990s NSF program officers were convinced that calculus reform was a “good thing” and that it should be a model for reform in other content areas. NSF brought together mathematicians who had been involved in reform with researchers in mathematics education and posed the following question: “Can we obtain evidence that calculus reform worked (that is, that reform calculus is better than the traditional calculus)?” What they had in mind, basically, was some form of test. They thought it should be easy to construct a test, [administer it, and show that reform students did] better.

Those who advocated this approach failed to understand that what they proposed would in essence be a comparison of apples and oranges. If one gave a traditional test that leaned heavily on the ability to perform symbolic manipulations, “reform” students would be at a disadvantage because they had not practiced computational skills. If one gave a test that was technology-dependent or that had a heavy modeling component, traditional students would be at a disadvantage because technology and modeling had not been a large part of their curriculum. Either way, giving a test and comparing scores would be unfair. The appropriate way to proceed was to look at the curriculum, identifying important topics and specifying what it means to have a conceptual understanding of them. With this kind of information, individual institutions and departments (and the profession as a whole, if it wished) could then decide which aspects of understanding were most important, which they wanted to assess, and how. As a result of extended discussions, the NSF effort evolved from one that focused on documenting the effects of calculus reform to one that focused on developing a

## 642 NOTICES OF THE AMS VOLUME 47, NUMBER 6

framework for looking at the effects of calculus instruction. The result of these efforts was the 1997 book Student Assessment in Calculus [10].

In sum, many of the questions that would seem natural to ask (questions of the type, What works? or Which method works best?) cannot be answered, for good reason.

Given this, what kinds of questions can research in mathematics education address? I would argue that some of the fundamental contributions from research in mathematics education are the following:

• theoretical perspectives for understanding thinking, learning, and teaching;
• descriptions of aspects of cognition (e.g., thinking mathematically; student understandings and misunderstandings of the concepts of function, limit, etc.);
• existence proofs (evidence of cases in which students can learn problem solving, induction, group theory; evidence of the viability of various kinds of instruction);
• descriptions of (positive and negative) consequences of various forms of instruction.

Michèle Artigue’s recent Notices article [1] describes many of the results of such studies. I will describe some others and comment on the methods for obtaining them in the section “Methods” below.

## On Theories and Models (and Criteria for Good Ones)

When mathematicians use the terms “theory” and “models”, they typically have very specific kinds of things in mind, both regarding the nature of those entities and the kinds of evidence used to make claims regarding them. The terms “theory” and “models” are sometimes used in different ways in the life sciences and social sciences, and their uses may be more akin to those used in education. In this section I shall briefly walk through the examples indicated in Table 1.

| Subject | Mathematics, Physics | Biology | Education, Psychology |
| --- | --- | --- | --- |
| Theory of... | Equations; Gravity | Evolution | Mind |
| Model of... | Heat Flow in a Plate | Predator-Prey Relations | Problem Solving |

Table 1. Theories and models in mathematics/physics, biology, and education/psychology. Reprinted with permission from [11], page 9.

In mathematics, theories are laid out explicitly, as in the theory of equations or the theory of complex variables. Results are obtained analytically: we prove that the objects in question have the properties we claim they have. In classical physics there is a comparable degree of specificity; physicists specify an inverse-square law for gravitational attraction, for example. Models are understood to be approximations, but they are expected to be very precise approximations in deterministic form. Thus, for example, to model heat flow in a laminar plate, we specify the initial boundary conditions and the conditions of heat flow, and we then solve the relevant equations. In short, there is no ambiguity in the process. Descriptions are explicit, and the standard of correctness is mathematical proof. A theory and models derived from it can be used to make predictions, which in turn are taken as empirical substantiation of the correctness of the theory.

Things are far more complex in the biological sciences. Consider the theory of evolution, for example. Biologists are in general agreement with regard to its essential correctness, but the evidence marshaled in favor of evolution is quite unlike the kind of evidence used in mathematics or physics. There is no way to prove that evolution is correct in a mathematical sense; the arguments that support it consist of (to borrow the title of one of Pólya’s books) “patterns of plausible reasoning”, along with the careful consideration of alternative hypotheses. In effect, biologists have said the following: “We have mountains of evidence that are consistent with the theory, broadly construed; there is no clear evidence that falsifies the proposed theory, and no rival hypotheses meet the same criteria.” While predictions of future events are not feasible given the time scale of evolutionary events, the theory does support an alternative form of prediction. Previously unexamined fossil records must conform to the theory, so that the theory can be used to describe properties that fossils, in particular geological strata, should or should not have. The cumulative record is taken as substantiation for the theory.

In short, theory and supporting evidence can differ substantially in the life sciences and in mathematics and physics. The same holds for models, or at least the degree of precision expected of them: nobody expects animal populations modeled by predator-prey equations to conform to those models in the same way that heat flow in a laminar plate is expected to conform to models of heat flow.

Finally, theories and models in the sciences are always subject to revision and refinement. As glorious and wonderful as Newtonian gravitational theory was, it was superseded by Einstein’s theory of relativity. Or consider nuclear theory. Valence theory, based on models of electrons that orbited around nuclei, allowed for amazing predictions, such as the existence of as-yet-undiscovered elements. But physicists no longer talk about electrons in orbit around nuclei; once-solid particles in the theory such as electrons have been replaced in the theory by probabilistic electron clouds. Theories evolve.

Research in mathematics education has many of the attributes of the research in the physical and
life sciences described above. In a “theory of mind”, for example, certain assumptions are made about the nature of mental organization, e.g., that there are certain kinds of mental structures that function in particular ways. One such assumption is that there are various kinds of memory, among them working or “short-term” memory. According to the theory, thinking gets done using working memory: that is, the “objects of thought” that people manipulate mentally are temporarily stored in working memory. What makes things interesting (and scientific) is that the theory also places rather strong limits on working memory: it has been claimed (e.g., in [8]) that people can keep no more than about nine “chunks” of information in working memory at one time.

To see that this claim might actually be true, one could try to multiply 379 by 658 with eyes closed. Most people will find it difficult if not impossible. (In a recent meeting I gave a group of about seventy-five mathematicians this task. None of them succeeded within a few minutes.) The reason is that the number of things a person has to keep track of (the original numbers and the various subtotals that arise in doing the multiplication) exceeds nine. Now, a person is better able to do the task mentally after rehearsing some of the subtotals: e.g., a person can calculate 8 × 379 = 3032 and repeat “3032” mentally until it becomes a chunk and occupies only one space (a “buffer”) in working memory. That leaves enough working space to do other computations. By using this kind of chunking, people can transcend the limits of working memory.¹

¹ People use “chunking” as a mechanism all the time. A trivial example: one can recall 10-digit phone numbers in part by memorizing 3-digit area codes as a unit. More substantially, the theory asserts that chunking is the primary [mechanism for overcoming the limits of working memory. Each of] the words a person reads is a chunk, which was once a collection of letters that had to be sounded out. The same is the case for all sorts of mathematical concepts that a person now “brings to mind” as a unit. Finally, are “lightning calculators” (the people who do extraordinary mental computations rapidly) a counterexample to the claim made here? It does not appear to be the case. Those who have been studied turn out to have memorized a huge number of intermediary results. For example, many people will bring “72” to mind automatically as a chunk when working on a calculation that includes 9 × 8; the “lightning calculators” may do the same for the products of 2- or 3-digit numbers. This reduces the load on working memory.

Now consider the truth status of the assertion that people’s working memory has no more than about nine slots. There will never be an absolute proof of this assertion. First, it is unlikely that the researchers will find the physical location of working memory buffers in the brain even if they exist; the buffers are components of models, and they are not necessarily physical objects. Second, the evidence in favor of this assertion is compelling but cannot be definitive. Many kinds of experiments have been performed in which people are given tasks that call for using more than nine slots in working memory, and people have failed at them (or, after some effort, performed them by doing what could be regarded as some form of chunking).

As with evolution, there are mountains of evidence that are consistent with this assertion, there is no clear evidence to contradict it, and no rival hypothesis meets the same criteria. But is it proven? No, not in the mathematical sense. The relevant standard is, in essence, what a neutral jury would consider to be evidence beyond a reasonable doubt. The same holds for models of, say, problem solving or (my current interest) models of teaching (see [12], [13]). I am currently engaged in trying to construct a theoretical description that explains how and why teachers do what they do, on the fly, in the classroom. This work, elaborated at the same level of detail as a theory of memory, is called a “theory of teaching-in-context”. The claim is that with the theory and with enough time to model a particular teacher, one can build a description of that person’s teaching that characterizes his or her classroom behavior with remarkable precision. When one looks at this work, one cannot expect to find the kind of precision found in modeling heat flow in a laminar plate. But (see, e.g., [12]) it is not unreasonable to expect that such behavior can be modeled with the same degree of fidelity to “real-world” behavior as with predator-prey models.

We pursue the question of standards for judging theories, models, and results in the section after next.

## Methods

In this article I cannot provide even a beginning catalogue of methods of research in undergraduate mathematics education. As an indication of the magnitude of that task, consider the fact that the Handbook of Qualitative Research in Education [6] is nearly 900 pages long! Chapters in that volume include extensive discussions of ethnography (how does one understand the “culture of the classroom”, for example?), discourse analysis (what patterns can be seen in the careful study of conversations?), the role of culture in shaping cognition, and issues of subjectivity and validity. And that is qualitative work alone; there is, of course, a long-standing quantitative tradition of research in the social sciences as well. My goal, rather, is to provide an orientation to the kinds of work that are done and to suggest the kinds of findings (and limitations thereof) that they can produce.

Those who are new to educational research tend to think in terms of standard experimental studies, which involve experimental and control groups
and the use of statistics to determine whether or not the results are significant. As it turns out, the use of statistics in education is a much more complex issue than one might think.

For some years from mid-century onward, research in the social sciences (in the United States, at least) was dominated by the example of agriculture. The basic notion was that if two fields of a particular crop were treated identically except for one variable, then differences in crop yield could be attributed to the difference in that variable. Surely, people believed, one could do the same in education. If one wanted to prove that a new way of teaching X was superior, then one could conduct an experiment in which two groups of students studied X: one group taught the standard way, one taught the new way. If students taught the new way did better, one had evidence of the superiority of the instructional method.

Put aside for the moment the issues raised in the previous section about the goals of instruction and the fact that the old and new instruction might not focus on the same things. Imagine that one could construct a test fair to both old and new instruction. And suppose that students were randomly assigned to experimental and control groups, so that standard experimental procedures were followed. Nonetheless, there would still be serious potential problems. If different teachers taught the two groups of students, any differences in outcome might be attributable to differences in teaching. But even with the same teacher, there can be myriad differences. There might be a difference in energy or commitment: teaching the “same old stuff” is not the same as trying out new ideas. Or students in one group might know they are getting something new and experimental. This alone might result in significant differences. (There is a large literature showing that if people feel that changes are made in their own best interests, they will work harder and do better, no matter what the changes actually are. The effects of these changes fade with time.) Or the students might resent being experimented upon.

Here is a case in point. Some years ago I developed a set of stand-alone instructional materials for calculus. Colleagues at another university agreed to have their students use them. In all but two sections the students who were given the materials did better than students who were not given them. However, in two sections there were essentially no differences in performance. It turns out that most of the faculty had given the materials a favorable introduction, suggesting to the students that they would be helpful. The instructor of the sections that showed no differences had handed them out saying, “They asked me to give these to you. I don’t know if they’re any good.”

In short, the classical experimental method can be problematic in educational research. To mention just two difficulties, double blind experiments in the medical sense (in which neither the doctors nor the patients know who is getting the real treatment and who is getting a placebo treatment) are rarely blind, and many experimental variables are rarely controllable in any rigorous sense. (That was the point of the example in the previous paragraph.) As a result, both positive and negative results can be difficult to interpret. This is not to say that such studies are not useful or that large-scale statistical work is not valuable (it clearly is), but that it must be done with great care and that results and claims must be interpreted with equal care. Statistical work of consistent value tends to be that which

a) produces general findings about a population. For example, Artigue [1] notes that “[m]ore than 40% of students entering French universities consider that if two numbers A and B are closer than 1/N for every positive N, then they are not necessarily equal, just infinitely close.”
b) provides a clear comparison of two or more populations. For example, the results of the Third International Mathematics and Science Study document the baseline performance of students in various nations on a range of mathematical content.
c) provides substantiation, over time, of findings that were first uncovered in more small-scale observational studies.

What one finds for the most part is that research methods in undergraduate mathematics education (in all of education, for that matter) are suggestive of results and that the combined evidence of many studies over time is what lends substantiation to findings.

I shall expand on this point with one extended example drawn from my own work. The issue concerns “metacognitive behavior”, or metacognition: specifically, the effective use of one’s resources (including time) during problem solving.

Here is a motivating example. Many years ago, when one standard first-year calculus topic was techniques of integration, the following exercise was the first problem on a test given to a large lecture class:

$$\int \frac{x}{x^2 - 9}\,dx.$$

The expectation was that the students would make the obvious substitution $u = x^2 - 9$ and solve the problem in short order. About half the class did. However, about a quarter of the class, noticing that the denominator was factorable, tried to solve the problem using the technique of partial fractions. Moreover, about 10 percent of the students, noticing that the denominator was of the form $(x^2 - a^2)$, tried to solve the problem using the substitution $x = 3\sin\theta$. All of these methods yield the correct answer, of course, but the second and third are very time consuming for students. The students who
used those techniques did poorly on the test, largely because they ran out of time.

Examples such as this led me to develop some instructional materials that focused on the strategic choices that one makes while working integration problems. The materials made a difference in student performance. This provided some evidence that strategic choices during problem solving are important.

The issue of strategic choices appeared once again when, as part of my research on problem solving, I examined videotapes of students trying to solve problems. Quite often, it seemed, students would read a problem statement, choose a solution method quickly, and then doggedly pursue that approach even when the approach did not seem to be yielding results. To make such observations rigorous, I developed a “coding scheme” for analyzing videotapes of problem solving. This analytical framework provided a mechanism for identifying times during a problem session when decision making could shape the success or failure of the attempt. The framework was defined in such a way that other researchers could use it, not only for purposes of examining my tapes, but also for examining their own as well. Using it, researchers could see how students’ decision making helped or hindered their attempts at problem solving.

Such frameworks serve multiple purposes. First, having such a scheme allows the characterization of videotapes to become relatively objective: if two trained analysts working on the same tape independently produce the same coding of it, then there is reason to believe in the consistency of the interpretation. Second, having an analytic tool of this type allows one to trace the effects of problem-solving instruction: “before and after” comparisons of videotapes of problem-solving sessions can reveal whether students have become more efficient or effective problem solvers. Third, this kind of tool allows for accumulating data across studies. The one-line summary of results in this case: metacognitive competence is a very productive factor in problem solving.² For extensive detail, see [9].

² In the case at hand (metacognitive behavior), a large number of studies have indicated that effective decision making during problem solving does not “come naturally”. Such skills can be learned, although intensive instruction is necessary. When students learn such skills, their problem-solving performance improves.

As indicated above, research results in education are not “proven” in the sense that they are proven in mathematics. Moreover, it is often difficult to employ straightforward experimental or statistical methods of the type used in the physical sciences because of complexities related to what it means for educational conditions to be “replicable”. In education one finds a wide range of research methods. A look at one of the first volumes on undergraduate mathematics education, namely [14], suggests the range. If anything, the number and type of methods have increased, as evidenced in the three volumes of Research in Collegiate Mathematics Education. One finds, for example, reports of detailed interviews with students, comparisons of reform and traditional calculus, an examination of calculus “workshops”, and an extended study of one student’s developing understanding of a physical device and graphs related to it. Studies employing anthropological observation techniques and other qualitative methods are increasingly common.

How valid are such studies, and how much can we depend on the results in them? That issue is pursued immediately below.

## Standards for Judging Theories, Models, and Results

There is a wide range of results and methods in mathematics education. A major question then is the following: How much faith should one have in any particular result? What constitutes solid reason, what constitutes “proof beyond a reasonable doubt”?

The following list puts forth a set of criteria that can be used for evaluating models and theories (and more generally any empirical or theoretical work) in mathematics education:

• Descriptive power
• Explanatory power
• Scope
• Predictive power
• Rigor and specificity
• Falsifiability
• Replicability
• Multiple sources of evidence (“triangulation”)

I shall briefly describe each.

### Descriptive Power

By descriptive power I mean the capacity of a theory to capture “what counts” in ways that seem faithful to the phenomena being described. As Gaea Leinhardt [7] has pointed out, the phrase “consider a spherical cow” might be appropriate when physicists are considering the cow in terms of its gravitational mass, but not if one is exploring some of the cow’s physiological properties! Theories of mind, problem solving, or teaching should include relevant and important aspects of thinking, problem solving, and teaching respectively. At a very broad level, fair questions to ask are: Is anything missing? Do the elements of the theory correspond to things that seem reasonable? For example, say a problem-solving session, an interview, or a classroom lesson was videotaped. Would a person who read the analysis and saw the videotape reasonably be surprised by things that were missing from the analysis?

### Explanatory Power

By explanatory power I mean providing explanations of how and why things work. It is one thing to say that people will or will not be able to do
certain kinds of tasks or even to describe what they do on a blow-by-blow basis; it is quite another thing to explain why. It is one thing, for example, to say that people will have difficulty multiplying two three-digit numbers in their heads. But that does not provide information about how and why the difficulties occur. The full theoretical description of working memory, which was mentioned above, comes with a description of memory buffers, a detailed explanation of the mechanism of chunking, and the careful delineation of how the components of memory interact with each other. The explanation works at a level of mechanism: it says in reasonably precise terms what the objects in the theory are, how they are related, and why some things will be possible and some not.

Scope

By scope I mean the range of phenomena covered by the theory. A theory of equations is not very impressive if it deals only with linear equations. Likewise, a theory of teaching is not very impressive if it covers only straight lectures!

Predictive Power

The role of prediction is obvious: one test of any theory is whether it can specify some results in advance of their taking place. Again, it is good to keep things like the theory of evolution in mind as a model. Predictions in education and psychology are not often of the type made in physics.

Sometimes it is possible to make precise predictions. For example, Brown and Burton [4] studied the kinds of incorrect understandings that students develop when learning the standard U.S. algorithm for base 10 subtraction. They hypothesized very specific mental constructions on the part of students, the idea being that students did not simply fail to master the standard algorithm, but rather that students often developed one of a large class of incorrect variants of the algorithm and applied it consistently. Brown and Burton developed a simple diagnostic test with the property that a student’s pattern of incorrect answers suggested the false algorithm he or she might be using. About half of the time they were then able to predict the incorrect answer that the student would obtain to a new problem before the student worked the problem!

Such fine-grained and consistent predictions on the basis of something as simple as a diagnostic test are extremely rare, of course. For example, no theory of teaching can predict precisely what a teacher will do in various circumstances; human behavior is just not that predictable. However, a theory of teaching can work in ways analogous to the theory of evolution. It can suggest constraints and even suggest likely events.

[Making predictions is a very powerful tool in theory refinement. When something is claimed to be impossible and it happens, or when a theory makes repeated claims that something is very likely and it does not occur, then the theory has problems! Thus, engaging in such predictions is an important methodological tool, even when it is understood that precise prediction is impossible.]

Rigor and Specificity

Constructing a theory or a model involves the specification of a set of objects and relationships among them. This set of abstract objects and relationships supposedly corresponds to some set of objects and relationships in the “real world”. The relevant questions are:

How well defined are the terms? Would you know one if you saw one? In real life? In the model? How well defined are the relationships among them? And how well do the objects and relations in the model correspond to the things they are supposed to represent?

As noted above, one cannot necessarily expect the same kinds of correspondences between parts of the model and real-world objects as in the case of simple physical models. Mental and social constructs such as memory buffers and the “didactical contract” (the idea that teachers and students enter a classroom with implicit understandings regarding the norms for their interactions and that these understandings shape the ways they act) are not inspectable or measurable in the ways that heat flow in a laminar plate is. But we can ask for detail, both in what the objects are and in how they fit together. Are the relationships and changes among them carefully defined, or does “magic happen” somewhere along the way? Here is a rough analogy. For much of the eighteenth century the phlogiston theory of combustion, which posited that in all flammable materials there is a colorless, odorless, weightless, tasteless substance called “phlogiston” liberated during combustion, was widely accepted. (Lavoisier’s work on combustion ultimately refuted the theory.) With a little hand waving, the phlogiston theory explained a reasonable range of phenomena. One might have continued using it, just as theorists might have continued building epicycles upon epicycles in a theory of circular orbits (see footnote 3). The theory might have continued to produce some useful results, good enough “for all practical purposes”. That may be fine for practice, but it is problematic with regard to theory. Just as in the physical sciences, researchers in education have an intellectual obligation to push for greater clarity and specificity and to look for limiting cases or counterexamples to see where the theoretical ideas break down.

Footnote 3: This example points to another important criterion, simplicity. When a theory requires multiple “fixes” such as epicycles upon epicycles, that is a symptom that something is not right.

Here are two quick examples. First, in my research group’s model of the teaching process, we represent aspects of the teacher’s knowledge, goals, beliefs, and decision making. Skeptics (including us) should ask, how clear is the representation? Once terms are defined in the model (i.e., once we specify a teacher’s knowledge, goals, and beliefs) is there
hand waving when we say what the teacher might do? Is the model well enough defined so that others could run it and make the same predictions? Second, the “APOS theory” expounded in [2] uses terms such as Action, Process, Object, and Schema. Would you know one if you met one? Are they well defined in the model? Are the ways in which they interact or become transformed well specified? In both cases the bottom-line issues are: What are the odds that this too is a phlogiston-like theory? Are the people employing the theory constantly testing it in order to find out? Similar questions should be asked about all of the terms used in educational research, e.g., the “didactical contract”, “metacognition”, “concept image”, and “epistemological obstacles”.

Falsifiability

The need for falsifiability (for making nontautological claims or predictions whose accuracy can be tested empirically) should be clear at this point. It is a concomitant of the discussion in the previous two subsections. A field makes progress (and guards against tautologies) by putting its ideas on the line.

Replicability

The issue of replicability is also intimately tied to that of rigor and specificity. There are two related sets of issues: (1) Will the same thing happen if the circumstances are repeated? (2) Will others, once appropriately trained, see the same things in the data? In both cases answering these questions depends on having well-defined procedures and constructs.

The phrasing of (1) is deliberately vague, because it is supposed to cover a wide range of cases. In the case of short-term memory, the claim is that people will run into difficulty if memory tasks require the use of more than nine short-term memory buffers. In the case of sociological analyses of the classroom, the claim is that once the didactical contract is understood, the actions of the students and teacher will be seen to conform to that (usually tacit) understanding. In the case of beliefs, the claim is that students who hold certain beliefs will act in certain ways while doing mathematics. In the case of epistemological obstacles or APOS theory, the claims are similarly made that students who have (or have not) made particular mental constructions will (or will not) be able to do certain things.

In all of these cases the usefulness of the findings, the accuracy of the claims, and the ability to falsify or replicate depend on the specificity with which terms are defined. Consider this case in point from the classical education literature. Ausubel’s theory of “advance organizers” in [3] postulates that if students are given an introduction to materials they are to read that orients them to what is to follow, their reading comprehension will improve significantly. After a decade or two and many, many studies, the literature on the topic was inconclusive: about half of the studies showed that advance organizers made a difference, half not. A closer look revealed the reason: the very term was ill defined. Various experimenters made up their own advance organizers based on what they thought they should be, and there was huge variation. No wonder the findings were inconclusive! (One standard technique for dealing with issues of well-definedness, and one which addresses issue (2) above, is to have independent researchers go through the same body of data and then compare their results. There are standard norms in the field for “inter-rater reliability”; these norms quantify the degree to which independent analysts are seeing the same things in the data.)

Multiple Sources of Evidence (“Triangulation”)

Here we find one of the major differences between mathematics and the social sciences. In mathematics one compelling line of argument (a proof) is enough: validity is established. In education and the social sciences we are generally in the business of looking for compelling evidence. The fact is, evidence can be misleading: what we think is general may in fact be an artifact or a function of circumstances rather than a general phenomenon.

Here is one example. Some years ago I made a series of videotapes of college students working on the problem, How many cells are there in an average-size human adult body? Their behavior was striking. A number of students made wild guesses about the order of magnitude of the dimensions of a cell, from “let’s say a cell is an angstrom unit on a side” to “say a cell is a cube that’s 1/100 of an inch wide.” Then, having dispensed with cell size in seconds, they spent a very long time on body size, often breaking the body into a collection of cylinders, cones, and spheres and computing the volume of each with some care. This was very odd.

Some time later I started videotaping students working problems in pairs rather than by themselves. I never again saw the kind of behavior described above. It turns out that when they were working alone, the students felt under tremendous pressure. They knew that a mathematics professor would be looking over their work. Under the circumstances they felt they needed to do something mathematical, and volume computations at least made it look as if they were doing mathematics! When students worked in pairs, they started off by saying something like “This sure is a weird problem.” That was enough to dissipate some of the pressure, the result being that there was no need for them to engage in volume computations to relieve it. In short, some very consistent behavior was actually a function of circumstances rather than being inherent in the problem or the students.

One way to check for artifactual behavior is to vary the circumstances: to ask, do you see the same thing at different times in different places? Another
is to seek as many sources of information as possible about the phenomenon in question and to see whether they portray a consistent message. In my research group’s work on modeling teaching, for example, we draw inferences about the teacher’s behavior from videotapes of the teacher in action, but we also conduct interviews with the teacher, review his or her lesson plans and class notes, and discuss our tentative findings with the teacher. In this way we look for convergence of the data. The more independent sources of confirmation there are, the more robust a finding is likely to be.

Conclusion

The main point of this article has been that research in (undergraduate) mathematics education is a very different enterprise from research in mathematics and that an understanding of the differences is essential if one is to appreciate (or better yet, contribute to) work in the field. Findings are rarely definitive; they are usually suggestive. Evidence is not on the order of proof, but is cumulative, moving towards conclusions that can be considered to be beyond a reasonable doubt. A scientific approach is possible, but one must take care not to be scientistic: what counts are not the trappings of science, such as the experimental method, but the use of careful reasoning and standards of evidence, employing a wide variety of methods appropriate for the tasks at hand.

It is worth remembering how young mathematics education is as a field. Mathematicians are used to measuring mathematical lineage in centuries, if not millennia; in contrast, the lineage of research in mathematics education (especially undergraduate mathematics education) is measured in decades. The journal Educational Studies in Mathematics dates to the 1960s. The first issue of Volume 1 of the Journal for Research in Mathematics Education was published in January 1970. The series of volumes Research in Collegiate Mathematics Education, the first set of volumes devoted solely to mathematics education at the college level, began to appear in 1994. It is no accident that the vast majority of articles cited by Artigue [1] in her 1999 review of research findings were written in the 1990s; there was little at the undergraduate level before then! There has been an extraordinary amount of progress in recent years, but the field is still very young, and there is a very long way to go.

Because of the nature of the field, it is appropriate to adjust one’s stance toward the work and its utility. Mathematicians approaching this work should be open to a wide variety of ideas, understanding that the methods and perspectives to which they are accustomed do not apply to educational research in straightforward ways. They should not look for definitive answers but for ideas they can use. At the same time, all consumers and practitioners of research in (undergraduate) mathematics education should be healthy skeptics. In particular, because there are no definitive answers, one should certainly be wary of anyone who offers them. More generally, the main goal for the decades to come is to continue building a corpus of theory and methods that will allow research in mathematics education to become an ever more robust basic and applied field.

References

[1] M. ARTIGUE, The teaching and learning of mathematics at the university level: Crucial questions for contemporary research in education, Notices Amer. Math. Soc. 46 (1999), 1377–1385.
[2] M. ASIALA, A. BROWN, D. DE VRIES, E. DUBINSKY, D. MATHEWS, and K. THOMAS, A framework for research and curriculum development in undergraduate mathematics education, Research in Collegiate Mathematics Education (J. Kaput, A. Schoenfeld, and E. Dubinsky, eds.), vol. II, Conference Board of the Mathematical Sciences, Washington, DC, pp. 1–32.
[3] D. P. AUSUBEL, Educational Psychology: A Cognitive View, Holt, Rinehart and Winston, New York, 1968.
[4] J. S. BROWN and R. R. BURTON, Diagnostic models for procedural bugs in basic mathematical skills, Cognitive Science 2 (1978), 155–192.
[5] R. G. DOUGLAS (ed.), Toward a Lean and Lively Calculus, MAA Notes Number 6, Mathematical Association of America, Washington, DC, 1986.
[6] M. LECOMPTE, W. MILLROY, and J. PREISSLE, Handbook of Qualitative Research in Education, Academic Press, New York, 1992.
[7] G. LEINHARDT, On the messiness of overlapping goals in real settings, Issues in Education 4 (1998), 125–132.
[8] G. MILLER, The magic number seven, plus or minus two: Some limits on our capacity for processing information, Psychological Review 63 (1956), 81–97.
[9] A. H. SCHOENFELD, Mathematical Problem Solving, Academic Press, Orlando, FL, 1985.
[10] A. H. SCHOENFELD (ed.), Student Assessment in Calculus, MAA Notes Number 43, Mathematical Association of America, Washington, DC, 1997.
[11] A. H. SCHOENFELD, On theory and models: The case of teaching-in-context, Proceedings of the XX Annual Meeting of the International Group for Psychology and Mathematics Education (Sarah B. Berenson, ed.), Psychology and Mathematics Education, Raleigh, NC, 1998.
[12] A. H. SCHOENFELD, Toward a theory of teaching-in-context, Issues in Education 4 (1998), 1–94.
[13] A. H. SCHOENFELD, Models of the teaching process, Journal of Mathematical Behavior (in press).
[14] D. TALL (ed.), Advanced Mathematical Thinking, Kluwer, Dordrecht, 1991.
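
As a rough illustration of the inter-rater reliability check described under Replicability, the sketch below computes Cohen's kappa, one widely used agreement statistic, for two analysts who independently coded the same episodes. The article does not name a particular statistic; the choice of kappa, the function name `cohens_kappa`, and the episode codes are all assumptions made purely for illustration.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two raters who coded the same sequence of items.

    Hypothetical helper for illustration; not taken from the article.
    """
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(codes_a)

    # Observed agreement: fraction of items that received the same code.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n

    # Chance agreement: for each code, multiply the two raters' marginal
    # proportions, then sum over codes.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both raters used one and the same code for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up codes assigned by two analysts to ten videotaped episodes.
rater_1 = ["plan", "explore", "explore", "verify", "plan",
           "explore", "verify", "plan", "explore", "plan"]
rater_2 = ["plan", "explore", "plan", "verify", "plan",
           "explore", "verify", "explore", "explore", "plan"]

kappa = cohens_kappa(rater_1, rater_2)
print(f"observed agreement corrected for chance: kappa = {kappa:.4f}")
```

Kappa discounts the agreement two analysts would reach by chance given how often each uses each code, so a value near 1 suggests the coding scheme is defined well enough for independent readers to see the same things, while a value near 0 suggests the terms may be as ill defined as the “advance organizers” were.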