
Journal of Experimental Child Psychology 111 (2012) 246–267

Contents lists available at SciVerse ScienceDirect

Journal of Experimental Child Psychology
journal homepage: www.elsevier.com/locate/jecp

Cognitive processes of numerical estimation in children


Mark H. Ashcraft ⇑, Alex M. Moore
Department of Psychology, University of Nevada, Las Vegas, NV 89154, USA

Article info

Article history:
Received 22 September 2010
Revised 30 July 2011
Available online 19 September 2011

Keywords:
Estimation
Number representations
Magnitude estimates
Number lines
Midpoint strategy
Log-to-lin shift
Math achievement

Abstract

We tested children in Grades 1 to 5, as well as college students, on a number line estimation task
and examined latencies and errors to explore the cognitive processes involved in estimation. The
developmental trends in estimation were more consistent with the hypothesized shift from logarithmic
to linear representation than with an account based on a proportional judgment application of a power
function model; increased linear responding across ages, as predicted by the log-to-lin shift position,
yielded reasonable developmental patterns, whereas values derived from the cyclical power model were
difficult to reconcile with expected developmental patterns. Neither theoretical position predicted the
marked “M-shaped” pattern that was observed, beginning in third graders’ errors and fourth graders’
latencies. This pattern suggests that estimation comes to rely on a midpoint strategy based on
children’s growing number knowledge (i.e., knowledge that 50 is half of 100). As found elsewhere,
strength of linear responding correlated significantly with children’s performance on standardized
math tests.
© 2011 Elsevier Inc. All rights reserved.

⇑ Corresponding author. Fax: +1 702 895 0195.
E-mail address: mark.ashcraft@unlv.edu (M.H. Ashcraft).

0022-0965/$ - see front matter © 2011 Elsevier Inc. All rights reserved.
doi:10.1016/j.jecp.2011.08.005

Introduction

How do children make magnitude or numerical estimates? Considerable research has been devoted
to this topic recently, revealing fascinating results about children’s growing understanding of number
relations. Very young children come to appreciate the relationship between increases in number and
magnitude, as demonstrated, for example, by their growing ability to generalize count words to new
contexts (Huang, Spelke, & Snedeker, 2010). But according to a widely held interpretation (e.g., Booth
& Siegler, 2008; Dehaene, 1997; Siegler & Opfer, 2003), they often fail to appreciate an integral aspect
of the number system: that the whole numbers are evenly spaced along the number line. Instead, it
seems that in young children’s mental representation of magnitude, small values are far more
different from each other than larger values; for example, 1 and 2 are more different than 8 and 9.
The evidence for such conclusions consists of children’s numerical estimation performance, often
how they make estimates on a number line (e.g., Siegler & Opfer, 2003; see also Booth & Siegler,
2006). Typically, children are shown a horizontal number line, labeled 0 on the left and 100 (or
1000) on the right. In the number-to-position (NP) version of the task, they are given a number, say
20, and asked to show where that value is located on the line. Kindergartners usually choose a location
by drawing a hatch mark closer to where 50 belongs. More generally, they generate estimates across
the line that are best fit by a logarithmic function (see Fig. 1 in Booth & Siegler, 2006). In the alternate
version of the task that we focus on here, called position-to-number (PN), the hatch mark is already
placed on the line and children give an estimate of its numerical value. To continue with the example,
if the hatch mark is at position 50, a kindergartner may estimate its value to be 20. More generally,
these children give estimates that are best fit by an exponential function (essentially a mirror image
of the log function in the NP task). These patterns led Siegler and colleagues to conclude that the
underlying mental representation of magnitude is an unevenly spaced logarithmic or “compressed”
mental number line (e.g., Dehaene, 1997) in which smaller values are spaced farther apart than they
should be and larger values are compressed closer together. Such a representation is similar in many
respects to those found in children with math disabilities and even some nonhumans (e.g., Dehaene,
Dehaene-Lambertz, & Cohen, 1998; Geary, Hoard, Byrd-Craven, Nugent, & Numtee, 2007; Starkey &
Cooper, 1980).
Somewhat older children, however, show a substantial developmental change in their perfor-
mance. For the 0–100 range, for example, second and third graders generally give estimates of mag-
nitude that are more linear, showing essentially a one-to-one relationship between a value being
judged and their estimate of its value (Siegler & Booth, 2004). The interpretation is that their under-
lying mental representation of these numbers has now become linear as well (e.g., Booth & Siegler,
2006); that is, it now reflects the equal spacing principle of the mature number system. Interestingly,
children at this age also show evidence of holding both kinds of representations simultaneously,
depending on the number line being judged. That is, although the majority of Siegler and Booth’s
(2004; see also Siegler & Opfer, 2003) second graders responded linearly when estimating values on
a 0–100 number line, they continued to respond logarithmically on the 0–1000 number line. Appar-
ently, their familiarity with numbers to 100 provided them with the requisite knowledge to adopt the
linear mental representation of equal intervals in that range, whereas their unfamiliarity with larger
values left them reliant on the more primitive log representation of number. Within another year or
so, however, linear responding in the 0–1000 range became much more common, and by adulthood
the linear model provided the best fit for 100% of the tested adults (Siegler & Opfer, 2003). The role
of educational effects seems particularly strong in the shift to the linear representation of number.
For example, using a number line task with 1 to 10 dots, Dehaene, Izard, Spelke, and Pica (2008) tested
children and adults from a minimally schooled native tribe in the Amazon and found overwhelming
evidence for logarithmic representations in both groups versus 100% linear responding in educated
Westerners.
Part of our interest in the number line estimation task is the clear developmental pattern it has re-
vealed. As shown in Siegler and Opfer (2003), linear fits to children’s estimates on the 0–1000 number
lines grow across age, reflecting children’s growing understanding of numerosity and the number sys-
tem and in particular the principle of equal intervals between integer values. As such, the number line
estimation task seemed important to include in our battery of math tasks designed to assess growth in
children’s math skills. A second reason for our interest is the strong correlation found between perfor-
mance in number estimates and math achievement; the greater the linear fit of a child’s numerical
estimates, the higher the child’s math score on standardized testing (Booth & Siegler, 2006; Siegler
& Booth, 2004). This relationship suggests that estimates about numerosity tap into children’s growing
numerical sophistication and reasoning.
A different explanation of children’s estimation performance, the proportion judgment approach,
was recently offered by Barth and Paladino (2011). This account challenges the fits of existing data
to logarithmic or linear patterns and, therefore, challenges the inference that there is a shift from log
to linear representation of magnitude. This new account is an extension of work using a power
model for perceptual proportion judgments (Spence, 1990; see Hollands & Dyre, 2000, and Hollands,
Tanaka, & Dyre, 2002, for the original explanations of this account). Barth and Paladino (2011)
suggested that whenever an estimation task provides numbers marking both endpoints of the line
as well as a value to be judged, the participant is of necessity making an estimate of a proportion.
That is, if given the value 30 to place on a 0–100 line, the child must estimate the magnitude of 30,
estimate the magnitude of 100, and then—critically—estimate what proportion 30 is of the total 100.
Given this insight, they argued that a more appropriate way to examine performance is in terms of
the literature on proportion judgments, that is, Spence’s (1990) and Hollands and Dyre’s (2000)
adaptations of Stevens’ (1957) power law for psychophysical judgments along a continuum. Thus,
the Barth and Paladino (2011) stance is that the attempt to infer underlying mental representation
based on log or linear patterns is misguided given that it ignores the proportion judgment aspect of
performance inherent in the task. Taking this aspect into account, the authors argued that a simpler
pattern of responding is present in the data, predicted by the cyclical power model of proportion
judgments (Hollands & Dyre, 2000), and that this model provides a better explanation of the data.
The base prediction of the cyclical power model (Hollands & Dyre, 2000; Hollands et al., 2002) is
that responses will take the form of an S-shaped function around the y = x line, with the degree or
amplitude of the deviation from this line, the bias in estimates, indexed by β, the exponent of the
function (Stevens exponent). In an estimation task, the function is S-shaped when β is greater than 1.0
and is reverse S-shaped when β is less than 1.0; as responses approach linearity, β approaches 1.0. In
the simple case with β < 1.0, the function predicts overestimates of magnitude below the midpoint
(proportions < .5), underestimates beyond the midpoint (proportions > .5), and little or no bias in
estimates (y = x) at the origin, midpoint, and endpoint; this over-then-underestimate pattern is
commonly observed in many tasks (see Hollands et al., 2002). The pattern reverses to
under-then-overestimates when β > 1.0. In the simple case, there is only one under-then-over cycle
across the entire line, so the responses are fit by the one-cycle power model. When an additional
reference point is used during estimation, typically the midpoint, a two-cycle power function often fits
the data (Hollands & Dyre, 2000), where the cyclic function completes its under-then-overestimate
S-pattern up to the midpoint and then repeats again from the midpoint up to the endpoint of the line.
Such one- and two-cycle functions have been used to fit adult data in a variety of perceptual proportion
tasks (see Hollands & Dyre, 2000).
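
The shape predictions in this paragraph can be made concrete with a small numerical sketch. It assumes the commonly cited one-cycle form associated with Spence (1990) and Hollands and Dyre (2000), y = x^beta / (x^beta + (1 - x)^beta) for a true proportion x, and a two-cycle variant that rescales the same curve within each half of the line around an assumed reference point of .5. The function names and the rescaling convention are illustrative assumptions rather than the authors' code, and the distinction between the estimation and production equations in Hollands et al. (2002) is not modeled here.

```python
def one_cycle(x: float, beta: float) -> float:
    """One-cycle power model (assumed form) for a true proportion x in [0, 1]."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be a proportion between 0 and 1")
    num = x ** beta
    return num / (num + (1.0 - x) ** beta)


def two_cycle(x: float, beta: float) -> float:
    """Two-cycle variant: the one-cycle curve repeated in each half of the line.

    The midpoint reference of .5 is an assumption made for illustration.
    """
    if x < 0.5:
        return 0.5 * one_cycle(2.0 * x, beta)
    return 0.5 + 0.5 * one_cycle(2.0 * x - 1.0, beta)


if __name__ == "__main__":
    # Print predicted responses at a few proportions for three exponents.
    for beta in (0.5, 1.0, 2.0):
        row = [round(one_cycle(p, beta), 3) for p in (0.1, 0.25, 0.5, 0.75, 0.9)]
        print(f"beta={beta}: {row}")
```

Under these assumptions, an exponent below 1.0 yields values above the diagonal for proportions under .5 and below it for proportions over .5, an exponent of exactly 1.0 reproduces y = x, and the two-cycle version is unbiased at the quarter points as well as at the midpoint.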
Barth and Paladino (2011) extended this proportion judgment approach to children’s performance
using the NP task with 7-year-olds. They indicated that most of their 21 participants were better fit by
either the one- or two-cycle power model (11 and 8 children, respectively) and that these fits were
better than the linear fits proposed by Siegler and his associates (although no statistical evidence
was provided); the two-cycle model fit their aggregate data. Their explanation for Siegler’s repeated
observation of logarithmic responding among younger children on the NP task was that these children
have an indistinct or inaccurate representation in mind for the endpoint of 100 (or 1000) and, thus,
functionally treat that value as some other quantity when making their proportion judgments. They
argued that seemingly logarithmic curves are in fact cyclical power curves once the functional end-
point of the scale has been adjusted to take into account children’s inaccurate representation of that
value. Because they found children’s responses to be fit better by the cyclic power model, Barth and
Paladino rejected the notion that there is a log-to-lin shift in children’s mental representation of
magnitude. Instead, they suggested that a single representation, “a power function representation of
number” (p. 133), can generate the observed patterns and that the proportion judgment account can
explain the data more parsimoniously with fewer parameters (one parameter, β, vs. two parameters
in computing R² fits, the slope, and the intercept).
A concern with Barth and Paladino’s (2011) method and results, however, needs to be addressed
here. During their instruction phase, Barth and Paladino asked their 7-year-old participants to mark
the 0–100 number line to show where 50 should go and then showed them a line marked in the
middle. The authors then asked the children whether they knew why 50 went there and gave an
explanation that “because 50 is half of 100, it goes right in the middle between 0 and 100. So, 50 goes
right there, but it’s the only number that goes there” (p. 128). In our view, such instructions, even
if seemingly brief, may have pronounced effects on performance; for example, Siegler and Ramani
(2009) found dramatic shifts in performance after minimal teaching about equally spaced intervals
(see also Izard & Dehaene, 2008). Our instructions avoided such examples and instead attempted to
assess children’s estimation processes unaffected by possible biases that might have been introduced
by such targeted instructions.
At a more general level, notice that neither of these two explanations discusses the possible mental
processes involved in children’s numerical estimates. Siegler’s approach is very clear in discussing
children’s mental representations of magnitude and how those representations might change across
age and education; in our view, the Barth and Paladino (2011) approach is rather opaque in this re-
gard, referring to a power function representation and children’s possibly inaccurate knowledge of
the endpoint. But both positions are relatively silent with respect to the underlying mental processes
involved in numerical estimation and the general processes of accessing and using the underlying
mental representations of magnitude (but see the discussion of a response strategy based on
subjective landmarks in Siegler & Opfer, 2003). Theorists speak of the “mental ruler” or “mental number
line” (e.g., Dehaene, 1997; Restle, 1970), and even its left-to-right orientation in physical space
(Dehaene, Bossini, & Giraux, 1993), but they do not say how mental processes might operate on such
representations. The ruler metaphor implies that the mental processes might resemble the act of reading
off marked entries from a mental ruler, but existing evidence has done little to reveal what these
processes might be.
In an attempt to gain insight about these underlying cognitive processes, we measured not only
children’s responses and errors in the number line estimation task but also their latencies as they
made their estimates; the general point, of course, is that mental processes may be revealed by exam-
ining their time characteristics (e.g., Posner, 1978). The combination of latencies and errors has been
used to advantage in several recent magnitude estimation studies with adults (Durette, 2009; Durette,
Rudig, & Ashcraft, 2011) and so was adopted here as well. In brief, we used the PN version of the mag-
nitude estimation task. We showed children number lines with 0 on the left and either 100 or 1000 on
the right and a vertical hatch mark placed somewhere on the line. Children needed to say what num-
ber corresponded to the hatch mark, and their oral responses were timed. If a mental analog to a phys-
ical number line is accessed during processing, an increasing pattern of latencies and errors across line
positions might suggest some sequential access process across the mental number line; a flat pattern
of latencies, on the other hand, might imply a direct access process (e.g., Sternberg, 1966). More
nuanced patterns, in either latencies or errors, could also indicate more tailored or specialized estima-
tion strategies.

Method

Participants

A total of 124 students from Grades 1 to 5 in a public elementary school participated in this study.
The school was located in a middle- to upper middle-class area of Las Vegas, Nevada, where 19.1% of
the students were eligible for reduced price or free lunches. In the sample, 54.5% of the students were
Caucasian, 19.8% were Hispanic, 15.7% were Asian/Pacific Islander, and 9.4% were African American.
For comparison purposes, we also tested a sample of 20 college students using the same battery of
tests and procedures. We present the results with adults to illustrate the developmental endpoint
of the processes under consideration. Because including adult data in the analyses might be expected
to render all grade effects significant regardless of differences among the grade school samples, we
present statistical analyses both with and without the adult sample throughout. Table 1 provides a
brief description of the sample of participants by age and gender. Except for the college sample, all
participants were tested during the spring of 2008.
The battery of tasks was designed to test a variety of cognitive processes related to math perfor-
mance; we present the results on the number line estimation task in this article. A classroom session
was conducted prior to individual testing to collect demographic information and familiarize the
children with the number line estimation task (using 0–10 number lines). In individual testing,
participants completed the number line estimation tasks on ordinary laptop computers, presented via
E-prime software (Schneider, Eschman, & Zuccolotto, 2002), as well as the remainder of the battery of
math tasks. Total testing time in individual sessions was limited to approximately 45 min, with
roughly 25 min for line estimations.

Stimuli

For all estimation stimuli, a number line was presented on the computer monitor with the appro-
priate endpoints shown just below the line, 0 at the left and 100 or 1000 at the right. We used the PN
version of the task, where a hatch mark is shown on the line and children are asked to state the num-
ber that corresponds to that value. Each mark corresponded to an exact value on the number line;
these values were used to compute participants’ errors. First and second graders completed 0–100
number lines, whereas third, fourth, and fifth graders completed 0–100 and 0–1000 number lines.
A total of 26 values were used for the hatch mark positions on the 0–100 number line trials (1 per
trial); the values were 3, 4, 6, 8, 12, 14, 17, 18, 21, 24, 25, 29, 33, 39, 42, 48, 52, 57, 61, 64, 72, 79, 81, 84,
90, and 96. For the 0–1000 lines, the values were 31, 44, 62, 89, 123, 143, 176, 182, 215, 243, 253, 297,
333, 395, 421, 489, 526, 577, 610, 644, 724, 791, 814, 847, 901, and 966. As was the case in Siegler and
Opfer (2003), we oversampled values in the lower half of the number lines to maximize the chances of
detecting responding based on a logarithmic number line representation. (Note that equal sampling of
values below and above the midpoint did not alter the results in any meaningful way in our other
studies with adults; see Durette et al., 2011.) Children in Grades 3 to 5 were shown the 0–100 line tri-
als prior to the 0–1000 line trials, thereby completing the simpler task first. The order of trials within a
set was random for each participant.
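
As a quick check of the oversampling described in this section, the sketch below counts how many of the listed hatch mark values fall below the midpoint of each line. The values are copied from the text; the variable and function names are illustrative only.

```python
# Hatch mark values as listed in the Stimuli section.
VALUES_100 = [3, 4, 6, 8, 12, 14, 17, 18, 21, 24, 25, 29, 33, 39, 42, 48,
              52, 57, 61, 64, 72, 79, 81, 84, 90, 96]
VALUES_1000 = [31, 44, 62, 89, 123, 143, 176, 182, 215, 243, 253, 297, 333,
               395, 421, 489, 526, 577, 610, 644, 724, 791, 814, 847, 901, 966]


def lower_half_count(values, upper_bound):
    """Number of values that lie below the midpoint of a 0-to-upper_bound line."""
    midpoint = upper_bound / 2
    return sum(1 for v in values if v < midpoint)


if __name__ == "__main__":
    for values, bound in ((VALUES_100, 100), (VALUES_1000, 1000)):
        below = lower_half_count(values, bound)
        print(f"0-{bound}: {below} of {len(values)} values below the midpoint")
```

For both lines, 16 of the 26 values fall in the lower half and 10 in the upper half.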

Procedure

Children were tested individually by the experimenter with an assistant present. Participants were
given the following task instructions: “Now I’m going to show you some number lines. Have you seen
number lines in math class? Okay, the number lines here will start at 0 and end at 100 [or 1000]. On
each number line, you will see a mark somewhere on that line. What I want you to do is tell me what
number you think that mark is. I want you to say the number into the microphone as quickly as you
can, but only after you are completely sure what number you want to say. Do you have any
questions?” The participants were provided with two practice trials before beginning the experimental
trials, using marks at 3 and 61 (0–100) and 31 and 610 (0–1000). The only feedback provided was a
reminder of the upper bound of the number line in case children estimated beyond that number,
and this feedback was provided only during the practice trials. Instructions placed equal emphasis
on speed and accuracy. Importantly, there was no mention made during instructions or the practice
trials of any particular value or its position on the number line, even to provide an example. Thus,
unlike the procedures used in Barth and Paladino’s (2011) study, children were not shown or taught
where 50 went on the 0–100 number line nor was any mention made of “the midpoint” (and likewise
for the 0–1000 line practice trials).
Participants spoke their estimates into a microphone, which operated as a voice key to the software
to stop the trial and record the trial latency (in ms). We cautioned participants against making
any extraneous vocalizations that would trigger the voice key. After an answer was provided, the
experimenter entered the participant’s response on the computer keyboard and this value was
recorded by the software for later analysis. Prior to conducting the analyses on latencies, reaction times
were screened for possible outliers using Dixon’s (1953) test and were removed from the analyses if
they were identified as extreme scores (at the .05 level). Outliers did not cluster around particular
values or areas along the number lines but were simply more numerous for third graders on the 0–1000
lines and somewhat more numerous for college students, whose latencies were much faster overall
(hence, discrepant values were more likely to be detected as outliers). The percentages of latencies
removed on the 0–100 task were 2.5%, 1.1%, 2.7%, 1.7%, 2.2%, and 5.0% for Grades 1 to 5 and college,
respectively. For the 0–1000 task, the percentages were 9.2%, 1.8%, 2.5%, and 4.8% for Grades 3 to 5
and college, respectively.

Table 1
Sample size, mean age, and gender distribution by grade.

Grade          n    Mean age (years)   Males   Females
First grade    23   6.75               9       14
Second grade   26   7.54               14      12
Third grade    20   8.25               8       12
Fourth grade   32   9.71               15      17
Fifth grade    23   10.71              14      9
College        20   23.00              8       12

Results

We begin by examining children’s overall estimation results, first in terms of the possibility of a
log-to-lin shift in numerical representation (e.g., Siegler & Opfer, 2003) and then in terms of the
proportion judgment model (Barth & Paladino, 2011), followed by a direct comparison of the two
approaches. We then consider performance in terms of absolute errors and latencies to examine
underlying cognitive processes of estimation. With few exceptions (see Footnote 1), all children
contributed 26 data points to each analysis on the 0–100 and 0–1000 number lines. All results reported
here were significant at p < .05 or smaller unless otherwise noted.

Shape of mental number line across grades

Fig. 1 shows the median responses to the 0–100 estimation task across all 26 positions sampled
on the number line separately by grade. Each panel also shows the diagonal reference line from 0 to
100, representing the linear relationship between x and y. Departures from this reference line
represent deviations from perfect linear estimation (i.e., errors in estimation). Inspection of the
panels beginning with third grade shows clustering of the responses close to the reference line,
suggesting a high degree of linearity. The panels for first and second grades, however, appear to
show substantial departures from linearity, in particular rather large underestimates on the lower
portion of the line and smaller overestimates at the upper end. This is the pattern found in Booth
and Siegler (2006) for the PN task, possibly an exponential tendency. If these are truly exponential,
this change in patterns from first to third grades and an increase across grades in the number of
children showing linear responding (e.g., Booth & Siegler, 2006) would be consistent with Siegler’s
proposal of a log-to-lin shift.

Log-to-lin shift
To assess this possibility, we computed the statistical fit of each child’s responses to linear,
exponential, and logarithmic functions by means of regression analysis, to see which function yielded the
highest R² value on the 0–100 lines. The log function provided the best fit for only 2 first graders and 1
second grader on the 0–100 lines and otherwise was far below linear and exponential fits. Accordingly,
we do not discuss the results on log fits further.
Table 2 presents the mean R² values of these curve fitting analyses separately by grade and also
shows the number of children at each grade level who were best fit by the linear and exponential
models. Each child’s obtained R² values for exponential and linear fits were compared in
paired-sample t tests, and effect sizes were computed separately for each grade; positive t values indicate
superiority of the linear fits.
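
A minimal sketch of this kind of per-child model comparison follows. It assumes ordinary least squares for the linear fit, least squares on ln(estimate) for the exponential fit, and least squares on ln(position) for the logarithmic fit, with R² computed on the original response scale in each case. The source does not specify the fitting software or these details, so the procedure, the function names, and the toy data in the usage lines are all hypothetical.

```python
import math


def _ols(xs, ys):
    """Return (intercept, slope) of the least-squares line ys ~ xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def _r2(ys, preds):
    """Coefficient of determination on the original response scale."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def fit_r2(positions, estimates):
    """R^2 for linear, exponential, and logarithmic fits (all values must be > 0)."""
    a, b = _ols(positions, estimates)
    lin_pred = [a + b * x for x in positions]
    # Exponential: regress ln(estimate) on position, then back-transform.
    a, b = _ols(positions, [math.log(y) for y in estimates])
    exp_pred = [math.exp(a + b * x) for x in positions]
    # Logarithmic: regress estimate on ln(position).
    a, b = _ols([math.log(x) for x in positions], estimates)
    log_pred = [a + b * math.log(x) for x in positions]
    return {
        "linear": _r2(estimates, lin_pred),
        "exponential": _r2(estimates, exp_pred),
        "logarithmic": _r2(estimates, log_pred),
    }


def best_model(positions, estimates):
    """Name of the function with the highest R^2 for one participant."""
    fits = fit_r2(positions, estimates)
    return max(fits, key=fits.get)


if __name__ == "__main__":
    positions = [3, 12, 25, 48, 64, 81, 96]           # hypothetical subset of marks
    toy_estimates = [2, 5, 9, 20, 35, 60, 95]          # made-up responses
    print(fit_r2(positions, toy_estimates))
    print(best_model(positions, toy_estimates))
```

Classifying each child by the best-fitting function in this way gives the kind of per-grade counts that the text attributes to Table 2; a different fitting convention (for example, R² computed on the log-transformed scale) could change individual classifications.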
As shown in Table 2 for the 0–100 line data, neither model provided a better overall fit to the first
graders’ responses. There was a mixture of linear (9) and exponential (11) responders in first grade,
suggesting that children at this age are in transition from a logarithmic to a linear representation of
magnitude. Beginning with second grade, however, the R² linear values show increasingly better fits
with the data; for example, two thirds of the second graders were better fit by the linear model. Fig. 2
displays the median estimation plots for first and second graders separately for the linear and
exponential subgroups, showing how different the two response patterns were. In contrast, children in
third grade and older were universally linear in their estimation responses for the 0–100 lines, as
shown in Table 2; accordingly, Grade 3 and higher grades are not shown in Fig. 2. The increasing
linearity of responding across grades was confirmed in an analysis of variance (ANOVA) using each
child’s R²lin estimate as the dependent variable, F(4, 117) = 20.19, η² = .41 [with adults included,
F(5, 136) = 21.20, η² = .44]. A post hoc analysis (Tukey’s WSD for post hoc tests on between-participant
effects throughout; for this analysis, means needed to differ by .09) indicated that first grade was
significantly lower in linearity than all other grades, second grade was marginally different from third
grade, and Grade 3 and higher grades were all equal in terms of linearity. The growing superiority
of the linear fits across grades is also demonstrated by the significant t statistics and effect sizes shown
in Table 2.

Footnote 1. Due to experimenter error, 1 fourth grader was not tested on the 0–1000 task. In addition,
4 fourth graders consistently (e.g., on 20 of the 26 trials), and possibly mischievously, made very small
estimates on the 0–1000 task (e.g., responded with values no larger than 300 even when the hatch mark
was in the 700 to 800 range). Because these estimates were suspicious, seriously inaccurate, and
frequent and had a biasing effect on the group analyses, these children were dropped from the 0–1000
analyses.

Table 2
Grade (with sample size), mean R² values (with number of individuals best fit by that model) for linear
and exponential responders, t test comparison between linear and exponential solutions, and effect
size (d) for the comparison.

Grade (n)            R²lin (n best fit)   R²exp (n best fit)   t statistic         Effect size (d)

0–100 estimation task
First grade (22)     .69 (9)              .69 (11)             t(21) = .01 (ns)    0.00
Second grade (25)    .85 (16)             .78 (8)              t(24) = 2.77        0.55
Third grade (20)     .93 (20)             .80 (0)              t(19) = 8.79        1.96
Fourth grade (32)    .95 (32)             .82 (0)              t(31) = 12.55       2.22
Fifth grade (23)     .96 (23)             .79 (0)              t(22) = 12.66       2.64
College (20)         .98 (20)             .80 (0)              t(19) = 17.89       4.00

0–1000 estimation task
Third grade (20)     .88 (16)             .78 (4)              t(19) = 3.91        0.87
Fourth grade (27)    .87 (20)             .76 (7)              t(26) = 3.60        0.69
Fifth grade (23)     .92 (19)             .82 (4)              t(22) = 4.73        0.94
College (20)         .97 (20)             .84 (0)              t(19) = 20.05       4.48

Note. On the 0–100 task, 2 first graders and 1 second grader were best fit by the log model. All t values
were significant at p < .05 or smaller except as noted.

Fig. 1. Median responses, Grades 1 to 5 and college (A–F, respectively), on the 0–100 number line task.
The y = x diagonal is illustrated for reference.

Fig. 2. Median responses, Grade 1 (A,B) and Grade 2 (C,D), on the 0–100 number line task separately for
linear (A,C) and exponential (B,D) responders. The y = x diagonal is illustrated for reference.

Fig. 3. Median responses, Grades 3 to 5 and college (A–D, respectively), on the 0–1000 number line task.
The y = x diagonal is illustrated for reference.

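
The effect sizes in Table 2 can be cross-checked against the reported t statistics. The sketch below assumes the paired-samples convention d = t / sqrt(n), which the source does not state. Under that assumption, nine of the ten rows are reproduced to within rounding of the reported values, while the fifth grade row on the 0–1000 task (reported d = 0.94) comes out near 0.99; the reason for that difference is not evident from the text. The row labels and the tolerance are illustrative choices.

```python
import math

# (label, n, reported t, reported d), copied from Table 2.
ROWS = [
    ("0-100 first", 22, 0.01, 0.00),
    ("0-100 second", 25, 2.77, 0.55),
    ("0-100 third", 20, 8.79, 1.96),
    ("0-100 fourth", 32, 12.55, 2.22),
    ("0-100 fifth", 23, 12.66, 2.64),
    ("0-100 college", 20, 17.89, 4.00),
    ("0-1000 third", 20, 3.91, 0.87),
    ("0-1000 fourth", 27, 3.60, 0.69),
    ("0-1000 fifth", 23, 4.73, 0.94),
    ("0-1000 college", 20, 20.05, 4.48),
]


def d_from_t(t, n):
    """Effect size implied by a paired t statistic, assuming d = t / sqrt(n)."""
    return t / math.sqrt(n)


def check(rows, tol=0.011):
    """Return (label, implied d, reported d, agrees) for each row."""
    out = []
    for label, n, t, d in rows:
        implied = d_from_t(t, n)
        out.append((label, round(implied, 2), d, abs(implied - d) <= tol))
    return out


if __name__ == "__main__":
    for label, implied, reported, ok in check(ROWS):
        flag = "" if ok else "  <- differs"
        print(f"{label:15s} implied d = {implied:.2f}, reported d = {reported:.2f}{flag}")
```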
For the 0–1000 lines, third to fifth graders showed a strong, but not universal, pattern of linear
responding (see Table 2), and fully 100% of the adult participants were best fit by the linear model.
Fig. 3 displays the overall median response patterns for the 0–1000 lines. The analysis of R²lin values
showed an age effect only when the college sample was included, F(3, 86) = 5.55, η² = .16 [with college
excluded, F(2, 67) = 1.48, ns]. This indicates that there was no increase in linearity of fits on the 0–1000
lines across Grades 3 to 5. Note, however, that based on the t tests reported in Table 2, linear fits to the
estimates were still superior to exponential fits and were certainly more frequent. Fig. 4 segregates the
participants into linear and exponential subgroups. As with younger children on the 0–100 lines, a
distinctly different pattern was apparent for the two subgroups, with marked underestimates on the
lower portion of the line for the exponential responders versus estimates close to the reference
diagonal for those who responded linearly.

Fig. 4. Median responses, Grade 3 (A,B), Grade 4 (C,D), and Grade 5 (E,F), on the 0–1000 number line task
separately for linear (A,C,E) and exponential (B,D,F) responders. The y = x diagonal is illustrated for
reference.

Overall, these results replicated those reported by Siegler and colleagues (e.g., Booth & Siegler,
2006; Siegler & Opfer, 2003) as well as by Durette and colleagues (2011) in their work with adults.
The data conform well to the hypothesis of a log-to-lin shift in the mental representation of
magnitude, as assessed by estimates on the number line estimation task. Note also that the evidence
supports Siegler’s hypothesis that some children seem to hold both logarithmic and linear represen-
tations of numerical magnitude simultaneously. That is, all children at third grade and above
responded in a linear fashion to the 0–100 lines. Yet, a substantial minority of the same
children—20%, 26%, and 17% from Grades 3 to 5, respectively—responded with exponential patterns
to the 0–1000 lines.

Proportion judgment model


Several of the patterns in the figures (e.g., second grade linear curve in Fig. 2, college curve in Fig. 3)
seem to conform to the S- or reverse S-shaped patterns found in Barth and Paladino (2011). To assess
the adequacy of this account, we estimated each participant’s one- and two-cycle betas on the 0–100
and 0–1000 lines, based on the cyclic power model (Hollands & Dyre, 2000), using nonlinear
regression. We then conducted simple one-way ANOVAs to evaluate possible changes in beta across age.
Note that despite recent departures from this usage, we computed these betas in accordance with
the original terminology in Hollands and colleagues (2002) and the psychophysical literature in
general. That is, we treat our PN task as a true estimation task, in which the participant “assigns a numeric
value” to the hatch mark to “reflect its relative contribution to the total magnitude” (p. 563; see also
Stevens, 1957, p. 165, for a parallel definition of “magnitude estimation”). This is in contrast to a
production task, in which the participant is given a number and “adjusts the relative magnitude of the
stimuli” (Hollands et al., 2002, p. 563), that is, draws a hatch mark in the NP task to correspond to
the value of the number (see Stevens, 1957, p. 165, for the parallel definition of “magnitude
production”). We adhere to these definitions and the models specified in Hollands and colleagues (2002, p.
565, Eq. 1 for estimation and Eq. 2 for production). This is in contrast to three recent articles (Barth
& Paladino, 2011; Cohen & Blanc-Goldhammer, 2011; Sullivan, Juhasz, Slattery, & Barth, 2011) that
treated the NP task as an estimation task rather than a production task and would consider our PN task
as a production task. A consequence of switching from the standard definitions is that the beta values
in Barth and Paladino (2011) are not directly comparable to those presented here because their NP
task was apparently modeled with Eq. (1) rather than Eq. (2) from Hollands and colleagues (2002).
Despite this, there appear to be no larger interpretive difficulties in adhering versus switching; the
progression in beta that we report below, from above to below 1.0 across ages, would simply be reversed
under the other equation, with no change in interpretation. Likewise, Barth and Paladino’s (2011) one-
and two-cycle patterns do not change regardless of whether their reported values are beta (Eq. 1) or
the reciprocal of beta (Eq. 2).
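
The relation between the two parameterizations can be illustrated with a short sketch. It assumes the form commonly given for the one-cycle cyclic power model, in which a true proportion p maps to p^beta / (p^beta + (1 - p)^beta), and a two-cycle variant that applies the same function to each half of the line. The function names are illustrative, and the code is not the nonlinear regression procedure used in the analyses. Under this assumed form, the inverse of the model with exponent beta is the same function with exponent 1/beta, which is why fitting with one equation or the other simply inverts the reported beta.

```python
def one_cycle(p, beta):
    """Assumed one-cycle form: judged proportion for a true proportion p in [0, 1]."""
    num = p ** beta
    return num / (num + (1.0 - p) ** beta)


def two_cycle(p, beta):
    """Assumed two-cycle form: one cycle per half, with the midpoint as a reference."""
    if p <= 0.5:
        return 0.5 * one_cycle(p / 0.5, beta)
    return 0.5 + 0.5 * one_cycle((p - 0.5) / 0.5, beta)


# Applying the model with beta and then with 1/beta recovers the input,
# so the two parameterizations are inverses of each other.
p = 0.30
round_trip = one_cycle(one_cycle(p, 1.46), 1.0 / 1.46)
print(abs(round_trip - p) < 1e-9)

# beta > 1 gives under-then-overestimates (S shape); beta < 1 gives the reverse.
for beta in (1.46, 1.00, 0.84):
    print(beta, [round(one_cycle(q, beta), 3) for q in (0.25, 0.50, 0.75)])
```

The exponents 1.46 and 0.84 are simply the first grade and college one-cycle means from Table 3, used here for illustration.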
We found a decrease in one-cycle beta across grades on the 0–100 number line task, both with and
without the adult sample, F(5, 132) = 5.99, η² = .18, and F(4, 113) = 5.31, η² = .16, respectively. As
shown in Table 3, the mean values of beta in first and second grades were well above unity. Estimates
of beta were 1.0 in third and fourth grades but then dropped significantly below 1.0 at fifth grade and
college, t(23) = –2.73 and t(19) = –5.93, respectively. A post hoc test on beta values (means needed to
differ by .49) confirmed that the two youngest grades (first and second grades) did not differ, the two
oldest grades (fifth grade and college) did not differ, and the two youngest grades differed from the
two oldest grades; second grade also differed from all older grades. The analysis of two-cycle beta
values also showed a significant age effect, F(5, 131) = 3.12, η² = .11. The obtained beta values, however,
were puzzling: 1.24 for first grade but in the .82 to .87 range for all other grades. Thus, there was no
real developmental progression found with the two-cycle beta values, although there was an abrupt
change in beta from above to below 1.0 at second grade. All betas from second grade and above
differed significantly from 1.0.
We would expect to find a developmental trend toward unity in beta given that a beta of 1.0
indicates no bias in estimation (Hollands et al., 2002). One might expect beta = 1.0 to be the idealized
developmental endpoint in numerical estimation given that full appreciation of equal spacing on
the number line should yield unbiased linear estimation (within the limits of estimation and
measurement error). What is somewhat surprising is that children at third and fourth grades appear to have
reached this level of linearity, as indexed by beta, and yet fifth graders and adults tested with the same
task produced values significantly lower than 1.0. We are unaware of any version of the proportion
judgment model that anticipates such a developmental outcome. Because values of beta that differ
from 1.0 indicate bias in estimation, straightforward interpretation of this result would suggest that
older children and adults start to become more biased in their estimation after a period of unbiased
estimation, a problematic conclusion. An alternate interpretation is that beta, derived from a power
law, is sensitive to other factors besides numerical estimation per se (e.g., psychophysical aspects of
the task; see Laming, 2009).

Table 3
Grade, mean beta values, and 95% CI values separately for 0–100 and 0–1000 number lines.

                     One-cycle                          Two-cycle
Grade                Beta   Lower bound   Upper bound   Beta   Lower bound   Upper bound
0–100 Estimation task
First grade          1.46   0.91          2.00          1.24   0.75          1.73
Second grade         1.52   1.19          1.86          0.83   0.73          0.94
Third grade          1.00   0.86          1.15          0.87   0.79          0.95
Fourth grade         1.00   0.88          1.12          0.84   0.74          0.94
Fifth grade          0.87   0.77          0.97          0.82   0.73          0.91
College              0.84   0.78          0.89          0.85   0.79          0.96
0–1000 Estimation task
Third grade          1.40   1.10          1.69          1.19   0.94          1.44
Fourth grade         1.09   0.89          1.29          0.90   0.77          1.03
Fifth grade          1.01   0.85          1.16          0.95   0.84          1.05
College              0.79   0.74          0.84          0.77   0.71          0.82

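
As a rough cross-check on the bias interpretation, each mean beta in Table 3 can be read against its 95% CI: an interval that excludes 1.0 points to bias in one direction or the other. The sketch below uses the one-cycle values for the 0–100 task copied from Table 3. The helper name and the three-way labeling are illustrative assumptions, and the reading is only a rough stand-in for the t tests reported in the text, on the assumption that the intervals are conventional CIs on the mean.

```python
# Mean beta with lower and upper 95% CI bounds, copied from Table 3
# (0-100 task, one-cycle model).
ONE_CYCLE_0_100 = {
    "First grade": (1.46, 0.91, 2.00),
    "Second grade": (1.52, 1.19, 1.86),
    "Third grade": (1.00, 0.86, 1.15),
    "Fourth grade": (1.00, 0.88, 1.12),
    "Fifth grade": (0.87, 0.77, 0.97),
    "College": (0.84, 0.78, 0.89),
}


def bias_reading(lower, upper, unbiased=1.0):
    """Illustrative label: does the CI sit above, below, or around the unbiased value?"""
    if lower > unbiased:
        return "above 1.0"
    if upper < unbiased:
        return "below 1.0"
    return "not distinguishable from 1.0"


readings = {
    grade: bias_reading(lower, upper)
    for grade, (_beta, lower, upper) in ONE_CYCLE_0_100.items()
}
for grade, reading in readings.items():
    print(f"{grade}: {reading}")
```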
A second puzzle is that the PN task revealed a reversal of the under- and overestimate pattern
across ages. First and second graders generated S-shaped patterns, under-then-overestimates, across
the number line (see Figs. 1 and 2), with beta substantially above 1.0. Third and fourth graders had
beta values of 1.0 with no over- or underestimate pattern. Yet, fifth graders and college students
showed beta values below 1.0 and a clear reverse S-shaped over-then-underestimate pattern across
the line, a pattern we find repeatedly with adult samples (Durette et al., 2011). Such a reversal is unu-
sual; in cases discussed by Hollands and colleagues (2002), tasks generally elicit one pattern or the
other but not both.
One- and two-cycle beta values were also analyzed on the 0–1000 task with similar results. In the
one-cycle analysis, there was again a progression in obtained beta exponents from above to below 1.0,
F(3, 83) = 6.64, η² = .19, with third graders’ mean beta values being substantially above 1.0, fourth and
fifth graders’ beta values approximating unity, and college students’ beta values being significantly
below 1.0, t(19) = –9.38. Post hoc tests (means needed to differ by .35) showed that the third grade beta
differed from fifth grade and college betas, whereas values from fourth grade and above did not differ.
The age effect was also significant when college students were excluded, F(2, 64) = 3.61, η² = .10.
Two-cycle beta values also showed the developmental progression and the change from above to
below 1.0 (all means in Table 3). Post hoc tests showed only that the third grade beta differed from
older groups’ betas (means needed to differ by .27). The college beta value was significantly lower
than 1.0, t(19) = –8.58, suggesting a significant departure from linearity. Again, this is a problematic
conclusion based on the high degree of linearity of adults’ estimates, as shown in Table 2, and the
implication that adults are more biased than the linear fourth and fifth graders.

Model comparison

To compare Siegler’s log-to-lin model (e.g., Siegler & Opfer, 2003) with the proportion judgment
account advocated by Barth and Paladino (2011), we first tallied the number of participants at each
grade who were best fit by the linear, exponential, one-cycle, or two-cycle model and computed mean
R² fits for these groupings; Table 4 presents these results.

Table 4
Percentage of participants best fit by linear, exponential, one-cycle, and two-cycle models, with mean R² fits, separately by grade
and 0–100 and 0–1000 number lines.

                 Linear         Exponential    One-cycle      Two-cycle      Tied
Grade            % Fit   R²     % Fit   R²     % Fit   R²     % Fit   R²     (%)
0–100 Estimation task
First grade      26      .61    58      .75    5       .91    5       .56    5
Second grade     32      .80    24      .90    36      .93    4       .79    4
Third grade      40      .91    0       –      40      .95    10      .97    10
Fourth grade     62      .95    0       –      28      .97    6       .94    3
Fifth grade      52      .96    0       –      35      .95    9       .96    4
College          35      .97    0       –      55      .98    5       .98    5
0–1000 Estimation task
Third grade      60      .91    20      .82    15      .96    5       .93    0
Fourth grade     48      .89    18      .80    18      .94    11      .90    4
Fifth grade      61      .93    17      .90    17      .96    4       .97    0
College          60      .97    0       –      35      .97    5       .98    0

Note first the growing percentage of participants fit by the linear function across ages, the growth
in fit of the R²lin function, and the declining percentage fit by the exponential model (also shown in
Table 2). These trends are consistent with Siegler’s results. Second, note the reasonably high
percentages of participants fit by the one-cycle model and the very high R² fits for that model. Although this
seems supportive of the Barth and Paladino (2011) approach, there is reason to doubt the fits obtained
here. In particular, Opfer, Siegler, and Young (2011) conducted simulations in which the underlying
data were drawn exclusively from linear and logarithmic distributions of responding. Even so, the
one-cycle power model fit the data with R² fits of .92 and above, that is, excellent cyclic fits when no
cyclic responding was present. As such, the fits obtained here (from .91 to .98) are likely misleading. Third,
despite fitting the 0–100 data with some frequency, the one-cycle model fit the 0–1000 task data quite
infrequently, certainly less frequently than the linear model. Finally, the two-cycle model showed no
growing tendency to fit performance at the older ages despite evidence (presented below) of a
growing reliance on the midpoint of the number line in performing estimates; according to the two-cycle
model, however, use of the midpoint as a reference should generate two-cycle model fits. (Note also
that Barth and Paladino (2011) found the two-cycle model to fit second graders with some regularity,
an effect that was not replicated here.)
Going beyond this tabulation, we also conducted statistical tests of the log-to-lin versus the cyclic
power model. We reasoned that a good statistical fit for either the linear or exponential model would
support the log-to-lin shift approach and, likewise, that a good fit for either the one- or two-cycle
model would support the cyclic power model that Barth and Paladino (2011) advocated. Thus, for each
participant, we first took the higher R² value of the linear or exponential fit to that participant’s
responses as a measure of the log-to-lin model’s relative success in fitting the data. Likewise, again
for each participant, we took the higher R² value of the one- or two-cycle beta fits as a measure of
the cyclic power model’s relative success in fitting the data. These two R² values were then compared
in paired t tests separately for the 0–100 and 0–1000 tasks.
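
The comparison just described can be sketched in a few lines. This is a minimal illustration of the logic, not the analysis code used in the study: the helper names are assumptions, and the R² values are made-up placeholders rather than data from any participant. For each participant, the better fit within each model family is taken, and the two resulting scores are compared with a paired t test.

```python
from math import sqrt
from statistics import mean, stdev


def family_best(fits, models):
    """Highest R-squared among the named models for one participant."""
    return max(fits[m] for m in models)


def paired_t(a, b):
    """Paired t statistic and its degrees of freedom (n - 1)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# Hypothetical per-participant R-squared values (NOT data from the study).
participants = [
    {"linear": 0.95, "exponential": 0.80, "one_cycle": 0.93, "two_cycle": 0.90},
    {"linear": 0.70, "exponential": 0.88, "one_cycle": 0.85, "two_cycle": 0.79},
    {"linear": 0.97, "exponential": 0.75, "one_cycle": 0.96, "two_cycle": 0.97},
    {"linear": 0.91, "exponential": 0.83, "one_cycle": 0.86, "two_cycle": 0.84},
]

log_to_lin = [family_best(p, ("linear", "exponential")) for p in participants]
cyclic = [family_best(p, ("one_cycle", "two_cycle")) for p in participants]
t_value, df = paired_t(log_to_lin, cyclic)
print(f"t({df}) = {t_value:.2f}")
```

A positive t value in this sketch means the linear or exponential family gave the higher fits on average, which is the direction reported below.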
For the 0–100 task, the t test that included all grades was significant, indicating that the linear or
exponential fits yielded higher R² values than the one- or two-cycle model, t(137) = 3.09, mean
R² = .91 for the linear or exponential fits versus .89 for the one- or two-cycle fits. This analysis was also
significant when college adults were excluded, t(117) = 3.11. When the analysis was conducted
separately by grade, the effect held for Grades 1 and 4, t(17) = 2.59 and t(31) = 2.43, respectively, was
marginally significant for Grade 3, t(19) = 1.98, p = .06, and failed to reach significance in Grades 2 and 5
and college (although in the same direction).
In the 0–1000 task, the t test that included all participants showed higher fits for the linear or
exponential R² (.92) than the one- or two-cycle R² (.89), t(86) = 3.37; the same was true without
college students, t(66) = 3.33. In the separate grade analyses, the linear or exponential fits were
stronger in Grades 3 and 4, t(19) = 2.26 and t(23) = 2.25, respectively, whereas the effect was not
significant in Grade 5 or in the college sample (although again in the same direction).
These analyses demonstrate that the proportion judgment account is less successful than the
log-to-lin shift account in fitting developmental estimation data. Although several of the individual
grade analyses did not reach significance, there was no statistical test in which the cyclical models
outperformed the linear or exponential model. We suspect strongly that the high values of R² for
one-cycle fits here and the several instances of nonsignificant t tests are explained by Opfer and
colleagues’ (2011) demonstration that one-cycle fits are uniformly high even in the absence of cyclic
responding.

Fig. 5. Mean absolute errors (and 95% CI) across Grades 1 to 5 and college (A–F, respectively) at the origin, first quartile,
midpoint, third quartile, and endpoint of the 0–100 number line. Axes: location of hatch mark (x) by mean absolute error (y).

Mental processes of estimation

Absolute errors, 0–100 lines


We next examined children’s errors directly, that is, the amounts by which their estimates deviated
from the correct values. The dependent variable was the absolute value of a child’s error at each
position. That is, if the child estimated 60 for a hatch mark located at position 64, the absolute error was 4,
the absolute value of 60 − 64 (and likewise if the child estimated 68 at that position). The full analysis
was a 6 (Grade: 1, 2, 3, 4, 5, or college) × 26 (Hatch Mark Position) mixed ANOVA; we also conducted
this analysis without college students. For purposes of presentation here, we report the results of
“contour” analyses. For these analyses, we averaged, on a child-by-child basis, the two observations
closest to the origin, midpoint, and endpoint and the two observations nearest the two intermediate
quartiles (the values nearest 25 and 75). This approach captures the overall contours of the profiles
across the full set of 26 positions, allows a less cluttered presentation of the data, and focuses on
the important regions of the number line identified by Barth and Paladino (2011). Analyzing just these
five points per line also addresses a possible concern about inflated degrees of freedom (df) for the
overall F tests. The contour analyses were entirely redundant with the full ANOVA results (the
summary table and full graphic results across all 26 positions are contained in the supplementary
material).
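
The contour reduction can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' procedure: the function name, the `per_point` parameter, and the hatch-mark positions and estimates are hypothetical, and the example uses 10 positions instead of the study's 26 for brevity.

```python
def contour_errors(positions, estimates, line_max=100, per_point=2):
    """Mean absolute error at origin, quartiles, midpoint, and endpoint for one participant."""
    abs_err = [abs(est - pos) for pos, est in zip(positions, estimates)]
    contour = []
    for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
        target = fraction * line_max
        # Indices of the observations whose hatch marks lie nearest this contour point.
        nearest = sorted(range(len(positions)),
                         key=lambda i: abs(positions[i] - target))[:per_point]
        contour.append(sum(abs_err[i] for i in nearest) / per_point)
    return contour


# The worked case from the text: estimates of 60 or 68 at position 64 both give an error of 4.
print(abs(60 - 64), abs(68 - 64))

# Hypothetical hatch-mark positions and estimates for one child (NOT study data).
positions = [2, 5, 23, 27, 48, 52, 73, 78, 96, 98]
estimates = [3, 9, 30, 20, 49, 51, 80, 70, 95, 99]
print(contour_errors(positions, estimates))
```

With these made-up values the five-point profile dips at the midpoint and rises at both quartiles, the kind of shape the contour analyses are meant to expose.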
Overall, there was a significant decrease in errors across grades, F(5, 137) = 34.21, ηp² = .56; mean
errors (equivalent to percentages for the 0–100 lines) were 19.3, 11.3, 4.8, 5.6, 4.7, and 3.9 for Grades 1
to 5 and college, respectively. This decline in errors was also significant when the college sample was
excluded, F(4, 118) = 33.32, ηp² = .53. Post hoc analyses revealed that the first grade mean was higher
than the means of all older grades, as was the second grade mean, and the means in Grade 3 and
higher grades did not differ (means needed to differ by 5.7). More interesting, there was a significant effect
of position, F(4, 548) = 36.85, ηp² = .21, and a position by grade interaction, F(20, 548) = 9.39, ηp² = .26.
The main effect and interaction were also significant without the college sample, F(4, 472) = 35.40,
ηp² = .23, and F(16, 472) = 9.57, ηp² = .24, respectively.
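
For readers who want to check the reported effect sizes, partial eta squared can be recovered from an F ratio and its degrees of freedom through the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to values reported in this section; the function name is illustrative, and small discrepancies are possible because the published F values are rounded.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error) triples taken from the absolute error analysis above.
reported = {
    "grade": (34.21, 5, 137),
    "position": (36.85, 4, 548),
    "grade x position": (9.39, 20, 548),
}
for effect, (f_value, df1, df2) in reported.items():
    print(effect, round(partial_eta_squared(f_value, df1, df2), 2))
```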
Fig. 5 shows the position by grade interaction along with 95% confidence intervals (CIs) around the
means. The first panel in the figure shows the first grade results, a steady increase in absolute errors
across the number line with a moderate decrease but high variability in errors approaching the
endpoint. Low errors and variability at the origin suggest that these children were beginning their
estimates at the origin and working forward, with growing errors and variability across the line as the
hatch mark was placed farther from the origin. Second graders, in contrast, showed a marked
tent-shaped pattern, that is, low errors and variability at both endpoints and an increase in absolute errors
and variability at the midpoint. This would suggest that second graders used a more sophisticated
strategy for their estimates, beginning their estimates from whichever endpoint was closer to the
hatch mark rather than routinely starting at the origin and working upward. (Note that the mean
absolute error at the midpoint, 16.8, is quite high, with scores ranging from –25 to 40, although the
median was near 0; see Fig. 1B).
Remarkably, third graders showed both a marked improvement in accuracy and another dramatic
change in pattern: a pronounced dip in absolute errors and variability at the midpoint of the number
line. This pattern continued in the fourth and fifth grade results and was overwhelmingly common in
our adult results both here and in previous work (Durette, 2009; Durette et al., 2011). It reflects what
has been called a “midpoint strategy” (e.g., Petitto, 1990). In our view, this strategy involves not just
the perceptual salience of the midpoint of the number line but also the arithmetic knowledge that a
hatch mark near the perceptual midpoint must equal a value near 50 given that 50 is arithmetically
half the length of the 0–100 number line. In other words, we interpret the “M-shaped” function as
indicative of a midpoint strategy in estimating values on the number line: When the hatch mark is
near the center of the line, its value is arithmetically half the length of the line. It would appear that
third graders had become familiar enough with numbers and arithmetic to have this knowledge
accessible, and this “half of 100” knowledge improves the accuracy of their estimates in the line estimation
task (see also Petitto, 1990; Schneider et al., 2008).
Simple effects tests were conducted to verify that the declines in errors at the midpoint beginning
in third grade were indeed significant. We made two comparisons per grade: mean error at the first
quartile versus the midpoint and mean error at the midpoint versus the third quartile. With one
exception, all tests beginning with third grade indicated that the dip in errors was in fact significant
[Fs ranged from 5.83 to 31.50 on df values from (1, 76) to (1, 124)]; the one exception was in fourth
grade, where the midpoint to third quartile comparison was not significant. Consistent with the pat-
terns shown in Fig. 5, the same simple effects tests at first and second grades revealed no evidence of a
midpoint strategy. Instead, the increase from midpoint to third quartile was significant at first grade,
as was the decline at the same position in second grade [Fs of 4.20 and 10.56 on df values of (1, 88) and
(1, 96), respectively]. Thus, the overall increase in errors at first grade and the tent-shaped pattern at
second grade were confirmed by simple effects tests. The same held true when the analysis was re-
stricted to first and second graders who responded linearly (i.e., no evidence of an M-shaped pattern
for these children).

Latencies, 0–100 lines


Analyses of the 0–100 estimation latencies at each hatch value paralleled those for errors (see the
Supplementary material for the full results across 26 positions). Starting with second grade, there was
a general decline in latencies across grades, F(5, 137) = 16.87, ηp² = .38 [when the college sample was
excluded, F(4, 118) = 8.96, ηp² = .23]. Mean latencies were 2818, 3417, 3428, 2404, 2195, and 1437 ms
across Grades 1 to 5 and college, respectively. Post hoc analysis (means needed to differ by 703 ms)
showed that second and third grade means were equivalent, they differed from fourth and fifth grade
means (which were also equivalent), and all four of these means differed from the college mean. First
graders’ mean, however, differed only from the college mean.
The pattern of latencies across positions was also significant, F(4, 548) = 35.99, ηp² = .21, and with
the college sample excluded, F(4, 472) = 35.76, ηp² = .23. Finally, the grade by position interaction that
included all grades in the analysis was significant, F(20, 548) = 3.31, ηp² = .11, and remained significant
after the exclusion of the college sample, F(16, 472) = 2.62, ηp² = .08.
Fig. 6 demonstrates the contour patterns of latencies across grades for the 0–100 lines. Note first
the somewhat faster latencies in first grade than in second grade, especially toward the endpoint of
the line. This was accompanied by much higher errors (and variability), as shown in Fig. 5. First grad-
ers were simply reacting fairly quickly to the hatch marks, often with very inaccurate estimates. In
contrast, second and third graders slowed down in their estimates and by third grade achieved far
greater accuracy; of course, a much larger proportion of these children provided linear responses than
in Grade 1. Because the linear representation would be a fairly new accomplishment for these chil-
dren, it is probably not surprising that they were slower overall.
More interesting are the latency patterns at fourth grade and higher grades shown in Fig. 6. The
fourth graders showed the same M-shaped pattern in latencies as was seen in the third graders’ errors.
Simple effects tests revealed that fourth grade and higher grades all show a significant dip from the
first quartile to the midpoint and a significant increase in latency from the midpoint to the third quar-
tile [Fs ranged from 12.43 to 47.04 on df values from (1, 76) to (1, 124)]. Grades 1 to 3 failed to show a
significant M-shaped dip, although the increase in latency from the midpoint to third quartile was sig-
nificant at first grade, F(1, 88) = 5.31. Apparently, a year after the midpoint strategy begins to improve
the accuracy of children’s estimates at the middle of the line, the facilitation also becomes apparent in
processing speed. Thus, the M-shaped pattern seems to indicate the development of a more sophisti-
cated estimation strategy than has been reported previously, one that beginning at third grade mirrors
adults’ performance on accuracy and beginning at fourth grade mirrors adults’ processing speed. The
strategy seemingly combines the perceptual salience of the midpoint of the number line (i.e., line
bisection) with the arithmetic knowledge of ‘‘half of 100,’’ knowledge that is considerably more
specific than the general principle of equal spacing of numbers.

Fig. 6. Mean latency across Grades 1 to 5 and college (adult) at the origin, first quartile, midpoint, third quartile, and endpoint of
the 0–100 number line. RT, response time.

Interestingly, there is no evidence, even in our adult sample, of an equivalent dip or facilitation at
the quartile points in the number line. Despite intuition that it should be relatively easy for
participants to first bisect the line at 50 and then bisect again at 25 and 75, we found no evidence
of such processing in either errors or latencies.

Absolute errors, 0–1000 lines


We conducted a comparable analysis on absolute errors for the 0–1000 lines, a 4 (Grade: 3, 4, 5, and
college) × 5 (Contour Position) mixed ANOVA (three levels of grade when adults were excluded; full
results on 26 positions are furnished in the supplementary material). Average errors per grade were
82.6, 102.6, 75.5, and 44.9 for Grades 3 to 5 and college, respectively (for the 0–1000 lines, a mean
error of 82.6 equals an error rate of 8.26%). This effect was significant when all ages were considered,
F(3, 87) = 3.96, ηp² = .12, but not when the college sample was excluded, F(2, 68) = 1.21, ns. In other
words, our third graders were indistinguishable from their elementary school peers in terms of error
rates, just as was the case for third graders on the 0–100 lines. The post hoc analysis confirmed that
the fourth grade versus college contrast was the only significant difference among the means (means
needed to differ by 58.0). The position effect was significant [for all grades, F(4, 348) = 11.10, ηp² = .11;
with college sample excluded, F(4, 272) = 11.45, ηp² = .14], as was the grade by position interaction,
F(12, 348) = 1.96, ηp² = .06 (with the college sample excluded, F < 1.0).
We do not depict this grade by position interaction because of the results of follow-up analyses.
Recall from Table 2 that a nontrivial number of children in Grades 3 to 5 were best fit by an exponen-
tial function on their 0–1000 performance—20%, 26%, and 17%, respectively. In the follow-up analyses,
we examined absolute errors separately for children who responded linearly and those who re-
sponded exponentially on the 0–1000 lines and found dramatic differences, as shown in Fig. 7.
To test these differences, we conducted a one-way ANOVA comparing linear and exponential
responders, with absolute error as the dependent variable. Not surprisingly, this analysis revealed a
significant difference such that the linear children estimated with less error (mean = 78.6) than the
exponential children (mean = 179.9), F(1, 68) = 54.78, η² = .45. To test for a comparable difference under
the proportion judgment model, we also conducted a parallel analysis, dividing the sample into one- and
two-cycle responders. This analysis did not reveal a group difference (F < 1.0).
In addition, for all three grades (third, fourth, and fifth), participants who responded linearly
showed an increase in errors from the origin to the first quartile and then a drop in errors from the
first quartile to the midpoint, the middle of the M-shaped pattern. This drop was significant for third
and fifth grade [Fs of 8.39 and 12.80 on df values of (1, 60) and (1, 72), respectively] but not for fourth
grade due to a large amount of variability in the sample. The third and fourth grade patterns showed
only a slight growth in errors beyond the midpoint, whereas the fifth graders showed a significant
increase [F of 8.68 on df of (1, 72)], thereby completing the M-shaped pattern. Thus, there is solid
evidence that “half of 1000” knowledge is firmly in place and affecting fifth graders’ estimations, and
there is suggestive evidence that this was the case beginning in third grade, where the same pattern
was obtained on 0–100 lines as well. In striking contrast, for all three grades shown in Fig. 7, children
who responded exponentially showed the tent-shaped pattern of errors and an overall level of errors
that was considerably higher than that of children responding linearly, especially at the midpoint.
Thus, in a very real sense, the obtained grade by position interaction was qualified by the grouping
of participants into linear versus exponential responders, that is, those who demonstrated M-shaped
versus tent-shaped error functions, respectively.

Fig. 7. Mean absolute errors (and 95% CI) across Grade 3 (A), Grade 4 (B), and Grade 5 (C) at the origin, first quartile, midpoint,
third quartile, and endpoint of the 0–1000 number line separately for linear and exponential responders.

Latencies, 0–1000 lines


Latencies in the 0–1000 task decreased across grades, with means of 3964, 2510, 2651, and 1768 ms
for Grades 3 to 5 and college, respectively. These differences were significant for all grades,
F(3, 87) = 21.49, ηp² = .43, and when the college sample was excluded, F(2, 68) = 15.44, ηp² = .31. The
post hoc analysis (means needed to differ by 765 ms) revealed that third grade responded significantly
more slowly than all other grades, fourth and fifth grades did not differ, and fifth grade and college
differed (the fourth grade vs. college comparison was marginally significant). The position effect was also
significant for all grades, F(4, 348) = 5.46, ηp² = .06, and with the college sample excluded,
F(4, 272) = 7.76, ηp² = .10. As in our previous analyses, the grade by position interaction was significant
for all grades, F(12, 348) = 5.84, ηp² = .17, and when the college sample was excluded, F(8, 272) = 5.50,
ηp² = .14. Third graders responded slowly, with latencies increasing across positions.
Older children showed faster responding but only hints of the characteristic M-shaped patterning;
at fourth grade the midpoint differed from the first quartile [F of 20.81 on df of (1, 108)] but not the
third quartile, and at fifth grade the midpoint to third quartile comparison was only marginally
different, F(1, 88) = 3.49, p < .10. For college students, however, the midpoint dip was significantly different
from both quartile points [Fs of 16.58 and 7.29 on df of (1, 76)]. Consistent with the error data in Fig. 7,
children whose responses were best fit by the exponential function showed no evidence of a dip at the
midpoint.

Relationship between number line estimation and math achievement

An important facet of Siegler’s work (e.g., Booth & Siegler, 2006) was the finding that children’s
number line estimates correlate significantly with their performance on standardized math achieve-
ment tests. In particular, that research has shown that both the linearity of children’s estimates and
the absolute error of those estimates correlate strongly with math achievement scores in the .45 to
.57 range (the correlation is negative for errors and positive for linearity; see Table 3 in Booth &
Siegler, 2006, for details). We obtained school records for children in our sample to see whether the
same kinds of relationships held here.
We correlated estimation performance with standardized math achievement scores using each
child’s R²lin and mean absolute error on the 0–100 and 0–1000 tasks. The achievement scores were
from a standardized criterion referenced test (CRT) administered to all students in the state beginning
in Grade 3.²
The correlations replicated the patterns obtained by Booth and Siegler (2006); the more linear and
accurate a child’s estimates were in the number line task, the higher the child’s score was on the
standardized exam. In the 0–100 task, R²lin correlated .39 with the CRT scores and absolute error correlated
–.34 with CRT (df = 70). In the 0–1000 task, R²lin correlated .40 with CRT scores and the absolute error
scores correlated –.46 with CRT (df = 65). Even more compelling was a comparison of the same
responder type groups shown earlier, that is, children who responded in linear fashion and those who
responded exponentially. The linear responders scored significantly higher on the CRT (mean = 343)
than the exponential responders (mean = 286), F(1, 66) = 9.31, ηp² = .12. Importantly, scores below
300 on this test are categorized by the school district as “does not meet grade standards”.
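
Because the scaled scores do not allow direct grade-to-grade comparison, the correlation analyses first partialled out the effect of age. One common way to do this is a first-order partial correlation, sketched below. The sketch assumes that approach for illustration only; the source does not state which procedure was used, the function names are hypothetical, and the values are made-up placeholders rather than study data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation between x and y with the linear effect of z removed from both."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical values for six children (NOT study data).
r2_lin = [0.62, 0.71, 0.80, 0.88, 0.93, 0.97]   # linearity of estimates
crt = [270, 300, 295, 340, 355, 380]            # scaled achievement score
age_months = [100, 104, 112, 118, 125, 130]     # age, the variable partialled out
print(round(partial_corr(r2_lin, crt, age_months), 2))
```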
Correlations were also computed using measures from the cyclic power model, in particular the R²
values for the one- and two-cycle model fits for both the 0–100 and 0–1000 tasks. For the 0–100 task,
the one- and two-cycle correlations with CRT were .29 and .25 (df = 70); for the 0–1000 task, they both
were .49 (df = 62). But dividing the sample into one-cycle responders and two-cycle responders
yielded no group difference on average CRT score (F < 1.0). This is somewhat surprising given that
two-cycle responding should represent more sophisticated processing because it relies on using the
midpoint as an additional reference. As such, two-cycle responding should, in principle, be associated
with higher math achievement.

² The Nevada criterion referenced test was developed collaboratively between the Nevada Department of Education and WestEd
to assess grade-level proficiency in mathematics. Test items were aligned to grade-specific content standards. We used children’s
scaled scores, the overall summary score of math achievement, as the variable in our correlation analyses; because these scores do
not enable straightforward grade-to-grade comparisons, our analyses first partialled out the effect of age. For details on test
development and evaluation, see http://nde.doe.nv.gov/Assessment_CRT.htm.

Discussion

The results obtained here are largely consistent with the log-to-lin shift in responding reported by
Siegler and colleagues (e.g., Booth & Siegler, 2006; Booth & Siegler, 2008; Siegler & Opfer, 2003). Chil-
dren fit by an exponential model revealed markedly different patterns of estimation than those of lin-
ear responders. Furthermore, there was a progressive increase across grades in the number of children
best fit by a linear model and an increase in the linearity of their fits across grades. To the extent that
children’s underlying mental representation of number can be inferred directly from response profiles,
these results support Siegler’s contention that children’s initial numerical representation is logarith-
mic in nature, gradually shifting to a linear representation during early elementary school. The emer-
gence of solidly linear responding (i.e., among all children at a single grade level) co-occurred with the
M-shaped pattern in errors we observed, which we interpret as another sign of increased number
knowledge. The influence of this increased knowledge was also revealed in correlations with stan-
dardized scores, where children who continued to be fit by exponential response patterns scored sig-
nificantly worse than those best fit by linear response profiles.
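One way to picture how a child might be classified as a linear or an exponential responder is to fit both functions to that child’s estimates and keep the one with the higher R². The sketch below is an illustration under assumptions, not necessarily the procedure used in this study: the helper names are hypothetical, the exponential curve is fitted by regressing log estimates on targets (so all estimates must be positive), and both fits are scored by R² on the original scale.

```python
from math import exp, log


def least_squares(x, y):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope


def r_squared(observed, predicted):
    """Proportion of variance in the observed values explained by the predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def classify_responder(targets, estimates):
    """Label one child's estimates by whichever model explains more variance."""
    a, b = least_squares(targets, estimates)
    r2_linear = r_squared(estimates, [a + b * t for t in targets])

    # Exponential model: estimate = exp(c + d * target), fitted on the log scale.
    c, d = least_squares(targets, [log(e) for e in estimates])
    r2_exponential = r_squared(estimates, [exp(c + d * t) for t in targets])

    label = "linear" if r2_linear >= r2_exponential else "exponential"
    return label, r2_linear, r2_exponential
```

Scoring both models on the original scale keeps the two R² values comparable; comparing a raw-scale R² with a log-scale R² would not be a like-for-like comparison.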
Our results provide some support for the proportion judgment approach. Several median plots of
estimates showed the S-shaped curves predicted by the power model. Furthermore, beta exponents
decreased across age for both the 0–100 and 0–1000 lines, indicating decreasing bias in estimation
across age. In addition, individuals’ R² fits on the 0–1000 lines correlated significantly with
standardized test performance in Grades 3 to 5.
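For readers unfamiliar with the power model, the sketch below shows how a single exponent beta generates S-shaped curves of this kind. It assumes the cyclical power model in the form usually attributed to Hollands and Dyre (2000), rescaled to a number line running from 0 to `upper`; the function name and arguments are hypothetical, and the exact parameterization fitted in this study is not given in this section.

```python
def cyclic_power(x, beta, upper=100.0, cycles=1):
    """Predicted estimate for target x under an assumed cyclical power model.

    cycles=1 uses only the two endpoints as reference points;
    cycles=2 adds the midpoint as a third reference point.
    beta must be positive.
    """
    width = upper / cycles                  # length of one cycle
    k = min(int(x // width), cycles - 1)    # index of the cycle containing x
    lower = k * width                       # reference point at or below x
    d = x - lower                           # distance above that reference
    share = d ** beta / (d ** beta + (width - d) ** beta)
    return lower + width * share
```

With beta equal to 1 the function returns the target itself. With beta below 1 it predicts overestimation in the lower half of each cycle and underestimation in the upper half, and with beta above 1 the pattern reverses.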
On the other hand, three specific difficulties arose with beta, and therefore with the proportion
judgment approach, in this developmental context. First, for both 100 and 1000 lines, the develop-
mental progression was from under-then-overestimates to over-then-underestimates (i.e., beta from
above to below 1.0). Such a pattern reversal is highly unusual, to say the least, in the psychophysical
literature that yielded the proportion judgment approach and the cyclical power model. Second, for
both the 0–100 and 0–1000 lines, adults’ beta value was significantly less than 1.0 (as was fifth grad-
ers’ beta on the 0–100 task), as Durette and colleagues (2011) found consistently with adult samples.
This is problematic because it implies that adults perform numerical estimations with greater bias
than is found in the older child samples. Admitting that adults have bias in a perceptual task, based
on a beta exponent, would not be an interpretive difficulty; indeed, it is quite common. But in a
numerical estimation task, if beta is interpreted as estimation bias, this leads to the conclusion that
adults are more biased estimators than third and fourth graders. We reject that conclusion, of course,
and suggest that beta may be sensitive to other factors besides numerical estimation per se, for example,
subtle perceptual aspects that are detectable at the adult level because numerical estimates themselves
are very accurate. Third, the direct statistical comparisons often favored Siegler’s log-to-lin
approach and never favored the cyclic model approach, even with the strong tendency for the one-cycle
model to fit number line estimation data (Opfer et al., 2011). All of these discrepancies will need to
be addressed if the proportion judgment approach is ultimately to be judged useful in understanding
number line estimation from a developmental perspective.
Our results suggest that children’s mental processes of number line estimation are jointly influ-
enced by the underlying representation of numerical magnitude and their increasing knowledge of
arithmetic. The origin of the line is always a region of highly accurate estimates regardless of age
and underlying representation (i.e., linear or logarithmic). This point is then joined by accurate
estimates at the endpoint of the line and, with increasing knowledge of arithmetic, by the
midpoint of the line. These three landmark positions show facilitation on both accuracy and
latency but do so according to a timetable that unfolds gradually across elementary school ages.
Latency effects lag behind accuracy effects by a year for the 0–100 range and by even longer for
the 0–1000 range.
It appears that first graders perform their estimates in an ‘‘origin up’’ fashion, beginning at the
origin of the line and working forward to the hatch mark, with increasing errors and latencies as
the location to be estimated gets farther from the origin. This seems to be a rather simple sequential
process of starting at the origin and moving up to the hatch mark’s location; children presumably
keep track of distance moved up by means of the internal representation of numerosity, whether
linear or log. Second graders, however, seem to be working from the end of the line closest to
the hatch mark, forward from the origin when the hatch marks are in low positions, and backward
from the endpoint when the hatch marks are in the upper part of the line. Thus, they estimated up
from 0 or down from 100 in a strategically economical and accurate fashion. The farther a hatch
mark was from an endpoint (i.e., the closer to the middle of the line), the slower and more error
prone the estimate. Finally, when the arithmetic middle of the line is known or can be easily com-
puted, beginning in third grade, nearby points can be estimated from that landmark as well. This
facilitates accuracy and, after another year, processing speed as well. Using all three landmarks
yields the characteristic M-shaped pattern of adult performance. Thus, estimation seems to have
shifted developmentally from a simple forward process with one landmark to a process using two
and then three landmarks, with the error and latency measures demonstrating the effectiveness
of those strategies.
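The developmental sequence described above can be made concrete with a toy model in which difficulty (error or latency) grows with the distance from the hatch mark to the nearest landmark the child can use. This is an illustrative assumption, not a model fitted in this study; the function names and the landmark sets are hypothetical.

```python
def distance_to_nearest_landmark(position, landmarks):
    """Distance from a hatch mark to the closest usable landmark."""
    return min(abs(position - mark) for mark in landmarks)


# Landmark sets suggested by the three strategies on a 0-100 line.
STRATEGIES = {
    "origin only (first grade)": [0],
    "both endpoints (second grade)": [0, 100],
    "endpoints plus midpoint (third grade on)": [0, 50, 100],
}


def difficulty_profile(landmarks, positions=range(0, 101, 25)):
    """Toy difficulty score at each probed position on the line."""
    return [distance_to_nearest_landmark(p, landmarks) for p in positions]


for name, marks in STRATEGIES.items():
    print(name, difficulty_profile(marks))
```

Across the positions 0, 25, 50, 75, and 100, the first profile rises steadily, the second peaks at the middle, and the third shows two humps separated by a dip at the midpoint, which is the M shape.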
Two notable features of the mature M-shaped pattern emerged here as characteristic of the
three-landmark process. First, it appears gradually across development. It appeared first in third
graders’ errors on the 0–100 lines and then in latencies a year later. However, the pattern was
not solid on the 0–1000 lines in either errors or latencies, possibly because of the greater difficulty
of this quantity; after all, 17% of fifth graders continued to give exponential responses on 0–1000
lines. It is nonetheless typical and nearly universal among college participants. Second, the M-
shaped pattern was notably absent (actually reversed) in children whose estimates indicated contin-
ued reliance on the logarithmic representation of magnitude. The 0–1000 error patterns for these
children, even as late as fifth grade, resemble those of first and second grades on the 0–100 lines.
It is telling, then, that the standard measure of mature number representation, the linearity of
responding as indexed by R², was found here to be accompanied by less than adult-like performance
on latencies and errors. To take one example, fifth graders may average 92% on R² linear performance
on 0–1000 estimations (Table 2), but some of their latency and error patterns still fall short
of adult performance.
The consequences of using a linear or logarithmic representation were particularly clear when er-
rors in estimation were examined in relation to standardized math achievement scores. We found that
greater degrees of linearity in responding, and hence lower errors, accompanied the emergence of the
midpoint strategy with its characteristic M-shaped pattern of responding. These were associated with
higher math achievement scores. In all cases, the measures would be assessing children’s growing
knowledge of arithmetic and mathematics and their fluency with that knowledge. We are not the first
to identify a midpoint strategy in number line estimations; Petitto (1990) noted that children used the
midpoint of estimation lines increasingly as age and education increased, and Schneider and
colleagues (2008) documented increases in attention to the midpoint by recording eye fixation
frequencies. But this is the first report to document the use of the midpoint as an important strategy
in estimation and to tie the strategy to both errors and latencies as measures of the underlying
cognitive processes of estimation.

Acknowledgments

The research reported here was partially supported by a generous donation from Phyllis Frias to the
UNLV Foundation. We thank the students, teachers, and administrators of the Frias Elementary School
of Las Vegas, Nevada, for their cooperation in this project as well as Hilary Barth, Justin Hollands, and
an anonymous reviewer for their helpful comments on earlier versions of the manuscript. A prelimin-
ary report of this research was presented at the meetings of the Psychonomic Society, Boston, in
November 2009.

Appendix A. Supplementary material

Supplementary material for this article is available in the online version at
doi:10.1016/j.jecp.2011.08.005.

References

Barth, H. C., & Paladino, A. M. (2011). The development of numerical estimation: Evidence against a representational shift.
Developmental Science, 14, 125–135.
Booth, J. L., & Siegler, R. S. (2006). Developmental and individual differences in pure numerical estimation. Developmental
Psychology, 42, 189–201.
Booth, J. L., & Siegler, R. S. (2008). Numerical magnitude representations influence arithmetic learning. Child Development, 79,
1016–1031.
Cohen, D. J., & Blanc-Goldhammer, D. (2011). Numerical bias in bounded and unbounded number line tasks. Psychonomic
Bulletin & Review, 18, 331–338.
Dehaene, S. (1997). The number sense: How the mind creates mathematics. New York: Oxford University Press.
Dehaene, S., Bossini, S., & Giraux, P. (1993). The mental representation of parity and numerical magnitude. Journal of
Experimental Psychology: General, 122, 371–396.
Dehaene, S., Dehaene-Lambertz, G., & Cohen, L. (1998). Abstract representations of numbers in the animal and human brain.
Trends in Neurosciences, 21, 355–361.
Dehaene, S., Izard, V., Spelke, E., & Pica, P. (2008). Log or linear? Distinct intuitions of the number scale in Western and
Amazonian Indigene cultures. Science, 320, 1217–1220.
Dixon, W. J. (1953). Processing data for outliers. Biometrics, 9, 74–89.
Durette, R. T. (2009). Adult estimation, eye movements, and math anxiety. Unpublished master’s thesis, University of Nevada, Las
Vegas.
Durette, R. T., Rudig, N. O., & Ashcraft, M. H. (2011). Cognitive processes in adults’ number line estimates. University of Nevada, Las
Vegas: Unpublished manuscript.
Geary, D. C., Hoard, M. K., Byrd-Craven, J., Nugent, L., & Numtee, C. (2007). Cognitive mechanisms underlying achievement
deficits in children with mathematical learning disability. Child Development, 78, 1343–1359.
Hollands, J. G., & Dyre, B. P. (2000). Bias in proportion judgments: The cyclical power model. Psychological Review, 107, 500–524.
Hollands, J. G., Tanaka, T., & Dyre, B. P. (2002). Understanding bias in proportion production. Journal of Experimental Psychology:
Human Perception and Performance, 28, 563–574.
Huang, Y. T., Spelke, E., & Snedeker, J. (2010). When is four far more than three? Children’s generalization of newly acquired
number words. Psychological Science, 21, 600–606.
Izard, V., & Dehaene, S. (2008). Calibrating the mental number line. Cognition, 106, 1221–1247.
Laming, D. (2009). Weber’s law. In P. Rabbitt (Ed.), Inside psychology: A science over 50 years (pp. 179–191). Oxford, UK: Oxford
University Press.
Opfer, J. E., Siegler, R. S., & Young, C. J. (2011). The powers of noise-fitting: Reply to Barth and Paladino. Developmental Science, 14,
1194–1204.
Petitto, A. L. (1990). Development of number line and measurement concepts. Cognition and Instruction, 7(1), 55–78.
Posner, M. I. (1978). Chronometric explorations of mind. Hillsdale, NJ: Lawrence Erlbaum.
Restle, F. (1970). Speed of adding and comparing numbers. Journal of Experimental Psychology, 83, 274–278.
Schneider, M., Heine, A., Thaler, V., Torbeyns, J., De Smet, B., Verschaffel, L., et al. (2008). A validation of eye movements as a
measure of elementary school children’s developing number sense. Cognitive Development, 23, 409–422.
Schneider, W., Eschman, A., & Zuccolotto, A. (2002). E-prime reference guide. Pittsburgh, PA: Psychology Software Tools.
Siegler, R. S., & Booth, J. L. (2004). Development of numerical estimation in young children. Child Development, 75, 428–444.
Siegler, R. S., & Opfer, J. (2003). The development of numerical estimation: Evidence for multiple representations of numerical
quantity. Psychological Science, 14, 237–243.
Siegler, R. S., & Ramani, G. B. (2009). Playing linear number board games—but not circular ones—improves low-income
preschoolers’ numerical understanding. Journal of Educational Psychology, 101, 545–560.
Spence, I. (1990). Visual psychophysics of simple graphical elements. Journal of Experimental Psychology: Human Perception and
Performance, 16, 683–692.
Starkey, P., & Cooper, R. G., Jr. (1980). Perception of numbers by human infants. Science, 210, 1033–1035.
Sternberg, S. (1966). High-speed scanning in human memory. Science, 153, 652–654.
Stevens, S. S. (1957). On the psychophysical law. Psychological Review, 64, 153–181.
Sullivan, J. L., Juhasz, B. J., Slattery, T. J., & Barth, H. C. (2011). Adults’ number-line estimation strategies: Evidence from eye
movements. Psychonomic Bulletin & Review, 18, 557–563.
