
The Challenges of Assessing Young Children Appropriately

Author(s): Lorrie A. Shepard


Source: The Phi Delta Kappan, Vol. 76, No. 3 (Nov., 1994), pp. 206-212
Published by: Phi Delta Kappa International
Stable URL: http://www.jstor.org/stable/20405297
Accessed: 04-10-2017 13:36 UTC

This content downloaded from 49.50.237.133 on Wed, 04 Oct 2017 13:36:33 UTC
All use subject to http://about.jstor.org/terms
The Challenges of Assessing Young Children Appropriately

In the past decade, testing of 4-, 5-, and 6-year-olds has been excessive and inappropriate. Given this history of misuse, Ms. Shepard maintains, the burden of proof must rest with assessment advocates to demonstrate the usefulness of assessment and to ensure that abuses will not recur.

By Lorrie A. Shepard

LORRIE A. SHEPARD is a professor of education at the University of Colorado, Boulder. She is past president of the National Council on Measurement in Education, past vice president of the American Educational Research Association, and a member of the National Academy of Education. She wishes to thank Sharon Lynn Kagan, M. Elizabeth Graue, and Scott F. Marion for their thoughtful suggestions on drafts of this article.

Illustration by Kay Salem

PROPOSALS to "assess" young children are likely to be met with outrage or enthusiasm, depending on one's prior experience and one's image of the testing involved. Will an inappropriate paper-and-pencil test be used to keep some 5-year-olds out of school? Or will the assessment, implemented as an ordinary part of good instruction, help children learn? A governor advocating a test for every preschooler in the nation may have in mind the charts depicting normal growth in the pediatrician's office. Why shouldn't parents have access to similar measures to monitor their child's cognitive and academic progress? Middle-class parents, sanguine about the use of test scores to make college-selection decisions, may be eager to have similar tests determine their child's entrance into preschool or kindergarten. Early childhood experts, however, are more likely to respond with alarm because they are more familiar with the complexities of defining and measuring development and learning in young children and because they are more aware of the widespread abuses of readiness testing that occurred in the 1980s.

Given a history of misuse, it is impossible to make positive recommendations about how assessments could be used to monitor the progress of individual children or to evaluate the quality of educational programs without offering assurances that the abuses will not recur. In what follows, I summarize the negative history of standardized testing of young children in order to highlight the transformation needed in both the substance and purposes of early childhood assessment. Then I explain from a measurement perspective how the features of an assessment must be tailored to match the purpose of the assessment. Finally, I describe differences in what assessments might look like when they are used for purposes of screening for handicapping conditions, supporting instruction, or monitoring state and national trends.

Note that I use the term test when referring to traditional, standardized developmental and pre-academic measures and the term assessment when referring to more developmentally appropriate procedures for observing and evaluating young children. This is a semantic trick that plays on the different connotations of the two terms. Technically, they mean the same thing. Tests, as defined by the Standards for Educational and Psychological Testing, have always included systematic observations of behavior, but our experience is with tests as more formal, one-right-answer instruments used to rank and sort individuals. As we shall see, assessments might be standardized, involve paper-and-pencil responses, and so on, but in contrast to traditional testing, "assessment" implies a substantive focus on student learning for the purpose of effective intervention. While test and assessment cannot be reliably distinguished technically, the difference between these two terms as they have grown up in common parlance is of symbolic importance. Using the term assessment presents an opportunity to step away from past practices and ask why we should try to measure what young children know and can do. If there are legitimate purposes for gathering such data, then we can seek the appropriate content and form of assessment to align with those purposes.

Negative History of Testing Young Children

In order to understand the negative history of the standardized testing of young children in the past decade, we need to understand some larger shifts in curriculum and teaching practices. The distortion of the curriculum of the early grades during the 1980s is now a familiar and well-documented story. Indeed, negative effects persist in many school districts today.

Although rarely the result of conscious policy decisions, a variety of indirect pressures - such as older kindergartners, extensive preschooling for children from affluent families, parental demands for the teaching of reading in kindergarten, and accountability testing in higher grades - produced a skill-driven kindergarten curriculum. Because what once were first-grade expectations were shoved down to kindergarten, these shifts in practice were referred to as the "escalation of curriculum" or "academic trickle-down." The result of these changes was an aversive learning environment inconsistent with the learning needs of young children. Developmentally inappropriate instructional practices, characterized by long periods of seatwork, high levels of stress, and a plethora of fill-in-the-blank worksheets, placed many children at risk by setting standards for attention span, social maturity, and academic productivity that could not be met by many normal 5-year-olds.

Teachers and school administrators responded to the problem of a kindergarten environment that was increasingly hostile to young children with several ill-considered policies: raising the entrance age for school, instituting readiness screening to hold some children out of school for a year, increasing retentions in kindergarten, and creating two-year programs with an extra grade either before or after kindergarten. These policies and practices had a benign intent: to protect children from stress and school failure. However, they were ill-considered because they were implemented without contemplating the possibility of negative side effects and without awareness that retaining some children and excluding others only exacerbated the problems by creating an older and older population of kindergartners.1 The more reasonable corrective for a skill-driven curriculum at earlier and earlier ages would have been curriculum reform of the kind exemplified by the recommendations for developmentally appropriate practices issued by the National Association for the Education of Young Children (NAEYC), the nation's largest professional association of early childhood educators.2

The first response of many schools, however, was not to fix the problem of inappropriate curriculum but to exclude those children who could not keep up or who might be harmed. Readiness testing was the chief means of implementing policies aimed at removing young children from inappropriate instructional programs. Thus the use of readiness testing increased dramatically during the 1980s and continues today in many school districts.3

Two different kinds of tests are used: developmental screening measures, originally intended as the first step in the evaluation of children for potential handicaps; and pre-academic skills tests, intended for use in planning classroom instruction.4

The technical and conceptual problems with these tests are numerous.5 Tests are being used for purposes for which they were never designed or validated. Waiting a year or being placed in a two-year program represents a dramatic disruption in a child's life, yet not one of the existing readiness measures has sufficient reliability or predictive validity to warrant making such decisions.

Developmental and pre-academic skills tests are based on outmoded theories of aptitude and learning that originated in the 1930s. The excessive use of these tests and the negative consequences of being judged unready focused a spotlight on the tests' substantive inadequacies. The widely used Gesell Test is made up of items from old I.Q. tests and is indistinguishable statistically from a measure of I.Q.; the same is true for developmental measures that are really short-form I.Q. tests. Assigning children to different instructional opportunities on the basis of such tests carries forward nativist assumptions popular in the 1930s and 1940s. At that time, it was believed that I.Q. tests could accurately measure innate ability, unconfounded by prior learning experiences. Because these measured "capacities" were thought to be fixed and unalterable, those who scored poorly were given low-level training consistent with their supposedly limited potential. Tests of academic content might have the promise of being more instructionally relevant than disguised I.Q. tests, but, as Anne Stallman and David Pearson have shown, the decomposed and decontextualized prereading skills measured by traditional readiness tests are not compatible with current research on early literacy.6

Readiness testing also raises serious equity concerns. Because all the readiness measures in use are influenced by past opportunity to learn, a disproportionate number of poor and minority children are identified as unready and are excluded from school when they most need it. Thus children without preschool experience and without extensive literacy experiences at home are sent back to the very environments that caused them to score poorly on readiness measures in the first place. Or, if poor and minority children who do not pass the readiness tests are admitted to the school but made to spend an extra year in kindergarten, they suffer disproportionately the stigma and negative effects of retention.

The last straw in this negative account of testing young children is the evidence that fallible tests are often followed by ineffective programs. A review of controlled studies has shown no academic benefits from retention in kindergarten or from extra-year programs, whether developmental kindergartens or transitional first grades. When extra-year children finally get to first grade, they do not do better on average than equally "unready" children who go directly on to first grade.7 However, a majority of children placed in these extra-year programs do experience some short- or long-term trauma, as reported by their parents.8 Contrary to popular belief that kindergarten children are "too young to notice" retention, most of them know that they are not making "normal" progress, and many continue to make reference to the decision years later. "If I hadn't spent an extra year in kindergarten, I would be in grade now."

In the face of such evidence, there is little wonder that many early childhood educators ask why we test young children at all.

Principles for Assessment and Testing

The NAEYC and the National Association of Early Childhood Specialists in State Departments of Education have played key roles in informing educators about the harm of developmentally inappropriate instructional practices and the misuse of tests. In 1991 NAEYC published "Guidelines for Appropriate Curriculum Content and Assessment in Programs Serving Children Ages 3 Through 8."9 Although the detailed recommendations are too numerous to be repeated here, a guiding principle is that assessments should bring about benefits for children, or data should not be collected at all. Specifically, assessments "should not be used to recommend that children stay out of a program, be retained in grade, or be assigned to a segregated group based on ability or developmental maturity."10 Instead, NAEYC acknowledges three legitimate purposes for assessment: 1) to plan instruction and communicate with parents, 2) to identify children with special needs, and 3) to evaluate programs.

Although NAEYC used assessment in its "Guidelines," as I do, to avoid associations with inappropriate uses of tests, both the general principle and the specific guidelines are equally applicable to formal testing. In other words, tests should not be used if they do not bring about benefits for children. In what follows I summarize some additional principles that can ensure that assessments (and tests) are beneficial and not harmful. Then, in later sections, I consider each of NAEYC's recommended uses for assessment, including national, state, and local needs for program evaluation and accountability data.

I propose a second guiding principle for assessment that is consistent with the NAEYC perspective. The content of assessments should reflect and model progress toward important learning goals. Conceptions of what is important to learn should take into account both physical and social/emotional development as well as cognitive learning. For most assessment purposes in the cognitive domain, content should be congruent with subject matter in emergent literacy and numeracy. In the past, developmental measures were made as "curriculum free" or "culture free" as possible in an effort to tap biology and avoid the confounding effects of past opportunity to learn. Of course, this was an impossible task because a child's ability to "draw a triangle" or "point to the ball on top of the table" depends on prior experiences as well as on biological readiness. However, if the purpose of assessment is no longer to sort students into programs on the basis of a one-time measure of ability, then it is possible to have assessment content mirror what we want children to learn.

A third guiding principle can be inferred from several of the NAEYC guidelines. The methods of assessment must be appropriate to the development and experiences of young children. This means that - along with written products - observation, oral readings, and interviews should be used for purposes of assessment. Even for large-scale purposes, assessment should not be an artificial and decontextualized event; instead, the demands of data collection should be consistent with children's prior experiences in classrooms and at home. Assessment practices should recognize the diversity of learners and must be in accord with children's language development - both in English and in the native languages of those whose home language is not English.

A fourth guiding principle can be drawn from the psychometric literature on test validity. Assessments should be tailored to a specific purpose. Although not stated explicitly in the NAEYC document, this principle is implied by the recommendation of three sets of guidelines for three separate assessment purposes.

Matching the Why and How of Assessment

The reason for any assessment - i.e., how the assessment information will be used - affects the substance and form of
the assessment in several ways. First, the degree of technical accuracy required depends on use. For example, the identification of children for special education has critical implications for individuals. Failure to be identified could mean the denial of needed services, but being identified as in need of special services may also mean removal from normal classrooms (at least part of the time) and a potentially stigmatizing label. A great deal is at stake in such assessment, so the multifaceted evaluation employed must have a high degree of reliability and validity. Ordinary classroom assessments also affect individual children, but the consequences of these decisions are not nearly so great. An inaccurate assessment on a given day may lead a teacher to make a poor grouping or instructional decision, but such an error can be corrected as more information becomes available about what an individual child "really knows."

Group assessment refers to uses, such as program evaluation or school accountability, in which the focus is on group performance rather than on individual scores. Although group assessments may need to meet very high standards for technical accuracy, because of the high stakes associated with the results, the individual scores that contribute to the group information do not have to be so reliable and do not have to be directly comparable, so long as individual results are not reported. When only group results are desired, it is possible to use the technical advantages of matrix sampling - a technique in which each participant takes only a small portion of the assessment - to provide a rich, in-depth assessment of the intended content domain without overburdening any of the children sampled. When the "group" is very large, such as all the fourth-graders in a state or in the nation, then assessing a representative sample will produce essentially the same results for the group average as if every student had been assessed.

Purpose must also determine the content of assessment. When trying to diagnose potential learning handicaps, we still rely on aptitude-like measures designed to be as content-free as possible. We do so in order to avoid confusing lack of opportunity to learn with inability to learn. When the purpose of assessment is to measure actual learning, then content must naturally be tied to learning outcomes. However, even among achievement tests, there is considerable variability in the degree of alignment to a specific curriculum. Although to the lay person "math is math" and "reading is reading," measurement specialists are aware that tiny changes in test format can make a large difference in student performance. For example, a high proportion of students may be able to add numbers when they are presented in vertical format, but many will be unable to do the same problems presented horizontally. If manipulatives are used in some elementary classrooms but not in all, including the use of manipulatives in a mathematics assessment will disadvantage some children, while excluding their use will disadvantage others.

Assessments that are used to guide instruction in a given classroom should be integrally tied to the curriculum of that classroom. However, for large-scale assessments at the state and national level, the issues of curriculum match and the effect of assessment content on future instruction become much more problematic. For example, in a state with an agreed-upon curriculum, including geometry assessment in the early grades may be appropriate, but it would be problematic in states with strong local control of curriculum and so with much more curricular diversity.

Large-scale assessments, such as the National Assessment of Educational Progress, must include instructionally relevant content, but they must do so without conforming too closely to any single curriculum. In the past, this requirement has led to the problem of achievement tests that are limited to the "lowest common denominator." Should the instrument used for program evaluation include only the content that is common to all curricula? Or should it include everything that is in any program's goals? Although the common core approach can lead to a narrowing of curriculum when assessment results are associated with high stakes, including everything can be equally troublesome if it leads to superficial teaching in pursuit of too many different goals.

Finally, the intended use of an assessment will determine the need for normative information or other means to support the interpretation of assessment results. Identifying children with special needs requires normative data to distinguish serious physical, emotional, or learning problems from the wide range of normal development. When reporting to parents, teachers also need some idea of what constitutes grade-level performance, but such "norms" can be in the form of benchmark performances - evidence that children are working at grade level - rather than statistical percentiles.

To prevent the abuses of the past, the purposes and substance of early childhood assessments must be transformed. Assessments should be conducted only if they serve a beneficial purpose: to gain services for children with special needs, to inform instruction by building on what students already know, to improve programs, or to provide evidence nationally or in the states about programmatic needs. The form, substance, and technical features of assessment should be appropriate for the use intended for assessment data. Moreover, the methods of assessment must be compatible with the developmental level and experiences of young children. Below, I consider the implications of these principles for three different categories of assessment purposes.

Identifying Children with Special Needs

I discuss identification for special education first because this is the type of assessment that most resembles past uses of developmental screening measures. However, there is no need for wholesale administration of such tests to all incoming kindergartners. If we take the precepts
of developmentally appropriate practices seriously, then at each age level a very broad range of abilities and performance levels is to be expected and tolerated. If potential handicaps are understood to be relatively rare and extreme, then it is not necessary to screen all children for "hidden" disabilities. By definition, serious learning problems should be apparent. Although it is possible to miss hearing or vision problems (at least mild ones) without systematic screening, referral for evaluation of a possible learning handicap should occur only when parents or teachers notice that a child is not progressing normally in comparison to age-appropriate expectations. In-depth assessments should then be conducted to verify the severity of the problem and to rule out a variety of other explanations for poor performance.

For this type of assessment, developmental measures, including I.Q. tests, continue to be useful. Clinicians attempt to make normative evaluations using relatively curriculum-free tasks, but today they are more likely to acknowledge the fallibility of such efforts. For such difficult assessments, clinicians must have specialized training in both diagnostic assessment and child development.

When identifying children with special needs, evaluators should use two general strategies in order to avoid confounding the ability to learn with past opportunity to learn. First, as recommended by the National Academy Panel on Selection and Placement of Students in Programs for the Mentally Retarded,11 a child's learning environment should be evaluated to rule out poor instruction as the possible cause of a child's lack of learning. Although seldom carried out in practice, this evaluation should include trying out other methods to support learning and possibly trying a different teacher before concluding that a child can't learn from ordinary classroom instruction. A second important strategy is to observe a child's functioning in multiple contexts. Often children who appear to be impaired in school function well at home or with peers. Observation outside of school is critical for children from diverse cultural backgrounds and for those whose home language is not English. The NAEYC stresses that "screening should never be used to identify second language learners as 'handicapped,' solely on the basis of their limited abilities in English."12

In-depth developmental assessments are needed to ensure that children with disabilities receive appropriate services. However, the diagnostic model of special education should not be generalized to a larger population of below-average learners, or the result will be the reinstitution of tracking. Elizabeth Graue and I analyzed recent efforts to create "at-risk" kindergartens and found that these practices are especially likely to occur when resources for extended-day programs are available only for the children most in need.13 The result of such programs is often to segregate children from low socioeconomic backgrounds into classrooms where time is spent drilling on low-level prereading skills like those found on readiness tests. The consequences of dumbed-down instruction in kindergarten are just as pernicious as the effects of tracking at higher grade levels, especially when the at-risk kindergarten group is kept together for first grade. If resources for extended-day kindergarten are scarce, one alternative would be to group children heterogeneously for half the day and then, for the other half, to provide extra enrichment activities for children with limited literacy experiences.

Classroom Assessments

Unlike traditional readiness tests that are intended to predict learning, classroom assessments should support instruction by modeling the dimensions of learning. Although we must allow considerable latitude for children to construct their own understandings, teachers must nonetheless have knowledge of normal development if they are to support children's extensions and next steps. Ordinary classroom tasks can then be used to assess a child's progress in relation to a developmental continuum. An example of a developmental continuum would be that of emergent writing, beginning with scribbles, then moving on to pictures and random letters, and then proceeding to some letter/word correspondences. These continua are not rigid, however, and several dimensions running in parallel may be necessary to describe growth in a single content area. For example, a second dimension of early writing - a child's ability to invent increasingly elaborated stories when dictating to an adult - is not dependent on mastery of writing letters, just as listening comprehension, making predictions about books, and story retellings should be developed in parallel to, not after, mastery of letter sounds.

Although there is a rich research literature documenting patterns of emergent literacy and numeracy, corresponding assessment materials are not so readily available. In the next few years, national interest in developing alternative, performance-based measures should generate more materials and resources. Specifically, new Chapter 1 legislation is likely to support the development of reading assessments that are more authentic and instructionally relevant.

For example, classroom-embedded reading assessments were created from ordinary instructional materials by a group of third-grade teachers in conjunction with researchers at the Center for Research on Evaluation, Standards, and Student Testing.14 The teachers elected to focus on fluency and making meaning as reading goals; running records and story summaries were selected as the methods of assessment.

But how should student progress be evaluated? In keeping with the idea of representing a continuum of proficiency, third-grade teachers took all the chapter books in their classrooms and sorted them into grade-level stacks, 1-1 (first grade, first semester), 1-2, 2-1, and so on up to fifth grade. Then they identified representative or marker books in each category to use for assessment. Once the books had been sorted by difficulty, it became possible to document that children were reading increasingly difficult texts with understanding. Photocopied pages from the marker books also helped parents see what teachers considered to be grade-level materials and provided them with concrete evidence of their child's progress.

Given mandates for student-level reporting under Chapter 1, state departments of education or test publishers could help develop similar systems of this type with sufficient standardization to ensure comparability across districts.

In the meantime, classroom teachers - or preferably teams of teachers - are left to invent their own assessments for classroom use. In many schools, teachers are already working with portfolios and developing scoring criteria. The best procedure appears to be having grade-level
teams and then cross-grade teams meet to discuss expectations and evaluation criteria. These conversations will be more productive if, for each dimension to be assessed, teachers collect student work and use marker papers to illustrate continua of performance. Several papers might be used at each stage to reflect the tremendous variety in children's responses, even when following the same general progression.

Benchmark papers can also be an effective means of communicating with parents. For example, imagine using sample papers from grades K-3 to illustrate expectations regarding "invented spelling." Invented spelling or "temporary spelling" is the source of a great deal of parental dissatisfaction with reform curricula. Yet most parents who attack invented spelling have never been given a rationale for its use. That is, no one has explained it in such a way that the explanation builds on the parents' own willingness to allow successive approximations in their child's early language development. They have never been shown a connection between writing expectations and grade-level spelling lists or been informed about differences in rules for first drafts and final drafts. Sample papers could be selected to illustrate the increasing mastery of grade-appropriate words, while allowing for misspellings of advanced words on first drafts. Communicating criteria is helpful to parents, and, as we have seen in the literature on performance assessment, it also helps children to understand what is expected and to become better at assessing their own work.

Monitoring National and

to "label, stigmatize, or classify any individual child or group of children."15 However, with this fearsome idea set aside, the Technical Planning Subgroup endorsed the idea of an early childhood assessment system that would periodically gather data on the condition of young children as they enter school. The purpose of the assessment would be to inform public policy and especially to help "in charting progress toward achievement of the National Education Goals, and for informing the development, expansion, and/or modification of policies and programs that affect young children and their families."16 Assuming that certain safeguards are built in, such data could be a powerful force in focusing national attention and resources on the needs of young children.

Unlike past testing practices aimed at evaluating individual children in com

dren in the primary grades. If an early childhood assessment were conducted periodically, it would be possible to demonstrate the relationship between health services and early learning and to evaluate the impact of such programs as Head Start.

In keeping with the precept that methods of assessment should follow from the purpose of assessment, the Technical Planning Subgroup recommended that sampling of both children and assessment items be used to collect national data. Sampling would allow a broad assessment of a more multifaceted content domain and would preclude the misuse of individual scores to place or stigmatize individual children. A national early childhood assessment should also serve as a model of important content. As a means to shape public understanding of the full range of abilities and experiences that influence early learning and development, the Technical Planning Subgroup identified five dimensions to be assessed: 1) physical well-being and motor development, 2) social and emotional development, 3) approaches toward learning, 4) language usage, and 5) cognition and general knowledge.

Responding to the need for national data to document the condition of children as they enter school and to measure progress on Goal 1, the U.S. Department of Education has commissioned the Early Childhood Longitudinal Study: Kindergarten Cohort. Beginning in the 1998-99 school year, a representative sample of 23,000 kindergarten students will be assessed and then followed through grade 5. The content of the assessments used will correspond closely to the dimensions recommended by the Technical Planning Subgroup. In addition, data will be col
State Trends parison with normative expectations, a lected on each child's family, communi
In 1989, when the President and the large-scale, nationally representative as ty, and school/program. Large-scale stud
nation's governors announced "readiness sessment would be used to monitor na ies of this type serve both program eval
for school" as the first education goal, tional trends. The purpose of such an as uation purposes (How effective are pre
many early childhood experts feared the sessment would be analogous to the use school services for children?) and re
creation of a national test for school en of the National Assessment of Education search purposes (What is the relationship
try. Indeed, given the negative history of al Progress (NAEP) to measure major between children's kindergarten experi
readiness testing, the first thing the Goal shifts in achievement patterns. For exam ences and their academic success through
1 Technical Planning Subgroup did was ple, NAEP results have demonstrated out elementary school?).
to issue caveats about what an early child gains in the achievement of black students National needs for early childhood da
hood assessment must not be. It should in the South as a result of desegregation, ta and local needs for program evaluation
not be a one-dimensional, reductionist and NAEP achievement measures showed information are similar in some respects
measure of a child's knowledge and abil gains during the 1980s in basic skills and and dissimilar in others. Both uses require
ities; it should not be called a measure of declines in higher-order thinking skills group data. However, a critical distinc
"readiness" as if some children were not and problem solving. Similar data are not tion that affects the methods of evaluation
ready to learn; and it should not be used now available for preschoolers or for chil is whether or not local programs share a

common curriculum. If local programs, such as all the kindergartens in a school district, have agreed on the same curriculum, it is possible to build program evaluation assessments from an aggregation of the measures used for classroom purposes. Note that the entire state of Kentucky is attempting to develop such a system by scoring classroom portfolios for state reporting.

If programs being evaluated do not have the same specific curricula, as is the case with a national assessment and with some state assessments, then the assessment measures must reflect broad, agreed-upon goals without privileging any specific curriculum. This is a tall order, more easily said than done. For this reason, the Technical Planning Subgroup recommended that validity studies be built into the procedures for data collection. For example, pilot studies should verify that what children can do in one-on-one assessment settings is consistent with what they can do in their classrooms, and assessment methods should always allow children more than one way to show what they know.

Conclusion

In the past decade, testing of 4-, 5-, and 6-year-olds has been excessive and inappropriate. Under a variety of different names, leftover I.Q. tests have been used to track children into ineffective programs or to deny them school entry. Prereading tests held over from the 1930s have encouraged the teaching of decontextualized skills. In response, fearing that "assessment" is just a euphemism for more bad testing, many early childhood professionals have asked, Why test at all? Indeed, given a history of misuse, the burden of proof must rest with assessment advocates to demonstrate the usefulness of assessment and to ensure that abuses will not recur. Key principles that support responsible use of assessment information follow.

* No testing of young children should occur unless it can be shown to lead to beneficial results.
* Methods of assessment, especially the language used, must be appropriate to the development and experiences of young children.
* Features of assessment - content, form, evidence of validity, and standards for interpretation - must be tailored to the specific purpose of an assessment.
* Identifying children for special education is a legitimate purpose for assessment and still requires the use of curriculum-free, aptitude-like measures and normative comparisons. However, handicapping conditions are rare; the diagnostic model used by special education should not be generalized to a larger population of below-average learners.
* For both classroom instructional purposes and purposes of public policy making, the content of assessments should embody the important dimensions of early learning and development. The tasks and skills children are asked to perform should reflect and model progress toward important learning goals.

In the past, local newspapers have published readiness checklists that suggested that children should stay home from kindergarten if they couldn't cut with scissors. In the future, national and local assessments should demonstrate the richness of what children do know and should foster instruction that builds on their strengths. Telling a story in conjunction with scribbles is a meaningful stage in literacy development. Reading a story in English and retelling it in Spanish is evidence of reading comprehension. Evidence of important learning in beginning mathematics should not be counting to 100 instead of to 10. It should be extending patterns; solving arithmetic problems with blocks and explaining how you got your answer; constructing graphs to show how many children come to school by bus, by walking, by car; and demonstrating understanding of patterns and quantities in a variety of ways.

In classrooms, we need new forms of assessment so that teachers can support children's physical, social, and cognitive development. And at the level of public policy, we need new forms of assessment so that programs will be judged on the basis of worthwhile educational goals.

1. Lorrie A. Shepard and Mary Lee Smith, "Escalating Academic Demand in Kindergarten: Counterproductive Policies," Elementary School Journal, vol. 89, 1988, pp. 135-45.
2. Sue Bredekamp, ed., Developmentally Appropriate Practice in Early Childhood Programs Serving Children from Birth Through Age 8, exp. ed. (Washington, D.C.: National Association for the Education of Young Children, 1987).
3. M. Thérèse Gnezda and Rosemary Bolig, A National Survey of Public School Testing of Pre-Kindergarten and Kindergarten Children (Washington, D.C.: National Forum on the Future of Children and Families, National Research Council, 1988).
4. Samuel J. Meisels, "Uses and Abuses of Developmental Screening and School Readiness Testing," Young Children, vol. 42, 1987, pp. 4-6, 68-73.
5. Lorrie A. Shepard and M. Elizabeth Graue, "The Morass of School Readiness Screening: Research on Test Use and Test Validity," in Bernard Spodek, ed., Handbook of Research on the Education of Young Children (New York: Macmillan, 1993), pp. 293-305.
6. Anne C. Stallman and P. David Pearson, "Formal Measures of Early Literacy," in Lesley Mandel Morrow and Jeffrey K. Smith, eds., Assessment for Instruction in Early Literacy (Englewood Cliffs, N.J.: Prentice-Hall, 1990), pp. 7-44.
7. Lorrie A. Shepard, "A Review of Research on Kindergarten Retention," in Lorrie A. Shepard and Mary Lee Smith, eds., Flunking Grades: Research and Policies on Retention (London: Falmer Press, 1989), pp. 64-78.
8. Lorrie A. Shepard and Mary Lee Smith, "Academic and Emotional Effects of Kindergarten Retention in One School District," in idem, pp. 79-107.
9. "Guidelines for Appropriate Curriculum Content and Assessment in Programs Serving Children Ages 3 Through 8," Young Children, vol. 46, 1991, pp. 21-38.
10. Ibid., p. 32.
11. Kirby A. Heller, Wayne H. Holtzman, and Samuel Messick, eds., Placing Children in Special Education (Washington, D.C.: National Academy Press, 1982).
12. "Guidelines," p. 33.
13. Shepard and Graue, op. cit.
14. The Center for Research on Evaluation, Standards, and Student Testing is located on the campuses of the University of California, Los Angeles, and the University of Colorado, Boulder.
15. Goal 1: Technical Planning Subgroup Report on School Readiness (Washington, D.C.: National Education Goals Panel, September 1991).
16. Ibid., p. 6.
