
The Corruption of School Accountability
How experience with quantitative measurements in other sectors can inform the use of high-stakes test scores in education

BY RICHARD ROTHSTEIN

It has become conventional to say that holding educators accountable and paying for higher test scores will improve performance. When New York City Mayor Michael Bloomberg recently announced the city would pay teachers bonuses where scores increase, he said, “In the private sector, cash incentives are proven motivators for producing results. The most successful employees work harder, and everyone else tries to figure out how they can improve as well.”

Real estate developer Eli Broad, whose foundation promotes incentive pay for teachers, added, “Virtually every other industry compensates employees based on how well they perform. … We know from experience across other industries and sectors that linking performance and pay is a powerful incentive.”

Yet the two billionaires’ statements were misleading about how other industries and sectors behave. In the private sector, pay is almost never based primarily on quantitative performance measures. Fewer firms than in the past now use commissions and piece rates for sales and production workers, and more firms award bonuses to professionals based largely on subjective supervisory evaluations.

NCLB Defenders

It is not hard to see why. Under No Child Left Behind, reliance solely on numerical measures, principally math and reading scores, to evaluate performance has corrupted schooling. Educators have responded rationally to incentives that, to devote more time to math and reading, spur reductions in social studies, science, art, music, physical education, cooperative learning and other character-building activities. Reductions have been most severe for disadvantaged students who are most in need of a balanced curriculum. Schools now focus on “bubble” students (those just below the proficiency point) to the detriment of students who are far behind or already above proficiency. Accountability for an unreliable single test score results in arbitrary classifications of some fine schools as failing, and some poor schools as adequate. Drills leading to limited long-term learning and “teaching to the test” have become commonplace. Some schools manipulate data, for example, by opportunistic assignment of students to subgroups where they do the least harm to school ratings.

Defenders of NCLB are quick to denounce teachers for such tactics. They suggest if only teacher quality were higher, teachers would get high test scores without teaching to the test or engaging in other shortcuts to high test scores that do not reflect true learning. But teachers’ responses to quantitative accountability systems are no worse than the responses of professionals and nonprofessionals in other fields to similar performance incentives. Such responses are inherent in simplistic quantitative evaluations.

14 The School Administrator June 2008
Corruption of education from NCLB-type accountability, therefore, should have been foreseen. After all, there are many commonplace examples of the harm that such systems can do. Those familiar with the instructional distortions and gaming that have characterized schools’ responses to NCLB and similar state test-score accountability policies should see obvious analogies in these examples from other fields.

• During the Vietnam War, Secretary of Defense Robert McNamara believed strongly in numerical measures and demanded reports of American and North Vietnamese body counts. Just as high reading test scores usually indicate reading proficiency, relative casualties usually indicate the fortunes of nations at war. But an army can be corrupted if local commanders are judged primarily by this relatively easily measured indicator, losing sight of political and economic objectives. High enemy body count numbers (sometimes contrived) misled American leaders into believing the war was being won.

• Motorists cited for trivial traffic violations may have experienced an accountability system in which commanders evaluate police officers by ticket quotas. Certainly, issuing citations for violations is one measure of good policing, but when officers are judged by this easily quantifiable outcome, they have incentives to focus on trivial offenses that meet a numerical goal rather than investigating serious crimes where payoffs may be less certain.

• The Federal Bureau of Investigation tracks local police clearance rates as a basis for evaluating effectiveness. The clearance rate is the percentage of reported crimes resulting in convictions. Just as high math scores characterize effective schools, high clearance rates characterize effective police departments. But as with math scores, the clearance rate indicator is corrupted when it becomes an end in itself.

Police increase the rate by offering reduced charges to suspects who confess to other crimes, including those they have not actually committed. Such plea bargains give detectives big boosts in clearance rates. Meanwhile, those pleading guilty only to the crime for which they were arrested typically get harsher sentences than those who falsely confess to multiple crimes.

• Television stations sell advertising at rates determined by viewership during designated sweeps months when a survey company, Nielsen, determines what programs typical viewers watch. The system assumes that sweeps programming is representative of programming throughout the year. Yet stations respond to these high-stakes surveys by scheduling programs that are more attention-grabbing than a typical month’s shows. When viewership numbers during sweeps months become ends in themselves rather than a reflection of year-round program popularity, they distort advertising rates.

• Several newspapers, most notably The New York Times, publish weekly best-seller lists. Books that make the list get special promotional displays in book stores, resulting in increased sales and authors’ royalties. The best-seller list is compiled from sales reports collected by the newspaper from a national sample of book outlets. But publishers can “teach to the test,” identifying stores to be sampled and organizing bulk purchases at them. The Times cannot always successfully monitor store sales to identify such artificial purchases that corrupt the representativeness of the index.

• U.S. News and World Report publishes an annual ranking of colleges, truly an accountability system because college boards of trustees sometimes consider the rankings when determining presidential compensation. Rankings are based partly on how selective a college is, determined by the percentage of applicants admitted. (More selective colleges admit a smaller percentage of applicants.)

Selectivity would be a reasonable indicator if there were no stakes attached to measuring it. Colleges that accept relatively few applicants are likely to have higher quality, but once this indicator became an accountability measure, colleges had incentives to boost their own rejection rates. Some send promotional mailings to unqualified applicants, drop application fees or send already-completed applications to high school seniors to sign. The indicator has thus lost much of its value.

• The U.S. Department of Transportation requires airlines to report the percentage of flights that departed and arrived on time, defined as within 15 minutes of the published schedule. The department, consumer groups and members of Congress who advocated such reporting believed that travelers would be more likely to choose airlines with better on-time performance, and this would be an incentive for the airlines to improve. To avoid incentives for airlines to hurry departures in unsafe conditions, flights delayed because of mechanical difficulties were excluded from the calculations.

Airlines responded by reporting more phony mechanical difficulties when flights were late. And they padded schedules: when more time was allotted, flights’ on-time performance improved. This did nothing to accomplish the Transportation Department’s stated objective, to improve on-time performance on previously published schedules, which were purported to be realistic.

• As a presidential candidate, Richard M. Nixon promised in 1968 to reduce crime nationwide. After his election, the Federal Bureau of Investigation publicly reported crime statistics by city. It judged whether police departments were effective by tracking the sum of crimes in several categories, one of which was serious larceny (where the loss was worth at least $50). Many cities subsequently posted significant reductions in crime. The biggest reductions were in larcenies of $50 and over in value.

Valuing larceny is a matter of judgment, so police departments placed lower values on losses after the accountability system was implemented. Although crime reportedly declined, the number of $49 larcenies increased.

• Other public sectors have had similar experiences. The federal government has held local job training agencies accountable for placing unemployed workers in jobs that last at least 90 days. Some agencies responded by providing child care and transportation to workers for the first 90 days of employment, terminating these services on the 91st day. Other agencies simply refused to enroll the most difficult-to-place unemployed workers. Others cut back on educational activities designed to train workers for higher-paying and longer-lasting jobs because only short-term employment counted in the accountability system.

• The Medicare system has issued report cards on health providers. One has been based on mortality rates of patients who undergo open heart surgery. Some hospitals and physicians responded simply by refusing to operate on the sickest patients. Because the accountability system attempted “risk adjustment,” statistically controlling for patient characteristics, other providers simply claimed the patients were sicker than they were. The distortions were so great that Medicare abandoned the system in 1993. The current administration has reinstituted it.

• Medicare also issues report cards on nursing homes, based on whether they meet 15 recognized quality standards, for example, the percentage of residents who have pressure sores (from being turned in bed too infrequently). Because nurses’ time is limited, if they spend more time complying with the turning-patients-in-bed standard for which they are held accountable, they may have less time to maintain hygienic standards by washing hands regularly, something for which they are not held accountable. Following the introduction of the report card, performance on the accountability indicators improved, but adherence to many other standards (like hand washing) declined, resulting in poorer overall quality in nursing homes.

The U.S. Government Accountability Office reviewed health care report cards and concluded: “[A]dministrators will place all their organizations’ resources in areas that are being measured. Areas that are not highlighted in report cards will be ignored.”

Gaming Incentives

For business organizations generally, quantitative measures of performance are used warily and never exclusively. Even stock prices or profit are not simple guides to public companies’ performance and potential. The Securities and Exchange Commission has complex regulations to prevent publicly traded firms from using numerical indicators to mislead investors. Yet financial data are still too complex for laypersons to interpret, which is why investors rely on sophisticated analysts, employed to discern the underlying and often nonquantifiable potential that stock prices or other easily measured characteristics might obscure. Equity markets can only exist because easily measured indicators are not transparent: buyers and sellers have different interpretations of what firms’ financial indicators mean.

Corporate gamesmanship further limits the ability of the SEC and private accounting standards to prevent the distortion of numerical performance incentives. Executives whose compensation is based partly on corporate earnings can maximize their bonuses by manipulating depreciation schedules for long-term assets; by varying whether shipments to or from inventories should be accelerated or delayed at the end of an accounting period; by transferring other revenues or expenses from one accounting period to another; by allocating overhead to inventories; and by shifting whether major repair activities, research and development or even advertising expenses should be capitalized or expensed.

Most private-sector jobs, like teaching, include a composite of easily measured and less-easily measured responsibilities. Adding multiple measures of accountability is, by itself, insufficient to minimize goal distortion if the added measures are also quantitative. For example, one of the nation’s largest banks determined that branch managers should not be rewarded only for short-term branch financials, but also for other measures that contributed to long-term profitability, such as customer satisfaction as determined by surveying customers. One manager boosted his ratings, and thus his bonuses, by serving free food and drinks, but this did nothing to boost the bank’s long-term financial prospects.

Because of the ease with which most employees game purely quantitative incentives, most private-sector accountability systems blend quantitative and qualitative measures, with emphasis on the latter. McDonald’s does not evaluate store managers by sales volume or profitability alone. Instead, a manager and his or her supervisor establish targets for easily quantifiable measures, but also less easily quantifiable product quality, service, cleanliness and personnel training because these factors may affect long-term profitability, as well as the reputation of other outlets over which a local manager has no control. Wal-Mart uses a similar system for professional employees, as do most other private organizations that engage in employee evaluation for purposes of pay.

Certainly, supervisory evaluations of employees are less reliable than objective, quantitative indicators. Supervisory evaluations may be tainted by favoritism, bias, inflation and even kickbacks or other forms of corruption. Yet the fact that subjective evaluations are so widely used, despite these flaws, suggests that most private-sector employers consider quantitative judgment even worse.

Accountability Scorecards

Managing accountability in the private sector is labor intensive. Bain and Co., the management consulting firm, advises clients that judgment of results should always focus on long-term, not short-term (and more easily quantifiable), goals. A company director estimated that at Bain itself, each manager devotes about 100 hours a year to evaluating five employees for purposes of its incentive pay system. “When I try to imagine a school principal doing 30 reviews, I have trouble,” he observed.

A widespread business reform in recent decades has been total quality management, inspired by W. Edwards Deming, who warned that businesses seeking to improve quality and thus long-term performance should eliminate work standards (quotas), eliminate management by numbers and numerical goals and abolish merit ratings and management by objective because all of these encourage employees to focus on short-term results.

A corporate accountability tool that has grown more recently in popularity is the balanced scorecard, also first proposed in the early 1990s because business management theorists concluded that quantifiable short-term financial results were not an accurate guide to future profitability. Firms’ goals were too complex to be reduced to a few quantifiable measures because predicting future performance relies not only on past financial success, but on subjective judgments of product quality, employee motivation, internal corporate cohesion and customer satisfaction and loyalty.

Curiously, the federal government administers a balanced scorecard approach, simultaneously with its test score-based No Child Left Behind Act. Each year since 1988, the U.S. Department of Commerce has made Malcolm Baldrige National Quality Awards for exemplary institutions in manufacturing and other business sectors. Numerical performance indicators play only a small role in award decisions. For the private sector, 450 out of 1,000 points are for “results,” although even here, some results, such as ethical behavior, social responsibility, trust in senior leadership, workforce capability and capacity, and customer satisfaction and loyalty are difficult or impossible to quantify.

The Baldrige award program and its principles were extended to health and education institutions in 1999. For school districts, only 100 of 1,000 points are for student learning outcomes, with other points awarded for subjectively evaluated measures, such as “how senior leaders’ personal actions reflect a commitment to the organization’s values.”

The most recent Baldrige award in elementary and secondary education was given in 2005 to the Jenks, Okla., school district. The Department of Commerce cited the school district’s test scores as well as low teacher turnover and innovative programs such as an exchange relationship with schools in China and the enlistment of residents in a long-term care facility to mentor kindergartners and pre-kindergartners. Yet in 2006 the Jenks district was deemed by NCLB to be substandard because students had failed for two consecutive years to make adequate yearly progress in reading test scores.

A good accountability system in education, one that takes account of both the easily measured and the subjectively evaluated indicators of quality, will be expensive, far more expensive than NCLB’s reliance on flawed standardized tests. In designing a good accountability system, policymakers should take to heart the calls of Mayor Bloomberg and Eli Broad to model incentives after those that are actually in use in the private sector.

Richard Rothstein is a research associate at the Economic Policy Institute in Washington, D.C. E-mail: rrothstein@epi.org

Photo caption: Richard Rothstein, a research associate with the Economic Policy Institute, speaks frequently to education audiences. Photo courtesy of Carlton Forbes.

Illustration © by David Clark
