STEPHEN BAKER

FINAL JEOPARDY
MAN vs. MACHINE
AND THE QUEST TO KNOW EVERYTHING

HOUGHTON MIFFLIN HARCOURT


BOSTON • NEW YORK
2011
Copyright © 2011 by Stephen Baker
All rights reserved
For information about permission to reproduce selections from this book,
write to Permissions, Houghton Mifflin Harcourt Publishing Company,
215 Park Avenue South, New York, New York 10003.
www.hmhbooks.com

Library of Congress Cataloging-in-Publication Data


Baker, Stephen, date.
Final Jeopardy: man vs. machine and the quest to know everything /
Stephen Baker.
p. cm.
Summary: “Researchers at IBM have launched a project to develop a machine
that can compete in the quiz show Jeopardy — and win. Early next year,
if all goes well, the machine will face off in a nationally televised match against two
kingpins of trivia: The reigning champion of Jeopardy, still to be determined,
and the game’s greatest all-time winner, Ken Jennings. Final Jeopardy,
by journalist Stephen Baker, carries readers on a captivating journey from
the IBM labs to the showdown in Hollywood. The story features brilliant
Ph.D.s, Hollywood moguls, knowledge-obsessed Jeopardy masters — and a very
special collection of silicon and circuitry named Watson. It is a classic match of
Man vs. Machine, not seen since the chess-playing computer Deep Blue bested the
world’s reigning grandmaster, Garry Kasparov. But Watson will need to do more
than churn through chess moves or find a relevant Web page. It will have to under-
stand language, including puns and irony, and master everything from history
and literature to science, arts, and entertainment” — Provided by publisher.
ISBN 978-0-547-48316-0
1. Natural language processing (Computer science) 2. Semantic computing.
3. Artificial intelligence. 4. Database management. 5. Watson (Computer)
6. Jeopardy (Television program)
I. Title.
QA76.9.N38B35 2011
006.3 — dc22
2010051653

Book design by Melissa Lotfy


Printed in the United States of America
doc 10 9 8 7 6 5 4 3 2 1
1

The Germ of the Jeopardy Machine

The Jeopardy machine’s birthplace — if a computer can
stake such a claim — was the sprawling headquarters of the
global research division named after its flesh-and-blood an-
cestor, IBM’s founder, Thomas J. Watson. In 1957, when IBM
presided over the rest of the infant computer industry, the
company cleared woods on a hill in Yorktown Heights, New
York, about forty miles north of midtown Manhattan, and
hired the Finnish-American architect Eero Saarinen to de-
sign a lab. If computing was the future, as seemed inevitable,
it was on this hill that a good part of it would be dreamed
up, modeled mathematically, and prototyped. Saarinen was
a natural choice to express this sparkling future in glass and
rock. A year earlier, he had designed the winged TWA Ter-
minal for the new Idlewild Airport (later called JFK). Before
that, he’d drawn up the majestic Gateway Arch that would
loom over St. Louis. In Yorktown, it was as if he had laid the
Gateway Arch on its side. The building, with three stories of
glass walls, curved along the top of the hill. For visitors stroll-
ing the wide corridors decades later, the combination of the
structure’s rough stone and the broad vistas of rolling hills still
delivered just the right message of wealth, vision, and permanence.
The idea for a Jeopardy machine, at least according to one
version of the story, dates back to an autumn day in 2004. For
several years, top executives at the company had been pushing
researchers to come up with the next Grand Challenge. In the
’90s, the challenge had been to build a computer that would
beat the reigning world chess champion. This produced Deep Blue. Its
1997 victory over Garry Kasparov turned into a global event
and fortified IBM’s reputation as a giant in cutting-edge com-
puting. (This grew more important as consumer and Web
companies, from Microsoft to Yahoo!, threatened to steal the
spotlight — and the young brainpower. Google was still just a
couple of grad students at Stanford.) Later, in another Grand
Challenge in the first years of the new century, IBM produced
Blue Gene, the world’s fastest supercomputer.
What would the next challenge be? On that fall day, a se-
nior manager at IBM Research named Charles Lickel drove
north from his lab, up the Hudson, to the town of Pough-
keepsie, and spent the day with a small team he managed.
That evening, the group went to the Sapore Steakhouse in
nearby Fishkill, where they could order venison, elk, or buf-
falo, or split a whopping fifty-two-ounce porterhouse steak
for two. There, something strange happened. At seven o’clock,
many of the diners stood up from their tables, their food un-
touched, and filed into the bar, which had a television set.
“The dining room emptied,” Lickel said. People were packed
in there, three rows deep, to see whether Ken Jennings, who
had won more than fifty straight matches on Jeopardy, would
win again. He did. A half hour later, the crowd returned to
their food, raving about the question-answering phenom. As
Lickel noted, their steaks had to have been stone cold.
Though he hadn’t watched much Jeopardy since he was a
kid, that scene in the bar gave him an idea for the next Grand
Challenge. What if an IBM computer could beat Ken Jen-
nings? (Other accounts have it that the vision for a Jeopardy
computer was already circulating along the corridors of the
Yorktown lab. The original idea, it turns out, is tough to
trace.)
In any event, Lickel pushed the idea. In the first meet-
ing, it provoked plenty of dissent. Chess was nearly as clean
and timeless as mathematics itself, a cerebral treasure handed
down through the ages. Jeopardy, by contrast, looked ques-
tionable from the get-go. Produced by a publicly traded com-
pany, Sony, and subject to ratings and advertisers, it was in
the business of making money and pleasing investors. It was
Hollywood, for crying out loud. “There was a lot of doubt
in the room,” Lickel said. “People wanted something more
obviously scientific.” A second argument was perhaps more
compelling: people playing Jeopardy would in all likelihood
annihilate an IBM machine. “They all grabbed me after the
meeting,” Lickel recalled, “and said, ‘Charles, you’re going to
regret this.’ ”
In the end, it was up to Paul Horn. A former professor of
physics at the University of Chicago, Horn had headed IBM’s
three-thousand-person research arm since 1996. “If you think
about smart machines,” he later said, “Blue Gene by some
measures had the raw computing power of the human brain,
at least within an order of magnitude or two.” Horn discussed
those early days in his sun-splashed office at New York Uni-
versity, where he took up residence after his 2008 retirement
from IBM. He had a black beard, and a tiny ponytail poked
out from the back of his head.
“So here we have a machine that’s as fast as your brain,
or close,” he said. “But it doesn’t think the way we think. So
what would be an appropriate grand challenge that would
have high visibility and excite people?” He didn’t remember
the idea coming from Lickel or hearing about the Fishkill din-
ner. In fact, Horn thought the idea might have come from
him. In any case, he liked it — and promptly ran into resis-
tance. “The general response was negative,” he recalled. “Peo-
ple said, ‘It can’t be done. It’s too much of a publicity stunt.
The only reason that you’re interested in it is because it’s a
show on TV.’ ” But Horn thought that building a savvy an-
swering machine was the ideal challenge for IBM. While he
maintained that he viewed the grand challenge as pure re-
search, it also made plenty of sense.
IBM’s business had undergone a radical transformation
over the course of Horn’s thirty-year career at the company.
As late as the 1970s, IBM ruled the computer industry. It
launched its first computers for business in 1952. But it was
its breakthrough mainframe in 1964, the System/360, that estab-
lished a single standard of computing in business, industry,
and science. IBM pitched itself as a safe, if expensive, bet for
companies looking to computerize. Its buttoned-down sales
and consulting teams spread a compelling message around
the world: “Nobody ever got fired for buying IBM.” Big Blue,
a name derived from the massive blue mainframes it sold,
grew so big that its rivals, including Sperry, Burroughs, Hon-
eywell, and four other companies, came to be known as the
Seven Dwarfs. During this time, IBM researchers at Saarin-
en’s edifice and at other labs around the world churned out
an array of new technologies. They came up with magnetic
stripes for credit cards and floppy disks for computer data
storage. Yet it was computers that drove the business. When
Horn arrived at IBM Research in 1979, the greatest threat
to IBM appeared to be a decade-long antitrust complaint
brought by the U.S. Justice Department. It alleged that IBM
had violated the Sherman Act by attempting to monopolize
the fast-growing industry for business computers. Whether or
not Big Blue had broken the law, its dominance was beyond
question.
By 1982, when the Justice Department dropped the suit
for lack of evidence, the computer world was shifting under
Big Blue’s feet. The previous year, IBM had unveiled its first
personal computer, or PC. Priced at $1,500, it provided both
legitimacy and a standard for the young industry. Early on,
as corporate customers gobbled up PCs, it seemed as though
IBM would go on to dominate this next stage of comput-
ing. But there was a crucial difference between these desk-
top machines and the mainframes. Nearly every component
of the mainframes, including their processors and software,
was made by IBM. In the lingo of the industry, the comput-
ers were vertically integrated. This was not the case with PCs.
In order to get to market quickly at a low price, IBM built
them from off-the-shelf technology — microprocessors from
Intel and a rudimentary operating system, MS-DOS, from a
Seattle startup called Microsoft. Since the PC had commod-
ity innards, it took no time at all for newcomers, including
Compaq and Taiwan’s Acer, to plug them into cheaper “IBM-
compatible” computers, or clones. IBM found itself slugging
it out with a slew of upstarts while Intel and Microsoft ran
away with the profits and grew into titans. Big Blue was in de-
cline, falling faster than most people imagined. And in 1992,
the vast industrial behemoth stunned the business world by
registering a $4.97 billion loss, the largest in U.S. history at
the time. In the space of a decade, a company that had been
synonymous with cutting-edge technology now looked tired
and wasteful, a manufacturing titan ill-suited to the Information Age. It almost went under.
A new chief executive, Louis V. Gerstner, arrived in 1993
and transformed IBM. He sold off or shuttered old manufac-
turing divisions and steered the company toward businesses
based on information. IBM did not have to sell machinery to
be a leader in technology, he said. It could focus on the intel-
ligence to run the technology — the software — along with the
know-how to put the systems to good use. That was services,
including consulting, and it led IBM back to growth.
Technology, in the early ’90s, was convulsing entire indus-
tries, and the new World Wide Web promised even more dra-
matic change. IBM’s customers, which included virtually ev-
ery blue-chip company on the planet, were confused about
how these new networks and services fit into their businesses.
Did it make sense to shift design work to China or India and
have teams work virtually? Should they remake customer ser-
vice around the Web? They had loads of questions, and IBM
decided it could sell the answers. It could even take over tech
operations for some of its customers and charge for the ser-
vice.
This push toward services and software continued under
Gerstner’s successor, Samuel J. Palmisano. Two months after
Charles Lickel came back from Poughkeepsie with the idea
for a computer that could play Jeopardy, IBM sold
its PC division to Lenovo Group of China. That year IBM
Global Services registered $40 billion in sales, more than the
$31 billion in hardware sales and a much larger share of prof-
its. (By 2009, services would grow to $55 billion, nearly 60
percent of the company’s revenue. And the consultants work-
ing in the division sold lots of IBM software, which registered
$21 billion in sales.) Naturally, a Jeopardy computer would run
on IBM hardware. But the heart of the system, like IBM it-
self, would be the software created to answer difficult ques-
tions.
A Jeopardy machine would also respond to another change
in technology: the move toward human language. For most
of the first half-century of the computer age, machines spe-
cialized in orderly rows of numbers and words. If the buyers
in a database were listed in one column, the products in an-
other, and the prices in a third, everything was clear: Com-
puters could run the numbers in a flash. But if one of the cus-
tomers showed up as “Don” in one transaction and “Donny”
in another, the computer viewed them as two people: The
two names represented different strings of ones and zeros,
and therefore Don ≠ Donny. Computers had no sense of lan-
guage, much less nicknames. In that way, they were clueless.
The world, and all of its complexity, had to be simplified,
structured, and spoon-fed to these machines.
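To make the mismatch concrete, here is a toy Python sketch (not IBM’s code; the nickname table is an invented stand-in) showing why a literal string comparison treats Don and Donny as strangers until someone hand-feeds it the connection:

```python
# Toy illustration, not IBM's code: to a literal string comparison,
# "Don" and "Donny" are simply different sequences of characters.
# The nickname table below is an invented stand-in for the kind of
# hand-built structure early systems needed to bridge the gap.
NICKNAMES = {"don": "donald", "donny": "donald"}

def same_customer(a: str, b: str) -> bool:
    canonical_a = NICKNAMES.get(a.lower(), a.lower())
    canonical_b = NICKNAMES.get(b.lower(), b.lower())
    return canonical_a == canonical_b

print("Don" == "Donny")               # False: Don != Donny, as the text says
print(same_customer("Don", "Donny"))  # True, but only because we spoon-fed it
```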
But consider what hundreds of millions of ordinary peo-
ple were using computers for by 2004. They were e-mailing
and chatting. Some were signing up for new social networks.
(Facebook launched in February of that year.) Online human-
ity was creating mountains of a messy type of digital data: hu-
man language. Billions of words were rocketing through net-
works and piling up in data centers. Those words expressed
what millions of people were thinking, desiring, fearing, and
scheming. The potential customers of IBM’s clients were out
there spilling their lives. Entire industries grew by understand-
ing what people were saying and predicting what they might
want to do, where they might want to go, and what they were
eager to buy. Google was already mining and indexing words
on the Web, using them to build a media and advertising empire. Only months earlier, Google had debuted as a publicly
traded company, and the new stock was sky-rocketing.
IBM wasn’t about to mix it up with Google in the com-
mercial Web. But Big Blue needed state-of-the-art tools to
provide its corporate customers with the fastest and most in-
sightful read of the words cascading through their networks.
To keep a grip on its gold-plated consulting business, IBM re-
quired the very smartest, language-savvy technology — and it
needed its customers to know and trust that it had it. It was
central to IBM’s brand.
So in mid-2005 Horn took up the challenge with a num-
ber of his top researchers, including David Ferrucci. A twelve-year
veteran at the company, Ferrucci managed a handful of re-
search teams, including the five people who were teaching
machines to answer simple questions in English. Their disci-
pline was called question-answering. Ferrucci knew the chal-
lenges all too well. The machines stumbled in understanding
English and appeared to plateau, in competitions sponsored
by the U.S. government, at a success rate of about 35 percent.
Ferrucci wasn’t a big Jeopardy fan, but he was familiar
enough with it to appreciate the obstacles involved. Jeopardy tested
a combination of knowledge, speed, and accuracy, along with
game strategy. The show featured three contestants, each with
a buzzer. In the course of about twenty minutes, they raced
to respond to sixty clues representing a combined value of
$54,000. Each one — and this was a Jeopardy quirk — was in
fact an answer, some far more complex than others. The con-
testant had to provide the missing question. For example, in
an unusual Tournament of Champions game that aired in
November 1994, contestants were presented with this $500
clue* under the category Furniture: “French term for a what-
not, a stand of tiered shelves with slender supports used to
display curios.” The host, Alex Trebek, read the clue from the
big game board. The moment he finished, a panel around the
question lit up, setting off the race to buzz. On average, con-
testants had about four seconds to read and consider the clue
before buzzing. The first to buzz was, in effect, placing a bet.
The right response — “What is an étagère?” — was worth $500
and gave the contestant the chance to pick again. (“Let’s try
European Capitals for $200.”) A botched response wiped the
same amount from a contestant’s score and gave the other two
a chance to try. (In this example, no one dared to buzz. Such a
clue, uncommon in Jeopardy, is known as a “triple-stumper.”)

* In Jeopardy, the answers on the board are called “clues,” and the players’
questions — what most viewers perceive as answers — are “responses.”

To compete in Jeopardy, a machine not only would need
to come up with the answer, posed as a question, within four
seconds, but it would also have to gauge its confidence in
its response. It would have to know what it knew. “Humans
know what they know like that,” Ferrucci said later, snap-
ping his fingers. Replicating such confidence in a computer
would be tricky. What’s more, the computer would have to
calculate the risk according to where it stood in the game. If it
was far ahead and had only middling confidence on “étagère,”
it might make more sense not to buzz. In addition to piling
up knowledge, a computer would have to learn to play the
game.
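What such a buzz decision might look like in code is easy to sketch, if not to get right. The following Python fragment is purely illustrative; the threshold, the lead test, and the scores are invented, not Watson’s actual strategy:

```python
# Illustrative sketch of the buzz decision described above: weigh
# confidence in the answer against the clue's value, the penalty for
# a wrong response, and the current standings. All numbers invented.
def should_buzz(confidence: float, clue_value: int,
                my_score: int, best_rival_score: int) -> bool:
    # A right answer wins clue_value; a wrong one loses the same amount.
    expected_gain = confidence * clue_value - (1 - confidence) * clue_value
    # Far ahead with only middling confidence (the "etagere" case in the
    # text), the safer play is not to buzz at all.
    far_ahead = my_score - best_rival_score > 2 * clue_value
    if far_ahead and confidence < 0.8:
        return False
    return expected_gain > 0

print(should_buzz(0.6, 500, my_score=9000, best_rival_score=2000))  # False
print(should_buzz(0.6, 500, my_score=1000, best_rival_score=1200))  # True
```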
Complicating the game strategy were four wild cards.
Three of the game’s sixty hidden clues were so-called Daily
Doubles. In that 1994 game, a contestant named Rachael
Schwartz, an attorney from Bedminster, New Jersey, asked for
the $400 clue in the Furniture category. Up popped a Daily
Double giving her the chance to bet some or all of her money
on a furniture-related clue she had yet to see. She wagered
$500, a third of her winnings, and was faced with this clue:
“This store fixture began in 15th century Europe as a table
whose top was marked for measuring.” She missed it, guess-
ing, “What is a cutting table?,” and lost $500. (“What is a
counter?” was the correct response.) It was early in the game
and didn’t have much impact. The three players were all
around the $1,000 mark. But later in a game, Ferrucci saw,
Daily Doubles gave contestants the means to storm back from
far behind. A computer playing the game would require a
clever game program to calibrate its bets.
The biggest of the wild cards was Final Jeopardy, the last
clue of the game. As in Daily Doubles, contestants could bet
all or part of their winnings on a single category. But all three
contestants participated — as long as they had positive earn-
ings. Often the game boiled down to betting strategies in Final
Jeopardy. Take that 1994 contest, in which the betting took a
strange turn. Going into Final Jeopardy, Rachael Schwartz led
Kurt Bray, a scientist from Oceanside, California, by a slim
margin, $9,200 to $8,600. The category was Historic Names.
To lock down a win, she had to assume he would bet every-
thing, reaching $17,200. A bet of $8,001 would give her one
dollar more, provided she got it right. But if they both bet
big and missed, they might fall to the third-place contestant,
Brian Moore, a Ph.D. candidate from Pearland, Texas. In the
minute or so that they took to place their bets, the two leaders
had to map out the probabilities of a handful of different sce-
narios. They wrote down their dollar numbers and waited for
the clue: “Though he spent most of his life in Europe, he was
governor of the Bahamas for most of World War II.”
The second-place player, Bray, was the only one to get it
right: “Who was Edward VIII?” Yet he had bet only $500. It
was a strange number. It placed him $100 behind the leader,
not ahead of her. But the bet kept him beyond the reach of
the third-place player. Most players bet at least something on
a clue. If Schwartz had wagered and missed, he would win.
Indeed, Schwartz missed the clue. She didn’t even bother
guessing. But she had bet nothing, leaving herself $100 ahead
and winning the game.
The betting in Final Jeopardy, Ferrucci saw, might actually
play to the strength of a computer. A machine could analyze
betting patterns over thousands of games. It could crunch the
probabilities and devise optimized strategies in a fraction of a
second. “Computers are good at that kind of math,” he said.
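As a taste of why, here is a short Python sketch that replays the wagers from the 1994 game above; the two-player simplification and the function are illustrative, not an actual betting engine:

```python
# Replaying the 1994 Final Jeopardy wagers described above. The
# two-player simplification is for clarity; a real strategy module
# would weigh all three players and thousands of historical games.
def final_score(score: int, bet: int, right: bool) -> int:
    return score + bet if right else score - bet

schwartz, bray = 9200, 8600
shutout = 2 * bray - schwartz + 1     # $8,001, the bet discussed above
print(f"Schwartz's shut-out wager: ${shutout}")

# Actual bets: Schwartz $0, Bray $500. Enumerate every branch.
for s_right in (True, False):
    for b_right in (True, False):
        s = final_score(schwartz, 0, s_right)
        b = final_score(bray, 500, b_right)
        print(f"Schwartz right={s_right}, Bray right={b_right}: "
              f"${s} vs ${b}")
# Schwartz finishes with $9,200 in every branch; Bray tops out at
# $9,100, so her zero bet wins the game, exactly as it played out.
```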
It was the rest of Jeopardy that appeared daunting. The
game featured complex questions and a wide use of puns, pos-
ing trouble for literal-minded computers. Then there was
Jeopardy ’s nearly boundless domain. Smaller and more specific
subject areas were easier for computers, because they offered
a more manageable set of facts and relationships to master.
They provided context. A word like “leak,” for example, had
a specific meaning in deep-sea drilling, another in heart sur-
gery, and a third in corporate press relations. A know-it-all
computer would have to recognize different contexts to keep
the meanings clear. And Jeopardy ’s clues took the concept of a
broad domain to a near-ludicrous extreme. The game had an
entire category on Famous Understudies. Another was on the
oft-forgotten president Rutherford B. Hayes. Worse, from a
computer architect’s point of view, the game demanded an-
swers within seconds — and penalized players for getting them
wrong. A Jeopardy machine, just like the humans on the show,
would have to store all of its knowledge in its internal mem-
ory. (The challenge, IBM figured, wouldn’t be nearly as im-
pressive if a bionic player had access to unlimited informa-
tion on the Web. What’s more, Jeopardy would be unlikely to
accept a Web-surfing contestant, since others didn’t have the
same privilege.) Beating humans in Jeopardy, it seemed, was
more than a stretch goal. It appeared impossible and spelled
potential disaster for researchers. To embarrass the company
on national television — or, more likely, to flame out before
even getting there — was no way to manage a career.
Ferrucci’s pessimism was also grounded in experience. In
annual government competitions, known as TREC (Text Re-
trieval Conference), his question-answering (Q-A) team de-
veloped a system called Piquant. It struggled far below Jeop-
ardy levels with a much easier test. In TREC, the competing
teams were each given a relatively small “corpus” of about one
million documents. They then had to train the machines to
answer questions based on the material. (In one version from
2004, several of the questions had to do with Tom Cruise and
his ex-wife.)
In answering these questions, the computer, for all its
processing power and memory, resembled nothing so much
as a student with serious brain damage. An apparently sim-
ple question could tie it in knots. In 2005, it was asked:
“What is Francis Scott Key best known for?” The first job was
to determine which of those words represented the subject of
the question, the “entity,” and whether that might be a person,
a state, or perhaps an animal or a machine. Each one had dif-
ferent characteristics. “Francis” and “Scott” looked like names.
But “Key”? That could be a metal tool to open doors or a men-
tal breakthrough to solve problems. In its hunt, the computer
might even spend a millisecond or two puzzling over Key lime
pies. Clearing up these doubts might require a visit to the sys-
tem’s “disambiguation” unit, where the answering program
consulted a dictionary or looked for contextual clues in the
surrounding words. Could “Key” be something the ingenious
Francis Scott invented, collected, planted, or stole? Could he
have baked it? Probably not. The structure of the question,
with no direct object, made it look like the third name of a
person. The capital K on Key strengthened that case.
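A toy version of that last step is easy to imagine. The Python sketch below leans on just the two surface clues the text mentions, capitalization and the phrase that follows the candidate name; real systems like Piquant used full parsers and dictionaries, so this is an illustration, not their method:

```python
# Toy sketch of the disambiguation step described above: decide
# whether "Key" is the tail of a person's name or a common noun.
# It uses only the two surface signals from the text; anything a
# real question-answering pipeline does is far more elaborate.
import re

def looks_like_person(question: str) -> bool:
    # Look for a run of two or more capitalized words, e.g. "Francis Scott Key".
    match = re.search(r"(?:[A-Z][a-z]+\s+)+[A-Z][a-z]+", question)
    if not match:
        return False
    rest = question[match.end():].strip()
    # "best known for" has no direct object, so nothing is being done
    # *with* a key; treat the whole capitalized run as a single name.
    return rest.startswith(("best known", "is known", "was known"))

print(looks_like_person("What is Francis Scott Key best known for?"))  # True
```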
A person confronting that question either knew or did not
know that Francis Scott Key wrote the U.S. national anthem,
“The Star-Spangled Banner.” But he or she wasted no time
searching for the subject and object in the sentence or won-
dering if it was a last name, a metal tool, or a tangy South
Florida dessert.
For the machine, things only got worse. The question
lacked a verb, which could disorient the computer. If the
question were, “What did Francis Scott Key write?” the ma-
chine could likely find a passage of text with Key writing
something, and that something would point to the answer.
The only pointer here — “is known for” — was maddeningly
vague. Assuming the computer had access to the Internet (a
luxury it wouldn’t have on the show), it headed off with noth-
ing but the name. In Wikipedia, it might learn that Key was
“an American lawyer, author and amateur poet, from George-
town, who wrote the words to the United States national an-
them, ‘The Star-Spangled Banner.’ ” For humans, the answer
was right there. But the computer, with no verb to guide it,
might answer that Key was known as an amateur poet or a
lawyer from Georgetown. In the TREC competitions, IBM’s
Piquant botched two out of every three questions.
All too often, the system failed to understand the question
or to put it in the right context. For this, a growing school of
Artificial Intelligence argued, systems needed to spend more
time in the computer equivalent of infancy, mastering the
concepts that humans take for granted: time, space, and the
basic laws of cause and effect.
Toddlerhood is a tribulation for computers, because it rep-
resents knowledge that is tied to the human experience: the
body and the senses. While crawling, we learn about space
and physical objects, and we get a sense of time. The toddler
reaches for the jar on the table. Moments later pieces of it
lie scattered on the floor. What happened between those two
states? It fell. Such lessons establish notions of before and af-
ter, cause and effect, and the nature of gravity. These experi-
ences, most of them accompanied by a steady stream of hu-
man language, set the foundation for practically everything
we learn. “You crawl around and bump into things,” said Da-
vid Gunning, a senior manager at Vulcan Inc., an AI incuba-
tor in Seattle. “That’s basic research.” It isn’t just jars that fall,
the toddler notices. Practically everything does. (Certain
balloons, which seem magical, are exceptions.) The child turns
these observations into theory. Unlike computers, humans
generalize.
Even the metaphors in our language lead back to the tum-
bles and accidents seared into our consciousness in our early
years. We “fall” for a sales pitch or “fall” in love, and we cringe
at hearing “sharp” words or “stinging” rebukes. We process
such expressions on such a basic level that they seem closer to
feeling than thought (though for humans, unlike computers,
the two are intertwined). Over the course of centuries, these
metaphors infused language and, consequently, were funda-
mental to understanding Jeopardy clues. Yet to a machine with
no body or experience in the physical world, each one was a puzzle.
In some Artificial Intelligence labs, scientists were at-
tempting to transmit these elementary experiences to comput-
ers. Sajit Rao, a professor at MIT, was introducing comput-
ers equipped with vision to rumpus-room learning, showing
them objects moving, falling, obstructing paths, and piling on
top of one another. The goal was to establish a conceptual un-
derstanding so that eventually computers could draw conclu-
sions from visual observations. What would happen, for ex-
ample, when vehicles blocked a road?
Several years later, the U.S. Defense Department’s Ad-
vanced Research Projects Agency (DARPA) would fund Rao’s
research for a program called Mind’s Eye. The idea was to
teach machines not only to recognize objects but to be able
to reason about what they were doing, where they might have
come from. This work, they hoped, would lead to smart sur-
veillance cameras, which would mean that computers could
replace humans in the tedious and exhausting task of moni-
toring a spot — what the Pentagon calls “persistent stare.” In-
stead of simply recording movements, these systems would
interpret them. If a man in Afghanistan went into a build-
ing carrying a package and emerged without it, the system
would conclude that he had left it there. If he walked toward
another person with a suitcase in his hand, it would predict
that he was going to give it to him. A seeing and thinking ma-
chine that could generate hypotheses based on observations
might zero in on potential roadside bombs or rooftop snipers.
This type of intelligence, according to DARPA, would extend
computer surveillance from objects to actions — from nouns
to verbs.
This skill required the computer to understand relation-
ships — precisely the stumbling block of IBM’s Piquant as it
struggled with questions in the TREC competition. But po-
tential breakthroughs such as Mind’s Eye were still in the in-
fant stage of research and wouldn’t be ready for years — cer-
tainly not in time to give a Jeopardy machine a dose of human
smarts. What’s more, Ferrucci was busy managing another
big software project. So after consulting his team and assem-
bling the discouraging evidence, he broke the news to a disap-
pointed Paul Horn. His team would not pursue the Jeopardy
challenge. It was just too hard to guarantee results on a sched-
ule.
Free of that distraction, the Q-A team returned to its
work, preparing Piquant for the next TREC competition. As
it turned out, though, Ferrucci had won them only a respite,
and a short one at that. Months later, in the summer of 2006,
Horn returned with exactly the same question: How about
Jeopardy?
Reluctantly, Ferrucci and his small Q-A team gathered in
a small room at the Hawthorne research center, a ten-minute
drive south from Yorktown. (It was a far less elegant structure,
a cuboid of black glass in an office park. But unlike Yorktown,
where the public spaces were bathed in natural light and the
offices windowless, Hawthorne’s offices did have views, mostly
of parking lots.) The discussion followed the familiar, depress-
ing lines: the team’s travails in the TREC competitions, the in-
sanely broad domain of Jeopardy, and the difficulty of coming
up with answers and a betting strategy in three to five sec-
onds. TREC had no time limit at all, and the computer often
churned away for minutes trying to answer a single question.
While the team talked, Ferrucci sat at the back of the
room, uncharacteristically quiet. He had a laptop open and
was typing away. He was looking up Jeopardy clues online and
then searching for answers on Google. The answers certainly
didn’t pop up. But in many cases, the search engine led to the
right neighborhood. He started thinking about the technolo-
gies needed to refine Google’s vague pointer to a precise an-
swer. It would require much of the tech muscle of IBM. He’d
have to bring in top natural-language researchers and experts
in machine learning. To speed up the answering process, he’d
need to spread out the computing to hundreds or even thou-
sands of machines. This would require a crack hardware unit.
His team would also need to educate the machine in strategy.
Ferrucci had a few colleagues who focused on game theory.
Several of them were training computers to play the Japanese
game Go (whose computational complexity made chess look
like Tic-Tac-Toe). Putting together all the pieces of this elec-
tronic brain would require a large multidisciplinary team and
a huge investment — and even then they might fail. But the
prospect of success, however remote, was tantalizing. Ferrucci
looked up from his computer and said, “Hey, I think we can
do this.”

At the dawn of Artificial Intelligence (AI), a half century ago,
scientists predicted that computers would soon be speaking
and answering questions fluently. A pioneer in the field, Her-
bert Simon, predicted in 1965 that “machines w[ould] be ca-
pable, within twenty years, of doing any work a man can do.”
These were the glory days of AI, a period of boundless vision
and bounteous funding. Machines, it seemed, would soon
master language, recognize faces, and maneuver, as robots, in
factories, hospitals, and homes. In short, computer scientists
would create a superendowed class of electronic servants. This
led, of course, to failed promises, to such a point that Artifi-
cial Intelligence became a term of derision. Bold projects to
build bionic experts and conversational computers lost their
sponsors. A long AI winter ensued, lasting through much of
the ’80s and ’90s.
What went wrong? In retrospect, it seems almost incon-
ceivable that leading scientists, including Nobel laureates like
Simon, believed it would be so easy. They certainly appreci-
ated the complexity of the human brain. But they also realized
that a lot of that complexity was tied up in dreams, memories,
guilt, regrets, faith, desires, along with the controls to main-
tain the physical body. Machines wouldn’t have to bother with
those details. All they needed was to understand the elements
of the world and how they were related to one another. Ma-
chines trained in the particulars of sick people, ambulances,
and hospitals, for example, could conceivably devote their
analytical skills to optimizing emergency services. Yet teach-
ing the machines proved extraordinarily difficult. One of the
biggest challenges was to anticipate the responses of humans.
The machines weren’t up to it. And they had serious trouble
with even the most basic forms of perception, such as seeing.
For example, researchers struggled to teach machines to per-
ceive the edges of things in the physical world. As it turned
out, it required experience and knowledge and advanced pow-
ers of pattern recognition just to look through a window and
understand that the oak tree in the yard was a separate entity.
It was not connected to the shed on the other side of it or a
pattern on the glass or the wallpaper surrounding the win-
dow.
The biggest obstacle, though, was language. In the early
days, it looked beguilingly easy. It was just a matter of pro-
gramming the machine with vocabulary and linking it all to-
gether with a few thousand rules — the kind you’d find in a
grammar book. If the machine still underperformed? Well, just give it more vocabulary, more rules.
Once the electronic brain mastered language, it was sim-
ply a question of teaching it about the world. Asia’s over there.
This is the United States. We have a democracy. That’s the
Pacific Ocean between the two. It’s big, and wet. If research-
ers kept adding facts, millions of them, and defining their re-
lationships, by the end of the grant cycle they might have a
talking, thinking machine that “knew” what humans did.
Language, of course, turns out to be far more complicated.
Jaime Carbonell, a top researcher at Carnegie Mellon Univer-
sity, has been teaching language to machines for decades. The
way he describes it, our minds are swimming with cultural
and historical allusions, accumulated over millennia, along
with a complex scheme of who’s who. Words, when spoken
or read, vary wildly according to context. (Just imagine if the
cops in New York raced off to Citi Field, sirens wailing, ev-
ery time someone was heard saying, “The Mets are getting
killed!”)
Carbonell, sitting in his Pittsburgh office, gave another ex-
ample. He issued a statement: “I want a juicy hamburger.”
What does it mean? Well, if a child says it to his mother, it’s
a request or a plea. If a general says it to a corporal, it’s a tacit
command. And if a prisoner says it to a cellmate, it might be
nothing more than a wish. Scientists, of course, could attempt
to teach a computer those variables as rules. But new layers of
complexity pop up. Is the general a vegan or speaking sarcas-
tically? Or maybe “hamburger” means something entirely dif-
ferent in prison lingo?
This flexibility isn’t a weakness of language but a strength.
Humans need words to be inexact; if they were too precise,
each person would have a unique vocabulary of several billion
words, all of them unintelligible to everyone else. You might
have a unique word for the sip of coffee you just took at 7:59
a.m., which was flavored with the anxiety about the traffic in
the Lincoln Tunnel or along Paris’s Périphérique. (That single
word would be as useless to you as to everyone else. A word
has to be used at least twice to have any purpose.)
Each word is a lingua franca, a fragment of a clumsy com-
mon language. Imagine a man saying a simple sentence to a
friend: “I’m weary.” He’s thinking about something, but what
is it? Has he carried a load a long way in the sun? Does he
have a sick child or financial troubles? His friend certainly
has different ideas, based on his own experience, about what
“weary” means. In addition to the various contexts, it might
send other signals. Maybe where he comes from, the word has
a slightly rarefied feel, and he’s wondering whether his friend
is trumpeting his sophistication. Neither one knows exactly
what the other is thinking. But that single word, “weary,” ex-
tends an itsy bridge between them.
Now, with that bridge in place, the word shared, they dig
deeper to see if they can agree on its meaning. They study each
other’s expression and tone of voice. As Carbonell noted, con-
text is crucial. Someone who has won the Boston Marathon
might be contentedly weary. Another, in a divorce hearing, is
anything but. One person may slack his jaw in an exaggerated
way, as if to say “Know what I mean?” In this tiny negotia-
tion, far beyond the range and capabilities of machines, two
people can bridge the gap between the formal definition of a
word and what they really want to say.

It’s hard to nail down the exact end of AI winter. A certain
thaw set in when IBM’s computer Deep Blue bested Garry
Kasparov in their epic 1997 showdown. Until that match, hu-
man intelligence, with its blend of historical knowledge, pat-
tern recognition, and the ability to understand and anticipate
the behavior of the person across the board, ruled the game.
Human grandmasters pondered a rich set of knowledge, jew-
els that had been handed down through the decades — from
Bobby Fischer’s use of the Sozin Variation in his 1972 match
with Boris Spassky to the history of the Queen’s Gambit De-
clined. Flipping through scenarios at about three per second — a
glacial pace for a computing machine — these grandmasters
looked for a flash of inspiration, an insight, the hallmark of
human intelligence.
Equally important, chess players tried to read the minds
of their foes. This is a human specialty, a mark of our intel-
ligence. Cognitive scientists refer to it as “theory of mind”;
children develop it at about age four. It’s what enables us to
imagine what someone else is experiencing and to build large
and convoluted structures based on such analysis. “I wonder
what he was thinking I knew when I told him . . .” Most fic-
tion, from Henry James to Elmore Leonard, revolves around
this very human analysis, something other species — and com-
puters — cannot even approach. (It’s also why humans make
such expert liars.)
Unlike previous AI visions, in which a computer would
“think” more or less the way we do, Deep Blue set off on a dif-
ferent course. It played on the strengths of a supercomputer:
a fabulous memory and extraordinary calculating speed. Sta-
tistical approaches to machine intelligence had been around
since the dawn of AI, but the numbers mavens had never wit-
nessed anything approaching this level of computing power
and speed. Deep Blue didn’t try to read Garry Kasparov’s
mind, and it certainly didn’t count on flashes of inspiration.
Instead, it raced through a century of grandmaster games, an-
alyzing similar moves and situations. It then constructed the
most probable scenarios for each possible move. It analyzed
two hundred million moves per second (nearly seventy mil-
lion for each one the humans considered). A similar approach
for a computer writing poetry would be to scrutinize the pat-
terns and vocabulary of every poem ever written before choos-
ing each word.
Forget inspiration, creativity, or blinding insight. Deep
Blue crunched data and won its match by juggling statistics,
testing thousands of scenarios, and calculating the odds. Its
intelligence was alien to human beings — if it could be con-
sidered intelligence at all. IBM at the time described the ma-
chine as “less intelligent than the stupidest person.” In fact,
the company stressed that Deep Blue did not represent AI,
since it didn’t mimic human thinking. But the Deep Blue
team made good on a decades-old promise. They taught a
machine to win a game that was considered uniquely human.
In this, they passed a chess version of the so-called Turing test,
an intelligence exam for machines devised by Alan Turing, a
pioneer in the field. If a human judge, Turing wrote, were to
communicate with both a smart machine and another hu-
man, and that judge could not tell one from the other, the
machine passed the test. In the limited realm of chess, Deep
Blue aced the Turing test — even without engaging in what
most of us would recognize as thought.
But knowledge? That was another challenge altogether.
Chess was esoteric. Only a handful of specially endowed peo-
ple had mastered the game. Yet all of us played the knowledge
game. By advancing from chess to Jeopardy, IBM was shift-
ing the focus from a remote island off the coast straight to our
cognitive mainland. Here, the computer would grapple with
far more than game theory and math. It would be competing
in a field utterly defined by human intelligence. The competi-
tors in Jeopardy, as well as the other humans writing the clues,
would feast on knowledge tied to experiences and sensations,
sights and tastes. The machine, by contrast, would be blind
and deaf, with no body, no experience, no life. Its only mem-
ories — if you could call them that — would be millions of lists
and documents encoded in ones and zeros. And the entire
game would be played in endlessly complex and nuanced lan-
guage — a cinch for humans, a tribulation for machines.
Picture one of those cartoons in which a land animal, per-
haps a coyote, runs off a cliff and continues to run so fast
in midair that it manages to fly (at least for a while). Now
imagine that animal not only surviving but flying upward and
competing with birds. That would be the challenge facing
an IBM machine. It would have to use its native strengths in
speed and computation to thrive in an utterly foreign setting.
Strictly speaking, the machine would be engaged in a knowl-
edge game without “knowing” a thing.
Still, Ferrucci believed his team had a fighting chance,
though he wasn’t quite ready to commit. He code-named the
project Blue J — Blue for Big Blue, J for Jeopardy — and right
before the holidays, in late 2006, he asked Horn to give him
six months to see if it was possible.
