By Adam Nedeff, researcher for the National Archives of Game Show History
In a well-publicized 1997 showdown, world chess champion Garry Kasparov competed against Deep Blue, a chess-playing computer built by IBM. Deep Blue won. Kasparov vehemently complained that the computer had been programmed not so much to play chess as to compete against him specifically, but the match was a huge victory for IBM.
Still, IBM acknowledged that success in a game like chess wasn’t such an extraordinary achievement. Eric Brown, an IBM research manager, later said, “…[C]hess is fairly mathematical and well defined—each game state and the corresponding possible moves can be easily represented by a computer.”
IBM then aimed a little higher. The company sought a game that varied from competition to competition; a game that required more than mere strategy; a game that demanded the ability to analyze language; a game that required a computer to think and make decisions the way a human would, rather than through the brute-force calculation that had powered Deep Blue. Jeopardy! was just what they wanted.
An IBM employee told the PBS series Nova, “Humans communicate very fluidly—natural language—and that’s where computers struggle.”
Jeopardy! clues were the epitome of that struggle. Some Jeopardy! clues are written so vaguely that multiple answers are possible, with only the category clarifying which one the game is seeking. Some Jeopardy! clues sneak hints into their phrasing. Some contain puns. And every clue is different. Today's game of Jeopardy! isn't the game that was played yesterday, nor is it the game that will be played tomorrow. Unlike chess, Jeopardy! requires its players to contend with new information at every turn. For IBM, Jeopardy! would be the biggest barrier to topple in its effort to make computers understand human language. In essence, IBM sought to create a computer as smart as, or smarter than, a human being.

IBM constructed Watson (named for IBM founder Thomas J. Watson), a pair of monolithic units housed in a separate room and containing a total of 20 IBM servers. Linked together, they formed the equivalent of 2,800 high-powered computers operating in a high-speed network. A refrigeration unit shared the room, combating the mighty waves of heat that Watson produced when it was up and running.
Watson was stuffed with literally thousands of resources: complete dictionaries, encyclopedias, newspaper archives, books, and content from the Internet. It was even given “cheat sheets” of all the clues and correct responses from past episodes of Jeopardy!, to teach Watson patterns and quirks in the ways that clues are written and presented. To simulate the experience of playing Jeopardy!, a replica Jeopardy! set was constructed at IBM, so that the company could bring in actual contestants to play the game as the Watson system became more refined.
As an example of the refining that Watson needed, an early test game included the category “It’s in My Coffee.” One of the clues: “This trusted friend was the first non-dairy powdered creamer.” Watson, assessing only the words “dairy” and “creamer,” guessed, “What is milk?” The computer system couldn’t grasp the wordplay and the hint tucked into the first three words of the clue. The correct response was “What is Coffee Mate?” For another clue, about a staple of the grasshopper’s diet, Watson replied, “What is kosher?”
Over the next four years, IBM programmers would help the Watson system grasp nuances of language, and the computer's ability to play Jeopardy! improved dramatically. IBM was doing more than programming it to respond to clues; Watson was also subjected to the rules of the show. The show prohibited contestants from conducting any kind of research while a game was in progress, obviously, so Watson's access to the Internet was always severed before a game began; it could only use whatever information it had already stored. Contestants on Jeopardy! were also prohibited from ringing in until after Alex Trebek had finished reading a clue and an off-stage light was illuminated; Watson was held to that same standard. And to make sure that Watson was being tested against a reasonable degree of competition, actual Jeopardy! contestants were brought to IBM to participate in the experimental games. Watson played 79 games against former contestants, and 55 games against players who had made it into the annual Tournament of Champions.
But IBM went even further than that. It taught Watson how to play the game strategically. As each clue was presented, Watson's algorithms would sort through all of its stored information, gather three potential answers, and calculate, as a percentage, how confident it was in each one. Watson would give the top-ranked answer, but only if that answer crossed Watson's "buzz threshold." To set that threshold, Watson took into account its own score, the scores of the opposing players, how far ahead or behind it was, the number of remaining clues on the board, the value of those clues, and whether or not a Daily Double was still on the board; from all that information, it calculated the level of confidence it would require to ring in on the next clue. If Watson's top answer didn't cross that line, Watson didn't ring in.
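The logic described above can be sketched in a few lines of code. This is only a toy illustration of the idea, not IBM's actual algorithm: the function names, the weights, and the threshold heuristic are all invented for this example.

```python
# Hypothetical sketch of a "buzz threshold" decision like the one described
# above. All names, numbers, and heuristics are illustrative inventions.

def buzz_decision(candidates, game_state):
    """Pick the highest-confidence answer and ring in only if it clears
    a threshold derived from the current game situation."""
    # candidates: list of (answer, confidence) pairs, confidence in [0, 1]
    best_answer, best_conf = max(candidates, key=lambda c: c[1])

    # Toy threshold: demand more certainty when risk-taking can't help,
    # and less certainty when trailing (purely illustrative heuristic).
    lead = game_state["my_score"] - max(game_state["opponent_scores"])
    board_value = sum(game_state["remaining_clue_values"])
    threshold = 0.50
    if lead < 0:                     # behind: take more risks
        threshold -= 0.10
    if board_value < abs(lead):      # remaining clues can't change much
        threshold += 0.20

    if best_conf >= threshold:
        return best_answer           # ring in with this response
    return None                      # stay silent

state = {
    "my_score": 5400,
    "opponent_scores": [7200, 3000],
    "remaining_clue_values": [400, 800, 800, 1200],
}
print(buzz_decision([("What is Toronto?", 0.32),
                     ("What is Chicago?", 0.71)], state))
```

Here the player trails by $1,800, so the sketch lowers its threshold slightly and rings in on the 71%-confidence answer; a board with too little money left to matter would instead raise the bar for buzzing in.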
MAN VS. MACHINE
When IBM programmers were positive that Watson was ready for the real thing, they alerted Jeopardy! In January 2011, Jeopardy! taped a series of episodes called "The IBM Challenge" at the Thomas J. Watson Research Center in Yorktown Heights, New York. Watson would compete against the two best players in the show's history: Ken Jennings, who won 74 games and a grand total of $2,522,700 in 2004 and actually inspired the Watson experiment in the first place (writers compared Jennings' mind to a computer so often that it caught IBM's attention); and Brad Rutter, an undefeated champion from 2000 (back when the show required contestants to leave after their fifth game) who had subsequently amassed over $3 million by returning to the show and winning tournaments, including the Ultimate Tournament of Champions, in which he had decisively defeated Ken Jennings in three consecutive games.
Watson would be represented onstage by a monitor displaying an avatar, a circle adorned with a halo of five dash marks. Watson’s avatar would change color and animate according to how hard it was “thinking.” Blue or green meant confidence; orange meant Watson wasn’t sure. Sped-up animation meant that Watson had to do a little extra thinking about this one. And the home audience would hear an artificial voice, sounding like a soft-spoken middle-aged man, presenting Watson’s response when it rang in.
IBM's employees watched nervously as millions of viewers were let in on the most intense artificial intelligence test ever conducted. Watson wasn't perfect. Not only was it wrong at times during the games, it could be spectacularly wrong: it guessed "What is Toronto?" for a Final Jeopardy! clue in the category "U.S. Cities." But most of the time, Watson was spot-on, almost frighteningly good. Like something out of a science fiction movie, the computer was decidedly outsmarting its human opponents.
Ken Jennings, conceding defeat, invoked a quote from a classic Simpsons episode during Final Jeopardy!, scribbling "I, for one, welcome our new computer overlords" on his screen. Total scores after two games: Brad Rutter, $21,600. Ken Jennings, $24,000. Watson, $77,147.
DR. WATSON
For most Americans, the IBM computer versus the Jeopardy! superbrains was a noteworthy episode, an oddity worth looking in on, but promptly forgotten. But Watson’s work was just beginning. The Jeopardy! game went so well that John Kelly, head of research for IBM, decided to send Watson to medical school.
IBM recruited 20 cancer institutes to "teach" Watson. Despite initial skepticism (a Jeopardy!-playing computer sounded more like a gimmick than an actual medical resource, after all), the institutes quickly realized that the Watson system had an advantage that they didn't have: Watson wasn't bound by time.
Dr. Ned Sharpless of University of North Carolina at Chapel Hill told 60 Minutes, “…[W]e have 8,000 new research papers published every day. No one has time to read 8,000 papers a day. So, we found that we were deciding on therapy based on information that was always, in some cases, 12 or 24 months out-of-date. They taught Watson to read medical literature essentially in about a week. It was not very hard. And then Watson read 25 million papers in about another week. And then it also scanned the web for clinical trials open at other centers, and suddenly, we had this complete list that was everything one needed to know.”
Wary of a miracle, Dr. Sharpless and his team fed Watson information about 1,000 patients to see if it could draw the same conclusions that a skilled doctor would. In 99% of cases, Watson made the same diagnoses and recommended the same treatments that the actual doctors had. But in 30% of patients, Watson found something that the physicians hadn't found. It wasn't incompetence. These talented doctors simply never had the time to digest all the information Watson had taken in, and now Watson was giving them information that they had never had before. For that 30%, Watson recommended treatments the original doctors never would have, or could have, conceived.
Rather than being the culmination of the Watson experiment, Jeopardy! was just the beginning. IBM has since invested billions of dollars in data-analytics technology and further development of artificial intelligence. Today, IBM offers a portfolio of AI products under the name watsonx. Although many have voiced concerns that AI is more of a threat than a solution for humans, IBM remains optimistic that Watson represents the best of the technology’s future, a companion that will aid human life immeasurably in a variety of fields. The computer that conquered Jeopardy! is now poised to conquer jeopardies elsewhere.