South Korea just finished hosting a historic tournament of the ancient game of Go. On one side, Google DeepMind’s AlphaGo program. On the other, South Korean Go champion Lee Sedol, widely considered one of the best players of all time. The contest was held over five matches, and AlphaGo needed only the first three to clinch it, each victory more resounding than the last; Lee salvaged the fourth game before AlphaGo closed out the series 4-1. Google DeepMind’s achievement has been called bigger than IBM’s 1997 win against chess champion Garry Kasparov. Like Kasparov, Sedol went in predicting a clean sweep for the human side; unlike Kasparov, he was very nearly swept himself.
To be fair to the now-defunct human masters, Go is a game that seems to truly stump humans as well. It originated in China at least a few thousand years ago, and while the exact story behind it isn’t known, its mechanics bear a striking resemblance to ancient battlefield tactics. The original Chinese name literally means “encircling game,” and it involves laying down your stones to surround and trap your opponent’s pieces entirely, while, of course, trying to prevent that from happening to your own.
Rather than shaving down the possibility space for moves, as chess does by tying each piece to a small set of very specific legal moves, a Go board represents a wide-open field of possibility. It’s much larger than a chess board, 19 by 19 points under tournament rules, and the combination of that large number of points with non-restrictive rules governing play on them means the number of possible moves is far too large to search exhaustively. There are more possible positions for all the stones on a Go board than there are atoms in the universe (though I don’t think this oft-cited figure includes WIMPs!).
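That claim is easy to sanity-check with a back-of-envelope sketch of my own (not a figure from DeepMind): every one of the board’s 361 points can be empty, black, or white, which bounds the number of configurations.

```python
from math import log10

# Each of the 361 points on a 19x19 board is empty, black, or white,
# giving an upper bound of 3^361 configurations. Only a fraction of
# those are legal positions, but at this scale it hardly matters.
POINTS = 19 * 19
upper_bound = 3 ** POINTS

print(f"3^361 is roughly 10^{int(log10(upper_bound))}")   # ~10^172
print("atoms in the observable universe: roughly 10^80")
```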
Computers have classically worked by quickly searching through large numbers of possibilities, so if the number of possibilities becomes unsearchably large, as it does in Go, then a classical computer ought to be fundamentally incapable of playing well. But that’s just the first problem, because unlike chess, Go is about far more than attrition.
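A toy calculation shows how fast that search blows up. The branching factors below are common ballpark figures (roughly 35 legal moves per turn in chess, roughly 250 in Go), not exact values; a naive full-width search visits on the order of b^d nodes at depth d.

```python
# Node counts for a naive full-width game-tree search: b^d, for
# branching factor b and lookahead depth d.
def nodes(branching: int, depth: int) -> int:
    return branching ** depth

for game, b in (("chess", 35), ("Go", 250)):
    print(f"{game} at depth 10: ~{float(nodes(b, 10)):.1e} nodes")
```

At depth 10 the Go tree is already hundreds of millions of times larger than the chess tree.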
On a Go board, you have to constantly work toward both immediate territory and a more abstract concept referred to as position. It’s all about boxing out your opponent, opening up territory for yourself while cutting them off from the best opportunities for future play. How do you tell whether this position has better long-term potential than that one, when your ability to predict specific outcomes hits a mathematical wall a dozen or so moves into the future?
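The immediate-territory half is at least mechanically countable. Here’s a toy counter of my own (nothing to do with AlphaGo’s internals), which awards an empty region to a player only if every stone bordering it is theirs; what a count like this can’t see is exactly the “position” half of the problem.

```python
def territory(board: list[list[str]]) -> dict[str, int]:
    """Count surrounded empty points. Cells are 'B', 'W', or '.' (empty)."""
    size, seen = len(board), set()
    scores = {"B": 0, "W": 0}
    for r in range(size):
        for c in range(size):
            if board[r][c] != "." or (r, c) in seen:
                continue
            # Flood-fill one empty region, noting which colors border it.
            region, borders, stack = 0, set(), [(r, c)]
            while stack:
                y, x = stack.pop()
                if (y, x) in seen or board[y][x] != ".":
                    continue
                seen.add((y, x))
                region += 1
                for ny, nx in ((y+1, x), (y-1, x), (y, x+1), (y, x-1)):
                    if 0 <= ny < size and 0 <= nx < size:
                        if board[ny][nx] == ".":
                            stack.append((ny, nx))
                        else:
                            borders.add(board[ny][nx])
            if len(borders) == 1:  # bordered by a single color: their points
                scores[borders.pop()] += region
    return scores

print(territory([["B", ".", "B"],
                 [".", "B", "W"],
                 ["B", "W", "."]]))   # {'B': 2, 'W': 1}
```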
To answer these questions, the team at DeepMind created two basic components of the system. One neural network tries to get a feel for the overall situation: how is the board laid out, who’s in a better position, and what areas need to be threatened and defended? This is the source of much of the difficulty, both for computers and humans, and with human players it’s where we most commonly run into annoying words like “hunch” and “intuition.”
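DeepMind’s published description calls this a value network: board in, a single “who’s winning” estimate out. Below is a minimal PyTorch sketch of that shape only; the layer sizes and the three input planes are my own arbitrary illustration, and the real network is far deeper and uses many more input features.

```python
import torch
import torch.nn as nn

class TinyValueNet(nn.Module):
    """Toy stand-in for a value network: board planes -> one score."""
    def __init__(self, board_size: int = 19):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),  # planes: black, white, empty
            nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * board_size * board_size, 1),
            nn.Tanh(),  # -1 (White is winning) .. +1 (Black is winning)
        )

    def forward(self, board_planes: torch.Tensor) -> torch.Tensor:
        return self.net(board_planes)

# A random "board": batch of 1, 3 feature planes, 19x19 points.
estimate = TinyValueNet()(torch.randn(1, 3, 19, 19))
print(estimate.item())
```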
The other major component uses knowledge about past games of Go to look a certain number of moves into the future, given different possible decisions for its own move: each possible path is considered for likely counter-moves, and good responses to each of these, and probable counter-moves to those… and on and on. On a 19-by-19 board, this sort of process needs to be directed by the high-level game- and board-awareness of the first component, or else it flails about with poor results.
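In the published system that directed lookahead is a form of Monte Carlo tree search. Here is a bare-bones, generic MCTS sketch of my own; the game object’s interface (legal_moves, play, is_over, result) is hypothetical, and for simplicity every outcome is scored from the root player’s point of view, where real implementations alternate perspective by the player to move.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}          # move -> Node
        self.visits, self.wins = 0, 0.0

def ucb(child, parent_visits, c=1.4):
    """Balance exploiting good moves against exploring untried ones."""
    if child.visits == 0:
        return float("inf")
    return child.wins / child.visits + c * math.sqrt(
        math.log(parent_visits) / child.visits)

def mcts(root, game, iterations=10_000):
    for _ in range(iterations):
        node = root
        # 1. Selection: descend toward the most promising known line.
        while node.children and not game.is_over(node.state):
            node = max(node.children.values(),
                       key=lambda ch: ucb(ch, node.visits))
        # 2. Expansion: add children for each legal move at this leaf.
        if not game.is_over(node.state):
            for move in game.legal_moves(node.state):
                node.children[move] = Node(game.play(node.state, move), node)
            node = random.choice(list(node.children.values()))
        # 3. Simulation: play out the "mini game" randomly to the end.
        state = node.state
        while not game.is_over(state):
            state = game.play(state, random.choice(game.legal_moves(state)))
        outcome = game.result(state)  # e.g. 1.0 root wins, 0.0 root loses
        # 4. Backpropagation: credit every node along the path taken.
        while node is not None:
            node.visits += 1
            node.wins += outcome
            node = node.parent
    # Best move = the one explored the most.
    return max(root.children, key=lambda m: root.children[m].visits)
```

AlphaGo’s twist is to replace the random playouts and uniform expansion here with guidance from its neural networks.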
The overall process is not unlike imagination: The system essentially plays out little mini games as far as it can given its computational limitations, looks around in this imaginary space, and judges the desirability of each possible path versus all the other imaginary paths it has tested. That’s roughly similar to how some human beings describe their approach to the game. One DeepMind engineer said in a live interview aired during Game 2 that the heatmap of interesting moves AlphaGo creates for any given board position is much like human intuition. “As a Go player, if I look at the board, there are immediately some points that jump out at me as moves that I might want to play. AlphaGo does the same thing.”
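That “points that jump out” heatmap is, concretely, a probability distribution over board points. A small sketch of my own below, using random numbers as a stand-in for real network output:

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=(19, 19))   # stand-in for a policy network's raw scores
probs = np.exp(logits) / np.exp(logits).sum()   # softmax -> heatmap over the board

# The highest-probability points are the moves that "jump out."
flat = probs.ravel()
for idx in flat.argsort()[-3:][::-1]:
    row, col = divmod(int(idx), 19)
    print(f"point ({row}, {col}): {flat[idx]:.2%}")
```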
Perhaps that’s why even human masters often refer to a “feeling” when asked precisely why they choose one strategy over another — they no more “understand” the truly deterministic outcomes of their decisions than AlphaGo itself.
So, do human Go masters truly understand the game? And if so, does that mean AlphaGo does, too? Sedol was surprised to have to deal with a number of AlphaGo’s moves in the first game (move 102 in particular, if you’re interested) and seemed to come to think of AlphaGo as a truly unpredictable and dangerous opponent. The program seemed to display genuine creativity: an illusion, of course, but an important one.
In the second game, played on March 10, AlphaGo took an unusual opening approach, giving Lee pause. In the first game AlphaGo used far more of its allotted time than Lee, but in the second the human champion used up far more of his time than the computer. He exhausted his main time first, and continued hesitation brought him down to his one-minute overtime periods, beyond which he would have forfeited on time. It’s incredible: in a very real way, the computer psyched Lee out. What’s more incredible is that it’s actually possible that those early moves were made specifically for that reason. During that game Michael Redmond, the only Westerner ever to reach the top “9-dan” professional rank in Go, said that this latest version of AlphaGo plays an innovative game. He said its style was “already something I could learn from.”
At the end of the day, the fall-back explanation for human skill at Go is experience: we think a move will work because it has worked in the past, even in a very abstract, “This move sort of looks like that move from six months ago!” kind of way. That’s precisely how a deep neural network learns. The difference is that while a human might be able to play at most a few thousand games a year, AlphaGo can play millions every day. That’s how it has acquired such astounding ability, fully a decade ahead of even the optimistic projections. By combining a computer’s brute-force ability with a novel network-of-networks approach to breaking down the possibility space much the way a human mind does, DeepMind has been able to (at least) rival the best humanity has to offer.
This is an incredible achievement, and it would have been regardless of whether AlphaGo won the series. It shows that neural networks really can take us past previously impassable barriers. All Skynet nightmares aside, in the end Alphabet executive chairman Eric Schmidt said it best: “The winner here, no matter what happens, is humanity.”