The adventure of chess programming (3)
It was the mid 1990s. I was in London, accompanying World Chess Champion Garry Kasparov, as I often did, on one of his appearances. This time it was in Home House, a beautiful Georgian villa in Marylebone, and one evening we were joined at dinner by a former child prodigy in chess. He had reached master level (Elo 2300+) at the age of 13 and captained a number of English junior chess teams. He was also a world-class computer games player. It was an interesting encounter, with the lad enthusiastically describing a computer game he was developing. After he left I said to Garry: “That’s a cocky young fellow!” “But very smart,” Garry replied. And we left it at that.
Twenty years later I read in the news that Google had purchased a company called DeepMind Technologies, for £400 million. DeepMind was a British artificial intelligence enterprise which had created neural network software that learned how to play early-gen video games like Pong and Space Invaders, all on its own. It was not hand-programmed to do this, but used methods that were very like those of a human player gaining proficiency in the game. The goal, DeepMind said, was “to create a general-purpose AI that can be useful and effective for almost anything.” One of the founders of the company was Demis Hassabis.
Demis? Wait a minute, wasn’t that the lad we had met in Home House? For a year I watched the progress the company made as a member of the Google family, and was especially fascinated to see how they solved a problem that had vexed computer experts for decades: DeepMind created a program, AlphaGo, that learned to play the ancient game of Go, taking it all the way to master and then world championship level. The rules of Go are deceptively simple, but the branching factor makes the game very hard for computers to calculate. In the first article of my series I described how in a 40-move game of chess there are 10¹²⁸ possible sequences of moves, vastly more than the number of atoms in the universe. Well, in Go there are 10¹⁷⁰ possible board configurations, a number that dwarfs the count of chess games to insignificance.
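To get a feel for where such astronomical numbers come from, here is a rough back-of-the-envelope calculation in Python. The branching factors used (about 35 legal moves per chess position, about 250 per Go position) are standard textbook estimates, not figures from DeepMind:

```python
import math

def tree_size_log10(branching, plies):
    """Shannon-style estimate: log10 of branching ** plies."""
    return plies * math.log10(branching)

# Chess: ~35 legal moves per position, 80 plies (40 moves per side)
chess = tree_size_log10(35, 80)    # roughly 10^123 move sequences
# Go: ~250 legal moves per position, 150 plies
go = tree_size_log10(250, 150)     # roughly 10^360 move sequences

print(f"chess = 10^{chess:.0f}, go = 10^{go:.0f}")
```

The exact exponents depend on the assumptions, but the point survives any reasonable choice of numbers: exhaustive enumeration is hopeless, and for Go it is hopeless by a much wider margin.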
I followed the progress of AlphaGo closely on the news page of ChessBase, which shares with DeepMind an affinity for capitalising in the middle of names. The program used deep neural networks to study a very large number of games, developing its own understanding of what human play looks like. After that it honed its skills by playing different versions of itself against each other while learning from its mistakes. This process, known as reinforcement learning, produced master-level Go-playing software.
At this stage I contacted Demis, who remembered our encounter in Home House and invited me to visit DeepMind in London. My counter-proposal: his team should come to Hamburg to see the assets we have for chess. We have over eight million high-class games, 100,000 of them annotated by very strong players, 200 million positions in the cloud with the evaluations of the world’s most powerful computers attached to each of them, the largest and most up-to-date “live” openings book in the game, etc., etc. DeepMind could use this data to train a neural network for chess, or more accurately, to have the neural network train itself to play the game.
Demis was open to the idea and promised to consider it. What he did not tell me at the time was that they were already developing a chess engine that was unlike anything anyone had ever seen before. Traditional engines have their knowledge of the game of chess programmed into them. The DeepMind neural network took a radically different path: it was told the rules of the game, how the pieces move and the ultimate goal of checkmate. Nothing else. Using state-of-the-art techniques in artificial intelligence, the program, AlphaZero, played against itself, millions upon millions of times, identifying patterns of its own accord, and adjusting the values as it saw fit. In other words, it produced its own concepts and knowledge, using pattern recognition just as humans do, and improving as it learned. And it did this without the need for all the ChessBase data I was offering.
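The self-play principle can be illustrated with a toy sketch. The Python program below is of course nothing like AlphaZero’s deep networks and tree search; it merely shows the idea: a learner that is told only the rules of a game (here the simple pile game Nim), plays against itself, and adjusts its own values from the outcomes. Every name and parameter in it is my own illustration:

```python
import random
from collections import defaultdict

# Toy self-play learner for Nim: one pile of stones, each player
# takes 1-3 per turn, and whoever takes the last stone wins.
# Only the rules are given; all values are learned from self-play.

Q = defaultdict(float)   # learned value of playing `move` with `pile` stones left
EPS, ALPHA = 0.1, 0.5    # exploration rate, learning rate

def choose(pile, explore=True):
    moves = [m for m in (1, 2, 3) if m <= pile]
    if explore and random.random() < EPS:
        return random.choice(moves)          # occasionally try something new
    return max(moves, key=lambda m: Q[(pile, m)])

def self_play_game():
    pile, history = 10, []
    while pile > 0:
        m = choose(pile)
        history.append((pile, m))
        pile -= m
    reward = 1.0                             # the side that moved last has won
    for pile, m in reversed(history):        # credit moves backwards; sides alternate
        Q[(pile, m)] += ALPHA * (reward - Q[(pile, m)])
        reward = -reward

random.seed(0)
for _ in range(5000):
    self_play_game()

# With enough games the greedy policy rediscovers the winning strategy:
# always leave the opponent a multiple of four stones.
```

Nobody tells the program that leaving a multiple of four is good; that regularity emerges from the outcomes of its own games, which is the self-play idea in miniature.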
How was this possible? Initially the system played absurd games, in which one side gives up three pieces for nothing, and the other side cannot win because it has lost four pieces itself. But with each iteration, with each 10,000 or so training steps, it became stronger. Running on the latest proprietary hardware (for the technically minded: 5,000 first-generation and 64 second-generation TPUs), the program played 44 million games against itself and, in the process, rose to world-class chess strength. Nobody had told AlphaZero anything about strategy, nobody had explained that material was important, that queens were more valuable than bishops, that mobility mattered. It had worked everything out by itself, drawing its own conclusions, ones that, incidentally, no human being will ever be able to comprehend.
In the end AlphaZero played a test match against an open-source engine named Stockfish, one of the top three or four brute-force engines in the world. These programs all hover around 3500 points on the rating scale, at least 700 points more than any human player. Stockfish ran on 64 processor threads and looked at 70 million positions per second; AlphaZero ran on a machine with four TPUs, looking at just 80,000 positions per second. It compensated for this nearly thousand-fold disadvantage by selectively searching only the most promising variations, moves that in its self-play had proved effective in similar positions.
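The arithmetic behind that trade-off is easy to sketch. In the toy calculation below, the branching factor of 35 and the assumption that a selective searcher follows only its three most promising moves per position are my own illustrative figures:

```python
# A brute-force searcher examines every legal move at every node;
# a selective searcher follows only its top candidate moves.
BRANCHING = 35        # typical legal moves in a chess position (rough estimate)
TOP_MOVES = 3         # candidates the selective searcher actually follows
DEPTH = 8             # search depth in plies (half-moves)

brute_nodes = BRANCHING ** DEPTH      # about 2.3 trillion positions
selective_nodes = TOP_MOVES ** DEPTH  # 6,561 positions

print(f"positions saved: factor {brute_nodes // selective_nodes:,}")
```

A vastly slower selective searcher can therefore reach the same depth while inspecting a minute fraction of the positions. Picking the right three candidates is, of course, the hard part, and that is precisely what the neural network supplies.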
In the 100 games played against Stockfish, AlphaZero won 25 as White, three as Black, and drew the remaining 72. All games were played without recourse to an openings book. In addition, a series of twelve 100-game matches was played, starting from the twelve most popular human openings; there AlphaZero won 290 games, drew 886 and lost 24. Some in the traditional computer chess community call the match conditions “unfair” (no opening books, or only constrained openings), but I conclude without doubt that AlphaZero is the strongest entity that has ever played chess. And it had become this after studying the game, from scratch, all alone and without any external advice, for a total of about nine hours.
Google and DeepMind were quite relaxed about the project and revealed the methods they used to all and sundry. One of the project managers even came to visit ChessBase in Hamburg and gave a talk to half a dozen of our talented young programmers. They went away inspired, determined to learn more about this kind of computer intelligence.
Of course I myself could not resist. In mid-November I asked my son Tommy and nephew Noah to build me a powerful computing machine. They bought the components, consisting of a 12-core processor and two state-of-the-art graphics cards that had just been released. These cards contain thousands of processing cores, including specialised tensor cores, originally intended to power 3D graphics in video games. But it turns out that they are eminently suited to neural network calculation.
So now I have a very powerful AI machine humming in my home office. Humming? Actually it is a fairly loud whirring sound of multiple fans dissipating the heat from the 600 watts of energy the computer consumes. That heats the room to a very comfortable 23°C, with the central heating turned off. You get used to the steady whoosh of the machine. And there is one interesting thing to consider: had I owned this machine around the year 2000, it would have been the most powerful computer in the world!
What do we do with the super-machine? A friend who is an expert on computer chess uploaded all the tools needed to build a neural network for chess, and the machine went to work, playing an average of 95,000 games per day against itself, learning from them and from other games. In a few months, we hope, it will reach AlphaZero’s level of play and maybe even go beyond it. It is already able to stand up to top brute-force programs, some running on massive hardware at 1.6 billion positions per second.
All this is exquisitely exciting, not just because our AI program may advance to new superhuman levels of chess playing strength. More important is that it does this in a completely new way, not with brute force tactics but with positional ideas that it has come up with, after studying millions of games. All by itself, with no human intervention.
And that is not the whole story. The techniques used by DeepMind are not only applicable to chess. One can use neural networks to learn all kinds of things: recognise images, faces and handwriting; process natural language; calculate motion (e.g. for advanced computer games or robots); model economies and stock markets, making better predictions than human experts; and many other things that are coming in the next decade. Our young programmers want to understand how their field is being transformed by the transition from explicit hand-coding to machines that learn on their own and, in many areas, already do a better job than humans.
AlphaZero is just an early example of computers solving complex problems without human intervention. It has demonstrated in striking fashion that this is possible — and, we must conclude, not just for Go and chess. We are going to see the same process take place in many other areas of human endeavour. It is the future of mankind, and we would do well to be prepared for it.
Previous articles on the subject
The adventure of chess programming (1)
Did you know that the first chess program was written by Alan Turing a few years before there was a computer that could run it? The first chess program to actually run on a machine was written by the atomic bomb scientists in Los Alamos for their MANIAC computer in the 1950s. Fifty years later computers were challenging world champions, and today it is pointless for any human to play against a digital opponent.
The adventure of chess programming (2)
How do computers play chess, and how do they “think”? The author discusses the very, very big numbers involved in looking ahead at all possible continuations. Unfortunately the effort to prune the search tree and look only at plausible lines failed, while advances in hardware and software development led to the triumph of the “brute force” method.