
What Makes Libratus the Best Poker Player in the World?

Libratus, the artificial intelligence (AI) engine designed by Professor Tuomas Sandholm at Carnegie Mellon University (CMU) and his graduate student Noam Brown, has made an impression on Jason Les, one of the world’s top poker players. Poker News, the poker industry’s online news magazine, recently interviewed Les. Two of his answers were telling: asked which is the better name for his firstborn child, and which is the more annoying opponent, Claudico or Libratus, he chose Libratus both times.[1],[2]

In January, four of the world’s top poker professionals, Les, Dong Kim, Daniel McAulay, and Jimmy Chou, were challenged to 20 days of No-limit Heads-up Texas Hold ‘em poker at the Brains versus Artificial Intelligence tournament in Pittsburgh’s Rivers Casino. Libratus beat all four opponents, taking away more than $1.7 million in chips.

Les also called being selected to play against Libratus his proudest poker accomplishment to date.

During the tournament, Dong Kim said about Libratus, “I didn’t realize how good it was until today. I felt like I was playing against someone who was cheating, like it could see my cards. I’m not accusing it of cheating. It was just that good.”[3] Kim also noted during the tournament that “he and his fellow humans have no real chance of winning.”[4]

Texas Hold ‘em, The Holy Grail

“In the area of game theory,” said Sandholm, “in which I’ve been working since 1989, No-Limit Heads-Up Texas Hold ‘em is the holy grail of imperfect-information games.” According to Sandholm, this version of poker was really the last game that had not been cracked by AI, in the sense of AI becoming better than humans. AI has already beaten top experts at other games, such as chess and go, but Texas Hold ‘em is different. In those games, play is open: all players can see where the pieces are at any time and the strategic possibilities they present, so endgame positions can be solved directly. In Texas Hold ‘em, players have only limited information about their opponents’ cards, because some cards are dealt face up on the table and some are held privately in each player’s hand. The private cards represent hidden information that each player uses to devise a strategy. It is reasonable to assume that each player’s strategy is rational and designed to win the greatest number of chips.

“Besides being a limited-information game, the state space is huge. There are 10^160 different situations the player can face. You can’t solve all those possibilities, and you can’t solve a sub-game of the game tree with information from that sub-game only. How you play a sub-game actually depends on how you play in totally different parts of the game. It takes totally different algorithms from a game like chess, checkers, or go.”

Learning and Reasoning in a Limited Information Environment

Libratus is an AI system designed to learn in a limited information environment. It consists of three modules:

  1. A module that uses game-theoretic reasoning, with only the rules of the game as input, to compute a blueprint strategy ahead of the actual game. According to Sandholm, “It uses Nash equilibrium approximation, but with a new version of the Monte Carlo counterfactual regret minimization (MCCRM) algorithm that makes the MCCRM faster, and it also mitigates the issue of involving imperfect recall abstraction.”
  2. A subgame solver that recalculates strategy on the fly so the software can refine its blueprint strategy with each move.
  3. A post-play analyzer that reviews the opponents’ plays that exploited holes in Libratus’s strategies. “Then, Libratus re-computes a more refined strategy for those parts of the state space, essentially finding and fixing the holes in its strategy to play even closer to Nash equilibrium in those spots,” stated Sandholm.
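
To make the idea behind module one more concrete, here is a minimal sketch of regret matching, the basic update that counterfactual regret minimization builds on, applied to rock-paper-scissors in self-play. It is not Libratus’s algorithm: it uses exact expected values rather than Monte Carlo sampling, involves no abstraction, and handles only a one-shot game. The function names, the iteration count, and the biased starting regrets (used only so the run does not begin at equilibrium) are assumptions made for illustration.

```python
# Regret matching in self-play on rock-paper-scissors (illustrative sketch only).
ACTIONS = ("rock", "paper", "scissors")
N = len(ACTIONS)


def payoff(a, b):
    """Payoff for playing action a against action b: +1 win, 0 tie, -1 loss."""
    return (0, 1, -1)[(a - b) % N]


def current_strategy(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / N] * N  # no positive regret yet: play uniformly


def train(iterations):
    """Return the average strategy of each of the two players."""
    # Assumption: player 0 starts biased toward rock so the dynamics are visible.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * N, [0.0] * N]
    for _ in range(iterations):
        strategies = [current_strategy(r) for r in regrets]
        for p in (0, 1):
            opponent = strategies[1 - p]
            # Expected value of each pure action against the opponent's mix.
            action_values = [
                sum(payoff(a, b) * opponent[b] for b in range(N)) for a in range(N)
            ]
            expected = sum(strategies[p][a] * action_values[a] for a in range(N))
            for a in range(N):
                # Regret: how much better action a would have done than the mix.
                regrets[p][a] += action_values[a] - expected
                strategy_sums[p][a] += strategies[p][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


average = train(50_000)
for player, strategy in enumerate(average):
    print(player, [round(prob, 3) for prob in strategy])
```

It is the average strategies, rather than the final ones, that approach equilibrium; for rock-paper-scissors each probability should come out close to one third.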

 

From Claudico to Libratus

Les and Kim also played against Sandholm’s previous AI poker player, Claudico, in 2015. And, while the two programs were written by the same people, Libratus is a different application. “Between Claudico and Libratus, we developed a better equilibrium-finding algorithm for module one, so we could do much finer-grained abstractions. The endgame solver of module two is a big improvement. With Claudico, we found during the 2015 tournament that the application ran well with or without the endgame solver we had built. Libratus’ endgame solver improves play by 80 to 100 milli-big blinds per hand (mbb/hand),[5] increases safety, or non-exploitability, and it is a nested endgame solver, whereas Claudico’s was not. In module three, Claudico essentially continued overnight what it did in module one, while in Libratus, it actually fixes exploitable holes in its own strategy. That’s an entirely new aspect,” explained Sandholm.
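
For readers unfamiliar with the unit, the short sketch below shows how a win rate in mbb/hand can be computed and what 80 to 100 mbb/hand means in big blinds. The function names and the sample inputs (5,000 chips won over 1,000 hands with a big blind of 100) are made up for illustration and are not figures from the tournament.

```python
def mbb_per_hand(chips_won, big_blind, hands_played):
    """Win rate in milli-big blinds per hand (1 big blind = 1,000 mbb)."""
    return 1000.0 * (chips_won / big_blind) / hands_played


def mbb_to_big_blinds(mbb):
    """Convert a milli-big-blind amount to big blinds."""
    return mbb / 1000.0


# Hypothetical session: these inputs are illustrative assumptions only.
example_rate = mbb_per_hand(chips_won=5_000, big_blind=100, hands_played=1_000)

# The improvement range quoted above, expressed in big blinds per hand.
low = mbb_to_big_blinds(80)
high = mbb_to_big_blinds(100)

print(example_rate, low, high)
```

Normalizing by the big blind and by the number of hands is what lets results from matches with different stakes and lengths be compared.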

Sandholm and his students have been working on poker optimization for 13 years. He typically had two PhD students working with him, “but the actual code for Libratus was written from scratch,” stated Sandholm. He and his student, Noam Brown, spent a little over a year writing Libratus. “In 2015, we couldn’t beat the best players with Claudico, and many wondered how much we could accomplish in just a little over a year of work on Libratus,” added Sandholm. “In fact, the international betting sites were putting us at four-to-one to five-to-one odds against.” It turned out not to be a safe bet.

Bridges, the Supercomputing Poker Machine

For the tournament, Libratus ran on the newest supercomputer, called Bridges, at the Pittsburgh Supercomputing Center (PSC). The system is a unique configuration built both for traditional types of applications, like simulating astrophysical phenomena, and for machine learning workloads, like Libratus. Bridges was built by Hewlett Packard Enterprise (HPE), using Intel® Xeon® processors and Intel® Omni-Path Architecture (Intel® OPA), a new networking technology from Intel that is now being used in the world’s fastest supercomputers. Bridges comprises 846 nodes of computing power, with both “regular memory” nodes and “high-memory” nodes, the latter often used in data-intensive applications, such as de novo gene assemblies to put together data from gene sequencers.

Sandholm chose Bridges because Libratus needed supercomputing power. “We needed roughly 15 million core-hours of compute time for module one,” said Sandholm, “and 10 million core-hours for modules two and three.” According to Nick Nystrom, Sr. Director of Research at PSC, Bridges can supply up to 27,000 cores for the largest projects that need intensive computing. During tournament play, Libratus consumed 600 of Bridges’ 752 regular memory nodes, using 400 for the endgame solver and 200 for self-learning module three, which ran 24 hours a day. At night, the application consumed all 600 nodes for module three.
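
The figures quoted above can be put in perspective with some back-of-the-envelope arithmetic. The sketch below uses only numbers from this article; the wall-clock estimate is a hypothetical lower bound that assumes perfect scaling across all 27,000 cores, which is not how the work was actually scheduled.

```python
# Figures quoted in the article.
MODULE_ONE_CORE_HOURS = 15_000_000
MODULES_TWO_THREE_CORE_HOURS = 10_000_000
MAX_CORES = 27_000
REGULAR_MEMORY_NODES = 752
ENDGAME_SOLVER_NODES = 400
SELF_IMPROVEMENT_NODES = 200

total_core_hours = MODULE_ONE_CORE_HOURS + MODULES_TWO_THREE_CORE_HOURS
nodes_in_use = ENDGAME_SOLVER_NODES + SELF_IMPROVEMENT_NODES
node_share = nodes_in_use / REGULAR_MEMORY_NODES

# Hypothetical lower bound: every core busy the whole time, with perfect scaling.
min_wall_clock_hours = total_core_hours / MAX_CORES
min_wall_clock_days = min_wall_clock_hours / 24

print(f"total core-hours: {total_core_hours:,}")
print(f"share of regular memory nodes in use: {node_share:.1%}")
print(f"idealized minimum wall-clock time: {min_wall_clock_days:.1f} days")
```

Even under that idealized assumption, the total comes to several weeks of machine time, which illustrates why a supercomputer was needed.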

AI for Human Endeavors

AI that can beat the best human players in limited-information games has great commercial and civil potential, because so many activities between humans take place in the context of limited information, including financial trading and negotiations, from something as complex as world peace to the much simpler task of buying a home. “The algorithms in Libratus are not poker-specific at all,” commented Sandholm. “The research involved in finding, designing, and developing application-independent algorithms for imperfect information games is directly applicable to other two-player, zero sum games, like cyber security, bilateral negotiation in an adversarial situation, or military planning between us and an enemy.”

An area that Sandholm and his team have been working on since 2012 is using algorithms developed for AI, like Libratus, for steering evolutionary and biological adaptation to design treatment protocols that can “trap” a disease. “In particular, we are looking at steering a person's T-cell population, using sequences of treatments, so that the T-cells will better fight a disease that a patient has.”

T-cells are the main cells of the immune system. There are various kinds of T-cells, and they are malleable; they will adapt and turn into different kinds of T-cells based on the steering by the treatment protocol. “In this case,” added Sandholm, “the opponent is myopic. It can’t think strategically at all, so we design treatment plans that have traps, where the disease can fall into one of the traps that we set. But steering is not perfect, so with various treatments you can make the immune system work on the disease, and with these algorithms you can potentially come up with a better steering plan that will guide the immune system to a more favorable outcome.” Sandholm recently obtained a grant that will allow him to move this and other work from theoretical development to live cell cultures in a laboratory.

Sandholm established a company called Strategic Machines, Inc. as a vehicle for applying his work in the commercial sector, from playing poker to finance and cybersecurity.

Playing Super-Human Poker

In the spirit of the American Wild West question of who really is the best gunslinger in the territory, it was suggested that the next step would be to pit Libratus against itself. Sandholm responded, “Libratus is very different from human poker play, so if we really want to see what super-human poker looks like, we could play it against itself, and there could be a lot to learn from that.”

A Mind-Blowing Accomplishment

Sandholm and his students have done something that no one else has been able to accomplish. “It’s mind blowing to build something that is better than yourself,” stated Sandholm. “We researchers are mediocre poker players, but we built an application that can beat the best in the world.”

[1] https://www.pokernews.com/news/2017/02/calling-the-clock-with-jason...

[2] Claudico was an earlier AI created by Sandholm that played poker against professional players, including Les, in 2015. In that tournament, the humans beat the computer.

[3] "Artificial Intelligence Is About to Conquer Poker—But Not Without H...". WIRED. Retrieved 2017-01-24.

[4] Ibid.

[5] mbb/hand (milli-big blinds per hand) is a normalized measure of how well a player plays.

# # #

This article was produced as part of Intel’s High Performance Computing (HPC) editorial program, with the goal of highlighting cutting-edge science, research and innovation driven by the HPC community through advanced technology. The publisher of the content has final editing rights and determines what articles are published.

 
