Monuments to failure – part one – Paper Rock Scissors

Howdy all, in this column I begin an {n} part series exploring ideas and code of mine that for one reason or another did not pan out. It’s important to note that none of these ideas were directly tied to my research and funding, though any could have found their way in pending better results. These posts will broadly represent blind explorations of tools and concepts outside of my discipline in an attempt to broaden my repertoire. In each entry I will explore the basic idea, why I tried it, what design considerations I made, and why I considered it a failure. I will also link or will embed the source code for others to experiment upon with attribution under the GNU general public license.



The Project: PRSLearn

The goal of PRSLearn was to create a recursive neural network that could iteratively learn from its opponents’ strategies and win at the game of paper rock scissors. I chose this example as it presented a simple form of competition requiring no complex actions and no dependency on prior states. The true challenge was to establish some form of intuition regarding patterns in an opponent’s behavior and how they may respond to one’s previous actions. Thereby, any player that won a margin more than half of the time against a competent opponent would be presumed to be acting upon some form of intuition.

The Interest

I started this project in direct response to Google’s TensorFlow public release. I’d long had an eye on exploring neural networks for optimization and classification problems. Unfortunately they’d always seemed a bit too niche in their utility via hardware requirements and the availability of entry level libraries. There is some valid criticism regarding TensorFlow’s performance when contrasted to older, more established libraries. However, I have faith in big G, and their entry to any field marks the onset of rapid progress and usable tools. But – scikit-learn was easier to work with, and I was busy, so I a used scikit-learn recursive neural network classifier instead. Sorry google.

The Methods

Like most learning algorithms, recursive neural networks ingest training sets broken apart by features. I reasoned that the most useful features for classification would be the winning move based upon a review of prior sequences of self and opponent moves as well as the general frequency with which self and opponent tended to make certain plays. As a result, I made a list of lambda functions which would look at the most recent history for each player, extract true or false values for the presence of certain sequences, and return floats for the frequency of individual plays. Running this via a list comprehension would create a vector of 1.0-0.0 values which could then be used to train a scikit-learn classifier to select from paper, rock, and scissors. My hope was that by experimenting with pairings of the machine learning algorithm selections, the length of history considered, and the length of sequences compared, I might gain some interesting findings on specific methods that competed most ably.

The Failure

The big fail here became immediately apparent. While each experiment showed unique and interesting behavior at first, they quickly degenerated into long sequences of paper, paper, paper, paper. This would run for maybe 25 generations with the opponent playing rock, rock, rock, rock. Suddenly, in a flash of inspiration, the opponent switches to scissors, scissors, scissors and the cycle begins anew. It’s hard to know what’s happening within the black box, but I’d speculate the RNN’s were simply naval gazing as it were, being stuck on local maxima and paying undue attention to their own strategies that worked rather than the opponent’s. I tried creating a quick support vector machine player to compete against the RNN but faced the same results. I was hoping that by extracting features for each RNN from their personal history that they might learn a flavor of deception. Maybe they’d even do the opposite of what they normally would to keep an opponent on their toes. The possibility to utilize a random input was added to promote this outcome. But, I forgot that in all of my past dabbling with machine learning such algorithms are prone to cheating relentlessly and lazing about the simplest possible solution that made a slight improvement, like paper, paper, paper. The resulting players could easily be defeated by a human opponent, and as such I moved on to greener pastures.