Building a Battleships AI

Monte-Carlo Simulation & Probability

Using Monte-Carlo simulation to build an AI for Battleships

The game board

Battleships is a two-player game in which the goal is to sink the enemy's ships before the opponent can sink yours. Each player places a set number of ships. An example board could look something like this, where blue represents empty space and grey represents a ship:

[Image: an example game board]

The goal of this project was to design and implement a model that can play this game at or above average human level. To simplify things, we can treat it as a one-player game, since there is no interaction in the game state between the two players. So we set up a system whereby the model attempts to destroy a set of randomly placed ships on the enemy grid, and we then try to minimise the number of shots it takes the model to destroy every enemy ship.
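As a rough sketch of that setup (the fleet sizes and helper names here are my own assumptions for illustration, not the project's actual code), the evaluation harness might look something like this:

```python
import random

BOARD_SIZE = 10
SHIP_LENGTHS = [5, 4, 3, 3, 2]  # assumed classic fleet; the original sizes aren't specified

def random_fleet(size, lengths):
    """Randomly place non-overlapping ships; returns the set of occupied cells."""
    occupied = set()
    for length in lengths:
        while True:
            horizontal = random.random() < 0.5
            r = random.randrange(size if horizontal else size - length + 1)
            c = random.randrange(size - length + 1 if horizontal else size)
            cells = {(r, c + i) if horizontal else (r + i, c) for i in range(length)}
            if not cells & occupied:
                occupied |= cells
                break
    return occupied

def random_shooter(size, fired, hits):
    """Baseline policy: fire at any square not yet targeted."""
    untried = [(r, c) for r in range(size) for c in range(size) if (r, c) not in fired]
    return random.choice(untried)

def play_one_game(choose_target):
    """Return how many shots a policy needs to sink a randomly placed fleet."""
    ships = random_fleet(BOARD_SIZE, SHIP_LENGTHS)
    fired, hits = set(), set()
    while hits != ships:
        target = choose_target(BOARD_SIZE, fired, hits)
        fired.add(target)
        if target in ships:
            hits.add(target)
    return len(fired)

# The metric we optimise: mean shots to win across many games.
print(sum(play_one_game(random_shooter) for _ in range(200)) / 200)
```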

Possible approaches

Search and destroy

If we think about how a human might approach this game, the intuition is that once we get a hit, the adjacent squares are likely to contain the rest of the ship, so we attack those next. One approach to an AI is therefore to implement hardcoded rules for this. A common version is a Search and Destroy system in which the algorithm has two modes: it fires randomly until it scores a hit, then transitions into destroy mode and tries to work out where the next ship square will be. In theory this sounds simple, but in practice there are many edge cases, such as ships that are both parallel and adjacent. Situations like this require explicit programming, so it is not an elegant solution. It would perform better than random, but it's not ideal.
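For illustration, a minimal sketch of the two-mode idea (my own simplification, not the rule set any real implementation would need) can be written as a targeting policy that plugs into the harness above:

```python
import random

def search_and_destroy(size, fired, hits):
    """Two-mode policy: chase open neighbours of hits, otherwise fire randomly."""
    # Destroy mode: any hit with an untried orthogonal neighbour gets priority.
    for (r, c) in hits:
        for neighbour in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            nr, nc = neighbour
            if 0 <= nr < size and 0 <= nc < size and neighbour not in fired:
                return neighbour
    # Search mode: no live leads, so pick any untried square at random.
    untried = [(r, c) for r in range(size) for c in range(size) if (r, c) not in fired]
    return random.choice(untried)
```

Passed to `play_one_game`, this beats random fire, but it stumbles on exactly the edge cases mentioned: two parallel adjacent ships look like one blob of hits, and the naive neighbour rule has no way to tell them apart.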

Reinforcement learning

Initially, I opted to try reinforcement learning (specifically Deep Q-Learning), using a neural net that takes some state representation of the board as input and outputs the probable locations of ships. Having spent a long time attempting to make this work, I was unsuccessful. After speaking to some more knowledgeable machine learning practitioners, I learned there were likely a few reasons for this. In particular, Battleships is a POMDP (Partially Observable Markov Decision Process) rather than just an MDP (Markov Decision Process). The difference is that in an MDP the current state is fully observable (chess is an example, where all information is visible to both players), whereas in Battleships the locations of the enemy ships are unknown. This creates problems for Deep Q-Learning, which cannot infer this hidden information particularly well. The usual solution is to make the policy network recurrent, so that it can infer or predict what the hidden state might be. Having tried this, however, I was not able to get it to perform better than randomly targeting squares, perhaps because the unknown factor in Battleships is not revealed by past states: it is a question of probability, which the final solution is far better suited to. I'd like to return to this in the future and see if a deep learning solution to Battleships is workable.

Monte-Carlo simulation

Seeing that the task relied heavily on probability, I decided to investigate Monte-Carlo simulation, an approach that has been used successfully on Battleships and other probabilistic tasks in the past. The basic idea is to simulate the random placement of many ships onto the board so that each square can be assigned a probability of containing a ship. If we randomly place 10,000 ships onto only the legal positions of a simulation board, they will land on some squares far more often than others. For example, if there is only a single empty square between two misses, it is much less likely to contain a ship, since a ship cannot fit into that gap horizontally. Our random sampling reflects this, with that square receiving a low probability of containing a ship. We can also emphasise simulated placements that overlap existing hits.
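A simplified sketch of that sampling loop might look like the following (the hit-overlap weighting of 10 and the fleet sizes are assumptions for illustration, not values from the project):

```python
import random

def monte_carlo_heatmap(size, misses, hits, ship_lengths, samples=10000):
    """Estimate per-square ship likelihood by sampling random legal placements."""
    counts = [[0] * size for _ in range(size)]
    for _ in range(samples):
        length = random.choice(ship_lengths)
        horizontal = random.random() < 0.5
        r = random.randrange(size if horizontal else size - length + 1)
        c = random.randrange(size - length + 1 if horizontal else size)
        cells = [(r, c + i) if horizontal else (r + i, c) for i in range(length)]
        if any(cell in misses for cell in cells):
            continue  # illegal placement: overlaps a known miss
        # Emphasise placements overlapping existing hits (the x10 is arbitrary).
        weight = 1 + 10 * sum(cell in hits for cell in cells)
        for cell in cells:
            if cell not in hits:  # no value in re-shooting a confirmed hit
                counts[cell[0]][cell[1]] += weight
    return counts

def monte_carlo_shooter(size, fired, hits):
    """Policy wrapper: fire at the square the sampling rates most highly."""
    counts = monte_carlo_heatmap(size, fired - hits, hits, [5, 4, 3, 3, 2])
    untried = [(r, c) for r in range(size) for c in range(size) if (r, c) not in fired]
    return max(untried, key=lambda cell: counts[cell[0]][cell[1]])
```

In the harness above, `monte_carlo_shooter` can be dropped straight in as the targeting policy.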

Below you can see a GIF of the game board alongside a heatmap showing how the algorithm values each target square. It values squares adjacent to existing hits much more highly, and impossible or cramped spaces much lower. Here, dark blue is unknown, red is a miss, light blue is a hit and pink is a destroyed ship.

[Animation: the game board alongside the algorithm's probability heatmap]
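To inspect such a heatmap yourself, a few lines of matplotlib over the `monte_carlo_heatmap` sketch above are enough (the colours here won't match the GIF's palette):

```python
import matplotlib.pyplot as plt

# One hit at (4, 5) on an otherwise unknown 10x10 board.
counts = monte_carlo_heatmap(10, misses=set(), hits={(4, 5)}, ship_lengths=[5, 4, 3, 3, 2])
plt.imshow(counts, cmap="hot")  # brighter squares = more simulated placements
plt.colorbar(label="weighted placement count")
plt.show()
```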

Conclusion

The finished implementation is able to win in approximately 44 moves on average, which is significantly better than the average human. Better performance might be possible by accounting for the human tendencies of the opposing player: people often have predictable strategies for placing ships that a machine could learn to exploit or adapt to over several games against the same opponent. What I learnt from this project is that over-engineering a task (in this case by reaching for deep learning unnecessarily), although perhaps good for learning, is rarely ideal; a simpler, well-suited algorithm is usually better.
