## Introduction

Hangman is a paper-and-pencil guessing game for two or more players. One player thinks of a word, phrase or sentence, and the other tries to guess it by suggesting letters or numbers within a certain number of guesses. If the guessing player suggests a letter that occurs in the word, the other player writes it in all its correct positions. If the suggested letter or number does not occur in the word, the other player draws one element of a hanged-man stick figure as a tally mark. If the guessing player makes enough incorrect guesses (5 in this tutorial) for the opponent to complete the diagram, the game is over and the guesser loses. The guesser wins by guessing all the letters or numbers that appear in the word before that happens.

An example game of Hangman in progress. The underlined letters appear in the word in their correct places, while the crossed-out letters do not appear, and each crossed-out letter corresponds to one part of the drawing. In this case, the secret word is "hangman".

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/f/f4/Hangman_game.jpg/440px-Hangman_game.jpg">

This tutorial shows you how to implement both a Hangman game model and AI models that play the game. The game model interacts with players: it hands out a puzzle and returns feedback for each guess. An AI model encodes a strategy for playing the game, and different strategies can be designed to increase the probability of a successful guess.

Want to try? Let's do it!

### Tutorial content

We will cover the following topics in this tutorial:
- [Build Hangman game model](#Build-Hangman-game-model)
- [Design Hangman AI model](#Design-Hangman-AI-model)
- [Letter Frequency AI model](#Build-Letter-Frequency-AI-model)
- [Probability Hangman AI model](#Build-Probability-Hangman-AI-model)
- [Summary](#Summary)

## Build Hangman game model

Before designing a Hangman strategy, you first need to implement a Hangman game model class that interacts with players.

This class should support 3 functions.

1. Load a word dictionary from a local file.
The model reads the words in a local text file into a dictionary, so that each time a player starts a new game it can randomly pick a word from the dictionary as the puzzle.

2. Start a new game.
When the player asks the model to start a new game, it resets the player's life and returns a new puzzle.

3. Return feedback for every guess.
The model responds differently depending on the player's guessed letter.
If the letter is not in the puzzle, it decreases the player's life and returns the puzzle unchanged.
If the letter is in the puzzle, it replaces the '_' at every position where the letter occurs and returns the updated puzzle.
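
The feedback rule in step 3 can be tried on its own before it is wrapped in a class. This is only a small sketch: the helper name `reveal` is hypothetical (it is not part of the class that follows), and it assumes upper-case words with '_' marking hidden positions.

```python
def reveal(answer, puzzle, letter):
    """Return the puzzle with every occurrence of letter in answer uncovered."""
    return ''.join(
        a if a == letter else p
        for a, p in zip(answer, puzzle)
    )

puzzle = '_' * len('HANGMAN')
puzzle = reveal('HANGMAN', puzzle, 'A')   # hit: both A's are uncovered
print(puzzle)                             # _A___A_
puzzle = reveal('HANGMAN', puzzle, 'E')   # miss: the puzzle is unchanged
print(puzzle)                             # _A___A_
```

A miss leaves the puzzle untouched, which is why the game model also has to track the player's remaining life separately.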

In [60]:
import random


class HangmanModel:
    def __init__(self, filename):
        """
        Initialize a hangman model.

        Args:
            filename: text file of whitespace-separated words used as the puzzle dictionary.
        """
        words = set()
        # 'with' closes the file even if reading fails
        with open(filename) as f:
            for line in f:
                # split() without arguments skips blank lines and repeated spaces,
                # so no empty "word" ends up in the dictionary
                words.update(line.split())
        self.dictionary = list(words)

    def newGame(self):
        """
        Start a new game: pick a random answer and reset the player's life.

        Returns: a new puzzle made up of '_' characters, one per letter of the answer.
        """
        self.answer = random.choice(self.dictionary)
        self.life = 5
        self.puzzle = '_' * len(self.answer)
        return self.puzzle

    def guess(self, c):
        """
        Return feedback for each guess.

        Args:
            c: char, the letter guessed for the puzzle word

        Returns: tuple (puzzle, status)
            puzzle: Updated puzzle word. If the guessed character is in the answer,
                    it is revealed at every position. Otherwise nothing changes.
            status: Player's current status, 'ALIVE' or 'DEAD'
        """
        if self.life <= 0:
            return self.puzzle, 'DEAD'
        if c in self.answer:
            # reveal every position where the guessed letter occurs
            self.puzzle = ''.join(c if a == c else p
                                  for a, p in zip(self.answer, self.puzzle))
            return self.puzzle, 'ALIVE'
        # a wrong guess costs one life
        self.life -= 1
        status = 'DEAD' if self.life <= 0 else 'ALIVE'
        return self.puzzle, status

In [61]:
# Test Hangman model
H = HangmanModel('dictionary.txt')
print('%d %s' % (len(H.newGame()), H.answer))
print(H.guess('A'))
print(H.guess('E'))
print(H.guess('I'))
print(H.guess('O'))
print(H.guess('U'))

13 RHYNCHOPHORAN
('___________A_', 'ALIVE')
('___________A_', 'ALIVE')
('___________A_', 'ALIVE')
('______O__O_A_', 'ALIVE')
('______O__O_A_', 'ALIVE')


## Design Hangman AI model

What is the optimal strategy for guessing letters to maximize the chances of getting your first letter?

In its purest form, hangman is a word game played between two people. One person selects a secret word, and the other tries to determine the word by guessing it letter-by-letter.

As a young person, when you first started to play the game, you probably called out random letters. Once a couple of letters hit, they helped you narrow down the solution.

After playing a number of games, you probably learned that not all letters are used equally. The letter 'Q' rarely appears in a word, whereas 'T' is used far more often. Also, once you have uncovered just a couple of letters in a Hangman puzzle, the game becomes easier.

These hints give us insight into designing an AI model for Hangman: the order in which letters are guessed has a large effect on performance.

Note: we assume the target word is in our dictionary, and we use this dictionary to design the strategy.

<img src="http://i.kinja-img.com/gawker-media/image/upload/s--6dNKRV6f--/17igdqkbuubcmpng.png">


### Build Letter Frequency AI model

In the English language, the twelve most commonly occurring letters are, in descending order: e-t-a-o-i-n-s-h-r-d-l-u. This and other letter-frequency lists are used by the guessing player to increase the odds when it is their turn to guess. 

The chart below shows how often each letter is used in the English language, with 'E' being the most common. Guessing high-frequency letters first is a better idea than guessing at random.

Therefore, we can build our frequency AI model by simply guessing letters in order of decreasing frequency.

Basically, we load the words from the dictionary and count how often each letter occurs. Finally, we sort the letters in decreasing order of count to produce the guess order.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/b/b0/English_letter_frequency_%28frequency%29.svg/760px-English_letter_frequency_%28frequency%29.svg.png">


In [62]:
def getFreqencyGuessOrder(filename):
    """
    Return the letters A-Z ordered from most to least frequent in the file.

    Args:
        filename: filename to load words

    Returns:
        order: guess order list such as ['E', 'I', 'A', ...]
    """
    wordcnt = {}
    # iterate over every line so that a blank line does not end the count early
    with open(filename) as f:
        for line in f:
            for ch in line:
                c = ch.upper()
                # count only the letters A-Z
                if c < 'A' or c > 'Z':
                    continue
                wordcnt[c] = wordcnt.get(c, 0) + 1
    # most frequent first; ties are broken by reverse alphabetical order
    return sorted(wordcnt, key=lambda k: (wordcnt[k], k), reverse=True)

In [63]:
# get the English letters sorted by decreasing frequency
guessOrder = getFreqencyGuessOrder('dictionary.txt')
print(guessOrder)

['E', 'I', 'A', 'O', 'R', 'N', 'T', 'S', 'L', 'C', 'U', 'P', 'M', 'D', 'H', 'Y', 'G', 'B', 'F', 'V', 'K', 'W', 'Z', 'X', 'Q', 'J']


Next, we design a frequency AI model to play a Hangman game.

Note: this AI model simply plays the game by guessing letters in order of decreasing frequency, ignoring the feedback it gets.

In [88]:
# build frequency AI model to solve hangman game puzzle
class FrequencyAIModel:

    def __init__(self, hangmanModel, guessOrder):
        """
        Args:
            hangmanModel: hangman model that provides puzzles and guess feedback
            guessOrder: list of letters sorted by decreasing frequency
        """
        self.guessList = guessOrder
        self.curguessId = 0
        self.hangman = hangmanModel

    def guess(self, puzzle):
        # the puzzle state is ignored: always return the next letter in the fixed order
        c = self.guessList[self.curguessId]
        self.curguessId += 1
        return c

    def play(self):
        """Play one game; return True if the word was solved, False otherwise."""
        self.curguessId = 0
        puzzle = self.hangman.newGame()
        s = 'ALIVE'
        # stop when dead, or when every letter in the order has been tried
        while s != 'DEAD' and self.curguessId < len(self.guessList):
            c = self.guess(puzzle)
            puzzle, s = self.hangman.guess(c)
            if '_' not in puzzle:
                return True
        return False

    def test(self, itern=100):
        """Return the fraction of games won over itern games."""
        n = 0
        for i in range(itern):
            if self.play():
                n += 1
        return n * 1.0 / itern
        

Test this AI model by playing 100 Hangman games and recording the fraction of successful games.

In [75]:
hangmanModel = HangmanModel('dictionary.txt')
AI = FrequencyAIModel(hangmanModel, guessOrder)
print(AI.test(100))

0.08
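
To see why a fixed order wins so rarely with only 5 lives, it helps to count the misses it makes on a single word. The sketch below is an illustration only: `missesWithFixedOrder` is a hypothetical helper that is not used elsewhere in this tutorial, and it reuses the guess order printed above together with the sample word from the earlier test run.

```python
def missesWithFixedOrder(word, order, life=5):
    """Play one word with a fixed guess order; return (solved, misses)."""
    remaining = set(word)      # distinct letters that are still hidden
    misses = 0
    for c in order:
        if c in remaining:
            remaining.discard(c)
            if not remaining:
                return True, misses
        else:
            misses += 1
            if misses >= life:
                return False, misses
    return False, misses

order = ['E', 'I', 'A', 'O', 'R', 'N', 'T', 'S', 'L', 'C', 'U', 'P', 'M',
         'D', 'H', 'Y', 'G', 'B', 'F', 'V', 'K', 'W', 'Z', 'X', 'Q', 'J']
print(missesWithFixedOrder('RHYNCHOPHORAN', order))   # (False, 5)
```

E, I, T, S and L all miss before H, Y, C and P are even tried, so the game is lost although four of the first six guesses were hits.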


### Build Probability Hangman AI model

As we can see from this game, once we've guessed some letters correctly, we know more about the word we're trying to guess. We should use that knowledge instead of simply continuing to follow the general frequency of letters.

Therefore, we play the game in several steps.

1) We know the length of the target word, n. Remove all words in the dictionary that aren't of length n

2) Calculate the word frequency of all letters in the dictionary

3) Guess the most frequent letter that we haven't already guessed.

4) If we guessed correctly, remove all words from the dictionary that don't match the revealed letters.

5) If we guessed incorrectly, remove all words that contain the incorrectly guessed letter

6) Go to step 2

For maximum effect, instead of calculating word frequencies of all letters in step 2, calculate them only for letters in positions that are still blank in the target word.

Be careful: guessing the same letter more than once is useless!
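
The steps above can be traced by hand on a tiny word list. The following is a minimal sketch, not the class used in the next cell: the word list and the helper names `matches` and `bestLetter` are made up for illustration, and it counts each letter at most once per word, which is one reading of the "word frequency" in step 2.

```python
from collections import Counter

def matches(word, puzzle, wrong):
    """True if word is still consistent with the puzzle and the wrong guesses."""
    if len(word) != len(puzzle):
        return False
    revealed = set(puzzle) - {'_'}
    for w, p in zip(word, puzzle):
        if p == '_':
            # a blank cannot hide a letter that was already guessed
            if w in revealed or w in wrong:
                return False
        elif w != p:
            return False
    return True

def bestLetter(candidates, puzzle, wrong):
    """Pick the unguessed letter that appears in the most candidate words."""
    guessed = (set(puzzle) - {'_'}) | set(wrong)
    count = Counter()
    for word in candidates:
        count.update(set(word) - guessed)   # each letter counted once per word
    # break ties alphabetically so the result is deterministic
    return min(count, key=lambda c: (-count[c], c))

words = ['CAT', 'COT', 'CUT', 'DOG', 'DIG', 'HORSE']
puzzle, wrong = '___', set()
candidates = [w for w in words if matches(w, puzzle, wrong)]
print(candidates, bestLetter(candidates, puzzle, wrong))
# ['CAT', 'COT', 'CUT', 'DOG', 'DIG'] C
```

After step 1 only the three-letter words remain. C and T each appear in three of them, and the alphabetical tie-break picks C. If C turns out to be wrong, calling `matches` with `wrong = {'C'}` leaves only DOG and DIG.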

In [89]:
# build probability AI model to solve hangman game puzzle
class ProbabilityAIModel:

    def __init__(self, hangmanModel, trainFile):
        """
        Args:
            hangmanModel: hangman model that provides puzzles and guess feedback
            trainFile: filename of the word list used to build the candidate set
        """
        self.trainFile = trainFile
        self.hangman = hangmanModel
        self.guessHistory = []
        self.candidate = []

    def initCandidate(self, puzzle):
        # step 1: keep only the words that have the same length as the puzzle
        self.candidate = []
        with open(self.trainFile) as f:
            for line in f:
                # split each line the same way HangmanModel does
                for word in line.split():
                    if len(word) == len(puzzle):
                        self.candidate.append(word)

    def guess(self, puzzle):
        wordcnt = {}
        newCandidate = []
        for word in self.candidate:
            # step 4: drop words that contradict the revealed letters
            if any(p != '_' and p != w for p, w in zip(puzzle, word)):
                continue
            newCandidate.append(word)
            # step 2: count letters only in positions that are still blank
            for p, w in zip(puzzle, word):
                if p == '_' and w not in self.guessHistory:
                    wordcnt[w] = wordcnt.get(w, 0) + 1
        self.candidate = newCandidate
        # step 3: guess the most frequent letter that has not been guessed yet
        return max(wordcnt, key=lambda k: wordcnt[k])

    def update(self, c):
        # step 5: after a wrong guess, drop every word that contains the letter
        self.candidate = [word for word in self.candidate if c not in word]

    def play(self):
        """Play one game; return True if the word was solved, False otherwise."""
        self.guessHistory = []
        puzzle = self.hangman.newGame()
        self.initCandidate(puzzle)
        s = 'FIRST'
        while s != 'DEAD':
            if '_' not in puzzle:
                return True
            c = self.guess(puzzle)
            self.guessHistory.append(c)
            pre_puzzle = puzzle
            puzzle, s = self.hangman.guess(c)
            # an unchanged puzzle means the guess was wrong
            if pre_puzzle == puzzle:
                self.update(c)
        return False

    def test(self, itern=10):
        """Return the fraction of games won over itern games."""
        n = 0
        for i in range(itern):
            if self.play():
                n += 1
        return n * 1.0 / itern
        

Similarly, test this AI model by playing 100 Hangman games and recording the fraction of successful games.

In [90]:
# test
hangmanModel = HangmanModel('dictionary.txt')
AI = ProbabilityAIModel(hangmanModel, 'dictionary.txt')
print(AI.test(100))

0.86


### Summary

This is an interesting application of probability theory, and a chance to do some coding for something cool!

Hope you had a good time here :)

More information can be found here:
1. https://en.wikipedia.org/wiki/Hangman_(game)
2. http://stackoverflow.com/questions/9205877/solve-hangman-in-ai-way