# Sudoku Solver

Taro Kuriyama. January 2016, Revised August 2020.
  
Functionally inspired by Richard Bird's solution in *Pearls of Functional Algorithm Design*.

<hr>

In [1]:
from collections import defaultdict

## Define and Show Board

The Sudoku board is provided as a string of digits, where either '0' or '.' defines a blank cell. 

This string is parsed into a list of 9 rows, referred to throughout as `board`. Each row is a list of 9 cells, where each cell is a set of the integer values still possible for it. A pre-defined cell is parsed into a singleton set, whereas an empty cell is parsed into the set of all possible values {1, 2, 3, 4, 5, 6, 7, 8, 9}.
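As a small illustration of this encoding, the sketch below parses a single row. `parse_cell` is a hypothetical helper introduced only for this example; it is not part of the notebook, which builds the whole board with `parse`.

```python
def parse_cell(ch):
    """Map one input character to a set of candidate digits (illustrative helper)."""
    if ch in ('0', '.'):
        return set(range(1, 10))   # blank cell: any digit 1-9 is still possible
    return {int(ch)}               # given cell: singleton set

# First row of the `easy` sample board.
row = [parse_cell(ch) for ch in '..6...94.']

assert row[2] == {6}       # a given digit becomes a singleton
assert len(row[0]) == 9    # a blank keeps all nine candidates
```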

In [2]:
def group(items, n):
    """Group sequence into n-tuples."""
    return list(zip(*[items[i::n] for i in range(n)]))

def parse(board_str):
    """Parse puzzle input string into board object."""
    assert len(board_str) == 81
    digits = {1, 2, 3, 4, 5, 6, 7, 8, 9}
    board = group(board_str, 9)
    return [[digits if n in ('0', '.') else set([int(n)]) for n in row]
            for row in board]

**Sample Boards**

Some sample boards are borrowed from http://www.7sudoku.com/instructions/loading-puzzles.

`show` returns a string representation of the board that can be printed for visualization.

In [3]:
easy = parse('..6...94.9.....3....4.92...6.7.1..2.5.23.64.9.3..4.7.5...68.5....5.....4.98...1..')
medium = parse('...28.94.1.4...7......156.....8..57.4.......8.68..9.....196......5...8.3.43.28...')
hard = parse('...16..2...2...8.5..5..36.9....5.18...........96.7....1.89..3..4.9...7...5..16...')

In [4]:
nefarious = parse('000060080020000000001000000070000102500030000000000400004201000300700600000000050')

In [5]:
def to_str(sets):
    """Convert sequence of integer sets to string."""
    return ' '.join(['.' if len(s) > 1 else str(tuple(s)[0])
                     for s in sets])
    
def show(board):
    """Convert board object to string representation."""
    bars = '-' * 21 + '\n'
    output = bars
    row_group = group(board, 3)
    for rows in row_group:
        for row in rows:
            output += ' | '.join([to_str(cols) for cols in group(row, 3)])
            output += '\n'
        output += bars
    return output

In [6]:
print(show(easy))

---------------------
. . 6 | . . . | 9 4 .
9 . . | . . . | 3 . .
. . 4 | . 9 2 | . . .
---------------------
6 . 7 | . 1 . | . 2 .
5 . 2 | 3 . 6 | 4 . 9
. 3 . | . 4 . | 7 . 5
---------------------
. . . | 6 8 . | 5 . .
. . 5 | . . . | . . 4
. 9 8 | . . . | 1 . .
---------------------



## Board Validation

A board is valid if no cell has run out of candidates and there are no duplicate solved numbers in any row, column, or box.

A board is complete if **(a)** it is valid and **(b)** every row, column, and box sums to 45 (assuming only the digits 1-9 are ever used in the board). Note that given (a), only one of rows, columns, or boxes needs to be checked for (b).
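To see why condition (b) is not enough on its own, consider a unit that sums to 45 while repeating a digit. The sketch below uses plain lists of integers and a hypothetical `no_duplicates` helper rather than the notebook's set-based cells.

```python
def no_duplicates(unit):
    """Return True if no digit repeats in the unit (illustrative helper)."""
    return len(unit) == len(set(unit))

solved_unit = list(range(1, 10))   # 1 + 2 + ... + 9 == 45
bogus_unit = [5] * 9               # also sums to 45, but repeats 5

assert sum(solved_unit) == 45 and no_duplicates(solved_unit)
assert sum(bogus_unit) == 45 and not no_duplicates(bogus_unit)
```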

The validation functions always iterate over board transformations, which are generated by the functions `rows` (identity function), `cols` (matrix transposition), and `boxs` (grouping into boxes). This keeps the validation logic simple.

Note that `rows`, `cols`, and `boxs` are all involutions for a 9 x 9 Sudoku board as defined here, so `cols(cols(board))` is equivalent to just `board`.
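The involution property can be checked on a grid whose cells are labelled with their own coordinates. This is a minimal sketch: `transpose` and `to_boxes` are hypothetical stand-ins for `cols` and `boxs`, and `to_boxes` uses index arithmetic instead of the grouping approach taken in this notebook.

```python
def transpose(grid):
    """Swap rows and columns."""
    return [list(col) for col in zip(*grid)]

def to_boxes(grid):
    """Regroup a 9 x 9 grid so that each output row holds one 3 x 3 box."""
    return [[grid[3 * (b // 3) + i // 3][3 * (b % 3) + i % 3]
             for i in range(9)]
            for b in range(9)]

# Label every cell with its (row, column) position.
grid = [[(r, c) for c in range(9)] for r in range(9)]

# Applying either transformation twice gives back the original grid.
assert transpose(transpose(grid)) == grid
assert to_boxes(to_boxes(grid)) == grid
```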

In [7]:
def rows(board):
    """Return board."""
    return board

def cols(board):
    """Return transposed board."""
    return list(zip(*board))

def flatten(nested):
    """Flatten nested sequence of sequences."""
    return list(n for sublist in nested for n in sublist)

def boxs(board):
    """Group board by its boxes.
    For each triple of rows, group each row into 3 digits and 
    zip up the rows; the resulting lists are the board boxes.
    """
    boxes = []
    for grouped in group(board, 3):
        triple = [group(row, 3) for row in grouped]
        zipped = list(zip(*triple))
        rows = [flatten(row) for row in zipped]
        boxes.extend(rows)
    return boxes

In [8]:
def singletons(row):
    """Return all singleton values from list of sets."""
    return [tuple(ns)[0] for ns in row if len(ns) == 1]

def singleton_nums(row):
    """Return all singleton numbers from list of sets, excluding singleton sets.
    singletons() only finds singleton sets like {1} in [{1}, {2,3,4}, {3,4}];
    this function would find [2], since 2 only appears once in all non-singleton sets.
    """
    exclude = singletons(row)
    ns = [n for nums in row for n in nums 
          if len(nums) > 1 and n not in exclude]
    digits = defaultdict(int)    
    for n in ns:
        digits[n] += 1 
    return [k for k in digits if digits[k] == 1]

In [9]:
def noempties(board):
    """Iterate through board; return True if there are no
    empty cells in any row."""
    return all(ns for ns in flatten(board))

def nodups(board):
    """Iterate through board; return True if there are no 
    duplicate singleton numbers in any row.
    """
    checks = []
    for row in board:
        singles = singletons(row)
        checks.append(len(singles) == len(set(singles)))
    return all(checks)

def valid(board):
    """Return True if board is valid."""
    return (noempties(board) and 
            all(nodups(f(board)) for f in (rows, cols, boxs)))
    
def complete(board):
    """Return True if board is complete."""
    return (valid(board) and 
            all([sum(singletons(row)) == 45 for row in board]))

**Test `singletons` and `singleton_nums`**

In [10]:
print(sorted(singletons([{1}, {1,2}, set()])) == [1])
print(sorted(singletons([{1}, {1,2}, {3}, {4,5,6}])) == [1,3])

# 4, 6, 9 each appear exactly once among the non-singleton sets
print(sorted(singleton_nums([{1}, {2}, {3,4,5}, {3}, {5,6,7}, {5,7,9}])) == [4, 6, 9])

True
True
True


**Validate Sample Boards**

We can run some trivial unit tests to verify the input boards are as expected.

In [11]:
def verify_board(board):
    """Some basic unit tests"""
    assert valid(board) is True
    assert complete(board) is False
    assert len(board) == 9
    return True

In [12]:
print(verify_board(easy))
print(verify_board(medium))
print(verify_board(hard))

True
True
True


**Validate Transformations**

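Verify that applying each of `rows`, `cols`, and `boxs` twice returns the original board, i.e. that the transformations are involutions and leave the board unchanged.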

In [13]:
hard == boxs(boxs(cols(cols(rows(rows(hard))))))

True

<hr>

## Tracker

A global `TRACKER` dictionary is introduced to profile the solver's progress and to debug issues (e.g. interminable loops!).

In [14]:
TRACKER = {'iters': 0,
           'stack': 0,
           'candidates': 9*9*9}

In [15]:
def update_tracker(board, stack, suppress=False):
    """Update global TRACKER dict"""
    TRACKER['iters'] += 1
    TRACKER['candidates'] = score_progress(board)
    TRACKER['stack'] = stack
    
    iters = TRACKER['iters']
    if not suppress:
        if iters < 1000 and iters % 100 == 0:
            print_tracker(board)
        elif iters < 5000 and iters % 500 == 0:
            print_tracker(board)
        elif iters % 1000 == 0:
            print_tracker(board)

In [16]:
def score_progress(board):
    """Return # of candidates remaining in board."""
    return sum(sum(len(cell) for cell in row) for row in board)

In [17]:
def clear_tracker():
    """Reset counters to 0 and candidates to 9*9*9"""
    for k in TRACKER:
        TRACKER[k] = 0
    TRACKER['candidates'] = 9*9*9

In [18]:
def print_tracker(board=[]):
    """Print tracker and, if the board is valid, the board itself"""
    print(TRACKER)
    if valid(board): 
        print(show(board))
    else:
        print('Invalid board')

<hr>

## Solver: Prune, Fill, Search

The solver works on a stack of boards and operates in a pipeline of three steps:
1. prune
2. fill
3. search

**Prune**

For each unit (row, column, or box), any digit that already appears as a singleton in the unit can be removed from the candidate sets of the unit's other cells.
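A minimal sketch of pruning one unit, assuming cells are sets of candidate digits. `prune_unit` is a hypothetical name, and the unit is shortened to four cells to keep the example readable.

```python
def prune_unit(unit):
    """Remove already-solved digits from the unsolved cells of one unit."""
    solved = {next(iter(cell)) for cell in unit if len(cell) == 1}
    return [cell - solved if len(cell) > 1 else cell for cell in unit]

unit = [{1}, {1, 2, 3}, {3}, {2, 3, 4}]

# 1 and 3 are solved, so they disappear from the other cells.
assert prune_unit(unit) == [{1}, {2}, {3}, {2, 4}]

# The new singleton {2} only takes effect on a second pass.
assert prune_unit(prune_unit(unit)) == [{1}, {2}, {3}, {4}]
```

Because pruning can create new singletons, a single pass does not necessarily reach a fixed point.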

**Fill**

For each unit (row, column, or box), if a value can appear in only one cell, fill that cell with the value.
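A minimal sketch of filling one unit, again on a shortened four-cell unit. `fill_unit` is a hypothetical name, and it counts digits with `collections.Counter` rather than the `defaultdict` used elsewhere in this notebook.

```python
from collections import Counter

def fill_unit(unit):
    """Fix any digit that has exactly one possible cell in the unit."""
    solved = {next(iter(cell)) for cell in unit if len(cell) == 1}
    # Count how often each digit occurs among the unsolved cells.
    counts = Counter(d for cell in unit if len(cell) > 1 for d in cell)
    unique = {d for d, n in counts.items() if n == 1 and d not in solved}
    # A cell holding a unique digit collapses to that digit.
    return [cell & unique if cell & unique else cell for cell in unit]

unit = [{1}, {2, 3, 4}, {3, 4}, {3, 4, 5}]

# 2 and 5 each fit in only one cell; 3 and 4 remain undecided.
assert fill_unit(unit) == [{1}, {2}, {3, 4}, {5}]
```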

**Search**

Find possible next boards by trying each candidate value for a cell. It makes sense to branch on a cell with the fewest candidates, since that yields the fewest boards to explore. DFS is much faster than BFS for this solving model; invalid boards are discarded after each search iteration.
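The branching rule can be sketched on a flat list of cells. `branch` is a hypothetical name for illustration; unlike the solver's search step, it does not filter out invalid boards afterwards.

```python
def branch(cells):
    """Branch on the first unsolved cell with the fewest candidates."""
    open_cells = [(len(c), i) for i, c in enumerate(cells) if len(c) > 1]
    if not open_cells:
        return []                  # every cell is solved: nothing to guess
    _, idx = min(open_cells)       # ties resolve to the lowest index
    return [cells[:idx] + [{d}] + cells[idx + 1:]
            for d in sorted(cells[idx])]

cells = [{1}, {2, 3, 4}, {5, 6}, {7, 8}]

# The first two-candidate cell (index 2) is chosen, giving two branches.
assert branch(cells) == [[{1}, {2, 3, 4}, {5}, {7, 8}],
                         [{1}, {2, 3, 4}, {6}, {7, 8}]]
```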





In [19]:
def prune(board):
    """Remove known choices from the board."""
    rows = []
    for row in board:
        singles = singletons(row)
        new = [ns - set(singles) if len(ns) > 1 else ns
               for ns in row]
        rows.append(new)
    return rows

In [20]:
def fill(board):
    """Fill cells where only a single value is possible."""
    new_board = []
    for row in board:
        singles = singleton_nums(row)
        new_row = []
        for nums in row:
            intersect = set(singles) & set(nums)
            if intersect:
                new_row.append(intersect)
            else:
                new_row.append(nums)
        new_board.append(new_row)
    return new_board

In [21]:
def next_boards(board):
    """Generate list of some possible next boards."""
    flat = flatten(board)
    len_choices = [len(ns) for ns in flat if len(ns) > 1]
    
    if not len_choices: return []
    target_len = min(len_choices)

    boards = []
    for ind, ns in enumerate(flat):
        if len(ns) == target_len:
            flats = [flat[:ind] + [set([n])] + flat[ind+1:] for n in ns]
            boards.extend([group(b, 9) for b in flats])
            break
    
    return [b for b in boards if valid(b)]

The solver itself is straightforward because `rows`, `cols`, `boxs` are involutions and allow `prune` and `fill` to iterate over board transformations. For each board transformation, the board is first pruned, then filled, then inverted back to its original arrangement.
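The idea of applying a row-wise step through an involution can be sketched on a tiny grid. All names here are hypothetical: `sort_rows` merely stands in for a row-wise step such as `prune` or `fill`, and `transpose` stands in for `cols`.

```python
def transpose(grid):
    """Swap rows and columns (an involution)."""
    return [list(col) for col in zip(*grid)]

def sort_rows(grid):
    """Stand-in for any step that only knows how to process rows."""
    return [sorted(row) for row in grid]

def apply_via(view, step, grid):
    """Rearrange the grid, apply the row-wise step, then undo the view."""
    return view(step(view(grid)))

grid = [[3, 1],
        [2, 4]]

# Sorting "rows" of the transposed grid sorts the columns of the original.
assert apply_via(transpose, sort_rows, grid) == [[2, 1],
                                                 [3, 4]]
```

Because the view undoes itself, the row-wise step needs no separate column or box logic.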

In [22]:
def solve(board, suppress=False, max_iters=10000):
    """Solver: prune, fill, and search in order."""
    clear_tracker()
    
    boards = [board]
    while boards:
        board = boards.pop()    
        for f in (rows, cols, boxs):
            board = prune(f(board))
            board = f(fill(board))
            
        if complete(board): return show(board)
        if valid(board):
            boards.extend(next_boards(board))
            boards = sorted(boards, key=score_progress, reverse=True)
        
        update_tracker(board, len(boards), suppress)
        if TRACKER['iters'] > max_iters: break
            
    return []

## Results

The solver is designed with simplicity over performance in mind, but appears to work reasonably well. It can solve 17-hint puzzles (17 being the minimum number of hints for a uniquely completable board), though these may take longer (e.g. about 1 second).

In [23]:
print(show(nefarious))

---------------------
. . . | . 6 . | . 8 .
. 2 . | . . . | . . .
. . 1 | . . . | . . .
---------------------
. 7 . | . . . | 1 . 2
5 . . | . 3 . | . . .
. . . | . . . | 4 . .
---------------------
. . 4 | 2 . 1 | . . .
3 . . | 7 . . | 6 . .
. . . | . . . | . 5 .
---------------------



In [24]:
print(solve(nefarious, False, 5000))

{'iters': 100, 'stack': 8, 'candidates': 182}
---------------------
4 5 7 | 3 6 9 | 2 8 1
. 2 . | . 1 . | . . .
. . 1 | . . . | . . .
---------------------
. 7 . | . . . | 1 . 2
5 4 . | 1 3 . | . . .
1 . . | 9 . . | 4 . 5
---------------------
7 6 4 | 2 5 1 | . . .
3 1 5 | 7 . . | 6 2 .
2 . . | 6 . 3 | . 5 .
---------------------

{'iters': 200, 'stack': 9, 'candidates': 107}
Invalid board
{'iters': 300, 'stack': 11, 'candidates': 113}
---------------------
4 5 9 | 3 6 7 | 2 8 1
8 2 7 | 4 1 9 | 5 . .
6 3 1 | 8 2 5 | . 4 .
---------------------
9 7 . | 5 4 . | 1 . 2
5 4 . | 1 3 . | . . .
1 8 . | 9 7 . | 4 . 5
---------------------
7 6 4 | 2 5 1 | 3 . .
3 1 5 | 7 . 4 | 6 2 .
2 9 8 | 6 . 3 | . 5 4
---------------------

{'iters': 400, 'stack': 9, 'candidates': 103}
---------------------
4 5 9 | 3 6 7 | 2 8 1
6 2 3 | 4 1 8 | 5 . .
7 8 1 | 5 2 9 | 3 4 6
---------------------
9 7 6 | 8 4 5 | 1 . 2
5 4 . | 1 3 . | . . .
1 3 . | 9 7 . | 4 . 5
---------------------
8 6 4 | 2 5 1 | . . 3
3 1 5 |

Time some puzzles of varying degrees of supposed difficulty...

In [25]:
%%timeit
solve(easy, True)

4.74 ms ± 729 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [26]:
%%timeit
solve(hard, True)

8.67 ms ± 607 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [27]:
%%timeit
solve(nefarious, True)

1.3 s ± 118 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


<hr>

## More Puzzles

Some more puzzles from a massive repository of 17-hint puzzles collected here: http://staffhome.ecm.uwa.edu.au/~00013890/sudoku17
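Before solving, it can be useful to confirm that a puzzle string has the expected shape. The sketch below assumes each puzzle is an 81-character string with '0' or '.' for blanks, as in the sample boards; `count_hints` is a hypothetical helper, not part of the notebook.

```python
def count_hints(puzzle):
    """Count the given (non-blank) cells in an 81-character puzzle string."""
    if len(puzzle) != 81:
        raise ValueError('expected 81 characters, got {}'.format(len(puzzle)))
    return sum(ch not in ('0', '.') for ch in puzzle)

# The `nefarious` board from earlier in this notebook, as a raw string.
nefarious_str = ('000060080020000000001000000070000102500030000'
                 '000000400004201000300700600000000050')

assert count_hints(nefarious_str) == 17
```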

In [28]:
with open('puzzles.txt', 'r') as f:
    puzzles = [line.strip() for line in f.readlines()]

In [29]:
sample = 100
solved = [p for p in puzzles[:sample] 
          if solve(parse(p), True, 5000)]
print('{} of {} samples solved'.format(len(solved), sample))

100 of 100 samples solved
