## [Creating a Distance Matrix](https://rosalind.info/problems/pdst/)

### Background
For two strings s<sub>1</sub> and s<sub>2</sub> of equal length, the p-distance between them – denoted d<sub>p</sub>(s<sub>1</sub>, s<sub>2</sub>) – is the proportion of corresponding symbols that differ between s<sub>1</sub> and s<sub>2</sub>.
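
As a minimal sketch of this definition, the p-distance can be computed by counting mismatched positions and dividing by the string length. The name `p_distance` is chosen here for illustration only; the two strings come from the sample dataset in the Examples section.

```python
def p_distance(s1: str, s2: str) -> float:
    """Proportion of positions at which two equal-length strings differ."""
    if len(s1) != len(s2):
        raise ValueError("p-distance is only defined for strings of equal length")
    if not s1:
        raise ValueError("p-distance is undefined for empty strings")
    # Each comparison yields True (1) or False (0), so the sum counts mismatches.
    mismatches = sum(a != b for a, b in zip(s1, s2))
    return mismatches / len(s1)


# These two sample strings differ at 4 of their 10 positions.
print(p_distance("TTTCCATTTA", "GATTCATTTC"))  # 0.4
```

The unnormalized mismatch count is the Hamming distance, so the p-distance is the Hamming distance divided by the string length.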

For a general distance function `d` on `n` taxa (taxa are often represented by genetic strings), we may encode the distances between pairs of taxa via a distance matrix `D` in which D<sub>i,j</sub> = d(s<sub>i</sub>, s<sub>j</sub>).
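
The construction of `D` can be sketched for any distance function passed in as a callable. The names `build_distance_matrix` and `mismatch_fraction` are illustrative, and the sketch assumes the distance is symmetric with d(s, s) = 0, which holds for the p-distance but not for every possible `d`.

```python
from typing import Callable, List, Sequence


def build_distance_matrix(
    taxa: Sequence[str], distance: Callable[[str, str], float]
) -> List[List[float]]:
    """Return D with D[i][j] = distance(taxa[i], taxa[j])."""
    n = len(taxa)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Assumes a symmetric distance with a zero diagonal,
        # so each unordered pair is evaluated only once.
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = distance(taxa[i], taxa[j])
    return matrix


def mismatch_fraction(s1: str, s2: str) -> float:
    """p-distance for equal-length, non-empty strings."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


taxa = ["TTTCCATTTA", "GATTCATTTC", "TTTCCATTTT", "GTTCCATTTA"]
D = build_distance_matrix(taxa, mismatch_fraction)
print(D[0])  # [0.0, 0.4, 0.1, 0.1]
```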

### Problem
**Given:** A collection of n (n ≤ 10) DNA strings of equal length (at most 1 kbp).

**Return:** The matrix `D` corresponding to the p-distance d<sub>p</sub> on the given strings.

### Examples
Input:
```
>Rosalind_9499
TTTCCATTTA
>Rosalind_0942
GATTCATTTC
>Rosalind_6568
TTTCCATTTT
>Rosalind_1833
GTTCCATTTA
```

Output:
```
0.00000 0.40000 0.10000 0.10000
0.40000 0.00000 0.40000 0.30000
0.10000 0.40000 0.00000 0.20000
0.10000 0.30000 0.20000 0.00000
```
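
The sample output uses space-separated values with five decimal places. One way to produce that layout, as a sketch with a hypothetical helper named `format_matrix`, is a fixed-point format specifier:

```python
from typing import List


def format_matrix(matrix: List[List[float]], decimals: int = 5) -> str:
    """Render a matrix as space-separated rows with a fixed number of decimals."""
    return "\n".join(
        " ".join(f"{value:.{decimals}f}" for value in row) for row in matrix
    )


# The first two rows of the sample distance matrix.
sample_rows = [
    [0.0, 0.4, 0.1, 0.1],
    [0.4, 0.0, 0.4, 0.3],
]
print(format_matrix(sample_rows))
# 0.00000 0.40000 0.10000 0.10000
# 0.40000 0.00000 0.40000 0.30000
```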

In [43]:
```python
from wasims_toolbox import read_fasta


def distance_matrix(fasta_file: str) -> list:
    """Generate the p-distance matrix for the reads in a FASTA file."""

    # Materialize the reads once so they can be indexed in the nested loops.
    reads = list(read_fasta(file_path=fasta_file).values())

    def p_distance(read1: str, read2: str) -> float:
        """Proportion of positions at which two equal-length reads differ."""
        mismatches = sum(a != b for a, b in zip(read1, read2))
        # Rounded to three decimal places, as in the printed output.
        return round(mismatches / len(read1), 3)

    n = len(reads)
    matrix = [[0.0] * n for _ in range(n)]

    # The matrix is symmetric with a zero diagonal, so compute each pair once.
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = p_distance(reads[i], reads[j])

    return matrix
```
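
The `wasims_toolbox` module is not shown here. Judging only from how it is called (a `file_path` keyword argument, with `.values()` taken on the result), `read_fasta` appears to return a mapping from record label to sequence. The stand-in below is a hypothetical sketch under that assumption, not the author's implementation.

```python
from typing import Dict, Optional


def read_fasta(file_path: str) -> Dict[str, str]:
    """Hypothetical stand-in: parse a FASTA file into {label: sequence}."""
    records: Dict[str, str] = {}
    label: Optional[str] = None
    with open(file_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A header line starts a new record.
                label = line[1:]
                records[label] = ""
            elif label is not None:
                # Sequences may span several lines; join them.
                records[label] += line
    return records
```

Because Python dictionaries preserve insertion order (3.7 and later), the rows of the resulting matrix follow the order of the records in the file.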

In [44]:
```python
def print_distance_matrix(matrix: list) -> None:
    """Print a distance matrix, one tab-separated row per line."""

    for row in matrix:
        print("\t".join(str(x) for x in row))
```

In [45]:
```python
matrix = distance_matrix("./sample_datasets/23_collection.txt")
print_distance_matrix(matrix)
```

```
0.0	0.537	0.547	0.628	0.61	0.67	0.602	0.472	0.307	0.465
0.537	0.0	0.46	0.655	0.279	0.614	0.582	0.312	0.464	0.565
0.547	0.46	0.0	0.52	0.544	0.47	0.33	0.305	0.477	0.575
0.628	0.655	0.52	0.0	0.685	0.492	0.34	0.585	0.632	0.652
0.61	0.279	0.544	0.685	0.0	0.651	0.625	0.461	0.549	0.612
0.67	0.614	0.47	0.492	0.651	0.0	0.296	0.553	0.627	0.656
0.602	0.582	0.33	0.34	0.625	0.296	0.0	0.475	0.57	0.615
0.472	0.312	0.305	0.585	0.461	0.553	0.475	0.0	0.329	0.475
0.307	0.464	0.477	0.632	0.549	0.627	0.57	0.329	0.0	0.3
0.465	0.565	0.575	0.652	0.612	0.656	0.615	0.475	0.3	0.0
```
