## [Consensus and Profile](https://rosalind.info/problems/cons/)

### Background
Say that we have a collection of DNA strings, all having the same length `n`. Their profile matrix is a `4 × n` matrix `P` in which `P(1,j)` represents the number of times that 'A' occurs in the `j`th position of one of the strings, `P(2,j)` represents the number of times that 'C' occurs in the `j`th position, and so on (see below).

**DNA strands**
```
A T C C A G C T
G G G C A A C T
A T G G A T C T
A A G C A A C C
T T G G A A C T
A T G C C A T T
A T G G C A C T
```

**Profile**
```
A   5 1 0 0 5 5 0 0
C   0 0 1 4 2 0 6 1
G   1 1 6 3 0 1 0 0
T   1 5 0 0 0 1 1 6
```

A consensus string `c` is a string of length `n` formed from our collection by taking the most common symbol at each position; the `j`th symbol of `c` therefore corresponds to the symbol having the maximum value in the `j`th column of the profile matrix. 

**Consensus**
```
A T G C A A C T
```

Of course, there may be more than one most common symbol, leading to multiple possible consensus strings.
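
To make the effect of ties concrete, here is a minimal sketch that lists every consensus string a profile allows. The function name `all_consensus_strings`, the dict-of-lists layout for the profile, and the small 4 × 3 example profile are assumptions made for illustration; none of them come from the problem statement.

```python
from itertools import product


def all_consensus_strings(profile: dict) -> list:
    """Enumerate every consensus string permitted by ties in a profile.

    Assumes `profile` maps each nucleotide to a list of per-position counts.
    """
    n = len(next(iter(profile.values())))
    choices = []
    for j in range(n):
        # Highest count in column j, then every nucleotide that reaches it.
        best = max(counts[j] for counts in profile.values())
        choices.append([base for base, counts in profile.items() if counts[j] == best])
    # One consensus string per combination of tied symbols.
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical profile of four strings; 'A' and 'C' tie in the second column.
profile = {
    "A": [3, 2, 0],
    "C": [0, 2, 0],
    "G": [1, 0, 4],
    "T": [0, 0, 0],
}
print(all_consensus_strings(profile))  # ['AAG', 'ACG']
```

The number of consensus strings is the product of the tie sizes per column, so it can grow quickly; the problem only asks for one of them.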

### Problem
**Given:** A collection of at most 10 DNA strings of equal length (at most 1 kbp) in FASTA format.

**Return:** A consensus string and profile matrix for the collection. (If several possible consensus strings exist, then you may return any one of them.)

### Example
Input:
```
>Rosalind_1
ATCCAGCT
>Rosalind_2
GGGCAACT
>Rosalind_3
ATGGATCT
>Rosalind_4
AAGCAACC
>Rosalind_5
TTGGAACT
>Rosalind_6
ATGCCATT
>Rosalind_7
ATGGCACT
```

Output:
```
ATGCAACT
A: 5 1 0 0 5 5 0 0
C: 0 0 1 4 2 0 6 1
G: 1 1 6 3 0 1 0 0
T: 1 5 0 0 0 1 1 6
```
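
The expected output puts the consensus string on the first line, followed by one `X: counts` row per nucleotide in the order A, C, G, T. A small sketch of that layout follows; the helper name `format_answer` and the dict-of-lists profile are assumptions for illustration, not part of the notebook.

```python
def format_answer(consensus: str, profile: dict) -> str:
    """Render a consensus string and profile in the layout shown above."""
    lines = [consensus]
    for base in "ACGT":
        # Counts are separated by single spaces after the "X:" label.
        counts = " ".join(str(count) for count in profile[base])
        lines.append(f"{base}: {counts}")
    return "\n".join(lines)


# Profile copied from the example output.
sample_profile = {
    "A": [5, 1, 0, 0, 5, 5, 0, 0],
    "C": [0, 0, 1, 4, 2, 0, 6, 1],
    "G": [1, 1, 6, 3, 0, 1, 0, 0],
    "T": [1, 5, 0, 0, 0, 1, 1, 6],
}
print(format_answer("ATGCAACT", sample_profile))
```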

In [35]:
def get_reads(fasta_file: str) -> dict:
    """Reads a FASTA file and returns the DNA strands in a dictionary."""

    reads = {}
    read = None  # label of the record currently being filled

    with open(fasta_file, "r") as file:
        for line in file.read().splitlines():
            if line.startswith(">"):
                # Header line: start a new record keyed by its label.
                read = line[1:].strip()
                reads[read] = ""
            elif read is not None and line.strip():
                # Sequence line: a record may span several lines.
                reads[read] += line.strip()

    return reads

In [36]:
def generate_profile_matrix(reads: dict) -> dict:

    """
    Generates profile matrix for collection of DNA strands in FASTA format.

    Args:
        reads (dict): Dictionary with key-value pairs of read
        title and DNA strand.

    Returns:
        dict: Nested dictionary representing the 4 x n profile matrix.
    """
    
    pass
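
One possible way to fill in this stub is sketched below. It is only an assumption about the intended design: the name `build_profile` is hypothetical, and the profile is stored as a dictionary mapping each nucleotide to a list of `n` counts, which may differ from the nested dictionary the docstring has in mind.

```python
def build_profile(reads: dict) -> dict:
    """Count how often each nucleotide occurs at each position.

    Assumes `reads` maps record labels to DNA strands of equal length.
    """
    strands = list(reads.values())
    n = len(strands[0]) if strands else 0
    if any(len(strand) != n for strand in strands):
        raise ValueError("all DNA strands must have the same length")

    # One row of n zero counts per nucleotide, in the order A, C, G, T.
    profile = {base: [0] * n for base in "ACGT"}
    for strand in strands:
        for j, base in enumerate(strand):
            profile[base][j] += 1
    return profile


# Strands from the sample dataset in the Example section.
sample_reads = {
    "Rosalind_1": "ATCCAGCT",
    "Rosalind_2": "GGGCAACT",
    "Rosalind_3": "ATGGATCT",
    "Rosalind_4": "AAGCAACC",
    "Rosalind_5": "TTGGAACT",
    "Rosalind_6": "ATGCCATT",
    "Rosalind_7": "ATGGCACT",
}
print(build_profile(sample_reads)["A"])  # [5, 1, 0, 0, 5, 5, 0, 0]
```

Any symbol other than A, C, G, or T raises a `KeyError` in this sketch, which seems acceptable because the problem guarantees DNA strings.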

In [None]:
def get_consensus(profile_matrix: dict) -> str:
    
    """Returns first consensus strand found from the given profile matrix.

    Args:
        profile_matrix (dict): Profile matrix of dimension 4 x n,
        with nucleotide occurrences at each position of the strands of length n.

    Returns:
        str: Consensus strand.
    """
    
    pass
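
A minimal sketch of how this stub could be completed, under the assumption that the profile is a dictionary mapping each nucleotide to a list of counts. The name `consensus_from_profile` is hypothetical. Ties are broken in the order A, C, G, T, which yields one valid consensus strand among possibly several.

```python
def consensus_from_profile(profile: dict) -> str:
    """Pick the most common nucleotide in each column of the profile."""
    n = len(profile["A"])
    # max() returns the first maximal item, so ties favour A, then C, G, T.
    return "".join(
        max("ACGT", key=lambda base: profile[base][j]) for j in range(n)
    )


# Profile from the Example section.
example_profile = {
    "A": [5, 1, 0, 0, 5, 5, 0, 0],
    "C": [0, 0, 1, 4, 2, 0, 6, 1],
    "G": [1, 1, 6, 3, 0, 1, 0, 0],
    "T": [1, 5, 0, 0, 0, 1, 1, 6],
}
print(consensus_from_profile(example_profile))  # ATGCAACT
```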

In [37]:
import ipytest
ipytest.autoconfig()

def test_reads():
    actual = get_reads("./sample_datasets/10_collection_01.txt")
    expected = {
        'Rosalind_1': 'ATCCAGCT',
        'Rosalind_2': 'GGGCAACT',
        'Rosalind_3': 'ATGGATCT',
        'Rosalind_4': 'AAGCAACC',
        'Rosalind_5': 'TTGGAACT',
        'Rosalind_6': 'ATGCCATT',
        'Rosalind_7': 'ATGGCACT'
    }
    assert actual == expected

ipytest.run()

.                                                                                            [100%]
1 passed in 0.00s


<ExitCode.OK: 0>