# Overlap Graphs


A graph whose nodes have all been labeled can be represented by an adjacency list, in which each row of the list contains the two node labels corresponding to a unique edge.

A directed graph (or digraph) is a graph containing directed edges, each of which has an orientation. That is, a directed edge is represented by an arrow instead of a line segment; the starting and ending nodes of an edge form its tail and head, respectively. The directed edge with tail v and head w is represented by (v,w) (but not by (w,v)). A directed loop is a directed edge of the form (v,v).
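A minimal sketch of these definitions in Python, assuming edges are stored as ordered `(tail, head)` tuples (an illustration choice, not something the problem prescribes). The labels `a`, `b`, `c` and the helpers `is_loop` and `adjacency_rows` are hypothetical. Tuples are ordered, so `(v, w)` and `(w, v)` are different edges, and a directed loop is simply a tuple whose two entries are equal:

```python
# Directed edges as ordered (tail, head) tuples; the labels are illustrative only.
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "c")]

def is_loop(edge):
    """A directed loop has the same node as its tail and its head."""
    tail, head = edge
    return tail == head

def adjacency_rows(edges):
    """Format each edge as one adjacency-list row: 'tail head'."""
    return [f"{tail} {head}" for tail, head in edges]

print(("a", "b") == ("b", "a"))          # False: orientation matters
print([e for e in edges if is_loop(e)])  # [('c', 'c')]
print("\n".join(adjacency_rows(edges)))
```

Each row of the printed adjacency list is one edge, which is exactly the output format the problem asks for below.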

For a collection of strings and a positive integer $k$, the overlap graph for the strings is a directed graph $O_k$ in which each string is represented by a node, and string $s$ is connected to string $t$ with a directed edge when there is a length-$k$ suffix of $s$ that matches a length-$k$ prefix of $t$, as long as $s \neq t$. We demand $s \neq t$ to prevent directed loops in the overlap graph (although directed cycles may be present).
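The edge condition reduces to a single slice comparison. The helper `overlaps` below is a hypothetical sketch; it only compares string content, so the $s \neq t$ requirement (distinct nodes) is left to whoever calls it. It also guards against `k <= 0`, because in Python `s[-0:]` is the whole string rather than an empty suffix:

```python
def overlaps(s, t, k):
    """True when the length-k suffix of s equals the length-k prefix of t."""
    # s[-0:] would return all of s, and strings shorter than k have no
    # length-k suffix/prefix, so treat both cases as "no overlap".
    if k <= 0 or len(s) < k or len(t) < k:
        return False
    return s[-k:] == t[:k]

print(overlaps("AAATAAA", "AAATTTT", 3))  # True:  suffix AAA, prefix AAA
print(overlaps("AAATTTT", "AAATAAA", 3))  # False: suffix TTT, prefix AAA
```

The second call shows why the graph is directed: swapping the arguments changes the answer.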

- Given: A collection of DNA strings in FASTA format having total length at most 10 kbp.
- Return: The adjacency list corresponding to $O_3$. You may return edges in any order.
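Comparing every pair of strings is perfectly adequate for 10 kbp of input, but a common alternative (swapped in here; it is not the approach this notebook uses) is to index records by their length-$k$ prefix and then look up each record's length-$k$ suffix in that index. This sketch assumes record IDs are unique; `overlap_edges` is a hypothetical name. It only avoids work when prefixes are varied: if every string shares the same prefix it is still quadratic.

```python
from collections import defaultdict

def overlap_edges(records, k=3):
    """Return the edges of O_k as (tail_id, head_id) pairs.

    records is a list of (id, sequence) tuples with unique ids.
    """
    # Map each length-k prefix to the ids of the records that start with it.
    by_prefix = defaultdict(list)
    for rec_id, seq in records:
        if len(seq) >= k:
            by_prefix[seq[:k]].append(rec_id)

    edges = []
    for rec_id, seq in records:
        if len(seq) < k:
            continue
        # Every record whose prefix equals this suffix is a candidate head.
        for other_id in by_prefix.get(seq[-k:], []):
            if other_id != rec_id:  # s != t: no directed loops
                edges.append((rec_id, other_id))
    return edges

sample = [
    ("Rosalind_0498", "AAATAAA"),
    ("Rosalind_2391", "AAATTTT"),
    ("Rosalind_2323", "TTTTCCC"),
    ("Rosalind_0442", "AAATCCC"),
    ("Rosalind_5013", "GGGTGGG"),
]
for tail, head in overlap_edges(sample):
    print(tail, head)
```

On the sample dataset this prints the same three edges as the sample output. `Rosalind_5013` is skipped even though its suffix `GGG` matches its own prefix, because that would be a directed loop.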

> **Sample Dataset**
>
```
>Rosalind_0498
AAATAAA
>Rosalind_2391
AAATTTT
>Rosalind_2323
TTTTCCC
>Rosalind_0442
AAATCCC
>Rosalind_5013
GGGTGGG
```

> **Sample Output**
>
```
Rosalind_0498 Rosalind_2391
Rosalind_0498 Rosalind_0442
Rosalind_2391 Rosalind_2323
```
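Because edges may be returned in any order, one convenient way to check an answer against the sample output is to compare edge *sets*. The sketch below copies the sample data from above and uses a hypothetical brute-force helper, `brute_force_edges`, that tries every ordered pair of distinct records; it assumes record IDs are unique (they are dictionary keys here).

```python
from itertools import permutations

# Sample dataset and expected output, copied from above.
sample = {
    "Rosalind_0498": "AAATAAA",
    "Rosalind_2391": "AAATTTT",
    "Rosalind_2323": "TTTTCCC",
    "Rosalind_0442": "AAATCCC",
    "Rosalind_5013": "GGGTGGG",
}
expected_text = """\
Rosalind_0498 Rosalind_2391
Rosalind_0498 Rosalind_0442
Rosalind_2391 Rosalind_2323"""

def brute_force_edges(records, k=3):
    """Check every ordered pair of distinct records (s, t) for a k-overlap."""
    return {
        (s_id, t_id)
        for (s_id, s), (t_id, t) in permutations(records.items(), 2)
        if s[-k:] == t[:k]
    }

# Edges may be returned in any order, so compare as sets.
expected = {tuple(line.split()) for line in expected_text.splitlines()}
print(brute_force_edges(sample) == expected)  # True
```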

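```python
def parseFasta(file):
    """Return parallel lists of record IDs and sequences from a FASTA file."""
    idList, seqList = [], []
    with open(file, 'r') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue                  # skip blank lines
            if line.startswith('>'):
                idList.append(line[1:])   # header line: the ID follows '>'
                seqList.append('')
            else:
                seqList[-1] += line       # a sequence may span several lines
    return idList, seqList
```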

```python
import itertools

def olapGraphs(idList, seqList, k=3):
    """Print the adjacency list of the overlap graph O_k, one edge per line."""
    # Pair each ID with its sequence directly; a sequence -> ID dict would
    # merge records that happen to share the same sequence.
    records = list(zip(idList, seqList))
    for (sId, s), (tId, t) in itertools.combinations(records, 2):
        if s[-k:] == t[:k]:
            print(sId, tId)
        if t[-k:] == s[:k]:  # don't forget the other direction!!
            print(tId, sId)
```
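Printing is fine in a notebook, but returning the edges makes the result easier to test or save for submission. This is a hypothetical variant (`overlap_edge_list` and `write_adjacency_list` are illustrative names); it runs the same pairwise check over `itertools.combinations`, testing both directions of each pair:

```python
import itertools

def overlap_edge_list(idList, seqList, k=3):
    """Same pairwise check as olapGraphs, but return the edges instead of printing."""
    records = list(zip(idList, seqList))
    edges = []
    for (s_id, s), (t_id, t) in itertools.combinations(records, 2):
        if s[-k:] == t[:k]:
            edges.append((s_id, t_id))
        if t[-k:] == s[:k]:  # the other direction
            edges.append((t_id, s_id))
    return edges

def write_adjacency_list(edges, path):
    """Write one 'tail head' row per edge."""
    with open(path, "w") as out:
        for tail, head in edges:
            out.write(f"{tail} {head}\n")

ids = ["Rosalind_0498", "Rosalind_2391", "Rosalind_2323"]
seqs = ["AAATAAA", "AAATTTT", "TTTTCCC"]
print(overlap_edge_list(ids, seqs))
# [('Rosalind_0498', 'Rosalind_2391'), ('Rosalind_2391', 'Rosalind_2323')]
```

Calling `write_adjacency_list(edges, "answer.txt")` (the file name is only an example) would produce a file in the same row format as the sample output.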

```python
idList, seqList = parseFasta("Q12.txt")
olapGraphs(idList, seqList)
```

```
Rosalind_0498 Rosalind_2391
Rosalind_0498 Rosalind_0442
Rosalind_2391 Rosalind_2323
```


```python
idList, seqList = parseFasta("rosalind_grph.txt")
olapGraphs(idList, seqList, 3)
```

Rosalind_0293 Rosalind_3848
Rosalind_5288 Rosalind_3848
Rosalind_9287 Rosalind_3848
Rosalind_0218 Rosalind_3996
Rosalind_0908 Rosalind_0218
Rosalind_2634 Rosalind_3996
Rosalind_1719 Rosalind_2634
Rosalind_0807 Rosalind_8324
Rosalind_7820 Rosalind_1110
Rosalind_0987 Rosalind_7820
Rosalind_3256 Rosalind_4010
Rosalind_3256 Rosalind_1236
Rosalind_9837 Rosalind_9973
Rosalind_6807 Rosalind_9973
Rosalind_1236 Rosalind_9973
Rosalind_1089 Rosalind_9973
Rosalind_3595 Rosalind_9973
Rosalind_9837 Rosalind_1366
Rosalind_2523 Rosalind_9837
Rosalind_4520 Rosalind_9814
Rosalind_4520 Rosalind_1849
Rosalind_1849 Rosalind_4520
Rosalind_8655 Rosalind_5388
Rosalind_5388 Rosalind_9365
Rosalind_3484 Rosalind_3263
Rosalind_3484 Rosalind_0807
Rosalind_6884 Rosalind_4181
Rosalind_0987 Rosalind_4944
Rosalind_4944 Rosalind_1570
Rosalind_0447 Rosalind_1743
Rosalind_3192 Rosalind_3692
Rosalind_3192 Rosalind_3392
Rosalind_7830 Rosalind_3192
Rosalind_0526 Rosalind_3192
Rosalind_7917 Rosalind_3192
Rosalind_3996 Rosali