GC - Computing GC Content
Identifying Unknown DNA Quickly
The GC-content of a DNA string is given by the percentage of symbols in the string that are 'C' or 'G'. For example, the GC-content of "AGCTATAG" is 37.5%. Note that the reverse complement of any DNA string has the same GC-content.
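
As a quick check of the definition above, here is a minimal sketch that recomputes the 37.5% example and confirms that the reverse complement gives the same value. The helper names `gc_content` and `reverse_complement` are illustrative and not part of the problem statement, and the sketch assumes uppercase strings over A, C, G and T.

```python
def gc_content(dna):
    """Return the percentage of symbols in dna that are 'G' or 'C'."""
    if not dna:
        raise ValueError("GC-content is undefined for an empty string")
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)


def reverse_complement(dna):
    """Reverse the string, then swap A<->T and C<->G."""
    return dna[::-1].translate(str.maketrans("ACGT", "TGCA"))


s = "AGCTATAG"
print(gc_content(s))                      # 37.5 (3 of 8 symbols are G or C)
print(reverse_complement(s))              # CTATAGCT
print(gc_content(reverse_complement(s)))  # 37.5
```

Complementing swaps each G for a C and each C for a G, and reversing only reorders symbols, so the combined count of G and C is unchanged.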

DNA strings must be labeled when they are consolidated into a database. A commonly used method of string labeling is called FASTA format. In this format, each string is introduced by a line that begins with '>', followed by some labeling information. The lines after it contain the string itself, which may be split across several lines; the next line that begins with '>' marks the label of the following string.

In Rosalind's implementation, a string in FASTA format will be labeled by the ID "Rosalind_xxxx", where "xxxx" denotes a four-digit code between 0000 and 9999.
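
To make the layout concrete, the sketch below parses a small FASTA-formatted string held in memory. The two records are invented for illustration (they are not Rosalind data), and `parse_fasta` is a hypothetical helper name.

```python
# Invented sample data: these IDs and sequences are for illustration only.
sample = """\
>Rosalind_0001
AGCTATAG
CCGG
>Rosalind_0002
ATAT
"""


def parse_fasta(text):
    """Map each label (without the leading '>') to its concatenated sequence."""
    records = {}
    label = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A '>' line starts a new record.
            label = line[1:]
            records[label] = ""
        elif label is not None:
            # Sequence lines are appended to the current record.
            records[label] += line
    return records


print(parse_fasta(sample))
# {'Rosalind_0001': 'AGCTATAGCCGG', 'Rosalind_0002': 'ATAT'}
```

Note that the sequence of the first record spans two lines in the input and is joined into a single string.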

Given: At most 10 DNA strings in FASTA format (of length at most 1 kbp each).

Return: The ID of the string having the highest GC-content, followed by the GC-content of that string. Rosalind allows for a default error of 0.001 in all decimal answers unless otherwise stated; please see the note on absolute error below.
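
The following sketch illustrates what an error of 0.001 means for a reported value. It assumes the check is an absolute difference, |reported - expected| <= 0.001, since the referenced note is not reproduced here; `within_tolerance` is a hypothetical helper name.

```python
def within_tolerance(reported, expected, tol=0.001):
    """Assumed check: the absolute difference must not exceed tol."""
    return abs(reported - expected) <= tol


exact = 100.0 * 2 / 3                   # GC-content of e.g. "GCA": 66.666...
print(f"{exact:.6f}")                   # 66.666667
print(within_tolerance(66.667, exact))  # True: off by about 0.0003
print(within_tolerance(66.7, exact))    # False: off by about 0.033
```

Under this assumption, printing at least three decimal places is enough, while rounding to one decimal place may not be.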

In [None]:
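def read_file(infile):
    """Parse a FASTA file into a list of (header, sequence) tuples."""
    all_seq = []
    header = None
    seq = ""
    # The with-block closes the file even if parsing raises.
    with open(infile, "r") as fin:
        for line in fin:
            if line.startswith(">"):
                # A new header ends the previous record, if there was one.
                if header is not None:
                    all_seq.append((header, seq))
                header = line.lstrip(">").strip()
                seq = ""
            else:
                # Sequence data may span several lines; join them.
                seq += line.strip()
    # Store the final record (skipped when the file has no headers).
    if header is not None:
        all_seq.append((header, seq))
    return all_seq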

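def gc_count(filename):
    """Return (ID, GC-content in percent) for the record with the highest GC-content."""
    seq_ids = []
    gc_counts = []
    for seq_id, seq in read_file(filename):
        # Count G and C directly. This assumes each sequence is non-empty and
        # contains only A, C, G and T, so len(seq) is the total base count.
        gc = seq.count("G") + seq.count("C")
        seq_ids.append(seq_id)
        gc_counts.append(100.0 * gc / len(seq))
    max_count = max(gc_counts)
    # list.index returns the first match, so ties go to the earliest record.
    max_id = seq_ids[gc_counts.index(max_count)]
    return max_id, max_count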

gc_count('rosalind_gc.txt')