## Resultados do CDD

Por forma a obter informações sobre os domínios das proteínas, recorremos à base de dados CDD (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) e retiramos para cada proteína e para cada domínio as seguintes informações:

- __Acession Number__;

- __Name__;

- __Description.

In [6]:
import os, sys, inspect
import pandas as pd
from IPython.core.display import display, HTML

def import_modules():
    """
    Importar os módulos que desenvolvemos neste trabalho.
    """
    current_dir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
    parent_dir = os.path.dirname(current_dir)
    sys.path.insert(0, parent_dir)

def itemize(l):
    """
    Criar uma lista HTML dada uma lista.
    """
    html = "<ul>"
    for i in l:
        html += "<li>"
        if isinstance(i, dict):
            html += itemize_dict(i)
        else:
            html += i
        html +="</li>"
    html += "</ul>"
    return html

def itemize_dict(d):
    """
    Criar uma lista HTML dado um dicionário.
    """
    html = "<ul style=\"list-style-type: square\">"
    for k in d:
        html += "<li><strong>" + k + ":</strong> " + str(d[k]) + "</li>"  
    html += "</ul>"
    return html
    
def main():
    import_modules()
    import util.rw as rw
    
    # mostra todas as linhas
    pd.options.display.max_rows = 250
    
    # não truncar informação
    pd.set_option('display.max_colwidth', -1)

    domains = rw.read_json("files/domains.json")

    df = pd.DataFrame(domains).transpose()
    df["domains"] = df["domains"].apply(itemize)
    display(HTML(df.to_html(escape=False)))

    
main()

Unnamed: 0,domains
lpg0232,accession: COG0735name: Furdesc: Fe2+ or Zn2+ uptake regulation protein [Inorganic ion transport and metabolism]
lpg0233,"accession: cd02002name: TPP_BFDCdesc: Thiamine pyrophosphate (TPP) family, BFDC subfamily, TPP-binding module accession: pfam02776name: TPP_enzyme_Ndesc: Thiamine pyrophosphate enzyme, N-terminal TPP binding domain accession: pfam00205name: TPP_enzyme_Mdesc: Thiamine pyrophosphate enzyme, central domain accession: COG0028name: IlvBdesc: Acetolactate synthase large subunit or other thiamine pyrophosphate-requiring enzyme"
lpg0234,accession: pfam12252name: SidEdesc: Dot/Icm substrate protein This family of proteins is found in bacteria
lpg0235,accession: cl01553name: GFA super familydesc: Glutathione-dependent formaldehyde-activating enzyme
lpg0237,accession: pfam12695name: Abhydrolase_5desc: Alpha/beta hydrolase family accession: cl21494name: Abhydrolase super familydesc: alpha/beta hydrolases
lpg0238,accession: cd07119name: ALDH_BADH-GbsAdesc: Bacillus subtilis NAD+-dependent betaine aldehyde dehydrogenase-like
lpg0239,accession: cl18945name: AAT_I super familydesc: Aspartate aminotransferase (AAT) superfamily (fold type I) of pyridoxal phosphate (PLP)-dependent enzymesaccession: COG0160name: GabTdesc: 4-aminobutyrate aminotransferase or related aminotransferase
lpg0241,accession: TIGR03814name: Gln_asedesc: glutaminase A
lpg0242,accession: cl21454name: NADB_Rossmann super familydesc: Rossmann-fold NAD(P)(+)-binding proteins accession: COG0111name: SerAdesc: Phosphoglycerate dehydrogenase or related dehydrogenase
lpg0243,accession: PRK07578name: PRK07578desc: short chain dehydrogenase Provisional


[Índice](index.html) | [Anterior](uniprot_results.html) | [Seguinte](ncbi_uniprot_results.html)