## Using UniProt API

- Guillem Ylla

- Links of interest:
    - https://www.uniprot.org/help/programmatic_access
    - https://www.uniprot.org/help/api_queries
    - https://www.ebi.ac.uk/proteins/api/doc/



### Get the FASTA of your proteins of interest with a website query

1. Run a query on the UniProt website: https://www.uniprot.org/uniprotkb?query=*
    - For example, search for "actin" and apply filters (for example, select human).
2. Click Download -> URL for API -> Compressed: No -> copy the URL of the query.
    - For the previous example, the URL is: https://rest.uniprot.org/uniprotkb/stream?format=fasta&query=%28actin%29+AND+%28model_organism%3A9606%29
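
The copied URL is just the query text with URL encoding applied, so it can also be assembled in code instead of being pasted from the website. The following is a minimal sketch using only the standard library; `build_stream_url` is a hypothetical helper name, not part of any UniProt client, and the query string is the one from the example above.

```python
from urllib.parse import urlencode

BASE_URL = "https://rest.uniprot.org/uniprotkb/stream"


def build_stream_url(query, fmt="fasta"):
    """Hypothetical helper: URL-encode a UniProtKB query into a stream URL."""
    # urlencode escapes parentheses and colons and turns spaces into '+'
    return BASE_URL + "?" + urlencode({"format": fmt, "query": query})


url = build_stream_url("(actin) AND (model_organism:9606)")
print(url)
```

Building the URL this way makes it easier to swap the search term or the organism without returning to the website.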

In [2]:
import requests
import re
import json

In [None]:
url = 'https://rest.uniprot.org/uniprotkb/stream?format=fasta&query=%28actin%29+AND+%28model_organism%3A9606%29'  # put the URL of the query here

In [None]:
Query = requests.get(url).text # make the query to Uniprot

In [None]:
type(Query) # string with fasta file of all selected proteins

In [None]:
fasta_list = re.split(r'\n(?=>)', Query)
len(fasta_list)

In [None]:
#fasta_list
fasta_list[1:5]
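
Each element of `fasta_list` is still one multi-line string (header plus wrapped sequence lines). A possible next step, sketched below, is to turn the records into a dictionary that maps each header to its full sequence. `parse_fasta` is a hypothetical helper, and the two records in `sample` are invented placeholders rather than real UniProt entries.

```python
import re


def parse_fasta(text):
    """Hypothetical helper: map each FASTA header to its joined sequence."""
    records = {}
    # Same split as above: break at every newline that is followed by '>'
    for record in re.split(r"\n(?=>)", text.strip()):
        if not record:
            continue
        header, *seq_lines = record.splitlines()
        records[header[1:]] = "".join(seq_lines)  # drop the leading '>'
    return records


# Invented two-record sample, only to show the shape of the result
sample = ">demo1 first placeholder record\nMKT\nAAL\n>demo2 second placeholder record\nGGS\n"
parsed = parse_fasta(sample)
print(parsed)
```

With the real response, `parse_fasta(Query)` would be the equivalent call.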

### Get protein information from a list of UniProt IDs

https://www.ebi.ac.uk/proteins/api/doc/ 

#### Given a protein ID, get information (e.g. GO terms)

In [None]:
protID="P0CY46"

In [None]:
url_2="https://www.ebi.ac.uk/proteins/api/proteins?&accession="+protID
print(url_2)

In [None]:
Query_2 = requests.get(url_2, headers={"Accept": "application/json"})  # make the query to the EBI Proteins API

In [None]:
print(Query_2.status_code)

In [None]:
Query_2.json()

In [None]:
Query_json=Query_2.json()
type(Query_json)

In [None]:
Query_json[0]  # see all the information for the protein

In [None]:
Query_json[0]["accession"]

In [None]:
Query_json[0]["id"]

In [None]:
Query_json[0]["gene"]

In [None]:
#Query_json[0]["dbReferences"]
for i in Query_json[0]["dbReferences"]:
    #print(i)
    if i["type"] =="GO":
        print(i["id"],i["properties"]["term"] )
 


In [None]:
#Query_json[0]["dbReferences"]
for i in Query_json[0]["dbReferences"]:
    #print(i)
    if i["type"] =="PROSITE":
        print(i["id"] ,i["properties"]["entry name"] )
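
The GO and PROSITE loops above differ only in the database type they test for, so the filter can be factored into one small function. This is a sketch under assumptions: `xrefs_of_type` is a hypothetical helper, and `sample_entry` is a hand-written, heavily trimmed dictionary that only mimics the `dbReferences` layout used above, not a real API response.

```python
def xrefs_of_type(entry, db_type):
    """Hypothetical helper: return the cross-references of one database type."""
    return [ref for ref in entry.get("dbReferences", []) if ref.get("type") == db_type]


# Hand-written stand-in for Query_json[0], trimmed to the fields used here
sample_entry = {
    "accession": "P00533",
    "dbReferences": [
        {"type": "GO", "id": "GO:0005524", "properties": {"term": "F:ATP binding"}},
        {"type": "PROSITE", "id": "PS00107", "properties": {"entry name": "PROTEIN_KINASE_ATP"}},
    ],
}

go_ids = [ref["id"] for ref in xrefs_of_type(sample_entry, "GO")]
prosite_names = [
    ref.get("properties", {}).get("entry name")  # .get avoids a KeyError if a field is absent
    for ref in xrefs_of_type(sample_entry, "PROSITE")
]
print(go_ids, prosite_names)
```

On the real data the call would be `xrefs_of_type(Query_json[0], "GO")`.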
 


#### Given a list of protein IDs, get information (e.g. PROSITE domains)

**Option 1:** iterate over the previous code.

In [None]:
listIDs=["P0CY46", "P00533", "Q29537" ]

for ID in listIDs:
    print(ID)
    url_3 = "https://www.ebi.ac.uk/proteins/api/proteins?&accession=" + ID
    Query_3 = requests.get(url_3, headers={"Accept": "application/json"}).json()  # one request to the EBI Proteins API per ID
    for i in Query_3[0]["dbReferences"]:
        #print(i)
        if i["type"] == "PROSITE":
            print("\t", i["id"], i["properties"]["entry name"])

**Option 2 (Recommended):** query multiple proteins in a single request, then iterate over the returned output.

In [3]:
listID = ["P0CY46", "P00533", "Q29537", "Q9VKM1"]

url_4 = "https://www.ebi.ac.uk/proteins/api/proteins?&accession=" + ",".join(listID)  # comma-separated accessions

print(url_4)

Query_4 = requests.get(url_4, headers={"Accept": "application/json"}).json()  # one request to the EBI Proteins API for all IDs


https://www.ebi.ac.uk/proteins/api/proteins?&accession=P0CY46,P00533,Q29537,Q9VKM1
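
For long ID lists, a single comma-separated URL can become unwieldy, and the service may limit how many accessions one request accepts. The sketch below splits the list into batches and builds one URL per batch. `batch_urls` is a hypothetical helper, and the default `batch_size=100` is an assumption to check against the Proteins API documentation, not a confirmed limit.

```python
BASE = "https://www.ebi.ac.uk/proteins/api/proteins?accession="


def batch_urls(accessions, batch_size=100):
    """Hypothetical helper: one request URL per batch of accessions."""
    # batch_size=100 is an assumed per-request limit; verify it in the API docs
    return [
        BASE + ",".join(accessions[start:start + batch_size])
        for start in range(0, len(accessions), batch_size)
    ]


# A small batch size here only makes the splitting visible
urls = batch_urls(["P0CY46", "P00533", "Q29537", "Q9VKM1"], batch_size=2)
for u in urls:
    print(u)
```

Each URL can then be requested as above, and the returned lists concatenated.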


In [4]:
len(Query_4)

4

In [5]:
for query in Query_4:
    print(query["accession"])

P00533
P0CY46
Q29537
Q9VKM1
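
In the output above, the entries do not come back in the order they were requested (`P00533` is listed before `P0CY46`), so results should be matched by accession rather than by position. The following sketch indexes the entries by accession; `returned` is a hand-written stand-in for `Query_4`, reduced to the one field needed here.

```python
requested = ["P0CY46", "P00533", "Q29537", "Q9VKM1"]

# Stand-in for Query_4, in the order shown in the output above
returned = [
    {"accession": "P00533"},
    {"accession": "P0CY46"},
    {"accession": "Q29537"},
    {"accession": "Q9VKM1"},
]

# Index the entries by accession so each lookup is independent of response order
by_accession = {entry["accession"]: entry for entry in returned}

ordered = [by_accession[acc] for acc in requested if acc in by_accession]
missing = [acc for acc in requested if acc not in by_accession]

print([entry["accession"] for entry in ordered])
print("missing:", missing)
```

The `missing` list also flags any requested ID for which no entry was returned.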


In [None]:
#query


In [6]:
for query in Query_4:  # iterate over each queried protein
    accession = query["accession"]
    protid = query["id"]
    species = ""
    for i in query["organism"]["names"]:  # a list: an organism can have several names (scientific, common, ...)
        species = i["value"]  # overwritten on each pass, so the last name listed is kept
    GOlist = []
    for functDB in query["dbReferences"]:  # for each cross-referenced functional database entry
        if functDB["type"] == "GO":
            GOlist.append(functDB["id"])
    print(accession, protid, species, ";".join(GOlist), "\n")


P00533 EGFR_HUMAN Human GO:0016324;GO:0009925;GO:0016323;GO:0030054;GO:0009986;GO:0030669;GO:0005737;GO:0031901;GO:0005789;GO:0005768;GO:0010008;GO:0005615;GO:0005925;GO:0000139;GO:0097708;GO:0016020;GO:0045121;GO:0097489;GO:0031965;GO:0005634;GO:0048471;GO:0005886;GO:0032991;GO:0043235;GO:0032587;GO:0070435;GO:0051015;GO:0005524;GO:0051117;GO:0045296;GO:0005516;GO:0003682;GO:0003690;GO:0019899;GO:0048408;GO:0005006;GO:0042802;GO:0005178;GO:0019900;GO:0004709;GO:0019901;GO:0019903;GO:0030296;GO:0004713;GO:0030297;GO:0004714;GO:0004888;GO:0031625;GO:0001618;GO:0007202;GO:0048143;GO:0007166;GO:0098609;GO:0071230;GO:0071276;GO:0071549;GO:0071364;GO:0071392;GO:0071260;GO:0034614;GO:0071466;GO:0021795;GO:0007623;GO:0048546;GO:0016101;GO:0001892;GO:0007173;GO:0050673;GO:0038134;GO:0061029;GO:0001942;GO:0042743;GO:0007611;GO:0097421;GO:0030324;GO:0007494;GO:0060571;GO:0043066;GO:1905208;GO:0042059;GO:0045930;GO:0042177;GO:0022008;GO:0048812;GO:0001503;GO:0042698;GO:0038083;GO:0018108;GO:00457
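
Printed lines like the one above get long quickly, so it may be more practical to collect the same fields into a tab-separated table. This is a minimal sketch with the standard `csv` module; `entries_to_tsv` is a hypothetical helper, and `sample_entries` is a hand-written, trimmed stand-in for `Query_4` that keeps only two of the GO IDs printed above.

```python
import csv
import io


def entries_to_tsv(entries):
    """Hypothetical helper: one TSV row per protein with its GO IDs joined by ';'."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["accession", "id", "go_terms"])
    for entry in entries:
        go_ids = [
            ref["id"]
            for ref in entry.get("dbReferences", [])
            if ref.get("type") == "GO"
        ]
        writer.writerow([entry["accession"], entry["id"], ";".join(go_ids)])
    return buffer.getvalue()


# Trimmed, hand-written stand-in for Query_4
sample_entries = [
    {
        "accession": "P00533",
        "id": "EGFR_HUMAN",
        "dbReferences": [
            {"type": "GO", "id": "GO:0016324"},
            {"type": "GO", "id": "GO:0009925"},
            {"type": "PROSITE", "id": "PS00107"},
        ],
    }
]

tsv_text = entries_to_tsv(sample_entries)
print(tsv_text)
```

Writing `tsv_text` to a file would give a table that opens directly in a spreadsheet or in pandas.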