## Using the UniProt API

- Guillem Ylla

- Links of interest:
    - https://www.uniprot.org/help/programmatic_access
    - https://www.uniprot.org/help/api_queries
    - https://www.ebi.ac.uk/proteins/api/doc/



### Get the FASTA of your proteins of interest from a website query

1. Make a query on the UniProt website: https://www.uniprot.org/uniprotkb?query=*
    - For example, search for "actin" and apply filters (for example, select human).
2. Click Download -> URL for API -> Compressed NO -> copy the URL of the query
    - For the previous example, the URL is: https://rest.uniprot.org/uniprotkb/stream?format=fasta&query=%28actin%29+AND+%28model_organism%3A9606%29
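
The URL copied in step 2 can also be assembled in Python, which helps when the search term changes often. The following is a minimal sketch, assuming the endpoint and the `format` and `query` parameters stay as they appear in the copied URL; `build_query_url` is a hypothetical helper name, not part of any UniProt library.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL copied from the website in step 2.
BASE_URL = "https://rest.uniprot.org/uniprotkb/stream"


def build_query_url(query, fmt="fasta"):
    """Return a UniProtKB stream URL for a query string (hypothetical helper)."""
    # urlencode percent-encodes parentheses and colons and turns spaces into "+".
    params = {"format": fmt, "query": query}
    return BASE_URL + "?" + urlencode(params)


url = build_query_url("(actin) AND (model_organism:9606)")
print(url)
```

The printed URL can be passed to `requests.get` exactly like a URL copied from the website.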

In [None]:
import requests
import re
import json

In [None]:
url = 'https://rest.uniprot.org/uniprotkb/stream?format=fasta&query=%28actin%29+AND+%28model_organism%3A9606%29'  # put the URL of the query here

In [None]:
Query = requests.get(url).text # send the query to UniProt and keep the response body as a string

In [None]:
type(Query) # string with fasta file of all selected proteins

In [None]:
fasta_list = re.split(r'\n(?=>)', Query)
len(fasta_list)

In [None]:
#fasta_list
fasta_list[1:5]
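
Each element of `fasta_list` still mixes the header line with the sequence lines. As a possible next step, the sketch below splits FASTA text into a dictionary that maps each header to its sequence. `parse_fasta` is a hypothetical helper, and the two records in `example` are made-up toy data, not real UniProt entries.

```python
def parse_fasta(text):
    """Map each FASTA header (without the leading '>') to its full sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence lines may wrap over several lines
    return {h: "".join(chunks) for h, chunks in records.items()}


# Toy input, invented for illustration only.
example = ">toy_1 first demo record\nMKV\nLLA\n>toy_2 second demo record\nGGH\n"
parsed = parse_fasta(example)
for header, sequence in parsed.items():
    print(header, len(sequence))
```

With the real data, the same function could be called as `parse_fasta(Query)`.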

### Get protein information from a list of UniProt IDs

https://www.ebi.ac.uk/proteins/api/doc/ 

#### Given a protein ID, get information (e.g. GO terms)

In [None]:
protID="P0CY46"

In [None]:
url_2="https://www.ebi.ac.uk/proteins/api/proteins?accession="+protID
print(url_2)

In [None]:
Query_2= requests.get(url_2, headers={ "Accept" : "application/json"}) # make the query to the EBI Proteins API

In [None]:
print(Query_2.status_code)

In [None]:
Query_2.json()

In [None]:
Query_json=Query_2.json()
type(Query_json)

In [None]:
Query_json[0] ## see all the information for the protein

In [None]:
Query_json[0]["accession"]

In [None]:
Query_json[0]["id"]

In [None]:
Query_json[0]["gene"]

In [None]:
#Query_json[0]["dbReferences"]
for i in Query_json[0]["dbReferences"]:
    #print(i)
    if i["type"] =="GO":
        print(i["id"],i["properties"]["term"] )
 


In [None]:
#Query_json[0]["dbReferences"]
for i in Query_json[0]["dbReferences"]:
    #print(i)
    if i["type"] =="PROSITE":
        print(i["id"] ,i["properties"]["entry name"] )
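
The two loops above differ only in the database name and the property that is printed. A small helper can capture the shared filtering step. This is a sketch: `db_references` is a hypothetical function, and `toy_entry` imitates only the fields used above with placeholder identifiers, so it is not real API output.

```python
def db_references(entry, db_type):
    """Return the dbReferences items of one entry whose "type" equals db_type."""
    return [ref for ref in entry.get("dbReferences", []) if ref.get("type") == db_type]


# Toy entry with placeholder IDs, mimicking only the fields used in the loops above.
toy_entry = {
    "dbReferences": [
        {"type": "GO", "id": "GO:0000000", "properties": {"term": "toy term"}},
        {"type": "PROSITE", "id": "PS99999", "properties": {"entry name": "TOY_DOMAIN"}},
    ]
}

go_refs = db_references(toy_entry, "GO")
for ref in go_refs:
    # .get() avoids a KeyError if a reference has no "properties" block.
    print(ref["id"], ref.get("properties", {}).get("term"))
```

With the real response, the call would be `db_references(Query_json[0], "PROSITE")` or any other value of `"type"`.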
 


#### Given a list of protein IDs, get information (e.g. PROSITE domains)

**Option 1:** iterate over the previous code.

In [None]:
listIDs=["P0CY46", "P00533", "Q29537" ]

for ID in listIDs:
    print(ID)
    url_3="https://www.ebi.ac.uk/proteins/api/proteins?accession="+ID
    Query_3= requests.get(url_3, headers={ "Accept" : "application/json"}).json() # make the query to the EBI Proteins API
    for i in Query_3[0]["dbReferences"]:
        #print(i)
        if i["type"] =="PROSITE":
            print("\t",i["id"], i["properties"]["entry name"] )

**Option 2 (Recommended):** Query multiple proteins at once and iterate over the returned output

In [None]:
listID=["P0CY46", "P00533", "Q29537","Q9VKM1" ]

url_4="https://www.ebi.ac.uk/proteins/api/proteins?accession="+",".join(listID)

print(url_4)

Query_4= requests.get(url_4, headers={ "Accept" : "application/json"}).json() # make the query to the EBI Proteins API
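
A single comma-separated request may not scale to long ID lists, because the API is likely to cap the number of accessions accepted per request. The sketch below splits a list into batches and builds one URL per batch. The default of 100 per batch is an assumption that should be checked against the Proteins API documentation, and `batch_urls` is a hypothetical helper.

```python
# Same endpoint as used in the cells above.
BASE = "https://www.ebi.ac.uk/proteins/api/proteins?accession="


def batch_urls(ids, batch_size=100):
    """Build one query URL per batch of IDs (batch_size=100 is an assumed limit)."""
    return [
        BASE + ",".join(ids[start:start + batch_size])
        for start in range(0, len(ids), batch_size)
    ]


ids = ["P0CY46", "P00533", "Q29537", "Q9VKM1"]
urls = batch_urls(ids, batch_size=2)  # tiny batch size just to show the splitting
for u in urls:
    print(u)
```

Each URL can then be requested in turn and the returned lists concatenated before iterating over the entries.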


In [None]:
len(Query_4)

In [None]:
for query in Query_4:
    print(query["accession"])

In [None]:
#query


In [None]:
for query in Query_4:# iterate over each queried protein
    accession=query["accession"]
    protid=query["id"]
    species=[]
    for i in query["organism"]["names"]:# this is a list, so an organism may have several names; collect them all
        species.append(i["value"])
    GOlist=[]
    for functDB in query["dbReferences"]:# for each functional database reference
        if functDB["type"] =="GO":
            GOlist.append(functDB["id"])
    print(accession, protid, "/".join(species), ";".join(GOlist),"\n" )
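
Instead of printing, the same fields can be collected into rows and written as a tab-separated table. This is a minimal sketch using only the standard library. `summarize` is a hypothetical helper, and `toy_entries` is invented data with placeholder identifiers that mimics only the keys used above.

```python
import csv
import io


def summarize(entry):
    """Flatten one entry into a row with accession, id, species and GO IDs."""
    names = [n["value"] for n in entry.get("organism", {}).get("names", [])]
    go_ids = [r["id"] for r in entry.get("dbReferences", []) if r.get("type") == "GO"]
    return {
        "accession": entry["accession"],
        "id": entry["id"],
        "species": "/".join(names),
        "go": ";".join(go_ids),
    }


# Toy data with placeholder values, not real API output.
toy_entries = [
    {
        "accession": "X00000",
        "id": "TOY_ENTRY",
        "organism": {"names": [{"value": "Toy species"}, {"value": "Toy"}]},
        "dbReferences": [
            {"type": "GO", "id": "GO:0000000"},
            {"type": "PROSITE", "id": "PS99999"},
        ],
    }
]

rows = [summarize(e) for e in toy_entries]
buffer = io.StringIO()  # replace with open("proteins.tsv", "w", newline="") to write a file
writer = csv.DictWriter(buffer, fieldnames=["accession", "id", "species", "go"], delimiter="\t")
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

With the real response, `toy_entries` would be replaced by `Query_4`.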
