### Programming for Biomedical Informatics
#### Week 4 Assignment - PubMed Searching

In this weekly mini assignment you will practice using eUtils to query PubMed. There are 3 example PubMed queries in the snippets for week 4.

- if you want to use the Bio.Entrez module, make sure that you've installed Biopython
- you should already have a free NCBI account so that you can get an API key, but if not please register for an NCBI account
- you have the option of Bio.Entrez or the ```requests``` API approach demonstrated over the last 2 weeks
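
Whichever option you pick, the first step of a PubMed search is an eSearch request followed by reading the returned PMIDs. The sketch below shows one way this might look using only the standard library. The helper names `build_esearch_url` and `parse_pmids`, the example query, and the sample response (including the placeholder ID `00000000`) are illustrative assumptions rather than part of the course snippets; the sample is a trimmed imitation of an eSearch result so the parsing step runs without a network call.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_esearch_url(term, retmax=5, api_key=None):
    """Return a GET URL for a PubMed eSearch query (hypothetical helper)."""
    params = {'db': 'pubmed', 'term': term, 'retmax': retmax}
    if api_key:
        params['api_key'] = api_key
    return f"{ESEARCH_URL}?{urllib.parse.urlencode(params)}"

def parse_pmids(esearch_xml):
    """Return the PMIDs listed in an eSearch XML response as strings."""
    root = ET.fromstring(esearch_xml)
    return [id_element.text for id_element in root.findall('./IdList/Id')]

# Trimmed, made-up response used only to exercise the parser offline
sample_response = """<eSearchResult>
  <Count>2</Count>
  <IdList>
    <Id>39177104</Id>
    <Id>00000000</Id>
  </IdList>
</eSearchResult>"""

search_url = build_esearch_url('multi-omics[Title] AND graph[Title]')
pmids = parse_pmids(sample_response)

print(search_url)
print(pmids)
```

In a real run you would fetch `search_url` (for example with `urllib.request.urlopen`) and pass the response body to `parse_pmids` in place of the sample string.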

We've included the basic code below, based on the weekly snippets from ```./notebooks/week4``` on GitHub. Feel free to explore and try different things.

In [1]:
# using NCBI-NLM eUtils API directly

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# load your API key from the file
with open('../api_keys/ncbi.txt', 'r') as file:
    api_key = file.read().strip()

# load your email from the file
with open('../api_keys/ncbi_email.txt', 'r') as file:
    email = file.read().strip()

In [2]:
def get_paper_details(pubmed_id):
    """Return the title, Author elements and abstract text for one PubMed ID."""

    pubmed_id_query = f'{pubmed_id}[pmid]'

    # Define the parameters for the eSearch request
    esearch_params = {
        'db': 'pubmed',
        'term': pubmed_id_query,
        'api_key': api_key,
        'email': email,
        'usehistory': 'y'
    }

    # encode the parameters so they can be passed to the API
    encoded_data = urllib.parse.urlencode(esearch_params).encode('utf-8')

    # the base request url for eSearch
    url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

    # make the request
    request = urllib.request.Request(url, data=encoded_data)
    response = urllib.request.urlopen(request)

    # read into an XML object
    esearch_data_XML = ET.fromstring(response.read())

    # Extract WebEnv and QueryKey
    webenv = esearch_data_XML.find('WebEnv').text
    query_key = esearch_data_XML.find('QueryKey').text

    # Define the parameters for the eFetch request, reusing the stored search
    efetch_params = {
        'db': 'pubmed',
        'query_key': query_key,
        'WebEnv': webenv,
        'api_key': api_key,
        'email': email
    }

    # encode the parameters so they can be passed to the API
    encoded_data = urllib.parse.urlencode(efetch_params).encode('utf-8')

    # the base request url for eFetch
    url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

    # make the request
    request = urllib.request.Request(url, data=encoded_data)
    response = urllib.request.urlopen(request)

    # read into an XML object
    data_XML = ET.fromstring(response.read())

    # extract the title
    title = data_XML.find('.//ArticleTitle').text

    # extract the authors
    authors = data_XML.findall('.//Author')

    # extract the abstract (find() returns only the first AbstractText element)
    abstract = data_XML.find('.//AbstractText').text

    return title, authors, abstract
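
Note that `find()` returns only the first matching element and `.text` stops at the first nested tag, so a structured abstract (one with several `AbstractText` sections) or an author entry without a `LastName` would not come through cleanly. A minimal sketch of a more defensive extraction is shown below. The helper names `extract_abstract` and `format_author` and the sample record are assumptions made for illustration; the sample is a trimmed imitation of the shape of an eFetch response, not real data.

```python
import xml.etree.ElementTree as ET

def extract_abstract(article_xml):
    """Join every AbstractText section, keeping section labels and inline text."""
    sections = []
    for node in article_xml.findall('.//AbstractText'):
        # itertext() also collects text inside nested tags such as <i> or <sub>
        text = ''.join(node.itertext()).strip()
        label = node.get('Label')
        sections.append(f'{label}: {text}' if label else text)
    return '\n'.join(sections)

def format_author(author):
    """Return 'LastName, ForeName', falling back to a collective name if present."""
    last = author.findtext('LastName')
    fore = author.findtext('ForeName')
    if last and fore:
        return f'{last}, {fore}'
    return last or author.findtext('CollectiveName') or 'Unknown author'

# Made-up record used only to exercise the helpers offline
sample_record = ET.fromstring("""<PubmedArticleSet><PubmedArticle><MedlineCitation><Article>
  <ArticleTitle>Example title</ArticleTitle>
  <Abstract>
    <AbstractText Label="MOTIVATION">First section with <i>inline</i> markup.</AbstractText>
    <AbstractText Label="RESULTS">Second section.</AbstractText>
  </Abstract>
  <AuthorList>
    <Author><LastName>Doe</LastName><ForeName>Jane</ForeName></Author>
    <Author><CollectiveName>Example Consortium</CollectiveName></Author>
  </AuthorList>
</Article></MedlineCitation></PubmedArticle></PubmedArticleSet>""")

abstract_text = extract_abstract(sample_record)
author_names = [format_author(a) for a in sample_record.findall('.//Author')]

print(abstract_text)
print(author_names)
```

Both helpers take the parsed XML element, so they could be applied to the `data_XML` object built inside `get_paper_details` if you want to experiment.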

In [3]:
paper_data = get_paper_details('39177104')

title, authors, abstract = paper_data

print(f'Title: {title}')
print('Authors:')
for author in authors:
    print(f'\t{author.find("LastName").text}, {author.find("ForeName").text}')
print(f'Abstract: {abstract}')

Title: Multi-Omic Graph Diagnosis (MOGDx): a data integration tool to perform classification tasks for heterogeneous diseases.
Authors:
	Ryan, Barry
	Marioni, Riccardo E
	Simpson, T Ian
Abstract: Heterogeneity in human diseases presents challenges in diagnosis and treatments due to the broad range of manifestations and symptoms. With the rapid development of labelled multi-omic data, integrative machine learning methods have achieved breakthroughs in treatments by redefining these diseases at a more granular level. These approaches often have limitations in scalability, oversimplification, and handling of missing data.
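
The function above goes through eSearch and the history server (`WebEnv` and `QueryKey`) before calling eFetch. When the PMIDs are already known, eFetch also accepts them directly through its `id` parameter, which saves one request. The sketch below only builds the request so it can be inspected offline; `build_efetch_request` is a hypothetical helper name, and the optional `api_key` and `email` arguments stand in for the values loaded earlier.

```python
import urllib.parse

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def build_efetch_request(pubmed_ids, api_key=None, email=None):
    """Return (url, encoded POST body) for fetching PubMed records by ID."""
    params = {
        'db': 'pubmed',
        'id': ','.join(str(pmid) for pmid in pubmed_ids),  # comma-separated PMIDs
        'retmode': 'xml'
    }
    if api_key:
        params['api_key'] = api_key
    if email:
        params['email'] = email
    return EFETCH_URL, urllib.parse.urlencode(params).encode('utf-8')

url, body = build_efetch_request(['39177104'])

print(url)
print(body)
```

The returned pair can be passed to `urllib.request.Request(url, data=body)` in the same way as in the function above. The history-server route remains the better fit when a search returns many results that you want to fetch in batches.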
