# PubMed

## 1. About

- PubMed comprises more than 33 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full text content from PubMed Central and publisher web sites.
- The PMC OA service allows users to discover downloadable resources from the PMC Open Access Subset.

- `website`: https://pubmed.ncbi.nlm.nih.gov/
- `paper`: https://www.nature.com/articles/nbt.4267
- `OA Web Service`: https://www.ncbi.nlm.nih.gov/pmc/tools/oa-service/

## 2. API

In [2]:
import os
import json
import tarfile
import requests
from bs4 import BeautifulSoup

**Get database information:**

In [3]:
url = "https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi"
r = requests.get(url)
print(r.text)

<OA><responseDate>2021-11-08 09:24:05</responseDate><request>https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi</request><repositoryName>PubMed Central Open Access FTP Repository</repositoryName><formats><format>tgz</format><format>pdf</format></formats><records><count>3933626</count><count format="tgz">3933623</count><count format="pdf">1020064</count><latest>2021-11-08 05:16:42</latest></records></OA>
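
The response above is plain XML, so the counts can be read with the standard library rather than by eye. The following is a minimal sketch that parses a copy of that response offline; the helper name `summarize_info` and the `"all"` key for the unlabelled total are illustrative choices, not part of the service.

```python
import xml.etree.ElementTree as ET

# Sample copied from the output above (the <request> element is omitted for brevity).
SAMPLE_INFO = (
    "<OA><responseDate>2021-11-08 09:24:05</responseDate>"
    "<repositoryName>PubMed Central Open Access FTP Repository</repositoryName>"
    "<formats><format>tgz</format><format>pdf</format></formats>"
    "<records><count>3933626</count>"
    '<count format="tgz">3933623</count>'
    '<count format="pdf">1020064</count>'
    "<latest>2021-11-08 05:16:42</latest></records></OA>"
)


def summarize_info(xml_text):
    """Return record counts keyed by format, plus the latest update timestamp."""
    root = ET.fromstring(xml_text)
    counts = {}
    for count in root.iter("count"):
        # The total has no format attribute; store it under the made-up key "all".
        counts[count.get("format", "all")] = int(count.text)
    return {"counts": counts, "latest": root.findtext("records/latest")}


info = summarize_info(SAMPLE_INFO)
print(info)
```

With a live request, `summarize_info(r.text)` should work the same way, assuming the service keeps returning this layout.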



**Get all the records updated on or after a specified date:**

In [4]:
url = "https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi?from=2021-11-08"
r = requests.get(url)
print(r.text[:500])  # the full listing can be long, so show only the start
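
The query strings in these cells are written by hand. A small helper can build them from keyword arguments, which avoids typos when the date comes from a variable. This is only a sketch: `oa_url` and `OA_BASE_URL` are hypothetical names, and only the `from` and `id` parameters used in this notebook are exercised.

```python
from datetime import date
from urllib.parse import urlencode

OA_BASE_URL = "https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi"


def oa_url(**params):
    """Build an OA service URL from keyword arguments.

    `from` is a Python keyword, so it is passed as `from_`; the trailing
    underscore is stripped before the query string is encoded.
    """
    cleaned = {key.rstrip("_"): value for key, value in params.items()}
    query = urlencode(cleaned)
    return f"{OA_BASE_URL}?{query}" if query else OA_BASE_URL


since = date(2021, 11, 8)
from_url = oa_url(from_=since.isoformat())  # ISO format gives YYYY-MM-DD
id_url = oa_url(id="PMC5334499")
print(from_url)
print(id_url)
```

The resulting strings can be passed directly to `requests.get`.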

**Get a record by id:**

In [5]:
url = "https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi?id=PMC5334499"
r = requests.get(url)
print(r.text)

<OA><responseDate>2021-11-08 09:24:16</responseDate><request id="PMC5334499">https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi?id=PMC5334499</request><records returned-count="2" total-count="2"><record id="PMC5334499" citation="World J Radiol. 2017 Feb 28; 9(2):27-33" license="CC BY-NC" retracted="no"><link format="tgz" updated="2017-03-17 13:10:45" href="ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/8e/71/PMC5334499.tar.gz" /><link format="pdf" updated="2017-03-03 06:05:17" href="ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/8e/71/WJR-9-27.PMC5334499.pdf" /></record></records></OA>
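
Each `<record>` carries its metadata as attributes and one `<link>` per available format. As a rough sketch, the record above can be turned into a dictionary with the standard library; the sample below is copied from that output (without the `<responseDate>` and `<request>` elements), and `parse_records` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

SAMPLE_RECORD = (
    '<OA><records returned-count="2" total-count="2">'
    '<record id="PMC5334499" citation="World J Radiol. 2017 Feb 28; 9(2):27-33" '
    'license="CC BY-NC" retracted="no">'
    '<link format="tgz" updated="2017-03-17 13:10:45" '
    'href="ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/8e/71/PMC5334499.tar.gz" />'
    '<link format="pdf" updated="2017-03-03 06:05:17" '
    'href="ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/8e/71/WJR-9-27.PMC5334499.pdf" />'
    "</record></records></OA>"
)


def parse_records(xml_text):
    """Return one dict per <record>, with its links keyed by format."""
    records = []
    for record in ET.fromstring(xml_text).iter("record"):
        entry = dict(record.attrib)  # id, citation, license, retracted
        entry["links"] = {
            link.get("format"): link.get("href") for link in record.iter("link")
        }
        records.append(entry)
    return records


records = parse_records(SAMPLE_RECORD)
print(records[0]["id"], records[0]["license"])
print(records[0]["links"]["pdf"])
```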



**Download the tar archive with wget:**

`!wget -c ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/27/f6/PMC6517830.tar.gz -O - | tar -xz`

**Download the tar archive with Python:**

In [7]:
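PMCID = 'PMC5334499'
PMC_info_url = "https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi?id=" + PMCID
r = requests.get(PMC_info_url)
r.raise_for_status()  # stop here if the lookup failed
# Name the parser explicitly; html.parser lower-cases tag names, so <OA> is reached as .oa
r_text = BeautifulSoup(r.text, "html.parser")
links = r_text.oa.record.find_all('link')
PMC_tar_href, PMC_pdf_href = '', ''
for link in links:
    if link['format'] == 'tgz':
        PMC_tar_href = link['href']
    elif link['format'] == 'pdf':
        PMC_pdf_href = link['href']
# The service returns ftp:// links; the same paths are requested over HTTP here
PMC_tar_url = PMC_tar_href.replace('ftp:', 'http:')
PMC_pdf_url = PMC_pdf_href.replace('ftp:', 'http:')
print(PMC_tar_url)
print(PMC_pdf_url)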

http://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/8e/71/PMC5334499.tar.gz
http://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/8e/71/WJR-9-27.PMC5334499.pdf
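
The `ftp:` to `http:` rewrite above relies on string replacement, which would also touch any later `ftp:` inside the URL. A slightly more defensive sketch changes only the scheme component. The helper name `swap_scheme` is hypothetical; it defaults to `http` to match the cell above, and passing `"https"` assumes the host serves the same paths over HTTPS.

```python
from urllib.parse import urlsplit, urlunsplit


def swap_scheme(url, new_scheme="http"):
    """Return `url` with only its scheme replaced; host, path and query are kept."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(scheme=new_scheme))


ftp_href = "ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/8e/71/PMC5334499.tar.gz"
http_url = swap_scheme(ftp_href)
print(http_url)
```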


In [None]:
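tar_r = requests.get(PMC_tar_url)
tar_r.raise_for_status()  # stop here if the download failed
with open(PMCID + ".tar.gz", 'wb') as f:
    f.write(tar_r.content)
# Extract into the current directory; the context manager closes the archive
with tarfile.open(PMCID + '.tar.gz') as tar:
    tar.extractall('.')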
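
Before extracting, it can help to look at what an archive contains. The sketch below counts regular files by extension without writing anything to disk. It runs on a tiny in-memory archive whose file names are invented for the demo; the real layout of a PMC package is not assumed here, and `summarize_archive` and `make_demo_archive` are hypothetical helper names.

```python
import io
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(fileobj):
    """List regular files in a .tar.gz and count them by extension."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        names = [member.name for member in tar.getmembers() if member.isfile()]
    by_ext = Counter(PurePosixPath(name).suffix or "(none)" for name in names)
    return names, by_ext


def make_demo_archive():
    """Build a small gzip-compressed tar in memory (made-up contents)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, data in [("PMC0000000/article.nxml", b"<article/>"),
                           ("PMC0000000/figure1.jpg", b"\xff\xd8")]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buffer.seek(0)  # rewind so the archive can be read from the start
    return buffer


names, by_ext = summarize_archive(make_demo_archive())
print(names)
print(dict(by_ext))
```

For a downloaded package, the same function can be called with `open(PMCID + ".tar.gz", "rb")`.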

**Download the PDF with Python:**

In [60]:
pdf_r = requests.get(PMC_pdf_url)
pdf_r.raise_for_status()  # stop here if the download failed
with open(PMCID + ".pdf", 'wb') as f:
    f.write(pdf_r.content)
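
A download can succeed at the HTTP level and still return something other than a PDF, for example an HTML error page. One cheap sanity check, sketched here with a hypothetical helper `looks_like_pdf`, is to test for the `%PDF-` marker that PDF files begin with. The byte strings below are stand-ins, not real downloads.

```python
def looks_like_pdf(data):
    """Return True if the bytes start with the PDF header marker."""
    return data[:5] == b"%PDF-"


fake_pdf = b"%PDF-1.4\n% stand-in bytes for the demo\n"
fake_html = b"<!DOCTYPE html><html><body>Not found</body></html>"
print(looks_like_pdf(fake_pdf))
print(looks_like_pdf(fake_html))
```

In the cell above, the check would be applied as `looks_like_pdf(pdf_r.content)` before writing the file.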