# Web scraping tool for extracting meta-data from a list of PMIDs

# Step 1: Import the PMIDs
There are 3 ways to do this:

### (i) Manually insert the PMIDs
Import the PMIDs manually. <br>
This tool might help: https://docs.google.com/spreadsheets/d/1SXp37d2XkDuvkK-6YUtDt78Wu3NPCF2FSbwbLj_tigg/edit?usp=sharing


In [None]:
# Create a list called pmids
# pmids = ['37804447', '36818337', '38931296']

or
### (ii) Save the desired PMIDs from PubMed as a text file called "<b>pmids.txt</b>"
In PubMed, click "save citations to file" and select format: PMID. <br>
Then upload the .txt file to the Colab folder and use "Copy path" to get its location. <br>
IMPORTANT: Colab does not keep uploaded files once the session ends (unless the file is stored in your Google Drive and you have linked Colab to your Google Drive).


In [11]:
# Initialize an empty list to store PMIDs
pmids = []
# Open the .txt file and read it line by line
with open('/content/pmids.txt', 'r') as file:
    for line in file:
        # Strip any leading/trailing whitespace characters (including newlines)
        pmid = line.strip()
        # Add the PMID to the list, skipping blank lines
        if pmid:
            pmids.append(pmid)
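
The file-reading step above can also be wrapped in a small reusable function. The following is a minimal sketch, not part of the original notebook: the name `read_pmids` is hypothetical, and it assumes the file holds one PMID per line. It skips blank lines, ignores lines that are not purely numeric, and drops duplicates while keeping the original order.

```python
from pathlib import Path


def read_pmids(path):
    """Return the PMIDs found in a text file (one per line), in file order."""
    pmids = []
    seen = set()
    for line in Path(path).read_text().splitlines():
        pmid = line.strip()
        if not pmid:
            # Blank line: nothing to record
            continue
        if not pmid.isdigit():
            # PMIDs are numeric, so anything else is reported and skipped
            print(f"Skipping line that does not look like a PMID: {pmid!r}")
            continue
        if pmid not in seen:
            seen.add(pmid)
            pmids.append(pmid)
    return pmids

# Example use in Colab (path taken from the cell above):
# pmids = read_pmids('/content/pmids.txt')
```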

or
### (iii) Save the desired PMIDs from PubMed as a CSV file named "<b>pmids.csv</b>"
In PubMed, click "save citations to file" and select format: CSV. <br>
Then upload the .csv file to the Colab folder and use "Copy path" to get its location. <br>
IMPORTANT: Colab does not keep uploaded files once the session ends (unless the file is stored in your Google Drive and you have linked Colab to your Google Drive).

In [None]:
# Import pandas (needed to read the CSV file)
# import pandas as pd
# Read the 'PMID' column in the pmids.csv file.
# df = pd.read_csv('/content/pmids.csv', usecols=['PMID'])
# Send the PMID column to a list of strings (pandas reads the PMIDs as integers by default)
# pmids = df['PMID'].astype(str).tolist()
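
As an alternative to pandas, the same column can be read with Python's built-in `csv` module. This is a minimal sketch under two assumptions: the function name `read_pmids_from_csv` is hypothetical, and the CSV has a header row with a column named `PMID`, as the pandas cell above expects. The `utf-8-sig` encoding is used so that a byte-order mark at the start of the file, if present, does not end up in the first header name.

```python
import csv


def read_pmids_from_csv(path, column="PMID"):
    """Return the non-empty values of one column of a CSV file as strings."""
    pmids = []
    with open(path, newline="", encoding="utf-8-sig") as f:
        for row in csv.DictReader(f):
            # Missing cells come back as None, so fall back to an empty string
            value = (row.get(column) or "").strip()
            if value:
                pmids.append(value)
    return pmids

# Example use in Colab (path taken from the cell above):
# pmids = read_pmids_from_csv('/content/pmids.csv')
```

Because the values stay strings, the resulting list has the same shape as the lists produced by options (i) and (ii).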

# Step 2: Run the code below to extract the meta-data from each PMID.

In [12]:
# Install the Biopython package (https://pypi.org/project/biopython/); documentation: https://biopython.org/wiki/Documentation
!pip install biopython
from Bio import Entrez
from urllib.parse import quote

# NCBI asks for a contact email address with Entrez requests; replace this placeholder with your own
Entrez.email = "anynerd@anymail.com"

def fetch_article_details(pmids):
    details = []

    for pmid in pmids:
        try:
            # Fetch article details from PubMed using Entrez esummary
            handle = Entrez.esummary(db="pubmed", id=pmid, retmode="xml")
            records = Entrez.read(handle)
            handle.close()

            if records and len(records) > 0:
                record = records[0]
                title = record.get("Title", "No Title Available")

                # Construct the PubMed URL
                pubmed_url = f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/"

                # Read the DOI from the summary record (if present) and build the DOI link
                doi = record.get("DOI", "No DOI Available")
                doi_url = f"https://doi.org/{quote(doi)}" if doi != "No DOI Available" else "DOI Not Available"

                # Build the Sci-Hub link from the DOI
                scihub_url = f"https://sci-hub.st/{quote(doi)}" if doi != "No DOI Available" else ""

                details.append({
                    "PMID": pmid,
                    "Title": title,
                    "PubMed URL": pubmed_url,
                    "DOI URL": doi_url,
                    "Sci-hub URL": scihub_url
                })
            else:
                print(f"No details found for PMID: {pmid}")

        except Exception as e:
            print(f"An error occurred while fetching details for PMID: {pmid}. Error: {e}")

    return details

# Fetch article details
article_details = fetch_article_details(pmids)

# Output the results
for article in article_details:
    print(f"○ {article['PubMed URL']} {article['DOI URL']} {article['Sci-hub URL']} {article['Title']}")

○ https://pubmed.ncbi.nlm.nih.gov/38507778/ https://doi.org/10.1139/apnm-2023-0595 https://sci-hub.st/10.1139/apnm-2023-0595 Health benefits of interval walking training.
○ https://pubmed.ncbi.nlm.nih.gov/37471216/ https://doi.org/10.1152/japplphysiol.00662.2022 https://sci-hub.st/10.1152/japplphysiol.00662.2022 Exercise-induced changes to the fiber type-specific redox state in human skeletal muscle are associated with aerobic capacity.
○ https://pubmed.ncbi.nlm.nih.gov/37127822/ https://doi.org/10.1038/s42255-023-00799-7 https://sci-hub.st/10.1038/s42255-023-00799-7 Effects of different doses of exercise and diet-induced weight loss on beta-cell function in type 2 diabetes (DOSE-EX): a randomized clinical trial.
○ https://pubmed.ncbi.nlm.nih.gov/35111398/ https://doi.org/10.7717/peerj.12755 https://sci-hub.st/10.7717/peerj.12755 Plasma FGF21 concentrations are regulated by glucose independently of insulin and GLP-1 in lean, healthy humans.
○ https://pubmed.ncbi.nlm.nih.gov/34122354/ h
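
The loop in Step 2 sends one `esummary` request per PMID. `Entrez.esummary` also accepts several IDs as a single comma-separated string, so for long lists it may be worth grouping the PMIDs first. The sketch below only prepares those ID strings. The helper names `chunked` and `batch_id_strings` and the batch size of 200 are assumptions for illustration, not part of the original notebook.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_id_strings(pmids, batch_size=200):
    """Return one comma-separated ID string per batch of PMIDs."""
    pmids = [str(p) for p in pmids]
    return [",".join(batch) for batch in chunked(pmids, batch_size)]

# Hypothetical use with the Step 2 code (needs network access, so left commented):
# for id_string in batch_id_strings(pmids):
#     handle = Entrez.esummary(db="pubmed", id=id_string, retmode="xml")
#     records = Entrez.read(handle)   # a list of summary records for the batch
#     handle.close()
```

With batching, the PMID for each link would need to be read from the returned record rather than from the loop variable, so `fetch_article_details` would have to be adapted accordingly.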

# Step 3: Copy and paste the output above into a Google Doc or Word document.
## Note: after pasting, you will need to adjust the font, font color, background color, etc.
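
One way to reduce manual clean-up after pasting is to turn each result into a short HTML list item with labelled links instead of raw URLs. The following is a rough sketch, assuming the dictionaries returned by `fetch_article_details` in Step 2. The function names `article_to_html` and `articles_to_html` are hypothetical. Entries without a usable link (for example "DOI Not Available" or an empty Sci-hub URL) are left out.

```python
from html import escape


def article_to_html(article):
    """Format one article dictionary as an HTML list item with labelled links."""
    parts = []
    for label, key in (("PubMed", "PubMed URL"), ("DOI", "DOI URL"), ("Sci-Hub", "Sci-hub URL")):
        url = article.get(key, "")
        # Only real links are included; placeholder text and empty strings are skipped
        if url.startswith("https://"):
            parts.append(f'<a href="{escape(url, quote=True)}">{label}</a>')
    parts.append(escape(article.get("Title", "")))
    return "<li>" + " ".join(parts) + "</li>"


def articles_to_html(articles):
    """Format a list of article dictionaries as an HTML unordered list."""
    items = "\n".join(article_to_html(a) for a in articles)
    return "<ul>\n" + items + "\n</ul>"

# In Colab, the list could be rendered for copying with, for example:
# from IPython.display import HTML
# HTML(articles_to_html(article_details))
```

How much formatting survives the paste depends on the target editor, so some adjustment may still be needed.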