This code uses Biopython and spaCy to retrieve PubMed IDs and generate NLP summaries of PubMed articles. The pipeline is divided into the following steps:
- Use Biopython's Entrez API to search for and retrieve PubMed IDs for a given search query.
- Download the PubMed article XML/txt files corresponding to each retrieved PubMed ID from an Amazon S3 bucket.
- Parse the downloaded article files with Biopython's `Medline` module to extract the article text (optional).
- Process the article text with spaCy's NLP pipeline to generate a summary of the article.
- Save the summary in a CSV file named after the original article file.
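The search and summarization steps above might be sketched as follows. This is an illustration, not the exact code in `main.py`: the function names, the Entrez contact email, and the word-frequency summary heuristic (standing in for the spaCy pipeline) are all assumptions.

```python
import collections
import re


def search_pubmed(query, email, retmax=20):
    """Query PubMed via Biopython's Entrez esearch and return PubMed IDs.

    Biopython is imported lazily so the pure helper below can be used
    without it installed.
    """
    from Bio import Entrez  # requires `pip install biopython`

    Entrez.email = email  # NCBI asks for a contact address on every request
    with Entrez.esearch(db="pubmed", term=query, retmax=retmax) as handle:
        record = Entrez.read(handle)
    return record["IdList"]


def summarize(text, n_sentences=2):
    """Tiny extractive summary: keep the sentences whose words are most
    frequent in the text. The real pipeline uses spaCy; this plain-Python
    stand-in only illustrates the idea.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = collections.Counter(re.findall(r"\w+", text.lower()))

    def score(sentence):
        return sum(freq[w] for w in re.findall(r"\w+", sentence.lower()))

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Keep the selected sentences in their original order.
    return " ".join(s for s in sentences if s in top)
```

A spaCy-based version would instead feed the text through `nlp = spacy.load("en_core_web_sm")` and rank `doc.sents`, but the scoring idea is the same.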
To get started, you'll need to have the following prerequisites installed:
- Python 3
- Biopython
- spaCy
- An AWS account with access to an S3 bucket containing PubMed article XML files.
You can install the necessary Python packages using pip by running the following command:
```
pip install biopython spacy
```
You'll also need to download the spaCy English language model by running the following command:
```
python -m spacy download en_core_web_sm
```
Once the prerequisites are installed, you can run the `main.py` script in the project directory. Before doing so, modify the following variables in `main.py` to match your AWS S3 bucket and PubMed search query:
```python
S3_BUCKET = "your-s3-bucket-name"
SEARCH_QUERY = "your-pubmed-search-query"
```
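The download step driven by `S3_BUCKET` could be sketched with boto3 as below. The key layout (one `<pmid>.xml` object per article at the bucket root) is an assumption about how the bucket is organized, not something the script guarantees.

```python
from pathlib import Path


def article_key(pmid):
    """Map a PubMed ID to its assumed S3 object key, e.g. '7529838.xml'."""
    return f"{pmid}.xml"


def download_article(bucket_name, pmid, dest_dir="text"):
    """Download one article file from S3 into dest_dir and return its path.

    Requires `pip install boto3` and AWS credentials configured in the
    environment (e.g. via `aws configure`).
    """
    import boto3  # imported lazily so article_key works without boto3

    key = article_key(pmid)
    dest = Path(dest_dir) / key
    dest.parent.mkdir(parents=True, exist_ok=True)
    boto3.client("s3").download_file(bucket_name, key, str(dest))
    return dest
```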
After setting these variables, you can run the script by navigating to the project directory in your terminal and running:
```
python main.py
```
This will search PubMed for articles matching your query, download the corresponding article XML files from your S3 bucket, process the article text with spaCy, and write a summary CSV file for each article to the `text` directory.
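The final step, naming each CSV after its source article, can be sketched as follows. The column layout here is an assumption for illustration; `main.py` may write different fields.

```python
import csv
from pathlib import Path


def write_summary_csv(article_path, summary, out_dir="text"):
    """Write a one-row summary CSV named after the article file,
    e.g. 'PMC123456.xml' -> 'text/PMC123456.csv'.
    """
    out = Path(out_dir) / (Path(article_path).stem + ".csv")
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["article_file", "summary"])  # assumed columns
        writer.writerow([Path(article_path).name, summary])
    return out
```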