<div style="background-color:lightgrey; padding:10px">

# <font color='red'> Quick Navigate </font>
1. [New content addition](#new-content)

</div>


## GPT Article-Filter Version PubMed_scraper_GPT

In this notebook, we utilize several libraries including `langchain`, `nltk`, `openai`, `pymed`, `Bio` among others, to filter articles based on specific criteria.

The environment variable `OPENAI_API_KEY` is set in the first code cell, and the `openai` API base URL is pointed at the proxy endpoint "https://fmops.ai/api/v1/proxy/openai/v1".
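Hard-coding the key in a notebook cell makes it easy to leak when the notebook is shared. As a minimal sketch, assuming the key has already been exported in the shell that launched Jupyter, it could be read and validated like this. The helper name `load_api_key` is hypothetical and is not part of this notebook.

```python
import os

def load_api_key(var_name="OPENAI_API_KEY"):
    """Return the API key stored in the environment, or raise a clear error."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running the notebook."
        )
    return key
```

Calling `load_api_key()` at the top of the notebook fails early with a readable message instead of failing later inside an API call.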

Below are the libraries used:

1. `langchain` : Framework for building applications on top of large language models; used here for prompt templates and chains.

2. `nltk` : Natural Language Toolkit, used for working with human language data.

3. `openai` : Used to access the OpenAI API for generating human-like text.

4. `os` : The OS module in Python provides functions for interacting with the operating system.

5. `pymed` : Python wrapper for the PubMed API.

6. `pandas` : A data manipulation and analysis library.

7. `re` : Python's built-in module to work with Regular Expressions.

8. `time` : This module provides various time-related functions.

9. `requests` : Used for making HTTP requests in Python.

10. `Bio` : Biopython is a set of freely available tools for biological computation.

11. `docx` : Python library for creating and updating Microsoft Word (.docx) files.

12. `spacy` : Library for advanced Natural Language Processing.

13. `wordcloud` : Generates word cloud (tag cloud) images, a visual representation of text data.

14. `docx.shared` : Submodule of `python-docx` that provides length helpers such as `Inches`, used here to size the inserted images.


In [None]:
#GPT Article-Filter Version
from langchain import OpenAI, ConversationChain, LLMChain, PromptTemplate
from langchain.memory import ConversationBufferWindowMemory
from langchain.chat_models import ChatOpenAI
import nltk
from nltk import tokenize
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import openai
import os
from pymed import PubMed
import pandas as pd
import re
import time
import requests
from Bio import Entrez
from docx import Document
import spacy
from wordcloud import WordCloud
from docx.shared import Inches

os.environ['OPENAI_API_KEY'] = 'Your OPENAI API Here'
openai.api_base = "https://fmops.ai/api/v1/proxy/openai/v1"

nltk.download('punkt')
nltk.download('vader_lexicon')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\choyo\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\choyo\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

## Creating Templates for Context and Gene

In this part of the code, we create templates that will be used to provide a structured form of interaction with the AI model. The templates are designed in such a way that they define how a conversation or question should be structured.

The **first template**, `template_Context`, asks the model to explain what a given gene is, based on the provided text. The text is passed as the `abstract` variable and the gene symbol as the `gene` variable.

The **second template**, `template_Gene1`, asks the model whether the gene symbol in the provided sentence refers to the gene or transcription factor with the given full name (`fullName`). The expected answer is Yes or No.

Each template is wrapped in a `PromptTemplate`, which an `LLMChain` then combines with a `ChatOpenAI` model. The model is created with a temperature of 0, which makes the output as close to deterministic as the API allows.
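To make the role of the placeholders concrete, the sketch below fills a template with plain `str.format`, which is roughly what happens to the `{gene}` and `{abstract}` slots before the text is sent to the model. It deliberately avoids `langchain` so it runs on its own; the `render` helper and the sample abstract text are made up for illustration.

```python
# Stand-in copy of the context template; the placeholders match the notebook's.
template = (
    "<Question: Explain in detail what {gene} is in the Text Provided?>"
    "<Text: {abstract}>\n"
    "Your Answer(Do not use abbreviation):"
)

def render(template, **variables):
    """Fill the {placeholders} in a template string with the given values."""
    return template.format(**variables)

# Illustrative input, not taken from a real article.
prompt_text = render(
    template,
    gene="WT1",
    abstract="WT1 encodes a zinc-finger transcription factor.",
)
print(prompt_text)
```

Printing the rendered prompt this way is a cheap check that every placeholder is filled before any API call is made.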

In [None]:
template_Context = """<Question: Explain in detail what {gene} is in the Text Provided?><Text: {abstract}>
Your Answer(Do not use abbreviation):"""

prompt_Context = PromptTemplate(
    input_variables=["abstract", "gene"],
    template=template_Context,
)

keyword_Context = LLMChain(
    llm=ChatOpenAI(temperature=0),
    prompt=prompt_Context
)

#-----------------------------
template_Gene1 = """
<Is {gene} in the provided Text used in the context {fullName} gene or transcription factor?
Say no if Text says not mentioned or does not appear)]>
<Text: {sentence}>
Your Answer(Yes or No only):"""

prompt_Gene1 = PromptTemplate(
    input_variables=["sentence", "gene", "fullName"],
    template=template_Gene1,
)

is_Gene = LLMChain(
    llm=ChatOpenAI(temperature=0),
    prompt=prompt_Gene1
)



## Defining Supporting Functions

A set of supporting functions are defined in this portion of the code. They serve various purposes, including search term generation, gene name fetching, information extraction, text cleaning, abstract fetching, and word cloud generation. These functions are critical in processing and transforming the data for further use.

1. **gene_to_search**: This function generates two PubMed search queries for a gene, one using its symbol and one using its full name. Each query requires the term to appear in the title or abstract together with Cancer and at least one of PD-1, PDL-1, or CTLA4.

2. **gene_fullName**: This function retrieves the full name of a gene given its abbreviation by making a request to the 'mygene.info' API.

3. **extract_geneInfo**: This function extracts the leading search term (the gene symbol or full name) from a query string built by `gene_to_search`.

4. **remove_html_tags**: This function removes HTML tags from a given text string.

5. **fetch_abstract**: This function fetches the abstract of a paper from PubMed given its PMID.

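6. **generate_wordcloud**: This function generates a word cloud from an input text. The cloud is built from gene-like tokens, named entities of selected types (anatomy, disease, taxon), and the literal patterns "stem cell" and "iPSC". If nothing is found, a blank image is returned instead.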
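The string-handling helpers in this list can be tried without any network access. The sketch below re-creates the query shape, the term extraction, and the tag stripping in a self-contained form; the names `build_query`, `first_term`, and `strip_tags` are illustrative stand-ins, not the notebook's own functions.

```python
import re

def build_query(term):
    """Build a PubMed query in the same shape the notebook uses."""
    return (
        f"({term}[Title/Abstract]) AND ((PD-1[Title/Abstract]) OR "
        "(PDL-1[Title/Abstract]) OR (CTLA4[Title/Abstract])) "
        "AND (Cancer[Title/Abstract])"
    )

def first_term(query):
    """Recover the leading search term, mirroring extract_geneInfo."""
    return query.split("(")[1].split("[Title/Abstract]")[0]

def strip_tags(text):
    """Remove anything that looks like an HTML tag, mirroring remove_html_tags."""
    return re.sub(r"<.*?>", "", text)

query = build_query("WT1")
print(first_term(query))                  # the term that was searched for
print(strip_tags("<i>WT1</i> in cancer")) # title text without markup
```

Note that the term extraction relies on the search term being the first parenthesized group, so a full name that itself contains "(" would be cut short.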


In [None]:
def gene_to_search(element, fullName):
    ls = []
    ls.append("(" + element + "[Title/Abstract]) AND ((PD-1[Title/Abstract]) OR (PDL-1[Title/Abstract]) \
             OR (CTLA4[Title/Abstract])) AND (Cancer[Title/Abstract])")

    ls.append("(" + fullName + "[Title/Abstract]) AND ((PD-1[Title/Abstract]) OR (PDL-1[Title/Abstract]) \
             OR (CTLA4[Title/Abstract])) AND (Cancer[Title/Abstract])")

    return ls

def gene_fullName(gene_abbr):
    url = f'https://mygene.info/v3/query?q=symbol:{gene_abbr}&fields=name'
    time.sleep(0.1)
    response = requests.get(url)
    data = response.json()

    return data['hits'][0]['name']

def extract_geneInfo(query):
    split_query = query.split('(')
    gene_info = split_query[1].split('[Title/Abstract]')
    return gene_info[0]

def remove_html_tags(text):
    clean = re.compile('<.*?>')
    return re.sub(clean, '', text)

def fetch_abstract(pmid):
    handle = Entrez.efetch(db="pubmed", id=pmid, rettype="xml")
    records = Entrez.read(handle)
    try:
        abstract_sections = records["PubmedArticle"][0]["MedlineCitation"]["Article"]["Abstract"]["AbstractText"]
        abstract = "\n".join(str(section) for section in abstract_sections)
    except KeyError:
        abstract = "No abstract available"
    return abstract

def generate_wordcloud(input_text, nlp1, nlp2, nlp3):
    doc1 = nlp1(input_text)
    doc2 = nlp2(input_text)
    doc3 = nlp3(input_text)

    gene_pattern = r"^[A-Z]{1}[A-Za-z0-9_-]*[A-Za-z]{1}[A-Za-z0-9_-]*$"
    genes = [token.text for token in doc2 if re.match(gene_pattern, token.text) and token.pos_ == 'NOUN']

    entities1 = ['_'.join(ent.text.split()) for ent in doc1.ents if ent.label_ in
                 {'ORGAN', 'CELL', 'DEVELOPING_ANATOMICAL_STRUCTURE', 'PATHOLOGICAL_FORMATION'}]
    entities2 = ['_'.join(ent.text.split()) for ent in doc2.ents if ent.label_ in
                 {'DISEASE'}]
    # TAXON is a label of the CRAFT model (nlp3), so read it from doc3
    entities3 = [ent.text for ent in doc3.ents if ent.label_ in
                 {'TAXON'}]
    combined_entities = genes + entities1 + entities2 + entities3

    stem_cell_pattern = re.compile(r'\bstem cell\b', re.IGNORECASE)
    ipsc_pattern = re.compile(r'\bipsc\b', re.IGNORECASE)
    if re.search(stem_cell_pattern, input_text):
        combined_entities.append('stem_cell')
    if re.search(ipsc_pattern, input_text):
        combined_entities.append('iPSC')

    filtered_entities = []
    for entity in combined_entities:
        if not any([entity in other_entity and entity != other_entity for other_entity in combined_entities]):
            filtered_entities.append(entity)

    filtered_text = ' '.join(filtered_entities)

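    if not filtered_text:
        # Nothing to plot: return a blank placeholder image instead
        from PIL import Image  # Pillow is installed as a dependency of wordcloud
        img = Image.new('RGB', (400, 200), color='white')
        return img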

    wordcloud = WordCloud(background_color='white', max_words=100, contour_width=3, contour_color='steelblue')
    wordcloud.generate(filtered_text)

    return wordcloud


## Defining Main Functions

These main functions run the PubMed search, score articles for relevance, and extract the sentences that mention a gene.

1. **search_Query_GPT**: This function uses a query to search PubMed for articles, fetches the articles' abstracts and other relevant information, applies filters, and stores the results in a DataFrame.

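2. **article_Interest**: This function asks the first chain to describe the gene as it is used in the extracted sentences, asks the second chain whether that description matches the gene's full name, and converts the resulting Yes/No answer into a score of 1 or 0 based on its VADER sentiment score.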

3. **extract_Sentences**: This function extracts sentences from the text containing a target keyword or an abbreviation in parentheses, and then processes them to remove unwanted details and redundancies.
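The filtering inside `search_Query_GPT` is easier to follow when separated from the network calls. The sketch below restates the decision as a pure function that returns one of three labels; the function name `triage` and the sample texts are hypothetical, and the conditions are a reading of the notebook's code rather than a replacement for it.

```python
import re

def triage(element, gene, full_name, text):
    """Decide what to do with one article: 'keep', 'skip', or 'ask_gpt'."""
    # Only all-uppercase search terms without digits are treated as ambiguous.
    ambiguous = element.isupper() and not re.search(r"\d", element)
    if not ambiguous:
        return "keep"
    # The exact-case symbol is absent, so only a lower-cased homonym matched.
    if gene not in text:
        return "skip"
    # The full name is present, or the symbol is long or numbered enough to trust.
    if full_name.lower() in text.lower() or len(gene) >= 4 or re.search(r"\d", gene):
        return "keep"
    # Short, letters-only symbol with no full name nearby: let the model decide.
    return "ask_gpt"

print(triage("WT1", "WT1", "WT1 transcription factor", "WT1 is expressed in tumors."))
print(triage("BLM", "BLM", "BLM RecQ like helicase", "BLM was given to mice."))
```

Under this reading, only short letter-only symbols such as BLM reach the GPT check, which keeps the number of API calls low.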


In [None]:
def search_Query_GPT(query, gene, fullName, keyword_Context_GPT, is_Gene_GPT):
    Entrez.email = "choyoungb@gmail.com"
    time.sleep(0.2)
    handle = Entrez.esearch(db="pubmed", term=query, retmax=5000)
    record = Entrez.read(handle)
    pmid_list = record["IdList"]
    article_list = []
    element = extract_geneInfo(query)

    for pmid in pmid_list:
        try:
            time.sleep(0.2)
            handle = Entrez.efetch(db="pubmed", id=pmid, rettype="xml")
            time.sleep(0.2)
            records = Entrez.read(handle)

            try:
                title = records["PubmedArticle"][0]["MedlineCitation"]["Article"]["ArticleTitle"]
                title = remove_html_tags(title)
                pub_date = records["PubmedArticle"][0]["MedlineCitation"]["Article"]["Journal"]["JournalIssue"]["PubDate"]
                article_ids = records["PubmedArticle"][0]["PubmedData"]["ArticleIdList"]
                doi_url = "NA"
                for article_id in article_ids:
                    if article_id.attributes["IdType"] == "doi":
                        doi_url = "https://doi.org/" + article_id
                # The PubDate field can be a dictionary with 'Year', 'Month', and 'Day' keys, or just a 'Year' key.
                # Fall back to "NA" so the string concatenation below does not fail when 'Year' is missing.
                year = pub_date['Year'] if 'Year' in pub_date else "NA"
                url = f"https://pubmed.ncbi.nlm.nih.gov/{pmid}"
                full_abstract = fetch_abstract(pmid)
                full_abstract = remove_html_tags(full_abstract) if full_abstract else ''
                title_and_abstract = title + full_abstract

                #check if any lower cased homonyms are detected.
                if element.isupper() and not re.search(r'\d', element):
                    if gene not in title_and_abstract:
                        continue

                    if fullName.lower() in title_and_abstract.lower() or (len(gene) >= 4 or re.search(r'\d', gene)):
                        pass
                    else:
                        print("------------------------------------------")
                        print(url)
                        temp = extract_Sentences(title_and_abstract, gene)
                        time.sleep(0.2)
                        score = article_Interest(gene, temp, fullName, keyword_Context_GPT,is_Gene_GPT)
                        print('▶ ' + str(score))
                        # Drop the article when the score is 0 (GPT judged the symbol is not the gene)
                        if score <= 0:
                            continue

                article_dict = {'info': "Url: " + url + "\n" + "DOI: " + doi_url + "\n\n" + "Title(" + year + "): "
                                + title + "\n\n" + full_abstract + "\n\n"}
                article_list.append(article_dict)


            except IndexError:
                print(f"|Detected Excerpt, not an abstract, or no doi, with PMID:{pmid}|")
                continue

        except Exception as e:
            print(f"|Detected Excerpt, not an abstract, or no doi, with PMID:{pmid}|")
            print(e)

    search_df = pd.DataFrame(article_list)
    return search_df

def article_Interest(gene, full_Abstract, fullName, keyword_Context_GPT, is_Gene_GPT):
    time.sleep(0.3)
    AI_Context = keyword_Context_GPT.predict(abstract=full_Abstract, gene=gene)
    time.sleep(0.3)
    AI_Gene = is_Gene_GPT.predict(sentence=AI_Context, gene=gene, fullName=fullName)

    print(AI_Context)
    print('▶ ' + AI_Gene)
    sid = SentimentIntensityAnalyzer()
    article_Score = sid.polarity_scores(AI_Gene)
    interest_Score = article_Score['compound']

    return 1 if interest_Score > 0 else 0


def extract_Sentences(text, target_keyword):
    sentences = re.split(r'(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s', text)
    keyword_sentences = [sentence for sentence in sentences if target_keyword in sentence]
    processed_sentences = []
    abbreviation = []

    for sentence in keyword_sentences:
        if sentence.count(',') > 2:
            temp = sentence.split(',')

            keyword_index = -1
            for i, part in enumerate(temp):
                if target_keyword in part:
                    keyword_index = i
                    break

            temp = [temp[0], temp[keyword_index], temp[-1]]

            processed_sentences.append(','.join(temp))
        else:
            processed_sentences.append(sentence)

    # Find sentences containing single-word abbreviations in parentheses
    abbreviation_sentences = [sentence for sentence in sentences if re.search(r'\([A-Za-z]+\)', sentence)]

    # Process abbreviation sentences
    for sentence in abbreviation_sentences:
        if sentence.count(',') > 2:
            temp = sentence.split(',')

            abbreviation_index = -1
            for i, part in enumerate(temp):
                if re.search(r'\([A-Za-z]+\)', part):
                    abbreviation_index = i
                    break

            temp = [temp[0], temp[abbreviation_index], temp[-1]]

            abbreviation.append(','.join(temp))
        else:
            abbreviation.append(sentence)

    combined_text = ' '.join(abbreviation + processed_sentences)
    # Remove words encapsulated in parentheses and any extra whitespace
    cleaned_text = re.sub(r'\s\([A-Za-z]+\)', '', combined_text)

    # Remove abbreviations from the cleaned text
    for abbr in abbreviation:
        cleaned_text = cleaned_text.replace(abbr, '')

    return cleaned_text

## Loading Spacy Models

Here, we load three scispaCy models for biomedical named entity recognition. They are distributed separately from spaCy and must be installed before this cell will run. Loading may take some time due to the size of the models.

- `en_ner_bionlp13cg_md`: Trained on the BioNLP13CG corpus; recognizes cancer-genetics entities such as `ORGAN`, `CELL`, and `PATHOLOGICAL_FORMATION`.
- `en_ner_bc5cdr_md`: Trained on the BC5CDR corpus; recognizes `DISEASE` and `CHEMICAL` entities.
- `en_ner_craft_md`: Trained on the CRAFT corpus; recognizes labels including `TAXON`, `GGP` (gene or gene product), and `CHEBI` (chemicals).
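Because a missing model only surfaces as an error when `spacy.load` is called, it can help to check for the packages first. The following is a minimal sketch using only the standard library, assuming each model was installed as a pip package importable under its own name; `missing_models` is a hypothetical helper.

```python
from importlib.util import find_spec

MODEL_NAMES = ["en_ner_bionlp13cg_md", "en_ner_bc5cdr_md", "en_ner_craft_md"]

def missing_models(names=MODEL_NAMES):
    """Return the model packages that cannot be imported in this environment."""
    return [name for name in names if find_spec(name) is None]

missing = missing_models()
if missing:
    print("Install these model packages first:", ", ".join(missing))
else:
    print("All three models are available.")
```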



In [None]:
#This step takes time
nlp1 = spacy.load("en_ner_bionlp13cg_md")
nlp2 = spacy.load("en_ner_bc5cdr_md")
nlp3 = spacy.load("en_ner_craft_md")

## Setting up PubMed and Loading Gene List

1. We first set up the `PubMed` client by passing a tool name (`"MyTool"`) and an email address. Replace `"choyoungb@gmail.com"` with your own email address.

2. We then define `tf`, the list of gene symbols to search.

In [None]:
# Replace with the email address you use for PubMed
pubmed = PubMed(tool="MyTool", email="choyoungb@gmail.com")  # change to your email

# Replace tf with your list of gene names
tf = ['WT1', 'MYD88', 'KLF4', 'BLM', 'BRD4', 'RAF1', 'SMAD2', 'NEGR1', 'IRS2', 'ASXL2']  # example list of ten genes
print(tf)
len(tf)

['WT1', 'MYD88', 'KLF4', 'BLM', 'BRD4', 'RAF1', 'SMAD2', 'NEGR1', 'IRS2', 'ASXL2']


10

## Data Processing and Document Generation

The following steps are performed in this code block:

1. **Initialize DataFrame**: We start by initializing an empty pandas DataFrame with columns 'gene' and 'info'.

2. **Search and Process Genes**: For each gene in our transcription factor list (`tf`), we look up its full name and run both PubMed queries through the GPT-assisted search. Each returned article is compared against the DataFrame by its 'info' text: if the article is already present, the current gene is appended to that row's 'gene' column (unless it is already listed); otherwise the article is added as a new row.

3. **DataFrame Completion**: After processing all the genes and completing our DataFrame, we print a message indicating the completion of the DataFrame.

4. **Document Initiation**: We then initiate a Word Document and add a table to it.

5. **Word Cloud Generation and Insertion**: For each row in our DataFrame, we generate a word cloud from the 'info' column, save it as an image, and insert this image into our Word Document.

6. **Document Saving**: Once we've processed all the rows in the DataFrame and inserted the corresponding word clouds, we save our Word Document and print a message indicating the completion of the document.
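The de-duplication in step 2 can be illustrated without pandas. The sketch below keys a dictionary on the article text and collects the genes that found it; `merge_hits` and the sample data are hypothetical. One difference from the notebook is that it compares whole gene symbols, whereas the notebook checks whether the symbol occurs as a substring of the existing 'gene' string.

```python
def merge_hits(hits):
    """Merge (gene, info) pairs into one record per unique info string."""
    merged = {}  # info text -> list of genes, in first-seen order
    for gene, info in hits:
        genes = merged.setdefault(info, [])
        if gene not in genes:
            genes.append(gene)
    return [{"gene": ", ".join(genes), "info": info} for info, genes in merged.items()]

rows = merge_hits([
    ("WT1", "article A"),
    ("MYD88", "article A"),   # same article found through a second gene
    ("WT1", "article B"),
    ("WT1", "article A"),     # repeat hit, ignored
])
for row in rows:
    print(row)
```

Collecting everything first and building the table once also avoids calling `pd.concat` inside a loop, which gets slow as the table grows.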


In [None]:
df = pd.DataFrame(columns=['gene', 'info'])


for gene in tf:
    fullName = re.sub(r'[^A-Za-z0-9\s]+', ' ', gene_fullName(gene))
    query_list = gene_to_search(gene, fullName)


    for search_phrase in query_list:
        print(search_phrase)
        search_df = search_Query_GPT(search_phrase, gene, fullName, keyword_Context, is_Gene)
        for index, row in search_df.iterrows():
            existing_row = df[df['info'] == row['info']]

            if not existing_row.empty:
                if gene not in existing_row['gene'].values[0]:
                    df.loc[existing_row.index, 'gene'] += f", {gene}"
            else:
                row['gene'] = gene
                df = pd.concat([df, row.to_frame().T], ignore_index=True)

print("DataFrame Complete")

doc = Document()

table = doc.add_table(rows=2 * len(df), cols=1)

row_idx = 0
for index, row in df.iterrows():
    table.cell(row_idx, 0).text = str(row["gene"])
    row_idx += 1
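    # 'info' is laid out as "Url/DOI\n\nTitle(year): title\n\nabstract", so "Title: " never matches
    # and this keeps everything after the first blank line (the title line plus the abstract)
    abstract = row["info"].split("Title: ")[-1].split("\n\n", 1)[1]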

    wordcloud = generate_wordcloud(abstract, nlp1, nlp2, nlp3)

    img_path = f"wordcloud_{index}.png"
    if isinstance(wordcloud, WordCloud):
        wordcloud.to_file(img_path)
    else:
        wordcloud.save(img_path)


    table.cell(row_idx, 0).text = str(row["info"])
    paragraph = table.cell(row_idx, 0).paragraphs[0]
    run = paragraph.add_run()
    run.add_picture(img_path, width=Inches(6))


    os.remove(img_path)

    row_idx += 1

doc.save("output.docx")
print("docx Generated")

(WT1[Title/Abstract]) AND ((PD-1[Title/Abstract]) OR (PDL-1[Title/Abstract])              OR (CTLA4[Title/Abstract])) AND (Cancer[Title/Abstract])
(WT1 transcription factor[Title/Abstract]) AND ((PD-1[Title/Abstract]) OR (PDL-1[Title/Abstract])              OR (CTLA4[Title/Abstract])) AND (Cancer[Title/Abstract])
(MYD88[Title/Abstract]) AND ((PD-1[Title/Abstract]) OR (PDL-1[Title/Abstract])              OR (CTLA4[Title/Abstract])) AND (Cancer[Title/Abstract])
(MYD88 innate immune signal transduction adaptor[Title/Abstract]) AND ((PD-1[Title/Abstract]) OR (PDL-1[Title/Abstract])              OR (CTLA4[Title/Abstract])) AND (Cancer[Title/Abstract])
(KLF4[Title/Abstract]) AND ((PD-1[Title/Abstract]) OR (PDL-1[Title/Abstract])              OR (CTLA4[Title/Abstract])) AND (Cancer[Title/Abstract])
(KLF transcription factor 4[Title/Abstract]) AND ((PD-1[Title/Abstract]) OR (PDL-1[Title/Abstract])              OR (CTLA4[Title/Abstract])) AND (Cancer[Title/Abstract])
(BLM[Title/Abstract]) AND (

(SMAD family member 2[Title/Abstract]) AND ((PD-1[Title/Abstract]) OR (PDL-1[Title/Abstract])              OR (CTLA4[Title/Abstract])) AND (Cancer[Title/Abstract])
(NEGR1[Title/Abstract]) AND ((PD-1[Title/Abstract]) OR (PDL-1[Title/Abstract])              OR (CTLA4[Title/Abstract])) AND (Cancer[Title/Abstract])
(neuronal growth regulator 1[Title/Abstract]) AND ((PD-1[Title/Abstract]) OR (PDL-1[Title/Abstract])              OR (CTLA4[Title/Abstract])) AND (Cancer[Title/Abstract])
(IRS2[Title/Abstract]) AND ((PD-1[Title/Abstract]) OR (PDL-1[Title/Abstract])              OR (CTLA4[Title/Abstract])) AND (Cancer[Title/Abstract])
(insulin receptor substrate 2[Title/Abstract]) AND ((PD-1[Title/Abstract]) OR (PDL-1[Title/Abstract])              OR (CTLA4[Title/Abstract])) AND (Cancer[Title/Abstract])
(ASXL2[Title/Abstract]) AND ((PD-1[Title/Abstract]) OR (PDL-1[Title/Abstract])              OR (CTLA4[Title/Abstract])) AND (Cancer[Title/Abstract])
(ASXL transcriptional regulator 2[Title/Abstrac

# <font color='blue'>Gene Analysis and Insight Generation <a name="new-content"></a></font>


This part of the notebook analyzes how each gene in the list may relate to immunotherapy response. The analysis is based on a literature search and subsequent processing of the results using GPT models.

The process is as follows:

1. **Data Preparation**: An empty DataFrame is created with columns 'gene' and 'info'.

2. **Literature Search**: For each gene in the list, a literature search is performed using the gene's name and its full name. The search results are further processed using a GPT model.

3. **Data Aggregation**: The search results are aggregated into the DataFrame. If an article is already present in the DataFrame, the current gene is appended to that article's existing entry. Otherwise, a new entry is created.

4. **Document Creation**: A Word document is created with a table that contains the gene names and their corresponding information. A word cloud is generated for each article and added to the document.

5. **Insight Generation**: For each article in the DataFrame, a one-sentence insight is generated on how the gene's function possibly relates to immunotherapy response (PD-1, PDL-1, or CTLA4), or `0` if no such relation can be derived from the text. The insights are generated using a GPT model and are then combined into a one-paragraph summary for a gene.

The output of this notebook includes the DataFrame with the aggregated information, the Word document with the table and word clouds, and the insights for each gene.
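Between the per-article insights and the final summary, the uninformative `0` answers have to be dropped and the remaining sentences grouped by gene. The sketch below shows one way to do that in plain Python; `collect_insights` and the sample pairs are hypothetical, and rows tagged with several genes are assumed to use the same `", "` separator as the 'gene' column.

```python
from collections import defaultdict

def collect_insights(pairs):
    """Group non-zero insights by gene; pairs are (gene_field, insight) tuples."""
    grouped = defaultdict(list)
    for gene_field, insight in pairs:
        text = insight.strip()
        if text == "0":
            continue  # the model could not derive a relation from this article
        for gene in gene_field.split(", "):
            grouped[gene].append(text)
    return dict(grouped)

sample = [
    ("WT1", "0"),
    ("WT1, MYD88", "Shared finding about both genes."),
    ("RAF1", " RAF1 fusion may affect response. "),
]
by_gene = collect_insights(sample)
print(by_gene)
```

The text passed to a summary prompt for one gene could then be built with `"\n".join(by_gene["WT1"])`.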


In [None]:
template_Insight = """<Question: In one sentence, explain how the {gene}'s function possibly
relate to immunotherapy response (PD-1, PDL-1, or CTLA4)? If it cannot be derived
from the current text, Just write 0")>
<Text: {abstract}>
Your Answer:"""

prompt_Insight = PromptTemplate(
    input_variables=["abstract", "gene"],
    template=template_Insight,
)

keyword_Insight = LLMChain(
    llm=ChatOpenAI(temperature=0),
    prompt=prompt_Insight
)

In [None]:
# Iterate over the DataFrame and generate insights for each gene
for index, row in df.iterrows():
    gene = row["gene"]
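    # "Title: " does not occur in 'info' (titles are written as "Title(year): "),
    # so this keeps everything after the first blank line: the title line plus the abstract
    abstract = row["info"].split("Title: ")[-1].split("\n\n", 1)[1]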

    # Generate insight
    insight = keyword_Insight.predict(abstract=abstract, gene=gene)

    # Print the insight
    print(f"Insight for {gene}: {insight}")


Insight for WT1: The function of the WT1 tumor associated antigen displayed by the Bifidobacterium longum 420 oral cancer vaccine possibly enhances the efficacy of anti-PD-1 and anti-CTLA-4 antibodies in the mouse renal cell carcinoma model.
Insight for WT1: 0
Insight for WT1, MYD88: 0
Insight for WT1: The function of WT1 possibly relates to immunotherapy response by inducing immune responses and promoting T-cell and WT1-specific IgG production when combined with anti-PD-1 nivolumab.
Insight for WT1: The function of WT1 may possibly relate to immunotherapy response through the combined blockade of PD-1 and TIM3, leading to improved expansion of antigen-specific CD8+ T cells for adoptive immunotherapy.
Insight for WT1: The function of WT1 possibly relates to immunotherapy response by affecting the ratio of WT1-specific cytotoxic lymphocytes (WT1-CTLs) to MPE CD8+ T cells and the fraction of central memory T (TCM) of WT1-CTLs, which may impact the efficacy of immunotherapies targeting PD

Insight for RAF1: The RAF1 fusion in the melanoma may be associated with elevated expression of the RAS/RAF downstream co-effector ETV5 and ERK activation, which could potentially explain the profound response to MEK inhibitor therapy and suggest a possible relationship between RAF1 function and immunotherapy response.
Insight for RAF1: 0
Insight for SMAD2: The function of SMAD2 may possibly relate to immunotherapy response by enhancing the sensitivity of pancreatic cancer cells to gemcitabine chemotherapy, suppressing epithelial-mesenchymal transition (EMT) and immune escape, and reducing the expression of PD-L1 and CD47.
Insight for SMAD2: The function of SMAD2 may possibly relate to immunotherapy response by promoting glycolysis and TGF-β secretion in the tumor microenvironment, leading to immunotherapy resistance in bladder cancer.
Insight for SMAD2: 0
Insight for SMAD2: SMAD2's function possibly relates to immunotherapy response by being involved in GC-derived TGF-β1-mediated CD8+

In [None]:
template_WT1 = """<Question: Combining the information from the text provided, In one paragraph,
explain how the WT1's function possibly relate to immunotherapy response (PD-1, PDL-1, or CTLA4).")>
<Text: {abstract}>
Your Answer:"""

prompt_WT1 = PromptTemplate(
    input_variables=["abstract"],
    template=template_WT1,
)

keyword_WT1 = LLMChain(
    llm=ChatOpenAI(temperature=0),
    prompt=prompt_WT1
)

In [None]:
# Combine the per-article insights printed above into a single paragraph for WT1
abstract= '''
Insight for WT1: The function of the WT1 tumor associated antigen displayed by the Bifidobacterium longum 420 oral cancer vaccine possibly enhances the efficacy of anti-PD-1 and anti-CTLA-4 antibodies in the mouse renal cell carcinoma model.
Insight for WT1: 0
Insight for WT1, MYD88: 0
Insight for WT1: The function of WT1 possibly relates to immunotherapy response by inducing immune responses and promoting T-cell and WT1-specific IgG production when combined with anti-PD-1 nivolumab.
Insight for WT1: The function of WT1 may possibly relate to immunotherapy response through the combined blockade of PD-1 and TIM3, leading to improved expansion of antigen-specific CD8+ T cells for adoptive immunotherapy.
Insight for WT1: The function of WT1 possibly relates to immunotherapy response by affecting the ratio of WT1-specific cytotoxic lymphocytes (WT1-CTLs) to MPE CD8+ T cells and the fraction of central memory T (TCM) of WT1-CTLs, which may impact the efficacy of immunotherapies targeting PD-1, PDL-1, or CTLA4.
Insight for WT1: The function of WT1 possibly relates to immunotherapy response by inducing WT1-specific cytotoxic T lymphocytes (CTLs) and helper T lymphocytes (HTLs), which can be enhanced by combination treatment with an anti-PD-1 antibody.
Insight for WT1: The function of WT1 possibly relates to immunotherapy response by potentially enhancing the stimulatory potential of IL-15 DC vaccines and overcoming PD-1-mediated inhibition by antigen-specific T cells.
Insight for WT1: The function of the WT1 cancer vaccine possibly relates to enhancing the response to anti-PD-1 antibody treatment, either alone or as an adjunct therapy, in patients with advanced urothelial cancer including bladder cancer.
Insight for WT1: The function of WT1 may possibly relate to immunotherapy response by increasing the infiltration of CD4+ T cells, CD8+ T cells, and NK cells into the tumor, while the function of anti-PD-1 antibody may be to decrease PD-1 molecule expression on tumor-infiltrating CD8+ T cells.
Insight for WT1: 0
Insight for WT1: The function of WT1 possibly relates to immunotherapy response by enhancing T cell reactivity towards PD-L1 silenced AML cells.
Insight for WT1: The function of WT1 may possibly relate to the immunotherapy response by inducing a systemic and tumor-specific immune stimulatory effect, as evidenced by the transient decrease in regulatory T cells and simultaneous increase in activated PD-1+ T cells after IRE treatment.
Insight for WT1: 0
Insight for WT1: The function of WT1 possibly relates to immunotherapy response through the negative regulation of tumor-responding CD8+ T cells by PD-L1.
Insight for WT1: 0
Insight for WT1: 0
Insight for WT1: The function of the WT1 antigen is not mentioned in the text, so it cannot be determined how it relates to immunotherapy response.
Insight for WT1: 0
Insight for MYD88: 0
Insight for MYD88: The function of MYD88 is not mentioned in the text, so it cannot be derived from the current text (0).
'''
answer = keyword_WT1.predict(abstract=abstract)

# Print the insight
print(answer)


The function of the WT1 tumor associated antigen possibly relates to immunotherapy response by enhancing the efficacy of anti-PD-1 and anti-CTLA-4 antibodies, inducing immune responses and promoting T-cell and WT1-specific IgG production, improving expansion of antigen-specific CD8+ T cells, affecting the ratio of WT1-specific cytotoxic lymphocytes to MPE CD8+ T cells and the fraction of central memory T cells, inducing WT1-specific cytotoxic T lymphocytes and helper T lymphocytes, enhancing the stimulatory potential of IL-15 DC vaccines, increasing the infiltration of CD4+ T cells, CD8+ T cells, and NK cells into the tumor, enhancing T cell reactivity towards PD-L1 silenced AML cells, and negatively regulating tumor-responding CD8+ T cells by PD-L1.
