# Scientific Literature Mining Agents for Drug Discovery Research

## Introduction

This notebook builds small agents (plain Python functions) that fetch scientific literature relevant to drug discovery research.
Each agent uses a different API or library to query one source: PubMed, Google Scholar, bioRxiv, or Google Patents.

We will cover:
1. Setting up the environment
2. Creating agents for different scientific literature sources
3. Using these agents to fetch relevant information
4. A simple document processing example with a question-answering agent

## 1. Setting up the Environment

First, we need to install the necessary libraries.

In [None]:
%pip install pymed apify-client requests beautifulsoup4 transformers docling

## 2. Creating Agents for Scientific Literature Sources

Import the required libraries.

In [None]:
from pymed import PubMed
from apify_client import ApifyClient
import requests
from bs4 import BeautifulSoup
from transformers import pipeline

In [None]:
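# PubMed Agent
def pubmed_agent(query, max_results=100):
    # Replace the placeholders with your own tool name and contact email
    pubmed = PubMed(tool="MyTool", email="my@email.address")
    results = pubmed.query(query, max_results=max_results)
    # query() yields article objects; convert each one to a plain dict
    return [article.toDict() for article in results]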

In [None]:
# Google Scholar Agent
def google_scholar_agent(keyword, max_results=100):
    client = ApifyClient("<YOUR_API_TOKEN>")
    run_input = {
        "keyword": keyword,
        "proxyOptions": {"useApifyProxy": True},
        "maxResults": max_results
    }
    run = client.actor("marco.gullo/google-scholar-scraper").call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
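
Hardcoding the token in a cell makes it easy to leak when the notebook is shared. A minimal sketch of reading it from the environment instead; the variable name `APIFY_API_TOKEN` and the helper `get_apify_token` are assumptions made for illustration and are not part of the Apify client.

```python
import os


def get_apify_token(env_var="APIFY_API_TOKEN"):
    """Return the API token stored in the given environment variable.

    The variable name is an assumption for this sketch; use whatever
    name your own environment defines.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable before calling the agent."
        )
    return token
```

With this in place, the agent could create its client with `ApifyClient(get_apify_token())` rather than a literal string.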

In [None]:
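# bioRxiv Agent
# The "details" endpoint takes a date interval, not keywords, so fetch the
# first page of recent preprints and filter by keyword locally.
def biorxiv_agent(query, max_results=100, recent_days=30):
    url = f"https://api.biorxiv.org/details/biorxiv/{recent_days}d/0/json"
    response = requests.get(url, timeout=30)
    posts = response.json().get("collection", [])
    terms = query.lower().split()
    hits = [p for p in posts if all(t in f"{p.get('title', '')} {p.get('abstract', '')}".lower() for t in terms)]
    return hits[:max_results]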
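
The bioRxiv API, as far as its documentation describes it, returns results in pages addressed by a cursor offset, so a single request sees only the first page. The sketch below shows one way to walk such pages. `collect_pages` and its `fetch_page` argument are hypothetical names; `fetch_page(cursor)` stands for any function that returns the list of records starting at that offset.

```python
def collect_pages(fetch_page, max_results=100, max_pages=10):
    """Collect records from a cursor-paginated source.

    fetch_page(cursor) must return a list of records starting at that
    offset; an empty list signals the end of the results.
    """
    records = []
    cursor = 0
    for _ in range(max_pages):
        page = fetch_page(cursor)
        if not page:
            break  # no more results
        records.extend(page)
        if len(records) >= max_results:
            break  # collected enough
        cursor += len(page)  # advance by the number of records received
    return records[:max_results]
```

For bioRxiv, `fetch_page` could be a small function that requests the details URL with the cursor in place of the `0` and returns the `collection` list from the JSON response. The `max_pages` cap keeps a misbehaving source from looping indefinitely.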

In [None]:
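# Google Patents Agent
# Note: Google Patents builds its result list with JavaScript, so a plain HTTP
# request may return a page with no result items, and this can yield an empty list.
def google_patents_agent(query, max_results=10):
    # Pass the query via params so spaces and special characters are URL-encoded
    response = requests.get("https://patents.google.com/", params={"q": query, "num": max_results}, timeout=30)
    soup = BeautifulSoup(response.text, "html.parser")
    patents = soup.find_all("article", class_="result-item")
    # Skip items that lack a heading or a link instead of failing on None
    return [{"title": p.find("h3").get_text(strip=True), "link": p.find("a", href=True)["href"]}
            for p in patents if p.find("h3") and p.find("a", href=True)][:max_results]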

## 3. Using the Agents to Fetch Information

In [None]:
# Example usage of each agent
query = "novel drug delivery systems"

print("Fetching results from PubMed...")
pubmed_results = pubmed_agent(query, max_results=5)
print(f"Found {len(pubmed_results)} results from PubMed")

print("\nFetching results from Google Scholar...")
scholar_results = google_scholar_agent(query, max_results=5)
print(f"Found {len(scholar_results)} results from Google Scholar")

print("\nFetching results from bioRxiv...")
biorxiv_results = biorxiv_agent(query, max_results=5)
print(f"Found {len(biorxiv_results)} results from bioRxiv")

print("\nFetching results from Google Patents...")
patent_results = google_patents_agent(query, max_results=5)
print(f"Found {len(patent_results)} results from Google Patents")
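
Each agent returns records in its own shape, and the same paper can appear in more than one source. A minimal sketch of merging the lists and dropping duplicates by title follows. `normalize_title` and `merge_results` are hypothetical helpers, and the assumption that every record stores its title under a `"title"` key should be checked against the actual output of each agent.

```python
import re


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace for comparison."""
    return re.sub(r"[^a-z0-9]+", " ", (title or "").lower()).strip()


def merge_results(results_by_source, title_key="title"):
    """Merge per-source record lists, keeping the first record seen per title.

    results_by_source maps a source name to a list of dicts. The
    title_key default is an assumption; adjust it if a source uses a
    different field name.
    """
    merged = []
    seen = set()
    for source, records in results_by_source.items():
        for record in records:
            key = normalize_title(record.get(title_key))
            if not key or key in seen:
                continue  # skip untitled records and repeats
            seen.add(key)
            merged.append({**record, "source": source})
    return merged
```

A call such as `merge_results({"pubmed": pubmed_results, "biorxiv": biorxiv_results})` would then give one list in which each record is tagged with the source it came from. Matching on normalized titles is a deliberately simple choice; matching on DOI would be stricter where a DOI is available.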

## 4. Simple Document Processing and Question Answering

For this example, we'll use a simulated document.

In [None]:
document_text = """
# Simulated Research Paper on Novel Drug Delivery Systems

## Abstract
This paper reviews recent advancements in novel drug delivery systems, focusing on nanoparticle-based approaches and their potential in improving therapeutic efficacy and reducing side effects.

## Introduction
Drug delivery systems play a crucial role in enhancing the effectiveness of pharmaceuticals. Novel approaches, particularly those utilizing nanotechnology, have shown promising results in targeted drug delivery and controlled release.

## Methods
We conducted a comprehensive literature review of studies published in the last five years, focusing on nanoparticle-based drug delivery systems. Key databases searched included PubMed, Scopus, and Web of Science.

## Results
Our review identified several promising nanoparticle-based delivery systems, including liposomes, polymeric nanoparticles, and gold nanoparticles. These systems demonstrated improved drug solubility, enhanced cellular uptake, and reduced toxicity in various preclinical and clinical studies.

## Conclusion
Nanoparticle-based drug delivery systems show great potential in improving therapeutic outcomes across various disease areas. Further research is needed to address challenges in large-scale production and long-term safety.
"""

In [None]:
# Simulating document processing (in a real scenario, you'd use Docling for PDF parsing)
sections = document_text.split("## ")
sections = [s.strip() for s in sections if s.strip()]

# Creating a simple question-answering agent
question_answerer = pipeline("question-answering")

def qa_agent(document_sections, question_answerer):
    def agent(question: str) -> str:
        context = "\n".join(document_sections)
        result = question_answerer(question=question, context=context)
        return result["answer"]
    return agent

# Create the agent
agent = qa_agent(sections, question_answerer)

# Test the agent
questions = [
    "What is the main focus of the paper?",
    "What types of nanoparticle-based delivery systems were mentioned?",
    "What are the potential benefits of these novel drug delivery systems?"
]

for question in questions:
    answer = agent(question)
    print(f"Question: {question}")
    print(f"Answer: {answer}\n")
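
Joining every section into one context hides which part of the paper an answer came from. One alternative, sketched here under assumptions, is to ask the question against each section separately and keep the answer with the highest score. `split_sections` and `best_section_answer` are hypothetical helpers, and `qa` is assumed to behave like the question-answering pipeline used above: called with `question=` and `context=`, returning a dict with `"answer"` and `"score"`.

```python
def split_sections(markdown_text):
    """Map each '## ' heading to the text beneath it."""
    sections = {}
    # Prefix a newline so a heading on the very first line is also matched;
    # text before the first '## ' heading (such as the title) is dropped.
    for chunk in ("\n" + markdown_text).split("\n## ")[1:]:
        heading, _, body = chunk.partition("\n")
        sections[heading.strip()] = body.strip()
    return sections


def best_section_answer(question, sections, qa):
    """Ask the question against each section and return the top-scoring answer.

    Returns a dict with the section heading, answer, and score, or None
    when there are no non-empty sections.
    """
    best = None
    for heading, body in sections.items():
        if not body:
            continue
        result = qa(question=question, context=body)
        if best is None or result["score"] > best["score"]:
            best = {
                "section": heading,
                "answer": result["answer"],
                "score": result["score"],
            }
    return best
```

Calling `best_section_answer(question, split_sections(document_text), question_answerer)` would return the answer together with the heading it was found under, at the cost of one model call per section.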

## Conclusion

This notebook demonstrated how to create and use agents for fetching scientific literature from various sources,
as well as a simple document processing and question-answering system. These tools can be valuable for researchers
in drug discovery and development.

Next steps could include:
- Refining the search queries for more specific drug discovery topics
- Integrating the literature fetching agents with the document processing pipeline
- Expanding the question-answering capabilities to handle more complex queries
- Implementing a system to compare and synthesize information from multiple sources
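
As a starting point for integrating the fetching agents with the question-answering step, the fetched records could be turned into a single context string. The sketch below is one possible approach, not a finished pipeline. `build_context` is a hypothetical helper, the `"abstract"` field name is an assumption about the record layout, and the `max_chars` limit is an arbitrary illustrative value.

```python
def build_context(records, text_key="abstract", max_chars=4000):
    """Concatenate record texts into one context string, up to max_chars.

    Records without text under text_key are skipped. Collection stops at
    the first text that would push the total past max_chars.
    """
    parts = []
    used = 0
    for record in records:
        text = (record.get(text_key) or "").strip()
        if not text:
            continue
        if used + len(text) > max_chars:
            break
        parts.append(text)
        used += len(text) + 1  # +1 accounts for the newline separator
    return "\n".join(parts)
```

The resulting string can be passed as the `context` argument of the question-answering pipeline, for example `question_answerer(question=question, context=build_context(pubmed_results))`.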