- In a new Jupyter notebook, create a research AI agent and give it custom tools.

- The agent should be able to do the following tasks:

  - Search in Wikipedia
  - Search in DuckDuckGo
  - Scrape and extract the text of any website
  - Save the research to a .txt file

- Run the agent with this query: "Research about the XZ backdoor".  
  The agent should search Wikipedia or DuckDuckGo. If it finds a website through DuckDuckGo, it should open that website and extract its content. It should finish by saving the research to a .txt file.


In [4]:
# !pip install -U duckduckgo_search

""" HTTP error
Installing collected packages: primp, lxml, duckduckgo_search
...
Successfully installed duckduckgo_search-7.2.1 lxml-5.3.0 primp-0.10.1
"""


In [8]:
import os
import json
import requests
from typing import Type

from langchain.chat_models import ChatOpenAI
from langchain.tools import BaseTool
from pydantic import BaseModel, Field
from langchain.agents import initialize_agent, AgentType
from langchain.utilities import DuckDuckGoSearchAPIWrapper
from langchain.utilities.wikipedia import WikipediaAPIWrapper
from bs4 import BeautifulSoup
from langchain.schema import SystemMessage

In [10]:
# Schema
class WikipediaSearchToolArgsSchema(BaseModel):
    query: str = Field(
        description="The query to search for on Wikipedia.",
    )


class DuckDuckGoSearchToolArgsSchema(BaseModel):
    query: str = Field(
        description="The query to search for using DuckDuckGo.",
    )


class WebScraperArgsSchema(BaseModel):
    url: str = Field(
        description="The URL of the website to scrape.",
    )


class SaveToFileToolArgsSchema(BaseModel):
    content: str = Field(
        description="The content to save to a file.",
    )
    filename: str = Field(
        description="The filename to save the content under.",
    )

In [11]:
# Tools


class WikipediaSearchTool(BaseTool):
    name: str = "WikipediaSearchTool"
    description: str = "Use this tool to search for information using Wikipedia. It takes a query as an argument."
    args_schema: Type[WikipediaSearchToolArgsSchema] = WikipediaSearchToolArgsSchema

    def _run(self, query: str) -> str:
        # Return summaries of up to three matching Wikipedia pages.
        wiki = WikipediaAPIWrapper(top_k_results=3)
        return wiki.run(query)


class DuckDuckGoSearchTool(BaseTool):
    name: str = "DuckDuckGoSearchTool"
    description: str = "Use this tool to search for information using DuckDuckGo. It takes a query as an argument."
    args_schema: Type[DuckDuckGoSearchToolArgsSchema] = DuckDuckGoSearchToolArgsSchema

    def _run(self, query: str) -> str:
        ddg = DuckDuckGoSearchAPIWrapper()
        return ddg.run(query)


class WebScraperTool(BaseTool):
    name: str = "WebContentScraper"
    description: str = "Use this tool to scrape content from a given URL."
    args_schema: Type[WebScraperArgsSchema] = WebScraperArgsSchema

    def _run(self, url: str) -> str:
        # Send a browser-like User-Agent instead of the default python-requests one.
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
        }
        try:
            response = requests.get(url, headers=headers, timeout=10)
            response.raise_for_status()

            soup = BeautifulSoup(response.text, "html.parser")
            # Remove page chrome (header, footer, navigation) before extracting text.
            for tag in soup.find_all(["header", "footer", "nav"]):
                tag.decompose()
            return soup.get_text(separator="\n", strip=True)
        except requests.RequestException as e:
            return f"Error from {url}: {e}"


class SaveToFileTool(BaseTool):
    name: str = "SaveToFileTool"
    description: str = "Use this tool to save the provided content to a .txt file."
    args_schema: Type[SaveToFileToolArgsSchema] = SaveToFileToolArgsSchema

    def _run(self, content: str, filename: str) -> str:
        try:
            with open(filename, "w", encoding="utf-8") as f:
                f.write(content)
            # Report the filename, not the file object.
            return f"Saved to {filename}"
        except Exception as e:
            return f"Error saving content to file: {e}"
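
`SaveToFileTool` writes to whatever filename the model supplies, so nothing guarantees the `.txt` extension that the task asks for. The following is a minimal sketch of one way to guard against that. `normalize_txt_filename` and its `default` value are hypothetical names introduced here for illustration; they are not part of LangChain or of the original notebook.

```python
from pathlib import Path


def normalize_txt_filename(filename: str, default: str = "research.txt") -> str:
    """Return a bare .txt filename (hypothetical helper, not a LangChain API)."""
    # Keep only the final path component, dropping any directory parts
    # the model may have included.
    name = Path(filename.strip()).name
    # Fall back to a default when nothing usable remains.
    if not name or name in {".", ".."}:
        return default
    # Replace the existing extension (or add one) so the result ends in .txt.
    return str(Path(name).with_suffix(".txt"))


if __name__ == "__main__":
    print(normalize_txt_filename("xz_backdoor.md"))
    print(normalize_txt_filename("notes"))
```

One place to apply it would be the first line of `SaveToFileTool._run`, for example `filename = normalize_txt_filename(filename)`, so that the confirmation message also reports the adjusted name.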

In [12]:
llm = ChatOpenAI(
    temperature=0.1,
    model="gpt-4o-mini",
)

agent = initialize_agent(
    llm=llm,
    verbose=True,
    agent=AgentType.OPENAI_FUNCTIONS,
    handle_parsing_errors=True,
    tools=[
        WikipediaSearchTool(),
        DuckDuckGoSearchTool(),
        WebScraperTool(),
        SaveToFileTool(),
    ],
    agent_kwargs={
        "system_message": SystemMessage(
            content="""
            You are a highly skilled research expert specializing in gathering comprehensive and accurate information from various online sources.

            Your task is to:
            - Search Wikipedia for relevant information on the query provided.
            - Use DuckDuckGo to find additional resources or websites related to the query.
            - If DuckDuckGo results include useful URLs, scrape the content from those websites to extract relevant text information.
            - Combine all findings from Wikipedia, DuckDuckGo, and scraped websites into a well-structured, detailed answer.
            - Cite all sources (including URLs) used during the research.

            Additionally:
            - All research findings must be saved to a .txt file. Ensure the file includes properly formatted content and citations.
            - Information from Wikipedia must be included in the final report.
            - If certain resources appear incomplete or ambiguous, prioritize cross-referencing multiple sources to ensure the accuracy and reliability of the information.

            Your output should always aim to provide clear, accurate, and actionable insights based on the query, while ensuring professionalism and depth.
        """
        )
    },
)


prompt = "Research about the XZ backdoor"
response = agent.invoke(prompt)
print(response)



> Entering new AgentExecutor chain...

Invoking: `WikipediaSearchTool` with `{'query': 'XZ backdoor'}`


Page: XZ Utils backdoor
Summary: In February 2024, a malicious backdoor was introduced to the Linux build of the xz utility within the liblzma library in versions 5.6.0 and 5.6.1 by an account using the name "Jia Tan". The backdoor gives an attacker who possesses a specific Ed448 private key remote code execution capabilities on the affected Linux system. The issue has been given the Common Vulnerabilities and Exposures number CVE-2024-3094 and has been assigned a CVSS score of 10.0, the highest possible score.
While xz is commonly present in most Linux distributions, at the time of discovery the backdoored version had not yet been widely deployed to production systems, but was present in development versions of major distributions. The backdoor was discovered by the software developer Andres Freund, who announced his findings on 29 March 2024