# Web Scraping Agent with CrewAI

This notebook shows how to build a web research and summarization system with CrewAI, a framework for coordinating multiple AI agents toward a shared goal. Given a topic, one agent searches the web for recent stories and a second agent turns the findings into concise summaries.

## Step 1: Install Required Dependencies
The first cell installs the necessary Python packages:
- crewai: For creating and managing AI agents
- crewai[tools]: CrewAI's optional tools extra, which installs `crewai-tools` (the source of `SerperDevTool`)
- google-generativeai: Google's Python SDK for the Gemini API
- python-dotenv: For managing environment variables

In [1]:
# !pip install crewai "crewai[tools]" google-generativeai python-dotenv
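After installing, it can help to confirm which versions ended up in the environment, because agent frameworks like CrewAI evolve quickly and a notebook like this may depend on a specific release. The sketch below uses only the standard library's `importlib.metadata`. The package list mirrors the one above, with `crewai-tools` assumed to be the distribution name installed by the `crewai[tools]` extra; `installed_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as published on PyPI (assumption: crewai[tools] installs crewai-tools).
PACKAGES = ["crewai", "crewai-tools", "google-generativeai", "python-dotenv"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```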

## Step 2: Import Libraries and Load Environment Variables

This cell:
1. Imports required Python libraries
2. Uses dotenv to load environment variables from a .env file
3. Prints whether the required API keys (`GEMINI_API_KEY` and `SERPER_API_KEY`) were found; it does not stop execution if one is missing

In [2]:
import os
from dotenv import load_dotenv
from crewai import Agent, Task, Crew, Process, LLM
from crewai_tools import SerperDevTool

# Load environment variables from .env file
load_dotenv()

# Verify API keys are loaded (optional - remove in production)
print("✓ GEMINI_API_KEY loaded:", "Yes" if os.getenv("GEMINI_API_KEY") else "No")
print("✓ SERPER_API_KEY loaded:", "Yes" if os.getenv("SERPER_API_KEY") else "No")


✓ GEMINI_API_KEY loaded: Yes
✓ SERPER_API_KEY loaded: Yes
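Because the cell above only prints a status line, a missing key may surface later as a less obvious failure in the middle of an agent run. A stricter option is to fail fast. The `require_env` helper below is hypothetical and assumes a `.env` file containing lines such as `GEMINI_API_KEY=...` and `SERPER_API_KEY=...` has already been loaded by `load_dotenv()`.

```python
import os

REQUIRED_KEYS = ("GEMINI_API_KEY", "SERPER_API_KEY")


def require_env(names):
    """Return the named environment variables, raising if any are unset or empty."""
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
            + ". Add them to your .env file or export them in your shell."
        )
    return {name: os.environ[name] for name in names}
```

Call `require_env(REQUIRED_KEYS)` right after `load_dotenv()`; it raises `RuntimeError` naming every missing variable at once instead of letting the crew start without credentials.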


## Step 3: Initialize Gemini LLM

This cell configures the Google Gemini LLM with:
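1. Model: `gemini/gemini-2.0-flash-exp` (the `gemini/` prefix tells CrewAI, which routes requests through LiteLLM, to use Google's Gemini API)
2. Temperature: 0.1 (low randomness, for more focused and consistent responses)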
3. API key loaded from environment variables
4. Verbose mode disabled for cleaner output

In [3]:
# Initialize Gemini LLM using environment variable
gemini_llm = LLM(
    model='gemini/gemini-2.0-flash-exp',
    api_key=os.getenv("GEMINI_API_KEY"),
    temperature=0.1,
    verbose=False
)

print("✓ Gemini LLM initialized successfully!")


✓ Gemini LLM initialized successfully!
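The model string follows LiteLLM's `provider/model` convention: everything before the first `/` selects the provider, and the rest is the provider's own model name. CrewAI performs this routing internally, so the hypothetical `split_model_id` helper below exists only to make the convention concrete; you never need to call it.

```python
def split_model_id(model_id):
    """Split a 'provider/model' identifier into (provider, model).

    Only the first '/' separates the two parts, so model names that
    themselves contain '/' are kept intact.
    """
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"Expected 'provider/model', got {model_id!r}")
    return provider, model


print(split_model_id("gemini/gemini-2.0-flash-exp"))  # ('gemini', 'gemini-2.0-flash-exp')
```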


## Step 4: Initialize Search Tool

This cell sets up the SerperDevTool, which provides web search capabilities:
1. Returns up to 10 search results per query (`n_results=10`)
2. Sets the search country to India (`country="in"`)
3. Sets English as the search language (`locale="en"`)
4. Reads `SERPER_API_KEY` from the environment automatically, so the key is not passed explicitly

In [4]:
# Initialize SerperDevTool - API key automatically loaded from environment
search_tool = SerperDevTool(
    n_results=10,
    country="in",
    locale="en"
)

print("✓ SerperDevTool initialized successfully!")


✓ SerperDevTool initialized successfully!
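For orientation, here is roughly the HTTP request a Serper search involves. This is a hedged sketch based on Serper's public API (endpoint `https://google.serper.dev/search`, an `X-API-KEY` header, and `q`/`num`/`gl`/`hl` body fields, where `gl` is the country and `hl` the language); it is not `SerperDevTool`'s actual source, and `build_serper_payload` and `serper_search` are hypothetical names.

```python
import json
import os
import urllib.request

SERPER_SEARCH_URL = "https://google.serper.dev/search"


def build_serper_payload(query, n_results=10, country="in", locale="en"):
    """Build the JSON body for a search: gl = country code, hl = language code."""
    return {"q": query, "num": n_results, "gl": country, "hl": locale}


def serper_search(query, **options):
    """POST the query to Serper and return the decoded JSON response (needs network access)."""
    request = urllib.request.Request(
        SERPER_SEARCH_URL,
        data=json.dumps(build_serper_payload(query, **options)).encode("utf-8"),
        headers={
            "X-API-KEY": os.environ["SERPER_API_KEY"],
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a valid key, `serper_search("latest AI breakthroughs")` should return a dict whose `organic` list holds the web results; check the field names against a live response before relying on them.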


## Step 5: Create AI Agents

This cell defines two specialized AI agents:

1. **Researcher Agent**
   - Role: Web Research Specialist
   - Purpose: Searches for recent stories on the specified topic
   - Tools: Uses SerperDevTool for web searches
   - Configuration: Uses Gemini LLM with verbose output

2. **Summarizer Agent**
   - Role: Content Summarizer
   - Purpose: Creates concise summaries of the research findings
   - Tools: None; it works from the research output using only the Gemini LLM
   - Configuration: Verbose output enabled for monitoring progress

In [5]:
# Define the topic to research
topic = "latest AI breakthroughs"  # Change this to your desired topic

# Create researcher agent
researcher = Agent(
    role="Web Research Specialist",
    goal=f"Search for the top 5 most recent and relevant stories about {topic}",
    backstory="Expert at finding the latest news and developments on any topic using advanced search tools",
    tools=[search_tool],
    llm=gemini_llm,
    verbose=True,
    allow_delegation=False
)

# Create summarizer agent
summarizer = Agent(
    role="Content Summarizer",
    goal="Create concise, well-structured summaries of research findings",
    backstory="Specialist in distilling complex information into clear, engaging summaries",
    llm=gemini_llm,
    verbose=True,
    allow_delegation=False
)

print("✓ Agents created successfully!")


✓ Agents created successfully!
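The researcher's `goal` is an f-string, so the topic is fixed when this cell runs. As an alternative, agent and task strings can keep literal `{topic}` placeholders (plain strings, not f-strings) that CrewAI fills from `crew.kickoff(inputs={"topic": ...})`, which lets the same crew run on different topics. The hypothetical `interpolate` helper below only mimics that substitution to illustrate the idea; it is not CrewAI's implementation.

```python
import re

# Matches {name} placeholders whose name is a Python-style identifier.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(template, inputs):
    """Replace {name} placeholders with values from inputs; fail loudly on unknown names."""
    def substitute(match):
        key = match.group(1)
        if key not in inputs:
            raise KeyError(f"No input provided for placeholder {{{key}}}")
        return str(inputs[key])

    return PLACEHOLDER.sub(substitute, template)


goal_template = "Search for the top 5 most recent and relevant stories about {topic}"
print(interpolate(goal_template, {"topic": "latest AI breakthroughs"}))
```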


## Step 6: Define Tasks

This cell creates two sequential tasks:

1. **Research Task**
   - Assigned to: Researcher Agent
   - Purpose: Search for the top 5 most recent stories about the topic
   - Required Information:
     - Title
     - Source
     - Key points
     - Publication date

2. **Summary Task**
   - Assigned to: Summarizer Agent
   - Purpose: Create markdown-formatted summaries
   - Output Format: Numbered list with clear sections
   - Context: Uses the output from the research task

In [6]:
# Research task
research_task = Task(
    description=f"""Search the web for the top 5 most recent stories about '{topic}'.
    For each story, collect:
    - Title
    - Source
    - Key points
    - Publication date (if available)
    
    Focus on the most recent and credible sources.""",
    expected_output="A structured list of the top 5 stories with all relevant details",
    agent=researcher
)

# Summarization task
summary_task = Task(
    description="""Create a well-formatted markdown summary of the top 5 stories.
    For each story, provide:
    - A clear headline
    - A concise 2-3 sentence summary highlighting the key insights
    - The source
    
    Format the output as a numbered list with clear sections.""",
    expected_output="A markdown-formatted summary with 5 well-organized story summaries",
    agent=summarizer,
    context=[research_task]
)

print("✓ Tasks defined successfully!")


✓ Tasks defined successfully!
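With a sequential process, tasks run in list order, and `context=[research_task]` makes the research output available to the summary task. The toy model below illustrates only that data flow: `ToyTask` and `run_sequential` are hypothetical, plain functions stand in for LLM calls, and CrewAI's real execution is considerably more involved.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyTask:
    """Hypothetical stand-in for a task: a name, a worker function, and upstream tasks."""
    name: str
    run: Callable[[str], str]
    context: List["ToyTask"] = field(default_factory=list)


def run_sequential(tasks):
    """Run tasks in order, passing each one the joined outputs of its context tasks."""
    outputs = {}
    for task in tasks:
        # Context tasks must appear earlier in the list, otherwise their output
        # does not exist yet and the lookup raises KeyError.
        upstream = "\n\n".join(outputs[dep.name] for dep in task.context)
        outputs[task.name] = task.run(upstream)
    return outputs


research = ToyTask("research", lambda _: "1. Story A\n2. Story B")
summary = ToyTask("summary", lambda ctx: f"Summary of:\n{ctx}", context=[research])
results = run_sequential([research, summary])
print(results["summary"])
```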


## Step 7: Create and Execute Crew

This cell:
1. Creates a Crew object that coordinates both agents
2. Configures sequential processing (research first, then summarization)
3. Enables verbose mode for detailed progress tracking
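4. Executes the crew with `crew.kickoff()`, passing the topic in `inputs` (this has no effect here, because the agent and task strings already embed the topic through f-strings; `inputs` only fills literal `{topic}` placeholders)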
5. Stores the final result for display

In [7]:
# Create the crew
crew = Crew(
    agents=[researcher, summarizer],
    tasks=[research_task, summary_task],
    process=Process.sequential,
    verbose=True
)

# Execute the crew
print(f"\n{'='*60}")
print(f"Starting research on: {topic}")
print(f"{'='*60}\n")

result = crew.kickoff(inputs={'topic': topic})



Starting research on: latest AI breakthroughs




I encountered an error while trying to use the tool. This was the error: Arguments validation failed: 1 validation error for SerperDevToolSchema
search_query
  Input should be a valid string [type=string_type, input_value={'description': 'Mandator...nternet', 'type': 'str'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.11/v/string_type.
 Tool Search the internet with Serper accepts these inputs: Tool Name: Search the internet with Serper
Tool Arguments: {'search_query': {'description': 'Mandatory search query you want to use to search the internet', 'type': 'str'}}
Tool Description: A tool that can be used to search the internet with a search_query. Supports different search types: 'search' (default), 'news'
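The error above is raised before any search runs: the model passed the tool's argument schema (the dict describing `search_query`) as the value of `search_query`, and validation rejected it because `SerperDevToolSchema` expects a string. The agent sees this message as the tool's result and can try again; the crew still produced a summary, as Step 8 shows. The stdlib-only sketch below mirrors that check with a hypothetical `check_search_args` helper; the real validation is done by Pydantic.

```python
def check_search_args(args):
    """Mimic the schema check: search_query must be present and must be a string."""
    value = args.get("search_query")
    if not isinstance(value, str):
        return f"search_query: Input should be a valid string (got {type(value).__name__})"
    return None


# What the model sent in the failing call: the schema description, not a query.
bad_args = {
    "search_query": {
        "description": "Mandatory search query you want to use to search the internet",
        "type": "str",
    }
}
good_args = {"search_query": "latest AI breakthroughs"}

print(check_search_args(bad_args))   # search_query: Input should be a valid string (got dict)
print(check_search_args(good_args))  # None
```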


## Step 8: Display Results

This final cell:
1. Imports IPython display utilities
2. Prints a formatted section header
3. Displays the markdown-formatted results using IPython's display function

In [8]:
from IPython.display import Markdown, display

print("\n" + "="*60)
print("FINAL SUMMARY")
print("="*60 + "\n")

display(Markdown(str(result)))



FINAL SUMMARY



```markdown
## Top 5 AI Breakthroughs

1.  **Headline:** AI Chips Are Getting Hotter: A Microfluidics Breakthrough
    **Summary:** Microsoft is pioneering a novel cooling solution for silicon chips using microfluidics. This involves etching channels directly into the silicon, allowing cooling liquid to flow through and dissipate heat more efficiently, potentially enabling more powerful and energy-efficient AI processors.
    **Source:** news.microsoft.com

2.  **Headline:** 2 AI Breakthroughs Unlock New Potential for Health and Materials Science
    **Summary:** Generative AI foundation models are accelerating scientific discovery and improving healthcare. These models can significantly speed up the identification of new materials and assist doctors in analyzing radiology results with greater speed and accuracy.
    **Source:** news.microsoft.com

3.  **Headline:** Three New AI Breakthroughs Shaping 2026: AI Trends
    **Summary:** Deloitte identifies Agentic AI, Physical AI, and Sovereign AI as key trends poised to transform industries by 2026. Agentic AI focuses on autonomous AI agents, Physical AI integrates AI with physical systems, and Sovereign AI emphasizes data privacy and control.
    **Source:** www.deloitte.com

4.  **Headline:** The Latest AI News and AI Breakthroughs that Matter Most
    **Summary:** DeepCogito v2, a new open-source AI model, has been released, showcasing advancements in logical reasoning and task planning. According to its developers, DeepCogito v2 surpasses the performance of many closed-source AI models in these areas.
    **Source:** www.crescendo.ai

5.  **Headline:** Artificial Intelligence News -- ScienceDaily
    **Summary:** ScienceDaily reports on a brain-inspired AI breakthrough that enables computers to perceive the world more like humans. Additionally, an AI tool grounded in evidence-based medicine has outperformed other AI tools and most physicians in certain tasks.
    **Source:** www.sciencedaily.com
```
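The summarizer wrapped its entire answer in a Markdown code fence tagged `markdown`, so `Markdown(...)` renders it as a literal code block instead of formatted headings and lists. A small post-processing step can unwrap it before display. `strip_outer_fence` is a hypothetical helper, and its regex assumes the fence, if present, encloses the whole output.

```python
import re

# One code fence wrapping the whole string, with an optional language tag such as "markdown".
FENCE = re.compile(r"^\s*```[\w-]*\s*\n(.*?)\n?\s*```\s*$", re.DOTALL)


def strip_outer_fence(text):
    """Remove a code fence that wraps the entire text; return the text unchanged otherwise."""
    match = FENCE.match(text)
    return match.group(1) if match else text


sample = "```markdown\n## Top 5 AI Breakthroughs\n```"
print(strip_outer_fence(sample))  # ## Top 5 AI Breakthroughs
```

In the Step 8 cell, `display(Markdown(strip_outer_fence(str(result))))` would then render the headings and numbered list directly.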