Add AI research agent using smolagents #120

Open · wants to merge 2 commits into main

Conversation

@codebyNJ codebyNJ commented May 12, 2025

This PR introduces a powerful, interactive Web Research Assistant built using Streamlit, integrating enhanced search tools, web scraping, AI agents, and markdown-based summarization. The tool is designed to provide recent, relevant, and readable information from the web, driven by the Hugging Face Qwen2.5-Coder-32B-Instruct model and DuckDuckGo search APIs.

🧠 Key Features:

Streamlit Interface: Intuitive, real-time search assistant with multi-tab results display (Results, Sources, Analysis, Logs).

  • Enhanced Search Agent
  • Webpage Content Extraction
  • Agent Framework
  • Logging & Debugging
  • Export Capabilities
  • Sidebar Settings

Install the dependencies

pip install streamlit pandas requests beautifulsoup4 markdownify python-dotenv smolagents huggingface_hub

Add your Hugging Face token

HF_TOKEN="hf_..."
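At runtime the token is read from the environment; the project lists python-dotenv for this. As a minimal sketch of the pattern (the helper below is a stdlib stand-in illustrating what `load_dotenv()` does, with illustrative names):

```python
import os
from pathlib import Path

def load_env(path: str = ".env") -> None:
    """Minimal stand-in for python-dotenv's load_dotenv():
    reads KEY="value" lines from a file into os.environ
    (variables that are already set take precedence)."""
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip('"'))

def get_hf_token() -> str:
    """Return the Hugging Face token, failing loudly if it is missing."""
    load_env()
    token = os.getenv("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it to your .env file")
    return token
```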

Summary by CodeRabbit

  • New Features
    • Introduced a web-based research assistant that performs enhanced web searches, retrieves and summarizes content, and prioritizes recent sources through an interactive interface.
  • Documentation
    • Added a README with an overview, usage instructions, and a demo video link.
    • Included an example environment file for setting up authentication tokens.
  • Chores
    • Added a requirements file listing necessary Python packages for the project.


coderabbitai bot commented May 12, 2025

Walkthrough

A new research assistant application is introduced using the Smol Agents framework. The update adds a Streamlit-based interface, agent logic for enhanced web search and webpage retrieval, supporting utilities, and documentation. Dependency management and environment variable configuration are provided via requirements and example environment files.

Changes

File(s) Change Summary
researchagent-smolagents/README.md Added a README introducing the project, its purpose, usage, and demo link.
researchagent-smolagents/agents.py Implemented the main Streamlit app, agent logic, search and webpage tools, logging utilities, and UI components.
researchagent-smolagents/requirements.txt Added required Python dependencies for the project.
researchagent-smolagents/.env.example Provided an example environment file with a placeholder for the Hugging Face token.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Streamlit UI
    participant ManagerAgent
    participant ToolCallingAgent
    participant EnhancedSearchTool
    participant VisitWebpageTool

    User->>Streamlit UI: Enter search query and settings
    Streamlit UI->>ManagerAgent: Run manager agent with query and parameters
    ManagerAgent->>ToolCallingAgent: Delegate search task
    ToolCallingAgent->>EnhancedSearchTool: Perform enhanced search
    EnhancedSearchTool->>ToolCallingAgent: Return search results
    ToolCallingAgent->>VisitWebpageTool: Fetch webpage content (as needed)
    VisitWebpageTool->>ToolCallingAgent: Return webpage content
    ToolCallingAgent->>ManagerAgent: Return aggregated results
    ManagerAgent->>Streamlit UI: Provide formatted response
    Streamlit UI->>User: Display results, sources, analysis, and logs

Poem

A rabbit hopped to Streamlit’s door,
With agents and tools, ready to explore.
It searched the web, fetched pages anew,
Summarized findings, all for you!
With tokens and logs, dependencies set,
This research bunny’s your best bet.
🐇✨



@coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (5)
researchagent-smolagents/README.md (1)

1-12: Polish README wording and lint issues for professionalism & discoverability.

  1. Typo: “AI Reseach Agent” ⇒ “AI Research Agent”.
  2. Style: “going to be” → “will be”.
  3. Provide a descriptive link title instead of a bare URL (MD034).
  4. Capitalise “Hugging Face”.
-# AI Reseach Agent
-A simple python application to demonstrate how agentic era is going to be. Used Smol Agents framework by hugging face.
+# AI Research Agent
+A simple Python application demonstrating how the agentic era will be.  
+Built with the Smol Agents framework by **Hugging Face**.
@@
-https://github.com/user-attachments/assets/1f876f9c-3dff-4548-b14c-bb6439eede30
+[Watch the demo video](https://github.com/user-attachments/assets/1f876f9c-3dff-4548-b14c-bb6439eede30)
🧰 Tools
🪛 LanguageTool

[style] ~2-~2: Use ‘will’ instead of ‘going to’ if the following action is certain.
Context: ...lication to demonstrate how agentic era is going to be. Used Smol Agents framework by huggi...

(GOING_TO_WILL)

🪛 markdownlint-cli2 (0.17.2)

6-6: Bare URL used
null

(MD034, no-bare-urls)

researchagent-smolagents/agents.py (4)

10-20: Remove unused imports to keep the module clean.

timedelta and DuckDuckGoSearchTool aren’t referenced after import.

-from datetime import datetime, timedelta
+from datetime import datetime
@@
-    DuckDuckGoSearchTool,
🧰 Tools
🪛 Ruff (0.8.2)

10-10: datetime.timedelta imported but unused

Remove unused import: datetime.timedelta

(F401)


18-18: smolagents.DuckDuckGoSearchTool imported but unused

Remove unused import: smolagents.DuckDuckGoSearchTool

(F401)


210-223: Log buffer may grow unbounded – reset or cap size to prevent memory bloat.

st.session_state.log_container accumulates every search’s logs for the lifetime of the session. For long sessions this can degrade performance.

Minimal fix: keep only the last n entries.

-    st.session_state.log_container.extend(captured_logs)
+    st.session_state.log_container.extend(captured_logs)
+    # Cap to last 1 000 lines
+    max_logs = 1000
+    if len(st.session_state.log_container) > max_logs:
+        st.session_state.log_container = st.session_state.log_container[-max_logs:]

394-396: Combine nested with statements for cleaner, more readable code.

-                        with st.expander(f"Source {idx}: {source}"):
-                            # Add a loading spinner while fetching content
-                            with st.spinner(f"Loading content from source {idx}..."):
+                        with st.expander(f"Source {idx}: {source}"), \
+                             st.spinner(f"Loading content from source {idx}..."):

This follows the Ruff SIM117 recommendation and reduces indentation.

🧰 Tools
🪛 Ruff (0.8.2)

394-396: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)


267-270: Cache expiry missing – external pages may change or vanish.

@st.cache_data without a ttl keeps content forever and can serve stale pages.

Add a reasonable TTL:

-@st.cache_data
+@st.cache_data(ttl=60 * 60)  # 1 hour
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2ef42de and 2a973f8.

📒 Files selected for processing (4)
  • researchagent-smolagents/.env (1 hunks)
  • researchagent-smolagents/README.md (1 hunks)
  • researchagent-smolagents/agents.py (1 hunks)
  • researchagent-smolagents/requirements.txt (1 hunks)

Comment on lines 1 to 2
#add you hugging face token
HF_TOKEN="hf_..."

🛠️ Refactor suggestion

⚠️ Potential issue

Avoid committing real or placeholder secrets – ship an example file instead.

Including .env in the repo risks leaking real credentials and trains newcomers to commit secrets.

• Rename this file to .env.example and replace the value with a clear placeholder (HF_TOKEN="<YOUR_TOKEN_HERE>").
• Add .env to .gitignore so personal tokens stay local.

-#add you hugging face token
-HF_TOKEN="hf_..."
+# Example – copy to `.env`, fill with your token and keep that file un-tracked.
+HF_TOKEN="<YOUR_TOKEN_HERE>"

Comment on lines +1 to +6
streamlit
requests
markdownify
huggingface_hub
smolagents
python-dotenv

💡 Verification agent

🧩 Analysis chain

Missing runtime dependencies & pinning – app will crash in prod.

agents.py directly imports pandas and bs4 (BeautifulSoup) but they are not listed here. The app will raise ModuleNotFoundError in a clean environment.

 streamlit
 requests
 markdownify
 huggingface_hub
 smolagents
 python-dotenv
+pandas
+beautifulsoup4

Consider pinning versions (e.g., streamlit>=1.34,<2) for deterministic builds.


#!/bin/bash
# Search for pandas imports across the codebase
rg -n "import pandas" 
rg -n "from pandas" 

# Search for bs4 or BeautifulSoup usage
rg -n "import bs4" 
rg -n "from bs4" 
rg -n "BeautifulSoup"

🏁 Script executed:

#!/bin/bash
# Locate agents.py in the repository
file=$(fd -I agents.py)

echo "agents.py found at: $file"
# Search for pandas imports/usage
rg -n "import pandas" "$file"
rg -n "from pandas" "$file"
# Search for bs4/BeautifulSoup imports/usage
rg -n "import bs4" "$file"
rg -n "from bs4" "$file"
rg -n "BeautifulSoup" "$file"

Length of output: 1577



🏁 Script executed:

#!/bin/bash
# Verify pandas & BeautifulSoup imports in the specific agents.py file
file="researchagent-smolagents/agents.py"

if [ -f "$file" ]; then
  echo "Scanning $file for dependencies…"
  rg -n "import pandas" "$file"
  rg -n "from pandas" "$file"
  rg -n "import bs4" "$file"
  rg -n "from bs4" "$file"
  rg -n "BeautifulSoup" "$file"
else
  echo "File not found: $file"
fi

Length of output: 693


Add missing dependencies and pin versions in requirements.txt

The file researchagent-smolagents/agents.py imports runtime libraries that aren’t listed in requirements.txt, which will cause ModuleNotFoundError in a clean environment:

• researchagent-smolagents/agents.py:3
import pandas as pd
• researchagent-smolagents/agents.py:110–111
from bs4 import BeautifulSoup / soup = BeautifulSoup(…)

Please update researchagent-smolagents/requirements.txt:

 streamlit
 requests
 markdownify
 huggingface_hub
 smolagents
 python-dotenv
+pandas
+beautifulsoup4

For deterministic builds, consider pinning versions. For example:

streamlit>=1.34,<2.0
pandas>=2.0,<3.0
beautifulsoup4>=4.12,<5.0

Comment on lines 88 to 92
# First request to get token
response = requests.get('https://html.duckduckgo.com/html/', headers=headers)

# Extract search parameters
search_params = {

🛠️ Refactor suggestion

Unused HTTP request wastes latency and bandwidth.

response = requests.get('https://html.duckduckgo.com/html/', …) is assigned but never used.
Delete the call or utilise the headers/cookies it should return.

-    # First request to get token
-    response = requests.get('https://html.duckduckgo.com/html/', headers=headers)
+    # NOTE: Removed pre-flight request – not required for DuckDuckGo HTML endpoint.
🧰 Tools
🪛 Ruff (0.8.2)

89-89: Local variable response is assigned to but never used

Remove assignment to unused variable response

(F841)

Comment on lines 166 to 186
def visit_webpage(url: str) -> str:
"""Visits a webpage and converts its content to markdown format.

Args:
url: The complete URL of the webpage to visit (e.g., 'https://example.com').
Must be a valid HTTP or HTTPS URL.

Returns:
str: The webpage content converted to Markdown format with the reference webpages links.
Returns an error message if the request fails.
"""
try:
response = requests.get(url)
response.raise_for_status()
markdown_content = markdownify(response.text).strip()
markdown_content = re.sub(r"\n{3,}", "\n\n", markdown_content)
return markdown_content
except RequestException as e:
return f"Error fetching the webpage: {str(e)}"
except Exception as e:
return f"An unexpected error occurred: {str(e)}"

⚠️ Potential issue

Add timeout & headers when fetching external pages to avoid hangs and blocks.

requests.get(url) without a timeout can freeze Streamlit workers; some sites also block default user-agents.

-    response = requests.get(url)
+    response = requests.get(
+        url,
+        timeout=15,
+        headers={"User-Agent": "Mozilla/5.0 (WebResearchBot/1.0)"}
+    )

Also consider limiting download size (e.g., via stream=True and iter_content) to mitigate huge pages.
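A hedged sketch of that download cap (function names, the 2 MB limit, and the user-agent string are illustrative, not code from this PR):

```python
import requests

def read_limited(chunks, max_bytes: int) -> bytes:
    """Accumulate byte chunks, stopping once max_bytes have been received."""
    out, received = [], 0
    for chunk in chunks:
        out.append(chunk)
        received += len(chunk)
        if received >= max_bytes:
            break  # truncate huge pages instead of loading them fully
    return b"".join(out)

def fetch_limited(url: str, max_bytes: int = 2_000_000, timeout: int = 15) -> str:
    """Stream a page with a timeout and UA header, giving up after max_bytes."""
    headers = {"User-Agent": "Mozilla/5.0 (WebResearchBot/1.0)"}
    with requests.get(url, headers=headers, timeout=timeout, stream=True) as resp:
        resp.raise_for_status()
        body = read_limited(resp.iter_content(chunk_size=8192), max_bytes)
    return body.decode(resp.encoding or "utf-8", errors="replace")
```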



## Demo Video

https://github.com/user-attachments/assets/1f876f9c-3dff-4548-b14c-bb6439eede30
@gmacario commented May 12, 2025

This link results in an Error 404 (Page not found) for me


codegen-sh bot commented May 20, 2025

PR Review: Add AI research agent using smolagents

Thank you for this contribution! I've reviewed the PR which adds a new AI research agent using the smolagents framework. Here's my feedback:

Strengths

  • Well-structured application: The Streamlit app is well-organized with clear separation of concerns between UI, search functionality, and agent orchestration.
  • Comprehensive features: The implementation includes enhanced search, content extraction, recency scoring, and export capabilities.
  • User-friendly interface: The multi-tab display (Results, Sources, Analysis, Logs) provides a good user experience.
  • Error handling: The code includes proper error handling for web requests and agent operations.

Suggested Improvements

1. Security & Best Practices

  • Environment Variables: As CodeRabbit suggested, rename .env to .env.example and use a placeholder for the token. Add .env to .gitignore to prevent accidental token commits.
  • Web Requests: Add timeouts and proper user-agent headers to the visit_webpage function to prevent hanging on slow sites and avoid being blocked.
  • Unused Code: Remove the unused HTTP request in the enhanced_search function.

2. Dependencies

  • Missing Requirements: Add pandas and beautifulsoup4 to requirements.txt as they are imported but not listed.
  • Version Pinning: Consider pinning dependency versions for more deterministic builds, e.g., streamlit>=1.34,<2.0.

3. Documentation

  • Demo Link: The demo video link in the README appears to be broken. Please update with a working link.
  • Setup Instructions: Consider adding more detailed setup instructions in the README, including:
    • How to create and activate a virtual environment
    • How to run the application (streamlit run agents.py)
    • Example queries to try
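A sketch of what those setup steps could look like (directory and file names assume this PR's layout and are not verified against the branch):

```shell
# from the repository root; paths assume this PR's layout
cd researchagent-smolagents

# create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate      # on Windows: .venv\Scripts\activate

# install dependencies and configure the token
pip install -r requirements.txt
cp .env.example .env           # then edit .env and add your real HF token

# run the app
streamlit run agents.py
```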

4. Performance Considerations

  • Large Page Handling: Consider limiting download size for web pages (e.g., via stream=True and iter_content) to handle very large pages gracefully.
  • Caching Strategy: The current caching implementation is good, but you might want to add TTL (time-to-live) for cached content.

Minor Issues

  • There's a typo in the README title: "AI Reseach Agent" should be "AI Research Agent"

Overall, this is a valuable addition to the repository. With the suggested improvements, it will be even more robust and user-friendly. Great work!

@coderabbitai bot left a comment

Actionable comments posted: 3

🔭 Outside diff range comments (1)
researchagent-smolagents/agents.py (1)

258-269: ⚠️ Potential issue

Incomplete implementation of search functionality

The code ends abruptly with an incomplete block after setting up the tabs. The search functionality appears to be incomplete.

Complete the search functionality by adding the following code:

                result_tab, sources_tab, analysis_tab, logs_tab = st.tabs(
                    ["📝 Results", "🔗 Sources", "📊 Analysis", "📋 Logs"]
                )
+                
+                # Run search with the manager agent
+                result = manager_agent.run(
+                    f"""
+                    Research query: {query}
+                    Time period: {time_period}
+                    Search depth: {search_depth}
+                    Max results: {max_results}
+                    
+                    Please provide comprehensive research on this topic, including:
+                    1. Key facts and information
+                    2. Different perspectives or viewpoints
+                    3. Recent developments or updates
+                    4. A summary of your findings
+                    """
+                )
+                
+                # Capture logs from output
+                for timestamp, log in zip(output.timestamps, output.logs):
+                    st.session_state.log_container.append((timestamp, log))
+                
+                # Display results in tabs
+                with result_tab:
+                    st.markdown(format_agent_response(result))
+                
+                with sources_tab:
+                    if isinstance(result, dict) and 'Results' in result:
+                        for idx, source in enumerate(result['Results'], 1):
+                            st.markdown(f"### Source {idx}: {source.get('title', 'No Title')}")
+                            st.markdown(f"**URL:** {source.get('url', 'No URL')}")
+                            st.markdown(f"**Recency Score:** {source.get('recency_score', 0.0):.2f}")
+                            with st.expander("View Source Content", expanded=False):
+                                st.markdown(fetch_webpage_content(source['url'])[:5000] + "...")
+                
+                with analysis_tab:
+                    if isinstance(result, dict) and 'thoughts' in result:
+                        st.markdown(result['thoughts'])
+                    else:
+                        st.info("No detailed analysis available.")
+                
+                with logs_tab:
+                    for timestamp, log in st.session_state.log_container:
+                        st.text(f"[{timestamp}] {log}")
+        
+        except Exception as e:
+            st.error(f"An error occurred during research: {str(e)}")
🧰 Tools
🪛 Ruff (0.11.9)

269-269: SyntaxError: Expected except or finally after try block

♻️ Duplicate comments (2)
researchagent-smolagents/agents.py (2)

68-68: Remove unused HTTP request

This request is assigned but never used. It's simply wasting bandwidth and adding latency.

-        response = requests.get('https://html.duckduckgo.com/html/', headers=headers)

128-138: Add timeout & headers when fetching external pages to avoid hangs and blocks

requests.get(url) without a timeout can freeze Streamlit workers; some sites also block default user-agents.

-        response = requests.get(url)
+        response = requests.get(
+            url,
+            timeout=15,
+            headers={"User-Agent": "Mozilla/5.0 (WebResearchBot/1.0)"}
+        )

Also consider limiting download size (e.g., via stream=True and iter_content) to mitigate huge pages.

🧹 Nitpick comments (2)
researchagent-smolagents/agents.py (1)

211-214: Webpage content fetching doesn't match cache key pattern

The fetch_webpage_content function is decorated with @st.cache_data, but it doesn't properly cache results based on changes in input parameters. The function will be recalled unnecessarily when the same URL is visited multiple times with different query parameters.

@st.cache_data
-def fetch_webpage_content(url):
+def fetch_webpage_content(url: str) -> str:
+    """Fetch and convert webpage content to markdown with caching"""
     return visit_webpage(url)
researchagent-smolagents/README.md (1)

6-6: Use proper Markdown link syntax instead of bare URL

The GitHub asset link should use proper Markdown link syntax for better readability and compatibility.

-https://github.com/user-attachments/assets/1f876f9c-3dff-4548-b14c-bb6439eede30
+[View Demo](https://github.com/user-attachments/assets/1f876f9c-3dff-4548-b14c-bb6439eede30)

Also note that this link results in a 404 error as mentioned in a previous review. Please verify the correct URL.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

6-6: Bare URL used
null

(MD034, no-bare-urls)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 2a973f8 and acb193d.

📒 Files selected for processing (3)
  • researchagent-smolagents/.env.example (1 hunks)
  • researchagent-smolagents/README.md (1 hunks)
  • researchagent-smolagents/agents.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • researchagent-smolagents/.env.example

Comment on lines +106 to +108
return {
'Results': results[:10],
'meta': {

🛠️ Refactor suggestion

Respect user's max_results setting

The hard-coded limit of 10 results doesn't respect the user's max_results setting from the sidebar.

-                'Results': results[:10],
+                'Results': results[:max_results],

However, this would require passing max_results as a parameter to the enhanced_search function. Consider refactoring to make this function respect user preferences.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In researchagent-smolagents/agents.py around lines 106 to 108, the function
currently limits results to 10 regardless of the user's max_results setting.
Modify the function signature to accept a max_results parameter and replace the
hard-coded 10 with this parameter to respect user preferences. Ensure that when
calling enhanced_search, the max_results value from the user settings is passed
accordingly.
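The refactor suggested above could look like the following sketch. Only the name `enhanced_search` and the `'Results'` slicing come from the PR; the search body and the `meta` keys here are stand-ins:

```python
# Hypothetical sketch: thread max_results through enhanced_search instead of
# hard-coding 10. _run_search is a stub for the real DuckDuckGo call in agents.py.

def _run_search(query: str) -> list:
    # Stand-in for the actual search request.
    return [f"result {i} for {query!r}" for i in range(25)]

def enhanced_search(query: str, max_results: int = 10) -> dict:
    results = _run_search(query)
    return {
        'Results': results[:max_results],   # was: results[:10]
        'meta': {
            'requested': max_results,
            'available': len(results),
        },
    }
```

At the call site, the Streamlit sidebar value would then be forwarded, e.g. `enhanced_search(query, max_results=max_results)` where `max_results` comes from the sidebar widget.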

Comment on lines +70 to +77
search_params = {
'q': time_keywords,
's': '0',
'dc': '20',
'v': 'l',
'o': 'json',
'api': '/d.js',
}
🛠️ Refactor suggestion

Time period parameter is accepted but not used

The function accepts a time_period parameter but doesn't use it in the search parameters. Consider adding it to the search parameters to respect user's time period preference.

        search_params = {
            'q': time_keywords,
            's': '0',
            'dc': '20',
+           'df': time_period,
            'v': 'l',
            'o': 'json',
            'api': '/d.js',
        }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

🤖 Prompt for AI Agents
In researchagent-smolagents/agents.py around lines 70 to 77, the time_period
parameter is accepted by the function but not included in the search_params
dictionary. To fix this, add the time_period value to the search_params with the
appropriate key expected by the API, ensuring the search respects the user's
time period preference.
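A minimal sketch of the fix, pulling the parameter dictionary into a helper. Note that `'df'` as DuckDuckGo's date-filter key and its accepted values (`'d'`, `'w'`, `'m'`, `'y'`) are assumptions to verify against the endpoint:

```python
# Sketch: include the user's time period in the DuckDuckGo query parameters.
# The 'df' key and its value format are assumed, not confirmed by the PR.
from typing import Optional

def build_search_params(time_keywords: str, time_period: Optional[str]) -> dict:
    params = {
        'q': time_keywords,
        's': '0',
        'dc': '20',
        'v': 'l',
        'o': 'json',
        'api': '/d.js',
    }
    if time_period:
        params['df'] = time_period  # e.g. 'w' for the past week
    return params
```

Keeping the `'df'` entry conditional means callers that pass no time period get exactly the original request parameters.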

Comment on lines +1 to +11
# AI Research Agent
A simple python application to demonstrate how agentic era is going to be. Used Smol Agents framework by hugging face.

## Demo Video

https://github.com/user-attachments/assets/1f876f9c-3dff-4548-b14c-bb6439eede30

## Task Performed
- AI agent to grasp information on the topic given.
- AI agent to scrape the information and collect the necessary information about the topic.
- AI agent to summarize the whole topic.
🛠️ Refactor suggestion

Enhance README with setup instructions and usage examples

The README lacks critical information about installation, setup, and usage. Consider adding:

  1. Installation instructions (pip install requirements)
  2. Setup instructions (.env file configuration)
  3. Usage examples (how to run the Streamlit app)
  4. Dependencies and requirements
  5. Fix grammar in the introduction

Here's a suggested enhancement:

 # AI Research Agent
-A simple python application to demonstrate how agentic era is going to be. Used Smol Agents framework by hugging face.
+A simple Python application demonstrating the agentic era using the Smol Agents framework by Hugging Face.
 
 ## Demo Video
 
-https://github.com/user-attachments/assets/1f876f9c-3dff-4548-b14c-bb6439eede30
+[View Demo](https://github.com/user-attachments/assets/1f876f9c-3dff-4548-b14c-bb6439eede30)
 
-## Task Performed
+## Tasks Performed
 - AI agent to grasp information on the topic given.
 - AI agent to scrape the information and collect the necessary information about the topic.
 - AI agent to summarize the whole topic.
+
+## Installation
+
+```bash
+pip install -r requirements.txt
+```
+
+## Setup
+
+1. Create a `.env` file in the project root directory
+2. Add your Hugging Face API token to the `.env` file:
+   ```
+   HF_TOKEN=your_huggingface_token_here
+   ```
+
+## Usage
+
+Run the Streamlit app:
+
+```bash
+streamlit run agents.py
+```
+
+Then open your browser to http://localhost:8501
🧰 Tools
🪛 LanguageTool

[uncategorized] ~2-~2: You might be missing the article “the” here.
Context: ...e python application to demonstrate how agentic era is going to be. Used Smol Agents fr...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


[style] ~2-~2: Use ‘will’ instead of ‘going to’ if the following action is certain.
Context: ...lication to demonstrate how agentic era is going to be. Used Smol Agents framework by huggi...

(GOING_TO_WILL)

🪛 markdownlint-cli2 (0.17.2)

6-6: Bare URL used
null

(MD034, no-bare-urls)

🤖 Prompt for AI Agents
In researchagent-smolagents/README.md lines 1 to 11, the README is missing
essential setup and usage information. Add installation instructions for
dependencies using pip install -r requirements.txt, provide setup steps for
creating a .env file with the Hugging Face API token, include usage instructions
on how to run the Streamlit app with streamlit run agents.py, and mention the
URL to access the app. Also, correct grammar issues in the introduction for
clarity.
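The `.env` setup step in the suggested README can be illustrated with a minimal loader sketch. The real app should use `load_dotenv()` from python-dotenv (already in requirements.txt), which handles quoting and edge cases more robustly; this stdlib-only version just shows what happens conceptually:

```python
# Minimal .env loader for illustration only; prefer python-dotenv in practice.
import os

def load_env_file(path: str = ".env") -> None:
    # Read KEY=VALUE lines, skipping blanks and comments, without
    # overwriting variables already set in the environment.
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))

load_env_file()
hf_token = os.getenv("HF_TOKEN")  # None if the .env file or key is missing
```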
