Built with ❤️ by Tushar Singh
RAGCart AI is an autonomous, state-of-the-art Agentic RAG Shopping & Market Intelligence Assistant. Unlike standard static RAG systems that read pre-saved files, RAGCart actively searches the live web on demand, extracts raw technical specs and pricing using visual-layout LLM scrapers, indexes them into a local vector database, and generates highly accurate, side-by-side product comparison reports grounded in verified citations.
The core AI engine is built as an autonomous agent using LangChain and Groq's Llama 3.3 70B model, communicating with search and parsing endpoints via stdio Model Context Protocol (MCP) bridges.
RAGCart runs a synchronized, multi-step pipeline that grounds each answer in freshly retrieved sources:
    [ User Query ]
          │
          ▼
    [ LangChain Agent ]
          │
          ▼
    [ Firecrawl MCP Search ]
          │
          ▼
    [ Find Product URLs ]
          │
          ▼
    [ ScrapeGraph Extraction ]
          │
          ▼
    [ Chunking ] (RecursiveCharacterTextSplitter)
          │
          ▼
    [ Embeddings ] (all-MiniLM-L6-v2)
          │
          ▼
    [ ChromaDB Storage ] (Local Vector DB)
          │
          ▼
    [ Retriever ] (Vector Similarity Match)
          │
          ▼
    [ LLM Reasoning & Comparison ] (Llama 3.3 via Groq)
          │
          ▼
    [ Final Recommendation ] ("Based on my deep analysis...")
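The chunk, embed, store, and retrieve stages of the pipeline above can be illustrated with a dependency-free sketch. This is not RAGCart's implementation: a fixed-size character splitter stands in for RecursiveCharacterTextSplitter, normalised word counts stand in for all-MiniLM-L6-v2 embeddings, and a plain Python list stands in for ChromaDB. All function names and URLs are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=80, overlap=20):
    """Split text into overlapping character windows (toy splitter)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text):
    """Toy embedding: L2-normalised word counts stored as a sparse dict."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


def cosine(a, b):
    """Dot product of two normalised sparse vectors."""
    return sum(weight * b.get(token, 0.0) for token, weight in a.items())


def index_document(store, text, source_url):
    """Chunk a scraped page and keep each chunk next to its source URL."""
    for chunk in chunk_text(text):
        store.append({"text": chunk, "source": source_url, "vector": embed(chunk)})


def retrieve(store, query, k=5):
    """Return the k stored chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(store, key=lambda doc: cosine(query_vec, doc["vector"]), reverse=True)
    return ranked[:k]


# Hypothetical scraped snippets and URLs, for illustration only.
store = []
index_document(store, "Phone A has a 5000 mAh battery and fast charging.", "https://example.com/a")
index_document(store, "Phone B ships with a 108 MP camera sensor.", "https://example.com/b")
top = retrieve(store, "battery charging", k=1)[0]
print(top["source"])  # https://example.com/a
```

Storing the source URL with every chunk at indexing time is what later allows the retrieved context to carry its citation.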
RAGCart uses LangChain's `MultiServerMCPClient` to talk to MCP servers over the standard stdio transport. To ensure native compatibility with Windows systems, the Firecrawl MCP discovery server is launched through a `cmd` shell wrapper:

    "command": "cmd",
    "args": ["/c", "npx", "-y", "firecrawl-mcp"]

Many open-source tool schemas contain complex regular expression validation filters. However, Groq's tool-calling engine fails validation when it encounters JSON schemas containing `pattern` parameters. RAGCart's `agent.py` includes a custom recursive sanitizer that strips these constraints programmatically so that tool registration does not fail:
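    def remove_pattern(obj):
        """Recursively delete every "pattern" key from a JSON schema, in place."""
        if isinstance(obj, dict):
            # Drop the regex constraint that Groq's validator rejects.
            obj.pop("pattern", None)
            for value in obj.values():
                remove_pattern(value)
        elif isinstance(obj, list):
            # Schemas can also nest inside lists such as anyOf or oneOf.
            for item in obj:
                remove_pattern(item)

Instead of regex-based web scraping, RAGCart invokes ScrapeGraphAI's LLM-based layout-aware parsing pipeline. The system passes a structured data schema to extract clean specifications, pros, cons, ratings, and reviews directly into dynamic data structures: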
- Automatically parses dynamic responses from the SDK, checking for `result.data.json_data` vs. `result.data.results`.
- Formats list objects into Markdown tables and key-value blocks before embedding.
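The two steps above can be sketched as follows. Only the attribute names `json_data` and `results` come from the text; the helper names `extract_payload` and `to_markdown`, the sample fields, and the exact Markdown layout are assumptions, and the real SDK response objects may differ.

```python
from types import SimpleNamespace


def extract_payload(result):
    """Return the structured payload from whichever attribute is populated."""
    data = getattr(result, "data", None)
    for name in ("json_data", "results"):
        value = getattr(data, name, None)
        if value:
            return value
    return {}


def to_markdown(payload):
    """Render a dict as key-value lines, with lists of dicts as Markdown tables."""
    lines = []
    for key, value in payload.items():
        is_table = isinstance(value, list) and value and all(isinstance(v, dict) for v in value)
        if is_table:
            headers = list(value[0].keys())
            lines.append(f"{key}:")
            lines.append("| " + " | ".join(headers) + " |")
            lines.append("|" + "---|" * len(headers))
            for row in value:
                lines.append("| " + " | ".join(str(row.get(h, "")) for h in headers) + " |")
        elif isinstance(value, list):
            lines.append(f"{key}: " + ", ".join(str(v) for v in value))
        else:
            lines.append(f"{key}: {value}")
    return "\n".join(lines)


# Hypothetical response where only `results` is populated.
result = SimpleNamespace(data=SimpleNamespace(
    json_data=None,
    results={
        "name": "Phone A",
        "pros": ["fast", "light"],
        "specs": [{"spec": "RAM", "value": "8 GB"}],
    },
))
markdown = to_markdown(extract_payload(result))
print(markdown)
```

Flattening to text before embedding keeps table rows and key-value pairs together, so a chunk still reads as a complete fact when retrieved.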
Every scraped block is indexed in ChromaDB alongside its source URL. When retrieving facts:
- Context chunks are fed to Llama 3.3 prefixed with their explicit source, e.g., `--- Chunk 1 [Source: https://...] ---`.
- A strict system prompt instructs the LLM to cite only the review links that appear in the retrieved context, which guards against hallucinated URLs.
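A minimal sketch of the source-prefixed context format described above. The header layout follows the example in the text; the function names are hypothetical. The second helper, which checks an answer for URLs that never appeared in the context, is an extra safeguard that the text does not describe and is included only to show how such citations could be verified.

```python
import re

URL_RE = re.compile(r"https?://[^\s\])>]+")


def build_context(chunks):
    """Join (text, source_url) pairs into one source-prefixed context string."""
    blocks = []
    for number, (text, source) in enumerate(chunks, start=1):
        blocks.append(f"--- Chunk {number} [Source: {source}] ---\n{text}")
    return "\n\n".join(blocks)


def unverified_urls(answer, chunks):
    """List URLs cited in the answer that are not sources of any chunk."""
    allowed = {source for _, source in chunks}
    cited = [url.rstrip(".,;") for url in URL_RE.findall(answer)]
    return [url for url in cited if url not in allowed]


# Hypothetical retrieved chunks, for illustration only.
chunks = [
    ("Battery lasts two days.", "https://example.com/review-a"),
    ("Camera is average.", "https://example.com/review-b"),
]
context = build_context(chunks)
print(context)
```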
The LangChain Agent has access to the following tool registry:
| Tool Identifier | Call Logic | Description | Arguments |
|---|---|---|---|
| `firecrawl_search` | MCP Bridge | Executes live Google search sweeps for review URLs. | `query` (string), `limit` (integer) |
| `scrape_tool` | Custom Logic | Extracts structured JSON specs using ScrapeGraphAI and indexes into Chroma. | `query` (string), `url` (string) |
| `retrieve_tool` | Custom Logic | Queries local Chroma vectors and returns the top 5 comparative documents. | `query` (string) |
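The argument schemas of these tools are what reach Groq's tool-calling API, which is where `pattern` constraints cause trouble. The sketch below applies a non-mutating variant of pattern stripping to a made-up schema for the `query` and `limit` arguments. The schema contents are invented for illustration, and the real `firecrawl-mcp` schema may look different.

```python
import json


def strip_patterns(schema):
    """Return a copy of a JSON schema with every "pattern" key removed."""
    if isinstance(schema, dict):
        return {key: strip_patterns(value) for key, value in schema.items() if key != "pattern"}
    if isinstance(schema, list):
        return [strip_patterns(item) for item in schema]
    return schema


# Hypothetical schema: one direct pattern and one nested inside an anyOf list.
example_schema = {
    "type": "object",
    "properties": {
        "query": {"anyOf": [{"type": "string", "pattern": "^.{1,200}$"}]},
        "limit": {"type": "integer", "minimum": 1, "pattern": "^[0-9]+$"},
    },
    "required": ["query"],
}

cleaned = strip_patterns(example_schema)
print(json.dumps(cleaned, indent=2))
```

Returning a copy leaves the original schema available for local validation. Note that this variant, like any key-based filter, would also drop an argument that happened to be named `pattern`.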
RAGCart is packaged inside a premium dark-themed Streamlit user interface featuring:
- Live LLM Dropdown Selector: Hot-swap between Llama 3.3 (70B), Llama 3.1 (8B), Mixtral (8x7B), or Gemma 2 (9B) directly from the sidebar. If you hit a Groq daily token limit (HTTP 429, TPD), switch models to resume queries on a different model's daily quota.
- Hot-Swappable API Credentials: Enter your Groq or Smartscrape API keys directly in the sidebar. Clicking Save Keys binds them to `os.environ`, flushes the cached agent state, and hot-reloads the application.
- Live Vector DB Inspector: Displays the total document count of the ChromaDB collection in real time.
- One-Click Database Purge: Purge indexed chunks in a single click to start fresh queries.
- Direct Similarity Tester: Input any query directly into the sidebar inspector to view matching chunks and click their verified source links.
- Markdown Report Downloader: Export synthesized agent reports as fully formatted Markdown files.
    git clone https://github.com/tushar80rt/RAG_Cart.git
    cd RAG_Cart

    python -m venv venv
    # On Windows:
    venv\Scripts\activate
    # On Linux/macOS:
    source venv/bin/activate

    pip install -r requirements.txt

Create a `.env` file in the root directory:

    GROQ_API_KEY=your_groq_api_key
    SCRAPEGRAPH_API_KEY=your_scrapegraph_api_key
    FIRECRAWL_API_KEY=your_firecrawl_api_key

Verify `mcp.json` settings:
    {
      "mcpServers": {
        "firecrawl-mcp": {
          "command": "npx",
          "args": ["-y", "firecrawl-mcp"],
          "env": {
            "FIRECRAWL_API_KEY": "your_firecrawl_api_key"
          }
        }
      }
    }

Launch the local interactive Streamlit web application:
    streamlit run main.py

The repository is laid out as follows:

    ├── assets/              # SVGs, assets, and base64 brand assets
    ├── chroma_db/           # Local vector database persistent directories
    ├── agent.py             # LangChain Agent, custom tools, and MCP pipelines
    ├── main.py              # Streamlit frontend, UI styles, and session state
    ├── mcp.json             # MCP Server registry configuration
    ├── requirements.txt     # Python package dependencies
    └── README.md            # Dynamic documentation
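The `mcp.json` registry listed above declares `npx` as the command, while on Windows the server is started through a `cmd` wrapper. One way to reconcile the two is to wrap the command at load time. The sketch below is an assumption about how that could be done, not the code in `agent.py`. The function name is hypothetical, and the returned dictionary follows the stdio connection shape (`command`, `args`, `env`, `transport`) that `MultiServerMCPClient` from `langchain-mcp-adapters` is expected to accept, which is worth checking against the installed version.

```python
import json
import sys


def load_stdio_servers(mcp_json_text, platform=sys.platform):
    """Turn an mcp.json registry into per-server stdio connection settings."""
    registry = json.loads(mcp_json_text)["mcpServers"]
    connections = {}
    for name, spec in registry.items():
        command = spec["command"]
        args = list(spec.get("args", []))
        # On Windows, run the command through the cmd shell wrapper.
        if platform.startswith("win") and command != "cmd":
            command, args = "cmd", ["/c", spec["command"], *args]
        connections[name] = {
            "command": command,
            "args": args,
            "env": dict(spec.get("env", {})),
            "transport": "stdio",
        }
    return connections


# Same structure as the mcp.json shown in the setup steps, with a placeholder key.
example = """
{"mcpServers": {"firecrawl-mcp": {
    "command": "npx",
    "args": ["-y", "firecrawl-mcp"],
    "env": {"FIRECRAWL_API_KEY": "your_firecrawl_api_key"}}}}
"""
windows = load_stdio_servers(example, platform="win32")
linux = load_stdio_servers(example, platform="linux")
print(windows["firecrawl-mcp"]["args"])  # ['/c', 'npx', '-y', 'firecrawl-mcp']
```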
Distributed under the MIT License. See LICENSE for more information.
Developed with passion by Tushar Singh.
If you like this project, feel free to give it a ⭐️, fork the repository, or open an issue! Connect with me on GitHub to follow my work on agentic AI and advanced RAG pipelines.