A powerful Model Context Protocol (MCP) server for web searching using Puppeteer and SQLite. No API keys required!
- Multiple search engines: DuckDuckGo, Bing, Google
- Smart filtering: Site-specific, file type, date range
- Intelligent caching: 24-hour result caching for performance
- Full page content: Extract text, links, images, or everything
- Smart caching: Avoids re-scraping the same URLs (7-day cache)
- Configurable output: Control content length and extraction type
- Bulk search: Search multiple queries simultaneously
- Batch processing: Efficient handling of multiple requests
- Error resilience: Continues processing even if some queries fail
- Search statistics: Track usage patterns and popular queries
- Engine analytics: Monitor which search engines are used most
- Historical data: Analyze trends over configurable time periods
- Multiple formats: JSON and CSV export
- Flexible filtering: Export specific queries or date ranges
- Ready-to-use data: Properly formatted for analysis
- Node.js (v16 or higher)
- npm or yarn
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/mcp-web-search.git
  cd mcp-web-search
  ```

- Install dependencies:

  ```bash
  npm install
  ```

- Test the server:

  ```bash
  node index.js
  ```
Add this to your MCP configuration file (`.kiro/settings/mcp.json` for Kiro IDE):
```json
{
  "mcpServers": {
    "web-search": {
      "command": "node",
      "args": ["./path/to/mcp-web-search/index.js"],
      "env": {},
      "disabled": false,
      "autoApprove": [
        "web_search",
        "extract_content",
        "bulk_search",
        "search_analytics",
        "export_results",
        "clear_cache"
      ]
    }
  }
}
```
`web_search`: Search the web with advanced filtering options.

Parameters:

- `query` (required): Search query
- `max_results` (optional): Maximum results to return (default: 10)
- `search_engine` (optional): Engine to use: `duckduckgo`, `bing`, or `google` (default: `duckduckgo`)
- `site_filter` (optional): Filter to a specific domain (e.g., "github.com")
- `file_type` (optional): Filter by file type (e.g., "pdf", "doc")
- `date_range` (optional): Filter by date: `day`, `week`, `month`, or `year`
- `use_cache` (optional): Use cached results (default: true)
Extract full content from a specific URL.
Parameters:
url
(required): URL to extract content fromextract_type
(optional): Type of content -text
,links
,images
,all
(default: text)max_length
(optional): Maximum content length (default: 5000)
`bulk_search`: Search multiple queries simultaneously.

Parameters:

- `queries` (required): Array of search queries
- `max_results_per_query` (optional): Max results per query (default: 5)
- `search_engine` (optional): Search engine to use (default: `duckduckgo`)
`search_analytics`: Get analytics about your search history.

Parameters:

- `days_back` (optional): Number of days to analyze (default: 30)
`export_results`: Export search results to JSON or CSV.

Parameters:

- `query` (optional): Specific query to export (exports all if not specified)
- `format` (optional): Export format: `json` or `csv` (default: `json`)
- `days_back` (optional): Number of days back to export (default: 7)
`clear_cache`: Clear the search cache.

Parameters:

- `older_than_days` (optional): Clear entries older than N days (clears all if not specified)
```json
// Search for JavaScript tutorials
{
  "query": "JavaScript tutorials",
  "max_results": 5
}
```

```json
// Search GitHub for machine learning projects
{
  "query": "machine learning",
  "site_filter": "github.com",
  "search_engine": "bing",
  "max_results": 10
}
```

```json
// Extract text content from a webpage
{
  "url": "https://example.com/article",
  "extract_type": "text",
  "max_length": 2000
}
```

```json
// Search multiple frameworks at once
{
  "queries": ["React hooks", "Vue.js composition API", "Angular signals"],
  "max_results_per_query": 3
}
```
The server uses SQLite to cache search results and extracted content:

- Search results: Cached for 24 hours
- Extracted content: Cached for 7 days
- Database file: `search_cache.db` (created automatically)
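The two TTLs above boil down to a freshness check on each cached row. A minimal sketch, assuming each row stores a `cachedAt` millisecond timestamp (the server's actual schema may differ):

```javascript
// Assumed TTLs from the docs: 24 hours for search results, 7 days for extracted content.
const TTL_MS = {
  search: 24 * 60 * 60 * 1000,
  content: 7 * 24 * 60 * 60 * 1000,
};

// A cached row is usable only while it is younger than its kind's TTL.
function isFresh(entry, kind, now = Date.now()) {
  return now - entry.cachedAt < TTL_MS[kind];
}
```

On a stale hit the server would re-scrape and overwrite the row rather than serve the expired copy.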
- Puppeteer: Web scraping and content extraction
- SQLite: Local caching and analytics storage
- MCP SDK: Model Context Protocol integration
- Node.js: Runtime environment
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Module not found errors: Ensure all dependencies are installed with `npm install`
- Database errors: Delete `search_cache.db` to reset the database
- Search engine blocking: Try a different search engine or add delays between requests
- Path issues: Ensure the MCP configuration points to the correct file path
Set the `NODE_ENV` environment variable for more verbose logging:

```bash
NODE_ENV=development node index.js
```
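The project's actual logging code is not shown here, but a common way to gate verbose output on that flag looks like this (hypothetical `debugLog` helper; writing to stderr keeps stdout free for the MCP stdio protocol traffic):

```javascript
// Assumed pattern: enable debug output only when NODE_ENV=development.
const DEBUG = process.env.NODE_ENV === 'development';

function debugLog(...args) {
  // stderr, not stdout: an MCP stdio server must keep stdout reserved for protocol messages
  if (DEBUG) console.error('[debug]', ...args);
}
```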
- Add more search engines (Yahoo, Startpage)
- Implement rate limiting and request throttling
- Add image search capabilities
- Support for custom user agents and headers
- Web scraping with JavaScript rendering
- Search result deduplication
- Advanced content parsing (markdown, structured data)
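Of the roadmap items above, result deduplication could be sketched by keying results on their normalized URL. A hypothetical `dedupeByUrl` helper, not part of the current server:

```javascript
// Keep only the first result for each normalized URL.
function dedupeByUrl(results) {
  const seen = new Set();
  return results.filter((r) => {
    // new URL(...).href normalizes trivial differences,
    // e.g. "https://example.com" and "https://example.com/" serialize identically
    const key = new URL(r.url).href;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

A fuller version might also strip tracking query parameters or compare page titles, at the cost of occasionally merging distinct pages.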
If you encounter any issues or have questions, please open an issue on GitHub.