A comprehensive, crawler-based search engine developed for the Advanced Programming course at Cairo University. This project features all core components of a modern search engine, including web crawling, indexing, ranking with TF-IDF and PageRank, a robust query processor, and an interactive web interface.
- Video
- Features
- System Architecture
- Technologies Used
- Getting Started
- Usage
- Performance
- Contributors
- License
- References
Features:
- 🕷️ Web Crawler: Multi-threaded crawler that collects 6,000+ HTML documents, respects robots.txt, and detects duplicates.
- 📑 Indexer: Efficient indexing for fast retrieval and incremental updates.
- 🔍 Query Processor: Supports stemming and phrase search (with quotation marks) for enhanced query matching.
- ⭐ Ranker: Combines TF-IDF relevance with PageRank popularity for optimal result ordering.
- 🖥️ Web Interface: Modern UI with search suggestions, paginated results, snippets, and timing metrics.
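
The Ranker combines a TF-IDF relevance score with a PageRank popularity score. A minimal Java sketch of one way to do that is shown here. The class name `RankSketch`, the `log(N / df)` form of IDF, the linear blend, and the weight `alpha` are all assumptions made for illustration; the project may compute and combine the two scores differently.

```java
class RankSketch {

    // Term frequency: occurrences of the term divided by the document length.
    static double tf(int termCount, int docLength) {
        return docLength == 0 ? 0.0 : (double) termCount / docLength;
    }

    // Inverse document frequency, using the common log(N / df) form (an assumption).
    static double idf(int totalDocs, int docsWithTerm) {
        return docsWithTerm == 0 ? 0.0 : Math.log((double) totalDocs / docsWithTerm);
    }

    // TF-IDF weight of one term in one document.
    static double tfIdf(int termCount, int docLength, int totalDocs, int docsWithTerm) {
        return tf(termCount, docLength) * idf(totalDocs, docsWithTerm);
    }

    // Hypothetical linear blend of relevance and popularity.
    // alpha in [0, 1] weights TF-IDF; (1 - alpha) weights PageRank.
    static double score(double tfIdf, double pageRank, double alpha) {
        return alpha * tfIdf + (1 - alpha) * pageRank;
    }
}
```

For example, `RankSketch.score(RankSketch.tfIdf(3, 100, 6000, 60), 0.002, 0.7)` scores a page where the term appears 3 times in 100 words, using a made-up PageRank value of 0.002. In practice the two scores live on different scales, so some normalization before blending is usually needed.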

System Architecture:
- Web Crawler: Starts from seed URLs, collects and normalizes documents, respects robots.txt, and uses multi-threading for scalability.
- Indexer: Builds efficient data structures for quick query responses and supports incremental updates.
- Query Processor: Handles preprocessing, stemming, and advanced query types (including phrase search).
- Ranker: Calculates relevance using TF-IDF and popularity using PageRank, then combines both for the final ranking.
- Web Interface: Responsive frontend for user interaction that displays results with titles, URLs, and snippets, and supports pagination (10 results/page).
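
The Web Crawler component has to recognize when two links point at the same page so it does not fetch it twice. A common approach is to normalize each URL before checking it against a set of visited URLs. The sketch below assumes a particular set of rules (lowercase scheme and host, resolve `.` and `..` path segments, drop the fragment, treat an empty path as `/`); these rules and the names `UrlSketch`, `normalize`, and `markVisited` are illustrative assumptions, not the project's actual implementation.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

class UrlSketch {

    // Returns a canonical form of an absolute URL (assumed normalization rules).
    static String normalize(String url) {
        // URI.normalize() resolves "." and ".." path segments.
        URI u = URI.create(url.trim()).normalize();
        if (u.getScheme() == null || u.getHost() == null) {
            throw new IllegalArgumentException("Expected an absolute URL: " + url);
        }
        String scheme = u.getScheme().toLowerCase(Locale.ROOT);
        String host = u.getHost().toLowerCase(Locale.ROOT);
        String path = (u.getRawPath() == null || u.getRawPath().isEmpty()) ? "/" : u.getRawPath();

        // Rebuild the URL without user info and fragment; keep port and query unchanged.
        StringBuilder sb = new StringBuilder(scheme).append("://").append(host);
        if (u.getPort() != -1) {
            sb.append(':').append(u.getPort());
        }
        sb.append(path);
        if (u.getRawQuery() != null) {
            sb.append('?').append(u.getRawQuery());
        }
        return sb.toString();
    }

    // Records the URL and reports whether it had not been seen before.
    static boolean markVisited(Set<String> seen, String url) {
        return seen.add(normalize(url));
    }
}
```

With multiple crawler threads, the `seen` set would need to be thread-safe, for example one created with `ConcurrentHashMap.newKeySet()`.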

Technologies Used:
- Backend: Java Spring Boot
- Frontend: Next.js, TailwindCSS
- Database: MongoDB
- Build Tool: Gradle

Getting Started:

Prerequisites:
- Java 11 or above
- Gradle
- MongoDB

Installation:
- Clone the repository:

      git clone https://github.com/Hussein-Mohamed1/search-engine.git
      cd search-engine

- Install dependencies and build:

      gradle build

- Start MongoDB: ensure MongoDB is running locally (mongodb://localhost:27017 by default).
- Run the application:

      gradle run

Usage:
- Open your browser and navigate to the provided local address (e.g., http://localhost:8080).
- Enter your search query. Use quotation marks ("phrase") for exact phrase searches.
- Browse the results, view snippets, and page through them.
- Refine your search using suggestions.
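
Quoted text in a query is treated as an exact phrase, while everything else is matched as individual terms. A minimal sketch of how a query string could be split that way is shown here. The class `QuerySketch`, its method names, and the regular expression are hypothetical; stemming and the rest of the preprocessing are not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuerySketch {

    // Group 1 captures the inside of a quoted phrase; group 2 captures a bare word.
    // Unbalanced or empty quotes match neither alternative and are skipped.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]+)\"|([^\\s\"]+)");

    // Returns the quoted phrases in the query, lowercased.
    static List<String> phrases(String query) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            if (m.group(1) != null) {
                String phrase = m.group(1).trim().toLowerCase(Locale.ROOT);
                if (!phrase.isEmpty()) {
                    out.add(phrase);
                }
            }
        }
        return out;
    }

    // Returns the unquoted words in the query, lowercased.
    static List<String> terms(String query) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            if (m.group(2) != null) {
                out.add(m.group(2).toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }
}
```

For the query `"Cairo University" ranking`, `phrases` yields the single phrase `cairo university` and `terms` yields the single term `ranking`.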

Performance:
- Crawls and indexes 6,000+ web pages
- Typical query response: ~0.5 seconds
- Indexing speed: ~6,000 pages in 45 seconds (roughly 133 pages per second)
- Ranking accuracy comparable to basic commercial search engines

Contributors:

Name | Role |
---|---|
Hussein Mohamed | 🧠 Ranker |
Tasneem Ahmed | 🤖 Crawler |
Mohamed Abdelaziem | 🗂️ Indexer |
Youssef Mohamed | 🔍 Query Processor & Web Interface |

This project is for educational purposes at Cairo University's Computer Engineering Department.