🚀 Search Engine Project

A comprehensive, crawler-based search engine developed for the Advanced Programming course at Cairo University. This project features all core components of a modern search engine, including web crawling, indexing, ranking with TF-IDF and PageRank, a robust query processor, and an interactive web interface.

✨ Features

🕷️ Web Crawler
Multi-threaded crawler that collects 6,000+ HTML documents, respects robots.txt, and detects duplicates.
📑 Indexer
Efficient indexing for fast retrieval and incremental updates.
🔍 Query Processor
Supports stemming and phrase search (with quotation marks) for enhanced query matching.
⭐ Ranker
Combines TF-IDF relevance with PageRank popularity for optimal result ordering.
🖥️ Web Interface
Modern UI with search suggestions, paginated results, snippets, and timing metrics.
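
The duplicate detection mentioned for the crawler can be sketched as a thread-safe set keyed on a normalized URL. This is only an illustration under assumptions: the class `CrawlFrontier` and its methods are hypothetical names, not taken from the project's source, and the normalization rules shown (lower-casing scheme and host, dropping default ports, fragments, and a trailing slash) are one plausible choice. robots.txt handling and content-based duplicate checks are not covered here.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper for illustration; not the project's actual crawler code.
final class CrawlFrontier {
    // Thread-safe set of normalized URLs that have already been scheduled.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    /**
     * Builds a canonical form of an absolute URL so that trivially different
     * spellings of the same address compare equal.
     * Throws IllegalArgumentException for malformed or relative URLs.
     */
    static String normalize(String rawUrl) {
        URI uri = URI.create(rawUrl.trim()).normalize(); // resolves "." and ".." segments
        if (uri.getScheme() == null || uri.getHost() == null) {
            throw new IllegalArgumentException("absolute URL with a host expected: " + rawUrl);
        }
        String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
        String host = uri.getHost().toLowerCase(Locale.ROOT);
        int port = uri.getPort();
        boolean defaultPort = port == -1
                || (scheme.equals("http") && port == 80)
                || (scheme.equals("https") && port == 443);
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        } else if (path.length() > 1 && path.endsWith("/")) {
            // Assumption: a trailing slash does not identify a different page.
            path = path.substring(0, path.length() - 1);
        }
        String query = uri.getRawQuery() == null ? "" : "?" + uri.getRawQuery();
        // The fragment is dropped on purpose: it never reaches the server.
        return scheme + "://" + host + (defaultPort ? "" : ":" + port) + path + query;
    }

    /** Returns true only the first time a given (normalized) URL is offered. */
    boolean markIfNew(String rawUrl) {
        return seen.add(normalize(rawUrl));
    }
}
```

With these rules, `CrawlFrontier.normalize("HTTP://Example.com:80/a/../b/#frag")` yields `http://example.com/b`, so several worker threads can call `markIfNew` concurrently and only one of them will be told to fetch the page.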

🏗️ System Architecture

Web Crawler:
Starts from seed URLs, collects and normalizes documents, respects robots.txt, and uses multi-threading for scalability.
Indexer:
Builds efficient data structures for quick query responses and supports incremental updates.
Query Processor:
Handles preprocessing, stemming, and advanced query types (including phrase search).
Ranker:
Calculates relevance using TF-IDF and popularity with PageRank, combining both for final ranking.
Web Interface:
Responsive frontend that displays results with titles, URLs, and snippets, and paginates them at 10 results per page.
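
To make the ranker's combination of relevance and popularity concrete, here is a minimal sketch. It assumes a plain TF-IDF definition (term count divided by document length, times the natural log of N over document frequency) and a linear blend with a precomputed PageRank value. The class `RankSketch`, its method names, and the `relevanceWeight` parameter are hypothetical; the README does not state which TF-IDF variant or blending formula the project actually uses.

```java
import java.util.List;
import java.util.Map;

// Illustrative only; names and the blending formula are assumptions.
final class RankSketch {

    /** Term frequency: occurrences of the term divided by the document length. */
    static double tf(List<String> docTerms, String term) {
        if (docTerms.isEmpty()) {
            return 0.0;
        }
        long count = docTerms.stream().filter(term::equals).count();
        return (double) count / docTerms.size();
    }

    /** Inverse document frequency: ln(N / df), or 0 when no document contains the term. */
    static double idf(int totalDocs, int docsWithTerm) {
        if (docsWithTerm <= 0) {
            return 0.0;
        }
        return Math.log((double) totalDocs / docsWithTerm);
    }

    /** Relevance of one document to a query: sum of TF-IDF over the query terms. */
    static double relevance(List<String> queryTerms, List<String> docTerms,
                            Map<String, Integer> docFreq, int totalDocs) {
        double score = 0.0;
        for (String term : queryTerms) {
            score += tf(docTerms, term) * idf(totalDocs, docFreq.getOrDefault(term, 0));
        }
        return score;
    }

    /**
     * Linear blend of relevance and popularity.
     * relevanceWeight is a hypothetical tuning knob in [0, 1].
     */
    static double finalScore(double relevance, double pageRank, double relevanceWeight) {
        return relevanceWeight * relevance + (1.0 - relevanceWeight) * pageRank;
    }
}
```

In practice the two scores live on different scales, so a real ranker would probably normalize them before blending; the sketch leaves that step out to keep the combination itself visible.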

🛠️ Technologies Used

Backend: Java Spring Boot
Frontend: Next.js, TailwindCSS
Database: MongoDB
Build Tool: Gradle

🚦 Getting Started

Prerequisites

Java 11 or above
Gradle
MongoDB

Installation

Clone the repository:

git clone https://github.com/Hussein-Mohamed1/search-engine.git
cd search-engine

Install dependencies and build:
```
gradle build
```
Start MongoDB:
- Ensure MongoDB is running locally (mongodb://localhost:27017 by default).
Run the application:
```
gradle run
```

💡 Usage

Open your browser and navigate to the provided local address (e.g., http://localhost:8080).
Enter your search query. Use quotation marks ("phrase") for exact phrase searches.
Browse results, view snippets, and paginate through results.
Refine your search using suggestions.
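
The quoted-phrase syntax described above implies that the query processor separates exact phrases from ordinary terms before matching. One way this could be done is sketched below; `QueryParser` and its fields are hypothetical names, the regular expression is an assumption about how quotes are recognized, and stemming is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch; not the project's actual query processor.
final class QueryParser {
    // Matches either a double-quoted phrase (group 1) or a run of non-space characters (group 2).
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    final List<String> phrases = new ArrayList<>(); // exact phrases, lower-cased
    final List<String> terms = new ArrayList<>();   // individual words, lower-cased

    static QueryParser parse(String query) {
        QueryParser parsed = new QueryParser();
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            if (m.group(1) != null) {
                String phrase = m.group(1).trim().toLowerCase(Locale.ROOT);
                if (!phrase.isEmpty()) {
                    parsed.phrases.add(phrase);
                }
            } else {
                // A stray, unbalanced quote is stripped rather than treated as an error.
                String term = m.group(2).replace("\"", "").toLowerCase(Locale.ROOT);
                if (!term.isEmpty()) {
                    parsed.terms.add(term);
                }
            }
        }
        return parsed;
    }
}
```

For the input `java "web crawler" Index`, this yields the phrase `web crawler` and the terms `java` and `index`. The phrase would then be matched as a contiguous sequence, while the terms would go through the usual stemming and lookup.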

📊 Performance

Crawls and indexes 6,000+ web pages
Typical query response: ~0.5 seconds
Indexing speed: ~6,000 pages in 45 seconds (roughly 133 pages per second)
Ranking accuracy comparable to basic commercial search engines

👥 Contributors

| Name | Role |
| --- | --- |
| Hussein Mohamed | 🧠 Ranker |
| Tasneem Ahmed | 🤖 Crawler |
| Mohamed Abdelaziem | 🗂️ Indexer |
| Youssef Mohamed | 🔍 Query Processor & Web Interface |

📜 License

This project is for educational purposes at Cairo University's Computer Engineering Department.
