A full-featured, crawler-based search engine built using Java Spring Boot and Next.js. It includes a multi-threaded web crawler, efficient indexer, TF-IDF and PageRank-based ranker, a query processor with phrase search and stemming, and a modern, responsive web interface.

MohamedAbdelaiem/Lumos-Search-engine

 
 


🚀 Search Engine Project

A comprehensive, crawler-based search engine developed for the Advanced Programming course at Cairo University. This project features all core components of a modern search engine, including web crawling, indexing, ranking with TF-IDF and PageRank, a robust query processor, and an interactive web interface.


Video

Search Engine Demo

✨ Features

  • 🕷️ Web Crawler
    Multi-threaded crawler that collects 6,000+ HTML documents, respects robots.txt, and detects duplicates.

  • 📑 Indexer
    Efficient indexing for fast retrieval and incremental updates.

  • 🔍 Query Processor
    Supports stemming and phrase search (with quotation marks) for enhanced query matching.

  • ⭐ Ranker
    Combines TF-IDF relevance with PageRank popularity for optimal result ordering.

  • 🖥️ Web Interface
    Modern UI with search suggestions, paginated results, snippets, and timing metrics.
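
One way to picture how the ranker could combine TF-IDF relevance with PageRank popularity is a weighted sum. The sketch below is an assumption, not the project's actual code: the class `RankSketch`, its method names, the natural-log IDF variant, and the weight `alpha` are all hypothetical.

```java
/** Hypothetical scoring helper; names and weights are illustrative, not taken from the project. */
final class RankSketch {

    /** TF-IDF of one term in one document: (count / docLength) * ln(totalDocs / docsWithTerm). */
    static double tfIdf(int termCount, int docLength, int totalDocs, int docsWithTerm) {
        if (termCount == 0 || docLength == 0 || docsWithTerm == 0) {
            return 0.0; // term absent or no statistics available
        }
        double tf = (double) termCount / docLength;
        double idf = Math.log((double) totalDocs / docsWithTerm);
        return tf * idf;
    }

    /** Weighted sum of relevance and popularity; alpha in [0, 1] is an assumed tuning knob. */
    static double combine(double relevance, double pageRank, double alpha) {
        return alpha * relevance + (1.0 - alpha) * pageRank;
    }

    public static void main(String[] args) {
        // A term appearing 3 times in a 100-word page, found in 60 of 6,000 crawled pages.
        double relevance = tfIdf(3, 100, 6000, 60);
        double score = combine(relevance, 0.002, 0.7);
        System.out.printf("relevance=%.4f score=%.4f%n", relevance, score);
    }
}
```

In practice the two signals live on different scales, so a real ranker would likely normalize them before mixing; the sketch leaves that out for brevity.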


🏗️ System Architecture

  • Web Crawler:
    Starts from seed URLs, collects and normalizes documents, respects robots.txt, employs multi-threading for scalability.

  • Indexer:
    Builds efficient data structures for quick query responses, supports incremental updates.

  • Query Processor:
    Handles preprocessing, stemming, and advanced query types (including phrase search).

  • Ranker:
    Calculates relevance using TF-IDF and popularity with PageRank, combining both for final ranking.

  • Web Interface:
    Responsive frontend for user interaction, displays results with titles, URLs, snippets, and supports pagination (10 results/page).
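
The crawler's normalization step can be sketched with the JDK's `java.net.URI`. This is a minimal illustration under assumptions: `FrontierSketch`, `normalize`, and `markIfNew` are hypothetical names, only URL-level duplicates are covered (not content-level ones), and robots.txt handling is omitted.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical crawl-frontier helper; not the project's actual class. */
final class FrontierSketch {

    // Concurrent set so several crawler threads can share one "already seen" record.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    /** Lower-cases scheme and host, resolves "." and "..", drops default ports and fragments. */
    static String normalize(String rawUrl) {
        URI uri = URI.create(rawUrl.trim()).normalize();
        if (uri.getScheme() == null || uri.getHost() == null) {
            throw new IllegalArgumentException("Not an absolute URL: " + rawUrl);
        }
        String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
        String host = uri.getHost().toLowerCase(Locale.ROOT);
        int port = uri.getPort();
        boolean defaultPort = port == -1
                || (port == 80 && scheme.equals("http"))
                || (port == 443 && scheme.equals("https"));
        String rawPath = uri.getRawPath();
        String path = (rawPath == null || rawPath.isEmpty()) ? "/" : rawPath;
        String query = uri.getRawQuery() == null ? "" : "?" + uri.getRawQuery();
        return scheme + "://" + host + (defaultPort ? "" : ":" + port) + path + query;
    }

    /** Returns true only for the first caller that offers a given normalized URL. */
    boolean markIfNew(String rawUrl) {
        return seen.add(normalize(rawUrl));
    }

    public static void main(String[] args) {
        FrontierSketch frontier = new FrontierSketch();
        System.out.println(frontier.markIfNew("HTTP://Example.com:80/a/../b/?q=1#top")); // true
        System.out.println(frontier.markIfNew("http://example.com/b/?q=1"));             // false
    }
}
```

Both URLs in `main` normalize to `http://example.com/b/?q=1`, so the second one is rejected as a duplicate.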


🛠️ Technologies Used

  • Backend: Java Spring Boot
  • Frontend: Next.js, TailwindCSS
  • Database: MongoDB
  • Build Tool: Gradle

🚦 Getting Started

Prerequisites

  • Java 11 or above
  • Gradle
  • MongoDB

Installation

  1. Clone the repository:

    git clone https://github.com/Hussein-Mohamed1/search-engine.git
    cd search-engine
  2. Install dependencies and build:

    gradle build
  3. Start MongoDB:

    • Ensure MongoDB is running locally (mongodb://localhost:27017 by default).
  4. Run the application:

    gradle run

💡 Usage

  1. Open your browser and navigate to the provided local address (e.g., http://localhost:8080).
  2. Enter your search query. Use quotation marks ("phrase") for exact phrase searches.
  3. Browse results, view snippets, and paginate through results.
  4. Refine your search using suggestions.
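
To illustrate how a query with quotation marks might be split into exact phrases and ordinary terms, here is a small sketch. The class `QuerySketch` and its `parse` method are hypothetical, and stemming is left out; a real query processor would presumably stem the terms after this step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical query splitter; not the project's actual query processor. */
final class QuerySketch {

    // Either a double-quoted run (group 1) or a bare whitespace-delimited token (group 2).
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    /** Simple holder for the two kinds of query parts. */
    static final class ParsedQuery {
        final List<String> phrases = new ArrayList<>();
        final List<String> terms = new ArrayList<>();
    }

    static ParsedQuery parse(String query) {
        ParsedQuery parsed = new ParsedQuery();
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            if (m.group(1) != null) {
                // Quoted text is kept together as one exact phrase.
                String phrase = m.group(1).trim().toLowerCase(Locale.ROOT);
                if (!phrase.isEmpty()) {
                    parsed.phrases.add(phrase);
                }
            } else {
                // Bare token; drop any stray, unbalanced quote characters.
                String term = m.group(2).replace("\"", "").toLowerCase(Locale.ROOT);
                if (!term.isEmpty()) {
                    parsed.terms.add(term);
                }
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        ParsedQuery q = parse("\"web crawler\" java Indexing");
        System.out.println("phrases=" + q.phrases + " terms=" + q.terms);
    }
}
```

For the sample query in `main`, the phrase list holds `web crawler` and the term list holds `java` and `indexing`.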

📊 Performance

  • Crawls and indexes 6,000+ web pages
  • Typical query response: ~0.5 seconds
  • Indexing speed: ~6,000 pages in 45 seconds
  • Ranking accuracy comparable to basic commercial search engines

👥 Contributors

  • Hussein Mohamed: 🧠 Ranker
  • Tasneem Ahmed: 🤖 Crawler
  • Mohamed Abdelaziem: 🗂️ Indexer
  • Youssef Mohamed: 🔍 Query Processor & Web Interface

📜 License

This project is for educational purposes at Cairo University's Computer Engineering Department.





Languages

  • Java 73.5%
  • JavaScript 24.9%
  • CSS 1.6%