A production-ready distributed search engine built with Spring Boot, featuring sharded indexing, concurrent search, and TF-IDF ranking.
- REST API: Simple endpoints for indexing and searching documents
- Distributed Architecture: Sharded data across multiple nodes using consistent hashing
- Concurrent Search: Parallel query processing across shards with ExecutorService
- Persistence: File-based serialization for data durability
- TF-IDF Scoring: Relevance-based search result ranking
- Comprehensive Testing: Unit tests and integration tests included
- Load Testing: JMeter configuration for performance validation
- Java 17 or higher
- Maven 3.6+ (or use included wrapper)
# Option 1: Use the provided script
./start.sh
# Option 2: Direct Maven command
mvn spring-boot:run
# Option 3: Maven wrapper (if available)
./mvnw spring-boot:run
The application will start on http://localhost:8080
Index a document:
curl -X POST http://localhost:8080/api/index \
-H "Content-Type: application/json" \
-d '{"id": 1, "content": "Java programming tutorial"}'
Search for documents:
curl "http://localhost:8080/api/search?q=java"
Index a new document (POST /api/index).
Request Body:
{
"id": 1,
"content": "Document content to be indexed"
}
Response:
Document indexed successfully with ID: 1
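The curl call above can also be issued from Java. The following is a minimal sketch that builds the same request with the JDK's built-in `java.net.http` client (Java 11+). The class and method names (`IndexRequestSketch`, `toJson`, `buildIndexRequest`) are hypothetical and not part of the project; the endpoint path, header, and body shape are taken from the example above.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper (not part of the project): mirrors the curl example for POST /api/index.
final class IndexRequestSketch {

    // Builds the JSON body by hand; only backslashes and double quotes are escaped,
    // so content with control characters would need a real JSON library.
    static String toJson(int id, String content) {
        String escaped = content.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"id\": " + id + ", \"content\": \"" + escaped + "\"}";
    }

    // Creates the request without sending it.
    static HttpRequest buildIndexRequest(String baseUrl, int id, String content) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/index"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(toJson(id, content)))
                .build();
    }
}
```

To send the request, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` while the application is running.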
Search for documents containing the query terms (GET /api/search).
Parameters:
q: Search query string
Response:
{
"results": [
{
"id": 1,
"content": "Document content",
"score": 0.85
}
],
"totalResults": 1
}
flowchart TD
U[Client API] -->|POST /index| IC[Index Controller]
U -->|GET /search?q=...| SC[Search Controller]
IC --> NM[Node Manager]
SC --> NM[Node Manager]
subgraph Distribution Layer
NM -->|assigns via consistent hashing| CH[Consistent Hashing]
CH --> N1[Shard: Node A]
CH --> N2[Shard: Node B]
end
subgraph Core Engine
N1 --> II1[Inverted Index A]
N2 --> II2[Inverted Index B]
II1 --> QP1[Query Processor A]
II2 --> QP2[Query Processor B]
QP1 --> TF1[TF-IDF Calculator A]
QP2 --> TF2[TF-IDF Calculator B]
end
subgraph Persistence
II1 --> PS[Persistence Service]
II2 --> PS
PS --> FS[File System]
end
%% Replication (future)
N1 -. replicate .-> N2
N2 -. replicate .-> N1
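To illustrate the "assigns via consistent hashing" step in the diagram, here is a minimal sketch of a hash ring with virtual nodes. It is not the project's ConsistentHashing class: the class name `HashRingSketch`, the use of MD5, the `"doc-"` key prefix, and the virtual node count are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a consistent hash ring; the project's implementation may differ.
final class HashRingSketch {
    // Ring position -> owning node id, kept sorted so we can walk clockwise.
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    HashRingSketch(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // Each node is placed on the ring several times to even out the distribution.
    void addNode(String nodeId) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(nodeId + "#" + i), nodeId);
        }
    }

    void removeNode(String nodeId) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(nodeId + "#" + i), nodeId);
        }
    }

    // A document belongs to the first node at or after its hash, wrapping around at the end.
    String nodeFor(int documentId) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes registered");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash("doc-" + documentId));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // Uses the first 8 bytes of an MD5 digest as the ring position.
    private static long hash(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The property that matters for sharding is that adding a node only moves documents onto the new node; documents never shuffle between the existing nodes.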
- IndexController: Handles document indexing via POST /api/index
- SearchController: Handles search queries via GET /api/search
- NodeManager: Orchestrates document distribution and parallel search
- ConsistentHashing: Distributes documents across shards
- ExecutorService: Enables concurrent search across shards
- InvertedIndex: Term-to-document mapping with frequency counts
- QueryProcessor: Processes search queries and ranks results
- TFIDFCalculator: Computes relevance scores
- Document: Represents indexed documents
- PersistenceService: Handles serialization to/from disk
- Automatic save/load on startup and shutdown
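The InvertedIndex, QueryProcessor, and TFIDFCalculator components listed above work together to rank results. The sketch below shows one way such a pipeline can fit together in a single class. It is an illustration only: the class name `TfIdfSketch`, the tokenizer, and the formula (term frequency normalized by document length, multiplied by `ln(1 + N / df)`) are assumptions, and the project's own scoring may use a different TF-IDF variant.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: an inverted index with TF-IDF ranking. Assumes each id is indexed once.
final class TfIdfSketch {
    // term -> (document id -> occurrences of the term in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    // document id -> number of tokens in the document
    private final Map<Integer, Integer> docLengths = new HashMap<>();

    void index(int docId, String content) {
        List<String> tokens = tokenize(content);
        docLengths.put(docId, tokens.size());
        for (String token : tokens) {
            postings.computeIfAbsent(token, k -> new HashMap<>()).merge(docId, 1, Integer::sum);
        }
    }

    // Returns (document id, score) pairs, highest score first.
    List<Map.Entry<Integer, Double>> search(String query) {
        Map<Integer, Double> scores = new HashMap<>();
        for (String term : tokenize(query)) {
            Map<Integer, Integer> docs = postings.get(term);
            if (docs == null) {
                continue; // term not in any document
            }
            // Smoothed inverse document frequency: stays positive even if every document has the term.
            double idf = Math.log(1.0 + (double) docLengths.size() / docs.size());
            for (Map.Entry<Integer, Integer> e : docs.entrySet()) {
                double tf = (double) e.getValue() / docLengths.get(e.getKey());
                scores.merge(e.getKey(), tf * idf, Double::sum);
            }
        }
        List<Map.Entry<Integer, Double>> ranked = new ArrayList<>(scores.entrySet());
        ranked.sort(Map.Entry.<Integer, Double>comparingByValue().reversed());
        return ranked;
    }

    // Lowercases and splits on non-word characters, dropping empty tokens.
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }
}
```

With this formula, a document that mentions a query term more often relative to its length ranks higher, and terms that appear in fewer documents contribute more to the score.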
./run-tests.sh
The included integration tests verify:
- Document indexing and retrieval
- Search functionality with TF-IDF ranking
- API response formats
- Error handling
See load-testing/README.md for comprehensive performance testing instructions.
- Shards: 2 (NodeA, NodeB)
- Search Threads: 8 (configurable)
- Persistence: File-based serialization
- Throughput: 500-1000 requests/second
- Response Time: <100ms (95th percentile)
- Concurrency: 50+ simultaneous users
- Horizontal: Add more shards by calling nodeManager.addNode(nodeId)
- Vertical: Increase thread pool size in NodeManager constructor
- Memory: Tune JVM heap size with the -Xmx parameter
- Persistence: Consider database backend for larger datasets
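To show how the thread pool size relates to the number of shards, here is a minimal sketch of fanning a query out to every shard with an ExecutorService and merging the results. It is not the NodeManager code: `ParallelSearchSketch`, `searchAll`, and the representation of a shard as a `Function<String, List<R>>` are assumptions chosen to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: query all shards in parallel and merge their results.
final class ParallelSearchSketch {

    static <R> List<R> searchAll(List<Function<String, List<R>>> shards, String query, int threads) {
        // A long-running service would reuse one pool; it is created here only for self-containment.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<List<R>>> tasks = new ArrayList<>();
            for (Function<String, List<R>> shard : shards) {
                tasks.add(() -> shard.apply(query));
            }
            List<R> merged = new ArrayList<>();
            // invokeAll blocks until every task finishes and returns futures in task order.
            for (Future<List<R>> future : pool.invokeAll(tasks)) {
                merged.addAll(future.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("search interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("shard search failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With two shards, at most two of these tasks run at once per query, so a larger pool mainly helps when many queries arrive concurrently or when more shards are added.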
Create src/main/resources/application.properties:
server.port=8080
logging.level.com.searchengine=INFO
For production deployment:
java -Xmx2g -Xms1g -XX:+UseG1GC -jar distributed-search-engine.jar
src/
├── main/java/com/searchengine/
│ ├── api/ # REST controllers
│ ├── config/ # Spring configuration
│ ├── core/ # Search engine core
│ ├── distributed/ # Distribution logic
│ ├── persistence/ # Data persistence
│ └── util/ # Utilities
└── test/java/com/searchengine/
├── core/ # Unit tests
└── api/ # Integration tests
- New Search Algorithms: Implement in QueryProcessor
- Additional Persistence: Extend PersistenceService
PersistenceService - Monitoring: Add metrics collection in controllers
- Security: Implement authentication with Spring Security
The NodeManager class supports two constructors:
// For Spring Boot (dependency injection)
@Autowired
NodeManager nodeManager; // Uses PersistenceService bean
// For testing/standalone usage
NodeManager manager = new NodeManager(); // Creates own PersistenceService
// Custom persistence configuration (separate variable name so both examples compile together)
PersistenceService customPersistence = new PersistenceService();
NodeManager customManager = new NodeManager(customPersistence);
Application won't start:
- Check Java version: java --version
- Verify port 8080 is available: lsof -i :8080
Out of memory errors:
- Increase heap size: -Xmx2g
- Monitor GC with: -Xlog:gc (replaces -XX:+PrintGC, which is deprecated since Java 9)
Poor search performance:
- Increase thread pool size in NodeManager
- Check system CPU usage
- Consider adding more shards
Persistence errors:
- Check file permissions in the search-engine-data/ directory
- Verify disk space availability
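The two persistence checks above can also be run programmatically, for example at startup. The sketch below uses only standard `java.nio.file` calls; the class name `DataDirCheckSketch`, the `problems` method, and the idea of a minimum free-space threshold are assumptions for illustration and are not part of the project.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: report permission and disk-space problems for a data directory.
final class DataDirCheckSketch {

    // Returns a list of human-readable problems; an empty list means the checks passed.
    static List<String> problems(Path dir, long minFreeBytes) {
        List<String> problems = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            problems.add("not a directory: " + dir);
            return problems; // the remaining checks need an existing directory
        }
        if (!Files.isReadable(dir)) {
            problems.add("directory is not readable: " + dir);
        }
        if (!Files.isWritable(dir)) {
            problems.add("directory is not writable: " + dir);
        }
        try {
            long usable = Files.getFileStore(dir).getUsableSpace();
            if (usable < minFreeBytes) {
                problems.add("only " + usable + " bytes free, need " + minFreeBytes);
            }
        } catch (IOException e) {
            problems.add("cannot query disk space: " + e.getMessage());
        }
        return problems;
    }
}
```

A call such as `DataDirCheckSketch.problems(Path.of("search-engine-data"), 100L * 1024 * 1024)` would flag a missing directory, missing permissions, or less than 100 MB of free space.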
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Run tests: ./run-tests.sh
- Submit a pull request
This project is licensed under the MIT License.