A hands-on repository for learning and implementing practical distributed systems concepts, with a focus on real-world applications and concrete implementations.
An implementation of a distributed unique ID generation system, inspired by designs like Twitter's Snowflake.
- Scalable ID generation across multiple servers
- Time-based ordering of IDs
- Worker ID component to identify different generators
- Sequence numbers to handle multiple IDs generated in the same millisecond
- Handles clock drift scenarios
- Compact representation with efficient storage
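The feature list above can be sketched as a small generator. This is a minimal illustration, not the repository's actual code: the 41/10/12 bit split follows the commonly described Snowflake layout, and `EPOCH_MS`, `IdGenerator`, and the injectable `clock` parameter are hypothetical names chosen for the example. Refusing to issue IDs when the clock moves backwards is one possible drift policy among several.

```python
import threading
import time

EPOCH_MS = 1_600_000_000_000   # hypothetical custom epoch (assumption)
WORKER_BITS = 10               # assumed layout: up to 1024 workers
SEQUENCE_BITS = 12             # assumed layout: 4096 IDs per millisecond per worker
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class IdGenerator:
    """Snowflake-style generator: timestamp | worker ID | sequence."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock = clock          # returns milliseconds; injectable for tests
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Clock moved backwards: refuse rather than risk duplicate IDs.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted in this millisecond: spin until the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp_part = (now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS)
            worker_part = self.worker_id << SEQUENCE_BITS
            return timestamp_part | worker_part | self._sequence
```

Because the timestamp occupies the high bits, IDs from one worker sort by creation time, and IDs from different workers never collide as long as each worker ID is unique.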
A from-scratch implementation of a distributed key-value store demonstrating core distributed systems concepts:
- Data Partitioning: Uses consistent hashing to distribute data across nodes
- Replication: Implements a configurable replication factor for fault tolerance
- Consistency: Uses quorum-based reads and writes to keep replicas consistent
- Versioning: Implements vector clocks for tracking causality
- Conflict Detection: Identifies conflicting writes across replicas
- Gossip Protocol: Detects node failures and maintains membership
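As an illustration of the partitioning and replication items above, here is a minimal consistent-hashing sketch. It is not taken from the repository: `HashRing`, `preference_list`, the virtual-node count, and the use of SHA-256 are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key):
    # First 8 bytes of SHA-256 as a ring position (hash choice is an assumption).
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, nodes, vnodes=64):
        nodes = list(nodes)
        # Each physical node appears at `vnodes` positions to even out load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]
        self._node_count = len(set(nodes))

    def preference_list(self, key, replication_factor=3):
        """Return the distinct nodes that should hold replicas of `key`."""
        if not self._ring:
            return []
        wanted = min(replication_factor, self._node_count)
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        owners = []
        while len(owners) < wanted:
            node = self._ring[i][1]
            if node not in owners:      # skip further vnodes of a node already chosen
                owners.append(node)
            i = (i + 1) % len(self._ring)   # walk clockwise, wrapping around
        return owners
```

For example, `HashRing(["node-a", "node-b", "node-c"]).preference_list("user:42", 2)` returns two distinct nodes. Removing a node that is not in a key's preference list leaves that key's placement unchanged, which is the property that keeps data movement small when membership changes.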
A scalable URL shortening service similar to TinyURL or Bit.ly with a focus on high performance and reliability.
- Base62 Encoding: Efficiently converts IDs to short URL strings using alphanumeric characters
- Distributed ID Generation: Snowflake-inspired approach for unique, sortable IDs
- Caching Layer: Redis implementation for high-performance URL retrieval
- Database Persistence: MySQL storage with connection pooling
- API Design: RESTful API with rate limiting for URL operations
- Analytics: Click tracking and basic statistics
- High Read Throughput: Optimized for the read-heavy workload typical of URL shorteners
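The Base62 step listed above can be sketched in a few lines. The alphabet order (digits, then lowercase, then uppercase) and the function names are assumptions for this example; the repository may use a different ordering.

```python
# Assumed alphabet order: 0-9, a-z, A-Z.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
BASE = len(ALPHABET)  # 62


def encode(number):
    """Convert a non-negative integer ID into a Base62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))   # most significant digit first


def decode(text):
    """Convert a Base62 string back into the integer ID."""
    number = 0
    for ch in text:
        number = number * BASE + INDEX[ch]   # KeyError on characters outside the alphabet
    return number
```

With this alphabet, `encode(62)` is `"10"` and `decode(encode(n)) == n` for any non-negative `n`, so the short code can be mapped back to the numeric ID without a lookup table.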
A distributed web crawler system capable of efficiently discovering, downloading, and processing web content at scale.
- URL Frontier: Advanced URL queue management with priority-based scheduling
- Politeness Policy: Rate limiting per domain with robots.txt compliance
- DNS Caching: Optimized DNS resolver with TTL-based caching
- Content Deduplication: Detects and filters duplicate content
- Distributed Architecture: Scales horizontally across multiple crawler instances
- MongoDB Storage: Persistent storage of crawled URLs and metadata
- Redis Queuing: High-performance distributed queue implementation
- REST API: Control and monitor the crawler through RESTful endpoints
- Domain Filtering: Ability to restrict crawling to specific domains
- Metadata Extraction: Extracts page titles, descriptions, and other metadata
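To make the politeness policy above concrete, here is a minimal in-memory sketch of a frontier that enforces a per-domain delay. It is an illustration only: `PoliteFrontier`, its methods, and the fixed `delay_seconds` are hypothetical, and it leaves out priority scheduling, robots.txt handling, and the Redis-backed queue the list mentions.

```python
import heapq
import time
from collections import defaultdict, deque
from urllib.parse import urlsplit


class PoliteFrontier:
    """URL frontier that spaces out requests to the same domain."""

    def __init__(self, delay_seconds=1.0, clock=time.monotonic):
        self._delay = delay_seconds
        self._clock = clock                  # injectable for tests
        self._queues = defaultdict(deque)    # domain -> pending URLs (FIFO)
        self._next_allowed = {}              # domain -> earliest next fetch time
        self._ready = []                     # heap of (earliest fetch time, domain)
        self._seen = set()                   # exact-match URL deduplication

    def add(self, url):
        """Queue a URL; returns False if it was already seen."""
        if url in self._seen:
            return False
        self._seen.add(url)
        domain = urlsplit(url).netloc
        if not self._queues[domain]:
            # Domain has no pending URLs, so it is not in the heap yet.
            heapq.heappush(self._ready, (self._next_allowed.get(domain, 0.0), domain))
        self._queues[domain].append(url)
        return True

    def next_url(self):
        """Return a URL whose domain may be fetched now, or None."""
        now = self._clock()
        if not self._ready or self._ready[0][0] > now:
            return None
        _, domain = heapq.heappop(self._ready)
        url = self._queues[domain].popleft()
        self._next_allowed[domain] = now + self._delay
        if self._queues[domain]:
            heapq.heappush(self._ready, (self._next_allowed[domain], domain))
        return url
```

A worker loop would call `next_url()` repeatedly and sleep briefly when it returns `None`. Keeping one queue per domain means a slow or heavily linked site cannot starve the others.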
A highly scalable, real-time social media news feed system, similar to those of Twitter, Facebook, or Instagram, that efficiently processes and displays personalized content.
- Multi-tier Architecture: Separated data, cache, business logic, and API layers
- Fanout Service: Efficient content distribution using push/pull hybrid model
- Five-tier Caching: Specialized Redis caches for feeds, content, social graph, actions, and counters
- MongoDB Storage: Persistent document storage with efficient indexes
- Content Personalization: Customized feeds based on follow relationships
- Social Interactions: Support for like, comment, and share actions
- Optimistic UI Updates: Immediate feedback with background synchronization
- FastAPI Backend: High-performance, async REST API with dependency injection
- React Frontend: Modern UI with component-based architecture
- Celebrity Problem Handling: Special strategies for high-follower accounts
- RESTful API Design: Comprehensive endpoints for all social operations
This repository aims to provide:
- Concrete implementations of distributed systems concepts
- Practical examples that go beyond theory
- Code that can be run, modified, and extended for learning
- Realistic simulations of distributed system behaviors and failure scenarios
Each numbered directory is a standalone system design implementation with its own documentation:
- Each project directory contains the complete implementation of a distributed system concept
- Code is organized to clearly demonstrate architectural patterns and design decisions
- Implementations are modular and well-documented for learning purposes
Planned additions to this repository:
- Distributed rate limiter
- Distributed file system
- Distributed cache
- Distributed transaction processing
- Consensus algorithms implementation
- Load balancer design
- Content delivery network (CDN)
To add a new system design to this repository:
- Create a new numbered directory (e.g., `6_rate_limiter`)
- Implement your system with clear documentation
- Update this README to include your implementation