Skip to content

vnscriptkid/system-design

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 

Repository files navigation

system-design

1. Introduction

2. What Are Design Fundamentals?

  • DFs are building blocks to solve Design Interview questions
  • Just like you need to know Data Structure to solve Coding interview questions

3. Client—Server Model

  • Flow when user enters url on browser and press enter:
    • A req is sent to DNS server, a resolved IP address corresonding to address is sent back to user
    • User uses that IP address as a unique identifier of one entity on the internet, send a message to server (including src IP and dest IP)
    • Server receives req from client, look at the req to determine what user needs before sending back the resource to src IP

4. Network Protocols

  • HTTP (Hyper Text Transfer Protocol):
  • TCP (Transmission Control Protocol): top of IP, reliable (come in order + error checking), 3-way handshake before actual transmission
  • IP (Internet Protocol): used for routing purposes, header contains src IP && dest IP, limited size, no order

5. Storage

  • Database: stores data, supports 2 main operations: WRITE (Create, Update, Delete) && READ
  • 2 types:
    • In-memory: Fast retrieval, not persist (after shutdown)
    • Disk: Slow, do persist

6. Latency And Throughput

  • Latency: How long it takes for data to traverse between systems.
    • Specific systems may requires higher latency: online games
  • Throughput: How much work can be handled at a time
    • Context 1️⃣ 1Gbps connection
    • Context 2️⃣ Refers to amount of reqs that one server can handle at a time
  • How to increase throughput:
    • Pay more $$$
    • Add more servers

7. Availability

  • What is availability? Resistant to failure or fault-tolerant (server shutdown || db shutdown || ...)
  • Why is it important? You can lose customers if they can not access the service
  • How to measure availability? Uptime / Total Time (1 Year) ~ 99.999%
  • High Availability: More Nines
  • SLA/SLO (Service Level Agreement/Objective):
    • The Service Provider's promise about service's uptime.
    • They will take penalty if they fail to keep up with their words.
  • Evaluating a system's availability, must consider all services, what is the core service.
  • How to increase Availability? Redundancy
    • Add more servers to avoid single point of failure, having LB to coordinate traffic
    • Add more LBs
  • 2 types of Redundancy:
    • Passive: having multiple machines working at the same time, when one goes down, others will take care
    • Active: having one machine working at one time, when it goes down, one machine from the pool takes over

8. Caching

  • 🥅 Goal: Reduce latency
  • Definition: Stores data in a place different from the original one. Only cache data to be read.
  • Places to cache:
    • FE: static data, fetch once
    • BE: expensive computations, frequent calls to DB
    • DB: long-running queries that data does change frequently
  • 2 types:
    • Write-through: Write to both cache and DB, takes time but high-consistent
    • Write-back: Write to cache only, db will be updated later (intervals), high risk of losing data before db able to sync data
  • What types to choose: Depends on the resource and how important it is that resource is up-to-date
    • Comment on youtube: Requires high-accuracy
    • View Count: It's okay be stale for certain amount of time
  • Stale data:
    • One that is out-of-date (in cache), that is not in sync with latest one in db
    • Solution: Invalidate
  • Eviction policies: We don't want data to be in cache forever
    • LRU: Take the least recently used out
    • LFU: Take the least frequently used out
    • LIFO
    • FIFO

9. Proxies

  • 2 types
    • Reverse Proxy (nginx)
      • acts on behalf of server, DNS resolves to IP of reverse proxy
      • features: filters reqs, logging, caching html, Load Balancing
    • Forward Proxy: acts on behalf of client, like VPN

10. Load Balancers

  • Context: 1 server has limited throughput
  • Scaling:
    • Vertical
    • Horizontal: Add more servers => need for a LB
  • What is a LB:
    • Server or reverse proxy (acts on behalf of server)
    • Balance reqs between servers
  • Types:
    • Hardware: Dedicated machine
    • Software (our focus)
  • Server selection strategies:
    • Random: Naive, inconsistent, imbalance
    • Round-Robin: Select one by one alternately
    • Round-Robin Weighted: Put a weight on prioritized server (powerful one) so more traffic is directed for that one
    • Performance-based: Frequently do health check amongst servers, select one highly-performant
    • IP-based:
      • Hash ip to get an unique number x
      • x % numOfServers => index of server
      • Pros: 1 ip / 1 server, reqs from 1 user comes back to same server, so never miss the cache
      • Cons: In case we add or remove servers, mapping will change
  • Can use multiple selection strategies for one system.

11. Hashing

  • Context: Server selection algorithm of LB (ip based)
  • Problem with ordinary ip-based:
    • IP % numOfServers => index of server
    • As numberOfServers changes, the mapping between IP and Server will also change

1️⃣ Consistent Hashing

  • As numberOfServers changes (add servers || existing one goes down) most of mappings will not change
  • One server gets lot of load? Virtual servers

2️⃣ Rendezvous Hashing

  • Idea: Client_A_IP <=> [ 👉 ServerA, ServerB, ServerC ] (First Choice)

12. Relational Databases

  • Definition: table (row: data record, col: props), structured (adding data to db must comply with schema)
  • Powerful queries with SQL
  • ACID
    • Atomicity (transaction): transfer money = (subtract from A'account, add to B's account)
      • all succeed
      • all fail
    • Consistency: No stale data (>< eventual consistency)
    • Isolation (locking):
      • trasactions run in isolation from each other at the same time
      • but is sequential at the end
    • Durability: once transaction is committed, can't reverse (permanent effect)
  • Indexing:
    • Use auxiliary table to optimize read queries
    • Cons: Slow down write queries (must write to 2 tables)

13. Key-Value Stores

  • Characteristics: non-relational, no sql
  • Pros:
    • Flexible (no structure)
    • Simple
    • Fast (hashing)
  • Use cases:
    • Caching
    • Dynamic config
  • Variations:
    • Write to: disk || ram
    • consistency: strong || eventual

14. Replication

  • UC1: Increase availability
    • Main db handles queries, update sync to replicas
    • In case main db goes down, replicas take over
  • UC2: Increase latency
    • Have dbs at different regions, storing data for users coming from there
    • Users from region A might get replicated to region B's db asynchrously (eventual consistency - can't see immediate results)
  • UC3: Increase throughput
    • Horizontal scale: Clone more dbs (replicas) and have them all handle traffic
    • Cons: inefficient in case of huge db

15. Sharding

  • Idea: Split db
  • Pros: No duplication, increase throughput
  • How to split:
    • Name start with _
    • Users by regions
    • Consistent hashing
  • Determine which shard to go:
    • In server
    • In reverse-proxy

16. Leader Election

  • Context: Business logic is executed once amongst multiple servers
  • Example: Recurring payment
    • Flow: (1) 3rd service charge user's money (2) 3rd service sends a message to update db
    • We don't want to expose db for 3rd service => set up a server to listen for message
    • Scaling quesiton: What if it goes down => set up standby servers that can take over anytime and do business logic
    • There must be a way to communicate between servers: (1) who is leader now, (2) who will be leader if current leader goes down
    • Leader election: consensus algorithm
  • Tools
    • Zoo Keeper (Uber)
    • Etcd: key-value store, high availability, strong consistency

17. Peer-To-Peer Networks

  • Context: Sending huge files to a lot of clients
  • Problem: Low throughput in general, as clients have to wait for their turns to be served
  • Solution:
    • Break file into small chunks
    • Consider each client as peer, can send and receive chunks from others, not just from server
    • Have a tracker to coordinate traffic: decide who should talk to who
    • Use DHT: Distributed Hash Table

18. Polling And Streaming

  • Same: Data needs to be updated in a automatic way
  • Polling:
    • Client proactively sends reqs to server after certain interval
    • Cons: Data may become stale
    • Solution: Reduce interval time => Cons: Load overhead on server side
    • Use case: Don't need data to be updated regularly
  • Streaming:
    • Server proactively push messages to client on a long-lived connection
    • Data is always updated
    • Use case: Need for realtime data like trading app, chat app

19. Configuration

  • Def: Set of constants
  • Formats: json, yaml
  • Types
    • Static: Any change config requires new code deployment
    • Dynamic: Quick impact

20. Rate Limiting

  • Def: A threshold of how many operations can be executed over a certain period of time (exp: 2 reqs / 10s)
  • Why? DoS protection (not DDoS), reduce overhead for server
  • Limit based on factors:
    • User
    • IP
    • Regions
  • Scaling:
    • Put Rate Limiting service on a separate machine, using Redis to store data.

21. Logging And Monitoring

  • Why: Reproduce issue, debug
  • Formats: json, syslog
  • What to logs: error, reqs
  • Features: Visibility, logs -> metrics -> insights, Auto alerts when err reaches threshold
  • Variations: ggStackDriver, time series database

22. Publish/Subscribe Pattern

  • Context: Streaming
  • 4 entities:
    • Subscribers: Clients
    • Publishers: Servers
    • Topics (Channels): intermediary over which publishers push messages
    • Messages:
      • Idempotent ops: data could be sent > 1 times => in case publishers don't receive ack msg
  • Filtering
    • Topic-based
    • Content-based: Publishers decide whether or not to deliver messages based on message's content
  • Variations: google cloud pubsub, Apache kafka
  1. MapReduce
  2. Security And HTTPS
  3. API Design

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors