## 1. Outline use cases and constraints

Gather requirements and scope the problem. Ask questions to clarify use cases and constraints. Discuss assumptions.

### Questions

- High-level APIs needed
    - Who is going to use it?
    - How are they going to use it?
    - What does the system do?
    - What are the inputs and outputs of the system?

- Deciding on system load
    - How many users are there?
    - How much data do we expect to handle?
    - Data size per read/write transaction?
        - e.g. a 140-character tweet is ~140 bytes (assuming 1 byte per ASCII character)
        - For a database, a common assumption is ~1 KB per read/write
    - How many requests per second do we expect?
    - What is the expected read to write ratio?
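The load questions above usually feed a quick back-of-envelope calculation. Below is a minimal Python sketch. It assumes traffic is spread evenly over the day, so it computes averages rather than peaks, and its parameter names are hypothetical:

```python
SECONDS_PER_DAY = 86_400


def estimate_qps(daily_active_users: int,
                 requests_per_user_per_day: float,
                 reads_per_write: float) -> dict:
    """Back-of-envelope average QPS, split into reads and writes.

    reads_per_write is the read:write ratio, e.g. 100.0 means 100 reads per write.
    """
    requests_per_day = daily_active_users * requests_per_user_per_day
    total_qps = requests_per_day / SECONDS_PER_DAY
    # total = reads + writes and reads = ratio * writes, so writes = total / (ratio + 1).
    write_qps = total_qps / (reads_per_write + 1)
    read_qps = total_qps - write_qps
    return {"total_qps": total_qps, "read_qps": read_qps, "write_qps": write_qps}


# Hypothetical example: 10M daily active users, 10 requests each per day, 100:1 read:write.
est = estimate_qps(10_000_000, 10, 100.0)
print({k: round(v) for k, v in est.items()})
# {'total_qps': 1157, 'read_qps': 1146, 'write_qps': 11}
```

Peak traffic is typically some multiple of this average. The multiplier depends on the product, so treat it as an assumption and state it explicitly.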

- Deciding on other components (e.g. DNS, CDN)
    - Distribution across time, geography, entities?

### Sizing Cheatsheet

- Seconds in common time periods
    - 1 day has 86,400 seconds
    - 1 month has ~2.5M seconds (30 days × 86,400 = 2,592,000)

- Sizing base-62 hash/ID length (alphabet `[a-zA-Z0-9]`). A length of 7 (~3.5 trillion combinations) is enough to generate IDs for most use cases.
    - $62^2 \approx 3.8k$
    - $62^3 \approx 240k$
    - $62^4 \approx 15M$
    - $62^5 \approx 916M$ (~1B)
    - $62^6 \approx 57B$
    - $62^7 \approx 3.5T$
    - $62^8 \approx 218T$
    - $62^9 \approx 13.5$ quadrillion
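Here is a short sketch of base-62 encoding and of picking the minimum length for a given number of IDs. The alphabet order (lowercase, uppercase, digits) and the function names are assumptions. Any fixed ordering of `[a-zA-Z0-9]` works as long as encoding and decoding agree.

```python
import string

# Assumed alphabet order; any fixed permutation of [a-zA-Z0-9] works.
BASE62_ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def base62_encode(n: int) -> str:
    """Encode a non-negative integer ID as a base-62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return BASE62_ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(BASE62_ALPHABET[rem])
    # Digits were produced least-significant first.
    return "".join(reversed(chars))


def base62_decode(s: str) -> int:
    """Inverse of base62_encode."""
    n = 0
    for ch in s:
        n = n * 62 + BASE62_ALPHABET.index(ch)
    return n


def min_base62_length(num_ids: int) -> int:
    """Smallest length L such that 62**L >= num_ids."""
    length = 1
    while 62 ** length < num_ids:
        length += 1
    return length


print(min_base62_length(1_000_000_000))      # 6
print(min_base62_length(3_500_000_000_000))  # 7
```

Because $62^5$ is slightly under 1 billion, 1 billion IDs already need length 6.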

- Important powers of 2
    - $2^0 = 1$
    - $2^7 = 128 \approx 100$
    - $2^{10} = 1024 \approx 1000$
    - $2^{13} = 8192 \approx 10{,}000$
    - $2^{17} = 131072 \approx 100{,}000$
    - $2^{20} = 1048576 \approx 1{,}000{,}000$
    - $2^{30} = 1073741824 \approx 1{,}000{,}000{,}000$
    - $2^{40} = 1099511627776 \approx 1{,}000{,}000{,}000{,}000$
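One common use of these powers is deciding how wide an ID column must be. Below is a minimal sketch that relies on Python's built-in `int.bit_length`. The function name is hypothetical.

```python
def bits_needed(num_items: int) -> int:
    """Bits needed to give each of num_items items a distinct unsigned ID in 0..num_items-1."""
    if num_items < 1:
        raise ValueError("num_items must be at least 1")
    # The largest ID is num_items - 1; its binary length is the bit width we need.
    return (num_items - 1).bit_length()


print(bits_needed(1_000))           # 10 (2**10 = 1024 >= 1,000)
print(bits_needed(1_000_000_000))   # 30 -> fits in a 32-bit integer
print(bits_needed(10_000_000_000))  # 34 -> needs a 64-bit integer
```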

- Storage sizing (decimal units)
    - 1 TB
        - 1 billion × 1 KB (e.g. 1 billion database writes)
        - 1 million × 1 MB (e.g. file uploads)
        - 1,000 × 1 GB (e.g. large file uploads)
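The same multiplication extends to a retention period and replication. Below is a sketch with hypothetical parameters. It uses decimal units so that 1 billion × 1 KB comes out to exactly 1 TB. The 5-year retention and 3 replicas in the example are illustrative assumptions.

```python
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12  # decimal units, matching the cheatsheet


def storage_bytes(writes_per_day: int, bytes_per_write: int,
                  retention_days: int, replication_factor: int = 1) -> int:
    """Raw storage needed to keep every write for retention_days, times replication."""
    return writes_per_day * bytes_per_write * retention_days * replication_factor


# Cheatsheet check: 1 billion writes of 1 KB each is 1 TB.
print(storage_bytes(1_000_000_000, 1 * KB, 1) / TB)  # 1.0

# Hypothetical service: 1M writes/day of 1 KB, kept 5 years, 3 replicas.
print(storage_bytes(1_000_000, 1 * KB, 5 * 365, replication_factor=3) / TB)  # 5.475
```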

- QPS
    - 1 QPS is ~2.5M queries per month
    - 10 QPS is ~25M queries per month
    - 40 QPS is ~100M queries per month
    - 400 QPS is ~1 billion queries per month
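These conversions follow from the ~2.5M seconds-per-month approximation. Below is a minimal sketch. Both functions return averages and ignore peak traffic, and their names are hypothetical.

```python
SECONDS_PER_MONTH = 2_500_000  # cheatsheet rounding; a 30-day month is 2,592,000 s


def qps_to_monthly(qps: float) -> float:
    """Average queries per month for a steady rate of qps."""
    return qps * SECONDS_PER_MONTH


def monthly_to_qps(queries_per_month: float) -> float:
    """Average QPS needed to serve queries_per_month."""
    return queries_per_month / SECONDS_PER_MONTH


print(qps_to_monthly(40))             # 100000000
print(monthly_to_qps(1_000_000_000))  # 400.0
```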

## 2. Create a high-level design

Outline a high-level design with all important components.

### Client

- All requests start from a client

### DNS

- Usually, you need a DNS (e.g. AWS Route53) if you have

### Database

- SQL or NoSQL?
    - Do you have a fixed schema?
        - If you require a fixed schema ahead of time, use SQL.
    - Do you have lots of joins?
        - If you have lots of joins, use SQL.
    - How will your DB scale? What needs to scale: reads or writes?
        - The common rule of thumb is that SQL databases are easiest to scale vertically (read replicas help with read-heavy load), while many NoSQL databases are designed to scale horizontally as well as vertically.
        - Why is it harder for SQL to scale horizontally?
            - Joins make sharding hard: a join may need rows that live on different shards, so the query can no longer be routed to a single machine.
            - ACID transactions that span multiple shards need distributed coordination (e.g. two-phase commit), which makes it hard to spread write load across machines.
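The following toy Python sketch makes the join argument concrete. It assumes a hypothetical schema where `users` is sharded by `user_id` and `orders` by `order_id`, with in-memory dicts standing in for shards. Because the join key is not the orders shard key, fetching one user's orders has to contact every shard (scatter-gather). All names here are illustrative.

```python
from __future__ import annotations

import zlib

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(key: str) -> int:
    """Route a key to a shard with a stable hash (Python's str hash() is salted per process)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


# Each shard stores its own slice of two tables.
shards = [{"users": {}, "orders": {}} for _ in range(NUM_SHARDS)]


def insert_user(user_id: str, name: str) -> None:
    shards[shard_for(user_id)]["users"][user_id] = {"name": name}


def insert_order(order_id: str, user_id: str, amount: int) -> None:
    # Orders are sharded by order_id, so one user's orders scatter across shards.
    shards[shard_for(order_id)]["orders"][order_id] = {"user_id": user_id, "amount": amount}


def user_orders_join(user_id: str) -> tuple[list[dict], int]:
    """Join a user with their orders; also report how many shards were contacted."""
    contacted = {shard_for(user_id)}
    user = shards[shard_for(user_id)]["users"][user_id]
    rows = []
    # The join key (user_id) is not the orders shard key, so every shard must be queried.
    for i, shard in enumerate(shards):
        contacted.add(i)
        for order_id, order in shard["orders"].items():
            if order["user_id"] == user_id:
                rows.append({"order_id": order_id, "name": user["name"], "amount": order["amount"]})
    return rows, len(contacted)


insert_user("u1", "Ada")
for n in range(8):
    insert_order(f"o{n}", "u1", amount=10 * n)
rows, shards_contacted = user_orders_join("u1")
print(len(rows), shards_contacted)  # 8 4
```

Sharding `orders` by `user_id` instead would co-locate a user's orders with the user row, so this query would hit one shard. The trade-off is that any query joining on a different key then becomes the scatter-gather case.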

## 3. Design core components

- Dive into details for each core component.

## 4. Scale the design

- Identify and address bottlenecks, given the constraints.