# Concepts

- https://www.youtube.com/watch?v=-W9F__D3oY4
- https://web.archive.org/web/20221030091841/http://www.lecloud.net/tagged/scalability/chrono

## 1. Vertical/Horizontal Scaling

- Vertical scaling: Get a better machine with more RAM, processors, disk space, etc.
    - But there is a hard limit: you will always be capped at some point. Either you run out of money, or you hit the limit of what current hardware offers

- Horizontal scaling: Add MORE stuff (multiple databases, multiple compute instances etc)
    - Instead of getting 1 super good machine, get 99 lousy ones

## 2. Load balancing

- Now that we have multiple web servers (horizontal scaling), we need a way to distribute traffic across them
- Hence, a load balancer. Expose only the public IP of the load balancer, and switch all servers to private IPs
- One way to load balance: round robin, e.g. round-robin DNS with BIND
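
A minimal sketch of round-robin selection, assuming a fixed list of backend servers. The class name `RoundRobinBalancer` and the private IPs are made up for illustration; a real load balancer or DNS server does this internally.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backend servers in a fixed, repeating order."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("need at least one server")
        self._pool = cycle(list(servers))

    def next_server(self):
        # Each call returns the next server, wrapping around at the end.
        return next(self._pool)


# Hypothetical private IPs of the web servers behind the balancer.
balancer = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
picks = [balancer.next_server() for _ in range(5)]
print(picks)
```

Round robin ignores how busy each server is, so one server can still end up with the heaviest requests.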

## 3. Shared Session State

- But imagine that, as a user, I'm routed to server 1, and my work gets stored in a session there
- On my next request, if I'm routed to server 2, I may lose my work, because sessions are stored locally on each machine
- To avoid this, factor the session state out into a shared store that all servers connect to. Now the same session data is available no matter which server handles the request
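
The difference can be sketched with plain dictionaries standing in for the session stores. `AppServer` and the session id `"abc"` are hypothetical; in practice the shared store would be a separate service rather than an in-process dict.

```python
class AppServer:
    """A toy application server that keeps sessions in the store it is given."""

    def __init__(self, name, session_store):
        self.name = name
        self.sessions = session_store  # any dict-like object

    def save(self, session_id, key, value):
        self.sessions.setdefault(session_id, {})[key] = value

    def load(self, session_id, key):
        return self.sessions.get(session_id, {}).get(key)


# Case 1: each server has its own local session store.
s1 = AppServer("server1", {})
s2 = AppServer("server2", {})
s1.save("abc", "cart", ["book"])
lost = s2.load("abc", "cart")  # server2 has never seen this session

# Case 2: both servers point at the same shared store.
shared = {}
t1 = AppServer("server1", shared)
t2 = AppServer("server2", shared)
t1.save("abc", "cart", ["book"])
found = t2.load("abc", "cart")

print(lost, found)
```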

## 4. RAID

- But now that shared state is stored in one place, it is a single point of failure
- Instead of storing it on a single hard disk, we can use RAID (redundant array of independent disks)
- Idea
    - You have 2 hard drives, and you stripe data across them
    - Every time your OS wants to save a file, it alternates writing pieces of the file to each drive, which effectively doubles the rate at which it can write the file
    - Alternatively, mirror the data from 1 disk to the other, so either disk can fail without losing data

- So with mirroring, some portion of the total space is sacrificed for redundancy
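
A toy illustration of the two layouts, using byte strings in place of disks. The function names and the 4-byte chunk size are assumptions for the sketch; real RAID works at the block-device level.

```python
def stripe(data, n_disks=2, chunk=4):
    """Striping: deal fixed-size chunks out to the disks in turn."""
    disks = [bytearray() for _ in range(n_disks)]
    for i in range(0, len(data), chunk):
        disks[(i // chunk) % n_disks] += data[i:i + chunk]
    return [bytes(d) for d in disks]


def mirror(data, n_disks=2):
    """Mirroring: every disk holds a full copy of the data."""
    return [bytes(data) for _ in range(n_disks)]


def usable_fraction(layout, n_disks=2):
    """Share of the total raw capacity that holds unique data."""
    if layout == "stripe":
        return 1.0
    if layout == "mirror":
        return 1.0 / n_disks
    raise ValueError(f"unknown layout: {layout}")


data = b"ABCDEFGHIJKL"
print(stripe(data))   # each disk holds only part of the file
print(mirror(data))   # each disk holds the whole file
print(usable_fraction("mirror"))
```

Striping alone adds speed but no redundancy: losing either disk loses the file. Mirroring survives a disk failure at the cost of half the raw space.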

## 5. Shared Storage Tech

- Shared storage solutions include network attached storage (NAS), storage area networks (SAN), and redundant arrays of independent disks (RAID), as well as dedicated storage servers and cloud storage.

## 6. Database Replication

- To avoid losing information, you may also wish to replicate your data 

## 8. Session Affinity

- When you enable session affinity, your load balancer directs all requests from a particular end user to a specific origin server
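
One way to picture this is hashing a client identifier to pick the server. This is only a sketch: `pick_server` and the origin names are hypothetical, and many real load balancers track affinity with a cookie instead.

```python
import hashlib


def pick_server(client_id, servers):
    """Map a client identifier to one server, the same one every time."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["origin-1", "origin-2", "origin-3"]  # hypothetical origin names
first = pick_server("user-42", servers)
second = pick_server("user-42", servers)
print(first == second)
```

With this simple modulo scheme, adding or removing a server changes the mapping for many clients.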

## 9. In-memory Caching

- Store data in the computer's RAM instead of writing to disk
- Faster access, but may encounter cold start issues: after a restart the cache is empty, so the first requests are slow until it fills up again

## 10. Data Replication - Active/Passive

- Active
    - Process the same request in every replica

- Passive
    - Only one database, known as the master database, processes the request; its contents are then copied to the other, known as the replica
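
The contrast can be sketched with a toy in-memory `Database` class (hypothetical, as are the function names). The counter shows which databases actually processed a request.

```python
class Database:
    """A toy key-value database that counts the requests it processes."""

    def __init__(self):
        self.rows = {}
        self.requests_handled = 0

    def apply(self, key, value):
        self.requests_handled += 1
        self.rows[key] = value


def active_write(replicas, key, value):
    """Active: every replica processes the same request itself."""
    for db in replicas:
        db.apply(key, value)


def passive_write(master, replicas, key, value):
    """Passive: only the master processes the request; replicas copy its contents."""
    master.apply(key, value)
    for db in replicas:
        db.rows = dict(master.rows)


a, b = Database(), Database()
active_write([a, b], "user:1", "Ada")

m, r = Database(), Database()
passive_write(m, [r], "user:1", "Ada")

print(a.requests_handled, b.requests_handled)  # both replicas did the work
print(m.requests_handled, r.requests_handled)  # only the master did
```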

## 11. Partitioning

- Break data into specified groups (partitions), so that each database holds only part of the data
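
As a sketch, assume users are partitioned by the first letter of their name into two groups. The partition names `db_a_to_m` and `db_n_to_z` are made up for the example.

```python
from collections import defaultdict


def partition_for(name):
    """Pick a partition from the first letter of the name."""
    first = name[:1].upper()
    return "db_a_to_m" if "A" <= first <= "M" else "db_n_to_z"


def partition(names):
    """Group names by the partition each one belongs to."""
    groups = defaultdict(list)
    for name in names:
        groups[partition_for(name)].append(name)
    return dict(groups)


groups = partition(["alice", "Zed", "Bob", "nina"])
print(groups)
```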

## 12. Data Center Redundancy

- Self-explanatory: run the service in more than one data center, so that losing one does not take everything down

## 14. Scalability for dummies: Clones

- Public servers of a scalable web service are hidden behind a load balancer, which distributes user requests across your application servers.
    - e.g. If user 1 interacts with your service, their first request may be served by server 2, their second request by server 9, etc.

- Ideally, a user should always get the same result back for a request, regardless of which server they "landed on".
    - First golden rule for scalability: every server contains exactly the same codebase and does not store any unique data

- What about user sessions? Shouldn't these be unique?
    - User sessions should be stored in a centralized data store accessible to all application servers.
    - This can be either an external database or an external persistent cache, like Redis.
    - An in-memory store like Redis will usually perform better than a disk-based database.
    - External just means the data store does not reside on the application servers

- How can I deploy to multiple servers without ending up with some servers serving old code and some serving new?
    - Deployment orchestration is a solved problem: use a deployment tool (the linked article suggests Capistrano) to roll out a code change to all servers

- Finally, create an image file from one of these servers (AWS calls this an AMI, Amazon Machine Image).
    - Use this AMI as a “super-clone” that all your new instances are based upon. 
    - Whenever you start a new instance/clone, just do an initial deployment of your latest code and you are ready!

## 15. Scalability for dummies: Database

- Now your servers can scale horizontally and serve thousands of concurrent requests. But somewhere down the line, your application slows down and breaks because of your DB, which is probably MySQL

- How can we scale our DB? 2 paths:
    1. Stick with MySQL and keep it running. Hire a database administrator (DBA), ask him to do master-slave replication (read from slaves, write to master) and upgrade your master server by adding RAM. 
        - This is not scalable. At some point you will start hunting for ways to optimise your DB server even more (sharding, denormalisation, SQL tuning)
        - Then it'll reach a point where keeping your DB running is going to cost you an obscene amount of $
        
    2. Denormalise right from the beginning, and use no joins in any database query.
        - Stay in MySQL and use it like a NoSQL database
        - Now you need to do joins in your business logic
        - But the sooner you do this, the less code you will have to change in the future.
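
Doing the join in business logic can look roughly like this. The `users` and `orders` data are invented for the example, and in a real application each would come from its own join-free query.

```python
# Result of a hypothetical "SELECT id, name FROM users" query, keyed by id.
users = {1: {"name": "Ada"}, 2: {"name": "Linus"}}

# Result of a hypothetical "SELECT id, user_id, item FROM orders" query.
orders = [
    {"id": 10, "user_id": 1, "item": "book"},
    {"id": 11, "user_id": 2, "item": "laptop"},
    {"id": 12, "user_id": 1, "item": "pen"},
]


def orders_with_user_names(orders, users):
    """Attach the user's name to each order in application code."""
    return [
        {**order, "user_name": users[order["user_id"]]["name"]}
        for order in orders
    ]


joined = orders_with_user_names(orders, users)
print(joined[0])
```

The database only serves simple lookups; the cost of combining the data moves to the application servers, which are easier to scale horizontally.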

## 16. Scalability for dummies: Cache

- Even with the above, we can still end up with a slow DB. Even though the application servers are fine, the user experience may be bad due to DB load

- To improve performance, use an in-memory cache like `Memcached` or `Redis`, so results of repeated queries can be returned from memory without hitting the DB again

- 2 patterns of caching data
    - Cached DB queries, where you store the result data in the cache, with the hashed query as the key
        - However, caches are not free: you need to know when a cached result must be invalidated, e.g. when the underlying data changes

    - Cached objects. Instead of caching only the raw query result, cache the entire object that your code assembles from the fetched data
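
A minimal sketch of the first pattern (cached DB queries), with a dict standing in for `Memcached`/`Redis`. `QueryCache`, `fake_db` and the SQL string are hypothetical names for illustration only.

```python
import hashlib


class QueryCache:
    """Caches query results under the hash of the query text."""

    def __init__(self, run_query):
        self._run_query = run_query  # function that really hits the DB
        self._store = {}             # stands in for Memcached/Redis

    @staticmethod
    def _key(sql):
        return hashlib.sha256(sql.encode("utf-8")).hexdigest()

    def get(self, sql):
        key = self._key(sql)
        if key not in self._store:   # cache miss: go to the DB once
            self._store[key] = self._run_query(sql)
        return self._store[key]

    def invalidate(self, sql):
        # Call this when the data behind the query has changed.
        self._store.pop(self._key(sql), None)


calls = []


def fake_db(sql):
    """Hypothetical slow database call; records each real execution."""
    calls.append(sql)
    return [{"id": 1, "name": "Ada"}]


cache = QueryCache(fake_db)
query = "SELECT id, name FROM users WHERE id = 1"
cache.get(query)
cache.get(query)      # served from the cache
print(len(calls))     # the DB was only hit once
```

The hard part is not the lookup but deciding when to call `invalidate`; this sketch leaves that to the caller.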

## 17. Scalability for dummies: Asynchronism

- Example:

    - A user comes to your website and starts a very computing-intensive task which would take several minutes to finish.
    - The frontend sends a job to a job queue and immediately signals back to the user: your job is in progress, please continue to browse the page.
    - The job queue is constantly checked by a bunch of workers for new jobs.
    - If there is a new job, a worker does the job and, after some minutes, sends a signal that the job is done.
    - The frontend, which constantly checks for new “job is done” signals, sees that the job is done and informs the user about it.
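
The flow above can be sketched in one process with Python's standard `queue` and `threading` modules. This is only an illustration under simplifying assumptions: a real setup would use a separate queue service and worker machines, and the names `submit`, `worker` and `done` are made up here.

```python
import queue
import threading

jobs = queue.Queue()          # the job queue
done = {}                     # job_id -> result, i.e. the "job is done" signals
done_lock = threading.Lock()


def worker():
    """Keep taking jobs off the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        job_id, n = job
        result = sum(i * i for i in range(n))  # stand-in for the slow task
        with done_lock:
            done[job_id] = result
        jobs.task_done()


def submit(job_id, n):
    """Frontend side: enqueue the job and answer the user immediately."""
    jobs.put((job_id, n))
    return "your job is in progress"


workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers:
    w.start()

ack = submit("job-1", 1000)   # returns right away, before the work is done
jobs.join()                   # here we simply wait; a frontend would poll `done`

for _ in workers:
    jobs.put(None)            # tell each worker to stop
for w in workers:
    w.join()

print(ack, "job-1" in done)
```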
