# Memcache
https://memcached.org/about

Memcached is an in-memory key-value data store
- Put a value with a key
- Get a value with a key

Basic building block for a distributed key-value store
- Trillions of items
- Billions of requests/second

Network attached in-memory hash table
- Supports LRU based eviction

Memcached
- Memcached is a caching system. It supports very high performance in a distributed system and helps reduce the load on a backend database system.

<img src="Memcache/Memcached-Architecture.png" height="30%" width="30%">

- Memcached is a `fast access short term memory` solution for an application.

## Memcached Under the Hood

Memcached is a memory-based caching mechanism written in C.

Internally, Memcached uses a `least recently used` (LRU) cache mechanism. Items in memory also expire after a specified amount of time.

<img src="Memcache/Memcache-LRU.png" height="30%" width="30%">
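
The LRU idea can be sketched in a few lines of Java. This is only an illustration of the eviction policy, not how Memcached implements it in C; the class name `LruCacheSketch` and the capacity of 2 are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of Memcached or any client library.
class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCacheSketch(int capacity) {
        // accessOrder = true: iteration order runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insertion; returning true drops the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, String> cache = new LruCacheSketch<>(2);
        cache.put("firstName", "Bugs");
        cache.put("lastName", "Bunny");
        cache.get("firstName");          // touch firstName so lastName becomes the eldest
        cache.put("location", "Earth");  // exceeds capacity, evicts lastName
        System.out.println(cache.keySet()); // prints [firstName, location]
    }
}
```

Reading a key counts as a use, so a frequently read item survives even if it was stored long ago.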

### Memcached Performance
- Memcached aims to implement most data access operations in O(1) time.
- Queries are targeted to run in less than 1 ms.
- High-end Memcached servers can serve millions of keys per second.
- Memcached servers are disconnected from each other. There is no overhead for data synchronization, broadcasting, cross talk, or replication. This helps provide high throughput for read operations.


- The implementation logic of Memcached is distributed in client as well as server.
- Clients know which server to reach out for which data.
- Servers manage the logic of storing and evicting data from the cache.

### Key Value Store
- In Memcached, data is stored as a key value pair.
- The stored data can be of any type.
- The only thing that the server expects is that data should be pre-serialized.

|key|value|
|--|--|
|firstName|Bugs|
|lastName|Bunny|
|location|Earth|

#### Expiration
Data items in Memcached contain a key, raw data, expiration time and some operational flags.
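
A rough model of such an item and of lazy expiration on read is shown below. The class `ExpiringStoreSketch`, its field names, and the choice to pass the current time as a parameter are assumptions of this sketch, not Memcached internals.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an item store with expiration; names are illustrative only.
class ExpiringStoreSketch {
    // Mirrors the fields listed above: raw data, expiration time, flags (the key is the map key).
    static final class Item {
        final byte[] data;
        final long expiresAtMillis;
        final int flags;

        Item(byte[] data, long expiresAtMillis, int flags) {
            this.data = data;
            this.expiresAtMillis = expiresAtMillis;
            this.flags = flags;
        }
    }

    private final Map<String, Item> items = new HashMap<>();

    // The caller passes the current time so the behaviour is deterministic and easy to test.
    void set(String key, byte[] data, int flags, long ttlMillis, long nowMillis) {
        items.put(key, new Item(data, nowMillis + ttlMillis, flags));
    }

    // Returns null on a miss or when the item has expired; expired items are removed lazily.
    byte[] get(String key, long nowMillis) {
        Item item = items.get(key);
        if (item == null) {
            return null;
        }
        if (nowMillis >= item.expiresAtMillis) {
            items.remove(key);
            return null;
        }
        return item.data;
    }
}
```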

#### Memcache has a fixed amount of Memory
If you store data when memory is full, existing data gets evicted to make room.
- Usually the data with the oldest access timestamp (the least recently used) gets evicted first

## Use Cases for Memcache
Caching
- Datastore query results
- User authentication token and session data
- API call results or other computation results

Share data across app instances

## Why do we need Memcache?
- Improve Application Performance
- Reduce Application Cost

## General Memcache Usage Pattern
Coordinate data *read* with Datastore:
- Check if Memcache value exists
- - if it does, display/use the cached value directly; otherwise
- - fetch the value from Datastore and write the value to Memcache

Coordinate data *write* with Datastore:
- Invalidate the Memcache value 
- - for this specific entry; or
- - entire Memcache
- Write the value to Datastore
- - Optionally, update the Memcache entry
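
The read and write paths above can be sketched as follows. Plain `HashMap`s stand in for Memcache and the Datastore, and the class name `LookAsideSketch` and the `datastoreReads` counter are hypothetical helpers for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical look-aside cache sketch; plain maps stand in for Memcache and the Datastore.
class LookAsideSketch {
    final Map<String, String> cache = new HashMap<>();
    final Map<String, String> datastore = new HashMap<>();
    int datastoreReads = 0;

    // Read path: try the cache first, fall back to the datastore and fill the cache.
    String read(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        datastoreReads++;
        String value = datastore.get(key);
        if (value != null) {
            cache.put(key, value);
        }
        return value;
    }

    // Write path: invalidate the cache entry, then write to the datastore.
    void write(String key, String value) {
        cache.remove(key);
        datastore.put(key, value);
    }

    public static void main(String[] args) {
        LookAsideSketch app = new LookAsideSketch();
        app.write("location", "Earth");
        System.out.println(app.read("location")); // miss: served from the datastore
        System.out.println(app.read("location")); // hit: served from the cache
        System.out.println("datastore reads: " + app.datastoreReads);
    }
}
```

The sketch takes the "invalidate this specific entry" option on write and leaves the optional cache update to the next read.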

### Batch Operations
- `getAll()`, `putAll()`, `deleteAll()`
- - A single read or write operation for multiple memcache entries

### Atomic Operations
- `increment(key, delta)`, `incrementAll(...)`
- - Provide atomic increment of numeric value(s)

- `getIdentifiable()`, `putIfUntouched()`
- - A mechanism to update a value consistently by concurrent requests
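
To see why an atomic increment matters, here is an in-process imitation built on `ConcurrentHashMap.merge`. It only mimics the semantics of `increment(key, delta)`; the class `AtomicIncrementSketch` is hypothetical, and starting a missing counter at `delta` is a simplification of this sketch.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-process counter store; it imitates the semantics of increment(key, delta) only.
class AtomicIncrementSketch {
    private final ConcurrentHashMap<String, Long> counters = new ConcurrentHashMap<>();

    // merge() applies the read-modify-write as one atomic step per key.
    long increment(String key, long delta) {
        return counters.merge(key, delta, Long::sum);
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicIncrementSketch store = new AtomicIncrementSketch();
        Runnable worker = () -> {
            for (int i = 0; i < 1000; i++) {
                store.increment("pageViews", 1);
            }
        };
        Thread a = new Thread(worker);
        Thread b = new Thread(worker);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(store.increment("pageViews", 0)); // prints 2000
    }
}
```

No update is lost even though two threads increment the same key concurrently.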

### Other Features
- Asynchronous calls: Provide a mechanism to make *non-blocking* calls for memcache operations
- Namespace: Logically separates data layers for different application purposes (like multi-tenancy) across many GAE services such as Datastore, Memcache, Task Queue, etc.

## Cons with using Memcache

### Memcache is Volatile
Entries can be evicted at any time for various reasons:
- entry reaches expiration
- entry is evicted because memcache memory is full
- memcache server fails

Tip
- It's important to handle cache-miss gracefully!
- Implement write-through logic by backing memcache with datastore in your application!

### Memcache is not Transactional
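For example, you have a value of 100 and perform two subtraction operations:
- Subtract 20
- Subtract 30 before the previous operation has finished

The value at the end can be 70 instead of the expected 50, because the second operation read 100 before the first one wrote its result.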

Tip
- Use `getIdentifiable()` and `putIfUntouched(...)` for optimistic locking
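
A minimal sketch of the optimistic-locking idea, applied to the 100 - 20 - 30 example. The class `OptimisticLockSketch` and its version counter are hypothetical; the method names only echo `getIdentifiable()` and `putIfUntouched(...)` and do not reproduce their real signatures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical versioned store that imitates getIdentifiable()/putIfUntouched() semantics.
class OptimisticLockSketch {
    static final class Versioned {
        final long value;
        final long version;

        Versioned(long value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<String, Versioned> entries = new HashMap<>();

    synchronized void put(String key, long value) {
        Versioned old = entries.get(key);
        entries.put(key, new Versioned(value, old == null ? 0 : old.version + 1));
    }

    synchronized Versioned getIdentifiable(String key) {
        return entries.get(key);
    }

    // Stores newValue only if nobody has changed the entry since 'seen' was read.
    synchronized boolean putIfUntouched(String key, Versioned seen, long newValue) {
        Versioned current = entries.get(key);
        if (current == null || current.version != seen.version) {
            return false;
        }
        entries.put(key, new Versioned(newValue, current.version + 1));
        return true;
    }

    // Retry loop: re-read and re-apply the subtraction until the conditional put succeeds.
    void subtract(String key, long amount) {
        while (true) {
            Versioned seen = getIdentifiable(key);
            if (seen == null) {
                throw new IllegalStateException("no entry for " + key);
            }
            if (putIfUntouched(key, seen, seen.value - amount)) {
                return;
            }
        }
    }

    public static void main(String[] args) {
        OptimisticLockSketch store = new OptimisticLockSketch();
        store.put("balance", 100);
        store.subtract("balance", 20);
        store.subtract("balance", 30);
        System.out.println(store.getIdentifiable("balance").value); // prints 50
    }
}
```

A writer that read a stale value has its conditional put rejected and retries against the fresh value, so the result is 50 rather than 70.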

Concurrent, non-atomic updates like these leave stale data in the cache.
#### Solution (Leases)
Extend memcache protocol with "leases"
- Return and attach a lease-id with every miss
- Lease-id is invalidated inside server on a delete
- Disallow set if the lease-id is invalid at the server
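
The three lease rules can be modelled in a small, single-threaded sketch. This is not the real memcached protocol; the class `LeaseSketch` and its method names are invented for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-threaded model of a cache server with leases; not the real protocol.
class LeaseSketch {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> leases = new HashMap<>();
    private long nextLeaseId = 1;

    String get(String key) {
        return values.get(key);
    }

    // On a miss the client receives a lease id that it must present when it later sets the value.
    long leaseOnMiss(String key) {
        long id = nextLeaseId++;
        leases.put(key, id);
        return id;
    }

    // A delete (invalidation) removes the value and invalidates any outstanding lease.
    void delete(String key) {
        values.remove(key);
        leases.remove(key);
    }

    // The set is accepted only if the lease id is still the valid one for this key.
    boolean setWithLease(String key, long leaseId, String value) {
        Long valid = leases.get(key);
        if (valid == null || valid != leaseId) {
            return false;
        }
        leases.remove(key);
        values.put(key, value);
        return true;
    }

    public static void main(String[] args) {
        LeaseSketch server = new LeaseSketch();
        long lease = server.leaseOnMiss("k");   // reader misses and gets a lease
        server.delete("k");                     // a writer invalidates the key meanwhile
        System.out.println(server.setWithLease("k", lease, "old value")); // prints false
    }
}
```

The reader that fetched a value before the invalidation cannot write it back, which is what keeps the stale value out of the cache.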

### Memcache is A Limited Resource
My Application Does NOT Have Enough Memcache!

Tips
- Only need to cache what is useful and necessary!
- Your application should function without memcache!

### Thundering Herd Problem with Look-Aside caching
If a large number of read requests look in memcache for a value that does not exist there, the servers will all look for the value in the database. But the database cannot handle such large read loads.

#### Solution
Memcache server arbitrates access to the database
- Small extension of leases

Clients are given a choice of using a slightly stale value or waiting
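
One way to picture the arbitration: on a miss, the server lets only one client per key go to the database within some interval and tells the others to wait. The class `HerdArbiterSketch`, the `Decision` enum, and the interval parameter are assumptions of this sketch, and the stale-value option is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of lease-based arbitration; the interval value is an assumption of this sketch.
class HerdArbiterSketch {
    enum Decision { FETCH_FROM_DATABASE, WAIT_AND_RETRY }

    private final long minIntervalMillis;
    private final Map<String, Long> lastLeaseAt = new HashMap<>();

    HerdArbiterSketch(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    // Called on a cache miss: only one caller per key and interval is sent to the database.
    synchronized Decision onMiss(String key, long nowMillis) {
        Long last = lastLeaseAt.get(key);
        if (last == null || nowMillis - last >= minIntervalMillis) {
            lastLeaseAt.put(key, nowMillis);
            return Decision.FETCH_FROM_DATABASE;
        }
        return Decision.WAIT_AND_RETRY;
    }

    public static void main(String[] args) {
        HerdArbiterSketch arbiter = new HerdArbiterSketch(10_000L);
        System.out.println(arbiter.onMiss("hot", 0L)); // prints FETCH_FROM_DATABASE
        System.out.println(arbiter.onMiss("hot", 5L)); // prints WAIT_AND_RETRY
    }
}
```

By the time the waiting clients retry, the first client has usually refilled the cache, so the database sees one read instead of many.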

# Memcached on Java
Spymemcached (a Memcached Java client) connects a Java application to Memcached.

API docs for Spymemcached:

https://dustin.github.io/java-memcached-client/apidocs/

Sample code:

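        import java.net.InetSocketAddress;
        import net.spy.memcached.MemcachedClient;

        // Connect to a Memcached server (the constructor throws IOException)
        MemcachedClient c = new MemcachedClient(
                new InetSocketAddress("hostname", portNum));

        // Store a value (async) with a one-hour expiration
        c.set("someKey", 3600, someObject);
        // Retrieve a value (blocks until the server responds)
        Object myObject = c.get("someKey");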

## Memcache at Facebook
### Infrastructure Requirements for Facebook
1. Near real-time communication
1. Aggregate content on-the-fly from multiple sources
1. Be able to access and update very popular shared content
1. Scale to process millions of user requests per second

### Design Requirements
Support a very heavy read load
- Over 1 billion reads / second
- Insulate backend services from high read rates

Geographically Distributed

Support a constantly evolving product
- System must be flexible enough to support a variety of use cases
- Support rapid deployment of new features

Persistence handled outside the system
- Support mechanisms to refill after updates

### Many memcache servers in one cluster
Items are distributed across memcache servers by using consistent hashing on the key
- Individual items are rarely accessed very frequently, so over-replication doesn't make sense

All web servers talk to all memcache servers
- Accessing 100s of memcache servers to process a user request is common

<img src="Memcache/Memcache-Facebook-many-servers-one-cluster.png" height="20%" width="20%">
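
Consistent hashing on the key can be sketched with a sorted map acting as the hash ring. The class `HashRingSketch`, the use of CRC32, the virtual-node count, and the server names are choices of this sketch, not details from the Facebook deployment.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical consistent-hash ring; CRC32 and the virtual-node count are choices of this sketch.
class HashRingSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    HashRingSketch(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Each server is placed on the ring at several points to even out the key distribution.
    void addServer(String server) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    void removeServer(String server) {
        for (int i = 0; i < virtualNodes; i++) {
            // Two-argument remove: only drop the point if it still belongs to this server.
            ring.remove(hash(server + "#" + i), server);
        }
    }

    // Walk clockwise from the key's hash to the first server point, wrapping around at the end.
    String serverFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no servers on the ring");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        HashRingSketch ring = new HashRingSketch(100);
        ring.addServer("mc1");
        ring.addServer("mc2");
        ring.addServer("mc3");
        System.out.println("firstName -> " + ring.serverFor("firstName"));
    }
}
```

When a server is removed, only the keys that lived on it move to another server; all other keys keep their assignment.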

#### Problem: Incast congestion
Many simultaneous responses overwhelm the client servers making the requests

<img src="Memcache/Memcache-Facebook-many-servers-one-cluster-incastcongestion.png" height="20%" width="20%">

Solution:<br>
Limit the number of outstanding requests with a sliding window
- Larger windows result in more congestion
- Smaller windows result in more network round trips
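
A simplified client-side window that caps the number of outstanding requests might look like this. The class `RequestWindowSketch` is hypothetical, and it uses a fixed window size, whereas a real sliding window would adjust its size over time.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical client-side window; a fixed window size is a simplification of this sketch.
class RequestWindowSketch {
    private final int windowSize;
    private final Queue<String> waiting = new ArrayDeque<>();
    private int outstanding = 0;

    RequestWindowSketch(int windowSize) {
        this.windowSize = windowSize;
    }

    // Queues a request and returns the requests that may be sent right now.
    List<String> submit(String key) {
        waiting.add(key);
        return drain();
    }

    // Called when one response arrives; frees a slot and releases queued requests.
    List<String> onResponse() {
        if (outstanding > 0) {
            outstanding--;
        }
        return drain();
    }

    int outstanding() {
        return outstanding;
    }

    // Moves queued requests into the window until it is full.
    private List<String> drain() {
        List<String> toSend = new ArrayList<>();
        while (outstanding < windowSize && !waiting.isEmpty()) {
            toSend.add(waiting.poll());
            outstanding++;
        }
        return toSend;
    }
}
```

With a window of 2, a third request is held back until one of the first two responses arrives, which bounds how many responses can hit the client at once.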

### Many memcache servers in multiple clusters
All-to-all limits horizontal scaling

Multiple memcache clusters front one DB installation
- Have to keep the caches consistent
- Have to manage over-replication of data

#### Databases invalidate caches
<img src="Memcache/Memcache-Facebook-databases-invalidate-caches.png" height="30%" width="30%">

Cached data must be invalidated after database updates

Solution: Tail the MySQL commit log and issue deletes based on transactions that have been committed
- Allows caches to be resynchronized in the event of a problem

Invalidation pipeline problem: Too many packets
- Aggregating deletes reduces packet rate by 18x
- Makes configuration management easier
- Each stage buffers deletes in case downstream component is down
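
The aggregation step can be pictured as a buffer that groups many deletes into one packet. The class `DeleteBatcherSketch`, the batch size, and the list-of-lists "packet" model are assumptions of this sketch; the real pipeline is not described in these notes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical delete batcher; the batch size and the "packet" model are assumptions of this sketch.
class DeleteBatcherSketch {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> sentPackets = new ArrayList<>();

    DeleteBatcherSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    // Buffers one delete and sends a packet once enough deletes have accumulated.
    void delete(String key) {
        buffer.add(key);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Sends whatever is buffered as a single packet (here: records it in a list).
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sentPackets.add(new ArrayList<>(buffer));
        buffer.clear();
    }

    int packetsSent() {
        return sentPackets.size();
    }

    int buffered() {
        return buffer.size();
    }
}
```

Because unsent deletes stay in the buffer, the same structure also covers holding deletes while a downstream component is unavailable.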

Putting it all together
1. Single-front-end cluster
- - Read heavy workload
- - Wide fanout
- - Handling failures
2. Multiple front-end clusters
- - Controlling data replication
- - Data consistency
3. Multiple Regions
- - Data consistency

Lessons Learned
- Push complexity into the client whenever possible
- Operational efficiency is as important as performance
- Separating cache and persistent store allows them to be scaled independently