# Storage & Databases
We're going to use a customer loyalty card analogy to talk about databases.

## Instance Stores & Elastic Block Store (Amazon EBS)
We're going to focus on "block-level storage" for a bit. This basically just refers to hard drive space, where the storage is divided into fixed-size blocks. On AWS, there are a few different types of block storage.

- Instance Store Volumes - this is storage that's physically attached to your EC2 host, and it's temporary. Data survives a reboot, but if you stop or terminate the EC2 instance, you lose the data stored here. When you start the instance up again, there's no guarantee you'll land on the same host, so this is not for long-term storage.
- Amazon Elastic Block Store (EBS) - these are like proper volumes in Docker: standalone entities that attach to your instances, so you can store data long term.

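EBS also does incremental snapshots: the first snapshot backs up the whole volume, and each later one only saves the blocks that changed since the previous snapshot.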
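To make "incremental" concrete, here's a toy Python model. It is not the EBS API (real snapshots are managed by AWS and stored in S3). A volume is just a dict of block index to contents, and the helper names `incremental_snapshot` and `restore` are made up for illustration. It shows that only changed blocks get copied after the first snapshot, and that replaying the snapshots in order rebuilds the volume.

```python
from typing import Dict, List, Optional

Volume = Dict[int, bytes]  # block index -> block contents

def incremental_snapshot(volume: Volume, previous: Optional[Volume]) -> Volume:
    """Copy every block the first time, then only blocks that changed."""
    if previous is None:
        return dict(volume)
    return {i: data for i, data in volume.items() if previous.get(i) != data}

def restore(snapshots: List[Volume]) -> Volume:
    """Rebuild the volume by applying snapshots from oldest to newest."""
    state: Volume = {}
    for snap in snapshots:
        state.update(snap)
    return state

vol: Volume = {0: b"aaaa", 1: b"bbbb", 2: b"cccc"}
snap1 = incremental_snapshot(vol, None)      # full copy: 3 blocks
before = dict(vol)
vol[1] = b"BBBB"                             # edit a single block
snap2 = incremental_snapshot(vol, before)    # only block 1 is saved
print(len(snap1), len(snap2))                # 3 1
print(restore([snap1, snap2]) == vol)        # True
```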

## Amazon Simple Storage Service (S3)
Data is stored as objects, and objects are stored in buckets. Max object size is 5TB.

- S3 Standard
    - 11 9's of durability: designed for a 99.999999999% probability that an object still exists after 1 year
    - S3 static website hosting
- S3 Standard-Infrequent Access (S3 Standard-IA)
    - for data you access infrequently but need rapid access to when you do; storage costs less than S3 Standard, but retrievals cost more
- S3 Glacier Flexible Retrieval
    - when you don't need rapid access; long-term storage for many years
    - you can create vaults and populate them with archives. An S3 Glacier Vault Lock policy can help you meet compliance requirements by locking the vault, for example with a write once/read many (WORM) policy.
    - 3 options for retrieval, ranging from minutes to hours
    - Can upload to Glacier directly, or use lifecycle policies
        - Lifecycle policies can move data automatically between tiers

There are some other storage tools that they mentioned, but didn't delve into.
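Here's a rough sketch of what a lifecycle policy looks like. The dict follows the shape of an S3 lifecycle configuration (what you'd pass as `LifecycleConfiguration` to boto3's `put_bucket_lifecycle_configuration`), but the rule ID, the `logs/` prefix, and the day counts are made up. `storage_class_after` is a hypothetical helper that only roughly approximates how S3 applies a rule; it ignores details like minimum storage durations and overlapping rules. `GLACIER` is the storage class name for S3 Glacier Flexible Retrieval.

```python
from typing import Any, Dict

# Shaped like an S3 lifecycle configuration; the values are illustrative only.
LIFECYCLE_CONFIG: Dict[str, Any] = {
    "Rules": [
        {
            "ID": "archive-old-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
}

def storage_class_after(key: str, age_days: int, config: Dict[str, Any]) -> str:
    """Hypothetical helper: which class an object would sit in after age_days."""
    current = "STANDARD"  # new uploads start in S3 Standard here
    for rule in config["Rules"]:
        if rule["Status"] != "Enabled":
            continue
        if not key.startswith(rule["Filter"].get("Prefix", "")):
            continue
        # Walk transitions in order of age; the last one reached wins.
        for transition in sorted(rule["Transitions"], key=lambda tr: tr["Days"]):
            if age_days >= transition["Days"]:
                current = transition["StorageClass"]
    return current

print(storage_class_after("logs/app.log", 10, LIFECYCLE_CONFIG))     # STANDARD
print(storage_class_after("logs/app.log", 120, LIFECYCLE_CONFIG))    # GLACIER
print(storage_class_after("images/cat.png", 400, LIFECYCLE_CONFIG))  # STANDARD
```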
- S3 One Zone-IA - cheaper than Standard-IA, less redundancy, because it's only in 1 Availability Zone. Good if you can reproduce your data quickly (like with a DAG?)
- S3 Intelligent-Tiering - an automated policy that you can pay for per object to watch the object and shuffle it between tiers based on access frequency
- S3 Glacier Instant Retrieval - archival storage for rarely accessed data that can still be read within milliseconds
- S3 Glacier Deep Archive - lowest cost, archival, retrieve within 12-48 hours
- AWS Outposts - where they install a mini-region on site

## Comparing EBS & S3
- EBS
    - up to 16 TiB (tebibytes, i.e. 2^40 bytes each)
    - persistent
    - solid state by default
    - HDD options
- S3
    - unlimited storage
    - individual objects up to 5 TB
    - specializes in write once, read many
    - 11 9's of durability
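Two quick bits of arithmetic for the numbers above, in Python: how big 16 TiB is in decimal terabytes, and what 11 9's of durability works out to in practice. The 10 million object count is just an example, and the result is an average expectation, not a guarantee.

```python
from fractions import Fraction

TIB = 2 ** 40  # tebibyte (binary prefix)
TB = 10 ** 12  # terabyte (decimal prefix)

ebs_max_bytes = 16 * TIB
print(f"16 TiB = {ebs_max_bytes:,} bytes = {ebs_max_bytes / TB:.2f} TB")

# 11 nines of durability -> annual loss probability per object of 1e-11.
# Fractions keep the arithmetic exact instead of float-approximate.
annual_loss_rate = 1 - Fraction(99_999_999_999, 10 ** 11)
objects = 10_000_000  # example object count
expected_losses_per_year = objects * annual_loss_rate
years_per_lost_object = 1 / expected_losses_per_year
print(f"{objects:,} objects -> about one object lost every {years_per_lost_object} years")
```

So 16 TiB is roughly 17.59 TB, and with 10 million objects you'd expect, on average, to lose one object about every 10,000 years.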

Use cases:
- you run a website where users upload their picture, and you show them every animal that looks like them.
    - millions of photos, indexed, searchable
    - S3 is best. It's web enabled, regionally distributed, cheaper than EBS here, and serverless
- you have an 80 GB video file that you're editing. Here we need to discuss object storage vs. block storage
    - object storage treats any file as a discrete object, so any change means rewriting the whole object; there are no incremental (delta) updates
    - block storage breaks files into blocks, so when you edit your file, you don't have to rewrite the whole thing, just the changed blocks

So if you're working with complete objects or infrequent changes, go with S3. If you're doing complex read/write/change operations, go with EBS.
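Here's a back-of-the-envelope Python sketch of the 80 GB video example, counting the bytes that have to be written for a 1 MB edit. The 512 KiB block size and the edit position are assumptions for illustration, not EBS internals; the point is that the object store rewrites everything while the block store only rewrites the blocks the edit touches.

```python
BLOCK_SIZE = 512 * 1024  # assumed block size (512 KiB), illustration only

def bytes_written_object_store(file_size: int, edit_offset: int, edit_len: int) -> int:
    # Objects are replaced whole, so any edit means writing the full object again.
    return file_size

def bytes_written_block_store(file_size: int, edit_offset: int, edit_len: int,
                              block_size: int = BLOCK_SIZE) -> int:
    # Only blocks overlapping [edit_offset, edit_offset + edit_len) are rewritten.
    first_block = edit_offset // block_size
    last_block = (edit_offset + edit_len - 1) // block_size
    return (last_block - first_block + 1) * block_size

video_size = 80 * 10**9   # the 80 GB video
offset = 10 * 10**9       # edit starts 10 GB in (arbitrary)
length = 10**6            # a 1 MB edit

print(f"object store: {bytes_written_object_store(video_size, offset, length):,} bytes")
print(f"block store:  {bytes_written_block_store(video_size, offset, length):,} bytes")
```

The two functions keep the same signature on purpose so they're easy to compare, even though each ignores one of its arguments.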

## Amazon Elastic File System (EFS)
EFS is a managed file system. How does it differ from Elastic Block Store (EBS)? EBS volumes attach to EC2 instances, and they're an Availability Zone-level resource, so your EC2 instance needs to be in the same AZ as the EBS volume to attach it. You can save files to EBS, run a database on it, or store applications on it. It's a hard drive. If you fill it up, it won't scale up automatically.

EFS can have multiple EC2 instances reading and writing simultaneously. It's more "cloud." It's not a blank hard drive, it's an actual Linux file system (accessed over NFS). And it's a regional resource, not AZ-level. It also scales up automatically when necessary.
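A tiny Python model of the scoping difference: an EBS volume can only attach to an instance in its own AZ, while an EFS file system can be mounted from instances in any AZ of its region. It's a deliberate simplification (it ignores EFS mount targets, EBS Multi-Attach, and Local Zones), and the `Instance` class and helper functions are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instance:
    name: str
    az: str  # Availability Zone, e.g. "us-east-1a"

    @property
    def region(self) -> str:
        # Simplification: standard AZ names are the region name plus a letter.
        return self.az[:-1]

def can_attach_ebs(instance: Instance, volume_az: str) -> bool:
    # EBS volumes are AZ-level: the instance must be in the volume's AZ.
    return instance.az == volume_az

def can_mount_efs(instance: Instance, fs_region: str) -> bool:
    # EFS is regional: any instance in the same region can mount it.
    return instance.region == fs_region

web1 = Instance("web1", "us-east-1a")
web2 = Instance("web2", "us-east-1b")

print(can_attach_ebs(web1, "us-east-1a"), can_attach_ebs(web2, "us-east-1a"))  # True False
print(can_mount_efs(web1, "us-east-1"), can_mount_efs(web2, "us-east-1"))      # True True
```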

## Amazon Relational Database Service (RDS)
This is all about relational database management systems (RDBMS). AWS supports MySQL, PostgreSQL, Oracle, Microsoft SQL Server, and others. If you're already running one of those on premises, you can do a "lift-and-shift" migration, where you basically pick up your database and run it on an EC2 instance. This gives you the same setup for the database (same OS, memory, storage, etc.), just in the cloud. You can also use AWS Database Migration Service (AWS DMS).

Or you can use the managed service, RDS. It supports all major database engines, but comes with automated patching, backups, redundancy, failover, disaster recovery, all of which you'd have to manage otherwise.

There's also Amazon Aurora, an enterprise-class managed relational database. It comes in 2 flavors: MySQL-compatible and PostgreSQL-compatible. It's 1/10th the cost of commercial databases and offers data replication, up to 15 read replicas, continuous backup to S3, and point-in-time recovery. This is what we use, I need to learn more about this...

## Amazon DynamoDB
This is a serverless database, so you don't need to manage the instances under the hood. You just create tables to store and query data. But it doesn't use SQL; it's a non-relational (NoSQL) database. This can be good for slightly less rigid data that needs to be accessed at a very high rate. So it's fast and scalable, but not necessarily the best fit for every database job.
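To get a feel for key-based access, here's a toy in-memory Python model of a table with a partition key and a sort key, using the loyalty card idea (customer ID as the partition key, record type as the sort key). The method names echo DynamoDB's PutItem, GetItem, and Query operations, but this is not the DynamoDB or boto3 API, and the customer data is invented.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional

Item = Dict[str, Any]

class ToyKeyValueTable:
    """In-memory stand-in for a table with a partition key and a sort key."""

    def __init__(self) -> None:
        # partition key -> {sort key -> item}
        self._partitions: Dict[str, Dict[str, Item]] = defaultdict(dict)

    def put_item(self, pk: str, sk: str, item: Item) -> None:
        # Writes overwrite any existing item with the same (pk, sk).
        self._partitions[pk][sk] = item

    def get_item(self, pk: str, sk: str) -> Optional[Item]:
        # Direct lookup by full key: no scan, no join.
        return self._partitions.get(pk, {}).get(sk)

    def query(self, pk: str, sk_prefix: str = "") -> List[Item]:
        # One partition, items in sort-key order, optionally narrowed by a
        # sort-key prefix (similar in spirit to begins_with in DynamoDB).
        partition = self._partitions.get(pk, {})
        return [partition[sk] for sk in sorted(partition) if sk.startswith(sk_prefix)]

table = ToyKeyValueTable()
table.put_item("cust#42", "profile", {"name": "Ana", "tier": "gold"})
table.put_item("cust#42", "visit#2024-02-11", {"points": 30})
table.put_item("cust#42", "visit#2024-01-05", {"points": 12})

print(table.get_item("cust#42", "profile"))
print(table.query("cust#42", "visit#"))  # visits come back in sort-key (date) order
```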

## RDS vs. DynamoDB
- RDS
    - automatic HA, recovery provided
    - customer ownership of data
    - customer ownership of schema
    - customer control of network
- DynamoDB
    - key-value store
    - massive throughput
    - PB size potential
    - granular API access

- Relational databases have been around for decades. They can analyze complex data across multiple tables using SQL. Example: you have a sales supply chain management system. RDS is best because it's built for business analytics, and RDS does joins.
- DynamoDB is great for basically anything else? It's great for lookup tables. Example: you have an employee contact list with all their information. This is a single table. But I still don't understand what you'd do with this table if not join it to something else? Maybe that's just analytics-brain...
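For the supply chain case, here's what "RDS does joins" looks like in practice. This uses Python's built-in `sqlite3` as a stand-in for an RDS engine like MySQL or PostgreSQL, and the tables and rows are made up; the point is a single query that joins three tables and aggregates across them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE suppliers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT,
                            supplier_id INTEGER REFERENCES suppliers(id));
    CREATE TABLE sales     (product_id INTEGER REFERENCES products(id),
                            qty INTEGER);
    INSERT INTO suppliers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO products  VALUES (1, 'Widget', 1), (2, 'Gadget', 1), (3, 'Gizmo', 2);
    INSERT INTO sales     VALUES (1, 10), (2, 5), (3, 7), (1, 3);
""")

# Units sold per supplier: sales -> products -> suppliers.
rows = conn.execute("""
    SELECT s.name, SUM(sa.qty) AS units
    FROM sales sa
    JOIN products p  ON p.id = sa.product_id
    JOIN suppliers s ON s.id = p.supplier_id
    GROUP BY s.name
    ORDER BY units DESC
""").fetchall()
print(rows)  # [('Acme', 18), ('Globex', 7)]
conn.close()
```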

## Redshift
