Every Git commit records a user name and email address, so configure Git first:

```
git config --global user.name "abc"
git config --global user.email "xyz@blah.com"
```

To check your settings:
`git config --list`

---

### IAM: User, Group, Role and Policy

The main actors in IAM are users, groups, roles and policies. What you need to understand about AWS and never forget is that

> Everything in AWS is an API

To execute any API call, AWS first has to authenticate and then authorize the particular user or role that made it.

Ex: An operator wants to put an object into an S3 bucket. This happens through a set of API calls within AWS. Basically we call the S3 API and one of its methods to put the object into the particular bucket (say method `put_object_in_s3`). For that we provide the name of the bucket, the object and, most importantly, a set of credentials (username with password, or access key with secret key, etc.) that tells the AWS API Engine who the caller is.

The first thing the API Engine does is look at the credentials sent with the API call. It validates them (whether they are correct and active) to confirm that the request comes from an actual valid user or role. Now that it knows who sent the request, the API Engine takes the policy documents associated with that operator (user or role, including policies inherited from the user's groups) and evaluates them as a single view. That is, it checks whether the action called in the API is authorized for that operator.

__IAM user__ - In the context of IAM, a user is a “permanent” named operator (human or machine). What’s important to note is that its credentials (a username and password, or an access key and secret key) are permanent and stay with that named user. That is how AWS knows which authentication methods (username/password, access key, etc.) apply to this user.

__IAM group__ - A group is a collection of users. Note that a user can be in many groups as well.

__IAM roles__ - Roles are not permissions! A role is also an authentication method, just like an IAM user. Like a user, a role is also an operator (could be a human, could be a machine). The difference is that the credentials that come with a role are temporary.

__Policy Documents__ - As stated earlier, roles are not permissions. Permissions in AWS are handled entirely by objects called `Policy Documents`. Policy documents are JSON documents that can be attached directly to users, groups or roles. Only when a policy document is attached to one of these do they get permission to do anything. A policy document lists things like the specific API (or wildcard group of APIs) that is allowed, the resources it is allowed against, and conditions for those API executions (for example, allow only if the caller is on the home network, allow from any location, allow only at certain times of day, and so on).

> Last but not least, authentication in AWS is done via IAM users and roles (a group is only a collection of users), whereas authorization is done by policies.
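The authenticate-then-authorize flow described above can be sketched in a few lines of Python. This is a deliberately simplified model, not the real IAM engine: the policy, bucket name and `is_authorized` helper are hypothetical, conditions are ignored, and wildcard matching is approximated with `fnmatchcase`. It only illustrates the idea that all attached policies are evaluated together, that an explicit `Deny` wins, and that anything not allowed is denied by default.

```python
from fnmatch import fnmatchcase

# Hypothetical policy document, shaped like an IAM JSON policy.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Action": ["s3:GetObject", "s3:PutObject"],
         "Resource": "arn:aws:s3:::example-bucket/*"},
        {"Effect": "Deny",
         "Action": "s3:*",
         "Resource": "arn:aws:s3:::example-bucket/secret/*"},
    ],
}


def _as_list(value):
    """Policy fields may hold a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def is_authorized(policies, action, resource):
    """Simplified evaluation: explicit Deny wins, then any Allow, else deny."""
    allowed = False
    for policy in policies:
        for stmt in policy["Statement"]:
            action_match = any(fnmatchcase(action, a) for a in _as_list(stmt["Action"]))
            resource_match = any(fnmatchcase(resource, r) for r in _as_list(stmt["Resource"]))
            if action_match and resource_match:
                if stmt["Effect"] == "Deny":
                    return False  # an explicit deny overrides every allow
                allowed = True
    return allowed  # stays False when no statement allowed the call
```

For example, `is_authorized([POLICY], "s3:PutObject", "arn:aws:s3:::example-bucket/report.csv")` is allowed, while the same action on a key under `secret/` is denied by the second statement.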

---

### Difference between Region, Availability Zone (AZ) and Edge Location

- A region is a __physical location in the world__ consisting of 2 or more AZs
- An AZ is one or more discrete __data centers__, each with redundant power, networking and connectivity, housed in separate facilities
- Edge Locations are end points for AWS which are used for __caching content__. Typically, this consists of CloudFront, Amazon's Content Delivery Network (CDN)

---

### Key Concepts:

A __Virtual Private Cloud (VPC)__ is a virtual network dedicated to a single AWS account. It is logically isolated from other virtual networks in the AWS cloud, providing compute resources with security and robust networking functionality.

#### # of Edge Locations > # of Availability Zones > # of Regions

Some of the important services: Storage, Compute, Database, Network and Security, Identity and Roles

> Whenever we are doing anything with IAM the region is set to Global.

---

### IAM

- It is universal
- ROOT account has GOD mode. Enable MFA (Multi Factor Authentication) for the root user.

- **USER**
    - Two types of access:
        1. Programmatic access (access key ID and secret access key)
        2. Console access (password)
    - At the beginning, a user gets no permissions. We need to grant permissions via a policy document. For example, we can attach the password-change policy (`IAMUserChangePassword`) so that the user can change its own password.
    - We can add the user to group(s). A group can have different permissions/policies assigned, and those policies are automatically inherited by the users in that group.
    - Set the __password policy__ for users in account settings.

### BILLING ALARM (CLOUD WATCH)

We can use SNS (Simple Notification Service) with a CloudWatch billing alarm to automatically send us a notification if the bill exceeds the threshold we set.

### S3 (Simple Storage Service)

- It is **object-based** i.e allows you to upload files
- Files can be from 0 bytes to 5 TB
- There is unlimited storage
- Files are stored in **buckets** (folders)
- S3 is a universal namespace (bucket names must be unique globally). The console region shows `Global` when on S3, similar to IAM, although each bucket is created in a specific region
- A successful upload returns an HTTP 200 status code

**s3 objects** consist of the following -
* key (name of the object)
* value (this is simply the data and is made up of bytes)
* version id (important for versioning)
* metadata (info about the data)
* subresources
    * access control lists (access/permissions (e.g lock an object) on the bucket level as well as individual object level)
    * torrent
    
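**s3 data consistency model**
* Historically: read-after-write consistency for PUTS of new objects (as soon as you create an object, you can read it)
* Historically: eventual consistency for overwrite PUTS and DELETES (the change might take some time to be reflected)
* Since December 2020, S3 provides strong read-after-write consistency for all PUTS and DELETES, so older exam material describing eventual consistency is out of date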

**s3 storage classes or access tier**
1. s3 standard (4 9s availability i.e 99.99% availability; 11 9s durability)
2. s3 IA (Infrequently Accessed) e.g if we access something at the end of every month
    - Lower fee but retrieval charge
3. s3 One Zone IA (similar to deprecated RRS i.e Reduced Redundancy Storage)
    - We do not care about losing the data if something happens as only 1 AZ
4. s3 Intelligent Tiering
    - Uses machine learning to analyse data usage and automatically moves data to most cost effective access tier or storage class to reduce cost
5. s3 glacier
    - for data archival
    - we can configure retrieval time (between minutes and hours)
6. s3 glacier deep archive
    - lowest cost storage class (data retrieval can take up to 12 hours)
    
> Note: Read s3 FAQs as s3 is very important for exam

#### S3 storage expense wise

s3 standard > s3 IA > s3 IT > s3 one zone IA > s3 glacier > s3 glacier deep archive
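To make the availability figures in these notes concrete, the sketch below converts an availability percentage into the downtime it permits per year. It is plain arithmetic, assuming a 365-day year; the function name is illustrative. The two values used are the ones quoted in these notes: 99.99% for S3 Standard and 99.5% for One Zone-IA.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def max_downtime_minutes(availability_percent):
    """Minutes per year a service may be unavailable at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


standard_minutes = max_downtime_minutes(99.99)     # roughly 53 minutes a year
one_zone_hours = max_downtime_minutes(99.5) / 60   # roughly 44 hours a year
```

The gap between about an hour and almost two days per year is the trade-off paid for the cheaper single-AZ class.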

### S3 basics

#### Access control
To setup **access control** to S3 we can use -
1. Bucket Policy - works on bucket level
2. Access Control List (ACL) - works on individual object level

We can configure s3 bucket to log access requests. This log can be sent to another bucket and even another bucket in different AWS account.

#### Encryption
- `Encryption in transit` (HTTPS encryption) is achieved by
    - SSL/TLS
    
- `Encryption at rest (Server Side)` is achieved by
    - S3 Managed Keys (__SSE-S3__: Server Side Encryption): here Amazon manages the key
    - AWS Key Management Service (__SSE-KMS__): here we and Amazon together manage the keys
    - Server Side Encryption with Customer Provided Keys (__SSE-C__): here we give Amazon our own keys
    
- `Encryption at rest (Client Side)`
    - where we encrypt the object at our end to put it in S3
    
#### Versioning S3
- Great backup tool
- Once enabled on a bucket, versioning cannot be disabled, only suspended.
- MFA (multi factor authentication) Delete can be enabled so that permanently deleting an object version or changing the bucket's versioning state requires MFA
- Integrates with Lifecycle rules

SCENARIOS:

> If we upload the same file again with some changes, the file gets overwritten and the permission (if you have made the previous uploaded file public) is private (default behaviour). So, we need to make it public again. The first version will still be public and can be seen under "Version Show" button. And when we delete the object, we can still see the different versions and the latest one would have "delete marker" on it. If we delete the "delete marker" object from version table, then it gets removed from the stack and the second last gets to top and becomes the latest revision.

> When the same file is overwritten, the bucket's total size grows because every version is stored in full. So keep in mind that repeatedly updating a huge file multiplies the storage used (N versions take roughly N times the space). In that case, you might want to look into Lifecycle rules.
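The version stack and delete-marker behaviour in the scenarios above can be modelled with a small toy class. This is not the S3 API: `VersionedKey`, its methods and the integer version ids are all hypothetical, chosen only to show that a plain delete stacks a marker on top and that removing the marker makes the previous version current again.

```python
import itertools

DELETE_MARKER = "<delete marker>"


class VersionedKey:
    """Toy model of a single key in a versioning-enabled bucket."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.versions = []  # list of (version_id, data), oldest first

    def put(self, data):
        """Each upload adds a new version on top; nothing is overwritten."""
        version_id = next(self._ids)
        self.versions.append((version_id, data))
        return version_id

    def delete(self):
        """A plain delete only stacks a delete marker on top."""
        return self.put(DELETE_MARKER)

    def delete_version(self, version_id):
        """Permanently remove one specific version (or a delete marker)."""
        self.versions = [(v, d) for v, d in self.versions if v != version_id]

    def get(self):
        """Current data, or None when the latest version is a delete marker."""
        if not self.versions or self.versions[-1][1] == DELETE_MARKER:
            return None
        return self.versions[-1][1]


key = VersionedKey()
key.put("first upload")
key.put("second upload")
marker_id = key.delete()       # object now looks deleted, both versions remain
key.delete_version(marker_id)  # removing the marker restores "second upload"
```

Because every `put` keeps the older entries, the list grows by one full copy per upload, which is the storage growth the note above warns about.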

#### Lifecycle Rule

We can create lifecycle rule configuration that is applicable for whole bucket or for specific tags corresponding to objects within the bucket.

There are two types of actions we can take on bucket/objects:

1. Automatic **transition** to tiered storage (For example, you might choose to transition objects to the S3 Standard-IA storage class 30 days after you created them, or archive objects to the S3 Glacier storage class one year after creating them).
2. **Expire** your objects (Define when objects expire. Amazon S3 deletes expired objects on your behalf).

> Both actions are triggered (once enabled) "N" days after the object's creation.

We use lifecycle rule when - 

1. If you upload periodic logs to a bucket, your application might need them for a week or a month. After that, you might want to delete them.

2. Some documents are frequently accessed for a limited period of time. After that, they are infrequently accessed. At some point, you might not need real-time access to them, but your organization or regulations might require you to archive them for a specific period. After that, you can delete them.

3. You might upload some types of data to Amazon S3 primarily for archival purposes. For example, you might archive digital media, financial and healthcare records, raw genomics sequence data, long-term database backups, and data that must be retained for regulatory compliance.
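As a rough sketch of how the transition and expiration actions fit together, the function below builds a lifecycle configuration as a plain Python dictionary. The shape follows the structure that boto3's `put_bucket_lifecycle_configuration` call accepts; the helper name, the rule ID, the `logs/` prefix and the day counts are assumptions made for illustration, and the actual API call is left as a comment so the snippet runs without AWS credentials.

```python
def build_lifecycle_configuration(prefix, ia_after_days, glacier_after_days, expire_after_days):
    """Build a lifecycle configuration dict for objects under `prefix`."""
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("days must increase: IA < Glacier < expiration")
    return {
        "Rules": [
            {
                "ID": f"tier-and-expire-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Transition actions: move objects to cheaper storage classes.
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
                # Expiration action: S3 deletes the objects on our behalf.
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


config = build_lifecycle_configuration("logs/", 30, 365, 730)
# With boto3 installed and credentials configured, this could be applied as:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=config)
```

Here objects under `logs/` move to Standard-IA after 30 days, to Glacier after a year, and are deleted after two years.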

#### Object LOCK (Locking objects for future edits/delete for regulations/ critical data etc.) (WORM Storage)

- We can enable `Object Lock` only when **VERSIONING** is enabled for a bucket.
- We use Object Lock to enable WORM mode i.e Write Once Read Many. This prevents people from overwriting or deleting a version of an object. So you put one version and don't want it to be changed or deleted for some time (a retention period, or indefinitely with a legal hold)
- Nice video explanation (10mins): https://www.youtube.com/watch?v=d2UzxLoZW9I 
- There are a lot of combinations that can be set on Object Lock; to read about them, check the documentation at https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lock-overview.html
- Similarly, we have Glacier Vault Lock Policy for WORM mode; once locked, the policy cannot be changed.

#### S3 PERFORMANCE

1. **PREFIX**: Bear in mind while designing your architecture/bucket that request limits apply per prefix, so the more prefixes you spread objects across, the better the performance you can achieve.
   Example of prefix: mybucketname/folder1/subfolder1/abc.jpg > **/folder1/subfolder1**
2. **SSE-KMS**: Apart from prefixes, S3 performance also depends on **SSE-KMS**. If a bucket/object uses that encryption, requests count against the KMS quota, which depending on the region can be 5,500, 10,000 or 30,000 requests per second. This is because uploading a file calls `GenerateDataKey` in the KMS API and, similarly, downloading a file calls `Decrypt`.
3. **MULTIPART UPLOADS**: Recommended for files > 100MB and necessary for files > 5GB
4. **BYTE RANGE FETCHES**: Parallelize byte range downloads

> See Exam Tips of Chapter 18 for # of request per prefix and for SSE-KMS
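Multipart uploads and byte-range fetches both come down to splitting an object into byte ranges that can be transferred in parallel. The sketch below only plans those ranges and performs no transfer. The function names and the 100 MiB default part size are illustrative choices; the constants reflect the S3 multipart limits of at most 10,000 parts and a 5 MiB minimum for every part except the last.

```python
import math

MIB = 1024 ** 2
MAX_PARTS = 10_000        # S3 multipart uploads allow at most 10,000 parts
MIN_PART_SIZE = 5 * MIB   # every part except the last must be at least 5 MiB


def plan_parts(object_size, part_size=100 * MIB):
    """Return inclusive (start, end) byte ranges covering the whole object."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part size must be at least 5 MiB")
    # Grow the part size if the object would otherwise need too many parts.
    part_size = max(part_size, math.ceil(object_size / MAX_PARTS))
    return [
        (start, min(start + part_size, object_size) - 1)
        for start in range(0, object_size, part_size)
    ]


def range_header(start, end):
    """HTTP Range header value for one byte-range fetch."""
    return f"bytes={start}-{end}"


parts = plan_parts(int(7.5 * 1024 ** 3))  # the 7.5 GB file from the quiz below
first_range = range_header(*parts[0])
```

Each range can be uploaded as one part or downloaded with one ranged GET, so a failed chunk can be retried without restarting the whole transfer.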

#### S3 SELECT and GLACIER SELECT

- Used to retrieve only a subset of data. For example, if we have a zip file containing CSV files and we want only one file, then instead of **fetching** the entire zip, **decompressing** it and then reading the actual file, we can use S3 Select to query just that file. This saves money on data transfer and increases speed.

#### SHARE BUCKET

There are 3 different ways to share S3 buckets -

1. Using Bucket Policy and IAM (bucket level): Programmatic access only
2. Using Bucket ACLs and IAM (object level): Programmatic access only
3. Cross account IAM Roles: Console and Programmatic access

#### CROSS REGION REPLICATION (CRR) OF S3 BUCKET (put in one bucket; automatically in another as well)

* Not that important (understand just the high level)
* Versioning must be enabled on the source bucket for CRR
* Files already existing in the source bucket are not replicated automatically
* All subsequent uploads and updates are replicated automatically
* Delete markers and deletions of individual versions are not replicated
* Changing access of objects from private to public in source bucket doesn't impact the object in destination bucket.

#### TRANSFER ACCELERATION (to transfer files faster)

Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket. Transfer Acceleration takes advantage of Amazon CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to Amazon S3 over an optimized network path.

> When using Transfer Acceleration, additional data transfer charges may apply.

Why might you want it?

* You have customers that upload to a centralized bucket from all over the world.
* You transfer gigabytes to terabytes of data on a regular basis across continents.

#### DATA SYNC (like between on-premise server and AWS)

* Used to move large amounts of data from on-premise data center to AWS and vice versa
* Need datasync agent on the source to transfer data
* Used with **NFS** and **SMB** compatible file system
* Replication can be done hourly, daily or weekly
* Can be used to replicate EFS to EFS as well

#### CLOUD FRONT (for delivering content)

* CloudFront is basically a CONTENT DELIVERY NETWORK (CDN)
* CDN is a system of distributed servers that delivers content to a user based on location etc.
* KEY TERMINOLOGIES:
    - ORIGIN: This is the origin of all files that CDN will distribute. This can be either S3 bucket, EC2 instance, Elastic Load Balancer, or Route53
    - DISTRIBUTION: It's the name (domain name) given to CDN which is a collection of edge locations
    - Web Distribution: Typically used for websites
    - RTMP (Real Time Messaging Protocol): For Media Streaming

> Edge Locations are not just READ only. We can write to them too (i.e put objects on them). This we saw in Transfer Acceleration.

> Objects are cached for __TTL__ (Time To Live) value.

> You can clear (invalidate) cached objects, but you will be charged for it. For example, if we have uploaded a new video and users are still getting the old one, we can go in and clear the cached data.

Doc: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Introduction.html

#### CLOUD FRONT SIGNED URLs and COOKIES (e.g for premium users)

* Use signed URLs/cookies when you want to secure content so that only authorized people can access it.
* Signed URL is for single file. **1 file = 1 URL**
* Signed Cookie is for multiple files. **1 cookie = multiple files**
* If your origin is EC2, use CloudFront. If your origin is S3 and you have a single file for the user, then you can use an **S3 signed URL**

#### SNOWBALL (Migration of huge data)
* Import to S3, export from S3. Your requirement for Snowball can depend on the size of the data as well as the available internet connection. For example, if you have 2TB of data with a 44Mbps connection, then you might use it.
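A quick estimate shows why the 2TB over 44Mbps example can justify Snowball. The helper below is simple arithmetic with an illustrative name; it assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second) and a link that is fully dedicated to the transfer unless a lower `utilization` is given.

```python
def transfer_days(size_bytes, link_mbps, utilization=1.0):
    """Days needed to push `size_bytes` over a link of `link_mbps` megabits/s."""
    bits = size_bytes * 8
    seconds = bits / (link_mbps * 1_000_000 * utilization)
    return seconds / 86_400  # seconds per day


days_full_link = transfer_days(2 * 10 ** 12, 44)        # a little over 4 days
days_half_link = transfer_days(2 * 10 ** 12, 44, 0.5)   # twice as long at 50% utilization
```

If the link is shared with normal business traffic, the realistic figure is longer, which is when shipping a device starts to look attractive.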

#### Storage Gateway (move backups to the cloud)

Scroll to the bottom in the below URL for better info.

Doc: https://aws.amazon.com/storagegateway/?nc=sn&loc=0&whats-new-cards.sort-by=item.additionalFields.postDateTime&whats-new-cards.sort-order=desc

__Difference between DataSync and Storage Gateway__
(https://acloud.guru/forums/aws-csa-2019/discussion/-MB1MIgdt9ZxnKYajtub/whats%20is%20the%20difference%20between%20data%20sync%20and%20storage%20gateway%20%3F)
One is for optimized data movement, and the other is more suitable for hybrid architecture.

AWS DataSync is ideal for online data transfers. You can use DataSync to migrate active data to AWS, transfer data to the cloud for analysis and processing, archive data to free up on-premises storage capacity, or replicate data to AWS for business continuity.

AWS Storage Gateway is a hybrid cloud storage service that gives you on-premises access to virtually unlimited cloud storage.

You can combine both services. Use AWS DataSync to migrate existing data to Amazon S3, and then use the File Gateway configuration of AWS Storage Gateway to retain access to the migrated data and ongoing updates from your on-premises file-based applications.

#### ATHENA vs MACIE

ATHENA:
- It is an interactive query service i.e allows you to query on data stored in S3
- It is serverless
- Commonly used to **analyse log data** stored in S3

MACIE:
- It is a security service that uses AI to analyze data stored in S3 and helps **identify PII** (Personally Identifiable Information)
- Can also be used to analyse CloudTrail logs data for suspicious activity
- Includes dashboard, report and alerting

> _TIP: READ S3 FAQs in aws as S3 is really really an important topic for Associate Exams_

#### QUIZ ON S3

1. *Power user access allows access to all AWS services except management of users and groups within IAM*

2. You are a solutions architect working for a large engineering company that are moving from a legacy infrastructure to AWS. You have configured the company's first AWS account and you have set up IAM. Your company is based in Andorra, but there will be a small subsidiary operating out of South Korea, so that office will need its own AWS environment. Which of the following statements is true?
    - Correct Answer: You will need to configure **Users and Policy Documents only once**, as they are applied globally. (Here remember that IAM is GLOBAL)

3. You have created a new AWS account for your company, and you have also configured multi-factor authentication on the root account. You are about to create your new users. What strategy should you consider in order to ensure that there is good security on this account.
    - Correct Answer: Enact a **strong password policy**. user passwords must be changed every 45  days, with each password containing a combination of uppercase, numbers, special characters.
    
4. You have been asked to advise on a scaling concern. The client has an elegant solution that works well. As the information base grows they use CloudFormation to spin up another stack made up of an S3 bucket and supporting compute instances. The trigger for creating a new stack is when the PUT rate approaches 100 PUTs per second. The problem is that as the business grows that number of buckets is growing into the hundreds and will soon be in the thousands. You have been asked what can be done to reduce the number of buckets without changing the basic architecture.
    - Correct Answer: Change the trigger level to around 3500 PUTS as S3 can now accommodate much higher PUT and GET levels.
    - Explanation: Until 2018 there was a hard limit on S3 puts of 100 PUTs per second. To achieve this care needed to be taken with the structure of the name Key to ensure parallel processing. As of July 2018 the limit was raised to 3500 and the need for the Key design was basically eliminated. Disk IOPS is not the issue with the problem. The account limit is not the issue with the problem.
    
5. You run a meme creation website where users can create memes and then download them for use on their own sites. The original images are stored in S3 and each meme's metadata in DynamoDB. You need to decide upon a low-cost storage option for the memes, themselves. If a meme object is unavailable or lost, a Lambda function will automatically recreate it using the original file from S3 and the metadata from DynamoDB. Which storage solution should you use to store the non-critical, easily reproducible memes in the most cost-effective way?
    - Correct Answer: S3-1Z-IA
    - Explanation: S3 – OneZone-IA is the recommended storage for when you want cheaper storage for infrequently accessed objects. It has the same durability but less availability. There can be cost implications if you use it frequently or use it for short lived storage. Glacier is cheaper, but has a long retrieval time. RRS has effectively been deprecated. It still exists but is not a service that AWS want to sell anymore.
    
6. What is the availability of S3 – OneZone-IA?
    - Correct Answer: 99.50%
    - Explanation: OneZone-IA is only stored in one Zone. While it has the same Durability, it may be less Available than normal S3 or S3-IA
    
7. One of your users is trying to upload a 7.5GB file to S3. However, they keep getting the following error message: "Your proposed upload exceeds the maximum allowed object size.". What solution to this problem does AWS recommend?
    - Correct Answer: Design your application to use Multi-part upload API for all objects.
    - Explanation: Multipart upload is recommended for objects over 100 MB and required for objects over 5 GB. Also remember that the maximum object size is 5 TB.
    
8. AWS S3 has four different URLs styles that it can be used to access content in S3.  The Virtual Hosted Style URL, the Path-Style Access URL, the Static web site URL, and the Legacy Global Endpoint URL.  Which of these represents a correct formatting of the  Virtual Hosted Style URL  style
    - Correct Answer: https://bucket-name.s3.Region.amazonaws.com/abc.png
    - Explanation:
        Virtual Hosted: https://bucket-name.s3.Region.amazonaws.com/key_name
        
        Path Style: https://s3.Region.amazonaws.com/bucket-name/key_name
        
        Virtual style puts your bucket name 1st, s3 2nd, and the region 3rd.
        
        Path style puts s3 1st and your bucket name as the first part of the path (not as a subdomain).
        
        Legacy Global endpoint has no region.
        
        S3 static hosting can be your own domain or your bucket name 1st, s3-website 2nd, followed by the region.
        
        AWS are in the process of phasing out Path style, and support for Legacy Global Endpoint format is limited and discouraged. However it is still useful to be able to recognize them should they show up in logs. https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html
        
9. How many S3 buckets can I have per account by default?
    - Correct Answer: 100
    
10. What is the availability of objects stored in S3?
    - Correct Answer: 99.99%

11. What is AWS Storage Gateway?
    - Correct Answer: It is a physical or virtual appliance that can be used to cache S3 locally at customer's site.
    - Explanation: At its heart it is a way of using AWS S3 managed storage to supplement on-premise storage. It can also be used within a VPC in a similar way.
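The two URL formats from question 8 can be written as small string builders, which makes the difference in bucket position easy to see. The function names, bucket name and region below are placeholders; `urllib.parse.quote` is used only to percent-encode unusual characters in the key.

```python
from urllib.parse import quote


def virtual_hosted_url(bucket, region, key):
    """Virtual-hosted style: the bucket name is part of the host name."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


def path_style_url(bucket, region, key):
    """Path style: the bucket name is the first segment of the path."""
    return f"https://s3.{region}.amazonaws.com/{bucket}/{quote(key)}"


vh = virtual_hosted_url("bucket-name", "eu-west-1", "abc.png")
ps = path_style_url("bucket-name", "eu-west-1", "abc.png")
```

With these inputs, `vh` starts with `https://bucket-name.s3.` while `ps` starts with `https://s3.` and carries the bucket after the host.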



## EC2

- It's a virtual machine in the cloud.

### EC2 Pricing Types
1. **On-Demand**: pay by the hour or by the second (popular with developers)

2. **Reserved**: based on contract for 1 year or 3 years terms

    2.1 __Standard Reserved Instance__: After you select instance type, you cannot change it later.
    
    2.2 __Convertible Reserved Instance__: You can let's say move from t2 to r4 instance type in between the contract.
    
    2.3 __Scheduled Reserved Instance__: Let's say you run a school and you want the instances only between 9 and 10 when the students mostly login, then this type of ec2 is best for you.


3. **Spot**: it's like bidding for ec2. If you bid at that price, you'll have your instance(s). Depends on Amazon's own supply and demand. Useful for applications with flexible start and end time.
    > If Spot instance is terminated by Amazon (if they need that instance) then we will not be charged for partial hour of usage. However if we terminate by ourselves then we need to pay for the hour for which instance ran. 

4. **Dedicated Host**: dedicated machine for us and we can also pay on demand.

#### EC2 instance type Mnemonic:
FIGHT DR. MC. PXZ (in) AU

Basic Points on ec2

- __NAME__: EC2 inherits its name from the Tag
- __SUBNET__: We'll discuss this later in more detail but it decides the __AZ of ec2 instances__ within that region and the __range of IPs__ that can be assigned to ec2 instances in that subnet. Subnets fall under VPC.
- __SG__: Security group is like a virtual firewall to __control traffic__; basically deals with which __ip addresses__ can access the instance and through which __ports__.
- __ENCRYPTION (KEY PAIR)__: To connect to an ec2 instance, aws provides __asymmetric encryption__. To make it easy to understand, think of asymmetric as a __key pair__. Symmetric is just one key. You can think of asymmetric encryption as a padlock (public key) with a key (private key). When you lock a bike with a padlock, anyone can see the padlock (the public key is exposed to the public) but only you have the key (private key) that opens it. People can try to open your padlock, but the handshake won't happen.

## Security Group

* We cannot blacklist anything in SG but we can in NACL. For e.g we cannot configure in SG to block HTTP or SSH or certain IP address but these things can be done in NACL which we'll see in Networking section in more detail. But in SG everything is blocked by default, so we allow ports or IPs (this is different because we are whitelisting but we cannot blacklist as everything is blacklisted by default in the beginning).
* SGs are stateful while NACLs are stateless. This means that when an "Inbound" rule in an SG allows a request, the response traffic is automatically allowed back out, whereas in a NACL we have to set rules separately for "Inbound" as well as "Outbound".
* We can add multiple SGs to an instance. So for e.g if we have created 1 SG that only allows HTTP port and 1 SG that allows SSH and MySQL, and we have created an instance that needs all HTTP, SSH and MySQL then we add those SGs to our instance.
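The allow-list behaviour of Security Groups can be sketched as follows. This is a toy model rather than how AWS implements it: the group contents, CIDR ranges and `inbound_allowed` helper are hypothetical, and rules are reduced to a port plus a source range. It shows that everything is denied by default, that rules can only allow, and that attaching several groups to one instance gives the union of their rules.

```python
from ipaddress import ip_address, ip_network

# Hypothetical security groups: each rule is (port, allowed source CIDR).
WEB_SG = [(80, "0.0.0.0/0")]
ADMIN_SG = [(22, "203.0.113.0/24"), (3306, "10.0.0.0/16")]


def inbound_allowed(security_groups, port, source_ip):
    """True if any rule in any attached group allows the traffic."""
    source = ip_address(source_ip)
    return any(
        port == rule_port and source in ip_network(cidr)
        for group in security_groups
        for rule_port, cidr in group
    )


attached = [WEB_SG, ADMIN_SG]  # both groups attached to the same instance
http_ok = inbound_allowed(attached, 80, "198.51.100.7")   # allowed from anywhere
ssh_ok = inbound_allowed(attached, 22, "198.51.100.7")    # denied: source not in range
```

There is no way to express "block this address" in this model, which mirrors the point above that blacklisting needs a NACL.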

## EBS

__Replication__
- Each volume is automatically replicated within AZ to protect us from component failure.

### TYPES 
(based on VOLUME and IOPS; Space is cheap)

__SSD__

1. __General Purpose SSD__: balances price and performance; Volume of 1gb - 16tb with 16k iops
2. __Provisioned IOPS SSD__: high performance; for use-case that requires high storage and fast input/output operation like in case of database. Vol: 4gb-16tb with 64k iops

__HDD__: they give high storage compared to SSD but less iops

3. __Throughput Optimized HDD__: for high storage but less iops use case. Vol: 500gb-16tb with 500 iops
4. __Cold HDD__: cheapest; for infrequently accessed high storage but very less iops. Vol: 500gb-16tb with 250 iops
5. __Magnetic__: previous generation HDD; if for some reason you don't want glacier. very infrequently accessed.

AMIs can be used to create instance with either EBS as root volume or Instance Store as root volume. You can create instance with instance store as root and add ebs as extra later but you cannot create instance with ebs as root and then add instance store later.

## Encrypted Root Device Volume and Snapshots

Root device is where your OS is installed.

How to encrypt an unencrypted volume?

Create snapshot and then copy. While copying it asks for encryption again.
Remember copy is different from copy image.

Now we have encrypted copy of the volume as snapshot.

What we can do with snapshot?

We create snapshot of volume. We can use snapshot to create image (AMI) which can then be used to launch an instance.
1. We can copy the snapshot.
2. We can create ami
3. We can create volume

## CloudWatch

- Used for monitoring __performance__. Some of the examples it can monitor 

Compute:
    - EC2, Autoscaling, ELB, Route53

Storage and CDN:
    - EBS,S3, Storage gateway, CloudFront
    
- It also monitors applications running on AWS
- CloudWatch with EC2 monitors __every 5 minutes by default__ but can be set to __1 minute__ with detailed monitoring
- You can have cloudwatch alarms which triggers notifications

> CloudTrail on the other hand is like a cctv that audits on API calls i.e who did what as a user. CloudWatch is about performance and cloudtrail is about auditing.

To cover RAM utilization and disk usage, we need to create custom cloudwatch metrics.

## EFS

- It's a file storage system (popular for Linux)
- We can mount the same EFS file system on multiple EC2 instances, unlike EBS.
- It can shrink and expand as you remove and add files.

## FSx For Windows

- It's basically a file server for Windows
- It provides managed Windows file system so you can move your Windows based applications that requires file storage to AWS
- It's different from EFS as it manages __SMB (Server Message Block)__ based file services and is designed for Windows applications
- Amazon does not support EFS for EC2 instances running Windows

## FSx for Lustre

- When you need elastic storage for HPC (High Performance Computing) applications, Big Data, ML and applications that require high throughput and sub-millisecond latencies.
- It can store data directly on S3

## Summary of EFS and FSx
> EFS for Linux workloads

> FSx for Windows for windows workloads

> FSx For Lustre for high performance workloads

## Overall: 4 Ways to achieve HPC for our business

### 1. Data Transfer (To get our Data into AWS)
1.1 Snowball and Snowmobile (terabytes/petabytes worth of data)

1.2 AWS DataSync (puts an agent in on-premise system) to store data on S3, EFS, FSx for windows etc.

1.3 Direct Connect (creates dedicated line or direct network connection between on-premise and AWS)

---

### 2. Compute and Networking
2.1 EC2 instances that are CPU or GPU optimized

2.2 EC2 fleets (Spot instances and Spot Fleets)

2.3 Placement Group (putting EC2 instances in particular to __clustered PG__ to reduce latency)

---

2.4 Enhanced Networking: uses __SR-IOV__ (Single Root Input Output Virtualization) to provide higher PPS (packet per second) performance and reduced latency.
    
    2.4.1 ENA (10Gbps to 100Gbps) - Recommended
    
    2.4.2 Intel Virtual Function (VF) - For legacy instances (not recommended)

2.5 Elastic Fabric Adapter (__EFA__): uses __OS by-pass__ which makes it much faster with much lower latency. Not supported for Windows yet (only Linux).


### 3. Storage (all about IOPS)

__Instance-attached__ Storage

3.1 EBS: Scaled up to 64k IOPS with Provisioned IOPS type

3.2 Instance Store: Scaled to millions of IOPS; low latency

---

__Network-attached__ Storage

3.3 S3: object-based; not a filesystem

3.4 EFS: Scale IOPS based on total size or use Provisioned IOPS

3.5 FSx for Lustre: millions of IOPS, which is also backed by S3

### 4. Orchestration and Automation

4.1 AWS Batch

4.2 AWS ParallelCluster


## WAF (Web Application Firewall) [Security Guard similar to NACL]

- It lets us monitor HTTP(S) requests that are forwarded to CloudFront, ELB or API Gateway.
- Its 3 behaviours:
    - Allow all requests except the ones you specify
    - Block all requests except the ones you specify
    - Count the requests that match the properties you specify (passive mode so to say)

> These behaviours can be related to Security Group that only permits us to allow things you specify otherwise everything is blocked by default in SG i.e we cannot block things in SG as customization

- Some conditions that we can specify to WAF:
    * IP address that the requests originate from
    * Country that requests originate from
    * Values in request headers or query strings e.g id and name in http://acloud.guru?id=1000&name=test
    * Strings that appear in requests (using regex or specific string)
    * Length of requests
    * Presence of SQL code that is likely to be malicious (to protect against SQL injection)
    * Presence of script that is likely to be malicious (to protect against cross-site scripting)
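
The three behaviours and the match conditions above can be sketched in plain Python. This is only a toy model, not the WAF API: the `Rule` class, the `evaluate` function and the sample rules are hypothetical names made up for illustration, and it assumes a "count" rule records the match and lets evaluation continue.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Request = Dict[str, str]


@dataclass
class Rule:
    name: str
    matches: Callable[[Request], bool]
    action: str  # "allow", "block" or "count"


def evaluate(request: Request, rules: List[Rule], default_action: str) -> Tuple[str, List[str]]:
    """Return the final decision plus the names of the count rules that matched."""
    counted: List[str] = []
    for rule in rules:
        if not rule.matches(request):
            continue
        if rule.action == "count":
            # Passive mode: remember the match, keep evaluating.
            counted.append(rule.name)
            continue
        return rule.action, counted
    # No allow/block rule matched, so the default behaviour applies.
    return default_action, counted


# Hypothetical rules: block one IP, count long queries, block a crude SQL injection string.
RULES = [
    Rule("bad-ip", lambda r: r["ip"] == "203.0.113.7", "block"),
    Rule("long-query", lambda r: len(r["query"]) > 100, "count"),
    Rule("sqli", lambda r: "' or 1=1" in r["query"].lower(), "block"),
]
```

With `default_action="allow"` this models "allow all requests except the ones you specify"; passing `"block"` with allow rules models the opposite behaviour.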

## Databases 101

RDS (Online Transaction Processing) __(OLTP)__. Below are the different flavors:
1. SQL Server
2. MySQL
3. Oracle
4. PostgreSQL
5. Aurora
6. MariaDB

RDS has 2 key features:
1. Multi AZ (for disaster recovery) - allows you to have an exact copy in another AZ
2. Replicas (for performance) (consider example of viral blog)

Non Relational Database 
1. DynamoDB (NoSQL)

Redshift (Online Analytical Processing) __(OLAP)__ (for Business Intelligence and Data Warehousing)

ElastiCache (Amazon's caching solution) (to speed up the performance of an existing database)
1. Memcached
2. Redis

---

### RDS Backups
- Automatic: done at scheduled maintenance window
- Snapshot: done manually
> Restoring backup would result in new RDS instance with new DNS endpoint

### RDS Multi AZ
- Allows you to have an exact copy in another AZ
- Not for improving performance
- We can force failover from one AZ to another by rebooting RDS instance 

### RDS Read Replica
- For increasing performance: provides a read-only copy of the production database (read-only because we write to the primary, and the primary's changes get replicated to the copies)
- Used for very read-heavy database workloads (like a viral blog)
> to improve the performance of RDS we use Read Replicas and also ElastiCache
- Must have automatic backups turned on for read replicas
- up to 5 read replicas of any db
- read replicas can have their own read replicas but with latency
- read replicas will have their own DNS end point
- read replicas can have multi AZ
- read replica can be in separate region from primary db
- read replicas can be upgraded to be their own databases; this breaks replication
- available for MySQL, Oracle, MariaDB, PostgreSQL, Aurora (not available for Microsoft SQL Server as of now)

## DynamoDB

- Stored on SSD (hence very fast)
- Spread across 3 geographically distinct data centers (redundancy)
- 2 different types of read models
    1. Eventually Consistent Read (default): consistency across all copies of the data is usually reached within a second, so a read issued right after a write might still return stale data
    2. Strongly Consistent Read: returns a result that reflects all writes that received a successful response prior to the read
> __1 second rule:__ If you need to read the data within a second of writing it, choose Strongly Consistent Read; if seeing the update after about a second is fine, choose Eventually Consistent Read
    
## RedShift

- petabyte scale data warehousing service
- just like Jenkins, Redshift can be configured as _single node_ or _multi node, where you have a leader node (manages client connections) and compute nodes (store data and perform queries and computations)_. Up to 128 compute nodes.

## DMS (Database Migration Service)

- it's a cloud service that can be used to upload your database to cloud or off-load from cloud to on-premise or between combination of cloud and on-premise setups

## Caching Service

- The following services have caching capabilities
    - CloudFront (caches your media files, videos, pics etc at edge locations near your end user)
    - API Gateway
    - ElastiCache - consists of Memcached and Redis
    - DynamoDB Accelerator (DAX)
- Caching is a balancing act between __up-to-date, accurate information__ and __latency__

![caching](./resources/caching_services.png)

The point of this diagram is that the deeper into the stack a request has to travel before it hits a cache, the more latency it incurs. Here caching is possible at the CloudFront, API Gateway, Lambda, ElastiCache and DynamoDB levels.

## EMR (Elastic Map Reduce)

- It is an industry-leading cloud-based platform for processing vast amounts of data using open source tools such as Apache Spark.
- The central component of EMR is the cluster, which always has a master node.
- If it is multi-node cluster then it'll have master node, one or more core node(s) and optional task nodes
- Master node manages cluster, stores the log, monitors the health of the cluster
- Core node performs tasks and stores data in the Hadoop Distributed File System (HDFS) on your cluster
- Task node just performs tasks and doesn't store data

## AWS Directory Service

__AD Compatible__:
- Managed Microsoft AD (aka Directory Service for Microsoft Active Directory)
- AD Connector
- Simple AD

__Not AD Compatible__:
- Cloud Directory
- Cognito user pools

## IAM Policies

![ARN](./resources/arn.png)

## Resource Access Manager (RAM)
- Allows access of certain resources from one AWS account to another
- For example, we can share an Aurora DB with another AWS account (say account2) by creating a resource share in RAM from account1 and accepting the invitation in account2; the shared DB then shows up in account2, where we can clone it to use it.

## SSO (Single Sign On)
- It helps centrally manage access to AWS accounts and business applications
- Allows sign on to any SAML 2.0 enabled applications

## Route53
- Route 53 is a global service
- NS Records and SOA
    - When we look up, say, abc.com, the top-level domain server (in this case .com) does not hold the IP address itself; it returns the NS records, which point to the domain's authoritative name servers. We then query those name servers, which hold the Start of Authority (SOA) record and the DNS records for the domain.
    - DNS record consists of different things like 
        - __A-record (Address record)__ is used by computer to translate domain name to IP address.
        - __CName (Canonical Name)__ resolves one domain name to another. E.g. in a phone-book analogy, looking up Batman points you to the entry for Bruce Wayne
        - __Alias record__ is similar to CName in that you can map one domain name to another. It is mainly used to __map a resource record set in your hosted zone__ to ELBs, CloudFront distributions or S3 buckets that are configured as websites
        > Difference between CName and Alias record: a CName can't be used for naked domain names (zone apex record). For naked domain names, it must be either an A record or an Alias record.
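
How a CNAME points at another name that finally resolves through an A record can be sketched with a tiny in-memory resolver. The `RECORDS` table and the `resolve` function are hypothetical illustrations, not Route 53 APIs, and `example.com` / `192.0.2.10` are placeholder values.

```python
from typing import Dict, Tuple

# (name, record type) -> value; a toy hosted zone.
RECORDS: Dict[Tuple[str, str], str] = {
    ("www.example.com", "CNAME"): "example.com",
    ("example.com", "A"): "192.0.2.10",
}


def resolve(name: str, records: Dict[Tuple[str, str], str], max_hops: int = 8) -> str:
    """Follow CNAME records until an A record yields an IP address."""
    for _ in range(max_hops):
        if (name, "A") in records:
            return records[(name, "A")]
        if (name, "CNAME") in records:
            # A CNAME only renames the query; keep resolving the new name.
            name = records[(name, "CNAME")]
            continue
        raise KeyError(f"no A or CNAME record for {name}")
    raise RuntimeError("CNAME chain too long")
```

Note that the zone apex (`example.com`) carries the A record here, in line with the rule that a CNAME cannot sit on the naked domain.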

Common DNS Types
1. NS Records
2. SOA Records
3. A Records
4. CNAMES
5. MX Records (used for mails)
6. PTR Records (opposite of A record i.e way of looking up from IP address to domain name)

Routing Policies
1. Simple routing
2. Weighted routing
3. Latency-based routing
4. Failover routing
5. Geolocation routing
6. Geoproximity routing (traffic flow only) (complicated; less important)
7. Multi-value answer routing

## VPC

- You can additionally create a hardware virtual private network connection between your corporate datacenter and your VPC and leverage AWS cloud as an extension of your datacenter

- We have two lines of security in the VPC: NACL and SG
    - NACL is stateless (return traffic must be allowed explicitly); it supports both allow and deny rules, so we can block specific IP addresses
    - SG is stateful
    
- Below is an example of VPC consisting of internet gateways, route table, NACL, SG, public subnet and private subnet.

![vpc](./resources/vpc1.png)

- The box in the top right of the diagram shows the ranges of private network allowed.

__VPC Peering__
- Allows you to connect one VPC with another via direct network route using private IP address
- Instances behave as if they are on same network
- You can peer VPCs with other AWS account and with other VPCs in the same account
- You can peer between regions as well
- Peering is in star configuration; NO TRANSITIVE PEERING
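
The "no transitive peering" rule can be made concrete with a small model. This is just a sketch: `can_reach` and the VPC names are hypothetical, and a peering is represented as an unordered pair.

```python
from typing import FrozenSet, Set


def can_reach(src: str, dst: str, peerings: Set[FrozenSet[str]]) -> bool:
    """Two VPCs can talk only if they are directly peered (no hops through a third VPC)."""
    return src == dst or frozenset((src, dst)) in peerings


# vpc-b is the hub of a star: it is peered with vpc-a and with vpc-c.
PEERINGS = {frozenset(("vpc-a", "vpc-b")), frozenset(("vpc-b", "vpc-c"))}
```

Here `vpc-a` reaches `vpc-b`, and `vpc-b` reaches `vpc-c`, but `vpc-a` cannot reach `vpc-c` through `vpc-b`; that pair would need its own peering connection.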

> 1 subnet = 1 AZ; You can have multiple subnets in one AZ but a subnet cannot span multiple AZs

> We can attach only 1 internet gateway to 1 VPC

__DEFAULT__:
- While creating VPC, we get __route table, NACL and SG__ by default

![](./resources/default_vpc.png)

- By default a newly created subnet has "Auto Assign Public IP" set to No, meaning it is effectively private: any EC2 instance launched in it gets no public IP and thus can't be reached over SSH from the internet. For an EC2 instance to be reachable over SSH from outside, this needs to be set to Yes.

> Interesting: My AZ name, say us-east-1a, could be completely different AZ from your us-east-1a as it is randomized by AWS so that everyone doesn't select us-east-1a as their default choice.

__Route Table__:
- A route table contains a set of rules, called routes, that are used to determine where network traffic from your subnet or gateway is directed. To put it simply, a route table tells network packets which way they need to go (target) to get to their destination.
- By route table remember the subnet route table and gateway route table. 
- Short explanation: https://medium.com/awesome-cloud/aws-vpc-route-table-overview-intro-getting-started-guide-5b5d65ec875f
- Detailed and good explanation: https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Route_Tables.html 
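
The idea that a route table maps a destination to a target can be sketched with the standard `ipaddress` module. This is a simplified model that picks the most specific matching route (longest prefix); `pick_target`, `ROUTES` and the target name `igw-example` are placeholders for illustration.

```python
import ipaddress
from typing import Dict


def pick_target(routes: Dict[str, str], destination_ip: str) -> str:
    """Return the target of the most specific route that contains the destination."""
    ip = ipaddress.ip_address(destination_ip)
    best_net = None
    best_target = None
    for cidr, target in routes.items():
        net = ipaddress.ip_network(cidr)
        if ip in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_target = net, target
    if best_target is None:
        raise LookupError(f"no route for {destination_ip}")
    return best_target


# A typical public-subnet table: local traffic stays in the VPC, the rest goes to the IGW.
ROUTES = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-example"}
```

A packet for `10.0.1.25` matches both routes, but the `/16` is more specific, so it stays `local`; anything else falls through to the internet gateway.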

#### NAT (Network Address Translation) Instance
- Need to disable the source/destination check. By default an EC2 instance must be the source or destination of any traffic it sends or receives, which is why the check is enabled in the instance's networking settings. A NAT instance acts as a gateway between the EC2 instances in the private subnet and the internet gateway, so it forwards traffic that is neither from nor to itself, and the check has to be disabled.
- The problem with a NAT instance is that it can become a bottleneck if a lot of private instances route through it, and it is a single point of failure (i.e. a single instance in a single AZ)

#### NAT Gateway
- To tackle the above problems we have the NAT Gateway, a managed service that does not depend on a single EC2 instance and is redundant within its AZ

#### VPC FlowLogs
- It's a way of storing information about VPC traffic in a log
- Can be created at 3 levels:
    1. VPC
    2. Subnet
    3. ENI (Elastic Network Interface)
    
#### Bastion Host
- Special purpose computer on a network designed and configured to withstand attacks
- demilitarized zone (DMZ) = public subnet
- A Bastion is a way of ssh'ing or rdp'ing into the instances in your private subnet
- A Bastion host allows you to securely administer (via SSH or RDP) an EC2 instance located in a private subnet. Don't confuse Bastions with NATs: a NAT lets instances in a private subnet reach the internet, while a Bastion lets an administrator reach those instances.

_High Availability Bastion_

Option 1:
- 2 bastion hosts in 2 separate AZ. Use Network LB with static IP and health checks to failover from 1 host to another
- Can't use an Application LB, as it is Layer 7 and SSH/RDP need Layer 4; the Network LB makes this the more expensive option

![option1](./resources/option1_bastion.png)

Option 2:
- 1 host in 1 AZ and behind ASG with health check and fixed EIP. If the host fails, health checks will fail and ASG will provision new host in separate AZ. You can use user data script to provision the same EIP to the new host
- This is the cheapest option but not 100% fault tolerant as the ASG will take some minutes to provision new host and there will be some downtime because of that

![option2](./resources/option2_bastion.png)

#### Global Accelerator
- Service to accelerate the performance of applications for local and global users
- It provides two static anycast IP addresses but you can provide your own
- Normally a request may take many network hops to reach the application. Global Accelerator uses the AWS global network to remove these inefficiencies.

Components:
Static IP, Accelerator(points static ip to dns), DNS Name (for the ip address), Network Zone (similar to AZ), Listener (listens to the ports), Endpoint Group, and Endpoint

![](./resources/accelerator1.png)
![](./resources/accelerator2.png)

#### VPC Private Link

Sharing applications across VPCs

To open our applications to other VPCs we can either:
1. Open the VPC to the internet
    - Disadvantages:
        - Security considerations; everything in the public subnet is public
        - A lot more to manage: SGs, NACLs, possibly a WAF (web application firewall), etc.
2. Use VPC Peering
    - Disadvantages:
        - Will have to create and manage many peering relationships
        - The whole network will be accessible
3. Use VPC Private Link
    - Best option to tackle the above disadvantages when connecting to more than 10 VPCs

#### Transit Gateway

![](./resources/transit_gateway.png)
![](./resources/transit_gateway2.png)

#### Summary
- 1 subnet = 1 AZ
- 1 VPC = 1 IGW
- SG can't span VPCs
- 1 subnet = 1 NACL but not vice-versa
- 1 ELB needs >= 2 public subnets
- for one vpc to talk to another vpc we have i) vpc peering ii) vpc private link iii) transit gateway
- for a vpc to talk to s3 or dynamodb outside it we have - vpc endpoint

## ELB

Whenever you have ELB, think of below 3 options:

1. __Sticky Session__ - The diagram below shows a common scenario where you as admin see that no traffic is sent to instance two. What could be the reason? Well, sticky sessions are enabled, so in this scenario you would disable them.
![](./resources/sticky_session.png)

2. __Cross Zone Load Balancing__

Scenario 1: your instance in another AZ is not getting any traffic. What could be the reason? Well, cross zone is disabled.

![](./resources/cross_zone_scenario1.png)

Scenario 2:
![](./resources/cross_zone_scenario2.png)
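
The effect of cross-zone load balancing can be worked out with a little arithmetic. The sketch below is a simplified model, assuming the load balancer has a node in every listed AZ, that client traffic is split evenly between those nodes, and that every AZ has at least one instance; `per_instance_share` is a hypothetical helper name.

```python
from fractions import Fraction
from typing import Dict


def per_instance_share(instances_per_az: Dict[str, int], cross_zone: bool) -> Dict[str, Fraction]:
    """Share of total traffic that ONE instance in each AZ receives."""
    if not instances_per_az or any(n <= 0 for n in instances_per_az.values()):
        raise ValueError("every AZ needs at least one instance in this model")
    if cross_zone:
        # Every node spreads traffic over all instances in all AZs.
        total = sum(instances_per_az.values())
        return {az: Fraction(1, total) for az in instances_per_az}
    # Each node only serves its own AZ, so an AZ's slice is split among its instances.
    az_share = Fraction(1, len(instances_per_az))
    return {az: az_share / n for az, n in instances_per_az.items()}
```

For 2 instances in one AZ and 8 in another, disabling cross-zone gives each instance in the small AZ a quarter of all traffic, while enabling it evens things out to a tenth per instance.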

3. __Path Pattern__

Scenario: when someone hits URL to fetch images
![](./resources/path_pattern.png)
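
Path-pattern routing boils down to "first matching pattern picks the target group". A minimal sketch using the standard `fnmatch` module follows; the rule list, the target group names and `pick_target_group` are hypothetical, and `fnmatch` wildcards are only an approximation of the load balancer's own pattern syntax.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple


def pick_target_group(path: str, rules: List[Tuple[str, str]], default: str) -> str:
    """Return the target group of the first rule whose pattern matches the path."""
    for pattern, target_group in rules:
        if fnmatchcase(path, pattern):
            return target_group
    # No rule matched: fall back to the listener's default target group.
    return default


# Image requests go to a dedicated target group, everything else to the default one.
PATH_RULES = [("/images/*", "tg-images"), ("/api/*", "tg-api")]
```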

## Autoscaling Group

- It has 3 components:
1. Group: Webserver group, database group, application group etc.
2. Configuration Templates: Group uses this template to launch EC2 instances. You specify AMI ID, instance type, SG, key pair and block device mapping for your instance.
3. Scaling options

    3.1 Maintain current instance levels at all times    
    3.2 Scale manually (most basic one where you tell how many you want i.e desired capacity and ASG takes care of terminating or scaling up)    
    3.3 Scale based on schedule (e.g monday morning 9am)    
    3.4 Scale based on demand - most used one - uses scaling policies (e.g scale up if CPU utilization > 90%)    
    3.5 Predictive scaling - uses ML to predict expected traffic and EC2 usage 
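
Demand-based scaling (3.4) can be sketched as a function from a metric to a new desired capacity. This is only an illustration of the idea, not how the ASG service is implemented: `desired_capacity`, the one-instance step size and the 30% scale-in threshold are assumptions; only the 90% CPU example comes from the notes.

```python
def desired_capacity(
    current: int,
    cpu_percent: float,
    minimum: int,
    maximum: int,
    scale_out_above: float = 90.0,
    scale_in_below: float = 30.0,
) -> int:
    """Add or remove one instance based on CPU, staying within the group's min/max."""
    if cpu_percent > scale_out_above:
        target = current + 1
    elif cpu_percent < scale_in_below:
        target = current - 1
    else:
        target = current
    # The group never goes below its minimum or above its maximum size.
    return max(minimum, min(maximum, target))
```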
    
---

ELB has target groups and ASG has launch configuration

![](./resources/asg_example.png)

## On premise services with AWS
1. DMS (Database Migration Service)
2. SMS (Server Migration Service)
3. Application Discovery Service
4. VM Import/Export
5. Download Amazon Linux 2 as an ISO

__1. DMS__
- Allows you to move databases to and from AWS
- Supports homogeneous (e.g. Oracle to Oracle) as well as heterogeneous (e.g. SQL Server to Aurora) migrations

__2. SMS__
- Incremental migration of your servers to AWS
- Can be used as a backup tool or as part of a multi-site strategy (on-premises and off-premises)

__3. Application Discovery Service__
- You install ADS agentless connector as a virtual appliance on VMWare vCenter which builds a server utilization map and dependency map of your on-premise environment
- You can export this data as CSV to estimate the Total Cost of Ownership (TCO) of running on AWS and to plan your migration to AWS
- So in rough words it's basically installing software that tracks utilization to estimate cost of migration to AWS

__4. VM Import Export__
- Migrating your applications in to EC2
- You can also export your AWS VMs to your on-premise data center

__5. Download Amazon Linux 2 as an ISO__
- Allows you to download your amazon linux 2 as an ISO
- Supports major virtualization providers like VMware, Oracle VirtualBox, Hyper-V, KVM etc.

## SQS
- It's a way of storing messages in a queue
- Using SQS you can __decouple__ the components of an application so they can run independently, _easing message management between components_
- Messages can contain up to __256KB__ of text in any format (larger payloads can be stored in S3) and can be retrieved by any component using the SQS API

__2 types of Queues__
1. Standard queue (default)
    - nearly unlimited number of transactions per second
    - guarantees that a message is delivered at least once
    - your application needs to be able to handle 2 things
        - message could be delivered out of order
        - multiple copies of the same message could be delivered
        ![](./resources/standard_sqs.png)
        
2. FIFO
    - if your application can't cope with the behaviour in the diagram above (unordered delivery and multiple copies), then FIFO is for you
    - they are limited to 300 transactions per second
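
Because a standard queue delivers at least once, a consumer has to tolerate duplicates. One common way is to make processing idempotent by remembering message IDs. The sketch below uses plain Python tuples instead of the SQS API; `process_once` and the sample messages are hypothetical.

```python
from typing import Callable, Iterable, Optional, Set, Tuple


def process_once(
    deliveries: Iterable[Tuple[str, str]],
    handler: Callable[[str], None],
    seen: Optional[Set[str]] = None,
) -> Set[str]:
    """Call handler once per message ID, skipping duplicate deliveries."""
    seen = set() if seen is None else seen
    for message_id, body in deliveries:
        if message_id in seen:
            continue  # duplicate delivery of a message we already handled
        seen.add(message_id)
        handler(body)
    return seen


# Out-of-order delivery with one duplicate, as a standard queue may produce.
handled = []
process_once([("m2", "ship order"), ("m1", "create order"), ("m2", "ship order")], handled.append)
```

This only handles the duplicate-copy problem; if ordering matters too, the notes' advice to pick a FIFO queue still applies.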

## KMS (Key Management Service)

Source of these notes: https://www.youtube.com/watch?v=ksnHLFxgXcI 

![kms use](./resources/kms_0.png)

---

![kms_1](./resources/kms_1.png)

---

__KMS EXAMPLE: S3 SSE (Server Side Encryption)__

![kms example](./resources/kms_example.png)

__Key Store: KMS HSM vs CloudHSM__

While a CMK (customer master key) stored in the KMS HSM is fully managed by AWS, a CMK stored in a custom key store (i.e. CloudHSM) is under our full control (life cycle, backups across AZs, etc.) and needs to be managed by the user.

![](./resources/kms_hsm_vs_cloud_hsm.png)

## Some AWS Services as Categories

![aws_services](./resources/aws_services.png)

## Lambda

It can be used as an:
- Event-driven compute service (triggers); these events could be changes in S3 or DynamoDB
- As a compute service to run in response to HTTP requests using Amazon API Gateway

## Exam Tips on EC2

__Termination__
- Termination protection is turned off by default. You must turn it on to protect from accidental termination.
- On an EBS-backed instance (i.e. one whose root volume is EBS), the default action is for the root EBS volume to be deleted on instance termination. Additional volumes attached to the instance are not deleted by default; tick "delete on termination" if you want them removed as well.

## Exam Tips on SG

- Every time you change a rule in an SG, the change takes effect immediately. E.g. if I remove the rule for the HTTP port, that change is effective immediately.
- Multiple EC2 instances can use one SG and multiple SGs can be attached to one EC2 instance (many-to-many relationship between EC2 and SG).
- SGs are stateful, i.e. if an inbound rule allows traffic in, the response traffic is automatically allowed back out.
- You cannot block a specific IP address; use a NACL for that.
- All inbound traffic is blocked by default. You can specify allow rules but not deny rules.
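
The "allow rules only, everything else denied" behaviour can be sketched with the standard `ipaddress` module. The rule tuples and `sg_allows_inbound` are a hypothetical simplification (CIDR source plus a port range, no protocols or SG-to-SG references).

```python
import ipaddress
from typing import List, Tuple

# (source CIDR, first port, last port); there is no way to express a deny.
InboundRule = Tuple[str, int, int]


def sg_allows_inbound(rules: List[InboundRule], source_ip: str, port: int) -> bool:
    """Traffic is allowed only if some rule matches; otherwise it is implicitly denied."""
    ip = ipaddress.ip_address(source_ip)
    return any(
        ip in ipaddress.ip_network(cidr) and low <= port <= high
        for cidr, low, high in rules
    )


# HTTPS from anywhere, SSH only from one office range.
SG_RULES = [("0.0.0.0/0", 443, 443), ("203.0.113.0/24", 22, 22)]
```

Since a rule can only widen access, there is no rule you could add to `SG_RULES` that would block a single address from port 443; that is the job of a NACL.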

## Exam Tips on EBS and Snapshots

- _How to move/copy ebs from one AZ to another?_ (__MIGRATION__):
    1. Create its snapshot
    2.1. Create image (AMI) that can be deployed to other AZs.
    > Make sure you select Virtualization type as HVM as it gives us different instance type while creating the ec2 out of that image
    
    The image created goes under AMI and we can use that AMI to launch our instance in different subnet.
    2.2. Create AMI and copy that to different AZ (while copying it asks for destination region)
    
> SUMMARY: create snapshot; create AMI and launch or create AMI and copy
- Snapshots are stored on S3
- Snapshots are incremental - this means only the blocks that have changed since your last snapshot are moved to S3
  For e.g if you have f1 as file in the snapshot and next time you add f2 file in the volume and take snapshot then only the delta would be replicated to S3.
- While creating snapshots especially for root volumes, best practice is to stop the instance and then create snapshot. However, you can create it on running instance as well.
- AMIs from snapshot is possible
- We can change volume size on the fly including volume type
- Volume will ALWAYS be in the same AZ as the ec2 instance
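
The incremental nature of snapshots (the f1/f2 example above) can be illustrated by diffing two block maps. This is a toy model, not how EBS stores data: volumes are dictionaries from block index to content and `changed_blocks` is a hypothetical helper.

```python
from typing import Dict


def changed_blocks(previous: Dict[int, str], current: Dict[int, str]) -> Dict[int, str]:
    """Return only the blocks that are new or different since the previous snapshot."""
    return {
        index: data
        for index, data in current.items()
        if previous.get(index) != data
    }


first_snapshot = {0: "f1"}
volume_now = {0: "f1", 1: "f2"}
# Only block 1 (holding f2) needs to be copied for the second snapshot.
delta = changed_blocks(first_snapshot, volume_now)
```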

## Exam Tips on Instance Store Volumes
- AMI root volume types can be either EBS or instance store
- They are called Ephemeral Storage
- An instance created with an Instance Store root volume cannot be stopped, unlike an EBS-backed one. If the underlying host fails, you will lose data. However, you can reboot an instance with this volume, just like EBS.
- By default, both ebs and isv root volumes are deleted on instance termination but with ebs we can change that.
- we can add more instance store volumes only while creating the ec2 instance from ami with instance store as root but not after the ec2 instance has been created. However, we can add ebs volume to it after the instance is created.

## Exam Tips for Network Adapters (ENI vs ENA vs EFA)

Elastic Network Interface (ENI):
- For basic networking. Perhaps you need a management network separate to your production network or a separate logging network and you need to do this at __low cost__. In this scenario use multiple ENIs for each network.

Elastic Network Adapter (ENA, Enhanced Networking):
- For when you need speed b/w __10Gbps to 100Gbps__. Anywhere you need reliable, high throughput.

Elastic Fabric Adapter:
- For when you need to accelerate __High Performance Computing (HPC)__ and __Machine Learning__ applications or if you need to do an __OS by-pass__. If you see a scenario question asking HPC, ML or OS by-pass, choose EFA
- It supports only Linux and not Windows currently

## Exam Tips for Encrypted Root Device Volumes and Snapshots
- Snapshots of encrypted volumes are encrypted automatically
- Volumes restored from encrypted snapshots are encrypted automatically
- Snapshots can be shared only if they are unencrypted and these snapshots can be shared with other AWS account or made public

To encrypt unencrypted root device volume
1. Create its snapshot
2. Create copy of that snapshot and select encryption option
3. Create AMI from encrypted snapshot
4. Use that AMI to launch new encrypted instances

## Exam Tips for Spot Instances and Spot Fleet

1. Spot instances save up to __90%__ of the cost of On-demand instances
2. Use for applications __not__ requiring __data persistence__
3. You can use __spot block__ to stop instances from terminating
4. Spot fleet: __collection__ of spot instances, and optionally on-demand instances

## Exam Tips on EC2 Hibernate

- It preserves in-memory RAM on persistent EBS
- Much faster to boot up as you __do not need to boot up OS__
- Instance RAM must be __< 150GB__ and ebs bigger than RAM
- Support c3, c4, c5, m3, m4, m5, r3, r4, r5 and some t2 instance types
- Available for ubuntu, windows, amazon linux 2 ami
- Instances can't be hibernated for more than __60 days__
- Available for __on demand__ and __reserved__ instances

## Exam Tips on CloudWatch

- Standard monitoring = 5mins
- Detailed monitoring = 1min

What can we do with CW?
- Create awesome __dashboards__ (globally and regionally) to see what is happening
- Create __alarms__ to notify when certain thresholds are hit
- Create __events__ to allow us to respond to state changes in AWS resources
- Create __logs__ to aggregate, monitor and store logs

CloudWatch monitors performance while CloudTrail monitors API calls in the aws platform

## Exam Tips on Roles

- Roles are more secure than storing access key and secret access key on individual EC2 instances
- roles are easier to manage (imagine you lose your access key and you have 1k instances running. you will have to go and update the access key for each instance)
- roles can be assigned to ec2 instance after it is created using both console and cli
- roles are universal (all IAM things are universal)

## Exam Tips on bootstrap and metadata
We can ssh to ec2 instance and get various info about user-data and meta-data for that instance.
- Metadata can be used to get information about an instance (such as its ipv4, public-ipv4 etc)
- curl http://169.254.169.254/latest/meta-data/
- curl http://169.254.169.254/latest/user-data/
- curl http://169.254.169.254/latest/meta-data/ > some.txt

A small example: we can use user-data to run a shell script that fetches the public IP from the metadata endpoint, writes it to a file and copies that file to S3; the upload could then trigger a Lambda function that stores the IP in a database.
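
The same `curl` lookups can be done from Python with the standard library. This is a minimal sketch, assuming it runs on an EC2 instance that still answers plain (IMDSv1-style) requests; instances that enforce IMDSv2 would need a session token header first, which is not shown. `metadata_url` and `fetch` are hypothetical helper names.

```python
import urllib.request

METADATA_BASE = "http://169.254.169.254/latest"


def metadata_url(path: str, kind: str = "meta-data") -> str:
    """Build the URL for a metadata (or user-data) path, e.g. 'public-ipv4'."""
    return f"{METADATA_BASE}/{kind}/{path.lstrip('/')}"


def fetch(url: str, timeout: float = 2.0) -> str:
    """Fetch a value from the metadata endpoint; only reachable from inside an instance."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# On an instance: fetch(metadata_url("public-ipv4")) would return the public IP as text.
```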

## Exam Tips on EFS

- It supports Network File System version 4 (NFSv4) protocol
- You can pay for the storage you use (no pre-provisioning required). In EBS we have to say we need 8GB or so.
- Can scale up to petabytes
- Can support thousands of concurrent NFS connections
- Data is stored across multiple AZs within a region
- Read after write consistency

## Exam Tips on EC2 Placement Groups (falls under Network group in ec2 console)

- Clustered Placement Group
    - Use case: for low latency and high network throughput
- Spread Placement Group
    - Use case: for individual critical EC2 instances
    - Spread placement groups have a specific limitation that you can only have a maximum of 7 running instances per Availability Zone
- Partitioned Placement Group
    - Multiple EC2 instances
    
- Clustered PG can't spread over multiple AZs but Spread and Partitioned can but in same region
- Cluster Placement Groups are primarily about keeping your compute resources within one network hop of each other on high speed rack switches. This is only helpful when you have compute loads with network loads that are either very high or very sensitive to latency.
- PG name must be unique for an AWS account
- No charge for creating PG
- __Only certain type of instances can be launched within PG (Compute Optimized, GPU, Memory Optimized, Storage Optimized)__
- AWS recommends homogeneous instance types within Clustered PG
- You can't merge PGs
- You can move existing EC2 instance to and from placement group but the instance should be in stopped state. And they can be moved/removed only via AWS CLI or AWS SDK but not via console yet.

> _My shortcut understanding: Only when we have certain optimized ec2 instances then only we can start thinking about whether we need PG or not_

## Exam Tips on WAF

- In the exam you'll be given different scenarios and asked how to block malicious IP addresses
    - Use WAF
    - Use NACL (Network Access Control List)


## Exam Tips on RDS

- RDS runs on virtual machines
- You cannot login/ssh into these machines
- Patching of the RDS operating system and DB is amazon's responsibility
- RDS is __NOT SERVERLESS__
- Aurora Serverless is Serverless

## Exam Tips on RedShift

- It's for business intelligence
- Available only in 1 AZ

__BACKUPS:__
- Enabled by default with 1 day retention period
- Just like RDS max retention period is 35 days
- Redshift will attempt to make at least 3 copies of your data (the original and replica on compute node and a backup in S3)
- Redshift can asynchronously replicate your snapshot to S3 in another region for DR (disaster recovery)

## Exam Tips on Aurora

- 2 copies of your data are contained in each AZ with minimum of 3 AZ. 6 copies of your data
- 3 types of replicas available. Aurora replicas, MySQL replicas and PostgreSQL replicas. Automated failover is only available with Aurora Replicas
- Aurora has automated backups turned on by default. You can take snapshots and share them with other AWS accounts (taking snapshots does not affect your production database)
- Use Aurora Serverless if you want a simple, cost effective option for infrequent, intermittent or unpredictable workloads

## Exam Tips on ElastiCache

- Use case: let's say you query the top 10 purchases on Amazon. The EC2 instances query ElastiCache instead of the production database, and ElastiCache keeps that result (as it changes very infrequently). This improves performance because querying the cache is fast. You cache your most important queries in ElastiCache.
- So, it is used to increase database and web application performance
- A common question is what you would do if your database is overloaded. One option is read replicas and the other is ElastiCache
- Memcached is for simple scenarios while Redis is for advanced data types; Redis is also multi AZ and has backup and restore features.
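
The "top 10 purchases" use case is the classic cache-aside pattern. The sketch below keeps the cache in a Python dictionary instead of Memcached or Redis; `CacheAside`, its TTL handling and the injected `clock` are hypothetical choices made so the example is self-contained.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Serve from the cache when possible; otherwise query the database and remember the result."""

    def __init__(self, load: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load          # the expensive database query
        self._ttl = ttl_seconds    # how long a cached result stays valid
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit: the database is not touched
        value = self._load(key)  # cache miss or expired entry
        self._entries[key] = (value, now + self._ttl)
        return value
```

The TTL is where the balancing act between up-to-date information and latency shows up: a longer TTL means fewer database queries but staler answers.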

## Exam Tips on DMS

- Allows migration of databases from one source to AWS and vice-versa
- The source can be either on-premise, or within AWS or another cloud provider such as Azure
- You can do __homogeneous__ migrations (same DB engines) or __heterogeneous__ migrations
- If you do a heterogeneous migration, you will need the __AWS Schema Conversion Tool (SCT)__

## Exam Tips on EMR (Elastic Map Reduce)

- EMR is used for big data processing
- Consists of a master node, core node and optionally task node
- By default, log data is stored on master node
- You can configure replication of __all log data from the master node to S3 at five-minute intervals__; however, this can only be configured when creating the cluster for the first time (it protects the logs against loss if the master node fails or is terminated)

## Exam Tips on IAM Policies

- Not explicitly allowed = Implicitly denied
- Explicit deny > everything else
- Only attached policies (to user, role) have effect
- AWS joins all applicable policies (if we have attached 2 or more policies to a role/user)
- We have AWS managed and customer managed policies
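
The evaluation order above (explicit deny beats allow, and no allow means implicit deny) can be sketched in a few lines. This is a heavily simplified model, assuming exact action names with no wildcards, resources or conditions; the policy shape and `is_allowed` are hypothetical.

```python
from typing import Dict, List

Statement = Dict[str, object]   # {"effect": "Allow" | "Deny", "actions": [...]}
Policy = List[Statement]


def is_allowed(policies: List[Policy], action: str) -> bool:
    """Join all attached policies, then apply: explicit deny > explicit allow > implicit deny."""
    matching = [
        statement
        for policy in policies
        for statement in policy
        if action in statement["actions"]
    ]
    if any(statement["effect"] == "Deny" for statement in matching):
        return False  # an explicit deny wins over everything else
    # Without at least one explicit allow the request is implicitly denied.
    return any(statement["effect"] == "Allow" for statement in matching)


S3_ACCESS = [{"effect": "Allow", "actions": ["s3:GetObject", "s3:PutObject"]}]
NO_WRITES = [{"effect": "Deny", "actions": ["s3:PutObject"]}]
```

With both policies attached, `s3:GetObject` is allowed, `s3:PutObject` is denied by the explicit deny, and any other action such as `ec2:StartInstances` is implicitly denied.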

__Permission boundaries__
- they do not grant access but limit the access that the user/role already has
- Used to delegate administration access to other users
- Prevent privilege escalation and unnecessary broad permissions
- Control maximum permissions an IAM policy can grant
- Use Cases:
    - Developers creating roles for lambda functions
    - Application owners creating roles for ec2 instances
    - Admins creating ad hoc users
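
That a permission boundary limits rather than grants can be expressed as a set intersection. This is a sketch under the same simplifications as before (exact action names only); `effective_permissions` is a hypothetical helper.

```python
from typing import Iterable, Set


def effective_permissions(identity_allows: Iterable[str], boundary_allows: Iterable[str]) -> Set[str]:
    """An action is usable only if BOTH the identity policy and the boundary allow it."""
    return set(identity_allows) & set(boundary_allows)


# A developer-created role asks for more than the boundary permits.
role_policy = {"lambda:InvokeFunction", "s3:GetObject", "iam:CreateUser"}
boundary = {"lambda:InvokeFunction", "s3:GetObject", "dynamodb:GetItem"}
usable = effective_permissions(role_policy, boundary)
```

`iam:CreateUser` is dropped because the boundary does not include it, and `dynamodb:GetItem` is not gained because the boundary alone grants nothing.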
    
## Exam Tips on Route53
- It can take up to 3 days to register a domain name depending on the circumstances
- _Simple Routing Policy_
    - if we choose the simple routing policy, we can have only one record, which may contain multiple IP addresses. If you specify multiple values in a record, Route 53 returns all values to the user in a random order.
- _Weighted Routing Policy_
    - allows you to split your traffic based on weights assigned
    
    __Health Checks__
    - You can set health checks on individual record sets
    - If a record set fails a health check it will be removed from Route 53 until it passes the health check
    - You can set SNS notifications to alert you if health check is failed
- _Latency Routing_
    - sends traffic to your end-user based on lowest network latency
- _Failover Routing_
    - we create active-passive setup
- _Geolocation Routing_
    - routes traffic based on end-user location
    - can choose region for end-users based on continent as well as country
- _Geoproximity Routing (Traffic Flow only)_
    - must use Route 53 traffic flow
- _Multivalue Answer Routing_
    - similar to simple routing policy except that we have health check option and also instead of having multiple values for the record set, we can have multiple record set.
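
For weighted routing, the expected split is each record's weight divided by the sum of the weights of the records still in rotation. The sketch below is a simplified model, assuming unhealthy records are simply left out; `weighted_shares` and the record names are hypothetical.

```python
from fractions import Fraction
from typing import Dict, Tuple


def weighted_shares(records: Dict[str, Tuple[int, bool]]) -> Dict[str, Fraction]:
    """Map record name -> expected fraction of traffic, given (weight, is_healthy)."""
    in_rotation = {
        name: weight
        for name, (weight, is_healthy) in records.items()
        if is_healthy and weight > 0
    }
    total = sum(in_rotation.values())
    if total == 0:
        return {}
    return {name: Fraction(weight, total) for name, weight in in_rotation.items()}
```

With weights 80 and 20, the two records get four fifths and one fifth of the traffic; if the heavier one fails its health check, the remaining record takes everything.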
    
---

- ELBs do not have predefined IPv4 addresses; you resolve to them using DNS name
    

## Exam Tips on VPC

- When we create a VPC a default Route Table, NACL and SG are created
- It won't create a subnet nor an internet gateway
- us-east-1a in one AWS account could be a completely different AZ to us-east-1a in another AWS account. AZ's are randomized
- Amazon reserves 5 IP addresses within each of your subnets
- 1 internet gateway per VPC
- Security Groups can't span VPCs; meaning SG created in one VPC can't be seen in another VPC
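
The effect of the 5 reserved addresses on subnet sizing is easy to check with the standard `ipaddress` module. `usable_addresses` is a hypothetical helper and the CIDR blocks are arbitrary examples.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # from the note above


def usable_addresses(cidr: str) -> int:
    """Number of addresses left for your own resources in a subnet of the given size."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET
```

So a `/24` subnet leaves 251 of its 256 addresses, and a `/28` leaves 11 of 16.
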
---
__NAT Instances__
- Disable Source/Destination check when creating the instance
- Should be created in public subnet
- There should be a route out of the private subnet to the NAT instance for this to work
- The amount of traffic NAT instances can support depends on the instance size; if you're bottlenecking, increase the instance size
- You can create high availability using Autoscaling Group, multiple subnets in different AZ and a script to automate failover but is actually a pain
- They are behind the Security Group
---
__NAT Gateways__
- Redundant inside AZ (can recover from failover as it is spread over the AZ)
- Each NAT Gateway lives in a single AZ
- Preferred by enterprise
- Scales automatically (starts at 5Gbps)
- Automatically assigned public IP address
- No need to patch
- Not attached to any SG
- Remember to update your route tables
- No need to disable Source/Destination Checks
- If you have resource in multiple AZ and they share one NAT gateway, then in case the NAT's AZ goes down, resources in other AZ can't access internet too. To create AZ independent architecture, create NAT gateway in each AZ and configure routing to ensure that resources use the NAT gateway in the same AZ.
---
__NACL__
- Your VPC comes with default NACL which allows all inbound and outbound traffic
- On creation of new NACL it denies everything
- Rules in a NACL are evaluated in numerical order, lowest rule number first, and the first match wins, e.g. rule 99 (say, allow) takes precedence over rule 100 (say, deny) for the same port.
- each subnet is associated with a NACL; any subnet you create is associated with the default NACL unless you choose another
- you can change which NACL a subnet uses by editing the NACL's subnet associations
- 1 subnet can be associated with only 1 NACL, but 1 NACL can be associated with many subnets
- they are evaluated before SG
- you can block IP addresses in NACL but not in SG
- they have separate inbound and outbound rules
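
The rule-ordering tip above can be sketched as a small simulation. This is a toy model, not the AWS API: the `Rule` class, the `evaluate` function and the sample rules are hypothetical, and real NACL rules also match on protocol, CIDR and port ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    number: int  # lower numbers are evaluated first
    port: int    # single port, a simplification of real NACL port ranges
    action: str  # "allow" or "deny"


def evaluate(rules, port):
    """Return the action of the lowest-numbered rule that matches the port.

    Falls through to "deny", mirroring the final catch-all rule of a NACL.
    """
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port == port:
            return rule.action
    return "deny"


# Rule 99 (allow) overshadows rule 100 (deny) for the same port.
rules = [Rule(100, 22, "deny"), Rule(99, 22, "allow"), Rule(200, 80, "allow")]
print(evaluate(rules, 22))   # allow
print(evaluate(rules, 443))  # deny (no matching rule)
```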

---

> You need at least 2 public subnets to create an ELB

__VPC Flow logs__
- You can't enable flow logs for a VPC that is peered with your VPC unless the peer VPC is in your own account
- You can now tag flow logs
- You can't reconfigure flow logs after they are created, e.g. you can't associate a different IAM role
- Not all traffic is monitored
    - Traffic generated by instances when they contact the Amazon DNS server; however, if you use your own DNS server, all traffic to that DNS server is logged
    - Traffic generated by Windows instance for Amazon Windows license activation
    - Traffic to and from 169.254.169.254 for instance metadata
    - DHCP traffic
    - Traffic to the reserved IP address for the default VPC router
    
__Bastion__
- NAT instance or NAT gateway is used to provide internet traffic to EC2 instances in private subnets
- Bastion is used to securely administer EC2 instance (using SSH or RDP)
- You can't use NAT gateway as Bastion host

__Direct Connect__
- Directly connects your data center to AWS (dedicated network connection to AWS)
- Useful for high throughput workloads (i.e lots of network traffic)
- Or if you need high speed stable connection

Steps to set up a VPN over a Direct Connect connection:
1. create a Public Virtual Interface in Direct Connect
2. go to VPC console (and under VPN section), create 2 gateways: a Customer Gateway and a Virtual Private Gateway
3. attach the Virtual Private Gateway to your VPC 
4. create a VPN connection selecting the Customer Gateway and the Virtual Private Gateway
5. once the VPN is available, take the IP addresses from Tunnel Details to configure the on-prem Firewall and VPN.

__Global Accelerator__
- Assigns 2 static IP addresses but you can bring your own as well
- You can control traffic using traffic dial within the endpoint group
- You can also do traffic weighting on the endpoints itself

__VPC Endpoint__
- We can use a Bastion, NAT instance or NAT gateway to connect our private instances to the outside. Similarly, we can use VPC endpoints to securely connect our private instances to AWS resources outside our VPC over the AWS network, i.e. we don't need the public internet for this.
- Gateway endpoints support S3 and DynamoDB
- Interface endpoints (an ENI with a private IP in your subnet) cover most other services
- Diagram below shows how instance can communicate to S3 without VPC endpoint and with VPC endpoint (obviously for both cases you need to provide S3 full access role to your private instance first)
![vpc nat gateway](./resources/vpc_nat_gateway.png)
![vpc_endpoint](./resources/vpc_endpoint.png)

__VPC Private Link__
- If you have a question asking about peering VPCs to tens, hundreds or thousands of customer VPCs, think of AWS Private Link
- Don't need VPC peering; no route table, NAT, IGW (internet gateway) etc.
- Requires NLB (network load balancer) on service VPC (i.e VPC where application service is hosted) and ENI on customer VPC

__Transit Gateway__
- It's a way of simplifying your network architecture
- Allows you to have transitive peering between thousands of VPCs and on-premises data center
- Works on a hub and spoke model
- Works on a regional basis but you can have it across multiple regions
- Can be used across multiple AWS accounts (using Resource Access Manager)
- You can use route tables to limit how VPCs talk to each other
- Works with direct connect as well as VPN connections
- Supports __IP multicast__ (not supported by any other AWS service)

> A great explanation with diagram: https://medium.com/slalom-technology/next-generation-networking-with-aws-transit-gateway-and-shared-vpcs-9d971d868c65

__VPN CloudHub__
- If you have multiple sites, each with its own VPN connection, you can use VPN CloudHub to connect those sites together
- Works on a hub and spoke model
- Operates over the public internet, but all traffic between the customer gateways and VPN CloudHub is encrypted

__VPC Network Costs__
- Use private IP addresses over public IP addresses to save on costs. This then utilizes AWS backbone network
- If you want to cut all network costs, group your EC2 instances in the same AZ and use private IP addresses. This will be cost-free, but you will have a single point of failure if the AZ goes down.

![](./resources/network_cost.png)

## Exam Tips on ELB

1. Application Load Balancer: intelligent routing, e.g. if a user changes language to German then the ALB routes to the German server. All of this is done via __listener rules__, where you can create a bunch of if-then rules.
2. Network LB: for high speed/performance applications
3. Classic: cheap and basic load balancing for simple applications with no intelligent routing ie. if-then listener rules

- __504 error__ means gateway has timed out (and it's not the LB which is unhealthy). This means application not responding. You just need to troubleshoot the application to figure out if it's the webserver layer or database layer
- If you need to look for IPv4 address of the end-user check for __X-Forwarded-For__ header
- Instances monitored by ELB are reported as: InService or OutOfService (depending on health check)
- Health checks check the instance by talking to it
- Load Balancers have their own DNS name. They are never given an IP address (but you get a static IP address for a Network LB; remember the case of High Availability Bastion hosts that require an NLB)
- Read the FAQ for ELB
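
As a rough illustration of the __X-Forwarded-For__ tip, the sketch below pulls the left-most address out of a header value. The function name `client_ip` and the sample addresses are made up for this example, and it assumes the left-most entry is the original client; that entry can be spoofed by the caller, so treat it as informational only.

```python
def client_ip(x_forwarded_for):
    """Return the left-most address from an X-Forwarded-For header value, or None."""
    parts = [p.strip() for p in x_forwarded_for.split(",") if p.strip()]
    return parts[0] if parts else None


# Hypothetical header: original client first, then each proxy that forwarded the request.
print(client_ip("203.0.113.7, 10.0.0.5"))
```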

__Advanced Load Balancer__
- __Sticky Sessions__ enables your user to stick to same EC2 instance. Can be useful if you're storing information locally to that instance
- __Cross zone load balancing__ enables you to load balance across multiple AZ
- __Path patterns__ allow you to direct traffic to different EC2 instances based on the URL contained on the request
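
The if-then nature of listener rules and path patterns can be sketched with a toy router. This is not the ELB API: the rule list, target group names and `route` function are hypothetical, and `fnmatchcase` is only an approximation of ALB wildcard matching (it also accepts `[...]` sets, which ALB path patterns do not).

```python
from fnmatch import fnmatchcase

# (priority, path pattern, target group); names are illustrative only.
RULES = [
    (1, "/accounts/*", "accounts-tg"),
    (2, "/sales/*", "sales-tg"),
]
DEFAULT_TARGET = "default-tg"


def route(path, rules=RULES, default=DEFAULT_TARGET):
    """Return the target of the first rule, by priority, whose pattern matches the path."""
    for _priority, pattern, target in sorted(rules):
        if fnmatchcase(path, pattern):
            return target
    return default  # the default rule applies when nothing else matches


print(route("/accounts/login"))
print(route("/support"))
```

Rules are checked in priority order and the default target catches everything else, which is the same shape as the `mysite.co/accounts` vs. `mysite.co/sales` scenario in the questions section.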

## Exam Tips on HA
- Always design for failure
- Use Multi AZ's and multi region whenever you can
- Know the difference between Multi-AZ and Read replicas for RDS (Multi-AZ for disaster recovery and read replicas for performance)
- Know the difference between scaling out (creating new instances via ASG) vs scaling up (increasing the performance of EC2 instances by changing the instance type etc.)
- Read the question carefully and always consider the cost element (there'll be question that uses 1 option which is expensive and does the job and the other that is cheap)
- Know the different s3 storage classes

## Exam Tips on SQS
- It is pull based; SNS is push based
- Messages can be up to 256KB in size
- Messages can be kept in the queue from 1 minute to 14 days; default retention period is 4 days
- SQS guarantees that your msg will be processed at least once
- __Visibility timeout__ is the amount of time that message is invisible in the SQS queue after a reader picks up that msg. Provided the job is processed before the visibility timeout expires, the msg will then be deleted from the queue.  If the job is not processed in that time, the msg will be visible again for another reader to process. This could result in same msg being delivered twice. For this kind of scenario where the msg is getting delivered twice because the job is taking more time to process than the visibility timeout, you should increase the timeout
- Visibility timeout max is 12 hrs; you must have your job processed within 12 hrs otherwise your msg will come again
- __Long polling__: While regular short polling returns immediately, long polling does not return a response until a message arrives in the queue or the long poll times out. Scenario: you have a queue that is mostly empty and an EC2 instance that keeps polling it for more work; all those empty polls cost you money. To save money in this case, use long polling
- Any scenario based question mentioning __"decoupling"__ your infrastructure - think SQS
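
The visibility timeout behaviour described above can be sketched with an in-memory toy queue. This is a simulation under simplifying assumptions, not the SQS API: the `ToyQueue` class and its methods are hypothetical, and time is passed in explicitly as `now` (in seconds) so the example stays deterministic.

```python
class ToyQueue:
    """In-memory stand-in for a queue with a visibility timeout (not the SQS API)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # message id -> (body, time at which it is visible again)
        self._next_id = 0

    def send(self, body):
        self._messages[self._next_id] = (body, 0.0)
        self._next_id += 1

    def receive(self, now):
        """Return (id, body) of one visible message and hide it, or None if none is visible."""
        for msg_id, (body, visible_at) in self._messages.items():
            if visible_at <= now:
                self._messages[msg_id] = (body, now + self.visibility_timeout)
                return msg_id, body
        return None

    def delete(self, msg_id):
        """Called by the reader once the job has been processed."""
        self._messages.pop(msg_id, None)


q = ToyQueue(visibility_timeout=30)
q.send("job-1")
first = q.receive(now=0)    # reader A picks up the message
hidden = q.receive(now=10)  # still invisible to reader B
again = q.receive(now=45)   # A never deleted it, so it is delivered a second time
```

If the job needs longer than the timeout, the second delivery at `now=45` is exactly the duplicate processing the tip warns about; raising `visibility_timeout` or deleting the message in time avoids it.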

## Exam Tips on SWF (Simple Workflow)
* Think of ordering book from amazon which then needs manual/human intervention to collect book from warehouse and ultimately delivering it to you.
* SQS has a retention period of up to 14 days while workflow executions can last up to 1 year
* SQS is message oriented whereas SWF is task oriented

## Exam Tips on SNS (Simple Notification Service)
SNS vs SQS
* both are messaging service
* sns is push based and sqs is pull based (polls)

## Exam Tips on Elastic Transcoder
* It's a media converter that converts media files from original format to different formats for platforms like smartphones, tablets, PCs etc.

## Exam Tips on API Gateway
* It acts like a door to your AWS resources
* It has caching capabilities to increase performance
* It is low cost and scales automatically (need not worry about auto-scaling)
* You can throttle that scaling to prevent attacks
* Can log results in Cloudwatch
* If you're using Javascript/AJAX that uses multiple domain names with API Gateway, then you have to enable CORS (Cross Origin Resource Sharing) on API Gateway
* CORS is enforced by client (browser)

> API Gateway: Caching, CORS, scales

## Exam Tips on Kinesis
* It's a data streaming service

__1. Kinesis Streams__:
- data persistence, stores data in _shards_, 24hrs to 1 week data retention

__2. Kinesis Firehose__:
- no data persistence, processes data on the fly (using lambda etc.)

__3. Kinesis Analytics__:
- to analyse the data inside Kinesis Streams or Kinesis Firehose
 
 ![kinesis streams](./resources/kinesis_streams.png)
 ![kinesis firehose](./resources/kinesis_firehose.png)

## Exam Tips on Web Identity Federation - COGNITO
* Cognito is an __identity broker__ which handles interaction between your application and the Identity Provider (you don't need to write your own code)

Use Case:
1. User uses Web Identity Provider like Facebook, Google, Amazon to sign in.
2. The Web Identity Provider then provides access token
3. This access token is passed by user to Cognito's User Pool which then converts the token to JSON Web Token (JWT)
4. This JWT is passed by user to Identity Pool which authorizes the user as an IAM role
5. The user can then use AWS resources like S3 for the application

![cognito](./resources/cognito.png)

* User pool handles registration, authentication
* Identity pool handles authorization

## Exam Tips on Lambda
- It automatically scales out (not up), e.g. for 5 invocations/events, 5 instances of the Lambda function execute at the same time
- 1 event = 1 function
- 1 lambda function can trigger other lambda functions; 1 trigger/event can = x functions if functions trigger other functions
- It is serverless (no need to worry about EC2, patching, scaling etc.)
- Know what services are serverless e.g RDS is not serverless but Aurora serverless is
- Architectures can get extremely complicated, AWS __X-Ray__ allows you to __debug__ what is happening
- Lambda can do things globally, e.g. it can back up S3 buckets to other buckets
- Know your triggers e.g RDS can't trigger lambda

## QUESTIONS/DOUBTS/NOTES

- If an Amazon EBS volume is an additional partition (not the root volume), can I detach it without stopping the instance? __Yes__, although it may take some time.
- Can I delete a snapshot of an EBS Volume that is used as the root device of a registered AMI? __No__.
- Which AWS CLI command should I use to create a snapshot of an EBS volume? __aws ec2 create-snapshot__
- Can you attach an EBS volume to more than one EC2 instance at the same time? __As of Feb 2020 you can attach certain types of EBS volumes to multiple EC2 instances. https://aws.amazon.com/blogs/aws/new-multi-attach-for-provisioned-iops-io1-amazon-ebs-volumes/__
- If I wanted to run a database on an EC2 instance, which of the following storage options would Amazon recommend? __EBS__
- MySQL installations default to port number 3306
- If you want your application to check RDS for an error, have it look for an ______ node in the response from the Amazon RDS API: __error__
- What data transfer charge is incurred when replicating data from your primary RDS instance to your secondary RDS instance? __no charge__
- If you are using Amazon RDS Provisioned IOPS storage with a Microsoft SQL Server database engine, what is the maximum size RDS volume you can have by default? __16TB__
- You are hosting a MySQL database on the root volume of an EC2 instance. The database is using a large number of IOPS, and you need to increase the number of IOPS available to it. What should you do? __Add 4 additional EBS SSD volumes and create RAID 10 using these volumes__
- When you add a rule to an RDS DB security group, you must specify a port number or protocol. __False. Technically a destination port number is needed, however with a DB security group the RDS instance port number is automatically applied to the RDS DB Security Group.__
- There is a limit to the number of domain names that you can manage using Route 53. Default limit is 50 but you can increase it by contacting AWS support.
- Having just created a new VPC and launching an instance into its public subnet, you realise that you have forgotten to assign a public IP to the instance during creation. What is the simplest way to make your instance reachable from the outside world? __Create Elastic IP and associate with your instance. Although creating a new NIC & associating an EIP also results in your instance being accessible from the internet, it leaves your instance with 2 NICs & 2 private IPs as well as the public address and is therefore not the simplest solution. By default, any user-created VPC subnet WILL NOT automatically assign public IPv4 addresses to instances – the only subnet that does this is the “default” VPC subnets automatically created by AWS in your account.__
- Are you permitted to conduct your own vulnerability scans on your own VPC without alerting AWS first? __Until recently customers were not permitted to conduct penetration testing without AWS engagement. However that has changed. There are still conditions though__
- By default, instances in new subnets in a custom VPC can communicate with each other across Availability Zones. __True. In a custom VPC with new subnets in each AZ, there is a route that supports communication across all subnets/AZs. Plus a default SG with an allow rule 'All traffic, all protocols, all ports, from anything using this default SG'.__
- Which of the following offers the largest range of internal IP addresses? a)/16 b)/28 c)/24 d)/20 __/16__
- By default, how many VPCs am I allowed in each AWS region? __5__
- To save administration headaches, a consultant advises that you leave all security groups in web-facing subnets open on port 22 to 0.0.0.0/0 CIDR. That way, you can connect wherever you are in the world. Is this a good security design? __0.0.0.0/0 would allow ANYONE from ANYWHERE to connect to your instances. This is generally a bad plan. The phrase 'web-facing subnets' does not mean just web servers. It would include any instances in that subnet, some of which you may not want strangers attacking. You would only allow 0.0.0.0/0 on port 80 or 443 to connect to your public facing Web Servers, or preferably only to an ELB. Good security starts by limiting public access to only what the customer needs. Please see the AWS Security whitepaper for complete details.__
- You have a website with three distinct services, each hosted by different web server autoscaling groups. Which AWS service should you use? __The ALB has functionality to distinguish traffic for different targets (mysite.co/accounts vs. mysite.co/sales vs. mysite.co/support) and distribute traffic based on rules for target group, condition, and priority.__
- You have been tasked with creating a resilient website for your company. You create the Classic Load Balancer with a standard health check, a Route 53 alias pointing at the ELB, and a launch configuration based on a reliable Linux AMI. You have also checked all the security groups, NACLs, routes, gateways and NATs. You run the first test and cannot reach your web servers via the ELB or directly. What might be wrong? __In a question like this you need to evaluate if all the necessary services are in place. The glaring omission is that you have not built an autoscaling group to invoke the launch configuration you specified. The instance count and health check depend on instances being created by the autoscaling group. Finally, key pairs have no relevance to services running on the instance.__
- In discussions about cloud services the words 'availability', 'durability', 'reliability' and 'resiliency' are often used. Which term is used to refer to the likelihood that a resource will continue to exist until you decide to remove it? __Durability refers to the on-going existence of the object or resource. Note that it does not mean you can access it (availability), only that it continues to exist.__
- In discussions about cloud services the words 'availability', 'durability', 'reliability' and 'resiliency' are often used. Which term is used to refer to a resource's ability to recover from damage or disruption? __Resiliency can be described as the ability of a system to self heal after damage or an event. Note that this does not mean that it will be available continuously during the event, only that it will self recover.__
- In discussions about cloud services the words 'availability', 'durability', 'reliability' and 'resiliency' are often used. Which term is used to refer to the likelihood that a resource will work as designed? __Reliability is closely related to availability, however a system can be 'available' but not be working properly. Reliability is the probability that a system will work as designed. This term is not used much in AWS, but is still worth understanding.__
- You work for a manufacturing company that operate a hybrid infrastructure with systems located both in a local data center and in AWS, connected via AWS Direct Connect. Currently, all on-premise servers are backed up to a local NAS, but your CTO wants you to decide on the best way to store copies of these backups in AWS. He has asked you to propose a solution which will provide access to the files within milliseconds should they be needed, but at the same time minimizes cost. As these files will be copies of backups stored on-premise, availability is not as critical as durability. Choose the best option from the following which meets the brief. __S3 OneZone-IA provides on-line access to files, while offering the same 11 9's of durability as all other storage classes. The trade-off is in the availability - 99.5% as opposed to 99.9%-99.99%. However in this brief as cost is more important than availability, S3 OneZone-IA is the logical choice . RRS is deprecated and new uses are strongly discouraged by AWS.__
- Can I "force" a failover for any RDS instance that has multi-AZ configured? __Yes, by rebooting the instance with failover__
- Amazon SWF restricts me to the use of specific programming languages. __False. While there are a limited range of SDKs available for SWF, AWS provides an HTTP based API which allows you to interact using any language as long as you phrase the interactions in HTTP requests.__
- What happens when you create a topic on Amazon SNS? __Amazon Resource Name is created__
- In SWF, what does a "domain" refer to? __A collection of relatable workflows__
- By default, EC2 instances pull SQS messages from a standard SQS queue on a FIFO (first in first out) basis. __False__
- What does Amazon SES stand for? __Simple Email Service__
- Like any service in AWS, Lambda needs to have a role associated with it that provides credentials with rights to other services. This is exactly the same as needing a role on an EC2 instance to access S3 or DDB.
- You have created a serverless application to add metadata to images that are uploaded to a specific S3 bucket. To do this, your lambda function is configured to trigger whenever a new image is created in the bucket. What will happen when multiple users upload multiple different images at the same time? __Each time a Lambda function is triggered, an isolated instance of that function is invoked. Multiple triggers result in multiple concurrent invocations, one for each time it is triggered.__