# Reviewing AWS Services

Ok, let's take this opportunity to review the various services we have been using. To begin, remember that we have transitioned from a server-based architecture to a serverless architecture.

<img src="./architecture.png">

* Server based

With our server-based architecture, we have long-running, always-on services. For example, our EC2 machine hosts our code and is always running regardless of use. Our RDS instance is also a running server. It continues running regardless of the number of queries (unless we shut it down).

* Serverless

With our serverless architecture, we move to a model where we essentially pay per transaction. For example, with a Lambda function we pay per invocation, and with Athena we pay per query.
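To make the difference between the two billing models concrete, here is a minimal sketch that compares an always-on cost with a pay-per-invocation cost. The function names and every rate below are hypothetical placeholders chosen for illustration, not real AWS prices.

```python
def always_on_cost(hours: float, hourly_rate: float) -> float:
    """Cost of a server that bills for every hour, used or not."""
    return hours * hourly_rate


def per_invocation_cost(
    invocations: int,
    avg_duration_s: float,
    memory_gb: float,
    price_per_request: float,
    price_per_gb_second: float,
) -> float:
    """Cost that scales with usage: a per-request fee plus compute time."""
    request_cost = invocations * price_per_request
    compute_cost = invocations * avg_duration_s * memory_gb * price_per_gb_second
    return request_cost + compute_cost


# Hypothetical rates, for illustration only (not real AWS prices).
HOURLY_RATE = 0.05
PRICE_PER_REQUEST = 0.000001
PRICE_PER_GB_SECOND = 0.00002

server = always_on_cost(hours=720, hourly_rate=HOURLY_RATE)
quiet_month = per_invocation_cost(
    10_000, 0.5, 0.5, PRICE_PER_REQUEST, PRICE_PER_GB_SECOND
)
busy_month = per_invocation_cost(
    50_000_000, 0.5, 0.5, PRICE_PER_REQUEST, PRICE_PER_GB_SECOND
)
print(f"always on: {server:.2f}, quiet: {quiet_month:.2f}, busy: {busy_month:.2f}")
```

With these made-up numbers, the serverless model is far cheaper in a quiet month and more expensive in a very busy one. The server cost stays flat regardless of traffic, while the serverless cost follows usage.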

### Querying and Storage

<img src="./athena-s3.png" width="70%">

* Athena
    * A serverless query service with a pay-as-you-go pricing model. Our data is stored in S3 buckets and we use Athena to query those buckets. In other words, Athena separates storage from compute.
    * With Athena we point at data in an input bucket, and query results are written to an output bucket. Athena is good for ad hoc and data exploration queries, but is too slow to rely on as a transactional database or for low-latency analytics (remember each query may take a few seconds).

* S3 bucket
    * Used for object (file) storage in a scalable and secure way. Scalable because buckets automatically scale, and secure as we can specify detailed permissions and policies for use. 
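The Athena flow described above (submit a query, wait a few seconds, find the results in an output bucket) can be sketched with Boto3 roughly as follows. The table, database, and bucket names are hypothetical, and `build_query_request` and `run_query` are helper names made up for this sketch. The client is passed in as a parameter so the helpers can be tried without AWS credentials.

```python
import time


def build_query_request(sql: str, database: str, output_location: str) -> dict:
    """Build keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(athena, request: dict, poll_seconds: float = 1.0) -> str:
    """Start a query and poll until Athena reports a final state."""
    execution_id = athena.start_query_execution(**request)["QueryExecutionId"]
    while True:
        response = athena.get_query_execution(QueryExecutionId=execution_id)
        state = response["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return state
        time.sleep(poll_seconds)  # queries are asynchronous, so we wait


request = build_query_request(
    sql="SELECT * FROM orders LIMIT 10",              # hypothetical table
    database="example_db",                            # hypothetical database
    output_location="s3://example-results/athena/",   # hypothetical bucket
)

# With AWS credentials configured, this could be run as:
#   import boto3
#   state = run_query(boto3.client("athena"), request)
```

The polling loop is where the "few seconds per query" shows up: the call that starts the query returns right away, and the result files only appear in the output location once the state is `SUCCEEDED`.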


<img src="./glue-aws.png" width="70%">


* Glue 
    * Discovers and prepares data from multiple sources, whether S3 buckets or an RDS instance. We used Glue to crawl our S3 buckets.

* Lake Formation
    * Provides organization to a data lake, which is a collection of raw data, structured or unstructured (stored in S3 buckets in an AWS stack). Describes data sources in a centralized data catalog, with the concept of a database (really just a logical grouping, like a folder) that points to a collection of tables (each pointing at data in our S3 buckets).
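To illustrate the idea that a catalog "table" is really a pointer to data in S3, here is a small sketch that reads table names and locations out of a Glue `get_tables` response. The sample response is hand-written and trimmed to the fields used here; the table, bucket, crawler, and database names are hypothetical.

```python
def table_locations(get_tables_response: dict) -> dict:
    """Map each catalog table name to the S3 location it points at."""
    return {
        table["Name"]: table.get("StorageDescriptor", {}).get("Location")
        for table in get_tables_response.get("TableList", [])
    }


# Hand-written sample in the shape returned by glue.get_tables(DatabaseName=...).
sample_response = {
    "TableList": [
        {
            "Name": "orders",
            "StorageDescriptor": {"Location": "s3://example-lake/orders/"},
        },
        {
            "Name": "customers",
            "StorageDescriptor": {"Location": "s3://example-lake/customers/"},
        },
    ]
}

print(table_locations(sample_response))

# Against a real account (assuming the crawler and database already exist):
#   import boto3
#   glue = boto3.client("glue")
#   glue.start_crawler(Name="example-crawler")
#   print(table_locations(glue.get_tables(DatabaseName="example_db")))
```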

### Event driven Pipeline

<img src="./reviewing-services.png">

* EventBridge
    * Has two main components: an event bus, which events are sent to, and event rules, which route events to different targets. Follows an event-driven architecture pattern with a producer of events, a consumer (the target), and a router (the event rule).
    
<img src="./eventrule.png">

> Above, creating an object in S3 produces the event, and the event rule routes it to a Lambda function.

Below, we use a scheduled EventBridge rule to generate events on a schedule and send them to our target, here our Lambda function.

<img src="./eventscheduled.png" width="70%">
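The two kinds of rule shown above (one matching S3 object-created events, one firing on a schedule) could be expressed with Boto3 along the following lines. This is a sketch under assumptions: the rule names, bucket name, and Lambda ARN are hypothetical, and the helper functions only build the request arguments, so they can be inspected without calling AWS.

```python
import json


def s3_object_created_pattern(bucket_name: str) -> dict:
    """Event pattern matching 'Object Created' events from one S3 bucket."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": [bucket_name]}},
    }


def pattern_rule_request(rule_name: str, bucket_name: str) -> dict:
    """Arguments for events.put_rule: route matching events (the router)."""
    return {
        "Name": rule_name,
        "EventPattern": json.dumps(s3_object_created_pattern(bucket_name)),
        "State": "ENABLED",
    }


def scheduled_rule_request(rule_name: str, schedule_expression: str) -> dict:
    """Arguments for events.put_rule: generate events on a schedule."""
    return {
        "Name": rule_name,
        "ScheduleExpression": schedule_expression,
        "State": "ENABLED",
    }


def target_request(rule_name: str, lambda_arn: str) -> dict:
    """Arguments for events.put_targets: the consumer of the routed events."""
    return {"Rule": rule_name, "Targets": [{"Id": "lambda-target", "Arn": lambda_arn}]}


on_upload = pattern_rule_request("on-upload", "example-bucket")    # hypothetical names
every_hour = scheduled_rule_request("every-hour", "rate(1 hour)")

# With AWS credentials configured, these could be applied as:
#   import boto3
#   events = boto3.client("events")
#   events.put_rule(**on_upload)
#   events.put_targets(**target_request("on-upload", "<lambda function ARN>"))
```

Two things the sketch leaves out: the bucket has to have EventBridge notifications turned on before S3 events reach the bus, and the Lambda function needs a permission that lets EventBridge invoke it.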

* Lambda function
    * A serverless alternative to an EC2 machine. Allows us to use a pay-as-you-go model, where we pay per invocation, for the duration of each invocation, and for the memory allocated. Lambda automatically scales your functions based on incoming requests. As a downside, you get less control over the operating system, and it is not ideal for long-running processes (because of time and resource restrictions).

* CloudWatch
    * Allows us to monitor AWS resources, such as invocations of a Lambda function. Collects logs, organized into log groups and log streams associated with various resources.
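As a rough sketch of how Lambda and CloudWatch fit together, the handler below reads the bucket and key from an S3 "Object Created" event delivered through EventBridge and logs them. The handler name and the sample event values are assumptions made for this example; the sample event is trimmed to the fields the handler reads.

```python
import json
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def handler(event, context):
    """Entry point Lambda would call; 'event' is the EventBridge payload."""
    detail = event.get("detail", {})
    bucket = detail.get("bucket", {}).get("name")
    key = detail.get("object", {}).get("key")
    # When running in Lambda, log output ends up in the function's
    # CloudWatch log stream. Run locally, this line may print nothing.
    logger.info("New object s3://%s/%s", bucket, key)
    return {"statusCode": 200, "body": json.dumps({"bucket": bucket, "key": key})}


# Hand-written sample event (hypothetical bucket and key).
sample_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {
        "bucket": {"name": "example-bucket"},
        "object": {"key": "uploads/report.csv"},
    },
}

print(handler(sample_event, None))
```

Calling the handler directly with a sample event, as in the last line, is a cheap way to check the logic before deploying.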

### Security

* A role is a way to grant specific permissions to entities (such as users, applications, or services) that can then assume that role and inherit its permissions. 

* Policy - A role has many policies.  A policy specifies the actions that are allowed or denied on AWS resources.  A policy can be attached to a role to specify what the service or user has access to.

* A user - A user can also have many policies. The difference between a user and a role is that a user is associated with a person, who has login credentials. A role can instead be assumed by a service (like a Lambda function, whose role has policies that give it permission to use other services like Athena).

* ACL (Access control lists) - attach to a specific bucket or object.  They can be used to specify how others can access the buckets and objects.  (Remember that an ACL is like a bouncer at a club, specifying who can enter).  
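To show what a role and a policy look like in practice, here is a sketch of the two JSON documents involved when a Lambda function is allowed to query Athena: a trust policy (who may assume the role) and a permissions policy (what the role may do). The function names, role name, and bucket name are hypothetical, and the permissions are deliberately incomplete; a working Athena setup typically needs more than this.

```python
import json


def lambda_trust_policy() -> dict:
    """Trust policy: lets the Lambda service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def athena_query_policy(results_bucket: str) -> dict:
    """Permissions policy: actions the role is allowed to perform."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                ],
                "Resource": "*",
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{results_bucket}/*",
            },
        ],
    }


print(json.dumps(athena_query_policy("example-results"), indent=2))

# With AWS credentials configured, these could be attached as:
#   import boto3
#   iam = boto3.client("iam")
#   iam.create_role(RoleName="example-lambda-role",
#                   AssumeRolePolicyDocument=json.dumps(lambda_trust_policy()))
#   iam.put_role_policy(RoleName="example-lambda-role",
#                       PolicyName="athena-query",
#                       PolicyDocument=json.dumps(athena_query_policy("example-results")))
```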

### IAC Tools

* Boto3
    * Allows us to *interact* with our AWS resources through Python. For example, we can read or upload files to a bucket, trigger a Lambda function, or query Athena.
    
* Docker 
    * Allows us to create an image, which is a collection of files representing an environment, and start a container, which is a process that depends on that environment (like starting a web server).
    
* Docker Hub
    * We can push our image to Docker Hub, which stores these files. And we can download (or pull) an image from Docker Hub.

* Elastic Container Registry (ECR)
    * Allows us to store our image in an AWS repository.  This way we can keep our images private, and give AWS services permission to use the image (like a lambda function).
    
* Serverless 
    * An IAC tool that allows us to set up our AWS stack. For example, it can create buckets, Lambda functions, and EventBridge rules, and establish permissions between these services. Underneath, it generates a template and calls AWS CloudFormation, which directly creates the stack.
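Going back to the Boto3 bullet above, uploading a file to a bucket and triggering a Lambda function might look roughly like this. The helper names, file path, bucket, key, and function name are all hypothetical. The clients are passed in as arguments, which keeps the helpers easy to try with stand-in objects before pointing them at a real account.

```python
import json


def upload_report(s3, local_path: str, bucket: str, key: str) -> str:
    """Upload a local file to a bucket and return its S3 URI."""
    s3.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


def invoke_function(lambda_client, function_name: str, payload: dict) -> dict:
    """Invoke a Lambda function synchronously and decode its JSON response."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=json.dumps(payload).encode("utf-8"),
    )
    return json.loads(response["Payload"].read())


# With AWS credentials configured, these could be used as:
#   import boto3
#   uri = upload_report(boto3.client("s3"), "report.csv",
#                       "example-bucket", "uploads/report.csv")
#   result = invoke_function(boto3.client("lambda"), "example-function",
#                            {"uri": uri})
```

Note the contrast with the Serverless framework: Boto3 is used here to work with resources that already exist, while Serverless (through CloudFormation) is what creates them.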