# DS107 Big Data : AWS Networking Content Delivery and Compute

### Table of Contents <a class="anchor" id="DS107L3_toc"></a>

* [Table of Contents](#DS107L3_toc)
    * [Page 1 - Introduction](#DS107L3_page_1)
    * [Page 2 - Networking Basics](#DS107L3_page_2)
    * [Page 3 - Amazon VPC](#DS107L3_page_3)
    * [Page 4 - VPC Networking](#DS107L3_page_4)
    * [Page 5 - VPC Security](#DS107L3_page_5)
    * [Page 6 - Amazon Route 53](#DS107L3_page_6)
    * [Page 7 - Amazon CloudFront](#DS107L3_page_7)

    

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 1 - Overview of this Module<a class="anchor" id="DS107L3_page_1"></a>

[Back to Top](#DS107L3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

## Topics

- Networking basics
- Amazon VPC
- VPC networking 
- VPC security
- Amazon Route 53
- Amazon CloudFront

## Activities

- Label a network diagram
- Design a basic VPC architecture

## Demo

- VPC Demonstration

## Lab

- Build a VPC and launch a web server

### After completing the AWS Networking Content Delivery module you should be able to:

- Recognize the basics of networking
- Describe virtual networking in the cloud with Amazon VPC
- Label a network diagram
- Design a basic VPC architecture
- Indicate the steps to build a VPC
- Identify security groups
- Create your own VPC and add additional components to it to produce a customized network
- Identify the fundamentals of Amazon Route 53
- Recognize the benefits of CloudFront

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 2 - Networking Basics<a class="anchor" id="DS107L3_page_2"></a>

[Back to Top](#DS107L3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


## Networking Basics

<p style="text-align: center">
  <img  src="Media/AWS-Basic-Network.png" width="600" alt="Amazon VPC">
</p>

A computer network consists of two or more computers connected together so that they can communicate. A network can be logically partitioned into subnets. Networking requires network equipment such as routers or switches. These devices connect the computers together and enable communication between them.

<p style="text-align: center">
  <img  src="Media/AWS-IP-Address.png" width="600" alt="Amazon VPC">
</p>

Each machine on the network has a unique Internet Protocol (IP) address assigned to it. An IP address is similar to a phone number: it must be unique on its subnet so that the machine can communicate with other computers, each of which has its own IP address. Machines convert the decimal form of the address to binary in order to use it. In the IP address 192.0.2.0, each of the four dot-separated numbers represents 8 bits, called an octet. This means that each of the four numbers can be anything from 0 to 255. Together, the four numbers of an IP address total 32 bits in binary format. A 32-bit IP address is called an IPv4 address.
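The decimal-to-binary conversion described above can be sketched in a few lines of Python. This is only an illustration: the helper name `ipv4_to_binary` is hypothetical and is not part of the course material.

```python
import ipaddress


def ipv4_to_binary(address: str) -> str:
    """Return each octet of an IPv4 address as an 8-bit binary string."""
    # ipaddress validates that the text really is an IPv4 address.
    validated = ipaddress.IPv4Address(address)
    octets = [int(part) for part in str(validated).split(".")]
    return ".".join(format(octet, "08b") for octet in octets)


print(ipv4_to_binary("192.0.2.0"))  # 11000000.00000000.00000010.00000000
print(ipaddress.IPv4Address("192.0.2.0").max_prefixlen)  # 32 bits in total
```

Each group in the output is one octet, which is why no group can exceed 255 (binary 11111111).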

<p style="text-align: center">
  <img  src="Media/AWS-IPv4-IPv6.png" width="600" alt="Amazon VPC">
</p>

IPv6 addresses also exist and use a 128-bit binary format. IPv6 can address far more devices and was created because the supply of IPv4 addresses is running out. An IPv6 address is composed of eight groups of four hexadecimal digits, separated by colons. Each group represents 16 bits, so each group can be anything from 0 to ffff. The hexadecimal digits are 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F.
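As a small illustration of the eight-group layout, Python's standard `ipaddress` module can expand an IPv6 address to its full form. The address `2001:db8::1` is an assumed example from the documentation range, not one used in the course.

```python
import ipaddress

# Assumed example address from the IPv6 documentation range.
addr = ipaddress.IPv6Address("2001:db8::1")

# "exploded" writes out all eight groups with leading zeros.
groups = addr.exploded.split(":")

print(addr.exploded)     # 2001:0db8:0000:0000:0000:0000:0000:0001
print(len(groups))       # 8 groups
print(len(groups) * 16)  # 128 bits in total
```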


<p style="text-align: center">
  <img  src="Media/AWS-CIDR.png" width="600" alt="Amazon VPC">
</p>

A common method to describe networks and groups of IP addresses is called Classless Inter-Domain Routing, or CIDR. A CIDR address is expressed as an IP address, which is the first address of the network, followed by a / (forward slash character) and a number. That number tells you how many leading bits of the address are fixed: this prefix is allocated for the network identifier and does not change. The bits that are not fixed are allowed to change. CIDR is a way to express a group of consecutive IP addresses. In the above example, the CIDR address is 192.0.2.0/24, where the first three octets (24 bits) are fixed and the last octet is changeable. This means that there are 256 addresses available for this network. The range for this network is 192.0.2.0 - 192.0.2.255.
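The arithmetic behind the 192.0.2.0/24 example can be checked with Python's standard `ipaddress` module. This is a minimal sketch; the variable names are chosen only for illustration.

```python
import ipaddress

network = ipaddress.ip_network("192.0.2.0/24")

fixed_bits = network.prefixlen  # 24 bits are fixed (the network identifier)
host_bits = 32 - fixed_bits     # 8 bits are free to change

print(2 ** host_bits)           # 256
print(network.num_addresses)    # 256
print(network[0], network[-1])  # 192.0.2.0 192.0.2.255
```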

<p style="text-align: center">
  <img  src="Media/AWS-OSI-Model.png" width="800" alt="Amazon VPC">
</p>

The Open Systems Interconnection (OSI) model is a conceptual model that is used to explain how data travels over a network and at which layer a device handles it. It consists of seven layers and shows the common protocols and addresses that are used to send and receive data at each layer. For example, hubs operate at layer 1, switches operate at layer 2, and routers operate at layer 3. The OSI model is used to demonstrate and understand how communication takes place on a simple network as well as in the cloud and on the Internet.

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 3 - Amazon Virtual Private Cloud (VPC)<a class="anchor" id="DS107L3_page_3"></a>

[Back to Top](#DS107L3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


## Amazon Virtual Private Cloud (VPC)

<p style="text-align: center">
  <img  src="Media/Amazon-VPC.png" width="100" alt="Amazon VPC">
</p>

- Enables you to provision a logically isolated section of the AWS Cloud where you can launch AWS resources in a virtual network that you define.
- Gives you control over your virtual networking resources, including:
    - Selection of IP address range.
    - Creation of subnets.
    - Configuration of route tables and network gateways.
- Enables you to customize the network configuration for your VPC.
- Enables you to use multiple layers of security.

## VPCs and Subnets

- VPCs:
    - Logically isolated from other VPCs.
    - Dedicated to your AWS account.
    - Belong to a single AWS Region and can span multiple Availability Zones (AZs).
- Subnets:
    - Range of IP addresses that divide a VPC.
    - Belong to a single Availability Zone.
    - Classified as public or private.

<p style="text-align: center">
  <img  src="Media/AWS-Cloud-VPC.png" width="400" alt="Amazon VPC">
</p>

## IP Addressing

- When you create a VPC, you assign it an IPv4 CIDR block (range of IPv4 addresses).
- You cannot change the address range after you create the VPC.
- The largest IPv4 CIDR block is /16.
- The smallest IPv4 CIDR block is /28.
- IPv6 is also available (with different block size limit).
- CIDR blocks of subnets cannot overlap.
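The size limits and the no-overlap rule listed above can be checked before you create anything. The following is a minimal sketch using Python's standard `ipaddress` module; the function names `valid_vpc_block` and `overlapping_pairs` are hypothetical helpers, not AWS APIs.

```python
import ipaddress
from itertools import combinations


def valid_vpc_block(cidr: str) -> bool:
    """True if the IPv4 block is between /16 (largest) and /28 (smallest)."""
    net = ipaddress.ip_network(cidr)
    return net.version == 4 and 16 <= net.prefixlen <= 28


def overlapping_pairs(cidrs):
    """Return every pair of CIDR blocks that share at least one address."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [(str(a), str(b)) for a, b in combinations(nets, 2) if a.overlaps(b)]


print(valid_vpc_block("10.0.0.0/16"))  # True
print(valid_vpc_block("10.0.0.0/8"))   # False: larger than /16
print(overlapping_pairs(["10.0.0.0/24", "10.0.1.0/24"]))    # []
print(overlapping_pairs(["10.0.0.0/24", "10.0.0.128/25"]))  # one overlapping pair
```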

<p style="text-align: center">
  <img  src="Media/AWS-IP.png" width="400" alt="Amazon VPC">
</p>

## Reserved IP addresses

Example: A VPC with an IPv4 CIDR block 10.0.0.0/16 has 65,536 total IP addresses. The VPC has four equal-sized /24 subnets. AWS reserves five addresses in every subnet, so out of the 256 addresses in a /24 subnet, only 251 IP addresses are available for use.

<p style="text-align: center">
  <img  src="Media/AWS-IP-Subnets.png" width="400" alt="Amazon VPC">
</p>

### 5 reserved IP addresses

<p style="text-align: center">
  <img  src="Media/AWS-Reserved-IPs.png" width="400" alt="Amazon VPC">
</p>
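The 251 figure can be reproduced with a short calculation. The sketch below assumes that the five reserved addresses are the first four addresses and the last address of each subnet, which is how AWS documents them. The helper name `reserved_addresses` is hypothetical.

```python
import ipaddress


def reserved_addresses(subnet_cidr: str):
    """Return the five addresses AWS reserves in a subnet (first four + last)."""
    subnet = ipaddress.ip_network(subnet_cidr)
    return [str(subnet[i]) for i in (0, 1, 2, 3, -1)]


vpc = ipaddress.ip_network("10.0.0.0/16")
subnet = ipaddress.ip_network("10.0.0.0/24")
reserved = reserved_addresses("10.0.0.0/24")

print(vpc.num_addresses)                      # 65536
print(reserved)                               # ['10.0.0.0', '10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.255']
print(subnet.num_addresses - len(reserved))   # 251
```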

## Public IP address types

### Public IPv4 addresses

- Manually assigned through an Elastic IP address
- Automatically assigned through the auto-assign public IP address settings at the subnet level

### Elastic IP address

- Associated with the AWS account
- Can be allocated and remapped anytime
- Additional costs might apply


## Elastic Network Interface

<p style="text-align: center">
  <img  src="Media/AWS-Elastic-Network-Interface.png" width="400" alt="Amazon VPC">
</p>

- An elastic network interface is a virtual network interface that you can:
    - Attach to an instance
    - Detach from an instance, and attach to another instance to redirect network traffic
- Its attributes follow it when it is reattached to a new instance.
- Each instance in your VPC has a default network interface that is assigned a private IPv4 address from the IPv4 range of your VPC.

## Route tables and routes

<p style="text-align: center">
  <img  src="Media/AWS-Route-Table.png" width="400" alt="Amazon VPC">
</p>

- A route table contains a set of rules (or routes) that you can configure to direct network traffic from your subnet.
- Each route specifies a destination and a target.
- By default, every route table contains a local route for communication within the VPC.
- Each subnet must be associated with a route table.
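To make the destination/target idea concrete, here is a minimal sketch of a route lookup. It assumes the most specific matching route wins (longest prefix match). The route table contents and the target name `igw-example` are made up for illustration.

```python
import ipaddress

# Hypothetical route table: (destination CIDR, target).
ROUTES = [
    ("10.0.0.0/16", "local"),      # default local route inside the VPC
    ("0.0.0.0/0", "igw-example"),  # assumed internet-bound route
]


def lookup(route_table, destination_ip: str):
    """Return the target of the most specific route that matches the address."""
    ip = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(dest), target)
        for dest, target in route_table
        if ip in ipaddress.ip_network(dest)
    ]
    if not matches:
        return None  # no route: the traffic cannot be delivered
    # The longest prefix is the most specific match.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


print(lookup(ROUTES, "10.0.1.5"))      # local
print(lookup(ROUTES, "198.51.100.7"))  # igw-example
```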



<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 4 - VPC Networking<a class="anchor" id="DS107L3_page_4"></a>

[Back to Top](#DS107L3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


## VPC Networking

Now that you have learned about the basic components of the VPC, you can start routing traffic in interesting ways.

<p style="text-align: center">
  <img  src="Media/AWS-Internet-Gateway.png" width="800" alt="Amazon VPC">
</p>

An Internet gateway is a scalable, redundant, and highly available VPC component that allows communication between instances in your VPC and the public Internet. An Internet gateway serves two purposes:
- First, it provides a target in your VPC route tables for Internet-bound traffic.
- Second, it performs network address translation for instances that were assigned public IPv4 addresses.

To make a subnet public, you attach an Internet gateway to your VPC and add a route entry for Internet-bound traffic to the route table associated with the subnet.

<p style="text-align: center">
  <img  src="Media/AWS-Attach-IG.png" width="800" alt="Amazon VPC">
</p>


<p style="text-align: center">
  <img  src="Media/AWS-Add-Route-Table.png" width="800" alt="Amazon VPC">
</p>

## Network address translation (NAT)

A network address translation, or NAT, gateway enables instances in a private subnet to connect to the Internet or other AWS services. It prevents the public Internet from initiating a connection with those instances. To create a NAT gateway, you must specify the public subnet in which the NAT gateway should live.

<p style="text-align: center">
  <img  src="Media/AWS-NAT.png" width="800" alt="Amazon VPC">
</p>

You must also specify an Elastic IP address to associate with the NAT gateway when you create it. After you create the NAT gateway, you update the route table that is associated with one or more of your private subnets to point Internet-bound traffic to the NAT gateway. This allows instances in your private subnets to communicate with the Internet. You can also use a NAT instance in a public subnet in your VPC instead of a NAT gateway. However, AWS recommends that you use a NAT gateway because it is a managed service that provides better availability, higher bandwidth, and less administrative effort.

<p style="text-align: center">
  <img  src="Media/AWS-Update-Route-Table.png" width="800" alt="Amazon VPC">
</p>
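Whether a subnet is public or private comes down to where its route table sends Internet-bound traffic. The sketch below is an illustration only. It assumes the default route is written as `0.0.0.0/0` and that target IDs start with `igw-` for Internet gateways and `nat-` for NAT gateways. The IDs themselves are made up.

```python
def classify_subnet(routes) -> str:
    """Classify a subnet from its route table, given (destination, target) pairs."""
    default_targets = [target for dest, target in routes if dest == "0.0.0.0/0"]
    if any(target.startswith("igw-") for target in default_targets):
        return "public"
    if any(target.startswith("nat-") for target in default_targets):
        return "private (outbound through NAT gateway)"
    return "private (no Internet route)"


public_routes = [("10.0.0.0/16", "local"), ("0.0.0.0/0", "igw-example")]
private_routes = [("10.0.0.0/16", "local"), ("0.0.0.0/0", "nat-example")]
isolated_routes = [("10.0.0.0/16", "local")]

print(classify_subnet(public_routes))    # public
print(classify_subnet(private_routes))   # private (outbound through NAT gateway)
print(classify_subnet(isolated_routes))  # private (no Internet route)
```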

## VPC Sharing

VPC sharing enables customers to share subnets with other AWS accounts in the same organization. With VPC sharing, multiple AWS accounts can create their application resources, such as Amazon EC2 instances, Amazon Relational Database Service (Amazon RDS) databases, Amazon Redshift clusters, and AWS Lambda functions, in a shared, centrally managed VPC. In this model, the account that owns the VPC shares one or more subnets with other accounts, called participants, that belong to the same organization. After a subnet is shared, participants can view, create, modify, and delete their application resources in the subnets that are shared with them.


<p style="text-align: center">
  <img  src="Media/AWS-VPC-Sharing.png" width="800" alt="Amazon VPC">
</p>

## VPC Peering

A VPC Peering connection enables you to privately route traffic between two VPCs. Instances in either VPC can communicate with each other as if they were on the same network. 

You can connect VPCs in your own account, between AWS accounts, or between AWS Regions.

Restrictions:

- IP space cannot overlap
- Transitive peering is not supported
- You can only have one peering resource between the same two VPCs.

<p style="text-align: center">
  <img  src="Media/AWS-VPC-Peering.png" width="800" alt="Amazon VPC">
</p>

In the route table for VPC A, you set the destination to be the IP address range of VPC B and the target to be the peering resource ID.

In the route table for VPC B, you set the destination to be the IP address range of VPC A and the target to be the peering resource ID.
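The overlap restriction and the two route entries can be sketched as follows. This is a minimal illustration: the CIDR blocks and the peering ID `pcx-example` are assumed values, and the helper names are hypothetical.

```python
import ipaddress


def can_peer(cidr_a: str, cidr_b: str) -> bool:
    """Two VPCs can be peered only if their IP address ranges do not overlap."""
    return not ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


def peering_routes(cidr_a: str, cidr_b: str, peering_id: str):
    """Each VPC gets a route whose destination is the *other* VPC's range."""
    return {
        "VPC A": (cidr_b, peering_id),
        "VPC B": (cidr_a, peering_id),
    }


print(can_peer("10.0.0.0/16", "10.3.0.0/16"))  # True
print(can_peer("10.0.0.0/16", "10.0.8.0/24"))  # False: ranges overlap
print(peering_routes("10.0.0.0/16", "10.3.0.0/16", "pcx-example"))
```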

## AWS site-to-site VPN

By default, instances that you launch into an Amazon VPC cannot communicate with your own remote network. You can enable access to your remote network from your VPC by attaching a virtual private gateway to the VPC, creating a custom route table, updating your security group rules, creating an AWS site-to-site VPN connection, and configuring routing to pass traffic through the connection.

<p style="text-align: center">
  <img  src="Media/AWS-Site-to-Site-VPN.png" width="800" alt="Amazon VPC">
</p>

## AWS Direct Connect

One of the challenges of network communications is network performance. Performance can be negatively affected if your data center is located far away from your AWS Region. For such situations, AWS offers AWS Direct Connect. AWS Direct Connect enables you to establish a dedicated private connection between your network and one of the AWS Direct Connect locations. This private connection can increase bandwidth and throughput, and it can provide a more consistent experience than Internet-based connections or VPN connections. Direct Connect uses open-standard 802.1Q virtual local area networks (VLANs).

<p style="text-align: center">
  <img  src="Media/AWS-Direct-Connect.png" width="800" alt="Amazon VPC">
</p>

## AWS VPC Endpoint

On occasion, you will need to connect VPC resources to AWS regional services like Amazon S3 and DynamoDB. A VPC endpoint is a virtual device that enables you to privately connect your VPC to these supported services. A VPC gateway endpoint is a gateway that you specify as a target for a route in your route table, for traffic destined to either Amazon S3 or Amazon DynamoDB. Traffic between your VPC and these services does not leave the Amazon network, so it remains private.

<p style="text-align: center">
  <img  src="Media/AWS-VPC-Endpoint.png" width="800" alt="Amazon VPC">
</p>

## AWS PrivateLink

More recently, AWS has introduced AWS PrivateLink. It requires a VPC interface endpoint. AWS PrivateLink simplifies the security of data shared with cloud-based applications by eliminating the exposure of data to the public internet. AWS PrivateLink provides private connectivity between VPCs, AWS services, and on-premises applications. All traffic flows securely on the Amazon network. AWS PrivateLink makes it easy to connect services across different accounts and VPCs, which significantly simplifies network architectures.

<p style="text-align: center">
  <img  src="Media/AWS-PrivateLink.png" width="800" alt="Amazon VPC">
</p>

## AWS Transit Gateway

Consider how you might connect hundreds of VPCs together. With VPC peering, each VPC pair requires a dedicated peering connection. The complexity of that connectivity can become a heavy burden and does not scale well. A transit gateway is a network transit hub that you use to interconnect your virtual private clouds. You can also connect your on-premises network. You can attach VPCs, AWS Direct Connect gateways, or VPN connections to a transit gateway. The topology becomes hub and spoke, which reduces the number of connections required and makes the network simpler to implement and maintain.
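A quick calculation shows why full-mesh peering does not scale. The sketch below assumes every VPC must reach every other VPC and, for the hub-and-spoke case, that each VPC needs exactly one attachment to the transit gateway. The function names are hypothetical.

```python
def peering_connections(vpcs: int) -> int:
    """Full mesh: one peering connection per pair of VPCs, n * (n - 1) / 2."""
    return vpcs * (vpcs - 1) // 2


def transit_gateway_attachments(vpcs: int) -> int:
    """Hub and spoke: one attachment per VPC."""
    return vpcs


for count in (5, 100):
    print(count, peering_connections(count), transit_gateway_attachments(count))
# 5 VPCs   -> 10 peering connections versus 5 attachments
# 100 VPCs -> 4950 peering connections versus 100 attachments
```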

<p style="text-align: center">
  <img  src="Media/AWS-Transit-Gateway.png" width="800" alt="Amazon VPC">
</p>

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 5 - VPC Security<a class="anchor" id="DS107L3_page_5"></a>

[Back to Top](#DS107L3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


## VPC Security

You can build security into your VPC architecture in several ways so that you have complete control over incoming and outgoing traffic. There are two firewall options you can use to secure your VPC.

<p style="text-align: center">
  <img  src="Media/AWS-Security-Groups.png" width="800" alt="Amazon VPC">
</p>

### Security groups
- A security group acts as a virtual firewall that controls inbound and outbound traffic, to and from your instances.
- Security groups act at the instance level (specifically, at the network interface), and you can assign each instance in your VPC to a different set of security groups.
- You can think of a security group as a way to filter traffic, to and from your instances. 
- Security groups are the equivalent of firewalls for your EC2 instances. 
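- They contain rules that allow inbound traffic; there are no deny rules.
- By default, a security group blocks all inbound traffic until you add rules that allow it.
- Security groups are stateful: when a request is allowed in, the response is allowed out automatically. By default, all outbound traffic is allowed.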
    
<p style="text-align: center">
  <img  src="Media/AWS-Security-Groups-Firewall.png" width="800" alt="Amazon VPC">
</p>    
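The allow-only behavior of a security group can be modeled in a few lines. This is a simplified sketch, not the AWS implementation. The `Rule` class, the `inbound_allowed` helper, and the example rules are all hypothetical, and rules are reduced to a single protocol, port, and source CIDR.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    protocol: str  # for example "tcp"
    port: int      # single port, for simplicity
    source: str    # source CIDR block


def inbound_allowed(rules, protocol: str, port: int, source_ip: str) -> bool:
    """Traffic is allowed only if some rule matches; with no rules, all is blocked."""
    ip = ipaddress.ip_address(source_ip)
    return any(
        rule.protocol == protocol
        and rule.port == port
        and ip in ipaddress.ip_network(rule.source)
        for rule in rules
    )


web_sg = [Rule("tcp", 443, "0.0.0.0/0"), Rule("tcp", 22, "203.0.113.0/24")]

print(inbound_allowed(web_sg, "tcp", 443, "198.51.100.9"))  # True: HTTPS from anywhere
print(inbound_allowed(web_sg, "tcp", 22, "198.51.100.9"))   # False: SSH only from 203.0.113.0/24
print(inbound_allowed([], "tcp", 443, "198.51.100.9"))      # False: no rules, nothing allowed
```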

### Network access control lists

<p style="text-align: center">
  <img  src="Media/AWS-NACL.png" width="800" alt="Amazon VPC">
</p>  

- Network access control lists work at the subnet level, and control traffic in and out of the subnet.
- You can set up network ACLs with rules to allow or deny.
- You can also specify ports and protocols.
- Each subnet in your VPC must be associated with a network ACL.
- You can associate a network ACL with multiple subnets. However, a subnet can only be associated with one network ACL.

<p style="text-align: center">
  <img  src="Media/AWS-NACL-Rules.png" width="800" alt="Amazon VPC">
</p>  

- A network ACL has separate inbound and outbound rules, and each rule can either allow or deny traffic. 
- Default network ACLs allow all inbound and outbound IPv4 traffic.
- Network ACLs are stateless: return traffic must be explicitly allowed by a rule.
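Unlike security groups, network ACL rules can deny as well as allow. The sketch below assumes that rules are evaluated in order of rule number, lowest first, that the first match decides, and that traffic matching no rule is denied. The tuple layout and the example rules are made up for illustration.

```python
import ipaddress


def evaluate_nacl(rules, protocol: str, port: int, source_ip: str) -> str:
    """Rules are (number, protocol, port, cidr, action); the first match wins."""
    ip = ipaddress.ip_address(source_ip)
    for _number, rule_protocol, rule_port, cidr, action in sorted(rules, key=lambda r: r[0]):
        if (
            rule_protocol in (protocol, "all")
            and rule_port in (port, None)  # None means any port
            and ip in ipaddress.ip_network(cidr)
        ):
            return action
    return "deny"  # implicit final rule when nothing matches


inbound_rules = [
    (100, "tcp", 443, "0.0.0.0/0", "allow"),
    (90, "tcp", 443, "198.51.100.0/24", "deny"),  # lower number, checked first
]

print(evaluate_nacl(inbound_rules, "tcp", 443, "198.51.100.9"))  # deny
print(evaluate_nacl(inbound_rules, "tcp", 443, "203.0.113.5"))   # allow
print(evaluate_nacl(inbound_rules, "tcp", 22, "203.0.113.5"))    # deny
```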

### Security groups versus network ACLs

<p style="text-align: center">
  <img  src="Media/AWS-Security-Groups-vs-NACLs.png" width="800" alt="Amazon VPC">
</p> 


<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 6 - Amazon Route 53<a class="anchor" id="DS107L3_page_6"></a>

[Back to Top](#DS107L3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


## Amazon Route 53

<p style="text-align: center">
  <img  src="Media/AWS-53.png" width="100" alt="Amazon VPC">
</p>

DNS resolution is the process of translating an internet name to the corresponding IP address. DNS stands for Domain Name System, and it functions as a phone book that maps internet names to the IP addresses of the corresponding machines.

<p style="text-align: center">
  <img  src="Media/AWS-Route53.png" width="600" alt="Amazon VPC">
</p>

- Amazon Route 53 gives you the ability to register a domain name such as yourcompany.com and have the service handle the names and hosts related to that account.
- Route 53 is highly available, scalable, and fully compliant with IPv4 and IPv6.
- Connects user requests to infrastructure running in AWS and also outside of AWS.
- Is used to check the health of your resources.
- Features traffic flow.
- Enables you to register domain names.

## Amazon Route 53 supported routing

- Simple routing: Use in single-server environments.
- Weighted routing: Assign weights to resource record sets to specify how frequently each one is returned.
- Latency routing: Help improve your global applications by routing users to the Region with the lowest latency.
- Geolocation routing: Route traffic based on the location of your users.
- Geoproximity routing: Route traffic based on the location of your resources.
- Failover routing: Fail over to a backup site if your primary site becomes unreachable.
- Multivalue answer routing: Respond to DNS queries with up to eight healthy records selected at random.
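Weighted routing from the list above can be illustrated with a small simulation. It assumes each record is chosen with probability equal to its weight divided by the sum of all weights. The record names `blue` and `green` and the helper names are hypothetical.

```python
import random


def traffic_share(weights):
    """Expected fraction of responses for each record: weight / sum of weights."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def pick_record(weights, rng=random):
    """Choose one record at random, in proportion to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[name] for name in names], k=1)[0]


weights = {"blue": 3, "green": 1}

print(traffic_share(weights))  # {'blue': 0.75, 'green': 0.25}
print(pick_record(weights))    # 'blue' about three times out of four
```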

<p style="text-align: center">
  <img  src="Media/AWS-Multi-Region-53.png" width="800" alt="Amazon VPC">
</p>



<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 7 - Amazon CloudFront<a class="anchor" id="DS107L3_page_7"></a>

[Back to Top](#DS107L3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


## Amazon CloudFront

One of the challenges of network communication is network performance. When you browse to a website, your request is routed through many different networks to reach an origin server. The origin server stores the original version of the data, which is commonly large content such as images, songs, and videos. The distance between the customer and the origin server significantly affects playback performance and the user experience. Network latency also varies with the geographic location of your users.

<p style="text-align: center">
  <img  src="Media/AWS-Network-Latency.png" width="800" alt="Amazon VPC">
</p>



<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 9 - Key Terms<a class="anchor" id="DS107L3_page_9"></a>

[Back to Top](#DS107L3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Key Terms

Below is a list and short description of the important keywords learned in this lesson. Please read through and go back and review any concepts you do not fully understand. Great Work!

<table class="table table-striped">
    <tr>
        <th>Keyword</th>
        <th>Description</th>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>MapReduce</td>
        <td>A programming model used to process big data in parallel using a map procedure to filter and process, and a reduce procedure to perform data aggregation and summarization.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Hive</td>
        <td>A program that allows you to write SQL queries for your Hadoop cluster.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>HiveQL</td>
        <td>The Hive brand of SQL.  Primarily only differs in how views are used.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>User-Defined Functions</td>
        <td>Functions you, the user, create.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Schema on Read</td>
        <td>Storing unstructured data and only giving the data a structure when you go to use it.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Schema on Write</td>
        <td>Storing data in structured tables.  A traditional SQL storage system.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Sqoop</td>
        <td>A program to integrate Hive and traditional database connections like MySQL.</td>
    </tr>
</table>

---

## Key SQL Commands

<table class="table table-striped">
    <tr>
        <th>Keyword</th>
        <th>Description</th>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>create view</td>
        <td>Creates a view.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>drop view</td>
        <td>Removes a view.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>create database</td>
        <td>Sets up an empty database.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>show databases</td>
        <td>Lists all the databases that are available.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>drop table if exists</td>
        <td>Removes a table only if it already exists, so that you can recreate it without an error.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>use</td>
        <td>Sets the database you're working in.</td>
    </tr>
</table>

---

## Key Command Line Code

<table class="table table-striped">
    <tr>
        <th>Keyword</th>
        <th>Description</th>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>mysql -u root -p</td>
        <td>Connection sequence to get into MySQL.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>sqoop import</td>
        <td>Imports data from a database connection.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>--hive-import</td>
        <td>A specifier to sqoop import that allows you to import data from a database connection directly into Hive.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>sqoop export</td>
        <td>Exports data from Hadoop into a database connection.</td>
    </tr>
</table>

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 10 - Lesson 3 Hands-On<a class="anchor" id="DS107L3_page_10"></a>

[Back to Top](#DS107L3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">