Skip to content

Latest commit

 

History

History
99 lines (91 loc) · 5.41 KB

5.4. Storage - S3 Integrated Services (Snowball, Athena, Storage Gateway).md

File metadata and controls

99 lines (91 loc) · 5.41 KB

S3 Integrated Services

S3 Select

  • Pull out only the subset of data you need from an object by using simple SQL expressions
  • Improve the performance by 400% and reduce the cost of applications that need to access data in S3.

Snowball

  • Peta-byte scale data transport solution
  • Physical data transport between 80-50 TB to/from S3.
  • Secure tamper resistant, use KMS 256 bit encryption
  • Tracking using SNS and text messages. E-ink shipping label
  • Pay per data transfer job
  • Use cases: large data cloud migrations, DC decommissions, disaster recovery
    • 💡 Use it if data takes longer than a week to upload e.g. > 2TB for 45 Mbps (T3, 269 days), >5TB 100 Mbps (120 days), >60TB 1000 Mbps (12 days).
  • Flow
    1. Request snowball devices from the AWS console for delivery
      • AWS Console -> Snowball service -> Plan your job (Import into S3, Export from S3, Local compute and storage only) -> Give shipping information & choose shipping speed -> Choose snowball or snowball edge variant you want -> Choose storage -> Load EC2 EMI -> IAM Role for Snowball -> Select AWS KMS Encryption key -> Set notification (SNS) & notification topics
    2. Install the snowball client on your servers
    3. Connect the snowball to your servers using ID and manifest file from Console and copy files using the client
    4. Ship back the device when you're done (goes to the right AWS facility)
    5. Data will be loaded into an S3 bucket
    6. Snowball is completely wiped
    7. Tracking is done using SNS, text messages and the AWS console
  • Snowball Edge
    • Snowball Edge add computational capability (e.g. lambda) to the device
      • You can use as local storage only if e.g. your region doesn't have lambda or for durability & capacity.
    • 100 TB capacity with either:
      • Storage optimized - 24 vCPU
      • Compute optimized - 52 vCPU & optional GPU
    • Supports
      • A custom EC2 AMI so you can perform processing on the go
      • Custom Lambda functions
    • Very useful to pre-process the data while moving
    • Use cases: data migration, image collation, IoT capture, machine learning
  • Snowmobile
    • Exabyte scale data transfer service - 45 foot long container truck
    • Transfer exabytes of data (1 EB = 1,000 PB = 1,000,000 TBs)
    • If you have more than 100 TB or PBs data
    • Each Snowmobile has 100 PB of capacity (use multiple in parallel)
    • Better than Snowball if you transfer more than 10 PB e.g. video libraries, image repositories, on-premise migration.
  • 💡 Choose based on capacity:
    • 50 TB Snowball has 42 TB usable free space
    • 80 TB Snowball has 72 TB usable free space
    • 100 TB Snowball Edge has 83 TB usable free space

Storage Gateway

  • 📝Allows you to expose S3 to on-premises.
  • Bridge between on-premise data and cloud data in S3
  • 💡 Use cases: disaster recovery, backup & restore, tiered storage
  • Flow
    • Can be installed on Hyper-V and VMware.
    • You then create gateway on Console
  • 📝 Three types of Storage Gateway:
    • File gateway -> S3
      • Store & update files as objects in S3 asynchronously as you change them.
      • Ownership/permissions and timestamps are stored in S# in the user-metadata of the object.
      • Accessible using the NFS (network file system) and SMB protocol
      • Supports S3 standard, & S3 IA, S3 One Zone IA & Glacier.
      • Bucket access using IAM roles for each File Gateway
      • Most recently used data is cached locally in the file gateway -> Low latency access
      • Can be mounted on many servers
      • 📝 File access / NFS => File Gateway (backed by S3)
    • Volume gateway -> EBS
      • Block storage in Amazon S3 with point-in-time backups as compressed Amazon EBS snapshot.
      • Allows you to view S3 bucket as mountable volume
      • Block storage using iSCSI protocol backed by S3
      • Two options:
        • Cached volumes: most frequently data are cached on premise on the volumes
        • Stored volumes: entire dataset is on premise, scheduled backups to S3 as EBS snapshots
      • 📝 Volumes / Block Storage / iSCSI => Volume gateway (backed by S3 with EBS snapshots)
    • Tape Gateway -> Glacier
      • Input: Your existing tape-based processes
      • Output: As virtual tapes (using Virtual Tape Library, i.e. VTL) in S3 or Glacier.
      • Uses iSCSI which is interface & networking standard for linking data storage facilities.
      • 📝 VTL Tape solution / Backup with iSCSI => Tape gateway (backed by S3 and Glacier)
  • Security: Your data is encrypted at rest by default in S3.
  • Flow
    • AWS Console -> Storage Gateway -> Choose gateway type -> Select host platform (VMware / hyper-v / EC2) -> IP address of gateway VM

Athena

  • Serverless service to perform analytics directly against S3 files
  • Uses SQL language to query the files
  • Has a JDBC / ODBC driver that allows you to connect BI tools to it.
  • Charged per query and amount of data scanned
  • Supports CSV, JSON, ORC, Avro and Parquet (built on Presto)
  • 💡Use cases: Business intelligence / analytics / reporting, analyze & query VPC Flow Logs, ELB Logs, CloudTrail trails, etc..
  • 📝 Analyze data directly on S3 -> use Athena
  • Usage
    1. Go to Athena on Console
    2. Create database with a query
    3. Create table & link to a bucket
      • It's a link so you don't pay anything to have tables
    4. You can then write SQL queries against the table
  • Allows you to easily query encrypted data stored in S3 and write encrypted results back.
    • Both, server-side encryption and client-side encryption are supported