# Amazon S3 (Simple Storage Service)

<img src="../_assets/aws_service_icons/s3.svg" width="80" alt="Amazon S3">

## Goals
- Understand what **Amazon S3** is (and what it is not).
- Know common practical use-cases.
- See a minimal **AWS SDK** pseudo-code workflow (no execution).


## Prerequisites
- Basic familiarity with cloud concepts (regions, IAM) helps.
- Familiarity with files/paths and HTTP.

> This notebook includes **pseudo-code only**. It does not run any AWS SDK calls.


## What S3 is
**Amazon S3** is AWS’s managed **object storage** service.

Core concepts:
- **Bucket**: top-level container for objects (bucket names are globally unique).
- **Object**: the stored bytes + metadata.
- **Key**: the object name inside a bucket (keys can include `/` to act like folder *prefixes*).
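
Because keys are flat strings, "folders" are only a naming convention. The sketch below uses plain Python and makes no AWS calls; it mimics how a listing with a prefix and a `/` delimiter groups keys into direct objects and "subfolder" prefixes. The function name `group_by_delimiter` and the sample keys are illustrative assumptions, not part of any SDK.

```python
def group_by_delimiter(keys, prefix="", delimiter="/"):
    """Split keys under `prefix` into direct objects and rolled-up sub-prefixes."""
    objects = []
    common_prefixes = set()
    for key in keys:
        if not key.startswith(prefix):
            continue  # key is outside the requested prefix
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # A delimiter appears after the prefix: report the "folder" only.
            common_prefixes.add(prefix + head + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(common_prefixes)


# Hypothetical keys, chosen only for illustration.
keys = [
    "experiments/run-001/model.pkl",
    "experiments/run-001/metrics.json",
    "experiments/run-002/model.pkl",
    "experiments/README.md",
]
objects, folders = group_by_delimiter(keys, prefix="experiments/")
print(objects)
print(folders)
```

Nothing named `experiments/run-001/` has to exist as an object for it to show up as a "folder"; it is derived purely from the key strings.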

Important properties and constraints:
- S3 is accessed over an **HTTP API** (and SDKs wrap that API).
- It’s not a traditional filesystem: you typically **upload/download whole objects** rather than doing in-place edits.
- It’s designed for very high durability and availability (the S3 Standard storage class is designed for 99.999999999% durability and 99.99% availability), and supports features like versioning, lifecycle policies, encryption, and access controls.
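
To make the "whole objects, no in-place edits" point concrete, here is a minimal sketch that uses a plain dictionary as a stand-in for a bucket (an assumption for illustration; no AWS calls are made). The hypothetical helper `append_line` shows that "appending" to an object really means reading all of it, building new bytes, and writing a complete replacement.

```python
def append_line(store, key, line):
    """Emulate an append by rewriting the whole object."""
    current = store.get(key, b"")  # "download" the full object (empty if absent)
    updated = current + line.encode("utf-8") + b"\n"
    store[key] = updated  # "upload" a complete replacement
    return len(updated)


# `store` is an in-memory stand-in for a bucket, not a real S3 client.
store = {}
append_line(store, "logs/app.log", "started")
append_line(store, "logs/app.log", "finished")
print(store["logs/app.log"])
```

The cost of each "append" grows with the object size, which is one reason log-style data is usually written as many separate objects rather than one growing file.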

### What it is not
- Not a block disk (like EBS) or a network filesystem (like EFS).
- Not a database (no SQL queries without additional tools/services such as Amazon Athena).


## What S3 is practically used for
S3 is commonly used as the default “place to put bytes” in AWS:
- **Data lakes**: raw/bronze data, curated/silver data, and analytics-ready/gold data.
- **Machine learning**: training datasets, feature files, model artifacts, experiment outputs.
- **Static assets**: images, documents, client bundles, and (optionally) static website hosting.
- **Backups and archives**: snapshots/exports and long-term retention with lifecycle rules.
- **Logs and event data**: centralized storage before downstream processing.
- **System-to-system exchange**: a simple integration point between services or teams.
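
As a rough illustration of the lifecycle rules mentioned for backups and archives, the sketch below builds a rule as a plain dictionary and prints it as JSON. The dictionary is intended to mirror the shape that boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`; the helper name `build_lifecycle_rule`, the `logs/` prefix, and the day counts are assumptions chosen for the example. No AWS call is made.

```python
import json


def build_lifecycle_rule(prefix, archive_after_days, expire_after_days):
    """Build one rule: move objects to an archive class, then delete them."""
    if expire_after_days <= archive_after_days:
        raise ValueError("expiration must come after the archive transition")
    rule_name = prefix.strip("/").replace("/", "-")
    return {
        "ID": f"archive-then-expire-{rule_name}",
        "Filter": {"Prefix": prefix},  # apply only to keys under this prefix
        "Status": "Enabled",
        "Transitions": [{"Days": archive_after_days, "StorageClass": "GLACIER"}],
        "Expiration": {"Days": expire_after_days},
    }


# Hypothetical policy: archive logs after 30 days, delete them after a year.
lifecycle_configuration = {"Rules": [build_lifecycle_rule("logs/", 30, 365)]}
print(json.dumps(lifecycle_configuration, indent=2))
```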

Common operational patterns:
- Separate buckets (or prefixes) for **raw vs processed** data.
- Use **IAM roles/policies** and **bucket policies** to control access.
- Enable **encryption** (SSE-S3 or SSE-KMS) and consider **versioning** for recoverability.
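
One way to apply these patterns in code is to centralize key naming and encryption settings in small helpers. The sketch below is a suggestion rather than an established convention: `build_key`, `encryption_args`, the stage names, and the KMS key ID are all hypothetical. The returned dictionaries use the `ServerSideEncryption` and `SSEKMSKeyId` argument names that boto3 accepts in `ExtraArgs` for uploads, but nothing here calls AWS.

```python
ALLOWED_STAGES = ("raw", "processed")


def build_key(stage, dataset, filename):
    """Build an object key that keeps raw and processed data under separate prefixes."""
    if stage not in ALLOWED_STAGES:
        raise ValueError(f"stage must be one of {ALLOWED_STAGES}, got {stage!r}")
    return f"{stage}/{dataset}/{filename}"


def encryption_args(kms_key_id=None):
    """Return upload arguments for SSE-S3 (default) or SSE-KMS (when a key is given)."""
    if kms_key_id is None:
        return {"ServerSideEncryption": "AES256"}  # SSE-S3
    return {"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": kms_key_id}  # SSE-KMS


key = build_key("raw", "clickstream", "2024-01-01.json")
print(key)
print(encryption_args())
print(encryption_args("example-key-id"))  # placeholder, not a real key ID
```

In a real upload these arguments would be passed along the lines of `s3.upload_file(local_path, bucket, key, ExtraArgs=encryption_args())`.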


## Using S3 with the AWS SDK (pseudo-code)
Below is a minimal, **non-executable** sketch of common S3 operations using an AWS SDK.

Notes:
- In real projects, avoid hardcoding credentials; prefer **IAM roles** (EC2/ECS/Lambda) or SSO-based local credentials.
- S3 operations can read/write real data and incur cost; use a dedicated dev bucket and clean up.

```python
# PSEUDO-CODE (do not run)

import boto3

region = "us-east-1"
s3 = boto3.client("s3", region_name=region)

bucket = "my-team-ml-artifacts"
prefix = "experiments/run-001/"

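# 1) (Optional) Create a bucket (names must be globally unique).
# In us-east-1, omit CreateBucketConfiguration; S3 rejects a us-east-1 LocationConstraint.
# s3.create_bucket(Bucket=bucket)
# In any other region, pass the region as the LocationConstraint:
# s3.create_bucket(Bucket=bucket, CreateBucketConfiguration={"LocationConstraint": region})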

# 2) Upload a local file.
local_path = "./models/model.pkl"
key = f"{prefix}model.pkl"
s3.upload_file(local_path, bucket, key)

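# 3) List objects under a prefix.
# A single list_objects_v2 call returns at most 1,000 keys, so iterate with a paginator.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"])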

# 4) Download an object to disk (the ./downloads directory must already exist).
download_path = "./downloads/model.pkl"
s3.download_file(bucket, key, download_path)

# 5) Generate a temporary (pre-signed) URL for controlled access.
url = s3.generate_presigned_url(
    ClientMethod="get_object",
    Params={"Bucket": bucket, "Key": key},
    ExpiresIn=3600,
)
print(url)
```
