![Thinkube AI Lab](../icons/tk_full_logo.svg)

# Storage Guide 📦

Learn how to work with different storage types in Thinkube:
- Personal persistent storage
- Shared storage
- Object storage (S3)
- Best practices

## Personal Persistent Storage (`~/work/`)

This is your own space. Files stored here persist across pod restarts.

In [None]:
# Create and test personal persistent storage
from pathlib import Path
import pandas as pd

# TODO: Get work directory path
# TODO: Create work directory if it doesn't exist
# TODO: Create a test file with timestamp
# TODO: Read and display test file content
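
One possible way to fill in the TODOs above is sketched here. The helper name `write_persistence_test` is an assumption made for illustration; the file name `test_persistence.txt` matches the one mentioned in the clean-up cell at the end of this guide.

```python
from datetime import datetime
from pathlib import Path


def write_persistence_test(work_dir: Path) -> str:
    """Create work_dir if needed, write a timestamped file, return its content."""
    # exist_ok=True makes the call safe to repeat after a pod restart
    work_dir.mkdir(parents=True, exist_ok=True)
    test_file = work_dir / "test_persistence.txt"
    stamp = datetime.now().isoformat(timespec="seconds")
    test_file.write_text(f"Written at {stamp}\n")
    # Read the file back so the caller can display what is actually on disk
    return test_file.read_text()
```

In the notebook, `print(write_persistence_test(Path.home() / "work"))` would run it against `~/work/`. Re-running it after a pod restart and finding the file still present is a quick check that the volume is persistent.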

## Shared Storage (`~/shared/`)

Shared across all users, which makes it a good place for common datasets and collaboration.

**Note**: Be respectful of space and organize your files!

In [None]:
# Explore shared storage
from pathlib import Path

# TODO: Get shared directory path
# TODO: List items in shared storage
# TODO: Display first 10 items with sizes
# TODO: Handle case where shared storage is not mounted
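
The TODOs above could be covered by a small listing helper like the following sketch. `list_items` is a hypothetical name, and returning an empty list for a missing directory is just one way to handle an unmounted volume.

```python
from pathlib import Path


def list_items(directory: Path, limit: int = 10) -> list[tuple[str, int]]:
    """Return up to `limit` (name, size_in_bytes) pairs, sorted by name.

    Returns an empty list if the directory does not exist, for example
    when shared storage is not mounted.
    """
    if not directory.is_dir():
        return []
    entries = sorted(directory.iterdir(), key=lambda p: p.name)[:limit]
    # lstat() does not follow symlinks, so a broken link cannot raise here.
    # For a subdirectory this is the size of the directory entry itself,
    # not the total size of its contents.
    return [(p.name, p.lstat().st_size) for p in entries]
```

A loop such as `for name, size in list_items(Path.home() / "shared"): print(f"{name:40s} {size:>12,d} B")` would then display the first 10 items with their sizes.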

## Best Practice: Organize Shared Data

Recommended structure for shared storage:

In [None]:
# Create recommended directory structure
from pathlib import Path

# TODO: Create datasets/ directory in shared
# TODO: Create models/ directory in shared
# TODO: Display created structure
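
A minimal sketch of the structure step, assuming the `datasets/` and `models/` layout named in the TODOs. The function name `create_shared_layout` is hypothetical.

```python
from pathlib import Path


def create_shared_layout(shared_dir: Path, names=("datasets", "models")) -> list[Path]:
    """Create the recommended subdirectories inside an existing shared_dir."""
    # Do not create the shared root itself: if the volume is not mounted,
    # creating it would produce an ordinary local directory instead.
    if not shared_dir.is_dir():
        raise FileNotFoundError(f"{shared_dir} is not available")
    created = []
    for name in names:
        path = shared_dir / name
        path.mkdir(exist_ok=True)  # no error if the directory already exists
        created.append(path)
    return created
```

Calling `create_shared_layout(Path.home() / "shared")` and printing the returned paths would display the created structure.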

## Object Storage (S3-compatible)

Best for large files and backups. Thinkube provides object storage through SeaweedFS, which exposes an S3-compatible API.

In [None]:
# Setup S3 client for SeaweedFS
import boto3
from botocore.client import Config
import os

# TODO: Load credentials from environment
# TODO: Create S3 client with SeaweedFS endpoint
# TODO: Determine user's bucket name
# TODO: Display bucket name
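
The client setup might look roughly like the sketch below. Everything platform-specific here is an assumption: the variable name `S3_ENDPOINT_URL` and the `user-<name>` bucket scheme are hypothetical placeholders, and the credential variables are the standard names boto3 itself reads. Check your Thinkube environment for the actual values.

```python
import os
import re


def s3_settings(env=os.environ) -> dict:
    """Collect keyword arguments for boto3.client("s3", ...) from the environment."""
    return {
        "endpoint_url": env["S3_ENDPOINT_URL"],  # hypothetical variable name
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        "region_name": env.get("AWS_DEFAULT_REGION", "us-east-1"),
    }


def bucket_name_for(username: str) -> str:
    """Hypothetical naming scheme: lowercase letters, digits and hyphens only."""
    safe = re.sub(r"[^a-z0-9]+", "-", username.lower()).strip("-")
    return f"user-{safe}"


def make_s3_client(env=os.environ):
    """Build an S3 client that points at the configured endpoint."""
    # Imported here so the two helpers above can be used without boto3 installed
    import boto3
    from botocore.client import Config

    return boto3.client("s3", config=Config(signature_version="s3v4"), **s3_settings(env))
```

With these in place, `s3 = make_s3_client()` and `bucket = bucket_name_for(os.environ["JUPYTERHUB_USER"])` would give the client and bucket name, assuming your environment sets a variable like `JUPYTERHUB_USER`.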

## Create Your S3 Bucket

In [None]:
# Create S3 bucket

# TODO: Try to create bucket
# TODO: Handle BucketAlreadyOwnedByYou exception
# TODO: Handle other exceptions
# TODO: Display creation status
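
Bucket creation with the error handling listed above could be sketched as follows. `ensure_bucket` is a hypothetical helper; `s3` is assumed to be a boto3 S3 client created earlier.

```python
def ensure_bucket(s3, bucket: str) -> str:
    """Try to create `bucket` and return a short status string."""
    try:
        s3.create_bucket(Bucket=bucket)
        return "created"
    except s3.exceptions.BucketAlreadyOwnedByYou:
        # The bucket is already ours, so there is nothing left to do
        return "exists"
    except Exception as exc:
        # Anything else (permissions, network, invalid name) is reported
        return f"failed: {exc}"
```

`print(ensure_bucket(s3, bucket))` then displays the creation status. The specific exception must be caught before the generic one, otherwise the "exists" branch is never reached.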

## Upload Files to S3

In [None]:
# Upload test file to S3

# TODO: Create test data
# TODO: Upload to S3 with put_object
# TODO: List objects in bucket
# TODO: Display uploaded files with sizes
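
A sketch of the upload and listing steps, using the boto3 calls `put_object` and `list_objects_v2`. The helper names are hypothetical, and `s3` and `bucket` are assumed to come from the earlier cells.

```python
def upload_text(s3, bucket: str, key: str, text: str) -> None:
    """Upload a small text payload as a single object."""
    s3.put_object(Bucket=bucket, Key=key, Body=text.encode("utf-8"))


def list_objects(s3, bucket: str, prefix: str = "") -> list[tuple[str, int]]:
    """Return (key, size_in_bytes) pairs for objects under `prefix`.

    A single list_objects_v2 call returns at most 1000 keys, which is
    enough for this test but not for large buckets.
    """
    response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    # "Contents" is absent from the response when nothing matches
    return [(obj["Key"], obj["Size"]) for obj in response.get("Contents", [])]
```

For example, `upload_text(s3, bucket, "test/hello.txt", "Hello from Thinkube!")` followed by a loop over `list_objects(s3, bucket)` would upload the test file and display it with its size.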

## Download from S3

In [None]:
# Download file from S3

# TODO: Get object from S3
# TODO: Read content
# TODO: Display downloaded content
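
The download step is short enough to sketch in one function. `download_text` is a hypothetical name, and the object is assumed to hold UTF-8 text.

```python
def download_text(s3, bucket: str, key: str) -> str:
    """Fetch an object and decode its body as UTF-8 text."""
    response = s3.get_object(Bucket=bucket, Key=key)
    # "Body" is a streaming object; read() pulls the full payload into memory
    return response["Body"].read().decode("utf-8")
```

`print(download_text(s3, bucket, "test/hello.txt"))` would display the content. Since `read()` loads everything into memory, this approach suits small objects only.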

## Upload Large Files with Progress Bar

In [None]:
# Upload large file with progress tracking
from tqdm import tqdm
import os

# TODO: Define upload_with_progress function
# TODO: Create dummy large file (10 MB)
# TODO: Upload with progress bar
# TODO: Clean up local dummy file
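
One way to implement `upload_with_progress` is to pass a tqdm bar's `update` method as the `Callback` of boto3's `upload_file`, which calls it with the number of bytes transferred so far in each step. The `make_dummy_file` helper and the argument order are assumptions made for this sketch.

```python
import os


def make_dummy_file(path: str, size_mb: int = 10) -> None:
    """Write `size_mb` MiB of random bytes to `path`."""
    chunk = os.urandom(1024 * 1024)  # one 1 MiB block, reused for every write
    with open(path, "wb") as f:
        for _ in range(size_mb):
            f.write(chunk)


def upload_with_progress(s3, path: str, bucket: str, key: str) -> None:
    """Upload a local file while showing a byte-based progress bar."""
    from tqdm import tqdm  # imported here so make_dummy_file works without tqdm

    size = os.path.getsize(path)
    with tqdm(total=size, unit="B", unit_scale=True, desc=key) as bar:
        # boto3 invokes the callback with each increment of bytes sent
        s3.upload_file(path, bucket, key, Callback=bar.update)
```

A full run might be `make_dummy_file("large_file.bin")`, then `upload_with_progress(s3, "large_file.bin", bucket, "data/large_file.bin")`, then `os.remove("large_file.bin")` to clean up the local copy.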

## Storage Best Practices

### Use Personal Storage (`~/work/`) for:
- ✅ Your notebooks and code
- ✅ Small datasets (<1 GB)
- ✅ Experiment results
- ✅ Configuration files

### Use Shared Storage (`~/shared/`) for:
- ✅ Common datasets (e.g., MNIST, CIFAR-10)
- ✅ Pretrained models
- ✅ Team collaboration
- ❌ DO NOT store large personal files here

### Use S3 Storage for:
- ✅ Large datasets (>1 GB)
- ✅ Model checkpoints and artifacts
- ✅ Backups
- ✅ Long-term archival
- ✅ Sharing with other users (via bucket policies)

### Ephemeral Storage (everything else):
- ⚠️ Temporary files only
- ⚠️ Will be deleted on pod restart!
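
To apply the 1 GB guideline above, it can help to measure a dataset before deciding where it goes. The sketch below is only an illustration: both function names are hypothetical, and it assumes "1 GB" means 1 GiB (1024³ bytes).

```python
from pathlib import Path

GIB = 1024 ** 3  # assumption: the guide's "1 GB" is treated as 1 GiB


def directory_size(path: Path) -> int:
    """Total size in bytes of all regular files below `path`."""
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def suggest_storage(size_bytes: int) -> str:
    """Map a dataset size to the storage type recommended above."""
    return "~/work/" if size_bytes < GIB else "S3"
```

For example, `suggest_storage(directory_size(Path.home() / "work" / "my_dataset"))` returns the suggested location for a hypothetical `my_dataset` folder.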

## Clean Up Test Files

In [None]:
# Clean up S3 test files

# TODO: Delete test/hello.txt from S3
# TODO: Delete data/large_file.bin from S3
# TODO: Display cleanup status

# Note: We keep test_persistence.txt in ~/work/ to demonstrate persistence
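
The clean-up step could be sketched with boto3's `delete_object`. `delete_keys` is a hypothetical helper, and `s3` and `bucket` are assumed to come from the earlier cells.

```python
def delete_keys(s3, bucket: str, keys) -> list[str]:
    """Delete each key from `bucket` and return the keys that were processed."""
    deleted = []
    for key in keys:
        s3.delete_object(Bucket=bucket, Key=key)
        print(f"Deleted s3://{bucket}/{key}")
        deleted.append(key)
    return deleted
```

For the test files created in this guide, the call would be `delete_keys(s3, bucket, ["test/hello.txt", "data/large_file.bin"])`.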