# Copy Model Artifacts to Public Bucket

## Purpose
Copy trained model artifacts from your personal SageMaker default S3 bucket to the team's shared ("public") bucket so that all team members can access them.

## Prerequisites
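- Models trained via `01_benchmark_model.ipynb` and `02_engineered_baseline.ipynb`, with artifacts stored under `aai540-group1/models/` in your personal bucket
- Read access to your personal bucket and write access to the public bucket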

## Output
Model artifacts available at:
- `s3://sagemaker-us-east-1-425709451100/aai540-group1/models/raw-baseline/model.tar.gz`
- `s3://sagemaker-us-east-1-425709451100/aai540-group1/models/engineered-no-target-encoding/model.tar.gz`

In [1]:
import boto3
import sagemaker

sess = sagemaker.Session()
s3_client = boto3.client('s3')

PERSONAL_BUCKET = sess.default_bucket()
PUBLIC_BUCKET = 'sagemaker-us-east-1-425709451100'
PREFIX = 'aai540-group1/models'

print(f"Source bucket: {PERSONAL_BUCKET}")
print(f"Destination bucket: {PUBLIC_BUCKET}")

Source bucket: sagemaker-us-east-1-786869526001
Destination bucket: sagemaker-us-east-1-425709451100
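`sess.default_bucket()` resolves to a bucket named after the region and the caller's AWS account ID, which is why the source (`...-786869526001`) and destination (`...-425709451100`) above differ. If the notebook is run from the account that owns the public bucket, source and destination are the same bucket and the copy step is redundant. A minimal sketch of a guard for that case, assuming the default `sagemaker-{region}-{account_id}` naming with no custom bucket prefix configured; `default_bucket_name` and `check_distinct_buckets` are hypothetical helpers, not SageMaker APIs:

```python
def default_bucket_name(region: str, account_id: str) -> str:
    """Build the bucket name the SageMaker SDK uses by default.

    Assumption: the default bucket follows 'sagemaker-{region}-{account_id}'
    (no custom default-bucket prefix configured).
    """
    return f"sagemaker-{region}-{account_id}"


def check_distinct_buckets(source_bucket: str, dest_bucket: str) -> None:
    """Hypothetical guard: raise if source and destination are the same bucket."""
    if source_bucket == dest_bucket:
        raise ValueError(
            f"Source and destination are both '{source_bucket}'; "
            "run this notebook from a personal account instead."
        )


# Usage in the notebook (assumes PERSONAL_BUCKET and PUBLIC_BUCKET from the cell above):
# check_distinct_buckets(PERSONAL_BUCKET, PUBLIC_BUCKET)
```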


In [2]:
# Find all model.tar.gz files in personal bucket.
# list_objects_v2 returns at most 1,000 keys per call, so use a paginator.
print("Scanning for model artifacts...")

paginator = s3_client.get_paginator('list_objects_v2')

models_to_copy = []
for page in paginator.paginate(Bucket=PERSONAL_BUCKET, Prefix=f"{PREFIX}/"):
    for obj in page.get('Contents', []):
        if not obj['Key'].endswith('model.tar.gz'):
            continue
        # Keys look like '<PREFIX>/<model_type>/...'; PREFIX has two segments, so index 2 is the model type
        parts = obj['Key'].split('/')
        model_type = parts[2]  # e.g., 'raw-baseline' or 'engineered-no-target-encoding'
        size_mb = obj['Size'] / (1024 * 1024)
        models_to_copy.append({
            'source_key': obj['Key'],
            'model_type': model_type,
            'size_mb': size_mb
        })
        print(f"  Found: {model_type} ({size_mb:.2f} MB)")

print(f"\nTotal models found: {len(models_to_copy)}")

Scanning for model artifacts...
  Found: engineered-baseline (0.11 MB)
  Found: engineered-no-target-encoding (0.03 MB)
  Found: raw-baseline (2.46 MB)
  Found: raw-baseline (2.46 MB)

Total models found: 4
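The scan reports `raw-baseline` twice: two different objects in the personal bucket share that model-type segment. In the next cell both map to the same destination key, so whichever key S3 lists first (keys come back in ascending key order) is the one that gets copied, not necessarily the most recently trained one. If the intent is to publish the newest artifact, one option is to keep only the latest object per model type. A minimal sketch, assuming each entry is an item from the `Contents` list returned by `list_objects_v2` (with `Key`, `Size`, and `LastModified`); `latest_per_model_type` is a hypothetical helper:

```python
def latest_per_model_type(objects, prefix):
    """Keep only the newest model.tar.gz per model type.

    `objects` are dicts shaped like list_objects_v2 'Contents' entries
    (Key, Size, LastModified). Keys are assumed to look like
    '<prefix>/<model_type>/.../model.tar.gz'.
    """
    depth = len(prefix.strip('/').split('/'))  # number of path segments in the prefix
    latest = {}
    for obj in objects:
        key = obj['Key']
        if not key.endswith('model.tar.gz'):
            continue
        parts = key.split('/')
        if len(parts) <= depth + 1:
            continue  # no model-type segment between the prefix and the file name
        model_type = parts[depth]
        current = latest.get(model_type)
        # LastModified values are comparable (boto3 returns timezone-aware datetimes)
        if current is None or obj['LastModified'] > current['LastModified']:
            latest[model_type] = obj
    return latest
```

In the notebook, `objects` would be the combined `Contents` of the paginated listing, and `models_to_copy` could then be rebuilt from the returned values so each model type is copied exactly once.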


In [3]:
# Copy models to public bucket with simplified structure:
#   <PREFIX>/<model_type>/model.tar.gz
from botocore.exceptions import ClientError

print("Copying to public bucket...\n")

for model in models_to_copy:
    source_key = model['source_key']
    dest_key = f"{PREFIX}/{model['model_type']}/model.tar.gz"

    # Skip the copy if the destination object already exists.
    # Note: only the first source per model_type is copied; later duplicates are skipped.
    try:
        s3_client.head_object(Bucket=PUBLIC_BUCKET, Key=dest_key)
        print(f"{model['model_type']:40s} Already exists, skipping")
        continue
    except ClientError as e:
        # 404 means the object is missing; re-raise anything else (e.g. 403) so permission problems are not masked
        if e.response['Error']['Code'] not in ('404', 'NoSuchKey', 'NotFound'):
            raise

    copy_source = {'Bucket': PERSONAL_BUCKET, 'Key': source_key}
    s3_client.copy_object(
        CopySource=copy_source,
        Bucket=PUBLIC_BUCKET,
        Key=dest_key
    )
    print(f"{model['model_type']:40s} ✓ Copied")

print("\n✓ All models copied to public bucket")

Copying to public bucket...

engineered-baseline                      ✓ Copied
engineered-no-target-encoding            ✓ Copied
raw-baseline                             ✓ Copied
raw-baseline                             Already exists, skipping

✓ All models copied to public bucket
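Because existing destination objects are skipped, re-running this notebook after retraining leaves the old artifact in the public bucket. One alternative, sketched below under the assumption that overwriting shared artifacts is acceptable for the team, is to re-copy when the source object is newer than the destination copy. `plan_copy` is a hypothetical helper; the `LastModified` values would come from `list_objects_v2` (source) and `head_object` (destination).

```python
def plan_copy(source_last_modified, dest_last_modified):
    """Decide what to do with one artifact.

    Returns 'copy' if the destination is missing (None),
    'overwrite' if the source was modified after the destination copy was made,
    and 'skip' otherwise. A copied object's LastModified is the copy time,
    so an up-to-date destination is never older than its source.
    """
    if dest_last_modified is None:
        return 'copy'
    if source_last_modified > dest_last_modified:
        return 'overwrite'
    return 'skip'


# Sketch of how it could slot into the copy loop (assumes the notebook's
# s3_client and PUBLIC_BUCKET, plus a source_lm value kept during the scan):
#
# try:
#     dest_lm = s3_client.head_object(Bucket=PUBLIC_BUCKET, Key=dest_key)['LastModified']
# except ClientError as e:  # from botocore.exceptions
#     if e.response['Error']['Code'] != '404':
#         raise
#     dest_lm = None
# action = plan_copy(source_lm, dest_lm)
```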


In [4]:
# Verify models exist in public bucket
from botocore.exceptions import ClientError

print("Verifying models in public bucket...\n")

for model in models_to_copy:
    dest_key = f"{PREFIX}/{model['model_type']}/model.tar.gz"
    dest_path = f"s3://{PUBLIC_BUCKET}/{dest_key}"
    
    try:
        head = s3_client.head_object(Bucket=PUBLIC_BUCKET, Key=dest_key)
        size_mb = head['ContentLength'] / (1024 * 1024)
        print(f"✓ {dest_path}")
        print(f"  Size: {size_mb:.2f} MB")
    except ClientError as e:
        # '404' means the copy is missing; '403' means it cannot be read with these credentials
        print(f"✗ {dest_path} - NOT FOUND ({e.response['Error']['Code']})")

print("\n" + "="*60)
print("Models now available to all team members at:")
print(f"  s3://{PUBLIC_BUCKET}/{PREFIX}/")
print("="*60)

Verifying models in public bucket...

✓ s3://sagemaker-us-east-1-425709451100/aai540-group1/models/engineered-baseline/model.tar.gz
  Size: 0.11 MB
✓ s3://sagemaker-us-east-1-425709451100/aai540-group1/models/engineered-no-target-encoding/model.tar.gz
  Size: 0.03 MB
✓ s3://sagemaker-us-east-1-425709451100/aai540-group1/models/raw-baseline/model.tar.gz
  Size: 2.46 MB
✓ s3://sagemaker-us-east-1-425709451100/aai540-group1/models/raw-baseline/model.tar.gz
  Size: 2.46 MB

Models now available to all team members at:
  s3://sagemaker-us-east-1-425709451100/aai540-group1/models/


## Summary

The model artifacts have been copied to the public bucket. Team members can now:

1. **Run model comparison** - `03_model_comparison.ipynb` will download the artifacts from the public bucket
2. **Deploy models** - use the public bucket paths in the `04_inference/` notebooks
3. **Register models in their own Model Registry** - using the artifacts from the public bucket

### Public Model Locations
| Model | Path |
|-------|------|
| raw-baseline (v1) | `s3://sagemaker-us-east-1-425709451100/aai540-group1/models/raw-baseline/model.tar.gz` |
| engineered-no-target-encoding (v2) | `s3://sagemaker-us-east-1-425709451100/aai540-group1/models/engineered-no-target-encoding/model.tar.gz` |
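For teammates consuming these artifacts, the public paths in the table can be split into bucket and key and fetched with the boto3 S3 client's `download_file`. The sketch below is an assumption about how that could look, not the code used in `03_model_comparison.ipynb` or the `04_inference/` notebooks; `split_s3_uri` and `fetch_model` are hypothetical helpers, and what each `model.tar.gz` contains depends on the training script. Only extract archives from trusted sources.

```python
import tarfile
from pathlib import Path
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != 's3' or not parsed.netloc or not parsed.path.strip('/'):
        raise ValueError(f"Not an s3://bucket/key URI: {uri}")
    return parsed.netloc, parsed.path.lstrip('/')


def fetch_model(s3_client, uri, dest_dir):
    """Download a model.tar.gz from S3 and extract it into dest_dir.

    s3_client is expected to be a boto3 S3 client (anything with download_file).
    Returns the sorted list of member names in the archive.
    """
    bucket, key = split_s3_uri(uri)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / 'model.tar.gz'
    s3_client.download_file(bucket, key, str(archive))
    with tarfile.open(archive, 'r:gz') as tar:
        names = tar.getnames()
        # Use the 'data' extraction filter where this Python version supports it
        if hasattr(tarfile, 'data_filter'):
            tar.extractall(dest, filter='data')
        else:
            tar.extractall(dest)
    return sorted(names)


# Usage (assumes boto3 credentials with read access to the public bucket):
# import boto3
# fetch_model(boto3.client('s3'),
#             's3://sagemaker-us-east-1-425709451100/aai540-group1/models/raw-baseline/model.tar.gz',
#             'artifacts/raw-baseline')
```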