# MinIO S3-Compatible File Storage

Learn how to upload, download, and manage files using MinIO in FastAPI applications.

## 1. Understanding S3 and MinIO

MinIO is an S3-compatible object storage server.

In [None]:
print("📦 MinIO vs Other Storage Solutions\n")
print("Local Filesystem:")
print("  ❌ Not scalable")
print("  ❌ Complex replication")
print("  ❌ Hard to back up")
print()
print("AWS S3:")
print("  ✅ Scalable")
print("  ✅ Managed service")
print("  ❌ Costs money")
print()
print("MinIO (S3-compatible):")
print("  ✅ Open source")
print("  ✅ S3 API compatible")
print("  ✅ Self-hosted")
print("  ✅ Works with any S3 client")
print()
print("Benefits:")
print("  - Use same code for local dev (MinIO) and production (AWS S3)")
print("  - High availability and distributed")
print("  - Geo-replication support")
print()

## 2. MinIO Bucket Concepts

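A bucket is a top-level container for objects. Inside a bucket the namespace is flat: "folders" such as `users/1/models/` are just shared prefixes of object keys, separated by `/`.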

In [None]:
print("🪣 Bucket Structure\n")
print("MinIO:")
print("  └── ml-models/  (bucket)")
print("      ├── users/")
print("      │   ├── 1/  (user_id)")
print("      │   │   └── models/")
print("      │   │       ├── model_v1.pkl")
print("      │   │       └── model_v2.pkl")
print("      │   └── 2/")
print("      │       └── models/")
print("      │           └── model.joblib")
print("      └── experiments/")
print("          ├── exp_001/")
print("          │   └── metadata.json")
print("          └── exp_002/")
print("              └── metrics.csv")
print()
print("This structure enables:")
print("  - Multi-tenancy (separate user directories)")
print("  - Simple versioning by filename (model_v1, model_v2)")
print("  - Organized storage")
print()
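
Because keys are flat strings, the "folders" in this tree exist only in how keys are listed. The sketch below is a pure-Python illustration of delimiter-based listing (the helper `list_level` is hypothetical, not part of the minio library): keys under a prefix that contain another `/` are collapsed into one folder-like prefix, roughly what a non-recursive `list_objects()` call shows.

```python
def list_level(keys: list[str], prefix: str = "", delimiter: str = "/") -> tuple[list[str], list[str]]:
    """Split keys under `prefix` into direct objects and "folder" prefixes.

    Any key with another delimiter after the prefix is collapsed into a
    common prefix that ends with the delimiter.
    """
    objects: set[str] = set()
    prefixes: set[str] = set()
    for key in keys:
        if not key.startswith(prefix):
            continue  # outside the requested "folder"
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            objects.add(key)  # object sits directly at this level
        else:
            prefixes.add(prefix + rest[: cut + 1])  # e.g. "users/1/"
    return sorted(objects), sorted(prefixes)


keys = [
    "users/1/models/model_v1.pkl",
    "users/1/models/model_v2.pkl",
    "users/2/models/model.joblib",
    "experiments/exp_001/metadata.json",
    "experiments/exp_002/metrics.csv",
]

print(list_level(keys))            # → ([], ['experiments/', 'users/'])
print(list_level(keys, "users/"))  # → ([], ['users/1/', 'users/2/'])
print(list_level(keys, "users/1/models/"))
```

The last call returns the two `model_v*.pkl` keys as objects and no prefixes, which is why a per-user prefix is enough to list one user's models.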

## 3. Initialize MinIO Client

Create a MinIO client. The constructor only stores the connection settings; the server is contacted on the first API call.

In [None]:
import os

from minio import Minio
from minio.error import S3Error

# Initialize MinIO client.
# Settings come from environment variables; the defaults are for local development only.
MINIO_ENDPOINT = os.getenv("MINIO_ENDPOINT", "localhost:9000")
MINIO_ACCESS_KEY = os.getenv("MINIO_ACCESS_KEY", "minioadmin")
MINIO_SECRET_KEY = os.getenv("MINIO_SECRET_KEY", "minioadmin123")
MINIO_BUCKET = os.getenv("MINIO_BUCKET", "ml-models")
MINIO_SECURE = os.getenv("MINIO_SECURE", "false").lower() == "true"  # True for HTTPS

client = Minio(
    MINIO_ENDPOINT,
    access_key=MINIO_ACCESS_KEY,
    secret_key=MINIO_SECRET_KEY,
    secure=MINIO_SECURE
)

print("✅ MinIO client initialized")
print(f"   Endpoint: {MINIO_ENDPOINT}")
print(f"   Bucket: {MINIO_BUCKET}")
print()
print("Note: creating the client does not contact the server; the first API call does.")
print("The following cells print code patterns instead of running them against a live server.")
print()
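
A minimal sketch of loading these settings from environment variables, assuming the variable names match the constants above. Pointing the same code at AWS S3 should then only require different values (for example `MINIO_ENDPOINT=s3.amazonaws.com` and `MINIO_SECURE=true`). `MinioSettings` is a hypothetical helper; only the `Minio(...)` keyword arguments it produces come from the minio library.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class MinioSettings:
    """Connection settings for MinIO or AWS S3 (hypothetical helper)."""
    endpoint: str
    access_key: str
    secret_key: str
    bucket: str
    secure: bool

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "MinioSettings":
        # Credentials have no defaults: fail fast if they are missing
        try:
            access_key = env["MINIO_ACCESS_KEY"]
            secret_key = env["MINIO_SECRET_KEY"]
        except KeyError as missing:
            raise RuntimeError(f"Missing required setting: {missing.args[0]}") from None
        return cls(
            endpoint=env.get("MINIO_ENDPOINT", "localhost:9000"),
            access_key=access_key,
            secret_key=secret_key,
            bucket=env.get("MINIO_BUCKET", "ml-models"),
            # Accept common truthy spellings; anything else means plain HTTP
            secure=env.get("MINIO_SECURE", "false").strip().lower() in {"1", "true", "yes"},
        )

    def client_kwargs(self) -> dict:
        """Keyword arguments for minio.Minio(...)."""
        return {
            "endpoint": self.endpoint,
            "access_key": self.access_key,
            "secret_key": self.secret_key,
            "secure": self.secure,
        }
```

Usage would then be `client = Minio(**MinioSettings.from_env().client_kwargs())`, with the bucket name taken from `settings.bucket`.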

## 4. Ensure Bucket Exists

Create bucket if it doesn't exist.

In [None]:
# Code example for ensuring bucket exists
print("Ensuring bucket exists (code pattern):\n")
print("""from minio import Minio
from minio.error import S3Error

def ensure_bucket_exists(client: Minio, bucket_name: str):
    \"\"\"Create bucket if it doesn't exist.\"\"\"
    try:
        if not client.bucket_exists(bucket_name):
            client.make_bucket(bucket_name)
            print(f"Created bucket: {bucket_name}")
        else:
            print(f"Bucket already exists: {bucket_name}")
    except S3Error as e:
        print(f"Error: {e}")
        raise

# Usage
ensure_bucket_exists(client, "ml-models")
""")
print()
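
`make_bucket()` rejects names that break the S3 naming rules, so it can help to check configuration early. A sketch (hypothetical helper `is_valid_bucket_name`) covering a subset of those rules: 3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit; no adjacent dots; not formatted like an IPv4 address. It is not exhaustive, so the server remains the final authority.

```python
import ipaddress
import re

# Lowercase letters, digits, dots and hyphens; 3-63 chars; alphanumeric at both ends
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Check a subset of the S3 bucket naming rules (not exhaustive)."""
    if not _BUCKET_RE.fullmatch(name):
        return False  # wrong length or characters
    if ".." in name:
        return False  # no adjacent periods
    try:
        ipaddress.IPv4Address(name)
        return False  # must not look like an IP address
    except ValueError:
        return True
```

Calling it on `MINIO_BUCKET` at startup turns a typo such as `ML_Models` into an immediate, readable error instead of a failure on the first request.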

## 5. File Upload

Upload files to MinIO.

In [None]:
print("📤 File Upload Pattern\n")
print("""import io

from minio import Minio
from minio.error import S3Error

def upload_model_file(
    client: Minio,
    bucket: str,
    file_data: io.BytesIO,
    file_name: str,
    user_id: int,
    content_type: str = "application/octet-stream"
) -> str:
    \"\"\"Upload a file and return the object path.\"\"\"
    try:
        # Construct path: users/{user_id}/models/{filename}
        object_path = f"users/{user_id}/models/{file_name}"
        
        # Upload file
        file_data.seek(0)  # Reset to beginning
        client.put_object(
            bucket_name=bucket,
            object_name=object_path,
            data=file_data,
            length=file_data.getbuffer().nbytes,
            content_type=content_type,
            metadata={"uploaded_by": str(user_id)}
        )
        
        print(f"✅ Uploaded: {object_path}")
        return object_path
        
    except S3Error as e:
        print(f"❌ Upload failed: {e}")
        raise

# Usage in FastAPI:
# @router.post(\"/api/v1/files/upload\")
# async def upload_file(
#     file: UploadFile,
#     current_user: User = Depends(get_current_user)
# ):
#     file_content = await file.read()
#     file_obj = io.BytesIO(file_content)
#     path = upload_model_file(
#         client, "ml-models", file_obj,
#         file.filename, current_user.id
#     )
#     return {\"path\": path, \"size\": len(file_content)}
""")
print()
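
`file.filename` comes from the client, so it should not be trusted as part of an object key. A minimal sketch (hypothetical helper `build_object_path`) that keeps only the last path component, so a name like `../../2/models/model.pkl` cannot land outside the current user's prefix:

```python
def build_object_path(user_id: int, filename: str) -> str:
    """Build users/{user_id}/models/{name} from an untrusted client filename."""
    # Treat both "/" and "\" as separators and keep only the last component,
    # so "../../x.pkl" or "C:\fakepath\x.pkl" cannot escape the user's prefix
    name = filename.replace("\\", "/").rsplit("/", 1)[-1].strip()
    if name in {"", ".", ".."}:
        raise ValueError(f"Invalid filename: {filename!r}")
    return f"users/{user_id}/models/{name}"
```

In the route, `build_object_path(current_user.id, file.filename)` would replace the inline f-string, and a `ValueError` can be mapped to HTTP 400.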

## 6. File Download

Download files from MinIO.

In [None]:
print("📥 File Download Pattern\n")
print("""from fastapi import HTTPException
from fastapi.responses import Response
from minio import Minio
from minio.error import S3Error

def download_model_file(
    client: Minio,
    bucket: str,
    object_path: str
) -> bytes:
    \"\"\"Download a file from MinIO into memory.\"\"\"
    response = None
    try:
        response = client.get_object(bucket, object_path)
        return response.read()
    except S3Error as e:
        print(f"❌ Download failed: {e}")
        raise
    finally:
        # Always return the HTTP connection to the pool
        if response is not None:
            response.close()
            response.release_conn()

# Usage in FastAPI (buffers the whole file in memory):
# @router.get(\"/api/v1/files/download/{path:path}\")
# async def download_file(path: str, current_user: User = Depends(get_current_user)):
#     # Ownership: only look inside the current user's prefix
#     object_path = f\"users/{current_user.id}/models/{path}\"
#     try:
#         file_data = download_model_file(client, \"ml-models\", object_path)
#     except S3Error:
#         raise HTTPException(status_code=404, detail=\"File not found\")
#     filename = object_path.split(\"/\")[-1]
#     return Response(
#         content=file_data,
#         media_type=\"application/octet-stream\",
#         headers={\"Content-Disposition\": f'attachment; filename=\"{filename}\"'},
#     )
""")
print()
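
`download_model_file()` reads the whole object into memory. For large models, a streaming variant can pass chunks on as they arrive. The sketch below assumes, as in minio-py, that `get_object()` returns a response with `read(n)`, `close()`, and `release_conn()`; `iter_object_chunks` and `FakeObjectResponse` are hypothetical names, the latter an in-memory stand-in for local testing.

```python
import io
from typing import Iterator


def iter_object_chunks(response, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield an object's body in chunks, then close and release the connection.

    `response` is anything with read(n) and close(); release_conn() is
    called only if the object provides it.
    """
    try:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            yield chunk
    finally:
        # Runs when the generator is exhausted or closed early
        response.close()
        release = getattr(response, "release_conn", None)
        if release is not None:
            release()


class FakeObjectResponse(io.BytesIO):
    """In-memory stand-in for a get_object() response, for local testing."""

    def __init__(self, data: bytes):
        super().__init__(data)
        self.released = False

    def release_conn(self) -> None:
        self.released = True
```

In a route, the generator could be handed to `fastapi.responses.StreamingResponse`, for example `StreamingResponse(iter_object_chunks(client.get_object("ml-models", object_path)), media_type="application/octet-stream")`, so memory use stays near `chunk_size` regardless of file size.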

## 7. Presigned URLs

Generate temporary download URLs without exposing credentials.

In [None]:
print("🔗 Presigned URL Pattern\n")
print("""from datetime import timedelta

from minio import Minio
from minio.error import S3Error

def get_presigned_url(
    client: Minio,
    bucket: str,
    object_path: str,
    expires_seconds: int = 3600
) -> str:
    \"\"\"Generate a temporary download URL.\"\"\"
    try:
        # The URL is signed with the client's credentials; the secret key is not included
        url = client.presigned_get_object(
            bucket_name=bucket,
            object_name=object_path,
            expires=timedelta(seconds=expires_seconds)
        )
        return url
    except S3Error as e:
        print(f\"❌ Failed to generate URL: {e}\")
        raise

# Usage in FastAPI:
# @router.get(\"/api/v1/files/presigned-url/{path:path}\")
# async def get_download_url(
#     path: str,
#     expires_seconds: int = 3600,
#     current_user: User = Depends(get_current_user)
# ):
#     object_path = f\"users/{current_user.id}/models/{path}\"
#     url = get_presigned_url(client, \"ml-models\", object_path, expires_seconds)
#     return {\"url\": url, \"expires_in_seconds\": expires_seconds}
""")

print("Benefits of Presigned URLs:")
print("  ✅ No credentials exposed")
print("  ✅ Time-limited access")
print("  ✅ Can be shared with users")
print("  ✅ Works for direct browser downloads")
print("  ✅ Reduces server load (direct S3 transfer)")
print()
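
Because `expires_seconds` arrives as a query parameter, the route above lets callers choose any lifetime. A sketch of validating it first, and of reading the lifetime back out of a generated URL: SigV4 presigned URLs carry it in the `X-Amz-Expires` query parameter and cannot exceed 7 days. The function names and the 1-hour `policy_max` default are assumptions, chosen to match the short-expiry advice in the security best practices later in this notebook.

```python
from datetime import timedelta
from typing import Optional
from urllib.parse import parse_qs, urlsplit

SIGV4_MAX_EXPIRY = timedelta(days=7)  # protocol ceiling for presigned URLs


def presign_expiry(seconds: int, policy_max: int = 3600) -> timedelta:
    """Validate a requested lifetime and return it as a timedelta.

    policy_max is an application choice; 7 days is the hard SigV4 limit.
    """
    limit = min(timedelta(seconds=policy_max), SIGV4_MAX_EXPIRY)
    expires = timedelta(seconds=seconds)
    if expires < timedelta(seconds=1) or expires > limit:
        raise ValueError(f"expires_seconds must be between 1 and {int(limit.total_seconds())}")
    return expires


def url_lifetime_seconds(url: str) -> Optional[int]:
    """Read X-Amz-Expires from a SigV4 presigned URL, if present."""
    values = parse_qs(urlsplit(url).query).get("X-Amz-Expires")
    return int(values[0]) if values else None
```

In the route, `client.presigned_get_object(bucket, object_path, expires=presign_expiry(expires_seconds))` would apply the policy, with `ValueError` mapped to HTTP 400.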

## 8. File Deletion

Remove files from MinIO.

In [None]:
print("🗑️ File Deletion Pattern\n")
print("""from minio import Minio
from minio.error import S3Error

def delete_model_file(client: Minio, bucket: str, object_path: str) -> bool:
    \"\"\"Delete a file from MinIO.\"\"\"
    try:
        client.remove_object(bucket, object_path)
        print(f\"✅ Deleted: {object_path}\")
        return True
    except S3Error as e:
        print(f\"❌ Deletion failed: {e}\")
        return False

# Usage in FastAPI:
# @router.delete(\"/api/v1/models/{model_id}\")
# async def delete_model(
#     model_id: int,
#     current_user: User = Depends(get_current_user),
#     db: AsyncSession = Depends(get_db),
# ):
#     # Get model from database
#     result = await db.execute(select(MLModel).where(MLModel.id == model_id))
#     model = result.scalar_one_or_none()
#     if not model:
#         raise HTTPException(status_code=404)
#     
#     # Verify ownership
#     if model.owner_id != current_user.id:
#         raise HTTPException(status_code=403)
#     
#     # Delete file from MinIO
#     if model.model_file_path:
#         delete_model_file(client, \"ml-models\", model.model_file_path)
#     
#     # Delete from database
#     await db.delete(model)
#     await db.commit()
#     
#     return {\"message\": \"Model deleted\"}
""")
print()
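
In the route above, the file is removed before the database commit. If the commit then fails, the row still points at a file that no longer exists. One alternative ordering, sketched here with plain callables (the function name and the orphan logging are hypothetical, and a real route would await async database calls), is to delete the row first and treat a failed file removal as an orphan to clean up later:

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def delete_record_then_object(
    delete_record: Callable[[], None],
    delete_object: Callable[[], None],
    object_path: str,
) -> bool:
    """Delete the database row first, then the stored file.

    Returns True if both succeeded. If the file deletion fails, the row is
    already gone, so the object is logged as an orphan for later cleanup.
    """
    delete_record()  # let this raise: nothing has been removed yet
    try:
        delete_object()
    except Exception:
        logger.warning("Orphaned object left in storage: %s", object_path, exc_info=True)
        return False
    return True
```

An orphaned object wastes storage but never breaks a user-facing lookup, whereas a dangling row makes the model look available when its file is gone.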

## 9. File Validation

Validate file types before upload.

In [None]:
print("✅ File Validation Pattern\n")
print("""from fastapi import UploadFile
from pathlib import Path

# Allowed extensions for ML models
ALLOWED_EXTENSIONS = {".pkl", ".joblib", ".pt", ".h5", ".pb", ".onnx", ".model"}
MAX_FILE_SIZE = 500 * 1024 * 1024  # 500 MB

def validate_model_file(file: UploadFile) -> bool:
    \"\"\"Validate uploaded model file.\"\"\"
    # Check extension
    file_ext = Path(file.filename).suffix.lower()
    if file_ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f\"File type {file_ext} not allowed. Allowed: {ALLOWED_EXTENSIONS}\")
    
    # Check size
    if file.size and file.size > MAX_FILE_SIZE:
        raise ValueError(f\"File too large. Max size: {MAX_FILE_SIZE // (1024 * 1024)} MB\")
    
    # Check MIME type
    allowed_mimes = {\"application/octet-stream\", \"application/x-python-pickle\"}
    if file.content_type not in allowed_mimes:
        print(f\"Warning: Unexpected MIME type: {file.content_type}\")
    
    return True

# Usage in FastAPI:
# @router.post(\"/api/v1/files/upload\")
# async def upload_file(file: UploadFile = File(...)):
#     try:
#         validate_model_file(file)
#     except ValueError as e:
#         raise HTTPException(status_code=400, detail=str(e))
#     
#     # ... proceed with upload ...
""")
print()
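
An extension says little about what a file really contains. A complementary check is to look at the first few bytes. The sketch below (hypothetical helper `sniff_model_format`) recognizes a few well-known signatures: HDF5 (Keras `.h5`), ZIP (used by PyTorch's default `torch.save` format since 1.6), and pickle protocol 2 or later, which begins with the `0x80` PROTO opcode. Formats without a fixed signature, such as ONNX (protobuf) or compressed joblib files, return `None`. Sniffing does not make loading safe: unpickling, which joblib and older `.pt` files rely on, can execute arbitrary code, so such files should only be loaded from trusted sources.

```python
from typing import Optional

# Leading "magic" bytes of some common model container formats
_SIGNATURES = [
    (b"\x89HDF\r\n\x1a\n", "hdf5"),  # Keras .h5
    (b"PK\x03\x04", "zip"),          # PyTorch .pt (zip-based), .keras
]


def sniff_model_format(head: bytes) -> Optional[str]:
    """Guess a container format from the first bytes of an upload."""
    for magic, fmt in _SIGNATURES:
        if head.startswith(magic):
            return fmt
    # Pickle protocol 2+ starts with the PROTO opcode (0x80) and a version byte
    if len(head) >= 2 and head[0] == 0x80 and 2 <= head[1] <= 5:
        return "pickle"
    return None  # unknown or signature-less format
```

With an `UploadFile`, `head = await file.read(8)` followed by `await file.seek(0)` gives enough bytes to sniff without disturbing the later upload.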

## 10. MinIO Client Wrapper Class

Create a reusable wrapper for common operations.

In [None]:
print("🛠️ MinIO Client Wrapper Class\n")
print("""from minio import Minio
from minio.error import S3Error
from datetime import timedelta
import io

class MinIOClient:
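    def __init__(self, endpoint: str, access_key: str, secret_key: str, bucket: str, secure: bool = False):
        # Keyword arguments avoid depending on the constructor's positional order
        self.client = Minio(endpoint, access_key=access_key, secret_key=secret_key, secure=secure)
        self.bucket = bucket
        self.ensure_bucket_exists()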
    
    def ensure_bucket_exists(self):
        \"\"\"Create bucket if it doesn't exist.\"\"\"
        if not self.client.bucket_exists(self.bucket):
            self.client.make_bucket(self.bucket)
    
    def upload(
        self,
        file_data: io.BytesIO,
        object_path: str,
        content_type: str = \"application/octet-stream\"
    ) -> str:
        \"\"\"Upload file and return path.\"\"\"
        file_data.seek(0)
        self.client.put_object(
            self.bucket,
            object_path,
            file_data,
            length=file_data.getbuffer().nbytes,
            content_type=content_type
        )
        return object_path
    
    def download(self, object_path: str) -> bytes:
        \"\"\"Download file.\"\"\"
        response = self.client.get_object(self.bucket, object_path)
        try:
            return response.read()
        finally:
            # Release the connection even if read() fails
            response.close()
            response.release_conn()
    
    def get_presigned_url(self, object_path: str, expires_seconds: int = 3600) -> str:
        \"\"\"Generate temporary download URL.\"\"\"
        return self.client.presigned_get_object(
            self.bucket,
            object_path,
            expires=timedelta(seconds=expires_seconds)
        )
    
    def delete(self, object_path: str) -> bool:
        \"\"\"Delete file.\"\"\"
        try:
            self.client.remove_object(self.bucket, object_path)
            return True
        except S3Error:
            return False
    
    def list_objects(self, prefix: str = \"\") -> list[str]:
        \"\"\"List all objects under an optional prefix, including nested keys.\"\"\"
        # Without recursive=True, only the top level under the prefix is returned
        return [
            obj.object_name
            for obj in self.client.list_objects(self.bucket, prefix=prefix, recursive=True)
        ]

# Usage:
# minio_client = MinIOClient(
#     endpoint=\"localhost:9000\",
#     access_key=\"minioadmin\",
#     secret_key=\"minioadmin123\",
#     bucket=\"ml-models\"
# )
# 
# # Upload
# path = minio_client.upload(file_data, \"users/1/models/model.pkl\")
# 
# # Download
# data = minio_client.download(path)
# 
# # Get URL
# url = minio_client.get_presigned_url(path)
# 
# # Delete
# minio_client.delete(path)
# 
# # List
# models = minio_client.list_objects(prefix=\"users/1/models/\")
""")
print()
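
Because the wrapper exposes a small interface, tests can swap in a fake. A minimal sketch of a dict-backed stand-in (hypothetical `InMemoryStorage`) with the same method names; unlike the real client, it raises `FileNotFoundError` for missing keys rather than `S3Error`, and its URLs are not signed.

```python
import io
from typing import Dict, List


class InMemoryStorage:
    """Dict-backed stand-in with the same methods as MinIOClient (for tests)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def upload(self, file_data: io.BytesIO, object_path: str,
               content_type: str = "application/octet-stream") -> str:
        file_data.seek(0)  # mirror the real wrapper: upload from the start
        self._objects[object_path] = file_data.read()
        return object_path

    def download(self, object_path: str) -> bytes:
        try:
            return self._objects[object_path]
        except KeyError:
            raise FileNotFoundError(object_path) from None

    def get_presigned_url(self, object_path: str, expires_seconds: int = 3600) -> str:
        # Fake URL: no signature is computed
        return f"memory://{object_path}?expires={expires_seconds}"

    def delete(self, object_path: str) -> bool:
        # S3 DeleteObject treats a missing key as success, so always True here
        self._objects.pop(object_path, None)
        return True

    def list_objects(self, prefix: str = "") -> List[str]:
        return sorted(key for key in self._objects if key.startswith(prefix))
```

If routes obtain the client through a dependency (a hypothetical `get_storage`), a test could set `storage = InMemoryStorage()` and `app.dependency_overrides[get_storage] = lambda: storage` to exercise the routes without a MinIO server.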

## 11. Practical Example: Complete Upload Flow

See how all pieces fit together.

In [None]:
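print("🔄 Complete File Upload Flow in FastAPI\n")
print("""import io
from pathlib import Path

from fastapi import APIRouter, UploadFile, File, Depends, HTTPException
from fastapi.concurrency import run_in_threadpool
from sqlalchemy.ext.asyncio import AsyncSession

# Assumed to be defined elsewhere in the app:
# User, MLModel, get_current_user, get_db, and minio_client (a MinIOClient)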

router = APIRouter(prefix=\"/api/v1/files\", tags=[\"files\"])

ALLOWED_EXT = {\".pkl\", \".joblib\", \".pt\", \".h5\", \".onnx\"}
MAX_SIZE = 500 * 1024 * 1024

@router.post(\"/upload\")
async def upload_file(
    file: UploadFile = File(...),
    current_user: User = Depends(get_current_user),
    db: AsyncSession = Depends(get_db)
):
    # 1. Validate file
    ext = Path(file.filename).suffix.lower()
    if ext not in ALLOWED_EXT:
        raise HTTPException(status_code=400, detail=f\"Invalid extension: {ext}\")
    
    # 2. Read file
    content = await file.read()
    if len(content) > MAX_SIZE:
        raise HTTPException(status_code=413, detail=\"File too large\")
    
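    # 3. Upload to MinIO (the minio client is blocking, so run it in a thread)
    file_obj = io.BytesIO(content)
    safe_name = Path(file.filename).name  # drop any client-supplied directories
    object_path = f\"users/{current_user.id}/models/{safe_name}\"
    
    await run_in_threadpool(minio_client.upload, file_obj, object_path)
    
    # 4. Optional: Save metadata to database
    model = MLModel(
        name=Path(safe_name).stem,
        model_file_path=object_path,
        owner_id=current_user.id,
        file_size=len(content)
    )
    db.add(model)
    await db.commit()
    await db.refresh(model)  # load the generated id without lazy loading in async code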
    
    # 5. Return response
    return {
        \"filename\": file.filename,
        \"path\": object_path,
        \"size\": len(content),
        \"model_id\": model.id
    }
""")
print()
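
The upload route above reads the whole file with `await file.read()` before checking its size. A sketch of a bounded copy (hypothetical helper `read_limited`) that stops as soon as the limit is exceeded and returns a rewound `BytesIO` ready for upload; it could be fed `file.file`, the underlying file object of `UploadFile`. By the time the route runs, the multipart body has already been received, so this bounds memory use in the route; rejecting oversized requests outright typically needs a body size limit in front of the app (for example at the reverse proxy).

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MiB


def read_limited(stream, max_bytes: int, chunk_size: int = CHUNK_SIZE) -> io.BytesIO:
    """Copy a file-like stream into memory, failing once max_bytes is exceeded."""
    buffer = io.BytesIO()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        if buffer.tell() + len(chunk) > max_bytes:
            raise ValueError(f"File exceeds {max_bytes} bytes")
        buffer.write(chunk)
    buffer.seek(0)  # ready for put_object / MinIOClient.upload
    return buffer
```

In the route, `file_obj = await run_in_threadpool(read_limited, file.file, MAX_SIZE)` with `ValueError` mapped to HTTP 413 would replace steps 2 and 3's full read.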

## 12. Security Best Practices

Important considerations for production file storage.

In [None]:
print("🔒 MinIO Security Best Practices\n")
print()
print("1. CREDENTIALS")
print("   ❌ DON'T: Hardcode access_key and secret_key")
print("   ✅ DO: Store in environment variables")
print()
print("2. BUCKET ORGANIZATION")
print("   ❌ DON'T: Store all files in one location")
print("   ✅ DO: Namespace by user: users/{user_id}/models/")
print("         This enables per-user access controls")
print()
print("3. FILE VALIDATION")
print("   ❌ DON'T: Accept any file type")
print("   ✅ DO: Whitelist allowed extensions")
print("         Check file size before upload")
print("         Scan for malware (optional)")
print()
print("4. PRESIGNED URLS")
print("   ❌ DON'T: Long expiry times (>1 hour)")
print("   ✅ DO: Short expiry (5-60 minutes)")
print("         Generate on demand, never cache")
print()
print("5. OWNERSHIP VERIFICATION")
print("   ❌ DON'T: Let users download any file")
print("   ✅ DO: Verify file belongs to current user before download")
print("         Check path matches user_id")
print()
print("6. ENCRYPTION")
print("   ❌ DON'T: Store unencrypted sensitive models")
print("   ✅ DO: Enable MinIO encryption at rest")
print("         Use HTTPS/TLS in production")
print()
print("7. VERSIONING")
print("   ❌ DON'T: Overwrite files without tracking")
print("   ✅ DO: Include version in filename: model_v1.pkl, model_v2.pkl")
print("         Store metadata in database")
print()
print("8. ACCESS LOGGING")
print("   ❌ DON'T: Run without an audit trail")
print("   ✅ DO: Log all upload/download operations")
print("         Include: user_id, filename, timestamp, IP")
print()
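
Point 5 can be enforced on the object key itself. A sketch (hypothetical helper `user_owns_path`) that accepts only keys of the form `users/{user_id}/...` and rejects empty, `.`, or `..` segments, so neither `users/1/../2/...` nor `users/10/...` passes for user 1. Checking ownership against the database record, where one exists, is still the stronger test.

```python
def user_owns_path(object_path: str, user_id: int) -> bool:
    """True if object_path is a key inside users/{user_id}/."""
    parts = object_path.split("/")
    # Empty, "." and ".." segments cover leading "/", "//" and traversal attempts
    if any(part in {"", ".", ".."} for part in parts):
        return False
    # Compare whole segments, so user 1 does not match "users/10/..."
    return len(parts) > 2 and parts[0] == "users" and parts[1] == str(user_id)
```

A download or presigned-URL route can call this before touching MinIO and return HTTP 403 when it is false.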

## Summary

**MinIO Key Concepts:**
- ✅ S3-compatible object storage
- ✅ Buckets organize objects
- ✅ Objects are files with metadata
- ✅ Presigned URLs for temporary access
- ✅ Same API works for MinIO and AWS S3

**Common Operations:**
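1. **Upload**: `put_object()` with a BytesIO and its length
2. **Download**: `get_object()` returns a streaming HTTP response; call `close()` and `release_conn()` when done
3. **Presigned URL**: `presigned_get_object()`
4. **Delete**: `remove_object()`
5. **List**: `list_objects()` with prefix (`recursive=True` to include nested keys)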

**Best Practices:**
1. Validate files before upload (type, size)
2. Namespace files by user for multi-tenancy
3. Use presigned URLs for downloads
4. Log all file operations
5. Verify ownership before download
6. Use short expiry times for presigned URLs
7. Store file metadata in database
8. Enable encryption for sensitive data

**Integration with FastAPI:**
- Use UploadFile for file handling
- Dependency injection for MinIO client
- Stream downloads for efficiency
- Return presigned URLs for direct access