[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vuhung16au/hf-transformer-trove/blob/main/examples/basic4.3/push_to_hub_API_demo.ipynb)
[![Open with SageMaker](https://img.shields.io/badge/Open%20with-SageMaker-orange?logo=amazonaws)](https://studiolab.sagemaker.aws/import/github/vuhung16au/hf-transformer-trove/blob/main/examples/basic4.3/push_to_hub_API_demo.ipynb)
[![View on GitHub](https://img.shields.io/badge/View_on-GitHub-blue?logo=github)](https://github.com/vuhung16au/hf-transformer-trove/blob/main/examples/basic4.3/push_to_hub_API_demo.ipynb)

# Push to Hub API Demonstration

## 🎯 Learning Objectives
By the end of this notebook, you will understand:
- How to authenticate with Hugging Face Hub
- How to use the `push_to_hub` API for models and tokenizers
- How to push datasets to the Hub
- Best practices for model sharing and versioning
- How to handle private vs public repositories
- Troubleshooting common push_to_hub issues

## 📋 Prerequisites
- Basic understanding of machine learning concepts
- Familiarity with Python and PyTorch
- Knowledge of Hugging Face transformers library
- A Hugging Face account (create one at https://huggingface.co/)

## 📚 What We'll Cover
1. **Authentication Setup**: Secure login and token management
2. **Model Preparation**: Load a pre-trained hate speech detection model
3. **Push Models**: Upload models and tokenizers to the Hub
4. **Push Datasets**: Share datasets with the community
5. **Model Cards**: Create comprehensive documentation
6. **Best Practices**: Security, versioning, and collaboration

> 💡 **Educational Focus**: This notebook demonstrates pushing a hate speech detection model to showcase practical applications in content moderation and social media analysis.

> ⚠️ **Important**: Never push sensitive data or credentials to public repositories. Always review your model cards for bias and limitations.

References:
- Existing model: https://huggingface.co/vuhung/hf-basic-4 (private model)
- HF Course: https://huggingface.co/learn/llm-course/chapter4/3?fw=pt
- Colab Example: https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/videos/push_to_hub_pt.ipynb

## 1. Setup and Authentication

First, let's set up our environment and handle authentication securely.

In [None]:
# Install required packages (uncomment if needed)
# !pip install transformers datasets torch huggingface_hub numpy pandas matplotlib seaborn tqdm

# Import essential libraries
import torch
import transformers
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import time
import warnings
import json
import os
from typing import List, Dict, Optional, Union, Tuple
from pathlib import Path
from datetime import datetime

# Hugging Face imports
from transformers import (
    AutoTokenizer, 
    AutoModelForSequenceClassification,
    AutoConfig,
    Trainer, 
    TrainingArguments,
    DataCollatorWithPadding
)
from datasets import Dataset, DatasetDict, load_dataset
from huggingface_hub import (
    HfApi, 
    login, 
    whoami, 
    create_repo,
    upload_file
)

warnings.filterwarnings('ignore')

# Configure plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("📚 Libraries imported successfully!")
print(f"🔥 PyTorch version: {torch.__version__}")
print(f"🤗 Transformers version: {transformers.__version__}")

In [None]:
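# Detect Google Colab and TPU support independently: on a Colab GPU/CPU runtime
# torch_xla is usually not installed, and that must not disable Colab secrets.
try:
    from google.colab import userdata
    COLAB_AVAILABLE = True
except ImportError:
    COLAB_AVAILABLE = False

try:
    import torch_xla.core.xla_model as xm
    TPU_AVAILABLE = True
except ImportError:
    TPU_AVAILABLE = False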

def get_device() -> torch.device:
    """
    Automatically detect and return the best available device.
    
    Device Priority:
    - General: CUDA GPU > TPU (Colab only) > MPS (Apple Silicon) > CPU
    - Google Colab: Always prefer TPU when available
    
    Returns:
        torch.device: The optimal device for current hardware
    """
    # Google Colab: Always prefer TPU when available
    if COLAB_AVAILABLE and TPU_AVAILABLE:
        try:
            # Try to initialize TPU
            device = xm.xla_device()
            print("🔥 Using Google Colab TPU for optimal performance")
            print("💡 TPU is preferred in Colab for training and inference")
            return device
        except Exception as e:
            print(f"⚠️ TPU initialization failed: {e}")
            print("Falling back to GPU/CPU detection")
    
    # Standard device detection for other environments
    if torch.cuda.is_available():
        device = torch.device("cuda")
        print(f"🚀 Using CUDA GPU: {torch.cuda.get_device_name()}")
    elif torch.backends.mps.is_available():
        device = torch.device("mps")
        print("🍎 Using Apple MPS for Apple Silicon optimization")
    else:
        device = torch.device("cpu")
        print("💻 Using CPU - consider GPU/TPU for better performance")
    
    return device

def get_api_key(key_name: str, required: bool = True) -> Optional[str]:
    """
    Get API key from environment variables or Google Colab secrets.
    
    Args:
        key_name: Name of the environment variable/secret
        required: Whether the key is required (raises error if missing)
    
    Returns:
        API key string or None if not required and not found
        
    Raises:
        ValueError: If required key is not found
    """
    api_key = None
    
    # Try Google Colab secrets first (when available)
    if COLAB_AVAILABLE:
        try:
            api_key = userdata.get(key_name)
            print(f"✅ Loaded {key_name} from Google Colab secrets")
        except Exception:
            # Secret not defined or notebook access not granted: fall back to env vars
            pass
    
    # Fall back to local environment variable
    if not api_key:
        api_key = os.getenv(key_name)
        if api_key:
            print(f"✅ Loaded {key_name} from environment variable")
    
    # Handle missing required keys
    if required and not api_key:
        raise ValueError(
            f"❌ {key_name} not found. Please set it in:\n"
            f"  - Local: an environment variable (export it, or load your .env.local file first)\n"
            f"  - Colab: Secrets manager (🔑 icon in sidebar)"
        )
    
    return api_key

# Initialize device
device = get_device()
print(f"\n🎯 Using device: {device}")
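Checking whether an optional package is installed does not require importing it. The sketch below uses the standard library's `importlib.util.find_spec` to set the Colab and TPU flags independently, so a missing `torch_xla` cannot switch off Colab secrets support. It is a minimal sketch: `is_module_available` is a hypothetical helper, and an installed `torch_xla` does not guarantee an attached TPU, so `xm.xla_device()` would still need to succeed.

```python
import importlib.util

def is_module_available(name: str) -> bool:
    """Return True if `name` is importable, without importing the module itself.

    For dotted names, find_spec imports the parent package(s) and raises
    ModuleNotFoundError when a parent is missing, so that case maps to False.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ModuleNotFoundError, ValueError):
        return False

# Independent flags: one missing package does not hide the other
COLAB_AVAILABLE = is_module_available("google.colab")
TPU_AVAILABLE = is_module_available("torch_xla")
print(f"Colab: {COLAB_AVAILABLE}, torch_xla installed: {TPU_AVAILABLE}")
```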

### Authentication with Hugging Face Hub

To push models to the Hub, you need to authenticate with your Hugging Face token.

In [None]:
# Authentication setup
print("🔐 Setting up Hugging Face authentication...")
print("\n📋 Authentication Methods:")
print("1. 🔑 Google Colab: Use Secrets manager (recommended for Colab)")
print("2. 💻 Local: Set HF_TOKEN environment variable")
print("3. 🖥️ CLI: Run `huggingface-cli login` in terminal")

try:
    # Try to get HF token from environment/secrets
    hf_token = get_api_key('HF_TOKEN', required=False)
    
    if hf_token:
        # Login with token
        login(token=hf_token)
        
        # Verify authentication
        user_info = whoami()
        print(f"\n✅ Successfully authenticated as: {user_info['name']}")
        print(f"📧 Email: {user_info.get('email', 'Not provided')}")
        print(f"🏢 Organizations: {len(user_info.get('orgs', []))}")
        
        AUTHENTICATED = True
        
    else:
        print("\n⚠️ HF_TOKEN not found. You can still run the notebook but won't be able to push to Hub.")
        print("\n📝 To get your token:")
        print("1. Go to https://huggingface.co/settings/tokens")
        print("2. Create a new token with 'write' permissions")
        print("3. Copy the token and set it as HF_TOKEN")
        
        AUTHENTICATED = False
        
except Exception as e:
    print(f"❌ Authentication failed: {e}")
    print("💡 You can still explore the push_to_hub concepts without authentication")
    AUTHENTICATED = False

print(f"\n🔒 Authentication status: {'Authenticated' if AUTHENTICATED else 'Not authenticated'}")
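The lookup order used by `get_api_key` (Colab secrets first, then the environment) can be expressed without any Colab dependency by passing the secrets lookup in as a function. This is a minimal sketch; `resolve_token` and `mask_token` are hypothetical helpers, not part of `huggingface_hub`. Masking matters because anything printed in a notebook ends up in its saved outputs. Recent `huggingface_hub` releases also pick up `HF_TOKEN` from the environment on their own.

```python
import os
from typing import Callable, Optional

def resolve_token(
    name: str = "HF_TOKEN",
    secret_getter: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Return a token from a secrets store if given, else from the environment."""
    if secret_getter is not None:
        try:
            token = secret_getter(name)
            if token:
                return token
        except Exception:
            pass  # a missing secret falls through to the environment variable
    return os.environ.get(name) or None

def mask_token(token: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters so a token can be logged."""
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]

# In Colab, secret_getter could be google.colab.userdata.get
token = resolve_token()
print("HF_TOKEN:", mask_token(token) if token else "not set")
```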

## 2. Creating and Preparing a Model

Next, let's prepare a hate speech detection model to push to the Hub. Instead of training one from scratch, we load a pre-trained checkpoint and use it for the demonstration.

In [None]:
# Load preferred hate speech detection model
# Using cardiffnlp/twitter-roberta-base-hate-latest as our base model
model_name = "cardiffnlp/twitter-roberta-base-hate-latest"
print(f"📥 Loading base model: {model_name}")
print("💡 This model is specifically trained for hate speech detection on social media content")

try:
    # Load tokenizer and model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
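    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=3,  # hate, offensive, neither
        # If the checkpoint's head has a different number of labels (e.g. a binary
        # hate / not-hate head), re-initialize a new, untrained 3-way head instead
        # of raising a size-mismatch error
        ignore_mismatched_sizes=True,
        output_attentions=False,  # Save memory
        output_hidden_states=False  # Save memory
    )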
    
    # Move model to optimal device
    model = model.to(device)
    
    print(f"✅ Model loaded successfully!")
    print(f"📊 Model parameters: {model.num_parameters():,}")
    print(f"🏷️ Model type: {model.__class__.__name__}")
    print(f"📱 Device: {next(model.parameters()).device}")
    
    # Display model configuration
    config = model.config
    print(f"\n📋 Model Configuration:")
    print(f"   Architecture: {config.model_type}")
    print(f"   Hidden size: {config.hidden_size}")
    print(f"   Attention heads: {config.num_attention_heads}")
    print(f"   Hidden layers: {config.num_hidden_layers}")
    print(f"   Max position embeddings: {config.max_position_embeddings}")
    print(f"   Vocabulary size: {config.vocab_size:,}")
    
except Exception as e:
    print(f"❌ Error loading model: {e}")
    print("💡 Trying alternative model...")
    
    # Fallback to a simpler model (its 3-way classification head is newly initialized, i.e. untrained)
    model_name = "distilbert-base-uncased"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=3
    )
    model = model.to(device)
    print(f"✅ Fallback model loaded: {model_name}")
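The cell passes `num_labels=3`, but nothing in it records which index means what, so the pushed `config.json` (and the Hub's inference widget) may only show generic names such as `LABEL_0`. Below is a sketch of building the `id2label` / `label2id` maps for the 0: hate, 1: offensive, 2: neither scheme used in this notebook's model card; `build_label_maps` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Label order used in this notebook's model card: 0 hate, 1 offensive, 2 neither
LABELS: List[str] = ["hate", "offensive", "neither"]

def build_label_maps(labels: List[str]) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Build the id2label / label2id dictionaries stored in a model config."""
    if len(set(labels)) != len(labels):
        raise ValueError("Label names must be unique")
    id2label = {i: name for i, name in enumerate(labels)}
    label2id = {name: i for i, name in id2label.items()}
    return id2label, label2id

id2label, label2id = build_label_maps(LABELS)
print(id2label)
# With transformers installed, either pass them when loading:
#   AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3,
#       id2label=id2label, label2id=label2id)
# or set model.config.id2label / model.config.label2id before pushing.
```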

## 3. Push Model to Hub - The Main Event!

Now comes the exciting part - pushing our model to the Hugging Face Hub using the `push_to_hub` API!
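`push_to_hub` accepts either a bare repository name (created under your account) or a full `namespace/name` id. Catching an invalid name locally gives a clearer error than a failed upload. The sketch below approximates the naming rules enforced by `huggingface_hub`'s repo-id validation (letters, digits, `-`, `_`, `.`; at most 96 characters; no `--` or `..`; no leading or trailing `-` or `.`). Treat the exact rules as an assumption and rely on the library for the final word; `is_valid_repo_name` and `build_repo_id` are hypothetical helpers.

```python
import re

# Approximate Hub naming rules (assumption, modeled on huggingface_hub's
# repo-id validation): 1-96 chars from [A-Za-z0-9_.-], starting and ending
# with a letter, digit, or underscore.
_NAME_RE = re.compile(r"[A-Za-z0-9_](?:[A-Za-z0-9_.\-]{0,94}[A-Za-z0-9_])?")

def is_valid_repo_name(part: str) -> bool:
    """Check one component (namespace or repo name) of a repo id."""
    return bool(_NAME_RE.fullmatch(part)) and "--" not in part and ".." not in part

def build_repo_id(namespace: str, name: str) -> str:
    """Join namespace and name after validating both."""
    for part in (namespace, name):
        if not is_valid_repo_name(part):
            raise ValueError(f"Invalid Hub repository name component: {part!r}")
    return f"{namespace}/{name}"

print(build_repo_id("your-username", "demo-hate-speech-detector"))
```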

In [None]:
# Define our model repository name
# This will create a model at: https://huggingface.co/[your-username]/[repo_name]
repo_name = "demo-hate-speech-detector"
print(f"🏷️ Repository name: {repo_name}")

if AUTHENTICATED:
    user_info = whoami()
    full_repo_name = f"{user_info['name']}/{repo_name}"
    print(f"📍 Full repository path: {full_repo_name}")
    print(f"🌐 Will be available at: https://huggingface.co/{full_repo_name}")
else:
    print("⚠️ Not authenticated - showing push_to_hub demo without actual pushing")
    full_repo_name = f"your-username/{repo_name}"

# Create a comprehensive model card
# The YAML metadata block goes at the very top of README.md
model_card_content = f"""---
language: en
tags:
- hate-speech-detection
- text-classification
- social-media
- content-moderation
datasets:
- custom
metrics:
- accuracy
---

# {repo_name.replace('-', ' ').title()}

## Model Description

This is a demonstration model for hate speech detection, based on `{model_name}`.
It's designed to classify text into three categories:
- **Hate Speech** (0): Content that attacks or discriminates against individuals or groups
- **Offensive Language** (1): Content that is rude or inappropriate but not necessarily hateful
- **Neither** (2): Normal, acceptable content

## Intended Use

**Primary Use**: Educational demonstration of the push_to_hub API
**Secondary Use**: Content moderation research and development

## Training Data

This model is based on the pre-trained model `{model_name}` for educational purposes.
In production, you should use comprehensive datasets such as:
- Davidson et al. Hate Speech Dataset
- HatEval Dataset
- Other validated hate speech detection datasets

## Training Procedure

- **Base Model**: {model_name}
- **Framework**: Hugging Face Transformers with PyTorch
- **Date**: {datetime.now().strftime('%Y-%m-%d')}

## Limitations and Bias

⚠️ **Important Limitations**:
1. This is a demonstration model for educational purposes
2. Not suitable for production use without proper evaluation
3. May contain biases present in the original training data
4. Performance may vary significantly on real-world data

## Ethical Considerations

- Always review model outputs for bias and fairness
- Consider the impact of automated content moderation
- Ensure human oversight in moderation decisions
- Be transparent about model limitations

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import pipeline

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("{full_repo_name}")
model = AutoModelForSequenceClassification.from_pretrained("{full_repo_name}")

# Create pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Classify text
result = classifier("Your text here")
print(result)
```

## Model Card Authors

This model card was created as part of the HF Transformer Trove educational series.

## Citation

If you use this model in your research, please cite:

```
@misc{{hf-transformer-trove-push-to-hub,
  title={{Push to Hub API Demonstration}},
  author={{HF Transformer Trove}},
  year={{2024}},
  url={{https://github.com/vuhung16au/hf-transformer-trove}}
}}
```
"""

print("📝 Model card created with comprehensive documentation")
print(f"📏 Model card length: {len(model_card_content.split())} words")
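Before pushing, it can help to check that a card's YAML metadata block is where the Hub expects it: at the top of `README.md`, opened and closed by `---`. This is a minimal sketch with a deliberately strict rule (the opening `---` must be the very first line); `huggingface_hub`'s own `ModelCard` parser may be more lenient, for example about leading whitespace. `split_front_matter` is a hypothetical helper.

```python
from typing import Optional, Tuple

def split_front_matter(card_text: str) -> Optional[Tuple[str, str]]:
    """Split a README into (yaml_block, markdown_body).

    Strict by design (an assumption): the opening '---' must be the very first
    line. Returns None if there is no complete metadata block.
    """
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return None  # opening delimiter without a closing one

print(split_front_matter("---\nlanguage: en\n---\n# Title"))
```

Running it on `model_card_content` before uploading is a cheap check: a `None` result means the metadata block is not at the top of the file (an f-string that starts with a newline would fail this strict check).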

### Method 1: Using model.push_to_hub()

The simplest way to push a model is using the built-in `push_to_hub()` method.

In [None]:
if AUTHENTICATED:
    print("🚀 Pushing model to Hub using model.push_to_hub()...")
    
    try:
        # Push the model to the Hub
        # This will upload the model files and create the repository if it doesn't exist
        push_result = model.push_to_hub(
            repo_id=repo_name,           # Repository name (will be username/repo_name)
            commit_message="Add demo hate speech detection model",  # Commit message
            private=True,                 # Make repository private (safer for demos)
            token=hf_token,              # Authentication token
            safe_serialization=True      # Use safer serialization format
        )
        
        print(f"\n✅ Model pushed successfully!")
        print(f"📍 Commit: {push_result}")
        print(f"🔗 View at: https://huggingface.co/{full_repo_name}")
        
        # Also push the tokenizer
        print("\n🔤 Pushing tokenizer...")
        tokenizer_result = tokenizer.push_to_hub(
            repo_id=repo_name,
            commit_message="Add tokenizer for demo model",
            private=True,
            token=hf_token
        )
        
        print(f"✅ Tokenizer pushed successfully!")
        
        MODEL_PUSHED = True
        
    except Exception as e:
        print(f"❌ Error pushing model: {e}")
        print("💡 Common issues:")
        print("  - Check your token permissions (needs 'write' access)")
        print("  - Repository name is taken under a namespace you cannot write to")
        print("  - Network connectivity issues")
        MODEL_PUSHED = False
        
else:
    print("❌ Cannot push to Hub - not authenticated")
    print("\n🔍 What push_to_hub() does:")
    print("1. Creates a repository on Hugging Face Hub (if it doesn't exist)")
    print("2. Uploads model weights (pytorch_model.bin or model.safetensors)")
    print("3. Uploads model configuration (config.json)")
    print("4. Creates/updates the model card (README.md)")
    print("5. Commits the files over HTTP (no local git clone needed)")
    
    print("\n📋 Parameters for push_to_hub():")
    print("  - repo_id: Name of the repository")
    print("  - commit_message: Description of changes")
    print("  - private: Whether repository should be private")
    print("  - token: Your Hugging Face authentication token")
    print("  - safe_serialization: Use safer .safetensors format")
    
    MODEL_PUSHED = False
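The `except` branch above lists network connectivity as a common failure. One option is to retry only transient errors with exponential backoff, as in this minimal sketch (`with_retries` is a hypothetical helper). The default exception types are an assumption: `huggingface_hub` may raise its own or its HTTP library's error classes (for example `requests.exceptions.ConnectionError`, which is not a subclass of the built-in `ConnectionError`), so check what your installed version raises. Retrying authentication or permission errors is pointless, which is why the retried types are explicit.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

def with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying the listed exception types with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")

# Hypothetical usage inside the notebook (not executed here):
# with_retries(lambda: model.push_to_hub(repo_name, private=True, token=hf_token))
```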

### Method 2: Using HfApi for Advanced Control

For more control over the upload process, you can use the HfApi class directly.

In [None]:
# Alternative method using HfApi for more control
if AUTHENTICATED:
    print("🔧 Using HfApi for advanced push_to_hub control...")
    
    # Initialize HfApi
    api = HfApi(token=hf_token)
    
    try:
        # Create repository (if it doesn't exist)
        api_repo_name = f"{repo_name}-api-demo"
        full_api_repo_name = f"{user_info['name']}/{api_repo_name}"
        
        print(f"📁 Creating repository: {full_api_repo_name}")
        
        repo_url = api.create_repo(
            repo_id=api_repo_name,
            private=True,                    # Make it private
            repo_type="model",               # Specify it's a model repo
            exist_ok=True                    # Don't error if repo already exists
        )
        
        print(f"✅ Repository created/confirmed: {repo_url}")
        
        # Save model locally first
        local_model_path = "./temp_model_for_upload"
        os.makedirs(local_model_path, exist_ok=True)
        
        print("💾 Saving model locally...")
        model.save_pretrained(local_model_path, safe_serialization=True)
        tokenizer.save_pretrained(local_model_path)
        
        # Create and save model card
        model_card_path = os.path.join(local_model_path, "README.md")
        with open(model_card_path, "w", encoding="utf-8") as f:
            f.write(model_card_content)
        
        print("📝 Model card saved")
        
        # Upload files using HfApi
        print("⬆️ Uploading files to Hub...")
        
        # Upload all files in the directory
        api.upload_folder(
            folder_path=local_model_path,
            repo_id=full_api_repo_name,
            repo_type="model",
            commit_message="Upload model using HfApi",
        )
        
        print(f"\n✅ Model uploaded successfully using HfApi!")
        print(f"🔗 View at: https://huggingface.co/{full_api_repo_name}")
        
        # Clean up temporary files
        import shutil
        shutil.rmtree(local_model_path)
        print("🗑️ Temporary files cleaned up")
        
        API_UPLOAD_SUCCESS = True
        
    except Exception as e:
        print(f"❌ Error with HfApi upload: {e}")
        API_UPLOAD_SUCCESS = False
        
else:
    print("❌ Cannot demonstrate HfApi - not authenticated")
    print("\n🔍 What HfApi provides:")
    print("1. Fine-grained control over repository creation")
    print("2. Advanced file upload options")
    print("3. Repository management capabilities")
    print("4. Custom commit messages and metadata")
    print("5. Batch operations and folder uploads")
    
    API_UPLOAD_SUCCESS = False
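This cell deletes `./temp_model_for_upload` only after a successful upload, so a failure leaves the folder behind. A `tempfile.TemporaryDirectory` context manager cleans up either way. The sketch below also previews which files an upload would include when filtering with glob patterns, mirroring the `allow_patterns` / `ignore_patterns` arguments of `HfApi.upload_folder`; the matching here uses `fnmatch` and may differ from the library's in edge cases. `select_files` is a hypothetical helper and the file names are placeholders.

```python
import fnmatch
import tempfile
from pathlib import Path
from typing import List, Optional, Sequence

def select_files(folder: Path, allow: Optional[Sequence[str]] = None,
                 ignore: Sequence[str] = ()) -> List[str]:
    """Relative POSIX paths under `folder` kept by allow/ignore glob patterns."""
    selected = []
    for path in folder.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(folder).as_posix()
        if allow is not None and not any(fnmatch.fnmatch(rel, p) for p in allow):
            continue  # not matched by any allow pattern
        if any(fnmatch.fnmatch(rel, p) for p in ignore):
            continue  # explicitly excluded
        selected.append(rel)
    return sorted(selected)

# TemporaryDirectory removes the folder on exit, even if a step inside raises
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ["config.json", "model.safetensors", "README.md", "scratch.ipynb"]:
        (root / name).write_text("placeholder")  # placeholder contents only
    files = select_files(root, ignore=["*.ipynb"])
    # Real upload (not executed here):
    # api.upload_folder(folder_path=tmp, repo_id=full_api_repo_name, ignore_patterns=["*.ipynb"])
print(files)
```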

## 4. Push Datasets to Hub

You can also push datasets to the Hub for sharing and collaboration.

In [None]:
# Create a demo dataset for pushing
demo_dataset_data = {
    "text": [
        "I love machine learning and AI research!",
        "This is a wonderful example of positive sentiment.",
        "Neutral statement about technology and progress.",
        "Another example of neutral, educational content.",
        "This demonstrates content that needs moderation.",
        "Example of potentially problematic social media content.",
        "AI helps us build better content moderation systems.",
        "Educational material about ethics in AI development.",
        "Normal social media post about daily activities.",
        "Content moderation is crucial for online safety.",
        "Machine learning can help detect harmful content.",
        "This dataset demonstrates classification examples.",
        "Social media platforms need effective moderation.",
        "AI safety research is becoming increasingly important.",
        "This is sample text for educational purposes."
    ],
    "label": [2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2],  # 0: hate, 1: offensive, 2: neither
    "source": ["demo"] * 15,
    "split": ["train"] * 12 + ["test"] * 3
}

# Create dataset
demo_dataset = Dataset.from_dict(demo_dataset_data)

print("📊 Demo Dataset for Hub Upload:")
print(f"   Total examples: {len(demo_dataset)}")
print(f"   Features: {demo_dataset.features}")

# Display label distribution
label_counts = demo_dataset.to_pandas()['label'].value_counts().sort_index()
label_names = {0: 'Hate', 1: 'Offensive', 2: 'Neither'}
print(f"\n📋 Label Distribution:")
for label, count in label_counts.items():
    print(f"   {label_names[label]} ({label}): {count} examples")

# Create dataset card
dataset_card = f"""
# Demo Hate Speech Detection Dataset

## Dataset Description

This is a small demonstration dataset for hate speech detection, created for educational purposes 
as part of the HF Transformer Trove push_to_hub API tutorial.

## Dataset Structure

### Data Fields

- `text`: The input text to classify
- `label`: Classification label (0: hate, 1: offensive, 2: neither)
- `source`: Source of the data (all "demo" for this dataset)
- `split`: Suggested split (train/test)

### Label Distribution

- Neither (2): {label_counts.get(2, 0)} examples
- Offensive (1): {label_counts.get(1, 0)} examples  
- Hate (0): {label_counts.get(0, 0)} examples

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("your-username/demo-hate-speech-dataset")
print(dataset)
```

## Limitations

⚠️ This is a demonstration dataset with synthetic examples. 
It should not be used for production systems or research without proper validation.

## Citation

Created for educational purposes as part of the HF Transformer Trove project.
"""

print("📝 Dataset card created")
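The `split` column records a suggested train/test split, but `demo_dataset` is a single `Dataset`, so pushing it as-is creates one `train` split that still contains all 15 rows. Below is a sketch of regrouping the columns per split in plain Python; with the `datasets` library, each group could become a `Dataset.from_dict(...)` inside a `DatasetDict` and be pushed with `DatasetDict.push_to_hub` (`demo_dataset.filter(lambda ex: ex["split"] == "train")` is an equivalent route). `group_by_split` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Dict, List

def group_by_split(columns: Dict[str, List[Any]],
                   split_key: str = "split") -> Dict[str, Dict[str, List[Any]]]:
    """Regroup a column-oriented dict into one column dict per split value."""
    grouped: Dict[str, Dict[str, List[Any]]] = defaultdict(lambda: defaultdict(list))
    for i, split in enumerate(columns[split_key]):
        for name, values in columns.items():
            if name != split_key:  # the split name becomes the outer key instead
                grouped[split][name].append(values[i])
    # Convert the nested defaultdicts back to plain dicts
    return {split: dict(cols) for split, cols in grouped.items()}

toy = {"text": ["a", "b", "c"], "label": [2, 1, 2], "split": ["train", "train", "test"]}
splits = group_by_split(toy)
print({name: len(cols["text"]) for name, cols in splits.items()})
# With the datasets library installed (not executed here):
# DatasetDict({name: Dataset.from_dict(cols)
#              for name, cols in group_by_split(demo_dataset_data).items()})
```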

In [None]:
# Push dataset to Hub
if AUTHENTICATED:
    print("📤 Pushing dataset to Hub...")
    
    dataset_repo_name = "demo-hate-speech-dataset"
    full_dataset_repo = f"{user_info['name']}/{dataset_repo_name}"
    
    try:
        # Push dataset using push_to_hub method
        dataset_result = demo_dataset.push_to_hub(
            repo_id=dataset_repo_name,
            private=True,                     # Make it private
            token=hf_token,
            commit_message="Add demo hate speech detection dataset"
        )
        
        print(f"\n✅ Dataset pushed successfully!")
        print(f"📍 Commit: {dataset_result}")
        print(f"🔗 View at: https://huggingface.co/{full_dataset_repo}")
        
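        # Upload the dataset card with HfApi. The client is created here so this
        # cell does not depend on the Method 2 cell having run. Note that this
        # replaces any README.md that push_to_hub generated.
        print("\n📝 Uploading dataset card...")
        api = HfApi(token=hf_token)
        
        api.upload_file(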
            path_or_fileobj=dataset_card.encode(),
            path_in_repo="README.md",
            repo_id=full_dataset_repo,
            repo_type="dataset",
            commit_message="Add dataset card"
        )
        
        print("✅ Dataset card uploaded!")
        
        DATASET_PUSHED = True
        
    except Exception as e:
        print(f"❌ Error pushing dataset: {e}")
        DATASET_PUSHED = False
        
else:
    print("❌ Cannot push dataset - not authenticated")
    print("\n🔍 Dataset push_to_hub features:")
    print("1. Converts the in-memory Dataset to Parquet files automatically")
    print("2. Efficient columnar storage (Parquet on the Hub, Arrow locally)")
    print("3. Generates a README.md with dataset_info metadata (features, splits)")
    print("4. Version control and collaboration")
    print("5. Easy sharing and discovery")
    
    DATASET_PUSHED = False
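`dataset_card` above has no YAML metadata header, so the Hub cannot index its language, task, or size. Uploading it as `README.md` also replaces the card that `Dataset.push_to_hub` writes, which in recent `datasets` versions carries `dataset_info` metadata; editing that card (for example with `huggingface_hub`'s `DatasetCard`) is the gentler option. Below is a sketch of generating a metadata header to prepend. The `size_categories` bucket labels are assumed to follow the Hub's naming and only cover datasets under one million rows; `size_category` and `dataset_card_header` are hypothetical helpers.

```python
from typing import Optional

def size_category(n_examples: int) -> Optional[str]:
    """Map a row count to a size_categories label (assumed Hub naming)."""
    buckets = [(1_000, "n<1K"), (10_000, "1K<n<10K"),
               (100_000, "10K<n<100K"), (1_000_000, "100K<n<1M")]
    for upper, label in buckets:
        if n_examples < upper:
            return label
    return None  # larger buckets omitted in this sketch

def dataset_card_header(n_examples: int, language: str = "en") -> str:
    """Build a YAML front-matter block to prepend to a dataset card."""
    lines = ["---", "language:", f"- {language}",
             "task_categories:", "- text-classification"]
    category = size_category(n_examples)
    if category is not None:
        lines += ["size_categories:", f"- {category}"]
    lines.append("---")
    return "\n".join(lines) + "\n"

header = dataset_card_header(15)
print(header)
```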

## 5. Best Practices and Troubleshooting

Let's cover important best practices and common issues when using push_to_hub.

In [None]:
# Best practices demonstration
print("📚 PUSH_TO_HUB BEST PRACTICES")
print("=" * 35)

print("🔒 1. SECURITY BEST PRACTICES:")
print("   ✅ Use environment variables for tokens")
print("   ✅ Never commit tokens to code repositories")
print("   ✅ Use private repositories for sensitive models")
print("   ✅ Review model cards for bias and limitations")
print("   ✅ Set appropriate license and usage restrictions")

print("📝 2. MODEL CARD BEST PRACTICES:")
print("   ✅ Include comprehensive model description")
print("   ✅ Document training data and methodology")
print("   ✅ Specify intended use cases and limitations")
print("   ✅ Include evaluation metrics and results")
print("   ✅ Address ethical considerations and bias")

print("🏷️ 3. REPOSITORY ORGANIZATION:")
print("   ✅ Use descriptive repository names")
print("   ✅ Add relevant tags for discoverability")
print("   ✅ Include usage examples in model cards")
print("   ✅ Version your models appropriately")
print("   ✅ Use meaningful commit messages")

print("⚡ 4. PERFORMANCE OPTIMIZATION:")
print("   ✅ Use safe_serialization=True for security")
print("   ✅ Consider model quantization for deployment")
print("   ✅ Test models after uploading")
print("   ✅ Monitor repository size and usage")
print("   ✅ Use appropriate data types and precision")

print("\n🔧 COMMON TROUBLESHOOTING:")
print("=" * 25)

troubleshooting_guide = {
    "Authentication Error": [
        "Check if HF_TOKEN is set correctly",
        "Verify token has 'write' permissions",
        "Try logging in via huggingface-cli login",
        "Check token expiration date"
    ],
    "Repository Already Exists": [
        "Use exist_ok=True in create_repo()",
        "Choose a different repository name", 
        "Delete existing repo if you own it",
        "Use versioning in repo names"
    ],
    "Large Model Upload Issues": [
        "Check internet connection stability",
        "Large files are handled by the Hub's large-file storage automatically (no manual git-lfs setup)",
        "Consider model quantization",
        "Shard big checkpoints with push_to_hub(max_shard_size=...)"
    ],
    "Permission Denied": [
        "Verify repository ownership",
        "Check organization permissions",
        "Ensure token has correct scope",
        "Contact repository administrator"
    ]
}

for issue, solutions in troubleshooting_guide.items():
    print(f"❌ {issue}:")
    for solution in solutions:
        print(f"   💡 {solution}")
    print()

print("🔗 HELPFUL RESOURCES:")
print("   📖 HF Hub Documentation: https://huggingface.co/docs/hub/")
print("   🎓 HF Course: https://huggingface.co/course/chapter4/3")
print("   💬 Community Forum: https://discuss.huggingface.co/")
print("   🐛 Issue Tracker: https://github.com/huggingface/transformers/issues")
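The first security tip, never committing tokens, can be partly automated before calling `upload_folder`. Hugging Face user access tokens start with `hf_`; the minimum of 20 characters after the prefix in the pattern below is an assumption, so treat this as a coarse pre-upload check, not a complete secret scanner. `find_token_leaks` and `scan_folder` are hypothetical helpers.

```python
import re
from pathlib import Path
from typing import Dict, List, Sequence

# "hf_" prefix plus a long alphanumeric run; the length bound is an assumption
_TOKEN_RE = re.compile(r"hf_[A-Za-z0-9]{20,}")

def find_token_leaks(text: str) -> List[str]:
    """Return redacted previews of substrings that look like HF tokens."""
    return [m.group(0)[:6] + "..." for m in _TOKEN_RE.finditer(text)]

def scan_folder(folder: Path,
                skip_suffixes: Sequence[str] = (".safetensors", ".bin", ".pt")
                ) -> Dict[str, List[str]]:
    """Map relative file paths to suspected token previews."""
    hits: Dict[str, List[str]] = {}
    for path in sorted(folder.rglob("*")):
        if not path.is_file() or path.suffix in skip_suffixes:
            continue  # skip directories and (large) weight files
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip other binary or unreadable files
        leaks = find_token_leaks(text)
        if leaks:
            hits[path.relative_to(folder).as_posix()] = leaks
    return hits
```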

## 6. CLI Alternatives

You can also use the Hugging Face CLI for pushing models and datasets.

In [None]:
# Demonstrate CLI commands (informational)
print("🖥️ HUGGING FACE CLI ALTERNATIVES")
print("=" * 32)

print("📦 1. INSTALLATION:")
print("   # Install HF CLI")
print("   pip install huggingface_hub[cli]")
print("   # Or use system package manager")
print("   brew install huggingface-cli  # macOS")
print()

print("🔐 2. AUTHENTICATION:")
print("   # Login interactively")
print("   huggingface-cli login")
print()
print("   # Login with token")
print("   huggingface-cli login --token YOUR_TOKEN")
print()

print("📤 3. UPLOADING MODELS:")
print("   # Upload entire directory")
print("   huggingface-cli upload your-username/model-name ./model_directory")
print()
print("   # Upload specific file")
print("   huggingface-cli upload your-username/model-name ./model.bin")
print()
print("   # Create private repository")
print("   huggingface-cli upload your-username/model-name ./model_directory --private")
print()

print("📊 4. UPLOADING DATASETS:")
print("   # Upload dataset")
print("   huggingface-cli upload your-username/dataset-name ./dataset_directory --repo-type dataset")
print()

print("🔍 5. REPOSITORY MANAGEMENT:")
print("   # Create an empty private repository")
print("   huggingface-cli repo create new-repo --private")
print()
print("   # Listing or deleting repositories has no huggingface-cli command;")
print("   # use HfApi().list_models(author=...) and HfApi().delete_repo(repo_id) in Python")
print()

print("💡 CLI vs Python API COMPARISON:")
print("-" * 30)
comparison = {
    "Ease of Use": {"CLI": "Simple commands", "Python": "Programmatic control"},
    "Integration": {"CLI": "Shell scripts", "Python": "ML pipelines"},
    "Automation": {"CLI": "Batch scripts", "Python": "Training workflows"},
    "Flexibility": {"CLI": "Limited options", "Python": "Full API access"},
    "Error Handling": {"CLI": "Basic", "Python": "Comprehensive"}
}

for aspect, methods in comparison.items():
    print(f"{aspect:15} | CLI: {methods['CLI']:20} | Python: {methods['Python']}")

print("\n🎯 WHEN TO USE EACH:")
print("   🖥️ Use CLI for:")
print("     - Quick uploads and downloads")
print("     - Shell scripting and automation")
print("     - Manual repository management")
print()
print("   🐍 Use Python API for:")
print("     - Integration with training scripts")
print("     - Custom workflows and pipelines")
print("     - Advanced error handling and validation")
print("     - Programmatic repository management")
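When the CLI is driven from Python, passing an argument list to `subprocess.run` avoids shell-quoting problems such as commit messages with spaces. The sketch only builds the list for the `upload` command and the flags shown above, plus `--commit-message`; `build_upload_cmd` is a hypothetical helper. Newer `huggingface_hub` releases may prefer a renamed `hf` command, so check `huggingface-cli upload --help` for your installed version.

```python
import shlex
from typing import List, Optional

def build_upload_cmd(repo_id: str, local_path: str,
                     path_in_repo: Optional[str] = None,
                     repo_type: str = "model", private: bool = False,
                     commit_message: Optional[str] = None) -> List[str]:
    """Build an argv list for `huggingface-cli upload` (not executed here)."""
    cmd = ["huggingface-cli", "upload", repo_id, local_path]
    if path_in_repo is not None:
        cmd.append(path_in_repo)
    if repo_type != "model":
        cmd += ["--repo-type", repo_type]
    if private:
        cmd.append("--private")
    if commit_message:
        cmd += ["--commit-message", commit_message]
    return cmd

cmd = build_upload_cmd("your-username/dataset-name", "./dataset_directory",
                       repo_type="dataset", commit_message="Add demo data")
print(shlex.join(cmd))
# prints: huggingface-cli upload your-username/dataset-name ./dataset_directory --repo-type dataset --commit-message 'Add demo data'
# To actually run it: subprocess.run(cmd, check=True)
```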

## Summary and Next Steps

Let's recap what we've learned and explored in this notebook.

In [None]:
# Final summary
print("🎊 PUSH_TO_HUB API DEMONSTRATION COMPLETE!")
print("=" * 43)

print("\n📋 WHAT WE COVERED:")
topics_covered = [
    "✅ Authentication with Hugging Face Hub",
    "✅ Loading and preparing models for upload",
    "✅ Using model.push_to_hub() method",
    "✅ Advanced control with HfApi",
    "✅ Pushing datasets to the Hub",
    "✅ Creating comprehensive model cards",
    "✅ Best practices and security considerations",
    "✅ Troubleshooting common issues",
    "✅ CLI alternatives and comparisons"
]

for topic in topics_covered:
    print(f"  {topic}")

print("\n🎯 KEY TAKEAWAYS:")
print("  🔑 Always authenticate securely with environment variables")
print("  📝 Create comprehensive model cards with bias documentation")
print("  🔒 Use private repositories for sensitive or demo models")
print("  ⚡ Choose the right method: push_to_hub() vs HfApi vs CLI")
print("  🧪 Test your models after uploading to ensure they work")
print("  📊 Document limitations and intended use cases clearly")

print("\n📈 EXECUTION SUMMARY:")
print(f"  🔐 Authentication: {'Success' if AUTHENTICATED else 'Not completed'}")
print(f"  🤖 Model Upload: {'Success' if MODEL_PUSHED else 'Not completed'}")
print(f"  🔧 HfApi Demo: {'Success' if API_UPLOAD_SUCCESS else 'Not completed'}")
print(f"  📊 Dataset Upload: {'Success' if DATASET_PUSHED else 'Not completed'}")

if AUTHENTICATED:
    print("\n🌐 YOUR REPOSITORIES:")
    if MODEL_PUSHED:
        print(f"  🤖 Model: https://huggingface.co/{full_repo_name}")
    if API_UPLOAD_SUCCESS:
        print(f"  🔧 API Demo: https://huggingface.co/{full_api_repo_name}")
    if DATASET_PUSHED:
        print(f"  📊 Dataset: https://huggingface.co/{full_dataset_repo}")

print("\n🚀 NEXT STEPS:")
print("  1. 🔧 Practice with your own models and datasets")
print("  2. 📚 Explore advanced features like model versioning")
print("  3. 🤝 Collaborate with others using shared repositories")
print("  4. 🔍 Discover other models and datasets on the Hub")
print("  5. 📖 Read the full Hugging Face Hub documentation")

print("\n📚 RELATED NOTEBOOKS:")
print("  📖 05_fine_tuning_trainer.ipynb - Model fine-tuning")
print("  🤖 01_intro_hf_transformers.ipynb - HF basics")
print("  📊 03_datasets_library.ipynb - Working with datasets")

print("\n🔗 USEFUL LINKS:")
print("  🌐 Hugging Face Hub: https://huggingface.co/")
print("  📖 Documentation: https://huggingface.co/docs/hub/")
print("  🎓 Course: https://huggingface.co/course/")
print("  💬 Community: https://discuss.huggingface.co/")
print("  📧 Support: support@huggingface.co")

print("\n💡 Remember: The Hugging Face Hub is a powerful platform for sharing and")
print("   collaborating on ML models. Use it responsibly and help build an")
print("   open, ethical AI community!")

---

## 📋 Summary

### 🔑 Key Concepts Mastered
- **Authentication**: Secure login and token management with Hugging Face Hub
- **push_to_hub API**: Using model.push_to_hub() and tokenizer.push_to_hub() methods
- **Advanced Upload**: HfApi for fine-grained control over repository management
- **Dataset Sharing**: Pushing datasets to Hub with proper documentation
- **Model Cards**: Creating comprehensive documentation with bias considerations
- **Best Practices**: Security, versioning, and ethical AI considerations

### 📈 Best Practices Learned
- Always use environment variables or secure secrets for authentication tokens
- Create detailed model cards that address limitations and ethical considerations
- Use private repositories for sensitive models and demonstration purposes
- Test models after uploading to ensure they work correctly
- Choose appropriate upload methods based on your use case and requirements
- Document bias, limitations, and intended use cases clearly

### 🚀 Next Steps
- **Notebook 05**: [Fine-tuning with Trainer API](../05_fine_tuning_trainer.ipynb) - Learn advanced training techniques
- **Documentation**: [HF Hub Guide](https://huggingface.co/docs/hub/) for comprehensive Hub features
- **External Resources**: [HF Course Chapter 4](https://huggingface.co/learn/llm-course/chapter4/3?fw=pt) for model sharing

---

## About the Author

**Vu Hung Nguyen** - AI Engineer & Researcher

Connect with me:
- 🌐 **Website**: [vuhung16au.github.io](https://vuhung16au.github.io/)
- 💼 **LinkedIn**: [linkedin.com/in/nguyenvuhung](https://www.linkedin.com/in/nguyenvuhung/)
- 💻 **GitHub**: [github.com/vuhung16au](https://github.com/vuhung16au/)

*This notebook is part of the [HF Transformer Trove](https://github.com/vuhung16au/hf-transformer-trove) educational series.*