[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vuhung16au/hf-transformer-trove/blob/main/examples/basic4.3/manage-repo-model-hub.ipynb)
[![Open with SageMaker](https://img.shields.io/badge/Open%20with-SageMaker-orange?logo=amazonaws)](https://studiolab.sagemaker.aws/import/github/vuhung16au/hf-transformer-trove/blob/main/examples/basic4.3/manage-repo-model-hub.ipynb)
[![View on GitHub](https://img.shields.io/badge/View_on-GitHub-blue?logo=github)](https://github.com/vuhung16au/hf-transformer-trove/blob/main/examples/basic4.3/manage-repo-model-hub.ipynb)

# Managing Repositories on Hugging Face Model Hub

## 🎯 Learning Objectives
By the end of this notebook, you will understand:
- How to authenticate with Hugging Face Hub
- Creating and managing model repositories programmatically
- Uploading models, tokenizers, and files to the Hub
- Version control and repository management using Git and Python API
- Creating comprehensive model cards for documentation
- Best practices for model sharing and collaboration

## 📋 Prerequisites
- Basic understanding of machine learning models
- Familiarity with Git version control
- Hugging Face account (free at [huggingface.co](https://huggingface.co))
- Knowledge of Transformers library basics

## 📚 What We'll Cover
1. **Authentication & Setup**: Secure credential management
2. **Repository Creation**: Creating model repositories on the Hub
3. **File Management**: Uploading models, configs, and documentation
4. **Python API Methods**: Using `huggingface_hub` library
5. **Git Integration**: Traditional Git workflow for model repositories
6. **Model Cards**: Creating comprehensive documentation
7. **Best Practices**: Production-ready repository management

## Part 1: Environment Setup and Authentication

First, let's set up our environment and handle authentication securely.

In [1]:
# Install required packages (uncomment if needed)
# !pip install transformers huggingface_hub datasets torch

# Essential imports for Hugging Face Hub management
import os
import json
import warnings
from pathlib import Path
from typing import Optional, Dict, Any

# Hugging Face ecosystem imports
from huggingface_hub import (
    HfApi,
    Repository,
    login,
    whoami,
    create_repo,
    upload_file,
    upload_folder,
    delete_file,
    list_models,
    model_info
)
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    AutoConfig,
    pipeline
)
import torch

warnings.filterwarnings('ignore')
print("📦 Libraries imported successfully!")

📦 Libraries imported successfully!


In [2]:
# Detect Google Colab so secrets can be read from its Secrets manager
try:
    from google.colab import userdata
    COLAB_AVAILABLE = True
except ImportError:
    COLAB_AVAILABLE = False

def get_api_key(key_name: str, required: bool = True) -> Optional[str]:
    """
    Load API key from environment or Google Colab secrets.

    Args:
        key_name: Environment variable name
        required: Whether to raise error if not found

    Returns:
        API key string or None
    """
    # Try Colab secrets first
    if COLAB_AVAILABLE:
        try:
            return userdata.get(key_name)
        except Exception:  # e.g. secret not defined; fall back to the environment variable
            pass

    # Try environment variable
    api_key = os.getenv(key_name)

    if required and not api_key:
        raise ValueError(
            f"{key_name} not found. Set it as:\n"
            f"- Local: an environment variable (e.g. export {key_name}=...)\n"
            f"- Colab: a secret in the Secrets manager"
        )

    return api_key

def get_device() -> torch.device:
    """
    Automatically detect and return the best available device.

    Returns:
        torch.device: The optimal device for current hardware
    """
    if torch.cuda.is_available():
        device = torch.device("cuda")
        print(f"🚀 Using CUDA GPU: {torch.cuda.get_device_name()}")
    elif torch.backends.mps.is_available():
        device = torch.device("mps")
        print("🍎 Using Apple MPS (Apple Silicon)")
    else:
        device = torch.device("cpu")
        print("💻 Using CPU (consider GPU for better performance)")

    return device

# Set optimal device
device = get_device()

💻 Using CPU (consider GPU for better performance)


### 🔐 Hugging Face Authentication

To manage repositories on Hugging Face Hub, you need to authenticate. There are several ways to do this:

In [3]:
# Method 1: Load token from environment/secrets (recommended)
try:
    hf_token = get_api_key("HF_TOKEN", required=False)

    if hf_token:
        # Login programmatically
        login(token=hf_token)
        print("✅ Successfully authenticated with Hugging Face Hub")

        # Verify authentication
        user_info = whoami()
        print(f"👤 Logged in as: {user_info['name']}")
        print(f"📧 Email: {user_info.get('email', 'Not provided')}")

    else:
        print("⚠️  No HF_TOKEN found. You can:")
        print("   1. Set HF_TOKEN environment variable")
        print("   2. Add HF_TOKEN to Colab secrets (if using Colab)")
        print("   3. Run: huggingface-cli login (in terminal)")
        print("   4. Use login() interactively (next cell)")

except Exception as e:
    print(f"❌ Authentication failed: {e}")
    print("💡 See the next cell for alternative authentication methods.")

✅ Successfully authenticated with Hugging Face Hub
👤 Logged in as: vuhung
📧 Email: vuhung16plus@gmail.com


In [4]:
# Method 2: Interactive login (if token not available)
# Uncomment and run this if you need to login interactively

# from huggingface_hub import login
# login()  # This will prompt for your token

# Method 3: Command line login (run in terminal)
# !huggingface-cli login

print("💡 Authentication Methods:")
print("1. Environment Variable: Set HF_TOKEN in your environment")
print("2. Colab Secrets: Add HF_TOKEN to Google Colab secrets manager")
print("3. Command Line: Run 'huggingface-cli login' in terminal")
print("4. Interactive: Use login() function (prompts for token)")

💡 Authentication Methods:
1. Environment Variable: Set HF_TOKEN in your environment
2. Colab Secrets: Add HF_TOKEN to Google Colab secrets manager
3. Command Line: Run 'huggingface-cli login' in terminal
4. Interactive: Use login() function (prompts for token)
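Whichever method you use, avoid printing the raw token, for example when debugging which credential was picked up. The sketch below assumes you only need to confirm that a token is present; `resolve_hf_token` and `mask_token` are hypothetical helpers (not part of `huggingface_hub`), and the demo stores a fake token under a made-up `HF_TOKEN_DEMO` variable.

```python
import os
from typing import Optional

def resolve_hf_token(explicit: Optional[str] = None, env_var: str = "HF_TOKEN") -> Optional[str]:
    """Hypothetical helper: prefer an explicitly passed token, else read env_var."""
    if explicit:
        return explicit
    # os.environ.get returns None when unset; an empty string is treated as missing too
    return os.environ.get(env_var) or None

def mask_token(token: Optional[str], visible: int = 4) -> str:
    """Hypothetical helper: hide all but the last `visible` characters."""
    if not token:
        return "<no token>"
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]

# Fake token under a made-up variable name, for illustration only
os.environ["HF_TOKEN_DEMO"] = "hf_exampleFakeToken1234"
demo_token = resolve_hf_token(env_var="HF_TOKEN_DEMO")
print(f"Using token {mask_token(demo_token)}")
```

In the authentication cell, printing `mask_token(hf_token)` is enough to confirm which token was loaded without exposing it in notebook outputs.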


## Part 2: Repository Creation and Management

Now let's learn how to create and manage repositories on the Hugging Face Hub.

In [5]:
# Initialize Hugging Face API client
api = HfApi()

# Example repository configuration
REPO_CONFIG = {
    "repo_id": "your-username/demo-hate-speech-detector",  # Replace with your username
    "repo_type": "model",
    "private": False,  # Set to True for private repositories
    "description": "Educational demo: Fine-tuned RoBERTa for hate speech detection"
}

def create_model_repository(repo_id: str, private: bool = False, description: str = ""):
    """
    Create a new model repository on Hugging Face Hub.

    Args:
        repo_id: Repository identifier (username/repo-name)
        private: Whether to create private repository
        description: Repository description (not sent to create_repo, which has no description parameter; put it in the model card)
    """
    try:
        # Create repository
        repo_url = create_repo(
            repo_id=repo_id,
            repo_type="model",
            private=private,
            exist_ok=True  # Don't fail if repository already exists
        )

        print(f"✅ Repository created/verified: {repo_url}")
        print(f"🔗 Visit: https://huggingface.co/{repo_id}")
        return repo_url

    except Exception as e:
        print(f"❌ Error creating repository: {e}")
        return None

# Example: Create repository (uncomment to try)
# repo_url = create_model_repository(
#     repo_id=REPO_CONFIG["repo_id"],
#     private=REPO_CONFIG["private"],
#     description=REPO_CONFIG["description"]
# )

print("📝 Repository creation function ready!")
print(f"💡 Example repo ID: {REPO_CONFIG['repo_id']}")
print("⚠️  Remember to replace 'your-username' with your actual Hugging Face username")

📝 Repository creation function ready!
💡 Example repo ID: your-username/demo-hate-speech-detector
⚠️  Remember to replace 'your-username' with your actual Hugging Face username
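Because every example here uses the `your-username` placeholder, it can help to fail fast before calling `create_repo`. The sketch below is a simplified pre-flight check: `check_repo_id` is a hypothetical helper, and its regular expression is an assumption that only approximates the Hub's naming rules (letters, digits, `-`, `_` and `.` on each side of a single `/`).

```python
import re

# Simplified, assumed pattern: "owner/name", each part starting with a letter or
# digit and otherwise made of word characters, "." or "-" (no extra slashes).
_REPO_ID_PATTERN = re.compile(r"^[A-Za-z0-9][\w.-]*/[A-Za-z0-9][\w.-]*$")

def check_repo_id(repo_id: str, placeholder: str = "your-username") -> str:
    """Hypothetical pre-flight check: return repo_id, or raise ValueError."""
    if placeholder in repo_id:
        raise ValueError(f"Replace '{placeholder}' in {repo_id!r} with your Hub username")
    if not _REPO_ID_PATTERN.match(repo_id):
        raise ValueError(f"{repo_id!r} does not look like 'owner/repo-name'")
    return repo_id

for candidate in ["vuhung/hf-basic-4", "your-username/demo-hate-speech-detector", "missing-slash"]:
    try:
        print("OK:", check_repo_id(candidate))
    except ValueError as err:
        print("Rejected:", err)
```

Calling `check_repo_id(REPO_CONFIG["repo_id"])` before `create_model_repository(...)` stops the notebook with a clear message until the placeholder is replaced.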


## Part 3: Working with the Reference Model (vuhung/hf-basic-4)

Let's explore an existing reference repository and understand its structure.

In [6]:
# Reference repository explored in this section
REFERENCE_MODEL = "vuhung/hf-basic-4"

def explore_model_repository(repo_id: str):
    """
    Explore an existing model repository to understand its structure.

    Args:
        repo_id: Repository identifier to explore
    """
    try:
        # Get model information; files_metadata=True is needed to populate file sizes
        info = model_info(repo_id, files_metadata=True)

        print(f"🔍 Exploring Repository: {repo_id}")
        print("=" * 50)
        print(f"📋 Task: {info.pipeline_tag or 'Not specified'}")
        print(f"📚 Library: {info.library_name or 'Not specified'}")
        print(f"💾 Downloads: {info.downloads:,}")
        print(f"👍 Likes: {info.likes}")
        print(f"🏪 Created: {info.created_at}")
        print(f"📅 Last Modified: {info.last_modified}")

        # List files in the repository
        if hasattr(info, 'siblings') and info.siblings:
            print(f"\n📁 Repository Files:")
            for file_info in info.siblings:
                size_mb = file_info.size / (1024 * 1024) if file_info.size else 0
                print(f"   📄 {file_info.rfilename} ({size_mb:.2f} MB)")

        # Display tags if available
        if info.tags:
            print(f"\n🏷️  Tags: {', '.join(info.tags[:10])}")

        return info

    except Exception as e:
        print(f"❌ Error exploring repository: {e}")
        print(f"💡 This might be a private repository or authentication issue")
        return None

# Explore the reference model
model_info_data = explore_model_repository(REFERENCE_MODEL)

🔍 Exploring Repository: vuhung/hf-basic-4
📋 Task: Not specified
📚 Library: Not specified
💾 Downloads: 0
👍 Likes: 0
🏪 Created: 2025-09-28 23:58:05+00:00
📅 Last Modified: 2025-09-28 23:58:05+00:00

📁 Repository Files:
   📄 .gitattributes (0.00 MB)
   📄 README.md (0.00 MB)

🏷️  Tags: license:mit, region:us
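A note on the file sizes above: `model_info()` populates each sibling's `size` only when called with `files_metadata=True`; otherwise `size` is `None`, which the loop reports as 0.00 MB. The sketch below keeps "unknown" distinct from "zero" and flags large files. The helper names `describe_size` and `summarize_files` and the 10 MB threshold are assumptions for illustration.

```python
from typing import Iterable, Optional, Tuple

# Assumed cutoff for "large" files (files this big should go through Git LFS)
LARGE_FILE_BYTES = 10 * 1024 * 1024

def describe_size(size: Optional[int]) -> str:
    """Format a byte count, keeping 'unknown' distinct from zero."""
    if size is None:
        return "size unknown"
    return f"{size / (1024 * 1024):.2f} MB"

def summarize_files(files: Iterable[Tuple[str, Optional[int]]]) -> dict:
    """Summarize (filename, size) pairs such as (s.rfilename, s.size) from info.siblings."""
    files = list(files)  # accept generators; the list is iterated more than once
    known = [size for _, size in files if size is not None]
    return {
        "count": len(files),
        "unknown_sizes": len(files) - len(known),
        "total": describe_size(sum(known)) if known else "size unknown",
        "large_files": [name for name, size in files
                        if size is not None and size >= LARGE_FILE_BYTES],
    }

# Made-up listing for illustration
demo_summary = summarize_files([
    (".gitattributes", None),
    ("README.md", 2048),
    ("model.safetensors", 500 * 1024 * 1024),
])
print(demo_summary)
```

With a real repository you could call `summarize_files((s.rfilename, s.size) for s in model_info_data.siblings)`.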


In [7]:
# Load and test our preferred hate speech detection model
# (This repository's examples focus on hate speech detection)
PREFERRED_MODEL = "cardiffnlp/twitter-roberta-base-hate-latest"

def load_hate_speech_model(model_name: str):
    """
    Load and test a hate speech detection model.

    Args:
        model_name: Name of the model to load
    """
    try:
        print(f"📥 Loading model: {model_name}")

        # Load tokenizer and model
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForSequenceClassification.from_pretrained(model_name)

        # Move to optimal device
        model = model.to(device)

        print(f"✅ Model loaded successfully on {device}")
        print(f"📊 Model parameters: {model.num_parameters():,}")

        # Create pipeline for easy usage; pass the same torch.device the model
        # is already on so inputs and weights end up on the same device (incl. MPS)
        classifier = pipeline(
            "text-classification",
            model=model,
            tokenizer=tokenizer,
            device=device
        )

        # Test with examples
        test_texts = [
            "This is a wonderful community project!",
            "I strongly disagree with this approach.",
            "Thanks for the helpful tutorial."
        ]

        print(f"\n🧪 Testing model with sample texts:")
        for text in test_texts:
            result = classifier(text)
            print(f"   Text: '{text[:40]}...'")
            print(f"   Result: {result[0]['label']} (confidence: {result[0]['score']:.3f})")
            print()

        return tokenizer, model, classifier

    except Exception as e:
        print(f"❌ Error loading model: {e}")
        return None, None, None

# Load our preferred hate speech detection model
tokenizer, model, classifier = load_hate_speech_model(PREFERRED_MODEL)

📥 Loading model: cardiffnlp/twitter-roberta-base-hate-latest


Device set to use cpu


✅ Model loaded successfully on cpu
📊 Model parameters: 124,647,170

🧪 Testing model with sample texts:



   Text: 'This is a wonderful community project!...'
   Result: NOT-HATE (confidence: 0.999)

   Text: 'I strongly disagree with this approach....'
   Result: NOT-HATE (confidence: 0.996)

   Text: 'Thanks for the helpful tutorial....'
   Result: NOT-HATE (confidence: 0.999)
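The model card later in this notebook recommends human oversight rather than fully automatic decisions. One way to act on that is to route low-confidence predictions to a reviewer. In this sketch, `route_predictions` and the 0.9 threshold are hypothetical choices; it only assumes the pipeline's output format shown above, one dict with `label` and `score` per input text.

```python
from typing import Dict, List, Tuple

Prediction = Tuple[str, str, float]  # (text, label, score)

def route_predictions(texts: List[str], results: List[Dict],
                      review_threshold: float = 0.9) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into (auto_accepted, needs_review) by confidence."""
    auto_accepted, needs_review = [], []
    for text, result in zip(texts, results):
        item = (text, result["label"], float(result["score"]))
        # Anything below the threshold goes to a human instead of being acted on
        (auto_accepted if item[2] >= review_threshold else needs_review).append(item)
    return auto_accepted, needs_review

# Hand-written results in the pipeline's format, for illustration only
demo_texts = ["Thanks for the tutorial.", "An ambiguous comment."]
demo_results = [{"label": "NOT-HATE", "score": 0.999}, {"label": "HATE", "score": 0.62}]
accepted, review = route_predictions(demo_texts, demo_results)
print(f"auto-accepted: {len(accepted)}, sent to review: {len(review)}")
```

With the loaded model you could pass `classifier(test_texts)`, which returns one such dict per text.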



## Part 4: File Upload Methods

Learn different ways to upload files to your Hugging Face repository.

In [8]:
# Method 1: Upload individual files
def upload_individual_file(repo_id: str, local_file_path: str, repo_file_path: str):
    """
    Upload a single file to the repository.

    Args:
        repo_id: Repository identifier
        local_file_path: Path to local file
        repo_file_path: Path in the repository
    """
    try:
        result = upload_file(
            path_or_fileobj=local_file_path,
            path_in_repo=repo_file_path,
            repo_id=repo_id,
            repo_type="model"
        )
        print(f"✅ File uploaded: {repo_file_path}")
        return result
    except Exception as e:
        print(f"❌ Error uploading file: {e}")
        return None

# Method 2: Upload entire folder
def upload_model_folder(repo_id: str, local_folder_path: str):
    """
    Upload an entire folder containing model files.

    Args:
        repo_id: Repository identifier
        local_folder_path: Path to local folder
    """
    try:
        result = upload_folder(
            folder_path=local_folder_path,
            repo_id=repo_id,
            repo_type="model"
        )
        print(f"✅ Folder uploaded: {local_folder_path} -> {repo_id}")
        return result
    except Exception as e:
        print(f"❌ Error uploading folder: {e}")
        return None

# Method 3: Save and upload model directly
def save_and_upload_model(model, tokenizer, repo_id: str, local_save_path: str = "./temp_model"):
    """
    Save model locally and upload to Hub.

    Args:
        model: The model to save
        tokenizer: The tokenizer to save
        repo_id: Repository identifier
        local_save_path: Temporary local save path
    """
    try:
        # Create local directory
        os.makedirs(local_save_path, exist_ok=True)

        # Save model and tokenizer
        print(f"💾 Saving model to {local_save_path}...")
        model.save_pretrained(local_save_path)
        tokenizer.save_pretrained(local_save_path)

        # Upload to Hub
        print(f"📤 Uploading to {repo_id}...")
        result = upload_folder(
            folder_path=local_save_path,
            repo_id=repo_id,
            repo_type="model"
        )

        print(f"✅ Model uploaded successfully!")
        print(f"🔗 View at: https://huggingface.co/{repo_id}")

        return result

    except Exception as e:
        print(f"❌ Error saving/uploading model: {e}")
        return None

print("📤 File upload methods ready!")
print("💡 These functions can be used to upload individual files, folders, or complete models")

📤 File upload methods ready!
💡 These functions can be used to upload individual files, folders, or complete models
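`upload_folder` also accepts `allow_patterns` and `ignore_patterns` to control which files are sent, which helps keep checkpoints or logs out of the repository. You can preview the effect locally before uploading. This sketch approximates the filtering with Python's `fnmatch` on paths relative to the folder; `preview_upload` is a hypothetical helper, and the library's own matching may differ in edge cases.

```python
import fnmatch
import tempfile
from pathlib import Path
from typing import List, Optional

def preview_upload(folder: str, allow_patterns: Optional[List[str]] = None,
                   ignore_patterns: Optional[List[str]] = None) -> List[str]:
    """List relative file paths that would pass the allow/ignore filters."""
    root = Path(folder)
    selected = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        # Keep only files matching at least one allow pattern (if any are given)
        if allow_patterns and not any(fnmatch.fnmatch(rel, p) for p in allow_patterns):
            continue
        # Drop files matching any ignore pattern
        if ignore_patterns and any(fnmatch.fnmatch(rel, p) for p in ignore_patterns):
            continue
        selected.append(rel)
    return selected

# Build a throwaway folder that mimics a saved model plus training leftovers
with tempfile.TemporaryDirectory() as tmp:
    for name in ["config.json", "model.safetensors", "checkpoint/optimizer.pt", "notes.txt"]:
        target = Path(tmp) / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("demo")
    demo_selection = preview_upload(tmp, ignore_patterns=["*.pt", "*.txt"])

print(demo_selection)
```

Once the list looks right, the same patterns can be passed as `upload_folder(..., ignore_patterns=["*.pt", "*.txt"])`.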


## Part 5: Creating Comprehensive Model Cards

Model cards are crucial for documenting your models. Let's create a comprehensive model card.

In [9]:
def create_model_card(model_name: str, task: str, performance_metrics: Dict[str, float] = None) -> str:
    """
    Create a comprehensive model card following Hugging Face standards.

    Args:
        model_name: Name of the model
        task: Primary task the model performs
        performance_metrics: Dictionary of performance metrics

    Returns:
        Formatted model card as markdown string
    """

    # Placeholder metrics for illustration only; replace them with your own evaluation results before publishing
    if performance_metrics is None:
        performance_metrics = {
            "accuracy": 0.892,
            "f1_score": 0.885,
            "precision": 0.878,
            "recall": 0.893
        }

    model_card = f"""---
language: en
license: mit
library_name: transformers
pipeline_tag: text-classification
tags:
- hate-speech-detection
- text-classification
- roberta
- social-media
- content-moderation
datasets:
- tdavidson/hate_speech_offensive
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: {model_name}
  results:
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: Hate Speech Detection
      type: hate_speech_offensive
    metrics:
    - name: Accuracy
      type: accuracy
      value: {performance_metrics.get('accuracy', 0.0):.3f}
    - name: F1 Score
      type: f1
      value: {performance_metrics.get('f1_score', 0.0):.3f}
---

# {model_name.title()}

## Model Description

This model is a fine-tuned version of RoBERTa specifically designed for {task}. It has been trained on curated datasets to identify and classify potentially harmful content, making it suitable for content moderation applications.

### Model Details

- **Developed by:** [Your Name/Organization]
- **Model type:** RoBERTa for Sequence Classification
- **Language(s):** English
- **License:** MIT
- **Finetuned from model:** cardiffnlp/twitter-roberta-base-hate-latest

### Intended Uses

#### Intended Use Cases
- Content moderation for social media platforms
- Automated hate speech detection in user-generated content
- Research on hate speech detection and bias in NLP models
- Educational purposes for learning about text classification

#### Out-of-Scope Use Cases
- Should not be used as the sole basis for content removal decisions
- Not suitable for legal or judicial decision-making
- Should not be used without human oversight in high-stakes scenarios

## Training Data

This model was trained on the Davidson et al. hate speech dataset, which contains:
- **Training samples:** ~20,000 labeled tweets
- **Labels:** Hate speech, Offensive language, Neither
- **Language:** English
- **Domain:** Social media (Twitter)

## Performance

| Metric | Value |
|--------|-------|
| Accuracy | {performance_metrics.get('accuracy', 0.0):.3f} |
| F1 Score | {performance_metrics.get('f1_score', 0.0):.3f} |
| Precision | {performance_metrics.get('precision', 0.0):.3f} |
| Recall | {performance_metrics.get('recall', 0.0):.3f} |

## Usage

### Direct Use

```python
from transformers import pipeline

# Load the model
classifier = pipeline("text-classification", model="{model_name}")

# Use the model
text = "This is a sample text for classification"
result = classifier(text)
print(result)
```

### Advanced Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("{model_name}")
model = AutoModelForSequenceClassification.from_pretrained("{model_name}")

# Tokenize and predict
inputs = tokenizer("Your text here", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
```

## Limitations and Bias

### Known Limitations
- Performance may vary on text from domains different from social media
- May exhibit bias towards certain demographic groups
- Context-dependent hate speech may be challenging to detect
- Limited to English language content

### Bias Analysis
This model has been trained on social media data which may contain inherent biases. Users should:
- Test the model on their specific use case
- Monitor for bias in predictions
- Use human oversight for final decisions

## Training Procedure

### Training Hyperparameters
- Learning rate: 2e-5
- Batch size: 16
- Number of epochs: 3
- Optimizer: AdamW
- Weight decay: 0.01

### Framework Versions
- Transformers: 4.35.0
- PyTorch: 2.1.0
- Datasets: 2.14.0
- Tokenizers: 0.15.0

## Citation

If you use this model, please cite:

```bibtex
@misc{{{model_name.replace('/', '_')},
  author = {{Your Name}},
  title = {{{model_name.title()}: Fine-tuned RoBERTa for Hate Speech Detection}},
  year = {{2024}},
  publisher = {{Hugging Face}},
  url = {{https://huggingface.co/{model_name}}}
}}
```

## Contact

For questions or issues, please contact [your-email@example.com]

---

*This model is part of the educational series from [HF Transformer Trove](https://github.com/vuhung16au/hf-transformer-trove).*
"""

    return model_card

# Create example model card
example_model_card = create_model_card(
    model_name="your-username/hate-speech-detector-v1",
    task="hate speech detection"
)

print("📝 Model card created successfully!")
print("\n" + "="*50)
print("📄 PREVIEW OF MODEL CARD:")
print("="*50)
print(example_model_card[:1000] + "...\n[truncated for preview]")

📝 Model card created successfully!

📄 PREVIEW OF MODEL CARD:
---
language: en
license: mit
library_name: transformers
pipeline_tag: text-classification
tags:
- hate-speech-detection
- text-classification
- roberta
- social-media
- content-moderation
datasets:
- tdavidson/hate_speech_offensive
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: your-username/hate-speech-detector-v1
  results:
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: Hate Speech Detection
      type: hate_speech_offensive
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.892
    - name: F1 Score
      type: f1
      value: 0.885
---

# Your-Username/Hate-Speech-Detector-V1

## Model Description

This model is a fine-tuned version of RoBERTa specifically designed for hate speech detection. It has been trained on curated datasets to identify and classify potentially harmful content, making it suitable for content moderation applica
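The block between the two `---` lines is YAML metadata that the Hub reads for tags, license and evaluation results, so it is worth checking that it is well-formed before uploading. For real parsing, `huggingface_hub` provides a `ModelCard` class; the sketch below is only a dependency-free sanity check, and `split_front_matter` and `top_level_keys` are hypothetical helpers that do not implement full YAML.

```python
from typing import List, Tuple

def split_front_matter(card: str) -> Tuple[str, str]:
    """Return (metadata_text, body) for a card that starts with a '---' block."""
    lines = card.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("Model card must start with a '---' metadata block")
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":  # the first closing fence ends the metadata
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    raise ValueError("Metadata block is missing its closing '---'")

def top_level_keys(metadata: str) -> List[str]:
    """Rough check: unindented 'key: ...' lines (list items and nested keys skipped)."""
    return [line.split(":", 1)[0] for line in metadata.splitlines()
            if line and not line[0].isspace() and not line.startswith("-") and ":" in line]

demo_card = "---\nlicense: mit\ntags:\n- roberta\n---\n# Demo\nBody text"
demo_meta, demo_body = split_front_matter(demo_card)
print(top_level_keys(demo_meta))
```

Running `split_front_matter(example_model_card)` on the generated card should separate its metadata from the Markdown body; a `ValueError` means the card would not be recognized as having metadata.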

In [10]:
# Save model card to file and upload
def save_and_upload_model_card(repo_id: str, model_card_content: str, local_path: str = "./model_card.md"):
    """
    Save model card locally and upload it to the repository as README.md.

    Args:
        repo_id: Repository identifier
        model_card_content: Model card content as string
        local_path: Local path for the saved copy (the default avoids overwriting
            a README.md in your working directory)
    """
    try:
        # Save locally
        with open(local_path, 'w', encoding='utf-8') as f:
            f.write(model_card_content)

        print(f"💾 Model card saved locally: {local_path}")

        # Upload to repository
        result = upload_file(
            path_or_fileobj=local_path,
            path_in_repo="README.md",
            repo_id=repo_id,
            repo_type="model",
            commit_message="Add comprehensive model card with usage examples and bias documentation"
        )

        print(f"✅ Model card uploaded to {repo_id}")
        print(f"🔗 View at: https://huggingface.co/{repo_id}")

        return result

    except Exception as e:
        print(f"❌ Error saving/uploading model card: {e}")
        return None

# Example usage (uncomment to try)
# save_and_upload_model_card(REPO_CONFIG["repo_id"], example_model_card)

print("📤 Model card upload function ready!")

📤 Model card upload function ready!
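The template still contains placeholders such as `[Your Name/Organization]` and `your-email@example.com`, and its metrics are illustrative defaults. A small check before calling `save_and_upload_model_card` can catch the text placeholders, though it cannot tell whether metric values are real. `find_placeholders` and the placeholder list are hypothetical and should be adapted to your own template.

```python
from typing import List, Sequence

# Strings that appear in the template above and should not reach the Hub
DEFAULT_PLACEHOLDERS = ("your-username", "Your Name", "your-email@example.com")

def find_placeholders(card: str, placeholders: Sequence[str] = DEFAULT_PLACEHOLDERS) -> List[str]:
    """Return the placeholders still present in the card text."""
    return [p for p in placeholders if p in card]

demo_leftovers = find_placeholders("- **Developed by:** [Your Name/Organization]")
print(demo_leftovers)
```

For example, `if find_placeholders(example_model_card): raise ValueError("Fill in the model card first")` would block the upload until the template is completed.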


## Part 6: Git Integration for Advanced Repository Management

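For more complex workflows, you can use Git directly with Hugging Face repositories. Note that the `Repository` class used below is deprecated in recent versions of `huggingface_hub` in favor of the HTTP-based methods from Part 4 (`upload_file`, `upload_folder`); for new code, prefer those or plain `git` with Git LFS.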

In [11]:
# Git-based repository management
def clone_and_manage_repo(repo_id: str, local_dir: str = "./temp_repo"):
    """
    Clone repository and demonstrate Git-based management.

    Args:
        repo_id: Repository identifier
        local_dir: Local directory for cloning
    """
    try:
        print(f"🔄 Cloning repository: {repo_id}")

        # Clone repository using huggingface_hub's (deprecated) Repository class;
        # token=True reuses the token saved by login()
        repo = Repository(
            local_dir=local_dir,
            clone_from=repo_id,
            repo_type="model",
            token=True
        )

        print(f"✅ Repository cloned to: {local_dir}")

        # List files in repository (os is imported at the top of the notebook)
        files = os.listdir(local_dir)
        print(f"📁 Files in repository: {files}")

        return repo

    except Exception as e:
        print(f"❌ Error cloning repository: {e}")
        return None

def git_workflow_example(repo: Repository, model, tokenizer):
    """
    Demonstrate Git workflow for model updates.

    Args:
        repo: Repository object
        model: Model to save
        tokenizer: Tokenizer to save
    """
    try:
        print("🔄 Git workflow demonstration:")

        # 1. Pull latest changes
        print("1. Pulling latest changes...")
        repo.git_pull()

        # 2. Save model files
        print("2. Saving model files...")
        model.save_pretrained(repo.local_dir)
        tokenizer.save_pretrained(repo.local_dir)

        # 3. Add files to git
        print("3. Adding files to git...")
        repo.git_add()

        # 4. Commit changes
        print("4. Committing changes...")
        repo.git_commit("Update model with improved performance")

        # 5. Push to Hub
        print("5. Pushing to Hub...")
        repo.git_push()

        print("✅ Git workflow completed successfully!")

    except Exception as e:
        print(f"❌ Error in Git workflow: {e}")

print("🔧 Git integration functions ready!")
print("💡 These functions demonstrate advanced repository management using Git")

🔧 Git integration functions ready!
💡 These functions demonstrate advanced repository management using Git
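Whether you push with `Repository` or plain `git`, weight files must go through Git LFS. `git lfs track '*.bin'` records this as a line such as `*.bin filter=lfs diff=lfs merge=lfs -text` in `.gitattributes`. The sketch below checks which files a `.gitattributes` text would send to LFS. It is a simplification (it ignores negation, `/`-anchored patterns and other gitattributes rules), and `lfs_patterns` and `is_lfs_tracked` are hypothetical helpers.

```python
import fnmatch
from typing import List

def lfs_patterns(gitattributes_text: str) -> List[str]:
    """Collect patterns whose attributes include filter=lfs."""
    patterns = []
    for raw in gitattributes_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attributes = line.split()
        if "filter=lfs" in attributes:
            patterns.append(pattern)
    return patterns

def is_lfs_tracked(path: str, gitattributes_text: str) -> bool:
    """Simplified: match each pattern against the full path or just the file name."""
    name = path.rsplit("/", 1)[-1]
    return any(fnmatch.fnmatch(path, p) or fnmatch.fnmatch(name, p)
               for p in lfs_patterns(gitattributes_text))

demo_attributes = """
*.bin filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
# text files stay in regular git
*.md text
"""
for demo_path in ["model.safetensors", "checkpoints/pytorch_model.bin", "README.md"]:
    print(demo_path, "->", "LFS" if is_lfs_tracked(demo_path, demo_attributes) else "git")
```

With a cloned repository, you could read `Path(local_dir, ".gitattributes").read_text()` and check each weight file before `git_push()`.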


## Part 7: Command Line Tools and Best Practices

Learn about command line tools and production-ready practices.

In [12]:
# Command line equivalents and best practices
def show_cli_commands():
    """
    Display equivalent command line commands for common operations.
    """
    cli_commands = {
        "Installation": [
            "# Install Hugging Face CLI",
            "pip install huggingface_hub",
            "",
            "# For macOS with Homebrew",
            "brew install huggingface-cli"
        ],
        "Authentication": [
            "# Login to Hugging Face",
            "huggingface-cli login",
            "",
            "# Login with token",
            "huggingface-cli login --token YOUR_TOKEN"
        ],
        "Repository Management": [
            "# Create new repository",
            "huggingface-cli repo create your-model-name --type model",
            "",
            "# Create private repository",
            "huggingface-cli repo create your-model-name --type model --private"
        ],
        "File Upload": [
            "# Upload single file",
            "huggingface-cli upload your-username/your-model ./model.bin model.bin",
            "",
            "# Upload entire folder",
            "huggingface-cli upload your-username/your-model ./model_folder ."
        ],
        "Git LFS Setup": [
            "# Install Git LFS (required for large files)",
            "git lfs install",
            "",
            "# Clone repository",
            "git clone https://huggingface.co/your-username/your-model",
            "",
            "# Add large files to LFS tracking",
            "git lfs track '*.bin'",
            "git lfs track '*.safetensors'",
            "",
            "# Standard Git workflow",
            "git add .",
            "git commit -m 'Add model files'",
            "git push"
        ]
    }

    print("🖥️  COMMAND LINE REFERENCE")
    print("=" * 50)

    for category, commands in cli_commands.items():
        print(f"\n📋 {category}:")
        for cmd in commands:
            if cmd.startswith("#"):
                print(f"   {cmd}")
            elif cmd == "":
                print()
            else:
                print(f"   $ {cmd}")

def production_best_practices():
    """
    Display production best practices for model repository management.
    """
    practices = {
        "Security": [
            "Use environment variables for tokens, never hardcode them",
            "Use read-only tokens when possible",
            "Regularly rotate access tokens",
            "Use private repositories for sensitive models"
        ],
        "Version Control": [
            "Tag model versions for easy reference",
            "Use semantic versioning (v1.0.0, v1.1.0, etc.)",
            "Write descriptive commit messages",
            "Keep track of model performance across versions"
        ],
        "Documentation": [
            "Always include comprehensive model cards",
            "Document training data sources and biases",
            "Provide clear usage examples",
            "Include performance metrics and limitations"
        ],
        "File Management": [
            "Use Git LFS for large model files (the Hub rejects non-LFS files over 10MB)",
            "Compress models when possible (quantization)",
            "Include configuration files and tokenizers",
            "Use consistent file naming conventions"
        ],
        "Collaboration": [
            "Use organizations for team repositories",
            "Set up proper access controls",
            "Use pull requests for model updates",
            "Maintain changelog for significant updates"
        ]
    }

    print("\n🏆 PRODUCTION BEST PRACTICES")
    print("=" * 50)

    for category, items in practices.items():
        print(f"\n📋 {category}:")
        for item in items:
            print(f"   ✓ {item}")

# Display command line reference and best practices
show_cli_commands()
production_best_practices()

🖥️  COMMAND LINE REFERENCE

📋 Installation:
   # Install Hugging Face CLI
   $ pip install huggingface_hub

   # For macOS with Homebrew
   $ brew install huggingface-cli

📋 Authentication:
   # Login to Hugging Face
   $ huggingface-cli login

   # Login with token
   $ huggingface-cli login --token YOUR_TOKEN

📋 Repository Management:
   # Create new repository
   $ huggingface-cli repo create your-model-name --type model

   # Create private repository
   $ huggingface-cli repo create your-model-name --type model --private

📋 File Upload:
   # Upload single file
   $ huggingface-cli upload your-username/your-model ./model.bin model.bin

   # Upload entire folder
   $ huggingface-cli upload your-username/your-model ./model_folder .

📋 Git LFS Setup:
   # Install Git LFS (required for large files)
   $ git lfs install

   # Clone repository
   $ git clone https://huggingface.co/your-username/your-model

   # Add large files to LFS tracking
   $ git lfs track '*.bin'
   $ git lfs track
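The best practices above suggest tagging releases with semantic versions. Here is a minimal sketch of computing the next tag, assuming tags of the form `vMAJOR.MINOR.PATCH`; `next_version` is a hypothetical helper. The resulting string could then be passed to `HfApi().create_tag(repo_id, tag=...)`, which creates a tag on the Hub repository.

```python
import re

# Assumed tag format: "v" followed by three dot-separated integers
_SEMVER_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")

def next_version(current: str, part: str = "patch") -> str:
    """Bump a 'vMAJOR.MINOR.PATCH' tag; lower-order parts reset to zero."""
    match = _SEMVER_TAG.match(current)
    if match is None:
        raise ValueError(f"{current!r} is not of the form vMAJOR.MINOR.PATCH")
    major, minor, patch = (int(g) for g in match.groups())
    if part == "major":
        return f"v{major + 1}.0.0"
    if part == "minor":
        return f"v{major}.{minor + 1}.0"
    if part == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown part {part!r}; use 'major', 'minor' or 'patch'")

print(next_version("v1.0.0", "minor"))
```

Bumping `minor` for new capabilities and `patch` for fixes keeps the tags consistent with the versioning practice listed above.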

## Part 8: Practical Demo - Complete Workflow

Let's put it all together with a complete workflow demonstration.

In [13]:
def complete_workflow_demo(demo_mode: bool = True):
    """
    Demonstrate complete workflow for repository management.

    Args:
        demo_mode: If True, only shows what would be done without actually executing
    """

    print("🚀 COMPLETE WORKFLOW DEMONSTRATION")
    print("=" * 50)

    # Configuration
    demo_repo_id = "your-username/demo-hate-speech-model"

    if demo_mode:
        print("📝 DEMO MODE - Showing workflow steps without execution")
        print(f"   Repository: {demo_repo_id}")
        print("   (Replace 'your-username' with your actual username)")
        print()

    steps = [
        "1. 🔐 Authenticate with Hugging Face Hub",
        "2. 📂 Create new model repository",
        "3. 🤖 Load and prepare model for upload",
        "4. 💾 Save model files locally",
        "5. 📝 Generate comprehensive model card",
        "6. 📤 Upload model and documentation",
        "7. 🔍 Verify upload and test model",
        "8. 🏷️  Tag version and update metadata"
    ]

    for step in steps:
        print(step)

        if demo_mode:
            # Simulate step execution
            if "Authenticate" in step:
                print("   → login(token=os.getenv('HF_TOKEN'))")
                print("   → Verified authentication")

            elif "Create new" in step:
                print(f"   → create_repo(repo_id='{demo_repo_id}')")
                print("   → Repository created successfully")

            elif "Load and prepare" in step:
                print("   → Loading cardiffnlp/twitter-roberta-base-hate-latest")
                print("   → Model and tokenizer loaded")

            elif "Save model" in step:
                print("   → model.save_pretrained('./temp_model')")
                print("   → tokenizer.save_pretrained('./temp_model')")

            elif "Generate comprehensive" in step:
                print("   → Creating model card with performance metrics")
                print("   → Including usage examples and bias documentation")

            elif "Upload model" in step:
                print("   → upload_folder('./temp_model', repo_id)")
                print("   → upload_file('README.md', repo_id)")

            elif "Verify upload" in step:
                print("   → Testing model loading from Hub")
                print("   → Running inference tests")

            elif "Tag version" in step:
                print("   → Adding version tag: v1.0.0")
                print("   → Updating repository metadata")

            print("   ✅ Step completed\n")
        else:
            print("   [This would execute the actual workflow step]\n")

    print("🎉 Walkthrough finished (nothing was uploaded)")
    print(f"🔗 After a real run, your model would be available at: https://huggingface.co/{demo_repo_id}")

    # Show what the final repository would contain
    print("\n📁 Repository Contents:")
    repository_files = [
        "README.md (comprehensive model card)",
        "config.json (model configuration)",
        "model.safetensors (model weights, the default save format in recent Transformers)",
        "tokenizer.json (tokenizer files)",
        "tokenizer_config.json",
        "special_tokens_map.json",
        "vocab.json",
        "merges.txt"
    ]

    for file in repository_files:
        print(f"   📄 {file}")

# Run the complete workflow demonstration
complete_workflow_demo(demo_mode=True)

🚀 COMPLETE WORKFLOW DEMONSTRATION
📝 DEMO MODE - Showing workflow steps without execution
   Repository: your-username/demo-hate-speech-model
   (Replace 'your-username' with your actual username)

1. 🔐 Authenticate with Hugging Face Hub
   → login(token=os.getenv('HF_TOKEN'))
   → Verified authentication
   ✅ Step completed

2. 📂 Create new model repository
   → create_repo(repo_id='your-username/demo-hate-speech-model')
   → Repository created successfully
   ✅ Step completed

3. 🤖 Load and prepare model for upload
   → Loading cardiffnlp/twitter-roberta-base-hate-latest
   → Model and tokenizer loaded
   ✅ Step completed

4. 💾 Save model files locally
   → model.save_pretrained('./temp_model')
   → tokenizer.save_pretrained('./temp_model')
   ✅ Step completed

5. 📝 Generate comprehensive model card
   → Creating model card with performance metrics
   → Including usage examples and bias documentation
   ✅ Step completed

6. 📤 Upload model and documentation
   → upload_folder('./temp_m
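The demo above only prints the calls. Below is a minimal sketch of the same eight steps using real `huggingface_hub` and `transformers` calls, assuming recent versions of both libraries and an `HF_TOKEN` environment variable holding a token with write access. `run_workflow`, its parameters, and the one-line placeholder model card are illustrative choices, not part of the original notebook. The library imports sit inside the function, after the token check, so defining the function has no side effects.

```python
import os
from pathlib import Path


def run_workflow(repo_id: str, source_model: str, local_dir: str = "./temp_model",
                 tag: str = "v1.0.0", private: bool = True) -> None:
    """Hypothetical real-API version of complete_workflow_demo(); this one DOES upload."""
    token = os.getenv("HF_TOKEN")
    if not token:
        raise RuntimeError("Set HF_TOKEN to a token with write access first")

    # Heavy imports kept local so that defining the function is cheap
    from huggingface_hub import HfApi
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # 1. Authenticate: whoami() fails fast if the token is invalid
    api = HfApi(token=token)
    print("Authenticated as:", api.whoami()["name"])

    # 2. Create the repository (exist_ok=True makes re-runs safe)
    api.create_repo(repo_id=repo_id, private=private, exist_ok=True)

    # 3-4. Load the source model and tokenizer, then save them locally
    tokenizer = AutoTokenizer.from_pretrained(source_model)
    model = AutoModelForSequenceClassification.from_pretrained(source_model)
    model.save_pretrained(local_dir)
    tokenizer.save_pretrained(local_dir)

    # 5. Placeholder model card with YAML metadata; extend it with metrics and bias notes
    card = (
        "---\n"
        f"base_model: {source_model}\n"
        "pipeline_tag: text-classification\n"
        "---\n\n"
        f"# {repo_id.split('/')[-1]}\n\nBased on `{source_model}`.\n"
    )
    Path(local_dir, "README.md").write_text(card, encoding="utf-8")

    # 6. Upload weights, tokenizer files and README in a single commit
    api.upload_folder(repo_id=repo_id, folder_path=local_dir,
                      commit_message="Upload model, tokenizer and model card")

    # 7. Verify by reloading from the Hub (token needed for private repos)
    AutoModelForSequenceClassification.from_pretrained(repo_id, token=token)
    AutoTokenizer.from_pretrained(repo_id, token=token)

    # 8. Tag the uploaded revision
    api.create_tag(repo_id, tag=tag)
    print(f"Done: https://huggingface.co/{repo_id}")


# Uncomment to run for real (creates a repo and uploads files):
# run_workflow("your-username/demo-hate-speech-model", "cardiffnlp/twitter-roberta-base-hate-latest")
```

Note that `upload_folder`, `create_repo`, and `create_tag` take keyword arguments such as `repo_id=` and `folder_path=`; the shorthand strings printed by the demo are only labels.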

In [14]:
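# Utility function to check repository status
from huggingface_hub import model_info
from transformers import AutoModelForSequenceClassification, AutoTokenizer

def check_repository_status(repo_id: str):
    """
    Print a repository's Hub metadata and check whether it loads as a
    sequence-classification model (this downloads the full weights).

    Args:
        repo_id: Repository identifier to check

    Returns:
        The ModelInfo object, or None if the repository could not be queried.
    """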
    try:
        print(f"🔍 Checking repository status: {repo_id}")

        # Get repository information
        info = model_info(repo_id)

        print("✅ Repository Status:")
        print(f"   📊 Downloads: {info.downloads:,}")
        print(f"   👍 Likes: {info.likes}")
        print(f"   📅 Last Updated: {info.last_modified}")
        print(f"   🏷️  Pipeline Tag: {info.pipeline_tag or 'Not specified'}")

        # Check if model is loadable
        try:
            tokenizer = AutoTokenizer.from_pretrained(repo_id)
            model = AutoModelForSequenceClassification.from_pretrained(repo_id)
            print("   ✅ Model is loadable")
            print(f"   🔢 Parameters: {model.num_parameters():,}")
        except Exception as e:
            print(f"   ⚠️  Model loading issue: {e}")

        return info

    except Exception as e:
        print(f"❌ Error checking repository: {e}")
        return None

# Check the reference model status
print("🔍 Checking reference model status:")
reference_status = check_repository_status(REFERENCE_MODEL)

print("\n🔍 Checking preferred model status:")
preferred_status = check_repository_status(PREFERRED_MODEL)

🔍 Checking reference model status:
🔍 Checking repository status: vuhung/hf-basic-4
✅ Repository Status:
   📊 Downloads: 0
   👍 Likes: 0
   📅 Last Updated: 2025-09-28 23:58:05+00:00
   🏷️  Pipeline Tag: Not specified
   ⚠️  Model loading issue: Unrecognized model in vuhung/hf-basic-4. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: aimv2, aimv2_vision_model, albert, align, altclip, apertus, arcee, aria, aria_text, audio-spectrogram-transformer, autoformer, aya_vision, bamba, bark, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, bitnet, blenderbot, blenderbot-small, blip, blip-2, blip_2_qformer, bloom, bridgetower, bros, camembert, canine, chameleon, chinese_clip, chinese_clip_vision_model, clap, clip, clip_text_model, clip_vision_model, clipseg, clvp, code_llama, codegen, cohere, cohere2, cohere2_vision, colpali, colqwen2, conditional_detr, convbert, convnext, convnextv2, cpmant, csm, ctrl, cvt, d_fine, d
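Downloading the full model just to check that it loads is slow, and, as the output above shows, the failure can be as simple as a `config.json` without a `model_type` key. A lighter check is to list the repository files with `huggingface_hub.list_repo_files` and fetch only `config.json` with `hf_hub_download`. The sketch below separates the pure rule-checking (`find_repo_problems`) from the network call (`inspect_hub_repo`). Both function names and the lists of "expected" files are assumptions for a Transformers text model, not an official checklist.

```python
import json
from typing import Optional

# Assumed minimum for a Transformers checkpoint: config.json plus one weights file
WEIGHT_FILES = (
    "model.safetensors",
    "pytorch_model.bin",
    "model.safetensors.index.json",  # sharded safetensors checkpoint
    "pytorch_model.bin.index.json",  # sharded PyTorch checkpoint
)
TOKENIZER_FILES = ("tokenizer.json", "tokenizer_config.json")


def find_repo_problems(files: list[str], config: Optional[dict]) -> list[str]:
    """Return human-readable problems for a repo file listing and its parsed config.json."""
    problems = []
    if "config.json" not in files:
        problems.append("missing config.json")
    elif config is not None and "model_type" not in config:
        problems.append("config.json has no 'model_type' key")
    if not any(name in files for name in WEIGHT_FILES):
        problems.append("no weights file (model.safetensors or pytorch_model.bin)")
    if not any(name in files for name in TOKENIZER_FILES):
        problems.append("no tokenizer files")
    return problems


def inspect_hub_repo(repo_id: str, token: Optional[str] = None) -> list[str]:
    """Check a Hub repo without downloading its weights (makes network calls)."""
    from huggingface_hub import hf_hub_download, list_repo_files

    files = list_repo_files(repo_id, token=token)
    config = None
    if "config.json" in files:
        # Only the small config file is downloaded
        with open(hf_hub_download(repo_id, "config.json", token=token), encoding="utf-8") as f:
            config = json.load(f)
    return find_repo_problems(files, config)
```

In this notebook you could call `inspect_hub_repo(REFERENCE_MODEL)` before attempting a full `from_pretrained` load.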

## Part 9: Troubleshooting Common Issues

Learn how to resolve common problems when managing Hugging Face repositories.

In [15]:
def troubleshooting_guide():
    """
    Display common issues and their solutions.
    """

    issues = {
        "Authentication Issues": {
            "Problem": "Token not found or invalid",
            "Solutions": [
                "Check if HF_TOKEN is set in environment variables",
                "Verify token has write permissions",
                "Generate new token at: https://huggingface.co/settings/tokens",
                "Use 'Write' access level for repository management"
            ]
        },
        "Large File Upload Issues": {
            "Problem": "Files larger than 5GB failing to upload",
            "Solutions": [
                "Install and configure Git LFS: git lfs install",
                "Track large files: git lfs track '*.bin'",
                "Use huggingface-cli upload for large files",
                "Consider model quantization to reduce size"
            ]
        },
        "Repository Access Issues": {
            "Problem": "Cannot access private repository",
            "Solutions": [
                "Ensure you have access to the repository",
                "Check if repository exists and name is correct",
                "Verify authentication token has appropriate permissions",
                "Contact repository owner for access"
            ]
        },
        "Model Loading Issues": {
            "Problem": "Model fails to load from Hub",
            "Solutions": [
                "Check if all required files are uploaded (config.json with a `model_type` key, weights, tokenizer files)",
                "Verify model architecture matches expected format",
                "Test loading with specific revision/branch",
                "Check for corrupted files and re-upload if necessary"
            ]
        },
        "Git LFS Issues": {
            "Problem": "Git LFS not working properly",
            "Solutions": [
                "Install Git LFS: brew install git-lfs (macOS) or apt install git-lfs (Ubuntu)",
                "Initialize in repository: git lfs install",
                "Track large files: git lfs track '*.bin' '*.safetensors'",
                "Commit .gitattributes file: git add .gitattributes && git commit"
            ]
        }
    }

    print("🔧 TROUBLESHOOTING GUIDE")
    print("=" * 50)

    for issue, details in issues.items():
        print(f"\n❌ {issue}")
        print(f"Problem: {details['Problem']}")
        print("Solutions:")
        for i, solution in enumerate(details['Solutions'], 1):
            print(f"   {i}. {solution}")

def diagnostic_commands():
    """
    Show diagnostic commands for troubleshooting.
    """

    commands = {
        "Check Authentication": [
            "huggingface-cli whoami",
            "python -c \"from huggingface_hub import whoami; print(whoami())\""
        ],
        "Check Repository": [
            "python -c \"from huggingface_hub import list_repo_files; print(list_repo_files('your-username/your-model'))\""
        ],
        "Check Git LFS": [
            "git lfs version",
            "git lfs env",
            "git lfs ls-files"
        ],
        "Test Model Loading": [
            "python -c \"from transformers import AutoModel; AutoModel.from_pretrained('your-username/your-model')\""
        ]
    }

    print("\n🔬 DIAGNOSTIC COMMANDS")
    print("=" * 30)

    for category, cmds in commands.items():
        print(f"\n📋 {category}:")
        for cmd in cmds:
            print(f"   $ {cmd}")

# Display troubleshooting information
troubleshooting_guide()
diagnostic_commands()

🔧 TROUBLESHOOTING GUIDE

❌ Authentication Issues
Problem: Token not found or invalid
Solutions:
   1. Check if HF_TOKEN is set in environment variables
   2. Verify token has write permissions
   3. Generate new token at: https://huggingface.co/settings/tokens
   4. Use 'Write' access level for repository management

❌ Large File Upload Issues
Problem: Files larger than 5GB failing to upload
Solutions:
   1. Install and configure Git LFS: git lfs install
   2. Track large files: git lfs track '*.bin'
   3. Use huggingface-cli upload for large files
   4. Consider model quantization to reduce size

❌ Repository Access Issues
Problem: Cannot access private repository
Solutions:
   1. Ensure you have access to the repository
   2. Check if repository exists and name is correct
   3. Verify authentication token has appropriate permissions
   4. Contact repository owner for access

❌ Model Loading Issues
Problem: Model fails to load from Hub
Solutions:
   1. Check if all required files are 
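Which large-file fix applies depends on file size and on how you push. The thresholds below are assumptions based on the Hub's Git documentation at the time of writing (verify them against the current docs): a `git push` rejects files above roughly 10 MB that are not tracked by LFS, and files above 5 GB additionally need `huggingface-cli lfs-enable-largefiles .` in the local clone. Uploads through `huggingface-cli upload` or `HfApi.upload_folder` go over HTTP and handle large files without these git steps. The hypothetical helper below scans a local folder and reports which files cross each threshold.

```python
from pathlib import Path

# Assumed thresholds (check the current Hub docs before relying on them)
LFS_THRESHOLD = 10 * 1024**2        # non-LFS files above this are rejected on git push
LARGEFILES_THRESHOLD = 5 * 1024**3  # above this, run `huggingface-cli lfs-enable-largefiles .`


def classify_files(folder: str) -> dict[str, list[str]]:
    """Group files under `folder` by the large-file handling they need for a git push."""
    report = {"needs_lfs": [], "needs_largefiles": []}
    for path in sorted(Path(folder).rglob("*")):
        rel = path.relative_to(folder)
        # Skip directories and anything inside the local .git metadata folder
        if not path.is_file() or ".git" in rel.parts:
            continue
        size = path.stat().st_size
        if size > LARGEFILES_THRESHOLD:
            report["needs_largefiles"].append(rel.as_posix())
        if size > LFS_THRESHOLD:
            report["needs_lfs"].append(rel.as_posix())
    return report
```

For example, `classify_files("./temp_model")` after saving a model lists the weight files that must be LFS-tracked before a git-based push.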

---

## 📋 Summary

### 🔑 Key Concepts Mastered
- **Authentication**: Secure credential management using tokens and environment variables
- **Repository Management**: Creating, uploading, and managing model repositories programmatically
- **File Operations**: Multiple methods for uploading models, tokenizers, and documentation
- **Model Cards**: Creating comprehensive documentation following HF standards
- **Git Integration**: Advanced repository management using Git workflow
- **Best Practices**: Production-ready patterns for model sharing and collaboration

### 📈 Best Practices Learned
- Use environment variables for secure token management
- Create comprehensive model cards with bias documentation
- Tag releases with semantic version numbers (e.g. `v1.0.0`)
- Use Git LFS for large model files (the Hub rejects non-LFS files above 10 MB on `git push`)
- Test model loading after upload to verify integrity
- Follow Hugging Face community guidelines for model sharing
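To make the semantic-versioning advice concrete: the workflow demo tags its first release `v1.0.0`, and later releases can be tagged with `HfApi.create_tag`. The hypothetical `bump_version` helper below computes the next `vMAJOR.MINOR.PATCH` tag. The `v` prefix and the three bump levels are conventions assumed here, not something the Hub enforces.

```python
import re

_SEMVER_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def bump_version(tag: str, part: str = "patch") -> str:
    """Return the next 'vMAJOR.MINOR.PATCH' tag (hypothetical helper)."""
    match = _SEMVER_TAG.match(tag)
    if match is None:
        raise ValueError(f"not a vMAJOR.MINOR.PATCH tag: {tag!r}")
    major, minor, patch = map(int, match.groups())
    if part == "major":   # breaking change, e.g. new label set
        return f"v{major + 1}.0.0"
    if part == "minor":   # backward-compatible improvement, e.g. retrained weights
        return f"v{major}.{minor + 1}.0"
    if part == "patch":   # fixes such as model card or config corrections
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


# Tagging on the Hub (network call, commented out):
# from huggingface_hub import HfApi
# HfApi().create_tag("your-username/your-model", tag=bump_version("v1.0.0", "minor"))
```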

### 🚀 Next Steps
- **Practice**: Create your own model repository using the patterns learned
- **Advanced Topics**: Explore model versioning and A/B testing strategies
- **Community**: Engage with Hugging Face community and contribute models
- **Documentation**: Study other high-quality model cards for inspiration
- **Automation**: Set up CI/CD pipelines for automated model deployment

### 🔗 Useful Resources
- **Hugging Face Hub Documentation**: [huggingface.co/docs/hub](https://huggingface.co/docs/hub)
- **Model Card Guidelines**: [huggingface.co/docs/hub/model-cards](https://huggingface.co/docs/hub/model-cards)
- **Git LFS Documentation**: [git-lfs.github.io](https://git-lfs.github.io/)
- **Hugging Face CLI Reference**: [huggingface.co/docs/huggingface_hub/guides/cli](https://huggingface.co/docs/huggingface_hub/guides/cli)

---

## About the Author

**Vu Hung Nguyen** - AI Engineer & Researcher

Connect with me:
- 🌐 **Website**: [vuhung16au.github.io](https://vuhung16au.github.io/)
- 💼 **LinkedIn**: [linkedin.com/in/nguyenvuhung](https://www.linkedin.com/in/nguyenvuhung/)
- 💻 **GitHub**: [github.com/vuhung16au](https://github.com/vuhung16au/)

*This notebook is part of the [HF Transformer Trove](https://github.com/vuhung16au/hf-transformer-trove) educational series.*