# Download and Load MABe Mouse Behavior Detection Dataset

This notebook downloads the [MABe Mouse Behavior Detection](https://www.kaggle.com/competitions/MABe-mouse-behavior-detection) competition dataset using the Python `kaggle` package, lists the downloaded files, and loads one CSV file with pandas as a check.

## Prerequisites
1. **Kaggle Account**: Join the competition on Kaggle and accept its rules; downloads through the API fail otherwise.
2. **API Token**: You need a `kaggle.json` file (API Token) from your account settings.
   - Place it in `~/.kaggle/kaggle.json` (Linux/Mac) or `%USERPROFILE%\.kaggle\kaggle.json` (Windows).
   - Or set `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables.
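
Before running the authentication cell, it can help to check which of the two credential options above is actually in place. The following is a minimal sketch using only the standard library; `find_kaggle_credentials` is a hypothetical helper written for this notebook, not part of the `kaggle` package, and it only checks that credentials exist, not that they are valid.

```python
import os
from pathlib import Path


def find_kaggle_credentials(env=None, home=None):
    """Return 'env', 'file', or None depending on where credentials are found.

    Hypothetical helper for illustration only: it checks for the presence
    of credentials and does not contact Kaggle to validate them.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # Environment-variable option: both variables must be set and non-empty.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return "env"

    # Token-file option: kaggle.json in the .kaggle folder of the home directory.
    if (home / ".kaggle" / "kaggle.json").is_file():
        return "file"

    return None


source = find_kaggle_credentials()
if source:
    print(f"Kaggle credentials found via: {source}")
else:
    print("No Kaggle credentials found.")
```

`Path.home()` resolves to the user profile directory on Windows, so the same check is intended to cover both locations listed above.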

## Manual Dataset Placement

If you have already downloaded the dataset manually, please create a directory named `data` in the same location as this notebook.
Then, place the **unzipped contents** of the competition dataset directly into this `data` directory.

For example, if you downloaded `MABe-mouse-behavior-detection.zip`, after unzipping it, you would copy the folders and files (e.g., `train`, `test`, `sample_submission.csv`) directly into `./data/`.

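If you proceed this way, you can skip the Kaggle API authentication and download steps in this notebook. The later cells still rely on `import os` and `DATA_DIR = './data'`, which are defined in those cells, so run those two lines before inspecting or loading the files.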
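
The manual placement described above can also be scripted. This is a minimal sketch, assuming the archive was saved next to the notebook under the name used in the example; `extract_dataset` and `ARCHIVE` are hypothetical names chosen for illustration.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, data_dir="./data"):
    """Extract a manually downloaded archive into data_dir.

    Returns the sorted names of the top-level entries in data_dir so the
    result can be compared with the expected layout.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(data_dir)
    return sorted(p.name for p in data_dir.iterdir())


# Assumed archive name, taken from the example above; adjust to your download.
ARCHIVE = Path("MABe-mouse-behavior-detection.zip")
if ARCHIVE.exists():
    print(extract_dataset(ARCHIVE))
else:
    print(f"{ARCHIVE} not found; nothing extracted.")
```

The returned list should show the dataset's folders and files directly under `./data/`, not nested inside an extra folder.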

In [1]:
# Install necessary libraries
!pip install kaggle pandas

Collecting kaggle
  Downloading kaggle-1.8.2-py3-none-any.whl.metadata (16 kB)
Collecting pandas
  Downloading pandas-2.3.3-cp314-cp314-macosx_11_0_arm64.whl.metadata (91 kB)
Collecting black>=24.10.0 (from kaggle)
  Downloading black-25.11.0-cp314-cp314-macosx_11_0_arm64.whl.metadata (85 kB)
Collecting bleach (from kaggle)
  Downloading bleach-6.3.0-py3-none-any.whl.metadata (31 kB)
Collecting kagglesdk (from kaggle)
  Downloading kagglesdk-0.1.13-py3-none-any.whl.metadata (13 kB)
Collecting mypy>=1.15.0 (from kaggle)
  Downloading mypy-1.19.0-cp314-cp314-macosx_11_0_arm64.whl.metadata (2.2 kB)
Collecting protobuf (from kaggle)
  Downloading protobuf-6.33.1-cp39-abi3-macosx_10_9_universal2.whl.metadata (593 bytes)
Collecting python-slugify (from kaggle)
  Downloading python_slugify-8.0.4-py2.py3-none-any.whl.metadata (8.5 kB)
Collecting requests (from kaggle)
  Using cached requests-2.32.5-py3-none-any.whl.metadata (4.9 kB)
Collecting setuptools>=21.0.0 (from kaggle)
  Using cached se

In [2]:
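import os

# Initialize Kaggle API.
# The import sits inside the try block because importing the kaggle package
# can itself attempt authentication and abort when no credentials are found
# (in a notebook this can surface as "NameError: name 'exit' is not defined").
api = None
try:
    from kaggle.api.kaggle_api_extended import KaggleApi
    api = KaggleApi()
    api.authenticate()
    print("Kaggle API authenticated successfully.")
except (Exception, SystemExit) as e:
    api = None
    print(f"Authentication failed: {e}")
    print("Please ensure 'kaggle.json' is in the correct location or environment variables are set.")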

Could not find kaggle.json. Make sure it's located in /Users/zihanghuang/.kaggle. Or use the environment method. See setup instructions at https://github.com/Kaggle/kaggle-api/


NameError: name 'exit' is not defined

In [None]:
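# Configuration
import glob
import zipfile

COMPETITION_NAME = 'MABe-mouse-behavior-detection'
DATA_DIR = './data'

# Create the data directory if it doesn't exist
os.makedirs(DATA_DIR, exist_ok=True)

print(f"Downloading files for competition '{COMPETITION_NAME}' to '{DATA_DIR}'...")

try:
    # Download the competition archive. To our knowledge,
    # competition_download_files() has no unzip option (unlike
    # dataset_download_files()), so the archive is extracted manually.
    api.competition_download_files(COMPETITION_NAME, path=DATA_DIR)
    for archive_path in glob.glob(os.path.join(DATA_DIR, '*.zip')):
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(DATA_DIR)
    print("Download and extraction complete.")
except Exception as e:
    print(f"An error occurred during download: {e}")
    print("Ensure you have accepted the competition rules on the Kaggle website.")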

In [None]:
# Inspect the downloaded files
print("Downloaded files:")
for root, dirs, files in os.walk(DATA_DIR):
    for file in files:
        print(os.path.join(root, file))
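
When the data directory contains many files, a per-extension summary can be easier to read than the full listing above. The following is a minimal sketch; `summarize_by_extension` is a hypothetical helper, and the `./data` path is assumed to match `DATA_DIR`.

```python
import os
from collections import Counter


def summarize_by_extension(data_dir):
    """Count files and total bytes per file extension under data_dir."""
    counts, sizes = Counter(), Counter()
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            # Normalize case so '.CSV' and '.csv' are grouped together.
            ext = os.path.splitext(name)[1].lower() or "(none)"
            counts[ext] += 1
            sizes[ext] += os.path.getsize(os.path.join(root, name))
    return {ext: (counts[ext], sizes[ext]) for ext in sorted(counts)}


# Prints nothing if the directory is missing or empty.
for ext, (n_files, n_bytes) in summarize_by_extension("./data").items():
    print(f"{ext}: {n_files} file(s), {n_bytes / 1e6:.1f} MB")
```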

In [None]:
# Example: Load a CSV file (adjust the filename based on the output above)
import pandas as pd

# Common files in competitions: 'train.csv', 'test.csv', 'sample_submission.csv'
# Update this variable with a file found in the previous step
TARGET_FILE = os.path.join(DATA_DIR, 'sample_submission.csv')

if os.path.exists(TARGET_FILE):
    print(f"\nLoading {TARGET_FILE}...")
    try:
        df = pd.read_csv(TARGET_FILE)
        display(df.head())
        print(f"Loaded dataframe with shape: {df.shape}")
    except Exception as e:
        print(f"Could not load file: {e}")
else:
    print(f"\nFile '{TARGET_FILE}' not found. Please check the file list above and update 'TARGET_FILE'.")