# Tips and Tricks for Using FiftyOne in Google Colab

Working with FiftyOne in Google Colab presents unique challenges: temporary instances that lose your data, installation overhead on every restart, and the need to manage datasets across sessions. This notebook shares practical techniques to make your FiftyOne workflow in Colab faster, more reliable, and reproducible.

We'll cover the following concepts:

- Installing FiftyOne efficiently using modern tools
- Persisting datasets, models, and configurations across sessions with Google Drive
- Sharing datasets with collaborators
- Leveraging GPU acceleration for faster inference
- Managing the FiftyOne App for a better development experience

**So, what's the takeaway?**

These patterns apply beyond FiftyOne. The techniques for managing data persistence, sharing resources, and optimizing compute translate to other ML workflows in Colab.

## Setup

To follow this tutorial, you'll need a Google account with sufficient space on Google Drive (approximately 1 GB).

## Tip 1: Fast and clean FiftyOne installation

Using `uv` for package installation and `%%capture` to suppress output creates clean and efficient Colab notebooks.

In Google Colab, the virtual machine instance is temporary. When you close your browser tab or the notebook becomes idle for too long, the instance is recycled and all installed packages are lost. Having a quick way to re-install libraries like FiftyOne is valuable.

### Why use uv?

[uv](https://github.com/astral-sh/uv) is a modern, fast Python package installer and resolver designed to be significantly quicker than traditional tools like `pip` or `conda`. This is useful in Colab notebooks since they need to be re-configured each time you start a new session.

### Suppressing installation output

The `%%capture` magic command suppresses the standard output and standard error streams of a cell. When installing packages, the output can be verbose. Using `%%capture` keeps your notebook clean and focused on results.

### Specifying package versions

It's good practice to pin explicit library versions. This is crucial for reproducibility: different versions can introduce API changes or pull in different dependencies. Pinning the version helps ensure that anyone running your notebook in the future installs the same FiftyOne release.

In [None]:
%%capture
# %%capture must be the first line of the cell; it hides the verbose install log
!uv pip install fiftyone==1.8.0

Import FiftyOne:

In [None]:
import fiftyone as fo

Verify the installed version:

In [None]:
print(f"FiftyOne version installed: {fo.__version__}")
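If a notebook pins several packages, a small check can confirm that the running environment actually matches the pins. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper name `check_pins` is hypothetical and not part of FiftyOne.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pins(pins):
    """Return {package: (pinned, installed)} for every package that deviates.

    `pins` maps a distribution name to the version string you expect.
    A package that is not installed is reported with installed=None.
    """
    mismatches = {}
    for package, pinned in pins.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != pinned:
            mismatches[package] = (pinned, installed)
    return mismatches


# Example: the pin used in the install cell of this tip
print(check_pins({"fiftyone": "1.8.0"}) or "All pins satisfied")
```

An empty result means every pinned package is installed at the expected version.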

## Tip 2: Persistent storage with Google Drive

Colab instances are recycled after a period of inactivity (usually around 90 minutes) or when you close your browser tab. Instances also have a maximum lifetime (currently 12 hours), after which they are recycled. Any data not saved to persistent storage will be lost.

Using Google Drive as persistent storage ensures your datasets, models, and work are preserved across sessions. Note that if you share your notebook with collaborators, they will only be able to access data in your Drive that you've granted them access to.

In [None]:
# You will be asked to authorize access to Drive after running this.
# Note that this connects to your own Google Drive account.
# It doesn't connect to folders from others unless they have been shared with you.
from google.colab import drive
drive.mount('/gdrive')
%cd /gdrive
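Before writing anything to Drive, it can help to confirm that the mount actually succeeded, since a failed authorization would otherwise cause data to be written to the temporary instance disk. A minimal sketch, assuming the `/gdrive` mount point used above; the helper name `drive_root` is hypothetical.

```python
from pathlib import Path


def drive_root(mount_point="/gdrive"):
    """Return the MyDrive folder under `mount_point`, or raise if it is missing."""
    root = Path(mount_point) / "MyDrive"
    if not root.is_dir():
        raise RuntimeError(
            f"Google Drive does not appear to be mounted at {mount_point}; "
            "run drive.mount() first."
        )
    return root


# Usage inside Colab, after drive.mount('/gdrive'):
# save_path = drive_root() / "fiftyone_dataset_curation"
```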

Create a directory for your FiftyOne data:

In [None]:
import os
from pathlib import Path
# Notice that this is a Google Drive path
save_path = Path('/gdrive/MyDrive/fiftyone_dataset_curation')
os.makedirs(save_path, exist_ok=True)

Configure the MongoDB database location for FiftyOne:

In [None]:
# path to the MongoDB database
database_path = save_path / "mongodb"
os.makedirs(database_path, exist_ok=True)
fo.config.database_dir = str(database_path)

Load a sample dataset from the FiftyOne Dataset Zoo:

In [None]:
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="validation",
    max_samples=200,
)

Export the dataset to Google Drive:

In [None]:
# Export the dataset to a folder
export_dir = save_path / "coco-2017"
dataset.export(
    export_dir=str(export_dir),
    dataset_type=fo.types.FiftyOneDataset,
    # set to True to export the original images alongside annotations
    export_media=True,
)

Reload the dataset from Drive in a future session:

In [None]:
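# In a new session, re-mount Drive and redefine save_path and export_dir before running this cell
reloaded_dataset = fo.Dataset.from_dir(
    dataset_dir=str(export_dir),
    dataset_type=fo.types.FiftyOneDataset,
)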

## Tip 3: Downloading shared folders with gdown

When a dataset is shared as a Google Drive folder, [gdown](https://github.com/wkentaro/gdown) provides a convenient way to download files directly from public Google Drive links. This automates the download process and can be included directly in your notebook for reproducibility.

In [None]:
%%capture
# %%capture must be the first line of the cell; it hides the verbose install log
!uv pip install gdown==5.2.0

Define the URL of the shared folder and specify a local path. We'll download the same COCO-2017 sample dataset exported earlier.

Public access link: https://drive.google.com/drive/folders/1G6JKGm0sy5d5ViEpDktXMxIqq5cl2Nrc?usp=drive_link

In [None]:
# Reuse the Drive folder from Tip 2 (redefined here so this cell runs on its own)
save_path = Path('/gdrive/MyDrive/fiftyone_dataset_curation')
download_output_path = save_path / "gdown_downloads"
os.makedirs(download_output_path, exist_ok=True)

# The public Google Drive folder URL for the COCO-2017 samples
folder_url = "https://drive.google.com/drive/folders/1G6JKGm0sy5d5ViEpDktXMxIqq5cl2Nrc?usp=drive_link"

# Use gdown to download the entire folder.
# The --folder flag specifies that we are downloading a folder.
# The -O flag sets the output directory.
!gdown --folder "{folder_url}" -O "{download_output_path}"

You may encounter a limit when downloading folders with many files:

```bash
The gdrive folder with url: https://drive.google.com/drive/folders/1fe
	UZBqLJm_2OoxLKTxV53SGPZvUYPa49?hl=en has more than 50 files, gdrive
	can't download more than this limit.
```

To bypass this limit, zip the folder first and share the zip file:

In [None]:
import os
import zipfile

# Define the path to the folder you want to zip
folder_to_zip = '/gdrive/MyDrive/fiftyone_dataset_curation/coco-2017'

# Define the name and path for the output zip file
output_zip_file = '/gdrive/MyDrive/fiftyone_dataset_curation/coco-2017.zip'

# Create a ZipFile object in write mode
with zipfile.ZipFile(output_zip_file, 'w', zipfile.ZIP_DEFLATED) as zipf:
    # Walk through all the files and subdirectories in the folder
    for root, dirs, files in os.walk(folder_to_zip):
        for file in files:
            # Create the full path to the file
            file_path = os.path.join(root, file)
            # Add the file to the zip archive, preserving the directory structure
            zipf.write(file_path, os.path.relpath(file_path, folder_to_zip))

print(f"Folder '{folder_to_zip}' successfully zipped to '{output_zip_file}'")
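The same archive can be produced more compactly with the standard library's `shutil.make_archive`. This is a sketch of an alternative to the loop above, not a replacement for it; the helper name `zip_folder` is hypothetical, and it assumes the zip file is written outside the folder being zipped.

```python
import shutil
from pathlib import Path


def zip_folder(folder, zip_path):
    """Zip the contents of `folder` into `zip_path` and return the archive path.

    File names inside the archive are relative to `folder`, matching the
    os.walk-based loop.
    """
    zip_path = Path(zip_path)
    # make_archive appends ".zip" itself, so pass the path without the suffix
    archive = shutil.make_archive(
        str(zip_path.with_suffix("")), "zip", root_dir=str(folder)
    )
    return Path(archive)


# Usage:
# zip_folder('/gdrive/MyDrive/fiftyone_dataset_curation/coco-2017',
#            '/gdrive/MyDrive/fiftyone_dataset_curation/coco-2017.zip')
```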

Create a new shared link for the zip file:

![create_sharing_link](https://cdn.voxel51.com/getting_started_colab_tips/notebook2/create_sharing_link.webp)

Click "Manage access" and, under "General access", choose "Anyone with the link" with the "Viewer" role to share the file publicly:

![manage_access](https://cdn.voxel51.com/getting_started_colab_tips/notebook2/manage_access.webp)

Download the zip file with gdown:

In [None]:
# The public Google Drive file URL for the zipped COCO-2017 samples
file_url = "https://drive.google.com/file/d/1UyL0clgoIPSYDWHlPjLP0IPZoFRQQJDE/view?usp=drive_link"

# --fuzzy lets gdown extract the file ID from a ".../file/d/<id>/view" share link.
# The trailing slash on -O tells gdown to save the file inside that directory.
!gdown --fuzzy "{file_url}" -O "{download_output_path}/"
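Share links of the form `.../file/d/<id>/view` embed the file ID in the path, while gdown's documentation uses the `uc?id=<id>` form for direct downloads. A minimal sketch of converting one into the other with a regular expression; the helper names are hypothetical, and the patterns only cover the two URL shapes shown in the comments.

```python
import re

_FILE_ID_PATTERNS = (
    re.compile(r"/file/d/([\w-]+)"),  # .../file/d/<id>/view
    re.compile(r"[?&]id=([\w-]+)"),   # ...?id=<id>
)


def drive_file_id(url):
    """Return the Google Drive file ID found in `url`, or None if there is none."""
    for pattern in _FILE_ID_PATTERNS:
        match = pattern.search(url)
        if match:
            return match.group(1)
    return None


def direct_download_url(url):
    """Build the uc?id=<id> form of a Drive share link."""
    file_id = drive_file_id(url)
    if file_id is None:
        raise ValueError(f"No Google Drive file ID found in: {url}")
    return f"https://drive.google.com/uc?id={file_id}"


share_link = (
    "https://drive.google.com/file/d/"
    "1UyL0clgoIPSYDWHlPjLP0IPZoFRQQJDE/view?usp=drive_link"
)
print(direct_download_url(share_link))
```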

Here's the complete workflow in Python:

In [None]:
import gdown
import os
import zipfile
from pathlib import Path
import fiftyone as fo

# --- 1. Setup Paths ---
# Redefine save_path so this cell runs on its own
save_path = Path('/gdrive/MyDrive/fiftyone_dataset_curation')
download_dir = save_path / "gdown_downloads"
os.makedirs(download_dir, exist_ok=True)

# Define the full path for the downloaded zip file
zip_file_path = download_dir / "coco_samples_dataset.zip"

# Define the directory where the contents will be unzipped
unzip_dir = download_dir / "coco_samples_unzipped"
os.makedirs(unzip_dir, exist_ok=True)


# --- 2. Download the ZIP file from Google Drive ---
# The public Google Drive file URL for the zipped COCO-2017 samples
file_url = "https://drive.google.com/file/d/1UyL0clgoIPSYDWHlPjLP0IPZoFRQQJDE/view?usp=drive_link"

print(f"Downloading dataset from Google Drive to '{zip_file_path}'...")
# Use gdown.download to fetch the file; fuzzy=True extracts the file ID from the share link
gdown.download(url=file_url, output=str(zip_file_path), quiet=False, fuzzy=True)
print("Download complete.")


# --- 3. Unzip the File using the `zipfile` library ---
print(f"Unzipping '{zip_file_path}' to '{unzip_dir}'...")

# Use a 'with' statement to safely open and extract the zip archive
try:
    with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
        zip_ref.extractall(unzip_dir)
    print("Unzipping successful.")
except zipfile.BadZipFile:
    print("Error: The downloaded file is not a valid zip file or is corrupted.")
except Exception as e:
    print(f"An error occurred during unzipping: {e}")


# --- 4. Load the dataset into FiftyOne ---
# The directory containing the unzipped FiftyOne dataset
# Note: The actual dataset might be in a subdirectory within `unzip_dir`.
# For this specific zip file, the data is in the root of the unzipped folder.
dataset_dir = unzip_dir

print(f"Loading FiftyOne dataset from '{dataset_dir}'...")

# Check if the directory exists and appears to be a FiftyOne dataset
if (Path(dataset_dir) / "samples.json").exists():
    # Use Dataset.from_dir to load the dataset from its source directory
    dataset = fo.Dataset.from_dir(
        dataset_dir=str(dataset_dir),
        dataset_type=fo.types.FiftyOneDataset,
        name="coco-samples-from-drive" # Give the dataset a unique name
    )

    print("\nDataset loaded successfully!")
    print(dataset)

    # You can now work with the dataset, for example, launch the App
    # session = fo.launch_app(dataset)
    # print(session.url)
else:
    print(f"Error: Could not find a valid FiftyOne dataset at '{dataset_dir}'.")
    print("Please check the contents of the unzipped folder.")

You may encounter rate limiting:

```bash
Cannot retrieve the folder information from the link. You may need to
	change the permission to 'Anyone with the link', or have had many
	accesses.
```

When this happens, consider using Drive shortcuts instead (see Tip 4).
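If the failure is temporary, retrying after a pause is sometimes enough. The following is a generic sketch of retrying with exponential backoff; the helper name `retry` and its defaults are hypothetical, and whether a retry helps depends on why Drive refused the request. It treats both an exception and a `None` result as a failed attempt.

```python
import time


def retry(fn, attempts=3, base_delay=5.0, sleep=time.sleep):
    """Call `fn()` until it returns a non-None value or `attempts` run out.

    The wait doubles after every failed attempt: base_delay, 2*base_delay, ...
    Returns the first non-None result, or None if every attempt failed.
    """
    for attempt in range(attempts):
        try:
            result = fn()
        except Exception as exc:  # broad on purpose: this is only a sketch
            print(f"Attempt {attempt + 1} failed: {exc}")
            result = None
        if result is not None:
            return result
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return None


# Usage with gdown (file_url and zip_file_path as defined in the workflow above):
# retry(lambda: gdown.download(url=file_url, output=str(zip_file_path), fuzzy=True))
```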

## Tip 4: Instant dataset sharing with Drive shortcuts

Adding a shortcut from a shared Google Drive folder to your own account provides fast access to data **without downloading, compressing, or decompressing**. This is more immediate than using gdown.

You can try the following steps using this public access link to COCO-2017 samples: https://drive.google.com/drive/folders/1G6JKGm0sy5d5ViEpDktXMxIqq5cl2Nrc?usp=sharing

### Creating a shortcut

1. Open [drive.google.com](https://drive.google.com) and sign in
2. (Optional) If the folder was shared via invitation, find it under "Shared with me"
3. Right-click on the folder and select **Organize > Add shortcut to Drive**

![add_shortcut](https://cdn.voxel51.com/getting_started_colab_tips/notebook2/add_shortcut.webp)

4. Select a location (e.g., "My Drive") and click **Add shortcut**
5. Access your shortcut; it will be marked with a small arrow icon

You can then reference this folder in your code:

```python
from pathlib import Path
import os
path = Path("/gdrive/MyDrive/<folder_name>")
os.listdir(path)
```

Shortcuts are pointers to the original folder: they don't make copies or use your storage quota. Any changes to files within the shortcut are reflected in the original folder.
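A notebook shared with collaborators may run in accounts where the shortcut has not been added yet. One way to handle that is to try several candidate locations in order and use the first one that exists. A minimal sketch; the helper name `first_existing` is hypothetical, and the paths in the usage comment are placeholders.

```python
from pathlib import Path


def first_existing(*candidates):
    """Return the first candidate that is an existing directory, else None."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir():
            return path
    return None


# Usage: prefer the Drive shortcut, fall back to a local download folder
# data_root = first_existing(
#     "/gdrive/MyDrive/<folder_name>",
#     "/gdrive/MyDrive/fiftyone_dataset_curation/gdown_downloads",
# )
```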

### Configuring FiftyOne paths on Drive

Save your FiftyOne downloads, models, and MongoDB database to Drive by modifying the configuration:

In [None]:
# Check the state of fo.config before doing any modification
print(fo.config)

In [None]:
# Where we will download the data when using the FiftyOne dataset zoo
# https://docs.voxel51.com/dataset_zoo/index.html
dataset_zoo_path = save_path / "fo_dataset_zoo"
os.makedirs(dataset_zoo_path, exist_ok=True)
fo.config.dataset_zoo_dir = str(dataset_zoo_path)

# path to the MongoDB database
database_path = save_path / "mongodb"
os.makedirs(database_path, exist_ok=True)
fo.config.database_dir = str(database_path)

models_path = str(save_path / "models")
os.makedirs(models_path, exist_ok=True)
fo.config.model_zoo_dir = models_path

model = foz.load_zoo_model("clip-vit-base32-torch")
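FiftyOne also reads its configuration from `FIFTYONE_*` environment variables, so the same paths can be set before `import fiftyone` runs instead of modifying `fo.config` afterwards. A minimal sketch, assuming the folder layout used above; the helper names are hypothetical.

```python
import os
from pathlib import Path


def fiftyone_env(base_dir):
    """Map FiftyOne's environment variables to folders under `base_dir`."""
    base = Path(base_dir)
    return {
        "FIFTYONE_DATASET_ZOO_DIR": str(base / "fo_dataset_zoo"),
        "FIFTYONE_DATABASE_DIR": str(base / "mongodb"),
        "FIFTYONE_MODEL_ZOO_DIR": str(base / "models"),
    }


def apply_fiftyone_env(base_dir):
    """Create the folders and export the variables; call before importing fiftyone."""
    env = fiftyone_env(base_dir)
    for path in env.values():
        os.makedirs(path, exist_ok=True)
    os.environ.update(env)
    return env


# Usage, in the first cell after mounting Drive:
# apply_fiftyone_env('/gdrive/MyDrive/fiftyone_dataset_curation')
# import fiftyone as fo
```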

## Tip 5: Secure API access with Colab Secrets

Adding tokens (like HuggingFace) as secrets in Google Colab allows you to share your notebooks without exposing personal credentials.

### Why you need a HuggingFace token

- **Access gated models**: Many models on HuggingFace require agreeing to terms before downloading. Your token verifies permissions.
- **Avoid rate limiting**: Authenticated requests have higher download limits.
- **Upload your work**: Upload and manage models and datasets on the Hub.

### Setting up your token

1. Create an account at [huggingface.co](https://huggingface.co/)
2. Go to **Settings > Access Tokens** and create a new token with "Write" role
3. In Colab, click the **key icon** (🔑) in the left sidebar
4. Add a new secret named `HUGGINGFACE_TOKEN` with your token value
5. Enable "Notebook access"

### Using the token

```python
import os
from google.colab import userdata

# huggingface_hub reads the token from the HF_TOKEN environment variable
os.environ["HF_TOKEN"] = userdata.get("HUGGINGFACE_TOKEN")
```

**Security tip**: Never expose tokens in plain text. If compromised, revoke immediately in your HuggingFace settings.
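If the same notebook is also run outside Colab, where `google.colab` is not importable, a small wrapper can fall back to an environment variable. A minimal sketch; the helper name `get_secret` is hypothetical.

```python
import os


def get_secret(name):
    """Read `name` from Colab's Secrets panel, else from the environment.

    Outside Colab the google.colab package is missing, so the value is
    looked up in os.environ instead (None if it is not set).
    """
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        return os.environ.get(name)
    return userdata.get(name)


# Usage:
# token = get_secret("HUGGINGFACE_TOKEN")
```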

### Loading datasets from HuggingFace

In [None]:
from fiftyone.utils.huggingface import load_from_hub

other_datasets_path = save_path / "other_datasets"
os.makedirs(other_datasets_path, exist_ok=True)
fo.config.default_dataset_dir = str(other_datasets_path)

curated_mnist_dataset = load_from_hub("Voxel51/curated-mnist",
                                       max_samples=100)

## Tip 6: Launching the FiftyOne App in a separate tab

Launching the FiftyOne App in a separate browser tab provides a full-window view, which is often more convenient than embedding it within a notebook cell.

In [None]:
# Passing auto=False prevents the App from opening automatically in the notebook cell
session = fo.launch_app(dataset, auto=False)
# Print session.url to get a clickable link to the App
print(f"Open the App whenever you like: {session.url}")


![full_window_view](https://cdn.voxel51.com/getting_started_colab_tips/notebook2/full_window_view.webp)

Alternatively, `session.open_tab()` opens a new tab automatically, though printing the URL gives you more control over when to access the App.

In [None]:
# session.open_tab() is another option; it opens the tab directly without showing the URL
# session.open_tab()

## Tip 7: Managing a single App instance

Each call to `fo.launch_app()` launches a new server process. While you can run multiple apps, it's best to manage a single instance to conserve resources and avoid confusion. If you see the FiftyOne App "flickering," it's often because two instances are running simultaneously.

Store the returned `session` object to control the app programmatically. If you call `fo.launch_app()` again with an existing session, FiftyOne will return a handle to the existing session without creating a new one.

In [None]:
# The dataset is already loaded from a previous cell
# dataset = foz.load_zoo_dataset("coco-2017", split="validation", max_samples=200)

# If it's flickering it's because we have already called fo.launch_app() inside the notebook
# This will gracefully shut down the App server associated with the previous session
session.close()

# Launch the app and store the session.
session = fo.launch_app(dataset)

Use the session object to update the view without relaunching:

In [None]:
from fiftyone import ViewField as F

dog_view = dataset.filter_labels("ground_truth", F("label") == "dog")
session.view = dog_view

If you close the App's cell or tab, the underlying Python process continues running. You can reopen a tab to the App's URL anytime.

## Tip 8: GPU acceleration for faster inference

Enable GPU acceleration to significantly speed up inference and computations. Google Colab provides free access to GPUs:

```
Runtime > Change Runtime Type > T4 GPU
```

This provides an NVIDIA T4 GPU with approximately 16 GB of VRAM, making embedding computation and model inference much faster.

In [None]:
import torch
# Check if the GPU is available
torch.cuda.is_available()
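Beyond the boolean check, it can be useful to print which GPU was assigned and how much memory it reports. A minimal sketch using `torch.cuda.get_device_properties`; the helper name `describe_accelerator` is hypothetical.

```python
def describe_accelerator():
    """Return a one-line description of the first CUDA device, if any."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "No CUDA GPU detected; running on CPU"
    props = torch.cuda.get_device_properties(0)
    # total_memory is reported in bytes
    return f"{props.name}: {props.total_memory / 1024**3:.1f} GiB"


print(describe_accelerator())
```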

In [None]:
import time
import fiftyone.brain as fob

# Helper: measure time for computing embeddings into a field
def time_compute_embeddings_and_project(model, label="gpu"):
    start = time.time()
    # Use Brain to both compute embeddings and reduce dimensionality
    res = fob.compute_visualization(
        dataset,
        model=model,
        embeddings="clip_embeddings_" + label,
        brain_key="vis_" + label,
        method='pca',
        batch_size=16,
    )
    end = time.time()
    return end - start

# Load CLIP (OpenAI ViT-B/32) from the model zoo
clip_model = foz.load_zoo_model("clip-vit-base32-torch")
print("Model has embeddings:", getattr(clip_model, "has_embeddings", None))

# Benchmark on GPU (if available)
gpu_time = None
if torch.cuda.is_available():
    # Many FiftyOne zoo models run on GPU automatically if available;
    # timing reflects GPU execution.
    gpu_time = time_compute_embeddings_and_project(clip_model, label="gpu")
    print(f"GPU time (s): {gpu_time:.2f}")

# Force a CPU run by loading a second model instance pinned to the CPU.
# device="cpu" is forwarded to the Torch model's config; without it, the model is
# placed on the GPU whenever CUDA is available and this would not be a CPU benchmark.
# If your FiftyOne version rejects the argument, set CUDA_VISIBLE_DEVICES=""
# before launching Python instead.
clip_model_cpu = foz.load_zoo_model("clip-vit-base32-torch", device="cpu")
cpu_time = time_compute_embeddings_and_project(clip_model_cpu, label="cpu")
print(f"CPU time (s): {cpu_time:.2f}")
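To compare runs, the two timings can be collected in one place and reduced to a single speedup factor. A minimal sketch with hypothetical helper names (`timed`, `speedup`); it uses `time.perf_counter`, which is better suited to measuring intervals than `time.time`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the with-block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def speedup(results, baseline="cpu", candidate="gpu"):
    """Return baseline/candidate, or None if either timing is missing or zero."""
    if baseline not in results or candidate not in results:
        return None
    if results[candidate] == 0:
        return None
    return results[baseline] / results[candidate]


# Usage:
# timings = {}
# with timed("gpu", timings):
#     ...  # run the GPU benchmark here
# with timed("cpu", timings):
#     ...  # run the CPU benchmark here
# print(speedup(timings))
```

Because `speedup` returns `None` when a timing is missing, it also covers the case where no GPU was available and only the CPU run was measured.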

## Tip 9: High-RAM runtime for large datasets

When working with large datasets or high-resolution images, the standard Colab runtime (approximately 12 GB RAM) may be insufficient. Colab Pro offers a High-RAM runtime with approximately 25 GB or more:

```
Runtime > Change Runtime Type > Runtime shape > High-RAM
```

This is useful when loading many samples into FiftyOne, computing embeddings for entire datasets, or performing other memory-intensive operations.

FiftyOne remains responsive even on low resources, but larger datasets benefit from extra memory:

In [None]:
import psutil

# Check available RAM
total_ram_bytes = psutil.virtual_memory().total
# Convert 16GB to bytes
sixteen_gb_bytes = 16 * 1024 * 1024 * 1024

if total_ram_bytes > sixteen_gb_bytes:
    try:
        big_dataset = foz.load_zoo_dataset(
            "coco-2017",
            split="validation",
            max_samples=5000,  # Loading 5000 samples requires more memory
        )
        print(f"Successfully loaded a larger dataset with {len(big_dataset)} samples.")
        # You can now launch the app with this larger dataset
        # session = fo.launch_app(big_dataset)
        # print(session.url)
    except Exception as e:
        print(f"An error occurred: {e}")
        print("This may be due to insufficient RAM. Try switching to a High-RAM runtime.")
else:
    print("Skipping loading the larger dataset: Insufficient RAM (less than 16GB) available.")

## Tip 10: Rendering notebooks that fail on GitHub

Many Jupyter notebooks produce artifacts that prevent proper rendering on GitHub previews:

![invalid_notebook](https://cdn.voxel51.com/getting_started_colab_tips/notebook2/invalid_notebook.webp)

Example of a notebook that doesn't render: https://github.com/andandandand/practical-computer-vision/blob/main/notebooks/Food_Dataset_Curation_with_Fiftyone.ipynb

Change `github.com` to `githubtocolab.com` in the URL to render it immediately in Colab:

https://githubtocolab.com/andandandand/practical-computer-vision/blob/main/notebooks/Food_Dataset_Curation_with_Fiftyone.ipynb

![working_colab](https://cdn.voxel51.com/getting_started_colab_tips/notebook2/working_colab.webp)
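The URL change can also be scripted, for example when generating links for several notebooks. A minimal sketch; the helper name `to_colab_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def to_colab_url(github_url):
    """Swap the github.com host for githubtocolab.com, keeping the rest of the URL."""
    parts = urlsplit(github_url)
    if parts.netloc not in ("github.com", "www.github.com"):
        raise ValueError(f"Not a github.com URL: {github_url}")
    return urlunsplit(parts._replace(netloc="githubtocolab.com"))


print(to_colab_url(
    "https://github.com/andandandand/practical-computer-vision/"
    "blob/main/notebooks/Food_Dataset_Curation_with_Fiftyone.ipynb"
))
```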

## Summary

This tutorial covered techniques for working effectively with FiftyOne in Google Colab's temporary environment.

**Installation and Setup**
- Use `uv` for fast package installation
- Apply `%%capture` to suppress verbose output
- Pin explicit versions for reproducibility

**Data Persistence**
- Mount Google Drive to preserve datasets across sessions
- Point FiftyOne's MongoDB database directory at a Drive path
- Export and import datasets using `fo.types.FiftyOneDataset`

**Sharing and Collaboration**
- Download shared folders with `gdown` (best for smaller datasets)
- Use Drive shortcuts for instant access without downloading
- Set proper permissions for collaborative access

**App Management**
- Launch with `auto=False` and print URLs for manual access
- Manage a single session to prevent flickering
- Close and reopen the App tab without losing data

**Performance**
- Enable T4 GPU runtime for faster inference
- Use High-RAM runtime (25 GB+) for large datasets
- Benchmark CPU vs GPU execution times

**Additional Tips**
- Use `githubtocolab.com` to render notebooks that fail on GitHub
- Remember: 90-minute idle timeout, 12-hour maximum session lifetime