[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yejingxin/ai-on-gke/blob/ipp/applications/ipyparallel/tpu/example_notetook/quickstart.ipynb)
[![Open On GitHub](https://img.shields.io/badge/Open-on%20GitHub-blue?logo=GitHub)](https://github.com/yejingxin/ai-on-gke/blob/ipp/applications/ipyparallel/tpu/example_notetook/quickstart.ipynb)

# Quick Start Guide: Running Notebooks on Multi-host TPU
This tutorial shows how to initialize a notebook and run example cells on a multi-host TPU, using a service that is already running. To set up the service itself, see the [user guide](https://github.com/yejingxin/ai-on-gke/blob/ipp/applications/ipyparallel/README.md).

## Install `ipyparallel`
To interact with the cluster, we use IPython Parallel (`ipyparallel`) and its cell magics. Either pre-install it in the Jupyter notebook container image or install it with the following cell:

In [5]:
!pip install ipyparallel

Collecting ipyparallel
  Downloading ipyparallel-8.8.0-py3-none-any.whl.metadata (6.4 kB)
Downloading ipyparallel-8.8.0-py3-none-any.whl (293 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 293.1/293.1 kB 6.8 MB/s eta 0:00:00
Installing collected packages: ipyparallel
Successfully installed ipyparallel-8.8.0


## Mount Filestore to Colab Enterprise Instance (Colab Enterprise Only)
**Important: If you are not using Colab Enterprise, skip this cell and proceed to the next section.**

To mount the filestore to your Colab Enterprise instance, follow these steps:

1. Run the following command:
    ```
    bash ipp_notebook.sh nfsmount
    ```
    This will generate the necessary mount command.

2. Copy the generated command and paste it into the cell below.
3. Execute the cell to mount the filestore.

Example command (your actual command may differ):
```
!sudo apt-get install nfs-common && mkdir nfs && sudo mount -o nolock 10.195.89.130:/ipp nfs
```

In [None]:
#!sudo apt-get install nfs-common && mkdir nfs && sudo mount -o nolock <ip>:/<share_name> nfs
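
After running the mount command, a quick check from Python can confirm that the `nfs` directory is an active mount point rather than an empty local folder. This is a minimal sketch using only the standard library; the helper name `nfs_is_mounted` is hypothetical, and the default `nfs` path simply mirrors the example command above.

```python
import os


def nfs_is_mounted(mount_point="nfs"):
    """Return True if mount_point exists and is an active mount point."""
    # os.path.ismount returns False for missing paths and plain directories.
    return os.path.ismount(mount_point)
```

For example, `nfs_is_mounted()` should return `True` once the mount has succeeded, and `False` if the command failed and left only the empty directory created by `mkdir nfs`.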

## Verify Client Config
When the multi-host notebook service initializes, it automatically generates a client configuration file in the designated filestore folder. For both Colab and Jupyter Notebook environments, this filestore folder is pre-mounted in the GKE Jupyter Notebook pod for seamless access.

To confirm the setup, check that the config file exists and is readable:

In [3]:
IPP_FILE_PATH = "nfs/security/ipcontroller-client.json"
!cat {IPP_FILE_PATH}

{
  "ssh": "",
  "interface": "tcp://10.56.0.132",
  "registration": 39167,
  "control": 44173,
  "mux": 41335,
  "task": 56755,
  "task_scheme": "leastload",
  "iopub": 52367,
  "notification": 43735,
  "broadcast": 53291,
  "key": "566bf9eb-ea078ca7c904fde10cebb78f",
  "curve_serverkey": null,
  "location": "ipp-notebook-0",
  "pack": "json",
  "unpack": "json",
  "signature_scheme": "hmac-sha256"
}
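
The same check can be done programmatically instead of by eye. The sketch below parses the file and reports missing fields before any connection is attempted. The helper `check_client_config` and the `REQUIRED_KEYS` set are hypothetical names used for illustration, and the key list is an assumption taken from the sample output above, not an official schema.

```python
import json
from pathlib import Path

# Keys observed in the sample config above; an assumption, not an official schema.
REQUIRED_KEYS = {"interface", "registration", "control", "mux", "task", "iopub", "key"}


def check_client_config(path):
    """Return the parsed config, raising if the file is missing or incomplete."""
    config_path = Path(path)
    if not config_path.is_file():
        raise FileNotFoundError(f"Client config not found: {config_path}")
    config = json.loads(config_path.read_text())
    missing = REQUIRED_KEYS - set(config)
    if missing:
        raise ValueError(f"Client config is missing keys: {sorted(missing)}")
    return config
```

Calling `check_client_config(IPP_FILE_PATH)["interface"]` on the sample file above would return the controller address, which helps distinguish a mount problem (file not found) from a partially written config (missing keys).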

## Connect to the TPU cluster
Use this client config file to connect to different hosts in the TPU Cluster:

In [6]:
import ipyparallel as ipp
rc = ipp.Client(IPP_FILE_PATH)
print(rc.ids)
if rc.ids:
    print(f'Successfully established connection with {len(rc.ids)} hosts')
else:
    print(f'Failed to connect to {IPP_FILE_PATH}')

[2, 3]
Successfully established connection with 2 hosts
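
If some hosts are still starting up, `rc.ids` may list fewer engines than expected. One option is to block until a given number of engines has registered, using the client's `wait_for_engines` method. The wrapper below is a sketch: the name `wait_for_hosts`, the expected host count, and the 60-second default timeout are assumptions for illustration, not values prescribed by the service.

```python
def wait_for_hosts(client, expected_hosts, timeout=60):
    """Block until expected_hosts engines have registered, then return their ids."""
    # wait_for_engines raises an error if the timeout elapses first.
    client.wait_for_engines(n=expected_hosts, timeout=timeout)
    return sorted(client.ids)
```

With the two-host cluster shown above, `wait_for_hosts(rc, 2)` would return `[2, 3]` once both engines are available.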


## Check Current Task Status
A ready-to-use TPU cluster should have no outstanding tasks, and every engine should report zero for both `queue` and `tasks`:

In [7]:
print('Cluster task queue status:', rc.queue_status())
print('Current outstanding tasks:', rc.outstanding)

Cluster task queue status: {'unassigned': 0, 0: {'queue': 0, 'completed': 1, 'tasks': 0}, 1: {'queue': 0, 'completed': 1, 'tasks': 0}, 2: {'queue': 0, 'completed': 0, 'tasks': 0}, 3: {'queue': 0, 'completed': 0, 'tasks': 0}}
Current outstanding tasks: set()
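
The readiness rule can also be written as a small predicate, which is convenient for guarding later cells. This is a sketch that assumes the dictionary layout shown in the output above (an `unassigned` count plus one entry per engine id); the helper name `cluster_is_idle` is hypothetical.

```python
def cluster_is_idle(queue_status, outstanding):
    """Return True if no tasks are queued, running, or awaiting results."""
    if outstanding:
        return False
    if queue_status.get("unassigned", 0):
        return False
    for engine_id, counts in queue_status.items():
        if engine_id == "unassigned":
            continue
        # 'queue' counts waiting tasks; 'tasks' counts tasks assigned to the engine.
        if counts["queue"] or counts["tasks"]:
            return False
    return True
```

In the notebook this would be called as `cluster_is_idle(rc.queue_status(), rc.outstanding)`.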


## Run a cell on different hosts
We use the `%%px --block --group-outputs=engine` cell magic to execute code on every host in blocking mode, with the output grouped by engine. For details on the cell magics, see the `ipyparallel` [documentation](https://ipyparallel.readthedocs.io/en/latest/tutorial/magics.html).

Example:

In [8]:
%%px --block --group-outputs=engine
import jax
import socket
print(f'jax process {jax.process_index():02d} is running on {socket.gethostname()}, \
local num of chips: {jax.local_device_count()}, global num of chips: {jax.device_count()}')

%px:   0%|          | 0/2 [00:03<?, ?tasks/s]

[stdout:3] jax process 00 is running on ipp-notebook-0-2, local num of chips: 4, global num of chips: 8


%px:  50%|█████     | 1/2 [00:03<00:00,  9.93tasks/s]

[stdout:2] jax process 01 is running on ipp-notebook-0-1, local num of chips: 4, global num of chips: 8


%px: 100%|██████████| 2/2 [00:03<00:00,  1.61s/tasks]


The output shows one JAX process running on each host, each reporting its local chip count (4) and the cluster-wide total (8). Because the cell runs in blocking mode, it returns only after every host has finished.
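
The cell magic is a convenience; the same kind of call can be made from ordinary Python through a view on the client, which is handy when the results are needed as values rather than printed text. The sketch below uses `client[:]` and `apply_sync` from `ipyparallel`; the helper names `describe_host` and `hostnames_on_all_engines` are hypothetical.

```python
def describe_host():
    """Runs on each engine; the import lives inside so it resolves remotely."""
    import socket
    return socket.gethostname()


def hostnames_on_all_engines(client):
    """Call describe_host on every engine and return {engine_id: hostname}."""
    view = client[:]  # view over all currently registered engines
    results = view.apply_sync(describe_host)  # blocks until every engine replies
    return dict(zip(view.targets, results))
```

Calling `hostnames_on_all_engines(rc)` on the cluster above would map each engine id to the name of the host it runs on.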

## Clean Up
Ensure all tasks are completed, then disconnect from the cluster.

In [9]:
print('Cluster task queue status:', rc.queue_status())
print('Current outstanding tasks:', rc.outstanding)

Cluster task queue status: {'unassigned': 0, 0: {'queue': 0, 'completed': 1, 'tasks': 0}, 1: {'queue': 0, 'completed': 1, 'tasks': 0}, 2: {'queue': 0, 'completed': 1, 'tasks': 0}, 3: {'queue': 0, 'completed': 1, 'tasks': 0}}
Current outstanding tasks: set()


In [22]:
rc.purge_everything()

In [5]:
rc.shutdown()

**Important**: The steps above only disconnect the notebook from the cluster. The notebook service and the cluster itself remain active. To clean up all resources, including shutting down the service and deleting cluster resources, see the "Clean Up" section of the cluster [user guide](../../README.md). A full cleanup prevents unnecessary costs.