<a href="https://colab.research.google.com/github/yumemio/short-guide-to-variables/blob/main/a_short_guide_to_variables.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# A short guide to variables in Colab

Google Colab (or rather the underlying Jupyter Notebook) handles Python variables and environment variables (envvars) in a way I find hard to grasp.

So I wrote a notebook to help you get started with handling variables!

In this short memo, you will learn:

* How to set and use environment variables in Colab (Jupyter)
* How to use Python variables in a shell command
* How to use environment variables in Python code

---

## Environment variables in shell commands

You can use environment variables in shell commands.

👍 Use `%env name=value`, without quoting the value, to set an environment variable.

👍 Use `$my_var` or `${my_var}` to get the assigned value.

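🚫 Do not use `!export`. Each `!` line runs in its own short-lived shell, so the exported variable is gone by the next line.

🚫 Do not use shell variables (`!name=value`), for the same reason.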

In [None]:
# correct
%env my_config=/content/config.yaml
!echo "Config: $my_config"
!echo "Config extension: ${my_config##*.}"

!export my_images="/content/train/" # wrong
!echo "Images: $my_images"

!my_json="/content/labels.json"     # wrong
!echo "Labels: $my_json"

env: my_config=/content/config.yaml
Config: /content/config.yaml
Config extension: yaml
Images: 
Labels: 
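
The two "wrong" lines fail because each `!` line is handed to its own short-lived shell, while `%env` changes the environment of the notebook process, which every later shell inherits. Here is a minimal plain-Python sketch of that behaviour, with `subprocess` standing in for `!` and `os.environ` standing in for `%env` (the helper name `sh` is made up for this illustration):

```python
import os
import subprocess


def sh(command: str) -> str:
    """Run one command in a fresh shell, roughly what a single `!` line does."""
    done = subprocess.run(command, shell=True, capture_output=True, text=True)
    return done.stdout.strip()


# Make sure the demo variable is not already set in this process
os.environ.pop("my_images", None)

# Like `!export my_images=...`: the variable dies together with that shell
sh('export my_images="/content/train/"')
lost = sh('echo "Images: $my_images"')

# Like `%env my_config=...`: every later shell inherits the variable
os.environ["my_config"] = "/content/config.yaml"
kept = sh('echo "Config: $my_config"')

print(repr(lost))
print(repr(kept))
```
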


### Using variables in `%env` declaration

👍 You can access **Python** variables in a `%env` command, with `$python_var` or `{python_var}` syntax.

In [None]:
the_number = 42
%env epochs=$the_number
%env the_answer={the_number}

env: epochs=42
env: the_answer=42
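
Whatever the Python type, the value lands in the environment as a string, so convert it back when you read it. A small plain-Python sketch, assuming an `os.environ` assignment as the equivalent of `%env` (note that `os.environ` itself accepts only strings):

```python
import os

the_number = 42

# os.environ rejects non-string values, so convert explicitly
os.environ["epochs"] = str(the_number)

raw = os.environ["epochs"]   # always a str
epochs = int(raw)            # convert back before doing arithmetic

print(type(raw).__name__, epochs + 1)
```
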


🚫 You cannot, however, use environment variables in another `%env` declaration.

In [None]:
%env surname=houdini
%env full_name=harry $surname
%env full_name=harry {surname}

env: surname=houdini
env: full_name=harry $surname
env: full_name=harry {surname}


Colab understands and automatically expands `{a_python_expression}` in `%env`, so use `os.environ` instead.

In [None]:
import os
%env full_name=harry {os.environ['surname']}

env: full_name=harry houdini


One common use case of this technique is adding a directory to the front of the `PATH` envvar.

The following cell is equivalent to `export PATH=/content/my_bin:$PATH` in bash.

In [None]:
%env PATH=/content/my_bin:{os.environ['PATH']}

env: PATH=/content/my_bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tools/node/bin:/tools/google-cloud-sdk/bin:/opt/bin


Let's see if the new `PATH` definition is in effect...

In [None]:
!mkdir -p /content/my_bin
!echo 'echo hello world!' > /content/my_bin/sayhello
!chmod u+x /content/my_bin/sayhello
!sayhello

hello world!
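
Keep in mind that re-running the `%env PATH=...` cell adds the same directory once more each time. Below is a sketch of doing the same thing from Python while skipping duplicates. `prepend_to_path` is a hypothetical helper, not something Colab provides:

```python
import os


def prepend_to_path(directory, env=None):
    """Put `directory` at the front of PATH, dropping any earlier copy of it."""
    env = os.environ if env is None else env
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p and p != directory]
    env["PATH"] = os.pathsep.join([directory] + parts)
    return env["PATH"]


# Demonstrated on a plain dict so the real environment is left alone
demo_env = {"PATH": os.pathsep.join(["/usr/local/bin", "/usr/bin"])}
prepend_to_path("/content/my_bin", demo_env)
prepend_to_path("/content/my_bin", demo_env)  # second call changes nothing
print(demo_env["PATH"])
```

Calling `prepend_to_path("/content/my_bin")` without the second argument would modify the real `os.environ`, which the shells started by `!` then inherit.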


One last caveat:

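🚫 Do not try to set `PYTHONPATH` with `%env PYTHONPATH=/my/awesome/lib/`. Python reads it only when the interpreter starts, so the already-running kernel never sees the change.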

👍 Use `sys.path.append("/my/awesome/lib")` instead.

See https://stackoverflow.com/q/34976803/13301046 for details.

In [None]:
import os
%env PYTHONPATH=/content/my_lib:{os.environ['PYTHONPATH']}

!mkdir -p /content/my_lib
script = """def inquisite():
    print('Nobody expects the Spanish inquisition!')"""
!echo "{script}" > /content/my_lib/inquisitor.py

import inquisitor
inquisitor.inquisite()

env: PYTHONPATH=/content/my_lib:/env/python


ModuleNotFoundError: ignored

In [None]:
import sys
sys.path.append("/content/my_lib/")

import inquisitor
inquisitor.inquisite()

Nobody expects the Spanish inquisition!
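
The difference between the two cells comes down to this: `PYTHONPATH` is consulted when an interpreter starts, whereas `sys.path` is the live list that `import` actually searches. Here is a self-contained sketch of that difference. The module name `inquisitor_demo` and the temporary directory are made up so that the example does not touch `/content`:

```python
import importlib
import importlib.util
import os
import sys
import tempfile

# Write a tiny module into a throwaway directory
lib_dir = tempfile.mkdtemp()
with open(os.path.join(lib_dir, "inquisitor_demo.py"), "w") as f:
    f.write("def inquisite():\n    return 'Nobody expects the Spanish inquisition!'\n")

# Changing PYTHONPATH now only affects interpreters started later
os.environ["PYTHONPATH"] = lib_dir + os.pathsep + os.environ.get("PYTHONPATH", "")
found_via_env = importlib.util.find_spec("inquisitor_demo") is not None

# Changing sys.path affects the running interpreter
sys.path.append(lib_dir)
importlib.invalidate_caches()
import inquisitor_demo

print(found_via_env)
print(inquisitor_demo.inquisite())
```
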


---

## Python variables in shell commands

One big (and confusing) feature of Jupyter is that **you can use Python variables in shell commands**. You can access them in two ways: `{my_var}` or `$my_var`. Because the latter looks exactly like shell variable expansion, the former is less ambiguous (with `$my_var`, you are left wondering whether it was declared as a Python variable or an envvar).

Wrapping names with curly braces (`${my_var}`) will cause weird bugs, so don't do that!

👍 Use `{my_var}`.

👎 `$my_var` works too, but use with caution.

🚫 Do not use `${my_var}`.

In [None]:
my_var = "gs://foo/bar"

!echo "The hidden lair lies in {my_var}"    # correct
!echo "The hidden lair lies in $my_var"     # ok-ish
!echo "The hidden lair lies in ${my_var}"   # wrong

The hidden lair lies in gs://foo/bar
The hidden lair lies in gs://foo/bar
The hidden lair lies in ://foo/bar
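
What seems to happen with `${my_var}` is a two-step expansion: Jupyter first replaces `{my_var}` with the Python value and leaves the `$` in place, then the shell sees `$gs://foo/bar` and expands the unset shell variable `$gs` to nothing. The sketch below imitates those two steps in plain Python, with `str.format` standing in for Jupyter's expansion and `subprocess` standing in for `!`. Both stand-ins are assumptions made for illustration:

```python
import os
import subprocess

my_var = "gs://foo/bar"

# Step 1: the braces are filled in, the dollar sign stays
after_python = 'echo "The hidden lair lies in ${my_var}"'.format(my_var=my_var)

# Step 2: the shell expands $gs, which is unset here, to an empty string
clean_env = {k: v for k, v in os.environ.items() if k != "gs"}
after_shell = subprocess.run(
    after_python, shell=True, capture_output=True, text=True, env=clean_env
).stdout.strip()

print(after_python)
print(after_shell)
```
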


### Downsides of `$my_var` syntax

`$my_var` syntax has two downsides that lead to bug-prone code:

1. If the variable is used as part of a string (e.g. the filename in a filepath), it (and nothing after it) must be surrounded with double quotes (`"`).
2. Possible name collision with `%env` variables. `$my_var` can expand to a Python or `%env` variable, whereas `{my_var}` is guaranteed to be understood as a Python variable expansion.

I recommend avoiding `$my_var` syntax altogether and using `{my_var}` instead.

In [None]:
## Demonstration of tricky double quotes
filename = "DSC_0001"
!echo /home/guest/Pictures/"$filename".JPG  # correct
!echo "/home/guest/Pictures/$filename.JPG"  # wrong
!echo

# If variables with the same name are declared in %env and 
# Python, the latter always wins.
if 'runtime' in globals(): del runtime
%env runtime=bash
!echo "> $runtime rules!"

runtime = "Python"
!echo "> $runtime rules!"

%env runtime=bash
!echo "> $runtime rules!"

/home/guest/Pictures/DSC_0001.JPG
/home/guest/Pictures/.JPG

env: runtime=bash
> bash rules!
> Python rules!
env: runtime=bash
> Python rules!


### Using `{my_var}`

`{my_var}` always expands to a Python variable, never to one declared using `%env`.

If `my_var` is not defined yet, Colab leaves `{my_var}` untouched.

In [None]:
if "runtime" in globals(): del runtime

%env runtime=bash
!echo "> {runtime} rules!"    # runtime is not defined in Python yet

runtime = "Python"
!echo "> {runtime} rules!"

%env runtime=bash
!echo "> {runtime} rules!"

env: runtime=bash
> {runtime} rules!
> Python rules!
env: runtime=bash
> Python rules!


You can quote `{my_var}` however you want: with double or single quotes.

In [None]:
!ls -l "/content/sample_data/mnist_"*".csv"
!echo

mode = "test"
!ls -l '/content/sample_data/mnist_{mode}.csv'

-rw-r--r-- 1 root root 18289443 Nov  1 13:35 /content/sample_data/mnist_test.csv
-rw-r--r-- 1 root root 36523880 Nov  1 13:35 /content/sample_data/mnist_train_small.csv

-rw-r--r-- 1 root root 18289443 Nov  1 13:35 /content/sample_data/mnist_test.csv


This syntax is somewhat similar to Python's f-string or `.format()` method -- but note that advanced formatting features (like text alignment with `<` / `>` operators) are not available.

In [None]:
accelerator = "TPU"
print(f"{accelerator:>10}") # will work
!echo "{accelerator:>10}"   # won't work

       TPU
{accelerator:>10}
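
One workaround is to do the formatting in Python first and pass the finished string along. In a notebook that would mean assigning the f-string result to a variable and using `{padded}` in the `!echo` line. The plain-Python sketch below uses `subprocess` as a stand-in for `!` (the variable name `padded` is made up here):

```python
import subprocess

accelerator = "TPU"

# Apply the format spec in Python, where it is supported
padded = f"{accelerator:>10}"

# Hand the already-formatted string to the command as a single argument
result = subprocess.run(["echo", padded], capture_output=True, text=True).stdout

print(repr(result))
```
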


### Expansion of Python expressions is all-or-nothing

Braced words are evaluated as Python expressions before the line is passed to the shell. If any pair of braces fails to expand, Colab (Jupyter) leaves **all** braces on that line unexpanded.

This means that **mixing braced Python expressions with other uses of braces in a single line is not allowed**. You have to ensure that all occurrences of braced words are understood as valid Python code.

So be careful not to use braces for other purposes, notably:
- Variable expansion in Bash (`${bash_variable}`)
- Argument replacement in `xargs -i` (instead, for example, use `xargs -I@@`)

In [1]:
superhero = "Batman"
villain = "the Joker"
!echo "A story of a hero, {superhero}, fighting against {villain}"  # Will work
!echo "A story of a hero, {superhero}, with the help of {sidekick}, fighting against {villain}" # Won't work
!echo "A story of a hero, {superhero}, fighting against {villain}" # Won't work, since {} is in the comment


A story of a hero, Batman, fighting against the Joker
A story of a hero, {superhero}, with the help of {sidekick}, fighting against {villain}
A story of a hero, {superhero}, fighting against {villain}


In [2]:
!echo -e "{superhero}\n{villain}" | xargs -n1 -i echo 'Name: {}'    # Won't work, since Jupyter cannot expand the last pair of braces
!echo -e "{superhero}\n{villain}" | xargs -n1 -I@@ echo 'Name: @@'  # Will work

Name: {superhero}
Name: {villain}
Name: Batman
Name: the Joker


In [6]:
%env sidekick=Robin

# Case 1: Won't work at all
#   Python expansion fails (because of undeclared_variable),
#   and Bash replaces undeclared_variable with an empty string
!echo "A story of a hero, {superhero}, with the help of somebody, fighting against ${undeclared_variable}"

# Case 2: Only bash variable expansion will work
#   Python expansion fails (because sidekick is not a Python variable),
#   then Bash replaces ${sidekick} with "Robin"
!echo "A story of a hero, {superhero}, with the help of ${sidekick}, fighting against {villain}"

env: sidekick=Robin
A story of a hero, {superhero}, with the help of somebody, fighting against 
A story of a hero, {superhero}, with the help of Robin, fighting against {villain}
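
The behaviour seen in these cells can be imitated with a few lines of Python. This is only a simplified model, not Jupyter's actual code: it uses `str.format_map`, which looks names up rather than evaluating arbitrary expressions, and the function name `expand_all_or_nothing` is made up:

```python
def expand_all_or_nothing(line, namespace):
    """Fill in every {name} from `namespace`, or return the line untouched."""
    try:
        return line.format_map(namespace)
    except (KeyError, IndexError, ValueError):
        # One failing pair of braces cancels the expansion of the whole line
        return line


python_vars = {"superhero": "Batman", "villain": "the Joker"}

ok = expand_all_or_nothing("{superhero} vs {villain}", python_vars)
unknown = expand_all_or_nothing("{superhero}, {sidekick}, {villain}", python_vars)
empty = expand_all_or_nothing("{superhero} | xargs -i echo 'Name: {}'", python_vars)

print(ok)
print(unknown)
print(empty)
```

Only the first line comes back expanded. The other two are returned exactly as written, braces included, which mirrors the outputs of the cells above.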


---

## %env variables in Python cells

You can use `os.environ['my_var']` to get environment variables in Python.

In [None]:
%env password=ShinyBash

import os
print(f"Your password is: {os.environ['password']}")

env: password=ShinyBash
Your password is: ShinyBash
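
`os.environ['my_var']` raises a `KeyError` when the variable is not set. When a missing variable is a normal situation, `os.environ.get` with a default avoids the exception. A short sketch follows. The variable name `no_such_var` and the fallback value are made up:

```python
import os

os.environ["password"] = "ShinyBash"   # same effect as `%env password=ShinyBash`
os.environ.pop("no_such_var", None)     # make sure it really is unset

present = os.environ["password"]
fallback = os.environ.get("no_such_var", "not set")

try:
    os.environ["no_such_var"]
    raised = False
except KeyError:
    raised = True

print(present, fallback, raised)
```
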


---

## Wrap up

That's all for handling variables! Here are the takeaways:

* There are two types of variables you can use in Colab / Jupyter:
  * Environment variables, and
  * Python variables.
* You can set environment variables with `%env`, and use them in a shell command with `$my_var` or `${my_var}`.
* You can use Python variables in a shell command with `{my_var}` (avoid `$my_var`!)
* Old-school `os.environ` is a handy way to get environment variables in Python.

If you want to update / add more tips to this notebook, please open an issue at [the GitHub repo](https://www.github.com/yumemio/short-guide-to-variables)!