# Python Imports

## Overview

- Syntax
- What it does
- The Python Path
- Relative imports

## Syntax

### Simple import

The simplest import statement looks like `import module`, where `module` is a "dotted name":

```python
import module
import module.submodule
```

Multiple modules *can* be listed, but this is not considered good style (PEP 8 recommends one import per line):

```python
import sys, os  # Ew!
```

Modules can be renamed at import time. This is generally used to shorten the name of a heavily-used module:

```python
import numpy as np
import matplotlib.pyplot as plt
```

### Import from

A "from" import generally reads:

```python
from module import something
```

This is useful if the fully-qualified name of an object feels too long or redundant,
or for classes or functions that are simply used a lot:

```python
from datetime import datetime
from iris.cube import Cube
```

Like the simple import, `as` can be used to rename the imported item:

```python
from iris.util import promote_aux_coord_to_dim_coord as promote_coord
```

Also like the simple import, multiple names can be listed, but here it *is*
considered good style, so much so that the list may be wrapped in parentheses to span several lines:

```python
from typing import (
    List,
    Mapping,
    Optional,
    # ...
)
```

## What it does

- Check cache
- Import parent
- Locate file
- Initialise module
- Execute file
- Assign variables

### Check cache

Python caches all imported modules in `sys.modules`, a dictionary mapping each full module name to its module object:

```python
import sys

sys.modules["sys"]
```

```
<module 'sys' (built-in)>
```

In code:

```python
def import_module(name):
    # Fast path: anything already imported comes straight from the cache
    try:
        return sys.modules[name]
    except KeyError:
        pass

    return _import_module(name)
```
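To see the cache at work, here is a small sketch that writes a throwaway module into a temporary directory (the name `cache_demo` is hypothetical) and imports it repeatedly with `importlib.import_module`:

```python
import importlib
import os
import sys
import tempfile

# A throwaway module in a temporary directory; "cache_demo" is a made-up name
root = tempfile.mkdtemp()
with open(os.path.join(root, "cache_demo.py"), "w") as f:
    f.write("VALUE = 42\n")
sys.path.insert(0, root)
importlib.invalidate_caches()

first = importlib.import_module("cache_demo")
second = importlib.import_module("cache_demo")
print(first is second)  # True: the second import is served from sys.modules

# Evicting the cache entry forces the file to be found and executed again,
# producing a brand new module object
del sys.modules["cache_demo"]
third = importlib.import_module("cache_demo")
print(third is first)   # False
print(third.VALUE)      # 42
```

`importlib.reload` is the supported way to re-execute a module in place; deleting from `sys.modules` is shown here only because it exposes the cache directly.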

### Import parent

The "parent" module is found by simply removing the last dotted part of the name.

So there is essentially an implicit `import module` before any
`import module.submodule`:

```python
import module.submodule
```

```
in module
in module.submodule
```


No output on reimport due to caching:

```python
import module.submodule
```
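The same behaviour can be observed with a real nested package from the standard library. This sketch imports `xml.etree.ElementTree` and checks that both parents were imported first and bound as attributes:

```python
import sys
import xml.etree.ElementTree

# Importing the submodule imported every parent package first...
print("xml" in sys.modules, "xml.etree" in sys.modules)  # True True

# ...and bound each child as an attribute of its parent
print(sys.modules["xml"].etree is sys.modules["xml.etree"])  # True
print(xml.etree.ElementTree is sys.modules["xml.etree.ElementTree"])  # True
```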

In code:

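```python
def _import_module(name):
    search_paths = None
    parent, _, child = name.rpartition(".")
    if parent:
        # Importing the parent first may itself import this module
        parent_module = import_module(parent)
        try:
            # Submodules are only searched for in the parent's __path__
            search_paths = parent_module.__path__
        except AttributeError:
            raise ModuleNotFoundError(
                f"No module named {name!r}; {parent!r} is not a package"
            ) from None

        # Check cache again
        try:
            return sys.modules[name]
        except KeyError:
            pass

    module = find_and_load(name, search_paths)
    if parent:
        # Ensure available as an attribute of the parent
        setattr(parent_module, child, module)

    return module
```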

### Locate file

This may be a python file, or a folder, which may or may not contain a file
called `__init__.py`.

**Note**: although an `__init__.py` file is not required for the import system
itself to treat a folder as a package (without one, it becomes a "namespace
package"), it remains the only available heuristic for other tools. For
example, pytest uses their presence to deduce fully-qualified names for tests.
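A quick sketch of a folder without `__init__.py` still being importable, using a hypothetical `ns_demo` folder created in a temporary directory. Assuming Python 3.7 or later, such a namespace package has no file to execute, so its `__file__` is `None`:

```python
import importlib
import os
import sys
import tempfile

# Hypothetical layout: <tmp>/ns_demo/plain.py, with *no* __init__.py in ns_demo
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "ns_demo"))
with open(os.path.join(root, "ns_demo", "plain.py"), "w") as f:
    f.write("GREETING = 'hello'\n")
sys.path.insert(0, root)
importlib.invalidate_caches()

import ns_demo.plain

print(ns_demo.plain.GREETING)              # hello
print(getattr(ns_demo, "__file__", None))  # None: nothing to execute
print(list(ns_demo.__path__))              # a single entry ending in ns_demo
```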

#### *Well actually...*

Python's import system is far more general.

In reality, it consults several "*finders*" (listed in `sys.meta_path`), whose
job is to return a "*module specification*", which in turn must define a
"*loader*".

We're focusing on just one of the default finders: the one that knows how to
import python files on a typical filesystem.
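The finders and specifications can be inspected directly. This sketch uses `importlib.util.find_spec` to ask for the spec of the standard library's `json` package; the exact finders and loader types printed vary between python versions and installations:

```python
import importlib.util
import sys

# The finders consulted, in order (the exact list varies between versions)
print(sys.meta_path)

# Ask the finders for a spec (if json is already imported, its cached spec is returned)
spec = importlib.util.find_spec("json")
print(spec.name)                        # json
print(spec.origin)                      # path to json/__init__.py
print(spec.submodule_search_locations)  # non-empty: json is a package
print(type(spec.loader).__name__)       # e.g. SourceFileLoader
```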

In code:

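```python
import os


def find_module_file(name, search_paths=None):
    """Return (package_dir, filename) for a module; either may be None."""
    if search_paths is None:
        # Top-level module: search sys.path
        search_paths = sys.path

    # Parents are already imported, so only the last part of the name
    # is looked up inside each search path
    child = name.rpartition(".")[2]

    for root in search_paths:
        dirname = os.path.join(root, child)
        if os.path.isdir(dirname):
            # A package: its own folder becomes its __path__,
            # and __init__.py (if present) is the file to execute
            filename = os.path.join(dirname, "__init__.py")
            if not os.path.isfile(filename):
                filename = None
            return dirname, filename

        filename = dirname + ".py"
        if os.path.isfile(filename):
            # A plain module: no __path__
            return None, filename

    raise ModuleNotFoundError(f"No module named {name!r}")
```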

### Initialise module

A new module object is created and stored in `sys.modules` *before* its file is executed.

It is given some special attributes:

- `__name__`: the full module name
- `__package__`: the package this module is considered part of
  - same as `__name__` if it is itself a package
  - its parent if it has one
  - an empty string otherwise (`None` means "unknown")
- `__path__`: paths to search for submodules (not present if not a package)
- `__spec__`: the module specification
- `__loader__`: loader defined by the specification
- `__file__`: name of file to load (not present if meaningless for the loader)
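These attributes can be checked on a real package and one of its submodules. A short sketch using the standard library's `json` package:

```python
import json.decoder

# A package: __package__ is its own name, and __path__ says where submodules live
print(json.__name__, json.__package__)                  # json json
print(hasattr(json, "__path__"))                        # True

# A plain submodule: __package__ is the parent, and there is no __path__
print(json.decoder.__name__, json.decoder.__package__)  # json.decoder json
print(hasattr(json.decoder, "__path__"))                # False

# __loader__ is the loader named by the module's spec
print(json.decoder.__loader__ is json.decoder.__spec__.loader)  # True
print(json.decoder.__file__)
```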

In code:

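```python
# The type of every module object, equivalent to types.ModuleType
Module = type(sys)


def find_and_load(name, search_paths=None):
    dirname, filename = find_module_file(name, search_paths)

    # Register the module *before* executing it, so that circular imports
    # find the (partially initialised) object instead of recursing forever
    module = sys.modules[name] = Module(name)

    parent = name.rpartition(".")[0]
    if dirname is not None:
        # Is a package
        module.__path__ = [dirname]
        module.__package__ = name
    else:
        # Part of a package, or "" for a top-level module
        module.__package__ = parent

    if filename is not None:
        module.__file__ = filename
        load_module_file(module, filename)

    # The module's own code may have replaced its sys.modules entry
    return sys.modules[name]
```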

### Execute file

The module's `__dict__` is used as the global namespace for the file's code. In particular, this means:

- Module attributes like `__name__` are available as variables.

- Any global variables defined - such as function or class definitions - will
  immediately be reflected as attributes of the module object.
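Both points can be seen by executing some source text into a bare module object by hand. In this sketch, `exec_demo` is a hypothetical module name and `types.ModuleType` stands in for the module the import system would create:

```python
import types

# A bare module object, standing in for one created by the import system
module = types.ModuleType("exec_demo")

source = """
def whoami():
    return __name__

ANSWER = 6 * 7
"""

# Executing with the module's __dict__ as globals...
exec(source, module.__dict__)

# ...makes every global it defines an attribute of the module
print(module.ANSWER)    # 42
print(module.whoami())  # exec_demo: __name__ is read from the module itself
```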

In code:

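```python
def load_module_file(module, filename):
    try:
        with open(filename, "r") as file:
            # Compiling with the filename makes tracebacks point at the file
            code = compile(file.read(), filename, "exec")
        # The module's __dict__ serves as the file's global namespace
        exec(code, module.__dict__)
    except BaseException:
        # A failed import must not leave a half-initialised module cached
        sys.modules.pop(module.__name__, None)
        raise
```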

### Assign variables

Varies slightly depending on the form of the import:

- `import module.submodule`: the top-level parent module is assigned to a
  variable of the same name. If a submodule was requested, it will be available
  as an attribute.

  Other submodules may be available too, if they were imported in the process
  of executing any of the files so far.

- `import module as something`: the imported (sub)module is assigned directly
  to the name specified.

- `from module import something`: the module itself is imported but not
  assigned to any name.

  The names listed after `import` must either already be attributes of the
  module, or importable submodules. Each is assigned to its own name, or to the
  name given by its `as` clause if present.
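The import statement ultimately calls the built-in `__import__`, and the difference between the plain and "from" forms shows up in what that call returns. A sketch using `xml.etree.ElementTree` from the standard library:

```python
import sys

# Like "import xml.etree.ElementTree": the *top-level* package comes back
top = __import__("xml.etree.ElementTree")
print(top is sys.modules["xml"])  # True

# Like "from xml.etree.ElementTree import parse": the named module itself
# comes back, and the statement then fetches each listed attribute from it
leaf = __import__("xml.etree.ElementTree", fromlist=["parse"])
print(leaf is sys.modules["xml.etree.ElementTree"])  # True
print(leaf.parse)
```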

In code:

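```python
def _import(name, fromlist=(), namespace=globals()):
    # Note: the default namespace is the globals of *this* module,
    # evaluated once when the function is defined
    full_name = resolve_name(namespace.get("__package__"), name)
    module = import_module(full_name)

    if not fromlist:
        # "import a.b.c" binds only the top-level name "a"
        top_level = full_name.partition(".")[0]
        namespace[top_level] = sys.modules[top_level]
    else:
        # "from a.b import c" binds each listed name, importing it
        # as a submodule if it is not already an attribute
        for item_name in fromlist:
            try:
                item = getattr(module, item_name)
            except AttributeError:
                item = import_module(f"{full_name}.{item_name}")
            namespace[item_name] = item
```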

## The Python Path

When python searches for a top-level module to import, it considers each item
of `sys.path` in turn (submodules are searched for in their parent's `__path__`
instead).

This is a list that is populated in a number of different ways when python
starts up. Most can be disabled with various command-line options or
environment variables.

By default, the entries are (from highest to lowest priority):

- Script directory (disable: `-I`, or `-P` from Python 3.11)
- `$PYTHONPATH` (disable: `-I` or `-E`)
- Standard library
- User site packages (disable: `-S` or `-s` or `-I`)
- Site packages (disable: `-S`)
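The effect of these flags can be checked by asking a fresh interpreter for its `sys.path`. This sketch uses `subprocess` with `sys.executable`; it assumes no `PYTHONSAFEPATH` variable is set in the environment, since that would also suppress the first entry:

```python
import json
import subprocess
import sys


def child_sys_path(*flags):
    """Return sys.path as seen by a fresh interpreter started with the given flags."""
    code = "import json, sys; print(json.dumps(sys.path))"
    result = subprocess.run(
        [sys.executable, *flags, "-c", code],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


default = child_sys_path()
isolated = child_sys_path("-I")
print(default[0] == "")  # True: with -c, '' (the working directory) comes first
print("" in isolated)    # False: -I drops it
```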

```python
sys.path
```

```
['',
 '/opt/scitools/environments/default/current/lib/python36.zip',
 '/opt/scitools/environments/default/current/lib/python3.6',
 '/opt/scitools/environments/default/current/lib/python3.6/lib-dynload',
 '/home/h01/bsherrat/.local/lib/python3.6/site-packages',
 '/opt/scitools/environments/default/current/lib/python3.6/site-packages',
 '/opt/scitools/environments/default/current/lib/python3.6/site-packages/IPython/extensions',
 '/net/home/h01/bsherrat/.ipython']
```

### Script directory

If a script file is being executed, the directory containing it is inserted at
the start of `sys.path`.

Otherwise (eg in the interactive interpreter or with `python -c`), an empty
string is used, which means "current working directory" in the typical sense,
including being affected by `os.chdir`.

### `$PYTHONPATH`

This environment variable can contain a colon-separated list of paths -
similar to `$PATH`. Each will be added to `sys.path`, after the script
directory but before the standard library.

**Caution**: if it contains an empty path (eg if it was extended with
`PYTHONPATH="/some/path:$PYTHONPATH"` when already empty), then this is
expanded to the working directory at the time of execution.
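The pitfall comes from joining with a separator unconditionally. When building the variable from python (for example to pass as `env` to `subprocess.run`), a guard avoids the empty entry; `naive_prepend` and `safe_prepend` below are hypothetical helpers, not library functions:

```python
import os


def naive_prepend(new, existing):
    # What PYTHONPATH="/some/path:$PYTHONPATH" does in the shell
    return new + os.pathsep + existing


def safe_prepend(new, existing):
    # Only add a separator when there is something to separate
    return new + os.pathsep + existing if existing else new


print(naive_prepend("/some/path", "").split(os.pathsep))  # ['/some/path', '']
print(safe_prepend("/some/path", "").split(os.pathsep))   # ['/some/path']
```

In the shell itself, `PYTHONPATH="/some/path${PYTHONPATH:+:$PYTHONPATH}"` achieves the same by only expanding the separator when the variable is non-empty.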

### Site packages

This simply refers to installed packages.

The standard library module
[`site`](https://docs.python.org/3/library/site.html)
is responsible for making sure they are available on `sys.path`; python
normally imports it automatically during startup (unless run with `-S`).

There are two main locations:

- Near the standard library, eg `/path/to/lib/pythonX.Y/site-packages`
- A user-writable counterpart, eg `~/.local/lib/pythonX.Y/site-packages`

```python
import site
site.getsitepackages()
```

```
['/opt/scitools/environments/default/current/lib/python3.6/site-packages']
```

```python
site.getusersitepackages()
```

```
'/home/h01/bsherrat/.local/lib/python3.6/site-packages'
```

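Additionally, if any of these folders contain `.pth` files, each line of each
file naming an existing path is added to `sys.path` as well (blank lines and
`#` comments are skipped, and lines starting with `import` are executed
instead). Installing a package from source, eg as an editable install, may make
use of this.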
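The `.pth` mechanism can be exercised without touching a real installation, because `site.addsitedir` applies the same processing to any directory. In this sketch both temporary directories are hypothetical stand-ins for a site directory and a source checkout:

```python
import os
import site
import sys
import tempfile

# Two hypothetical directories: one acting as a site directory,
# and one that a .pth file inside it points at
site_dir = tempfile.mkdtemp()
extra_dir = tempfile.mkdtemp()
with open(os.path.join(site_dir, "demo.pth"), "w") as f:
    f.write("# comment lines are ignored\n")
    f.write(extra_dir + "\n")

# addsitedir adds the directory itself, then processes any .pth files in it
site.addsitedir(site_dir)

print(os.path.abspath(site_dir) in sys.path)   # True
print(os.path.abspath(extra_dir) in sys.path)  # True
```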

### Which one to use?

**Site package**

Ideally, as much as possible should be installed as a site package (the system
one, not the user one).

If elevated permissions are required for this, strongly consider creating
a separate environment where they are not - eg using
[`venv`](https://docs.python.org/3/library/venv.html) from the standard
library, or [conda](https://conda.io/en/latest/).

**PYTHONPATH**

If a dedicated environment is not possible or otherwise undesirable, install
the desired package from source, and point to it using `$PYTHONPATH`. Best in
a wrapper script, *not* `.bashrc`.

**User site package**

Generally achieved with `pip install --user`. This is more convenient than
`PYTHONPATH`, but requires *great* care: every python executable of the same
version shares these same packages, so it is less recommended overall.

**Script directory**

That is, the imported code is in the same folder as the scripts using it.

OK for single-user code, but otherwise should not be considered more than a
temporary solution - install it like any other package.

## Relative imports

A relative import is simply an import that is restricted to the current
package. It is recognised by a module name having leading dots, which is only
valid in the `from` form:

```python
from . import sibling
from .sibling import something
from .. import uncle
from ...great_aunt import cousin_once_removed
```

Main advantages:

- Clear visual distinction from standard/3rd-party imports
- Fully protected against "import shadowing"
- Less to change if a package/folder is renamed

One minor limitation:

- Because `__package__` must be defined, a file containing relative imports
  cannot be executed as a script (ie `python submodule.py`). But any module
  can already be executed with `python -m module.submodule` anyway.
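The difference between the two ways of running a file can be demonstrated with a hypothetical package `rel_demo`, created in a temporary directory, whose `main.py` uses a relative import. It assumes `PYTHONSAFEPATH` is not set, so that `-m` can find the package in the working directory:

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical package: rel_demo/{__init__.py, helper.py, main.py}
root = tempfile.mkdtemp()
pkg = os.path.join(root, "rel_demo")
os.mkdir(pkg)
for filename, text in [
    ("__init__.py", ""),
    ("helper.py", "VALUE = 'from helper'\n"),
    ("main.py", "from .helper import VALUE\nprint(VALUE)\n"),
]:
    with open(os.path.join(pkg, filename), "w") as f:
        f.write(text)


def run(*args):
    # Run a fresh interpreter from the directory containing the package
    return subprocess.run([sys.executable, *args], cwd=root,
                          capture_output=True, text=True)


as_module = run("-m", "rel_demo.main")
as_script = run(os.path.join("rel_demo", "main.py"))

print(as_module.stdout.strip())  # from helper
print(as_script.returncode != 0, "ImportError" in as_script.stderr)  # True True
```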

### What it does

The relative module name is translated to an absolute name, by appending it
to `__package__` and going up a level for each additional dot.

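For example, a plain (non-package) module `a.b` may only make relative imports
with one leading dot, referring to `a`; a module `a.b.c` may use up to two
leading dots, and so on. (Inside a package's own `__init__.py`, a single dot
refers to the package itself.)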

Importing then proceeds exactly as for the resulting absolute name. Since the
result always lies inside the current package, it is located via the package's
`__path__` rather than `sys.path`.

```python
def resolve_name(package, name):
    if not name.startswith("."):
        # Already absolute
        return name

    if not package:
        raise ImportError("attempted relative import with no known parent package")

    # The first dot refers to the current package itself...
    name = name[1:]
    parents = package.split(".")
    # ...and each additional dot goes up one level
    while name.startswith("."):
        name = name[1:]
        parents = parents[:-1]
    if not parents:
        raise ImportError("attempted relative import beyond top-level package")

    parent = ".".join(parents)
    if name:
        return f"{parent}.{name}"
    return parent
```
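The standard library exposes the same calculation as `importlib.util.resolve_name`, which can serve as a cross-check for the function above. Note that its argument order is `(name, package)`, and that `a.b` here is just a string: nothing is actually imported.

```python
import importlib.util

# Argument order is (name, package), the reverse of resolve_name above
print(importlib.util.resolve_name(".sibling", "a.b"))  # a.b.sibling
print(importlib.util.resolve_name("..uncle", "a.b"))   # a.uncle
print(importlib.util.resolve_name(".", "a.b"))         # a.b

try:
    importlib.util.resolve_name("...too_far", "a.b")
    rejected = False
except ImportError as error:
    rejected = True
    print(error)  # attempted relative import beyond top-level package
```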

## Bonus!

### `__future__`

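This is not really an import, although it looks like one: it is a directive to
the compiler. `from __future__ import feature` may only appear at the top of a
file, preceded by nothing but the module docstring, comments, blank lines and
other future statements. (A real `__future__` module does exist and is imported
as usual, but mainly to document the available features.)

It was mostly used to allow python 3 features in python 2, to aid the
transition from 2 to 3, though python 3 still uses it occasionally, eg
`from __future__ import annotations`.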

### Circular imports

Here is a module with three submodules that import each other:

```python
import circular
```

```
circular: importing submodule_1
circular.submodule_1: importing submodule_a
circular.submodule_a: importing submodule_α
circular.submodule_α: importing submodule_1
finished circular.submodule_α
finished circular.submodule_a
finished circular.submodule_1
```


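Python is interpreted: it steps through each line of a file one-by-one.
An import statement interrupts this by executing another file. If a subsequent
import refers to an "interrupted" file, that file is not executed again: the
partially-initialised module already in `sys.modules` is returned as-is,
containing only the names defined so far.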

In practice, this usually works, because the dependencies are not typically
required to define a function, only to call it. However, a circular dependency
remains a red flag, suggesting the code would benefit from a refactor.
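A minimal sketch of that "define now, call later" pattern, using a hypothetical package `circ_demo` written to a temporary directory. `first` imports `second` before defining `VALUE`, and `second` imports `first` back:

```python
import importlib
import os
import sys
import tempfile


def write(path, text):
    with open(path, "w") as f:
        f.write(text)


root = tempfile.mkdtemp()
pkg = os.path.join(root, "circ_demo")
os.mkdir(pkg)
write(os.path.join(pkg, "__init__.py"), "")
# first imports second *before* defining VALUE
write(os.path.join(pkg, "first.py"),
      "from circ_demo import second\nVALUE = 1\n")
# second imports first back while first is still only half-executed
write(os.path.join(pkg, "second.py"),
      "from circ_demo import first\n"
      "SEEN_DURING_IMPORT = hasattr(first, 'VALUE')\n"
      "def get_value():\n"
      "    return first.VALUE\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

import circ_demo.first
from circ_demo import second

print(second.SEEN_DURING_IMPORT)  # False: first was only partially executed
print(second.get_value())         # 1: by call time, first has finished
```

Had `second.py` instead used `from circ_demo.first import VALUE`, the import would fail with an `ImportError`, because `VALUE` does not exist yet at that point.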

### Using our implementation

```python
_import("a.b", ["c"])
```

```
in a
in a.b
in a.b.c
```

```python
"a.b.c" in sys.modules
```

```
True
```

```python
"a" in globals()
```

```
False
```

```python
"c" in globals()
```

```
True
```

```python
c
```

```
<module 'a.b.c' from 'a/b/c.py'>
```