# Loading and saving images

Almost all image processing involves reading and saving image files.

This tutorial covers some basic filesystem utilities and image IO (input/output) libraries.



## Filesystem handling with `pathlib`

There are two ways to handle file path operations in the Python standard library:
the `os` and `os.path` modules, and the more recent object-oriented `pathlib` module.

I'll mostly use `pathlib` here because I prefer it (some arguments [here](https://treyhunner.com/2018/12/why-you-should-be-using-pathlib/)), but many of these operations
have [corresponding tools in the `os` module](https://docs.python.org/3/library/pathlib.html#correspondence-to-tools-in-the-os-module).

See the [pathlib documentation](https://docs.python.org/3/library/pathlib.html) for more.


In [2]:
from pathlib import Path

# get current working directory with `cwd`
cur_dir = Path.cwd()
print(f"The current directory is {cur_dir}\n")

# specify a relative or absolute directory
data_dir = Path("data")

# join paths using the division operator
# works on both windows and mac/linux
# note:  a path doesn't need to exist to "refer" to it
sub_dir = data_dir / "subdirectory"
print("some data subdirectory:", sub_dir)

The current directory is /Users/talley/Dropbox (HMS)/Python/hms_pyintro2

some data subdirectory: data/subdirectory
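Once a path has been built with `/`, its pieces can be read back as attributes. The following is a minimal sketch; it uses `PurePosixPath` only so that the printed separators are the same on every operating system (an assumption made for reproducibility here, a regular `Path` exposes the same attributes).

```python
from pathlib import PurePosixPath

# PurePosixPath never touches the disk and always uses "/" as separator,
# so this example prints the same thing on Windows, macOS and Linux.
image_path = PurePosixPath("data") / "mm_ex_w1488.TIF"

print(image_path.name)    # file name with extension: mm_ex_w1488.TIF
print(image_path.stem)    # file name without extension: mm_ex_w1488
print(image_path.suffix)  # extension, case preserved: .TIF
print(image_path.parent)  # containing folder: data

# with_suffix returns a new path object; nothing is renamed on disk
lowercase_path = image_path.with_suffix(".tif")
print(lowercase_path)     # data/mm_ex_w1488.tif
```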


### `Path` objects have lots of useful methods

In [3]:
# check if something exists with `exists()`:
print(f"subdirectory exists?: {sub_dir.exists()}\n")

# create a subdirectory with `mkdir`
sub_dir.mkdir(exist_ok=True)
print(f"subdirectory exists?: {sub_dir.exists()}\n")

# iterate over everything (files and folders) in a specific path with `iterdir`
print(f"The files in the data directory are {list(data_dir.iterdir())}\n")

# use .glob to find all files matching a specific pattern, with "*" as wildcard
tiff_files = data_dir.glob("*.TIF")
print(f"Tiff files in the data directory: {list(tiff_files)}\n")

# tricky sidenote:  `glob(...)` returns a generator that gets "consumed" each
# time you use it, so make sure to either recreate the generator each time,
# or store the output in a list.

# remove files with `.unlink` and folders with `.rmdir`
sub_dir.rmdir()
print(f"subdirectory exists?: {sub_dir.exists()}\n")


subdirectory exists?: False

subdirectory exists?: True

The files in the data directory are [PosixPath('data/subdirectory'), PosixPath('data/mm_ex_w2561.TIF'), PosixPath('data/mm_ex.nd'), PosixPath('data/.ipynb_checkpoints'), PosixPath('data/mm_ex_w1488.TIF')]

Tiff files in the data directory: [PosixPath('data/mm_ex_w2561.TIF'), PosixPath('data/mm_ex_w1488.TIF')]

subdirectory exists?: False
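The note about `glob(...)` being "consumed" is easy to trip over, so here is a small self-contained sketch of it. The temporary folder and the file names `a.TIF`, `b.TIF` and `notes.txt` are made up for illustration and are not part of the tutorial data.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    # create a few empty files (hypothetical names, for illustration only)
    for name in ("a.TIF", "b.TIF", "notes.txt"):
        (tmp_dir / name).touch()

    matches = tmp_dir.glob("*.TIF")

    # the first pass walks through the results...
    first_pass = sorted(p.name for p in matches)
    # ...and the second pass finds nothing left to iterate over
    second_pass = sorted(p.name for p in matches)

    # storing the results in a list lets you reuse them as often as you like
    stored = sorted(p.name for p in tmp_dir.glob("*.TIF"))

print(first_pass)   # ['a.TIF', 'b.TIF']
print(second_pass)  # []
print(stored)       # ['a.TIF', 'b.TIF']
```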



### Read text files into python

Now that we know how to reference various locations on the disk, let's read something into Python.

In [4]:
# read text file using the builtin `open` function
# note, we use the `with ...` context manager so that the file
# is automatically closed when we are done reading it:

with open(data_dir / 'mm_ex.nd') as file:
    text = file.read()

print(text)

"NDInfoFile", Version 2.0
"Description", Nikon Imaging Center - Station 2: Lucille
"StartTime1", 20210623 08:22:39
"DoTimelapse", FALSE
"NTimePoints", 1
"DoStage", FALSE
"DoWave", TRUE
"NWavelengths", 2
"WaveName1", "488"
"WaveDoZ1", TRUE
"WaveName2", "561"
"WaveDoZ2", TRUE
"DoZSeries", TRUE
"NZSteps", 7
"ZStepSize", 0
"WaveInFileName", TRUE
"NEvents", 0
"EndFile"



In [5]:
# Path objects from the pathlib module have a convenient method for reading text
ptext = (data_dir / 'mm_ex.nd').read_text()
print('The texts are the same:', text == ptext)

The texts are the same: True
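Once the text is in memory, it is ordinary string data and can be turned into something easier to query. The sketch below assumes each line has the `"Key", value` layout seen in the output above; `parse_nd` is a hypothetical helper written for this example, not an official parser for the `.nd` format, and the `sample` string is a shortened copy of the file contents printed earlier.

```python
def parse_nd(text):
    """Turn `"Key", value` lines into a dict of strings (illustrative helper)."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # split on the first comma only, since values may contain commas
        key, _, value = line.partition(",")
        key = key.strip().strip('"')
        if key == "EndFile":
            break
        info[key] = value.strip().strip('"')
    return info


# a shortened copy of the text printed above
sample = '''"NDInfoFile", Version 2.0
"NWavelengths", 2
"WaveName1", "488"
"WaveName2", "561"
"NZSteps", 7
"EndFile"
'''

info = parse_nd(sample)
n_waves = int(info["NWavelengths"])
wavelengths = [info[f"WaveName{i}"] for i in range(1, n_waves + 1)]

print(wavelengths)           # ['488', '561']
print(int(info["NZSteps"]))  # 7
```

All values come back as strings, so numeric fields such as `NZSteps` need an explicit `int(...)` conversion.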


### Read image data into python

Here's where things start to get a little confusing, because there are often many different ways to do the same thing (some of the file IO libraries are actually just calling functions from another library under the hood!).  I'll cover a few of the most popular libraries, but I encourage you to explore the options for your specific file types.

#### Tiffs...
Let's start by trying to load the `mm_ex_w1488.TIF` file in the data directory.
This is exactly how the file came off the microscope (with MetaMorph). The all-caps `.TIF` extension actually poses some challenges.

#### tifffile

`tifffile` is probably the de facto scientific tiff file reader, and many of the `imread` functions from "popular" libraries actually call `tifffile` under the hood.

In [6]:
import tifffile

img = tifffile.imread(data_dir / 'mm_ex_w1488.TIF')
print(type(img), img.shape, img.dtype)
print(img)

# This is correct: our image is a 7-plane, 302 x 302 image

<class 'numpy.ndarray'> (7, 302, 302) uint16
[[[105 101 103 ... 102 104 101]
  [101 100 104 ... 102 106 105]
  [103 105 102 ... 102 102 102]
  ...
  [210 216 209 ... 139 145 142]
  [225 237 229 ... 144 140 151]
  [245 242 249 ... 141 154 152]]

 [[103 103 103 ... 109  99 103]
  [106  98 102 ... 106 104 103]
  [ 97 101 104 ... 102  97 104]
  ...
  [272 292 319 ... 147 149 148]
  [316 363 360 ... 152 157 155]
  [358 366 356 ... 155 157 154]]

 [[100 102 102 ... 106  98 107]
  [100 102  98 ... 108 102 102]
  [103 100 103 ... 105 105 101]
  ...
  [393 414 497 ... 177 182 179]
  [500 622 772 ... 189 191 212]
  [705 830 920 ... 194 201 211]]

 ...

 [[100 107 101 ... 109 105 104]
  [106 103 104 ... 107 109 102]
  [106 106 104 ... 108 103 101]
  ...
  [241 289 332 ... 182 192 197]
  [321 370 383 ... 210 221 242]
  [359 374 385 ... 218 236 230]]

 [[102  99 108 ... 108  99 105]
  [102 100 107 ... 103 104 108]
  [106 102  98 ... 103 108 107]
  ...
  [178 189 195 ... 148 145 157]
  [193 197 198 

#### imageio

`imageio` is a popular library with support for a *ton* of [image formats](https://imageio.readthedocs.io/en/stable/formats.html).  It also uses `tifffile` under the hood for tiff files, but be careful with `imageio.imread`: as the output below shows, it only reads a single plane.  `volread` is capable of reading a tiff stack.

In [7]:
import imageio

# careful!
img = imageio.imread(data_dir / 'mm_ex_w1488.TIF')
print(type(img), img.shape, img.dtype)

# use volread for 3D tiff data
img = imageio.volread(data_dir / 'mm_ex_w1488.TIF')
print(type(img), img.shape, img.dtype)

<class 'imageio.core.util.Array'> (302, 302) uint16
<class 'imageio.core.util.Array'> (7, 302, 302) uint16


#### scikit-image


In [8]:
from skimage import io

# careful!
img = io.imread(data_dir / 'mm_ex_w1488.TIF')
print('io.imread')
print(type(img), img.shape, img.dtype)

io.imread
<class 'numpy.ndarray'> (302, 302) uint16


It didn't work!  Why not?

scikit-image wraps various other image readers (like `imageio` and `tifffile`) in its `io` module and calls them "plugins".  The [docstring](https://scikit-image.org/docs/dev/api/skimage.io.html#imread) of the `skimage.io.imread` function tells us that `imread` accepts a `plugin` parameter:


> Name of plugin to use. By default, the different plugins are tried (starting with imageio) until a suitable candidate is found. If not given and fname is a tiff file, the tifffile plugin will be used.

Presumably, the capitalized `.TIF` extension caused skimage to use the `imageio` library, which, as we already saw, only reads a single plane (unless you use `volread`).

Let's try explicitly requesting the `tifffile` plugin:

In [9]:
img = io.imread(data_dir / 'mm_ex_w1488.TIF', plugin='tifffile')
print()
print('io.imread(..., plugin="tifffile")')
print(type(img), img.shape, img.dtype)


io.imread(..., plugin="tifffile")
<class 'numpy.ndarray'> (7, 302, 302) uint16
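If the extension case really is what trips up the automatic plugin choice, one defensive habit is to check the suffix yourself in a case-insensitive way and then request the reader explicitly. The sketch below only shows the suffix check, using `pathlib` alone; `is_tiff` and `TIFF_SUFFIXES` are hypothetical names introduced for this example.

```python
from pathlib import Path

# lowercase forms of the extensions we want to treat as tiff (an assumption)
TIFF_SUFFIXES = {".tif", ".tiff"}


def is_tiff(path):
    """Return True if the file extension looks like tiff, ignoring case."""
    return Path(path).suffix.lower() in TIFF_SUFFIXES


for name in ("mm_ex_w1488.TIF", "stack.tiff", "mm_ex.nd"):
    print(name, is_tiff(name))
```

A check like this could then decide whether to pass `plugin='tifffile'` to `io.imread`, as in the cell above.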
