# An Introduction to HyperSpy:
## The multi-dimensional data analysis toolbox

### <br/>
### Josh Taillon and Andy Herzing
#### *April 5, 2018*

## <a id='top'></a> Table of contents

1. <a href='#intro'> Intro</a>
2. <a href='#starting'> Getting Started</a>
3. <a href='#signal'> The Signal Class</a>
4. <a href='#io'> Input/Output</a>
5. <a href='#EM'> Electron Microscopy Tools</a>
6. <a href='#EDS'> EDS Processing</a>
7. <a href='#EELS'> EELS Processing</a>
8. <a href='#extending'> Extensibility</a>
9. <a href='#demos'> Interactive Demos</a>

# Notes for before presentation:
* Open separate instances of the following (all in `hyperspy` conda env):
    * Jupyter QtConsole
    * HyperSpyUI
    * JupyterLab
    * Jupyter Notebook
    * Spyder (with `./examples/analysis_script.py` open)
    * Plain anaconda console in `./examples/` directory
    

#### Import hyperspy

In [None]:
%matplotlib nbagg
import hyperspy.api as hs
import numpy as np

#### Disable warnings for presentation:

In [None]:
import logging
hs_logger = logging.getLogger('hyperspy') 
hs_logger.setLevel(logging.ERROR)

## A quick note first:

## This isn't your parents' Powerpoint...

## ...because everything is interactive!

In [None]:
import datetime
import time
datestring = datetime.datetime.now().strftime('%B %d, %Y')
for c in 'Today is {}!'.format(datestring):
    print(c, end='')
    time.sleep(.1)

## Made possible with:

* Jupyter notebook &mdash; https://jupyter.org/

* RISE (Reveal.js IPython/Jupyter Slideshow Extension) &mdash; https://github.com/damianavila/RISE

# <a id='intro'></a> Introduction

## What is HyperSpy?

* Open-source Python library for interactive data analysis of multi-dimensional datasets

* Makes it easy to operate on multi-dimensional arrays as you would a single spectrum (or image)

* Easy access to cutting-edge signal processing tools 

* Modular structure makes it easy to add custom features

## Why &nbsp; <img src="img/python_logo.svg" width=300px style="display: inline-block;">?

<center><img src="img/xkcd_python.png" width=600px></center>

## Why &nbsp; <img src="img/python_logo.svg" width=300px style="display: inline-block;">?

* Quickly becoming the *de facto* standard of scientific computing

* Free (as in speech and as in beer)
    * No pesky licenses to check out

* Vast array of scientific libraries available:
    * `pip install antigravity`

* Thanks to `numpy` and other libraries, performance is similar to (or often better than) MATLAB

## History of HyperSpy

* Developed by [Francisco de la Peña](https://scholar.google.com/citations?user=5n2c_fYAAAAJ&hl=en) from 2007 to 2012 as part of his Ph.D. thesis

* Originally called EELSLab:

<center><img src="img/eelslab.png" width=500px></center>

* Open-sourced (on [GitHub](https://github.com/hyperspy/hyperspy)) in 2010

* Renamed to HyperSpy in 2011

* Now... over 100 citations, and rapidly growing!

## Design philosophy of HyperSpy

* HyperSpy is a Python library, rather than a standalone program
    * Part of the greater scientific Python ecosystem

* Builds on (and requires) the scientific Python stack (e.g. `numpy` and `scipy`)

* Data storage is in an open hierarchical format (HDF5)

* Analysis done via reproducible notebooks

* Feature development is completely open-source

## How we came to love HyperSpy

### Josh:

* Became interested in multivariate statistical analysis of EELS spectrum images

* No easy way to do that in commercial software

* The entire scientific Python ecosystem is available from HyperSpy &mdash; <br/> machine learning, clustering, signal separation, etc.

* Came for the data analysis, stayed because of the community

### Andy:

* Needed a way to efficiently and objectively process chemical tomography data based on hyperspectral images

* No available commercial options except brute force

* Quickly realized that HyperSpy was ideally set up to enable reproducible and well documented data analysis
    * You know, science!

# <a id='starting'></a> Getting Started

## Installation

* Easiest method on Windows &mdash; HyperSpy bundle
  * http://hyperspy.org/download.html#windows-bundle-installers
  * Installs a Python distribution with HyperSpy included
  * Best method if you have no prior Python experience

* For more control (on Windows, Mac, and Linux) &mdash; Anaconda Python
  * https://www.anaconda.com/download/
  * After installing Anaconda, simply run `conda install hyperspy`
  * This method is preferred by the developers

## How to use HyperSpy?

* Console/Command line

* Integrated development environment (IDE)

* **Jupyter Notebook** (and JupyterLab)

* HyperSpyUI

## Important note:


<center>*Because HyperSpy is a library, all of these are just generic ways to access Python, and not specific to HyperSpy! <br/>(except the last one)*</center>

## Console/Command line

The simplest way to run is with a pre-written script directly from the command line:

```
$ python analysis_script.py
```

There are also "advanced Python interpreters", such as Jupyter QtConsole, `bpython`, `ipython`, etc.

## Integrated Development Environments

* Spyder (live example)
* PyCharm
* NetBeans

## Jupyter Notebook

The Jupyter project (https://jupyter.org) exists to:

"...develop open-source software, open-standards, and services for interactive computing across dozens of programming languages."

The "Notebook" is a human-readable format for storing both the inputs and outputs of code (see https://en.wikipedia.org/wiki/Notebook_interface)...

Inspired by Mathematica and Maple; has been adopted in many languages

Quick deviation from slideshow to show notebook interface...

#### **Features of the notebook:**

* Separation of the kernel (for calculation) and the front-end (for display)

* Runs completely in the web-browser (no special software needed)

* Kernel can be run on a central server &mdash; users connect with a web browser

* `.ipynb` files are JSON format and can be versioned

* Language-agnostic (can be used with Python, R, Java, Julia, etc.)

## JupyterLab

* An exciting new project that is more fully-featured and will eventually replace the Notebook interface

* Aims to be an IDE like Spyder or RStudio, but running within the browser

* Incorporates notebooks, the terminal, text editor, file browser, rich outputs, etc. into one interface

* Deviation for a short view of JupyterLab

## HyperSpyUI (https://github.com/hyperspy/hyperspyui)

* Developed in parallel to HyperSpy as a more "user-friendly" experience

* Many commonly used features from HyperSpy are available

* Deviation for a short view of HyperSpyUI (loading EELS signal, view metadata, signal separation)

* Most users work in Jupyter notebooks, but the UI is useful for quick investigations, or for those without programming experience

## How to get help?

* Thorough user guide and API documentation: http://hyperspy.org/hyperspy-doc/current/user_guide/index.html

* Tutorials and demos: https://github.com/hyperspy/hyperspy-demos

* User group list: [hyperspy-users@googlegroups.com](https://groups.google.com/forum/#!forum/hyperspy-users)

* Gitter chat: https://gitter.im/hyperspy/hyperspy

* If all else fails, Andy and Josh

## HyperSpy's `Signal` Class

* The "heart" of HyperSpy's data structure

* Every dataset stored within HyperSpy is an instance of `Signal` or one of its subclasses

## <a id='signal'></a> Structure of a `Signal`

* `Signal` is a wrapper around the raw data

*  Data is stored in a `numpy` array 

* Calibration information is stored in two types of `Axes` objects:
    * Navigation and Signal dimensions

In [None]:
hs.signals.Signal1D(np.random.random((10, 20, 30))).axes_manager

## Structure of a `Signal`

Examples of signal dimensionality:

<center>
<table class="table table-condensed table-nonfluid tablesorter tablesorter-default" role="grid">
<tbody aria-live="polite" aria-relevant="all">
<tr role="row">
<td></td>
<td><strong>Navigation</strong></td>
<td><strong>Signal</strong></td>
</tr>
<tr role="row">
<td>Single spectrum</td>
<td>0</td>
<td>1</td>
</tr>
<tr role="row">
<td>Line scan spectrum image</td>
<td>1</td>
<td>1</td>
</tr>
<tr role="row">
<td>Areal spectrum image</td>
<td>2</td>
<td>1</td>
</tr>
<tr role="row">
<td>Single image</td>
<td>0</td>
<td>2</td>
</tr>
<tr role="row">
<td>Time series image stack</td>
<td>1</td>
<td>2</td>
</tr>
<tr role="row">
<td>4D STEM diffraction image</td>
<td>2</td>
<td>2</td>
</tr>
</tbody>
</table>
</center>

## Structure of a `Signal`

* `Signal`s can be sliced by index, or by axis units, on either type of axis

* Signal axis slicing:

In [None]:
s = hs.datasets.example_signals.EDS_SEM_Spectrum()
print(s)

# Slice by axis units with floats:
print(s.isig[1.0:5.0])

# Slice by index with integers:
print(s.isig[20:100])

* Slicing both axes of a 2D signal (navigation axes are sliced the same way using `inav`):

In [None]:
im = hs.load('examples/HRSTEM.dm3')
print(im)

# Slice by axis units and index:
im_crop = im.isig[1.0:10.5, 20:60]
print(im_crop)
im_crop.plot()

## The `Signal` class offers flexibility...

* <mark>Andy, want to fill this in?</mark>

## <a id='io'></a> Getting your data in (and out) of HyperSpy

Many data readers have been written for experimental tools:
<br/><br/>

<center><img src="img/formats.png" width=800px></center>

## Loading data is simple!

Example of Gatan's `dm3` format:

In [None]:
im = hs.load('examples/HRSTEM.dm3')

In [None]:
im

In [None]:
im.metadata

Original metadata is maintained:

In [None]:
im.original_metadata

Plotting is also simple within the notebook:

In [None]:
im.plot()

### EDAX EDS mapping data

In [None]:
s = hs.load('examples/SEM_EDS_map.spd')

In [None]:
s

In [None]:
s.axes_manager

In [None]:
s.plot()

## Generic data access

* A `Signal` can be created from any data that can be expressed as a `numpy` array

* If your tool can output raw data, it can be loaded into HyperSpy with little fuss

* Using general Python features, data from other sources can be loaded easily as well

### Loading a `.csv` spectrum file

In [None]:
# Create a csv example:
single_pix = hs.load('examples/signal_separation_EELS_SI.hdf5').inav[0,0]
x_data = single_pix.axes_manager[0].axis
y_data = single_pix.data
data = np.stack([x_data, y_data])
np.savetxt("examples/spectrum.csv", data.T, delimiter=",", header="Energy (eV), Counts")

In [None]:
# Print the first few lines of the .csv file for inspection:
with open('examples/spectrum.csv', 'r') as f:
    for i in range(10):
        print(f.readline(), end='')

In [None]:
# Load the data into a numpy array from the .csv file:
d = np.loadtxt("examples/spectrum.csv", delimiter=',')

# Create a signal from the second column of data (the spectral counts)
s = hs.signals.Signal1D(d[:,1])
s

In [None]:
# Take the first column of values and set the energy axis accordingly:
energy_data = d[:,0]
s.axes_manager[0].scale = np.diff(energy_data).mean()
s.axes_manager[0].units = 'eV'
s.axes_manager[0].offset = energy_data[0]
s.axes_manager[0].name = 'Energy'
s.axes_manager

In [None]:
s.plot()
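
Setting `scale` and `offset` describes an evenly spaced axis, so averaging `np.diff` is only a faithful calibration when the energy column is uniformly sampled. The following is a minimal sketch of that check, using a small in-memory CSV with made-up values in the same two-column layout as `spectrum.csv`.

```python
import io
import numpy as np

# A tiny made-up spectrum; np.loadtxt skips the '#' header line
csv_text = """# Energy (eV), Counts
100.0,12
100.5,15
101.0,11
101.5,18
"""
d_demo = np.loadtxt(io.StringIO(csv_text), delimiter=',')
energy = d_demo[:, 0]

steps = np.diff(energy)
scale = steps.mean()
offset = energy[0]

# A linear axis is only valid if every step matches the mean step
is_uniform = np.allclose(steps, scale)
print(scale, offset, is_uniform)
```

If `is_uniform` is `False`, a single `scale` value would misplace some channels, and the data would need to be resampled onto a regular grid first.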

### Loading and saving MATLAB files 

The SciPy project provides a MATLAB reader and writer that make this easy:

In [None]:
from scipy.io import loadmat, savemat
house = loadmat('examples/house_image.mat')
print(house['__header__'])

In [None]:
s = hs.signals.Signal2D(house['IMin0'])
print(s.metadata)
s.axes_manager

In [None]:
s.plot()
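
Writing works the same way in reverse. This is a minimal sketch of a round trip through `savemat` and `loadmat` using a small synthetic array; the variable name `'image'` and the file name are arbitrary choices for illustration. For a HyperSpy signal, the array to write would be `s.data`.

```python
import os
import tempfile
import numpy as np
from scipy.io import loadmat, savemat

image_out = np.arange(12.0).reshape(3, 4)

with tempfile.TemporaryDirectory() as tmp_dir:
    mat_path = os.path.join(tmp_dir, 'demo_image.mat')
    # savemat takes a dict mapping MATLAB variable names to arrays
    savemat(mat_path, {'image': image_out})
    image_in = loadmat(mat_path)['image']

print(np.array_equal(image_in, image_out))
```

Only the raw array is written this way; axis calibration and metadata are not carried over and would have to be stored as additional variables.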

## "Lazy" signal access

* HyperSpy makes it easy to work with big data (bigger than your system's memory)

* Uses the excellent [`dask`](https://dask.pydata.org/en/latest/) library for chunking operations

* Almost all the regular features of HyperSpy can operate on "lazy" signals (see [User Guide](http://hyperspy.org/hyperspy-doc/current/user_guide/big_data.html))

Comparison with normal loading:

In [None]:
# Load the EDS map lazily:
s = hs.load('examples/SEM_EDS_map.spd', lazy=True)
print(type(s.data))

In [None]:
# Print some statistics about memory usage
print("Full dataset should consume:", s.data.nbytes / 1e6, 'MB')
print("Chunk sizes are:", s.data.chunks)
one_chunk = s.data[:s.data.chunks[0][0], :s.data.chunks[1][0],:s.data.chunks[2][0]]
print("Memory use from one chunk: ", one_chunk.nbytes / 1e6, "MB")

## Saving data from HyperSpy &mdash; HDF5

* The default format for HyperSpy data is an `.hspy` file in [HDF5](https://portal.hdfgroup.org/display/HDF5/HDF5) format

* Open, hierarchical data format supporting compression and full read/write capability

* All HyperSpy signals can be saved as `.hspy` files

* Saves full metadata about signal, including critical processing parameters
  * Modeling, signal separation, elemental information

## Saving data from HyperSpy &mdash; data interchange

* Other formats can be easily written:
  * Single spectra &mdash; `.msa` format
  * Images &mdash; TIFF, JPG, etc.
  * Spectrum images &mdash; Lispix-style `.rpl`/`.raw` pairs

## <a id='EM'></a> Electron microscopy-specific tools

* HyperSpy is incredibly flexible, but was developed from a microscopy perspective

* Has in-depth features related to image, EDS, and EELS processing
  * Many of the tools are applicable to multiple modalities

* Some other EM tools available:
    * Dielectric function analysis (for plasmon EELS)
    * Electron holography
    * "Extension" projects that build upon HyperSpy (like Andy's `tomotools`)

* Provides a robust framework on which to develop new processing pipelines

## <a id='EDS'></a> EDS Processing

* EDS support is implemented as `EDSSpectrum`, a subclass of `Signal` for EDS-specific features

* Open metadata structure holds relevant info about instrument and detectors:

In [None]:
s = hs.datasets.example_signals.EDS_TEM_Spectrum()
s.metadata.Acquisition_instrument.TEM.Detector

* Also holds all the compositional information:

In [None]:
print(s.metadata.Sample.elements)

# Elements can be added easily:
s.add_elements(['Cu'])
print(s.metadata.Sample.elements)

### Processing tools

* All the "basic" EDS processing tools are included:
    * Background removal
    * Net intensity line map extraction
    * Quantification using Cliff-Lorimer (k-factors), $\zeta$-factors, and ionization cross sections

* Can also use the general HyperSpy tools for more advanced analysis:
    * Curve fitting
    * Machine learning
        * Factor reduction
        * Signal separation ("phase mapping")

* Look to the extensive documentation in the [User Guide](http://hyperspy.org/hyperspy-doc/current/user_guide/eds.html) and [Tutorials](https://github.com/hyperspy/hyperspy-demos/tree/master/electron_microscopy/EDS) for help

## <a id='EELS'></a> EELS Processing

* <mark>To be completed by Andy</mark>

## <a id='extending'></a> Extensibility of HyperSpy

* <mark>To be completed by Andy</mark>

# <a id='demos'></a> Interactive demos

* Curve fitting (and its application to EELS spectrum images)

* Processing TEM EDS data

* Extensibility (Andy's `tomotools` package)