# An Introduction to HyperSpy:
## The multi-dimensional data analysis toolbox

### <br/>
### Josh Taillon and Andy Herzing
#### *April 5, 2018*

# Notes before the presentation:
* Open separate instances of the following (all in `hyperspy` conda env):
    * Jupyter QtConsole
    * HyperSpyUI
    * JupyterLab
    * Jupyter Notebook
    * Spyder (with `./examples/analysis_script.py` open)
    * Plain anaconda console in `./examples/` directory
    

#### Import hyperspy

In [3]:
%matplotlib nbagg
import hyperspy.api as hs

#### Disable warnings for presentation:

In [4]:
import logging
hs_logger = logging.getLogger('hyperspy') 
hs_logger.setLevel(logging.ERROR)

## A quick note first:

## This isn't your parents' PowerPoint...

## ...because everything is interactive!

In [None]:
import datetime
import time
datestring = datetime.datetime.now().strftime('%B %d, %Y')
for c in 'Today is {}!'.format(datestring):
    print(c, end='')
    time.sleep(.2)

## Made possible with:

* Jupyter notebook &mdash; https://jupyter.org/

* RISE (Reveal.js IPython/Jupyter Slideshow Extension) &mdash; https://github.com/damianavila/RISE

# Introduction

## What is HyperSpy?

* Open-source Python library for interactive data analysis of multi-dimensional datasets

* Makes it easy to operate on multi-dimensional arrays as you would a single spectrum (or image)

* Easy access to cutting-edge signal processing tools 

* Modular structure makes it easy to add custom features

## History of HyperSpy

* Developed by [Francisco de la Peña](https://scholar.google.com/citations?user=5n2c_fYAAAAJ&hl=en) from 2007 to 2012 as part of his Ph.D. thesis

* Originally called EELSLab:

<center><img src="img/eelslab.png" width=500px></center>

* Open-sourced (on [GitHub](https://github.com/hyperspy/hyperspy)) in 2010

* Renamed to HyperSpy in 2011

* Now... over 100 citations, and rapidly growing!

## Design philosophy of HyperSpy

* HyperSpy is a Python library, rather than a standalone program
    * Part of the greater scientific Python ecosystem

* Requires, and gives access to, the scientific Python stack (e.g. `numpy` and `scipy`)

* Data storage is in an open hierarchical format (HDF5)

* Analysis done via reproducible notebooks

* Feature development is completely open-source

## How we came to love HyperSpy

### Josh:

* Became interested in multivariate statistical analysis of EELS spectrum images

* No easy way to do that in commercial software

* The entire scientific Python ecosystem is available from HyperSpy &mdash; <br/> machine learning, clustering, signal separation, etc.

* Came for the data analysis, stayed because of the community

### Andy:

* <mark>To be filled-in</mark>

* <mark>To be filled-in</mark>

* <mark>To be filled-in</mark>

* <mark>To be filled-in</mark>

# Getting Started

## Installation

* Easiest method on Windows &mdash; HyperSpy bundle
  * http://hyperspy.org/download.html#windows-bundle-installers
  * Installs a Python distribution with HyperSpy included
  * Best method if you have no prior Python experience

* For more control (on Windows, Mac, and Linux) &mdash; Anaconda Python
  * https://www.anaconda.com/download/
  * After installing Anaconda, simply run `conda install hyperspy`
  * This method is preferred by the developers

## How to use HyperSpy?

* Console/Command line

* Integrated development environment (IDE)

* **Jupyter Notebook** (and JupyterLab)

* HyperSpyUI

## Important note:


<center>*Because HyperSpy is a library, all of these are just generic ways to access Python, and not specific to HyperSpy! <br/>(except the last one)*</center>

## Console/Command line

The simplest way to run HyperSpy is to execute a pre-written script directly from the command line:

```
$ python analysis_script.py
```

There are also "advanced Python interpreters", such as Jupyter QtConsole, `bpython`, `ipython`, etc.

## Integrated Development Environments

* Spyder (live example)
* PyCharm
* NetBeans

## Jupyter Notebook

The Jupyter project (https://jupyter.org) exists to:

"...develop open-source software, open-standards, and services for interactive computing across dozens of programming languages."

The "Notebook" is a human-readable format for storing both the inputs and outputs of code (see https://en.wikipedia.org/wiki/Notebook_interface)...

The notebook concept was inspired by Mathematica and Maple, and has since been adopted for many languages

Quick detour from the slideshow to show the notebook interface...

#### **Features of the notebook:**

* Separation of the kernel (for calculation) and the front-end (for display)

* Runs completely in the web-browser (no special software needed)

* Kernel can be run on a central server, with users connecting through a web browser

* `.ipynb` files are JSON format and can be versioned

* Language-agnostic (can be used with Python, R, Java, Julia, etc.)
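
Because a notebook is plain JSON, it can be inspected with nothing more than Python's standard `json` module. The sketch below builds a minimal notebook following the nbformat 4 layout (the cell contents are made up for illustration) and lists the cell types after a round trip through JSON text.

```python
import json

# A minimal notebook in the nbformat 4 layout:
# top-level metadata plus a list of cells.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["#### Import hyperspy"],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["import hyperspy.api as hs"],
        },
    ],
}

# Serialize like any other JSON document; indentation keeps diffs readable.
text = json.dumps(notebook, indent=1)

# Reading it back needs nothing but the json module.
loaded = json.loads(text)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

Being plain text is what makes notebooks workable under version control, although stored cell outputs can make the diffs noisy.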

## JupyterLab

* An exciting new project that is more fully featured than the Notebook interface and is intended to eventually replace it

* Aims to be an IDE like Spyder or RStudio, but running within the browser

* Incorporates notebooks, the terminal, text editor, file browser, rich outputs, etc. into one interface

* Detour for a short look at JupyterLab

## HyperSpyUI (https://github.com/hyperspy/hyperspyui)

* Developed in parallel to HyperSpy as a more "user-friendly" experience

* Many commonly used features from HyperSpy are available

* Detour for a short look at HyperSpyUI (loading an EELS signal, viewing metadata, signal separation)

* Most users work in Jupyter notebooks, but the UI is useful for quick investigations, or for those without programming experience

## How to get help?

* Detailed user guide and documentation: http://hyperspy.org/hyperspy-doc/current/user_guide/index.html

* Tutorials and demos: https://github.com/hyperspy/hyperspy-demos

* User group list: [hyperspy-users@googlegroups.com](https://groups.google.com/forum/#!forum/hyperspy-users)

* Gitter chat: https://gitter.im/hyperspy/hyperspy

* If all else fails, Andy and Josh

## The Signal Class

The "heart" of HyperSpy's data structure

## Structure of a `Signal`

* `Signal` is a wrapper around the raw data

* Data is stored in a `numpy` array

* Separation of Navigation and Signal axes

* <mark>Andy, want to fill this in?</mark>
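
The split between navigation and signal axes can be sketched with plain `numpy`, using the same array shape as the random test signal in the setup code of this notebook. This only illustrates the idea and is not HyperSpy's API. The variable names are chosen for this example, and it assumes a one-dimensional signal whose spectrum sits on the last array axis.

```python
import numpy as np

# A 10 x 10 grid of positions, each holding a 100-channel spectrum.
data = np.random.rand(10, 10, 100)

# For a 1D signal, the last array axis is the signal axis;
# the remaining axes are used for navigation.
navigation_shape = data.shape[:-1]
signal_shape = data.shape[-1:]
print(navigation_shape, signal_shape)

# Picking one navigation position yields a single spectrum.
spectrum = data[3, 7]

# Summing over the navigation axes gives one summed spectrum...
sum_spectrum = data.sum(axis=(0, 1))

# ...while summing over the signal axis gives an intensity map.
intensity_map = data.sum(axis=-1)
```

A `Signal` keeps track of this distinction for you, so an operation can be written for one spectrum (or image) and applied across every navigation position.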

## The `Signal` class offers flexibility...

* <mark>Andy, want to fill this in?</mark>

## Getting your data in (and out) of HyperSpy

* Many data readers have been written for experimental tools:

<center><img src="img/formats.png" width=600px></center>

# Supplementary information and setup code

In [2]:
%matplotlib nbagg
import hyperspy.api as hs
import numpy as np
hs.signals.Signal1D(np.random.rand(10,10,100)).plot()




#### Downloading and creating test signal:

In [14]:
import hyperspy.api as hs
hs.datasets.eelsdb(spectrum_type='coreloss', formula="B4C")[0].save('examples/EELS_signal_B4C.hdf5')
hs.datasets.eelsdb(spectrum_type='lowloss', title="Silicon Dioxide Amorphous")[0].save('examples/EELS_signal_SiO2_ll.hdf5')

Overwrite 'examples/EELS_signal_B4C.hdf5' (y/n)?
y


In [59]:
import hyperspy.api as hs
from skimage.data import astronaut
s = hs.signals.Signal1D(astronaut())

# Calibrate the image
s.axes_manager[0].name = "width"
s.axes_manager[0].scale = 0.13
s.axes_manager[0].offset = -29.2
s.axes_manager[0].units = "cm"

s.axes_manager[1].name = "height"
s.axes_manager[1].scale = 0.13
s.axes_manager[1].offset = -12.9
s.axes_manager[1].units = "cm"

s.axes_manager[2].name = "RGB"
s.to_signal2D().save("astronaut.hdf5")

Overwrite 'astronaut.hdf5' (y/n)?
y


In [60]:
from urllib.request import urlretrieve  # only needed for the commented-out download below
from zipfile import ZipFile

# This line doesn't work at NIST, but we've packaged the files locally
# files = urlretrieve("https://www.dropbox.com/s/dt6bc3dtg373ahw/machine_learning.zip?raw=1", "./machine_learning.zip")

with ZipFile("../machine_learning.zip") as z:
    z.extractall()