    Version control:
      Last updated Jupyter notebook: 15-05-2017
      Compatible MATRIX versions: 3.3.1, 3.0.
      Python version: 3.6.1
    
    Authors:
      Procopios Constantinou & Tobias Gill
      London Centre for Nanotechnology
      procopios.constantinou.16@ucl.ac.uk
      toby.gill.09@ucl.ac.uk

# STM data summary and analysis platform

### Contents
* [0 - Installing Jupyter Notebook](#0)
* [1 - Data selection](#1)
* [2 - Topography analysis](#2)
* [3 - Spectroscopy I(V) analysis](#3)
* [4 - Current-distance I(z) analysis](#4)
* [5 - Current-imaging-tunneling spectroscopy (CITS) analysis](#5)
* [6 - Supplementary information](#6)

This is a [Jupyter notebook](http://jupyter.readthedocs.io/en/latest/) that summarises and analyses any data obtained from STM experiments performed at the [London Centre for Nanotechnology (LCN)](https://www.london-nano.com/). The raw data can take the form of a topography scan (*.Z_flat* file), a spectroscopy scan (*.I(V)_flat* file) and a current-distance scan (*.I(Z)_flat* file) - all of which are displayed and analysed within this Jupyter notebook.

There are two essential requirements for this Jupyter notebook to run without any issues:
- The initial raw MATRIX files must be converted to flat-files using the [Vernissage](http://www.scientaomicron.com/en/software-downloads-matrix-spm-control/55) software, available from Scienta Omicron, before they can be viewed or analysed by this Jupyter notebook. This also allows you to use Vernissage as a data reduction tool, so that only the *good* and *sensible* data are imported into this Jupyter notebook for viewing and subsequent analysis.
- The *dataPath* variable must hold the path to the parent directory that contains all the data directories (each of which contains the flat-files), and the *modulePath* variable must hold the path to the directory that contains the stm-analysis module.


This Jupyter notebook uses a *minimal* and *simple* interface so that you can get started right away, even with no prior training in the Python language.

## 0 - Installing Jupyter Notebook <a class="anchor" id="0"></a>
While Jupyter runs code in many programming languages, Python is a requirement (Python 3.3 or greater, or Python 2.7) for installing the Jupyter Notebook. For new users, [installing Anaconda](https://www.continuum.io/downloads) is highly recommended. Anaconda conveniently installs Python, the Jupyter Notebook, and other commonly used packages for scientific computing and data science. It is essentially all you need, on either Mac or Windows.

Use the following installation steps:

1. Download [Anaconda](https://www.continuum.io/downloads); Anaconda’s latest Python 3 version (currently Python 3.6.1) is recommended.
2. Install the version of Anaconda that you downloaded, following the instructions on the download page.
3. Once Anaconda has been installed, run the Anaconda (or Anaconda Navigator) application and, on the home page, select the option to install Jupyter Notebook.
4. Congratulations, you have installed Jupyter Notebook and can get started right away! 

*Hint: All you need to know is the < Shift > < Enter > command that runs a cell of Python code within the Jupyter notebook.*

## 1 - Data selection <a class="anchor" id="1"></a>
This first section of the Jupyter notebook is critical, as the data you select here is what will subsequently be displayed and analysed. Furthermore, this is the only section on which all the others depend; the analysis sections themselves run completely independently of one another.


You should make sure the correct file path is written for both the *dataPath* and *modulePath* variables:
- *dataPath*: The path to the directory that holds the **folders** of all the different STM flat-file data.
- *modulePath*: The path to the directory that holds the **stm_analysis.py** script, which provides all the classes and functions needed for the data-viewing and analysis.


If this is done correctly, the code in this section will run smoothly and the output will be a set of IPython button widgets (whose labels are identical to the folder names) that allow you to select which folder of flat-file data should be loaded as the *data* object, which will hold all of the data from the chosen directory. Note that if you select a different data directory during the analysis, all the analysis code will need to be run again (the easiest way to do this is by going to the menu and selecting 'Kernel > Restart and Run all').

The true power of this Jupyter notebook is that it allows you to load in **multiple folders simultaneously**, from **completely different *dataPath* directories**. To exploit this, follow the convention laid out here:
- If you wish to select multiple folders of data from the **same** *dataPath* directory, then you can create multiple *data* objects (labelled *data_1*, *data_2*, ..., *data_N*) from the same *dataPath*, each of which is created by '*data_N = stm.DataSelection(dataPath)*'.
- If you wish to select multiple folders of data from **different** *dataPath* directories, then you can define each *dataPath* explicitly (labelled *dataPath_1*, *dataPath_2*, ..., *dataPath_N*) and then create unique *data* objects (labelled *data_1*, *data_2*, ..., *data_N*) associated with each *dataPath* defined. Each of these is created by '*data_N = stm.DataSelection(dataPath_N)*'.
- Finally, all subsequent viewing and data analysis on all these different *data_N* objects can be performed by passing each *data_N* object through the stm analysis code, within the same cells. This will then display all the output figures adjacent to each other, allowing them to be easily and simultaneously compared.

In [1]:
# Loading in the file path to the stm analysis module
modulePath = '/Users/pconstantinou/Documents/Prog_GitHub/STM_flatfile_analysis/stm_analysis/'
# Loading in the file path to data_1 directories
dataPath_1 = '/Users/pconstantinou/Documents/stm_data/'

In [2]:
# Forcing all figures to be plotted in-line throughout the Jupyter notebook
%matplotlib inline
# Importing all the necessary python modules
import sys                                   # Import the system parameters
sys.path.insert(0, modulePath)               # Add the stm analysis module directory to the module search path
import stm_analysis as stm                   # Import the stm-analysis code

In [3]:
# Define the data objects that will extract all STM data from the selected data directories
data_1 = stm.DataSelection(dataPath_1)

*Hint: If a new dataPath directory is defined, the Python code must be executed from the top again, so that it loads in the changes.*
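
Before creating a *data* object, it can be useful to check which folders inside a *dataPath* actually contain flat-files. The sketch below is only an illustration using the standard library; the helper name `list_data_folders` is hypothetical and is not part of the stm_analysis module, and it assumes that every flat-file name ends in `_flat` (as in *.Z_flat*, *.I(V)_flat* and *.I(Z)_flat*).

```python
import os

def list_data_folders(data_path, suffixes=("_flat",)):
    """Return the sorted names of sub-folders of data_path that hold flat-files.

    Hypothetical helper: a folder is listed if at least one of its files
    ends with one of the given suffixes.
    """
    folders = []
    for name in sorted(os.listdir(data_path)):
        folder = os.path.join(data_path, name)
        if not os.path.isdir(folder):
            continue  # skip loose files sitting directly in data_path
        if any(f.endswith(suffixes) for f in os.listdir(folder)):
            folders.append(name)
    return folders
```

Calling `list_data_folders(dataPath_1)` should then return the same folder names that appear as button labels, provided the assumption about the file suffix holds.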

## 2 - Topography analysis <a class="anchor" id="2"></a>

This section will explore the analysis of the STM topography scans that were obtained from STM experiments. This is done by using the *stm.STT(data_N, type)* function, which loads in the *data_N* object from Section 1 and executes a specific *type* of analysis. All of the relevant Python code executes the analysis in the background and in real-time, as the widget selections are changed. This is the only section that is split into multiple layers, given the large number of permutations that the topography analysis can take. The different layers of the analysis are (i) leveling operations, (ii) image operations, (iii) 1D line-profiles and (iv) fast Fourier transforms. A detailed explanation of each stage of the topography analysis and the operations available is given below.

### 2.1 - Leveling and Image operations

- Global plane
- Local plane
- Linewise subtraction
- Three-point RoI
- Polynomial background removal
- Zeroing the bottom of the STM plot
- Print out the dictionary of information next to the figure

In [5]:
# Analysing all the topography scans for the selected directory
stt_1 = stm.STT(data_1)
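
As an illustration of two of the leveling operations listed above, the following is a minimal NumPy sketch of global plane subtraction and linewise subtraction. It is not the code used inside *stm.STT*; the function names are hypothetical, and the image is assumed to be a 2D array whose rows are the scan lines.

```python
import numpy as np

def subtract_global_plane(z):
    """Fit the plane z = a*x + b*y + c by least squares and subtract it."""
    ny, nx = z.shape
    y, x = np.mgrid[0:ny, 0:nx]
    # Design matrix with one row per pixel: [x, y, 1]
    design = np.column_stack([x.ravel(), y.ravel(), np.ones(z.size)])
    coeffs, *_ = np.linalg.lstsq(design, z.ravel(), rcond=None)
    plane = (design @ coeffs).reshape(z.shape)
    return z - plane

def subtract_linewise(z):
    """Subtract the mean height of every scan line (row) from that line."""
    return z - z.mean(axis=1, keepdims=True)
```

A global plane removes an overall sample tilt, whereas the linewise version removes offsets between successive scan lines; which one is appropriate depends on the image.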

### 2.2 - 1D line profiles

- Line profile between two given points, P1 and P2, specified in nanometers.
- Potential to fit a Gaussian or Lorentzian to the line profile and return its maximum height and standard deviation.
- Line profile analysis to fit a sinusoid with Gaussians.

In [6]:
# Defining the line profile across the stm topography scan
stt_line_1 = stm.STT_lineprof(stt_1)
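
To make the idea of a line profile between two points given in nanometers concrete, here is a hedged NumPy sketch. It does not reproduce *stm.STT_lineprof*; the function `line_profile` and its `nm_per_pixel` argument are assumptions made for illustration, with square pixels and bilinear interpolation chosen for simplicity.

```python
import numpy as np

def line_profile(z, p1_nm, p2_nm, nm_per_pixel, n_samples=200):
    """Sample the image z along the straight line from P1 to P2.

    p1_nm and p2_nm are (x, y) positions in nanometers and nm_per_pixel is
    the (assumed isotropic) pixel size. Returns the distance along the line
    in nanometers and the bilinearly interpolated heights.
    """
    (xa, ya), (xb, yb) = p1_nm, p2_nm
    # Convert the sample positions from nanometers to fractional pixel indices
    cols = np.clip(np.linspace(xa, xb, n_samples) / nm_per_pixel, 0, z.shape[1] - 1)
    rows = np.clip(np.linspace(ya, yb, n_samples) / nm_per_pixel, 0, z.shape[0] - 1)
    c0 = np.floor(cols).astype(int)
    r0 = np.floor(rows).astype(int)
    c1 = np.minimum(c0 + 1, z.shape[1] - 1)
    r1 = np.minimum(r0 + 1, z.shape[0] - 1)
    fc, fr = cols - c0, rows - r0
    # Bilinear interpolation between the four neighbouring pixels
    heights = (z[r0, c0] * (1 - fc) * (1 - fr) + z[r0, c1] * fc * (1 - fr)
               + z[r1, c0] * (1 - fc) * fr + z[r1, c1] * fc * fr)
    distance = np.hypot(xb - xa, yb - ya) * np.linspace(0.0, 1.0, n_samples)
    return distance, heights
```

For example, `line_profile(image, (1, 1), (6, 5), nm_per_pixel=0.5)` would sample a profile from P1 = (1 nm, 1 nm) to P2 = (6 nm, 5 nm), where `image` is any 2D height array.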

In [None]:
# Model function for analysing the line profile across the stm topography scan
import numpy as np

def f(x, a, b, c):
    """Sinusoid with amplitude a, angular frequency b and phase c."""
    return a*np.sin(b*x+c)

### 2.3 - 1D line profile statistics

In [None]:
import numpy as np
points = np.array([[1, 1],
                   [6, 5]])

A = stt_line_1.nm2pnt(points[1][0], stt_line_1.topo_data)

np.max(stt_line_1.line_prof_y)

### 2.4 - Fast Fourier Transform

Perform a Fourier transform of the STM topography image produced.
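
A minimal sketch of such a transform with NumPy is given below. The function name `fft_magnitude` and the `nm_per_pixel` argument are assumptions for illustration only; the mean is subtracted first so that the zero-frequency component does not dominate the spectrum.

```python
import numpy as np

def fft_magnitude(z, nm_per_pixel):
    """Return the centred 2D FFT magnitude of z and its frequency axes (1/nm)."""
    z = z - z.mean()  # remove the constant offset (zero-frequency term)
    spectrum = np.fft.fftshift(np.fft.fft2(z))
    # Spatial-frequency axes, shifted so that zero frequency sits in the centre
    ky = np.fft.fftshift(np.fft.fftfreq(z.shape[0], d=nm_per_pixel))
    kx = np.fft.fftshift(np.fft.fftfreq(z.shape[1], d=nm_per_pixel))
    return np.abs(spectrum), kx, ky
```

A periodic surface structure with a real-space period of 2 nm would then appear as a pair of peaks at spatial frequencies of +/- 0.5 nm$^{-1}$.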

### 2.5 - 3D topography profile

1D and 2D FFT filtering

## 3 - Spectroscopy $I(V)$ analysis <a class="anchor" id="3"></a>
This section will explore the analysis of $I(V)$ spectroscopy curves that were obtained from STS experiments. This is done by using the *stm.STS(data_N)* function, which loads in the *data_N* object from Section 1. All of the relevant Python code executes the analysis in the background and in real-time, as the widget selections are changed. A detailed explanation of each stage of the $I(V)$ curve analysis and the operations available is given below.

In [None]:
# Analysing all the STM spectroscopy curves for the selected directory
sts_1 = stm.STS(data_1)

### 3.1 - *Raw $I(V)$ files*
The first step is to load in the raw $I(V)$ curves that you wish to browse through, or analyse, by using the '*Raw $I(V)$ files*' scrollable selection box, located on the far left of the interaction pane. This selection box gives you flexibility in how the data is selected:
- Single $I(V)$ files can be loaded by simply clicking on the file you wish to view.
- Multiple $I(V)$ files can be loaded in simultaneously using three methods:
    1. <*Ctrl*> (or <*Command*> on Mac) *clicking* on multiple, individual files loads in specific selections of $I(V)$ curves.
    2. <*Shift*> *clicking* allows entire blocks of $I(V)$ files to be loaded in.
    3. <*Ctrl*> *+ A* (or <*Command*> *+ A* on Mac) loads in every single $I(V)$ file.

*Note: current tests have been performed with over 400 $I(V)$ curves loaded in simultaneously and the analysis runs fine, but it may take about 5-10 s to execute completely.*

The analysis takes into consideration all of the $I(V)$ curves selected, even if they have *different bias ranges* and *grid spacings* in their respective voltage domains:
- If multiple $I(V)$ curves are selected with different *bias ranges*, the Python program automatically determines a mutually consistent domain between all the selected $I(V)$ curves.
- If any of the selected $I(V)$ curves then have different *grid spacings*, a linear interpolation is performed, so that they can be sampled onto the mutually consistent voltage domain.

Therefore, the Python program essentially cross-checks all of the selected $I(V)$ curves to ensure consistency in the voltage domain *range* and *grid spacing*.
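
The two steps above (finding a shared bias range, then interpolating onto a shared grid) can be sketched with NumPy as follows. This is an illustration rather than the stm_analysis implementation: the function name is hypothetical, and two assumptions are made, namely that the shared domain is the overlap of all bias ranges and that the shared grid uses the finest spacing found among the curves.

```python
import numpy as np

def resample_to_common_domain(curves):
    """Resample (V, I) curves onto one shared voltage grid.

    curves is a list of (V, I) array pairs. Returns the shared voltage grid
    and a 2D array with one resampled current curve per row.
    """
    v_min = max(np.min(v) for v, _ in curves)  # lower edge of the overlap
    v_max = min(np.max(v) for v, _ in curves)  # upper edge of the overlap
    if v_min >= v_max:
        raise ValueError("The selected curves share no common bias range")
    # Assumption: use the finest grid spacing among all curves
    dv = min(np.min(np.abs(np.diff(v))) for v, _ in curves)
    n_points = int(round((v_max - v_min) / dv)) + 1
    v_common = np.linspace(v_min, v_max, n_points)
    currents = []
    for v, i in curves:
        order = np.argsort(v)  # np.interp needs increasing x values
        currents.append(np.interp(v_common, v[order], i[order]))
    return v_common, np.array(currents)
```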

*Note: The voltage domain of the $I(V)$ curves can be selectively controlled by using the 'Restrict $V_{bias}$' slider, which is provided so that the $I(V)$ curves can be easily cropped. This allows you to do two things: (i) any anomalous data or maxed out data can be rejected, or (ii) the voltage bias domain can be restricted so that the maximum tunneling current is identical in the +/- bias regimes. Any $I(V)$ data that is cropped out is displayed as grey points in the corresponding 'Raw $I(V)$' figures.*


### 3.2 - *$I(V)$ analysis*
The $I(V)$ spectroscopy analysis is split up into three main constituents:

**1. Intermediate plots:**
This performs the full analysis on all the $I(V)$ curves that have been selected and its corresponding figure shows each stage of the analysis, which follows the steps outlined below:
- **Averaging**: A global average is determined from all the $I(V)$ curves that have been selected. If '*Both*' traces are selected, then the average is taken over all the *traces* and *retraces*. The '*Traces*' option allows the data analysis to be executed over just the '*Trace*' or '*Retrace*' $I(V)$ curves, if necessary.
- **Smoothing**: There is the option to apply no smoothing at all, and there are two additional options for either *Binomial* or *Savitzky-Golay* smoothing. Binomial smoothing is effectively a Gaussian filter and the '*Smoothing order*' option controls the window size over which it is performed. Savitzky-Golay smoothing is the default method, as it is found to smooth the raw $I(V)$ data much better. The '*Smoothing order*' option here controls the order of the running polynomial, with a fixed window size of 51 points.
- **Differentiating**: The $dI/dV$ curve that is displayed is the derivative of the averaged and smoothed raw $I(V)$ data that has been selected. There are two important features that are included in the $dI/dV$ curve: (i) the entire $dI/dV$ curve has been *globally offset along the y-axis* by 1.1 times its minimum value, directly after differentiation, to ensure that there are no negative data points (as they would not be displayed on the semi-log plot of $dI/dV$); (ii) the variance is calculated as the difference between the mean of the $[I(V)]^2$ curves and the square of the mean of the raw $I(V)$ curves, i.e. $\langle I^2 \rangle - \langle I \rangle^2$.
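
The averaging, smoothing and differentiating steps can be sketched as below. This is a simplified illustration, not the notebook's own code: it uses binomial smoothing only, the function names are hypothetical, the '*Smoothing order*' is assumed to be the number of passes of a [1/4, 1/2, 1/4] kernel, and the offset is assumed to be applied only when the minimum of $dI/dV$ is negative.

```python
import numpy as np

def binomial_kernel(order):
    """Binomial kernel of length 2*order + 1, normalised to unit sum."""
    kernel = np.array([1.0])
    for _ in range(order):
        kernel = np.convolve(kernel, [0.25, 0.5, 0.25])
    return kernel

def analyse_iv(v, currents, order=2):
    """Average, smooth and differentiate a set of I(V) curves.

    v is the shared voltage grid and currents is a 2D array with one curve
    per row. Returns the mean, the variance, the smoothed mean and dI/dV.
    """
    mean_i = currents.mean(axis=0)
    variance = (currents ** 2).mean(axis=0) - mean_i ** 2  # <I^2> - <I>^2
    # Pad with the edge values so the smoothed curve keeps its length
    padded = np.pad(mean_i, order, mode="edge")
    smooth_i = np.convolve(padded, binomial_kernel(order), mode="valid")
    didv = np.gradient(smooth_i, v)
    if didv.min() < 0:
        # Shift the curve upwards so every point is positive on a log axis
        didv = didv - 1.1 * didv.min()
    return mean_i, variance, smooth_i, didv
```

After the shift, the smallest $dI/dV$ value equals 0.1 times the magnitude of the original minimum, so the whole curve can be drawn on a semi-log axis.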


**2. Point STS:**
This performs the analysis generally associated with spectroscopy curves that were taken over specific points on the sample surface and its corresponding figure shows all the raw $I(V)$ curves selected, along with the best estimate for the $dI/dV$ curve and its band-gap.

The most important feature associated with this analysis is the '*Band-gap*' slider, which allows you to selectively define the location and range of the best estimate of the band gap. The Python program then determines the 1$\sigma$ and 2$\sigma$ estimations, based on the band-gap you have defined. The band-gap calculations are as follows:
- The voltage domain of the band-gap is directly selected and its length, along the voltage domain, defines the band gap. The constant y-axis position is determined by taking the average of the $dI/dV$ curve over everything that is *within the band-gap window*.
- The 1$\sigma$ and 2$\sigma$ values of $dI/dV$ are then determined directly from the standard deviation of the $dI/dV$ data that lies *within the band-gap window*, and this can be transposed to get the associated values of the 1$\sigma$, and 2$\sigma$, VBM and CBM positions.
- All the information associated with the band-gap calculations is shown to the right of the corresponding figure.
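
One possible reading of these band-gap calculations is sketched below. It is an interpretation rather than the notebook's implementation: the function name is hypothetical, the voltage grid is assumed to be in ascending order, and the VBM and CBM are assumed to be the first points outside the selected window at which $dI/dV$ rises above the window mean plus $n\sigma$.

```python
import numpy as np

def band_gap_estimate(v, didv, window, n_sigma=1.0):
    """Estimate the VBM, CBM and band gap from a hand-selected window.

    window is (v_low, v_high), the band-gap range chosen with the slider.
    """
    v_low, v_high = window
    inside = (v >= v_low) & (v <= v_high)
    floor = didv[inside].mean()   # constant y-position inside the gap
    sigma = didv[inside].std()    # spread of dI/dV inside the gap
    threshold = floor + n_sigma * sigma
    # Assumption: band edges are where dI/dV first exceeds the threshold
    below = np.where((v < v_low) & (didv > threshold))[0]
    above = np.where((v > v_high) & (didv > threshold))[0]
    vbm = v[below[-1]] if below.size else v_low
    cbm = v[above[0]] if above.size else v_high
    return {"floor": floor, "sigma": sigma, "vbm": vbm, "cbm": cbm, "gap": cbm - vbm}
```

Calling the function with `n_sigma=1.0` and `n_sigma=2.0` would give the two estimates; under these assumptions the 2$\sigma$ gap is never narrower than the 1$\sigma$ gap.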

*Note: The band-gap calculator is very sensitive to the quality of the data that is used, and you should always aim for a $dI/dV$ curve that looks like a 'V' or 'U'. Bad quality $dI/dV$ curves are ones that look like an 'M', with minima that are much lower than that of the band-gap itself. To rectify this issue with bad data, it is recommended to cut the domain of the $I(V)$ curves (using the 'Restrict $V_{bias}$' slider), such that these spurious regions are removed from the edges. This ensures that the band-gap calculation works as intended.*


**3. Line STS:**
This performs the analysis generally associated with spectroscopy curves that were taken along specific line-profiles on the sample surface (usually over some defect or step edge) and its corresponding figures show all the stacked $dI/dV$ curves in comparison to the mean $dI/dV$ curve, as well as an image formed from a train of the $dI/dV$ data.

The two figures associated with this analysis demonstrate how the $dI/dV$ curves change as a function of the selected $I(V)$ curves:
- The *left* figure shows a comparison between all the individual $dI/dV$ curves and their corresponding mean. This gives an illustration of how the $dI/dV$ curves change over the different $I(V)$ curves selected.
- The *right* figure shows a train of all the $dI/dV$ curves, stacked in ascending order of file name. This gives a direct illustration of how the $dI/dV$ curves change over the different $I(V)$ curves selected, in the form of a CITS slice. Additionally, the VBM and CBM edges from the previous band-gap calculations performed in the 'Point STS' analysis are displayed and, provided you have a sufficient number of $I(V)$ curves (~50+), the band-gap can be checked for consistency from the image.
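
The image in the right-hand figure can be thought of as a 2D array with one $dI/dV$ curve per row. The following is a small sketch under stated assumptions: the function name and the dictionary input are hypothetical, all curves are assumed to share one voltage grid, and the ordering is the plain alphabetical order of the file names.

```python
import numpy as np

def stack_didv(didv_by_file, vmax=None):
    """Stack dI/dV curves into an image, one row per file.

    didv_by_file maps a file name to its dI/dV curve. If vmax is given,
    values above it are capped, which changes the contrast of the image.
    """
    names = sorted(didv_by_file)  # ascending (alphabetical) file-name order
    image = np.vstack([didv_by_file[name] for name in names])
    if vmax is not None:
        image = np.minimum(image, vmax)
    return names, image
```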

*Note: This analysis section is versatile because it can be used to compare various $I(V)$ curves that were taken over either identical or different regions and to see directly how the $dI/dV$ curves change. Hence, it allows you to get good estimates of the band-gap, given that you have obtained a sufficient number (~50+) of repeats, and also to identify any surface states that exist when $I(V)$ curves are taken along a pristine-defect-pristine line.*



### 3.3 - *Axes controls*
Finally, the '*Axes Controls*' are located on the far-right of the interaction pane and the default condition is that all the axes will *auto-scale* to a sensible size, given the selected $I(V)$ files that have been loaded in. If you wish to change the limits on both the *x*- and *y*-axes directly, you can do this by selecting the '*Axes limit*' button:
- The voltage bias *V* slider simply controls the limits over the voltage-domain for all of the figures.
- The tunneling current *I* slider controls the symmetrical value of the tunneling current limit along the *y*-axes of the figures.
- The *dI/dV* slider controls the maximum value of *dI/dV* that appears in the figures and, by default, its minimum is set to the minimum *dI/dV* value. Note that you can also use the '*dI/dV*' slider to actively control the contrast of the image formed in the *Line STS* analysis section.


*Note: If the axes are made smaller than the data being displayed in the figure, the data **does not get deleted or chopped**; it simply remains invisible, off the axis.*

## 4 - Current-distance I(z) analysis <a class="anchor" id="4"></a>

This section will explore the analysis of $I(Z)$ curves by using the *stm.STZ(data_N)* function, which loads in the *data_N* object from Section 1. All of the relevant Python code executes the analysis in the background and in real-time, as the widget selections are changed. A detailed explanation of each stage of the $I(Z)$ curve analysis and the operations available is given below.

In [None]:
# Analysing all the STM current-distance I(Z) curves for the selected directory
stz_1 = stm.STZ(data_1)

The zero point of the $I(Z)$ curve is the position of the tip when it reaches the set-point. It does not give the tip-sample distance! However, the $I(Z)$ curve can be used as a calibration, provided that the same set-point is used for all the scans.

Make a plot of $I(Z)$ as a function of the voltage bias; from this you can make a plot of $\kappa$ as a function of the voltage bias, which then gives you the work function as a function of voltage bias. From that you can extract the extent of the band bending as a function of the voltage bias, which you can then potentially use to correct the geometry of a CITS scan too.
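
As a rough illustration of how $\kappa$ and a work function could be extracted from a single $I(Z)$ curve, the sketch below assumes the standard one-dimensional tunneling model $I(z) = I_0 e^{-2\kappa z}$ with $\kappa = \sqrt{2 m_e \phi}/\hbar$, and that $z$ increases as the tip retracts. Neither the model nor the function name `fit_kappa` is taken from the stm_analysis module, and the fitted $\phi$ is an apparent barrier height rather than a true work function.

```python
import numpy as np

HBAR = 1.054571817e-34   # reduced Planck constant (J s)
M_E = 9.1093837015e-31   # electron mass (kg)
EV = 1.602176634e-19     # one electronvolt (J)

def fit_kappa(z_nm, current):
    """Fit ln|I| = ln(I0) - 2*kappa*z to an I(Z) curve.

    z_nm is the relative tip displacement in nanometers. Returns kappa in
    1/m and the apparent barrier height phi in eV.
    """
    slope, _ = np.polyfit(z_nm, np.log(np.abs(current)), 1)
    kappa = -slope / 2.0 * 1e9          # convert from 1/nm to 1/m
    phi_ev = (HBAR * kappa) ** 2 / (2.0 * M_E) / EV
    return kappa, phi_ev
```

Repeating the fit for $I(Z)$ curves taken at different bias voltages would give $\kappa$, and hence the apparent barrier height, as a function of the voltage bias.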

## 5 - Current-imaging-tunneling spectroscopy (CITS) analysis <a class="anchor" id="5"></a>


CITS scans are completely separate from all other scans.
Use MATLAB to get a 3D image of the CITS map.


### 5.1 - Raw CITS scans


### 5.2 - Topography corrected CITS (fixed kappa)

### 5.3 - Topography corrected CITS (using deltaz)

## 6 - Supplementary information  <a class="anchor" id="6"></a>

- *If you perform all the analysis and then change the selected folder in '1 - Data Selection' you will need to run the code consecutively again.*

- *If you want to load in multiple files from different directories, this can be done by creating a separate data object for each directory, as described in '1 - Data selection'.*

- *If you double click on any of the figures produced, it will zoom in.*

- *Do not save CITS files in the same folder as STS curves because they have the same '.I(V)_flat' format, which the Python program cannot distinguish between. Instead, create a separate CITS directory with all the CITS scans placed inside that.*