<style>
code {
    color:red !important;
}
</style>

# Explore

Explore is a tool that allows you to compute a dataflow graph in Matlab.

## Setup

We clear and set up the paths.

In [1]:
warning('off');
rmpath(genpath([pwd filesep '..' filesep '..' filesep 'test']));
rmpath(genpath([pwd filesep '..' filesep '..' filesep 'example']));
warning('on');
addpath(genpath([pwd filesep '..' filesep '..' filesep 'src']));
addpath(genpath([pwd filesep '..' filesep '..' filesep 'example' filesep 'fcn']));




## Session

You need to define a session by its name. This allows you to retrieve the session from other notebooks or experiments. A session contains the history of the launched nodes of the dataflow graph. The session is created in the context of the calling function, here `nbsession()`.

In [9]:
sess = 'sess1';
explo = nbsession(sess);

Explore folder "C:\Users\jahsue\explo"
Retrieve existing work session "sess1" for context "nbsession_26a0b20876a1f9a2e4df2a996232c94a"



Depending on the session (which acts as a context), you can also switch nodes. A concrete example is switching the data node depending on whether you are experimenting on the entire data set or on only a few samples, as may be the case when you are debugging your code.

In [10]:
switch sess
    case 'sess1'
        fcn = @fcnTestexp1;
    case 'sess2'
        fcn = @fcnTestexp1bis;
end




## Pipe

You need to define the pipes you will use. A pipe identifies the data that is put in it. When another node needs that data for its computation, it checks the signature of the data before loading it and compares it with the signature used during the last computation. If the code and its dependencies have not changed and the input signatures have not changed, the node is not recomputed and its result is retrieved instead.
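The decision described above can be sketched as follows. This is a minimal illustration, not Explore's actual code: the `prev` and `curr` structs and their signature values are hypothetical, and the label `COMPUTE (inputs changed)` is invented for the sketch (the other two labels appear in the run log of this notebook).

```matlab
% Hypothetical signatures recorded during the last computation of a node.
prev.fcn    = 'a1b2';
prev.inputs = {'s-111', 's-222'};

% Hypothetical signatures observed now: same function, second input changed.
curr.fcn    = 'a1b2';
curr.inputs = {'s-111', 's-999'};

fcnChanged    = ~strcmp(prev.fcn, curr.fcn);
inputsChanged = ~isequal(prev.inputs, curr.inputs);

% Recompute only when the function or one of its inputs changed.
if fcnChanged
    decision = 'COMPUTE (function changed)';
elseif inputsChanged
    decision = 'COMPUTE (inputs changed)';
else
    decision = 'RETRIEVE (same signatures)';
end
disp(decision);
```

Checking the function signature first mirrors the order of the log output, where the function signature is computed before any input is loaded.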


Every time you create a pipe, you need to define its signature type. For the moment, you can choose between two types: <br>
1) **Matfile** (`'matfile'`): it hashes the payload of the Matlab file and saves the hash as the signature. A description of the MAT-file format (including the header and payload definitions) can be found [here](https://maxwell.ict.griffith.edu.au/spl/matlab-page/matfile_format.pdf).<br>
2) **Date** (`'date'`): it uses the save date of the Matlab file as the signature. This kind of signature is suited to big data files.
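The two signature types can be approximated with a few lines of plain Matlab. This is a sketch under stated assumptions, not Explore's implementation: it assumes an uncompressed Level 5 MAT-file (saved with `-v6`), whose header is 128 bytes long, it assumes Matlab is running with its JVM so that `java.security.MessageDigest` is available, and the choice of MD5 is arbitrary.

```matlab
% Write a small MAT-file to a temporary location (illustrative data).
file = [tempname '.mat'];
x = magic(4);
save(file, 'x', '-v6');            % uncompressed Level 5 MAT-file

% 'date'-like signature: the modification time of the file; no content is read.
info    = dir(file);
dateSig = info.datenum;

% 'matfile'-like signature: hash of the bytes that follow the 128-byte header.
fid   = fopen(file, 'r');
bytes = fread(fid, Inf, '*uint8');
fclose(fid);
payload = bytes(129:end);          % skip the header, which embeds a creation date

md = java.security.MessageDigest.getInstance('MD5');
md.update(payload);
digest = typecast(md.digest(), 'uint8');
matSig = lower(reshape(dec2hex(double(digest), 2)', 1, []));

delete(file);
fprintf('date signature: %s\nmatfile signature: %s\n', datestr(dateSig), matSig);
```

The date signature costs almost nothing regardless of file size, while the content hash has to read the whole file, which is why the former suits large files.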

In [11]:
explo.addPip('s1','matfile');
explo.addPip('s2','date');
explo.addPip('s3','auto');




## Variable

In [12]:
explo.addVar('v1','s3');




## Node

In [13]:
explo.addFcn('m1',fcn,{},{'s1','s2'},'class','branch');
explo.addFcn('m2',@fcnTestexp2,{'m1_s2','m1_s1','v1_s3'},{'s1'});
explo.addFcn('m3',@fcnTestexp3,{'m1_s1','m2_s1'},{'s3'},'class','leaf');
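
Each input name in the calls above, such as `'m1_s1'`, joins a node name and a pipe name with an underscore. The following is a minimal sketch of how such names could be resolved to function arguments. It is not Explore's implementation: the `cache` map, its values, and the stand-in function `combine` are assumptions made for illustration.

```matlab
% Hypothetical cache of node outputs, keyed by '<node>_<pipe>' (values are made up).
cache = containers.Map();
cache('m1_s1') = 2;
cache('m2_s1') = 5;

% Split one variable name into its node and pipe parts.
parts = strsplit('m2_s1', '_');
node  = parts{1};   % 'm2'
pipe  = parts{2};   % 's1'

% Stand-in for a user function with two inputs and one output.
combine = @(a, b) a + b;

% Resolve the declared input names to values, in order, then call the function.
inputs = {'m1_s1', 'm2_s1'};
args   = cellfun(@(name) cache(name), inputs, 'UniformOutput', false);
result = combine(args{:});

% Store the output under '<node>_<pipe>' so that downstream nodes can find it.
cache('m3_s3') = result;
```

Because the underscore is the separator, a node or pipe name that itself contained an underscore would make the split ambiguous.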




## Init

To initialize the graph, call the method below. The magic command <code>%plot native</code> must be called first so that the plot is not closed at the end of the cell (by default, <code>%plot</code> closes the figure and places an image of the axes in the notebook output).

If the plot is closed, you will not be able to run the graph. This was implemented intentionally for the native Matlab IDE to ensure that multiple instances of the graph are not initialized. This will hopefully change in the near future.

In [14]:
%plot native
explo.init();




## Run

Run the graph up to the end node <code>m3</code>. Meanwhile, take a look at the Matlab figure.

In [16]:
explo.setVariable('v1_s3',3);
explo.run('-e:m3');
data = explo.getVariable('m3_s3');

Action     = Save auto [matfile-v6] output "v1_s3" (started on 10-Apr-2019 17:30:06)...
             |--- Elapsed time is 0.0057053 seconds.
Action     = Compute signature of output "v1_s3" (started on 10-Apr-2019 17:30:06)...
             |--- Elapsed time is 0.010947 seconds.
--- Function @fcnTestexp1 "m1" <branch>
Task       =   First test function
Input(s)   = [  ]
Output(s)  = [ m1_s1 | m1_s2 ]
Action     = Compute function signature (started on 10-Apr-2019 17:30:06)...
             |--- Elapsed time is 0.023213 seconds.
Decision   = RETRIEVE (same signatures)...
--- Function @fcnTestexp2 "m2" <branch>
Task       =   Second test function
Input(s)   = [ m1_s2 | m1_s1 | v1_s3 ]
Output(s)  = [ m2_s1 ]
Action     = Compute function signature (started on 10-Apr-2019 17:30:06)...
             |--- Elapsed time is 0.023135 seconds.
Decision   = COMPUTE (function changed)...
Action     = Load input "m1_s2" (started on 10-Apr-2019 17:30:06)...
             |--- Elapsed time is 0.0033732 se

## Figure

After you init the graph with `explo.init()`, you can view different types of information to let you know about the status of the computational graph.


* Status of computations (after `.run()`):<br>
1) A node or edge is gray when the computation is not planned<br>
2) A node or edge is blue when it is planned but not computed yet<br>
3) A node or edge is orange when it is being computed at the current time<br>
4) A node or edge is green when the computation is finished

* Status of functions (after `.run()` and after computation or retrieval):<br>
1) A function label is green when the result is retrieved<br>
2) A function label is orange when the function content has changed, and possibly the inputs as well<br>
3) A function label is blue when the function content is the same but the inputs have changed<br>
4) A function label is red when it was forced

* Information on function arguments (on the figure):<br>
You can view information on the function arguments by clicking on a node to display its tooltip.

* Shape of the nodes (on the figure):<br>
1) A circle node represents a `'branch'` node.<br>
2) A diamond node represents a `'leaf'` node.<br>
3) A square node represents a `'root'` node.

    



## Run Examples

You can also uncomment the following lines to run the nodes with another configuration.

In [17]:
% -- Run one class
% explo.run('-c:root');

% -- Run only needed nodes to compute end node
% explo.run('-e:d4');

% -- Run only needed nodes to compute end node and force their computation
% explo.run('-e:d4-m:f');

% -- Run all processing nodes
% explo.run('-c:root');
% explo.run('-c:branch');

% -- Open the Matlab file in Matlab IDE
% edit('testexp2.m')

% -- Retrieve its absolute path
% which('testexp2.m')

% -- Get the experiment information
 explo.info()

=== I/ CONTEXT
Context name "nbsession_26a0b20876a1f9a2e4df2a996232c94a"
1) Session "sess1": Last init = 10-Apr-2019 17:29:39 | Variable cache = 5 files - 3E+03 bytes

=== II/ SESSION HISTORY
Session name "sess1"
1) Init (10-Apr-2019 17:29:39): 3 functions | 5 variables | 3 pipeline classes | File = "20190410T172939099.mat"
--- Execution (10-Apr-2019 17:29:43): Command = "-e:m3" | Status = "finished" | Duration = 7.6427 seconds | Error = 1
--- Execution (10-Apr-2019 17:30:06): Command = "-e:m3" | Status = "finished" | Duration = 3.7336 seconds | Error = 0

=== III/ SESSION VARIABLES
1) Variable "m1_s2": File = "m1_s2.mat" (10-Apr-2019 17:29:50) - 2E+03 bytes | Class = date
2) Variable "m1_s1": File = "m1_s1.mat" (10-Apr-2019 17:29:50) - 2E+02 bytes | Class = matfile
3) Variable "m2_s1": File = "m2_s1.mat" (10-Apr-2019 17:30:08) - 2E+02 bytes | Class = matfile
4) Variable "m3_s3": File = "m3_s3.mat" (10-Apr-2019 17:30:10) - 2E+02 bytes | Class = auto
5) Variable "v1_s3": File = "v1_s3.m

## Experiment Notes

You can declare the experiment using the command `explo = Explore().session('sess')`.
    
* You can create multiple sessions for each context. The context is the full path of the script from which `Explore()` is called, or the command line when it is called from there. All caches and persistent data are defined and saved in the session `'sess'`. If you rename or move the main script file, a new context is created automatically. Changing sessions allows you to switch between different variable caches. This can be used, for instance, when switching from debug (small data set) to production (bigger one).

* You can emulate an experiment creation or retrieval from another context (script or the command line) with the following commands: <br>
1) `explo = Explore('')` emulates the experiment from the command line context<br>
2) `explo = Explore('~/experiments/exp1.m')` emulates the experiment from the script `~/experiments/exp1.m` context<br>
3) `explo = Explore(@exp1)` emulates the experiment from the script handle `@exp1` context<br>

* With the command `explo.info()`, you can get information on the status of variables that are declared, the history of past `.init()` and `.run()` methods and also the different sessions of the context.


## Graph Notes


* The method `explo.addPip('s1','var')` adds a new pipe of type `'var'`. Data stored with this pipe are considered equal when the hash of the variable content is the same. This should be used only for caching variables of reasonable size (<5-10 MB); otherwise the hashing function takes too much time and you lose the benefit of persistent memoization. This type is generally deprecated; please use the `'matfile'` pipe type instead.

* The method `explo.addPip('s2','date')` adds a new pipe of type `'date'`. Data stored with this pipe are considered equal when the save time is the same. This is more efficient with big data sizes (>1 GB).

* The method `explo.addPip('s3','matfile')` adds a new pipe of type `'matfile'`. Data stored with this pipe are considered equal when the content of the `.mat` file, excluding its header, is the same. This is more efficient with standard data sizes (5 MB < size < 1 GB).

* The method `explo.addFcn('m4',@pow1,{'m2_s1','m3_s1'},{'s1'})` adds a new computational node (like Python's decorator) called `'m4'` to the function `@pow1`, mapping the pipe `'s1'` from the decorator `'m2'` to the first input of `@pow1` and the pipe `'s1'` from the decorator `'m3'` to the second input of `@pow1`.

* By default the node class is `'branch'`, but you can change it to `'leaf'` or `'root'`, for example with `explo.addFcn('m1',@data1,{},{'s1','s2'},'class','root')`. A `'root'` node is not executed automatically. This is very useful when working with both small and big data sets. Usually, you use a small data set to validate and debug your scripts, so you do not want to import the data every time but only on demand, by executing the `'root'` class of nodes. To switch to a bigger data set, change the paths to point to it and process the `'root'` nodes again. Typically, a `'leaf'` node should be a display node, i.e., no big computations should happen in it. Therefore, executing the `'branch'` nodes triggers all the big computations without displaying their results (in `'leaf'` nodes) and without building the data set again (in `'root'` nodes).

* Because the naming convention uses an underscore to declare a variable (the output pipe of a node, e.g., `'m2_s1'`), the only restriction is that pipe and node names must not contain any underscores.

* To run the computation graph, use `explo.run()`. By default, this runs all the branch and leaf nodes. You can pass options to the method in order to compute only chosen nodes:<br>
1) Class run: `explo.run('-c:root')` computes only the `'root'` nodes<br>
2) Solo run: `explo.run('-s:m1')` computes or retrieves only node `'m1'`<br>
3) End run: `explo.run('-e:m1')` computes or retrieves only the nodes needed to obtain the output of node `'m1'`<br>
4) Run with another mode: `explo.run('-s:m1-m:f')` forces the computation of node `'m1'` even if the result could be retrieved, i.e., even if the inputs, the function content and its dependencies have not changed.<br>
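
The option strings listed above can be read as a sequence of `-key:value` pairs. The sketch below shows one way such a string could be parsed and used to select nodes. It is a hypothetical illustration, not Explore's parser: the `nodes` struct array is invented, and the end run (`-e:`) is left out because it requires walking the graph dependencies.

```matlab
% Hypothetical graph description: node names and their classes.
nodes = struct('name',  {'m1', 'm2', 'm3'}, ...
               'class', {'root', 'branch', 'leaf'});

% Parse '-key:value' pairs into a struct, e.g. opts.s = 'm1', opts.m = 'f'.
cmd    = '-s:m1-m:f';
tokens = regexp(cmd, '-(\w):(\w+)', 'tokens');
opts   = struct();
for k = 1:numel(tokens)
    opts.(tokens{k}{1}) = tokens{k}{2};
end

% Select the nodes to run according to the parsed options.
if isfield(opts, 'c')          % class run
    selected = nodes(strcmp({nodes.class}, opts.c));
elseif isfield(opts, 's')      % solo run
    selected = nodes(strcmp({nodes.name}, opts.s));
else                           % default: all branch and leaf nodes
    selected = nodes(ismember({nodes.class}, {'branch', 'leaf'}));
end

% Mode 'f' forces computation even when the result could be retrieved.
forced = isfield(opts, 'm') && strcmp(opts.m, 'f');
fprintf('run: %s (forced = %d)\n', strjoin({selected.name}, ', '), forced);
```

With an empty `cmd`, the same code falls through to the default branch and selects `m2` and `m3`, matching the documented behavior of running only branch and leaf nodes.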

