# User Manual: Sparse Identification of Induction Motor Nonlinear Dynamics in Unbalanced Conditions

**Table of Contents**
<!-- TOC -->
* [Sparse Identification of Induction Motor Nonlinear Dynamics in Unbalanced Conditions](#sparse-identification-of-induction-motor-nonlinear-dynamics-in-unbalanced-conditions)
  * [Configuration](#configuration)
    * [Choice of Regularization / Optimizer](#choice-of-regularization-/-optimizer)
    * [Choice of Library](#choice-of-library)
  * [Usage](#usage)
    * [1) Data Generation](#1-data-generation)
    * [2) Data Preparation](#2-data-preparation)
    * [3) Optimization of hyperparameters](#3-optimization-of-hyperparameters)
    * [4) Model Identification](#4-model-identification)
    * [5) Model Evaluation](#5-model-evaluation)
  * [Additional Information](#additional-information)
<!-- TOC -->

## Configuration
### Choice of Regularization / Optimizer
The candidate optimizers for the sparse regression are SR3, Lasso and STLSQ.
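
To make the difference between these options more concrete, the sketch below contrasts the two shrinkage rules commonly associated with them: hard thresholding (the pruning step that STLSQ alternates with a least-squares refit) and soft thresholding (the proximal step behind l1 penalties such as Lasso). It is a plain-Python illustration with made-up coefficients, not the code used in this project.

```python
def hard_threshold(coefs, threshold):
    """Zero every coefficient whose magnitude is below the threshold (STLSQ-style pruning)."""
    return [c if abs(c) >= threshold else 0.0 for c in coefs]


def soft_threshold(coefs, alpha):
    """Shrink every coefficient towards zero by alpha (proximal step of an l1 penalty)."""
    shrunk = []
    for c in coefs:
        magnitude = max(abs(c) - alpha, 0.0)
        shrunk.append(magnitude if c >= 0 else -magnitude)
    return shrunk


coefs = [1.5, -0.02, 0.3, -0.8]        # illustrative values only
pruned = hard_threshold(coefs, 0.1)    # small terms removed, large ones untouched
shrunk = soft_threshold(coefs, 0.1)    # small terms removed, large ones shrunk as well
print(pruned)
print(shrunk)
```

Both rules remove the small second coefficient, but only soft thresholding also biases the surviving coefficients towards zero.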


### Choice of Library
For SINDy to work, a library of candidate functions must be defined. 
PySINDy ships with predefined polynomial and Fourier libraries, and also allows
custom libraries to be defined and fine-tuned for each input variable. 
Several custom libraries are predefined in `libs.py` and can be retrieved by calling `get_custom_library_funcs()`.
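
As an illustration of what a library of candidate functions amounts to, the sketch below evaluates a few hand-picked functions on every variable of every sample and keeps a readable name for each resulting column. The functions, the variable names and the helper `build_library()` are hypothetical and do not reproduce the contents of `libs.py`; PySINDy's `CustomLibrary` accepts comparable lists of functions and name formatters.

```python
import math

# Hypothetical candidate functions and matching name formatters (illustrative only).
library_functions = [
    lambda x: x,
    lambda x: x ** 2,
    lambda x: math.sin(x),
]
function_names = [
    lambda name: name,
    lambda name: f"{name}^2",
    lambda name: f"sin({name})",
]


def build_library(samples, variable_names):
    """Evaluate every candidate function on every variable of every sample."""
    names = [fmt(v) for fmt in function_names for v in variable_names]
    rows = [[f(value) for f in library_functions for value in sample] for sample in samples]
    return names, rows


names, theta = build_library([[1.0, 2.0], [0.5, -1.0]], ["i_d", "i_q"])
print(names)
print(theta[0])
```

Each row of `theta` is one sample of the feature matrix on which the sparse regression is performed.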

In [1]:

    %matplotlib QtAgg
    import matplotlib.pyplot as plt

    import os
    from optimize_parameters import parameter_search, optimize_parameters, plot_optuna_data
    from source import *
    from train_model_source import make_model, simulate_model

## Usage

### 1) Data Generation
The data generation is completely distinct from the other aspects of the project. 
It uses the time-stepping solver IMMEC to model the induction machine and obtain the data.
To generate data, set the parameters in the script `data_generation.py` and run it.
Either test or training data can be generated, resulting in one or multiple (`numbr_of_simulations`) 
simulations respectively. An initial eccentricity can also be set with `ecc_value` (a value between 0 and 1) and `ecc_dir` 
(the direction in $`x, y`$ coordinates). The mode can be set to `linear` or `nonlinear` to simulate the machine.

The script automatically creates a folder named after the current date inside the `train-data` or `test-data` folder, 
and saves the simulation data in a `.npz`-file whose name is set by the user through `save_name`.
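
The resulting file location can be sketched as follows. The helper `build_save_path()` is hypothetical, and the month-day folder format is only inferred from folder names such as `07-29` used later in this manual; the actual script may name its folders differently.

```python
import os
from datetime import date


def build_save_path(base_dir, save_name, test_data=False, today=None):
    """Return <base_dir>/<train-data|test-data>/<MM-DD>/<save_name>.npz (assumed layout)."""
    today = today or date.today()
    folder = "test-data" if test_data else "train-data"
    # The month-day format is an assumption based on folder names such as '07-29'.
    return os.path.join(base_dir, folder, today.strftime("%m-%d"), save_name + ".npz")


path = build_save_path(os.getcwd(), "IMMEC_0ecc_5.0sec", today=date(2024, 7, 29))
print(path)
```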

The `.npz`-file contains the following arrays:
- `i_st` - Stator current
- `omega_rot` - Rotor angular speed
- `T_em` - Electromagnetic torque
- `F_em` - Electromagnetic force or UMP
- `v_applied` - Applied line voltages
- `T_l` - Load torque
- `ecc` - Eccentricity
- `time` - Simulated time
- `flux_st_yoke` - Stator yoke flux
- `gamma_rot` - Rotor angle
- `wcoe` - Magnetic coenergy

If multiple simulations are saved in one file, which is the case for training data, 
the arrays have an additional (third) dimension that indexes the simulations.
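
A quick way to check the contents of such a file is to list the stored arrays and their shapes with NumPy. The sketch below first writes a small stand-in file so that it runs on its own; the `(samples, phases, simulations)` layout used for `i_st` is an assumption made for illustration and should be verified against a real file.

```python
import os
import tempfile

import numpy as np

# Build a small stand-in file; real files come from data_generation.py.
n_samples, n_sims = 100, 3
tmp_path = os.path.join(tempfile.mkdtemp(), "example.npz")
np.savez(
    tmp_path,
    time=np.tile(np.linspace(0.0, 1.0, n_samples)[:, None, None], (1, 1, n_sims)),
    i_st=np.zeros((n_samples, 3, n_sims)),  # assumed layout: (samples, phases, simulations)
)

with np.load(tmp_path) as data:
    for key in data.files:
        print(key, data[key].shape)
    # Select one simulation along the (assumed) third axis.
    i_st_first = data["i_st"][:, :, 0]

print(i_st_first.shape)
```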

In [2]:

    ### DATA TRAINING FILES
    path_to_data_files = os.path.join(os.getcwd(), 'train-data', '07-29-default', 'IMMEC_0ecc_5.0sec.npz')

    ### DATA TEST FILES
    path_to_test_file = os.path.join(os.getcwd(), 'test-data', '07-29', 'IMMEC_0ecc_5.0sec.npz')

In [3]:

    ### Visualise the data
    plot_immec_data(path_to_data_files, simulation_number=10, title="Train")

In [4]:

    plot_immec_data(path_to_test_file, title="Test")

### 2) Data Preparation
In order to create a model, the training data must be prepared. This is done in the script `data_preparation.py`, whose 
`prepare_data()` function is called by the other scripts during training and postprocessing. The user can call this function directly if desired,
but this is not necessary. The function takes the following arguments:
- `path_to_data_file` - Path to the `.npz`-file containing the data
- `test_data` - default False; if True, the extra preparation needed for training data is omitted
- `number_of_trainfiles` - default -1 (all simulations); can be set to a number if not all simulations should be considered. The simulations are then selected at random. This can be useful to reduce the number of training samples for large datasets.
- `use_estimate_for_v` - default False; if True, the `v_abc` are estimated from the line voltages.
- `usage_per_trainfile` - default 0.5; the fraction of the data used from each simulation.
- `ecc_input` - default False; if True, the eccentricity is used as an input variable to the model.

The function returns a dictionary containing the following arrays:
- `x` - Currents
- `u` - Input values; if `ecc_input` is True, the eccentricity is also included
- `xdot` - Time derivative of the currents
- `feature_names` - Names of the features to pass to the SINDy model
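
For a uniformly sampled time vector, a time derivative such as `xdot` can be approximated with finite differences. The sketch below uses `numpy.gradient` on synthetic signals; it only illustrates the idea and is an assumption about the method, since `prepare_data()` may compute the derivative differently.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 1001)               # uniformly sampled time vector
x = np.column_stack([np.sin(2 * np.pi * t),   # stand-ins for two current components
                     np.cos(2 * np.pi * t)])

# Finite differences along the time axis (axis 0), central in the interior.
xdot = np.gradient(x, t, axis=0)

# Analytical derivative of the stand-in signals, used only to check the error.
exact = 2 * np.pi * np.column_stack([np.cos(2 * np.pi * t), -np.sin(2 * np.pi * t)])
print(np.max(np.abs(xdot[1:-1] - exact[1:-1])))
```

The first and last samples are excluded from the error check because the one-sided differences used at the boundaries are less accurate.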

Additionally, as one might want to fit a SINDy model for the torque or UMP (by replacing `xdot`), the following are also present:
- `UMP` - Unbalanced magnetic pull
- `T_em` - Electromagnetic torque
- `wcoe` - The magnetic coenergy

If the data is training data, it is split into training and validation data (80% - 20%),
in which case the dictionary also contains each of the previous entries with the suffixes `_train` and `_val`.
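
The naming scheme of the split entries can be sketched as follows. The helper `split_train_val()` is hypothetical, and the random assignment of samples is an assumption; `prepare_data()` may split the data differently, for example per simulation.

```python
import numpy as np


def split_train_val(data, train_fraction=0.8, seed=0):
    """Add '<key>_train' and '<key>_val' entries for every array in `data`."""
    n_samples = len(next(iter(data.values())))
    order = np.random.default_rng(seed).permutation(n_samples)
    n_train = int(train_fraction * n_samples)
    # Sort the indices so that both subsets keep their original sample order.
    train_idx, val_idx = np.sort(order[:n_train]), np.sort(order[n_train:])
    out = dict(data)
    for key, values in data.items():
        out[key + "_train"] = values[train_idx]
        out[key + "_val"] = values[val_idx]
    return out


data = {"x": np.arange(20.0).reshape(10, 2), "xdot": np.ones((10, 2))}
split = split_train_val(data)
print(sorted(split))
print(split["x_train"].shape, split["x_val"].shape)
```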

### 3) Optimization of hyperparameters
Assuming the data generation is done, the user can start the optimization of the hyperparameters. 

As described in [this section](#choice-of-regularization-/-optimizer), the Lasso and SR3 regularizers are considered, 
with 1 and 2 hyperparameters respectively. The validation data is used to select the best parameter values. 
This search can be combined with various selections of library candidate functions, which enlarges the search space.
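
The structure of this search space can be sketched in plain Python: one hyperparameter is drawn for Lasso and two for SR3, typically on a logarithmic scale. The parameter names `alphas`, `lambdas` and `nus` follow the trial log shown further down, but the ranges and the helpers `sample_log_uniform()` and `sample_trial()` are placeholders and not the values used in `optimize_parameters.py`.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniformly distributed between low and high."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_trial(rng):
    """Pick an optimizer and draw its hyperparameters (ranges are placeholders)."""
    optimizer = rng.choice(["lasso", "sr3"])
    if optimizer == "lasso":
        return {"optimizer": optimizer, "alphas": sample_log_uniform(rng, 1e-5, 1e2)}
    return {
        "optimizer": optimizer,
        "lambdas": sample_log_uniform(rng, 1e-3, 1e2),
        "nus": sample_log_uniform(rng, 1e-10, 1e2),
    }


rng = random.Random(0)
for _ in range(3):
    print(sample_trial(rng))
```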

For this purpose, the Python package `Optuna` is used, which searches for the Pareto-optimal solutions. 
The user can create a study by calling the function `optimize_parameters()` from the script `optimize_parameters.py`.
The function takes the following arguments:
- `path_to_data_files` - Path to the `.npz`-file containing the training data
- `mode` - default 'torque', can be set to 'ump', 'currents' or 'wcoe', specifying what the model predicts
- `additional_name` - default None, a string that is added to the name of the study
- `n_jobs` - default 1, number of cores used to run the optimization in parallel
- `n_trials` - default 100, number of trials to be performed per core

The ranges of the parameters are predefined inside the function, but can be changed by the user. 
The library candidate functions are called from `libs.py` by the function `get_library_names()` during the search. 
The user should set the desired libraries by changing the returned values of this function. 

The resulting study is saved in the `optuna_studies` folder and can be visualised by calling the `plot_optuna_data()` function from the same script.
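
What Pareto-optimal means for a two-objective study can be illustrated with the four completed trials from the log in this section. The sketch assumes that both logged values are to be minimised (for instance an error measure and a sparsity measure); this interpretation is an assumption, and the helpers `dominates()` and `pareto_front()` are not part of the project.

```python
# Objective values copied from the trial log in this section.
trials = {
    0: (0.8431076537888884, 0.0),
    1: (0.0024937965234715954, 61.0),
    2: (0.9668717010378006, 2.0),
    3: (0.8431076537888884, 0.0),
}


def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(trials):
    """Return the trial numbers that no other trial dominates."""
    return sorted(
        number
        for number, values in trials.items()
        if not any(dominates(other, values) for other in trials.values())
    )


print(pareto_front(trials))
```

Under this assumption, trial 2 is dominated by trial 0 (worse in both values), while trial 1 remains on the front despite its large second value because no other trial matches its first value.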


In [None]:

    ### Optimising parameters takes a long time to run
    #optimize_parameters(path_to_data_files, mode="torque", additional_name="_jupyter", n_jobs = 1, n_trials = 5)

    ecc_input = True
    Loading data
    Done loading data
    Calculating xdots
    Assume all t_vec are equal
    Done calculating xdots
    time trim:  0.5
    No ecc

    [I 2024-08-13 13:57:47,919] A new study created in RDB with name: torquetorque_jupyter-optuna-study
    [I 2024-08-13 13:58:35,006] Trial 0 finished with values: [0.8431076537888884, 0.0] and parameters: {'lib_choice': 'poly_2nd_order', 'optimizer': 'sr3', 'lambdas': 16.513471641014835, 'nus': 0.02665887012807107}.
    [I 2024-08-13 13:59:30,939] Trial 1 finished with values: [0.0024937965234715954, 61.0] and parameters: {'lib_choice': 'poly_2nd_order', 'optimizer': 'lasso', 'alphas': 0.00022056127528053627}.
    [I 2024-08-13 14:03:54,298] Trial 2 finished with values: [0.9668717010378006, 2.0] and parameters: {'lib_choice': 'nonlinear_terms', 'optimizer': 'sr3', 'lambdas': 0.23650165476045204, 'nus': 5.133499992504579e-07}.
    [I 2024-08-13 14:07:33,505] Trial 3 finished with values: [0.8431076537888884, 0.0] and parameters: {'lib_choice': 'nonlinear_terms', 'optimizer': 'sr3', 'lambdas': 8.846775027962632, 'nus': 0.007695000966948789}.
    [I 2024-08-13 14:08:05,664] Trial 4 finished with values: 

In [5]:

    ### Plot a premade study
    plot_optuna_data('torquelinear-optuna-study')

    [I 2024-08-13 15:18:08,337] Study name was omitted but trying to load 'torquelinear-optuna-study' because that was the only study found in the storage.

    ['torquelinear-optuna-study']
    Trial count: 3394


### 4) Model Identification
Now that the desired hyperparameters and optimiser are known, the user can start the model identification. 
This is done by calling the function `make_model()` from the script `train_model_source.py`. The function takes the following arguments:
- `path_to_data_files` - Path to the `.npz`-file containing the training data
- `modeltype` - Can be set to 'torque', 'ump', 'torque-ump', 'currents' or 'wcoe', specifying what the model predicts
- `optimizer` - Either 'lasso' or 'sr3', specifying the regularisation method
- `lib` - The chosen library candidate functions
- `nmbr_of_train` - default -1 (all simulations), can be set to a number if not all simulations should be considered.
- `alpha` - default None, the regularisation parameter for lasso
- `nu` - default None, the first regularisation parameter for sr3
- `lamb` - default None, the second regularisation parameter for sr3
- `modelname` - default None, a string that is added to the name of the model

When a model is created, it is saved as a `.pkl`-file in the `models` folder. The model can be loaded by calling the `load_model()` function from `source.py`.
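
Saving and reloading a `.pkl`-file boils down to a round trip through Python's `pickle` module, as sketched below. The dictionary is only a stand-in for a fitted model, the temporary folder replaces the `models` folder, and the file name is an assumption based on the `_model` suffix used in the evaluation example; `load_model()` itself may do more than this.

```python
import os
import pickle
import tempfile

# Stand-in for a fitted model; the real object is whatever make_model() stores.
model = {"feature_names": ["i_d", "i_q"], "coefficients": [[0.0, 1.5], [-0.8, 0.0]]}

models_dir = tempfile.mkdtemp()  # placeholder for the `models` folder
model_path = os.path.join(models_dir, "jupyter_example_model.pkl")  # assumed file name

with open(model_path, "wb") as file:
    pickle.dump(model, file)

with open(model_path, "rb") as file:
    restored = pickle.load(file)

print(restored["feature_names"])
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.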

In [6]:

    make_model(path_to_data_files, modeltype='torque', optimizer='sr3',
               nmbr_of_train=-1, lib='poly_2nd_order', nu=1.978e-10, lamb=5.3e-9,
               modelname='jupyter_example')

    ecc input: True
    Loading data
    Done loading data

    KeyboardInterrupt: 

### 5) Model Evaluation
Now, the model's performance can be evaluated on the test data. This is done by calling the function `simulate_model()`
from the script `train_model_source.py`. The function takes the following arguments:
- `model_name` - The name of the model from the `models` folder
- `path_to_test_file` - Path to the `.npz`-file containing the test data
- `modeltype` - Can be set to 'torque', 'ump', 'torque-ump', 'currents' or 'wcoe', specifying what the model predicts
- `do_time_simulation` - default False, only relevant if `modeltype` == 'currents'. In that case, `xdot` is integrated with `solve_ivp` to obtain the prediction of `x`

This function returns the predicted and expected values. These can also be plotted in the frequency domain with the `plot_fourier()` function from `source.py`.
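
The two kinds of comparison mentioned here, an error measure in the time domain and a spectrum in the frequency domain, can be sketched with NumPy on synthetic data. The helpers `mean_squared_error()` and `amplitude_spectrum()` are hypothetical and are not the implementation behind `simulate_model()` or `plot_fourier()`.

```python
import numpy as np


def mean_squared_error(pred, test):
    """Mean of the squared differences between prediction and reference."""
    return float(np.mean((np.asarray(pred) - np.asarray(test)) ** 2))


def amplitude_spectrum(signal, dt):
    """One-sided amplitude spectrum of a uniformly sampled real signal."""
    signal = np.asarray(signal, dtype=float)
    freqs = np.fft.rfftfreq(signal.size, d=dt)
    # Factor 2 folds the negative frequencies in (this also doubles the DC and Nyquist bins).
    amplitudes = 2.0 * np.abs(np.fft.rfft(signal)) / signal.size
    return freqs, amplitudes


dt = 1e-3
t = np.arange(1000) * dt
test = np.sin(2 * np.pi * 50.0 * t)   # synthetic 50 Hz reference signal
pred = test + 0.01                    # synthetic prediction with a constant offset

print(mean_squared_error(pred, test))
freqs, amps = amplitude_spectrum(test, dt)
print(freqs[np.argmax(amps)])
```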

In [7]:

    model = 'jupyter_example'
    pred, test = simulate_model(model+'_model', path_to_test_file, modeltype="torque", do_time_simulation=False, show=True)

    (i_d)' = -0.001 1 + -0.133 i_d + 0.189 i_q + 1.524 v_d + 0.860 v_q + 0.001 v_0 + -1.771 I_d + 1.263 I_q + 0.139 V_d + -0.261 V_q + -0.002 V_0 + -0.007 \omega + -0.001 i_d i_q + -0.340 i_d I_d + 6.737 i_d I_q + 0.060 i_d V_d + -0.924 i_d V_q + 0.012 i_d V_0 + -6.748 i_q I_d + -0.460 i_q I_q + 0.922 i_q V_d + 0.062 i_q V_q + -0.037 i_q V_0 + 0.025 v_d I_d + -0.048 v_d I_q + -1.377 v_d V_d + -0.005 v_d V_q + 0.050 v_q I_d + 0.026 v_q I_q + 0.005 v_q V_d + -1.377 v_q V_q + -0.001 v_0 I_d + -0.006 v_0 I_q + 0.012 v_0 V_0 + 1.752 I_d^2 + -0.240 I_d I_q + 0.243 I_d V_d + 1.229 I_d V_q + -0.001 I_d V_0 + -0.001 I_d \omega + -0.602 I_q^2 + -1.323 I_q V_d + 0.572 I_q V_q + -0.001 I_q V_0 + -0.013 I_q \omega + 0.003 I_q f + -0.012 V_d^2 + 0.007 V_d V_q + -0.002 V_d V_0 + -0.001 V_d \omega + -0.014 V_q^2 + -0.020 V_q V_0 + 0.002 V_q \omega + 0.003 V_0 f
    Loading data
    Done loading data
    Calculating xdots
    Assume all t_vec are equal
    Done calculating xdots
    No ecc
    MSE on test:  1.5195230046026998e-06
    Non

## Additional Information