In [3]:
!cd

D:\Ansys Simulations\Project\2D


# SCALING NOTEBOOK
This notebook develops the functions needed to script the scaling of the data.

In [3]:
## imports
from pathlib import Path
import pandas as pd
import numpy as np
from PREPROCESSING_splitting import get_number

ModuleNotFoundError: No module named 'PREPROCESSING_splitting'

In [2]:
## Create a function to get the data
def get_sample_dfs(samples_folder_path, sample_number):
    ## returns the input and output dataframes for the sample specified

    def read_single_sample(folder_path, kind):
        # expect exactly one file named "*_<sample_number>.csv" in the folder
        matches = list(folder_path.glob("*_" + str(sample_number) + ".csv"))
        if len(matches) == 0:
            # without this check a missing sample surfaces later as an unbound variable
            raise FileNotFoundError('error: no ' + kind + ' sample with label ' + str(sample_number))
        if len(matches) > 1:
            raise Exception('error: more than one ' + kind + ' sample with label ' + str(sample_number))
        return pd.read_csv(matches[0], index_col = 0)

    sample_input_df = read_single_sample(Path(samples_folder_path, 'input'), 'input')
    sample_output_df = read_single_sample(Path(samples_folder_path, 'output'), 'output')

    return sample_input_df, sample_output_df

data_folder_path =  Path('D:/Ansys Simulations/Project/2D/data') 
print(data_folder_path)
raw_input_data, raw_output_data = get_sample_dfs(data_folder_path, 26)

D:\Ansys Simulations\Project\2D\data


In [32]:
raw_output_data

Unnamed: 0,node_number,x_loc,y_loc,z_loc,x_disp,y_disp,z_disp
0,1,-0.186530,1.13540,0.0,-0.000264,-0.000852,0.0
1,2,0.000000,0.00000,0.0,-0.000586,-0.000198,0.0
2,3,-0.178300,1.08530,0.0,-0.000265,-0.000852,0.0
3,4,-0.170060,1.03510,0.0,-0.000264,-0.000852,0.0
4,5,-0.161820,0.98496,0.0,-0.000261,-0.000849,0.0
...,...,...,...,...,...,...,...
569,570,0.169450,0.33947,0.0,0.000249,-0.000553,0.0
570,571,0.140460,0.33622,0.0,0.000280,-0.000460,0.0
571,572,0.104840,0.44160,0.0,0.000606,-0.000288,0.0
572,573,0.556150,0.75061,0.0,0.000115,-0.000399,0.0


# Pi Theorem
We can use the Buckingham pi theorem to nondimensionalize the data, using the pint package to do so:

In [51]:
from pint import pi_theorem, formatter, UnitRegistry

In [60]:
ureg = UnitRegistry()
pi_groups = ureg.pi_theorem({'nodal_force': '[force]',
                             'disp': '[length]',
                             'youngs_modulus': '[pressure]'})
for group in pi_groups:
    print(formatter(group.items()))

disp ** 2 * youngs_modulus / nodal_force


In [61]:
pi_groups

[{'nodal_force': -1.0, 'disp': 2.0, 'youngs_modulus': 1.0}]
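As a sanity check on the group above, the exponents can be verified by hand against the dimension matrix of the three variables. The sketch below is independent of pint and only assumes the usual mass, length, time decomposition (force = M L T^-2, pressure = M L^-1 T^-2); the variable names are illustrative.

```python
import numpy as np

# Rows are the base dimensions (mass, length, time); columns are the variables.
# force = M L T^-2, length = L, pressure = M L^-1 T^-2
variables = ['nodal_force', 'disp', 'youngs_modulus']
dimension_matrix = np.array([
    [ 1, 0,  1],   # mass
    [ 1, 1, -1],   # length
    [-2, 0, -2],   # time
])

# Buckingham pi: number of groups = number of variables - rank of the matrix
n_groups = len(variables) - np.linalg.matrix_rank(dimension_matrix)

# Exponents reported by pint for the single group, in the same variable order
exponents = np.array([-1.0, 2.0, 1.0])

# A valid pi group has zero net exponent in every base dimension
residual = dimension_matrix @ exponents
print(n_groups, residual)
```

The rank of the matrix is 2, which is why three variables can only ever produce one dimensionless group here.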

Thus we arrive at a problem: the three variables yield a single pi group, so with only the input values it is not possible to arrive at representative pi groups. Some alternative approaches could be:
* Taking a data-centric approach to the model: scale all the forces by the maximum force in the dataset and all displacements by the maximum displacement in the dataset, and include these maxima as parameters in the dataset so that the model can learn the required non-linear scaling from the data itself.

* Using non-scaled data might be a good option if some way to initialize the weights effectively is found. After all, the data is expected to always show the same behaviour, since it is supposed to represent real-world physics, which should make for a stable dataset.

* An "energy" term that cumulatively aggregates displacements and forces could also be crafted to measure how much "energy" is being provided to the sample, and the data could then be scaled using pi groups built with it. To avoid introducing non-data values, this term would only have the dimensions of energy, instead of being an actual energy value.
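The first option could look roughly like the sketch below. It is only an illustration: `scale_by_dataset_max` is a hypothetical helper, the maxima are made-up numbers, and keeping each maximum as an extra constant column is just one way of exposing it to the model.

```python
import numpy as np
import pandas as pd

def scale_by_dataset_max(df, columns, dataset_max):
    ## returns a copy of df with each column divided by its dataset-wide
    ## maximum magnitude, plus one constant column holding that maximum
    scaled = df.copy()
    for column in columns:
        max_value = dataset_max[column]
        scaled[column + '_max'] = max_value
        if max_value != 0:
            # an all-zero column (e.g. z_disp in a 2D case) is left unscaled
            scaled[column] = df[column] / max_value
    return scaled

# Synthetic stand-in for one output sample (values are made up)
sample = pd.DataFrame({'x_disp': [-0.0002, 0.0004], 'y_disp': [-0.0008, 0.0001]})
dataset_max = {'x_disp': 0.0005, 'y_disp': 0.0010}   # hypothetical dataset maxima
scaled_sample = scale_by_dataset_max(sample, ['x_disp', 'y_disp'], dataset_max)
print(scaled_sample)
```

Dividing by the maximum magnitude keeps every scaled value within [-1, 1] while preserving the sign of the displacement.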

In [65]:
pi_groups = ureg.pi_theorem({'nodal_force': '[force]',
                             'disp': '[length]',
                             'youngs_modulus': '[pressure]',
                             'energy': '[energy]'})
for group in pi_groups:
    print(formatter(group.items()))

disp ** 2 * youngs_modulus / nodal_force
disp * nodal_force / energy
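To make the two groups concrete, the sketch below evaluates them on synthetic per-node values. Everything here is an assumption for illustration: the numbers are made up, forces and displacements are taken to be available at the same nodes, and the energy-like term is defined as the sum of |force| times |displacement| over the nodes, which is only one possible reading of the "energy" term described earlier.

```python
import numpy as np

# Synthetic per-node values for one sample (made up, SI units assumed)
nodal_force = np.array([100.0, 250.0, 50.0])      # N
disp = np.array([2.0e-4, 5.0e-4, 1.0e-4])         # m
youngs_modulus = 2.0e11                           # Pa

# Hypothetical energy-like term: a quantity with the dimensions of energy,
# accumulated over the nodes; it is not the physical strain energy
energy_like = np.sum(np.abs(nodal_force) * np.abs(disp))

# The two groups printed above, evaluated node by node
pi_1 = disp ** 2 * youngs_modulus / nodal_force
pi_2 = disp * nodal_force / energy_like
print(pi_1, pi_2)
```

Note that the first group divides by the nodal force, so it is undefined at nodes that carry no load; those nodes would need separate handling.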


For a small dataset such as the one we have for the proof of concept, it would likely be useful to use a more information-dense form of scaling, such as introducing some engineering constant from outside of the data (yield stress, Young's modulus, etc.). However, we don't really care about the performance of the proof-of-concept model, as long as it is able to learn some correct behaviour, so we are going to act as if we had a dataset that can be considered "exhaustive" and simply scale each variable by its largest value in the dataset. This 'naive' approach involves inspecting the dataset to find that largest value, and it is the one that likely scales best with a larger dataset.
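A minimal sketch of this naive approach, assuming every sample is stored as one csv file per folder with the same columns. The helper `dataset_abs_max`, the file names and the values are hypothetical; the demo writes two tiny synthetic samples to a temporary folder so that it can run on its own.

```python
from pathlib import Path
import tempfile

import numpy as np
import pandas as pd

def dataset_abs_max(folder_path, columns):
    ## returns the largest magnitude found in each column across every
    ## csv file in the folder
    running_max = pd.Series(0.0, index=columns)
    for sample_file in sorted(Path(folder_path).glob('*.csv')):
        sample_df = pd.read_csv(sample_file, index_col=0)
        # element-wise maximum between the running value and this sample
        running_max = np.maximum(running_max, sample_df[columns].abs().max())
    return running_max

with tempfile.TemporaryDirectory() as tmp:
    output_folder = Path(tmp, 'output')
    output_folder.mkdir()
    # two synthetic samples (values are made up)
    pd.DataFrame({'x_disp': [-0.0002, 0.0004],
                  'y_disp': [-0.0008, 0.0001]}).to_csv(output_folder / 'output_1.csv')
    pd.DataFrame({'x_disp': [0.0006, -0.0001],
                  'y_disp': [0.0003, -0.0002]}).to_csv(output_folder / 'output_2.csv')

    maxima = dataset_abs_max(output_folder, ['x_disp', 'y_disp'])
    first_sample = pd.read_csv(output_folder / 'output_1.csv', index_col=0)
    # dividing a dataframe by a series scales each column by its own maximum
    scaled_first_sample = first_sample / maxima

print(maxima)
print(scaled_first_sample)
```

The maxima only need to be computed once per dataset, so they can be stored and reused when scaling new samples or inverting the scaling on model predictions.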

In [None]:
## Create generator function to iterate through all samples
def sample_iterator(samples_folder_path):
    for i in 