In [1]:
!cd

D:\Ansys Simulations\Project\2D\preprocessing


## Formatting
In this notebook, the objective is to explore and develop a way to transform the scaled data into a data format/file/shape that can be used by a 2D convolutional neural network. Ideally, the result is a file that stores tensors as spatial data and that a ConvNet can read.
After some research, the conclusion is that the best way to do this is to format the samples as a 4D numpy array and save it as a .npy file so that it can later be read quickly into memory.
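
As a minimal sketch of the .npy round trip described above: the array sizes and the file name `samples.npy` are placeholders chosen for illustration, not values from this project. Only `np.save` and `np.load` are used.

```python
import os
import tempfile

import numpy as np

# Hypothetical batch: 8 samples on a 32x32 grid with 4 features each,
# laid out as (samples, x, y, features). The sizes are placeholders.
batch = np.zeros((8, 32, 32, 4), dtype=np.float32)
batch[0, 3, 5, 2] = 1.5  # mark one entry so the round trip can be checked

# Save to a temporary folder and read the array back into memory
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "samples.npy")
    np.save(file_path, batch)
    restored = np.load(file_path)

print(restored.shape, restored.dtype)
```

The .npy format keeps the shape and dtype alongside the values, so the array comes back without any extra parsing step.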

In [60]:
## Imports
import numpy as np
import matplotlib.pyplot as plt
from PREPROCESSING_splitting import get_number
import PREPROCESSING_scaling as scale
from pathlib import Path
import pandas as pd
import os, sys

In [14]:
## load in and scale  a sample

data_folder_path =  Path('D:/Ansys Simulations/Project/2D/data') 

# get data
raw_input_data, raw_output_data = scale.get_sample_dfs(data_folder_path, 26)

max_force, max_disp = scale.get_max_disp_force(data_folder_path)

scaled_input_data = scale.scale_dataframe(raw_input_data, max_force, max_disp)
with pd.option_context("display.max_rows", None):
    display(scaled_input_data.iloc[170:189,:])

UPDATED MAX 	 sample #1 	 force: 0.00 	 displacement: 0.002053 
UPDATED MAX 	 sample #10 	 force: 152.79 	 displacement: 0.003020 
UPDATED MAX 	 sample #100 	 force: 152.79 	 displacement: 0.019382 
UPDATED MAX 	 sample #101 	 force: 242.87 	 displacement: 0.019382 
UPDATED MAX 	 sample #11 	 force: 287.27 	 displacement: 0.019382 
UPDATED MAX 	 sample #55 	 force: 295.32 	 displacement: 0.019382 


Unnamed: 0,node_number,named_selection,x_loc,y_loc,z_loc,x_disp,y_disp,z_disp,x_force,y_force,z_force
170,171,-1,0.18059,0.24273,0.0,0.0,0.0,0.0,0.0,0.0,0.0
171,172,-1,0.13878,0.20568,0.0,0.0,0.0,0.0,0.0,0.0,0.0
172,173,-1,0.068634,0.12306,0.0,0.0,0.0,0.0,0.0,0.0,0.0
173,174,-1,0.054411,0.084662,0.0,0.0,0.0,0.0,0.0,0.0,0.0
174,175,-1,0.024266,0.07297,0.0,0.0,0.0,0.0,0.0,0.0,0.0
175,176,2,0.32859,0.6949,0.0,-0.050859,-0.107693,0.0,0.0,0.0,0.0
176,177,1,0.1898,0.56873,0.0,0.031275,-0.014854,0.0,0.0,0.0,0.0
177,178,1,0.12709,0.60428,0.0,0.031275,-0.014854,0.0,0.0,0.0,0.0
178,179,2,0.24574,0.73252,0.0,-0.050859,-0.107693,0.0,0.0,0.0,0.0
179,180,2,0.26941,0.62538,0.0,-0.050859,-0.107693,0.0,0.0,0.0,0.0


To run a convolutional layer on the data, the positions table has to be converted into an object that represents the element's node positions in a way that lets a convolutional kernel pick up positional relationships. Looking at the 3D data files generated, there seem to be about 10000 nodes in each sample (although the intent is to decrease this later to speed up sample creation). With this in mind, and knowing that most of the samples have a relatively similar volume, a grid of 32x32x32 entries is chosen to split the data. The choice is arbitrary, but it seems to give a good balance of nodal specificity, averaging and processing time for the network.

The fourth dimension's size depends on the number of features. For directional data such as displacements and forces, there are three components per type. On top of that, one more feature is created, the "existence" feature, which encodes whether a certain "volume" in space contains material from the element or not, varying from 0 (no material) to 1 (filled with material). It only takes values other than 0 or 1 on the edges/faces of the element.

In [56]:
## Create an empty numpy array with the correct dimensions
def create_array(dimensionality, features, resolution = 32):
    ## returns an array of zeros for the correct type of model specified with the dimensionality and the features
    positional_shape = [resolution]*dimensionality
    shape = positional_shape + [features]
    array = np.zeros(shape)
    return array

In [58]:
print(create_array(3,6, 35).shape)
print(create_array(2,4).shape)

(35, 35, 35, 6)
(32, 32, 4)


To correctly distribute the data inside the tensor, it is necessary to get the maximum positional dimensions of the dataset so that every sample actually fits inside the tensor.

In [188]:
## Create function to run through all data and get max values of dimensions
def get_max_dimensions(samples_folder_path):
    ## iterates through all samples and returns the largest extent (max - min) found along each axis
    samples = scale.sample_iterator(samples_folder_path)
    
    max_x = 0
    max_y = 0
    max_z = 0
    
    for sample in samples:
        sample_number, input_data, output_data = sample
        
        ## extent of this sample along each axis: column max minus column min
        locations = input_data.loc[:,['x_loc','y_loc','z_loc']]
        extent_x, extent_y, extent_z = (locations.max() - locations.min()).tolist()
        
        ## keep the largest extent seen so far, comparing extent against extent
        updated = False
        
        if extent_x > max_x:
            max_x = extent_x
            updated = True
            
        if extent_y > max_y:
            max_y = extent_y
            updated = True
            
        if extent_z > max_z:
            max_z = extent_z
            updated = True
            
        if updated:
            print(f'UPDATED MAX \t sample #{sample_number} \t x: {max_x:.4f} \t y: {max_y:.4f} \t z: {max_z:.4f}')
            
    return max_x, max_y, max_z

In [189]:
get_max_dimensions(data_folder_path)

UPDATED MAX 	 sample #1 	 x: 0.9692 	 y: 1.1906 	 z: 0.0000
UPDATED MAX 	 sample #10 	 x: 1.0849 	 y: 1.1906 	 z: 0.0000
UPDATED MAX 	 sample #100 	 x: 1.4374 	 y: 1.3295 	 z: 0.0000
UPDATED MAX 	 sample #101 	 x: 1.4374 	 y: 1.4031 	 z: 0.0000
UPDATED MAX 	 sample #102 	 x: 1.6833 	 y: 1.4031 	 z: 0.0000
UPDATED MAX 	 sample #12 	 x: 1.7286 	 y: 1.4031 	 z: 0.0000
UPDATED MAX 	 sample #50 	 x: 1.7286 	 y: 1.4830 	 z: 0.0000


(1.7286, 1.483, 0)

In [59]:
class HiddenPrints:
    def __enter__(self):
        self._original_stdout = sys.stdout
        sys.stdout = open(os.devnull, 'w')

    def __exit__(self, exc_type, exc_val, exc_tb):
        sys.stdout.close()
        sys.stdout = self._original_stdout

In [64]:
#create function that creates an element size for the resolution
def get_element_size(samples_folder_path, resolution = 32):
    ## runs through all samples to get the element size based on the resolution
    with HiddenPrints():
        max_x, max_y, max_z = get_max_dimensions(samples_folder_path)
    
    largest_dim = max([max_x, max_y, max_z])
   
    element_size = largest_dim/resolution
    
    return element_size

In [69]:
element_size = get_element_size(data_folder_path)
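
As a quick check of the arithmetic, the sketch below repeats the calculation from `get_element_size` by hand, using the rounded extents printed by `get_max_dimensions` above. The name `element_size_example` and the choice of tensor index (0, 4) are only for illustration.

```python
# Extents taken from the get_max_dimensions output above (rounded values)
max_x, max_y, max_z = 1.7286, 1.483, 0
resolution = 32

# Same arithmetic as get_element_size: the largest extent sets the grid spacing
element_size_example = max([max_x, max_y, max_z]) / resolution

# Spatial range covered by the tensor index (x=0, y=4)
x_range = [0 * element_size_example, 1 * element_size_example]
y_range = [4 * element_size_example, 5 * element_size_example]

print(f'element size: {element_size_example:.4f}')
print(f'x range: {x_range[0]:.4f} to {x_range[1]:.4f}')
print(f'y range: {y_range[0]:.4f} to {y_range[1]:.4f}')
```

With these numbers the element size comes out at roughly 0.054, so a 32x32 grid of square elements covers the largest sample along x, and the y direction uses the same spacing.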

I just realized that the scaling step performed earlier on the pandas dataframe is of no use: there is no guarantee that two nodes won't fall in the same element, so the maximums will change once the data has been fitted into the new format, and the data must be re-scaled afterwards. This will be implemented later, and the previous scaling function will be considered deprecated.
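
One possible shape for that later re-scaling step is sketched below. It is not the implementation used in this project: `rescale_tensor` is a hypothetical helper, and dividing each feature by its largest absolute value is an assumption about what the re-scaling should do.

```python
import numpy as np

def rescale_tensor(tensor):
    # Hypothetical helper: divides every feature channel (last axis) by its
    # largest absolute value so that each channel ends up in [-1, 1].
    # Channels that are entirely zero are left untouched.
    feature_axes = tuple(range(tensor.ndim - 1))  # every axis except the last
    max_per_feature = np.abs(tensor).max(axis=feature_axes)
    safe_max = np.where(max_per_feature == 0, 1.0, max_per_feature)
    return tensor / safe_max, max_per_feature

# Toy 2D tensor: 4x4 grid with 3 features, the last one left empty
toy = np.zeros((4, 4, 3))
toy[1, 2, 0] = -8.0
toy[3, 0, 0] = 2.0
toy[0, 0, 1] = 0.5

scaled, maxima = rescale_tensor(toy)
print(maxima)
print(scaled[1, 2, 0], scaled[3, 0, 0], scaled[0, 0, 1])
```

Because the maxima are taken over every axis except the last, the same function can be applied to a stacked array of all samples, which would give dataset-wide maxima in the same spirit as the earlier dataframe scaling.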

In [167]:
## create function that takes the element size and a tensor index and uses it to get
## the dataframe indices of the nodes that match that spatial location in a sample
def get_dataframe_indices(element_size, sample_df, x, y, z=0):
    ## spatial range covered by the tensor index along each axis
    x_range = [x*element_size, (x+1)*element_size]
    y_range = [y*element_size, (y+1)*element_size]
    z_range = [z*element_size, (z+1)*element_size]
    
    ## a sample is 2D when every z location is exactly zero
    Is_2D = (sample_df['z_loc'] == 0.0).all()
    
    ## absolute node positions, compared against the ranges column by column
    locations = sample_df.loc[:,['x_loc','y_loc','z_loc']].abs()
    
    ## strict inequalities: a node lying exactly on a range boundary is not matched
    x_condition = (x_range[0] < locations['x_loc']) & (locations['x_loc'] < x_range[1])
    y_condition = (y_range[0] < locations['y_loc']) & (locations['y_loc'] < y_range[1])
    z_condition = (z_range[0] < locations['z_loc']) & (locations['z_loc'] < z_range[1])
    
    ## the z condition is ignored for 2D samples
    mask = x_condition & y_condition & (z_condition | Is_2D)
    
    indices = sample_df.index[mask.to_numpy()].tolist()
    return indices

In [168]:
get_dataframe_indices(element_size, raw_input_data, 0,4)

[16, 137, 234, 235, 453, 455, 511]
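
Looking up the nodes for one tensor index at a time means scanning the whole sample once per element. A possible alternative, sketched here under assumptions, is to go the other way and compute the tensor index of every node in one pass with floor division. `get_tensor_indices` is a hypothetical helper and the toy node locations are made up for illustration.

```python
import numpy as np
import pandas as pd

def get_tensor_indices(sample_df, element_size, resolution=32):
    # Hypothetical helper: maps every node to the (x, y, z) grid cell that
    # contains it, using floor division of the absolute coordinates.
    locs = sample_df.loc[:, ['x_loc', 'y_loc', 'z_loc']].abs().to_numpy()
    cells = np.floor(locs / element_size).astype(int)
    # Nodes lying on or beyond the outer edge are clipped into the last cell
    return np.clip(cells, 0, resolution - 1)

# Toy sample with made-up node locations (not taken from the dataset)
toy_df = pd.DataFrame({
    'x_loc': [0.01, 0.03, 0.12],
    'y_loc': [0.22, 0.26, 0.05],
    'z_loc': [0.0, 0.0, 0.0],
})
toy_cells = get_tensor_indices(toy_df, element_size=0.054)
print(toy_cells)
```

Unlike `get_dataframe_indices`, which uses strict inequalities on both sides, this version assigns a node lying exactly on the lower boundary of a cell to that cell, so the two approaches can differ for boundary nodes.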

To be able to assign the correct position for the element inside of the tensor, the quadrant it was created in must be determined.
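
A minimal sketch of one way to determine the quadrant, assuming each sample lies entirely within a single quadrant so that the sign of its mean node position is enough to identify it. `get_quadrant_signs` is a hypothetical helper and the toy data is made up.

```python
import numpy as np
import pandas as pd

def get_quadrant_signs(sample_df):
    # Hypothetical helper: returns the sign (-1, 0 or 1) of the mean node
    # position along each axis, which identifies the quadrant of the sample.
    centroid = sample_df.loc[:, ['x_loc', 'y_loc', 'z_loc']].mean()
    return tuple(int(s) for s in np.sign(centroid.to_numpy()))

# Toy sample in the quadrant with negative x and positive y (2D, so z is zero)
toy_sample = pd.DataFrame({
    'x_loc': [-0.10, -0.25, -0.40],
    'y_loc': [0.05, 0.30, 0.20],
    'z_loc': [0.0, 0.0, 0.0],
})
quadrant = get_quadrant_signs(toy_sample)
print(quadrant)
```

A sign of 0 along an axis means the sample has no extent there on average, as is the case for the z axis of a 2D sample.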