# XGBoost Prototype
This notebook develops a prototype machine learning model for force reconstruction, fitting the data from the strain gauges.
As a first step, the prototype is developed using the dataset found at https://www.kaggle.com/daalgi/fem-simulations

The following cell imports the libraries we need for the model:

In [2]:
## Imports
import pandas as pd
import xgboost as xgb
import numpy as np
import matplotlib.pyplot as plt

Now the data needs to be loaded. We are in a Colab notebook, so the file is uploaded from the local machine as follows; the CSV has to be available there already.

In [3]:
## Loading in data
from google.colab import files
uploaded = files.upload()

Saving 5184doe.csv to 5184doe.csv
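
The upload step above only works inside Colab and asks for the file again on every run. A minimal sketch of a guard around it follows; the helper name `ensure_csv` is hypothetical and not part of the original notebook. It assumes, as the cell output above suggests, that `files.upload()` saves the file into the working directory.

```python
import os


def ensure_csv(filename="5184doe.csv"):
    """Return `filename` once it is available on disk (hypothetical helper)."""
    if os.path.exists(filename):
        # Already uploaded in this session, or the notebook runs locally.
        return filename
    try:
        from google.colab import files  # only importable inside Colab
    except ImportError as exc:
        raise FileNotFoundError(
            f"{filename} not found and not running in Colab"
        ) from exc
    uploaded = files.upload()  # opens a file picker, returns {name: bytes}
    if filename not in uploaded:
        raise FileNotFoundError(f"expected {filename}, got {list(uploaded)}")
    return filename
```

Calling `ensure_csv()` before `pd.read_csv` would then skip the upload dialog whenever the file is already present.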


Now we load it into a data type called a "dataframe", which is easiest to think of as an Excel table, with headers and all. The library that handles the dataframe data type is pandas, shortened to "pd" in the imports.

Loading the data is this simple because the dataset is already very well formatted. When we get the actual dataset, a lot of pre-processing will be needed to turn it into something the model can use.

In [7]:
df = pd.read_csv('5184doe.csv')
df

Unnamed: 0,Sample,ecc,N,gammaG,Esoil,Econc,Dbot,H1,H2,H3,Mr_t,Mt_t,Mr_c,Mt_c
0,1,0,2000,0.9,25,30000,17,0.8,1.0,0.8,0.082100,0.055648,0.082100,0.055648
1,2,10,2000,0.9,25,30000,17,0.8,1.0,0.8,-0.597084,-0.233470,1.160648,0.605016
2,3,18,2000,0.9,25,30000,17,0.8,1.0,0.8,-1.094196,-0.566130,1.908188,0.947770
3,4,26,2000,0.9,25,30000,17,0.8,1.0,0.8,-1.416485,-0.865039,2.844706,1.310545
4,5,0,2000,0.9,25,37000,17,0.8,1.0,0.8,0.079570,0.054213,0.079570,0.054213
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5179,5180,26,5000,1.1,75,30000,23,1.6,2.0,1.6,-6.027412,-2.866107,6.471047,4.499262
5180,5181,0,5000,1.1,75,37000,23,1.6,2.0,1.6,0.143146,0.108935,0.143146,0.108935
5181,5182,10,5000,1.1,75,37000,23,1.6,2.0,1.6,-2.671362,-0.561849,1.702752,2.465506
5182,5183,18,5000,1.1,75,37000,23,1.6,2.0,1.6,-4.408298,-1.696514,4.021527,3.522614
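
The `Unnamed: 0` column in the preview is typically what pandas produces when a CSV was saved with its row index in an unnamed first column. Assuming that is the case here (the file itself is not inspected), the sketch below shows how `index_col=0` avoids the extra column. The three rows are copied from the preview above; `csv_text`, `df_default` and `df_indexed` are illustrative names.

```python
import io

import pandas as pd

# First three rows of the preview, written the way the CSV is assumed to
# store them: the header of the first column is empty.
csv_text = """,Sample,ecc,N,gammaG,Esoil,Econc,Dbot,H1,H2,H3,Mr_t,Mt_t,Mr_c,Mt_c
0,1,0,2000,0.9,25,30000,17,0.8,1.0,0.8,0.082100,0.055648,0.082100,0.055648
1,2,10,2000,0.9,25,30000,17,0.8,1.0,0.8,-0.597084,-0.233470,1.160648,0.605016
2,3,18,2000,0.9,25,30000,17,0.8,1.0,0.8,-1.094196,-0.566130,1.908188,0.947770
"""

# Default behaviour: the unnamed first column becomes a regular column.
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.columns[0])  # Unnamed: 0

# With index_col=0 the first column is used as the row index instead.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(df_indexed.columns))
```

Either way, the `Sample` column is just a running counter and would probably not be useful as a model input.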


The headers aren't very descriptive, but the dataset description groups them as follows:
- load parameters: `ecc`, `N`, `gammaG`.
- material parameters: `Esoil`, `Econc`.
- geometry parameters: `Dbot`, `H1`, `H2`, `H3`.
- stress related results: `Mr_t`, `Mt_t`, `Mr_c`, `Mt_c`.
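
The grouping above can be written down once as a plain dictionary, which makes it easy to check against the actual header. This is only a sketch: `column_groups` and `header` are illustrative names, and `header` is copied by hand from the dataframe preview rather than read from the file.

```python
# Column groups as listed in the dataset description, using the column
# names as they appear in the dataframe.
column_groups = {
    "load": ["ecc", "N", "gammaG"],
    "material": ["Esoil", "Econc"],
    "geometry": ["Dbot", "H1", "H2", "H3"],
    "stress": ["Mr_t", "Mt_t", "Mr_c", "Mt_c"],
}

# Header row copied from the preview of `df`.
header = [
    "Unnamed: 0", "Sample", "ecc", "N", "gammaG", "Esoil", "Econc",
    "Dbot", "H1", "H2", "H3", "Mr_t", "Mt_t", "Mr_c", "Mt_c",
]

grouped = [col for cols in column_groups.values() for col in cols]

# Columns named in a group but absent from the header (should be empty).
missing = [col for col in grouped if col not in header]
# Columns in the header that belong to no group (bookkeeping columns).
ungrouped = [col for col in header if col not in grouped]

print(missing)    # []
print(ungrouped)  # ['Unnamed: 0', 'Sample']
```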

Knowing this, we can decide on what is going to be used for the input and what is going to be used for the output. 

It is desirable to keep the prototype as close as possible to the final model. On prediction, the final model should behave like this:

$$ 
f: \text{(strain gauge voltages)} \mapsto \text{(wheel hub loads)}
$$
Therefore, the closest the prototype can get with this dataset, being naive and not trying to figure out how to exclude the geometry parameters, is to simply include the material and geometry parameters as inputs alongside the stress results, so that it behaves like this:
$$ 
f: (E_{soil}, E_{conc}, D_{bot}, H_1, H_2, H_3, M_{r,t}, M_{t,t}, M_{r,c}, M_{t,c}) \mapsto (\mathrm{ecc}, N, \gamma_G)
$$