<a href="https://colab.research.google.com/github/zanzivyr/Tactile-Sensor/blob/main/Tactile_Sensor_CNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tactile Sensor: Depth Inference

## Summary

In this second step, I train a deep neural network on the tabular data provided from the YOLOv5 object detection in order to predict the depth of a deformation.

# Object Detection
(via YOLOv5)

Resources:

- https://github.com/ultralytics/yolov5/issues/36
- https://stackoverflow.com/questions/67244258/how-to-get-class-and-bounding-box-coordinates-from-yolov5-predictions 

## Import Libraries

In [2]:
# Import the necessary libraries
import ipywidgets as widgets
from google.colab.output import eval_js
from PIL import Image
from IPython.display import clear_output 
import torch
from torch import nn
import numpy as np
import pandas as pd

from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
!pip install -r https://raw.githubusercontent.com/ultralytics/yolov5/master/requirements.txt

## Get Object Detection from YOLO

In [4]:
model = torch.hub.load('ultralytics/yolov5', 'custom', '/content/drive/MyDrive/Tactile Sensor/model.pt') 
images = [
    "/content/drive/MyDrive/Tactile Sensor/images/1.png",
    "/content/drive/MyDrive/Tactile Sensor/images/2.png",
    "/content/drive/MyDrive/Tactile Sensor/images/3.png",
    "/content/drive/MyDrive/Tactile Sensor/images/4.png",
    "/content/drive/MyDrive/Tactile Sensor/images/5.png",
    "/content/drive/MyDrive/Tactile Sensor/images/6.png",
    "/content/drive/MyDrive/Tactile Sensor/images/7.png",
    "/content/drive/MyDrive/Tactile Sensor/images/8.png",
]
results = model(images)

Downloading: "https://github.com/ultralytics/yolov5/zipball/master" to /root/.cache/torch/hub/master.zip
YOLOv5 🚀 2023-1-9 Python-3.8.16 torch-1.13.0+cu116 CPU

Fusing layers... 
YOLOv5s summary: 157 layers, 7012822 parameters, 0 gradients, 15.8 GFLOPs
Adding AutoShape... 


## Shape Returned Data

In [5]:
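# results.pandas().xyxy is a list with one DataFrame per image.
# Stacking it into a single array assumes exactly one detection per image.
data = np.array(results.pandas().xyxy)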
df = pd.DataFrame({
    'xmin': data[:, 0, 0], 
    'ymin': data[:, 0, 1], 
    'xmax': data[:, 0, 2], 
    'ymax': data[:, 0, 3], 
    'confidence': data[:, 0, 4], 
    'class': data[:, 0, 5], 
    'name': data[:, 0, 6]
})
df

| | xmin | ymin | xmax | ymax | confidence | class | name |
|---|---|---|---|---|---|---|---|
| 0 | 906.650574 | 215.124756 | 1126.825073 | 526.016418 | 0.917952 | 0 | 0 |
| 1 | 1074.559326 | 195.01207 | 1277.986816 | 548.637146 | 0.898038 | 0 | 0 |
| 2 | 570.556641 | 111.673859 | 754.431519 | 240.853912 | 0.892166 | 0 | 0 |
| 3 | 538.139832 | 21.461456 | 760.539368 | 185.778992 | 0.89058 | 0 | 0 |
| 4 | 534.04541 | 0.0 | 737.37854 | 114.226547 | 0.890327 | 0 | 0 |
| 5 | 494.853088 | 436.521423 | 641.529846 | 586.613586 | 0.921078 | 0 | 0 |
| 6 | 429.716675 | 460.232605 | 613.557495 | 668.624817 | 0.923419 | 0 | 0 |
| 7 | 378.069458 | 517.873413 | 594.305969 | 720.0 | 0.927537 | 0 | 0 |
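
The shaping cell above reads row 0 of every per-image result, which only works when each image yields exactly one detection. As a minimal sketch of a more defensive variant, the helper below keeps the highest-confidence box per image and skips images without detections. The `best_detection` name and the plain-dict rows are hypothetical stand-ins for the YOLOv5 output, used only for illustration.

```python
from typing import Dict, List, Optional


def best_detection(detections: List[Dict]) -> Optional[Dict]:
    """Return the detection with the highest confidence, or None if empty."""
    if not detections:
        return None
    return max(detections, key=lambda det: det["confidence"])


# Hypothetical per-image results: two boxes, no boxes, one box
per_image = [
    [{"xmin": 10.0, "confidence": 0.40}, {"xmin": 12.0, "confidence": 0.92}],
    [],
    [{"xmin": 30.0, "confidence": 0.88}],
]

rows = []
for image_index, detections in enumerate(per_image):
    best = best_detection(detections)
    if best is not None:
        # Keep the image index so rows can be matched back to their files
        rows.append({"image": image_index, **best})

print(rows)
```

Keeping the image index in each row matters once images can be skipped, because the row position no longer matches the position in `images`.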


Trim the data to only relevant columns

In [6]:
df = df[['xmin','ymin','xmax','ymax']]
df

| | xmin | ymin | xmax | ymax |
|---|---|---|---|---|
| 0 | 906.650574 | 215.124756 | 1126.825073 | 526.016418 |
| 1 | 1074.559326 | 195.01207 | 1277.986816 | 548.637146 |
| 2 | 570.556641 | 111.673859 | 754.431519 | 240.853912 |
| 3 | 538.139832 | 21.461456 | 760.539368 | 185.778992 |
| 4 | 534.04541 | 0.0 | 737.37854 | 114.226547 |
| 5 | 494.853088 | 436.521423 | 641.529846 | 586.613586 |
| 6 | 429.716675 | 460.232605 | 613.557495 | 668.624817 |
| 7 | 378.069458 | 517.873413 | 594.305969 | 720.0 |


Get the center of the image, which approximates the center of the TPU finger. This assumes that the camera is in the same position every time and that the center of the photo is the true center of the finger.

In [7]:
img=Image.open(images[0])
w,h=img.size
xorigin, yorigin = w/2, h/2
xorigin, yorigin

(640.0, 360.0)

In [8]:
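# NOTE: despite their names, 'xcenter' and 'ycenter' hold the box width and
# height (max - min), not the coordinates of the box center.
df['xcenter'] = df['xmax'] - df['xmin']
df['ycenter'] = df['ymax'] - df['ymin']
df['area'] = df['xcenter'] * df['ycenter']
# NOTE: the squared terms are subtracted here, so 'distance' is not a Euclidean
# distance (that would add them). The form is kept as-is because the values
# printed below were computed with it.
df['distance'] = ((xorigin - df['xcenter'])**2 - (yorigin - df['ycenter'])**2) ** 0.5

# 'angle' is arccos((width - 1) / distance), converted from radians to degrees
df['angle'] = ((df['xcenter'] - 1) / df['distance'])
df['angle'] = df['angle'].apply(lambda x: np.arccos(x) * 180 / 3.1415 )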

df

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['xcenter'] = df['xmax'] - df['xmin']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['ycenter'] = df['ymax'] - df['ymin']


| | xmin | ymin | xmax | ymax | xcenter | ycenter | area | distance | angle |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 906.650574 | 215.124756 | 1126.825073 | 526.016418 | 220.1745 | 310.891663 | 68450.416215 | 416.943428 | 58.288385 |
| 1 | 1074.559326 | 195.01207 | 1277.986816 | 548.637146 | 203.42749 | 353.625076 | 71937.061754 | 436.525963 | 62.374178 |
| 2 | 570.556641 | 111.673859 | 754.431519 | 240.853912 | 183.874878 | 129.180054 | 23752.966607 | 393.411082 | 62.301695 |
| 3 | 538.139832 | 21.461456 | 760.539368 | 185.778992 | 222.399536 | 164.317535 | 36544.143652 | 368.91533 | 53.121896 |
| 4 | 534.04541 | 0.0 | 737.37854 | 114.226547 | 203.33313 | 114.226547 | 23226.041366 | 360.934018 | 55.905589 |
| 5 | 494.853088 | 436.521423 | 641.529846 | 586.613586 | 146.676758 | 150.092163 | 22015.031855 | 446.437589 | 70.957173 |
| 6 | 429.716675 | 460.232605 | 613.557495 | 668.624817 | 183.84082 | 208.392212 | 38310.995185 | 430.228167 | 64.852154 |
| 7 | 378.069458 | 517.873413 | 594.305969 | 720.0 | 216.236511 | 202.126587 | 43707.147981 | 393.257524 | 56.818537 |
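
In the cell above, `xcenter` and `ycenter` are computed as `xmax - xmin` and `ymax - ymin`, that is, the box width and height. If center-based features are wanted instead, one possible formulation is sketched below. The `box_features` helper, its dictionary keys, and the use of `atan2` for the angle are assumptions for illustration, not the method that produced the values in this notebook.

```python
import math


def box_features(xmin, ymin, xmax, ymax, xorigin, yorigin):
    """Center-based features for one bounding box (illustrative sketch)."""
    width = xmax - xmin
    height = ymax - ymin
    xcenter = (xmin + xmax) / 2
    ycenter = (ymin + ymax) / 2
    dx = xcenter - xorigin
    dy = ycenter - yorigin
    return {
        "width": width,
        "height": height,
        "xcenter": xcenter,
        "ycenter": ycenter,
        "area": width * height,
        # Euclidean distance from the box center to the origin
        "distance": math.hypot(dx, dy),
        # Direction of the box center in degrees (image axes, y points down)
        "angle": math.degrees(math.atan2(dy, dx)),
    }


# Hypothetical box to the right of a hypothetical origin at (1, 1)
features = box_features(4.0, 0.0, 6.0, 2.0, xorigin=1.0, yorigin=1.0)
print(features)
```

Using `atan2` rather than `arccos` keeps the angle defined for every box position and distinguishes boxes above the origin from boxes below it.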


In [9]:
#@title Annotate Depth (Click to Expand)

btn_index = -1
button = widgets.Button(description="Click to start")
output = widgets.Output()

# Create a function that updates the image widget
def update_image_widget(image_file):
  # Convert the binary image to a PIL image
  image = Image.open(image_file)

  width = int(image.size[0] * 0.5)
  height = int(image.size[1] * 0.5)

  # resize the image to 50% of the original size
  image = image.resize((width, height))

  # Display the image in the widget
  eval_js('google.colab.output.setIframeHeight('+str(height + 130)+')')
  display(image)

def on_button_clicked(b):
  global btn_index

  # Display the message within the output widget.
  with output:
    if(btn_index < 0):
      print("Start annotating now.")
      button.description = "Next Image"
      depth.disabled = False
      update_image_widget(images[0])

    elif(btn_index < df.shape[0]-1):
      clear_output()
      print("placed at (index, depth) ("+str(btn_index)+", "+str(depth.value)+")")
      # Record the depth for the image that was on screen, then show the next one
      df.at[btn_index, 'depth'] = depth.value
      update_image_widget(images[btn_index + 1])

    else:
      clear_output()
      print("placed at (index, depth) ("+str(btn_index)+", "+str(depth.value)+")")
      df.at[btn_index, 'depth'] = depth.value

      depth.disabled=True
      button.disabled=True
      display(df)

    btn_index += 1

depth = widgets.IntText(
    value=2,
    description='Depth (mm):',
    disabled=True
)
print("Press 'Next Image' to annotate images with depth info.")
button.on_click(on_button_clicked)
display(depth, button, output)

Press 'Next Image' to annotate images with depth info.


IntText(value=2, description='Depth (mm):', disabled=True)

Button(description='Click to start', style=ButtonStyle())

Output()

# Create Deep Neural Network

Create DNN for depth predictions

From LeakyAI - https://www.youtube.com/watch?v=r5D6bnCJ490

In [10]:
# Check to see if we have a GPU to use for training
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('A {} device was detected.'.format(device))

# Print the name of the cuda device, if detected
if device=='cuda':
    print (torch.cuda.get_device_name(device=device))

A cpu device was detected.


## Standardize Data

In [11]:
mean = {
    'xmin': df['xmin'].mean(),
    'ymin': df['ymin'].mean(),
    'xmax': df['xmax'].mean(),
    'ymax': df['ymax'].mean(),
    'xcenter': df['xcenter'].mean(),
    'ycenter': df['ycenter'].mean(),
    'area': df['area'].mean(),
    'distance': df['distance'].mean(),
    'angle': df['angle'].mean(),
    'depth': df['depth'].mean(),
}

std = {
    'xmin': df['xmin'].std(),
    'ymin': df['ymin'].std(),
    'xmax': df['xmax'].std(),
    'ymax': df['ymax'].std(),
    'xcenter': df['xcenter'].std(),
    'ycenter': df['ycenter'].std(),
    'area': df['area'].std(),
    'distance': df['distance'].std(),
    'angle': df['angle'].std(),
    'depth': df['depth'].std(),
}

df['xmin']     = (df['xmin']     - mean['xmin']    ) / std['xmin']    
df['ymin']     = (df['ymin']     - mean['ymin']    ) / std['ymin']    
df['xmax']     = (df['xmax']     - mean['xmax']    ) / std['xmax']    
df['ymax']     = (df['ymax']     - mean['ymax']    ) / std['ymax']    
df['xcenter']  = (df['xcenter']  - mean['xcenter'] ) / std['xcenter'] 
df['ycenter']  = (df['ycenter']  - mean['ycenter'] ) / std['ycenter'] 
df['area']     = (df['area']     - mean['area']    ) / std['area']    
df['distance'] = (df['distance'] - mean['distance']) / std['distance']
df['angle']    = (df['angle']    - mean['angle']   ) / std['angle']   
df['depth']    = (df['depth']    - mean['depth']   ) / std['depth']   

In [22]:
display(mean)
display(std)

{'xmin': 615.8238754272461,
 'ymin': 244.73744773864746,
 'xmax': 813.3193283081055,
 'ymax': 448.84392738342285,
 'xcenter': 197.49545288085938,
 'ycenter': 204.1064796447754,
 'area': 40992.97557685175,
 'distance': 405.8316376259497,
 'angle': 60.57745087023986,
 'depth': 2.125}

{'xmin': 243.65872423297344,
 'ymin': 203.05753647028646,
 'xmax': 251.73838345271653,
 'ymax': 233.34701680277556,
 'xcenter': 25.381534760922136,
 'ycenter': 86.16781927313689,
 'area': 19708.02440038967,
 'distance': 31.624213312428132,
 'angle': 5.719562957988278,
 'depth': 0.8345229603962802}

The above values are to be used in the 2D visualization Notebook.
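
Because another notebook reuses these statistics, it may help to see the transform and its inverse side by side. The sketch below uses only the standard library; `fit_scaler`, `standardize`, and `unstandardize` are hypothetical helper names, and the depth labels are hypothetical values chosen to be consistent with the depth mean and std printed above. `statistics.stdev` is the sample standard deviation, which matches the pandas `.std()` default.

```python
import statistics


def fit_scaler(values):
    """Return (mean, sample standard deviation) of a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)


def standardize(value, mean, std):
    """Map a raw value to standardized units."""
    return (value - mean) / std


def unstandardize(value, mean, std):
    """Map a standardized value back to raw units."""
    return value * std + mean


# Hypothetical depth labels in mm, consistent with the printed mean and std
depths_mm = [2.0, 1.0, 1.0, 3.0, 2.0, 2.0, 3.0, 3.0]
depth_mean, depth_std = fit_scaler(depths_mm)

z = standardize(3.0, depth_mean, depth_std)
restored = unstandardize(z, depth_mean, depth_std)
print(depth_mean, depth_std, z, restored)
```

Any input passed to the trained network at inference time needs the same `mean` and `std` that were fitted on the training data, not statistics recomputed from the new sample.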

In [12]:
df

| | xmin | ymin | xmax | ymax | xcenter | ycenter | area | distance | angle | depth |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.193582 | -0.145834 | 1.245363 | 0.33072 | 0.893525 | 1.23927 | 1.393211 | 0.35137 | -0.400217 | -0.149786 |
| 1 | 1.882697 | -0.244883 | 1.845835 | 0.42766 | 0.233715 | 1.735202 | 1.570126 | 0.970596 | 0.314137 | -1.348076 |
| 2 | -0.185781 | -0.6553 | -0.233925 | -0.891334 | -0.536633 | -0.869541 | -0.874771 | -0.392755 | 0.301464 | -1.348076 |
| 3 | -0.318823 | -1.09957 | -0.209662 | -1.127355 | 0.981189 | -0.461761 | -0.225737 | -1.167343 | -1.303518 | 1.048503 |
| 4 | -0.335627 | -1.205262 | -0.301666 | -1.43399 | 0.229997 | -1.04308 | -0.901508 | -1.419723 | -0.816821 | -0.149786 |
| 5 | -0.496476 | 0.944481 | -0.682413 | 0.590407 | -2.002192 | -0.62685 | -0.962955 | 1.284015 | 1.814775 | -0.149786 |
| 6 | -0.763803 | 1.061252 | -0.793529 | 0.941863 | -0.537975 | 0.049737 | -0.136086 | 0.771451 | 0.747383 | 1.048503 |
| 7 | -0.975768 | 1.345116 | -0.870004 | 1.162029 | 0.738374 | -0.022977 | 0.137719 | -0.39761 | -0.657203 | 1.048503 |


In [13]:
# Create our PyTorch tensors and move to CPU or GPU if available
# Extract the inputs and create a PyTorch tensor x (inputs)
inputs = ['xmin','ymin','xmax','ymax','xcenter','ycenter','area','distance','angle']
df[inputs] = df[inputs].astype(float)
x = torch.tensor(df[inputs].values, dtype=torch.float, device=device)

# Extract the outputs and create a PyTorch tensor y (outputs)
outputs = ['depth']
df[outputs] = df[outputs].astype(float)
y = torch.tensor(df[outputs].values,dtype=torch.float, device=device)

In [14]:
# Explore the first 5 inputs
x[0:5]

tensor([[ 1.19358, -0.14583,  1.24536,  0.33072,  0.89353,  1.23927,  1.39321,  0.35137, -0.40022],
        [ 1.88270, -0.24488,  1.84583,  0.42766,  0.23371,  1.73520,  1.57013,  0.97060,  0.31414],
        [-0.18578, -0.65530, -0.23392, -0.89133, -0.53663, -0.86954, -0.87477, -0.39275,  0.30146],
        [-0.31882, -1.09957, -0.20966, -1.12736,  0.98119, -0.46176, -0.22574, -1.16734, -1.30352],
        [-0.33563, -1.20526, -0.30167, -1.43399,  0.23000, -1.04308, -0.90151, -1.41972, -0.81682]])

In [15]:
# Explore the first 5 outputs
y[0:5]

tensor([[-0.14979],
        [-1.34808],
        [-1.34808],
        [ 1.04850],
        [-0.14979]])

## DNN

Fully Connected Layer -> ReLU Activation Layer -> Fully Connected Layer

In [16]:
# Define your PyTorch neural network
# Number of Inputs: 9
# Number of Hidden Units: 100
# Number of Hidden Layers: 1
# Activation Function: ReLU
# Number of Outputs: 1
model = nn.Sequential(
            nn.Linear(9, 100),
            nn.ReLU(),
            nn.Linear(100, 1)
        )

# Move it to either the CPU or GPU depending on what we have available
model.to(device)

Sequential(
  (0): Linear(in_features=9, out_features=100, bias=True)
  (1): ReLU()
  (2): Linear(in_features=100, out_features=1, bias=True)
)
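
As a quick size check on the architecture printed above, the number of trainable parameters follows directly from the layer shapes: each fully connected layer has `in_features * out_features` weights plus `out_features` biases. The `linear_params` helper is a hypothetical name used only for this arithmetic.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


hidden = linear_params(9, 100)   # first layer: Linear(9, 100)
output = linear_params(100, 1)   # second layer: Linear(100, 1)
total = hidden + output
print(hidden, output, total)     # 1000 101 1101
```

With 1101 parameters and only 8 annotated rows, the network has far more capacity than data, which is worth keeping in mind when reading the training results.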

## Train DNN

In [17]:
import torch.optim as optim

# Measure our neural network by mean squared error
criterion = torch.nn.MSELoss()

# Train our network with a simple SGD approach
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Train our network using the entire dataset 2 times
for epoch in range(2):
    totalLoss = 0
    for i in range(len(x)):
        
        # Single Forward Pass
        ypred = model(x[i])
        
        # Measure how well the model predicted vs actual
        loss = criterion(ypred, y[i])
        
        # Track how well the model predicted
        totalLoss+=loss.item()
        
        # Update the neural network
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Print out our loss after each training iteration
    print ("Total Loss: ", totalLoss)

Total Loss:  9.384383098979015
Total Loss:  28.15400403738022
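
The total loss rises between the two epochs, which may indicate that the per-sample updates with momentum are overshooting on such a small dataset. One variant that could be tried is full-batch training with plain SGD over more epochs. The sketch below is an assumption for illustration, not the setup used in this notebook: the tensors are random stand-ins with the same shapes as `x` and `y`, and the epoch count and optimizer settings are arbitrary.

```python
import torch
from torch import nn

torch.manual_seed(0)  # make the sketch reproducible

# Hypothetical stand-in data: 8 samples, 9 standardized features, 1 target
x_demo = torch.randn(8, 9)
y_demo = torch.randn(8, 1)

demo_model = nn.Sequential(
    nn.Linear(9, 100),
    nn.ReLU(),
    nn.Linear(100, 1),
)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(demo_model.parameters(), lr=0.01)

losses = []
for epoch in range(200):
    optimizer.zero_grad()
    # One forward pass over all samples, so each epoch is a single update
    loss = criterion(demo_model(x_demo), y_demo)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first epoch loss: {losses[0]:.4f}, last epoch loss: {losses[-1]:.4f}")
```

Recording the mean loss per epoch, rather than a sum over samples, also makes runs with different dataset sizes easier to compare.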


In [18]:
# Save the model
torch.save(model, 'cnn_model.pt')

## Visualize Training of DNN

In [19]:
# Import visualization library
import matplotlib.pyplot as plt

# Plot predictions vs. true values
@torch.no_grad() 
def graphPredictions(model, x, y , minValue, maxValue):
    
    model.eval()                               # Set the model to inference mode
    
    predictions=[]                             # Track predictions
    actual=[]                                  # Track the actual labels
    
    x = x.to(device)                           # .to() returns a tensor, so keep the result
    y = y.to(device)
    model.to(device)
    
    for i in range(len(x)):
        
        # Single forward pass
        pred = model(x[i])                               

        # Un-normalize our prediction
        pred = pred*std['depth']+mean['depth']
        act = y[i]*std['depth']+mean['depth']
        
        # Save prediction and actual label
        predictions.append(pred.tolist())
        actual.append(act.item())
    
    # Plot actuals vs predictions
    plt.scatter(actual, predictions)
    plt.xlabel('Actual Depth')
    plt.ylabel('Predicted Depth')
    plt.plot([minValue,maxValue], [minValue,maxValue]) 
    plt.xlim(minValue, maxValue)
    plt.ylim(minValue, maxValue)
 
    # Make the display equal in both dimensions
    plt.gca().set_aspect('equal', adjustable='box')
    plt.show()

In [20]:
# Depth labels are a few millimetres, so plot a 0-5 mm range
graphPredictions(model, x, y, 0, 5)
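
Alongside the scatter plot, a numeric summary can make it easier to compare training runs. The sketch below computes the mean absolute error and root mean squared error with the standard library. The function names and the depth values are hypothetical and serve only to illustrate the calculation.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference between paired values."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical depths in mm (already un-normalized)
actual_mm = [1.0, 2.0, 3.0]
predicted_mm = [1.5, 2.0, 2.0]

mae = mean_absolute_error(actual_mm, predicted_mm)
rmse = root_mean_squared_error(actual_mm, predicted_mm)
print(f"MAE: {mae:.3f} mm, RMSE: {rmse:.3f} mm")
```

Both metrics should be computed on un-normalized depths so that they read directly in millimetres.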

# Get Depth Inference

In [21]:
# Take one annotated row as the sample (its features are already standardized)
idx = 3
sample = {
    'xmin': df['xmin'][idx],
    'ymin': df['ymin'][idx],
    'xmax': df['xmax'][idx],
    'ymax': df['ymax'][idx],
    'xcenter': df['xcenter'][idx],
    'ycenter': df['ycenter'][idx],
    'area': df['area'][idx],
    'distance': df['distance'][idx],
    'angle': df['angle'][idx],
}

# Look up the annotated depth for this row.
# NOTE: df['depth'] was standardized above, so this value is in standardized
# units, not millimetres.
actual = df['depth'][idx]

# Use the CPU as we just need to do a single pass
model.to('cpu')

# The features taken from df are already standardized, so the normalization
# below is left disabled
'''
sample['xmin']     = (sample['xmin']     - mean['xmin']    ) / std['xmin']    
sample['ymin']     = (sample['ymin']     - mean['ymin']    ) / std['ymin']    
sample['xmax']     = (sample['xmax']     - mean['xmax']    ) / std['xmax']    
sample['ymax']     = (sample['ymax']     - mean['ymax']    ) / std['ymax']    
sample['xcenter']  = (sample['xcenter']  - mean['xcenter'] ) / std['xcenter'] 
sample['ycenter']  = (sample['ycenter']  - mean['ycenter'] ) / std['ycenter'] 
sample['area']     = (sample['area']     - mean['area']    ) / std['area']    
sample['distance'] = (sample['distance'] - mean['distance']) / std['distance']
sample['angle']    = (sample['angle']    - mean['angle']   ) / std['angle']   
'''

# Create our input tensor
x1 = torch.tensor(list(sample.values()), dtype=float)

# Pass the input into the neural network
y1 = model(x1.float())

# Un-normalize our output y1
y1 = y1*std['depth']+mean['depth']
   
# Compare what your network predicted to the actual
print ("Neural Network Predicts: ", y1.item())
print ("Actual Result: ", actual)

Neural Network Predicts:  2.313155174255371
Actual Result:  1.0485032066517366
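
The two printed numbers are on different scales: the prediction was un-normalized before printing, while `actual` was read from the standardized `df`. A small sketch of putting both in millimetres, using only the values printed in this notebook (the variable names are chosen here for illustration):

```python
depth_mean = 2.125                        # mean['depth'] printed above
depth_std = 0.8345229603962802            # std['depth'] printed above

predicted_mm = 2.313155174255371          # un-normalized network output
actual_standardized = 1.0485032066517366  # value printed as "Actual Result"

# Apply the same inverse transform that was applied to the prediction
actual_mm = actual_standardized * depth_std + depth_mean
error_mm = abs(predicted_mm - actual_mm)

print(f"actual: {actual_mm:.3f} mm, absolute error: {error_mm:.3f} mm")
# actual: 3.000 mm, absolute error: 0.687 mm
```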


At the time of uploading, the error is still quite high. The entire training set is only 8 images and, unlike the YOLOv5 detector, this network is trained from scratch rather than with transfer learning.

In the future I will retrain it to improve accuracy.
