<a href="https://colab.research.google.com/github/tinachengece/CMSC818V_HW1/blob/main/CMSC818v_HW1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

*UIDs:*

# **CMSC818v Homework 1: Tactile Depth Estimation**


## Objective

In this project we want to use deep neural networks to predict 3D contact geometry from monocular images of a vision-based tactile sensor.

## Background

  - **Tactile sensors** are devices designed to measure information arising from the physical interaction of robots with their environment. These sensors excel in detecting stimuli resulting from mechanical stimulation, temperature variations, and even pain-like responses.
<p align="center">
  <img src="https://drive.google.com/uc?export=view&id=1dBNV2bzJVY4bS70TljKFSjzn7X40vY70" alt="DIGIT on Allegro" width="450"/>
</p>
However, recent sensor developments in this field, often inspired by the biological sense of cutaneous touch, have predominantly concentrated on capturing the 3D geometry of contact. In this project, we aim to extend this focus to predicting that contact geometry, particularly for GelSight tactile sensors. The figure below illustrates the resolution of tactile sensors when they come into contact with various objects.
<p align="center">
  <img src="https://drive.google.com/uc?export=view&id=1JtePPB9wisU5XdIZ56P_Omo_84QZmFGk" alt="Digit images"/>
</p>
The papers below explain how these sensors work and may help you with this project: <br>
<a href="http://gelsight.csail.mit.edu/wedge/ICRA2021_Wedge.pdf"> GelSight Wedge: Measuring High-Resolution 3D Contact Geometry with a Compact Robot Finger</a><br>
<a href="https://arxiv.org/pdf/2005.14679.pdf">DIGIT: A Novel Design for a Low-Cost Compact High-Resolution Tactile Sensor with Application to In-Hand Manipulation</a>

- **Depth Prediction** is the task of estimating the distance of each pixel relative to the camera. Depth is extracted from either monocular (single) or stereo (multiple views of a scene) images. Traditional methods use multi-view geometry to find the relationship between the images. Newer methods can directly estimate depth by minimizing a regression loss, or by learning to generate a novel view from a sequence. *You can also watch one of the recent works on reconstructing objects with tactile sensors on [YouTube](https://www.youtube.com/watch?v=38utg590wao)*.


## Objective
In this project, we aim to learn the inverse sensor model that reconstructs local 3D geometry from a tactile image. The task involves training the model in a supervised manner to predict local heightmaps and contact areas from tactile images. One potential strategy is to integrate depth and contact prediction within a stacked neural network, as outlined in [Depth Map Prediction from a Single Image using a Multi-Scale Deep Network](https://arxiv.org/pdf/1406.2283.pdf), but we encourage you to design your own approach tailored to the specific challenges of this problem.

## Step 1: Data-loading *(10 points)*
Create a custom program to read images from the [provided dataset](https://drive.google.com/drive/folders/16BcGTVkj4s0y9kWM9vIFo40MdPjRRv7L?usp=drive_link). You might need to preprocess the data, as these are raw tactile readings from the sensor without any normalization. For further guidance, refer to the [PyTorch tutorial](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html).
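As a rough illustration of the preprocessing mentioned above, the sketch below scales an 8-bit tactile image and depth map to [0, 1] and derives a binary contact mask from the depth. The helper name `preprocess_pair`, the 8-bit input assumption, and the synthetic data are assumptions for illustration only. The `depth > 0` rule follows the convention used in the dataset class of this notebook.

```python
import numpy as np
import torch


def preprocess_pair(tactile_u8, depth_u8):
    """Hypothetical helper: convert raw uint8 arrays to float tensors.

    tactile_u8: (H, W, 3) uint8 RGB tactile image
    depth_u8:   (H, W) uint8 depth / heightmap image
    """
    # scale to [0, 1] and move channels first: (3, H, W)
    tactile = torch.from_numpy(tactile_u8.astype(np.float32) / 255.0).permute(2, 0, 1)
    # depth as a single-channel tensor: (1, H, W)
    depth = torch.from_numpy(depth_u8.astype(np.float32) / 255.0).unsqueeze(0)
    # assumed convention: any positive depth counts as contact
    contact = (depth > 0).float()
    return tactile, depth, contact


# usage on synthetic data (not the provided dataset)
rng = np.random.default_rng(0)
tactile_u8 = rng.integers(0, 256, size=(320, 240, 3), dtype=np.uint8)
depth_u8 = np.zeros((320, 240), dtype=np.uint8)
depth_u8[100:150, 80:120] = 200  # a fake 50 x 40 contact patch
tactile, depth, contact = preprocess_pair(tactile_u8, depth_u8)
print(tactile.shape, depth.shape, int(contact.sum()))
```

Whether a fixed division by 255 is adequate depends on the actual value range of the raw readings. Per-channel mean and standard deviation normalization is a common alternative.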

In [None]:
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms, utils
from skimage import transform
from torch.utils.data import Dataset, DataLoader
import numpy as np
import matplotlib.pyplot as plt
from IPython.core.pylabtools import figsize
import time
import os
import re
import tqdm
import random
from imageio import imread
from pathlib import Path
from PIL import Image, ImageFile


class TactileDataset(Dataset):

    def __init__(self, tactile_dir, depth_dir, transform=None):
        super(TactileDataset, self).__init__()

        self.tactile_dir = tactile_dir
        self.depth_dir = depth_dir
        self.transform = transform

        # list and sort the files once, so that tactile and depth images are paired by index
        # (os.listdir returns entries in arbitrary order)
        self.tactile_files = sorted(os.listdir(tactile_dir))
        self.depth_files = sorted(os.listdir(depth_dir))

    def __len__(self):
        return len(self.tactile_files)

    def __getitem__(self, idx):

        # read as PIL images
        tactile_sample = Image.open(os.path.join(self.tactile_dir, self.tactile_files[idx])).convert('RGB')
        depth_sample = Image.open(os.path.join(self.depth_dir, self.depth_files[idx])).convert('L')

        contact_sample = (np.array(depth_sample) > 0).astype(np.uint8)       # contact sample can be retrieved from depth_sample where depth is greater than 0

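        # train transform: re-seed before each call so tactile and depth receive the same random augmentation
        # (torchvision transforms draw from torch's RNG, so seed torch as well as Python's random)
        seed = random.randint(0, 2 ** 32 - 1)
        if self.transform:
            random.seed(seed)
            torch.manual_seed(seed)
            tactile_sample = self.transform(tactile_sample)

            random.seed(seed)
            torch.manual_seed(seed)
            depth_sample = self.transform(depth_sample)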

        # resize depth image if needed
        # calculate contact mask based on depth


        # convert to torch tensor
        sample = {'tactile':tactile_sample, 'depth': depth_sample, 'contact': contact_sample}

        return sample

# Add some transformation based on your choice that suits the diversity you expect to see during testing. This step is one of the most important parts that can affect the model's performance.
# you can check https://pytorch.org/vision/stable/transforms.html for existing augmentations
trans_train = transforms.Compose([
    transforms.Resize((320, 240)),

])

trans_test = transforms.Compose([
    transforms.Resize((320,240)), # resize to training images shape
])

## load data
root_dir = Path(r"  ")
data_dir = []

for category_dir in sorted(root_dir.iterdir()):
    if category_dir.is_dir():
        tactile_dir = category_dir / "tactile"
        depth_dir = category_dir / "depth"

        print(f"\nCategory: {category_dir.name}")

        # Check if both subfolders exist
        if tactile_dir.exists() and depth_dir.exists():
            tactile_files = sorted(os.listdir(tactile_dir))
            depth_files = sorted(os.listdir(depth_dir))

            # compare the number of files, not the Path objects themselves
            if len(tactile_files) == len(depth_files):
                print(f"Valid folder (tactile: {len(tactile_files)}, depth: {len(depth_files)})")
                data_dir.append(category_dir)

        else:
            print(f"  Skipping {category_dir.name} (missing tactile/ or depth/ folder)")

for d in data_dir:
    print(" -", d.name)

# Splitting
random.seed(42)

random.shuffle(data_dir)
split_idx = int(0.7 * len(data_dir))
data_dir_train = data_dir[ :split_idx]
data_dir_valid = data_dir[split_idx: ]

print(f"Total valid folders: {len(data_dir)}")
print(f"Training folders: {len(data_dir_train)}")
print(f"Validation folders: {len(data_dir_valid)}")

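bs = 16  # batch size (assumed value, not specified in the assignment; tune as needed)

# data_dir_train / data_dir_valid are lists of category folders, so build one dataset per folder and concatenate
dataset_train = torch.utils.data.ConcatDataset(
    [TactileDataset(d / 'tactile', d / 'depth', transform=trans_train) for d in data_dir_train])
dataloader_train = DataLoader(dataset_train, batch_size=bs, shuffle=True)

dataset_valid = torch.utils.data.ConcatDataset(
    [TactileDataset(d / 'tactile', d / 'depth', transform=trans_test) for d in data_dir_valid])
dataloader_valid = DataLoader(dataset_valid, batch_size=bs, shuffle=False)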

datalen_train = len(dataset_train)
datalen_valid = len(dataset_valid)

print(datalen_train, datalen_valid)



## Step 2: Network Design *(40 points)*
Design the neural network, incorporating various [layers](https://pytorch.org/docs/stable/nn.html). Additionally, consider initializing the layer weights using predefined [PyTorch initializers](https://pytorch.org/docs/stable/nn.init.html). Inspired by [1], you may use a coarse network for contact prediction and a fine network for depth prediction, with the fine network providing higher resolution.
<p align="center">
  <img src="https://drive.google.com/uc?export=view&id=1_IJxSfYNjsU6wkE0QwSMF4LSvENuJPRL" alt="Digit images"/>
</p>
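To make the layer and initializer suggestions above concrete, here is a minimal sketch of a small encoder-decoder with explicit weight initialization. The class name `TinyContactNet`, the helper `init_weights`, the layer sizes, and the choice of Kaiming initialization are all assumptions for illustration. This is not the architecture from [1] and not a reference solution.

```python
import torch
import torch.nn as nn


def init_weights(module):
    """Hypothetical helper: Kaiming init for conv layers, constants for batch norm."""
    if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.kaiming_normal_(module.weight, mode='fan_out', nonlinearity='relu')
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.BatchNorm2d):
        nn.init.ones_(module.weight)
        nn.init.zeros_(module.bias)


class TinyContactNet(nn.Module):
    """Illustrative encoder-decoder; layer sizes are arbitrary assumptions."""

    def __init__(self, init=True):
        super().__init__()
        # two stride-2 convolutions downsample the input by a factor of 4
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
        )
        # two stride-2 transposed convolutions restore the input resolution
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(16, 1, kernel_size=4, stride=2, padding=1),
        )
        if init:
            self.apply(init_weights)  # visits every submodule recursively

    def forward(self, x):
        # returns per-pixel contact logits with the same spatial size as x
        return self.decoder(self.encoder(x))


net = TinyContactNet()
logits = net(torch.randn(2, 3, 320, 240))
print(logits.shape)
```

The output is left as raw logits so that it can be paired with a logit-based loss. Apply a sigmoid if probabilities are needed for visualization.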


In [None]:
class ContactNet(nn.Module):

    def __init__(self, init=True):
        super(ContactNet, self).__init__()
        # define your network layers that take a tactile image and output the predicted contact mask

        if init:
            # Initialize the weights
            pass

    def forward(self, x):
        # implement the forward pass to predict the contact
        return c  # return contact



class TactileDepthNet(nn.Module):

    def __init__(self, init=True):
        super(TactileDepthNet, self).__init__()
        # define your network layers that takes tactile image and outputs the predicted depth (heightmap)

        if init:
            # Initialize the weights
            pass

    def forward(self, x, contact_output_batch):

        return d #return depth

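# initialize
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # use the GPU when one is available
contact_model = ContactNet(init=False).to(device)
tactile_depth_model = TactileDepthNet(init=False).to(device)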


## Step 3: Loss Function *(10 points)*
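One possible starting point, sketched here under stated assumptions, is the scale-invariant log loss from [1] for depth together with a binary cross-entropy loss for contact. With $d_i = \log \hat{y}_i - \log y_i$ over $n$ pixels, the depth loss is $\frac{1}{n}\sum_i d_i^2 - \frac{\lambda}{n^2}\left(\sum_i d_i\right)^2$. The class name `ScaleInvariantLoss` and the `eps` offset (needed because non-contact pixels may have zero depth) are assumptions, and other losses may suit this task better.

```python
import torch
import torch.nn as nn


class ScaleInvariantLoss(nn.Module):
    """Sketch of the scale-invariant log loss described in Eigen et al. [1]."""

    def __init__(self, lam=0.5, eps=1e-6):
        super().__init__()
        self.lam = lam  # lam=0 gives plain log-space L2, lam=1 ignores global scale
        self.eps = eps  # assumed offset so that zero depth does not give log(0)

    def forward(self, pred, target):
        # d_i = log(pred_i) - log(target_i), computed per pixel
        d = torch.log(pred.clamp(min=0) + self.eps) - torch.log(target.clamp(min=0) + self.eps)
        d = d.flatten(start_dim=1)  # (batch, n_pixels)
        n = d.shape[1]
        per_image = (d ** 2).sum(dim=1) / n - self.lam * d.sum(dim=1) ** 2 / n ** 2
        return per_image.mean()


depth_loss = ScaleInvariantLoss(lam=1.0)
contact_loss = nn.BCEWithLogitsLoss()  # for a contact head that outputs raw logits

target = torch.rand(2, 1, 8, 8) + 0.1
scaled_value = depth_loss(2.0 * target, target).item()  # close to 0: a global scale factor is ignored when lam=1
bce_value = contact_loss(torch.zeros(2, 1, 8, 8), torch.ones(2, 1, 8, 8)).item()  # equals log(2)
print(scaled_value, bce_value)
```

In [1] the training loss uses `lam=0.5`, a compromise between the plain log-space L2 error and the fully scale-invariant error.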


In [None]:
class Loss(nn.Module):
    def __init__(self):
        super(Loss, self).__init__()

    def forward(self, pred, target):

        # define the loss function based on the task and your expectation of the network's output
        return loss

#criterion
contact_criterion = Loss()
tactile_depth_criterion = Loss()

# optimizer
contact_optimizer = torch.optim.

tactile_depth_optimizer = torch.optim.

# data parallel
contact_model = nn.DataParallel(contact_model)
tactile_depth_model = nn.DataParallel(tactile_depth_model)

In [None]:
def plot_losses(train_losses, valid_losses):
    plt.plot(train_losses, label='train losses')
    plt.plot(valid_losses, label='valid losses')

    plt.xlabel("Epochs")
    plt.ylabel("Losses")

    plt.legend()
    plt.title("Losses")
    plt.grid(True)

## Step 4: Training Networks *(30 points)*

In [None]:
## Contact Model
train_losses = []
valid_losses = []
tl_b = []

start = time.time()
for epoch in tqdm.tqdm(range(num_epochs)):  # num_epochs must be defined before running this cell

    train_loss = 0
    contact_model.train()
    for i, samples in enumerate(dataloader_train):

        tactiles = samples['tactile'].float().to(device)
        contacts = samples['contact'].float().to(device)

        # forward pass
        output = contact_model(tactiles)

        # compute contact loss


        # backward pass

        # optimization

        train_loss += loss.item()
        tl_b.append(loss.item())

    train_losses.append(train_loss / datalen_train)

    valid_loss = 0
    contact_model.eval()
    with torch.no_grad():
        for i, samples in enumerate(dataloader_valid):

            tactiles = samples['tactile'].float().to(device)
            contacts = samples['contact'].float().to(device)

            # forward pass contact_model
            output =
            # compute contact loss
            loss =

            valid_loss += loss.item()

    valid_losses.append(valid_loss / datalen_valid)

    # save contact_model with torch.save


elapse = time.time() - start
print('Time used (Sec): ', elapse, ' per epoch used: ', elapse / num_epochs)
plot_losses(train_losses, valid_losses)

In [None]:
plt.subplot(311)
plt.plot(tl_b, label='train loss')
plt.grid(True)
plt.legend()

plt.subplot(312)
plt.plot(valid_losses, label='val loss')
plt.grid(True)
plt.legend()

plt.subplot(313)
plt.plot(tl_b, label='train loss')
fml = np.mean(tl_b)
plt.axhline(y = fml, color='r', linestyle='-', label='final mean train loss: {:.2f}'.format(fml))
plt.grid(True)
plt.legend()

In [None]:
## Tactile Depth Model
train_losses_, valid_losses_ = [], []
tl_b_ = []
start = time.time()
for epoch in range(num_epochs):

    print('>', end=' ')

    train_loss = 0
    tactile_depth_model.train()
    for i, samples in enumerate(dataloader_train):

        tactiles = samples['tactile'].float().to(device)
        depths = samples['depth'].float().to(device)

        # results from contact

        # forward pass

        # backward pass

        # optimization

        train_loss += loss.item()
        tl_b_.append(loss.item())

    train_losses_.append(train_loss / datalen_train)

    valid_loss = 0
    tactile_depth_model.eval()
    with torch.no_grad():
        for i, samples in enumerate(dataloader_valid):

            tactiles = samples['tactile'].float().to(device)
            depths = samples['depth'].float().to(device)

            # results from contact network (gradients are already disabled by the enclosing torch.no_grad())
            contact_model.eval()
            contact_output = contact_model(tactiles).unsqueeze(1)

            # forward pass tactile_depth_model

            # compute loss from tactile_depth_criterion
            loss =

            valid_loss += loss.item()
    valid_losses_.append(valid_loss / datalen_valid)

    # save tactile_depth_model with torch.save

elapse = time.time() - start
print('Time used (Sec): ', elapse, ' per epoch used: ', elapse / num_epochs)
plot_losses(train_losses_, valid_losses_)

## Step 5: Evaluation *(10 points)*

In [None]:
## Evaluation
## You should evaluate multiple error and accuracy metrics that are used for depth estimation. Some of them are mentioned in Section 4.3 in https://arxiv.org/pdf/1406.2283.pdf
## Provide per-object metric results and discuss how object shape influences the performance of your method.
outputs = np.array([])
for i, samples in enumerate(dataloader_valid):

    tactiles = samples['tactile'].float().to(device)
    depths = samples['depth'].float().to(device)

    # results from contact network
    contact_model.eval()
    with torch.no_grad():
        contact_output = contact_model(tactiles).unsqueeze(1)

    # results from tactile depth network
    tactile_depth_model.eval()
    with torch.no_grad():
        tactile_depth_output = tactile_depth_model(tactiles, contact_output)
    break

# show 10 sample images (from both the train and test sets) in a subplot figure. Each row should represent a tactile image, and there should be three columns: the original image, the predicted depth, and the predicted contact.
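The error and accuracy metrics referred to in the evaluation cell above (Section 4.3 of [1]) can be sketched with NumPy as follows. The function name `depth_metrics` and the masking of zero-depth ground-truth pixels are assumptions: the relative and log metrics are undefined where the true depth is zero, so only contact pixels are scored here.

```python
import numpy as np


def depth_metrics(pred, target, eps=1e-6):
    """Sketch of common depth metrics over pixels with valid ground truth.

    The mask `target > eps` is an assumption: relative and log errors are
    undefined where the ground-truth depth is zero (no contact).
    """
    mask = target > eps
    p = np.maximum(pred[mask], eps)  # avoid log(0) and division by zero
    t = target[mask]
    ratio = np.maximum(p / t, t / p)
    return {
        'abs_rel': float(np.mean(np.abs(p - t) / t)),
        'sq_rel': float(np.mean((p - t) ** 2 / t)),
        'rmse': float(np.sqrt(np.mean((p - t) ** 2))),
        'rmse_log': float(np.sqrt(np.mean((np.log(p) - np.log(t)) ** 2))),
        'delta_1.25': float(np.mean(ratio < 1.25)),  # threshold accuracy
    }


# toy example: one non-contact pixel (ignored) and three contact pixels
target = np.array([[0.0, 1.0], [2.0, 4.0]])
pred = np.array([[0.3, 1.1], [2.0, 2.0]])
metrics = depth_metrics(pred, target)
print(metrics)
```

Calling such a function separately on the samples of each object category is one way to obtain the per-object results requested above.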

Define a function below that instantiates the networks again and loads the weights for new predictions. This function will be used for testing purposes.
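As a hedged sketch of the weight-loading part, the snippet below saves a `state_dict`, re-instantiates the network, and loads the weights back. Because the models in this notebook are wrapped in `nn.DataParallel`, their saved parameter names carry a `module.` prefix that has to be stripped when loading into an unwrapped network. The helper names, the file name `contact_model.pt`, and the stand-in network are hypothetical.

```python
import os
import tempfile

import torch
import torch.nn as nn


def strip_module_prefix(state_dict):
    """Drop the 'module.' prefix that nn.DataParallel adds to parameter names."""
    return {k[len('module.'):] if k.startswith('module.') else k: v
            for k, v in state_dict.items()}


def load_model(model_cls, checkpoint_path, device='cpu'):
    """Hypothetical helper: re-instantiate a network and load saved weights."""
    model = model_cls()
    state = torch.load(checkpoint_path, map_location=device)
    model.load_state_dict(strip_module_prefix(state))
    return model.to(device).eval()


# stand-in network (a single conv layer plays the role of ContactNet here)
def make_net():
    return nn.Conv2d(3, 1, kernel_size=3, padding=1)


# simulate training-time saving of a DataParallel-wrapped model
trained = nn.DataParallel(make_net())
path = os.path.join(tempfile.mkdtemp(), 'contact_model.pt')
torch.save(trained.state_dict(), path)

# reload into a fresh, unwrapped network and compare predictions
restored = load_model(make_net, path)
x = torch.randn(1, 3, 16, 16)
with torch.no_grad():
    same = torch.allclose(restored(x), trained.module(x))
print(same)
```

A `predict` function could call such a loader once for each network and then apply the same preprocessing that was used during training before running the forward passes.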

In [None]:
def predict(tactile_image):

  return contact, depth

**Grading Criteria:**




Step 1: Data-loading => 10%

Step 2: Network Design => 40%

Step 3: Loss Function => 10%

Step 4: Training Networks => 30%

Step 5: Evaluation => 10%


References:

[1] [Depth Map Prediction from a Single Image using a Multi-Scale Deep Network](https://arxiv.org/pdf/1406.2283.pdf)

[2] [MidasTouch: Monte-Carlo inference over distributions across sliding touch](https://arxiv.org/pdf/2210.14210.pdf)

[3] [depth-eigen](https://github.com/shuuchen/depth_eigen/tree/master)
