# Assignment 02 for 3D World Representations

In this assignment you will complete two programming tasks and answer one written question. You are expected to use Python along with libraries such as **NumPy**, **SciPy**, **Matplotlib**, and **Open3D** (for 3D visualization). Please ensure your code is well-commented and modular.



## 1 Task Based Mapping Methods
Roboticists at the University of Bonn are developing several robots for different applications: 

- **UBoButler**: A humanoid robot tasked with sorting objects on a kitchen counter. The robot must identify and manipulate items like 'cups', 'plates', and 'utensils' and place them in designated locations. It does not wander out of the kitchen.

- **UBoWanderDog**: A quadruped navigating hilly terrain with obstacles such as 'rocks', 'trees', and 'bushes'. The robot's primary task is to explore the area and map potential paths.

- **UBoAssist**: A humanoid that helps elderly and visually challenged people navigate dense urban areas with pedestrians, cyclists, other vehicles, traffic lights, and crosswalks.

These robots rely on accurate 3D perception to function effectively. Their perception systems process visual data to create 3D maps of their environments. The team is focused on using pixel-level classification from deep learning models to label elements in the scene. To understand the trade-offs involved, consider the relative computational cost of each approach:

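- **Panoptic Segmentation**: Combines semantic and instance segmentation by labeling every pixel with both a class and an instance ID. Highest computational cost.

- **Semantic Segmentation**: Provides pixel-level classification of object categories without distinguishing individual objects.

- **Instance Segmentation**: Similar to Semantic Segmentation but requires additional computation to separate individual object instances.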

Determine the most suitable scene understanding approach for mapping (Semantic Segmentation, Instance Segmentation, or Panoptic Segmentation) for each of the three scenarios (Kitchen, Hilly Terrain, and Downtown Area). Justify your choice for each scenario. 

***Answer here in plain text***


## 2 Semantic Point Cloud Creation and Stitching            

In this task you are provided with a set of depth images, semantic segmentation masks, camera poses, and intrinsics. Your goal is to generate and visualize a colored 3D point cloud. Each depth image should be converted to a local point cloud, colored using the semantic mask and a provided CSV file that maps 41 semantic classes (including background) to RGB colors, and then transformed into world coordinates using the camera pose. Finally, the per-image point clouds should be stitched together and visualized as one global colored point cloud.

- a. Convert each depth image into a 3D point cloud using the camera intrinsics. 

- b. Modify the function above to color the point cloud. For each depth image, use the corresponding semantic segmentation mask and the provided CSV file, which maps semantic classes to colors, to assign an RGB color to each point. Use nearest-neighbor interpolation to align the depth image with the segmentation image. Assume the segmentation image has the same intrinsics as the color image, and that the optical centers of the RGB and depth cameras coincide.

- c. Then, transform the colored local point cloud into world coordinates using the provided camera pose.

- d. Stitch the transformed point clouds from all images into a single global point cloud and visualize it using a 3D visualization tool such as Open3D. 
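
Steps a and b can be sketched with plain NumPy. This is a minimal illustration under stated assumptions, not a reference solution: depth is assumed to be in meters with zero marking a missing measurement, intrinsics are passed as hypothetical `(fx, fy, cx, cy)` tuples, and the class ID stored in the mask is assumed to index a row of the color table directly. The names `depth_to_points` and `nearest_labels` are invented for this example.

```python
import numpy as np

def depth_to_points(depth, K):
    """Back-project an (H, W) depth image into (N, 3) camera-frame points."""
    fx, fy, cx, cy = K
    h, w = depth.shape
    # u indexes columns, v indexes rows
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0  # assumption: zero depth means "no measurement"
    u, v, z = u[valid], v[valid], depth[valid]
    points = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=1)
    return points, u, v

def nearest_labels(u, v, depth_K, color_K, mask):
    """Look up the mask label for each depth pixel via its nearest mask pixel."""
    fx_d, fy_d, cx_d, cy_d = depth_K
    fx_c, fy_c, cx_c, cy_c = color_K
    # With coincident optical centers, a depth pixel and a mask pixel that lie
    # on the same viewing ray share the same normalized image coordinates.
    u_c = np.rint((u - cx_d) / fx_d * fx_c + cx_c).astype(int)
    v_c = np.rint((v - cy_d) / fy_d * fy_c + cy_c).astype(int)
    inside = (u_c >= 0) & (u_c < mask.shape[1]) & (v_c >= 0) & (v_c < mask.shape[0])
    labels = np.zeros(u.shape, dtype=mask.dtype)  # assumption: class 0 is background
    labels[inside] = mask[v_c[inside], u_c[inside]]
    return labels

# Toy data: a 2x3 depth image and a 4x6 mask at twice the resolution
depth_K = (1.0, 1.0, 1.0, 0.5)
color_K = (2.0, 2.0, 2.0, 1.0)
depth = np.full((2, 3), 2.0)
depth[0, 0] = 0.0  # one invalid pixel
mask = np.arange(24).reshape(4, 6) % 3
palette = np.array([[0, 0, 0], [255, 0, 0], [0, 255, 0]], dtype=np.uint8)

points, u, v = depth_to_points(depth, depth_K)
labels = nearest_labels(u, v, depth_K, color_K, mask)
colors = palette[labels] / 255.0  # RGB in [0, 1], one row per point
```

In the actual task the palette would come from the provided CSV with its 41 classes; the three-row table here is only a stand-in.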

### Note:
- Use appropriate libraries (e.g., Open3D) for point cloud creation and visualization.
- Make sure to apply the correct transformations using camera poses and intrinsics.
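
Steps c and d amount to a rigid transform followed by concatenation. The sketch below assumes each pose is a 4×4 homogeneous camera-to-world matrix; if the provided poses are world-to-camera, they would need to be inverted first. The helper name `to_world` is hypothetical.

```python
import numpy as np

def to_world(points_cam, T_world_cam):
    """Apply a 4x4 camera-to-world pose to (N, 3) camera-frame points."""
    R = T_world_cam[:3, :3]
    t = T_world_cam[:3, 3]
    # Row-vector form of p_world = R @ p_cam + t
    return points_cam @ R.T + t

# Two toy clouds with their poses
T_a = np.eye(4)
T_b = np.eye(4)
T_b[:3, :3] = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]  # 90 degree rotation about z
T_b[:3, 3] = [1, 0, 0]                            # shift by 1 m along x

cloud_a = np.array([[0.0, 0.0, 1.0]])
cloud_b = np.array([[1.0, 0.0, 0.0]])

# Stitching is a row-wise concatenation once all clouds share the world frame
stitched = np.vstack([to_world(cloud_a, T_a), to_world(cloud_b, T_b)])
```

Per-point colors can be concatenated in the same order. For visualization, the stitched arrays can be wrapped in an `open3d.geometry.PointCloud` using `open3d.utility.Vector3dVector` and shown with `open3d.visualization.draw_geometries`.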


In [None]:
def load_data():
    ## Load all the depth images, semantic mask images, camera poses, intrinsics and csv for mapping semantic classes to colors
    pass

In [None]:
def convert_depth_to_pointcloud(depth, depth_intrinsics):
    pass

In [None]:
def convert_depth_semantics_to_colored_pointcloud(depth, semantics, depth_intrinsics, color_intrinsics):
    pass

In [None]:
def transform_pointcloud_to_worldframe(pointcloud, pose):
    pass

In [None]:
def stitch_pointcloud(clouds):
    pass


In [None]:
def visualize(stitched_cloud):
    pass


In [None]:
if __name__ == "__main__":
    # Call all your functions here
    pass

## 3 2D TSDF Grid Generation, Weighted Update, and Occupancy Computation

In this task you are given a parabola, $y=4x^2$, that represents a curved surface in the first quadrant. The scene covers a 1m × 1m area with a user-defined grid cell size, so your method must work for any cell size. You are also given multiple camera poses, each specifying the camera’s position and orientation in the scene. Your goal is to generate a 2D Truncated Signed Distance Function (TSDF) grid by raycasting from the camera pose(s) into the scene. Compute the TSDF as follows: along each ray cast from the camera, determine the intersection with the surface defined by $y=4x^2$ (the measured surface). For a given grid cell along that ray:

- If the cell lies between the camera and the measured surface, assign a positive TSDF value.

- If the cell is beyond the measured surface, assign a negative TSDF. 

- Apply the TSDF update principles from the lecture, including weight drop-off beyond -0.1m.

You can refer to state-of-the-art TSDF weight and distance calculation and update methods in the repository https://github.com/ethz-asl/voxblox. 
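
A common way to fuse repeated observations is a weighted running average per cell, $D \leftarrow \frac{W D + w d}{W + w}$ and $W \leftarrow \min(W + w, W_{max})$, where $d$ is the new truncated signed distance and $w$ its weight. The sketch below illustrates that idea with a linear weight drop-off behind the surface. The drop-off shape, the `eps` parameter, and the function names are assumptions made for this example; the rule taught in the lecture or used in voxblox may differ in detail.

```python
import numpy as np

def dropoff_weight(sdf, truncation=0.1, eps=0.01):
    """Measurement weight: 1 in front of the surface, fading to 0 behind it."""
    if sdf >= -eps:
        return 1.0
    # Assumed linear decay that reaches zero at sdf = -truncation
    return max((truncation + sdf) / (truncation - eps), 0.0)

def fuse(dist, weight, sdf, truncation=0.1, max_weight=100.0):
    """Fuse one signed-distance observation into a cell's (dist, weight)."""
    w_meas = dropoff_weight(sdf, truncation)
    if w_meas <= 0.0:
        return dist, weight  # observation carries no information
    sdf = float(np.clip(sdf, -truncation, truncation))
    new_dist = (dist * weight + sdf * w_meas) / (weight + w_meas)
    return new_dist, min(weight + w_meas, max_weight)

# A cell observed twice in front of the surface, then once far behind it
d, w = 0.0, 0.0
d, w = fuse(d, w, 0.05)
d, w = fuse(d, w, 0.03)
d_after, w_after = fuse(d, w, -0.2)  # beyond the truncation band: ignored
```

Capping the accumulated weight keeps the map responsive to new observations instead of letting early measurements dominate indefinitely.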

- a.	Generate a grid covering the 1m × 1m domain with a user-defined resolution (use 0.1m for initial testing). Initialize a map that stores two values per grid cell: the TSDF distance and the weight. Initialize the weights and distances suitably.

- b.	Using the provided camera pose(s), perform raycasting into the scene. You are given the helper function get_intersection_point(). Compute the TSDF value based on the distance from the cell to the measured surface along that ray using the following convention: cells between the camera and the surface have positive TSDF values, and cells beyond the surface have negative TSDF values. Update the cell’s weight only if its absolute TSDF value is within 0.1m. Document your raycasting method and detail how you use the get_intersection_point() function to determine the surface intersection.
                                                                                            		
- c.	Once the TSDF weights and distances are updated for all poses, generate an occupancy map from the two maps.  
		
- d.	Visualize the TSDF distance, weight, and occupancy maps for two different grid cell sizes: 0.1m and 0.01m.
		
- e.	Explain in text the trade-off between occupancy and TSDF maps on the one hand, and between different grid sizes on the other, based on the visualizations generated.
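
The sign convention in step b can be illustrated with a projective distance measured along the camera ray. This is a sketch under assumptions: the surface point is taken to be whatever `get_intersection_point()` returns for the pose, a zero-length ray is treated as "no intersection", and the helper name `signed_distance_along_ray` is hypothetical.

```python
import numpy as np

def signed_distance_along_ray(cam_xy, surface_xy, cell_xy):
    """Distance from a cell to the surface along the ray: >0 in front, <0 behind."""
    cam = np.asarray(cam_xy, dtype=float)
    ray = np.asarray(surface_xy, dtype=float) - cam
    surface_depth = np.linalg.norm(ray)
    if surface_depth < 1e-9:
        return None  # assumption: a zero-length ray means no valid intersection
    ray_dir = ray / surface_depth
    # Depth of the cell center projected onto the ray direction
    cell_depth = float(np.dot(np.asarray(cell_xy, dtype=float) - cam, ray_dir))
    return surface_depth - cell_depth

cam = (0.0, 0.0)
surface = (0.5, 1.0)  # lies on y = 4x^2
sdf_front = signed_distance_along_ray(cam, surface, (0.25, 0.5))   # halfway to the surface
sdf_behind = signed_distance_along_ray(cam, surface, (0.55, 1.1))  # past the surface

truncation = 0.1
update_front = abs(sdf_front) <= truncation    # too far in front: skip the weight update
update_behind = abs(sdf_behind) <= truncation  # outside the band behind the surface: skip
```

Only cells whose absolute signed distance falls within the truncation band would have their weight updated, which is what the final two lines check.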

**Note**: You can add helper functions to make the solution modular. However, the functions given below must be completed. You can call your helper functions within them.
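
As an example of such a helper, step c could use a simple sign-based rule to turn the two maps into occupancy. The thresholding below is one possible choice, not necessarily the mapping defined in the lecture: cells with too little weight stay at 0.5 (unknown), observed cells with positive distance are marked free, and the rest are marked occupied. The name `occupancy_from_tsdf` is hypothetical.

```python
import numpy as np

def occupancy_from_tsdf(dist, weight, weight_threshold=0.01):
    """Map TSDF distance and weight arrays to occupancy values in {0, 0.5, 1}."""
    occ = np.full(dist.shape, 0.5)       # default: unknown
    observed = weight >= weight_threshold
    occ[observed & (dist > 0)] = 0.0     # in front of the surface: free
    occ[observed & (dist <= 0)] = 1.0    # on or behind the surface: occupied
    return occ

dist = np.array([[0.10, 0.02], [-0.03, 0.00]])
weight = np.array([[1.0, 0.0], [2.0, 0.5]])
occ = occupancy_from_tsdf(dist, weight)
```

The hard threshold discards the sub-cell distance information that the TSDF carries, which is relevant to the trade-off discussed in step e.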
		


In [None]:
import numpy as np
import matplotlib.pyplot as plt

##############################
# Configuration and Constants
##############################

config = {
    'use_weight_dropoff': True,
    'default_truncation_distance': 0.1,  # meters
    'max_weight': 100.0,
    'use_const_weight': False
}

kFloatEpsilon = 1e-6

def get_intersection_point(pose, noise_factor=0.001):
    """
    Compute the intersection point of the sensor ray with the parabola defined by:
         y = 4x^2
    The sensor ray is given by: (x, y) + t*(cos(theta), sin(theta)).
    We solve for t from:
         y_cam + t*sin(theta) = 4*(x_cam + t*cos(theta))^2
    and add noise proportional to t. Internally the measured distance is stored as
    a negative value, but the returned intersection point is computed from its
    absolute value. If the ray misses the parabola, the camera position is returned.
    """
    x_cam, y_cam, theta = pose

    # Coefficients for the quadratic:
    # 4*cos(theta)^2 * t^2 + (8*x_cam*cos(theta) - sin(theta)) * t + (4*x_cam^2 - y_cam) = 0
    A = 4 * (np.cos(theta)**2)
    B = 8 * x_cam * np.cos(theta) - np.sin(theta)
    C = 4 * (x_cam**2) - y_cam

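    if abs(A) < kFloatEpsilon:
        # Near-vertical ray: the quadratic degenerates to B*t + C = 0
        t_lin = -C / B
        t_true = t_lin if t_lin > 0 else 0
    else:
        discriminant = B**2 - 4 * A * C
        if discriminant < 0:
            t_true = 0  # the ray misses the parabola
        else:
            t1 = (-B + np.sqrt(discriminant)) / (2 * A)
            t2 = (-B - np.sqrt(discriminant)) / (2 * A)
            # Keep the nearest intersection in front of the camera
            candidates = [t for t in [t1, t2] if t > 0]
            t_true = min(candidates) if candidates else 0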

    noise = np.random.randn() * noise_factor * t_true
    t_meas = -(t_true + noise)  # negative measured distance

    ray_direction = np.array([np.cos(theta), np.sin(theta)])
    # Compute intersection point using absolute measured distance
    point_G = np.array([x_cam, y_cam]) + (-t_meas) * ray_direction
    return point_G

In [None]:
def load_camera_poses_from_csv_file(csv_file):
    pass

In [None]:
def initialize_tsdf_map(grid_size=1.0, cell_size=0.1, config=config):
    # Initialize a tsdf map with 2 values each: weight and distance
    pass


In [None]:

def update_tsdf_map(camera_pose, grid_size=1.0, cell_size=0.1, config=config):
    # Update the tsdf map's weight and distances where applicable for the current camera pose
    # Use the get_intersection_point() function from the cell above to get the ray's intersection with the parabola, i.e. the surface point
    pass


In [None]:
def compute_occupancy_map(tsdf_map, weight_threshold=0.01,  config=config):
    # Compute occupancy map from the tsdf map's weights and distances. If weight < 0.01, assume occupancy is 0.5 i.e. unknown
    pass

In [None]:
def visualize_tsdf_weight(tsdf_map):
    pass

In [None]:
def visualize_tsdf_dist(tsdf_map):
    pass

In [None]:
def visualize_occupancy_map(occupancy_map):
    pass

In [None]:
if __name__ == '__main__':
    # Call all your functions here
    pass