# Mastering with ROS: Shadow Hand

## Unit 5: Perception and Object Recognition

<p style="background:green;color:white;">SUMMARY</p>

Estimated time of completion: <b>1h</b><br><br>
This Unit will show you how to use Perception and Object Recognition to get the position of graspable objects.

<p style="background:green;color:white;">END OF SUMMARY</p>

In the previous Unit, you learned how to perform grasping tasks using the **Smart Grasping System**. However, that system gets the position of the objects to grasp directly from the simulation data. As you may imagine, this approach does not work in real environments, where there is no simulation data. In real (non-simulated) scenarios, robots can only get data about their environment from their sensors, for instance, a Kinect camera.

So, in this Unit, we are going to see how to get the position of the objects we want to grasp from the sensors, rather than from the simulation data.

One of the most useful perception skills is being able to recognise objects. This allows you to create robots that can grasp objects and understand the world around them a little bit better.
<br><br>
There are two main skills to master here:<br>

* **Recognise flat surfaces**: This allows the robot to detect places where objects usually are. For instance, tables or shelves. It's the first step to take when searching for objects.

* **Recognise objects**: Once you know where to look, you have to be able to recognise the different objects in the scene and localise where they are placed in the environment.

For this Unit, we have made a couple of modifications to the simulation in order to make it easier to recognise objects in the environment.

* The ball to grasp now has a black and white texture, which makes it easier for the camera to detect.
* The camera is closer to the table, in order to get a better view of the scene.

<img src="img/ball_with_texture.png" width="400" />

<img src="img/closer_camera.png" width="600" />

So... with the proper introductions made, let's start working!

## Table Top Detector

The first step towards recognising objects is knowing where they are most likely to be. For this, we are going to use a part of the <a href="http://wg-perception.github.io/object_recognition_core/">tabletop_object_detector</a> package. With this package, we will be able to detect flat surfaces and represent them in RVIZ.

In order to see how you can use this package, just follow the next exercise!

<p style="background:#EE9023;color:white;">Exercise 5.1</p>

a) The first step is to create your own object recognition package:

<table style="float:left;background: #407EAF">
<tr>
<th>
<p class="transparent">Execute in WebShell #1</p>
</th>
</tr>
</table>

In [None]:
roscd;cd ../src

In [None]:
catkin_create_pkg my_object_recognition_pkg rospy object_recognition_core

b) Inside this package, create a **launch** folder containing a launch file named **init_table_top.launch**. Copy the following code inside this file:

In [None]:
<launch>
    
    <arg name="tabletop_ork_file" value="$(find my_object_recognition_pkg)/conf/detection.tabletop_shadow.ros.ork"/>
    
    <node pkg="object_recognition_core"
    type="detection"
    name="tabletop_server_node"
    args="-c $(arg tabletop_ork_file)"
    output="screen">
    </node>

</launch>

So, as you can see, you are launching a binary file called **detection** with a configuration file as the argument. This configuration file, called **detection.tabletop_shadow.ros.ork**, is where all the input sensors and values for the table detection are set. It's basically a YAML file with a different extension (**.ork**).

c) So, the next step will be to create a directory named **conf** inside your package. Then, create a file named **detection.tabletop_shadow.ros.ork**. Inside this file, copy the following code:

In [None]:
source1:
  type: RosKinect
  module: 'object_recognition_ros.io'
  parameters:
    rgb_frame_id: '/camera_depth_optical_frame'
    rgb_image_topic: '/camera/rgb/image_raw'
    rgb_camera_info: '/camera/rgb/camera_info'
    depth_image_topic: '/camera/depth/image_raw'
    depth_camera_info: '/camera/depth/camera_info'
    #
    #crop_enabled: True
    #x_min: -0.4
    #x_max: 0.4
    #y_min: -1.0
    #y_max: 0.2
    #z_min: 0.3
    #z_max: 1.5

sink1:
  type: TablePublisher
  module: 'object_recognition_tabletop'
  inputs: [source1]

pipeline1:
  type: TabletopTableDetector
  module: 'object_recognition_tabletop'
  inputs: [source1]
  outputs: [sink1]
  parameters:
    table_detector:
      min_table_size: 4000
      plane_threshold: 0.01
    #clusterer:
    #  table_z_filter_max: 0.35
    #  table_z_filter_min: 0.025


Of all the parameters you see here, the ones that are relevant most of the time are these:
<br><br>
**rgb_frame_id**: '/camera_depth_optical_frame'<br>
**rgb_image_topic**: '/camera/rgb/image_raw'<br>
**rgb_camera_info**: '/camera/rgb/camera_info'<br>
**depth_image_topic**: '/camera/depth/image_raw'<br>
**depth_camera_info**: '/camera/depth/camera_info'<br>

This will set the correct image topics as inputs so that the recognition can be made.
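A topic name in the configuration that does not match what the camera actually publishes is an easy mistake to make here. The following is a minimal plain-Python sketch of a check for that, not part of the object recognition packages. `missing_topics` is a hypothetical helper, and the `published` list is an illustrative stand-in for the output of `rostopic list`, which you would collect yourself.

```python
# Topic parameters copied from the source1 section of
# detection.tabletop_shadow.ros.ork (rgb_frame_id is a TF frame, not a topic).
SOURCE_TOPICS = {
    "rgb_image_topic": "/camera/rgb/image_raw",
    "rgb_camera_info": "/camera/rgb/camera_info",
    "depth_image_topic": "/camera/depth/image_raw",
    "depth_camera_info": "/camera/depth/camera_info",
}


def missing_topics(configured, published):
    """Return {parameter: topic} for configured topics not found in published."""
    available = set(published)
    return {name: topic for name, topic in configured.items()
            if topic not in available}


# Illustrative stand-in for the output of `rostopic list` (not real output).
published = [
    "/camera/rgb/image_raw",
    "/camera/rgb/camera_info",
    "/camera/depth/image_raw",
]

# Report every configured topic that nobody is publishing.
for name, topic in sorted(missing_topics(SOURCE_TOPICS, published).items()):
    print("%s points to %s, which is not being published" % (name, topic))
```

With the sample list above, the check reports **depth_camera_info**, because its topic is absent from `published`.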

d) With everything set up, just execute the launch file.

<table style="float:left;background: #407EAF">
<tr>
<th>
<p class="transparent">Execute in WebShell #1</p>
</th>
</tr>
</table>

In [None]:
roslaunch my_object_recognition_pkg init_table_top.launch

e) Finally, open RVIZ and add all the elements you want to visualize (like the Camera or the PointCloud2 elements). In order to visualise the Tabletop detection, you will have to add the **OrkTable** element. Then, you have to set the topic where the table data is being published, in this case, **/table_array**. You can then check certain options, like **bounding_box**, in order to have a bounding box around the tabletop detection, or the **top** option in order to visualize what is being considered as the top of the surface.

At the end, you should have something similar to this:

<img src="img/tabletop.png" width="700" />

<p style="background:#EE9023;color:white;">End of Exercise 5.1</p>

## Let's get some pictures!

The next step is to take some pictures of the object we want to grasp, in order to detect some key points that define it. With these key points, we will be able to detect the object later, by comparing the pictures taken with what the camera is seeing.

For this purpose we will use the **find_object_2d** package. So, in order to see how to do this, just follow the next exercise!

<p style="background:#EE9023;color:white;">Exercise 5.2</p>

a) Inside your object recognition package, create a new launch file called **start_find_object_2d.launch**. Copy the following code inside it:

In [None]:
<?xml version="1.0" encoding="UTF-8"?>

<launch> 
    <arg name="camera_rgb_topic" default="/camera/rgb/image_raw" />
    <node name="find_object_2d_node" pkg="find_object_2d" type="find_object_2d" output="screen">
        <remap from="image" to="$(arg camera_rgb_topic)"/>
    </node>
    
</launch> 

As you can see, you just need to set the RGB camera image source and the system will be ready to go. In this case, it's **/camera/rgb/image_raw**.

b) Launch this file and go to the Graphical Tools tab. You should see something similar to this:

<img src="img/photos1.png" width="600" />

After a few seconds, you will be able to see the scene.

<img src="img/photos2.png" width="600" />

c) Now, it's time to get some pics of the object we want to grasp! In order to do this, select the **Edit -> Add object from scene** option.

<img src="img/photos3.png" width="400" />

You can also add previously taken images directly, but bear in mind one peculiarity: images appear mirrored in this object recogniser compared with the images coming from the cameras, so be careful with that.

d) In the **Add Object** screen, you just have to follow the steps in order to select the section of the image that you consider to be the object.

Click on the **Take picture** button.

<img src="img/photos4.png" width="600" />

Select the desired section of the image. In this case, it's the ball. Try to make the selection a little bit bigger than the ball itself.

<img src="img/photos5.png" width="600" />

Finally, click the **End** button.

<img src="img/photos6.png" width="600" />

Great! Once done, you should be detecting the object on the table. This system compares the images received by the camera with the saved ones and looks for matching key points. If enough points match, it considers the object detected.
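The idea of "matching in enough points" can be illustrated with a toy sketch. This is not the implementation used by **find_object_2d**, which relies on OpenCV feature detectors and descriptors. The function names, the `ratio` value, the `min_matches` threshold and the tiny 2-D descriptors below are all assumptions made up for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def count_good_matches(object_descriptors, scene_descriptors, ratio=0.8):
    """Count object key points that have one clearly best match in the scene.

    A match is kept only if the nearest scene descriptor is noticeably closer
    than the second nearest one (nearest-neighbour distance ratio test).
    """
    good = 0
    for descriptor in object_descriptors:
        distances = sorted(distance(descriptor, s) for s in scene_descriptors)
        if len(distances) >= 2 and distances[0] < ratio * distances[1]:
            good += 1
    return good


def is_detected(object_descriptors, scene_descriptors, min_matches=3):
    """Declare a detection when enough key points match."""
    return count_good_matches(object_descriptors, scene_descriptors) >= min_matches


# Toy 2-D descriptors (real descriptors have many more dimensions).
saved_ball = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
scene_with_ball = [(0.02, 0.01), (0.98, 0.03), (0.01, 1.02), (1.01, 0.99), (5.0, 5.0)]
scene_without_ball = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]

print(is_detected(saved_ball, scene_with_ball))     # True
print(is_detected(saved_ball, scene_without_ball))  # False
```

This also hints at why a slightly larger selection around the ball helps: more textured area gives more key points, so it is easier to reach the match threshold.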

<img src="img/photos7.png" width="600" />

e) So, the last step will be to save all of the objects added. There are 2 main ways of doing this:<br>

* Saving the objects as images: **File -> Save Objects**. This will save all of the images taken in a folder.
* Saving the whole session: **File -> Save Session**. This will save a binary file with all of the images and settings. This is the most compact way of doing it, although you won't have access to the images of the objects. Which one to use depends on your needs.

For now, let's just save the whole session. Inside your package, create a new folder named **saved_pictures**, and save the session inside this folder. You can name it **ball_session**; the launch file in the next exercise expects to find it at **saved_pictures/ball_session.bin**.

<p style="background:#EE9023;color:white;">End of Exercise 5.2</p>

So, once you have your session stored, you need to be able to always start an object recognition session with all of that stored data. In order to do so, just follow the next exercise!

<p style="background:#EE9023;color:white;">Exercise 5.3</p>

a) Create a new launch file inside your package named **start_find_object_3d_session.launch**, and copy the following content into it.

In [None]:
<launch>
		
	<node name="find_object_3d" pkg="find_object_2d" type="find_object_2d" output="screen">
		<param name="gui" value="true" type="bool"/>
		<param name="settings_path" value="~/.ros/find_object_2d.ini" type="str"/>
		<param name="subscribe_depth" value="true" type="bool"/>
		<param name="session_path" value="$(find my_object_recognition_pkg)/saved_pictures/ball_session.bin" type="str"/>
		<param name="objects_path" value="" type="str"/>
		<param name="object_prefix" value="object" type="str"/>
		
		<remap from="rgb/image_rect_color" to="/camera/rgb/image_raw"/>
		<remap from="depth_registered/image_raw" to="/camera/depth/image_raw"/>
		<remap from="depth_registered/camera_info" to="/camera/depth/camera_info"/>
	</node>
	
</launch>

b) Launch the file. The TF of the detected object should now be published. If you have saved multiple images of the same object, you will get multiple object frames for it. It's up to you to filter them.
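One possible way to filter those frames is to merge detections that lie very close to each other. The sketch below is plain Python and only one option among many. `merge_close_detections` and its `radius` parameter are hypothetical, the positions are made-up values, and it assumes you have already expressed every object frame in a common reference frame.

```python
import math


def point_distance(a, b):
    """Euclidean distance between two (x, y, z) points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def merge_close_detections(positions, radius=0.05):
    """Group detections closer than `radius` metres and average each group.

    positions: dict mapping frame name -> (x, y, z) in a common frame.
    Returns a list of (frame_names, mean_position) tuples, one per group.
    """
    groups = []  # each group is a list of (name, position) pairs
    for name in sorted(positions):
        point = positions[name]
        for group in groups:
            # Greedy grouping: compare against the first member of the group.
            if point_distance(point, group[0][1]) <= radius:
                group.append((name, point))
                break
        else:
            groups.append([(name, point)])

    merged = []
    for group in groups:
        names = [name for name, _ in group]
        mean = tuple(sum(p[i] for _, p in group) / len(group) for i in range(3))
        merged.append((names, mean))
    return merged


# Made-up positions: two frames on the same ball, one somewhere else.
detections = {
    "object_8": (0.50, 0.10, 0.80),
    "object_9": (0.51, 0.11, 0.80),
    "object_10": (1.20, -0.30, 0.80),
}

for names, mean in merge_close_detections(detections):
    print("%s -> %s" % (names, mean))
```

Here **object_8** and **object_9** end up in the same group, so they are treated as a single object, while **object_10** stays on its own.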

<img src="img/object_detected1.png" width="500" />

<img src="img/object_detected2.png" width="500" />

c) You can also see the object detected by executing the following command in another terminal while the prior launch is working:


<table style="float:left;background: #407EAF">
<tr>
<th>
<p class="transparent">Execute in WebShell #2</p>
</th>
</tr>
</table>

In [None]:
rosrun find_object_2d print_objects_detected

<img src="img/object_dected_cmd.png" width="1000" />

<p style="background:#EE9023;color:white;">End of Exercise 5.3</p>

Awesome! So now we are able to get the TF of the object we want to grasp. But... how can we get the position of that object? That is, in fact, the data we really need if we want to grasp it.

Well, as you have seen in the previous exercise, the TF between the detected object and the **camera_depth_optical_frame** is now being published. And, obviously, the TF between this camera frame and the **world** frame, which represents the center of the environment, is also available. So... with this TF data being published, we can already know the position of the object relative to the world frame!
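To make the chaining of the two transforms concrete, here is a minimal plain-Python sketch of the underlying arithmetic. The function names are hypothetical and the camera pose and object position are illustrative values, not the ones from the simulation. Quaternions are written as (x, y, z, w), the order ROS uses.

```python
import math


def quaternion_rotate(q, v):
    """Rotate vector v by the unit quaternion q = (x, y, z, w)."""
    x, y, z, w = q
    vx, vy, vz = v
    # t = 2 * cross(q_xyz, v)
    tx = 2.0 * (y * vz - z * vy)
    ty = 2.0 * (z * vx - x * vz)
    tz = 2.0 * (x * vy - y * vx)
    # v' = v + w * t + cross(q_xyz, t)
    return (
        vx + w * tx + (y * tz - z * ty),
        vy + w * ty + (z * tx - x * tz),
        vz + w * tz + (x * ty - y * tx),
    )


def object_position_in_world(camera_in_world, object_in_camera):
    """Chain world->camera with camera->object and return the object position.

    camera_in_world: (translation, quaternion) of the camera frame in world.
    object_in_camera: (x, y, z) of the object in the camera frame.
    """
    translation, rotation = camera_in_world
    rotated = quaternion_rotate(rotation, object_in_camera)
    return tuple(t + r for t, r in zip(translation, rotated))


# Illustrative values: camera 1 m above the world origin, turned 90 degrees
# around the z axis; object 1 m along the camera's x axis.
half_angle = math.pi / 4.0
camera_in_world = ((0.0, 0.0, 1.0),
                   (0.0, 0.0, math.sin(half_angle), math.cos(half_angle)))
object_in_camera = (1.0, 0.0, 0.0)

# Approximately (0.0, 1.0, 1.0), up to floating point rounding.
print(object_position_in_world(camera_in_world, object_in_camera))
```

In a running system you do not need to do this by hand, because TF performs the chaining for you.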

So, in order to get the position of the object, you just need to check its TF relative to the world frame. You can do that with the following command:

In [None]:
rosrun tf tf_echo world <object_frame>

So, if the frame of your object is named, like in this notebook, **object_8**, the command would be:

In [None]:
rosrun tf tf_echo world object_8

After a few seconds, you will get something like this:

<img src="img/object_tf.png" width="600" />

Awesome! So now, you are able to get the position of the object to grasp by using Object Recognition! Now, you can go to the next Unit (Project) and use this data to actually grasp the object.