This is a repository that contains utility functions when working with Ai2-THOR, an open-source simulator of embodied agents in household environments. The idea of this repository is that even though Ai2-THOR updates its version rather frequently with potential changes to its API, thortils will always provide the SAME API for commonly useful functionalities one would need ("Do one thing once"). This includes, for example:
-
import thortils as tt controller = tt.launch_controller({"scene": "FloorPlan1"})
-
import thortils as tt controller = tt.launch_controller({"scene": "FloorPlan1"}) event = controller.step(action="Pass") result = tt.thor_visible_objects(event)
The result is a list, where each element is a dictionary that contains metadata about an object (from the
event
):>>> result[0] {'name': 'Cabinet_5e0161e9', 'position': {'x': -1.8499999046325684, 'y': 2.015000104904175, 'z': 0.3799999952316284}, 'rotation': {'x': -0.0, 'y': 90.0, 'z': -0.0}, 'visible': True, 'obstructed': False, 'receptacle': True, ...
-
Construct a 3D map of a scene as a point cloud, using Open3D
import thortils as tt controller = tt.launch_controller({"scene": "FloorPlan1"}) mapper = tt.map3d.Mapper3D(controller) mapper.automate(num_stops=20, sep=1.5) mapper.map.visualize()
The output looks like:
-
Construct a proper 2D map by projecting the 3D map
# continuing from the above example grid_map = mapper.get_grid_map(floor_cut=0.1) # treat bottom 0.1m as floor viz = tt.utils.visual.GridMapVisualizer(grid_map=grid_map, res=30) img = viz.render() viz.show_img(img)
The output looks like
See more at test_mapper.py
-
Projection of object detection bounding boxes onto the 2D grid map
For code, please refer to the test
tests/test_project_object_detection_gridmap.py
linked above. The result looks like: -
Get shortest path to object. Please refer to the linked function for details.
The branches of thortils are named after the version it is built for. Currently, the version on this branch is 3.3.4. For later versions of Ai2-THOR,
you can create a branch on top of this one, and run tests under tests/
, and fix bugs due to the Ai2-THOR version upgrade. The API of thortils
should stay the same or could be expanded.
- COS-POMDP: Code for "Towards Optimal Correlational Object Search" (ICRA 2022)
- ai2thor-web: Running AI2-THOR in browser to conduct user studies with non-technical, remote participants.
If you find this package useful, please cite the paper "Towards Optimal Correlational Object Search, International Conference on Robotics and Automation (ICRA), 2022.
@inproceedings{zheng2022towards,
title={Towards Optimal Correlational Object Search,
booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
author={Zheng, Kaiyu and Chitnis, Rohan and Sung, Yoonchang and Konidaris, George and Tellex, Stefanie},
year={2022}
}
The codebase for this paper, a good example of using this package, is here: https://github.com/zkytony/cos-pomdp
Ai2-THOR by default provides a "GetReachablePositions" function. You might want to use this to construct a grid map of the scene, but that is actually incorrect. Because there are many places in the scene that are not reachable and will be excluded. As an example to illustrate the problem, below is a screenshot of a kitchen scene. The left shows the first-person view, and the right shows the grid map obtained based on the "GetReachablePositions" function. The black cell corresponds to an occupied place, such as on the table or the counter. (Ignore the colors and the graph for now)
The problem is that clearly there are more occupied place that is not included in this grid map. Also, some we may not want to distinguish some occupied places, such as the areas inside a fridge or a cabinet. We cannot do it using this method.
Instead, thortils provides a method proper_convert_scene_to_grid_map
which
obtains the 2D grid map by projecting a 3D map of the scene constructed from a sequence of RGBD images collected at sampled viewpoints within the scene. This 2D grid map is an occupancy grid map, which means a grid cell is either free, occupied, or unknown. This accounts for places
that could be inside a container (like a fridge).
The constructed 3D map (left); the ceilling and floor (middle); the walls and furnitures (right)
The 2D projection (left) and the corresponding 2D grid map (right). Black indicates obstacle (or occupied), gray indicates unknown (e.g. inside fridge), and cyan indicates free space (the robot can access)
Note that the coordinates of this grid map are 0-based integers (instead of metric), which
can be more convenient to work with. The granularity depends on the grid_size
setting of the Ai2-THOR controller.
-
Clone the repository and then install it by:
pip install -e .
-
Run a little test
cd tests python test_scene_to_grid_map.py
Expected output:
FloorPlan22 xxx............xxxx xxx............xxxx xxx............xxxx xxx...........xxxxx xxx..........xxxxxx xxx.........xxxxxxx xxx........xxxxxxxx xxx.......xxxxxxxxx xx........xxxxxxxxx xx........xxxxxxx.. ..........xxxxxxx.. ................... ................... ...................
(note: this is only a test; this grid map is not actually an accurate reflection of the scene.)
If this works, then you should be good to go. Try running the other tests under
tests/
. -
Optionally, obtain a scene dataset. You can either:
-
Download scenes.zip and scene_scatter_plots.zip and decompress them in the root directory of this repository, or
-
Run the following scripts to generate these two datasets:
cd scripts python build_scene_dataset.py ../scenes python create_scatter_plots.py ../scenes/ ../scene_scatter_plots
This is only necessary if you would like to use the functions provided by SceneDataset.
-
Inside thortils/:
- agent.py: Functions related to the agent (e.g. pose)
- controller.py: Launching the controller, and the
thor_get
function. - object.py: Functions related to the objects (e.g. visible objects)
- scene.py: Functions related to scenes (e.g. scene names, convert scene to grid map, ThorSceneInfo, SceneDataset)
- grid_map.py: The
GridMap
class (0-based index of coordinates). Can be converted from an Ai2thor scene - interactions.py: Functions that correspond to calling different interaction actions in Thor (e.g.
OpenObject
means callingcontroller.step(action="OpenObject")
). - constants.py: The configuration, including parameters used as default when launching controllers.
- utils.py: Non-Thor related utility functions
In ai2thor, a pose is typically a tuple (position, rotation)
.
Although ai2thor likes to use dictionary, we often use tuples in this codebase:
-
position
(tuple): tuple(x, y, z)
; ai2thor uses (x, z) for robot base -
rotation
(tuple): tuple(x, y, z)
; pitch, yaw, roll.Not doing quaternion because in ai2thor the mobile robot can only do two of the rotation axes so there's no problem using Euclidean. Will use DEGREES. Will restrict the angles to be between 0 to 360 (same as ai2thor).
yaw refers to rotation of the agent's body. pitch refers to rotation of the camera up and down.
There are two kinds of pose representations throughout the code in this repo:
- Full pose: refers to a tuple
(position, rotation)
, defined below. - simplified pose refers to (x, z, pitch, yaw)
When specifying actions in ai2thor, you supply an action name and a dictionary of parameters. For navigation actions, we also use a format as follows:
(action_name, (forward, h_angle, v_angle))
We sometimes call variables "action_delta" or "delta" to refer to (forward, h_angle, v_angle)
This is only a few functions among all that you can run on the command line.
python -m thortils.controller
You can specify a scene
python -m thortils.controller FloorPlan2
The following enters debugger with an event object to play with
python -m thortils.controller --debug
python -m thortils.controller FloorPlan2 --debug
In scripts/
there is a utility program that starts a controller,
and allows you to control the agent with keyboard to navigate around.
python scripts/kbcontrol.py
Example output:
w
(MoveAhead)
a d
(RotateLeft) (RotateRight)
e
(LookUp)
c
(LookDown)
q
(quit)
w | Agent pose: ((-1.25, 0.9009995460510254, 1.0), (-0.0, 270.0, 0.0))
w | Agent pose: ((-1.25, 0.9009995460510254, 1.0), (-0.0, 270.0, 0.0))
a | Agent pose: ((-1.25, 0.9009995460510254, 1.0), (-0.0, 225.0, 0.0))
d | Agent pose: ((-1.25, 0.9009995460510254, 1.0), (-0.0, 270.0, 0.0))
a | Agent pose: ((-1.25, 0.9009995460510254, 1.0), (-0.0, 225.0, 0.0))
Feel free to open issues about mistakes, or contribute directly by sending pull requests (to this REAdME documentation or to the codebase in general).