Utility functions when working with Ai2-THOR. Try to do one thing once.

zkytony/thortils

thortils

This repository contains utility functions for working with Ai2-THOR, an open-source simulator of embodied agents in household environments. Ai2-THOR releases new versions rather frequently, with potential changes to its API; the idea of thortils is to always provide the SAME API for the commonly useful functionality one would need ("Do one thing once"). This includes, for example:

  • Launching a controller

    import thortils as tt
    
    controller = tt.launch_controller({"scene": "FloorPlan1"})
  • Get visible objects

    import thortils as tt
    
    controller = tt.launch_controller({"scene": "FloorPlan1"})
    event = controller.step(action="Pass")
    result = tt.thor_visible_objects(event)

    The result is a list, where each element is a dictionary that contains metadata about an object (from the event):

    >>> result[0]
    {'name': 'Cabinet_5e0161e9', 'position': {'x': -1.8499999046325684, 'y': 2.015000104904175, 'z': 0.3799999952316284}, 'rotation': {'x': -0.0, 'y': 90.0, 'z': -0.0}, 'visible': True, 'obstructed': False, 'receptacle': True, ...
    
  • Construct a 3D map of a scene as a point cloud, using Open3D

    import thortils as tt
    
    controller = tt.launch_controller({"scene": "FloorPlan1"})
    mapper = tt.map3d.Mapper3D(controller)
    mapper.automate(num_stops=20, sep=1.5)
    mapper.map.visualize()

    The output looks like:

  • Construct a proper 2D map by projecting the 3D map

    # continuing from the above example
    grid_map = mapper.get_grid_map(floor_cut=0.1)  # treat bottom 0.1m as floor
    viz = tt.utils.visual.GridMapVisualizer(grid_map=grid_map, res=30)
    img = viz.render()
    viz.show_img(img)

    The output looks like

    See more at test_mapper.py

  • Projection of object detection bounding boxes onto the 2D grid map

    For code, please refer to the test tests/test_project_object_detection_gridmap.py. The result looks like:

  • Get shortest path to object. Please refer to the linked function for details.
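The list of object dictionaries returned by the visible-objects example above can be filtered with plain Python. The sketch below is only an illustration: `unobstructed_receptacles` is a hypothetical helper, not part of thortils, and the sample entries are made-up stand-ins that reuse only the keys shown in the output above.

```python
def unobstructed_receptacles(objects):
    """Return the names of objects that are receptacles and not obstructed.

    `objects` is a list of metadata dictionaries shaped like the entries
    shown in the visible-objects example (hypothetical helper).
    """
    return [
        obj["name"]
        for obj in objects
        if obj.get("receptacle") and not obj.get("obstructed")
    ]


# Stand-in data; only the first name appears in the original output,
# the other two entries are invented for illustration.
sample = [
    {"name": "Cabinet_5e0161e9", "visible": True, "obstructed": False, "receptacle": True},
    {"name": "Apple_example", "visible": True, "obstructed": False, "receptacle": False},
    {"name": "Drawer_example", "visible": True, "obstructed": True, "receptacle": True},
]

print(unobstructed_receptacles(sample))
```

With a live controller, the same helper could be applied to the `result` list from the example above.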

Versions

The branches of thortils are named after the Ai2-THOR version they are built for. Currently, the version on this branch is 3.3.4. For later versions of Ai2-THOR, you can create a branch on top of this one, run the tests under tests/, and fix any bugs caused by the Ai2-THOR version upgrade. The API of thortils should stay the same, or may be expanded.

Projects that use thortils

  • COS-POMDP: Code for "Towards Optimal Correlational Object Search" (ICRA 2022)
  • ai2thor-web: Running AI2-THOR in browser to conduct user studies with non-technical, remote participants.

Citation

If you find this package useful, please cite the paper "Towards Optimal Correlational Object Search", International Conference on Robotics and Automation (ICRA), 2022.

@inproceedings{zheng2022towards,
  title={Towards Optimal Correlational Object Search},
  booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
  author={Zheng, Kaiyu and Chitnis, Rohan and Sung, Yoonchang and Konidaris, George and Tellex, Stefanie},
  year={2022}
}

The codebase for this paper, a good example of using this package, is here: https://github.com/zkytony/cos-pomdp

Proper 2D Grid Map

Ai2-THOR by default provides a "GetReachablePositions" function. You might want to use it to construct a grid map of the scene, but the result is actually incorrect, because many places in the scene are not reachable and will be excluded. To illustrate the problem, below is a screenshot of a kitchen scene. The left shows the first-person view, and the right shows the grid map obtained from the "GetReachablePositions" function. Each black cell corresponds to an occupied place, such as the table or the counter. (Ignore the colors and the graph for now.)

The problem is that there are clearly more occupied places than this grid map includes. Also, we may want to treat some occupied places differently, such as the areas inside a fridge or a cabinet, and this method cannot do that.
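To make the limitation concrete, here is a minimal sketch of the naive approach, assuming the reachable positions arrive as a list of `{"x", "y", "z"}` dictionaries and that the grid size is 0.25 m. `reachable_to_grid` is a hypothetical helper written for this illustration, not a thortils function.

```python
def reachable_to_grid(reachable_positions, grid_size=0.25):
    """Naive grid map: reachable cells are free, all other cells are obstacles.

    Hypothetical helper; `grid_size` is an assumed value.
    """
    # Convert metric (x, z) to integer grid indices.
    cells = {
        (round(p["x"] / grid_size), round(p["z"] / grid_size))
        for p in reachable_positions
    }
    min_x = min(x for x, _ in cells)
    min_z = min(z for _, z in cells)
    # Shift the indices so that they are 0-based.
    free = {(x - min_x, z - min_z) for x, z in cells}
    width = max(x for x, _ in free) + 1
    length = max(z for _, z in free) + 1
    # Everything inside the bounding box that is not reachable becomes an obstacle.
    obstacles = {(x, z) for x in range(width) for z in range(length)} - free
    return free, obstacles


positions = [
    {"x": -1.0, "y": 0.9, "z": 0.0},
    {"x": -0.75, "y": 0.9, "z": 0.0},
    {"x": -0.75, "y": 0.9, "z": 0.25},
]
free, obstacles = reachable_to_grid(positions)
print(sorted(free), sorted(obstacles))
```

Note that this sketch can only label a cell as free or as an obstacle, and the map never extends beyond the bounding box of the reachable positions, which is exactly the shortcoming described above.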

Instead, thortils provides a method proper_convert_scene_to_grid_map which obtains the 2D grid map by projecting a 3D map of the scene constructed from a sequence of RGBD images collected at sampled viewpoints within the scene. This 2D grid map is an occupancy grid map, which means a grid cell is either free, occupied, or unknown. This accounts for places that could be inside a container (like a fridge).

The constructed 3D map (left); the ceiling and floor (middle); the walls and furniture (right).

The 2D projection (left) and the corresponding 2D grid map (right). Black indicates obstacle (or occupied), gray indicates unknown (e.g. inside fridge), and cyan indicates free space (the robot can access).

Note that the coordinates of this grid map are 0-based integers (instead of metric), which can be more convenient to work with. The granularity depends on the grid_size setting of the Ai2-THOR controller.
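The projection idea can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the logic of proper_convert_scene_to_grid_map: the helper name `project_to_grid` is hypothetical, `y` is taken as the vertical axis, points within `floor_cut` of the lowest point count as floor, cells with no points at all are labeled unknown, and ceiling points are assumed to have been cropped beforehand.

```python
def project_to_grid(points, grid_size=0.25, floor_cut=0.1):
    """Project 3D points (x, y, z) onto a 0-based 2D occupancy grid.

    Returns a dict mapping (col, row) to "free", "occupied" or "unknown".
    Hypothetical helper for illustration only.
    """
    floor_y = min(p[1] for p in points)
    occupied, floor_cells = set(), set()
    for x, y, z in points:
        cell = (round(x / grid_size), round(z / grid_size))
        if y - floor_y <= floor_cut:
            floor_cells.add(cell)   # the point belongs to the floor
        else:
            occupied.add(cell)      # something stands above the floor here
    free = floor_cells - occupied
    seen = occupied | free
    min_x = min(x for x, _ in seen)
    min_z = min(z for _, z in seen)
    width = max(x for x, _ in seen) - min_x + 1
    length = max(z for _, z in seen) - min_z + 1
    grid = {}
    for col in range(width):
        for row in range(length):
            cell = (col + min_x, row + min_z)
            if cell in occupied:
                grid[(col, row)] = "occupied"
            elif cell in free:
                grid[(col, row)] = "free"
            else:
                grid[(col, row)] = "unknown"  # no point observed in this cell
    return grid


points = [
    (0.0, 0.0, 0.0),    # floor
    (0.25, 0.05, 0.0),  # floor
    (0.25, 0.8, 0.0),   # e.g. a table top above the same cell
    (0.0, 0.02, 0.25),  # floor
]
grid = project_to_grid(points)
print(grid)
```

The returned keys are 0-based integer coordinates, matching the convention described above; the cell size is the assumed `grid_size`.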

Installation

  1. Clone the repository and then install it by:

    pip install -e .
    
  2. Run a little test

    cd tests
    python test_scene_to_grid_map.py
    

    Expected output:

    FloorPlan22
    xxx............xxxx
    xxx............xxxx
    xxx............xxxx
    xxx...........xxxxx
    xxx..........xxxxxx
    xxx.........xxxxxxx
    xxx........xxxxxxxx
    xxx.......xxxxxxxxx
    xx........xxxxxxxxx
    xx........xxxxxxx..
    ..........xxxxxxx..
    ...................
    ...................
    ...................
    

    (note: this is only a test; this grid map is not actually an accurate reflection of the scene.)

    If this works, then you should be good to go. Try running the other tests under tests/.

  3. Optionally, obtain a scene dataset. You can either:

    • Download scenes.zip and scene_scatter_plots.zip and decompress them in the root directory of this repository, or

    • Run the following scripts to generate these two datasets:

      cd scripts
      python build_scene_dataset.py ../scenes
      python create_scatter_plots.py ../scenes/ ../scene_scatter_plots
      

    This is only necessary if you would like to use the functions provided by SceneDataset.

Organization

Inside thortils/:

  • agent.py: Functions related to the agent (e.g. pose)
  • controller.py: Launching the controller, and the thor_get function.
  • object.py: Functions related to the objects (e.g. visible objects)
  • scene.py: Functions related to scenes (e.g. scene names, convert scene to grid map, ThorSceneInfo, SceneDataset)
  • grid_map.py: The GridMap class (0-based index of coordinates). Can be converted from an Ai2thor scene
  • interactions.py: Functions that correspond to calling different interaction actions in Thor (e.g. OpenObject means calling controller.step(action="OpenObject")).
  • constants.py: The configuration, including parameters used as default when launching controllers.
  • utils.py: Non-Thor related utility functions

Notes on the Codebase

Poses

In ai2thor, a pose is typically a tuple (position, rotation). Although ai2thor likes to use dictionaries, we often use tuples in this codebase:

  • position (tuple): tuple (x, y, z); ai2thor uses (x, z) for robot base

  • rotation (tuple): tuple (x, y, z); pitch, yaw, roll.

    We do not use quaternions because the mobile robot in ai2thor can only rotate about two of the axes, so there is no problem using Euler angles. Angles are in DEGREES and are restricted to be between 0 and 360 (same as ai2thor).

    yaw refers to rotation of the agent's body. pitch refers to rotation of the camera up and down.

There are two kinds of pose representations throughout the code in this repo:

  • Full pose: refers to a tuple (position, rotation), as defined above.
  • Simplified pose: refers to (x, z, pitch, yaw)
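The relationship between the two representations can be sketched as follows. The function names are hypothetical and are not claimed to match the ones in agent.py; the `y` and `roll` defaults used when going back to a full pose are assumptions, since a simplified pose does not carry them.

```python
def normalize_angle(deg):
    """Map an angle in degrees into the range [0, 360)."""
    return deg % 360.0


def to_simplified_pose(full_pose):
    """Convert ((x, y, z), (pitch, yaw, roll)) to (x, z, pitch, yaw)."""
    (x, _y, z), (pitch, yaw, _roll) = full_pose
    return (x, z, normalize_angle(pitch), normalize_angle(yaw))


def to_full_pose(simplified_pose, y=0.0, roll=0.0):
    """Convert (x, z, pitch, yaw) back to a full pose.

    The height `y` and `roll` must be supplied by the caller (assumed defaults).
    """
    x, z, pitch, yaw = simplified_pose
    return ((x, y, z), (normalize_angle(pitch), normalize_angle(yaw), roll))


full = ((-1.25, 0.9009995460510254, 1.0), (-0.0, 270.0, 0.0))
simple = to_simplified_pose(full)
print(simple)
print(to_full_pose(simple, y=0.9009995460510254))
```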

Actions

When specifying actions in ai2thor, you supply an action name and a dictionary of parameters. For navigation actions, we also use the following format:

(action_name, (forward, h_angle, v_angle))

We sometimes call variables "action_delta" or "delta" to refer to (forward, h_angle, v_angle)
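As an illustration of how such a delta could be applied to a simplified pose (x, z, pitch, yaw), here is a minimal sketch. Several things are assumptions of the sketch rather than facts about thortils: the helper name `apply_delta`, that h_angle is added to yaw and v_angle to pitch, that the rotation is applied before the forward motion, and that yaw 0 faces +z with yaw increasing clockwise when viewed from above. The numeric deltas are example values only.

```python
import math


def apply_delta(pose, delta):
    """Apply (forward, h_angle, v_angle) to a simplified pose (x, z, pitch, yaw).

    Hypothetical helper; the sign and heading conventions are assumptions.
    """
    x, z, pitch, yaw = pose
    forward, h_angle, v_angle = delta
    # Rotate first, keeping angles within [0, 360).
    new_yaw = (yaw + h_angle) % 360.0
    new_pitch = (pitch + v_angle) % 360.0
    # Then move forward along the new heading (yaw 0 assumed to face +z).
    new_x = x + forward * math.sin(math.radians(new_yaw))
    new_z = z + forward * math.cos(math.radians(new_yaw))
    return (new_x, new_z, new_pitch, new_yaw)


# Actions in the (action_name, delta) format; the values are illustrative.
move_ahead = ("MoveAhead", (0.25, 0.0, 0.0))
rotate_left = ("RotateLeft", (0.0, -45.0, 0.0))

pose = (0.0, 0.0, 0.0, 0.0)
pose = apply_delta(pose, move_ahead[1])
print(move_ahead[0], pose)
pose = apply_delta(pose, rotate_left[1])
print(rotate_left[0], pose)
```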

Command Line Usage

These are only a few of the functions that you can run from the command line.

Start controller

python -m thortils.controller

You can specify a scene

python -m thortils.controller FloorPlan2

The following enters the debugger with an event object to play with

python -m thortils.controller --debug
python -m thortils.controller FloorPlan2 --debug

Keyboard control

In scripts/ there is a utility program that starts a controller and lets you navigate the agent around with the keyboard.

python scripts/kbcontrol.py

Example output:

            w
        (MoveAhead)

    a                 d
(RotateLeft)     (RotateRight)

    e
(LookUp)

    c
(LookDown)

    q
(quit)

w | Agent pose: ((-1.25, 0.9009995460510254, 1.0), (-0.0, 270.0, 0.0))
w | Agent pose: ((-1.25, 0.9009995460510254, 1.0), (-0.0, 270.0, 0.0))
a | Agent pose: ((-1.25, 0.9009995460510254, 1.0), (-0.0, 225.0, 0.0))
d | Agent pose: ((-1.25, 0.9009995460510254, 1.0), (-0.0, 270.0, 0.0))
a | Agent pose: ((-1.25, 0.9009995460510254, 1.0), (-0.0, 225.0, 0.0))

Contributor

Feel free to open issues about mistakes, or contribute directly by sending pull requests (to this README documentation or to the codebase in general).
