HoliCity: A City-Scale Data Platform for Learning Holistic 3D Structures

This repository contains instructions and demo code for the paper: Yichao Zhou, Jingwei Huang, Xili Dai, Linjie Luo, Zhili Chen, Yi Ma. "HoliCity: A City-Scale Data Platform for Learning Holistic 3D Structures". Technical Report. arXiv:2008.03286 [cs.CV].


Please visit our project website for the overview and download links of the HoliCity dataset. We provide sample data in the folder sample-data.


We provide training, validation, and test sets for evaluating your algorithms. Some ground-truth labels of the test set (such as depth and normal maps) are reserved for a future competition and are not available at this time. You can report performance on the validation set if you want to use our data split.


The panorama images are stored with equirectangular projection.
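Under this projection, pixel coordinates map to ray directions by a linear mapping of yaw and pitch. Below is a minimal sketch of that inverse mapping, assuming the same yaw/pitch convention as the `draw_point_on_panorama` snippet further below; the helper name `panorama_pixel_to_ray` is ours, not part of the dataset tooling.

```python
import numpy as np

def panorama_pixel_to_ray(px, py, width, height):
    """Unit ray direction for equirectangular pixel (px, py):
    the image's x axis spans yaw in [-pi, pi) and its y axis
    spans pitch from +pi/2 (top) to -pi/2 (bottom)."""
    yaw = px / width * 2 * np.pi - np.pi
    pitch = (1 - py / height) * np.pi - np.pi / 2
    return np.array([
        np.cos(pitch) * np.sin(yaw),
        np.cos(pitch) * np.cos(yaw),
        np.sin(pitch),
    ])
```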


We provide perspective renderings of the panorama images. The field of view of all current renderings is 90 degrees, and the principal point is at the center of the image. We also provide the OpenGL code that renders panorama images to perspective images.
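With a 90-degree field of view and a centered principal point, the pinhole intrinsics follow directly: the focal length in pixels is half the image width. A hedged sketch (the helper name is ours, not part of the dataset tooling):

```python
import numpy as np

def perspective_intrinsics(width, height, fov_deg=90.0):
    """Pinhole intrinsic matrix for a camera with square pixels and the
    principal point at the image center. For the 90-degree field of view
    used here, the focal length equals half the image width."""
    f = width / 2 / np.tan(np.radians(fov_deg) / 2)
    return np.array([
        [f, 0, width / 2],
        [0, f, height / 2],
        [0, 0, 1],
    ])
```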


HoliCity provides a refined geolocation for each viewpoint and its corresponding perspective images in WGS-84 coordinates, i.e., longitude and latitude. The above figure shows the viewpoints on Google Maps. The following table summarizes the geolocation annotation entries that are specific to the viewpoints (panoramas).

| Entry | Explanation |
| --- | --- |
| `loc` | The xyz coordinate of the viewpoint in the space of the CAD models. We provide the utility functions `model2gps` and `gps2model` for converting between the xy coordinate of the CAD model and the WGS-84 coordinate, i.e., longitude and latitude. The z coordinate represents the distance between the camera and the terrain. |
| `pano_yaw` | Yaw of the panorama camera (panorama center) with respect to the north. |
| `tilt_yaw` | Tilt direction of the panorama camera with respect to the north. Such tilt exists because the street-view car might be on a slope. |
| `tilt_pitch` | Tilt angle of the panorama camera. |

The following code snippet converts a ray direction on the panorama of a local viewpoint to the space of the CAD model.

import numpy as np
from vispy.util.transforms import rotate

def panorama_to_world(d, loc, pano_yaw, tilt_yaw, tilt_pitch):
    r"""Convert d \in S^2 (direction of a ray on the panorama) to the world space."""
    # The tilt axis is horizontal and perpendicular to the tilt direction.
    axis = np.cross([np.cos(tilt_yaw), np.sin(tilt_yaw), 0], [0, 0, 1])
    R = (rotate(pano_yaw, [0, 0, 1]) @ rotate(tilt_pitch, axis))[:3, :3]
    return d @ R + loc
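If vispy is unavailable, its `rotate` helper can be approximated with plain NumPy via Rodrigues' formula. This is a sketch assuming vispy's conventions (angle in degrees, 4x4 output laid out for row-vector multiplication, as `panorama_to_world` uses); verify against vispy before relying on it:

```python
import numpy as np

def rotate_np(angle_deg, axis):
    """Minimal stand-in for vispy.util.transforms.rotate: a 4x4 rotation
    matrix about `axis` by `angle_deg` degrees, laid out for row-vector
    multiplication (v @ M)."""
    t = np.radians(angle_deg)
    x, y, z = np.asarray(axis, dtype=float) / np.linalg.norm(axis)
    K = np.array([[0, -z, y], [z, 0, -x], [-y, x, 0]])
    R = np.eye(3) + np.sin(t) * K + (1 - np.cos(t)) * (K @ K)  # Rodrigues' formula
    M = np.eye(4)
    M[:3, :3] = R.T  # transpose for the row-vector convention
    return M
```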

The following code snippet draws a ray direction in the space of a local viewpoint onto the corresponding panorama image.

import math
import numpy as np
import numpy.linalg as LA
import matplotlib.pyplot as plt

def draw_point_on_panorama(d, panorama_image):
    """Draw d (direction of a ray on the panorama) on the panorama image."""
    d = d / LA.norm(d)
    pitch = math.atan(d[2] / LA.norm(d[:2]))
    yaw = math.atan2(d[0], d[1])
    x, y = (yaw + np.pi) / (np.pi * 2), (pitch + np.pi / 2) / np.pi
    plt.imshow(panorama_image)
    plt.scatter(x * panorama_image.shape[1], (1 - y) * panorama_image.shape[0])

The following table summarizes the geolocation annotation entries that are specific to the perspective renderings.

| Entry | Explanation |
| --- | --- |
| `R` | The rotation-and-translation matrix that transforms the world coordinates of the CAD models to the camera space. This entry is derived from `loc`, `yaw`, and `pitch`. |
| `q` | The rotation quaternion derived from `R`. Useful for training networks such as PoseNet. |
| `fov` | The field of view. |
| `yaw` | The direction of the camera with respect to the north. |
| `pitch` | The direction of the camera with respect to the horizontal plane. Example: 0 means the camera points horizontally and 90 means it points toward the sky. |
| `tilt` | Currently unused. All the perspective renderings have zero tilt, which means that the up-forward planes of the cameras are always perpendicular to the horizontal plane. |
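The `q` entry can be reproduced from the rotation part of `R` with a standard matrix-to-quaternion conversion. A sketch follows; the (w, x, y, z) ordering and sign convention here are assumptions to check against the stored values:

```python
import numpy as np

def rotation_to_quaternion(R):
    """Convert a 3x3 rotation matrix to a unit quaternion (w, x, y, z)."""
    # Magnitudes from the diagonal; max(0, .) guards against rounding noise.
    w = np.sqrt(max(0.0, 1 + R[0, 0] + R[1, 1] + R[2, 2])) / 2
    x = np.sqrt(max(0.0, 1 + R[0, 0] - R[1, 1] - R[2, 2])) / 2
    y = np.sqrt(max(0.0, 1 - R[0, 0] + R[1, 1] - R[2, 2])) / 2
    z = np.sqrt(max(0.0, 1 - R[0, 0] - R[1, 1] + R[2, 2])) / 2
    # Signs from the off-diagonal differences.
    x = np.copysign(x, R[2, 1] - R[1, 2])
    y = np.copysign(y, R[0, 2] - R[2, 0])
    z = np.copysign(z, R[1, 0] - R[0, 1])
    return np.array([w, x, y, z])
```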

The following code snippet computes R from loc, yaw, and pitch.

import numpy as np

def transformation_matrix(loc, yaw, pitch):
    """Computes the 4x4 world-to-camera transformation matrix."""
    yaw = -yaw * np.pi / 180 + np.pi / 2
    pitch = pitch * np.pi / 180
    # The camera sits at loc and looks along the (yaw, pitch) direction, with z up.
    return lookat(
        loc,
        [np.cos(pitch) * np.cos(yaw), np.cos(pitch) * np.sin(yaw), np.sin(pitch)],
        [0, 0, 1],
    )

import numpy as np
import numpy.linalg as LA

def lookat(position, forward, up=[0, 1, 0]):
    """Computes the 4x4 transformation matrix for a camera at `position` looking along `forward`."""
    c = np.asarray(position).astype(float)
    w = -np.asarray(forward).astype(float)
    u = np.cross(up, w)
    v = np.cross(w, u)
    u /= LA.norm(u)
    v /= LA.norm(v)
    w /= LA.norm(w)
    # Rows are the camera axes u, v, w; the last column holds the translation -R @ c.
    return np.r_[u, -c @ u, v, -c @ v, w, -c @ w, 0, 0, 0, 1].reshape(4, 4)

City CAD Models

Currently, AccuCities provides freely available CAD models of a 1 km² area of London to the public. We label all the viewpoints of HoliCity in this area with the suffix `_HD`. If you are using HoliCity for research purposes, you may want to contact AccuCities and apply for other city models. At the time of release, the unit of the HoliCity CAD model was the meter; it has since been changed to millimeters. Read Issue 13 if you render the CAD model and find a mismatch between the HoliCity CAD model and this dataset.
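If you hit the unit mismatch, rescaling the CAD model's vertices from millimeters back to meters before rendering is the obvious first step; a trivial sketch (check Issue 13 for the authoritative fix):

```python
import numpy as np

def mm_to_m(vertices_mm):
    """Scale CAD-model vertex positions from millimeters to meters so they
    match HoliCity's meter-based annotations."""
    return np.asarray(vertices_mm, dtype=float) / 1000.0
```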

Holistic Surface Segmentation

We segment the surface of the 3D CAD model based on (approximate) local curvature. The reference MaskRCNN implementation used in our paper can be found here (HoliCity-MaskRCNN). You should be able to use it to reproduce the results of our paper.

3D Planes

For each surface segment, we approximate it with a fitted 3D plane and provide the plane parameters of each surface segment. The plane fitting is done at the global level of the CAD model. Surface segments and planes exclude trees. We provide example code showing how to parse the plane parameters and draw depth maps and normal maps accordingly. We note that there is some difference between the ground-truth depth maps and the depth maps derived from the plane parameters due to the error from global plane fitting, especially for large planes such as the ground.
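As a concrete illustration of how a depth value can be derived from a fitted plane, the snippet below intersects a camera ray with a plane written as n·x = c; this (n, c) parameterization is our assumption for illustration and may differ from the parameter format shipped with the dataset:

```python
import numpy as np

def ray_depth_from_plane(n, c, d):
    """Distance along ray direction d (camera at origin) to the plane n.x = c."""
    n, d = np.asarray(n, dtype=float), np.asarray(d, dtype=float)
    denom = n @ d
    if abs(denom) < 1e-9:
        return np.inf  # ray is parallel to the plane
    t = c / denom
    return t if t > 0 else np.inf  # plane lies behind the camera
```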

Low-level 3D Representations

We provide renderings of depth maps and normal maps for each perspective image. The unit of depth maps is the meter, which is the same as the unit of the CAD model. Renderings with the suffix `_HD` have more details than the renderings with the `_LD` suffix. We note that the low-level representations currently do not include moving objects such as cars and pedestrians.

Coordinate Systems

For normal maps and vanishing points, the coordinate system of the camera in HoliCity follows the convention of OpenGL: The camera is placed at (0, 0, 0). The x axis is toward the right of the image and the y axis is upward. The z axis points out of the screen to form the right-hand coordinate system, which means that the image plane is at z = -1.

The camera space coordinate can be constructed from values in depth maps with the following code:

def backproject_pixel(y, x, depth_map):
    """Camera-space point of pixel (y, x) in a 512x512 depth map."""
    px = (x - 255.5) / 256
    py = (-y + 255.5) / 256
    pz = depth_map[y, x, 0]
    return px * pz, py * pz, -pz
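For whole-image processing, the per-pixel computation above vectorizes naturally. A sketch assuming 512x512 renderings with the z-distance stored in the first channel:

```python
import numpy as np

def backproject_depth(depth_map):
    """Backproject a depth map into camera-space points (OpenGL convention:
    camera at the origin, image plane at z = -1). Assumes the positive
    z-distance is stored in the map's first channel."""
    h, w = depth_map.shape[:2]
    y, x = np.mgrid[0:h, 0:w]
    px = (x - (w - 1) / 2) / (w / 2)
    py = (-y + (h - 1) / 2) / (h / 2)
    pz = depth_map[..., 0]
    return np.stack([px * pz, py * pz, -pz], axis=-1)
```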

Vanishing Points

We provide the vanishing points extracted by our script for each perspective image.
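Under the OpenGL camera convention described above, a vanishing point is just a 3D direction; projecting it into a 90-degree-FOV rendering takes one perspective division. A sketch (the helper name is ours; directions with a non-negative z component lie behind the camera):

```python
import numpy as np

def project_direction(d, width=512, height=512):
    """Pixel coordinates of a 3D camera-space direction (OpenGL convention,
    camera looking down -z), assuming a centered principal point and a
    90-degree field of view (focal length = width / 2)."""
    d = np.asarray(d, dtype=float)
    if d[2] >= 0:
        return None  # direction does not point in front of the camera
    f = width / 2
    u = width / 2 + f * (d[0] / -d[2])
    v = height / 2 - f * (d[1] / -d[2])
    return u, v
```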

Semantic Segmentation

We provide the semantic segmentation for all the perspective images. The following table shows the meaning of the labels.

| Value | Meaning |
| --- | --- |
| 0 | Sky or nothing |
| 1 | Buildings |
| 2 | Roads |
| 3 | Terrains |
| 4 | Trees |
| 5 | Others |
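As a usage sketch, the labels can be turned into per-class pixel statistics; the label-name mapping below simply mirrors the table:

```python
import numpy as np

SEMANTIC_LABELS = {0: "Sky or nothing", 1: "Buildings", 2: "Roads",
                   3: "Terrains", 4: "Trees", 5: "Others"}

def label_histogram(seg):
    """Fraction of pixels per semantic class in a segmentation map."""
    counts = np.bincount(np.asarray(seg).ravel(), minlength=6)
    return {SEMANTIC_LABELS[i]: c / counts.sum() for i, c in enumerate(counts)}
```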


This work is sponsored by a generous grant from Sony Research US. We would also like to thank Sandor Petroczi and Michal Konicek from AccuCities for their help with the London CAD model.


If you find HoliCity useful in your research, please consider citing:

@article{zhou2020holicity,
    author = {Zhou, Yichao and Huang, Jingwei and Dai, Xili and Luo, Linjie and Chen, Zhili and Ma, Yi},
    title = {{HoliCity}: A City-Scale Data Platform for Learning Holistic {3D} Structures},
    year = {2020},
    archivePrefix = "arXiv",
    note = {arXiv:2008.03286 [cs.CV]},
}