Projecting 3d cuboids to camera images #24

beunguk · 2019-09-03T08:01:22Z

Hi,

I'm trying to draw 3d cuboid on 2d camera images, so the all corners could appear in the camera images. I can check there is projected_lidar_labels which is 2d bounding boxes for cuboids, but this is not I want to draw. For example, https://www.nuscenes.org/public/images/road.jpg is kind of projected image I would like to make.

I tried to use CameraCalibration and laser_labels in Context to draw cuboids, but I still couldn't figure it out. It seems like cuboids don't align well on objects in camera images.

Thanks.

The text was updated successfully, but these errors were encountered:

peisun1115 · 2019-09-03T17:08:38Z

Hi,
How are you going to use that 3d box on camera image? It is just for visualization? If yes, one way i can think of is that you can use the projected_lidar_labels to create the drawing you want by treating the top and bottom edge of the rectangle as the diagonal lines of the top/bottom surfaces.

Pei

beunguk · 2019-09-04T02:30:55Z

Hi,
Thank you for your reply.

However, this is not how I would like to draw cuboids. Assuming top and bottom line as diagonal lines of the top/bottom surfaces could make visualizations wrong. Projected cuboids on the camera images should matched to the corners of cuboids.

In other words, I would like to use center_x, center_y, center_z, length, width, height as cuboid representation and project them into camera images by using extrinsic and intrinsic matrices. I tried to use them but it seems like projected cuboids are slightly shifted.

I attached images what I have now (Red boxes are projected_lidar_labels and rest of them are cuboids I projected on images). I think it works well on the FRONT camera image because it has extrinsic matrix as almost identity matrix. However, in other images, cuboids are slightly shifted.

Beunguk

peisun1115 · 2019-09-04T02:39:24Z

It looks like this is because of your projection algorithm is not that accurate? Your projection should cover the same area as projected_lidar_labels?

Note that our cameras are rolling shutter cameras. It has non-trivial effect on side cameras when the SDC is moving at high speed (say > 30MPH). But this does not seem to be the case in your example. The SDC seems to be moving very slowly or even static by looking at the scene. Let me know if this is not the case.
If this is not caused by rolling shutter effect, i think something is wrong in your projection algorithm?

** We are planning to release a projection lib that takes rolling shutter effect into account depending on the community interest. No ETA yet.

peisun1115 · 2019-09-04T02:42:00Z

One more note: We provide all parameters needed for a user to implement their projection algorithm by taking rolling shutter effect into account in the existing dataset.

https://github.com/waymo-research/waymo-open-dataset/blob/master/waymo_open_dataset/dataset.proto#L229-L252

beunguk · 2019-09-04T07:51:05Z

Yes, projections should cover the same area as projected lader labels.

I tried to figure out the problem, but I couldn't... In the document, center_x, y, z are in vehicle frame and calibration(CameraCalibration) is for 'vehicle frame to camera frame'. So, it should be just matrix multiplication using extrinsic(rotation, translation) and then intrinsic matrices.

What am I missing? What should I consider more to have right results? Is there any kind of sample codes or pseudocode for this?

Thanks,
Beunguk

peisun1115 · 2019-09-04T15:36:22Z

As I mentioned above, one possibility is rolling shutter effect which might not be the problem in your case as the SDC seems to be moving slowly based on the camera images.

One thing to note is the camera frame definition.

The camera frame is placed in the center of the camera lens. The x-axis points down the lens barrel out of the lens. The z-axis points up. The y/z plane is parallel to the sensor plane. The coordinate system is right handed.

So y/z plane of the camera sensor frame is parallel with image. When you do the intrinsic transform, do something like:
u = -y / x; (width)
v = -z / x; (height)

YanShuo1992 · 2019-09-19T08:49:23Z

@beunguk
I am working on drawing the 3d projections. Could you please give any details?

Cheers,
Shuo

YanShuo1992 · 2019-09-24T08:31:37Z

@peisun1115
I try to use the coordinates from the laser_labels to multiply the extrinsic matrix from camera_calibrations and then the intrinsic matrix same as @beunguk.

However, I get the extremely large x and y value. Is there anything wrong in my method?

Cheers

pwais · 2019-09-24T09:17:28Z

Perhaps the cuboids (Frame.laser_labels) are implicitly stamped at Frame.timestamp_micros and the discrepancy in the images above is due to not the shutter of the camera but rather the camera-lidar sync. @beunguk what differences do you see for, say, CameraImage.pose_timestamp versus Frame.timestamp_micros? It's sad that these stamps are not just both nano-second timestamps :P but perhaps this difference might illuminate the problem.

That said, the differences in images above look to be a bit more than 100ms or so. Looks more like the camera-lidar matching in the actual Frame is wrong, like the camera images got put in the wrong Frame object. I haven't looked at how the lidar scans project onto the side cameras, though... that might disprove this hypothesis.

@peisun1115 What would honestly be really helpful would be nanosecond timestamps for the lidar scans, camera images, and all labels (e.g. perhaps just add a timestamp member to Label https://github.com/waymo-research/waymo-open-dataset/blob/master/waymo_open_dataset/label.proto#L20 ). In comparison, Argoverse and NuScenes / Lyft Level 5 all have this timestamp info, so doing things like projecting cuboids, interpolating labels, etc, are all very easy. Without those timestamps, I'm not sure it's even possible to compare, say, a Tracking algorithm between Waymo Open and these other datasets, because one would not be able to use the Waymo side cameras & cuboids with the error shown in @beunguk's images.

It would also be nice if motorcycles were broken out of the vehicle class, which would be on par with other datasets.

peisun1115 · 2019-09-24T15:35:17Z

@pwais Our camera and lidar are well synchronized. We have statistics for all the data released. The maximum error calculated by the timestamps at which lidar/camera scan the same physical point is bounded at [-6ms, 7ms] with >99.999% confidence which is super good (i doubt nuScenes or Lyft have anything close). This is calculated by real data in this dataset (you can calculated it too).

Nanotimestamp is not the problem here. Our projected_laser_labels are computed by the data available in this dataset. We did not use any other information. So it is likely that something wrong is in the projection code used. We are planning to release a rolling shutter projection code but unfortunately we are still going through legal.

If you have the projection code, i can help to take a look.

peisun1115 · 2019-09-24T16:27:25Z

@YanShuo1992

Can you copy-paste your code? i can help to take a look. Very likely, you did not get the coordinates in camera frame correctly used. When you multiple with intrinsic matrix, it needs to be something like

given a point in camera frame (x, y, z),
u_d = -y/x
v_d = -z/x

// apply distortion model on u_d, u_v
..... code to apply distortion is ignored....
u_d = u_d * f_u + c_u
v_d = v_d * f_v + c_v

haoliplus · 2019-09-25T02:07:01Z

@pwais Our camera and lidar are well synchronized. We have statistics for all the data released. The maximum error calculated by the timestamps at which lidar/camera scan the same physical point is bounded at [-6ms, 7ms] with >99.999% confidence which is super good (i doubt nuScenes or Lyft have anything close). This is calculated by real data in this dataset (you can calculated it too).

Nanotimestamp is not the problem here. Our projected_laser_labels are computed by the data available in this dataset. We did not use any other information. So it is likely that something wrong is in the projection code used. We are planning to release a rolling shutter projection code but unfortunately we are still going through legal.

If you have the projection code, i can help to take a look.

Wow, the synchronization is really good! Is the camera/lidar frame with delta time([-6ms, 7ms]) recorded directly on the car or somehow processed(e.g manually aligned after raw data is recorded?)

pwais · 2019-09-25T05:55:21Z

@peisun1115 I have no doubt that your lidar-camera sync could be the best on the planet, but then what happened in @beunguk 's examples? I'm not hypothesizing a problem with lidar-camera sync, but rather than the Frames themselves might have the wrong content, which would materialize as an error that looks similar to bad lidar-camera sync. It's really troubling to see this error, because one does not get results like this doing straightforward cuboid-to-image projections in other datasets.

@peisun1115 Today, how does one recover the timestamp of a camera image? Is it CameraImage.pose_timestamp + CameraImage.camera_readout_done_time? It would be helpful to have this documented somewhere, especially because these fields are not read anywhere in the code in this repo. Having examples of data usage in the repo is critical to communicating the semantics of the data.

In order to avoid simple projection problems and other errors as demonstrated in this Github Issue, it sure would be helpful to have a means for exporting the Waymo data to a more well-established format like Kitti (see e.g. in nuscenes https://github.com/nutonomy/nuscenes-devkit/blob/master/python-sdk/nuscenes/scripts/export_kitti.py ). There's probably no expectation that might be made available any time soon, but even the tensorflow/models and TPU teams have made an effort to support MSCOCO format (despite certain drawbacks of that format).

YanShuo1992 · 2019-09-26T02:17:59Z

FILENAME = '/content/waymo-od/tutorial/frames'
FILENAME ='segment-933621182106051783_4160_000_4180_000.tfrecord'
dataset = tf.data.TFRecordDataset(FILENAME, compression_type='')
for data in dataset:
frame = open_dataset.Frame()
frame.ParseFromString(bytearray(data.numpy()))
break

calibrations = sorted(frame.context.camera_calibrations, key=lambda c: c.name)

c = calibrations[0] #only need the front camera
extrinsic = np.reshape(np.array(c.extrinsic.transform), [4, 4])
extrinsic = extrinsic[0:3]
intrinsic = np.reshape(np.array(c.intrinsic), [3, 3])

laser_labels = frame.laser_labels
for l in laser_labels:
k_mat = np.reshape(np.array([l.box.center_x, l.box.center_y, l.box.center_z, 1]), [4, 1])
p = np.dot(extrinsic,k_mat)
p = np.dot(intrinsic,p)
x = p[0]/p[2]
y = p[1]/p[2]

@peisun1115 This is my code. I find some slices about the 3d coordinate projection. I do have many questions. Could you please help me to check the code and give me a hint?

Cheers

pwais · 2019-09-26T08:51:19Z

@YanShuo1992 I think the problem is that your code interprets the camera intrinsics incorrectly. The documentation is unfortunately confusing here. CameraCalibration.intrinsic is NOT (despite its name) the camera's intrinsic matrix (or camera matrix K) but rather CameraCalibration.intrinsic is a list containing parameters for K as well as the distortion model coefficients. See: https://github.com/waymo-research/waymo-open-dataset/blob/master/waymo_open_dataset/dataset.proto#L91 You might want something like:

f_u, f_v, c_u, c_v, k_1, k_2, p_1, p_2, k_3 = c.intrinsic
K = np.array([
     [f_u, 0,   c_u],
     [0,   f_v, c_v],
     [0,   0,     1]])

For the demo ( https://colab.research.google.com/github/waymo-research/waymo-open-dataset/blob/master/tutorial/tutorial.ipynb ), I see a K of:

[[2.05555615e+03 0.00000000e+00 9.39657470e+02]
 [0.00000000e+00 2.05555615e+03 6.41072182e+02]
 [0.00000000e+00 0.00000000e+00 1.00000000e+00]]

Be mindful of their comment: "Note that this intrinsic corresponds to the images after scaling" -- it appears the units of the parameters they provide are not in pixels. I'm not sure where to look up the image size.. CameraImage ironically has no size parameters, just the jpeg image. (Most good data schemas embed the image dimensions because it's cheap to do so and saves the user the cost of having to decode the image to get them).

While the Waymo authors don't specify the distortion model exactly, I guess we're supposed to assume the one documented at OpenCV's website: https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html Be careful though, some of the OpenCV code in the calibration module has slight differences between versions.

Perhaps we'll get some unambiguous symbol grounding for Waymo's data format when they provide example code for projecting cuboid labels into the camera frame. I might be wrong in my own interpretation, but it's clear that CameraCalibration.intrinsic is not K.

peisun1115 · 2019-09-26T23:08:50Z

Here is an example code snippet that projects a point to image without taking rolling shutter and distortion into account:


def project_point(point, camera_calibration):
  # vehicle frame to camera sensor frame.
  extrinsic = tf.reshape(camera_calibration.extrinsic.transform, [4, 4])
  vehicle_to_sensor = tf.matrix_inverse(extrinsic)
  point1 = point
  point1.append(1)
  point_camera_frame = tf.einsum('ij,j->i', vehicle_to_sensor, tf.constant(point1, dtype=tf.float32))
  u_d = - point_camera_frame[1] / point_camera_frame[0]
  v_d = - point_camera_frame[2] / point_camera_frame[0]
 
  # add distortion model here if you'd like.
  f_u = camera_calibration.intrinsic[0];
  f_v = camera_calibration.intrinsic[1];
  c_u = camera_calibration.intrinsic[2];
  c_v = camera_calibration.intrinsic[3];
  u_d = u_d * f_u + c_u;
  v_d = v_d * f_v + c_v;

  return [u_d.numpy(), v_d.numpy()]

I have tested this code on the waymo open dataset and it worked for the example flagged by @beunguk

We have documented the data format (including distortion model) here:
https://github.com/waymo-research/waymo-open-dataset/blob/master/waymo_open_dataset/dataset.proto#L91-L100

pwais · 2019-09-26T23:57:22Z

Thank you @peisun1115 . A few questions:

Will there be more details and/or code on what actual distortion model is assumed? The citation in dataset.proto that you highlight is only marginally helpful, and to be honest actual code would help dispel all confusion (see bazel build error with gcc-5 and gcc-7 #3 below).
Will there be any answer about recovering the camera image timestamp from the given protobuf data, or will we have to wait for the Waymo-legal-approved rolling shutter code example release to see that?
Your code sample alludes that CameraCalibration.extrinsic is actually camera-to-vehicle, while the code clearly claims CameraCalibration.extrinsic is "Vehicle frame to camera frame" https://github.com/waymo-research/waymo-open-dataset/blob/master/waymo_open_dataset/dataset.proto#L101 . If one were simply reading the protobuf comment, then the tf.matrix_inverse() should not be necessary; perhaps that's why @beunguk 's examples were off (by approximately 2m translation for side cameras). Perhaps what happened is that somebody read "vehicle from camera frame" and translated the comment to "vehicle to camera frame" in the open source code.
Do you know if this is the only extrinsic transform in the dataset where the value embedded in the protobuf record does not match the semantics of the documentation? (FWIW, this example is a key reason why other frameworks like ROS encapsulate the source and destination frames along with the serialized transformation matrix... it's too easy to get confused).
Is there any reason (in this code sample as well as the larger repo) that you guys insist on using Tensorflow (versus numpy) for all matrix algebra as well as Einstein notation? The Tensorflow teams have shown a commitment to making things accessible and these choices make the sample code relatively more esoteric. While I appreciate that TPUs require the use of Tensorflow operators, TPUs are not (yet?) ubiquitous in the research community. Moreover, as this repo demonstrates, there is interest in shedding the very heavy Bazel dependency if not TFRecords and Tensorflow as well: https://github.com/gdlg/simple-waymo-open-dataset-reader

YanShuo1992 · 2019-09-27T05:34:25Z

@peisun1115
Thanks for the code. I test it and it works well. I get the similar boxes as @beunguk .
I find the 3d boxes of front objects are slightly large but generally correct. However, the 3d boxes of the side objects are incorrect even in the front camera images. I don't think the rolling shutter and lens distortion would cause such differences.
I am wondering a parameter named 'heading' in the laser_labels. How to rotate +x to the surface normal of the SDC front face?
Could you please give me any details about it?

@pwais
Thank you for the comments. I do find the "Vehicle frame to camera frame" which makes me confused. LOL

peisun1115 · 2019-09-27T05:39:35Z

@YanShuo1992
Probably because of the way to compute box corners? heading is just the yaw. I was trying to be precise when describing it. I can check in a simple util function to compute box corners which should resolve the confusion.

@pwais

The distortion model (directly translated from OpenCV documentation) pseudo code:

  k1 = calibration_.intrinsic(4);
  k2 = calibration_.intrinsic(5);
 k3 = calibration_.intrinsic(6);  // same as p1 in OpenCV.
 k4 = calibration_.intrinsic(7);  // same as p2 in OpenCV
  k5 = calibration_.intrinsic(8);  // same as k3 in OpenCV.

r2 = u_n * u_n + v_n * v_n;
r4 = r2 * r2;
r6 = r4 * r2;

r_d = 1.0 + k1 * r2 + k2 * r4 + k5 * r6;

  // If the radial distortion is too large, the computed coordinates will
  // be unreasonable (might even flip signs).
  if (r_d < kMinRadialDistortion || r_d > kMaxRadialDistortion) {
    return false;
  }

  u_nd = u_n * r_d + 2.0 * k3 * u_n * v_n + k4 * (r2 + 2.0 * u_n * u_n);
  v_nd = v_n * r_d + k3 * (r2 + 2.0 * v_n * v_n) + 2.0 * k4 * u_n * v_n;

There is no notion of timestamp of an image in the context of rolling shutter camera. Each pixel has its own timestamp. camera_image.pose_timestamp is the timestamp of the image center.
I have fixed the comment in the codebase. We have consistent definitions of pose transform and extrinsic. If you see any comment that is different from others, then the comment is wrong not code. I've scanned through the codebase and did not find more.
There is no particular reason. I am more familiar with tensorflow libs and found einsum very clean to use :)

peisun1115 · 2019-09-27T05:47:57Z

@YanShuo1992 Without including distortion, projection points outside of the camera FOV are very likely to work very poorly. That might be the reason. It will be helpful if you can copy/paste your projection results (only for objects inside the camera image's FOV).

YanShuo1992 · 2019-09-27T06:24:35Z

@peisun1115
I only draw three points instead of the boxes.
p1 = project_point([l.box.center_x - 0.5length, l.box.center_y - 0.5width, l.box.center_z - 0.5height],calibrations[index]) #blue squares
p2 = project_point([l.box.center_x + 0.5length, l.box.center_y + 0.5width, l.box.center_z + 0.5height],calibrations[index]) #red squares
pc = project_point([l.box.center_x, l.box.center_y, l.box.center_z], calibrations[index]) #green squares

The red boxes are the projected_lidar_labels in the frame. I find a margin between the objects and the projected_lidar_labels.

peisun1115 · 2019-09-27T06:31:32Z

You don't have heading when getting points from the box? Are they 0? How fast is the SDC moving in the scene you selected (check pose difference)? If you do not want to worry about rolling shutter, focus front camera first. Then worry about side camera

YanShuo1992 · 2019-09-27T08:47:51Z

@peisun1115
The heading is not 0. I am not sure how to use it. I think the projected 3d points will be on the corners of the projected_lidar_labels if I use the heading information, is that correct?

Sorry, I don't know what SDC is as well. I read the comments in dataset.proto. I assume it is a parameter about the velocity. The larger numbers cause the rolling shutter, is that correct? How will the SDC affect the projection?

peisun1115 · 2019-09-29T17:33:28Z

You can try this function to get box corners.

https://github.com/waymo-research/waymo-open-dataset/blob/master/waymo_open_dataset/utils/box_utils.py#L92

jjin12 · 2019-09-30T19:51:06Z

@peisun1115 Does only the file name with xxx_with_camera_labels.tfrecord contain the corresponding image frame?

peisun1115 · 2019-09-30T19:53:04Z

All files contain images (if that is what you meant by 'image frame'). Files with suffix '_with_camera_labels.tfrecord' contain '2D' image labels labeled by humans. All files contain 2D labels projected from lidar (see projected_lidar_labels).

jjin12 · 2019-09-30T19:55:48Z

@peisun1115 Thanks for your quick reply.

pwais · 2019-10-04T10:53:54Z

@peisun1115 I think there might be an axis transform / extrinsic rotation missing from your demo code (or implicit and opaque), and perhaps that lead to some confusion in @YanShuo1992 's images. In particular, it appears that the camera extrinsics do not account for e.g. the x-z axis swap and the y-axis inversion that are rolled into the extrinsics published in at least three other open datasets.

If one wants to compute pixel-frame 2d point p from ego-frame 3d point P using the standard method p = K [R | T] P (ignoring distortion for now...), then roughly:

n = 0 # Front camera?
camera_calibration = frame.context.camera_calibrations[n]
extrinsic = tf.reshape(camera_calibration.extrinsic.transform, [4, 4])
RT = tf.matrix_inverse(extrinsic).numpy()

f_u = camera_calibration.intrinsic[0]
f_v = camera_calibration.intrinsic[1]
c_u = camera_calibration.intrinsic[2]
c_v = camera_calibration.intrinsic[3]

K = np.array([
    [f_u, 0,   c_u],
    [0,   f_v, c_v],
    [0,   0,   1  ],
])

p = K * RT * P  # NOPE !

BUT the above won't work because the extrinsic transform RT in the TFRecords files appears to maintain the same axes as the ego frame: e.g. +z up. But the camera frame is traditionally +z depth (+x in ego frame). I had to do something like this, at least for the front camera:

p_cam = RT.dot(P.T)

# Move into camera sensor frame
p_cam = p_cam[(2, 1, 0), :]
p_cam = p_cam[(1, 0, 2), :] # ??
p_cam[1, :] *= -1 # ??
p_cam[0, :] *= -1 # ??

p = K.dot(p_cam)
p[:2, :] /= p[2, :]

@peisun1115 It would really be helpful if the documentation of the calibration data in dataset.proto were improved, e.g. explaining exactly what CameraCalibration.extrinsic is intended to be, and if the data is really the same in all the TFRecords files. The Waymo dataset here and elsewhere strays from convention for no apparent reason. While I appreciate there are unfortunately different conventions that are at-odds for basic things (e.g. encoding of quaternions, euler angles, basic matrix multiplication versus einstein notation, numpy vs Tensorflow), it would be helpful if Waymo provided data and code that's on par with that provided in other datasets like NuScenes, Lyft Level 5, and Argoverse. Waymo has already elided key classes like motorcycles and road obstacles (e.g. cones) as well as chosen a particularly disconcerting and unfriendly legal position towards the sharing of model weights. It's frustrating to then have problems with basic things like trying to use camera calibration parameters correctly.

peisun1115 · 2019-10-05T23:45:41Z

@pwais

We have defined what camera sensor frame is on our website. Copied below:
"""
The camera frame is placed in the center of the camera lens. The x-axis points down the lens barrel out of the lens. The z-axis points up. The y/z plane is parallel to the camera plane. The coordinate system is right handed.
"""
We have documented that the camera extrinsic is to transform points from 'camera sensor frame' to 'vehicle frame'.

I can copy this to our code (dataset.proto) to clarify.

This slide is a good introduction to camera matrices (including extrinsics and intrinsics). The following equations are simply applying the geometry (see slide 8 in the provided link).

u_d = -y/x
v_d = -z/x

u_d = u_d * f_u + c_u
v_d = v_d * f_v + c_v

I think the way our camera sensor frame is defined (x-forward) is a little non-conventional. We tried to clarify that on our website. Other than that, i think the way to define camera model is pretty standard. It is pretty much the same as the CMU slides mentioned in point 2. What is the 'convention' you are talking about?

pwais · 2019-10-08T21:20:35Z

I missed this earlier, but the repo I previously linked to has solid support for projecting 3d cuboids to images:

https://github.com/gdlg/simple-waymo-open-dataset-reader/blob/master/examples/example.py#L39
- Notably, this utility handles the Waymo box properly. The code is very concise, close to one half or one third the amount of code versus Waymo's equivalent.
They also have a project-to-camera utility that (1) correctly inverses Waymo's wrongly-documented extrinsics and (2) includes the axis transform that's missing from those extrinsics and is not documented anywhere in the Waymo github repo:
- https://github.com/gdlg/simple-waymo-open-dataset-reader/blob/master/examples/example.py#L99

Here's the repo: https://github.com/gdlg/simple-waymo-open-dataset-reader

Additional notable features:

They provide a TFRecord parser that does not depend on Tensorflow, so you can read and use the Waymo data without any of the giant Waymo dependencies (e.g. Bazel). You don't even need Tensorflow if you don't want it. (Source: https://github.com/gdlg/simple-waymo-open-dataset-reader/blob/master/simple_waymo_open_dataset_reader/__init__.py#L19 )

Examples using the code in that repo below. Note a couple of things:

Waymo does not label construction cones consistently; see the false negative in the left of the 2nd image. Moreover, the boxes do not include the entirety of the object, unlike other datasets.
Waymo appears to blank out some signs as if they were license plates (first image), so careful doing anything with the provided sign labels.
There are a great number of visible cars that are unlabeled (probably because they're beyond the 70m max). Be careful to trim your predictions, as they will incorrectly be interpreted as false positives.

peisun1115 · 2019-10-09T17:13:49Z

The projection lib they have is a good way to demo the data format. Note that they don't take distortion and rolling shutter into account. .
Note that TFRecord is officially supported by tensorflow. The repo you linked re-implemented some of the logic in the official tensorflow tf record reader. But it misses features such as CRC check. There may also be compatibility issues in the future. Feel free to use that if that meets you needs. We prefer to staying with the officially supported reader in tensorflow for now.
We have lots of dependencies as we have other code in this repo. For example, we provide libs to build model, tf ops to do eval, c++ code to do eval. We try to keep our code quality high. Users are welcome to write their own code (e.g. the repo you linked) if they only need part of the functionality. The repo you linked is a good example of that.
Regarding the labels. Please refer to the labeling policy we publish. Also keep in mind that projection has errors (esp if you don't take distortion and rolling shutter into account). Check the label in lidar 3d view.
Yes, we only label objects within 75m for 3d labels. Again, pls check the lidar 3d view for 3d labels.

I am closing this issue as I think we have clarified the lidar->camera projection and you guys are able to make it roughly work. As we mentioned in the thread, we are planning to release a projection lib but we don't have an ETA yet. Please stay tuned.

pwais · 2019-10-09T21:56:01Z

Note: the Simple Waymo Open Dataset Reader doesn't check CRC codes (though that might be irrelevant given noise in the Waymo labels). However, if you do need a TFRecord reader that checks CRC codes, you might check out Apache Beam's reader here: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/tfrecordio.py#L67 The cited Beam code was authored by a Google engineer and Google Cloud Sales Engineers are pushing Beam pretty hard onto customers (e.g. Google Dataflow), so it's likely to stay updated.

The Tensorflow-free options cited above allow:

easier data analysis, as you don't have to create a Tensorflow Session to just simply read data
random access to records, while the Tensorflow API only exposes a stream of string-encoded protobufs

cch2016 · 2020-06-28T03:28:15Z

Hi,
Thank you for your reply.

However, this is not how I would like to draw cuboids. Assuming top and bottom line as diagonal lines of the top/bottom surfaces could make visualizations wrong. Projected cuboids on the camera images should matched to the corners of cuboids.

In other words, I would like to use center_x, center_y, center_z, length, width, height as cuboid representation and project them into camera images by using extrinsic and intrinsic matrices. I tried to use them but it seems like projected cuboids are slightly shifted.

I attached images what I have now (Red boxes are projected_lidar_labels and rest of them are cuboids I projected on images). I think it works well on the FRONT camera image because it has extrinsic matrix as almost identity matrix. However, in other images, cuboids are slightly shifted.

Beunguk

Sorry. I have a question: when you vis projected_lidar_labels,how do you confirm which camera images?

likegogogo · 2021-10-29T00:51:40Z

Hi,
Thank you for your reply.
However, this is not how I would like to draw cuboids. Assuming top and bottom line as diagonal lines of the top/bottom surfaces could make visualizations wrong. Projected cuboids on the camera images should matched to the corners of cuboids.
In other words, I would like to use center_x, center_y, center_z, length, width, height as cuboid representation and project them into camera images by using extrinsic and intrinsic matrices. I tried to use them but it seems like projected cuboids are slightly shifted.
I attached images what I have now (Red boxes are projected_lidar_labels and rest of them are cuboids I projected on images). I think it works well on the FRONT camera image because it has extrinsic matrix as almost identity matrix. However, in other images, cuboids are slightly shifted.
Beunguk

Sorry. I have a question: when you vis projected_lidar_labels,how do you confirm which camera images?

use CameraName.Name to confirm

peisun1115 closed this as completed Oct 9, 2019

peisun1115 mentioned this issue Oct 25, 2019

does all image already be dedistorted? #53

Closed

hhhmoan mentioned this issue Nov 4, 2019

Project the 3D box into camera side left and find some bias #66

Open

peisun1115 mentioned this issue Mar 25, 2020

Official script to convert your dataset to the kitti dataset format #107

Open

anthony-chaudhary mentioned this issue Apr 19, 2022

3D Camera Ease of Use V2 diffgram/diffgram#821

Open

This was referenced Apr 10, 2023

Issue about "extrinsic":<float> [4, 4] -- camera extrinsic matrix OpenDriveLab/OpenLane#60

Closed

How to project the 3D point to image？ OpenDriveLab/OpenLane#57

Closed

JulienStanguennec-Leddartech mentioned this issue May 17, 2024

Distorded or undistorded images - Waymo-open-dataset v2 #834

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Projecting 3d cuboids to camera images #24

Projecting 3d cuboids to camera images #24

beunguk commented Sep 3, 2019

peisun1115 commented Sep 3, 2019

beunguk commented Sep 4, 2019

peisun1115 commented Sep 4, 2019

peisun1115 commented Sep 4, 2019 •

edited

beunguk commented Sep 4, 2019

peisun1115 commented Sep 4, 2019

YanShuo1992 commented Sep 19, 2019

YanShuo1992 commented Sep 24, 2019

pwais commented Sep 24, 2019

peisun1115 commented Sep 24, 2019 •

edited

peisun1115 commented Sep 24, 2019

haoliplus commented Sep 25, 2019

pwais commented Sep 25, 2019

YanShuo1992 commented Sep 26, 2019 •

edited

pwais commented Sep 26, 2019

peisun1115 commented Sep 26, 2019 •

edited

pwais commented Sep 26, 2019

YanShuo1992 commented Sep 27, 2019

peisun1115 commented Sep 27, 2019 •

edited

peisun1115 commented Sep 27, 2019 •

edited

YanShuo1992 commented Sep 27, 2019 •

edited

peisun1115 commented Sep 27, 2019

YanShuo1992 commented Sep 27, 2019

peisun1115 commented Sep 29, 2019

jjin12 commented Sep 30, 2019

peisun1115 commented Sep 30, 2019

jjin12 commented Sep 30, 2019

pwais commented Oct 4, 2019 •

edited

peisun1115 commented Oct 5, 2019 •

edited

pwais commented Oct 8, 2019

peisun1115 commented Oct 9, 2019

pwais commented Oct 9, 2019

cch2016 commented Jun 28, 2020

likegogogo commented Oct 29, 2021

Projecting 3d cuboids to camera images #24

Projecting 3d cuboids to camera images #24

Comments

beunguk commented Sep 3, 2019

peisun1115 commented Sep 3, 2019

beunguk commented Sep 4, 2019

peisun1115 commented Sep 4, 2019

peisun1115 commented Sep 4, 2019 • edited

beunguk commented Sep 4, 2019

peisun1115 commented Sep 4, 2019

YanShuo1992 commented Sep 19, 2019

YanShuo1992 commented Sep 24, 2019

pwais commented Sep 24, 2019

peisun1115 commented Sep 24, 2019 • edited

peisun1115 commented Sep 24, 2019

haoliplus commented Sep 25, 2019

pwais commented Sep 25, 2019

YanShuo1992 commented Sep 26, 2019 • edited

pwais commented Sep 26, 2019

peisun1115 commented Sep 26, 2019 • edited

pwais commented Sep 26, 2019

YanShuo1992 commented Sep 27, 2019

peisun1115 commented Sep 27, 2019 • edited

peisun1115 commented Sep 27, 2019 • edited

YanShuo1992 commented Sep 27, 2019 • edited

peisun1115 commented Sep 27, 2019

YanShuo1992 commented Sep 27, 2019

peisun1115 commented Sep 29, 2019

jjin12 commented Sep 30, 2019

peisun1115 commented Sep 30, 2019

jjin12 commented Sep 30, 2019

pwais commented Oct 4, 2019 • edited

peisun1115 commented Oct 5, 2019 • edited

pwais commented Oct 8, 2019

peisun1115 commented Oct 9, 2019

pwais commented Oct 9, 2019

cch2016 commented Jun 28, 2020

likegogogo commented Oct 29, 2021

peisun1115 commented Sep 4, 2019 •

edited

peisun1115 commented Sep 24, 2019 •

edited

YanShuo1992 commented Sep 26, 2019 •

edited

peisun1115 commented Sep 26, 2019 •

edited

peisun1115 commented Sep 27, 2019 •

edited

peisun1115 commented Sep 27, 2019 •

edited

YanShuo1992 commented Sep 27, 2019 •

edited

pwais commented Oct 4, 2019 •

edited

peisun1115 commented Oct 5, 2019 •

edited