it would be better to use hdf5 files to export depth images, rather than exr #24

Open
mikeroberts3000 opened this issue Oct 1, 2016 · 10 comments

@mikeroberts3000

mikeroberts3000 commented Oct 1, 2016

Hello,

In my opinion, it would be better to use hdf5 (http://www.h5py.org/) to export depth images, rather than exr. My rationale here is as follows:

  1. As a data exchange format, hdf5 is more widely supported than exr.
  2. Reading hdf5 files in Python and Matlab is easier and requires fewer external dependencies.
  3. The HDF5 C++ library for writing hdf5 files is widely used, easy to build, and cross-platform.
@mikeroberts3000
Author

@qiuwch In case it would be helpful, I'd be happy to provide a simple standalone C++ demo that saves a float array as an hdf5 file. Please do send me a note if you think this would be helpful.

@qiuwch
Member

qiuwch commented Oct 2, 2016

@mikeroberts3000:

Thanks. I think this is an awesome idea. I am considering how to get more complex, structured data (the mesh you mentioned in another thread, for example) out of unrealcv. A file format that can support that would be ideal.

The constraint I am facing is that using third-party libraries in UE4 is tricky. I prefer to use the third-party libraries bundled with UE4 if possible, but if it is important to include new ones, I am also happy to do that. To use hdf5, I think I need to include the hdf5 source code or a compiled library within unrealcv. To keep it cross-platform, I cannot ask users to install the dependency with apt-get or brew install. This bundling might be tricky, and I need to do some experiments with it.

I did some research on hdf5 yesterday and found that they provide compiled binaries here, but no Mac version. I will need to start a new branch for this and merge it only once it is considered stable. It would be very helpful if I could get any help from you, such as the demo.

@qiuwch
Member

qiuwch commented Oct 2, 2016

Actually, I am also considering yaml or xml, but I am not sure whether the overhead would be too large to bear.

@mikeroberts3000
Author

@qiuwch Sure thing, happy to help.

I was thinking about this yesterday too. I 100% agree with your assessment. EXR is a tolerable format for depth images, but it would be very unpleasant to try to pack other, more general data streams (triangle meshes, etc.) into EXR images. HDF5 would be much better for those.

I'll start by providing the standalone example usage of HDF5. And then I'll help to test the HDF5 branch.

Getting the HDF5 binaries for Mac is easy using macports or brew. Getting the Python HDF5 dependencies for Mac is also easy. This is in contrast to reading EXR files, which I've concluded is not straightforward at all on Mac (see AcademySoftwareFoundation/openexr#207 and AcademySoftwareFoundation/openexr#208).

@mikeroberts3000
Author

@qiuwch I think XML would be tolerable as well. Obviously HDF5 would be better for large assets. But even for dense image data, it wouldn't be the end of the world to store this in XML. I believe the COLLADA interchange format between 3D modeling programs is totally XML based, and people manage to make that work for large assets. Obviously XML will take longer than necessary to load. But I think a depth image stored as XML could be loaded and parsed in less than a second.
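
To make the XML option a bit more concrete, here is a minimal Python sketch that serializes a depth image as XML and parses it back, timing the parse so the sub-second estimate can be checked on a given machine. The element names (`depth`, `row`), the helper functions, and the 640x480 size are hypothetical and chosen only for illustration; this is not a format that UnrealCV defines.

```python
import random
import time
import xml.etree.ElementTree as ET


def depth_to_xml(depth, width, height):
    """Serialize a row-major list of floats as XML (hypothetical schema)."""
    root = ET.Element("depth", width=str(width), height=str(height))
    for r in range(height):
        row = ET.SubElement(root, "row")
        # repr() of a Python float round-trips exactly through float().
        row.text = " ".join(repr(v) for v in depth[r * width:(r + 1) * width])
    return ET.tostring(root)


def xml_to_depth(data):
    """Parse the XML produced by depth_to_xml back into a flat list."""
    root = ET.fromstring(data)
    width = int(root.get("width"))
    height = int(root.get("height"))
    depth = [float(tok) for row in root.findall("row") for tok in row.text.split()]
    if len(depth) != width * height:
        raise ValueError("sample count does not match the declared dimensions")
    return depth, width, height


width, height = 640, 480
depth = [random.uniform(0.0, 100.0) for _ in range(width * height)]

blob = depth_to_xml(depth, width, height)

start = time.perf_counter()
decoded, w, h = xml_to_depth(blob)
elapsed = time.perf_counter() - start

print("XML size: %d bytes, parse time: %.3f s" % (len(blob), elapsed))
```

The text encoding is several times larger than the raw float data, which is the overhead being traded for a format that every language can parse.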

@mikeroberts3000
Author

mikeroberts3000 commented Oct 2, 2016

@qiuwch One more thing. A raw binary blob could also be dumped to a file. In general this is unsafe and non-portable across machines due to endianness issues, but as long as you're running the UnrealCV Python/Matlab client on the same machine as the Unreal game, this strategy is safe. Of course, requiring that these two applications run on the same machine is certainly an unpleasant limitation, even if this is how most people would use UnrealCV in practice.
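
The endianness problem goes away if the byte order is fixed by the file layout rather than inherited from the host. A minimal Python sketch, assuming a hypothetical layout of two little-endian uint32 dimensions followed by little-endian float32 samples (the layout and the helper names are illustrative, not something UnrealCV defines):

```python
import struct
import sys


def pack_depth(depth, width, height):
    """Pack dimensions and float32 samples with an explicit little-endian layout."""
    header = struct.pack("<II", width, height)            # two uint32 values
    body = struct.pack("<%df" % len(depth), *depth)       # float32 samples
    return header + body


def unpack_depth(blob):
    """Inverse of pack_depth; works the same on any host byte order."""
    width, height = struct.unpack_from("<II", blob, 0)
    count = width * height
    depth = list(struct.unpack_from("<%df" % count, blob, 8))
    return depth, width, height


# Multiples of 0.5 are exactly representable in float32, so they round-trip.
depth = [0.5 * i for i in range(6 * 8)]
blob = pack_depth(depth, 8, 6)
decoded, w, h = unpack_depth(blob)

print("host byte order:", sys.byteorder)
print("blob size in bytes:", len(blob))
```

The `<` prefix in the `struct` format string is what pins the byte order; without it, `struct` uses the native order and alignment of the machine that runs the code.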

Of all these potential strategies, I think supporting HDF5 is the preferred option.

@mikeroberts3000
Author

mikeroberts3000 commented Oct 4, 2016

@qiuwch In case it is helpful, I'm including an end-to-end minimal code example for writing HDF5 files from C++, and reading them in Python.

This code example assumes that your compiler knows where to find the HDF5 C++ headers and binary files. I obtained these files on OSX by typing sudo port install hdf5 in the terminal.

This code also assumes that your compiler knows where to find the andres::Marray library. andres::Marray is an extremely lightweight, header-only multidimensional array library for C++, with a very convenient high-level interface for reading and writing HDF5 files. andres::Marray is only 3 header files. You can simply download the 3 header files from GitHub and you're good to go.

Here is the complete C++ program for writing HDF5 files. Note that if you already have a raw pointer to your data, andres::Marray allows you to create a lightweight view object over the data (similar to a NumPy view), and save the data to HDF5 using the view. So you can avoid an extra data copy. I demonstrate this approach in the second half of this code snippet.

#include <iostream>

#include "andres/marray.hxx"
#include "andres/marray-hdf5.hxx"

int main() {

    {
        // In this example, we manually fill in each element of an andres::Marray object and save to hdf5.
        size_t shape[] = {1, 6, 8};
        andres::Marray<float> m(shape, shape + 3, 0.0, andres::FirstMajorOrder);

        float val = 0.0;
        for(size_t i=0; i<shape[0]; i++) {
            for(size_t j=0; j<shape[1]; j++) {
                for(size_t k=0; k<shape[2]; k++) {
                    m(i,j,k) = val;
                    val += 1.0;
                }
            }
        }

        std::cout << m.asString();

        hid_t file = andres::hdf5::createFile("m.h5");
        andres::hdf5::save(file, "m", m);
        andres::hdf5::closeFile(file);        
    }

    {
        // In this example, we assume that we already have the raw pointer to some data, so we
        // construct an andres::View object over the data, and we save the andres::View object to hdf5.
        float n_raw[1][6][8];
        for(size_t i=0; i<48; i++) {
            (**n_raw)[i] = 2.0 * static_cast<float>(i);
        }

        // From the andres::Marray tutorial:
        //
        //     "the first andres::FirstMajorOrder determines how coordinates
        //     are mapped to memory. The second andres::FirstMajorOrder 
        //     determines how scalar indices into the view are mapped to
        //     coordinates in the view."
        size_t shape[] = {1, 6, 8};
        andres::View<float> n(shape, shape + 3, **n_raw, andres::FirstMajorOrder, andres::FirstMajorOrder);

        std::cout << n.asString();

        hid_t file = andres::hdf5::createFile("n.h5");
        andres::hdf5::save(file, "n", n);
        andres::hdf5::closeFile(file);        
    }

    return 0;
}

I compiled this example by naming it tutorial.cpp and typing g++ tutorial.cpp -o tutorial /opt/local/lib/libhdf5.dylib in the terminal.

In Python, I can read the data with the following code snippet. Note that this code snippet assumes that you have the h5py Python module installed. This module came pre-installed with Enthought Canopy, and can be easily obtained from your favorite package manager.

import h5py

# Read each dataset fully into a NumPy array; the with-block closes the file.
with h5py.File("m.h5", "r") as m_h5py:
    m = m_h5py["m"][:]

print(m)
print()

with h5py.File("n.h5", "r") as n_h5py:
    n = n_h5py["n"][:]

print(n)

@qiuwch qiuwch modified the milestone: Wishlist Nov 29, 2016
@qiuwch
Member

qiuwch commented Dec 16, 2016

I am working on integrating protobuf with UnrealCV.
The development is in the feature/protobuf branch here.

I chose protobuf because it provides a flexible way to define messages, and the proto file can naturally serve as documentation.

It is currently in a very early stage and I will keep you updated.

@qiuwch qiuwch modified the milestones: v0.4.0, Wishlist Dec 16, 2016
@bhack

bhack commented Jun 10, 2017

@qiuwch A protobuf integration would be really useful.

@qiuwch
Member

qiuwch commented Jun 10, 2017

@bhack Actually, I changed my mind after half a year of thought and development. My current plan is to send JSON for plain-text structured data and to use cnpy (https://github.com/rogersce/cnpy) to send NumPy arrays. This is almost done, and I am doing some testing.
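
For readers unfamiliar with this split, here is a rough Python sketch of what the two payloads could look like on the receiving side. cnpy reads and writes NumPy's `.npy` format, so the array part below builds and parses a version 1.0 `.npy` blob by hand using only the standard library (in practice `numpy.load` would do this). The JSON field names, the helper functions, and the float32-only restriction are assumptions made for illustration, not UnrealCV's actual messages.

```python
import ast
import json
import struct

MAGIC = b"\x93NUMPY"


def write_npy_f32(values, shape):
    """Build a version 1.0 .npy blob holding little-endian float32 data."""
    header = "{'descr': '<f4', 'fortran_order': False, 'shape': %r, }" % (tuple(shape),)
    # Pad with spaces so that magic (6) + version (2) + length field (2)
    # + header, including its final newline, is a multiple of 64 bytes.
    pad = -(len(MAGIC) + 2 + 2 + len(header) + 1) % 64
    header_bytes = (header + " " * pad + "\n").encode("latin1")
    data = struct.pack("<%df" % len(values), *values)
    return MAGIC + bytes([1, 0]) + struct.pack("<H", len(header_bytes)) + header_bytes + data


def read_npy_f32(blob):
    """Parse a blob produced by write_npy_f32 (float32, C order only)."""
    if blob[:6] != MAGIC or blob[6:8] != bytes([1, 0]):
        raise ValueError("not a version 1.0 .npy blob")
    (header_len,) = struct.unpack_from("<H", blob, 8)
    meta = ast.literal_eval(blob[10:10 + header_len].decode("latin1").strip())
    if meta["descr"] != "<f4" or meta["fortran_order"]:
        raise ValueError("this sketch only handles C-order float32 arrays")
    count = 1
    for dim in meta["shape"]:
        count *= dim
    values = list(struct.unpack_from("<%df" % count, blob, 10 + header_len))
    return values, meta["shape"]


# Structured, human-readable data travels as JSON (hypothetical fields).
info_text = json.dumps({"camera_id": 0, "location": [0.0, 100.0, 50.0]})
info = json.loads(info_text)

# Dense numeric data travels as a .npy blob.
values = [0.5 * i for i in range(6 * 8)]
blob = write_npy_f32(values, (6, 8))
decoded, shape = read_npy_f32(blob)

print(info["camera_id"], shape, len(blob))
```

The appeal of this split is that the text part stays readable and easy to debug, while the array part is loaded by NumPy without any custom parsing on the Python side.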

3 participants