Enable ROS3 Driver #12

Open
garrettmflynn opened this issue Feb 18, 2022 · 17 comments

Comments

@garrettmflynn

Is there a way to point to a ROS3 driver in the current implementation? I've gotten some interest to integrate my wrapper into DANDI instead of using the current cloud-based visualizer.

@bmaranville
Member

The ROS3 driver makes HTTP requests directly from the C code (using CURL), which is not really possible in WebAssembly. You can call back into JavaScript to make requests, or you can use the Emscripten Fetch API, but directly using the CURL libraries won't work as far as I can tell. A patched version of H5FDs3comms.c could be written that uses e.g. the Fetch API, but that would be a bit of work.

@garrettmflynn
Author

garrettmflynn commented Feb 19, 2022

My hdf5-io library already has a basic wrapper for the JavaScript Fetch API implemented, if this is what you meant by the first option. Though it seemed like the functionality we're looking for could only be implemented on the HDF5 reader itself.

Since I'm still quite new to HDF5 and C / WASM, I'll see if I can get @satra to comment more about the requirements for integrating with DANDI.

@satra

satra commented Feb 19, 2022

indeed it looks like there is a thread here to talk about curl and web assembly: WebAssembly/WASI#107

the main intent is to leverage the streamability of these HDF5 files on DANDI as opposed to downloading them into the browser (likely to be impossible except for the tiniest of files).

but the streamability relies on treating s3 as a range-getable filesystem, which is what the curl layer in hdf5 does. in the python world s3fs works similarly, and i know there have been some attempts in various contexts in javascript, but i don't know where they stand.
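A minimal sketch of the "range-getable" idea in browser JavaScript, assuming the server honors `Range` headers and permits them under CORS. The names `byteRange` and `fetchRange` are hypothetical and not part of h5wasm or hdf5-io; `fetchImpl` is only there so the function can be exercised without a network.

```javascript
// Build the Range header value for `length` bytes starting at `start`.
// HTTP byte ranges are inclusive on both ends.
function byteRange(start, length) {
  if (!Number.isInteger(start) || !Number.isInteger(length) || start < 0 || length <= 0) {
    throw new RangeError("start must be >= 0 and length must be > 0");
  }
  return `bytes=${start}-${start + length - 1}`;
}

// Fetch one slice of a remote file instead of downloading all of it.
// `fetchImpl` defaults to the global fetch and can be swapped for a stub.
async function fetchRange(url, start, length, fetchImpl = fetch) {
  const response = await fetchImpl(url, {
    headers: { Range: byteRange(start, length) },
  });
  // 206 Partial Content means the server honored the range; a plain 200
  // would mean it ignored the header and is sending the whole file.
  if (response.status !== 206) {
    throw new Error(`expected 206 Partial Content, got ${response.status}`);
  }
  return new Uint8Array(await response.arrayBuffer());
}
```

For example, `fetchRange(url, 0, 8)` would request only the first 8 bytes of the object, which is enough to check for an HDF5 signature when the file has no user block.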

@bmaranville
Member

bmaranville commented Apr 28, 2022

It is possible to build this directly into the Emscripten filesystem without ROS3... I have some POC code that combines lazyFile.ts from https://github.com/phiresky/sql.js with typescript-lru-cache and can lazy-load files with h5wasm in a WebWorker.

Having to call your HDF5 access code in a worker adds some complexity, of course. I'll try to clean up my code and publish to github soon.

It does require that the https server being contacted support range requests, though!
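A rough sketch of the block-cache half of that idea, assuming fixed-size blocks identified by index. This is not the code from lazyFile.ts or typescript-lru-cache; `LRUBlockCache` and `loadBlock` are hypothetical names, and the loader is synchronous because Emscripten file reads are synchronous.

```javascript
// Keep at most `maxBlocks` blocks in memory, evicting the least recently used.
class LRUBlockCache {
  constructor(maxBlocks) {
    this.maxBlocks = maxBlocks;
    // A Map iterates in insertion order, so the first key is the oldest entry.
    this.blocks = new Map();
  }

  // Return block `index`, calling `loadBlock(index)` only on a cache miss.
  get(index, loadBlock) {
    if (this.blocks.has(index)) {
      const block = this.blocks.get(index);
      // Re-insert to mark this block as most recently used.
      this.blocks.delete(index);
      this.blocks.set(index, block);
      return block;
    }
    const block = loadBlock(index);
    this.blocks.set(index, block);
    if (this.blocks.size > this.maxBlocks) {
      // Evict the least recently used block.
      const oldest = this.blocks.keys().next().value;
      this.blocks.delete(oldest);
    }
    return block;
  }
}
```

In a worker, `loadBlock` could be a synchronous range request for bytes `index * blockSize` through `(index + 1) * blockSize - 1`.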

@garrettmflynn
Author

@satra would know more about this than me. At first glance, though, it seems like it might work!

Thank you for the response, @bmaranville!

@satra

satra commented May 2, 2022

i think this is less about the ROS3 driver per se, and more about exposing a remote http or s3 object as an in memory object or a streamable object. in python the s3fs library uses direct calls to expose an s3 object in memory handling the translation of in-memory access calls to requests behind the scenes.

the challenge here is that the hdf5 files can be really large, 10s to 100s of GBs. thus the main requirement is that any reading happens in streaming mode.

@bmaranville
Member

There's a working demo at https://bmaranville.github.io/lazyFileLRU/ with source at https://github.com/bmaranville/lazyFileLRU

  • It is mounting a remote file in the Emscripten filesystem, where it fetches blocks on demand as disk "reads" occur.
  • You can choose the size of your LRU buffer, as well as the block size.
  • It's set to blocks of 1024 bytes by default just so you can watch the network activity and see it retrieving a bunch of blocks.

It's a very simplistic example - you can explore the contents of the file by clicking "load" to do the initial mount, then enter a path and click "get". There are still issues with it - the lazyFile algorithm ramps up the number of chunks fetched for large reads, and if the number of chunks being fetched exceeds the LRUsize then it stops working. (try getting the dataset at /60.0/DAS_logs/pointDetector/counts with the default settings, for instance)
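To see how block size and LRU size interact, here is a small sketch that counts the blocks a single read touches. It only does the arithmetic and does not model lazyFile's ramp-up of chunk sizes; `blocksForRead` and `fitsInCache` are hypothetical helpers, not functions from lazyFileLRU.

```javascript
// Which fixed-size blocks does a read of `length` bytes at `offset` touch?
// Assumes length > 0 and blockSize > 0.
function blocksForRead(offset, length, blockSize) {
  const first = Math.floor(offset / blockSize);
  const last = Math.floor((offset + length - 1) / blockSize);
  return { first, last, count: last - first + 1 };
}

// A read can only be served from the cache if all of its blocks fit at once.
function fitsInCache(read, lruSize) {
  return read.count <= lruSize;
}

// A 4096-byte read at offset 1000 with 1024-byte blocks spans blocks 0..4.
const small = blocksForRead(1000, 4096, 1024);
// A 1 MiB read at offset 0 with 1024-byte blocks spans 1024 blocks.
const large = blocksForRead(0, 1024 * 1024, 1024);
```

With an LRU size of, say, 100 blocks (an arbitrary number for illustration), `fitsInCache(small, 100)` is true and `fitsInCache(large, 100)` is false, so a larger block size or a larger LRU would be needed for the second read.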

@garrettmflynn
Author

I've spent some time playing around with the demo code—though I'm not able to get data from other URLs (e.g. "https://s3.us-east-2.amazonaws.com/hdf5ros3/GMODO-SVM01.h5", "https://dandiarchive.s3.amazonaws.com/blobs/43b/f3a/43bf3a81-4a0b-433f-b471-1f10303f9d35") because of CORS errors.

[Image: Screen Shot 2022-05-03 at 10 17 33 AM]

The latter URL works with pynwb / h5py in ROS3 mode.

Do either of you have intuitions about solving this issue?

@bmaranville Also, can you include the dependencies for building the source in https://github.com/bmaranville/lazyFileLRU?

@bmaranville
Member

Apologies - I forgot to include the package.json - it is there now. You should be able to do npm install and npm run build now.

As for the CORS errors, that is something that mostly has to be worked out on the server side. I had to add the following directives (for Apache):

    Header always set Access-Control-Allow-Origin "*"
    Header always set Access-Control-Allow-Headers "origin, x-requested-with, content-type, range"
    Header always set Access-Control-Allow-Methods "GET, OPTIONS"
    Header always set Access-Control-Expose-Headers "Accept-Ranges, Content-Encoding, Content-Length, Content-Range"

You also have to disable compression on the server, if you want to allow range requests. I added a flag "?gzip=false" and a corresponding rewrite-rule on the server to disable gzip, but you would do something else for S3 undoubtedly.
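One way to check a server from the client side is to inspect the response headers that the browser lets the script see. The sketch below assumes a `Headers`-like object with a `get` method; `rangeProblems` is a hypothetical helper. Note that on a cross-origin response, `Accept-Ranges` and `Content-Encoding` are only readable if the server lists them in Access-Control-Expose-Headers, so a reported problem may mean "not exposed" rather than "not supported".

```javascript
// Return a list of reasons why lazy range loading is unlikely to work,
// based only on the headers visible to the script.
function rangeProblems(headers) {
  const problems = [];

  if (headers.get("Accept-Ranges") !== "bytes") {
    problems.push("Accept-Ranges: bytes is missing or not exposed");
  }

  // A compressed response makes byte offsets refer to the encoded stream,
  // not to positions in the original file.
  const encoding = headers.get("Content-Encoding");
  if (encoding && encoding !== "identity") {
    problems.push(`response is compressed (${encoding})`);
  }

  // Number(null) is 0, so a missing header is reported here as well.
  const length = Number(headers.get("Content-Length"));
  if (!Number.isFinite(length) || length <= 0) {
    problems.push("Content-Length is missing or invalid");
  }

  return problems;
}
```

An empty array means nothing in the visible headers rules out range requests; it does not prove that the server will actually return 206 responses.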

@satra

satra commented May 3, 2022

i'm not sure why this is showing a cors problem. here is an example cors test on a file on dandi:

    $ curl -IXGET -H 'Origin: http://example.com' https://dandiarchive.s3.amazonaws.com/blobs/a4f/71d/a4f71d55-15e1-416b-b718-275a2fa470a7
    HTTP/1.1 200 OK
    x-amz-id-2: FJXd6TBWzopDmce+gl1QjyeR4pJQxSJyGaRZOHy/wZj+KCvqmp0g8/0fUketGUw6eVF4LXdui00=
    x-amz-request-id: JH7NBFXNJXPVY1X8
    Date: Tue, 03 May 2022 21:56:26 GMT
    Access-Control-Allow-Origin: *
    Access-Control-Allow-Methods: PUT, POST, GET, DELETE
    Access-Control-Expose-Headers: ETag
    Access-Control-Max-Age: 3000
    Vary: Origin, Access-Control-Request-Headers, Access-Control-Request-Method
    Last-Modified: Mon, 18 Apr 2022 14:39:52 GMT
    ETag: "a5017194dcc664aeb1dcb9866199e142-8"
    x-amz-version-id: phrH8qPN78ho5DvP0uZtmLtLi7YqK6tk
    Accept-Ranges: bytes
    Content-Type: binary/octet-stream
    Server: AmazonS3
    Content-Length: 476850135

@bmaranville
Member

bmaranville commented May 3, 2022

The current implementation is making a HEAD request before any of the GET requests... Do you know if that is where it is failing?

Also I noticed that OPTIONS is not in the list of approved methods, and that might be needed for CORS

Edit: ah, I see you're making a HEAD request in your example so that's probably not it.

@satra

satra commented May 3, 2022

it only supports GET at the moment. HEAD is missing. i'll add that to the setup on our side.

@bmaranville
Member

It looks like simple range requests (with only a single, well-defined range) will be supported without an OPTIONS pre-flight on most browsers - that would speed things up. whatwg/fetch#1312
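As a rough approximation of what "simple" means here (my reading of the linked change, not the spec's parsing algorithm): a single `bytes=` range with an explicit start, an optional end, no whitespace, and no comma-separated list. `isSimpleRange` is a hypothetical helper for checking the values your own code sends.

```javascript
// Approximate check for a Range header value that should not need a preflight.
function isSimpleRange(value) {
  // One range only: digits for the start, "-", then optional digits for the end.
  const match = /^bytes=(\d+)-(\d*)$/.exec(value);
  if (!match) return false;
  const start = Number(match[1]);
  // An open-ended range such as "bytes=100-" is acceptable.
  if (match[2] === "") return true;
  return start <= Number(match[2]);
}
```

Under this check, `"bytes=0-1023"` and `"bytes=100-"` pass, while suffix ranges such as `"bytes=-500"` and multi-range values such as `"bytes=0-10,20-30"` do not, so requests using those could still trigger an OPTIONS pre-flight.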

@garrettmflynn
Author

garrettmflynn commented Aug 26, 2022

@satra @bmaranville It's a quick and dirty solution, but I've forked the lazyFileLRU to add a fallback to an asynchronous GET request (using Fetch) that aborts after reading the file headers. The source code is at https://github.com/garrettmflynn/lazyFileLRU and a demo with a 5GB file from DANDI at https://garrettflynn.com/lazyFileLRU/.

@bmaranville
Member

clever!

@bmaranville
Member

@garrettmflynn I was trying your implementation and it was not falling back to "GET" because the xhr request for "HEAD" in the try {} block doesn't throw an error if the request fails at the server or is blocked by CORS restrictions in the browser (just returns status with error code). It could be converted into a simple if/else block instead of try/catch - this then worked for me:

    // can't set Accept-Encoding header :( https://stackoverflow.com/questions/41701849/cannot-modify-accept-encoding-with-fetch
    xhr.open("HEAD", url, false);
    // // maybe this will help it not use compression?
    // xhr.setRequestHeader("Range", "bytes=" + 0 + "-" + 1e12);
    xhr.send(null);
    if (xhr.status >= 200 && xhr.status < 400) {
      datalength = Number(
        xhr.getResponseHeader("Content-length")
      );

      hasByteServing = xhr.getResponseHeader("Accept-Ranges") === "bytes";
      encoding = xhr.getResponseHeader("Content-Encoding");
    }
    else {
      console.log("HEAD request failed... falling back to aborted GET");
      const controller = new AbortController();
      const signal = controller.signal;

      await fetch(url, { signal }).then(response => {
        datalength = Number(response.headers.get("Content-length"));
        hasByteServing = response.headers.get("Accept-Ranges") === "bytes";
        encoding = response.headers.get("Content-Encoding");
        controller.abort();
      }).catch(this.#ready.reject)
    }

@garrettmflynn
Author

garrettmflynn commented Dec 8, 2022 via email
