Head orientation tracking #68

Open · anssiko opened this issue Oct 1, 2020 · 10 comments

anssiko (Member) commented Oct 1, 2020

The user's head orientation is exposed through platform APIs using sensors built into today's mainstream headphones. See e.g. CMHeadphoneMotionManager.

Are there valid head orientation tracking use cases for the web? I opened this issue to gather perspectives on this topic.

The Orientation Sensor is designed with extensibility in mind and as such would be a natural fit if the Devices and Sensors Working Group decides to advance this proposal.

Given the group's focus on privacy, this feature would also be carefully scrutinized for privacy considerations, which differ from those of sensors built into devices not attached to the user's head, such as phones, tablets, and laptops. Since the WebXR Device API exposes poses through interfaces specialized for XR, we should carefully assess any considerations from that body of work as well.

AldenBraverman commented

Glad I found this issue. I believe the biggest use case for exposing head orientation tracking to the web would be spatial audio playback in web applications. A simple web implementation to demonstrate the capability would be mapping head-orientation pitch/roll/yaw to the YouTube 360 iFrame player's pitch/roll/yaw (the YouTube 360 format supports first-order ambisonics). I have been trying to create a Capacitor plugin that exposes head orientation values to an Ionic web app using the iFrame player. I haven't had any success yet, but I'm curious whether there are alternative approaches to this.
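
A minimal sketch of that mapping, assuming the YouTube IFrame Player API is loaded and that `onHeadOrientation` is a hypothetical callback delivering head yaw/pitch/roll in degrees from whatever native or sensor source is available (sign conventions will differ per platform):

```ts
// Hypothetical head-tracking source; replace with your native plugin or sensor bridge.
declare function onHeadOrientation(
  cb: (yawDeg: number, pitchDeg: number, rollDeg: number) => void
): void;
declare const YT: any; // provided by https://www.youtube.com/iframe_api

const player = new YT.Player('player', {
  videoId: 'SOME_360_VIDEO_ID', // placeholder
  events: {
    onReady: () => {
      onHeadOrientation((yawDeg, pitchDeg, rollDeg) => {
        // Counter-rotate the 360 view so the scene stays world-locked
        // while the head moves.
        player.setSphericalProperties({
          yaw: (360 - yawDeg) % 360,                     // YouTube expects 0–360
          pitch: Math.max(-90, Math.min(90, -pitchDeg)), // -90–90
          roll: Math.max(-180, Math.min(180, -rollDeg)), // -180–180
        });
      });
    },
  },
});
```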

himwho commented Dec 2, 2021

As a continuation for others, some exploration into this can be seen here:
https://demos.mach1.tech (codebase: https://github.com/Mach1Studios/m1-web-spatialaudioplayer)

AirPods spatial audio examples can be found here: https://github.com/Mach1Studios/Pod-Mach1SpatialAPI

anssiko (Member, Author) commented Dec 3, 2021

@himwho thanks for the impressive demos!

I have a few questions that you, as an expert in this space, can probably help answer. Your responses will help the W3C Devices and Sensors Working Group assess its readiness to start work on this feature:

  • Would a Web API that represents head orientation data in quaternion or rotation matrix formats satisfy your requirements? If so, there's a path to extend https://w3c.github.io/orientation-sensor/ with only small API surface changes, which would make the process faster (a sketch of the current API surface follows this list).
  • What would be the "minimum viable" sampling frequency and sensor reading accuracy requirements that'd enable key spatial audio use cases? This group develops Web APIs that carefully mitigate any known privacy and security threats, and I'm wondering whether the existing mitigations we have in place would work without compromising the use cases.
  • What mainstream products that you know of currently support head orientation tracking and expose (preferably cross-)platform/OS-level APIs that could (at least in theory) be integrated into open-source web engines?
  • Are there any other considerations we should be aware of?
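
For reference, a minimal sketch of how today's Orientation Sensor API exposes quaternion and rotation-matrix readings; the frequency value is illustrative, and availability depends on browser support, permissions, and a secure context:

```ts
// The Generic Sensor interfaces may need ambient declarations in TypeScript.
declare const AbsoluteOrientationSensor: any;

const sensor = new AbsoluteOrientationSensor({ frequency: 60 }); // illustrative

sensor.addEventListener('reading', () => {
  // Quaternion as [x, y, z, w]
  const [x, y, z, w] = sensor.quaternion;

  // Or a 4x4 rotation matrix written into a provided buffer
  const matrix = new Float32Array(16);
  sensor.populateMatrix(matrix);
});

sensor.addEventListener('error', (e: Event) => console.error(e));
sensor.start();
```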

Answers to even a subset of these questions would help us make progress with this feature. Thank you for your help!

himwho commented Dec 3, 2021

@anssiko Sure thing!

> Would a Web API that represents head orientation data in quaternion or rotation matrix formats satisfy your requirements? If so, there's a path to extend https://w3c.github.io/orientation-sensor/ with only small API surface changes, which would make the process faster.

If you have to pick one or the other, to be safe I would suggest quaternions, even though all our examples typically use rotations in degrees because they are much more human-usable. If possible, expose both: we often have to advise developers on how to convert back and forth when only one is offered. We have also learned how to explain the way we describe rotation, since it differs from platform to platform, which is a large pain point for this use case; we have written up our experiences and opinions here: https://research.mach1.tech/posts/describing-3d-motion/
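
For what it's worth, here is the kind of conversion developers end up writing when only one representation is exposed. This is a generic ZYX (yaw-pitch-roll) conversion, and, as the post above explains, axis and sign conventions vary per platform, so treat it as a starting point rather than a universal mapping:

```ts
// Convert a unit quaternion [x, y, z, w] (the Orientation Sensor layout)
// to yaw/pitch/roll in degrees using the common ZYX convention.
function quaternionToYawPitchRoll(
  [x, y, z, w]: [number, number, number, number]
): { yaw: number; pitch: number; roll: number } {
  const rad2deg = 180 / Math.PI;

  // roll: rotation about the x axis
  const roll = Math.atan2(2 * (w * x + y * z), 1 - 2 * (x * x + y * y));

  // pitch: rotation about the y axis, clamped to avoid NaN near the poles
  const sinp = 2 * (w * y - z * x);
  const pitch = Math.asin(Math.max(-1, Math.min(1, sinp)));

  // yaw: rotation about the z axis
  const yaw = Math.atan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z));

  return { yaw: yaw * rad2deg, pitch: pitch * rad2deg, roll: roll * rad2deg };
}
```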

> What would be the "minimum viable" sampling frequency and sensor reading accuracy requirements that'd enable key spatial audio use cases? This group develops Web APIs that carefully mitigate any known privacy and security threats, and I'm wondering whether the existing mitigations we have in place would work without compromising the use cases.

This is a great question. When it comes to spatial audio, sampling has to be very frequent. We have developed internal tests to measure when a listener can detect an "orientation delay", though we never finished them for public release. The aim is 200 Hz or greater, but from our findings 100 Hz and even 50 Hz are sometimes acceptable for spatial audio use cases, though users can often detect a delay. Once a user detects orientation delay, the resulting spatial audio playback becomes highly compromised and ineffective.

We have also found that on devices/platforms that do not allow steady sampling, adding a smoothing filter can help; for example, in this web example we add a 1€ (One Euro) filter to smooth the orientation results and reduce jitter: https://github.com/Mach1Studios/m1-web-spatialaudioplayer#facetracking
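
For context, a minimal single-axis sketch of that kind of filter (the 1€ filter by Casiez et al.); parameter values are illustrative, and angle wrap-around (e.g. yaw crossing 360°→0°) needs extra handling:

```ts
// Single-axis 1€ filter: a low-pass filter whose cutoff adapts to how fast
// the signal is changing, trading jitter for lag dynamically.
class OneEuroFilter {
  private xPrev: number | null = null;
  private dxPrev = 0;

  constructor(
    private minCutoff = 1.0, // Hz: lower = smoother but laggier at rest
    private beta = 0.05,     // speed coefficient: higher = less lag on fast moves
    private dCutoff = 1.0    // Hz: cutoff for the derivative estimate
  ) {}

  private alpha(cutoff: number, dt: number): number {
    const tau = 1 / (2 * Math.PI * cutoff);
    return 1 / (1 + tau / dt);
  }

  // x: raw sample (e.g. yaw in degrees), dt: seconds since the previous sample
  filter(x: number, dt: number): number {
    if (this.xPrev === null) {
      this.xPrev = x;
      return x;
    }
    // Estimate and low-pass the derivative
    const dx = (x - this.xPrev) / dt;
    const aD = this.alpha(this.dCutoff, dt);
    this.dxPrev = aD * dx + (1 - aD) * this.dxPrev;

    // Adapt the cutoff to the speed of movement, then low-pass the signal
    const cutoff = this.minCutoff + this.beta * Math.abs(this.dxPrev);
    const a = this.alpha(cutoff, dt);
    this.xPrev = a * x + (1 - a) * this.xPrev;
    return this.xPrev;
  }
}
```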

> What mainstream products that you know of currently support head orientation tracking and expose (preferably cross-)platform/OS-level APIs that could (at least in theory) be integrated into open-source web engines?

We have a blog post tracking publicly available IMU-enabled devices here: https://research.mach1.tech/posts/imu-enabled-devices/
However, we also know there are many more headphone devices entering the market with embedded IMUs suitable for headtracking use cases. The challenge is that there is not a lot of SDK or interfacing support for developers yet, despite demand for it from the spatial audio community. Right now, beyond a few consumer products, most developers are using third-party IMU sensors and attaching them to any pair of headphones to continue development while anticipating this becoming much more ubiquitous.

> Are there any other considerations we should be aware of?

We have a ton of opinions on this topic from our own experience trying to create and develop for this use case over the last 5+ years. That work has resulted in an SDK to help anyone aggregate spatial audio approaches and give the community more control over multichannel audio development of any kind. We know the current limitations of spatial audio and headtracking in depth; the biggest issues tend to lie in letting one or two companies "define spatial audio" as a proprietary feature instead of releasing tools that let everyone access orientation from devices and handle multichannel audio in a more agnostic fashion. This is currently the biggest blocker for the medium. Our goal is to help each company that enters this discussion see the common pain points so they can create open tools that benefit developers and creators alike.

anssiko (Member, Author) commented Dec 3, 2021

@himwho, thank you for the most excellent response!

@reillyeon, see above. Do you happen to be familiar with any of the devices listed at https://research.mach1.tech/posts/imu-enabled-devices/ that could possibly be used for prototyping? Your thoughts on the topic are welcome. Feel free to tag folks who might be interested.

With my co-chair hat on, I'd say this feature, as an extension to the existing API, would be in scope of the charter. To be successful we need, to start, good use case(s) and web developer enthusiasm, which I think we have. The trickier part seems to be the path to a prototype (initially single-platform?) and, later, a cross-platform implementation.

himwho commented Dec 3, 2021

> @reillyeon, see above. Do you happen to be familiar with any of the devices listed at https://research.mach1.tech/posts/imu-enabled-devices/ that could possibly be used for prototyping? Your thoughts on the topic are welcome. Feel free to tag folks who might be interested.

For prototyping we have had the best cross-platform success with this IMU and SDK. There are several open third-party IMU sensor suppliers, but we have preferred the support and accuracy/features of this one. Feel free to reach out anytime with more questions!

anssiko (Member, Author) commented Dec 3, 2021

I was looking at exactly that one, and knowing it communicates over Bluetooth LE I tagged @reillyeon, who happens to be working on the Web Bluetooth API.

reillyeon (Member) commented

My first instinct is that spatial audio should be supported by Web Audio and other web media playback APIs so that browsers and operating systems can handle integrating input from the IMU on behalf of the page. I understand, however, that in a developing field there may not be sufficient standards to completely offload this processing, so allowing developers to implement their own audio processing is the faster route. This feels similar to the transition that happened for WebXR, which was originally built using polyfills processing IMU input from existing sensor APIs. Now WebXR is part of the browser engine, which has the benefit of simplifying the code that developers must provide and improving user privacy by not exposing as much sensor information directly to the page.

Putting my implementer hat on, I would rather not include custom code for supporting the embedded IMUs in various headsets. We have a similar problem with the Gamepad API and much prefer it when operating systems provide a consistent API so that the browser doesn't have to implement device-specific support, as that greatly limits the number of devices we can officially support. Given that, unless operating systems recognize these IMUs as sensors, from an implementation perspective I would recommend exploring polyfilling an OrientationSensor object using APIs like Web Bluetooth, as suggested by @anssiko.

From a standards perspective, it then seems it would be useful to specify a HeadOrientationSensor, OrientationSensor({ 'position': 'head' }), or similar, so that the behavior of libraries providing such sensors can be consistent across implementations.
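
To make the suggestion concrete, a rough sketch of such a polyfill over Web Bluetooth; the service/characteristic UUIDs and the little-endian float32 [x, y, z, w] packet layout are entirely hypothetical and would depend on the headphone or IMU vendor:

```ts
// Hypothetical sketch: an OrientationSensor-like object whose quaternion comes
// from a BLE IMU via Web Bluetooth. UUIDs and packet layout are made up.
const IMU_SERVICE = '0000aaaa-0000-1000-8000-00805f9b34fb';     // hypothetical
const QUATERNION_CHAR = '0000bbbb-0000-1000-8000-00805f9b34fb'; // hypothetical

class HeadOrientationSensorPolyfill extends EventTarget {
  quaternion: [number, number, number, number] | null = null;

  async start(): Promise<void> {
    // requestDevice() requires a secure context and a user gesture.
    const bluetooth = (navigator as any).bluetooth; // types via @types/web-bluetooth
    const device = await bluetooth.requestDevice({ filters: [{ services: [IMU_SERVICE] }] });
    const server = await device.gatt.connect();
    const service = await server.getPrimaryService(IMU_SERVICE);
    const characteristic = await service.getCharacteristic(QUATERNION_CHAR);

    characteristic.addEventListener('characteristicvaluechanged', (event: any) => {
      const view: DataView = event.target.value;
      // Assumed packet: four little-endian float32 values, x, y, z, w.
      this.quaternion = [
        view.getFloat32(0, true),
        view.getFloat32(4, true),
        view.getFloat32(8, true),
        view.getFloat32(12, true),
      ];
      this.dispatchEvent(new Event('reading'));
    });
    await characteristic.startNotifications();
  }
}

// Usage, mirroring the Generic Sensor pattern:
// const sensor = new HeadOrientationSensorPolyfill();
// sensor.addEventListener('reading', () => console.log(sensor.quaternion));
// sensor.start(); // call from a user gesture, e.g. a button click
```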

CC Web Audio WG co-chair @hoch.

hoch commented Dec 3, 2021

Head orientation (as 3D forward/up vectors) can be applied to this interface in the Web Audio API:
https://webaudio.github.io/web-audio-api/#AudioListener

I was able to hook up some VR headsets with this class and it worked okay.
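
A hedged sketch of that hookup: rotate the listener's default forward (0, 0, -1) and up (0, 1, 0) vectors by a head-orientation quaternion [x, y, z, w] and write them to the AudioListener params each time a reading arrives.

```ts
// Drive the Web Audio AudioListener from a head-orientation quaternion.
function applyHeadOrientation(ctx: AudioContext, q: [number, number, number, number]) {
  const rotate = (v: [number, number, number]): [number, number, number] => {
    const [x, y, z, w] = q;
    // v' = v + w*t + q×t, where t = 2(q×v)
    const [tx, ty, tz] = [
      2 * (y * v[2] - z * v[1]),
      2 * (z * v[0] - x * v[2]),
      2 * (x * v[1] - y * v[0]),
    ];
    return [
      v[0] + w * tx + (y * tz - z * ty),
      v[1] + w * ty + (z * tx - x * tz),
      v[2] + w * tz + (x * ty - y * tx),
    ];
  };

  const forward = rotate([0, 0, -1]); // default listener forward
  const up = rotate([0, 1, 0]);       // default listener up
  const t = ctx.currentTime;
  const l = ctx.listener;

  // Modern AudioParam-based API, with a fallback to the legacy setter.
  if (l.forwardX) {
    l.forwardX.setValueAtTime(forward[0], t);
    l.forwardY.setValueAtTime(forward[1], t);
    l.forwardZ.setValueAtTime(forward[2], t);
    l.upX.setValueAtTime(up[0], t);
    l.upY.setValueAtTime(up[1], t);
    l.upZ.setValueAtTime(up[2], t);
  } else {
    (l as any).setOrientation(...forward, ...up);
  }
}
```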

himwho commented Dec 3, 2021

@reillyeon

> My first instinct is that spatial audio should be supported by Web Audio and other web media playback APIs so that browsers and operating systems can handle integrating input from the IMU on behalf of the page. I understand, however, that in a developing field there may not be sufficient standards to completely offload this processing, so allowing developers to implement their own audio processing is the faster route.

While I agree that it would be unrealistic to support a never-ending list of playback APIs, I want to point out some counter-arguments to help your side progress: there is not, and will not be, a single definitive use of "spatial audio" or a single meaning of the term. We are trying to help conversations like this by giving rough and hopefully unbiased technical definitions [our biased version here].

We are already seeing that there are many ways to approach "spatializing audio" for a listener, and some of them, introduced by companies like Dolby, do not even include interactivity or headtracking. That forces other companies like Apple to make up their own version of headtracking applied to audio processing, which has completely cut off access for creators and developers and severely limits quality and usage. This is an example of assuming there is one global way of processing "spatial audio", and it creates more issues than solutions.
Situations like this will likely continue until it is agreed that the term "spatial audio" does not refer to a single way of handling audio but rather to an expectation of interactivity or immersion for the end user [ideally defined by content creators]. In our opinion, the first step should be aggregating all devices and orientation methods in a way that lets newly emerging devices be added easily, so that all the different "spatial audio" use cases can make use of this, not just the ones contributed to by select companies with siloed user expectations.

I hope this helps clarify things. A good illustration of how things are currently blocked: imagine musicians making orientation-based interactive spatial music content today who cannot distribute it anywhere because of these limitations.

Summary: the spatial audio processing side gets a lot trickier, so maybe it helps to focus first on aggregating everything that is needed before the audio processing stage?

> From a standards perspective, it then seems it would be useful to specify a HeadOrientationSensor, OrientationSensor({ 'position': 'head' }), or similar, so that the behavior of libraries providing such sensors can be consistent across implementations.

This sounds smart and in line with our experiences!
