Initial version of handling audio object content. #304
Object audio is an emerging immersive format that is especially interesting for content creators. An audio object consists of 1) an audio waveform stem (typically mono) and 2) associated object metadata. Currently the most prominent application is spatial objects, where the metadata describes the spatial position of the object as a function of time, and all objects in the content are rendered simultaneously according to their metadata. Objects are in principle layout-agnostic, so the content can be reproduced on any multi-loudspeaker setup, over headphones, etc. Content creators typically prefer purely spatial objects without renderer-side interactivity, in order to preserve the artistic intent.
This PR represents an initial step toward handling object audio compression with Opus. The most naive solution would be to code each object with a separate mono Opus instance at equal bitrate. However, since the number of objects can be large, this consumes a lot of bitrate. Luckily, Opus already implements multistream coding, as well as a mechanism to adjust individual channel/stream rates based on an analysis of the joint masking among all channels. Neither object metadata handling nor decoder-side rendering is implemented, and it may be reasonable to leave these outside Opus in general. All object PCM waveforms are assumed to be provided as input, e.g. as a single multichannel file.
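For reference, the existing libopus multistream API can already carry N mono object stems as N uncoupled streams. The sketch below is not code from this PR; the object count, bitrate, and identity channel mapping are placeholders, and the per-stream rate split is left to the encoder's internal allocation.

```c
/* Minimal sketch (not from this PR): coding N object stems as N uncoupled
 * mono streams with libopus's existing multistream encoder.  The object
 * count and bitrate below are arbitrary placeholders. */
#include <stdio.h>
#include <opus_multistream.h>

#define NUM_OBJECTS 8   /* hypothetical number of audio objects */

int main(void)
{
    unsigned char mapping[NUM_OBJECTS];
    OpusMSEncoder *enc;
    int err, i;

    /* Input channel i carries the stem of object i and feeds mono stream i. */
    for (i = 0; i < NUM_OBJECTS; i++)
        mapping[i] = (unsigned char)i;

    enc = opus_multistream_encoder_create(
        48000,           /* sample rate */
        NUM_OBJECTS,     /* input channels: one per object */
        NUM_OBJECTS,     /* number of streams */
        0,               /* coupled (stereo) streams: none, objects are mono */
        mapping, OPUS_APPLICATION_AUDIO, &err);
    if (err != OPUS_OK || enc == NULL) {
        fprintf(stderr, "encoder init failed: %s\n", opus_strerror(err));
        return 1;
    }

    /* One total bitrate for the whole multistream; the split across the
     * individual streams is handled by the encoder's internal allocation. */
    opus_multistream_encoder_ctl(enc, OPUS_SET_BITRATE(NUM_OBJECTS * 48000));

    /* ... feed interleaved NUM_OBJECTS-channel PCM to
     * opus_multistream_encode_float() here ... */

    opus_multistream_encoder_destroy(enc);
    return 0;
}
```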
The underlying spatial masking model for bitrate allocation assumes that all objects in the content/multistream are rendered with a typical spatial renderer (such as EAR). Decoder-side interactivity (e.g. changing object levels) is not assumed here. In a typical listening room with reflections (as opposed to a free-field/anechoic room), spatial release from masking is not very prominent, so for typical object content this first approximation simply assumes no spatial release from masking between objects.
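Purely as an illustration (and not code from this PR), the zero-release assumption amounts to counting every other object's masker energy at full strength per band; a model with spatial release would instead scale each contribution by a direction-dependent factor below one. The function and argument names below are hypothetical.

```c
/* Hypothetical illustration of the "no spatial release from masking"
 * assumption: per analysis band, the masking energy seen by object `obj`
 * is the plain power sum of all other objects' band energies.  A model
 * with spatial release would scale each term by a factor < 1 depending
 * on the angular separation between the two objects. */
static float combined_masker_energy(const float *band_energy, int num_objects, int obj)
{
    float sum = 0.f;
    int j;
    for (j = 0; j < num_objects; j++) {
        if (j != obj)
            sum += band_energy[j];  /* full contribution: no spatial release */
    }
    return sum;
}
```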
Despite this simplicity, object_analysis is added as a separate function to make future development easier. Related PRs to other Opus projects: TODO.
Comments and suggestions welcome!