
Add stat for inputAudioLevel, before the audio filter #271

Closed
huibk opened this issue Nov 2, 2017 · 17 comments

Comments

@huibk

huibk commented Nov 2, 2017

It could be useful to have a stat that represents the audio level before noise filtering or gain control. Use cases are:

  • distinguish silence caused by missing microphone input from silence caused by noise suppression
  • determine how much processing is being applied to the audio signal
@alvestrand
Contributor

Doodling: This would logically be a stat on the source for a track, not a stat on the track itself, no?

if you have:

getUserMedia(audio, id=foo, volume=1.0) => track1
getUserMedia(audio, id=foo, volume=0.5) => track2

and get an input signal at 0.5 (-6dBov)

then track2 would get 0.25 (-12dBov) and track1 would get 0.5 as "level"; an input stat would get 0.5 for both.

One way of getting "unprocessed volume" would be

getUserMedia(audio, id=foo) => track1
getUserMedia(audio, id=foo, processing=none) => track2

track2 should then get the number you want.
Don't know if that works now.
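
For reference, a sketch of what the "processing=none" request could look like with today's constraint names (echoCancellation, noiseSuppression and autoGainControl are standard mediacapture-main constraints); whether a browser actually honours differing processing on two tracks from the same source is exactly the open question above:

```js
// Hypothetical sketch: request one processed and one unprocessed track
// from the same device. Whether implementations allow differing
// processing on tracks sharing a source is what is in doubt here.
async function getProcessedAndRawTracks(deviceId) {
  const processed = await navigator.mediaDevices.getUserMedia({
    audio: { deviceId: { exact: deviceId } }  // default processing on
  });
  const raw = await navigator.mediaDevices.getUserMedia({
    audio: {
      deviceId: { exact: deviceId },
      echoCancellation: false,
      noiseSuppression: false,
      autoGainControl: false
    }
  });
  return {
    track1: processed.getAudioTracks()[0],  // processed "level"
    track2: raw.getAudioTracks()[0]         // hopefully the unprocessed level
  };
}
```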

@huibk
Author

huibk commented Nov 3, 2017

Good point; getting a second track without processing may achieve the same result, presumably at a higher cost though.
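
(For illustration, a rough sketch of the extra cost in question: the application would have to measure the second, unprocessed track itself, e.g. with Web Audio, rather than reading a stat. `rawTrack` here is a hypothetical unprocessed track obtained as sketched above.)

```js
// Sketch: compute an RMS level for an unprocessed track via Web Audio.
// This needs a second capture plus an AudioContext, which is the extra cost.
function pollRawLevel(rawTrack, onLevel) {
  const ctx = new AudioContext();
  const source = ctx.createMediaStreamSource(new MediaStream([rawTrack]));
  const analyser = ctx.createAnalyser();
  source.connect(analyser);
  const samples = new Float32Array(analyser.fftSize);
  setInterval(() => {
    analyser.getFloatTimeDomainData(samples);
    const rms = Math.sqrt(
      samples.reduce((sum, s) => sum + s * s, 0) / samples.length
    );
    onLevel(rms);  // 0.0..1.0, before noise suppression / gain control
  }, 100);
  return ctx;  // caller should close() it when done
}
```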

@vr000m
Contributor

vr000m commented Jan 10, 2018

from #288:
My expectation was that input audio level and output level would match. And if someone did not hear anything then the volume stat being 0 would identify the problem.

I am now trying to see: if we compare the input audio level with the audio output level, and the audio output takes the volume into account, how do I diagnose whether the issue is with the post-decoding filter...?

@alvestrand
Contributor

An implementation experiment showed that on current Chrome, you can't turn echo cancellation on for one track and off for another track from the same source, which limits the usefulness of my other idea: https://crbug.com/802198
Firefox at least reports that echo cancellation has been turned off, but shows no discernible volume difference in simple tests. webrtc/samples#993 is the test page.

@na-g

na-g commented Feb 14, 2018

Would RTCRtp{Contributing,Synchronization}Source (from webrtc-pc) be an appropriate place for this? It seems like this is data that one might want to poll frequently.
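
For context, that webrtc-pc API looks roughly like the sketch below; it is synchronous and suited to frequent polling, but it reports levels on the receive side (carried in RTP), not the local pre-filter level this issue asks about:

```js
// Sketch: poll per-SSRC audio levels on the receive side (webrtc-pc).
// Note this is remote/receive-side data, not the local pre-processing level.
function logReceivedAudioLevels(pc) {
  for (const receiver of pc.getReceivers()) {
    if (receiver.track.kind !== 'audio') continue;
    for (const source of receiver.getSynchronizationSources()) {
      // audioLevel is a linear value in 0.0..1.0.
      console.log(`ssrc ${source.source}: level ${source.audioLevel}`);
    }
  }
}
```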

@na-g

na-g commented Feb 14, 2018

It took me too long to realize this was about local media. Still, a better API than getStats (one that is synchronous and fast) may be possible for audio data.

@alvestrand
Contributor

It turns out present Chrome doesn't support having two tracks from the same source with differing processing requirements. So it makes sense, sort of.
@huibk is this still a burning desire?

@alvestrand
Contributor

Thought: one possibility is to define a "source" stats object, referenced from the "track" stats object. That would be a place to hang both the input audio level and the input frame width and height (which have been requested in other contexts; see the non-standard googFrameWidthInput stat).
@henbos @burnburn what do you think?

@alvestrand
Contributor

Note: no matter what, it should report "accumulated energy", not "instant volume", for all the reasons given.
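
To make the "accumulated energy" point concrete: given a totalAudioEnergy / totalSamplesDuration pair (as later webrtc-stats drafts expose; the exact stat names are an assumption relative to this thread), the average level over a polling interval can be derived from two samples, so nothing that happens between polls is missed:

```js
// Sketch: derive the average audio level between two getStats() polls from
// accumulated energy, instead of reading an instantaneous "audioLevel".
// Assumes a stats entry exposing totalAudioEnergy and totalSamplesDuration.
function averageLevel(prev, curr) {
  const energyDelta = curr.totalAudioEnergy - prev.totalAudioEnergy;
  const durationDelta = curr.totalSamplesDuration - prev.totalSamplesDuration;
  if (durationDelta <= 0) return 0;
  // totalAudioEnergy accumulates level^2 * duration, so the RMS level over
  // the interval is sqrt(deltaEnergy / deltaDuration).
  return Math.sqrt(energyDelta / durationDelta);
}
```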

@vr000m
Contributor

vr000m commented May 16, 2018

👍 If we can do video input as well, it would be worthwhile to consider doing.

@henbos
Collaborator

henbos commented Jun 21, 2018

1. Source: Camera resolution
->
2. Constraints: E.g. downscaled video
->
Entering the WebRTC pipeline: the track is attached to a sender.
3. Sender knows input resolution (the per-constraint downscaled video).
->
The encoder is not exposed, but the sender's encoder encodes the video.
4. Sender knows output resolution (encoder might decide to downscale even more).
->
Sender creates RTP packets
->
IceTransport
->
Receiver gets RTP packets
->
Jitter buffer, concealment, whatever happens to prepare the stuff for the decoder.
->
The decoder is not exposed, but the receiver's decoder decodes the video.
5. Receiver knows the resolution of the final track.
->
6. Possible post-processing. I don't know if this happens, but it's conceivable that the WebRTC implementation decides that, if it's audio, "this is just silence", and mutes it, OR this could have been part of the decoding step.
Exiting the WebRTC pipeline.
->
7. The application might do additional processing through canvas etc, but now we have left "WebRTC land".
->
Render on screen.

Resolution may change at any of 1-7; we can only provide getStats() for what is in the "WebRTC pipeline", i.e. 3-6.

Our current stats are for 3 and for 6 (or, if we don't do anything at 6, then the stats are for 5).
We could choose to expose more of these, but we cannot expose stuff that happens outside of the "WebRTC pipeline" without getStats() or an equivalent on non-WebRTC primitives or on getUserMedia objects like MediaStreamTrack (note that the WebRTC getStats() for track is not actually MediaStreamTrack stats but is based on sender/receiver stats).

Am I missing anything?
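
As a sketch of what exposing steps 3 and 4 could look like to an application, using the media-source / outbound-rtp stat names that later webrtc-stats drafts settled on (treat the exact names as assumptions in the context of this thread):

```js
// Sketch: compare pre-encoder (step 3) and encoded (step 4) resolution.
// Stat types and fields follow later webrtc-stats drafts, not this thread.
async function logSenderResolutions(pc) {
  const report = await pc.getStats();
  report.forEach(stats => {
    if (stats.type === 'media-source' && stats.kind === 'video') {
      console.log(`source (step 3): ${stats.width}x${stats.height}`);
    }
    if (stats.type === 'outbound-rtp' && stats.kind === 'video') {
      console.log(`encoded (step 4): ${stats.frameWidth}x${stats.frameHeight}`);
    }
  });
}
```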

@henbos
Collaborator

henbos commented Jun 21, 2018

2 and 3 are the same resolution

@henbos
Collaborator

henbos commented Jun 21, 2018

I think some of this issue stems from wanting stats before/after the application does additional processing, but that is outside the scope of the WebRTC pipeline.

@henbos
Collaborator

henbos commented Jun 21, 2018

@huibk / @vr000m Can you pinpoint which of the steps described above you want metrics for? (I mostly used resolution as the example, but the same applies to audioLevel.)

@vr000m
Contributor

vr000m commented Jun 21, 2018

My original concern was to have input and output metrics for all the components that transform media in some way.

For example, components of the media pipeline that downscale/upscale video or suppress/conceal audio.

@henbos
Collaborator

henbos commented Jun 21, 2018

The sender comprises multiple components, and it might be worth splitting the dictionary up to reflect that: a "sender"/"receiver" dictionary and an "encoder"/"decoder" dictionary. Conceptually: https://photos.app.goo.gl/fyqGKxYMhM247dK47

The encoder should have input/output stats (whether resolution, audio energy, etc.). Or we put them in the "sender" stats for now, but are clear about what is input and what is output; none of the current names make it clear which step in the pipeline they reflect.

But let's make sure this is something we want before we change it. It could be that what is being asked for is metrics from before or after the WebRTC pipeline, which might be outside the scope of this spec.

@henbos
Collaborator

henbos commented Jun 21, 2018

Based on discussion with @huibk and @vr000m:

This has to do with MediaStreamTrack and getUserMedia()/devices: stats for before and after processing/applying constraints. That is outside of WebRTC (and could be of interest to non-WebRTC applications) and outside this spec.

Closing this bug. Feel free to file a bug on https://github.com/w3c/mediacapture-main/issues.
