RTCMediaStreamTrackStats.audioLevel clarification #193

Closed
na-g opened this issue Apr 5, 2017 · 15 comments

@na-g

na-g commented Apr 5, 2017

This is in reference to the member definition for RTCMediaStreamTrackStats.audioLevel.

Is the second sentence indicating that RTCMediaStreamTrackStats.audioLevel is obtained via remapping (from 0 ... 127 to 1.0 ... 0.0) the result of the method described in Appendix A of RFC 6465?

@alvestrand
Contributor

The intent wasn't to claim that the value has to be obtained by remapping (remapping through a 7-bit value loses precision when we don't have to). It was intended to say that this definition is the same as the definition of the input value in the RFC appendix.

@henbos
Collaborator

henbos commented May 10, 2017

The referenced method (Appendix A of RFC 6465) takes length samples in the range 0..MAX_VALUE (127 for 1 byte per sample), normalizes them to 0..1 and computes the rms (root mean square), which is also in 0..1. It then calculates db = 20 * log10(rms), clamps it between -127 and 0 and rounds it to an integer.

So... our audioLevel is the rms value? I get not wanting to use the sample value, 0..MAX_VALUE, so as to be agnostic about the number of bytes used per sample, but why do we pick the rms value for audioLevel? Why not the dB value? We don't have to round it to an integer if we're afraid of losing precision.
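
A minimal Python sketch of the steps described above, to make the difference between the rms value and the dB value concrete. This is not the RFC's own code; the function names and the max_value parameter are hypothetical and chosen only for illustration.

```python
import math

MIN_LEVEL_DB = -127.0  # lower clamp used in the description above
MAX_LEVEL_DB = 0.0     # upper clamp (full scale)


def rms_level(samples, max_value):
    """Normalize samples by max_value and return their RMS, in 0..1."""
    if not samples:
        return 0.0
    total = sum((s / max_value) ** 2 for s in samples)
    return math.sqrt(total / len(samples))


def level_db(rms):
    """Convert an RMS value in 0..1 to dB, clamped to -127..0, unrounded."""
    if rms <= 0:
        return MIN_LEVEL_DB
    db = 20 * math.log10(rms)
    return max(MIN_LEVEL_DB, min(MAX_LEVEL_DB, db))


# 1-byte samples alternating at full scale, then at roughly half scale.
full = rms_level([127, -127, 127, -127], 127)
half = rms_level([64, -64, 64, -64], 127)
print(full, level_db(full))
print(half, level_db(half), round(level_db(half)))
```

Rounding only happens in the last step, so a non-integer dB value is available if loss of precision is the concern.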

@henbos
Collaborator

henbos commented May 10, 2017

Do you know the answer to my comment, @taylor-b?
Would it make sense to change audioLevel to the result of the level calculation without rounding?
audioLevel = -20 * log10(rms)
Admittedly I'm not an audio person. Maybe rms values are amazing.

@taylor-b
Contributor

I've wondered myself for a long time why it's not in dBov... Following the git history, I found it was defined this way to match "volume" in mediacapture-main... Which was introduced in a PR by Cullen... Which they decided to merge in this meeting: https://www.w3.org/2014/10/30-mediacap-minutes.html

... where it turns out it was just an arbitrary choice.

Anyway, I think the definition in mediacapture-main makes more sense; this one confused me for a while the first time I read it.

@taylor-b
Contributor

Some other things that aren't clear to me about audioLevel:

  • Is it there for both sent and received tracks?
  • Is it a smoothed value, or does it represent the audio level of the last chunk of audio that the MediaStreamTrack produced?

@henbos
Collaborator

henbos commented Oct 21, 2017

@hlundin can you help clarify audioLevel? We're confused. :)

@taylor-b
Contributor

I'd define it as "RMS of audio samples divided by the maximum encodable value". Is there a better term for these units?

And/or we can specifically point to the rms variable from Appendix A of RFC 6465.

@hlundin

hlundin commented Oct 23, 2017

I found this definition:

The value is between 0..1 (linear), where 1.0 represents 0 dBov, 0 represents silence, and 0.5 represents approximately 6 dBSPL change in the sound pressure level from 0 dBov.

Does that make it clear? It says nothing about what chunk of audio we are measuring, and whether to use smoothing or not. I think that smoothing should not be used, but for the duration I have no good answer.

@taylor-b
Contributor

@hlundin This is the definition we don't think is very clear. It's not obvious to me what "linear" means, or that the "approximately 6" value is the result of 20 * log10(0.5), which is about -6.02.
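
To spell out the arithmetic: assuming "linear" means that audioLevel is a linear amplitude ratio relative to full scale, the corresponding level in dBov would be 20 * log10(audioLevel). The sketch below works under that assumption; the function names are hypothetical.

```python
import math


def linear_to_dbov(level):
    """Map a linear level in 0..1 to dBov; silence maps to -infinity."""
    if level <= 0:
        return float("-inf")
    return 20 * math.log10(level)


def dbov_to_linear(db):
    """Inverse mapping: dBov back to a linear level."""
    return 10 ** (db / 20)


for level in (1.0, 0.5, 0.25):
    print(level, round(linear_to_dbov(level), 2))
```

Under this reading, 1.0 is 0 dBov, 0.5 is about -6.02 dBov and 0.25 is about -12.04 dBov, so each halving costs roughly 6 dB.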

@alvestrand
Contributor

We have totalAudioEnergy defined now, which is computable. Is it possible to define audioLevel in terms of a computation over totalAudioEnergy?

I worry about audioLevel since it's inherently a "windowing" operator - instantaneous audioLevel doesn't make much sense at all. We could leave it implementation-defined, but that will give implementation variance. https://webrtc.github.io/samples/src/content/getusermedia/volume/ gives a JS implementation that shows two versions of exponential-decay calculation.
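
To illustrate why the windowing choice matters, here is a minimal sketch of one exponential-decay calculation. It is not taken from the linked sample; the decay factor of 0.95 and the function names are assumptions made for illustration, and samples are assumed to be normalized to -1..1 already.

```python
import math


def instant_level(chunk):
    """RMS of one chunk of samples that are already normalized to -1..1."""
    if not chunk:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def smoothed_levels(chunks, decay=0.95):
    """Yield an exponentially smoothed level after each chunk.

    A decay close to 1.0 means a long effective window; 0.0 means
    no smoothing (the level of the latest chunk only).
    """
    level = 0.0
    for chunk in chunks:
        level = decay * level + (1.0 - decay) * instant_level(chunk)
        yield level


# Three identical full-scale chunks: the smoothed level climbs toward 1.0.
print(list(smoothed_levels([[1.0, -1.0]] * 3, decay=0.5)))
```

Two implementations that differ only in the decay factor or chunk length would report different values for the same audio, which is the variance concern raised above.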

"Linear" kind of makes sense if you think that each halving of audio level should represent another 6 dBSPL change (0.25 = -12dB compared to 0 dBov). There are many other curves that could go through the other two defined points (0 dBov and -6 dBov) if the word "linear" is dropped. Suggestions for clearer description?

@taylor-b
Contributor

> We have totalAudioEnergy defined now, which is computable. Is it possible to define audioLevel in terms of a computation over totalAudioEnergy?

Is audioLevel's definition clear enough to answer that? I'm not sure if it's intended to represent the value from the last received packet, or from the last 10ms chunk of audio output, or something else. Chrome's implementation is "maximum sample value from last 10ms chunk of audio", which is almost certainly not the intention...
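
For reference, a sketch of what a computation over totalAudioEnergy could look like. It assumes that totalAudioEnergy accumulates audioLevel squared times the duration of each chunk, and that a companion counter for the total duration in seconds (called duration here, not mentioned in this thread) is available in the same snapshot. All names are hypothetical.

```python
import math


def accumulate(levels, chunk_seconds):
    """Accumulate energy and duration from per-chunk linear levels in 0..1."""
    energy = 0.0
    duration = 0.0
    for level in levels:
        energy += level * level * chunk_seconds
        duration += chunk_seconds
    return energy, duration


def average_audio_level(energy1, duration1, energy2, duration2):
    """Average (RMS) level over the interval between two snapshots."""
    elapsed = duration2 - duration1
    if elapsed <= 0:
        return 0.0
    return math.sqrt(max(0.0, energy2 - energy1) / elapsed)


# One second of 10 ms chunks at a constant level of 0.5.
energy, duration = accumulate([0.5] * 100, 0.01)
print(average_audio_level(0.0, 0.0, energy, duration))
```

With this approach the averaging interval is chosen by whoever takes the two snapshots rather than by the implementation.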

@henbos
Collaborator

henbos commented Oct 27, 2017

Can I reassign to you Harald?

henbos assigned alvestrand and unassigned henbos Oct 27, 2017

@alvestrand
Contributor

Let's see if #288 (where I claim that it's an average over an implementation-defined interval) makes sense to you. Agreed that the current Chrome implementation doesn't.

@vr000m
Contributor

vr000m commented Jan 22, 2018

Did #288 fix this issue? Could we close this? There is an issue, #271, that covers the pre-filter situation.

@alvestrand
Contributor

Closing.
