Issue 0195 audio descriptions #349
Great work in progress @skynavga, thank you for this. I've added some constructive comments. In my view it is very nearly complete already.
If you would like me to draft a section explaining the mapping to a speech and web audio graph for the purpose of explaining the "presentation" model I can give that a go. It would be an annex most likely.
spec/ttml2.xml
Outdated
<div3 id="audio-style-attribute-gain">
<head>tta:gain</head>
<p>The <att>tta:gain</att> attribute is used to specify an audio style property that
determines a <emph>gain</emph> multiplier to be applied to the the sum of all active audio content during
I think this should not be all active audio content, but the active audio content in the context of the element to which it applies. Indeed this is confirmed by the example and note below, which are exactly what I would expect.
<tr>
<td><emph>Values:</emph></td>
<td>
<code><loc href="#style-value-percentage"><number></loc></code>
In the Web Audio gain node this is a float. I don't see why we should not also make it a float here. Limiting to [-1,1] seems unnecessarily limiting.
I would like to add that we base the semantics of this on the GainNode as linked above.
Having checked how we handle floats, we just need to remove the `[-1,1]` interval restriction because `<number>` is already effectively a float.
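A minimal sketch of what an unconstrained gain could look like in practice, assuming audio is modelled as a plain list of float samples. The function names are hypothetical and not taken from the spec. It also illustrates that the diff's rule for negative values (absolute value plus phase inversion) gives the same result as multiplying by the signed number directly.

```python
def apply_gain(samples, gain):
    """Scale each sample by a float gain; no [-1, 1] restriction is applied."""
    return [gain * s for s in samples]


def apply_gain_with_inversion(samples, gain):
    """Apply |gain|, then invert phase when the gain is negative."""
    magnitude = abs(gain)
    scaled = [magnitude * s for s in samples]
    # Phase inversion is modelled here as negating every sample.
    return [-s for s in scaled] if gain < 0 else scaled


samples = [0.25, -0.5, 1.0]
for g in (-2.5, 0.0, 1.0, 3.0):
    # Both formulations agree for every gain, including values above 1.
    assert apply_gain(samples, g) == apply_gain_with_inversion(samples, g)
```

Gains above 1 simply amplify; whether a processor should guard against clipping afterwards is outside what this sketch covers.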
</tr>
</tbody>
</table>
<p>For the purpose of determining applicability of this audio style property,
This is actually weird here isn't it?
then the computed value of the property associated with this attribute is clamped to this bounded interval.</p>
<p>If the computed value of the property associated with this attribute is negative, then gain is set to
the absolute value of the computed value and a phase inversion is applied.</p>
<p>If gain is 0, then active audio content is fully muted. If gain is 1, then the amplitude of active
s/active/the applicable
and this child <code>p</code> combines a second active audio track to form the output to its children.
Furthermore, distinct gains are specified on each source audio as well as on the output of <code>p</code>,
such that the final output is <code>0.3[0.5(track1) + 0.8(track2)]</code>.</p>
</note>
Good example and description - this is exactly what I was expecting/intending. It shows why gain needs to be applicable to p also.
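For anyone reading along, the nested expression from the note can be checked with a small sketch. It assumes each track is a list of float samples of equal length and that combining tracks means sample-wise summation; the helper names and sample values are made up for illustration.

```python
def apply_gain(samples, gain):
    """Scale each sample by the given gain."""
    return [gain * s for s in samples]


def mix(*tracks):
    """Sum equally long tracks sample by sample."""
    return [sum(frame) for frame in zip(*tracks)]


track1 = [1.0, 0.5, -0.5]   # audio provided by the div
track2 = [0.5, -1.0, 0.25]  # audio added by the child p

div_out = apply_gain(track1, 0.5)               # gain on the first source
p_in = mix(div_out, apply_gain(track2, 0.8))    # p combines the second source
p_out = apply_gain(p_in, 0.3)                   # gain on the output of p

# p_out matches 0.3[0.5(track1) + 0.8(track2)] sample by sample.
expected = [0.3 * (0.5 * a + 0.8 * b) for a, b in zip(track1, track2)]
assert all(abs(x - y) < 1e-12 for x, y in zip(p_out, expected))
```

The outer gain only has something to act on because `p` has its own combined output, which is the point made above about gain needing to apply to `p`.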
</table>
<p>For the purpose of determining applicability of this audio style property,
each character child of a <el>p</el> element is considered to be enclosed in an anonymous
span.</p>
Again not sure what value this adds here.
<p>For the purpose of determining applicability of this audio style property,
each character child of a <el>p</el> element is considered to be enclosed in an anonymous
span.</p>
<p>If the specified value of this attribute is not contained in the interval <code>[-1,1]</code>,
I'd like to state that we base the semantics for this on StereoPanner. This deals with how many input channels are processed and how many output channels there are, i.e. two in each case, with a requirement to up- or down-mix the input to 2 channels if necessary.
this has been done now.
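As a rough illustration of the StereoPanner-style semantics referred to above, here is a sketch of equal-power panning for a mono input, as I understand the Web Audio algorithm. The function name is hypothetical, the stereo-input case is omitted, and the clamping to `[-1,1]` mirrors the clamping rule in the diff.

```python
import math


def stereo_pan_mono(sample, pan):
    """Return (left, right) for one mono sample using equal-power panning."""
    # Clamp pan to the bounded interval [-1, 1].
    pan = max(-1.0, min(1.0, pan))
    # Map pan from [-1, 1] onto [0, 1], then onto a quarter circle.
    x = (pan + 1.0) / 2.0
    gain_left = math.cos(x * math.pi / 2.0)
    gain_right = math.sin(x * math.pi / 2.0)
    return (sample * gain_left, sample * gain_right)


left, right = stereo_pan_mono(1.0, 0.0)
# At centre both channels carry equal amplitude and total power is preserved.
assert abs(left - right) < 1e-12
assert abs(left * left + right * right - 1.0) < 1e-12
```

A pan of -1 sends the whole signal left and +1 sends it right; values in between move the apparent position along the quarter circle.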
The <code>div</code> element provides one active audio track as an output to its child <code>p</code>,
and this child <code>p</code> combines a second active audio track to form the output to its children.
Furthermore, distinct pans are specified on each source audio as well as on the output of <code>p</code>,
such that the final output pan is <code>0.3[0.5(track1) + 0.8(track2)]</code>.</p>
I like the example and the derivation note, however the last part is really unclear - what does the mathematical expression mean here? I think we need to understand the equivalent resulting positions of track1 and track2.
</div3>
<div3 id="audio-style-attribute-pitch">
<head>tta:pitch</head>
<p>The <att>tta:pitch</att> attribute is used to specify an audio style property that
I see that we reference SSML for this - specifically the reference is to §3.2.4, the SSML `prosody` element's `pitch` attribute.
spec/ttml2.xml
Outdated
</tbody>
</table>
<note role="derivation">
<p>The semantics of the style property represented by this attribute are based upon
Specifically the presence of a `tta:speak` attribute produces the semantic of the SSML §3.1.1 `speak` element whose `p`/`s` contents are the span's character content with a SSML §3.2.4 `prosody` element whose `rate` attribute is set to the `tta:speak` attribute's value.
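One possible reading of this mapping, sketched with the standard library only. The function name `span_to_ssml`, the `lang` default, and the exact nesting (`prosody` inside `s` inside `p`) are assumptions for illustration and are not fixed by the comment above.

```python
import xml.etree.ElementTree as ET

# Namespace of SSML documents.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def span_to_ssml(text, rate, lang="en"):
    """Wrap a span's character content in speak/p/s/prosody (assumed nesting)."""
    speak = ET.Element(
        "speak", {"version": "1.1", "xmlns": SSML_NS, "xml:lang": lang}
    )
    paragraph = ET.SubElement(speak, "p")
    sentence = ET.SubElement(paragraph, "s")
    # The attribute value from the span is carried over as the prosody rate.
    prosody = ET.SubElement(sentence, "prosody", {"rate": rate})
    prosody.text = text
    return ET.tostring(speak, encoding="unicode")


ssml = span_to_ssml("A door slams shut.", "fast")
assert '<prosody rate="fast">A door slams shut.</prosody>' in ssml
```

Whether `p` and `s` are both required, or only one of them, would need to be pinned down in the derivation note.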
while still excluding pitch and speak from embedded audio elements.
* Make `tta:gain` an unconstrained `<number>`
* `tta:gain` and `tta:pan` apply additionally to `p` and `div` elements.
* Change “[all] active audio” to “applicable audio” in advance of adding a normative section on audio processing semantics.
* Add a term for “audio generating element”
* Clarify the semantic derivation of gain and pan as they relate to audio generating elements vs non audio generating elements
* Add reference to the WD of WebAudio with an Editorial Note to indicate the basis in doing so is the expectation that WebAudio will be a Recommendation prior to TTML2, otherwise we will need to refactor.
In line with spec modification, remove the constraint restricting `tta:gain` values - they are permitted to be any number from -infinity to +infinity.
Issue 0195 add ad xsds
spec/ttml2.xml
Outdated
<p>If the computed value of the property associated with this attribute is negative, then gain is set to
the absolute value of the computed value and a phase inversion is applied.</p>
<p>If gain is 0, then the applicable audio content is fully muted. If gain is 1, then the amplitude of
the applicable audio content is not modified.</p>
<p>If gain is 0, then active audio content is fully muted. If gain is 1, then the amplitude of active
@skynavga why restrict the gain to the range `[-1,1]` when gains greater than 1 are reasonable things to apply, and are supported e.g. by `GainNode`?
See #195.
This is a preliminary PR for enhancements to audio functionality in order to more fully specify audio related behavior as well as add missing functionality needed to support common audio usage scenarios, such as text to speech and audio descriptions. Additional work in this branch is required prior to merger.