Issue 0195 audio descriptions #349

Merged
merged 23 commits into from
Jun 18, 2017

skynavga
Collaborator

@skynavga skynavga commented May 30, 2017

See #195.

This is a preliminary PR that enhances audio functionality, both to specify audio-related behavior more fully and to add missing functionality needed for common audio usage scenarios such as text to speech and audio descriptions. Additional work in this branch is required before merging.

Contributor

@nigelmegitt nigelmegitt left a comment

Great work in progress @skynavga, thank you for this. I've added some constructive comments. In my view it is very nearly complete already.

If you would like me to draft a section explaining the mapping to a speech and web audio graph, for the purpose of explaining the "presentation" model, I can give that a go. It would most likely be an annex.

spec/ttml2.xml Outdated
<div3 id="audio-style-attribute-gain">
<head>tta:gain</head>
<p>The <att>tta:gain</att> attribute is used to specify an audio style property that
determines a <emph>gain</emph> multiplier to be applied to the the sum of all active audio content during
Contributor

I think this should not be all active audio content, but the active audio content in the context of the element to which it applies. Indeed this is confirmed by the example and note below, which are exactly what I would expect.

<tr>
<td><emph>Values:</emph></td>
<td>
<code><loc href="#style-value-percentage">&lt;number&gt;</loc></code>
Contributor

In the Web Audio gain node this is a float. I don't see why we should not also make it a float here; restricting it to [-1,1] seems unnecessarily limiting.

I would like to add that we base the semantics of this on the GainNode as linked above.

Contributor

Having checked how we handle floats, we just need to remove the [-1,1] interval restriction because <number> is already effectively a float.

</tr>
</tbody>
</table>
<p>For the purpose of determining applicability of this audio style property,
Contributor

This is actually weird here, isn't it?

then the computed value of the property associated with this attribute is clamped to this bounded interval.</p>
<p>If the computed value of the property associated with this attribute is negative, then gain is set to
the absolute value of the computed value and a phase inversion is applied.</p>
<p>If gain is 0, then active audio content is fully muted. If gain is 1, then the amplitude of active
Contributor

s/active/the applicable

and this child <code>p</code> combines a second active audio track to form the output to its children.
Furthermore, distinct gains are specified on each source audio as well as on the output of <code>p</code>,
such that the final output is <code>0.3[0.5(track1) + 0.8(track2)]</code>.</p>
</note>
Contributor

Good example and description - this is exactly what I was expecting/intending. It shows why gain needs to be applicable to p also.
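
To make the arithmetic in the note concrete, here is a minimal TypeScript sketch of the nested gain mix `0.3[0.5(track1) + 0.8(track2)]`. The helper names (`applyGain`, `mix`) and the sample values are hypothetical and are not taken from the spec; audio is modeled as plain arrays of samples of equal length.

```typescript
// Hypothetical model: a track is an array of linear PCM sample values.
type Samples = number[];

// Scale every sample by a gain multiplier.
function applyGain(input: Samples, gain: number): Samples {
  return input.map((s) => s * gain);
}

// Sum equal-length tracks sample by sample.
function mix(...tracks: Samples[]): Samples {
  if (tracks.length === 0) return [];
  return tracks[0].map((_, i) => tracks.reduce((sum, t) => sum + t[i], 0));
}

// Illustrative sample values only.
const track1: Samples = [1.0, -0.5, 0.25]; // audio provided by the div
const track2: Samples = [0.5, 0.5, -1.0]; // audio combined in by the p

// Gain 0.5 on track1, gain 0.8 on track2, then gain 0.3 on the output of p.
const output = applyGain(
  mix(applyGain(track1, 0.5), applyGain(track2, 0.8)),
  0.3
);
```

Because gain is applied to the mixed output of `p`, the outer 0.3 scales both tracks, which is why gain needs to be applicable to `p` as well as to the audio sources.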

</table>
<p>For the purpose of determining applicability of this audio style property,
each character child of a <el>p</el> element is considered to be enclosed in an anonymous
span.</p>
Contributor

Again not sure what value this adds here.

<p>For the purpose of determining applicability of this audio style property,
each character child of a <el>p</el> element is considered to be enclosed in an anonymous
span.</p>
<p>If the specified value of this attribute is not contained in the interval <code>[-1,1]</code>,
Contributor

I'd like to state that we base the semantics for this on StereoPanner. This deals with how many input channels are processed and how many output channels there are, i.e. two in each case, with a requirement to up- or down-mix the input to 2 channels if necessary.

Contributor

This has been done now.
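
For readers unfamiliar with the StereoPanner semantics referred to in this thread, the following TypeScript sketch shows the equal-power panning computation as described in the Web Audio specification for mono and stereo inputs. The function names are hypothetical, it processes a single sample frame rather than an audio stream, and it is an illustration, not the normative TTML2 processing model.

```typescript
// Clamp the pan value to the interval [-1, 1].
function clampPan(pan: number): number {
  return Math.max(-1, Math.min(1, pan));
}

// Mono input: map pan from [-1, 1] to [0, 1] and apply equal-power gains.
// Returns one [left, right] output frame.
function panMono(sample: number, pan: number): [number, number] {
  const x = (clampPan(pan) + 1) / 2;
  const gainL = Math.cos((x * Math.PI) / 2);
  const gainR = Math.sin((x * Math.PI) / 2);
  return [sample * gainL, sample * gainR];
}

// Stereo input: one channel is progressively folded into the other.
function panStereo(left: number, right: number, pan: number): [number, number] {
  const p = clampPan(pan);
  const x = p <= 0 ? p + 1 : p;
  const gainL = Math.cos((x * Math.PI) / 2);
  const gainR = Math.sin((x * Math.PI) / 2);
  return p <= 0
    ? [left + right * gainL, right * gainR] // panning toward the left
    : [left * gainL, right + left * gainR]; // panning toward the right
}
```

With this model, a pan of -1 places all signal in the left channel, a pan of 1 places it all in the right channel, and a pan of 0 leaves a stereo input unchanged.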

The <code>div</code> element provides one active audio track as an output to its child <code>p</code>,
and this child <code>p</code> combines a second active audio track to form the output to its children.
Furthermore, distinct pans are specified on each source audio as well as on the output of <code>p</code>,
such that the final output pan is <code>0.3[0.5(track1) + 0.8(track2)]</code>.</p>
Contributor

I like the example and the derivation note. However, the last part is really unclear: what does the mathematical expression mean here? I think we need to understand the equivalent resulting positions of track1 and track2.

</div3>
<div3 id="audio-style-attribute-pitch">
<head>tta:pitch</head>
<p>The <att>tta:pitch</att> attribute is used to specify an audio style property that
Contributor

I see that we reference SSML for this - specifically the reference is to §3.2.4, the SSML prosody element's pitch attribute.

spec/ttml2.xml Outdated
</tbody>
</table>
<note role="derivation">
<p>The semantics of the style property represented by this attribute are based upon
Contributor

Specifically, the presence of a tta:speak attribute produces the semantics of the SSML §3.1.1 speak element, whose p/s contents are the span's character content, with an SSML §3.2.4 prosody element whose rate attribute is set to the tta:speak attribute's value.
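
One possible reading of that mapping can be sketched in TypeScript as follows. The function names, the choice to wrap the text in a `p` inside `prosody`'s parent, the `"fast"` value, and the default language are all assumptions made for illustration; the comment above does not fix the exact nesting or the set of permitted values.

```typescript
// Escape characters that are special in XML text and attribute values.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical mapping: wrap a span's character content in an SSML speak
// document, with a prosody element whose rate is the tta:speak value.
function spanToSsml(text: string, speakValue: string, lang = "en"): string {
  return (
    `<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" ` +
    `xml:lang="${escapeXml(lang)}">` +
    `<p><prosody rate="${escapeXml(speakValue)}">${escapeXml(text)}</prosody></p>` +
    `</speak>`
  );
}

// Example: a span with illustrative content and an assumed value of "fast".
const ssml = spanToSsml("Doors open & close", "fast");
```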

Tom Rosier and others added 12 commits June 13, 2017 17:23
while still excluding pitch and speak from embedded audio elements.
* Make `tta:gain` an unconstrained `<number>`
* `tta:gain` and `tta:pan` apply additionally to `p` and `div` elements.
* Change “[all] active audio” to “applicable audio” in advance of
adding a normative section on audio processing semantics.
* Add a term for “audio generating element”
* Clarify the semantic derivation of gain and pan as they relate to
audio generating elements vs non audio generating elements
* Add reference to the WD of WebAudio with an Editorial Note to
indicate the basis in doing so is the expectation that WebAudio will be
a Recommendation prior to TTML2, otherwise we will need to refactor.
In line with spec modification, remove the constraint restricting
`tta:gain` values - they are permitted to be any number from -infinity
to +infinity.
@nigelmegitt
Contributor

I've generated pull request #393 into this branch which, if merged, will address my review comments. @skynavga I've requested your review - if it's okay please go ahead and merge it into here.

@skynavga skynavga merged commit f8f06de into gh-pages Jun 18, 2017
spec/ttml2.xml Outdated
<p>If the computed value of the property associated with this attribute is negative, then gain is set to
the absolute value of the computed value and a phase inversion is applied.</p>
<p>If gain is 0, then the applicable audio content is fully muted. If gain is 1, then the amplitude of
the applicable audio content is not modified.</p>
<p>If gain is 0, then active audio content is fully muted. If gain is 1, then the amplitude of active
Contributor

@skynavga why restrict the gain to the range [-1,1] when gains greater than 1 are reasonable things to apply, and are supported e.g. by GainNode?
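
As an illustration of the behavior under discussion, here is a small TypeScript sketch of an unclamped gain that follows the quoted spec text: a negative value is applied as its absolute value plus a phase inversion, and values above 1 amplify. The function name `applyTtaGain` is hypothetical, and samples are modeled as a plain number array.

```typescript
// Apply an unclamped gain to an array of samples.
// A negative gain is treated as |gain| plus a phase inversion, which is
// numerically the same as multiplying by the signed value.
function applyTtaGain(samples: number[], gain: number): number[] {
  const magnitude = Math.abs(gain);
  const invert = gain < 0;
  return samples.map((s) => {
    const scaled = s * magnitude;
    return invert ? -scaled : scaled;
  });
}

const boosted = applyTtaGain([0.5, -0.25], 2); // amplified beyond unity
const inverted = applyTtaGain([0.5, -0.25], -2); // amplified and phase inverted
const muted = applyTtaGain([0.5, -0.25], 0); // fully muted
```

Nothing in this computation requires the multiplier to lie within [-1,1]; any clipping of the amplified result would be a separate concern for the output stage.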

@skynavga skynavga deleted the issue-0195-audio-descriptions branch August 21, 2017 16:26