Add detailed design discussion for codec configuration
steveanton committed Sep 10, 2019
1 parent d6a55a4 commit daccbea
Showing 2 changed files with 209 additions and 102 deletions.
59 changes: 59 additions & 0 deletions explainer.md
@@ -293,6 +293,65 @@ input.readable.pipeInto(demuxer.writable);
muxer.readable.pipeInto(output.writable);
```

## Detailed design discussion

### Codec configuration

Many codecs and encoder/decoder implementations are highly configurable. WebCodecs intends to expose most of the configuration options that codecs offer today so that advanced use cases can be supported efficiently.

Configuration options are classified into two types:
- **Parameters** are metadata required to construct a compliant bitstream. They must be supplied when constructing the encoder/decoder and cannot be changed afterwards. The VP9 profile is an example.
- **Settings** are configuration options that influence the behavior of the encoder but do not change the type of bitstream produced. Target bitrate is an example.

Settings are further classified into three types:
- **Static codec settings** must be specified when constructing the encoder and cannot be changed without reinitializing the encoder.
- **Dynamic codec settings** apply for the lifetime of the encoder and can be changed at any point. Dynamic settings must be lightweight and must not require reinitializing the encoder (concretely, changing one must not require a new key frame to be produced).
- **Frame settings** apply only to specific input frames.

WebCodecs will maintain a standard definition of parameters for each supported codec. Additionally, the specification will establish common encoder settings that apply across codecs and implementations. However, we expect many settings to be implementation-specific. These will be available behind a feature detection and configuration API (TODO: sketch this).
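
The classification above can be read as a rule for when each option may be supplied. The following is only a sketch of that idea: `STATIC_SETTINGS`, `DYNAMIC_SETTINGS`, and `splitSettings` are hypothetical names that are not part of the proposal, and the membership of each set is borrowed loosely from the draft IDL in this commit.

```javascript
// Hypothetical classification tables; the real lists would be defined by the
// specification and by each implementation.
const STATIC_SETTINGS = new Set(['expectedWidth', 'expectedHeight', 'layers']);
const DYNAMIC_SETTINGS = new Set(['targetBitRate', 'contentMode']);

// Split a settings dictionary into its static and dynamic parts. Keys that
// belong to neither set are reported instead of being silently dropped.
function splitSettings(settings) {
  const result = { static: {}, dynamic: {}, unknown: [] };
  for (const [key, value] of Object.entries(settings)) {
    if (STATIC_SETTINGS.has(key)) {
      result.static[key] = value;
    } else if (DYNAMIC_SETTINGS.has(key)) {
      result.dynamic[key] = value;
    } else {
      result.unknown.push(key);
    }
  }
  return result;
}

const parts = splitSettings({
  expectedWidth: 1280,
  targetBitRate: 80_000,
  vendorSpecificKnob: 1, // made-up key, used only to show the "unknown" path
});
console.log(parts);
```

Under this reading, only the `dynamic` part could later be resent with an input frame; the `static` part would be fixed at construction time.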

#### Configuration examples

Both encoder and decoder constructors take the codec name and the required parameters. Encoders additionally take a dictionary of codec settings.

```javascript
const encoder = new VideoEncoder({
  codec: 'VP9',
  profile: '1',
  settings: {
    targetBitRate: 80_000,
  },
});
```
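
Because the decoder takes the same codec name and parameters but no settings, the two init dictionaries can share a common base. The sketch below only builds the dictionaries and does not call the proposed constructors; the variable names are illustrative, and the shapes are an assumption based on `VideoCodecParameters`, `VideoDecoderInit`, and `VideoEncoderInit` in the draft IDL.

```javascript
// Parameters shared by encoder and decoder (codec name plus required parameters).
const codecParameters = { codec: 'VP9', profile: '1' };

// A decoder init carries only the parameters.
const decoderInit = { ...codecParameters };

// An encoder init carries the same parameters plus a settings dictionary.
const encoderInit = {
  ...codecParameters,
  settings: { targetBitRate: 80_000 },
};

// In a browser implementing the proposal these would presumably be passed as
// `new VideoDecoder(decoderInit)` and `new VideoEncoder(encoderInit)`.
console.log(decoderInit, encoderInit);
```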

Codec settings can be changed on the fly by bundling the changed settings with the next input image. The changed settings are applied before that image is encoded and remain in effect for subsequent images.

```javascript
const encoder = new VideoEncoder(...);
const writer = encoder.writable.getWriter();
writer.write({
  imageData: ...,
  timestamp: ...,
  changeCodecSettings: {
    targetBitRate: 50_000,
  },
});
```
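
The persistence rule described above can be modelled without an encoder at all. This is a minimal sketch under the assumption that changed settings are merged key by key into the current ones; `applyInput` is a hypothetical helper, not an API in the proposal.

```javascript
// Hypothetical model of the encoder's dynamic settings state.
// Returns the settings in effect for this input and for every later one.
function applyInput(currentSettings, input) {
  // Spreading an absent `changeCodecSettings` (undefined) adds no keys, so
  // inputs without changes leave the settings untouched.
  return { ...currentSettings, ...input.changeCodecSettings };
}

let settings = { targetBitRate: 80_000 };

// First input carries a change: it applies before this frame is encoded.
settings = applyInput(settings, { changeCodecSettings: { targetBitRate: 50_000 } });
const bitRateForChangedFrame = settings.targetBitRate;

// Second input carries no change: the earlier change is still in effect.
settings = applyInput(settings, {});
const bitRateForNextFrame = settings.targetBitRate;

console.log(bitRateForChangedFrame, bitRateForNextFrame); // prints "50000 50000"
```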

Frame settings are also bundled with the next input image. Unlike codec settings, they do not persist beyond the encoding of the image they accompany.

```javascript
const encoder = new VideoEncoder(...);
const writer = encoder.writable.getWriter();
writer.write({
  imageData: ...,
  timestamp: ...,
  frameSettings: {
    forceKeyFrame: true,
  },
});
```
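
The per-frame behavior can be modelled in the same spirit: frame settings are resolved from each input alone, with no state carried between inputs. `frameOptionsFor` is a hypothetical helper, and the default of `forceKeyFrame: false` is an assumption made for illustration rather than something the proposal states.

```javascript
// Hypothetical resolution of frame settings for a single input.
// Nothing is stored between calls, so settings cannot leak into later frames.
function frameOptionsFor(input) {
  // Assumed default: do not force a key frame unless the input asks for one.
  return { forceKeyFrame: false, ...input.frameSettings };
}

const firstFrame = frameOptionsFor({ frameSettings: { forceKeyFrame: true } });
const secondFrame = frameOptionsFor({});

console.log(firstFrame.forceKeyFrame, secondFrame.forceKeyFrame); // prints "true false"
```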

## Alternative designs considered

Media Source Extensions (MSE) is already used widely for low-latency streaming. However, there are some problems:
252 changes: 150 additions & 102 deletions webidl.txt
@@ -3,29 +3,63 @@
// TODO(when writing spec):
// - Specify that encoding and decoding must happen off the main thread.

[Constructor(MediaStreamTrack track)]
interface AudioTrackReader {
readonly attribute ReadableStream readable; // of DecodedAudioPacket
// Common definitions used for both audio and video.

[Constructor(unsigned long long value, unsigned long long scale)]
interface MediaTime {
readonly attribute unsigned long long value;
readonly attribute unsigned long long scale;
}

interface DecodedAudioPacket {
readonly attribute MediaTime timestamp;
// Sample count == duration.value
// Sample rate == duration.scale
readonly attribute MediaTime duration;
readonly attribute unsigned long channelCount

// Audio encoder and decoder interfaces.

dictionary AudioCodecParameters {
DOMString codec;

// Defaults are codec-specific
unsigned long? sampleRate;
unsigned long? channelCount;
}

[Constructor(AudioDecoderInit init)]
interface AudioDecoder {
readonly attribute WritableStream writable; // AudioDecoderInput
readonly attribute ReadableStream readable; // AudioDecoderOutput
}

[Constructor(AudioEncoderParams params)]
[Constructor(AudioEncoderInit init)]
interface AudioEncoder {
void setParameters(AudioEncoderParams params);
readonly attribute WritableStream writable; // DecodedAudioPacket
readonly attribute ReadableStream readable; // EncodedAudioPacket
readonly attribute WritableStream writable; // AudioEncoderInput
readonly attribute ReadableStream readable; // AudioEncoderOutput
}

dictionary AudioEncoderParams {
DOMString mimeType;
dictionary AudioDecoderInit : AudioCodecParameters {
// Optional byte data required to initialize audio decoders
// such as Vorbis codebooks.
BufferSource? extraData;
// Duration decoder must decode before the decoded data is valid
MediaTime? seekPreRoll;
// Duration decoder should discard before returning decoded data.
// Can include both decoder delay as well as padding added during
// encoding.
MediaTime? codecDelay;
}

dictionary AudioDecoderInput {
Uint8Array data;
MediaTime timestamp;
}

dictionary AudioDecoderOutput {
AudioBuffer buffer;
// TODO: decode stats.
}

dictionary AudioEncoderStaticSettings {
}

dictionary AudioEncoderDynamicSettings {
// not supported by all codecs
// null/unset means use the codec default
unsigned long? bitsPerSecond;
@@ -39,83 +73,74 @@ dictionary AudioEncoderParams {
bool dtx = false; // enabled or not
bool cbr = false; // cbr or not (vbr if not)
bool speechMode = false; // speech-specific mode or not
}
}

[Constructor(BufferSource data, MediaTime timestamp)]
interface EncodedAudioPacket {
readonly attribute MediaTime timestamp;
readonly attribute Uint8Array data;
dictionary AudioEncoderSettings : AudioEncoderStaticSettings, AudioEncoderDynamicSettings {
}

[Constructor(AudioDecoderParams params)]
interface AudioDecoder {
readonly attribute WritableStream writable; // EncodedAudioPacket
readonly attribute ReadableStream readable; // DecodedAudioPacket
attribute EventHandler onerror;
dictionary AudioEncoderInit : AudioCodecParameters {
AudioEncoderSettings? settings;
}

dictionary AudioDecoderParams {
DOMString codec; // For example, "opus"
dictionary AudioEncoderInput {
MediaTime timestamp;
ArrayBuffer buffer;
AudioEncoderDynamicSettings changeCodecSettings;
}

// Defaults are codec-specific
unsigned long? sampleRate;
unsigned long? channelCount;
dictionary AudioEncoderOutput {
Uint8Array data;
MediaTime timestamp;
// TODO: encode stats.
}

// Optional byte data required to initialize audio decoders
// such as Vorbis codebooks.
BufferSource? extraData;
// Duration decoder must decode before the decoded data is valid
MediaTime? seekPreRoll;
// Duration decoder should discard before returning decoded data.
// Can include both decoder delay as well as padding added during
// encoding.
MediaTime? codecDelay;
}

[Constructor()]
interface AudioTrackWriter {
readonly attribute WritableStream writable; // of DecodedAudioPacket
readonly attribute MediaStreamTrack track;
// Video encoder and decoder interfaces.

interface VideoFrame {
readonly attribute MediaTime timestamp;
readonly attribute ImageData imageData;
}

dictionary VideoCodecParameters {
DOMString codec;

[Constructor(MediaStreamTrack track)]
interface VideoTrackReader {
readonly attribute ReadableStream readable; // of DecodedVideoFrame
// For VP9:
DOMString? profile;
}

interface DecodedVideoFrame {
readonly attribute MediaTime timestamp;
readonly attribute ImageData imageData;
[Constructor(VideoDecoderInit init)]
interface VideoDecoder {
readonly attribute WritableStream writable; // VideoDecoderInput
readonly attribute ReadableStream readable; // VideoDecoderOutput
}

[Constructor(VideoEncoderParams params)]
[Constructor(VideoEncoderInit init)]
interface VideoEncoder {
void setParameters(VideoEncoderParams params);
void generateKeyFrame(optional sequence<DOMString> layerIds);
readonly attribute WritableStream writable; // DecodedVideoFrame
readonly attribute ReadableStream readable; // EncodedVideoFrame
attribute EventHandler onerror;
readonly attribute WritableStream writable; // VideoEncoderInput
readonly attribute ReadableStream readable; // VideoEncoderOutput
}

dictionary VideoEncoderParams {
// Cannot be changed once set
DOMString mimeType;
dictionary VideoDecoderInit : VideoCodecParameters {
// Optional byte data required to initialize video decoders
// such as H264 with SPS and PPS.
BufferSource? extraData;

// Can be used to initialize the encoder faster
// Can be used to initialize the decoder faster
// than waiting for the first frame
unsigned long? expectedWidth;
unsigned long? expectedHeight;

// unset/null means the encoder will pick
// target will be exceeded for key frames
unsigned long bitsPerSecond;

VideoEncodeContentMode contentMode;
unsigned long long? expectedWidth;
unsigned long long? expectedHeight;
}

sequence<VideoEncodeLayer> layers;
}
dictionary VideoDecoderInput {
Uint8Array data;
MediaTime timestamp;
}

dictionary VideoDecoderOutput {
VideoFrame frame;
// TODO: add decode stats.
}

enum VideoEncodeContentMode {
"screen" // For screen sharing/recording
@@ -148,51 +173,74 @@ dictionary VideoEncodeLayer {
unsigned long? bitsPerSecond;
}

[Constructor(BufferSource data, MediaTime timestamp)]
interface EncodedVideoFrame {
readonly attribute Uint8Array data;
readonly attribute MediaTime timestamp;
// Info provided as a result from the encoder
// Not needed as input to a decoder
readonly attribute VideoEncodeResult? encoded;
dictionary VideoEncoderStaticSettings {
// Can be used to initialize the encoder faster
// than waiting for the first frame
unsigned long? expectedWidth;
unsigned long? expectedHeight;

sequence<VideoEncodeLayer> layers;
}

dictionary VideoEncoderDynamicSettings {
// unset/null means the encoder will pick
// target will be exceeded for key frames
unsigned long long? targetBitRate;

VideoEncodeContentMode contentMode;
}

dictionary VideoEncoderSettings : VideoEncoderStaticSettings, VideoEncoderDynamicSettings {
}

dictionary VideoEncoderInit : VideoCodecParameters {
VideoEncoderSettings settings;
}

interface VideoEncodeResult {
dictionary VideoEncoderFrameSettings {
boolean? forceKeyFrame;
}

dictionary VideoEncoderInput {
VideoFrame frame;
VideoEncoderFrameSettings frameSettings;
VideoEncoderDynamicSettings changeCodecSettings;
}

dictionary VideoEncoderOutput {
Uint8Array data;
MediaTime timestamp;

// If using multiple layers, which layer is it?
readonly attribute DOMString? layerId;
DOMString? layerId;
// Whether or not it's a key frame meaning it depends on
// no other frames
readonly attribute bool keyFrame;
}
boolean keyFrame;

[Constructor(VideoDecoderParams params)]
interface VideoDecoder {
readonly attribute WritableStream writable; // EncodedVideoFrame
readonly attribute ReadableStream readable; // DecodedVideoFrame
attribute EventHandler onerror;
// TODO: per-frame encode stats.
}

dictionary VideoDecoderParams {
DOMString mimeType;

// Can be used to initialize the decoder faster
// than waiting for the first frame
unsigned long long? expectedWidth;
unsigned long long? expectedHeight;
// MediaStreamTrack integration.

// Optional byte data required to initialize video decoders
// such as H264 with SPS and PPS.
BufferSource? extraData;
}
[Constructor(MediaStreamTrack track)]
interface AudioTrackReader {
readonly attribute ReadableStream readable; // of DecodedAudioPacket
}

[Constructor()]
interface VideoTrackWriter {
readonly attribute WritableStream writable; // of DecodedVideoFrame
interface AudioTrackWriter {
readonly attribute WritableStream writable; // of DecodedAudioPacket
readonly attribute MediaStreamTrack track;
}

[Constructor(unsigned long long value, unsigned long long scale)]
interface MediaTime {
readonly attribute unsigned long long value;
readonly attribute unsigned long long scale;
[Constructor(MediaStreamTrack track)]
interface VideoTrackReader {
readonly attribute ReadableStream readable; // of VideoFrame
}

[Constructor()]
interface VideoTrackWriter {
readonly attribute WritableStream writable; // VideoFrame
readonly attribute MediaStreamTrack track;
}
