Tile encoding #1126

Merged
tdaede merged 65 commits into xiph:master from rom1v:tiling on Apr 20, 2019

6 participants
@rom1v
Collaborator

commented Mar 18, 2019

(description updated on 16 April 2019)

This PR implements tile encoding (#631).

  1. The first (many) commits introduce tiling structures, which make it possible to expose simultaneous tiled regions of the whole frame data.
  2. The following commits use the tiling structures where necessary throughout the codebase.
  3. Then, command-line arguments are added, the encoder encodes tiles separately, and the separate tiles are written to the bitstream.
  4. Finally, parallelization is enabled (spoiler: 1 line of code).

Context

Encoding a frame first involves frame-wise accesses (initialization, etc.), then tile-wise accesses (to encode tiles in parallel), then frame-wise accesses using the results of tile encoding (deblocking, CDEF, …):

                                \
      +----------------+         |
      |                |         |
      |                |         |  Frame-wise accesses
      |                |          >
      |                |         |   - FrameState<T>
      |                |         |   - Frame<T>
      +----------------+         |   - Plane<T>
                                /    - ...

              ||   tiling views
              \/
                                \
  +---+  +---+  +---+  +---+     |
  |   |  |   |  |   |  |   |     |  Tile encoding (possibly in parallel)
  +---+  +---+  +---+  +---+     |
                                 |
  +---+  +---+  +---+  +---+     |  Tile-wise accesses
  |   |  |   |  |   |  |   |      >
  +---+  +---+  +---+  +---+     |   - TileStateMut<'_, T>
                                 |   - TileMut<'_, T>
  +---+  +---+  +---+  +---+     |   - PlaneRegionMut<'_, T>
  |   |  |   |  |   |  |   |     |
  +---+  +---+  +---+  +---+     |
                                /

              ||   vanishing of tiling views
              \/
                                \
      +----------------+         |
      |                |         |
      |                |         |  Frame-wise accesses
      |                |          >
      |                |         |  (deblocking, CDEF, ...)
      |                |         |
      +----------------+         |
                                /

Tiling

As you know, in Rust, it is not sufficient to avoid reading/writing the same memory from several threads: it must be impossible to write (safe) code that could do it. More precisely, a mutable reference may not alias any other reference to the same memory.

That is why, as a preliminary step, I replaced accesses that used the whole plane as a raw slice plus stride information with PlaneSlice (#1035) and PlaneMutSlice (#1043).

But Plane(Mut)Slice still borrows the whole plane slice, so it does not, by itself, solve the problem.

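A minimal sketch of the aliasing problem, assuming a simplified, hypothetical `PlaneMutSlice` that only holds a `&mut [u8]` and a stride (not the rav1e type). Two views built from the whole buffer cannot coexist, while the standard `split_at_mut` yields two disjoint mutable borrows that the compiler accepts. Note that this only works for horizontal bands: a 2D tile is not a contiguous sub-slice, which is why dedicated region types are needed.

```rust
/// Simplified, hypothetical stand-in for a mutable plane view: it mutably
/// borrows the buffer it was built from.
pub struct PlaneMutSlice<'a> {
    pub data: &'a mut [u8],
    pub stride: usize,
}

impl<'a> PlaneMutSlice<'a> {
    /// Set every sample of the view to `value`, row by row.
    pub fn fill(&mut self, value: u8) {
        for row in self.data.chunks_mut(self.stride) {
            row.fill(value);
        }
    }
}

/// Fill the first `rows` rows of a plane with `top` and the remaining rows
/// with `bottom`, through two mutable views alive at the same time.
pub fn fill_halves(plane: &mut [u8], stride: usize, rows: usize, top: u8, bottom: u8) {
    // Building both views from `&mut plane[..]` would not compile (two live
    // `&mut` borrows of the same slice). `split_at_mut` returns two disjoint
    // mutable sub-slices, which the borrow checker accepts.
    let (a, b) = plane.split_at_mut(rows * stride);
    let mut top_view = PlaneMutSlice { data: a, stride };
    let mut bottom_view = PlaneMutSlice { data: b, stride };
    top_view.fill(top);
    bottom_view.fill(bottom);
}
```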
There are several structures to be tiled, which form a tree:

 +- FrameState → TileState
 |  +- Frame → Tile
 |  |  +- Plane → PlaneRegion
 |  +- RestorationState → TileRestorationState
 |  |  +- RestorationPlane → TileRestorationPlane
 |  |     +- FrameRestorationUnits → TileRestorationUnits
 |  +- FrameMotionVectors → TileMotionVectors
 +- FrameBlocks → TileBlocks

Most of them exist in both const and mutable versions (e.g. PlaneRegion and PlaneRegionMut).

Tiling structures

PlaneRegion

This is a view of a bounded region of a Plane. It is similar to PlaneSlice, except that it does not borrow the whole underlying raw slice. That way, it is possible to get several non-overlapping regions simultaneously.

In the end, we should probably merge it with PlaneSlice, but it requires more work because some frame-wise code still uses PlaneSlice in the code base.
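To show that non-overlapping 2D regions can coexist, here is a safe sketch under a stated simplification: each hypothetical `RegionMut` keeps one disjoint `&mut` row slice per row, obtained by cutting every plane row at tile boundaries. This is not how rav1e's PlaneRegionMut is implemented (it stores a single origin plus stride and rectangle instead of a vector of rows); it only illustrates that the regions never share a sample.

```rust
/// Mutable rectangular view, kept as one disjoint `&mut` slice per row.
/// Hypothetical safe sketch, not rav1e's PlaneRegionMut.
pub struct RegionMut<'a, T> {
    rows: Vec<&'a mut [T]>,
}

impl<'a, T> RegionMut<'a, T> {
    pub fn height(&self) -> usize {
        self.rows.len()
    }

    /// Row `y`, relative to the region (not to the plane).
    pub fn row_mut(&mut self, y: usize) -> &mut [T] {
        &mut self.rows[y][..]
    }
}

/// Split a `width`×`height` plane (row-major, `stride` samples per row) into
/// a grid of non-overlapping regions of at most `tile_w`×`tile_h` samples,
/// in raster order. Requires `data.len() >= (height - 1) * stride + width`.
pub fn split_into_regions<'a, T>(
    data: &'a mut [T],
    stride: usize,
    width: usize,
    height: usize,
    tile_w: usize,
    tile_h: usize,
) -> Vec<RegionMut<'a, T>> {
    let cols = (width + tile_w - 1) / tile_w;
    let rows = (height + tile_h - 1) / tile_h;
    let mut regions: Vec<RegionMut<'a, T>> =
        (0..cols * rows).map(|_| RegionMut { rows: Vec::new() }).collect();
    for (y, line) in data.chunks_mut(stride).take(height).enumerate() {
        let ty = y / tile_h;
        // Cut the row at each tile boundary; every piece goes to exactly one
        // region, so no sample is reachable from two regions.
        let mut rest = line;
        let mut x = 0;
        for tx in 0..cols {
            let w = tile_w.min(width - x);
            let (piece, tail) = std::mem::take(&mut rest).split_at_mut(w);
            regions[ty * cols + tx].rows.push(piece);
            rest = tail;
            x += w;
        }
        // Anything left in `rest` is padding beyond `width`; it is not exposed.
    }
    regions
}
```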

It is possible to retrieve a subregion of a region (which may not exceed its parent). In theory, a subregion is defined by a rectangle (for example: x, y, width, height), but in practice, we need more flexibility. For example, we often need to retrieve a region from an offset, using the same bottom-right corner as its parent without providing width and height.

For that purpose, I propose a specific Area structure (actually, a Rust enum) to describe subregion bounds. Here are some usage examples:

let region = plane.region(Area::Rect { x: 32, y: 32, width: 512, height: 512 });

// the area is relative to the parent region
let subregion = region.subregion(Area::StartingAt { x: 128, y: 128 });
// it is equivalent to
let subregion = region.subregion(Area::Rect { x: 128, y: 128, width: 384, height: 384 });
// or
let subregion = plane.region(Area::Rect { x: 160, y: 160, width: 384, height: 384 });

Retrieving a subregion from a BlockOffset is so common across the codebase that I decided to expose it directly:

let bo = BlockOffset { x: 2, y: 3 };
let subregion = region.subregion(Area::BlockStartingAt { bo });
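The Area variants above can be resolved to a concrete rectangle relative to the parent. The following is a minimal sketch, not the rav1e code: the `Rect` type, the `to_rect()` signature and the assumption that one block is 4×4 luma samples (`MI_SIZE_LOG2 = 2`) are illustrative choices. It reproduces the equivalence shown earlier (`StartingAt { x: 128, y: 128 }` in a 512×512 parent is a 384×384 rectangle).

```rust
/// Rectangle in samples, relative to the parent region (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rect {
    pub x: usize,
    pub y: usize,
    pub width: usize,
    pub height: usize,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BlockOffset {
    pub x: usize,
    pub y: usize,
}

/// Assumption: one block is 4×4 luma samples.
const MI_SIZE_LOG2: usize = 2;

#[derive(Debug, Clone, Copy)]
pub enum Area {
    Rect { x: usize, y: usize, width: usize, height: usize },
    StartingAt { x: usize, y: usize },
    BlockStartingAt { bo: BlockOffset },
}

impl Area {
    /// Resolve the area inside a parent of `parent_width`×`parent_height`
    /// samples, for a plane decimated by `xdec`/`ydec` (0 for luma).
    pub fn to_rect(&self, xdec: usize, ydec: usize, parent_width: usize, parent_height: usize) -> Rect {
        let (x, y) = match *self {
            Area::Rect { x, y, .. } | Area::StartingAt { x, y } => (x, y),
            // Block units to samples, then chroma decimation.
            Area::BlockStartingAt { bo } => ((bo.x << MI_SIZE_LOG2) >> xdec, (bo.y << MI_SIZE_LOG2) >> ydec),
        };
        assert!(x <= parent_width && y <= parent_height, "subregion exceeds its parent");
        let (width, height) = match *self {
            Area::Rect { width, height, .. } => (width, height),
            // Open-ended areas keep the parent's bottom-right corner.
            _ => (parent_width - x, parent_height - y),
        };
        assert!(x + width <= parent_width && y + height <= parent_height, "subregion exceeds its parent");
        Rect { x, y, width, height }
    }
}
```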

Like Plane(Mut)Slice, it provides operator[] and iterators over its rows:

let row5 = &region[5];
let value = region[3][4];
for row in region.rows_iter() {
    let _first_four_values = &row[..4];
}

The mutable versions of the structure (PlaneRegionMut) and methods are also provided.
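Indexing and row iteration over a region can be sketched as follows, assuming a hypothetical read-only `PlaneRegion` that stores the plane data, its stride and the region rectangle (the field layout is an assumption, not rav1e's). Each row is limited to the region width, so `region[r][c]` is relative to the region origin.

```rust
use std::ops::Index;

/// Read-only view of a `width`×`height` rectangle at (`x`, `y`) in a
/// row-major plane with `stride` samples per row (hypothetical sketch).
pub struct PlaneRegion<'a, T> {
    data: &'a [T],
    stride: usize,
    x: usize,
    y: usize,
    width: usize,
    height: usize,
}

impl<'a, T> PlaneRegion<'a, T> {
    pub fn new(data: &'a [T], stride: usize, x: usize, y: usize, width: usize, height: usize) -> Self {
        assert!(x + width <= stride, "region exceeds a plane row");
        assert!(height == 0 || (y + height - 1) * stride + x + width <= data.len(), "region exceeds the plane");
        PlaneRegion { data, stride, x, y, width, height }
    }

    /// Row `r` of the region, `width` samples long.
    fn row(&self, r: usize) -> &'a [T] {
        assert!(r < self.height, "row outside the region");
        let data: &'a [T] = self.data;
        let start = (self.y + r) * self.stride + self.x;
        &data[start..start + self.width]
    }

    /// Iterate over the rows of the region, top to bottom.
    pub fn rows_iter(&self) -> impl Iterator<Item = &'a [T]> + '_ {
        (0..self.height).map(move |r| self.row(r))
    }
}

impl<'a, T> Index<usize> for PlaneRegion<'a, T> {
    type Output = [T];
    fn index(&self, r: usize) -> &[T] {
        self.row(r)
    }
}
```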

Tile

A Tile is a view of 3 colocated plane regions (Tile is to a PlaneRegion as a Frame is to a Plane).

The mutable version (TileMut) is also provided.
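"Colocated" means that the three regions of a Tile cover the same picture area, with the chroma rectangles scaled by the chroma decimation. A small sketch, assuming a hypothetical `Rect` type and rounding chroma sizes up at odd frame edges (an assumption, not taken from rav1e):

```rust
/// Rectangle in samples (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rect {
    pub x: usize,
    pub y: usize,
    pub width: usize,
    pub height: usize,
}

/// Given the luma rectangle of a tile and the chroma decimation
/// (`xdec`, `ydec`: 1 and 1 for 4:2:0, 0 and 0 for 4:4:4), return the
/// rectangles of the three colocated plane regions (Y, U, V).
pub fn colocated_rects(luma: Rect, xdec: usize, ydec: usize) -> [Rect; 3] {
    let chroma = Rect {
        x: luma.x >> xdec,
        y: luma.y >> ydec,
        // Round up so that an odd-sized edge tile keeps its last chroma sample.
        width: (luma.width + (1 << xdec) - 1) >> xdec,
        height: (luma.height + (1 << ydec) - 1) >> ydec,
    };
    [luma, chroma, chroma]
}
```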

TileState

The way the FrameState fields are mapped in TileState depends on how they are accessed tile-wise and frame-wise.

Some fields (like qc) are only used during tile-encoding, so they are only stored in TileState.

Some other fields (like input or segmentation) are not written tile-wise, so they just reference the matching field in FrameState.

Some others (like rec) are written tile-wise, but must be accessible frame-wise once the tile views vanish (e.g. for deblocking).

TileState also contains 2 tiled views: TileRestorationState and a vector of TileMotionVectorsMut (a tiled view of FrameMotionVectors).

This structure is only provided as mutable (TileStateMut). A const version is not necessary, and would require instantiating a const version of all its embedded tiled views.

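The three kinds of fields described above can be sketched with a heavily simplified, hypothetical `FrameState` (plain byte buffers, tiles as horizontal bands). Tile-only data is owned, read-only frame data is shared via `&`, and data written tile-wise is an exclusive `&mut` view of the tile's part; the tiled restoration and motion-vector views are omitted.

```rust
/// Hypothetical, heavily simplified frame state.
pub struct FrameState {
    pub input: Vec<u8>, // never written tile-wise
    pub rec: Vec<u8>,   // written tile-wise, read frame-wise afterwards
    pub width: usize,   // samples per row
}

/// Hypothetical tile state showing the three kinds of fields.
pub struct TileStateMut<'a> {
    /// Tile-only data: owned, dropped when the tile is done (like `qc`).
    pub qc: u32,
    /// Frame-level data not written tile-wise: a shared reference.
    pub input: &'a [u8],
    /// Frame-level data written tile-wise: an exclusive view of this tile.
    pub rec: &'a mut [u8],
}

/// Split the frame into tiles of `tile_rows` full rows each.
pub fn tile_states(fs: &mut FrameState, tile_rows: usize) -> Vec<TileStateMut<'_>> {
    let band = tile_rows * fs.width;
    // Disjoint field borrows: `input` shared, `rec` split into `&mut` bands.
    let input: &[u8] = &fs.input;
    fs.rec
        .chunks_mut(band)
        .map(move |rec| TileStateMut { qc: 0, input, rec })
        .collect()
}
```

Once the tile states are dropped, `fs.rec` is accessible frame-wise again (e.g. for deblocking).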
TileBlocks

TileBlocks is a tiled view of FrameBlocks. It exposes the blocks associated with the tile.

The mutable version (TileBlocksMut) is also provided.
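A read-only tiled view over frame blocks can be sketched as follows. `Block`, the row-major `FrameBlocks` layout and the constructor are hypothetical; the point is that `tile_blocks[boy][box]` is relative to the tile while the storage stays frame-wide.

```rust
use std::ops::Index;

/// Hypothetical placeholder for per-block data.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Block {
    pub mode: u8,
}

/// Frame-level blocks, row-major, `cols` blocks per row (hypothetical).
pub struct FrameBlocks {
    pub blocks: Vec<Block>,
    pub cols: usize,
    pub rows: usize,
}

/// Read-only tiled view: `tile_blocks[boy][box]` is relative to the tile.
pub struct TileBlocks<'a> {
    data: &'a [Block],
    frame_cols: usize,
    x: usize,
    y: usize,
    cols: usize,
    rows: usize,
}

impl<'a> TileBlocks<'a> {
    /// View of `cols`×`rows` blocks starting at block (`x`, `y`) of the frame.
    pub fn new(fb: &'a FrameBlocks, x: usize, y: usize, cols: usize, rows: usize) -> Self {
        assert!(x + cols <= fb.cols && y + rows <= fb.rows, "tile exceeds the frame");
        TileBlocks { data: &fb.blocks, frame_cols: fb.cols, x, y, cols, rows }
    }
}

impl<'a> Index<usize> for TileBlocks<'a> {
    type Output = [Block];
    fn index(&self, boy: usize) -> &[Block] {
        assert!(boy < self.rows, "row outside the tile");
        let start = (self.y + boy) * self.frame_cols + self.x;
        &self.data[start..start + self.cols]
    }
}
```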

Splitting into tiles

A TilingInfo structure computes all the details about tiling from the frame width and height and the (log2 of the) number of tile columns and rows. The details are accessible for initializing data or writing into the bitstream.

It provides an iterator over tiles (yielding one TileStateMut and one TileBlocksMut for each tile).
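The tiling computation can be sketched from the uniform tile spacing rules of the AV1 spec: each tile spans ceil(sb_count / 2^log2) superblocks, so the actual number of tiles may be smaller than requested. This is a hypothetical sketch, not rav1e's TilingInfo: it assumes 64×64 superblocks and omits the clamping of the log2 values to the spec limits.

```rust
/// Assumption: 64×64 superblocks.
pub const SB_SIZE_LOG2: usize = 6;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TilingInfo {
    pub frame_width: usize,
    pub frame_height: usize,
    pub tile_width_sb: usize,
    pub tile_height_sb: usize,
    pub cols: usize,
    pub rows: usize,
}

impl TilingInfo {
    pub fn new(frame_width: usize, frame_height: usize, tile_cols_log2: usize, tile_rows_log2: usize) -> Self {
        assert!(frame_width > 0 && frame_height > 0);
        let sb_size = 1 << SB_SIZE_LOG2;
        let sb_cols = (frame_width + sb_size - 1) >> SB_SIZE_LOG2;
        let sb_rows = (frame_height + sb_size - 1) >> SB_SIZE_LOG2;
        // Uniform spacing: each tile spans ceil(sb_count / 2^log2) superblocks.
        let tile_width_sb = (sb_cols + (1 << tile_cols_log2) - 1) >> tile_cols_log2;
        let tile_height_sb = (sb_rows + (1 << tile_rows_log2) - 1) >> tile_rows_log2;
        // The actual count may be smaller than 2^log2 (e.g. small frames).
        let cols = (sb_cols + tile_width_sb - 1) / tile_width_sb;
        let rows = (sb_rows + tile_height_sb - 1) / tile_height_sb;
        TilingInfo { frame_width, frame_height, tile_width_sb, tile_height_sb, cols, rows }
    }

    /// Luma rectangles (x, y, width, height) of the tiles in raster order,
    /// clipped to the frame.
    pub fn tile_rects(&self) -> Vec<(usize, usize, usize, usize)> {
        let tw = self.tile_width_sb << SB_SIZE_LOG2;
        let th = self.tile_height_sb << SB_SIZE_LOG2;
        let mut rects = Vec::with_capacity(self.cols * self.rows);
        for ty in 0..self.rows {
            for tx in 0..self.cols {
                let (x, y) = (tx * tw, ty * th);
                rects.push((x, y, tw.min(self.frame_width - x), th.min(self.frame_height - y)));
            }
        }
        rects
    }
}
```

For example, with these assumptions a 100×100 frame requested with 4 tile columns only gets 2, since it is only 2 superblocks wide.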

Frame offsets vs tile offsets

In encode_tile(), super-block, block and plane offsets are expressed relative to the tile. The tiling views expose their data relative to the tile:

  • plane_region[y][x] is pixel (x, y) relative to the plane region,
  • tile_blocks[boy][box] contains the Block at (box, boy) relative to the tile.

TileStateMut exposes some references to frame-level data stored in FrameState:

  • input is a reference to the whole frame,
  • input_hres and input_qres are references to the whole planes.

When accessing these frame-level data, tile offsets are converted to frame offsets, for example by:

let frame_bo = ts.to_frame_block_offset(bo);
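The conversion amounts to adding the tile origin. A minimal sketch, assuming a hypothetical `TileCtx` that only stores the tile origin in blocks, and 4×4 luma samples per block:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BlockOffset {
    pub x: usize,
    pub y: usize,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PlaneOffset {
    pub x: usize,
    pub y: usize,
}

/// Assumption: one block is 4×4 luma samples.
const MI_SIZE_LOG2: usize = 2;

/// Hypothetical tile context: only the tile origin in the frame, in blocks.
pub struct TileCtx {
    pub tile_bo: BlockOffset,
}

impl TileCtx {
    /// Tile-relative block offset to frame-relative block offset.
    pub fn to_frame_block_offset(&self, bo: BlockOffset) -> BlockOffset {
        BlockOffset { x: self.tile_bo.x + bo.x, y: self.tile_bo.y + bo.y }
    }

    /// Tile-relative block offset to frame-relative luma sample offset,
    /// e.g. to read the whole-frame `input`.
    pub fn to_frame_luma_offset(&self, bo: BlockOffset) -> PlaneOffset {
        let f = self.to_frame_block_offset(bo);
        PlaneOffset { x: f.x << MI_SIZE_LOG2, y: f.y << MI_SIZE_LOG2 }
    }
}
```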

Current state

It works.

Need more tests and reviews.

Usage

Pass the requested log2 number of tiles, with --tile-cols-log2 and --tile-rows-log2. For example, to request 2x2 tiles:

rav1e video.y4m -o video.ivf --tile-cols-log2 1 --tile-rows-log2 1

Currently, the number of tiles is passed in log2 (like in libaom, even though the aomenc options are called --tile-columns and --tile-rows), to avoid any confusion. A more user-friendly option could be found later.

Note that the actual number of tiles may be smaller (e.g. if the image size has fewer super-blocks).

@rom1v rom1v force-pushed the rom1v:tiling branch 2 times, most recently from 294daf7 to ae0a873 Mar 18, 2019

@rom1v rom1v referenced this pull request Mar 18, 2019

Closed

[WIP] Tiling #821

@coveralls


commented Mar 18, 2019

Coverage Status

Coverage increased (+6.8%) to 82.605% when pulling b52795e on rom1v:tiling into b9fef7c on xiph:master.

rom1v added a commit to rom1v/rav1e that referenced this pull request Mar 19, 2019

Add struct FrameMotionVectors
The motion vectors were stored as a Vec<Vec<MotionVector>>.

The innermost Vec contains a flattened matrix (fi.w_in_b x fi.h_in_b) of
MotionVectors, and there are REF_FRAMES instances of them (the outermost
Vec).

Introduce a typed structure to replace the innermost Vec:
 - this improves readability;
 - this makes it possible to expose it as a 2D array, thanks to the Index
   and IndexMut traits;
 - this will make it possible to split it into (non-overlapping) tiled views,
   containing only the motion vectors for a bounded region of the plane
   (see <xiph#1126>).
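A sketch of such a typed structure, assuming a hypothetical `MotionVector` with `row`/`col` fields and a boxed slice as storage (not necessarily the committed code). `Index`/`IndexMut` return a whole row, so `mvs[y][x]` works like a 2D array:

```rust
use std::ops::{Index, IndexMut};

#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct MotionVector {
    pub row: i16,
    pub col: i16,
}

/// Flat `cols`×`rows` matrix of motion vectors, indexable as `mvs[y][x]`.
pub struct FrameMotionVectors {
    mvs: Box<[MotionVector]>,
    pub cols: usize,
    pub rows: usize,
}

impl FrameMotionVectors {
    pub fn new(cols: usize, rows: usize) -> Self {
        let mvs = vec![MotionVector::default(); cols * rows].into_boxed_slice();
        FrameMotionVectors { mvs, cols, rows }
    }
}

impl Index<usize> for FrameMotionVectors {
    type Output = [MotionVector];
    /// Row `y` of the matrix.
    fn index(&self, y: usize) -> &[MotionVector] {
        &self.mvs[y * self.cols..(y + 1) * self.cols]
    }
}

impl IndexMut<usize> for FrameMotionVectors {
    fn index_mut(&mut self, y: usize) -> &mut [MotionVector] {
        &mut self.mvs[y * self.cols..(y + 1) * self.cols]
    }
}
```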

lu-zero added a commit that referenced this pull request Mar 19, 2019

Add struct FrameMotionVectors

@rom1v rom1v force-pushed the rom1v:tiling branch 2 times, most recently from 61d2d74 to 274478b Mar 19, 2019

@rom1v

Collaborator Author

commented Mar 19, 2019

I added #[derive(Copy)] to BlockOffset, and used BlockOffset by value in the Area enum. Using a reference would add a lifetime parameter (Area<'_>), which would be overkill here.
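To illustrate the lifetime point, here is a sketch with hypothetical single-variant enums: storing `&BlockOffset` forces a lifetime parameter on the enum (and on everything that stores or returns it), whereas a `Copy` offset is simply stored by value.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BlockOffset {
    pub x: usize,
    pub y: usize,
}

/// Storing a reference forces a lifetime parameter onto the enum.
#[derive(Debug)]
pub enum AreaByRef<'a> {
    BlockStartingAt { bo: &'a BlockOffset },
}

/// With `BlockOffset: Copy`, the offset is stored by value: no lifetime.
#[derive(Debug)]
pub enum Area {
    BlockStartingAt { bo: BlockOffset },
}

/// Builds an area from a local offset. Returning an `AreaByRef` here would
/// not compile, since it would borrow the local `bo`.
pub fn block_area(x: usize, y: usize) -> Area {
    let bo = BlockOffset { x, y };
    Area::BlockStartingAt { bo }
}
```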

But since I added Copy, clippy warns about BlockOffset everywhere:

error: this argument is passed by reference, but would be more efficient if passed by value
    --> src/partition.rs:1234:19
     |
1234 | pub fn has_tr(bo: &BlockOffset, bsize: BlockSize) -> bool {
     |                   ^^^^^^^^^^^^ help: consider passing by value instead: `BlockOffset`
     |
     = note: `-D clippy::trivially-copy-pass-by-ref` implied by `-D warnings`
     = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#trivially_copy_pass_by_ref

I think we should replace all &BlockOffset function parameters with BlockOffset. Do you agree that I should do that on master?

rom1v added a commit to rom1v/rav1e that referenced this pull request Mar 20, 2019

Make BlockOffset derive Copy
BlockOffset has a size of 128 bits (the same as a slice), and is
trivially copyable, so make it derive Copy.

Once it derives Copy, clippy suggests never passing it by reference:
<https://rust-lang.github.io/rust-clippy/master/index.html#trivially_copy_pass_by_ref>

So pass it by value everywhere to simplify usage.

In particular, this avoids lifetime bounds where not necessary (e.g.
in get_sub_partitions()).

See <xiph#1126 (comment)>.
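The size claim and the by-value style can be checked with a small sketch. It assumes `BlockOffset` holds two `usize` fields (consistent with "128 bits, the same as a slice" on a 64-bit target); `offset_by` is a hypothetical helper, not a rav1e function.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BlockOffset {
    pub x: usize,
    pub y: usize,
}

/// By-value parameter, as suggested by clippy's trivially_copy_pass_by_ref:
/// no reference, hence no lifetime to propagate to the result.
pub fn offset_by(bo: BlockOffset, dx: usize, dy: usize) -> BlockOffset {
    BlockOffset { x: bo.x + dx, y: bo.y + dy }
}
```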

lu-zero added a commit that referenced this pull request Mar 20, 2019

Make BlockOffset derive Copy

aditj added a commit to aditj/rav1e that referenced this pull request Mar 21, 2019

Build in debug mode instead of release
Switched the parameters of the get_sad() function
ps://rust-lang.github.io/rust-clippy/master/index.html#neg_multiply

Use a procedural macro instead of a local copy of arg_enum

Make cargo doc happier and make crav1e not depend on compiler
unstable feature.

Remove mutability from input plane for CDEF functions

Restore CDEF-disabled paths for RDO

Add speed setting for CDEF

Teach --speed-test=baseline to set SpeedSettings::default()

Many of the settings change nothing at the default speed.

Use cargo fetch to generate the Cargo.lock needed by kcov

Split long running tests and add a feature flag to avoid high dimension

Parallel build for dependencies on Travis

Also, check out the v1.0.0-errata1 tag of libaom.

Build test separately before running kcov

This reduces the odds of a Travis timeout.

Rewrite the high_bit_depth and chroma_sampling tests

Apparently cbindgen has problems parsing them in the former rendition.

Add width and height as parsable parameters

Use .iter() over the plane data

Preliminary step towards using a different backing storage.

Drop macro_use for interpolate_name

Use a Box<[T]> as storage for Plane

Add PlaneData

It acts as an aligned memory wrapper.

Fixes xiph#1101

Derive Layout on demand in PlaneData

Align PlaneData to 32 bytes on Windows

Re-enable building assembly files on Windows.

This doesn't actually call the assembly functions yet.

Unbreak Context::container_sequence_header()

Remove fake-genericity from sad functions
The functions sad_ssse3() and sad_sse2() only support u16 and u8
respectively, so they are not generic.

Make the caller pass the expected type.

Rename sad_ssse3() to sad_hbd_ssse3()

<xiph#1092 (comment)>

Suggested-by: David Michael Barr <b@rr-dav.id.au>

Add a copy_from_raw_u8 test

Use sccache in CI scripts (xiph#1110)

* Extract archives in parallel with download
* Fetch sccache binary release
* Use sccache for C and C++ dependencies
* Limit sccache size to 500M
* Use CI generic cache to store compiler cache

api: Drop parse() function from Config

This function is not needed in Rust and it is mostly a convenience for
other languages. Instead, move this chunk into the appropriate bindings.

Fix get_sad() tests and benches

The function get_sad() was called with block width and block height
parameters swapped.

As a consequence, in tests, associate the precomputed SAD values to the
transposed block size.

Call assembly functions on Windows.

Retrieve dimensions from plane_cfg

To compute the number of pixels available in the top-right and
bottom-left edges, get_intra_edges() received frame_w_in_b (MiCols) and
frame_h_in_b (MiRows) as parameters, initialized as follows:

    MiCols = 2 * ( ( FrameWidth + 7 ) >> 3 )
    MiRows = 2 * ( ( FrameHeight + 7 ) >> 3 )

<https://aomediacodec.github.io/av1-spec/#compute-image-size-function>

The sizes computed by get_intra_edges() were basically the frame
dimensions rounded up to the next multiple of 8, decimated:

    (MI_SIZE >> plane_cfg.xdec) * frame_w_in_b
    (MI_SIZE >> plane_cfg.ydec) * frame_h_in_b

But in Frame::new(), the luma plane dimensions are also initialized with
the frame dimensions rounded up to the next multiple of 8. Therefore, it
is equivalent to directly use the plane dimensions.
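The equivalence can be checked exhaustively for realistic widths. This sketch assumes MI_SIZE = 4 and that a decimated plane width is the 8-aligned luma width shifted right by `xdec` (as the commit message describes); `width_from_mi_cols` and `plane_width` are hypothetical helpers. Since the 8-aligned width is even, `(4 >> xdec) * 2 * k` and `(8 * k) >> xdec` agree for `xdec` in {0, 1}.

```rust
/// Mode-info size in luma samples.
pub const MI_SIZE: usize = 4;

/// AV1 spec: MiCols = 2 * ((FrameWidth + 7) >> 3).
pub fn mi_cols(frame_width: usize) -> usize {
    2 * ((frame_width + 7) >> 3)
}

/// Width as formerly computed in get_intra_edges() from MiCols.
pub fn width_from_mi_cols(frame_width: usize, xdec: usize) -> usize {
    (MI_SIZE >> xdec) * mi_cols(frame_width)
}

/// Plane width (assumption): luma rounded up to a multiple of 8, decimated.
pub fn plane_width(frame_width: usize, xdec: usize) -> usize {
    ((frame_width + 7) & !7) >> xdec
}
```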

Avoid superfluous memset in forward transforms

Avoid superfluous memset in write_coeffs_lv_map

Move motion_estimation to a trait

And keep the actual code as the default trait implementation

Move full pixel me into a separate function

Move the specific full_pixel_me impl where they belong

Move the specific sub_pixel_me impl where they belong

Disable prep_8tap assembly.

Temporarily fixes xiph#1115.

Cast before left shift in native prep_8tap

Enable prep_8tap assembly

Enable the Clippy's manual_memcpy lint (xiph#1122)

https://rust-lang.github.io/rust-clippy/master/index.html#manual_memcpy

Inline often called and almost-trivial functions (xiph#1124)

* Inline constrain and msb for cdef_filter_block
  This reduces its average time by around 42%.
* Inline round_shift for pred_directional and others
  This reduces its average time by around 10%.
* Inline sgrproj_sum_finish to its various callers
  It is at the lowest level of a hot call graph and almost trivial.
* Inline get_mv_rate in motion estimation
  It is almost trivial and called often.

Enable the Clippy's if_same_then_else lint

https://rust-lang.github.io/rust-clippy/master/index.html#if_same_then_else

Add struct FrameMotionVectors

Enable the Clippy's len_zero lint (xiph#1128)

https://rust-lang.github.io/rust-clippy/master/index.html#len_zero

diamond_me: save only selected frame motion vectors

Save them by reference frame type instead of picture slot.
Do not add the zero motion vector to the predictor list several times.

Use diamond search for the half resolution motion estimation

estimate_motion_ss2: include it in the MotionEstimation trait

Make BlockOffset derive Copy

Make SuperBlockOffset derive Copy

Like previous commit did for BlockOffset.

Make PlaneOffset derive Copy

Like previous commits did for BlockOffset and SuperBlockOffset.

Set timeout for cargo kcov to 20 minutes.

Do not pass both BlockOffset and PlaneOffset

In motion estimation, several functions received both the offset
expressed in blocks and in pixels for the luma plane. This information
is redundant: a block offset is trivially convertible to a luma plane
offset.

With tiling, we need to manage both absolute offsets (relative to the
frame) and offsets relative to the current tile. This will be simpler
without duplication.

@rom1v rom1v force-pushed the rom1v:tiling branch 3 times, most recently from f272325 to 320377f Mar 22, 2019

@rom1v

Collaborator Author

commented Apr 4, 2019

[Image: tiling]

At least, we can see the tiles :trollface:

EDIT: another one with wrong chroma:
[Image: tiling2]

and with wrong luma:
[Image: tiling3]

@rom1v

Collaborator Author

commented Apr 5, 2019

Here is my first working tiled (2×2) video encoded with rav1e:
bbb_tiled3.mp4 7919c917bd8769ef29ef108553deec55caee88becf1cb65b8acb1a649dd89ef6
bbb_tiled3.ivf 73fba3cbd3beb4b811f05a8d2daf5ba5169a29299a2d1d44d60ad8b5ad922116

From this version: https://github.com/rom1v/rav1e/commits/tiling.100

ffmpeg -i big_buck_bunny_720p.mp4 -ss 10 -t 10 bbb.y4m
target/release/rav1e bbb.y4m --tile-rows-log2 1 --tile-cols-log2 1 -o bbb_tiled3.ivf
ffmpeg -i bbb_tiled3.ivf -c:v copy bbb_tiled3.mp4

@rom1v rom1v force-pushed the rom1v:tiling branch from 320377f to 16b1aef Apr 5, 2019

@rom1v rom1v changed the title [WiP] Tiling structures [WiP] Tiling Apr 5, 2019

@rom1v rom1v force-pushed the rom1v:tiling branch 2 times, most recently from 5f85599 to 59f09d0 Apr 5, 2019

@rom1v

Collaborator Author

commented Apr 6, 2019

1 tile vs 4 tiles: https://beta.arewecompressedyet.com/?job=%402019-04-05T21%3A29%3A59.430Z_ref_1_tile&job=%402019-04-05T21%3A28%3A52.445Z_4_tiles

Of course, there is a cost in quality (SSIM, PSNR…) because tiling loses the possibility to exploit some redundancy across tiles in the same frame. For now, the current version only saves the CDFs from tile 0 (it should choose the biggest tile in bytes instead), and always stores tile sizes on 4 bytes. It can (and will) be improved.

The encoding time is worse with tiling on AWCY, probably because it uses only 1 core per instance.

@rom1v

Collaborator Author

commented Apr 6, 2019

Unfortunately (but as expected), the tiling structures are not a zero-cost abstraction. They add an encoding-time overhead of about 1 to 3%.

Concretely, if we compare the version compiled from the master branch with the version compiled from this pull request, and encode a video without tiling (1 tile), the latter takes a bit more time to encode. In other words, we pay a (small) cost for tile encoding even if we don't use it.

As an example, on my laptop, an encoding takes 3 min 33.245 s on master and 3 min 35.722 s on tiling (it's just one example, but the difference is quite stable).

You can compare encoding times on AWCY for 1 tile: https://beta.arewecompressedyet.com/?job=master-70005e353aa8ce21e3ecd257c927f71d4012a117&job=%402019-04-05T21%3A29%3A59.430Z_ref_1_tile

Maybe some work could be done to minimize this overhead (for example using more #[inline(always)]), but I think the overhead cannot be removed (within a memory-safe language).

EDIT: now there is no overhead (even a negative overhead with more inlines than on master) 🎉

@rom1v rom1v force-pushed the rom1v:tiling branch from 59f09d0 to 75ec5cd Apr 6, 2019

@tdaede

tdaede approved these changes Apr 18, 2019

@ycho

ycho approved these changes Apr 19, 2019

Collaborator

left a comment

I think it is ready to land now.
Thank you very much for the great work!

As I observed, the BD-rate changes (%) caused by this PR are:
At speed 0, low_latency=true (since low_latency=false does not seem to work correctly)

1 tile -> 2x2=4 tiles:
AWCY link

PSNR    PSNR Cb  PSNR Cr  PSNR HVS  SSIM    MS SSIM  CIEDE 2000
2.8320  1.7719   2.2092   2.7641    2.8205  2.7920   2.4722

1 tile -> 4x2 (col x row) tiles:
AWCY link

PSNR    PSNR Cb  PSNR Cr  PSNR HVS  SSIM    MS SSIM  CIEDE 2000
5.2285  3.8110   4.2109   5.0781    5.2115  5.1621   4.7286
@rom1v

Collaborator Author

commented Apr 19, 2019

(Since low_latency=false does not seem to work correctly)

Actually, in absolute terms, --low_latency=false gives better objective quality results. But in relative terms, tiling causes less loss with --low_latency=true: AWCY.

I don't know why.

rom1v and others added some commits Apr 18, 2019

Remove unused members from RestorationPlane
They have been implemented in TileRestorationPlane instead.
Add command line arguments for tiling
Add --tile-cols-log2 and --tile-rows-log2 to configure tiling.

This configuration is made available in FrameInvariants.
Add tiling info to FrameInvariants
Compute the tiling information and make it accessible from
FrameInvariants.
Encode tiles provided by iterator
Encode the tiles from each tile context provided by the TilingInfo tile
iterator.
Add write_le() to BitWriter
To write the bitstream, a big-endian BitWriter is used. However, some
values need to be written in little-endian (le(n) in the AV1 spec).

A method write_uleb128() was already present. Add a new one to write
little-endian values: write_le(bytes, value).
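The little-endian write itself is simple: least significant byte first. This sketch uses a hypothetical byte-oriented `ByteWriter` rather than rav1e's bit-level BitWriter (le(n) fields are byte-aligned in the AV1 spec, so a byte writer is enough to show the byte order):

```rust
/// Hypothetical byte-oriented writer (rav1e's BitWriter writes bits).
pub struct ByteWriter {
    pub buf: Vec<u8>,
}

impl ByteWriter {
    /// Write `value` on `bytes` bytes, least significant byte first,
    /// like le(8 * bytes) in the AV1 spec.
    pub fn write_le(&mut self, bytes: u32, value: u64) {
        assert!(bytes <= 8 && (bytes == 8 || value >> (8 * bytes) == 0), "value does not fit");
        for i in 0..bytes {
            self.buf.push((value >> (8 * i)) as u8);
        }
    }
}
```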
Write tile info to bitstream
Correctly write the bitstream if there are several tiles:
<https://aomediacodec.github.io/av1-spec/#tile-info-syntax>
Prepare iterator for Rayon
Collect the context and CDFs in an intermediate vector, so that it can
be iterated in parallel with Rayon.
Parallelize tile encoding
Use par_iter_mut() from Rayon to call encode_tile() for each tile
context in parallel.
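The commit uses Rayon's par_iter_mut(). To keep the following sketch free of external crates, it swaps in `std::thread::scope` (Rust 1.63+), spawning one thread per tile instead of using a thread pool. `TileCtx` and `encode_tile` are hypothetical stand-ins; the key property is the same: each worker receives an exclusive `&mut` tile context, so no synchronization is needed.

```rust
use std::thread;

/// Hypothetical per-tile context: shared input samples, owned output.
pub struct TileCtx<'a> {
    pub input: &'a [u8],
    pub output: Vec<u8>,
}

/// Stand-in for encode_tile(): "encodes" by adding 1 to every sample.
fn encode_tile(ctx: &mut TileCtx<'_>) {
    ctx.output = ctx.input.iter().map(|v| v.wrapping_add(1)).collect();
}

/// Encode all tiles in parallel, one scoped thread per tile. Scoped threads
/// may borrow `ctxs` because they are all joined before `scope` returns.
pub fn encode_tiles_parallel(ctxs: &mut [TileCtx<'_>]) {
    thread::scope(|s| {
        for ctx in ctxs.iter_mut() {
            s.spawn(move || encode_tile(ctx));
        }
    });
}
```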
Merge RDO trackers from tiles into FrameState
Tile RDO trackers results need to be aggregated at frame level.
Use the biggest tile for CDF update
Use the tile that takes the largest number of bytes for CDF update. It
should be better for entropy coding.
Optimize tile size bytes in bitstream
The tile size may be encoded using 1, 2, 3 or 4 bytes. For simplicity,
it always used 4 bytes.

Instead, use the number of bytes required by the biggest tile.
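Both policies above can be sketched as small helpers (hypothetical names, not rav1e's). In the AV1 tile group syntax, the coded value is tile_size_minus_1, written with le(TileSizeBytes), so the byte count is derived from the largest `size - 1`. The size of the last tile of a tile group is not coded, so passing every tile size, as in the usage below, is merely conservative.

```rust
/// Index of the tile with the most encoded bytes (first one on ties),
/// whose CDFs would be kept for the next frame.
pub fn biggest_tile(tile_sizes: &[usize]) -> Option<usize> {
    let mut best: Option<usize> = None;
    for (i, &size) in tile_sizes.iter().enumerate() {
        // Strict `>` keeps the first tile on ties.
        if best.map_or(true, |b| size > tile_sizes[b]) {
            best = Some(i);
        }
    }
    best
}

/// Number of bytes (1 to 4) needed to store every `tile_size - 1` value.
pub fn tile_size_bytes(tile_sizes: &[usize]) -> usize {
    let max = tile_sizes.iter().copied().max().unwrap_or(1);
    let value = max.saturating_sub(1);
    let mut bytes = 1;
    while bytes < 4 && value >> (8 * bytes) != 0 {
        bytes += 1;
    }
    bytes
}
```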
Fix crash when enable_cdef is false
The region may be smaller than the lrf_input plane. In that case,
&rec[..width] panic!ed.
Remove outdated comment
The offsets are relative to the tile, so find_valid_row_offs() behavior
does not change with tiling.
Expose frame blocks size in TileBlocks
We will need the blocks size at frame-level to clamp motion vectors.
Clamp motion vectors at frame level
This fixes bitstream corruption!

Lost hours here: many.
Add tiling parameters to decode_tests
This will allow to add tile encoding tests.
Add tile encoding test
Add a decode_test with a size such that it uses stretched restoration units.

See <#631 (comment)>.
Reduce overhead of tiling abstraction
The tail call confuses the compiler, preventing inlining.
Inline all TileBlocks methods
The method set_block_size() has been declared inline after profiling.
Also inline the other setters.

@rom1v rom1v force-pushed the rom1v:tiling branch from a90e9ca to b52795e Apr 19, 2019

@ycho

Collaborator

commented Apr 19, 2019

Actually, in absolute terms, --low_latency=false gives better objective quality results.

Sure, far better (> 10%), that is why we wanted to have pyramid and/or frame-reordering.

@tdaede tdaede merged commit d7599d6 into xiph:master Apr 20, 2019

3 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed
coverage/coveralls Coverage increased (+6.8%) to 82.605%

@rom1v rom1v referenced this pull request Apr 24, 2019

Closed

Add tiles support #631

0 of 6 tasks complete
@rom1v

Collaborator Author

commented Apr 25, 2019

I published a blog post about this feature: https://blog.rom1v.com/2019/04/implementing-tile-encoding-in-rav1e/
