Add tiles support #631
Currently,
To support tiling, we must not be able to borrow memory outside the relevant region of the
To avoid this problem, I suggest to create a
We could then make
As a consequence, it will not be possible to provide a slice covering the whole surface (multiple rows) and let the user use the stride information anymore, since this would borrow data possibly belonging to other regions (tiles). Therefore, all code using
To be able to use a cursor (
After that, we can expose |
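To illustrate the borrowing constraint described above, here is a minimal sketch of splitting a plane into non-overlapping mutable regions one row at a time. It is not the design proposed in this issue: `TileRowsMut` and `split_into_column_tiles` are hypothetical names, and only a single row of tiles is handled, for brevity.

```rust
/// Hypothetical, simplified stand-in for a tile: one mutable slice per row,
/// each covering only the columns that belong to the tile.
pub struct TileRowsMut<'a> {
    pub rows: Vec<&'a mut [u16]>,
}

/// Split a row-major plane (`stride` samples per row, `width` visible samples)
/// into column tiles of at most `tile_width` samples each.
pub fn split_into_column_tiles(
    data: &mut [u16],
    stride: usize,
    width: usize,
    tile_width: usize,
) -> Vec<TileRowsMut<'_>> {
    assert!(stride > 0 && tile_width > 0 && width <= stride);
    let tile_count = (width + tile_width - 1) / tile_width;
    let mut tiles: Vec<TileRowsMut<'_>> = (0..tile_count)
        .map(|_| TileRowsMut { rows: Vec::new() })
        .collect();

    for row in data.chunks_exact_mut(stride) {
        // Drop the padding between `width` and `stride`.
        let (visible, _padding) = row.split_at_mut(width);
        // `chunks_mut` yields disjoint sub-slices, so no tile can borrow
        // samples that belong to another tile.
        for (tile, part) in tiles.iter_mut().zip(visible.chunks_mut(tile_width)) {
            tile.rows.push(part);
        }
    }
    tiles
}
```

Because each tile only holds per-row slices, there is no single slice spanning several rows of a tile, which is why code relying on "whole slice plus stride" access has to change.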
Here are all the users according to |
EDIT: but in fact, |
Oh, the sample program (https://play.rust-lang.org/?version=stable&mode=debug&edition=2015) fn main() {
|
@rom1v, indeed it works with println! commented out. I tried below and worked!
|
As I explained in #836 and #848, I want a to split This split is recursive, the fields of
Each But I just noticed a problem: the restoration units may be "stretched" (is it the right word?) near frame boundaries. Let's take a concrete example:
Splitting the frame results in 9 tiles:
This is consistent with Tile info semantics:
However, splitting the restoration units vector results in 4 tiles, because restoration units on borders will be stretched: Lines 471 to 472 in ebea677
So units are assigned to different parts of the frame:
Therefore, all 9 tiles cannot be encoded independently, since adjacent tiles on frame borders will need to share their Any thoughts about this? |
I'd round up the frame so the tiles have exactly the same size. |
Stop loop rest units from straddling tile boundaries (Wed Oct 25 17:15:28 2017 +0100)
Does it mean that we (may) need more restoration units if we tile than if we don't? |
According to read_lr() in https://aomediacodec.github.io/av1-spec/#read-loop-restoration-syntax, though its parameters like unitRowStart/unitColStart might be liberal, read_lr() is anyway called for one tile as in https://aomediacodec.github.io/av1-spec/#decode-tile-syntax. |
True, but the loop filters, in general, need read access and very little, tightly-contained write access. Once they're pipelined, that will be even more true (they will write to pipelined temp vectors passed between, not global storage). I would think the loop filter write access could simply be global across tiles and locked with very little downside. |
I'm not sure cropping should be used for more than just rounding to a multiple of 8 or something. I don't think we want to extend the frame to the next tile boundaries (tiles may be big).
Indeed, it is written only there (for now): Lines 3017 to 3019 in de68ab1
So a lock (or an atomic swap) would "work". But it still seems weird to me that a restoration unit has to be shared between tiles (and only in specific cases, on frame borders), since a tile is intended to be decoded/encoded independently of others.
(from IRC) I think you're right. In
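Now, let's consider the last row of tiles:
128x128 128x128 44x128
128x128 128x128 44x128
128x 44 128x 44 44x 44 <-- this row
read_lr() will be called with r = 64. For the luma sample:
unitRows = 2;
unitCols = 2;
unitRowStart = (64 * MI_SIZE + 127) / 128 = 2;
unitRowEnd = Min(2, /* a value greater than 2 */) = 2;
Therefore:
for ( unitRow = unitRowStart; unitRow < unitRowEnd; unitRow++ )
does nothing, and read_lr_unit() is never called.
In other words, unless I'm mistaken, it seems that restoration units are not used for the last row and last column in my example. So there are 9 tiles, but only 4 of them use a restoration unit. That way, a restoration unit is never shared between tiles. Am I right? |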
> Now, let's consider the last row of tiles:
>
> 128x128 128x128 44x128
> 128x128 128x128 44x128
> 128x 44 128x 44 44x 44 <-- this row
>
> read_lr() will be called with r = 64. For the luma sample:
>
> unitRows = 2;
> unitCols = 2;
> unitRowStart = (64 * MI_SIZE + 127) / 128 = 2;
> unitRowEnd = Min(2, /* a value greater than 2 */) = 2;
>
> Therefore:
>
> for ( unitRow = unitRowStart; unitRow < unitRowEnd; unitRow++ )
>
> does nothing, and read_lr_unit() is never called.

It does nothing because it's a shared RU. It uses the parameters coded from the earlier RU.

> In other words, unless I'm mistaken, it seems that restoration units are not used for the last row and last column in my example.
> So there are 9 tiles, but only 4 of them use a restoration unit. That way, a restoration unit is never shared between tiles.
> Am I right?

There are only four restoration units, but all tiles have a restoration unit. The ones on the edges are stretched. Use a last row/col size of e.g. 68 rather than 48 and you'll see it allocate nine RUs instead of 4.

Monty
|
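The difference between the 9 tiles and the 4 restoration units comes down to rounding. The sketch below assumes the rounding used by count_units_in_frame() in the AV1 specification (round to nearest, minimum 1) for restoration units, and plain rounding up for tiles. `count_tiles` is a hypothetical helper name, and the 128-sample unit and tile sizes are taken from the example in this thread.

```rust
/// Number of restoration units along one dimension. Rounds to nearest rather
/// than up, so a remainder shorter than half a unit is absorbed by the last
/// ("stretched") unit.
pub fn count_units_in_frame(unit_size: usize, frame_size: usize) -> usize {
    ((frame_size + (unit_size >> 1)) / unit_size).max(1)
}

/// Hypothetical helper: number of tiles along one dimension. Rounds up,
/// so any remainder gets its own (smaller) tile.
pub fn count_tiles(tile_size: usize, frame_size: usize) -> usize {
    (frame_size + tile_size - 1) / tile_size
}
```

With a dimension of 128 + 128 + 44 = 300 samples, this gives 3 tiles but only 2 restoration units, hence 9 tiles and 4 units for the frame. With a last row/column of 68 (324 samples), both counts are 3, hence nine restoration units.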
[NORMATIVE] Allow LR units to cross tiles (Wed Mar 28 16:58:57 2018 +0100)
|
OK.
It's not clear to me where it uses (or should use) the earlier RU (shared with another tile) in the code. Could you detail how it works, please? |
What are the expected write access (and from where), apart from the |
So, finally, restoration units cannot be split into tiles (because they may be shared between several tiles). There is no need to create non-overlapping views of them: they have interior mutability via an atomic or a mutex. For now, I can split a The next step is to use these structures instead of
Problem
The main problem is that many pieces of code directly use I see two strategies to attack this problem:
The first strategy has advantages:
However, this will require refactoring things that are only used at frame level, so some changes would not be absolutely necessary (like possibly On the other hand, option 2 will result in a big tiling branch to maintain until it's merged, and different parts of the code would access regions of planes in two different ways ( I suggest we try to take option 1.
Proposal
We will need to remove We will need to access rows separately (and also retrieve a raw pointer for assembly code). I suggest something like:
diff --git a/src/plane.rs b/src/plane.rs
index 80f5d8f..1fab676 100644
--- a/src/plane.rs
+++ b/src/plane.rs
@@ -293,11 +293,21 @@ impl<'a> ExactSizeIterator for IterWidth<'a> { }
impl<'a> FusedIterator for IterWidth<'a> { }
impl<'a> PlaneSlice<'a> {
- pub fn as_slice(&self) -> &'a [u16] {
- let stride = self.plane.cfg.stride;
- let base = (self.y + self.plane.cfg.yorigin as isize) as usize * stride
- + (self.x + self.plane.cfg.xorigin as isize) as usize;
- &self.plane.data[base..]
+ pub fn row(&self, x_offset: isize, y_offset: isize) -> &[u16] {
+ assert!(self.plane.cfg.yorigin as isize + self.y + y_offset >= 0);
+ assert!(self.plane.cfg.xorigin as isize + self.x + x_offset >= 0);
+ let base_y = (self.plane.cfg.yorigin as isize + self.y + y_offset) as usize;
+ let base_x = (self.plane.cfg.xorigin as isize + self.x + x_offset) as usize;
+ let base = base_y * self.plane.cfg.stride + base_x;
+ let width = self.plane.cfg.stride - base_x;
+ &self.plane.data[base..base + width]
+ }
+
+ pub fn as_ptr(&self) -> *const u16 {
+ let base_y = (self.plane.cfg.yorigin as isize + self.y) as usize;
+ let base_x = (self.plane.cfg.xorigin as isize + self.x) as usize;
+ let base = base_y * self.plane.cfg.stride + base_x;
+ self.plane.data[base..].as_ptr()
}
pub fn as_slice_clamped(&self) -> &'a [u16] {
Passing Usage would be:
diff --git a/src/partition.rs b/src/partition.rs
index eaeba5c..1764e0e 100644
--- a/src/partition.rs
+++ b/src/partition.rs
@@ -866,7 +866,7 @@ pub fn get_intra_edges<'a>(
// Needs top
if needs_top {
if y != 0 {
- above[..tx_size.width()].copy_from_slice(&dst.go_up(1).as_slice()[..tx_size.width()]);
+ above[..tx_size.width()].copy_from_slice(&dst.row(0, -1)[..tx_size.width()]);
} else {
let val = if x != 0 { dst.go_left(1).p(0, 0) } else { base - 1 };
for v in above[..tx_size.width()].iter_mut() {
diff --git a/src/me.rs b/src/me.rs
index 15ecc10..3ccdae7 100644
--- a/src/me.rs
+++ b/src/me.rs
@@ -82,8 +82,8 @@ mod nasm {
for c in (0..blk_w).step_by(step_size) {
let org_slice = plane_org.subslice(c, r);
let ref_slice = plane_ref.subslice(c, r);
- let org_ptr = org_slice.as_slice().as_ptr();
- let ref_ptr = ref_slice.as_slice().as_ptr();
+ let org_ptr = org_slice.as_ptr();
+ let ref_ptr = ref_slice.as_ptr();
sum += func(org_ptr, org_stride, ref_ptr, ref_stride);
}
}
Then we need to see where the compiler fails (hint: in many places), and use these new methods. Sometimes it's straightforward, sometimes it requires refactoring code which uses What do you think? |
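As a standalone illustration of row-by-row access, here is a minimal sketch that compiles on its own. `Plane` and `Region` below are simplified, hypothetical types (no origin offsets, no padding handling), not the rav1e structures touched by the diff above. Unlike the proposed `row()`, which returns the rest of the row up to the stride, this version clamps each row to the region's width.

```rust
/// Hypothetical, simplified plane: row-major samples with a stride.
pub struct Plane {
    pub data: Vec<u16>,
    pub stride: usize,
}

/// Hypothetical read-only view of a rectangular region of a plane.
pub struct Region<'a> {
    plane: &'a Plane,
    x: usize,
    y: usize,
    width: usize,
    height: usize,
}

impl<'a> Region<'a> {
    pub fn new(plane: &'a Plane, x: usize, y: usize, width: usize, height: usize) -> Self {
        assert!(x + width <= plane.stride);
        assert!((y + height) * plane.stride <= plane.data.len());
        Region { plane, x, y, width, height }
    }

    /// One row of the region. The returned slice never extends past the
    /// region's width, so it cannot expose samples from a neighbouring region.
    pub fn row(&self, y: usize) -> &'a [u16] {
        assert!(y < self.height);
        let base = (self.y + y) * self.plane.stride + self.x;
        &self.plane.data[base..base + self.width]
    }

    /// Iterate over the rows of the region, top to bottom.
    pub fn rows(&self) -> impl Iterator<Item = &'a [u16]> + '_ {
        (0..self.height).map(move |y| self.row(y))
    }
}
```

Callers that previously indexed a whole-plane slice with `y * stride + x` would instead loop over `rows()` (or call `row(y)`) and index within each row.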
Plan 1 is good, and we could possibly work in parallel, since it should be easy to update little by little, keeping the old methods around until their usage is phased out completely. |
The util function convert_slice_2d() operates on a slice using the stride information. It will become incompatible with PlaneSlice which will not expose a multi-rows slice anymore (see <xiph#631 (comment)>). To keep it generic enough, we don't want to use a PlaneSlice wrapper for every call, so make the function use raw pointers (and unsafe).
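A raw-pointer 2D conversion of the kind this commit message describes might look like the sketch below. It is an assumption-laden illustration, not the actual convert_slice_2d(): the name `convert_block_u8_to_u16`, the fixed u8 to u16 conversion, and the parameter order are all hypothetical.

```rust
/// Copy a `width` x `height` block from `src` to `dst`, widening u8 to u16.
/// Both buffers are addressed through a base pointer and a stride, so the
/// function never needs a slice spanning several rows.
///
/// # Safety
/// For every `y < height` and `x < width`, `src.add(y * src_stride + x)` must
/// be valid for reads and `dst.add(y * dst_stride + x)` must be valid for
/// writes, and the two blocks must not overlap.
pub unsafe fn convert_block_u8_to_u16(
    dst: *mut u16,
    dst_stride: usize,
    src: *const u8,
    src_stride: usize,
    width: usize,
    height: usize,
) {
    for y in 0..height {
        for x in 0..width {
            // SAFETY: guaranteed by the caller, see the function contract.
            unsafe {
                let value = src.add(y * src_stride + x).read();
                dst.add(y * dst_stride + x).write(u16::from(value));
            }
        }
    }
}
```

The trade-off is that bounds checking moves from the compiler to the caller, which is why the safety contract is spelled out in the doc comment.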
Add a decode_test with size such as it uses stretched restoration units. See <xiph#631 (comment)>.
A restoration unit may contain several super-blocks, and may be "stretched" on borders, even across tile boundaries: <xiph#631 (comment)> In the bitstream, it must be coded only for its first super-block, in plane order. To do so, a "coded" flag was set the first time, so that further super-blocks using the same restoration will not "code" it. But this assumed that all super-blocks associated to a restoration unit were encoded sequentially in plane order. With parallel tile encoding, even with proper synchronization (preventing data races), this introduces a race condition: a "stretched" restoration unit may not be coded in its first super-block, corrupting the bitstream. To avoid the problem, expose the restoration unit only for its first super-block, by returning a Option<&(mut) RestorationUnit>. This also avoids the need for any synchronization (a restoration unit will never be retrieved by more than 1 tile). At frame level, lrf_filter_frame() will still retrieve the correct restoration unit for each super-block, by calling restoration_unit_by_stripe().
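The idea of exposing a restoration unit only for its first super-block can be sketched as follows. This is a simplified model, not the actual change: `RestorationPlane`, `sb_shift` and `unit_for_first_sb` are hypothetical names, and units are assumed to cover a power-of-two number of super-blocks per side.

```rust
#[derive(Debug, Default, PartialEq)]
pub struct RestorationUnit {
    pub filter_params: u8,
}

/// Hypothetical grid of restoration units for one plane.
pub struct RestorationPlane {
    units: Vec<RestorationUnit>,
    cols: usize,
    rows: usize,
    /// log2 of the number of super-blocks per restoration unit side.
    sb_shift: usize,
}

impl RestorationPlane {
    pub fn new(cols: usize, rows: usize, sb_shift: usize) -> Self {
        let units = (0..cols * rows).map(|_| RestorationUnit::default()).collect();
        RestorationPlane { units, cols, rows, sb_shift }
    }

    /// Return the unit only if (sbx, sby) is the first super-block it covers.
    /// Super-blocks past the last column or row belong to a stretched unit
    /// whose first super-block lies earlier, so they get `None`.
    pub fn unit_for_first_sb(&mut self, sbx: usize, sby: usize) -> Option<&mut RestorationUnit> {
        let mask = (1usize << self.sb_shift) - 1;
        let (ux, uy) = (sbx >> self.sb_shift, sby >> self.sb_shift);
        let is_first = (sbx & mask) == 0
            && (sby & mask) == 0
            && ux < self.cols
            && uy < self.rows;
        if is_first {
            self.units.get_mut(uy * self.cols + ux)
        } else {
            None
        }
    }
}
```

Since every unit is handed out for exactly one super-block, at most one tile ever obtains a mutable reference to it, which is what removes the need for a lock in this model.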
Implemented by #1126. |
Currently, many functions access rectangular regions of planes (spanning multiple rows) via a primitive slice to the whole plane along with the stride information. This strategy is not compatible with tiling, since this borrows the memory belonging to other tiles. As a first step, add PlaneSlice methods to access rows separately. See <xiph#631 (comment)>.
The util function convert_slice_2d() operates on a slice with stride information. It will become incompatible with PlaneSlice which will not expose a multi-rows slice anymore (see <xiph#631 (comment)>). To keep it generic enough, we don't want to use a PlaneSlice wrapper for every call, so make the function use raw pointers (and unsafe). Note: this commit changes indentation for unsafe blocks, so the diff is more understandable with "git show -b".
Currently, many functions access rectangular regions of planes (spanning multiple rows) via a primitive slice to the whole plane along with the stride information. This strategy is not compatible with tiling, since this borrows the memory belonging to other tiles. Like for PlaneSlice, add methods to PlaneMutSlice to access rows separately. See <xiph#631 (comment)>.