Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

txn_types: Add GC fence feild to Write records #9207

Merged
merged 5 commits into from Dec 10, 2020
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
82 changes: 77 additions & 5 deletions components/txn_types/src/write.rs
Expand Up @@ -5,7 +5,7 @@ use crate::timestamp::TimeStamp;
use crate::types::{Value, SHORT_VALUE_MAX_LEN, SHORT_VALUE_PREFIX};
use crate::{Error, ErrorInner, Result};
use codec::prelude::NumberDecoder;
use tikv_util::codec::number::{NumberEncoder, MAX_VAR_U64_LEN};
use tikv_util::codec::number::{self, NumberEncoder, MAX_VAR_U64_LEN};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum WriteType {
Expand All @@ -21,6 +21,7 @@ const FLAG_LOCK: u8 = b'L';
const FLAG_ROLLBACK: u8 = b'R';

const FLAG_OVERLAPPED_ROLLBACK: u8 = b'R';
const FLAG_GC_FENCE: u8 = b'F';
Copy link
Contributor

@sticnarf sticnarf Dec 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to call it a prefix instead of a flag because it has a timestamp payload?


/// The short value for rollback records which are protected from being collapsed.
const PROTECTED_ROLLBACK_SHORT_VALUE: &[u8] = b"p";
Expand Down Expand Up @@ -60,6 +61,7 @@ pub struct Write {
pub write_type: WriteType,
pub start_ts: TimeStamp,
pub short_value: Option<Value>,

/// The `commit_ts` of transactions can be non-globally-unique. But since we store Rollback
/// records in the same CF where Commit records is, and Rollback records are saved with
/// `user_key{start_ts}` as the internal key, the collision between Commit and Rollback
Expand All @@ -68,6 +70,47 @@ pub struct Write {
/// Also note that `has_overlapped_rollback` field is only necessary when the Rollback record
/// should be protected.
pub has_overlapped_rollback: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems it can be replaced by gc_fence?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's produced because commit_ts == rollback_ts in the lock, it doesn't need a gc fence.


/// Records the next version after this version when overlapping rollback happens on an already
/// existed commit record.
///
/// When a rollback flag is written on an already-written commit record, it causes rewriting
/// the commit record. It may cause problems with the GC compaction filter. Consider this case:
///
/// ```text
/// Key_100_put, Key_120_del
/// ```
///
/// and a rollback on `100` happens:
///
/// ```text
/// Key_100_put_R, Key_120_del
/// ```
///
/// Then GC with safepoint = 130 may happen. However a follower may not have finished applying
/// the change. So on the follower, it's possible that:
///
/// 1. `Key_100_put`, `Key_120_del` applied
/// 2. GC with safepoint = 130 started and `Key_100_put`, `Key_120_del` are deleted
/// 3. Finished applying `Key_100_put_R`, which means to rewrite `Key_100_put`
/// 4. Read at `140` should get nothing (since it's MVCC-deleted at 120) but finds `Key_100_put`
///
/// To solve the problem, when marking `has_overlapped_rollback` on an already-existed commit
/// record, add a special field `gc_fence` on it. If there is a newer version after the record
/// being rewritten, the next version's `commit_ts` will be recorded. When MVCC reading finds
/// a commit record with a GC fence timestamp but the corresponding version that matches that ts
/// doesn't exest, the current version will be believed to be already GC-ed and ignored.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you say when and what the gc fence would be set to in the example?

///
/// For CDC and TiFlash, when they receives a commit record with `gc_fence` field set, it can
/// determine that it must be caused by an overlapped rollback instead of an actual commit.
///
/// The meaning of the field:
/// * `None`: A record that haven't been rewritten
/// * `Some(0)`: A commit record that has been rewritten due to overlapping rollback, but it
/// doesn't have an newer version.
/// * `Some(ts)`: A commit record that has been rewritten due to overlapping rollback,
/// and it's next version's `commit_ts` is `ts`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great comment! 👍👍👍

pub gc_fence: Option<TimeStamp>,
}

impl std::fmt::Debug for Write {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug needs updating

Expand Down Expand Up @@ -97,6 +140,7 @@ impl Write {
start_ts,
short_value,
has_overlapped_rollback: false,
gc_fence: None,
}
}

Expand All @@ -113,12 +157,18 @@ impl Write {
start_ts,
short_value,
has_overlapped_rollback: false,
gc_fence: None,
}
}

#[inline]
pub fn set_overlapped_rollback(mut self, has_overlapped_rollback: bool) -> Self {
pub fn set_overlapped_rollback(
mut self,
has_overlapped_rollback: bool,
gc_fence: Option<TimeStamp>,
) -> Self {
self.has_overlapped_rollback = has_overlapped_rollback;
self.gc_fence = gc_fence;
self
}

Expand All @@ -137,6 +187,7 @@ impl Write {
start_ts: self.start_ts,
short_value: self.short_value.as_deref(),
has_overlapped_rollback: self.has_overlapped_rollback,
gc_fence: self.gc_fence,
}
}
}
Expand All @@ -154,6 +205,12 @@ pub struct WriteRef<'a> {
/// Also note that `has_overlapped_rollback` field is only necessary when the Rollback record
/// should be protected.
pub has_overlapped_rollback: bool,

/// Records the next version after this version when overlapping rollback happens on an already
/// existed commit record.
///
/// See [`Write::gc_fence`] for more detail.
pub gc_fence: Option<TimeStamp>,
}

impl WriteRef<'_> {
Expand All @@ -170,6 +227,7 @@ impl WriteRef<'_> {

let mut short_value = None;
let mut has_overlapped_rollback = false;
let mut gc_fence = None;

while !b.is_empty() {
match b
Expand All @@ -193,6 +251,7 @@ impl WriteRef<'_> {
FLAG_OVERLAPPED_ROLLBACK => {
has_overlapped_rollback = true;
}
FLAG_GC_FENCE => gc_fence = Some(number::decode_u64(&mut b)?.into()),
flag => panic!("invalid flag [{}] in write", flag),
}
}
Expand All @@ -202,6 +261,7 @@ impl WriteRef<'_> {
start_ts,
short_value,
has_overlapped_rollback,
gc_fence,
})
}

Expand All @@ -217,6 +277,10 @@ impl WriteRef<'_> {
if self.has_overlapped_rollback {
b.push(FLAG_OVERLAPPED_ROLLBACK);
}
if let Some(ts) = self.gc_fence {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can calculate a better capacity estimation like https://github.com/tikv/tikv/blob/6b4466ba8a/components/txn_types/src/lock.rs#L161

b.push(FLAG_GC_FENCE);
b.encode_u64(ts.into_inner()).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not encode_var_u64?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VarInt saves space when ts is 0. But VarInt occupies more space (9 bytes vs 8 bytes) and is slower when ts is a real timestamp.
It has little difference to me because overlapping happens rarely...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it doesn't matter much which encoding to use here, but encode_u64 looks faster than var_u64

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it's strange to encode start_ts in var int and this ts in u64.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can see in Lock's encoding, everything new is using encode_u64...

}
b
}

Expand All @@ -237,7 +301,7 @@ impl WriteRef<'_> {
self.start_ts,
self.short_value.map(|v| v.to_owned()),
)
.set_overlapped_rollback(self.has_overlapped_rollback)
.set_overlapped_rollback(self.has_overlapped_rollback, self.gc_fence)
}
}

Expand Down Expand Up @@ -285,9 +349,17 @@ mod tests {
Write::new(WriteType::Delete, (1 << 20).into(), None),
Write::new_rollback((1 << 40).into(), true),
Write::new(WriteType::Rollback, (1 << 41).into(), None),
Write::new(WriteType::Put, 123.into(), None).set_overlapped_rollback(true),
Write::new(WriteType::Put, 123.into(), None).set_overlapped_rollback(true, None),
Write::new(WriteType::Put, 123.into(), None)
.set_overlapped_rollback(true, Some(1234567.into())),
Write::new(WriteType::Put, 456.into(), Some(b"short_value".to_vec()))
.set_overlapped_rollback(true, None),
Write::new(WriteType::Put, 456.into(), Some(b"short_value".to_vec()))
.set_overlapped_rollback(true, Some(0.into())),
Write::new(WriteType::Put, 456.into(), Some(b"short_value".to_vec()))
.set_overlapped_rollback(true, Some(2345678.into())),
Write::new(WriteType::Put, 456.into(), Some(b"short_value".to_vec()))
.set_overlapped_rollback(true),
.set_overlapped_rollback(true, Some(421397468076048385.into())),
];
for (i, write) in writes.drain(..).enumerate() {
let v = write.as_ref().to_bytes();
Expand Down
5 changes: 3 additions & 2 deletions src/storage/mvcc/txn.rs
Expand Up @@ -30,7 +30,8 @@ pub(crate) fn make_rollback(
Some(write) => {
assert!(start_ts > write.start_ts);
if protected {
Some(write.set_overlapped_rollback(true))
// TODO: Set the GC fence ts to the next version's commit_ts.
Some(write.set_overlapped_rollback(true, Some(0.into())))
} else {
// No need to update the original write.
None
Expand Down Expand Up @@ -758,7 +759,7 @@ mod tests {
let w1r = must_written(&engine, k1, 10, 20, WriteType::Put);
assert!(w1r.has_overlapped_rollback);
// The only difference between w1r and w1 is the overlapped_rollback flag.
assert_eq!(w1r.set_overlapped_rollback(false), w1);
assert_eq!(w1r.set_overlapped_rollback(false, None), w1);

let w2r = must_written(&engine, k2, 11, 20, WriteType::Put);
// Rollback is invoked on secondaries, so the rollback is not protected and overlapped_rollback
Expand Down
2 changes: 1 addition & 1 deletion src/storage/txn/actions/commit.rs
Expand Up @@ -90,7 +90,7 @@ pub fn commit<S: Snapshot>(

for ts in &lock.rollback_ts {
if *ts == commit_ts {
write = write.set_overlapped_rollback(true);
write = write.set_overlapped_rollback(true, None);
break;
}
}
Expand Down