Skip to content

Latest commit

 

History

History
101 lines (52 loc) · 4.96 KB

18 - Multi-Version Concurrency Control.md

File metadata and controls

101 lines (52 loc) · 4.96 KB

The DBMS maintains multiple physical versions of a single local object. Transactions read the newest version of the object when the transaction started.

Advantages

  • Writers do not block readers and readers do not block writers. (The writer writes on a new physical object while the readers are reading the previous one)
  • Read transactions do not require locks to ensure consistency.
  • Possible to support time-travel queries.

Implementation

Timestamps are assigned to transactions at the start of transaction. This is different from OCC where they are assigned at the validation phase.

Example 1

A0 will be read by transactions between 0 inclusive and 2 exclusive. So transaction with timestamp 1 will always read A0.

In a separate location, track whether transactions have been committed.

Example 2

In this example, transaction with timestamp 2 reads A0 because the earlier transaction has not committed yet even though there was a write.

After transaction 1 commits, transaction 2 commits. Let's assume 2PL is being used. Under read committed, transaction 2 would commit. Under snapshot isolation, transaction 2 will abort (If it is deadlock avoidance with first-writer-win rule).

Snapshot isolation

When a transaction starts, it sees a consistent snapshot of the database from the beginning of the transaction. It means if someone else commits writes after the transaction's read timestamp, it will not see the changes.

The database enforces that two committed transactions have disjoint write sets.

Prone to write skew anomaly.

Concurrency control protocol

Timestamp ordering - Transaction timestamps that determine serial order.

Optimistic concurrency control - Three-phase protocol with private workspace.

Two phase locking - Transaction acquire locks of physical version before reading and writing logical tuples.

MVCC + Two phase locking

Use a different concurrency control scheme for read transactions than for update transactions.

Use MVCC for reads so that they do not block the writers. Read-only transactions are assigned a timestamp and they only see the versions of objects with timestamps before the start of transaction.

Use strict 2PL to schedule the operations of update transactions. Read-only transactions are ignored for 2PL.

Version storage

DBMS maintains version chain per logical tuple. Indexes point to the head of the chain.

Append-only - New versions are appended to table on every update. As shown in previous examples.

Oldest-to-newest (O2N) ordering means the index points to the oldest part of the chain and adding new version is just appending to the chain, but lookups need to traverse chain. Newest-to-oldest (N2O) ordering means every new version must update index pointers but lookups do not require traversing the chain.

Time-travel - Old versions are copied to separate table space.

There are main table and time travel table physically in the database. Update means the value in main table is copied into time travel table and updates the version.

Delta table - Every update, only copy the values that were modified to the delta storage. Faster writes but slower reads.

Garbage collection

DBMS needs to remove reclaimable physical versions over time. Need to know if 1) expired 2) safe to reclaim from memory.

Tuple-level - Find old versions by examining tuples.

In background vacuuming, background threads scan the table and look for reclaimable versions (outside active transactions).

In cooperative cleaning, worker threads (not background threads) identify reclaimable versions as they traverse version chain (Only for O2N). Disadvantage is that some tuples may never be reclaimed if unread.

Transaction-level - Transactions keep track of old versions.

On commit/abort, the transaction provide read and write sets to the centralized vacuum worker.

Indexes and version chains

Primary key indexes point to version chain head (oldest/newest). When primary key is updated, it is treated as delete + insert.

For secondary indexes, there can be a logical pointer that requires an indirection layer (RID, primary key) or a physical pointer that points to the head of version chain. (PosgreSQL)

For physical pointer, if there are many secondary indexes, all will have to be updated. (MySQL) For retrieving the logical pointer, the primary key index has to be queried again. If there are many secondary indexes, only the primary index has to be updated.