TxGuide
Bigdata uses Multi-Version Concurrency Control (MVCC) for transactions. MVCC belongs to the family of optimistic concurrency control algorithms. Bigdata does not obtain a lock when you start a transaction. Instead, it validates the transaction when you commit. The advantage of MVCC is that readers and writers never block each other, and writers succeed unless there is a conflict. This can yield higher concurrency than Two-Phase Locking (2PL).
Timestamps are central to transaction processing in bigdata. There is a unique timestamp for each commit point and each transaction. When a transaction commits, it first validates each tuple in its write set and then annotates each tuple with the revision time for that transaction. A transaction will abort if there is a write-write conflict. This occurs when a concurrent transaction (one running at the same time) modified the same tuple and committed its changes first. A write-write conflict is detected when the revision timestamp on a tuple is more recent than the start time of the transaction.
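The validation rule described above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions, not the Blazegraph implementation: the class `MvccSketch`, its nested `Tuple` and `Tx` types, and the logical clock are all hypothetical names introduced for illustration. A transaction buffers its writes privately, and at commit each written key is checked against the revision time of the committed tuple.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of MVCC commit-time validation (hypothetical, not Blazegraph code). */
final class MvccSketch {

    /** A committed tuple: its value plus the revision time of the commit that wrote it. */
    static final class Tuple {
        final String value;
        final long revisionTime;

        Tuple(String value, long revisionTime) {
            this.value = value;
            this.revisionTime = revisionTime;
        }
    }

    /** A transaction: a start time plus a private (buffered) write set. */
    static final class Tx {
        final long startTime;
        final Map<String, String> writeSet = new HashMap<>();

        Tx(long startTime) {
            this.startTime = startTime;
        }

        void put(String key, String value) {
            writeSet.put(key, value);
        }
    }

    private final Map<String, Tuple> committed = new HashMap<>();

    // Assumption: a simple logical clock stands in for the unique timestamps.
    private long clock = 0;

    /** Starts a transaction without taking any lock. */
    synchronized Tx begin() {
        return new Tx(++clock);
    }

    /** Reads the last committed value for a key, or null if none. */
    synchronized String read(String key) {
        Tuple t = committed.get(key);
        return t == null ? null : t.value;
    }

    /** Returns true if the transaction committed, false if it aborted on a write-write conflict. */
    synchronized boolean commit(Tx tx) {
        // Validate: a tuple revised after this transaction started is a conflict.
        for (String key : tx.writeSet.keySet()) {
            Tuple current = committed.get(key);
            if (current != null && current.revisionTime > tx.startTime) {
                return false;
            }
        }
        // Apply: annotate every written tuple with this commit's revision time.
        long revisionTime = ++clock;
        for (Map.Entry<String, String> e : tx.writeSet.entrySet()) {
            committed.put(e.getKey(), new Tuple(e.getValue(), revisionTime));
        }
        return true;
    }
}
```

In this model, if two transactions start concurrently and both write the same key, the first to commit succeeds and the second is rejected, because the tuple's revision time is then later than the second transaction's start time.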
By default, Blazegraph registers indices that do NOT support transactions. Write operations on such indices are always "unisolated". Unisolated write operations provide a higher throughput since writes are not double-buffered, but writes on a given index will be serialized.
Against a Journal, unisolated writes can provide full ACID semantics with high performance.
In scale-out, unisolated writes provide shard-wise ACID semantics.
Note that read-only transactions with snapshot isolation are always supported, even when the indices are not configured to support full read/write transactions.
You MUST explicitly enable transaction support when you register an index. Transaction processing requires that the index maintains both per-tuple delete markers and per-tuple version identifiers. While scale-out indices always maintain per-tuple delete markers, neither local nor scale-out indices maintain the per-tuple version identifiers by default.
final IndexMetadata indexMetadata = new IndexMetadata( "testIndex", UUID.randomUUID());
// this index will support transactions.
indexMetadata.setIsolatable(true);
// register the index.
store.registerIndex(indexMetadata);
There are two kinds of transactions:
- read-only transactions
- read-write transactions
Read-only transactions are always supported. They provide extremely fast, highly concurrent snapshot isolation. You request a read-only transaction by declaring to the transaction service the commit point from which you want to read. The returned transaction identifier provides snapshot isolation with a fully consistent view of the state of the database as of that commit point.
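The idea of reading from a declared commit point can be sketched with a toy model. This is an illustration only and makes several assumptions: `SnapshotSketch`, `commit` and `readAsOf` are hypothetical names, and each commit point is stored as a full immutable copy, which a real store would not do. The point is that a reader bound to an earlier commit point keeps seeing the same state regardless of later commits.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of snapshot reads by commit point (hypothetical, not Blazegraph code). */
final class SnapshotSketch {

    // Commit time -> immutable view of the data as of that commit point.
    private final TreeMap<Long, Map<String, String>> commitPoints = new TreeMap<>();

    private long lastCommitTime = 0;

    /** Applies the changes on top of the latest state and returns the new commit time. */
    synchronized long commit(Map<String, String> changes) {
        Map<String, String> next = new HashMap<>();
        if (!commitPoints.isEmpty()) {
            next.putAll(commitPoints.lastEntry().getValue());
        }
        next.putAll(changes);
        long commitTime = ++lastCommitTime;
        commitPoints.put(commitTime, Collections.unmodifiableMap(next));
        return commitTime;
    }

    /** Returns the read-only view for the most recent commit point at or before the given time. */
    synchronized Map<String, String> readAsOf(long commitTime) {
        Map.Entry<Long, Map<String, String>> e = commitPoints.floorEntry(commitTime);
        return e == null ? Collections.<String, String>emptyMap() : e.getValue();
    }
}
```

A view obtained from `readAsOf` before a later commit is unaffected by that commit, which is the property the read-only transaction identifier gives you in the text above.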
Read-write transactions fully buffer writes on "isolated" indices, then validate those writes during the commit protocol, and will fail a transaction if the write set cannot be validated (due to intervening commits). Read-write transaction support must be configured when you create an index.
In addition to transactions, you can have unisolated operations. Unisolated operations are key to extremely high concurrency since they do not require any global coordination. Both the RDF database and the "row store" make extensive use of unisolated operations.
Creating and using transactions with the Journal is straightforward.
Journal store = ...
// start a read-write transaction (passing ITx.UNISOLATED to newTx requests a read-write tx).
final long txid = store.newTx(ITx.UNISOLATED);
// Obtain a view of a named index isolated by that transaction.
final IIndex isolatedBTree = store.getIndex("testIndex", txid);
// Write on the index.
isolatedBTree.insert("Hello", "World!");
// Commit the transaction.
store.commit(txid);
The BigdataSail wraps the Journal. When wrapping the Journal, the index updates are fully ACID. The following pattern shows how to obtain a connection that supports mutation, work on that connection, and then commit the connection. If anything goes wrong, the pattern rolls back the work performed on the connection. A similar pattern may be used with the BigdataSailRepository. That class is just a wrapper over the BigdataSail, and the connection objects that it returns are just wrappers over the BigdataSailConnection objects.
BigdataSailConnection conn = null;
boolean ok = false;
try {
    conn = sail.getConnection();
    doWork(conn);
    conn.commit();
    ok = true;
} finally {
    if (conn != null) {
        if (!ok) {
            conn.rollback();
        }
        conn.close();
    }
}
Recycling behavior depends critically on the close of open transactions. The MVCC architecture of Blazegraph means that data for the historical commit points cannot be recycled until there are no active transactions reading on those commit points. If you are holding open a transaction (either a read-only or a read-write transaction) while writing on the database, the database cannot recycle storage and will start to grow in size on the disk once it fills up the available allocations. See the page on RetentionHistory for more about this issue, including the specifics of the RWStore recycler behavior.
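Why an unclosed transaction blocks recycling can be shown with a toy registry. This is a hedged sketch, not the RWStore recycler: `TxRegistrySketch`, `isPinned` and `readWith` are hypothetical names, and the model only tracks which commit points still have open readers. The `readWith` helper shows the try/finally discipline that guarantees a transaction is closed even when the work fails.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of open transactions pinning commit points (hypothetical, not Blazegraph code). */
final class TxRegistrySketch {

    // Transaction identifier -> commit time that the transaction reads on.
    private final Map<Long, Long> readsOnCommitTime = new HashMap<>();

    private long nextTxId = 0;

    /** Opens a transaction reading on the given commit point. */
    synchronized long open(long commitTime) {
        long txId = ++nextTxId;
        readsOnCommitTime.put(txId, commitTime);
        return txId;
    }

    /** Closes a transaction, releasing its hold on the commit point. */
    synchronized void close(long txId) {
        readsOnCommitTime.remove(txId);
    }

    /** Number of transactions that are still open. */
    synchronized int activeTxCount() {
        return readsOnCommitTime.size();
    }

    /** True if some open transaction still reads on the given commit point. */
    synchronized boolean isPinned(long commitTime) {
        return readsOnCommitTime.containsValue(commitTime);
    }

    /** Runs the work against a commit point and always closes the transaction afterwards. */
    void readWith(long commitTime, Runnable work) {
        long txId = open(commitTime);
        try {
            work.run();
        } finally {
            close(txId);
        }
    }
}
```

In this model a commit point stays pinned for as long as any transaction opened on it has not been closed, so a single forgotten `close` keeps the active count above zero indefinitely.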
If you suspect a storage leak, you should turn on the following logger in the log4j configuration file:
com.bigdata.txLog=INFO
This will cause the following events to be logged:
Event | Fields | Description |
---|---|---|
OPEN-JOURNAL | The UUID, file, and BufferMode of the Journal | A Journal was opened. |
CLOSE-JOURNAL | The UUID and file of the Journal. | A Journal was closed. |
COMMIT | commitTime | The unisolated write set was committed. |
OPEN | txId, readsOnCommitTime | A read-only or read-write transaction was opened. |
CLOSE | txId, readsOnCommitTime | A read-only or read-write transaction was closed. |
RECYCLER | lastCommitTime, latestReleasableTime, lastDeferredReleaseTime, activeTxCount | This is an informational message generated when the recycler runs. The recycler cannot recycle allocations unless activeTxCount is ZERO (0). If the counter never becomes ZERO (0), then the RWStore will "leak storage". This is generally an application bug. |
RECYCLED | fromTime, toTime, totalFreed, commitPointsRecycled, commitPointsRemoved | Deferred frees of allocations were released (recycled). Check totalFreed and commitPointsRemoved to see if anything was actually recycled. |
ABORT | N/A | The unisolated write set of the Journal was discarded. |
ROLLBACK | N/A | The state of the Journal was restored to the previous root block. |
SAIL-CREATE-NAMESPACE | namespace | A new namespace was created (since 2.2.0). |
SAIL-DESTROY-NAMESPACE | namespace | A namespace was destroyed (since 2.2.0). |
SAIL-START-CONN | conn | A new BigdataSailConnection was created. |
SAIL-NEW-TX | txId, conn | A new read/write transaction identifier was assigned to a BigdataSailConnection. This occurs when a read/write tx is created and each time you call rollback() or commit() on a read/write tx. |
SAIL-COMMIT-CONN | commitTime, conn | commit() was invoked on a BigdataSailConnection. |
SAIL-ROLLBACK-CONN | conn | rollback() was invoked on a BigdataSailConnection. |
SAIL-CLOSE-CONN | conn | close() was invoked on a BigdataSailConnection. |
REST-API-TASK-OPEN | task | A REST API task was created in response to an HTTP request (since 2.2). |
REST-API-TASK-SUCCESS | task | A REST API task completed normally (since 2.2). |
REST-API-TASK-ERROR | task, cause | A REST API task failed (since 2.2). |