Omid, which stands for Optimistically transaction Management In Data stores, is an open-source project started at Yahoo! that aims to provide transactions to datastores in the NoSQL ecosystem (e.g. HBase)
Here, we walk you through the architecture of Omid. If you want to skip to a hands on approach to Omid, please read Getting Started.
The following sections present Omid and its high-level architecture, and detail the motivation and the problems behind Omid.
Omid adds lock-free transactional support on top of HBase. Omid beneﬁts from a centralized scheme in which a single server, called The Status Oracle (TSO,) monitors the modiﬁed rows by transactions and use that to detect write-write conﬂicts. HBase clients in Omid maintain a read-only snapshot (copy) of transaction commit times to reduce the load on the TSO, making it scalable up to 50,000 transactions per second (TPS.)
The following figure shows a high level view of the Omid's architecture
The core architecture of the software is described in more detail in the Technical Details.
Some experimental results could also be found here: Experimental Results.
A transaction comprises a unit of work against a database, which must either entirely complete (i.e., commit) or have no effect (i.e., abort). In other words, partial executions of the transaction are not deﬁned. Without the support for transactions, the developers are burdened with ensuring atomic execution of a multi-row transaction despite failures as well as concurrent accesses to the database by other transactions. Data stores such as HBase, BigTable, PNUTS, and Cassandra, usually lack this precious feature.
Snapshot isolation (SI) guarantees that all reads of a transaction are performed on a snapshot of the database that corresponds to a valid database state with no concurrent transaction. To implement SI, the database maintains multiple versions of the data in some data servers, and transactions, running by clients, observe different versions of the data depending on their start time. For example, transaction txn_n_ reads the modifications made by the transaction txn_o_, but not the ones made by the concurrent transaction txn_c_ because txn_c_ is not committed when txn_n_ starts. Implementations of SI have the advantage that writes of a transaction do not block the reads of others. Two concurrent transactions still conﬂict if they write into the same data item, say a database row.
In the centralized implementation of SI, a single server -The Status Oracle or TSO- receives the commit requests accompanied by the set of the identiﬁers (id) of modiﬁed rows, R. Since the TSO has observed the modiﬁed rows by the previous commit requests, it has enough information to check if there is temporal overlap for each modiﬁed row. The timestamps are obtained from a timestamp oracle integrated into the status oracle and the uncommitted data of transactions are stored on the same data tables.
For each transaction, the TSO server sends/receives the following main messages:
Since the timestamp oracle is integrated into the TSO, the transactional clients (TCs) can obtain the start timestamp from the TSO. The following list details the steps of transactions:
hbase-trx is an ongoing project that attempts to extend HBase with transactional support. Similar to Percolator, hbase-trx runs a 2PC algorithm to detect write-write conflicts. In contrast to Percolator, hbase-trx generates a transaction id locally (by generating a random integer) rather than acquiring one from a global oracle. During the commit preparation phase, hbase-trx detects write-write conflicts and caches the write operations in a server-side state object. On commit, the data server (i.e., RegionServer) applies the write operations to its regions. Each data server considers the commit preparation and applies the commit in isolation. There is no global knowledge of the commit status of transactions.
In the case of a client failure after a commit preparation has been sent to a data server, the transaction will eventually be applied optimistically after a timeout, regardless of the correct status of the transaction. This could lead to inconsistency in the database. To resolve this issue, hbase-trx would require a global transaction status oracle similar to that presented in this project. hbase-trx does not use the timestamp attribute of HBase fields; transaction ids are randomly generated integers. Consequently, hbase-trx is unable to offer snapshot isolation, as there is no fixed order in which transactions are written to the database.
If you want to contribute, clone the repository and start hacking. If you want yours changes accepted, please refer to How to contribute section.