This is a client app that writes flatbuffers to Ceph, building on our previous flatbuffers design and results. Using the TPC-H lineitem table data, write a client that repeatedly calls getnextrow(), hashes each row to a bucket, and flushes buckets to Ceph. A bucket contains a FlatBufferBuilder; buckets are 1:1 with Ceph objects, and assume there are n objects. This code and logic will be used in our foreign data wrapper.
Assume a primary key is always provided and is integral. For testing, create composite keys from 2 columns.
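One simple way to form the composite key is to pack two integral lineitem columns into a single 64-bit value; the column choice (l_orderkey, l_linenumber) and the helper name below are illustrative, not fixed by this issue:

```cpp
#include <cstdint>

// Pack two integral key columns into one 64-bit key for hashing.
// Assumes the low column fits in 32 bits (true for lineitem's
// l_linenumber, which ranges 1..7). Names here are illustrative.
inline uint64_t make_composite_key(uint32_t l_orderkey, uint32_t l_linenumber) {
    return (static_cast<uint64_t>(l_orderkey) << 32) |
           static_cast<uint64_t>(l_linenumber);
}
```

The packed key then feeds directly into the bucket hash below.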
For mapping rows to buckets, hash the key with jump consistent hashing (https://arxiv.org/pdf/1406.2294), given the known number of buckets n. The bucket number will also serve as the object id (oid) for now.
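The linked paper is Lamping and Veach's jump consistent hash; its published algorithm transcribes to C++ almost verbatim:

```cpp
#include <cstdint>

// Jump consistent hash (Lamping & Veach, arXiv:1406.2294).
// Maps a 64-bit key to a bucket in [0, num_buckets). When num_buckets
// grows from n to n+1, only about 1/(n+1) of keys move, so most rows
// keep their bucket (and therefore their oid) as n changes.
int32_t jump_consistent_hash(uint64_t key, int32_t num_buckets) {
    int64_t b = -1, j = 0;
    while (j < num_buckets) {
        b = j;
        key = key * 2862933555777941757ULL + 1;
        j = static_cast<int64_t>(
            (b + 1) * (static_cast<double>(1LL << 31) /
                       static_cast<double>((key >> 33) + 1)));
    }
    return static_cast<int32_t>(b);
}
```

The returned bucket number doubles as the oid when flushing to Ceph.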
A bucket contains a flatbuffer that is being built as rows arrive, and since there is 1 bucket per object, there may be millions of objects. Hence maintaining an open bucket for every object may be prohibitive in terms of memory on the client machine. Although there will be some overhead, for now buckets should be created on demand, rows added as they arrive, and the entire bucket freed when flushed to Ceph. Assume buckets are flushed when the number of rows exceeds the flush_rows parameter.
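The on-demand create / fill / flush-and-free lifecycle above can be sketched as follows. This is a sketch under stated assumptions: the class and member names are hypothetical, and the byte buffer inside Bucket stands in for the real FlatBufferBuilder; the flush callback stands in for the eventual Ceph write:

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for the real bucket: the byte buffer approximates the
// FlatBufferBuilder's contents. All names here are illustrative.
struct Bucket {
    std::vector<uint8_t> buf;   // serialized rows (stand-in for builder)
    uint64_t num_rows = 0;
};

class BucketManager {
public:
    BucketManager(uint64_t flush_rows,
                  std::function<void(int32_t oid, Bucket&)> flush_fn)
        : flush_rows_(flush_rows), flush_fn_(std::move(flush_fn)) {}

    // Add a serialized row to its bucket, creating the bucket on demand.
    // Once the bucket reaches flush_rows rows, flush it and free it so
    // memory stays bounded even with millions of objects.
    void add_row(int32_t oid, const std::vector<uint8_t>& row) {
        Bucket& b = buckets_[oid];  // created on first use
        b.buf.insert(b.buf.end(), row.begin(), row.end());
        if (++b.num_rows >= flush_rows_) {
            flush_fn_(oid, b);
            buckets_.erase(oid);    // free the entire bucket
        }
    }

    // Called when getnextrow() is exhausted: flush all open buckets.
    void flush_all() {
        for (auto& [oid, b] : buckets_) flush_fn_(oid, b);
        buckets_.clear();
    }

private:
    uint64_t flush_rows_;
    std::function<void(int32_t, Bucket&)> flush_fn_;
    std::unordered_map<int32_t, Bucket> buckets_;
};
```

Only buckets that currently hold unflushed rows occupy memory; a flushed bucket is recreated from scratch if more rows hash to it later.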
The bucket should maintain some statistics. When adding a row to a bucket, update a few stats such as min/max/counts for several columns. Then when flushing the bucket data (flatbuffer) to Ceph, the bucket statistics should be written to omap for the corresponding object.
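A minimal sketch of those per-bucket statistics, assuming min/max/count over integral columns (the column names and omap key layout are assumptions, not settled here):

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <map>
#include <string>

// Min/max/count for one tracked column, updated as each row is added.
struct ColStats {
    int64_t min = std::numeric_limits<int64_t>::max();
    int64_t max = std::numeric_limits<int64_t>::min();
    uint64_t count = 0;

    void update(int64_t v) {
        min = std::min(min, v);
        max = std::max(max, v);
        ++count;
    }
};

// One entry per tracked column, keyed by column name. On flush, these
// would be serialized into the object's omap (e.g. under hypothetical
// keys like "stats.l_orderkey.min") alongside the appended flatbuffer.
using BucketStats = std::map<std::string, ColStats>;

inline void update_stats(BucketStats& s, const std::string& col, int64_t v) {
    s[col].update(v);
}
```

Keeping stats incremental means flushing costs only the omap write; no rescan of the flatbuffer is needed.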
When getnextrow() is exhausted, iterate over the remaining open buckets and flush them to Ceph.
For now, just append flatbuffers to their corresponding objects when flushing to Ceph.
Closing this since it is completed except for the Ceph write append -- moving that to new issue #32. Statistics have also been separated out of this issue, and the stats structure is here.
The bulk of this issue (hashing rows into flatbuffer buckets and writing the flatbuffers to local binary files as objects) was resolved via commit