Writing flatbuffers to Ceph. #3

jlefevre · 2018-06-29T23:48:31Z

This is a client app to write flatbuffers to Ceph, based on our previous flatbuffers design and results. Using the TPC-H lineitem table data, write a client that continuously calls getnextrow(), hashes each row to a bucket, and flushes buckets to Ceph. A bucket contains a FlatBufferBuilder, buckets map 1:1 to Ceph objects, and there are assumed to be n objects. This code and logic will be used in our foreign data wrapper.
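The issue does not define getnextrow(), so the sketch below assumes it is a Python generator over lines of a TPC-H lineitem .tbl file ('|'-delimited, with a trailing '|'). The column type sets (INT_COLS, FLOAT_COLS) are assumptions covering only the first eight lineitem columns; everything else is kept as a string.

```python
from typing import Iterable, Iterator, Tuple, Union

Value = Union[int, float, str]

# Hypothetical typing of lineitem columns by position; the rest stay strings.
INT_COLS = {0, 1, 2, 3}    # l_orderkey, l_partkey, l_suppkey, l_linenumber
FLOAT_COLS = {4, 5, 6, 7}  # l_quantity, l_extendedprice, l_discount, l_tax


def getnextrow(lines: Iterable[str]) -> Iterator[Tuple[Value, ...]]:
    """Yield one typed lineitem row per non-empty input line."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("|")
        if fields[-1] == "":  # drop the empty field after the trailing '|'
            fields.pop()
        row = []
        for i, f in enumerate(fields):
            if i in INT_COLS:
                row.append(int(f))
            elif i in FLOAT_COLS:
                row.append(float(f))
            else:
                row.append(f)
        yield tuple(row)
```

Typical use would be `for row in getnextrow(open("lineitem.tbl")): ...`, where the file name is only illustrative. Keeping getnextrow() a generator means the client never holds more than one parsed row at a time outside the buckets.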

Assume a primary key is always provided and is integral. Create composite keys of 2 columns for testing.
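For lineitem, a natural 2-column integral key is (l_orderkey, l_linenumber). The hash in the next step takes a single 64-bit key, so a composite key has to be folded into one integer first. This sketch uses an 8-byte BLAKE2b digest of the two packed values; that choice and the helper name composite_key are assumptions, not something the issue specifies. Plain bit-packing would also work if both columns are known to fit in 32 bits.

```python
import hashlib
import struct


def composite_key(a: int, b: int) -> int:
    """Fold two integral key columns into one unsigned 64-bit key.

    Both values are packed as signed 64-bit little-endian integers
    (struct.error if either is out of range), then hashed with an
    8-byte BLAKE2b digest so equal (a, b) pairs always map to the same key.
    """
    packed = struct.pack("<qq", a, b)
    digest = hashlib.blake2b(packed, digest_size=8).digest()
    return int.from_bytes(digest, "little")
```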
To map rows to buckets, hash the key with the jump consistent hash algorithm (Lamping and Veach), given the known number of buckets n. The bucket number will also serve as the object id (oid) for now.
https://arxiv.org/pdf/1406.2294
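A Python port of the jump consistent hash from the linked paper, offered as a sketch. Python integers are unbounded, so the 64-bit wraparound of the original C++ is emulated with an explicit mask. The returned bucket number doubles as the oid, as described above.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Map a 64-bit key to a bucket in [0, num_buckets)."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    key &= MASK64  # treat the key as an unsigned 64-bit value
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step, wrapped as in C++ uint64_t.
        key = (key * 2862933555777941757 + 1) & MASK64
        # Jump forward to the next bucket this key would move to.
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b
```

The property that matters for a growing object count: when n becomes n + 1, a key either keeps its bucket or moves to the new bucket n, so only about 1/(n + 1) of the rows change objects.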
A bucket holds a flatbuffer that is built up as rows arrive. Since there is one bucket per object and there may be millions of objects, keeping a bucket open for every object may require prohibitive memory on the client machine. Despite some overhead, for now buckets should be created on demand, rows added as they arrive, and the entire bucket freed once it is flushed to Ceph. A bucket is flushed when its number of rows exceeds the flush_rows parameter.
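A minimal sketch of on-demand buckets, assuming a flush callback supplied by the caller. To keep it runnable without the flatbuffers package, a bucket stores rows in a list and JSON stands in for finishing a FlatBufferBuilder; Bucket, BucketManager, and flush_fn are hypothetical names.

```python
import json
from typing import Callable, Dict, List, Sequence


class Bucket:
    """Stand-in bucket; a real one would wrap a flatbuffers FlatBufferBuilder."""

    def __init__(self, oid: int):
        self.oid = oid
        self.rows: List[Sequence] = []

    def serialize(self) -> bytes:
        # JSON is a placeholder for the finished flatbuffer bytes.
        return json.dumps(self.rows).encode("utf-8")


class BucketManager:
    def __init__(self, flush_rows: int, flush_fn: Callable[[int, bytes], None]):
        self.flush_rows = flush_rows
        self.flush_fn = flush_fn
        self.open: Dict[int, Bucket] = {}  # only buckets that currently hold rows

    def add_row(self, oid: int, row: Sequence) -> None:
        bucket = self.open.get(oid)
        if bucket is None:  # create the bucket on demand
            bucket = self.open[oid] = Bucket(oid)
        bucket.rows.append(row)
        if len(bucket.rows) >= self.flush_rows:
            self.flush(oid)

    def flush(self, oid: int) -> None:
        # Removing the bucket from the open set lets it be freed after the flush.
        bucket = self.open.pop(oid)
        self.flush_fn(oid, bucket.serialize())
```

Here a bucket is flushed as soon as it reaches flush_rows rows; if "exceeds" is meant strictly, the comparison would become `>`.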
Each bucket should also maintain statistics: when a row is added, update stats such as min, max, and count for several columns. When the bucket's data (flatbuffer) is flushed to Ceph, write the bucket statistics to the omap of the corresponding object.
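A sketch of per-bucket statistics, assuming the tracked columns are given by position. Ceph omap stores string keys with byte values, so to_omap() returns that shape; the key naming scheme ("col0.min" and so on) is a hypothetical choice, not the stats structure referenced later in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Sequence, Union

Value = Union[int, float, str]


@dataclass
class ColumnStats:
    count: int = 0
    min_val: Optional[Value] = None
    max_val: Optional[Value] = None

    def update(self, v: Value) -> None:
        self.count += 1
        if self.min_val is None or v < self.min_val:
            self.min_val = v
        if self.max_val is None or v > self.max_val:
            self.max_val = v


@dataclass
class BucketStats:
    columns: Sequence[int]  # positions of the columns to track
    stats: Dict[int, ColumnStats] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for c in self.columns:
            self.stats[c] = ColumnStats()

    def update(self, row: Sequence[Value]) -> None:
        # Called once per row added to the bucket.
        for c in self.columns:
            self.stats[c].update(row[c])

    def to_omap(self) -> Dict[str, bytes]:
        # Hypothetical omap layout: one key per (column, statistic).
        out: Dict[str, bytes] = {}
        for c, s in self.stats.items():
            out[f"col{c}.count"] = str(s.count).encode()
            out[f"col{c}.min"] = str(s.min_val).encode()
            out[f"col{c}.max"] = str(s.max_val).encode()
        return out
```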
Once getnextrow() has no more rows, iterate over the remaining open buckets and flush each one to Ceph.
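The final drain step, sketched under the assumption that open buckets are kept in a dict keyed by oid and that flush_fn is a caller-supplied (hypothetical) callback. Iterating over a sorted copy of the keys avoids mutating the dict while looping over it and gives a deterministic flush order.

```python
from typing import Callable, Dict, List, Sequence


def flush_remaining(open_buckets: Dict[int, List[Sequence]],
                    flush_fn: Callable[[int, List[Sequence]], None]) -> int:
    """Flush every still-open bucket; return how many were flushed."""
    flushed = 0
    for oid in sorted(open_buckets):  # sorted() iterates over a copy of the keys
        rows = open_buckets.pop(oid)  # remove so the bucket can be freed
        flush_fn(oid, rows)
        flushed += 1
    return flushed
```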
For now, just append flatbuffers to their corresponding objects when flushing to Ceph.
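Appending several flatbuffers to one object only works if a reader can find where each buffer ends. One option, assumed here rather than taken from the issue, is a 4-byte length prefix per appended buffer. The in-memory AppendOnlyStore below is a hypothetical stand-in for a Ceph pool; in a real client the append would presumably be a librados append call instead, with the integer oid converted to an object name string.

```python
import struct
from collections import defaultdict
from typing import Dict, Iterator


class AppendOnlyStore:
    """Hypothetical in-memory stand-in for a Ceph pool: object name -> bytes."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytearray] = defaultdict(bytearray)

    def append(self, oid: int, buf: bytes) -> None:
        # 4-byte little-endian length prefix, then the flatbuffer bytes.
        self.objects[str(oid)] += struct.pack("<I", len(buf)) + buf

    def read_buffers(self, oid: int) -> Iterator[bytes]:
        # Split an object back into the buffers that were appended to it.
        data = self.objects.get(str(oid), bytearray())
        pos = 0
        while pos < len(data):
            (n,) = struct.unpack_from("<I", data, pos)
            pos += 4
            yield bytes(data[pos:pos + n])
            pos += n
```

The FlatBuffers C++ builder can also emit size-prefixed buffers directly (FinishSizePrefixed), which would make a separate prefix unnecessary.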

jlefevre · 2019-06-18T00:29:24Z

Closing this since it is completed except for the Ceph write append; moving that to new issue #32. Statistics have also been separated out of this issue, and the stats structure is here.

The bulk of this issue (hashing rows into flatbuffer buckets and writing flatbuffers to local binary files as objects) was resolved via commit

jlefevre assigned billyinthe510 Jun 29, 2018

jlefevre closed this as completed Jun 18, 2019

jlefevre mentioned this issue May 31, 2020

Write append new flatbuffer layout to Ceph uccross/skyhookdm-ceph-cls#23 (Closed)