hbase scalding Store based on maple/storehaus #404
base: develop
hbase scalding Store based on maple/storehaus
.write(new HBaseVersionedSource[K2, V2](table, scheme))
}

/* overridden methods for ReadableStore[K, V2] */
I think it is better to add something like .toReadableStore: ReadableStore[K, V2] here, rather than making this one item subclass two things.
It's unclear to me how to override the get/multiGet methods if I don't subclass/extend ReadableStore. Do you have an example of what you had in mind?
Thanks for working on this!
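One way to read the toReadableStore suggestion is composition: the batch store does not extend ReadableStore itself but hands out a separate readable view, and get/multiGet are overridden on that view. The following is only a minimal sketch under stated assumptions. SimpleReadableStore is a simplified stand-in for storehaus' ReadableStore (the real trait uses com.twitter.util.Future and a slightly different multiGet signature), and SketchBatchStore and ReadableSketch are hypothetical names that are not part of this PR.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Simplified stand-in for storehaus' ReadableStore (assumption, not the real trait).
trait SimpleReadableStore[K, V] {
  def get(k: K): Future[Option[V]]
  // Default multiGet fans out to get, one future per key.
  def multiGet(ks: Set[K]): Map[K, Future[Option[V]]] =
    ks.map(k => k -> get(k)).toMap
}

// Hypothetical batch-side store: it does not extend the readable trait.
class SketchBatchStore[K, V](backing: Map[K, V]) {
  // The readable view is a separate object; get is overridden here.
  def toReadableStore: SimpleReadableStore[K, V] =
    new SimpleReadableStore[K, V] {
      override def get(k: K): Future[Option[V]] =
        Future.successful(backing.get(k))
    }
}

object ReadableSketch {
  // Looks up one key through the readable view.
  def demo(): Option[Int] = {
    val readable = new SketchBatchStore(Map("a" -> 1)).toReadableStore
    Await.result(readable.get("a"), 1.second)
  }
}
```

With this shape the batch store keeps a single parent type, and callers that only need reads depend on the view alone.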
val scheme = new HBaseScheme(new Fields(KeyColumnName), ColumnFamily, new Fields(ValColumnName))

implicit val b2immutable: Injection[Array[Byte], ImmutableBytesWritable] =
You may be able to use this bijection instead of writing a new one.
incorporate code review feedback
I've incorporated the feedback. I removed the error check for the empty pipe in readLast. It looks like ScaldingStore's merge should handle this correctly.
}

class HBaseStore [K, V2] (quorum: Seq[String],
I am not sure why this class is needed. Can't you just do this?
val store: Store[K, V2] = HBaseByteArrayStore(quorum, table, columnFamily, valColumnName, createTable)
  .convert[K, V2](K => Array[Byte])(V2 => Array[Byte])?
See this for an example of how to convert a store.
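To illustrate the idea behind converting a byte-array store instead of writing a new class, here is a minimal sketch under stated assumptions. It does not use the storehaus convert API. InMemoryByteStore is an in-memory stand-in for something like HBaseByteArrayStore, and ConvertedStore and ConvertSketch are hypothetical names introduced only for this example.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import scala.collection.mutable
import scala.util.Try

// In-memory stand-in for a byte-array store (assumption, not HBaseByteArrayStore).
class InMemoryByteStore {
  // Keys are stored as Vector[Byte] because arrays compare by reference.
  private val data = mutable.Map.empty[Vector[Byte], Array[Byte]]
  def get(k: Array[Byte]): Option[Array[Byte]] = data.get(k.toVector)
  def put(k: Array[Byte], v: Array[Byte]): Unit = data.update(k.toVector, v)
}

// Hypothetical typed view: keys and values are converted at the boundary,
// so no store logic is duplicated.
class ConvertedStore[K, V](underlying: InMemoryByteStore)(
    keyToBytes: K => Array[Byte],
    valToBytes: V => Array[Byte],
    bytesToVal: Array[Byte] => Option[V]) {
  def get(k: K): Option[V] = underlying.get(keyToBytes(k)).flatMap(bytesToVal)
  def put(k: K, v: V): Unit = underlying.put(keyToBytes(k), valToBytes(v))
}

object ConvertSketch {
  // A String-to-Long view over the byte store, using UTF-8 text encoding.
  def stringLongStore(): ConvertedStore[String, Long] =
    new ConvertedStore[String, Long](new InMemoryByteStore)(
      k => k.getBytes(UTF_8),
      v => v.toString.getBytes(UTF_8),
      b => Try(new String(b, UTF_8).toLong).toOption)
}
```

The typed wrapper only adapts keys and values, which is why a dedicated HBaseStore class adds little over a converted byte-array store.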
This looks awesome! I’ll do that.
used converted HBaseByteArrayStore rather than new class
In addition to using a converted HBaseByteArrayStore, I also renamed main() in ScaldingRunning so the "getting started" wiki instructions weren't impacted by another main() in summingbird-example.
def toReadableStore: ReadableStore[K,V2] = {
  hbaseStore.asInstanceOf[ReadableStore[K,V2]]
I think HBaseStore already extends ReadableStore, so you don't have to do hbaseStore.asInstanceOf[ReadableStore[K,V2]].
remove extra cast
Any update on this PR? I would love to get this merged in, as I have to use HBase with SB too.
table: String)(
  implicit
  batcher: Batcher,
  injection: Injection[(K, (BatchID,V)), (Array[Byte], Array[Byte])],
Do you need this? Doesn't it come for free with the keyInj + valueInj?
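The point that a pair injection follows from a key injection and a value injection can be sketched as below. This is a minimal illustration under stated assumptions. SimpleInjection is a simplified stand-in for bijection's Injection, and pairInjection and utf8 are hypothetical helpers written for this example, not the library's own tuple support.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import scala.util.{Success, Try}

// Simplified stand-in for bijection's Injection (assumption).
trait SimpleInjection[A, B] {
  def apply(a: A): B
  def invert(b: B): Try[A]
}

object InjectionSketch {
  // Builds the (key, value) injection from the two component injections.
  def pairInjection[K, V](
      keyInj: SimpleInjection[K, Array[Byte]],
      valInj: SimpleInjection[V, Array[Byte]]
  ): SimpleInjection[(K, V), (Array[Byte], Array[Byte])] =
    new SimpleInjection[(K, V), (Array[Byte], Array[Byte])] {
      def apply(kv: (K, V)): (Array[Byte], Array[Byte]) =
        (keyInj(kv._1), valInj(kv._2))
      // Inversion succeeds only if both halves invert.
      def invert(b: (Array[Byte], Array[Byte])): Try[(K, V)] =
        for {
          k <- keyInj.invert(b._1)
          v <- valInj.invert(b._2)
        } yield (k, v)
    }

  // Hypothetical component injection used for the demonstration.
  val utf8: SimpleInjection[String, Array[Byte]] =
    new SimpleInjection[String, Array[Byte]] {
      def apply(s: String): Array[Byte] = s.getBytes(UTF_8)
      def invert(b: Array[Byte]): Try[String] = Success(new String(b, UTF_8))
    }
}
```

If the component injections are already in implicit scope, requiring a separate implicit for the pair mostly adds a parameter the caller has to satisfy by hand.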
Sorry for the slow response. @MansurAshraf I guess you have reviewed the HBase stuff. We don't use it much at Twitter, so I'm only giving a summingbird review. I don't see how this code is tracking the state of which batches this store has completed. That information needs to be available at planning time, and this code is currently just always claiming to have data. That will not be correct. That said, we do need to build some kind of framework to make it easier to test Stores.
store/fetch last processed BatchID in ZK
Added code to write/read the last processed BatchID to/from ZK. I have to confess that I don't have much experience interacting with ZK directly (I mostly use systems that use ZK), so I'm quite open to feedback on how to do that better. It does seem like the ideal way to do this would be to register a Watcher with the yet-to-be-created zookeeper WaitingState so the BatchID could be written once the Scalding job has completed successfully.
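The read/write-last-batch idea can be sketched independently of ZooKeeper. The code below is a minimal illustration under stated assumptions. SketchBatchID stands in for Summingbird's BatchID, BatchStateStore is a hypothetical interface that a ZK-backed implementation could satisfy, and InMemoryBatchState replaces ZK with a local variable. The property it shows is that the recorded batch advances only after the job for that batch reports success.

```scala
// Stand-in for Summingbird's BatchID (assumption).
final case class SketchBatchID(id: Long)

// Hypothetical interface; a ZooKeeper-backed version would implement the same two calls.
trait BatchStateStore {
  def readLast(): Option[SketchBatchID]
  def writeLast(b: SketchBatchID): Unit
}

// In-memory stand-in for the ZK node holding the last processed batch.
class InMemoryBatchState extends BatchStateStore {
  private var last: Option[SketchBatchID] = None
  def readLast(): Option[SketchBatchID] = synchronized(last)
  // Never moves backwards: older batch ids are ignored.
  def writeLast(b: SketchBatchID): Unit = synchronized {
    if (last.forall(_.id < b.id)) last = Some(b)
  }
}

object BatchRunner {
  // Runs the job for the next batch and records that batch only on success.
  def runNext(state: BatchStateStore)(job: SketchBatchID => Boolean): Option[SketchBatchID] = {
    val next = SketchBatchID(state.readLast().map(_.id + 1).getOrElse(0L))
    if (job(next)) {
      state.writeLast(next)
      Some(next)
    } else None
  }
}
```

Because readLast is cheap and side-effect free, it can be consulted at planning time, so the store does not have to claim data it has not finished writing.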
table: String)
  extends Watcher
{
  val LastBatchIDZKPath = "/summingbird/" + table + "/lastBatchID"
Maybe we want users to pass the path?
I didn't incorporate this feedback. I thought that the details of how/where the store put the state in ZK were internal details to the store and I preferred not to leak them out. If you feel strongly I can add that though.
apply code review feedback