Skip to content

creating pull request for storehaus-cassandra#211

Open
AndreasPetter wants to merge 15 commits intotwitter:developfrom
AndreasPetter:storehaus-cassandra
Open

creating pull request for storehaus-cassandra#211
AndreasPetter wants to merge 15 commits intotwitter:developfrom
AndreasPetter:storehaus-cassandra

Conversation

@AndreasPetter
Copy link
Copy Markdown

Dear Storehaus-reviewer-team,

This is a simple Hector-based Storehaus-wrapper for Cassandra.

The wrapper comprises 3 classes:

  1. CassandraKeyValueStore is a simple key-value store with a fixed column (this seems to be similar to the wrapper for Dynamo)
  2. CassandraLongStore is similar but uses Cassandra's "counter"-feature to sum up longs, creating interesting effects for put operations (which cannot be done as simple as just putting a value).
  3. CassandraCompositeWideColumnStore enables clients to decide which parts of the key should be row-key or column-key by using dynamic composite columns. It requires a Tuple2 with (row-key-list, column-key-list) (so, this is similar to what is being used in the Summingbird-example) and the relevant Cassandra-serializers. This is useful for column-slice-queries.

I ran CassandraKeyValueStore with the Summingbird-example (wordcount). Thereby I noticed the following things:

  • swapping in Summingbird might create the need to define 2 Cassandra-Storehaus "stores" in the code as the key and value types are changing which will require to allocate different serializers for row- and column-keys.
  • BatchID needs a Long-Serializer-Wrapper serializers in order to be human-readable by cassandra-cli or cqlsh. However, I cannot check this into storehaus as this would require BatchID to be part of storehaus, of course. Can we use some implicit magic?

From my point of view, while reviewing it would be really great if s.o. would consider at least the following things:

  • coding style (as i'm new to Scala)
  • implementation concepts match the concepts behind Storehaus
  • improvements for the serialization mismatch -> can we do s.th. with implicits and how?
  • copyright notice (can of course be changed, but you decide, as you will take over legal resposibilities)

Thank you very much in advance!
Regards, Andi

Comment thread project/plugins.sbt Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the eclipse plugin, you can add them locally if you want them for your ide

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious why we need this

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, removed it.

@rubanm
Copy link
Copy Markdown
Contributor

rubanm commented Feb 6, 2014

Also a note to add some tests. Travis lets you spin up a Cassandra instance:
http://docs.travis-ci.com/user/database-setup/#Cassandra

There are tests for redis, memcache, mongodb and others if you're looking for examples.

@alexflav23
Copy link
Copy Markdown

@AndreasPetter Are you blocked by a 2.11 release of phantom?

@AndreasPetter
Copy link
Copy Markdown
Author

@alexflav23 Not at the moment, as the build comprises 2.9.3 and 2.10 right now. However it might come up sooner than i expected, please see #235 for twitter's time schedule.

Comment thread project/Build.scala
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like these two are unused. Can you omit them and maybe add these later with the cascading tap change.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Ruban, thanks for all your comments. Sorry for the delayed answer, but i hoped to complete the cascading stuff earlier, but we are still working on it. All updates i will mention will be comprised in the Cascading PR - sorry for that, but it would make life a bit too difficult for me to switch and merge branches all day, while sharing results with my colleague. So yes, this will come with the StorehausTap (contains an update to this PR, too).

@rubanm
Copy link
Copy Markdown
Contributor

rubanm commented Aug 23, 2014

@AndreasPetter thanks for the refresh. Definitely looking more typesafe :)

@pkallos
Copy link
Copy Markdown

pkallos commented Feb 20, 2015

👍

@debasish83
Copy link
Copy Markdown

I would like to use it for cassandra access through storehaus.. .will it be merged to master soon ?

@johnynek
Copy link
Copy Markdown
Contributor

sorry this slipped through the cracks. We are interested in merging, but there are unaddressed comments and it is not a clean diff to master.

Would you like to take this up and take it over the finish line @debasish83 ?

@AndreasPetter
Copy link
Copy Markdown
Author

Sorry, but no, it is not dead. You can find the current state at
https://github.com/AndreasPetter/storehaus together with some idea from
ruban to use storehaus stores as taps for cascading.
Https://github.com/zirpins/summingbird provides usage options for
summingbird and scalding with versioning in Cassandra itself. However,
right now we are still not up to master yet. This is coming for my parts as
soon as I will fix a minor performance bug which comes up if the Cassandra
stores are used with Storehausoutputformat in cascading (and therefore
upper layers also). Ideas how to fix this are highly welcome...
Am 15.07.2015 16:15 schrieb "P. Oscar Boykin" notifications@github.com:

sorry this slipped through the cracks. We are interested in merging, but
there are unaddressed comments and it is not a clean diff to master.

Would you like to take this up and take it over the finish line
@debasish83 https://github.com/debasish83 ?


Reply to this email directly or view it on GitHub
#211 (comment).

@debasish83
Copy link
Copy Markdown

I am confused why is Cascading showing up in Cassandra StoreHaus ? Cassandra does not use any Cascading right and StoreHaus should not be using it either

@AndreasPetter
Copy link
Copy Markdown
Author

Storehaus-Cassandra can and originally was intended to be used without
cascading. Actually, the idea with storehaus-cascading is to provide
storehaus stores as taps. So this is more abstract than just a Cassandra
only solution. As a side effect you get a flexible Cassandra tap. The
current version of Cassandra-cascading is searching for maintainers and
doesn't support collection types, so the storehaus taps even add
flexibility to what's possible with cascading. You'll find some
documentation and tutorial like examples under the aforementioned URL in
the storehaus-cascading directory including some examples for
storehaus-Cassandra and storehaus-memcache. However afaict the (minor)
performance bug only affects the combination of Cassandra and cascading. In
the end if you want some summingbird job which is using e.g. Cassandra for
online and batching you might get some benefit by not only using the same
job-function but also the same stores.
Am 15.07.2015 19:04 schrieb "Debasish Das" notifications@github.com:

I am confused why is Cascading showing up in Cassandra StoreHaus ?
Cassandra does not use any Cascading right and StoreHaus should not be
using it either


Reply to this email directly or view it on GitHub
#211 (comment).

@rubanm
Copy link
Copy Markdown
Contributor

rubanm commented Jul 27, 2015

@AndreasPetter It should be possible to merge just storehaus-cassandra stores, and add storehaus-cascading and related cassandra tap in subsequent PRs right. Anything that stops us from doing that? Ideally, I think these should be done as separate PRs anyway to keep the PR size in check, smaller logical changes etc.

@AndreasPetter
Copy link
Copy Markdown
Author

Ok. I'll try to make 3 PRs. Right now only merging the stuff up to master
(and most notably modifying Build.scala, which will include searching and
adding stuff for scala 2.11) prevents me from doing so. A little bit off
topic: i found the "performance bug" - adding sorted data to a single
partition is not what Cassandra has been designed for. Do you have a hint
for me how to specify any non-standard-Ordering in Summingbird such that it
gets propagated down into Scalding?

2015-07-27 22:01 GMT+02:00 Ruban Monu notifications@github.com:

@AndreasPetter https://github.com/AndreasPetter It should be possible
to merge just storehaus-cassandra stores, and add storehaus-cascading and
related cassandra tap in subsequent PRs right. Anything that stops us from
doing that? Ideally, I think these should be done as separate PRs anyway to
keep the PR size in check, smaller logical changes etc.


Reply to this email directly or view it on GitHub
#211 (comment).

@carwilki
Copy link
Copy Markdown

carwilki commented Sep 2, 2015

Just a bump to see if there has been any status change on this.

@AndreasPetter
Copy link
Copy Markdown
Author

Yes and no, sorry. I did merge with master and fixed Build.scala just to
see that there is some work to be done :(. But don't worry, I'll try to
make the first PR this weekend as the rest of the family is on vacation.

2015-09-02 18:04 GMT+02:00 carwilki notifications@github.com:

Just a bump to see if there has been any status change on this.


Reply to this email directly or view it on GitHub
#211 (comment).

@debasish83
Copy link
Copy Markdown

I think we should not mix cascading and storehaus cassandra....storehaus can be used by compute fabrics like spark flink and summingbird including frontend apis....

@ghost ghost mentioned this pull request Sep 7, 2015
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Jul 18, 2019

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.