
using druid maha lookups as a replacement for lookups-cached-global #420

Closed
vsharathchandra opened this issue Mar 5, 2019 · 4 comments


vsharathchandra commented Mar 5, 2019

Hi,
We are currently using the lookups-cached-global extension for loading lookups in Druid (version 0.12.3). We load around 50-100 lookups from several MSSQL and MySQL servers, of which the top 10 have around 10-15 million entries each. Because the lookups are so large, we run into a lot of issues (long GC pauses, inability to query) while loading them on historicals and brokers. So I would like to use your extension as a replacement for lookups-cached-global.
Are there any queries that could be affected?
Do you support extracting lookups from MySQL servers?

patelh (Collaborator) commented Mar 12, 2019

Of your 50-100 lookups, how many have the same key?

How long does it currently take to load the lookups?

You could convert your lookups to RocksDB-based lookups, where you create a new snapshot once a day and publish updates via Kafka. This would require you to build a new RocksDB instance once a day, zip it up, and publish it to HDFS. It also means you would need a daemon process to do change data capture and publish the updated or new rows to Kafka; a sketch of that pipeline follows.
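A minimal sketch of the two halves of that pipeline, assuming a full dump of the lookup table is available as a map. The snapshot directory, Kafka broker address, topic name, and value encoding are placeholders for illustration, and the zip/HDFS-upload step and maha's actual snapshot and update formats are not shown:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Properties;

public class LookupSnapshotBuilder {

    // Daily job: build a fresh RocksDB instance from a full dump of the lookup table.
    static void buildSnapshot(Map<String, String> fullDump, String dir) throws RocksDBException {
        RocksDB.loadLibrary();
        try (Options opts = new Options().setCreateIfMissing(true);
             RocksDB db = RocksDB.open(opts, dir)) {
            for (Map.Entry<String, String> e : fullDump.entrySet()) {
                db.put(e.getKey().getBytes(StandardCharsets.UTF_8),
                       e.getValue().getBytes(StandardCharsets.UTF_8));
            }
        }
        // Next steps (not shown): zip `dir` and publish the archive to HDFS.
    }

    // CDC daemon: between snapshots, publish updated or new rows to Kafka.
    static void publishUpdate(String key, String value) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092"); // placeholder broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("lookup-updates", key, value)); // topic name is a placeholder
        }
    }
}
```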

In your 50-100 lookups, if many share the same key, you could replace them with our JDBC lookup, since it allows multiple values to be loaded in one lookup, saving duplication of key space. E.g., with lookups-cached-global you have one key to one value: Map(a -> aa, b -> bb) and Map(a -> 123, b -> 456); our JDBC lookup allows just one lookup: Map(a -> (aa, 123), b -> (bb, 456)). At query time, you just specify which column you want in the extraction function.
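To illustrate the key-space savings (this is just the data shape, not maha's actual API), here is how two single-value lookups that share keys collapse into one multi-value lookup, with the column selected at query time:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiValueLookupExample {
    public static void main(String[] args) {
        // Two lookups-cached-global style maps, duplicating the key space:
        Map<String, String> names = Map.of("a", "aa", "b", "bb");
        Map<String, String> codes = Map.of("a", "123", "b", "456");

        // One JDBC-style multi-value lookup: each key maps to a row of columns.
        Map<String, List<String>> combined = new HashMap<>();
        names.forEach((k, v) -> combined.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        codes.forEach((k, v) -> combined.computeIfAbsent(k, x -> new ArrayList<>()).add(v));

        // The extraction function then picks a column, e.g. index 1 ~ the "codes" column:
        System.out.println(combined.get("a").get(1)); // prints 123
    }
}
```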

vsharathchandra (Author) commented

We haven't properly monitored the loading time. For one large lookup (around 10 million entries), it takes around 45 minutes.

patelh (Collaborator) commented Apr 16, 2019

@vsharathchandra might be easier to talk about this on gitter or hangouts

vsharathchandra (Author) commented

Okay sure, will contact you on gitter.
