
using druid maha lookups as a replacement for lookups-cached-global #420

Closed
vsharathchandra opened this issue Mar 5, 2019 · 4 comments


vsharathchandra commented Mar 5, 2019

Hi,
We are currently using the lookups-cached-global extension for loading lookups in Druid (version 0.12.3). We load around 50-100 lookups from several MSSQL and MySQL servers, of which the top 10 have around 10-15 million entries each. Because the lookups are so large, we run into a lot of issues (long GC pauses, inability to query) while loading them on historicals and brokers. So I would like to use your extension as a replacement for lookups-cached-global.
Are there any queries that could be affected?
Do you support extracting lookups from MySQL servers?

patelh (Collaborator) commented Mar 12, 2019

Of your 50-100 lookups, how many have the same key?

How long does it currently take to load the lookups?

You could convert your lookups to RocksDB-based lookups, where you create a new snapshot once a day and publish updates via Kafka. This would require you to build a new RocksDB instance once a day, zip it up, and publish it to HDFS. It also means you would need a daemon process to do change data capture and publish the updated or new rows to Kafka; a sketch of that pipeline follows.
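A minimal sketch of the two halves of that pipeline, assuming a full dump of the lookup table is available as a map. The snapshot directory, Kafka broker address, topic name, and value encoding are placeholders for illustration, and the zip/HDFS-upload step and maha's actual snapshot and update formats are not shown:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Properties;

public class LookupSnapshotBuilder {

    // Daily job: build a fresh RocksDB instance from a full dump of the lookup table.
    static void buildSnapshot(Map<String, String> fullDump, String dir) throws RocksDBException {
        RocksDB.loadLibrary();
        try (Options opts = new Options().setCreateIfMissing(true);
             RocksDB db = RocksDB.open(opts, dir)) {
            for (Map.Entry<String, String> e : fullDump.entrySet()) {
                db.put(e.getKey().getBytes(StandardCharsets.UTF_8),
                       e.getValue().getBytes(StandardCharsets.UTF_8));
            }
        }
        // Next steps (not shown): zip `dir` and publish the archive to HDFS.
    }

    // CDC daemon: between snapshots, publish updated or new rows to Kafka.
    static void publishUpdate(String key, String value) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092"); // placeholder broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("lookup-updates", key, value)); // topic name is a placeholder
        }
    }
}
```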

In your 50-100 lookups, if many share the same key, you could replace them with our JDBC lookup, since it allows multiple values to be loaded in one lookup, saving duplication of key space. E.g., with lookups-cached-global you have one key to one value: Map(a -> aa, b -> bb) and Map(a -> 123, b -> 456); our JDBC lookup allows just one lookup: Map(a -> (aa, 123), b -> (bb, 456)). At query time, you just specify which column you want in the extraction function.
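To illustrate the key-space savings (this is just the data shape, not maha's actual API), here is how two single-value lookups that share keys collapse into one multi-value lookup, with the column selected at query time:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiValueLookupExample {
    public static void main(String[] args) {
        // Two lookups-cached-global style maps, duplicating the key space:
        Map<String, String> names = Map.of("a", "aa", "b", "bb");
        Map<String, String> codes = Map.of("a", "123", "b", "456");

        // One JDBC-style multi-value lookup: each key maps to a row of columns.
        Map<String, List<String>> combined = new HashMap<>();
        names.forEach((k, v) -> combined.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        codes.forEach((k, v) -> combined.computeIfAbsent(k, x -> new ArrayList<>()).add(v));

        // The extraction function then picks a column, e.g. index 1 ~ the "codes" column:
        System.out.println(combined.get("a").get(1)); // prints 123
    }
}
```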

vsharathchandra (Author) commented

We haven't properly monitored the loading time. For one large lookup (around 10 million entries), it takes around 45 minutes.

patelh (Collaborator) commented Apr 16, 2019

@vsharathchandra might be easier to talk about this on gitter or hangouts

vsharathchandra (Author) commented

Okay sure, will contact you on gitter.
