How to get the data? #1

Open
ciro-maciel opened this issue Oct 23, 2017 · 9 comments

@ciro-maciel

@tristanls, I liked your project, congratulations!

I'm looking at your documentation and trying to understand the architecture. I'm not finding anything on how to get the data.

How to get the data?

Thank you

Ciro

@tristanls
Owner

tristanls commented Oct 23, 2017

Hey Ciro,

I never got around to implementing consolidation of the data intervals, or reading from the written data intervals. The data is stored in LevelDB-backed storage; that's what alldata-storage-leveldb uses. In particular, the storage is created here: https://github.com/tristanls/alldata-storage-leveldb/blob/master/index.js#L344 and written to here: https://github.com/tristanls/alldata-storage-leveldb/blob/master/index.js#L361. Reading the data would involve opening the underlying store using levelup.open and then reading from it, probably via levelup.createReadStream.

In case you're curious, there is a similar system that has been implemented end-to-end that I saw a presentation on. It has similar design elements as alldata, and is called OK Log. Check it out: OK Log video, repo link here.

I hope this helps.

Cheers,

Tristan

Edit: Added reference to levelup.createReadStream to demo streaming all the data from an interval, once opened.
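
As a rough illustration of the read path described above, here is a minimal sketch. It assumes only that the opened store exposes `createReadStream(options)` and emits `data` events carrying `{ key, value }` objects, as levelup does. The helper name `readInterval` and the path in the usage comment are hypothetical; they are not part of alldata or alldata-storage-leveldb.

```javascript
'use strict';

// Collect every entry emitted by a levelup-style read stream.
// Assumption: `db` exposes createReadStream(options) and the stream
// emits 'data' events with { key, value } objects, then 'end'.
function readInterval(db, options = {}) {
  return new Promise((resolve, reject) => {
    const entries = [];
    db.createReadStream(options)
      .on('data', (entry) => entries.push(entry))
      .on('error', reject)
      .on('end', () => resolve(entries));
  });
}

// Hypothetical usage. How the store is opened depends on the levelup
// version in use, and the path below is a placeholder:
//   const db = /* open the interval's LevelDB store with levelup */;
//   readInterval(db, { limit: 100 }).then((entries) => console.log(entries));
```

Buffering everything into an array keeps the sketch short; for a large interval it would probably be better to process entries as they arrive instead of collecting them.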

@tristanls
Owner

tristanls commented Oct 23, 2017

For a visual explanation of the architecture alldata implements, you can see a slide show explaining high-level concepts, starting here.

Edit: Updated link (accidentally linked to start of presentation instead of specific section).

@ciro-maciel
Author

Hello @tristanls,

I'm going to study the material you've sent and I'll return.

Thanks for the answer!

Ciro

@ciro-maciel
Author

Hello @tristanls,

Thank you for your point of view and the ample and detailed material, fantastic!

I believe I have understood how alldata works; its documentation is great.

I have some questions/ideas and would like your opinion on them:

  • For searching the data, do you think the best approach is to search the local LevelDB instance directly?
  • For full-text searches, would you create the indexes in these local instances and run the search there?

Ciro

@tristanls
Owner

Hello,

I think if alldata were fully implemented using the ideas described in the slides I linked before, there might end up being no room for indices on the machines that store the data. I would expect those machines to eventually become full and migrate themselves into a read-only cluster. However, that only implies that persistent storage is "full"; memory might still be available, and if your indices fit into memory then that might work.

Another consideration is that the data in alldata is stored in-order, so sequential access via something like levelup.createReadStream would probably be most "performant" (for some arbitrary definition of "performant"). Doing random access/searches against in-order data might be less "performant". Then again, if you created indexes in memory, it might work just fine.
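
To make the in-memory index idea concrete, here is a small sketch that builds an inverted index (term to keys) in a single in-order pass over entries, such as those produced by a sequential read stream. The function names and the naive tokenizer are assumptions for illustration only; nothing like this exists in alldata.

```javascript
'use strict';

// Build an in-memory inverted index (term -> Set of keys) from entries
// visited in storage order, e.g. those read sequentially from the store.
function buildIndex(entries) {
  const index = new Map();
  for (const { key, value } of entries) {
    // Naive tokenizer (an assumption): lowercase, split on non-alphanumerics.
    const terms = String(value)
      .toLowerCase()
      .split(/[^a-z0-9]+/)
      .filter(Boolean);
    for (const term of terms) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(key);
    }
  }
  return index;
}

// Return the keys whose values contain the given term, in insertion order.
function search(index, term) {
  return Array.from(index.get(term.toLowerCase()) || []);
}
```

A search then costs one map lookup instead of a scan, at the price of holding the whole index in memory, which is the trade-off discussed above.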

Cheers,

Tristan

@tristanls
Owner

Here's some info about the performance of LevelDB that is more reliable than my opinion :)
https://github.com/google/leveldb#performance

@ciro-maciel
Author

Understood @tristanls,

The numbers on LevelDB performance are interesting!

Your presentation is important, very instructive.

I found this mechanism (DHTs, bucket storage) very interesting; once I have more time I will study these structures further.

Thank you.

Ciro

@tristanls
Owner

Regarding bucket storage, here's an implementation that's used in a bunch of production DHTs: k-bucket. As of v3.0.3, it is optimized to use less heap, but if you look at the code prior to v3.0.3, it is a very literal implementation of the presentation slides.

@tristanls
Owner

By the way,

I'm happy to hear you find these helpful, thanks for letting me know.

Cheers,

Tristan
