Add entities to indices #103

Open
quoll opened this issue Nov 30, 2020 · 2 comments
Labels
enhancement (New feature or request), in progress (Ticket currently being worked on)

Comments

@quoll
Contributor

quoll commented Nov 30, 2020

Rebuilding entities is time-consuming, particularly for large structures. We have had success caching them outside of the store, but this could also be done inside it.

For in-memory storage, we can add a new index that maps entity nodes (the :db/id) to their entities. Most changes will be made by modifying an existing structure that was already added, so there is likely to be a lot of structural sharing. The main exception will be data re-acquired through APIs, which arrives as new structures. It's possible to look for diffs between structures, but this would take a lot of effort and should only be considered if we see significant memory use.
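As a minimal sketch of the in-memory index (in Python for illustration; the class and method names are hypothetical, not the store's API), an update builds a new top-level map from the old one, so unchanged nested values are shared with the previous version rather than rebuilt. Python dicts are mutable, so this sharing is only safe if callers treat stored entities as immutable, which Clojure's persistent maps guarantee.

```python
# Hypothetical sketch: an in-memory index from entity node id to entity.
# Names are illustrative only, not the store's real API.


class EntityIndex:
    def __init__(self):
        self._by_id = {}  # entity node id -> entity (a dict)

    def put(self, node_id, entity):
        """Store (or replace) the whole entity for a node."""
        self._by_id[node_id] = entity

    def get(self, node_id):
        """Return the cached entity, or None if the node was never indexed."""
        return self._by_id.get(node_id)

    def update(self, node_id, **changes):
        """Build a new version of an entity from its existing version.

        The new top-level dict is a shallow copy, so nested values that are
        not changed are shared with the old version (a rough stand-in for
        Clojure's structural sharing). Returns (old, new).
        """
        old = self._by_id[node_id]
        new = {**old, **changes}
        self._by_id[node_id] = new
        return old, new


idx = EntityIndex()
idx.put(1, {"name": "Alice", "address": {"city": "Springfield"}})
old, new = idx.update(1, name="Alicia")
```

After the update, new["address"] is the very same object as old["address"], so only the top-level map was copied.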

For on-disk storage, we can write entities to an append-only file, using a serialization format such as Fressian. To avoid adding a new index, the SPO index can accept an internal predicate that connects entity IDs to the location of their latest serialization, in the same way that data in the data pool is referenced. This new predicate will be filtered out of triple results.
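A rough sketch of that on-disk layout, assuming a nested dict stands in for the SPO index and JSON stands in for Fressian. The predicate name :internal/entity-location and all class and method names are hypothetical. Each write appends a length-prefixed record and repoints the internal predicate at the new offset, and triple queries skip that predicate.

```python
import io
import json

# Hypothetical internal predicate linking an entity node to the byte offset
# of its latest serialization in the append-only entity file.
ENTITY_LOCATION = ":internal/entity-location"


class EntityStore:
    def __init__(self, stream):
        self._file = stream  # append-only binary stream
        self._spo = {}       # subject -> predicate -> set of objects

    def add_triple(self, s, p, o):
        self._spo.setdefault(s, {}).setdefault(p, set()).add(o)

    def write_entity(self, node_id, entity):
        """Append a length-prefixed record and point the index at it."""
        payload = json.dumps(entity).encode("utf-8")  # stand-in for Fressian
        self._file.seek(0, io.SEEK_END)
        offset = self._file.tell()
        self._file.write(len(payload).to_bytes(4, "big") + payload)
        # Replace any earlier location so the index refers to the latest copy.
        self._spo.setdefault(node_id, {})[ENTITY_LOCATION] = {offset}
        return offset

    def read_entity(self, node_id):
        (offset,) = self._spo[node_id][ENTITY_LOCATION]
        self._file.seek(offset)
        size = int.from_bytes(self._file.read(4), "big")
        return json.loads(self._file.read(size).decode("utf-8"))

    def triples(self, s):
        """User-visible triples for a subject, hiding the internal predicate."""
        for p, objs in self._spo.get(s, {}).items():
            if p == ENTITY_LOCATION:
                continue
            for o in objs:
                yield (s, p, o)


store = EntityStore(io.BytesIO())
store.add_triple(1, ":name", "Alice")
first = store.write_entity(1, {"name": "Alice"})
second = store.write_entity(1, {"name": "Alicia"})
```

Older records stay in the file; only the index entry moves, which keeps the file strictly append-only.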

All of this can be handled by the existing APIs, which would access the modified data structures.

@quoll added the enhancement (New feature or request) label on Nov 30, 2020
@quoll self-assigned this on Nov 30, 2020
@quoll
Contributor Author

quoll commented Dec 1, 2020

Things to consider:

  • Fressian works fine, and returns a ByteBuffer with the encoded data. Looking at its code, it doesn't do much beyond what our own encoding does (and its output is no smaller), with the exception of collections. If we extend our codec to handle collections, we can avoid this dependency.
  • Entity updates will need to be translated into calls to assoc/dissoc.
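One way the second point might look, sketched in Python with hypothetical names: each asserted or retracted triple for an entity becomes an assoc or dissoc on the cached entity map. This assumes single-valued attributes; multi-valued attributes would need the cached value to be a collection.

```python
# Hypothetical sketch: keep a cached entity in sync with triple-level changes
# by translating each change into an assoc or dissoc on the entity map.
# Assumes single-valued attributes.


def assoc(m, k, v):
    """Return a new dict with k mapped to v, leaving m untouched (like assoc)."""
    return {**m, k: v}


def dissoc(m, k):
    """Return a new dict without k, leaving m untouched (like dissoc)."""
    return {key: val for key, val in m.items() if key != k}


def apply_change(entity, change):
    """Apply one ("add" | "retract", predicate, object) change to an entity."""
    op, pred, obj = change
    if op == "add":
        return assoc(entity, pred, obj)
    if op == "retract":
        # Only drop the attribute if the retracted value is the current one.
        if entity.get(pred) == obj:
            return dissoc(entity, pred)
        return entity
    raise ValueError(f"unknown operation: {op!r}")


def apply_changes(entity, changes):
    for change in changes:
        entity = apply_change(entity, change)
    return entity


before = {"name": "Alice", "age": 30}
after = apply_changes(before, [("retract", "age", 30), ("add", "name", "Alicia")])
```

Because assoc and dissoc return new maps, the previous version of the cached entity remains valid for any reader still holding it.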

@quoll added the in progress (Ticket currently being worked on) label on Mar 18, 2021
@quoll
Contributor Author

quoll commented Jun 11, 2021

Serialization is done. I'm currently working on getting the byte offsets for identified sub-entities.
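To illustrate what recording byte offsets for identified sub-entities could involve, here is a sketch using a toy tagged binary encoding (not Fressian and not the project's codec; the "db/id" key and all function names are assumptions). While encoding, the encoder notes where each map carrying an id starts, so that sub-entity can later be decoded directly from its offset without reading the whole entity.

```python
import struct

# Hypothetical key marking an identified (sub-)entity; stands in for :db/id.
ID_KEY = "db/id"


def encode(value, buf, offsets):
    """Append a tagged encoding of value to buf (a bytearray).

    When a map carrying ID_KEY is encountered, the byte offset where its
    encoding starts is recorded in offsets under that id.
    """
    if isinstance(value, dict):
        if ID_KEY in value:
            offsets[value[ID_KEY]] = len(buf)
        buf.extend(b"m" + struct.pack(">I", len(value)))
        for k, v in value.items():
            encode(k, buf, offsets)
            encode(v, buf, offsets)
    elif isinstance(value, list):
        buf.extend(b"l" + struct.pack(">I", len(value)))
        for item in value:
            encode(item, buf, offsets)
    elif isinstance(value, int) and not isinstance(value, bool):
        buf.extend(b"i" + struct.pack(">q", value))
    elif isinstance(value, str):
        data = value.encode("utf-8")
        buf.extend(b"s" + struct.pack(">I", len(data)) + data)
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")


def decode(buf, pos=0):
    """Decode one value starting at byte offset pos; return (value, next_pos)."""
    tag = bytes(buf[pos:pos + 1])
    pos += 1
    if tag == b"i":
        return struct.unpack_from(">q", buf, pos)[0], pos + 8
    if tag == b"s":
        (n,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        return bytes(buf[pos:pos + n]).decode("utf-8"), pos + n
    if tag == b"l":
        (n,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        items = []
        for _ in range(n):
            item, pos = decode(buf, pos)
            items.append(item)
        return items, pos
    if tag == b"m":
        (n,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        result = {}
        for _ in range(n):
            k, pos = decode(buf, pos)
            v, pos = decode(buf, pos)
            result[k] = v
        return result, pos
    raise ValueError(f"unknown tag {tag!r} at offset {pos - 1}")


def serialize(entity):
    """Encode an entity, returning (bytes, {sub-entity id: byte offset})."""
    buf, offsets = bytearray(), {}
    encode(entity, buf, offsets)
    return bytes(buf), offsets


entity = {"db/id": 1, "name": "Alice",
          "address": {"db/id": 2, "city": "Springfield"}}
data, offsets = serialize(entity)
address, _ = decode(data, offsets[2])  # read the sub-entity directly
```

In a store, these offsets would be relative to the start of the entity's record, so the absolute file position would be the record offset plus the sub-entity offset.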
