Improvements to write performance #31

Merged
merged 21 commits into from May 3, 2016

brianjlowe
Member

Submitting this work from December as a pull request, since it is a fairly substantial change and is not tied to any specific open issue.

This work was supported by the National Agricultural Library of the United States Department of Agriculture through a cooperative agreement with Cornell University and Ontocale SRL.

Highlights:

  • Incremental ABox inference changes use the same code as the ABoxRecomputer, rather than the original statement-by-statement incremental update. Real-world writes typically change multiple statements that refer to the same individual, so this usually speeds up inference overall because each individual touched by the write is recomputed only once. It also simplifies the reasoner by eliminating additional implementations of the same logic.
  • Deletion of RDF happens in batches rather than triple-by-triple.
  • Clearing graphs through the ingest tools once again takes advantage of bulk update handling; this broke at some point when the Jena library deprecated BulkUpdateHandlers.
  • Changes made by a SPARQL UPDATE API request are accumulated in memory and written in batches rather than triple-by-triple.
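
As a rough illustration of the batching idea in the last three items (not the actual Vitro code), the sketch below buffers changes in memory and hands them to a sink in fixed-size chunks instead of one triple at a time. The names `BatchedWriteSketch`, `BatchingBuffer`, and `batchSizesFor`, the string representation of a triple, and the batch size of 1000 are all hypothetical and chosen only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of batched writes; none of these names come from Vitro or Jena. */
final class BatchedWriteSketch {

    /** Accumulates items in memory and passes them to the sink one batch at a time. */
    static final class BatchingBuffer<T> {
        private final int batchSize;
        private final Consumer<List<T>> sink;
        private final List<T> pending = new ArrayList<>();

        BatchingBuffer(int batchSize, Consumer<List<T>> sink) {
            if (batchSize < 1) {
                throw new IllegalArgumentException("batchSize must be >= 1");
            }
            this.batchSize = batchSize;
            this.sink = sink;
        }

        /** Buffers one item and flushes automatically once the batch is full. */
        void add(T item) {
            pending.add(item);
            if (pending.size() >= batchSize) {
                flush();
            }
        }

        /** Sends any buffered items to the sink as a single batch. */
        void flush() {
            if (pending.isEmpty()) {
                return;
            }
            sink.accept(new ArrayList<>(pending)); // copy so the sink owns its batch
            pending.clear();
        }
    }

    /** Returns the size of each batch the sink receives for the given number of triples. */
    static List<Integer> batchSizesFor(int tripleCount, int batchSize) {
        List<Integer> sizes = new ArrayList<>();
        BatchingBuffer<String> buffer =
                new BatchingBuffer<>(batchSize, batch -> sizes.add(batch.size()));
        for (int i = 0; i < tripleCount; i++) {
            // Assumed N-Triples-like string form, used here only as placeholder data.
            buffer.add("<ex:s" + i + "> <ex:p> <ex:o> .");
        }
        buffer.flush(); // write out the final partial batch
        return sizes;
    }

    public static void main(String[] args) {
        // 2500 triples reach the sink in 3 calls rather than 2500.
        System.out.println(batchSizesFor(2500, 1000)); // prints [1000, 1000, 500]
    }
}
```

In a real store the sink would be whatever performs the bulk add or delete; the point of the sketch is only that the number of write calls scales with the number of batches rather than the number of triples.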

With this code I've observed write speedups of an order of magnitude or more compared to 1.8, depending on how much inference is required. The improvements are typically most dramatic with the SPARQL API and with deleting RDF files or clearing graphs through the GUI. The only things I observed to be slower were some of the reasoner unit tests, which operate on small sets of data purely in memory and benefited from the original triple-by-triple reasoning approach.
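
The trade-off described above can be sketched as a simple count, under loud assumptions: the class `IndividualQueueSketch` and its methods are hypothetical, only statement subjects are queued (the real reasoner may also consider objects), and the cost of a single recomputation is not modeled. The sketch only shows that a write touching few individuals with many statements needs far fewer reasoning passes when each individual is queued once.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; not the ABoxRecomputer implementation. */
final class IndividualQueueSketch {

    /**
     * Collects the distinct subjects of the changed statements.
     * Each statement is an array of {subject, predicate, object}.
     */
    static Set<String> individualsToRecompute(List<String[]> changedStatements) {
        Set<String> queue = new LinkedHashSet<>(); // de-duplicates, keeps arrival order
        for (String[] statement : changedStatements) {
            queue.add(statement[0]);
        }
        return queue;
    }

    /** Statement-by-statement updating runs one reasoning pass per changed statement. */
    static int passesPerStatement(List<String[]> changedStatements) {
        return changedStatements.size();
    }

    /** Per-individual updating runs one recomputation per distinct individual. */
    static int passesPerIndividual(List<String[]> changedStatements) {
        return individualsToRecompute(changedStatements).size();
    }

    public static void main(String[] args) {
        List<String[]> changes = Arrays.asList(
                new String[] {"ex:alice", "ex:name", "Alice"},
                new String[] {"ex:alice", "ex:email", "alice@example.org"},
                new String[] {"ex:alice", "ex:memberOf", "ex:lab1"},
                new String[] {"ex:bob", "ex:name", "Bob"},
                new String[] {"ex:bob", "ex:memberOf", "ex:lab1"});
        System.out.println(passesPerStatement(changes));  // prints 5
        System.out.println(passesPerIndividual(changes)); // prints 2
    }
}
```

When a write changes one statement about one individual, both counts are 1, so any saving disappears; that is consistent with the small in-memory unit tests not benefiting from the new approach.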

… selective recomputing, for major improvement to RDF uploading speed. A few unit tests related to retraction with sameAs aren't passing yet.
… reasoner's individual queue between concurrent writing threads
Conflicts:
	api/src/main/java/edu/cornell/mannlib/vitro/webapp/dao/jena/ABoxJenaChangeListener.java
	api/src/main/java/edu/cornell/mannlib/vitro/webapp/dao/jena/JenaChangeListener.java
	api/src/main/java/edu/cornell/mannlib/vitro/webapp/rdfservice/impl/jena/ListeningGraph.java
@grahamtriggs grahamtriggs merged commit 97658ad into vivo-project:develop May 3, 2016
chenejac pushed a commit to chenejac/Vitro that referenced this pull request Sep 29, 2022
2 participants