Improvements to write performance #31

brianjlowe · 2016-02-11T16:25:36Z

Submitting this work from December as a pull request, since it is a fairly substantial change and is not tied to any specific open issues.

This work was supported by the National Agricultural Library of the United States Department of Agriculture through a cooperative agreement with Cornell University and Ontocale SRL.

Highlights:

Incremental ABox inference changes use the same code as the ABoxRecomputer, rather than the original statement-by-statement incremental update. For typical real-world writes, where several changed statements refer to the same individual, this results in an overall inference speedup because each individual touched by the write is recomputed only once. It also simplifies the reasoner by eliminating additional implementations of the same logic.
Deletion of RDF happens in batches rather than triple-by-triple.
Clearing graphs through the ingest tools again takes advantage of the bulk update handling; this broke at one point when the Jena library deprecated BulkUpdateHandlers.
Changes made by a SPARQL UPDATE API request are accumulated in memory and written in batches rather than triple-by-triple.
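To illustrate the first highlight, here is a minimal sketch of why recomputing per individual can beat statement-by-statement inference: the changed statements are reduced to the distinct set of individuals they touch, and each of those would be recomputed once. The `Change` class and `touchedIndividuals` method are hypothetical names for illustration only; they are not the actual SimpleReasoner or ABoxRecomputer API.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; class and method names are illustrative, not Vitro's.
class RecomputeOncePerIndividual {

    /** A changed triple, reduced to plain strings for the example. */
    static final class Change {
        final String subject;
        final String predicate;
        final String object;
        final boolean objectIsResource; // false when the object is a literal

        Change(String subject, String predicate, String object, boolean objectIsResource) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.objectIsResource = objectIsResource;
        }
    }

    /**
     * Collects every individual touched by a write, without duplicates.
     * Each URI in the returned set would be recomputed exactly once,
     * no matter how many of the changed statements mention it.
     */
    static Set<String> touchedIndividuals(List<Change> changes) {
        Set<String> individuals = new LinkedHashSet<>();
        for (Change c : changes) {
            individuals.add(c.subject);
            if (c.objectIsResource) {
                individuals.add(c.object);
            }
        }
        return individuals;
    }
}
```

With three changed statements that all mention the same two individuals, the set holds two entries, so two recomputations would replace three per-statement inference passes. The gap widens as more statements in a write share an individual.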

With this code I've observed speedups to writes of an order of magnitude or more compared to 1.8, depending on how much inference is required. The improvements are typically the most dramatic with the SPARQL API and with deletion of RDF files or clearing of graphs through the GUI. The only things I observed to be slower were some of the reasoner unit tests, which operate with small sets of data purely in memory and benefited from the original triple-by-triple reasoning approach.
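The batching idea behind the deletion and SPARQL UPDATE changes can be sketched generically as follows: items accumulate in memory and are handed to the underlying store one batch at a time, so any fixed per-write cost is paid per batch rather than per triple. `BatchingWriter`, its batch size, and the `sink` callback are assumptions made for this example and do not reflect the actual RDFService or RDFServiceGraph code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of in-memory accumulation with batched writes.
class BatchingWriter<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink; // stands in for the real store write
    private final List<T> buffer = new ArrayList<>();

    BatchingWriter(int batchSize, Consumer<List<T>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffers one item and writes a batch once the buffer is full. */
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Writes whatever is buffered; call once more at the end of the request. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // copy so the sink owns its batch
        buffer.clear();
    }
}
```

A caller would `add` each changed triple as the update is processed and call `flush()` once at the end, so a trailing partial batch is not lost.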

Conflicts: api/src/main/java/edu/cornell/mannlib/vitro/webapp/dao/jena/ABoxJenaChangeListener.java api/src/main/java/edu/cornell/mannlib/vitro/webapp/dao/jena/JenaChangeListener.java api/src/main/java/edu/cornell/mannlib/vitro/webapp/rdfservice/impl/jena/ListeningGraph.java


brianjlowe added 21 commits November 23, 2015 20:47

First rough steps at replacing SimpleReasoner's incremental mode with selective recomputing, for major improvement to RDF uploading speed. A few unit tests related to retraction with sameAs aren't passing yet.

dcfd95c

5x speedup for add RDF via GUI; 25x speedup for SPARQL API writes. sameAs still needs fixing; will flunk tests.

1a1606d

reverting a small rearrangement of VClassGroupCache to avoid changing the unit test

6e99b1c

log level change

a4649a7

passes sameAs unit tests

f099260

clean up / remove unneeded methods in SimpleReasoner

5ae16cf

bulk graph clear from ingest tools

d13fdf8

add/remove ABox portion of mixed RDF through RDFService

295ddbf

change to listener interface to simplify things and avoid sharing the reasoner's individual queue between concurrent writing threads

539bffd

remove obsolete import

e85e8a4

misc. cleanup / minor fixes

17fab28

minor fixes

ace972d

remove obsolete code

ce1ec11

remove obsolete files

5d33da4

moving IndividualURIQueue to its new mavenized home

de9d406

fix to allow for batch handling of more complex SPARQL updates

6e3a256

improvements to RDFServiceGraph triple batching

64cfc4a

Merge branch 'writePerformance-1.8' into writePerformance

724d8df

move unit test to new location

e4a8c4f

Merge remote-tracking branch 'upstream/develop' into writePerformance

97658ad

grahamtriggs merged commit 97658ad into vivo-project:develop May 3, 2016

chenejac pushed a commit to chenejac/Vitro that referenced this pull request Sep 29, 2022

removed redundant file, identical to file in Vitro-reporitory, ticket vivo-1915 (vivo-project#31) Partial resolution to: https://jira.lyrasis.org/browse/VIVO-1915

16dc0d9
