New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Model operations performance improvements #347
Conversation
…y triple transactions on add/remove model Load n3 files from home directory into in-memory model to use bulk loading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@litvinovg it looks as great improvements in performance. I have tested performance, loading of models is significantly improved. Also, tried to validate whether everything is loaded in the triplestore by using SPARQL query, I have checked for academic degrees and continents, and the number of classes is the same. There is only one my comment regarding code review.
api/src/main/java/edu/cornell/mannlib/vitro/webapp/rdfservice/adapters/BulkModelCom.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@litvinovg two more my comments
@@ -18,6 +18,8 @@ | |||
import java.util.TreeSet; | |||
|
|||
import edu.cornell.mannlib.vitro.webapp.i18n.selection.SelectedLocale; | |||
import edu.cornell.mannlib.vitro.webapp.rdfservice.adapters.BulkUpdatingModel; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not used import, please remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -182,12 +184,16 @@ private static Set<Path> getPaths(String parentDir, String... strings) { | |||
private static void readOntologyFileIntoModel(Path p, Model model) { | |||
String format = getRdfFormat(p); | |||
log.debug("Loading " + p); | |||
Model memModel = ModelFactory.createDefaultModel(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why you are not using here:
Model memModel = ModelFactory.createDefaultModel(); | |
Model memModel = VitroModelFactory.createModel(); |
If we are using ModelFactory.createDefaultModel() we are not using bulk model, bulk graph, etc., right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it wouldn't have any effect on performance, so no reason to use more complicated model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great contribution!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huge change. Seems even better than pre-1.10, before the clear statements started exhibiting the extreme slowness.
I initially did main-branch timings on the content models instead of config:
Main branch, content models
Load sample data 15s
Clear statements 820s
Remove sample data 39s
Remove model 955s
This PR:
Add/clear statements/remove RDF/Remove model all within 13-18s depending on trial
Main branch, config models
All operations, total craziness
This PR:
All operations 2-3 seconds
Performance test of the PR 347Here is the performance test of this PR 347: Experimental conditionThe comparison is made between Bulk loading and Classical loading (implemented in version 1.13.0 and earlier) The comparison is performed on three target configurations:
VIVO Setup on AWS-Ec2 VM Machine
MethodologyThe goal of the tests is to calculate the startup time of VIVO, i.e. the time needed to load the ontologies at the initial startup of VIVO. Initialization
Test for bulk loading
Test For classical load:
Results
ConclusionFor all cases (TDB/Fuseki/Neptune) the 'Bulk' loading offers a real gain in loading performance and does not seem to generate instability although the present tests do not serve to test the VIVO stability. |
* created RDFServiceBulkUnionUpdater * fix and indents * use bulk updater for RDFServiceGraph in blankNodeFilteringGraph * use removeAll to optimize removal all triples from TDB modeles * avoid additional serialization/deserialization cycle of removing model chunk * fixed VitroModelFactory tests * fixes for BulkUpdating models * Created custom ModelCom, OntModelImpl, BulkGraphMem to avoid triple by triple transactions on add/remove model Load n3 files from home directory into in-memory model to use bulk loading. * refact: simplified BulkGraphMem checks * removed unused import
VIVO 3605 GitHub issue
VIVO 3792 GitHub issue
What does this pull request do?
Implemented some changes to improve performance of model modifications
What's new?
RDFSerivceBulkUnionUpdater which forwards remove(model), removeAll, add(model) calls to enclosed updaters.
RDFServiceModelMaker calls removeAll on model instead of removing triple by triple.
Removed excessive de-serialization/serialization cycle in BulkUpdatingOntModel.performRemoveAll
Created Bulk ModelCom, Bulk OntModelImpl, Bulk GraphMem to avoid triple by triple transactions on batch add/remove
How should this be tested?
Interested parties
@bkampe
@brianjlowe
@chenejac