Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimized batch_import_marc #2995

Merged
merged 8 commits into from
Jul 21, 2023

Conversation

damien-git
Copy link
Contributor

@damien-git damien-git commented Jul 17, 2023

Currently when imported files are small there is a large overhead when the JVM is loaded and SolrMarc prepares to process files. But SolrMarc supports taking multiple files as input.
This PR changes batch-import-marc.sh and import-marc.sh so that several files can be sent at once (set to 10 by default). When files only contain a few records (which happens for instance with incremental harvests), performance is much better.

TODO

  • Add equivalent Windows .bat support, if possible
  • Update changelog and relevant documentation when merging

@demiankatz
Copy link
Member

Thanks, @damien-git, that's a great idea; I did not realize that SolrMarc supported this behavior. I'm adding a TODO item to port these changes to the .bat versions of the files if possible, and I'll look into doing that when time permits.

@demiankatz demiankatz added this to the 9.1 milestone Jul 17, 2023
Copy link
Member

@demiankatz demiankatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again, @damien-git. I haven't actually tried this out yet, but I had time to give it a closer look today. I found one possible bug and had two suggestions that might or might not be helpful -- let me know what you think about them, but there's no pressure to implement if you don't think they're worth the effort.

Once you've had a chance to look at this, I'll see about working on the Windows port of the new functionality. I suspect that adding the -x functionality to batch-import-marc.bat may be more effort than it's worth due to the limitations of batch files (in which case I can just add an error message if somebody tries to pass -x, and we can spend more time if a Windows user actually cares), but I can probably at least add the multiple file support to marc-import.bat.

harvest/batch-import-marc.sh Show resolved Hide resolved
harvest/batch-import-marc.sh Outdated Show resolved Hide resolved
harvest/batch-import-marc.sh Outdated Show resolved Hide resolved
import-marc.sh Outdated Show resolved Hide resolved
@demiankatz
Copy link
Member

@damien-git, I've done a little more work on the Windows equivalent functionality -- for now, if you pass a -x flag to the batch-import-marc.bat file, it fails with a message that the feature is not supported in Windows. I suspect that this could be implemented -- probably by using a counter and stringing together the filenames inside the main loop -- but I'm not sure that it's worth the effort to bother doing the work, since I don't think we have many Windows users. Unless anyone objects, I'll just leave it at this until somebody complains.

Is there any other work you want to do on this, or is it now ready to merge from your perspective?

@demiankatz
Copy link
Member

demiankatz commented Jul 20, 2023

I've opened VUFIND-1626 to track the missing Windows functionality, so people can comment/vote there if they want it. For good measure, I've also linked to the ticket from the Windows error message, so if anyone encounters the situation, they'll know how to report their needs.

@damien-git
Copy link
Contributor Author

Is there any other work you want to do on this, or is it now ready to merge from your perspective?

I think it's ready to merge.

Copy link
Member

@demiankatz demiankatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again, @damien-git, this is a great improvement. Merging now!

@demiankatz demiankatz merged commit a2b1e19 into vufind-org:dev Jul 21, 2023
6 checks passed
bpalme pushed a commit to bpalme/vufind that referenced this pull request Aug 5, 2023
- Allows multiple files to be passed to import-marc.sh (Linux + Windows)
- Utilizes multiple file loading to improve performance of batch-import-marc.sh (Linux only)

---------

Co-authored-by: Demian Katz <demian.katz@villanova.edu>
demiankatz added a commit to demiankatz/vufind that referenced this pull request Dec 4, 2023
EreMaijala pushed a commit to EreMaijala/vufind that referenced this pull request Aug 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants