Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed a bug causing duplicate object IDs #788

Merged
merged 1 commit into from
Feb 2, 2018

Conversation

willswope
Copy link
Contributor

The ID values for new "Selection" objects were being generated by the line "String.fromCharCode(65 + Math.floor(Math.random() * 26)) + Date.now()", which led to the Auto-detection feature frequently skipping pages in large documents. The unique IDs are now simply generated by converting Math.random() to a string.

The ID values for new "Selection" objects were being generated by the line "String.fromCharCode(65 + Math.floor(Math.random() * 26)) + Date.now()", which led to the Auto-detection feature frequently skipping pages in large documents. The unique IDs are now simply generated by converting Math.random() to a string.
@jeremybmerrill
Copy link
Member

jeremybmerrill commented Feb 1, 2018

Thank you for your contribution! I'm excited that this may fix the problem you describe. This mechanism for the problem is very interesting... Do you know why the previous version was creating collision for the IDs?

@willswope
Copy link
Contributor Author

Hi Jeremy! The previous code was generating IDs by concatenating a single random alphabetical character together with the system time in milliseconds, so when multiple pages were being processed in less than a millisecond, there was a probability that two pages would have identical IDs.

The bug is easy to replicate; load up a sufficiently large document (in the several hundred pages range), use the "Repeat this selection" or Autodetect feature, and when you scroll through the document a small number of pages will be missing selections. I was processing 300-page long medicare fee schedule PDFs when I first encountered the issue, but any long document does the trick (https://www.novitas-solutions.com/webcenter/portal/MedicareJH/FeeLookup if you want some easy test material).

@jeremybmerrill
Copy link
Member

Ah, that makes sense. Thank you again for digging into this and submitting the pull request! This may fix #780.

@jeremybmerrill jeremybmerrill merged commit 5cdcfa7 into tabulapdf:master Feb 2, 2018
@jeremybmerrill
Copy link
Member

@jazzido do you remember if there's any reason we chose to do a single random letter plus a timestamp for these IDs? Or more precisely, is there a reason to add any of that back in addition to the larger random number IDs suggested here?

@jazzido
Copy link
Contributor

jazzido commented Feb 2, 2018

No idea. In any case, Math.random() should be safer than our previous solution.

@jeremybmerrill
Copy link
Member

Haha, okay cool. Thanks again @willswope!

@willswope
Copy link
Contributor Author

@jeremybmerrill I'm glad I was able to help! Thank you for working on such a great project; it's saved me many hours of work.

jeremybmerrill added a commit that referenced this pull request Jun 26, 2018
Fixed a bug causing duplicate object IDs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants