Cross reference search speed improvements [v3] #424

Merged Apr 18, 2019 (8 commits)

gunnsth (Contributor) commented Apr 17, 2019

Two notable performance enhancements:

  • xrefNextObjectOffset now uses binary search and considers only uncompressed objects (those with a file offset, not those stored inside a stream).
  • Speed up the hasObject lookup in the writer by using a map.
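
For illustration, a map-backed hasObject check might look roughly like the sketch below. This is not the code from this PR: the `writer` struct, its `objectsMap` field, `PdfObject`, `indirectObject`, and `addObject` are hypothetical names chosen for the example, and it assumes objects are tracked by pointer so they are valid map keys.

```go
package main

import "fmt"

// PdfObject is a hypothetical stand-in for the library's object type.
// Map keys must be comparable, so this sketch assumes pointer values.
type PdfObject interface{}

// indirectObject is a hypothetical concrete object used for the demo.
type indirectObject struct{ num int }

// writer is a simplified, hypothetical writer that remembers added objects.
type writer struct {
	objects    []PdfObject            // objects in insertion order
	objectsMap map[PdfObject]struct{} // set of objects already added
}

func newWriter() *writer {
	return &writer{objectsMap: make(map[PdfObject]struct{})}
}

// hasObject reports whether obj was already added. The map lookup takes
// constant time on average, instead of scanning the objects slice.
func (w *writer) hasObject(obj PdfObject) bool {
	_, found := w.objectsMap[obj]
	return found
}

// addObject appends obj unless it is already present and reports
// whether it was added.
func (w *writer) addObject(obj PdfObject) bool {
	if w.hasObject(obj) {
		return false
	}
	w.objects = append(w.objects, obj)
	w.objectsMap[obj] = struct{}{}
	return true
}

func main() {
	w := newWriter()
	a, b := &indirectObject{num: 1}, &indirectObject{num: 2}
	fmt.Println(w.addObject(a), w.addObject(b), w.addObject(a))
	fmt.Println(len(w.objects))
}
```

The slice keeps the write order while the map answers membership queries, so both have to be updated together whenever an object is added.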

These changes required turning XrefTable into a struct so that it can hold a cached list of objects for faster search.
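
A rough sketch of the struct shape, assuming a simplified entry type: `ObjectMap`, `sortedObjects`, and `Offset` are the names visible in the review thread, while `ObjectNumber`, `InStream`, and `buildSortedObjects` are hypothetical and only serve the example.

```go
package main

import (
	"fmt"
	"sort"
)

// XrefObject is a simplified, hypothetical cross-reference entry.
type XrefObject struct {
	ObjectNumber int
	Offset       int64 // byte offset in the file (uncompressed entries only)
	InStream     bool  // hypothetical flag: object lives inside a stream
}

// XrefTable wraps the object map so a cached list can live next to it.
type XrefTable struct {
	ObjectMap     map[int]XrefObject
	sortedObjects []XrefObject // uncompressed entries, ascending by Offset
}

// buildSortedObjects (hypothetical helper) rebuilds the cache from
// ObjectMap, skipping entries stored inside streams.
func (t *XrefTable) buildSortedObjects() {
	t.sortedObjects = t.sortedObjects[:0]
	for _, obj := range t.ObjectMap {
		if obj.InStream {
			continue
		}
		t.sortedObjects = append(t.sortedObjects, obj)
	}
	// Sort by offset, ascending, so the list can be binary searched.
	sort.Slice(t.sortedObjects, func(i, j int) bool {
		return t.sortedObjects[i].Offset < t.sortedObjects[j].Offset
	})
}

func main() {
	table := XrefTable{ObjectMap: map[int]XrefObject{
		1: {ObjectNumber: 1, Offset: 900},
		2: {ObjectNumber: 2, Offset: 15},
		3: {ObjectNumber: 3, InStream: true},
		4: {ObjectNumber: 4, Offset: 480},
	}}
	table.buildSortedObjects()
	for _, obj := range table.sortedObjects {
		fmt.Println(obj.ObjectNumber, obj.Offset)
	}
}
```

A plain map type has nowhere to keep such a cache, which is presumably why the table had to become a struct.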

The changes speed up the passthrough benchmark test on the build server from 38-40 seconds to 15 seconds, so it's a pretty significant improvement.

@gunnsth gunnsth added this to the v3.0.0-rc.1 milestone Apr 17, 2019
@gunnsth gunnsth requested a review from adrg April 17, 2019 20:33
codecov bot commented Apr 17, 2019

Codecov Report

Merging #424 into v3 will decrease coverage by 0.02%.
The diff coverage is 84.48%.

@@            Coverage Diff             @@
##               v3     #424      +/-   ##
==========================================
- Coverage   59.56%   59.53%   -0.03%     
==========================================
  Files         153      153              
  Lines       27683    27707      +24     
==========================================
+ Hits        16489    16495       +6     
- Misses      10807    10836      +29     
+ Partials      387      376      -11
Impacted Files           Coverage Δ
pdf/model/writer.go      81.29% <100%> (+0.21%) ⬆️
pdf/core/utils.go        42.06% <25%> (ø) ⬆️
pdf/core/repairs.go      27.81% <50%> (+0.29%) ⬆️
pdf/core/crossrefs.go    75.4% <66.66%> (ø) ⬆️
pdf/core/parser.go       76.21% <93.93%> (-0.03%) ⬇️
pdf/model/reader.go      65.19% <0%> (-3.19%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 69ab100...eef9d24.

Collaborator

@adrg adrg left a comment

The code looks good. Left a couple of minor observations.

- parser.xrefs[objNum] = obj
- common.Log.Trace("entry: %+v", parser.xrefs[objNum])
+ parser.xrefs.ObjectMap[objNum] = obj
+ common.Log.Trace("entry: %+v", parser.xrefs.ObjectMap[objNum])

Collaborator

Maybe just print obj instead of reading it from the map after adding it:

common.Log.Trace("entry: %+v", obj)

}

i := sort.Search(len(parser.xrefs.sortedObjects), func(i int) bool {

Collaborator

This seems to be equivalent to:

	for _, obj := range parser.xrefs.sortedObjects {
		if obj.Offset >= offset {
			return obj.Offset
		}
	}
	return 0

Contributor Author

Yes, except that sort.Search uses binary search, which takes O(log n) steps on sorted data instead of the O(n) of a linear scan.
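
To make the equivalence concrete, here is a small sketch that puts the two variants side by side. It works on a plain slice of offsets rather than the parser's types, and the function names `nextOffsetLinear` and `nextOffsetBinary` are made up for the example. The predicate passed to sort.Search is an assumption, since the snippet under review is cut off after the opening line.

```go
package main

import (
	"fmt"
	"sort"
)

// nextOffsetLinear mirrors the loop from the review comment: it returns
// the first offset >= offset, or 0 when there is none.
func nextOffsetLinear(sorted []int64, offset int64) int64 {
	for _, o := range sorted {
		if o >= offset {
			return o
		}
	}
	return 0
}

// nextOffsetBinary gives the same answer with sort.Search, which needs
// the slice to be sorted in ascending order.
func nextOffsetBinary(sorted []int64, offset int64) int64 {
	i := sort.Search(len(sorted), func(i int) bool {
		return sorted[i] >= offset
	})
	if i < len(sorted) {
		return sorted[i]
	}
	// sort.Search returns len(sorted) when no element satisfies the predicate.
	return 0
}

func main() {
	offsets := []int64{15, 120, 480, 900}
	for _, q := range []int64{0, 100, 480, 901} {
		fmt.Println(q, nextOffsetLinear(offsets, q), nextOffsetBinary(offsets, q))
	}
}
```

Both functions agree only while the slice is sorted ascending. On unsorted input the loop still finds the first match in slice order, but sort.Search may return any index.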

}

// Sort by offset, descending.

Collaborator

Seems to be ascending offset order.

Contributor Author

Corrected.

@gunnsth gunnsth merged commit 6e7b575 into v3 Apr 18, 2019
@gunnsth gunnsth deleted the v3-xref-perf-improvements branch April 18, 2019 19:22