Better forward search #244

aikrahguzar · 2023-11-15T10:25:12Z

I have been looking to improve the synctex experience with pdf-tools and auctex, especially with my preferred style of writing every paragraph on one line and using visual-line-mode to reflow the text. Fixing backward search turned out not to be too difficult (see #242), but making forward search more accurate seems harder.

Basically, the situation seems to be that although synctex is theoretically capable of providing an accurate column number, it needs the TeX engines to provide this information, which none of them do. So it only provides line-level information. However, a given source line can correspond to multiple lines in the PDF (and vice versa), and in that case synctex returns multiple results for the query, leaving it to the editor to somehow choose the best one.

However, pdf-tools only gives access to the first result of a synctex forward search. I don't know C, but I think that is happening here:
https://github.com/vedang/pdf-tools/blob/c69e7656a4678fe25afbd29f3503dd19ee7f9896/server/epdfinfo.c#L3188C21-L3188C21
This seems to correspond to only a single, or occasionally two, of the PDF lines for the source line, but not all of them. Is someone who knows C and wants better forward search willing to do either of the following?

Change cmd_synctex_forward_search in epdfinfo.c so that it returns the edges of the bounding box of the whole PDF region corresponding to the source line. My guess (I am not sure) is that this would simply be the union of all the rectangles in the individual search results. Some care would be needed when the paragraph gets split across pages.
Probably easier, and backward compatible: add a new function to epdfinfo.c that returns the whole list of search results and expose it to Lisp, so that the region can be determined from Lisp.
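
The "union of all the rectangles" idea could look roughly like the sketch below. It is written in Python only to show the geometry; it is not the epdfinfo.c code, and the `Rect` record, its field names, and `union_by_page` are all hypothetical. It assumes each search result carries a page number and box edges with `top` smaller than `bottom`, and it keeps one box per page so that a paragraph split across pages is not merged into a single meaningless rectangle.

```python
from collections import defaultdict
from typing import NamedTuple


class Rect(NamedTuple):
    """Hypothetical search result: page number plus box edges in page coordinates."""
    page: int
    left: float
    top: float
    right: float
    bottom: float


def union_by_page(rects):
    """Return one bounding box per page that encloses every rectangle on that page."""
    grouped = defaultdict(list)
    for r in rects:
        grouped[r.page].append(r)
    return [
        Rect(
            page,
            min(r.left for r in rs),
            min(r.top for r in rs),
            max(r.right for r in rs),
            max(r.bottom for r in rs),
        )
        for page, rs in sorted(grouped.items())
    ]
```

With the second option, this kind of merging would live on the Lisp side, operating on the full result list returned by the server.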

With this change in the C code, I think I can use techniques similar to those used for backward search, and those used in pdf-isearch for highlighting, to get word-level accuracy. But without a good bound on the PDF region to search, it is hard to get good enough performance.


aikrahguzar · 2023-11-18T21:22:33Z

I have managed to implement a heuristic refinement of forward search in aikrahguzar@e25ae22

Since a Smith-Waterman type of alignment on a whole page is too slow, it is a regexp-from-hell kind of heuristic, and it is more likely to fail than the one for backward search. The failure modes are:

Lots of math
Paragraphs crossing pages in the presence of fancy header/footer. Plain pages decorated with just page numbers should be fine.
Lines consisting just of macros.

In these cases, the heuristic should realize its defeat and fall back to the result provided by synctex, so things should be no worse than before.

However, there is one more failure mode: two lines on a PDF page that are too similar except for math and text in macros. In that case the first line will get used, and the results will be worse than without the heuristic. What I suggested in the earlier comment can help with this scenario.
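
To illustrate why a region bound would help here, the following is a minimal sketch, not the code in the commit above. The names `normalize` and `pick_line`, the `(y, text)` line representation, and the `(y_min, y_max)` region are all assumptions made for the example. Unlike the heuristic described above, which uses the first similar line, this sketch gives up when the match is ambiguous, and a bound derived from the synctex results is what makes the match unique again.

```python
import re


def normalize(text):
    """Lowercase and keep only alphanumeric words so markup differences matter less."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def pick_line(source_text, pdf_lines, region=None):
    """Return the index of the unique PDF line containing source_text, or None.

    pdf_lines is a list of (y, text) pairs; region is an optional (y_min, y_max)
    bound. None means "give up and use the plain synctex result".
    """
    wanted = normalize(source_text)
    if not wanted:
        return None
    hits = [
        i
        for i, (y, text) in enumerate(pdf_lines)
        if (region is None or region[0] <= y <= region[1])
        and wanted in normalize(text)
    ]
    # Zero hits or several hits both count as defeat.
    return hits[0] if len(hits) == 1 else None
```

For two identical lines on the same page, calling `pick_line` without a region returns `None`, while passing a region that covers only one of them returns that line's index.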
