Better forward search #244

Open
aikrahguzar opened this issue Nov 15, 2023 · 1 comment

@aikrahguzar

I have been looking to improve the synctex experience with pdf-tools and auctex, especially with my preferred style of writing every paragraph on one line and using visual-line-mode to reflow the text. Fixing backward search turned out not to be too difficult (see #242), but making forward search more accurate seems to be harder.

Basically, the situation seems to be that although synctex is theoretically capable of providing an accurate column number, it needs the TeX engines to supply this information, which none of them do. So it only provides line-level information. However, a given source line can correspond to multiple lines in the pdf (and vice versa), and in that case synctex returns multiple results for the query, leaving it to editors to somehow choose the best one.

However, pdf-tools only gives access to the first result of a synctex forward search. I don't know C, but I think that is happening here:
https://github.com/vedang/pdf-tools/blob/c69e7656a4678fe25afbd29f3503dd19ee7f9896/server/epdfinfo.c#L3188C21-L3188C21
That result seems to cover only one, or occasionally two, of the pdf lines corresponding to the source line, not all of them. Is someone who knows C and wants better forward search willing to do either of the following?

  1. Change cmd_synctex_forward_search in epdfinfo.c so that it returns the edges of the bounding box of the whole pdf region corresponding to the source line. My guess (I am not sure) is that this would simply be the union of all the rectangles in the individual search results. Some care would be needed when the paragraph gets split across pages.
  2. Probably easier, and backward compatible: add a new function to epdfinfo.c that returns the whole list of search results, and expose that to lisp so that the region can be determined from lisp.
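
To make option 1 concrete, here is a minimal sketch of the rectangle union, done per page so that a paragraph split across pages does not yield one box spanning both. The `hit_rect` struct and `union_on_page` function are hypothetical names made up for illustration; they are not the actual data structures or functions of epdfinfo.c or the synctex parser, and the coordinate convention (y growing downward) is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one forward-search result.
   NOT the real epdfinfo.c / synctex structure. */
typedef struct {
    int page;               /* page number the rectangle lies on */
    double x1, y1, x2, y2;  /* left, top, right, bottom (y grows downward) */
} hit_rect;

/* Merge every rectangle in hits[0..n) that lies on `page` into *out.
   Returns how many rectangles were merged; *out is only written
   when the return value is greater than zero. */
size_t
union_on_page(const hit_rect *hits, size_t n, int page, hit_rect *out)
{
    size_t merged = 0;

    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        const hit_rect *h = &hits[i];

        if (h->page != page)
            continue;               /* handle other pages separately */
        if (merged == 0) {
            *out = *h;              /* first hit seeds the bounding box */
        } else {
            if (h->x1 < out->x1) out->x1 = h->x1;
            if (h->y1 < out->y1) out->y1 = h->y1;
            if (h->x2 > out->x2) out->x2 = h->x2;
            if (h->y2 > out->y2) out->y2 = h->y2;
        }
        merged++;
    }
    return merged;
}
```

A caller would run this once for each page that appears in the results and get at most one box per page, which is one possible way to handle the page-split case.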

With this change in the C code, I think I can use techniques similar to those used for backward search, and those pdf-isearch uses for highlighting, to get word-level accuracy. But without a good bound on the pdf region to search, it is hard to get good enough performance.

@aikrahguzar

I have managed to implement a heuristic refinement of forward search in aikrahguzar@e25ae22

Since a Smith-Waterman type of alignment over a whole page is too slow, it is a regexp-from-hell variety of heuristic, which is more likely to fail than the one for backward search. The failure modes are:

  1. Lots of math
  2. Paragraphs crossing pages in the presence of fancy header/footer. Plain pages decorated with just page numbers should be fine.
  3. Lines consisting just of macros.

In these cases, the heuristic should realize its defeat and fall back to the result provided by synctex, so things should be no worse than before.

However, there is another failure mode: two lines on a pdf page that are too similar apart from math and text in macros. In that case the first line will get used, and the results will be worse than without the heuristic. What I suggested in the earlier comment can help with this scenario.
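
For context on why whole-page alignment is slow, below is a generic textbook sketch of a Smith-Waterman local alignment score. It is not the code from the commit above. The function name `sw_score` and the scoring values (match +2, mismatch -1, gap -1) are assumptions chosen for illustration. The nested loops visit every pair of characters, so the cost grows with the product of the source-line length and the page-text length.

```c
#include <stdlib.h>
#include <string.h>

/* Best local alignment score between strings a and b
   (Smith-Waterman with linear gap penalty; scoring values are
   illustrative assumptions). Returns -1 if allocation fails. */
int
sw_score(const char *a, const char *b)
{
    const int match = 2, mismatch = -1, gap = -1;
    size_t n = strlen(a), m = strlen(b);
    /* Only two rows of the DP table are needed for the score. */
    int *prev = calloc(m + 1, sizeof *prev);
    int *cur = calloc(m + 1, sizeof *cur);
    int best = 0;

    if (prev == NULL || cur == NULL) {
        free(prev);
        free(cur);
        return -1;
    }
    for (size_t i = 1; i <= n; i++) {        /* O(n * m) cell updates */
        cur[0] = 0;
        for (size_t j = 1; j <= m; j++) {
            int s = prev[j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);

            if (prev[j] + gap > s) s = prev[j] + gap;     /* gap in b */
            if (cur[j - 1] + gap > s) s = cur[j - 1] + gap; /* gap in a */
            if (s < 0) s = 0;       /* local alignment may restart anywhere */
            cur[j] = s;
            if (s > best) best = s;
        }
        int *tmp = prev;            /* current row becomes previous row */
        prev = cur;
        cur = tmp;
    }
    free(prev);
    free(cur);
    return best;
}
```

Restricting the second string to the text inside a tight bounding box, rather than the whole page, shrinks one factor of that product, which is why a better bound on the region matters for this kind of approach.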
