Adding tests for rect extractions #36

kreuzberger · 2024-01-24T10:02:34Z

Added tests for rect extraction from PDFs generated by sphinx-simplepdf / WeasyPrint.
The tests check textbox extraction from code blocks, admonitions and tables.

The table tests did not work as expected. Instead of the colored table cells being extracted as individual rects, the 3 table rows shown with alternating colors are extracted as whole rows.
A picture from the visual debug output is attached.

The tests now pass, assuming the "wrong" number of rects (I would expect 7). See the attached file.
All other extractions work as expected.

kreuzberger · 2024-01-24T10:10:49Z

The failing test has nothing to do with the test implementation; the test is missing a required executable!
This also explains why I had to patch the ImageMagick policies under /etc on my Ubuntu machine.

This seems to be specific to the visual debug feature. A hint in the docs would help.
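
A test can check for external tools up front instead of failing halfway through. The following is a minimal sketch, not part of this PR: the helper name `missing_executables` is hypothetical, and the tool name `gs` (Ghostscript) is an assumption based on the later commit "adding ghostscript hint in visual debug".

```python
import shutil


def missing_executables(names):
    """Return the subset of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Assumption: visual debug needs Ghostscript, whose executable is usually "gs".
REQUIRED_FOR_VISUAL_DEBUG = ("gs",)

if __name__ == "__main__":
    missing = missing_executables(REQUIRED_FOR_VISUAL_DEBUG)
    if missing:
        print("visual debug would likely fail, missing:", ", ".join(missing))
    else:
        print("all external tools found")
```

In a pytest suite, the same check could feed a `pytest.mark.skipif` condition so that visual-debug tests are skipped on machines without the tool.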

ubmarco · 2024-02-03T22:22:55Z

Thanks a lot for your PR, I really appreciate new tests for the library.

I cannot push to your branch because the fork was created in your organization, not in your personal account.
So I added a commit on top of your branch and created a new PR (#37) from it to see the changes in CI.
The ruff linting is non-voting for now, but I want to enable it over time for more and more files. Once a file is touched I will add it to the files in the tox.ini lint environment.

ubmarco · 2024-02-03T21:47:55Z

tests/conftest.py

@@ -31,6 +31,11 @@
 # test PDFs from official python documentation
 PDF_PYTHON_LOGGING = os.path.join(os.path.dirname(__file__), "pdf", "howto-logging.pdf")

+# test PDF for rect extraction generateby by sphinx-simplepdf


Suggested change

-# test PDF for rect extraction generateby by sphinx-simplepdf
+# test PDF for rect extraction generated by sphinx-simplepdf

ubmarco · 2024-02-03T22:27:35Z

tests/test_rects.py

+        visual_debug_output_dir=tmpdir.join("visual_debug_dir"),
+        visual_split_elements=True,


These 2 should not be needed if you set visual_debug=False.

kreuzberger · 2024-02-07

Yes, but I just want to be able to change visual_debug to True without changing the latter ones.
Could this be set using a variable?

vs_debug = False

visual_debug = vs_debug
visual_split_elements = not vs_debug
visual_debug_output_dir = tmpdir.join("visual_debug_dir") if vs_debug else ""

Or just leave it 😄


ubmarco · 2024-02-03T22:30:44Z

PR #37 fails as expected. I propose to cherry-pick my commits into your branch and fix the mentioned issues from my review.

ubmarco · 2024-02-03T22:42:29Z

The rect count in your PDF comes from how the PDF is made: the header row actually has 3 rectangles, while a body row is just one. If you zoom in far enough, you can see it, e.g. here in Firefox's pdf.js-based reader:

If you look closely, you see bright vertical lines in the header, but not in the row.
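
One way to confirm this without zooming is to count the extracted rectangles per table row. The following is a rough sketch with made-up coordinates: the function name `rects_per_row` and the `(x0, y0, x1, y1)` tuple layout are assumptions for illustration, not libpdf's data model.

```python
from collections import Counter


def rects_per_row(bboxes, ndigits=1):
    """Count rectangles that share the same vertical extent (y0, y1).

    `bboxes` is an iterable of (x0, y0, x1, y1) tuples. Rectangles with the
    same rounded y0/y1 are treated as belonging to the same table row.
    """
    counts = Counter(
        (round(y0, ndigits), round(y1, ndigits)) for _x0, y0, _x1, y1 in bboxes
    )
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical coordinates: a header drawn as 3 cells, a body row drawn as 1 rect.
    example = [
        (50.0, 700.0, 150.0, 720.0),
        (150.0, 700.0, 250.0, 720.0),
        (250.0, 700.0, 350.0, 720.0),
        (50.0, 680.0, 350.0, 700.0),
    ]
    for row, count in rects_per_row(example).items():
        print(row, count)
```

A row that reports a single rectangle was drawn as one filled area by the PDF generator, so the extractor has no separate cells to find there.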

ubmarco left a comment

Almost there.

ubmarco · 2024-02-19T14:32:46Z

tests/conftest.py

@@ -31,6 +31,11 @@
 # test PDFs from official python documentation
 PDF_PYTHON_LOGGING = os.path.join(os.path.dirname(__file__), "pdf", "howto-logging.pdf")

+# test PDF for rect extraction generateby by sphinx-simplepdf


Not urgent, but why not fix it when spotted in a review?

ubmarco · 2024-02-19T14:37:24Z

tests/test_rects.py

+        visual_debug_output_dir=tmpdir.join("visual_debug_dir"),
+        visual_split_elements=True,


I still don't understand this. Why would you provide fields that have no meaning in this context?

The libpdf run will neither populate this directory nor split elements, so why pass the params? Maybe I misunderstand your use case here. 🤔
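
Both goals (a single switch, and no meaningless parameters when debugging is off) can be met by building the keyword arguments in one place. This is a minimal sketch, not code from the PR: the helper `visual_debug_kwargs` is hypothetical, and only the three parameter names are taken from the diff above.

```python
from pathlib import Path


def visual_debug_kwargs(enabled, output_dir):
    """Build the visual-debug keyword arguments from a single switch.

    When debugging is disabled, only `visual_debug=False` is returned, so no
    output directory or split flag is passed without having an effect.
    """
    if not enabled:
        return {"visual_debug": False}
    return {
        "visual_debug": True,
        "visual_debug_output_dir": str(output_dir),
        "visual_split_elements": True,
    }


if __name__ == "__main__":
    # Flip this one value to turn the whole debug configuration on or off.
    kwargs = visual_debug_kwargs(False, Path("visual_debug_dir"))
    print(kwargs)
    # Assumed usage in a test: libpdf.load(pdf_path, **kwargs)
```

The call in the last comment is an assumption about how the test invokes libpdf; the point is only that the debug-related arguments travel together.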

kreuzberger · 2024-02-19T16:29:44Z

almost there
If I set visual_debug to True, the debugging is already configured. That is my main intention.

kreuzberger added 2 commits January 24, 2024 10:57

adding tests for rec

f20c15a

finish basic tests

5bab1ac

kreuzberger mentioned this pull request Jan 24, 2024

Test framework for sphinx-simplepdf useblocks/sphinx-simplepdf#83

Open

kreuzberger added 2 commits January 24, 2024 11:14

disable visual debug for tests, seems to depend on external tools

c1a9109

adding ghostscript hint in visual debug

f7cd2a0

ubmarco requested changes Feb 3, 2024

kreuzberger added 2 commits February 7, 2024 16:17

merge useblocks/libpdf branch rect-tests and fix linting in tests

322edf5

apply code changes due to review

e327644

kreuzberger requested a review from ubmarco February 7, 2024 15:32

ubmarco requested changes Feb 19, 2024

ubmarco mentioned this pull request Feb 19, 2024

Color and font information for chars, words and boxes #39

Merged

get merging done fast

fa0c25e

ubmarco approved these changes Feb 21, 2024

ubmarco merged commit 29fe5a3 into useblocks:master Feb 21, 2024
15 checks passed

ubmarco mentioned this pull request Mar 15, 2024

Tests for the new Rects class #37

Closed
