DOC: Example code doesn't give the right output (fix + proven, didn't want to create a pull request for that)

I tried the exact example from [the documentation](https://pypdf.readthedocs.io/en/latest/user/extract-text.html#example-1-ignore-header-and-footer), but it produces blank output, even though I copied the code unchanged and used the same [PDF file](https://github.com/py-pdf/pypdf/blob/main/resources/GeoBase_NHNC1_Data_Model_UML_EN.pdf). (The fix is at the bottom of this issue report.)



## Environment

Debian

```bash
$ python -m platform
Linux-6.1.0-12-amd64-x86_64-with-glibc2.36

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==4.0.1, crypt_provider=('cryptography', '41.0.7'), PIL=10.2.0
```

## Code + PDF

This is a minimal, complete example that shows the issue (the same example as in the documentation):

```python
from pypdf import PdfReader

reader = PdfReader("GeoBase_NHNC1_Data_Model_UML_EN.pdf")
page = reader.pages[3]

parts = []


def visitor_body(text, cm, tm, font_dict, font_size):
    y = cm[5]
    if y > 50 and y < 720:
        parts.append(text)


page.extract_text(visitor_text=visitor_body)
text_body = "".join(parts)

print(text_body)
```

## Fix

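Just change `cm` to `tm` in `visitor_body`, so that `y = cm[5]` becomes `y = tm[5]`. The vertical position used for the selection must come from the text matrix (`tm`), not from the current transformation matrix (`cm`).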
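For reference, here is a minimal sketch of the example with that change applied. The helper names `make_body_visitor` and `extract_body` are only for illustration and are not part of pypdf or of the documentation example. The thresholds 50 and 720 are the ones from the original snippet, and I am assuming the visitor signature stays as shown above.

```python
def make_body_visitor(parts, y_min=50, y_max=720):
    """Build a visitor that keeps text located strictly between y_min and y_max.

    Hypothetical helper: it only wraps the visitor from the documentation
    example so the collected parts are not kept in a module-level global.
    """

    def visitor_body(text, cm, tm, font_dict, font_size):
        # The fix: read the vertical position from the text matrix (tm),
        # not from the current transformation matrix (cm).
        y = tm[5]
        if y_min < y < y_max:
            parts.append(text)

    return visitor_body


def extract_body(pdf_path, page_index=3):
    """Return the page text without header and footer (illustrative helper)."""
    # Imported here so the visitor above can be tried without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    page = reader.pages[page_index]
    parts = []
    page.extract_text(visitor_text=make_body_visitor(parts))
    return "".join(parts)
```

With the PDF from the documentation in the working directory, `print(extract_body("GeoBase_NHNC1_Data_Model_UML_EN.pdf"))` would be the equivalent of the original script.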

[Here](https://github.com/py-pdf/pypdf/blob/main/resources/GeoBase_NHNC1_Data_Model_UML_EN.pdf) is the link to the PDF file.



Issue #2431 in py-pdf/pypdf. Commented on by stefan6419846 and etern4l-white on Feb 1, 2024.