Because react-pdf iterates through each text content item and assumes that the DOM node found at the same index is the only DOM node associated with it, the index drifts by one every time it hits an item that has both a non-empty string and hasEOL set.
I don't get it. The length of textContent.items matches the number of rendered children in the text layer. If an item has hasEOL false, it's going to be a span; if true, a br. I double-checked that and added more unit tests in 7c1c925 to confirm it, and I'm still unable to reproduce. Perhaps the sample PDF we have doesn't have this issue?
@wojtekmaj The issue is specifically that some items containing both text and a line break render both a <span> and a <br>, so the number of rendered elements no longer matches the input one-to-one. You can see the conditions that lead to this result here.
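
To make the drift concrete, here is a minimal sketch in TypeScript. It is a simplified model of the behavior described in this thread, not the actual react-pdf or PDF.js code: `TextItemLike`, `nodesForItem`, `renderedTags`, and `firstNodeIndexPerItem` are hypothetical names, and the rule that an item with text and hasEOL yields two nodes is taken from the comment above rather than from the library source.

```typescript
// Simplified stand-in for a text content item; only the two fields discussed here.
interface TextItemLike {
  str: string;
  hasEOL: boolean;
}

// Assumed rendering rule (a model of what is described in this thread):
//   text + hasEOL    -> <span> followed by <br>
//   no text + hasEOL -> <br> only
//   otherwise        -> <span> only
function nodesForItem(item: TextItemLike): string[] {
  if (item.hasEOL) {
    return item.str.length > 0 ? ["span", "br"] : ["br"];
  }
  return ["span"];
}

// Flat list of tags the text layer would contain under the assumed rule.
function renderedTags(items: TextItemLike[]): string[] {
  const tags: string[] = [];
  for (const item of items) {
    tags.push(...nodesForItem(item));
  }
  return tags;
}

// For each item, the index of the first DOM node that belongs to it.
// A lookup by item index is only correct while this equals the item index.
function firstNodeIndexPerItem(items: TextItemLike[]): number[] {
  const indices: number[] = [];
  let nodeIndex = 0;
  for (const item of items) {
    indices.push(nodeIndex);
    nodeIndex += nodesForItem(item).length;
  }
  return indices;
}

const sampleItems: TextItemLike[] = [
  { str: "Hello", hasEOL: false },
  { str: "world", hasEOL: true }, // text and line break: two nodes
  { str: "Next", hasEOL: false },
];

console.log(sampleItems.length);                 // 3 items
console.log(renderedTags(sampleItems));          // ["span", "span", "br", "span"]
console.log(firstNodeIndexPerItem(sampleItems)); // [0, 1, 3]
```

In this model, three items produce four nodes. Looking up the third item by its own index (2) lands on the br that belongs to the second item, and every later item is off by one more for each text-plus-EOL item before it.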