Long lines truncated at 10,000,000 chars. #399
Comments
Think you're hitting a limit in libxml2: https://www.suse.com/support/kb/doc/?id=000019477. Not sure if you need to rebuild or if this can be changed at runtime 🤷
If I try reading with
> xmlParse("./test.html")
xmlSAX2Characters: huge text node
Extra content at the end of the document
Error: 1: xmlSAX2Characters: huge text node
2: Extra content at the end of the document
Looks like the
Seems I have "options"...
However I can't find an option that will make
Now filed at r-lib/xml2#440. I suspect there's not going to be much we can do apart from turning
Long lines in HTML are truncated at 10 million characters.
I get:
showing truncation at 10000000 chars, and the as.character form has truncated the content and put a script closing tag at the end to make well-formed HTML from truncated data. This is all done silently, with no errors or warnings.
The real-world case of this was an HTML file created by the leaflet package, which creates large single lines of geographic data.
The Python packages requests and BeautifulSoup read this all correctly.
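For anyone wanting a reproducible test case, here is a minimal sketch that builds a page with a single script line just over the 10,000,000-character mark and checks that the text survives parsing at full length. It uses Python's standard-library html.parser as a stand-in, not the requests/BeautifulSoup setup mentioned above. The names LIMIT, ScriptText, make_page and script_length are made up for illustration, and the filler content is an assumption, not the leaflet data from the real file.

```python
from html.parser import HTMLParser

LIMIT = 10_000_000  # truncation point reported in this issue


class ScriptText(HTMLParser):
    """Collect the text found inside <script> elements."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Text may arrive in several pieces, so accumulate rather than overwrite.
        if self.in_script:
            self.chunks.append(data)


def make_page(n_chars):
    """Return an HTML page whose <script> holds one line of n_chars characters."""
    return "<html><body><script>" + "a" * n_chars + "</script></body></html>"


def script_length(html):
    """Parse html and return the total number of characters of script text."""
    parser = ScriptText()
    parser.feed(html)
    parser.close()
    return sum(len(chunk) for chunk in parser.chunks)


if __name__ == "__main__":
    n = LIMIT + 1000
    # True means the parser returned the whole line, with no truncation.
    print(script_length(make_page(n)) == n)
```

Writing the output of make_page to a file gives a test.html that can be fed to the R readers discussed in this thread, so the lengths can be compared.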