Blogger Articles show up as untitled #803
Comments
Yup, I see no easy way to get the content, since it isn't in the page source. That's really, really bad in terms of web standards.
@fivefilters has been informed.
Yes, I don't understand how this even caught on - it looks like a horrible way to present simple content. Nonetheless, Google has a page for developers who insist on presenting content in this way, to help them offer the same content in plain HTML for the benefit of crawlers and other such systems. Blogspot - which powers this particular site - follows the spec: the page contains a meta element signalling that a plain HTML version is available. Full-Text RSS also understands this, but we only look in the first 4000 characters of the HTML to find the meta tag which signals that we should fetch the plain HTML URL. In this case, due to a large embedded image, that tag appears after the first 4000 characters, and so is missed by Full-Text RSS. We'll try to fix this in a future update. To fix it manually, you can try editing HumbleHttpAgent.php - https://github.com/wallabag/wallabag/blob/master/inc/3rdparty/libraries/humble-http-agent/HumbleHttpAgent.php - and replacing the number 4000 (3 occurrences) with a bigger number, or removing the parameter completely so that Full-Text RSS searches for the tag in the entire HTML.
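For anyone who wants to see why the 4000-character window matters, here is a rough sketch in Python (not the actual PHP in HumbleHttpAgent.php). It assumes the tag in question is `<meta name="fragment" content="!">` and that the plain HTML URL is formed by appending `_escaped_fragment_=`, as in Google's AJAX crawling scheme. The function names `has_fragment_meta` and `escaped_fragment_url` are made up for illustration.

```python
import re

# Assumption: the tag looked for is the one from Google's AJAX crawling scheme.
# This simplified pattern only handles the name-before-content attribute order.
FRAGMENT_META = re.compile(
    r"""<meta\s+name=["']fragment["']\s+content=["']!["']\s*/?>""",
    re.IGNORECASE,
)


def has_fragment_meta(html, limit=4000):
    """Return True if the fragment meta tag lies within the first `limit` characters.

    Pass limit=None to search the entire HTML instead of a prefix.
    """
    window = html if limit is None else html[:limit]
    return FRAGMENT_META.search(window) is not None


def escaped_fragment_url(url):
    """Build the plain HTML URL by appending an empty _escaped_fragment_ parameter."""
    separator = "&" if "?" in url else "?"
    return url + separator + "_escaped_fragment_="


# A page whose head starts with a large data-URI image, pushing the tag
# beyond the first 4000 characters.
big_image = (
    '<meta property="og:image" content="data:image/png;base64,'
    + "A" * 5000
    + '">'
)
html = "<html><head>" + big_image + '<meta name="fragment" content="!"></head></html>'

print(has_fragment_meta(html))              # searches only the first 4000 characters
print(has_fragment_meta(html, limit=None))  # searches the whole document
print(escaped_fragment_url("http://example.com/post.html"))
```

Raising the limit, or dropping it, trades a slightly longer scan for not missing tags that sit behind large inline content.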
Thanks for reporting that Blogger website. The problem here is that this website uses a data-URI image as the Open Graph image. So the body is huge, and graby (it should be the same for FTRSS) doesn't check enough of the HTML to be able to detect that fragment. It'll be fixed in 2.0.2.
I tried saving the following article to my wallabag setup: http://www.gavinj.net/2012/06/building-python-daemon-process.html
This uses one of these rather silly JavaScript templates, which seem to cause problems with wallabag. No content is saved.