Problem reading DZone entry #1980

Closed
Antoninum opened this issue Apr 25, 2016 · 6 comments

Comments

@Antoninum

Hi,
I just tried wallabag for the first time on my webserver. It looks great, but for some reason some articles can't be retrieved...

Issue details

When adding an article from DZone (for example this one: https://dzone.com/articles/javascript-mvvm-youre-probably-doing-it-wrong), wallabag says that it can't retrieve the article and shows "No title found".

The error is the following:
Warning: Division by zero in /var/www/[my-adress]/var/cache/prod/twig/64/64cb463472fd62c8bc6d3567afb6cbd09f9935aeba22be7abb5134fb577d22a0.php on line 129

Environment

  • Wallabag version : 2.0.2
  • git revision: 7d5b463
  • I installed wallabag the "recommended way" (git clone && composer install)
  • It's my first time trying wallabag
  • PHP 5.6.15 with FastCGI (I'm using phpfarm)
  • Debian 6
  • Dedicated server
  • SQLite

Steps to reproduce/test case

Click on the "add new entry" button, add the URL mentioned above, and that's it.

@tcitworld
Member

About the error, please select a value inside Config > Settings > Reading speed to fix it.

The website not being fetched will be taken care of.
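
The "Division by zero" warning fits a template that divides a reading time by the user's reading-speed setting while that setting is empty or zero. A minimal sketch of a guard follows, in Python for illustration only; the function and parameter names are hypothetical and are not taken from wallabag's code:

```python
from typing import Optional


def estimated_reading_minutes(base_minutes: float,
                              reading_speed: Optional[float]) -> Optional[float]:
    """Scale a base reading time by a per-user speed factor.

    Both parameters are hypothetical stand-ins. Returns None when the
    speed is unset, zero, or negative, instead of dividing by zero.
    """
    if not reading_speed or reading_speed <= 0:
        return None  # the caller can hide the estimate in this case
    return base_minutes / reading_speed
```

Returning `None` rather than raising lets the page render without the estimate until the user picks a value in the settings.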

@Antoninum
Author

Thanks for your quick answer.

I already had a value set. I tried changing it, and even logging out and back in, but the error persists, even when I add a new article from this website.

@Antoninum
Author

I have to mention that I can fetch and read articles from other websites perfectly.

@Antoninum
Author

Hello,
After investigation, I have new information regarding this bug.

First, the "Division by zero" error appears to have vanished: the "Reading speed" change seems to have worked, but I did not notice that immediately since my main problem (wallabag cannot retrieve the DZone entry) remained unchanged.

The problem comes from the ?_escaped_fragment_= that wallabag adds to my URL. I believe it serves the purpose of crawling AJAX pages.
But DZone returns a 404 error with this escaped fragment on any of their articles! I'm guessing they are trying to avoid being fetched by robots? Sounds silly but, well, they're doing it anyway...

So I guess an option to remove the escaped fragment from specific entries should work? Unless it is possible to set it for an entire website? Or maybe wallabag could try fetching without the escaped fragment if it detects a 404 error?

By the way, it seems like this AJAX crawling stuff is deprecated: https://webmasters.googleblog.com/2015/10/deprecating-our-ajax-crawling-scheme.html
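
The last suggestion above (retry without the escaped fragment after a 404) could look roughly like this. It is only a sketch in Python, not wallabag or graby code: `fetch_with_fallback`, `strip_escaped_fragment`, and the injected `fetch` callable are hypothetical names, and `fetch` is assumed to return a `(status, body)` pair.

```python
from typing import Callable, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

ESCAPED_FRAGMENT = "_escaped_fragment_"


def has_escaped_fragment(url: str) -> bool:
    """True if the query string carries the _escaped_fragment_ parameter."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return any(key == ESCAPED_FRAGMENT for key, _ in pairs)


def strip_escaped_fragment(url: str) -> str:
    """Return the URL with the _escaped_fragment_ parameter removed."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(key, value) for key, value in pairs if key != ESCAPED_FRAGMENT]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def fetch_with_fallback(url: str,
                        fetch: Callable[[str], Tuple[int, str]]) -> Tuple[int, str]:
    """Fetch url; on a 404 caused by an escaped-fragment URL, retry the plain URL."""
    status, body = fetch(url)
    if status == 404 and has_escaped_fragment(url):
        return fetch(strip_escaped_fragment(url))
    return status, body
```

Passing the HTTP client in as `fetch` keeps the sketch independent of any particular library and makes the retry easy to exercise with a stub.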

@j0k3r
Member

j0k3r commented Apr 27, 2016

Many thanks for the investigation.

In fact, for some websites (when we find certain markers in the HTML meta; it's here) we append the escaped fragment so that the website renders an HTML version of the page rather than an async one (using AJAX to get the content). Even though it's deprecated, it still works in some cases.

For DZone, we found that the website uses Angular, so we try with the escaped fragment. It would be a good idea to roll back this addition when the website returns an error.
In this specific case it seems that the content already exists in the page, so we don't need to append the escaped fragment.
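
For readers unfamiliar with the scheme, here is a rough Python sketch of the detection and URL rewriting described above. It is not graby's code: the function names are hypothetical, and the check assumes the opt-in marker is the `<meta name="fragment" content="!">` tag from Google's AJAX crawling scheme, which may differ from what graby actually inspects.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit


class _FragmentMetaFinder(HTMLParser):
    """Records whether the page contains <meta name="fragment" content="!">."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if attributes.get("name") == "fragment" and attributes.get("content") == "!":
            self.found = True


def opts_into_ajax_crawling(html: str) -> bool:
    """True if the HTML carries the (assumed) AJAX-crawling opt-in meta tag."""
    finder = _FragmentMetaFinder()
    finder.feed(html)
    return finder.found


def with_escaped_fragment(url: str) -> str:
    """Append an empty _escaped_fragment_ parameter to the URL's query string."""
    parts = urlsplit(url)
    marker = "_escaped_fragment_="
    query = f"{parts.query}&{marker}" if parts.query else marker
    return urlunsplit(parts._replace(query=query))
```

A fetcher would call `with_escaped_fragment` only when `opts_into_ajax_crawling` returns true, and could skip it when the article content is already present in the first response.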

Could you please open an issue on graby (the library that handles content extraction for wallabag) about that and then close this one? Thanks!

@Antoninum
Author

I'll do that. Thanks for your answers!
