Crawl multipage articles #1758

Closed
Matthias84 opened this issue Mar 6, 2016 · 4 comments

@Matthias84

(I know this might become a hard one)

News sites in particular often split articles across several pages. It would be nice if wallabag could detect < 1 2 3 > pagination links and merge the pages into one single text.

@tcitworld
Member

Well, normally wallabag already tries to do that. You can send us links to pages where it should happen so we can fix it. :)

@Matthias84
Author

OK, I will try to collect some links. Until then, feel free to close this issue 😃

@j0k3r
Member

j0k3r commented Mar 6, 2016

We use siteconfig files to better handle pagination.
For example, the config file for gsmarena.com tells the parser which link leads to the next page (with next_page_link). The parser concatenates pages until it can't find a next page link.
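
The loop described above (fetch a page, append it, follow the next page link, stop when there is none) can be sketched roughly as follows. This is only an illustration in Python, not wallabag's actual parser code: `concat_pages`, `fetch`, `find_next` and `max_pages` are hypothetical names, and the guard against revisiting a URL is an added assumption.

```python
from typing import Callable, Optional


def concat_pages(
    start_url: str,
    fetch: Callable[[str], str],
    find_next: Callable[[str], Optional[str]],
    max_pages: int = 50,
) -> str:
    """Join the content of a paginated article into one text.

    fetch(url) returns the content of one page (hypothetical helper).
    find_next(content) returns the next page URL, or None when the page
    has no next page link (hypothetical helper, e.g. driven by a
    next_page_link rule).
    """
    parts = []
    seen = set()  # assumption: avoid looping forever on circular links
    url: Optional[str] = start_url
    while url is not None and url not in seen and len(parts) < max_pages:
        seen.add(url)
        content = fetch(url)
        parts.append(content)
        # Stop as soon as no next page link can be found.
        url = find_next(content)
    return "\n".join(parts)
```

Passing `fetch` and `find_next` in as functions keeps the loop independent of how a given site marks its next page link, which is the part a siteconfig rule would supply.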

But there is nothing like automatic detection of the next page link.
I'd be more than happy if you could provide a few links that follow the Google / W3C suggestion, so we can investigate and try to implement that.
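
For reference, automatic detection could look roughly like the sketch below, assuming the Google / W3C suggestion means `rel="next"` markup on `<link>` or `<a>` elements (the thread does not spell this out). It uses only the Python standard library; `find_rel_next` and `_RelNextFinder` are hypothetical names, and none of this is part of wallabag.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class _RelNextFinder(HTMLParser):
    """Remember the href of the first <link> or <a> carrying rel="next"."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.href is not None or tag not in ("link", "a"):
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, e.g. "next nofollow".
        rels = (attr.get("rel") or "").lower().split()
        if "next" in rels and attr.get("href"):
            self.href = attr["href"]


def find_rel_next(html: str, base_url: str) -> Optional[str]:
    """Return the absolute URL of the rel="next" page, or None if absent."""
    finder = _RelNextFinder()
    finder.feed(html)
    finder.close()
    return urljoin(base_url, finder.href) if finder.href else None
```

This only helps on sites that actually publish `rel="next"`; pages that use plain numbered links would still need a per-site rule.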

@nicosomb nicosomb modified the milestone: 2.1.0 Mar 7, 2016
@nicosomb nicosomb modified the milestone: 2.1.0 Apr 12, 2016
@j0k3r
Member

j0k3r commented Sep 30, 2016

I don't think we'll handle automatic detection of the next page link; next_page_link in siteconfig is the way to go.

@j0k3r j0k3r closed this as completed Sep 30, 2016