Only a section of the page renders in the Just Read format #32

Closed
trjate opened this issue Dec 24, 2016 · 13 comments
Comments

@trjate

trjate commented Dec 24, 2016

On https://www.privateinternetaccess.com/pages/downloads the first section renders correctly, but none of the following, identically formatted sections render.

@ZachSaucier
Owner

Thanks for the report! It's because each section is technically in its own container, so Just Read selects only the first container. I'm still working on a fix for this in the auto-selection mode, but I'm not sure how I can do that without error across all sites.

However, have you tried out user selection mode or highlight mode? I was able to easily get all the content using both of those modes.

ZachSaucier changed the title from "Incomplete rendering of page" to "Only a section of the page renders in the Just Read format" on Dec 25, 2016
@paulvancotthem

paulvancotthem commented Jan 16, 2017

I have a similar issue on pages in the "motorsport.com" domain.
When I use the "Clearly" extension (now unsupported and no longer developed), it renders the page much better.

Example:
http://www.motorsport.com/f1/news/massa-returns-to-f1-as-bottas-replacement-865853/

@ZachSaucier
Owner

ZachSaucier commented Jan 16, 2017

@paulvancotthem That website seems to work when I use JR's auto-selection...

@paulvancotthem

paulvancotthem commented Jan 17, 2017

@ZachSaucier It does not fully work for me. The top H1-title of that article ("Mercedes confirms Bottas as Hamilton's teammate") and the photo at the top of the page do not show up after JR parses it. JR's output starts at the H2-subtitle, ignoring what's above it.

@ZachSaucier
Owner

ZachSaucier commented Jan 17, 2017

@paulvancotthem I understand now.

The title isn't picked up because JR checks the article's container for an h1 or h2 first, and only then searches more globally. I believe this approach is generally more favorable; keep in mind you can manually edit the title by hovering over it and clicking the pencil. The photo is just outside the article container, so it would be hard to programmatically find images like that and determine whether or not they should be included.
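
A minimal sketch of that lookup order, not JR's actual code: the `Node` class is a hypothetical stand-in for a DOM element, and `first_heading` and `find_title` are illustrative names. It shows why a heading inside the container wins over an h1 that sits above it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element (assumption, not JR's data model)."""
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def first_heading(root: Node) -> Optional[Node]:
    """Return the first h1 or h2 below root in document order, or None."""
    for child in root.children:
        if child.tag in ("h1", "h2"):
            return child
        found = first_heading(child)
        if found is not None:
            return found
    return None


def find_title(page: Node, article: Node) -> Optional[str]:
    """Search the article container first, then fall back to the whole page."""
    heading = first_heading(article) or first_heading(page)
    return heading.text if heading else None


# Layout like the one reported above: the h1 and photo sit outside the container.
article = Node("div", children=[Node("h2", "Subtitle"), Node("p", "Body text")])
page = Node("body", children=[Node("h1", "Headline"), Node("img"), article])

print(find_title(page, article))  # Subtitle
```

With this ordering the global h1 is only used when the container has no heading of its own, which is the behavior described in this comment.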

@paulvancotthem

@ZachSaucier When I use the "Clearly" extension (from Evernote; no longer supported, but still available for download here), it renders this page correctly: the title, the image, and the rest of the article are all parsed and rendered. So there must be a way to do this programmatically...
[Image: screenshot-001]

@ZachSaucier
Owner

I can't get that download of Clearly to work on my computer. I can look at the code though, so I'll try to break down what it does that lets it select content better than JR does.

@paulvancotthem

@ZachSaucier Oh wow, thanks Zach!

@ZachSaucier
Owner

One idea I have (a potential feature) that would help with this, though not solve it, is a "select more generally" or "select parent container" button. That way, if only part of the content is shown, users can stay in the Just Read format but select more content (which should tend to be the full article).
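
A rough sketch of what such a button could do, under assumptions: `Node` is a hypothetical element model with a parent pointer, and `widen_selection` is an illustrative name, not an existing JR function.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element with a parent pointer."""
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def __post_init__(self) -> None:
        # Wire up parent links so a selection can be widened later.
        for child in self.children:
            child.parent = self


def widen_selection(selected: Node) -> Node:
    """Return the parent container, or the node itself if it is the root."""
    return selected.parent if selected.parent is not None else selected


def visible_text(node: Node) -> List[str]:
    """Collect non-empty text below node in document order."""
    out = [node.text] if node.text else []
    for child in node.children:
        out.extend(visible_text(child))
    return out


# Two identically formatted sections, each in its own container.
first = Node("section", children=[Node("p", "Section 1")])
second = Node("section", children=[Node("p", "Section 2")])
wrapper = Node("div", children=[first, second])

selected = first
print(visible_text(selected))                   # ['Section 1']
print(visible_text(widen_selection(selected)))  # ['Section 1', 'Section 2']
```

Each press moves one level up, so the user decides how much extra content is acceptable instead of the algorithm guessing.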

@iandunn

iandunn commented Jun 27, 2017

> each section is technically in their own container, so Just Read selects the first container... I'm not sure how I can do that without error across all sites.

You may have already considered this, but would it be a problem to just include all the <article> elements inside the container?

Maybe some sites put too much crap in there, but with all the other formatting stripped away, it doesn't seem like it'd be that bad. At least for me, having too much (clean) text is a minor problem, while missing some important text is a major problem.
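
Roughly what I have in mind, as a sketch only: `Node` is a hypothetical stand-in for a DOM element and `collect_articles` is a made-up name, not anything from the JR codebase.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element."""
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def collect_articles(container: Node) -> List[Node]:
    """Return every article element below container, skipping nested matches."""
    found: List[Node] = []
    for child in container.children:
        if child.tag == "article":
            found.append(child)
        else:
            found.extend(collect_articles(child))
    return found


def text_of(node: Node) -> List[str]:
    """Collect non-empty text below node in document order."""
    out = [node.text] if node.text else []
    for child in node.children:
        out.extend(text_of(child))
    return out


def render_all(container: Node) -> List[str]:
    """Concatenate the text of all article elements instead of only the first."""
    lines: List[str] = []
    for article in collect_articles(container):
        lines.extend(text_of(article))
    return lines


container = Node("div", children=[
    Node("nav", "Menu"),
    Node("article", children=[Node("p", "Section 1")]),
    Node("div", children=[Node("article", children=[Node("p", "Section 2")])]),
])

print(render_all(container))  # ['Section 1', 'Section 2']
```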

@ZachSaucier
Owner

@iandunn The main problem with that approach is that I've come across sites that don't use the <article> element for the article content at all; they use it only for previews of other articles (why they do so eludes all logical explanations I can fathom).

This is really just a symptom of the larger problem of how JR does auto-selection. If I have time I'll look at redoing all of it this summer, but time gets shorter every day, haha.

@ZachSaucier
Owner

ZachSaucier commented Aug 7, 2017

This should be more or less fixed in the latest version (1.1.0) with Just Read's new auto-selection algorithm.
