Limit size of article #640

Closed
ptah-alexs opened this issue Apr 13, 2014 · 25 comments

Comments

@ptah-alexs

Some large pages don't import completely.

For example: http://www.fanfics.me/read.php?id=46429

rules file:

title://title
strip://table[@class = 'ReadTable']
strip://table[@class = 'horizontal-bar']
strip://*[contains(@class,'hidden')]
strip://div[@id = 'text-width-container']
strip://div[@id = 'new_comment']
strip://table[@class = 'horizontal-bar']
strip://div[@id = 'read_footer']
strip://div[@class='totop']
body://h2//p[@class = 'fict']

@tcitworld
Member

Which parts are you missing? I don't speak a word of Russian, but I seem to get the whole article (from the Chapter One title to the « Chapter published: 04/13/2014 » end).

However, when I import it via the « add link » button inside wallabag, it tells me the import has failed.
Note that I don't have specific rules for this website, and that my server timeout is set to the default.

@ptah-alexs
Author

Hm, maybe the difference is in the options. I got only approx. 1/4 of the original size.

@tcitworld
Member

Try:

  • removing your rules
  • (more likely) seeing if you can modify some Apache parameters like KeepAliveTimeout or RequestReadTimeout and increase them.

Otherwise, there's not much you can do, except changing hosting provider.

@nicosomb
Member

Can you run wallabag_compatibility_test.php please?

Maybe tidy is not enabled.

@ptah-alexs
Author

| Test | Should Be | What You Have |
|---|---|---|
| PHP | 5.3.3 or higher | 5.5.11-2 |
| XML | Enabled | Enabled, and sane |
| PCRE | Enabled | Enabled |
| Data filtering | Enabled | Enabled |
| Tidy | Enabled | Enabled |
| cURL | Enabled | Enabled |
| Parallel URL fetching | Enabled | Enabled |
| allow_url_fopen | Enabled | Enabled |
| gettext | Enabled | Enabled |

@nicosomb
Member

nicosomb commented May 2, 2014

Without the rules file, wallabag fetches the whole article.
Can you give more information about your environment, please?

@nicosomb nicosomb added the Bug label May 2, 2014
@nicosomb nicosomb added this to the 1.7.0 milestone May 2, 2014
@ptah-alexs
Author

What information do you need?

@tcitworld
Member

All the information listed here: https://github.com/wallabag/wallabag/blob/dev/CONTRIBUTING.md#you-found-a-bug
Sorry for not replying earlier.

@nicosomb nicosomb modified the milestones: 1.8.0, 1.7.0 May 29, 2014
@tcitworld
Member

@ptah-alexs : Have you seen my last message ? :)

@ptah-alexs
Author

Sorry for the long silence.

Retrieving the page http://www.fanfics.me/read.php?id=46429, both with and without the rules file: Fatal error: Maximum execution time of 120 seconds exceeded in /var/www/wallabag/inc/3rdparty/libraries/readability/Readability.php on line 877.

Another page, http://www.fanfics.me/read.php?id=60257, saves approx. 30-40% of the text, both with and without the rules file.

wallabag version: 1.6.1
Dedicated server with Debian GNU/Linux Sid
PHP 5.5.12-2 (cli) (built: May 12 2014 13:02:34)
Storage: MySQL
wallabag_compatibility_test.php reports no warnings:

| Test | Should Be | What You Have |
|---|---|---|
| PHP | 5.3.3 or higher | 5.5.12-2 |
| XML | Enabled | Enabled, and sane |
| PCRE | Enabled | Enabled |
| Data filtering | Enabled | Enabled |
| Tidy | Enabled | Enabled |
| cURL | Enabled | Enabled |
| Parallel URL fetching | Enabled | Enabled |
| allow_url_fopen | Enabled | Enabled |
| gettext | Enabled | Enabled |

@tcitworld
Member

Can you edit your php.ini and increase the max_execution_time value until the import works fully?

Also, your system seems a bit weak: even though the page is very long, my own installation (VPS with 2 vcores) takes about 1 minute to save it. What are your hardware specifications?

@ptah-alexs
Author

It's strange: I edited /etc/php5/apache2/php.ini and set max_execution_time to 240. As a result I still see "Fatal error: Maximum execution time of 120 seconds exceeded in /var/www/wallabag/inc/3rdparty/libraries/readability/Readability.php on line 877". What's wrong?

@ptah-alexs
Author

I have a seedbox with AMD Athlon(tm) II X3 425 processor and 12G of memory.

@ptah-alexs
Author

The same thing happens with 1.7.0.

@tcitworld
Member

Hmm, my fault, it's the max_input_time value you may need to increase.

Don't forget to restart/reload Apache after editing the php.ini file. ;)

@ptah-alexs
Author

Hmm. With max_execution_time = 120 and max_input_time = -1, after restarting Apache: Fatal error: Maximum execution time of 120 seconds exceeded in /var/www/wallabag/inc/3rdparty/libraries/readability/Readability.php on line 877. It's strange.

@tcitworld
Member

Do you have safe_mode enabled? Are you sure Apache uses the right php.ini file?

Otherwise, I see no other option than to add set_time_limit(0); to the Readability.php file, either at the top or just before line 877. But if that doesn't work (for example, if it's an infinite loop), it may freeze your server until Apache's own timeout kicks in.

Sorry for not finding a solution. :(

@ptah-alexs
Author

How do I know whether safe_mode is enabled? I'm sure the right php.ini is used. I'll try your advice.

@ptah-alexs
Author

With set_time_limit(0); the script finishes successfully, but saves only the first 50-60 KB of text.

@tcitworld
Member

> How do I know that safe_mode enabled?

It should be in php.ini if set. But it's rather unusual to have it.
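
The questions about safe_mode and about which php.ini Apache really loads can also be answered from PHP itself. The following is a minimal sketch, not part of wallabag: the function name timeout_diagnostics is made up for illustration, and the script is assumed to be requested through the web server, since the CLI and Apache can load different php.ini files. Note that safe_mode was deprecated in PHP 5.3 and removed in PHP 5.4, so on PHP 5.5 the directive no longer exists.

```php
<?php
// Report which php.ini is loaded and the timeout-related settings in effect
// for the SAPI that executes this script (Apache and CLI can differ).
// timeout_diagnostics() is a hypothetical helper, not a wallabag function.
function timeout_diagnostics()
{
    return array(
        'sapi'               => PHP_SAPI,
        'php_version'        => PHP_VERSION,
        // Path of the loaded php.ini, or false if none was loaded.
        'loaded_php_ini'     => php_ini_loaded_file(),
        'max_execution_time' => ini_get('max_execution_time'),
        'max_input_time'     => ini_get('max_input_time'),
        // ini_get() returns false for directives that do not exist,
        // which is the case for safe_mode on PHP 5.4 and later.
        'safe_mode'          => ini_get('safe_mode'),
    );
}

foreach (timeout_diagnostics() as $name => $value) {
    echo $name, ' = ', var_export($value, true), PHP_EOL;
}
```

If the reported php.ini path is not the file that was edited, or max_execution_time does not show the edited value, the change is not reaching the PHP that serves wallabag.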

Is the first 50-60 KB of text better than the ~1/4 of the text you started with?

Final try: edit your Apache configuration, either directly in the main Apache configuration (/etc/apache2/apache2.conf) or, better, inside your vhost:
Add (or modify) the line TimeOut 300 and see if increasing the time changes anything.

@ptah-alexs
Author

  1. php.net says safe_mode is deprecated.
  2. 60 KB of 950 KB of text :(
  3. With TimeOut 300 in the apache2 config I saw: Fatal error: Maximum execution time of 120 seconds exceeded in /var/www/wallabag/inc/3rdparty/libraries/readability/Readability.php on line 452
    3.1 with TimeOut 600: Fatal error: Maximum execution time of 120 seconds exceeded in /var/www/wallabag/inc/3rdparty/libraries/readability/Readability.php on line 448
    3.2 with TimeOut 900: Fatal error: Maximum execution time of 120 seconds exceeded in /var/www/wallabag/inc/3rdparty/libraries/readability/Readability.php on line 452

@tcitworld
Member

Well, I'm sorry to say I've got no ideas left. Maybe someone will see this and have the solution, but I think you will have better luck on Stack Overflow.
Good luck anyway.

@ptah-alexs
Author

OK, thank you.

@nicosomb nicosomb removed the Make FAQ label Jul 30, 2014
@nicosomb nicosomb changed the title [Bug] [Question] limit size of article? Limit size of article Jul 30, 2014
@lbivens

lbivens commented Sep 6, 2014

The problem is on line 33 of ./inc/3rdparty/makefulltextfeed.php, where it says "@set_time_limit(120);"
Changing the value to something like 600 let me import my 481-item list.
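
A hard-coded call like this would be consistent with the php.ini edits earlier in the thread having no effect: set_time_limit() called at run time replaces the max_execution_time value taken from php.ini for the rest of the request. A minimal sketch of that behaviour, assuming nothing about wallabag itself (the function name effective_limit_after is made up for illustration):

```php
<?php
// Show that a runtime set_time_limit() call replaces the limit from php.ini.
// effective_limit_after() is a hypothetical helper for illustration only.
function effective_limit_after($seconds)
{
    // Value from php.ini (or the SAPI default) before the call.
    $before = ini_get('max_execution_time');
    // Same style of call as the one quoted above; returns false on failure,
    // for example when the function is disabled by the host.
    $changed = @set_time_limit($seconds);
    // Limit now in force for the remainder of this request.
    $after = ini_get('max_execution_time');

    return array('before' => $before, 'after' => $after, 'changed' => $changed);
}

$result = effective_limit_after(120);
echo 'before: ', $result['before'], ', after: ', $result['after'], PHP_EOL;
```

Whatever value php.ini holds, the "after" figure is the one the fatal error message would report once such a call has run.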

Maybe tweaking the import function in Poche.class.php would be better than changing the time-out value. How about making a pool with the articles to import and using multiple threads so the time-out affects only a single item? I am a bit rusty on PHP...
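
Short of threads, one way to make the time-out apply per item is to restart the timer before each article, since set_time_limit() resets the timeout counter to zero when called. This is only a sketch under assumptions: import_all and the $importItem callback are hypothetical names and do not reflect how Poche.class.php is actually written.

```php
<?php
// Hypothetical import loop: restart the execution timer before each item so
// the limit applies per article rather than to the whole import run.
function import_all(array $urls, $importItem, $perItemSeconds = 120)
{
    $failed = array();
    foreach ($urls as $url) {
        // Each call restarts the timeout counter from zero.
        @set_time_limit($perItemSeconds);
        try {
            // $importItem stands in for whatever fetches and saves one article.
            call_user_func($importItem, $url);
        } catch (Exception $e) {
            // Remember ordinary failures and carry on with the next item.
            $failed[$url] = $e->getMessage();
        }
    }
    return $failed;
}

// Usage with a stand-in callback that only prints the URL.
$failed = import_all(
    array('http://example.com/a', 'http://example.com/b'),
    function ($url) {
        echo 'importing ', $url, PHP_EOL;
    }
);
```

A time-out is still a fatal error that try/catch cannot intercept, so one item that exceeds the per-item limit would abort the run; the gain is only that a long list of normal-sized items no longer shares a single 120-second budget.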

Update: I just imported, with the same 600 s setting, 1771 items from another reading list... It worked like a charm!

@nicosomb nicosomb modified the milestones: 1.8.1, 1.8.2 Oct 29, 2014
@tcitworld tcitworld modified the milestones: 1.8.3, 1.8.2 Dec 22, 2014
@tcitworld tcitworld modified the milestones: 1.8.3, 1.9.1 Feb 10, 2015
@tcitworld tcitworld modified the milestones: 1.9.1, 1.9.2 May 22, 2015
@nicosomb nicosomb added Site Config and removed Bug labels Feb 4, 2016
@nicosomb nicosomb removed this from the 1.9.2 milestone Feb 4, 2016
@j0k3r
Member

j0k3r commented Apr 10, 2016

The given site seems to be dead.
Anyway, I don't think we need to limit the size of an article, because the main goal of wallabag is to save the article content so it can be read later. So if you want to save only some parts of an article, you'd be better off using a bookmark instead.

Closing the issue for now.
Feel free to argue if you think we are wrong and we'll re-open the issue.

@j0k3r j0k3r closed this as completed Apr 10, 2016