Import from wallabag asynchronously #1611
I'd like to suggest that wallabag keep import/export of full content, for a simple reason: with really large collections, some articles could be years old and the URLs to access them may not work down the road. The website could be completely offline, or the CMS may have changed and the old permalinks not migrated to the new platform. Also, to solve the problem of loading big export files into memory, another solution would be to use a streaming JSON parser such as https://github.com/salsify/jsonstreamingparser. This would allow loading only one entry at a time into memory for processing, making the process much faster and less resource-intensive.
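To illustrate, here is a minimal sketch built on the `Parser`/`ListenerInterface` API of salsify/jsonstreamingparser (exact class and method names may differ between versions of the library). Each element of the export's top-level JSON array is assembled, handed to a callback, then discarded, so memory use stays flat regardless of file size. `importEntry()` is a hypothetical stand-in for wallabag's persistence code:

```php
<?php

require 'vendor/autoload.php';

use JsonStreamingParser\Listener\ListenerInterface;
use JsonStreamingParser\Parser;

class OneEntryListener implements ListenerInterface
{
    /** @var callable invoked with each fully built entry */
    private $onEntry;
    private array $stack = [];    // containers being built, innermost last
    private array $types = [];    // 'object' or 'array', parallel to $stack
    private array $keyStack = []; // the key each container has in its parent
    private ?string $pendingKey = null;

    public function __construct(callable $onEntry) { $this->onEntry = $onEntry; }

    public function startDocument(): void {}
    public function endDocument(): void {}
    public function whitespace(string $whitespace): void {}

    public function startObject(): void { $this->open('object'); }
    public function startArray(): void  { $this->open('array'); }
    public function key(string $key): void { $this->pendingKey = $key; }
    public function value($value): void { $this->insert($value); }
    public function endObject(): void { $this->close(); }
    public function endArray(): void  { $this->close(); }

    private function open(string $type): void
    {
        $this->stack[] = [];
        $this->types[] = $type;
        $this->keyStack[] = $this->pendingKey; // remember our key in the parent
        $this->pendingKey = null;
    }

    private function close(): void
    {
        $done = array_pop($this->stack);
        array_pop($this->types);
        $this->pendingKey = array_pop($this->keyStack);

        if (count($this->stack) === 1) {
            ($this->onEntry)($done); // one complete entry: process and forget it
        } elseif (count($this->stack) > 1) {
            $this->insert($done);    // nested value: attach it to its parent
        }
    }

    private function insert($value): void
    {
        $top = count($this->stack) - 1;
        if ($top < 0) {
            return;
        }
        if ($this->types[$top] === 'object') {
            $this->stack[$top][$this->pendingKey] = $value;
        } else {
            $this->stack[$top][] = $value;
        }
    }
}

$stream = fopen('wallabag-v1-export.json', 'rb');
try {
    (new Parser($stream, new OneEntryListener(function (array $entry): void {
        importEntry($entry); // hypothetical: persist one entry to the database
    })))->parse();
} finally {
    fclose($stream);
}
```

The key point is that entries are never appended to the top-level array, so nothing accumulates between callbacks.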
We can't retrieve content from Pocket, but the solution you suggest may help us keep content from JSON files.
We should make this a choice, then. If export with content fails, then export just URLs and metadata.
That's one option, but it could also be feasible for the export process to keep track of which entries have been written to disk and resume after a PHP timeout. Regardless, unless I'm mistaken, the export process (database to file) is by nature much faster than import (file to database), because import has to run multiple database queries per entry. So making the v1 -> v2 JSON import more efficient seems to me like a bigger priority than refactoring the v1 export code.
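For the "multiple queries per entry" problem, a sketch of the usual Doctrine batching pattern might help; this is not wallabag's actual import code, and `Entry`, `setUrl()`/`setTitle()`, and `$em` are assumptions standing in for the real entity and entity manager:

```php
// Persist imported entries in batches so Doctrine's unit of work doesn't
// grow unbounded, and flush once per chunk instead of once per entry.
$batchSize = 50;
foreach ($entries as $i => $data) {
    $entry = new Entry($user);          // assumed entity constructor
    $entry->setUrl($data['url']);       // assumed setters
    $entry->setTitle($data['title']);
    $em->persist($entry);

    if (($i + 1) % $batchSize === 0) {
        $em->flush();  // write the chunk to the database
        $em->clear();  // detach managed entities to free memory
    }
}
$em->flush(); // persist the final, partial chunk
$em->clear();
```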
A number of users have also encountered timeout/memory issues while exporting JSON from v1, so that remains a concern too.
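The export side could be streamed too. A sketch, assuming a hypothetical `fetchEntries()` generator that yields one database row at a time: the export is written as a JSON stream instead of building the whole array and calling `json_encode()` once, so memory stays constant and output starts immediately, which also helps against PHP timeouts.

```php
$out = fopen('php://output', 'wb');
fwrite($out, '[');
$first = true;
foreach (fetchEntries() as $entry) { // hypothetical generator over DB rows
    if (!$first) {
        fwrite($out, ',');
    }
    fwrite($out, json_encode($entry)); // encode one entry at a time
    $first = false;
}
fwrite($out, ']');
fclose($out);
```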
Agreed!
Hi @j0k3r @nicosomb. Is asynchronous export easier to implement now that you have implemented the asynchronous import feature? I guess it won't make the 2.1.0 milestone, though.
Easier maybe. |
As we can see in #1598, importing massive files is not possible (we set the limit to 20M on v2.wallabag.org, but that's not a good solution).
We need to implement RabbitMQ as in #1581 (a producer sketch follows below).
We need to refactor the JSON export in wallabag v1 to include only article URLs, not content.
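For the RabbitMQ point, here is a hypothetical producer sketch using php-amqplib: instead of importing an uploaded file synchronously, each parsed entry is published to a queue and persisted later by a worker. The queue name and message shape are assumptions, not the design from #1581.

```php
<?php

require 'vendor/autoload.php';

use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Message\AMQPMessage;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();

// Durable queue so queued entries survive a broker restart.
$channel->queue_declare('wallabag.import', false, true, false, false);

foreach ($entries as $entry) { // e.g. produced by the streaming parser above
    $message = new AMQPMessage(
        json_encode($entry),
        ['delivery_mode' => AMQPMessage::DELIVERY_MODE_PERSISTENT]
    );
    $channel->basic_publish($message, '', 'wallabag.import');
}

$channel->close();
$connection->close();
```

A worker process would then consume `wallabag.import` at its own pace and run the database writes, so the HTTP request that accepts the upload returns quickly regardless of file size.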