Replication and Plugin Error When Syncing Larger Vaults #26
Thank you for the detailed log! Maybe it was caused by a network problem, such as a timeout during replication.
Thank you! I haven't been able to reproduce it yet, so I'll look into it.
That does make a lot of sense, and would also explain why I was receiving the same error when trying to sync attachments above 10MB (here). As the 10MB request limit is a hard limit imposed by Cloudant, I wonder if it might be possible to have requests larger than 10MB split into smaller chunks?
While this all could be overcome by self-hosting a CouchDB instance and lifting the request limit, there will be users who do not have the hardware facilities or infrastructure to do this, and so will be reliant on Cloudant for syncing.
After some further testing, I've found that the limit is actually somewhere around 7MB in this use case rather than 10MB. I've just about managed to sync a portion of my vault across my devices using a workaround. The steps are as follows:
1-1.5MB chunks seem to be the most reliable for ensuring efficient data transfer without the plugin throwing any other errors. The above workaround is currently the only way I've found to sync a database larger than ~7MB in total size. Perhaps having the plugin detect a difference of 7MB+ between the remote and local databases, and then automating this process of chunking and receiving/transmitting ~1-1.5MB of data until syncing is complete, would resolve the problem? The time taken to sync would be far longer, but the operation would be significantly more reliable.
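To make the idea concrete, here is a rough sketch (not the plugin's actual code) of what such chunked replication could look like with plain PouchDB, grouping documents by a fixed count as a stand-in for the ~1-1.5MB size check; `localDb` and `remoteDb` are assumed to be existing PouchDB instances:

```typescript
import PouchDB from "pouchdb";

// Replicate the vault to the remote in small groups of documents so that
// no single batch of requests approaches Cloudant's payload limit.
async function replicateInChunks(
  localDb: PouchDB.Database,
  remoteDb: PouchDB.Database,
  idsPerChunk = 20 // tune so each group stays roughly in the 1-1.5MB range
): Promise<void> {
  // List every document ID in the local database.
  const all = await localDb.allDocs();
  const ids = all.rows.map((row) => row.id);

  // Replicate the IDs group by group; each group is one bounded transfer.
  for (let i = 0; i < ids.length; i += idsPerChunk) {
    const chunk = ids.slice(i, i + idsPerChunk);
    await localDb.replicate.to(remoteDb, { doc_ids: chunk });
  }
}
```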
I'm sorry for your frustrations... Self-hosted LiveSync uses PouchDB and synchronizes with the remote via its replication protocol. However, that alone was not enough. Unfortunately, there is no way to automatically adjust the size of every request. If you set these values to lower numbers, the number of requests will increase. I tried IBM Cloudant with a large vault (200MB, 963 files) and it worked well. Could you please try it?
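For reference, this is roughly how those settings map onto PouchDB's standard replication options: `batch_size` limits how many documents go into each request and `batches_limit` limits how many batches are buffered at a time, so lower values mean smaller but more numerous requests. A minimal sketch, assuming plain PouchDB (the database names and URL are only illustrative, and the 50/10 values are the ones discussed in this thread):

```typescript
import PouchDB from "pouchdb";

const localDb = new PouchDB("obsidian-vault");                      // illustrative name
const remoteDb = new PouchDB("https://example.cloudant.com/vault"); // illustrative URL

localDb.sync(remoteDb, {
  batch_size: 50,    // documents per change batch
  batches_limit: 10, // batches buffered/processed at a time
})
  .on("complete", (info) => console.log("sync complete", info))
  .on("error", (err) => console.error("sync failed", err));
```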
Just updated and gave it another go with those settings.
After some further testing, with batch size: 200 and batch limit: 40, I'm able to successfully sync a larger proportion of my vault than I was initially able to without errors - 368 files, 94 folders, 68.2MB total size. If I try to sync the entire vault, I receive the above fetch error regardless of whether I use batch size: 200 and batch limit: 40, or batch size: 50 and batch limit: 10, which you had suggested - 576 files, 126 folders, 81.7MB total size. I should also point out that with batch size: 50 and batch limit: 10, I receive the same error even while trying to sync a small vault - 69 files, 48 folders, 500KB total size.
This is probably hitting the access-frequency (rate) limits of IBM Cloudant.
I would assume that large binary files in a vault are not changed frequently (are we doing diff on them anyway?). If that's the case, is it possible to store binary files on places such as S3 or some S3-compatible storage services (some are quite cheap and affordable) and only store a hash/URL to that file in the database?
@chengyuhui Yes, Self-hosted LiveSync checks the file in four cases.
In these cases, LiveSync checks the file's contents and, if they have changed and the changes are not conflicting, updates it to the latest version. By the way, some types of attachments won't change very frequently. I think it is a very useful feature, and other software does something like this: Outlook saves the attachment into SharePoint and links it automatically. But I think it is out of the scope of my plugin.
Are we actually comparing the contents of binary files? My particular use case is that my self-hosted database server is quite limited in bandwidth (~5Mbps, although latency is quite good). S3 (or even S3 + Cloudflare or another CDN provider) could be a great solution if the user has large files to be synced. After all, one is not supposed to store several gigabytes of data in a database.
You're right. It would be useful to mark large files and synchronize them to another place, like Git LFS does. Using a bucket is a good idea, but there are a few limitations. We would need a server that proxies requests between Obsidian and AWS. And even if a file is binary, it is split into chunks, and chunks are deduplicated across the whole vault. So if a file is not completely rewritten, it is stored and synchronized efficiently. PDFs and other appendable files fit this situation well.
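As a simplified illustration of that chunk-and-deduplicate idea (this is not the plugin's actual code, and the chunk size is made up), splitting a file into fixed-size pieces keyed by a content hash means identical pieces collapse to a single stored chunk, so a partially rewritten file only re-uploads the pieces that changed:

```typescript
import { createHash } from "crypto";

const CHUNK_SIZE = 100 * 1024; // hypothetical chunk size

// Split a binary file into fixed-size pieces, keyed by their content hash.
function splitIntoChunks(data: Buffer): Map<string, Buffer> {
  const chunks = new Map<string, Buffer>();
  for (let offset = 0; offset < data.length; offset += CHUNK_SIZE) {
    const piece = data.subarray(offset, offset + CHUNK_SIZE);
    // Content-addressed key: identical pieces map to the same entry.
    const key = createHash("sha256").update(piece).digest("hex");
    chunks.set(key, piece);
  }
  return chunks;
}

// Only chunks whose keys are not already present in the database need to be
// uploaded; unchanged pieces of a file are never transferred again.
```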
https://docs.aws.amazon.com/AmazonS3/latest/userguide/ManageCorsUsing.html
Also, according to Vinzent03/obsidian-git#57, CORS can be bypassed with Obsidian's API.
If the allowed origins are set to "*", Access-Control-Allow-Credentials usually cannot be true. But according to some topics on Stack Overflow, AWS may respond by echoing back the request's origin. I have to try this. The "request" API can only handle text data, but perhaps that's enough for this use case. (Sorry, I dismissed this API once and completely forgot about it; it couldn't be used for synchronization.) Would you mind if I made this a new issue?
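For context, here is a rough sketch of how Obsidian's `request` helper might be used for such an upload; it bypasses the browser's CORS restrictions but only returns text, so binary content would need a text-safe encoding such as base64. The parameter names reflect my reading of the API, and the bucket URL and function are purely illustrative:

```typescript
import { request } from "obsidian";

// Upload one content-addressed chunk to a (hypothetical) S3 bucket.
// The body must already be text-encoded, since `request` only handles text.
async function uploadChunkAsText(chunkKey: string, base64Data: string): Promise<void> {
  await request({
    url: `https://example-bucket.s3.amazonaws.com/${chunkKey}`, // illustrative bucket
    method: "PUT",
    contentType: "text/plain",
    body: base64Data,
  });
}
```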
Yes, feel free to do so. |
@kwjfef
I have tried this, but it's impossible as the plugin refuses to sync at all after the replication/fetch error. The only way to recover is by resetting the local and remote databases on all devices and starting the replication process from scratch. |
Oh, thanks... |
Could you clarify what you mean by content-length? |
Perfect, thanks! For the
Thanks too! |
Thank you so much! |
|
@kwjfef Thank you for the detailed log. It was so helpful. It's probably fixed in v0.4.1. And you don't have to set the batch size and batch limit. You can set these values back to default. Could you please try this version? |
I've just given 0.4.1 a try and I can confirm that everything works perfectly! I have noticed that the time to sync singular very large files (e.g., the ~300MB MP4) can be quite long, but I suppose there's not much that can be done about that. |
@kwjfef I'm very happy to hear that! Yes, if you are logically far from the CouchDB instance, a higher number of requests slows down the replication.
I'll close this issue for now.
Just following up on a post I'd made on the Obsidian forum (here) over a month ago.
I'd noticed that you'd made several revisions to the plugin since then, so I thought I'd give it another go today.
Unfortunately, I've found that this issue has not yet been fixed.
I've had some time to investigate the problem and have concluded that it occurs only when attempting to sync a larger vault.
From what I'm able to make of the logs, it seems that the plugin attempts to update the database, gets stuck, and then gives up syncing the files without reporting any particular reason.
Checking the Cloudant database reveals that only the "obsydian_livesync_version" file is ever synced, and no real vault data makes it across.
This problem occurs whether I add all files to the vault in one go and sync in a single attempt, or slowly add the files to the vault one by one and sync across several attempts.
After the plugin gets stuck, it is frozen in an errored state. Numerous Obsidian restarts and local/remote database initialisations and resets do not resolve the problem as long as all files within the large vault are still present.
The only way to recover the plugin is to reset the local/remote databases using a small vault, e.g., the 700KB vault mentioned above.
Please see the log below captured after the plugin has entered the errored state:
This is quite a significant problem for me at the moment as it means that I'm unable to sync my entire vault across my devices.
Any help or suggestions would be very much appreciated.