0.19.4/0.19.5 upgrade - looking for testers #91
Unfortunately, 0.19.4 is an enormous update that requires manual user intervention to upgrade from Postgres 15 to 16, and some minor oversight when migrating from Pictrs 0.4 to 0.5: https://github.com/LemmyNet/lemmy-ansible/blob/main/UPGRADING.md

A database upgrade is not something I'm comfortable with automating for the user, and the upgrade script they provided is currently not working on my own server (they didn't really add any error checks... at all). I don't want to be overwhelmed with reports of people breaking their Lemmy-Easy-Deploy systems, so I need time to think about how to approach this properly. I've already spent 3 hours today fixing my container builds for 0.19.4 and working on other 0.19.4 changes, and unfortunately I don't have any more time to give to this today. Feel free to suggest things in here, but otherwise I'm going to need more time to sort out this huge breaking change.

I just released LED 1.3.4, which adds a safeguard that prevents users from upgrading to 0.19.4 for the time being. In the meantime, I'll be rolling back my own deployment to 0.19.3.
Currently, the upgrade script provided by Lemmy is not compatible with Lemmy Easy Deploy, as it does not take the Compose project name into account. It might not be too bad, but I will probably have to make my own. I'm thinking about automating it in LED with the disclaimer that you MUST have made your own backups, but that seems dangerous to me. If anyone has any free time and expertise, any help with migrating is greatly appreciated. I can do all of this myself at some point when I have free time, but I have very little of that this weekend. Sorry for the inconvenience!
If you need any testers, just give me a yell here or @blueether@no.lastname.nz. I can have a quick look at the db script today, but no promises.
I can test also
Please look at this: https://github.com/pgautoupgrade/docker-pgautoupgrade. It has worked in a test database.
That's probably going to save me a lot of time; I will very likely be using this. Thanks @pallebone!
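For reference, a minimal sketch of what swapping the database image over might look like. The tag names here are assumptions, not necessarily what Lemmy-Easy-Deploy actually ships:

```bash
# Hypothetical example: point the existing postgres service at pgautoupgrade.
# Both tags are assumptions; check your actual docker-compose.yml first.
sed -i 's|image: postgres:15-alpine|image: pgautoupgrade/pgautoupgrade:16-alpine|' docker-compose.yml

# On the next start, pgautoupgrade migrates the data directory in place,
# then runs Postgres normally.
docker compose up -d postgres
```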
Hi everyone, I have been testing some changes for the 0.19.4 migration. I've updated the original post with instructions on how to test, and some requests for feedback.
I have some comments to add to this, but I will only be able to mention them tomorrow (too late for me now), especially regarding postgres and "I am tempted to ship this as the default here too".
"However, I still have some concerns about how this auto upgrade will work on large instances:
Will take a while, but no way around it. My DB is 25GB so I can tell you how long it will take on that when you have a working script. I expect about 1 minute per GB so approx 25-35 minutes. " pictrs was bumped to v0.5, which will also perform its own internal database migration. How stable is it, and how long will it take?" It is imperative that the update script allows us to pause and or continue after each step. IE: first it must upgrade postgres, then wait. Then after we can check everything is working, we must be able to continue with lemmy update, then stop, and finally do the pictrs update. I dont recommend doing all 3 in one step as it will make troubleshooting very difficult and obtuse. "In addition, the official Lemmy deployment now recommends this custom Postgres config:" I dont reccomend setting "defaults" but rather only suggesting values. My customPostgresql looks like this:
some differences include: Since their values seem mostly arbitrary (probably guessed from trial and error rather than a deep understanding of the settings) I dont recommend them. I include my config above with how to derive the values. "If you add the above two volume sizes together, did you have that much disk space free before the upgrade? " Most of the update will revolve around postgres imho. Pete |
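Pete's actual config values were not captured in this thread. As a stand-in, here is a sketch of the generic rule-of-thumb derivation often used for these two settings; these are common defaults, not his numbers:

```bash
# Derive suggested Postgres tuning values from total system RAM.
# ~25% of RAM for shared_buffers and ~75% for effective_cache_size are
# widely used starting points, not authoritative values.
ram_mb=$(free -m | awk '/^Mem:/{print $2}')
echo "shared_buffers = $((ram_mb / 4))MB"
echo "effective_cache_size = $((ram_mb * 3 / 4))MB"
```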
Great feedback, thank you!
Currently, there is no special upgrade script; it is just a normal Compose deployment that starts all services at once, like it always has. The only difference here is that I've changed the Postgres container to pgautoupgrade. The Pictrs container already performs a self-migration.

I could try to start each container one at a time in an intelligent order, then wait for the container to appear healthy before moving on. I'm just not sure how to detect when a migration is in progress or complete. My current deployment was written to expect instant container crashes, so it considers a container that has been running consistently to be healthy. I'll try to come up with something more reliable. But even in the current state, your deployment should work fine; Lemmy will just report errors in the frontend until the migrations are complete and the backend containers are operational.

Side note: Lemmy just released 0.19.5, and it doesn't require any compose changes. This means you can use the branch I listed above, and it will automatically grab 0.19.5 instead.
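A rough sketch of that "one at a time" idea, assuming every service defines a Docker healthcheck. The service names are LED's usual ones, but treat them as assumptions; this is not LED's actual code:

```bash
# Bring services up in dependency order, waiting for Docker to report
# each one healthy before continuing to the next.
for svc in postgres pictrs lemmy lemmy-ui proxy; do
  docker compose up -d "$svc"
  cid=$(docker compose ps -q "$svc")
  until [ "$(docker inspect -f '{{.State.Health.Status}}' "$cid")" = "healthy" ]; do
    sleep 5
  done
done
```

Note this only works if every service has a healthcheck defined; a container without one never reports a health status at all.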
Alright, what are the steps to try an upgrade? I can maybe try this tomorrow.
They're in the first message of this issue, near the bottom (preceded by a very large and obnoxious warning 😄).
I'm writing something in a different branch that will properly check Postgres via [...]. I can do something similar with pictrs via their [...]. I'll look into doing proper checks for the Lemmy backend as well. I'll make sure this is all working properly before I make this an official update.
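One common way to probe Postgres readiness is pg_isready; this is an assumption about the approach, not a quote of what that branch actually does, and the user name is a placeholder:

```bash
# pg_isready exits 0 once the server accepts connections, making it easy
# to poll from a deploy script until the database is usable.
docker compose exec postgres pg_isready -U lemmy
```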
I have made some significant changes to how Lemmy-Easy-Deploy does health checks based on @pallebone's feedback. Now, every service has a unique health check that it must pass before it is considered deployed. Each service deploys one at a time, provides log snippets every 15 seconds, and provides the opportunity to abort the deployment with CTRL+C without completely killing the containers.

Because of this change, the time limit on deployments has been disabled. If there is a fatal error, it will be up to the user to recognize that from the logs and press CTRL+C to abort. Otherwise, the failing service will continue to crash loop indefinitely until the user intervenes. All of this will allow you to carefully monitor the progress of any migrations, and step in if needed.

This change is now available on the `0.19.5-migrationcheck` branch.
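The shape of that "log snippets plus CTRL+C abort" behavior might look something like this. Illustrative only; the variable setup is assumed and this is not LED's actual implementation:

```bash
# Trap CTRL+C so the script exits but leaves the containers running.
trap 'echo "Deployment aborted; containers left running"; exit 130' INT

svc=postgres
cid=$(docker compose ps -q "$svc")

# Until the service is healthy, print a short log snippet every 15 seconds
# so the user can watch migration progress and abort if something is wrong.
while [ "$(docker inspect -f '{{.State.Health.Status}}' "$cid")" != "healthy" ]; do
  docker compose logs --tail 10 "$svc"
  sleep 15
done
```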
Ok cool, I will take a look at this tomorrow.
Before: lemmy-easy-deploy_pictrs_data [...]
After: lemmy-easy-deploy_pictrs_data 250.1GB

I had 50GB free. ~15 minutes for migrations. Everything seems good.
Woah, only 15 minutes for that much data? And with not a lot of disk space left? That's pretty incredible; I'm very happy to hear it went smoothly. Thanks for testing!!
Thank you for the awesome work!
My experience:

- 16 min past hour - backup
- [...]
- This issue was resolved in pictrs 0.5.15, but the deployed version was 0.5.
- ctrl-c
- Manually edited the docker compose file to upgrade pictrs to 0.5.15
- Ran this command: [...]
- Waited for pictrs to recover... took some time.
Also changed postgres to postgres:16-alpine in the docker compose file.
Thanks for finding and reporting that Pictrs issue! I have been trying to keep parity with the "official" Lemmy deployment, and I noticed they reverted from Pictrs [...]. I'm not sure why this was, but if you ran into an issue like that, I'm definitely going to make pictrs 0.5.15 the default instead. Appreciate you catching that!
I checked Docker Hub, and something else must have been wrong, as 0.5 is a tag that pulls the latest 0.5 release, so it would have pulled 0.5.16. Unsure why I had an issue in this case. This might explain why they altered the tag to 0.5 - so it auto-updates along the 0.5 branch.
Hmmmm. You said the issue was resolved in 0.5.15; is there a pictrs issue you were keeping an eye on? I want to read through any similar reports.
Unfortunately this is on Matrix chat, where you can ask the developer for help (Tavi is the dev): [...]

I changed to 0.5; now the upgrade is done and it seems to be working without error using tag 0.5.

```
root@lemmy01:/var/lib/docker# cat /lemmy/Lemmy-Easy-Deploy/live/docker-compose.yml
services:
  proxy: [...]
  lemmy: [...]
  lemmy-ui: [...]
  pictrs: [...]
  postgres: [...]
volumes: [...]
```
Thanks! That makes sense. It looks like it's only an issue with the migration code. Is it possible you had a pre-downloaded pictrs:0.5 tag that was actually <=0.5.14, which was used instead? That would explain why 0.5 is working for you now, and why 0.5.15 worked for your migration. I could be more aggressive with the image pulls to help with this a little bit.
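One way to check what a locally cached 0.5 tag actually points at. The asonix/pictrs image name is an assumption; substitute whatever your compose file actually references:

```bash
# Show the digest the local tag resolves to, for comparison against
# the current digest on Docker Hub.
docker image inspect --format '{{index .RepoDigests 0}}' asonix/pictrs:0.5

# Re-pull so the local tag tracks the newest 0.5.x release.
docker pull asonix/pictrs:0.5
```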
It's technically possible, I suppose, but I am unclear how to check this now that I have fiddled around and so on. Either way, the big stuff (postgres and lemmy) updated fine. pictrs can actually be completely broken and lemmy still works; just pictures don't work on the site, so it's not as big an issue, i.e. the site is still up and you can try to get pictrs working while the site is functional.
Lemmy-Easy-Deploy 1.4.0 has been released! 🥳 Huge thanks to everyone here for helping me test and providing me with valuable insight. If you helped test, please do not forget to switch back to the main branch:

```bash
git checkout main
```
Thanks for all your help also :)
Quick FYI - I forgot to change a variable name in 1.4.0, so please update to 1.4.1, or else your Pictrs might be a little broken (presumably anything requiring an API key). If you just run [...], it will pick it up:

https://github.com/ubergeek77/Lemmy-Easy-Deploy/releases/tag/1.4.1
How can I make this change manually for now?
You can edit [...], but I'm not sure if Lemmy will pick up changes to that file automatically; the backend service may need to be restarted. Also, I couldn't quite figure out what exactly requires authentication on the pictrs API, so if you aren't having issues with thumbnail generation or user-submitted images, this can probably just wait.
With the hotfix, you forgot to update the version number, so it endlessly updates.
Thanks, sigh...
Thanks, I will take a look into this tomorrow as it's late now.
I changed the version number and re-tagged it as 1.4.1; if you hit "yes" on the update prompt, it will sort itself out. Thanks again for all the testing and reports, everyone!
Just thought I would test the script again, and I get this when I run it:
```
[...]
!!! WARNING !!! WARNING !!! WARNING !!! WARNING !!! WARNING !!!
[...]
Would you like to proceed with this deployment? [Y/n] n
```
Why does it detect 0.19.1?
I went ahead and updated anyway, since you are busy. After the upgrade, I edited the docker compose file, as it leaves this tag for postgres: [...]

I changed it to postgres:16-alpine. I don't think it should leave the upgrade tag as the running database, as that will continually check and loop to see if there is an upgrade to do.
The version check is done by reading [...]. As for pgautoupgrade, it doesn't continually do a version check to my knowledge. After a migration, it just runs Postgres at the very end of the script, and should be a drop-in replacement for Postgres: https://github.com/pgautoupgrade/docker-pgautoupgrade/blob/main/docker-entrypoint.sh

I intend to keep this as the Postgres image moving forward, just for future-proofing.
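For anyone who would rather not keep pgautoupgrade as the long-running image: the project's README documents a one-shot mode that upgrades the data directory and then exits, after which the stock image can be restored. The volume name below is a guess; check `docker volume ls` for yours:

```bash
# Run the upgrade once against the existing data volume, then exit.
docker run --rm -e PGAUTO_ONESHOT=yes \
  -v lemmy-easy-deploy_postgres_data:/var/lib/postgresql/data \
  pgautoupgrade/pgautoupgrade:16-alpine

# Afterwards, the compose file can point back at the official image,
# e.g. postgres:16-alpine.
```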
Interesting. I don't agree with this change, as it means we are no longer tracking the official postgres image and are trusting that no issues crop up with a third-party project, when it is not required to do so for any meaningful benefit once the upgrade is done. However, since we can alter the tags ourselves, that is acceptable, but you might want to make people aware so they can decide themselves.
I would be willing to accept a PR that does an intelligent check to see if using this image is necessary (a sketch of one possible check is below), but this is the most frictionless option I have right now. I know it's a third-party image, but it's open source and functionally identical to the real postgres image. There should be no compatibility issues. If you're worried about supply chain attacks, I could consider locking it to an image fingerprint depending on the host architecture.
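One possible shape for that "is the upgrade image even needed?" check. Postgres stores its major version in a PG_VERSION file at the root of the data directory; the volume name here is a guess:

```bash
# Read the on-disk Postgres major version from the data volume.
PGDATA_VOL=lemmy-easy-deploy_postgres_data
ver=$(docker run --rm -v "${PGDATA_VOL}:/data:ro" alpine cat /data/PG_VERSION)

if [ "$ver" = "16" ]; then
  echo "Data dir is already on Postgres 16 - plain postgres:16-alpine is fine"
else
  echo "Data dir is on Postgres $ver - an upgrade image (e.g. pgautoupgrade) is needed"
fi
```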
It's ok, I'm happy to accept your decision, as it's your project and ultimately your reputation. It's also easy for me to change it myself, so it does not cause me personally an issue. I assumed it was an error, but if you are aware and happy, I am also happy.
I have been testing the 0.19.4/0.19.5 upgrade on a small, brand new 0.19.3 deployment. Everything worked fine. Huge thanks to @pallebone for pointing out pgautoupgrade to me; I really had no idea it existed, and it's a huge reason this is able to be so seamless.
However, I still have some concerns about how this auto upgrade will work on large instances:

- pgautoupgrade is great, but how stable is it on very large databases? And how long does it take?
- pictrs was bumped to v0.5, which will also perform its own internal database migration. How stable is it, and how long will it take?

In addition, the official Lemmy deployment now recommends this custom Postgres config:
https://github.com/LemmyNet/lemmy-ansible/blob/main/examples/customPostgresql.conf
So far, my own single user instance has worked fine without any configuration, and even with a remarkably small 64MB SHM size, which is still the default.
I am tempted to ship this as the default here too, but I know a good handful of people use my project because I support 32-bit ARM deployments, and many of those SBCs have very low system RAM. As the above custom Postgres config wants a 2GB SHM size, I don't think those ARM users will be very happy. Feedback on this is welcome, though I will probably end up keeping the defaults untouched, and linking users to this config for extra performance.
TL;DR - I would like some feedback before I make this an official update. I will give instructions on how to perform this test. Here is what I am looking for:

- How long the migrations took
- The sizes of your postgres and pictrs volumes before and after the upgrade (`docker system df -v`)
- If you add the above two volume sizes together, did you have that much disk space free before the upgrade?
For the love of everything, PLEASE make a backup that you are 1,000% certain you can easily and quickly restore.
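One possible way to snapshot the named volumes while everything is stopped. The volume names are examples (lemmy-easy-deploy_pictrs_data appears earlier in this thread, but the postgres one is a guess; check `docker volume ls` for yours):

```bash
# Stop the stack so the data is quiescent, then tar each volume's contents.
docker compose down
for v in lemmy-easy-deploy_postgres_data lemmy-easy-deploy_pictrs_data; do
  docker run --rm -v "$v":/data:ro -v "$PWD":/backup alpine \
    tar czf "/backup/${v}.tar.gz" -C /data .
done
```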
I have only tested pgautoupgrade on a nearly empty Lemmy deployment; I do not know what to expect from real-world servers. Please make a backup and run through a restore process before trying this.

How to test:
1. `git pull && git checkout 0.19.5-migrationcheck`
2. Run `deploy.sh` like you normally would

Everything should be automatic from there, but I have turned off the time limit for the deployment health checks - for all I know, someone's Postgres migration might take 2 hours or something crazy.
I will do my own testing as well, and if I am satisfied with my own testing and/or the responses here, we will be good to go!