YJIT crash at !cb.has_dropped_bytes() assertion #548

Closed
intrip opened this issue Nov 17, 2023 · 23 comments
Labels
bug Something isn't working

Comments

@intrip

intrip commented Nov 17, 2023

We are running our app with Ruby 3.3.0-preview2 and Rails edge (main branch), and we noticed multiple YJIT crashes on some of our web hosts. We use Unicorn as our web server.

I've attached the full trace extracted from the unicorn crash.

error.log

Thanks for your support!

@intrip
Author

intrip commented Nov 17, 2023

We have been using YJIT for a while with no issues, and we noticed that the crashes started when we upgraded Rails from 7.0.4 to the main branch.

@maximecb maximecb added the bug Something isn't working label Nov 17, 2023
@maximecb maximecb added this to the Ruby 3.3 milestone Nov 17, 2023
@maximecb

Hi Jacopo,

Thank you for reporting this issue, we very much appreciate it. I see from your log that you're on x86-64. Is this running on Ubuntu machines? AWS?

Do you happen to have an offline reproduction? Is this an issue that only happens once deployed, or are you able to trigger it on your CI as well?

Best regards,

Maxime

@XrXr

XrXr commented Nov 17, 2023

Please give preview3 a try and see if you have the same issue. We've landed many fixes since preview2 and the issue you're running into might have been fixed already.

@intrip
Author

intrip commented Nov 17, 2023

> Thank you for reporting this issue, we very much appreciate it. I see from your log that you're on x86-64. Is this running on Ubuntu machines? AWS?

Those are Ubuntu machines running on Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz. No AWS; we run the app on our own servers.

> Do you happen to have an offline reproduction? Is this an issue that only happens once deployed, or are you able to trigger it on your CI as well?

Unfortunately, we don't have a reproduction script for it right now. CI works fine with YJIT enabled, and the issue is triggered only on the production servers.

Please let me know if I can do anything more to help troubleshoot it. I really appreciate your help!

@intrip
Author

intrip commented Nov 17, 2023

> Please give preview3 a try and see if you have the same issue. We've landed many fixes since preview2 and the issue you're running into might have been fixed already.

Sure! I'll try the upgrade path as the first step and will let you know. Thanks!

@maximecb

Hello @intrip ! Any update on whether this bug is present in preview3 as well?

@intrip
Author

intrip commented Nov 20, 2023

> Hello @intrip ! Any update on whether this bug is present in preview3 as well?

We are not live with the upgrade yet; we'll deploy it in a couple of days. I'll let you know as soon as possible whether that fixes it 🙇‍♂️

@intrip
Author

intrip commented Nov 22, 2023

@maximecb we just upgraded to preview3 and enabled YJIT on one of the hosts that were affected by this issue. No errors so far, which is a good sign but not conclusive. Tomorrow I'll enable YJIT on more hosts; if the error doesn't show up again, we can consider it fixed.

@maximecb

Hi @intrip, thanks for taking the time to test this and report back. I'm glad to hear that it seems to be fixed so far 🤞

@intrip
Author

intrip commented Nov 23, 2023

@maximecb after enabling YJIT on more hosts, we encountered the error again 😞.
We run multiple Unicorn processes on the same host, and when the YJIT error occurs all of the processes crash at the same time, which produces overlapping error logs. This is the best log I could extract so far: error.log. Is it enough for your troubleshooting? Hopefully I'll manage to extract a cleaner trace tomorrow. Please let me know if I can do anything else to help troubleshoot this.

@maximecb

Hi again @intrip,

Sorry to hear that this bug is still present in preview3. On the positive side, my colleagues have just made several fixes that seem like they are very likely related to this, which are available on Ruby/master.

Would you be able to build/deploy Ruby master? It should be 100% compatible with preview3 and compiled in the same way. If you do, I would recommend picking a commit sha that passed all the tests, e.g. 11d7c75

@intrip
Author

intrip commented Nov 24, 2023

Thanks, @maximecb, I'm not sure if we can deploy Ruby master with our current setup, but I'll check.

I've also found a cleaner log: error.log

@k0kubun
Member

k0kubun commented Nov 27, 2023

The backtrace was useful information, thanks.

However, I can't figure out why it happened with that version, so we might need you to run an arbitrary Ruby sha to get more information out of the crash message (since we currently have no crash in production pods that run Ruby master). In any case, it'd be nice to first run the Ruby sha Maxime mentioned, which has some YJIT bugfixes after preview3.

My analysis (for the YJIT team to read): the has_dropped_bytes assertion fails in ocb while generating a side exit for cb. In set_page (for ocb), we first set self.dropped_bytes = false, assert there's a 6-byte capacity (5 in master), write 5 bytes with jmp_ptr, and then assert !self.dropped_bytes (which failed). This is very weird. write_jcc_ptr could set dropped_bytes = true when an offset is very large, but that shouldn't happen for a jump between code pages (hence the assertion). I don't think write_byte would fail either. So 🤔
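
To make that sequence easier to follow, here is a toy model in Ruby. It is only a sketch and not YJIT's actual Rust code: the class `ToyCodeBlock` and all of its methods are hypothetical names, the 5-byte jump is assumed to be an x86-64 `jmp rel32`, and the flag is assumed to be reset to false before the jump is written. The point it illustrates is that writes never raise; they only set a flag, which is checked afterwards.

```ruby
# Toy model of a code buffer that records overflow in a flag instead of raising.
# All names are illustrative; this is not YJIT's implementation.
class ToyCodeBlock
  JMP_BYTES = 5 # assumed x86-64 `jmp rel32`: 1 opcode byte + 4 offset bytes

  attr_reader :bytes

  def initialize(capacity)
    @capacity = capacity
    @bytes = []
    @dropped_bytes = false
  end

  def dropped_bytes?
    @dropped_bytes
  end

  def capacity?(count)
    @bytes.size + count <= @capacity
  end

  # Append one byte, or set the flag if the buffer is full.
  def write_byte(byte)
    if capacity?(1)
      @bytes << (byte & 0xff)
    else
      @dropped_bytes = true
    end
  end

  # Emit a relative jump. An offset that does not fit in a signed
  # 32-bit integer sets the flag and writes nothing.
  def jmp(offset)
    unless offset.between?(-(2**31), 2**31 - 1)
      @dropped_bytes = true
      return
    end
    write_byte(0xE9)
    [offset].pack("l<").each_byte { |b| write_byte(b) }
  end

  # Reset the flag, check capacity, write the jump, then check the flag.
  def jump_to_next_page(offset)
    @dropped_bytes = false
    raise "no room for jump" unless capacity?(JMP_BYTES)
    jmp(offset)
    raise "dropped bytes after page jump" if dropped_bytes?
    true
  end
end
```

In this model the second `raise` can only fire when the offset is out of range, because the capacity check already guarantees that the five `write_byte` calls succeed. That matches why the failing assertion was surprising for a jump between nearby code pages.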

@maximecb

> I'm not sure if we can deploy Ruby master with our current setup, but I'll check.

@intrip How do you go about running preview3 with your setup? It should be possible to run Ruby master in mostly the same way? As in, if you compile and build preview3 from source, we can put Ruby/master into a tarball as well. If you work from a git sha or tag, we can provide that for Ruby master too.

@intrip
Author

intrip commented Nov 30, 2023

I'm sorry, but our team is uncomfortable running the master branch on the production servers right now. Are there any other options? Maybe there is a Ruby preview4 version coming 😅?

@XrXr

XrXr commented Nov 30, 2023

There is probably not going to be a preview4 since the actual release is coming so soon. We can try to diagnose it some other way for now.
Do you use any YJIT tuning command line options like --yjit-exec-mem-size? Also, do you call RubyVM::YJIT.code_gc anywhere in your app?

@k0kubun
Member

k0kubun commented Dec 1, 2023

It looks like rc1 is coming out shortly (not sure when, but at least before the release 😅), so you may try that version when it's released.

> Do you use any YJIT tuning command line options like --yjit-exec-mem-size? Also, do you call RubyVM::YJIT.code_gc anywhere in your app?

I'm also interested in hearing about it.

@intrip
Author

intrip commented Dec 1, 2023

> It looks like rc1 is coming out shortly (not sure when, but at least before the release 😅), so you may try that version when it's released.

Ok, we'll wait for the rc1 then.

> Do you use any YJIT tuning command line options like --yjit-exec-mem-size? Also, do you call RubyVM::YJIT.code_gc anywhere in your app?

We set RUBYOPT="--yjit-disable --yjit-exec-mem-size=192" at boot, and then enable YJIT at runtime via RubyVM::YJIT.enable

We don't do any calls of RubyVM::YJIT.code_gc
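
For readers who want to reproduce this setup, here is a minimal sketch of the runtime-enable step. It assumes the process was booted with the RUBYOPT shown above. The helper name `enable_yjit_if_available` is hypothetical, and where it is called from (for example a Rails initializer) is also an assumption, since the thread does not say.

```ruby
# Enable YJIT at runtime when the running Ruby supports it.
# Returns true if YJIT is enabled afterwards, false otherwise.
# The helper name is illustrative and not part of any library.
def enable_yjit_if_available
  # RubyVM::YJIT is only defined on CRuby builds with YJIT support.
  return false unless defined?(RubyVM::YJIT)

  # RubyVM::YJIT.enable is only available on Ruby 3.3 and later,
  # so check for it before calling.
  if RubyVM::YJIT.respond_to?(:enable) && !RubyVM::YJIT.enabled?
    RubyVM::YJIT.enable
  end

  RubyVM::YJIT.enabled?
end

puts "YJIT enabled: #{enable_yjit_if_available}"
```

The guards make the helper a no-op on Ruby builds without YJIT, so the same boot code can run in environments where YJIT is unavailable.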

@k0kubun
Member

k0kubun commented Dec 12, 2023

3.3.0-rc1 has been released. Could you try that version? It includes ruby#9124, so this should be fixed.

@maximecb

maximecb commented Jan 3, 2024

The full Ruby 3.3.0 release is out now. Let us know if this fixed the problem for you :)

https://www.ruby-lang.org/en/downloads/releases/

@intrip
Author

intrip commented Jan 9, 2024

Thanks @k0kubun and @maximecb. Sorry for the delay (holidays). We just deployed 3.3.0 and started testing YJIT again. I'll keep you up to date 👍

@intrip
Author

intrip commented Jan 12, 2024

@maximecb @k0kubun The error is gone now that we are running Ruby 3.3.0 🥳. Thank you very much for fixing this and for your support throughout 🙏!

@intrip intrip closed this as completed Jan 12, 2024
@maximecb

@intrip Happy to hear it's working well. Thanks for updating us! :)
