PL/Java crash on Ubuntu latest kernel #129

Open
petebrew opened this Issue Jun 24, 2017 · 7 comments

petebrew commented Jun 24, 2017

I've been using PL/Java for many years in a scientific application that I develop and distribute. The most recent version runs on Ubuntu Xenial with PG 9.5, and all has been well for quite some time.

On Wednesday my own instance of the server was shut down to replace a disk in a degraded RAID array. After the restart, everything on the server seems to work fine except calls that use PL/Java, which fail with the following error message:

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

I've just had an email from another lab running my software who have the same problem. They noticed it following a power outage that shut their server down. So both servers first exhibited the issue following a hard restart. While there appear to have been no automatic software updates at the time, I'm now guessing that an earlier update was installed but didn't take effect until the servers were rebooted.

Anybody have a similar problem? Any suggestions on how to troubleshoot?

Many thanks

@petebrew petebrew changed the title from Server close the connection unexpectedly to Server closed the connection unexpectedly Jun 24, 2017

Contributor

jcflack commented Jun 24, 2017

Hi,

You might check issue #128, reporting a crash from Java after a recent kernel update. That could be consistent with your observation of a problem right after an OS reboot.

The hs_err_pid*.log file in #128 indicates that, at the time of the crash, Java is driving and doing nothing more exotic than initializing java.lang.Object very early in VM startup. A SIGBUS gets raised on an attempt to access an address that falls in an unmapped memory region just below the stack.

OS vendors have been recently rushing out kernels to harden against a possible attack that accesses memory near the gap between stack and lower addresses. So it's quite possible that the mappings in that area have been changed, and there's an interaction with something Java tries to do (though what exactly, I haven't guessed).

You might check for your own hs_err_pid*.log file and see if the salient details match those in #128.
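For anyone comparing logs, here is a minimal sketch of pulling the signal name out of any hs_err_pid*.log files in a directory. It assumes the usual HotSpot header line of the form `#  SIGBUS (0x7) at pc=...`; the class name `HsErrScan` and the directory argument are made up for illustration, and where the logs end up (often the process's working directory, which for a PostgreSQL backend is typically the data directory) is an assumption to verify on your own system.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: report which signal each hs_err_pid*.log records.
final class HsErrScan {
    // Matches header lines such as "#  SIGBUS (0x7) at pc=..., pid=..., tid=..."
    private static final Pattern SIGNAL_LINE =
        Pattern.compile("^#\\s+(SIG[A-Z0-9]+)\\s+\\(0x[0-9a-fA-F]+\\)");

    // Return the first signal name found in the given log lines, if any.
    static Optional<String> signalOf(List<String> lines) {
        for (String line : lines) {
            Matcher m = SIGNAL_LINE.matcher(line);
            if (m.find()) {
                return Optional.of(m.group(1));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // Directory to scan; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        try (DirectoryStream<Path> logs =
                 Files.newDirectoryStream(dir, "hs_err_pid*.log")) {
            for (Path log : logs) {
                String signal = signalOf(Files.readAllLines(log))
                    .orElse("no signal line found");
                System.out.println(log + ": " + signal);
            }
        }
    }
}
```

If this reports SIGBUS, the frames and the faulting address in the full log are still worth comparing by hand with the details in #128.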

It would be of interest to find out if there's a way to get Java standalone to misbehave in a similar way under this updated kernel.

-Chap

Contributor

jcflack commented Jun 24, 2017

I can't access the Red Hat solutions link, but other reports of the issue online suggest adding -Xss2M (or larger) to the VM options to make the per-thread stack at least 2 MB in size. Apparently the new kernel enlarges the "stack guard" region below the stack, and Java blunders into it if the initial stack size isn't big enough.

According to the docs, this option is a hard stack size setting, not a minimum. Not only will the stack start at that size, it also can't grow. So if whatever PL/Java is being used for might require more than 2 MB of stack, the option may need to be increased further to avoid stack overflow errors.
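To get a rough feel for how a fixed stack size bounds recursion, here is a small standalone sketch. It does not set -Xss itself; it uses the `Thread` constructor that takes a requested stack size, which the JVM is allowed to treat as a hint, so the numbers are only indicative. The class name `StackDepthProbe` is made up for illustration.

```java
// Hypothetical probe: count how deep a trivial recursion gets on a thread
// created with a requested stack size, before StackOverflowError is thrown.
final class StackDepthProbe {
    private int depth;

    // Recurse until the thread's stack is exhausted.
    private void recurse() {
        depth++;
        recurse();
    }

    // Run the recursion on a thread with the requested stack size (bytes)
    // and return the depth reached when the stack overflowed.
    static int maxDepth(long stackBytes) {
        StackDepthProbe probe = new StackDepthProbe();
        Thread t = new Thread(null, () -> {
            try {
                probe.recurse();
            } catch (StackOverflowError expected) {
                // Stack exhausted; probe.depth holds the count reached.
            }
        }, "stack-probe", stackBytes);
        t.start();
        try {
            t.join(); // join makes the final depth visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while probing", e);
        }
        return probe.depth;
    }

    public static void main(String[] args) {
        for (long mb : new long[] {1, 2, 8}) {
            System.out.println(mb + " MB requested: depth "
                + maxDepth(mb * 1024 * 1024));
        }
    }
}
```

Real PL/Java functions use much larger frames than this one-line recursion, so the useful comparison is between the depths at different sizes rather than the absolute numbers.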

I assume this is an interim solution, and Oracle will eventually release a Java update that doesn't blunder into the stack guard, and then -Xss won't have to be explicitly set.

petebrew commented Jun 24, 2017

Chap, you're a total hero! This fixes my problem for now and I guess we wait for a proper Java fix.

akshunj commented Jun 24, 2017

This seems to be the same issue I encountered in #128. @petebrew, do you happen to know the OS kernel version?

petebrew commented Jun 24, 2017

The version that I know definitely exhibits the problem is 4.4.0-81-generic. Before this Wednesday my server hadn't been restarted for months, so it's possible that earlier versions also had the issue.

My unattended-upgrades log shows that this kernel was installed on Monday 19th June. Given that I haven't had complaints before this week from other users (who are likely to restart much more often than I do), it seems a good guess that prior kernel versions work OK.

Using the vmoptions workaround with the more secure kernel seems the best fix for now though.
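A quick way to check what a JVM actually sees is to print the kernel release and look for an -Xss argument among its startup options. The sketch below is standalone; `VmEnvReport` and `findXss` are made-up names. On Linux `os.version` normally reports the kernel release. Whether options passed by PL/Java to an embedded JVM show up in `getInputArguments()` is an assumption worth verifying, for example by wrapping the same calls in a PL/Java function.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Optional;

// Hypothetical diagnostic: report kernel release and any -Xss VM option.
final class VmEnvReport {
    // Return the -Xss argument from the list, keeping the last one seen.
    static Optional<String> findXss(List<String> vmArgs) {
        Optional<String> found = Optional.empty();
        for (String arg : vmArgs) {
            if (arg.startsWith("-Xss")) {
                found = Optional.of(arg);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // On Linux this is typically the kernel release string.
        System.out.println("kernel (os.version): "
            + System.getProperty("os.version"));
        List<String> vmArgs =
            ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("stack option: "
            + findXss(vmArgs).orElse("none (JVM default)"));
    }
}
```

Running it as `java -Xss2M VmEnvReport` should echo the option back, which confirms the check itself works before relying on it elsewhere.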

@jcflack jcflack changed the title from Server closed the connection unexpectedly to PL/Java crash on Ubuntu latest kernel Jun 25, 2017

Contributor

jcflack commented Jun 25, 2017

Related OpenJDK bug entry to watch: https://bugs.openjdk.java.net/browse/JDK-8182777
