Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Container Shutdown behavior changed between 1.19.1 and 1.19.2 ressources not cleaned up properly #7871

Closed
EAlf91 opened this issue Nov 24, 2023 · 14 comments
Labels

Comments

@EAlf91
Copy link

EAlf91 commented Nov 24, 2023

Module

Core

Testcontainers version

1.19.2

Using the latest Testcontainers version?

Yes

Host OS

Linux

Host Arch

x86

Docker version

Docker version 24.0.5, build ced0996

What happened?

We're facing problems running our integration tests using Testcontainers since 1.19.2 and even though our tests run fine, we're facing a lot of errors and flooded log outputs.

out test setup looks like this:

@SpringBootTest
@ActiveProfiles("test")
public class BaseIntegrationTest {

    @Autowired
    EntityManager entityManager;


    @Autowired
    JdbcTemplate jdbcTemplate;

    protected static final String ACCOUNT_ID = "000000000000";
    private static final String CONTAINER_PATH = "/etc/localstack/init/ready.d/";
    public static final LocalStackContainer LOCALSTACK = new LocalStackContainer(DockerImageName.parse("localstack/localstack"))
            .withCopyFileToContainer(MountableFile.forHostPath("localstack/init.sh"), CONTAINER_PATH)
            .waitingFor(Wait.forLogMessage(".*Setup done, ready to process*\\n", 1))
            .withServices(SQS, SNS, S3)
            .withEnv("AWS_DEFAULT_REGION", Region.EU_CENTRAL_1.id())
            .withEnv("ENVIRONMENT", "test");

    static {
        LOCALSTACK.start();

        System.setProperty("spring.cloud.aws.endpoint", LOCALSTACK.getEndpoint().toString());
        System.setProperty("spring.cloud.aws.credentials.access-key", LOCALSTACK.getAccessKey());
        System.setProperty("spring.cloud.aws.credentials.secret-key", LOCALSTACK.getSecretKey());
        System.setProperty("ACCOUNT_ID", ACCOUNT_ID);
        System.setProperty("ENVIRONMENT", "test");
    }

    @BeforeEach
    public void init() {
        truncate();
    }

    protected void truncate() {
        jdbcTemplate.execute("some table");
    }

when completing our tests we face a lot of errors due to our sqs listeners not being able to poll messages anymore.

The complete log output can be found in the relevant log output section

2023-11-24T08:16:05.965+01:00 ERROR 16588 --- [nc-response-1-6] i.a.c.s.l.s.AbstractPollingMessageSource : Error polling for messages in queue https://sqs.eu-central-1.amazonaws.com/000000000000/some_test_queue

java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: The connection was closed during the request. The request will usually succeed on a retry, but if it
 does not: consider disabling any proxies you have configured, enabling debug logging, or performing a TCP dump to identify the root cause. If this is a streaming operation, validate that data is being read or written in a timely ma
nner. Channel Information: ChannelDiagnostics(channel=[id: 0x7eadd6af, L:/127.0.0.1:57073 ! R:/127.0.0.1:56971], channelAge=PT0.0129724S, requestCount=1)
        at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:65)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:51)
        at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
        at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)

Relevant log output

2023-11-24T08:16:05.965+01:00 ERROR 16588 --- [nc-response-1-6] i.a.c.s.l.s.AbstractPollingMessageSource : Error polling for messages in queue https://sqs.eu-central-1.amazonaws.com/000000000000/some_test_queue

java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: The connection was closed during the request. The request will usually succeed on a retry, but if it
 does not: consider disabling any proxies you have configured, enabling debug logging, or performing a TCP dump to identify the root cause. If this is a streaming operation, validate that data is being read or written in a timely ma
nner. Channel Information: ChannelDiagnostics(channel=[id: 0x7eadd6af, L:/127.0.0.1:57073 ! R:/127.0.0.1:56971], channelAge=PT0.0129724S, requestCount=1)
        at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:65)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:51)
        at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
        at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
        at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2194)
        at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:79)
        at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
        at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
        at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2194)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeAttemptExecute(AsyncRetryableStage.java:103)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeRetryExecute(AsyncRetryableStage.java:184)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.lambda$attemptExecute$1(AsyncRetryableStage.java:159)
        at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
        at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
        at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2194)
        at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:79)
        at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
        at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
        at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2194)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$null$0(MakeAsyncHttpRequestStage.java:103)
        at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
        at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
        at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2194)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$executeHttpRequest$3(MakeAsyncHttpRequestStage.java:165)
        at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
        at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
        at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: The connection was closed during the request. The request will usually succeed on a retry, but if it does not: consider disabling a
ny proxies you have configured, enabling debug logging, or performing a TCP dump to identify the root cause. If this is a streaming operation, validate that data is being read or written in a timely manner. Channel Information: Chan
nelDiagnostics(channel=[id: 0x7eadd6af, L:/127.0.0.1:57073 ! R:/127.0.0.1:56971], channelAge=PT0.0129724S, requestCount=1)
        at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)
        at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:47)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:223)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:218)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeRetryExecute(AsyncRetryableStage.java:182)
        ... 22 common frames omitted
        Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 1 failure: Unable to execute HTTP request: The connection was closed during the request. The request will usually succeed on a retry, but 
if it does not: consider disabling any proxies you have configured, enabling debug logging, or performing a TCP dump to identify the root cause. If this is a streaming operation, validate that data is being read or written in a time
ly manner. Channel Information: ChannelDiagnostics(channel=[id: 0x82f0c219, L:/127.0.0.1:57027 ! R:/127.0.0.1:56971], channelAge=PT1.3915045S, requestCount=1)
        Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 2 failure: Unable to execute HTTP request: Eine bestehende Verbindung wurde softwaregesteuert
durch den Hostcomputer abgebrochen
        Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 3 failure: Unable to execute HTTP request: The connection was closed during the request. The request will usually succeed on a retry, but 
if it does not: consider disabling any proxies you have configured, enabling debug logging, or performing a TCP dump to identify the root cause. If this is a streaming operation, validate that data is being read or written in a time
ly manner. Channel Information: ChannelDiagnostics(channel=[id: 0xf9454d2c, L:/127.0.0.1:57061 ! R:/127.0.0.1:56971], channelAge=PT0.0134961S, requestCount=1)
Caused by: java.io.IOException: The connection was closed during the request. The request will usually succeed on a retry, but if it does not: consider disabling any proxies you have configured, enabling debug logging, or performing
 a TCP dump to identify the root cause. If this is a streaming operation, validate that data is being read or written in a timely manner. Channel Information: ChannelDiagnostics(channel=[id: 0x7eadd6af, L:/127.0.0.1:57073 ! R:/127.0
.0.1:56971], channelAge=PT0.0129724S, requestCount=1)
        at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted(ResponseHandler.java:472)
        at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.channelInactive(ResponseHandler.java:222)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:305)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274)
        at io.netty.handler.logging.LoggingHandler.channelInactive(LoggingHandler.java:206)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:303)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
        at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:303)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274)
        at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelInactive(CombinedChannelDuplexHandler.java:418)
        at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:411)
        at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:376)
        at io.netty.handler.codec.http.HttpClientCodec$Decoder.channelInactive(HttpClientCodec.java:328)
        at io.netty.channel.CombinedChannelDuplexHandler.channelInactive(CombinedChannelDuplexHandler.java:221)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:303)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
        at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:303)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:301)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281)
        at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
        at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:813)
        at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:566)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        ... 1 common frames omitted

Additional Information

No response

@eddumelendez
Copy link
Member

This is not related to changes in Testcontainers 1.19.2. I guess you are using @SpringBootTest and @Testcontainers so, the following happens

  1. Containers start
  2. Spring Boot Application Context starts
  3. Run tests
  4. Containers stop
  5. Spring Boot Application Context stops

Between step 4 and 5, the container and service it will become unavailable and the client is still running and sometimes trying to connect to the service that is not running anymore. You can use @DirtiesContext to close the context before the container. Of course, you should be aware of the consequences using it. See this issue in spring-boot project and also this comment.

@eddumelendez eddumelendez closed this as not planned Won't fix, can't repro, duplicate, stale Nov 24, 2023
@EAlf91
Copy link
Author

EAlf91 commented Nov 24, 2023

Hey @eddumelendez thanks for your reply. Initially I thought about something similar but the problem is, that this never happened before the upgrade of testcontainers from 1.19.1 to 1.19.2. I tested both versions and only this one seems to fail.

It's also the only dependency that has been updated around the time it started coming up in our projects. To me this seems highly related to this version upgrade and I also saw changes in the ryuk setup in 1.19.2.

@eddumelendez
Copy link
Member

I've been always experiencing this with sqs, activemq, artemis, r2dbc clients sometimes so far. The change about ryuk is that it will not be running 10 additional seconds after test are done but kill just right after the JVM shutdowm.

@eddumelendez
Copy link
Member

You can see others reporting the same in previous versions.

@EAlf91
Copy link
Author

EAlf91 commented Nov 27, 2023

Thanks @eddumelendez your hints and the previous issue you mentioned was really helpful! 👍

@xak2000
Copy link

xak2000 commented Jan 30, 2024

@eddumelendez Is it possible to add an ability to optionally disable the registering of this shudown hook? If I understand you correctly, ryuk will be runned for 10s after JVM is shut down. Is it guaranteed? If so, then it will allow all Spring Application Contexts to properly shut down during these 10 seconds.

At the moment after upgrading to Spring Boot 3.2.2 (and TestContainers 1.19.3) this behavior just makes any logic, that executes during Spring Application shutdown phase, and requires a container to exist, to fail. One example I described here: #7454 (comment)

Or what else can be done except marking each method with @DirtiesContext to guarantee the order of shutting down the Spring Context and TestContainers? Maybe TestContainers devs can work with Spring devs to find a solution? :)

@Tomasz-Marciniak
Copy link

Tomasz-Marciniak commented Feb 25, 2024

I can confirm that the issue occurs when I bump the version fom 1.19.1 to 1.19.2 where a shutdown hook has been added.

@eddumelendez - Could the shutdown hook registration be configurable so devs could decide if they want to use new or legacy approach which works Spring context?

I agree with @xak2000, test containers are widely used in Spring applications tests so I think that integration should also be tested before releasing a new version of test containers. Spring major version has been not bumped up so it means that testcontainers should work with all 6.x.x versions.

DirtyContext is not an option for apps with a lot of tests. I want to use shared context between tests to speed them up.

@mjagus
Copy link

mjagus commented Feb 25, 2024

Can this be reopened? The change from #7717 actually broke shutdown procedure in a lot of Spring Boot applications and there's no way to fix it without reverting back to 1.19.1.

In this thread it was incorrectly stated that Ryuk will shut down immediately after JVM shutdown, but the truth is that Ryuk now receives SIGTERM the moment JVM receives SIGTERM which introduces race condition between Spring's shutdown hook and Ryuk.

@eddumelendez
Copy link
Member

The change from #7717 actually broke shutdown procedure in a lot of Spring Boot applications and there's no way to fix it without reverting back to 1.19.1.

@mjagus can you share a project that reproduces the issue? I'm constantly running several examples and those works well. Also, spring boot keep updated with latest versions without issues. The well-known behavior is described here but that's related to logging and no test failures.

@thristov4043
Copy link

This is an issue that I am facing in projects that I am working on as well. As soon as I update to 1.19.2 getting errors like:

Unable to execute HTTP request: Connection refused: Request attempt 1 failure: Unable to execute HTTP request: The connection was closed during the request. The request will usually succeed on a retry,

@Tomasz-Marciniak
Copy link

@eddumelendez - Do you need a project which reproduces the issue even you know that the test containers stops before spring context is closed? I can provide one if you need it.

Saying that this is "just logging issue" is difficult for me to understand. I do not accept any exception in tests unless it is expected exception. Mongo connectivity exceptions are not expected.

I will stay with version which does not clutter my logs. I think test containers is great and I'm keeping fingers crossed to find satisfying solution.

@xak2000
Copy link

xak2000 commented Mar 7, 2024

I'll add my 5 cents to the last comment.

The problem with this "just logging issue" is that it is not always just logging issue, but sometimes it is also actual behavior issue.

As I alredy described here there are legitimate cases when Spring Framework (or one of its libraries) tries to execute an SQL query when Spring Context is closing. One example is DefaultLockRepository.close() from the Spring Integration project, that is not even a 3rd party library, but is an official 1st party library for Spring Framework. The method is designed to be called when Spring Application Context is closing. So, the Application in tests behaves not as expected. Tests still pass, yes. But many exceptions are thrown during shutdown. Also SQL connection pool (HikariCP in my case) can't understand what is going on and why all its connections in the pool suddenly became broken, so it waits (30 seconds by default) before giving up. This increases tests execution time by [30 second] * [number of started contexts]. And a lot of garbage log messages with stacktraces because all connections in the pool are dead and no new connections are possible to create.

It's just my experience. Who knows what else libraries this behavior could affect and in what way. I can't believe that this behavior is considered as OK now. What I believe is that it is probably not that easy to synchronize the shutdown of TestContainers with the shutdown of Spring Application Context. But this doesn't mean this problem should be ignored. Let's try to find a solution together! And while it is not found yet, at least return the old behavior of 10s delay before shutdown TestContainers. Probably with a config property to override this behavior, if these 10s more time of living containers is really a problem for someone.

You could argue that an application should be written in a way that non-functional external services should not prevent it from proper shutdown (12-factors and all this stuff) and I would agree, but this should not be "normal" situation that reproduces on every test run. A separate test should test this scenario, not all the tests that just test normal app behaviour scenarios.

@xak2000
Copy link

xak2000 commented Apr 5, 2024

@eddumelendez Do you still need a reproduction project or more arguments why this new behavior (instant shutdown) is problematic?

And, if you don't think that the new behavior is problematic, then can you please provide counter-arguments why it is not problematic in your opinion?

I and several other people described (with arguments) that this behavior leads to inability for Spring Context to properly shutdown. "Properly shutdown" is not directly related to logging. The application has rights to do anything during shutdown. Do you disagree?

And, if this "anything" happens to be something that is related to a container, that was already stopped, then this "anything" could not be done anymore. Right?

WIll it lead to a failed test? Probably, not (because all test methods are already finished at this point). But it could easily lead to thousand of other problems. E.g. very long shutdown (or even infinite) because the app tries to do something important on shutdown and it will try to reconnect multiple times to the "failed" service until the action is succeeded.

The situation when a service that the app depends on is stopped before the app is stopped is not normal. Yes, this situation should be handled, as it could happen in the producation too, but it is still not normal. It should not happen on every execution of the tests suite.

I'm scratching my head trying to understand why we are even discussing this... What is more important? The correctly executed shutdown sequence or freeing some memory 10s ealier? 🤷

I understand that not all apps do something important, that depends on a container during shutdown, but if even some applications do this, then the delay should be configurable at least.

@xak2000
Copy link

xak2000 commented Apr 6, 2024

An experiment that implements the idea of configurable shutdown hook: #7454 (reply in thread)

The direct link to the commit that implements it: alex-arana@a0fcbdf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants