When starting a container fails, don't wait until the full timeout; by ostacey · Pull Request #115 · testcontainers/testcontainers-java

ostacey · 2016-04-01T19:29:36Z

When starting a container fails, don't wait until the full timeout; bail out immediately.

I'm using "finishedAt" to tell if the container did start and finish. This was a change mentioned in #107, but this isn't a total fix for that issue.

This seems like a relatively safe change, and makes failures in container start a bit less annoying to troubleshoot.

ostacey · 2016-04-02T02:35:53Z

.... and, not so much on the "safe" change. I believe this is a race condition issue. The order of operations I expected was:

attempt to start container
container fails to start
inspect container status: running=false, finishedWhen=nonzero
--> we conclude that container stopped

This is what I see when I run this locally. However, what appears to be happening in TravisCI is:

attempt to start container
container begins to start
inspect container status: running=true
--> we assume container started correctly
container fails to start

I can get around that by adding a minimum runtime; any container that hasn't been running at least that long will not be considered running. A small default value will eliminate this case, while also adding what might be useful functionality for containers that don't listen on a port.

rnorth · 2016-04-02T02:52:39Z

Ah, that sounds unfortunate, but I guess good to have detected now.

So basically the process achieves 'running' state but crashes within milliseconds?

I think the small minimum threshold you suggest sounds right as long as the value is very small. Implementation wise is it not equivalent to a Thread.sleep before starting to check the container status?

Perhaps it's worth making configurable and defaulting to 0ms; I get the feeling that this is going to be something that some use cases need but others don't. What do you think?

On 2 Apr 2016, at 03:35, ostacey notifications@github.com wrote:

.... and, not so much on the "safe" change. I believe this is a race condition issue. The order of operations I expected was:

attempt to start container
container fails to start
inspect container status: running=false, finishedWhen=nonzero
--> we conclude that container stopped
This is what I see when I run this locally. However, what appears to be happening in TravisCI is:

attempt to start container
container begins to start
inspect container status: running=true
--> we assume container started correctly
container fails to start
I can get around that by adding a minimum runtime; any container that hasn't been running at least that long will not be considered running. A small default value will eliminate this case, while also adding what might be useful functionality for containers that don't listen on a port.

―
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub

bsideup · 2016-04-03T14:30:56Z

-            profiler.start("Wait until container state=running");
-            Unreliables.retryUntilTrue(30, TimeUnit.SECONDS, () -> {
+            profiler.start("Wait until container state=running, or there's evidence it failed to start.");
+            final Boolean[] startedOK = {null};


instead of a boolean you can define some non-Exception throwable (i.e. ContainerStartError extends Throwable) and catch it.

While that would work, I'd rather not. Recall that the javadoc for Error says:
"An Error is a subclass of Throwable that indicates serious problems that a reasonable application should not try to catch."

With that in mind, what you're suggesting doesn't feel (to me at least) to pass a "principle of least surprise" check. The boolean[] approach is consistent with what we're doing with the int[] in start().

…ail out immediately.

…sts.

… case where a container starts but then quickly stops.

rnorth · 2016-04-09T13:40:21Z

Sorry for my absence over the last few days - I'm looking at this now!

ostacey mentioned this pull request Apr 2, 2016

Added "docker exec" type functionality to GenericContainer. #103 #109

Merged

bsideup reviewed Apr 3, 2016
View reviewed changes

Oliver Stacey added 3 commits April 7, 2016 14:31

When starting a container fails, don't wait until the full timeout; b…

d0a74b0

…ail out immediately.

Move status-interpeting code into its own class, with its own unit te…

08414c5

…sts.

Introduced "minimumRunningDuration" on GenericContainer, to catch the…

c681a6f

… case where a container starts but then quickly stops.

ostacey force-pushed the dont-wait-on-failed-start branch from b6de7b0 to c681a6f Compare April 7, 2016 21:54

rnorth merged commit c681a6f into testcontainers:master Apr 9, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

When starting a container fails, don't wait until the full timeout; #115

When starting a container fails, don't wait until the full timeout; #115
rnorth merged 3 commits intotestcontainers:masterfrom
locationlabs:dont-wait-on-failed-start

ostacey commented Apr 1, 2016

Uh oh!

ostacey commented Apr 2, 2016

Uh oh!

rnorth commented Apr 2, 2016

Uh oh!

bsideup Apr 3, 2016

Uh oh!

ostacey Apr 7, 2016

Uh oh!

rnorth commented Apr 9, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

ostacey commented Apr 1, 2016

Uh oh!

ostacey commented Apr 2, 2016

Uh oh!

rnorth commented Apr 2, 2016

Uh oh!

bsideup Apr 3, 2016

Choose a reason for hiding this comment

Uh oh!

ostacey Apr 7, 2016

Choose a reason for hiding this comment

Uh oh!

rnorth commented Apr 9, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants