Skip to content

HttpWaitStrategy does not retry to re-validate ResponsePredicate #2516

@kirmandi

Description

@kirmandi

using the HttpWaitStrategy with a forResponsePredicate does only try it once, although the responseBody will change when startup is completed.
expectation would be that the strategy re-evaluated until timeout is reached and until it is successful.

test setup with a HDFS cluster where the NameNode is in safemode on startup and will change into normal operation mode only after DataNode is successfully registered:

public static final GenericContainer hadoopContainer = new GenericContainer("sequenceiq/hadoop-docker:2.7.1")
	.withExposedPorts(50070, 50075, 50090, 9000)
	.waitingFor(new WaitAllStrategy()
		.withStrategy(Wait.forLogMessage(".*localhost: starting nodemanager.*\n", 1))
		.withStrategy(Wait.forHttp("/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo")
				  .forPort(50070)
				  .forResponsePredicate(s -> !s.contains("Safe mode is ON.")))
		.withStartupTimeout(Duration.ofMinutes(2))
	);

the log output clearly only tries it once to validate the predicate:

2020-04-03 10:52:31.339 INFO [ontainers.wait.strategy.HttpWaitStrategy] [ducttape-0] - /magical_euler: Waiting for 120 seconds for URL: http://localhost:33077/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo
2020-04-03 10:52:32.438 TRACE[ontainers.wait.strategy.HttpWaitStrategy] [ducttape-1] - Get response code 200
2020-04-03 10:52:32.440 TRACE[ontainers.wait.strategy.HttpWaitStrategy] [ducttape-1] - Get response {  "beans" : [ {    "name" : "Hadoop:service=NameNode,name=NameNodeInfo",    "modelerType" : "org.apache.hadoop.hdfs.server.namenode.FSNamesystem",    "UpgradeFinalized" : true,    "ClusterId" : "CID-5e691286-4de5-4dde-800b-c02a7a8bf44a",    "Version" : "2.7.1, r15ecc87ccf4a0228f35af08fc56de536e6ce657a",    "Used" : 335872,    "Free" : 29158596608,    "Safemode" : "Safe mode is ON. The reported blocks 31 has reached the threshold 0.9990 of total blocks 31. The number of live datanodes 1 has reached the minimum number 0. In safe mode extension. Safe mode will be turned off automatically in 17 seconds.",    "NonDfsUsedSpace" : 33566855168,    "PercentUsed" : 5.354608E-4,    "BlockPoolUsedSpace" : 335872,    "PercentBlockPoolUsed" : 5.354608E-4,    "PercentRemaining" : 46.48582,    "CacheCapacity" : 0,    "CacheUsed" : 0,    "TotalBlocks" : 31,    "TotalFiles" : 35,    "NumberOfMissingBlocks" : 0,    "NumberOfMissingBlocksWithReplicationFactorOne" : 0,    "LiveNodes" : "{\"localhost:50010\":{\"infoAddr\":\"127.0.0.1:50075\",\"infoSecureAddr\":\"127.0.0.1:0\",\"xferaddr\":\"127.0.0.1:50010\",\"lastContact\":0,\"usedSpace\":335872,\"adminState\":\"In Service\",\"nonDfsUsedSpace\":33566855168,\"capacity\":62725787648,\"numBlocks\":31,\"version\":\"2.7.1\",\"used\":335872,\"remaining\":29158596608,\"blockScheduled\":0,\"blockPoolUsed\":335872,\"blockPoolUsedPercent\":5.354608E-4,\"volfails\":0}}",    "DeadNodes" : "{}",    "DecomNodes" : "{}",    "BlockPoolId" : "BP-1961412683-172.17.0.32-1450036414523",    "NameDirStatuses" : "{\"failed\":{},\"active\":{\"/tmp/hadoop-root/dfs/name\":\"IMAGE_AND_EDITS\"}}",    "NodeUsage" : "{\"nodeUsage\":{\"min\":\"0.00%\",\"median\":\"0.00%\",\"max\":\"0.00%\",\"stdDev\":\"0.00%\"}}",    "NameJournalStatus" : "[{\"stream\":\"EditLogFileOutputStream(/tmp/hadoop-root/dfs/name/current/edits_inprogress_0000000000000000192)\",\"manager\":\"FileJournalManager(root=/tmp/hadoop-root/dfs/name)\",\"required\":\"false\",\"disabled\":\"false\"}]",    "JournalTransactionInfo" : "{\"LastAppliedOrWrittenTxId\":\"192\",\"MostRecentCheckpointTxId\":\"191\"}",    "NNStarted" : "Fri Apr 03 04:52:15 EDT 2020",    "CompileInfo" : "2015-06-29T06:04Z by jenkins from (detached from 15ecc87)",    "CorruptFiles" : "[]",    "DistinctVersionCount" : 1,    "DistinctVersions" : [ {      "key" : "2.7.1",      "value" : 1    } ],    "SoftwareVersion" : "2.7.1",    "RollingUpgradeStatus" : null,    "Threads" : 35,    "Total" : 62725787648  } ]}
2020-04-03 10:54:12.156 ERROR[🐳 [sequenceiq/hadoop-docker:2.7.1]     ] [main] - Could not start container
org.rnorth.ducttape.TimeoutException: java.util.concurrent.TimeoutException
	at org.rnorth.ducttape.timeouts.Timeouts.callFuture(Timeouts.java:70) ~[duct-tape-1.0.8.jar:?]
	at org.rnorth.ducttape.timeouts.Timeouts.doWithTimeout(Timeouts.java:60) ~[duct-tape-1.0.8.jar:?]
	at org.testcontainers.containers.wait.strategy.WaitAllStrategy.waitUntilReady(WaitAllStrategy.java:53) ~[testcontainers-1.13.0.jar:?]
	at org.testcontainers.containers.GenericContainer.waitUntilContainerStarted(GenericContainer.java:876) ~[testcontainers-1.13.0.jar:?]
	at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:428) ~[testcontainers-1.13.0.jar:?]
	at org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:317) ~[testcontainers-1.13.0.jar:?]
	at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:81) [duct-tape-1.0.8.jar:?]
	at org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:315) [testcontainers-1.13.0.jar:?]
	at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:302) [testcontainers-1.13.0.jar:?]
	[...]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_242]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) [?:1.8.0_242]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) [?:1.8.0_242]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) [?:1.8.0_242]
	at org.junit.runners.BlockJUnit4ClassRunner.createTest(BlockJUnit4ClassRunner.java:217) [junit-4.12.jar:4.12]
	at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.createTest(SpringJUnit4ClassRunner.java:226) [spring-test-5.2.3.RELEASE.jar:5.2.3.RELEASE]
	at org.springframework.test.context.junit4.SpringJUnit4ClassRunner$1.runReflectiveCall(SpringJUnit4ClassRunner.java:289) [spring-test-5.2.3.RELEASE.jar:5.2.3.RELEASE]
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) [junit-4.12.jar:4.12]
	at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.methodBlock(SpringJUnit4ClassRunner.java:291) [spring-test-5.2.3.RELEASE.jar:5.2.3.RELEASE]
	at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:246) [spring-test-5.2.3.RELEASE.jar:5.2.3.RELEASE]
	at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:97) [spring-test-5.2.3.RELEASE.jar:5.2.3.RELEASE]
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) [junit-4.12.jar:4.12]
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) [junit-4.12.jar:4.12]
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) [junit-4.12.jar:4.12]
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) [junit-4.12.jar:4.12]
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) [junit-4.12.jar:4.12]
	at org.springframework.test.context.junit4.statements.RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.java:61) [spring-test-5.2.3.RELEASE.jar:5.2.3.RELEASE]
	at org.springframework.test.context.junit4.statements.RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:70) [spring-test-5.2.3.RELEASE.jar:5.2.3.RELEASE]
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363) [junit-4.12.jar:4.12]
	at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:190) [spring-test-5.2.3.RELEASE.jar:5.2.3.RELEASE]
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137) [junit-4.12.jar:4.12]
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) [junit-rt.jar:?]
	at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33) [junit-rt.jar:?]
	at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230) [junit-rt.jar:?]
	at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58) [junit-rt.jar:?]
Caused by: java.util.concurrent.TimeoutException
	at java.util.concurrent.FutureTask.get(FutureTask.java:205) ~[?:1.8.0_242]
	at org.rnorth.ducttape.timeouts.Timeouts.callFuture(Timeouts.java:65) ~[duct-tape-1.0.8.jar:?]
	... 34 more

i am not sure if i am doing something wrong or this is actually a bug.
(the response does change after ~10seconds and does not include the safemode info anymore, so that the predicate should succeed)

otherwise after looking into source code i suspect that the retryUntilScucess has to be replaced with retryUntilTrue (at least from description the former only does until success)

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions