Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to obtain Agent info #504

Closed
edwardcallahan opened this issue Jun 23, 2017 · 11 comments
Closed

Unable to obtain Agent info #504

edwardcallahan opened this issue Jun 23, 2017 · 11 comments
Assignees

Comments

@edwardcallahan
Copy link

edwardcallahan commented Jun 23, 2017

version 1.2.4, conductr 2.1.1

After simultaneously loading multiple bundles, ie

conduct load typesafe/internal-bundle/typesafe-website www-lb-prod-use-4b70508f9225bebe9c5884bd07e5429f9723b1fb6ffb3a34491d01304fbf7a26.zip &
conduct load typesafe/internal-bundle/sherpa-desk sherpa-desk-prod-7d0de0840454316214a1a0a3f8b9d0f50ba56b9ed10a62b1d3249a910a8e9bc0.zip &
conduct load typesafe/internal-bundle/sherpa-akka sherpa-akka-prod-cc8275cabf75e086da32ece40f0e84ec326d0fc44efe5c9fc801feddc0a5dd09.zip &
conduct load typesafe/internal-bundle/sherpa-sfdc sherpa-sfdc-prod-7e628f219c004870f61e3c81f65b40b1b78bde99c159e4ed3965787be987c29b.zip &
conduct load typesafe/internal-bundle/sherpa-desk sherpa-desk-prod-7d0de0840454316214a1a0a3f8b9d0f50ba56b9ed10a62b1d3249a910a8e9bc0.zip & 	

None of the bundles could be started.

Although agents are present

}ubuntu@ip-10-100-5-89:~/2.0$ curl http://10.100.6.104:9005/agents
[ {
  "address" : "akka.tcp://conductr-agent@10.100.3.244:2552/user/reaper/cluster-client#1475773385",
  "roles" : [ "haproxy", "us-east-1b" ],
  "observedBy" : [ {
    "node" : {
      "uid" : "-612088483",
      "address" : "akka.tcp://conductr@10.100.2.17:9004"
    }
  } ],
  "resourceAvailable" : {
    "nrOfCpus" : 2,
    "memory" : 7845437440,
    "diskSpace" : 18955894784
  }
}, {
  "address" : "akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/cluster-client#-820219770",
  "roles" : [ "web", "elasticsearch", "us-east-1c" ],
  "observedBy" : [ {
    "node" : {
      "uid" : "-612088483",
      "address" : "akka.tcp://conductr@10.100.2.17:9004"
    }
  } ]
}, {
  "address" : "akka.tcp://conductr-agent@10.100.4.200:2552/user/reaper/cluster-client#-1603607557",
  "roles" : [ "web", "elasticsearch", "us-east-1b" ],
  "observedBy" : [ {
    "node" : {
      "uid" : "-612088483",
      "address" : "akka.tcp://conductr@10.100.2.17:9004"
    }
  } ]
}, {
  "address" : "akka.tcp://conductr-agent@10.100.2.4:2552/user/reaper/cluster-client#-2138392620",
  "roles" : [ "web", "elasticsearch", "us-east-1a" ],
  "observedBy" : [ {
    "node" : {
      "uid" : "-612088483",
      "address" : "akka.tcp://conductr@10.100.2.17:9004"
    }
  } ]
} ]

The agents command cannot be used

ubuntu@ip-10-100-5-89:~/2.0$ conduct version
1.2.14
Supported API version(s): 1, 2

ubuntu@ip-10-100-5-89:~/2.0$ conduct agents
Error: Encountered unexpected error.
Error: Reason: KeyError 'resourceAvailable'
Error: Further information of the error can be found in the error log file: /home/ubuntu/.conductr/errors.log
ubuntu@ip-10-100-5-89:~/2.0$ tail -n 100 /home/ubuntu/.conductr/errors.log
2017-06-14 23:00:31,243: Failure running the following command: ['conduct', 'events', 'conductr.lightbend.com']
Traceback (most recent call last):
  File "site-packages/requests/packages/urllib3/connectionpool.py", line 386, in _make_request
  File "<string>", line 2, in raise_from
  File "site-packages/requests/packages/urllib3/connectionpool.py", line 382, in _make_request
  File "http/client.py", line 1197, in getresponse
  File "http/client.py", line 297, in begin
  File "http/client.py", line 258, in _read_status
  File "socket.py", line 575, in readinto
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "site-packages/requests/adapters.py", line 438, in send
  File "site-packages/requests/packages/urllib3/connectionpool.py", line 649, in urlopen
  File "site-packages/requests/packages/urllib3/util/retry.py", line 357, in increment
  File "site-packages/requests/packages/urllib3/packages/six.py", line 686, in reraise
  File "site-packages/requests/packages/urllib3/connectionpool.py", line 600, in urlopen
  File "site-packages/requests/packages/urllib3/connectionpool.py", line 388, in _make_request
  File "site-packages/requests/packages/urllib3/connectionpool.py", line 308, in _raise_timeout
requests.packages.urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='10.100.6.90', port=9005): Read timed out. (read timeout=5)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "conductr_cli/main_handler.py", line 14, in run
  File "conductr_cli/conduct.py", line 35, in main_method
  File "conductr_cli/conduct_main.py", line 610, in run
  File "conductr_cli/validation.py", line 43, in handler
  File "conductr_cli/validation.py", line 59, in handler
  File "conductr_cli/conduct_events.py", line 15, in events
  File "conductr_cli/conduct_events.py", line 40, in get_bundle_events
  File "conductr_cli/conduct_request.py", line 18, in get
  File "site-packages/requests/api.py", line 72, in get
  File "site-packages/requests/api.py", line 58, in request
  File "site-packages/requests/sessions.py", line 518, in request
  File "site-packages/requests/sessions.py", line 639, in send
  File "site-packages/requests/adapters.py", line 514, in send
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='10.100.6.90', port=9005): Read timed out. (read timeout=5)
2017-06-22 21:31:59,733: Failure running the following command: ['conduct', 'load', 'typesafe/internal-bundle/sherpa-desk', 'sherpa-desk-prod-7d0de0840454316214a1a0a3f8b9d0f50ba56b9ed10a62b1d3249a910a8e9bc0.zip']
Traceback (most recent call last):
  File "conductr_cli/main_handler.py", line 14, in run
  File "conductr_cli/conduct.py", line 35, in main_method
  File "conductr_cli/conduct_main.py", line 610, in run
  File "conductr_cli/validation.py", line 43, in handler
  File "conductr_cli/validation.py", line 59, in handler
  File "conductr_cli/validation.py", line 99, in handler
  File "conductr_cli/validation.py", line 116, in handler
  File "conductr_cli/validation.py", line 136, in handler
  File "conductr_cli/validation.py", line 152, in handler
  File "conductr_cli/validation.py", line 198, in handler
  File "conductr_cli/validation.py", line 222, in handler
  File "conductr_cli/validation.py", line 238, in handler
  File "conductr_cli/validation.py", line 256, in handler
  File "conductr_cli/validation.py", line 428, in handler
  File "conductr_cli/validation.py", line 464, in handler
  File "conductr_cli/conduct_load.py", line 71, in load
  File "conductr_cli/conduct_load.py", line 249, in load_v2
  File "conductr_cli/resolver.py", line 28, in resolve_bundle
  File "conductr_cli/resolvers/bintray_resolver.py", line 33, in resolve_bundle
  File "conductr_cli/resolvers/bintray_resolver.py", line 148, in bintray_download_artefact
  File "conductr_cli/resolvers/uri_resolver.py", line 44, in resolve_file
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/.conductr/cache/bundle/sherpa-desk-v2-0c4eaae7f57dd510e355eb5ef4c502f90840d2d82e576d8d81e7b4a681a5b46b.zip.tmp'
2017-06-22 23:16:35,741: Failure running the following command: ['conduct', 'agents']
Traceback (most recent call last):
  File "conductr_cli/main_handler.py", line 14, in run
  File "conductr_cli/conduct.py", line 35, in main_method
  File "conductr_cli/conduct_main.py", line 610, in run
  File "conductr_cli/validation.py", line 43, in handler
  File "conductr_cli/validation.py", line 59, in handler
  File "conductr_cli/conduct_agents.py", line 48, in agents
KeyError: 'resourceAvailable'
2017-06-22 23:16:50,497: Failure running the following command: ['conduct', 'agents']
Traceback (most recent call last):
  File "conductr_cli/main_handler.py", line 14, in run
  File "conductr_cli/conduct.py", line 35, in main_method
  File "conductr_cli/conduct_main.py", line 610, in run
  File "conductr_cli/validation.py", line 43, in handler
  File "conductr_cli/validation.py", line 59, in handler
  File "conductr_cli/conduct_agents.py", line 48, in agents
KeyError: 'resourceAvailable'
2017-06-22 23:18:02,514: Failure running the following command: ['conduct', 'agents']
Traceback (most recent call last):
  File "conductr_cli/main_handler.py", line 14, in run
  File "conductr_cli/conduct.py", line 35, in main_method
  File "conductr_cli/conduct_main.py", line 610, in run
  File "conductr_cli/validation.py", line 43, in handler
  File "conductr_cli/validation.py", line 59, in handler
  File "conductr_cli/conduct_agents.py", line 48, in agents
KeyError: 'resourceAvailable'
2017-06-22 23:59:57,756: Failure running the following command: ['conduct', 'agents']
Traceback (most recent call last):
  File "conductr_cli/main_handler.py", line 14, in run
  File "conductr_cli/conduct.py", line 35, in main_method
  File "conductr_cli/conduct_main.py", line 610, in run
  File "conductr_cli/validation.py", line 43, in handler
  File "conductr_cli/validation.py", line 59, in handler
  File "conductr_cli/conduct_agents.py", line 48, in agents
KeyError: 'resourceAvailable'
ubuntu@ip-10-100-5-89:~/2.0$ 

Agents logs:
agent6.23.tar.gz

@huntc
Copy link
Contributor

huntc commented Jun 23, 2017

As mentioned on Slack, there are no resourceAvailable records for the web-roled agents. The CLI needs to handle that.

The fact that there are no resources available probably means that anything web role based cannot be scheduled to run. However, resources should become available eventually.

@edwardcallahan
Copy link
Author

Restarting two agents (service conductr-agent restart) resolved the issue.

@huntc huntc reopened this Jun 23, 2017
@huntc
Copy link
Contributor

huntc commented Jun 23, 2017

Reopening as a) the cli should be resilient to this and b) need to understand why this agents couldn't get their resource info through.

markusjura added a commit to markusjura/conductr-cli that referenced this issue Jun 23, 2017
The `conduct agents` command is now tolerates the scenario that some agents contain resource information and some not.

Fixes a) of typesafehub#504.
@markusjura
Copy link
Contributor

@edwardcallahan @huntc The following error occurs on ConductR agent several times when the bundle is scaled.

2017-06-22T22:11:19.651Z ip-10-100-6-23 DEBUG BundleExecutor [sourceThread=conductr-agent-akka.actor.default-dispatcher-22, akkaTimestamp=22:11:19.651UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/run-executor-0/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceActorSystem=conductr-agent] - Bundle configuration used: None
2017-06-22T22:11:19.651Z ip-10-100-6-23 DEBUG BundleExecutor [sourceThread=conductr-agent-akka.actor.default-dispatcher-22, akkaTimestamp=22:11:19.651UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/run-executor-0/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceActorSystem=conductr-agent] - Alternate configuration used: None and Some(/tmp/conductr-agent/10.100.6.23/2552/bundles/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5/execution-0-1543755063621017421/7d0de0840454316214a1a0a3f8b9d0f50ba56b9ed10a62b1d3249a910a8e9bc0/sherpa-desk-prod/bundle.conf)
2017-06-22T22:11:19.652Z ip-10-100-6-23 DEBUG BundleExecutor [sourceThread=conductr-agent-akka.actor.default-dispatcher-4, akkaTimestamp=22:11:19.652UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/run-executor-0/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceActorSystem=conductr-agent] - About to start process [Vector(/usr/bin/env, bash, /tmp/conductr-agent/10.100.6.23/2552/bundles/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5/execution-0-1543755063621017421/start)]
2017-06-22T22:11:19.655Z ip-10-100-6-23 INFO  BundleExecutor [sourceThread=conductr-agent-akka.actor.default-dispatcher-19, akkaTimestamp=22:11:19.655UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/run-executor-0/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceActorSystem=conductr-agent] - Process for bundle [0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5] started [18390]
2017-06-22T22:11:19.655Z ip-10-100-6-23 DEBUG StandaloneLauncherAgent [sourceThread=conductr-agent-akka.actor.default-dispatcher-19, akkaTimestamp=22:11:19.655UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent, sourceActorSystem=conductr-agent] - Bundle 0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5 has been successfully started.
2017-06-22T22:11:20.074Z ip-10-100-6-23 WARN  Reaper [bundleId=0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceThread=conductr-agent-akka.actor.default-dispatcher-4, akkaTimestamp=22:11:20.074UTC, akkaSource=akka://conductr-agent/user/reaper, sourceActorSystem=conductr-agent] - Invalid HTTP port: 10822 -Dhttp.address=10.100.6.23
2017-06-22T22:11:20.088Z ip-10-100-6-23 INFO  Reaper [bundleId=0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceThread=conductr-agent-akka.actor.default-dispatcher-22, akkaTimestamp=22:11:20.088UTC, akkaSource=akka://conductr-agent/user/reaper, sourceActorSystem=conductr-agent] - Component sherpa-desk exited with 255
2017-06-22T22:11:20.088Z ip-10-100-6-23 INFO  Reaper [bundleId=0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceThread=conductr-agent-akka.actor.default-dispatcher-22, akkaTimestamp=22:11:20.088UTC, akkaSource=akka://conductr-agent/user/reaper, sourceActorSystem=conductr-agent] - Stopping bundle
2017-06-22T22:11:20.134Z ip-10-100-6-23 INFO  BundleExecutor [sourceThread=conductr-agent-akka.actor.default-dispatcher-4, akkaTimestamp=22:11:20.134UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/run-executor-0/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceActorSystem=conductr-agent] - Process for bundle [0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5] exited with exit value [143]
2017-06-22T22:11:20.135Z ip-10-100-6-23 DEBUG StandaloneLauncherAgent [sourceThread=conductr-agent-akka.actor.default-dispatcher-2, akkaTimestamp=22:11:20.134UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent, sourceActorSystem=conductr-agent] - Bundle 0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5 has been stopped with exit code Some(143)

So the bundle fails frequently because of Invalid HTTP port: 10822 -Dhttp.address=10.100.6.23. Is this sherpa-desk related?

@markusjura
Copy link
Contributor

markusjura commented Jun 26, 2017

Additionally, I see that 2 hours before the conduct agents command has been performed, the RunExecutor is restarted due to inconsistent bundle state between core and agent:

2017-06-22T22:11:38.776Z ip-10-100-6-23 INFO  StandaloneLauncherAgent [sourceThread=conductr-agent-akka.actor.default-dispatcher-22, akkaTimestamp=22:11:38.776UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent, sourceActorSystem=conductr-agent] - Restarting RunExecutor and stopping all running bundles due to an StopAllBundles event from akka.tcp://conductr@10.100.6.104:9004/user/reaper/agent-monitor/conductr-agent@10.100.6.23:2552.
2017-06-22T22:11:38.777Z ip-10-100-6-23 INFO  RunExecutor [sourceThread=conductr-agent-akka.actor.default-dispatcher-21, akkaTimestamp=22:11:38.777UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/run-executor-0, sourceActorSystem=conductr-agent] - Shutting down with 1 bundle(s).
2017-06-22T22:11:38.777Z ip-10-100-6-23 DEBUG RunExecutor [sourceThread=conductr-agent-akka.actor.default-dispatcher-21, akkaTimestamp=22:11:38.777UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/run-executor-0, sourceActorSystem=conductr-agent] - Stopping bundle 0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5 on node akka.tcp://conductr-agent@10.100.6.23:2552.
2017-06-22T22:11:39.150Z ip-10-100-6-23 DEBUG BundleExecutor [sourceThread=conductr-agent-akka.actor.default-dispatcher-5, akkaTimestamp=22:11:39.150UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/run-executor-0/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceActorSystem=conductr-agent] - Bundle configuration used: None
2017-06-22T22:11:39.150Z ip-10-100-6-23 DEBUG BundleExecutor [sourceThread=conductr-agent-akka.actor.default-dispatcher-5, akkaTimestamp=22:11:39.150UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/run-executor-0/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceActorSystem=conductr-agent] - Alternate configuration used: None and Some(/tmp/conductr-agent/10.100.6.23/2552/bundles/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5/execution-0-4841436986406642223/7d0de0840454316214a1a0a3f8b9d0f50ba56b9ed10a62b1d3249a910a8e9bc0/sherpa-desk-prod/bundle.conf)
2017-06-22T22:11:39.151Z ip-10-100-6-23 DEBUG BundleExecutor [sourceThread=conductr-agent-akka.actor.default-dispatcher-5, akkaTimestamp=22:11:39.151UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/run-executor-0/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceActorSystem=conductr-agent] - About to start process [Vector(/usr/bin/env, bash, /tmp/conductr-agent/10.100.6.23/2552/bundles/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5/execution-0-4841436986406642223/start)]
2017-06-22T22:11:39.153Z ip-10-100-6-23 INFO  BundleExecutor [sourceThread=conductr-agent-akka.actor.default-dispatcher-22, akkaTimestamp=22:11:39.153UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/run-executor-0/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceActorSystem=conductr-agent] - Process for bundle [0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5] started [19962]
2017-06-22T22:11:39.153Z ip-10-100-6-23 DEBUG StandaloneLauncherAgent [sourceThread=conductr-agent-akka.actor.default-dispatcher-2, akkaTimestamp=22:11:39.153UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent, sourceActorSystem=conductr-agent] - Bundle 0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5 has been successfully started.
2017-06-22T22:11:39.573Z ip-10-100-6-23 WARN  Reaper [bundleId=0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceThread=conductr-agent-akka.actor.default-dispatcher-2, akkaTimestamp=22:11:39.573UTC, akkaSource=akka://conductr-agent/user/reaper, sourceActorSystem=conductr-agent] - Invalid HTTP port: 10822 -Dhttp.address=10.100.6.23
2017-06-22T22:11:39.586Z ip-10-100-6-23 DEBUG ElasticsearchCollector [sourceThread=conductr-agent-akka.actor.default-dispatcher-2, akkaTimestamp=22:11:39.586UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/ElasticsearchCollector, sourceActorSystem=conductr-agent] - Posted to Elasticsearch and receive a response failure: 503 Service Unavailable
2017-06-22T22:11:39.587Z ip-10-100-6-23 INFO  Reaper [bundleId=0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceThread=conductr-agent-akka.actor.default-dispatcher-20, akkaTimestamp=22:11:39.587UTC, akkaSource=akka://conductr-agent/user/reaper, sourceActorSystem=conductr-agent] - Component sherpa-desk exited with 255
2017-06-22T22:11:39.587Z ip-10-100-6-23 INFO  Reaper [bundleId=0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceThread=conductr-agent-akka.actor.default-dispatcher-20, akkaTimestamp=22:11:39.587UTC, akkaSource=akka://conductr-agent/user/reaper, sourceActorSystem=conductr-agent] - Stopping bundle
2017-06-22T22:11:39.631Z ip-10-100-6-23 INFO  BundleExecutor [sourceThread=conductr-agent-akka.actor.default-dispatcher-4, akkaTimestamp=22:11:39.631UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/run-executor-0/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceActorSystem=conductr-agent] - Process for bundle [0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5] exited with exit value [143]
2017-06-22T22:11:39.631Z ip-10-100-6-23 DEBUG StandaloneLauncherAgent [sourceThread=conductr-agent-akka.actor.default-dispatcher-2, akkaTimestamp=22:11:39.631UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent, sourceActorSystem=conductr-agent] - Bundle 0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5 has been stopped with exit code Some(143)
2017-06-22T22:11:39.631Z ip-10-100-6-23 INFO  RunExecutor [sourceThread=conductr-agent-akka.actor.default-dispatcher-3, akkaTimestamp=22:11:39.631UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/run-executor-0, sourceActorSystem=conductr-agent] - All bundles stopped.
2017-06-22T22:11:39.631Z ip-10-100-6-23 INFO  StandaloneLauncherAgent [sourceThread=conductr-agent-akka.actor.default-dispatcher-2, akkaTimestamp=22:11:39.631UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent, sourceActorSystem=conductr-agent] - akka://conductr-agent/user/reaper/standalone-launcher-agent/run-executor-0 has been successfully terminated after stopping all the bundles.

After the restart of the RunExecutor the following log message is logged every 15 seconds:

2017-06-22T22:11:53.774Z ip-10-100-6-23 DEBUG AgentInfo [sourceThread=conductr-agent-akka.actor.default-dispatcher-17, akkaTimestamp=22:11:53.774UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/agent-info, sourceActorSystem=conductr-agent] - Starting Resource poller

This log message indicates that the resource poller actor gets started. We don't log any message when the start was successful. Once a resource poller actor has been started, it will send every 15 seconds the resources to ConductR core. Also here, we do not log any message on the agent so I cannot tell based on the log output if something is wrong here. However, the fact that ConductR core has no resourceAvailable information for this agent indicates that the resources are not successfully sent to the Core.

At the very beginning so before the above log messages I can see that agent connected successfully to the core and started sending resources:

2017-06-21T21:41:39.001Z ip-10-100-6-23 INFO  CoreNodesWriter [sourceThread=conductr-agent-akka.actor.default-dispatcher-21, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/core-nodes-writer, sourceActorSystem=conductr-agent, akkaTimestamp=21:41:38.995UTC] - Saving contact points [akka.tcp://conductr@10.100.2.17:9004, akka.tcp://conductr@10.100.6.104:9004]
2017-06-21T21:41:39.001Z ip-10-100-6-23 INFO  CoreServicesProxy [sourceThread=conductr-agent-akka.actor.default-dispatcher-18, akkaTimestamp=21:41:38.996UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/core-services-proxy, sourceActorSystem=conductr-agent] - New contact point akka.tcp://conductr@10.100.6.104:9004/system/receptionist
2017-06-21T21:41:39.002Z ip-10-100-6-23 DEBUG CoreServicesProxy [sourceThread=conductr-agent-akka.actor.default-dispatcher-18, akkaTimestamp=21:41:39.001UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/core-services-proxy, sourceActorSystem=conductr-agent] - Terminating core services requestor for core-services-requestor-conductr@10.100.2.17:9004
2017-06-21T21:41:39.003Z ip-10-100-6-23 INFO  CoreNodesWriter [sourceThread=conductr-agent-akka.actor.default-dispatcher-21, akkaTimestamp=21:41:39.003UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/core-nodes-writer, sourceActorSystem=conductr-agent] - Saving contact points [akka.tcp://conductr@10.100.2.17:9004, akka.tcp://conductr@10.100.6.104:9004, akka.tcp://conductr@10.100.4.28:9004]
2017-06-21T21:41:39.009Z ip-10-100-6-23 DEBUG CoreServicesProxy [sourceThread=conductr-agent-akka.actor.default-dispatcher-18, akkaTimestamp=21:41:39.009UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/core-services-proxy, sourceActorSystem=conductr-agent] - Starting core services requestor for conductr@10.100.6.104:9004
2017-06-21T21:41:39.011Z ip-10-100-6-23 DEBUG CoreServicesRequestor [sourceThread=conductr-agent-akka.actor.default-dispatcher-3, akkaTimestamp=21:41:39.011UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/core-services-proxy/core-services-requestor-conductr@10.100.6.104:9004/supervised-core-services-requestor-conductr@10.100.6.104:9004, sourceActorSystem=conductr-agent] - Requesting info on core services from akka.tcp://conductr@10.100.6.104:9004/user/reaper/core-services
2017-06-21T21:41:39.013Z ip-10-100-6-23 INFO  CoreServicesProxy [sourceThread=conductr-agent-akka.actor.default-dispatcher-18, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/core-services-proxy, sourceActorSystem=conductr-agent, akkaTimestamp=21:41:39.012UTC] - New contact point akka.tcp://conductr@10.100.4.28:9004/system/receptionist
2017-06-21T21:41:39.027Z ip-10-100-6-23 INFO  ClusterClient [sourceThread=conductr-agent-akka.actor.default-dispatcher-3, akkaTimestamp=21:41:39.026UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/cluster-client, sourceActorSystem=conductr-agent] - Connected to [akka.tcp://conductr@10.100.2.17:9004/system/receptionist]
2017-06-21T21:41:39.042Z ip-10-100-6-23 INFO  CoreServicesProxy [sourceThread=conductr-agent-akka.actor.default-dispatcher-19, akkaTimestamp=21:41:39.041UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/core-services-proxy, sourceActorSystem=conductr-agent] - Received core services information CoreServicesInfo(http://10.100.6.104:9005/,http://10.100.6.104:9008/,http://10.100.6.104:9007/).
2017-06-21T21:41:39.050Z ip-10-100-6-23 DEBUG AgentInfo [sourceThread=conductr-agent-akka.actor.default-dispatcher-18, akkaTimestamp=21:41:39.044UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/agent-info, sourceActorSystem=conductr-agent] - Starting Resource poller
2017-06-21T21:41:39.050Z ip-10-100-6-23 DEBUG ResourcePoller [sourceThread=conductr-agent-akka.actor.default-dispatcher-21, akkaTimestamp=21:41:39.045UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/resource-polling, sourceActorSystem=conductr-agent] - Start resource polling

Summary

Based on the ConductR agent log files I am not able to identify the issue why the resources are not updated on ConductR core. The ConductR agent was able to connect with a core initially and continues to send the resources. To further analyze the issue I'd need the ConductR core log files during the same time frame as well. @edwardcallahan Are you able to attach these log files as well?

@edwardcallahan
Copy link
Author

@markusjura wrt #504 (comment)
The sherpa-desk bundle has only its name set from the config bundle
name = "prod-sherpa-desk"
The start command is from the sbt generated application bundle (it is in the .conductr/cache folder on selkirk.lightbend.com)

start-command    = ["sherpa-desk/bin/sherpa-desk", "-J-Xms134217728", "-J-Xmx134217728", "-Dhttp.port=$DESK_BIND_PORT -Dhttp.address=$DESK_BIND_IP"]

@edwardcallahan
Copy link
Author

core_logs.tar.gz
All log files for the 22nd of the 2.17 core node observer for 6.23 as requested.

Note: When cluster was built it was over provisioned with agents. These private agent nodes are stopped before loading bundles to provide 'ready to go' spare agents. That might be the cause of the reconcile event you observed in the logs, @markusjura.

@markusjura
Copy link
Contributor

@edwardcallahan With stopped, do you mean a manual stop by a user?

I've tried to continue the investigation with the provided core log. Turns out that I need the log files of the day of all 3 core nodes. Tried to get them by ssh into selkirk and then on the respective nodes but the /var/log/conductr folders only containes archives from 29th of June. Is there a change to get the logs of the remaing cores from 22nd?

@edwardcallahan
Copy link
Author

@markusjura The nodes were stopped using AWS EC2 controls. Not terminated, but stopped, so that it can be resumed later, if needed, without incurring compute costs.

Regarding the logfile rotation, it seems that logratate is enabled and the logs were rotated away. Sorry. That said, the cluster is back in that state today. I'll assert that we can get it back into this state again with the bundles we have currently loaded.

@markusjura
Copy link
Contributor

@edwardcallahan If you can (and want) to reproduce the issue with the current bundles, please do and send me the logs of the failing agent and all core nodes. Otherwise, we can close the issue and open another one once we reproduced it.

@markusjura
Copy link
Contributor

Closing the issue because it cannot be reproduced.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants