This repository has been archived by the owner on Mar 3, 2023. It is now read-only.

[HERON-3707] ConfigMap Pod Template Support #3710

Merged
merged 134 commits into from Nov 2, 2021


surahman
Member

@surahman surahman commented Sep 9, 2021

Issue #3707:

This PR adds support for custom Pod Templates via the --config-property CLI flag. The documentation for this PR is in issue #3717: go to the Files changed tab, then choose the View file option under the ... menu for the MD file.

Old Post

A preliminary WIP PR to add Pod Template support in Heron, similar to how it is handled in Spark. The ConfigMap name should be passed in via the CLI flag --config-property.

The solution follows Spark's approach and produces the following YAML nodes:

  volumes:
    - name: pod-template-name  # from <POD_TEMPLATE_VOLUME>.
      configMap:
        name: configmap-name  # from <configmapName>.
        items:
        - key: pod-template-key  # from <POD_TEMPLATE_KEY>.
          path: executor-pod-spec-template-file-name # from <EXECUTOR_POD_SPEC_TEMPLATE_FILE_NAME>.

The first phase of this PR is generating the above basic YAML nodes in a V1PodSpec. The objective is to call the createConfigMapVolumeMount method from within createStatefulSet to add the V1PodTemplateSpec somewhere here:

final V1PodTemplateSpec podTemplateSpec = new V1PodTemplateSpec();
// set up pod meta
final V1ObjectMeta templateMetaData = new V1ObjectMeta().labels(getPodLabels(topologyName));
Map<String, String> annotations = new HashMap<>();
annotations.putAll(getPodAnnotations());
annotations.putAll(getPrometheusAnnotations());
templateMetaData.annotations(annotations);
podTemplateSpec.setMetadata(templateMetaData);

I have added code that I believe will extract the configmapName from a parameter passed to --config-property (unverified, owing to build/test issues on my end). The option requires a key-value pair as outlined here:

--config-property (key=value; a config key and its value; default: [])

I have selected a key of heron.kubernetes.pod.template.configmap.name for now pending approval from the dev group.
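For illustration, a minimal sketch of how a key=value argument to --config-property could be split and the ConfigMap name pulled out. This is not Heron's actual parsing code: the class and method names are hypothetical, and only the key string comes from this thread.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper for illustration only; not part of Heron. */
final class ConfigPropertySketch {
  // The key proposed in this PR, pending approval.
  static final String POD_TEMPLATE_CONFIGMAP_NAME =
      "heron.kubernetes.pod.template.configmap.name";

  private ConfigPropertySketch() { }

  /** Splits each "key=value" argument on the first '=' and collects the pairs. */
  static Map<String, String> parse(List<String> properties) {
    final Map<String, String> parsed = new LinkedHashMap<>();
    for (String property : properties) {
      final int separator = property.indexOf('=');
      if (separator <= 0) {
        throw new IllegalArgumentException("Expected key=value but got: " + property);
      }
      parsed.put(property.substring(0, separator).trim(),
          property.substring(separator + 1).trim());
    }
    return parsed;
  }

  /** Returns the ConfigMap name when the key is present and non-empty. */
  static Optional<String> podTemplateConfigMapName(Map<String, String> config) {
    return Optional.ofNullable(config.get(POD_TEMPLATE_CONFIGMAP_NAME))
        .filter(name -> !name.isEmpty());
  }
}
```

Splitting on the first '=' only keeps any later '=' characters as part of the value.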

I have not located the testing suite base for the methods in the V1Controller. If there is no test suite it would be prudent to invest some time in creating one.

@joshfischer1108
Member

@surahman Is this ready for review?

@surahman
Member Author

@joshfischer1108, yes please. I think it would be beneficial at this point to get some people who are more experienced with the codebase to give things a once-over. We can iterate from there as needed.

We also need someone who can deploy and test the usage of Pod Templates in ConfigMaps.

@nicknezis
Contributor

I'll try testing this tonight. I hadn't noticed that the Spark code used a volume mount. I will have to check that this does what we need. As you said, if not, we can iterate as needed. Thank you for working on this feature request.

@surahman
Member Author

@nicknezis thank you, I do not have a workload I can actually test this with.

@joshfischer1108
Member

@surahman Please add unit tests to cover the current code changes. I realize there are no unit tests atm on this file. But I think this is a good time to start.

@surahman
Member Author

surahman commented Sep 13, 2021

@joshfischer1108 I have added some tests for the exposed code but the routines in the V1Controller are scoped to private. Testing these routines would mean having to make them protected and then exposing them using an accessor testing class that extends the V1Controller. I am not sure if changing the access level for the methods in V1Controller makes sense.

I have also added tests for the new Constants which were added to ensure complete patch code coverage. I feel testing them is redundant and that testing should be limited to routines and generated objects.

Edit: I have set up reflection in the testing base for the V1ControllerTest to gain access to the private routines.
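A minimal sketch of the reflection approach mentioned above, assuming a stand-in class rather than the real V1Controller. ControllerStandIn, buildName, and PrivateAccessSketch are hypothetical names used only for illustration.

```java
import java.lang.reflect.Method;

/** Hypothetical stand-in for a class whose routine is scoped to private. */
final class ControllerStandIn {
  private String buildName(String topology) {
    return topology + "-pod-template";
  }
}

/** Hypothetical test helper; not the helper used in V1ControllerTest. */
final class PrivateAccessSketch {
  private PrivateAccessSketch() { }

  /** Looks up a private method taking one String and invokes it on the target. */
  static Object invokePrivate(Object target, String methodName, String argument) {
    try {
      final Method method =
          target.getClass().getDeclaredMethod(methodName, String.class);
      method.setAccessible(true); // lift the private access check for the test
      return method.invoke(target, argument);
    } catch (ReflectiveOperationException e) {
      // Wrap checked reflection failures so that callers stay simple.
      throw new IllegalStateException("Could not invoke " + methodName, e);
    }
  }
}
```

A test could then call PrivateAccessSketch.invokePrivate(new ControllerStandIn(), "buildName", "demo") and assert on the returned value. The trade-off is that a renamed method only fails at run time, not at compile time.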

@surahman
Member Author

surahman commented Sep 14, 2021

There are some issues with testing the createStatefulSet method in the V1Controller, namely its private scope and the inability to mock it. It also reads files from disk when it sets up the Topology config. Simulating those disk reads requires mocking, and there is no such functionality in place within the TopologyUtilsTest.

I have a WIP testCreateStatefulSet method set up, but it seems like it might have to be removed along with a refactoring of everything else.

@joshfischer1108
Member

> @joshfischer1108 I have added some tests for the exposed code but the routines in the V1Controller are scoped to private. Testing these routines would mean having to make them protected and then exposing them using an accessor testing class that extends the V1Controller. I am not sure if changing the access level for the methods in V1Controller makes sense.
>
> I have also added tests for the new Constants which were added to ensure complete patch code coverage. I feel testing them is redundant and that testing should be limited to routines and generated objects.
>
> Edit: I have set up reflection in the testing base for the V1ControllerTest to gain access to the private routines.

Well done. Thank you.

@joshfischer1108
Member

> There are some issues with testing the createStatefulSet method in the V1Controller, namely its private scope and the inability to mock it. It also reads files from disk when it sets up the Topology config. Simulating those disk reads requires mocking, and there is no such functionality in place within the TopologyUtilsTest.
>
> I have a WIP testCreateStatefulSet method set up, but it seems like it might have to be removed along with a refactoring of everything else.

Ok, let's create an issue around this and clean this up later.

@surahman surahman force-pushed the 3707-ConfigMap-Support branch 2 times, most recently from e9a9a7b to 13d196b Compare September 14, 2021 15:08
@surahman
Member Author

surahman commented Sep 14, 2021

I will write up what I have found on the issue with testing createStatefulSet. It is achievable but will be rather involved and require some serious digging in the codebase. I think your judgement is sound to not conflate this issue with the other.

On a side note, I think it is time to switch over from Travis CI to GitHub Actions. From personal experience, I feel that GH Actions are faster to run than Travis CI and it might speed things up to not rely on a third-party service.

EDIT: I have created issue #3713 with my findings on the V1Controller test suite. If anyone has insights or comments please post there.

@nicknezis
Contributor

Looking over the code, I think I failed to point out an important aspect of what Spark is doing with this feature. This code is important to understand the goal. When the scheduler is defining the Pod, it will check for a PodTemplate, and if not defined create a default. Currently the Heron scheduler only starts with a default Pod that it creates from scratch.

This Spark code illustrates the branching logic that checks for an existing PodTemplate config item.

This is the method used by Spark to actually load the template and create the pod spec.

The mounting of the ConfigMap as a volume is actually something Spark-specific that we might not need in Heron. In Spark there is a concept of a driver pod, and this creates the executor pods. This is why they mount the Executor's PodTemplate ConfigMap into the Driver pod. If I'm not mistaken, our version of loadPodFromTemplate could look up the ConfigMap PodTemplate directly without the need to mount the ConfigMap into the pod.
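The branching described above (use a Pod Template when one is configured, otherwise build the default) can be sketched generically. This is only an outline under stated assumptions: the class and parameter names are hypothetical, and the loader and default builder are passed in as functions because the real Heron and Kubernetes types are not shown here.

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical outline of the template-or-default decision; not Heron code. */
final class PodTemplateSelectionSketch {
  private PodTemplateSelectionSketch() { }

  /**
   * Loads a template when a ConfigMap name is configured, otherwise creates
   * the default. Optional.map yields an empty Optional if the loader returns
   * null, so a null result also falls back to the default.
   */
  static <T> T select(Optional<String> configMapName,
                      Function<String, T> loadFromConfigMap,
                      Supplier<T> createDefault) {
    return configMapName.map(loadFromConfigMap).orElseGet(createDefault);
  }
}
```

Using orElseGet rather than orElse means the default is only built when it is actually needed.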

@surahman
Member Author

surahman commented Sep 16, 2021

I think I have an idea of what needs to happen... time to iterate 😉.

@nicknezis You are correct, Spark's architecture has a driver/coordinator with a fleet of executors. Is there already a loadPodFromTemplate in Heron or do we need to put one together? I could not find anything in the V1Controller or K8S scheduler codebase in Heron. The getContainer routine is configuring the Docker container for executor deployment, is this the routine you are referring to?

There is also this, which indicates there is a built-in function in the K8s client library (Scala, and potentially Java too) that should handle parsing and assembling the Pod Template.

Currently createStatefulSet relies on the V1 K8S API to put together a default Pod Template in which most of the fields are simply set to null.

@nicknezis
Contributor

Correct, Heron does not yet have a loadPodFromTemplate function. This PR's goal is to add it. We need to look up the ConfigMap, get the embedded template and then pass that into the kubernetesClient.pods().load(templateFile).get() call.

I don't think we can mount the template ConfigMap into the scheduler, so I think instead we will need to do a K8s call to get the ConfigMap. When we get to actually testing, perhaps we will need to add new K8s permissions, but we can figure that out once we can test running the new logic.

@surahman
Member Author

surahman commented Sep 17, 2021

Okay, so I am going to endeavour to break this down - please bear with me as I am still relatively new to the codebase and K8s API...


Starting with the reference code in Spark, and keeping in mind that the Spark architecture is different from Heron's:

This simply gets the Hadoop config from the driver/coordinator:

val hadoopConf = SparkHadoopUtil.get.newConfiguration(conf)

These two lines download the remote file from the driver/coordinator and make it local to a machine. The first line downloads the file, assuming I am looking at the correct Scala K8s API, and the second retrieves a file descriptor/handle on the downloaded file:

val localFile = downloadFile(templateFileName, Utils.createTempDir(), conf, hadoopConf)
val templateFile = new File(new java.net.URI(localFile).getPath)

This third line then does the heavy lifting of reading the Pod Template into a Pod Config from the newly copied local file:

val pod = kubernetesClient.pods().load(templateFile).get()

The final line sets up the Spark container with the Pod Template and specified name:

selectSparkContainer(pod, containerName)

Moving on to what we need to do on the Heron side:

  1. Read the ConfigMap name from the --config-property option. I set the key for this to heron.kubernetes.pod.template.configmap.name with the value being the file name.
  2. Read the YAML ConfigMap and extract the YAML node tree which contains the Pod Template. For this, we will either need a YAML parser or need to find a utility in the K8s Java API to do the job. I think the K8s Java API should include a utility for this; the lack of one would be a significant oversight on their part.
  3. Create a V1PodTemplateSpec object using the results from step 2.
  4. Iron out permission issues during testing, should they arise.

I am not familiar with the K8s API but will start digging around for a YAML config to V1 object parser. If anyone is aware of where it is, please let me know. There are some suggestions here and the YAML reader within K8s is found here.

@nicknezis
Contributor

I think that's correct, but step 2 might be somewhat simpler than you describe. I'll try to find a better example, but I think step 2 should be something like this example. It uses the Fabric8 API, but there should be a similar get method in the official K8s API.

Once you look up the ConfigMap object, you should be able to call getData() to retrieve the ConfigMap's value, which should be the template that is passed into kubernetesClient.pods().load(template).get().
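As a rough sketch of the getData() step: a ConfigMap's data is a map from string keys to string values, so the template text can be picked out by key once the object has been fetched. The class below works on a plain Map so that it runs without a cluster. The class name, method name, and the idea of passing the key explicitly are assumptions for illustration, not the PR's implementation.

```java
import java.util.Map;

/** Hypothetical helper operating on the map a ConfigMap's getData() returns. */
final class ConfigMapDataSketch {
  private ConfigMapDataSketch() { }

  /** Returns the template stored under the given key, or fails loudly. */
  static String templateFrom(Map<String, String> data, String key) {
    if (data == null || data.isEmpty()) {
      throw new IllegalArgumentException("ConfigMap holds no data");
    }
    final String template = data.get(key);
    if (template == null || template.trim().isEmpty()) {
      throw new IllegalArgumentException("No Pod Template found under key: " + key);
    }
    return template;
  }
}
```

Failing early with a descriptive message makes a misnamed key visible at submission time rather than as a confusing parse error later.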

@nicknezis
Contributor

Doing a bit of research, I see a listNamespacedConfigMap method that returns V1ConfigMapList. That class has a getItems method that returns a List<V1ConfigMap>. Not sure if there is a more direct way to get the specifically named ConfigMap, but this seems to match the logic in that Fabric8 example I previously linked to.

Fixed and cleaned up tests after switching to <readNamespacedConfigMap>.
@surahman
Member Author

I got rid of the listNamespacedConfigMap usage. Utilizing this really bothered me because in an actual production K8s cluster there are potentially thousands of ConfigMaps loaded in any given namespace. This could lead to memory issues on the Heron API server, not to mention it is exceedingly inefficient. If I can retrieve a specific ConfigMap on the CLI then it stands to reason that there should be a matching Java API call: readNamespacedConfigMap.
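To make the efficiency argument concrete, here is a small in-memory model contrasting the two lookup styles. It does not call the Kubernetes client: ConfigMapLookupSketch and its methods are hypothetical stand-ins, and the counter simply tracks how many objects each style hands back to the caller.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory model of one namespace: ConfigMap name -> template. */
final class ConfigMapLookupSketch {
  private final Map<String, String> configMaps = new LinkedHashMap<>();
  private int itemsReturned;

  void put(String name, String template) {
    configMaps.put(name, template);
  }

  /** List-style lookup: materializes every ConfigMap, then filters by name. */
  Optional<String> findByListing(String name) {
    final List<Map.Entry<String, String>> all = new ArrayList<>(configMaps.entrySet());
    itemsReturned += all.size();
    for (Map.Entry<String, String> entry : all) {
      if (entry.getKey().equals(name)) {
        return Optional.of(entry.getValue());
      }
    }
    return Optional.empty();
  }

  /** Read-style lookup: asks for exactly one ConfigMap by name. */
  Optional<String> findByName(String name) {
    final Optional<String> found = Optional.ofNullable(configMaps.get(name));
    if (found.isPresent()) {
      itemsReturned += 1;
    }
    return found;
  }

  /** Total number of ConfigMaps handed back across all lookups so far. */
  int itemsReturned() {
    return itemsReturned;
  }
}
```

With a thousand entries in the model, the listing style returns a thousand objects to find one, while the by-name style returns a single object.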

Changes are on the dev branch and everything is passing with the complete testing battery. K8s cluster deployment tests are working as well. I have included the test results below. I hope TravisCI is in a good mood today 🎲 🤞🏼.

Test suite in clean Heron Docker Ubuntu 18.04 LTS
INFO: Elapsed time: 1787.995s, Critical Path: 207.90s
INFO: 5581 processes: 2396 internal, 3185 local.
INFO: Build completed successfully, 5581 total actions
//heron/api/tests/cpp:serialization_unittest                             PASSED in 0.0s
//heron/api/tests/java:BaseWindowedBoltTest                              PASSED in 0.3s
//heron/api/tests/java:ConfigTest                                        PASSED in 0.4s
//heron/api/tests/java:CountStatAndMetricTest                            PASSED in 0.3s
//heron/api/tests/java:GeneralReduceByKeyAndWindowOperatorTest           PASSED in 0.4s
//heron/api/tests/java:HeronSubmitterTest                                PASSED in 1.9s
//heron/api/tests/java:JoinOperatorTest                                  PASSED in 0.4s
//heron/api/tests/java:KVStreamletShadowTest                             PASSED in 0.4s
//heron/api/tests/java:KeyByOperatorTest                                 PASSED in 0.4s
//heron/api/tests/java:LatencyStatAndMetricTest                          PASSED in 0.3s
//heron/api/tests/java:ReduceByKeyAndWindowOperatorTest                  PASSED in 0.4s
//heron/api/tests/java:StreamletImplTest                                 PASSED in 0.4s
//heron/api/tests/java:StreamletShadowTest                               PASSED in 0.6s
//heron/api/tests/java:StreamletUtilsTest                                PASSED in 0.3s
//heron/api/tests/java:UtilsTest                                         PASSED in 0.2s
//heron/api/tests/java:WaterMarkEventGeneratorTest                       PASSED in 0.4s
//heron/api/tests/java:WindowManagerTest                                 PASSED in 0.3s
//heron/api/tests/java:WindowedBoltExecutorTest                          PASSED in 0.5s
//heron/api/tests/scala:api-scala-test                                   PASSED in 1.0s
  WARNING: //heron/api/tests/scala:api-scala-test: Test execution time (1.0s excluding execution overhead) outside of range for MODERATE tests. Consider setting timeout="short" or size="small".
//heron/ckptmgr/tests/java:CheckpointManagerServerTest                   PASSED in 0.6s
//heron/common/tests/cpp/basics:fileutils_unittest                       PASSED in 0.0s
//heron/common/tests/cpp/basics:rid_unittest                             PASSED in 0.0s
//heron/common/tests/cpp/basics:strutils_unittest                        PASSED in 0.0s
//heron/common/tests/cpp/basics:utils_unittest                           PASSED in 0.0s
//heron/common/tests/cpp/config:topology-config-helper_unittest          PASSED in 0.0s
//heron/common/tests/cpp/errors:errors_unittest                          PASSED in 0.0s
//heron/common/tests/cpp/errors:module_unittest                          PASSED in 0.0s
//heron/common/tests/cpp/errors:syserrs_unittest                         PASSED in 0.0s
//heron/common/tests/cpp/metrics:count-metric_unittest                   PASSED in 0.0s
//heron/common/tests/cpp/metrics:mean-metric_unittest                    PASSED in 0.0s
//heron/common/tests/cpp/metrics:multi-count-metric_unittest             PASSED in 0.0s
//heron/common/tests/cpp/metrics:multi-mean-metric_unittest              PASSED in 0.0s
//heron/common/tests/cpp/metrics:time-spent-metric_unittest              PASSED in 1.3s
//heron/common/tests/cpp/network:http_unittest                           PASSED in 0.1s
//heron/common/tests/cpp/network:order_unittest                          PASSED in 0.1s
//heron/common/tests/cpp/network:packet_unittest                         PASSED in 0.0s
//heron/common/tests/cpp/network:piper_unittest                          PASSED in 2.0s
//heron/common/tests/cpp/network:rate_limit_unittest                     PASSED in 4.1s
//heron/common/tests/cpp/network:switch_unittest                         PASSED in 0.2s
//heron/common/tests/cpp/threads:spcountdownlatch_unittest               PASSED in 2.0s
//heron/common/tests/java:ByteAmountTest                                 PASSED in 0.3s
//heron/common/tests/java:CommunicatorTest                               PASSED in 0.3s
//heron/common/tests/java:ConfigReaderTest                               PASSED in 0.3s
//heron/common/tests/java:EchoTest                                       PASSED in 0.5s
//heron/common/tests/java:FileUtilsTest                                  PASSED in 1.3s
//heron/common/tests/java:HeronServerTest                                PASSED in 1.6s
//heron/common/tests/java:PackageTypeTest                                PASSED in 0.3s
//heron/common/tests/java:SysUtilsTest                                   PASSED in 5.3s
//heron/common/tests/java:SystemConfigTest                               PASSED in 0.4s
//heron/common/tests/java:TopologyUtilsTest                              PASSED in 0.3s
//heron/common/tests/java:WakeableLooperTest                             PASSED in 1.3s
//heron/common/tests/python/pex_loader:pex_loader_unittest               PASSED in 0.7s
//heron/downloaders/tests/java:DLDownloaderTest                          PASSED in 0.9s
//heron/downloaders/tests/java:ExtractorTests                            PASSED in 0.3s
//heron/downloaders/tests/java:RegistryTest                              PASSED in 0.4s
//heron/executor/tests/python:executor_unittest                          PASSED in 1.1s
//heron/healthmgr/tests/java:BackPressureDetectorTest                    PASSED in 0.6s
//heron/healthmgr/tests/java:BackPressureSensorTest                      PASSED in 0.6s
//heron/healthmgr/tests/java:BufferSizeSensorTest                        PASSED in 0.6s
//heron/healthmgr/tests/java:DataSkewDiagnoserTest                       PASSED in 0.6s
//heron/healthmgr/tests/java:ExecuteCountSensorTest                      PASSED in 0.6s
//heron/healthmgr/tests/java:GrowingWaitQueueDetectorTest                PASSED in 0.5s
//heron/healthmgr/tests/java:HealthManagerTest                           PASSED in 0.8s
//heron/healthmgr/tests/java:HealthPolicyConfigReaderTest                PASSED in 0.5s
//heron/healthmgr/tests/java:LargeWaitQueueDetectorTest                  PASSED in 0.6s
//heron/healthmgr/tests/java:MetricsCacheMetricsProviderTest             PASSED in 0.6s
//heron/healthmgr/tests/java:PackingPlanProviderTest                     PASSED in 0.6s
//heron/healthmgr/tests/java:ProcessingRateSkewDetectorTest              PASSED in 0.6s
//heron/healthmgr/tests/java:ScaleUpResolverTest                         PASSED in 0.6s
//heron/healthmgr/tests/java:SlowInstanceDiagnoserTest                   PASSED in 0.6s
//heron/healthmgr/tests/java:UnderProvisioningDiagnoserTest              PASSED in 0.5s
//heron/healthmgr/tests/java:WaitQueueSkewDetectorTest                   PASSED in 0.5s
//heron/instance/tests/java:ActivateDeactivateTest                       PASSED in 0.5s
//heron/instance/tests/java:BoltInstanceTest                             PASSED in 0.5s
//heron/instance/tests/java:BoltStatefulInstanceTest                     PASSED in 2.5s
//heron/instance/tests/java:ConnectTest                                  PASSED in 0.6s
//heron/instance/tests/java:CustomGroupingTest                           PASSED in 0.5s
//heron/instance/tests/java:EmitDirectBoltTest                           PASSED in 0.5s
//heron/instance/tests/java:EmitDirectSpoutTest                          PASSED in 0.6s
//heron/instance/tests/java:GlobalMetricsTest                            PASSED in 0.3s
//heron/instance/tests/java:HandleReadTest                               PASSED in 0.6s
//heron/instance/tests/java:HandleWriteTest                              PASSED in 5.5s
//heron/instance/tests/java:MultiAssignableMetricTest                    PASSED in 0.2s
//heron/instance/tests/java:SpoutInstanceTest                            PASSED in 2.6s
//heron/instance/tests/java:SpoutStatefulInstanceTest                    PASSED in 2.5s
//heron/instance/tests/python/network:event_looper_unittest              PASSED in 3.0s
//heron/instance/tests/python/network:gateway_looper_unittest            PASSED in 11.0s
//heron/instance/tests/python/network:heron_client_unittest              PASSED in 1.0s
//heron/instance/tests/python/network:metricsmgr_client_unittest         PASSED in 0.9s
//heron/instance/tests/python/network:protocol_unittest                  PASSED in 0.9s
//heron/instance/tests/python/network:st_stmgrcli_unittest               PASSED in 0.9s
//heron/instance/tests/python/utils:communicator_unittest                PASSED in 0.9s
//heron/instance/tests/python/utils:custom_grouping_unittest             PASSED in 0.9s
//heron/instance/tests/python/utils:global_metrics_unittest              PASSED in 0.9s
//heron/instance/tests/python/utils:log_unittest                         PASSED in 0.8s
//heron/instance/tests/python/utils:metrics_helper_unittest              PASSED in 1.0s
//heron/instance/tests/python/utils:outgoing_tuple_helper_unittest       PASSED in 0.9s
//heron/instance/tests/python/utils:pplan_helper_unittest                PASSED in 0.9s
//heron/instance/tests/python/utils:py_metrics_unittest                  PASSED in 0.9s
//heron/instance/tests/python/utils:topology_context_impl_unittest       PASSED in 0.9s
//heron/instance/tests/python/utils:tuple_helper_unittest                PASSED in 0.8s
//heron/io/dlog/tests/java:DLInputStreamTest                             PASSED in 0.5s
//heron/io/dlog/tests/java:DLOutputStreamTest                            PASSED in 0.5s
//heron/metricscachemgr/tests/java:CacheCoreTest                         PASSED in 0.4s
//heron/metricscachemgr/tests/java:MetricsCacheQueryUtilsTest            PASSED in 0.3s
//heron/metricscachemgr/tests/java:MetricsCacheTest                      PASSED in 0.5s
//heron/metricsmgr/tests/java:FileSinkTest                               PASSED in 0.4s
//heron/metricsmgr/tests/java:HandleTManagerLocationTest                 PASSED in 0.5s
//heron/metricsmgr/tests/java:MetricsCacheSinkTest                       PASSED in 9.5s
//heron/metricsmgr/tests/java:MetricsManagerServerTest                   PASSED in 0.5s
//heron/metricsmgr/tests/java:MetricsUtilTests                           PASSED in 0.3s
//heron/metricsmgr/tests/java:PrometheusSinkTests                        PASSED in 0.4s
//heron/metricsmgr/tests/java:SinkExecutorTest                           PASSED in 0.4s
//heron/metricsmgr/tests/java:TManagerSinkTest                           PASSED in 9.4s
//heron/metricsmgr/tests/java:WebSinkTest                                PASSED in 0.5s
//heron/packing/tests/java:FirstFitDecreasingPackingTest                 PASSED in 0.7s
//heron/packing/tests/java:PackingPlanBuilderTest                        PASSED in 0.4s
//heron/packing/tests/java:PackingUtilsTest                              PASSED in 0.3s
//heron/packing/tests/java:ResourceCompliantRRPackingTest                PASSED in 0.7s
//heron/packing/tests/java:RoundRobinPackingTest                         PASSED in 0.6s
//heron/packing/tests/java:ScorerTest                                    PASSED in 0.3s
//heron/scheduler-core/tests/java:HttpServiceSchedulerClientTest         PASSED in 1.8s
//heron/scheduler-core/tests/java:JsonFormatterUtilsTest                 PASSED in 0.4s
//heron/scheduler-core/tests/java:LaunchRunnerTest                       PASSED in 1.1s
//heron/scheduler-core/tests/java:LauncherUtilsTest                      PASSED in 2.0s
//heron/scheduler-core/tests/java:LibrarySchedulerClientTest             PASSED in 0.4s
//heron/scheduler-core/tests/java:RuntimeManagerMainTest                 PASSED in 3.7s
//heron/scheduler-core/tests/java:RuntimeManagerRunnerTest               PASSED in 2.3s
//heron/scheduler-core/tests/java:SchedulerClientFactoryTest             PASSED in 1.5s
//heron/scheduler-core/tests/java:SchedulerMainTest                      PASSED in 3.0s
//heron/scheduler-core/tests/java:SchedulerServerTest                    PASSED in 0.4s
//heron/scheduler-core/tests/java:SchedulerUtilsTest                     PASSED in 1.2s
//heron/scheduler-core/tests/java:SubmitDryRunRenderTest                 PASSED in 1.5s
//heron/scheduler-core/tests/java:SubmitterMainTest                      PASSED in 1.1s
//heron/scheduler-core/tests/java:UpdateDryRunRenderTest                 PASSED in 1.6s
//heron/scheduler-core/tests/java:UpdateTopologyManagerTest              PASSED in 11.8s
//heron/schedulers/tests/java:AuroraCLIControllerTest                    PASSED in 0.4s
//heron/schedulers/tests/java:AuroraContextTest                          PASSED in 0.3s
//heron/schedulers/tests/java:AuroraLauncherTest                         PASSED in 1.1s
//heron/schedulers/tests/java:AuroraSchedulerTest                        PASSED in 2.6s
//heron/schedulers/tests/java:HeronExecutorTaskTest                      PASSED in 1.3s
//heron/schedulers/tests/java:HeronMasterDriverTest                      PASSED in 1.7s
//heron/schedulers/tests/java:KubernetesContextTest                      PASSED in 0.4s
//heron/schedulers/tests/java:KubernetesControllerTest                   PASSED in 0.3s
//heron/schedulers/tests/java:KubernetesLauncherTest                     PASSED in 0.8s
//heron/schedulers/tests/java:KubernetesSchedulerTest                    PASSED in 1.0s
//heron/schedulers/tests/java:KubernetesUtilsTest                        PASSED in 0.4s
//heron/schedulers/tests/java:LaunchableTaskTest                         PASSED in 0.5s
//heron/schedulers/tests/java:LocalLauncherTest                          PASSED in 1.1s
//heron/schedulers/tests/java:LocalSchedulerTest                         PASSED in 0.5s
//heron/schedulers/tests/java:MarathonControllerTest                     PASSED in 1.1s
//heron/schedulers/tests/java:MarathonLauncherTest                       PASSED in 0.8s
//heron/schedulers/tests/java:MarathonSchedulerTest                      PASSED in 0.4s
//heron/schedulers/tests/java:MesosFrameworkTest                         PASSED in 0.6s
//heron/schedulers/tests/java:MesosLauncherTest                          PASSED in 0.7s
//heron/schedulers/tests/java:MesosSchedulerTest                         PASSED in 0.6s
//heron/schedulers/tests/java:NomadSchedulerTest                         PASSED in 2.6s
//heron/schedulers/tests/java:SlurmControllerTest                        PASSED in 1.1s
//heron/schedulers/tests/java:SlurmLauncherTest                          PASSED in 1.0s
//heron/schedulers/tests/java:SlurmSchedulerTest                         PASSED in 1.0s
//heron/schedulers/tests/java:TaskResourcesTest                          PASSED in 0.3s
//heron/schedulers/tests/java:TaskUtilsTest                              PASSED in 0.3s
//heron/schedulers/tests/java:V1ControllerTest                           PASSED in 1.6s
//heron/schedulers/tests/java:VolumesTests                               PASSED in 0.6s
//heron/schedulers/tests/java:YarnLauncherTest                           PASSED in 0.6s
//heron/schedulers/tests/java:YarnSchedulerTest                          PASSED in 0.4s
//heron/simulator/tests/java:AllGroupingTest                             PASSED in 0.3s
//heron/simulator/tests/java:CustomGroupingTest                          PASSED in 0.3s
//heron/simulator/tests/java:FieldsGroupingTest                          PASSED in 0.8s
//heron/simulator/tests/java:InstanceExecutorTest                        PASSED in 0.5s
//heron/simulator/tests/java:LowestGroupingTest                          PASSED in 0.3s
//heron/simulator/tests/java:RotatingMapTest                             PASSED in 0.3s
//heron/simulator/tests/java:ShuffleGroupingTest                         PASSED in 0.3s
//heron/simulator/tests/java:SimulatorTest                               PASSED in 0.4s
//heron/simulator/tests/java:TopologyManagerTest                         PASSED in 0.4s
//heron/simulator/tests/java:TupleCacheTest                              PASSED in 0.3s
//heron/simulator/tests/java:XORManagerTest                              PASSED in 0.5s
//heron/spi/tests/java:ConfigLoaderTest                                  PASSED in 1.3s
//heron/spi/tests/java:ConfigTest                                        PASSED in 1.0s
//heron/spi/tests/java:ContextTest                                       PASSED in 0.3s
//heron/spi/tests/java:ExceptionInfoTest                                 PASSED in 0.3s
//heron/spi/tests/java:KeysTest                                          PASSED in 0.4s
//heron/spi/tests/java:MetricsInfoTest                                   PASSED in 0.2s
//heron/spi/tests/java:MetricsRecordTest                                 PASSED in 0.2s
//heron/spi/tests/java:NetworkUtilsTest                                  PASSED in 1.5s
//heron/spi/tests/java:PackingPlanTest                                   PASSED in 0.3s
//heron/spi/tests/java:ResourceTest                                      PASSED in 0.3s
//heron/spi/tests/java:ShellUtilsTest                                    PASSED in 2.7s
//heron/spi/tests/java:TokenSubTest                                      PASSED in 0.3s
//heron/spi/tests/java:UploaderUtilsTest                                 PASSED in 0.4s
//heron/statefulstorages/tests/java:DlogStorageTest                      PASSED in 1.7s
//heron/statefulstorages/tests/java:HDFSStorageTest                      PASSED in 1.9s
//heron/statefulstorages/tests/java:LocalFileSystemStorageTest           PASSED in 1.0s
//heron/statemgrs/tests/cpp:zk-statemgr_unittest                         PASSED in 0.0s
//heron/statemgrs/tests/java:CuratorStateManagerTest                     PASSED in 0.5s
//heron/statemgrs/tests/java:LocalFileSystemStateManagerTest             PASSED in 1.2s
//heron/statemgrs/tests/java:ZkUtilsTest                                 PASSED in 1.6s
//heron/statemgrs/tests/python:configloader_unittest                     PASSED in 0.9s
//heron/statemgrs/tests/python:statemanagerfactory_unittest              PASSED in 0.9s
//heron/statemgrs/tests/python:zkstatemanager_unittest                   PASSED in 0.9s
//heron/stmgr/tests/cpp/grouping:all-grouping_unittest                   PASSED in 0.0s
//heron/stmgr/tests/cpp/grouping:custom-grouping_unittest                PASSED in 0.0s
//heron/stmgr/tests/cpp/grouping:fields-grouping_unittest                PASSED in 0.0s
//heron/stmgr/tests/cpp/grouping:lowest-grouping_unittest                PASSED in 0.0s
//heron/stmgr/tests/cpp/grouping:shuffle-grouping_unittest               PASSED in 0.0s
//heron/stmgr/tests/cpp/server:checkpoint-gateway_unittest               PASSED in 1.1s
  WARNING: //heron/stmgr/tests/cpp/server:checkpoint-gateway_unittest: Test execution time (1.1s excluding execution overhead) outside of range for MODERATE tests. Consider setting timeout="short" or size="small".
//heron/stmgr/tests/cpp/server:stateful-restorer_unittest                PASSED in 0.0s
  WARNING: //heron/stmgr/tests/cpp/server:stateful-restorer_unittest: Test execution time (0.0s excluding execution overhead) outside of range for MODERATE tests. Consider setting timeout="short" or size="small".
//heron/stmgr/tests/cpp/server:stmgr_unittest                            PASSED in 39.2s
//heron/stmgr/tests/cpp/util:neighbour_calculator_unittest               PASSED in 0.0s
  WARNING: //heron/stmgr/tests/cpp/util:neighbour_calculator_unittest: Test execution time (0.0s excluding execution overhead) outside of range for MODERATE tests. Consider setting timeout="short" or size="small".
//heron/stmgr/tests/cpp/util:rotating-map_unittest                       PASSED in 0.0s
//heron/stmgr/tests/cpp/util:tuple-cache_unittest                        PASSED in 3.7s
//heron/stmgr/tests/cpp/util:xor-manager_unittest                        PASSED in 4.0s
//heron/tmanager/tests/cpp/server:stateful_checkpointer_unittest         PASSED in 0.0s
//heron/tmanager/tests/cpp/server:stateful_restorer_unittest             PASSED in 5.0s
//heron/tmanager/tests/cpp/server:tcontroller_unittest                   PASSED in 0.0s
//heron/tmanager/tests/cpp/server:tmanager_unittest                      PASSED in 26.1s
//heron/tools/apiserver/tests/java:ConfigUtilsTests                      PASSED in 0.4s
//heron/tools/apiserver/tests/java:TopologyResourceTests                 PASSED in 0.7s
//heron/tools/cli/tests/python:client_command_unittest                   PASSED in 1.0s
//heron/tools/cli/tests/python:opts_unittest                             PASSED in 0.8s
//heron/tools/explorer/tests/python:explorer_unittest                    PASSED in 1.1s
//heron/tools/tracker/tests/python:query_operator_unittest               PASSED in 1.4s
//heron/tools/tracker/tests/python:query_unittest                        PASSED in 1.3s
//heron/tools/tracker/tests/python:topology_unittest                     PASSED in 1.2s
//heron/tools/tracker/tests/python:tracker_unittest                      PASSED in 1.7s
//heron/uploaders/tests/java:DlogUploaderTest                            PASSED in 0.6s
//heron/uploaders/tests/java:GcsUploaderTests                            PASSED in 0.4s
//heron/uploaders/tests/java:HdfsUploaderTest                            PASSED in 0.4s
//heron/uploaders/tests/java:HttpUploaderTest                            PASSED in 0.6s
//heron/uploaders/tests/java:LocalFileSystemConfigTest                   PASSED in 0.3s
//heron/uploaders/tests/java:LocalFileSystemContextTest                  PASSED in 0.3s
//heron/uploaders/tests/java:LocalFileSystemUploaderTest                 PASSED in 0.4s
//heron/uploaders/tests/java:S3UploaderTest                              PASSED in 1.2s
//heron/uploaders/tests/java:ScpUploaderTest                             PASSED in 0.4s

INFO: Build completed successfully, 5581 total actions
Describe Pod acking-0
Name:           acking-0
Namespace:      default
Priority:       0
Node:           <none>
Labels:         app=heron
                controller-revision-hash=acking-7c4b6d7bb8
                statefulset.kubernetes.io/pod-name=acking-0
                topology=acking
Annotations:    prometheus.io/port: 8080
                prometheus.io/scrape: true
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  StatefulSet/acking
Containers:
  executor:
    Image:       apache/heron:testbuild
    Ports:       5555/TCP, 5556/UDP, 6001/TCP, 6002/TCP, 6003/TCP, 6004/TCP, 6005/TCP, 6006/TCP, 6007/TCP, 6008/TCP, 6009/TCP
    Host Ports:  0/TCP, 0/UDP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      sh
      -c
      ./heron-core/bin/heron-downloader-config kubernetes && ./heron-core/bin/heron-downloader distributedlog://zookeeper:2181/heronbkdl/acking-saad-tag-0--5007119198915049925.tar.gz . && SHARD_ID=${POD_NAME##*-} && echo shardId=${SHARD_ID} && ./heron-core/bin/heron-executor --topology-name=acking --topology-id=acking6cbc52b1-58bb-485a-866b-b4c4f5ce2a42 --topology-defn-file=acking.defn --state-manager-connection=zookeeper:2181 --state-manager-root=/heron --state-manager-config-file=./heron-conf/statemgr.yaml --tmanager-binary=./heron-core/bin/heron-tmanager --stmgr-binary=./heron-core/bin/heron-stmgr --metrics-manager-classpath=./heron-core/lib/metricsmgr/* --instance-jvm-opts="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg(61)(61)" --classpath=heron-api-examples.jar --heron-internals-config-file=./heron-conf/heron_internals.yaml --override-config-file=./heron-conf/override.yaml --component-ram-map=exclaim1:1073741824,word:1073741824 --component-jvm-opts="" --pkg-type=jar --topology-binary-file=heron-api-examples.jar --heron-java-home=$JAVA_HOME --heron-shell-binary=./heron-core/bin/heron-shell --cluster=kubernetes --role=saad --environment=default --instance-classpath=./heron-core/lib/instance/* --metrics-sinks-config-file=./heron-conf/metrics_sinks.yaml --scheduler-classpath=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/* --python-instance-binary=./heron-core/bin/heron-python-instance --cpp-instance-binary=./heron-core/bin/heron-cpp-instance --metricscache-manager-classpath=./heron-core/lib/metricscachemgr/* --metricscache-manager-mode=disabled --is-stateful=false --checkpoint-manager-classpath=./heron-core/lib/ckptmgr/*:./heron-core/lib/statefulstorage/*: --stateful-config-file=./heron-conf/stateful.yaml --checkpoint-manager-ram=1073741824 --health-manager-mode=disabled --health-manager-classpath=./heron-core/lib/healthmgr/* --shard=$SHARD_ID --server-port=6001 --tmanager-controller-port=6002 --tmanager-stats-port=6003 
--shell-port=6004 --metrics-manager-port=6005 --scheduler-port=6006 --metricscache-manager-server-port=6007 --metricscache-manager-stats-port=6008 --checkpoint-manager-port=6009
    Limits:
      cpu:     3
      memory:  4Gi
    Requests:
      cpu:     3
      memory:  4Gi
    Environment:
      HOST:        (v1:status.podIP)
      POD_NAME:   acking-0 (v1:metadata.name)
      var_one:    variable one
      var_three:  variable three
      var_two:    variable two
    Mounts:
      /shared_volume from shared-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4cxrh (ro)
  sidecar-container:
    Image:        alpine
    Port:         <none>
    Host Port:    <none>
    Environment:  <none>
    Mounts:
      /shared_volume from shared-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4cxrh (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  shared-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  kube-api-access-4cxrh:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 10s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 10s
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  44s   default-scheduler  0/1 nodes are available: 1 Insufficient cpu.

@surahman
Member Author

What is wrong with TravisCI!? 🤦🏼‍♂️ This is my last try at force-pushing to get the job onto a TravisCI cluster capable of passing the build.

<configureContainerResources> Heron values take precedence for limits.
@surahman
Member Author

I realized whilst creating my slides that it makes more sense to let Heron's values for limits take precedence over those in the Pod Templates. The full battery of tests is passing locally; over to you, TravisCI 🎲 🤞🏼.
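
To illustrate the precedence rule, here is a minimal Java sketch. It is not the code from this PR: the class `LimitsPrecedenceSketch`, the method `mergeLimits`, and the use of plain string maps for resource quantities are assumptions made for illustration. So is the choice to keep Pod Template entries that Heron does not set.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of "Heron values take precedence for limits".
// Names and types here are illustrative assumptions, not Heron's API.
class LimitsPrecedenceSketch {

  // Returns a new map: the Pod Template's limits overlaid with Heron's limits.
  // On a key collision the Heron value wins; template-only keys are kept.
  static Map<String, String> mergeLimits(Map<String, String> templateLimits,
                                         Map<String, String> heronLimits) {
    final Map<String, String> merged = new TreeMap<>(templateLimits);
    merged.putAll(heronLimits);
    return merged;
  }

  public static void main(String[] args) {
    final Map<String, String> templateLimits = new TreeMap<>();
    templateLimits.put("cpu", "1");
    templateLimits.put("ephemeral-storage", "2Gi");

    final Map<String, String> heronLimits = new TreeMap<>();
    heronLimits.put("cpu", "3");
    heronLimits.put("memory", "4Gi");

    // The "cpu" entry comes from Heron; "ephemeral-storage" survives from the template.
    System.out.println(mergeLimits(templateLimits, heronLimits));
  }
}
```

Copying the template's map before overlaying Heron's values leaves both inputs unmodified, which keeps the merge easy to test in isolation.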

@nicknezis
Contributor

I was finally able to do some testing. I think we're still missing the Role update that grants get and list on the configmap resource.

Kubernetes Scheduler Improvements automation moved this from In progress to Reviewer approved Oct 29, 2021
@surahman
Member Author

Thank you @nicknezis for all the time and effort you have put into this PR. I really appreciate it.

@@ -47,4 +46,4 @@ script:
- python -V
- which python3
- python3 -V
- travis-wait-improved --timeout=180m scripts/travis/ci.sh
Member

What was the reason for removing this?

Member Author

Not sure, but @nicknezis fixed an issue with TravisCI 🥊 Python, and that line may have been removed during the debugging process.

Contributor

Yes, that script was causing the issue due to Python dependencies. I tested without the script and it worked. I think the default behavior is to terminate a build if there is no output for more than 10 minutes. I'm not sure if we have this issue in our build. If we do need to put it back, then we will have to resolve the Python issues.

@@ -0,0 +1,62 @@
/**
Member

Good catch on adding headers

@surahman
Member Author

surahman commented Nov 1, 2021

Thank you @joshfischer1108 I really appreciate you looking these changes over!

@joshfischer1108
Member

> Thank you @joshfischer1108 I really appreciate you looking these changes over!

Ok the build is green 🙌 . Let's go!

@joshfischer1108 joshfischer1108 merged commit 837c4f2 into apache:master Nov 2, 2021
Kubernetes Scheduler Improvements automation moved this from Reviewer approved to Done Nov 2, 2021
@surahman
Member Author

surahman commented Nov 2, 2021

Thank you @joshfischer1108 🎆 ! There is another PR incoming sometime today or tomorrow for the CLI PVC support 😄

@joshfischer1108
Member

I will wait for that PR to come in; then I think we are ready to create another Heron release and work towards graduating out of the incubator.

💯
