Add 2.1.* - 2.2.* tests #202

brett060102 · 2021-08-23T22:21:22Z

Add the 2.1.* and 2.2.* tests.

Also added support to run all tests.

arbulu89

Hi @brett060102
Comments added

examples/hana-scale-up-perf-optimized-azure.yaml

runner/ansible/check.yml

runner/ansible/roles/2.1.1/tasks/main.yml~

runner/ansible/roles/2.1.3/tasks/main.yml

arbulu89 · 2021-08-30T13:15:49Z

runner/ansible/vars/azure_env.yml

+  "2.1.6.rsc_order_kind": "Optional"
+  "2.1.6.rsc_order_first": "cln_SAPHanaTopology"
+  "2.1.6.rsc_order_then": "msl_SAPHana"
+  "2.1.7.provider": "provider = SAPHanaSR"


Can we change the next for lines to only contain the value, and not the key?
It requires to change the test of course

arbulu89 · 2021-08-30T13:17:17Z

runner/ansible/roles/2.2.2/tasks/main.yml

@@ -0,0 +1,24 @@
+---
+
+- name: Set Test ID


I don't know if we should have this check, that only passes if the SLE version is 15.
@lee-martin should we keep it?
Maybe we could check if the version is greater than 15 at most

@arbulu89 This looks a little strange, but this is the first time I look at our Ansible checks, so I might be missing something.
So the original check ID 2.2.2 is in https://github.com/trento-project/trento/blob/main/examples/hana-scale-up-perf-optimized-azure.yaml and that checks if the OS is 15 SP1 or newer (using the value 151). To me this reads like somehow the test ID get mixed up with the target value we are actually checking for.

I also miss the "remediation" section, which most importantly contains the reason/reference why we are even performing the check. Where does one find the remediation text in the ansible runner?

Hey @brett060102 ,
Some new comments added.
I'm not still sure about the 2.2 chapter.
@diegoakechi @lee-martin @stefanotorresi Are we sure we want to include these checks now and here? Keep in mind that we cannot tune the expected values, so "constant" package and os versions will tend to fail in many cases.
Opinions?

@arbulu89 I agree constant values are highly suspicious and should be reviewed. Also, all these checks seem to back the remediation text which references the reason why we are checking something. Where does one find the remediation text in the ansible runner? This would then justify why each check does what it does.
Looking at the where these checks were ported from ( https://github.com/trento-project/trento/blob/main/examples/hana-scale-up-perf-optimized-azure.yaml ) it seems many didn't have a reference and my guess is they originate from a spreadsheet we used early in Trento, but it did not have references. We need to review all checks which do not have references anyway, and this process should then also uncover unrealistic checks, e.g. overly specific version checks of packages etc. Can we open issues on a check by check basis to do this? I would not make this a stopper for implementing ainsible runner right now, we can fix these checks after the transition step by step.

The remediation texts come in an additional file. We will add them in other PR.
We just want to have here the check implementation.
The remediation will be copy/paste from the current bench common checks anyway

My fault on this test. It actually checks if the version is greater than 151.
I guess we can move on with this by now

arbulu89 · 2021-08-30T13:20:07Z

runner/ansible/roles/2.2.3.exclude/tasks/main.yml

@@ -0,0 +1,24 @@
+---
+
+- name: Set Test ID


Can we run this checks with ansible native methods? https://docs.ansible.com/ansible/latest/collections/ansible/builtin/package_module.html

yes, we should be able to do that/

runner/ansible/roles/2.2.8/tasks/main.yml

Restore to original state

Should not have been added

brett060102

I have changed what I could.

add "facts_result is unreachable" to when for Post initial empty results to ARA if all hosts are unreachable since unreachable_count is only defined if "facts_result is unreachable" is true.

The cibadmin test line were very long so, replace with shell var if not: 1: single use 2: varible would not reduce line lenth significantly

arbulu89

Hey @brett060102 ,
Some new comments added.
I'm not still sure about the 2.2 chapter.
@diegoakechi @lee-martin @stefanotorresi Are we sure we want to include these checks now and here? Keep in mind that we cannot tune the expected values, so "constant" package and os versions will tend to fail in many cases.
Opinions?

arbulu89 · 2021-09-08T14:38:08Z

runner/ansible/check.yml

@@ -33,7 +33,9 @@
          import_role:
            name: post-results
            tasks_from: post
-          when: unreachable_count|int == hostvars|length
+          when:


I think this is fixed on: d0b8a46

runner/ansible/roles/2.1.6/tasks/main.yml

runner/ansible/roles/2.1.7/tasks/main.yml

runner/ansible/roles/2.2.4/tasks/main.yml

arbulu89

Hey @brett060102 ,
Here many other comments.
As far as I have seen these are the checks that we need to fix:
2.1.2
2.1.3
2.1.5
2.1.7
2.2.3

arbulu89 · 2021-09-09T12:57:28Z

runner/ansible/roles/2.1.2/tasks/main.yml

+- name: "{{ test_id }}.check"
+  shell: |
+    # SAPHanaTopology is not configured, then pass
+    cibadmin -Q --xpath "//primitive[@type='SAPHanaTopology']/@type" || exit 0


This should be a exit 1. If the SAPHanaTopology is not present we have to report it as error.
(Update the comment above too)

arbulu89 · 2021-09-09T13:00:27Z

runner/ansible/roles/2.1.3/tasks/main.yml

+- name: "{{ test_id }}.check"
+  shell: |
+    # SAP HANA Resource Agent is not configured, then pass
+    cibadmin -Q --xpath "//primitive[@type='SAPHana']/@type" || exit 0


This should report a exit 1. If this primitive doesn't exist we have a missconfigured cluster.
(Update the comment above too)

arbulu89 · 2021-09-09T13:01:30Z

runner/ansible/roles/2.1.3/tasks/main.yml

+    [[ $(cibadmin -Q --xpath "${XPATH} [@name='stop']" | grep -oP 'timeout="\K[^"]+') != "{{ expected[test_id + '.stop_timeout'] }}" ]] && exit 1
+    [[ $(cibadmin -Q --xpath "${XPATH} [@name='promote']" | grep -oP 'interval="\K[^"]+') != "{{ expected[test_id + '.promote_interval'] }}" ]] && exit 1
+    [[ $(cibadmin -Q --xpath "${XPATH} [@name='promote']" | grep -oP 'timeout="\K[^"]+') != "{{ expected[test_id + '.promote_timeout'] }}" ]] && exit 1
+    XPATH="//primitive[@type='SAPHana']/instance_attributes/nvpair"


I would add a new line here, to add visibility

arbulu89 · 2021-09-09T13:07:15Z

runner/ansible/roles/2.1.5/tasks/main.yml

+    #  if azure-lb RA is not configured, then fail
+    cibadmin -Q --xpath "//primitive[@type='azure-lb']/@type" || exit 1
+    # if not grouped, exit with fail
+    [[ $(cibadmin -Q --xpath "//group/primitive[@type='azure-lb']/@type") != "0" ]] && exit 1


We need to change this call. I would do the same that you did in the 2.1.4 check.

cibadmin -Q --xpath "//group/primitive[@type='azure-lb']/@type" || exit 1

arbulu89 · 2021-09-09T13:15:51Z

runner/ansible/roles/2.1.7/tasks/main.yml

+  set_fact:
+    test_id: '2.1.7'
+
+- name: "{{ test_id }}.check"


I find this test quite wrong.
Here what the check should do:

FInd the sid

With the sid we can get the /usr/sap/${sid}/SYS/global/hdb/custom/config/global.ini file

This file is a INI file format, which has a [ha_dr_provider_SAPHanaSR] header

Within this header, we need to check the content of provider, path and execution order

So, could we split the check in 2 tasks, the first to get the file path and register the value.
Once we have the value, we could use the ini files task type to check the content.
What do you think?

This should be fixed now. Was able to test with the sample file you sent me.
Still follows was was done in example code.

arbulu89 · 2021-09-09T13:21:48Z

runner/ansible/roles/2.2.3/tasks/main.yml

+  shell: |
+    # Check the pacemaker version IS
+    VERSION_ID=$(rpm -qv pacemaker | sed -e "s/pacemaker-\(.*\)+.*/\1/")
+    [[ $VERSION_ID -ge "{{ expected[test_id] }}" ]] && exit 0


This comparison doesn't work

Fixed, changed to use:
changed_when: config_updated.stdout is version(expected[test_id], '<')

arbulu89 · 2021-09-09T13:22:58Z

runner/ansible/roles/2.2.4/tasks/main.yml

+
+- name: "{{ test_id }}.check"
+  shell: |
+    # Check the pacemaker version IS


Typo in the comment pacemaker -> corosync

2.1.2: SAPHanaTopology is not configured, then fail 2.1.3: SAP HANA Resource Agent is not configured, then fail 2.1.5: fix cibadmin check 2.1.7: fix global.ini check move keys from var file to main.yml 2.2.3: fix version check add error if package not installed 2.2.3.exclude: add error if package not installed 2.2.4: add error if package not installed 2.2.5: add error if package not installed 2.2.5.exclude: add error if package not installed 2.2.6: add error if package not installed fix version check to use full version ID vars/azure_env.yml: removed keys fron 2.1.7 vars change 2.2.6 to be a full version spec vars/dev_env.yml: removed keys fron 2.1.7 vars change 2.2.6 to be a full version spec

brett060102 · 2021-09-10T17:06:49Z

@arbulu89 fingers crossed, but I think this is good now. We might want to move to use the ansible version compare everywhere we do version checking, but I think we should hold on that for later.

arbulu89

Hi @brett060102 ,
The changes look good in general.
As a last request, could we use the is version logic in all of the checks where we do version comparison. It looks the neater option, and having the same way in all the "same kind of check" items looks a better option

arbulu89 · 2021-09-13T06:49:46Z

runner/ansible/roles/2.2.3.exclude/tasks/main.yml

+    exit 1
+  check_mode: no
+  register: config_updated
+  changed_when: config_updated.rc != 0


We don't we use the is version logic here?
By the way, I don't get why you have started using the exit 2 here.
exit 1 doesn't do the job?

arbulu89 · 2021-09-13T06:54:03Z

runner/ansible/roles/2.2.3/tasks/main.yml

+  shell: |
+    # Check the pacemaker version IS
+    # If not installed, exit with error
+    rpm -q --qf "%{VERSION}\n" pacemaker || exit 2


I don't think we need the exit 2 here.
If the package doesn't exist, the rpm command will return a 1, so I think it is neater to have the failed_when: config_udpated.rc > 0.

arbulu89 · 2021-09-13T06:54:41Z

runner/ansible/roles/2.2.4/tasks/main.yml

+    # Check the corosync version IS
+    # If not installed, exit with error
+    PACKAGE=$(rpm -qv corosync) || exit 2
+    [[ "$PACKAGE" == "{{ expected[test_id] }}" ]] && exit 0


The same here, why don't we use the is version logic?

arbulu89

Hi @brett060102 ,
We are almost! There, only some super minor changes required

runner/ansible/roles/2.2.4/tasks/main.yml

runner/ansible/roles/2.2.5/tasks/main.yml

arbulu89 · 2021-09-14T06:35:51Z

runner/ansible/roles/2.2.7/tasks/main.yml

-  changed_when: config_updated.rc != 0
-  failed_when: config_updated.rc > 1
+  changed_when: >
+    config_updated.stdout is version(expected[test_id + '.low'], '<') or


Let's test only the .low value here. I think it is good enough. The higher ending checking will create issues

runner/ansible/roles/2.2.8/tasks/main.yml

arbulu89

Other issue. The 2.1.6 check doesn't work properly

arbulu89 · 2021-09-14T08:05:02Z

runner/ansible/roles/2.1.6/tasks/main.yml

@@ -0,0 +1,35 @@
+---


Hey @brett060102 ,
I have tested this 2.1.6, and I would say it is broken, as the queried lines return multi line outputs, so the comparisons always fail
Here my outputs using the command that we have right now:

vmhana01:/home/cloudadmin # cibadmin -Q --xpath "//constraints/rsc_colocation" | grep -oP 'score="\K[^"]+' 4000 -INFINITY vmhana01:/home/cloudadmin # cibadmin -Q --xpath "//constraints/rsc_colocation" | grep -oP 'rsc="\K[^"]+' | grep -o g_ip g_ip vmhana01:/home/cloudadmin # cibadmin -Q --xpath "//constraints/rsc_colocation" | grep -oP 'rsc-role="\K[^"]+' Started Master Started Slave vmhana01:/home/cloudadmin # cibadmin -Q --xpath "//constraints/rsc_colocation" | grep -oP 'with-rsc="\K[^"]+' | grep -o msl_SAPHana msl_SAPHana msl_SAPHana vmhana01:/home/cloudadmin # cibadmin -Q --xpath "//constraints/rsc_colocation" | grep -oP 'with-rsc-role="\K[^"]+' Master Slave vmhana01:/home/cloudadmin # cibadmin -Q --xpath "//constraints/rsc_order" | grep -oP 'kind="\K[^"]+' Optional vmhana01:/home/cloudadmin # cibadmin -Q --xpath "//constraints/rsc_order" | grep -oP 'first="\K[^"]+' | grep -o cln_SAPHanaTopology cln_SAPHanaTopology vmhana01:/home/cloudadmin # cibadmin -Q --xpath "//constraints/rsc_order" | grep -oP 'then="\K[^"]+' | grep -o msl_SAPHana msl_SAPHana vmhana01:/home/cloudadmin #

Thanks, I see that you added a space to the initial grep search string. Not sure how this worked in the example code though.

arbulu89

Error in test 2.1.7

arbulu89 · 2021-09-14T08:09:25Z

runner/ansible/roles/2.1.7/tasks/main.yml

+    echo ${ha_dr_provider_SAPHanaSR} | grep "execution_order = {{ expected[test_id + '.execution_order'] }}" || exit 1
+    echo ${ha_dr_provider_SAPHanaSR} | grep "ha_dr_saphanasr = {{ expected[test_id + '.ha_dr_saphanasr'] }}" || exit 1
+    # if hook not found, exit with fail
+    share_path=$(cat /hana/shared/PRD/global/hdb/custom/config/global.ini | grep "path =" | sed "s/.*= //")


Other failure, this PRD here must be replaced by ${sid}

yes and thank you.

arbulu89 · 2021-09-14T09:34:35Z

Hey @brett060102,
I took the liberty to fix the mentioned issues. Find the commit here: 4bd8b83

brett060102 · 2021-09-14T13:46:34Z

Look over you changes and all look fine to me. Glad this is in. Thank you.

Add 2.1.4 - 2.2.* tests

31fa7ac

arbulu89 suggested changes Aug 30, 2021

View reviewed changes

brett060102 added 4 commits August 30, 2021 17:09

Update hana-scale-up-perf-optimized-azure.yaml

720682c

Restore to original state

remove cluster_selected_checks == all support

b8b4df5

Delete main.yml~

cddcb8a

Should not have been added

Fixed cut and paste error.

105ca45

brett060102 commented Aug 30, 2021

View reviewed changes

brett060102 and others added 6 commits September 7, 2021 11:19

Merge branch 'runner' into ansible-2.1

bd7a70f

Fix in cluster_selected_checks

39c2bd2

Fix typo

a5e4501

Fix dangling space and update when for Post initial empty results

7d503cd

add "facts_result is unreachable" to when for Post initial empty results to ARA if all hosts are unreachable since unreachable_count is only defined if "facts_result is unreachable" is true.

Use shell var for xpath in cibadmin commands

8ff23a0

The cibadmin test line were very long so, replace with shell var if not: 1: single use 2: varible would not reduce line lenth significantly

fix typo

fef246f

arbulu89 suggested changes Sep 8, 2021

View reviewed changes

arbulu89 suggested changes Sep 9, 2021

View reviewed changes

brett060102 and others added 2 commits September 10, 2021 11:39

Merge branch 'runner' into ansible-2.1

549833f

arbulu89 suggested changes Sep 13, 2021

View reviewed changes

Update version tests to use ansible version builtin

5180bfd

arbulu89 suggested changes Sep 14, 2021

View reviewed changes

arbulu89 reviewed Sep 14, 2021

View reviewed changes

Fix 2.1.6, 2.1.7 and 2.2.7 checks

4bd8b83

arbulu89 merged commit f276785 into trento-project:runner Sep 14, 2021

Add 2.1.* - 2.2.* tests #202

Add 2.1.* - 2.2.* tests #202

Conversation

brett060102 commented Aug 23, 2021

arbulu89 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lee-martin Sep 8, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

brett060102 left a comment

Choose a reason for hiding this comment

arbulu89 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

arbulu89 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

brett060102 commented Sep 10, 2021

arbulu89 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

arbulu89 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

arbulu89 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

arbulu89 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

arbulu89 commented Sep 14, 2021

brett060102 commented Sep 14, 2021

lee-martin Sep 8, 2021 •

edited