Skip to content
This repository has been archived by the owner on May 30, 2022. It is now read-only.

Add 2.1.* - 2.2.* tests #202

Merged
merged 15 commits into from Sep 14, 2021
Merged

Conversation

brett060102
Copy link
Contributor

Add the 2.1.* and 2.2.* tests.

Also added support to run all tests.

Copy link
Contributor

@arbulu89 arbulu89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @brett060102
Comments added

examples/hana-scale-up-perf-optimized-azure.yaml Outdated Show resolved Hide resolved
runner/ansible/check.yml Outdated Show resolved Hide resolved
runner/ansible/roles/2.1.1/tasks/main.yml~ Outdated Show resolved Hide resolved
runner/ansible/roles/2.1.3/tasks/main.yml Outdated Show resolved Hide resolved
runner/ansible/roles/2.1.3/tasks/main.yml Outdated Show resolved Hide resolved
"2.1.6.rsc_order_kind": "Optional"
"2.1.6.rsc_order_first": "cln_SAPHanaTopology"
"2.1.6.rsc_order_then": "msl_SAPHana"
"2.1.7.provider": "provider = SAPHanaSR"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change the next for lines to only contain the value, and not the key?
It requires to change the test of course

@@ -0,0 +1,24 @@
---

- name: Set Test ID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we should have this check, that only passes if the SLE version is 15.
@lee-martin should we keep it?
Maybe we could check if the version is greater than 15 at most

Copy link

@lee-martin lee-martin Sep 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arbulu89 This looks a little strange, but this is the first time I look at our Ansible checks, so I might be missing something.
So the original check ID 2.2.2 is in https://github.com/trento-project/trento/blob/main/examples/hana-scale-up-perf-optimized-azure.yaml and that checks if the OS is 15 SP1 or newer (using the value 151). To me this reads like somehow the test ID get mixed up with the target value we are actually checking for.

I also miss the "remediation" section, which most importantly contains the reason/reference why we are even performing the check. Where does one find the remediation text in the ansible runner?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @brett060102 ,
Some new comments added.
I'm not still sure about the 2.2 chapter.
@diegoakechi @lee-martin @stefanotorresi Are we sure we want to include these checks now and here? Keep in mind that we cannot tune the expected values, so "constant" package and os versions will tend to fail in many cases.
Opinions?

@arbulu89 I agree constant values are highly suspicious and should be reviewed. Also, all these checks seem to back the remediation text which references the reason why we are checking something. Where does one find the remediation text in the ansible runner? This would then justify why each check does what it does.
Looking at the where these checks were ported from ( https://github.com/trento-project/trento/blob/main/examples/hana-scale-up-perf-optimized-azure.yaml ) it seems many didn't have a reference and my guess is they originate from a spreadsheet we used early in Trento, but it did not have references. We need to review all checks which do not have references anyway, and this process should then also uncover unrealistic checks, e.g. overly specific version checks of packages etc. Can we open issues on a check by check basis to do this? I would not make this a stopper for implementing ainsible runner right now, we can fix these checks after the transition step by step.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The remediation texts come in an additional file. We will add them in other PR.
We just want to have here the check implementation.
The remediation will be copy/paste from the current bench common checks anyway

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My fault on this test. It actually checks if the version is greater than 151.
I guess we can move on with this by now

@@ -0,0 +1,24 @@
---

- name: Set Test ID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we should be able to do that/

runner/ansible/roles/2.2.8/tasks/main.yml Show resolved Hide resolved
runner/ansible/roles/2.2.8/tasks/main.yml Outdated Show resolved Hide resolved
Copy link
Contributor Author

@brett060102 brett060102 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed what I could.

brett060102 and others added 6 commits September 7, 2021 11:19
add "facts_result is unreachable" to when for
Post initial empty results to ARA if all hosts are unreachable
since unreachable_count is only defined if "facts_result is unreachable"
is true.
The cibadmin test line were very long so, replace with
shell var if not:
1: single use
2: varible would not reduce line lenth significantly
Copy link
Contributor

@arbulu89 arbulu89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @brett060102 ,
Some new comments added.
I'm not still sure about the 2.2 chapter.
@diegoakechi @lee-martin @stefanotorresi Are we sure we want to include these checks now and here? Keep in mind that we cannot tune the expected values, so "constant" package and os versions will tend to fail in many cases.
Opinions?

@@ -33,7 +33,9 @@
import_role:
name: post-results
tasks_from: post
when: unreachable_count|int == hostvars|length
when:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fixed on: d0b8a46

runner/ansible/roles/2.1.6/tasks/main.yml Show resolved Hide resolved
runner/ansible/roles/2.1.7/tasks/main.yml Outdated Show resolved Hide resolved
runner/ansible/roles/2.2.4/tasks/main.yml Outdated Show resolved Hide resolved
Copy link
Contributor

@arbulu89 arbulu89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @brett060102 ,
Here many other comments.
As far as I have seen these are the checks that we need to fix:
2.1.2
2.1.3
2.1.5
2.1.7
2.2.3

- name: "{{ test_id }}.check"
shell: |
# SAPHanaTopology is not configured, then pass
cibadmin -Q --xpath "//primitive[@type='SAPHanaTopology']/@type" || exit 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a exit 1. If the SAPHanaTopology is not present we have to report it as error.
(Update the comment above too)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

- name: "{{ test_id }}.check"
shell: |
# SAP HANA Resource Agent is not configured, then pass
cibadmin -Q --xpath "//primitive[@type='SAPHana']/@type" || exit 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should report a exit 1. If this primitive doesn't exist we have a missconfigured cluster.
(Update the comment above too)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

[[ $(cibadmin -Q --xpath "${XPATH} [@name='stop']" | grep -oP 'timeout="\K[^"]+') != "{{ expected[test_id + '.stop_timeout'] }}" ]] && exit 1
[[ $(cibadmin -Q --xpath "${XPATH} [@name='promote']" | grep -oP 'interval="\K[^"]+') != "{{ expected[test_id + '.promote_interval'] }}" ]] && exit 1
[[ $(cibadmin -Q --xpath "${XPATH} [@name='promote']" | grep -oP 'timeout="\K[^"]+') != "{{ expected[test_id + '.promote_timeout'] }}" ]] && exit 1
XPATH="//primitive[@type='SAPHana']/instance_attributes/nvpair"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a new line here, to add visibility

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

# if azure-lb RA is not configured, then fail
cibadmin -Q --xpath "//primitive[@type='azure-lb']/@type" || exit 1
# if not grouped, exit with fail
[[ $(cibadmin -Q --xpath "//group/primitive[@type='azure-lb']/@type") != "0" ]] && exit 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to change this call. I would do the same that you did in the 2.1.4 check.

cibadmin -Q --xpath "//group/primitive[@type='azure-lb']/@type" || exit 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

set_fact:
test_id: '2.1.7'

- name: "{{ test_id }}.check"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this test quite wrong.
Here what the check should do:

  1. FInd the sid
  2. With the sid we can get the /usr/sap/${sid}/SYS/global/hdb/custom/config/global.ini file
  3. This file is a INI file format, which has a [ha_dr_provider_SAPHanaSR] header
  4. Within this header, we need to check the content of provider, path and execution order

So, could we split the check in 2 tasks, the first to get the file path and register the value.
Once we have the value, we could use the ini files task type to check the content.
What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fixed now. Was able to test with the sample file you sent me.
Still follows was was done in example code.

shell: |
# Check the pacemaker version IS
VERSION_ID=$(rpm -qv pacemaker | sed -e "s/pacemaker-\(.*\)+.*/\1/")
[[ $VERSION_ID -ge "{{ expected[test_id] }}" ]] && exit 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comparison doesn't work

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, changed to use:
changed_when: config_updated.stdout is version(expected[test_id], '<')


- name: "{{ test_id }}.check"
shell: |
# Check the pacemaker version IS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in the comment pacemaker -> corosync

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

brett060102 and others added 2 commits September 10, 2021 11:39
2.1.2:
   SAPHanaTopology is not configured, then fail
2.1.3:
   SAP HANA Resource Agent is not configured, then fail
2.1.5:
   fix cibadmin check
2.1.7:
   fix global.ini check
   move keys from var file to main.yml
2.2.3:
   fix version check
   add error if package not installed
2.2.3.exclude:
   add error if package not installed
2.2.4:
   add error if package not installed

2.2.5:
   add error if package not installed

2.2.5.exclude:
   add error if package not installed

2.2.6:
   add error if package not installed
   fix version check to use full version ID

vars/azure_env.yml:
   removed keys fron 2.1.7 vars
   change 2.2.6 to be a full version spec
vars/dev_env.yml:
   removed keys fron 2.1.7 vars
   change 2.2.6 to be a full version spec
@brett060102
Copy link
Contributor Author

@arbulu89 fingers crossed, but I think this is good now. We might want to move to use the ansible version compare everywhere we do version checking, but I think we should hold on that for later.

Copy link
Contributor

@arbulu89 arbulu89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @brett060102 ,
The changes look good in general.
As a last request, could we use the is version logic in all of the checks where we do version comparison. It looks the neater option, and having the same way in all the "same kind of check" items looks a better option

exit 1
check_mode: no
register: config_updated
changed_when: config_updated.rc != 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't we use the is version logic here?
By the way, I don't get why you have started using the exit 2 here.
exit 1 doesn't do the job?

shell: |
# Check the pacemaker version IS
# If not installed, exit with error
rpm -q --qf "%{VERSION}\n" pacemaker || exit 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need the exit 2 here.
If the package doesn't exist, the rpm command will return a 1, so I think it is neater to have the failed_when: config_udpated.rc > 0.

# Check the corosync version IS
# If not installed, exit with error
PACKAGE=$(rpm -qv corosync) || exit 2
[[ "$PACKAGE" == "{{ expected[test_id] }}" ]] && exit 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same here, why don't we use the is version logic?

Copy link
Contributor

@arbulu89 arbulu89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @brett060102 ,
We are almost! There, only some super minor changes required

runner/ansible/roles/2.2.4/tasks/main.yml Show resolved Hide resolved
runner/ansible/roles/2.2.5/tasks/main.yml Show resolved Hide resolved
changed_when: config_updated.rc != 0
failed_when: config_updated.rc > 1
changed_when: >
config_updated.stdout is version(expected[test_id + '.low'], '<') or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's test only the .low value here. I think it is good enough. The higher ending checking will create issues

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

runner/ansible/roles/2.2.8/tasks/main.yml Show resolved Hide resolved
Copy link
Contributor

@arbulu89 arbulu89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other issue. The 2.1.6 check doesn't work properly

@@ -0,0 +1,35 @@
---
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @brett060102 ,
I have tested this 2.1.6, and I would say it is broken, as the queried lines return multi line outputs, so the comparisons always fail
Here my outputs using the command that we have right now:

vmhana01:/home/cloudadmin # cibadmin -Q --xpath "//constraints/rsc_colocation" | grep -oP 'score="\K[^"]+'
4000
-INFINITY
vmhana01:/home/cloudadmin # cibadmin -Q --xpath "//constraints/rsc_colocation" | grep -oP 'rsc="\K[^"]+' | grep -o g_ip
g_ip
vmhana01:/home/cloudadmin # cibadmin -Q --xpath "//constraints/rsc_colocation" | grep -oP 'rsc-role="\K[^"]+'
Started
Master
Started
Slave
vmhana01:/home/cloudadmin # cibadmin -Q --xpath "//constraints/rsc_colocation" | grep -oP 'with-rsc="\K[^"]+' | grep -o msl_SAPHana
msl_SAPHana
msl_SAPHana
vmhana01:/home/cloudadmin # cibadmin -Q --xpath "//constraints/rsc_colocation" | grep -oP 'with-rsc-role="\K[^"]+'
Master
Slave
vmhana01:/home/cloudadmin # cibadmin -Q --xpath "//constraints/rsc_order" | grep -oP 'kind="\K[^"]+'
Optional
vmhana01:/home/cloudadmin # cibadmin -Q --xpath "//constraints/rsc_order" | grep -oP 'first="\K[^"]+' | grep -o cln_SAPHanaTopology
cln_SAPHanaTopology
vmhana01:/home/cloudadmin # cibadmin -Q --xpath "//constraints/rsc_order" | grep -oP 'then="\K[^"]+' | grep -o msl_SAPHana
msl_SAPHana
vmhana01:/home/cloudadmin # 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I see that you added a space to the initial grep search string. Not sure how this worked in the example code though.

Copy link
Contributor

@arbulu89 arbulu89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error in test 2.1.7

echo ${ha_dr_provider_SAPHanaSR} | grep "execution_order = {{ expected[test_id + '.execution_order'] }}" || exit 1
echo ${ha_dr_provider_SAPHanaSR} | grep "ha_dr_saphanasr = {{ expected[test_id + '.ha_dr_saphanasr'] }}" || exit 1
# if hook not found, exit with fail
share_path=$(cat /hana/shared/PRD/global/hdb/custom/config/global.ini | grep "path =" | sed "s/.*= //")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other failure, this PRD here must be replaced by ${sid}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes and thank you.

@arbulu89
Copy link
Contributor

Hey @brett060102,
I took the liberty to fix the mentioned issues. Find the commit here: 4bd8b83

@arbulu89 arbulu89 merged commit f276785 into trento-project:runner Sep 14, 2021
@brett060102
Copy link
Contributor Author

Look over you changes and all look fine to me. Glad this is in. Thank you.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants