Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @brett060102
Comments added
runner/ansible/vars/azure_env.yml
Outdated
"2.1.6.rsc_order_kind": "Optional" | ||
"2.1.6.rsc_order_first": "cln_SAPHanaTopology" | ||
"2.1.6.rsc_order_then": "msl_SAPHana" | ||
"2.1.7.provider": "provider = SAPHanaSR" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change the next for lines to only contain the value, and not the key?
It requires to change the test of course
@@ -0,0 +1,24 @@ | |||
--- | |||
|
|||
- name: Set Test ID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if we should have this check, that only passes if the SLE version is 15.
@lee-martin should we keep it?
Maybe we could check if the version is greater than 15 at most
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arbulu89 This looks a little strange, but this is the first time I look at our Ansible checks, so I might be missing something.
So the original check ID 2.2.2 is in https://github.com/trento-project/trento/blob/main/examples/hana-scale-up-perf-optimized-azure.yaml and that checks if the OS is 15 SP1 or newer (using the value 151). To me this reads like somehow the test ID get mixed up with the target value we are actually checking for.
I also miss the "remediation" section, which most importantly contains the reason/reference why we are even performing the check. Where does one find the remediation text in the ansible runner?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @brett060102 ,
Some new comments added.
I'm not still sure about the 2.2 chapter.
@diegoakechi @lee-martin @stefanotorresi Are we sure we want to include these checks now and here? Keep in mind that we cannot tune the expected values, so "constant" package and os versions will tend to fail in many cases.
Opinions?
@arbulu89 I agree constant values are highly suspicious and should be reviewed. Also, all these checks seem to back the remediation text which references the reason why we are checking something. Where does one find the remediation text in the ansible runner? This would then justify why each check does what it does.
Looking at the where these checks were ported from ( https://github.com/trento-project/trento/blob/main/examples/hana-scale-up-perf-optimized-azure.yaml ) it seems many didn't have a reference and my guess is they originate from a spreadsheet we used early in Trento, but it did not have references. We need to review all checks which do not have references anyway, and this process should then also uncover unrealistic checks, e.g. overly specific version checks of packages etc. Can we open issues on a check by check basis to do this? I would not make this a stopper for implementing ainsible runner right now, we can fix these checks after the transition step by step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The remediation texts come in an additional file. We will add them in other PR.
We just want to have here the check implementation.
The remediation will be copy/paste from the current bench common checks anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My fault on this test. It actually checks if the version is greater than 151.
I guess we can move on with this by now
@@ -0,0 +1,24 @@ | |||
--- | |||
|
|||
- name: Set Test ID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we run this checks with ansible native methods? https://docs.ansible.com/ansible/latest/collections/ansible/builtin/package_module.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we should be able to do that/
Restore to original state
Should not have been added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed what I could.
add "facts_result is unreachable" to when for Post initial empty results to ARA if all hosts are unreachable since unreachable_count is only defined if "facts_result is unreachable" is true.
The cibadmin test line were very long so, replace with shell var if not: 1: single use 2: varible would not reduce line lenth significantly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @brett060102 ,
Some new comments added.
I'm not still sure about the 2.2 chapter.
@diegoakechi @lee-martin @stefanotorresi Are we sure we want to include these checks now and here? Keep in mind that we cannot tune the expected values, so "constant" package and os versions will tend to fail in many cases.
Opinions?
runner/ansible/check.yml
Outdated
@@ -33,7 +33,9 @@ | |||
import_role: | |||
name: post-results | |||
tasks_from: post | |||
when: unreachable_count|int == hostvars|length | |||
when: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fixed on: d0b8a46
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @brett060102 ,
Here many other comments.
As far as I have seen these are the checks that we need to fix:
2.1.2
2.1.3
2.1.5
2.1.7
2.2.3
- name: "{{ test_id }}.check" | ||
shell: | | ||
# SAPHanaTopology is not configured, then pass | ||
cibadmin -Q --xpath "//primitive[@type='SAPHanaTopology']/@type" || exit 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a exit 1
. If the SAPHanaTopology
is not present we have to report it as error.
(Update the comment above too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
- name: "{{ test_id }}.check" | ||
shell: | | ||
# SAP HANA Resource Agent is not configured, then pass | ||
cibadmin -Q --xpath "//primitive[@type='SAPHana']/@type" || exit 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should report a exit 1
. If this primitive doesn't exist we have a missconfigured cluster.
(Update the comment above too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
[[ $(cibadmin -Q --xpath "${XPATH} [@name='stop']" | grep -oP 'timeout="\K[^"]+') != "{{ expected[test_id + '.stop_timeout'] }}" ]] && exit 1 | ||
[[ $(cibadmin -Q --xpath "${XPATH} [@name='promote']" | grep -oP 'interval="\K[^"]+') != "{{ expected[test_id + '.promote_interval'] }}" ]] && exit 1 | ||
[[ $(cibadmin -Q --xpath "${XPATH} [@name='promote']" | grep -oP 'timeout="\K[^"]+') != "{{ expected[test_id + '.promote_timeout'] }}" ]] && exit 1 | ||
XPATH="//primitive[@type='SAPHana']/instance_attributes/nvpair" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add a new line here, to add visibility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
# if azure-lb RA is not configured, then fail | ||
cibadmin -Q --xpath "//primitive[@type='azure-lb']/@type" || exit 1 | ||
# if not grouped, exit with fail | ||
[[ $(cibadmin -Q --xpath "//group/primitive[@type='azure-lb']/@type") != "0" ]] && exit 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to change this call. I would do the same that you did in the 2.1.4 check.
cibadmin -Q --xpath "//group/primitive[@type='azure-lb']/@type" || exit 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
set_fact: | ||
test_id: '2.1.7' | ||
|
||
- name: "{{ test_id }}.check" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this test quite wrong.
Here what the check should do:
- FInd the sid
- With the sid we can get the /usr/sap/${sid}/SYS/global/hdb/custom/config/global.ini file
- This file is a INI file format, which has a [ha_dr_provider_SAPHanaSR] header
- Within this header, we need to check the content of provider, path and execution order
So, could we split the check in 2 tasks, the first to get the file path and register the value.
Once we have the value, we could use the ini files task type to check the content.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be fixed now. Was able to test with the sample file you sent me.
Still follows was was done in example code.
shell: | | ||
# Check the pacemaker version IS | ||
VERSION_ID=$(rpm -qv pacemaker | sed -e "s/pacemaker-\(.*\)+.*/\1/") | ||
[[ $VERSION_ID -ge "{{ expected[test_id] }}" ]] && exit 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comparison doesn't work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, changed to use:
changed_when: config_updated.stdout is version(expected[test_id], '<')
|
||
- name: "{{ test_id }}.check" | ||
shell: | | ||
# Check the pacemaker version IS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in the comment pacemaker -> corosync
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
2.1.2: SAPHanaTopology is not configured, then fail 2.1.3: SAP HANA Resource Agent is not configured, then fail 2.1.5: fix cibadmin check 2.1.7: fix global.ini check move keys from var file to main.yml 2.2.3: fix version check add error if package not installed 2.2.3.exclude: add error if package not installed 2.2.4: add error if package not installed 2.2.5: add error if package not installed 2.2.5.exclude: add error if package not installed 2.2.6: add error if package not installed fix version check to use full version ID vars/azure_env.yml: removed keys fron 2.1.7 vars change 2.2.6 to be a full version spec vars/dev_env.yml: removed keys fron 2.1.7 vars change 2.2.6 to be a full version spec
@arbulu89 fingers crossed, but I think this is good now. We might want to move to use the ansible version compare everywhere we do version checking, but I think we should hold on that for later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @brett060102 ,
The changes look good in general.
As a last request, could we use the is version
logic in all of the checks where we do version comparison. It looks the neater option, and having the same way in all the "same kind of check" items looks a better option
exit 1 | ||
check_mode: no | ||
register: config_updated | ||
changed_when: config_updated.rc != 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't we use the is version
logic here?
By the way, I don't get why you have started using the exit 2
here.
exit 1
doesn't do the job?
shell: | | ||
# Check the pacemaker version IS | ||
# If not installed, exit with error | ||
rpm -q --qf "%{VERSION}\n" pacemaker || exit 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need the exit 2
here.
If the package doesn't exist, the rpm
command will return a 1
, so I think it is neater to have the failed_when: config_udpated.rc > 0
.
# Check the corosync version IS | ||
# If not installed, exit with error | ||
PACKAGE=$(rpm -qv corosync) || exit 2 | ||
[[ "$PACKAGE" == "{{ expected[test_id] }}" ]] && exit 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same here, why don't we use the is version
logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @brett060102 ,
We are almost! There, only some super minor changes required
changed_when: config_updated.rc != 0 | ||
failed_when: config_updated.rc > 1 | ||
changed_when: > | ||
config_updated.stdout is version(expected[test_id + '.low'], '<') or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's test only the .low
value here. I think it is good enough. The higher ending checking will create issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other issue. The 2.1.6
check doesn't work properly
@@ -0,0 +1,35 @@ | |||
--- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @brett060102 ,
I have tested this 2.1.6
, and I would say it is broken, as the queried lines return multi line outputs, so the comparisons always fail
Here my outputs using the command that we have right now:
vmhana01:/home/cloudadmin # cibadmin -Q --xpath "//constraints/rsc_colocation" | grep -oP 'score="\K[^"]+'
4000
-INFINITY
vmhana01:/home/cloudadmin # cibadmin -Q --xpath "//constraints/rsc_colocation" | grep -oP 'rsc="\K[^"]+' | grep -o g_ip
g_ip
vmhana01:/home/cloudadmin # cibadmin -Q --xpath "//constraints/rsc_colocation" | grep -oP 'rsc-role="\K[^"]+'
Started
Master
Started
Slave
vmhana01:/home/cloudadmin # cibadmin -Q --xpath "//constraints/rsc_colocation" | grep -oP 'with-rsc="\K[^"]+' | grep -o msl_SAPHana
msl_SAPHana
msl_SAPHana
vmhana01:/home/cloudadmin # cibadmin -Q --xpath "//constraints/rsc_colocation" | grep -oP 'with-rsc-role="\K[^"]+'
Master
Slave
vmhana01:/home/cloudadmin # cibadmin -Q --xpath "//constraints/rsc_order" | grep -oP 'kind="\K[^"]+'
Optional
vmhana01:/home/cloudadmin # cibadmin -Q --xpath "//constraints/rsc_order" | grep -oP 'first="\K[^"]+' | grep -o cln_SAPHanaTopology
cln_SAPHanaTopology
vmhana01:/home/cloudadmin # cibadmin -Q --xpath "//constraints/rsc_order" | grep -oP 'then="\K[^"]+' | grep -o msl_SAPHana
msl_SAPHana
vmhana01:/home/cloudadmin #
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I see that you added a space to the initial grep search string. Not sure how this worked in the example code though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error in test 2.1.7
echo ${ha_dr_provider_SAPHanaSR} | grep "execution_order = {{ expected[test_id + '.execution_order'] }}" || exit 1 | ||
echo ${ha_dr_provider_SAPHanaSR} | grep "ha_dr_saphanasr = {{ expected[test_id + '.ha_dr_saphanasr'] }}" || exit 1 | ||
# if hook not found, exit with fail | ||
share_path=$(cat /hana/shared/PRD/global/hdb/custom/config/global.ini | grep "path =" | sed "s/.*= //") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other failure, this PRD
here must be replaced by ${sid}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes and thank you.
Hey @brett060102, |
Look over you changes and all look fine to me. Glad this is in. Thank you. |
Add the 2.1.* and 2.2.* tests.
Also added support to run all tests.