Skip to content
This repository has been archived by the owner on Dec 31, 2021. It is now read-only.

Test VM poweron, boot, and phonehome #227

Merged
merged 9 commits into from
Jun 24, 2021
Merged

Test VM poweron, boot, and phonehome #227

merged 9 commits into from
Jun 24, 2021

Conversation

mmlb
Copy link
Contributor

@mmlb mmlb commented Jun 7, 2021

Description

Add power on + boot from disk + expect phone home test to the VM based tests.

Why is this needed

We hit #226 in production because we don't boot test osie.sh installations.

How Has This Been Tested?

make test-x86_64 test_aarch64.

How are existing users impacted? What migration steps/scripts do we need?

Legacy code, only EM is known to be impacted.

Checklist:

I have:

  • updated the documentation and/or roadmap (if required)
  • added unit or e2e tests
  • provided instructions on how to upgrade

@mmlb mmlb marked this pull request as ready for review June 7, 2021 20:59
Copy link
Contributor

@markjacksonfishing markjacksonfishing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
nothing looks off here

ci/Dockerfile Outdated
@@ -35,16 +35,14 @@ RUN apk add --no-cache --no-progress --update --upgrade \
&& \
mv /etc/apk/repositories /etc/apk/repositories.old && \
apk add --no-cache --update --upgrade \
--repository=http://dl-cdn.alpinelinux.org/alpine/v3.11/main \
--repository=http://dl-cdn.alpinelinux.org/alpine/v3.11/community \
--repository=http://dl-cdn.alpinelinux.org/alpine/v3.12/main \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This version bump appears to be an unrelated change and not mentioned in the PR description.

  • If necessary - add thy why to the PR description.
  • If not necessary, move into a follow-up PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will break out

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

ci/Dockerfile Outdated
apk add --no-cache --update --upgrade \
--repository=http://dl-cdn.alpinelinux.org/alpine/edge/testing \
cfssl \
mv /etc/apk/repositories.old /etc/apk/repositories \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with our style guidance here - but the line continuation here is unnecessary and the trailing semi-colon is highly irregular. Can both be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trailing semi-colon enables easy copy&paste/deletion/moving and not needing to touch other lines. Along the same pattern as in go/python for trailing , in maps/dicts.

The trailing ; is necessary to inform docker that the RUN is done, otherwise docker eats any empty lines and tries to run the next COPY/RUN/... as a shell script.

ci/Dockerfile Outdated
;

RUN GOBIN=/bin GO111MODULE=on go get -v github.com/cloudflare/cfssl/cmd/cfssl@v1.6.0 github.com/cloudflare/cfssl/cmd/cfssljson@v1.6.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also not documented in the PR description - is it necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documented in the commit it came in from (84baf06).

I'd rather keep the PR description with the important parts of the PR and leave any unrelated-yet-necessary commits documented solely in the commit message. I don't feel super strongly about that though, so if you'd like it in the PR description I can add it in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When reviewing a PR, it should only be necessary to read the PR description to get the necessary context. Not only does this let the reviewer know which commits were intentional versus accidental, but one cannot expect that a reviewer is going to trawl through all of the commit messages.

My expectation is that PR descriptions contain roughly:

  • What is being done
  • Why is it being done?
  • Why is this the best approach to take?

ci/vm.sh Outdated
@@ -59,12 +59,12 @@ make_disk() {
}

gen_metadata() {
local cprcmd hardware_id id
id=$(uuidgen)
local cprcmd hardware_id
hardware_id=$(uuidgen)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like handling the input variables should come first.

hardware_id should be made a local, and moved closer to it's first usage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like handling the input variables should come first.

Fair point, can fix.

hardware_id should be made a local, and moved closer to it's first usage.

It is local, right there on line 62 ;). Going to split into 2 lines, and closer to use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

ci/vm.sh Outdated
hardware_id=$(uuidgen)
local class=$1
local slug=$2
local tag=$3
local id=$4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use some disambiguation in naming: what kind of ID? is it a user id?

Alternatively, document the function inputs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will rename

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

ci/vm.sh Show resolved Hide resolved
ci/vm.sh Outdated
@@ -404,6 +410,27 @@ test_provision() {
diff -u <(jq -cS . <<<"$want") <(jq -cS . <<<"$got")
}

test_boot_and_phone_home() {
run_vm &
local vmpid=$! i=0 t=10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable declaration would be more readable if split across multiple statements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll just get rid of i and t, they were useful when I was first running this test but no longer really relevant.

ci/vm.sh Outdated
run_vm &
local vmpid=$! i=0 t=10
until grep -qr instance_id uploads; do
sleep $t
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just hardcode this as 10 for now and remove the variable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

ci/vm.sh Outdated
timeout 30 sh -c "wait $vmpid" || kill -KILL $vmpid

# check for phone-home
eventid=instance_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set local here and elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

ci/vm.sh Outdated
@@ -353,6 +354,11 @@ do_test() {
rm -f uploads/*

color=34
colorize $color "== Running Boot & Phone-Home Test =="
test_boot_and_phone_home |& stdbuf -i 0 sed "s/^/$(colorize $color 'test_boot_and_phone_home│')/" && echo "this test is expected to fail" >&2 && exit 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

break this statement out into multiple lines?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I need to double check this because I thought I needed a different branch for non-UEFI that was supposed to actually work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad you called me out on this because I had some other stuff laying around in a stash that I actually wanted to have in the PR :D.

@mmlb mmlb mentioned this pull request Jun 14, 2021
3 tasks
mergify bot added a commit that referenced this pull request Jun 15, 2021
## Description

Pin grub to pre-Boothole "fixes" that break non secure boot uefi boots.

## Why is this needed

Temporarily unblocks master due to #226.

## How Has This Been Tested?

Tested on top of #227 on my dev machine.

## How are existing users impacted? What migration steps/scripts do we need?

OSIE can be deployed/run while waiting on #228 to land.

## Checklist:

I have:

- [ ] updated the documentation and/or roadmap (if required)
- [ ] added unit or e2e tests
- [ ] provided instructions on how to upgrade
@mmlb
Copy link
Contributor Author

mmlb commented Jun 22, 2021

oooooookay! tests are green again. I've rolled back the "test all the oses/arches/fws" commits for another PR so we can get this one out and done with! PTAL again.

@mmlb
Copy link
Contributor Author

mmlb commented Jun 22, 2021

@tstromberg PTAL. (I've mashed the "re-request review" button a dozen times but it doesn't seem to change state, I hope I didn't spam you)

@mmlb mmlb added the ready-to-merge Signal to Mergify to merge the PR. label Jun 22, 2021
if [[ $UEFI == true ]] && [[ $arch != aarch64 ]]; then
cprcmd=(jq -S '.filesystems += [{"mount":{"create":{"options": ["32", "-n", "EFI"]},"device":"/dev/sda1","format":"vfat","point":"/boot/efi"}}]')
fi

local distro=${slug%%_*} version=${slug#*_}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional nit: prefer declaration and assignment as separate statements.

There is no behavioral difference here, but it prevents awkward behaviors when used with command substitution; as the local builtin does not propagate the exit code from the command substitution.

Reference: https://google.github.io/styleguide/shellguide.html#use-local-variables

(feel free to disregard this comment as it seems to disagree with this code base)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shellcheck errors on the command substitution case. I tend to do the obvious/simplest/least-lines case unless a tool complains. In this case:

local foo=1 bar=2

seems more obvious than:

local foo bar
foo=1
bar=2

The latter would have me wondering why its separated. I think for shellscripts its kind of weird as most uses are interactive and the latter case (separate declaration and definition) is something anyone will do at a command prompt.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ is not something anyone would do at a prompt

@@ -144,6 +141,8 @@ gen_metadata() {
"wipe_disks": true
}
EOF
mkdir -p 2009-04-04/meta-data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document this magic value (2009-04-04) or set it to a variable with a describing string. I imagine that this value has to be identical across a couple of places here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

ci/Dockerfile Outdated
;

RUN GOBIN=/bin GO111MODULE=on go get -v github.com/cloudflare/cfssl/cmd/cfssl@v1.6.0 github.com/cloudflare/cfssl/cmd/cfssljson@v1.6.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When reviewing a PR, it should only be necessary to read the PR description to get the necessary context. Not only does this let the reviewer know which commits were intentional versus accidental, but one cannot expect that a reviewer is going to trawl through all of the commit messages.

My expectation is that PR descriptions contain roughly:

  • What is being done
  • Why is it being done?
  • Why is this the best approach to take?

@@ -353,6 +361,20 @@ do_test() {
rm -f uploads/*

color=34
colorize $color "== Running Boot & Phone-Home Test =="
if [[ $arch == aarch64 ]]; then
# aarch64 doesn't work, idk why
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a TODO with an issue URL to track this problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

docker/scripts/osie.sh Show resolved Hide resolved
tstromberg
tstromberg previously approved these changes Jun 23, 2021
mmlb added 9 commits June 23, 2021 20:27
Use local for more variables that were previously missed.
Move definition closer to first use.

Signed-off-by: Manuel Mendez <mmendez@equinix.com>
This will be needed to verify the boot&phone-home test is successful.

Signed-off-by: Manuel Mendez <mmendez@equinix.com>
The boot&phone-home test will need to run qemu in the background
while checking for the phone-home upload. Without this exec the test
is unable to kill the vm correctly which will cause sed's stdin pipe
to remain open and thus the test hangs.

Signed-off-by: Manuel Mendez <mmendez@equinix.com>
Otherwise we overwrite the boot config written by the provision.

Signed-off-by: Manuel Mendez <mmendez@equinix.com>
Boots up the VM from virtual disk and polls the caddy upload directory
for the final phone-home event/post.

Currently known/expected to fail, will be fixed in a follow up PR.

Signed-off-by: Manuel Mendez <mmendez@equinix.com>
The linux bridge doesn't support mode-5 bonds which causes the test to
fail. I could have gone with mode-1, but individual just seemed better.

Signed-off-by: Manuel Mendez <mmendez@equinix.com>
Cloud init wants to browse the virtual directories instead of
just trying to fetch the endpoints it wants. Without this directive
caddy responds with 404 and cloud-init gives up.

Signed-off-by: Manuel Mendez <mmendez@equinix.com>
Both are valid, and tls is preferred. This change allows boot testing
provisions as VMs in CI, so well worth it!

Signed-off-by: Manuel Mendez <mmendez@equinix.com>
Now that cloud-init is happy it will actually phone-home so we
need to update the expected outcomes to allow for BIOS to pass.

Signed-off-by: Manuel Mendez <mmendez@equinix.com>
@mmlb mmlb requested a review from tstromberg June 23, 2021 20:30
@mmlb
Copy link
Contributor Author

mmlb commented Jun 23, 2021

Hey @tstromberg addressed some of your comments. Dropped the cfssl commit as it seems it missing from mirrors was temporary and now available again, noted the PR description details though.

@mergify mergify bot merged commit ce113f0 into tinkerbell:master Jun 24, 2021
@mmlb mmlb deleted the poweron-phone-home-test branch June 24, 2021 22:19
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
ready-to-merge Signal to Mergify to merge the PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants