tests: Bump DevStack to Dalmatian (2024.2) #2742
base: master
Conversation
/hold This is the second attempt after the first was reverted (#2730). I need to see how this performs. fwiw though, I saw no performance issues locally. |
@stephenfin see #2730 |
Yup, see my comment right above 😄 |
I wonder if #2747 would help. |
/retest |
/test openstack-cloud-csi-manila-e2e-test |
/test openstack-cloud-csi-manila-e2e-test |
/lifecycle stale |
/remove-lifecycle stale |
Error due to missing
However, once again we appear to have ended up with a Jammy image despite requesting Noble 😕 Investigating. |
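For anyone debugging a similar image mix-up: a quick sanity check is to read /etc/os-release on the booted node, which reports the release actually running (Jammy is 22.04, Noble is 24.04). A minimal sketch:

```shell
# Print the distro release the node is actually running.
# On Ubuntu, VERSION_CODENAME is "jammy" for 22.04 and "noble" for 24.04.
. /etc/os-release
echo "${NAME} ${VERSION_ID} (${VERSION_CODENAME})"
```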
[APPROVALNOTIFIER] This PR is NOT APPROVED. |
@stephenfin thanks for picking this up, fwiw.. https://review.opendev.org/c/openstack/devstack/+/942755 also, expect to see some failures because of: |
I've done this. |
Turns out we were never running against Ubuntu 24.04. While Boskos reaps networks, instances, disks etc., it doesn't reap images. We've likely been using the same (Ubuntu 24.04) image for who knows how long at this point 😅 |
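To illustrate the kind of age-based cleanup the janitor applies to other resources but, per the linked source, not to images (this is a hypothetical sketch, not the actual Boskos code; the function name and TTL are illustrative):

```python
from datetime import datetime, timedelta, timezone


def is_stale(created_at: datetime, ttl: timedelta) -> bool:
    """Return True if a resource is older than its TTL and should be reaped."""
    return datetime.now(timezone.utc) - created_at > ttl


# An image created 30 days ago, against a 7-day TTL, would be reaped;
# since images are never checked, they accumulate indefinitely.
old_image = datetime.now(timezone.utc) - timedelta(days=30)
print(is_stale(old_image, timedelta(days=7)))  # True
```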
While Boskos will reap most resources for us, it doesn't reap images [1]. This has resulted in us using the same image for who knows how long at this point. Encode the Ubuntu version to prevent us picking up other versions by mistake. [1] https://github.com/kubernetes-sigs/boskos/blob/5993cef5a1c719c33c0936d416b7d935058e1204/cmd/janitor/gcp_janitor.py#L46-L88 Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Use the Ubuntu 24.04 version, rather than the 22.04 version. This aligns with what we're using for DevStack itself. Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
It's all Python 3 now, baby. Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Per the Ansible 2.19 porting guide [1]. [1] https://ansible.readthedocs.io/projects/ansible-core/devel/porting_guides/porting_guide_core_2.19.html Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
So that we actually get test results. Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Add a timeout to the Manila job and otherwise move some lines around. Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Investigating the performance degradation by comparing two recent builds: the last passing one and this failing one. DevStack is about 40% slower to deploy (652 seconds, or 10m52s, versus 467 seconds, or 7m47s), but that's variable enough (based on other failures in between) to be irrelevant. Looks like it's the tests themselves that take longer. I'm going to rework things so we actually get a response back from ginkgo if the test run fails. |
@stephenfin: The following test failed. |
Looks like there are some very significant changes in runtime for tests across the board. Now to figure out why. I've been using the below script to compare results from JUnit files (specifically, the JUnit files from the last success and the most recent failure). The result can be seen in results.csv.

#!/usr/bin/env python3
import argparse
import csv

from lxml import etree


def diff(before: str, after: str):
    with open(before) as fh:
        passing = etree.parse(fh)
    with open(after) as fh:
        failing = etree.parse(fh)

    passing_results = {}
    results_diff = {}

    for testcase in passing.findall('.//testcase'):
        passing_results[testcase.get('name')] = (
            testcase.get('status'), testcase.get('time')
        )

    for testcase in failing.findall('.//testcase'):
        name = testcase.get('name')
        if name not in passing_results:
            raise Exception('tests missing from runs: this should not happen')

        # record anything whose status changed, plus anything that actually
        # ran (i.e. skip only tests that were skipped in both runs)
        if (
            testcase.get('status') != passing_results[name][0] or
            testcase.get('status') != 'skipped'
        ):
            results_diff[name] = {
                'before': passing_results[name],
                'after': (testcase.get('status'), testcase.get('time')),
            }

    with open('results.csv', 'w', newline='') as fh:
        writer = csv.writer(fh)
        for name, entry in results_diff.items():
            if name in {
                '[ReportBeforeSuite]',
                '[SynchronizedBeforeSuite]',
                '[SynchronizedAfterSuite]',
                '[ReportAfterSuite] Kubernetes e2e suite report',
            }:
                continue

            if entry['before'][0] != entry['after'][0]:
                # we might want to look at this later
                continue

            before_sec = float(entry['before'][1])
            after_sec = float(entry['after'][1])
            diff_sec = ((after_sec - before_sec) / before_sec) * 100

            print(f'{name}')
            print(f'\tbefore: {before_sec:0.2f} seconds')
            print(f'\tafter: {after_sec:0.2f} seconds')
            print(f'\tchange: {diff_sec:0.2f}%')

            writer.writerow([name, before_sec, after_sec, diff_sec])


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        'before',
        help='Before result (passing)',
    )
    parser.add_argument(
        'after',
        help='After result (failing)',
    )
    args = parser.parse_args()

    diff(args.before, args.after)


if __name__ == '__main__':
    main() |
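The change column written to results.csv is the plain relative difference in percent. As a standalone helper (the function name here is illustrative):

```python
def pct_change(before_sec: float, after_sec: float) -> float:
    # Relative change in percent: positive means the test got slower.
    return (after_sec - before_sec) / before_sec * 100


# A test that grew from 10s to 15s is 50% slower.
print(pct_change(10.0, 15.0))  # 50.0
```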
Let's see if we get the same performance issues in Caracal, since that cuts our diff in half. Proposed here. |
Tangentially, since we have limited resources, I think this repo should only test with SLURP releases, i.e. 2024.1 and 2025.1 are more appropriate/relevant than the .2 releases due to their popularity. We could override this for individual test jobs if necessary, to a .2 release. |
I agree. |
I'm currently deploying Bobcat locally using the config below.
[[local|localrc]]
RECLONE=False
#HOST_IP={{ local_ip_address }}
DEST=/opt/stack
DATA_DIR=${DEST}/data
USE_PYTHON3=True
LOGFILE=$DEST/logs/stack.sh.log
VERBOSE=True
LOG_COLOR=False
LOGDAYS=1
SERVICE_TIMEOUT=300
DATABASE_PASSWORD=password
ADMIN_PASSWORD=password
SERVICE_PASSWORD=password
SERVICE_TOKEN=password
RABBIT_PASSWORD=password
GIT_BASE=https://github.com
TARGET_BRANCH=2023.2-eol
ENABLED_SERVICES=rabbit,mysql,key
# Host tuning
# From: https://opendev.org/openstack/devstack/src/commit/05f7d302cfa2da73b2887afcde92ef65b1001194/.zuul.yaml#L645-L662
# Tune the host to optimize memory usage and hide IO latency.
# These settings configure the kernel to treat the host page
# cache and swap with equal priority, and to prefer deferring
# writes, by changing the default swappiness, dirty_ratio and
# vfs_cache_pressure.
ENABLE_SYSCTL_MEM_TUNING=true
# The net tuning optimizes IPv4 TCP fast open and configures the
# default qdisc policy to pfifo_fast, which effectively disables
# all QoS. This minimizes the CPU load of the host network stack.
ENABLE_SYSCTL_NET_TUNING=true
# zswap allows the kernel to compress pages in memory before
# swapping them to disk. This can reduce the amount of swap used
# and improve performance. Effectively this trades a small amount
# of CPU for an increase in swap performance by reducing the
# amount of data written to disk. The overall speedup is
# proportional to the compression ratio and the speed of the
# swap device.
ENABLE_ZSWAP=false
# Nova
enable_service n-api
enable_service n-cpu
enable_service n-cond
enable_service n-sch
enable_service n-api-meta
enable_service placement-api
enable_service placement-client
# Glance
enable_service g-api
enable_service g-reg
# Cinder
enable_service cinder
enable_service c-api
enable_service c-vol
enable_service c-sch
# Neutron
enable_plugin neutron ${GIT_BASE}/openstack/neutron.git 2023.2-eol
enable_service q-svc
enable_service q-ovn-metadata-agent
enable_service q-trunk
enable_service q-qos
enable_service ovn-controller
enable_service ovn-northd
enable_service ovs-vswitchd
enable_service ovsdb-server
ML2_L3_PLUGIN="ovn-router,trunk,qos"
OVN_L3_CREATE_PUBLIC_NETWORK="True"
PUBLIC_BRIDGE_MTU="1430"
IP_VERSION=4
IPV4_ADDRS_SAFE_TO_USE=10.1.0.0/26
FIXED_RANGE=10.1.0.0/26
NETWORK_GATEWAY=10.1.0.1
FLOATING_RANGE=172.24.5.0/24
PUBLIC_NETWORK_GATEWAY=172.24.5.1
# Add a pre-install script to upgrade pip and setuptools
[[local|pre-install]]
# Activate the virtual environment and upgrade pip and setuptools
if [ -f /opt/stack/data/venv/bin/activate ]; then
source /opt/stack/data/venv/bin/activate
pip install --upgrade pip setuptools
deactivate
fi
[[post-config|$GLANCE_API_CONF]]
[glance_store]
default_store = file
[[post-config|$NEUTRON_CONF]]
[DEFAULT]
global_physnet_mtu = 1430 Sharing in case it helps anyone else. |
PR needs rebase. |
What this PR does / why we need it:
Bump the version of DevStack used in CI from Bobcat (2023.2), which is now EOL, to Dalmatian (2024.2). A future change will bump this further to Epoxy (2025.2).
Which issue this PR fixes (if applicable):
(none)
Special notes for reviewers:
(none)
Release note: