tests: Bump DevStack to Dalmatian (2024.2) #2742

stephenfin · 2024-12-09T12:56:08Z

What this PR does / why we need it:

Bump the version of DevStack used in CI from Bobcat (2023.2), which is now EOL, to Dalmatian (2024.2). A future change will bump this further to Epoxy (2025.1).

Which issue this PR fixes(if applicable):

(none)

Special notes for reviewers:

(none)

Release note:

NONE

stephenfin · 2024-12-09T12:57:53Z

/hold

This is the second attempt after the first was reverted (#2730). I need to see how this performs. fwiw though, I saw no performance issues locally.

kayrus · 2024-12-09T12:58:28Z

@stephenfin see #2730

stephenfin · 2024-12-09T17:40:30Z

@stephenfin see #2730

Yup, see my comment right above 😄

EmilienM · 2024-12-12T18:02:35Z

I wonder if #2747 would help.

kayrus · 2024-12-12T22:04:45Z

/retest

kayrus · 2024-12-12T22:11:18Z

/test openstack-cloud-csi-manila-e2e-test
Previously the Manila tests took 49m29s; the Cinder tests took 1h50m18s and failed due to a timeout.

kayrus · 2024-12-12T22:30:56Z

/test openstack-cloud-csi-manila-e2e-test

kayrus · 2024-12-13T11:21:49Z

@EmilienM looks like #2747 doesn't help.

k8s-triage-robot · 2025-03-13T12:09:03Z

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle stale
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
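The three triage thresholds compose cumulatively. As a rough sketch (assuming the inactivity clock is never reset in between, which is not necessarily how the real bot tracks activity), the implied state for a given number of idle days could be computed like this. The dates used are this PR's last comment before the bot stepped in and the day the stale label was applied:

```python
from datetime import date
from typing import Optional

# Inactivity thresholds, in days, taken from the triage rules quoted above.
STALE_AFTER = 90   # no activity at all -> lifecycle/stale
ROTTEN_AFTER = 30  # further inactivity once stale -> lifecycle/rotten
CLOSE_AFTER = 30   # further inactivity once rotten -> PR closed


def lifecycle_state(days_inactive: int) -> Optional[str]:
    """Return the state implied by `days_inactive` consecutive idle days.

    Assumes the clock is never reset in between (e.g. by a comment or by
    /remove-lifecycle stale); returns None while the PR is still fresh.
    """
    if days_inactive >= STALE_AFTER + ROTTEN_AFTER + CLOSE_AFTER:
        return "closed"
    if days_inactive >= STALE_AFTER + ROTTEN_AFTER:
        return "lifecycle/rotten"
    if days_inactive >= STALE_AFTER:
        return "lifecycle/stale"
    return None


# Last comment on this PR before the bot vs. the day stale was applied.
idle_days = (date(2025, 3, 13) - date(2024, 12, 13)).days
print(idle_days, lifecycle_state(idle_days))  # prints "90 lifecycle/stale"
```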

kayrus · 2025-03-13T12:18:16Z

/remove-lifecycle stale

stephenfin · 2025-05-08T17:48:26Z

Error due to missing zpool module param:

+ lib/host:configure_zswap:45              :   sudo tee /sys/module/zswap/parameters/zpool
z3fold
tee: /sys/module/zswap/parameters/zpool: No such file or directory

However, once again we appear to have ended up with a Jammy image despite requesting Noble 😕 Investigating.
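The zpool failure above comes from writing a module parameter that the running kernel doesn't expose. DevStack's lib/host does this in bash; purely as an illustration, a guard that only writes parameters which exist could look like the following Python sketch. `set_module_param` is a hypothetical helper, not a DevStack function, and it takes the parameter directory as an argument so it can be exercised against a temporary directory.

```python
from pathlib import Path


def set_module_param(param_dir: Path, name: str, value: str) -> bool:
    """Write a kernel module parameter if it exists; return whether it was written.

    Hypothetical helper: mirrors `echo VALUE | sudo tee PARAM_DIR/NAME`, but
    skips parameters that aren't present (e.g. a missing zswap `zpool`)
    instead of failing the way `tee` does.
    """
    param = param_dir / name
    if not param.exists():
        return False
    # sysfs parameters accept a trailing newline, just like `tee` would write.
    param.write_text(value + "\n")
    return True
```

On a real host this would be called as `set_module_param(Path("/sys/module/zswap/parameters"), "zpool", "z3fold")` with root privileges. Skipping silently is a design choice; failing loudly, as happens today, may well be preferable in CI.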

k8s-ci-robot · 2025-05-08T18:05:02Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign zetaab for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mnaser · 2025-05-08T18:13:43Z

@stephenfin thanks for picking this up, fwiw..

https://review.opendev.org/c/openstack/devstack/+/942755

also, expect to see some failures because of:

#2884

stephenfin · 2025-05-08T18:38:02Z

also, expect to see some failures because of:
#2884

Thanks. It might make sense to stick with 2024.2, fix that, and then bump to 2025.1. Will think on it 🤔

I've done this.

stephenfin · 2025-05-08T18:48:19Z

Turns out we were never running against Ubuntu 24.04. While Boskos reaps networks, instances, disks, etc., it doesn't reap images. We've likely been using the same (Ubuntu 22.04) image for who knows how long at this point 😅

https://github.com/kubernetes-sigs/boskos/blob/5993cef5a1c719c33c0936d416b7d935058e1204/cmd/janitor/gcp_janitor.py#L38

While Boskos will reap most resources for us, it doesn't reap images [1]. This has resulted in us using the same image for who knows how long at this point. Encode the Ubuntu version to prevent us picking up another version by mistake. [1] https://github.com/kubernetes-sigs/boskos/blob/5993cef5a1c719c33c0936d416b7d935058e1204/cmd/janitor/gcp_janitor.py#L46-L88 Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
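Encoding the Ubuntu release in the image name means a lookup can refuse to match leftover images from another release. A minimal sketch, assuming a hypothetical `<prefix>-ubuntu-<YYMM>-<YYYYMMDD>` naming scheme (the real job may name and look up its GCP images differently):

```python
import re
from typing import Iterable, Optional

# Hypothetical naming scheme, e.g. "devstack-ubuntu-2404-20250508".
IMAGE_RE = re.compile(r"^.+-ubuntu-(?P<release>\d{4})-(?P<built>\d{8})$")


def pick_image(names: Iterable[str], ubuntu_release: str) -> Optional[str]:
    """Return the newest image built for `ubuntu_release`, ignoring all others."""
    candidates = []
    for name in names:
        match = IMAGE_RE.match(name)
        if match and match.group("release") == ubuntu_release:
            # YYYYMMDD strings sort chronologically, so max() picks the newest.
            candidates.append((match.group("built"), name))
    return max(candidates)[1] if candidates else None
```

With this, an old 22.04 image lying around can never satisfy a request for 24.04; the lookup returns `None` instead, which the caller can treat as "build a new image".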

stephenfin · 2025-05-12T12:40:46Z

Investigating the performance degradation by comparing two recent builds: the last passing one and this failing one.

DevStack is about 40% slower to deploy, at 652 seconds (10m52s) versus 467 seconds (7m47s), but that's only about three minutes and is variable enough (based on other failures in between) to be irrelevant. It looks like it's the tests themselves that take longer. I'm going to rework things so we actually get a response back from ginkgo if the test run fails.
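The percentages in this thread are relative to the passing run. A small sketch for converting the Go-style durations quoted here (such as `7m47s` or `1h50m18s`) to seconds and computing the relative change; `to_seconds` and `percent_change` are illustrative names, not part of any CI tooling:

```python
import re

# Optional hours, minutes and seconds components, in that order.
_DURATION_RE = re.compile(r"^(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$")


def to_seconds(duration: str) -> int:
    """Convert a duration such as "1h50m18s" or "7m47s" to whole seconds."""
    match = _DURATION_RE.match(duration)
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {duration!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a percentage of `before`."""
    return (after - before) / before * 100


print(f"{percent_change(to_seconds('7m47s'), to_seconds('10m52s')):.1f}%")  # prints "39.6%"
```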

k8s-ci-robot · 2025-05-12T14:36:17Z

@stephenfin: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
openstack-cloud-csi-cinder-e2e-test	`d827feb`	link	true	`/test openstack-cloud-csi-cinder-e2e-test`

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

stephenfin · 2025-05-12T16:56:48Z

Looks like there are some very significant changes in runtime for tests across the board. Now to figure out why. I've been using the below script to compare results from JUnit files (specifically, the JUnit files from the last success and the most recent failure). The result can be seen in results.csv.

#!/usr/bin/env python3

import argparse
import csv

from lxml import etree

# Ginkgo suite-level nodes that are reported as testcases but aren't tests.
IGNORED = {
    '[ReportBeforeSuite]',
    '[SynchronizedBeforeSuite]',
    '[SynchronizedAfterSuite]',
    '[ReportAfterSuite] Kubernetes e2e suite report',
}


def load_results(path: str) -> dict:
    """Map each testcase name in a JUnit file to its (status, time) pair."""
    # Let lxml open the file itself (in binary mode) so that it handles
    # the encoding declared in the XML prolog.
    tree = etree.parse(path)
    return {
        testcase.get('name'): (testcase.get('status'), testcase.get('time'))
        for testcase in tree.findall('.//testcase')
    }


def diff(before: str, after: str):
    passing_results = load_results(before)
    failing_results = load_results(after)

    missing = set(failing_results) - set(passing_results)
    if missing:
        raise Exception(
            f'tests missing from runs: this should not happen: {sorted(missing)}'
        )

    with open('results.csv', 'w', newline='') as fh:
        writer = csv.writer(fh)

        for name, (after_status, after_time) in failing_results.items():
            if name in IGNORED:
                continue

            before_status, before_time = passing_results[name]
            if after_status != before_status:
                # the status changed; we might want to look at this later
                continue

            if after_status == 'skipped':
                # skipped in both runs, so there's no timing to compare
                continue

            before_sec = float(before_time)
            after_sec = float(after_time)
            if before_sec == 0:
                # avoid dividing by zero for tests with no measurable runtime
                continue

            diff_pct = ((after_sec - before_sec) / before_sec) * 100
            print(name)
            print(f'\tbefore: {before_sec:0.2f} seconds')
            print(f'\tafter:  {after_sec:0.2f} seconds')
            print(f'\tchange: {diff_pct:0.2f}%')

            writer.writerow([name, before_sec, after_sec, diff_pct])


def main():
    parser = argparse.ArgumentParser(
        description='Compare per-test timings between two JUnit reports.',
    )
    parser.add_argument('before', help='Before result (passing)')
    parser.add_argument('after', help='After result (failing)')
    args = parser.parse_args()
    diff(args.before, args.after)


if __name__ == '__main__':
    main()
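Once results.csv exists, something like the following stdlib-only sketch could rank the worst slowdowns. It assumes the header-less four-column rows that the script above writes (name, before seconds, after seconds, percentage change); `top_regressions` is a hypothetical helper.

```python
import csv
from typing import List, TextIO, Tuple

Row = Tuple[str, float, float, float]


def top_regressions(fh: TextIO, limit: int = 10) -> List[Row]:
    """Return the `limit` rows with the largest percentage slowdown."""
    rows = [
        (name, float(before), float(after), float(change))
        for name, before, after, change in csv.reader(fh)
    ]
    # Largest positive change (i.e. biggest slowdown) first.
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows[:limit]
```

For example, `with open('results.csv', newline='') as fh: top_regressions(fh, 20)` would return the 20 largest slowdowns.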

stephenfin · 2025-05-12T17:02:57Z

Let's see if we get the same performance issues in Caracal, since that cuts our diff in half. Proposed #2888.

gouthampacha · 2025-05-12T17:41:17Z

Let's see if we get the same performance issues in Caracal, since that cuts our diff in half. Proposed #2888.

Tangentially, since we have limited resources, I think we should only test with SLURP releases in this repo, i.e. 2024.1 and 2025.1. These are more appropriate/relevant than the .2 releases due to their popularity. We could override this for individual test jobs to use a .2 release if necessary.

stephenfin · 2025-05-13T12:11:45Z

Let's see if we get the same performance issues in Caracal, since that cuts our diff in half. Proposed #2888.

Tangentially, since we have limited resources, I think we should only test with SLURP releases in this repo, i.e. 2024.1 and 2025.1. These are more appropriate/relevant than the .2 releases due to their popularity. We could override this for individual test jobs to use a .2 release if necessary.

I agree.
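For reference, the series names and versions discussed here, plus the two points being made: moving from Bobcat to Caracal is one series rather than two, and the SLURP releases are the YYYY.1 ones. This is a simplified sketch; the `.1` rule only holds for the date-based versions used since 2023.1.

```python
# OpenStack release series mentioned in this thread, oldest first.
SERIES = [
    ("Bobcat", "2023.2"),
    ("Caracal", "2024.1"),
    ("Dalmatian", "2024.2"),
    ("Epoxy", "2025.1"),
]
VERSIONS = dict(SERIES)


def is_slurp(name: str) -> bool:
    """True for SLURP releases, i.e. the first (YYYY.1) release of each year."""
    return VERSIONS[name].endswith(".1")


def steps_between(old: str, new: str) -> int:
    """Number of release series jumped when bumping from `old` to `new`."""
    names = [name for name, _ in SERIES]
    return names.index(new) - names.index(old)


print(steps_between("Bobcat", "Dalmatian"), steps_between("Bobcat", "Caracal"))  # prints "2 1"
print([name for name, _ in SERIES if is_slurp(name)])  # prints "['Caracal', 'Epoxy']"
```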

stephenfin · 2025-06-05T14:45:52Z

I'm currently deploying Bobcat locally using the below local.conf, generated with changes from #2905.

[[local|localrc]]
RECLONE=False
#HOST_IP={{ local_ip_address }}
DEST=/opt/stack
DATA_DIR=${DEST}/data
USE_PYTHON3=True
LOGFILE=$DEST/logs/stack.sh.log
VERBOSE=True
LOG_COLOR=False
LOGDAYS=1
SERVICE_TIMEOUT=300

DATABASE_PASSWORD=password
ADMIN_PASSWORD=password
SERVICE_PASSWORD=password
SERVICE_TOKEN=password
RABBIT_PASSWORD=password

GIT_BASE=https://github.com
TARGET_BRANCH=2023.2-eol

ENABLED_SERVICES=rabbit,mysql,key

# Host tuning
# From: https://opendev.org/openstack/devstack/src/commit/05f7d302cfa2da73b2887afcde92ef65b1001194/.zuul.yaml#L645-L662
# Tune the host to optimize memory usage and hide I/O latency.
# These settings configure the kernel to treat the host page
# cache and swap with equal priority, and to prefer deferring writes,
# by changing the default swappiness, dirty_ratio and
# vfs_cache_pressure.
ENABLE_SYSCTL_MEM_TUNING=true
# The net tuning optimizes IPv4 TCP fast open and configures the default
# qdisc policy to pfifo_fast, which effectively disables all QoS.
# This minimizes the CPU load of the host network stack.
ENABLE_SYSCTL_NET_TUNING=true
# zswap allows the kernel to compress pages in memory before swapping
# them to disk. This can reduce the amount of swap used and improve
# performance. Effectively, this trades a small amount of CPU for an
# increase in swap performance by reducing the amount of data
# written to disk. The overall speedup is proportional to the
# compression ratio and the speed of the swap device.
ENABLE_ZSWAP=false

# Nova
enable_service n-api
enable_service n-cpu
enable_service n-cond
enable_service n-sch
enable_service n-api-meta

enable_service placement-api
enable_service placement-client

# Glance
enable_service g-api
enable_service g-reg

# Cinder
enable_service cinder
enable_service c-api
enable_service c-vol
enable_service c-sch

# Neutron
enable_plugin neutron ${GIT_BASE}/openstack/neutron.git 2023.2-eol
enable_service q-svc
enable_service q-ovn-metadata-agent
enable_service q-trunk
enable_service q-qos
enable_service ovn-controller
enable_service ovn-northd
enable_service ovs-vswitchd
enable_service ovsdb-server

ML2_L3_PLUGIN="ovn-router,trunk,qos"
OVN_L3_CREATE_PUBLIC_NETWORK="True"
PUBLIC_BRIDGE_MTU="1430"

IP_VERSION=4
IPV4_ADDRS_SAFE_TO_USE=10.1.0.0/26
FIXED_RANGE=10.1.0.0/26
NETWORK_GATEWAY=10.1.0.1
FLOATING_RANGE=172.24.5.0/24
PUBLIC_NETWORK_GATEWAY=172.24.5.1

# Add a pre-install script to upgrade pip and setuptools
[[local|pre-install]]
# Activate the virtual environment and upgrade pip and setuptools
if [ -f /opt/stack/data/venv/bin/activate ]; then
    source /opt/stack/data/venv/bin/activate
    pip install --upgrade pip setuptools
    deactivate
fi

[[post-config|$GLANCE_API_CONF]]
[glance_store]
default_store = file

[[post-config|$NEUTRON_CONF]]
[DEFAULT]
global_physnet_mtu = 1430

Sharing in case it helps anyone else.
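local.conf is split into `[[phase|target]]` meta-sections, and within `[[local|localrc]]` the service list is built from `ENABLED_SERVICES` plus each `enable_service` line. A rough sketch for inspecting a file like the one above; it is an approximation only (it ignores `disable_service`, `ENABLED_SERVICES+=`, and services added by plugins), and both helpers are hypothetical:

```python
import re
from typing import Dict, List, Optional

# Matches meta-section headers such as "[[local|localrc]]".
META_RE = re.compile(r"^\[\[(?P<phase>[^|\]]+)\|(?P<target>[^\]]+)\]\]\s*$")


def split_meta_sections(text: str) -> Dict[str, List[str]]:
    """Group local.conf lines by their [[phase|target]] meta-section header."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        match = META_RE.match(line)
        if match:
            current = f"{match['phase']}|{match['target']}"
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


def enabled_services(localrc: List[str]) -> List[str]:
    """Approximate the service list: ENABLED_SERVICES plus each enable_service."""
    services: List[str] = []
    for line in localrc:
        line = line.strip()
        if line.startswith("ENABLED_SERVICES="):
            services = [s for s in line.split("=", 1)[1].split(",") if s]
        elif line.startswith("enable_service "):
            for service in line.split()[1:]:
                if service not in services:
                    services.append(service)
    return services
```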

k8s-ci-robot · 2025-06-06T01:56:41Z

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot added the release-note-none and cncf-cla: yes labels Dec 9, 2024

k8s-ci-robot requested review from dulek and zetaab December 9, 2024 12:56

k8s-ci-robot added the size/XS label Dec 9, 2024

k8s-ci-robot added the do-not-merge/hold label Dec 9, 2024

kayrus mentioned this pull request Dec 12, 2024

ci: host tuning in devstack #2747

Merged

k8s-ci-robot added the lifecycle/stale label Mar 13, 2025

k8s-ci-robot removed the lifecycle/stale label Mar 13, 2025

stephenfin force-pushed the devstack-bump branch from b72dc79 to c2351ae May 8, 2025 17:17

stephenfin changed the title ~~tests: Bump DevStack to Dalmatian (2024.2)~~ tests: Bump DevStack to Expoxy (2025.1) May 8, 2025

stephenfin force-pushed the devstack-bump branch from c2351ae to 32cc8b6 May 8, 2025 17:22

stephenfin changed the title ~~tests: Bump DevStack to Expoxy (2025.1)~~ tests: Bump DevStack to Epoxy (2025.1) May 8, 2025

k8s-ci-robot removed the size/XS label May 8, 2025

stephenfin changed the title ~~tests: Bump DevStack to Epoxy (2025.1)~~ tests: Bump DevStack to Dalmatian (2024.2) May 8, 2025

stephenfin force-pushed the devstack-bump branch from c4e7427 to 6c2ef2a May 8, 2025 18:37

k8s-ci-robot added the size/M label and removed the size/S label May 8, 2025

stephenfin force-pushed the devstack-bump branch from 6c2ef2a to 9e85c9c May 8, 2025 18:51

stephenfin added 7 commits May 9, 2025 13:17

tests: Bump DevStack to Dalmatian (2024.2)

db3f715

Signed-off-by: Stephen Finucane <stephenfin@redhat.com>

tests: Bump amphora image

c9d6530

Use the Ubuntu 24.04 version, rather than the 22.04 version. This aligns with what we're using for DevStack itself. Signed-off-by: Stephen Finucane <stephenfin@redhat.com>

devstack: Remove USE_PYTHON3

5cfea6d

It's all Python 3 now, baby. Signed-off-by: Stephen Finucane <stephenfin@redhat.com>

devstack: Disable zswap temporarily

9f05436

Signed-off-by: Stephen Finucane <stephenfin@redhat.com>

Install Ansible from Debian Testing

a184845

Signed-off-by: Stephen Finucane <stephenfin@redhat.com>

Correct broken conditions

50b2dc2

Per the Ansible 2.19 porting guide [1]. [1] https://ansible.readthedocs.io/projects/ansible-core/devel/porting_guides/porting_guide_core_2.19.html Signed-off-by: Stephen Finucane <stephenfin@redhat.com>

stephenfin force-pushed the devstack-bump branch from 1a844ac to 50b2dc2 May 9, 2025 12:18

gouthampacha reviewed May 9, 2025

tests/playbooks/roles/install-devstack/defaults/main.yaml (resolved)

stephenfin added 2 commits May 12, 2025 13:34

Prefer ginkgo timeout

f663e16

So that we actually get test results. Signed-off-by: Stephen Finucane <stephenfin@redhat.com>

Align opts for Cinder, Manila tests

d827feb

Add a timeout to the Manila job and otherwise move some lines around. Signed-off-by: Stephen Finucane <stephenfin@redhat.com>

k8s-ci-robot added the size/L label and removed the size/M label May 12, 2025

stephenfin mentioned this pull request Jun 5, 2025

tests: Temporarily unblock gate with use of EOL Bobcat tags #2905

Merged

k8s-ci-robot added the needs-rebase label Jun 6, 2025
