
Standalone EVE OS containers with RT benchmarking support #7

Merged

uncleDecart merged 15 commits into main from rucoder/eve-containers on Mar 5, 2026


@rucoder (Collaborator) commented Mar 1, 2026

Description of the change

Restructure caterpillar and cyclictest into self-contained containers for deployment on EVE OS with full RT benchmarking support. This PR supersedes #5 and includes all of its changes (BIOS scraping, CPU detection, optional PQOS, reboot testing), rebased on top of the current main.

Key changes

Base image (Dockerfile.base):

  • SSH server with pubkey auth, login banner with usage instructions
  • Shell aliases: jupyter-start, rt-preflight, rt-info
  • Entrypoint detects the cpuset from the cgroup and splits it into housekeeping and benchmark cores
  • Container stays alive after the benchmark for SSH/Jupyter access (sshd runs in the background; the entrypoint then idles with sleep infinity)
  • Pin service processes (sshd, shell) to the housekeeping core
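A minimal Python sketch of the housekeeping/benchmark split described above. The actual entrypoint is a shell script; here `os.sched_getaffinity(0)` stands in for reading the cgroup cpuset, the helpers `split_cpuset` and `format_cpulist` are hypothetical, and the range format used for `RT_BENCHMARK_CORES` is an assumption.

```python
import os


def format_cpulist(cores):
    """Compress CPU ids into kernel cpulist notation, e.g. [0, 1, 2, 5] -> '0-2,5'."""
    parts = []
    start = prev = None
    for cpu in sorted(cores):
        if start is not None and cpu == prev + 1:
            prev = cpu  # still inside a contiguous run
            continue
        if start is not None:
            parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = cpu
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


def split_cpuset(allowed):
    """Reserve the first allowed core for housekeeping; the rest run benchmarks."""
    cores = sorted(allowed)
    if len(cores) < 2:
        # Assumption: with a single core there is nothing left to isolate.
        raise ValueError(f"need at least 2 cores to split, got {cores}")
    return cores[0], cores[1:]


if __name__ == "__main__":
    # On Linux, the affinity mask reflects the cpuset granted by the container's cgroup.
    allowed = os.sched_getaffinity(0)
    if len(allowed) >= 2:
        housekeeping, bench = split_cpuset(allowed)
        print(f"housekeeping core: {housekeeping}")
        print(f"RT_BENCHMARK_CORES={format_cpulist(bench)}")
```

Taking the lowest-numbered core for housekeeping keeps the choice deterministic for any cpuset the orchestrator assigns, with no hardcoded core numbers.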

RT preflight checks (src/rt_preflight.py):

  • 14-point validation: PREEMPT_RT, isolcpus, nohz_full, rcu_nocbs, irqaffinity, C-states, intel_pstate, governor, clocksource, NUMA balancing, split_lock, hugepages, capabilities, kernel threads
  • PASS/WARN/FAIL output visible in EVE OS cloud log viewer
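The check style can be sketched as below, assuming each check returns a status plus a one-line detail. Only six of the fourteen checks are shown. The function names, the WARN-vs-FAIL severities, and the preference for `tsc` as clocksource (typical on x86) are assumptions, not taken from `src/rt_preflight.py`; the procfs/sysfs paths are standard Linux locations.

```python
from enum import Enum
from pathlib import Path


class Status(Enum):
    PASS = "PASS"
    WARN = "WARN"
    FAIL = "FAIL"


def read_text(path, default=""):
    """Read a procfs/sysfs file, returning `default` if it is missing."""
    try:
        return Path(path).read_text()
    except OSError:
        return default


def parse_cmdline(cmdline):
    """Split a kernel command line into {param: value}; bare flags map to ''."""
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value
    return params


def check_preempt_rt(realtime_flag):
    # PREEMPT_RT kernels expose /sys/kernel/realtime containing "1".
    if realtime_flag.strip() == "1":
        return Status.PASS, "PREEMPT_RT kernel"
    return Status.FAIL, "kernel is not PREEMPT_RT"


def check_cmdline_param(params, name):
    # Assumed severity: a missing isolation parameter is a WARN, not a FAIL.
    if params.get(name):
        return Status.PASS, f"{name}={params[name]}"
    return Status.WARN, f"{name} not set on kernel cmdline"


def check_clocksource(current):
    # Assumption: tsc is the preferred clocksource (typical for x86 RT hosts).
    current = current.strip()
    if current == "tsc":
        return Status.PASS, "clocksource=tsc"
    return Status.WARN, f"clocksource={current or 'unknown'} (tsc preferred)"


def run_checks():
    # Containers share the host kernel, so /proc/cmdline shows the host's boot parameters.
    params = parse_cmdline(read_text("/proc/cmdline"))
    results = [("PREEMPT_RT", *check_preempt_rt(read_text("/sys/kernel/realtime", "0")))]
    for name in ("isolcpus", "nohz_full", "rcu_nocbs", "irqaffinity"):
        results.append((name, *check_cmdline_param(params, name)))
    clocksource = read_text("/sys/devices/system/clocksource/clocksource0/current_clocksource")
    results.append(("clocksource", *check_clocksource(clocksource)))
    for name, status, detail in results:
        # One plain line per check so it stays readable in a cloud log viewer.
        print(f"[{status.value}] {name}: {detail}")
    return results


if __name__ == "__main__":
    run_checks()
```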

Container improvements:

  • detect_cpus() reads the effective cores from the RT_BENCHMARK_CORES environment variable (set by the entrypoint) and falls back to cgroup, /proc, then sysconf
  • Record the effective cores and housekeeping core in sysinfo.json
  • Pass run.docker=false and run.interactive=false for in-container execution
  • Fix cyclictest: remove the redundant chrt (the priority is already set by cyclictest -p 95)
  • PQOS made optional (pqos.enable flag)
  • BIOS settings collection via Redfish
  • Reboot testing support with systemd service auto-setup
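A sketch of the `detect_cpus()` fallback order listed above, assuming `RT_BENCHMARK_CORES` holds a kernel-style cpulist and that cgroup v2 is mounted at `/sys/fs/cgroup`. The `(cores, source)` return shape and the source labels are hypothetical, not the PR's actual signature.

```python
import os
from pathlib import Path


def parse_cpulist(text):
    """Parse kernel cpulist notation such as '2-5,8' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return sorted(cpus)


def detect_cpus(cgroup_file="/sys/fs/cgroup/cpuset.cpus.effective"):
    """Return (cores, source), trying the most specific source first."""
    # 1. Set by the entrypoint after it reserved the housekeeping core.
    env = os.environ.get("RT_BENCHMARK_CORES", "").strip()
    if env:
        return parse_cpulist(env), "env"
    # 2. cgroup v2 cpuset granted to the container.
    try:
        cores = parse_cpulist(Path(cgroup_file).read_text())
        if cores:
            return cores, "cgroup"
    except (OSError, ValueError):
        pass
    # 3. Affinity list the kernel reports for this process.
    try:
        for line in Path("/proc/self/status").read_text().splitlines():
            if line.startswith("Cpus_allowed_list:"):
                return parse_cpulist(line.split(":", 1)[1]), "proc"
    except (OSError, ValueError):
        pass
    # 4. Last resort: assume every online CPU is usable.
    return list(range(os.sysconf("SC_NPROCESSORS_ONLN"))), "sysconf"
```

Checking the environment first means benchmarks never see the housekeeping core, even though the cgroup cpuset still includes it.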

Compatibility with demo mode (from main):

Checks and balances

  • Tests added and run
  • (External) Interfaces documented
  • Has security implications (describe below)

Type of change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing
    functionality to not work as expected)

Related stories, issues and pulls

Security considerations

  • SSH server in containers uses pubkey-only authentication (password auth disabled)
  • SSH key is injected at build time via SSH_KEY build arg
  • BIOS/Redfish credentials should be passed via environment, not hardcoded (config has placeholder values)

uncleDecart and others added 13 commits March 1, 2026 22:54
Signed-off-by: Pavel Abramov <uncle.decart@gmail.com>
Signed-off-by: Pavel Abramov <uncle.decart@gmail.com>
Signed-off-by: Pavel Abramov <uncle.decart@gmail.com>
Signed-off-by: Pavel Abramov <uncle.decart@gmail.com>
Signed-off-by: Pavel Abramov <uncle.decart@gmail.com>
Signed-off-by: Pavel Abramov <uncle.decart@gmail.com>
Signed-off-by: Pavel Abramov <uncle.decart@gmail.com>
Signed-off-by: Pavel Abramov <uncle.decart@gmail.com>
Restructure caterpillar and cyclictest into self-contained containers
for deployment on EVE OS with full RT benchmarking support.

Build & deploy:
- Add build-all.sh with TAG= and optional registry push support
- SSH_KEY build arg for key-only auth (defaults to ~/.ssh/ztest_key.pub)
- BASE_TAG build arg to pin child images to versioned base
- Print build summary with full FQDN image tags

Base image (Dockerfile.base):
- Add bash, git, procps, ncurses-term, openssh-server
- Configure sshd with pubkey auth, start at container boot
- Add login banner (motd) with Jupyter/SSH/CLI instructions
- Add shell aliases: jupyter-start, rt-preflight, rt-info
- Expose ports 22 (SSH) and 8888 (Jupyter)
- Keep container alive after benchmark via sshd foreground

RT preflight checks (src/rt_preflight.py):
- 14-point validation: PREEMPT_RT, isolcpus, nohz_full, rcu_nocbs,
  irqaffinity, C-states, intel_pstate, governor, clocksource,
  NUMA balancing, split_lock, hugepages, capabilities, kernel threads
- Detects cmdline typos (e.g. rocessor.max_cstate)
- PASS/WARN/FAIL output visible in EVE OS cloud log viewer

Container improvements:
- Copy Python code, config, and notebooks into child images
- Add run.interactive flag: tqdm in terminal, brief logs in containers
- Fix detect_cpus to return actual core list from cgroup, not count
- Fix cyclictest: remove redundant chrt (handled by -p 95)
- Remove rdtset references, skip nested docker when run.docker=false
- Fix typo in main.py import (scr -> src)

Signed-off-by: Mikhail Malyshev <mike.malyshev@gmail.com>
Entrypoint dynamically splits the container cpuset at boot:
- First core → housekeeping (entrypoint, sshd, python, uv)
- Remaining cores → benchmarks only (exported as RT_BENCHMARK_CORES)

All service processes inherit the housekeeping affinity via taskset.
detect_cpus() reads RT_BENCHMARK_CORES first, so caterpillar/cyclictest
only receive the clean cores. No hardcoded core numbers — fully dynamic
from cgroup cpuset at runtime.

Signed-off-by: Mikhail Malyshev <mike.malyshev@gmail.com>
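`taskset -c <core> <cmd>` sets the CPU affinity before exec, and Linux passes the mask on to every process the command later forks. A Python equivalent for launching one service on the housekeeping core might look like this; `run_on_housekeeping` is a hypothetical helper and Linux is assumed.

```python
import os
import subprocess
import sys


def run_on_housekeeping(argv, core):
    """Run argv pinned to `core`, like `taskset -c <core> argv`; its children inherit the mask."""
    def pin():
        # Executes in the forked child just before exec(), so the caller's affinity is unchanged.
        os.sched_setaffinity(0, {core})

    return subprocess.run(argv, preexec_fn=pin, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    core = min(os.sched_getaffinity(0))
    # The child prints its own mask, which should contain only the housekeeping core.
    probe = [sys.executable, "-c", "import os; print(sorted(os.sched_getaffinity(0)))"]
    print(run_on_housekeeping(probe, core).stdout.strip())
```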
in sysinfo

- Replace exec sshd -D with sleep infinity to prevent container exit
  (sshd is already running in the background; a second instance failed
  on a port conflict)
- Move detect_cpus() before sysinfo collection so effective cores are
  captured in sysinfo.json under new "runtime" section (effective_cores,
  housekeeping_core, source, config_cores)

Signed-off-by: Mikhail Malyshev <mike.malyshev@gmail.com>
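A sketch of the new `runtime` section in sysinfo.json, using the four field names from the commit message. The helper names, the value types, the meanings given in the comments for `source` and `config_cores`, and writing to `/tmp/output` (the Dockerfile's results directory) are all assumptions.

```python
import json
from pathlib import Path


def add_runtime_section(sysinfo, effective_cores, housekeeping_core, source, config_cores):
    """Attach the runtime CPU layout to an existing sysinfo dict."""
    sysinfo["runtime"] = {
        "effective_cores": sorted(effective_cores),
        "housekeeping_core": housekeeping_core,
        "source": source,              # assumed: where detect_cpus() found the cores
        "config_cores": config_cores,  # assumed: cores requested in the config, if any
    }
    return sysinfo


def write_sysinfo(sysinfo, out_dir="/tmp/output"):
    """Serialize sysinfo to <out_dir>/sysinfo.json and return the path."""
    path = Path(out_dir) / "sysinfo.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(sysinfo, indent=2))
    return path
```

Because `detect_cpus()` now runs before sysinfo collection, its result is available to fill this section when the file is written.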
After rebase onto main, the demo_mode branch (from hde2e PR) is now
in the code path. Containers must explicitly opt out to avoid hitting
the DockerHDE2E path instead of DockerTestRunner.

Config default remains demo_mode=true for bare-metal HDE2E workflow.
@rucoder rucoder mentioned this pull request Mar 1, 2026
- Remove the cmdline typo detector for 'rocessor.max_cstate' — it was
  fragile and produced false positives.

- Parse isolcpus flags (managed_irq, domain, io_queue) properly instead
  of feeding them to _parse_cpulist which would crash on non-numeric
  tokens.

- Warn when managed_irq or domain flags are missing — without them the
  kernel still schedules IRQs and tasks onto isolated cores.  Note that
  when any flag is specified, 'domain' is no longer implied and must be
  listed explicitly.

- Warn when io_queue flag is missing on kernel 6.17+ — this flag
  prevents block-layer IO completion queues from landing on isolated
  cores.

- Support open-ended 1-N notation in CPU lists (meaning 'through the
  last available CPU'), as accepted by the kernel.

- Make _parse_cpulist resilient to non-numeric tokens (skip instead
  of crashing with ValueError).
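The isolcpus handling described in this commit can be sketched as follows. It assumes the flag set `nohz`, `domain`, `managed_irq` and `io_queue`, treats `N` as the last CPU (`nr_cpus - 1`), and takes the kernel release string as input. The function names and warning texts are hypothetical, not those of `_parse_cpulist` in the PR.

```python
import re

# isolcpus flags assumed here; any other non-numeric token is skipped.
ISOLCPUS_FLAGS = {"nohz", "domain", "managed_irq", "io_queue"}


def parse_cpulist(text, nr_cpus):
    """Parse a cpulist such as '0,2-3' or '1-N'; 'N' is the last CPU (nr_cpus - 1)."""
    last = nr_cpus - 1
    cpus = set()
    for token in text.split(","):
        lo, sep, hi = token.strip().partition("-")
        try:
            start = last if lo == "N" else int(lo)
            end = (last if hi == "N" else int(hi)) if sep else start
        except ValueError:
            continue  # skip non-numeric tokens instead of crashing
        cpus.update(range(start, end + 1))
    return sorted(cpus)


def parse_isolcpus(value, nr_cpus):
    """Split an isolcpus value like 'managed_irq,domain,2-7' into (flags, cpus)."""
    flags, cpu_tokens = set(), []
    for token in value.split(","):
        if token in ISOLCPUS_FLAGS:
            flags.add(token)
        else:
            cpu_tokens.append(token)
    return flags, parse_cpulist(",".join(cpu_tokens), nr_cpus)


def kernel_version(release):
    """'6.17.0-rt3' -> (6, 17); unparseable strings -> (0, 0)."""
    m = re.match(r"(\d+)\.(\d+)", release)
    return (int(m.group(1)), int(m.group(2))) if m else (0, 0)


def isolcpus_warnings(flags, release):
    warnings = []
    if "managed_irq" not in flags:
        warnings.append("isolcpus missing 'managed_irq' flag")
    # 'domain' is implied only when no flags are given at all.
    if flags and "domain" not in flags:
        warnings.append("isolcpus missing 'domain' flag")
    if kernel_version(release) >= (6, 17) and "io_queue" not in flags:
        warnings.append("isolcpus missing 'io_queue' flag (kernel 6.17+)")
    return warnings
```

On a live system the inputs could come from `os.cpu_count()` and `platform.release()`, e.g. `isolcpus_warnings(*parse_isolcpus(value, os.cpu_count())[:1], platform.release())`.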
Review comment on caterpillar/Dockerfile (outdated):
# Output directory for results
RUN mkdir -p /tmp/output

ENTRYPOINT ["/entrypoint.sh", "uv", "run", "python", "main.py", "run.docker=false", "pqos.enable=false", "run.stressor=false", "run.interactive=false", "demo.demo_mode=false", "run.command=caterpillar"]
Member


The caterpillar container should just include the caterpillar binary needed to run it. The main Python script is what runs those tests, so this feels to me like a cyclic dependency.

@uncleDecart (Member) left a comment


LGTM; my only question is about the entrypoint for cyclictest and caterpillar.

…OS) variants

Restore caterpillar/Dockerfile and cyclictest/Dockerfile to their
original lightweight docker-host versions. The EVE OS standalone
containers (with SSH, Jupyter, cpuset pinning, entrypoint) are now
in Dockerfile.eve, Dockerfile.base.eve, caterpillar/Dockerfile.eve,
and cyclictest/Dockerfile.eve.

Rename build-all.sh to build-all-eve.sh and update it to reference
the .eve Dockerfiles.

Signed-off-by: Mikhail Malyshev <mike.malyshev@gmail.com>
@uncleDecart (Member)

Thank you @rucoder, looks amazing!

@uncleDecart merged commit 6fe6d54 into main on Mar 5, 2026
3 checks passed

2 participants