Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] scripts/build-tools.sh race can corrupt development topologies #5067

Closed
marc-hb opened this issue Dec 10, 2021 · 5 comments
Closed

[BUG] scripts/build-tools.sh race can corrupt development topologies #5067

marc-hb opened this issue Dec 10, 2021 · 5 comments
Labels
bug Something isn't working as expected

Comments

@marc-hb
Copy link
Collaborator

marc-hb commented Dec 10, 2021

SOF main branch, commit 7c6186a

Run these:

NO_PROCESSORS=1 ./scripts/build-tools.sh
NO_PROCESSORS=2 ./scripts/build-tools.sh
NO_PROCESSORS=400 ./scripts/build-tools.sh

Save and compare the output sof/tools/build_tools directories each time and compare them. They don't have the same content. In one case some development topologies files are likely to be corrupted.

I learned this the really hard way while testing a totally unrelated CMake fix.

Finer-grained reproduction:

./scripts/build-tools.sh -C # really clean
cd sof/build_tools

make dev_topologies1  # builds dev topologies in topology/topology1/develoment/
make topologies1      # "installs" topology/topology1/* files one level up, INCLUDING development topologies

  => dev topologies are part of the "installed" topologies.

Compare with:

./scripts/build-tools.sh -C # really clean
cd sof/build_tools

# "installs" topology/topology1/* files one level up while MISSING development topologies
# that are not built yet
make topologies1
# builds dev topologies in topology/topology1/develoment/
make dev_topologies1

  => dev topologies are NOT part of the "installed" topologies.

Build systems are highly parallel, so a sequencing problem like this is a red flag: a race condition red flag.
The simple problem is topologies and (optional) dev_topologies1 run concurrently. This would be great if the installer of the former didn't accidentally grab whatever the latter is still building. That makes the number of development topologies "installed" VARIABLE.

Much WORSE and how I actually found this in the first place, it can happen that some topology files are TRUNCATED because they're installed before they're finished.

cc: @ranj063 from commit c0bee42 / PR #4363

@marc-hb marc-hb added the bug Something isn't working as expected label Dec 10, 2021
@lgirdwood
Copy link
Member

@marc-hb can you add a cmake dependency between them or fix the script to work around this until a proper solution is found ?

@marc-hb
Copy link
Collaborator Author

marc-hb commented Dec 10, 2021

I can think of a number of "quick fixes" but I could first use some idea of what purpose development topologies serve. Maybe a quick fix is not to copy / "install" them one level up if no one cares about them.

@plbossart
Copy link
Member

plbossart commented Dec 10, 2021

'Development' topologies are typically added for a specific purpose such as
a) testing an interface that requires custom hardware not widely available
b) adding paths for testing (e.g. nocodec-ci), which extends a baseline
c) added modules that are not or not yet part of the baseline.

The intent is that they are developer-maintained, but not suitable for release or use by non-developers. there are no plans for any backwards compatibility with e.g. UCM files.

As such these topologies do not belong in sof-bin. Whoever needs those topologies should use the git tree and know what they are doing.

Does this help?

marc-hb added a commit to marc-hb/sof that referenced this issue Dec 10, 2021
Move generated *.conf and *.tplg v1 files down from:

  build_tools/topology/topology1/*.{conf,tplg}

_to:

  build_tools/topology/topology1/production/*.{conf,tplg}

... then copy/"install" the production/* subdirectory two levels up.

This fixes the race condition thesofproject#5067 that also copied a random number of
development/ and dsp_enhancement/ topologies, sometimes even truncating
these.

In other words, this commit REMOVES the following two copies:

build_tools/topology/development/       # randomly corrupted copy, removed
build_tools/topology/dsp_enhancement/   # randomly corrupted copy, removed

build_tools/topology/topology1/development/    # real build dir, unchanged
build_tools/topology/topology1/dsp_enhancement # real build dir, unchanged

Production topologies are copied only to help with the v1->v2
transition. That duplication makes the build directory confusing enough,
no need to extend that copy to development topologies. A single instance
of development topologies in the build directory is enough.

This removal may break some CI script(s): this is perfect because such
CI script(s) were copying randomly corrupted data.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@marc-hb
Copy link
Collaborator Author

marc-hb commented Dec 10, 2021

Does this help?

It does thank you!

This took a while (always with CMake...) but I think I have a pretty good fix: #5071

lgirdwood pushed a commit that referenced this issue Dec 13, 2021
Move generated *.conf and *.tplg v1 files down from:

  build_tools/topology/topology1/*.{conf,tplg}

_to:

  build_tools/topology/topology1/production/*.{conf,tplg}

... then copy/"install" the production/* subdirectory two levels up.

This fixes the race condition #5067 that also copied a random number of
development/ and dsp_enhancement/ topologies, sometimes even truncating
these.

In other words, this commit REMOVES the following two copies:

build_tools/topology/development/       # randomly corrupted copy, removed
build_tools/topology/dsp_enhancement/   # randomly corrupted copy, removed

build_tools/topology/topology1/development/    # real build dir, unchanged
build_tools/topology/topology1/dsp_enhancement # real build dir, unchanged

Production topologies are copied only to help with the v1->v2
transition. That duplication makes the build directory confusing enough,
no need to extend that copy to development topologies. A single instance
of development topologies in the build directory is enough.

This removal may break some CI script(s): this is perfect because such
CI script(s) were copying randomly corrupted data.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@lgirdwood
Copy link
Member

@marc-hb can we close ?

@marc-hb marc-hb closed this as completed Dec 13, 2021
@marc-hb marc-hb changed the title [BUG] scripts/build-tools.sh can corrupt development topologies [BUG] scripts/build-tools.sh race can corrupt development topologies Dec 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working as expected
Projects
None yet
Development

No branches or pull requests

3 participants