Skip to content

Commit 34a9bf8

Browse files
ingydotnetakshat-sjaaronbronownitzmahone
authored
Dos in merge key (#937)
* Make local test targets self-contained Most developer machines do not have libyaml headers installed, so the old default test flow attempted to build the optional extension, failed to find yaml.h, and then only exercised the pure Python fallback. The extension target also required out-of-band system setup before it could pass. Split the Makefile test flow into explicit pure-Python and LibYAML targets, and make the default test target run both. The pure-Python target now forces the extension off during build, while the LibYAML target builds a pinned local yaml/libyaml checkout and wires the include, library, and runtime paths into the extension build. Document the new Makefile workflow in the README, including make test, the individual test targets, the pinned LIBYAML-REF default, and the PYTHON-VERSION override for testing against a specific Python. This makes both test modes pass from a clean checkout without requiring developers to install libyaml development packages by hand. * Fix exponential expansion DoS via duplicate merge key aliases * Fix exponential expansion based DoS in merge key processing Resolves #897 Supercedes #916 ### Summary The merge key (`<<`) constructor implementation in `SafeConstructor.flatten_mapping()` was vulnerable to an exponential time and memory complexity Denial of Service (DoS) vulnerability. When mapping/sequence nodes are merged using anchors/aliases, duplicate references to the same alias point to the same MappingNode instance in Python. During merge key processing, the node values are copied and extended in-place. If the same node appears multiple times at different levels, this causes exponential amplification of the elements list: `2^(n+1) - 1`. A small document under 1 KB can trigger millions of element list extensions, exhausting CPU and memory during safe loading. ### Hardened Fix This commit resolves the vulnerability and hardens it against secondary vectors: 1. Tracks node identity using object ID (`id(node)`) in a single `seen` set scoped to the parent mapping's `flatten_mapping()` execution. 2. Checks and skips duplicate node references inside SequenceNode merge keys (resolving PR-916). 3. Checks and skips duplicate node references across separate, independent MappingNode merge keys in the same mapping (e.g., repeating `<<: *anchor` multiple times). 4. Ensures C-based loaders (e.g., `CSafeLoader`, `CLoader`) are also protected since they inherit constructor logic from `SafeConstructor`. ### Performance Impact - Sequence-nested merge duplicates: Loading a 22-level nested document drops from 3.76s to 0.0028s (O(N) linear complexity). - Mapping-level merge duplicates: Loading a 20-level nested document drops from 0.93s to 0.0026s. ### Tests - Added regression tests to `tests/legacy_tests/data/construct-merge.data` and `tests/legacy_tests/data/construct-merge.code` covering both duplicate sequence merges and duplicate direct merges. * Fix CI toolchain drift failures The PR was failing before it reached the PyYAML regression tests in two places: the cp38 wheel jobs installed latest cibuildwheel, whose 4.x series no longer supports Python 3.8, and the Windows arm64 libyaml build used a newer CMake that rejects libyaml 0.2.5's old minimum policy version. Pin the cp38 wheel jobs to cibuildwheel<4, and quote the matrix-provided pip package spec so bash does not parse the '<4' constraint as input redirection. Pass CMAKE_POLICY_VERSION_MINIMUM=3.5 to the Windows libyaml configure step, and disable fail-fast for the Windows libyaml matrix so one architecture does not hide the others. * Add merge key deduplication regression tests * Add time-bounded DoS prevention tests to verify fix * Strengthen merge key DoS timeout tests * Fix merge key fan-out expansion DoS * Use structural merge DoS regression tests Replace the timing-based merge DoS tests with deterministic checks of the flattened mapping shape. Wall-clock thresholds are fragile on slow or emulated CI runners and do not directly prove that the expansion was prevented. The fan-out regression case uses a small levels=4, width=4 payload from the PR #916 shape. With the old behavior, flattening root expands to 1024 repeated key/value pairs and the test fails immediately because the key list contains 1020 extra entries. With the fix, duplicate merge pairs are collapsed during flattening and the same root mapping contains only k0, k1, k2, and k3. This keeps the test fast even if the old bug returns: it observes the structural amplification directly instead of waiting for a large pathological input to time out. * Hybrid merge keys optimization * A hybrid combination of fixes proposed by frenzymadness and akshat-sj to prevent exponential compute while resolving nested anchor aliases. * Further optimization is possible with broader caching of anchor node graph, but would require careful design of invalidation policy. --------- Co-authored-by: Akshat Singh Jaswal <sja.akshat@gmail.com> Co-authored-by: Aaron Bronow <abronow@gmail.com> Co-authored-by: Matt Davis <nitzmahone@redhat.com>
1 parent d51d8a1 commit 34a9bf8

9 files changed

Lines changed: 240 additions & 48 deletions

File tree

.github/workflows/ci.yaml

Lines changed: 7 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -145,7 +145,7 @@ jobs:
145145
with:
146146
matrix_yaml: |
147147
include:
148-
- { platform: manylinux1, arch: x86_64, spec: cp38 }
148+
- { platform: manylinux1, arch: x86_64, spec: cp38, cibw_version: 'cibuildwheel<4' }
149149
- { platform: manylinux1, arch: x86_64, spec: cp39, omit: ${{ env.skip_ci_redundant_jobs }} }
150150
- { platform: manylinux2014, arch: x86_64, spec: cp310, omit: ${{ env.skip_ci_redundant_jobs }} }
151151
- { platform: manylinux2014, arch: x86_64, spec: cp311, omit: ${{ env.skip_ci_redundant_jobs }} }
@@ -237,7 +237,7 @@ jobs:
237237
set -eux
238238
239239
python3 -V
240-
python3 -m pip install -U --user ${{ matrix.cibw_version || 'cibuildwheel' }}
240+
python3 -m pip install -U --user "${{ matrix.cibw_version || 'cibuildwheel' }}"
241241
242242
mkdir pyyaml
243243
@@ -423,7 +423,7 @@ jobs:
423423
run: |
424424
set -eux
425425
python3 -V
426-
python3 -m pip install -U --user ${{ matrix.cibw_version || 'cibuildwheel' }}
426+
python3 -m pip install -U --user "${{ matrix.cibw_version || 'cibuildwheel' }}"
427427
mkdir pyyaml
428428
429429
tar zxf pyyaml*.tar.gz/pyyaml*.tar.gz --strip-components=1 -C pyyaml
@@ -450,6 +450,7 @@ jobs:
450450
name: libyaml windows ${{ matrix.arch }}
451451
runs-on: ${{ (matrix.arch == 'arm64') && 'windows-11-arm' || 'windows-2022' }}
452452
strategy:
453+
fail-fast: false
453454
matrix:
454455
include:
455456
- arch: x64
@@ -481,7 +482,7 @@ jobs:
481482
mkdir libyaml/build
482483
483484
pushd libyaml/build
484-
cmake.exe -G "Visual Studio 17 2022" -A ${{ matrix.arch }} -DYAML_STATIC_LIB_NAME=yaml ..
485+
cmake.exe -G "Visual Studio 17 2022" -A ${{ matrix.arch }} -DYAML_STATIC_LIB_NAME=yaml -DCMAKE_POLICY_VERSION_MINIMUM=3.5 ..
485486
cmake.exe --build . --config Release
486487
popd
487488
@@ -500,6 +501,7 @@ jobs:
500501
### x86_64 jobs
501502
- spec: cp38-win_amd64
502503
arch: x64
504+
cibw_version: cibuildwheel<4
503505
504506
- spec: cp39-win_amd64
505507
arch: x64
@@ -620,7 +622,7 @@ jobs:
620622
run: |
621623
set -eux
622624
python -V
623-
python -m pip install -U --user ${{ matrix.cibw_version || 'cibuildwheel' }}
625+
python -m pip install -U --user "${{ matrix.cibw_version || 'cibuildwheel' }}"
624626
mkdir pyyaml
625627
626628
tar zxf pyyaml*.tar.gz/pyyaml*.tar.gz --strip-components=1 -C pyyaml

.gitignore

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,7 @@
11
# build outputs
2-
/dist/*
3-
/build/*
2+
/.cache/
3+
/dist/
4+
/build/
45
/lib/PyYAML.egg-info/*
56
/wheelhouse/*
67
/yaml/_yaml.c

Makefile

Lines changed: 81 additions & 33 deletions
Original file line numberDiff line numberDiff line change
@@ -1,48 +1,96 @@
1+
R := https://github.com/makeplus/makes
2+
C := b0edc13d2045c1a06a92752ea4bfb48cbb59cc4b
3+
M := $(or $(MAKES_REPO_DIR),.cache/makes)
4+
$(shell [ -d '$M' ] || git clone -q $R '$M')
5+
$(shell cd '$M' && [ "$$(git rev-parse HEAD)" = '$C' ] || \
6+
{ git fetch -q origin && git checkout -q '$C'; })
17

2-
.PHONY: build dist
8+
include $M/init.mk
9+
10+
export UV_CACHE_DIR = $(LOCAL-CACHE)/uv
11+
PYTHON-VENV-SETUP = uv pip install Cython pytest setuptools wheel
12+
include $M/python.mk
13+
include $M/clean.mk
14+
include $M/shell.mk
15+
16+
PYTHON-DEPS = $(PYTHON) $(PYTHON-VENV)
17+
PYTEST-PYTHON = \
18+
import sys; \
19+
sys.path.insert(0, 'build/lib'); \
20+
import pytest; \
21+
raise SystemExit(pytest.main())
22+
23+
PYTEST-LIBYAML = \
24+
import pathlib, sys; \
25+
sys.path.insert(0, next(str(p) for p in pathlib.Path('build').glob('lib.*'))); \
26+
import pytest; \
27+
raise SystemExit(pytest.main())
328

4-
PYTHON=/usr/bin/python3
529
TEST=
630
PARAMETERS=
731

8-
build:
9-
${PYTHON} setup.py build ${PARAMETERS}
32+
LIBYAML-REPO ?= https://github.com/yaml/libyaml
33+
LIBYAML-REF ?= 0.2.5
34+
LIBYAML-DIR := $(LOCAL-CACHE)/libyaml-$(LIBYAML-REF)
35+
LIBYAML-BUILD := $(LIBYAML-DIR)/src/.libs/libyaml.$(SO)
36+
LIBYAML-INCLUDE := $(LIBYAML-DIR)/include
37+
LIBYAML-LIB := $(LIBYAML-DIR)/src/.libs
38+
LIBYAML-ENV := \
39+
CFLAGS="-I$(LIBYAML-INCLUDE) $${CFLAGS:-}" \
40+
LDFLAGS="-L$(LIBYAML-LIB) $${LDFLAGS:-}" \
41+
LD_LIBRARY_PATH="$(LIBYAML-LIB):$${LD_LIBRARY_PATH:-}"
42+
43+
MAKES-CLEAN := \
44+
lib/PyYAML.egg-info/ \
45+
lib/yaml/__pycache__/ \
46+
tests/__pycache__/ \
47+
tests/legacy_tests/__pycache__/ \
48+
.pytest_cache/ \
49+
yaml/_yaml.c \
50+
build/ \
51+
dist \
52+
1053

11-
buildext:
12-
${PYTHON} setup.py --with-libyaml build ${PARAMETERS}
54+
build: $(PYTHON-DEPS)
55+
python setup.py build $(PARAMETERS)
1356

14-
force:
15-
${PYTHON} setup.py build -f ${PARAMETERS}
57+
build-python: $(PYTHON-DEPS)
58+
PYYAML_FORCE_LIBYAML=0 python setup.py --without-libyaml build $(PARAMETERS)
1659

17-
forceext:
18-
${PYTHON} setup.py --with-libyaml build -f ${PARAMETERS}
60+
buildext: $(PYTHON-DEPS) $(LIBYAML-BUILD)
61+
$(LIBYAML-ENV) python setup.py --with-libyaml build $(PARAMETERS)
1962

20-
install:
21-
${PYTHON} setup.py install ${PARAMETERS}
63+
force: $(PYTHON-DEPS)
64+
python setup.py build -f $(PARAMETERS)
2265

23-
installext:
24-
${PYTHON} setup.py --with-libyaml install ${PARAMETERS}
66+
forceext: $(PYTHON-DEPS) $(LIBYAML-BUILD)
67+
$(LIBYAML-ENV) python setup.py --with-libyaml build -f $(PARAMETERS)
2568

26-
test: build
27-
PYYAML_FORCE_LIBYAML=0 ${PYTHON} -I -m pytest
69+
install: $(PYTHON-DEPS)
70+
python setup.py install $(PARAMETERS)
2871

29-
testext: buildext
30-
PYYAML_FORCE_LIBYAML=1 ${PYTHON} -I -m pytest
72+
installext: $(PYTHON-DEPS) $(LIBYAML-BUILD)
73+
$(LIBYAML-ENV) python setup.py --with-libyaml install $(PARAMETERS)
3174

32-
testall:
33-
${PYTHON} -m pytest
75+
test: test-python test-libyaml
3476

35-
dist:
77+
test-python: build-python
78+
PYYAML_FORCE_LIBYAML=0 python -I -c "$(PYTEST-PYTHON)"
79+
80+
test-libyaml: buildext
81+
$(LIBYAML-ENV) PYYAML_FORCE_LIBYAML=1 python -I -c "$(PYTEST-LIBYAML)"
82+
83+
$(LIBYAML-BUILD):
84+
rm -fr $(LIBYAML-DIR)
85+
git clone --branch $(LIBYAML-REF) $(LIBYAML-REPO) $(LIBYAML-DIR)
86+
cd $(LIBYAML-DIR) && git reset --hard $(LIBYAML-REF)
87+
cd $(LIBYAML-DIR) && ./bootstrap
88+
cd $(LIBYAML-DIR) && ./configure --disable-dependency-tracking --with-pic --enable-shared=yes
89+
$(MAKE) -C $(LIBYAML-DIR)
90+
91+
dist: $(PYTHON-DEPS)
3692
@# No longer uploading a zip file to pypi
37-
@# ${PYTHON} setup.py --with-libyaml sdist --formats=zip,gztar
38-
${PYTHON} setup.py --with-libyaml sdist --formats=gztar
39-
40-
clean:
41-
${PYTHON} setup.py --with-libyaml clean -a
42-
rm -fr \
43-
dist/ \
44-
lib/PyYAML.egg-info/ \
45-
lib/yaml/__pycache__/ \
46-
tests/__pycache__/ \
47-
tests/legacy_tests/__pycache__/ \
48-
yaml/_yaml.c
93+
@# python setup.py --with-libyaml sdist --formats=zip,gztar
94+
python setup.py --with-libyaml sdist --formats=gztar
95+
96+
.PHONY: build dist

README.md

Lines changed: 17 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -27,7 +27,23 @@ If you don't trust the input YAML stream, you should use:
2727
## Testing
2828

2929
PyYAML includes a comprehensive test suite.
30-
To run the tests, type `python setup.py test`.
30+
31+
To run the complete local test suite, type:
32+
33+
make test
34+
35+
This creates a local Python environment, runs the pure Python tests, builds a
36+
local copy of LibYAML, and then runs the LibYAML extension tests. The local
37+
LibYAML build is pinned by `LIBYAML-REF`, which defaults to `0.2.5`.
38+
39+
To run only one test mode:
40+
41+
make test-python
42+
make test-libyaml
43+
44+
To test with a specific Python version:
45+
46+
make test PYTHON-VERSION=3.13.5
3147

3248
## Further Information
3349

lib/yaml/constructor.py

Lines changed: 9 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -179,27 +179,30 @@ def construct_scalar(self, node):
179179

180180
def flatten_mapping(self, node):
181181
merge = []
182+
merge_key_ids = set() # anchor referent node objects should have reference equality here
182183
index = 0
183184
while index < len(node.value):
184185
key_node, value_node = node.value[index]
185186
if key_node.tag == 'tag:yaml.org,2002:merge':
186187
del node.value[index]
187188
if isinstance(value_node, MappingNode):
188189
self.flatten_mapping(value_node)
189-
merge.extend(value_node.value)
190+
for pair in value_node.value:
191+
if (key_id := id(pair[0].value)) not in merge_key_ids:
192+
merge_key_ids.add(key_id)
193+
merge.append(pair)
190194
elif isinstance(value_node, SequenceNode):
191-
submerge = []
192195
for subnode in value_node.value:
193196
if not isinstance(subnode, MappingNode):
194197
raise ConstructorError("while constructing a mapping",
195198
node.start_mark,
196199
"expected a mapping for merging, but found %s"
197200
% subnode.id, subnode.start_mark)
198201
self.flatten_mapping(subnode)
199-
submerge.append(subnode.value)
200-
submerge.reverse()
201-
for value in submerge:
202-
merge.extend(value)
202+
for pair in subnode.value:
203+
if (key_id := id(pair[0].value)) not in merge_key_ids:
204+
merge_key_ids.add(key_id)
205+
merge.append(pair)
203206
else:
204207
raise ConstructorError("while constructing a mapping", node.start_mark,
205208
"expected a mapping or list of mappings for merging, but found %s"

tests/legacy_tests/data/construct-merge.code

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,4 +7,6 @@
77
{ "x": 1, "y": 2, "r": 10, "label": "center/big" },
88
{ "x": 1, "y": 2, "r": 10, "label": "center/big" },
99
{ "x": 1, "y": 2, "r": 10, "label": "center/big" },
10+
{ "x": 1, "y": 2, "r": 10, "label": "center/big" },
11+
{ "x": 1, "y": 2, "r": 10, "label": "center/big" },
1012
]

tests/legacy_tests/data/construct-merge.data

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -25,3 +25,14 @@
2525
<< : [ *BIG, *LEFT, *SMALL ]
2626
x: 1
2727
label: center/big
28+
29+
- # Duplicate sequence merge
30+
<< : [ *CENTER, *CENTER ]
31+
r: 10
32+
label: center/big
33+
34+
- # Duplicate direct merge
35+
<< : *CENTER
36+
<< : *CENTER
37+
r: 10
38+
label: center/big

tests/legacy_tests/test_constructor.py

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -301,4 +301,3 @@ def test_subclass_blacklist_types(data_filename, verbose=False):
301301
sys.modules['test_constructor'] = sys.modules['__main__']
302302
import test_appliance
303303
test_appliance.run(globals())
304-

0 commit comments

Comments
 (0)