bench: add auto feature engineering use-case report#2
Pull request overview
This PR strengthens FeatCopilot’s core API reliability and adds practical safety/benchmarking assets, including a new “auto feature engineering” use-case benchmark report and improved guardrails in the sklearn-compatible API and engines.
Changes:
- Add `leakage_guard` + early configuration validation to `AutoFeatureEngineer`, plus safer `__version__` fallback when package metadata is missing.
- Extend `TimeSeriesEngine` (row-wise series mode) and `RelationalEngine` (relationship dedupe + key validation), with corresponding tests and docs/examples.
- Add a focused use-case benchmark script/report and start introducing more realistic benchmark split logic.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/test_sklearn_compat.py | Adds regression test for importing from source checkout without dist metadata. |
| tests/test_engines.py | Adds coverage for new TimeSeries/Relational validation behaviors. |
| tests/test_autofeat.py | Adds tests for leakage guard + early config validation failures. |
| mkdocs.yml | Adds new examples to the documentation nav. |
| featcopilot/utils/validation.py | Introduces leakage-column detection helper. |
| featcopilot/utils/__init__.py | Exposes find_potential_leakage_columns at utils level. |
| featcopilot/transformers/sklearn_compat.py | Adds leakage_guard, config validation, and target_name plumbing. |
| featcopilot/engines/timeseries.py | Adds series_in_rows mode and time column validation. |
| featcopilot/engines/relational.py | Dedupes relationships and validates relationship keys in fit/transform. |
| featcopilot/__init__.py | Falls back to 0+unknown version when metadata is missing. |
| examples/time_aware_tabular_prototype.py | Adds a time-aware, leakage-safe workflow example. |
| docs/examples/time-aware-tabular.md | Documents the time-aware prototype + leakage guard usage. |
| docs/examples/relational-feature-engineering.md | Documents relational engine usage + guardrails. |
| benchmarks/use_cases/run_auto_feature_engineering_benchmark.py | Adds benchmark runner to generate the use-case report. |
| benchmarks/use_cases/README.md | Documents how to run the use-case benchmark. |
| benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md | Adds generated benchmark report output. |
| benchmarks/compare_tools/run_fe_tools_comparison.py | Adds split_benchmark_data() helper (currently not yet wired in). |
| PR_SUMMARY_feat_strengthen_core_api.md | Adds a branch/PR summary document. |
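The `0+unknown` version fallback listed for `featcopilot/__init__.py` can be sketched roughly as follows. This is a minimal illustration using only the standard library; the helper name `resolve_version` and the probe distribution name are assumptions made for the example, and the actual code in the PR may be structured differently.

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_version(dist_name: str, fallback: str = "0+unknown") -> str:
    """Return the installed distribution version, or a fallback string.

    `resolve_version` is a hypothetical helper for illustration only.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Typical for a source checkout that was never pip-installed:
        # there is no distribution metadata to read the version from.
        return fallback


# A distribution name that is assumed not to be installed anywhere.
missing = resolve_version("surely-not-an-installed-distribution-xyz")
```

Falling back instead of raising keeps `import featcopilot` usable from a plain checkout, which is what the regression test in `tests/test_sklearn_compat.py` is described as covering.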
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2 +/- ##
==========================================
+ Coverage 87.84% 88.18% +0.33%
==========================================
Files 35 36 +1
Lines 3702 3808 +106
==========================================
+ Hits 3252 3358 +106
Misses 450 450
@copilot, address all the comments.
Agent-Logs-Url: https://github.com/thinkall/featcopilot/sessions/81d365b5-07ae-4e09-aa5a-8508c9323f42 Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Address remaining PR #2 review comments:
- Move split_benchmark_data from compare_tools into a shared benchmarks/splits.py module so both the existing comparison benchmark and the new use-case benchmark route their splits through the same task-aware policy.
- Wire benchmarks/use_cases/run_auto_feature_engineering_benchmark.py to call split_benchmark_data instead of train_test_split directly, matching the PR summary's intent.
- Add target_name to the AutoFeatureEngineer class docstring under an "Other Parameters" section so it surfaces in help() and generated docs alongside the constructor params.
- Add tests/test_benchmark_splits.py covering both the helper's behavior (chronological vs stratified vs random) and a wiring regression check that benchmark scripts use the helper instead of calling train_test_split directly.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
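The task-aware policy mentioned above (chronological vs stratified vs random) might look something like the sketch below. It is written with the standard library only and works on row indices; the function name `split_indices`, its signature, and the task labels are assumptions for illustration, not the real `split_benchmark_data` API.

```python
import random
from collections import defaultdict
from typing import List, Optional, Sequence, Tuple


def split_indices(
    n_rows: int,
    task: str,
    labels: Optional[Sequence] = None,
    test_size: float = 0.2,
    seed: int = 42,
) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx) under a task-aware policy (illustrative)."""
    n_test = int(round(n_rows * test_size))
    rng = random.Random(seed)

    if task in {"forecasting", "timeseries"}:
        # Chronological: rows are assumed to be sorted by time already, so the
        # most recent rows become the test set and nothing is shuffled.
        split_idx = n_rows - n_test
        return list(range(split_idx)), list(range(split_idx, n_rows))

    if task == "classification" and labels is not None:
        # Stratified: sample the test fraction separately within each class.
        by_class = defaultdict(list)
        for idx, label in enumerate(labels):
            by_class[label].append(idx)
        test_idx: List[int] = []
        for indices in by_class.values():
            rng.shuffle(indices)
            test_idx.extend(indices[: int(round(len(indices) * test_size))])
        test_set = set(test_idx)
        train_idx = [i for i in range(n_rows) if i not in test_set]
        return train_idx, sorted(test_idx)

    # Fallback: plain random split.
    order = list(range(n_rows))
    rng.shuffle(order)
    return sorted(order[n_test:]), sorted(order[:n_test])


chrono_train, chrono_test = split_indices(10, "forecasting")
labels = [0] * 8 + [1] * 2
strat_train, strat_test = split_indices(
    10, "classification", labels=labels, test_size=0.5
)
```

Routing every benchmark script through one such helper means a time-ordered dataset cannot accidentally be shuffled by one script and split chronologically by another.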
The Python 3.9 CI job failed when collecting tests/test_benchmark_splits.py because importing benchmarks.splits triggers benchmarks/__init__.py, which eagerly imports benchmarks.datasets. That module uses PEP 604 union syntax (`str | None`) at function-definition time, which is only supported at runtime on Python 3.10+.
Add `from __future__ import annotations` to benchmarks/datasets.py so the annotations become lazy strings and the module imports cleanly under Python 3.9.
Verified locally with a fresh Python 3.9 environment: the full test suite (647 passed, 2 skipped) and the new benchmark split tests both pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
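The failure mode and the fix can be reproduced in isolation with a small sketch. The module body below is a hypothetical stand-in for a file like `benchmarks/datasets.py`, and executing source text with `exec` is only a way to simulate an import inside one self-contained snippet.

```python
import sys

# Hypothetical stand-in for a module that uses PEP 604 unions in annotations.
MODULE_BODY = """
def load_dataset(name: str, cache_dir: str | None = None) -> None:
    return None
"""


def import_source(source: str) -> dict:
    """Execute module source in a fresh namespace, roughly like an import."""
    namespace: dict = {}
    exec(compile(source, "<datasets_sketch>", "exec"), namespace)
    return namespace


# Without the future import, `str | None` is evaluated when `def` runs,
# which raises TypeError on Python versions before 3.10.
try:
    import_source(MODULE_BODY)
    eager_ok = True
except TypeError:
    eager_ok = False

# With the future import, annotations are stored as plain strings and are
# never evaluated at definition time.
lazy = import_source("from __future__ import annotations\n" + MODULE_BODY)
annotations = lazy["load_dataset"].__annotations__
```

On Python 3.9 the first attempt fails while the second succeeds, which mirrors why a one-line import was enough to unblock test collection.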
Pull request overview
Adds a new “auto feature engineering” use-case benchmark + report, while strengthening core API reliability and benchmark realism (split policy, preprocessing hardening, and safer imports).
Changes:
- Introduce `benchmarks.splits.split_benchmark_data()` and wire it into benchmark scripts with tests to prevent regressions.
- Harden core library behavior: source-checkout import version fallback, early config validation + leakage-guard in `AutoFeatureEngineer`, plus updates to `TimeSeriesEngine` and `RelationalEngine`.
- Add a focused use-case benchmark script/report and new documentation examples (time-aware tabular + relational FE).
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/test_utils.py | Adds unit coverage for leakage-column detection on non-string column labels. |
| tests/test_sklearn_compat.py | Adds regression test for featcopilot import when package metadata is missing. |
| tests/test_engines.py | Adds coverage for TimeSeriesEngine.series_in_rows and relational relationship validation errors. |
| tests/test_benchmark_splits.py | New behavioral + wiring tests ensuring benchmark scripts use shared split helper. |
| tests/test_autofeat.py | Adds tests for leakage guard modes and early configuration validation. |
| mkdocs.yml | Adds nav entries for new documentation examples. |
| featcopilot/utils/validation.py | New leakage-column detection helper. |
| featcopilot/utils/__init__.py | Exposes find_potential_leakage_columns in utils public API. |
| featcopilot/transformers/sklearn_compat.py | Adds leakage guard, config validation, and forwards target_name + transform kwargs in fit_transform. |
| featcopilot/engines/timeseries.py | Adds series_in_rows mode and validates time_column. |
| featcopilot/engines/relational.py | Dedupes relationships and validates relationship keys at fit/transform time. |
| featcopilot/__init__.py | Falls back to 0+unknown when distribution metadata is unavailable. |
| examples/time_aware_tabular_prototype.py | New end-to-end time-aware workflow example (temporal split + leakage-safe FE). |
| docs/examples/time-aware-tabular.md | Documents the time-aware prototype + leakage guard behavior. |
| docs/examples/relational-feature-engineering.md | Documents relational engine usage + new guardrails. |
| benchmarks/use_cases/run_auto_feature_engineering_benchmark.py | New use-case benchmark runner comparing baseline vs FeatCopilot vs optional tools. |
| benchmarks/use_cases/README.md | Documents how to run the new use-case benchmark. |
| benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md | Adds a committed benchmark report artifact. |
| benchmarks/splits.py | New shared benchmark split policy implementation. |
| benchmarks/datasets.py | Adds __future__ annotations import. |
| benchmarks/compare_tools/run_fe_tools_comparison.py | Switches to shared split helper. |
| PR_SUMMARY_feat_strengthen_core_api.md | Adds a branch/PR summary document for reviewers. |
The PR summary belongs in the GitHub PR description, not committed to the repository. Its content has been moved to PR #2's body so the file no longer needs to live in source control. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Address the latest PR review feedback on the strengthen-core-api branch:
* AutoFeatureEngineer (sklearn_compat):
- Default collection-valued constructor args (`engines`,
`selection_methods`, `llm_config`) using `is not None` instead of
truthiness so explicit empty containers and identity-bearing
arguments are preserved. This is also what sklearn's `clone()`
round-trip identity check requires; previously cloning the
estimator raised `RuntimeError: Cannot clone object ... constructor
either does not set or modifies parameter llm_config`.
- Make `set_params` tolerate `None` for the same parameters (common in
`GridSearchCV` parameter grids) by normalizing to defaults before
`_validate_configuration` is called, so callers no longer hit
`TypeError: 'NoneType' object is not iterable` from `set(None)`.
- Add tests covering the None-normalization in `set_params`,
validation of non-None values, and the sklearn `clone` round trip.
* Auto feature engineering use-case benchmark:
- In the Featuretools case, drop the assignment from
`train_copy.ww.init(...)`. `Accessor.init` mutates in place and
returns `None`, so the previous code clobbered the DataFrame and
later raised `'NoneType' object has no attribute 'columns'` when
building the EntitySet.
- In the autofeat case, convert `y_train` to a numpy array via
`np.asarray(y_train).ravel()` before calling `fit_transform`, so
autofeat receives the 1-D ndarray it expects regardless of whether
the caller passes a Series or array.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
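The `clone()` problem described above comes down to object identity. The sketch below does not import scikit-learn; `roundtrip_preserves_identity` is a simplified stand-in for the check `sklearn.base.clone` performs after re-constructing an estimator, and both classes are hypothetical.

```python
class TruthyDefault:
    """Defaults with `or`: an explicit empty dict is silently replaced."""

    def __init__(self, llm_config=None):
        self.llm_config = llm_config or {}


class ExplicitDefault:
    """Defaults with `is not None`: whatever the caller passed is stored as-is."""

    def __init__(self, llm_config=None):
        self.llm_config = llm_config if llm_config is not None else {}


def roundtrip_preserves_identity(cls, value) -> bool:
    """Simplified stand-in for clone's post-construction identity check.

    clone passes each parameter to the constructor and then verifies that the
    stored attribute *is* the object it passed in.
    """
    return cls(llm_config=value).llm_config is value


empty_config: dict = {}
truthy_ok = roundtrip_preserves_identity(TruthyDefault, empty_config)
explicit_ok = roundtrip_preserves_identity(ExplicitDefault, empty_config)
```

With the truthiness version, an empty dict is swapped for a fresh one, so the identity check fails and a real `clone()` would report that the constructor modified the parameter.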
Pull request overview
This PR strengthens FeatCopilot’s sklearn-compatible API with earlier validation and a lightweight leakage guard, hardens the time-series/relational engines, and improves benchmark realism by centralizing task-aware splitting plus adding an auto feature engineering use-case benchmark + report.
Changes:
- Add leakage detection utilities and wire a configurable leakage guard + early config validation into `AutoFeatureEngineer`.
- Extend `TimeSeriesEngine` (row-wise series mode + clearer `time_column` validation) and harden `RelationalEngine` (relationship dedupe + eager key validation).
- Introduce `benchmarks/splits.py` for task-aware splitting and add a focused auto-FE use-case benchmark with wiring tests.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| featcopilot/transformers/sklearn_compat.py | Adds leakage guard, early config validation, forwards transform kwargs in fit_transform, expands sklearn-compat set_params. |
| featcopilot/utils/validation.py | New helper to detect potential leakage-prone columns via keyword + fuzzy target matching. |
| featcopilot/utils/__init__.py | Exposes find_potential_leakage_columns via utils public API. |
| featcopilot/engines/timeseries.py | Adds series_in_rows mode and improves time_column validation/logging. |
| featcopilot/engines/relational.py | Deduplicates relationships and validates relationship keys in fit/transform. |
| featcopilot/__init__.py | Makes __version__ resilient to missing distribution metadata (source checkout import). |
| benchmarks/splits.py | New shared split helper implementing chronological/stratified/random policies. |
| benchmarks/compare_tools/run_fe_tools_comparison.py | Replaces ad-hoc train_test_split usage with split_benchmark_data. |
| benchmarks/datasets.py | Adds from __future__ import annotations for Py3.9 typing compatibility. |
| benchmarks/use_cases/run_auto_feature_engineering_benchmark.py | New interaction-heavy auto-FE benchmark script. |
| benchmarks/use_cases/README.md | Documents how to run the use-case benchmark. |
| benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md | Checked-in generated report for the new use-case benchmark. |
| tests/test_utils.py | Adds tests for leakage detection with non-string column labels. |
| tests/test_autofeat.py | Adds tests for leakage guard modes and early validation failures. |
| tests/test_engines.py | Adds coverage for series_in_rows and relational relationship key validation. |
| tests/test_sklearn_compat.py | Adds sklearn clone/set_params behavior tests and import-without-metadata regression test. |
| tests/test_benchmark_splits.py | New tests for split helper behavior + wiring regression checks. |
| examples/time_aware_tabular_prototype.py | New end-to-end example demonstrating time-aware splitting + leakage-safe workflow. |
| docs/examples/time-aware-tabular.md | New docs page for the time-aware example and leakage guard usage. |
| docs/examples/relational-feature-engineering.md | New docs page for relational feature engineering usage/guardrails. |
| mkdocs.yml | Adds the new example docs pages to the navigation. |
Address the latest PR review comments on the strengthen-core-api branch:
* AutoFeatureEngineer.set_params:
- Validate parameter keys against ``self.get_params(deep=True)`` and
raise ``ValueError`` on unknown keys, matching scikit-learn's
``BaseEstimator.set_params`` convention. Previously typos like
``afe.set_params(typo_param=42)`` were silently accepted via
``setattr``, which masked bugs and broke tooling that expects
sklearn-style validation. Documented the contract in the docstring
and added regression tests covering the unknown-key error and that
a failing call leaves the estimator unmutated.
* Auto feature engineering use-case benchmark:
- Drop the redundant one-hot encoding/alignment in ``run_baseline``;
``evaluate_auc`` already owns that step (and applies it
idempotently for the FE-tool runners that already produce numeric
matrices). The baseline now passes raw frames straight through and
reports ``n_features`` from the post-encoding column count so the
metric still matches what the model actually trains on.
- Sanitize ``±inf`` in ``align_and_fill`` (Featuretools'
``divide_numeric`` primitive can emit infinities on
zero-denominator rows, which then crashes ``StandardScaler``). NaN
handling is preserved.
- Regenerate the committed report so its status text reflects what
the current script actually does. With the upstream-bug fixes in
the previous commit, the Featuretools case now succeeds end to
end; the autofeat case still fails, but with the genuine current
error (sklearn ≥ 1.6 dropped the ``force_all_finite`` kwarg that
autofeat 2.1.x still passes) instead of the stale
``'Series' object has no attribute 'ravel'`` symptom.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
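The `±inf` sanitization step can be illustrated without pandas. The sketch below is a plain-Python approximation: `safe_divide` imitates how a vectorized division primitive behaves on zero denominators, and the fill-with-zero policy in `sanitize` is an assumption, since the commit message does not say how `align_and_fill` fills missing values.

```python
import math
import statistics
from typing import List


def safe_divide(numerator: float, denominator: float) -> float:
    """Imitate vectorized division: x / 0 yields +/-inf, and 0 / 0 yields NaN."""
    if denominator == 0:
        if numerator == 0:
            return math.nan
        return math.copysign(math.inf, numerator)
    return numerator / denominator


def sanitize(values: List[float], fill_value: float = 0.0) -> List[float]:
    """Replace NaN and +/-inf with fill_value (hypothetical fill policy)."""
    return [v if math.isfinite(v) else fill_value for v in values]


def standardize(values: List[float]) -> List[float]:
    """Zero-mean, unit-variance scaling; only meaningful on finite inputs."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


pairs = [(1.0, 2.0), (3.0, 0.0), (-4.0, 0.0), (0.0, 0.0)]
ratios = [safe_divide(n, d) for n, d in pairs]
clean = sanitize(ratios)
scaled = standardize(clean)
```

Sanitizing before scaling matters because a single infinite entry makes the column mean infinite, after which every standardized value in that column is undefined.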
Pull request overview
This PR strengthens FeatCopilot’s sklearn-compatible API (earlier validation, leakage guard, safer imports), hardens time-series/relational engines, and improves benchmark realism by centralizing task-aware splitting plus adding a focused auto-FE use-case benchmark and report.
Changes:
- Add leakage detection utilities + AutoFeatureEngineer leakage guard, early config validation, and improved sklearn `set_params` behavior.
- Harden `TimeSeriesEngine` (row-wise series mode + clearer `time_column` validation) and `RelationalEngine` (relationship dedupe + eager key validation).
- Introduce shared `benchmarks.splits.split_benchmark_data()` and wire benchmark scripts through it; add new use-case benchmark, docs, and tests.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| featcopilot/transformers/sklearn_compat.py | Adds leakage guard, early validation, target_name, forwards kwargs in fit_transform, and strengthens set_params. |
| featcopilot/utils/validation.py | New find_potential_leakage_columns() helper for fuzzy leakage column detection. |
| featcopilot/utils/__init__.py | Exposes leakage detection helper via featcopilot.utils. |
| featcopilot/engines/timeseries.py | Adds series_in_rows mode and time_column validation; refactors transform path. |
| featcopilot/engines/relational.py | Deduplicates relationships and validates relationship keys in fit/transform. |
| featcopilot/__init__.py | Makes __version__ import robust when package metadata is missing. |
| benchmarks/splits.py | New task-aware split helper (chronological vs stratified vs random). |
| benchmarks/compare_tools/run_fe_tools_comparison.py | Uses shared split helper instead of direct train_test_split. |
| benchmarks/use_cases/run_auto_feature_engineering_benchmark.py | Adds new interaction-heavy auto-FE use-case benchmark and report generation. |
| benchmarks/use_cases/README.md | Documents running the new use-case benchmark. |
| benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md | Committed generated benchmark report. |
| benchmarks/datasets.py | Adds from __future__ import annotations for Py3.9 typing compatibility. |
| tests/test_benchmark_splits.py | Adds behavioral + wiring tests for shared split helper. |
| tests/test_sklearn_compat.py | Adds sklearn set_params and import-metadata fallback regression coverage. |
| tests/test_utils.py | Adds tests for leakage detection with non-string column labels. |
| tests/test_engines.py | Adds tests for series_in_rows and relational key validation behavior. |
| tests/test_autofeat.py | Adds tests for leakage guard modes and early config validation. |
| examples/time_aware_tabular_prototype.py | Adds leakage-safe time-aware workflow example. |
| docs/examples/time-aware-tabular.md | Documents time-aware tabular example and leakage guard. |
| docs/examples/relational-feature-engineering.md | Documents relational engine usage and new guardrails. |
| mkdocs.yml | Adds the new examples to the docs nav. |
Address two PR #2 review findings on AutoFeatureEngineer:
1. ``fit`` previously appended into ``self._engine_instances`` and never cleared the prior dict or reset ``self._selector``. After changing ``engines`` via ``set_params`` (or after a previous ``fit_transform`` that built a selector), ``transform`` could run stale engines or apply a stale selection. ``fit`` now calls a new ``_reset_fit_state`` helper that mirrors the fit-derived attribute initialization in ``__init__`` (``_engine_instances``, ``_selector``, ``_feature_set``, ``_is_fitted``, ``_column_descriptions``, ``_task_description``) *before* doing any work, so a refit -- and even a fit that raises midway -- leaves the estimator in a clean, unfitted state instead of a partially-fitted one.
2. ``set_params`` mutated attributes via ``setattr`` and only then called ``_validate_configuration``. A failing call (e.g. an invalid ``engines``/``leakage_guard``/``max_features`` combination) left the estimator in a partially mutated, invalid state. The implementation now snapshots the pre-call values for every parameter in the request *before* applying any mutation (including the eventual ``None`` -> default normalization). On any exception, the snapshot is restored and the original error re-raised, so a failing ``set_params`` is a no-op from the caller's perspective.
Tests:
- test_set_params_invalid_value_rolls_back_state: combined invalid payload, every parameter restored.
- test_set_params_invalid_value_after_none_normalization_rolls_back: rollback restores the pre-call value rather than the None-normalized default.
- test_fit_resets_engine_instances_when_engines_change: removing an engine via set_params and refitting drops the stale fitted engine.
- test_fit_resets_selector_after_prior_fit_transform: a plain fit() following fit_transform() clears the selector so transform() does not re-apply the prior selection.
- test_fit_resets_state_when_called_after_failed_fit: a fit that raises mid-flight leaves _is_fitted=False, _engine_instances={} and transform() correctly errors out.
Full suite: 657 passed, 2 skipped (was 652, +5 new tests). Pre-commit clean (black, ruff, trailing-whitespace, EOF, etc.).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
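The snapshot-and-restore behavior of `set_params`, together with the unknown-key check from the earlier commit, can be sketched on a toy class. `TinyEstimator`, its two parameters, and the engine names in `SUPPORTED_ENGINES` are hypothetical and far smaller than the real `AutoFeatureEngineer`; the sketch only shows the ordering of validate, snapshot, mutate, and roll back.

```python
SUPPORTED_ENGINES = {"tabular", "timeseries", "relational"}  # assumed names


class TinyEstimator:
    """Toy estimator illustrating an atomic, validating set_params."""

    def __init__(self, engines=None, max_features=None):
        self.engines = engines if engines is not None else ["tabular"]
        self.max_features = max_features
        self._validate()

    def get_params(self):
        return {"engines": self.engines, "max_features": self.max_features}

    def _validate(self):
        unknown = set(self.engines) - SUPPORTED_ENGINES
        if unknown:
            raise ValueError(f"Unsupported engines: {sorted(unknown)}")
        if self.max_features is not None and self.max_features <= 0:
            raise ValueError("max_features must be positive")

    def set_params(self, **params):
        # 1. Reject unknown keys before touching any attribute.
        valid = self.get_params()
        for key in params:
            if key not in valid:
                raise ValueError(f"Invalid parameter {key!r}")
        # 2. Snapshot the pre-call values of everything about to change.
        snapshot = {key: getattr(self, key) for key in params}
        try:
            for key, value in params.items():
                if key == "engines" and value is None:
                    value = ["tabular"]  # None -> default normalization
                setattr(self, key, value)
            self._validate()
        except Exception:
            # 3. Restore the snapshot so a failed call is a no-op.
            for key, value in snapshot.items():
                setattr(self, key, value)
            raise
        return self


est = TinyEstimator(engines=["tabular"], max_features=10)
try:
    est.set_params(engines=["spaceship"], max_features=5)
    rolled_back = False
except ValueError:
    rolled_back = est.get_params() == {"engines": ["tabular"], "max_features": 10}
```

Taking the snapshot before the `None` normalization is what lets the rollback restore the caller's original value rather than the normalized default.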
Pull request overview
This PR strengthens FeatCopilot’s sklearn-compatible API with earlier validation and leakage detection, hardens time-series/relational engines, and improves benchmark realism by centralizing task-aware split policy plus adding an interaction-heavy auto-feature-engineering use-case benchmark.
Changes:
- Add leakage guard + early config validation to `AutoFeatureEngineer`, plus safer import-time version fallback.
- Add `split_benchmark_data()` and wire benchmark scripts through it; add a new use-case benchmark + report.
- Add engines hardening (`TimeSeriesEngine.series_in_rows`, relational relationship dedupe + key validation) and corresponding tests/docs/examples.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/test_utils.py | Adds tests for leakage detection with non-string column labels. |
| tests/test_sklearn_compat.py | Adds sklearn-compat regression tests for set_params behavior and package import fallback. |
| tests/test_engines.py | Adds coverage for series_in_rows and relational relationship key validation. |
| tests/test_benchmark_splits.py | Adds behavioral + wiring tests for the shared benchmark split helper. |
| tests/test_autofeat.py | Adds tests for leakage guard modes and early config validation failures. |
| mkdocs.yml | Wires new example docs pages into the MkDocs nav. |
| featcopilot/utils/validation.py | Introduces find_potential_leakage_columns() helper. |
| featcopilot/utils/__init__.py | Exposes find_potential_leakage_columns via featcopilot.utils. |
| featcopilot/transformers/sklearn_compat.py | Adds leakage guard, early config validation, target_name, atomic set_params, and fit-state resets. |
| featcopilot/engines/timeseries.py | Adds series_in_rows mode and clearer time_column validation/logging. |
| featcopilot/engines/relational.py | Dedupes relationships and validates relationship keys on fit/transform. |
| featcopilot/__init__.py | Makes __version__ robust to missing installed package metadata. |
| examples/time_aware_tabular_prototype.py | Adds an end-to-end time-aware, leakage-safer workflow example. |
| docs/examples/time-aware-tabular.md | Documents the time-aware tabular workflow + leakage guard behavior. |
| docs/examples/relational-feature-engineering.md | Documents relational engine usage + new guardrails. |
| benchmarks/use_cases/run_auto_feature_engineering_benchmark.py | Adds the interaction-heavy AFE use-case benchmark runner + report writer. |
| benchmarks/use_cases/README.md | Documents how to run the new use-case benchmark. |
| benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md | Adds a generated benchmark report artifact. |
| benchmarks/splits.py | Introduces centralized task-aware train/test split helper for benchmarks. |
| benchmarks/datasets.py | Adds from __future__ import annotations for Py3.9 typing compatibility. |
| benchmarks/compare_tools/run_fe_tools_comparison.py | Switches comparison benchmark to use split_benchmark_data. |
… test_size
Address two PR #2 review findings:
1. ``AutoFeatureEngineer._validate_configuration`` previously used ``sorted(set(self.engines) - SUPPORTED_ENGINES)`` (and the analogous call for ``selection_methods``) to build its error messages. If the caller passed a mix of non-string values (e.g. ``engines=[None, "spaceship"]``), ``sorted`` raised a confusing ``TypeError`` from the comparison rather than the intended ``ValueError``. The validator now rejects non-string entries up front with a clear ``ValueError`` that names the offending values and lists the supported names. ``set_params`` inherits the new check (and rolls back on it) because validation runs through the same code path.
2. ``benchmarks.splits.split_benchmark_data`` did not validate ``test_size`` for the chronological branch. Values like ``test_size <= 0`` or ``>= 1`` silently produced empty / overlapping splits, whereas the random branch (via ``train_test_split``) would have raised. The function now asserts ``0 < test_size < 1`` up front and additionally raises when the chronological ``split_idx`` would leave either side of the split empty (e.g. tiny datasets combined with an extreme ``test_size``), matching the fail-fast behavior of ``sklearn.model_selection.train_test_split``.
Tests added in ``tests/test_sklearn_compat.py``:
- ``test_validate_engines_rejects_non_string_entries``
- ``test_validate_selection_methods_rejects_non_string_entries``
- ``test_set_params_rejects_non_string_engine_entries_and_rolls_back``
Tests added in ``tests/test_benchmark_splits.py``:
- ``test_split_benchmark_data_rejects_out_of_range_test_size`` (parametrized over both random and chronological branches with 0.0, 1.0, -0.1, 1.5, 2)
- ``test_split_benchmark_data_chronological_rejects_empty_train_split``
- ``test_split_benchmark_data_chronological_single_row_dataset_raises``
Full suite: 667 passed, 2 skipped (was 657, +10 new). Pre-commit clean (black, ruff, trailing-whitespace, EOF, etc.).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
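The first finding is easy to reproduce: sorting a set that mixes `None` and strings fails inside the comparison, before any error message can be built. The sketch below shows that failure and one way to check types up front; `validate_names` and the contents of `SUPPORTED_ENGINES` are illustrative assumptions, not the library's actual validator.

```python
SUPPORTED_ENGINES = frozenset({"tabular", "timeseries", "relational"})  # assumed


def validate_names(values, supported, label):
    """Raise ValueError for non-string or unsupported entries (illustrative)."""
    non_strings = [v for v in values if not isinstance(v, str)]
    if non_strings:
        # Checked first, so sorted() below only ever sees comparable strings.
        raise ValueError(
            f"{label} must contain only strings; got {non_strings!r}. "
            f"Supported: {sorted(supported)}"
        )
    unknown = sorted(set(values) - supported)
    if unknown:
        raise ValueError(
            f"Unsupported {label}: {unknown}. Supported: {sorted(supported)}"
        )


bad_engines = [None, "spaceship"]

# Old pattern: the set difference still contains None and "spaceship",
# and sorted() cannot order None against a str.
try:
    sorted(set(bad_engines) - SUPPORTED_ENGINES)
    old_error = None
except TypeError as exc:
    old_error = type(exc).__name__

# New pattern: a ValueError that names the offending values.
try:
    validate_names(bad_engines, SUPPORTED_ENGINES, "engines")
    new_error = None
except ValueError as exc:
    new_error = str(exc)
```

Raising `ValueError` in both cases keeps the error type consistent for callers that catch configuration problems.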
Pull request overview
This PR strengthens FeatCopilot’s sklearn-compatible API with earlier validation and lightweight leakage detection, hardens the time-series and relational engines, and improves benchmark realism by centralizing task-aware train/test split behavior and adding a focused auto-FE use-case benchmark/report.
Changes:
- Add leakage detection utilities and wire a configurable leakage guard into `AutoFeatureEngineer` (plus early config validation + more sklearn-compatible `set_params` behavior).
- Harden `TimeSeriesEngine` (row-wise series mode + clearer `time_column` validation) and `RelationalEngine` (relationship dedupe + eager key validation).
- Centralize benchmark split policy in `benchmarks/splits.py`, wire benchmark scripts to it, and add a new auto-FE use-case benchmark + docs/examples/tests.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/test_utils.py | Adds unit coverage for leakage detection with non-string column labels and fuzzy target matching. |
| tests/test_sklearn_compat.py | Expands sklearn-compat tests for set_params atomicity/validation, cloning behavior, state reset semantics, and metadata-less import. |
| tests/test_engines.py | Adds coverage for TimeSeriesEngine.series_in_rows and RelationalEngine relationship key validation. |
| tests/test_benchmark_splits.py | Adds behavioral + wiring tests for the shared benchmark split helper and ensures scripts don’t call train_test_split directly. |
| tests/test_autofeat.py | Adds coverage for leakage guard modes and early config validation failures. |
| mkdocs.yml | Adds new example pages to the documentation nav. |
| featcopilot/utils/validation.py | Introduces find_potential_leakage_columns() helper for leakage-prone column name detection. |
| featcopilot/utils/__init__.py | Exposes the new leakage helper via featcopilot.utils. |
| featcopilot/transformers/sklearn_compat.py | Adds leakage guard + target_name, early config validation, fit-state reset, improved fit_transform kwarg forwarding, and robust/atomic set_params. |
| featcopilot/engines/timeseries.py | Adds series_in_rows mode and more explicit time_column validation/handling. |
| featcopilot/engines/relational.py | Deduplicates relationships and validates relationship keys early (fit/transform). |
| featcopilot/__init__.py | Makes __version__ robust to missing installed package metadata by falling back to 0+unknown. |
| examples/time_aware_tabular_prototype.py | Adds an end-to-end time-aware (chronological split) leakage-safer example workflow. |
| docs/examples/time-aware-tabular.md | Documents the time-aware tabular example and leakage guard behavior. |
| docs/examples/relational-feature-engineering.md | Adds a relational FE walkthrough and explains new relational guardrails. |
| benchmarks/use_cases/run_auto_feature_engineering_benchmark.py | Adds a focused interaction-heavy auto-FE use-case benchmark and report generator. |
| benchmarks/use_cases/README.md | Documents how to run the new use-case benchmark. |
| benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md | Commits a generated report for the new use-case benchmark. |
| benchmarks/splits.py | Adds shared task-aware split helper (chronological for forecasting/timeseries, stratified for classification when possible). |
| benchmarks/datasets.py | Adds from __future__ import annotations for Python 3.9-friendly type syntax. |
| benchmarks/compare_tools/run_fe_tools_comparison.py | Replaces direct train_test_split usage with the shared split helper. |
Address two PR #2 review findings:

1. ``find_potential_leakage_columns`` (``featcopilot/utils/validation.py``)
   - ``keywords = keywords or DEFAULT_LEAKAGE_KEYWORDS`` prevented callers from intentionally passing ``keywords=[]`` to disable keyword matching. Switched to an explicit ``is None`` check so an empty list is honored.
   - ``normalized_target = ... if target_name else None`` had two bugs: valid falsy targets (e.g. ``0``) were treated as "no target", and ``target_name=""`` would normalize to the empty string, which matches every column via the ``normalized_target in normalized_column`` substring check. Now: explicit ``is None`` check, and an empty normalized result is treated as absent so that ``target_name=""`` (or values like ``"---"`` that strip to nothing) no longer matches every column, while ``target_name=0`` correctly drives matching.
2. ``AutoFeatureEngineer`` (``featcopilot/transformers/sklearn_compat.py``)
   - With the recent ``is not None`` defaulting in ``__init__``, an explicit ``engines=[]`` was preserved. ``fit()`` would then run zero engines and still mark the estimator fitted, making ``transform()`` a silent no-op. ``_validate_configuration`` now rejects empty ``engines`` and empty ``selection_methods`` with a clear ``ValueError`` that points callers at ``None`` (for the default) or the supported names. ``set_params`` inherits this check (and the atomic rollback) automatically.

Tests added in ``tests/test_utils.py``:
- ``test_leakage_detection_empty_keywords_disables_keyword_matching``
- ``test_leakage_detection_falsy_target_name_zero_still_matches``
- ``test_leakage_detection_empty_string_target_does_not_match_everything``

Tests added in ``tests/test_sklearn_compat.py``:
- ``test_init_rejects_empty_engines_list``
- ``test_init_rejects_empty_selection_methods_list``
- ``test_set_params_rejects_empty_engines_and_rolls_back``
- ``test_init_engines_none_still_defaults_to_tabular``

Full suite: 674 passed, 2 skipped (was 667, +7).
Pre-commit clean (black, ruff, trailing-whitespace, EOF, etc.).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
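A minimal sketch of the two ``is None`` fixes from item 1 above, under stated assumptions: ``normalize_label``, ``resolve_keywords`` and ``resolve_target`` are hypothetical names, the keyword tuple is illustrative, and the normalization (lowercase, drop non-alphanumerics) is a guess at what the helper does rather than its actual code.

```python
import re

# Illustrative defaults only; the real DEFAULT_LEAKAGE_KEYWORDS may differ.
DEFAULT_LEAKAGE_KEYWORDS = ("target", "label", "future")


def normalize_label(label):
    """Lowercase the label and keep only letters and digits (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "", str(label).lower())


def resolve_keywords(keywords=None):
    # `keywords or DEFAULT_LEAKAGE_KEYWORDS` would replace an intentional []
    # with the defaults; `is None` keeps the empty list as "no keywords".
    if keywords is None:
        return list(DEFAULT_LEAKAGE_KEYWORDS)
    return list(keywords)


def resolve_target(target_name=None):
    # `if target_name` would discard valid falsy labels such as 0.
    if target_name is None:
        return None
    normalized = normalize_label(target_name)
    # "" is a substring of every string, so an empty result means "no target".
    return normalized or None
```

With these helpers, ``resolve_keywords([])`` stays empty, ``resolve_target(0)`` yields ``"0"``, and both ``resolve_target("")`` and ``resolve_target("---")`` yield ``None``.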
Pull request overview
This PR strengthens FeatCopilot’s sklearn-compatible AutoFeatureEngineer with earlier/safer validation and a lightweight leakage guard, hardens the time-series/relational engines, and improves benchmark realism by centralizing task-aware splitting plus adding an interaction-heavy auto feature engineering use-case benchmark + report.
Changes:
- Add leakage detection utilities and wire a configurable leakage guard into `AutoFeatureEngineer` with early config validation + improved sklearn `set_params` behavior.
- Enhance `TimeSeriesEngine` (row-wise series mode + time column validation) and `RelationalEngine` (relationship de-dupe + eager key validation).
- Introduce `benchmarks.splits.split_benchmark_data()` and update benchmark scripts/tests/docs/examples, plus add a new use-case benchmark and report.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| `featcopilot/transformers/sklearn_compat.py` | Adds leakage guard + early validation, resets fit state, forwards transform kwargs in `fit_transform`, hardens `set_params`. |
| `featcopilot/utils/validation.py` | New `find_potential_leakage_columns()` helper for fuzzy leakage-prone column detection. |
| `featcopilot/utils/__init__.py` | Exposes `find_potential_leakage_columns` via `featcopilot.utils`. |
| `featcopilot/engines/timeseries.py` | Adds `series_in_rows` mode and validates `time_column` earlier; updates column selection behavior. |
| `featcopilot/engines/relational.py` | Deduplicates relationships and validates child/parent keys during fit/transform. |
| `featcopilot/__init__.py` | Makes `__version__` robust to missing package metadata (`0+unknown` fallback). |
| `benchmarks/splits.py` | Adds centralized task-aware split helper (chronological/stratified/random) with validation. |
| `benchmarks/compare_tools/run_fe_tools_comparison.py` | Routes data splitting through `split_benchmark_data` instead of direct `train_test_split`. |
| `benchmarks/use_cases/run_auto_feature_engineering_benchmark.py` | Adds focused interaction-heavy use-case benchmark runner + report writer. |
| `benchmarks/use_cases/README.md` | Documents how to run the new use-case benchmark and outputs. |
| `benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md` | Adds generated benchmark report output. |
| `benchmarks/datasets.py` | Enables `from __future__ import annotations` for newer typing syntax under Python 3.9. |
| `tests/test_benchmark_splits.py` | New tests covering split helper behavior + wiring assertions for benchmark scripts. |
| `tests/test_utils.py` | Adds coverage for leakage detection edge cases (non-strings, empty keywords, falsy targets). |
| `tests/test_autofeat.py` | Adds tests for leakage guard modes and early invalid-config failures. |
| `tests/test_engines.py` | Adds tests for `series_in_rows` and relational relationship validation behavior. |
| `tests/test_sklearn_compat.py` | Adds sklearn-compat regression tests (atomic `set_params`, clone behavior, import fallback). |
| `mkdocs.yml` | Adds navigation entries for the new example docs pages. |
| `docs/examples/time-aware-tabular.md` | Adds time-aware tabular workflow example and leakage guard explanation. |
| `docs/examples/relational-feature-engineering.md` | Adds relational engine usage example + guardrails description. |
| `examples/time_aware_tabular_prototype.py` | Adds an end-to-end leakage-safe temporal-split prototype script. |
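
The task-aware split policy summarized for `benchmarks/splits.py` above (chronological/stratified/random, with `test_size` validation) could look roughly like the sketch below. This is not the PR's `split_benchmark_data`: the function name `split_indices`, its signature, the index-based return value, and the rule for when stratification is "possible" are all assumptions made for illustration, using only the standard library.

```python
import random
from collections import defaultdict


def _n_test(n, test_size):
    # At least one test row, and always leave at least one training row.
    return min(n - 1, max(1, round(n * test_size)))


def split_indices(n_rows, task, labels=None, test_size=0.2, seed=0):
    """Return (train_idx, test_idx) under an assumed task-aware policy."""
    if not 0.0 < test_size < 1.0:
        raise ValueError(f"test_size must be in (0, 1), got {test_size!r}")
    if n_rows < 2:
        raise ValueError("need at least two rows to split")
    rng = random.Random(seed)

    if task in ("forecast", "timeseries"):
        # Chronological: rows are assumed to be time-ordered; never shuffle,
        # so the test set is strictly "in the future" of the training set.
        cut = n_rows - _n_test(n_rows, test_size)
        return list(range(cut)), list(range(cut, n_rows))

    if task == "classification" and labels is not None:
        by_class = defaultdict(list)
        for index, label in enumerate(labels):
            by_class[label].append(index)
        # Assumed rule: stratify only if every class has at least two rows.
        if all(len(members) >= 2 for members in by_class.values()):
            train, test = [], []
            for members in by_class.values():
                rng.shuffle(members)
                k = _n_test(len(members), test_size)
                test.extend(members[:k])
                train.extend(members[k:])
            return sorted(train), sorted(test)

    # Fallback: plain random split.
    indices = list(range(n_rows))
    rng.shuffle(indices)
    k = _n_test(n_rows, test_size)
    return sorted(indices[k:]), sorted(indices[:k])
```

The point of the chronological branch is the one the PR description makes: a uniform random split on forecasting data lets later rows inform earlier predictions, which hides time leakage.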
…rget_name typing

Address two PR #2 review findings:

1. ``_validate_configuration`` previously assumed ``self.engines`` and ``self.selection_methods`` were iterable sequences. A bare ``str`` (e.g. ``engines="tabular"``) would silently iterate character-by-character through the downstream non-string-entry / set-diff checks and surface as a confusing ``Unknown engines: ['a', 'b', 'l', 'r', 't', 'u']`` error, while a non-iterable value (e.g. ``engines=5``) raised an unrelated ``TypeError`` from ``set(self.engines)``. The validator now rejects anything that is not a ``list`` or ``tuple`` up front with a clear ``ValueError`` that names the offending type and value, and runs *before* every other engines/methods check so the rest of the validator can rely on a well-formed container. ``set_params`` inherits this guard (and its atomic rollback) automatically.
2. ``fit`` and ``fit_transform`` annotated ``target_name`` as ``Optional[str]``, but ``find_potential_leakage_columns`` is explicitly designed to accept any column-label type DataFrames support (the helper normalizes labels via ``str(...)`` and is covered by tests with integer labels). The annotation has been widened to ``Optional[Any]`` on both methods, the matching docstring entries now read ``hashable, optional`` and call out that non-string labels are accepted, and the class-level ``Other Parameters`` block was updated for consistency.

Tests added in ``tests/test_sklearn_compat.py``:
- ``test_init_rejects_string_engines_argument``
- ``test_init_rejects_non_sequence_engines_argument`` (covers ``int`` and ``dict``)
- ``test_init_rejects_string_selection_methods_argument``
- ``test_init_rejects_non_sequence_selection_methods_argument``
- ``test_init_accepts_tuple_engines`` (pins tuple support so future tightenings don't accidentally break it)
- ``test_set_params_rejects_string_engines_and_rolls_back`` (also verifies prior ``max_features`` is restored)
- ``test_fit_accepts_non_string_target_name`` (integer column label ``0`` honored as a target by the leakage guard)

Full suite: 681 passed, 2 skipped (was 674, +7).
Pre-commit clean (black, ruff, trailing-whitespace, EOF, etc.).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
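
A rough sketch of the container check and the atomic ``set_params`` rollback described in item 1, assuming a toy estimator: ``TinyEngineer``, ``validate_engines`` and the ``SUPPORTED_ENGINES`` set are hypothetical stand-ins, not the actual ``AutoFeatureEngineer`` code.

```python
# Illustrative engine names only; the real supported set may differ.
SUPPORTED_ENGINES = {"tabular", "timeseries", "relational"}


def validate_engines(engines):
    # Container check first: a bare str is iterable, so without this guard
    # "tabular" would be examined character by character further down.
    if not isinstance(engines, (list, tuple)):
        raise ValueError(
            f"engines must be a list or tuple, got "
            f"{type(engines).__name__}: {engines!r}"
        )
    if len(engines) == 0:
        raise ValueError("engines must not be empty; pass None for the default")
    if not all(isinstance(name, str) for name in engines):
        raise ValueError(f"engine names must be strings, got {engines!r}")
    unknown = sorted(set(engines) - SUPPORTED_ENGINES)
    if unknown:
        raise ValueError(f"Unknown engines: {unknown}")


class TinyEngineer:
    def __init__(self, engines=None, max_features=10):
        self.engines = ["tabular"] if engines is None else engines
        self.max_features = max_features
        validate_engines(self.engines)

    def set_params(self, **params):
        # Snapshot the old values so a failed validation leaves no trace.
        previous = {name: getattr(self, name) for name in params}
        try:
            for name, value in params.items():
                setattr(self, name, value)
            validate_engines(self.engines)
        except Exception:
            for name, value in previous.items():
                setattr(self, name, value)
            raise
        return self
```

Calling ``TinyEngineer(engines=("tabular",)).set_params(engines="tabular", max_features=99)`` raises ``ValueError`` and leaves both ``engines`` and ``max_features`` at their prior values, which is the behavior the rollback test above pins.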
Pull request overview
This PR strengthens FeatCopilot’s sklearn-compatible AutoFeatureEngineer with earlier/safer validation + leakage guarding, hardens the time-series and relational engines, and improves benchmark realism by centralizing task-aware splitting and adding an interaction-heavy auto-FE use-case benchmark/report.
Changes:
- Added leakage detection utilities and integrated a configurable leakage guard + early config validation into `AutoFeatureEngineer` (including more sklearn-compatible `set_params` behavior).
- Hardened engines: `TimeSeriesEngine` adds `series_in_rows` mode and stricter `time_column` validation; `RelationalEngine` dedupes relationships and validates keys early.
- Benchmarks now share a split helper (`split_benchmark_data`) and include a new auto feature engineering use-case benchmark + generated report.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| `featcopilot/transformers/sklearn_compat.py` | Adds leakage guard + target-aware leakage checking, early configuration validation, fit state reset, and more sklearn-compatible `set_params`. |
| `featcopilot/utils/validation.py` | Introduces `find_potential_leakage_columns()` helper for leakage-prone column name detection. |
| `featcopilot/utils/__init__.py` | Exposes `find_potential_leakage_columns` via `featcopilot.utils`. |
| `featcopilot/engines/timeseries.py` | Adds `series_in_rows` mode and validates `time_column` presence; updates transform logic accordingly. |
| `featcopilot/engines/relational.py` | Dedupes relationships and adds eager relationship/key validation in fit and transform. |
| `featcopilot/__init__.py` | Makes package import robust by falling back to `__version__ = "0+unknown"` when metadata is missing. |
| `benchmarks/splits.py` | Centralizes benchmark split policy (chronological for forecasting/timeseries, stratified for classification when possible). |
| `benchmarks/compare_tools/run_fe_tools_comparison.py` | Switches benchmark splitting to the shared `split_benchmark_data` helper. |
| `benchmarks/use_cases/run_auto_feature_engineering_benchmark.py` | Adds a focused interaction-heavy use-case benchmark comparing FeatCopilot to baseline/tools. |
| `benchmarks/use_cases/README.md` | Documents how to run the new use-case benchmark and expected outputs. |
| `benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md` | Checked-in generated benchmark report. |
| `benchmarks/datasets.py` | Adds `from __future__ import annotations` to support Python 3.9 type syntax usage. |
| `tests/test_utils.py` | Adds coverage for leakage detection behavior across edge cases (non-string columns, empty keywords, falsy target). |
| `tests/test_sklearn_compat.py` | Adds extensive regression coverage for sklearn estimator behavior, validation, rollback, and import fallback. |
| `tests/test_engines.py` | Adds coverage for `series_in_rows` timeseries mode and relational relationship validation. |
| `tests/test_autofeat.py` | Adds coverage for leakage guard modes and early config validation failures. |
| `tests/test_benchmark_splits.py` | Adds behavioral + wiring tests enforcing benchmark scripts use `split_benchmark_data`. |
| `examples/time_aware_tabular_prototype.py` | Adds an end-to-end leakage-safe time-aware workflow example. |
| `docs/examples/time-aware-tabular.md` | Documents the time-aware tabular example and leakage guard usage. |
| `docs/examples/relational-feature-engineering.md` | Documents relational feature engineering usage and new guardrails. |
| `mkdocs.yml` | Adds the new docs pages to the navigation. |
…s to empty string

Address PR #2 review finding:

``find_potential_leakage_columns`` already guarded the *target* side of the empty-substring trap (treating ``target_name=""`` / ``"---"`` as absent), but the *column* side could still trigger it. If a column label normalized to an empty string, the ``normalized_column in normalized_target`` substring check evaluated ``True`` (because ``"" in "label"`` is ``True``), so a column literally named e.g. ``"---"`` or ``"!!!"`` would be flagged as leakage-prone whenever any ``target_name`` was provided.

Fixed by skipping any column whose label normalizes to an empty string entirely (``if not normalized_column: continue`` before both the keyword and target match blocks). An empty normalized column has no meaningful content to compare against, so neither side of the match should run for it.

Tests added in ``tests/test_utils.py``:
- ``test_leakage_detection_columns_normalizing_to_empty_string_are_skipped`` covers:
  - ``["---", "!!!"]`` with ``target_name="label"`` returns ``[]``.
  - ``["---", "label_x"]`` returns only ``["label_x"]``.
  - Mixing meaningful and empty-normalizing labels still reports only the meaningful ones.

Also expanded the docstring ``Notes`` block to document the symmetric column-side guard alongside the existing target-side description.

Full suite: 682 passed, 2 skipped (was 681, +1).
Pre-commit clean (black, ruff, trailing-whitespace, EOF, etc.).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
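
The column-side guard can be illustrated with a small matching loop. This is a sketch under assumptions, not the helper itself: ``find_leaky_columns_sketch`` is a hypothetical name, the default keywords are illustrative, and the normalization rule (lowercase, drop non-alphanumerics) is assumed.

```python
import re


def normalize_label(label):
    """Lowercase the label and keep only letters and digits (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "", str(label).lower())


def find_leaky_columns_sketch(columns, target_name=None,
                              keywords=("target", "label", "future")):
    normalized_target = None
    if target_name is not None:
        # Target-side guard: an empty normalized target counts as absent.
        normalized_target = normalize_label(target_name) or None
    normalized_keywords = [k for k in map(normalize_label, keywords) if k]

    flagged = []
    for column in columns:
        normalized_column = normalize_label(column)
        # Column-side guard: "" is a substring of everything, so labels such
        # as "---" must be skipped before either match block runs.
        if not normalized_column:
            continue
        keyword_hit = any(k in normalized_column for k in normalized_keywords)
        target_hit = normalized_target is not None and (
            normalized_target in normalized_column
            or normalized_column in normalized_target
        )
        if keyword_hit or target_hit:
            flagged.append(column)
    return flagged
```

Without the ``continue``, ``"" in "label"`` evaluates to ``True`` and ``"---"`` would be reported for any non-empty target, which is the case the new test pins.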
Pull request overview
This PR strengthens FeatCopilot’s sklearn-compatible AutoFeatureEngineer with earlier config validation and a lightweight leakage guard, hardens time-series/relational engines, and improves benchmark realism by introducing a shared task-aware split helper plus a focused auto feature engineering use-case benchmark/report.
Changes:
- Add leakage detection utility and wire it into `AutoFeatureEngineer` with new `leakage_guard` + `target_name` support, plus safer/atomic `set_params` behavior.
- Harden `TimeSeriesEngine` (row-wise series mode + clearer `time_column` validation) and `RelationalEngine` (relationship dedupe + eager key validation).
- Introduce `benchmarks.splits.split_benchmark_data()` and wire benchmark scripts through it; add a new interaction-heavy use-case benchmark and report; update docs/examples and tests accordingly.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| `featcopilot/transformers/sklearn_compat.py` | Early config validation, leakage guard, state reset on refit/failure, `fit_transform()` kwarg forwarding, atomic sklearn-style `set_params`. |
| `featcopilot/utils/validation.py` | New `find_potential_leakage_columns()` helper for fuzzy leakage-prone column detection. |
| `featcopilot/utils/__init__.py` | Exposes `find_potential_leakage_columns` via `featcopilot.utils`. |
| `featcopilot/engines/timeseries.py` | Adds `series_in_rows` mode and validates `time_column` earlier; adjusts transform behavior accordingly. |
| `featcopilot/engines/relational.py` | Dedupes relationships and validates relationship keys in `fit()` and `transform()`. |
| `featcopilot/__init__.py` | Makes package import resilient when distribution metadata is unavailable (`__version__` fallback). |
| `benchmarks/splits.py` | New shared task-aware split helper (chronological/stratified/random) with `test_size` validation. |
| `benchmarks/compare_tools/run_fe_tools_comparison.py` | Replaces direct `train_test_split` usage with `split_benchmark_data`. |
| `benchmarks/use_cases/run_auto_feature_engineering_benchmark.py` | New use-case benchmark script comparing tools on an interaction-heavy classification task. |
| `benchmarks/use_cases/README.md` | Documents how to run the new use-case benchmark. |
| `benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md` | Committed generated report for the new use-case benchmark. |
| `benchmarks/datasets.py` | Enables `from __future__ import annotations` for Py3.9-friendly type hints. |
| `tests/test_utils.py` | Adds coverage for leakage detection edge cases (non-string labels, empty keywords, empty targets). |
| `tests/test_sklearn_compat.py` | Adds extensive sklearn-compat/regression tests (atomic `set_params`, clone behavior, refit resets, import fallback). |
| `tests/test_engines.py` | Adds tests for `TimeSeriesEngine.series_in_rows` and `RelationalEngine` key validation. |
| `tests/test_benchmark_splits.py` | New tests covering split helper behavior + regression wiring checks for benchmark scripts. |
| `tests/test_autofeat.py` | Adds coverage for leakage guard modes and early config validation errors. |
| `examples/time_aware_tabular_prototype.py` | New end-to-end time-aware/leakage-safe workflow example. |
| `docs/examples/time-aware-tabular.md` | Documentation for the time-aware tabular example and leakage guard usage. |
| `docs/examples/relational-feature-engineering.md` | Documentation for relational feature engineering usage + new guardrails. |
| `mkdocs.yml` | Adds the new examples to the documentation navigation. |
Addresses all five review comments from Copilot and Codex on PR #5:

* explain now actually returns generated features (Copilot review #1, Codex P1). Built-in engines (e.g. tabular) populate `_feature_names` during `transform()`, not `fit()`. `_cmd_explain` now calls `fit_transform(..., apply_selection=False)` so the JSON payload contains the full `{name, explanation, code}` records the subcommand advertises. Test asserts `n_features > 0` for tabular.
* main(argv) -> int contract honored on parse errors (Copilot review #2). `argparse.parse_args` raises `SystemExit` for usage errors, `--help` and `--version`. `main` now traps those and returns the exit code so programmatic and agent callers always get an int. Tests cover `--version` (rc=0), `--help` (rc=0), no-subcommand (rc=2) and unknown-flag (rc=2).
* Real subprocess test for python -m featcopilot (Copilot review #3). `test_dunder_main_subprocess_invocation` and `test_dunder_main_subprocess_version_flag` spawn a real `python -m featcopilot ...` subprocess and assert stdout JSON, so a regression in `__main__.py` actually breaks the suite.
* Parquet `ImportError` -> clean exit 2 (Codex P2). `_read_table`/`_write_table` now wrap parquet calls and convert `ImportError` into a `ValueError` with a friendly install hint; the top-level handler routes that to the deterministic `exit 2` user-error path instead of the generic `exit 1` backstop. `test_transform_parquet_missing_engine_returns_exit_2` exercises this via `monkeypatch` of `DataFrame.to_parquet`.
* Pre-commit black: re-applied formatting from the pinned `black 24.1.1` hook (joined two long string raises) so the CI pre-commit job passes.

Tests: 23 (+5 new) in tests/test_cli.py, 796 passed full suite.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
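
As an illustration of the `main(argv) -> int` contract above, here is a minimal sketch that traps `SystemExit` from `argparse`. The parser layout, the `featcopilot-sketch` program name, and the single `transform` subcommand are placeholders for this example, not the real CLI definition.

```python
import argparse


def build_parser():
    # Placeholder parser; the real CLI defines its own subcommands and flags.
    parser = argparse.ArgumentParser(prog="featcopilot-sketch")
    parser.add_argument("--version", action="version", version="0+unknown")
    subparsers = parser.add_subparsers(dest="command", required=True)
    subparsers.add_parser("transform")
    return parser


def main(argv=None):
    parser = build_parser()
    try:
        parser.parse_args(argv)
    except SystemExit as exc:
        # argparse exits with 0 for --help/--version and 2 for usage errors.
        # Convert that into a return value so callers always get an int.
        code = exc.code
        if code is None:
            return 0
        return code if isinstance(code, int) else 1
    # Dispatching to the selected subcommand would happen here.
    return 0
```

Under this sketch, `main(["--version"])` and `main(["--help"])` return 0, while `main([])` and `main(["--bogus"])` return 2, matching the four cases the commit's tests cover.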
…ut cases
Round-25 reviewer feedback addressed:
1. **Copilot — empty-input branch conflated zero-row and zero-column.**
`_read_table` raised the same "zero data rows" error for both
`len(df) == 0` and `len(df.columns) == 0` (both make
`DataFrame.empty` return `True`). A JSON `[{}, {}]` array
produces a frame WITH rows but NO columns, a very different user
error than a header-only CSV. Split into two distinct, accurately
worded error messages: "no columns" vs "zero data rows".
2. **Copilot — read-time warnings leaked to stderr.**
`_cmd_transform` only opened the capture context around
`engineer.fit_transform`. `pd.read_csv` can legitimately emit
`DtypeWarning` on mixed-type CSVs and parquet/JSON readers can
emit pyarrow / pandas warnings on a successful read; those were
bypassing the capture and bleeding to stderr, breaking the
"stderr reserved for failures" contract.
3. **Copilot — write-time warnings also leaked to stderr.**
`_write_table` was likewise outside the capture, so pandas /
pyarrow `FutureWarning` / `UserWarning` from a successful
write could leak. Same contract violation as #2.
The fix wraps the entire `_cmd_transform` pipeline (read +
build_engineer + fit_transform + write) in a single
`_capture_featcopilot_messages` block so warnings from ANY phase end
up in the JSON `warnings` field instead of stderr. `_cmd_explain`
also widened: `explain_features` / `get_feature_code` are now
inside the same capture as the read+fit_transform.
The dead helper `_fit_transform_capturing_warnings` is no longer used
internally; kept as a thin convenience wrapper for external test code
with an updated docstring noting that the CLI now wraps a wider
region.
New tests:
- `test_transform_zero_columns_input_distinguishes_from_zero_rows`:
pins the column-vs-row error-message distinction.
- `test_transform_zero_rows_input_still_uses_zero_rows_message`:
guards the existing zero-rows wording for the header-only case.
- `test_transform_read_warning_captured_not_on_stderr`: patches
`pd.read_csv` to emit a warning, asserts `err == ""` and the
warning lands in JSON `warnings`.
- `test_transform_write_warning_captured_not_on_stderr`: patches
`DataFrame.to_csv` to emit a warning, same contract.
- `test_explain_features_warnings_captured_not_on_stderr`: patches
`AutoFeatureEngineer.explain_features` to emit a warning, same
contract for `_cmd_explain`.
906 tests pass locally (full suite); 133 in tests/test_cli.py.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
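
The widened capture region described in this commit can be sketched with the standard `warnings` module. Everything here is a stand-in: `capture_messages` plays the role of `_capture_featcopilot_messages`, and `read_table`, `fit_transform` and `write_table` are toy functions that only emit warnings, so this shows the shape of the fix rather than the CLI's code.

```python
import contextlib
import io
import json
import warnings


@contextlib.contextmanager
def capture_messages():
    """Collect warnings raised inside the block instead of printing them."""
    messages = []
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            yield messages
        finally:
            messages.extend(str(item.message) for item in caught)


def read_table():
    warnings.warn("mixed dtypes in column 0", UserWarning)  # read-time warning
    return [1, 2, 3]


def fit_transform(rows):
    return [value * 2 for value in rows]


def write_table(rows):
    warnings.warn("writer option deprecated", FutureWarning)  # write-time warning


def run_transform(stdout):
    # One capture block around read + fit_transform + write, so a warning
    # from any phase lands in the JSON payload rather than on stderr.
    with capture_messages() as messages:
        rows = read_table()
        features = fit_transform(rows)
        write_table(features)
    stdout.write(json.dumps({"n_rows": len(features), "warnings": messages}))
    return 0
```

Had only `fit_transform` been inside the `with` block, the two toy warnings would have gone through the default warning display on stderr, which is the leak the review comments pointed out.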
Summary
Strengthen FeatCopilot's core API with safer defaults and earlier validation, add a lightweight leakage guard for label/future-information columns, harden the time-series and relational engines, and improve benchmark realism with a shared task-aware split helper plus a focused use-case benchmark for interaction-heavy auto feature engineering.
Motivation
The sklearn-compatible `AutoFeatureEngineer` was easy to misuse in real workflows:

- Invalid configuration was only rejected at `fit()`, often after expensive setup
- `fit_transform()` silently dropped transform-time kwargs (e.g. `text_columns`, `related_tables`)

The benchmarks also under-represented realistic FE workloads:

- `train_test_split(...)` was applied uniformly to forecasting datasets, which masked time-leakage effects

This PR addresses both concerns incrementally without redesigning any public API.
What changed
Core API (`featcopilot/transformers/sklearn_compat.py`, `featcopilot/__init__.py`)

- `leakage_guard` config (`'off' | 'warn' | 'raise'`, default `'warn'`) that flags column names matching configurable label/future keywords or fuzzy variants of the target.
- `target_name` parameter on `fit()` and `fit_transform()` for leakage-aware column matching; documented in both method docstrings and the class docstring's `Other Parameters` section.
- Unknown engines, unknown selection methods, an invalid `leakage_guard`, and non-positive `max_features` now raise `ValueError` immediately in `__init__` (and on `set_params`).
- `fit_transform()` now forwards transform-time kwargs (e.g. `text_columns`, `related_tables`) to the inner `transform()` call so one-shot workflows match staged ones.
- `featcopilot.__version__` falls back to `"0+unknown"` when package metadata is unavailable (e.g. fresh source checkouts), instead of raising `PackageNotFoundError` at import time.

Engines (`featcopilot/engines/timeseries.py`, `featcopilot/engines/relational.py`)

- `TimeSeriesEngine`: added an opt-in `series_in_rows` mode for row-wise sequence cells; clearer validation of `time_column`. Default aggregate-style behavior is preserved.
- `RelationalEngine`: deduplicates relationships across repeated `add_relationship` calls and validates child/parent keys eagerly so missing keys fail fast with actionable errors.

Utilities (`featcopilot/utils/validation.py`, `featcopilot/utils/__init__.py`)

- `find_potential_leakage_columns()` helper, exposed at `featcopilot.utils`, robust to non-string column labels (ints, etc.) and fuzzy target-name matching.

Benchmarks (`benchmarks/`)

- `benchmarks/splits.py` centralizing the split policy, including chronological splits for `forecast`/`timeseries` tasks.
- Routed `compare_tools/run_fe_tools_comparison.py` and the new `use_cases/run_auto_feature_engineering_benchmark.py` through the shared helper, replacing direct `train_test_split` calls.
- `benchmarks/use_cases/run_auto_feature_engineering_benchmark.py` plus its README and generated report (`AUTO_FEATURE_ENGINEERING_USE_CASE.md`), comparing FeatCopilot, Featuretools, and autofeat on an interaction-heavy classification task.
- Added `from __future__ import annotations` to `benchmarks/datasets.py` so PEP 604 (`X | None`) annotations work under Python 3.9.

Docs / examples

- `examples/time_aware_tabular_prototype.py`: end-to-end leakage-safe workflow.
- `docs/examples/time-aware-tabular.md` and `docs/examples/relational-feature-engineering.md`: walkthroughs wired into `mkdocs.yml`.

Tests

- `tests/test_autofeat.py`: covers the leakage guard modes and the early-config-validation failures.
- `tests/test_engines.py`: covers `TimeSeriesEngine.series_in_rows` and `RelationalEngine` dedupe / key-validation behavior.
- `tests/test_sklearn_compat.py`: regression test for source-checkout import without package metadata.
- `tests/test_utils.py`: covers leakage detection for non-string column labels and mixed keyword/target hits.
- `tests/test_benchmark_splits.py` (new): behavioral coverage of the split helper (chronological / stratified / random / custom `test_size`) plus wiring regression tests asserting both benchmark scripts use `split_benchmark_data` instead of `train_test_split` directly.

Behavior changes

- `AutoFeatureEngineer(...)` now raises `ValueError` instead of silently accepting unknown engines, unknown selection methods, an invalid `leakage_guard`, or a non-positive `max_features`. Callers passing valid configs are unaffected.
- With `leakage_guard='warn'` (the default), fitting on a frame with suspicious column names (e.g. `target`, `label_*`, `future_*`) emits a `UserWarning`. Set `leakage_guard='off'` to restore prior silent behavior.
- The `simple_models` and `automl` benchmark scripts are intentionally left untouched (out of scope; their split policy is unchanged).

Backward compatibility

- The `leakage_guard='warn'` default is the only user-visible behavior change for valid existing usage; opt out via `leakage_guard='off'`.

Testing

- `pre-commit run --all-files`: passes (black, ruff, hooks).
- `pytest tests/`: `647 passed, 2 skipped` locally.
- Generated report: `benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md`.

Checklist

- `pre-commit run --all-files` passes
- `pytest tests/` passes locally
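
For reference, the `__version__` fallback listed under Core API can be sketched with the standard-library `importlib.metadata` API. The helper name `resolve_version` is hypothetical; the actual code in `featcopilot/__init__.py` may be structured differently.

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_version(dist_name, fallback="0+unknown"):
    """Return the installed distribution version, or a fallback string.

    In a fresh source checkout there is no installed distribution metadata,
    so `version()` raises PackageNotFoundError; swallowing it keeps the
    package importable.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return fallback
```

A package would typically assign the result once at import time, for example `__version__ = resolve_version("featcopilot")`.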