bench: add auto feature engineering use-case report by thinkall · Pull Request #2 · thinkall/featcopilot

thinkall · 2026-03-18T07:02:01Z

Summary

Strengthen FeatCopilot's core API with safer defaults and earlier validation, add a lightweight leakage guard for label/future-information columns, harden the time-series and relational engines, and improve benchmark realism with a shared task-aware split helper plus a focused use-case benchmark for interaction-heavy auto feature engineering.

Motivation

The sklearn-compatible AutoFeatureEngineer was easy to misuse in real workflows:

invalid engine names or selection methods only failed deep inside fit(), often after expensive setup
there was no built-in signal for label/future-information leakage when feeding curated tabular data
fit_transform() silently dropped transform-time kwargs (e.g. text_columns, related_tables)
the source checkout failed to import when no installed package metadata was available

The benchmarks also under-represented realistic FE workloads:

train_test_split(...) was applied uniformly to forecasting datasets, which masked time-leakage effects
the comparison suite lacked a focused, interaction-heavy use-case showing where automated FE actually helps

This PR addresses both concerns incrementally without redesigning any public API.

What changed

Core API (`featcopilot/transformers/sklearn_compat.py`, `featcopilot/__init__.py`)

Added leakage_guard config ('off' | 'warn' | 'raise', default 'warn') that flags column names matching configurable label/future keywords or fuzzy variants of the target.
Added a fit-time target_name parameter on fit() and fit_transform() for leakage-aware column matching; documented in both method docstrings and the class docstring's Other Parameters section.
Hardened early configuration validation: unknown engines, unknown selection methods, invalid leakage_guard, and non-positive max_features now raise ValueError immediately in __init__ (and on set_params).
fit_transform() now forwards transform-time kwargs (e.g. text_columns, related_tables) to the inner transform() call so one-shot workflows match staged ones.
featcopilot.__version__ falls back to "0+unknown" when package metadata is unavailable (e.g. fresh source checkouts), instead of raising PackageNotFoundError at import time.
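
The version fallback described above can be sketched with the standard library alone. This is a minimal illustration, not the code in `featcopilot/__init__.py`; the helper name `resolve_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_version(dist_name: str, fallback: str = "0+unknown") -> str:
    """Return the installed version of dist_name, or fallback if no metadata exists.

    resolve_version is a hypothetical helper used only for this sketch.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Typical for a fresh source checkout that was never pip-installed.
        return fallback


# A distribution name that is not installed exercises the fallback path.
print(resolve_version("surely-not-an-installed-distribution-xyz"))
```

Catching only `PackageNotFoundError` keeps unrelated import-time failures visible instead of hiding them behind the placeholder version.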

Engines (`featcopilot/engines/timeseries.py`, `featcopilot/engines/relational.py`)

TimeSeriesEngine: added an opt-in series_in_rows mode for row-wise sequence cells; clearer validation of time_column. Default aggregate-style behavior is preserved.
RelationalEngine: deduplicates relationships across repeated add_relationship calls and validates child/parent keys eagerly so missing keys fail fast with actionable errors.

Utilities (`featcopilot/utils/validation.py`, `featcopilot/utils/__init__.py`)

New find_potential_leakage_columns() helper, exposed at featcopilot.utils, robust to non-string column labels (ints, etc.) and fuzzy target-name matching.
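
To make the idea concrete, here is a rough sketch of keyword plus fuzzy-target matching over column labels. It is not the actual `find_potential_leakage_columns()` implementation: the function name `find_suspicious_columns`, the keyword tuple, and the 0.85 similarity threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from typing import Hashable, Iterable, List, Optional, Sequence

# Assumed keyword list; the real helper's defaults are configurable and may differ.
DEFAULT_KEYWORDS = ("target", "label", "future")


def find_suspicious_columns(
    columns: Iterable[Hashable],
    target_name: Optional[str] = None,
    keywords: Sequence[str] = DEFAULT_KEYWORDS,
    threshold: float = 0.85,  # assumed similarity cut-off
) -> List[Hashable]:
    """Flag columns whose name contains a keyword or closely resembles the target."""
    target = str(target_name).lower() if target_name is not None else None
    flagged = []
    for col in columns:
        name = str(col).lower()  # str() tolerates ints and other non-string labels
        keyword_hit = any(keyword in name for keyword in keywords)
        fuzzy_hit = (
            target is not None
            and SequenceMatcher(None, name, target).ratio() >= threshold
        )
        if keyword_hit or fuzzy_hit:
            flagged.append(col)  # keep the original label, not the string form
    return flagged


cols = ["age", 3, "label_next_week", "churnd", "future_sales"]
print(find_suspicious_columns(cols, target_name="churned"))
```

Name-based matching of this kind is only a heuristic: it cannot detect leakage in a column with an innocuous name.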

Benchmarks (`benchmarks/`)

New shared module benchmarks/splits.py centralizing the split policy:
- chronological split for forecast / timeseries tasks
- stratified split for classification (when class counts allow)
- random split otherwise
Wired both compare_tools/run_fe_tools_comparison.py and the new use_cases/run_auto_feature_engineering_benchmark.py through the shared helper, replacing direct train_test_split calls.
Added focused use-case benchmark benchmarks/use_cases/run_auto_feature_engineering_benchmark.py plus its README and generated report (AUTO_FEATURE_ENGINEERING_USE_CASE.md), comparing FeatCopilot, Featuretools, and autofeat on an interaction-heavy classification task.
Added from __future__ import annotations to benchmarks/datasets.py so PEP 604 (X | None) annotations work under Python 3.9.
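
The split policy listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the contents of `benchmarks/splits.py`: the function names, the task-string matching, and the `min_class_count` rule standing in for "when class counts allow" are all hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def choose_split_policy(task: str, labels: Sequence = (), min_class_count: int = 2) -> str:
    """Pick a split policy name from the task type (hypothetical helper)."""
    task = task.lower()
    if "forecast" in task or "timeseries" in task:
        return "chronological"
    if "classification" in task and len(labels) > 0:
        # Assumption: stratification needs at least min_class_count rows per class.
        if min(Counter(labels).values()) >= min_class_count:
            return "stratified"
    return "random"


def chronological_split(rows: Sequence, test_size: float = 0.2) -> Tuple[List, List]:
    """Hold out the last test_size fraction of rows, assumed already sorted by time."""
    if not 0 < test_size < 1:
        raise ValueError(f"test_size must be in (0, 1), got {test_size!r}")
    n_test = int(round(len(rows) * test_size))
    split_idx = len(rows) - n_test
    if split_idx <= 0 or split_idx >= len(rows):
        raise ValueError("split would leave the train or test side empty")
    return list(rows[:split_idx]), list(rows[split_idx:])


print(choose_split_policy("forecasting"))
print(chronological_split(list(range(10)), test_size=0.2))
```

Because the chronological branch never shuffles, every test row is later than every training row, which is what keeps future information out of the fit.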

Docs / examples

examples/time_aware_tabular_prototype.py: end-to-end leakage-safe workflow.
docs/examples/time-aware-tabular.md and docs/examples/relational-feature-engineering.md: walkthroughs wired into mkdocs.yml.

Tests

tests/test_autofeat.py: covers the leakage guard modes and the early-config-validation failures.
tests/test_engines.py: covers TimeSeriesEngine.series_in_rows and RelationalEngine dedupe / key-validation behavior.
tests/test_sklearn_compat.py: regression test for source-checkout import without package metadata.
tests/test_utils.py: covers leakage detection for non-string column labels and mixed keyword/target hits.
tests/test_benchmark_splits.py (new): behavioral coverage of the split helper (chronological / stratified / random / custom test_size) plus wiring regression tests asserting both benchmark scripts use split_benchmark_data instead of train_test_split directly.

Behavior changes

AutoFeatureEngineer(...) now raises ValueError instead of silently accepting unknown engines, unknown selection methods, an invalid leakage_guard, or a non-positive max_features. Callers passing valid configs are unaffected.
With the default leakage_guard='warn', fitting on a frame with suspicious column names (e.g. target, label_*, future_*) emits a UserWarning. Set leakage_guard='off' to restore prior silent behavior.
Forecasting/timeseries datasets in the in-scope benchmarks now use a chronological split. Pre-existing simple_models and automl benchmark scripts are intentionally left untouched (out of scope; their split policy is unchanged).
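
The three `leakage_guard` modes can be illustrated with a small dispatcher. This is a sketch only: `apply_leakage_guard` is a hypothetical name, and raising `ValueError` in `'raise'` mode is an assumption, since the description states the exception type for the `'warn'` mode (`UserWarning`) but not for `'raise'`.

```python
import warnings
from typing import Sequence

VALID_MODES = ("off", "warn", "raise")


def apply_leakage_guard(suspicious: Sequence, mode: str = "warn") -> None:
    """React to suspicious column names according to the guard mode (hypothetical)."""
    if mode not in VALID_MODES:
        raise ValueError(f"leakage_guard must be one of {VALID_MODES}, got {mode!r}")
    if mode == "off" or not suspicious:
        return
    message = f"Potential leakage columns detected: {list(suspicious)}"
    if mode == "raise":
        # Assumption: the exception type is not specified in the PR description.
        raise ValueError(message)
    warnings.warn(message, UserWarning, stacklevel=2)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    apply_leakage_guard(["future_sales"], mode="warn")
print(len(caught), caught[0].category.__name__)
```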

Backward compatibility

No public API removals. All new constructor and fit-time parameters have safe defaults that preserve prior behavior for valid existing usage.
The leakage_guard='warn' default is the only user-visible behavior change for valid existing usage; opt out via leakage_guard='off'.

Testing

pre-commit run --all-files — passes (black, ruff, hooks).
pytest tests/ — 647 passed, 2 skipped locally.
Verified on Python 3.9 (fresh conda env reproducing the original Py3.9 CI failure) and Python 3.13.
New use-case benchmark report regenerated at benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md.

Checklist

Tests added / updated for new behavior
Docs / examples updated where user-facing behavior changed
pre-commit run --all-files passes
pytest tests/ passes locally
No public API removals; new parameters have backward-compatible defaults

Copilot

Pull request overview

This PR strengthens FeatCopilot’s core API reliability and adds practical safety/benchmarking assets, including a new “auto feature engineering” use-case benchmark report and improved guardrails in the sklearn-compatible API and engines.

Changes:

Add leakage_guard + early configuration validation to AutoFeatureEngineer, plus safer __version__ fallback when package metadata is missing.
Extend TimeSeriesEngine (row-wise series mode) and RelationalEngine (relationship dedupe + key validation), with corresponding tests and docs/examples.
Add a focused use-case benchmark script/report and start introducing more realistic benchmark split logic.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.

File	Description
tests/test_sklearn_compat.py	Adds regression test for importing from source checkout without dist metadata.
tests/test_engines.py	Adds coverage for new TimeSeries/Relational validation behaviors.
tests/test_autofeat.py	Adds tests for leakage guard + early config validation failures.
mkdocs.yml	Adds new examples to the documentation nav.
featcopilot/utils/validation.py	Introduces leakage-column detection helper.
featcopilot/utils/__init__.py	Exposes `find_potential_leakage_columns` at utils level.
featcopilot/transformers/sklearn_compat.py	Adds `leakage_guard`, config validation, and `target_name` plumbing.
featcopilot/engines/timeseries.py	Adds `series_in_rows` mode and time column validation.
featcopilot/engines/relational.py	Dedupes relationships and validates relationship keys in fit/transform.
featcopilot/__init__.py	Falls back to `0+unknown` version when metadata is missing.
examples/time_aware_tabular_prototype.py	Adds a time-aware, leakage-safe workflow example.
docs/examples/time-aware-tabular.md	Documents the time-aware prototype + leakage guard usage.
docs/examples/relational-feature-engineering.md	Documents relational engine usage + guardrails.
benchmarks/use_cases/run_auto_feature_engineering_benchmark.py	Adds benchmark runner to generate the use-case report.
benchmarks/use_cases/README.md	Documents how to run the use-case benchmark.
benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md	Adds generated benchmark report output.
benchmarks/compare_tools/run_fe_tools_comparison.py	Adds `split_benchmark_data()` helper (currently not yet wired in).
PR_SUMMARY_feat_strengthen_core_api.md	Adds a branch/PR summary document.

codecov-commenter · 2026-03-18T07:09:20Z

Codecov Report

❌ Patch coverage is 99.22481% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 88.18%. Comparing base (2eb9c6b) to head (0e329f1).
⚠️ Report is 1 commit behind head on main.

Files with missing lines	Patch %	Lines
featcopilot/engines/timeseries.py	92.30%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main       #2      +/-   ##
==========================================
+ Coverage   87.84%   88.18%   +0.33%     
==========================================
  Files          35       36       +1     
  Lines        3702     3808     +106     
==========================================
+ Hits         3252     3358     +106     
  Misses        450      450

Flag	Coverage Δ
unittests	`88.18% <99.22%> (+0.33%)`	⬆️

thinkall · 2026-04-28T05:23:27Z

@copilot, address all the comments.

Agent-Logs-Url: https://github.com/thinkall/featcopilot/sessions/81d365b5-07ae-4e09-aa5a-8508c9323f42 Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

Copilot · 2026-04-28T05:32:13Z

Addressed all review comments across the follow-up commits, ending with b1a3275. Pre-commit and targeted pytest validation pass locally.

Address remaining PR #2 review comments:

- Move split_benchmark_data from compare_tools into a shared benchmarks/splits.py module so both the existing comparison benchmark and the new use-case benchmark route their splits through the same task-aware policy.
- Wire benchmarks/use_cases/run_auto_feature_engineering_benchmark.py to call split_benchmark_data instead of train_test_split directly, matching the PR summary's intent.
- Add target_name to the AutoFeatureEngineer class docstring under an "Other Parameters" section so it surfaces in help() and generated docs alongside the constructor params.
- Add tests/test_benchmark_splits.py covering both the helper's behavior (chronological vs stratified vs random) and a wiring regression check that benchmark scripts use the helper instead of calling train_test_split directly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The Python 3.9 CI job failed when collecting tests/test_benchmark_splits.py because importing benchmarks.splits triggers benchmarks/__init__.py, which eagerly imports benchmarks.datasets. That module uses PEP 604 union syntax (`str | None`) at function-definition time, which is only supported at runtime on Python 3.10+.

Add `from __future__ import annotations` to benchmarks/datasets.py so the annotations become lazy strings and the module imports cleanly under Python 3.9.

Verified locally with a fresh Python 3.9 environment: the full test suite (647 passed, 2 skipped) and the new benchmark split tests both pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Adds a new “auto feature engineering” use-case benchmark + report, while strengthening core API reliability and benchmark realism (split policy, preprocessing hardening, and safer imports).

Changes:

Introduce benchmarks.splits.split_benchmark_data() and wire it into benchmark scripts with tests to prevent regressions.
Harden core library behavior: source-checkout import version fallback, early config validation + leakage-guard in AutoFeatureEngineer, plus updates to TimeSeriesEngine and RelationalEngine.
Add a focused use-case benchmark script/report and new documentation examples (time-aware tabular + relational FE).

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.

File	Description
tests/test_utils.py	Adds unit coverage for leakage-column detection on non-string column labels.
tests/test_sklearn_compat.py	Adds regression test for `featcopilot` import when package metadata is missing.
tests/test_engines.py	Adds coverage for `TimeSeriesEngine.series_in_rows` and relational relationship validation errors.
tests/test_benchmark_splits.py	New behavioral + wiring tests ensuring benchmark scripts use shared split helper.
tests/test_autofeat.py	Adds tests for leakage guard modes and early configuration validation.
mkdocs.yml	Adds nav entries for new documentation examples.
featcopilot/utils/validation.py	New leakage-column detection helper.
featcopilot/utils/__init__.py	Exposes `find_potential_leakage_columns` in utils public API.
featcopilot/transformers/sklearn_compat.py	Adds leakage guard, config validation, and forwards `target_name` + transform kwargs in `fit_transform`.
featcopilot/engines/timeseries.py	Adds `series_in_rows` mode and validates `time_column`.
featcopilot/engines/relational.py	Dedupes relationships and validates relationship keys at fit/transform time.
featcopilot/__init__.py	Falls back to `0+unknown` when distribution metadata is unavailable.
examples/time_aware_tabular_prototype.py	New end-to-end time-aware workflow example (temporal split + leakage-safe FE).
docs/examples/time-aware-tabular.md	Documents the time-aware prototype + leakage guard behavior.
docs/examples/relational-feature-engineering.md	Documents relational engine usage + new guardrails.
benchmarks/use_cases/run_auto_feature_engineering_benchmark.py	New use-case benchmark runner comparing baseline vs FeatCopilot vs optional tools.
benchmarks/use_cases/README.md	Documents how to run the new use-case benchmark.
benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md	Adds a committed benchmark report artifact.
benchmarks/splits.py	New shared benchmark split policy implementation.
benchmarks/datasets.py	Adds `__future__` annotations import.
benchmarks/compare_tools/run_fe_tools_comparison.py	Switches to shared split helper.
PR_SUMMARY_feat_strengthen_core_api.md	Adds a branch/PR summary document for reviewers.

The PR summary belongs in the GitHub PR description, not committed to the repository. Its content has been moved to PR #2's body so the file no longer needs to live in source control. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Address the latest PR review feedback on the strengthen-core-api branch:

* AutoFeatureEngineer (sklearn_compat):
  - Default collection-valued constructor args (`engines`, `selection_methods`, `llm_config`) using `is not None` instead of truthiness so explicit empty containers and identity-bearing arguments are preserved. This is also what sklearn's `clone()` round-trip identity check requires; previously cloning the estimator raised `RuntimeError: Cannot clone object ... constructor either does not set or modifies parameter llm_config`.
  - Make `set_params` tolerate `None` for the same parameters (common in `GridSearchCV` parameter grids) by normalizing to defaults before `_validate_configuration` is called, so callers no longer hit `TypeError: 'NoneType' object is not iterable` from `set(None)`.
  - Add tests covering the None-normalization in `set_params`, validation of non-None values, and the sklearn `clone` round trip.
* Auto feature engineering use-case benchmark:
  - In the Featuretools case, drop the assignment from `train_copy.ww.init(...)`. `Accessor.init` mutates in place and returns `None`, so the previous code clobbered the DataFrame and later raised `'NoneType' object has no attribute 'columns'` when building the EntitySet.
  - In the autofeat case, convert `y_train` to a numpy array via `np.asarray(y_train).ravel()` before calling `fit_transform`, so autofeat receives the 1-D ndarray it expects regardless of whether the caller passes a Series or array.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
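
The difference between truthiness-based and `is not None` defaulting can be shown with two toy classes. Both classes and the `DEFAULT_ENGINES` value are hypothetical and exist only for this sketch; they are not the `AutoFeatureEngineer` constructor.

```python
DEFAULT_ENGINES = ["tabular"]  # assumed default, for illustration only


class TruthinessDefault:
    """Replaces any falsy argument, including explicit empty containers."""

    def __init__(self, engines=None, llm_config=None):
        self.engines = engines or list(DEFAULT_ENGINES)
        self.llm_config = llm_config or {}


class IdentityPreserving:
    """Only substitutes a default when the argument is literally None."""

    def __init__(self, engines=None, llm_config=None):
        self.engines = engines if engines is not None else list(DEFAULT_ENGINES)
        self.llm_config = llm_config if llm_config is not None else {}


empty_config = {}
print(TruthinessDefault(llm_config=empty_config).llm_config is empty_config)   # False
print(IdentityPreserving(llm_config=empty_config).llm_config is empty_config)  # True
print(TruthinessDefault(engines=[]).engines)   # ['tabular']
print(IdentityPreserving(engines=[]).engines)  # []
```

In the first class an empty dict passed by the caller is swapped for a different object, which is the kind of constructor-side modification the quoted `clone()` error message complains about.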

Copilot

Pull request overview

This PR strengthens FeatCopilot’s sklearn-compatible API with earlier validation and a lightweight leakage guard, hardens the time-series/relational engines, and improves benchmark realism by centralizing task-aware splitting plus adding an auto feature engineering use-case benchmark + report.

Changes:

Add leakage detection utilities and wire a configurable leakage guard + early config validation into AutoFeatureEngineer.
Extend TimeSeriesEngine (row-wise series mode + clearer time_column validation) and harden RelationalEngine (relationship dedupe + eager key validation).
Introduce benchmarks/splits.py for task-aware splitting and add a focused auto-FE use-case benchmark with wiring tests.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.

File	Description
`featcopilot/transformers/sklearn_compat.py`	Adds leakage guard, early config validation, forwards transform kwargs in `fit_transform`, expands sklearn-compat `set_params`.
`featcopilot/utils/validation.py`	New helper to detect potential leakage-prone columns via keyword + fuzzy target matching.
`featcopilot/utils/__init__.py`	Exposes `find_potential_leakage_columns` via utils public API.
`featcopilot/engines/timeseries.py`	Adds `series_in_rows` mode and improves `time_column` validation/logging.
`featcopilot/engines/relational.py`	Deduplicates relationships and validates relationship keys in `fit`/`transform`.
`featcopilot/__init__.py`	Makes `__version__` resilient to missing distribution metadata (source checkout import).
`benchmarks/splits.py`	New shared split helper implementing chronological/stratified/random policies.
`benchmarks/compare_tools/run_fe_tools_comparison.py`	Replaces ad-hoc `train_test_split` usage with `split_benchmark_data`.
`benchmarks/datasets.py`	Adds `from __future__ import annotations` for Py3.9 typing compatibility.
`benchmarks/use_cases/run_auto_feature_engineering_benchmark.py`	New interaction-heavy auto-FE benchmark script.
`benchmarks/use_cases/README.md`	Documents how to run the use-case benchmark.
`benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md`	Checked-in generated report for the new use-case benchmark.
`tests/test_utils.py`	Adds tests for leakage detection with non-string column labels.
`tests/test_autofeat.py`	Adds tests for leakage guard modes and early validation failures.
`tests/test_engines.py`	Adds coverage for `series_in_rows` and relational relationship key validation.
`tests/test_sklearn_compat.py`	Adds sklearn clone/set_params behavior tests and import-without-metadata regression test.
`tests/test_benchmark_splits.py`	New tests for split helper behavior + wiring regression checks.
`examples/time_aware_tabular_prototype.py`	New end-to-end example demonstrating time-aware splitting + leakage-safe workflow.
`docs/examples/time-aware-tabular.md`	New docs page for the time-aware example and leakage guard usage.
`docs/examples/relational-feature-engineering.md`	New docs page for relational feature engineering usage/guardrails.
`mkdocs.yml`	Adds the new example docs pages to the navigation.

Address the latest PR review comments on the strengthen-core-api branch:

* AutoFeatureEngineer.set_params:
  - Validate parameter keys against ``self.get_params(deep=True)`` and raise ``ValueError`` on unknown keys, matching scikit-learn's ``BaseEstimator.set_params`` convention. Previously typos like ``afe.set_params(typo_param=42)`` were silently accepted via ``setattr``, which masked bugs and broke tooling that expects sklearn-style validation. Documented the contract in the docstring and added regression tests covering the unknown-key error and that a failing call leaves the estimator unmutated.
* Auto feature engineering use-case benchmark:
  - Drop the redundant one-hot encoding/alignment in ``run_baseline``; ``evaluate_auc`` already owns that step (and applies it idempotently for the FE-tool runners that already produce numeric matrices). The baseline now passes raw frames straight through and reports ``n_features`` from the post-encoding column count so the metric still matches what the model actually trains on.
  - Sanitize ``±inf`` in ``align_and_fill`` (Featuretools' ``divide_numeric`` primitive can emit infinities on zero-denominator rows, which then crashes ``StandardScaler``). NaN handling is preserved.
  - Regenerate the committed report so its status text reflects what the current script actually does. With the upstream-bug fixes in the previous commit, the Featuretools case now succeeds end to end; the autofeat case still fails, but with the genuine current error (sklearn ≥ 1.6 dropped the ``force_all_finite`` kwarg that autofeat 2.1.x still passes) instead of the stale ``'Series' object has no attribute 'ravel'`` symptom.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
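
The unknown-key check can be sketched on a toy estimator. `TinyEstimator` is hypothetical and far simpler than `AutoFeatureEngineer`; it only shows keys being validated against `get_params()` before anything is assigned.

```python
class TinyEstimator:
    """Toy estimator (hypothetical) with sklearn-style get_params/set_params."""

    def __init__(self, max_features=10, leakage_guard="warn"):
        self.max_features = max_features
        self.leakage_guard = leakage_guard

    def get_params(self, deep=True):
        return {"max_features": self.max_features, "leakage_guard": self.leakage_guard}

    def set_params(self, **params):
        valid = self.get_params(deep=True)
        unknown = sorted(key for key in params if key not in valid)
        if unknown:
            # Reject before any setattr so a typo never creates a stray attribute.
            raise ValueError(
                f"Invalid parameter(s) {unknown}; valid parameters are {sorted(valid)}"
            )
        for key, value in params.items():
            setattr(self, key, value)
        return self


est = TinyEstimator()
try:
    est.set_params(typo_param=42)
except ValueError as exc:
    print(exc)
print(hasattr(est, "typo_param"))  # False
```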

Copilot

Pull request overview

This PR strengthens FeatCopilot’s sklearn-compatible API (earlier validation, leakage guard, safer imports), hardens time-series/relational engines, and improves benchmark realism by centralizing task-aware splitting plus adding a focused auto-FE use-case benchmark and report.

Changes:

Add leakage detection utilities + AutoFeatureEngineer leakage guard, early config validation, and improved sklearn set_params behavior.
Harden TimeSeriesEngine (row-wise series mode + clearer time_column validation) and RelationalEngine (relationship dedupe + eager key validation).
Introduce shared benchmarks.splits.split_benchmark_data() and wire benchmark scripts through it; add new use-case benchmark, docs, and tests.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

File	Description
`featcopilot/transformers/sklearn_compat.py`	Adds leakage guard, early validation, `target_name`, forwards kwargs in `fit_transform`, and strengthens `set_params`.
`featcopilot/utils/validation.py`	New `find_potential_leakage_columns()` helper for fuzzy leakage column detection.
`featcopilot/utils/__init__.py`	Exposes leakage detection helper via `featcopilot.utils`.
`featcopilot/engines/timeseries.py`	Adds `series_in_rows` mode and `time_column` validation; refactors transform path.
`featcopilot/engines/relational.py`	Deduplicates relationships and validates relationship keys in `fit`/`transform`.
`featcopilot/__init__.py`	Makes `__version__` import robust when package metadata is missing.
`benchmarks/splits.py`	New task-aware split helper (chronological vs stratified vs random).
`benchmarks/compare_tools/run_fe_tools_comparison.py`	Uses shared split helper instead of direct `train_test_split`.
`benchmarks/use_cases/run_auto_feature_engineering_benchmark.py`	Adds new interaction-heavy auto-FE use-case benchmark and report generation.
`benchmarks/use_cases/README.md`	Documents running the new use-case benchmark.
`benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md`	Committed generated benchmark report.
`benchmarks/datasets.py`	Adds `from __future__ import annotations` for Py3.9 typing compatibility.
`tests/test_benchmark_splits.py`	Adds behavioral + wiring tests for shared split helper.
`tests/test_sklearn_compat.py`	Adds sklearn `set_params` and import-metadata fallback regression coverage.
`tests/test_utils.py`	Adds tests for leakage detection with non-string column labels.
`tests/test_engines.py`	Adds tests for `series_in_rows` and relational key validation behavior.
`tests/test_autofeat.py`	Adds tests for leakage guard modes and early config validation.
`examples/time_aware_tabular_prototype.py`	Adds leakage-safe time-aware workflow example.
`docs/examples/time-aware-tabular.md`	Documents time-aware tabular example and leakage guard.
`docs/examples/relational-feature-engineering.md`	Documents relational engine usage and new guardrails.
`mkdocs.yml`	Adds the new examples to the docs nav.

Address two PR #2 review findings on AutoFeatureEngineer:

1. ``fit`` previously appended into ``self._engine_instances`` and never cleared the prior dict or reset ``self._selector``. After changing ``engines`` via ``set_params`` (or after a previous ``fit_transform`` that built a selector), ``transform`` could run stale engines or apply a stale selection. ``fit`` now calls a new ``_reset_fit_state`` helper that mirrors the fit-derived attribute initialization in ``__init__`` (``_engine_instances``, ``_selector``, ``_feature_set``, ``_is_fitted``, ``_column_descriptions``, ``_task_description``) *before* doing any work, so a refit -- and even a fit that raises midway -- leaves the estimator in a clean, unfitted state instead of a partially-fitted one.
2. ``set_params`` mutated attributes via ``setattr`` and only then called ``_validate_configuration``. A failing call (e.g. an invalid ``engines``/``leakage_guard``/``max_features`` combination) left the estimator in a partially mutated, invalid state. The implementation now snapshots the pre-call values for every parameter in the request *before* applying any mutation (including the eventual ``None`` -> default normalization). On any exception, the snapshot is restored and the original error re-raised, so a failing ``set_params`` is a no-op from the caller's perspective.

Tests:

- test_set_params_invalid_value_rolls_back_state: combined invalid payload, every parameter restored.
- test_set_params_invalid_value_after_none_normalization_rolls_back: rollback restores the pre-call value rather than the None-normalized default.
- test_fit_resets_engine_instances_when_engines_change: removing an engine via set_params and refitting drops the stale fitted engine.
- test_fit_resets_selector_after_prior_fit_transform: a plain fit() following fit_transform() clears the selector so transform() does not re-apply the prior selection.
- test_fit_resets_state_when_called_after_failed_fit: a fit that raises mid-flight leaves _is_fitted=False, _engine_instances={} and transform() correctly errors out.

Full suite: 657 passed, 2 skipped (was 652, +5 new tests). Pre-commit clean (black, ruff, trailing-whitespace, EOF, etc.).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
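
Both fixes, the snapshot-and-rollback in `set_params` and the state reset at the start of `fit`, can be sketched on a toy class. `RollbackEstimator` is hypothetical; it reuses the attribute names from the commit message but deliberately leaves out most of the real estimator (for example the `None` normalization and the remaining fit-derived attributes).

```python
class RollbackEstimator:
    """Toy estimator (hypothetical) showing atomic set_params and fit-state reset."""

    def __init__(self, engines=None, max_features=10):
        self.engines = engines if engines is not None else ["tabular"]
        self.max_features = max_features
        self._reset_fit_state()

    def _reset_fit_state(self):
        # Mirrors the fit-derived attributes initialised in __init__.
        self._engine_instances = {}
        self._selector = None
        self._is_fitted = False

    def _validate_configuration(self):
        if not isinstance(self.max_features, int) or self.max_features <= 0:
            raise ValueError("max_features must be a positive integer")

    def set_params(self, **params):
        # Snapshot first, so nothing is mutated if a key does not exist.
        snapshot = {key: getattr(self, key) for key in params}
        try:
            for key, value in params.items():
                setattr(self, key, value)
            self._validate_configuration()
        except Exception:
            for key, value in snapshot.items():
                setattr(self, key, value)  # restore pre-call values
            raise
        return self

    def fit(self, X):
        self._reset_fit_state()  # drop stale engines before doing any work
        for name in self.engines:
            self._engine_instances[name] = object()  # stand-in for a fitted engine
        self._is_fitted = True
        return self


est = RollbackEstimator(engines=["tabular", "timeseries"]).fit(None)
try:
    est.set_params(engines=["tabular"], max_features=-1)
except ValueError:
    pass
print(est.engines, est.max_features)  # ['tabular', 'timeseries'] 10
print(sorted(est.set_params(engines=["tabular"]).fit(None)._engine_instances))  # ['tabular']
```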

Copilot

Pull request overview

This PR strengthens FeatCopilot’s sklearn-compatible API with earlier validation and leakage detection, hardens time-series/relational engines, and improves benchmark realism by centralizing task-aware split policy plus adding an interaction-heavy auto-feature-engineering use-case benchmark.

Changes:

Add leakage guard + early config validation to AutoFeatureEngineer, plus safer import-time version fallback.
Add split_benchmark_data() and wire benchmark scripts through it; add a new use-case benchmark + report.
Add engines hardening (TimeSeriesEngine.series_in_rows, relational relationship dedupe + key validation) and corresponding tests/docs/examples.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

File	Description
tests/test_utils.py	Adds tests for leakage detection with non-string column labels.
tests/test_sklearn_compat.py	Adds sklearn-compat regression tests for `set_params` behavior and package import fallback.
tests/test_engines.py	Adds coverage for `series_in_rows` and relational relationship key validation.
tests/test_benchmark_splits.py	Adds behavioral + wiring tests for the shared benchmark split helper.
tests/test_autofeat.py	Adds tests for leakage guard modes and early config validation failures.
mkdocs.yml	Wires new example docs pages into the MkDocs nav.
featcopilot/utils/validation.py	Introduces `find_potential_leakage_columns()` helper.
featcopilot/utils/__init__.py	Exposes `find_potential_leakage_columns` via `featcopilot.utils`.
featcopilot/transformers/sklearn_compat.py	Adds leakage guard, early config validation, `target_name`, atomic `set_params`, and fit-state resets.
featcopilot/engines/timeseries.py	Adds `series_in_rows` mode and clearer `time_column` validation/logging.
featcopilot/engines/relational.py	Dedupes relationships and validates relationship keys on fit/transform.
featcopilot/__init__.py	Makes `__version__` robust to missing installed package metadata.
examples/time_aware_tabular_prototype.py	Adds an end-to-end time-aware, leakage-safer workflow example.
docs/examples/time-aware-tabular.md	Documents the time-aware tabular workflow + leakage guard behavior.
docs/examples/relational-feature-engineering.md	Documents relational engine usage + new guardrails.
benchmarks/use_cases/run_auto_feature_engineering_benchmark.py	Adds the interaction-heavy AFE use-case benchmark runner + report writer.
benchmarks/use_cases/README.md	Documents how to run the new use-case benchmark.
benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md	Adds a generated benchmark report artifact.
benchmarks/splits.py	Introduces centralized task-aware train/test split helper for benchmarks.
benchmarks/datasets.py	Adds `from __future__ import annotations` for Py3.9 typing compatibility.
benchmarks/compare_tools/run_fe_tools_comparison.py	Switches comparison benchmark to use `split_benchmark_data`.

… test_size Address two PR #2 review findings: 1. ``AutoFeatureEngineer._validate_configuration`` previously used ``sorted(set(self.engines) - SUPPORTED_ENGINES)`` (and the analogous call for ``selection_methods``) to build its error messages. If the caller passed a mix of non-string values (e.g. ``engines=[None, "spaceship"]``), ``sorted`` raised a confusing ``TypeError`` from the comparison rather than the intended ``ValueError``. The validator now rejects non-string entries up front with a clear ``ValueError`` that names the offending values and lists the supported names. ``set_params`` inherits the new check (and rolls back on it) because validation runs through the same code path. 2. ``benchmarks.splits.split_benchmark_data`` did not validate ``test_size`` for the chronological branch. Values like ``test_size <= 0`` or ``>= 1`` silently produced empty / overlapping splits, whereas the random branch (via ``train_test_split``) would have raised. The function now asserts ``0 < test_size < 1`` up front and additionally raises when the chronological ``split_idx`` would leave either side of the split empty (e.g. tiny datasets combined with an extreme ``test_size``), matching the fail-fast behavior of ``sklearn.model_selection.train_test_split``. Tests added in ``tests/test_sklearn_compat.py``: - ``test_validate_engines_rejects_non_string_entries`` - ``test_validate_selection_methods_rejects_non_string_entries`` - ``test_set_params_rejects_non_string_engine_entries_and_rolls_back`` Tests added in ``tests/test_benchmark_splits.py``: - ``test_split_benchmark_data_rejects_out_of_range_test_size`` (parametrized over both random and chronological branches with 0.0, 1.0, -0.1, 1.5, 2) - ``test_split_benchmark_data_chronological_rejects_empty_train_split`` - ``test_split_benchmark_data_chronological_single_row_dataset_raises`` Full suite: 667 passed, 2 skipped (was 657, +10 new). Pre-commit clean (black, ruff, trailing-whitespace, EOF, etc.). 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
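
For readers following the chronological-branch fix, the fail-fast check described in this commit could be sketched roughly as below. The helper name `chronological_split_index`, the use of `ValueError`, and the ceil-based rounding are assumptions made for illustration; the actual `split_benchmark_data` code is not shown in this thread.

```python
import math


def chronological_split_index(n_rows: int, test_size: float) -> int:
    """Return the index of the first test row (hypothetical helper, not the PR's code)."""
    # Range check first, so 0, 1, negatives and values above 1 fail fast.
    if not 0 < test_size < 1:
        raise ValueError(f"test_size must satisfy 0 < test_size < 1, got {test_size!r}")

    # Assumed rounding: ceil for the test side, so the test split gets at least one row.
    n_test = math.ceil(n_rows * test_size)
    split_idx = n_rows - n_test

    # Either side empty means the split is unusable (tiny data or extreme test_size).
    if split_idx <= 0 or split_idx >= n_rows:
        raise ValueError(
            f"test_size={test_size!r} leaves an empty train or test split for {n_rows} rows"
        )
    return split_idx


rows = list(range(10))  # stand-in for rows already sorted by time
idx = chronological_split_index(len(rows), test_size=0.2)
train, test = rows[:idx], rows[idx:]  # earlier rows train, later rows test
```

Slicing at a single index keeps every training row strictly earlier than every test row, which is the property the random branch cannot guarantee for forecasting data.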

Copilot

Pull request overview

This PR strengthens FeatCopilot’s sklearn-compatible API with earlier validation and lightweight leakage detection, hardens the time-series and relational engines, and improves benchmark realism by centralizing task-aware train/test split behavior and adding a focused auto-FE use-case benchmark/report.

Changes:

Add leakage detection utilities and wire a configurable leakage guard into AutoFeatureEngineer (plus early config validation + more sklearn-compatible set_params behavior).
Harden TimeSeriesEngine (row-wise series mode + clearer time_column validation) and RelationalEngine (relationship dedupe + eager key validation).
Centralize benchmark split policy in benchmarks/splits.py, wire benchmark scripts to it, and add a new auto-FE use-case benchmark + docs/examples/tests.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

File	Description
tests/test_utils.py	Adds unit coverage for leakage detection with non-string column labels and fuzzy target matching.
tests/test_sklearn_compat.py	Expands sklearn-compat tests for `set_params` atomicity/validation, cloning behavior, state reset semantics, and metadata-less import.
tests/test_engines.py	Adds coverage for `TimeSeriesEngine.series_in_rows` and `RelationalEngine` relationship key validation.
tests/test_benchmark_splits.py	Adds behavioral + wiring tests for the shared benchmark split helper and ensures scripts don’t call `train_test_split` directly.
tests/test_autofeat.py	Adds coverage for leakage guard modes and early config validation failures.
mkdocs.yml	Adds new example pages to the documentation nav.
featcopilot/utils/validation.py	Introduces `find_potential_leakage_columns()` helper for leakage-prone column name detection.
featcopilot/utils/__init__.py	Exposes the new leakage helper via `featcopilot.utils`.
featcopilot/transformers/sklearn_compat.py	Adds leakage guard + `target_name`, early config validation, fit-state reset, improved `fit_transform` kwarg forwarding, and robust/atomic `set_params`.
featcopilot/engines/timeseries.py	Adds `series_in_rows` mode and more explicit `time_column` validation/handling.
featcopilot/engines/relational.py	Deduplicates relationships and validates relationship keys early (fit/transform).
featcopilot/__init__.py	Makes `__version__` robust to missing installed package metadata by falling back to `0+unknown`.
examples/time_aware_tabular_prototype.py	Adds an end-to-end time-aware (chronological split) leakage-safer example workflow.
docs/examples/time-aware-tabular.md	Documents the time-aware tabular example and leakage guard behavior.
docs/examples/relational-feature-engineering.md	Adds a relational FE walkthrough and explains new relational guardrails.
benchmarks/use_cases/run_auto_feature_engineering_benchmark.py	Adds a focused interaction-heavy auto-FE use-case benchmark and report generator.
benchmarks/use_cases/README.md	Documents how to run the new use-case benchmark.
benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md	Commits a generated report for the new use-case benchmark.
benchmarks/splits.py	Adds shared task-aware split helper (chronological for forecasting/timeseries, stratified for classification when possible).
benchmarks/datasets.py	Adds `from __future__ import annotations` for Python 3.9-friendly type syntax.
benchmarks/compare_tools/run_fe_tools_comparison.py	Replaces direct `train_test_split` usage with the shared split helper.

Address two PR #2 review findings:

1. ``find_potential_leakage_columns`` (``featcopilot/utils/validation.py``)
   - ``keywords = keywords or DEFAULT_LEAKAGE_KEYWORDS`` prevented callers from intentionally passing ``keywords=[]`` to disable keyword matching. Switched to an explicit ``is None`` check so an empty list is honored.
   - ``normalized_target = ... if target_name else None`` had two bugs: valid falsy targets (e.g. ``0``) were treated as "no target", and ``target_name=""`` would normalize to the empty string, which matches every column via the ``normalized_target in normalized_column`` substring check. Now: explicit ``is None`` check, and an empty normalized result is treated as absent so that ``target_name=""`` (or values like ``"---"`` that strip to nothing) no longer matches every column, while ``target_name=0`` correctly drives matching.

2. ``AutoFeatureEngineer`` (``featcopilot/transformers/sklearn_compat.py``)
   - With the recent ``is not None`` defaulting in ``__init__``, an explicit ``engines=[]`` was preserved. ``fit()`` would then run zero engines and still mark the estimator fitted, making ``transform()`` a silent no-op. ``_validate_configuration`` now rejects empty ``engines`` and empty ``selection_methods`` with a clear ``ValueError`` that points callers at ``None`` (for the default) or the supported names. ``set_params`` inherits this check (and the atomic rollback) automatically.

Tests added in ``tests/test_utils.py``:
- ``test_leakage_detection_empty_keywords_disables_keyword_matching``
- ``test_leakage_detection_falsy_target_name_zero_still_matches``
- ``test_leakage_detection_empty_string_target_does_not_match_everything``

Tests added in ``tests/test_sklearn_compat.py``:
- ``test_init_rejects_empty_engines_list``
- ``test_init_rejects_empty_selection_methods_list``
- ``test_set_params_rejects_empty_engines_and_rolls_back``
- ``test_init_engines_none_still_defaults_to_tabular``

Full suite: 674 passed, 2 skipped (was 667, +7). Pre-commit clean (black, ruff, trailing-whitespace, EOF, etc.).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
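
The difference between truthiness-based and ``is None``-based defaulting described in this commit can be illustrated with a small sketch. The function `resolve_inputs`, the keyword tuple, and the regex-based `normalize` are hypothetical stand-ins; the helper's real normalization and default keyword list are not shown in this thread.

```python
import re

DEFAULT_KEYWORDS = ("label", "target", "future")  # illustrative, not the library's list


def normalize(label) -> str:
    """Lowercase the label and keep only letters and digits (assumed normalization)."""
    return re.sub(r"[^a-z0-9]", "", str(label).lower())


def resolve_inputs(keywords=None, target_name=None):
    """Resolve defaults without conflating 'not provided' with 'falsy'."""
    # `keywords or DEFAULT_KEYWORDS` would replace [] with the defaults;
    # the explicit None check lets callers pass [] to disable keyword matching.
    if keywords is None:
        keywords = DEFAULT_KEYWORDS

    normalized_target = None
    # `if target_name:` would drop a valid falsy label such as 0.
    if target_name is not None:
        # An empty normalized result ("" or "---") is treated as absent, because
        # "" is a substring of every column name and would match everything.
        normalized_target = normalize(target_name) or None
    return list(keywords), normalized_target


print(resolve_inputs(keywords=[], target_name=0))
print(resolve_inputs(target_name="---"))
```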

Copilot

Pull request overview

This PR strengthens FeatCopilot’s sklearn-compatible AutoFeatureEngineer with earlier/safer validation and a lightweight leakage guard, hardens the time-series/relational engines, and improves benchmark realism by centralizing task-aware splitting plus adding an interaction-heavy auto feature engineering use-case benchmark + report.

Changes:

Add leakage detection utilities and wire a configurable leakage guard into AutoFeatureEngineer with early config validation + improved sklearn set_params behavior.
Enhance TimeSeriesEngine (row-wise series mode + time column validation) and RelationalEngine (relationship de-dupe + eager key validation).
Introduce benchmarks.splits.split_benchmark_data() and update benchmark scripts/tests/docs/examples, plus add a new use-case benchmark and report.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

File	Description
`featcopilot/transformers/sklearn_compat.py`	Adds leakage guard + early validation, resets fit state, forwards transform kwargs in `fit_transform`, hardens `set_params`.
`featcopilot/utils/validation.py`	New `find_potential_leakage_columns()` helper for fuzzy leakage-prone column detection.
`featcopilot/utils/__init__.py`	Exposes `find_potential_leakage_columns` via `featcopilot.utils`.
`featcopilot/engines/timeseries.py`	Adds `series_in_rows` mode and validates `time_column` earlier; updates column selection behavior.
`featcopilot/engines/relational.py`	Deduplicates relationships and validates child/parent keys during fit/transform.
`featcopilot/__init__.py`	Makes `__version__` robust to missing package metadata (`0+unknown` fallback).
`benchmarks/splits.py`	Adds centralized task-aware split helper (chronological/stratified/random) with validation.
`benchmarks/compare_tools/run_fe_tools_comparison.py`	Routes data splitting through `split_benchmark_data` instead of direct `train_test_split`.
`benchmarks/use_cases/run_auto_feature_engineering_benchmark.py`	Adds focused interaction-heavy use-case benchmark runner + report writer.
`benchmarks/use_cases/README.md`	Documents how to run the new use-case benchmark and outputs.
`benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md`	Adds generated benchmark report output.
`benchmarks/datasets.py`	Enables `from __future__ import annotations` for newer typing syntax under Python 3.9.
`tests/test_benchmark_splits.py`	New tests covering split helper behavior + wiring assertions for benchmark scripts.
`tests/test_utils.py`	Adds coverage for leakage detection edge cases (non-strings, empty keywords, falsy targets).
`tests/test_autofeat.py`	Adds tests for leakage guard modes and early invalid-config failures.
`tests/test_engines.py`	Adds tests for `series_in_rows` and relational relationship validation behavior.
`tests/test_sklearn_compat.py`	Adds sklearn-compat regression tests (atomic set_params, clone behavior, import fallback).
`mkdocs.yml`	Adds navigation entries for the new example docs pages.
`docs/examples/time-aware-tabular.md`	Adds time-aware tabular workflow example and leakage guard explanation.
`docs/examples/relational-feature-engineering.md`	Adds relational engine usage example + guardrails description.
`examples/time_aware_tabular_prototype.py`	Adds an end-to-end leakage-safe temporal-split prototype script.

…rget_name typing

Address two PR #2 review findings:

1. ``_validate_configuration`` previously assumed ``self.engines`` and ``self.selection_methods`` were iterable sequences. A bare ``str`` (e.g. ``engines="tabular"``) would silently iterate character-by-character through the downstream non-string-entry / set-diff checks and surface as a confusing ``Unknown engines: ['a', 'b', 'l', 'r', 't', 'u']`` error, while a non-iterable value (e.g. ``engines=5``) raised an unrelated ``TypeError`` from ``set(self.engines)``. The validator now rejects anything that is not a ``list`` or ``tuple`` up front with a clear ``ValueError`` that names the offending type and value, and runs *before* every other engines/methods check so the rest of the validator can rely on a well-formed container. ``set_params`` inherits this guard (and its atomic rollback) automatically.

2. ``fit`` and ``fit_transform`` annotated ``target_name`` as ``Optional[str]``, but ``find_potential_leakage_columns`` is explicitly designed to accept any column-label type DataFrames support (the helper normalizes labels via ``str(...)`` and is covered by tests with integer labels). The annotation has been widened to ``Optional[Any]`` on both methods, the matching docstring entries now read ``hashable, optional`` and call out that non-string labels are accepted, and the class-level ``Other Parameters`` block was updated for consistency.

Tests added in ``tests/test_sklearn_compat.py``:
- ``test_init_rejects_string_engines_argument``
- ``test_init_rejects_non_sequence_engines_argument`` (covers ``int`` and ``dict``)
- ``test_init_rejects_string_selection_methods_argument``
- ``test_init_rejects_non_sequence_selection_methods_argument``
- ``test_init_accepts_tuple_engines`` (pins tuple support so future tightenings don't accidentally break it)
- ``test_set_params_rejects_string_engines_and_rolls_back`` (also verifies prior ``max_features`` is restored)
- ``test_fit_accepts_non_string_target_name`` (integer column label ``0`` honored as a target by the leakage guard)

Full suite: 681 passed, 2 skipped (was 674, +7). Pre-commit clean (black, ruff, trailing-whitespace, EOF, etc.).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
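
The ordering of checks matters here: container type first, then emptiness, then entry types, and only then the set difference. A rough sketch of that ordering is below; `validate_names` and the contents of `SUPPORTED_ENGINES` are assumptions for illustration and may not match the real validator or the real engine names.

```python
SUPPORTED_ENGINES = frozenset({"tabular", "timeseries", "relational", "text"})  # assumed names


def validate_names(value, supported, param):
    """Validate a list/tuple of names, failing with ValueError at the earliest bad property."""
    # 1. Container type: a bare str would otherwise be iterated character by character.
    if not isinstance(value, (list, tuple)):
        raise ValueError(
            f"{param} must be a list or tuple of names, got {type(value).__name__}: {value!r}"
        )
    # 2. Emptiness: running zero engines would make fit/transform a silent no-op.
    if not value:
        raise ValueError(f"{param} must not be empty; supported: {sorted(supported)}")
    # 3. Entry types: sorted() on mixed None/str values would raise TypeError instead.
    non_strings = [v for v in value if not isinstance(v, str)]
    if non_strings:
        raise ValueError(f"{param} entries must be strings, got {non_strings!r}")
    # 4. Membership: safe to sort now that every entry is a string.
    unknown = sorted(set(value) - supported)
    if unknown:
        raise ValueError(f"Unknown {param}: {unknown}; supported: {sorted(supported)}")
    return list(value)


# Why step 1 is needed: a bare string degrades into a set of characters.
print(sorted(set("tabular") - SUPPORTED_ENGINES))  # ['a', 'b', 'l', 'r', 't', 'u']
print(validate_names(("tabular",), SUPPORTED_ENGINES, "engines"))  # ['tabular']
```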

Copilot

Pull request overview

This PR strengthens FeatCopilot’s sklearn-compatible AutoFeatureEngineer with earlier/safer validation + leakage guarding, hardens the time-series and relational engines, and improves benchmark realism by centralizing task-aware splitting and adding an interaction-heavy auto-FE use-case benchmark/report.

Changes:

Added leakage detection utilities and integrated a configurable leakage guard + early config validation into AutoFeatureEngineer (including more sklearn-compatible set_params behavior).
Hardened engines: TimeSeriesEngine adds series_in_rows mode and stricter time_column validation; RelationalEngine dedupes relationships and validates keys early.
Benchmarks now share a split helper (split_benchmark_data) and include a new auto feature engineering use-case benchmark + generated report.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 1 comment.

File	Description
`featcopilot/transformers/sklearn_compat.py`	Adds leakage guard + target-aware leakage checking, early configuration validation, fit state reset, and more sklearn-compatible `set_params`.
`featcopilot/utils/validation.py`	Introduces `find_potential_leakage_columns()` helper for leakage-prone column name detection.
`featcopilot/utils/__init__.py`	Exposes `find_potential_leakage_columns` via `featcopilot.utils`.
`featcopilot/engines/timeseries.py`	Adds `series_in_rows` mode and validates `time_column` presence; updates transform logic accordingly.
`featcopilot/engines/relational.py`	Dedupes relationships and adds eager relationship/key validation in `fit` and `transform`.
`featcopilot/__init__.py`	Makes package import robust by falling back to `__version__ = "0+unknown"` when metadata is missing.
`benchmarks/splits.py`	Centralizes benchmark split policy (chronological for forecasting/timeseries, stratified for classification when possible).
`benchmarks/compare_tools/run_fe_tools_comparison.py`	Switches benchmark splitting to the shared `split_benchmark_data` helper.
`benchmarks/use_cases/run_auto_feature_engineering_benchmark.py`	Adds a focused interaction-heavy use-case benchmark comparing FeatCopilot to baseline/tools.
`benchmarks/use_cases/README.md`	Documents how to run the new use-case benchmark and expected outputs.
`benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md`	Checked-in generated benchmark report.
`benchmarks/datasets.py`	Adds `from __future__ import annotations` to support Python 3.9 type syntax usage.
`tests/test_utils.py`	Adds coverage for leakage detection behavior across edge cases (non-string columns, empty keywords, falsy target).
`tests/test_sklearn_compat.py`	Adds extensive regression coverage for sklearn estimator behavior, validation, rollback, and import fallback.
`tests/test_engines.py`	Adds coverage for `series_in_rows` timeseries mode and relational relationship validation.
`tests/test_autofeat.py`	Adds coverage for leakage guard modes and early config validation failures.
`tests/test_benchmark_splits.py`	Adds behavioral + wiring tests enforcing benchmark scripts use `split_benchmark_data`.
`examples/time_aware_tabular_prototype.py`	Adds an end-to-end leakage-safe time-aware workflow example.
`docs/examples/time-aware-tabular.md`	Documents the time-aware tabular example and leakage guard usage.
`docs/examples/relational-feature-engineering.md`	Documents relational feature engineering usage and new guardrails.
`mkdocs.yml`	Adds the new docs pages to the navigation.

…s to empty string

Address PR #2 review finding:

``find_potential_leakage_columns`` already guarded the *target* side of the empty-substring trap (treating ``target_name=""`` / ``"---"`` as absent), but the *column* side could still trigger it. If a column label normalized to an empty string, the ``normalized_column in normalized_target`` substring check evaluated ``True`` (because ``"" in "label"`` is ``True``), so a column literally named e.g. ``"---"`` or ``"!!!"`` would be flagged as leakage-prone whenever any ``target_name`` was provided.

Fixed by skipping any column whose label normalizes to an empty string entirely (``if not normalized_column: continue`` before both the keyword and target match blocks). An empty normalized column has no meaningful content to compare against, so neither side of the match should run for it.

Tests added in ``tests/test_utils.py``:
- ``test_leakage_detection_columns_normalizing_to_empty_string_are_skipped`` covers:
  - ``["---", "!!!"]`` with ``target_name="label"`` returns ``[]``.
  - ``["---", "label_x"]`` returns only ``["label_x"]``.
  - Mixing meaningful and empty-normalizing labels still reports only the meaningful ones.

Also expanded the docstring ``Notes`` block to document the symmetric column-side guard alongside the existing target-side description.

Full suite: 682 passed, 2 skipped (was 681, +1). Pre-commit clean (black, ruff, trailing-whitespace, EOF, etc.).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
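
A compact sketch of the column-side guard, under the assumption that normalization lowercases the label and drops everything except letters and digits. `find_leaky_columns`, `normalize`, and the keyword tuple are hypothetical names used only for illustration, not the helper's actual implementation.

```python
import re


def normalize(label) -> str:
    """Lowercase the label and keep only letters and digits (assumed normalization)."""
    return re.sub(r"[^a-z0-9]", "", str(label).lower())


def find_leaky_columns(columns, target_name=None, keywords=("label", "target")):
    """Flag columns whose normalized name matches a keyword or the target, in either direction."""
    normalized_target = normalize(target_name) if target_name is not None else ""
    flagged = []
    for column in columns:
        normalized_column = normalize(column)
        # Column-side guard: "" is a substring of every string, so an empty
        # normalized label would always "match" the target below.
        if not normalized_column:
            continue
        keyword_hit = any(keyword in normalized_column for keyword in keywords)
        # Target-side guard: only compare when the target normalized to something.
        target_hit = bool(normalized_target) and (
            normalized_target in normalized_column
            or normalized_column in normalized_target
        )
        if keyword_hit or target_hit:
            flagged.append(column)
    return flagged


print("" in "label")  # True, which is the trap the guard avoids
print(find_leaky_columns(["---", "!!!"], target_name="label"))  # []
print(find_leaky_columns(["---", "label_x"], target_name="label"))  # ['label_x']
```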

Copilot

Pull request overview

This PR strengthens FeatCopilot’s sklearn-compatible AutoFeatureEngineer with earlier config validation and a lightweight leakage guard, hardens time-series/relational engines, and improves benchmark realism by introducing a shared task-aware split helper plus a focused auto feature engineering use-case benchmark/report.

Changes:

Add leakage detection utility and wire it into AutoFeatureEngineer with new leakage_guard + target_name support, plus safer/atomic set_params behavior.
Harden TimeSeriesEngine (row-wise series mode + clearer time_column validation) and RelationalEngine (relationship dedupe + eager key validation).
Introduce benchmarks.splits.split_benchmark_data() and wire benchmark scripts through it; add a new interaction-heavy use-case benchmark and report; update docs/examples and tests accordingly.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated no comments.

File	Description
`featcopilot/transformers/sklearn_compat.py`	Early config validation, leakage guard, state reset on refit/failure, `fit_transform()` kwarg forwarding, atomic sklearn-style `set_params`.
`featcopilot/utils/validation.py`	New `find_potential_leakage_columns()` helper for fuzzy leakage-prone column detection.
`featcopilot/utils/__init__.py`	Exposes `find_potential_leakage_columns` via `featcopilot.utils`.
`featcopilot/engines/timeseries.py`	Adds `series_in_rows` mode and validates `time_column` earlier; adjusts transform behavior accordingly.
`featcopilot/engines/relational.py`	Dedupes relationships and validates relationship keys in `fit()` and `transform()`.
`featcopilot/__init__.py`	Makes package import resilient when distribution metadata is unavailable (`__version__` fallback).
`benchmarks/splits.py`	New shared task-aware split helper (chronological/stratified/random) with `test_size` validation.
`benchmarks/compare_tools/run_fe_tools_comparison.py`	Replaces direct `train_test_split` usage with `split_benchmark_data`.
`benchmarks/use_cases/run_auto_feature_engineering_benchmark.py`	New use-case benchmark script comparing tools on an interaction-heavy classification task.
`benchmarks/use_cases/README.md`	Documents how to run the new use-case benchmark.
`benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md`	Committed generated report for the new use-case benchmark.
`benchmarks/datasets.py`	Enables `from __future__ import annotations` for Py3.9-friendly type hints.
`tests/test_utils.py`	Adds coverage for leakage detection edge cases (non-string labels, empty keywords, empty targets).
`tests/test_sklearn_compat.py`	Adds extensive sklearn-compat/regression tests (atomic set_params, clone behavior, refit resets, import fallback).
`tests/test_engines.py`	Adds tests for `TimeSeriesEngine.series_in_rows` and `RelationalEngine` key validation.
`tests/test_benchmark_splits.py`	New tests covering split helper behavior + regression wiring checks for benchmark scripts.
`tests/test_autofeat.py`	Adds coverage for leakage guard modes and early config validation errors.
`examples/time_aware_tabular_prototype.py`	New end-to-end time-aware/leakage-safe workflow example.
`docs/examples/time-aware-tabular.md`	Documentation for the time-aware tabular example and leakage guard usage.
`docs/examples/relational-feature-engineering.md`	Documentation for relational feature engineering usage + new guardrails.
`mkdocs.yml`	Adds the new examples to the documentation navigation.

Addresses all five review comments from Copilot and Codex on PR #5:

* explain now actually returns generated features (Copilot review #1, Codex P1). Built-in engines (e.g. tabular) populate `_feature_names` during `transform()`, not `fit()`. `_cmd_explain` now calls `fit_transform(..., apply_selection=False)` so the JSON payload contains the full `{name, explanation, code}` records the subcommand advertises. Test asserts `n_features > 0` for tabular.
* main(argv) -> int contract honored on parse errors (Copilot review #2). `argparse.parse_args` raises `SystemExit` for usage errors, `--help` and `--version`. `main` now traps those and returns the exit code so programmatic and agent callers always get an int. Tests cover `--version` (rc=0), `--help` (rc=0), no-subcommand (rc=2) and unknown-flag (rc=2).
* Real subprocess test for python -m featcopilot (Copilot review #3). `test_dunder_main_subprocess_invocation` and `test_dunder_main_subprocess_version_flag` spawn a real `python -m featcopilot ...` subprocess and assert stdout JSON, so a regression in `__main__.py` actually breaks the suite.
* Parquet `ImportError` -> clean exit 2 (Codex P2). `_read_table`/`_write_table` now wrap parquet calls and convert `ImportError` into a `ValueError` with a friendly install hint; the top-level handler routes that to the deterministic `exit 2` user-error path instead of the generic `exit 1` backstop. `test_transform_parquet_missing_engine_returns_exit_2` exercises this via `monkeypatch` of `DataFrame.to_parquet`.
* Pre-commit black: re-applied formatting from the pinned `black 24.1.1` hook (joined two long string raises) so the CI pre-commit job passes.

Tests: 23 (+5 new) in tests/test_cli.py, 796 passed full suite.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
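
The `main(argv) -> int` contract can be sketched with plain `argparse`. The parser below (program name `demo`, a single `info` subcommand, the version string) is a hypothetical stand-in for the real CLI; only the `SystemExit` handling is the point.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the real CLI's subcommands and flags are not reproduced here.
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--version", action="version", version="demo 0+unknown")
    subparsers = parser.add_subparsers(dest="command", required=True)
    subparsers.add_parser("info")
    return parser


def main(argv=None) -> int:
    parser = build_parser()
    try:
        parser.parse_args(argv)
    except SystemExit as exc:
        # argparse exits with 0 for --help/--version and 2 for usage errors.
        # Trap the exit so callers always receive an int instead of an exception.
        if exc.code is None:
            return 0
        return exc.code if isinstance(exc.code, int) else 1
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

Keeping the `SystemExit` only at the `if __name__ == "__main__"` boundary lets tests and agent callers invoke `main([...])` directly and compare return codes.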

…ut cases

Round-25 reviewer feedback addressed:

1. **Copilot: empty-input branch conflated zero-row and zero-column.** `_read_table` raised the same "zero data rows" error for both `len(df) == 0` and `len(df.columns) == 0` (both make `DataFrame.empty` return `True`). A JSON `[{}, {}]` array produces a frame WITH rows but NO columns, a very different user error than a header-only CSV. Split into two distinct, accurately worded error messages: "no columns" vs "zero data rows".

2. **Copilot: read-time warnings leaked to stderr.** `_cmd_transform` only opened the capture context around `engineer.fit_transform`. `pd.read_csv` can legitimately emit `DtypeWarning` on mixed-type CSVs and parquet/JSON readers can emit pyarrow / pandas warnings on a successful read; those were bypassing the capture and bleeding to stderr, breaking the "stderr reserved for failures" contract.

3. **Copilot: write-time warnings also leaked to stderr.** `_write_table` was likewise outside the capture, so pandas / pyarrow `FutureWarning` / `UserWarning` from a successful write could leak. Same contract violation as #2.

The fix wraps the entire `_cmd_transform` pipeline (read + build_engineer + fit_transform + write) in a single `_capture_featcopilot_messages` block so warnings from ANY phase end up in the JSON `warnings` field instead of stderr. `_cmd_explain` also widened: `explain_features` / `get_feature_code` are now inside the same capture as the read+fit_transform. The dead helper `_fit_transform_capturing_warnings` is no longer used internally; kept as a thin convenience wrapper for external test code with an updated docstring noting that the CLI now wraps a wider region.

New tests:
- `test_transform_zero_columns_input_distinguishes_from_zero_rows`: pins the column-vs-row error-message distinction.
- `test_transform_zero_rows_input_still_uses_zero_rows_message`: guards the existing zero-rows wording for the header-only case.
- `test_transform_read_warning_captured_not_on_stderr`: patches `pd.read_csv` to emit a warning, asserts `err == ""` and the warning lands in JSON `warnings`.
- `test_transform_write_warning_captured_not_on_stderr`: patches `DataFrame.to_csv` to emit a warning, same contract.
- `test_explain_features_warnings_captured_not_on_stderr`: patches `AutoFeatureEngineer.explain_features` to emit a warning, same contract for `_cmd_explain`.

906 tests pass locally (full suite); 133 in tests/test_cli.py.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
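
The idea of widening the capture region can be shown with the standard-library `warnings` module. This is only a sketch: `run_pipeline` and the stub read/write functions are hypothetical, and `warnings.catch_warnings(record=True)` is used here as a stand-in for the CLI's own `_capture_featcopilot_messages` context, whose implementation is not shown.

```python
import json
import warnings


def run_pipeline(read, transform, write) -> dict:
    """Run read -> transform -> write inside ONE capture block and report warnings as data."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, including repeats
        data = read()             # read-time warnings are captured
        result = transform(data)  # transform-time warnings are captured
        write(result)             # write-time warnings are captured
    return {
        "status": "ok",
        "warnings": [f"{w.category.__name__}: {w.message}" for w in caught],
    }


def noisy_read():
    warnings.warn("columns have mixed types", UserWarning)
    return [1, 2, 3]


def noisy_write(rows):
    warnings.warn("writer option is deprecated", FutureWarning)


payload = run_pipeline(noisy_read, lambda rows: [r * 2 for r in rows], noisy_write)
print(json.dumps(payload))  # warnings travel in the JSON payload, not on stderr
```

Had the `with` block wrapped only the `transform` call, the warnings from `noisy_read` and `noisy_write` would have gone through the default handler and been written to stderr.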

thinkall added 5 commits March 17, 2026 21:48

fix: harden source imports and fit_transform kwargs

8e5d323

feat: add leakage guard and time-aware prototype

50b399c

feat: improve engine APIs and benchmark realism

50957fa

docs: add relational example and branch summary

db24392

bench: add auto feature engineering use-case report

9315422

thinkall requested a review from Copilot March 18, 2026 07:02

Copilot started reviewing on behalf of thinkall March 18, 2026 07:02

style: run pre-commit on benchmark script

c4a8a93

Copilot AI reviewed Mar 18, 2026

Comment thread featcopilot/utils/validation.py Outdated

Comment thread featcopilot/utils/validation.py Outdated

Comment thread featcopilot/transformers/sklearn_compat.py

Comment thread benchmarks/compare_tools/run_fe_tools_comparison.py Outdated

Copilot started work on behalf of thinkall April 28, 2026 05:23

Copilot AI and others added 6 commits April 28, 2026 05:26

fix: address benchmark and leakage review feedback

a09dc98

Agent-Logs-Url: https://github.com/thinkall/featcopilot/sessions/81d365b5-07ae-4e09-aa5a-8508c9323f42 Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

test: cover leakage column edge cases

4ddcce5

Agent-Logs-Url: https://github.com/thinkall/featcopilot/sessions/81d365b5-07ae-4e09-aa5a-8508c9323f42 Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

refactor: simplify leakage target matching

e280cfc

Agent-Logs-Url: https://github.com/thinkall/featcopilot/sessions/81d365b5-07ae-4e09-aa5a-8508c9323f42 Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

docs: clarify leakage target matching

4dcdfe9

Agent-Logs-Url: https://github.com/thinkall/featcopilot/sessions/81d365b5-07ae-4e09-aa5a-8508c9323f42 Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

test: cover mixed leakage keyword columns

f5e78a8

Agent-Logs-Url: https://github.com/thinkall/featcopilot/sessions/81d365b5-07ae-4e09-aa5a-8508c9323f42 Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

test: cover non-string leakage targets

b1a3275

Agent-Logs-Url: https://github.com/thinkall/featcopilot/sessions/81d365b5-07ae-4e09-aa5a-8508c9323f42 Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

Copilot finished work on behalf of thinkall April 28, 2026 05:33

thinkall and others added 2 commits April 29, 2026 16:52

thinkall requested a review from Copilot April 30, 2026 07:38

Copilot started reviewing on behalf of thinkall April 30, 2026 07:39