
bench: add auto feature engineering use-case report #2

Merged
thinkall merged 22 commits into main from feat/strengthen-core-api
Apr 30, 2026

Conversation

@thinkall
Owner

@thinkall thinkall commented Mar 18, 2026

Summary

Strengthen FeatCopilot's core API with safer defaults and earlier validation, add a lightweight leakage guard for label/future-information columns, harden the time-series and relational engines, and improve benchmark realism with a shared task-aware split helper plus a focused use-case benchmark for interaction-heavy auto feature engineering.

Motivation

The sklearn-compatible AutoFeatureEngineer was easy to misuse in real workflows:

  • invalid engine names or selection methods only failed deep inside fit(), often after expensive setup
  • there was no built-in signal for label/future-information leakage when feeding curated tabular data
  • fit_transform() silently dropped transform-time kwargs (e.g. text_columns, related_tables)
  • the source checkout failed to import when no installed package metadata was available

The benchmarks also under-represented realistic FE workloads:

  • train_test_split(...) was applied uniformly to forecasting datasets, which masked time-leakage effects
  • the comparison suite lacked a focused, interaction-heavy use-case showing where automated FE actually helps

This PR addresses both concerns incrementally without redesigning any public API.

What changed

Core API (featcopilot/transformers/sklearn_compat.py, featcopilot/__init__.py)

  • Added leakage_guard config ('off' | 'warn' | 'raise', default 'warn') that flags column names matching configurable label/future keywords or fuzzy variants of the target.
  • Added a fit-time target_name parameter on fit() and fit_transform() for leakage-aware column matching; documented in both method docstrings and the class docstring's Other Parameters section.
  • Hardened early configuration validation: unknown engines, unknown selection methods, invalid leakage_guard, and non-positive max_features now raise ValueError immediately in __init__ (and on set_params).
  • fit_transform() now forwards transform-time kwargs (e.g. text_columns, related_tables) to the inner transform() call so one-shot workflows match staged ones.
  • featcopilot.__version__ falls back to "0+unknown" when package metadata is unavailable (e.g. fresh source checkouts), instead of raising PackageNotFoundError at import time.
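
The version fallback in the last bullet can be sketched as follows. This is a minimal illustration rather than the actual contents of featcopilot/__init__.py; the helper name `resolve_version` and the distribution name used below are assumptions made for the example.

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_version(dist_name, fallback="0+unknown"):
    """Return the installed version of dist_name, or fallback if no metadata exists."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Typical for a fresh source checkout that was never pip-installed.
        return fallback


# Hypothetical distribution name chosen because it is certainly not installed.
missing = resolve_version("surely-not-an-installed-distribution-xyz")
```

Resolving the version this way keeps `import featcopilot` working from a plain checkout, at the cost of reporting a placeholder version string.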

Engines (featcopilot/engines/timeseries.py, featcopilot/engines/relational.py)

  • TimeSeriesEngine: added an opt-in series_in_rows mode for row-wise sequence cells; clearer validation of time_column. Default aggregate-style behavior is preserved.
  • RelationalEngine: deduplicates relationships across repeated add_relationship calls and validates child/parent keys eagerly so missing keys fail fast with actionable errors.
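
The RelationalEngine behavior described above (deduplication plus eager key validation) might look roughly like this. `Relationship` and `RelationshipRegistry` are hypothetical stand-ins written for this sketch, not the engine's real classes or signatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    child: str
    parent: str
    child_key: str
    parent_key: str


class RelationshipRegistry:
    """Hypothetical stand-in for the engine's relationship bookkeeping."""

    def __init__(self):
        self._relationships = []

    def add_relationship(self, rel):
        # Dataclasses compare by value, so a repeated declaration is ignored.
        if rel not in self._relationships:
            self._relationships.append(rel)

    def validate(self, table_columns):
        """Fail fast if a relationship references a missing table or key."""
        for rel in self._relationships:
            for table, key in ((rel.child, rel.child_key), (rel.parent, rel.parent_key)):
                if table not in table_columns:
                    raise ValueError(f"Unknown table {table!r} in {rel}")
                if key not in table_columns[table]:
                    raise ValueError(
                        f"Key {key!r} not found in table {table!r}; "
                        f"available columns: {sorted(table_columns[table])}"
                    )


registry = RelationshipRegistry()
registry.add_relationship(Relationship("orders", "customers", "customer_id", "id"))
registry.add_relationship(Relationship("orders", "customers", "customer_id", "id"))
n_relationships = len(registry._relationships)
```

Validating before any aggregation runs means a misspelled key surfaces with the list of available columns instead of as a KeyError deep inside a join.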

Utilities (featcopilot/utils/validation.py, featcopilot/utils/__init__.py)

  • New find_potential_leakage_columns() helper, exposed at featcopilot.utils, robust to non-string column labels (ints, etc.) and fuzzy target-name matching.
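
To make the keyword and fuzzy matching concrete, here is a small sketch of the idea. It is not the implementation of find_potential_leakage_columns(); the function names, the keyword tuple, and the 0.85 similarity threshold are all assumptions chosen for illustration.

```python
import warnings
from difflib import SequenceMatcher

DEFAULT_KEYWORDS = ("target", "label", "future")  # assumed defaults, illustration only


def flag_suspicious_columns(columns, target_name=None, keywords=DEFAULT_KEYWORDS, threshold=0.85):
    """Return column labels whose names look like the label or future information."""
    flagged = []
    for col in columns:
        name = str(col).lower()  # str() keeps integer labels from crashing the check
        keyword_hit = any(kw in name for kw in keywords)
        fuzzy_hit = (
            target_name is not None
            and SequenceMatcher(None, name, str(target_name).lower()).ratio() >= threshold
        )
        if keyword_hit or fuzzy_hit:
            flagged.append(col)
    return flagged


def apply_leakage_guard(columns, mode="warn", target_name=None):
    """Mimic the 'off' | 'warn' | 'raise' modes described in the PR summary."""
    if mode not in {"off", "warn", "raise"}:
        raise ValueError(f"leakage_guard must be 'off', 'warn', or 'raise', got {mode!r}")
    if mode == "off":
        return []
    flagged = flag_suspicious_columns(columns, target_name=target_name)
    if flagged and mode == "raise":
        raise ValueError(f"Potential leakage columns: {flagged}")
    if flagged:
        warnings.warn(f"Potential leakage columns: {flagged}", UserWarning)
    return flagged


cols = ["age", 0, "future_sales", "churned_", "income"]
flagged = flag_suspicious_columns(cols, target_name="churned")
```

Name-based matching is only a heuristic: it cannot see leakage hidden behind an innocuous column name, which is presumably why the guard defaults to a warning rather than an error.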

Benchmarks (benchmarks/)

  • New shared module benchmarks/splits.py centralizing the split policy:
    • chronological split for forecast / timeseries tasks
    • stratified split for classification (when class counts allow)
    • random split otherwise
  • Wired both compare_tools/run_fe_tools_comparison.py and the new use_cases/run_auto_feature_engineering_benchmark.py through the shared helper, replacing direct train_test_split calls.
  • Added focused use-case benchmark benchmarks/use_cases/run_auto_feature_engineering_benchmark.py plus its README and generated report (AUTO_FEATURE_ENGINEERING_USE_CASE.md), comparing FeatCopilot, Featuretools, and autofeat on an interaction-heavy classification task.
  • Added from __future__ import annotations to benchmarks/datasets.py so PEP 604 (X | None) annotations work under Python 3.9.
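
The split policy listed under benchmarks/splits.py can be sketched on row indices alone. The function below is a hypothetical illustration, not split_benchmark_data itself: its name, signature, and task labels are assumptions, and it assumes rows are already sorted by time for forecasting tasks.

```python
import random
from collections import Counter, defaultdict


def split_indices(task, n_rows, labels=None, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) following a task-aware policy."""
    n_test = max(1, round(n_rows * test_size))
    rng = random.Random(seed)

    if task in {"forecast", "timeseries"}:
        # Chronological: the most recent rows form the test set, no shuffling.
        cut = n_rows - n_test
        return list(range(cut)), list(range(cut, n_rows))

    if task == "classification" and labels is not None and min(Counter(labels).values()) >= 2:
        # Stratified: sample roughly the same fraction from each class.
        by_class = defaultdict(list)
        for idx, label in enumerate(labels):
            by_class[label].append(idx)
        test_idx = []
        for members in by_class.values():
            rng.shuffle(members)
            # Keep at least one row of every class on each side of the split.
            k = min(len(members) - 1, max(1, round(len(members) * test_size)))
            test_idx.extend(members[:k])
        test_set = set(test_idx)
        train_idx = [i for i in range(n_rows) if i not in test_set]
        return train_idx, sorted(test_idx)

    # Fallback: plain random split.
    indices = list(range(n_rows))
    rng.shuffle(indices)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


train_idx, test_idx = split_indices("forecast", n_rows=10)
```

Keeping the policy in one function is what lets a wiring test assert that benchmark scripts never call train_test_split directly.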

Docs / examples

  • examples/time_aware_tabular_prototype.py: end-to-end leakage-safe workflow.
  • docs/examples/time-aware-tabular.md and docs/examples/relational-feature-engineering.md: walkthroughs wired into mkdocs.yml.

Tests

  • tests/test_autofeat.py: covers the leakage guard modes and the early-config-validation failures.
  • tests/test_engines.py: covers TimeSeriesEngine.series_in_rows and RelationalEngine dedupe / key-validation behavior.
  • tests/test_sklearn_compat.py: regression test for source-checkout import without package metadata.
  • tests/test_utils.py: covers leakage detection for non-string column labels and mixed keyword/target hits.
  • tests/test_benchmark_splits.py (new): behavioral coverage of the split helper (chronological / stratified / random / custom test_size) plus wiring regression tests asserting both benchmark scripts use split_benchmark_data instead of train_test_split directly.

Behavior changes

  • AutoFeatureEngineer(...) now raises ValueError instead of silently accepting unknown engines, unknown selection methods, an invalid leakage_guard, or a non-positive max_features. Callers passing valid configs are unaffected.
  • With the default leakage_guard='warn', fitting on a frame with suspicious column names (e.g. target, label_*, future_*) emits a UserWarning. Set leakage_guard='off' to restore prior silent behavior.
  • Forecasting/timeseries datasets in the in-scope benchmarks now use a chronological split. Pre-existing simple_models and automl benchmark scripts are intentionally left untouched (out of scope; their split policy is unchanged).
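
As a rough picture of the first behavior change, constructor-time validation could be structured like this. The class name and the supported engine names are placeholders invented for the sketch; only the three leakage_guard modes and the positive max_features rule come from the description above.

```python
SUPPORTED_ENGINES = {"tabular", "timeseries", "relational", "text"}  # placeholder names
SUPPORTED_LEAKAGE_MODES = {"off", "warn", "raise"}


class EarlyValidatedConfig:
    """Hypothetical estimator-like class that validates its config in __init__."""

    def __init__(self, engines=None, leakage_guard="warn", max_features=None):
        self.engines = engines if engines is not None else ["tabular"]
        self.leakage_guard = leakage_guard
        self.max_features = max_features
        self._validate_configuration()

    def _validate_configuration(self):
        # key=repr keeps sorting well-defined even if non-string values sneak in.
        unknown = sorted(set(self.engines) - SUPPORTED_ENGINES, key=repr)
        if unknown:
            raise ValueError(f"Unknown engines: {unknown}")
        if self.leakage_guard not in SUPPORTED_LEAKAGE_MODES:
            raise ValueError(f"Invalid leakage_guard: {self.leakage_guard!r}")
        if self.max_features is not None and self.max_features <= 0:
            raise ValueError(f"max_features must be positive, got {self.max_features}")


valid = EarlyValidatedConfig(engines=["tabular", "text"], max_features=50)
```

A misconfigured instance then fails at construction, before any expensive fit-time setup has started.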

Backward compatibility

  • No public API removals. All new constructor and fit-time parameters have safe defaults that preserve prior behavior for valid existing usage.
  • The leakage_guard='warn' default is the only user-visible behavior change for valid existing usage; opt out via leakage_guard='off'.

Testing

  • pre-commit run --all-files — passes (black, ruff, hooks).
  • pytest tests/: 647 passed, 2 skipped locally.
  • Verified on Python 3.9 (fresh conda env reproducing the original Py3.9 CI failure) and Python 3.13.
  • New use-case benchmark report regenerated at benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md.

Checklist

  • Tests added / updated for new behavior
  • Docs / examples updated where user-facing behavior changed
  • pre-commit run --all-files passes
  • pytest tests/ passes locally
  • No public API removals; new parameters have backward-compatible defaults


Copilot AI left a comment


Pull request overview

This PR strengthens FeatCopilot’s core API reliability and adds practical safety/benchmarking assets, including a new “auto feature engineering” use-case benchmark report and improved guardrails in the sklearn-compatible API and engines.

Changes:

  • Add leakage_guard + early configuration validation to AutoFeatureEngineer, plus safer __version__ fallback when package metadata is missing.
  • Extend TimeSeriesEngine (row-wise series mode) and RelationalEngine (relationship dedupe + key validation), with corresponding tests and docs/examples.
  • Add a focused use-case benchmark script/report and start introducing more realistic benchmark split logic.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/test_sklearn_compat.py Adds regression test for importing from source checkout without dist metadata.
tests/test_engines.py Adds coverage for new TimeSeries/Relational validation behaviors.
tests/test_autofeat.py Adds tests for leakage guard + early config validation failures.
mkdocs.yml Adds new examples to the documentation nav.
featcopilot/utils/validation.py Introduces leakage-column detection helper.
featcopilot/utils/__init__.py Exposes find_potential_leakage_columns at utils level.
featcopilot/transformers/sklearn_compat.py Adds leakage_guard, config validation, and target_name plumbing.
featcopilot/engines/timeseries.py Adds series_in_rows mode and time column validation.
featcopilot/engines/relational.py Dedupes relationships and validates relationship keys in fit/transform.
featcopilot/__init__.py Falls back to 0+unknown version when metadata is missing.
examples/time_aware_tabular_prototype.py Adds a time-aware, leakage-safe workflow example.
docs/examples/time-aware-tabular.md Documents the time-aware prototype + leakage guard usage.
docs/examples/relational-feature-engineering.md Documents relational engine usage + guardrails.
benchmarks/use_cases/run_auto_feature_engineering_benchmark.py Adds benchmark runner to generate the use-case report.
benchmarks/use_cases/README.md Documents how to run the use-case benchmark.
benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md Adds generated benchmark report output.
benchmarks/compare_tools/run_fe_tools_comparison.py Adds split_benchmark_data() helper (currently not yet wired in).
PR_SUMMARY_feat_strengthen_core_api.md Adds a branch/PR summary document.


Comment thread featcopilot/utils/validation.py Outdated
Comment thread featcopilot/utils/validation.py Outdated
Comment thread featcopilot/transformers/sklearn_compat.py
Comment thread benchmarks/compare_tools/run_fe_tools_comparison.py Outdated
@codecov-commenter

codecov-commenter commented Mar 18, 2026

Codecov Report

❌ Patch coverage is 99.22481% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 88.18%. Comparing base (2eb9c6b) to head (0e329f1).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
featcopilot/engines/timeseries.py 92.30% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main       #2      +/-   ##
==========================================
+ Coverage   87.84%   88.18%   +0.33%     
==========================================
  Files          35       36       +1     
  Lines        3702     3808     +106     
==========================================
+ Hits         3252     3358     +106     
  Misses        450      450              
Flag Coverage Δ
unittests 88.18% <99.22%> (+0.33%) ⬆️


@thinkall
Owner Author

@copilot , address all the comments.

Copilot AI and others added 6 commits April 28, 2026 05:26
Agent-Logs-Url: https://github.com/thinkall/featcopilot/sessions/81d365b5-07ae-4e09-aa5a-8508c9323f42

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Contributor

Copilot AI commented Apr 28, 2026

@copilot , address all the comments.

Addressed all review comments across the follow-up commits, ending with b1a3275. Pre-commit and targeted pytest validation pass locally.

thinkall and others added 2 commits April 29, 2026 16:52
Address remaining PR #2 review comments:

- Move split_benchmark_data from compare_tools into a shared
  benchmarks/splits.py module so both the existing comparison
  benchmark and the new use-case benchmark route their splits
  through the same task-aware policy.
- Wire benchmarks/use_cases/run_auto_feature_engineering_benchmark.py
  to call split_benchmark_data instead of train_test_split directly,
  matching the PR summary's intent.
- Add target_name to the AutoFeatureEngineer class docstring under
  an "Other Parameters" section so it surfaces in help() and generated
  docs alongside the constructor params.
- Add tests/test_benchmark_splits.py covering both the helper's
  behavior (chronological vs stratified vs random) and a wiring
  regression check that benchmark scripts use the helper instead of
  calling train_test_split directly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Python 3.9 CI job failed when collecting tests/test_benchmark_splits.py
because importing benchmarks.splits triggers benchmarks/__init__.py, which
eagerly imports benchmarks.datasets. That module uses PEP 604 union syntax
(`str | None`) at function-definition time, which is only supported at
runtime on Python 3.10+.

Add `from __future__ import annotations` to benchmarks/datasets.py so the
annotations become lazy strings and the module imports cleanly under Python
3.9. Verified locally with a fresh Python 3.9 environment: the full test
suite (647 passed, 2 skipped) and the new benchmark split tests both pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
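
The effect of the future import described in this commit can be observed directly. The snippet below is a small sketch: it compiles a throwaway module from a string (the `load` function is a made-up example, not code from benchmarks/datasets.py) and inspects how the annotation is stored.

```python
source_with_future = (
    "from __future__ import annotations\n"
    "def load(name: str | None = None):\n"
    "    return name\n"
)

namespace = {}
exec(compile(source_with_future, "<datasets_sketch>", "exec"), namespace)

# With the future import, annotations are stored as plain strings and are not
# evaluated at definition time, so `str | None` cannot raise TypeError on 3.9.
annotation = namespace["load"].__annotations__["name"]
```

Without the future import, Python 3.9 evaluates `str | None` when the function is defined and fails because `type` objects do not support `|` before 3.10.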

Copilot AI left a comment


Pull request overview

Adds a new “auto feature engineering” use-case benchmark + report, while strengthening core API reliability and benchmark realism (split policy, preprocessing hardening, and safer imports).

Changes:

  • Introduce benchmarks.splits.split_benchmark_data() and wire it into benchmark scripts with tests to prevent regressions.
  • Harden core library behavior: source-checkout import version fallback, early config validation + leakage-guard in AutoFeatureEngineer, plus updates to TimeSeriesEngine and RelationalEngine.
  • Add a focused use-case benchmark script/report and new documentation examples (time-aware tabular + relational FE).

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/test_utils.py Adds unit coverage for leakage-column detection on non-string column labels.
tests/test_sklearn_compat.py Adds regression test for featcopilot import when package metadata is missing.
tests/test_engines.py Adds coverage for TimeSeriesEngine.series_in_rows and relational relationship validation errors.
tests/test_benchmark_splits.py New behavioral + wiring tests ensuring benchmark scripts use shared split helper.
tests/test_autofeat.py Adds tests for leakage guard modes and early configuration validation.
mkdocs.yml Adds nav entries for new documentation examples.
featcopilot/utils/validation.py New leakage-column detection helper.
featcopilot/utils/__init__.py Exposes find_potential_leakage_columns in utils public API.
featcopilot/transformers/sklearn_compat.py Adds leakage guard, config validation, and forwards target_name + transform kwargs in fit_transform.
featcopilot/engines/timeseries.py Adds series_in_rows mode and validates time_column.
featcopilot/engines/relational.py Dedupes relationships and validates relationship keys at fit/transform time.
featcopilot/__init__.py Falls back to 0+unknown when distribution metadata is unavailable.
examples/time_aware_tabular_prototype.py New end-to-end time-aware workflow example (temporal split + leakage-safe FE).
docs/examples/time-aware-tabular.md Documents the time-aware prototype + leakage guard behavior.
docs/examples/relational-feature-engineering.md Documents relational engine usage + new guardrails.
benchmarks/use_cases/run_auto_feature_engineering_benchmark.py New use-case benchmark runner comparing baseline vs FeatCopilot vs optional tools.
benchmarks/use_cases/README.md Documents how to run the new use-case benchmark.
benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md Adds a committed benchmark report artifact.
benchmarks/splits.py New shared benchmark split policy implementation.
benchmarks/datasets.py Adds __future__ annotations import.
benchmarks/compare_tools/run_fe_tools_comparison.py Switches to shared split helper.
PR_SUMMARY_feat_strengthen_core_api.md Adds a branch/PR summary document for reviewers.


Comment thread featcopilot/transformers/sklearn_compat.py Outdated
Comment thread benchmarks/use_cases/run_auto_feature_engineering_benchmark.py Outdated
Comment thread benchmarks/use_cases/run_auto_feature_engineering_benchmark.py
Comment thread benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md Outdated
thinkall and others added 2 commits April 30, 2026 15:49
The PR summary belongs in the GitHub PR description, not committed to
the repository. Its content has been moved to PR #2's body so the file
no longer needs to live in source control.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Address the latest PR review feedback on the strengthen-core-api branch:

* AutoFeatureEngineer (sklearn_compat):
  - Default collection-valued constructor args (`engines`,
    `selection_methods`, `llm_config`) using `is not None` instead of
    truthiness so explicit empty containers and identity-bearing
    arguments are preserved. This is also what sklearn's `clone()`
    round-trip identity check requires; previously cloning the
    estimator raised `RuntimeError: Cannot clone object ... constructor
    either does not set or modifies parameter llm_config`.
  - Make `set_params` tolerate `None` for the same parameters (common in
    `GridSearchCV` parameter grids) by normalizing to defaults before
    `_validate_configuration` is called, so callers no longer hit
    `TypeError: 'NoneType' object is not iterable` from `set(None)`.
  - Add tests covering the None-normalization in `set_params`,
    validation of non-None values, and the sklearn `clone` round trip.

* Auto feature engineering use-case benchmark:
  - In the Featuretools case, drop the assignment from
    `train_copy.ww.init(...)`. `Accessor.init` mutates in place and
    returns `None`, so the previous code clobbered the DataFrame and
    later raised `'NoneType' object has no attribute 'columns'` when
    building the EntitySet.
  - In the autofeat case, convert `y_train` to a numpy array via
    `np.asarray(y_train).ravel()` before calling `fit_transform`, so
    autofeat receives the 1-D ndarray it expects regardless of whether
    the caller passes a Series or array.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
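
The difference between truthiness-based and `is not None` defaulting is easiest to see side by side. The two classes and the check below are hypothetical illustrations; the check is a simplified version of the identity comparison that sklearn's `clone()` applies to constructor parameters, not sklearn's actual code.

```python
class TruthinessDefault:
    def __init__(self, llm_config=None):
        # Bug pattern: an explicit empty dict is falsy, so it is replaced by a new object.
        self.llm_config = llm_config or {}


class IdentityPreservingDefault:
    def __init__(self, llm_config=None):
        # Only substitute a default when the caller passed nothing at all.
        self.llm_config = llm_config if llm_config is not None else {}


def survives_clone_identity_check(cls):
    """Rebuild an instance from its own params and check the param object is kept."""
    original = cls(llm_config={})
    params = {"llm_config": original.llm_config}
    rebuilt = cls(**params)
    return rebuilt.llm_config is params["llm_config"]


truthiness_ok = survives_clone_identity_check(TruthinessDefault)
identity_ok = survives_clone_identity_check(IdentityPreservingDefault)
```

Here `truthiness_ok` is False because the constructor swaps the caller's empty dict for a fresh one, which is the kind of modification that makes cloning fail.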

Copilot AI left a comment


Pull request overview

This PR strengthens FeatCopilot’s sklearn-compatible API with earlier validation and a lightweight leakage guard, hardens the time-series/relational engines, and improves benchmark realism by centralizing task-aware splitting plus adding an auto feature engineering use-case benchmark + report.

Changes:

  • Add leakage detection utilities and wire a configurable leakage guard + early config validation into AutoFeatureEngineer.
  • Extend TimeSeriesEngine (row-wise series mode + clearer time_column validation) and harden RelationalEngine (relationship dedupe + eager key validation).
  • Introduce benchmarks/splits.py for task-aware splitting and add a focused auto-FE use-case benchmark with wiring tests.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
featcopilot/transformers/sklearn_compat.py Adds leakage guard, early config validation, forwards transform kwargs in fit_transform, expands sklearn-compat set_params.
featcopilot/utils/validation.py New helper to detect potential leakage-prone columns via keyword + fuzzy target matching.
featcopilot/utils/__init__.py Exposes find_potential_leakage_columns via utils public API.
featcopilot/engines/timeseries.py Adds series_in_rows mode and improves time_column validation/logging.
featcopilot/engines/relational.py Deduplicates relationships and validates relationship keys in fit/transform.
featcopilot/__init__.py Makes __version__ resilient to missing distribution metadata (source checkout import).
benchmarks/splits.py New shared split helper implementing chronological/stratified/random policies.
benchmarks/compare_tools/run_fe_tools_comparison.py Replaces ad-hoc train_test_split usage with split_benchmark_data.
benchmarks/datasets.py Adds from __future__ import annotations for Py3.9 typing compatibility.
benchmarks/use_cases/run_auto_feature_engineering_benchmark.py New interaction-heavy auto-FE benchmark script.
benchmarks/use_cases/README.md Documents how to run the use-case benchmark.
benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md Checked-in generated report for the new use-case benchmark.
tests/test_utils.py Adds tests for leakage detection with non-string column labels.
tests/test_autofeat.py Adds tests for leakage guard modes and early validation failures.
tests/test_engines.py Adds coverage for series_in_rows and relational relationship key validation.
tests/test_sklearn_compat.py Adds sklearn clone/set_params behavior tests and import-without-metadata regression test.
tests/test_benchmark_splits.py New tests for split helper behavior + wiring regression checks.
examples/time_aware_tabular_prototype.py New end-to-end example demonstrating time-aware splitting + leakage-safe workflow.
docs/examples/time-aware-tabular.md New docs page for the time-aware example and leakage guard usage.
docs/examples/relational-feature-engineering.md New docs page for relational feature engineering usage/guardrails.
mkdocs.yml Adds the new example docs pages to the navigation.


Comment thread featcopilot/transformers/sklearn_compat.py
Comment thread benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md Outdated
Comment thread benchmarks/use_cases/run_auto_feature_engineering_benchmark.py Outdated
Address the latest PR review comments on the strengthen-core-api branch:

* AutoFeatureEngineer.set_params:
  - Validate parameter keys against ``self.get_params(deep=True)`` and
    raise ``ValueError`` on unknown keys, matching scikit-learn's
    ``BaseEstimator.set_params`` convention. Previously typos like
    ``afe.set_params(typo_param=42)`` were silently accepted via
    ``setattr``, which masked bugs and broke tooling that expects
    sklearn-style validation. Documented the contract in the docstring
    and added regression tests covering the unknown-key error and that
    a failing call leaves the estimator unmutated.

* Auto feature engineering use-case benchmark:
  - Drop the redundant one-hot encoding/alignment in ``run_baseline``;
    ``evaluate_auc`` already owns that step (and applies it
    idempotently for the FE-tool runners that already produce numeric
    matrices). The baseline now passes raw frames straight through and
    reports ``n_features`` from the post-encoding column count so the
    metric still matches what the model actually trains on.
  - Sanitize ``±inf`` in ``align_and_fill`` (Featuretools'
    ``divide_numeric`` primitive can emit infinities on
    zero-denominator rows, which then crashes ``StandardScaler``). NaN
    handling is preserved.
  - Regenerate the committed report so its status text reflects what
    the current script actually does. With the upstream-bug fixes in
    the previous commit, the Featuretools case now succeeds end to
    end; the autofeat case still fails, but with the genuine current
    error (sklearn ≥ 1.6 dropped the ``force_all_finite`` kwarg that
    autofeat 2.1.x still passes) instead of the stale
    ``'Series' object has no attribute 'ravel'`` symptom.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
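
The infinity sanitization mentioned for `align_and_fill` can be illustrated with plain Python floats. This is a sketch under assumptions: the helper names are invented, the real function presumably operates on DataFrames, and `safe_ratio` only imitates what a divide-style primitive may emit on zero-denominator rows.

```python
import math


def replace_infinities(values, replacement=math.nan):
    """Map +inf and -inf to `replacement`, leaving finite values and NaN untouched."""
    return [replacement if isinstance(v, float) and math.isinf(v) else v for v in values]


def safe_ratio(numerator, denominator):
    # Imitates a division feature: x/0 gives a signed infinity, 0/0 gives NaN.
    if denominator == 0:
        return math.copysign(math.inf, numerator) if numerator else math.nan
    return numerator / denominator


ratios = [safe_ratio(n, d) for n, d in [(1.0, 2.0), (3.0, 0.0), (-3.0, 0.0), (0.0, 0.0)]]
cleaned = replace_infinities(ratios, replacement=0.0)
```

Handling infinities separately from NaN leaves the existing NaN policy unchanged while still keeping non-finite values away from scalers that reject them.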

Copilot AI left a comment


Pull request overview

This PR strengthens FeatCopilot’s sklearn-compatible API (earlier validation, leakage guard, safer imports), hardens time-series/relational engines, and improves benchmark realism by centralizing task-aware splitting plus adding a focused auto-FE use-case benchmark and report.

Changes:

  • Add leakage detection utilities + AutoFeatureEngineer leakage guard, early config validation, and improved sklearn set_params behavior.
  • Harden TimeSeriesEngine (row-wise series mode + clearer time_column validation) and RelationalEngine (relationship dedupe + eager key validation).
  • Introduce shared benchmarks.splits.split_benchmark_data() and wire benchmark scripts through it; add new use-case benchmark, docs, and tests.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
featcopilot/transformers/sklearn_compat.py Adds leakage guard, early validation, target_name, forwards kwargs in fit_transform, and strengthens set_params.
featcopilot/utils/validation.py New find_potential_leakage_columns() helper for fuzzy leakage column detection.
featcopilot/utils/__init__.py Exposes leakage detection helper via featcopilot.utils.
featcopilot/engines/timeseries.py Adds series_in_rows mode and time_column validation; refactors transform path.
featcopilot/engines/relational.py Deduplicates relationships and validates relationship keys in fit/transform.
featcopilot/__init__.py Makes __version__ import robust when package metadata is missing.
benchmarks/splits.py New task-aware split helper (chronological vs stratified vs random).
benchmarks/compare_tools/run_fe_tools_comparison.py Uses shared split helper instead of direct train_test_split.
benchmarks/use_cases/run_auto_feature_engineering_benchmark.py Adds new interaction-heavy auto-FE use-case benchmark and report generation.
benchmarks/use_cases/README.md Documents running the new use-case benchmark.
benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md Committed generated benchmark report.
benchmarks/datasets.py Adds from __future__ import annotations for Py3.9 typing compatibility.
tests/test_benchmark_splits.py Adds behavioral + wiring tests for shared split helper.
tests/test_sklearn_compat.py Adds sklearn set_params and import-metadata fallback regression coverage.
tests/test_utils.py Adds tests for leakage detection with non-string column labels.
tests/test_engines.py Adds tests for series_in_rows and relational key validation behavior.
tests/test_autofeat.py Adds tests for leakage guard modes and early config validation.
examples/time_aware_tabular_prototype.py Adds leakage-safe time-aware workflow example.
docs/examples/time-aware-tabular.md Documents time-aware tabular example and leakage guard.
docs/examples/relational-feature-engineering.md Documents relational engine usage and new guardrails.
mkdocs.yml Adds the new examples to the docs nav.


Comment thread featcopilot/transformers/sklearn_compat.py
Comment thread featcopilot/transformers/sklearn_compat.py Outdated
Address two PR #2 review findings on AutoFeatureEngineer:

1. ``fit`` previously appended into ``self._engine_instances`` and never
   cleared the prior dict or reset ``self._selector``. After changing
   ``engines`` via ``set_params`` (or after a previous ``fit_transform``
   that built a selector), ``transform`` could run stale engines or apply
   a stale selection. ``fit`` now calls a new ``_reset_fit_state`` helper
   that mirrors the fit-derived attribute initialization in ``__init__``
   (``_engine_instances``, ``_selector``, ``_feature_set``,
   ``_is_fitted``, ``_column_descriptions``, ``_task_description``)
   *before* doing any work, so a refit -- and even a fit that raises
   midway -- leaves the estimator in a clean, unfitted state instead of
   a partially-fitted one.

2. ``set_params`` mutated attributes via ``setattr`` and only then
   called ``_validate_configuration``. A failing call (e.g. an invalid
   ``engines``/``leakage_guard``/``max_features`` combination) left the
   estimator in a partially mutated, invalid state. The implementation
   now snapshots the pre-call values for every parameter in the request
   *before* applying any mutation (including the eventual ``None`` ->
   default normalization). On any exception, the snapshot is restored
   and the original error re-raised, so a failing ``set_params`` is a
   no-op from the caller's perspective.

Tests:
- test_set_params_invalid_value_rolls_back_state: combined invalid
  payload, every parameter restored.
- test_set_params_invalid_value_after_none_normalization_rolls_back:
  rollback restores the pre-call value rather than the
  None-normalized default.
- test_fit_resets_engine_instances_when_engines_change: removing an
  engine via set_params and refitting drops the stale fitted engine.
- test_fit_resets_selector_after_prior_fit_transform: a plain fit()
  following fit_transform() clears the selector so transform() does
  not re-apply the prior selection.
- test_fit_resets_state_when_called_after_failed_fit: a fit that
  raises mid-flight leaves _is_fitted=False, _engine_instances={}
  and transform() correctly errors out.

Full suite: 657 passed, 2 skipped (was 652, +5 new tests).
Pre-commit clean (black, ruff, trailing-whitespace, EOF, etc.).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
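
The snapshot-and-restore pattern from item 2 can be sketched on a toy estimator. `AtomicParams` is a hypothetical class written for this example; it covers only two parameters and omits the `None` normalization step that the real `set_params` performs.

```python
class AtomicParams:
    """Hypothetical estimator showing rollback-on-failure for set_params."""

    _VALID_MODES = {"off", "warn", "raise"}

    def __init__(self, leakage_guard="warn", max_features=None):
        self.leakage_guard = leakage_guard
        self.max_features = max_features
        self._validate_configuration()

    def get_params(self):
        return {"leakage_guard": self.leakage_guard, "max_features": self.max_features}

    def _validate_configuration(self):
        if self.leakage_guard not in self._VALID_MODES:
            raise ValueError(f"Invalid leakage_guard: {self.leakage_guard!r}")
        if self.max_features is not None and self.max_features <= 0:
            raise ValueError("max_features must be positive")

    def set_params(self, **params):
        unknown = sorted(set(params) - set(self.get_params()))
        if unknown:
            # Reject typos before touching any attribute.
            raise ValueError(f"Invalid parameter(s): {unknown}")
        snapshot = {name: getattr(self, name) for name in params}
        try:
            for name, value in params.items():
                setattr(self, name, value)
            self._validate_configuration()
        except Exception:
            # Restore every touched attribute, then surface the original error.
            for name, value in snapshot.items():
                setattr(self, name, value)
            raise
        return self


est = AtomicParams()
```

Calling `est.set_params(leakage_guard="raise", max_features=-1)` raises ValueError and leaves both attributes at their previous values, so a failed call is a no-op from the caller's perspective.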

Copilot AI left a comment


Pull request overview

This PR strengthens FeatCopilot’s sklearn-compatible API with earlier validation and leakage detection, hardens time-series/relational engines, and improves benchmark realism by centralizing task-aware split policy plus adding an interaction-heavy auto-feature-engineering use-case benchmark.

Changes:

  • Add leakage guard + early config validation to AutoFeatureEngineer, plus safer import-time version fallback.
  • Add split_benchmark_data() and wire benchmark scripts through it; add a new use-case benchmark + report.
  • Add engines hardening (TimeSeriesEngine.series_in_rows, relational relationship dedupe + key validation) and corresponding tests/docs/examples.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `tests/test_utils.py` | Adds tests for leakage detection with non-string column labels. |
| `tests/test_sklearn_compat.py` | Adds sklearn-compat regression tests for set_params behavior and package import fallback. |
| `tests/test_engines.py` | Adds coverage for series_in_rows and relational relationship key validation. |
| `tests/test_benchmark_splits.py` | Adds behavioral + wiring tests for the shared benchmark split helper. |
| `tests/test_autofeat.py` | Adds tests for leakage guard modes and early config validation failures. |
| `mkdocs.yml` | Wires new example docs pages into the MkDocs nav. |
| `featcopilot/utils/validation.py` | Introduces find_potential_leakage_columns() helper. |
| `featcopilot/utils/__init__.py` | Exposes find_potential_leakage_columns via featcopilot.utils. |
| `featcopilot/transformers/sklearn_compat.py` | Adds leakage guard, early config validation, target_name, atomic set_params, and fit-state resets. |
| `featcopilot/engines/timeseries.py` | Adds series_in_rows mode and clearer time_column validation/logging. |
| `featcopilot/engines/relational.py` | Dedupes relationships and validates relationship keys on fit/transform. |
| `featcopilot/__init__.py` | Makes `__version__` robust to missing installed package metadata. |
| `examples/time_aware_tabular_prototype.py` | Adds an end-to-end time-aware, leakage-safer workflow example. |
| `docs/examples/time-aware-tabular.md` | Documents the time-aware tabular workflow + leakage guard behavior. |
| `docs/examples/relational-feature-engineering.md` | Documents relational engine usage + new guardrails. |
| `benchmarks/use_cases/run_auto_feature_engineering_benchmark.py` | Adds the interaction-heavy AFE use-case benchmark runner + report writer. |
| `benchmarks/use_cases/README.md` | Documents how to run the new use-case benchmark. |
| `benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md` | Adds a generated benchmark report artifact. |
| `benchmarks/splits.py` | Introduces centralized task-aware train/test split helper for benchmarks. |
| `benchmarks/datasets.py` | Adds `from __future__ import annotations` for Py3.9 typing compatibility. |
| `benchmarks/compare_tools/run_fe_tools_comparison.py` | Switches comparison benchmark to use split_benchmark_data. |


Comment thread featcopilot/transformers/sklearn_compat.py
Comment thread benchmarks/splits.py
… test_size

Address two PR #2 review findings:

1. ``AutoFeatureEngineer._validate_configuration`` previously used
   ``sorted(set(self.engines) - SUPPORTED_ENGINES)`` (and the analogous
   call for ``selection_methods``) to build its error messages. If the
   caller passed a mix of non-string values (e.g.
   ``engines=[None, "spaceship"]``), ``sorted`` raised a confusing
   ``TypeError`` from the comparison rather than the intended
   ``ValueError``. The validator now rejects non-string entries up front
   with a clear ``ValueError`` that names the offending values and
   lists the supported names. ``set_params`` inherits the new check
   (and rolls back on it) because validation runs through the same code
   path.

2. ``benchmarks.splits.split_benchmark_data`` did not validate
   ``test_size`` for the chronological branch. Values like
   ``test_size <= 0`` or ``>= 1`` silently produced empty / overlapping
   splits, whereas the random branch (via ``train_test_split``) would
   have raised. The function now asserts ``0 < test_size < 1`` up front
   and additionally raises when the chronological ``split_idx`` would
   leave either side of the split empty (e.g. tiny datasets combined
   with an extreme ``test_size``), matching the fail-fast behavior of
   ``sklearn.model_selection.train_test_split``.
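
The first finding can be reproduced in a few lines. This is a minimal sketch, not the project's code: the contents of `SUPPORTED_ENGINES` and the helper name `validate_names` are assumptions made for illustration; only the `sorted(set(...) - SUPPORTED_ENGINES)` pattern and the `[None, "spaceship"]` input come from the description above.

```python
# Assumed contents, for illustration only; the real set may differ.
SUPPORTED_ENGINES = {"tabular", "timeseries", "relational"}


def validate_names(values, supported, label):
    """Hypothetical stand-in for the engines/selection_methods checks."""
    # Reject non-string entries first so sorted() never compares mixed types.
    non_strings = [v for v in values if not isinstance(v, str)]
    if non_strings:
        raise ValueError(
            f"{label} must contain only strings; got {non_strings!r}. "
            f"Supported: {sorted(supported)}"
        )
    unknown = sorted(set(values) - supported)
    if unknown:
        raise ValueError(f"Unknown {label}: {unknown}. Supported: {sorted(supported)}")


# Old pattern: sorting a set that mixes None and str raises TypeError.
try:
    sorted(set([None, "spaceship"]) - SUPPORTED_ENGINES)
    old_error = None
except TypeError as exc:
    old_error = type(exc).__name__

# New pattern: the same input yields a ValueError naming the offending entry.
try:
    validate_names([None, "spaceship"], SUPPORTED_ENGINES, "engines")
    new_error = None
except ValueError as exc:
    new_error = str(exc)
```

Checking types before the set difference keeps the error type stable (`ValueError`) regardless of what the caller passes inside the list.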

Tests added in ``tests/test_sklearn_compat.py``:
- ``test_validate_engines_rejects_non_string_entries``
- ``test_validate_selection_methods_rejects_non_string_entries``
- ``test_set_params_rejects_non_string_engine_entries_and_rolls_back``

Tests added in ``tests/test_benchmark_splits.py``:
- ``test_split_benchmark_data_rejects_out_of_range_test_size``
  (parametrized over both random and chronological branches with 0.0,
  1.0, -0.1, 1.5, 2)
- ``test_split_benchmark_data_chronological_rejects_empty_train_split``
- ``test_split_benchmark_data_chronological_single_row_dataset_raises``
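
The chronological-branch checks from the second finding can be sketched as follows. The helper name `chronological_split_index` and the way `split_idx` is computed are assumptions; the actual `split_benchmark_data` may derive the index differently. Only the two rules (`0 < test_size < 1`, and neither side of the split may be empty) are taken from the description above.

```python
def chronological_split_index(n_rows, test_size):
    """Hypothetical helper: return the row index where the test split begins."""
    # Rule 1: test_size must be a proper fraction.
    if not 0 < test_size < 1:
        raise ValueError(f"test_size must satisfy 0 < test_size < 1; got {test_size!r}")

    # Assumed formula: the first (1 - test_size) share of rows is the train split.
    split_idx = int(n_rows * (1 - test_size))

    # Rule 2: fail fast if either side of the split would be empty.
    if split_idx <= 0 or split_idx >= n_rows:
        raise ValueError(
            f"test_size={test_size!r} leaves an empty train or test split "
            f"for {n_rows} rows"
        )
    return split_idx
```

With this shape, a single-row dataset fails for every valid `test_size`, which is the case the last test in the list above pins down.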

Full suite: 667 passed, 2 skipped (was 657, +10 new).
Pre-commit clean (black, ruff, trailing-whitespace, EOF, etc.).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

This PR strengthens FeatCopilot’s sklearn-compatible API with earlier validation and lightweight leakage detection, hardens the time-series and relational engines, and improves benchmark realism by centralizing task-aware train/test split behavior and adding a focused auto-FE use-case benchmark/report.

Changes:

  • Add leakage detection utilities and wire a configurable leakage guard into AutoFeatureEngineer (plus early config validation + more sklearn-compatible set_params behavior).
  • Harden TimeSeriesEngine (row-wise series mode + clearer time_column validation) and RelationalEngine (relationship dedupe + eager key validation).
  • Centralize benchmark split policy in benchmarks/splits.py, wire benchmark scripts to it, and add a new auto-FE use-case benchmark + docs/examples/tests.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `tests/test_utils.py` | Adds unit coverage for leakage detection with non-string column labels and fuzzy target matching. |
| `tests/test_sklearn_compat.py` | Expands sklearn-compat tests for set_params atomicity/validation, cloning behavior, state reset semantics, and metadata-less import. |
| `tests/test_engines.py` | Adds coverage for TimeSeriesEngine.series_in_rows and RelationalEngine relationship key validation. |
| `tests/test_benchmark_splits.py` | Adds behavioral + wiring tests for the shared benchmark split helper and ensures scripts don’t call train_test_split directly. |
| `tests/test_autofeat.py` | Adds coverage for leakage guard modes and early config validation failures. |
| `mkdocs.yml` | Adds new example pages to the documentation nav. |
| `featcopilot/utils/validation.py` | Introduces find_potential_leakage_columns() helper for leakage-prone column name detection. |
| `featcopilot/utils/__init__.py` | Exposes the new leakage helper via featcopilot.utils. |
| `featcopilot/transformers/sklearn_compat.py` | Adds leakage guard + target_name, early config validation, fit-state reset, improved fit_transform kwarg forwarding, and robust/atomic set_params. |
| `featcopilot/engines/timeseries.py` | Adds series_in_rows mode and more explicit time_column validation/handling. |
| `featcopilot/engines/relational.py` | Deduplicates relationships and validates relationship keys early (fit/transform). |
| `featcopilot/__init__.py` | Makes `__version__` robust to missing installed package metadata by falling back to 0+unknown. |
| `examples/time_aware_tabular_prototype.py` | Adds an end-to-end time-aware (chronological split) leakage-safer example workflow. |
| `docs/examples/time-aware-tabular.md` | Documents the time-aware tabular example and leakage guard behavior. |
| `docs/examples/relational-feature-engineering.md` | Adds a relational FE walkthrough and explains new relational guardrails. |
| `benchmarks/use_cases/run_auto_feature_engineering_benchmark.py` | Adds a focused interaction-heavy auto-FE use-case benchmark and report generator. |
| `benchmarks/use_cases/README.md` | Documents how to run the new use-case benchmark. |
| `benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md` | Commits a generated report for the new use-case benchmark. |
| `benchmarks/splits.py` | Adds shared task-aware split helper (chronological for forecasting/timeseries, stratified for classification when possible). |
| `benchmarks/datasets.py` | Adds `from __future__ import annotations` for Python 3.9-friendly type syntax. |
| `benchmarks/compare_tools/run_fe_tools_comparison.py` | Replaces direct train_test_split usage with the shared split helper. |


Comment thread featcopilot/utils/validation.py Outdated
Comment thread featcopilot/transformers/sklearn_compat.py
Address two PR #2 review findings:

1. ``find_potential_leakage_columns`` (``featcopilot/utils/validation.py``)
   - ``keywords = keywords or DEFAULT_LEAKAGE_KEYWORDS`` prevented callers
     from intentionally passing ``keywords=[]`` to disable keyword
     matching. Switched to an explicit ``is None`` check so an empty
     list is honored.
   - ``normalized_target = ... if target_name else None`` had two bugs:
     valid falsy targets (e.g. ``0``) were treated as "no target", and
     ``target_name=""`` would normalize to the empty string, which
     matches every column via the ``normalized_target in normalized_column``
     substring check. Now: explicit ``is None`` check, and an empty
     normalized result is treated as absent so that ``target_name=""``
     (or values like ``"---"`` that strip to nothing) no longer matches
     every column, while ``target_name=0`` correctly drives matching.

2. ``AutoFeatureEngineer`` (``featcopilot/transformers/sklearn_compat.py``)
   - With the recent ``is not None`` defaulting in ``__init__``, an
     explicit ``engines=[]`` was preserved. ``fit()`` would then run
     zero engines and still mark the estimator fitted, making
     ``transform()`` a silent no-op. ``_validate_configuration`` now
     rejects empty ``engines`` and empty ``selection_methods`` with a
     clear ``ValueError`` that points callers at ``None`` (for the
     default) or the supported names. ``set_params`` inherits this
     check (and the atomic rollback) automatically.
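
The `is None` handling in the first finding can be sketched like this. The default keyword values, the normalization rule, and the helper names `_normalize` and `resolve_inputs` are assumptions for illustration; the description above only establishes that labels are stringified and that values such as `"---"` normalize to nothing.

```python
import re

# Assumed defaults, for illustration only.
DEFAULT_LEAKAGE_KEYWORDS = ("target", "label", "future")


def _normalize(label):
    # Assumed rule: stringify, lowercase, and drop non-alphanumeric characters.
    return re.sub(r"[^a-z0-9]", "", str(label).lower())


def resolve_inputs(keywords=None, target_name=None):
    """Hypothetical helper showing the explicit `is None` checks."""
    # `keywords or DEFAULT` would discard an intentional empty list.
    if keywords is None:
        keywords = DEFAULT_LEAKAGE_KEYWORDS

    # `if target_name` would treat the valid label 0 as "no target".
    normalized_target = None if target_name is None else _normalize(target_name)

    # An empty normalized target would match every column as a substring,
    # so it is treated as absent.
    if not normalized_target:
        normalized_target = None
    return list(keywords), normalized_target
```

The two checks are deliberately different: `is None` decides whether the caller supplied a value at all, while the emptiness test runs on the normalized result.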

Tests added in ``tests/test_utils.py``:
- ``test_leakage_detection_empty_keywords_disables_keyword_matching``
- ``test_leakage_detection_falsy_target_name_zero_still_matches``
- ``test_leakage_detection_empty_string_target_does_not_match_everything``

Tests added in ``tests/test_sklearn_compat.py``:
- ``test_init_rejects_empty_engines_list``
- ``test_init_rejects_empty_selection_methods_list``
- ``test_set_params_rejects_empty_engines_and_rolls_back``
- ``test_init_engines_none_still_defaults_to_tabular``

Full suite: 674 passed, 2 skipped (was 667, +7).
Pre-commit clean (black, ruff, trailing-whitespace, EOF, etc.).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@thinkall thinkall requested a review from Copilot April 30, 2026 14:53

Copilot AI left a comment

Pull request overview

This PR strengthens FeatCopilot’s sklearn-compatible AutoFeatureEngineer with earlier/safer validation and a lightweight leakage guard, hardens the time-series/relational engines, and improves benchmark realism by centralizing task-aware splitting plus adding an interaction-heavy auto feature engineering use-case benchmark + report.

Changes:

  • Add leakage detection utilities and wire a configurable leakage guard into AutoFeatureEngineer with early config validation + improved sklearn set_params behavior.
  • Enhance TimeSeriesEngine (row-wise series mode + time column validation) and RelationalEngine (relationship de-dupe + eager key validation).
  • Introduce benchmarks.splits.split_benchmark_data() and update benchmark scripts/tests/docs/examples, plus add a new use-case benchmark and report.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `featcopilot/transformers/sklearn_compat.py` | Adds leakage guard + early validation, resets fit state, forwards transform kwargs in fit_transform, hardens set_params. |
| `featcopilot/utils/validation.py` | New find_potential_leakage_columns() helper for fuzzy leakage-prone column detection. |
| `featcopilot/utils/__init__.py` | Exposes find_potential_leakage_columns via featcopilot.utils. |
| `featcopilot/engines/timeseries.py` | Adds series_in_rows mode and validates time_column earlier; updates column selection behavior. |
| `featcopilot/engines/relational.py` | Deduplicates relationships and validates child/parent keys during fit/transform. |
| `featcopilot/__init__.py` | Makes `__version__` robust to missing package metadata (0+unknown fallback). |
| `benchmarks/splits.py` | Adds centralized task-aware split helper (chronological/stratified/random) with validation. |
| `benchmarks/compare_tools/run_fe_tools_comparison.py` | Routes data splitting through split_benchmark_data instead of direct train_test_split. |
| `benchmarks/use_cases/run_auto_feature_engineering_benchmark.py` | Adds focused interaction-heavy use-case benchmark runner + report writer. |
| `benchmarks/use_cases/README.md` | Documents how to run the new use-case benchmark and outputs. |
| `benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md` | Adds generated benchmark report output. |
| `benchmarks/datasets.py` | Enables `from __future__ import annotations` for newer typing syntax under Python 3.9. |
| `tests/test_benchmark_splits.py` | New tests covering split helper behavior + wiring assertions for benchmark scripts. |
| `tests/test_utils.py` | Adds coverage for leakage detection edge cases (non-strings, empty keywords, falsy targets). |
| `tests/test_autofeat.py` | Adds tests for leakage guard modes and early invalid-config failures. |
| `tests/test_engines.py` | Adds tests for series_in_rows and relational relationship validation behavior. |
| `tests/test_sklearn_compat.py` | Adds sklearn-compat regression tests (atomic set_params, clone behavior, import fallback). |
| `mkdocs.yml` | Adds navigation entries for the new example docs pages. |
| `docs/examples/time-aware-tabular.md` | Adds time-aware tabular workflow example and leakage guard explanation. |
| `docs/examples/relational-feature-engineering.md` | Adds relational engine usage example + guardrails description. |
| `examples/time_aware_tabular_prototype.py` | Adds an end-to-end leakage-safe temporal-split prototype script. |


Comment thread featcopilot/transformers/sklearn_compat.py
Comment thread featcopilot/transformers/sklearn_compat.py
…rget_name typing

Address two PR #2 review findings:

1. ``_validate_configuration`` previously assumed ``self.engines`` and
   ``self.selection_methods`` were iterable sequences. A bare ``str``
   (e.g. ``engines="tabular"``) would silently iterate
   character-by-character through the downstream non-string-entry /
   set-diff checks and surface as a confusing
   ``Unknown engines: ['a', 'b', 'l', 'r', 't', 'u']`` error, while a
   non-iterable value (e.g. ``engines=5``) raised an unrelated
   ``TypeError`` from ``set(self.engines)``. The validator now rejects
   anything that is not a ``list`` or ``tuple`` up front with a clear
   ``ValueError`` that names the offending type and value, and runs
   *before* every other engines/methods check so the rest of the
   validator can rely on a well-formed container. ``set_params``
   inherits this guard (and its atomic rollback) automatically.

2. ``fit`` and ``fit_transform`` annotated ``target_name`` as
   ``Optional[str]``, but ``find_potential_leakage_columns`` is
   explicitly designed to accept any column-label type DataFrames
   support (the helper normalizes labels via ``str(...)`` and is
   covered by tests with integer labels). The annotation has been
   widened to ``Optional[Any]`` on both methods, the matching
   docstring entries now read ``hashable, optional`` and call out
   that non-string labels are accepted, and the class-level
   ``Other Parameters`` block was updated for consistency.
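
The character-by-character failure in the first finding is easy to reproduce. In this sketch the contents of `SUPPORTED_ENGINES` and the helper name `check_container` are assumptions; the `list`/`tuple` rule and the resulting letters are taken from the description above.

```python
# Assumed contents, for illustration only.
SUPPORTED_ENGINES = {"tabular", "timeseries", "relational"}

# A bare string is iterable, so set() silently splits it into characters.
chars = sorted(set("tabular") - SUPPORTED_ENGINES)


def check_container(value, label):
    """Hypothetical guard that runs before any other engines/methods check."""
    if not isinstance(value, (list, tuple)):
        raise ValueError(
            f"{label} must be a list or tuple of names; "
            f"got {type(value).__name__}: {value!r}"
        )
    return value
```

Here `chars` holds the six distinct letters of `"tabular"`, which is exactly the list that surfaced in the old `Unknown engines` message. Checking against `(list, tuple)` rather than a generic iterable is what excludes `str` and `dict`, both of which iterate without error.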

Tests added in ``tests/test_sklearn_compat.py``:
- ``test_init_rejects_string_engines_argument``
- ``test_init_rejects_non_sequence_engines_argument`` (covers ``int``
  and ``dict``)
- ``test_init_rejects_string_selection_methods_argument``
- ``test_init_rejects_non_sequence_selection_methods_argument``
- ``test_init_accepts_tuple_engines`` (pins tuple support so future
  tightenings don't accidentally break it)
- ``test_set_params_rejects_string_engines_and_rolls_back`` (also
  verifies prior ``max_features`` is restored)
- ``test_fit_accepts_non_string_target_name`` (integer column label
  ``0`` honored as a target by the leakage guard)

Full suite: 681 passed, 2 skipped (was 674, +7).
Pre-commit clean (black, ruff, trailing-whitespace, EOF, etc.).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

This PR strengthens FeatCopilot’s sklearn-compatible AutoFeatureEngineer with earlier/safer validation + leakage guarding, hardens the time-series and relational engines, and improves benchmark realism by centralizing task-aware splitting and adding an interaction-heavy auto-FE use-case benchmark/report.

Changes:

  • Added leakage detection utilities and integrated a configurable leakage guard + early config validation into AutoFeatureEngineer (including more sklearn-compatible set_params behavior).
  • Hardened engines: TimeSeriesEngine adds series_in_rows mode and stricter time_column validation; RelationalEngine dedupes relationships and validates keys early.
  • Benchmarks now share a split helper (split_benchmark_data) and include a new auto feature engineering use-case benchmark + generated report.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `featcopilot/transformers/sklearn_compat.py` | Adds leakage guard + target-aware leakage checking, early configuration validation, fit state reset, and more sklearn-compatible set_params. |
| `featcopilot/utils/validation.py` | Introduces find_potential_leakage_columns() helper for leakage-prone column name detection. |
| `featcopilot/utils/__init__.py` | Exposes find_potential_leakage_columns via featcopilot.utils. |
| `featcopilot/engines/timeseries.py` | Adds series_in_rows mode and validates time_column presence; updates transform logic accordingly. |
| `featcopilot/engines/relational.py` | Dedupes relationships and adds eager relationship/key validation in fit and transform. |
| `featcopilot/__init__.py` | Makes package import robust by falling back to `__version__ = "0+unknown"` when metadata is missing. |
| `benchmarks/splits.py` | Centralizes benchmark split policy (chronological for forecasting/timeseries, stratified for classification when possible). |
| `benchmarks/compare_tools/run_fe_tools_comparison.py` | Switches benchmark splitting to the shared split_benchmark_data helper. |
| `benchmarks/use_cases/run_auto_feature_engineering_benchmark.py` | Adds a focused interaction-heavy use-case benchmark comparing FeatCopilot to baseline/tools. |
| `benchmarks/use_cases/README.md` | Documents how to run the new use-case benchmark and expected outputs. |
| `benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md` | Checked-in generated benchmark report. |
| `benchmarks/datasets.py` | Adds `from __future__ import annotations` to support Python 3.9 type syntax usage. |
| `tests/test_utils.py` | Adds coverage for leakage detection behavior across edge cases (non-string columns, empty keywords, falsy target). |
| `tests/test_sklearn_compat.py` | Adds extensive regression coverage for sklearn estimator behavior, validation, rollback, and import fallback. |
| `tests/test_engines.py` | Adds coverage for series_in_rows timeseries mode and relational relationship validation. |
| `tests/test_autofeat.py` | Adds coverage for leakage guard modes and early config validation failures. |
| `tests/test_benchmark_splits.py` | Adds behavioral + wiring tests enforcing benchmark scripts use split_benchmark_data. |
| `examples/time_aware_tabular_prototype.py` | Adds an end-to-end leakage-safe time-aware workflow example. |
| `docs/examples/time-aware-tabular.md` | Documents the time-aware tabular example and leakage guard usage. |
| `docs/examples/relational-feature-engineering.md` | Documents relational feature engineering usage and new guardrails. |
| `mkdocs.yml` | Adds the new docs pages to the navigation. |


Comment thread featcopilot/utils/validation.py
…s to empty string

Address PR #2 review finding: ``find_potential_leakage_columns`` already
guarded the *target* side of the empty-substring trap (treating
``target_name=""`` / ``"---"`` as absent), but the *column* side could
still trigger it. If a column label normalized to an empty string, the
``normalized_column in normalized_target`` substring check evaluated
``True`` (because ``"" in "label"`` is ``True``), so a column literally
named e.g. ``"---"`` or ``"!!!"`` would be flagged as leakage-prone
whenever any ``target_name`` was provided.

Fixed by skipping any column whose label normalizes to an empty string
entirely (``if not normalized_column: continue`` before both the
keyword and target match blocks). An empty normalized column has no
meaningful content to compare against, so neither side of the match
should run for it.
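
The column-side guard can be sketched as below. The normalization rule and the helper names `_normalize` and `target_matches` are assumptions; the two-way substring check and the `if not normalized_column: continue` guard follow the description above. Keyword matching is left out to keep the sketch short.

```python
import re


def _normalize(label):
    # Assumed rule: stringify, lowercase, and drop non-alphanumeric characters.
    return re.sub(r"[^a-z0-9]", "", str(label).lower())


def target_matches(columns, target_name):
    """Hypothetical sketch of the target-match block with the column-side guard."""
    normalized_target = _normalize(target_name)
    if not normalized_target:
        # Target-side guard: an empty target is treated as absent.
        return []

    flagged = []
    for column in columns:
        normalized_column = _normalize(column)
        # Column-side guard: "" is a substring of every string, so a label
        # that normalizes to "" would otherwise always be flagged.
        if not normalized_column:
            continue
        if normalized_target in normalized_column or normalized_column in normalized_target:
            flagged.append(column)
    return flagged
```

The trap exists because Python defines the empty string as a substring of every string, so either side of a two-way `in` check needs its own emptiness guard.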

Tests added in ``tests/test_utils.py``:
- ``test_leakage_detection_columns_normalizing_to_empty_string_are_skipped``
  covers:
  - ``["---", "!!!"]`` with ``target_name="label"`` returns ``[]``.
  - ``["---", "label_x"]`` returns only ``["label_x"]``.
  - Mixing meaningful and empty-normalizing labels still reports only
    the meaningful ones.

Also expanded the docstring ``Notes`` block to document the symmetric
column-side guard alongside the existing target-side description.

Full suite: 682 passed, 2 skipped (was 681, +1).
Pre-commit clean (black, ruff, trailing-whitespace, EOF, etc.).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

This PR strengthens FeatCopilot’s sklearn-compatible AutoFeatureEngineer with earlier config validation and a lightweight leakage guard, hardens time-series/relational engines, and improves benchmark realism by introducing a shared task-aware split helper plus a focused auto feature engineering use-case benchmark/report.

Changes:

  • Add leakage detection utility and wire it into AutoFeatureEngineer with new leakage_guard + target_name support, plus safer/atomic set_params behavior.
  • Harden TimeSeriesEngine (row-wise series mode + clearer time_column validation) and RelationalEngine (relationship dedupe + eager key validation).
  • Introduce benchmarks.splits.split_benchmark_data() and wire benchmark scripts through it; add a new interaction-heavy use-case benchmark and report; update docs/examples and tests accordingly.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `featcopilot/transformers/sklearn_compat.py` | Early config validation, leakage guard, state reset on refit/failure, fit_transform() kwarg forwarding, atomic sklearn-style set_params. |
| `featcopilot/utils/validation.py` | New find_potential_leakage_columns() helper for fuzzy leakage-prone column detection. |
| `featcopilot/utils/__init__.py` | Exposes find_potential_leakage_columns via featcopilot.utils. |
| `featcopilot/engines/timeseries.py` | Adds series_in_rows mode and validates time_column earlier; adjusts transform behavior accordingly. |
| `featcopilot/engines/relational.py` | Dedupes relationships and validates relationship keys in fit() and transform(). |
| `featcopilot/__init__.py` | Makes package import resilient when distribution metadata is unavailable (`__version__` fallback). |
| `benchmarks/splits.py` | New shared task-aware split helper (chronological/stratified/random) with test_size validation. |
| `benchmarks/compare_tools/run_fe_tools_comparison.py` | Replaces direct train_test_split usage with split_benchmark_data. |
| `benchmarks/use_cases/run_auto_feature_engineering_benchmark.py` | New use-case benchmark script comparing tools on an interaction-heavy classification task. |
| `benchmarks/use_cases/README.md` | Documents how to run the new use-case benchmark. |
| `benchmarks/use_cases/AUTO_FEATURE_ENGINEERING_USE_CASE.md` | Committed generated report for the new use-case benchmark. |
| `benchmarks/datasets.py` | Enables `from __future__ import annotations` for Py3.9-friendly type hints. |
| `tests/test_utils.py` | Adds coverage for leakage detection edge cases (non-string labels, empty keywords, empty targets). |
| `tests/test_sklearn_compat.py` | Adds extensive sklearn-compat/regression tests (atomic set_params, clone behavior, refit resets, import fallback). |
| `tests/test_engines.py` | Adds tests for TimeSeriesEngine.series_in_rows and RelationalEngine key validation. |
| `tests/test_benchmark_splits.py` | New tests covering split helper behavior + regression wiring checks for benchmark scripts. |
| `tests/test_autofeat.py` | Adds coverage for leakage guard modes and early config validation errors. |
| `examples/time_aware_tabular_prototype.py` | New end-to-end time-aware/leakage-safe workflow example. |
| `docs/examples/time-aware-tabular.md` | Documentation for the time-aware tabular example and leakage guard usage. |
| `docs/examples/relational-feature-engineering.md` | Documentation for relational feature engineering usage + new guardrails. |
| `mkdocs.yml` | Adds the new examples to the documentation navigation. |


@thinkall thinkall merged commit fd30793 into main Apr 30, 2026
12 checks passed
@thinkall thinkall deleted the feat/strengthen-core-api branch April 30, 2026 15:29
thinkall added a commit that referenced this pull request May 3, 2026
Addresses all five review comments from Copilot and Codex on PR #5:

* explain now actually returns generated features
  (Copilot review #1, Codex P1).
  Built-in engines (e.g. tabular) populate `_feature_names` during
  `transform()`, not `fit()`. `_cmd_explain` now calls
  `fit_transform(..., apply_selection=False)` so the JSON payload
  contains the full `{name, explanation, code}` records the
  subcommand advertises. Test asserts `n_features > 0` for tabular.

* main(argv) -> int contract honored on parse errors
  (Copilot review #2).
  `argparse.parse_args` raises `SystemExit` for usage errors,
  `--help` and `--version`. `main` now traps those and returns
  the exit code so programmatic and agent callers always get an int.
  Tests cover `--version` (rc=0), `--help` (rc=0), no-subcommand
  (rc=2) and unknown-flag (rc=2).

* Real subprocess test for python -m featcopilot
  (Copilot review #3).
  `test_dunder_main_subprocess_invocation` and
  `test_dunder_main_subprocess_version_flag` spawn a real
  `python -m featcopilot ...` subprocess and assert stdout JSON,
  so a regression in `__main__.py` actually breaks the suite.

* Parquet `ImportError` -> clean exit 2 (Codex P2).
  `_read_table`/`_write_table` now wrap parquet calls and convert
  `ImportError` into a `ValueError` with a friendly install hint;
  the top-level handler routes that to the deterministic `exit 2`
  user-error path instead of the generic `exit 1` backstop.
  `test_transform_parquet_missing_engine_returns_exit_2` exercises
  this via `monkeypatch` of `DataFrame.to_parquet`.

* Pre-commit black: re-applied formatting from the pinned
  `black 24.1.1` hook (joined two long string raises) so the CI
  pre-commit job passes.
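
The `main(argv) -> int` contract from the second bullet can be sketched as follows. This is a simplified illustration, not the project's CLI: the parser layout, the version string, and the handling of non-integer exit codes are assumptions; only the `transform` and `explain` subcommand names and the idea of trapping `SystemExit` come from the description above.

```python
import argparse


def main(argv=None):
    """Parse arguments and always return an int exit code."""
    parser = argparse.ArgumentParser(prog="featcopilot")
    # Hypothetical version string, for illustration only.
    parser.add_argument("--version", action="version", version="featcopilot 0+unknown")
    subparsers = parser.add_subparsers(dest="command", required=True)
    subparsers.add_parser("transform")
    subparsers.add_parser("explain")

    try:
        parser.parse_args(argv)
    except SystemExit as exc:
        # argparse exits with 0 for --help/--version and 2 for usage errors.
        code = exc.code
        if isinstance(code, int):
            return code
        # Assumed fallback: None means success, anything else is a failure.
        return 0 if code is None else 1

    # A real CLI would dispatch on the parsed command here.
    return 0
```

Trapping `SystemExit` at this one boundary lets programmatic callers treat `main` as an ordinary function, while a `__main__.py` can still pass the returned code to `sys.exit`.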

Tests: 23 (+5 new) in tests/test_cli.py, 796 passed full suite.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
thinkall added a commit that referenced this pull request May 5, 2026
…ut cases

Round-25 reviewer feedback addressed:

1. **Copilot — empty-input branch conflated zero-row and zero-column.**
   `_read_table` raised the same "zero data rows" error for both
   `len(df) == 0` and `len(df.columns) == 0` (both make
   `DataFrame.empty` return `True`). A JSON `[{}, {}]` array
   produces a frame WITH rows but NO columns, a very different user
   error than a header-only CSV. Split into two distinct, accurately
   worded error messages: "no columns" vs "zero data rows".

2. **Copilot — read-time warnings leaked to stderr.**
   `_cmd_transform` only opened the capture context around
   `engineer.fit_transform`. `pd.read_csv` can legitimately emit
   `DtypeWarning` on mixed-type CSVs and parquet/JSON readers can
   emit pyarrow / pandas warnings on a successful read; those were
   bypassing the capture and bleeding to stderr, breaking the
   "stderr reserved for failures" contract.

3. **Copilot — write-time warnings also leaked to stderr.**
   `_write_table` was likewise outside the capture, so pandas /
   pyarrow `FutureWarning` / `UserWarning` from a successful
   write could leak. Same contract violation as #2.

The fix wraps the entire `_cmd_transform` pipeline (read +
build_engineer + fit_transform + write) in a single
`_capture_featcopilot_messages` block so warnings from ANY phase end
up in the JSON `warnings` field instead of stderr. The capture in
`_cmd_explain` is widened as well: `explain_features` / `get_feature_code`
now run inside the same capture as the read + fit_transform.

The helper `_fit_transform_capturing_warnings` is no longer used
internally. It is kept as a thin convenience wrapper for external test
code, with an updated docstring noting that the CLI now wraps a wider
region.
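
The single wide capture block can be illustrated with the standard library alone. This is a sketch under stated assumptions: `capture_warnings` and `run_pipeline` are hypothetical names, the real `_capture_featcopilot_messages` may also capture log records, and the JSON payload shape beyond the `warnings` field is invented for the example.

```python
import json
import warnings
from contextlib import contextmanager


@contextmanager
def capture_warnings(sink):
    """Record every warning raised inside the block into `sink`."""
    with warnings.catch_warnings(record=True) as caught:
        # Record all warnings, including repeats that are filtered by default.
        warnings.simplefilter("always")
        try:
            yield
        finally:
            sink.extend(str(w.message) for w in caught)


def run_pipeline(read, transform, write):
    """Hypothetical pipeline: read, transform and write share one capture."""
    captured = []
    with capture_warnings(captured):
        data = read()             # read-time warnings are captured
        result = transform(data)  # fit/transform warnings are captured
        write(result)             # write-time warnings are captured
    return json.dumps({"status": "ok", "warnings": captured})
```

Because `catch_warnings(record=True)` collects warnings in a list instead of printing them, nothing raised inside the block reaches stderr; widening the block to cover reads and writes is what closes the two leaks described above.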

New tests:
- `test_transform_zero_columns_input_distinguishes_from_zero_rows`:
  pins the column-vs-row error-message distinction.
- `test_transform_zero_rows_input_still_uses_zero_rows_message`:
  guards the existing zero-rows wording for the header-only case.
- `test_transform_read_warning_captured_not_on_stderr`: patches
  `pd.read_csv` to emit a warning, asserts `err == ""` and the
  warning lands in JSON `warnings`.
- `test_transform_write_warning_captured_not_on_stderr`: patches
  `DataFrame.to_csv` to emit a warning, same contract.
- `test_explain_features_warnings_captured_not_on_stderr`: patches
  `AutoFeatureEngineer.explain_features` to emit a warning, same
  contract for `_cmd_explain`.

906 tests pass locally (full suite); 133 in tests/test_cli.py.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>