
Conversation

@johnwalz97
Contributor

Pull Request Description

What and why?

Integration tests were failing with 403 errors when downloading the scikit-learn California housing dataset. This bundles the CSV in the repo to prevent that.

How to test

What needs special review?

Dependencies, breaking changes, and deployment notes

Release notes

Checklist

  • What and why
  • Screenshots or videos (Frontend)
  • How to test
  • What needs special review
  • Dependencies, breaking changes, and deployment notes
  • Labels applied
  • PR linked to Shortcut
  • Unit tests added (Backend)
  • Tested locally
  • Documentation updated (if required)
  • Environment variable additions/changes documented (if required)


github-actions bot commented Jan 7, 2026

Pull requests must include at least one of the required labels: internal (no release notes required), highlight, enhancement, bug, deprecation, documentation. Except for internal, pull requests must also include a description in the release notes section.


github-actions bot commented Jan 7, 2026

PR Summary

This pull request introduces significant enhancements to the California housing dataset module. The changes primarily focus on improving the data loading mechanism by:

  1. Introducing a source parameter in the load_data function that supports a 'bundled' data source as an alternative to the default sklearn fetch. The function first attempts to load the dataset from a bundled CSV file. If the file is absent, or its columns do not match the expected ones, it falls back to fetching the data via sklearn.

  2. Adding robust error handling in the helper function _load_from_sklearn to capture common issues such as HTTP 403 errors or network-related problems. Detailed error messages are provided to guide the user on potential resolutions, including instructions for manually downloading the dataset if necessary.

  3. Including a helper script generate_california_housing_csv.py that downloads the dataset (using the same fallback mechanisms) and saves it as a CSV file in the repository. This script assists in generating the bundled version of the dataset, ensuring that the repository can serve the dataset without always relying on an external download.

These changes aim to improve data reliability, user experience, and local caching of the dataset while providing clear diagnostic feedback when operations fail.
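The bundled-then-fallback flow described above could look roughly like the sketch below. This is a hypothetical reconstruction, not the PR's actual code: the column names follow sklearn's California housing frame, and the function names (load_data, _load_from_sklearn) and csv_path parameter are taken from or inferred from the summary.

```python
# Hypothetical sketch of the bundled-CSV-with-sklearn-fallback loading logic.
import csv
import os

# Column names as they appear in sklearn's fetch_california_housing(as_frame=True).
EXPECTED_COLUMNS = [
    "MedInc", "HouseAge", "AveRooms", "AveBedrms",
    "Population", "AveOccup", "Latitude", "Longitude", "MedHouseVal",
]


def _load_from_sklearn():
    # Fallback: fetch via sklearn, wrapping network failures (e.g. HTTP 403)
    # in a message that tells the user how to proceed manually.
    try:
        from sklearn.datasets import fetch_california_housing
        return fetch_california_housing(as_frame=True).frame
    except Exception as exc:
        raise RuntimeError(
            "Could not fetch the California housing dataset "
            "(possibly an HTTP 403 or network error); consider downloading "
            "the CSV manually and bundling it alongside this module."
        ) from exc


def load_data(source="bundled", csv_path="california_housing.csv"):
    if source not in ("bundled", "sklearn"):
        raise ValueError(f"Unknown source: {source!r}")
    if source == "bundled" and os.path.exists(csv_path):
        with open(csv_path, newline="") as f:
            rows = list(csv.DictReader(f))
        # Only trust the bundled copy if its columns match expectations;
        # otherwise fall through to the sklearn fetch.
        if rows and set(rows[0]) == set(EXPECTED_COLUMNS):
            return rows
    return _load_from_sklearn()
```

A caller would use load_data() with the default 'bundled' source and only hit the network when the local CSV is missing or malformed.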

Test Suggestions

  • Test loading data with the 'bundled' source when the CSV file exists and contains the correct columns.
  • Test loading data with the 'bundled' source when the CSV file is missing to ensure it falls back to the sklearn fetch.
  • Simulate a scenario where the bundled CSV is present but has missing or incorrect columns to validate the fallback mechanism.
  • Test for error conditions by providing an invalid source parameter to ensure the appropriate ValueError is raised.
  • Test the helper script by running it in an environment without a cached dataset to ensure it can download and generate the CSV file.
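The second suggestion (fallback when the CSV is missing) could be written as an offline test along these lines. This is a sketch under stated assumptions: both load_data and the stubbed fetch here are minimal stand-ins for the PR's real code, so the test never touches the network.

```python
# Hypothetical offline test of the bundled -> sklearn fallback.
import os
import tempfile


def fake_fetch():
    # Stand-in for the real sklearn download, so the test stays offline.
    return "fetched-from-sklearn"


def load_data(source, csv_path, fetch=fake_fetch):
    # Minimal stand-in for the PR's load_data, just enough for this test.
    if source != "bundled":
        raise ValueError(f"Unknown source: {source!r}")
    if os.path.exists(csv_path):
        return "loaded-from-bundled-csv"
    return fetch()


def test_falls_back_when_csv_missing():
    missing = os.path.join(tempfile.mkdtemp(), "no_such_file.csv")
    assert load_data("bundled", missing) == "fetched-from-sklearn"


test_falls_back_when_csv_missing()
```

With pytest, the same idea would typically use monkeypatch or a fixture to stub the fetch on the real module rather than a default argument.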

@johnwalz97 johnwalz97 added the internal label (Not to be externalized in the release notes) Jan 7, 2026
@johnwalz97 johnwalz97 merged commit 494754b into main Jan 7, 2026
17 of 18 checks passed
@johnwalz97 johnwalz97 deleted the fix-integration-tests branch January 7, 2026 20:54
