Skip to content

Auto start testcontainers for datafusion-cli #16644

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

blaginin
Copy link
Contributor

@blaginin blaginin commented Jul 1, 2025

Which issue does this PR close?

Rationale for this change

Right now, running integration tests requires manual setup. Once tests are complete, you also need to remember to clean them up...

What changes are included in this PR?

Automatically start and configure containers on start and delete them on completion

Are these changes tested?

Are there any user-facing changes?

No

@blaginin blaginin self-assigned this Jul 1, 2025
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jul 1, 2025
@github-actions github-actions bot added the development-process Related to development process of DataFusion label Jul 1, 2025
@blaginin blaginin marked this pull request as ready for review July 1, 2025 23:29
@blaginin
Copy link
Contributor Author

blaginin commented Jul 1, 2025

fyi @Omega359 -- as you brought test containers initially 👀

@@ -173,6 +173,8 @@ rstest = "0.25.0"
serde_json = "1"
sqlparser = { version = "0.55.0", default-features = false, features = ["std", "visitor"] }
tempfile = "3"
testcontainers = { version = "0.24", features = ["default"] }
testcontainers-modules = { version = "0.12" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi, I am not sure why but locally I am unable to properly run minio with version 0.12, I have this in my Cargo.toml:

# version 0.12.0 seems to cause minio to hang
testcontainers-modules = { version = "=0.11.6", features = ["postgres", "minio", "blocking"] }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unable to properly run minio

can you please post the error?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just hangs locally (NOT this code, but my own tests). I think it's because of something related to waiting for minio to come up - I'm not doing the mc commands like you are.

@Omega359
Copy link
Contributor

Omega359 commented Jul 2, 2025

Do we need teardown code to shutdown the container after the run in a dtor annotated method ?

@blaginin
Copy link
Contributor Author

blaginin commented Jul 2, 2025

I don't think so because:

Containers implement Drop. As soon as they go out of scope, the underlying docker container is removed.

@Omega359
Copy link
Contributor

Omega359 commented Jul 2, 2025

Ah. I'm used to scala version mostly where if a test fails the container sticks around. I have to clean them up every so often when I'm doing the test/fix iteration cycle :(

@Omega359
Copy link
Contributor

Omega359 commented Jul 2, 2025

I haven't run this myself but looking over the changes I see nothing problematic .. I think this is a good improvement!

@blaginin
Copy link
Contributor Author

blaginin commented Jul 5, 2025

@findepi hey 👋 FYI as you suggested the change - in case you want to approve / merge 👀

blaginin added a commit to blaginin/datafusion that referenced this pull request Jul 7, 2025
@alamb
Copy link
Contributor

alamb commented Jul 12, 2025

This looks great -- I am testing it out now

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @blaginin -- I tried this locally on my mac and on a hosted ubuntu cloud VM and it seems to work great

The only thing I think would be good to figure out is if we can get the error when docker is not running to be more visible

@Omega359 did you ever figure out the problem on your machine? Given that this won't affect anyone unless they are running with INTEGRATION_TESTS I think we can merge it and perhaps debug on main

export AWS_SECRET_ACCESS_KEY=TEST-DataFusionPassword
export AWS_ENDPOINT=http://127.0.0.1:9000
export AWS_ALLOW_HTTP=true
TEST_STORAGE_INTEGRATION=1 cargo test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I first ran this command without docker running, several commands failed

Once I started docker it worked great

TEST_STORAGE_INTEGRATION=1 cargo test

---- test_aws_options stdout ----

thread 'test_aws_options' panicked at datafusion-cli/tests/cli_integration.rs:63:10:
Failed to start MinIO container: Client(CreateContainer(HyperLegacyError { err: hyper_util::client::legacy::Error(Connect, Os { code: 61, kind: ConnectionRefused, message: "Connection refused" }) }))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- test_cli stdout ----

thread 'test_cli' panicked at datafusion-cli/tests/cli_integration.rs:63:10:
Failed to start MinIO container: Client(CreateContainer(HyperLegacyError { err: hyper_util::client::legacy::Error(Connect, Os { code: 61, kind: ConnectionRefused, message: "Connection refused" }) }))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note it didn't show any errors about starting the container in the logs / output

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if ! docker info > /dev/null 2>&1; then
  echo "This script requires docker to be running. Please start docker and try again."
  exit 1
fi

???

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated in 4deb0eb

@@ -35,6 +40,67 @@ fn make_settings() -> Settings {
settings
}

async fn setup_minio_container() -> ContainerAsync<minio::MinIO> {
const MINIO_ROOT_USER: &str = "TEST-DataFusionLogin";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is very nice that these environment variables get setup via the test harness now rather than having to be setup outside

alamb pushed a commit that referenced this pull request Jul 12, 2025
* Add rust CI cache

* comment save if for now

* Add cache to depscache

* Unify the containers

* linux-build-lib to use its own cache

* rollback datafusion-cli for now (blocked by #16644)

* Fix CLI comment

* Load cache for `linux-datafusion-common-features`

* Setup more caches

* Cache for clippy

* comment clippy save

* Remove caches that aren't working

* Save if

* Save if
@Omega359
Copy link
Contributor

@Omega359 did you ever figure out the problem on your machine? Given that this won't affect anyone unless they are running with INTEGRATION_TESTS I think we can merge it and perhaps debug on main

The problem with minio? No, unfortunately not. How I'm starting and waiting for minio to come up in my test code is different than how it's handled in this PR and I don't think this PR will have the same issue that I'm seeing.

@blaginin
Copy link
Contributor Author

There's a minio alternative which actually uses datafusion internally 😱 https://github.com/rustfs/rustfs

blaginin and others added 2 commits July 14, 2025 20:32
@blaginin
Copy link
Contributor Author

There's a minio alternative which actually uses datafusion internally 😱 https://github.com/rustfs/rustfs

It's in active development but will switch after the first stable release 📝

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development-process Related to development process of DataFusion sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Auto run docker containers needed for tests
3 participants