Description
What is the new container you'd like to have?
I propose the addition of a DatabricksContainer module to facilitate testing applications that use the databricks-sql-connector. This would allow developers to run integration and end-to-end tests against a mock Databricks SQL API in an isolated and reproducible manner.
Unlike many services, Databricks does not provide an official local emulator as a Docker image. Therefore, this module would be designed to work with a user-provided Docker image that runs a mock server emulating the Databricks SQL API.
The benefits of having this dedicated container module include:
- Isolated Testing: Enables hermetic tests without relying on shared, live Databricks workspaces.
- CI/CD Integration: Simplifies running automated tests in CI/CD pipelines without complex credential management or network configurations.
- Developer Experience: Provides a simple, Pythonic interface consistent with other testcontainers modules like AzuriteContainer or PostgresContainer.
- Reliability: Reduces test flakiness caused by network issues or changes in shared development environments.
Why not just use a generic container for this?
While it is possible to use a generic DockerContainer("my-databricks-mock:latest"), a dedicated DatabricksContainer module would abstract away significant complexity related to configuration and readiness checks.
- Complicated Setup and Configuration: The databricks-sql-connector requires specific connection parameters: server_hostname, http_path, and access_token. A user of DockerContainer would have to manually:
  - Get the container's host address and dynamically mapped port.
  - Correctly format the server_hostname and http_path.
  - Know which token the mock server expects.

  This process is cumbersome and error-prone.
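Beyond configuration, the readiness checks mentioned earlier are another piece a dedicated module would hide: a mock server usually refuses connections for a moment after the container starts. The sketch below shows the generic polling idea, assuming the mock exposes some cheap probe (for example an HTTP health endpoint, whose path would be mock-specific). The function `wait_for_ready` and the fake probe are hypothetical illustrations; testcontainers-python already ships its own helpers such as `wait_for_logs`, which a real module might use instead.

```python
import time
from typing import Callable


def wait_for_ready(
    probe: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 0.5,
) -> int:
    """Call `probe` until it returns True or `timeout` seconds elapse.

    Returns the number of attempts it took. Raises TimeoutError if the
    probe never succeeds. Exceptions raised by the probe (e.g. connection
    refused while the mock is still booting) count as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        try:
            if probe():
                return attempts
        except Exception:
            # Hypothetical mock may refuse connections until its server is up.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"mock not ready after {attempts} attempts")
        time.sleep(interval)


# Usage with a fake probe standing in for a real health check:
calls = {"n": 0}


def fake_probe() -> bool:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionRefusedError("still booting")
    return True


print(wait_for_ready(fake_probe, timeout=5.0, interval=0.01))  # → 3
```

Treating probe exceptions as "not ready" keeps the loop simple; a real implementation would likely narrow that to connection errors.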
Generic DockerContainer approach:

```python
from databricks import sql
from testcontainers.core.container import DockerContainer

with DockerContainer("my-databricks-mock:latest").with_exposed_ports(8080) as mock_container:
    host = mock_container.get_container_host_ip()
    port = mock_container.get_exposed_port(8080)
    # User must manually construct connection parameters.
    # The mapped `port` must also reach the connector; how is mock-specific.
    connection = sql.connect(
        server_hostname=host,
        http_path="/sql/1.0/warehouses/<warehouse-id>",  # Path might be complex and mock-specific
        access_token="dummy-token",  # Must match the token the mock expects
    )
```

A dedicated DatabricksContainer would provide helper methods that abstract this away, offering a much cleaner interface.
Proposed DatabricksContainer approach:

```python
from databricks import sql
# Proposed module (does not exist yet):
# from testcontainers.databricks import DatabricksContainer

with DatabricksContainer("my-databricks-mock:latest") as databricks_container:
    # Clean, abstracted methods
    connection = sql.connect(
        server_hostname=databricks_container.get_server_hostname(),
        http_path=databricks_container.get_http_path(),
        access_token=databricks_container.get_token(),
    )
```
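As a minimal sketch of what could sit behind the proposed getters, the hypothetical value object below keeps the parameter derivation in one place. In a real DatabricksContainer, `host` and `port` would come from `get_container_host_ip()` and `get_exposed_port()`; the default warehouse ID and token are placeholders for whatever the user's mock image expects. `get_port()` and `connection_kwargs()` are illustrative additions, not part of the proposal, and how the mapped port is passed to the connector is left open because it depends on the mock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MockDatabricksEndpoint:
    """Hypothetical value object backing the proposed getters."""

    host: str
    port: int
    warehouse_id: str = "mock-warehouse"  # placeholder, mock-specific
    token: str = "dummy-token"  # placeholder, must match the mock

    def get_server_hostname(self) -> str:
        return self.host

    def get_port(self) -> int:
        # Hypothetical: exposed so callers can route the mapped port.
        return self.port

    def get_http_path(self) -> str:
        # Databricks SQL warehouses use /sql/1.0/warehouses/<id>;
        # a mock server may expect something different.
        return f"/sql/1.0/warehouses/{self.warehouse_id}"

    def get_token(self) -> str:
        return self.token

    def connection_kwargs(self) -> dict:
        """Keyword arguments intended for databricks.sql.connect()."""
        return {
            "server_hostname": self.get_server_hostname(),
            "http_path": self.get_http_path(),
            "access_token": self.get_token(),
        }


endpoint = MockDatabricksEndpoint(host="localhost", port=49153)
print(endpoint.get_http_path())  # → /sql/1.0/warehouses/mock-warehouse
```

Keeping this as plain data means the derivation logic can be unit-tested without Docker; a DatabricksContainer subclass of DockerContainer could build one after the container starts and delegate its getters to it.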