[PECOBLR-587] Azure Service Principal Credential Provider #621

Open

jprakash-db wants to merge 10 commits into main

Conversation


@jprakash-db (Contributor) commented Jun 30, 2025

Description

This pull request introduces support for Azure Service Principal M2M (SP) authentication in the PySQL Connector.

Key Changes

  • AzureServicePrincipalCredentialProvider: the credential provider that obtains credentials using an Azure service principal (see the usage sketch after this list)
  • ClientCredentialsTokenSource: manages the token lifecycle for the flow where credentials are obtained with grant_type: client_credentials
  • DatabricksHttpClient: a common HTTP client intended to unify HTTP logic across the connector and ensure standard client-level behaviour
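For reference, connecting with the new auth type would look roughly like the sketch below. The keyword names (oauth_client_id, oauth_client_secret, azure_tenant_id) come from this PR's diff hunks, but the exact auth_type string is assumed here for illustration.

from databricks import sql

# Sketch only: the auth_type value below is assumed, not confirmed by this PR.
with sql.connect(
    server_hostname="<workspace>.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    auth_type="azure-sp-m2m",              # assumed value of AuthType.AZURE_SP_M2M
    oauth_client_id="<azure-sp-client-id>",
    oauth_client_secret="<azure-sp-client-secret>",
    azure_tenant_id="<azure-tenant-id>",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT 1")
        print(cursor.fetchone())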

New dependencies

  • Added the PyJWT library, which is required for JWT parsing (a brief usage sketch follows)
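PyJWT is used for parsing token claims; for example, extracting the expiry claim without verifying the signature is a one-liner (a generic PyJWT usage sketch, not code from this PR):

from typing import Optional

import jwt  # PyJWT


def token_expiry_epoch(access_token: str) -> Optional[int]:
    # Decode claims only; signature verification is skipped because we just
    # need the "exp" claim to know when to refresh.
    claims = jwt.decode(access_token, options={"verify_signature": False})
    return claims.get("exp")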

Tests

Expanded unit tests to cover:

  • AzureServicePrincipalCredentialProvider
  • ClientCredentialsTokenSource

Manual Testing

  • Tested the Azure SP M2M flow by creating a service principal in Azure Entra ID and mapping it to a workspace service principal, then running queries against the Databricks workspace with the Azure Entra ID credentials.

github-actions bot commented Jul 1, 2025

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

(The bot posted the same reminder after subsequent pushes on Jul 2 and Jul 3, 2025.)

@jprakash-db marked this pull request as ready for review July 3, 2025 05:55
@jprakash-db changed the title from Azure Service Principal Credential Provider to [PECOBLR-587] Azure Service Principal Credential Provider on Jul 3, 2025

[tool.poetry.extras]
pyarrow = ["pyarrow"]

[tool.poetry.dev-dependencies]
[tool.poetry.group.dev.dependencies]

does the poetry update have to go in the same PR?


if auth_type == AuthType.AZURE_SP_M2M.value:
    pass
else:

why not just do if auth_type != AuthType.AZURE_SP_M2M.value: instead?

oauth_redirect_port_range=[kwargs["oauth_redirect_port"]]
if kwargs.get("oauth_client_id") and kwargs.get("oauth_redirect_port")
if client_id and kwargs.get("oauth_redirect_port")

this is a behaviour change from before, i.e. earlier we required client_id to come in from kwargs and now it is OK even if we derive it from get_client_id_and_redirect_port; is this intended?


# Private API: this is an evolving interface and it will change in the future.
# Please do not depend on it in your applications.
from databricks.sql.experimental.oauth_persistence import OAuthToken, OAuthPersistence

logger = logging.getLogger(__name__)

I don't think this is being used; can we add logging?

return app_id

# default databricks resource id
return "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"

can we add these IDs as constants at the top so they're in one place
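To illustrate the suggestion, a minimal sketch (the constant and function names are illustrative, not the PR's actual code):

# Hoist the hard-coded identifier to a module-level constant so it lives in
# one obvious place; functions then reference the constant instead of a literal.
DEFAULT_AZURE_DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"


def default_azure_databricks_resource_id() -> str:
    # Hypothetical accessor; the PR's helper may be shaped differently.
    return DEFAULT_AZURE_DATABRICKS_RESOURCE_ID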


    @abstractmethod
    def refresh(self) -> Token:
        pass

can we add a comment that this duplicates code in the SDK (with a code pointer) and that in the long term we should try to unify?

        self.token = self.refresh()
        return self.token

    def refresh(self) -> Token:

how is the refresh mechanism being handled for the existing credential providers? is there opportunity to dedup/reuse?
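For context, the usual pattern for a refreshable token source is to cache the token and refresh it lazily when it is close to expiry. A minimal sketch of that pattern (not the PR's exact classes):

import time
from abc import ABC, abstractmethod
from typing import Optional


class Token:
    # Illustrative token holder; the PR's Token class may differ.
    def __init__(self, access_token: str, expiry_epoch: float):
        self.access_token = access_token
        self.expiry_epoch = expiry_epoch

    def is_expired(self, leeway_seconds: float = 60.0) -> bool:
        # Expire slightly early so callers never send a token that lapses mid-request.
        return time.time() >= self.expiry_epoch - leeway_seconds


class RefreshableTokenSource(ABC):
    # Cache a token and refresh it lazily when it is near expiry.
    def __init__(self) -> None:
        self.token: Optional[Token] = None

    def get_token(self) -> Token:
        if self.token is None or self.token.is_expired():
            self.token = self.refresh()
        return self.token

    @abstractmethod
    def refresh(self) -> Token:
        ...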


# Singleton class for common Http Client
class DatabricksHttpClient:
    ## TODO: Unify all the http clients in the PySQL Connector

why can't this be done right now? can we not use the existing http client?
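For context, a thread-safe singleton around a requests.Session with urllib3 retries, in the spirit of the DatabricksHttpClient above, could be sketched as follows (illustrative only, not the PR's implementation):

import threading

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


class SingletonHttpClient:
    # Illustrative process-wide HTTP client with basic retry behaviour.
    _instance = None
    _lock = threading.Lock()

    @classmethod
    def get_instance(cls) -> "SingletonHttpClient":
        # Double-checked locking so concurrent callers share one session.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = cls()
        return cls._instance

    def __init__(self) -> None:
        self.session = requests.Session()
        retry = Retry(total=3, backoff_factor=0.5, status_forcelist=[429, 502, 503, 504])
        adapter = HTTPAdapter(max_retries=retry)
        self.session.mount("https://", adapter)
        self.session.mount("http://", adapter)

    def execute(self, method: str, url: str, **kwargs) -> requests.Response:
        return self.session.request(method, url, **kwargs)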

@vikrantpuppala requested a review from Copilot July 4, 2025 05:28
@Copilot (Copilot AI) left a comment

Pull Request Overview

This PR adds support for Azure Service Principal M2M authentication to the PySQL Connector by introducing a shared HTTP client, a client-credentials token source, and a new credentials provider for Azure SP.

  • Introduce DatabricksHttpClient for unified HTTP logic
  • Add Token and ClientCredentialsTokenSource to manage the OAuth client-credentials flow (the underlying token request is sketched after this list)
  • Implement AzureServicePrincipalCredentialProvider and wire it through get_auth_provider
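The token request behind the client-credentials flow is a standard OAuth 2.0 POST; against Microsoft Entra ID it looks roughly like this (a generic sketch with assumed endpoint and scope values, not the PR's code):

import requests


def fetch_client_credentials_token(
    tenant_id: str, client_id: str, client_secret: str, scope: str
) -> dict:
    # Microsoft Entra ID v2.0 token endpoint for the tenant.
    token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    response = requests.post(
        token_url,
        data={
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": scope,  # e.g. "<resource-app-id>/.default"
        },
        timeout=30,
    )
    response.raise_for_status()
    # The JSON body carries access_token, token_type and expires_in.
    return response.json()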

Reviewed Changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.

Summary per file:

  • tests/unit/test_thrift_field_ids.py: Formatting cleanup and consistent quote style
  • tests/unit/test_auth.py: Added tests for ClientCredentialsTokenSource and the SP credential provider, plus JWT fixtures
  • src/databricks/sql/common/http.py: New singleton DatabricksHttpClient with retry logic
  • src/databricks/sql/auth/oauth.py: Introduced Token, RefreshableTokenSource, and ClientCredentialsTokenSource
  • src/databricks/sql/auth/common.py: Extended AuthType and added a helper for mapping Azure login app IDs
  • src/databricks/sql/auth/authenticators.py: Added AzureServicePrincipalCredentialProvider
  • src/databricks/sql/auth/auth.py: Updated ClientContext, get_auth_provider, and auth provider resolution for SP
  • pyproject.toml: Added pyjwt and moved dev dependencies under [tool.poetry.group.dev.dependencies]

from typing import Optional, List

from databricks.sql.auth.authenticators import (
    AuthProvider,
    AccessTokenAuthProvider,
    ExternalAuthProvider,
    DatabricksOAuthProvider,
    AzureServicePrincipalCredentialProvider,

is this coming from sdk?

cfg.hostname,
cfg.oauth_client_id,
cfg.oauth_client_secret,
cfg.azure_tenant_id,

We were talking about this in JDBC, i.e. whether the tenant ID can be derived dynamically rather than being a required config.


# Enums for HTTP Methods
class HttpMethod(str, Enum):
    GET = "GET"

there are no existing enums for these?
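For what it's worth, Python 3.11+ ships http.HTTPMethod in the standard library; on older interpreters a str-valued Enum like the one in this hunk is a common fallback. A minimal sketch (members beyond GET are illustrative, not necessarily what the PR defines):

from enum import Enum


class HttpMethod(str, Enum):
    # str mixin so members compare and serialize as plain method strings.
    GET = "GET"
    POST = "POST"
    PUT = "PUT"
    DELETE = "DELETE"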

@samikshya-db (Contributor) left a comment

I don't have much objection to this implementation. But what would it take for us to include a dependency on the SDK, as it already has support for this?

@@ -24,6 +18,9 @@ def __init__(
auth_type: Optional[str] = None,
oauth_scopes: Optional[List[str]] = None,
oauth_client_id: Optional[str] = None,
oauth_client_secret: Optional[str] = None,
azure_tenant_id: Optional[str] = None,

Can we add a TODO to remove the need to pass the tenant ID? databricks/databricks-sdk-py#638

Also see: https://databricks.atlassian.net/browse/PECOBLR-212

@@ -0,0 +1,65 @@
import requests

@varun-edachali-dbx has already added an HTTP client; I suggest you two collaborate to push the common HTTP client to main first: https://github.com/databricks/databricks-sql-python/blob/sea-migration/src/databricks/sql/backend/sea/utils/http_client.py
