Contributor

@millin millin commented Feb 25, 2025

⚠️ Breaking Changes

  • Specifying the Airflow connection in the target parameter is no longer supported.
    Use dbt_conn_id parameter instead.

✨ Features

🐛 Bug Fixes

🔥 Refactoring

  • Dropped support for py37_copytree as Python 3.7 is no longer supported.
  • Simplified connection management by removing Python 3.7 compatibility checks.
  • Used functools.cache to cache get_remote method calls.
  • Consolidated remote hooks under the airflow_dbt_python.hooks.remote package.
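
As a rough illustration of the `functools.cache` item above, the sketch below memoizes a factory function so that repeated calls with the same arguments return the same object. `FakeRemote`, the `get_remote` signature, and the connection IDs are hypothetical stand-ins chosen for the example; they are not taken from the package's code.

```python
from dataclasses import dataclass
from functools import cache
from typing import Optional


@dataclass(frozen=True)
class FakeRemote:
    """Hypothetical stand-in for a remote hook (not part of airflow_dbt_python)."""

    scheme: str
    conn_id: Optional[str]


@cache
def get_remote(scheme: str, conn_id: Optional[str] = None) -> FakeRemote:
    """Build a remote once per (scheme, conn_id) pair and reuse it afterwards."""
    return FakeRemote(scheme=scheme, conn_id=conn_id)


first = get_remote("s3", "aws_default")
second = get_remote("s3", "aws_default")  # served from the cache
other = get_remote("gs", "google_cloud_default")  # different key, new object
```

Note that `functools.cache` requires Python 3.9 or later and hashable arguments, which fits with dropping Python 3.7 support.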

🧪 Tests

  • Added extensive test coverage for new adapter-specific hooks.
  • Introduced test profiles for Postgres, Redshift, Snowflake, BigQuery, and Spark.
  • Improved unit tests for connection parameter extraction.

🔍 Other

  • Fixed Poetry deprecation warnings by migrating to the new pyproject.toml format.
  • Added missing dbt configuration flags.
  • Updated the docs.

@millin millin force-pushed the fix/adapter_specific_hooks branch 2 times, most recently from 0f03285 to dd0fddd Compare February 25, 2025 15:39
@millin millin changed the title Adapter specific hooks Draft: Adapter specific hooks Feb 26, 2025
@millin millin changed the title Draft: Adapter specific hooks Draft: Version 3.0.0 Mar 4, 2025
@millin millin force-pushed the fix/adapter_specific_hooks branch from dd0fddd to 8d6cc58 Compare March 4, 2025 12:38
@millin millin changed the title Draft: Version 3.0.0 Version 3.0.0 Mar 4, 2025
@millin millin force-pushed the fix/adapter_specific_hooks branch from 8d6cc58 to c666d7b Compare March 4, 2025 13:13
@millin
Contributor Author

millin commented Mar 5, 2025

Hi, @tomasfarias
Please take a look when you have time

@tomasfarias
Owner

Hello @millin. Thanks for the work! I was too busy last week, so I'll be taking a look this week.

@millin
Contributor Author

millin commented Mar 27, 2025

Hello, @tomasfarias!
Will you have time to look at it this week?

@tomasfarias
Owner

Apologies @millin, I've been out sick for a few days, and I missed my original deadline. I'll review this now.

The PR is quite large, so I'll go commit by commit.

)


def py37_copytree(source: URL, destination: URL, replace: bool = True):
Owner

@tomasfarias tomasfarias Mar 31, 2025

praise: Happy to see this workaround removed. Fully behind dropping support for Python 3.7.

default_factory=list, repr=False
)

# legacy behaviors - https://github.com/dbt-labs/dbt-core/blob/main/docs/guides/behavior-change-flags.md
Owner

@tomasfarias tomasfarias Apr 1, 2025

praise: neat 👍

conn = copy(conn)
extra_dejson = conn.extra_dejson
options = extra_dejson.pop("options")
for k, v in re.findall(r"-c (\w+)=(.*)$", options):
Owner

@tomasfarias tomasfarias Apr 1, 2025

issue(blocking): Could we modify the docstring (or add a comment) to explain what we are trying to find with this regular expression?

Owner

This is now in a different module, but the same regular expression is present.
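
To make the question about the regular expression concrete, here is a small sketch of what the pattern matches. It assumes the `options` extra holds command-line style `-c name=value` settings; the sample strings are invented for illustration, since the thread does not show real values.

```python
import re

# Same pattern as in the snippet under review.
OPTIONS_PATTERN = re.compile(r"-c (\w+)=(.*)$")

# A single "-c name=value" setting is split into a (name, value) pair.
single = OPTIONS_PATTERN.findall("-c search_path=analytics")

# With several settings in one string, the greedy (.*) runs to the end of the
# string, so only the first name is captured and the rest becomes its value.
multiple = OPTIONS_PATTERN.findall(
    "-c search_path=analytics -c statement_timeout=5000"
)

print(single)
print(multiple)
```

The second case is worth spelling out in the requested docstring or comment, because it shows that the pattern as written only separates one setting per string.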

"password",
depends_on=lambda x: not any(
(
*(k in x.extra_dejson for k in ("private_key_file", "private_key_content")),
Owner

@tomasfarias tomasfarias Apr 1, 2025

issue(blocking): I don't think we care whether there are private key file and contents in extra_dejson as long as we have our password and .get("authenticator", "") == "oauth".

Owner

I think we should keep the mental model small for users: a single key (authenticator) should determine what everything else means.

DbtConnectionParam(
"login",
"user",
depends_on=lambda x: x.extra_dejson.get("authenticator", "") != "oauth",
Owner

@tomasfarias tomasfarias Apr 1, 2025

nitpick: I feel like we could just have a utility function that we can re-use here instead of using lambda:

def make_extra_dejson_compare_callable(key, default, comparison_operator, expected):
    def compare_extra_dejson_value(conn):
        return comparison_operator(conn.extra_dejson.get(key, default), expected)

    return compare_extra_dejson_value

Then here:

import operator
...
        DbtConnectionParam(
            "login",
            "user",
            depends_on=make_extra_dejson_compare_callable("authenticator", "", operator.ne, "oauth"),
        ),

This one is kind of a nitpick. I do prefer an approach like this to using lambda, but I think the other comments are more important.
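
A self-contained sketch of the suggested helper follows, so it can be tried outside Airflow. `FakeConnection` is a hypothetical stand-in that exposes only `extra_dejson`, and the type hints are additions for the example; how the helper is wired into `DbtConnectionParam` is left to the PR.

```python
import operator
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class FakeConnection:
    """Hypothetical stand-in exposing only extra_dejson, like an Airflow Connection."""

    extra_dejson: Dict[str, Any] = field(default_factory=dict)


def make_extra_dejson_compare_callable(
    key: str,
    default: Any,
    comparison_operator: Callable[[Any, Any], bool],
    expected: Any,
) -> Callable[[FakeConnection], bool]:
    """Return a predicate that compares one extra_dejson value with expected."""

    def compare_extra_dejson_value(conn: FakeConnection) -> bool:
        return comparison_operator(conn.extra_dejson.get(key, default), expected)

    return compare_extra_dejson_value


# Reusable predicate: true unless the connection asks for OAuth.
not_oauth = make_extra_dejson_compare_callable("authenticator", "", operator.ne, "oauth")

password_conn = FakeConnection()  # no authenticator set, falls back to ""
oauth_conn = FakeConnection(extra_dejson={"authenticator": "oauth"})
```

Compared with an inline lambda, the named factory can be shared across several parameters and tested on its own.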

Contributor Author

I liked this idea and decided to combine the recurring parameters.


return {self.conn.conn_id: details}

def get_dbt_details_from_connection(self, conn: Connection) -> dict[str, Any]:
Owner

@tomasfarias tomasfarias Apr 1, 2025

praise: Okay, this is no longer a classmethod, safe to ignore my comment on this topic. I think this is now fine.

raise ValueError(f"Unsupported scheme: {url.scheme}")

return client, path
path, *remain = path.split("@", maxsplit=1)
Owner

@tomasfarias tomasfarias Apr 1, 2025

praise: neat 👍

from airflow_dbt_python.utils.url import URL, URLLike


class DbtGCSRemoteHook(GCSHook, DbtRemoteHook):
Owner

@tomasfarias tomasfarias Apr 1, 2025

praise: I'm not very familiar with GCS unfortunately, so I'll default to saying this is fine 👍, although I do have one comment about an (apparently) copy-pasted comment.

Owner

@tomasfarias tomasfarias left a comment

Okay, pretty lengthy PR, so naturally a pretty lengthy review is in order.

I've tagged all my review comments with conventional comments to give you an idea of what needs more focus to address.

Ultimately, I agree with the direction of the changes, so if we get these comments addressed I have no problem with shipping this.

pyproject.toml Outdated
types-freezegun = ">=1.1.6"
types-PyYAML = ">=6.0.7"
pytest-mock = "^3.14.0"
mock-gcp = {git = "https://github.com/millin/mock-gcp.git", rev = "0d972df9b6cce164b49f09ec4417a4eb77beb960"}
Owner

question: I skimmed through this, and it seems like upstream is not active. Could you publish your own fork on PyPI? I feel a bit uneasy about including a git source, so if publishing to PyPI is not possible I would like a comment briefly describing what we are getting from the fork that is not in upstream.

Contributor Author

The difference with upstream is that it mostly doesn't work 😄. I sent a PR to the original author but haven't gotten a response. But I liked the concept and used it as a basis.

I may try publishing this on PyPI (never done it, to be honest).

@millin millin force-pushed the fix/adapter_specific_hooks branch from d8d6a79 to 5ff0c76 Compare April 1, 2025 19:24
@millin millin force-pushed the fix/adapter_specific_hooks branch from 5ff0c76 to e136027 Compare April 1, 2025 19:26
@millin millin requested a review from tomasfarias April 2, 2025 06:35
@millin
Contributor Author

millin commented Apr 4, 2025

@tomasfarias Please check out the new changes

pyproject.toml Outdated
types-PyYAML = ">=6.0.7"
pytest-mock = "^3.14.0"
-mock-gcp = {git = "https://github.com/millin/mock-gcp.git", rev = "0d972df9b6cce164b49f09ec4417a4eb77beb960"}
+mock-gcp = ">=0.2.0"
Owner

praise: Thank you for this.

@@ -1 +1 @@
-"""Hooks module provides DbtHooks and DbtRemoteHooks."""
+"""Hooks module provides DbtHooks and DbtFSHooks."""
Owner

praise: Fantastic work in this commit.

Owner

@tomasfarias tomasfarias left a comment

I've got no more comments. Let's ship this!

@tomasfarias tomasfarias merged commit 8d2136f into tomasfarias:master Apr 12, 2025
25 checks passed