Skip to content

Conversation

adamantike
Copy link
Contributor

For complex DAGs, the import time added by this library could generate DagBag import timeouts if the configured value is low enough. An Airflow documented best practice is to
avoid heavy Python code that runs on DAG and Operator creation, and dbt imports are slow, based on profiling.

Profiling can be easily run locally, with the following command:

python -X importtime -c "from airflow_dbt_python.operators.dbt import DbtRunOperator" 2>import-times.log

And then parsed using a tool like tuna.

Before this change, the operator import takes ~1.37s, which is reduced to ~0.25s with this fix.

It's important to note that, from those 0.25s, more than 80% of the time is spent importing airflow components, which will be commonly already loaded in DAGs, so this library's import time for operators becomes insignificant.

For complex DAGs, the import time added by this library could generate
`DagBag` import timeouts if the configured value is low enough. An
Airflow documented best practice is to
[avoid heavy Python code](https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html#top-level-python-code)
that runs on DAG and Operator creation, and `dbt` imports are slow,
based on profiling.

Profiling can be easily run locally, with the following command:
```shell
python -X importtime -c "from airflow_dbt_python.operators.dbt import DbtRunOperator" 2>import-times.log
```
And then parsed using a tool like
[tuna](https://github.com/nschloe/tuna).

Before this change, the operator import takes ~1.37s, which is reduced
to ~0.25s with this fix.

It's important to note that, from those 0.25s, more than 80% of the time
is spent importing `airflow` components, which will be commonly already
loaded in DAGs, so this library's import time for operators becomes
insignificant.
@tomasfarias
Copy link
Owner

Changes look neat, thanks for your contribution.

and dbt imports are slow, based on profiling

Fun story: sometimes dbt does network calls (!) when importing modules. This was a particular headache for me when I was trying to fetch the dbt version fast.

Anyways, if CI is green, I'll be happy to merge this and push out a patch release.

@tomasfarias tomasfarias added the enhancement New feature or request label Nov 22, 2022
@tomasfarias tomasfarias merged commit 67c8d78 into tomasfarias:master Nov 22, 2022
@adamantike
Copy link
Contributor Author

Thank you for the quick review and release! You have done an excellent job building and supporting this project :)

@adamantike adamantike deleted the fix/avoid-heavy-top-level-imports-in-operators branch November 23, 2022 02:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants