
docs: A lot of documentation improvements
tomasfarias committed Feb 5, 2022
1 parent b1b2b3a commit 4746d80
Showing 10 changed files with 396 additions and 17 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -75,7 +75,8 @@ with DAG(
exclude=["tag:deprecated"],
target="production",
profile="my-project",
full_refresh=True,
full_refresh=True,
do_xcom_push_artifacts=["manifest.json", "run_results.json"],
)

See the full example [here](examples/use_dbt_artifacts_dag.py).

15 changes: 0 additions & 15 deletions docs/autodoc.rst

This file was deleted.

83 changes: 83 additions & 0 deletions docs/development.rst
@@ -0,0 +1,83 @@
.. _development:

Development
===========

This section describes how to set up a development environment. If you are looking to dig into the internals of airflow-dbt-python and make a (very appreciated) contribution to the project, read along.

Poetry
------

airflow-dbt-python uses `Poetry <https://python-poetry.org/>`_ for project management. Ensure it's installed before proceeding: see `Poetry's installation documentation <https://python-poetry.org/docs/#installation>`_.

Additionally, we recommend running the following commands in a virtual environment.
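
For example, a virtual environment can be created with the standard library's ``venv`` module before installing Poetry (a minimal sketch; any virtual environment tool works):

.. code-block:: shell

   python -m venv .venv        # create the virtual environment
   source .venv/bin/activate   # activate it (POSIX shells)
   pip install poetry          # one of several supported ways to install Poetry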

Installing Airflow
------------------

Running the unit tests requires a local installation of Airflow. We can install a specific version using ``pip``:

.. code-block:: shell

   pip install apache-airflow==1.10.12

.. note::
   Installing any 1.X version of Airflow will raise warnings due to dependency conflicts with ``dbt-core``. These conflicts should not impact airflow-dbt-python.

Alternatively, install the ``airflow`` extra, which will fetch the latest release of Airflow with major version 2:

.. code-block:: shell

   cd airflow-dbt-python
   poetry install -E airflow

Building from source
--------------------

Clone the main repo and install it:


.. code-block:: shell

   git clone https://github.com/tomasfarias/airflow-dbt-python.git
   cd airflow-dbt-python
   poetry install

Testing
-------

Unit tests are available for all operators and hooks. That being said, only a fraction of the large number of possible inputs that the operators and hooks can take is currently covered, so the unit tests do not offer perfect coverage (a single peek at the ``DbtBaseOperator`` should give you an idea of the level of state explosion we manage).

.. note::
   Unit tests (and airflow-dbt-python) assume dbt works correctly and do not assert the behavior of the dbt commands themselves.

Requirements
^^^^^^^^^^^^

Unit tests interact with a `PostgreSQL <https://www.postgresql.org/>`_ database as a target to run dbt commands. This requires PostgreSQL to be installed in your local environment. Installation instructions for all major platforms can be found here: https://www.postgresql.org/download/.
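
For example, on two common development platforms (commands are indicative; see the instructions linked above for your platform):

.. code-block:: shell

   # Debian/Ubuntu
   sudo apt-get install postgresql
   sudo service postgresql start

   # macOS with Homebrew
   brew install postgresql
   brew services start postgresql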

Some unit tests require the `Amazon provider package for Airflow <https://pypi.org/project/apache-airflow-providers-amazon/>`_. Ensure it's installed via the ``amazon`` extra:

.. code-block:: shell

   poetry install -E amazon

Running unit tests with pytest
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

airflow-dbt-python uses `pytest <https://docs.pytest.org/>`_ as its testing framework. After you have saved your changes, all unit tests can be run with:

.. code-block:: shell

   poetry run pytest tests/ -vv

Generating coverage reports with pytest-cov can be done with:

.. code-block:: shell

   poetry run pytest -vv --cov=./airflow_dbt_python --cov-report=xml:./coverage.xml --cov-report term-missing tests/

Pre-commit hooks
----------------
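
A sketch of a typical `pre-commit <https://pre-commit.com/>`_ workflow, assuming the repository ships a ``.pre-commit-config.yaml`` (not shown in this diff):

.. code-block:: shell

   pip install pre-commit       # or use your package manager of choice
   pre-commit install           # register the git hooks
   pre-commit run --all-files   # run every hook against the whole repository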
4 changes: 4 additions & 0 deletions docs/example_dags.rst
@@ -0,0 +1,4 @@
Example DAGs
============

This section contains a few DAGs showing off some dbt pipelines to get you going.
194 changes: 194 additions & 0 deletions docs/getting_started.rst
@@ -0,0 +1,194 @@
Getting started
===============

This section gives a quick run-down on installing airflow-dbt-python and getting your first DAG running.

.. _requirements:

Requirements
------------

airflow-dbt-python requires the latest major version of `dbt-core <https://pypi.org/project/dbt-core/>`_, which at the time of writing is version 1.

To line up with ``dbt-core``, airflow-dbt-python supports Python 3.7, 3.8, and 3.9. We also include Python 3.10 in our testing pipeline, although as of the time of writing ``dbt-core`` does not yet support it.

On the Airflow side, we support release 1.10.12 and every release of major version 2.

.. note::
   ``apache-airflow==1.10.12`` has a dependency conflict with ``dbt-core>=1.0.0``. airflow-dbt-python does not require the conflicting dependency, nor does it access the parts of ``dbt-core`` that use it, so it should work regardless.

   That being said, installing airflow-dbt-python in an environment with ``apache-airflow==1.10.12`` will produce warnings, and we do recommend upgrading to version 2, as future versions of airflow-dbt-python are likely to drop support for version 1.10.12 entirely if the conflicts become unmanageable.

.. warning::
   Due to the dependency conflict just described, airflow-dbt-python does not include Airflow as a dependency. We expect it to be installed into an environment with Airflow already in it. For instructions on setting up a development environment, see :ref:`development`.
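
You can surface the conflict yourself with ``pip``'s built-in dependency checker; the exact output depends on the versions installed in your environment:

.. code-block:: shell

   pip check   # reports broken requirements, e.g. between apache-airflow 1.10.12 and dbt-core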


Installation
------------

airflow-dbt-python can be installed in any environment that has a supported version of Airflow already installed. See :ref:`requirements` for details, and refer to the `Airflow documentation <https://airflow.apache.org/docs/apache-airflow/stable/installation/index.html>`_ for instructions on how to install it.

From PyPI
^^^^^^^^^

airflow-dbt-python is available in `PyPI <https://pypi.org/project/airflow-dbt-python/>`_ and can be installed with ``pip``:

.. code-block:: shell

   pip install airflow-dbt-python

As a convenience, any dbt adapters that are required can be installed by specifying extras. The ``all`` extra includes all adapters:

.. code-block:: shell

   pip install airflow-dbt-python[snowflake,postgres,redshift,bigquery]
   pip install airflow-dbt-python[all]

Building from source
^^^^^^^^^^^^^^^^^^^^

airflow-dbt-python can also be built from source by cloning the main repo:

.. code-block:: shell

   git clone https://github.com/tomasfarias/airflow-dbt-python.git
   cd airflow-dbt-python

And installing with ``poetry`` (without development dependencies):

.. code-block:: shell

   poetry install --no-dev

As with ``pip``, any extra adapters can be installed:

.. code-block:: shell

   poetry install -E postgres -E redshift -E bigquery -E snowflake --no-dev
   poetry install -E all --no-dev

Installing in MWAA
^^^^^^^^^^^^^^^^^^

airflow-dbt-python can be installed in an Airflow environment managed by AWS via their `Managed Workflows for Apache Airflow <https://aws.amazon.com/managed-workflows-for-apache-airflow/>`_ service. To do so, include airflow-dbt-python in MWAA's ``requirements.txt`` file, for example:

.. code-block:: shell
   :caption: requirements.txt

   airflow-dbt-python[redshift,amazon]

This installs airflow-dbt-python, dbt's Redshift adapter, and Airflow's Amazon provider package.


Setting up a dbt project
------------------------

Setting up a dbt project for airflow-dbt-python to run depends on the type of executor running in your production Airflow environment:

1. Using a `LocalExecutor <https://airflow.apache.org/docs/apache-airflow/stable/executor/local.html>`_ with a single-machine deployment means we can rely on the local machine's filesystem to store our project. This also applies to DebugExecutor and SequentialExecutor, but these executors are generally only used for debugging/development so we will ignore them.

2. However, once your setup has evolved to a multi-machine/cloud installation, we must rely on an external backend to store any dbt files. The only backend currently supported is S3, although more are planned (see :ref:`download-dbt-files-from-s3`).


Single-machine setup
^^^^^^^^^^^^^^^^^^^^

As we can rely on the local machine's filesystem, simply copy your dbt project files and dbt ``profiles.yml`` to a path on your local machine, where they may be laid out as:

.. code::

   .
   |-- ~/.dbt/
   |   `-- profiles.yml
   `-- /path/to/project/
       |-- dbt_project.yml
       |-- models/
       |   |-- model1.sql
       |   `-- model2.sql
       |-- seeds/
       |   |-- seed1.csv
       |   `-- seed2.csv
       |-- macros/
       |   |-- macro1.sql
       |   `-- macro2.sql
       `-- tests/
           |-- test1.sql
           `-- test2.sql

So we can simply set ``project_dir`` and ``profiles_dir`` to ``"/path/to/project/"`` and ``"~/.dbt/"`` respectively:

.. code-block:: python
   :linenos:
   :caption: example_local_1.py

   import datetime as dt

   from airflow import DAG
   from airflow.utils.dates import days_ago

   from airflow_dbt_python.operators.dbt import DbtRunOperator

   with DAG(
       dag_id="example_dbt_artifacts",
       schedule_interval="0 0 * * *",
       start_date=days_ago(1),
       catchup=False,
       dagrun_timeout=dt.timedelta(minutes=60),
   ) as dag:
       dbt_run = DbtRunOperator(
           task_id="dbt_run_daily",
           project_dir="/path/to/project",
           profiles_dir="~/.dbt/",
           select=["+tag:daily"],
           exclude=["tag:deprecated"],
           target="production",
           profile="my-project",
       )

.. note::
   Setting ``profiles_dir`` can be omitted as ``"~/.dbt/"`` is the default value.


If we have multiple operators, we can also utilize default arguments and include other parameters like the profile and target to use:

.. code-block:: python
   :linenos:
   :caption: example_local_2.py

   import datetime as dt

   from airflow import DAG
   from airflow.utils.dates import days_ago

   from airflow_dbt_python.operators.dbt import DbtRunOperator, DbtSeedOperator

   default_args = {
       "project_dir": "/path/to/project/",
       "profiles_dir": "~/.dbt/",
       "target": "production",
       "profile": "my-project",
   }

   with DAG(
       dag_id="example_dbt_artifacts",
       schedule_interval="0 0 * * *",
       start_date=days_ago(1),
       catchup=False,
       dagrun_timeout=dt.timedelta(minutes=60),
       default_args=default_args,
   ) as dag:
       dbt_seed = DbtSeedOperator(
           task_id="dbt_seed",
       )

       dbt_run = DbtRunOperator(
           task_id="dbt_run_daily",
           select=["+tag:daily"],
           exclude=["tag:deprecated"],
       )

       dbt_seed >> dbt_run

.. note::
   dbt supports configuration via environment variables, which may also be used. Additionally, ``profile`` and ``target`` may be omitted if already specified in ``dbt_project.yml`` and ``profiles.yml`` respectively.
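
As an illustration, dbt reads the standard ``DBT_PROFILES_DIR`` environment variable when no profiles directory is passed explicitly, and values inside ``profiles.yml`` may reference variables through dbt's ``env_var`` function (a sketch; the variables must be set in the Airflow worker's environment, and ``SNOWFLAKE_PASSWORD`` is a hypothetical name):

.. code-block:: shell

   export DBT_PROFILES_DIR=~/.dbt     # used by dbt in place of an explicit profiles_dir
   export SNOWFLAKE_PASSWORD=secret   # read in profiles.yml via {{ env_var("SNOWFLAKE_PASSWORD") }}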

Multi-machine/cloud installation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6 changes: 5 additions & 1 deletion docs/index.rst
@@ -5,7 +5,11 @@ Welcome to airflow-dbt-python's documentation!
:maxdepth: 2
:caption: Contents:

autodoc
introduction.rst
getting_started.rst
example_dags.rst
development.rst
reference.rst

Indices and tables
==================
