
1.12.0

@rudolfix released this 17 Jun 21:43

Core Library

Quality of Life (fixing annoying little things)

  • 2529-INFO_TABLES_QUERY_THRESHOLD-as-parameter-from-config by @amirdataops in #2600
  • warn when resolving configs or secrets with placeholder values by @djudjuu in #2636
  • Prevent unnecessary preliminary connection in dataset by @sh-rp in #2645
  • QoL: warning with hint to provide data types for columns with exclusively None values by @anuunchin in #2633
  • Fix issue 2690: switch to packaging to remove warning on import dlt by @djudjuu in #2707
  • qol: exception formatting by @zilto in #2715
  • Regular and standalone resources are now the same thing. Both provide nicely typed callables, can be renamed, and allow secrets and configs to be injected in the same way, also when defined as inner functions. This unifies injection behavior across all our decorators.
    In the example below, (1) the access_token secret is allowed in the inner resource and (2) the limit argument with a default will be injected from e.g. the LIMIT env variable, which was skipped before:
import dlt

@dlt.source
def source():
    @dlt.resource(write_disposition="merge", primary_key="_id")
    def documents(access_token=dlt.secrets.value, limit=10):
        yield from generate_json_like_data(access_token, limit)

    return documents
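
For illustration, a hedged sketch of the injection described above (pipeline name and destination are made up, and it builds on the illustrative snippet, so it is not runnable as-is):

import os
import dlt

# per the note above, the `limit` default is now injectable from the LIMIT env var
# (previously it was skipped for inner resources); access_token still comes from secrets
os.environ["LIMIT"] = "100"

pipeline = dlt.pipeline(pipeline_name="documents_pipeline", destination="duckdb")
pipeline.run(source())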

⚠️ We still do not recommend defining parametrized inner resources.

  • You can now return data from resources instead of yielding single items. We do not recommend it for code readability. dlt always wraps resources in generators, so your return will be converted to a yield (see the short sketch after this list).
  • To return a DltResource from a resource function you must explicitly type the return value:
import dlt
from dlt.sources import DltResource

@dlt.resource
def rv_resource(name: str) -> DltResource:
    return dlt.resource([1, 2, 3], name=name, primary_key="value")
  • Normalizes config resolution behavior: default values can be overridden by providers, but explicit values cannot.
  • ⚠️ Previously, if those were instances of base configurations, the behavior was inconsistent (explicit values were treated like defaults).
  • ⚠️ If a native value is found for a config and the config does not accept native values, config resolution will now fail; previously it was ignored.
  • We now use a custom, consistent wrap and unwrap of functions. Our decorators preserve both the typing and the runtime signature of decorated functions; makefun was removed.
  • If an Incremental initializes from another Incremental as a native value, it now copies the original type correctly.
  • dlt.resource can define a configuration section (also via lambdas).
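
As mentioned in the return-from-resource item above, a minimal sketch (resource name and data are illustrative):

import dlt

@dlt.resource(name="numbers")
def numbers():
    # returning instead of yielding is now allowed; dlt wraps the resource
    # in a generator, so this return is converted to a yield
    return [{"value": 1}, {"value": 2}, {"value": 3}]

print(list(numbers()))  # the returned data is produced as if it had been yielded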

Bugfixes and improvements

  • feat: Expand sql table resource config by @xneg in #2396
  • Added write_disposition to sql table config
  • Added primary_key and merge_key to sql table config
  • Feat: Support clustered tables with custom column order in BigQuery destination by @hsm207 in #2638
  • Feat: Add configuration propagation to deltalake.write_deltalake (#2629) by @gaatjeniksaan in #2640
  • Add support for creating custom integer-range partition table in BigQuery by @hsm207 in #2676
  • Upsert merge strategy for iceberg by @anuunchin in #2671 (see the sketch after this list)
  • Feat/add athena database location option by @eric-pinkham-rw in #2708
  • motherduck destination config improvement: uppercase env var by @djudjuu in #2703
  • adds parquet support to postgres via adbc by @rudolfix in #2685
  • 2681 - fixes null on non null column arrow by @rudolfix in #2721
  • removes cffi version of psycopg2
  • mssql and snowflake bugfixes by @rudolfix in #2756
  • allows configuring config options and pragmas for duckdb, improves sql_client and tests by @rudolfix in #2730
  • logs resolved traces thread-wise, clears the log between pipeline runs by @rudolfix in #2730
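
Related to the iceberg upsert item above, a hedged sketch of selecting the strategy on a resource (table data and primary key are made up; the destination setup is omitted):

import dlt

@dlt.resource(
    primary_key="id",
    write_disposition={"disposition": "merge", "strategy": "upsert"},
)
def events():
    # on subsequent runs, rows sharing a primary key are upserted
    yield [{"id": 1, "status": "new"}, {"id": 2, "status": "open"}]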

Chores & tech debt
We are switching to uv in the coming days, and:

  • Simplify workflow files by @sh-rp in #2663
  • fix/2677: remove recursive filewatching by @zilto in #2678
  • QoL: improved __repr__() for public interface by @zilto in #2630
  • fix: incrementally watch files by @zilto in #2697
  • Simplify pipeline test utils by @sh-rp in #2566 (we use data access and dataset for testing now)
  • added constants for load_id col in _dlt_loads table by @zilto in #2729
  • Update github workflow setup by @sh-rp in #2728
  • fixes leaking datasets tests by @rudolfix in #2730

🧪 Upgrades to data access

  • Normalize model files by @sh-rp in #2507
  • dlt.transformation implementation by @sh-rp in #2528
  • [transformations] decouples sqlglot lineage and schema generation from destination identifiers by @rudolfix in #2705
  • All SQL queries are destination agnostic (see the sketch after the scalar() example below).
  • Column lineage is computed and inferred; x-annotation hints are propagated.
  • SqlModel represents a SQL query; it is processed in the extract and normalize steps and loaded in the load step.
  • You can use scalar() on data access expressions, e.g.:
# get latest processed package id
max_load_id = pipeline.dataset()._dlt_loads.load_id.max().scalar()
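
And a hedged sketch of the destination-agnostic dataset access mentioned above (table name is illustrative, pipeline as in the previous example):

# fetch a few rows from a loaded table as a pandas DataFrame;
# the same expression works regardless of the destination
preview = pipeline.dataset().my_table.limit(10).df()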

🧪 Cool experimental stuff:

Check out our new embedded pipeline explorer app

dlt pipeline <name> show --marimo
dlt pipeline <name> show --marimo --edit

Use the --edit option to enable notebook/edit mode in Marimo, plus a very cool Ibis dataset explorer.

Docs

  • docs: dlt+ iceberg destination partitioning by @burnash in #2686
  • docs: fix invalid bigquery reference in athena destination by @goober in #2700
  • docs: rest_api: clarify dlt resource and rest_api specific parameters by @burnash in #2710
  • docs: plus: add merge strategies for dlt+ Iceberg destination by @burnash in #2749
  • rest_api: document pagination hierarchy and add tests by @burnash in #2745
  • docs: add session parameter to rest_api client configuration by @burnash in #2746
  • docs: fix incorrect github_source function calls in tutorial by @axelearning in #2768

We updated the contribution guidelines:

  • By default we do not accept more destinations (except a few like DuckLake or Trino)
  • Each PR needs a test and (possibly) docs entry

New Contributors

Full Changelog: 1.11.0...1.12.0