Skip to content

2.0.0-beta5

Pre-release
Pre-release

Choose a tag to compare

@jarohen jarohen released this 13 Jan 13:41
· 2115 commits to main since this release
b48480f

We're pleased to announce the release of 2.0.0-beta5! 🚀

In amongst the usual array of bugfixes and minor improvements, 2.0.0-beta5 adds "template-friendly SQL", as well as a 4x reduction in per-transaction overhead for small transactions through the Postgres wire-server.

Template-friendly SQL

Hands up if you've ever had to get the right number of commas or ANDs when generating a SQL SELECT/WHERE clause respectively 😆 🖐

SQL was originally intended to be a language written by hand, and was created long (long) before modern languages had decent support for string templating. While SQL itself is incredibly powerful, adapting it to dynamic contexts in application code has often felt like trying to fit a square peg into a round hole. From string concatenation bugs to unexpected syntax errors caused by stray commas, many developers have experienced the frustration of working with SQL in dynamic templates.

A common workaround has been to introduce tools or libraries to generate SQL programmatically, but these come with their own set of challenges. Developers often find themselves unsure about the exact SQL being generated and sometimes need to cajole these tools into producing the desired query, adding extra complexity and reducing transparency.

With XTDB, we're introducing template-friendly SQL, a modern take on making SQL generation smoother and more forgiving in templated environments, particularly for languages and libraries that already offer strong support for string templating. Whether you're using Kotlin, Python, JavaScript, or Clojure libraries like YeSQL and HugSQL which emphasize 'getting back to SQL', this feature integrates seamlessly into workflows focused on dynamic SQL generation.

Here's what's new:

  • Flexible Comma Placement: Forget the days of meticulously managing commas in your SELECT and WHERE clauses. Taking inspiration from Clojure's "commas as whitespace," trailing commas, multiple commas in a row, or even empty predicates are gracefully handled without causing syntax errors. You can now safely generate SQL without needing additional libraries or worrying about “did I add that comma in the right place?”

    query = f"""
    SELECT *
    FROM my_table
    WHERE
        {"col1 = 'value1'," if condition1 else ''}
        {"col2 = 'value2'," if condition2 else ''}
    """
    # The extra commas are ignored, making templates more forgiving and human-readable.
  • Query Pipelining: SQL's top-level structure is relatively constrained - it must only have one SELECT, FROM, WHERE, GROUP BY, and if you require multiple steps, you need to use subqueries.

    Take this query to calculate a frequency distribution, for example:

    SELECT order_count, COUNT(*) AS freq
    FROM (SELECT customer, COUNT(*) AS order_count
          FROM orders
          GROUP BY customer) counts
    GROUP BY order_count
    ORDER BY order_count DESC

    With XTDB, you can now:

    • optionally move the SELECT to the place in the query where it logically runs
    • run multiple aggregations in a pipeline
    • elide the GROUP BY in this common case
    FROM orders
    SELECT customer, COUNT(*) AS order_count
    SELECT order_count, COUNT(*) AS freq
    
    ORDER BY order_count DESC

    To clarify: as always, the standard SQL structure still works (so your existing tooling shouldn't be affected) - this is an opt-in feature.

These enhancements align with XTDB's philosophy of developer-first tooling, reducing friction and making your code cleaner and more robust. Whether you're dynamically generating queries or simply writing complex SQL templates, template-friendly SQL lets you focus on the logic rather than the syntax.

We'd love to hear how this feature improves your workflow. Try it out and share your feedback with us!

Transaction ingestion performance

We've also spent some time improving XTDB's ingestion performance through the Postgres wire-server.

Our recommendation has always been (and continues to be) to batch your inserts wherever possible - the right granularity will obviously depend on your exact use-case but in practice we've found batches of the order of magnitude of ~1k seem to be a sweet spot. (This generally applies to whatever database you're using - XT is no different!)

Of course, though, there will be times when this isn't possible, and we'd like to make that case faster too.

So, since beta4, we've added in a micro-benchmark to measure this performance - ingesting 100k small documents at various batch sizes. On my desktop, we've seen the following improvements:

  • batches of 10: 24s -> 6s
  • batches of 100: 8s -> 1.8s
  • batches of 1000: 6s -> 1.2s

How? Well:

  • we found a case in Apache Arrow where one of the vectors had a O(n²) operation in its copy implemention
  • we've spent some time optimising our low-level Postgres wire implementation - down to the buffers, bytes and sockets
  • we're now able to recognise another class of INSERT queries that don't need to read the database to make their writes.

Again, do give it a spin, and let us know what you find on your hardware!

Elsewhere

Two breaking changes to tell you about:

  • If you are using XT in your Clojure process, we've moved our time reader-macros off the #time namespace to not clash with Henry's library.

    You can either depend on that library specifically, or replace #time/ with #xt/.

  • The Postgres wire-server and FlightSQL server now start on an unused port by default (rather than 5432/9832), to avoid clashing with other processes you may have running.

    To restore the old behaviour, you'll need to explicitly specify the port in your configuration. (The official Docker images remain on the original ports.)

As always, we'd love to hear your thoughts - you can get in touch either through GitHub, https://discuss.xtdb.com, or hello@xtdb.com.

Cheers!

James & the XT Team