Commit
Merge pull request #194 from tinybirdco/add-column-kafka-ds-ex
Add new column to kafka ds, example and documentation
Alberto Juan committed Jan 16, 2024
2 parents a9115e0 + 0b6aa82 commit 3e9630c
Showing 5 changed files with 32 additions and 21 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -12,6 +12,7 @@ This repository contains all the use cases you can iterate with Versions:

- [Add column to a Materialized View](add_new_column_to_a_materialized_view)
- [Add column to a Landing Data Source](add_nullable_column_to_landing_data_source)
- [Add column to a Kafka Data Source](add_column_kafka_data_source)
- [Change column type in a Materialized View](change_column_type_materialized_view)
- [Change Copy Pipe time granularity](change_copy_pipe_granularity)
- [Change sorting key to a Landing Data Source](change_sorting_key_landing_data_source)
41 changes: 22 additions & 19 deletions add_column_kafka_data_source/README.md
@@ -1,30 +1,33 @@
# Add a new column to a Kafka Data Source

In this use case we are going to add a column to a Kafka Data Source. The same steps work regardless of the MergeTree engine used.
- Just add the desired column (`meta_image_v3` in this example), and remember to set it as Nullable.

### 1st Kafka Data Source
> Important! Do not bump the version; leave it unchanged, otherwise you'll create a new Data Source instead of altering the current one.
[Pull Request #1](https://github.com/tinybirdco/use-case-examples/pull/79)
[Pull Request](https://github.com/tinybirdco/use-case-examples/pull/194)

You have already created a Kafka Data Source using the UI or with the CLI following [the docs](https://www.tinybird.co/docs/ingest/kafka.html).
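
A minimal sketch of that initial setup with the CLI, in case it helps; the connection-creation step is an assumption based on the linked docs (the broker details are supplied interactively or via flags), not a command taken from this repo:

```bash
# Create a Kafka connection; the CLI asks for the bootstrap server, key and
# secret, as described in the docs linked above.
tb connection create kafka

# Push the Data Source definition that consumes from the Kafka topic.
tb push datasources/my_kafka_ds.datasource
```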

### Add the new column to the Kafka Data Source

[Pull Request #2](https://github.com/tinybirdco/use-case-examples/pull/83)

- Create a new branch
- Add a new column in the Data Source. For example you can do:
```diff
diff --git a/add_column_kafka_data_source/datasources/my_kafka_ds.datasource b/add_column_kafka_data_source/datasources/my_kafka_ds.datasource
index 72ae19b..ff621f9 100644
--- a/add_column_kafka_ds/datasources/my_kafka_ds.datasource
+++ b/add_column_kafka_ds/datasources/my_kafka_ds.datasource
@@ -12,7 +12,8 @@ SCHEMA >
  `user_agent` String `json:$.user_agent`,
  `meta_color` Nullable(String) `json:$.meta.color`,
  `meta_size` Nullable(String) `json:$.meta.size`,
  `meta_image` Nullable(String) `json:$.meta.image`,
- `meta_image_v2` Nullable(String) `json:$.meta.image_v2`
+ `meta_image_v2` Nullable(String) `json:$.meta.image_v2`,
+ `meta_image_v3` Nullable(String) `json:$.meta.image_v3`
```
- Make sure you don't change the version in the `.tinyenv` file. We want to alter the existing Data Source in the `live` release; otherwise a new Data Source would be created.
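
A minimal sketch of what that means in practice; the `VERSION` value shown is an assumption inferred from the `--semver 0.0.0` flag used in the deployment commands below:

```bash
# .tinyenv must keep the semver of the live release; bumping it here would
# create a brand new Data Source instead of altering the existing one.
cat .tinyenv
# VERSION=0.0.0
```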

> We added a CI custom deployment that is only required to bypass a known bug. Basically, when working with Kafka connections we check that the new columns of the Data Source are also present in its Quarantine Data Source, but Quarantine Data Sources are not created in CI Branches by default. For that reason, we push some invalid fixtures, which forces the Quarantine Data Source to be created.
```bash
set +e # Allow errors, the append command will fail
tb datasource append my_kafka_ds datasources/fixtures/my_kafka_ds.ndjson # Hack, it's required to create quarantine table
```

The rest is similar to what a default deployment without a custom script does:

```bash
set -e # Deployment should not fail
tb --semver 0.0.0 deploy --v3 --yes
```
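
As an optional follow-up (not part of the original workflow), a quick way to verify the new column after deploying could be a query through the CLI; the query itself is illustrative:

```bash
# The new Nullable column should exist; values stay NULL until events carry meta.image_v3.
tb sql "SELECT meta_image_v3 FROM my_kafka_ds LIMIT 1"
```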
2 changes: 1 addition & 1 deletion add_column_kafka_data_source/datasources/fixtures/my_kafka_ds.ndjson
@@ -1 +1 @@
{"datetime": "bad_input", "event_id": "e123", "event_name": "click", "fingerprint": "fp1", "price": 100, "product_id": "p123", "product_type": "Electronics", "url": "https://example.com/product/p123", "user_agent": "Mozilla/5.0", "meta": {"color": "Red", "size": "M", "image": "https://example.com/images/p123.jpg", "image_v2": "https://example.com/images/v2/p125.jpg"}}
{"datetime": "bad_input", "event_id": "e123", "event_name": "click", "fingerprint": "fp1", "price": 100, "product_id": "p123", "product_type": "Electronics", "url": "https://example.com/product/p123", "user_agent": "Mozilla/5.0", "meta": {"color": "Red", "size": "M", "image": "https://example.com/images/p123.jpg", "image_v2": "https://example.com/images/v2/p125.jpg"}}
3 changes: 2 additions & 1 deletion add_column_kafka_data_source/datasources/my_kafka_ds.datasource
@@ -13,7 +13,8 @@ SCHEMA >
`meta_color` Nullable(String) `json:$.meta.color`,
`meta_size` Nullable(String) `json:$.meta.size`,
`meta_image` Nullable(String) `json:$.meta.image`,
`meta_image_v2` Nullable(String) `json:$.meta.image_v2`
`meta_image_v2` Nullable(String) `json:$.meta.image_v2`,
`meta_image_v3` Nullable(String) `json:$.meta.image_v3`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(__timestamp)"
6 changes: 6 additions & 0 deletions add_column_kafka_data_source/deploy/0.0.0/ci-deploy.sh
@@ -0,0 +1,6 @@
#!/bin/bash
set +e # Allow errors, the append command will fail
tb datasource append my_kafka_ds datasources/fixtures/my_kafka_ds.ndjson # Hack, it's required to create quarantine table
set -e # Deployment should not fail

tb --semver 0.0.0 deploy --v3 --yes
