Commit
Merge pull request #194 from tinybirdco/add-column-kafka-ds-ex
Add new column to kafka ds, example and documentation
Alberto Juan committed Jan 16, 2024
2 parents a9115e0 + 0b6aa82 commit 3e9630c
Showing 5 changed files with 32 additions and 21 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -12,6 +12,7 @@ This repository contains all the use cases you can iterate with Versions:

- [Add column to a Materialized View](add_new_column_to_a_materialized_view)
- [Add column to a Landing Data Source](add_nullable_column_to_landing_data_source)
- [Add column to a Kafka Data Source](add_column_kafka_data_source)
- [Change column type in a Materialized View](change_column_type_materialized_view)
- [Change Copy Pipe time granularity](change_copy_pipe_granularity)
- [Change sorting key to a Landing Data Source](change_sorting_key_landing_data_source)
41 changes: 22 additions & 19 deletions add_column_kafka_data_source/README.md
@@ -1,30 +1,33 @@
# Add a new column to a Kafka Data Source

In this use case we are going to add a column to a Kafka Data Source. The same steps work regardless of the MergeTree engine used.
- Just add the desired column (`meta_image_v3` in this example), and remember to set it as Nullable.

### 1st Kafka Data Source
> Important! Do not bump the version; leave it unchanged, otherwise you'll create a new Data Source instead of altering the current one.
[Pull Request #1](https://github.com/tinybirdco/use-case-examples/pull/79)
[Pull Request](https://github.com/tinybirdco/use-case-examples/pull/194)

You have already created a Kafka Data Source using the UI or with the CLI following [the docs](https://www.tinybird.co/docs/ingest/kafka.html).
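
A minimal sketch of that initial setup with the CLI, in case it helps; the connection-creation step is an assumption based on the linked docs (the broker details are supplied interactively or via flags), not a command taken from this repo:

```bash
# Create a Kafka connection; the CLI asks for the bootstrap server, key and
# secret, as described in the docs linked above.
tb connection create kafka

# Push the Data Source definition that consumes from the Kafka topic.
tb push datasources/my_kafka_ds.datasource
```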

### Add the new column to the Kafka Data Source

[Pull Request #2](https://github.com/tinybirdco/use-case-examples/pull/83)

- Create a new branch
- Add a new column in the Data Source. For example you can do:
```diff
diff --git a/add_column_kafka_data_source/datasources/my_kafka_ds.datasource b/add_column_kafka_data_source/datasources/my_kafka_ds.datasource
index 72ae19b..ff621f9 100644
--- a/add_column_kafka_ds/datasources/my_kafka_ds.datasource
+++ b/add_column_kafka_ds/datasources/my_kafka_ds.datasource
@@ -12,7 +12,8 @@ SCHEMA >
  `user_agent` String `json:$.user_agent`,
  `meta_color` Nullable(String) `json:$.meta.color`,
  `meta_size` Nullable(String) `json:$.meta.size`,
  `meta_image` Nullable(String) `json:$.meta.image`,
- `meta_image_v2` Nullable(String) `json:$.meta.image_v2`
+ `meta_image_v2` Nullable(String) `json:$.meta.image_v2`,
+ `meta_image_v3` Nullable(String) `json:$.meta.image_v3`
```
- Make sure you don't change the version in the `.tinyenv` file. We want to alter the existing Data Source in the `live` release; otherwise a new Data Source would be created.
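
A minimal sketch of what that means in practice; the `VERSION` value shown is an assumption inferred from the `--semver 0.0.0` flag used in the deployment commands below:

```bash
# .tinyenv must keep the semver of the live release; bumping it here would
# create a brand new Data Source instead of altering the existing one.
cat .tinyenv
# VERSION=0.0.0
```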

> We added a CI custom deployment that is only required to bypass a known bug. Basically, when working with Kafka connections we check that the new columns of the Data Source are also present in its Quarantine Data Source, but Quarantine Data Sources are not created in CI Branches by default. For that reason, we push some invalid fixtures, which forces the Quarantine Data Source to be created.
```bash
set +e # Allow errors, the append command will fail
tb datasource append my_kafka_ds datasources/fixtures/my_kafka_ds.ndjson # Hack, it's required to create quarantine table
```

The rest is similar to what a default deployment without a custom script does:

```bash
set -e # Deployment should not fail
tb --semver 0.0.0 deploy --v3 --yes
```
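
As an optional follow-up (not part of the original workflow), a quick way to verify the new column after deploying could be a query through the CLI; the query itself is illustrative:

```bash
# The new Nullable column should exist; values stay NULL until events carry meta.image_v3.
tb sql "SELECT meta_image_v3 FROM my_kafka_ds LIMIT 1"
```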
2 changes: 1 addition & 1 deletion add_column_kafka_data_source/datasources/fixtures/my_kafka_ds.ndjson
@@ -1 +1 @@
{"datetime": "bad_input", "event_id": "e123", "event_name": "click", "fingerprint": "fp1", "price": 100, "product_id": "p123", "product_type": "Electronics", "url": "https://example.com/product/p123", "user_agent": "Mozilla/5.0", "meta": {"color": "Red", "size": "M", "image": "https://example.com/images/p123.jpg", "image_v2": "https://example.com/images/v2/p125.jpg"}}
{"datetime": "bad_input", "event_id": "e123", "event_name": "click", "fingerprint": "fp1", "price": 100, "product_id": "p123", "product_type": "Electronics", "url": "https://example.com/product/p123", "user_agent": "Mozilla/5.0", "meta": {"color": "Red", "size": "M", "image": "https://example.com/images/p123.jpg", "image_v2": "https://example.com/images/v2/p125.jpg"}}
3 changes: 2 additions & 1 deletion add_column_kafka_data_source/datasources/my_kafka_ds.datasource
@@ -13,7 +13,8 @@ SCHEMA >
`meta_color` Nullable(String) `json:$.meta.color`,
`meta_size` Nullable(String) `json:$.meta.size`,
`meta_image` Nullable(String) `json:$.meta.image`,
`meta_image_v2` Nullable(String) `json:$.meta.image_v2`
`meta_image_v2` Nullable(String) `json:$.meta.image_v2`,
`meta_image_v3` Nullable(String) `json:$.meta.image_v3`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(__timestamp)"
6 changes: 6 additions & 0 deletions add_column_kafka_data_source/deploy/0.0.0/ci-deploy.sh
@@ -0,0 +1,6 @@
#!/bin/bash
set +e # Allow errors, the append command will fail
tb datasource append my_kafka_ds datasources/fixtures/my_kafka_ds.ndjson # Hack, it's required to create quarantine table
set -e # Deployment should not fail

tb --semver 0.0.0 deploy --v3 --yes
