## CDF for Data Mesh & Delta Sharing

<img src="https://github.com/QuentinAmbard/databricks-demo/raw/main/retail/resources/images/delta-cdf-datamesh.png" style="float:right; margin-right: 50px" width="300px" />

When sharing data within a Data Mesh and/or with external organizations through Delta Sharing, you need to share not only the existing data but also every modification, so that your consumers can capture and apply the same changes.

CDF makes a **Data Mesh** implementation easier. Once an organization enables it on a table, the data can be shared with others. Consumers can then subscribe to the modification stream and propagate GDPR DELETEs downstream.

To do so, CDF must be enabled at the table level. Once enabled, Delta records every modification made to the table, and you can read those changes with the `table_changes` function.

For more details, see the [CDF documentation](https://docs.databricks.com/delta/delta-change-data-feed.html).

In [0]:
%run ./_resources/00-setup $reset_all_data=false

In [0]:
ALTER TABLE user_delta SET TBLPROPERTIES (delta.enableChangeDataFeed = true)

#### Delta CDF table_changes output
In addition to the row data, `table_changes` returns the `_commit_version` and `_commit_timestamp` of each change, plus its type in the `_change_type` column, which takes one of four values:

| CDC Type             | Description                                                               |
|----------------------|---------------------------------------------------------------------------|
| **update_preimage**  | Content of the row before an update                                       |
| **update_postimage** | Content of the row after the update (what you want to capture downstream) |
| **delete**           | Content of a row that has been deleted                                    |
| **insert**           | Content of a new row that has been inserted                               |
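
To make the four change types concrete, here is a minimal pure-Python sketch of how a consumer might apply change rows to its own copy of the data. It is an illustration only, not how Delta applies changes internally: `apply_changes`, the in-memory `downstream` dict, and the sample rows and version numbers are all assumptions made for this example.

```python
# Metadata columns that table_changes adds to every change row.
METADATA_COLUMNS = {"_change_type", "_commit_version", "_commit_timestamp"}


def apply_changes(target, changes, key="id"):
    """Return a copy of `target` (a dict keyed by `key`) with the change rows applied."""
    result = dict(target)
    # sorted() is stable, so rows within one commit keep their original order.
    for row in sorted(changes, key=lambda r: r["_commit_version"]):
        change_type = row["_change_type"]
        data = {k: v for k, v in row.items() if k not in METADATA_COLUMNS}
        if change_type == "update_preimage":
            continue  # old value of an updated row: nothing to apply downstream
        elif change_type in ("insert", "update_postimage"):
            result[data[key]] = data  # upsert the new row content
        elif change_type == "delete":
            result.pop(data[key], None)  # remove the row if it exists
        else:
            raise ValueError(f"unknown change type: {change_type}")
    return result


# Hypothetical downstream copy and change rows (illustrative values only).
downstream = {
    1: {"id": 1, "firstname": "Ann"},
    2000: {"id": 2000, "firstname": "Bob"},
}
changes = [
    {"id": 1, "firstname": "Ann", "_change_type": "update_preimage", "_commit_version": 8},
    {"id": 1, "firstname": "John", "_change_type": "update_postimage", "_commit_version": 8},
    {"id": 2000, "firstname": "Bob", "_change_type": "delete", "_commit_version": 9},
]

updated = apply_changes(downstream, changes)
print(updated)  # {1: {'id': 1, 'firstname': 'John'}}
```

The `update_preimage` row is skipped because only the post-update content matters downstream, while the `delete` row removes the matching key.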

In [0]:
UPDATE user_delta SET firstname = 'John' WHERE id < 10;

DELETE FROM user_delta WHERE id > 1000;

In [0]:
DESCRIBE HISTORY user_delta;

In [0]:
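-- Read all changes starting at table version 9 (see DESCRIBE HISTORY above for the available versions)
SELECT * FROM table_changes("user_delta", 9)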

In [0]:
SELECT * FROM table_changes("user_delta", 8)

In [0]:
SELECT DISTINCT _change_type FROM table_changes("user_delta", 8)

## Using CDF to capture incremental changes (stream)
To capture the latest changes from your table, you can use the Spark Structured Streaming API.

You can then subscribe to the modification stream of any of your tables, for example to propagate GDPR DELETEs downstream.

In [0]:
%python
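# Stream the change feed of user_delta, starting from table version 7
stream = (spark.readStream.format("delta")
                .option("readChangeFeed", "true")
                .option("startingVersion", 7)
                .table("user_delta"))

# Display the stream; get_chkp_folder and folder are expected to come from the setup notebook run above
display(stream, get_chkp_folder(folder))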
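
To illustrate the GDPR use case mentioned above, here is a minimal sketch of a `foreachBatch` handler that removes from a downstream table every row deleted upstream. It rests on several assumptions: Spark 3.3 or later (for `DataFrame.sparkSession`), a hypothetical downstream Delta table named `user_delta_downstream` that shares the `id` key, and the illustrative names `propagate_deletes` and `deleted_keys`.

```python
def propagate_deletes(batch_df, batch_id, target_table="user_delta_downstream", key="id"):
    """foreachBatch handler: delete from `target_table` every key deleted upstream.

    `target_table` is a hypothetical downstream Delta table used for illustration.
    """
    # Keep only the delete events of this micro-batch, one row per key.
    deleted_keys = (batch_df
                    .filter("_change_type = 'delete'")
                    .select(key)
                    .dropDuplicates())
    # Expose the keys to SQL in the same session that owns the micro-batch.
    deleted_keys.createOrReplaceTempView("deleted_keys")
    batch_df.sparkSession.sql(f"""
        MERGE INTO {target_table} AS t
        USING deleted_keys AS d
        ON t.{key} = d.{key}
        WHEN MATCHED THEN DELETE
    """)
```

The handler could be attached to the change feed stream with `stream.writeStream.foreachBatch(propagate_deletes).option("checkpointLocation", ...).start()`, using a checkpoint path of your choice. A `MERGE` is used here so that keys already absent downstream are ignored.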

In [0]:
%python
DBDemos.stop_all_streams()