vdk-events: update Ingest and Anonymize workshop (#2891)
Add two surveys/quizzes, one at the start and one at the end.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
antoniivanov and pre-commit-ci[bot] committed Nov 13, 2023
1 parent 16793ad commit 49ac35d
Showing 1 changed file with 76 additions and 29 deletions.
105 changes: 76 additions & 29 deletions events/workshops/ingest-anonymize/IngestAndAnonymizeWorkshop.ipynb
@@ -4,7 +4,6 @@
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyNuXm0FYkB1JhBw54WVYWVR",
"include_colab_link": true
},
"kernelspec": {
@@ -23,7 +22,7 @@
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/antoniivanov/vdk-demo/blob/main/ingest-anonymize-workshop/IngestAndAnonymizeWorkshop.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
"<a href=\"https://colab.research.google.com/github/vmware/versatile-data-kit/blob/main/events/workshops/ingest-anonymize/IngestAndAnonymizeWorkshop.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
@@ -124,14 +123,39 @@
{
"cell_type": "code",
"source": [
"!pip install quickstart-vdk vdk-notebook vdk-ipython"
"!pip install quickstart-vdk vdk-notebook vdk-ipython vdk-singer tap-rest-api-msdk"
],
"metadata": {
"id": "A4cmJRoOHyUV"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### 1.4. Intro quiz"
],
"metadata": {
"id": "bWO_0_v1kAwB"
}
},
{
"cell_type": "code",
"source": [
"from IPython.display import display, HTML\n",
"\n",
"iframe_html = \"\"\"\n",
"<iframe width=\"100%\" height=\"600px\" src=\"https://forms.office.com/Pages/ResponsePage.aspx?id=yjiRs-48Skuk1s2D2d1i8HWjU6i4VDpCiaHEIRxgSIpUOUZLMUFUNUoyUk9TODYzOU1GTU5KVklUQi4u&embed=true\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" style=\"border: none; max-width:100%; max-height:100vh\" allowfullscreen webkitallowfullscreen mozallowfullscreen msallowfullscreen> </iframe>\"\"\"\n",
"\n",
"display(HTML(iframe_html))\n"
],
"metadata": {
"id": "V1feu4xej_2c"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
@@ -250,20 +274,6 @@
"id": "jwlP2bHNnqoT"
}
},
{
"cell_type": "code",
"source": [
"# Ingest list or iterable or collection of data (rows) you can use\n",
"job_input.send_tabular_data_for_ingestion(rows=[[1,2], [11, 22]],\n",
" destination_table = \"dummy_sent_data\",\n",
" column_names = ['column_1', 'column_2'])"
],
"metadata": {
"id": "Eifx8Dx-Rq1J"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
@@ -295,6 +305,15 @@
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"For no-code or low-code ingestion, you can check out [this tutorial](https://colab.research.google.com/github/vmware/versatile-data-kit/blob/main/events/data-sources/Ingest.ipynb)."
],
"metadata": {
"id": "j0y9qzq8p7bB"
}
},
{
"cell_type": "markdown",
"source": [
@@ -370,9 +389,10 @@
"import pandas as pd\n",
"# TODO:\n",
"# HINTs:\n",
"# - use job_input.send.... methods (see above examples)\n",
"# - requests.get().json()\n",
"# - pandas has pd.json_normalize useful to flatten nested objects\n",
"# - You can get the data using data = requests.get(url).json()\n",
"# - flattened_df = pd.json_normalize(data, sep='_') is useful to flatten nested objects\n",
"# - use job_input.send.... methods (see above examples) for pandas data frames\n",
"\n",
"\n"
],
"metadata": {
@@ -397,7 +417,7 @@
"cell_type": "markdown",
"source": [
"<a name=\"anonymization\"></a>\n",
"## Anonymization Plugin (Task 3)\n",
"## 5. Anonymization Plugin (Task 3)\n",
"\n",
"- The plugin should be configurable using `anonymization_fields={table_name: [columns], ...}`\n",
"\n",
@@ -414,7 +434,7 @@
{
"cell_type": "markdown",
"source": [
"### Benefits of Using a Pre-Ingest Plugin for Data Anonymization"
"### 5.1 Benefits of Using a Pre-Ingest Plugin for Data Anonymization"
],
"metadata": {
"id": "QjN8SwXabR8h"
@@ -436,7 +456,7 @@
{
"cell_type": "markdown",
"source": [
"### Generate a VDK Plugin package project"
"### 5.2 Generate a VDK Plugin package project"
],
"metadata": {
"id": "DuGGzyvwJOEm"
@@ -481,7 +501,7 @@
{
"cell_type": "markdown",
"source": [
"### Implement Pre Ingest Process Hook\n",
"### 5.3 Implement Pre Ingest Process Hook\n",
"\n",
"See the [ingester hooks documentation](https://github.com/vmware/versatile-data-kit/blob/7fba4f7c5c4da968e80d6a562b44517433b76e73/projects/vdk-core/src/vdk/api/plugin/plugin_input.py#L230)\n",
"\n",
@@ -518,7 +538,7 @@
{
"cell_type": "markdown",
"source": [
"### What configuration does your plugin need? \n",
"### 5.4 What configuration does your plugin need? \n",
"\n",
"Edit plugin_entry.py and add `vdk_configure` hook\n",
"\n",
@@ -556,7 +576,7 @@
{
"cell_type": "markdown",
"source": [
"### Initialize and register the ingester hooks\n",
"### 5.5 Initialize and register the ingester hooks\n",
"\n",
"Copy this in plugin_entry.py\n",
"\n",
@@ -581,7 +601,7 @@
{
"cell_type": "markdown",
"source": [
"### Install the newly built plugin"
"### 5.6 Install the newly built plugin"
],
"metadata": {
"id": "_248rdZROtSr"
@@ -612,7 +632,7 @@
{
"cell_type": "markdown",
"source": [
"### Configure the new plugin"
"### 5.7 Configure the new plugin"
],
"metadata": {
"id": "NkhjJ8EVdeS5"
@@ -645,7 +665,7 @@
{
"cell_type": "markdown",
"source": [
"### Test and verify\n",
"### 5.8 Test and verify\n",
"\n",
"Go back to step [**4. Ingesting Users data**](#ingest) and run it again. The data should now be anonymized."
],
@@ -662,6 +682,33 @@
"id": "BQzaocdzeITg"
}
},
{
"cell_type": "markdown",
"source": [
"<a name=\"final-quiz\"></a>\n",
"## Final Quiz"
],
"metadata": {
"id": "ru3dTo25oEAc"
}
},
{
"cell_type": "code",
"source": [
"from IPython.display import display, HTML\n",
"\n",
"iframe_html = \"\"\"\n",
"<iframe width=\"100%\" height=\"480px\" src=\"https://forms.office.com/Pages/ResponsePage.aspx?id=yjiRs-48Skuk1s2D2d1i8HWjU6i4VDpCiaHEIRxgSIpUMUlXTE9LS0lQQzFXNE5BUEdCWkVSRk5PNi4u&embed=true\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" style=\"border: none; max-width:100%; max-height:100vh\" allowfullscreen webkitallowfullscreen mozallowfullscreen msallowfullscreen> </iframe>\n",
"\"\"\"\n",
"\n",
"display(HTML(iframe_html))"
],
"metadata": {
"id": "QljxdCn_oJfQ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
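The Task 2 hints in this diff point at `requests.get(url).json()` followed by `pd.json_normalize(data, sep='_')` to flatten nested objects before ingestion. As a rough illustration of what that flattening step produces, here is a standard-library sketch. The helper name `flatten_record` and the sample record are hypothetical and not part of the notebook; the workshop itself suggests pandas for this step.

```python
from typing import Any, Dict


def flatten_record(record: Dict[str, Any], sep: str = "_", prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining key paths with `sep`.

    Hypothetical helper: it mimics the column names that
    pd.json_normalize(data, sep='_') produces for nested dictionaries.
    Non-dict values (including lists) are kept as they are.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the key path as prefix.
            flat.update(flatten_record(value, sep=sep, prefix=name))
        else:
            flat[name] = value
    return flat


# Made-up sample record, shaped like a typical nested JSON API response.
user = {
    "id": 1,
    "name": "Ann",
    "address": {"city": "Sofia", "geo": {"lat": "42.7", "lng": "23.3"}},
}

flat_user = flatten_record(user)
print(flat_user)
```

Each flattened dictionary maps directly to one row with simple column names, which is the shape a tabular destination expects.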

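The Task 3 section of the notebook says the plugin should be configurable with `anonymization_fields={table_name: [columns], ...}`. The sketch below shows one way the core transformation could look, independent of any VDK API. The function name `anonymize_rows`, the choice of SHA-256, and the sample rows are assumptions made for illustration; the diff does not show which technique the workshop solution uses.

```python
import hashlib
from typing import Any, Dict, List


def anonymize_rows(
    rows: List[Dict[str, Any]],
    table_name: str,
    anonymization_fields: Dict[str, List[str]],
) -> List[Dict[str, Any]]:
    """Return copies of `rows` with the configured columns hashed.

    Hypothetical helper: `anonymization_fields` follows the
    {table_name: [columns]} shape described in the notebook.
    SHA-256 is an assumed choice, not confirmed by the source.
    """
    columns = set(anonymization_fields.get(table_name, []))
    result = []
    for row in rows:
        new_row = {}
        for key, value in row.items():
            if key in columns and value is not None:
                # Replace the sensitive value with its hex digest.
                new_row[key] = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
            else:
                new_row[key] = value
        result.append(new_row)
    return result


# Made-up sample data and configuration.
config = {"users": ["email"]}
rows = [{"id": 1, "name": "Ann", "email": "ann@example.com"}]

anonymized = anonymize_rows(rows, "users", config)
untouched = anonymize_rows(rows, "orders", config)
print(anonymized)
```

In a pre-ingest hook, a function like this would be applied to the payload before it is sent on, so the raw values never reach the destination. Note that unsalted hashing of guessable values such as emails is closer to pseudonymization; a keyed hash would be a stronger choice.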