To set up, first create a cluster and attach each of these libraries.

```
# Maven coordinates
com.ionic:ionic-sdk:2.6.0
org.slf4j:slf4j-api:1.7.30
org.slf4j:slf4j-simple:1.7.30

# Upload Jar from releases
https://github.com/turtlemonvh/ionic-spark-utils/releases/tag/v0.0.2
```

The cluster "libraries" page should look like this:

![databricks-libraries](databricks-libraries.png)

We'll store the device profile in [Databricks' native secret store](https://docs.databricks.com/security/secrets/secrets.html), using the Databricks CLI for this step.

```bash
# Install the databricks cli
# https://docs.databricks.com/dev-tools/cli/index.html#install-the-cli
$ pip install databricks-cli

# Configure auth for the cli
$ databricks configure
Databricks Host (should begin with https://): https://dbc-9da8a959-d077.cloud.databricks.com/
Username: timothy@ionicsecurity.com
Password:
Repeat for confirmation:

# Create a secret scope
# https://docs.databricks.com/security/secrets/secret-scopes.html#create-a-databricks-backed-secret-scope
$ databricks secrets create-scope --scope ionic-demo --initial-manage-principal users

# Confirm you can list the contents of the scope
$ databricks secrets list --scope ionic-demo
Key name    Last updated
----------  --------------

# Use the machina CLI to create a plaintext copy of a profile, so we can load that into secrets: https://dev.ionic.com/tools/machina
# Docs on creating a profile can be found here: https://dev.ionic.com/getting-started/create-ionic-profile
$ machina profile move -d O_6t.e.06a7dcec-7f4b-4361-543e-048e4b1a733c -t plaintext -f profile.tmp
# Set that profile as active within the plaintext persistor
$ machina -t plaintext -f profile.tmp profile set -d O_6t.e.06a7dcec-7f4b-4361-543e-048e4b1a733c

# Load that profile into databricks and delete the local unencrypted copy
# Quote the substitution so the shell passes the JSON through as a single argument
$ databricks secrets put --scope ionic-demo --key demo-profile --string-value "$(cat profile.tmp)"
$ rm profile.tmp

# Ensure the new value shows up in the list
$ databricks secrets list --scope ionic-demo
Key name        Last updated
------------  --------------
demo-profile   1590122961171
```

In [3]:
# Now we'll make sure we can load that secret. We'll use python for this step.
# Databricks helpfully redacts the secret value so we don't do anything silly.

dbutils.secrets.get(scope="ionic-demo", key="demo-profile")

In [4]:
# We do want to be able to work with this value, though, so let's check that we can parse the JSON form of the profile and grab the device ID.

import json

json.loads(dbutils.secrets.get(scope="ionic-demo", key="demo-profile"))["profiles"][0]["deviceId"]
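
If the stored persistor holds more than one profile, it can help to list every device ID rather than only the first. The following is a minimal sketch that assumes only the JSON shape used above (a top-level `profiles` list whose entries carry a `deviceId`). The helper name `device_ids` and the sample document are hypothetical; in the notebook you would pass the string returned by `dbutils.secrets.get` instead.

```python
import json


def device_ids(profile_json: str) -> list:
    """Return the deviceId of every profile in a plaintext persistor JSON string.

    Assumes the shape used above: {"profiles": [{"deviceId": ...}, ...]}.
    """
    document = json.loads(profile_json)
    return [profile["deviceId"] for profile in document.get("profiles", [])]


# Hypothetical stand-in for the secret value; real profiles carry more fields.
sample = json.dumps({"profiles": [{"deviceId": "example-device-1"},
                                  {"deviceId": "example-device-2"}]})
print(device_ids(sample))
```

Any ID returned this way should be usable as the argument to `setActiveProfile` in the Scala cells that follow.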

In [5]:
%scala
// Now we can start using these credentials to do something interesting.
// Let's start by creating an Ionic agent.

import com.ionic.sdk.agent.Agent
import com.ionic.sdk.device.profile.persistor.DeviceProfiles

// We need to mark these as transient because they are not serializable, and Databricks tries to serialize them into the scope available to each executor
// https://www.scala-lang.org/files/archive/spec/2.11/11-annotations.html#java-platform-annotations
// https://docs.oracle.com/javase/specs/jls/se8/html/jls-8.html#jls-8.3.1.3
// https://docs.oracle.com/javase/8/docs/api/java/beans/Transient.html
@transient val profileJson = dbutils.secrets.get(scope="ionic-demo", key="demo-profile")
@transient val deviceProfiles = new DeviceProfiles(profileJson)
@transient val a = new Agent(deviceProfiles)

In [6]:
%scala

// Check our active profile (setting if needed)

a.setActiveProfile("O_6t.e.06a7dcec-7f4b-4361-543e-048e4b1a733c")
a.getActiveProfile().getDeviceId()

In [7]:
%scala
// Let's create and fetch a key just for fun

// https://dev.ionic.com/sdk_docs/ionic_platform_sdk/java/version_2.7.0/sdk/com/ionic/sdk/agent/Agent.html
@transient val keyid = a.createKey().getFirstKey().getId() 

In [8]:
%scala
// Fetch

a.getKey(keyid)

In [9]:
%scala
// Now that we have the basics, we'll move into some Spark-specific code

// We'll be pretending that some data got dumped into the raw ingest section of our data lake, and we need to clean this up so
// we can use the data in clean tables for downstream jobs.

// We'll work with sample data from: https://docs.databricks.com/getting-started/spark/datasets.html#load-sample-data
// We'll start by loading the dataset and displaying it to get an idea of the contents.
val ds = spark.read.json("/databricks-datasets/iot/iot_devices.json")
ds.persist
display(ds.limit(10))

battery_level,c02_level,cca2,cca3,cn,device_id,device_name,humidity,ip,latitude,lcd,longitude,scale,temp,timestamp
8,868,US,USA,United States,1,meter-gauge-1xbYRYcj,51,68.161.225.1,38.0,green,-97.0,Celsius,34,1458444054093
7,1473,NO,NOR,Norway,2,sensor-pad-2n2Pea,70,213.161.254.1,62.47,red,6.15,Celsius,11,1458444054119
2,1556,IT,ITA,Italy,3,device-mac-36TWSKiT,44,88.36.5.1,42.83,red,12.83,Celsius,19,1458444054120
6,1080,US,USA,United States,4,sensor-pad-4mzWkz,32,66.39.173.154,44.06,yellow,-121.32,Celsius,28,1458444054121
4,931,PH,PHL,Philippines,5,therm-stick-5gimpUrBB,62,203.82.41.9,14.58,green,120.97,Celsius,25,1458444054122
3,1210,US,USA,United States,6,sensor-pad-6al7RTAobR,51,204.116.105.67,35.93,yellow,-85.46,Celsius,27,1458444054122
3,1129,CN,CHN,China,7,meter-gauge-7GeDoanM,26,220.173.179.1,22.82,yellow,108.32,Celsius,18,1458444054123
0,1536,JP,JPN,Japan,8,sensor-pad-8xUD6pzsQI,35,210.173.177.1,35.69,red,139.69,Celsius,27,1458444054123
3,807,JP,JPN,Japan,9,device-mac-9GcjZ2pw,85,118.23.68.227,35.69,green,139.69,Celsius,13,1458444054124
7,1470,US,USA,United States,10,sensor-pad-10BsywSYUF,56,208.109.163.218,33.61,red,-111.89,Celsius,26,1458444054125


In [10]:
%scala

// We're working on a reasonably sized dataframe
ds.count

In [11]:
%scala

// Now for the security details.
// Let's assume that the device name is sensitive and must be encrypted before we can start using this data in downstream jobs.
// We could just drop that column, but it may be useful downstream to some jobs with permission to work with that data.
// So we're going to encrypt that column.

import io.github.turtlemonvh.ionicsparkutils.KeyServicesCache;
import com.ionic.sdk.key.KeyServices;

// I'm leaving this string outside of the agentFactory because I was having issues getting dbutils 
// to behave inside the agentFactory code, which runs on each Spark executor node.
val profileJson = dbutils.secrets.get(scope="ionic-demo", key="demo-profile")

// We need a function that can create an agent (technically an instance of a class implementing the KeyServices interface) on each Spark executor.
// Here we'll use the "profileJson", which Spark will serialize and make available in `agentFactory` when it runs on each executor.
// Creating the new agent has some overhead, but this is minor for larger transform operations.
// We're considering other strategies to reduce overhead going forward.

// We also wrap the agent in a caching layer so we only fetch a key once and use the same key to encrypt all the data in the column.
// Since this moves us from performing thousands of sequential http requests (to create keys) to using a local cache, performance is vastly improved.
// Empirically, runtime for the transform operation goes from >30 min to <10s (and most of that is Spark task orchestration overhead).

def agentFactory(): KeyServices = {
  val deviceProfiles = new DeviceProfiles(profileJson)
  val threadLocalAgent = new Agent(deviceProfiles)
  threadLocalAgent.setActiveProfile("O_6t.e.06a7dcec-7f4b-4361-543e-048e4b1a733c")
  new KeyServicesCache(threadLocalAgent)
}
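
To make the effect of the caching layer concrete, here is a small Python sketch of the general idea: create a key on first use, then reuse it for every later value. It is not the implementation of `KeyServicesCache`; the class `CachingKeyService`, the `make_service` factory, and the key ID string are all hypothetical, and the "remote call" is a plain function standing in for an HTTP request.

```python
from typing import Callable, Optional


class CachingKeyService:
    """Illustrative cache: create one key on first use, then reuse it."""

    def __init__(self, create_key: Callable[[], str]):
        self._create_key = create_key  # the expensive call (an HTTP request in practice)
        self._cached_key: Optional[str] = None
        self.remote_calls = 0  # counts how often the expensive call actually ran

    def get_key(self) -> str:
        if self._cached_key is None:
            self.remote_calls += 1
            self._cached_key = self._create_key()
        return self._cached_key


def make_service() -> CachingKeyService:
    # Factory: the service is built where it is used, in the spirit of
    # agentFactory being invoked on each executor.
    return CachingKeyService(create_key=lambda: "example-key-id")


service = make_service()
keys = [service.get_key() for _ in range(1000)]
print(service.remote_calls, len(set(keys)))
```

One thousand lookups trigger a single simulated remote call, which is the same trade the notebook makes: one key protects the whole column in exchange for far fewer round trips.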

In [12]:
%scala
import io.github.turtlemonvh.ionicsparkutils.{Transformers => IonicTransformers};

// Now we're going to transform the dataset by encrypting a column.
// The resulting column will have the prefix "ionic_enc_", so we'll
// take that column and rename it.

val encryptedDF = ds
.transform(IonicTransformers.Encrypt(
  encryptCols = List("device_name"),
  decryptCols = List(),
  agentFactory = agentFactory
))
.drop("device_name")
.withColumnRenamed("ionic_enc_device_name", "device_name")

encryptedDF.persist

display(encryptedDF.limit(10))

battery_level,c02_level,cca2,cca3,cn,device_id,humidity,ip,latitude,lcd,longitude,scale,temp,timestamp,device_name
8,868,US,USA,United States,1,51,68.161.225.1,38.0,green,-97.0,Celsius,34,1458444054093,~!3!O_6tPOBBEcY!gRyuG9Ks8s62ez+DRa3H0a+si/LIK30oZvvJ1AoWt2UXoInNjTcgsto9mA/MIaGxrd/KQQ!
7,1473,NO,NOR,Norway,2,70,213.161.254.1,62.47,red,6.15,Celsius,11,1458444054119,~!3!O_6tPOBBEcY!XUgNrEO+ubmzU/lhIeFeoqlMLR5BBU49nNEEqK19uKx41QOlbbjN5l5sBV9EFVWddA!
2,1556,IT,ITA,Italy,3,44,88.36.5.1,42.83,red,12.83,Celsius,19,1458444054120,~!3!O_6tPOBBEcY!9OWbJmEt6jls8rNIeTMkh0Oogb+bPKdgDOJg8T7XdaIPZdqkpCSgdZZVFPimvGIZ1FAQ!
6,1080,US,USA,United States,4,32,66.39.173.154,44.06,yellow,-121.32,Celsius,28,1458444054121,~!3!O_6tPOBBEcY!XsxxgRU6yd39CLD7nHgWcTmiVtRFHzG8pDPcy6HHhgezBBCVWsFl/sQGtC+87vwqrw!
4,931,PH,PHL,Philippines,5,62,203.82.41.9,14.58,green,120.97,Celsius,25,1458444054122,~!3!O_6tPOBBEcY!Sji+zQpttz5+ncfzABbjZFCrrU66UF31L1OsrhI7vJa8fkpOgu1LOn7WKR1KAe50kFotlAo!
3,1210,US,USA,United States,6,51,204.116.105.67,35.93,yellow,-85.46,Celsius,27,1458444054122,~!3!O_6tPOBBEcY!Boa5j1MBWrUxqECVO5ZvZmWd0gwOCAk7gO5KTliNDQewZQ43MhvsfV25XRewavOtab3DPJY!
3,1129,CN,CHN,China,7,26,220.173.179.1,22.82,yellow,108.32,Celsius,18,1458444054123,~!3!O_6tPOBBEcY!nXafBw+m9w+ykeWgkfRcX1Dg98qzlr+UeQZJobBE6t/13loD4yWoie71x4XjnfIPxA2nRg!
0,1536,JP,JPN,Japan,8,35,210.173.177.1,35.69,red,139.69,Celsius,27,1458444054123,~!3!O_6tPOBBEcY!8FFkPm5L2x1iWmbgu7XmE8U1psgBdidjqtWrAKzpwx2Lk43dsKplS/XrMnvo8jHZbfEEAq4!
3,807,JP,JPN,Japan,9,85,118.23.68.227,35.69,green,139.69,Celsius,13,1458444054124,~!3!O_6tPOBBEcY!JCR92niyWRb/WYfZwdnhvAzW91Vgd3ErgAWd367fiYH2NsqS1bFPJIISH3UlyL+OqEAh!
7,1470,US,USA,United States,10,56,208.109.163.218,33.61,red,-111.89,Celsius,26,1458444054125,~!3!O_6tPOBBEcY!RW0JMIVFwRNL9UzEkxuPANqFApQtDjg5YmzZGQSiT6vHFiCQtYdZLK+d3KoN5dbp80EWqbk!
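
Every encrypted value above appears to follow the same `~!3!<key id>!<payload>!` layout, and each row shows the same key ID, which is what we would expect with the caching layer in place. The sketch below splits two of the values from the output to check this. The layout is inferred from the sample output only, not taken from the Ionic documentation, and the function name `parse_envelope` and its field names are assumptions.

```python
def parse_envelope(value: str):
    """Split a value of the form ~!<version>!<key_id>!<payload>! into its parts.

    The layout is inferred from the notebook output; treat it as an assumption.
    """
    parts = value.split("!")
    if len(parts) != 5 or parts[0] != "~" or parts[4] != "":
        raise ValueError("value is not in the ~!version!key_id!payload! layout")
    _, version, key_id, payload, _ = parts
    return version, key_id, payload


# Two encrypted device_name values copied from the output above.
rows = [
    "~!3!O_6tPOBBEcY!gRyuG9Ks8s62ez+DRa3H0a+si/LIK30oZvvJ1AoWt2UXoInNjTcgsto9mA/MIaGxrd/KQQ!",
    "~!3!O_6tPOBBEcY!XUgNrEO+ubmzU/lhIeFeoqlMLR5BBU49nNEEqK19uKx41QOlbbjN5l5sBV9EFVWddA!",
]

key_ids = {parse_envelope(row)[1] for row in rows}
print(key_ids)
```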


In [13]:
%scala

// Now that the sensitive data is encrypted we can save it in a table for later use.
// Since at this point the data is just a string, that's not very exciting, so we'll skip that step.

// Instead, we'll show how we can decrypt the encrypted column of the dataset to get back plaintext.
// The resulting column will have the prefix "ionic_dec_", so we'll
// take that column and rename it.

val decryptedDF = encryptedDF
.transform(IonicTransformers.Encrypt(
  encryptCols = List(),
  decryptCols = List("device_name"),
  agentFactory = agentFactory
))
.drop("device_name")
.withColumnRenamed("ionic_dec_device_name", "device_name")

decryptedDF.persist

display(decryptedDF.limit(10))

battery_level,c02_level,cca2,cca3,cn,device_id,humidity,ip,latitude,lcd,longitude,scale,temp,timestamp,device_name
8,868,US,USA,United States,1,51,68.161.225.1,38.0,green,-97.0,Celsius,34,1458444054093,meter-gauge-1xbYRYcj
7,1473,NO,NOR,Norway,2,70,213.161.254.1,62.47,red,6.15,Celsius,11,1458444054119,sensor-pad-2n2Pea
2,1556,IT,ITA,Italy,3,44,88.36.5.1,42.83,red,12.83,Celsius,19,1458444054120,device-mac-36TWSKiT
6,1080,US,USA,United States,4,32,66.39.173.154,44.06,yellow,-121.32,Celsius,28,1458444054121,sensor-pad-4mzWkz
4,931,PH,PHL,Philippines,5,62,203.82.41.9,14.58,green,120.97,Celsius,25,1458444054122,therm-stick-5gimpUrBB
3,1210,US,USA,United States,6,51,204.116.105.67,35.93,yellow,-85.46,Celsius,27,1458444054122,sensor-pad-6al7RTAobR
3,1129,CN,CHN,China,7,26,220.173.179.1,22.82,yellow,108.32,Celsius,18,1458444054123,meter-gauge-7GeDoanM
0,1536,JP,JPN,Japan,8,35,210.173.177.1,35.69,red,139.69,Celsius,27,1458444054123,sensor-pad-8xUD6pzsQI
3,807,JP,JPN,Japan,9,85,118.23.68.227,35.69,green,139.69,Celsius,13,1458444054124,device-mac-9GcjZ2pw
7,1470,US,USA,United States,10,56,208.109.163.218,33.61,red,-111.89,Celsius,26,1458444054125,sensor-pad-10BsywSYUF
