# Overview

In this notebook, we will explore how to create Azure Purview entities, classifications, and lineage using the Atlas APIs.

## Prerequisites

- [Python 3](https://www.python.org/downloads/)
- [Az CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli)

In [None]:
import json
import os

In [None]:
SUBSCRIPTION_ID = "TODO" # fill in
RESOURCE_GROUP = "TODO" # fill in
PURVIEW_NAME = "TODO" # fill in
SERVICE_PRINCIPAL_NAME = "TODO" # fill in

In [None]:
!az login

In [None]:
!az account set --subscription {SUBSCRIPTION_ID}

In [None]:
# Create service principal to access Purview endpoint
sp = !az ad sp create-for-rbac \
    --name "http://{SERVICE_PRINCIPAL_NAME}" \
    --role "Purview Data Curator" \
    --scopes /subscriptions/{SUBSCRIPTION_ID}/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.Purview/accounts/{PURVIEW_NAME}

In [None]:
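# The CLI may print warnings before the JSON credentials, and the number of
# fields differs across CLI versions, so parse from the opening brace onward.
start = next(i for i, line in enumerate(sp) if line.strip().startswith("{"))
sp = json.loads("".join(sp[start:]))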

In [None]:
# Install Atlas Python client (https://github.com/wjohnson/pyapacheatlas)
!pip install pyapacheatlas

In [None]:
from pyapacheatlas.auth import ServicePrincipalAuthentication
from pyapacheatlas.core import PurviewClient


oauth = ServicePrincipalAuthentication(
    tenant_id=os.environ.get("TENANT_ID", sp['tenant']),
    client_id=os.environ.get("CLIENT_ID", sp['appId']),
    client_secret=os.environ.get("CLIENT_SECRET", sp['password'])
)
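Each `os.environ.get(NAME, fallback)` call above lets an environment variable override the value parsed from the service principal output. A minimal sketch of that precedence rule as a testable function; `resolve_credentials` is a hypothetical helper, and the credential values are made up.

```python
import os
from typing import Mapping, Optional


def resolve_credentials(sp: Mapping[str, str],
                        env: Optional[Mapping[str, str]] = None) -> dict:
    """Prefer environment variables; fall back to the parsed `az` output."""
    env = os.environ if env is None else env
    return {
        "tenant_id": env.get("TENANT_ID", sp["tenant"]),
        "client_id": env.get("CLIENT_ID", sp["appId"]),
        "client_secret": env.get("CLIENT_SECRET", sp["password"]),
    }


# Hypothetical values, for illustration only.
fake_sp = {"tenant": "tenant-from-cli", "appId": "app-from-cli", "password": "secret-from-cli"}
creds = resolve_credentials(fake_sp, env={"CLIENT_ID": "app-from-env"})
print(creds)
```

Because the keys match the keyword arguments used above, the result could be passed as `ServicePrincipalAuthentication(**creds)`.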


In [None]:
# Instantiate the PurviewClient
client = PurviewClient(
    account_name=os.environ.get("PURVIEW_NAME", PURVIEW_NAME),
    authentication=oauth
)

In [None]:
from pyapacheatlas.core import AtlasEntity


# Create the entities.
# Each entity needs a name, typeName, qualified_name, and guid.
# The guid is a placeholder: it must be a negative number that is
# unique within the batch being uploaded.

input01_qn = "pyapacheatlas://demoinputclassification01"
input02_qn = "pyapacheatlas://demoinputclassification02"
output01_qn = "pyapacheatlas://demooutput01"
dataset_type_name = "DataSet"

input01 = AtlasEntity(
    name="input01",
    typeName=dataset_type_name,
    qualified_name=input01_qn,
    guid=-100
)
input02 = AtlasEntity(
    name="input02",
    typeName=dataset_type_name,
    qualified_name=input02_qn,
    guid=-101
)
output01 = AtlasEntity(
    name="output01",
    typeName=dataset_type_name,
    qualified_name=output01_qn,
    guid=-102
)

results = client.upload_entities(
    batch=[input01, input02, output01]
)

After the AtlasEntities are created, you will be able to see these assets within the Purview portal.

![Azure Purview Browse Asset Page](./img/purview_browse_asset.png)

![Azure Purview Custom Asset Page](./img/purview_custom_assets.png)
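To make the batch concrete, here is a plain-Python sketch of roughly what the three entities look like as Atlas v2 entity JSON, runnable without a Purview account. The `build_entity` and `build_batch` helpers are hypothetical, and pyapacheatlas may emit additional fields. The sketch also enforces the rule stated in the code comment that placeholder GUIDs be negative and unique within a batch.

```python
import json


def build_entity(name, type_name, qualified_name, guid):
    """Return one entity in the Atlas v2 JSON shape (hypothetical helper)."""
    return {
        "typeName": type_name,
        "guid": str(guid),
        "attributes": {"name": name, "qualifiedName": qualified_name},
    }


def build_batch(entities):
    """Wrap entities for a bulk upload after checking the placeholder GUIDs."""
    guids = [int(e["guid"]) for e in entities]
    if any(g >= 0 for g in guids):
        raise ValueError("placeholder GUIDs must be negative")
    if len(set(guids)) != len(guids):
        raise ValueError("placeholder GUIDs must be unique within a batch")
    return {"entities": entities}


batch = build_batch([
    build_entity("input01", "DataSet", "pyapacheatlas://demoinputclassification01", -100),
    build_entity("input02", "DataSet", "pyapacheatlas://demoinputclassification02", -101),
    build_entity("output01", "DataSet", "pyapacheatlas://demooutput01", -102),
])
print(json.dumps(batch["entities"][0], indent=2))
```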

In [None]:
# Collect the GUIDs that Purview assigned to the new entities
guids = list(results["guidAssignments"].values())

guids
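`guidAssignments` maps each negative placeholder GUID from the batch to the GUID that Atlas assigned. Taking values by position, as `guids[0]` does, depends on the order of the response. A sketch of a lookup keyed by placeholder instead; the response fragment below is made up (real GUIDs will differ), and it is named `sample_results` so it does not overwrite the notebook's `results`.

```python
# Hypothetical response fragment; real GUIDs are assigned by the service.
# JSON object keys are strings, so the placeholders appear as "-100", etc.
sample_results = {
    "guidAssignments": {
        "-100": "11111111-aaaa-4aaa-8aaa-000000000001",
        "-101": "22222222-bbbb-4bbb-8bbb-000000000002",
        "-102": "33333333-cccc-4ccc-8ccc-000000000003",
    }
}

# Placeholder GUIDs used when the entities were built, keyed by entity name.
placeholders = {"input01": -100, "input02": -101, "output01": -102}

# Resolve each entity name to its server-assigned GUID.
real_guids = {
    name: sample_results["guidAssignments"][str(placeholder)]
    for name, placeholder in placeholders.items()
}
print(real_guids["input01"])
```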

In [None]:
from pyapacheatlas.core import AtlasClassification


# Classify one entity with multiple classifications
print(f"Adding multiple classifications to guid: {guids[0]}")
one_entity_multi_class = client.classify_entity(
    guid=guids[0],
    classifications=[
        AtlasClassification("MICROSOFT.PERSONAL.DATE_OF_BIRTH"),
        AtlasClassification("MICROSOFT.PERSONAL.NAME")
    ],
    force_update=True
)
print(json.dumps(one_entity_multi_class, indent=2))
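For reference, adding classifications to one entity corresponds to the Apache Atlas v2 endpoint `POST /api/atlas/v2/entity/guid/{guid}/classifications`, whose body is a list of classification objects. A minimal offline sketch of that request; `classification_request` and the GUID are hypothetical, the account-specific Purview base URL is omitted, and nothing is sent.

```python
import json


def classification_request(guid, type_names):
    """Return (method, path, body) for adding classifications to one entity."""
    path = f"/api/atlas/v2/entity/guid/{guid}/classifications"
    body = [{"typeName": name} for name in type_names]
    return "POST", path, body


method, path, body = classification_request(
    "11111111-aaaa-4aaa-8aaa-000000000001",  # hypothetical GUID
    ["MICROSOFT.PERSONAL.DATE_OF_BIRTH", "MICROSOFT.PERSONAL.NAME"],
)
print(method, path)
print(json.dumps(body))
```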

In [None]:
from pyapacheatlas.core import AtlasClassification
from pyapacheatlas.core.util import AtlasException


# Classify multiple entities with one classification
try:
    multi_entity_single_class = client.classify_bulk_entities(
        entityGuids=guids,
        classification=AtlasClassification("MICROSOFT.PERSONAL.IPADDRESS")
    )
    print(json.dumps(multi_entity_single_class, indent=2))
except AtlasException as e:
    print("One or more entities already have this classification; skipping.")
    print(e)
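The bulk call raises an error when one or more entities already carry the classification, as the except branch above anticipates. One alternative, sketched here assuming you already know each entity's current classifications (for example from an earlier fetch), is to drop those entities before building the request. The body follows the Atlas `POST /api/atlas/v2/entity/bulk/classification` shape (`classification` plus `entityGuids`); the helper, the `existing` mapping, and the GUIDs are hypothetical.

```python
def bulk_classification_request(type_name, guids, existing):
    """Build a bulk-classification body, skipping entities that already have it.

    `existing` maps each GUID to the set of classification names it carries.
    Returns None when there is nothing left to classify.
    """
    targets = [g for g in guids if type_name not in existing.get(g, set())]
    if not targets:
        return None
    return {"classification": {"typeName": type_name}, "entityGuids": targets}


# Hypothetical current state: only the first entity is already classified.
existing = {
    "guid-a": {"MICROSOFT.PERSONAL.IPADDRESS", "MICROSOFT.PERSONAL.NAME"},
    "guid-b": set(),
}
request_body = bulk_classification_request(
    "MICROSOFT.PERSONAL.IPADDRESS", ["guid-a", "guid-b", "guid-c"], existing
)
print(request_body)
```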

After the entities are classified, you can navigate to an individual asset and explore its classifications within the Purview portal.

![Azure Purview Classification](./img/purview_classification.png)

In [None]:
from pyapacheatlas.core import AtlasProcess


# The AtlasProcess is the lineage component that links the input
# entities to the output entity. The inputs and outputs are the "header"
# version of the Atlas entities (just guid, qualifiedName, and typeName);
# AtlasEntity.to_json(minimum=True) returns that form if you need it.

process_qn = "pyapacheatlas://democustomprocess"
process_type_name = "Process"

process = AtlasProcess(
    name="sample process",
    typeName=process_type_name,
    qualified_name=process_qn,
    inputs=[input01, input02],
    outputs=[output01],
    guid=-103
)

# Upload the process together with the input and output entities it references.
results = client.upload_entities(
    batch=[input01, input02, output01, process]
)

print(json.dumps(results, indent=2))
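A process references its inputs and outputs by a header (guid, `qualifiedName`, and `typeName`) rather than by the full entity. A plain-Python sketch of that reduction and of the resulting process entity; `to_header`, `build_process`, and `dataset` are hypothetical helpers, and the exact JSON pyapacheatlas produces may differ.

```python
def to_header(entity):
    """Reduce a full entity dict to the header fields a process references."""
    return {
        "guid": entity["guid"],
        "typeName": entity["typeName"],
        "qualifiedName": entity["attributes"]["qualifiedName"],
    }


def build_process(name, qualified_name, guid, inputs, outputs, type_name="Process"):
    """Return a process entity whose inputs/outputs are entity headers."""
    return {
        "typeName": type_name,
        "guid": str(guid),
        "attributes": {
            "name": name,
            "qualifiedName": qualified_name,
            "inputs": [to_header(e) for e in inputs],
            "outputs": [to_header(e) for e in outputs],
        },
    }


def dataset(name, qualified_name, guid):
    """Return a minimal DataSet entity dict (hypothetical helper)."""
    return {"typeName": "DataSet", "guid": str(guid),
            "attributes": {"name": name, "qualifiedName": qualified_name}}


in1 = dataset("input01", "pyapacheatlas://demoinputclassification01", -100)
in2 = dataset("input02", "pyapacheatlas://demoinputclassification02", -101)
out1 = dataset("output01", "pyapacheatlas://demooutput01", -102)

proc = build_process("sample process", "pyapacheatlas://democustomprocess", -103,
                     inputs=[in1, in2], outputs=[out1])
print(proc["attributes"]["inputs"][0])
```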

After the AtlasProcess is created, you can navigate to the `sample process` asset and explore its lineage.

![Azure Purview Lineage](./img/purview_lineage.png)
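Lineage forms a graph in which processes connect input datasets to output datasets. As an illustration of how such a graph can be queried, here is a minimal breadth-first sketch that collects every dataset upstream of a given one from a list of process records. The first record mirrors this notebook's `sample process`. The second (`report process`, producing `pyapacheatlas://demoreport01`) is hypothetical and exists only to show a multi-hop walk. In practice, these records would come from the catalog rather than a hard-coded list.

```python
from collections import deque

# Each record lists a process's input and output qualified names.
processes = [
    {
        "name": "sample process",
        "inputs": ["pyapacheatlas://demoinputclassification01",
                   "pyapacheatlas://demoinputclassification02"],
        "outputs": ["pyapacheatlas://demooutput01"],
    },
    {   # Hypothetical second hop, for illustration only.
        "name": "report process",
        "inputs": ["pyapacheatlas://demooutput01"],
        "outputs": ["pyapacheatlas://demoreport01"],
    },
]


def upstream(dataset_qn, processes):
    """Return every dataset that feeds into dataset_qn, directly or indirectly."""
    # Index: output dataset -> inputs of the processes that produce it.
    producers = {}
    for record in processes:
        for out in record["outputs"]:
            producers.setdefault(out, []).extend(record["inputs"])

    seen, queue = set(), deque([dataset_qn])
    while queue:
        current = queue.popleft()
        for parent in producers.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(upstream("pyapacheatlas://demoreport01", processes)))
```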

# Clean Up

In [None]:
# Delete the entities from the last upload (the three datasets and the process)

for guid in results["guidAssignments"].values():
    client.delete_entity(guid)
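The cleanup loop issues one delete per GUID. Apache Atlas also exposes a bulk delete, `DELETE /api/atlas/v2/entity/bulk`, which takes the GUIDs as repeated `guid` query parameters. A sketch that only builds such a URL with the standard library; `ATLAS_BASE` is a hypothetical placeholder because the real Purview endpoint depends on your account, and no request is sent.

```python
from urllib.parse import urlencode

# Hypothetical base URL; substitute your account's Atlas endpoint.
ATLAS_BASE = "https://example.invalid/api/atlas/v2"


def bulk_delete_url(guids):
    """Build a bulk-delete URL with one repeated `guid` query parameter per entity."""
    query = urlencode({"guid": list(guids)}, doseq=True)
    return f"{ATLAS_BASE}/entity/bulk?{query}"


url = bulk_delete_url(["guid-a", "guid-b"])
print(url)
```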