# OpenMetadata Demo

We can access OpenMetadata at http://localhost:8585 with the following credentials:
- Username: admin@open-metadata.org
- Password: admin

First, we will create a Spark session with lineage tracking enabled. This may take a while because Spark also downloads the required packages.

**The Spark Agent is currently not working because of issues such as https://github.com/open-metadata/openmetadata-spark-agent/pull/10 and https://github.com/open-metadata/openmetadata-spark-agent/pull/16.**

**`spark-agent:1.0` has not been released to the Maven repository. The only thing we can do with `1.0-beta` is create a pipeline.**

In [1]:
from pyspark.sql import SparkSession
import pyspark.sql.functions as f

jwt_token = "eyJraWQiOiJHYjM4OWEtOWY3Ni1nZGpzLWE5MmotMDI0MmJrOTQzNTYiLCJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJhZG1pbiIsImlzQm90IjpmYWxzZSwiaXNzIjoib3Blbi1tZXRhZGF0YS5vcmciLCJpYXQiOjE2NjM5Mzg0NjIsImVtYWlsIjoiYWRtaW5Ab3Blbm1ldGFkYXRhLm9yZyJ9.tS8um_5DKu7HgzGBzS1VTA5uUjKWOCU0B_j08WXBiEC0mr0zNREkqVfwFDD-d24HlNEbrqioLsBuFRiwIWKc1m_ZlVQbG7P36RUxhuv2vbSp80FKyNM-Tj93FDzq91jsyNmsQhyNv_fNr3TXfzzSPjHt8Go0FMMP66weoKMgW2PbXlhVKwEuXUHyakLLzewm9UMeQaEiRzhiTMU3UkLXcKbYEJJvfNFcLwSl9W8JCO_l0Yj3ud-qt_nQYEZwqW6u5nfdQllN133iikV4fM5QZsMCnm8Rq1mvLR0y9bmJiD7fwM1tmJ791TUWqmKaTnP49U493VanKpUAfzIiOiIbhg"
# Ref: https://docs.open-metadata.org/v1.5.x/connectors/ingestion/lineage/spark-lineage
spark = (SparkSession.builder
    .appName("OpenMetadata Demo")
    .config("spark.jars.packages", "org.open-metadata:openmetadata-spark-agent:1.0-beta")
    .config("spark.extraListeners", "org.openmetadata.spark.agent.OpenMetadataSparkListener")
    .config("spark.openmetadata.facets.disabled", "spark_unknown;spark.logicalPlan")
    .config("spark.openmetadata.transport.hostPort", "http://host.docker.internal:8585")
    .config("spark.openmetadata.transport.type", "openmetadata")
    .config("spark.openmetadata.transport.jwtToken", jwt_token)
    .config("spark.openmetadata.transport.pipelineServiceName", "jupyter_spark_service")
    .config("spark.openmetadata.transport.pipelineName", "jupyter_spark")
    .config("spark.openmetadata.transport.pipelineSourceUrl", "http://localhost:8888/lab/tree/notebooks/openmetadata_demo.ipynb")
    .config("spark.openmetadata.transport.pipelineDescription", "Jupyter Spark Pipeline")
    .config("spark.openmetadata.transport.databaseServiceNames", "random, local_mysql")
    .config("spark.openmetadata.transport.timeout", "30")
    .getOrCreate()
)
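The settings above can also be collected in a plain dictionary, which makes them easier to review or print before a session is started. The following is a minimal sketch: the helper name `openmetadata_conf` and its parameters are hypothetical, while the keys and values are the ones used in the cell above. It does not need Spark to run.

```python
def openmetadata_conf(host_port, jwt_token, service_name, pipeline_name,
                      source_url, description, database_services, timeout="30"):
    """Hypothetical helper: return the agent settings from the cell above as a flat dict."""
    transport = {
        "type": "openmetadata",
        "hostPort": host_port,
        "jwtToken": jwt_token,
        "pipelineServiceName": service_name,
        "pipelineName": pipeline_name,
        "pipelineSourceUrl": source_url,
        "pipelineDescription": description,
        # Same separator as in the cell above.
        "databaseServiceNames": ", ".join(database_services),
        "timeout": str(timeout),
    }
    conf = {
        "spark.jars.packages": "org.open-metadata:openmetadata-spark-agent:1.0-beta",
        "spark.extraListeners": "org.openmetadata.spark.agent.OpenMetadataSparkListener",
        "spark.openmetadata.facets.disabled": "spark_unknown;spark.logicalPlan",
    }
    # Every transport option shares the same key prefix.
    conf.update({f"spark.openmetadata.transport.{k}": v for k, v in transport.items()})
    return conf


conf = openmetadata_conf(
    host_port="http://host.docker.internal:8585",
    jwt_token="<jwt token>",  # placeholder, not a real token
    service_name="jupyter_spark_service",
    pipeline_name="jupyter_spark",
    source_url="http://localhost:8888/lab/tree/notebooks/openmetadata_demo.ipynb",
    description="Jupyter Spark Pipeline",
    database_services=["random", "local_mysql"],
)

# Print the settings without leaking the token.
for key, value in sorted(conf.items()):
    shown = "<redacted>" if key.endswith("jwtToken") else value
    print(f"{key} = {shown}")
```

Each entry could then be applied with `builder.config(key, value)` in a loop before calling `getOrCreate()`.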

## Read datasets
Let's read sample product, customer and raw sales data.

In [2]:
input_dir = "/home/jovyan/data"
output_dir = "/home/jovyan/output"
product = spark.read.option("header", True).csv(f"{input_dir}/product")
customer = spark.read.option("header", True).csv(f"{input_dir}/customer")
sales_raw = spark.read.option("header", True).csv(f"{input_dir}/sales_raw")

## Generate new datasets

Let's create new datasets for US customers and SG customers.

In [3]:
(customer
    .filter(f.col("country") == 'US')
    .drop(f.col("country"))
    .write.mode("overwrite")
    .option("header", True)
    .csv(f"{output_dir}/customer_us")
)

(customer
    .filter(f.col("country") == 'SG')
    .drop(f.col("country"))
    .write.mode("overwrite")
    .option("header", True)
    .csv(f"{output_dir}/customer_sg")
)

### Pipeline
The pipeline will be created at http://localhost:8585/pipeline/jupyter_spark_service.jupyter_spark.

![](demo_images/pipeline_1.png)
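The pipeline URL above is the OpenMetadata base URL followed by `/pipeline/` and the name `<pipelineServiceName>.<pipelineName>`. As a small sketch, the helpers below rebuild that URL from the two names used in the Spark configuration. The function names are hypothetical. The REST path in `pipeline_api_url` is an assumption about the OpenMetadata API and should be checked against the API documentation for your version.

```python
from urllib.parse import quote


def pipeline_fqn(service_name, pipeline_name):
    """Join the service and pipeline names the way the URL above does."""
    return f"{service_name}.{pipeline_name}"


def pipeline_ui_url(base_url, service_name, pipeline_name):
    """Hypothetical helper: URL of the pipeline page in the OpenMetadata UI."""
    fqn = quote(pipeline_fqn(service_name, pipeline_name), safe="")
    return f"{base_url.rstrip('/')}/pipeline/{fqn}"


def pipeline_api_url(base_url, service_name, pipeline_name):
    """Assumed REST path for looking up a pipeline by name; verify before use."""
    fqn = quote(pipeline_fqn(service_name, pipeline_name), safe="")
    return f"{base_url.rstrip('/')}/api/v1/pipelines/name/{fqn}"


ui_url = pipeline_ui_url("http://localhost:8585", "jupyter_spark_service", "jupyter_spark")
print(ui_url)
```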


## Generate joined dataset

Next, we will check how lineage works when multiple sources are joined.

In [4]:
sales_report = (sales_raw
    .join(product, sales_raw.product_id == product.id)
    .join(customer, sales_raw.customer_id == customer.id)
    .select(
        customer["name"].alias("customer_name"),
        product["name"].alias("product_name"),
        sales_raw["qty"],
    )
)
sales_report.write.mode("overwrite").option("header", True).csv(f"{output_dir}/sales_report")
sales_report.show()

+-------------+-------------+---+
|customer_name| product_name|qty|
+-------------+-------------+---+
|        Alice|Awesome Apple|  1|
|          Bob|Awesome Apple| 10|
|          Bob|   Big Banana|  3|
+-------------+-------------+---+
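Spark writes each CSV output as a directory of `part-*.csv` files rather than a single file, and with the `header` option every part file carries its own header row. To inspect the written report without Spark, a sketch like the following may help. The helper name `read_spark_csv_dir` is hypothetical, and the directory is the `sales_report` output path used above.

```python
import csv
import glob
import os


def read_spark_csv_dir(path):
    """Hypothetical helper: read every part-*.csv file under `path` into a list of dicts."""
    rows = []
    # Marker files such as _SUCCESS do not match the pattern and are skipped.
    for part in sorted(glob.glob(os.path.join(path, "part-*.csv"))):
        with open(part, newline="", encoding="utf-8") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


report_dir = "/home/jovyan/output/sales_report"
if os.path.isdir(report_dir):  # only runs where the notebook output exists
    for row in read_spark_csv_dir(report_dir):
        print(row)
```

All values come back as strings, which matches how the input CSV files were read in this notebook (no schema inference).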

