# End to End Pure Streaming Data-Pipeline for Building Maintenance Table Using Spark Structured Streaming on Databricks

###### Description: In this notebook we read building_maintenance state rows from incoming CSV files into a streaming DataFrame, transform (clean, cast, rename) the data, and add or update the latest state in a Databricks Delta table
###### Objective: (incoming csv files) --> "building_maintenance_streamingDF" --> "results_df" --> "building_maintenance_data"

In [0]:
import requests
import json
import optimus as op
import phonenumbers 
import re
import datetime
import time

from pyspark.sql.types import *
from pyspark.sql.functions import udf
from pyspark.sql import SparkSession
from pyspark.sql import SQLContext, Row
from pyspark.sql.functions import unix_timestamp, from_unixtime
from pyspark.sql import functions as F
from pyspark.sql.window import Window as W
from pyspark.sql import DataFrame
from pyspark.sql.functions import lit
from pyspark.sql.functions import rank, col

In [0]:
# Schema for Building Maintenance
building_maintenance_schema = StructType([
            StructField("Maintenance_id", IntegerType(), False),
            StructField("Building_name", StringType(), True),
            StructField("Ndate", TimestampType(), False),
            StructField("Issue_reported", StringType(), True),
            StructField("Contractor_id", IntegerType(), True), 
            StructField("Resolution", StringType(), True), 
            StructField("Status", StringType(), True),
            StructField("event_time", TimestampType(), True)])

building_maintenance_udf_schema = StructType([
            StructField("Building_name", StringType(), True),
            StructField("Ndate", TimestampType(), False),
            StructField("Issue_reported", StringType(), True),
            StructField("Contractor_id", IntegerType(), True), 
            StructField("Resolution", StringType(), True), 
            StructField("Status", StringType(), True),
            StructField("event_time", TimestampType(), True)])

###### Description: Read the building_maintenance CSV files as the streaming DataFrame "building_maintenance_streamingDF", process it on the fly, and obtain the transformed stream "building_maintenance_df"
###### Objective: (incoming csv files) --> "building_maintenance_streamingDF" --> "building_maintenance_df"

In [0]:
# Get the building_maintenance streaming DataFrame from CSV files

# Streaming starts here by reading the input files, one file per trigger
building_maintenance_Path = "/FileStore/apartment/building_maintenance/inprogress/"
building_maintenance_streamingDF = (
  spark
    .readStream
    .schema(building_maintenance_schema)
    .option("maxFilesPerTrigger", "1")
    .option("header", "true")
    .option("multiLine", "true")
    .csv(building_maintenance_Path)
)
# Drop invalid rows (rows without a Maintenance_id)
building_maintenance_df = building_maintenance_streamingDF.select("*").where("Maintenance_id IS NOT NULL")
# Instantiate the Optimus DataFrameTransformer class:
transformer = op.DataFrameTransformer(building_maintenance_df)
# Replace NA values with 0 in all columns
transformer.replace_na(0.0, columns="*")
# Clear accents in all columns (columns='*')
transformer.clear_accents(columns='*')
# Remove special characters (disabled; the columns listed below are not part of this schema)
# transformer.remove_special_chars(columns=['building_maintenance_name', 'Address_line_1', 'City', 'Post_code', 'Region'])

##### This function zips the collected per-column lists into a single array column with one struct per source row

In [0]:
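def my_fun(Building_name, Ndate, Issue_reported, Contractor_id, Resolution, Status, event_time):
  # Each argument is a list (from collect_list); zip pairs the i-th elements into one row tuple.
  # list() materializes the result, because zip returns a lazy iterator in Python 3.
  return list(zip(Building_name, Ndate, Issue_reported, Contractor_id, Resolution, Status, event_time))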

udf_Fun = udf(my_fun, ArrayType(building_maintenance_udf_schema))
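
To make the zipping step concrete, here is a minimal plain-Python sketch of what happens to the lists collected for one Maintenance_id. The variable names and the earlier "Open" row are hypothetical values made up for illustration; only three of the seven columns are shown. In Python 3, zip returns a lazy iterator, so the sketch wraps it in list() to get a concrete list.

```python
# Hypothetical lists, shaped like the output of collect_list for a single Maintenance_id.
building_names = ["Lang-Auer", "Lang-Auer"]
statuses = ["Open", "Closed"]
event_times = ["2018-04-01T09:00:00", "2018-04-08T14:03:02"]

# zip pairs the i-th element of every list into one tuple (one tuple per source row);
# list() turns the lazy zip iterator into a concrete list.
rows = list(zip(building_names, statuses, event_times))

for row in rows:
    print(row)
```

Each tuple corresponds to one struct in the array that the UDF returns, which `F.explode` later turns back into individual rows.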

In [0]:
intermediate_df = ( building_maintenance_df.withWatermark("event_time", "10 seconds")
            .groupBy("Maintenance_id")
            .agg(F.collect_list("Building_name").alias("Building_name"),
                 F.collect_list("Ndate").alias("Ndate"),
                 F.collect_list("Issue_reported").alias("Issue_reported"),
                 F.collect_list("Contractor_id").alias("Contractor_id"), 
                 F.collect_list("Resolution").alias("Resolution"), 
                 F.collect_list("Status").alias("Status"), 
                 F.collect_list("event_time").alias("event_time"), 
                 F.max("event_time").alias("latest_event_time"))
            .select("Maintenance_id", 
                    F.explode(udf_Fun(F.column("Building_name"), 
                                      F.column("Ndate"), 
                                      F.column("Issue_reported"), 
                                      F.column("Contractor_id"), 
                                      F.column("Resolution"), 
                                      F.column("Status"), 
                                      F.column("event_time")))
                    .alias("data"), "latest_event_time"))

##### Keep only the rows whose event_time equals the latest event_time for their Maintenance_id

In [0]:
results_df = (intermediate_df
              .select("Maintenance_id", 
                      "data.Building_name", 
                      "data.Ndate", 
                      "data.Issue_reported", 
                      "data.Contractor_id", 
                      "data.Resolution", 
                      "data.Status",  
                      "data.event_time", 
                      "latest_event_time")
              .where("data.event_time=latest_event_time")).orderBy("Maintenance_id")
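
The "latest state per key" logic of this pipeline can be sketched in plain Python, without Spark. This is only an illustration under stated assumptions: `latest_state` and `sample_rows` are hypothetical names, the sample values are made up, and the sketch keeps a single row per key, whereas the Spark query above keeps every row that ties for the maximum event_time.

```python
from datetime import datetime

def latest_state(rows):
    """Return {Maintenance_id: row}, keeping the row with the greatest event_time."""
    latest = {}
    for row in rows:
        key = row["Maintenance_id"]
        # Replace the stored row only when a strictly newer event arrives.
        if key not in latest or row["event_time"] > latest[key]["event_time"]:
            latest[key] = row
    return latest

# Hypothetical state rows: two events for id 1, one event for id 2.
sample_rows = [
    {"Maintenance_id": 1, "Status": "Open", "event_time": datetime(2017, 11, 1, 8, 0, 0)},
    {"Maintenance_id": 2, "Status": "Assigned", "event_time": datetime(2018, 4, 8, 14, 3, 2)},
    {"Maintenance_id": 1, "Status": "Closed", "event_time": datetime(2017, 11, 12, 18, 26, 23)},
]

current = latest_state(sample_rows)
for maintenance_id in sorted(current):
    print(maintenance_id, current[maintenance_id]["Status"])
```

The Spark version reaches the same result in three stages: aggregate per Maintenance_id, explode the zipped rows, and filter on `data.event_time = latest_event_time`.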

##### Display final result
###### This result shows the latest state for each unique Maintenance_id

In [0]:
display(results_df)

Maintenance_id,Building_name,Ndate,Issue_reported,Contractor_id,Resolution,Status,event_time,latest_event_time
1,Will-Krajcik,2017-01-27T13:27:19.000+0000,"Integer tincidunt ante vel ipsum. Praesent blandit lacinia erat. Vestibulum sed magna at nunc commodo placerat. Praesent blandit. Nam nulla. Integer pede justo, lacinia eget, tincidunt eget, tempus vel, pede.",814,"Integer tincidunt ante vel ipsum. Praesent blandit lacinia erat. Vestibulum sed magna at nunc commodo placerat. Praesent blandit. Nam nulla. Integer pede justo, lacinia eget, tincidunt eget, tempus vel, pede. Morbi porttitor lorem id ligula. Suspendisse ornare consequat lectus. In est risus, auctor sed, tristique in, tempus sit amet, sem. Fusce consequat. Nulla nisl. Nunc nisl. Duis bibendum, felis sed interdum venenatis, turpis enim blandit mi, in porttitor pede justo eu massa. Donec dapibus. Duis at velit eu est congue elementum.",Closed,2017-11-12T18:26:23.000+0000,2017-11-12T18:26:23.000+0000
2,Lang-Auer,2017-06-30T22:03:08.000+0000,Duis consequat dui nec nisi volutpat eleifend. Donec ut dolor. Morbi vel lectus in quam fringilla rhoncus.,851,"Quisque porta volutpat erat. Quisque erat eros, viverra eget, congue eget, semper rutrum, nulla. Nunc purus. Phasellus in felis. Donec semper sapien a libero. Nam dui. Proin leo odio, porttitor id, consequat in, consequat ut, nulla. Sed accumsan felis. Ut at dolor quis odio consequat varius.",Closed,2018-04-08T14:03:02.000+0000,2018-04-08T14:03:02.000+0000
3,"Rice, Cormier and Turcotte",2017-11-24T18:20:44.000+0000,"Maecenas leo odio, condimentum id, luctus nec, molestie sed, justo. Pellentesque viverra pede ac diam. Cras pellentesque volutpat dui.",542,"In sagittis dui vel nisl. Duis ac nibh. Fusce lacus purus, aliquet at, feugiat non, pretium quis, lectus. Suspendisse potenti. In eleifend quam a odio. In hac habitasse platea dictumst.",Closed,2017-03-05T08:49:39.000+0000,2017-03-05T08:49:39.000+0000
4,Heller-Windler,2017-05-01T16:35:34.000+0000,"Fusce consequat. Nulla nisl. Nunc nisl. Duis bibendum, felis sed interdum venenatis, turpis enim blandit mi, in porttitor pede justo eu massa. Donec dapibus. Duis at velit eu est congue elementum.",699,"Proin leo odio, porttitor id, consequat in, consequat ut, nulla. Sed accumsan felis. Ut at dolor quis odio consequat varius.",Open,2016-11-30T08:20:07.000+0000,2016-11-30T08:20:07.000+0000
5,"Wisoky, Maggio and Parisian",2017-06-24T06:24:32.000+0000,"Sed sagittis. Nam congue, risus semper porta volutpat, quam pede lobortis ligula, sit amet eleifend pede libero quis orci. Nullam molestie nibh in lectus.",648,"Curabitur gravida nisi at nibh. In hac habitasse platea dictumst. Aliquam augue quam, sollicitudin vitae, consectetuer eget, rutrum at, lorem. Integer tincidunt ante vel ipsum. Praesent blandit lacinia erat. Vestibulum sed magna at nunc commodo placerat. Praesent blandit. Nam nulla. Integer pede justo, lacinia eget, tincidunt eget, tempus vel, pede.",Closed,2018-07-17T01:04:16.000+0000,2018-07-17T01:04:16.000+0000
6,Reichert LLC,2017-01-26T13:00:11.000+0000,"Vestibulum ac est lacinia nisi venenatis tristique. Fusce congue, diam id ornare imperdiet, sapien urna pretium nisl, ut volutpat sapien arcu sed augue. Aliquam erat volutpat. In congue. Etiam justo. Etiam pretium iaculis justo. In hac habitasse platea dictumst. Etiam faucibus cursus urna. Ut tellus. Nulla ut erat id mauris vulputate elementum. Nullam varius. Nulla facilisi.",202,In congue. Etiam justo. Etiam pretium iaculis justo.,Assigned,2017-07-07T06:09:40.000+0000,2017-07-07T06:09:40.000+0000
7,"Cruickshank, Moore and Quitzon",2017-08-27T10:33:46.000+0000,"Morbi non lectus. Aliquam sit amet diam in magna bibendum imperdiet. Nullam orci pede, venenatis non, sodales sed, tincidunt eu, felis. Fusce posuere felis sed lacus. Morbi sem mauris, laoreet ut, rhoncus aliquet, pulvinar sed, nisl. Nunc rhoncus dui vel sem. Sed sagittis. Nam congue, risus semper porta volutpat, quam pede lobortis ligula, sit amet eleifend pede libero quis orci. Nullam molestie nibh in lectus.",238,"Duis bibendum. Morbi non quam nec dui luctus rutrum. Nulla tellus. In sagittis dui vel nisl. Duis ac nibh. Fusce lacus purus, aliquet at, feugiat non, pretium quis, lectus. Suspendisse potenti. In eleifend quam a odio. In hac habitasse platea dictumst.",Assigned,2017-12-11T02:00:48.000+0000,2017-12-11T02:00:48.000+0000
8,"Gibson, Beahan and Roob",2017-03-14T15:54:57.000+0000,"Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Vivamus vestibulum sagittis sapien. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Etiam vel augue. Vestibulum rutrum rutrum neque. Aenean auctor gravida sem. Praesent id massa id nisl venenatis lacinia. Aenean sit amet justo. Morbi ut odio.",347,"Proin eu mi. Nulla ac enim. In tempor, turpis nec euismod scelerisque, quam turpis adipiscing lorem, vitae mattis nibh ligula nec sem. Duis aliquam convallis nunc. Proin at turpis a pede posuere nonummy. Integer non velit. Donec diam neque, vestibulum eget, vulputate ut, ultrices vel, augue. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Donec pharetra, magna vestibulum aliquet ultrices, erat tortor sollicitudin mi, sit amet lobortis sapien sapien non mi. Integer ac neque.",Closed,2018-05-11T13:58:57.000+0000,2018-05-11T13:58:57.000+0000
9,Littel Inc,2017-03-30T20:09:01.000+0000,"Morbi non lectus. Aliquam sit amet diam in magna bibendum imperdiet. Nullam orci pede, venenatis non, sodales sed, tincidunt eu, felis. Fusce posuere felis sed lacus. Morbi sem mauris, laoreet ut, rhoncus aliquet, pulvinar sed, nisl. Nunc rhoncus dui vel sem.",932,"In sagittis dui vel nisl. Duis ac nibh. Fusce lacus purus, aliquet at, feugiat non, pretium quis, lectus. Suspendisse potenti. In eleifend quam a odio. In hac habitasse platea dictumst. Maecenas ut massa quis augue luctus tincidunt. Nulla mollis molestie lorem. Quisque ut erat. Curabitur gravida nisi at nibh. In hac habitasse platea dictumst. Aliquam augue quam, sollicitudin vitae, consectetuer eget, rutrum at, lorem. Integer tincidunt ante vel ipsum. Praesent blandit lacinia erat. Vestibulum sed magna at nunc commodo placerat.",Open,2017-09-27T13:00:54.000+0000,2017-09-27T13:00:54.000+0000
10,Zulauf Inc,2017-08-18T20:05:56.000+0000,"Curabitur gravida nisi at nibh. In hac habitasse platea dictumst. Aliquam augue quam, sollicitudin vitae, consectetuer eget, rutrum at, lorem. Integer tincidunt ante vel ipsum. Praesent blandit lacinia erat. Vestibulum sed magna at nunc commodo placerat.",669,"Duis bibendum, felis sed interdum venenatis, turpis enim blandit mi, in porttitor pede justo eu massa. Donec dapibus. Duis at velit eu est congue elementum.",Closed,2017-12-06T20:35:56.000+0000,2017-12-06T20:35:56.000+0000


##### The cells below are optional; run them only if external functionality or storage is needed

###### Write the stream to a Databricks Delta table for storage

In [0]:
streaming_query = (results_df.writeStream
 .format("delta")
 .outputMode("complete")
 .option("mergeSchema", "true")
 .option("checkpointLocation", "/delta/apartment/building_maintenance/_checkpoints/streaming-agg")
 .start("/delta/apartment/building_maintenance_data"))

#### Read the Delta Table as a Static or Streaming DataFrame
#### This DataFrame will always be up to date

In [0]:
building_maintenance_data = spark.read.format("delta").load("/delta/apartment/building_maintenance_data").orderBy("Maintenance_id")

In [0]:
display(building_maintenance_data)

Maintenance_id,Building_name,Ndate,Issue_reported,Contractor_id,Resolution,Status,event_time,latest_event_time
1,Will-Krajcik,2017-01-27T13:27:19.000+0000,"Integer tincidunt ante vel ipsum. Praesent blandit lacinia erat. Vestibulum sed magna at nunc commodo placerat. Praesent blandit. Nam nulla. Integer pede justo, lacinia eget, tincidunt eget, tempus vel, pede.",814,"Integer tincidunt ante vel ipsum. Praesent blandit lacinia erat. Vestibulum sed magna at nunc commodo placerat. Praesent blandit. Nam nulla. Integer pede justo, lacinia eget, tincidunt eget, tempus vel, pede. Morbi porttitor lorem id ligula. Suspendisse ornare consequat lectus. In est risus, auctor sed, tristique in, tempus sit amet, sem. Fusce consequat. Nulla nisl. Nunc nisl. Duis bibendum, felis sed interdum venenatis, turpis enim blandit mi, in porttitor pede justo eu massa. Donec dapibus. Duis at velit eu est congue elementum.",Closed,2017-11-12T18:26:23.000+0000,2017-11-12T18:26:23.000+0000
2,Lang-Auer,2017-06-30T22:03:08.000+0000,Duis consequat dui nec nisi volutpat eleifend. Donec ut dolor. Morbi vel lectus in quam fringilla rhoncus.,851,"Quisque porta volutpat erat. Quisque erat eros, viverra eget, congue eget, semper rutrum, nulla. Nunc purus. Phasellus in felis. Donec semper sapien a libero. Nam dui. Proin leo odio, porttitor id, consequat in, consequat ut, nulla. Sed accumsan felis. Ut at dolor quis odio consequat varius.",Closed,2018-04-08T14:03:02.000+0000,2018-04-08T14:03:02.000+0000
3,"Rice, Cormier and Turcotte",2017-11-24T18:20:44.000+0000,"Maecenas leo odio, condimentum id, luctus nec, molestie sed, justo. Pellentesque viverra pede ac diam. Cras pellentesque volutpat dui.",542,"In sagittis dui vel nisl. Duis ac nibh. Fusce lacus purus, aliquet at, feugiat non, pretium quis, lectus. Suspendisse potenti. In eleifend quam a odio. In hac habitasse platea dictumst.",Closed,2017-03-05T08:49:39.000+0000,2017-03-05T08:49:39.000+0000
4,Heller-Windler,2017-05-01T16:35:34.000+0000,"Fusce consequat. Nulla nisl. Nunc nisl. Duis bibendum, felis sed interdum venenatis, turpis enim blandit mi, in porttitor pede justo eu massa. Donec dapibus. Duis at velit eu est congue elementum.",699,"Proin leo odio, porttitor id, consequat in, consequat ut, nulla. Sed accumsan felis. Ut at dolor quis odio consequat varius.",Open,2016-11-30T08:20:07.000+0000,2016-11-30T08:20:07.000+0000
5,"Wisoky, Maggio and Parisian",2017-06-24T06:24:32.000+0000,"Sed sagittis. Nam congue, risus semper porta volutpat, quam pede lobortis ligula, sit amet eleifend pede libero quis orci. Nullam molestie nibh in lectus.",648,"Curabitur gravida nisi at nibh. In hac habitasse platea dictumst. Aliquam augue quam, sollicitudin vitae, consectetuer eget, rutrum at, lorem. Integer tincidunt ante vel ipsum. Praesent blandit lacinia erat. Vestibulum sed magna at nunc commodo placerat. Praesent blandit. Nam nulla. Integer pede justo, lacinia eget, tincidunt eget, tempus vel, pede.",Closed,2018-07-17T01:04:16.000+0000,2018-07-17T01:04:16.000+0000
6,Reichert LLC,2017-01-26T13:00:11.000+0000,"Vestibulum ac est lacinia nisi venenatis tristique. Fusce congue, diam id ornare imperdiet, sapien urna pretium nisl, ut volutpat sapien arcu sed augue. Aliquam erat volutpat. In congue. Etiam justo. Etiam pretium iaculis justo. In hac habitasse platea dictumst. Etiam faucibus cursus urna. Ut tellus. Nulla ut erat id mauris vulputate elementum. Nullam varius. Nulla facilisi.",202,In congue. Etiam justo. Etiam pretium iaculis justo.,Assigned,2017-07-07T06:09:40.000+0000,2017-07-07T06:09:40.000+0000
7,"Cruickshank, Moore and Quitzon",2017-08-27T10:33:46.000+0000,"Morbi non lectus. Aliquam sit amet diam in magna bibendum imperdiet. Nullam orci pede, venenatis non, sodales sed, tincidunt eu, felis. Fusce posuere felis sed lacus. Morbi sem mauris, laoreet ut, rhoncus aliquet, pulvinar sed, nisl. Nunc rhoncus dui vel sem. Sed sagittis. Nam congue, risus semper porta volutpat, quam pede lobortis ligula, sit amet eleifend pede libero quis orci. Nullam molestie nibh in lectus.",238,"Duis bibendum. Morbi non quam nec dui luctus rutrum. Nulla tellus. In sagittis dui vel nisl. Duis ac nibh. Fusce lacus purus, aliquet at, feugiat non, pretium quis, lectus. Suspendisse potenti. In eleifend quam a odio. In hac habitasse platea dictumst.",Assigned,2017-12-11T02:00:48.000+0000,2017-12-11T02:00:48.000+0000
8,"Gibson, Beahan and Roob",2017-03-14T15:54:57.000+0000,"Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Vivamus vestibulum sagittis sapien. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Etiam vel augue. Vestibulum rutrum rutrum neque. Aenean auctor gravida sem. Praesent id massa id nisl venenatis lacinia. Aenean sit amet justo. Morbi ut odio.",347,"Proin eu mi. Nulla ac enim. In tempor, turpis nec euismod scelerisque, quam turpis adipiscing lorem, vitae mattis nibh ligula nec sem. Duis aliquam convallis nunc. Proin at turpis a pede posuere nonummy. Integer non velit. Donec diam neque, vestibulum eget, vulputate ut, ultrices vel, augue. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Donec pharetra, magna vestibulum aliquet ultrices, erat tortor sollicitudin mi, sit amet lobortis sapien sapien non mi. Integer ac neque.",Closed,2018-05-11T13:58:57.000+0000,2018-05-11T13:58:57.000+0000
9,Littel Inc,2017-03-30T20:09:01.000+0000,"Morbi non lectus. Aliquam sit amet diam in magna bibendum imperdiet. Nullam orci pede, venenatis non, sodales sed, tincidunt eu, felis. Fusce posuere felis sed lacus. Morbi sem mauris, laoreet ut, rhoncus aliquet, pulvinar sed, nisl. Nunc rhoncus dui vel sem.",932,"In sagittis dui vel nisl. Duis ac nibh. Fusce lacus purus, aliquet at, feugiat non, pretium quis, lectus. Suspendisse potenti. In eleifend quam a odio. In hac habitasse platea dictumst. Maecenas ut massa quis augue luctus tincidunt. Nulla mollis molestie lorem. Quisque ut erat. Curabitur gravida nisi at nibh. In hac habitasse platea dictumst. Aliquam augue quam, sollicitudin vitae, consectetuer eget, rutrum at, lorem. Integer tincidunt ante vel ipsum. Praesent blandit lacinia erat. Vestibulum sed magna at nunc commodo placerat.",Open,2017-09-27T13:00:54.000+0000,2017-09-27T13:00:54.000+0000
10,Zulauf Inc,2017-08-18T20:05:56.000+0000,"Curabitur gravida nisi at nibh. In hac habitasse platea dictumst. Aliquam augue quam, sollicitudin vitae, consectetuer eget, rutrum at, lorem. Integer tincidunt ante vel ipsum. Praesent blandit lacinia erat. Vestibulum sed magna at nunc commodo placerat.",669,"Duis bibendum, felis sed interdum venenatis, turpis enim blandit mi, in porttitor pede justo eu massa. Donec dapibus. Duis at velit eu est congue elementum.",Closed,2017-12-06T20:35:56.000+0000,2017-12-06T20:35:56.000+0000


### Plot Live Streaming Graphs

In [0]:
building_maintenance_data_stream = spark.readStream.format("delta").load("/delta/apartment/building_maintenance_data")

In [0]:
display(building_maintenance_data_stream.groupBy("Status").count())

Status,count
Open,19
Closed,54
Assigned,27
