### Task 2.2 Data Visualisation

In this task, you will implement a program to visualise the joined streaming data. For the incoming camera events:
* Plot the number of violations against arrival time. Label interesting points such as the maximum and minimum values.
* Plot the speed against arrival time. Mark interesting points such as the average and maximum values.
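
One possible way to label the interesting points is sketched below. It is only an illustration: the helper name `annotate_extremes` and the sample values are made up, and the data is assumed to have already been collected into plain Python lists (for example from a pandas DataFrame). The `Agg` backend is selected so that the sketch also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt


def annotate_extremes(ax, times, values):
    """Plot values against times, label the max and min points, and draw the average.

    Hypothetical helper, not part of the assignment template.
    """
    ax.plot(times, values, marker="o")

    # Index of the largest and smallest value (first occurrence on ties).
    i_max = max(range(len(values)), key=values.__getitem__)
    i_min = min(range(len(values)), key=values.__getitem__)

    for label, i in (("max", i_max), ("min", i_min)):
        ax.annotate(
            f"{label}: {values[i]}",
            xy=(times[i], values[i]),
            xytext=(0, 10),              # shift the label 10 points above the marker
            textcoords="offset points",
            ha="center",
        )

    # A dashed horizontal line marks the average value.
    mean_value = sum(values) / len(values)
    ax.axhline(mean_value, linestyle="--", label=f"average: {mean_value:.1f}")
    ax.legend()
    return i_max, i_min, mean_value


# Made-up sample data: seconds since the first event and violation counts.
arrival_seconds = [0, 5, 10, 15, 20]
violation_counts = [2, 7, 1, 4, 6]

fig, ax = plt.subplots(figsize=(9.5, 6))
ax.set_xlabel("Arrival Time (in seconds)")
ax.set_ylabel("Number of Violations")
extremes = annotate_extremes(ax, arrival_seconds, violation_counts)
```

The same helper could be reused for the speed plot by passing speed readings instead of violation counts.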

To visualise the data stored in the database, plot a map using the camera locations. On the map:
* annotate the number of violations between the checkpoints
* identify hotspots (e.g. where the number of violations exceeds a certain threshold within a given period of the day)

Explain and justify the plots and the inclusion of the interesting points. Set your own threshold for the hotspot.
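
As a rough illustration of the hotspot rule, the sketch below counts violations per camera per hour and keeps the windows that exceed a threshold. The threshold value, the one-hour window, the function name `find_hotspots`, and the sample records are all assumptions made for this example; the records are assumed to be `(camera_id, timestamp)` pairs already read from the database.

```python
from collections import Counter
from datetime import datetime

HOTSPOT_THRESHOLD = 3  # assumed value: violations per camera per hour


def find_hotspots(violations, threshold=HOTSPOT_THRESHOLD):
    """Return {(camera_id, date, hour): count} for windows whose count exceeds threshold.

    `violations` is an iterable of (camera_id, timestamp) pairs.
    """
    counts = Counter(
        (camera_id, ts.date(), ts.hour) for camera_id, ts in violations
    )
    return {key: n for key, n in counts.items() if n > threshold}


# Made-up sample records: camera 1 has four violations between 08:00 and 09:00.
sample = [
    (1, datetime(2024, 1, 1, 8, 5)),
    (1, datetime(2024, 1, 1, 8, 15)),
    (1, datetime(2024, 1, 1, 8, 30)),
    (1, datetime(2024, 1, 1, 8, 45)),
    (2, datetime(2024, 1, 1, 8, 10)),
    (1, datetime(2024, 1, 1, 9, 0)),
]
hotspots = find_hotspots(sample)
```

Cameras that appear in the result could then be drawn on the map with a distinct marker colour or size.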

If you are running this task in a separate Jupyter notebook file, save the file as **xxx_assignment02_visualisation.ipynb**, where **xxx** represents the student IDs of the group members.

In [None]:
import matplotlib.pyplot as plt
%matplotlib notebook


In [None]:
#read data
hostip = "192.168.0.21"

DB_NAME     = "awas_db"


from pymongo import MongoClient, ASCENDING, HASHED
from pyspark.sql import SparkSession, Row
from pyspark.sql.functions import col, split, element_at, when, from_json, expr, unix_timestamp
from pyspark.sql.types import StructType, StringType, IntegerType, DoubleType, TimestampType, StructField
from pyspark.sql.streaming.state import GroupState, GroupStateTimeout
import os
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages "
    "org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0,"
    "org.apache.spark:spark-streaming-kafka-0-10_2.12:3.5.0,"
    "org.mongodb.spark:mongo-spark-connector_2.12:10.3.0 "
    "pyspark-shell"
)

spark = SparkSession.builder \
    .appName("AWAS-Speed-Enforcement") \
    .master("local[*]") \
    .config("spark.mongodb.read.connection.uri", f"mongodb://{hostip}:27017/{DB_NAME}") \
    .config("spark.mongodb.write.connection.uri", f"mongodb://{hostip}:27017/{DB_NAME}") \
    .getOrCreate()

spark.sparkContext.setLogLevel("WARN")

eventSchema = (StructType()
    .add("event_id", StringType())
    .add("car_plate", StringType()) 
    .add("camera_id", IntegerType()) 
    .add("timestamp", TimestampType()) 
    .add("speed_reading", DoubleType()) 
    .add("producer", StringType()))

df = spark.readStream.schema(eventSchema).format("json").load("path_to_incoming_joined_stream")



In [None]:
# Number of violations vs arrival time
def visualisation():
    try:
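        # Flag each event as a violation (1) when speed_reading is above 60,
        # then sum the flags per arrival timestamp.
        # NOTE: `df` is created with spark.readStream, and toPandas() raises an
        # AnalysisException on a streaming DataFrame. The data must first be
        # materialised as a static DataFrame (e.g. via a memory sink or foreachBatch).
        data = df.select(
            col("timestamp").alias("arrival_time"),
            when(col("speed_reading") > 60, 1).otherwise(0).alias("violation")
        ).groupBy("arrival_time").sum("violation").orderBy("arrival_time")
        data = data.toPandas()  # aggregated column is named "sum(violation)"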
        # data.set_index("arrival_time", inplace=True)  # Set arrival_time as index   
        # data.columns = ["num_violations"]  # Rename column for clarity
        # data = data.reset_index()  # Reset index to have arrival_time as a column
        # # Plotting
        # plt.close('all')  # Close any existing plots
        # plt.style.use('ggplot')  # Use ggplot style for better aesthetics
        # # Create a new figure with specified size

        width = 9.5
        height = 6
        fig = plt.figure(figsize=(width, height)) # new figure
        ax = fig.add_subplot(111)  # add subplot axes
        fig.suptitle("Number of Violations vs Arrival Time")
        ax.set_xlabel("Arrival Time (in seconds)")
        ax.set_ylabel("Number of Violations")
        
