# FIT5202 Data processing for Big data

## Activity: Assignment 2 Part B

### Task 3

##### Student ID: `31265154`
##### Student Name: `Vivekkumar Chaudhari`


### Table of Contents


* [1. Installing Map Library](#one)
    * [1.1 Installing "ipyleaflet"](#oneone)
    * [1.2 Enable the ipyleaflet extension to display maps in the notebook](#onetwo)
    * [1.3 Restart the kernel, reload this notebook by refreshing the web page, then run the code](#onethree)
* [2. Consuming the Kafka topic and showing data on the map](#two)

<a class="anchor" name="one"></a>
## 1. Installing Map Library
<a class="anchor" name="oneone"></a>
### 1.1 Installing "ipyleaflet"

```python
## Step 1: Install the ipyleaflet map library
!pip install ipyleaflet
```



<a class="anchor" name="onetwo"></a>
### 1.2 Enable the ipyleaflet extension to display maps in the notebook

```python
!jupyter nbextension enable --py widgetsnbextension
!jupyter nbextension enable --py ipyleaflet
```

```
Enabling notebook extension jupyter-js-widgets/extension...
      - Validating: OK
Enabling notebook extension jupyter-leaflet/extension...
      - Validating: OK
```

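After installing and enabling the extensions, it can help to confirm that the packages used in Section 2 are importable from the notebook's current kernel. The following is a minimal sketch that is not part of the original assignment; the helper name `check_packages` is hypothetical, and it assumes Python 3.8+ for `importlib.metadata`.

```python
import importlib.util
from importlib import metadata


def check_packages(names):
    """Return {module_name: version, "unknown", or None} for each name.

    None means the module cannot be found by the current kernel.
    "unknown" means it is importable, but no distribution with that exact
    name exists (e.g. the `kafka` module ships in the `kafka-python` distribution).
    """
    status = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            status[name] = None
            continue
        try:
            status[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            status[name] = "unknown"
    return status


# Hypothetical usage: modules imported by the Kafka consumer cell
print(check_packages(["ipyleaflet", "ipywidgets", "pandas", "kafka"]))
```

A `None` entry suggests the package may have been installed into a different Python environment than the one the kernel is running.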

<a class="anchor" name="onethree"></a>
### 1.3 Restart the kernel, reload this notebook by refreshing the web page, then run the code
#### Note: If the map does not appear, restart the Jupyter Notebook server on Ubuntu and run the code again.

<a class="anchor" name="two"></a>
## 2. Consuming the Kafka topic and showing data on the map

<h3>About the map:&nbsp;</h3><p><span style="font-size: 15px;">The map shows predicted pedestrian counts that exceed 2000 at a given location on a specific date and time.</span></p><p><strong>Markers on the map:</strong></p><p>Each marker on the map represents a sensor location on a given date, with the following details:</p><p><strong><em>Sensor id</em></strong>: Sensor identification&nbsp;number</p><p><strong><em>Sensor name</em></strong>: Based on the sensor description</p><p><strong><em>Date</em></strong>: Prediction date in the format YYYY-MM-DD</p><p><strong><em>Busiest Time</em></strong>: Peak hour on that date (24-hour clock)</p><p><em><strong><u>Note</u></strong></em><em>: Click a marker on the map to see its details.</em></p>
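Choosing one marker per sensor per date, placed at the busiest predicted hour, comes down to a pandas `groupby` followed by `idxmax`, which is what `get_marker_list` does in the cell below. The standalone sketch here isolates that step; the sample values are hypothetical, and the column names `sid`, `mdate`, `time`, and `prediction` are assumed to match the fields of the consumed Kafka messages.

```python
import pandas as pd


def busiest_hour_per_sensor_day(df):
    """Keep, for each (sid, mdate) pair, the row with the highest prediction."""
    # idxmax returns the index label of the maximum prediction within each group
    idx = df.groupby(["sid", "mdate"], sort=False)["prediction"].idxmax()
    return df.loc[idx].reset_index(drop=True)


# Hypothetical sample: two sensors on one date, several hourly predictions each
sample = pd.DataFrame({
    "sid": [1, 1, 1, 2, 2],
    "mdate": ["2020-01-01"] * 5,
    "time": [8, 12, 17, 8, 12],
    "prediction": [2100.0, 2500.0, 2300.0, 2050.0, 2010.0],
})

peaks = busiest_hour_per_sensor_day(sample)
print(peaks[["sid", "mdate", "time", "prediction"]])
```

For sensor 1 this keeps the 12:00 row (2500.0), and for sensor 2 the 8:00 row (2050.0). Using `sort=False` keeps groups in order of first appearance, which matches the original cell.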

```python
# importing required libraries
import ast
from kafka import KafkaConsumer
from ipyleaflet import Map, Marker, MarkerCluster, ScaleControl, FullScreenControl
from ipywidgets import HTML
from IPython.display import display
import pandas as pd

topic = 'Pedestrian_data_Prediction_Location'

def connect_kafka_consumer():
    _consumer = None
    try:
        _consumer = KafkaConsumer(topic,
                                  consumer_timeout_ms=20000, # stop iteration if no message arrives for 20 sec
                                  auto_offset_reset='latest', # consume only the latest available messages
                                  bootstrap_servers=['localhost:9092'], # same host and port as set in Task 2
                                  api_version=(0, 10))
    except Exception as ex:
        print('Exception while connecting Kafka')
        print(str(ex))
    return _consumer

# Plot real-time data by replacing the marker cluster with a freshly built one
def add_marker_on_map(imap, marker_cluster, new_markers):
    # Remove the previous cluster layer so the map only shows the latest data
    imap.remove_layer(marker_cluster)
    marker_cluster = MarkerCluster(markers=tuple(new_markers))
    # Add the new cluster with the updated data
    imap.add_layer(marker_cluster)
    return marker_cluster

# Retrieve the marker list:
# one marker represents the busiest hour for a sensor on a given day
def get_marker_list(df):
    marker_list = []
    # For each (sensor, date) pair keep the row with the highest predicted count
    df_grouped_max = df.loc[df.groupby(['sid', 'mdate'], sort=False)['prediction'].idxmax()]
    df_grouped_max = df_grouped_max.reset_index()
    for row in df_grouped_max.itertuples():
        location_alt = [float(getattr(row, 'latitude')), float(getattr(row, 'longitude'))]
        # Create the popup text for the marker
        popup_text = HTML()
        popup_text.value = f'\
            <p>\
                Sensor id: <span style="color: rgb(226, 80, 65);">{getattr(row, "sid")}</span><br>\
                Sensor name: <span style="color: rgb(226, 80, 65);">{getattr(row, "sname")}</span><br>\
                Date: <span style="color: rgb(65, 168, 95);">{getattr(row, "mdate")} (Year-Month-Date)</span><br>\
                Busiest Time: <span style="color: rgb(147, 101, 184);">{getattr(row, "time")}:00 (24hr)</span>\
            </p>'
        # Create one marker per sensor for its busiest hour on each day
        marker = Marker(location=location_alt, draggable=False)
        marker.popup = popup_text
        marker_list.append(marker)
    return marker_list

# Consume messages from the Kafka topic for further processing
def consume_messages(consumer):
    if consumer is None:
        print('No Kafka consumer available; check that the broker is running.')
        return
    try:
        pre_len = 0
        imap = None
        marker_cluster = None
        df = None
        # Wait for messages (iteration stops after consumer_timeout_ms without new data)
        for message in consumer:
            # Decode the message value from bytes to string
            data = message.value.decode('utf-8')
            # Convert the string representation of a dict into a dict object
            data = ast.literal_eval(data)
            if imap is None:
                # One DataFrame row per message
                df = pd.DataFrame([data])
                location_alt = [float(data['latitude']), float(data['longitude'])]
                # Initialize the map with scale and full-screen controls
                imap = Map(center=location_alt, dragging=True, scroll_wheel_zoom=True, zoom=15)
                imap.add_control(ScaleControl(position='bottomleft'))
                imap.add_control(FullScreenControl())
                # A marker cluster shows a count on the map for markers that are very close together
                marker_list = get_marker_list(df)
                pre_len = len(marker_list)
                marker_cluster = MarkerCluster(markers=tuple(marker_list))
                imap.add_layer(marker_cluster)
                display(imap)
            else:
                # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
                df = pd.concat([df, pd.DataFrame([data])], ignore_index=True)
                marker_list = get_marker_list(df)
                # Redraw only once at least 10 new markers exist, to limit map refreshes
                if len(marker_list) - pre_len >= 10:
                    pre_len = len(marker_list)
                    marker_cluster = add_marker_on_map(imap, marker_cluster, marker_list)
    except Exception as ex:
        print(str(ex))

# Main execution flow
if __name__ == '__main__':
    consumer = connect_kafka_consumer()
    consume_messages(consumer)
```

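The consumer above decodes each message value and parses it with `ast.literal_eval`, which assumes the Task 2 producer sends `str(dict)` rather than JSON. A slightly more tolerant decoder could try JSON first and fall back to Python literal syntax. This is a sketch under that assumption; `decode_message_value` is a hypothetical helper, not part of the original notebook.

```python
import ast
import json


def decode_message_value(raw):
    """Decode a Kafka message value (bytes) into a dict.

    Tries JSON first; if that fails, falls back to Python literal syntax
    (e.g. single-quoted keys produced by str(dict)). ast.literal_eval raises
    ValueError or SyntaxError if the text is neither format.
    """
    text = raw.decode("utf-8")
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        value = ast.literal_eval(text)
    if not isinstance(value, dict):
        raise ValueError("expected a dict payload, got " + type(value).__name__)
    return value


# Hypothetical payloads in both formats
print(decode_message_value(b"{'sid': 4, 'latitude': '-37.81'}"))
print(decode_message_value(b'{"sid": 4, "latitude": "-37.81"}'))
```

With kafka-python, such a function could be passed to `KafkaConsumer` through its `value_deserializer` argument, so that `message.value` is already a dict inside the consuming loop.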