# INTRODUCTION
* In this kernel, we will learn how to use plotly.express and compare it with the plotly.graph_objs library. We will see that most graphs can be made with the express library using much simpler code.

    1. Plotly library: plotly.py is an interactive, open-source, and JavaScript-based graphing library for Python. Built on top of plotly.js, plotly.py is a high-level, declarative charting library that includes over 30 chart types, including scientific charts, 3D graphs, statistical charts, SVG maps, financial charts, and more. The ultimate responsibility of plotly.py is to produce Python dictionaries that can be serialized into a JSON data structure that represents a valid figure.
    
    2. What is plotly.express and plotly.graph_objs?
        1. plotly.graph_objs provides a hierarchy of classes called "graph objects" that may be used to construct figures.
        2. Plotly Express is a terse, consistent, high-level wrapper around plotly.graph_objects for rapid data exploration and figure generation. Most plots are made with just one function call that accepts a **tidy Pandas data frame**, and a simple description of the plot you want to make.
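
To make the point about dictionaries concrete, here is a minimal sketch that uses only the standard library. The figure is a plain Python dict in the `data`/`layout` shape described above; the sample values are made up for illustration and do not come from the dataset.

```python
import json

# A figure is, at bottom, a dict with "data" (a list of traces) and "layout".
# The x/y values here are made-up sample numbers, not taken from the dataset.
fig = {
    "data": [
        {"type": "scatter", "mode": "lines", "x": [1, 2, 3], "y": [0.08, 0.07, 0.09]}
    ],
    "layout": {"title": {"text": "Example line chart"}, "width": 800, "height": 500},
}

# Serializing to JSON gives the kind of structure that plotly.js renders.
fig_json = json.dumps(fig)
restored = json.loads(fig_json)
print(restored["data"][0]["mode"])
```

A dict of this shape is what the `fig = dict(data = data, layout = layout)` cells later in this kernel build before handing it to `iplot`.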
        
<br>Content:
1. [Loading Data and Explanation of Features](#1)
1. [Line Charts](#2)
    1. Use plotly.graph_objs
        1. Draw single line
        2. Style line
        3. Draw multiple lines
    2. Use plotly.express
        1. Draw single line
        2. Style line
        3. Draw multiple lines
1. [Scatter Charts](#3)
    1. Use plotly.graph_objs draw scatter and line plot
    2. Use plotly.express 
        1. Scatter plot with color
        2. Scatter plot with facet
        3. Scatter Plot with categorical size and hover
        4. Scatter plot matrix
1. [Bar Charts](#4)
1. [Histogram](#5)
1. [Box Plot](#6)
1. [Heatmap](#7)
1. [3D Plot](#8)





# Install packages
* To install this package with conda, run the following:
    * conda install -c plotly/label/test plotly

In [1]:
import sys
print(sys.executable)

/Users/anna/anaconda3/envs/bulb/bin/python


In [2]:
import numpy as np
import pandas as pd
import datetime as dt
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot


In [3]:
go.__path__

['/Users/anna/anaconda3/envs/bulb/lib/python3.7/site-packages/plotly/graph_objs']

In [7]:
import plotly.express as px

<a id="1"></a> <br>
# Loading Data and Data preprocessing

In [8]:
#Loading data
df = pd.read_csv('/Users/anna/Desktop/project/crimes-in-boston/crime.csv',header=0,encoding = 'unicode_escape')

In [9]:
#This dataset contains 17 columns and their datatypes are listed below
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 319073 entries, 0 to 319072
Data columns (total 17 columns):
INCIDENT_NUMBER        319073 non-null object
OFFENSE_CODE           319073 non-null int64
OFFENSE_CODE_GROUP     319073 non-null object
OFFENSE_DESCRIPTION    319073 non-null object
DISTRICT               317308 non-null object
REPORTING_AREA         319073 non-null object
SHOOTING               1019 non-null object
OCCURRED_ON_DATE       319073 non-null object
YEAR                   319073 non-null int64
MONTH                  319073 non-null int64
DAY_OF_WEEK            319073 non-null object
HOUR                   319073 non-null int64
UCR_PART               318983 non-null object
STREET                 308202 non-null object
Lat                    299074 non-null float64
Long                   299074 non-null float64
Location               319073 non-null object
dtypes: float64(2), int64(4), object(11)
memory usage: 41.4+ MB


In [10]:
df.head(5)

Unnamed: 0,INCIDENT_NUMBER,OFFENSE_CODE,OFFENSE_CODE_GROUP,OFFENSE_DESCRIPTION,DISTRICT,REPORTING_AREA,SHOOTING,OCCURRED_ON_DATE,YEAR,MONTH,DAY_OF_WEEK,HOUR,UCR_PART,STREET,Lat,Long,Location
0,I182070945,619,Larceny,LARCENY ALL OTHERS,D14,808,,2018-09-02 13:00:00,2018,9,Sunday,13,Part One,LINCOLN ST,42.357791,-71.139371,"(42.35779134, -71.13937053)"
1,I182070943,1402,Vandalism,VANDALISM,C11,347,,2018-08-21 00:00:00,2018,8,Tuesday,0,Part Two,HECLA ST,42.306821,-71.0603,"(42.30682138, -71.06030035)"
2,I182070941,3410,Towed,TOWED MOTOR VEHICLE,D4,151,,2018-09-03 19:27:00,2018,9,Monday,19,Part Three,CAZENOVE ST,42.346589,-71.072429,"(42.34658879, -71.07242943)"
3,I182070940,3114,Investigate Property,INVESTIGATE PROPERTY,D4,272,,2018-09-03 21:16:00,2018,9,Monday,21,Part Three,NEWCOMB ST,42.334182,-71.078664,"(42.33418175, -71.07866441)"
4,I182070938,3114,Investigate Property,INVESTIGATE PROPERTY,B3,421,,2018-09-03 21:05:00,2018,9,Monday,21,Part Three,DELHI ST,42.275365,-71.090361,"(42.27536542, -71.09036101)"


In [11]:
#checking missing and empty data in the dataframe
df =df.replace('',np.nan)
df.isnull().sum()

INCIDENT_NUMBER             0
OFFENSE_CODE                0
OFFENSE_CODE_GROUP          0
OFFENSE_DESCRIPTION         0
DISTRICT                 1765
REPORTING_AREA              0
SHOOTING               318054
OCCURRED_ON_DATE            0
YEAR                        0
MONTH                       0
DAY_OF_WEEK                 0
HOUR                        0
UCR_PART                   90
STREET                  10871
Lat                     19999
Long                    19999
Location                    0
dtype: int64

<a id="2"></a> <br>
# Line charts

In [12]:
#Total number of crimes
#There are 282517 unique incident numbers in df
df['INCIDENT_NUMBER'].nunique()

282517

## Using plotly.graph_objs library

#### Draw single line

In [13]:
month_crime = df.groupby('MONTH')['INCIDENT_NUMBER']\
    .nunique()\
    .apply(lambda x: round(x/282517,2))\
    .reset_index()

In [55]:
# Creating trace1
trace1 = go.Scatter(
                    x = month_crime.MONTH,
                    y = month_crime.INCIDENT_NUMBER,
                    mode = "lines")
data = [trace1]

layout = dict(title = 'Percentage of crimes across a year',
              width=800,
              height=500,
              xaxis= dict(title= 'MONTH',ticklen= 1,zeroline= False),
              yaxis= dict(title= 'Percentage of crimes')
             )
fig = dict(data = data, layout = layout)
iplot(fig)

In [11]:
# plotly.offline.iplot(fig, filename='linePlot')
# plotly.offline.plot(fig, include_plotlyjs=False, output_type='div')

#### Style Line Chart

In [15]:
hour_crime= df.groupby('HOUR')['INCIDENT_NUMBER']\
    .nunique()\
    .apply(lambda x: round(x/282517,2))\
    .reset_index()

In [54]:
# Creating trace1
trace1 = go.Scatter(
                    x = hour_crime.HOUR,
                    y = hour_crime.INCIDENT_NUMBER,
                    mode = "lines",
                    line=dict(color='firebrick', width=4,dash='dash'))
data = [trace1]
layout = dict(title = 'Probability of crimes across a day',
              width=800,
              height=500,
              xaxis= dict(title= 'HOUR',ticklen= 5,zeroline= False),
              yaxis= dict(title= 'Probability of crimes')
             )
## dash options include 'dash', 'dot', and 'dashdot'
fig = dict(data = data, layout = layout)
iplot(fig)

#### Draw multiple lines

In [17]:
crimei_TimePerYear = df.groupby(['YEAR','HOUR'])['INCIDENT_NUMBER']\
                    .nunique()\
                    .groupby('YEAR')\
                    .apply(lambda x: x/x.sum())\
                    .reset_index()

In [18]:
crimei_TimePerYear.groupby('YEAR').size()

YEAR
2015    24
2016    24
2017    24
2018    24
dtype: int64

In [19]:
# Hours 0-23, shared by all four yearly traces
x = hour_crime['HOUR']
crime_2015=crimei_TimePerYear[crimei_TimePerYear['YEAR']==2015]['INCIDENT_NUMBER']
crime_2016=crimei_TimePerYear[crimei_TimePerYear['YEAR']==2016]['INCIDENT_NUMBER']
crime_2017=crimei_TimePerYear[crimei_TimePerYear['YEAR']==2017]['INCIDENT_NUMBER']
crime_2018=crimei_TimePerYear[crimei_TimePerYear['YEAR']==2018]['INCIDENT_NUMBER']

In [53]:
year_2015 = go.Scatter(
                    x = x,
                    y = crime_2015,
                    name="Crime in 2015",
                    mode = "lines",
                    line=dict(color='firebrick', width=4,dash='dash'))
year_2016 = go.Scatter(
                    x = x,
                    y = crime_2016,
                    name="Crime in 2016",
                    mode = "lines",
                    line=dict(color='blue', width=4,dash='dot'))
year_2017 = go.Scatter(
                    x = x,
                    y = crime_2017,
                    name="Crime in 2017",
                    mode = "lines",
                    line=dict(color='pink', width=4,dash='dashdot'))
year_2018 = go.Scatter(
                    x = x,
                    y = crime_2018,
                    name="Crime in 2018",
                    mode = "lines",
                    line=dict(color='purple', width=4,dash='dash'))

data = [year_2015,year_2016,year_2017,year_2018]

layout = dict(title = 'Percentage of crimes across a day per year',
              width=800,
              height=500,
              xaxis= dict(title= 'HOUR',ticklen= 5,zeroline= False),
              yaxis= dict(title= 'Percentage of crimes'))

fig = dict(data = data, layout = layout)
iplot(fig)

## Using plotly.express library

In [21]:
hour_crime.head()

Unnamed: 0,HOUR,INCIDENT_NUMBER
0,0,0.05
1,1,0.03
2,2,0.02
3,3,0.01
4,4,0.01


In [60]:
fig = px.line(hour_crime, x="HOUR", y="INCIDENT_NUMBER", title='Probability of crimes across a day')
fig.update_layout(
    width=800,
    height=500,
)
fig.show()

In [68]:
crimei_TimePerYear=crimei_TimePerYear.rename({'INCIDENT_NUMBER':'Percentage'},axis=1)

In [69]:
fig = px.line(crimei_TimePerYear, x="HOUR", y='Percentage',color='YEAR',
             title='Percentage of crimes across a day for each year')
fig.update_layout(
    width=800,
    height=500,
)
fig.show()

<a id="3"></a> <br>
# Scatter Plot
* Use mode argument to choose between markers, lines, or a combination of both. 

## Line and scatter plot using plotly.graph_objs library
* go.Scatter can be used for plotting points (markers), lines, or both, depending on the value of mode. The different options of go.Scatter are documented in its reference page.


In [25]:
df_shotting = df.dropna(subset=['SHOOTING'])

In [26]:
hour_shotting= df_shotting.groupby('HOUR')['INCIDENT_NUMBER']\
        .nunique()\
        .reset_index()

In [52]:
# Creating trace1
trace1 = go.Scatter(
                    x = hour_shotting.HOUR,
                    y = hour_shotting.INCIDENT_NUMBER,
                    mode = "lines+markers")
# mode can be 'lines', 'markers', or 'lines+markers'

data = [trace1]

layout = dict(title = 'Number of shootings across a day',
              width=800,
              height=500,
              xaxis= dict(title= 'HOUR',ticklen= 5,zeroline= False),
              yaxis= dict(title= 'Number of shootings'))
fig = dict(data = data, layout = layout)
iplot(fig)

## Using plotly.express


#### Scatter plot with color

In [101]:
shotting_timeDist = df_shotting.groupby(['DISTRICT','HOUR'])['INCIDENT_NUMBER']\
                    .nunique()\
                    .reset_index()

In [102]:
fig = px.scatter(shotting_timeDist, x="HOUR", y="INCIDENT_NUMBER", color='DISTRICT',
                title='Number of shootings for each district at different times')

fig.update_layout(
    width=800,
    height=500,
)
fig.show()

#### Scatter plot with facet

In [31]:
df.head()

Unnamed: 0,INCIDENT_NUMBER,OFFENSE_CODE,OFFENSE_CODE_GROUP,OFFENSE_DESCRIPTION,DISTRICT,REPORTING_AREA,SHOOTING,OCCURRED_ON_DATE,YEAR,MONTH,DAY_OF_WEEK,HOUR,UCR_PART,STREET,Lat,Long,Location
0,I182070945,619,Larceny,LARCENY ALL OTHERS,D14,808,,2018-09-02 13:00:00,2018,9,Sunday,13,Part One,LINCOLN ST,42.357791,-71.139371,"(42.35779134, -71.13937053)"
1,I182070943,1402,Vandalism,VANDALISM,C11,347,,2018-08-21 00:00:00,2018,8,Tuesday,0,Part Two,HECLA ST,42.306821,-71.0603,"(42.30682138, -71.06030035)"
2,I182070941,3410,Towed,TOWED MOTOR VEHICLE,D4,151,,2018-09-03 19:27:00,2018,9,Monday,19,Part Three,CAZENOVE ST,42.346589,-71.072429,"(42.34658879, -71.07242943)"
3,I182070940,3114,Investigate Property,INVESTIGATE PROPERTY,D4,272,,2018-09-03 21:16:00,2018,9,Monday,21,Part Three,NEWCOMB ST,42.334182,-71.078664,"(42.33418175, -71.07866441)"
4,I182070938,3114,Investigate Property,INVESTIGATE PROPERTY,B3,421,,2018-09-03 21:05:00,2018,9,Monday,21,Part Three,DELHI ST,42.275365,-71.090361,"(42.27536542, -71.09036101)"


In [32]:
crime_facet = df.groupby(['DAY_OF_WEEK','HOUR'])['INCIDENT_NUMBER']\
  .size()\
  .groupby('DAY_OF_WEEK')\
  .apply(lambda x: x/x.sum())\
  .reset_index()

In [33]:
crime_facet = crime_facet.rename({'DAY_OF_WEEK':'Day','INCIDENT_NUMBER':'Perc'},axis=1)

In [34]:
crime_facet['Day'] = crime_facet['Day'].apply(lambda x:x[0:3])

In [235]:
fig = px.scatter(crime_facet, x="HOUR", y="Perc", facet_row="Day", title='Percentage of crime at different time',
           color_continuous_scale=px.colors.sequential.Viridis, render_mode="webgl")
fig.update_layout(
    width=800,
    height=500,
)
fig.show()

#### Scatter Plot with categorical size and hover

In [65]:
fig = px.scatter(crime_facet, x="HOUR", y="Perc", color="Day",size='Perc',title='Percentage of crime at different time',
                 hover_name='Day',size_max=20,
           color_continuous_scale=px.colors.sequential.Viridis, render_mode="webgl")
fig.update_layout(
    width=800,
    height=500,
)
fig.show()

#### Scatter plot with matrix

In [37]:
crime_matrix = df.groupby(['SHOOTING','YEAR','DAY_OF_WEEK','HOUR'])['INCIDENT_NUMBER']\
                      .nunique()\
                      .groupby(['YEAR','DAY_OF_WEEK'])\
                      .apply(lambda x: x/x.sum())\
                      .reset_index()
crime_matrix = crime_matrix.rename({'DAY_OF_WEEK':'Day','INCIDENT_NUMBER':'Perc'},axis=1)
crime_matrix['Day'] = crime_matrix['Day'].apply(lambda x:x[0:3])

In [38]:

fig = px.scatter(crime_matrix, x="HOUR", y="Perc", facet_row="Day", facet_col="YEAR", color="SHOOTING", trendline="ols",
          category_orders={"Day": ["Mon", "Tue", "Wed", "Thu","Fri","Sat","Sun"], "YEAR": [2015, 2016,2017,2018]})
fig.show()


Method .ptp is deprecated and will be removed in a future version. Use numpy.ptp instead.



<a id="4"></a> <br>
# Bar chart
* presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent 

#### Simple bar chart

In [91]:
df_bar2 = df.groupby(['YEAR','UCR_PART'])['INCIDENT_NUMBER'].nunique().reset_index()

In [92]:
df_bar2.head()

Unnamed: 0,YEAR,UCR_PART,INCIDENT_NUMBER
0,2015,Other,164
1,2015,Part One,11753
2,2015,Part Three,22483
3,2015,Part Two,15337
4,2016,Other,365


In [152]:
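# Assumed definition of df_bar: unique incidents per district and UCR part
df_bar = df.groupby(['DISTRICT','UCR_PART'])['INCIDENT_NUMBER'].nunique().reset_index()
fig = px.bar(df_bar, x="UCR_PART", y="INCIDENT_NUMBER", color="DISTRICT", barmode="group",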
            title='Number of crime for different parts and district')
fig.update_layout(
    width=800,
    height=500,
)
fig.show()
# can add facet_col or facet_row 

<a id="5"></a> <br>
# Histogram
* presents distribution of continuous variable

In [144]:
shotting_timeDist.head()

Unnamed: 0,DISTRICT,HOUR,INCIDENT_NUMBER
0,A1,0,1
1,A1,2,2
2,A1,3,1
3,A1,6,1
4,A1,16,1


In [153]:
fig = px.histogram(shotting_timeDist, x="HOUR", y="INCIDENT_NUMBER",histfunc="avg", barmode="group",nbins=70
                  ,title='Distribution of the number of shootings across a day')
fig.update_layout(
    width=800,
    height=500,
)
fig.show()
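
When y is given together with histfunc="avg", each bar shows the average of the y values that fall into that x bin rather than a count. Assuming one bin per hour (plotly chooses the actual bin width), the same numbers can be sketched in pandas. The small frame below is made up and only mimics the shape of shotting_timeDist.

```python
import pandas as pd

# Made-up rows in the same shape as shotting_timeDist (DISTRICT, HOUR, INCIDENT_NUMBER).
sample = pd.DataFrame({
    "DISTRICT": ["A1", "B2", "A1", "B2", "C11"],
    "HOUR": [0, 0, 2, 2, 2],
    "INCIDENT_NUMBER": [1, 3, 2, 4, 6],
})

# Average INCIDENT_NUMBER per hour: the bar heights histfunc="avg" would show
# if every hour gets its own bin (an assumption; plotly chooses the bin width).
avg_per_hour = sample.groupby("HOUR")["INCIDENT_NUMBER"].mean()
print(avg_per_hour)
```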

<a id="6"></a> <br>
# Boxplot
* presents distribution of continuous variable for different groups

In [161]:
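# Assumed redefinition, matching the columns shown below: add DISTRICT to the grouping
df_bar2 = df.groupby(['YEAR','DISTRICT','UCR_PART'])['INCIDENT_NUMBER'].nunique().reset_index()
df_bar2.head()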

Unnamed: 0,YEAR,DISTRICT,UCR_PART,INCIDENT_NUMBER
0,2015,A1,Other,12
1,2015,A1,Part One,1524
2,2015,A1,Part Three,2384
3,2015,A1,Part Two,1602
4,2015,A15,Other,4


In [228]:
fig = px.box(df_bar2, x="YEAR", y="INCIDENT_NUMBER", color="UCR_PART", notched=True)
fig.update_layout(
    width=800,
    height=500,
)
fig.show()

<a id="7"></a> <br>
# Heatmap

In [231]:
crime_matrix.head()

Unnamed: 0,SHOOTING,YEAR,Day,HOUR,Perc
0,Y,2015,Fri,1,0.214286
1,Y,2015,Fri,2,0.071429
2,Y,2015,Fri,16,0.071429
3,Y,2015,Fri,17,0.142857
4,Y,2015,Fri,18,0.071429


In [234]:
fig = px.density_heatmap(crime_matrix, x="HOUR", y="YEAR", marginal_x="rug", marginal_y="histogram")
fig.update_layout(
    width=800,
    height=500,
)
fig.show()

<a id="8"></a> <br>
# 3D

In [227]:
fig = px.scatter_3d(df_bar2, x="INCIDENT_NUMBER", y="YEAR",z="UCR_PART",color='DISTRICT')
fig.show()