<center><img src="https://static.bokeh.org/branding/logos/bokeh-logo.svg" width="200"></center>

#### What is Bokeh?

Bokeh is a library for creating interactive data visualizations in a web browser. It offers a concise, human-readable syntax, which allows for rapidly presenting data in an aesthetically pleasing manner.

#### Matplotlib vs Bokeh

Matplotlib creates static graphics that are useful for quick and simple visualizations, or for creating publication-quality images. Bokeh creates visualizations for display on the web (whether locally or embedded in a webpage) and, most importantly, the visualizations are meant to be highly interactive. Matplotlib is not primarily designed for either of these use cases.

#### Basic Plot

In [1]:
from bokeh.plotting import figure, output_file, show, output_notebook

To use Bokeh, we first import the basic functions that we need from the `bokeh.plotting` module.

1. `figure` is the core object that we will use to create plots. It also handles the styling of plots, including title, labels, axes, and grids, and it exposes methods for adding data to the plot. 
2. The `output_file` function defines how the visualization will be rendered (namely, to an HTML file).
3. The `show` function will be invoked when the plot is ready for output. `show` tells Bokeh that all of the data has been added to the plot and it is time to render it.
4. An alternative output function to be aware of is `output_notebook` which is used to show plots in-line in a Jupyter Notebook.
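
As a minimal sketch of the file-based workflow described in the list above, the snippet below points Bokeh at an HTML file and writes a plot to it. It uses `save` (also available from `bokeh.plotting`, but not part of the imports above) instead of `show`, so that no browser window is opened. The temporary directory and the file name `basic_plot.html` are assumptions made purely for illustration.

```python
import os
import tempfile

from bokeh.plotting import figure, output_file, save

# Hypothetical output location: a file inside a fresh temporary directory.
html_path = os.path.join(tempfile.mkdtemp(), "basic_plot.html")

# Tell Bokeh to render to a standalone HTML file.
output_file(html_path, title="Basic plot")

p = figure(title="Rendered to an HTML file")
p.line([1, 3, 5, 7], [2, 4, 6, 8])

# save() writes the file without opening a browser;
# show() would write the file and then open it.
save(p)
print(os.path.exists(html_path))
```

In a notebook, replacing `output_file(...)` with `output_notebook()` and `save(p)` with `show(p)` should display the same plot in-line instead.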

Next, we create some data to plot. Data in Bokeh can take on different forms, but at its simplest, data is just a list of values.
Here, `x` holds the x-axis values and `y` holds the corresponding y-axis values.

In [2]:
x = [1, 3, 5, 7]
y = [2, 4, 6, 8]

First, we instantiate a figure and add the data to it; `p` is a common variable name for a figure object, since a figure is a type of plot. After instantiating the figure, we call the `circle`, `line`, and `triangle` methods to plot our data. Methods of this kind are known as *glyph* methods. The term *glyph* in Bokeh refers to the lines, circles, bars, and other shapes that are added to plots to display data.

In [3]:
p = figure()

p.circle(x, y, size=10, color='red', legend_label='circle')
p.line(x, y, color='blue', legend_label='line')
p.triangle(y, x, color='gold', size=10, legend_label='triangle')
output_notebook()
show(p)
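
To make the idea of a glyph a little more concrete, the sketch below inspects what the glyph methods leave behind on the figure. It assumes nothing beyond the `x` and `y` lists used above, and it uses the general `scatter` method with a `marker` argument as an alternative to the dedicated `circle` and `triangle` methods. The plot is only built, not displayed.

```python
from bokeh.plotting import figure

x = [1, 3, 5, 7]
y = [2, 4, 6, 8]

p = figure()

# Each glyph method returns a renderer and registers it on the figure.
line_renderer = p.line(x, y, color="blue")
marker_renderer = p.scatter(x, y, marker="triangle", size=10, color="gold")

# The renderer's .glyph attribute holds the shape model that was added.
glyph_names = [type(r.glyph).__name__ for r in p.renderers]
print(len(p.renderers), glyph_names)
```

Keeping a reference to the returned renderer is handy later, for example when a tool should apply to one glyph only.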

### Bokeh and Pandas

Now, we will see how to create a Bokeh plot with real-world data.

**The WWII THOR Dataset**

The Theater History of Operations Reports (THOR) lists aerial bombing operations during World War I, World War II, the Korean War, and the Vietnam War undertaken by the United States and Allied Powers. Each row in the THOR dataset contains information on a single mission or bombing run. This information can include the mission date, takeoff and target locations, the target type, aircraft involved, and the types and weights of bombs dropped on the target. The THOR [data dictionary](https://data.world/datamil/thor-data-dictionary) provides detailed information on the structure of the dataset.

In [4]:
import pandas as pd
import os
from bokeh.models import ColumnDataSource
from bokeh.models.tools import HoverTool

Here, we import Pandas and the ColumnDataSource object from bokeh.models. We’re also going to expand our knowledge of interactions in this example by adding a hover feature that is facilitated by the HoverTool. 

Note: The figure object and basic functions from bokeh.plotting have already been imported.

In [5]:
# Read the THOR dataset
df = pd.read_csv('https://query.data.world/s/mp3rupjnvdzmyq4nfbsojbqn354phd', nrows=50000)
df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 62 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   WWII_ID                   50000 non-null  int64  
 1   MASTER_INDEX_NUMBER       48872 non-null  float64
 2   MSNDATE                   50000 non-null  object 
 3   THEATER                   48566 non-null  object 
 4   NAF                       33459 non-null  object 
 5   COUNTRY_FLYING_MISSION    33459 non-null  object 
 6   TGT_COUNTRY_CODE          35514 non-null  float64
 7   TGT_COUNTRY               49865 non-null  object 
 8   TGT_LOCATION              49615 non-null  object 
 9   TGT_TYPE                  41084 non-null  object 
 10  TGT_ID                    35300 non-null  float64
 11  TGT_INDUSTRY_CODE         35362 non-null  float64
 12  TGT_INDUSTRY              34802 non-null  object 
 13  SOURCE_LATITUDE           46803 non-null  object 
 14  SOURCE


| | WWII_ID | MASTER_INDEX_NUMBER | MSNDATE | THEATER | NAF | COUNTRY_FLYING_MISSION | TGT_COUNTRY_CODE | TGT_COUNTRY | TGT_LOCATION | TGT_TYPE | ... | CALLSIGN | ROUNDS_AMMO | SPARES_RETURN_AC | WX_FAIL_AC | MECH_FAIL_AC | MISC_FAIL_AC | TARGET_COMMENT | MISSION_COMMENTS | SOURCE | DATABASE_EDIT_COMMENTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | | 8/15/1943 | MTO | 12 AF | USA | 13.0 | ITALY | SPADAFORA | | ... | | | | | | | | | | |
| 1 | 4285 | 20028.0 | 2/20/1945 | PTO | 5 AF | USA | | PHILIPPINE ISLANDS | PUERTA PRINCESA | UNIDENTIFIED TARGET | ... | | | | | | | | | | |
| 2 | 3 | | 8/15/1943 | MTO | 12 AF | USA | 13.0 | ITALY | COSENZA | | ... | | | | | | | | | | |
| 3 | 4 | | 8/15/1943 | MTO | 12 AF | USA | 13.0 | ITALY | GIOJA TAURO | | ... | | | | | | 1.0 | | | | |
| 4 | 8167 | 14639.0 | 2/23/1945 | PTO | 5 AF | USA | | PHILIPPINE ISLANDS | BALETE PASS | WOODED AREA | ... | | | | | | | | | | |


We just want to use a sample of this dataset to plot instead of all the records. We randomly sample 50 rows and then pass this sample to the `ColumnDataSource` constructor and store this in a variable called source.

In [6]:
sample = df.sample(50)
source = ColumnDataSource(sample)
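
As a small illustration of what the `ColumnDataSource` constructor does with a DataFrame, the sketch below uses a toy DataFrame in place of the THOR data (the values are made up, and only two of the real column names are borrowed). The `random_state` argument is an addition that makes the sample reproducible; the cell above does not use it.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Toy stand-in for the THOR DataFrame (values are made up).
toy = pd.DataFrame({
    "AC_ATTACKING": [12, 30, 6, 18],
    "TOTAL_TONS": [24.0, 75.5, 10.0, 40.0],
})

# random_state makes the random sample reproducible between runs.
sample = toy.sample(2, random_state=0)
source = ColumnDataSource(sample)

# Each DataFrame column becomes a named array in source.data;
# the unnamed DataFrame index is added under the key "index".
print(sorted(source.data.keys()))
print(len(source.data["TOTAL_TONS"]))
```

The keys of `source.data` are the names that glyph methods and hover tooltips refer to later on.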

Next, we create our figure object and call the circle glyph method to plot our data. This is where the source variable that holds our ColumnDataSource comes into play. It’s passed as our source argument to the glyph method, and the column names holding the tons of munitions dropped (TOTAL_TONS) and the number of attacking aircraft (AC_ATTACKING) are passed as our x and y arguments, respectively.

In [7]:
p = figure()
p.circle(x='TOTAL_TONS', y='AC_ATTACKING',
         source=source,
         size=10, color='green')
output_notebook()
show(p)

Next, we add a title and label our axes.

In [8]:
p.title.text = 'Attacking Aircraft and Munitions Dropped'
p.xaxis.axis_label = 'Tons of Munitions Dropped'
p.yaxis.axis_label = 'Number of Attacking Aircraft'
output_notebook()
show(p)
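
The same title and axis labels can also be supplied when the figure is created. The sketch below shows this alternative, assuming the `title`, `x_axis_label`, and `y_axis_label` keyword arguments of `figure`; it builds the figure without displaying it and reads the properties back.

```python
from bokeh.plotting import figure

# Pass the title and axis labels directly to the figure constructor.
p = figure(
    title="Attacking Aircraft and Munitions Dropped",
    x_axis_label="Tons of Munitions Dropped",
    y_axis_label="Number of Attacking Aircraft",
)

# The values end up on the same properties that the cell above sets by hand.
print(p.title.text)
print(p.xaxis[0].axis_label)
print(p.yaxis[0].axis_label)
```

Setting the properties after construction, as in the cell above, is useful when a label depends on something computed later.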

`HoverTool` allows you to set a `tooltips` property, which takes a list of tuples. The first element of each tuple is a display name and the second is a column name from your `ColumnDataSource` prefaced with `@`. Once we’ve instantiated this tool, we add it to the plot using the `add_tools` method. We’ll see how this looks in a moment.

In [9]:
hover = HoverTool()
hover.tooltips=[
    ('Attack Date', '@MSNDATE'),
    ('Attacking Aircraft', '@AC_ATTACKING'),
    ('Tons of Munitions', '@TOTAL_TONS'),
    ('Type of Aircraft', '@AIRCRAFT_NAME')
]

p.add_tools(hover)
output_notebook()
show(p)
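
The tooltip syntax allows a little more than plain column references. The sketch below is an illustration on made-up data (only the column names mirror the THOR dataset): it passes `tooltips` directly to the `HoverTool` constructor, uses the special `$index` field for the row number, and appends a number format in braces to round a value.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Made-up rows; only the column names follow the THOR dataset.
source = ColumnDataSource(data={
    "TOTAL_TONS": [24.0, 75.5, 10.0],
    "AC_ATTACKING": [12, 30, 6],
    "MSNDATE": ["8/15/1943", "2/20/1945", "2/23/1945"],
})

p = figure()
p.scatter(x="TOTAL_TONS", y="AC_ATTACKING", source=source, size=10, color="green")

hover = HoverTool(tooltips=[
    ("Row", "$index"),                          # special field: index of the hovered point
    ("Attack Date", "@MSNDATE"),                # "@" refers to a column in the data source
    ("Tons of Munitions", "@TOTAL_TONS{0.0}"),  # "{0.0}" formats to one decimal place
])
p.add_tools(hover)

print(len(p.toolbar.tools))
```

Calling `show(p)` on this figure should display the three fields when the pointer hovers over a point.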

### Categorical Data and Bar Charts: Munitions Dropped by Country

In this section, we’ll learn how to use categorical data as our x-axis values in Bokeh and how to use the `vbar` glyph method to create a vertical bar chart (an `hbar` glyph method functions similarly to create a horizontal bar chart). In addition, we’ll learn about preparing categorical data in Pandas by grouping data. Further, we’ll add to our knowledge of Bokeh styling and the hover tool.

To work through this information, we’ll create a bar chart that shows the total tons of munitions dropped by each country listed in our CSV.

First, we group the records of individual missions into one record per attacking country, holding the total munitions dropped. To plot this data, we convert the totals from tons to kilotons by dividing by 1000.

In [10]:
grouped = df.groupby('COUNTRY_FLYING_MISSION')[['TOTAL_TONS', 'TONS_OF_HE', 'TONS_OF_IC', 'TONS_OF_FRAG']].sum()
grouped = grouped / 1000

In [11]:
grouped.head()

| COUNTRY_FLYING_MISSION | TOTAL_TONS | TONS_OF_HE | TONS_OF_IC | TONS_OF_FRAG |
|---|---|---|---|---|
| AUSTRALIA | 0.00707 | 0.00695 | 0.00012 | 0.0 |
| GREAT BRITAIN | 325.032458 | 235.5785 | 73.96455 | 0.822 |
| NEW ZEALAND | 0.282185 | 0.713135 | 0.044 | 0.0 |
| SOUTH AFRICA | 0.0 | 0.0 | 0.0 | 0.0 |
| USA | 546.481175 | 461.052175 | 40.867 | 44.429 |
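
The grouping step can be followed by hand on a toy example. The sketch below uses four made-up missions (not THOR values) and the same `groupby`, `sum`, and division by 1000. It also shows that rows with a missing country are left out of the result, which is the default behavior of `groupby`.

```python
import pandas as pd

# Four made-up missions; the last one has no country recorded.
missions = pd.DataFrame({
    "COUNTRY_FLYING_MISSION": ["USA", "USA", "GREAT BRITAIN", None],
    "TOTAL_TONS": [1500.0, 500.0, 1000.0, 250.0],
})

# One row per country, summing tons and converting to kilotons.
grouped = missions.groupby("COUNTRY_FLYING_MISSION")[["TOTAL_TONS"]].sum()
grouped = grouped / 1000

# The grouping column becomes the (alphabetically sorted) index.
print(grouped)
```

Because the country names end up in the index, they are available to Bokeh under the index name once the result is wrapped in a `ColumnDataSource`.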


Now, we need to make a `ColumnDataSource` from our grouped data and create a `figure`. Since our x-axis will list the five countries (rather than numerical data) we need to tell the figure how to handle the x-axis.

To do this, we create a list of countries from our source object, using `source.data` and the column name as key. The list of countries is then passed as the x_range to our `figure` constructor. Because this is a list of text data, the figure knows the x-axis is categorical and it also knows what possible values our x range can take (i.e. AUSTRALIA, GREAT BRITAIN, etc.).

In [12]:
source = ColumnDataSource(grouped)
countries = source.data['COUNTRY_FLYING_MISSION'].tolist()
p = figure(x_range=countries)
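
The order of the list passed as `x_range` is also the order of the categories on the axis. As a sketch of how this could be used, the snippet below sorts made-up country totals from largest to smallest before creating the figure, then checks what kind of range the figure ends up with. The dictionary and its values are hypothetical.

```python
from bokeh.plotting import figure

# Made-up kilotons per country, for illustration only.
kilotons = {"AUSTRALIA": 0.5, "USA": 2.0, "GREAT BRITAIN": 1.0}

# Order the categories from the largest total to the smallest.
countries = sorted(kilotons, key=kilotons.get, reverse=True)

p = figure(x_range=countries)

# A list of strings produces a categorical range whose factors keep the list order.
print(type(p.x_range).__name__)
print(list(p.x_range.factors))
```

Bars drawn on this figure would then appear in descending order of their totals, left to right.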

We also make two new imports: `Spectral5` is a pre-made five-color palette, one of Bokeh’s many [pre-made color palettes](https://bokeh.pydata.org/en/latest/docs/reference/palettes.html), and `factor_cmap` is a helper function for mapping colors to bars in bar charts.

Now we plot our data as individually colored bars and add basic labels. To color our bars we use the factor_cmap helper function. This creates a special color map that matches an individual color to each category (i.e. what Bokeh calls a `factor`). The color map is then passed as the color argument to our `vbar` glyph method.

For the data in our glyph method, we’re passing a source and again referencing column names. Instead of using a `y` parameter, however, the `vbar` method takes a `top` parameter. A `bottom` parameter can equally be specified, but if left out, its default value is 0.

In [13]:
from bokeh.palettes import Spectral5
from bokeh.transform import factor_cmap
color_map = factor_cmap(field_name='COUNTRY_FLYING_MISSION',
                        palette=Spectral5, factors=countries)

p.vbar(x='COUNTRY_FLYING_MISSION', top='TOTAL_TONS', source=source, width=0.70, color=color_map)

p.title.text = 'Munitions Dropped by Allied Country'
p.xaxis.axis_label = 'Country'
p.yaxis.axis_label = 'Kilotons of Munitions'
output_notebook()
show(p)
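
For comparison, here is a rough sketch of the horizontal counterpart using `hbar`. The data is made up, and the roles of the arguments are mirrored: the categories go on `y_range` and `y`, the bar length is given by `right` (with `left` defaulting to 0), and the bar thickness is set by `height` instead of `width`.

```python
from bokeh.models import ColumnDataSource
from bokeh.palettes import Spectral3
from bokeh.plotting import figure
from bokeh.transform import factor_cmap

countries = ["AUSTRALIA", "GREAT BRITAIN", "USA"]
source = ColumnDataSource(data={
    "COUNTRY_FLYING_MISSION": countries,
    "TOTAL_TONS": [0.5, 1.0, 2.0],  # made-up kilotons
})

# The categorical range now belongs to the y-axis.
p = figure(y_range=countries)

color_map = factor_cmap(field_name="COUNTRY_FLYING_MISSION",
                        palette=Spectral3, factors=countries)

# right is the bar length; height is the bar thickness along the category axis.
bars = p.hbar(y="COUNTRY_FLYING_MISSION", right="TOTAL_TONS",
              height=0.7, source=source, color=color_map)

print(type(bars.glyph).__name__)
```

Horizontal bars tend to be easier to read when category names are long, as several of the country names here are.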

In [14]:
hover = HoverTool()
hover.tooltips = [
    ("Totals", "@TONS_OF_HE High Explosive / @TONS_OF_IC Incendiary / @TONS_OF_FRAG Fragmentation")]

hover.mode = 'vline'

p.add_tools(hover)
output_notebook()
show(p)

### Stacked Bar Charts and Sub-sampling Data: Types of Munitions Dropped by Country

Because the previous plot shows that the USA and Great Britain account for the overwhelming majority of bombings, we now focus on these two countries and learn how to make a stacked bar chart that shows the types of munitions each country used.

Since the x-axis is again categorical, we’ll need to group and aggregate our data. This time, though, we need to exclude any records that don’t have a COUNTRY_FLYING_MISSION value of GREAT BRITAIN or USA. To do that, we filter our dataframe.

In [15]:
# Boolean mask selecting USA and Great Britain (named mask to avoid shadowing the built-in filter)
mask = df['COUNTRY_FLYING_MISSION'].isin(('USA', 'GREAT BRITAIN'))
df = df[mask]
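
To see what the `isin` call produces, the sketch below applies the same kind of filter to five made-up rows. The result of `isin` is a boolean Series with one entry per row, and indexing the DataFrame with it keeps only the rows marked `True`; rows with a missing country are marked `False`.

```python
import pandas as pd

# Five made-up missions, one of them without a country.
missions = pd.DataFrame({
    "COUNTRY_FLYING_MISSION": ["USA", "AUSTRALIA", "GREAT BRITAIN", None, "USA"],
    "TONS_OF_HE": [10.0, 1.0, 5.0, 2.0, 7.0],
})

# One True/False value per row.
mask = missions["COUNTRY_FLYING_MISSION"].isin(["USA", "GREAT BRITAIN"])

# Keep only the rows where the mask is True.
subset = missions[mask]

print(mask.tolist())
print(len(subset))
```

The original DataFrame is left untouched by this operation, so the unfiltered data stays available as long as the result is stored under a different name.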

Now that we have reduced the dataframe to show only records for the USA and Great Britain, we group our data with groupby and aggregate the three columns that hold bomb types with sum.

In [16]:
grouped = df.groupby('COUNTRY_FLYING_MISSION')[['TONS_OF_IC', 'TONS_OF_FRAG', 'TONS_OF_HE']].sum()

#convert tons to kilotons again
grouped = grouped / 1000

source = ColumnDataSource(grouped)
countries = source.data['COUNTRY_FLYING_MISSION'].tolist()
p = figure(x_range=countries)

To create the stacked bar chart, we call the `vbar_stack` glyph method. Rather than passing a single column name to a `y` parameter, we instead pass a list of column names as `stackers`. The order of this list determines the order in which the columns are stacked from bottom to top (after you’ve worked through this example, try switching the column order to see what happens). The `legend_label` argument supplies text for each stacker and the `Spectral3` palette provides a color for each stacker. Then we add basic styling and labeling, and finally output the plot.

In [17]:
from bokeh.palettes import Spectral3
p.vbar_stack(stackers=['TONS_OF_HE', 'TONS_OF_FRAG', 'TONS_OF_IC'],
             x='COUNTRY_FLYING_MISSION', source=source,
             legend_label=['High Explosive', 'Fragmentation', 'Incendiary'],
             width=0.5, color=Spectral3)

p.title.text = 'Types of Munitions Dropped by Allied Country'
p.legend.location = 'top_left'

p.xaxis.axis_label = 'Country'
p.xgrid.grid_line_color = None  #remove the x grid lines
p.y_range.start = 0

p.yaxis.axis_label = 'Kilotons of Munitions'
output_notebook()
show(p)
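
A compact way to check how `vbar_stack` treats its list arguments is to look at what it returns. The sketch below uses made-up kilotons for two countries; it assumes only that `vbar_stack` returns one renderer per stacker and that list-valued arguments such as `color` and `legend_label` must have one entry per stacker.

```python
from bokeh.models import ColumnDataSource
from bokeh.palettes import Spectral3
from bokeh.plotting import figure

countries = ["GREAT BRITAIN", "USA"]
stackers = ["TONS_OF_HE", "TONS_OF_FRAG", "TONS_OF_IC"]

# Made-up kilotons per country and munition type.
source = ColumnDataSource(data={
    "COUNTRY_FLYING_MISSION": countries,
    "TONS_OF_HE": [3.0, 5.0],
    "TONS_OF_FRAG": [0.5, 1.0],
    "TONS_OF_IC": [1.5, 0.5],
})

p = figure(x_range=countries)

# One color and one legend label per stacker, in the same order as the stackers.
renderers = p.vbar_stack(
    stackers, x="COUNTRY_FLYING_MISSION", source=source, width=0.5,
    color=Spectral3,
    legend_label=["High Explosive", "Fragmentation", "Incendiary"],
)

print(len(renderers))
```

Reordering `stackers` (together with the matching colors and labels) changes which segment sits at the bottom of each bar.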

### Time-Series and Annotations: Bombing Operations over Time

Let’s now explore the use of incendiary and fragmentation explosives a little more by seeing if there’s any trend in their use over time versus the total munitions dropped.

First, the statement `df['MSNDATE'] = pd.to_datetime(df['MSNDATE'], format='%m/%d/%Y')` makes sure our MSNDATE column is a datetime. This is important because often data loaded from a csv file will not be properly typed as datetime. Supplying the format argument is not required, but doing so significantly speeds up the process.

Second, we pass the argument x_axis_type='datetime' to our figure constructor to tell it that our x data will be datetimes. Otherwise, Bokeh works seamlessly with time data just like any other type of numerical data!
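
Both steps can be tried in isolation on a few strings. In the sketch below, the third value is deliberately invalid, and `errors="coerce"` is an addition (not used in this tutorial's code) that turns unparseable strings into `NaT` instead of raising an error. The last lines check which axis type the figure receives.

```python
import pandas as pd
from bokeh.plotting import figure

# Two dates in the dataset's month/day/year style and one invalid entry.
raw = pd.Series(["8/15/1943", "2/20/1945", "not a date"])

# An explicit format avoids per-value guessing; errors="coerce" yields NaT
# for values that do not match the format.
dates = pd.to_datetime(raw, format="%m/%d/%Y", errors="coerce")
print(dates.isna().tolist())

# x_axis_type="datetime" gives the figure a date-aware x-axis.
p = figure(x_axis_type="datetime")
print(type(p.xaxis[0]).__name__)
```

Without `errors="coerce"`, a value that does not match the format raises an error, which can be preferable when bad dates should not pass silently.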

In [18]:
# Read the data
df = pd.read_csv('https://query.data.world/s/mp3rupjnvdzmyq4nfbsojbqn354phd',nrows=50000)

#make sure MSNDATE is a datetime format
df['MSNDATE'] = pd.to_datetime(df['MSNDATE'], format='%m/%d/%Y')

grouped = df.groupby('MSNDATE')[['TOTAL_TONS', 'TONS_OF_IC', 'TONS_OF_FRAG']].sum()
# grouped = df.groupby(pd.Grouper(key='MSNDATE', freq='M'))[['TOTAL_TONS', 'TONS_OF_IC', 'TONS_OF_FRAG']].sum()
grouped = grouped/1000

source = ColumnDataSource(grouped)

p = figure(x_axis_type='datetime')

p.line(x='MSNDATE', y='TOTAL_TONS', line_width=2, source=source, legend_label='All Munitions')
p.line(x='MSNDATE', y='TONS_OF_FRAG', line_width=2, source=source, color=Spectral3[1], legend_label='Fragmentation')
p.line(x='MSNDATE', y='TONS_OF_IC', line_width=2, source=source, color=Spectral3[2], legend_label='Incendiary')

p.yaxis.axis_label = 'Kilotons of Munitions Dropped'

hover = HoverTool()
hover.tooltips = [
    ("Totals", "@TOTAL_TONS All Munitions / @TONS_OF_IC Incendiary / @TONS_OF_FRAG Fragmentation")]

hover.mode = 'vline'

p.add_tools(hover)

output_notebook()
show(p)
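
Daily totals can make for a noisy line. As a sketch of one way to smooth them, the snippet below aggregates made-up daily values by calendar month using `dt.to_period('M')`. This is a substitute for the commented-out `pd.Grouper` line in the cell above, not the same call, and unlike a frequency-based grouper it produces no rows for months without missions.

```python
import pandas as pd

# Three made-up missions, two of them in the same month.
daily = pd.DataFrame({
    "MSNDATE": pd.to_datetime(["8/15/1943", "8/20/1943", "2/20/1945"],
                              format="%m/%d/%Y"),
    "TOTAL_TONS": [1000.0, 500.0, 2000.0],
})

# Group by calendar month, sum, and convert tons to kilotons.
month = daily["MSNDATE"].dt.to_period("M")
monthly = daily.groupby(month)[["TOTAL_TONS"]].sum() / 1000

# Convert the monthly periods back to timestamps (start of each month)
# so the result can be plotted on a datetime axis.
monthly.index = monthly.index.to_timestamp()

print(monthly)
```

The resulting DataFrame can be wrapped in a `ColumnDataSource` and plotted with `p.line` in the same way as the daily data.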

