# INTRODUCTION
<a href="http://ibb.co/hxXdKx"><img src="http://preview.ibb.co/cgA9Rc/ww2.png" alt="ww2" border="0"></a>

* In this kernel, we use multiple data sources: **aerial bombing operations** and **weather conditions in World War 2**.
* From here on, I will use the acronym WW2 for World War 2.
* We will start with **data description and cleaning**, then visualize the data to understand it better. Together, these steps are called **EDA (Exploratory Data Analysis)**.
* After that, we will focus on **time series prediction** to predict when bombing operations take place.
* For time series prediction, we will use the **ARIMA** method, presented as a tutorial.
 
 <br> <font color='blue'> Content: 
    * [Load the Data](#1)
    * [Data Description](#2)
    * [Data Cleaning](#3)
    * [Data Visualization](#4)
    * [Time Series Prediction with ARIMA](#5)
        * [What is a Time Series?](#6)
        * [Stationarity of a Time Series](#7)
        * [Make a Time Series Stationary](#8)
            * Moving Average method
            * Differencing method
        * [Forecasting a Time Series](#9)
    * [Conclusion](#10)


In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns # visualization library
import matplotlib.pyplot as plt # visualization library
import plotly.graph_objs as go # plotly graphical object

In [None]:
# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory
import os
print(os.listdir("../input"))
# suppress warnings so they do not clutter the notebook output
import warnings
warnings.filterwarnings("ignore")
plt.style.use('ggplot') # use the ggplot style for all matplotlib plots
# Any results you write to the current directory are saved as output.

<a id="1"></a> <br>
## Load the Data
* We use multiple data sources.
    * Aerial Bombing Operations in WW2
        * In short, this data describes bombing operations. For example, in 1945 the USA bombed Berlin, Germany, with A36 aircraft flying from Ponte Olivo Airfield.
    * Weather Conditions in WW2
        * In short, this data describes weather conditions during WW2. For example, according to the George Town weather station, the mean temperature on 1/7/1942 was 23.88.
        * This dataset has two subsets. The first one includes weather station locations, such as country, latitude and longitude.
        * The second one includes the minimum, maximum and mean temperatures measured at the weather stations.

In [None]:
# bombing data
aerial = pd.read_csv("../input/operations/operations.csv")
# first weather data that includes locations like country, latitude and longitude.
weather_station_location = pd.read_csv("../input/Weather Station Locations.csv")

Now load the second weather dataset, <u>Summary of Weather.csv</u>, and assign it to a variable called `weather`.

In [None]:
# Write your code here. Second weather data that includes measured min, max and mean temperatures. 


<a id="2"></a> <br>
## Data Description
Only the data features used in this kernel are listed.
* **Aerial bombing Data description:**
    * Mission Date: Date of mission
    * Theater of Operations: Region in which active military operations were in progress
    * Country: Country that carried out the mission like USA
    * Air Force: Name or ID of the air force unit like 5AF
    * Aircraft Series: Model or type of aircraft like B24
    * Callsign: Message, code, announcement, or tune broadcast by radio before the bomb attack
    * Takeoff Base: Takeoff airport name like Ponte Olivo Airfield
    * Takeoff Location: Takeoff region like Sicily
    * Takeoff Latitude: Latitude of takeoff region
    * Takeoff Longitude: Longitude of takeoff region
    * Target Country: Target country like Germany
    * Target City: Target city like Berlin
    * Target Type: Type of target like city area
    * Target Industry: Target industry like town or urban
    * Target Priority: Target priority like 1 (most)
    * Target Latitude: Latitude of target 
    * Target Longitude: Longitude of target
* **Weather Condition data description:**
    * Weather station location:
        * WBAN: Weather station number
        * NAME: weather station name
        * STATE/COUNTRY ID: Abbreviation of the state or country
        * Latitude: Latitude of weather station
        * Longitude: Longitude of weather station
    * Weather:
        * STA: Weather station number (WBAN)
        * Date: Date of the temperature measurement
        * MeanTemp: Mean temperature
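
Since STA in the measurements is the station's WBAN number, the two weather subsets can be linked on those columns. A minimal sketch of that relationship, assuming pandas and using made-up toy rows (the names `stations`, `weather_with_location` and all ids, names and values are hypothetical, not taken from the real files):

```python
import pandas as pd

# Hypothetical toy rows shaped like the two weather subsets (values are made up)
stations = pd.DataFrame({
    "WBAN": [11111, 22222],
    "NAME": ["STATION_A", "STATION_B"],
    "Latitude": [10.0, 20.0],
    "Longitude": [30.0, 40.0],
})
weather = pd.DataFrame({
    "STA": [11111, 11111, 22222],
    "Date": ["1943-1-1", "1943-1-2", "1943-1-1"],
    "MeanTemp": [20.5, 21.0, 15.2],
})

# STA in the measurements refers to WBAN in the station table, so a left join
# attaches each station's name and coordinates to every one of its readings
weather_with_location = weather.merge(stations, left_on="STA", right_on="WBAN", how="left")
print(weather_with_location[["STA", "NAME", "Date", "MeanTemp"]])
```

A left join keeps every measurement even if its station were missing from the location table (its name and coordinates would then be NaN).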

<a id="3"></a> <br>
## Data Cleaning
* The aerial bombing data contains many NaN values. Rather than working around them, we drop rows with NaN in key columns. This removes uncertainty and also eases the visualization process.
    * Drop rows where the country is NaN
    * Drop rows where the target longitude is NaN
    * Drop rows where the takeoff longitude is NaN
    * Drop unused features
* The weather condition data does not need any cleaning. Based on the exploratory data analysis and visualization, we will choose a certain location to examine more closely. However, let's keep only the columns we use.
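
A small sketch of the NaN-filtering idioms, assuming pandas and a made-up toy frame (`toy`, `masked`, `dropped` and `clean` are hypothetical names): a `notna()` boolean mask keeps the same rows as `dropna(subset=...)` on that column, and `dropna` can check several key columns in one call before unused features are dropped by name.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a few of the aerial bombing columns (values made up)
toy = pd.DataFrame({
    "Country": ["USA", np.nan, "GREAT BRITAIN", "USA"],
    "Target Longitude": [13.4, 10.0, np.nan, 96.3],
    "Takeoff Longitude": [14.3, 14.3, 1.0, np.nan],
    "Mission ID": [1, 2, 3, 4],
})

# A notna() mask keeps only the rows whose Country value is present ...
masked = toy[toy["Country"].notna()]
# ... which gives the same rows as dropna restricted to that column
dropped = toy.dropna(subset=["Country"])
print(masked.equals(dropped))

# Several key columns can be checked at once, then unused features dropped by name
clean = toy.dropna(subset=["Country", "Target Longitude", "Takeoff Longitude"])
clean = clean.drop(columns=["Mission ID"])
print(clean)
```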

In [None]:
# Drop countries that are NaN. Assign it back to the original variable. Insert your code below.


In [None]:
# drop if target longitude is NaN
aerial = aerial[pd.isna(aerial['Target Longitude'])==False]

# Drop if takeoff longitude is NaN. Insert code here.


In [None]:
# drop unused features
drop_list = ['Mission ID','Unit ID','Target ID','Altitude (Hundreds of Feet)','Airborne Aircraft',
             'Attacking Aircraft', 'Bombing Aircraft', 'Aircraft Returned',
             'Aircraft Failed', 'Aircraft Damaged', 'Aircraft Lost',
             'High Explosives', 'High Explosives Type','Mission Type',
             'High Explosives Weight (Pounds)', 'High Explosives Weight (Tons)',
             'Incendiary Devices', 'Incendiary Devices Type',
             'Incendiary Devices Weight (Pounds)',
             'Incendiary Devices Weight (Tons)', 'Fragmentation Devices',
             'Fragmentation Devices Type', 'Fragmentation Devices Weight (Pounds)',
             'Fragmentation Devices Weight (Tons)', 'Total Weight (Pounds)',
             'Total Weight (Tons)', 'Time Over Target', 'Bomb Damage Assessment','Source ID']
aerial.drop(drop_list, axis=1, inplace=True)
aerial = aerial[aerial['Takeoff Latitude'] != "4248"] # drop rows with an invalid takeoff latitude ("4248")
aerial = aerial[aerial['Takeoff Longitude'] != 1355]  # drop rows with an invalid takeoff longitude (1355)

Select only the columns "WBAN", "NAME", "STATE/COUNTRY ID", "Latitude" and "Longitude", and assign the result back to the variable `weather_station_location`.

In [None]:
 # Insert your code here. 


<a id="4"></a> <br>
## Data Visualization
* Let's start with the basics of visualization, which help us understand the data.
    * Number of missions flown by each attacking country
    * Top target countries
    * Top 10 aircraft series
    * Takeoff base locations (attacking countries)
    * Target locations (if you are not familiar with plotly methods, see my plotly tutorial: https://www.kaggle.com/kanncaa1/plotly-tutorial-for-beginners)
    * Bombing paths
    * Theater of Operations
    * Weather station locations

In [None]:
# country
print(aerial['Country'].value_counts())
plt.figure(figsize=(22,10))
sns.countplot(x=aerial['Country'])
plt.show()

In [None]:
# Top target countries
print(aerial['Target Country'].value_counts()[:10])
plt.figure(figsize=(22,10))
sns.countplot(x=aerial['Target Country'])
plt.xticks(rotation=90)
plt.show()

Visualise the count of each "Aircraft Series" value in the `aerial` dataset, using a bar chart to show the distribution of the top 10.

In [None]:
# Aircraft Series
data = aerial['Aircraft Series'].value_counts()
print(data[:10])
data = [go.Bar(
            x=data[:10].index,
            y=data[:10].values,
            hoverinfo = 'x+y',
            marker = dict(color = 'rgba(177, 14, 22, 0.5)',
                             line=dict(color='rgb(0,0,0)',width=1.5)),
    )]

layout = dict(
    title = 'Aircraft Series',
)
fig = go.Figure(data=data, layout=layout)
fig.show()

In [None]:
aerial.head()

Let's visualise the countries from which the bombing missions took off. We refer to this as the attack.
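
The per-country colors can also be assigned in a single step with a lookup table. A minimal sketch, assuming pandas' `Series.map` and a hypothetical toy frame (`toy` and `country_colors` are illustrative names; "AUSTRALIA" is only there to show an unmapped country):

```python
import pandas as pd

# Hypothetical toy rows; the colors mirror the per-country assignments in the next cell
toy = pd.DataFrame({"Country": ["USA", "GREAT BRITAIN", "SOUTH AFRICA", "USA", "AUSTRALIA"]})
country_colors = {
    "USA": "rgb(0,116,217)",
    "GREAT BRITAIN": "rgb(255,65,54)",
    "NEW ZEALAND": "rgb(133,20,75)",
    "SOUTH AFRICA": "rgb(255,133,27)",
}

# map() looks up every country in the dict; countries missing from it become NaN,
# so fillna("") reproduces the empty-string default color
toy["color"] = toy["Country"].map(country_colors).fillna("")
print(toy)
```

Keeping the colors in one dict also makes it easy to reuse the same mapping for a legend or another plot.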

In [None]:
## ATTACK. Assign different colors for different Country. 
aerial["color"] = ""
aerial.loc[aerial.Country == "USA", "color"] = "rgb(0,116,217)"
aerial.loc[aerial.Country == "GREAT BRITAIN", "color"] = "rgb(255,65,54)"
aerial.loc[aerial.Country == "NEW ZEALAND", "color"] = "rgb(133,20,75)"
aerial.loc[aerial.Country == "SOUTH AFRICA", "color"] = "rgb(255,133,27)"

## We are using a Scatter Plot with Geographical properties from the Plotly library called 'scattergeo'.  
data = [dict(
    type='scattergeo',
    lon = aerial['Takeoff Longitude'], #The takeoff longitude
    lat = aerial['Takeoff Latitude'], # The takeoff latitude
    hoverinfo = 'text',
    text = "Country: " + aerial.Country + " Takeoff Location: "+aerial["Takeoff Location"]+" Takeoff Base: " + aerial['Takeoff Base'],
    mode = 'markers',
    marker=dict(
        sizemode = 'area',
        sizeref = 1,
        size= 10 ,
        line = dict(width=1,color = "white"),
        color = aerial["color"],
        opacity = 0.7),
)]

# In Plotly, we can also customise the layout of the visualisation.
layout = dict(
    title = 'Countries Take Off Bases ',
    hovermode='closest',
    geo = dict(showframe=False, showland=True, showcoastlines=True, showcountries=True,
               countrywidth=1, projection=dict(type='mercator'),
              landcolor = 'rgb(217, 217, 217)',
              subunitwidth=1,
              showlakes = True,
              lakecolor = 'rgb(255, 255, 255)',
              countrycolor="rgb(5, 5, 5)")
)
# Create a Plotly Figure from the data and layout, then display the scattergeo map with fig.show().
fig = go.Figure(data=data, layout=layout)
fig.show()

Now let's visualize the <u>bombing paths</u>: which country bombed which **countries and cities**, and from which <u>takeoff</u> base.

In [None]:
# Bombing paths
# trace1: Takeoff Location
airports = [ dict(
        type = 'scattergeo',
        lon = aerial['Takeoff Longitude'],
        lat = aerial['Takeoff Latitude'],
        hoverinfo = 'text',
        text = "Country: " + aerial.Country + " Takeoff Location: "+aerial["Takeoff Location"]+" Takeoff Base: " + aerial['Takeoff Base'],
        mode = 'markers',
        marker = dict( 
            size=5, 
            color = aerial["color"],
            line = dict(
                width=1,
                color = "white"
            )
        ))]

# trace 2: Target city. Complete the code below
targets = 

# trace 3: Flight paths. Build one line trace per mission, from the takeoff point to the target.
flight_paths = []
for i in range( len( aerial['Target Longitude'] ) ):
    flight_paths.append(
        dict(
            type = 'scattergeo',
            lon = [ aerial.iloc[i,9], aerial.iloc[i,16] ], # Takeoff Longitude, Target Longitude
            lat = [ aerial.iloc[i,8], aerial.iloc[i,15] ], # Takeoff Latitude, Target Latitude.
            mode = 'lines',
            line = dict(
                width = 0.7,
                color = 'black',
            ),
            opacity = 0.6,
        )
    )
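
The loop above creates one trace per mission, which may become slow to render when there are thousands of missions. One common alternative is to put all paths into a single line trace and separate the segments with `None`, which Plotly line traces draw as gaps while `connectgaps` stays at its default of False. A minimal sketch of building those coordinate lists in plain Python, with hypothetical toy coordinates (`missions`, `path_lon` and `path_lat` are illustrative names):

```python
# Hypothetical toy coordinates: (takeoff lon, takeoff lat, target lon, target lat)
missions = [
    (14.3, 37.1, 13.4, 52.5),
    (91.0, 26.0, 96.3, 24.2),
]

# Collect every segment into one pair of lists instead of one trace per mission.
# The None after each segment breaks the line, so the paths are not joined together.
path_lon, path_lat = [], []
for takeoff_lon, takeoff_lat, target_lon, target_lat in missions:
    path_lon += [takeoff_lon, target_lon, None]
    path_lat += [takeoff_lat, target_lat, None]

print(path_lon)
print(path_lat)
# These lists could then feed a single trace such as
# dict(type="scattergeo", lon=path_lon, lat=path_lat, mode="lines")
```

The trade-off is that a single trace shares one line style and one hover setting for all paths.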

In [None]:
# Theater of Operations. What can you conclude?
print(aerial['Theater of Operations'].value_counts())
plt.figure(figsize=(22,10))
sns.countplot(x=aerial['Theater of Operations'])
plt.show()

In [None]:
# Weather Station Locations. Using the weather_station_location data, fill in the longitude and latitude in the missing fields below. 
# What can you conclude about the weather_station_location?
data = [dict(
    type='scattergeo',
    lon = 
    lat = 
    hoverinfo = 'text',
    text = "Name: " + weather_station_location.NAME + " Country: " + weather_station_location["STATE/COUNTRY ID"],
    mode = 'markers',
    marker=dict(
        sizemode = 'area',
        sizeref = 1,
        size= 8 ,
        line = dict(width=1,color = "white"),
        color = "blue",
        opacity = 0.7),
)]
layout = dict(
    title = 'Weather Station Locations ',
    hovermode='closest',
    geo = dict(showframe=False, showland=True, showcoastlines=True, showcountries=True,
               countrywidth=1, projection=dict(type='mercator'),
              landcolor = 'rgb(217, 217, 217)',
              subunitwidth=1,
              showlakes = True,
              lakecolor = 'rgb(255, 255, 255)',
              countrycolor="rgb(5, 5, 5)")
)
fig = go.Figure(data=data, layout=layout)
fig.show()


## Focus on USA and BURMA War

* In this war, the USA bombed Burma (the city of Katha) from 1942 to 1945.
* The weather station closest to this war zone is **BINDUKURI**, which has temperature records from 1943 to 1945.
* Now let's visualize this situation. Before plotting, we need to convert the date columns to datetime objects.
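
A minimal sketch of the datetime conversion, assuming pandas' `pd.to_datetime` and hypothetical toy date strings (`mission_dates`, `weather_dates` and the `*_parsed` names are illustrative). The month/day/year layout matches what the notebook's `split("/")` code assumes for "Mission Date"; the second layout is only an example of a different text format:

```python
import pandas as pd

# Hypothetical toy date strings in two layouts: month/day/year and year-month-day
mission_dates = pd.Series(["8/15/1943", "12/1/1943"])
weather_dates = pd.Series(["1943-08-15", "1943-12-01"])

# An explicit format documents the expected layout and raises on unexpected strings
mission_parsed = pd.to_datetime(mission_dates, format="%m/%d/%Y")
weather_parsed = pd.to_datetime(weather_dates, format="%Y-%m-%d")

# Once both are datetime64, the same calendar day compares equal across layouts
print((mission_parsed == weather_parsed).tolist())
# and the .dt accessor exposes date parts such as the year and month
print(mission_parsed.dt.year.tolist(), mission_parsed.dt.month.tolist())
```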

In [None]:
# .WBAN returns a Series; .iloc[0] extracts the single station id as a scalar
weather_station_id = weather_station_location[weather_station_location.NAME == "BINDUKURI"].WBAN.iloc[0]

Using `weather_station_id`, create a new dataframe called `weather_bin` containing the rows where `weather.STA == weather_station_id`.
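
A selection such as `weather_station_location[...].WBAN` returns a Series even when only one station matches, and comparing a column against a Series with a different index raises an error. A minimal sketch on hypothetical toy tables (`stations`, `weather`, `id_series`, `station_id` and `weather_a` are illustrative names, and the ids are made up) showing why the filter should compare against a single scalar id:

```python
import pandas as pd

# Hypothetical toy tables (ids and names are made up)
stations = pd.DataFrame({"WBAN": [11111, 22222], "NAME": ["STATION_A", "STATION_B"]})
weather = pd.DataFrame({"STA": [11111, 22222, 11111], "MeanTemp": [20.0, 15.0, 21.0]})

# Boolean indexing plus .WBAN returns a one-row Series, not a single number
id_series = stations[stations.NAME == "STATION_A"].WBAN
station_id = id_series.iloc[0]  # extract the scalar id

# Comparing a column with a scalar broadcasts the comparison over every row
weather_a = weather[weather.STA == station_id]
print(weather_a)

# Comparing with the one-row Series instead raises, because the indexes differ
try:
    weather[weather.STA == id_series]
except ValueError as err:
    print("ValueError:", err)
```

`weather.STA.isin(id_series)` would also work if several stations could match.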

In [None]:
# Complete the code below
weather_bin = 
## Convert the date column to datetime object
weather_bin["Date"] = 

In [None]:
# Now, let's visualise the mean temperature of the Bindukuri location
plt.figure(figsize=(22,10))
plt.plot(weather_bin.Date,weather_bin.MeanTemp)
plt.title("Mean Temperature of Bindukuri Area")
plt.xlabel("Date")
plt.ylabel("Mean Temperature")
plt.show()

* As you can see, we have temperature measurements from 1943 to 1945.
* The mean temperature oscillates between 12 and 32 degrees.
* Winter months are colder than summer months.
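
To back the winter/summer observation with numbers rather than by eye, one option is to average MeanTemp per calendar month. A sketch on hypothetical synthetic data (a cosine seasonal cycle, NOT the real Bindukuri records; `weather_toy` and `monthly_mean` are illustrative names). With the real data, `weather_bin` would take the place of `weather_toy`, assuming its Date column has already been converted to datetime:

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data: two years of daily temperatures following a
# cosine seasonal cycle that peaks in mid-July (not real measurements)
dates = pd.date_range("1943-01-01", "1944-12-31", freq="D")
day_of_year = dates.dayofyear.to_numpy()
season = np.cos(2 * np.pi * (day_of_year - 196) / 365.25)
weather_toy = pd.DataFrame({"Date": dates, "MeanTemp": 22 + 10 * season})

# Average all days that share a calendar month, pooling both years
monthly_mean = weather_toy.groupby(weather_toy["Date"].dt.month)["MeanTemp"].mean()
print(monthly_mean.round(1))
```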

### Let's select the USA as the attacking country, Burma as the target country, and Katha as the target city

Reload the <u>operations</u> data as `aerial`

In [None]:
# Load the data here
aerial = pd.read_csv("")

# Let's look at the data.
aerial.head()

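Convert the "Mission Date" column to datetime, create `year` and `month` columns from it, keep only missions from August 1943 onwards, and select the USA missions against Katha (Burma) as `aerial_war`.

In [None]:
# parse the M/D/YYYY strings so dates compare chronologically, not as text
aerial["Mission Date"] = pd.to_datetime(aerial["Mission Date"])
aerial["year"] = aerial["Mission Date"].dt.year
aerial["month"] = aerial["Mission Date"].dt.month
aerial = aerial[aerial["Mission Date"] >= "1943-08-01"]  # August 1943 onwards
# USA missions against the city of Katha in Burma
aerial_war = aerial[(aerial.Country == "USA") & (aerial["Target Country"] == "BURMA") & (aerial["Target City"] == "KATHA")]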
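
Why compare parsed dates rather than the split strings? String comparison is lexicographic, so a filter such as `month >= "8"` silently drops October to December, and a month condition applied to every year also removes January to July of later years. A small plain-Python sketch with hypothetical toy values (`months`, `dates` and the `kept_*`, `separate`, `combined` names are illustrative):

```python
# Strings compare character by character, so "10" sorts before "8"
months = ["8", "9", "10", "11", "12"]
kept_as_strings = [m for m in months if m >= "8"]
kept_as_numbers = [m for m in months if int(m) >= 8]
print(kept_as_strings)   # ['8', '9']
print(kept_as_numbers)   # ['8', '9', '10', '11', '12']

# Filtering year and month separately also drops early months of later years;
# a single (year, month) tuple comparison keeps everything from August 1943 on
dates = [(1943, 7), (1943, 8), (1944, 2), (1945, 5)]
separate = [d for d in dates if d[0] >= 1943 and d[1] >= 8]
combined = [d for d in dates if d >= (1943, 8)]
print(separate)   # [(1943, 8)]
print(combined)   # [(1943, 8), (1944, 2), (1945, 5)]
```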

For each attack date in `Mission Date`, look up the MeanTemp recorded on that date in the `weather_bin` dataframe.
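
The loop in the next cell matches dates one at a time. A vectorized alternative is a left join on the date. A sketch on hypothetical toy tables (`weather_bin_toy`, `missions_toy` and `joined` are illustrative names, and the values are made up), assuming both date columns are already datetime and the station has at most one record per day, since duplicate dates would duplicate missions:

```python
import numpy as np
import pandas as pd

# Hypothetical toy tables (dates and temperatures are made up)
weather_bin_toy = pd.DataFrame({
    "Date": pd.to_datetime(["1944-03-01", "1944-03-02", "1944-03-03"]),
    "MeanTemp": [25.0, 26.5, 24.0],
})
missions_toy = pd.DataFrame({
    "Mission Date": pd.to_datetime(["1944-03-02", "1944-03-02", "1944-03-05"]),
})

# A left join keeps every mission and attaches the temperature measured on the
# same day; days without a measurement get NaN instead of raising an error
joined = missions_toy.merge(weather_bin_toy, left_on="Mission Date", right_on="Date", how="left")
print(joined[["Mission Date", "MeanTemp"]])
```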

In [None]:
# meantemp_list collects the MeanTemp values matched for each mission date;
# date_of_attack (despite its name) holds the mean temperature on each attack date
meantemp_list = []
date_of_attack = []

# For every mission date, find the Bindukuri MeanTemp measured on the same day
for each in aerial_war["Mission Date"]:
    dummy = weather_bin[weather_bin.Date == each]
    temps = dummy["MeanTemp"].values
    meantemp_list.append(temps)
    # use NaN when the station has no record for that day
    date_of_attack.append(temps[0] if len(temps) > 0 else np.nan)

Visualise the mean temperature on the days of the attacks.


In [None]:
# Create a trace
trace = go.Scatter(
    x = weather_bin.Date,
    mode = "lines",
    y = weather_bin.MeanTemp,
    marker = dict(color = 'rgba(16, 112, 2, 0.8)'),
    name = "Mean Temperature"
)
trace1 = go.Scatter(
    x = aerial_war["Mission Date"],
    mode = "markers",
    y = date_of_attack,
    marker = dict(color = 'rgba(16, 0, 200, 1)'),
    name = "Bombing temperature"
)
layout = dict(title = 'Mean Temperature --- Bombing Dates and Mean Temperature at this Date')
data = [trace,trace1]
fig = go.Figure(data=data, layout=layout)
fig.show()

* The green line is the mean temperature measured at Bindukuri.
* The blue markers are the bombing dates and the mean temperature on each of those dates.
* As the plot shows, the USA bombed at high temperatures.
    * The question is: can we predict future weather and, based on that prediction, tell whether a bombing will take place?
    * To answer this question, let's start with time series prediction.
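
The observation about high temperatures can be checked numerically by comparing temperatures on bombing days with all recorded days. A sketch on hypothetical toy values (not real measurements; `all_days`, `bombing_days` and `summary` are illustrative names). With the real data, `weather_bin.MeanTemp` and the per-mission temperatures collected in `date_of_attack` would take their places:

```python
import numpy as np
import pandas as pd

# Hypothetical toy values, not real measurements
all_days = pd.Series([14.0, 18.0, 24.0, 29.0, 31.0, 30.0, 27.0, 20.0])
bombing_days = pd.Series([29.0, 31.0, 30.0, np.nan])  # NaN: no record on that day

# Compare the typical temperature on bombing days with the overall typical temperature
summary = pd.DataFrame({
    "all days": all_days.describe(),
    "bombing days": bombing_days.dropna().describe(),
})
print(summary.loc[["count", "mean", "50%"]])
```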