# Predicting Terrorist Attacks
## Exploratory Data Analysis

**Author:** Thomas Skowronek

**Date:** March 23, 2018

### Notebook Configuration

In [31]:
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

In [32]:
# Configure notebook output
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

pd.set_option('display.max_rows', 150)
pd.set_option('display.max_columns', 150)

mpl.rcParams['figure.figsize'] = (14.6, 9.0)

### Load the Dataset
Load the dataset created by the preprocessing notebook.

In [33]:
# Load the preprocessed GTD dataset
gtd_df = pd.read_csv('../data/gtd_preprocessed_95t016.csv', low_memory=False, index_col=0,
                     na_values=[''])
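
To illustrate what `index_col=0` and `na_values=['']` do in the call above, here is a minimal sketch on a small in-memory CSV. The sample rows are made up for illustration and are not taken from the GTD file; only the column names mirror the dataset.

```python
import io
import pandas as pd

# Hypothetical three-row sample; the first column plays the role of eventid.
sample_csv = io.StringIO(
    "eventid,iyear,city\n"
    "199501000001,1995,Lima\n"
    "199501000002,1995,\n"
    "199501010003,1995,Paris\n"
)

# index_col=0 promotes the first column to the row index.
# na_values=[''] reads empty fields as NaN (pandas also does this by default).
sample_df = pd.read_csv(sample_csv, index_col=0, na_values=[''])

print(sample_df.index.name)            # name of the column used as the index
print(sample_df['city'].isna().sum())  # number of missing city values
```

Because the first column becomes the index, it no longer appears among the data columns, which is why the column count reported by `info()` is one less than the number of attributes in the file.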

### Inspect the Structure
The cleansed data frame contains 56 attributes, one of which is used for the data frame index, and 112,251 observations.

In [34]:
# Display a summary of the data frame
gtd_df.info(verbose = True)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 112251 entries, 199501000001 to 201701270001
Data columns (total 55 columns):
iyear               112251 non-null int64
imonth              112251 non-null int64
iday                112251 non-null int64
extended            112251 non-null int64
country             112251 non-null int64
country_txt         112251 non-null object
region              112251 non-null int64
region_txt          112251 non-null object
provstate           109653 non-null object
city                111805 non-null object
latitude            110844 non-null float64
longitude           110844 non-null float64
specificity         112247 non-null float64
vicinity            112251 non-null int64
summary             102988 non-null object
crit1               112251 non-null int64
crit2               112251 non-null int64
crit3               112251 non-null int64
doubtterr           112251 non-null int64
multiple            112251 non-null int64
success             1
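
The non-null counts above show that some attributes, such as `provstate`, `latitude`, and `summary`, have missing values. A minimal sketch of one way to tabulate them follows. The helper name `missing_summary` and the small stand-in data frame are hypothetical; with the real data, the function would be called on `gtd_df`.

```python
import numpy as np
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values for columns that have any."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        'missing': counts,
        'percent': (counts / len(df) * 100).round(2),
    })
    # Keep only columns with at least one missing value, largest first.
    return summary[summary['missing'] > 0].sort_values('missing', ascending=False)

# Hypothetical stand-in for gtd_df
demo_df = pd.DataFrame({
    'iyear': [1995, 1996, 1997, 1998],
    'provstate': ['Lima', None, 'Paris', None],
    'latitude': [12.0, np.nan, 48.9, 40.4],
})
print(missing_summary(demo_df))
```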

### Code Book
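pandas assigns some attributes a data type that differs from the type given in the code book provided by START. For example, attributes the code book treats as categorical are stored as int64 or float64 codes.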

|ATTRIBUTE|PYTHON DTYPE|CODE BOOK TYPE|DEFINITION|
|:----------------|:--------|:------------|:---------------------------------------|
|eventid|int64|Numeric|12-digit Event ID system. First 8 numbers - date recorded "yyyymmdd". Last 4 numbers - sequential case number for the given day|
|iyear|int64|Numeric|The year in which the incident occurred|
|imonth|int64|Numeric|The month in which the incident occurred.  When the exact month of the incident is unknown, this will be recorded as "0".|
|iday|int64|Numeric|The numeric day of the month on which the incident occurred.  When the exact day of the incident is unknown, the field is recorded as "0".|
|extended|int64|Categorical|The duration of an incident extended more than 24 hours.  1 = YES, 0 = NO|
|country|int64|Categorical|Identifies the country or location where the incident occurred.  When the location cannot be identified, it is coded as "Unknown".|
|country_txt|object|Categorical|Identifies the country or location where the incident occurred.  When the location cannot be identified, it is coded as "Unknown".|
|region|int64|Categorical|Identifies the region in which the incident occurred, coded as 1 of 12 categories|
|region_txt|object|Categorical|Identifies the region in which the incident occurred, coded as 1 of 12 categories|
|provstate|object|Text|The name (at the time of event) of the 1st order subnational administrative region in which the event occurs|
|city|object|Text|The name of the city, village, or town in which the incident occurred.  If unknown, then this field contains the smallest administrative area below provstate|
|latitude|float64|Numeric|The latitude (based on WGS1984 standards) of the city in which the event occurred|
|longitude|float64|Numeric|The longitude (based on WGS1984 standards) of the city in which the event occurred.|
|specificity|float64|Categorical|Identifies the geospatial resolution of the latitude and longitude fields. 1 to 5|
|vicinity|int64|Categorical|1 = YES, The incident occurred in the immediate vicinity of the city in question.  0 = NO, The incident occurred in the city itself.|
|summary|object|Text|A brief narrative summary of the incident, noting the "when, where, who, what, how, and why."|
|crit1|int64|Categorical|The violent act must be aimed at attaining a political, economic, religious, or social goal. 1 = YES, 0 = NO|
|crit2|int64|Categorical|There must be evidence of an intention to coerce, intimidate, or convey some other message to a larger audience than the immediate victims. 1 = YES, 0 = NO|
|crit3|int64|Categorical|The action is outside the context of legitimate warfare activities, insofar as it targets non-combatants.  1 = YES, 0 = NO|
|doubtterr|int64|Categorical|There is doubt as to whether the incident is an act of terrorism.  1 = YES, 0 = NO|
|multiple|int64|Categorical|Denotes that the particular attack was part of a "multiple" incident.  1 = YES, 0 = NO|
|success|int64|Categorical|A successful attack depends on the type of attack. The key question is whether or not the attack type took place.  1 = YES, 0 = NO|
|suicide|int64|Categorical|Coded "Yes" in those cases where there is evidence that the perpetrator did not intend to escape from the attack alive. 1 = YES, 0 = NO|
|attacktype1|int64|Categorical|The general method of attack, which often reflects the broad class of tactics used. 9 categories|
|attacktype1_txt|object|Categorical|The general method of attack, which often reflects the broad class of tactics used. 9 categories|
|targtype1|int64|Categorical|The general type of target/victim.  22 categories|
|targtype1_txt|object|Categorical|The general type of target/victim.  22 categories|
|targsubtype1|float64|Categorical|The more specific target category, which provides the next level of designation for each target type. If a target subtype is not applicable, this variable is left blank|
|targsubtype1_txt|object|Categorical|The more specific target category, which provides the next level of designation for each target type. If a target subtype is not applicable, this variable is left blank|
|corp1|object|Text|The corporate entity or government agency that was targeted|
|target1|object|Text|The specific person, building, installation, etc., that was targeted and/or victimized|
|natlty1|float64|Categorical|The nationality of the target that was attacked.  For hijacking incidents, the nationality of the plane is recorded|
|natlty1_txt|object|Categorical|The nationality of the target that was attacked.  For hijacking incidents, the nationality of the plane is recorded|
|gname|object|Text|The name of the group that carried out the attack|
|guncertain1|float64|Categorical|Indicates whether or not the information reported about the Perpetrator Group Name(s) is based on speculation or dubious claims of responsibility.  1 = YES, 0 = NO|
|individual|int64|Categorical|Indicates whether or not the attack was carried out by an individual or several individuals not known to be affiliated with a group or organization. 1 = YES, 0 = NO|
|nperpcap|float64|Numeric|The number of perpetrators taken into custody. "-99" or "Unknown" appears when there is evidence of capture, but the number is not reported|
|claimed|float64|Categorical|Indicates whether a group or person(s) claimed responsibility for the attack.  1 = YES, 0 = NO|
|weaptype1|int64|Categorical|Records the general type of weapon used in the incident.  Up to four weapon types are recorded for each incident|
|weaptype1_txt|object|Categorical|Records the general type of weapon used in the incident.  Up to four weapon types are recorded for each incident|
|weapsubtype1|float64|Categorical|A more specific value for most of the Weapon Types identified|
|weapsubtype1_txt|object|Categorical|A more specific value for most of the Weapon Types identified|
|nkill|float64|Numeric|Total confirmed fatalities for the incident|
|nkillus|float64|Numeric|The number of U.S. citizens who died as a result of the incident|
|nkillter|float64|Numeric|Limited to only perpetrator fatalities|
|nwound|float64|Numeric|The number of confirmed non-fatal injuries to both perpetrators and victims|
|nwoundus|float64|Numeric|The number of confirmed non-fatal injuries to U.S. citizens, both perpetrators and victims|
|nwoundte|float64|Numeric|Number of Perpetrators Injured|
|property|int64|Categorical|There is evidence of property damage from the incident.  1 = YES, 0 = NO|
|ishostkid|float64|Categorical|Whether or not the victims were taken hostage or kidnapped during an incident. 1 = YES, 0 = NO|
|scite1|object|Text|Cites the first source that was used to compile information on the specific incident|
|dbsource|object|Text|Identifies the original data collection effort in which each event was recorded|
|INT_LOG|int64|Categorical|It indicates whether a perpetrator group crossed a border to carry out an attack (logistically international).  1 = YES, 0 = NO, -9=UNKNOWN|
|INT_IDEO|int64|Categorical|It indicates whether a perpetrator group attacked a target of a different nationality (ideologically international). 1 = YES, 0 = NO, -9=UNKNOWN|
|INT_MISC|int64|Categorical|It indicates whether a perpetrator group attacked a target of a different nationality (not clear if logistically or ideologically international).  1 = YES, 0 = NO, -9=UNKNOWN|
|INT_ANY|int64|Categorical|The attack was international on any of the dimensions.  1 = YES, 0 = NO, -9=UNKNOWN|
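
Since many attributes that the code book marks as Categorical are stored as integers or floats, one option is to cast them to the pandas `category` dtype before analysis. The sketch below is only an illustration under stated assumptions: the `CATEGORICAL_COLS` list is a hand-picked subset of the table above, and `cast_categoricals` and the stand-in data frame are hypothetical names, not part of the original notebook.

```python
import pandas as pd

# Assumed subset of the attributes the code book lists as Categorical.
CATEGORICAL_COLS = ['extended', 'region_txt', 'success', 'suicide', 'attacktype1_txt']

def cast_categoricals(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Return a copy of df with the listed columns (when present) cast to 'category'."""
    present = [col for col in columns if col in df.columns]
    return df.astype({col: 'category' for col in present})

# Hypothetical stand-in for gtd_df
demo_df = pd.DataFrame({
    'iyear': [1995, 1996, 1997],
    'extended': [0, 1, 0],
    'region_txt': ['South America', 'Western Europe', 'South America'],
})
typed_df = cast_categoricals(demo_df, CATEGORICAL_COLS)
print(typed_df.dtypes)
```

After the cast, `describe()` no longer reports means and quartiles for coded attributes such as `extended`, which have no numeric interpretation.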

### Summary Statistics
View summary statistics for the numerical attributes.

In [35]:
gtd_df.describe().transpose()

|ATTRIBUTE|count|mean|std|min|25%|50%|75%|max|
|:----------------|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|
|iyear|112251.0|2010.1488|6.028381|1995.0|2008.0|2013.0|2015.0|2016.0|
|imonth|112251.0|6.48755|3.387775|1.0|4.0|7.0|9.0|12.0|
|iday|112251.0|15.516699|8.822328|0.0|8.0|15.0|23.0|31.0|
|extended|112251.0|0.057256|0.232331|0.0|0.0|0.0|0.0|1.0|
|country|112251.0|122.624752|95.952366|4.0|92.0|95.0|160.0|1004.0|
|region|112251.0|7.883066|2.446715|1.0|6.0|8.0|10.0|12.0|
|latitude|110844.0|26.347196|13.9188|-42.884049|15.359018|32.374802|34.36871|74.633553|
|longitude|110844.0|48.244971|40.474892|-149.569504|35.368727|44.579959|70.798316|179.366667|
|specificity|112247.0|1.468013|0.952514|1.0|1.0|1.0|1.0|5.0|
|vicinity|112251.0|0.082467|0.323313|-9.0|0.0|0.0|0.0|1.0|
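
The minimum of -9 for `vicinity` suggests that some attributes carry sentinel codes, and the code book mentions -9 and -99 as "unknown" markers for several fields. Such codes distort means and standard deviations if left in place. The following sketch counts them per numeric column. The helper name `count_sentinels`, the default sentinel list, and the stand-in data frame are assumptions for illustration.

```python
import pandas as pd

def count_sentinels(df: pd.DataFrame, sentinels=(-9, -99)) -> pd.Series:
    """Count, per numeric column, how many values equal one of the sentinel codes."""
    numeric = df.select_dtypes(include='number')
    counts = numeric.isin(list(sentinels)).sum()
    # Report only the columns where a sentinel actually occurs.
    return counts[counts > 0]

# Hypothetical stand-in for gtd_df
demo_df = pd.DataFrame({
    'vicinity': [0, 1, -9, 0],
    'nperpcap': [2.0, -99.0, None, -99.0],
    'iyear': [1995, 1996, 1997, 1998],
    'city': ['Lima', 'Paris', 'Unknown', 'Rome'],
})
print(count_sentinels(demo_df))
```

Columns flagged this way could then have the codes replaced with `NaN` (for example with `DataFrame.replace`) before the summary statistics are recomputed.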
