<a href="https://colab.research.google.com/github/treasure823/Global_Terrorism_Analysis_Nidhi_Pandey/blob/main/Global_Terrorism_Analysis_Nidhi_Pandey.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Global Terrorism Dataset EDA



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member 1 -**  Nidhi Pandey


# **Project Summary -**

The GTD is an open-source database that provides information on domestic and international terrorist attacks around the world from 1970 through 2017, and it now includes more than 180,000 events. For each event, a wide range of information is available, including the date and location of the incident, the weapons used, the nature of the target, the number of casualties and, when identifiable, the group or individual responsible. The data is maintained by researchers at the National Consortium for the Study of Terrorism and Responses to Terrorism (START), headquartered at the University of Maryland.

The project aims to gain insights into terror attacks across different regions over time, to identify the regions that are hit the most, and to examine how diverse terrorism is in its targets and means. Because the data is large and diverse, the analysis and visualizations are performed only after preliminary steps: handling null values, finding correlations, and summarizing the data's central tendency, dispersion and shape of distribution to get rudimentary insights.

For Exploratory Data Analysis, the data is viewed through 4 lenses: Terrorist Organization, Global & Regional Trends of Attack, Type of Attack and Target Type. The main criterion used to measure the magnitude of an attack is the number of fatalities. Correlation heatmaps were used initially as a guide to explore relations between attack types and weapon types. Line plots, bar plots, pie plots, joint plots, strip plots and scatter plots were then used to visualize the data. As more findings were presented, more questions arose. This iterative loop led to the following conclusions:

* The Civil War between Iraq and the Islamic State of Iraq and the Levant (ISIL) has claimed 1570 fatalities, the largest toll of any single terror attack.
* The top 20% of the terrorist organisations have contributed to almost 99% of the crimes. This shows the unequal relationship between actions and consequences described by Pareto's Principle.
* The most notorious terrorist organisations are the Islamic State of Iraq and the Levant (ISIL), Taliban, Boko Haram and Shining Path (SL). These organisations alone have contributed to more than one third of the total fatalities and almost 50% of the total attacks.
* Global terrorism started increasing to an all-time high from 2011. It peaked in 2014 and has been dipping ever since. The regions South Asia and Middle East & North Africa play a substantial role in global terrorism.
* The top 3 most commonly perpetrated types of attack are Bombing/Explosion, Armed Assault and Assassination, whose common victims are Private Citizens/Property, Government Officials and Journalists respectively. The frequency of the top 3 attack types roughly doubles at each step up the ladder.
* Armed Assault is a dominant type of attack in Central America & Caribbean, and Bombing/Explosion is a dominant type of attack in Middle East & North Africa.
* The most affected target types are Private Citizens & Property, Military and Police.
* The Islamic State of Iraq and the Levant (ISIL) and Taliban have been active from the mid 2010s and Boko Haram from 2010. The Liberation Tigers of Tamil Eelam (LTTE) were active from the late 1980s; their attacks peaked in the mid 1990s and declined in the 2000s. The Irish Republican Army (IRA) carried out attacks constantly from the 1970s to the 1980s.
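
The Pareto-style conclusion above rests on a cumulative share calculation. The following is a minimal sketch of that arithmetic, not the notebook's own code: `top_share` is a hypothetical helper and the group names and counts are made-up illustrative values, not GTD figures. On the real data the counts could come from something like `gta_df['gname'].value_counts()`.

```python
import math
import pandas as pd

def top_share(counts: pd.Series, fraction: float = 0.2) -> float:
    """Share of the total contributed by the top `fraction` of groups."""
    ordered = counts.sort_values(ascending=False)
    n_top = max(1, math.ceil(len(ordered) * fraction))
    return ordered.iloc[:n_top].sum() / ordered.sum()

# Illustrative attack counts per group (made-up numbers)
toy_counts = pd.Series({"A": 80, "B": 10, "C": 5, "D": 3, "E": 2})
share = top_share(toy_counts)  # top 20% of 5 groups is 1 group
print(f"Top 20% of groups account for {share:.0%} of attacks")
```

A share close to 1 for a small `fraction` is what a Pareto-like concentration looks like in numbers.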

# **GitHub Link -**

https://github.com/treasure823/Global_Terrorism_Analysis_Nidhi_Pandey

# **Problem Statement**


**The Global Terrorism Database (GTD) is an open-source database including information on terrorist attacks around the world from 1970 through 2017. The GTD includes systematic data on domestic as well as international terrorist incidents that have occurred during this time period and now includes more than 180,000 attacks. The database is maintained by researchers at the National Consortium for the Study of Terrorism and Responses to Terrorism (START), headquartered at the University of Maryland.**

#### **Define Your Business Objective?**

Answer Here.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required. 
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits. 
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that for each and every chart the following format is answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule. 

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np

### Dataset Loading

In [None]:
# Mount Drive 
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load Dataset 
filepath = '/content/drive/MyDrive/GlobalTerrorism Analysis/Global Terrorism Data.csv' 
gta_df = pd.read_csv(filepath, encoding = "ISO-8859-1", engine='python') # Encoding is specified

### Dataset First View

In [None]:
# Dataset First Look
gta_df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
gta_df.shape

In [None]:
gta_df.tail(6)

### Dataset Information

In [None]:
# Dataset Info
gta_df.columns

In [None]:
useful_columns = ['eventid','iyear','country_txt', 'region_txt','attacktype1','attacktype1_txt',
                  'weaptype1','weaptype1_txt','targtype1','targtype1_txt','nwound', 'gname','claimed','nkill','crit1','crit2','crit3','success','suicide','city'
                  ]

Relevant columns are chosen with the help of the [GTA CodeBook](https://colab.research.google.com/drive/1ubNPU3SmI9i0I3TC8Q7mO1vV7zsveAD1#scrollTo=MHRVkguR38fF&line=1&uniqifier=1)


In [None]:
#useful_columns contain the list of selected attributes
gta_df = gta_df.loc[:, useful_columns]

In [None]:
# checking the statistical features of the numerical columns
gta_df.describe()

In [None]:
gta_df[gta_df['claimed']==1]

**Observation: The maximum number of fatalities an event has claimed is 1570**

In [None]:
gta_df[gta_df['nkill']== 1570]

**The Civil War between Iraq and the Islamic State of Iraq and the Levant (ISIL) has claimed the highest number of fatalities**
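
Instead of hard-coding the maximum value, the deadliest event can be located directly. This is a small sketch on a made-up three-row frame (the values are illustrative assumptions); the same pattern should apply to `gta_df`.

```python
import pandas as pd

toy = pd.DataFrame({
    "eventid": [1, 2, 3],
    "country_txt": ["X", "Y", "Z"],
    "nkill": [4.0, None, 12.0],
})
# idxmax skips NaN by default and returns the index label of the maximum
deadliest = toy.loc[toy["nkill"].idxmax()]
print(deadliest["eventid"], deadliest["nkill"])
```

This keeps the lookup correct even if the dataset is updated and the maximum changes.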

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
gta_df.duplicated().sum()

In [None]:
gta_df.duplicated()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
gta_df.info()

In [None]:
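# Visualizing the missing values
# Integer columns cannot hold NaN, so the columns printed here have no missing values
for i, col in gta_df.items():  # iteritems() was removed in pandas 2.0
  if col.dtype == 'int64':
    print(col.name)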

**Attributes 'claimed', 'nkill' and 'nwound' have null values.**
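
A direct way to confirm which attributes contain nulls is to count them per column. The sketch below uses a small made-up frame that only borrows the column names; the values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "claimed": [1.0, np.nan, 0.0, np.nan],
    "nkill": [0.0, 2.0, np.nan, 1.0],
    "success": [1, 1, 0, 1],
})
null_counts = toy.isnull().sum()  # number of missing values per column
cols_with_nulls = null_counts[null_counts > 0].index.tolist()
print(null_counts)
print(cols_with_nulls)
```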

**Handling null values**

1) 'claimed'

**Among the unique values in the column 'claimed', -9 indicates that the data was unavailable at the time. Null values will be replaced by -9, as dropping them would remove almost 8000 rows.**



In [None]:
# Assign the result back instead of using inplace on a column selection
gta_df['claimed'] = gta_df['claimed'].fillna(-9)

2)'nkill'

Null values of nkill are replaced by the median value of nkill.
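
Median imputation can be sketched on a toy column as follows (the numbers are made up for illustration). The median is a reasonable choice here because it is not pulled upwards by a few extreme events.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"nkill": [0.0, 0.0, 1.0, np.nan, 7.0]})
median_kills = toy["nkill"].median()  # NaN is ignored: median of 0, 0, 1, 7
toy["nkill"] = toy["nkill"].fillna(median_kills)
print(median_kills)
print(toy["nkill"].tolist())
```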

In [None]:
gta_df['nkill'].median()

In [None]:
# Fill missing fatalities with the median computed above
gta_df['nkill'] = gta_df['nkill'].fillna(gta_df['nkill'].median())

3)'nwound'

In [None]:
gta_df['nwound'].median()


In [None]:
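# Fill missing wounded counts (not nkill again) with the median computed above
gta_df['nwound'] = gta_df['nwound'].fillna(gta_df['nwound'].median())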

Converting year to date time
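
As a small illustration of this conversion, the sketch below parses a made-up Series of years. Casting to string first is an assumption made here to keep the parsing explicit; the year can be recovered later with `.dt.year`.

```python
import pandas as pd

years = pd.Series([1970, 2001, 2017], name="iyear")
# Each year becomes a timestamp on 1 January of that year
as_dates = pd.to_datetime(years.astype(str), format="%Y")
print(as_dates.dt.year.tolist())
```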

In [None]:
gta_df['iyear'] = pd.to_datetime(gta_df['iyear'], format = '%Y')

### What did you know about your dataset?

The Global Terrorism Database (GTD) is an open-source database including information on terrorist attacks around the world from 1970 through 2017. The GTD includes systematic data on domestic as well as international terrorist incidents that have occurred during this time period and now includes more than 180,000 attacks. The database is maintained by researchers at the National Consortium for the Study of Terrorism and Responses to Terrorism (START), headquartered at the University of Maryland.

The selected columns provide all the information required for the analysis: eventid, iyear, country_txt, region_txt, attacktype1, attacktype1_txt, weaptype1, weaptype1_txt, targtype1, targtype1_txt, nwound, gname, claimed, nkill, crit1, crit2, crit3, success, suicide, city.

This data shows which countries, cities and regions have been attacked the most by terrorist organizations over the years. It also shows the types of attack that were carried out, such as Bombing/Explosion, Facility/Infrastructure Attack, Assassination, Hostage Taking and Armed Assault, and the number of fatalities these attacks caused. Global terrorism has damaged infrastructure, and most attacks have harmed private citizens and property, which has a negative impact on business at an international level. Terrorism also negatively affects business performance by increasing the cost of conducting business through higher wages and larger security expenditures. The psychological impact of the possibility of a future terrorist attack and the immediate expense of heightened airport security have a negative economic impact on the survival and expansion of businesses. Additional expenses (such as spending on security and surveillance, repairs, and replacing stolen property) diminish already limited financial resources, which may have a negative impact on corporate performance.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
list(gta_df.columns.values)

In [None]:
# Dataset Describe
gta_df.describe()

In [None]:
gta_df.mean(numeric_only=True)  # restrict to numeric columns; text columns have no mean

### Variables Description 

**A symbolic name that serves as a reference or pointer to an object is called a variable in Python. You can use the variable name to refer to an object once it has been assigned to it. Nonetheless, the object itself still holds the data.**


The Global Terrorism Database contains a record of over 180,000 terror attacks from 1970 through 2017 with 137 attributes. These attributes were cut down to 20. The attributes are listed below:-
 
1.   eventid: Unique ID number assigned each terror attack.
2.   iyear: The year the terror attack took place.
3.   country_txt: Name of the country the terror attack took place in.
4.   region_txt: Name of the subcontinent the terror attack took place in.
5.   city: Name of the city the terror attack took place in.
6.   attacktype1, attacktype1_txt: Numerical encoding and the corresponding type of attack that took place.
7.   weaptype1, weaptype1_txt: Numerical encoding and the corresponding type of weapon used in the attack.
8.   targtype1, targtype1_txt: Numerical encoding and the corresponding type of target the attack was perpetrated on.
9.   nwound: Number of victims wounded in the terror attack.
10.  gname: Name of the terrorist group that perpetrated the attack.
11.  claimed: Information on whether or not the attack was claimed.
12.  nkill: Number of fatalities of the terror attack.
13.  crit1: Information on whether the attack had a political, economic, religious, or social goal.
14.  crit2: Information on whether the attack had an intention to coerce, intimidate or publicize to a larger audience.
15.  crit3: Information on whether the attack was outside international humanitarian law.
16.  success: Specifies whether the attack was successful.
17.  suicide: Specifies whether the attack was a suicide attack.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
gta_df['country_txt'].unique()

In [None]:
gta_df['city'].unique()

In [None]:
gta_df['weaptype1_txt'].unique()

In [None]:
gta_df['suicide'].unique()

In [None]:
gta_df['country_txt'].value_counts()

In [None]:
gta_df["attacktype1_txt"].unique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.


In [None]:
iyear_str = gta_df.iyear.map(lambda x:str(x))
gta_df["region_txt"]+"-"+iyear_str

In [None]:
gta_df.groupby('country_txt').count()

In [None]:
gta_df.groupby("region_txt").min()

In [None]:
# First weapon code recorded for each target type (iloc uses square brackets)
gta_df.groupby("targtype1_txt").apply(lambda row : row["weaptype1"].iloc[0])

In [None]:
gta_df.groupby("attacktype1_txt").weaptype1.agg(['size', 'min', 'max', 'sum'])

In [None]:
gta_df.rename(index ={0:"Entry 1" , 1:"Entry 2"})

In [None]:
gta_df.set_index(["city" ,"iyear"])

In [None]:
targtype_1 = gta_df.iloc[: 6 , :5]

In [None]:
attacktype1_txt = gta_df.iloc[: 12 , : 18]

*The `pd.concat()` function is used to combine a set of DataFrames or Series along an axis*
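
A minimal sketch of row-wise concatenation on two made-up frames (names and values are illustrative assumptions). It also shows where NaN values come from when the frames do not share all columns.

```python
import pandas as pd

first = pd.DataFrame({"city": ["A", "B"], "nkill": [1, 0]})
second = pd.DataFrame({"city": ["C"], "nkill": [3], "nwound": [5]})
# Rows are stacked (axis=0); columns missing from one frame are filled with NaN
stacked = pd.concat([first, second], ignore_index=True)
print(stacked)
```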

In [None]:
pd.concat([targtype_1 , attacktype1_txt ])

### What all manipulations have you done and insights you found?

Data manipulation is significant because it enables quick access to the information that is essential to your particular business and objectives.

1) I used the groupby() method to group the data as required by the analysis.

2) I used the agg() method to compute several aggregates at once.

3) I used the rename() method to rename two index labels.

4) I used iloc (index-based selection) to select subsets of rows and columns.

5) I used the concat() method to combine two sets of rows.

By joining the two selections we got to know about a few NaN values, and the grouping showed the number of attacks carried out in various cities, regions and countries by attack type, such as Assassination, Bombing/Explosion, Facility/Infrastructure Attack and Armed Assault. The most frequently attacked region in these outputs was North America.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline


In [None]:
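# numeric_only=True skips text and datetime columns, which would otherwise raise an error
corrdf = gta_df.corr(method = 'spearman', numeric_only = True)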

In [None]:
plt.figure(figsize=(10,10))
sns.heatmap(corrdf,square = True)

##### 1. Why did you pick the specific chart?

The statistical investigation of the link or dependence between two variables is called correlation. We can examine the direction and degree of the association between two sets of values using correlation.
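
To make the idea concrete, here is a small sketch on made-up numbers comparing Spearman and Pearson correlation. Spearman works on ranks, so a relationship that always increases but is not linear still scores 1.

```python
import pandas as pd

toy = pd.DataFrame({
    "nkill": [0, 1, 2, 10, 50],
    "nwound": [0, 3, 4, 40, 900],        # grows with nkill, but not linearly
    "label": ["a", "b", "c", "d", "e"],  # non-numeric, excluded below
})
spearman = toy.corr(method="spearman", numeric_only=True)
pearson = toy.corr(method="pearson", numeric_only=True)
print(spearman.loc["nkill", "nwound"])
print(pearson.loc["nkill", "nwound"])
```

Rank-based correlation is less sensitive to extreme values, which is a plausible reason to prefer it for heavily skewed counts such as fatalities.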

##### 2. What is/are the insight(s) found from the chart?

Given that the majority of the data is categorical, the observed correlation appears to be relatively weak. The only statistical relationship that can be taken into account is that between "nkill" (Fatalities) and "nwound" (Wounded)

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

No we didn't get any negative insight from the above chart.

#### Chart - 2


**Visualizing relation between number Fatalities and Wounded victims**

In [None]:
# Chart - 2 visualization code
g = sns.jointplot(data= gta_df, x="nkill", y="nwound", height = 10)
# Label the joint axes and title the whole figure (plt.title would only affect a marginal axis)
g.set_axis_labels("Fatalities", "Wounded")
g.figure.suptitle("Analysis of Fatalities & Wounded", y=1.02)

##### 1. Why did you pick the specific chart?

The relationship between variables or features in a dataset is quantified by correlation coefficients. Python has excellent tools that you may use to calculate these statistics, which are very important for research and technology. The correlation methods in SciPy, NumPy, and pandas are quick, thorough, and well-documented.

##### 2. What is/are the insight(s) found from the chart?

The graphic shows the joint distribution of fatalities and wounded victims per attack. The vast majority of attacks cluster near the origin with few fatalities and few wounded, while only a handful of extreme outliers reach beyond 1000 fatalities or about 8000 wounded people.



##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

No 

#### Chart - 3
Line Plot 

Analysis of the types of attacks and number of casualties



In [None]:
gta_df = gta_df.iloc[ :  ,  : ]
gta_df.head(10)


In [None]:
gta_df.tail(10)

In [None]:
# Chart - 3 visualization code
#x variable on horizontal axis (in this case "nkill" column)
# y variable on vertical  axis (in this case "attacktype1_txt" column)
sns.lineplot(x= gta_df['nkill'] , y = gta_df['attacktype1_txt'])

##### 1. Why did you pick the specific chart?

**While working on machine learning or data science projects, line plots, one of the simplest and most fundamental graphical analysis techniques, are crucial for data analysis. The relationship between two variables is expressed using them.**

##### 2. What is/are the insight(s) found from the chart?

We observe from the above graph that the maximum number of kills were due to Bombing/Explosion, Facility/Infrastructure Attack and Armed Assault.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

From the above chart we learn how facilities and infrastructure have been damaged, which is a negative impact on business.

#### Chart - 4
Bar Graph

Analysis of types of attacks and weapons that have been used



In [None]:
# Chart - 4 visualization code
plt.rcParams['figure.figsize'] = (9,7)
sns.barplot(x = 'attacktype1_txt', y = 'weaptype1', data= gta_df, estimator= np.count_nonzero, palette = 'icefire_r')
plt.xticks(rotation = 90)
plt.title("Types of Terrorist Attacks ")
plt.ylabel("No of Attacks")
plt.xlabel("Attack Type")

##### 1. Why did you pick the specific chart?

Bar graphs can be used to understand the relationship between different variables in the data. They help compare quantities that belong to different groups.

##### 2. What is/are the insight(s) found from the chart?

The top 3 commonly perpetrated types of attack are

Bombing/Explosion (more than twice as frequent as Armed Assault)

Armed Assault (twice as frequent as Assassination)

Assassination

The frequency of the top 3 attack types roughly doubles at each step up the ladder.
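
The frequency comparison can be checked numerically rather than read off the bars. The sketch below uses a made-up Series whose counts are chosen only to illustrate a doubling pattern; they are not GTD figures.

```python
import pandas as pd

attacks = pd.Series(
    ["Bombing/Explosion"] * 8 + ["Armed Assault"] * 4 + ["Assassination"] * 2,
    name="attacktype1_txt",
)
freq = attacks.value_counts()          # sorted from most to least frequent
ratio_to_next = freq / freq.shift(-1)  # how many times more frequent than the next type
print(freq)
print(ratio_to_next)
```

The last entry of `ratio_to_next` is NaN because the least frequent type has nothing below it to compare against.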

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The direct economic loss of lives and property is the most obvious. Through fostering xenophobia, decreasing tourism, and increasing insurance claims, terrorism has an indirect impact on the economy.

#### Chart - 5
Scatterplot

Analysis of the number of fatalities

In [None]:
# Chart - 5 visualization code
# x variable on horizontal axis (in this case "nwound" column)
# y variable on vertical  axis (in this case "attacktype1_txt" column) 
sns.scatterplot(x= gta_df['nwound'] , y = gta_df['attacktype1_txt'] , color = "g")

##### 1. Why did you pick the specific chart?

Scatter plots help in understanding the relationships in the data better. They are generally used to visualize the relationship between two continuous variables.

##### 2. What is/are the insight(s) found from the chart?

From the chart it is clear that a single attack can leave more than 8000 people wounded. The highest counts are associated with Hijacking, Unarmed Assault and Bombing/Explosion.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

No positive impact on business can be visualized with the help of this chart.



#### Chart - 6
Pie Chart

Analysis of Top 10 notorious organization

In [None]:
# Chart - 6 visualization code
a = gta_df.groupby(['gname']).agg({'eventid':'count', 'nkill': 'sum'}).sort_values(['eventid','nkill'], ascending=False)
known = a.drop('Unknown', axis=0)  # exclude attacks with no identified group
top20 = known.head(10).reset_index().copy()  # top 10 organisations by number of attacks
# Sum (not count) the remaining groups so the slice reflects their attacks and fatalities
rest = known.iloc[10:, :].agg({'eventid':'sum', 'nkill':'sum'}).copy()
lst = rest.to_list()
top20.loc[len(top20.index)] = [f'Remaining {len(known) - 10} Terrorist Organisations'] + lst
troa = known.index[:10] #List of top 10 Terrorist organisations with highest activities


In [None]:
plt.rcParams['figure.figsize'] = (15,15)
top20.set_index('gname').plot(x= 'gname', y='eventid',kind='pie', legend=False,colors= sns.color_palette(palette = 'PuBu'), 
                              autopct='%1.1f%%')
plt.title("\n Top 10 Terrorist Organisations with Large Attack Numbers")
plt.ylabel("")

##### 1. Why did you pick the specific chart?

A pie chart is useful for organising and displaying data by percentage of the total.



##### 2. What is/are the insight(s) found from the chart?

We got to know the top 3 terrorist organisations by number of attacks:

1) Taliban

2) Islamic State of Iraq and the Levant (ISIL)

3) Shining Path (SL)

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

There is no such insight which will lead to any negative growth.

#### Chart - 7
Line plot

Analysis of Global terrorist attack trend


In [None]:
# Chart - 7 visualization code
fig, ax = plt.subplots(figsize = (14,10))
sns.lineplot(x= 'iyear', y= 'eventid',data= gta_df,estimator= np.count_nonzero, palette= 'bright')

plt.ylabel("No. of Attacks")
plt.title("Global Terrorist Attack Trends")
plt.xlabel("Year")

##### 1. Why did you pick the specific chart?

A line plot gives the best representation of the required data. A line chart displays the evolution of one or several numeric variables.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that global terrorism has grown since 1970. It started climbing towards an all-time high from 2011, peaked in 2014 and has been dipping ever since.
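
The peak year can also be computed directly instead of being read off the chart. This is a sketch on a made-up list of years; on the notebook's data, where `iyear` has been converted to datetime, the grouping key would presumably be `gta_df['iyear'].dt.year`.

```python
import pandas as pd

toy = pd.DataFrame({"iyear": [2012, 2013, 2014, 2014, 2014, 2015, 2015]})
attacks_per_year = toy.groupby("iyear").size()  # number of attacks in each year
peak_year = attacks_per_year.idxmax()
print(peak_year, attacks_per_year.loc[peak_year])
```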



##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Terrorism hinders business, so the impact shown in the chart is clearly negative.

#### Chart - 8
Casualties of Terrorist Attacks in different Regions of the World from 1970 to 2017

In [None]:
# Chart - 8 visualization code
fig, ax = plt.subplots(figsize = (14,7))
sns.lineplot(x='iyear', y='nkill',data= gta_df, hue='region_txt',estimator= np.sum, palette= 'bright')
plt.title("Terrorist Attacks in different Regions from 1970-2017 ")
plt.ylabel("No of Fatalities")
plt.xlabel("Year")

##### 1. Why did you pick the specific chart?

Because a line chart gives a better representation when a large amount of data is used for visualization.



##### 2. What is/are the insight(s) found from the chart?

The orange line spike near 2000 indicates the September 11 attacks in the USA.

The yellow line spike between the mid 2000s and 2010 refers to the Battle of Gaza between Hamas and Fatah. The Gaza War ended in 2009, which explains the dip. The yellow line peaks again after 2010 with the Syrian Civil War (2011) and the Iraqi Civil War (2013-2017).

We can also say that even though the number of attacks in South Asia has been higher than in Sub-Saharan Africa, the attacks in Sub-Saharan Africa have been far more devastating in magnitude.
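
The difference between how often a region is attacked and how severe those attacks are can be expressed as fatalities per attack. The sketch below uses made-up numbers that only mimic the pattern described; they are not GTD values.

```python
import pandas as pd

toy = pd.DataFrame({
    "region_txt": ["South Asia"] * 4 + ["Sub-Saharan Africa"] * 2,
    "nkill": [1, 0, 2, 1, 10, 6],
})
# Count attacks and sum fatalities per region, then divide
by_region = toy.groupby("region_txt")["nkill"].agg(attacks="size", fatalities="sum")
by_region["fatalities_per_attack"] = by_region["fatalities"] / by_region["attacks"]
print(by_region)
```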

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

No

#### Chart - 9

Analyzing Target Types by Fatalities and Wounded

In [None]:
# Chart - 9 visualization code
df = gta_df.groupby(['targtype1_txt']).agg({'nkill': 'sum', 'nwound':'sum'}).reset_index()

In [None]:
sns.set_style('darkgrid')
fig, ax = plt.subplots(figsize = (14,10))

sns.scatterplot(x = df['nwound'],y = df['nkill'], hue = df['targtype1_txt'], size= df['nkill'],sizes = (100,1000), palette= 'bright')
plt.title(" Target Type Analysis")
plt.xlabel("Number of Wounded")
plt.ylabel("Number of Fatalities")

##### 1. Why did you pick the specific chart?

In an effort to demonstrate the degree to which one variable is influenced by another, scatter plots are used to plot data points on a horizontal and vertical axis. The values of the columns set on the X and Y axes determine the position of the marker that represents each row in the data table.

##### 2. What is/are the insight(s) found from the chart?

From the above chart we can say that the most affected targets are

1) Educational Institution

2) Private Citizens & Property

3) Police

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

No

#### Chart - 10
Visualizing the extent of the commonly perpetrated attacks in different regions of the world

In [None]:
# Chart - 10 visualization code
# No trailing whitespace in the labels, otherwise hue_order will not match the category
col= [ 'Bombing/Explosion','Armed Assault', 'Facility/Infrastructure Attack']

In [None]:
sns.set_style('darkgrid')
fig, ax = plt.subplots(figsize = (16,9))
sns.stripplot(x ='region_txt', y ='nkill', data = gta_df,hue='attacktype1_txt', hue_order= col,
              palette="Set2", 
              size=25, 
              marker="D",
              edgecolor="gray", alpha=.10)

plt.xticks(rotation=90)
plt.title("Types of Attacks in Different Regions")
plt.ylabel('Number of Fatalities')
plt.xlabel('Region')

##### 1. Why did you pick the specific chart?

A strip plot can be created on its own, but it also works well as an addition to a box or violin plot when you wish to display all observations along with an illustration of the distribution they are based on.

##### 2. What is/are the insight(s) found from the chart?

Armed Assault is a dominant type of attack in regions like

1) Central America & Caribbean

2) Sub-Saharan Africa and

3) North America

(presence of dense orange)

Bombing/Explosion is a dominant type of attack in regions like

1) Middle East & North Africa

2) South Asia and

3) Eastern Europe

(presence of dense green)

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

No

#### Chart - 11
Common Attacks Perpetrated on different Targets

In [None]:
# Chart - 11 visualization code
sns.set_style('darkgrid')
fig, ax = plt.subplots(figsize = (15,9))
clms = ['Assassination', 'Bombing/Explosion', 'Facility/Infrastructure Attack']
sns.barplot(x = 'targtype1_txt',y = 'nkill', hue ='attacktype1_txt' , hue_order = clms, data = gta_df, palette= 'bright', estimator = np.count_nonzero)
plt.title(" Target Type Analysis")
plt.xticks(rotation=90)
plt.xlabel("Target Type")
plt.ylabel("No. of Attacks with Fatalities")  # count_nonzero counts attacks where nkill > 0

##### 1. Why did you pick the specific chart?

Using rectangular bars with heights proportionate to the values they represent, a bar chart or bar graph visualises categorical data.

##### 2. What is/are the insight(s) found from the chart?

The most common attacks are

1) Assassination

2) Bombing/Explosion

3) Facility/Infrastructure Attack

Assassination is the most common type of attack perpetrated on Government Officials and Journalists & Media.

Bombings are frequently used against Businesses and Religious Figures/Institutions.
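
A cross-tabulation gives the same target-by-attack comparison as a table. This is a sketch on a handful of made-up rows (illustrative assumptions only); `idxmax(axis=1)` picks the most frequent attack type for each target type.

```python
import pandas as pd

toy = pd.DataFrame({
    "targtype1_txt": ["Government", "Government", "Business", "Business", "Business"],
    "attacktype1_txt": ["Assassination", "Assassination", "Bombing/Explosion",
                        "Bombing/Explosion", "Assassination"],
})
table = pd.crosstab(toy["targtype1_txt"], toy["attacktype1_txt"])
most_common = table.idxmax(axis=1)  # most frequent attack type per target type
print(table)
print(most_common)
```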

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

No

#### Chart - 12
Most Notorious Terror Organisations

In [None]:
# Chart - 12 visualization code
a = gta_df.groupby(['gname']).agg({'eventid':'count', 'nkill': 'sum'}).sort_values(['nkill', 'eventid'], ascending=False)
known = a.drop('Unknown', axis=0)  # exclude attacks with no identified group
top20 = known.head(10).reset_index().copy()  # top 10 organisations by fatalities
# Sum (not count) the remaining groups so the slice reflects their attacks and fatalities
rest = known.iloc[10:, :].agg({'eventid':'sum', 'nkill':'sum'}).copy()
lst = rest.to_list()
top20.loc[len(top20.index)] = [f'Remaining {len(known) - 10} Terrorist Organisations'] + lst
trof = known.index[:10] #List of top 10 Terrorist organisations with Highest Fatalities


In [None]:
plt.rcParams['figure.figsize'] = (10,10)
top20.set_index('gname').plot(x= 'gname', y='nkill',kind='pie', legend=False,colors= sns.color_palette(palette = 'PiYG'),autopct='%1.1f%%')
plt.yticks(rotation = 45)
plt.title("\n Top 10 Terrorist Organisations with highest Fatalities")
plt.ylabel("")

##### 1. Why did you pick the specific chart?

Pie graphs display the proportional size of objects (referred to as wedges) inside a single data series. In a pie chart, the data points are shown as a percentage of the entire pie

##### 2. What is/are the insight(s) found from the chart?

Inference: The most notorious terrorist organisations are

1) Taliban

2) Boko Haram

3) Islamic State of Iraq and the Levant (ISIL)

4) Shining Path(SL)

These organisations alone have contributed to more than one third of the total fatalities and almost 50% of the total attacks

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

No

#### Chart - 13
Comparing yearly trends of terrorist attacks across target types

In [None]:
# Chart - 13 visualization code
fig, ax = plt.subplots(figsize = (18,7))
sns.lineplot(x='iyear', y='eventid',data= gta_df,hue = 'targtype1_txt',estimator= np.count_nonzero, palette= 'Dark2')
plt.ylabel("No. of Attacks")
plt.title("Comparing Terrorist Attack Trends by Target Type")
plt.xlabel("Year")

##### 1. Why did you pick the specific chart?

I chose a line chart because it helps extract the most insightful information about the data and represents trends over time clearly.

##### 2. What is/are the insight(s) found from the chart?

By comparing the attack trends we got to know that the most affected target types are

1) Private Citizens & Property

2) Military

3) Police

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The negative impact shown by this chart is the harm done to public property and to private citizens.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
clms = ['attacktype1_txt','targtype1_txt', 'weaptype1_txt', 'suicide'] 
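# One-hot encode the categorical columns so that attack, target and weapon types can be correlated
ohdf = pd.get_dummies(gta_df[clms], columns=clms[:3], dtype=int)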

In [None]:
ohdf.loc[:, :]

In [None]:
corrdf = ohdf.corr(numeric_only=True)  # correlate numeric columns only
plt.figure(figsize=(40,30))
sns.heatmap(corrdf,square = True,vmin=-1, vmax=1, annot=True)

##### 1. Why did you pick the specific chart?

Heatmaps can be used to understand the relationship between the variables or features in data. They provide a color-coded pattern over a table of integer and floating point values: a heatmap takes a matrix of numerical values and uses color contrast to show the relationship between the variables.

##### 2. What is/are the insight(s) found from the chart?

We observe positive correlation between weapons and types of attacks

Use of Explosives and Bombings/Explosions (0.91)

Use of Firearms and Armed Assault (0.68)

Use of Incendiary and Facility/Infrastructure Attack(0.71)
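
Correlations between categories such as weapon type and attack type are only defined once the categories are turned into numeric indicator columns. The sketch below shows that step on four made-up rows; the perfect correlation is an artifact of the toy data and is not a GTD result.

```python
import pandas as pd

toy = pd.DataFrame({
    "attacktype1_txt": ["Bombing/Explosion", "Bombing/Explosion",
                        "Armed Assault", "Armed Assault"],
    "weaptype1_txt": ["Explosives", "Explosives", "Firearms", "Firearms"],
})
# One indicator (0/1) column per category value
onehot = pd.get_dummies(toy, dtype=int)
corr = onehot.corr()
print(corr.loc["weaptype1_txt_Explosives", "attacktype1_txt_Bombing/Explosion"])
```

Each cell of `corr` then measures how often two category values occur on the same rows, which is how figures like the ones listed above can be read.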

#### Chart - 15 - Pair Plot 

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ? 
Explain Briefly.

Advantages of EDA 

Data visualization is an important component of Exploratory Data Analysis (EDA) because it allows a data analyst to “look at” their data and get to know the variables and relationships between them.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***