<a href="https://colab.research.google.com/github/treasure823/Global_Terrorism_Analysis_Nidhi_Pandey/blob/main/Global_Terrorism_Analysis_Nidhi_Pandey.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Global Terrorism Dataset EDA



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member 1 -**  Nidhi Pandey


# **Project Summary -**

The Global Terrorism Database (GTD) is an open-source database that provides information on domestic and international terrorist attacks around the world from 1970 through 2017, and it now includes more than 180,000 events. For each event, a wide range of information is available, including the date and location of the incident, the weapons used, the nature of the target, the number of casualties and, when identifiable, the group or individual responsible. The data is maintained by researchers at the National Consortium for the Study of Terrorism and Responses to Terrorism (START), headquartered at the University of Maryland.

The project aims to gain insights about terror attacks across different regions over time, to find the regions that are hit the most, and to see how terrorism varies in its targets and means. Because the data is large and diverse, the analysis and visualizations are performed only after preliminary steps: handling null values, finding correlations, and summarizing the data's central tendency, dispersion and shape of distribution in order to get rudimentary insights. For Exploratory Data Analysis, the data is viewed through 4 lenses: Terrorist Organization, Global & Regional Trends of Attack, Type of Attack and Target Type. The main criterion used to measure the magnitude of an attack is the number of fatalities. The project initially used correlation heatmaps as a guide to explore relations between attack types and weapon types, and then used line plots, bar plots, pie plots, joint plots, strip plots and scatter plots to visualize the data. As more findings were presented, more questions arose. This iterative loop led to the following conclusions:

* The Civil War between Iraq and the Islamic State of Iraq and the Levant (ISIL) claimed 1570 fatalities in a single event, the largest toll of any terror attack in the data.
* The top 20% of the terrorist organisations have contributed to almost 99% of the attacks. This unequal distribution is consistent with the Pareto principle.
* The most notorious terrorist organisations are the Islamic State of Iraq and the Levant (ISIL), the Taliban, Boko Haram and Shining Path (SL). These organisations alone have contributed to more than one third of the total fatalities and almost 50% of the total attacks.
* Global terrorism started rising towards an all-time high from 2011. It peaked in 2014 and has been declining since. The regions South Asia and Middle East & North Africa play a substantial role in global terrorism.
* The 3 most commonly perpetrated types of attack are Bombing/Explosion, Armed Assault and Assassination, whose common victims are Private Citizens/Property, Government Officials and Journalists respectively. The frequency roughly doubles at each step up this ranking.
* Armed Assault is the dominant type of attack in Central America & Caribbean, and Bombing/Explosion is the dominant type of attack in Middle East & North Africa.
* The most affected target types are Private Citizens & Property, Military and Police.
* The Islamic State of Iraq and the Levant (ISIL) and the Taliban have been active since the mid-2010s, and Boko Haram has been active since 2010. The Liberation Tigers of Tamil Eelam (LTTE) were active from the late 1980s; their attacks peaked in the mid-1990s and declined in the 2000s. The Irish Republican Army (IRA) carried out attacks consistently from the 1970s through the 1980s.
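
The Pareto-style claim above can be checked with a short calculation. The following is only a minimal sketch on a toy frame, not the notebook's actual computation: `toy_df` and the helper `top_share` are hypothetical names, and the only column assumed from the dataset is `gname`. On the real data one may also want to decide how to treat attacks whose group is recorded as unknown.

```python
import pandas as pd

# Toy stand-in for the GTD frame; only the 'gname' column matters here.
toy_df = pd.DataFrame({
    "gname": ["A"] * 16 + ["B", "C", "D", "E"],
})

def top_share(df, group_col="gname", top_fraction=0.2):
    """Share of all rows attributed to the top `top_fraction` of groups."""
    counts = df[group_col].value_counts()             # attacks per group, sorted descending
    n_top = max(1, int(len(counts) * top_fraction))   # number of groups in the top slice
    return counts.iloc[:n_top].sum() / counts.sum()

share = top_share(toy_df)
print(f"Top 20% of groups account for {share:.0%} of attacks")
```

Replacing `toy_df` with `gta_df` would give the share for the full dataset, assuming the `gname` column is present.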

# **GitHub Link -**

https://github.com/treasure823/Global_Terrorism_Analysis_Nidhi_Pandey

# **Problem Statement**


**The Global Terrorism Database (GTD) is an open-source database including information on terrorist attacks around the world from 1970 through 2017. The GTD includes systematic data on domestic as well as international terrorist incidents that have occurred during this time period and now includes more than 180,000 attacks. The database is maintained by researchers at the National Consortium for the Study of Terrorism and Responses to Terrorism (START), headquartered at the University of Maryland.**

#### **Define Your Business Objective?**

Answer Here.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required. 
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits. 
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the following format.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule. 

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np

### Dataset Loading

In [None]:
# Mount Drive 
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load Dataset 
filepath = '/content/drive/MyDrive/GlobalTerrorism Analysis/Global Terrorism Data.csv' 
gta_df = pd.read_csv(filepath, encoding = "ISO-8859-1", engine='python') # Encoding is specified

### Dataset First View

In [None]:
# Dataset First Look
gta_df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count: shape returns (number of rows, number of columns)
gta_df.shape

In [None]:
gta_df.tail(6)

### Dataset Information

In [None]:
# Dataset Info
gta_df.columns

In [None]:
useful_columns = ['eventid','iyear','country_txt', 'region_txt','attacktype1','attacktype1_txt',
                  'weaptype1','weaptype1_txt','targtype1','targtype1_txt','nwound', 'gname','claimed','nkill','crit1','crit2','crit3','success','suicide','city'
                  ]

Relevant columns are chosen with the help of the [GTA CodeBook](https://colab.research.google.com/drive/1ubNPU3SmI9i0I3TC8Q7mO1vV7zsveAD1#scrollTo=MHRVkguR38fF&line=1&uniqifier=1)


In [None]:
#useful_columns contain the list of selected attributes
gta_df = gta_df.loc[:, useful_columns]

In [None]:
# Checking the statistical features of the numerical columns
gta_df.describe()

In [None]:
gta_df[gta_df['claimed']==1]

**Observation: The maximum number of fatalities a single event has claimed is 1570**

In [None]:
gta_df[gta_df['nkill']== 1570]

**The Civil War between Iraq and the Islamic State of Iraq and the Levant (ISIL) has claimed the largest number of fatalities**

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count: number of fully duplicated rows
gta_df.duplicated().sum()

In [None]:
gta_df.duplicated()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
gta_df.info()

In [None]:
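# List the int64 columns; numeric columns containing nulls are read as float64, so they do not appear here
for name, col in gta_df.items():  # .items() replaces .iteritems(), which was removed in pandas 2.0
  if col.dtype == 'int64':
    print(name)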

**Attributes 'claimed', 'nkill' and 'nwound' have null values.**
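
A more direct way to confirm which columns contain nulls is to count them per column. The snippet below is a minimal sketch on a hypothetical toy frame (`toy_df` is not part of the notebook); the column names mirror those in the dataset, and the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with the same column names as the selected GTD columns.
toy_df = pd.DataFrame({
    "claimed": [1.0, np.nan, 0.0, np.nan],
    "nkill": [0.0, 3.0, np.nan, 1.0],
    "nwound": [2.0, 0.0, 0.0, 5.0],
    "success": [1, 1, 0, 1],
})

null_counts = toy_df.isnull().sum()                              # nulls per column
cols_with_nulls = null_counts[null_counts > 0].index.tolist()    # keep only affected columns
print(null_counts)
print(cols_with_nulls)
```

Running the same two lines on `gta_df` would list the affected columns together with their null counts.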

**Handling null values**

1) 'claimed'

**In the column 'claimed', the value -9 indicates that the data was unavailable at the time. Null values will be replaced by -9, as dropping those rows would eliminate almost 8000 rows.**



In [None]:
# Assign the result back instead of using inplace=True on a column selection
gta_df['claimed'] = gta_df['claimed'].fillna(-9)

2)'nkill'

Null values of nkill are replaced by the median value of nkill.

In [None]:
gta_df['nkill'].median()

In [None]:
# Fill nulls with the column median computed in the previous cell
gta_df['nkill'] = gta_df['nkill'].fillna(gta_df['nkill'].median())

3)'nwound'

In [None]:
gta_df['nwound'].median()


In [None]:
# Fill nulls in 'nwound' (not 'nkill') with the column median
gta_df['nwound'] = gta_df['nwound'].fillna(gta_df['nwound'].median())
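
The three null-handling steps can also be gathered into one helper that is easy to verify. This is a sketch under stated assumptions: `fill_missing` and `toy_df` are hypothetical names, the sentinel -9 for `claimed` follows the description above, and median imputation for `nkill` and `nwound` follows the text rather than any confirmed final choice.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; values are made up for illustration.
toy_df = pd.DataFrame({
    "claimed": [1.0, np.nan, 0.0],
    "nkill": [0.0, np.nan, 4.0],
    "nwound": [np.nan, 2.0, 6.0],
})

def fill_missing(df):
    """Return a copy with nulls filled: -9 sentinel for 'claimed', median for the counts."""
    out = df.copy()
    out["claimed"] = out["claimed"].fillna(-9)
    for col in ("nkill", "nwound"):
        out[col] = out[col].fillna(out[col].median())
    return out

clean_df = fill_missing(toy_df)
print(clean_df.isnull().sum())  # every count should now be zero
```

Working on a copy keeps the original frame untouched, which makes it straightforward to compare the data before and after imputation.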

Converting year to date time

In [None]:
gta_df['iyear'] = pd.to_datetime(gta_df['iyear'], format = '%Y')
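
After this conversion the column holds timestamps (1 January of each year), so the plain integer year has to be recovered through the `.dt` accessor when grouping or labelling axes. The sketch below uses a hypothetical `toy_df`; it converts the integers to strings before parsing, which is an extra precaution added here rather than something the notebook does.

```python
import pandas as pd

# Hypothetical toy frame with integer years, as in the raw 'iyear' column.
toy_df = pd.DataFrame({"iyear": [1970, 2014, 2017]})

# Parse the year as a date; each value becomes 1 January of that year.
toy_df["iyear"] = pd.to_datetime(toy_df["iyear"].astype(str), format="%Y")

# Recover the integer year, e.g. for groupby or axis labels.
years = toy_df["iyear"].dt.year.tolist()
print(years)
```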

### What did you know about your dataset?

The Global Terrorism Database (GTD) is an open-source database including information on terrorist attacks around the world from 1970 through 2017. The GTD includes systematic data on domestic as well as international terrorist incidents that have occurred during this time period and now includes more than 180,000 attacks. The database is maintained by researchers at the National Consortium for the Study of Terrorism and Responses to Terrorism (START), headquartered at the University of Maryland.

The selected columns provide the information needed for this analysis: eventid, iyear, country_txt, region_txt, attacktype1, attacktype1_txt, weaptype1, weaptype1_txt, targtype1, targtype1_txt, nwound, gname, claimed, nkill, crit1, crit2, crit3, success, suicide and city.

This data shows which terrorist attacks took place in past years and which countries, cities and regions were attacked the most by terrorist organizations. It also shows the types of attack that were planned to harm society, such as Bombing/Explosion, Facility/Infrastructure Attack, Assassination, Hostage Taking and Armed Assault, and the number of fatalities caused by these attacks. Global terrorism has affected infrastructure, and most of the attacks have harmed private citizens and property, which has a negative impact on business at the international level. Terrorism also negatively affects business performance by increasing the cost of conducting business through higher wages and larger security expenditures. The overall psychological impact of the possibility of a future terrorist attack and the immediate expense of heightened airport security have a negative economic impact on the survival and expansion of businesses. Additional expenses (such as spending on security and surveillance, repairs, and replacing stolen property) deplete the already limited financial resources, which may have a negative impact on corporate performance.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
list(gta_df.columns.values)

In [None]:
# Dataset Describe
gta_df.describe()

In [None]:
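# Mean of the numeric columns only; text columns would raise an error in recent pandas versions
gta_df.mean(numeric_only=True)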

### Variables Description 

**A symbolic name that serves as a reference or pointer to an object is called a variable in Python. You can use the variable name to refer to an object once it has been assigned to it. Nonetheless, the object itself still holds the data.**


The Global Terrorism Database contains a record of over 180,000 terror attacks from 1970 through 2017 with 137 attributes. These attributes were cut down to 20. The attributes are listed below:-
 
1.   eventid: Unique ID number assigned to each terror attack.
2.   iyear: The year the terror attack took place.
3.   country_txt: Name of the country the terror attack took place in.
4.   region_txt: Name of the region the terror attack took place in.
5.   city: Name of the city the terror attack took place in.
6.   attacktype1, attacktype1_txt: Numerical encoding and the corresponding type of attack that took place.
7.   weaptype1, weaptype1_txt: Numerical encoding and the corresponding type of weapon used in the attack.
8.   targtype1, targtype1_txt: Numerical encoding and the corresponding type of target the attack was perpetrated on.
9.   nwound: Number of victims wounded in the terror attack.
10.  gname: Name of the terrorist group that perpetrated the attack.
11.  claimed: Information on whether or not the attack was claimed.
12.  nkill: Number of fatalities of the terror attack.
13.  crit1: Information on whether the attack had a political, economic, religious, or social goal.
14.  crit2: Information on whether the attack had an intention to coerce, intimidate or publicize to a larger audience.
15.  crit3: Information on whether the attack was outside international humanitarian law.
16.  success: Specifies whether the attack was successful.
17.  suicide: Specifies whether the attack was a suicide attack.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
gta_df['country_txt'].unique()

In [None]:
gta_df['city'].unique()

In [None]:
gta_df['weaptype1_txt'].unique()

In [None]:
gta_df['suicide'].unique()
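
Checking columns one at a time works, but a per-column count of distinct values gives a quicker overview of which attributes are flags, categories or free text. This is a minimal sketch on a hypothetical `toy_df` with made-up values; the column names are taken from the selected columns.

```python
import pandas as pd

# Hypothetical toy frame; values are made up for illustration.
toy_df = pd.DataFrame({
    "country_txt": ["Iraq", "Iraq", "India", "Peru"],
    "weaptype1_txt": ["Explosives", "Firearms", "Explosives", "Explosives"],
    "suicide": [0, 1, 0, 0],
})

# Number of distinct values per column, largest first.
unique_counts = toy_df.nunique().sort_values(ascending=False)
print(unique_counts)
```

Columns with only two distinct values, such as `suicide` here, are likely binary flags, while columns with many distinct values are better summarized with `value_counts()` than with `unique()`.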

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

### What all manipulations have you done and insights you found?

Answer Here.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart - 2 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot 

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ? 
Explain Briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***