<a href="https://www.kaggle.com/code/tanishaharde/exploring-global-terrorism-patterns-and-trends?scriptVersionId=167188969" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>


## Introduction

Greetings, Kaggle Community! In this project, we embark on an analytical journey to decode the intricate patterns within terrorism incidents. Our primary goal is to harness the potential of data-driven approaches, deepening our understanding of terrorism dynamics and contributing to global security strategies.

Terrorism's enduring impact underscores the need for robust insights. To address this, we delve into the "Global Terrorism Database," an invaluable resource for uncovering trends and patterns of these incidents.

**Aim:**
We carry out a comprehensive analysis supported by visualizations. Our goal is to uncover hidden trends and patterns within terrorism incidents and, in doing so, to better understand the factors that influence these events.





## Dataset Justification: Global Terrorism Database

We chose the "Global Terrorism Database" due to its relevance, comprehensive data, and potential for impactful insights. Here's why this dataset is a perfect fit for our analysis:

**1. Relevance to Global Issue:**
Terrorism is a critical global concern. This dataset offers a comprehensive collection of terrorism incidents, enabling us to examine trends and patterns and to gain insights into these incidents.

**2. Rich Data Spectrum:**
The dataset provides a wide array of information, including attack details, locations, perpetrator groups, and casualties. This richness allows us to explore multifaceted aspects of terrorism incidents.

**3. Real-World Impact:**
By analyzing this data, we can contribute to informed decision-making for security strategies, resource allocation, and response planning. Our analysis can have practical implications on a global scale.

**4. Open and Collaborative:**
The dataset's open availability on platforms like Kaggle encourages collaboration, diverse perspectives, and knowledge sharing. This fosters a collective effort to tackle complex global challenges.

**5. Multidisciplinary Exploration:**
Terrorism incidents encompass various factors—social, political, and geographic. This dataset enables us to conduct a multidisciplinary analysis, combining insights from different domains.





## Methodology Recap and Setup

Before we dive into the code, let's recap our methodology and set up the environment for our analysis. We'll also lay out the steps that will guide our exploration. Let's navigate through the following stages:

1. **Data Loading and Inspection:** We'll start by loading the "Global Terrorism Database" and conducting a quick overview of its structure to ensure we're working with the correct data.

2. **Data Preprocessing:** To prepare the data for analysis, we'll address missing values, outliers, and convert relevant columns into appropriate formats.

3. **Data Visualization:** Visualizations play a pivotal role in uncovering insights. We'll create meaningful charts and graphs to visualize trends, patterns, and correlations within the dataset.

4. **Hypothesis Testing:** We use hypothesis testing to check whether the patterns observed in the data are statistically significant or could have occurred by random chance. These tests help us judge whether our findings are reliable enough to support our conclusions about the data.

With this roadmap in place, let's kick off our journey by loading the data and immersing ourselves in visual insights!



In [None]:

import pandas as pd

# Load the dataset
data = pd.read_csv("/kaggle/input/gtd/globalterrorismdb_0718dist.csv", encoding='ISO-8859-1')

# Display basic information about the dataset
print("Dataset Overview:")
print("Number of rows:", data.shape[0])
print("Number of columns:", data.shape[1])
print("\nColumn names:\n", data.columns)

print("\nFirst few rows of the dataset:")
print(data.head())


In [None]:
data.describe()

In [None]:
data.head()

Here's a brief overview of some important columns in the GTD:


1. **iyear:** Year of the incident.
2. **imonth:** Month of the incident.
3. **iday:** Day of the incident.
4. **country_txt:** Name of the country where the incident occurred.
5. **region_txt:** Name of the region where the incident occurred.
6. **provstate:** Province or state within the country.
7. **city:** City or location where the incident occurred.
8. **attacktype1_txt:** Type of attack (e.g., bombing, assassination).
9. **targtype1_txt:** Type of target (e.g., government, civilians).
10. **gname:** Name of the perpetrator group.
11. **nkill:** Number of reported kills.
12. **nwound:** Number of reported wounded.
13. **summary:** Brief summary of the incident.
14. **motive:** Motivation behind the incident.
15. **weaptype1_txt:** Type of weapon used.
16. **propextent_txt:** Extent of property damage.
17. **ishostkid:** Indicates if hostages were taken.
18. **ransomamt:** Amount of ransom demanded.
19. **ransompaid:** Amount of ransom paid.


In [None]:
#  Check Column Data Types
print(data[["iyear", "imonth", "iday"]].dtypes)



In [None]:
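# Count fully duplicated rows
print("Duplicate rows:", data.duplicated().sum())

We observe that there are no duplicate rows. Still, given how many rows and features the dataset has, we drop duplicates to be on the safe side.

In [None]:
# drop_duplicates returns a new DataFrame, so assign the result back
data = data.drop_duplicates()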
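
Full-row duplicates are only one kind of repetition. As a complementary check, here is a minimal sketch that looks for repeated event identifiers with the subset argument of duplicated. The toy frame and its values are made up for illustration; only the column names ("eventid", "country_txt", "nkill") come from the GTD.

```python
import pandas as pd

# Toy frame with made-up values: two rows share an eventid but differ in nkill
toy = pd.DataFrame({
    "eventid": [1001, 1002, 1002, 1003],
    "country_txt": ["Mexico", "Japan", "Japan", "Italy"],
    "nkill": [1.0, 0.0, 2.0, 0.0],
})

# No row is an exact copy of another, so the full-row check finds nothing
full_row_duplicates = int(toy.duplicated().sum())

# Restricting the check to the key column reveals the repeated identifier
repeated_ids = int(toy.duplicated(subset=["eventid"]).sum())

# drop_duplicates returns a new frame; here we keep the first row per eventid
deduplicated = toy.drop_duplicates(subset=["eventid"], keep="first")

print("Full-row duplicates:", full_row_duplicates)
print("Repeated event ids:", repeated_ids)
print("Rows before:", len(toy), "after:", len(deduplicated))
```

Whether keeping the first row is the right choice depends on why an identifier repeats, so treat that as an assumption of the sketch.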

In [None]:
data.isnull().sum()

In [None]:
# Missing values can affect the analysis, so we handle them before going further.

# Keep only columns with non-null values in at least 70% of the rows
data = data.dropna(thresh=int(len(data) * 0.7), axis=1)
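
The thresh argument is easy to misread: it is the minimum number of non-null values a column needs in order to be kept, not the number of missing values allowed. A small sketch on a made-up toy frame (the values are illustrative, not taken from the GTD) shows which columns survive a 70% threshold:

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values and different amounts of missing data
toy = pd.DataFrame({
    "iyear": [1970 + i for i in range(10)],                            # 100% non-null
    "nkill": [1.0, 0.0, 2.0, np.nan, 3.0, 0.0, 1.0, 4.0, 0.0, 2.0],    # 90% non-null
    "motive": [np.nan] * 6 + ["a", "b", "c", "d"],                     # 40% non-null
})

# Share of non-null values per column
non_null_share = toy.notna().mean()

# A column is kept only if it has at least this many non-null values
min_non_null = int(len(toy) * 0.7)
kept = toy.dropna(thresh=min_non_null, axis=1)

print(non_null_share)
print("Columns kept:", list(kept.columns))
```

Printing the non-null share first makes it easy to see which columns a given threshold would remove before committing to it.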

In [None]:
data

In [None]:
# Drop rows where the fatality count (nkill) is missing
data = data.dropna(subset=["nkill"])


In [None]:
data

In [None]:
from sklearn.preprocessing import LabelEncoder

# Initialize LabelEncoder
label_encoder = LabelEncoder()

# Fit and transform the "attacktype1_txt" column
encoded_values = label_encoder.fit_transform(data["attacktype1_txt"])

# Map each integer code back to its original category
encoded_to_category = dict(enumerate(label_encoder.classes_))

# Print the mapping
print(encoded_to_category)


Reason for Label Encoding: The "attacktype1_txt" column represents different types of terrorist attacks, such as bombings, assassinations, and hijackings. These values are categorical and have no inherent order or numerical meaning. Label encoding converts them into integer labels, assigned in sorted (alphabetical) order of the category names. The integers are identifiers only: their ordering is an artifact of the encoding and should not be read as a ranking of attack types.
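
To make the code assignment concrete, here is a pandas-only sketch. It is a stand-in rather than the encoder used above: pandas' Categorical assigns codes in sorted order of the category names, which is also how scikit-learn's LabelEncoder orders its classes. The four sample values are made up for illustration.

```python
import pandas as pd

# Made-up sample of attack type labels
attack_types = pd.Series(
    ["Bombing/Explosion", "Armed Assault", "Hijacking", "Armed Assault"]
)

# Categories are sorted alphabetically, then numbered from 0
categorical = pd.Categorical(attack_types)
codes = [int(code) for code in categorical.codes]

# Lookup table from integer code back to the category name
code_to_category = dict(enumerate(categorical.categories))

print(codes)
print(code_to_category)
```

"Armed Assault" receives code 0 only because it sorts first alphabetically, which is why the codes should not be treated as magnitudes.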

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
# Create a figure and axis
fig, ax = plt.subplots(figsize=(10, 7))

# Create a box plot for the nkill column
bp = ax.boxplot(data["nkill"], patch_artist=True, vert=False, widths=0.4, showmeans=True,
                meanprops={"marker": "o", "markerfacecolor": "blue", "markeredgecolor": "blue"})

# Define custom colors
box_color = 'lightgreen'
outlier_color = 'red'

# Set box plot face color
for patch in bp['boxes']:
    patch.set_facecolor(box_color)

# Set whisker and cap color and linewidth
for whisker, cap in zip(bp['whiskers'], bp['caps']):
    whisker.set(color='black', linewidth=1)
    cap.set(color='black', linewidth=2)

# Set median color and linewidth
for median in bp['medians']:
    median.set(color='black', linewidth=3)

# Set outlier marker style and color
for flier in bp['fliers']:
    flier.set(marker='o', markersize=6, markerfacecolor=outlier_color, alpha=0.5)

# Set y-axis label
ax.set_yticklabels(["nkill"])

# Set title
plt.title("Deeper look into Nkill feature")

# Show plot
plt.show()

   




In [None]:
# The box plot shows many extreme values in nkill, so we remove the most extreme ones
import numpy as np

# Flag outliers with the Z-score method: values more than 3 standard deviations from the mean
z_scores = np.abs((data["nkill"] - data["nkill"].mean()) / data["nkill"].std())
outlier_threshold = 3  # Adjust this threshold as needed
is_outlier = z_scores > outlier_threshold

# Keep the non-outlier rows and reset the index
data = data[~is_outlier].reset_index(drop=True)

# Print the number of outliers removed
print(f"Number of outliers removed: {is_outlier.sum()}")
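
To see what the Z-score rule does in isolation, here is a small sketch on made-up fatality counts (twenty illustrative values, not GTD data). It applies the same formula and threshold and reports which values are flagged.

```python
import numpy as np
import pandas as pd

# Made-up fatality counts: mostly small values plus one extreme incident
nkill = pd.Series([0.0] * 10 + [1.0] * 6 + [2.0] * 3 + [100.0])

# Z-score: distance from the mean in units of the sample standard deviation
z_scores = np.abs((nkill - nkill.mean()) / nkill.std())

# Flag values more than 3 standard deviations from the mean
is_outlier = z_scores > 3
filtered = nkill[~is_outlier].reset_index(drop=True)

print("Largest z-score:", round(float(z_scores.max()), 2))
print("Values flagged:", nkill[is_outlier].tolist())
print("Rows kept:", len(filtered))
```

Note that the extreme value itself inflates both the mean and the standard deviation, so on heavily skewed data the rule only removes the most extreme cases.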





In [None]:


# Create a figure and axis
fig, ax = plt.subplots(figsize=(10, 7))

# Create a box plot for the nkill column
bp = ax.boxplot(data["nkill"], patch_artist=True, vert=False, widths=0.4, showmeans=True,
                meanprops={"marker": "o", "markerfacecolor": "blue", "markeredgecolor": "blue"})

# Define custom colors
box_color = 'lightgreen'
outlier_color = 'red'

# Set box plot face color
for patch in bp['boxes']:
    patch.set_facecolor(box_color)

# Set whisker and cap color and linewidth
for whisker, cap in zip(bp['whiskers'], bp['caps']):
    whisker.set(color='black', linewidth=1)
    cap.set(color='black', linewidth=2)

# Set median color and linewidth
for median in bp['medians']:
    median.set(color='black', linewidth=3)

# Set outlier marker style and color
for flier in bp['fliers']:
    flier.set(marker='o', markersize=6, markerfacecolor=outlier_color, alpha=0.5)

# Set y-axis label
ax.set_yticklabels(["nkill"])

# Set title
plt.title("Deeper look into Nkill feature (after outlier removal)")

# Show plot
plt.show()

In [None]:
data.shape

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


numeric_data = data.select_dtypes(include=['number'])

# Correlation Matrix
correlation_matrix = numeric_data.corr()

plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, cmap="coolwarm")
plt.title("Correlation Matrix")
plt.show()


The key findings from the above correlation matrix are:

1. The high correlation between "country" and "natlty1" indicates that the nationality of the target ("natlty1") is often the same as the country where the attack occurred. In other words, terrorists tend to target individuals or groups from the country in which the attack takes place.

2. The high correlation between "attacktype1" and "weaptype1" suggests a meaningful relationship between the type of attack and the weapon used. Terrorist groups often plan their attacks carefully, and the choice of weapon is part of that strategy, so certain attack types tend to go with specific weapon types. For example, bombings typically involve explosives, while armed assaults typically involve firearms. Weapons also differ in lethality: explosives and firearms can cause significant casualties, while weapons such as knives may be less lethal. Understanding this relationship can help in assessing the potential impact and danger of different types of attacks.

Both pairs consist of integer codes for categories, so these coefficients are best read as a rough signal of association rather than as a precise linear relationship.
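
A heatmap without annotations makes it hard to read off exact pairs. As a hedged sketch, the strongest pairs in a correlation matrix can also be listed directly. The helper name top_correlated_pairs and the toy columns "a", "b", "c" are hypothetical and chosen only for illustration.

```python
from itertools import combinations

import pandas as pd


def top_correlated_pairs(frame, n=5):
    """Return the n column pairs with the largest absolute correlation."""
    corr = frame.corr()
    # Visit each unordered pair of columns once
    pairs = pd.Series(
        {(left, right): abs(corr.loc[left, right])
         for left, right in combinations(corr.columns, 2)}
    )
    return pairs.sort_values(ascending=False).head(n)


# Toy numeric frame with made-up values: "b" is exactly twice "a"
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 1, 4, 2, 3],
})

pairs = top_correlated_pairs(toy, n=2)
print(pairs)
```

Applied to the numeric columns of the real dataset, the same helper would list the pairs that stand out in the heatmap.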






We'll be further exploring the correlation between suicide and weapon subtype in the next cell.

In [None]:
contingency_table = pd.crosstab(data['suicide'], data['weapsubtype1_txt'])
print(contingency_table)




The use of explosives stands out as a factor associated with suicide attacks. There were 55 non-suicide incidents involving explosives and 28 suicide incidents involving explosives. Since suicide incidents make up only a small minority of the dataset, these counts suggest that suicide attacks are more likely to involve explosives than non-suicide incidents are.

This matters because it ties a specific weapon subtype (explosives) to suicide attacks. For security and counterterrorism efforts, it indicates that prevention and response measures focused on explosives may be particularly relevant when addressing suicide terrorist incidents.
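
Raw counts are hard to compare when one group is much smaller than the other. A minimal sketch of two common follow-ups is shown here: row-normalized shares and a chi-square test of independence. The toy rows and the weapon labels "Explosives" and "Firearms" are made up for illustration and are far too few for real inference.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Toy data with made-up values: 8 non-suicide and 4 suicide incidents
toy = pd.DataFrame({
    "suicide": [0] * 8 + [1] * 4,
    "weapsubtype1_txt": ["Explosives"] * 2 + ["Firearms"] * 6 + ["Explosives"] * 4,
})

# Raw counts, as in the contingency table above
counts = pd.crosstab(toy["suicide"], toy["weapsubtype1_txt"])

# Share of each weapon subtype within suicide and non-suicide incidents
row_shares = pd.crosstab(toy["suicide"], toy["weapsubtype1_txt"], normalize="index")

# Chi-square test of independence between the two variables
chi2, p_value, dof, expected = chi2_contingency(counts)

print(row_shares)
print(f"chi2={chi2:.3f}, p={p_value:.4f}, dof={dof}")
```

Row shares answer the question the text is really asking: what fraction of suicide incidents involve explosives, compared with the fraction of non-suicide incidents.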

In [None]:
data.columns


In [None]:
# Total killings by year
plt.figure(figsize=(18, 6))
attacks_by_year = data.groupby("iyear")["nkill"].sum()
plt.plot(attacks_by_year.index, attacks_by_year.values, marker="o")
plt.title("Total Killings by Year")
plt.xlabel("Year")
plt.ylabel("Total Killings")
plt.grid(True)

# Set x-axis labels to display every 5 years
plt.xticks(range(1970, 2023, 5))  # Adjust the range as needed

plt.show()


We can observe that the number of killings from terrorist attacks reaches its peak in 2015 and then starts to decrease. Let's find out which terrorist groups were responsible for this peak.
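
The peak year can also be read off programmatically instead of from the chart. A minimal sketch on made-up toy rows (the years and counts are illustrative only):

```python
import pandas as pd

# Toy incidents with made-up years and fatality counts
toy = pd.DataFrame({
    "iyear": [2001, 2001, 2002, 2003, 2003, 2004],
    "nkill": [2.0, 1.0, 4.0, 5.0, 3.0, 2.0],
})

# Total killings per year, the same aggregation used for the line chart
killings_by_year = toy.groupby("iyear")["nkill"].sum()

# idxmax returns the index label (the year) of the largest total
peak_year = int(killings_by_year.idxmax())
peak_total = float(killings_by_year.max())

print(f"Peak year: {peak_year} with {peak_total:.0f} killings")
```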

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Filter the data for the year 2015
data_2015 = data[data['iyear'] == 2015]

# Group the data by terrorist group and sum the total killings
killings_by_group = data_2015.groupby("gname")["nkill"].sum()

# Sort the groups by total killings in descending order and select the top 10
top_10_groups = killings_by_group.sort_values(ascending=False).head(10)

# Create a colormap with shades of red
colors = plt.cm.Reds(np.linspace(1,0.2, 10))

# Create a bar chart with the colormap
plt.figure(figsize=(12, 6))
top_10_groups.plot(kind="bar", color=colors)
plt.title("Top 10 Terrorist Groups with Most Killings in 2015")
plt.xlabel("Terrorist Group")
plt.ylabel("Total Killings")
plt.xticks(rotation=45)
plt.show()




Several factors help explain the 2015 peak and the decline that followed:

1. **Rise of ISIS:** The year 2015 marked a significant period in the rise of the Islamic State of Iraq and Syria (ISIS) as a major terrorist organization. ISIS was responsible for numerous large-scale attacks in different parts of the world during this period, contributing to a surge in terrorist violence and casualties.

2. **Syrian Civil War:** The Syrian Civil War, which began in 2011, intensified in the years leading up to 2015. The conflict attracted foreign fighters and led to increased terrorism and violence in Syria, in neighboring countries, and globally.

3. **Global Response:** The increase in terrorist activities by groups like ISIS prompted a global response, including military interventions and coordinated efforts to counter terrorism. These actions may have started to weaken terrorist organizations, contributing to the decrease in killings after 2015.

4. **Counterterrorism Measures:** Many countries implemented stricter counterterrorism measures, enhanced intelligence sharing, and increased security efforts in the aftermath of high-profile attacks. These measures may have hindered the ability of terrorist groups to carry out large-scale attacks, further contributing to the decrease after 2015.


In summary, the peak in killings in 2015 and the subsequent decrease can plausibly be attributed to a combination of factors, including the rise and subsequent decline of ISIS, ongoing conflicts, global counterterrorism efforts, and potential data limitations such as changes in how incidents are reported. Analyzing these trends alongside broader geopolitical events and counterterrorism strategies can provide a more comprehensive understanding of the dynamics behind the data.

In [None]:
# Total killings by region
plt.figure(figsize=(12, 6))
attacks_by_region = data.groupby("region_txt")["nkill"].sum().sort_values(ascending=False)
sns.barplot(x=attacks_by_region.values, y=attacks_by_region.index)
plt.title("Total Killings by Region")
plt.xlabel("Total Killings")
plt.ylabel("Region")
plt.show()

The Middle East & North Africa and South Asia regions have suffered more killings than any other region.

In [None]:
# Total killings by attack type
plt.figure(figsize=(12, 6))
attacks_by_attack_type = data.groupby("attacktype1_txt")["nkill"].sum().sort_values(ascending=False)
sns.barplot(x=attacks_by_attack_type.values, y=attacks_by_attack_type.index)
plt.title("Total Killings by Attack Type")
plt.xlabel("Total Killings")
plt.ylabel("Attack Type")
plt.show()

Most killings resulted from Bombing/Explosion and Armed Assault attacks.

In [None]:
# Total killings by target type
plt.figure(figsize=(12, 6))
attacks_by_target_type = data.groupby("targtype1_txt")["nkill"].sum().sort_values(ascending=False)
sns.barplot(x=attacks_by_target_type.values, y=attacks_by_target_type.index)
plt.title("Total Killings by Target Type")
plt.xlabel("Total Killings")
plt.ylabel("Target Type")
plt.show()

The majority of deaths occurred when the target was private citizens and property, the military, or the police.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
# Share of killings among the 10 groups with the most killings
plt.figure(figsize=(8, 8))
top_groups = data.groupby("gname")["nkill"].sum().nlargest(10)
top_groups = top_groups.reset_index()
colors = sns.color_palette("pastel")[0:len(top_groups)]

# Create a pie chart
pie, texts, autotexts = plt.pie(top_groups["nkill"], labels=top_groups["gname"], colors=colors, autopct='%1.1f%%', startangle=140)
plt.title("Total Killings by Groups in percentage")
plt.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

# Show the plot
plt.show()




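Among the ten groups with the most killings, attacks attributed to Unknown perpetrators account for the largest share at 49.8%, followed by the Taliban at 12.8% and ISIL at 11.8%. These percentages are shares of the top-10 total, not of all killings in the dataset.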
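
Because the pie chart is built from the top groups only, its percentages differ from shares of the full dataset. A minimal sketch of the two calculations side by side, on made-up toy rows (the group names "Group A" to "Group C" and all counts are hypothetical):

```python
import pandas as pd

# Toy incidents with made-up group names and fatality counts
toy = pd.DataFrame({
    "gname": ["Unknown", "Unknown", "Group A", "Group B", "Group C"],
    "nkill": [4.0, 2.0, 3.0, 1.0, 10.0],
})

# Total killings per group, then the two deadliest groups
totals = toy.groupby("gname")["nkill"].sum()
top_groups = totals.nlargest(2)

# Share within the selected top groups (what a pie chart of top groups shows)
share_of_top = top_groups / top_groups.sum() * 100

# Share of all killings in the data
share_of_all = top_groups / totals.sum() * 100

print(share_of_top.round(1))
print(share_of_all.round(1))
```

The gap between the two series grows as more killings fall outside the selected groups, so it is worth stating which denominator a percentage uses.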

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


# Filter data for the Middle East & North Africa and Sub-Saharan Africa regions
mena_data = data[data['region_txt'] == 'Middle East & North Africa']
ssa_data = data[data['region_txt'] == 'Sub-Saharan Africa']

# Set up the plot
plt.figure(figsize=(10, 6))
sns.histplot(data=mena_data, x='nkill', bins=20, kde=True, color='blue', label='Middle East & North Africa')
sns.histplot(data=ssa_data, x='nkill', bins=20, kde=True, color='orange', label='Sub-Saharan Africa')

# Add labels and title
plt.xlabel('Number of Killings (nkill)')
plt.ylabel('Frequency')
plt.title('Distribution of Killings in Middle East & North Africa vs. Sub-Saharan Africa')
plt.legend()

# Show the plot
plt.show()


In [None]:
# Create a pivot table of weapon types by region
pivot_table = data.pivot_table(index="region_txt", columns="weaptype1_txt", values="eventid", aggfunc="count", fill_value=0)

# Create a heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(pivot_table, cmap="YlGnBu")
plt.xlabel("Weapon Type")
plt.ylabel("Region")
plt.title("Weapon Types Used in Terrorism Incidents by Region")
plt.show()


## Hypothesis Testing

**Question:** Is there a significant difference in the average number of fatalities between attacks in the Middle East & North Africa region and attacks in the Sub-Saharan Africa region?

**Hypotheses:**

- Null Hypothesis (H0): There is no difference in the average number of fatalities between the two regions.
- Alternative Hypothesis (H1): There is a difference in the average number of fatalities between the two regions.

**Statistical Test:**
We use an independent two-sample t-test because we are comparing the means of two independent groups (attacks in different regions). The code below passes equal_var=False, which selects Welch's variant of the test and does not assume that the two regions have equal variances.
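
To show what the test computes, here is a minimal sketch that calculates the unequal-variance (Welch) t-statistic by hand and compares it with SciPy's result. The two samples and the helper name welch_t are made up for illustration.

```python
import numpy as np
from scipy import stats

# Made-up fatality counts for two hypothetical regions
region_a = np.array([0.0, 1.0, 1.0, 2.0, 5.0, 0.0, 3.0])
region_b = np.array([0.0, 0.0, 1.0, 4.0, 9.0, 2.0])


def welch_t(x, y):
    """Difference in means divided by the unpooled standard error."""
    standard_error = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
    return (x.mean() - y.mean()) / standard_error


manual_t = welch_t(region_a, region_b)

# equal_var=False asks SciPy for the same unequal-variance test
scipy_t, p_value = stats.ttest_ind(region_a, region_b, equal_var=False)

print(f"manual t = {manual_t:.4f}, scipy t = {scipy_t:.4f}, p = {p_value:.4f}")
```

Each sample contributes its own variance to the standard error, which is why the test remains usable when one region's fatalities are far more spread out than the other's.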


In [None]:

import scipy.stats as stats


# Select data for the two regions of interest
middle_east_north_africa = data[data["region_txt"] == "Middle East & North Africa"]
sub_saharan_africa = data[data["region_txt"] == "Sub-Saharan Africa"]

# Perform independent two-sample t-test
t_statistic, p_value = stats.ttest_ind(middle_east_north_africa["nkill"], sub_saharan_africa["nkill"], equal_var=False)

# Compare p-value to significance level
alpha = 0.05
if p_value < alpha:
    print("Reject null hypothesis: There is a significant difference in average fatalities between the two regions.")
else:
    print("Fail to reject null hypothesis: There is no significant difference in average fatalities between the two regions.")


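Next, we test whether average fatalities differ between attacks by known groups and attacks by unknown groups.

In [None]: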

# Separate data for known groups and unknown groups
known_groups = data[data['gname'] != 'Unknown']
unknown_groups = data[data['gname'] == 'Unknown']

# Perform two-sample t-test
t_statistic, p_value = stats.ttest_ind(known_groups['nkill'], unknown_groups['nkill'], equal_var=False)

# Print the results
print("Two-Sample T-Test Results:")
print(f"T-Statistic: {t_statistic}")
print(f"P-Value: {p_value}")

# Determine significance level (e.g., 0.05)
alpha = 0.05
if p_value < alpha:
    print("Null hypothesis rejected: There is a significant difference in average fatalities between known and unknown groups.")
else:
    print("Null hypothesis not rejected: There is no significant difference in average fatalities between known and unknown groups.")


**Comparing Fatalities in Attacks by Known and Unknown Groups:**

Question: Are attacks carried out by unknown groups associated with significantly higher fatalities than attacks carried out by known groups?
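
A directional question like this one is usually paired with a one-sided test, since a two-sided test only says that the means differ, not which one is larger. A minimal sketch of the difference, using two made-up samples (group_a and group_b are hypothetical names):

```python
import numpy as np
from scipy import stats

# Made-up samples; group_a has the larger mean
group_a = np.array([3.0, 4.0, 2.0, 5.0, 6.0, 4.0])
group_b = np.array([1.0, 2.0, 0.0, 3.0, 1.0, 2.0])

# Two-sided: H1 is that the means differ in either direction
two_sided = stats.ttest_ind(group_a, group_b, equal_var=False)

# One-sided: H1 is that the mean of the first sample is greater
one_sided = stats.ttest_ind(group_a, group_b, alternative="greater", equal_var=False)

# Same one-sided test with the samples swapped, so H1 points the wrong way
reversed_one_sided = stats.ttest_ind(group_b, group_a, alternative="greater", equal_var=False)

print(f"two-sided p          = {two_sided.pvalue:.5f}")
print(f"one-sided p          = {one_sided.pvalue:.5f}")
print(f"reversed one-sided p = {reversed_one_sided.pvalue:.5f}")
```

When the observed difference points in the hypothesized direction, the one-sided p-value is half the two-sided one. When it points the other way, the one-sided p-value is large, which a two-sided test alone would not reveal.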

In [None]:

# Filter data for known and unknown groups
known_group_data = data[data['gname'] != 'Unknown']
unknown_group_data = data[data['gname'] == 'Unknown']

# Perform a one-sided (right-tailed) Welch t-test: H1 is that unknown groups have the higher mean
t_statistic, p_value = stats.ttest_ind(unknown_group_data['nkill'], known_group_data['nkill'], alternative='greater', equal_var=False)

# Print the results
print("One-Sided T-Test Results:")
print(f"T-Statistic: {t_statistic}")
print(f"P-Value: {p_value}")

# Determine significance level (e.g., 0.05)
alpha = 0.05
if p_value < alpha:
    print("Null hypothesis rejected: Attacks carried out by unknown groups are associated with significantly higher fatalities than attacks by known groups.")
else:
    print("Null hypothesis not rejected: There is no evidence that attacks by unknown groups have higher average fatalities than attacks by known groups.")


Is the average number of total killings in bombing attacks greater than in armed assault attacks?

In [None]:

# Filter data for attacks carried out by bombing and armed assault
bombing_data = data[data['attacktype1_txt'] == 'Bombing/Explosion']
armed_assault_data = data[data['attacktype1_txt'] == 'Armed Assault']

# Perform one-sided t-test (right-tailed)
t_statistic, p_value = stats.ttest_ind(bombing_data['nkill'], armed_assault_data['nkill'], alternative='greater', equal_var=False)

# Print the results
print("One-Sided T-Test Results:")
print(f"T-Statistic: {t_statistic}")
print(f"P-Value: {p_value}")

# Determine significance level (e.g., 0.05)
alpha = 0.05
if p_value < alpha:
    print("Null hypothesis rejected: The average total killings in bombing attacks is significantly greater than in armed assault attacks.")
else:
    print("Null hypothesis not rejected: There is no evidence that average killings in bombing attacks are greater than in armed assault attacks.")



**Conclusion:**

The analysis of global terrorist incidents reveals several key insights into the patterns and dynamics of terrorist attacks over the years. These insights shed light on the factors contributing to the rise and decline of terrorism-related killings and provide valuable information for understanding the nature of terrorist incidents.

**1. Rise and Decline of Terrorism-Related Killings:**
   - The peak in the number of killings by terrorist attacks occurred in the year 2015, followed by a subsequent decrease. This pattern can be attributed to multiple factors:
     - **Rise of ISIS:** The year 2015 witnessed the significant rise of the Islamic State of Iraq and Syria (ISIS), leading to numerous large-scale attacks worldwide.
     - **Syrian Civil War:** The intensification of the Syrian Civil War and its global impact contributed to increased terrorism and violence.
     - **Global Response:** International efforts to counter terrorism, including military interventions and coordinated actions, began to impact terrorist organizations.
     - **Counterterrorism Measures:** Stricter counterterrorism measures, intelligence sharing, and enhanced security efforts hindered the capacity of terrorist groups to carry out large-scale attacks.
     - **Data Limitations:** Data reporting variations and limitations may have influenced the decline in reported incidents in more recent years.

**2. Regional Patterns:**
   - The Middle East & North Africa and South Asia regions have faced the highest total number of killings of any region.

**3. Attack Types and Targets:**
   - Bombings/explosions and armed assaults account for the most killings.
   - The majority of deaths occurred when the targets were private citizens and property, the military, or the police.

**4. Terrorist Groups:**
   - Among the ten groups with the most killings, Unknown perpetrators account for the largest share at 49.8%, followed by the Taliban at 12.8% and ISIL at 11.8%.

**5. Weapons Used:**
   - Explosives are the primary weapons used, especially in the Middle East & North African and South Asian regions.

**6. Differences in Average Fatalities:**
   - Average fatalities differ significantly between the Middle East & North Africa region and the Sub-Saharan Africa region.
   - Average fatalities also differ significantly between attacks by unknown groups and attacks by known groups. Because that comparison was two-sided, the sign of the t-statistic is what indicates which group is higher.

**7. Attack Types and Total Killings:**
   - The one-sided test found no evidence that average killings in bombing attacks are greater than in armed assault attacks.

In conclusion, the rise and subsequent decline in terrorism-related killings are the result of complex interactions among geopolitical events, global responses, and counterterrorism efforts. The insights drawn from this analysis provide a deeper understanding of the evolving nature of terrorist incidents, which can inform strategies for prevention, response, and international cooperation in countering terrorism.    