# Nuclear Explosions Data Analysis (1945-1998)

## A Comprehensive Data Science Study

This notebook analyzes nuclear explosions worldwide from 1945 to 1998. It covers when tests occurred, which countries conducted them, their purpose and type, their yields, and where they took place.

**Author:** Data Science Analysis  
**Dataset:** Nuclear Explosions (2046 records)  
**Source:** DOE and other sources

## 1. Setup and Data Loading

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

# Set visualization style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
%matplotlib inline

# Set figure size default
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 10

In [None]:
# Load the dataset
df = pd.read_csv('nuclear_explosions.csv')

print("Dataset loaded successfully!")
print(f"Total records: {len(df)}")
print(f"Columns: {df.shape[1]}")

## 2. Exploratory Data Analysis

In [None]:
# Display first few rows
df.head(10)

In [None]:
# Dataset information
df.info()

In [None]:
# Check for missing values
missing_values = df.isnull().sum()
print("Missing values per column:")
print(missing_values[missing_values > 0] if missing_values.sum() > 0 else "No missing values!")

In [None]:
# Statistical summary
df.describe()

## 3. Feature Engineering

In [None]:
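# Create additional features for analysis
# Column names are used exactly as they appear in the dataset, including the 'Yeild' spelling.
df['Average.Yield'] = (df['Data.Yeild.Lower'] + df['Data.Yeild.Upper']) / 2
df['Decade'] = (df['Date.Year'] // 10) * 10
# pd.cut bins are right-closed by default: (0, 20], (20, 150], (150, 1000], (1000, inf].
# A yield of exactly 0 falls outside the first bin and becomes NaN.
df['Yield.Category'] = pd.cut(df['Average.Yield'],
                               bins=[0, 20, 150, 1000, np.inf],
                               labels=['Low (<20kt)', 'Medium (20-150kt)',
                                      'High (150-1000kt)', 'Very High (>1000kt)'])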

print("New features created:")
print("  - Average.Yield")
print("  - Decade")
print("  - Yield.Category")
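
How `pd.cut` treats values that sit exactly on a bin edge affects the category counts reported later. The sketch below uses the same bin edges on a handful of made-up yields (the `sample` values and shortened labels are assumptions for illustration, not rows from the dataset) and compares the default behavior with `include_lowest=True`.

```python
import numpy as np
import pandas as pd

# Hypothetical yields (kt) chosen to sit on or near the bin edges.
sample = pd.Series([0.0, 20.0, 20.1, 150.0, 1000.0, 50000.0])

bins = [0, 20, 150, 1000, np.inf]
labels = ['Low', 'Medium', 'High', 'Very High']

# Default: intervals are (left, right], so 0.0 falls outside the first bin.
default_cut = pd.cut(sample, bins=bins, labels=labels)

# include_lowest=True closes the first interval on the left: [0, 20].
inclusive_cut = pd.cut(sample, bins=bins, labels=labels, include_lowest=True)

print(pd.DataFrame({'yield_kt': sample,
                    'default': default_cut,
                    'include_lowest': inclusive_cut}))
```

Whether zero-yield records belong in the lowest category or should stay unclassified depends on what a zero means in the source data, so this is a choice to make deliberately rather than a fix to apply blindly.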

## 4. Temporal Analysis

In [None]:
# Explosions over time
yearly_counts = df.groupby('Date.Year').size()

plt.figure(figsize=(16, 6))
plt.plot(yearly_counts.index, yearly_counts.values, marker='o', linewidth=2, markersize=4)
plt.xlabel('Year', fontsize=12)
plt.ylabel('Number of Explosions', fontsize=12)
plt.title('Nuclear Explosions Over Time (1945-1998)', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.axvline(x=1963, color='red', linestyle='--', alpha=0.5, label='Partial Test Ban Treaty')
plt.axvline(x=1991, color='blue', linestyle='--', alpha=0.5, label='End of Cold War')
plt.legend()
plt.tight_layout()
plt.show()

print(f"Peak year: {yearly_counts.idxmax()} with {yearly_counts.max()} explosions")
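
One thing worth checking in the yearly series: `groupby(...).size()` only returns years that actually appear in the data, so a year with no recorded tests would be missing from the index and a line plot would draw straight across it. A minimal sketch of filling such gaps with zeros, using made-up years (the `years` series is an assumption for illustration, not the real dataset):

```python
import pandas as pd

# Hypothetical test years; 1947 and 1950 are deliberately absent.
years = pd.Series([1945, 1945, 1946, 1948, 1948, 1949, 1951])

# groupby/size only returns years that occur in the data.
counts = years.groupby(years).size()

# Reindex over the full range so missing years show up as explicit zeros.
full_range = list(range(int(years.min()), int(years.max()) + 1))
counts_filled = counts.reindex(full_range, fill_value=0)

print(counts_filled)
```

Applied to the notebook, the same `reindex` call on `yearly_counts` would leave the peak-year result unchanged while making any quiet years visible in the plot.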

In [None]:
# Decade-wise distribution
decade_counts = df.groupby('Decade').size()

plt.figure(figsize=(12, 6))
plt.bar(decade_counts.index.astype(str), decade_counts.values, color='coral', edgecolor='black')
plt.xlabel('Decade', fontsize=12)
plt.ylabel('Number of Explosions', fontsize=12)
plt.title('Nuclear Explosions by Decade', fontsize=14, fontweight='bold')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

print("\nExplosions by decade:")
print(decade_counts)

In [None]:
# Cumulative explosions
cumulative = yearly_counts.cumsum()

plt.figure(figsize=(14, 6))
plt.fill_between(cumulative.index, cumulative.values, alpha=0.7, color='skyblue')
plt.plot(cumulative.index, cumulative.values, color='navy', linewidth=2)
plt.xlabel('Year', fontsize=12)
plt.ylabel('Cumulative Number of Explosions', fontsize=12)
plt.title('Cumulative Nuclear Explosions Over Time', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## 5. Country-wise Analysis

In [None]:
# Nuclear explosions by country
country_counts = df['Location.Country'].value_counts()

plt.figure(figsize=(12, 6))
plt.barh(range(len(country_counts)), country_counts.values, color='steelblue')
plt.yticks(range(len(country_counts)), country_counts.index)
plt.xlabel('Number of Explosions', fontsize=12)
plt.title('Nuclear Explosions by Country', fontsize=14, fontweight='bold')
plt.gca().invert_yaxis()
plt.grid(axis='x', alpha=0.3)
plt.tight_layout()
plt.show()

print("\nCountry statistics:")
print(country_counts)

In [None]:
# Top countries over time
top_countries = country_counts.head(5).index

plt.figure(figsize=(16, 8))
for country in top_countries:
    country_yearly = df[df['Location.Country'] == country].groupby('Date.Year').size()
    plt.plot(country_yearly.index, country_yearly.values, marker='o', label=country, linewidth=2, markersize=4)

plt.xlabel('Year', fontsize=12)
plt.ylabel('Number of Explosions', fontsize=12)
plt.title('Top 5 Countries: Nuclear Testing Timeline', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Total yield by country
country_yield = df.groupby('Location.Country')['Average.Yield'].agg(['sum', 'mean', 'max', 'count'])
country_yield = country_yield.sort_values('sum', ascending=False)

print("Yield by country (kilotons), sorted by total yield:")
print(country_yield)

## 6. Purpose and Type Analysis

In [None]:
# Purpose distribution
purpose_counts = df['Data.Purpose'].value_counts()

fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Pie chart
axes[0].pie(purpose_counts.values, labels=purpose_counts.index, autopct='%1.1f%%',
           startangle=90, textprops={'fontsize': 11})
axes[0].set_title('Distribution by Purpose', fontsize=14, fontweight='bold')

# Bar chart
axes[1].bar(range(len(purpose_counts)), purpose_counts.values, color='teal')
axes[1].set_xticks(range(len(purpose_counts)))
axes[1].set_xticklabels(purpose_counts.index, rotation=45, ha='right')
axes[1].set_ylabel('Number of Explosions', fontsize=12)
axes[1].set_title('Explosions by Purpose', fontsize=14, fontweight='bold')
axes[1].grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

print("\nPurpose statistics:")
print(purpose_counts)

In [None]:
# Type distribution
type_counts = df['Data.Type'].value_counts()

plt.figure(figsize=(14, 7))
plt.pie(type_counts.values, labels=type_counts.index, autopct='%1.1f%%',
       startangle=90, textprops={'fontsize': 10})
plt.title('Distribution by Explosion Type', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("\nType statistics:")
print(type_counts)

In [None]:
# Country vs Purpose heatmap
country_purpose = pd.crosstab(df['Location.Country'], df['Data.Purpose'])

plt.figure(figsize=(12, 8))
sns.heatmap(country_purpose, annot=True, fmt='d', cmap='YlOrRd', 
           cbar_kws={'label': 'Number of Explosions'})
plt.title('Country vs Purpose Heatmap', fontsize=14, fontweight='bold')
plt.xlabel('Purpose', fontsize=12)
plt.ylabel('Country', fontsize=12)
plt.tight_layout()
plt.show()

## 7. Yield Analysis

In [None]:
# Yield statistics
print("Yield Statistics (kilotons):")
print(f"Mean: {df['Average.Yield'].mean():.2f}")
print(f"Median: {df['Average.Yield'].median():.2f}")
print(f"Std Dev: {df['Average.Yield'].std():.2f}")
print(f"Min: {df['Average.Yield'].min():.2f}")
print(f"Max: {df['Average.Yield'].max():.2f}")

# Distribution
plt.figure(figsize=(14, 6))
plt.hist(df['Average.Yield'], bins=50, color='orange', edgecolor='black', alpha=0.7)
plt.xlabel('Yield (kilotons)', fontsize=12)
plt.ylabel('Frequency (log scale)', fontsize=12)
plt.title('Distribution of Nuclear Explosion Yields', fontsize=14, fontweight='bold')
plt.yscale('log')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
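
With yields spanning several orders of magnitude, equal-width bins tend to put almost every test into the first bar. A possible alternative is log-spaced bin edges. The sketch below builds them for synthetic, heavy-tailed data (the lognormal `yields` array is an assumption standing in for `Average.Yield`, not the real values).

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical heavy-tailed yields in kt, standing in for Average.Yield.
yields = rng.lognormal(mean=3.0, sigma=2.0, size=1000)

# Log-spaced edges need strictly positive values, so drop zeros first.
positive = yields[yields > 0]
edges = np.logspace(np.log10(positive.min()), np.log10(positive.max()), num=31)

# 31 edges define 30 bins, each covering an equal range on a log axis.
counts, _ = np.histogram(positive, bins=edges)
print(counts)
```

Passing `edges` as `bins` to `plt.hist` together with `plt.xscale('log')` gives bars of equal visual width. Zero yields cannot be shown on a log axis and would need to be reported separately.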

In [None]:
# Top 10 largest explosions
print("Top 10 Largest Nuclear Explosions:")
largest = df.nlargest(10, 'Average.Yield')[['Data.Name', 'Location.Country', 
                                              'Date.Year', 'Average.Yield', 'Data.Type']]
display(largest)

In [None]:
# Yield by country (boxplot)
top_5_countries = country_counts.head(5).index
country_yields = [df[df['Location.Country'] == country]['Average.Yield'].values 
                 for country in top_5_countries]

plt.figure(figsize=(12, 6))
plt.boxplot(country_yields, labels=top_5_countries)
plt.ylabel('Yield (kilotons)', fontsize=12)
plt.title('Yield Distribution by Top 5 Countries', fontsize=14, fontweight='bold')
plt.yscale('log')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Yield over time
plt.figure(figsize=(16, 6))
plt.scatter(df['Date.Year'], df['Average.Yield'], alpha=0.3, s=10)
yearly_avg_yield = df.groupby('Date.Year')['Average.Yield'].mean()
plt.plot(yearly_avg_yield.index, yearly_avg_yield.values, 
        color='red', linewidth=2, label='Average Yield')
plt.xlabel('Year', fontsize=12)
plt.ylabel('Yield (kilotons)', fontsize=12)
plt.title('Nuclear Explosion Yield Over Time', fontsize=14, fontweight='bold')
plt.yscale('log')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Yield categories (sort_index keeps the Low -> Very High order so the colors match)
yield_categories = df['Yield.Category'].value_counts().sort_index()

plt.figure(figsize=(10, 6))
colors = ['green', 'yellow', 'orange', 'red']
plt.bar(range(len(yield_categories)), yield_categories.values, color=colors)
plt.xticks(range(len(yield_categories)), yield_categories.index, rotation=45, ha='right')
plt.ylabel('Number of Explosions', fontsize=12)
plt.title('Explosions by Yield Category', fontsize=14, fontweight='bold')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

print("\nYield category distribution:")
print(yield_categories)

## 8. Geographic Analysis

In [None]:
# Geographic distribution
plt.figure(figsize=(16, 8))
for country in country_counts.head(5).index:
    country_data = df[df['Location.Country'] == country]
    plt.scatter(country_data['Location.Cordinates.Longitude'], 
               country_data['Location.Cordinates.Latitude'],
               alpha=0.6, s=50, label=country)
plt.xlabel('Longitude', fontsize=12)
plt.ylabel('Latitude', fontsize=12)
plt.title('Geographic Distribution of Nuclear Tests (Top 5 Countries)', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Regional distribution
region_counts = df['Location.Region'].value_counts().head(15)

plt.figure(figsize=(12, 8))
plt.barh(range(len(region_counts)), region_counts.values, color='teal')
plt.yticks(range(len(region_counts)), region_counts.index)
plt.xlabel('Number of Explosions', fontsize=12)
plt.title('Top 15 Nuclear Test Regions', fontsize=14, fontweight='bold')
plt.gca().invert_yaxis()
plt.grid(axis='x', alpha=0.3)
plt.tight_layout()
plt.show()

## 9. Statistical Analysis

In [None]:
# Correlation analysis
correlation_cols = ['Location.Cordinates.Latitude', 'Location.Cordinates.Longitude',
                   'Data.Magnitude.Body', 'Data.Magnitude.Surface',
                   'Location.Cordinates.Depth', 'Average.Yield', 'Date.Year']

corr_matrix = df[correlation_cols].corr()

plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', center=0, 
           fmt='.2f', square=True, linewidths=1)
plt.title('Correlation Matrix of Numerical Features', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
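
`DataFrame.corr()` computes Pearson correlation by default, which measures linear association and can understate a relationship that is monotonic but strongly nonlinear, as may be the case for yield. A small sketch comparing it with Spearman rank correlation on made-up data (the `toy` frame and its columns are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data: y grows monotonically but very nonlinearly with x.
x = np.arange(1, 11, dtype=float)
toy = pd.DataFrame({'x': x, 'y': np.exp(x)})

# Pearson measures linear association; Spearman works on ranks.
pearson = toy.corr(method='pearson').loc['x', 'y']
spearman = toy.corr(method='spearman').loc['x', 'y']

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

In the notebook, `df[correlation_cols].corr(method='spearman')` would produce a matrix of the same shape that could be passed to the same heatmap call.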

In [None]:
# Compare yields between countries (statistical test)
usa_yields = df[df['Location.Country'] == 'USA']['Average.Yield']
ussr_yields = df[df['Location.Country'] == 'USSR']['Average.Yield']

# Welch's t-test: equal_var=False does not assume equal variances
t_stat, p_value = stats.ttest_ind(usa_yields, ussr_yields, equal_var=False)

print("Statistical Comparison: USA vs USSR Yields")
print(f"USA - Mean: {usa_yields.mean():.2f} kt, Median: {usa_yields.median():.2f} kt")
print(f"USSR - Mean: {ussr_yields.mean():.2f} kt, Median: {ussr_yields.median():.2f} kt")
print("\nWelch's t-test results:")
print(f"  t-statistic: {t_stat:.4f}")
print(f"  p-value: {p_value:.4f}")
if p_value < 0.05:
    print("  Result: Statistically significant difference (p < 0.05)")
else:
    print("  Result: No statistically significant difference (p >= 0.05)")
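
A t-test compares means, which a few very large yields can dominate. A rank-based test such as Mann-Whitney U is one possible complement for skewed data. The sketch below runs both on synthetic lognormal samples (`group_a` and `group_b` are assumptions standing in for two countries' yields, not the real values).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical skewed samples standing in for two countries' yields (kt).
group_a = rng.lognormal(mean=2.0, sigma=1.5, size=300)
group_b = rng.lognormal(mean=3.0, sigma=1.5, size=300)

# Welch's t-test compares means and is sensitive to a few extreme values.
t_stat, t_p = stats.ttest_ind(group_a, group_b, equal_var=False)

# Mann-Whitney U is rank-based, so extreme values have limited influence.
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative='two-sided')

print(f"Welch t-test:   t = {t_stat:.3f}, p = {t_p:.4g}")
print(f"Mann-Whitney U: U = {u_stat:.1f}, p = {u_p:.4g}")
```

The two tests answer slightly different questions (difference in means versus a shift in the distributions), so reporting both alongside the medians gives a fuller picture than either alone.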

## 10. Key Insights & Conclusions

In [None]:
# Generate comprehensive summary
print("=" * 80)
print("KEY INSIGHTS FROM NUCLEAR EXPLOSIONS ANALYSIS")
print("=" * 80)

insights = [
    f"1. Total nuclear explosions: {len(df)}",
    f"2. Time period: {df['Date.Year'].min()} - {df['Date.Year'].max()} ({df['Date.Year'].max() - df['Date.Year'].min()} years)",
    f"3. Countries conducting tests: {df['Location.Country'].nunique()}",
    f"4. Top 3 countries: {', '.join([f'{c} ({country_counts[c]})' for c in country_counts.head(3).index])}",
    f"5. Peak testing year: {yearly_counts.idxmax()} ({yearly_counts.max()} explosions)",
    f"6. Average explosion yield: {df['Average.Yield'].mean():.2f} kilotons",
    f"7. Largest explosion: {df['Average.Yield'].max():.2f} kt ({df.loc[df['Average.Yield'].idxmax(), 'Data.Name']})",
    f"8. Most common purpose: {df['Data.Purpose'].value_counts().index[0]}",
    f"9. Most common type: {df['Data.Type'].value_counts().index[0]}",
    f"10. Cold War era (1947-1991): {len(df[(df['Date.Year'] >= 1947) & (df['Date.Year'] <= 1991)])} tests"
]

for insight in insights:
    print(insight)

print("\n" + "=" * 80)
print("CONCLUSIONS")
print("=" * 80)
print("""
1. Nuclear testing peaked during the Cold War era, particularly in the 1960s
2. USA and USSR conducted the majority of tests (>85% combined)
3. Underground testing became the dominant method after the 1963 treaty
4. Testing dramatically decreased after the end of the Cold War in 1991
5. The largest explosions were conducted by the USSR (Tsar Bomba ~50,000 kt)
6. Most tests were for weapons development (Wr) rather than combat
7. Only 2 nuclear weapons were used in combat (Hiroshima and Nagasaki, 1945)
""")

## Summary

The analysis points to the following patterns in nuclear testing history:

- **Temporal patterns**: Testing peaked in the 1960s and declined sharply after 1991
- **Concentration by country**: The USA and USSR together conducted more than 85% of all tests
- **Yield trends**: Yields vary widely; the largest is the Soviet Tsar Bomba (~50,000 kt)
- **Purpose**: The large majority of tests were for weapons development (Wr), not combat
- **Impact of treaties**: Underground testing became the dominant method after the 1963 Partial Test Ban Treaty
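
Figures such as the two leading countries' combined share or the peak decade can be computed from the data rather than stated by hand. A minimal sketch on a made-up miniature of the dataset (the `toy` rows, including the country labels other than USA and USSR, are assumptions for illustration):

```python
import pandas as pd

# Hypothetical miniature of the dataset; the notebook itself would use df.
toy = pd.DataFrame({
    'Location.Country': ['USA', 'USA', 'USSR', 'USSR', 'FRANCE', 'USA', 'UK', 'USSR'],
    'Date.Year':        [1958, 1962, 1961, 1962, 1968, 1970, 1957, 1975],
})

# Combined share of the two most active countries.
shares = toy['Location.Country'].value_counts(normalize=True)
top_two_share = shares.head(2).sum()

# Decade with the most tests.
decade = (toy['Date.Year'] // 10) * 10
peak_decade = decade.value_counts().idxmax()

print(f"Top two countries: {top_two_share:.1%} of tests")
print(f"Peak decade: {peak_decade}s")
```

Running the same expressions on `df` would keep the written conclusions in step with the data if the dataset is ever updated.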

This dataset provides crucial insights into the nuclear age and the arms race during the Cold War period.