# Shark Attacks - Business Insights & Conclusions

**Project:** Shark Attacks Data Analysis  
**Author:** Data Science Bootcamp - Ironhack  
**Date:** January 2026

## Objective
This notebook synthesizes findings from EDA and hypothesis testing to provide:
- Actionable business insights
- Risk assessment framework
- Strategic recommendations
- Final conclusions

## Setup

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Import custom modules
import sys
sys.path.append('..')
from src import (
    analyze_species,
    analyze_age_distribution,
    analyze_fatality_rates,
    calculate_surf_risk_score,
    get_summary_statistics,
    plot_species,
    plot_age_distribution,
    plot_fatality_analysis,
    plot_risk_score,
    set_plot_style
)

# Set visualization style
set_plot_style()

In [None]:
# Load cleaned data
df = pd.read_csv('../data/shark_attacks_cleaned.csv')
df.shape

## 1. Additional Analysis

### Shark Species Insights

In [None]:
# Analyze species
top_species = analyze_species(df, top_n=10)
top_species

In [None]:
# Visualize
plot_species(top_species, save_path='species.png')
plt.show()

**Species Insights:**
- **White Shark** (Great White) is the most commonly identified species in attacks
- **Tiger Shark** and **Bull Shark** are also frequently involved
- These three species are known for:
  - Large size and power
  - Coastal habitat overlap with humans
  - Curious/investigative behavior
- Many incidents have unconfirmed species (underwater identification is difficult)

### Age Demographics

In [None]:
# Analyze age distribution
age_results = analyze_age_distribution(df)
age_results['stats']

In [None]:
# Visualize
plot_age_distribution(age_results['age_data'], save_path='age_distribution.png')
plt.show()

**Age Profile:**
- **Median age:** 24 years (prime water activity participation age)
- **Peak risk:** 15-30 year age group
- **Mean age:** 28 years
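- Distribution is right-skewed: the mean (28) sits above the median (24), so most victims are young with a long tail of older ages
- The 15-30 group is the natural target demographic for safety education and equipment marketing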
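
The skew claim can be checked directly: a right-skewed sample has a mean above its median and a positive sample skewness. A minimal sketch, using made-up ages rather than the GSAF records (the `ages` series is purely illustrative):

```python
import pandas as pd

# Hypothetical ages for illustration only (not taken from the dataset)
ages = pd.Series([14, 16, 18, 19, 21, 22, 24, 25, 27, 30, 35, 42, 51, 63])

mean_age = ages.mean()
median_age = ages.median()
skewness = ages.skew()  # bias-corrected sample skewness

# Right skew: mean above median and positive skewness
print(f"mean={mean_age:.1f}, median={median_age:.1f}, skew={skewness:.2f}")
```

The same three calls can be applied to the cleaned age column to confirm the shape seen in the histogram.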

### Fatality Analysis

In [None]:
# Analyze fatality rates
fatality_results = analyze_fatality_rates(df, top_n_countries=5)

overall_rate = fatality_results['overall_fatality_rate']
print(f"Overall fatality rate: {overall_rate:.1f}%")

In [None]:
# Fatality by country
fatality_results['fatality_by_country']

In [None]:
# Visualize
plot_fatality_analysis(
    fatality_results['fatal_counts'],
    fatality_results['fatality_by_country'],
    overall_rate=overall_rate,
    save_path='fatality.png'
)
plt.show()

**Fatality Insights:**
- **Overall fatality rate: ~23%** - most attacks are survivable with proper medical response
- **Geographic variation:** Australia and South Africa have higher fatality rates
- **USA has lower fatality rate** (~8%) - likely due to:
  - Better emergency response infrastructure
  - Proximity to medical facilities
  - Effective first-aid practices
- **Implication:** Medical response time is likely a critical factor in outcomes
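
The internals of `analyze_fatality_rates` are not shown in this notebook. As a rough sketch of how a per-country fatality rate of this kind can be computed with pandas, assuming hypothetical `Country` and boolean `Fatal` columns and toy rows:

```python
import pandas as pd

# Toy records; column names and values are assumptions for illustration
toy = pd.DataFrame({
    "Country": ["USA", "USA", "USA", "USA",
                "AUSTRALIA", "AUSTRALIA",
                "SOUTH AFRICA", "SOUTH AFRICA"],
    "Fatal": [False, False, False, True,
              True, False,
              True, False],
})

# The mean of a boolean column is the share of True values
toy_rates = (
    toy.groupby("Country")["Fatal"]
       .agg(attacks="count", fatality_rate="mean")
       .assign(fatality_rate=lambda d: d["fatality_rate"] * 100)
       .sort_values("fatality_rate", ascending=False)
)
toy_overall = toy["Fatal"].mean() * 100

print(toy_rates)
print(f"Overall: {toy_overall:.1f}%")
```

Keeping the attack count next to the rate matters: a rate based on a handful of incidents is much less reliable than one based on hundreds.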

## 2. Business Case: Surf Shop Location Risk Assessment

**Scenario:** A surf equipment company wants to identify the safest coastal markets for expansion.

**Approach:** Calculate a risk score combining attack frequency and fatality rate for surfing-specific incidents.

In [None]:
# Calculate risk scores for surf locations
risk_scores = calculate_surf_risk_score(df, min_attacks=10)
risk_scores

In [None]:
# Top 10 safest countries (lowest scores; assumes risk_scores is sorted ascending)
safest = risk_scores.head(10)
safest

In [None]:
# Top 10 riskiest countries (highest scores; assumes risk_scores is sorted ascending)
riskiest = risk_scores.tail(10)
riskiest

In [None]:
# Visualize risk scores
plot_risk_score(risk_scores, n_safest=7, n_riskiest=8, save_path='risk_score.png')
plt.show()

### Risk Score Interpretation:

**Risk Score Formula:**
```
Risk Score = (Normalized Attack Frequency × 50) + Fatality Rate
```

- **Lower score = Safer location**
- **Score < 30:** Low risk (green)
- **Score 30-50:** Medium risk (orange)
- **Score > 50:** High risk (red)
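
The formula does not state how attack frequency is normalized or which scale the fatality rate uses. The sketch below assumes division by the largest country count (giving a value between 0 and 1) and a fatality rate in percent; the column names and numbers are made up, and this is not necessarily what `calculate_surf_risk_score` does.

```python
import pandas as pd

# Toy per-country surfing counts; all numbers are made up for illustration
surf = pd.DataFrame({
    "country": ["A", "B", "C"],
    "attacks": [200, 50, 20],
    "fatal": [10, 10, 2],
})

# Assumption: normalize by the largest count so the value lies in [0, 1]
surf["norm_freq"] = surf["attacks"] / surf["attacks"].max()
# Assumption: fatality rate expressed in percent
surf["fatality_rate"] = surf["fatal"] / surf["attacks"] * 100
surf["risk_score"] = surf["norm_freq"] * 50 + surf["fatality_rate"]

# Bands from the list above; a score of exactly 30 is treated as medium
# and exactly 50 as high (the source leaves the boundaries open)
surf["risk_band"] = pd.cut(
    surf["risk_score"],
    bins=[-float("inf"), 30, 50, float("inf")],
    labels=["low", "medium", "high"],
    right=False,
)

# Ascending sort puts the lowest-risk rows first
surf = surf.sort_values("risk_score").reset_index(drop=True)
print(surf[["country", "risk_score", "risk_band"]])
```

Under these assumptions the frequency term contributes at most 50 points while the fatality term can contribute up to 100, so a small market with a high fatality rate can outrank a large market with many non-fatal incidents.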

### Business Recommendations:

**✅ Recommended Expansion Markets (Low Risk):**
- New Zealand
- Brazil
- South Africa (moderate volume and manageable risk on the surfing-specific score, although its overall fatality rate is comparatively high)

**⚠️ Caution Markets (Medium Risk):**
- Australia (high volume, moderate fatality)

**❌ High Risk Markets:**
- USA (very high volume despite lower fatality rate)
- Reunion Island (very high fatality rate)

**Note:** USA remains commercially viable due to market size, but requires comprehensive safety product positioning.

## 3. Executive Summary

In [None]:
# Get overall statistics
summary = get_summary_statistics(df)
summary

### Dataset Overview:

- **Total Attacks Analyzed:** 7,058 incidents
- **Time Span:** 1500s - 2025 (focus on 1900-2025)
- **Geographic Coverage:** 210+ countries
- **Activity Types:** 1,500+ different activities recorded

### Key Findings Summary:

#### 1. Geographic Patterns (H1 ✅)
- **66.4%** of attacks occur in just 3 countries: USA, Australia, South Africa
- Extreme geographic concentration enables targeted interventions
- Coastal infrastructure and reporting quality vary by region
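
A concentration figure like the top-3 share can be reproduced with `value_counts`. A minimal sketch on a toy country column (the values below are illustrative and do not reflect the real distribution):

```python
import pandas as pd

# Toy country column; values are illustrative, not the real distribution
countries = pd.Series(
    ["USA"] * 5 + ["AUSTRALIA"] * 3 + ["SOUTH AFRICA"] * 2 + ["BRAZIL", "FIJI"]
)

# value_counts returns counts sorted from most to least frequent
counts = countries.value_counts()
top3_share = counts.head(3).sum() / counts.sum() * 100

print(f"Top 3: {list(counts.index[:3])}, share = {top3_share:.1f}%")
```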

#### 2. Activity-Based Risk (H2 ✅)
- **36.5%** of attacks involve surfing or swimming
- Recreational activities dominate (vs. commercial fishing)
- Clear opportunity for activity-specific safety products

#### 3. Demographic Profile (H3 ✅)
- **87.5%** of victims are male (7:1 ratio)
- **Median age: 24 years** (peak: 15-30 age group)
- Target demographic: Young adult males in water sports

#### 4. Temporal Trends (H4 ✅)
- **257% increase** in recorded attacks from the early period (1900s-1940s) to recent decades (1970s-2020s)
- Growth driven by: population, tourism, reporting improvements
- Market for safety products is expanding
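
For reference, a percentage increase is the change relative to the starting value, so a 257% increase means the later figure is about 3.57 times the earlier one. The sketch below shows only the arithmetic; the counts are hypothetical, and whether the notebook compares period totals or per-decade averages is not stated here.

```python
def percent_increase(early: float, recent: float) -> float:
    """Relative change from early to recent, in percent."""
    if early == 0:
        raise ValueError("early value must be non-zero")
    return (recent - early) / early * 100


# Hypothetical counts chosen only to show the arithmetic
early_count, recent_count = 1000, 3570
change = percent_increase(early_count, recent_count)
ratio = recent_count / early_count

print(f"{change:.0f}% increase, i.e. {ratio:.2f}x the early level")
```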

#### 5. Severity & Outcomes
- **Overall fatality rate: 23.2%**
- **Most attacks are survivable** with proper medical response
- Fatality varies by geography (Australia/SA higher, USA lower)

#### 6. Species Involved
- **Top 3 species:** White Shark, Tiger Shark, Bull Shark
- Large coastal species with investigative behavior
- Many incidents have unconfirmed species

## 4. Strategic Recommendations

### For Safety & Education:

1. **Geographic Targeting**
   - Focus resources on USA, Australia, South Africa coastal regions
   - Develop region-specific safety protocols
   - Partner with local authorities in high-risk areas

2. **Activity-Specific Campaigns**
   - Create targeted education for surfers and swimmers
   - Develop activity-specific safety equipment
   - Promote buddy system and group activities

3. **Demographic-Focused Messaging**
   - Target young adult males (15-30 years)
   - Use channels popular with water sports enthusiasts
   - Emphasize risk awareness without fear-mongering

### For Business/Insurance:

1. **Risk-Based Pricing**
   - Use geographic and activity data for insurance premiums
   - Offer discounts for safety equipment usage
   - Consider age and gender in risk assessment

2. **Product Development**
   - Shark deterrent technology for surfers
   - Emergency response equipment (tourniquets, communication devices)
   - Monitoring/alert systems for high-risk areas

3. **Market Expansion**
   - Prioritize low-risk coastal markets for surf shops
   - USA remains viable despite high volume (low fatality, large market)
   - Avoid very high-risk markets unless pursuing premium positioning

### For Research & Policy:

1. **Improve Data Collection**
   - Standardize reporting across countries
   - Better species identification protocols
   - Track environmental factors (water temp, season, etc.)

2. **Medical Response**
   - Emergency response time is critical to survival
   - Invest in coastal medical infrastructure
   - Train lifeguards in shark attack first aid

3. **Prevention Research**
   - Study shark behavior and deterrent effectiveness
   - Investigate temporal patterns (time of day, season)
   - Monitor impact of shark populations on attack rates

## 5. Limitations & Future Work

### Data Limitations:
- **Reporting bias:** Developed countries have better data infrastructure
- **Survivorship bias:** Non-fatal attacks more likely to be documented
- **Historical data:** Early records less reliable and complete
- **Species identification:** Difficult in attack situations, many unconfirmed
- **Activity classification:** Inconsistent categorization across records

### Future Analysis Opportunities:
1. **Time series forecasting** of attack trends by region
2. **Environmental factor analysis** (water temperature, shark populations, seasonality)
3. **Machine learning models** for risk prediction
4. **Cost-benefit analysis** of various prevention strategies
5. **Comparative analysis** with other recreational risks
6. **Social media sentiment analysis** around shark attacks and safety

## 6. Final Conclusions

This comprehensive analysis of 7,000+ shark attacks spanning over a century reveals clear, actionable patterns:

### ✅ All Hypotheses Validated:
Our data-driven approach confirmed all four initial hypotheses, demonstrating that shark attacks exhibit predictable patterns based on:
- Geography (concentrated in 3 countries)
- Activity type (surfing/swimming highest risk)
- Demographics (young adult males)
- Time period (increasing over decades)

### 🎯 Practical Implications:
1. **Risk is concentrated and predictable** - enabling targeted interventions
2. **Most attacks are survivable** - medical response is critical
3. **Market exists for safety products** - clear target demographic and use cases
4. **Geographic variation matters** - one-size-fits-all approaches won't work

### 💡 Key Insight:
While shark attacks are increasing in absolute numbers, this primarily reflects:
- Growing coastal populations and tourism
- Increased participation in water sports
- Better reporting and data collection

The **actual risk per ocean exposure** may be stable or even declining. Shark attacks remain extremely rare events relative to the millions of people who safely enjoy ocean activities annually.
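
The distinction between counts and per-exposure risk can be made concrete by dividing attacks by an exposure measure. The dataset used here contains no exposure data, so the sketch below uses entirely hypothetical visit numbers to show how counts can rise while the rate falls:

```python
import pandas as pd

# Entirely hypothetical numbers: attack counts and ocean visits per decade
exposure = pd.DataFrame({
    "decade": [1980, 2000, 2020],
    "attacks": [300, 600, 900],
    "ocean_visits_millions": [1000, 2500, 4500],
})

# Rate per million visits: the count divided by the exposure
exposure["attacks_per_million_visits"] = (
    exposure["attacks"] / exposure["ocean_visits_millions"]
)

print(exposure)
```

Testing this on real data would require an external exposure source such as beach attendance or coastal tourism statistics, joined to the attack counts by region and period.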

### 🚀 Recommended Action:
Focus on **evidence-based prevention** rather than fear-based messaging. The data shows clear patterns that can inform:
- Where to deploy resources
- Who to target with education
- What activities need safety enhancements
- How to price risk appropriately

**The goal is informed coexistence with sharks, not avoidance of the ocean.**

---

## Project Information

**Data Source:** Global Shark Attack File (GSAF)  
**Analysis Period:** January 2026  
**Tools Used:** Python, Pandas, Matplotlib, Seaborn  
**Notebooks:**
1. `01_eda.ipynb` - Exploratory Data Analysis
2. `02_hypothesis_testing.ipynb` - Statistical Hypothesis Testing
3. `03_conclusions.ipynb` - Business Insights & Recommendations (this notebook)

**GitHub Repository:** [Link to your repo]  
**Contact:** [Your email/LinkedIn]