# Analysis of Shark Attacks on the Oregon Coast

## By Warren Berg

The Oregon Coast is home to sharks, which pose a risk to surfers who like to explore and enjoy the many great breaks from Brookings to Seaside. 

As an experienced surfer, I tend to think of a shark attack as an unlikely threat that would never actually happen to *me*. However, this is mostly willful ignorance, and seriously considering the prospect of being attacked by a shark is enough to send a shudder down any surfer's spine. 

Therefore, I created this project to understand the inherent risk of being attacked by a shark on the Oregon Coast. I used historical shark attack data from the Global Shark Attack File (http://www.sharkattackfile.net/). 

Hopefully, this analysis can help Oregon surfers (and myself) better understand the risks before paddling out.

In [None]:
### Great Progress!

# What to work on next:
# - Create Visualizations with Data
#    - Show geographically on map based on frequency of attacks
#    - Look at most common types of injuries
#    - visualize calendar and show months where attacks are most likely to occur
#        - Any correlation between what time of year attacks occur and which location?
#    - visualize times of day on number line
# - Clearly comment and describe what I am doing
# - Make visually elegant, reduce amount of code

In [181]:
# First, import the necessary packages

import pandas as pd
import numpy as np

In [182]:
cd C:\Users\wberg\OneDrive\Documents\Data Projects

C:\Users\wberg\OneDrive\Documents\Data Projects


In [183]:
# import West Coast Shark Attack Data
# Sourced From: Global Shark Attack File at http://www.sharkattackfile.net/

shark_data = pd.read_excel("Shark Attack Data West Coast.xlsx")

In [184]:
# Filter data to include only attacks from Oregon

oregon_attacks = shark_data[shark_data["Area"] == "Oregon"].copy()  # copy so later edits never touch shark_data

In [185]:
oregon_attacks

Unnamed: 0,Case Number,Date,Year,Type,Country,Area,Location,Activity,Name,Sex,Age,Injury,Fatal (Y/N),Time,Species,Investigator or Source
9,2019.03.05.b,05-Mar-2019,2019,Unprovoked,USA,Oregon,"Cape Kiwanda, Tillamook County",Surfing,Nathan Holstedt,M,,"No injury, board bitten and dented",N,08h30,,"M. Michaelson, GSAF"
27,2016.10.10,10-Oct-2016,2016,Unprovoked,USA,Oregon,"Indian Beach, Ecola State Park, Clatsop County",Surfing,Joseph Tanner,M,29.0,Wounds to upper thigh and lower leg,N,16h00,,"UP Beacon, 10/12/2016"
57,2013.11.22,22-Nov-2013,2013,Unprovoked,USA,Oregon,"Gleneden Beach, Lincoln County",Surfing,Andrew Gardiner,M,25.0,"No injury, board bitten",N,10h30,"White shark, 10 '",R. Collier
71,2012.01.13,13-Jan-2012,2012,Unprovoked,USA,Oregon,"Lincoln City, Lincoln County",Surfing,Steve Harnack,M,53.0,"No injury, surfboard damaged",N,,White shark,R. Collier
72,2011.12.06,06-Dec-2011,2011,Unprovoked,USA,Oregon,Seaside Cove,Surfing,female,F,,Minor injury to calf,N,09h00,,"Seaside Signal, 12/6/2011"
75,2011.10.20,20-Oct-2011,2011,Unprovoked,USA,Oregon,"Newport, Lincoln County",Surfing,Bobby Gumm,M,41.0,"No injury, shark bit surfboard",N,,"White shark, 15'",R. Collier
76,2011.10.10,10-Oct-2011,2011,Unprovoked,USA,Oregon,Seaside,Surfing,Doug Niblack,M,,No injury,N,,10' to 12' shark,"R.Collier; KATU News, 10/11/2011"
80,2010.10.28,28-Oct-2010,2010,Unprovoked,USA,Oregon,Florence,Surfing,Seth Mead,M,,No injury to surfer,N,15h20,,R. Collier
82,2010.09.27,27-Sep-2010,2010,Unprovoked,USA,Oregon,Winchester Bay,Surfing,David Lowden,M,29.0,"No injury, surfboard rammed",N,16h00,White shark,R. Collier
113,2006.10.31,31-Oct-2006,2006,Unprovoked,USA,Oregon,"Siletz River mouth, Lincoln County",Surfing,Tony Perez,M,22.0,"No injury, surfboard bitten",N,Just before sundown,"White shark, 16'",R. Collier


In [186]:
# Cleaning up the data

# Standardize location names so each break is always spelled the same way
# (keys are the row index labels carried over from the spreadsheet)
location_fixes = {
    9: 'Cape Kiwanda', 27: 'Indian Beach', 57: 'Gleneden Beach',
    71: 'Lincoln City', 75: 'Newport', 76: 'Seaside Cove',
    113: 'Lincoln City', 114: 'Florence', 119: 'Tillamook Head',
    129: 'Gold Beach', 161: 'Oswald State Park', 169: 'Bastendorf Beach',
    174: 'Gold Beach', 176: 'Winchester Bay', 180: 'Neskowin',
    193: 'Indian Beach', 207: 'Cape Kiwanda', 212: 'Cape Kiwanda',
    220: 'Winchester Bay', 223: 'Haystack Rock',
}
for row, name in location_fixes.items():
    oregon_attacks.loc[row, 'Location'] = name

# Fix Times and Dates
oregon_attacks.loc[[113],['Time']] = '17h00'
oregon_attacks.loc[[114],['Time']] = '17h00'
oregon_attacks.loc[[242], ['Date']]= 'NA-Sep-1974'

# Fill Age
oregon_attacks.loc[[80],['Age']] = 32


# Drop the record at index 235, which has missing values
oregon_attacks = oregon_attacks.drop(235)
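The location fixes above are keyed by row index, so each new record needs another manual line. One way to reduce that (in the spirit of the "reduce amount of code" to-do) is a rule plus a short list of exceptions. The sketch below assumes that most raw `Location` strings put the break name before the first comma, as in the sample output; the four strings are copied from that output, and the two overrides reuse choices made in the cleaning cell.

```python
import pandas as pd

# Raw Location strings copied from the sample output (index = spreadsheet row label)
raw = pd.Series({
    9: "Cape Kiwanda, Tillamook County",
    27: "Indian Beach, Ecola State Park, Clatsop County",
    76: "Seaside",
    113: "Siletz River mouth, Lincoln County",
})

# Rule (an assumption): keep the text before the first comma, dropping county/park qualifiers
short = raw.str.split(",").str[0].str.strip()

# Manual overrides for rows the rule cannot handle, aligned on the index
overrides = pd.Series({76: "Seaside Cove", 113: "Lincoln City"})
short.update(overrides)

print(short)
```

The rule handles the common "Name, County" pattern; anything it gets wrong still goes in `overrides`, which stays much shorter than one line per row.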

In [200]:
# Region
print('Analysis of Shark Attacks by Region')
print('')
print('Which regions along the coast have a higher concentration of shark attacks?')
print('_______________________________________________________________________________________________________________________________')
print('')

# Shark Attack Count by Location
by_location = oregon_attacks['Location'].value_counts()
print('This shows how many shark attacks have been reported at each location')
by_location

Analysis of Shark Attacks by Region

Which regions along the coast have a higher concentration of shark attacks?
_______________________________________________________________________________________________________________________________

This shows how many shark attacks have been reported at each location


Winchester Bay       5
Cape Kiwanda         4
Seaside Cove         2
Lincoln City         2
Florence             2
Oswald State Park    2
Gleneden Beach       2
Gold Beach           2
Indian Beach         2
Neskowin             1
Newport              1
Bastendorf Beach     1
Haystack Rock        1
Myers Creek          1
Tillamook Head       1
Name: Location, dtype: int64
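Raw counts answer "how many", but "higher concentration" is easier to compare as a share of all attacks. A minimal sketch using `value_counts(normalize=True)`; the six labels below are a hypothetical sample, not the GSAF data.

```python
import pandas as pd

# Hypothetical sample of cleaned Location labels (illustration only)
locations = pd.Series(
    ["Winchester Bay", "Cape Kiwanda", "Winchester Bay",
     "Newport", "Cape Kiwanda", "Winchester Bay"],
    name="Location",
)

counts = locations.value_counts()                # attacks per location
shares = locations.value_counts(normalize=True)  # fraction of all attacks per location

# Align both Series on the location index in one table
summary = pd.DataFrame({"attacks": counts, "share": shares.round(3)})
print(summary)
```

On the real data the same two lines would run on `oregon_attacks['Location']`.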

In [206]:
# Activity
print('What People were doing when attacked')
print('_______________________________________________________________________________________________________________________________')
print('')


print('This shows the activity of the victim when they were attacked')
print('')
print(oregon_attacks['Activity'].value_counts())
print('')
print('The data shows that every person attacked by a shark in Oregon was surfing at the time (one record is listed as "Boogie boarding or Surfing").')


What People were doing when attacked
_______________________________________________________________________________________________________________________________

This shows the activity of the victim when they were attacked

Surfing                               24
Surfing (sitting on his board)         3
Surfing (lying prone on his board)     1
Boogie boarding or Surfing             1
Name: Activity, dtype: int64

The data shows that every person attacked by a shark in Oregon was surfing at the time (one record is listed as "Boogie boarding or Surfing").

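The "everyone was surfing" conclusion comes from reading four label variants by eye. A check with `str.contains` makes it reproducible if new records with other wordings appear. The labels below mirror the variants in the output; treating any mention of "surf" as surfing is an assumption.

```python
import pandas as pd

# Activity labels mirroring the variants in the output above
activity = pd.Series([
    "Surfing",
    "Surfing (sitting on his board)",
    "Surfing (lying prone on his board)",
    "Boogie boarding or Surfing",
    "Surfing",
])

# True where the label mentions surfing anywhere (case-insensitive); missing labels count as False
mentions_surfing = activity.str.contains("surf", case=False, na=False)

# Collapse the variants into one coarse category
coarse = mentions_surfing.map({True: "Surfing", False: "Other"})

print(coarse.value_counts())
print("All records mention surfing:", bool(mentions_surfing.all()))
```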

In [207]:
# Demographics of people attacked
print('Demographics of the People Attacked by Sharks')
print('_______________________________________________________________________________________________________________________________')
print('')

# Age Demographics 
by_age = oregon_attacks['Age']    # subset column of Age values
by_age = by_age.dropna()          # delete missing age values
by_age = by_age.astype(int)       # convert data type to int

#Take average and print
avg_age = np.average(by_age)
print(f"The average age of people who were attacked is {avg_age:.2f}.")
print('_______________________________________________________________________________________________________________________________')
print('')

# Gender Demographics
oregon_attacks.rename(columns = {'Sex ': 'Sex'}, inplace = True)
print(oregon_attacks['Sex'].value_counts())
print("")
print('There have been 28 males and 1 female involved in attacks')
print('_______________________________________________________________________________________________________________________________')

# Any people attacked more than once?
oregon_attacks['Name'].value_counts()
print("Amazingly, one surfer has been attacked twice on the Oregon Coast. "
      "His name is Seth Mead, and he was attacked in 2004 and 2010. The details are provided below.")
oregon_attacks[oregon_attacks['Name'] == 'Seth Mead']

Demographics of the People Attacked by Sharks
_______________________________________________________________________________________________________________________________

The average age of people who were attacked is 29.84.
_______________________________________________________________________________________________________________________________

M    28
F     1
Name: Sex, dtype: int64

There have been 28 males and 1 female involved in attacks
_______________________________________________________________________________________________________________________________
Amazingly, one surfer has been attacked twice on the Oregon Coast. His name is Seth Mead, and he was attacked in 2004 and 2010. The details are provided below.


Unnamed: 0,Case Number,Date,Year,Type,Country,Area,Location,Activity,Name,Sex,Age,Injury,Fatal (Y/N),Time,Species,Investigator or Source
80,2010.10.28,28-Oct-2010,2010,Unprovoked,USA,Oregon,Florence,Surfing,Seth Mead,M,32,No injury to surfer,N,15h20,,R. Collier
129,2004.09.20,20-Sep-2004,2004,Unprovoked,USA,Oregon,Gold Beach,Surfing,Seth Mead,M,26,Leg bitten,N,09h00,White shark,"S. Mead, R. Collier, J. Eager, B. Middleton"
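The cell above hardcodes the repeat victim's name. A sketch that finds every repeated name automatically with `duplicated(keep=False)`, and averages ages with `pd.to_numeric(errors="coerce")` so stray text in the Age column becomes NaN instead of breaking `astype(int)`. The names and ages here are placeholders, not real records.

```python
import pandas as pd

# Hypothetical rows (placeholder names and ages, not GSAF records)
df = pd.DataFrame({
    "Name": ["Surfer A", "Surfer B", "Surfer A", "Surfer C"],
    "Age": [32, None, "26", 41],
})

# Coerce ages to numbers; missing or unparseable entries become NaN and mean() skips them
ages = pd.to_numeric(df["Age"], errors="coerce")
print(f"Average age: {ages.mean():.2f} (n = {ages.notna().sum()})")

# Every row whose Name appears more than once
repeat_victims = df[df["Name"].duplicated(keep=False)]
print(repeat_victims)
```

One caveat: the Name column also holds placeholders such as "female" for anonymous victims, so any repeated placeholder would be flagged too and should be checked by hand.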


In [211]:
# Injuries
print('Breakdown of Injuries Sustained During Shark Attacks')
print('_______________________________________________________________________________________________________________________________')
print('')

# Fatalities
print('Fatal Attack? (Y/N)')
print('')
fatalities = oregon_attacks['Fatal (Y/N)'].value_counts()
print(fatalities)
print('')
print('Luckily, none of the reported shark attacks on the Oregon Coast have been fatal since records in this dataset began in 1974.')
print('_______________________________________________________________________________________________________________________________')
print('')

# Types of Injuries Sustained
print('The types of injuries sustained in shark attacks include')
by_injury = oregon_attacks['Injury']
inj_split = by_injury.str.split(',', expand = True)

# Stack the first and second comma-separated fragments into one Series
# (pd.concat replaces Series.append, which was removed in pandas 2.0)
total_injury_list = pd.concat([inj_split[0], inj_split[1]], ignore_index=True)
total_injury_list = total_injury_list.dropna()
total_injury_list = total_injury_list.astype(str)
total_injury_list.value_counts()

Breakdown of Injuries Sustained During Shark Attacks
_______________________________________________________________________________________________________________________________

Fatal Attack? (Y/N)

N    29
Name: Fatal (Y/N), dtype: int64

Luckily, none of the reported shark attacks on the Oregon Coast have been fatal since records in this dataset began in 1974.
_______________________________________________________________________________________________________________________________

The types of injuries sustained in shark attacks include


No injury                               13
 board bitten                            4
 surfboard bitten                        2
Calf lacerated & board bitten            1
Laceration & puncture wounds to foot     1
Multiple major Injuries                  1
 Ankle lacerated                         1
 board bitten and dented                 1
 board broken                            1
Wounds to upper thigh and lower leg      1
 surfboard damaged                       1
Lacerations to ankle & calf              1
 surfboard rammed                        1
No injury to surfer                      1
Minor bruises                            1
 shark bit surfboard                     1
Leg bitten & femur fractured             1
Thigh lacerated                          1
Minor injury to calf                     1
Right thigh bitten                       1
Left shoulder & side bitten              1
Minor injury                             1
 shark bit board                         1
Abrasion on
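Two things in this count come from the splitting step: only the first two comma-separated fragments are kept, and the leading space left after each comma makes " board bitten" a separate label from one without the space. A minimal sketch that keeps every fragment with `explode()` and trims whitespace with `str.strip()`; the injury strings are copied in the GSAF style from the output above.

```python
import pandas as pd

# Injury strings in the comma-separated GSAF style (sample, not the full column)
injury = pd.Series([
    "No injury, board bitten and dented",
    "No injury, board bitten",
    "Wounds to upper thigh and lower leg",
    "No injury, surfboard bitten",
    None,
])

# Split on commas, give each fragment its own row, and trim surrounding whitespace
parts = (
    injury.dropna()
    .str.split(",")
    .explode()
    .str.strip()
)

print(parts.value_counts())
```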

In [191]:
# Might want to consider editing these values to be more concise

total_injury_list

0                                No injury
1      Wounds to upper thigh and lower leg
2                                No injury
3                                No injury
4                    Minor injury to calf 
5                                No injury
6                                No injury
7                      No injury to surfer
8                                No injury
9                                No injury
10    Laceration & puncture wounds to foot
11                            Minor injury
12             Lacerations to ankle & calf
13                              Leg bitten
14                         Ankle lacerated
15                               No injury
16                      Right thigh bitten
17                               No injury
18                               No injury
19                           Minor bruises
20             Left shoulder & side bitten
21           Calf lacerated & board bitten
22            Leg bitten & femur fractured
23         
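To make these values more concise, one option is to map each description to a coarse category with keyword rules. The rules and category names below are assumptions chosen for illustration; they are checked in order, so "femur fractured" wins over "bitten" in the same string.

```python
import pandas as pd

# Hypothetical keyword rules (keyword, category), checked top to bottom
RULES = [
    ("fractur", "Major injury"),
    ("multiple major", "Major injury"),
    ("no injury", "No injury"),
    ("minor", "Minor injury"),
    ("bruise", "Minor injury"),
    ("abrasion", "Minor injury"),
    ("lacerat", "Laceration/bite"),
    ("bitten", "Laceration/bite"),
    ("wound", "Laceration/bite"),
]

def categorize(text: str) -> str:
    """Return the first category whose keyword appears in the text (case-insensitive)."""
    lowered = text.lower()
    for keyword, category in RULES:
        if keyword in lowered:
            return category
    return "Other"

# Descriptions taken from the list above
injuries = pd.Series([
    "No injury",
    "Minor injury to calf",
    "Leg bitten & femur fractured",
    "Lacerations to ankle & calf",
    "Minor bruises",
])
categories = injuries.map(categorize)
print(categories.value_counts())
```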

In [192]:
oregon_attacks.head()

Unnamed: 0,Case Number,Date,Year,Type,Country,Area,Location,Activity,Name,Sex,Age,Injury,Fatal (Y/N),Time,Species,Investigator or Source
9,2019.03.05.b,05-Mar-2019,2019,Unprovoked,USA,Oregon,Cape Kiwanda,Surfing,Nathan Holstedt,M,,"No injury, board bitten and dented",N,08h30,,"M. Michaelson, GSAF"
27,2016.10.10,10-Oct-2016,2016,Unprovoked,USA,Oregon,Indian Beach,Surfing,Joseph Tanner,M,29.0,Wounds to upper thigh and lower leg,N,16h00,,"UP Beacon, 10/12/2016"
57,2013.11.22,22-Nov-2013,2013,Unprovoked,USA,Oregon,Gleneden Beach,Surfing,Andrew Gardiner,M,25.0,"No injury, board bitten",N,10h30,"White shark, 10 '",R. Collier
71,2012.01.13,13-Jan-2012,2012,Unprovoked,USA,Oregon,Lincoln City,Surfing,Steve Harnack,M,53.0,"No injury, surfboard damaged",N,,White shark,R. Collier
72,2011.12.06,06-Dec-2011,2011,Unprovoked,USA,Oregon,Seaside Cove,Surfing,female,F,,Minor injury to calf,N,09h00,,"Seaside Signal, 12/6/2011"


In [214]:
# Timing of Attacks 
print('Breakdown of the Timing of Attacks')
print('')
print('Are there times throughout the year, and times in the day where it is more likely to get attacked?')
print('_______________________________________________________________________________________________________________________________')
print('')

# Date
print('What are the most common months to get attacked?')
print('')
date_of_attack = oregon_attacks['Date']
day_month_year = date_of_attack.str.split('-', expand = True)
day_month_year = day_month_year.rename(columns = {0:'Day', 1:"Month", 2:"Year"})

print(day_month_year['Month'].value_counts())
print('_______________________________________________________________________________________________________________________________')
print('')


# Time of Day
print('What time of day do most attacks occur?')
print('')
time_of_attack = oregon_attacks['Time']

# Delete Missing / Incomplete values
time_of_attack = time_of_attack.dropna()
time_of_attack = time_of_attack.drop(137)
time_of_attack = time_of_attack.drop(242)

# Reformat
time_reformat = time_of_attack.str.split('h', expand = True)
time_of_attack = time_reformat[0] + time_reformat[1]
print(time_of_attack)



print('Any correlation between what time of year attacks occur and which location?')


Breakdown of the Timing of Attacks

Are there times throughout the year, and times in the day where it is more likely to get attacked?
_______________________________________________________________________________________________________________________________

What are the most common months to get attacked?

Sep    7
Oct    7
Aug    3
Nov    3
Mar    2
Jan    2
Dec    2
Apr    1
Feb    1
Jul    1
Name: Month, dtype: int64
_______________________________________________________________________________________________________________________________

What time of day do most attacks occur?

9       0830
27      1600
57      1030
72      0900
80      1520
82      1600
113     1700
114     1700
115     1600
119     1200
129     0900
149    0930 
161     1630
169     1630
174     1700
176     0945
180     0930
193     1730
207     1530
212     1000
220     1545
223     1020
231     1400
dtype: object
Any correlation between what time of year attacks occur and which location?
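Splitting strings on "-" and "h" works, but it leaves months sorted by frequency and times as text. A sketch that parses dates with `pd.to_datetime` (so months can be listed January to December) and extracts the hour with a regex (so times can be bucketed). The sample strings follow the formats in the output; the morning/afternoon/evening boundaries are assumptions.

```python
import pandas as pd

# Sample Date and Time strings in the GSAF formats seen above ("DD-Mon-YYYY", "HHhMM")
dates = pd.Series(["05-Mar-2019", "10-Oct-2016", "22-Nov-2013", "27-Sep-2010", "20-Sep-2004"])
times = pd.Series(["08h30", "16h00", "10h30", "Just before sundown", "09h00"])

# Parse dates; strings that do not match the format (e.g. "NA-Sep-1974") would become NaT
parsed = pd.to_datetime(dates, format="%d-%b-%Y", errors="coerce")

# Attacks per calendar month, listed January (1) to December (12)
per_month = parsed.dt.month.value_counts().reindex(range(1, 13), fill_value=0)

# Pull the hour out of "HHhMM"; free-text times become NaN
hour = pd.to_numeric(times.str.extract(r"^(\d{1,2})h(\d{2})")[0], errors="coerce")

# Assumed buckets: [0, 12) morning, [12, 17) afternoon, [17, 24) evening
part_of_day = pd.cut(hour, bins=[0, 12, 17, 24], right=False,
                     labels=["Morning", "Afternoon", "Evening"])

print(per_month)
print(part_of_day.value_counts())
```

Note that the month-only record ("NA-Sep-1974") would be lost by `to_datetime`; it could be kept by taking the month from the middle field instead.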


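For the open question of whether time of year relates to location, a cross-tabulation is a simple first look. The records below are hypothetical, and the month-to-season mapping is an assumption (meteorological seasons).

```python
import pandas as pd

# Hypothetical cleaned records: location label and attack month (1-12)
df = pd.DataFrame({
    "Location": ["Winchester Bay", "Cape Kiwanda", "Winchester Bay", "Gold Beach", "Cape Kiwanda"],
    "Month": [9, 3, 10, 9, 10],
})

# Assumed season boundaries (meteorological seasons)
SEASON = {12: "Winter", 1: "Winter", 2: "Winter",
          3: "Spring", 4: "Spring", 5: "Spring",
          6: "Summer", 7: "Summer", 8: "Summer",
          9: "Fall", 10: "Fall", 11: "Fall"}
df["Season"] = df["Month"].map(SEASON)

# Rows = locations, columns = seasons, cells = number of attacks
table = pd.crosstab(df["Location"], df["Season"])
print(table)
```

With only about 29 records, most cells will hold 0 or 1, so any pattern is suggestive at best; a chi-square test is unreliable with expected counts that small.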
In [194]:
# Information on Sharks

oregon_attacks.rename(columns = {'Species ' : 'Species'}, inplace = True)
oregon_attacks['Species']

9                                                    NaN
27                                                   NaN
57                                     White shark, 10 '
71                                           White shark
72                                                   NaN
75                                      White shark, 15'
76                                      10' to 12' shark
80                                                   NaN
82                                           White shark
113                                    White shark, 16' 
114                                                  NaN
115                                         White shark?
119                                          White shark
129                                          White shark
137                                     2.4 m [8'] shark
147             White shark, 5 m  to 6 m [16.5' to 20'] 
149                              5 m [16.5'] white shark
161                            
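The Species column mixes species names, lengths in feet and metres, and question marks. A sketch that flags white sharks and pulls out the first length given in feet with a regex; the strings are copied from the output above, and the rule "a number followed by an apostrophe means feet" is an assumption.

```python
import pandas as pd

# Species strings copied from the output above (a sample, not the full column)
species = pd.Series([
    "White shark, 10 '",
    "White shark",
    None,
    "White shark, 15'",
    "10' to 12' shark",
    "White shark?",
    "2.4 m [8'] shark",
])

# Flag records that name a white shark (the "?" entry is flagged too, though unconfirmed)
is_white = species.str.contains("white shark", case=False, na=False)

# First length in feet: a number, optional spaces, then an apostrophe
length_ft = pd.to_numeric(
    species.str.extract(r"(\d+(?:\.\d+)?)\s*'")[0], errors="coerce"
)

print(pd.DataFrame({"species": species, "white_shark": is_white, "length_ft": length_ft}))
```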

In [None]:
# We want to visualize where the attacks have taken place on a map of Oregon

In [None]:
# Next, we want to perform some statistical analysis to determine how likely it is that you will 
# get attacked by a shark at a given location.

# What is the probability of getting attacked?

# Using Bayesian statistics, which probability distribution best models the attack counts?

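One common Bayesian starting point for counts of rare events is a Poisson model for attacks per year with a Gamma prior on the rate, which gives a Gamma posterior in closed form. This is a swapped-in technique, not a result from this notebook. The counts `k` and `n_years` and the Gamma(1, 1) prior below are placeholders; they would be replaced with counts from `oregon_attacks` (coastwide or per location). The model gives a rate of reported attacks, not a per-surfer risk, which would also need data on how many people surf.

```python
import math

def posterior_params(k: int, n_years: float, a0: float, b0: float) -> tuple[float, float]:
    """Gamma(shape=a0, rate=b0) prior + k Poisson events over n_years -> Gamma posterior."""
    return a0 + k, b0 + n_years

def prob_at_least_one(a: float, b: float, years: float = 1.0) -> float:
    """Posterior predictive P(at least one attack in `years`) when the rate ~ Gamma(a, b)."""
    # P(no attack) = E[exp(-rate * years)] = (b / (b + years)) ** a
    return 1.0 - (b / (b + years)) ** a

# Placeholder inputs (assumptions): k attacks observed over n_years, weak Gamma(1, 1) prior
k, n_years = 10, 20
a, b = posterior_params(k, n_years, a0=1.0, b0=1.0)

print(f"Posterior mean rate: {a / b:.3f} attacks/year")
print(f"P(at least one attack next year): {prob_at_least_one(a, b):.3f}")
print(f"Log-odds of at least one attack: {math.log(prob_at_least_one(a, b) / (1 - prob_at_least_one(a, b))):.3f}")
```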

In [None]:
# What is the most common injury?
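One way to answer this is by body part rather than by full description. A sketch that counts how many records mention each body part, using word-boundary regexes so "leg" does not match inside longer words. The keyword list is an assumption, and the injury strings are taken from the output above.

```python
import pandas as pd

# Assumed body-part keywords to look for in the Injury text
BODY_PARTS = ["leg", "calf", "thigh", "ankle", "foot", "shoulder"]

injury = pd.Series([
    "Wounds to upper thigh and lower leg",
    "Minor injury to calf",
    "Lacerations to ankle & calf",
    "Right thigh bitten",
    "No injury, board bitten",
])

# One boolean column per body part: does the text mention it as a whole word?
mentions = pd.DataFrame({
    part: injury.str.contains(rf"\b{part}\b", case=False, regex=True)
    for part in BODY_PARTS
})

# Records mentioning each body part, most common first
print(mentions.sum().sort_values(ascending=False))
```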