# Mapping in Python

In this notebook, we’ll walk through the process of **organizing and preparing a real-world dataset** for analysis and storytelling using Python and AI tools like ChatGPT.

Imagine you have collected data on **pirate attacks around the world**, and you now have a CSV file with hundreds or thousands of entries, each containing information such as the date, location, type of attack, and other details.

Our goal is to:  
1. **Create a clean folder structure** for storing your dataset and any outputs.  
2. **Understand the contents of your dataset** so you know what needs cleaning or organizing.  
3. **Prepare the dataset for further analysis and visualization**, step by step, using AI-assisted coding.

At the beginning, we’ll **set up our environment** in Jupyter Notebook and organize the files we need. You’ll also learn how to **ask ChatGPT** for guidance on tasks like creating folders, inspecting files, or standardizing column names so you can get the correct code without guessing or memorizing syntax.
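As an illustration of the folder setup, here is a minimal sketch using Python's standard `pathlib` module. The folder names `data` and `outputs`, the project name `pirate_project`, and the helper `make_project_folders` are assumptions made for this example, not names used elsewhere in the notebook. The demo runs in a temporary directory so it does not touch your real files.

```python
from pathlib import Path
import tempfile


def make_project_folders(base_dir):
    """Create data/ and outputs/ under base_dir (hypothetical layout)."""
    base = Path(base_dir)
    created = []
    for name in ("data", "outputs"):
        folder = base / name
        # parents=True creates base_dir too; exist_ok=True makes reruns safe
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Demo in a temporary directory so nothing is written to a real project
demo_root = tempfile.mkdtemp()
folders = make_project_folders(Path(demo_root) / "pirate_project")
print([f.name for f in folders])
```

In your own project you would pass a real path instead of the temporary one, then place the CSV file in the `data` folder.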

### Step 1: Loading CSV Files from a Folder

First, we **import the libraries** we need in Jupyter Notebook. Since we’ve already used **pandas**, we don’t need to install it again; we just import it and open our dataset.

In [1]:
import pandas as pd

If you've never done it before, you can **ask your LLM how to import a CSV file**. You should get something like this; then just adjust the file name and the variable name you want to use for the dataset:

In [2]:
pirates = pd.read_csv('pirate_attacks_data.csv')
pirates.head()  # shows the first 5 rows; pass a number in the parentheses to show that many rows

Unnamed: 0.1,Unnamed: 0,date,time,longitude,latitude,attack_type,location_description,country,eez_country,shore_distance,shore_longitude,shore_latitude,attack_description,vessel_name,vessel_type,vessel_status,data_source,region,country_name
0,0,1993-01-02,,116.9667,19.7,,Hong Kong - Luzon - Hainan,CHN,TWN,357.502373,115.825956,22.746644,,Mv Cosmic Leader,,,mappingpiracy,East Asia & Pacific,China
1,1,1993-01-04,,116.0,22.35,,Hong Kong - Luzon - Hainan,CHN,CHN,47.431573,115.825956,22.746644,,Mv Tricolor Star III,,,mappingpiracy,East Asia & Pacific,China
2,2,1993-01-06,,115.25,19.67,,Hong Kong - Luzon - Hainan,CHN,TWN,280.811871,114.302501,22.044867,,Mv Arktis Star,,,mappingpiracy,East Asia & Pacific,China
3,3,1993-01-08,,124.5833,29.9,,East China Sea,CHN,CHN,209.923396,122.409679,29.9112,,Ussurijsk,,,mappingpiracy,East Asia & Pacific,China
4,4,1993-01-12,,120.2667,18.133333,,Hong Kong - Luzon - Hainan,PHL,PHL,22.027332,120.470063,18.09101,,Mv Chennai Nermai,,,mappingpiracy,East Asia & Pacific,Philippines


In [3]:
pirates.head(3)  # for example, only the first 3 rows

Unnamed: 0.1,Unnamed: 0,date,time,longitude,latitude,attack_type,location_description,country,eez_country,shore_distance,shore_longitude,shore_latitude,attack_description,vessel_name,vessel_type,vessel_status,data_source,region,country_name
0,0,1993-01-02,,116.9667,19.7,,Hong Kong - Luzon - Hainan,CHN,TWN,357.502373,115.825956,22.746644,,Mv Cosmic Leader,,,mappingpiracy,East Asia & Pacific,China
1,1,1993-01-04,,116.0,22.35,,Hong Kong - Luzon - Hainan,CHN,CHN,47.431573,115.825956,22.746644,,Mv Tricolor Star III,,,mappingpiracy,East Asia & Pacific,China
2,2,1993-01-06,,115.25,19.67,,Hong Kong - Luzon - Hainan,CHN,TWN,280.811871,114.302501,22.044867,,Mv Arktis Star,,,mappingpiracy,East Asia & Pacific,China
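The first two columns in the output, `Unnamed: 0.1` and `Unnamed: 0`, look like row indexes that were written into the file when it was saved earlier (saving with `to_csv(index=False)` avoids this). That is an interpretation of the output, not something the dataset documents. A minimal sketch of dropping such columns, using a tiny stand-in DataFrame named `sample` instead of the real file:

```python
import pandas as pd

# Tiny stand-in for the real file, with the same leftover index columns
sample = pd.DataFrame({
    "Unnamed: 0.1": [0, 1],
    "Unnamed: 0": [0, 1],
    "date": ["1993-01-02", "1993-01-04"],
    "longitude": [116.9667, 116.0],
    "latitude": [19.7, 22.35],
})

# Collect every column whose name starts with "Unnamed" and drop it
unnamed = [col for col in sample.columns if col.startswith("Unnamed")]
cleaned = sample.drop(columns=unnamed)
print(list(cleaned.columns))
```

On the real data, the same two lines would be applied to `pirates` instead of `sample`.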


### Step 2: Analyze our dataset

To understand our dataset better, we first want to **see how many entries we have in each column**. This helps us check data quality, notice missing values, and get a sense of the dataset’s structure.

If you're not sure how to do this, you can ask your LLM something like:

*How can I check how many non-null entries I have in each column in my pandas DataFrame?*

We get something like this:

In [4]:
pirates.count()  # counts non-null entries in each column

Unnamed: 0              7511
date                    7511
time                    1149
longitude               7511
latitude                7511
attack_type             7391
location_description    7503
country                 7492
eez_country             7216
shore_distance          7511
shore_longitude         7511
shore_latitude          7511
attack_description      1173
vessel_name             6079
vessel_type             1173
vessel_status           6599
data_source             7511
region                  7475
country_name            7475
dtype: int64

We can see that the coordinate columns (`longitude` and `latitude`) both have 7,511 entries, the same as `date`, so no incident is missing a coordinate. This is important because we need both values to map each incident correctly.
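A complementary check is to count missing values directly and to confirm that the coordinates fall inside valid ranges (latitude between -90 and 90, longitude between -180 and 180). The sketch below uses a small made-up DataFrame called `sample`, with one missing and one out-of-range longitude, to show the idea; the values are invented for illustration.

```python
import pandas as pd

# Made-up rows: one missing longitude and one impossible longitude (200.0)
sample = pd.DataFrame({
    "longitude": [116.9667, 116.0, None, 200.0],
    "latitude": [19.7, 22.35, 18.1, 29.9],
})

# Number of missing values in each column
missing_per_column = sample.isna().sum()
print(missing_per_column)

# between() is False for missing values, so those rows are excluded too
valid = sample["latitude"].between(-90, 90) & sample["longitude"].between(-180, 180)
mappable = sample[valid]
print(len(mappable), "of", len(sample), "rows can be mapped")
```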

Next, let's check how many **types of attack** appear in the `attack_type` column. This helps us understand whether this field could be used as a useful category for grouping or analysis.

*How can I check how many unique attack types appear in the attack_type column of my pandas DataFrame called pirates?*

In [6]:
pirates['attack_type'].nunique()   # number of unique attack types
pirates['attack_type'].unique()    # list of all unique attack types

array([nan, 'Attempted', 'Hijacked', 'Boarded', 'Boarding', 'Fired Upon',
       'Explosion', 'Detained', 'Suspicious'], dtype=object)

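The column contains eight distinct attack types plus missing values (`nan`), which is few enough to use as a category on the map. We can try that.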
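The list includes both `'Boarded'` and `'Boarding'`, which may describe the same kind of incident. Whether to merge them is a judgment call that depends on how the source defines the labels, so treat the merge below as an assumption. This sketch uses a short made-up Series to show how `value_counts()` reveals the size of each category and how `replace()` could combine two labels:

```python
import pandas as pd

# Made-up values standing in for pirates['attack_type']
attack_type = pd.Series(
    ["Boarded", "Boarding", "Hijacked", None, "Boarded"],
    name="attack_type",
)

# dropna=False also counts the missing values
counts = attack_type.value_counts(dropna=False)
print(counts)

# Assumption: 'Boarding' means the same as 'Boarded'
merged = attack_type.replace({"Boarding": "Boarded"})
merged_counts = merged.value_counts(dropna=False)
print(merged_counts)
```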

Next, we want to check the **data type of each column** in our dataset. This helps us understand whether a column contains numbers, text, dates, or other types of data, which is important before creating a map or doing further analysis.

*How can I see the data type of each column in my pandas DataFrame called `pirates`?*

In [7]:
pirates.dtypes   # shows the data type of each column

Unnamed: 0                int64
date                     object
time                     object
longitude               float64
latitude                float64
attack_type              object
location_description     object
country                  object
eez_country              object
shore_distance          float64
shore_longitude         float64
shore_latitude          float64
attack_description       object
vessel_name              object
vessel_type              object
vessel_status            object
data_source              object
region                   object
country_name             object
dtype: object

For now, this overview is enough. Depending on which parts of the data we want to use and how we plan to visualize it, we might perform more detailed analyses later. For the moment, this gives us a solid understanding of the dataset and its structure.
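One detail worth noting in the output above: `date` is stored as `object`, meaning pandas treats it as text. If a later analysis needs to filter or group by time, the column can be converted with `pd.to_datetime`. A minimal sketch on a made-up three-row DataFrame (the invalid third value is included only to show what `errors="coerce"` does):

```python
import pandas as pd

# Made-up rows; the last value is deliberately not a valid date
sample = pd.DataFrame({"date": ["1993-01-02", "1993-01-04", "not a date"]})

# errors="coerce" turns unparseable values into NaT instead of raising an error
sample["date"] = pd.to_datetime(sample["date"], errors="coerce")

# With a real datetime column, parts such as the year are easy to extract
sample["year"] = sample["date"].dt.year
print(sample.dtypes)
```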

### Step 3: Create a map

Now, let's map! I tried saying something like this to an LLM:

*I have a Python DataFrame `pirates` with locations and details of pirate attacks. The columns include:* **(paste dtypes output here)** *I want code to create a map showing all attack locations with the following features:*

*- Use the `attack_type` column as a category and assign different icons for each type*  
*- If some coordinates are at the same location, offset them slightly so points do not overlap*  
*- There are many points, so when zoomed out, they should appear clustered, and when zoomed in, all individual points should be visible*  
*- Mobile-friendly and responsive display is crucial*  
*- Clicking on each point should show a popup with the attack date, attack details, location, nearest country (`country_name` column), and distance from the coast*  
*- Include a legend that explains which icon represents which attack type*  
*- Add a title: "Where Pirate Attacks Happened"*  
*- Add a subtitle: "From 1993 to 2020"*  
*- Add a footer note: "Source: Kaggle"*  
*- If I need to install or import libraries, tell me that as well*
*- I want to export my map as HTML so I can host it on my website*

This is more than enough for a start. Let's try it:

In [8]:
!pip install folium

Collecting folium
  Downloading folium-0.20.0-py2.py3-none-any.whl.metadata (4.2 kB)
Collecting branca>=0.6.0 (from folium)
  Downloading branca-0.8.2-py3-none-any.whl.metadata (1.7 kB)
Downloading folium-0.20.0-py2.py3-none-any.whl (113 kB)
Downloading branca-0.8.2-py3-none-any.whl (26 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.8.2 folium-0.20.0


In [18]:
# Import required libraries
import pandas as pd
import folium
from folium.plugins import MarkerCluster

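# Drop rows with missing coordinates; .copy() gives an independent DataFrame
# so the column assignment below does not act on a view of the original
pirates = pirates.dropna(subset=['latitude', 'longitude']).copy()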

# Replace missing attack_type with 'Unknown'
pirates['attack_type'] = pirates['attack_type'].fillna('Unknown')

# Create map centered on mean coordinates
m = folium.Map(
    location=[pirates['latitude'].mean(), pirates['longitude'].mean()],
    zoom_start=3,
    tiles='CartoDB positron'
)

# Define colors and icons dynamically
attack_types = pirates['attack_type'].unique()
colors = ['red', 'blue', 'green', 'orange', 'purple', 'darkred', 'darkblue', 'darkgreen', 'gray']
# Note: Bootstrap 3 glyphicons have no 'ship' icon, so 'glyphicon-tower' is used instead
icons = ['glyphicon-flag', 'glyphicon-briefcase', 'glyphicon-tower', 'glyphicon-asterisk',
         'glyphicon-warning-sign', 'glyphicon-fire', 'glyphicon-lock', 'glyphicon-question-sign', 'glyphicon-map-marker']

# Ensure enough colors/icons
while len(colors) < len(attack_types):
    colors.append('gray')
while len(icons) < len(attack_types):
    icons.append('glyphicon-map-marker')

icon_dict = {atype: {'icon': icon, 'color': color} for atype, icon, color in zip(attack_types, icons, colors)}

# MarkerCluster
marker_cluster = MarkerCluster().add_to(m)

# Add markers
for idx, row in pirates.iterrows():
    icon_info = icon_dict[row['attack_type']]
    country = row['country_name'] if pd.notna(row['country_name']) else "Unknown"
    shore_distance = row['shore_distance'] if pd.notna(row['shore_distance']) else "Unknown"

    popup_text = f"""
    <b>Date:</b> {row['date']}<br>
    <b>Attack Type:</b> {row['attack_type']}<br>
    <b>Nearest Country:</b> {country}<br>
    <b>Distance from Coast:</b> {shore_distance} km
    """
    
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        popup=popup_text,
        icon=folium.Icon(color=icon_info['color'], icon=icon_info['icon'], prefix='glyphicon')
    ).add_to(marker_cluster)

# Dynamic legend
legend_html = '''
<div style="
 position: fixed; 
 bottom: 50px; left: 50px; width: 180px; height: auto; 
 background-color: white; 
 border:2px solid grey; 
 z-index:9999; 
 font-size:14px;
 padding: 10px;
 box-shadow: 3px 3px 6px rgba(0,0,0,0.3);
">
<b>Attack Types</b><br>
'''
for atype, info in icon_dict.items():
    legend_html += f'<i class="glyphicon {info["icon"]}" style="color:{info["color"]}"></i> {atype}<br>'
legend_html += '</div>'
m.get_root().html.add_child(folium.Element(legend_html))

# Visible title, subtitle, footer
title_html = '''
<div style="
 position: fixed; top: 10px; left: 50%; transform: translateX(-50%);
 background-color: white; padding: 5px 15px; border:1px solid grey; z-index:9999;
 text-align: center; font-family: Arial, sans-serif;">
 <h3 style="margin:0; font-size:20px;"><b>Where Pirate Attacks Happened</b></h3>
 <h5 style="margin:0; font-size:16px;">From 1993 to 2020</h5>
 <p style="margin:0; font-size:12px;">Source: Kaggle</p>
</div>
'''
m.get_root().html.add_child(folium.Element(title_html))

# Export as HTML
m.save("pirate_attacks_map.html")
print("Map saved as pirate_attacks_map.html")


Map saved as pirate_attacks_map.html


**Woohoo! Of course, we can still fix and improve many things, but we now have a solid foundation. It didn’t take much time to map more than 7,500 attacks and make everything clear and interactive.**
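One example of something still to improve: the prompt asked for points at the same location to be offset slightly, but the generated code does not do this. Below is a minimal sketch of one possible approach, adding a small random offset ("jitter") to rows that share exact coordinates. The helper `jitter_duplicates`, its `scale` of 0.01 degrees (roughly one kilometre near the equator), and the fixed `seed` are all assumptions chosen for illustration, not part of the generated code.

```python
import numpy as np
import pandas as pd


def jitter_duplicates(df, lat="latitude", lon="longitude", scale=0.01, seed=0):
    """Return a copy in which rows sharing exact coordinates are nudged apart."""
    out = df.copy()
    # keep=False marks every row of a duplicated coordinate pair, not just the repeats
    dup = out.duplicated(subset=[lat, lon], keep=False)
    rng = np.random.default_rng(seed)
    n = int(dup.sum())
    # Add a random offset in [-scale, scale) degrees to the duplicated rows only
    out.loc[dup, lat] = out.loc[dup, lat] + rng.uniform(-scale, scale, n)
    out.loc[dup, lon] = out.loc[dup, lon] + rng.uniform(-scale, scale, n)
    return out


# Made-up rows: the first two share the same coordinates
sample = pd.DataFrame({
    "latitude": [22.35, 22.35, 29.9],
    "longitude": [116.0, 116.0, 124.5833],
})
spread = jitter_duplicates(sample)
print(spread)
```

To use it with the map, you would call `jitter_duplicates(pirates)` before the loop that adds the markers and iterate over the result. Because the offsets change the plotted positions slightly, the original coordinates are left untouched in `pirates`.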