# 1. Introduction: Can You Hear the Internet?

Have you ever wondered what your internet usage sounds like? We're used to seeing data in charts and graphs, but what if we could represent it with music? This was the question that sparked the Network Communications Project.

In this article, I'll walk you through how I took a raw dataset of my university's weekly network traffic and transformed it into a musical piece. The goal was to "sonify" the download and upload statistics, creating a unique auditory representation of data.

We'll use Python's Pandas library for the data manipulation and a fascinating online tool called [TwoTone MIDI Out Beta](https://twotone-midiout-beta.netlify.app/) to generate the final musical output. Let's get started!

# 2. The Raw Material: Sourcing the Network Data

Every data story starts with the data. For this project, the source was the Janet Netsight Portal, which tracks network traffic for UK educational institutions.

**The Goal:** To get a historical view of the "In" (download) and "Out" (upload) traffic for my University.  
**The Challenge:** The portal doesn't allow for a simple "download all" button. Through experimentation, I found that requesting a date range of approximately 549 days (e.g., from July 2023 to January 2025) provided the daily data I needed.  
**A Quick Note on Perspective:** The data is labeled from the provider's (Janet's) perspective. This means "In" is data coming into their network from the university (our upload), and "Out" is data going out to the university (our download). We'll need to remember to swap these later!

# 3. The Setup: Loading and Cleaning the Data  

With several CSV files downloaded, the first step is to load them into a single, clean DataFrame using Pandas. We'll use Python's glob library to find all the CSV files in our input directory.

## Code Block 1: Imports and File Loading  
First, let's import our libraries and set up the path to our data files.

In [1]:
# Import necessary libraries
import numpy as np
import pandas as pd
import glob

# Define the path to the input folder and get a list of all CSV files
path = rf'./input/*.csv'
files = glob.glob(path)

Now, we'll loop through each file, read it into a Pandas DataFrame, and combine them. A small trick here is to use a header_flag to ensure we only include the header row from the very first file, creating a clean, unified dataset.

In [2]:
# List to hold each DataFrame
df_list = []
header_flag = False

for file in files:
    df_temp = pd.read_csv(file)

    if not header_flag:
        header_flag = True
    else:
        df_temp = df_temp.iloc[1:]

    # --- THIS IS THE ADDED LINE ---
    # Filter out rows where the second column (original 'Traffic Out') is zero
    if df_temp.shape[1] > 2: # Ensure the column exists before filtering
        df_temp = df_temp[df_temp.iloc[:, 2] != 0]

    df_list.append(df_temp)

# Concatenate all DataFrames into a single one
df = pd.concat(df_list, ignore_index=True)

# Sort by time and rename columns to reflect our perspective
df = df.sort_values('Time', ascending=True).copy()
df.columns = ['Time', 'in', 'out']

print("Data loaded and filtered.")

Data loaded and filtered.


# 4: Feature Engineering - Building a Richer Musical Palette

Raw data is just the beginning. To transform our data into a piece with rhythm, texture, and distinct movements, we need to create more features. Think of these new data columns as potential instruments or triggers in our final musical score.

The goal is to create flags that mark specific points in time, such as the start of a week, a month, or a quarter. We can also add context, like whether a given day is a workday or a weekend.

Here are the features we'll add to our daily data:

* Week_Start: A flag for Monday to mark the start of a new week.
* Month_Start: A flag for the first day of the month.
* Year_Start: A flag for the first day of the year.
* Week_Number: The week number of the year (1-52/53).

Let's generate these using the  datetime functionality built into Pandas.

## Code Block 2: Creating Time-Based Features

First, we ensure the 'Time' column is in the correct datetime format. Then, we use the .dt accessor to pull out all the information we need. We use np.where to create our flags: if a condition is true (e.g., the day is a Monday), we assign it a value of 256; otherwise, it's 0. This high value creates a clear signal for our sonification tool.

# Convert the 'Time' column to a proper datetime format
df['Time'] = pd.to_datetime(df['Time'], format='mixed')

# Extract time-based features that we can use for musical cues
df['Week_Start'] = np.where(df['Time'].dt.day_name() == 'Monday', 256, 0)
df['Month_Start'] = np.where(df['Time'].dt.is_month_start, 256, 0)
df['Year_Start'] = np.where(df['Time'].dt.is_year_start, 256, 0)

# We'll also extract the week number and year for grouping
df['Week_Number'] = df['Time'].dt.isocalendar().week
df['Year'] = df['Time'].dt.year

In [3]:
# Convert the 'Time' column to a proper datetime format
df['Time'] = pd.to_datetime(df['Time'], format='mixed')

# Create flag columns. A high value (256) is used to create a clear signal for our sonification tool.
df['Week_Start'] = np.where(df['Time'].dt.day_name() == 'Monday', 256, 0)
df['Month_Start'] = np.where(df['Time'].dt.is_month_start, 256, 0)
df['Year_Start'] = np.where(df['Time'].dt.is_year_start, 256, 0)
#df['Year_Quarter_Week'] = np.where(df['Time'].dt.isocalendar().week.isin([13, 26, 39, 52]), 256, 0) # <-- ADD THIS LINE BACK IN

# We'll also extract the week number and year for grouping
df['Week_Number'] = df['Time'].dt.isocalendar().week
df['Year'] = df['Time'].dt.year

# 5. From Daily Noise to a Weekly Melody
Daily data can be noisy. To create a smoother, more melodic output, I decided to aggregate the data by week, taking the average (mean) traffic for each week.

## Code Block 3: Grouping by Week

In [4]:
# Group the daily data by year and week number to create our weekly dataset
weekly_df = df.groupby(['Year', 'Week_Number']).agg(
    in_mean=('in', 'mean'),
    out_mean=('out', 'mean'),
    Year_Start=('Year_Start', 'max'),
    Month_Start=('Month_Start', 'max'),
    #Year_Quarter_Week=('Year_Quarter_Wee
).reset_index()

# 6. The Core Challenge: Proportional Sonification

A key challenge is that download traffic is much larger than upload traffic. To prevent both from being mapped to the same highest musical note, we must scale them proportionally. This is done by calculating a scaling factor based on the maximum values in the original daily data, then applying it to the quantized upload data.

## Code Block 4: Scaling the data

In [5]:
# Calculate the scaling factor based on the max values in the original daily 'df'
in_max = df['in'].max()
out_max = df['out'].max()
scaling_factor = out_max / in_max

# Create the quantized 'in' column (download) with 48 bins for musical notes
#weekly_df['in_quant'] = pd.cut(
#    weekly_df['in_mean'], bins=48, labels=range(1, 49)
#).astype(int)

# Create the scaled, quantized 'out' column (upload)
out_quant_scaled = (pd.cut(
    weekly_df['out_mean'], bins=48, labels=range(1, 49)
).astype(float) * scaling_factor)

# Round the result and ensure the minimum note value is 1 (0 would be silence)
#weekly_df['out_quant'] = out_quant_scaled.round().astype(int).clip(lower=1)

# 7. Adding Dynamics: Isolating Weekly Trends

To give the final music a sense of movement, we need to isolate the weeks where traffic was increasing from the weeks where it was decreasing. The script achieves this by determining the trend from the **raw weekly average traffic** and using that to create two full sets of trend columns: one containing the raw values and one containing the final quantized "musical notes."

## Code Block 5: Capturing Trends

In [16]:
# 1. First, define the trend direction based on the raw weekly mean values
is_in_up = weekly_df['in_mean'] > weekly_df['in_mean'].shift()
is_in_down = weekly_df['in_mean'] < weekly_df['in_mean'].shift()
is_out_up = weekly_df['out_mean'] > weekly_df['out_mean'].shift()
is_out_down = weekly_df['out_mean'] < weekly_df['out_mean'].shift()

# 2. Create trend columns containing the RAW mean values
weekly_df['in_up'] = weekly_df['in_mean'].where(is_in_up, 0)
weekly_df['in_down'] = weekly_df['in_mean'].where(is_in_down, 0)
weekly_df['out_up'] = weekly_df['out_mean'].where(is_out_up, 0)
weekly_df['out_down'] = weekly_df['out_mean'].where(is_out_down, 0)

display(weekly_df.head().style.hide(axis="index"))

Year,Week_Number,in_mean,out_mean,Year_Start,Month_Start,in_up,in_down,out_up,out_down
2019,1,55040087.0,544081067.0,256,256,0.0,0.0,0.0,0.0
2019,2,128355950.857143,1548699540.571429,0,0,128355950.857143,0.0,1548699540.571429,0.0
2019,3,134956550.857143,1562482549.714286,0,0,134956550.857143,0.0,1562482549.714286,0.0
2019,4,154850965.714286,1763358201.142857,0,0,154850965.714286,0.0,1763358201.142857,0.0
2019,5,170757284.571429,1828013320.0,0,256,170757284.571429,0.0,1828013320.0,0.0


# 8. Final Polish and Saving

Finally, we perform some housekeeping. The mean columns are renamed (e.g., `in_mean` becomes `in`) and the columns are selected and reordered to exactly match the desired final format before saving the result to a CSV file in the `/output` folder.

In [15]:
print("Finalizing the DataFrame...")

# Rename the '_mean' columns for a cleaner final output
final_df = weekly_df.rename(columns={
    'in_mean': 'in',
    'out_mean': 'out'
})

# Define exactly which columns to keep in the final CSV file.
# This list also sets the order of the columns.
columns_to_keep = [
    'in', 'out',
    'Month_Start', 'Year_Start',
    'in_up', 'in_down',
    'out_up', 'out_down'
]

# Create the final DataFrame with only the selected columns in the correct order
final_df = final_df[columns_to_keep]

# Define the output path and filename
output_filename = "./output/Processed_traffic.csv"

# Save the final DataFrame to a CSV file.
# We use index=False to prevent writing the row numbers (0, 1, 2...) to the file.
final_df.to_csv(output_filename, index=False)

# Print a confirmation message that uses the correct filename
print(f"Processing complete. File saved to: {output_filename}")

# Display the first few rows of the final data to confirm it's correct
display(final_df.head().style.hide(axis="index"))

Finalizing the DataFrame...
Processing complete. File saved to: ./output/Processed_traffic2.csv


in,out,Month_Start,Year_Start,in_up,in_down,out_up,out_down
55040087.0,544081067.0,256,256,0.0,0.0,0.0,0.0
128355950.857143,1548699540.571429,0,0,128355950.857143,0.0,1548699540.571429,0.0
134956550.857143,1562482549.714286,0,0,134956550.857143,0.0,1562482549.714286,0.0
154850965.714286,1763358201.142857,0,0,154850965.714286,0.0,1763358201.142857,0.0
170757284.571429,1828013320.0,256,0,170757284.571429,0.0,1828013320.0,0.0


# 8. The Final Composition: Making Music with TwoTone  

With our data prepared, the final step is to bring it to life. I used the [TwoTone MIDI Out Beta](https://twotone-midiout-beta.netlify.app/) . You simply upload the final CSV, map your data columns to musical parameters, and press play.  
Here are the settings I found worked best to create a clear and pleasant composition from the weekly data:

**Download Data (in_quant)**

* Instrument: Double Bass
* Range: 2 Octaves
* Speed: 4x
* Style: Arpeggio (4 notes, ascending)

**Upload Data (out_quant)**

* Instrument: Glockenspiel
* Range: 2 Octaves
* Speed: 2x
* Style: Ascending

The low, heavy notes of the double bass represent the high volume of download traffic, while the lighter, twinkling glockenspiel represents the lower volume of upload traffic. The time markers for month and year starts were used as cues to add speech annotations (e.g., "2024," "February") using an external audio editor for the final video.

# 9. Conclusion
This project was a fascinating journey into a new, for me, form of data representation. It demonstrates that with a bit of creativity and some data manipulation skills, we can find stories in data that go beyond traditional charts. Sonification offers a uniquely emotional and intuitive way to experience data patterns over time.

The final CSV file is ready for sonification. I encourage you to download the notebook from my GitHub [link to your GitHub repo here], experiment with the data, and create your own data-driven music with the [TwoTone MIDI Out Beta](https://twotone-midiout-beta.netlify.app/)!