# Project 1: Data Analysis and Visualization with Real-Life Marine Litter Dataset

Now, let's combine everything we have learned so far to work on a real-life dataset.

This dataset provides valuable information about marine litter found at the seafloor of the southeastern North Sea. It was collected during scientific research aimed at understanding the distribution and impact of marine litter in this area. Here's a breakdown of the dataset description and its columns:

- **Citation:** The dataset is attributed to Lars Gutow and published by the Alfred Wegener Institute. It is part of a larger study that investigates marine litter in the North Sea. The dataset can be accessed through its DOI link (https://doi.org/10.1594/PANGAEA.890785).

- **Location:** The dataset is specific to the North Sea.

- **Campaign:** The data was collected during the research campaign "HE419".

- **Method/Device:** A beam trawl (BEAM) was used to collect marine litter data from the seafloor.

## Dataset Columns:
- **Station:** This identifies the specific sampling station where the data was collected.
- **Date/Time:** The exact date and time of the sampling at the station.
- **Latitude:** The geographic latitude of the sampling point.
- **Longitude:** The geographic longitude of the sampling point.
- **Elevation [m]:** The depth at which the sampling occurred, measured in meters.
- **Litter obj:** This column likely records the specific litter object(s) found during sampling.
- **Litter cat:** This column categorizes the types of litter objects identified.
- **Litter fish:** This column may provide additional information regarding litter impacts on fish species or related observations.

This dataset is crucial for understanding marine pollution in the North Sea. It documents the types and quantities of litter found at various depths and locations, which can help researchers assess the impact of marine litter on marine ecosystems and contribute to environmental management efforts. The detailed metadata allows for effective spatial and temporal analyses, enabling scientists to track changes over time and identify potential sources of pollution.

Let’s start by **importing the libraries** we’ll need for our analysis. We’ll use Pandas for data manipulation, NumPy for numerical operations, and Matplotlib and Seaborn for our visualizations. Setting these up first will help us later.

In [1]:
#Your code goes here
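
One possible way to fill in this cell is sketched below. The aliases (`np`, `pd`, `plt`, `sns`) are the common community conventions rather than a requirement, and the `sns.set_theme` call is an optional extra that is not mentioned in the text.

```python
# Data manipulation and numerical helpers
import numpy as np
import pandas as pd

# Plotting libraries
import matplotlib.pyplot as plt
import seaborn as sns

# Optional: apply one consistent plot style for the rest of the notebook
sns.set_theme(style="whitegrid")
```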

Now that we have our libraries ready, let’s **load our dataset** from the CSV file. We’ll display the first few rows so we can get an idea of what the data looks like.

In [2]:
#Your code goes here
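
A minimal sketch of the loading step follows. Because the file name is not given here, the example reads a small in-memory stand-in whose rows are made up; only the column names come from the dataset description above. With the real download, the file path would be passed to `pd.read_csv` instead, and the `parse_dates` argument is an optional assumption that turns `Date/Time` into proper timestamps.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV so the sketch runs on its own.
# The station names, dates, and values below are invented for illustration.
sample_csv = io.StringIO(
    "Station,Date/Time,Latitude,Longitude,Elevation [m],Litter obj,Litter cat,Litter fish\n"
    "ST-01,2014-03-26T08:15,54.10,7.90,-38.0,Plastic bag,plastics,no\n"
    "ST-02,2014-03-26T11:40,54.25,7.75,-40.5,Net fragment,plastics,yes\n"
    "ST-03,2014-03-27T09:05,54.40,7.60,-36.0,Glass bottle,glass,no\n"
)

# With the actual file, replace sample_csv by its path (the name is an assumption),
# e.g. pd.read_csv("marine_litter.csv", parse_dates=["Date/Time"])
litter_data = pd.read_csv(sample_csv, parse_dates=["Date/Time"])

# Show the first rows to get a feel for the data
print(litter_data.head())
```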

Next, we’ll check for any **missing values** in our dataset:

In [3]:
#Your code goes here
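
As a sketch, missing values can be counted per column with `isnull().sum()`. The small DataFrame below is invented so the example runs on its own; in the notebook, `litter_data` would be the DataFrame loaded from the CSV.

```python
import numpy as np
import pandas as pd

# Made-up rows with a few gaps, for illustration only
litter_data = pd.DataFrame({
    "Station": ["ST-01", "ST-02", "ST-03"],
    "Litter obj": ["Plastic bag", None, "Net fragment"],
    "Litter fish": ["no", "yes", np.nan],
})

# isnull() marks each missing cell as True; sum() counts them per column
missing_counts = litter_data.isnull().sum()
print(missing_counts)
```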

We can see how many missing values are in each column. Now we can **handle the missing values** by using the **fillna()** function and then check the missing values in our DataFrame again:

In [4]:
#Your code goes here
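
One way to apply `fillna()` is sketched below. The fill strategy is an assumption, not something prescribed by the dataset: the numeric column is filled with its median and the text column with the placeholder `"unknown"`. The sample rows are invented.

```python
import numpy as np
import pandas as pd

# Made-up rows with gaps, for illustration only
litter_data = pd.DataFrame({
    "Elevation [m]": [-38.0, np.nan, -41.0],
    "Litter obj": ["Plastic bag", None, "Net fragment"],
})

filled = litter_data.copy()

# Numeric column: fill with the median (an assumed choice; the mean would also work)
filled["Elevation [m]"] = filled["Elevation [m]"].fillna(filled["Elevation [m]"].median())

# Text column: fill with an explicit placeholder label (also an assumption)
filled["Litter obj"] = filled["Litter obj"].fillna("unknown")

# Check again: every column should now report zero missing values
print(filled.isnull().sum())
```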

Let’s also **check for duplicate entries** in our dataset. If we find any, we’ll remove them to ensure our analysis is based on unique observations.

In [5]:
#Your code goes here
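
A sketch of the duplicate check, using invented rows: `duplicated()` flags rows that repeat an earlier row exactly, and `drop_duplicates()` removes them. Note that in a litter survey two identical rows could also be two genuine items of the same kind from one station, so it may be worth inspecting the flagged rows before dropping them.

```python
import pandas as pd

# Made-up rows; the second "ST-02" row is an exact repeat
litter_data = pd.DataFrame({
    "Station": ["ST-01", "ST-02", "ST-02", "ST-03"],
    "Litter obj": ["Plastic bag", "Rope", "Rope", "Glass bottle"],
})

# Count rows that exactly repeat an earlier row
n_duplicates = litter_data.duplicated().sum()
print(f"Duplicate rows: {n_duplicates}")

# Keep the first occurrence of each row and renumber the index
litter_data = litter_data.drop_duplicates().reset_index(drop=True)
print(f"Rows after removal: {len(litter_data)}")
```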

Now, let’s **handle the outliers** in our dataframe with the *Interquartile Range (IQR)* method:

In [6]:
#Your code goes here
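
The IQR rule can be sketched as follows, assuming it is applied to the numeric `Elevation [m]` column and that "handling" means removing rows outside the range from Q1 − 1.5·IQR to Q3 + 1.5·IQR. Both choices are assumptions; capping values with `clip()` would be an alternative. The depths below are invented.

```python
import pandas as pd

# Made-up depths; -120.0 is an obvious outlier
litter_data = pd.DataFrame({
    "Elevation [m]": [-38.0, -40.0, -39.0, -41.0, -37.0, -120.0],
})

col = "Elevation [m]"

# First and third quartiles and the interquartile range
q1 = litter_data[col].quantile(0.25)
q3 = litter_data[col].quantile(0.75)
iqr = q3 - q1

# Conventional fences: 1.5 * IQR below Q1 and above Q3
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Keep only the rows that fall inside the fences
cleaned = litter_data[litter_data[col].between(lower, upper)]
print(f"Removed {len(litter_data) - len(cleaned)} outlier row(s)")
```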

Let’s **calculate how many times fish were present alongside litter in our DataFrame**.

We’ll use a condition to check if Litter fish equals 'yes', and then count those occurrences. This will give us a clearer picture of the relationship between litter and fish populations.

In [7]:
#Your code goes here

The code checks how many times 'yes' appears in the 'Litter fish' column of litter_data.

The result shows how many entries are 'yes' and how many are not.
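
A minimal sketch of this count, with invented rows: comparing the column to `'yes'` yields a boolean Series, and `value_counts()` then reports how many entries are `True` (equal to 'yes') and how many are `False`.

```python
import pandas as pd

# Made-up rows for illustration only
litter_data = pd.DataFrame({
    "Litter obj": ["Plastic bag", "Net fragment", "Rope", "Glass bottle", "Can"],
    "Litter fish": ["no", "yes", "yes", "no", "no"],
})

# True where the flag equals 'yes', False otherwise
fish_present = litter_data["Litter fish"] == "yes"

# Tally of True and False entries
fish_counts = fish_present.value_counts()
print(fish_counts)

# The number of 'yes' rows alone
print("Rows marked 'yes':", fish_present.sum())
```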

Now, let's **filter** the dataset to look specifically at **plastic litter** since it's a significant issue in marine environments. We’ll create a new DataFrame that *contains only plastic litter* and calculate the percentage of plastic litter in the dataset.

In [8]:
#Your code goes here
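
The filter and percentage might look like the sketch below. It assumes plastic items are labeled `'plastics'` in the `Litter cat` column; the exact label should be checked against the real data, and the rows here are invented.

```python
import pandas as pd

# Made-up rows for illustration only
litter_data = pd.DataFrame({
    "Litter obj": ["Plastic bag", "Net fragment", "Glass bottle", "Plastic sheet"],
    "Litter cat": ["plastics", "plastics", "glass", "plastics"],
})

# Boolean mask selects the plastic rows (the label 'plastics' is an assumption)
plastic_litter = litter_data[litter_data["Litter cat"] == "plastics"]

# Share of plastic rows among all rows, as a percentage
plastic_pct = len(plastic_litter) / len(litter_data) * 100
print(f"Plastic litter: {plastic_pct:.1f}% of all records")
```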

Next, let’s **slice our DataFrame to focus on the columns that matter for our analysis**. We’ll keep the Station, Date/Time, Litter Object, Litter Category, and Litter Fish columns.

In [9]:
#Your code goes here
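
Selecting a subset of columns can be sketched by passing a list of column names in double brackets. The result name `litter_subset` is a hypothetical choice, and the single sample row is invented.

```python
import pandas as pd

# One made-up row carrying all the columns described earlier
litter_data = pd.DataFrame({
    "Station": ["ST-01"],
    "Date/Time": pd.to_datetime(["2014-03-26 08:15"]),
    "Latitude": [54.10],
    "Longitude": [7.90],
    "Elevation [m]": [-38.0],
    "Litter obj": ["Plastic bag"],
    "Litter cat": ["plastics"],
    "Litter fish": ["no"],
})

# Columns to keep for the analysis
columns_of_interest = ["Station", "Date/Time", "Litter obj", "Litter cat", "Litter fish"]

# Double brackets return a DataFrame restricted to those columns
litter_subset = litter_data[columns_of_interest]
print(litter_subset.head())
```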

Let’s **visualize the relationship between different types of litter and the fish populations** associated with them. We’ll create a bar plot to see how many records in each litter category have Litter fish marked 'yes' or 'no'.

In [10]:
#Your code goes here
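
One way to draw such a bar plot is Seaborn's `countplot` with `hue`, sketched below on invented rows. The `pd.crosstab` call is an optional addition that prints the same counts as a table; the axis labels and title are illustrative choices.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up rows for illustration only
litter_data = pd.DataFrame({
    "Litter cat": ["plastics", "plastics", "plastics", "metal", "glass", "metal"],
    "Litter fish": ["yes", "no", "yes", "no", "no", "yes"],
})

# Tabular view of the counts that the plot shows
fish_by_category = pd.crosstab(litter_data["Litter cat"], litter_data["Litter fish"])
print(fish_by_category)

# One group of bars per litter category, split by the Litter fish flag
fig, ax = plt.subplots(figsize=(8, 5))
sns.countplot(data=litter_data, x="Litter cat", hue="Litter fish", ax=ax)
ax.set_xlabel("Litter category")
ax.set_ylabel("Number of records")
ax.set_title("Litter fish flag by litter category")
plt.tight_layout()
```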

Now, let's **analyze how the amount of litter collected changes over time** by visualizing litter counts against the sampling date.

In [11]:
#Your code goes here

This line plot displays the number of litter objects collected over time. By visualizing litter collection on a timeline, we can identify trends or spikes in litter accumulation, which may correspond to specific events or seasonal patterns, providing insights for further research or action.
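
A sketch of such a line plot follows, assuming one row per litter object and counts aggregated per calendar day (the aggregation level is an assumption). The timestamps are invented, and `Date/Time` is assumed to be a datetime column already.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up timestamps for illustration only
litter_data = pd.DataFrame({
    "Date/Time": pd.to_datetime([
        "2014-03-26 08:15", "2014-03-26 11:40", "2014-03-27 09:05",
        "2014-03-28 07:50", "2014-03-28 10:20", "2014-03-28 14:30",
    ]),
    "Litter obj": ["Plastic bag", "Rope", "Can", "Net fragment", "Plastic bag", "Bottle"],
})

# Round each timestamp down to its day, then count rows per day
daily_counts = litter_data.groupby(litter_data["Date/Time"].dt.floor("D")).size()

# Line plot of the number of litter objects per sampling day
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(daily_counts.index, daily_counts.values, marker="o")
ax.set_xlabel("Sampling date")
ax.set_ylabel("Number of litter objects")
ax.set_title("Litter objects collected over time")
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
plt.tight_layout()
```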

Now **let's examine the relationship between litter objects and fish populations**. We'll create a count plot to visualize how many different types of litter objects were found in samples where fish were present. First, we'll filter our dataset to include only those entries where Litter fish is marked as 'yes'. This will give us the data specifically related to the interaction between litter and fish.

In [12]:
#Your code goes here
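
The filter and count plot can be sketched as below, using invented rows. Sorting the bars by frequency through `order` is an optional design choice that makes the most common objects easier to spot.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up rows for illustration only
litter_data = pd.DataFrame({
    "Litter obj": ["Plastic bag", "Net fragment", "Plastic bag", "Rope", "Glass bottle"],
    "Litter fish": ["yes", "yes", "yes", "no", "no"],
})

# Keep only the rows where Litter fish is 'yes'
fish_litter = litter_data[litter_data["Litter fish"] == "yes"]

# Most frequent objects first
order = fish_litter["Litter obj"].value_counts().index

# Horizontal count plot: one bar per litter object
fig, ax = plt.subplots(figsize=(8, 4))
sns.countplot(data=fish_litter, y="Litter obj", order=order, ax=ax)
ax.set_xlabel("Number of records")
ax.set_ylabel("Litter object")
ax.set_title("Litter objects in records with Litter fish = 'yes'")
plt.tight_layout()
```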

Our graph shows several litter objects with similar but not identical names. Before we standardize our data, let's **check if they fall under the same litter category**:

In [13]:
#Your code goes here
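
A sketch of this check: group the rows by object name and list the distinct categories for each name. The object names below (variants of "plastic bag") are hypothetical stand-ins, since the actual near-duplicate names depend on the real data, and the keyword filter `"bag"` is likewise only an example.

```python
import pandas as pd

# Made-up rows with hypothetical near-duplicate names
litter_data = pd.DataFrame({
    "Litter obj": ["Plastic bag", "plastic bag", "Plastic bags", "Glass bottle"],
    "Litter cat": ["plastics", "plastics", "plastics", "glass"],
})

# Select the similarly named objects (case-insensitive keyword match)
similar = litter_data[litter_data["Litter obj"].str.contains("bag", case=False)]

# Distinct categories recorded for each spelling
categories_per_object = similar.groupby("Litter obj")["Litter cat"].unique()
print(categories_per_object)
```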

With this grouping we now know that **they are all the same category = 'plastics'**, so our next step is to standardize our data.

To **standardize the values in the Litter obj column** we can use the Pandas *.replace()* method. Here's how to do that:

In [14]:
#Your code goes here
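
As a sketch, `.replace()` accepts a dictionary that maps each variant spelling to one standard name. The variants and the chosen standard name below are hypothetical; the real mapping would be built from the names seen in the plot.

```python
import pandas as pd

# Made-up rows with hypothetical near-duplicate names
litter_data = pd.DataFrame({
    "Litter obj": ["Plastic bag", "plastic bag", "Plastic bags", "Glass bottle"],
})

# Map each variant to the standard name (illustrative mapping)
name_map = {
    "plastic bag": "Plastic bag",
    "Plastic bags": "Plastic bag",
}

# Values not listed in the mapping are left unchanged
litter_data["Litter obj"] = litter_data["Litter obj"].replace(name_map)
print(litter_data["Litter obj"].value_counts())
```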

Next, let's analyze the **relationship between elevation (depth) and litter category** with a boxplot that can show how litter types are distributed across different elevations:

In [15]:
#Your code goes here
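
A possible version of this boxplot is sketched below with invented depths. The per-category median printed alongside is an optional addition for cross-checking the center line of each box.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up rows for illustration only
litter_data = pd.DataFrame({
    "Litter cat": ["plastics"] * 4 + ["metal"] * 3,
    "Elevation [m]": [-38.0, -40.5, -36.0, -42.0, -39.0, -41.5, -37.5],
})

# Numeric summary to compare against the plot
median_by_cat = litter_data.groupby("Litter cat")["Elevation [m]"].median()
print(median_by_cat)

# One box per litter category showing the spread of elevations
fig, ax = plt.subplots(figsize=(8, 5))
sns.boxplot(data=litter_data, x="Litter cat", y="Elevation [m]", ax=ax)
ax.set_xlabel("Litter category")
ax.set_ylabel("Elevation [m]")
ax.set_title("Elevation by litter category")
plt.tight_layout()
```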