# TidyTuesday 2024-03-05 - Python

This is my first attempt at tackling a TidyTuesday using Python.

Sources:
- https://github.com/rfordatascience/tidytuesday/blob/master/data/2024/2024-03-05/readme.md
- https://www.mrtrashwheel.com
- https://docs.google.com/spreadsheets/d/1b8Lbe-z3PNb3H8nSsSjrwK2B0ReAblL2/edit#gid=1143432795


In [1]:
import numpy as np
import pandas as pd
import seaborn as sns

## First step: prepare the data as described in the readme.md on TidyTuesday

In [18]:
# Read the CSVs (parse_dates sadly had no effect, so the dates are converted after loading the data)
# Each csv contains the data of one 'semi-autonomous trash interceptor'
mrtrash = pd.read_csv('./data/mrtrashwheel.csv', skiprows=1)
professortrash = pd.read_csv('./data/professortrashwheel.csv', skiprows=1)
captaintrash = pd.read_csv('./data/captaintrashwheel.csv', skiprows=1)
gwynndatrash = pd.read_csv('./data/gwynndatrashwheel.csv', skiprows=1)
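
Since the dates are to be converted after loading, here is a minimal sketch of how that could look with `pd.to_datetime`. The two-row `sample` frame is a made-up stand-in for one of the loaded CSVs, and the `%m/%d/%Y` format is an assumption based on the single date visible in this notebook (`5/16/2014`).

```python
import pandas as pd

# Hypothetical stand-in for one loaded CSV; the second date is deliberately broken.
sample = pd.DataFrame({"Dumpster": [1.0, 2.0], "Date": ["5/16/2014", "not a date"]})

# Convert the text column to datetime; entries that cannot be parsed become NaT
# instead of raising an error.
sample["Date"] = pd.to_datetime(sample["Date"], format="%m/%d/%Y", errors="coerce")

print(sample.dtypes)
```

Using `errors="coerce"` keeps the conversion from failing on odd rows, at the price of silently producing missing values, so a quick `sample["Date"].isna().sum()` afterwards is worth it.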

In [19]:
mrtrash.head(1)

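|   | Dumpster | Month | Year | Date | Weight (tons) | Volume (cubic yards) | Plastic Bottles | Polystyrene | Cigarette Butts | Glass Bottles | Plastic Bags | Wrappers | Sports Balls | Homes Powered* | Unnamed: 14 | Unnamed: 15 |
|:--|:---------|:------|:-----|:-----|:--------------|:---------------------|:----------------|:------------|:----------------|:--------------|:-------------|:---------|:-------------|:---------------|:------------|:------------|
| 0 | 1.0 | May | 2014.0 | 5/16/2014 | 4.31 | 18 | 1450 | 1820 | 126000 | 72 | 584 | 1162 | 7 | 0 |  |  |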


The following table shows what each column contains (source: https://github.com/rfordatascience/tidytuesday/blob/master/data/2024/2024-03-05/readme.md).

We can use this information to set the correct data types.

|variable       |class     |description    |
|:--------------|:---------|:--------------|
|ID             |character |Short name for the Trash Wheel             |
|Name           |character |Name of the Trash Wheel           |
|Dumpster       |double    |Dumpster number       |
|Month          |character |Month          |
|Year           |double    |Year           |
|Date           |character |Date           |
|Weight         |double    |Weight in tons         |
|Volume         |double    |Volume in cubic yards          |
|PlasticBottles |double    |Number of plastic bottles |
|Polystyrene    |double    |Number of polystyrene items    |
|CigaretteButts |double    |Number of cigarette butts |
|GlassBottles   |double    |Number of glass bottles   |
|PlasticBags    |double    |Number of plastic bags    |
|Wrappers       |double    |Number of wrappers       |
|SportsBalls    |double    |Number of sports balls    |
|HomesPowered   |double    |Homes Powered - Each ton of trash equates to on average 500 kilowatts of electricity.  An average household will use 30 kilowatts per day.   |
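
To illustrate how the `double` classes from the table could be applied, here is a small sketch using `pd.to_numeric`. The `sample` frame and its values are hypothetical (text values as a CSV reader might deliver them), and the list of numeric columns is shortened for brevity.

```python
import pandas as pd

# Hypothetical sample with numbers stored as text, including one unusable entry.
sample = pd.DataFrame({
    "Dumpster": ["1", "2"],
    "Weight (tons)": ["4.31", "2.74"],
    "Plastic Bottles": ["1450", "n/a"],
})

# Columns the table lists as 'double' (shortened list for this sketch).
numeric_columns = ["Dumpster", "Weight (tons)", "Plastic Bottles"]

for column in numeric_columns:
    # Unparseable values become NaN rather than raising an error.
    sample[column] = pd.to_numeric(sample[column], errors="coerce")

print(sample.dtypes)
```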



There seems to be a problem in the CSV: we get 'Unnamed: XY' columns.
Let's keep this easy and select only the columns we know we want.

In [25]:
mrtrash = mrtrash[['Dumpster', 'Month', 'Year', 'Date', 'Weight (tons)',
       'Volume (cubic yards)', 'Plastic Bottles', 'Polystyrene',
       'Cigarette Butts', 'Glass Bottles', 'Plastic Bags', 'Wrappers',
       'Sports Balls', 'Homes Powered*']]
# Gwynnda seems to have neither 'Glass Bottles' nor 'Sports Balls'
gwynndatrash = gwynndatrash[['Dumpster', 'Month', 'Year', 'Date', 'Weight (tons)',
       'Volume (cubic yards)', 'Plastic Bottles', 'Polystyrene',
       'Cigarette Butts', 'Plastic Bags', 'Wrappers',
       'Homes Powered*']]
# The captain seems to have neither 'Glass Bottles' nor 'Sports Balls'
captaintrash = captaintrash[['Dumpster', 'Month', 'Year', 'Date', 'Weight (tons)',
       'Volume (cubic yards)', 'Plastic Bottles', 'Polystyrene',
       'Cigarette Butts', 'Plastic Bags', 'Wrappers',
       'Homes Powered*']]
# The professor seems to not have 'Sports Balls'
professortrash = professortrash[['Dumpster', 'Month', 'Year', 'Date', 'Weight (tons)',
       'Volume (cubic yards)', 'Plastic Bottles', 'Polystyrene',
       'Cigarette Butts', 'Glass Bottles', 'Plastic Bags', 'Wrappers',
       'Homes Powered*']]
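
As an alternative to spelling out the column list for every dataset, the stray columns could also be filtered by their name prefix. This is only a sketch on a made-up `sample` frame; it assumes that every unwanted column starts with `Unnamed`, which holds for the header shown earlier but is not verified for all four files.

```python
import pandas as pd

# Hypothetical frame mimicking the stray columns seen after loading.
sample = pd.DataFrame({
    "Dumpster": [1.0],
    "Month": ["May"],
    "Unnamed: 14": [None],
    "Unnamed: 15": [None],
})

# Boolean mask: True for every column whose name does not start with 'Unnamed'.
keep = ~sample.columns.str.startswith("Unnamed")
cleaned = sample.loc[:, keep]

print(list(cleaned.columns))
```

The explicit lists have the advantage of documenting which columns each wheel is missing; the mask is shorter but hides that difference.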

A first insight: not all datasets contain the full set of columns.

Interesting for us here in Germany: the volume is given in cubic yards, which convert to the more familiar cubic meters as follows:

1 yd^3 = 0.764554858 m^3
1 m^3 = 1.3079506193 yd^3

source: https://www.unitconverters.net/volume/cubic-yard-to-cubic-meter.htm

I don't know yet whether I will add an extra column with the metric unit. Let's see...
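
Should the metric column turn out to be useful, a minimal sketch could look like this. The column name `VolumeCubicMeters` and the `sample` frame are hypothetical; the factor is the one quoted above.

```python
import pandas as pd

# Conversion factor from the source quoted above: 1 yd^3 = 0.764554858 m^3.
CUBIC_YARD_IN_CUBIC_METERS = 0.764554858

# Hypothetical two-row sample; 18 is the volume of the first Mr. Trash Wheel row.
sample = pd.DataFrame({"Volume (cubic yards)": [18, 15]})

# 'VolumeCubicMeters' is an illustrative name, not part of the original data.
sample["VolumeCubicMeters"] = sample["Volume (cubic yards)"] * CUBIC_YARD_IN_CUBIC_METERS

print(sample)
```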

In [26]:
# Tidy column names for easier access
tidy_column_names = {
    'Weight (tons)': 'WeightTons',
    'Volume (cubic yards)': 'VolumeCubicYards',
    'Plastic Bottles': 'PlasticBottles',
    'Cigarette Butts': 'CigaretteButts',
    'Glass Bottles': 'GlassBottles',
    'Plastic Bags': 'PlasticBags',
    'Sports Balls': 'SportsBalls',
    'Homes Powered*': 'HomesPowered'
}

mrtrash = mrtrash.rename(columns=tidy_column_names)
professortrash = professortrash.rename(columns=tidy_column_names)
captaintrash = captaintrash.rename(columns=tidy_column_names)
gwynndatrash = gwynndatrash.rename(columns=tidy_column_names)

In [28]:
mrtrash.head(1)

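|   | Dumpster | Month | Year | Date | WeightTons | VolumeCubicYards | PlasticBottles | Polystyrene | CigaretteButts | GlassBottles | PlasticBags | Wrappers | SportsBalls | HomesPowered |
|:--|:---------|:------|:-----|:-----|:-----------|:-----------------|:---------------|:------------|:---------------|:-------------|:------------|:---------|:------------|:-------------|
| 0 | 1.0 | May | 2014.0 | 5/16/2014 | 4.31 | 18 | 1450 | 1820 | 126000 | 72 | 584 | 1162 | 7 | 0 |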


In [29]:
# Add the trash interceptor name as a column
mrtrash['Name'] = 'Mister Trash Wheel'
professortrash['Name'] = 'Professor Trash Wheel'
captaintrash['Name'] = 'Captain Trash Wheel'
gwynndatrash['Name'] = 'Gwynnda Trash Wheel'

Since we will combine the datasets into one big dataset, we have to add the missing columns (SportsBalls and GlassBottles for the captain and Gwynnda, and SportsBalls for the professor).

In [30]:
professortrash['SportsBalls'] = np.nan
captaintrash['SportsBalls'] = np.nan
captaintrash['GlassBottles'] = np.nan
gwynndatrash['SportsBalls'] = np.nan
gwynndatrash['GlassBottles'] = np.nan
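
A possible variation on the assignments above is `DataFrame.reindex` against one shared column list, which adds every missing column as NaN in a single step and also fixes the column order. The shortened `all_columns` list and the `sample` frame are assumptions made for this sketch.

```python
import pandas as pd

# Shortened, illustrative target column list.
all_columns = ["Dumpster", "GlassBottles", "SportsBalls", "Name"]

# Hypothetical frame lacking 'GlassBottles' and 'SportsBalls'.
sample = pd.DataFrame({"Dumpster": [1.0], "Name": ["Captain Trash Wheel"]})

# Columns that do not exist yet are created and filled with NaN.
aligned = sample.reindex(columns=all_columns)

print(aligned)
```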

Now we have to adjust the column data types and then put everything together into one big dataset. Hopefully that works without having the same column ordering. We will see...
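
On the column ordering question: `pd.concat` aligns frames by column name, not by position, so differing orders should not be a problem. A small sketch with two made-up one-row frames (not the real data) shows the behavior:

```python
import pandas as pd

# Two hypothetical frames with different column order; the second lacks 'SportsBalls'.
mr = pd.DataFrame({"Dumpster": [1.0], "SportsBalls": [7.0], "Name": ["Mister Trash Wheel"]})
captain = pd.DataFrame({"Name": ["Captain Trash Wheel"], "Dumpster": [1.0]})

# Rows are stacked, columns are matched by name; missing cells become NaN.
combined = pd.concat([mr, captain], ignore_index=True)

print(combined)
```

With the default outer join, columns that are absent from one frame are filled with NaN, so adding the missing columns beforehand mainly serves to make the intent explicit.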