# Trash Panda

### By _Tobias Reaper_

---

## Notebook Outline

* Technologies
* Topics
* Introduction
* The Data Pipeline
  * Challenges
* The Model

### Technologies

* Python
* OpenCV
* Mask R-CNN
* Flask
* Docker

### Topics

* Computer vision
* Object detection
* Data engineering

---

## Introduction

### The Problem

You have an object in your hand that you intend to throw away. When you think about it as you're walking to the bins, you realize you actually don't know whether this type of object is recyclable or not. Maybe it is made of multiple different materials, or of an uncommon or unrecognizable material.

You're in the middle of an important project, and it's crunch time—no extra time available to spend researching. You end up throwing it in the recycling because it...well, it _seems_ like something that would be recyclable. With the decision made and action taken, you return to your important project, forgetting all about what just transpired.

I'd bet that most who are reading this have had an experience like this.

The priceless time and energy spent researching how to properly dispose of every single item can add up. However, the US is in something of a recycling crisis at the moment, partially due to the low quality of our recyclable waste—it tends to be very intermixed with non-recyclables.

Ever since China's [National Sword](https://99percentinvisible.org/episode/national-sword/) legislation in 2017, which vastly reduced the amount of foreign recycling—particularly low-quality—the country would accept, recycling companies in the US have been forced to grapple with this quality issue. The cost of recycling increases when more trash is intermingled with it, as more sorting has to occur prior to processing. Whether it is more expensive machines or higher cost of labor, sorting costs money.

While the domestic recycling infrastructure will (hopefully) grow to meet the increasing demand, the best way to solve a problem is to address the source of the issue, not the symptoms. One key reason for the low-quality recycling is simply a lack of easily accessible information. Even with the power of modern search engines at our fingertips, finding relevant recycling information can take a long time, as what exactly counts as recyclable changes depending on area and company.

The simple fact is that most people don't want to spend the additional time it takes (at least up front) to have good recycling habits. So why not simply remove that additional time from the equation?

### The Solution

The goal was to build an app that helps to foster better recycling habits by reducing the effort needed to find accurate and relevant information on how to properly dispose of any given item of waste. To make this possible, we needed to reduce the friction so much that looking up how to dispose of something a user is holding in their hand is just as quick and easy as debating for a few moments which bin it goes in.

Put another way, our goal was to reduce the cognitive tax of getting relevant recycling information so much that disposing of every item of waste properly, regardless of what it is, becomes effortless.

Our stakeholder envisioned that the user would simply snap a photo of something they are about to toss. Then, the app's computer vision (object detection) functionality would recognize the object and automatically pull up the relevant information on how it should be disposed of according to the user's location and available services. The user would know immediately if the item should be thrown in the trash, recycling, or compost, or if it is recyclable only at an offsite facility. For the latter case, the user would be able to view a list of nearby facilities that accept the item or material.

The result of this vision is a progressive web app (PWA) called Trash Panda, which does just that. You can try out the app on your mobile device now by following the link below.

[[LinkBlock]]
The Trash Panda app (meant for mobile)

#### A note on PWAs

For those who aren't familiar, a PWA is basically a web app that can both be used via the browser and downloaded to the home screen of a mobile device. Google has been moving to fully support PWAs, meaning Trash Panda is available on the Play Store right now. Of course, the benefit of a PWA is that you don't actually have to download it at all if you don't want to. You can use it directly from the browser.

Apple is pretty far behind in their support of PWAs. As a result, the behavior on an iOS device is not ideal. For those on iOS, be sure to use Safari. And when taking a picture of an item, you have to exit out of the video window before pressing the normal shutter button.

You'll figure it out—we believe in you!

### The Team (and My Role On It)

For eight weeks near the beginning of 2020, I worked with a remote interdisciplinary team to bring the vision of Trash Panda to life.

Trash Panda was by far the most ambitious machine learning endeavor I had yet embarked on. Indeed, it was the largest software project I'd worked on in just about every respect: time, team, ambition, breadth of required knowledge. As such, it gave me many valuable, foundational experiences that I'll surely keep with me throughout my entire career.

I seriously lucked out on the team for this project. Every single one of them was hard-working, thoughtful, friendly—a pleasure to work with. The team included myself and three other machine learning engineers, four web developers, and two UX designers (links to all of their respective sites in the Final Thoughts section below). Our stakeholder Trevor Clack, who came up with the idea for the app, also worked on the project as a machine learning engineer.

We all pushed ourselves throughout each and every day of the eight weeks to make Trevor's vision come to life, learning entirely new technologies, frameworks, skills, and processes along the way.

For example, the web developers taught themselves how to use GraphQL, along with a variety of related and dependent technologies. On the machine learning side of things, none of us had significant applied experience with computer vision (CV) going into the project. We'd spent a few days studying and working with it in the Deep Learning unit of our Lambda School curriculum. But that was meant to expose us to it rather than to cover the entire process in depth. We had only the shallowest of surface scratches compared to what was ultimately needed to meet the vision set out for us.

As the machine learning engineers on the team, we were responsible for the entire process of getting an object detection system built, trained, deployed, and integrated with the app. Basically, we were starting from scratch, both in the sense of a greenfield project and of us being inexperienced with CV.

Of course, CV is still machine learning—many steps in the process are similar to any other supervised machine learning project. But working with images comes with its own suite of unique challenges that we had to learn how to overcome.

We split up the work as evenly as possible, given our initially limited knowledge of the details, with some steps being split up between some or all of us, and other steps having a sole owner.

For the first couple of weeks, the entire project team collaborated on fleshing out the product vision, release canvas, and high-level architecture.

The first step for which I was solely responsible was building a system to automatically remove the background from images (or extract the foreground, depending on how you look at it). The goal was to let images with backgrounds be labeled by the same script Trevor had written for transparent-background images. To do that, I built a secondary pipeline around an image segmentation model. More details can be found in the Automated Background Removal section below.

Furthermore, I was responsible for building and deploying the object detection API. I built the API using Flask, containerized it with Docker, and deployed it to AWS Elastic Beanstalk. I go into a little more detail in the Deployment section below.

All members of the machine learning team contributed to the gathering and labeling of the dataset. To this end, each of us ended up gathering and labeling somewhere in the range of 20,000 images.

---

## Data Pipeline

As seems to be the case with most, if not all, machine learning projects, we spent the vast majority of the time gathering and labeling our dataset.

As also seems to be the case with most, if not all, projects in general, we were almost constantly grappling with scope management. In an ideal world, our model would be able to recognize any object that anyone would ever want to throw away. But in reality, this is practically impossible, particularly within the eight weeks we had to work on Trash Panda.

I say "practically" because I'm sure if a company dedicated enough resources to the problem, eventually it could be solved, at least to some degree.

Fortunately, we were granted an API key from [Earth911](https://earth911.com/) (shoutout to them for helping out!) to utilize their [recycling center search database](https://search.earth911.com/). At the time we were working with it, the database held information on around 300 items—how they should be recycled based on location, and facilities that accept them if they are not curbside recyclable. They added a number of items when we were already most of the way done with the project, and have likely added more since then.

We had our starting point for the list of items our system should be able to recognize. However, the documentation for the neural network architecture we'd decided to use suggested that to create a robust model, it should be trained with at least 1,000 instances (in this case, images) of each of the classes we wanted it to detect.

Gathering 300,000 images was also quite a bit out of the scope of the project at that point. So the DS team spent many hours reducing the size of that list to something a little more manageable and realistic.

The main method of doing so was to group the items based primarily on visual similarity. We knew it was also out of the scope of our time with the project to train a model that could tell the difference between #2 plastic bottles and #3 plastic bottles, or motor oil bottles and brake fluid bottles.

[Image of plastic bottles / example]

Given enough time and resources, who knows? Maybe we could train an object detection model that could accurately recognize 300+ items and distinguish between similar-looking items.

We also prioritized items that 1) users would be throwing away on a somewhat regular basis, and 2) users would usually either be unsure of how to dispose of properly or would dispose of improperly.

By the end of this process, we managed to cluster and prune the original list of about 300 items and materials down to 73.
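One way to picture the grouping step is as a lookup from fine-grained item names to the coarser classes the model is trained on. The sketch below is purely illustrative: the item and class names are hypothetical, loosely based on the bottle examples above, and are not the actual Earth911 list or our final 73 classes.

```python
from collections import defaultdict

# Hypothetical mapping from fine-grained items to visually similar clusters.
ITEM_TO_CLASS = {
    "#2 plastic bottle": "plastic_bottle",
    "#3 plastic bottle": "plastic_bottle",
    "motor oil bottle": "automotive_fluid_bottle",
    "brake fluid bottle": "automotive_fluid_bottle",
}


def group_items(item_to_class):
    """Invert the mapping: cluster label -> sorted list of member items."""
    clusters = defaultdict(list)
    for item, label in item_to_class.items():
        clusters[label].append(item)
    return {label: sorted(items) for label, items in clusters.items()}


clusters = group_items(ITEM_TO_CLASS)
print(f"{len(ITEM_TO_CLASS)} items -> {len(clusters)} classes")
```

Keeping the mapping as plain data means the item list can still be used to look up disposal information, while the model only has to learn the smaller set of cluster labels.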

We were ready to start building out the data pipeline. We split up the tasks between the four of us and got to coding!

I go into detail on my role in the build below, in the Automated Background Removal section.

### Gather

The next step after defining our list of classes was to figure out some way of getting somewhere in the range of 1,000 images for each one.

Timothy built the piece of the pipeline that we used to gather the majority of images. I say majority because we also used Google's Open Images Dataset for any classes we could. Somewhat to our surprise, Bing ended up being the most fruitful source of images, primarily due to the friendliness of its API.

Timothy's blog post about the Trash Panda project can be found below.

[LinkBlock]
[Trash Panda](https://www.gamesbytim.com/2020/03/trash-panda.html)

### Annotate

To train an object detection model, each image in the training dataset must be annotated with rectangular bounding boxes (or, more accurately, the coordinates that define the bounding box) surrounding each of the objects belonging to a class that we want the model to recognize. These are used as the label, or target, for the model—i.e. what the model will be trying to predict.
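To make "the coordinates that define the bounding box" concrete, here is a minimal sketch of turning a pixel-space box into a label line. It assumes the normalized `class x_center y_center width height` text format used by Darknet-style YOLO tooling. The function name and example values are made up for illustration, and the exact format depends on the training framework.

```python
def to_yolo_label(class_id, box, image_size):
    """Convert a pixel-space box (x_min, y_min, x_max, y_max) into a
    normalized 'class x_center y_center width height' label line."""
    x_min, y_min, x_max, y_max = box
    img_w, img_h = image_size
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical example: class 3, a 200x200 box inside a 400x500 image.
label = to_yolo_label(class_id=3, box=(100, 50, 300, 250), image_size=(400, 500))
print(label)
```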

In order to gather and annotate over 70,000 images between four of us in only a handful of weeks while keeping our sanity, we had to come up with some way of automating all or part of the process.

Trevor, one of the other Data Scientists on the team, came up with an idea to automate the labeling part of the process—arguably the most time-intensive part. Basically, the idea was to use images that feature the items over transparent backgrounds. If the item is the only object in the image, it would be relatively simple and straightforward to write a script that draws a bounding box around it.

If you'd like some more detail on this, Trevor wrote a blog post about it.

[LinkBlock]
[automated bbox](https://tclack88.github.io/blog/code/2020/02/17/automated-bounding-boxes.html)
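The core of the idea can be sketched in a few lines. This is not Trevor's actual script, just a minimal illustration that assumes the image has already been loaded as an RGBA NumPy array (for example with OpenCV's `cv2.imread(path, cv2.IMREAD_UNCHANGED)`); the function name is hypothetical.

```python
import numpy as np


def bbox_from_alpha(rgba):
    """Return (x_min, y_min, x_max, y_max) of the non-transparent pixels
    in an H x W x 4 array, or None if the image is fully transparent."""
    alpha = rgba[:, :, 3]
    ys, xs = np.nonzero(alpha)  # row and column indices of visible pixels
    if xs.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Synthetic 10x10 image with an opaque block at rows 2-5, columns 3-7.
image = np.zeros((10, 10, 4), dtype=np.uint8)
image[2:6, 3:8] = 255
box = bbox_from_alpha(image)
print(box)
```

Because the item is the only thing with non-zero alpha, the extreme row and column indices of the visible pixels are the bounding box.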

All of the major search engines allow an advanced search for transparent images, so that is where we started. Of course, finding a thousand unique images of a single class of object is already something of a task. And depending on the object, finding that many without backgrounds was virtually impossible.

For the images that had backgrounds, we would either have to manually label them, or find a way to automate the process and build it into the pipeline.

Because the script to label images without backgrounds was already written and working, we decided the way to go was to find a way of automatically removing the background from images.

This is the part of the pipeline that I built.

### Automated Background Removal

I'll give a brief overview here of how I built a system for automatically removing backgrounds from images. If you're curious about the details, check out my separate blog post on the topic.

[LinkBlock]
[Automated BG Removal With Python, OpenCV, and Mask R-CNN]()

I tested out a few different methods of image background removal, or foreground extraction, depending on how you look at it. I ended up building a short image processing pipeline that utilized a pre-trained image segmentation model (similar to object detection) to find the object(s) in the image.

Part of the output of the image segmentation model is a series of coordinates that describe an outline of the object(s) in the image. I then used that as a binary mask to define the area of the image that should be kept, making the rest of it transparent.
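The masking step itself is simple once the mask exists. The sketch below is not the code from my pipeline; it assumes the segmentation output has already been converted into a boolean H x W mask, and the function name is hypothetical.

```python
import numpy as np


def apply_mask_as_alpha(rgb, mask):
    """Attach a boolean H x W mask to an H x W x 3 image as an alpha
    channel: masked-in pixels become opaque, everything else transparent."""
    alpha = np.where(mask, 255, 0).astype(np.uint8)
    return np.dstack([rgb, alpha])


# Synthetic 4x4 gray image with a 2x2 "object" in the middle.
rgb = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
rgba = apply_mask_as_alpha(rgb, mask)
print(rgba.shape)
```

The color channels are left untouched; only the added alpha channel decides what stays visible, which is exactly what the bounding box script needs.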

Unfortunately, I did not have much time to spend on improving the performance of the image segmentation model, and as a result there was still a fair amount of manual labeling to be done after the pipeline. For example, I could have trained the image segmentation model using around 50 images of each class. This would've made the output mask much more accurate and reduced the time spent fixing the labels afterwards.

As it was, using only the pretrained weights, the model performed very well on some object classes and poorly on others.

### Running the Pipeline

As with building the pipeline, we split up the classes evenly amongst the four of us and got to work gathering and labeling the images.

[Talk a bit about the challenges faced during this part?]

---

## The Model

### Architecture

The neural network architecture behind the app's main object detection system (still in use, in case you want to try it out) is YOLOv3: You Only Look Once, version 3.

YOLOv3 is a state-of-the-art, real-time, single-shot object detection system.

[Use some verbiage from Shark Tank demo?]

### Training

The model was trained on a XXX SageMaker instance over the course of roughly 60 hours.

[Talk about accuracy and other metrics]

### Deployment

The trained model was deployed as a Flask API to AWS Elastic Beanstalk. Once a user takes a photo in the app, it is sent to the detection API. The trained model runs inference on the image, and sends back the class of item with the highest probability.
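The last step, picking the class with the highest probability, might look something like the sketch below. The detection format, field names, class names, and response shape are all assumptions for illustration; the real API's payload is not shown here, and in the deployed service this logic would sit inside a Flask route.

```python
import json


def top_detection(detections):
    """Return the detection with the highest confidence, or None if empty."""
    if not detections:
        return None
    return max(detections, key=lambda d: d["confidence"])


def build_response(detections):
    """Build the JSON body the API might send back to the app."""
    best = top_detection(detections)
    if best is None:
        return json.dumps({"class": None, "confidence": None})
    return json.dumps({"class": best["class"], "confidence": best["confidence"]})


# Hypothetical model output for a single photo.
detections = [
    {"class": "plastic_bottle", "confidence": 0.62},
    {"class": "aluminum_can", "confidence": 0.91},
]
response = build_response(detections)
print(response)
```

Returning an explicit empty result when nothing is detected lets the app fall back to a manual search instead of guessing.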

---

## Final Thoughts