<a href="https://colab.research.google.com/github/yilinmiao/NYC_Food_Scrap_Drop_Off_Sites_Custom_Chatbot/blob/main/NYC_Food_Scrap_Drop_Off_Sites_Custom_Chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NYC Food Scrap Drop-Off Sites Custom Chatbot

For this chatbot, I've chosen the NYC Food Scrap Drop-off Sites dataset. It fits the application because each record describes one composting location in New York City: its borough and neighborhood, address, host organization, operating months and hours, website, and free-text notes (for example, which materials are accepted and how to access the bin). A chatbot built on this data can help NYC residents find a convenient place to drop off food scraps and understand each site's guidelines, which supports sustainable waste management in the city. Because the data is organized into fields, each row converts cleanly into a text description that can be used to build custom prompts.


## Data Wrangling

Load the chosen dataset into a `pandas` dataframe with a column named `"text"`. This column should contain all of the text data, separated into at least 20 rows.

In [1]:
import pandas as pd
import numpy as np
import os

Load the NYC Food Scrap Drop-off Sites dataset

In [2]:
df = pd.read_csv('nyc_food_scrap_drop_off_sites.csv')

Display basic information about the dataset

In [3]:
print(f"Dataset shape: {df.shape}")
print(df.head())

Dataset shape: (576, 25)
   Unnamed: 0        borough                                     ntaname  \
0           0  Staten Island  Grasmere-Arrochar-South Beach-Dongan Hills   
1           1      Manhattan                                      Inwood   
2           2       Brooklyn                                  Park Slope   
3           3      Manhattan                         East Harlem (North)   
4           4         Queens                                      Corona   

                      food_scrap_drop_off_site  \
0                                  South Beach   
1       SE Corner of Broadway & Academy Street   
2                     Old Stone House Brooklyn   
3  SE Corner of Pleasant Avenue & E 116 Street   
4                               Malcolm X FSDO   

                                   location  \
0           21 Robin Road, Staten Island NY   
1                                       NaN   
2            336 3rd St, Brooklyn, NY 11215   
3                            

Let's look at the columns in our dataset

In [4]:
print("\nColumns in the dataset:")
for col in df.columns:
    print(f"- {col}")


Columns in the dataset:
- Unnamed: 0
- borough
- ntaname
- food_scrap_drop_off_site
- location
- hosted_by
- open_months
- operation_day_hours
- website
- borocd
- councildist
- latitude
- longitude
- precinct
- object_id
- location_point
- :@computed_region_yeji_bk3q
- :@computed_region_92fq_4b7q
- :@computed_region_sbqj_enih
- :@computed_region_efsh_h5xi
- :@computed_region_f5dn_yrer
- notes
- ct2010
- bbl
- bin


Transform the dataset by building a `"text"` column: one description per drop-off site, assembled from whichever fields are present in that row.

In [5]:
def create_site_description(row):
    # Start with the site name and borough
    text = f"Food Scrap Drop-off Site: {row['food_scrap_drop_off_site']} in {row['borough']}, {row['ntaname']}. "

    # Add location details
    if pd.notna(row['location']):
        text += f"Located at {row['location']}. "

    # Add hosting organization
    if pd.notna(row['hosted_by']):
        text += f"Hosted by {row['hosted_by']}. "

    # Add operating hours
    if pd.notna(row['open_months']) and pd.notna(row['operation_day_hours']):
        text += f"Open {row['open_months']}, {row['operation_day_hours']}. "
    elif pd.notna(row['open_months']):
        text += f"Open {row['open_months']}. "
    elif pd.notna(row['operation_day_hours']):
        text += f"Open {row['operation_day_hours']}. "

    # Add website if available
    if pd.notna(row['website']):
        text += f"Website: {row['website']}. "

    # Add free-text notes (e.g. accepted materials, access instructions)
    if pd.notna(row['notes']):
        text += f"Additional information: {row['notes']} "

    # Add coordinates for mapping
    if pd.notna(row['latitude']) and pd.notna(row['longitude']):
        text += f"GPS coordinates: {row['latitude']}, {row['longitude']}."

    return text

Create the text column

In [6]:
df['text'] = df.apply(create_site_description, axis=1)
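
The wrangling step asks for a `"text"` column with at least 20 rows. A minimal sketch of a sanity check for that requirement follows. `check_text_rows` is a hypothetical helper (not part of `pandas`), and the sample list is stand-in data so the snippet runs on its own.

```python
def check_text_rows(texts, min_rows=20):
    """Count usable text rows and raise if there are fewer than min_rows.

    `texts` is any iterable of values, for example df["text"].
    Non-string or blank entries are treated as unusable.
    Hypothetical helper for illustration only.
    """
    usable = sum(1 for t in texts if isinstance(t, str) and t.strip())
    if usable < min_rows:
        raise ValueError(
            f"Need at least {min_rows} non-empty text rows, found {usable}"
        )
    return usable


# Stand-in data; in the notebook, pass df["text"] instead.
sample_texts = [f"Food Scrap Drop-off Site: Example {i}." for i in range(25)]
usable_rows = check_text_rows(sample_texts)
```

In the notebook itself, `check_text_rows(df["text"])` would run the same check against the real column.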

Show examples of the created text descriptions

In [7]:
print("\nSample text descriptions:")
for i in range(3):
    print(f"\nSite {i+1}:\n{df['text'].iloc[i]}")


Sample text descriptions:

Site 1:
Food Scrap Drop-off Site: South Beach in Staten Island, Grasmere-Arrochar-South Beach-Dongan Hills. Located at 21 Robin Road, Staten Island NY. Hosted by Snug Harbor Youth. Open Year Round, Friday (Start Time: 1:30 PM - End Time:  4:30 PM). Website: snug-harbor.org. GPS coordinates: 40.595579, -74.062991.

Site 2:
Food Scrap Drop-off Site: SE Corner of Broadway & Academy Street in Manhattan, Inwood. Hosted by Department of Sanitation. Open Year Round, 24/7. Website: www.nyc.gov/smartcomposting. Additional information: Download the app to access bins. Accepts all food scraps, including meat and dairy. Do not leave food scraps outside of bin! 

Site 3:
Food Scrap Drop-off Site: Old Stone House Brooklyn in Brooklyn, Park Slope. Located at 336 3rd St, Brooklyn, NY 11215. Hosted by Old Stone House Brooklyn. Open Year Round, 24/7 (Start Time: 24/7 - End Time:  24/7). GPS coordinates: 40.6727118, -73.984731.
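
The sample descriptions carry the raw `operation_day_hours` strings, which leads to phrases such as "24/7 (Start Time: 24/7 - End Time:  24/7)". The sketch below shows one possible way to tidy those strings before they go into the text. `tidy_hours` is a hypothetical helper, and its pattern only covers the two shapes visible in the sample output above; other rows in the dataset may use formats it does not recognize, in which case it only normalizes whitespace.

```python
import re

# Matches the parenthetical seen in the sample output, e.g.
# "Friday (Start Time: 1:30 PM - End Time:  4:30 PM)".
_HOURS = re.compile(
    r"^(?P<days>.*?)\s*\(Start Time:\s*(?P<start>.*?)\s*-\s*"
    r"End Time:\s*(?P<end>.*?)\)\s*$"
)


def tidy_hours(raw):
    """Return a shorter, more readable version of an hours string."""
    match = _HOURS.match(raw)
    if match is None:
        # Unknown shape: only collapse runs of whitespace.
        return " ".join(raw.split())
    days, start, end = (match.group(k).strip() for k in ("days", "start", "end"))
    if start == end:
        # Start and end repeat the same value (e.g. "24/7"); say it once.
        return days if days == start else f"{days}, {start}"
    return f"{days}, {start} to {end}"


friday = tidy_hours("Friday (Start Time: 1:30 PM - End Time:  4:30 PM)")
always = tidy_hours("24/7 (Start Time: 24/7 - End Time:  24/7)")
```

With these inputs, `friday` is `"Friday, 1:30 PM to 4:30 PM"` and `always` is `"24/7"`.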


## Custom Query Completion

Compose a custom query using the chosen dataset and retrieve results from an OpenAI `Completion` model.
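
Composing a custom query comes down to placing selected rows from the `"text"` column ahead of the user's question. A minimal sketch of that assembly step follows, with no API call involved. `build_prompt`, the template wording, and the character budget are all illustrative assumptions; a real token limit would need a tokenizer rather than a character count.

```python
def build_prompt(question, context_rows, max_chars=4000):
    """Assemble a completion prompt from dataset rows and a user question.

    Rows are added in order until the character budget is used up.
    The template text and max_chars default are assumptions for illustration.
    """
    header = (
        "Answer the question based on the context below. "
        "If the context does not contain the answer, say \"I don't know.\"\n\n"
        "Context:\n"
    )
    footer = f"\n\n---\n\nQuestion: {question}\nAnswer:"
    budget = max_chars - len(header) - len(footer)

    chosen = []
    for row in context_rows:
        cost = len(row) + 1  # +1 for the newline that joins rows
        if cost > budget:
            break
        chosen.append(row)
        budget -= cost
    return header + "\n".join(chosen) + footer


# Two shortened descriptions taken from the sample output; in the notebook,
# the rows would come from df["text"].
rows = [
    "Food Scrap Drop-off Site: South Beach in Staten Island. "
    "Located at 21 Robin Road, Staten Island NY.",
    "Food Scrap Drop-off Site: Old Stone House Brooklyn in Brooklyn, Park Slope. "
    "Located at 336 3rd St, Brooklyn, NY 11215.",
]
prompt = build_prompt("Where can I drop off food scraps in Park Slope?", rows)
```

Which rows to pass in, and in what order, is a separate design choice; this sketch simply takes them as given.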

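Install `python-dotenv` (pulled in here through the `dotenv` wrapper package, as the install log shows) and import the libraries needed to work with the OpenAI API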

In [9]:
!pip install dotenv

Collecting dotenv
  Downloading dotenv-0.9.9-py2.py3-none-any.whl.metadata (279 bytes)
Collecting python-dotenv (from dotenv)
  Downloading python_dotenv-1.1.0-py3-none-any.whl.metadata (24 kB)
Downloading dotenv-0.9.9-py2.py3-none-any.whl (1.9 kB)
Downloading python_dotenv-1.1.0-py3-none-any.whl (20 kB)
Installing collected packages: python-dotenv, dotenv
Successfully installed dotenv-0.9.9 python-dotenv-1.1.0


In [10]:
import openai
import os
from dotenv import load_dotenv

Load the OpenAI API key from the `OPENAI_API_KEY` environment variable, reading a local `.env` file first if one exists


load_dotenv()  # Load environment variables from .env file if present
openai.api_key = os.getenv('OPENAI_API_KEY')
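
`os.getenv` returns `None` when the variable is missing, so a missing key would otherwise surface only when the first API request fails. The sketch below fails early instead. `require_api_key` is a hypothetical helper, and the demo uses a placeholder mapping rather than the real environment.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from `env` (default: os.environ) or raise if it is unset.

    Hypothetical helper for illustration; `env` can be any mapping,
    which keeps the function easy to exercise without touching real secrets.
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; add it to your environment or .env file "
            "before calling the API."
        )
    return key


# Demo with a placeholder mapping. In the notebook, after load_dotenv(),
# one could write: openai.api_key = require_api_key()
demo_key = require_api_key({"OPENAI_API_KEY": "placeholder-key"})
```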