<a href="https://colab.research.google.com/github/yilinmiao/NYC_Food_Scrap_Drop_Off_Sites_Custom_Chatbot/blob/main/NYC_Food_Scrap_Drop_Off_Sites_Custom_Chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NYC Food Scrap Drop-Off Sites Custom Chatbot

For this chatbot, I've chosen the NYC Food Scrap Drop-off Sites dataset. It lists composting locations throughout New York City, with each site's address, host organization, operating months and hours, website, and notes on accepted materials and special instructions. A chatbot built on this data can help NYC residents find a convenient place to drop off food scraps and understand each site's composting rules, which supports sustainable waste management in the city. Because the data is organized into well-defined fields, each row converts cleanly into a text description that can be used to build custom prompts.


## Data Wrangling

Load the chosen dataset into a `pandas` dataframe with a column named `"text"`. This column should contain all of the text data, separated into at least 20 rows.

In [1]:
import pandas as pd
import numpy as np
import os

Load the NYC Food Scrap Drop-off Sites dataset

In [2]:
df = pd.read_csv('nyc_food_scrap_drop_off_sites.csv')

Display basic information about the dataset

In [3]:
print(f"Dataset shape: {df.shape}")
print(df.head())

Dataset shape: (576, 25)
   Unnamed: 0        borough                                     ntaname  \
0           0  Staten Island  Grasmere-Arrochar-South Beach-Dongan Hills   
1           1      Manhattan                                      Inwood   
2           2       Brooklyn                                  Park Slope   
3           3      Manhattan                         East Harlem (North)   
4           4         Queens                                      Corona   

                      food_scrap_drop_off_site  \
0                                  South Beach   
1       SE Corner of Broadway & Academy Street   
2                     Old Stone House Brooklyn   
3  SE Corner of Pleasant Avenue & E 116 Street   
4                               Malcolm X FSDO   

                                   location  \
0           21 Robin Road, Staten Island NY   
1                                       NaN   
2            336 3rd St, Brooklyn, NY 11215   
3                            

Let's look at the columns in our dataset

In [4]:
print("\nColumns in the dataset:")
for col in df.columns:
    print(f"- {col}")


Columns in the dataset:
- Unnamed: 0
- borough
- ntaname
- food_scrap_drop_off_site
- location
- hosted_by
- open_months
- operation_day_hours
- website
- borocd
- councildist
- latitude
- longitude
- precinct
- object_id
- location_point
- :@computed_region_yeji_bk3q
- :@computed_region_92fq_4b7q
- :@computed_region_sbqj_enih
- :@computed_region_efsh_h5xi
- :@computed_region_f5dn_yrer
- notes
- ct2010
- bbl
- bin


Transform the dataset to create a `"text"` column with the relevant information. Each row becomes a single description of one drop-off site, and fields with missing values are skipped.

In [5]:
def create_site_description(row):
    # Start with the site name and borough
    text = f"Food Scrap Drop-off Site: {row['food_scrap_drop_off_site']} in {row['borough']}, {row['ntaname']}. "

    # Add location details
    if pd.notna(row['location']):
        text += f"Located at {row['location']}. "

    # Add hosting organization
    if pd.notna(row['hosted_by']):
        text += f"Hosted by {row['hosted_by']}. "

    # Add operating hours
    if pd.notna(row['open_months']) and pd.notna(row['operation_day_hours']):
        text += f"Open {row['open_months']}, {row['operation_day_hours']}. "
    elif pd.notna(row['open_months']):
        text += f"Open {row['open_months']}. "
    elif pd.notna(row['operation_day_hours']):
        text += f"Open {row['operation_day_hours']}. "

    # Add website if available
    if pd.notna(row['website']):
        text += f"Website: {row['website']}. "

    # Add notes about acceptable materials
    if pd.notna(row['notes']):
        text += f"Additional information: {row['notes']} "

    # Add coordinates for mapping
    if pd.notna(row['latitude']) and pd.notna(row['longitude']):
        text += f"GPS coordinates: {row['latitude']}, {row['longitude']}."

    return text

Create the text column

In [6]:
df['text'] = df.apply(create_site_description, axis=1)
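
The task asks for at least 20 rows of text, so it can be worth confirming that before moving on. The sketch below is not part of the original notebook: `count_usable_texts` is a hypothetical helper that takes a plain list (for example `df['text'].tolist()`) and raises an error if too few non-empty descriptions are present.

```python
def count_usable_texts(texts, min_rows=20):
    """Count non-empty text rows and fail loudly if there are fewer than min_rows.

    `texts` is assumed to be a plain list, e.g. df['text'].tolist().
    """
    # Treat None, NaN (a float), and whitespace-only strings as unusable
    usable = [t for t in texts if isinstance(t, str) and t.strip()]
    if len(usable) < min_rows:
        raise ValueError(f"Only {len(usable)} usable rows; need at least {min_rows}.")
    return len(usable)
```

With the notebook's dataframe this would be called as `count_usable_texts(df['text'].tolist())`.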

Show examples of the created text descriptions

In [7]:
print("\nSample text descriptions:")
for i in range(3):
    print(f"\nSite {i+1}:\n{df['text'].iloc[i]}")


Sample text descriptions:

Site 1:
Food Scrap Drop-off Site: South Beach in Staten Island, Grasmere-Arrochar-South Beach-Dongan Hills. Located at 21 Robin Road, Staten Island NY. Hosted by Snug Harbor Youth. Open Year Round, Friday (Start Time: 1:30 PM - End Time:  4:30 PM). Website: snug-harbor.org. GPS coordinates: 40.595579, -74.062991.

Site 2:
Food Scrap Drop-off Site: SE Corner of Broadway & Academy Street in Manhattan, Inwood. Hosted by Department of Sanitation. Open Year Round, 24/7. Website: www.nyc.gov/smartcomposting. Additional information: Download the app to access bins. Accepts all food scraps, including meat and dairy. Do not leave food scraps outside of bin! 

Site 3:
Food Scrap Drop-off Site: Old Stone House Brooklyn in Brooklyn, Park Slope. Located at 336 3rd St, Brooklyn, NY 11215. Hosted by Old Stone House Brooklyn. Open Year Round, 24/7 (Start Time: 24/7 - End Time:  24/7). GPS coordinates: 40.6727118, -73.984731.


## Custom Query Completion

Compose a custom query using the chosen dataset and retrieve results from an OpenAI chat completion model (the code below calls `client.chat.completions.create`).

Import necessary libraries for working with the OpenAI API

In [8]:
!pip install openai



In [43]:
from openai import OpenAI

Load API key from colab variables

In [58]:
from google.colab import userdata
client = OpenAI(api_key=userdata.get('OPENAI_API_KEY'))

Check if API key is available

In [59]:
if not client.api_key:
    print("WARNING: OPENAI_API_KEY was not found in Colab secrets. Please add it before proceeding.")
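
The key lookup above works only inside Colab, because `google.colab` cannot be imported elsewhere. As a hedged sketch for running the notebook in other environments, the hypothetical helper `load_api_key` below tries Colab secrets first and then falls back to an environment variable of the same name. The fallback behavior is an assumption for illustration, not something the original notebook does.

```python
import os

def load_api_key(name="OPENAI_API_KEY"):
    """Return the key from Colab secrets if possible, else from the environment."""
    try:
        # Only importable inside Google Colab
        from google.colab import userdata
        key = userdata.get(name)
        if key:
            return key
    except Exception:
        # Not running in Colab, or the secret is missing or not shared with this notebook
        pass
    # Fall back to an environment variable; returns None if it is not set either
    return os.environ.get(name)
```

The client could then be created with `OpenAI(api_key=load_api_key())`.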


Prepare data for the custom prompt.
Let's create a subset of the data to keep our prompt concise.
We'll make sure to have at least 20 rows as required.

In [60]:
df_subset = df.head(30)  # Taking 30 rows to ensure we have more than 20
# Verify we have at least 20 rows with text content
print(f"\nNumber of rows in our subset: {len(df_subset)}")


Number of rows in our subset: 30
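
Taking the first 30 rows keeps the prompt short, but a question about a site outside those rows cannot be answered from the prompt. One alternative, sketched here as an assumption rather than what the notebook does, is to rank the descriptions by simple keyword overlap with the question and keep the best matches. `select_relevant_texts` is a hypothetical helper that works on a plain list of strings.

```python
import re

def select_relevant_texts(query, texts, limit=30):
    """Return up to `limit` texts, ranked by how many query words they contain."""
    # Keep only words longer than three characters, to skip "the", "at", "can", etc.
    words = {w for w in re.findall(r"[a-z0-9/]+", query.lower()) if len(w) > 3}

    def score(text):
        lowered = text.lower()
        return sum(1 for w in words if w in lowered)

    # sorted() is stable, so texts with equal scores keep their original order
    ranked = sorted(texts, key=score, reverse=True)
    return ranked[:limit]

# Tiny made-up examples, not rows from the real dataset
examples = [
    "Site A in Brooklyn, Park Slope.",
    "Site B in Queens, Astoria. Hosted by Astoria Pug.",
    "Site C in Manhattan, Inwood.",
]
best = select_relevant_texts("Can I compost meat at the Astoria Pug site?", examples, limit=1)
```

Keyword overlap is crude: it ignores synonyms and word order. It is only meant to show the idea of choosing rows per question instead of using a fixed slice.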


In [61]:
# Create a function to generate a custom prompt
def generate_custom_prompt(user_query, dataset_texts):
    # Create a system prompt that describes the role and includes the dataset
    system_prompt = (
        "You are an expert assistant specializing in NYC food scrap drop-off sites and composting information. "
        "You help people find places to drop off their food scraps for composting in New York City. "
        "You have detailed knowledge about drop-off locations, operating hours, accepted materials, and other relevant information. "
        "\n\nHere is information about food scrap drop-off sites in NYC that you can use to answer questions:\n\n"
    )

    # Add dataset texts to the system prompt
    for i, text in enumerate(dataset_texts):
        system_prompt += f"Site {i+1}: {text}\n\n"

    # Add instructions for how to respond
    system_prompt += (
        "\nWhen answering questions:\n"
        # Each instruction ends with \n so the numbered items stay on separate lines
        "1. Provide specific site recommendations based on the locations or criteria mentioned.\n"
        "2. Include relevant details like operating hours, accepted materials, and any special instructions.\n"
        "3. If you don't know the answer based on the provided information, say so instead of making up details.\n"
        "4. Be helpful, accurate, and concise in your responses."
    )

    return system_prompt, user_query
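
Because every site description is pasted into the system prompt, the prompt grows with the number of rows. The sketch below gives a rough size check. It assumes about four characters per token, which is only a rule of thumb for English text; exact counts depend on the model's tokenizer. Both function names are hypothetical, and `context_limit` must be supplied by the caller because it differs per model.

```python
def estimate_tokens(text, chars_per_token=4):
    """Very rough token estimate; real counts depend on the model's tokenizer."""
    return len(text) // chars_per_token + 1

def fits_in_context(system_prompt, user_query, context_limit, max_tokens=1000):
    """Check whether the prompt plus the reserved reply tokens stay under context_limit.

    context_limit is model-specific and supplied by the caller (an assumption here);
    max_tokens=1000 mirrors the reply budget this notebook passes to the API.
    """
    prompt_tokens = estimate_tokens(system_prompt) + estimate_tokens(user_query)
    return prompt_tokens + max_tokens <= context_limit
```

If the check fails, the usual options are fewer rows or shorter descriptions, for example by dropping the GPS coordinates.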


In [62]:
# Function to call the OpenAI API with the custom prompt
def get_completion_with_custom_prompt(user_query, dataset_texts, model="gpt-3.5-turbo"):
    system_prompt, formatted_query = generate_custom_prompt(user_query, dataset_texts)

    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": formatted_query}
            ],
            temperature=0.7,
            max_tokens=1000
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error: {str(e)}"
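
The function above returns the error message as a string, so a caller cannot easily tell a failed request from a real answer. One alternative, sketched here under assumptions, is to let exceptions propagate and retry transient failures with exponential backoff. `call_with_retries` is a hypothetical helper, and `call` is any zero-argument function that performs the request. The `sleep` parameter exists only so the delay can be replaced when testing.

```python
import time

def call_with_retries(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call(); on failure wait base_delay, 2*base_delay, ... and try again."""
    last_error = None
    for attempt in range(attempts):
        try:
            return call()
        except Exception as error:
            last_error = error
            # Do not sleep after the final failed attempt
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    # All attempts failed: re-raise the most recent error
    raise last_error
```

In practice it is better to retry only on errors that are known to be temporary, such as rate limits or timeouts, rather than on every exception.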


In [63]:
# Function to call the OpenAI API with a basic prompt (for comparison)
def get_completion_with_basic_prompt(user_query, model="gpt-3.5-turbo"):
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": user_query}
            ],
            temperature=0.7,
            max_tokens=1000
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error: {str(e)}"

## Custom Performance Demonstration
Example usage (commented out to avoid actual API calls):

### Test the custom query with a sample question

> sample_query = "Where can I drop off food scraps in Brooklyn on weekends?"

### Get response using the custom prompt

> custom_response = get_completion_with_custom_prompt(sample_query, df_subset['text'].tolist())
>
> print("Response with custom prompt:\n")
>
> print(custom_response)

### Get response using a basic prompt for comparison

> basic_response = get_completion_with_basic_prompt(sample_query)
>
> print("\nResponse with basic prompt:\n")
>
> print(basic_response)
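
The custom-versus-basic comparison above repeats the same steps for every question. A small helper can collect both answers per question. This is a sketch, not part of the original notebook: `compare_prompts` is a hypothetical name, and the two answer functions are passed in as arguments so the helper can be tried with stand-ins that make no API calls.

```python
def compare_prompts(questions, custom_fn, basic_fn):
    """Ask each question through both functions and collect the answers side by side."""
    results = []
    for question in questions:
        results.append({
            "question": question,
            "custom": custom_fn(question),
            "basic": basic_fn(question),
        })
    return results

# Stand-in functions so the example runs without an API key
demo = compare_prompts(
    ["Where can I drop off food scraps in Brooklyn on weekends?"],
    custom_fn=lambda q: f"[custom] {q}",
    basic_fn=lambda q: f"[basic] {q}",
)
```

With the notebook's functions, `custom_fn` could be `lambda q: get_completion_with_custom_prompt(q, df_subset['text'].tolist())` and `basic_fn` could be `get_completion_with_basic_prompt`.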


In [65]:

# Question 1
question1 = "Can I compost meat and dairy at the Astoria Pug drop-off site?"

# Get response with custom prompt
custom_response1 = get_completion_with_custom_prompt(question1, df_subset['text'].tolist())
print(f"Question: {question1}\n")
print("Response with custom prompt:\n")
print(custom_response1)

# Get response with basic prompt
basic_response1 = get_completion_with_basic_prompt(question1)
print("\nResponse with basic prompt:\n")
print(basic_response1)

# Question 2
question2 = "What are the drop-off sites that are open 24/7 in Manhattan?"

# Get response with custom prompt
custom_response2 = get_completion_with_custom_prompt(question2, df_subset['text'].tolist())
print(f"\nQuestion: {question2}\n")
print("Response with custom prompt:\n")
print(custom_response2)

# Get response with basic prompt
basic_response2 = get_completion_with_basic_prompt(question2)
print("\nResponse with basic prompt:\n")
print(basic_response2)



Question: Can I compost meat and dairy at the Astoria Pug drop-off site?

Response with custom prompt:

No, meat and dairy are not accepted at the Astoria Pug drop-off site. The site is located at Ditmars Boulevard and 41st Street in Queens, and it is hosted by Astoria Pug. Only food scraps excluding meat, bones, or dairy are accepted at this location. The drop-off hours are on Mondays from 8:00 AM to 2:00 PM. For more information, you can visit their website at https://www.instagram.com/astoriapug/?hl=en.

Response with basic prompt:

No, it is not recommended to compost meat and dairy at a community compost drop-off site like the one in Astoria. Meat and dairy products can attract pests, produce odors, and take longer to break down compared to plant-based materials. It's best to stick to composting fruit and vegetable scraps, coffee grounds, eggshells, and yard waste at community composting sites. If you have meat and dairy products to dispose of, it's better to use a designated wast