# End of week 1 exercise

To demonstrate your familiarity with the OpenAI API and Ollama, build a tool that takes a technical question and responds with an explanation.
This is a tool that you will be able to use yourself during the course!

In [3]:
import os
import pandas as pd
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display, update_display

Load environment variables

In [4]:
load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

# Check the API key
if not api_key:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
elif not api_key.startswith("sk-"):
    print("An API key was found, but it doesn't start with 'sk-'; please check you're using the right key - see troubleshooting notebook")
elif api_key.strip() != api_key:
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook")
else:
    print("API key found and looks good so far!")

client = OpenAI(api_key=api_key)
model = 'gpt-4'

API key found and looks good so far!


## Read URLs from CSV

Define a function to read URLs from a CSV file.

In [5]:
file_path = 'sitemap.csv'  # Replace with your file path

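def read_urls_from_csv(file_path, urls_column='URL'):
    """Return a dict mapping each row index to the URL in `urls_column`, or None on failure."""
    try:
        df = pd.read_csv(file_path)
        # Series.to_dict() keys each URL by its row index
        return df[urls_column].to_dict()
    except Exception as e:
        # Covers a missing file as well as a missing column (KeyError)
        print(f"An error occurred: {e}")
        return None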

urls_list = read_urls_from_csv(file_path, 'URL')
print(urls_list)

{0: 'https://google-gruyere.appspot.com/662787643234587727633733504808217804147/', 1: 'https://google-gruyere.appspot.com/662787643234587727633733504808217804147/snippets.gtl', 2: 'https://google-gruyere.appspot.com/662787643234587727633733504808217804147/newsnippet.gtl', 3: 'https://google-gruyere.appspot.com/662787643234587727633733504808217804147/upload.gtl', 4: 'https://google-gruyere.appspot.com/662787643234587727633733504808217804147/editprofile.gtl', 5: 'https://google-gruyere.appspot.com/662787643234587727633733504808217804147/logout', 6: 'https://google-gruyere.appspot.com/662787643234587727633733504808217804147/deletesnippet?index=0', 7: 'https://google-gruyere.appspot.com/662787643234587727633733504808217804147/deletesnippet?index=1', 8: 'https://google-gruyere.appspot.com/662787643234587727633733504808217804147/cheese.png', 9: 'https://google-gruyere.appspot.com/662787643234587727633733504808217804147/feed.gtl?uid=admin', 10: 'https://google-gruyere.appspot.com/662787643234

## Define Prompts and Functions

Define the system prompt and functions to analyze and stream URLs.

In [6]:
system_prompt = """You are a web application developer and security consultant. 
Your role is to analyze URLs and determine which URLs might be similar in function. Respond in markdown. Group similar URLs into groups."""

def get_url_user_prompt(urls_list):
    truncated_urls = []
    total_length = 0
    for url in urls_list.values():
        if total_length + len(url) + 1 > 5000:
            break
        truncated_urls.append(url)
        total_length += len(url) + 1
    return "Here is the list of URLs to analyze:\n" + "\n".join(truncated_urls)

def analyse_urls(urls_list):
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_url_user_prompt(urls_list)}
        ]
    )
    display(Markdown(response.choices[0].message.content))

def stream_url(urls_list):
    stream = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_url_user_prompt(urls_list)}
        ],
        stream=True
    )

    raw_response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta is not None:
            raw_response += delta
            # Strip only code-fence markers from the accumulated text;
            # removing the bare word "markdown" would also delete it from ordinary sentences
            cleaned = raw_response.replace("```markdown", "").replace("```", "")
            update_display(Markdown(cleaned), display_id=display_handle.display_id)
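
The character budget in `get_url_user_prompt` can be checked in isolation, without calling the API. The sketch below re-implements the same truncation as a standalone function. The name `build_url_prompt`, the `max_chars` parameter, and the sample URLs are hypothetical and exist only for illustration.

```python
def build_url_prompt(urls, max_chars=5000):
    """Join URLs one per line, stopping before the running total exceeds max_chars.

    Hypothetical helper mirroring get_url_user_prompt: each URL costs its
    length plus 1 for the newline separator.
    """
    kept = []
    total = 0
    for url in urls:
        cost = len(url) + 1
        if total + cost > max_chars:
            break  # stop at the first URL that would exceed the budget
        kept.append(url)
        total += cost
    return "Here is the list of URLs to analyze:\n" + "\n".join(kept)


# Three made-up URLs of 24 characters each (25 with the separator)
sample = [f"https://example.com/p/{i:02d}" for i in range(3)]
prompt = build_url_prompt(sample, max_chars=60)
print(prompt)
```

With a budget of 60 characters only the first two sample URLs fit. Because the loop uses `break`, any URL after the first one that does not fit is dropped too, even if it is shorter.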


In [7]:
analyse_urls(urls_list)

Looking at the URLs and analyzing their functionality, we can organize them into the following groups:

**Group 1: Base URLs**
These URLs represent the landing/home pages for unique users identified by a unique long string number. They seem to serve the same base function.
* https://google-gruyere.appspot.com/662787643234587727633733504808217804147/

**Group 2: Snippet URLs**
These URLs all relate to "snippets". They might be involved in creating, viewing, and deleting snippets.
* https://google-gruyere.appspot.com/662787643234587727633733504808217804147/snippets.gtl
* https://google-gruyere.appspot.com/662787643234587727633733504808217804147/newsnippet.gtl
* https://google-gruyere.appspot.com/662787643234587727633733504808217804147/deletesnippet?index=0
* https://google-gruyere.appspot.com/662787643234587727633733504808217804147/deletesnippet?index=1
* https://google-gruyere.appspot.com/662787643234587727633733504808217804147/snippets.gtl?uid=cheddar
* https://google-gruyere.appspot.com/662787643234587727633733504808217804147/snippets.gtl?uid=brie

**Group 3: User Profile URLs**
These URLs relate to user interact actions like uploading, editing profile and logout functions.
* https://google-gruyere.appspot.com/662787643234587727633733504808217804147/upload.gtl
* https://google-gruyere.appspot.com/662787643234587727633733504808217804147/editprofile.gtl
* https://google-gruyere.appspot.com/662787643234587727633733504808217804147/logout

**Group 4: Feed URLs**
These URLs, identified by the keyword 'feed', are likely related to generating feeds for individual users.
* https://google-gruyere.appspot.com/662787643234587727633733504808217804147/feed.gtl?uid=admin
* https://google-gruyere.appspot.com/662787643234587727633733504808217804147/feed.gtl

**Group 5: Media URLs**
This URL has a filetype .png in the end, indicating that it's likely for serving media files.
* https://google-gruyere.appspot.com/662787643234587727633733504808217804147/cheese.png

**Group 6: Google Images and News URLs**
These URLs are for searching specific results on Google Images and Google News.
* https://images.google.com/?q=cheddar cheese
* https://news.google.com/news/search?q=brie

## Example Usage

Analyze the URLs using the defined functions.

In [None]:
# Get Llama 3.2 to answer
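
One possible way to fill in this cell, offered as a sketch under assumptions rather than as the course solution: Ollama exposes an OpenAI-compatible endpoint, so the same `openai` client can be pointed at a local server. The sketch assumes Ollama is running on its default port (11434) and that the `llama3.2` model has already been pulled. `build_messages` and `ask_llama` are hypothetical helper names, and the system prompt is repeated only to keep the block self-contained.

```python
OLLAMA_BASE_URL = "http://localhost:11434/v1"  # assumed default local Ollama endpoint
OLLAMA_MODEL = "llama3.2"                      # assumes `ollama pull llama3.2` was run

SYSTEM_PROMPT = (
    "You are a web application developer and security consultant. "
    "Your role is to analyze URLs and determine which URLs might be similar "
    "in function. Respond in markdown. Group similar URLs into groups."
)


def build_messages(urls):
    """Build the chat messages for a list of URLs (hypothetical helper)."""
    user_prompt = "Here is the list of URLs to analyze:\n" + "\n".join(urls)
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]


def ask_llama(urls):
    """Send the URL list to a local Llama model and return the reply text."""
    # Imported here so build_messages can be used without the openai package
    from openai import OpenAI

    # Ollama ignores the API key, but the client requires a non-empty value
    ollama = OpenAI(base_url=OLLAMA_BASE_URL, api_key="ollama")
    response = ollama.chat.completions.create(
        model=OLLAMA_MODEL,
        messages=build_messages(urls),
    )
    return response.choices[0].message.content
```

In the notebook this could be called as `display(Markdown(ask_llama(list(urls_list.values()))))`, provided the Ollama server is running. Unlike `get_url_user_prompt`, this sketch does not apply the 5000-character cutoff.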