<a href="https://colab.research.google.com/github/wgalindo1453/PythonTeachingMaterial/blob/main/VisualAssistantVBI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Visual Assistant using ChatGPT (VBI Tutorial)**

## What This Program Does:

This program is like a smart helper on your computer. It can look at your screen, figure out what's on it, and then tell you about it using speech. It also listens to your voice commands. Pretty cool, right?

## Importing Tools

In [None]:
import pyttsx3 #This is like a robot voice for your computer. pyttsx3 is a tool that allows the computer to talk to you.
import base64 #base64 turns binary data, like an image, into plain text so it can be sent over the internet. It is an encoding, not a secret code: anyone can decode it.
import speech_recognition as sr  #This tool is like the computer's ears. It's used for listening to what you say.
import keyboard #The keyboard library is all about listening to your keyboard. It helps the program know when you press a key.
import pyautogui #This tool is like the computer's eyes and hands. pyautogui can take pictures of your screen (like a screenshot)
from openai import OpenAI #This library contains tools and functions that allow us to communicate with OpenAI's services

#Each of these tools gives the computer special abilities, like talking,
#listening, seeing your screen, understanding keyboard presses, and sending images.
#With these tools, you can make your computer do some pretty amazing things


## Setting up Connection to ChatGPT
We use something called an API key, which works like a password, to talk to OpenAI's models. The model we use can look at a picture and describe it. Keep your key private.


In [None]:

# OpenAI API Key
client = OpenAI(
    api_key="YOUR API KEY HERE"
)
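
Typing the key straight into the notebook makes it easy to share by accident. One common alternative, sketched here as an assumption rather than as part of the original program, is to read the key from an environment variable. The helper name `load_api_key` is made up for this example; `OPENAI_API_KEY` is the variable name the OpenAI library looks for by default.

```python
import os

def load_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key stored in an environment variable (hypothetical helper)."""
    # Use the real environment unless a dictionary is passed in for testing.
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first.")
    return key

# Example with a fake environment, so no real key is needed to try it.
print(load_api_key({"OPENAI_API_KEY": "sk-example"}))
```

With a helper like this, `client = OpenAI(api_key=load_api_key())` could take the place of the hard-coded key.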


## Taking a Screenshot
This part is like telling your computer to take a picture of what's on the screen and save it.

In [1]:
# Define a function named 'capture_screenshot'. This is a set of instructions for the computer.
def capture_screenshot():
    # Use the pyautogui tool to take a screenshot. It's like telling your computer to take a picture of the screen.
    screenshot = pyautogui.screenshot()

    # Save the screenshot as a file named 'screenshot.png' in the current folder. Each new screenshot replaces the previous one.
    screenshot.save('screenshot.png')

    # The function then gives back the name of the file where the screenshot was saved.
    return 'screenshot.png'


## Converting the Image
Here, we turn the screenshot into base64 text, a format that can be sent over the internet inside a request.

In [None]:
# Define a function named 'encode_image'. This function is for turning an image into a special code.
def encode_image(image_path):
    # Open the image file at the given path ('image_path').
    # 'rb' means read the file in binary mode: we get the raw bytes, with no text decoding.
    with open(image_path, "rb") as image_file:
        # Encode the image bytes as base64, which represents binary data using only plain text characters.
        # Then, decode the result into a regular string that can be easily used and sent over the internet.
        return base64.b64encode(image_file.read()).decode('utf-8')
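
To see that base64 is just a reversible translation and not a way of hiding anything, here is a small sketch that encodes some bytes and decodes them again. The sample bytes are made up for illustration; a real image file would simply be much longer.

```python
import base64

# Pretend these bytes are the contents of an image file.
original_bytes = b"\x89PNG example bytes"

# Encode to base64 text, the same way encode_image does.
encoded_text = base64.b64encode(original_bytes).decode("utf-8")

# Decode the text back into the original bytes.
decoded_bytes = base64.b64decode(encoded_text)

print(encoded_text)
print(decoded_bytes == original_bytes)
```

The second `print` shows `True`: nothing is lost in the round trip, and anyone holding the text can get the image back.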


## Asking ChatGPT about the Image:
We send the image to ChatGPT and ask, "What's in this image?" ChatGPT looks at the image and gives us an answer.

In [None]:
# Define a function named 'analyze_image' for analyzing the content of an image.
def analyze_image(image_path):
    # First, convert the image at the given path to base64 format using the 'encode_image' function we defined earlier.
    base64_image = encode_image(image_path)

    # Send a request to the OpenAI's ChatGPT model to analyze the image.
    # This is like asking a very smart robot to look at the image and tell us what it sees.
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",  # The model to use, here a GPT-4 version that can analyze images. OpenAI has since deprecated this preview model, so a newer vision-capable model name may be needed.
        messages=[
            {
                "role": "system",  # This is a system message that sets up the context for the analysis.
                "content": """
                            What's in this image?
                            """,  # The question we are asking about the image.
            },
            {
                "role": "user",  # This represents the user's input.
                "content": [
                    # Provide the image as a base64 data URL. This is how we show the image to the model.
                    # The screenshot was saved as a PNG, so the data URL says image/png.
                    {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{base64_image}"}},
                ],
            },
        ],
        max_tokens=500,  # This sets a limit on how long the response from the model can be.
    )

    # Get the response from the model. This is what the model thinks is in the image.
    response_text = response.choices[0].message.content

    # Return the response text. This is the model's description of the image.
    return response_text
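
The message that carries the image can be built and inspected without calling the API at all. The sketch below is one possible way to do it, not the original program's method: `build_image_message` is a hypothetical helper, and putting the question in the same user message as the image is a choice made for this example. It guesses the image type from the file name so the data URL matches the actual file format.

```python
import mimetypes

def build_image_message(image_path, base64_image):
    """Build the 'user' message that carries the image (hypothetical helper)."""
    # Guess the image type from the file extension, e.g. 'image/png' for .png files.
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None:
        # Assumption: fall back to PNG, which is what capture_screenshot saves.
        mime_type = "image/png"
    data_url = f"data:{mime_type};base64,{base64_image}"
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }

# 'aGVsbG8=' is a tiny stand-in for real base64 image text.
message = build_image_message("screenshot.png", "aGVsbG8=")
print(message["content"][1]["image_url"]["url"])
```

Printing the URL is a quick way to check that the prefix says `image/png` for a PNG screenshot before any request is sent.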


## Reading Out Loud:
This is where the computer reads out what ChatGPT said about the image. It's like having a friend who can describe things to you.

In [None]:
# Define a function called 'narrate_text'. This function is used to make the computer speak out loud.
def narrate_text(text):
    # Initialize the text-to-speech engine. This is like starting up the tool that will read the text out loud.
    engine = pyttsx3.init()

    # Tell the text-to-speech engine to say the provided text. It's like giving it a script to read.
    engine.say(text)

    # Make the program wait until the text-to-speech engine has finished speaking.
    # This ensures that the computer doesn't move on to do something else before finishing speaking.
    engine.runAndWait()


## Listening to Your Voice:
Here, the computer listens to what you say. You need to press the 'shift' key and then speak.

In [None]:
# Define a function called 'get_voice_input' to listen to and recognize spoken words.
def get_voice_input():
    # Create a speech recognizer. This is like giving your computer the ability to understand spoken language.
    recognizer = sr.Recognizer()

    # Start using the computer's microphone as the source of audio.
    with sr.Microphone() as source:
        # Print a message to let the user know that the computer is ready to listen.
        print("Ready. Press the 'shift' key, then speak.")

        # Set how long the recognizer waits after the speaker stops talking before it considers the speech complete.
        recognizer.pause_threshold = 0.5  # waits for half a second of silence before ending

        # Wait for the user to press the 'shift' key to start speaking. This is like a 'start talking' button.
        keyboard.wait('shift')

        # Listen for the audio (speech) from the microphone.
        audio = recognizer.listen(source, timeout=None, phrase_time_limit=None)

        # Print a message to let the user know that the speech is being processed.
        print("Processing...")

    # Try to recognize what was said using Google's speech recognition service.
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        # If the speech was unclear, return a message saying it couldn't understand the audio.
        return "Could not understand audio"
    except sr.RequestError:
        # If the speech service could not be reached (for example, no internet connection), return a message to check the connection.
        return "Could not request results; check your internet connection"


## Putting It All Together:
This is where everything comes together. The computer keeps listening to you. When you speak, it takes a screenshot, asks ChatGPT about it, and then reads the description back to you.

In [None]:
# Define the main function that orchestrates the other functions in the program.
def main():
    try:
        # Start an infinite loop. This means the program will keep running until we tell it to stop.
        while True:
            # Get a voice command from the user and convert it to lowercase.
            command = get_voice_input().lower()

            # Check if the command is to exit or quit the program.
            if 'exit' in command or 'quit' in command:
                break  # If the command is to exit or quit, break out of the loop and end the program.
            else:
                # If the command is not to exit, then take a screenshot.
                image_path = capture_screenshot()

                # Analyze the screenshot and get a description of it.
                description = analyze_image(image_path)

                # Read out the description using the text-to-speech function.
                narrate_text(description)
    except KeyboardInterrupt:
        # If there is a keyboard interrupt (like pressing Ctrl+C), pass and do nothing.
        pass

# Check if this script is being run directly (and not being imported elsewhere).
if __name__ == '__main__':
    # If the script is being run directly, call the main function to start the program.
    main()
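
In `main` as written, the two "Could not ..." messages returned by `get_voice_input` are treated like any other command, so a failed recognition still triggers a screenshot and a request. One way to handle that, shown here only as a sketch, is a small function that decides what the loop should do next. The name `decide_action` and the labels `"retry"`, `"exit"`, and `"describe"` are made up for this example.

```python
def decide_action(command):
    """Decide what the main loop should do with a recognized command (hypothetical helper)."""
    text = command.lower()
    # Both failure messages returned by get_voice_input start with "Could not".
    if text.startswith("could not"):
        return "retry"
    if "exit" in text or "quit" in text:
        return "exit"
    return "describe"

for spoken in ["Could not understand audio", "please quit", "what is on my screen"]:
    print(spoken, "->", decide_action(spoken))
```

Inside the loop, `"retry"` would mean going back to listening, `"exit"` would mean `break`, and `"describe"` would run the screenshot, analysis, and narration steps.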


### Note
Be Patient:
Sometimes, the computer might not understand you, or the internet might be slow. Just try again.

That's pretty much it! You've got a smart helper right in your computer that can see and speak, thanks to this program. Isn't technology amazing? 🌟