&copy; 2024 by Deitel & Associates, Inc. All Rights Reserved. https://deitel.com

<hr/>

In [None]:
%%html
<style>
    h1 {color:#BB0000}
    h2 {color:purple}
    h3 {color:#0099ff}
    hr {    
        border: 0;
        height: 3px;
        background: #333;
        background-image: linear-gradient(to right, #ccc, black, #ccc);
    }
</style>

In [None]:
# enable high-res images in notebook 
%config InlineBackend.figure_format = 'retina'

# What is Generative AI?
* AI that creates content
    * Text, images, audio, video, music, poetry, code, ...
    * Makes content creation accessible to everyone
* Can be tailored to specific requirements
* References
    * https://en.wikipedia.org/wiki/Generative_artificial_intelligence
    * https://www.techtarget.com/searchenterpriseai/definition/generative-AI
    * https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-generative-ai

<hr/>

# What is a Large Language Model (LLM)?
* AI that understands/generates natural language 
* "Large" — Neural nets with billions, and now trillions, of parameters
    * As neural nets learn, they tune parameters in an effort to produce better results
    * Parameters help models understand/generate language patterns
* Trained on massive amounts of text
    * books, articles, the Internet, code, ...
* Great at natural language tasks
    * Translation, summarization, answering questions, sentiment analysis, creative writing, ...
    * Also generating code, explaining code, finding errors in code, ...
* Can generate text that is often hard to distinguish from human writing
* Some are retrained or fine-tuned over time to improve their effectiveness
* Most LLMs are built on the transformer architecture; OpenAI's GPT (generative pretrained transformer) models are one example
* References
    * https://en.wikipedia.org/wiki/Large_language_model
    * https://www.techtarget.com/whatis/definition/large-language-model-LLM
    * https://machinelearningmastery.com/what-are-large-language-models/
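
To make "tuning parameters" concrete, here is a minimal sketch with a single parameter. It is a toy illustration, not how an LLM is actually trained: the data, the learning rate and the function name `fit_weight` are all assumptions chosen for this example.

```python
def fit_weight(xs, ys, learning_rate=0.01, steps=500):
    """Tune one parameter w so that w * x approximates y (toy example)."""
    w = 0.0  # start with an uninformed guess
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w
        gradient = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * gradient  # nudge w to reduce the error
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]  # generated by y = 3x
w = fit_weight(xs, ys)
print(round(w, 3))
```

The loop repeatedly adjusts `w` in the direction that shrinks the error, so `w` ends up very close to 3. An LLM applies the same idea to billions of parameters at once.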

<hr/>

# What is a Transformer? 
* Neural network that processes sequential data, like text
* Can understand/recognize relationships between distant words in a sentence 
* Also recognizes the relative importance of words 
* References for Transformers
    * https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)  
    * https://blogs.nvidia.com/blog/what-is-a-transformer-model/
    * https://towardsdatascience.com/transformers-141e32e69591

<hr/>

# What is Self-Attention? 
* Key aspect of transformers
* When you try to understand a word in a sentence, words around it (its context) can help you understand the meaning
    * "good" by itself is positive, but preceding it with "not" makes the context negative
* Transformers use self-attention to understand each word by looking at other words around it
    * Words are not equally important 
    * Self-attention weighs which words to pay attention to when trying to understand each word
* Unlike humans, transformers can consider massive numbers of words in parallel to understand how words are connected
* References for Self-Attention 
    * https://towardsdatascience.com/illustrated-self-attention-2d627e33b20a
    * https://en.wikipedia.org/wiki/Attention_(machine_learning)
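
The weighting idea above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions: the two-dimensional word vectors are made up, and the queries, keys and values are the input vectors themselves, whereas a real transformer learns separate projection matrices for each.

```python
import math


def softmax(scores):
    """Convert raw scores to weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attention(vectors):
    """Simplified self-attention with no learned projections."""
    scale = math.sqrt(len(vectors[0]))
    outputs, all_weights = [], []
    for query in vectors:
        # Score this token against every token in the sequence
        scores = [dot(query, key) / scale for key in vectors]
        weights = softmax(scores)
        # Blend all token vectors according to the attention weights
        blended = [sum(w * v[i] for w, v in zip(weights, vectors))
                   for i in range(len(query))]
        outputs.append(blended)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical embeddings for the tokens "not", "good", "today"
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(vectors)
for row in weights:
    print([round(w, 2) for w in row])
```

Each printed row shows how much one token attends to every token in the sequence. With these made-up vectors, "not" attends more to "good" than to "today" because their vectors are more similar.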

<hr/>

# Tokens, Context and Context Windows 
* Token 
    * Unit of processing in language models
    * Words, parts of words, punctuation
    * Tokenization converts text into a format the model can process
    * Each token is represented by a numerical value
    * Models typically have token limits
* Context
    * Sequence of tokens that comes before (and sometimes after) a word/phrase
    * Understanding context helps models generate more accurate responses
* Context window
    * Maximum range of tokens (both input and output) model can consider at once
    * Larger == more context to the conversation == more relevant/coherent responses
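
The sketch below illustrates tokens, token IDs and a context window with a toy tokenizer. It is an assumption-laden simplification: production models generally use subword tokenizers rather than this regular expression, and the helper names `tokenize`, `encode` and `fit_context_window` are hypothetical.

```python
import re


def tokenize(text):
    """Split text into word and punctuation tokens (toy tokenizer)."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def encode(tokens, vocabulary):
    """Map each token to an integer ID, growing the vocabulary as needed."""
    return [vocabulary.setdefault(token, len(vocabulary))
            for token in tokens]


def fit_context_window(token_ids, window_size):
    """Keep only the most recent tokens that fit in the window."""
    if window_size <= 0:
        return []
    return token_ids[-window_size:]


vocabulary = {}
tokens = tokenize('Today is a beautiful day! Tomorrow looks like bad weather.')
token_ids = encode(tokens, vocabulary)
print(tokens)
print(token_ids)
print(fit_context_window(token_ids, window_size=5))
```

With a window of five tokens, only the end of the second sentence survives, which shows why a small context window can cause a model to lose track of earlier text.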

<hr/>

# Prompts and Prompt Engineering
* Poor prompts lead to poor results: "garbage in, garbage out"
* Prompt engineering is the process of designing prompts to obtain the best responses
* OpenAI provides various strategies
    * https://platform.openai.com/docs/guides/prompt-engineering
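
Two commonly recommended tactics are stating requirements explicitly and using delimiters to mark the text to process. The sketch below shows one way to assemble such a prompt. The function name `build_prompt` and the layout it produces are assumptions for illustration, not a format prescribed by OpenAI.

```python
def build_prompt(task, constraints, source_text):
    """Assemble a prompt with explicit instructions and delimited input."""
    lines = [task, '', 'Requirements:']
    lines += [f'- {constraint}' for constraint in constraints]
    # Triple quotes mark where the text to process begins and ends
    lines += ['', 'Text:', f'"""{source_text}"""']
    return '\n'.join(lines)


prompt = build_prompt(
    task='Summarize the text below.',
    constraints=['Use at most two sentences.',
                 'Write for a 10th-grade reading level.'],
    source_text='Transformers process sequences using self-attention.')
print(prompt)
```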

<hr/>

# Issues with Generative AI
* Ethics — e.g., misinformation, realistic images/videos
* Biases
* Hallucinations 
* Copyrights
    * NYT Sues OpenAI: https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html
* Picks up human traits? 
    * ChatGPT gets lazy (“winter break hypothesis”): https://arstechnica.com/information-technology/2023/12/is-chatgpt-becoming-lazier-because-its-december-people-run-tests-to-find-out/
* ...

<hr/>

# OpenAI 
* https://openai.com/
* One of many generative AI providers 
* Creator of ChatGPT and DALL·E
* APIs for
    * Audio: speech-to-text, text-to-speech, translation
    * Chat (ChatGPT) 
    * Fine-tuning: tailoring models to your own data
    * Images (DALL·E)
    * Models: lists and describes the available models
    * Moderations: checks for content policy violations
    * Assistants: create assistants that can perform step-by-step tasks
    * more...
* OpenAI Cookbook — loaded with examples: https://cookbook.openai.com/
* OpenAI Docs — tutorials, guides, ...: https://platform.openai.com/docs/overview
* OpenAI API Reference: https://platform.openai.com/docs/api-reference 

<hr/>

# Installing the OpenAI Python Module
* `pip install --upgrade openai`
    * Requires Python 3.7.1 or higher 

<hr/>

# Get an OpenAI Developer Account
* Signup: https://platform.openai.com/signup
* Pricing — small free credit depending on location 
> https://openai.com/pricing
* Rate Limits 
> https://platform.openai.com/docs/guides/rate-limits/rate-limits
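
A common way to cope with rate limits is to retry a failed request after an exponentially growing delay. The sketch below is a generic helper, not part of the `openai` library. The name `call_with_backoff`, its parameters and the `flaky` demo function are assumptions for illustration.

```python
import random
import time


def call_with_backoff(func, max_retries=5, base_delay=1.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func(), retrying with exponentially growing delays."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries, so surface the error
            # Double the delay on each attempt and add random jitter
            delay = base_delay * 2 ** attempt * (1 + random.random())
            sleep(delay)


# Demo: a hypothetical function that fails twice, then succeeds
attempts = {'count': 0}


def flaky():
    attempts['count'] += 1
    if attempts['count'] < 3:
        raise RuntimeError('simulated rate limit')
    return 'ok'


delays = []  # record the delays instead of actually sleeping
result = call_with_backoff(flaky, sleep=delays.append)
print(result, len(delays))
```

With the `openai` library, passing `retry_on=(openai.RateLimitError,)` would restrict retries to rate-limit errors. That exception name comes from version 1.x of the library, so check it against the version you have installed.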

<hr/>

## OpenAI Developer API Key
* While signed into your account
    * Go to https://platform.openai.com/docs/overview
    * Hover over the icons at the left of the page and click the **API keys** icon (looks like a padlock)
    * Click **Create new secret key**
    * Optionally name your key
    * Click **Create secret key**
    * Copy the lengthy alphanumeric key
    * Follow instructions for your platform at https://platform.openai.com/docs/quickstart to store the API key in an environment variable
    * **Restart your command line before launching IPython or JupyterLab**
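
Before creating a client, it can help to confirm that the environment variable is visible to Python. The following is a minimal sketch. The helper names `get_api_key` and `mask_key` are hypothetical, and the masking step is simply a precaution so the full key is never displayed in a notebook.

```python
import os


def get_api_key(variable_name='OPENAI_API_KEY'):
    """Return the API key from the environment or raise a clear error."""
    key = os.environ.get(variable_name, '').strip()
    if not key:
        raise RuntimeError(
            f'{variable_name} is not set. Store your key in that '
            'environment variable, then restart your command line.')
    return key


def mask_key(key, visible=4):
    """Hide all but the last few characters so the key is safe to print."""
    return '*' * max(len(key) - visible, 0) + key[-visible:]


try:
    print(mask_key(get_api_key()))
except RuntimeError as error:
    print(error)
```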

<hr/>

# Some of OpenAI's Models with APIs

| MODEL | DESCRIPTION |
|-------|-------------|
| GPT-4 & GPT-4 Turbo | Latest models. Understand/generate natural language or code. |
| GPT-3.5 | Previous models. Understand/generate natural language or code. |
| DALL·E | Generates/edits images in response to natural language prompts. |
| TTS | Converts text to speech. |
| Whisper | Transcribes and translates audio. |
| Embeddings | Converts text into numerical form. |
| Moderation | Detects sensitive/unsafe text. |

<hr/>

# Speech-to-Text, Text-to-Speech, Language Translation
* Based on OpenAI's tutorial: Creating an automated meeting minutes generator with Whisper and GPT-4
> https://platform.openai.com/docs/tutorials/meeting-minutes

## Importing the Modules for This Demo

In [None]:
from openai import OpenAI 
from docx import Document # for creating a Microsoft Word docx file

## Creating the `OpenAI` Client Object
* Provides access to the OpenAI APIs

In [None]:
client = OpenAI() 

* Assumes your API key is in the environment variable `OPENAI_API_KEY`
* If you used a different environment variable name, `import os` and replace the preceding statement with
> `client = OpenAI(api_key=os.environ.get("CUSTOM_ENV_NAME"))`

## `speech_to_text` Function
* Opens file at `audio_path` for reading in binary mode (`'rb'`)
* Calls OpenAI `client` object's `audio.transcriptions.create` method 
    * passes `audio_file` to OpenAI's `whisper-1` model to create transcript

In [None]:
def speech_to_text(audio_path):
    with open(audio_path, 'rb') as audio_file:
        transcript = client.audio.transcriptions.create(
            model='whisper-1', file=audio_file)
    return transcript.text 

In [None]:
transcript = speech_to_text('deep_learning_intro.m4a')

In [None]:
transcript

## `create_abstract` Function
* `chat.completions.create` method responds to prompts, similar to interacting directly with ChatGPT through its web interface
* `model` specifies OpenAI model that will formulate a response
* Each item in `messages` list is a dictionary containing two keys
    * `role`: `system`, `user` or `assistant`
    * `content`: Prompt passed to the OpenAI model
* Use `role`: `system` when prompting the model with details to help it perform a task
* Use `role`: `user` when prompting for the task to perform
* For more details about the message formats: https://cookbook.openai.com/examples/how_to_format_inputs_to_chatgpt_models
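
The message structure described above can be sketched as a small helper that builds the `messages` list. The function name `build_messages` and its optional `history` parameter are assumptions for illustration. The helper only builds the list and does not call the API.

```python
def build_messages(system_prompt, user_content, history=None):
    """Return a messages list in the chat completions format."""
    messages = [{'role': 'system', 'content': system_prompt}]
    # Earlier turns (if any) go between the system and new user message
    messages.extend(history or [])
    messages.append({'role': 'user', 'content': user_content})
    return messages


messages = build_messages(
    system_prompt='You are an expert in sentiment analysis.',
    user_content='Today is a beautiful day!',
    history=[
        {'role': 'user', 'content': 'Rate this: "Bad weather."'},
        {'role': 'assistant', 'content': 'Negative.'},
    ])

for message in messages:
    print(f"{message['role']:>9}: {message['content']}")
```

The resulting list can be passed as the `messages` argument of `client.chat.completions.create`.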

In [None]:
def create_abstract(transcript):
    response = client.chat.completions.create(
        model='gpt-4',
        messages=[
            {'role': 'system',
             'content': """Given the following transcript of a 
                 technical presentation, create a summary abstract 
                 that is concise, clear, and written in a direct 
                 style suitable for a 10th-grade reading level. 
                 Focus on the key points of the presentation without 
                 referring to the speaker. Avoid using prepositional 
                 phrases and unnecessary words. Aim for straightforward 
                 sentence structures. Capture the presentation's 
                 essence, enabling a person to understand the presentation 
                 without having to read the full transcript. Format the 
                 abstract in a single, well-structured paragraph."""},
            {'role': 'user',
             'content': transcript}
        ]
    )
    # openai 1.x returns objects with attributes; older versions used
    # dictionary-style access: response['choices'][0]['message']['content']
    return response.choices[0].message.content

In [None]:
summary = create_abstract(transcript)

In [None]:
summary

## `get_key_points` Function
* Prompt explains how GPT-4 should extract transcript's key points

In [None]:
def get_key_points(transcript):
    response = client.chat.completions.create(
        model='gpt-4',
        messages=[
            {'role': 'system',
             'content': """Given the following transcript of a 
                 technical presentation, identify and list the 
                 presentation's key points. Keep your writing 
                 concise, direct, and suitable for a 10th grade 
                 reading level. Avoid unnecessary words and excessive 
                 use of prepositional phrases. Aim for clear, 
                 straightforward sentence structures. Present this 
                 information as a numbered list, each point summarizing 
                 a separate key aspect of the presentation."""},
            {'role': 'user',
             'content': transcript}
        ]
    )
    return response.choices[0].message.content

In [None]:
key_points = get_key_points(transcript)
print(key_points)

## `analyze_sentiment` Function
* Prompt explains how GPT-4 should analyze transcript's sentiment

In [None]:
def analyze_sentiment(transcript):
    response = client.chat.completions.create(
        model='gpt-4',
        messages=[
            {'role': 'system',
             'content': """You are an expert in sentiment analysis. 
                 Analyze the following presentation transcript, then 
                 explain whether the sentiment is positive, negative, 
                 or neutral. Explain your analysis."""},
            {'role': 'user',
             'content': transcript}
        ]
    )
    return response.choices[0].message.content

In [None]:
sentiment = analyze_sentiment(transcript)
sentiment

## `save_as_docx` Function
* Saves a dictionary of key-value pairs in `docx` format
    * Used by Microsoft Word 
    * Supported by other word processors like Google Docs
* Document headings are keys in `data_dict`
    * Output as level 1 heading
* Document contents are output as paragraphs

In [None]:
def save_as_docx(data_dict, filename):
    document = Document() 
    
    for heading, content in data_dict.items():
        document.add_heading(heading, level=1)
        document.add_paragraph(content)
        
    document.save(filename)

In [None]:
data_dict = {
    'Presentation Overview': summary,
    'Key Points': key_points,
    'Sentiment Analysis': sentiment
}

In [None]:
save_as_docx(data_dict, 'presentation_summary.docx')

## `text_to_speech` Function
* Synthesizes speech using OpenAI Audio API
* Model `'tts-1'` supports 6 voices and 57 languages
    * Voices optimized for English, but can speak other languages
* Supports streaming audio with chunked transfer encoding
* MP3 is default format
    * Also supports Opus (Internet streaming), AAC (compressed), FLAC (lossless)
* Can't control emotion (yet)
> "There is no direct mechanism to control the emotional output of the audio generated. Certain factors may influence the output audio like capitalization or grammar but our internal tests with these have yielded mixed results." (https://platform.openai.com/docs/guides/text-to-speech/faq)
* For more info, see **Text to speech** guide:
> https://platform.openai.com/docs/guides/text-to-speech/voice-options

In [None]:
def text_to_speech(text, filename):
    response = client.audio.speech.create(
        model='tts-1',
        voice='onyx',
        input=text,
    )

    response.stream_to_file(f'{filename}.mp3')
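
The speech endpoint limits the length of its `input` text. The API reference gave the limit as 4096 characters at the time of writing, so treat that number as an assumption to verify. The sketch below splits longer text into chunks that stay under the limit, preferring sentence boundaries. The helper name `split_for_tts` is hypothetical.

```python
import re


def split_for_tts(text, max_chars=4096):
    """Split text into chunks of at most max_chars characters,
    preferring sentence boundaries."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], ''
    for sentence in sentences:
        # Hard-split any single sentence longer than the limit
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ''
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f'{current} {sentence}'.strip()
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in this chunk
        else:
            chunks.append(current)  # chunk is full, so start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks


chunks = split_for_tts('First sentence. Second one! A third?', max_chars=20)
print(chunks)
```

Each chunk could then be passed to `text_to_speech` with its own filename, for example `text_to_speech(chunk, f'part_{i}')` inside an `enumerate` loop.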

In [None]:
text_to_speech(summary, 'english_summary')

* `IPython` module provides various utilities, including an audio player

In [None]:
import IPython.display  # load the display submodule explicitly
IPython.display.Audio('english_summary.mp3')

## `translate` Function
* Text translation via chat completion
* `system` role's `content` prompt tells chat completion how to operate

In [None]:
def translate(text, language):
    response = client.chat.completions.create(
        model='gpt-4',
        messages=[
            {'role': 'system',
             'content': f"""You are an expert in natural language translation.
                 Translate the following text into {language}."""},
            {'role': 'user',
             'content': text}
        ]
    )
    return response.choices[0].message.content

## Text to Translate

In [None]:
text = 'Today is a beautiful day! Tomorrow looks like bad weather.'

## Translate to Spanish
* Automatically figures out source language

In [None]:
spanish_text = translate(text, 'Spanish')
spanish_text

In [None]:
chinese_text = translate(text, 'Chinese')
chinese_text

In [None]:
text_to_speech(spanish_text, 'Spanish')

In [None]:
IPython.display.Audio('Spanish.mp3')

In [None]:
text_to_speech(chinese_text, 'Chinese')

In [None]:
IPython.display.Audio('Chinese.mp3')

<hr/>

# Generating Python Code with the Chat Completions API

In [None]:
response = client.chat.completions.create(
    model='gpt-4',
    messages=[
        {'role': 'system', 
         'content': 'You are an expert Python programmer.'},
        {'role': 'user', 
         'content': """Write Python code that uses the word_cloud 
             Python library to generate a 1000 pixel by 1000 pixel 
             rainbow color word cloud from the top 200 words in the 
             file 'RomeoAndJuliet.txt' which is in the current folder. 
             Remove both modern English and old English stop words. 
             Use the mask image named mask_heart.png, which is also in 
             the current folder. Display the generated word cloud."""}
    ]
)

In [None]:
print(response.choices[0].message.content)
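
Chat models often wrap generated code in a Markdown fence and surround it with explanatory text. Assuming the reply follows that convention, the sketch below pulls out just the code. The helper name `extract_code_blocks` is hypothetical, and the pattern only matches fences explicitly labeled with the requested language.

```python
import re

FENCE = '`' * 3  # three backticks, built this way to avoid a literal fence


def extract_code_blocks(markdown_text, language='python'):
    """Return the contents of fenced code blocks labeled with language."""
    pattern = rf'{FENCE}{re.escape(language)}[ \t]*\n(.*?){FENCE}'
    return [block.strip() for block in
            re.findall(pattern, markdown_text, flags=re.DOTALL)]


# Hypothetical reply text standing in for a real model response
sample_reply = (
    'Here is the code:\n'
    f'{FENCE}python\nprint("hello")\n{FENCE}\n'
    'Run it in the folder that contains the files.')

blocks = extract_code_blocks(sample_reply)
print(blocks[0])
```

Review generated code before running it. The model can produce code that omits part of what the prompt requested.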

In [None]:
import os
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from wordcloud import WordCloud, STOPWORDS
from nltk.corpus import stopwords as sw
import nltk

# Install nltk 'stopwords' if not already installed
nltk.download('stopwords')

def make_word_cloud(filename, mask_image, max_words=200):
    with open(filename, 'r', encoding="utf-8") as f:
        text = f.read()

    stopwords = set(STOPWORDS)
    stopwords.update(sw.words('english'))

    mask = np.array(Image.open(mask_image))

    wc = WordCloud(stopwords=stopwords, max_words=max_words, mask=mask, contour_color='black',
                   contour_width=0.5, colormap='rainbow', background_color='white'
                   )

    wordcloud = wc.generate(text)

    plt.figure(figsize=[10,10])
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis("off")
    plt.show()

make_word_cloud('RomeoAndJuliet.txt', 'mask_heart.png', 200)

<hr/>

# Generating an Image with the Image API Using DALL·E 3
* Generated Sunday using the Image API's `'dall-e-3'` model
<img src="HavaneseDog.png" alt="Drawing" style="width: 300px;"/>

In [None]:
response = client.images.generate(
    model='dall-e-3',
    prompt="""Havanese dog as a Japanese anime character  
              in neon colors against a black background""",
    quality='hd'
)

In [None]:
response

In [None]:
response.data[0].url
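
The returned URL is temporary, so you may want to save a local copy of the image soon after generating it. Here is a minimal sketch using only the standard library. The helper name `save_image` and the output filename in the usage note are assumptions.

```python
import shutil
import urllib.request


def save_image(url, filename):
    """Download the resource at url and write it to filename."""
    with urllib.request.urlopen(url) as response, \
            open(filename, 'wb') as image_file:
        # Copy in chunks rather than loading the whole image into memory
        shutil.copyfileobj(response, image_file)
    return filename
```

For the response above, a call such as `save_image(response.data[0].url, 'havanese_anime.png')` would store the image in the current folder.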

In [None]:
response = client.images.generate(
    model='dall-e-3',
    prompt="""Havanese dog in the style of Vincent Van Gogh""",
    quality='hd'
)

In [None]:
response

In [None]:
response = client.images.generate(
    model='dall-e-3',
    prompt="""Havanese dog in the style of Leonardo da Vinci""",
    quality='hd'
)

In [None]:
response

<hr/>

# Other Cool Stuff To Check Out
* ChatGPT as a Data Analytics Tool ― https://mitsloanedtech.mit.edu/ai/tools/data-analysis/how-to-use-chatgpts-advanced-data-analysis-feature/
* OpenAI Playground ― https://platform.openai.com/playground
* 7 Best AI Art Generators ― https://www.techrepublic.com/article/best-ai-art-generators/
* Top 20 ChatGPT Prompts For Software Developers ― https://www.geeksforgeeks.org/chatgpt-prompts-for-software-developers/
* OpenAI Developer Forum ― https://community.openai.com/
* Google Project Gemini ― https://blog.google/technology/ai/google-gemini-ai/
* Vectorizer.ai — Converts generated images to vector graphics ― https://vectorizer.ai
* "A Very Gentle Introduction to Large Language Models without the Hype" by Mark Riedl, Georgia Tech Professor ― https://mark-riedl.medium.com/a-very-gentle-introduction-to-large-language-models-without-the-hype-5f67941fa59e

<hr/>