<a href="https://colab.research.google.com/github/wendywqz/ML/blob/main/GenAI_Inferring.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Setup

import openai
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())  # read local .env file

openai.api_key = os.getenv("OPENAI_API_KEY")

In [None]:
def get_completion(prompt, model="gpt-3.5-turbo"):
  messages = [{"role": "user", "content": prompt}]
  response = openai.ChatCompletion.create(
    model=model,
    messages=messages,
    temperature=0,  # this is the degree of randomness of the model's output
  )
  return response.choices[0].message["content"]

# Product review text

In [None]:
lamp_review = """
Needed a nice lamp for my bedroom, and this one had \
additional storage and not too high of a price point. \
Got it fast.  The string to our lamp broke during the \
transit and the company happily sent over a new one. \
Came within a few days as well. It was easy to put \
together.  I had a missing part, so I contacted their \
support and they very quickly got me the missing piece! \
Lumina seems to me to be a great company that cares \
about their customers and products!!
"""

# Sentiment (positive/negative)

In [None]:
prompt = f"""
What is the sentiment of the following product review,
which is delimited with triple backticks?

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)

【Output】

The sentiment of the review is positive. The reviewer is satisfied with the lamp they purchased, mentioning the additional storage, reasonable price, fast delivery, good customer service, and ease of assembly. They also praise the company for caring about their customers and products.

In [None]:
prompt = f"""
What is the sentiment of the following product review,
which is delimited with triple backticks?

Give your answer as a single word, either "positive" \
or "negative".

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)

# Then the output is 'Positive'
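The single-word answer is easier to use in later code once it is normalized. The following is a minimal sketch, assuming the model replies with one word as requested; `normalize_sentiment` is a hypothetical helper and not part of the original notebook.

```python
def normalize_sentiment(response_text):
    """Return 'positive' or 'negative' from a one-word model reply."""
    # Trim whitespace, surrounding quotes, and a trailing period, then lower-case.
    label = response_text.strip().strip(".\"'").lower()
    if label not in ("positive", "negative"):
        # Assumption: anything else means the model ignored the format request.
        raise ValueError(f"Unexpected sentiment label: {response_text!r}")
    return label


print(normalize_sentiment("Positive"))
```

Raising an error on any other reply makes a format violation visible instead of letting it pass silently into later steps.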

# Identify types of emotions

In [None]:
prompt = f"""
Identify a list of emotions that the writer of the \
following review is expressing. Include no more than \
five items in the list. Format your answer as a list of \
lower-case words separated by commas.

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)

# Output 2 below was produced by appending one more instruction to the prompt: "Justify the reasons why you leverage such words."

【Output 1】happy, satisfied, grateful, impressed, content

【Output 2】happy, satisfied, grateful, impressed, content

The writer expresses happiness and satisfaction with the lamp they purchased, as well as gratitude towards the company for their quick response to issues. They are impressed with the customer service and overall content with their experience.
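Because the prompt asks for lower-case words separated by commas, the reply can be turned into a Python list. This is a small sketch, assuming the emotions appear on the first line of the reply (as in both outputs above); `parse_emotions` is a hypothetical helper name.

```python
def parse_emotions(response_text, limit=5):
    """Split a comma-separated model reply into a clean list of emotions."""
    # Use only the first line, in case the model appends a justification.
    first_line = response_text.strip().split("\n")[0]
    emotions = [word.strip().lower() for word in first_line.split(",")]
    # Drop empty entries and enforce the five-item cap requested in the prompt.
    return [word for word in emotions if word][:limit]


sample = "happy, satisfied, grateful, impressed, content"
print(parse_emotions(sample))
```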

# Identify anger

In [None]:
prompt = f"""
Is the writer of the following review expressing anger? \
The review is delimited with triple backticks. \
Give your answer as either yes or no.

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)

# Output is 'No'.

# Extract product and company name from customer reviews

In [None]:
prompt = f"""
Identify the following items from the review text:
- Item purchased by reviewer
- Company that made the item

The review is delimited with triple backticks. \
Format your response as a JSON object with \
"Item" and "Brand" as the keys.
If the information isn't present, use "unknown" \
as the value.
Make your response as short as possible.

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)

【Output】

{
  "Item": "lamp",
  "Brand": "Lumina"
}

# Doing multiple tasks at once

In [None]:
prompt = f"""
Identify the following items from the review text:
- Sentiment (positive or negative)
- Is the reviewer expressing anger? (true or false)
- Item purchased by reviewer
- Company that made the item

The review is delimited with triple backticks. \
Format your response as a JSON object with \
"Sentiment", "Anger", "Item" and "Brand" as the keys.
If the information isn't present, use "unknown" \
as the value.
Make your response as short as possible.
Format the Anger value as a boolean.

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)

【Output】


{
    "Sentiment": "positive",
    "Anger": false,
    "Item": "lamp",
    "Brand": "Lumina"
}
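Asking for JSON pays off when the reply is loaded into a Python dictionary. The sketch below assumes the reply is valid JSON with the four requested keys; `parse_review_summary` is a hypothetical helper, and the sample string simply repeats the output shown above.

```python
import json


def parse_review_summary(response_text):
    """Load the model's JSON reply and check the fields the prompt asked for."""
    data = json.loads(response_text)
    expected_keys = ("Sentiment", "Anger", "Item", "Brand")
    missing = [key for key in expected_keys if key not in data]
    if missing:
        raise ValueError(f"Missing keys in reply: {missing}")
    # JSON true/false become Python True/False; "unknown" is also allowed.
    if not isinstance(data["Anger"], bool) and data["Anger"] != "unknown":
        raise ValueError("'Anger' should be a boolean or \"unknown\"")
    return data


sample = '{"Sentiment": "positive", "Anger": false, "Item": "lamp", "Brand": "Lumina"}'
summary = parse_review_summary(sample)
print(summary["Anger"], summary["Brand"])
```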

# Infer 5 topics

In [None]:
story = """
In a recent survey conducted by the government,
public sector employees were asked to rate their level
of satisfaction with the department they work at.
The results revealed that NASA was the most popular
department with a satisfaction rating of 95%.

One NASA employee, John Smith, commented on the findings,
stating, "I'm not surprised that NASA came out on top.
It's a great place to work with amazing people and
incredible opportunities. I'm proud to be a part of
such an innovative organization."

The results were also welcomed by NASA's management team,
with Director Tom Johnson stating, "We are thrilled to
hear that our employees are satisfied with their work at NASA.
We have a talented and dedicated team who work tirelessly
to achieve our goals, and it's fantastic to see that their
hard work is paying off."

The survey also revealed that the
Social Security Administration had the lowest satisfaction
rating, with only 45% of employees indicating they were
satisfied with their job. The government has pledged to
address the concerns raised by employees in the survey and
work towards improving job satisfaction across all departments.
"""

In [None]:
prompt = f"""
Determine five topics that are being discussed in the \
following text, which is delimited by triple backticks.

Make each item one or two words long.

Format your response as a list of items separated by commas.

Text sample: '''{story}'''
"""
response = get_completion(prompt)
print(response)

【Output】

1. Survey
2. Job satisfaction
3. NASA
4. Social Security Administration
5. Government pledge

In [None]:
response.split(sep=',')

【Output】


['1. Survey\n2. Job satisfaction\n3. NASA\n4. Social Security Administration\n5. Government pledge']
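The split on commas returns a single element here because the model answered with a numbered list rather than the comma-separated format the prompt requested. One possible way to handle both shapes is sketched below; `parse_topics` is a hypothetical helper and not part of the original notebook.

```python
import re


def parse_topics(response_text):
    """Return topics from a comma-separated, numbered, or bulleted reply."""
    # Split on commas and newlines so both formats are handled.
    parts = re.split(r"[,\n]", response_text)
    topics = []
    for part in parts:
        # Remove leading list markers such as "1. ", "2) ", "- " or "* ".
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", part).strip()
        if cleaned:
            topics.append(cleaned)
    return topics


numbered = (
    "1. Survey\n2. Job satisfaction\n3. NASA\n"
    "4. Social Security Administration\n5. Government pledge"
)
print(parse_topics(numbered))
```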

In [None]:
topic_list = [
    "nasa", "local government", "engineering", "employee satisfaction", "federal government"
]

# Make a news alert for certain topics

In [None]:
prompt = f"""
Determine whether each item in the following list of \
topics is a topic in the text below, which
is delimited with triple backticks.

Give your answers as follows:
item from the list: 0 or 1

List of topics: {",".join(topic_list)}

Text sample: '''{story}'''
"""
response = get_completion(prompt)
print(response)

【Output】

nasa: 1
local government: 0
engineering: 0
employee satisfaction: 1
federal government: 1

In [None]:
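# Build {topic: 0 or 1} from lines such as "nasa: 1"; skip blank or malformed lines
topic_dict = {
    i.split(': ')[0].strip(): int(i.split(': ')[1])
    for i in response.split(sep='\n')
    if ': ' in i
}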
if topic_dict['nasa'] == 1:
    print("ALERT: New NASA story!")


ALERT: New NASA story!

# Web text retrieval

In [3]:
import requests
from bs4 import BeautifulSoup

# Step 1: Fetch the web page
url = "https://www.npr.org/2025/01/23/nx-s1-5259822/u-s-colleges-saw-enrollment-rise-last-fall"  # Replace with the actual URL
response = requests.get(url)

# Step 2: Parse the HTML content
soup = BeautifulSoup(response.content, "html.parser")

reviews = []
for review in soup.find_all("div", class_="review-text"):
    reviews.append(review.get_text(strip=True))  # Extract and clean the text

print(reviews)

# NOTE: the result is an empty list.


[]


In [None]:
# The code above looks only for <div> elements with the class "review-text".
# This page has no elements with that class, so the list is empty.

import requests
from bs4 import BeautifulSoup

# Step 1: Fetch the web page
url = "https://www.npr.org/2025/01/23/nx-s1-5259822/u-s-colleges-saw-enrollment-rise-last-fall"  # Replace with the actual URL
response = requests.get(url)

# Step 2: Parse the HTML content
soup = BeautifulSoup(response.content, "html.parser")

# Step 3: Extract article content
article_content = []
for paragraph in soup.find_all("p"):  # <p> tags often contain the body text
    article_content.append(paragraph.get_text(strip=True))

# Step 4: Print the extracted content
print(article_content)


【Output】

['Elissa Nadworny', 'Students walk across the campus of the University of Maryland. A new round of data shows that college enrollment in the U.S. has surpassed pre-pandemic levels.The Washington Post/via Getty Imageshide caption', 'College enrollment in the U.S. rose for the first time last fall to surpass pre-pandemic levels, new figures out today show.', 'Across undergraduate and graduate programs, total enrollment rose 4.5 percent, or 817,000 students, according to the National Student Clearinghouse Research Center.', "The numbers provided welcome news to colleges worried about the Biden administration'sbotched revampof the federal student aid application known as FAFSA, and reports showing many Americans questioning thevalue of a college degree.", 'Among the incoming freshman class in the fall of 2024, enrollment increased 5.5 percent, or 130,000 students, the data shows.', 'The growth among freshmen "is driven by older first-year students, as 18-year-olds are still below their 2019 numbers," Doug Shapiro, the center\'s executive director, said in a statement.', 'The research center also corrected an error in data released last fall that mistakenly showed freshman enrollment had declined, Shapiro said.', 'The latest figures are a relief to higher education experts worried about a looming"demographic cliff"expected to bring enrollment declines in coming years. That could mean trouble for colleges in terms of lost revenue, and trouble for the economy by creating shortages of educated graduates.', '"The fact that students are both seeing the value in college and enrolling, I think, is really great news," said Tolani Birtton, an associate professor at the University of California, Berkeley, who studies higher education.', 'In the past few years of enrollment drops, many in higher education were asking the question, "Will it ever recover?" Britton said. 
"And what we\'ve seen is the answer to that in some ways is yes."', 'A particular bright spot in the new figures was at community colleges, which saw the biggestenrollment declines during the pandemic. Freshman enrollment at community colleges rose 7.1 percent last fall, while their overall enrollment rose 5.9 percent, or 325,000 students.', 'The positive trend across higher education is important because the U.S. economy is expected to create many more jobs "needing some type of credential to be able to do those jobs adequately," said Nicole Smith, a research professor and chief economist at the Georgetown University Center on Education and the Workforce.', "It's not just the lack of workers with credentials, Smith added. She worries about upcoming labor shortages due to retirements,especially in trade jobs. More students seeking degrees or certifications could help fill gaps across the labor market.", '"We are hoping that by increasing enrollment that a number of people will be better prepared to take those jobs in the future."', 'Sponsor Message', 'Become an NPR sponsor']

In [None]:
prompt = f"""
Determine five topics that are being discussed in the \
following text, which is delimited by triple backticks.

Make each item one or two words long.

Format your response as a list of items separated by commas.

Text sample: '''{article_content}'''
"""
response = get_completion(prompt)
print(response)

- College enrollment
- Freshman enrollment
- Higher education
- Demographic cliff
- Labor shortages
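The `article_content` list passed into the prompt above is inserted as a raw Python list and still contains non-article entries such as the byline, 'Sponsor Message', and 'Become an NPR sponsor'. Below is a rough sketch of one way to clean it before prompting. The `build_article_text` helper and its `min_words` threshold are assumptions for illustration, and a threshold like this can also drop short but genuine paragraphs.

```python
def build_article_text(paragraphs, min_words=5):
    """Join scraped <p> strings into one text block, dropping very short entries."""
    kept = []
    for paragraph in paragraphs:
        text = paragraph.strip()
        # Assumption: bylines and sponsor labels are only a few words long.
        if len(text.split()) < min_words:
            continue
        kept.append(text)
    return "\n".join(kept)


sample_paragraphs = [
    "Elissa Nadworny",
    "College enrollment in the U.S. rose for the first time last fall "
    "to surpass pre-pandemic levels, new figures out today show.",
    "Sponsor Message",
    "Become an NPR sponsor",
]
print(build_article_text(sample_paragraphs))
```

The resulting string could then be placed in the prompt instead of the list itself.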

In [None]:
response.split(sep=',')

【Output】

['- College enrollment\n- Freshman enrollment\n- Higher education\n- Demographic cliff\n- Labor shortages']

In [None]:
# Set the target topics in the list. NOTE: even when some topic words are incomplete,
# ChatGPT can fill in the missing letters.

topic_list = [
    "higher education", "local Labor", "engineering",
    "atisfaction employee ", "ederal government"
]

In [None]:
prompt = f"""
Determine whether each item in the following list of \
topics is a topic in the text below, which
is delimited with triple backticks.

Give your answer as follows:
item from the list: 0 or 1

List of topics: {", ".join(topic_list)}

Text sample: '''{article_content}'''
"""
response = get_completion(prompt)
print(response)

【Output】


- item from the list: 1
- higher education: 1
- local Labor: 0
- engineering: 0
- satisfaction employee: 0
- federal government: 1
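This reply differs from the earlier one in two ways: each line starts with a bullet, and the model echoed the template line "item from the list". A parser that tolerates both is sketched below. `parse_topic_flags` and its `ignore` parameter are hypothetical names, and the sample text repeats the output above.

```python
def parse_topic_flags(response_text, ignore=("item from the list",)):
    """Map each topic to 0 or 1 from lines such as "- nasa: 1"."""
    flags = {}
    for line in response_text.splitlines():
        # Split on the last colon; blank lines yield an empty separator.
        name, sep, value = line.rpartition(":")
        value = value.strip()
        if not sep or value not in ("0", "1"):
            continue
        # Remove bullet markers and normalize case before storing.
        name = name.strip().lstrip("-* ").strip().lower()
        if name and name not in ignore:
            flags[name] = int(value)
    return flags


sample = (
    "- item from the list: 1\n- higher education: 1\n- local Labor: 0\n"
    "- engineering: 0\n- satisfaction employee: 0\n- federal government: 1"
)
flags = parse_topic_flags(sample)
if flags.get("higher education") == 1:
    print("ALERT: New higher education story!")
```

Keys are lower-cased, so "local Labor" is stored as "local labor".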