# Homework 3


Imagine you're a tech journalist at a major publication. Your editor has just greenlit your next big feature: a compelling deep-dive into OpenAI and Nvidia — two titans in the artificial intelligence space that have shaped the AI boom in dramatically different ways.

On one hand, OpenAI is at the forefront of generative AI and large language models, capturing public imagination with ChatGPT and partnerships with Microsoft. On the other hand, Nvidia powers the hardware backbone of this revolution, producing the GPUs that fuel AI training at scale — and recently became one of the most valuable companies in the world.

Beyond their core innovations, these companies also differ in how they engage with the public. Who is more influential on social media? Which brand has better engagement? Who reaches a broader audience through their content?

In this assignment, you will:

- Collect real Twitter data using the itom6219 package
- Analyze and compare the social presence, content strategy, engagement metrics, and popularity of OpenAI and Nvidia
- Apply techniques like TF-IDF vectorization, topic modeling, and regression analysis
- Draw data-driven conclusions on how these two companies communicate and influence in the digital space



## Step 1: Install ITOM6219 Package.
**You will need to install the itom6219 package to collect Twitter data.**

In [None]:
!pip install --upgrade --force-reinstall git+https://github.com/tantantan12/itom6219.git > /dev/null 2>&1

## Step 2: Account Summary Comparison

You will collect and analyze Twitter account data using the user_info function.

- Write code to collect information about the Twitter account of nvidia using the user_info function from the itom6219 package.
- Print the account information of OpenAI (saved as user1) and nvidia (saved as user2).
- What are the data types of user1 and user2?
- Print the column names of user1.
- Print the followers_count for user1 (OpenAI).
- Print the followers_count for user2 (nvidia).
- Generate a boolean variable OpenAI_greater_than_nvidia that is True if OpenAI has more followers and False otherwise.
- Retrieve the user ids of OpenAI and nvidia. They are stored in the "id" column.

*Complete the above requirements using Python.*
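Comparing a column of one one-row DataFrame with the same column of another returns a one-row Series, not a plain Python boolean. The sketch below shows one possible way to extract scalars first. It uses made-up stand-in frames (`user_a`, `user_b` and their values are hypothetical) that only mimic the column names returned by user_info.

```python
import pandas as pd

# Hypothetical one-row frames mimicking the user_info output (values are made up).
user_a = pd.DataFrame({
    "username": ["account_a"],
    "id": ["111"],
    "public_metrics.followers_count": [4_000_000],
})
user_b = pd.DataFrame({
    "username": ["account_b"],
    "id": ["222"],
    "public_metrics.followers_count": [2_500_000],
})

# .iloc[0] pulls the single value out of the one-row column.
followers_a = int(user_a["public_metrics.followers_count"].iloc[0])
followers_b = int(user_b["public_metrics.followers_count"].iloc[0])

# Comparing two scalars gives a plain Python bool.
a_greater_than_b = followers_a > followers_b
print(type(a_greater_than_b), a_greater_than_b)

# The same .iloc[0] pattern retrieves the id stored in the "id" column.
id_a = user_a["id"].iloc[0]
id_b = user_b["id"].iloc[0]
print(id_a, id_b)
```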

In [None]:
# Make sure you have run the cell above to install the itom6219 package.
import os
from google.colab import userdata
os.environ["BEARER_TOKEN"] = userdata.get('BEARER_TOKEN')

from itom6219 import user_info, user_tweets, user_tweets_all


user1=user_info(["OpenAI"])

user2=user_info(["nvidia"])  ### Write down your code to collect information about the Twitter account of nvidia.

# Print account information of OpenAI and nvidia.
print(user1)

# What is the data type of user1 and user2?
print(type(user1))
# Print the column names of user1
print(user1.columns)
# print the followers_count for OpenAI
user1['public_metrics.followers_count']
# print the followers_count for nvidia


# generate a boolean variable OpenAI_greater_than_nvidia, which takes the value of True if OpenAI has more followers and False otherwise.
user1['public_metrics.followers_count']>user2['public_metrics.followers_count']
# retrieve the user ids of OpenAI and nvidia. It is stored in the column of "id".

     name          id  verified username  \
0  OpenAI  4398626122      True   OpenAI   

                                         description  \
0  OpenAI’s mission is to ensure that artificial ...   

                 created_at  public_metrics.followers_count  \
0  2015-12-06T22:51:08.000Z                         4077612   

   public_metrics.following_count  public_metrics.tweet_count  \
0                               3                        1143   

   public_metrics.listed_count  public_metrics.like_count  \
0                        23029                        750   

   public_metrics.media_count  
0                         324  
<class 'pandas.core.frame.DataFrame'>
Index(['name', 'id', 'verified', 'username', 'description', 'created_at',
       'public_metrics.followers_count', 'public_metrics.following_count',
       'public_metrics.tweet_count', 'public_metrics.listed_count',
       'public_metrics.like_count', 'public_metrics.media_count'],
      dtype='object')


Unnamed: 0,public_metrics.followers_count
0,True


## Step 3: 100 Tweets Comparison

- Use the function user_tweets from the package itom6219 to retrieve 100 tweets generated by OpenAI and Nvidia. Save the result into "tweets_openAI" and "tweets_nvidia".

- Use function head() to display the first five rows of these two dataframes.

- Use function head(n) to display the first n rows. Allow n to be 10.

- Display the column names of the dataframes.

- How many rows and columns are in tweets_openAI and tweets_nvidia?

- Summarize tweets_openAI and tweets_nvidia using the describe function. How does the average public_metrics.impression_count compare between the two accounts?

- For OpenAI and Nvidia, respectively, display the text of the tweet with the highest public_metrics.bookmark_count and the text of the tweet with the highest public_metrics.impression_count.

- Generate a new column by using two existing columns to represent the ratio of likes over impression. Which tweet has the highest like ratio for OpenAI and Nvidia, respectively?

*Complete the above requirements using Python.*
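The row-selection and ratio steps above can be sketched on a small made-up table. The frame `tweets_demo` and its values are hypothetical; only the column names follow the user_tweets output shown in this notebook.

```python
import pandas as pd

# Toy data (made up) using the same column names as the user_tweets output.
tweets_demo = pd.DataFrame({
    "text": ["launch post", "research update", "event recap"],
    "public_metrics.like_count": [50, 400, 30],
    "public_metrics.bookmark_count": [5, 20, 90],
    "public_metrics.impression_count": [10_000, 40_000, 2_000],
})

# shape is a (rows, columns) tuple.
print(tweets_demo.shape)

# idxmax returns the index label of the row holding the column maximum.
top_bookmark_text = tweets_demo.loc[
    tweets_demo["public_metrics.bookmark_count"].idxmax(), "text"
]
top_impression_text = tweets_demo.loc[
    tweets_demo["public_metrics.impression_count"].idxmax(), "text"
]

# New column: likes divided by impressions, computed row by row.
tweets_demo["like_ratio"] = (
    tweets_demo["public_metrics.like_count"]
    / tweets_demo["public_metrics.impression_count"]
)
top_ratio_text = tweets_demo.loc[tweets_demo["like_ratio"].idxmax(), "text"]

print(top_bookmark_text, "|", top_impression_text, "|", top_ratio_text)
```

Note that the tweet with the most impressions need not be the one with the highest like ratio, which is why the ratio is computed as its own column.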

In [None]:
# The instructions below are based on OpenAI; repeat the process for Nvidia to answer the above questions.

# Use the function user_tweets from the package itom6219 to retrieve tweets generated by OpenAI. Save the result into "tweets_openAI".
tweets_openAI=user_tweets(["OpenAI"], exclude_replies=True, exclude_retweets=True)
tweets_nvidia=user_tweets(["nvidia"], exclude_replies=True, exclude_retweets=True)

# Use function head() to display the first five rows of tweets_openAI.

# Use function head(n) to display the first n rows of tweets_openAI. Allow n to be 10.

# Display the column names of tweets_openAI.

# How many rows and columns are in tweets_openAI?

# Summarize tweets_openAI using the describe function. What is the average public_metrics.impression_count?

# Display the text of the tweet with the highest public_metrics.bookmark_count and the text of the tweet with the highest public_metrics.impression_count.

# Generate a new column by using two existing columns to represent the ratio of likes over impression. Which tweet has the highest like ratio?

#tweets_openAI['like_ratio']=



In [None]:
tweets_openAI["like_ratio"]=tweets_openAI["public_metrics.like_count"]/tweets_openAI["public_metrics.impression_count"]

In [None]:
tweets_openAI.describe()

Unnamed: 0,public_metrics.retweet_count,public_metrics.reply_count,public_metrics.like_count,public_metrics.quote_count,public_metrics.bookmark_count,public_metrics.impression_count,like_ratio
count,99.0,99.0,99.0,99.0,99.0,99.0,99.0
mean,510.515152,353.131313,4326.666667,231.060606,693.616162,1217767.0,0.005112
std,1200.377257,621.122595,7066.033412,826.919188,1742.293783,3426476.0,0.002213
min,17.0,5.0,231.0,1.0,23.0,46108.0,0.001463
25%,62.5,64.0,882.0,12.5,78.5,189820.5,0.003423
50%,219.0,193.0,2137.0,46.0,240.0,455926.0,0.00487
75%,497.5,491.0,5560.0,194.5,566.0,987958.5,0.0063
max,11290.0,5752.0,61937.0,8025.0,15444.0,32788740.0,0.011121


In [None]:
tweets_nvidia["like_ratio"]=tweets_nvidia["public_metrics.like_count"]/tweets_nvidia["public_metrics.impression_count"]

In [None]:
tweets_nvidia.describe()

Unnamed: 0,public_metrics.retweet_count,public_metrics.reply_count,public_metrics.like_count,public_metrics.quote_count,public_metrics.bookmark_count,public_metrics.impression_count,like_ratio
count,77.0,77.0,77.0,77.0,77.0,77.0,77.0
mean,111.12987,18.051948,603.324675,27.727273,56.233766,136779.1,0.005073
std,218.599029,29.888783,1252.740107,89.07103,146.37487,289455.7,0.002107
min,7.0,0.0,55.0,0.0,1.0,16732.0,0.001089
25%,27.0,6.0,159.0,2.0,6.0,36452.0,0.003695
50%,47.0,9.0,251.0,5.0,14.0,45749.0,0.004632
75%,88.0,17.0,459.0,10.0,31.0,73216.0,0.006677
max,1538.0,166.0,9429.0,567.0,865.0,2035662.0,0.01067


## Step 4: Large Data Comparison

In the previous step, you were able to collect 100 tweets from OpenAI. Moving forward, use tweets_openAI.csv and tweets_nvidia.csv, which contain 1500 tweets each. Due to the API rate limit, this data is provided to you; you do not need to collect the data yourself.
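Because repeated collection can run into the rate limit, one common pattern is to cache collected data in a CSV file and read it back on later runs. The sketch below illustrates the idea under stated assumptions: `load_or_collect` and `fake_collect` are hypothetical helpers written for this example and are not part of itom6219.

```python
import os
import tempfile

import pandas as pd


def load_or_collect(csv_path, collect_fn):
    """Return cached data from csv_path; call collect_fn only if no cache exists."""
    if os.path.exists(csv_path):
        return pd.read_csv(csv_path)
    df = collect_fn()
    df.to_csv(csv_path, index=False)
    return df


# Stand-in for an API call (hypothetical); it counts how often it is invoked.
calls = {"n": 0}


def fake_collect():
    calls["n"] += 1
    return pd.DataFrame({
        "text": ["hello", "world"],
        "public_metrics.like_count": [3, 5],
    })


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tweets_demo.csv")
    first = load_or_collect(path, fake_collect)   # collects and writes the cache
    second = load_or_collect(path, fake_collect)  # reads the cache, no new call

print("collect calls:", calls["n"])
```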

In [None]:
#tweets_openAI=user_tweets_all(["OpenAI"],max_total=1500,exclude_replies=True, exclude_retweets=True)

#tweets_openAI.to_csv("tweets_openAI.csv", index=False)

#tweets_nvidia=user_tweets_all(["nvidia"],max_total=1500,exclude_replies=True, exclude_retweets=True)

#tweets_nvidia.to_csv("tweets_nvidia.csv", index=False)


Error fetching tweets for user nvidia: 429


In [None]:
import pandas as pd
df1=pd.read_csv('tweets_openAI.csv')
df2=pd.read_csv('tweets_nvidia.csv')

df1=df1[df1["public_metrics.impression_count"]>0]

df1.describe()


Unnamed: 0,id,conversation_id,public_metrics.retweet_count,public_metrics.reply_count,public_metrics.like_count,public_metrics.quote_count,public_metrics.bookmark_count,public_metrics.impression_count,in_reply_to_user_id
count,524.0,524.0,524.0,524.0,524.0,524.0,524.0,524.0,233.0
mean,1.802658e+18,1.802612e+18,691.053435,348.801527,4360.278626,376.40458,597.833969,1557279.0,4398626000.0
std,7.738816e+16,7.742115e+16,1955.007489,643.34397,9026.369361,2055.937436,1954.491301,5254878.0,0.0
min,1.603467e+18,1.603467e+18,5.0,1.0,111.0,0.0,2.0,22793.0,4398626000.0
25%,1.74469e+18,1.74469e+18,67.75,44.75,792.25,13.0,68.75,211928.5,4398626000.0
50%,1.815928e+18,1.815928e+18,269.0,213.0,1950.0,76.0,208.5,572518.0,4398626000.0
75%,1.869169e+18,1.869169e+18,657.25,442.5,4911.25,225.0,564.5,1266745.0,4398626000.0
max,1.904711e+18,1.904711e+18,31249.0,9417.0,134783.0,42191.0,36466.0,97565970.0,4398626000.0


In [None]:
df1.sort_values(by="created_at",ascending=True)

Unnamed: 0,created_at,id,conversation_id,text,lang,edit_history_tweet_ids,public_metrics.retweet_count,public_metrics.reply_count,public_metrics.like_count,public_metrics.quote_count,public_metrics.bookmark_count,public_metrics.impression_count,in_reply_to_user_id,username
523,2022-12-15T19:07:45.000Z,1603466863370854401,1603466863370854401,Our new embedding model is significantly more ...,en,['1603466863370854401'],517,407,2829,72,223,1138908,,OpenAI
522,2023-01-11T14:32:45.000Z,1613182128661069831,1613182128661069831,"We're publishing a report, co-authored with @C...",en,['1613182128661069831'],584,250,2414,88,271,934979,,OpenAI
521,2023-01-17T01:33:01.000Z,1615160228366147585,1615160228366147585,We've learned a lot from the ChatGPT research ...,en,['1615160228366147585'],2175,593,13501,534,896,4203422,,OpenAI
520,2023-01-23T14:01:14.000Z,1617522852273737728,1617522852273737728,We are happy to announce the next phase of our...,en,['1617522852273737728'],2576,676,14528,471,575,2348140,,OpenAI
519,2023-01-31T18:10:32.000Z,1620484691462852609,1620484691462852609,We’re developing a new tool to help distinguis...,en,['1620484691462852609'],1960,676,8817,819,776,3217129,,OpenAI
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4,2025-03-25T18:34:40.000Z,1904602854062776515,1904602845221187829,GPT‑4o’s image generation follows complex prom...,en,['1904602854062776515'],43,9,648,8,119,109383,4.398626e+09,OpenAI
3,2025-03-25T18:34:41.000Z,1904602855413301430,1904602845221187829,4o’s ability to precisely blend text with imag...,en,['1904602855413301430'],36,9,568,13,92,87821,4.398626e+09,OpenAI
2,2025-03-25T18:34:41.000Z,1904602856830943674,1904602845221187829,Creating and customizing images is as simple a...,en,['1904602856830943674'],63,24,878,16,139,201170,4.398626e+09,OpenAI
1,2025-03-25T18:34:42.000Z,1904602859259166778,1904602845221187829,Create or transform images into a variety of s...,en,['1904602859259166778'],64,22,979,33,158,176961,4.398626e+09,OpenAI


## Step 5: Data Visualization

Modify the code from the class exercise to visualize log(impression+1) and log(like+1) for OpenAI and Nvidia over time.

Tip: use pd.to_datetime to convert "created_at" before using it as x-axis.
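As a small illustration of the tip, the sketch below parses timestamp strings in the same format as "created_at" and applies the log(x+1) transform with `np.log1p`. The frame `demo` is a toy example; the zero impression count is made up to show that `log1p` is defined at zero.

```python
import numpy as np
import pandas as pd

# Toy rows: timestamps in the same ISO format as "created_at"; the 0 is made up.
demo = pd.DataFrame({
    "created_at": [
        "2025-03-25T18:34:40.000Z",
        "2022-12-15T19:07:45.000Z",
        "2023-01-11T14:32:45.000Z",
    ],
    "public_metrics.impression_count": [109383, 1138908, 0],
})

# Parse the strings into real timestamps so the x-axis is treated as time.
demo["datetime"] = pd.to_datetime(demo["created_at"])

# log1p(x) computes log(x + 1), so a count of 0 maps to 0 instead of -inf.
demo["log_view"] = np.log1p(demo["public_metrics.impression_count"])

# Sorting by time keeps a line plot from zigzagging back and forth.
demo = demo.sort_values("datetime")
print(demo[["datetime", "log_view"]])
```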

In [None]:
import plotly.express as px
import numpy as np

df1['log_view']=np.log1p(df1['public_metrics.impression_count'])
df1['datetime']=pd.to_datetime(df1["created_at"])
# create a line plot with Plotly Express, using the parsed datetime as the x-axis
fig = px.line(df1, x='datetime', y='log_view', title='Log Impressions Over Time', template='plotly_white')

# display the plot
fig.show()


## Step 6: Vectorization

Next, we will vectorize the tweet content by generating a document-term matrix with TF-IDF scores.
- We remove English stopwords.
- max_df=0.8
- min_df=0.02

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
docs=df1['text']
tfidf_vectorizer = TfidfVectorizer(max_df=0.8, min_df=0.02,
stop_words='english')

tfidf = tfidf_vectorizer.fit_transform(docs)


tfidf_df = pd.DataFrame(tfidf.toarray(),
columns=tfidf_vectorizer.get_feature_names_out())
tfidf_df

Unnamed: 0,4o,ability,access,advanced,ai,amp,announcing,api,app,apps,...,voice,want,way,web,week,windows,work,working,world,writing
0,0.000000,0.000000,0.0,0.0,0.00000,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.406877,0.000000,0.0,0.0,0.00000,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.291023,0.000000,0.0,0.0,0.00000,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.400230,0.540485,0.0,0.0,0.00000,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.541607,0.000000,0.0,0.0,0.00000,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
519,0.000000,0.000000,0.0,0.0,0.25731,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
520,0.000000,0.000000,0.0,0.0,0.00000,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
521,0.000000,0.000000,0.0,0.0,0.00000,0.0,0.0,0.242589,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
522,0.000000,0.000000,0.0,0.0,0.00000,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## Step 7: Topic Discovery

We use non-negative matrix factorization to generate 20 topics.
For each topic, we use the top 5 words to describe the topic.
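As a rough illustration of what the factorization produces, the sketch below runs NMF on a tiny made-up non-negative matrix. The matrix `V`, the term list, and the choice of 2 components are assumptions for the example only; the assignment itself uses the TF-IDF matrix and 20 components.

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy non-negative "document-term" matrix: 4 documents, 3 terms (values made up).
V = np.array([
    [1.0, 0.9, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.1, 1.0],
    [0.0, 0.0, 0.9],
])
terms = np.array(["voice", "app", "safety"])

# NMF approximates V by W @ H with all entries of W and H non-negative.
toy_nmf = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=1000)
W_toy = toy_nmf.fit_transform(V)  # document-topic weights, shape (4, 2)
H_toy = toy_nmf.components_       # topic-term weights, shape (2, 3)

# Describe each topic by its highest-weighted terms (top 2 in this toy case).
for k, row in enumerate(H_toy):
    top_terms = terms[row.argsort()[::-1][:2]]
    print("Topic #{}:".format(k), " ".join(top_terms))
```

Each row of the document-topic matrix says how strongly a document loads on each topic, which is what later serves as the predictors in the regression.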

In [None]:
# Apply NMF
from sklearn.decomposition import NMF

nmf_model = NMF(n_components=20, random_state=0)
W = nmf_model.fit_transform(tfidf)  # Document-topic matrix

# Display topics
feature_names = tfidf_vectorizer.get_feature_names_out()
topic_names=[]
for topic_index in range(len(nmf_model.components_)):
    topic = nmf_model.components_[topic_index]
    # Get the indices of the top 5 words (largest values in the topic)
    sorted_indices = topic.argsort()  # sorts from smallest to largest
    top_indices = sorted_indices[-5:]  # get the last 5 (top 5 words)
    # Reverse to make it largest to smallest
    top_indices = top_indices[::-1]
    # Get the actual word names for these indices
    top_words = []
    for i in top_indices:
        top_words.append(feature_names[i])
    # Join the top words into a single string
    top_words_string = " ".join(top_words)
    # Print and save
    print("Topic #{}:".format(topic_index))
    print(top_words_string)
    topic_names.append(top_words_string)
topic_df = pd.DataFrame(W, columns=topic_names)
topic_df

Topic #0:
https openai windows gpts update
Topic #1:
chatgpt available ios app rolling
Topic #2:
gpt 4o capabilities image chat
Topic #3:
ai announcing systems news today
Topic #4:
openai o1 preview pro amp
Topic #5:
users plus team pro today
Topic #6:
models future model frontier team
Topic #7:
safety sharing security model red
Topic #8:
voice advanced app access desktop
Topic #9:
research deep work future images
Topic #10:
day https way canvas plus
Topic #11:
new openai features introducing way
Topic #12:
use canvas work code writing
Topic #13:
using custom instructions operator work
Topic #14:
soon ll enterprise coming edu
Topic #15:
eu iceland liechtenstein switzerland norway
Topic #16:
ve content sora api data
Topic #17:
o3 mini reasoning model coding
Topic #18:
live app 4o openai features
Topic #19:
tasks world real today information


Unnamed: 0,https openai windows gpts update,chatgpt available ios app rolling,gpt 4o capabilities image chat,ai announcing systems news today,openai o1 preview pro amp,users plus team pro today,models future model frontier team,safety sharing security model red,voice advanced app access desktop,research deep work future images,day https way canvas plus,new openai features introducing way,use canvas work code writing,using custom instructions operator work,soon ll enterprise coming edu,eu iceland liechtenstein switzerland norway,ve content sora api data,o3 mini reasoning model coding,live app 4o openai features,tasks world real today information
0,0.049721,0.000000,0.000000,0.000000,0.000000,0.305311,0.005752,0.000000,0.000000,0.000000,0.001150,0.000000,0.000000,0.000000,0.000000,0.005278,0.000000,0.000000,0.000000,0.000000
1,0.039114,0.009519,0.111897,0.000000,0.003182,0.000000,0.000000,0.000000,0.000000,0.031933,0.003897,0.000000,0.000000,0.023954,0.000000,0.000000,0.022409,0.000000,0.009072,0.000000
2,0.028019,0.001114,0.149874,0.001363,0.000000,0.000000,0.000000,0.000000,0.000000,0.013180,0.000000,0.000000,0.000000,0.204244,0.000000,0.000000,0.003164,0.003347,0.006629,0.006877
3,0.038481,0.003885,0.113017,0.000085,0.002582,0.000000,0.000000,0.000000,0.008036,0.014229,0.001913,0.003296,0.000000,0.007307,0.002869,0.000000,0.017881,0.004729,0.008780,0.009813
4,0.052098,0.001866,0.283169,0.000000,0.005230,0.000000,0.000000,0.000000,0.000000,0.011478,0.001524,0.000000,0.000000,0.000000,0.000000,0.000000,0.001347,0.000000,0.003374,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
519,0.028652,0.000000,0.000000,0.073482,0.000000,0.000000,0.027135,0.000000,0.026596,0.011342,0.000000,0.124888,0.000772,0.017833,0.041129,0.000000,0.003292,0.022107,0.000000,0.000000
520,0.276079,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
521,0.020023,0.053126,0.000000,0.004867,0.074725,0.000000,0.002692,0.000000,0.000000,0.083577,0.000000,0.000000,0.000000,0.000000,0.118001,0.000000,0.124810,0.000000,0.004268,0.000000
522,0.095965,0.000000,0.000000,0.000000,0.000000,0.000000,0.370398,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000


## Step 8: Linear Regression Model
Construct our outcome variable, $\log(\text{Likes} + 1)$ (referred to as log(Likes) below), run the model, and evaluate it.
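The sketch below shows, on simulated data, how a log(count + 1) outcome is built and how R² follows from an ordinary least squares fit. It uses plain NumPy rather than the regression package used in this notebook, and every value in it is simulated, so it only illustrates the mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Simulated predictors (stand-ins for topic weights) and simulated like counts.
X_sim = rng.random((n, 3))
likes_sim = rng.poisson(lam=np.exp(3 + 2 * X_sim[:, 0]))

# Outcome: log(Likes + 1), which stays finite even when a count is 0.
y_sim = np.log1p(likes_sim)

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(n), X_sim])
beta, *_ = np.linalg.lstsq(A, y_sim, rcond=None)
y_hat = A @ beta

# R^2 = 1 - SS_res / SS_tot: the share of outcome variance the model explains.
ss_res = np.sum((y_sim - y_hat) ** 2)
ss_tot = np.sum((y_sim - y_sim.mean()) ** 2)
r2_sim = 1 - ss_res / ss_tot

print("coefficients:", np.round(beta, 3))
print("R^2:", round(float(r2_sim), 3))
```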



In [None]:
df1.columns

Index(['created_at', 'id', 'conversation_id', 'text', 'lang',
       'edit_history_tweet_ids', 'public_metrics.retweet_count',
       'public_metrics.reply_count', 'public_metrics.like_count',
       'public_metrics.quote_count', 'public_metrics.bookmark_count',
       'public_metrics.impression_count', 'in_reply_to_user_id', 'username',
       'log_view', 'datetime'],
      dtype='object')

In [None]:
# Linear regression of a log-transformed outcome on the topic weights.
# Note: as written, the outcome is log(impressions + 1); for log(Likes + 1),
# use 'public_metrics.like_count' instead.
import numpy as np
import pingouin as pg
pd.options.display.float_format = '{:.3f}'.format

# Combine X and y into a single dataframe
df_model = topic_df.copy()
# .to_numpy() aligns by position, since topic_df rows follow the row order of df1
df_model['view'] = np.log1p(df1['public_metrics.impression_count'].to_numpy())
# Run linear regression
result = pg.linear_regression(df_model.drop(columns='view'),
                              df_model['view'])
# Round coef and pval to 4 decimal places
result[['names', 'coef', 'pval']] = result[['names', 'coef',
                                            'pval']].round(4)
# Display the full regression table
result


Unnamed: 0,names,coef,se,T,pval,r2,adj_r2,CI[2.5%],CI[97.5%]
0,Intercept,12.793,0.243,52.705,0.0,0.127,0.092,12.316,13.27
1,https openai windows gpts update,0.088,1.108,0.08,0.937,0.127,0.092,-2.088,2.264
2,chatgpt available ios app rolling,9.002,2.011,4.476,0.0,0.127,0.092,5.05,12.953
3,gpt 4o capabilities image chat,1.875,0.844,2.221,0.027,0.127,0.092,0.216,3.534
4,ai announcing systems news today,3.464,1.704,2.032,0.043,0.127,0.092,0.115,6.812
5,openai o1 preview pro amp,2.278,0.827,2.756,0.006,0.127,0.092,0.654,3.903
6,users plus team pro today,1.343,1.099,1.222,0.222,0.127,0.092,-0.816,3.502
7,models future model frontier team,0.419,1.07,0.392,0.696,0.127,0.092,-1.683,2.521
8,safety sharing security model red,-0.674,1.36,-0.496,0.62,0.127,0.092,-3.346,1.997
9,voice advanced app access desktop,-0.545,1.012,-0.538,0.591,0.127,0.092,-2.532,1.443


In [None]:
# Code for results production

print(result[['names', 'coef', 'pval']])

print("R²:", round(result['r2'][0], 3))

## Please Type Your Code Here

Please fill in the text box with your interpretation of the estimates.
### Coefficients interpretation.
---

```
+------------------------------------+
| Interpret the coefficients here.   |
| 1. Statistical significance        |
| 2. Direction of coefficients       |
+------------------------------------+
```



Please fill in the text box with your interpretation of the R2.

### R2 Interpretation.
---

```
+---------------------------------------------------------------------------+
| Input Your Answers Here.                                                  |
| - Interpret the R2.                                                       |
| - Definition of R2                                                        |
| - Change the number of components to 10 and 5. How would that affect R2?  |
+---------------------------------------------------------------------------+
```


## Step 9: Perform the same analysis for the outcome of impression based on tweets from OpenAI.

```
+-------------------------------------------------------------------------+
| No code is required but you will need to keep the regression results.   |
+-------------------------------------------------------------------------+
```


## Step 10: Topic modeling and regression for the outcome of impression and likes based on tweets from Nvidia.

```
+-------------------------------------------------------------------------+
| No code is required but you will need to keep the regression results.   |
+-------------------------------------------------------------------------+
```

## Step 11: Summarize your findings based on the comparison between Nvidia and OpenAI.
```
+----------------------------+
| Input Your Answers Here.   |
+----------------------------+
```