# Analyze Twitter Data with OpenAI V1

This notebook assumes that your Tweets were collected with Twitter API V1, or that they are organized as:
```
{
id:123,
text:'abc',
...
}

```
If your Tweets were collected with Twitter API V2 or are organized in a different format, please use the code at [V1](V1).

## Install Python libraries

We need [pymongo](https://pypi.org/project/pymongo/) to work with the MongoDB database and [openai](https://github.com/openai/openai-python) to call the OpenAI APIs.

In [2]:
!pip install pymongo

Collecting pymongo
  Obtaining dependency information for pymongo from https://files.pythonhosted.org/packages/5e/97/6fc527b749f4af354042c43b7032d0734923be2dad6c8ffdd28b469b8e93/pymongo-4.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
  Downloading pymongo-4.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (22 kB)
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo)
  Obtaining dependency information for dnspython<3.0.0,>=1.16.0 from https://files.pythonhosted.org/packages/f6/b4/0a9bee52c50f226a3cbfb54263d02bb421c7f2adc136520729c2c689c1e5/dnspython-2.4.2-py3-none-any.whl.metadata
  Downloading dnspython-2.4.2-py3-none-any.whl.metadata (4.9 kB)
Downloading pymongo-4.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (677 kB)
Downloading dnspython-2.4.2-py3-none-any.whl (300 kB)

In [3]:
!pip install openai

Collecting openai
  Obtaining dependency information for openai from https://files.pythonhosted.org/packages/dd/82/b92f73453ea318c0d46f31aeb56e9d94a42606c010fb72a513f4a3cd4bac/openai-1.1.2-py3-none-any.whl.metadata
  Downloading openai-1.1.2-py3-none-any.whl.metadata (16 kB)
Collecting distro<2,>=1.7.0 (from openai)
  Downloading distro-1.8.0-py3-none-any.whl (20 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Obtaining dependency information for httpx<1,>=0.23.0 from https://files.pythonhosted.org/packages/82/61/a5fca4a1e88e40969bbd0cf0d981f3aa76d5057db160b94f49603fc18740/httpx-0.25.1-py3-none-any.whl.metadata
  Downloading httpx-0.25.1-py3-none-any.whl.metadata (7.1 kB)
Collecting pydantic<3,>=1.9.0 (from openai)
  Obtaining dependency information for pydantic<3,>=1.9.0 from https://files.pythonhosted.org/packages/73/66/0a72c9fcde42e5650c8d8d5c5c1873b9a3893018020c77ca8eb62708b923/pydantic-2.4.2-py3-none-any.whl.metadata
  Downloading pydantic-2.4.2-py3-none-any.whl.metadata (158 kB)


## Import Python libraries

In [4]:
import pymongo
from pymongo import MongoClient
import json
from pprint import pprint
import configparser
from tqdm import tqdm
import re

## Load the authorization info

Save the database connection info and API key in a config.ini file and use configparser to load the authorization info.

The config.ini file should look like:
``` 
[myopenai]
openai_api = <your openai API>

[mymongo]
connection = <your mongodb connection>
```


In [5]:
config = configparser.ConfigParser(interpolation=None)
config.read('config.ini')

openai_api_key   = config['myopenai']['openai_api']

mongod_connect = config['mymongo']['connection']
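
The `interpolation=None` argument matters because MongoDB connection strings often contain percent-encoded characters (for example `%40` for `@` in a password), and the default `configparser` interpolation treats `%` as special syntax. The following is a minimal sketch of that behavior, not part of the original notebook: the credentials and host are placeholders, the `load_settings` helper is hypothetical, and `read_string` stands in for reading the file.

```python
import configparser

# Hypothetical config contents; the key, credentials and host are placeholders.
SAMPLE_CONFIG = """
[myopenai]
openai_api = your-api-key

[mymongo]
connection = mongodb+srv://user:p%40ss@cluster.example.net/tweet
"""


def load_settings(config_text):
    """Parse config text and return (api_key, mongo_connection)."""
    # interpolation=None keeps '%' characters literal.
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(config_text)
    return config['myopenai']['openai_api'], config['mymongo']['connection']


api_key, connection = load_settings(SAMPLE_CONFIG)
print(connection)
```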

## Connect to the MongoDB cluster

We will connect to the MongoDB database that contains the tweet data. You need to change the database name and collection name to match your settings.

In [6]:
client = MongoClient(mongod_connect)
db = client.tweet # use or create a database named tweet
tweet_collection = db.gun_va #use or create a collection named gun_va
# tweet_collection.create_index([("tweet.id", pymongo.ASCENDING)],unique = True) # make sure the collected tweets are unique

## Extract Twitter Data

Search for the Tweets you are interested in.
You can use [MongoDB Compass](https://www.mongodb.com/try/download/compass) to help you write the queries.

In [7]:
'''
The following code is generated in MongoDB Compass to find the top 100 tweets 
with a key word of 'shooting', ordered by the favorite count
'''
filter={
    '$text': {
        '$search': 'shooting'
    }
}
project={
    'id': 1, 
    'text': 1
}
sort=list({
    'favorite_count': -1
}.items())
limit=100
result = client['tweet']['gun_va'].find(
  filter=filter,
  projection=project,
  sort=sort,
  limit=limit
)
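
The `sort=list({...}.items())` idiom that Compass generates only builds a list of `(field, direction)` tuples, which is the form pymongo accepts for sorting. The sketch below evaluates that expression on its own; the second key, `id`, is added purely for illustration.

```python
# The same sort expression used in the query, evaluated by itself.
sort_spec = list({
    'favorite_count': -1
}.items())
print(sort_spec)

# With several keys, the dict's insertion order becomes the sort priority.
# The 'id' tie-breaker here is an illustrative assumption.
multi_sort_spec = list({
    'favorite_count': -1,
    'id': 1
}.items())
print(multi_sort_spec)
```

Note that `$text` queries only work on a collection that has a text index. With pymongo this is typically created once, with something like `tweet_collection.create_index([('text', pymongo.TEXT)])`, assuming the tweet text is stored in the `text` field.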

Save the extracted Tweets into the ```tweet_data``` list. Remove URLs and line breaks to save tokens.

In [8]:
tweet_data = []
url_pattern = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
for tweet in result:
    text_without_urls = re.sub(url_pattern, '', tweet['text'])
    tweet_data.append({'tweet_id':tweet['id'],'tweet_text':text_without_urls.replace('\n','')})
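
To see what the cleaning step does to a single tweet, here is a small self-contained sketch. It reuses the URL pattern from the cell above, but the `clean_tweet_text` helper and the sample text are made up for illustration. Unlike the cell above, it replaces line breaks with a space so that words on adjacent lines are not glued together.

```python
import re

URL_PATTERN = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'


def clean_tweet_text(text):
    """Remove URLs, then collapse line breaks and repeated spaces."""
    without_urls = re.sub(URL_PATTERN, '', text)
    # split() with no arguments splits on any whitespace, including '\n'.
    return ' '.join(without_urls.split())


sample = "Shooting reported https://t.co/abc123 today\nStay safe"
cleaned = clean_tweet_text(sample)
print(cleaned)
```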

In [135]:
print('Number of tweets: ',len(tweet_data))

Number of tweets:  73


## Set up OpenAI API

Load the OpenAI API key and set the API parameters. 
- Model type: ```gpt-3.5-turbo``` is used by default; you can switch to any of the other [available models](https://platform.openai.com/docs/models/continuous-model-upgrades).
- Token estimate: 100 tokens ~= 75 words in English. You can get a more accurate count with the [Tokenizer](https://platform.openai.com/tokenizer).
- Temperature: set to 0 in this notebook. Lower temperatures result in more consistent outputs, while higher values generate more diverse and creative results.
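
The token rule of thumb above can be turned into a rough estimator. This is only a sketch of the 100-tokens-per-75-words approximation for English text; the `estimate_tokens` helper is hypothetical and does not call any OpenAI tokenizer, so real counts will differ.

```python
def estimate_tokens(text):
    """Estimate the token count from the word count (100 tokens ~= 75 words)."""
    word_count = len(text.split())
    # Integer ceiling of word_count * 100 / 75.
    return (word_count * 100 + 74) // 75


short_estimate = estimate_tokens("one two three")
long_estimate = estimate_tokens(" ".join(["word"] * 75))
print(short_estimate, long_estimate)
```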

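We also define a helper function, ```openai_help```, that sends a prompt to the chat completions API and returns the text of the reply.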

In [10]:
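from openai import OpenAI

# Use a separate name so the MongoDB `client` created earlier is not overwritten
openai_client = OpenAI(api_key=openai_api_key)
model = "gpt-3.5-turbo"
temperature = 0


def openai_help(prompt, model=model, temperature=temperature):
    """Send a single-turn prompt to the chat completions API and return the reply text."""
    messages = [{"role": "user", "content": prompt}]
    response = openai_client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
    )
    return response.choices[0].message.content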
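
API calls can fail temporarily (rate limits, network errors), so it may help to retry a call a few times before giving up. The following is a generic sketch, not part of the original notebook: `call_with_retries` and `flaky_call` are hypothetical names, and `flaky_call` is a stand-in for a real API call.

```python
import time


def call_with_retries(func, *args, attempts=3, base_delay=1.0, **kwargs):
    """Call func(*args, **kwargs), retrying on any exception with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            time.sleep(base_delay * 2 ** attempt)


# Stand-in for an API call that fails twice before succeeding.
calls = {'count': 0}


def flaky_call(prompt):
    calls['count'] += 1
    if calls['count'] < 3:
        raise RuntimeError('temporary failure')
    return f'reply to: {prompt}'


result = call_with_retries(flaky_call, 'hello', base_delay=0)
print(result, calls['count'])
```

In the notebook, the same wrapper could be used as `call_with_retries(openai_help, prompt)`.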

## Sentiment analysis

Analyze the sentiment of each tweet and save the result to the MongoDB database.

In [115]:
for tweet in tqdm(tweet_data):

    prompt = f"""
    What is the sentiment of the following tweet?
    tweet text: {tweet['tweet_text']}
    Return the result as one word: positive, neutral, or negative.
    """
    try:
        sentiment_result = openai_help(prompt)

        # Store the label on the matching tweet document
        tweet_collection.update_one(
            {'id': tweet['tweet_id']},
            {"$set": {'sentiment': sentiment_result}}
        )
    except Exception as err:
        # Report the failure and move on to the next tweet instead of hiding it
        print(f"Tweet {tweet['tweet_id']} failed: {err}")


100%|██████████| 73/73 [00:32<00:00,  2.21it/s]
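
The model does not always reply with exactly one lowercase word; it may add punctuation or a short sentence. A small normalization step before saving can keep the stored labels consistent. This is a sketch under that assumption, with a hypothetical `normalize_sentiment` helper:

```python
VALID_SENTIMENTS = ('positive', 'neutral', 'negative')


def normalize_sentiment(raw_reply):
    """Map a free-text model reply to one of the three labels, or 'unknown'."""
    cleaned = raw_reply.strip().lower()
    # If several labels appear, the first one in VALID_SENTIMENTS wins.
    for label in VALID_SENTIMENTS:
        if label in cleaned:
            return label
    return 'unknown'


print(normalize_sentiment('Positive.'))
print(normalize_sentiment('The sentiment is negative'))
print(normalize_sentiment('not sure'))
```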


## Translate

Translate each tweet into a different language, and save the result to the MongoDB database.

In [117]:
for tweet in tqdm(tweet_data):

    prompt = f"""
    Translate the following tweet into Chinese.
    tweet text: {tweet['tweet_text']}
    """
    try:
        translate_result = openai_help(prompt)

        # Store the translation on the matching tweet document
        tweet_collection.update_one(
            {'id': tweet['tweet_id']},
            {"$set": {'translate': translate_result}}
        )
    except Exception as err:
        # Report the failure and move on to the next tweet instead of hiding it
        print(f"Tweet {tweet['tweet_id']} failed: {err}")


100%|██████████| 73/73 [01:33<00:00,  1.28s/it]
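
To translate into a language other than Chinese, the target language can be made a parameter of the prompt. The helper below is a hypothetical sketch of that idea; the function name and the sample text are illustrative only.

```python
def build_translation_prompt(tweet_text, target_language='Chinese'):
    """Build a translation prompt for the given tweet text and target language."""
    return (
        f"Translate the following tweet into {target_language}.\n"
        f"tweet text: {tweet_text}\n"
    )


spanish_prompt = build_translation_prompt('Stay safe today', target_language='Spanish')
print(spanish_prompt)
```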


## Identify emotions

Identify whether a tweet expresses anger, and save the result to the MongoDB database.

In [15]:
for tweet in tqdm(tweet_data):

    prompt = f"""
    Does the following tweet express anger?
    Provide the result as either True or False.
    tweet text: {tweet['tweet_text']}
    """
    try:
        emotion_result = openai_help(prompt)

        # Store the answer on the matching tweet document
        tweet_collection.update_one(
            {'id': tweet['tweet_id']},
            {"$set": {'anger': emotion_result}}
        )
    except Exception as err:
        # Report the failure and move on to the next tweet instead of hiding it
        print(f"Tweet {tweet['tweet_id']} failed: {err}")


100%|██████████| 73/73 [00:30<00:00,  2.42it/s]
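
The loop above stores the model's reply as a string such as `"True"`. If a real boolean is more convenient for later queries, the reply can be converted first. A minimal sketch, assuming the reply is the single word plus optional whitespace or a trailing period; `parse_bool_reply` is a hypothetical helper:

```python
def parse_bool_reply(raw_reply):
    """Convert a 'True'/'False' reply to a bool; return None for anything else."""
    cleaned = raw_reply.strip().strip('.').lower()
    if cleaned == 'true':
        return True
    if cleaned == 'false':
        return False
    return None


print(parse_bool_reply('True.'))
print(parse_bool_reply(' false\n'))
print(parse_bool_reply('Maybe'))
```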


## Extract entities

Extract person and organization names from each tweet and save the result to the MongoDB database.

In [23]:
for tweet in tqdm(tweet_data):

    prompt = f"""
    Identify the person names and organization names in the following tweet.
    tweet text: {tweet['tweet_text']}
    Format the response as a JSON document with person and organization as the keys.
    If the information is not present, use "unknown".
    """
    try:
        extract_result = openai_help(prompt)

        # json.loads raises an error if the reply is not valid JSON
        tweet_collection.update_one(
            {'id': tweet['tweet_id']},
            {"$set": {'extracted_item': json.loads(extract_result)}}
        )
    except Exception as err:
        # Report the failure and move on to the next tweet instead of hiding it
        print(f"Tweet {tweet['tweet_id']} failed: {err}")

100%|██████████| 73/73 [00:51<00:00,  1.41it/s]
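
Because `json.loads` fails whenever the model adds text around the JSON document, some tweets may be skipped silently. One way to be more tolerant is to parse only the part of the reply between the first `{` and the last `}`. This is a sketch of that idea; `parse_json_reply` and the sample reply are made up for illustration.

```python
import json


def parse_json_reply(raw_reply):
    """Parse the text between the first '{' and the last '}', or return None."""
    start = raw_reply.find('{')
    end = raw_reply.rfind('}')
    if start == -1 or end < start:
        return None
    try:
        return json.loads(raw_reply[start:end + 1])
    except json.JSONDecodeError:
        return None


sample_reply = 'Here is the result:\n{"person": "unknown", "organization": "NRA"}'
parsed = parse_json_reply(sample_reply)
print(parsed)
```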


## Summarize

Summarize the tweet texts with a specific focus, and save the result to the MongoDB database.
Because of the model's token limit, each request summarizes no more than 50 tweets.

In [148]:
# Define the batch size
batch_size = 50

start_index = 0

# Use or create a collection named tweet_summary
tweet_summary = db.tweet_summary

while start_index < len(tweet_data):
    batch = tweet_data[start_index:start_index + batch_size]

    tweet_id_list = [tweet['tweet_id'] for tweet in batch]
    # Join the tweet texts into one string, separated by periods
    tweet_text_summary = '.'.join(tweet['tweet_text'] for tweet in batch)

    prompt = f"""
    Summarize the following tweets in at most 50 words,
    focusing on any aspect that mentions the election.
    tweet text: {tweet_text_summary}
    """
    try:
        summary_result = openai_help(prompt)

        tweet_summary.insert_one({'id_list': tweet_id_list,
                                  'tweet_text_summary': summary_result})
        print(summary_result)
    except Exception as err:
        # Report the failure and continue with the next batch
        print(f"Batch starting at index {start_index} failed: {err}")
    start_index += batch_size

The tweets mention various aspects of the election, including gun control and mass shootings. Some tweets express frustration with the lack of action on gun control, while others argue that gun control is not the solution to preventing shootings. There is also mention of politicians being exposed by mass shooting survivors and criticism of the NRA.
The tweets discuss various aspects of gun control in relation to mass shootings. Some argue for stricter gun control laws as a way to prevent shootings, while others believe mental health and other factors should be addressed. There is criticism of politicians offering thoughts and prayers instead of passing legislation. The tweets also mention the potential for gun control discussions to fade away until the next shooting occurs.
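
The two summaries above correspond to two batches: with 73 tweets and a batch size of 50, the first batch holds 50 tweets and the second holds the remaining 23. The batching logic can be sketched on its own with a small generator; `batched` is a hypothetical helper, and the integers stand in for tweets.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items, each with at most batch_size elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


sample_items = list(range(73))  # stand-in for the 73 tweets
batch_sizes = [len(batch) for batch in batched(sample_items, 50)]
print(batch_sizes)
```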
