# 12. Data Mining Twitter
> [Chapter 12, Data Mining Twitter (Updated for the Twitter v2 APIs)](https://deitel.com/wp-content/uploads/2022/09/python-for-programmers-chapter-12-data-mining-twitter-v2.pdf)

# Objectives
* **Data-mine Twitter** with **Tweepy** library
* Use various **Twitter v2** API methods
* **Get information** about a specific Twitter account
* **Search for past tweets** that meet your criteria
* **Sample the stream of live tweets** as they’re happening
* **Request additional metadata** in Twitter responses via the Twitter v2 API’s **expansions** and **fields**
* **Clean and preprocess tweets** to prepare them for analysis
* **Translate foreign-language tweets** into English and perform **sentiment analysis** on tweets
* **Spot trends** with the Twitter v1.1 **Trends API**
* **Map tweets** using **folium** and OpenStreetMap map tiles

------

# 12.1 Introduction 
* Popular **big-data source**  
* **Data mining** &mdash; searching large collections of data for **insights**
* **Sentiment** in tweets can help **make predictions**  
    * **Stock prices**
    * **Election results**
    * Likely **revenues** for a **new movie**
    * **Success** of a company’s **marketing campaign**
* Spot **faults in competitors’ products** 
* Spot **trending topics**
* **Connect to Twitter** with easy-to-use **Web services**

### What Is Twitter?
* Tweets
    * Short messages
    * Initially limited to **140 characters**
    * Now limited to **280 characters**
* Anyone can generally choose to follow anyone else

### Twitter Statistics
* [Hundreds of millions of tweets are sent every day with many thousands sent per second](http://www.internetlivestats.com/twitter-statistics/)
* Can **tap into the live stream** of tweets
    * Like **“drinking from a fire hose”** 

### Twitter and Big Data 
* A **favorite big data source** for researchers and business people worldwide
* **Free** access to a small portion of recent tweets
* Can pay for access to much larger portions of the all-time tweets database

------

# 12.2 Overview of the Twitter APIs 
* **Web services** are methods that you call in the **cloud**
* Each method has a **web service endpoint** represented by a **URL**
* **Caution**: Internet connections can be lost, services can change and some services are not available in all countries, so **apps can be brittle**
* **API categories** we'll look at
    * **Users API** — Access information about **Twitter user accounts**
    * **Tweets API** — Search through **past tweets**, access **live tweet streams**
    * **Trends API (Twitter v1.1)** — Find locations and lists of **trending topics**
* **Additional Twitter API categories**  
>https://developer.twitter.com/en/docs/api-reference-index

### Rate Limits: A Word of Caution 
* Twitter expects developers to use its services responsibly
* **Understand rate limits** before using any method
    * Twitter may **block you** for repeated violations
    * **Tweepy** can be configured to **wait when it encounters rate limits**
* Some methods list both **user rate limits** and **app rate limits**
    * We use **app rate limits** in the demos
    * **User rate limits** are for apps that enable individuals to log into their Twitter accounts
* [Details on rate limiting](https://developer.twitter.com/en/docs/basics/rate-limiting)
* [Specific rate limits on individual API methods](https://developer.twitter.com/en/docs/basics/rate-limits) — also see each API method’s documentation. 

### Other Restrictions
* **Follow Twitter’s rules/regulations** 
	* Terms of service — https://twitter.com/tos

------

# 12.3 Creating a Twitter Account
* [Apply for a developer account](https://developer.twitter.com/) to use the APIs
* Every application is subject to approval

### Twitter Developer Account Levels
https://developer.twitter.com/en/products/twitter-api
* Some Twitter v2 APIs are accessible only to Elevated-level and higher accounts. 
    * **Essentials** — “The best way to get started quickly, test, and build across all endpoints.”
    * **Elevated** — “More access for solutions that are beginning to experience growth or who prefer to work with multiple App environments.”
    * **Academic Research** — “Access to public data on nearly any topic to advance research objectives of Master’s students, doctoral candidates, post-docs, and faculty at an academic institution or university.”
* Twitter documentation specifies the minimum account level and the rate-limit differences between levels, if any.

### Choosing a Developer Account Application Type
* Application types: **Professional**, **Hobbyist** and **Academic Research**
    * Choose the type most appropriate for your use case
    * For our examples, you can choose **Hobbyist**, then **Exploring the API**
* If asked to apply for an **Elevated** application, click **Get started**, then:
    1. On the **Basic info** tab, fill in the form with your information and click **Next**.
    2. On the **Intended use** tab, describe how you intend to use the APIs. 
    3. Answer the other questions provided. For this chapter’s examples, you will not:
        * use the tweet, retweet, like, follow or direct message functionality
        * display tweets or aggregate data about Twitter content outside of Twitter
        * make Twitter content available to a government entity
    4. Click **Next** to review your answers, then click **Next** again. 
    5. Carefully read and agree to Twitter’s **Developer agreement & policy**, then click **Submit** to complete the application. You will be asked to confirm your email address.

### Essentials Level Accounts and the Twitter v1.1 APIs
* As of mid-2022, Twitter requires new developer accounts to use the Twitter v2 APIs
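* Twitter has not yet migrated some v1.1 APIs to v2; the Trends API used in Section 12.11 is one example, and its v1.1 methods require **Elevated** access or higher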

------

# 12.5 What’s in a Twitter API Response?
* Twitter API methods return **JSON (JavaScript Object Notation)** objects
* Text-based **data-interchange format** 
* Represents objects as **collections of name–value pairs** (like dictionaries)
* Commonly used in web services
* Human and computer readable

# 12.5 What’s in a Twitter API Response? (cont.)
* **JSON object format**:
> ```
> {"propertyName1": value1, "propertyName2": value2}
> ```
* **JSON array format (like Python list)**:
> ```
> [value1, value2, value3]
> ```
* **Tweepy handles the JSON for you** behind the scenes
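Tweepy performs this conversion for you, but Python's standard-library `json` module shows the correspondence directly. This minimal sketch parses a made-up JSON object (not a real Twitter response) that contains a JSON array, then converts a dictionary back to JSON text:

```python
import json

# A JSON object whose "hashtags" property holds a JSON array
text = '{"id": "1", "text": "Hello, Twitter!", "hashtags": ["python", "api"]}'

tweet = json.loads(text)       # JSON object -> Python dict
print(type(tweet).__name__)    # dict
print(tweet['hashtags'])       # JSON array -> Python list: ['python', 'api']

# Converting back: Python dict -> JSON text (note true, not True)
print(json.dumps({'name': 'NASA', 'verified': True}))
# {"name": "NASA", "verified": true}
```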

### Default Properties of a Tweet Object
Twitter returns a JSON object that, by default, contains 
* the tweet’s unique ID number 
* its text (up to a maximum of 280 characters)

### Twitter Metadata and the Twitter v1.1 APIs 
In Twitter v1.1 APIs, a tweet’s JSON object automatically included many additional metadata attributes that described aspects of the tweet, such as:
* when it was created, 
* who created it, 
* lists of the hashtags, URLs, @-mentions and media (such as images and videos) included in the tweet,
* and more. 

### Twitter v2 API Expansions and Fields 
* In the v2 APIs, you must use **fields** and **expansions** to request the metadata your app requires
* **Fields** are **additional metadata attributes** you’d like Twitter to return to your app
* When you get a tweet, you might need 
    * the unique `author_id` attribute, indicating a tweet’s sender
    * the tweet’s `created_at` attribute, indicating when the tweet was sent
* For the complete list of tweet fields, visit
> https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/tweet

### Twitter v2 API Expansions and Fields (cont.)
* Some **fields** are associated with **other metadata objects** with their own fields
* Associated with a tweet’s **`author_id`** attribute is a **user JSON object** 
* Use an **expansion** to request associated metadata objects
* Each expanded object contains its default attributes
    * For a user object, these would be the user’s **unique id number**, **name** and **username**
    * Can request more from the list of user fields 
    > https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/user
* **Overview of all JSON objects** Twitter APIs return, and links to details
> https://developer.twitter.com/en/docs/twitter-api/data-dictionary/introduction
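At the HTTP level, fields and expansions travel as query-string parameters: `tweet.fields`, `user.fields` and `expansions`, each a comma-separated list. Tweepy builds these for you from keyword arguments such as `tweet_fields`, `user_fields` and `expansions`. The sketch below uses a hypothetical helper, `build_tweets_url`, to show what such a request URL looks like; it omits authentication and is an illustration only, not Tweepy's implementation.

```python
from urllib.parse import urlencode

BASE_URL = 'https://api.twitter.com/2'

def build_tweets_url(user_id, tweet_fields=(), expansions=(), user_fields=()):
    """Hypothetical helper: build a /2/users/:id/tweets URL in which
    fields and expansions are comma-separated query parameters."""
    params = {}
    if tweet_fields:
        params['tweet.fields'] = ','.join(tweet_fields)
    if expansions:
        params['expansions'] = ','.join(expansions)
    if user_fields:
        params['user.fields'] = ','.join(user_fields)
    url = f'{BASE_URL}/users/{user_id}/tweets'
    # urlencode percent-encodes the commas as %2C
    return f'{url}?{urlencode(params)}' if params else url

print(build_tweets_url(11348282, tweet_fields=['created_at'],
                       expansions=['author_id'], user_fields=['description']))
```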

### Sample JSON for the NASA Account’s 10 Most Recent Tweets
* Some of the JSON response to a request for recent tweets from `@NASA`

```json
{
  "data": [
    {
      "id": "1562156100136292352",
      "text": "RT @NASAInSight: Thanks again for all the kind 
               thoughts you’ve been sending. There’s still 
               time to write me a note for the mission team to…"
    },
    {
      "id": "1561886047331487744",
      "text": "We see Martian dust devils (whirlwinds) from the 
              ground, as in this shot from the Opportunity rover
              in 2016, left. From space, we can see the tracks 
              they leave behind, as in this view of dunes from 
              Mars Reconnaissance Orbiter in 2009, right. More: 
              https://t.co/kd1BNEDBUD https://t.co/RxeKTI5Fv5"
    },
    ...
  ],
  "meta": {
    "result_count": 10,
    "newest_id": "1562156100136292352",
    "oldest_id": "1555635141728382976",
    "next_token": "7140dibdnow9c7btw422nm76p6owdso7rqahg96mulyd2"
  }
}
```
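The sample above splits long `text` values across lines for display, so it is not valid JSON as printed. This sketch parses a trimmed copy of the same structure (two tweets, text shortened) with Python's `json` module to show where the tweets and the paging metadata live:

```python
import json

# Trimmed copy of the @NASA sample response; "result_count" still says 10
# even though only two of the ten tweets are reproduced here
response_text = '''
{
  "data": [
    {"id": "1562156100136292352", "text": "RT @NASAInSight: Thanks again..."},
    {"id": "1561886047331487744", "text": "We see Martian dust devils..."}
  ],
  "meta": {
    "result_count": 10,
    "newest_id": "1562156100136292352",
    "oldest_id": "1555635141728382976",
    "next_token": "7140dibdnow9c7btw422nm76p6owdso7rqahg96mulyd2"
  }
}
'''

response = json.loads(response_text)

# Each element of "data" is one tweet with the default id and text fields
for tweet in response['data']:
    print(tweet['id'], tweet['text'])

# A "next_token" in "meta" signals that another page of results exists
meta = response['meta']
print('More results?', 'next_token' in meta)
```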

------

# 12.6 Installing `tweepy`, `geopy`, `folium` and `deep-translator`

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

### Installing Tweepy 
* [**Tweepy library**](http://www.tweepy.org/) — **one of the most popular Python Twitter clients**
* Easy access to Twitter’s capabilities
* [Tweepy’s documentation](https://docs.tweepy.org/en/stable/)
> `pip install tweepy`
* Windows users **should run the Anaconda Prompt as an Administrator**



<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

### Installing geopy 
* One function from our `tweetutilities.py` file (in the ch13 folder) depends on [**geopy**](https://github.com/geopy/geopy), a geocoding library we'll use later to plot tweet locations on a map
>`conda install -c conda-forge geopy`
* Windows users **should run the Anaconda Prompt as an Administrator**


<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

### OpenMapQuest Geocoding API
* Section 12.15 uses the **OpenMapQuest Geocoding API** to convert locations, such as **Boston, MA**, into their latitudes and longitudes, such as **42.3602534** and **-71.0582912**, for plotting on maps
* Currently allows **15,000 transactions per month** on their free tier
* Sign up at 
> https://developer.mapquest.com/
* Go to https://developer.mapquest.com/user/me/apps 
    * Click **Create a New Key**, fill in the **App Name** field with a name of your choosing, leave the **Callback URL** empty and click **Create App** to create an API key
    * Click your app’s name to see your consumer key
    * In the `keys.py` file, store the consumer key by replacing `YourKeyHere` in the line
    > `mapquest_key = 'YourKeyHere'`

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

### Folium Library and Leaflet.js JavaScript Mapping Library
* Section 12.15 uses folium, which generates maps with the **Leaflet.js** JavaScript mapping library, to create an interactive map
> https://github.com/python-visualization/folium

> `pip install folium`

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

### Maps from OpenStreetMap.org
* Leaflet.js uses open-source maps from `OpenStreetMap.org`. 
* Copyrighted by the OpenStreetMap.org contributors
* www.openstreetmap.org/copyright 
* www.opendatacommons.org/licenses/odbl

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

### deep-translator Library
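The deep-translator library supports several translation services, including Google Translate (used via its `GoogleTranslator` class by `print_tweets` in Section 12.10)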
> `pip install -U deep_translator`

------

# 12.7 Authenticating with Twitter Via Tweepy to Access Twitter v2 APIs
* A **Tweepy `Client` object** is your gateway to using the Twitter v2 APIs
* Must first **authenticate with Twitter**

In [1]:
import tweepy

In [2]:
# before executing this cell, ensure that your copy of keys.py 
# contains your Twitter credentials as described earlier
import keys  
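The notebook assumes a `keys.py` module in the same folder. Based on the names this chapter uses (`keys.bearer_token` here and `mapquest_key` in Section 12.6), a minimal sketch of that file might look like this. The placeholder strings stand in for your own credentials; never commit real keys to a public repository.

```python
# keys.py: sketch of the credentials module (placeholders only)

# Bearer token from your app's settings in the Twitter developer portal
# (passed to tweepy.Client and tweepy.OAuth2BearerHandler)
bearer_token = 'YourBearerTokenHere'

# OpenMapQuest consumer key (used for geocoding in Section 12.15)
mapquest_key = 'YourKeyHere'
```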

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

### Creating a `Client` Object

* To use the Twitter v2 APIs, **create a `Client` object**

In [3]:
client = tweepy.Client(bearer_token=keys.bearer_token,
                       wait_on_rate_limit=True)

* `bearer_token` is the bearer token for your Twitter developer app, stored in `keys.py`
* `wait_on_rate_limit=True` lets Tweepy **manage rate limits** for you
    * For most Twitter APIs, the rate-limit interval is 15 minutes
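Conceptually, waiting on a rate limit means sleeping until the current window resets. Twitter reports the reset time, in epoch seconds, in the `x-rate-limit-reset` response header. The sketch below computes how long to wait; `seconds_until_reset` is a hypothetical helper for illustration, not Tweepy's internal code.

```python
import time

def seconds_until_reset(headers, now=None):
    """Hypothetical helper: seconds to sleep before the rate-limit
    window resets, based on the x-rate-limit-reset header."""
    if now is None:
        now = time.time()
    reset_time = int(headers.get('x-rate-limit-reset', 0))
    # Add one second of slack; never return a negative delay
    return max(reset_time - int(now) + 1, 0)

# The window resets 90 seconds after "now"
print(seconds_until_reset({'x-rate-limit-reset': '1000090'}, now=1000000))  # 91
```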

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

# 12.8 Getting Information About a Twitter Account
* `Client` object’s `get_user` method gets a `tweepy.Response` object containing information about the `@NASA` Twitter account

In [4]:
nasa = client.get_user(username='NASA',
    user_fields=['description', 'public_metrics'])

* `get_user` with the `username` keyword argument calls the Twitter API method 
> `/2/users/by/username/:username`
* Returns JSON data that Tweepy converts into a **`tweepy.Response`** 
    * Contains account’s **ID number**, **name** and **user name** by default
* Can request additional fields via the **`user_fields`** keyword argument
    * We requested the account’s **`description`** and **`public_metrics`**
* Complete **list of user fields**
> https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/user
* **Rate limit** for `/2/users/by/username/:username`
    * Can call up to 900 times every 15 minutes 

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

### `tweepy.Response` Object
* A `tweepy.Response` is a named tuple with four fields:
    * **`data`** — contains the data returned by Twitter, including any additional fields you request  
    * **`includes`** — contains related objects specified via the method’s `expansions` parameter 
    * **`errors`** — information about any errors that occurred
    * **`meta`** — method-specific information that can be useful in processing the response

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

### Getting a User’s Basic Account Information
* For a **user JSON object**, the `Response`’s `data` attribute is a `tweepy.User` object containing the default fields
    * `id` is the account’s unique ID number.
    * `name` is the name associated with the user’s account.
    * `username` is the user’s Twitter handle (`@NASA`) 
* Additional `user_fields` `description` and `public_metrics` (discussed momentarily) also are in the `Response` object’s `data` attribute

In [5]:
nasa.data.id

11348282

In [6]:
nasa.data.name

'NASA'

In [7]:
nasa.data.username

'NASA'

In [8]:
nasa.data.description

"There's space for everybody. ✨"

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

### Getting the Number of Accounts That Follow This Account and the Number of Accounts This Account Follows
* **`public_metrics`** are returned as a dictionary with keys
    * **`'followers_count'`** — number of users who follow this account, 
    * **`'following_count'`** — number of users that this account follows, 
    * **`'tweet_count'`** — total number of tweets (and retweets) sent by this user
    * **`'listed_count'`** — total number of Twitter lists that include this user

In [9]:
nasa.data.public_metrics['followers_count']

71698219

In [10]:
nasa.data.public_metrics['following_count']

178

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

### Getting Your Own Account’s Information
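* Get via `Client` object’s `get_me` method
> `me = client.get_me()`
* Returns a `Response` whose `data` attribute is a **User object** for the account you used to authenticate with Twitter
* `get_me` calls `/2/users/me`, which requires user-context authentication (your app’s consumer key and secret plus an access token and secret), so a `Client` created with only a bearer token cannot call it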

# 12.9 Intro to Tweepy `Paginator`s: Getting More than One Page of Results 
* Twitter API methods often **return collections of objects**
    * tweets sent by a particular user
    * tweets matching specified search criteria 
    * tweets in a user’s timeline (tweets sent by a user and by other accounts that user follows)
* Each Twitter API method returns at most a fixed number of items per call
    * known as a **page of results**
* Tweepy **`Paginator`** handles paging details
* Invokes a specified `Client` method and checks whether there is another page of results
    * If so, the `Paginator` automatically calls the method again to get next page
    * Continues (subject to the method’s rate limits) until there are no more results to process
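The paging loop a `Paginator` performs can be sketched in plain Python. For endpoints such as `/2/users/:id/followers`, each response's `meta` contains a `next_token` while more pages remain, and the next request passes that value back as a `pagination_token`. Here `fake_get_followers` is a hypothetical stand-in that serves canned pages, and `paginate` is a simplified illustration, not Tweepy's implementation.

```python
# Canned pages keyed by pagination token (None = first page)
PAGES = {
    None: {'data': ['a', 'b', 'c'], 'meta': {'next_token': 't1'}},
    't1': {'data': ['d', 'e', 'f'], 'meta': {'next_token': 't2'}},
    't2': {'data': ['g'], 'meta': {}},  # no next_token: last page
}

def fake_get_followers(pagination_token=None):
    """Hypothetical stand-in for a Client method returning one page."""
    return PAGES[pagination_token]

def paginate(method, limit=None):
    """Yield items from successive pages until the results run out
    or `limit` items have been produced."""
    token, count = None, 0
    while limit is None or count < limit:
        page = method(pagination_token=token)
        for item in page['data']:
            yield item
            count += 1
            if limit is not None and count >= limit:
                return
        token = page['meta'].get('next_token')
        if token is None:   # no more pages
            return

print(list(paginate(fake_get_followers, limit=5)))  # ['a', 'b', 'c', 'd', 'e']
print(list(paginate(fake_get_followers)))           # all seven items
```

Stopping as soon as `limit` is reached avoids requesting a page whose results would be discarded, which matters when every call counts against a rate limit.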

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

## 12.9.1 Determining an Account’s Followers  
* Use a `Paginator` to invoke the `Client` object’s `get_users_followers` method
* Calls the Twitter API method
> `/2/users/:id/followers`
* Returns followers in groups of 100 by default
* Can request up to 1000 at a time
* We’ll grab 10 of NASA’s followers, five at a time, so we receive two pages of results
* Create a list to store the followers’ Twitter user names

In [11]:
followers = []

### Creating a `Paginator`
* `Paginator` to call `get_users_followers` for NASA’s account  

In [12]:
paginator = tweepy.Paginator(
    client.get_users_followers, nasa.data.id, max_results=5)

* The arguments are the method to call and any arguments that should be passed to that method:
    * `client.get_users_followers` indicates that the `Paginator` will call the `client` object’s `get_users_followers` method, 
    * `nasa.data.id` — ID number (obtained in Section 12.8) of the NASA Twitter account for which we’ll get followers, and 
    * `max_results=5` — results per page.

### Getting Results
* Use the `Paginator` to get some followers
    * `paginator.flatten(limit=10)` initiates the calls to `client.get_users_followers`
    * `limit=10` indicates the maximum number of results to obtain

In [13]:
for follower in paginator.flatten(limit=10):
    followers.append(follower.username)

In [14]:
print('Followers:', 
    ' '.join(sorted(followers, key=lambda s: s.lower())))

Followers: anishani1168136 AygulTehli19217 Bones08823 fckwildone InvictusSweden JimBourg1 kap4435 kingnibs_ MalaPrincesss PanesitoSu56935


### Automatic Paging
* `flatten` automatically “pages” through the results by making multiple calls to `client.get_users_followers` as necessary
* `flatten` makes multiple pages appear to be a sequence of results
* If you do not specify an argument to `flatten`, the `Paginator` attempts to get all of the account’s followers
    * This could take significant time due to Twitter’s rate limits 
    * `/2/users/:id/followers` can return a maximum of 1000 followers at a time, and Twitter allows up to 15 calls every 15 minutes
    * 15,000 followers every 15 minutes using Twitter’s free APIs
    * At 60,000 followers per hour, getting all of NASA’s 71,698,219 followers (Section 12.8) would take nearly 50 days
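The estimate follows directly from the numbers above: the follower count returned in Section 12.8, at most 1000 followers per call, and 15 calls per 15-minute window:

```python
follower_count = 71_698_219   # NASA's followers_count from Section 12.8
per_call = 1000               # max followers per /2/users/:id/followers call
calls_per_window = 15         # calls allowed per 15-minute window
windows_per_hour = 60 // 15

per_hour = per_call * calls_per_window * windows_per_hour  # 60,000
hours = follower_count / per_hour
days = hours / 24

print(f'{per_hour:,} followers/hour, {hours:,.0f} hours, {days:.1f} days')
# 60,000 followers/hour, 1,195 hours, 49.8 days
```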

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

## 12.9.2 Determining Whom an Account Follows 
* `Client` object’s `get_users_following` method calls the Twitter API’s 
`/2/users/:id/following` method to get a list of Twitter users an account follows
* Returns groups of 100 by default, but you can request up to 1000 at a time
* Can call this method up to 15 times every 15 minutes
* Get 10 accounts that NASA follows:

In [15]:
following = []

paginator = tweepy.Paginator(
    client.get_users_following, nasa.data.id, max_results=5)

for user_followed in paginator.flatten(limit=10):
    following.append(user_followed.username)

print('Following:', 
      ' '.join(sorted(following, key=lambda s: s.lower())))

Following: Astro_AndreD astro_anil Astro_Ayers astro_berrios Astro_ChrisW astro_deniz astro_watkins librarycongress NASASolarSystem NASASpaceSci


<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

## 12.9.3 Getting a User’s Recent Tweets 
* `Client` method `get_users_tweets` returns a `tweepy.Response` containing tweets from a specified user
* Calls the Twitter API’s `/2/users/:id/tweets` method
* Returns the 10 most recent tweets by default, but can return between 5 and 100 at a time
* Can return only an account’s 3200 most recent tweets
* May call this method up to 1500 times every 15 minutes 

## 12.9.3 Getting a User’s Recent Tweets (cont.) 
* The `data` attribute of the `tweepy.Response` contains a list of the returned tweets
    * Each object in that list has a dictionary `data` attribute containing the keys `'id'` and `'text'` for each tweet’s unique ID and its text
* Display five tweets from the `@NASA` account using its ID number that we obtained previously: 

In [16]:
nasa_tweets = client.get_users_tweets(
    id=nasa.data.id, max_results=5)

for tweet in nasa_tweets.data:
    print(f"NASA: {tweet.data['text']}\n")

NASA: Looking to do some stargazing? We’ve put together a handy guide with tips to help you find the best times and locations for enjoying the night sky: https://t.co/S0HWQqi2G6 https://t.co/MEibyzoCbQ

NASA: Our #Crew5 mission returned from the @Space_Station, @POTUS announced our proposed budget for the 2024 fiscal year, and the prototype spacesuit for our return to the lunar surface was revealed—This Week at NASA. 

Subscribe for weekly updates at https://t.co/MGGi7zOQWF https://t.co/VmhLzS9iee

NASA: RT @NASAHubble: This newly released Hubble image shows M55 – a loosely concentrated globular star cluster about 20,000 light-years away.…

NASA: "People need to realize not to make assumptions. Be open and curious about the abilities of everybody."
 
Dana Bolles is an External Information Technology Lead at NASA Headquarters. Discover how her community shapes her life, professionally and personally https://t.co/k2Ct6U4yj1 https://t.co/nuaHChhTRZ

NASA: When Irish skies are smiling...



<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

## 12.9.3 Getting a User’s Recent Tweets (cont.)
* We called the `get_users_tweets` method directly and used the keyword argument `max_results` to specify the number of tweets to retrieve
* For more than the maximum number of tweets per call (100), use a `Paginator` to call `get_users_tweets` 
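How many calls might that take? Assuming full pages of 100 tweets and the 3200-most-recent-tweets cap noted earlier, this small sketch (the helper name `calls_needed` is hypothetical) computes the minimum number of `get_users_tweets` requests:

```python
import math

def calls_needed(requested, per_call=100, cap=3200):
    """Hypothetical helper: minimum number of calls to retrieve
    `requested` tweets, given the per-call maximum and history cap."""
    available = min(requested, cap)
    return math.ceil(available / per_call)

print(calls_needed(250))     # 3
print(calls_needed(10_000))  # 32 (only the 3200 most recent are available)
```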

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

### Grabbing Recent Tweets from Your Own Timeline
* `Client` method `get_home_timeline` gets tweets from your home timeline
    * your tweets and retweets, as well as tweets and retweets from the Twitter users you follow
> `client.get_home_timeline()`
* Calls Twitter’s `/2/users/:id/timelines/reverse_chronological` method 
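* Returns up to 100 tweets by default
* Requires user-context authentication (consumer key and secret plus access token and secret), not just a bearer token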

# 12.10 Searching Recent Tweets; Intro to Twitter v2 API Search Operators 
* `Client` method `search_recent_tweets` 
    * Returns tweets from the last seven days that match a query string you provide
    * Calls Twitter method `/2/tweets/search/recent`, 
    * Returns **10 tweets per call by default**; the keyword argument **`max_results`** may be set from 10 up to **100**
    * Fewer than 10 tweets may match the specified query string

### Utility Function `print_tweets` from `tweetutilities.py`
* Receives the `Response` returned by `search_recent_tweets` and, for each tweet, displays the sender’s `username` and the tweet’s `text`
* If the tweet is not in English and the `tweet.lang` is not `'und'` (undefined), we’ll also translate the tweet to English  

In [17]:
from tweetutilities import print_tweets

```python
from deep_translator import GoogleTranslator

def print_tweets(tweets):
    """For each tweet in tweets, display the username of the sender
    and tweet text. If the language is not English, translate the text 
    with the deep-translator library's GoogleTranslator."""
    # translator to autodetect source language and return English
    translator = GoogleTranslator(source='auto', target='en')

    for tweet, user in zip(tweets.data, tweets.includes['users']):
        print(f'{user.username}:', end=' ')

        if 'en' in tweet.lang:
            print(f'{tweet.text}\n')
        elif 'und' not in tweet.lang: # translate to English first
            print(f'\n  ORIGINAL: {tweet.text}')
            print(f'TRANSLATED: {translator.translate(tweet.text)}\n')
        else: # language undetermined ('und'), so display text as is
            print(f'{tweet.text}\n')
```

### Searching for Specific Words
* Call `Client` object’s `search_recent_tweets` method to search for 10 recent tweets about the Webb Space Telescope
* Returns a `Response` object in which the data attribute contains a list of matching tweets

In [18]:
tweets = client.search_recent_tweets(
    query='Webb Space Telescope -is:retweet', 
    expansions=['author_id'], tweet_fields=['lang']) 

* `query` keyword argument specifies the query string containing your search criteria
* Twitter returns only each tweet’s unique ID and text by default
* `'lang'` is an additional field you may request via the `tweet_fields` parameter
* **Complete list of tweet fields**
> https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/tweet
* The **expansion `'author_id'`** indicates that for each tweet, Twitter also should return the **user JSON object for the user who sent the tweet**—`id`, `name` and `username` by default
* Tweepy places the **expansion objects** in the **`Response`’s `includes` dictionary attribute**
    * For the `'author_id'` expansion, a **list of tweet authors** is stored with the key **`'users'`**
    * `print_tweets` assumes **each tweet has a corresponding user at the same position in this list**
    * The following expression in line 8 of `print_tweets` creates tuples in which the first element represents a tweet and the second element represents the user object for the sender
    > `zip(tweets.data, tweets.includes['users'])`
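Pairing tweets and users by position relies on the two lists lining up one-to-one, which the API does not appear to guarantee (for example, an author who sent several of the returned tweets may be listed only once). A more defensive sketch looks up each tweet's author by ID. It assumes each tweet exposes an `author_id` attribute; to be safe you can request it explicitly with `tweet_fields=['author_id', 'lang']`. The `SimpleNamespace` objects and usernames are hypothetical stand-ins for Tweepy's `Tweet` and `User` objects.

```python
from types import SimpleNamespace as NS

def pair_tweets_with_authors(tweets_data, users):
    """Return (tweet, user) pairs by matching each tweet's author_id
    to a user's id; user is None if the author wasn't returned."""
    users_by_id = {user.id: user for user in users}
    return [(tweet, users_by_id.get(tweet.author_id)) for tweet in tweets_data]

# Hypothetical stand-ins: 'bob' sent two tweets but is listed once
users = [NS(id=1, username='alice'), NS(id=2, username='bob')]
tweets_data = [NS(text='first', author_id=2),
               NS(text='second', author_id=1),
               NS(text='third', author_id=2)]

for tweet, user in pair_tweets_with_authors(tweets_data, users):
    print(f'{user.username}: {tweet.text}')
# bob: first
# alice: second
# bob: third
```

In `print_tweets`, the `zip(...)` expression could be replaced with `pair_tweets_with_authors(tweets.data, tweets.includes['users'])`.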

* Display the tweets 

In [19]:
print_tweets(tweets)

avbenefits: https://t.co/TEyBIHbjih Photo: NASA's Webb Telescope spots rare star about to go supernova - Business Insider https://t.co/tYbB2cC2Y1

drhafezster: This image from the James Webb Space Telescope shows IC 5332, a spiral galaxy, in unprecedented detail.

IC 5332 lies over 29 million light-years from Earth, and has a diameter of roughly 66 000 light-years, making it about a third smaller than the Milky Way. ⬇️ https://t.co/ldZHS9FcUa

hardknoxfirst: James Webb Space Telescope captures the beauty of a rare, violent phenomena https://t.co/OozlrVVvTG

QLDriver: @SwiftOnSecurity I enjoy the fact that Ball corporation make drinks containers and instruments for the James Webb Space Telescope. https://t.co/HDX3PAB4M9

worldzalexie: NASA James Webb Space Telescope Tracking Jupiter Planet With Never Seen Details
https://t.co/LVqwUk5KRs https://t.co/CzrQdcc91c

mountainmama50: James Webb Space Telescope spots huge star about to go supernova (video, photos) https://t.co/vrjyPPiFUP

felix

### Searching with Twitter v2 API Search Operators
* Can use **Twitter search operators** in query strings to refine search results
* **Max query-string length** is limited by your developer account type:
    * For **Essentials** and **Elevated** accounts: up to **512 characters**
    * For **Academic Research** accounts: up to **1024 characters**
* Some operators are available only for Elevated accounts or higher
* The Twitter v2 operators are categorized as **standalone** or **conjunction-required**
    * **Standalone operators** can be used alone or combined with other operators in a query string
    * **Conjunction-required operators** must be combined with at least one standalone operator in a query string
* The following table shows several Twitter search operators, as well as logical AND, logical OR and logical negation capabilities
    * parentheses can be used to group query-string subexpressions
    * matching is performed using case-insensitive searching

| Example | Description |
| --- | --- |
| `python twitter` | Finds tweets containing `python` AND `twitter`. Spaces between query string terms and operators are implicitly treated as logical AND operations. In this query string, `python` and `twitter` are terms to search for—these are considered **standalone operators**.
| `python OR twitter` 	| Finds tweets containing `python` `OR` `twitter` `OR` both. The logical `OR` operator is case-sensitive.
| `planets -mars` 	| `-` (minus sign)—Finds tweets containing `planets` but not `mars`. The minus is the logical NOT operator and can be applied to any operator.
| An emoji | Use emojis as standalone operators to find tweets containing those emojis. 
| `has:hashtags`, `has:links`, `has:mentions`, `has:media`, … | You can combine these **conjunction-required operators** with standalone operators to find tweets containing hashtags, links, mentions of other users, media and more. 
| `is:retweet`, `is:reply`, `is:verified`, … | You can combine these **conjunction-required operators** with standalone operators to determine whether a tweet is a retweet, a tweet is a reply, the sender is a verified Twitter account and more. 
| `place:"New York City"` | Finds tweets that were sent near `"New York City"`. Multiword places should be quoted as shown here. 
| `from:NASA` 	| Finds tweets from the account `@NASA`.
| `to:NASA` 	| Finds tweets to the account `@NASA`. You also may use `to:id`, where `id` is the unique ID number of the user account.
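When query strings are assembled in code, a small helper can keep the operators straight and enforce the length limit. `build_query` below is a hypothetical helper, assuming the 512-character limit for Essentials and Elevated accounts. It requires at least one standalone term (so conjunction-required operators are never used alone), ORs the terms inside parentheses when asked, and negates excluded operators with `-`.

```python
def build_query(terms, any_term=False, require=(), exclude=(), max_length=512):
    """Hypothetical helper combining Twitter v2 search operators.

    terms:    standalone operators (keywords, hashtags, from:NASA, ...)
    any_term: if True, join terms with OR inside parentheses
    require:  conjunction-required operators, e.g. 'has:links'
    exclude:  operators to negate with '-', e.g. 'is:retweet'
    """
    if not terms:
        raise ValueError('at least one standalone term is required')
    if any_term and len(terms) > 1:
        core = '(' + ' OR '.join(terms) + ')'
    else:
        core = ' '.join(terms)  # spaces mean logical AND
    parts = [core, *require, *('-' + op for op in exclude)]
    query = ' '.join(parts)
    if len(query) > max_length:
        raise ValueError(f'query has {len(query)} characters; limit is {max_length}')
    return query

print(build_query(['python', 'twitter'], any_term=True, exclude=['is:retweet']))
# (python OR twitter) -is:retweet
print(build_query(['from:NASA'], require=['has:links']))
# from:NASA has:links
```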

### Operator Documentation and Tutorial
* All operators with examples of each  
> https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query
* Twitter’s tutorial on building high-quality Twitter v2 API query strings to obtain the targeted results
> https://developer.twitter.com/en/docs/tutorials/building-high-quality-filters
* Twitter online tool to help you build Twitter v2 API query strings
> https://developer.twitter.com/apitools/query?query=

### Searching for Tweets From NASA Containing Links
* Use `from` and `has:links` operators to get recent tweets from `NASA` that contain hyperlinks

In [20]:
tweets = client.search_recent_tweets(
    query='from:NASA has:links', 
    expansions=['author_id'], tweet_fields=['lang'])

In [21]:
print_tweets(tweets)

NASA: Looking to do some stargazing? We’ve put together a handy guide with tips to help you find the best times and locations for enjoying the night sky: https://t.co/S0HWQqi2G6 https://t.co/MEibyzoCbQ



### Searching for a Hashtag
* Get tweets containing the hashtag `#metaverse`

In [22]:
tweets = client.search_recent_tweets(query='#metaverse', 
    expansions=['author_id'], tweet_fields=['lang'])

In [23]:
print_tweets(tweets)

ihsandenizli: 
  ORIGINAL: RT @hmutlu72: OXRO, piyasa koşullarına göre otomatik kararlar verir.
#oxro #bitcoin #dxgm #dexgame #btc #binance #coinbase #crypto #machine…
TRANSLATED: RT @hmutlu72: OXRO makes automatic decisions based on market conditions.
#oxro #bitcoin #dxgm #dexgame #btc #binance #coinbase #crypto #machine...

HanbalYunus: 
  ORIGINAL: RT @hmutlu72: Yapay zeka ile geliştirilmiş OXRO, kripto paralar için daha doğru kararlar verebilir.
#oxro #bitcoin #dxgm #dexgame #btc #bin…
TRANSLATED: RT @hmutlu72: AI-enhanced OXRO can make more accurate decisions for cryptocurrencies.
#oxro #bitcoin #dxgm #dexgame #btc #bin...

0xBigWin0xWin: RT @bcvvirtual: ✅AI ✅Metaverse ✅Utility NFTs ✅Partnerships

Everything is in place for the biggest launch of 2023.

Join our telegram airdr…

AldenBerge25462: RT @paraverse_world: Welcome to PARAVERSE, the Augmented World.
Explore, interact and play in a world of endless possibilities.
Be the prou…

JoyceCook731829: RT @paraverse_world: Welcome t

# 12.11 Spotting Trends: Twitter Trends API
**Note: At the time of this writing, Twitter had not yet migrated their Trending Topics APIs from v1.1 to v2. The v1.1 APIs used in this section are accessible only to Twitter Developer accounts with “Elevated” access and higher.**

* If a topic **“goes viral,”** thousands or even millions of people could tweet about it
* Twitter calls these **trending topics** and maintains lists of them worldwide
* Via the Twitter v1.1 Trends API, you can get lists of locations with trending topics and lists of the top 50 trending topics for each location
* To use the v1.1 APIs in Tweepy, initialize an object of class `OAuth2BearerHandler` with your bearer token, then create an `API` object that uses the `OAuth2BearerHandler` object to authenticate with Twitter:

In [24]:
auth = tweepy.OAuth2BearerHandler(keys.bearer_token)

api = tweepy.API(auth=auth, wait_on_rate_limit=True)

## 12.11.1 Places with Trending Topics 
* See how to find places with trending topics: https://learning.oreilly.com/videos/python-fundamentals/9780135917411/9780135917411-PFLL_Lesson12_15

## 12.11.2 Getting a List of Trending Topics 
* Via Tweepy `API`’s **`get_place_trends` method** 
* Calls **Twitter Trends API’s [`trends/place` method](https://developer.twitter.com/en/docs/trends/trends-for-location/api-reference/get-trends-place)**
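* Returns the top 50 trending topics for a location, specified by its **Yahoo! Where On Earth ID (WOEID)**
* The WOEID `1` represents the entire world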
* [Look up WOEIDs](http://www.woeidlookup.com) 
* Look up WOEIDs programmatically using **Yahoo!’s web services** via [Python libraries like `woeid`](https://github.com/Ray-SunR/woeid)

### Worldwide Trending Topics 

In [25]:
world_trends = api.get_place_trends(id=1)  # list containing one dictionary

* **`'trends'` key** refers to a list of dictionaries representing each trend

In [26]:
trends_list = world_trends[0]['trends']

* Each trend has **`name`**, **`url`**, **`promoted_content`** (whether it's an advertisement), **`query`** and **`tweet_volume`** keys

In [27]:
trends_list[0]

{'name': '#SaudiArabianGP',
 'url': 'http://twitter.com/search?q=%23SaudiArabianGP',
 'promoted_content': None,
 'query': '%23SaudiArabianGP',
 'tweet_volume': 228944}

### Worldwide Trending Topics (cont.)
* For **trends with more than 10,000 tweets**, the `tweet_volume` is the number of tweets; otherwise, it’s `None`
* Filter the list so that it contains only trends with more than 10,000 tweets:

In [28]:
trends_list = [t for t in trends_list if t['tweet_volume']]

* Sort the trends in _descending_ order by `tweet_volume`:

In [29]:
from operator import itemgetter 

In [30]:
trends_list.sort(key=itemgetter('tweet_volume'), reverse=True) 

### Worldwide Trending Topics (cont.)
* Display the names of the trending topics in descending order of tweet volume; the first five are the **top five trending topics**

In [31]:
for trend in trends_list:
    print(trend['name'])

Fenerbahçe
#SaudiArabianGP
Alonso
Antony
Mother's Day
Nana
Credit Suisse
#MUFC
Verstappen
Fulham
Checo
Lewis
Russell
Hamilton
#FACup
Ferrari
Mitrovic
Sancho
Rennes
Kanaga
Mete Kalkavan
Wembley
Brighton
Sabitzer
Stroll
Aston Martin
La FIA
Willian
Safety Car
#MUNFUL
Sainz
Maguire
McLaren
Red Bull
Old Trafford
Bruno Fernandes
Weghorst
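To display only the top five, you could slice the sorted list with `trends_list[:5]`. As an alternative, the standard library's `heapq.nlargest` selects the largest items directly, without sorting the entire list. The sketch below uses made-up sample data shaped like the trend dictionaries above; the names `trends`, `with_volume` and `top_five` are illustrative only.

```python
import heapq
from operator import itemgetter

# Made-up sample data shaped like the dictionaries in trends_list
trends = [
    {'name': 'TopicA', 'tweet_volume': 15000},
    {'name': 'TopicB', 'tweet_volume': None},  # fewer than 10,000 tweets
    {'name': 'TopicC', 'tweet_volume': 250000},
    {'name': 'TopicD', 'tweet_volume': 40000},
    {'name': 'TopicE', 'tweet_volume': 12000},
    {'name': 'TopicF', 'tweet_volume': 98000},
    {'name': 'TopicG', 'tweet_volume': 11000},
]

# Keep only trends that report a volume (None means fewer than 10,000 tweets)
with_volume = [t for t in trends if t['tweet_volume']]

# Select the five trends with the largest tweet_volume, largest first
top_five = heapq.nlargest(5, with_volume, key=itemgetter('tweet_volume'))

for trend in top_five:
    print(trend['name'])  # TopicC, TopicF, TopicD, TopicA, TopicE
```

With only about 50 trends per location, either approach is fast; `nlargest` simply states the "top N" intent directly.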


------       

# 12.12 Cleaning/Preprocessing Tweets for Analysis
* **Data cleaning** is one of data scientists' most common tasks 
* Some NLP tasks for normalizing tweets
    * Converting all text to the same case
    * Removing `@`-mentions, duplicate tweets, and hashtags (or just their `#` symbols)
    * Removing excess whitespace, punctuation, **stop words**, URLs
    * Removing `RT` (retweet) and `FAV` (favorite) 
    * **Stemming** and **lemmatization**
    * **Tokenization**
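A few of these normalization steps can be sketched with the standard library's `re` module. `normalize_tweet` is a hypothetical helper, not part of any library used in this chapter, and its regular expressions are simplified assumptions (for example, any run of non-whitespace after `http://` or `https://` is treated as a URL). The tweet-preprocessor library discussed next handles more cases.

```python
import re

def normalize_tweet(text):
    """Hypothetical helper applying a few simple normalization steps."""
    text = re.sub(r'https?://\S+', '', text)  # remove URLs (simplified pattern)
    text = re.sub(r'^RT\b\s*', '', text)      # remove a leading RT (retweet) marker
    text = re.sub(r'@\w+', '', text)          # remove @-mentions
    text = text.replace('#', '')              # keep hashtag words, drop the # symbol
    text = text.lower()                       # convert all text to the same case
    return ' '.join(text.split())             # collapse excess whitespace

print(normalize_tweet('RT @nasa Check out #Mars   https://nasa.gov'))  # check out mars
```

The `\b` after `RT` ensures that a word that merely starts with those letters, such as `RTX`, is left alone.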

### [**tweet-preprocessor**](https://github.com/s/preprocessor) Library and TextBlob Utility Functions
* `pip install tweet-preprocessor`
* Can automatically remove any combination of:

| Option constant | Removes |
| :--- | :--- |
| **`OPT.MENTION`** | @-Mentions (e.g., `@nasa`) |
| **`OPT.EMOJI`** | Emoji |
| **`OPT.HASHTAG`** | Hashtag (e.g., `#mars`) |
| **`OPT.NUMBER`** | Number |
| **`OPT.RESERVED`** | Reserved Words (`RT` and `FAV`) |
| **`OPT.SMILEY`** | Smiley |
| **`OPT.URL`** | URL |

### Cleaning a Tweet Containing a Reserved Word and a URL
* The tweet-preprocessor library’s module name is **`preprocessor`**

In [32]:
import preprocessor as p

In [33]:
p.set_options(p.OPT.URL, p.OPT.RESERVED)

In [34]:
tweet_text = 'RT A sample retweet with a URL https://nasa.gov'

In [35]:
p.clean(tweet_text)

'A sample retweet with a URL'

------

# 12.13 Twitter Streaming API
* Your app can receive tweets as they occur in real-time
* Based on the Twitter Statistics page at [InternetLiveStats.com](http://www.internetlivestats.com/twitter-statistics/)
    * **over 10,000 tweets per second**
    * approximately **880 million tweets per day**
* Most developer accounts are subject to a **tweet cap** — a maximum number of tweets per month that an account’s Twitter apps can acquire using the Twitter APIs
    * 500,000 for Essentials accounts 
    * two million for Elevated accounts
    * academic research and paid accounts can get more

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

## 12.13.1 Creating a Subclass of `StreamingClient` 
* A stream uses a **persistent** connection to **push** tweets to your app
* Streaming rate varies, based on search criteria specified with **`StreamRule`s** 
* Twitter uses all the `StreamRule`s you set to find tweets, including those set previously
* You may want to **delete existing `StreamRule`s before creating new ones**

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

## 12.13.1 Creating a Subclass of `StreamingClient` (cont.)
* Create a subclass of Tweepy’s `StreamingClient` class to process the tweet stream
* Tweepy calls the methods on an object of this class as it receives each new tweet (or other message, such as an error) from Twitter
    * `on_connect(self)` is called when your app successfully connects to the Twitter stream
    * `on_response(self, response)` is called when a response arrives from the Twitter stream; its `response` parameter is a Tweepy `StreamResponse` named tuple object containing the tweet data, any expansion objects you requested and more
* `StreamingClient` already defines these and other "on_" methods 
* Override only the methods your app needs
* `StreamingClient` methods
> https://docs.tweepy.org/en/latest/streamingclient.html  

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

### Class `TweetListener`
`StreamingClient` subclass `TweetListener` is defined in `tweetlistener.py`

```python
# tweetlistener.py
"""StreamingClient subclass that processes tweets as they arrive."""
from deep_translator import GoogleTranslator
import tweepy

class TweetListener(tweepy.StreamingClient):
    """Handles incoming Tweet stream."""
```

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

### Class `TweetListener`: `__init__` Method 
* Called when you create a new `TweetListener` object
* `bearer_token` is used to authenticate with Twitter
* `limit` parameter is the number of tweets to process
* `self.tweet_count` tracks the number of tweets processed so far
* `self.TWEET_LIMIT` stores the limit (the uppercase name indicates it's used as a constant)
* `self.translator` is a `GoogleTranslator` object for translating tweets into English
* The `super().__init__` call passes the `bearer_token` to the superclass’s `__init__` and sets `wait_on_rate_limit=True` so Tweepy waits, rather than failing, when a rate limit is reached

```python
    def __init__(self, bearer_token, limit=10):
        """Create instance variables for tracking number of tweets."""
        self.tweet_count = 0
        self.TWEET_LIMIT = limit
        
        # GoogleTranslator object for translating tweets to English 
        self.translator = GoogleTranslator(source='auto', target='en')

        super().__init__(bearer_token, wait_on_rate_limit=True)  

```

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

### Class `TweetListener`: `on_connect` Method 
* Called when your app successfully connects to the Twitter stream

```python
    def on_connect(self):
        """Called when your connection attempt is successful, enabling 
        you to perform appropriate application tasks at that point."""
        print('Connection successful\n')
```

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

### Class `TweetListener`: `on_response` Method 
* Called when each tweet arrives
* Its `response` parameter is a Tweepy `StreamResponse` named tuple object containing:
    * `data`: the tweet’s attributes
    * `includes`: any requested expansion objects
    * `errors`: any errors that occurred
    * `matching_rules`: the `StreamRule`s that the returned tweet matched
* This example uses an expansion to include in the `StreamResponse` the user JSON object for each tweet’s sender
    * Twitter also returns user objects for accounts mentioned in the tweet’s text

```python
    def on_response(self, response):
        """Called when Twitter pushes a new tweet to you."""
        
        try:
            # get username of user who sent the tweet
            username = response.includes['users'][0].username
            print(f'Screen name: {username}')
            print(f'   Language: {response.data.lang}')
            print(f' Tweet text: {response.data.text}')

            if response.data.lang != 'en' and response.data.lang != 'und':
                english = self.translator.translate(response.data.text)
                print(f' Translated: {english}')

            print()
            self.tweet_count += 1 
        except Exception as e:
            print(f'Exception occurred: {e}')
            self.disconnect()
            
        # if TWEET_LIMIT is reached, terminate streaming
        if self.tweet_count == self.TWEET_LIMIT:
            self.disconnect()
```

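* The `username` assignment gets the sender’s username
    * Element 0 of `response.includes['users']` contains the tweet sender’s user object
    * Subsequent elements would contain accounts mentioned in the tweet
* The three `print` statements that follow display the tweet sender’s `username`, the tweet’s language (`lang`) and the tweet’s `text`
* If the tweet is neither English (`'en'`) nor of undetermined language (`'und'`), the `if` statement translates it to English and displays the translation
* `self.tweet_count += 1` increments the number of tweets processed
* If any exception occurs, the `except` clause displays it and disconnects from the stream
* The final `if` statement terminates streaming by calling `disconnect` once `TWEET_LIMIT` is reached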


<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

## 12.13.2 Initiating Stream Processing

### Creating a TweetListener 
* `StreamingClient` subclass `TweetListener` manages the connection to the Twitter stream and receives and processes the tweets

In [36]:
from tweetlistener import TweetListener

tweet_listener = TweetListener(
    bearer_token=keys.bearer_token, limit=3)

### Redirecting the Standard Error Stream to the Standard Output Stream
* When a `StreamingClient` subclass’s `disconnect` method is called to terminate the tweet stream, the following message is sent to `sys.stderr`, which is not synchronized with the standard output stream:
> `Stream connection closed by Twitter`
* As a result, that message is sometimes interspersed with other messages that this app sends to the standard output stream
* To prevent this, redirect the standard error stream to the standard output stream

In [37]:
import sys

sys.stderr = sys.stdout
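Reassigning `sys.stderr` lasts for the rest of the session. If you'd rather limit the redirection to one call, the standard library's `contextlib.redirect_stderr` restores the original stream automatically. This is a minimal sketch: `report_closed` is a hypothetical stand-in for code that writes to `sys.stderr`, and the approach assumes the message is written to whatever `sys.stderr` refers to at the moment it's emitted.

```python
import contextlib
import sys

def report_closed():
    """Hypothetical stand-in for code that writes a status message to stderr."""
    print('Stream connection closed by Twitter', file=sys.stderr)

original_stderr = sys.stderr

# Inside the with block, writes to sys.stderr go to sys.stdout instead
with contextlib.redirect_stderr(sys.stdout):
    report_closed()  # in the app, this would be the tweet_listener.filter(...) call

# On leaving the block, the original sys.stderr is restored automatically
assert sys.stderr is original_stderr
```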

### Deleting Existing Stream Rules
* Twitter uses all the `StreamRule`s you’ve specified previously to filter the tweets it pushes to your app
* Twitter does not automatically remove your `StreamRule`s after you terminate the tweet stream
* If your app filters the tweet stream with different rules each time you run it, you should delete any existing `StreamRule`s before creating new ones

* Get the `StreamRule`s by calling your `StreamingClient`’s `get_rules` method
    * `Response`’s `data` attribute contains a `list` of `StreamRule`s

In [38]:
rules = tweet_listener.get_rules().data

* Get the rule IDs

In [39]:
# execute only if you have rules previously saved; 
# Twitter recently started deleting saved rules that have not been used recently
rule_ids = [rule.id for rule in rules]

* Call `StreamingClient`’s `delete_rules` method with a list of rule IDs to delete
    * response contains a `'summary'` dictionary with information about the number of deleted rules

In [40]:
# execute only if you have rules previously saved; 
tweet_listener.delete_rules(rule_ids)    

Response(data=None, includes={}, errors=[], meta={'sent': '2023-03-19T19:36:17.226Z', 'summary': {'deleted': 1, 'not_deleted': 0}})
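The preceding snippets must be skipped when no rules are stored, because `get_rules().data` is `None` in that case. Below is a minimal sketch of a hypothetical helper, `delete_all_rules`, that handles both cases. It assumes only that the client object provides `get_rules` and `delete_rules` as `StreamingClient` does, so it works with any `StreamingClient` subclass in this chapter.

```python
def delete_all_rules(client):
    """Delete every StreamRule stored for the client's app (hypothetical helper).

    Returns the number of rule IDs submitted for deletion (0 if none exist).
    """
    rules = client.get_rules().data  # None when no rules are stored
    if not rules:
        return 0  # nothing to delete, so skip the delete_rules call
    rule_ids = [rule.id for rule in rules]
    client.delete_rules(rule_ids)
    return len(rule_ids)
```

For example, `delete_all_rules(tweet_listener)` could replace the three preceding snippets.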

### Creating and Adding a Stream Rule
* Create a rule to filter the live tweet stream looking for tweets about football
* Then, add the rule
    * `add_rules`’ Response contains a `'summary'` dictionary with information about the `StreamRule` you set and whether it was valid

In [41]:
filter_rule = tweepy.StreamRule('football')

In [42]:
tweet_listener.add_rules(filter_rule)

Response(data=[StreamRule(value='football', tag=None, id='1637538501104136192')], includes={}, errors=[], meta={'sent': '2023-03-19T19:36:18.446Z', 'summary': {'created': 1, 'not_created': 0, 'valid': 1, 'invalid': 0}})
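A rule's value can combine keywords with operators such as `lang:en` (used later in this chapter) and `-is:retweet`, which asks Twitter to exclude retweets on the server side instead of filtering them in your listener. Below is a minimal sketch of a hypothetical helper, `build_rule_value`, that composes such a string. It isn't part of Tweepy, and it covers only these two operators.

```python
def build_rule_value(*keywords, lang=None, exclude_retweets=False):
    """Compose a filtered-stream rule string (hypothetical helper).

    Space-separated terms must all match (logical AND).
    """
    parts = list(keywords)
    if lang:
        parts.append(f'lang:{lang}')   # restrict to one language
    if exclude_retweets:
        parts.append('-is:retweet')    # negated operator: drop retweets
    return ' '.join(parts)

print(build_rule_value('football', lang='en', exclude_retweets=True))
# football lang:en -is:retweet
```

You could then pass the result to `tweepy.StreamRule`, as in `tweet_listener.add_rules(tweepy.StreamRule(build_rule_value('football', lang='en')))`.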

### Starting the Tweet Stream
* `StreamingClient`'s `filter` method begins streaming 
    * `expansions` argument indicates that we’d like the response for each tweet to include the sender’s user JSON object
    * `tweet_fields` argument indicates that the tweet’s language should be included in each response’s tweet JSON object

In [43]:
tweet_listener.filter( 
    expansions=['author_id'], tweet_fields=['lang'])

Connection successful

Screen name: hunnay_x
   Language: en
 Tweet text: She’s the reason then https://t.co/NiKDPgD1Dc

Screen name: Silvanosbhandit
   Language: en
 Tweet text: @Blue_Footy Twickenham not Wembley

Screen name: RichardEghosa
   Language: en
 Tweet text: RT @Blue_Footy: I'm not sure I can stand playing in small Craven Cottage for four years. Wembley's big pitch is better for our football but…

Stream connection closed by Twitter


### Asynchronous vs. Synchronous Streams
* Tweepy supports asynchronous tweet streams by creating a subclass of `AsyncStreamingClient`
* Allows your application to continue executing while your listener waits to receive tweets
* Convenient in GUI applications, so users can continue interacting with other parts of the application while tweets arrive
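The following plain-`asyncio` sketch illustrates the idea without Tweepy. It does not use `AsyncStreamingClient`. Instead, `fake_tweet_stream` is a hypothetical stand-in for a network stream, and `other_work` represents work, such as handling GUI events, that keeps running while the listener awaits tweets.

```python
import asyncio

async def fake_tweet_stream(tweets, delay):
    """Hypothetical stand-in for a network stream: yields one tweet per delay."""
    for tweet in tweets:
        await asyncio.sleep(delay)  # while waiting, the event loop runs other tasks
        yield tweet

async def listen(tweets, received):
    """Consume the stream, recording each tweet as it arrives."""
    async for tweet in fake_tweet_stream(tweets, delay=0.01):
        received.append(tweet)

async def other_work(log):
    """Work that keeps running while the listener waits for tweets."""
    for i in range(3):
        log.append(f'tick {i}')
        await asyncio.sleep(0.005)

async def main():
    received, log = [], []
    # Run the listener and the other work concurrently on one event loop
    await asyncio.gather(listen(['t1', 't2', 't3'], received), other_work(log))
    return received, log

received, log = asyncio.run(main())
print(received)  # ['t1', 't2', 't3']
print(log)       # ['tick 0', 'tick 1', 'tick 2']
```

In a Jupyter notebook, which already runs an event loop, you would write `await main()` instead of calling `asyncio.run`.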

------

# 12.14 Tweet Sentiment Analysis 
* Political researchers might use sentiment analysis during elections to understand how people feel about specific politicians and issues, and **how they're likely to vote**
* Companies might use it to see what people are saying about their products and competitors’ products
* Script `sentimentlistener.py` checks sentiment on a specified topic for a specified number of tweets

In [44]:
run sentimentlistener.py football 10

- mhussein_fcb: These people have never watched a game of football I’m convinced

+ cuongdien2468: @Football__Tweet The first pic Mitrovic slandering the ref said it all

+ MCFCTone: Enjoy a couple of the City podcasts but one of the more established ones they sound so bored when talking City / football, why even bother 😁😁

+ FrannyReilly: @Joanna54006491 Where you been hiding lol 😂 I've deleted people and I'm suddenly finding people I used to message Awww I've just had a quiet one,watched football and F1 and packed a few things for leaving tomorrow,back to nightshifts next week xx

- _aas20_: I skipped a lime to watch football and FLOW just fucking up😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡

  Adriana10474322: Football vfk The strongest round games 👍👍 Barcelona vs Real Madrid • Mobile ➤👍 • PC ➤👍 #ElClásico

- wendlofc: @MambaSZN explain how that was bad in football terms

+ YoungArab61: We might not be playing the best football but baldy always has a plan.

+ HabibiCity21: @Predaxxx @ESPNFC Lol

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

### Class `SentimentListener`

* Import the keys.py file and the libraries used throughout the script

```python
# sentimentlistener.py
"""Script that searches for tweets that match a search string
and tallies the number of positive, neutral and negative tweets."""
import keys
import preprocessor as p 
import sys
from textblob import TextBlob
import tweepy
```

### Class `SentimentListener`: `__init__` Method
* Receives:
    * `bearer_token` for authentication
    * `sentiment_dict` dictionary in which we’ll keep track of the tweet sentiments
    * `topic` we’re searching for so we can ensure that it appears in the tweet text  
    * `limit` of tweets to process (not including the ones we eliminate)
* Each of these is stored in the current `SentimentListener` object (`self`)

```python
class SentimentListener(tweepy.StreamingClient):
    """Handles incoming Tweet stream."""

    def __init__(self, bearer_token, sentiment_dict, topic, limit=10):
        """Configure the SentimentListener."""
        self.sentiment_dict = sentiment_dict
        self.tweet_count = 0
        self.topic = topic
        self.TWEET_LIMIT = limit

        # set tweet-preprocessor to remove URLs/reserved words
        p.set_options(p.OPT.URL, p.OPT.RESERVED) 
        super().__init__(bearer_token, wait_on_rate_limit=True)
```

### Class `SentimentListener`: `on_response` Method
* If the tweet is not a retweet (its text does not start with `'RT'`):
    * Gets the tweet’s text and cleans it with `p.clean`
    * Returns without processing the tweet if its text does not contain `topic` (ignoring case)
    * Uses a `TextBlob` to check the tweet’s sentiment polarity and update the `sentiment_dict` accordingly
    * Gets the sender’s `username` from `response.includes['users']` (we’ll use an expansion to include this user object)
    * Prints the tweet text preceded by `+` for positive sentiment, a space for neutral sentiment or `-` for negative sentiment
    * Increments the `tweet_count`, then disconnects from the tweet stream if `TWEET_LIMIT` has been reached

```python
    def on_response(self, response):
        """Called when Twitter pushes a new tweet to you."""

        # if the tweet is not a retweet
        if not response.data.text.startswith('RT'):
            text = p.clean(response.data.text) # clean the tweet

            # ignore tweet if the topic is not in the tweet text
            if self.topic.lower() not in text.lower():
                return

            # update self.sentiment_dict with the polarity
            blob = TextBlob(text)
            if blob.sentiment.polarity > 0:
                sentiment = '+'
                self.sentiment_dict['positive'] += 1 
            elif blob.sentiment.polarity == 0:
                sentiment = ' '
                self.sentiment_dict['neutral'] += 1 
            else:
                sentiment = '-'
                self.sentiment_dict['negative'] += 1 

            # display the tweet
            username = response.includes['users'][0].username
            print(f'{sentiment} {username}: {text}\n')

            self.tweet_count += 1 # track number of tweets processed

            # if TWEET_LIMIT is reached, terminate streaming
            if self.tweet_count == self.TWEET_LIMIT:
                self.disconnect()
```
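The `if`/`elif`/`else` statement in `on_response` maps a polarity score to a display symbol and a tally. The sketch below moves that logic into a small function so you can test it without connecting to Twitter or running TextBlob. `classify_polarity` is hypothetical and isn't part of `sentimentlistener.py`.

```python
def classify_polarity(polarity, sentiment_dict):
    """Return '+', ' ' or '-' for a polarity score and update the tally."""
    if polarity > 0:
        sentiment_dict['positive'] += 1
        return '+'
    if polarity == 0:
        sentiment_dict['neutral'] += 1
        return ' '
    sentiment_dict['negative'] += 1
    return '-'

tally = {'positive': 0, 'neutral': 0, 'negative': 0}
symbols = [classify_polarity(p, tally) for p in (0.5, 0.0, -0.3, 0.1)]
print(symbols)  # ['+', ' ', '-', '+']
print(tally)    # {'positive': 2, 'neutral': 1, 'negative': 1}
```

Inside `on_response`, the call would look like `sentiment = classify_polarity(TextBlob(text).sentiment.polarity, self.sentiment_dict)`.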

### Main Application
* The main application is defined in the function `main` (discussed after the code), which the `if __name__ == '__main__':` statement calls when you execute the file as a script
* `sentimentlistener.py` also can be imported into IPython or other modules to use class `SentimentListener` as we did with `TweetListener`

```python
def main():
    # get search term and number of tweets
    search_key = sys.argv[1]
    limit = int(sys.argv[2]) # number of tweets to tally

    # set up the sentiment dictionary
    sentiment_dict = {'positive': 0, 'neutral': 0, 'negative': 0}

    # create the StreamingClient subclass object
    sentiment_listener = SentimentListener(keys.bearer_token, 
        sentiment_dict, search_key, limit)

    # redirect sys.stderr to sys.stdout
    sys.stderr = sys.stdout

    # delete existing stream rules
    rules = sentiment_listener.get_rules().data
    if rules:  # data is None when no rules are stored
        rule_ids = [rule.id for rule in rules]
        sentiment_listener.delete_rules(rule_ids)

    # create stream rule
    sentiment_listener.add_rules(
        tweepy.StreamRule(f'{search_key} lang:en'))

    # start filtering English tweets containing search_key
    sentiment_listener.filter(expansions=['author_id'])

    print(f'Tweet sentiment for "{search_key}"')
    print('Positive:', sentiment_dict['positive'])
    print(' Neutral:', sentiment_dict['neutral'])
    print('Negative:', sentiment_dict['negative'])

# call main if this file is executed as a script
if __name__ == '__main__':
    main()
```

* In `main`:
    * The first two statements get the command-line arguments (the search term and the number of tweets to tally)
    * `sentiment_dict` keeps track of the tweet sentiments
    * The next statement creates the `SentimentListener`
    * `sys.stderr = sys.stdout` redirects the standard error stream to the standard output stream
    * The `get_rules` and `delete_rules` calls delete any existing `StreamRule`s
    * The `add_rules` call creates a new `StreamRule` that searches for English (`lang:en`) tweets that match the `search_key`
    * The `filter` call starts the stream; `expansions` indicates that we’d like Twitter to include the tweet sender’s user object in the response
    * The final `print` statements display the sentiment report after the stream disconnects

------

# 12.15 Geocoding and Mapping
* Collect streaming tweets, then plot their locations on an interactive map
* **Twitter disables precise location info (latitude/longitude) by default** (users must opt in to allowing Twitter to track locations) 
* A large percentage of tweets include the sender’s home location from their user profile
    * Sometimes invalid or fictitious 
* Each map marker is placed at the sender’s geocoded `location`, and its popup shows the sender’s username and the tweet text

### [**geopy** library](https://github.com/geopy/geopy)
* Setup in Section 12.6
* **Geocoding**&mdash;translate locations into **latitude** and **longitude**
* **geopy** supports dozens of **geocoding web services**, many with **free or lite tiers**
* We’ll use **OpenMapQuest geocoding service** 

### OpenMapQuest Geocoding API
* Sign-up instructions in Section 12.6
* Convert locations, such as **Boston, MA** into their **latitudes** and **longitudes**, such as **42.3602534** and **-71.0582912**, for plotting on maps


### [**folium library**](https://github.com/python-visualization/folium) and Leaflet.js JavaScript Mapping Library
* Setup in Section 12.6
* For maps, folium uses the **Leaflet.js JavaScript mapping library** to display maps in a web page 
* Folium saves maps as HTML files that you can view in your web browser

## 12.15.1 Getting and Mapping the Tweets
* We’ll use utility functions from our **`tweetutilities.py`** file and class **`LocationListener`** in **`locationlistener.py`**

### Collections Required By LocationListener
* a list (`tweets`) to store the data from the tweets we collect 
* a dictionary (`counts`) to track the total number of tweets we collect and the number that have location data

In [45]:
tweets = [] 

counts = {'total_tweets': 0, 'locations': 0}

### Creating the LocationListener 
* Collect 50 tweets about `'football'`
* `LocationListener` will use utility function `get_tweet_content` (located in `tweetutilities.py`; discussed in Section 12.15.2) to place in a dictionary the `username`, tweet `text` and user `location` from each tweet

In [46]:
from locationlistener import LocationListener

location_listener = LocationListener(
    keys.bearer_token, counts_dict=counts, tweets_list=tweets,
    topic='football', limit=50)

### Redirect sys.stderr to sys.stdout

In [47]:
import sys

sys.stderr = sys.stdout

### Delete Existing StreamRules

In [48]:
rules = location_listener.get_rules().data

rule_ids = [rule.id for rule in rules]

location_listener.delete_rules(rule_ids)    

Response(data=None, includes={}, errors=[], meta={'sent': '2023-03-19T19:36:36.330Z', 'summary': {'deleted': 1, 'not_deleted': 0}})

### Create a StreamRule
* Rule to get tweets in English (`lang:en`) about football 

In [49]:
location_listener.add_rules(
    tweepy.StreamRule('football lang:en'))

Response(data=[StreamRule(value='football lang:en', tag=None, id='1637538581475389444')], includes={}, errors=[], meta={'sent': '2023-03-19T19:36:37.793Z', 'summary': {'created': 1, 'not_created': 0, 'valid': 1, 'invalid': 0}})

### Configure and Start the Stream of Tweets
* Start streaming the tweets
    * expansion `'author_id'` gets information about the user who sent the tweet, including the `username`
    * `user_fields` argument specifies that the user information should include the account’s `'location'` 
    * `tweet_fields` argument specifies additional information to include with each tweet; in this case, the tweet’s language (`'lang'`)


In [50]:
location_listener.filter(expansions=['author_id'], 
    user_fields=['location'], tweet_fields=['lang'])

50: wandcfc: We would like to welcome new girls to ‘The Beautiful Game’ at our Wildcats sessions. Our sessions are fun, inclusive and a great place to make new friends and enjoy playing football. https://t.co/XAqB0VrN0x

Stream connection closed by Twitter


### Displaying the Location Statistics
* Check how many tweets we processed, how many had locations and the percentage that had locations

In [51]:
counts['total_tweets']

82

In [52]:
counts['locations']

50

In [53]:
print(f'{counts["locations"] / counts["total_tweets"]:.1%}')

61.0%


### Geocoding the Locations
* Use `get_geocodes` utility function (from `tweetutilities.py`; discussed in Section 12.15.2) to geocode the location of each tweet stored in the list of tweets

In [54]:
from tweetutilities import get_geocodes

bad_locations = get_geocodes(tweets)

Getting coordinates for tweet locations...
Done geocoding


* For each tweet with a valid location, the `get_geocodes` function adds the new keys `'latitude'` and `'longitude'` to that tweet’s dictionary in the `tweets` list — these will be used to plot map markers on our interactive map

### Displaying the Bad Location Statistics

In [55]:
bad_locations

5

In [56]:
print(f'{bad_locations / counts["locations"]:.1%}')

10.0%


### Cleaning the Data
* Before we plot the tweet locations on a map, let’s use a pandas `DataFrame` to clean the data
* When you create a `DataFrame` from the `tweets` list, it will contain the value `NaN` for the `'latitude'` and `'longitude'` of any tweet that does not have a valid location
* `NaN` cannot be plotted on a map, so remove any rows containing `NaN` by calling the `DataFrame`’s `dropna` method
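Calling `dropna` with no arguments removes a row if *any* of its columns contains `NaN`. Passing `subset` limits the check to the coordinate columns, so a row is dropped only when it can't be plotted. The minimal sketch below uses two made-up tweet dictionaries, one with no coordinates because geocoding failed. For this app's data the result matches `dropna()`, because only the coordinates can be missing. The `subset` form makes that intent explicit.

```python
import pandas as pd

# Made-up sample data; the second tweet has no coordinates (geocoding failed)
sample_tweets = [
    {'username': 'user1', 'text': 'football tonight', 'location': 'London',
     'latitude': 51.507408, 'longitude': -0.127699},
    {'username': 'user2', 'text': 'football!', 'location': 'Nowhere'},
]

# Missing keys become NaN in the DataFrame
sample_df = pd.DataFrame(sample_tweets)

# Drop only rows that lack a latitude or longitude
sample_df = sample_df.dropna(subset=['latitude', 'longitude'])
print(sample_df[['username', 'location']])
```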

In [57]:
import pandas as pd

In [58]:
df = pd.DataFrame(tweets)

In [59]:
df

| | username | text | location | latitude | longitude |
| ---: | :--- | :--- | :--- | ---: | ---: |
| 0 | Chrisob1985 | @AngelD53416774 Someone who doesn't know rules... | Ireland | 53.17588 | -8.146006 |
| 1 | JJShreeve | Updating my football music playlist for my fut... | London | 51.507408 | -0.127699 |
| 2 | Mango_the_OG | @lobzin_soul Amazing football watch the game | Mogwase, South Africa | -25.2747 | 27.21043 |
| 3 | Ravennfer | Arbi is going down in history! 📚 @SignalsRiley... | Charlotte, NC | 35.22286 | -80.83796 |
| 4 | FrnzJaeger | @Sachinettiyil Yesterday. Rakow Częstochowa f... | Knoxville, TN | 35.96068 | -83.92103 |
| 5 | MadVee_98 | @Football__Stage Supersport is running with it | Durban | -29.85756 | 31.02781 |
| 6 | leonsbilliards | @colin_dunlap how about go all in for football... | Wexford, PA | 40.62334 | -80.0538 |
| 7 | Kyle_xcy | In terms of general football tweets is there a... | A galaxy far far away | -26.21232 | 152.40062 |
| 8 | slickDA1st | @Olag0ke The way we watch football has changed... | Sane part of life | 26.79837 | 80.88592 |
| 9 | osmanuludag8888 | @UEFAcom @Turbine_Potsdam Referees and politic... | Türkiye cumhuriyeti | 39.066251 | 35.142286 |


In [60]:
df = df.dropna()

In [61]:
df

| | username | text | location | latitude | longitude |
| ---: | :--- | :--- | :--- | ---: | ---: |
| 0 | Chrisob1985 | @AngelD53416774 Someone who doesn't know rules... | Ireland | 53.17588 | -8.146006 |
| 1 | JJShreeve | Updating my football music playlist for my fut... | London | 51.507408 | -0.127699 |
| 2 | Mango_the_OG | @lobzin_soul Amazing football watch the game | Mogwase, South Africa | -25.2747 | 27.21043 |
| 3 | Ravennfer | Arbi is going down in history! 📚 @SignalsRiley... | Charlotte, NC | 35.22286 | -80.83796 |
| 4 | FrnzJaeger | @Sachinettiyil Yesterday. Rakow Częstochowa f... | Knoxville, TN | 35.96068 | -83.92103 |
| 5 | MadVee_98 | @Football__Stage Supersport is running with it | Durban | -29.85756 | 31.02781 |
| 6 | leonsbilliards | @colin_dunlap how about go all in for football... | Wexford, PA | 40.62334 | -80.0538 |
| 7 | Kyle_xcy | In terms of general football tweets is there a... | A galaxy far far away | -26.21232 | 152.40062 |
| 8 | slickDA1st | @Olag0ke The way we watch football has changed... | Sane part of life | 26.79837 | 80.88592 |
| 9 | osmanuludag8888 | @UEFAcom @Turbine_Potsdam Referees and politic... | Türkiye cumhuriyeti | 39.066251 | 35.142286 |


### Creating a Map with Folium
Create a folium Map on which we’ll plot the tweet locations

In [62]:
import folium

In [63]:
usmap = folium.Map(location=[39.8283, -98.5795], 
    tiles='OpenStreetMap', zoom_start=5, detect_retina=True)

* `location` keyword argument specifies a sequence containing latitude and longitude coordinates for the **map’s center point** 
    * The values in this snippet are the **geographic center of the continental United States**
    * In many places worldwide, the term `'football'` describes the sport we call soccer in the U.S., so some of the tweets we plot may be outside the U.S.
    * You can zoom using the **+** and **–** buttons at the map’s top-left, or you can drag the map with the mouse (that is, pan) to see anywhere in the world
* `zoom_start` keyword argument specifies the map’s initial zoom level; lower values show more of the world
* `detect_retina` keyword argument enables folium to detect high-resolution screens to use higher-resolution maps from `OpenStreetMap.org`

### Creating Popup Markers for the Tweet Locations
* Create `folium` `Popup` objects containing each tweet’s text and add them to the `Map`
* `DataFrame` method `itertuples` creates a named tuple from each row containing properties corresponding to each `DataFrame` column

In [64]:
for t in df.itertuples():
    text = ': '.join([t.username, t.text])
    popup = folium.Popup(text, parse_html=True)
    marker = folium.Marker((t.latitude, t.longitude), 
                           popup=popup)
    marker.add_to(usmap)

* Creates a string (`text`) containing the user’s `username` and tweet `text` 
* Creates a `folium` `Popup` to display the `text`
* Creates a `folium` `Marker`
    * tuple to specify the `Marker`’s latitude and longitude
    * `popup` keyword argument associates the tweet’s `Popup` object with the new `Marker`
* Calls the `Marker`’s `add_to` method to specify the `Map` that will display the `Marker`

### Saving the Map
* Call the `Map`’s `save` method to store the map in an HTML file, which you can then double-click to open in your web browser

In [65]:
usmap.save('tweet_map.html')

In [66]:
usmap # displays the map in the notebook

## 12.15.2 Utility Functions in `tweetutilities.py` 
### `get_tweet_content` Utility Function 
* Receives a **`StreamResponse` object (`response`)** and creates a **dictionary** containing the **tweet’s `username`, `text` and `location`**

```python
def get_tweet_content(response):
    """Return dictionary with data from tweet."""
    fields = {}
    fields['username'] = response.includes['users'][0].username
    fields['text'] = response.data.text
    fields['location'] = response.includes['users'][0].location

    return fields
```

### `get_geocodes` Utility Function 
* Receives a list of dictionaries containing tweets and **geocodes their locations**
* If geocoding is successful for a tweet, adds the **latitude** and **longitude** to the tweet’s **dictionary in `tweet_list`**
* Requires class **`OpenMapQuest`** from the **geopy module**

```python
import time  # used by get_geocodes to wait before retrying
from geopy import OpenMapQuest
import keys  # contains mapquest_key
```

```python
def get_geocodes(tweet_list):
    """Get the latitude and longitude for each tweet's location.
    Returns the number of tweets with invalid location data."""
    print('Getting coordinates for tweet locations...')
    geo = OpenMapQuest(api_key=keys.mapquest_key)  # geocoder
    bad_locations = 0  

    for tweet in tweet_list:
        processed = False
        delay = .1  # used if OpenMapQuest times out to delay next call
        while not processed:
            try:  # get coordinates for tweet['location']
                geo_location = geo.geocode(tweet['location'])
                processed = True
            except Exception:  # timed out, so wait before trying again
                print('OpenMapQuest service timed out. Waiting.')
                time.sleep(delay)
                delay += .1

        if geo_location:  
            tweet['latitude'] = geo_location.latitude
            tweet['longitude'] = geo_location.longitude
        else:  
            bad_locations += 1  # tweet['location'] was invalid
    
    print('Done geocoding')
    return bad_locations

```

### `get_geocodes` Utility Function (cont.)
* Creates the **`OpenMapQuest` object** we’ll use to geocode locations
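* Initializes **`bad_locations`**, which keeps track of the number of invalid locations in the tweets we collected
* For each tweet, attempts to **geocode the tweet’s location**
    * If the geocoding service raises an exception (typically a timeout), waits `delay` seconds, increases `delay` by 0.1 second and tries again
    * If `geocode` returns a location, adds its `latitude` and `longitude` to the tweet’s dictionary; otherwise, increments `bad_locations`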
* Prints a message that it’s done geocoding and returns the `bad_locations` value
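The `while not processed` loop in `get_geocodes` retries indefinitely. It waits 0.1 second longer after each failure. If you'd rather give up after a fixed number of attempts, one option is a helper like the hypothetical `geocode_with_retry` below, which uses the same 0.1, 0.2, 0.3, ... second wait schedule.

This is a sketch under several assumptions:
* `FlakyGeocoder` is a made-up stand-in that times out a set number of times. On success it returns an object with `latitude` and `longitude` attributes, as geopy's `Location` objects have.
* `add_coordinates` is a hypothetical variant of `get_geocodes`.
* With geopy, you would pass your `OpenMapQuest` object and catch `geopy.exc.GeocoderTimedOut` instead of the built-in `TimeoutError`.

```python
import time
from types import SimpleNamespace

class FlakyGeocoder:
    """Hypothetical stand-in for a geocoder: the first `failures` calls
    raise TimeoutError; later calls look up `known` locations."""
    def __init__(self, failures, known):
        self.failures = failures  # number of calls that time out
        self.known = known        # maps location strings to (lat, lon)
        self.calls = 0

    def geocode(self, query):
        self.calls += 1
        if self.calls <= self.failures:
            raise TimeoutError('simulated timeout')
        if query in self.known:
            lat, lon = self.known[query]
            return SimpleNamespace(latitude=lat, longitude=lon)
        return None  # no match found

def geocode_with_retry(geocoder, location, max_attempts=5, delay=0.1):
    """Return geocoder.geocode(location), retrying after timeouts.
    Returns None if all max_attempts calls time out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return geocoder.geocode(location)
        except TimeoutError:  # with geopy: geopy.exc.GeocoderTimedOut
            if attempt < max_attempts:
                time.sleep(delay * attempt)  # wait longer after each failure
    return None

def add_coordinates(tweet_list, geocoder, delay=0.1):
    """Hypothetical variant of get_geocodes with bounded retries.
    Returns the number of tweets whose location couldn't be geocoded."""
    bad_locations = 0
    for tweet in tweet_list:
        geo_location = geocode_with_retry(geocoder, tweet['location'],
                                          delay=delay)
        if geo_location:
            tweet['latitude'] = geo_location.latitude
            tweet['longitude'] = geo_location.longitude
        else:
            bad_locations += 1  # invalid location, or every attempt timed out
    return bad_locations

tweets = [{'location': 'Boston, MA'}, {'location': 'somewhere over the rainbow'}]
geocoder = FlakyGeocoder(failures=2, known={'Boston, MA': (42.36, -71.06)})
bad = add_coordinates(tweets, geocoder, delay=0.01)
print(bad)        # 1
print(tweets[0])  # {'location': 'Boston, MA', 'latitude': 42.36, 'longitude': -71.06}
```

Bounding the retries keeps the program from looping forever if the geocoding service is down. The trade-off is that a location whose attempts all time out is counted as invalid.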

## 12.15.3 Class `LocationListener`
```python
# locationlistener.py
"""Receives tweets matching a search string and stores a list of
dictionaries containing each tweet's username/text/location."""
import tweepy
from tweetutilities import get_tweet_content

class LocationListener(tweepy.StreamingClient):
    """Handles incoming Tweet stream to get location data."""
```

```python
    def __init__(self, bearer_token, counts_dict, 
                 tweets_list, topic, limit=10):
        """Configure the LocationListener."""
        self.tweets_list = tweets_list
        self.counts_dict = counts_dict
        self.topic = topic
        self.TWEET_LIMIT = limit
        super().__init__(bearer_token, wait_on_rate_limit=True)
```

```python
    def on_response(self, response):
        """Called when Twitter pushes a new tweet to you."""

        # get each tweet's username, text and location
        tweet_data = get_tweet_content(response)  

        # ignore retweets and tweets that do not contain the topic
        if (tweet_data['text'].startswith('RT') or
            self.topic.lower() not in tweet_data['text'].lower()):
            return

        self.counts_dict['total_tweets'] += 1 # it's an original tweet

        # ignore tweets with no location 
        if not tweet_data.get('location'):  
            return

        self.counts_dict['locations'] += 1 # user account has location
        self.tweets_list.append(tweet_data) # store the tweet
        print(f"{tweet_data['username']}: {tweet_data['text']}\n")
        
        # if TWEET_LIMIT is reached, terminate streaming
        if self.counts_dict['locations'] == self.TWEET_LIMIT:
            self.disconnect()
```

## 12.15.3 Class `LocationListener` (cont.)
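* `__init__` receives
    * the `bearer_token`, which it passes to the superclass (`tweepy.StreamingClient`) constructor along with `wait_on_rate_limit=True`
    * the `counts_dict` dictionary, which we use to keep track of:
        * the number of original tweets that contain the topic (`'total_tweets'`)
        * the number of those tweets that have locations (`'locations'`)
    * the `tweets_list`, in which we store the dictionaries returned by the `get_tweet_content` utility function
    * the `topic` string, so we can confirm that the topic appears in each tweet’s text
    * the number of tweets with locations to collect (`limit`, default `10`), which it stores as `TWEET_LIMIT`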
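`LocationListener` doesn't return its results. Instead, it updates the dictionary and list that the caller passes to `__init__`. Because the listener stores references to those same objects, the caller sees every update once streaming stops.

The sketch below illustrates that design with a hypothetical, network-free `ToyListener` class. The `'total_tweets'` and `'locations'` keys match the ones `on_response` increments. The caller must create both keys, set to `0`, before streaming; otherwise the first `+= 1` raises a `KeyError`.

```python
class ToyListener:
    """Hypothetical listener that, like LocationListener, stores references
    to caller-owned containers instead of returning results."""
    def __init__(self, counts_dict, tweets_list):
        self.counts_dict = counts_dict  # same dict object the caller holds
        self.tweets_list = tweets_list  # same list object the caller holds

    def handle(self, tweet_data):
        self.counts_dict['total_tweets'] += 1
        self.tweets_list.append(tweet_data)

# The caller creates the containers, including the keys the listener updates
counts = {'total_tweets': 0, 'locations': 0}
tweets = []
listener = ToyListener(counts, tweets)
listener.handle({'username': 'a', 'text': 'hi', 'location': 'NYC'})
print(counts)       # {'total_tweets': 1, 'locations': 0}
print(len(tweets))  # 1
```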

## 12.15.3 Class `LocationListener` (cont.)
* In method `on_response`
    * Calls `get_tweet_content` to get each tweet’s username, text and location
    * Ignores the tweet if it is a retweet or if its text does not include the topic we’re searching for
    * Adds 1 to the value of the `'total_tweets'` key in `counts_dict` to track the number of original tweets
    * Ignores tweets that have no location data
    * Adds 1 to the value of `counts_dict`’s `'locations'` key to indicate that we found a tweet with a location
    * Appends the `tweet_data` dictionary to `tweets_list`
    * Displays the tweet’s username and text so you can see that the app is making progress
    * Checks whether the `TWEET_LIMIT` has been reached and, if so, disconnects from the stream
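You can exercise the filtering rules in `on_response` without a live stream by moving them into a plain function. `process_tweet` below is a hypothetical, network-free mirror of that logic:
* It takes the dictionary that `get_tweet_content` would produce.
* It updates the caller's counts and list.
* It returns `True` when the listener should disconnect.

The sample tweets are made up.

```python
def process_tweet(tweet_data, topic, counts_dict, tweets_list, limit):
    """Hypothetical mirror of LocationListener.on_response's filtering.
    Returns True when the location limit is reached."""
    # ignore retweets and tweets whose text doesn't mention the topic
    if (tweet_data['text'].startswith('RT') or
            topic.lower() not in tweet_data['text'].lower()):
        return False

    counts_dict['total_tweets'] += 1  # an original, on-topic tweet

    # ignore tweets whose author has no location (None or empty string)
    if not tweet_data.get('location'):
        return False

    counts_dict['locations'] += 1
    tweets_list.append(tweet_data)
    return counts_dict['locations'] == limit  # True means "disconnect"

counts = {'total_tweets': 0, 'locations': 0}
tweets = []
samples = [
    {'username': 'a', 'text': 'RT great Football game', 'location': 'NYC'},
    {'username': 'b', 'text': 'No sports talk here', 'location': 'LA'},
    {'username': 'c', 'text': 'Football tonight!', 'location': None},
    {'username': 'd', 'text': 'Watching football', 'location': 'Chicago'},
    {'username': 'e', 'text': 'More FOOTBALL', 'location': 'Denver'},
]
for tweet_data in samples:
    if process_tweet(tweet_data, 'football', counts, tweets, limit=2):
        break  # where LocationListener would call self.disconnect()

print(counts)                           # {'total_tweets': 3, 'locations': 2}
print([t['username'] for t in tweets])  # ['d', 'e']
```

`startswith('RT')` is a simple text check, so it would also skip an original tweet whose text happens to begin with those two capital letters.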

------

# More Info 
* See Lesson 12 in [**Python Fundamentals LiveLessons** here on O'Reilly Online Learning](https://learning.oreilly.com/videos/python-fundamentals/9780135917411)
* See Chapter 12 in [**Python for Programmers** on O'Reilly Online Learning](https://learning.oreilly.com/library/view/python-for-programmers/9780135231364/)
* See Chapter 13 in [**Intro Python for Computer Science and Data Science** on O'Reilly Online Learning](https://learning.oreilly.com/library/view/intro-to-python/9780135404799/)
* Interested in a print book? Check out:

| Python for Programmers<br>(640-page professional book) | Intro to Python for Computer<br>Science and Data Science<br>(880-page college textbook)
| :------ | :------
| <a href="https://amzn.to/2VvdnxE"><img alt="Python for Programmers cover" src="../images/PyFPCover.png" width="150" border="1"/></a> | <a href="https://amzn.to/2LiDCmt"><img alt="Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud" src="../images/IntroToPythonCover.png" width="159" border="1"></a>

>Please **do not** purchase both books&mdash;_Python for Programmers_ is a subset of _Intro to Python for Computer Science and Data Science_

------
&copy;1992&ndash;2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 13 of the book [**Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud**](https://amzn.to/2VvdnxE).

DISCLAIMER: The authors and publisher of this book have used their 
best efforts in preparing the book. These efforts include the 
development, research, and testing of the theories and programs 
to determine their effectiveness. The authors and publisher make 
no warranty of any kind, expressed or implied, with regard to these 
programs or to the documentation contained in these books. The authors 
and publisher shall not be liable in any event for incidental or 
consequential damages in connection with, or arising out of, the 
furnishing, performance, or use of these programs.                  