In [None]:
%%html
<!-- CSS settings for this notebook -->
<style>
    h1 {color:#BB0000}
    h2 {color:purple}
    h3 {color:#0099ff}
    hr {    
        border: 0;
        height: 3px;
        background: #333;
        background-image: linear-gradient(to right, #ccc, black, #ccc);
    }
</style>

# Data Mining Mastodon

------

# Objectives
* What is **Mastodon**?
* Why we're presenting **Mastodon** rather than **Twitter**
* **Data-mine Mastodon** with **Mastodon.py** library
* Use various **Mastodon** API methods
* **Get information** about a specific Mastodon account
* **Look up trending hashtags** 
* **Search for toots** containing a specific hashtag
* **Process streams of toots** as they’re happening
* **Clean and preprocess toots** to prepare them for analysis
* **Translate foreign language toots** into English 
* Tap into the **live streams of toots**
<!--* Perform **sentiment analysis** on toots from the live stream-->
* Create an **interactive map of Mastodon server locations** from which toots are received

------

# 12.1 Introduction 
* **Data mining** &mdash; searching large collections of data for **insights**
* **Sentiment** in toots can help **make predictions**  
    * **Stock prices**
    * **Election results**
    * Likely **revenues** for a **new movie** or, more generally, **product**
    * **Success** of a company’s **marketing campaign**
* Spot **comments on your company's products** 
* Spot **faults in competitors’ products** 
* Spot **trending topics**
* **Connect to Mastodon** with easy-to-use **Web services**

## What Is Mastodon?
* Free social network
* Similar to Twitter, but decentralized and more privacy focused
* No ads/profit model
* Thousands of servers run by individuals and companies worldwide
* **Federated** (known as the **Fediverse**)
    * Independent servers distributed across the Internet
    * Communication among the server nodes  
    > https://en.wikipedia.org/wiki/Distributed_social_network
    * Architecture like **Web3** and the **Metaverse**
    * Can communicate with accounts throughout the **Fediverse** 
* **Toots**
    * Messages up to **500 characters**
    * Some servers allow more
* Anyone can generally follow anyone else, subject to
    * individual users' account settings 
    * specific server rules

## Monthly Active Users
* End of October 2022: Approximately 500,000 active monthly Mastodon users
> https://www.theguardian.com/news/datablog/2023/jan/08/elon-musk-drove-more-than-a-million-people-to-mastodon-but-many-arent-sticking-around
* Now 8 million users and about 1.8 million monthly active users
> https://mastodon-analytics.com/
> https://mashable.com/article/mastodon-monthly-users

## Accessing Mastodon Data Programmatically 
* Anyone with an account can use the APIs
* Access and manipulate accounts, servers, toots (statuses), timelines, trends, ...
* Can **tap into the live stream** of toots for a given server or the **Fediverse** 

------

# 12.2 Overview of the Mastodon APIs 
* **Web services** are methods that you call in the **cloud**
* Each method has a **web service endpoint** represented by a **URL**
* **Caution**: Internet connections can be lost, services can change and some services are not available in all countries, so **apps can be brittle**
* Some **API categories** 
    * **Accounts API** — Access information about and manipulate **Mastodon user accounts**
    * **Statuses API** — Access info about and post **status updates**, known as **toots**
    * **Timelines API** — Toots and other "events" (follows, likes, ...) over time — since the inception of each Mastodon server
    > Enables access to toots and other "events" from the **public fediverse**, **toots with specific hashtags**, **logged-in user's timeline** (including accounts the user follows) and **lists** for filtering a user's home timeline. Can use timelines to search for **past toots** containing specific hashtags and access **live toot streams**
* **Mastodon API categories** under the **API METHODS** heading in the left column at
>https://docs.joinmastodon.org/

------

# 12.3 Creating a Mastodon Account 

## Developer Accounts
* **Mastodon does not have separate developer accounts**
    * Anyone with a Mastodon account can be a developer
    * Every server has its own rules — some servers allow anyone to join, some require approval

## Servers
* Sign-up for main server: https://mastodon.social/auth/sign_up
* Or, explore servers worldwide at
    * https://joinmastodon.org/servers
    * 9500+ (December 2023)
    * Many more servers than listed here

## Mastodon Server Covenant
* Servers listed at https://joinmastodon.org/servers adhere to the **Mastodon Server Covenant**
    * https://joinmastodon.org/covenant
    * **Active moderation** by the server administrators
    * **Daily backups**
    * **Multiple administrators** with emergency access to server infrastructure for fixing problems
    * Administrator agrees to provide **at least 3 months notice** to users before **shutting down a server permanently** (if a server shuts down, the data is no longer available)

## Mastodon.social — Original and Largest Overall Mastodon Server
* Due to exodus from Twitter, **Deitel joined `mastodon.social`** 
    * Deitel requirement: Readers/viewers must be able to experiment with web services using free tiers
    * New Twitter API free and initial paid tier have extremely limited capabilities
* https://joinmastodon.org/servers enables you to filter servers based on
    * region
    * language
    * topical focus of that server
* Can join multiple servers
* Can also **set up your own servers to create new Mastodon communities** with your own rules and restrictions

------

# 12.4 What’s in a Mastodon API Response? 
* Mastodon API methods return **JSON (JavaScript Object Notation)** objects
    * Like Twitter and most popular web services today
* Text-based **data-interchange format** 
* Represents objects as **collections of name–value pairs** (like dictionaries)
* Commonly used in web services
* Human and computer readable

## JSON
* **JSON object format**:
> ```
> {propertyName1: value1, propertyName2: value2}
> ```
* **JSON array format (like Python list)**:
> ```
> [value1, value2, value3]
> ```
* **Mastodon.py handles the JSON for you** behind the scenes
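
To make the two formats concrete, here is a minimal sketch using Python's standard `json` module (not Mastodon.py). The sample text is made up for illustration. Note that in actual JSON text, property names are double-quoted strings.

```python
import json

# A small JSON document: an object containing a string, a number and an array
json_text = '{"name": "caturday", "uses": 828, "days": ["Fri", "Sat"]}'

data = json.loads(json_text)  # JSON object -> dict, JSON array -> list

print(type(data).__name__)    # the object became a Python dict
print(data['name'])           # look up a value by its property name
print(data['days'][1])        # the array became a Python list

# Converting back to JSON text produces double-quoted property names
round_trip = json.dumps(data)
print(round_trip)
```

With Mastodon.py you normally never call `json.loads` yourself; the sketch only shows what happens conceptually behind the scenes.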

## Class `mastodon.AttribAccessDict` 
* Mastodon returns JSON as **`mastodon.AttribAccessDict` objects**
* Python `dict` (dictionary) subclass
* Access via
    * traditional Python dictionary keys  
    * attributes named to match the dictionary keys
* **API ENTITIES** section of the Mastodon docs (https://docs.joinmastodon.org/) describes the 52 JSON objects you'll find in various Mastodon API responses
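
The two access styles can be illustrated with a tiny stand-in class. `AttrDict` below is a hypothetical sketch of the idea, not Mastodon.py's actual implementation, and the account values are made up.

```python
class AttrDict(dict):
    """Hypothetical stand-in: a dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


account = AttrDict(username='Mastodon', followers_count=1000)  # made-up values

print(account['username'])      # traditional dictionary key
print(account.followers_count)  # attribute named to match the key
```

Because the object is still a `dict`, everything you know about dictionaries (iteration, `in`, `keys()`) continues to work.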

## Sample JSON for Trending Hashtags
* A portion of the response to a request for recent trending hashtags, displayed as the Python objects that Mastodon.py returns

```python
[{'name': 'caturday',
  'url': 'https://mastodon.social/tags/caturday',
  'history': [{'day': datetime.datetime(2023, 4, 22, 0, 0, tzinfo=datetime.timezone.utc),
    'accounts': '719',
    'uses': '828'},
   {'day': datetime.datetime(2023, 4, 21, 0, 0, tzinfo=datetime.timezone.utc),
    'accounts': '58',
    'uses': '62'},
   {'day': datetime.datetime(2023, 4, 20, 0, 0, tzinfo=datetime.timezone.utc),
    'accounts': '25',
    'uses': '32'},
   {'day': datetime.datetime(2023, 4, 19, 0, 0, tzinfo=datetime.timezone.utc),
    'accounts': '28',
    'uses': '34'},
   {'day': datetime.datetime(2023, 4, 18, 0, 0, tzinfo=datetime.timezone.utc),
    'accounts': '19',
    'uses': '22'},
   {'day': datetime.datetime(2023, 4, 17, 0, 0, tzinfo=datetime.timezone.utc),
    'accounts': '25',
    'uses': '26'},
   {'day': datetime.datetime(2023, 4, 16, 0, 0, tzinfo=datetime.timezone.utc),
    'accounts': '225',
    'uses': '254'}],
  'following': False},
 {'name': 'ScreenshotSaturday',
  'url': 'https://mastodon.social/tags/ScreenshotSaturday',
  'history': [{'day': datetime.datetime(2023, 4, 22, 0, 0, tzinfo=datetime.timezone.utc),
    'accounts': '56',
    'uses': '59'},
   {'day': datetime.datetime(2023, 4, 21, 0, 0, tzinfo=datetime.timezone.utc),
    'accounts': '0',
    'uses': '0'},
   {'day': datetime.datetime(2023, 4, 20, 0, 0, tzinfo=datetime.timezone.utc),
    'accounts': '0',
    'uses': '0'},
   {'day': datetime.datetime(2023, 4, 19, 0, 0, tzinfo=datetime.timezone.utc),
    'accounts': '0',
    'uses': '0'},
   {'day': datetime.datetime(2023, 4, 18, 0, 0, tzinfo=datetime.timezone.utc),
    'accounts': '3',
    'uses': '3'},
   {'day': datetime.datetime(2023, 4, 17, 0, 0, tzinfo=datetime.timezone.utc),
    'accounts': '3',
    'uses': '3'},
   {'day': datetime.datetime(2023, 4, 16, 0, 0, tzinfo=datetime.timezone.utc),
    'accounts': '24',
    'uses': '26'}],
  'following': False},
  ...
 ]
```
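
Note that the `accounts` and `uses` counts in the sample are strings. The sketch below uses plain dictionaries that copy two days of the `caturday` data above to show how such a structure might be navigated; it makes no API calls.

```python
from datetime import datetime, timezone

# Abbreviated copy of the sample above (two days of history only)
tags = [
    {'name': 'caturday',
     'history': [
         {'day': datetime(2023, 4, 22, tzinfo=timezone.utc),
          'accounts': '719', 'uses': '828'},
         {'day': datetime(2023, 4, 21, tzinfo=timezone.utc),
          'accounts': '58', 'uses': '62'},
     ]},
]

for tag in tags:
    # counts arrive as strings, so convert them before adding
    total_uses = sum(int(day['uses']) for day in tag['history'])

    # find the history entry with the most recent date
    latest = max(tag['history'], key=lambda day: day['day'])

    print(f"{tag['name']}: {total_uses} uses; "
          f"most recent day {latest['day']:%Y-%m-%d}")
```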

------

# 12.5 Installing the Libraries Used in This Notebook

## Installing Mastodon.py 
* https://github.com/halcy/Mastodon.py 
* Easy access to Mastodon APIs
* Mastodon.py docs: https://mastodonpy.readthedocs.io/
> `pip3 install Mastodon.py`



## DeepL AI Translator 
* Mastodon's API supports translation, but the Mastodon.py library does not yet support it
* https://github.com/DeepLcom/deepl-python
> `pip install --upgrade deepl`
* DeepL requires an API key
* Free one allows 500,000 characters/month
* To get a key:
> * Go to https://www.deepl.com/pro#developer
> * Click **API**
> * Click **Sign up for free**
> * Under **DeepL API Free** click **Sign up for free**
> * Specify an email/password and click **Continue**
> * Fill in the form and provide a credit card — required to prevent “fraudulent multiple registrations”, then click **Continue**
> * Read the terms and, if you agree, click **Sign up for free**
> * Click the **Account Management** link on the thank you page
> * Click the **Account** tab and scroll to **Authentication Key for DeepL API**
> * Copy your key then open **`keys_mastodon.py`** and replace `'your key here'` with your DeepL key
>> `deepL_key = 'your key here'`

## Installing geopy 
* https://github.com/geopy/geopy
* Convert locations, such as **Boston, MA**, into latitudes and longitudes, such as **42.3602534** and **-71.0582912**, for plotting on maps
* We'll use the free **ArcGIS** service
>`conda install -c conda-forge geopy`
> * Windows users: **Run the Anaconda Prompt as an Administrator**

## Folium Library and Leaflet.js JavaScript Mapping Library
* https://github.com/python-visualization/folium
* Creates interactive maps
> `pip install folium`

**Maps from OpenStreetMap.org**
* Leaflet.js uses open-source maps from `OpenStreetMap.org`. 
* Copyrighted by the OpenStreetMap.org contributors
* www.openstreetmap.org/copyright 
* www.opendatacommons.org/licenses/odbl

------

# 12.6 Preparing to Interact with Mastodon Programmatically

## Import Username and Password
* Before executing this cell, ensure that your copy of `keys_mastodon.py` contains your Mastodon credentials
* **Many Mastodon APIs do not require authentication**
    * Some APIs optionally require authentication — determined by each server's administrator
    * Some require authentication, such as those that enable administration of a Mastodon server
* See each method's documentation for **authentication requirements**
    * Mastodon.py: https://mastodonpy.readthedocs.io/
    * Main Mastodon docs: https://docs.joinmastodon.org/
* **We will log in, so we are authenticated for calls that require authentication**, such as searching for accounts

In [None]:
import keys_mastodon

## Register a Mastodon App
* Must be done **once per server** that you'll directly interact with via the API 
    * As you'll see, through one server, you can get access to the Fediverse data
* For apps you are distributing (e.g., a mobile-phone app for interacting with Mastodon)
    * must register **once for each device/server pair**
    * for example, a mobile app might allow the user to manage accounts on multiple Mastodon servers
* Arguments
    * app name
    * `api_base_url` — your specific Mastodon server
    * `to_file` — file in which `create_app` saves the app credentials

In [None]:
from mastodon import Mastodon

In [None]:
# create deiteltest app and save its credentials
credentials = Mastodon.create_app(
    'DeitelPythonDataScienceMastodonApp',
    api_base_url='https://mastodon.social',
    to_file='deiteltest_client_credentials.secret'
)

In [None]:
# create deiteltest app and save its credentials
credentials = Mastodon.create_app(
    'DeitelPythonDataScienceMastodonApp',
    api_base_url='https://mastodon.online',
    to_file='deiteltest_client_credentials.secret'
)

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

# 12.7 Creating a `Mastodon` object to Access Mastodon APIs
* **`Mastodon` object** is your gateway to using the Mastodon APIs
* Uses the info stored via the `to_file` parameter in preceding `create_app` call 

In [None]:
mastodon = Mastodon(client_id='deiteltest_client_credentials.secret')

## Log into Mastodon
* Log into the account
* May not be required depending on the API methods you'll use

In [None]:
access_token = mastodon.log_in(keys_mastodon.usr, keys_mastodon.pwd, 
    to_file='deiteltest_client_credentials.secret')

## **Example:** Rate Limits
* Typically, **300 calls per user** or **7500 calls per IP address** in **5 minutes**
    * Can vary by server
* The `Mastodon` constructor's `ratelimit_method` argument selects **throw**, **wait** (default) or **pace**
    * **throw**: `MastodonRateLimitError` when a request hits the rate limit — for apps that manage their own rate limiting
    * **wait** (default): When rate limit hit, waits until rate limit resets (at end of five-minute interval), then tries again
    * **pace**: Delays each request after the first, attempting to avoid hitting the rate limit; acts like **wait** mode if limit is hit
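
To put the typical limits in perspective, the arithmetic below shows how far apart calls would be if they were spread evenly across the five-minute window. This is only a simplified illustration of the idea behind pacing, not Mastodon.py's pacing algorithm, and `even_spacing` is a hypothetical helper.

```python
def even_spacing(calls_allowed, window_seconds):
    """Return the seconds between calls that spreads them evenly over the window."""
    return window_seconds / calls_allowed


window = 5 * 60  # five minutes, in seconds

per_user = even_spacing(300, window)   # typical per-user limit
per_ip = even_spacing(7500, window)    # typical per-IP-address limit

print(f'per user: one call every {per_user} seconds')
print(f'per IP address: one call every {per_ip} seconds')
```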


### Number of Calls Per 5 Minutes Allowed on This Server

In [None]:
mastodon.ratelimit_limit 

### Number of Calls Remaining in Current Rate Interval Period

In [None]:
mastodon.ratelimit_remaining

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

# 12.8 **Example:** Getting a Mastodon Instance's (Server's) Info
* **Instance** JSON description: https://docs.joinmastodon.org/entities/Instance/

### Get the Instance Info
* Depending on server, might need to be logged in

In [None]:
instance = mastodon.instance() 

### Print Some Instance Info
* `title` ― server name
* `uri` ― server address (access this in a web browser with `https://` followed by `uri`)
* `stats.user_count` ― number of users on that server
* `stats.status_count` ― cumulative number of toots posted to that server 
* `stats.domain_count` ― number of other known Mastodon servers in the fediverse

In [None]:
print(f'{"server title":>19}: {instance.title}')
print(f'{"uri":>19}: {instance.uri}')
print(f'{"short_description":>19}: {instance.short_description}')
print(f'{"stats.user_count":>19}: {instance.stats.user_count:,}')
print(f'{"stats.status_count":>19}: {instance.stats.status_count:,}')
print(f'{"stats.domain_count":>19}: {instance.stats.domain_count:,}')

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

# 12.9 Searching for Mastodon Accounts By User Name
* A mobile app might allow users to locate other accounts to follow
* Can programmatically search for accounts containing a specified string

## **Example:** Find Account Names Containing `'Mastodon'`
* Returns list of **Account**s
* **Account** JSON description: https://docs.joinmastodon.org/entities/Account/

In [None]:
accounts = mastodon.account_search(q='Mastodon@mastodon.social') 

In [None]:
len(accounts)

## **Example:** Basic Account Information for Top 3 Accounts with `mastodon` in the name
* You can discover info about an account
    * Might want to follow an account based on popularity (number of followers)
    * Might want to follow some of the same accounts that a specific account follows
* Each has many properties, including:
    * `username` — user’s Mastodon handle 
    * `id` — account’s unique ID number
    * `url` — URL used to access the account in a web browser
    * `note` — account description (may contain HTML tags)
    * `statuses_count` — number of toots posted by the account
    * `followers_count` — account's number of followers
    * `following_count` — number of other accounts this account follows
* Sort by `followers_count` in descending order, then display top 3 accounts by followers

In [None]:
sorted_accounts = sorted(accounts, key=lambda acct: acct.followers_count, reverse=True)

In [None]:
for account in sorted_accounts[:3]:  # top 3 accounts by followers
    print('username: ', account.username)
    print('id: ', account.id)
    print('url: ', account.url)
    print(f'statuses_count: {account.statuses_count:,}')
    print(f'followers_count: {account.followers_count:,}')
    print(f'following_count: {account.following_count:,}')
    print()

### Getting Your Own Account’s Information
* Get via `Mastodon` object’s `me` method
> `my_account = mastodon.me()`
* Returns an **Account object** for the account you used to authenticate with Mastodon

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

# 12.10 Spotting Trending Hashtags: Mastodon Trends API
* If a topic **“goes viral,”** thousands or even millions of people could be talking about it
* Mastodon allows you to look up **trending hashtags**, **trending toots** and  **trending links** across the fediverse
    * NOTE: **trending toots** and **trending links** are supposed to return lists of items, but each returns only one item at the moment 
    * **I filed an issue in the Mastodon.py GitHub repository** — the developers acknowledged the bug and are looking into it 

## **Example:** Getting a List of Trending Hashtags 
* `trending_tags` returns a list of trending hashtags in the fediverse
* Returned as JSON **Tag** objects
> https://docs.joinmastodon.org/entities/Tag/
* Each contains
    * `name`
    * `url`
    * `history` list of last 7 days' stats for the hashtag
    * `following` ― whether the logged in account is following the trending tag

In [None]:
trends = mastodon.trending_tags(limit=20)  # 20 max, 10 default

In [None]:
trends[0] # sample dictionary for one hashtag

## **Example:** Display Trending Hashtags in Descending Order By Toot Volume over the Last Seven Days

In [None]:
def tag_count(tag):
    """Counts number of times a hashtag was used in last 7 days"""
    total_uses = 0
    
    for day in tag.history:
        total_uses += int(day['uses'])  
    
    # add attribute to tag object specifying 7-day hashtag count
    tag.seven_day_count = total_uses 
    return total_uses

* Sort the trends in **descending** order by toot volume:

In [None]:
trends.sort(key=tag_count, reverse=True)

* Display names, counts and URLs of the **top 20 trending topics**

In [None]:
for tag in trends:
    print(f'{tag.name}: {tag.seven_day_count}')
    print(f'   {tag.url}')

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

# 12.11 Searching for Toots Containing Specific Hashtags; Getting More than One Page of Results 

* Mastodon API methods often **return collections of objects** 
    * Docs describe these as "Arrays" 
    * Mastodon.py returns **lists of Python dictionaries**
* For example, `timeline_xxx` functions can return:
    * `timeline_hashtag` — toots **containing a specific hashtag**
    * `timeline_public` — toots from the **fediverse's public timeline** 
    * `timeline_local` — toots posted on the **local server**
    * `timeline_home` — toots in a user’s **home timeline** (includes accounts followed by the user)
* Each Mastodon API call returns a maximum number of items per call
    * known as a **page of results**
    * default is often 10 or 20
    * max is typically 40 for toots and 80 for accounts
* **Mastodon.py can handle paging details** with utility functions (as you'll see momentarily)
> https://mastodonpy.readthedocs.io/en/stable/12_utilities.html
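
The page-by-page pattern can be sketched without contacting a server. In the sketch below, `collect_pages` and `fake_fetch_next` are hypothetical names, and the in-memory list stands in for a server's results; the fake locates the next page from the last item of the previous page, which is only an approximation of how real pagination info works.

```python
def collect_pages(first_page, fetch_next, max_pages):
    """Gather items from up to max_pages pages.

    fetch_next is any callable that takes the previous page and returns
    the next page, or a falsy value when no pages remain.
    """
    items = list(first_page)
    page = first_page

    for _ in range(max_pages - 1):  # the first page is already in hand
        page = fetch_next(page)
        if not page:
            break  # ran out of results early
        items.extend(page)

    return items


# Hypothetical in-memory "server" holding 5 items, served 2 per page
DATA = ['t1', 't2', 't3', 't4', 't5']


def fake_fetch_next(previous_page):
    """Return the (up to) 2 items that follow the previous page's last item."""
    start = DATA.index(previous_page[-1]) + 1
    return DATA[start:start + 2]


two_pages = collect_pages(DATA[:2], fake_fetch_next, max_pages=2)
all_pages = collect_pages(DATA[:2], fake_fetch_next, max_pages=10)
print(two_pages)
print(all_pages)
```

Asking for more pages than exist is harmless here: the loop stops as soon as a page comes back empty.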

## Functions `print_toot` and `print_toots` from `tootutilities.py`
```python
# DeepL Translator object used to autodetect source language and return English
translator = deepl.Translator(keys_mastodon.deepL_key)

def print_toot(toot):
    """Prints one toot's username and content (translated, if not English)."""
    
    # display the username of the sender 
    print(f'{toot.account.username}:')

    # display toot text
    if toot.language and toot.language.startswith('en'):
        # render HTML version of toot in an HTML div indented 0.5 inches
        text = profanity.censor(toot.content)
        display(HTML(f'<div style="padding-left: 0.5in;">{text}</div>'))

    # if the language is not English, display original & translated text 
    else:            
        # render HTML version of toot in an HTML div indented 0.5 inches
        print(f'ORIGINAL: ')
        display(HTML(f'<div style="padding-left: 0.5in;">{toot.content}</div>'))

        # translate content
        result = translator.translate_text(toot.content, target_lang='en-us')

        # render HTML version of toot in an HTML div indented 0.5 inches
        print(f'TRANSLATED: ')
        text = profanity.censor(result.text)
        display(HTML(f'<div style="padding-left: 0.5in;">{text}</div>'))
        
def print_toots(toots):
    """For each toot in toots, call print_toot to display 
       the username of the sender and toot text."""
    for toot in toots:
        print_toot(toot)
```

## **Example:** Paging Through Results
* We purposely grab only 2 toots per page of results to show the mechanics of paging

In [None]:
total_toots = 10 # how many toots to get
pages = 5 # of pages of toots to process
toots_per_page = total_toots // pages # 2 per page for demo purposes; can do 40/page
hashtag = 'football'

### Get First Page of Toots Containing a Given Hashtag 
* `timeline_hashtag` searches public timeline of past toots containing hashtag you specify

In [None]:
result = mastodon.timeline_hashtag(hashtag, limit=toots_per_page)
saved_toots = result # saved_toots will eventually contain all the toots

### Get Remaining Pages of Toots Containing a Hashtag 
* Mastodon.py utility function **`fetch_next`** gets the next page of results
* **Argument is the previous page of results**, which includes the info needed by Mastodon.py to get the next page of results

In [None]:
for toots in range(pages - 1): # for each remaining page
    # save previous page of results
    previous_result = result 
    
    # use Mastodon.py utility function fetch_next to get next page of results
    result = mastodon.fetch_next(previous_result) 

    # if there are results, add them to saved_toots; otherwise, terminate loop
    if result:
        saved_toots += result
    else:
        break # no more results

In [None]:
len(saved_toots) # total toots acquired

In [None]:
import tootutilities 
tootutilities.print_toots(saved_toots)

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

# 12.12 Cleaning/Preprocessing Toots for Analysis
* **Data cleaning** is one of data scientists' most common and important tasks 
* Depending on the text analyses you wish to perform, you may need to normalize text so NLP tools can "understand" it
    * abbreviations, slang, incorrectly spelled words, inconsistent formatting, etc. can limit NLP tools' ability to understand and analyze text 
* Some NLP tasks (Lesson 11) for normalizing social media posts
    * Converting all text to the same case
    * Removing `#` from hashtags, or removing hashtags, `@`-mentions and duplicates altogether
    * Removing excess whitespace, punctuation, **stop words**, URLs
    * **Stemming** and **lemmatization**
    * **Tokenization**
    * Removing formatting, like HTML, which NLP tools might not understand
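
A few of these steps can be sketched with only the standard library. The `normalize` function below is a hypothetical helper showing one possible set of choices (it keeps hashtag and mention words but drops the `#` and `@`); the libraries introduced next handle many more cases.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def normalize(post_html):
    """Strip HTML tags, URLs and '#'/'@' prefixes; lowercase; collapse whitespace."""
    extractor = _TextExtractor()
    extractor.feed(post_html)
    extractor.close()
    text = ''.join(extractor.parts)

    text = re.sub(r'https?://\S+', '', text)  # remove URLs
    text = re.sub(r'[#@](\w+)', r'\1', text)  # keep the word, drop # or @
    text = text.lower()                       # same case throughout
    return ' '.join(text.split())             # collapse runs of whitespace


cleaned = normalize(
    '<p>Hello   <b>#Mars</b> fans, see https://nasa.gov @NASA</p>')
print(cleaned)
```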

### **BeautifulSoup4** Library
* https://www.crummy.com/software/BeautifulSoup/
> `pip install beautifulsoup4`
* Most popular library for **parsing HTML and extracting content** from it
* Commonly used to **data-mine content in web pages**

### **tweet-preprocessor** Library 
* https://github.com/s/preprocessor
* Library designed to clean tweets, but useful for posts in general
* `pip install tweet-preprocessor`
* Can automatically remove any combination of:

| Option constant | Option |
| :--- | :--- |
| **`OPT.MENTION`** | @-Mentions (e.g., `@nasa`) |
| **`OPT.EMOJI`** | Emoji |
| **`OPT.HASHTAG`** | Hashtag (e.g., `#mars`) |
| **`OPT.NUMBER`** | Number |
| **`OPT.RESERVED`** | Twitter reserved Words (`RT` and `FAV`) |
| **`OPT.SMILEY`** | Smiley |
| **`OPT.URL`** | URL |

## **Example:** Cleaning a Toot Containing HTML and a URL

In [None]:
toot_text = '<p style="padding-left: 3em">A sample fake toot with a URL https://nasa.gov</p>'

* **BeautifulSoup** library can be used to **parse HTML and extract content** from it

In [None]:
from bs4 import BeautifulSoup

In [None]:
soup = BeautifulSoup(toot_text, 'html.parser') 

In [None]:
plain_text = soup.get_text() # remove all HTML/CSS tags and commands

In [None]:
plain_text

* The **tweet-preprocessor** library’s module name is **`preprocessor`**

In [None]:
import preprocessor as p

In [None]:
p.set_options(p.OPT.URL)

In [None]:
p.clean(plain_text)

------

# 12.13 Mastodon Streaming API
* Your app can receive various Mastodon streams as they occur in real-time
    * `stream_hashtag` — public toots containing the specified hashtag
    * `stream_user` — events related to the logged-in user account (home timeline and notifications)
    * `stream_public` — public fediverse event stream
    * `stream_local` — local server event stream
    * `stream_list` — events for the specified user, but restricted to accounts from a list 

## Creating a Subclass of `StreamListener` 
* Mastodon **pushes** data to your listener
* Streaming rate varies 
* Create a **subclass of Mastodon.py’s `StreamListener` class** to process the stream
* Mastodon.py calls `StreamListener` methods as it receives events
    * `on_update(self, status)` is called when a toot arrives from the stream
    * `StreamListener` defines other **`on_`** methods for other "events"
    > https://mastodonpy.readthedocs.io/en/stable/10_streaming.html#streamlistener
    * Override only the methods your app needs
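
The push model can be sketched without a network connection. Everything below is a hypothetical stand-in: `Listener` mimics a base class whose handlers do nothing by default, and `push_events` plays the role of the library delivering events.

```python
class Listener:
    """Hypothetical base class: every handler does nothing by default."""

    def on_update(self, status):
        pass

    def on_delete(self, status_id):
        pass


class CountingListener(Listener):
    """Overrides only on_update; delete events fall through to the no-op."""

    def __init__(self):
        self.count = 0

    def on_update(self, status):
        self.count += 1  # count each new post pushed to us


def push_events(listener, events):
    """Stand-in for the library: dispatch each (kind, payload) to on_<kind>."""
    for kind, payload in events:
        getattr(listener, f'on_{kind}')(payload)


listener = CountingListener()
push_events(listener, [('update', 'first toot'),
                       ('delete', 42),
                       ('update', 'second toot')])
print(listener.count)  # two 'update' events were delivered
```

Your code never asks for the next event; it only defines what should happen when one arrives.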

## Class `TootListener` (Located in tootlistener.py)

```python
# tootlistener.py
"""StreamListener subclass that processes toots as they arrive."""
from mastodon import StreamListener
import tootutilities 

# Global variable toot_listener_stream will be set to the stream handle so  
# we can close the stream after a specified number of toots are received
toot_listener_stream = None 

class TootListener(StreamListener):
    """Handles incoming toot stream."""

    def __init__(self, limit=10):
        """Create instance variables for tracking number of toots."""
        self.toot_count = 0
        self.TOOT_LIMIT = limit

    def on_update(self, status):
        """Called when your listener receives a toot (status)."""
        tootutilities.print_toot(status)
        print()
        self.toot_count += 1 # track number of toots received
            
        # if TOOT_LIMIT is reached, close the stream
        if self.toot_count == self.TOOT_LIMIT:
            toot_listener_stream.close()
```

## **Example:** Streaming the Mastodon Federated Timeline

### Creating a `TootListener` 
* `StreamListener` subclass `TootListener` manages the connection to the Mastodon stream and receives and processes the toots

In [None]:
import tootlistener 

In [None]:
toot_listener = tootlistener.TootListener(limit=5)

### Streaming All Public Events
* Events that are not toots are ignored by our `StreamListener` subclass
    * notifications (e.g., someone followed you or reblogged your post)
    * a toot deleted
    * someone direct messaged you
    * a toot was edited
    * the streaming connection terminated
* `stream_public` starts the live stream
    * `toot_listener` receives each event—for toots, displays toot text
    * `run_async=True` ensures that `stream_public` **returns a stream handle** we can use to **close the stream**
* **Asynchronous vs. Synchronous Streams**
    * `run_async=True` (asynchronous) runs the stream in a separate thread and returns a stream handle for managing the stream
    * `run_async=False` (synchronous) runs the stream forever unless an unanticipated failure occurs, such as an unhandled exception
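
The difference between the two modes can be sketched with the standard `threading` module. `start_stream` and `StreamHandle` below are hypothetical illustrations of the pattern, not Mastodon.py's implementation; the "stream" simply delivers an increasing counter.

```python
import threading


class StreamHandle:
    """Hypothetical handle: lets the caller stop a background stream."""

    def __init__(self, thread, stop_event):
        self._thread = thread
        self._stop_event = stop_event

    def close(self):
        self._stop_event.set()  # ask the worker loop to finish
        self._thread.join()     # wait until it has actually stopped

    def is_alive(self):
        return self._thread.is_alive()


def start_stream(on_event, run_async=False):
    """Deliver an increasing counter to on_event until stopped."""
    stop_event = threading.Event()

    def worker():
        n = 0
        while not stop_event.is_set():
            on_event(n)
            n += 1
            stop_event.wait(0.01)  # pause briefly between events

    if not run_async:
        worker()  # synchronous: blocks the caller while the stream runs
        return None

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return StreamHandle(thread, stop_event)  # asynchronous: returns at once


received = []
first_event = threading.Event()


def on_event(n):
    received.append(n)
    first_event.set()  # signal that at least one event has arrived


handle = start_stream(on_event, run_async=True)  # returns immediately
first_event.wait(timeout=5)                      # wait for the first event
handle.close()                                   # stop the background thread
print(len(received), handle.is_alive())
```

Because an asynchronous call returns immediately, any code that follows it runs while events are still arriving.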

In [None]:
# store the stream handle 
tootlistener.toot_listener_stream = mastodon.stream_public(
    toot_listener, run_async=True)

------

# 12.14 **Example:** Sentiment Analysis 
* Political researchers might use sentiment analysis during elections to understand how people feel about specific politicians and issues, and **how they're likely to vote**
* Companies might use it to see **what people are saying about their products and competitors’ products**
* Class `SentimentListener` (in `sentimentlistener.py`) checks sentiment on toots 

## Class `SentimentListener`
```python
# sentimentlistener.py
"""Tallies the number of positive, neutral and negative toots."""
import keys_mastodon
from better_profanity import profanity 
from mastodon import StreamListener
from bs4 import BeautifulSoup
import deepl
import preprocessor as p 
from textblob import TextBlob

# load censored words list
profanity.load_censor_words()

# stream will be set to the stream handle so we can close 
# the stream after a specified number of toots are received
sentiment_listener_stream = None

# translator to autodetect source language and return English
translator = deepl.Translator(keys_mastodon.deepL_key)

class SentimentListener(StreamListener):
    """Handles incoming toot stream."""

    def __init__(self, sentiment_dict, limit=10):
        """Configure the SentimentListener."""
        self.sentiment_dict = sentiment_dict
        self.toot_count = 0
        self.TOOT_LIMIT = limit
        
        # tweet-preprocessor removes @-mentions, emojis, hashtags, URLs
        p.set_options(p.OPT.MENTION, p.OPT.EMOJI, p.OPT.HASHTAG, p.OPT.URL)

    def on_update(self, status):
        """Called when Mastodon pushes a new toot to you."""

        # if the toot is original content rather than a reblog (boost) of another toot
        if status.reblog is None:
            # remove all HTML/CSS tags and commands
            plain_text = BeautifulSoup(
                status.content, 'html.parser').get_text()

            # clean the toot
            text = p.clean(plain_text) 

            # possibly translate plain_text
            if status.language and not status.language.startswith('en'):
                try:
                    result = translator.translate_text(
                        plain_text, target_lang='en-us')
                    text = profanity.censor(result.text) # save translated text
                except Exception:
                    text = 'toot was empty'   

            # update self.sentiment_dict with the polarity
            blob = TextBlob(text)
            if blob.sentiment.polarity > 0.1:
                sentiment = '+'
                self.sentiment_dict['positive'] += 1 
            elif blob.sentiment.polarity < -0.1:
                sentiment = '-'
                self.sentiment_dict['negative'] += 1 
            else:
                sentiment = ' '
                self.sentiment_dict['neutral'] += 1 

            # display the toot
            print(f'{sentiment} {status.account.username}: {text}\n')

            self.toot_count += 1 # track number of toots processed

        # if TOOT_LIMIT reached, terminate streaming 
        if self.toot_count == self.TOOT_LIMIT:
            sentiment_listener_stream.close()
```
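
The thresholding step in `on_update` can be isolated and tried on its own. The sketch below assumes a polarity value in the range -1.0 to 1.0, as TextBlob reports; `classify` is a hypothetical helper and the polarity scores are made up.

```python
def classify(polarity, threshold=0.1):
    """Map a polarity in [-1.0, 1.0] to 'positive', 'negative' or 'neutral'."""
    if polarity > threshold:
        return 'positive'
    if polarity < -threshold:
        return 'negative'
    return 'neutral'  # anything within the threshold band, inclusive


tally = {'positive': 0, 'neutral': 0, 'negative': 0}

for polarity in [0.8, 0.1, -0.05, -0.6]:  # made-up polarity scores
    tally[classify(polarity)] += 1

print(tally)
```

A score exactly at the threshold (0.1 here) counts as neutral, matching the strict `>` and `<` comparisons in the listener.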

## Main Application

### Specify number of toots to tally

In [None]:
limit = 10

### Set up Dictionary to Track Toot Sentiment

In [None]:
sentiment_dict = {'positive': 0, 'neutral': 0, 'negative': 0}

### Create `StreamListener` Subclass Object

In [None]:
import sentimentlistener 
sentiment_listener = sentimentlistener.SentimentListener(sentiment_dict, limit)

### Start Stream and Store Its Handle

In [None]:
sentimentlistener.sentiment_listener_stream = mastodon.stream_public(
    sentiment_listener, run_async=True)
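
Because `run_async=True` returns immediately, the tallies displayed next may still be changing while the stream runs in the background. One way to pause until the stream has ended is to poll a condition, as in this minimal sketch. The helper `wait_until` is hypothetical; the commented usage line assumes the stream handle provides `is_alive()`, which Mastodon.py documents for the handles its streaming methods return:

```python
import time


def wait_until(done, timeout=60.0, poll=0.5):
    """Poll done() until it returns True or timeout seconds elapse.
    Returns True if done() became True, otherwise False."""
    deadline = time.monotonic() + timeout

    while not done():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)

    return True


# Assumed usage with the stream handle stored above:
# wait_until(lambda:
#     not sentimentlistener.sentiment_listener_stream.is_alive())

# self-contained demonstration: the condition becomes True on the third check
checks_remaining = [3]

def finished():
    checks_remaining[0] -= 1
    return checks_remaining[0] <= 0

print(wait_until(finished, timeout=5, poll=0.01))
```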

#### Display summary of results

In [None]:
print('Toot sentiment:')
print('Positive:', sentiment_dict['positive'])
print(' Neutral:', sentiment_dict['neutral'])
print('Negative:', sentiment_dict['negative'])

------

# 12.15 **Example:** Geocoding and Mapping
* Collect streaming toots
* Look up server locations and plot toots at those locations on an interactive map
* **Mastodon is privacy focused**
    * Only server admins have access to any location data
* Even in Twitter, geo location is off by default, though many accounts specify home location 
    * Sometimes invalid or fictitious 
* Map markers will be placed at each server's location and will show the sender's `username` and toot text

### **geopy** library
* https://github.com/geopy/geopy
* Installed in Section 5
* **Geocoding** — translate locations into **latitude** and **longitude**
* **geopy** supports dozens of **geocoding web services**, many with **free or lite tiers**


### **folium library** and Leaflet.js JavaScript Mapping Library
* https://github.com/python-visualization/folium
* Set up in Section 5
* For maps — uses **Leaflet.js JavaScript mapping library** to display maps in a web page 
* Folium can save HTML files that you can view in your web browser or add to a website

## Getting and Mapping the Toots
* We’ll use utility functions from our **`tootutilities.py`** file and class **`LocationListener`** in **`locationlistener.py`**
* Each is included after the example

### Collections Required By LocationListener
* a list (`toots`) to store the data from the toots we collect 
* a dictionary (`counts`) to track the total number of toots we collect and the number that have location data

In [None]:
toots = [] 
counts = {'total_toots': 0, 'locations': 0}

### Creating the LocationListener 
* Collect 50 toots 
* `LocationListener` will use utility function `get_toot_content` (located in `tootutilities.py`; discussed after this example) to place in a dictionary the `username`, toot `text` and Mastodon server `location` from each toot

In [None]:
import locationlistener 

location_listener = locationlistener.LocationListener(
    counts_dict=counts, toots_list=toots, limit=50)

### Start Stream and Store Its Handle
* We display the toot count so far and usernames to show progress

In [None]:
locationlistener.location_listener_stream = mastodon.stream_public(
    location_listener, run_async=True)

### Displaying the Location Statistics
* Of the toots we processed, check the percentage for which we were able to find server locations (should be 100%)

In [None]:
counts['total_toots']

In [None]:
counts['locations']

In [None]:
print(f'{counts["locations"] / counts["total_toots"]:.1%}')
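
Since the stream runs asynchronously, the cell above raises a `ZeroDivisionError` if it executes before any toots arrive. A small guard avoids that. This is a minimal sketch; the helper name `share` and the `'n/a'` placeholder are assumptions for illustration:

```python
def share(part, whole):
    """Return part/whole formatted as a percentage, or 'n/a' if whole is 0."""
    return f'{part / whole:.1%}' if whole else 'n/a'


print(share(45, 50))  # made-up counts
print(share(0, 0))    # nothing received yet
```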

### Geocoding the Locations
* Use `get_geocodes` utility function (from `tootutilities.py`; discussed after this example) to geocode the location of each toot stored in the list of toots

In [None]:
from tootutilities import get_geocodes

bad_locations = get_geocodes(toots)

* For each toot with a valid location, the `get_geocodes` function adds the new keys `'latitude'` and `'longitude'` to that toot’s dictionary in the `toots` list — these will be used to plot map markers on our interactive map

### Displaying the Bad Location Statistics
* If geopy is unable to geocode a specific location, `bad_locations` will be greater than 0

In [None]:
bad_locations 

In [None]:
print(f'{bad_locations / counts["locations"]:.1%}')

### Cleaning the Data
* Before we plot the toot locations on a map, let’s use a pandas `DataFrame` to clean the data
* When you create a `DataFrame` from the `toots` list, it may contain `NaN` for `'latitude'` and `'longitude'` if geopy was unable to geocode a specific location 
* `NaN` cannot be plotted on a map, so remove any rows containing `NaN` by calling the `DataFrame`’s `dropna` method
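
Note that `dropna()` with no arguments removes a row if any column is missing, including `'text'` (which `get_toot_content` sets to `None` when translation fails). In pandas, `df.dropna(subset=['latitude', 'longitude'])` limits the check to the coordinate columns. The same idea can be sketched on the list of dictionaries using only the standard library; `has_coordinates` is a hypothetical helper and the sample records are made up:

```python
import math


def has_coordinates(record):
    """Return True if record has numeric, non-NaN latitude and longitude."""
    for key in ('latitude', 'longitude'):
        value = record.get(key)
        if not isinstance(value, (int, float)) or math.isnan(value):
            return False
    return True


sample = [
    {'username': 'a', 'text': 'hi', 'latitude': 42.36, 'longitude': -71.06},
    {'username': 'b', 'text': None, 'latitude': 51.51, 'longitude': -0.13},
    {'username': 'c', 'text': 'geocoding failed'},  # no coordinates added
]

# keep only the records that can be plotted
plottable = [record for record in sample if has_coordinates(record)]
print([record['username'] for record in plottable])
```

Record `'b'` is kept even though its text is missing, because only the coordinates matter for placing a marker.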

In [None]:
import pandas as pd

In [None]:
df = pd.DataFrame(toots)

In [None]:
df

In [None]:
df = df.dropna() # if there are rows with missing data drop them

In [None]:
df

### Creating a Map with Folium
Create a folium Map on which we’ll plot the toot locations

Note: We used **Stamen map tiles** in our book: 
* Stamen recently lost their funding, which they used to maintain their own servers. 
* The `folium` maintainers have decided not to support Stamen map tiles directly moving forward, but you can specify a custom map tile link with `folium`.
* Stamen has begun a new partnership with stadiamaps.com, which requires an API key. 
* See https://stadiamaps.com/stamen/ for details on setting up a free account and getting an API key. 
* In `keys_mastodon.py`, provide your API key in the `stadia_key` variable. 
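
A minimal sketch of assembling the tile URL from the Stadia base URL and an API key. The helper `build_tile_url` and the placeholder key `'YOUR-KEY'` are assumptions for illustration; the `{z}`, `{x}` and `{y}` placeholders must stay in the string unchanged so the mapping library can fill them in for each tile:

```python
from urllib.parse import urlencode

BASE_TILE_URL = ('https://tiles.stadiamaps.com/tiles/stamen_terrain/'
                 '{z}/{x}/{y}@2x.png')


def build_tile_url(base_url, api_key):
    """Append the API key as a query string; the base URL is not modified."""
    query = urlencode({'api_key': api_key})  # percent-encodes the key if needed
    return f'{base_url}?{query}'


print(build_tile_url(BASE_TILE_URL, 'YOUR-KEY'))
```

The key is appended without surrounding quotes or other punctuation, since any extra characters would become part of the URL sent to the tile server.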

In [None]:
import folium

In [None]:
#usmap = folium.Map(location=[39.8283, -98.5795], 
#    tiles='Stamen Terrain', zoom_start=4, detect_retina=True) 

In [None]:
base_tile_url = 'https://tiles.stadiamaps.com/tiles/stamen_terrain/{z}/{x}/{y}@2x.png'
tile_url = f'{base_tile_url}?api_key={keys_mastodon.stadia_key}'

In [None]:
usmap = folium.Map(location=[39.8283, -98.5795], 
    tiles=tile_url,
    attr='Map tiles by Stamen Design, under CC BY 4.0. Data by OpenStreetMap, under ODbL.',
    zoom_start=4, detect_retina=True)  

* `location` keyword argument specifies a sequence containing latitude and longitude coordinates for the **map’s center point** 
    * The values in this snippet are the **geographic center of the continental United States**
    * Mastodon servers are located worldwide, so many of the toots we plot may be outside the U.S.
    * You can zoom using the **+** and **–** buttons at the map’s top-left, or you can drag the map with the mouse (that is, pan) to see anywhere in the world
* `zoom_start` keyword argument specifies the map’s initial zoom level; lower values show more of the world
* `detect_retina` keyword argument enables folium to detect high-resolution screens to use higher-resolution maps from `OpenStreetMap.org`

### Creating Popup Markers for the Toot Locations
* Create `folium` `Popup` objects containing each toot’s text and add them to the `Map`
* `DataFrame` method `itertuples` creates a named tuple from each row containing properties corresponding to each `DataFrame` column

In [None]:
for t in df.itertuples():
    text = ''.join(['<p>' + t.username + '</p>', t.text if t.text else ''])
    popup = folium.Popup(text)
    marker = folium.Marker((t.latitude, t.longitude), 
                           popup=popup)
    marker.add_to(usmap)

* Creates a string (`text`) containing the user’s `username` and toot `text` 
* Creates a `folium` `Popup` to display the `text`
* Creates a `folium` `Marker`
    * tuple to specify the `Marker`’s latitude and longitude
    * `popup` keyword argument associates the toot’s `Popup` object with the new `Marker`
* Calls the `Marker`’s `add_to` method to specify the `Map` that will display the `Marker`
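
The popup string can also be built by a small function, which makes the handling of missing toot text explicit. This is a sketch under stated assumptions: `build_popup_html` is a hypothetical helper, the `Row` named tuple only imitates what `itertuples` yields, and escaping the username is a defensive choice rather than something the original loop does. The toot text is left unescaped because it is already HTML:

```python
from collections import namedtuple
from html import escape


def build_popup_html(username, text):
    """Return popup HTML: the username in a paragraph, then the toot text."""
    return f'<p>{escape(username)}</p>' + (text or '')


# stand-in for the named tuples produced by DataFrame.itertuples()
Row = namedtuple('Row', ['username', 'text', 'latitude', 'longitude'])

rows = [
    Row('alice', '<p>Hello from Boston</p>', 42.36, -71.06),
    Row('bob & co', None, 51.51, -0.13),  # translation failed, no text
]

for row in rows:
    print(build_popup_html(row.username, row.text))
```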

### Saving the Map
* Call the `Map`’s `save` method to store the map in an HTML file, which you can then double-click to open in your web browser

In [None]:
usmap.save('toot_map.html')

In [None]:
usmap # displays the map in the notebook

## Class `LocationListener`

```python
# locationlistener.py
"""Receives toots and stores a list of dictionaries containing 
each toot's username/content, server domain and server location."""
import keys_mastodon
import tootutilities 
from mastodon import StreamListener
from bs4 import BeautifulSoup
import deepl
import preprocessor as p 
from textblob import TextBlob

# stream will be set to the stream handle so we can close 
# the stream after a specified number of toots are received
location_listener_stream = None

class LocationListener(StreamListener):
    """Handles incoming Toot stream to get location data."""

    def __init__(self, counts_dict, toots_list, limit=10):
        """Configure the LocationListener."""
        self.toots_list = toots_list
        self.counts_dict = counts_dict
        self.TOOT_LIMIT = limit

    def on_update(self, status):
        """Called when Mastodon pushes a new toot to you."""

        # get toot's username, text and location
        toot_data = tootutilities.get_toot_content(status)  
        self.counts_dict['total_toots'] += 1 # count every toot received

        # ignore toots with no server location--can't plot on a map
        if not toot_data.get('location'):  
            return

        self.counts_dict['locations'] += 1 
        self.toots_list.append(toot_data) # store the toot
        
        print(f"{self.counts_dict['locations']:2}: {toot_data['username']}")

        # if TOOT_LIMIT is reached, terminate streaming
        if self.counts_dict['locations'] == self.TOOT_LIMIT:
            location_listener_stream.close()
```

## Utility Functions in `tootutilities.py` 

### Utility Function `get_toot_content`  
* Receives a toot and creates a **dictionary** containing the **toot’s `username`, `text` and server `location`**

```python
def get_toot_content(toot):
    """Return dictionary with username of toot sender, toot content
       translated to English (if necessary) and the geographic 
       location of the toot's Mastodon server."""

    fields = {}
    fields['username'] = toot.account.username

    # possibly translate plain_text
    if toot.language and not toot.language.startswith('en'):
        try:
            result = translator.translate_text(
                toot.content, target_lang='en-us')
            
            # save translated text
            fields['text'] = profanity.censor(result.text)
        except Exception:
            fields['text'] = None
    else:
        fields['text'] = profanity.censor(toot.content)
        
    # look up server location
    fields['location'] = get_domain_location_from_url(toot.url)

    return fields
```
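
`toot.content` arrives as HTML, which is why `SentimentListener` passes it through BeautifulSoup before analysis. Where plain text is wanted without an extra dependency, the standard library's `html.parser` module can do a similar job. A minimal sketch; `html_to_text` is a hypothetical helper, not part of `tootutilities.py`, and it simply concatenates text nodes without adding spacing between block elements:

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; and friends
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Return the text content of an HTML fragment with all tags removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return ''.join(parser.parts)


print(html_to_text('<p>Hello <a href="x">world</a> &amp; all</p>'))
```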

### Utility Function `get_domain_location_from_url`  
* Receives the URL of a toot, extracts the domain name, looks up the domain's IP address, then uses the free tier of the ipgeolocation.io API to look up the geographic location of that IP address
* Get a key for the ipgeolocation.io web service at: https://ipgeolocation.io/signup.html
    * Store it in `ipgeolocation_key` in `keys_mastodon.py`

```python
def get_domain_location_from_url(url):
    """Parse the domain from url, then look up the IP address 
       and get its city.
       
       Get a key for the ipgeolocation.io web service at: 
       https://ipgeolocation.io/signup.html
    """
    
    # get domain from url argument using Python 
    # urllib.parse module's urlparse function
    domain = urlparse(url).netloc

    # get IP address for hostname using Python 
    # socket module's gethostbyname function
    ip_address = socket.gethostbyname(domain)
    
    # URL to access ipgeolocation.io web services
    ipgeolocation_url = ('https://api.ipgeolocation.io/ipgeo?apiKey=' + 
       keys_mastodon.ipgeolocation_key + f'&ip={ip_address}')

    # use Python requests module to invoke web service
    response = requests.get(ipgeolocation_url)

    # if successful get the location
    if response.status_code == 200:
        data = response.json() # convert JSON to dictionary
        location = (f"{data['city']}, {data['state_prov']}, " +
            f"{data['country_code2']}")
        return location
    else:
        print('Error getting location\n',
              f'{response.status_code}: {response.text}')
        return None
```
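
Several toots in a stream often come from the same server, in which case each lookup of that domain repeats the same web-service call. The sketch below extracts the domain with `urlparse` and caches results per domain with `functools.lru_cache`. `make_cached_lookup` and `fake_lookup` are hypothetical names, the URLs are made up, and `fake_lookup` stands in for the DNS and ipgeolocation.io calls so the example runs offline:

```python
from functools import lru_cache
from urllib.parse import urlparse


def domain_from_url(url):
    """Return the network location (domain) part of a URL."""
    return urlparse(url).netloc


def make_cached_lookup(lookup_location, maxsize=256):
    """Wrap a domain -> location function so each domain is looked up once."""
    @lru_cache(maxsize=maxsize)
    def cached(domain):
        return lookup_location(domain)
    return cached


calls = []  # records which domains actually reach the lookup function

def fake_lookup(domain):
    """Stand-in for the real IP address and geolocation requests."""
    calls.append(domain)
    return f'location of {domain}'


cached_lookup = make_cached_lookup(fake_lookup)

urls = ['https://mastodon.social/@user1/1',
        'https://fosstodon.org/@user2/2',
        'https://mastodon.social/@user3/3']

locations = [cached_lookup(domain_from_url(url)) for url in urls]
print(calls)  # each domain appears only once
```

One design consequence: `lru_cache` also remembers a `None` result, so a domain whose lookup failed once would not be retried until the cache is cleared.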

### `get_geocodes` Utility Function 
* Receives a list of dictionaries containing toots and **geocodes their server locations**
* If geocoding is successful for a toot, adds the **latitude** and **longitude** to the toot’s **dictionary in `toot_list`**

```python
def get_geocodes(toot_list):
    """Get the latitude and longitude for each toot's location.
    Returns the number of toots with invalid location data."""
    
    print('Getting coordinates for mastodon server locations...')
    geo = ArcGIS()  # geocoder
    bad_locations = 0  

    for toot in toot_list:
        processed = False
        delay = .1  # used if the geocoder times out to delay next call
        while not processed:
            try:  # get coordinates for toot['location']
                geo_location = geo.geocode(toot['location'])
                processed = True
            except Exception:  # timed out, so wait before trying again
                print('Service timed out. Waiting.')
                time.sleep(delay)
                delay += .1

        if geo_location:  
            toot['latitude'] = geo_location.latitude
            toot['longitude'] = geo_location.longitude
        else:  
            bad_locations += 1  # toot['location'] was invalid
    
    print('Done geocoding')
    return bad_locations
```
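
The `while` loop in `get_geocodes` retries until the call succeeds, so a persistent failure (for example, no network connection) would keep it looping. A bounded variant might look like the following sketch. `call_with_retries` and its parameters are assumptions for illustration; the `sleep` parameter exists only so the waiting can be replaced in a demonstration or test:

```python
import time


def call_with_retries(func, max_attempts=5, delay=0.1, step=0.1,
                      sleep=time.sleep):
    """Call func() until it succeeds or max_attempts is reached.
    Waits delay seconds after the first failure, growing by step each
    time. Re-raises the last exception if every attempt fails."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # give up and report the last error
            sleep(delay)
            delay += step


# Assumed usage inside get_geocodes:
# geo_location = call_with_retries(lambda: geo.geocode(toot['location']))

# self-contained demonstration: fail twice, then succeed
attempts = []

def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError('simulated timeout')
    return 'ok'


waits = []  # collects the delays instead of actually sleeping
result = call_with_retries(flaky, delay=1, step=1, sleep=waits.append)
print(result, waits)
```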


------
&copy;1992&ndash;2024 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 12 of the book [**Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud**](https://amzn.to/2VvdnxE).

DISCLAIMER: The authors and publisher of this book have used their 
best efforts in preparing the book. These efforts include the 
development, research, and testing of the theories and programs 
to determine their effectiveness. The authors and publisher make 
no warranty of any kind, expressed or implied, with regard to these 
programs or to the documentation contained in these books. The authors 
and publisher shall not be liable in any event for incidental or 
consequential damages in connection with, or arising out of, the 
furnishing, performance, or use of these programs.                  