# Twitter data

## Copyright and Licensing

You are free to use or adapt this notebook for any purpose you'd like. However, please respect the [Simplified BSD License](https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition/blob/master/LICENSE.txt) that governs its use.

# Twitter API Access

Twitter implements OAuth 1.0A as its standard authentication mechanism, and in order to use it to make requests to Twitter's API, you'll need to go to https://dev.twitter.com/apps and create a sample application.

Choose any name for your application, write a description and use `http://google.com` for the website.

Under **Key and Access Tokens**, there are four primary identifiers you'll need to note for an OAuth 1.0A workflow: 
* consumer key, 
* consumer secret, 
* access token, and 
* access token secret (Click on Create Access Token to create those).

Note that you will need an ordinary Twitter account in order to log in, create an app, and get these credentials.

The first time you execute the notebook, add all credentials so that they are saved in the `pkl` file. Afterwards you can remove the secret keys from the notebook, because they will be loaded from the `pkl` file.

The `pkl` file contains sensitive information that can be used to take control of your Twitter account. **Do not share it**.

**The `os` module** provides functions that let you interface with the underlying operating system that Python is running on, be that Windows, Mac or Linux. <br>
You can use it to find information about your location in the file system or about the current process.
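As a small illustration of the kind of information the `os` module exposes, the sketch below prints the working directory and the process id. The variable names `cwd` and `pid` are only illustrative; `os.path.exists` is the same check this notebook uses to decide whether the credentials file already exists.

```python
import os

# Where is Python running, and as which process?
cwd = os.getcwd()   # current working directory (a string)
pid = os.getpid()   # id of the current Python process (an integer)

print(cwd)
print(pid)

# os.path.exists() reports whether a path is present on disk;
# the working directory itself is a convenient path to try it on.
print(os.path.exists(cwd))
```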

**The `pickle` module** serializes and de-serializes a Python object structure:<br>
In this case, it converts the Twitter object (a dict) into **a byte stream** so the object can be recreated later in Python.
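The round trip can be tried in memory before touching any file. This is a minimal sketch: `demo_credentials` holds made-up placeholder values, and `pickle.dumps` / `pickle.loads` are the in-memory counterparts of `pickle.dump` / `pickle.load`.

```python
import pickle

# Hypothetical placeholder values, not real keys.
demo_credentials = {"Consumer Key": "xxx", "Consumer Secret": "yyy"}

blob = pickle.dumps(demo_credentials)   # dict -> bytes
restored = pickle.loads(blob)           # bytes -> a new, equal dict

print(type(blob))
print(restored == demo_credentials)
```

The restored object is equal to the original but is a separate copy, which is why the credentials survive between notebook sessions once they are written to disk.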

In [5]:
import pickle
import os

Create a pickle file, `secret_twitter_credentials.pkl`, that stores the credentials you need to authenticate with the Twitter API and access the data.

In [6]:
if not os.path.exists('secret_twitter_credentials.pkl'):
    Twitter={}
    Twitter['Consumer Key'] = 'qC5eGQtluR6KC552C2OxXgbEl'
    Twitter['Consumer Secret'] = 'J2kwUgajrGkK9rtQ3XB8VL2ZskDxMAzK8oObwO2nmRDhDePE6B'
    Twitter['Access Token'] = '488484886-2IpT6dG4vb5tirYY3fuR0BY9C2nyXKnJ7B7uQaaH'
    Twitter['Access Token Secret'] = 'BVHqqJ2qP7oqNY0SihtXa9ZSjXxvapYGGKTV42NS4hxI7'
    with open('secret_twitter_credentials.pkl','wb') as f:
        pickle.dump(Twitter, f)
else:
    # Use a context manager so the file is closed after loading.
    with open('secret_twitter_credentials.pkl','rb') as f:
        Twitter = pickle.load(f)

**pickle.dump( )**: <br>
Writes the pickled representation of the object (`Twitter`) to the open file object (`f`).<br>

**open( )**:<br>
The second argument is the mode, for example **“rb” (read binary), “w” (write), “a” (append), or “wb” (write binary)**. Note that if you use either “w” or “wb”, Python overwrites the file if it already exists, or creates it if it doesn’t.
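The difference between the modes is easiest to see on a throwaway file. The sketch below works in a temporary directory; the file name `demo.txt` and the variable names are illustrative only.

```python
import os
import tempfile

# Work in a temporary directory so no existing file is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    with open(path, "w") as f:   # "w" creates the file
        f.write("first\n")
    with open(path, "w") as f:   # "w" again overwrites the old content
        f.write("second\n")
    with open(path, "a") as f:   # "a" keeps the content and appends
        f.write("third\n")

    with open(path, "r") as f:   # text mode returns str
        text_content = f.read()
    with open(path, "rb") as f:  # binary mode returns bytes
        raw_content = f.read()

print(repr(text_content))
print(type(raw_content))
```

Pickle files must be opened in the binary modes ("wb" and "rb"), because pickled data is bytes rather than text.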

Install the `twitter` package to interface with the Twitter API.

In [7]:
!pip install twitter



## Example 1. Authorizing an application to access Twitter account data

**OAuth** allows users to grant your application access to their Twitter accounts without sharing their passwords.<br>
Use the credentials to authenticate and create **twitter_api, a `twitter.Twitter` object** that we will use to access data via the Twitter API.

In [8]:
import twitter

auth = twitter.oauth.OAuth(Twitter['Access Token'],
                           Twitter['Access Token Secret'],
                           Twitter['Consumer Key'],
                           Twitter['Consumer Secret'])
print (auth)
twitter_api = twitter.Twitter(auth=auth)

# Nothing to see by displaying twitter_api except that it's now a
# defined variable

print(twitter_api)

<twitter.oauth.OAuth object at 0x10a4f4be0>
<twitter.api.Twitter object at 0x10a4f4d30>


## Example 2. Retrieving trends

Twitter identifies locations using the Yahoo! Where On Earth ID.

The Yahoo! Where On Earth ID for the entire world is 1.
See https://dev.twitter.com/docs/api/1.1/get/trends/place and
http://developer.yahoo.com/geo/geoplanet/

You can also look at the BOSS PlaceFinder here: https://developer.yahoo.com/boss/placefinder/

In [9]:
WORLD_WOE_ID = 1
US_WOE_ID = 23424977

Look up the WOEID for [San Diego](http://woeid.rosselliot.co.nz/lookup/san%20diego%20%20ca).

You can change it to another location.

In [10]:
LOCAL_WOE_ID=2487889

# Prefix ID with the underscore for query string parameterization (_id):
# Without the underscore, the twitter package appends the ID value
# to the URL itself as a special case keyword argument.

world_trends = twitter_api.trends.place(_id=WORLD_WOE_ID) # Top 50
us_trends = twitter_api.trends.place(_id=US_WOE_ID)
local_trends = twitter_api.trends.place(_id=LOCAL_WOE_ID)

In [11]:
len (world_trends)

1

**The API response is in JSON format, returned here as nested Python lists and dictionaries (like a hierarchical database).** <br>
**JSON (JavaScript Object Notation) is an open-standard format** for data interchange on the web. JSON is a lightweight, **text-based data-interchange format, and it is completely language independent**. <br>

**[{ 'trends': X,   'as_of': '2019-10-10',   'created_at': '2019-10-10',   'locations': [{'name': 'Worldwide', 'woeid': 1}]  }]<br>**
**where X = [{'name':  ,'url':  ,'promoted_content':   ,'query':  ,'tweet_volume':   }, {}...]<br>**
https://developer.twitter.com/en/docs/trends/trends-for-location/api-reference/get-trends-place<br>

JSON can store lists (arrays), booleans, numbers, strings, null and dictionaries (objects); it has no tuple type, so Python tuples are stored as arrays. To be saved into a file, all these structures must be serialized into a string.
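A quick round trip shows how Python values map onto JSON types. The `record` below is a made-up example shaped loosely like one trend entry; the `pair` and `is_trend` keys are hypothetical and exist only to show the tuple and boolean conversions.

```python
import json

# Made-up record; "pair" and "is_trend" are illustrative keys only.
record = {
    "name": "#Example",
    "tweet_volume": 1234,
    "promoted_content": None,
    "pair": (1, 2),
    "is_trend": True,
}

as_text = json.dumps(record)   # Python object -> JSON string
back = json.loads(as_text)     # JSON string -> Python object

print(as_text)
print(back["pair"])   # the tuple comes back as a list
```

`None` is written as `null` and `True` as `true`, and both are converted back when the string is parsed.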

In [12]:
world_trends[:2]

[{'trends': [{'name': '避難勧告',
    'url': 'http://twitter.com/search?q=%E9%81%BF%E9%9B%A3%E5%8B%A7%E5%91%8A',
    'promoted_content': None,
    'query': '%E9%81%BF%E9%9B%A3%E5%8B%A7%E5%91%8A',
    'tweet_volume': 212558},
   {'name': '避難所',
    'url': 'http://twitter.com/search?q=%E9%81%BF%E9%9B%A3%E6%89%80',
    'promoted_content': None,
    'query': '%E9%81%BF%E9%9B%A3%E6%89%80',
    'tweet_volume': 176175},
   {'name': '多摩川',
    'url': 'http://twitter.com/search?q=%E5%A4%9A%E6%91%A9%E5%B7%9D',
    'promoted_content': None,
    'query': '%E5%A4%9A%E6%91%A9%E5%B7%9D',
    'tweet_volume': 89763},
   {'name': '#PrayForJapan',
    'url': 'http://twitter.com/search?q=%23PrayForJapan',
    'promoted_content': None,
    'query': '%23PrayForJapan',
    'tweet_volume': 128570},
   {'name': '避難準備',
    'url': 'http://twitter.com/search?q=%E9%81%BF%E9%9B%A3%E6%BA%96%E5%82%99',
    'promoted_content': None,
    'query': '%E9%81%BF%E9%9B%A3%E6%BA%96%E5%82%99',
    'tweet_volume': 89031},
   {'nam

In [13]:
trends=local_trends
print(type(trends))
print(list(trends[0].keys())) 
print(trends[0]['trends'])

<class 'twitter.api.TwitterListResponse'>
['trends', 'as_of', 'created_at', 'locations']
[{'name': '#NLCS', 'url': 'http://twitter.com/search?q=%23NLCS', 'promoted_content': None, 'query': '%23NLCS', 'tweet_volume': 19961}, {'name': 'Shep Smith', 'url': 'http://twitter.com/search?q=%22Shep+Smith%22', 'promoted_content': None, 'query': '%22Shep+Smith%22', 'tweet_volume': 118020}, {'name': '#NationalComingOutDay', 'url': 'http://twitter.com/search?q=%23NationalComingOutDay', 'promoted_content': None, 'query': '%23NationalComingOutDay', 'tweet_volume': 147535}, {'name': 'El Camino', 'url': 'http://twitter.com/search?q=%22El+Camino%22', 'promoted_content': None, 'query': '%22El+Camino%22', 'tweet_volume': 125359}, {'name': '#SaddleridgeFire', 'url': 'http://twitter.com/search?q=%23SaddleridgeFire', 'promoted_content': None, 'query': '%23SaddleridgeFire', 'tweet_volume': 72318}, {'name': '#DayoftheGirl', 'url': 'http://twitter.com/search?q=%23DayoftheGirl', 'promoted_content': None, 'query'

## Example 3. Displaying API responses as pretty-printed JSON

The `json` module is used here to convert the dictionary-like Python response above into a JSON string that can be written to a file.<br>
We also pass `indent=1` to make the result easier to read.

In [22]:
import json
trends_js = json.dumps(us_trends[:2], indent=1)
print(trends_js)

[
 {
  "trends": [
   {
    "name": "Robert Forster",
    "url": "http://twitter.com/search?q=%22Robert+Forster%22",
    "promoted_content": null,
    "query": "%22Robert+Forster%22",
    "tweet_volume": 22321
   },
   {
    "name": "Anibal Sanchez",
    "url": "http://twitter.com/search?q=%22Anibal+Sanchez%22",
    "promoted_content": null,
    "query": "%22Anibal+Sanchez%22",
    "tweet_volume": 15879
   },
   {
    "name": "#ShaneXJeffree",
    "url": "http://twitter.com/search?q=%23ShaneXJeffree",
    "promoted_content": null,
    "query": "%23ShaneXJeffree",
    "tweet_volume": 27380
   },
   {
    "name": "#NLCS",
    "url": "http://twitter.com/search?q=%23NLCS",
    "promoted_content": null,
    "query": "%23NLCS",
    "tweet_volume": 19961
   },
   {
    "name": "Bayley",
    "url": "http://twitter.com/search?q=Bayley",
    "promoted_content": null,
    "query": "Bayley",
    "tweet_volume": 31317
   },
   {
    "name": "#WWEDraft",
    "url": "http://twitter.com/search?q=%23WW

In [23]:
type(trends_js)

str
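By default `json.dumps` escapes non-ASCII characters as `\uXXXX` sequences. If you prefer to see trend names in their original script, the `ensure_ascii=False` option keeps them as they are. The sketch below uses a one-item sample list rather than a live API response.

```python
import json

# One trend entry taken from the worldwide output above.
sample = [{"name": "多摩川", "tweet_volume": 89763}]

escaped = json.dumps(sample, indent=1)                       # default: \uXXXX escapes
readable = json.dumps(sample, indent=1, ensure_ascii=False)  # keeps the characters

print(escaped)
print(readable)
```

Both strings decode to the same Python object, so the choice only affects how the text looks.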

## Example 4. Computing the intersection of two sets of trends

In [24]:
# Get trend['name'] from each element (dict) of world_trends[0]['trends']:
# [{'name':  ,'url':  ,'promoted_content':   ,'query':  ,'tweet_volume':   }, {}...]
# and collect the names into a set: set([ ... ])

In [25]:
trends_set = {}
trends_set['world'] = set([trend['name'] 
                        for trend in world_trends[0]['trends']])

trends_set['us'] = set([trend['name'] 
                     for trend in us_trends[0]['trends']]) 

trends_set['san diego'] = set([trend['name'] 
                     for trend in local_trends[0]['trends']]) 

In [18]:
trends_set['world']

{'#BigilTrailerDay',
 '#Eliud159',
 '#MaineCarloOnEB',
 '#PVLCollegiateConference2019',
 '#PrayForJapan',
 '#ProvaDeFogo',
 '#QuestionMark',
 '#SaturdayMotivation',
 '#ShaneXJeffree',
 '#TeamNeilEnPremiosTelehit',
 '#TheReadOnFuse',
 '#WWEDraft',
 '#cumartesi',
 '#اليوم_العالمي_للرسايل',
 '#كلمه_وقولها_للحب',
 '#台風19',
 '#台風だけど出社させた企業',
 '#台風怖すぎるから会いたい人たち',
 '#激獣神祭',
 'Anibal Sanchez',
 'Bayley',
 'Chocked',
 'Juanfer',
 'Ralf',
 'Robert Forster',
 'Ryan Zimmerman',
 'うんこ10連ガチャ',
 'ほん怖',
 'アナザーディエンド',
 'エリアメール',
 'カエサル',
 'ハイキュー',
 'ピーク',
 'プロミ',
 'ライブカメラ',
 '多摩川',
 '暴風域',
 '氾濫危険水位',
 '江戸川区',
 '無事帰宅',
 '緊急速報',
 '警戒レベル4',
 '避難勧告',
 '避難場所',
 '避難所',
 '避難指示',
 '避難準備',
 '避難警報',
 '雨と風',
 '雨漏り'}

In [26]:
for loc in ['world','us','san diego']:
    print(('-'*10,loc))
    print((','.join(trends_set[loc])))

('----------', 'world')
プロミ,#ProvaDeFogo,避難指示,避難警報,#台風だけど出社させた企業,避難勧告,エリアメール,#BigilTrailerDay,#اليوم_العالمي_للرسايل,ほん怖,#MaineCarloOnEB,無事帰宅,Ryan Zimmerman,Juanfer,#ShaneXJeffree,#激獣神祭,ライブカメラ,雨漏り,#PrayForJapan,#台風19,多摩川,アナザーディエンド,#cumartesi,#台風怖すぎるから会いたい人たち,うんこ10連ガチャ,緊急速報,カエサル,Bayley,Chocked,ハイキュー,氾濫危険水位,Ralf,暴風域,#TheReadOnFuse,雨と風,避難準備,#SaturdayMotivation,#كلمه_وقولها_للحب,Robert Forster,避難場所,#PVLCollegiateConference2019,#TeamNeilEnPremiosTelehit,#QuestionMark,避難所,#Eliud159,警戒レベル4,江戸川区,#WWEDraft,Anibal Sanchez,ピーク
('----------', 'us')
Just a Theory,#loveafterlockup,Chixtape 5,Bryce Perkins,#LandmarkSongOrShow,#PFLPlayoffs,#BreakingBadAFilm,#CartoonCandies,#fridaynight,#wisfb,Kevin McAleenan,End of 3rd,#ShaneXJeffree,#UVAvsMIA,With Love 2,Like Mike,where's rudy,Mikolas,Montez,#wafbscores,#ShepardSmith,#TrumpTaxDeductions,#CUvsORE,no hitter,#Dateline,Bayley,Chocked,#GhostNation,#BMovieManiacs,#TheReadOnFuse,#NLCS,#TheU,Enos,#FoundInTheTrunk,#SorryChuckTodd,Gobert,The Descendants,Robert

In [27]:
print(( '='*10,'intersection of world and us'))
print((trends_set['world'].intersection(trends_set['us'])))

print(('='*10,'intersection of us and san-diego'))
print((trends_set['san diego'].intersection(trends_set['us'])))

{'#ShaneXJeffree', 'Bayley', 'Chocked', 'Robert Forster', '#QuestionMark', '#TheReadOnFuse', '#WWEDraft', 'Anibal Sanchez'}
{'Just a Theory', '#loveafterlockup', 'Chixtape 5', '#LandmarkSongOrShow', '#PFLPlayoffs', '#fridaynight', '#CartoonCandies', '#BreakingBadAFilm', 'End of 3rd', '#ShaneXJeffree', '#UVAvsMIA', 'With Love 2', 'Like Mike', "where's rudy", 'Mikolas', 'Montez', '#wafbscores', '#TrumpTaxDeductions', 'no hitter', '#Dateline', 'Bayley', 'Chocked', '#GhostNation', '#BMovieManiacs', '#TheReadOnFuse', '#NLCS', '#TheU', 'Enos', '#FoundInTheTrunk', '#SorryChuckTodd', '#iahsfb', 'Breeland', '#QuestionMark', '#BlueBloods', 'Empire Records', 'daniel hudson', 'Buffs', '#WWEDraft', '#WSHvsSTL', '#SignsYouAreSexy', 'Anibal Sanchez', '#CUvsORE'}


In [28]:
type (trends_set['world'])

set
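The same pattern can be tried offline. In this sketch, `toy_world` and `toy_us` are made-up responses that only mimic the shape of the `trends.place()` result; the trend names are invented.

```python
# Toy responses shaped like the trends result (invented trend names).
toy_world = [{"trends": [{"name": "#A"}, {"name": "#B"}, {"name": "#C"}]}]
toy_us = [{"trends": [{"name": "#B"}, {"name": "#C"}, {"name": "#D"}]}]

# A set comprehension is equivalent to set([...]) around a list comprehension.
world_names = {trend["name"] for trend in toy_world[0]["trends"]}
us_names = {trend["name"] for trend in toy_us[0]["trends"]}

common = world_names & us_names    # same as world_names.intersection(us_names)
only_us = us_names - world_names   # trends in the US set but not the world set

print(sorted(common))
print(sorted(only_us))
```

Sets are unordered, which is why the output is sorted before printing.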

## Example 5. Collecting search results

Set the variable `q` to a trending topic, 
or anything else for that matter. The example query below
was a trending topic when this content was being developed
and is used throughout the remainder of this chapter.

`search.tweets( )` parameters:<br>
`q`: the search query, 500 characters maximum<br>
`count`: the number of tweets to return per page, up to a maximum of 100.

In [29]:
q = '#MTVAwards'  # topic

number = 100 # most recent

# See https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets

search_results = twitter_api.search.tweets(q=q, count=number)

statuses = search_results['statuses'] # [{'created_at':  ,'text':  ,... }, { }...]

In [19]:
type (search_results) # dict-like

twitter.api.TwitterDictResponse

In [20]:
search_results.keys()

dict_keys(['statuses', 'search_metadata'])

In [21]:
len(statuses)


98

In [22]:
print(statuses) # the tweets returned by the search

[{'created_at': 'Fri Oct 11 10:32:10 +0000 2019', 'id': 1182604798924906496, 'id_str': '1182604798924906496', 'text': 'RT @MTVAwards: 😱MOST FRIGHTENED PERFORMANCE GOES TO....😱\n\n#SandraBullock in "Bird Box" #MTVAwards https://t.co/P2qNbSfZBQ', 'truncated': False, 'entities': {'hashtags': [{'text': 'SandraBullock', 'indices': [58, 72]}, {'text': 'MTVAwards', 'indices': [87, 97]}], 'symbols': [], 'user_mentions': [{'screen_name': 'MTVAwards', 'name': 'Movie & TV Awards', 'id': 834116685577740290, 'id_str': '834116685577740290', 'indices': [3, 13]}], 'urls': [], 'media': [{'id': 1140787812209266689, 'id_str': '1140787812209266689', 'indices': [98, 121], 'media_url': 'http://pbs.twimg.com/media/D9TknldU4AEy2OY.jpg', 'media_url_https': 'https://pbs.twimg.com/media/D9TknldU4AEy2OY.jpg', 'url': 'https://t.co/P2qNbSfZBQ', 'display_url': 'pic.twitter.com/P2qNbSfZBQ', 'expanded_url': 'https://twitter.com/MTVAwards/status/1140788314238148609/video/1', 'type': 'photo', 'sizes': {'thumb': {'w': 15

Twitter often returns duplicate results (for example, when many people retweet the same tweet, the text is identical). We can filter them out by checking for duplicate texts:

In [23]:
all_text = []
filtered_statuses = []
for s in statuses: # s: dict
    if s["text"] not in all_text:
        filtered_statuses.append(s)
        all_text.append(s["text"])
statuses = filtered_statuses     

In [24]:
len(statuses)

69
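For larger result lists, membership tests on a list get slow because every check scans the whole list. A possible variation, sketched here on made-up statuses, tracks the texts already seen in a set while keeping the first occurrence of each tweet in its original order.

```python
# Made-up statuses; only the fields needed for the illustration.
toy_statuses = [
    {"id": 1, "text": "RT @a: hello"},
    {"id": 2, "text": "something else"},
    {"id": 3, "text": "RT @a: hello"},   # same text as id 1
]

seen_texts = set()
unique_statuses = []
for status in toy_statuses:
    if status["text"] not in seen_texts:   # set lookup, no full scan
        seen_texts.add(status["text"])
        unique_statuses.append(status)

print([status["id"] for status in unique_statuses])
```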

In [25]:
[s['text'] for s in search_results['statuses']] # s: dict

['RT @MTVAwards: 😱MOST FRIGHTENED PERFORMANCE GOES TO....😱\n\n#SandraBullock in "Bird Box" #MTVAwards https://t.co/P2qNbSfZBQ',
 'RT @MTVAwards: "Family is what you fight for. Family is what you protect." - #SandraBullock ❤️\n\n#MTVAwards',
 'RT @MTV: Sandra Bullock takes home the golden popcorn for her Most Frightened Performance in #BirdBox! #MTVAwards https://t.co/YJ3R6exUkc',
 'RT @sanbullockdaily: #SandraBullock just took home the Golden Popcorn for Most Frightened Performance at the #MTVAwards for her role in #Ne…',
 'RT @theblackpanther: Congratulations to @ChadwickBoseman for winning tonight’s @MTV Movie Award for “Best Performance in a Movie” as T’Chal…',
 'RT @theblackpanther: “@TheBlackPanther” has won “Best Movie” at the @MTV Awards! Thank you to all of the fans for your support. #WakandaFor…',
 'RT @Variety: #StrangerThings @noah_schnapp wants to meet @Zendaya: "I love her" #MTVAwards https://t.co/WUK5j3LLqV https://t.co/SK2bIPNmWQ',
 'RT @MTVAwards: NEVER FORGET that @xti

In [26]:
# Show one sample search result by indexing the list...
print(json.dumps(statuses[0], indent=1))

{
 "created_at": "Fri Oct 11 10:32:10 +0000 2019",
 "id": 1182604798924906496,
 "id_str": "1182604798924906496",
 "text": "RT @MTVAwards: \ud83d\ude31MOST FRIGHTENED PERFORMANCE GOES TO....\ud83d\ude31\n\n#SandraBullock in \"Bird Box\" #MTVAwards https://t.co/P2qNbSfZBQ",
 "truncated": false,
 "entities": {
  "hashtags": [
   {
    "text": "SandraBullock",
    "indices": [
     58,
     72
    ]
   },
   {
    "text": "MTVAwards",
    "indices": [
     87,
     97
    ]
   }
  ],
  "symbols": [],
  "user_mentions": [
   {
    "screen_name": "MTVAwards",
    "name": "Movie & TV Awards",
    "id": 834116685577740290,
    "id_str": "834116685577740290",
    "indices": [
     3,
     13
    ]
   }
  ],
  "urls": [],
  "media": [
   {
    "id": 1140787812209266689,
    "id_str": "1140787812209266689",
    "indices": [
     98,
     121
    ],
    "media_url": "http://pbs.twimg.com/media/D9TknldU4AEy2OY.jpg",
    "media_url_https": "https://pbs.twimg.com/media/D9TknldU4AEy2OY.jpg",
    "url"

In [27]:
# Take the first status as a sample and assign it to the variable t.
t = statuses[0] # t: dict
# Alternatively, select a specific tweet by its id. The list comprehension
# returns a list with only one element, which is accessed by its index [0]:
#[ status for status in statuses 
#          if status['id'] == 316948241264549888 ][0]

# Explore the variable t to get familiar with the data structure...

print(t['retweet_count'])
print(t['retweeted'])


171
False


## Example 6. Extracting text, screen names, and hashtags from tweets

In [31]:
status_texts = [ status['text'] 
                 for status in statuses ]
# Screen name is the username of the Twitter account.
screen_names = [ user_mention['screen_name'] 
                 for status in statuses # status: dict
                     for user_mention in status['entities']['user_mentions'] ]  # nested, like a hierarchical database

hashtags = [ hashtag['text'] 
             for status in statuses
                 for hashtag in status['entities']['hashtags'] ]

# Compute a collection of all words from all tweets
words = [ w 
          for t in status_texts 
              for w in t.split() ] # split extracted text into words

In [32]:
# Explore the first 5 items for each...

print(json.dumps(status_texts[0:5], indent=1))
print(json.dumps(screen_names[0:5], indent=1)) 
print(json.dumps(hashtags[0:5], indent=1))
print(json.dumps(words[0:5], indent=1))

[
 "RT @MTVteenwolf: \u2728 me and my pack watching the #MTVAwards tonight at 9/8c \u2728 https://t.co/ERQpzN9zWt",
 "RT @CMBYNFilm: Congratulations to @RealChalamet on his \n@MTVAwards nomination for Best Performance in a Movie. Vote today: https://t.co/SVb\u2026",
 "RT @MTVAwards: \ud83d\ude31MOST FRIGHTENED PERFORMANCE GOES TO....\ud83d\ude31\n\n#SandraBullock in \"Bird Box\" #MTVAwards https://t.co/P2qNbSfZBQ",
 "RT @MTVAwards: \"Family is what you fight for. Family is what you protect.\" - #SandraBullock \u2764\ufe0f\n\n#MTVAwards",
 "RT @MTV: Sandra Bullock takes home the golden popcorn for her Most Frightened Performance in #BirdBox! #MTVAwards https://t.co/YJ3R6exUkc"
]
[
 "MTVteenwolf",
 "CMBYNFilm",
 "RealChalamet",
 "MTVAwards",
 "MTVAwards"
]
[
 "MTVAwards",
 "SandraBullock",
 "MTVAwards",
 "SandraBullock",
 "MTVAwards"
]
[
 "RT",
 "@MTVteenwolf:",
 "\u2728",
 "me",
 "and"
]
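The nested comprehensions above read left to right like nested `for` loops. The sketch below shows the equivalence on made-up statuses that contain only the `entities` and `hashtags` fields.

```python
# Made-up statuses with just enough structure for the illustration.
toy_statuses = [
    {"entities": {"hashtags": [{"text": "One"}, {"text": "Two"}]}},
    {"entities": {"hashtags": []}},                      # no hashtags at all
    {"entities": {"hashtags": [{"text": "Three"}]}},
]

# Comprehension form: the outer loop comes first, the inner loop second.
flat = [hashtag["text"]
        for status in toy_statuses
        for hashtag in status["entities"]["hashtags"]]

# Equivalent explicit loops.
looped = []
for status in toy_statuses:
    for hashtag in status["entities"]["hashtags"]:
        looped.append(hashtag["text"])

print(flat)
print(flat == looped)
```

A status without hashtags simply contributes nothing to the result.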


## Example 7. Creating a basic frequency distribution from the words in tweets

In [33]:
from collections import Counter

for item in [words, screen_names, hashtags]:
    c = Counter(item)
    print (type(c)) # a Counter: a dict subclass that maps each item to its count
    print(c.most_common()[:10]) # top 10, a list of (item, count) tuples
    print()

<class 'collections.Counter'>
[('RT', 73), ('the', 70), ('#MTVAwards', 56), ('at', 47), ('to', 31), ('in', 29), ('of', 27), ('for', 23), ('and', 20), ('@MTV:', 19)]

<class 'collections.Counter'>
[('MTV', 23), ('MTVAwards', 20), ('MTVteenwolf', 6), ('xtina', 6), ('LilKim', 6), ('MYAPLANET9', 6), ('Pink', 6), ('G_R_IND', 4), ('ITMovieOfficial', 4), ('theblackpanther', 3)]

<class 'collections.Counter'>
[('MTVAwards', 59), ('tidal', 7), ('revolt', 7), ('ITMovie', 6), ('massappeal', 5), ('grammyawards', 5), ('SandraBullock', 4), ('mtvawards', 3), ('arsenal', 3), ('AB6IX', 3)]
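To see what `Counter` does on a small scale, here is a sketch with a made-up word list. Passing a number to `most_common` is an alternative to slicing the full list.

```python
from collections import Counter

toy_words = ["RT", "the", "RT", "#tag", "the", "RT"]  # made-up tokens
counts = Counter(toy_words)

print(counts["RT"])            # how often one item occurs
print(counts["missing"])       # unseen items count as 0 instead of raising KeyError
print(counts.most_common(2))   # the two most frequent (item, count) pairs
```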



## Example 8. Create a prettyprint function to display tuples in a nice tabular format

In [31]:
def prettyprint_counts(label, list_of_tuples):
    print("\n{:^20} | {:^6}".format(label, "Count")) # \n: start on a new line, {:^20}: centered within 20 characters
    print("*"*40)
    for k,v in list_of_tuples:
        print("{:20} | {:>6}".format(k,v))  # k,v : name and count number, {:>6}: right align
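The format specifications used in `prettyprint_counts` can be checked in isolation. In this sketch the widths are shortened to 6 so the padding is easy to count; the variable names are illustrative.

```python
centered = "{:^6}".format("ab")   # ^ centers the value within 6 characters
right = "{:>6}".format(42)        # > right-aligns the value
left = "{:6}".format("ab")        # strings are left-aligned by default

# repr() makes the padding spaces visible.
print(repr(centered))
print(repr(right))
print(repr(left))
```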

In [32]:
for label, data in (('Word', words), 
                    ('Screen Name', screen_names), 
                    ('Hashtag', hashtags)):
    
    c = Counter(data)
    prettyprint_counts(label, c.most_common()[:10])


        Word         | Count 
****************************************
RT                   |     53
the                  |     46
#MTVAwards           |     38
in                   |     27
to                   |     27
at                   |     25
for                  |     23
of                   |     15
@MTV:                |     13
*                    |     12

    Screen Name      | Count 
****************************************
MTV                  |     18
MTVAwards            |     12
theblackpanther      |      3
G_R_IND              |      3
ladygaga             |      3
MTVteenwolf          |      3
noah_schnapp         |      2
brielarson           |      2
iamKingLos           |      2
RoddyRicch           |      2

      Hashtag        | Count 
****************************************
MTVAwards            |     42
tidal                |      6
revolt               |      6
grammyawards         |      5
SandraBullock        |      4
massappeal           |      4
mtva

## Example 9. Finding the most popular retweets (sort the status['retweet_count'])

Retweets always contain two Tweet objects. <br>
The 'original' Tweet being Retweeted is provided in a "retweeted_status" object.

In [33]:
retweets = [
            # Store a tuple of these three values ...
            (status['retweet_count'],  # the popularity of a tweet
             status['retweeted_status']['user']['screen_name'], # the original author's screen name. Mentions in general: status['entities']['user_mentions']
             # check https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/intro-to-tweet-json
             status['text'].replace("\n","\\"))  # str.replace(old, new): replaces each newline with a backslash so the text stays on one line
            
            # ... for each status ...
            for status in statuses 
            
            # ... so long as the status meets this condition.
                if 'retweeted_status' in status # ensure it's a retweet; the object structure is different
           ]

In [34]:
retweets[:2]

[(171,
  'MTVAwards',
  'RT @MTVAwards: 😱MOST FRIGHTENED PERFORMANCE GOES TO....😱\\\\#SandraBullock in "Bird Box" #MTVAwards https://t.co/P2qNbSfZBQ'),
 (76,
  'MTVAwards',
  'RT @MTVAwards: "Family is what you fight for. Family is what you protect." - #SandraBullock ❤️\\\\#MTVAwards')]
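The filter and the tuple layout can be tried without API access. The statuses below are made up and carry only the fields the comprehension touches. Because each tuple starts with the retweet count, sorting the tuples in reverse order ranks them by popularity first.

```python
# Made-up statuses; only retweets carry a "retweeted_status" object.
toy_statuses = [
    {"text": "RT @alice: hi", "retweet_count": 5,
     "retweeted_status": {"user": {"screen_name": "alice"}}},
    {"text": "an original tweet", "retweet_count": 0},
    {"text": "RT @bob: yo", "retweet_count": 12,
     "retweeted_status": {"user": {"screen_name": "bob"}}},
]

toy_retweets = [
    (s["retweet_count"], s["retweeted_status"]["user"]["screen_name"], s["text"])
    for s in toy_statuses
    if "retweeted_status" in s      # skip original tweets
]

# Tuples compare element by element, so the count decides the order.
ranked = sorted(toy_retweets, reverse=True)
print(ranked)
```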

We can build another `prettyprint` function to print entire tweets with their retweet count.

We also want to split the text of the tweet in up to 3 lines, if needed.

In [35]:
str_test = 'I love you,\n but I do not know how' 
print (str_test)
print (str_test.replace("\n","\\"))

print ('It\'s raining') # \' escapes the quote inside the string
#print ('It's raining')  # without the escape this is a syntax error
print ('\hello') # \h is not a recognized escape sequence, so the backslash is kept

I love you,
 but I do not know how
I love you,\ but I do not know how
It's raining
\hello


Python uses **the backslash '\\' to 1. signal a special sequence (such as `\n` for a newline) or 2. escape a character that would otherwise have a special meaning (such as a quote)**; if the escape sequence isn’t recognized by Python’s parser, the backslash and subsequent character are included in the resulting string.<br>
https://www.pitt.edu/~naraehan/python2/tutorial7.html
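Counting characters makes the escaping rules concrete. The sketch below compares a real newline, an escaped backslash, and a raw string; the variable names are illustrative.

```python
newline_text = "a\nb"     # \n is a single character: a newline
escaped_text = "a\\nb"    # \\ is a single backslash, followed by the letter n
raw_text = r"a\nb"        # raw string: the backslash is kept literally

print(len(newline_text))
print(len(escaped_text))
print(escaped_text == raw_text)
```

This is why `replace("\n", "\\")` swaps each newline for exactly one backslash character.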

In [36]:
row_template = "{:^7} | {:^15} | {:50}"
def prettyprint_tweets(list_of_tuples):
    print() # an empty line
    print(row_template.format("Count", "Screen Name", "Text"))
    print("*"*60)
    for count, screen_name, text in list_of_tuples:
        print(row_template.format(count, screen_name, text[:50])) # print the text of a tweet in up to 3 rows; the first row holds 50 characters
        if len(text) > 50:
            print(row_template.format("", "", text[50:100])) # continue the remaining text on the next row
            if len(text) > 100:
                print(row_template.format("", "", text[100:]))
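The slicing in `prettyprint_tweets` can be generalized to any number of rows. The helper below, `split_into_rows`, is a hypothetical name and is not part of the notebook. Unlike the function above, which puts everything after character 100 on the third row, it caps every row at the chosen width.

```python
def split_into_rows(text, width=50):
    """Return consecutive slices of text, each at most `width` characters long."""
    return [text[i:i + width] for i in range(0, len(text), width)]

rows = split_into_rows("x" * 120)   # a made-up 120-character text
print([len(row) for row in rows])
```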

In [37]:
# Slice off the first 10 from the sorted results and display each item in the tuple

prettyprint_tweets(sorted(retweets, reverse=True)[:10]) # sort retweets by the first variable: Count


 Count  |   Screen Name   | Text                                              
************************************************************
 13893  |    getFANDOM    | RT @getFANDOM: Chris Pratt with all of the wisdom 
        |                 | #MTVAwards https://t.co/eu5cXU7WcQ                
 9307   |    ladygaga     | RT @ladygaga: So happy that #GagaFiveFootTwo won B
        |                 | est Music Documentary at the #MTVAwards! Thank u L
        |                 | ittle Monsters &amp; @MTV!! 😘 https://t.co/…      
 6588   |       MTV       | RT @MTV: YES! Girl Power!! @brielarson #MTVAwards 
        |                 | https://t.co/QHBKbpNJe0                           
 5704   | theblackpanther | RT @theblackpanther: “@TheBlackPanther” has won “B
        |                 | est Movie” at the @MTV Awards! Thank you to all of
        |                 |  the fans for your support. #WakandaFor…          
 4529   |       MTV       | RT @MTV: Congratulations to @RobertDowney

RT @userName:<br>
at the start of the text means the message is a retweet of that user's tweet.