## Table of Contents
- [Introduction](#intro)
- [Part I - Data Gathering](#gathering)
- [Part II - Data Accessing](#accessing)
- [Part III - Data exploratory](#explore)

In [1]:
#import necessary libraries
import numpy as np
import pandas as pd
import requests
import os
import tweepy
import json

<a id='gathering'></a>
### Data Gathering


- [1. get twitter_archive_enhanced.csv](#getdf1)
- [2. get image_predictions.tsv online](#getdf2)
- [3. get tweet_json.txt with tweey api](#getdf3)

<a id='getdf1'></a>
##### 1. get `twitter_archive_enhanced.csv`

In [2]:
#get twitter_archive_enhanced.csv
df1 = pd.read_csv('twitter-archive-enhanced.csv')
df1.head(1)

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
0,892420643555336193,,,2017-08-01 16:23:56 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Phineas. He's a mystical boy. Only eve...,,,,https://twitter.com/dog_rates/status/892420643...,13,10,Phineas,,,,


<a id='getdf2'></a>
##### 2. get `image_predictions.tsv` online

In [3]:
#get image_predictions.tsv

url = 'https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv'
response = requests.get(url)
with open (os.path.join(url.split('/')[-1]), mode = 'wb') as file:
    file.write(response.content)
    
df2 = pd.read_csv('image-predictions.tsv', sep = '\t')
df2.head()

Unnamed: 0,tweet_id,jpg_url,img_num,p1,p1_conf,p1_dog,p2,p2_conf,p2_dog,p3,p3_conf,p3_dog
0,666020888022790149,https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg,1,Welsh_springer_spaniel,0.465074,True,collie,0.156665,True,Shetland_sheepdog,0.061428,True
1,666029285002620928,https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg,1,redbone,0.506826,True,miniature_pinscher,0.074192,True,Rhodesian_ridgeback,0.07201,True
2,666033412701032449,https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg,1,German_shepherd,0.596461,True,malinois,0.138584,True,bloodhound,0.116197,True
3,666044226329800704,https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg,1,Rhodesian_ridgeback,0.408143,True,redbone,0.360687,True,miniature_pinscher,0.222752,True
4,666049248165822465,https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg,1,miniature_pinscher,0.560311,True,Rottweiler,0.243682,True,Doberman,0.154629,True


<a id='getdf3'></a>
##### 3. get `tweet_json.txt` with tweey api
##### And read this .txt file line by line into a pandas DataFrame `df3`

In [4]:


twitter_keys = {
        'consumer_key':        '*****',
        'consumer_secret':     '*****',
        'access_token_key':    '*****',
        'access_token_secret': '*****'
    }


auth = tweepy.OAuthHandler(twitter_keys['consumer_key'], twitter_keys['consumer_secret'])
auth.set_access_token(twitter_keys['access_token_key'], twitter_keys['access_token_secret'])
api = tweepy.API(auth,wait_on_rate_limit=True, wait_on_rate_limit_notify=True)


In [5]:

if os.path.exists('tweet_json.txt'):
     os.remove('tweet_json.txt')

with open (os.path.join('tweet_json.txt'), mode = 'a') as file:
     for i in twi_a_e.tweet_id:
        try:
             json.dump(api.get_status(i,tweet_mode='extended')._json, file)
             file.write('\n')
        except:
             continue
        
        
        

In [6]:

tweet_id = []
retweet_count = []
favorite_count = []

with open('tweet_json.txt', mode = 'r') as file:
    lines = file.readlines()
    for line in lines:
        jsonfile = json.loads(line)
        tweet_id.append(jsonfile["id"])
        retweet_count.append(jsonfile["retweet_count"])
        favorite_count.append(jsonfile["favorite_count"])
        
        

In [7]:
df3 = pd.DataFrame({'tweet_id': tweet_id, 'favorite_count':favorite_count, 'retweet_count': retweet_count})
df3.head()

Unnamed: 0,tweet_id,favorite_count,retweet_count
0,892420643555336193,35266,7436
1,892177421306343426,30519,5527
2,891815181378084864,22950,3648
3,891689557279858688,38555,7613
4,891327558926688256,36833,8195


In [8]:
df3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2331 entries, 0 to 2330
Data columns (total 3 columns):
tweet_id          2331 non-null int64
favorite_count    2331 non-null int64
retweet_count     2331 non-null int64
dtypes: int64(3)
memory usage: 54.7 KB


<a id='accessing'></a>
### Data Assessing & Cleaning

In this part, we do data assessing and data cleaning iteratively. First we access tidiness issues and fix those issues, then we examine quality issues and clean all of them. Last, we store the clean DataFrame in a CSV file with the main one named `twitter_archive_master.csv`.


### Data Assessing

In order to better examine the quality issue, the tidiness issue will be solved firstly.

##### 2 tidiness issues:
- types of dogs take several columns( doggo, floofer, pupper, puppo)
- Too many tables.


In [9]:
df1.head(1)

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
0,892420643555336193,,,2017-08-01 16:23:56 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Phineas. He's a mystical boy. Only eve...,,,,https://twitter.com/dog_rates/status/892420643...,13,10,Phineas,,,,


In [10]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2356 entries, 0 to 2355
Data columns (total 17 columns):
tweet_id                      2356 non-null int64
in_reply_to_status_id         78 non-null float64
in_reply_to_user_id           78 non-null float64
timestamp                     2356 non-null object
source                        2356 non-null object
text                          2356 non-null object
retweeted_status_id           181 non-null float64
retweeted_status_user_id      181 non-null float64
retweeted_status_timestamp    181 non-null object
expanded_urls                 2297 non-null object
rating_numerator              2356 non-null int64
rating_denominator            2356 non-null int64
name                          2356 non-null object
doggo                         2356 non-null object
floofer                       2356 non-null object
pupper                        2356 non-null object
puppo                         2356 non-null object
dtypes: float64(4), int64(3), ob

In [11]:
df2.head(2)

Unnamed: 0,tweet_id,jpg_url,img_num,p1,p1_conf,p1_dog,p2,p2_conf,p2_dog,p3,p3_conf,p3_dog
0,666020888022790149,https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg,1,Welsh_springer_spaniel,0.465074,True,collie,0.156665,True,Shetland_sheepdog,0.061428,True
1,666029285002620928,https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg,1,redbone,0.506826,True,miniature_pinscher,0.074192,True,Rhodesian_ridgeback,0.07201,True


In [12]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2075 entries, 0 to 2074
Data columns (total 12 columns):
tweet_id    2075 non-null int64
jpg_url     2075 non-null object
img_num     2075 non-null int64
p1          2075 non-null object
p1_conf     2075 non-null float64
p1_dog      2075 non-null bool
p2          2075 non-null object
p2_conf     2075 non-null float64
p2_dog      2075 non-null bool
p3          2075 non-null object
p3_conf     2075 non-null float64
p3_dog      2075 non-null bool
dtypes: bool(3), float64(3), int64(2), object(4)
memory usage: 152.1+ KB


In [13]:
df3.head()

Unnamed: 0,tweet_id,favorite_count,retweet_count
0,892420643555336193,35266,7436
1,892177421306343426,30519,5527
2,891815181378084864,22950,3648
3,891689557279858688,38555,7613
4,891327558926688256,36833,8195


In [14]:
df3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2331 entries, 0 to 2330
Data columns (total 3 columns):
tweet_id          2331 non-null int64
favorite_count    2331 non-null int64
retweet_count     2331 non-null int64
dtypes: int64(3)
memory usage: 54.7 KB


### Data Cleaning

In order to better examine the quality issue, the tidiness issue will be solved firstly.

##### fix tidiness issue

> - Combine three tables together based on tweet_id.
> - Combine four columns together.


In [15]:
# combine three tables together on tweet_id.
df = pd.merge(pd.merge(df1,df2,how = 'left', on='tweet_id'),df3, how = 'left', on = 'tweet_id')

In [16]:
#Combine four columns together.
df.dog_type = df.doggo + df.floofer + df.pupper + df.puppo
df.dog_type = df.dog_type.apply(lambda x : x.replace('None',''))
df.dog_type = df.dog_type.apply(lambda x : None if (x == '') else x)
df.dog_type = df.dog_type.apply(lambda x :'doggo, floofer' if (x == 'doggofloofer') else x)
df.dog_type = df.dog_type.apply(lambda x : 'doggo, pupper' if (x == 'doggopupper') else x)
df.dog_type = df.dog_type.apply(lambda x : 'doggo, puppo' if (x == 'doggopuppo') else x)
df.dog_type.value_counts()

  


pupper            245
doggo              83
puppo              29
doggo, pupper      12
floofer             9
doggo, floofer      1
doggo, puppo        1
dtype: int64

In [17]:
#eliminate doggo, pupper, floofer, puppo columns.
df = df.drop(['doggo','pupper','floofer','puppo'], axis = 1)

### Data Assessing

Since the tidiness issue has been solved, we come back to data Assessing part to check the quality issues. 

##### 8 quality issues:
-  some tweets have no images or they are not the original tweet.
-  incomplete and redundant columns: `in_reply_to_status_id`, `in_reply_to_user_id`,`retweeted_status_id`, `retweeted_status_user_id`, `retweeted_status_timestamp`
- `timestamp` and `retweeted_status_timestamp` datatype and format issue 
- `favorite_count` and `retweet_count` datatype issue
-  mistakes in `name` column: something like `a`, `an`, `the`,`all`, can be detected via islower() function.
-  numerator and denominator mistakes on the tweet with tweet ID = `810984652412424192` and `740373189193256964` and `722974582966214656` and `716439118184652801` and `682962037429899265` and `666287406224695296`: something like 24/7, 9/11, 50/50
-  issues on `p1_dog == False` while it actually True.
-  dog stage issue.

##### Checked:

- no duplicated `tweet_id`
- `in_reply_to_status_id`, `in_reply_to_user_id`, `retweeted_status_id`, `retweeted_status_user_id` and `retweeted_status_timestamp` are not quite useful in our analysis since they have only a few records and they are also in a wrong datatype. In order to make our table tidier, it's better remove those columns.
- 


In [18]:
df.head(1)

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,...,p1_conf,p1_dog,p2,p2_conf,p2_dog,p3,p3_conf,p3_dog,favorite_count,retweet_count
0,892420643555336193,,,2017-08-01 16:23:56 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Phineas. He's a mystical boy. Only eve...,,,,https://twitter.com/dog_rates/status/892420643...,...,0.097049,False,bagel,0.085851,False,banana,0.07611,False,35266.0,7436.0


In [19]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2356 entries, 0 to 2355
Data columns (total 26 columns):
tweet_id                      2356 non-null int64
in_reply_to_status_id         78 non-null float64
in_reply_to_user_id           78 non-null float64
timestamp                     2356 non-null object
source                        2356 non-null object
text                          2356 non-null object
retweeted_status_id           181 non-null float64
retweeted_status_user_id      181 non-null float64
retweeted_status_timestamp    181 non-null object
expanded_urls                 2297 non-null object
rating_numerator              2356 non-null int64
rating_denominator            2356 non-null int64
name                          2356 non-null object
jpg_url                       2075 non-null object
img_num                       2075 non-null float64
p1                            2075 non-null object
p1_conf                       2075 non-null float64
p1_dog                        

In [20]:
df.tweet_id.nunique()

2356

In [21]:
#df.in_reply_to_status_id.unique()

In [22]:
#df.in_reply_to_user_id.unique()

In [23]:
#df.retweeted_status_id.unique()

In [24]:
#df.rating_numerator.value_counts()

In [77]:
df.rating_denominator.value_counts()

10     1944
80        2
50        2
11        2
150       1
130       1
120       1
110       1
90        1
70        1
40        1
20        1
7         1
Name: rating_denominator, dtype: int64

In [69]:
with pd.option_context('max_colwidth', 400):
     display(df[df['rating_denominator'] != 10]
            [['tweet_id', 'text', 'rating_numerator', 'rating_denominator']])

Unnamed: 0,tweet_id,text,rating_numerator,rating_denominator
433,820690176645140481,The floofs have been released I repeat the floofs have been released. 84/70 https://t.co/NIYC820tmd,84,70
516,810984652412424192,Meet Sam. She smiles 24/7 &amp; secretly aspires to be a reindeer. \nKeep Sam smiling by clicking and sharing this link:\nhttps://t.co/98tB8y7y7t https://t.co/LouL5vdvxx,24,7
902,758467244762497024,Why does this never happen at my front door... 165/150 https://t.co/HmwrdfEfUE,165,150
1068,740373189193256964,"After so many requests, this is Bretagne. She was the last surviving 9/11 search dog, and our second ever 14/10. RIP https://t.co/XAVDNDaVgQ",9,11
1165,722974582966214656,Happy 4/20 from the squad! 13/10 for all https://t.co/eV1diwds8a,4,20
1202,716439118184652801,This is Bluebert. He just saw that both #FinalFur match ups are split 50/50. Amazed af. 11/10 https://t.co/Kky1DPG4iq,50,50
1228,713900603437621249,Happy Saturday here's 9 puppers on a bench. 99/90 good work everybody https://t.co/mpvaVxKmc1,99,90
1254,710658690886586372,Here's a brigade of puppers. All look very prepared for whatever happens next. 80/80 https://t.co/0eb7R1Om12,80,80
1274,709198395643068416,"From left to right:\nCletus, Jerome, Alejandro, Burp, &amp; Titson\nNone know where camera is. 45/50 would hug all at once https://t.co/sedre1ivTK",45,50
1433,697463031882764288,Happy Wednesday here's a bucket of pups. 44/40 would pet all at once https://t.co/HppvrYuamZ,44,40


In [34]:
#df[df.name.str.islower()].name

22              such
56                 a
169            quite
193            quite
369              one
542       incredibly
649                a
759               an
801                a
819             very
822             just
852               my
924              one
988              not
992              his
993              one
1002               a
1004               a
1017               a
1025              an
1031            very
1049               a
1063            just
1071         getting
1097            very
1120            this
1121    unacceptable
1138             all
1193               a
1207               a
            ...     
2161               a
2191               a
2198               a
2204              an
2211               a
2218               a
2222               a
2235               a
2249               a
2255               a
2264               a
2273               a
2287               a
2304               a
2311               a
2314               a
2326         

In [28]:
df.img_num.value_counts()

1.0    1780
2.0     198
3.0      66
4.0      31
Name: img_num, dtype: int64

In [106]:
df.query('p1_dog == False').sample(10)[['p1','p1_conf','jpg_url']]

Unnamed: 0,p1,p1_conf,jpg_url
1010,binoculars,0.192717,https://pbs.twimg.com/media/Cl-EXHSWkAE2IN2.jpg
57,tusker,0.473303,https://pbs.twimg.com/media/DDrk-f9WAAI-WQv.jpg
703,teddy,0.97207,https://pbs.twimg.com/media/CugtKeXWEAAamDZ.jpg
1177,home_theater,0.059033,https://pbs.twimg.com/media/CfznaXuUsAAH-py.jpg
1616,dishwasher,0.888829,https://pbs.twimg.com/media/CYJQxvJW8AAkkws.jpg
580,teddy,0.311928,https://pbs.twimg.com/media/CxvNfrhWQAA2hKM.jpg
1690,seat_belt,0.532441,https://pbs.twimg.com/media/CXSanNkWkAAqR9M.jpg
2299,jigsaw_puzzle,0.560001,https://pbs.twimg.com/media/CUHkkJpXIAA2w3n.jpg
2247,common_iguana,0.999647,https://pbs.twimg.com/media/CUTDtyGXIAARxus.jpg
1530,badger,0.28955,https://pbs.twimg.com/media/CZNzV6cW0AAsX7p.jpg


1727    <a href="http://twitter.com/download/iphone" r...
Name: source, dtype: object

### Data Cleaning

##### Recall 8 quality issues:
-  some tweets have no images or they are not the original tweet.
-  incomplete and redundant columns: `in_reply_to_status_id`, `in_reply_to_user_id`,`retweeted_status_id`, `retweeted_status_user_id`, `retweeted_status_timestamp`
- `timestamp` datatype and format issue 
- `favorite_count` and `retweet_count` datatype issue
-  mistakes in `name` column: something like `a`, `an`, `the`,`all`, can be detected via islower() function.
-  numerator and denominator mistakes on the tweet with tweet ID = `810984652412424192` and `740373189193256964` and `722974582966214656` and `716439118184652801` and `682962037429899265` and `666287406224695296`: something like 24/7, 9/11, 50/50
-  issues on `p1_dog == False` while it is actually True.
-  dog stage issue.


##### fix quality issue

> - Remove redundant columns.
> - eliminate the tweets without a image or not original tweet.
> - change `timestamp` datatype and format.
> - change `favorite_count` and `retweet_count` datatype to int.
> - remove rows with name lowercase.
> - remove detected numerator and denominator mistakes.
> - 

**1. Remove redundant columns.**

In [30]:
df = df.drop(['in_reply_to_status_id','in_reply_to_user_id','retweeted_status_id','retweeted_status_user_id','retweeted_status_timestamp'], axis = 1)

**2. eliminate the tweets without a image or not original tweet.**

In [35]:
#eliminate the tweets without a image or not original tweet.
df = df[df.retweet_count.notna()]
df = df[df.img_num.notna()]
df.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 2059 entries, 0 to 2355
Data columns (total 21 columns):
tweet_id              2059 non-null int64
timestamp             2059 non-null object
source                2059 non-null object
text                  2059 non-null object
expanded_urls         2059 non-null object
rating_numerator      2059 non-null int64
rating_denominator    2059 non-null int64
name                  2059 non-null object
jpg_url               2059 non-null object
img_num               2059 non-null float64
p1                    2059 non-null object
p1_conf               2059 non-null float64
p1_dog                2059 non-null object
p2                    2059 non-null object
p2_conf               2059 non-null float64
p2_dog                2059 non-null object
p3                    2059 non-null object
p3_conf               2059 non-null float64
p3_dog                2059 non-null object
favorite_count        2059 non-null float64
retweet_count         2059 non-

**3. change `timestamp` datatype and format.**

In [51]:
# df.timestamp.head()
# df.timestamp = df.timestamp.apply(lambda x: x.split('+')[0].strip())
# df.timestamp = pd.to_datetime(df.timestamp)
# df.info()

**4. change `favorite_count` and `retweet_count` datatype to int.**

In [50]:
df.favorite_count.astype(int, inplace = True)
df.retweet_count.astype(int, inplace = True)
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2059 entries, 0 to 2355
Data columns (total 21 columns):
tweet_id              2059 non-null int64
timestamp             2059 non-null datetime64[ns]
source                2059 non-null object
text                  2059 non-null object
expanded_urls         2059 non-null object
rating_numerator      2059 non-null int64
rating_denominator    2059 non-null int64
name                  2059 non-null object
jpg_url               2059 non-null object
img_num               2059 non-null float64
p1                    2059 non-null object
p1_conf               2059 non-null float64
p1_dog                2059 non-null object
p2                    2059 non-null object
p2_conf               2059 non-null float64
p2_dog                2059 non-null object
p3                    2059 non-null object
p3_conf               2059 non-null float64
p3_dog                2059 non-null object
favorite_count        2059 non-null float64
retweet_count         2

**5. remove rows with name lowercase.**

In [68]:
df.drop(df[df.name.str.islower()].index,axis = 0, inplace = True)
df[df.name.str.islower()]

Unnamed: 0,tweet_id,timestamp,source,text,expanded_urls,rating_numerator,rating_denominator,name,jpg_url,img_num,...,p1_conf,p1_dog,p2,p2_conf,p2_dog,p3,p3_conf,p3_dog,favorite_count,retweet_count


**6. detected numerator and denominator mistakes**

> mistakes on tweet ID = `810984652412424192` and `740373189193256964` and `722974582966214656` and `716439118184652801` and `682962037429899265` and `666287406224695296`

In [79]:
#drop the one with no rating.
with pd.option_context('max_colwidth', 400):
    display(df.query('tweet_id == 810984652412424192')
           [['text','rating_numerator','rating_denominator']])
df.drop(516,axis = 0,inplace = True)

Unnamed: 0,text,rating_numerator,rating_denominator
516,Meet Sam. She smiles 24/7 &amp; secretly aspires to be a reindeer. \nKeep Sam smiling by clicking and sharing this link:\nhttps://t.co/98tB8y7y7t https://t.co/LouL5vdvxx,24,7


In [86]:
#change the numerator and denominator mannully.
with pd.option_context('max_colwidth', 400):
    display(df.query('tweet_id == 740373189193256964')
           [['text','rating_numerator','rating_denominator']])

df.at[1068, 'rating_numerator'] = 14
df.at[1068, 'rating_denominator'] = 10

#check
df.query('tweet_id == 740373189193256964')[['rating_numerator','rating_denominator']]


Unnamed: 0,text,rating_numerator,rating_denominator
1068,"After so many requests, this is Bretagne. She was the last surviving 9/11 search dog, and our second ever 14/10. RIP https://t.co/XAVDNDaVgQ",9.0,11.0


Unnamed: 0,rating_numerator,rating_denominator
1068,14.0,10.0


In [88]:
#change the numerator and denominator manully.
with pd.option_context('max_colwidth', 400):
    display(df.query('tweet_id == 722974582966214656')
           [['text','rating_numerator','rating_denominator']])

df.at[1165, 'rating_numerator'] = 13
df.at[1165, 'rating_denominator'] = 10

#check
df.query('tweet_id == 722974582966214656')[['rating_numerator','rating_denominator']]


Unnamed: 0,text,rating_numerator,rating_denominator
1165,Happy 4/20 from the squad! 13/10 for all https://t.co/eV1diwds8a,4.0,20.0


Unnamed: 0,rating_numerator,rating_denominator
1165,13.0,10.0


In [91]:
#change the numerator and denominator manully.
with pd.option_context('max_colwidth', 400):
    display(df.query('tweet_id == 716439118184652801')
           [['text','rating_numerator','rating_denominator']])

df.at[1202, 'rating_numerator'] = 11
df.at[1202, 'rating_denominator'] = 10

#check
df.query('tweet_id == 716439118184652801')[['rating_numerator','rating_denominator']]


Unnamed: 0,text,rating_numerator,rating_denominator
1202,This is Bluebert. He just saw that both #FinalFur match ups are split 50/50. Amazed af. 11/10 https://t.co/Kky1DPG4iq,50.0,50.0


Unnamed: 0,rating_numerator,rating_denominator
1202,11.0,10.0


In [93]:
#change the numerator and denominator manully.
with pd.option_context('max_colwidth', 400):
    display(df.query('tweet_id == 682962037429899265')
           [['text','rating_numerator','rating_denominator']])

df.at[1662, 'rating_numerator'] = 10
df.at[1662, 'rating_denominator'] = 10

#check
df.query('tweet_id == 682962037429899265')[['rating_numerator','rating_denominator']]


Unnamed: 0,text,rating_numerator,rating_denominator
1662,This is Darrel. He just robbed a 7/11 and is in a high speed police chase. Was just spotted by the helicopter 10/10 https://t.co/7EsP8LmSp5,7.0,11.0


Unnamed: 0,rating_numerator,rating_denominator
1662,10.0,10.0


In [96]:
#This one has been dropped already
with pd.option_context('max_colwidth', 400):
    display(df.query('tweet_id == 666287406224695296')
           [['text','rating_numerator','rating_denominator']])

Unnamed: 0,text,rating_numerator,rating_denominator


### Reference:
> 1. https://towardsdatascience.com/tweepy-for-beginners-24baf21f2c25