# Tweezers
A lightweight Python library for really simple data scraping from the Twitter API.

In [1]:
import os, sys
sys.path.append(os.path.dirname(os.getcwd()))

from tweezers import Tweezers

### Create an instance of Tweezers with API credentials

In [2]:
# Loading Twitter auth credentials from a local JSON file. Get yours here:
# https://developer.twitter.com/en/apps/
import json

fp = os.path.join(os.getcwd(), "credentials.json")
with open(fp) as f:
    credentials = json.load(f)

t = Tweezers(
    api_key=credentials["api_key"],
    api_secret_key=credentials["api_secret_key"],
    access_token=credentials["access_token"],
    access_token_secret=credentials["access_token_secret"],
)
print(t)

Tweezers instance with status code 200
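
The cell above expects `credentials.json` to contain four keys. As a minimal sketch, a loader could check for them before the values are passed to `Tweezers`. The helper name `load_credentials` and the `REQUIRED_KEYS` constant are hypothetical and not part of the library:

```python
import json
from pathlib import Path

# The four keys that the example above reads from credentials.json.
REQUIRED_KEYS = ("api_key", "api_secret_key", "access_token", "access_token_secret")


def load_credentials(path):
    """Load credentials from a JSON file and check the expected keys exist.

    Hypothetical helper for illustration; not part of Tweezers.
    """
    with Path(path).open() as f:
        credentials = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in credentials]
    if missing:
        raise KeyError(f"credentials file is missing keys: {', '.join(missing)}")
    return credentials
```

Failing early with the names of the missing keys is usually easier to debug than an authentication error from the API.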


### Perform a search
Searching returns an instance of the `TweezerSearch` class, whose attributes hold the data returned by the Twitter API:

In [3]:
s = t.search(search_term="bitcoin", total=1000, result_type="recent")

1,000 tweets requested; 1,000 tweets returned


In [4]:
# All the tweet results are returned in a Pandas DataFrame:
s.results_df.head()

| | user | tweet | stripped_tweet | urls | hashtags | ats | created_at | favorite_count | retweet_count | polarity | subjectivity | coordinates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Misbahu47726195 | Dowload &amp; Install Brave Browser. Get BAT R... | Dowload &amp; Install Brave Browser. Get BAT R... | [https://t.co/2c7ZctrkdF, https://t.co/q2q4jbW...] | [browser, brave, BATtoken] | [] | 2020-03-04 04:26:49 | 0 | 0 | 0.9 | 1.0 | (None, None) |
| 1 | miiyuwa | #Bitcoin is inevitable. https://t.co/4zQow82PIH | Bitcoin is inevitable. | [https://t.co/4zQow82PIH] | [Bitcoin] | [] | 2020-03-04 04:26:49 | 0 | 0 | 0.0 | 1.0 | (None, None) |
| 2 | bitcoinvaluebot | Current Bitcoin Price\nAll Forks = $9392.02 📈 ... | Current Bitcoin Price\nAll Forks = $9392.02 📈 ... | [] | [] | [] | 2020-03-04 04:26:32 | 0 | 0 | 0.0 | 0.4 | (None, None) |
| 3 | danlyke | This is kind of a point: We worry about the en... | This is kind of a point: We worry about the en... | [https://t.co/LxBytKpBXg] | [] | [] | 2020-03-04 04:26:29 | 0 | 0 | 0.4 | 0.55 | (None, None) |
| 4 | botbaitclick | 10 ways to maximise your wealth. #bitcoin #cli... | 10 ways to maximise your wealth. bitcoin click... | [] | [bitcoin, clickbait] | [] | 2020-03-04 04:26:18 | 0 | 0 | 0.0 | 0.0 | (None, None) |


The values in the list-valued columns (`urls`, `hashtags`, `ats`) can be counted using the `count_list_col_values` method:

In [5]:
s.count_list_col_values("ats").head()

@Bitcoin          74
@PeterSchiff      64
@binance          15
@bgarlinghouse     9
@ethereum          6
dtype: int64
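
For readers curious how counting over a list-valued column can work, here is a minimal standard-library sketch. It is not the library's implementation; the function name `count_list_values` and the sample rows are made up for illustration:

```python
from collections import Counter


def count_list_values(rows):
    """Count how often each value appears across a sequence of lists."""
    counts = Counter()
    for values in rows:
        counts.update(values)
    # most_common() returns (value, count) pairs sorted by descending count.
    return counts.most_common()


# Illustrative sample only, not real search results.
sample_ats = [["@Bitcoin", "@binance"], ["@Bitcoin"], [], ["@PeterSchiff", "@Bitcoin"]]
mention_counts = count_list_values(sample_ats)
print(mention_counts[0])  # ('@Bitcoin', 3)
```

In pandas, `df["ats"].explode().value_counts()` produces a comparable tally directly from a DataFrame column.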

The full JSON response is also stored, in the `results_json` attribute:

In [6]:
print(s.results_json[0]["text"])

Dowload &amp; Install Brave Browser. Get BAT Reward! Download here https://t.co/2c7ZctrkdF #browser #brave #BATtoken… https://t.co/q2q4jbW2EC


Get an estimate of the average tweet frequency for the search term:

In [7]:
print(s.time_per_tweet)

0 days 00:00:05.418000


Get an estimate of the number of tweets per week about the search term:

In [8]:
print(f"{s.tweets_per_week:,}")

120,960
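
The two figures above can be related with a little arithmetic. The sketch below is an inference from the printed output, not a description of how Tweezers computes `tweets_per_week`: dividing the seconds in a week by the whole-second part of the interval reproduces the printed number, while using the full 5.418-second interval gives a somewhat lower estimate.

```python
from datetime import timedelta

# The value printed by `s.time_per_tweet` above.
time_per_tweet = timedelta(seconds=5, milliseconds=418)
seconds_per_week = int(timedelta(weeks=1).total_seconds())  # 604800

# Using only the whole-second part (5 s) matches the figure printed above.
per_week_whole_seconds = seconds_per_week // time_per_tweet.seconds

# Using the full interval (5.418 s) gives a lower estimate.
per_week_exact = seconds_per_week / time_per_tweet.total_seconds()

print(f"{per_week_whole_seconds:,}")  # 120,960
print(f"{per_week_exact:,.0f}")
```

Either way, the number is an extrapolation from a single sample of recent tweets and should be read as a rough estimate.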


To facilitate natural language processing, `results_df` contains a `stripped_tweet` column alongside the original `tweet` column. It holds the tweet text with URLs, hashtag symbols, and @ symbols removed:

In [9]:
print(s.results_df["tweet"][0])

Dowload &amp; Install Brave Browser. Get BAT Reward! Download here https://t.co/2c7ZctrkdF #browser #brave #BATtoken… https://t.co/q2q4jbW2EC


In [10]:
print(s.results_df["stripped_tweet"][0])

Dowload &amp; Install Brave Browser. Get BAT Reward! Download here browser brave BATtoken…
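
A comparable clean-up can be sketched with the standard library. This is an approximation under stated assumptions, not the library's code: the name `strip_tweet` and the URL pattern are illustrative, and collapsing whitespace is a choice made here that the library may not share.

```python
import re

# Assumed pattern: anything starting with http:// or https:// up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def strip_tweet(text):
    """Remove URLs and '#'/'@' symbols, then collapse leftover whitespace."""
    text = URL_PATTERN.sub("", text)
    text = text.replace("#", "").replace("@", "")
    return " ".join(text.split())


tweet = "#Bitcoin is inevitable. https://t.co/4zQow82PIH"
print(strip_tweet(tweet))  # Bitcoin is inevitable.
```

Only the symbols are removed, so the hashtag and mention words themselves stay in the text. HTML entities such as `&amp;` are also left untouched, as in the output above.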


`results_df` also includes `polarity` and `subjectivity` columns, a simple implementation of sentiment analysis using [TextBlob](https://github.com/sloria/TextBlob):

In [11]:
s.results_df["polarity"].head()

0    0.9
1    0.0
2    0.0
3    0.4
4    0.0
Name: polarity, dtype: float64

In [12]:
s.results_df["subjectivity"].head()

0    1.00
1    1.00
2    0.40
3    0.55
4    0.00
Name: subjectivity, dtype: float64
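
TextBlob reports polarity on a scale from -1.0 (negative) to 1.0 (positive) and subjectivity from 0.0 (objective) to 1.0 (subjective). One possible way to use the scores is to bucket polarity into coarse labels. In the sketch below, the `label_polarity` helper and its `threshold` of 0.1 are arbitrary, hypothetical choices rather than anything Tweezers provides:

```python
def label_polarity(score, threshold=0.1):
    """Map a polarity score in [-1, 1] to a coarse sentiment label."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


# The five polarity values shown above.
polarities = [0.9, 0.0, 0.0, 0.4, 0.0]
labels = [label_polarity(p) for p in polarities]
print(labels)  # ['positive', 'neutral', 'neutral', 'positive', 'neutral']
```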

### Multiple searches
When several searches are performed on the same instance of `Tweezers`, a history of them is kept in the `search_history` attribute:

In [13]:
new_s = t.search("elizabeth warren", total=1000, result_type="recent")

1,000 tweets requested; 1,000 tweets returned


In [14]:
t.search_history

[TweezerSearch: `bitcoin` (1000 total),
 TweezerSearch: `elizabeth warren` (1000 total)]
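
The history behaviour can be illustrated with a small stand-in. `SketchClient` and `SketchSearch` are hypothetical classes written for this example. They mimic the pattern shown above (each search is appended to a list on the client) and say nothing about how Tweezers is implemented internally:

```python
class SketchSearch:
    """Hypothetical stand-in for a search result object."""

    def __init__(self, search_term, total):
        self.search_term = search_term
        self.total = total

    def __repr__(self):
        return f"SketchSearch: `{self.search_term}` ({self.total} total)"


class SketchClient:
    """Hypothetical stand-in for a client that remembers its searches."""

    def __init__(self):
        self.search_history = []

    def search(self, search_term, total):
        # A real client would call the API here; this sketch only records the request.
        result = SketchSearch(search_term, total)
        self.search_history.append(result)
        return result


client = SketchClient()
client.search("bitcoin", total=1000)
client.search("elizabeth warren", total=1000)
print(client.search_history[-1])  # SketchSearch: `elizabeth warren` (1000 total)
```

Because the list keeps insertion order, the most recent search is always at index `-1`.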

In [15]:
t.search_history[-1].results_df.head()

| | user | tweet | stripped_tweet | urls | hashtags | ats | created_at | favorite_count | retweet_count | polarity | subjectivity | coordinates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | RLubbin | @realDonaldTrump @BenSotoKarass The other Demo... | realDonaldTrump BenSotoKarass The other Democr... | [https://t.co/tvNmSQQgsu] | [] | [@realDonaldTrump, @BenSotoKarass] | 2020-03-04 04:27:13 | 0 | 0 | -0.125 | 0.375 | (None, None) |
| 1 | 5mintuesTurkish | Bernie would of won Maine, Massachusetts &amp;... | Bernie would of won Maine, Massachusetts &amp;... | [https://t.co/dl4Ufs4SBN] | [] | [] | 2020-03-04 04:27:12 | 0 | 0 | -0.2 | 0.8 | (None, None) |
| 2 | The_Mr_Innocent | The actual audacity of Elizabeth Warren to act... | The actual audacity of Elizabeth Warren to act... | [] | [] | [] | 2020-03-04 04:27:10 | 0 | 0 | 0.3 | 0.55 | (None, None) |
| 3 | nate_kersey | pouring the entire bottle out for my girl Eliz... | pouring the entire bottle out for my girl Eliz... | [https://t.co/bWGOrZXYRk] | [] | [] | 2020-03-04 04:27:10 | 0 | 0 | -0.097222 | 0.422222 | (None, None) |
| 4 | gafespec | Elizabeth Warren is refusing to drop out despi... | Elizabeth Warren is refusing to drop out despi... | [https://t.co/Sl1DmrXvYp] | [] | [] | 2020-03-04 04:27:07 | 0 | 0 | 0.0 | 0.0 | (None, None) |
