# Tweezers
A lightweight Python library for really simple data scraping from the Twitter API.

In [1]:
import os
import sys

# Add the parent directory to the import path so the local `tweezers` package resolves.
sys.path.append(os.path.dirname(os.getcwd()))

from tweezers import Tweezers

### Create an instance of Tweezers with API credentials

In [2]:
# Loading Twitter auth credentials from a local JSON file. Get yours here:
# https://developer.twitter.com/en/apps/
import json

fp = os.path.join(os.getcwd(), "credentials.json")
with open(fp) as f:
    credentials = json.load(f)

t = Tweezers(
    api_key=credentials["api_key"],
    api_secret_key=credentials["api_secret_key"],
    access_token=credentials["access_token"],
    access_token_secret=credentials["access_token_secret"],
)
print(t)

Tweezers instance with status code 200
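
The snippet above expects `credentials.json` to contain the four keys passed to `Tweezers`. Here is a minimal sketch of a loader that fails early when one of them is missing. The helper name `load_credentials` and its error message are illustrative assumptions and are not part of the library:

```python
import json

# The four keys read from credentials.json in the example above.
REQUIRED_KEYS = ("api_key", "api_secret_key", "access_token", "access_token_secret")


def load_credentials(path):
    """Hypothetical helper: load the JSON file and check that every key is present."""
    with open(path) as f:
        credentials = json.load(f)
    # Collect any expected keys that the file does not define.
    missing = [key for key in REQUIRED_KEYS if key not in credentials]
    if missing:
        raise KeyError(f"credentials file is missing: {', '.join(missing)}")
    return credentials
```

Checking the keys up front gives a clearer error than a bare `KeyError` raised halfway through the `Tweezers(...)` call.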


### Perform a search
Searching returns a `TweezerSearch` instance, which holds the data returned by the Twitter API:

In [3]:
s = t.search(search_term="bitcoin", total=1000, result_type="recent")

1,000 tweets requested; 1,000 tweets returned


In [4]:
# All the tweet results are returned in a Pandas DataFrame:
s.results_df.head()

| | user | tweet | stripped_tweet | urls | hashtags | ats | created_at | favorite_count | retweet_count | polarity | subjectivity | coordinates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | pinoy287 | From now on, I am letting God and Jesus Christ... | From now on, I am letting God and Jesus Christ... | [https://t.co/OTGtghSbgc] | [] | [] | 2020-03-04 03:29:44 | 0 | 0 | 0.3 | 0.5625 | (None, None) |
| 1 | Ladin21201021 | @bgarlinghouse And that is the reason why you ... | bgarlinghouse And that is the reason why you h... | [] | [] | [@bgarlinghouse] | 2020-03-04 03:29:39 | 0 | 0 | 0.0 | 0.0 | (None, None) |
| 2 | CoinFees | Coin fees for the past hour:\nBitcoin fees: $0... | Coin fees for the past hour:\nBitcoin fees: $0... | [https://t.co/qCbFAd2jMk] | [] | [] | 2020-03-04 03:29:35 | 0 | 0 | -0.125 | 0.125 | (None, None) |
| 3 | bitcoinagile | All eyes on #bitcoin for BINANCE:BTCUSDT by ja... | All eyes on bitcoin for BINANCE:BTCUSDT by jas... | [https://t.co/k3ucD1s2Fp, https://t.co/czPYTvt... | [bitcoin, BTCUSDT] | [] | 2020-03-04 03:29:32 | 0 | 0 | 0.0 | 0.0 | (None, None) |
| 4 | antpool | Math indeed. #Bitcoin @pierre_rochard https://... | Math indeed. Bitcoin pierre_rochard | [https://t.co/KFF5Pw4wOL] | [Bitcoin] | [@pierre] | 2020-03-04 03:29:28 | 0 | 0 | 0.0 | 0.0 | (None, None) |


Columns that contain lists (`urls`, `hashtags`, `ats`) can be tallied with the `count_list_col_values` method:

In [5]:
s.count_list_col_values("ats").head()

@Bitcoin          86
@PeterSchiff      81
@ahollander314    12
@bgarlinghouse    10
@girls             7
dtype: int64

The raw JSON returned by the API is also stored as an attribute:

In [6]:
print(s.results_json[0]["text"])

From now on, I am letting God and Jesus Christ plan for my future without me forcing anything. It is for my own goo… https://t.co/OTGtghSbgc


Get an estimate of the average time between tweets for the search term:

In [7]:
print(s.time_per_tweet)

0 days 00:00:04.806000


Get an estimate of the number of tweets per week about the search term:

In [8]:
print(f"{s.tweets_per_week:,}")

151,200
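
The two estimates are related by simple arithmetic: a week has 604,800 seconds, so dividing that by the seconds per tweet gives tweets per week. The sketch below illustrates the calculation and is not the library's code. The function name and the `truncate_seconds` flag are assumptions. Note that the 151,200 printed above equals 604,800 / 4, which suggests (but does not confirm) that the interval is truncated to whole seconds before dividing.

```python
from datetime import timedelta

SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800


def tweets_per_week(time_per_tweet, truncate_seconds=False):
    """Hypothetical helper: convert an average gap between tweets into a weekly count."""
    seconds = time_per_tweet.total_seconds()
    if truncate_seconds:
        # Assumption: drop the fractional part of the interval, e.g. 4.806 s -> 4 s.
        seconds = int(seconds)
    if seconds <= 0:
        raise ValueError("interval must be at least one second when truncating")
    return int(SECONDS_PER_WEEK // seconds)


gap = timedelta(seconds=4, milliseconds=806)  # the 0 days 00:00:04.806000 shown above
print(f"{tweets_per_week(gap, truncate_seconds=True):,}")  # 151,200
print(f"{tweets_per_week(gap):,}")
```

Either way, the figure is an extrapolation from one batch of recent tweets, so treat it as a rough estimate.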


To facilitate natural language processing, `results_df` includes a `stripped_tweet` column: the tweet text with URLs, hashtag symbols, and @ symbols removed. Compare the original tweet with the stripped version:

In [9]:
print(s.results_df["tweet"][0])

From now on, I am letting God and Jesus Christ plan for my future without me forcing anything. It is for my own goo… https://t.co/OTGtghSbgc


In [10]:
print(s.results_df["stripped_tweet"][0])

From now on, I am letting God and Jesus Christ plan for my future without me forcing anything. It is for my own goo…
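
Here is a minimal sketch of this kind of cleaning, assuming a simple regular expression for URLs. The function `strip_tweet` is hypothetical, and the library's actual rules may differ:

```python
import re

# Assumption: a URL is "http://" or "https://" followed by non-whitespace characters.
URL_PATTERN = re.compile(r"https?://\S+")


def strip_tweet(text):
    """Hypothetical helper: remove URLs and the # and @ symbols, keeping the words."""
    without_urls = URL_PATTERN.sub("", text)
    without_symbols = without_urls.replace("#", "").replace("@", "")
    # Trim whitespace left behind at the ends by removed URLs.
    return without_symbols.strip()


print(strip_tweet("Math indeed. #Bitcoin @pierre_rochard https://t.co/KFF5Pw4wOL"))
# Math indeed. Bitcoin pierre_rochard
```

Keeping the hashtag and mention words while dropping only the symbols preserves them as ordinary tokens for downstream text analysis.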


`results_df` also includes `polarity` and `subjectivity` columns, a simple sentiment analysis computed with <a href="https://github.com/sloria/TextBlob">TextBlob</a>:

In [11]:
s.results_df["polarity"].head()

0    0.300
1    0.000
2   -0.125
3    0.000
4    0.000
Name: polarity, dtype: float64

In [12]:
s.results_df["subjectivity"].head()

0    0.5625
1    0.0000
2    0.1250
3    0.0000
4    0.0000
Name: subjectivity, dtype: float64
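
TextBlob reports polarity in the range [-1.0, 1.0] (negative to positive) and subjectivity in the range [0.0, 1.0] (objective to subjective). One way to read the scores is to bucket polarity into labels. The sketch below is an illustration only: the function name and the 0.05 threshold are arbitrary assumptions, not part of Tweezers or TextBlob.

```python
def label_polarity(polarity, threshold=0.05):
    """Hypothetical helper: map a polarity score to a coarse sentiment label."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# The five polarity values shown in the output above.
sample_polarities = [0.3, 0.0, -0.125, 0.0, 0.0]
labels = [label_polarity(p) for p in sample_polarities]
print(labels)
```

With a real search, the same function could be applied to the whole column, for example `s.results_df["polarity"].apply(label_polarity)`.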

### Multiple searches
When you run several searches on the same `Tweezers` instance, each one is recorded in the `search_history` attribute:

In [13]:
new_s = t.search("elizabeth warren", total=1000, result_type="recent")

1,000 tweets requested; 1,000 tweets returned


In [14]:
t.search_history

[TweezerSearch: `bitcoin` (1000 total),
 TweezerSearch: `elizabeth warren` (1000 total)]

In [15]:
t.search_history[-1].results_df.head()

| | user | tweet | stripped_tweet | urls | hashtags | ats | created_at | favorite_count | retweet_count | polarity | subjectivity | coordinates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Toby_Says | Bernie bros please CALM THE F\*CK down. Elizabe... | Bernie bros please CALM THE F\*CK down. Elizabe... | [https://t.co/8sR1P5bD4r] | [] | [] | 2020-03-04 03:30:22 | 0 | 0 | 0.23254 | 0.615873 | (None, None) |
| 1 | BroBrahSama | Elizabeth Warren kneecapped the progressive mo... | Elizabeth Warren kneecapped the progressive mo... | [https://t.co/UCKYAdoIjA] | [] | [] | 2020-03-04 03:30:21 | 0 | 0 | 0.466667 | 0.466667 | (None, None) |
| 2 | Blade_kxy | You guys are all voting freaking bernie sander... | You guys are all voting freaking bernie sander... | [https://t.co/Gr6Nidg8HS] | [] | [] | 2020-03-04 03:30:21 | 0 | 0 | 0.0 | 0.0 | (None, None) |
| 3 | landishelwig | S/O to Elizabeth Warren for smothering progres... | S/O to Elizabeth Warren for smothering progres... | [https://t.co/B5lDeWXDlb] | [] | [] | 2020-03-04 03:30:18 | 0 | 0 | 0.05 | 0.2625 | (None, None) |
| 4 | GreenRevanchist | Elizabeth Warren destroyed the planet. | Elizabeth Warren destroyed the planet. | [] | [] | [] | 2020-03-04 03:30:16 | 0 | 0 | 0.0 | 0.0 | (None, None) |
