# Tweezers
A lightweight Python library for really simple data scraping from the Twitter API.

In [1]:
import os
import sys

# Make the parent directory importable so the local `tweezers` package is found.
sys.path.append(os.path.dirname(os.getcwd()))

from tweezers import Tweezers

### Create an instance of Tweezers with API credentials

In [2]:
# Loading Twitter auth credentials from a local JSON file. Get yours here:
# https://developer.twitter.com/en/apps/
import json

fp = os.path.join(os.getcwd(), "credentials.json")
with open(fp) as f:
    credentials = json.load(f)

t = Tweezers(
    api_key=credentials["api_key"],
    api_secret_key=credentials["api_secret_key"],
    access_token=credentials["access_token"],
    access_token_secret=credentials["access_token_secret"]
)
print(t)

Tweezers instance with status code 200
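
If a key is missing from `credentials.json`, the cell above fails with a bare `KeyError` partway through the `Tweezers(...)` call. A minimal sketch of a loader that checks the file up front is shown below. The helper name `load_credentials` is hypothetical and not part of Tweezers; the key names are the four used in the call above.

```python
import json

# Key names taken from the Tweezers(...) call above.
REQUIRED_KEYS = ("api_key", "api_secret_key", "access_token", "access_token_secret")


def load_credentials(path):
    """Load credentials from a JSON file and fail early if a key is missing.

    Hypothetical helper, not part of Tweezers.
    """
    with open(path) as f:
        credentials = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in credentials]
    if missing:
        raise KeyError(f"credentials file is missing: {', '.join(missing)}")
    return credentials
```

The returned dict can then be used exactly like `credentials` in the cell above.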


### Perform a search
`search` returns a `TweezerSearch` instance, which exposes the data returned by the Twitter API as attributes:

In [3]:
s = t.search(search_term="bitcoin", total=1000, result_type="recent")

1,000 tweets requested; 1,000 tweets returned


In [4]:
# All tweet results are collected in a pandas DataFrame:
s.results_df.head()

| | user | tweet | stripped_tweet | urls | hashtags | ats | created_at | favorite_count | retweet_count | polarity | subjectivity | coordinates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | bitcoinermike | Best thing to do now is to buy #bitcoin while ... | Best thing to do now is to buy bitcoin while e... | [] | [bitcoin] | [] | 2021-05-04 02:30:37 | 0 | 0 | 1.0 | 0.3 | (None, None) |
| 1 | AmandaCryptoGal | Imagine selling all of your Bitcoin for Doge 😂 | Imagine selling all of your Bitcoin for Doge 😂 | [] | [] | [] | 2021-05-04 02:30:35 | 0 | 0 | 0.0 | 0.0 | (None, None) |
| 2 | WisdomQuotesBot | Never let lack of money interfere with having ... | Never let lack of money interfere with having ... | [] | [quotes, inspiration, wisdom, bitcoin] | [] | 2021-05-04 02:30:34 | 0 | 0 | 0.3 | 0.2 | (None, None) |
| 3 | pjhonson276 | 🤝 Follow me on @betfury. Let's hunt for Bitcoi... | 🤝 Follow me on betfury. Let's hunt for Bitcoin... | [https://t.co/loWJMN0y37] | [1] | [@betfury] | 2021-05-04 02:30:33 | 0 | 0 | 0.3 | 0.8 | (None, None) |
| 4 | delltronic2 | @rocket_fuel_ @minordissent @sthenc @Wealth_Th... | rocket_fuel_ minordissent sthenc Wealth_Theory... | [https://t.co/4DJFheVdAc] | [] | [@rocket, @minordissent, @sthenc, @Wealth, @Ma... | 2021-05-04 02:30:32 | 0 | 0 | 0.0 | 0.0 | (None, None) |


The values in the list columns (`urls`, `hashtags`, `ats`) can be counted with the `count_list_col_values` method:

In [5]:
s.count_list_col_values("ats").head()

@APompliano       27
@AdvBMkhwebane    18
@Wealth           15
@betfury          15
@OTC              14
dtype: int64
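
For readers curious how a column of lists can be tallied, here is a minimal stdlib-only sketch. It is not the library's implementation, and `count_list_values` is a hypothetical name. With pandas, `Series.explode()` followed by `value_counts()` gives a comparable result.

```python
from collections import Counter
from itertools import chain


def count_list_values(rows):
    """Count how often each value appears across a column of lists."""
    # Flatten the lists into one stream of values, then tally them.
    return Counter(chain.from_iterable(rows)).most_common()


# Toy stand-in for the `ats` column (made-up values, not real results).
ats_column = [["@alice", "@bob"], [], ["@alice"], ["@carol", "@alice"]]
print(count_list_values(ats_column))
# → [('@alice', 3), ('@bob', 1), ('@carol', 1)]
```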

The raw JSON returned by the API is also stored, in the `results_json` attribute:

In [6]:
print(s.results_json[0]["text"])

Best thing to do now is to buy #bitcoin while everyone is distracted by eth, doge and the lot


Get an estimate of the average tweet frequency for the search term:

In [7]:
print(s.time_per_tweet)

0 days 00:00:01.799000


Get an estimate of the number of tweets per week about the search term:

In [8]:
print(f"{s.tweets_per_week:,}")

604,800
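
One plausible way to derive both estimates from the `created_at` timestamps is sketched below. The library's exact formula is not shown in this README, so this sketch may not reproduce the figures above; `estimate_rate` is a hypothetical name.

```python
from datetime import datetime, timedelta


def estimate_rate(timestamps):
    """Return (average gap between tweets, projected tweets per week)."""
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    # n tweets span n - 1 gaps between the oldest and the newest.
    span = max(timestamps) - min(timestamps)
    time_per_tweet = span / (len(timestamps) - 1)
    if time_per_tweet == timedelta(0):
        raise ValueError("all timestamps are identical")
    # Dividing one timedelta by another yields a plain float.
    tweets_per_week = timedelta(weeks=1) / time_per_tweet
    return time_per_tweet, tweets_per_week


# created_at values of the five bitcoin results shown earlier.
stamps = [datetime(2021, 5, 4, 2, 30, s) for s in (37, 35, 34, 33, 32)]
gap, per_week = estimate_rate(stamps)
print(gap, round(per_week))
```

Five rows are far too few for a meaningful estimate; the sample only illustrates the arithmetic.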


To facilitate natural language processing, `results_df` also includes a `stripped_tweet` column: the tweet text with URLs, hashtag symbols, and @ symbols removed. Compare the two columns:

In [9]:
print(s.results_df["tweet"][0])

Best thing to do now is to buy #bitcoin while everyone is distracted by eth, doge and the lot


In [10]:
print(s.results_df["stripped_tweet"][0])

Best thing to do now is to buy bitcoin while everyone is distracted by eth, doge and the lot
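
The cleaning step described above can be approximated with a short regular-expression pass. This is a sketch of the general technique, not the code Tweezers uses, and `strip_tweet` is a hypothetical name.

```python
import re

# Matches http:// and https:// links up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def strip_tweet(text):
    """Remove URLs and the '#' and '@' symbols, keeping the words themselves."""
    text = URL_PATTERN.sub("", text)
    text = text.replace("#", "").replace("@", "")
    return " ".join(text.split())  # collapse leftover whitespace


print(strip_tweet("Buy #bitcoin now @betfury https://t.co/abc"))
# → Buy bitcoin now betfury
```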


`results_df` also includes `polarity` and `subjectivity` columns, computed with a simple sentiment analysis using [TextBlob](https://github.com/sloria/TextBlob):

In [11]:
s.results_df["polarity"].head()

0    1.0
1    0.0
2    0.3
3    0.3
4    0.0
Name: polarity, dtype: float64

In [12]:
s.results_df["subjectivity"].head()

0    0.3
1    0.0
2    0.2
3    0.8
4    0.0
Name: subjectivity, dtype: float64

### Multiple searches
When multiple searches are run on the same `Tweezers` instance, each one is recorded in the `search_history` attribute:

In [13]:
new_s = t.search("elizabeth warren", total=1000, result_type="recent")

1,000 tweets requested; 1,000 tweets returned


In [14]:
t.search_history

[TweezerSearch: `bitcoin` (1000 total),
 TweezerSearch: `elizabeth warren` (1000 total)]

In [15]:
t.search_history[-1].results_df.head()

| | user | tweet | stripped_tweet | urls | hashtags | ats | created_at | favorite_count | retweet_count | polarity | subjectivity | coordinates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SheRa_Persists | Don't you get it? When Elizabeth Warren is at ... | Don't you get it? When Elizabeth Warren is at ... | [] | [] | [] | 2021-05-04 02:27:21 | 0 | 0 | -0.15 | 0.75 | (None, None) |
| 1 | negroalfuturo | Elizabeth Warren is not the reason Bernie didn... | Elizabeth Warren is not the reason Bernie didn... | [https://t.co/LfBnGfYybk] | [] | [] | 2021-05-04 02:24:16 | 0 | 0 | 0.183333 | 0.8 | (None, None) |
| 2 | kfairwrites | @ewarren Exactly how I'd react if Elizabeth Wa... | ewarren Exactly how I'd react if Elizabeth War... | [] | [warrendemocrat] | [@ewarren] | 2021-05-04 02:18:16 | 0 | 0 | 0.25 | 0.25 | (None, None) |
| 3 | mongo_ebooks | This is a set on the same soundstage as the El... | This is a set on the same soundstage as the El... | [https://t.co/6o5d25YI20] | [] | [] | 2021-05-04 02:17:11 | 1 | 0 | 0.0 | 0.125 | (None, None) |
| 4 | Imstillalive54 | Elizabeth Warren: Democratic party was relucta... | Elizabeth Warren: Democratic party was relucta... | [https://t.co/qmXSzi6a3y] | [] | [] | 2021-05-04 02:14:37 | 0 | 0 | 0.0 | 0.0 | (None, None) |
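
The history behaviour shown above follows a common pattern: the client appends every result to a list before returning it, so index `-1` is always the latest search. The sketch below illustrates that pattern with stand-in classes. `SearchRecord` and `Client` are hypothetical names and say nothing about how Tweezers is actually written.

```python
from dataclasses import dataclass, field


@dataclass
class SearchRecord:
    """Hypothetical stand-in for a search result object."""
    search_term: str
    total: int


@dataclass
class Client:
    """Hypothetical stand-in for a client that remembers its searches."""
    search_history: list = field(default_factory=list)

    def search(self, search_term, total):
        record = SearchRecord(search_term, total)
        self.search_history.append(record)  # oldest first, newest last
        return record


client = Client()
client.search("bitcoin", total=1000)
latest = client.search("elizabeth warren", total=1000)
print([record.search_term for record in client.search_history])
```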
