Francesco Poldi edited this page May 23, 2019 · 26 revisions

Twint Module

Install

  • pip3 install --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint

Main Functions

  • twint.run.Search() - Fetch Tweets using the search filters (Normal);
  • twint.run.Followers() - Fetch a Twitter user's followers;
  • twint.run.Following() - Fetch the accounts a Twitter user follows;
  • twint.run.Favorites() - Fetch Tweets a Twitter user has liked;
  • twint.run.Profile() - Fetch Tweets from a user's profile (Includes retweets);
  • twint.run.Lookup() - Fetch information from a user's profile (bio, location, etc.).

Configuring Options

Variable             Type       Description
--------------------------------------------
Username             (string) - Twitter user's username
User_id              (string) - Twitter user's user_id
Search               (string) - Search terms
Geo                  (string) - Geo coordinates (lat,lon,km/mi.)
Location             (bool)   - Set to True to attempt to grab a Twitter user's location (slow).
Near                 (string) - Near a certain City (Example: london)
Lang                 (string) - Compatible language codes: https://github.com/twintproject/twint/wiki/Langauge-codes
Output               (string) - Name of the output file.
Elasticsearch        (string) - Elasticsearch instance
Timedelta            (int)    - Time interval for every request (days)
Year                 (string) - Filter Tweets before the specified year.
Since                (string) - Filter Tweets sent since date (Example: 2017-12-27).
Until                (string) - Filter Tweets sent until date (Example: 2017-12-27).
Email                (bool)   - Set to True to show Tweets that _might_ contain emails.
Phone                (bool)   - Set to True to show Tweets that _might_ contain phone numbers.
Verified             (bool)   - Set to True to only show Tweets by _verified_ users
Store_csv            (bool)   - Set to True to write as a csv file.
Store_json           (bool)   - Set to True to write as a json file.
Custom               (dict)   - Custom csv/json formatting (see below).
Show_hashtags        (bool)   - Set to True to show hashtags in the terminal output.
Limit                (int)    - Number of Tweets to pull (Increments of 20).
Count                (bool)   - Count the total number of Tweets fetched.
Stats                (bool)   - Set to True to show Tweet stats in the terminal output.
Database             (string) - Store Tweets in a sqlite3 database. Set this to the DB. (Example: twitter.db)
To                   (string) - Display Tweets tweeted _to_ the specified user.
All                  (string) - Display all Tweets associated with the mentioned user.
Debug                (bool)   - Store information in debug logs.
Format               (string) - Custom terminal output formatting.
Essid                (string) - Elasticsearch session ID.
User_full            (bool)   - Set to True to display full user information. By default, only usernames are shown.
Profile_full         (bool)   - Set to True to use a slow, but effective method to enumerate a user's Timeline.
Store_object         (bool)   - Store tweets/user info/usernames in JSON objects.
Store_pandas         (bool)   - Save Tweets in a DataFrame (Pandas) file.
Pandas_type          (string) - Specify HDF5 or Pickle (HDF5 as default).
Pandas               (bool)   - Enable Pandas integration.
Index_tweets         (string) - Custom Elasticsearch Index name for Tweets (default: twinttweets).
Index_follow         (string) - Custom Elasticsearch Index name for Follows (default: twintgraph).
Index_users          (string) - Custom Elasticsearch Index name for Users (default: twintuser).
Index_type           (string) - Custom Elasticsearch Document type (default: items).
Retries_count        (int)    - Number of retries of requests (default: 10).
Resume               (int)    - Resume from a specific tweet id.
Images               (bool)   - Display only Tweets with images.
Videos               (bool)   - Display only Tweets with videos.
Media                (bool)   - Display only Tweets with images or videos.
Replies              (bool)   - Display replies to a subject.
Pandas_clean         (bool)   - Automatically clean Pandas dataframe at every scrape.
Lowercase            (bool)   - Automatically convert uppercase characters to lowercase.
Pandas_au            (bool)   - Automatically update the Pandas dataframe at every scrape.
Proxy_host           (string) - Proxy hostname or IP.
Proxy_port           (int)    - Proxy port.
Proxy_type           (string) - Proxy type.
Tor_control_port     (int)    - Tor control port.
Tor_control_password (string) - Tor control password (not hashed).
Retweets             (bool)   - Include the user's retweets.
Hide_output          (bool)   - Hide output.
Get_replies          (bool)   - All replies to the tweet.
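
The Since and Until examples in the table use the YYYY-MM-DD form. As a minimal sketch, assuming that form is what you want to pass, the helper below builds a matching pair of strings for the last N days using only the standard library. The name date_window is hypothetical and not part of Twint.

```python
from datetime import date, timedelta


def date_window(days, today=None):
    """Return (since, until) strings in YYYY-MM-DD form.

    Hypothetical helper for illustration; not part of Twint.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    end = today or date.today()
    start = end - timedelta(days=days)
    # date.isoformat() yields YYYY-MM-DD, matching the example "2017-12-27".
    return start.isoformat(), end.isoformat()


since, until = date_window(7, today=date(2017, 12, 27))
print(since, until)  # 2017-12-20 2017-12-27
```

The resulting strings could then be assigned to c.Since and c.Until before calling twint.run.Search(c).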

Example

# Import the module
import twint

# Set up TWINT config
c = twint.Config()
c.Username = "now"
c.Search = "Fruit"

# Start search
twint.run.Search(c)

Custom Formatting

You can have Twint output with your own custom format.

Custom formatting options

With Search, Profile, Favorites:

- {id}
- {date}
- {time}
- {user_id}
- {username}
- {timezone}
- {tweet}
- {hashtags}
- {location}
- {replies}
- {retweets}
- {likes}
- {link}
- {is_retweet}
- {user_rt}
- {mentions}

With Followers, Following:

- {id}
- {name}
- {username}
- {bio}
- {location}
- {url}
- {join_date}
- {join_time}
- {tweets}
- {following}
- {followers}
- {likes}
- {media}
- {private}
- {verified}
- {avatar}
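
The placeholders listed here use the same brace syntax as Python format strings. The sketch below is only an analogy for how such a template gets filled: render is a hypothetical helper and the sample record is made up, and Twint's own substitution logic may differ.

```python
def render(template, record):
    """Fill {placeholder} fields in template from a dict (hypothetical helper)."""
    # str.format_map raises KeyError if the template names a field
    # that is missing from the record.
    return template.format_map(record)


# Made-up sample data using field names from the lists above.
tweet = {"id": 1, "username": "example_user", "tweet": "hello"}
line = render("Tweet id: {id} | Username: {username}", tweet)
print(line)  # Tweet id: 1 | Username: example_user
```

A placeholder that does not belong to the chosen function (for example {bio} with Search) has no matching field in this sketch and raises KeyError.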

Examples

#!/usr/bin/python3
import twint

c = twint.Config()
# equivalent to `-s` bitcoin
c.Search = "bitcoin"
# Custom output format
c.Format = "Tweet id: {id} | Username: {username}"

twint.run.Search(c)

#!/usr/bin/python3
import twint

c = twint.Config()
c.Username = "twitter"
c.User_full = True
c.Format = "Username: {username} | Bio: {bio} | Url: {url}"

twint.run.Followers(c)

Custom CSV/JSON Formatting

First things first

config.Custom is a dict with tweet and user as keys; each value is a list of the attributes to store.

With Search, Profile, Favorites (config.Custom['tweet']):

- id
- conversation_id
- created_at
- date
- time
- timezone
- user_id
- username
- name
- place
- tweet
- mentions
- urls
- photos
- replies_count
- retweets_count
- likes_count
- location
- hashtags
- link
- retweet
- quote_url
- video

With Followers, Following (config.Custom['user']):

- id
- name
- username
- bio
- location
- url
- join_date
- join_time
- tweets
- following
- followers
- likes
- media
- private
- verified
- profile_image_url
- background_image
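
Restricting output to a list of attributes behaves much like writing CSV with a fixed set of field names. The following is a rough sketch of that idea using the standard csv module. It is not Twint's storage code, and write_selected and the sample rows are hypothetical.

```python
import csv
import io


def write_selected(rows, fields):
    """Write only the chosen fields of each row as CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not listed in fields.
    writer = csv.DictWriter(
        buffer, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up rows using attribute names from the lists above.
rows = [
    {"id": 1, "username": "alice", "tweet": "first"},
    {"id": 2, "username": "bob", "tweet": "second"},
]
print(write_selected(rows, ["id"]))
```

With fields set to ["id"], only the id column is written, which mirrors the effect of setting c.Custom["tweet"] = ["id"].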

Examples

Custom storing

In this example Twint will create a directory named "twitter" and save the tweets in tweets.csv. Only the id of each tweet will be stored.

#!/usr/bin/python3
import twint

c = twint.Config()

c.Username = "twitter"
c.Store_csv = True
# CSV Fieldnames
c.Custom["tweet"] = ["id"]
# Name of the directory
c.Output = "twitter"

twint.run.Search(c)

In this example Twint will create one file called users.csv and will store only the bio of the scraped users.

#!/usr/bin/python3
import twint

c = twint.Config()
c.Username = "twitter"
c.Store_csv = True
c.Custom["user"] = ["bio"]
c.User_full = True
c.Output = "users.csv"

twint.run.Followers(c)

Fetch information from a user's profile

import twint

c = twint.Config()
c.Username = "twitter"

twint.run.Lookup(c)

Fetch Tweets from a user's profile (Includes retweets)

Please note that search-specific options can't be applied here: config.Since, config.Until, config.Year, config.Timedelta, config.Media, config.Images and config.Videos. Other options, such as config.Limit and config.Elasticsearch, can still be applied.
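
As a small sketch of how one might guard against this, the helper below reports which search-only options are set on a config-like object before Profile is called. The helper name search_only_options_set is hypothetical and not part of Twint, and SimpleNamespace is used as a stand-in for twint.Config() so the snippet runs on its own.

```python
from types import SimpleNamespace

# Option names taken from the note above; treated here as search-only.
SEARCH_ONLY = ("Since", "Until", "Year", "Timedelta", "Media", "Images", "Videos")


def search_only_options_set(config):
    """Return the search-only option names that have a truthy value on config.

    Hypothetical helper for illustration; not part of Twint.
    """
    return [name for name in SEARCH_ONLY if getattr(config, name, None)]


# SimpleNamespace stands in for a twint.Config() object in this sketch.
c = SimpleNamespace(Username="twitter", Since="2017-12-27", Limit=20)
print(search_only_options_set(c))  # ['Since']
```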

import twint

c = twint.Config()
c.Username = "twitter"

twint.run.Profile(c)

Save data (tweets, users, ...) in lists (RAM)

import twint

c = twint.Config()

c.Username = 'noneprivacy'
c.Limit = 10
c.Store_object = True

twint.run.Search(c)
tweets = twint.output.tweets_object

import twint

c = twint.Config()

c.Username = 'noneprivacy'
c.Limit = 10
c.Store_object = True

twint.run.Followers(c)
followers = twint.output.follow_object

The structure of followers is:

{ username:
    { 'followers':
        [...]
    }
}
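
Given that nested structure, reading the list back is a two-level dictionary lookup. A minimal sketch follows, using made-up data shaped like the structure shown here. The helper followers_of is hypothetical and not part of Twint.

```python
def followers_of(follow_object, username):
    """Return the follower list for username, or [] if absent (hypothetical helper)."""
    return follow_object.get(username, {}).get("followers", [])


# Made-up data shaped like the structure shown above.
sample = {"noneprivacy": {"followers": ["alice", "bob"]}}
print(followers_of(sample, "noneprivacy"))  # ['alice', 'bob']
print(followers_of(sample, "unknown"))      # []
```
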

import twint

c = twint.Config()

c.Username = 'noneprivacy'
c.Limit = 10
c.Store_object = True
c.User_full = True

twint.run.Followers(c)
users = twint.output.user_object

Here users is a list of twint.user.user class elements.
