xuwenyihust/Twitter-Streaming

Python 3.5

Twitter-Streaming

Stream tweets to MySQL.

Collection, storage and cleaning.

Introduction

Connect MySQL with Python script

  • Use mysql.connector module
  • Create database 'Twitter'
    • Create table 'source'
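
The setup step above could be sketched as follows. This is a minimal illustration, not the repository's actual script: the helper name create_schema is hypothetical, it accepts any DB-API cursor (for example one obtained from mysql.connector), and the column types are taken from the 'source' table listing further down.

```python
def create_schema(cursor):
    """Create the 'Twitter' database and its 'source' table if they are missing.

    `cursor` is assumed to be a DB-API cursor, e.g. from mysql.connector.
    Returns the list of statements that were executed.
    """
    statements = [
        "CREATE DATABASE IF NOT EXISTS Twitter",
        "USE Twitter",
        # Column types follow the 'source' table description in this README.
        "CREATE TABLE IF NOT EXISTS source (id INT(13), keyword VARCHAR(20))",
    ]
    for statement in statements:
        cursor.execute(statement)
    return statements


# Possible usage against a real server (host, user and password are placeholders):
#
#   import mysql.connector
#   conn = mysql.connector.connect(host="localhost", user="me", password="secret")
#   create_schema(conn.cursor())
#   conn.commit()
#   conn.close()
```

Passing the cursor in, rather than opening the connection inside the helper, keeps credentials out of the function and lets it be exercised with a stand-in cursor.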

Set up a Twitter stream listener

  • Aggregate data on a search term
    • Fetch search term from table source
    • Create a separate set of tables for each search term to store the information we need.
  • Modify StreamListener on_data method
    • Convert JSON data object to a python dictionary
    • Stop the listener when we have enough tweets

Import the data into MySQL database

  • Modify StreamListener on_data method
  • Extract different fields of tweets into different tables
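
As a rough sketch of the extraction step, the function below splits one raw JSON tweet into rows for the four per-term tables described under "Tables". The name split_tweet is hypothetical. The mapping of 'username' and 'mentioned_name' to the screen_name field, and of 'url' to expanded_url, are assumptions about which tweet fields the project stores.

```python
import json


def split_tweet(raw):
    """Turn one raw JSON tweet into rows for the tweet/hashtag/url/mention tables."""
    tweet = json.loads(raw)  # JSON text -> Python dictionary
    tweet_id = tweet["id"]
    entities = tweet.get("entities", {})
    return {
        # (id, time, username, tweet)
        "tweet": [(tweet_id, tweet["created_at"],
                   tweet["user"]["screen_name"], tweet["text"])],
        # (id, tag)
        "hashtag": [(tweet_id, h["text"]) for h in entities.get("hashtags", [])],
        # (id, url)
        "url": [(tweet_id, u["expanded_url"]) for u in entities.get("urls", [])],
        # (id, mentioned_id, mentioned_name)
        "mention": [(tweet_id, m["id"], m["screen_name"])
                    for m in entities.get("user_mentions", [])],
    }


# A hand-written sample tweet, reduced to the fields used above.
sample = json.dumps({
    "id": 1,
    "created_at": "Mon Jan 01 00:00:00 +0000 2018",
    "text": "Hello #python @alice",
    "user": {"screen_name": "bob"},
    "entities": {
        "hashtags": [{"text": "python"}],
        "urls": [],
        "user_mentions": [{"id": 42, "screen_name": "alice"}],
    },
})
rows = split_tweet(sample)
```

Each list in the result can then be handed to cursor.executemany with an INSERT statement for the matching table.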

Tables

source

Column   Description
id       Unique keyword id
keyword  Keyword for search

('id', 'int(13)', 'YES', '', None, '')
('keyword', 'varchar(20)', 'YES', '', None, '')

collection

table         col1  col2          col3            col4
tweet_term    id    time          username        tweet
hashtag_term  id    tag
url_term      id    url
mention_term  id    mentioned_id  mentioned_name
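
One way to generate these per-term tables is sketched below. The column names come from the listing above; the column types, the helper name collection_ddl, and the rule restricting terms to letters, digits and underscores are assumptions made for illustration. Table names cannot be passed as query parameters, so the term is validated before it is placed into the statement.

```python
import re

# (table prefix, column definitions); the types are illustrative assumptions.
COLLECTION_TABLES = [
    ("tweet", "id BIGINT, `time` VARCHAR(40), username VARCHAR(50), tweet TEXT"),
    ("hashtag", "id BIGINT, tag VARCHAR(280)"),
    ("url", "id BIGINT, url TEXT"),
    ("mention", "id BIGINT, mentioned_id BIGINT, mentioned_name VARCHAR(50)"),
]


def collection_ddl(term):
    """Return CREATE TABLE statements for one search term."""
    # Reject anything that is not a plain identifier fragment.
    if not re.fullmatch(r"[A-Za-z0-9_]+", term):
        raise ValueError("unsupported search term for a table name: %r" % term)
    return [
        "CREATE TABLE IF NOT EXISTS {}_{} ({}) DEFAULT CHARSET=utf8mb4".format(
            prefix, term, columns)
        for prefix, columns in COLLECTION_TABLES
    ]


for statement in collection_ddl("python"):
    print(statement)
```

The utf8mb4 character set is used so that tweet text containing emoji can be stored (see "The Emoji Problem" in the appendix).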

Tweets

JSON Format of tweets

Field Type
... ...
text string
source string
id int64
created_at string
... ...

Libraries Used

Appendix

The Emoji Problem

MySQL's native utf8 character set stores at most 3 bytes per character, but the full range of UTF-8 characters, including emoji, requires up to 4 bytes. The columns that hold tweet text should therefore be created with the utf8mb4 character set.
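
The byte counts behind this can be checked directly in Python. The helper name utf8_len is only for illustration.

```python
def utf8_len(character):
    """Number of bytes needed to encode one character in UTF-8."""
    return len(character.encode("utf-8"))


samples = [
    ("ASCII letter", "a"),
    ("accented letter", "\u00e9"),    # e with acute accent
    ("CJK character", "\u4e2d"),
    ("emoji", "\U0001F600"),          # grinning face
]
for label, character in samples:
    print(label, utf8_len(character))
```

Only the emoji needs 4 bytes, which is why it cannot be stored in a 3-byte utf8 column. The connection itself usually needs the same character set; mysql.connector.connect accepts a charset argument (for example charset='utf8mb4') for this purpose.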

Stop Listening

The original tweepy.streaming.StreamListener class does not stop listening on its own, even once we have collected enough tweets. We therefore modify the StreamListener class ourselves, adding a counter to the 'on_data' method.
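
A minimal sketch of the counter idea, assuming tweepy 3.x, where the stream stops once on_data returns False. The class name LimitedListener and its limit parameter are hypothetical, and the fallback to object exists only so the sketch can be run without tweepy installed.

```python
import json

try:
    # Available in tweepy 3.x; StreamListener was removed in tweepy 4.0.
    from tweepy.streaming import StreamListener
except ImportError:
    StreamListener = object  # fallback so the sketch runs without tweepy 3.x


class LimitedListener(StreamListener):
    """Collect messages and ask the stream to stop after `limit` of them."""

    def __init__(self, limit):
        super().__init__()
        self.limit = limit
        self.count = 0
        self.tweets = []

    def on_data(self, raw_data):
        # Convert the JSON payload to a Python dictionary and keep it.
        self.tweets.append(json.loads(raw_data))
        self.count += 1
        # Returning False tells tweepy 3.x to stop the stream.
        return self.count < self.limit
```

In a real script, the line that appends to self.tweets would be replaced by the inserts into the MySQL tables.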

License

Resources
