Stream tweets to MySQL.
Collection, storage and cleaning.
- Use the mysql.connector module
- Create database 'Twitter'
- Create table 'source' (a setup sketch follows this list)
- Aggregate data on a search term
  - Fetch the search term from table 'source'
  - Create a separate set of tables for each search term to store the information we need
- Modify the StreamListener on_data method
  - Convert the JSON data object to a Python dictionary
  - Extract the different fields of a tweet into different tables
  - Stop the listener once we have enough tweets
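A minimal setup sketch with mysql.connector. The connection credentials are placeholders, and the column types follow the 'source' schema shown below; adjust both for your own server.

```python
import mysql.connector

# Placeholder credentials -- adjust for your own server.
conn = mysql.connector.connect(host='localhost', user='root', password='secret')
cur = conn.cursor()

# Create the database and the 'source' table that holds the search keywords.
cur.execute("CREATE DATABASE IF NOT EXISTS Twitter")
cur.execute("USE Twitter")
cur.execute("CREATE TABLE IF NOT EXISTS source (id INT, keyword VARCHAR(20))")

# Store a search term, then read all terms back for the streamer.
cur.execute("INSERT INTO source (id, keyword) VALUES (%s, %s)", (1, 'python'))
conn.commit()

cur.execute("SELECT keyword FROM source")
keywords = [row[0] for row in cur.fetchall()]
```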
The 'source' table holds the search keywords:

| Column | Description |
|---|---|
| id | Unique keyword id |
| keyword | Keyword for search |
The schema as reported by `DESCRIBE source`:

| Field | Type | Null | Key | Default | Extra |
|---|---|---|---|---|---|
| id | int(13) | YES | | | |
| keyword | varchar(20) | YES | | | |

Each search term gets its own set of tables:

| table | col1 | col2 | col3 | col4 |
|---|---|---|---|---|
| tweet_term | id | time | username | tweet |
| hashtag_term | id | tag | | |
| url_term | id | url | | |
| mention_term | id | mentioned_id | mentioned_name | |
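A sketch of creating the per-term tables for one search term. Only the table and column names come from the schema above; the column types are assumptions, and the tables are created with utf8mb4 for the reasons discussed further below.

```python
def create_term_tables(cursor, term):
    """Create the four tables for one search term (column types assumed)."""
    tables = {
        f"tweet_{term}": "(id BIGINT, time VARCHAR(40), username VARCHAR(50), tweet TEXT)",
        f"hashtag_{term}": "(id BIGINT, tag VARCHAR(140))",
        f"url_{term}": "(id BIGINT, url VARCHAR(2083))",
        f"mention_{term}": "(id BIGINT, mentioned_id BIGINT, mentioned_name VARCHAR(50))",
    }
    for name, columns in tables.items():
        cursor.execute(
            f"CREATE TABLE IF NOT EXISTS {name} {columns} "
            "DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci"
        )
```

For example, `create_term_tables(cur, 'python')` creates tweet_python, hashtag_python, url_python, and mention_python.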
Relevant fields of the tweet JSON object (the '...' rows mark omitted fields):

| Field | Type |
|---|---|
| ... | ... |
| text | string |
| source | string |
| id | int64 |
| created_at | string |
| ... | ... |
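A sketch of pulling these fields out of one streamed tweet. The field names (text, created_at, entities, user) are the standard Twitter API v1.1 tweet object keys; json.loads performs the JSON-to-dictionary conversion planned above.

```python
import json

def extract_fields(raw_data):
    """Convert the streamed JSON string to a dict and pick the fields we store."""
    tweet = json.loads(raw_data)
    return {
        'id': tweet['id'],                        # int64 tweet id
        'created_at': tweet['created_at'],        # creation time, as a string
        'username': tweet['user']['screen_name'],
        'text': tweet['text'],
        'hashtags': [h['text'] for h in tweet['entities']['hashtags']],
        'urls': [u['expanded_url'] for u in tweet['entities']['urls']],
        'mentions': [(m['id'], m['screen_name'])
                     for m in tweet['entities']['user_mentions']],
    }
```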
MySQL's native `utf8` character set stores at most 3 bytes per character, but the full range of UTF-8 characters, including Emoji, needs up to 4 bytes. The tables and columns should therefore be created with the `utf8mb4` character set.
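To keep 4-byte characters intact end to end, the client connection should use utf8mb4 as well. A sketch with placeholder credentials:

```python
import mysql.connector

# charset/collation make the connection itself 4-byte safe.
conn = mysql.connector.connect(
    host='localhost', user='root', password='secret', database='Twitter',
    charset='utf8mb4', collation='utf8mb4_unicode_ci',
)
```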
The original tweepy.streaming.StreamListener class has no built-in way to stop listening once we have collected enough tweets. We therefore subclass StreamListener ourselves and add a counter to the on_data method; returning False from on_data disconnects the stream.
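A sketch of the modified listener, written against the old tweepy StreamListener interface (tweepy < 4.0); store_tweet is a hypothetical placeholder for the INSERT logic sketched earlier.

```python
import json
import tweepy

class CountingListener(tweepy.streaming.StreamListener):
    """StreamListener that disconnects after max_tweets tweets."""

    def __init__(self, max_tweets=1000):
        super().__init__()
        self.counter = 0
        self.max_tweets = max_tweets

    def on_data(self, raw_data):
        tweet = json.loads(raw_data)   # JSON object -> Python dictionary
        if 'text' not in tweet:        # skip limit notices and other events
            return True
        store_tweet(tweet)             # hypothetical: the INSERTs sketched above
        self.counter += 1
        # Returning False from on_data tells tweepy to disconnect the stream.
        return self.counter < self.max_tweets

    def on_error(self, status_code):
        if status_code == 420:         # rate limited: back off by disconnecting
            return False
```

The listener is then passed to tweepy.Stream together with an authenticated handler and a filter on the keywords fetched from 'source'.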