A Python utility that converts Twitter archive data (in JavaScript format) to CSV for easier analysis and processing.
This application reads a Twitter archive file (`tweets.js`) and converts it into a structured CSV format with enhanced analytical features. It extracts key information from each tweet, including:
- Tweet ID
- Creation date
- Tweet text
- User name
- User screen name
- Retweet count
- Favorite count
- Hashtags
- User mentions
- URLs
The converter now includes additional columns specifically designed to help AI analysis:
- Tweet type (reply, retweet, or original)
- Engagement rate (calculated from retweets and favorites)
- Time-based features:
  - Hour of day
  - Day of week
  - Month
  - Year
  - Weekend indicator
- Content analysis:
  - Tweet length
  - Question presence
  - Exclamation presence
  - Emoji usage
  - Word count
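To make the derived columns concrete, here is a minimal sketch of how they could be computed from a single tweet record. This is an illustration, not the script's exact code: the field names follow the archive format shown later, the emoji check is a simplified regex, and the script's actual percentage formula for engagement rate is not shown in this README, so only the raw retweet-plus-favorite sum is computed here.

```python
import re
from datetime import datetime

def extract_features(tweet):
    """Illustrative sketch of the derived analytical columns."""
    text = tweet.get("full_text") or tweet.get("text", "")
    # Archive timestamps look like "Thu Sep 26 22:39:58 +0000 2024"
    created = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    retweets = int(tweet.get("retweet_count", 0))
    favorites = int(tweet.get("favorite_count", 0))

    # Classify the tweet: retweets start with "RT @", replies carry a reply-to id
    if text.startswith("RT @"):
        tweet_type = "retweet"
    elif tweet.get("in_reply_to_status_id_str"):
        tweet_type = "reply"
    else:
        tweet_type = "original"

    return {
        "tweet_type": tweet_type,
        # The script normalizes this sum into a percentage; formula not shown here
        "engagement": retweets + favorites,
        "hour_of_day": created.hour,
        "day_of_week": created.strftime("%A"),
        "month": created.strftime("%B"),
        "year": created.year,
        "is_weekend": created.weekday() >= 5,
        "tweet_length": len(text),
        "has_question": "?" in text,
        "has_exclamation": "!" in text,
        # Simplified emoji detection over common emoji/symbol code-point ranges
        "has_emoji": bool(re.search(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", text)),
        "word_count": len(text.split()),
    }
```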
Requirements:
- Python 3.x
- Standard Python libraries:
  - json
  - csv
  - argparse
  - pathlib
  - datetime
  - re
Basic usage:

```shell
python Tweet2CSV.py
```

Advanced usage with command-line arguments:

```shell
python Tweet2CSV.py --input /path/to/tweets.js --output output.csv --encoding utf-8
```
- `--input`, `-i`: Path to the input tweets.js file (default: /users/keithtownsend/downloads/twitter/data/tweets.js)
- `--output`, `-o`: Path to the output CSV file (default: tweets.csv)
- `--encoding`, `-e`: File encoding (default: utf-8)
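These options map naturally onto Python's `argparse` module (listed under the requirements above). A minimal sketch of how the flags might be wired up; the real script may differ in details:

```python
import argparse
from pathlib import Path

def parse_args(argv=None):
    """Sketch of the CLI described above."""
    parser = argparse.ArgumentParser(
        description="Convert a Twitter archive tweets.js file to CSV")
    parser.add_argument(
        "--input", "-i", type=Path,
        default=Path("/users/keithtownsend/downloads/twitter/data/tweets.js"),
        help="Path to the input tweets.js file")
    parser.add_argument(
        "--output", "-o", type=Path, default=Path("tweets.csv"),
        help="Path to the output CSV file")
    parser.add_argument(
        "--encoding", "-e", default="utf-8",
        help="File encoding")
    return parser.parse_args(argv)
```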
The script expects a Twitter archive file (`tweets.js`) in the following format:
```javascript
window.YTD.tweets.part0 = [
  {
    "tweet" : {
      "edit_info" : {
        "initial" : {
          "editTweetIds" : [
            "1839419668525961279"
          ],
          "editableUntil" : "2024-09-26T22:39:58.000Z",
          "editsRemaining" : "5",
          "isEditEligible" : false
        }
      },
      "retweeted" : false,
      "source" : "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Twitter Web App</a>",
      "entities" : {
        "hashtags" : [...],
        "user_mentions" : [...],
        "urls" : [...]
      },
      "id_str" : "...",
      "created_at" : "...",
      "text" : "...",
      "user" : {
        "name" : "...",
        "screen_name" : "..."
      },
      "retweet_count" : 0,
      "favorite_count" : 0
    }
  },
  ...
]
```
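Because the file is a JavaScript assignment rather than pure JSON, the loader has to strip the `window.YTD.tweets.part0 = ` prefix before parsing. A sketch of that approach (the function name `load_tweets` matches the one mentioned in the customization section, but the body here is illustrative):

```python
import json

def load_tweets(path, encoding="utf-8"):
    """Parse a tweets.js archive file into a list of tweet dicts."""
    with open(path, encoding=encoding) as f:
        raw = f.read()
    # Everything before the first '[' is the JavaScript assignment wrapper
    json_text = raw[raw.index("["):]
    entries = json.loads(json_text)
    # Each array entry wraps the actual tweet object in a "tweet" key
    return [entry["tweet"] for entry in entries]
```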
The script generates a CSV file with the following columns:
- id
- created_at
- text
- user_name
- user_screen_name
- retweet_count
- favorite_count
- hashtags (semicolon-separated)
- mentions (semicolon-separated)
- urls (semicolon-separated)
- tweet_type (reply/retweet/original)
- engagement_rate (percentage)
- hour_of_day (0-23)
- day_of_week (Monday-Sunday)
- month (January-December)
- year
- is_weekend (true/false)
- tweet_length (character count)
- has_question (true/false)
- has_exclamation (true/false)
- has_emoji (true/false)
- word_count
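The columns above are what the output stage writes. A minimal sketch of that stage using `csv.DictWriter`, assuming each tweet has already been flattened into a dict keyed by these column names (the header list mirrors the columns documented above; the exact code in the script may differ):

```python
import csv

CSV_HEADER = [
    "id", "created_at", "text", "user_name", "user_screen_name",
    "retweet_count", "favorite_count", "hashtags", "mentions", "urls",
    "tweet_type", "engagement_rate", "hour_of_day", "day_of_week",
    "month", "year", "is_weekend", "tweet_length", "has_question",
    "has_exclamation", "has_emoji", "word_count",
]

def write_csv(rows, path, encoding="utf-8"):
    """Write flattened tweet dicts to CSV with the documented column order."""
    with open(path, "w", newline="", encoding=encoding) as f:
        # Missing keys become empty cells; unexpected keys are ignored
        writer = csv.DictWriter(f, fieldnames=CSV_HEADER, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
```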
A comprehensive data dictionary is provided in `DATA_DICTIONARY.md` that explains:
- Each field in the CSV file
- How to interpret the values
- Common analysis scenarios
- Industry-specific metrics
- Best practices for data analysis
The data dictionary is designed to help AI tools like ChatGPT better understand and analyze your tweet data.
- Rainbow CSV (VS Code Extension)
  - Color-codes CSV columns for better readability
  - Validates CSV formatting
  - Provides SQL-like querying capabilities
  - Makes it easier to spot patterns in your data
  - Installation: search for "Rainbow CSV" in VS Code extensions
- Excel/Google Sheets
  - Familiar spreadsheet interface
  - Built-in filtering and sorting
  - Pivot tables for data aggregation
  - Charts and visualizations
  - Good for sharing with team members
- Pandas (Python Library)
  - Powerful data analysis capabilities
  - Can handle large datasets efficiently
  - Extensive statistical functions
  - Integration with visualization libraries
  - Example usage:

    ```python
    import pandas as pd

    df = pd.read_csv('tweets.csv')

    # Analyze engagement by day of week
    print(df.groupby('day_of_week')['engagement_rate'].mean())
    ```
- ChatGPT
  - Upload the CSV file and data dictionary
  - Ask specific analysis questions
  - Get insights and recommendations
  - Example prompt: "Analyze my tweet data and tell me which topics get the most engagement"
- Claude
  - Similar capabilities to ChatGPT
  - Often better at handling structured data
  - Can provide more detailed analysis
- Custom AI Analysis Scripts
  - Create Python scripts using libraries like scikit-learn
  - Build predictive models for engagement
  - Generate automated reports
The script includes comprehensive error handling for:
- File reading errors
- JSON parsing errors
- CSV writing errors
- Invalid file paths
- Encoding issues
- Date parsing errors
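The general pattern behind that error handling is to catch each failure class and exit with a clear message instead of a raw traceback. A hedged sketch of what the file-reading and JSON-parsing cases could look like (illustrative, not the script's exact code):

```python
import json
import sys

def safe_load(path, encoding="utf-8"):
    """Load a tweets.js file, exiting with a readable message on failure."""
    try:
        with open(path, encoding=encoding) as f:
            raw = f.read()
        # Strip the JavaScript assignment prefix before parsing
        return json.loads(raw[raw.index("["):])
    except FileNotFoundError:
        sys.exit(f"Input file not found: {path}")
    except UnicodeDecodeError:
        sys.exit(f"Could not decode {path} with encoding {encoding!r}")
    except ValueError as exc:
        # Covers both a missing '[' and json.JSONDecodeError
        sys.exit(f"Could not parse tweet JSON in {path}: {exc}")
```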
You can customize the script for your specific needs:
- Adding New Fields:
  - Edit the `csv_header` list in the `write_csv` function
  - Add corresponding data extraction in the row creation section
- Changing Analysis Logic:
  - Modify the analysis functions (`classify_tweet_type`, `calculate_engagement_rate`, etc.)
  - Add new analysis functions as needed
- Adjusting Tweet Structure Parsing:
  - Update the `load_tweets` function if your tweet archive has a different structure
  - Modify the tweet data extraction in the `write_csv` function
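As a concrete example of adding a new field, here is a hypothetical `has_link` column. The column name and the abbreviated header are made up for illustration; the real `csv_header` list and row-creation code live in the script:

```python
# Hypothetical illustration of "Adding New Fields": a has_link column.
CSV_HEADER = ["id", "text"]     # existing columns (abbreviated for the example)
CSV_HEADER.append("has_link")   # step 1: extend the header list

def build_row(tweet):
    """Build one CSV row from a tweet dict, including the new field."""
    row = {
        "id": tweet.get("id_str", ""),
        "text": tweet.get("text", ""),
    }
    # step 2: add the corresponding extraction in the row-creation section
    row["has_link"] = "http" in row["text"].lower()
    return row
```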
The script can be adapted for various use cases:
- Personal Brand Analysis:
  - Focus on engagement metrics and content analysis
  - Use the data dictionary's "Thought Leadership Impact" scenarios
- Business Marketing:
  - Add fields for campaign tracking
  - Focus on conversion metrics and audience analysis
- Content Creator Analysis:
  - Add fields for content categories
  - Focus on content performance across different topics
- Community Management:
  - Add fields for community engagement metrics
  - Focus on interaction patterns and response effectiveness
- Technical Content Analysis:
  - Use the tech industry-specific scenarios in the data dictionary
  - Focus on technical topic performance and educational content
- Make sure you have the necessary permissions to read the input file and write to the output directory
- The script will overwrite any existing output CSV file
- For large tweet archives, the conversion process might take some time
- The script supports UTF-8 encoding by default, but you can specify a different encoding if needed
- The enhanced analytical features are designed to help AI tools like ChatGPT better analyze your tweet performance patterns