A tool for converting your tweet archive to a CSV file so that AI agents such as ChatGPT can analyze your tweet history.

kltownsend/tweet2csv

Tweet to CSV Converter

A Python utility that converts Twitter archive data (in JavaScript format) to CSV for easier analysis and processing.

Description

This application reads a Twitter archive file (tweets.js) and converts it into a structured CSV format with enhanced analytical features. It extracts key information from each tweet including:

  • Tweet ID
  • Creation date
  • Tweet text
  • User name
  • User screen name
  • Retweet count
  • Favorite count
  • Hashtags
  • User mentions
  • URLs

Enhanced Analytical Features

The converter also adds columns designed to support AI-assisted analysis:

  • Tweet type (reply, retweet, or original)
  • Engagement rate (calculated from retweets and favorites)
  • Time-based features:
    • Hour of day
    • Day of week
    • Month
    • Year
    • Weekend indicator
  • Content analysis:
    • Tweet length
    • Question presence
    • Exclamation presence
    • Emoji usage
    • Word count
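
As a rough illustration of how the time-based and content features above could be derived, here is a minimal sketch. It is not the code from Tweet2CSV.py: the function names, the created_at format string, and the emoji pattern are all assumptions made for this example.

```python
import re
from datetime import datetime

# Assumption: created_at looks like "Thu Sep 26 21:39:58 +0000 2024".
# Adjust the format string if your archive uses a different layout.
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"

# Illustrative emoji matcher covering common Unicode emoji blocks (not exhaustive).
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def time_features(created_at):
    """Derive the time-based columns from a created_at string."""
    dt = datetime.strptime(created_at, CREATED_AT_FORMAT)
    return {
        "hour_of_day": dt.hour,
        "day_of_week": dt.strftime("%A"),
        "month": dt.strftime("%B"),
        "year": dt.year,
        "is_weekend": dt.weekday() >= 5,  # Monday is 0, so 5 and 6 are Sat/Sun
    }


def content_features(text):
    """Derive the content-analysis columns from the tweet text."""
    return {
        "tweet_length": len(text),
        "has_question": "?" in text,
        "has_exclamation": "!" in text,
        "has_emoji": bool(EMOJI_PATTERN.search(text)),
        "word_count": len(text.split()),
    }
```

Word count here is a simple whitespace split, so URLs and hashtags each count as one word. The script itself may count differently.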

Requirements

  • Python 3.x
  • Standard Python libraries:
    • json
    • csv
    • argparse
    • pathlib
    • datetime
    • re

Usage

Basic usage:

python Tweet2CSV.py

Advanced usage with command-line arguments:

python Tweet2CSV.py --input /path/to/tweets.js --output output.csv --encoding utf-8

Command-line Arguments

  • --input, -i: Path to the input tweets.js file (default: /users/keithtownsend/downloads/twitter/data/tweets.js)
  • --output, -o: Path to the output CSV file (default: tweets.csv)
  • --encoding, -e: File encoding (default: utf-8)

Input File Format

The script expects a Twitter archive file (tweets.js) in the following format:

window.YTD.tweets.part0 = [
  {
    "tweet" : {
      "edit_info" : {
        "initial" : {
          "editTweetIds" : [
            "1839419668525961279"
          ],
          "editableUntil" : "2024-09-26T22:39:58.000Z",
          "editsRemaining" : "5",
          "isEditEligible" : false
        }
      },
      "retweeted" : false,
      "source" : "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Twitter Web App</a>",
      "entities" : {
        "hashtags" : [...],
        "user_mentions" : [...],
        "urls" : [...]
      },
      "id_str" : "...",
      "created_at" : "...",
      "text" : "...",
      "user" : {
        "name" : "...",
        "screen_name" : "..."
      },
      "retweet_count" : 0,
      "favorite_count" : 0
    }
  },
  ...
]
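
Because the file starts with a JavaScript assignment rather than plain JSON, it cannot be handed to json.load directly. One way to deal with this, sketched below, is to drop everything before the first "[" and parse the rest. The function names parse_tweets_js and read_tweets_file are made up for this example and may not match how the script does it.

```python
import json


def parse_tweets_js(raw):
    """Parse the text of tweets.js into a list of tweet dicts."""
    # Everything before the first "[" is the "window.YTD.tweets.part0 = " prefix.
    start = raw.index("[")
    records = json.loads(raw[start:])
    # Each record wraps the actual tweet under a "tweet" key.
    return [record["tweet"] for record in records]


def read_tweets_file(path, encoding="utf-8"):
    """Read tweets.js from disk and return the unwrapped tweets."""
    with open(path, encoding=encoding) as handle:
        return parse_tweets_js(handle.read())
```

Note that the "..." and "[...]" placeholders in the format shown above are abbreviations, so that snippet is not valid JSON as written; a real archive file contains full values.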

Output Format

The script generates a CSV file with the following columns:

  • id
  • created_at
  • text
  • user_name
  • user_screen_name
  • retweet_count
  • favorite_count
  • hashtags (semicolon-separated)
  • mentions (semicolon-separated)
  • urls (semicolon-separated)
  • tweet_type (reply/retweet/original)
  • engagement_rate (percentage)
  • hour_of_day (0-23)
  • day_of_week (Monday-Sunday)
  • month (January-December)
  • year
  • is_weekend (true/false)
  • tweet_length (character count)
  • has_question (true/false)
  • has_exclamation (true/false)
  • has_emoji (true/false)
  • word_count
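
To show what the semicolon-separated columns look like in practice, here is a hedged sketch that flattens the entities of one tweet and writes it with csv.DictWriter. Only a subset of the columns is included. The helper names are hypothetical, and the inner entity keys (text, screen_name, expanded_url) are assumptions about the archive format, since the example above elides them.

```python
import csv
import io

# Subset of the documented columns, kept short for the example.
FIELDNAMES = ["id", "text", "hashtags", "mentions", "urls"]


def join_entities(entities, list_key, field):
    """Join one field from each entity in a list into a semicolon-separated string."""
    return ";".join(item[field] for item in entities.get(list_key, []))


def tweet_to_row(tweet):
    """Flatten one tweet dict into a row keyed by column name."""
    entities = tweet.get("entities", {})
    return {
        "id": tweet.get("id_str", ""),
        "text": tweet.get("text", ""),
        # Assumed entity keys: "text", "screen_name", "expanded_url".
        "hashtags": join_entities(entities, "hashtags", "text"),
        "mentions": join_entities(entities, "user_mentions", "screen_name"),
        "urls": join_entities(entities, "urls", "expanded_url"),
    }


def write_rows(rows, handle):
    """Write a header and the given rows to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Example: write one tweet to an in-memory buffer instead of a file.
sample = {
    "id_str": "1",
    "text": "hi #a #b",
    "entities": {"hashtags": [{"text": "a"}, {"text": "b"}],
                 "user_mentions": [], "urls": []},
}
buffer = io.StringIO()
write_rows([tweet_to_row(sample)], buffer)
```

When writing to a real file, open it with newline="" as the csv module documentation recommends, so that line endings are not doubled on Windows.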

Data Dictionary

A comprehensive data dictionary is provided in DATA_DICTIONARY.md that explains:

  • Each field in the CSV file
  • How to interpret the values
  • Common analysis scenarios
  • Industry-specific metrics
  • Best practices for data analysis

The data dictionary is designed to help AI tools like ChatGPT better understand and analyze your tweet data.

Recommended Tools for CSV Analysis

CSV Viewers and Editors

  1. Rainbow CSV (VS Code Extension)

    • Color-codes CSV columns for better readability
    • Validates CSV formatting
    • Provides SQL-like querying capabilities
    • Makes it easier to spot patterns in your data
    • Installation: Search for "Rainbow CSV" in VS Code extensions
  2. Excel/Google Sheets

    • Familiar spreadsheet interface
    • Built-in filtering and sorting
    • Pivot tables for data aggregation
    • Charts and visualizations
    • Good for sharing with team members
  3. Pandas (Python Library)

    • Powerful data analysis capabilities
    • Can handle large datasets efficiently
    • Extensive statistical functions
    • Integration with visualization libraries
    • Example usage:
      import pandas as pd
      df = pd.read_csv('tweets.csv')
      # Analyze engagement by day of week
      print(df.groupby('day_of_week')['engagement_rate'].mean())

AI Analysis Tools

  1. ChatGPT

    • Upload the CSV file and data dictionary
    • Ask specific analysis questions
    • Get insights and recommendations
    • Example prompt: "Analyze my tweet data and tell me which topics get the most engagement"
  2. Claude

    • Similar capabilities to ChatGPT
    • Often better at handling structured data
    • Can provide more detailed analysis
  3. Custom AI Analysis Scripts

    • Create Python scripts using libraries like scikit-learn
    • Build predictive models for engagement
    • Generate automated reports

Error Handling

The script includes comprehensive error handling for:

  • File reading errors
  • JSON parsing errors
  • CSV writing errors
  • Invalid file paths
  • Encoding issues
  • Date parsing errors
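
The pattern behind several of these cases can be sketched as follows. This is an illustration of the general approach, not the script's own error handling: the function name load_archive and the messages are assumptions.

```python
import json
import sys


def load_archive(path, encoding="utf-8"):
    """Return the parsed records from tweets.js, or None if loading fails."""
    try:
        with open(path, encoding=encoding) as handle:
            raw = handle.read()
        # Skip the JavaScript assignment prefix before the JSON array.
        return json.loads(raw[raw.index("["):])
    except FileNotFoundError:
        print(f"Input file not found: {path}", file=sys.stderr)
    except PermissionError:
        print(f"No permission to read: {path}", file=sys.stderr)
    except UnicodeDecodeError:
        # Must come before ValueError, which is its parent class.
        print(f"Could not decode {path} as {encoding}", file=sys.stderr)
    except ValueError as error:
        # Covers json.JSONDecodeError and a missing "[" in the file.
        print(f"Could not parse {path}: {error}", file=sys.stderr)
    return None
```

Returning None keeps the example short; a command-line tool might instead exit with a non-zero status after printing the message.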

Customization

Modifying the Script

You can customize the script for your specific needs:

  1. Adding New Fields:

    • Edit the csv_header list in the write_csv function
    • Add corresponding data extraction in the row creation section
  2. Changing Analysis Logic:

    • Modify the analysis functions (classify_tweet_type, calculate_engagement_rate, etc.)
    • Add new analysis functions as needed
  3. Adjusting Tweet Structure Parsing:

    • Update the load_tweets function if your tweet archive has a different structure
    • Modify the tweet data extraction in the write_csv function
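
As an example of the "Adding New Fields" steps, the sketch below adds a has_link column. It uses simplified stand-ins for the script's header list and row-building code, so the names build_row and build_row_with_link and the three starting columns are hypothetical; the real write_csv function will look different.

```python
# Hypothetical stand-in for the csv_header list inside write_csv.
csv_header = ["id", "text", "word_count"]


def build_row(tweet):
    """Hypothetical stand-in for the existing row creation step."""
    text = tweet.get("text", "")
    return [tweet.get("id_str", ""), text, len(text.split())]


# Step 1: add the new column name to the header.
csv_header.append("has_link")


# Step 2: add the matching value in the same position when building each row.
def build_row_with_link(tweet):
    row = build_row(tweet)
    # True when the tweet's entities contain at least one URL.
    row.append(bool(tweet.get("entities", {}).get("urls")))
    return row
```

The key point is that the header and every row must stay the same length and in the same order, otherwise values end up under the wrong column.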

Adapting for Different Use Cases

The script can be adapted for various use cases:

  1. Personal Brand Analysis:

    • Focus on engagement metrics and content analysis
    • Use the data dictionary's "Thought Leadership Impact" scenarios
  2. Business Marketing:

    • Add fields for campaign tracking
    • Focus on conversion metrics and audience analysis
  3. Content Creator Analysis:

    • Add fields for content categories
    • Focus on content performance across different topics
  4. Community Management:

    • Add fields for community engagement metrics
    • Focus on interaction patterns and response effectiveness
  5. Technical Content Analysis:

    • Use the tech industry-specific scenarios in the data dictionary
    • Focus on technical topic performance and educational content

Notes

  • Make sure you have the necessary permissions to read the input file and write to the output directory
  • The script will overwrite any existing output CSV file
  • For large tweet archives, the conversion process might take some time
  • The script uses UTF-8 encoding by default; pass --encoding to specify a different encoding if needed
  • The enhanced analytical features are designed to help AI tools like ChatGPT better analyze your tweet performance patterns
