# Data Acquisition via Twitter API v2
<small>Version: 0.1 (28 Sep 2022)</small>  

This notebook contains functions for downloading and saving tweets on a given topic (in this case "climate change").

In [1]:
%pip install --upgrade tweepy

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tweepy
  Downloading tweepy-4.10.1-py3-none-any.whl (94 kB)
Collecting requests<3,>=2.27.0
  Downloading requests-2.28.1-py3-none-any.whl (62 kB)
Installing collected packages: requests, tweepy
  Attempting uninstall: requests
    Found existing installation: requests 2.23.0
    Uninstalling requests-2.23.0:
      Successfully uninstalled requests-2.23.0
  Attempting uninstall: tweepy
    Found existing installation: tweepy 3.10.0
    Uninstalling tweepy-3.10.0:
      Successfully uninstalled tweepy-3.10.0
Successfully installed requests-2.28.1 tweepy-4.10.1


In [53]:
import tweepy
from datetime import datetime
import random
import json

# Functions
A few basic functions for downloading tweets from the Twitter API v2 via Tweepy.

In [54]:
def get_tweets(q:str, api_key:str, tweet_fields, page_limit=1, max_results=10, save_as="JSON"):
  '''
  Download recent tweets via Tweepy and append them to JSON Lines files.

  Parameters
  ----------
    q : str
      The search query. See the Twitter API v2 docs for the query syntax.
    api_key : str
      The credential used to initialize the Tweepy Client object. It is passed
      as the first positional argument, which tweepy.Client treats as the
      bearer token.
    tweet_fields : list(str)
      List of additional tweet fields (see Twitter API v2 docs).
    page_limit : int, optional
      Maximum number of result pages to request when there are more tweets
      than a single request returns. (default=1)
    max_results : int, optional
      The maximal number of tweets per request (max. is 100 for free tier
      users). (default=10)
    save_as : str, optional
      The format used to store the tweets. Only "JSON" is implemented so far;
      CSV is still a TODO. (default="JSON")

  Returns
  -------
    Last retrieved tweepy.Response object.
  '''
  # initialize client
  client = tweepy.Client(api_key)

  params = {"tweet_fields": tweet_fields, "max_results": max_results}

  # always request at least one page, then follow next_token up to page_limit
  for page in range(1, max(page_limit, 1) + 1):
    # get tweets
    tweets = client.search_recent_tweets(q, **params)

    # save tweets, one JSON object per line (tweets.data is None for an empty page)
    if save_as == "JSON" and tweets.data:
      with open(f"drive/MyDrive/tweets_{datetime.now().strftime('%Y_%m_%d')}_v{page}.json", "a") as f:
        for tweet in tweets.data:
          f.write(json.dumps(tweet.data))
          f.write("\n")

    # the last page has no next_token, so use .get() to avoid a KeyError
    next_token = tweets.meta.get("next_token")
    if not next_token:
      break
    params["next_token"] = next_token

  return tweets # return last retrieved tweet page
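
The pagination logic in `get_tweets` can be exercised without network access. The following is a minimal sketch and not part of the original notebook: `iter_pages`, `fake_fetch`, and `FAKE_PAGES` are hypothetical names, and the fake fetcher only imitates the `meta["next_token"]` field that the search response carries.

```python
from typing import Callable, Iterator, Optional


def iter_pages(fetch_page: Callable[[Optional[str]], dict], page_limit: int = 1) -> Iterator[dict]:
    """Yield up to page_limit pages, following meta['next_token'] between requests."""
    next_token = None
    for _ in range(max(page_limit, 1)):
        page = fetch_page(next_token)
        yield page
        # The final page carries no next_token, so .get() avoids a KeyError.
        next_token = page.get("meta", {}).get("next_token")
        if not next_token:
            break


# Hypothetical stand-in for the API: three pages chained by tokens.
FAKE_PAGES = {
    None: {"data": [{"id": "1"}], "meta": {"next_token": "a"}},
    "a": {"data": [{"id": "2"}], "meta": {"next_token": "b"}},
    "b": {"data": [{"id": "3"}], "meta": {}},
}


def fake_fetch(token: Optional[str]) -> dict:
    return FAKE_PAGES[token]


# With a generous limit, iteration stops at the page without a token.
all_ids = [t["id"] for page in iter_pages(fake_fetch, page_limit=5) for t in page["data"]]
# With a tight limit, iteration stops after page_limit pages.
first_two = [t["id"] for page in iter_pages(fake_fetch, page_limit=2) for t in page["data"]]
print(all_ids, first_two)
```

Separating the paging loop from the request itself makes it possible to check both stop conditions (page limit reached, no further token) before spending any API quota.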

# Downloading Tweets via API v2

In [55]:
API_KEY = "<YOUR_BEARER_TOKEN>"  # credential redacted; insert your own bearer token here

In [56]:
query = '"climate change" -is:retweet lang:en'
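
The query string combines an exact phrase with two operators: `-is:retweet` excludes retweets and `lang:en` restricts results to English. As a small sketch of how such a string could be assembled for other topics, the hypothetical helper `build_query` below (not part of the original notebook) covers only these three building blocks.

```python
def build_query(phrase: str, lang: str = "en", include_retweets: bool = False) -> str:
    """Assemble a search query from a phrase and a few common operators."""
    parts = [f'"{phrase}"']          # double quotes request an exact phrase match
    if not include_retweets:
        parts.append("-is:retweet")  # a leading minus negates the operator
    if lang:
        parts.append(f"lang:{lang}")  # restrict results to one language
    return " ".join(parts)


query_example = build_query("climate change")
print(query_example)
```

Called with the defaults, the helper reproduces the query used in this notebook.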

In [57]:
tweets = get_tweets(query, API_KEY, tweet_fields=["author_id", "geo", "created_at", "entities", "attachments", "public_metrics"], page_limit=5, max_results=20)

In [37]:
tweets.meta

{'newest_id': '1575393204249190400',
 'oldest_id': '1575392815118692352',
 'result_count': 10,
 'next_token': 'b26v89c19zqg8o3fpzblrszjqm9ryjf95avgvtq0un43h'}

# Loading the Tweets

In [59]:
import glob
from pathlib import Path
import os

In [63]:
tweets_list = list()

In [64]:
for path in glob.glob("drive/MyDrive/*.json"):
  with open(path, "r") as f:
    for line in f.readlines():
      tweets_list.append(json.loads(line))
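
Because `get_tweets` opens its output files in append mode, running the download twice on the same day can leave the same tweet in a file more than once. The sketch below shows one possible way to load the files while skipping blank lines and dropping duplicates by tweet `id`. The function name `load_jsonl` and the temporary demo files are assumptions made for illustration only.

```python
import json
import tempfile
from pathlib import Path


def load_jsonl(directory, pattern="*.json"):
    """Read all matching JSON Lines files, keeping the first tweet seen per id."""
    tweets = {}
    for path in sorted(Path(directory).glob(pattern)):
        with open(path, "r", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue  # skip blank lines instead of failing in json.loads
                tweet = json.loads(line)
                tweets.setdefault(tweet["id"], tweet)
    return list(tweets.values())


# Demo on throwaway files; the second file repeats tweet "2" and ends with a blank line.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "tweets_a.json").write_text(
        '{"id": "1", "text": "first"}\n{"id": "2", "text": "second"}\n', encoding="utf-8"
    )
    Path(tmp, "tweets_b.json").write_text(
        '{"id": "2", "text": "second"}\n\n', encoding="utf-8"
    )
    loaded = load_jsonl(tmp)

print([t["id"] for t in loaded])
```

In the notebook itself, the call would presumably be `load_jsonl("drive/MyDrive")`, matching the directory used by `get_tweets`.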

In [69]:
tweets_list[0]["text"]

'Hurricane Ian’s rapid intensification is a sign of the world to come  https://t.co/23kd3kZPGq via @voxdotcom'