v2 Tweet, unable to unpickle using dill, max recursion reached #1792

ofloveandhate · 2022-01-27T20:40:06Z

Using the new Twitter API v2 with Tweepy 4.5.0 and dill 0.3.4, the following code raises RecursionError: maximum recursion depth exceeded:

import dill
import tweepy
from twitter_credentials import bearer_token  # local module holding the API credentials

query = 'your search here'

client = tweepy.Client(bearer_token=bearer_token, return_type=tweepy.Response)
search_result = client.search_recent_tweets(query=query, max_results=10)

t = search_result.data[0]  # a tweepy.Tweet

serialized = dill.dumps(t)  # pickling succeeds
u = dill.loads(serialized)  # unpickling raises the RecursionError

The problem seems to be in __getattr__:

  File "/Users/username/opt/anaconda3/lib/python3.8/site-packages/tweepy/mixins.py", line 30, in __getattr__
    return self.data[name]
  [Previous line repeated 992 more times]

It sure would be dandy to be able to serialize using dill!
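
The recursion reported above can be reproduced without Tweepy or dill. The sketch below is a guess at the mechanism, not Tweepy's actual code: `Unguarded` and `Guarded` are hypothetical stand-ins for a class whose `__getattr__` forwards to `self.data`. When unpickling, the instance is created without calling `__init__`, and pickle then looks up `__setstate__` on it. That lookup falls through to `__getattr__`, which reads `self.data`, which is not set yet, which calls `__getattr__` again. The guarded variant shows one possible way to break the loop; it is not necessarily what any Tweepy fix does.

```python
import pickle


class Unguarded:
    """Hypothetical stand-in for a class that forwards attribute access to self.data."""

    def __init__(self, data):
        self.data = data

    def __getattr__(self, name):
        # Called only when normal lookup fails. On a half-built instance
        # "data" itself is missing, so self.data re-enters __getattr__.
        try:
            return self.data[name]
        except KeyError:
            raise AttributeError(name) from None


class Guarded(Unguarded):
    """Same behaviour, but a missing "data" attribute fails fast."""

    def __getattr__(self, name):
        if name == "data":
            raise AttributeError(name)
        return super().__getattr__(name)


def round_trip(obj):
    """Pickle and unpickle obj, returning the restored copy."""
    return pickle.loads(pickle.dumps(obj))


def unpickle_recurses(obj):
    """Return True if restoring obj hits the recursion limit."""
    payload = pickle.dumps(obj)  # pickling itself succeeds
    try:
        pickle.loads(payload)
    except RecursionError:
        return True
    return False
```

With these definitions, `unpickle_recurses(Unguarded({"id": "1"}))` should report the failure, while `round_trip(Guarded({"id": "1"}))` should return an object with the same `data`.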


Harmon758 · 2022-01-29T17:30:44Z

What's the use case for this? Is there a reason you're trying to pickle the Tweet object rather than just storing the JSON data?

ofloveandhate · 2022-01-29T17:58:47Z

Great question. I teach a course on Python for Data Science, and I like teaching pickling/unpickling as a component of the second part of our project. In particular, I like that unpickling (hypothetically) keeps Tweet objects as Tweet objects, so there's no conversion from JSON back to tweepy.Tweet necessary. Just unpickle.

A secondary lesson I teach related to this is to collect data once, and process it as-needed. That way we have a stable data set when developing analysis code, and we're not wasting our search credits (since they're a finite resource). So I really want my students to be able to save the results of a search, and load those results back into Tweepy objects at a later time.

If you feel this use case is silly and JSON is superior in every way, I'm all ears. Or, if conversion back to Tweet is a one-liner, I'm open to that, too.

ofloveandhate · 2022-01-29T17:59:50Z

I do feel like my request is reasonable, as I was able to pickle Status objects coming from the v1 Twitter API. So this feels like a regression to me.

Harmon758 · 2022-01-29T19:07:54Z

I'm not completely averse to adding support for pickle. Dill is stated to be a drop-in replacement for pickle, so adding support for pickle should be equivalent to support for dill. As you pointed out, Twitter API v1 models/objects can be pickled (although that's not a regression, as the Twitter API v2 models/objects are a new interface). However, it's worth noting that right now, it's already very simple to get the JSON data dictionary from an API v2 object and/or use it to (re-)create the object, e.g.:

data = tweet.data
tweet = tweepy.Tweet(data)

Currently, this is the best way to retrieve the data from an API v2 object and be able to save it (e.g. as a JSON file) and load it back into a Tweepy object.
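
As a rough sketch of that save-and-reload workflow, the helpers below write one JSON object per line and read them back. The names `save_records` and `load_records` are hypothetical, not part of Tweepy, and the sketch assumes each `tweet.data` is a plain JSON-serializable dict, as described above.

```python
import json
from pathlib import Path


def save_records(records, path):
    """Write each dict in records as one line of JSON."""
    with Path(path).open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_records(path):
    """Read the dicts back, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Under those assumptions, saving would look like `save_records([t.data for t in search_result.data], "tweets.jsonl")`, and reloading like `[tweepy.Tweet(d) for d in load_records("tweets.jsonl")]`.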

I actually think this is an interesting use case, and I'm inspired that Tweepy is being used educationally. However, I'm not sure about the practical application or necessity of pickling in this case, beyond as a way of teaching pickling/unpickling. I don't think JSON is superior in every way, but as I'm sure you know, there are definitely advantages over pickling.

Both are serialization formats, so I don't think it makes too much sense to say that pickle keeps objects as themselves more so than JSON does, at least in cases like this where the objects can be represented in their entirety as JSON data (and are in fact parsed from JSON data). In both cases, you're serializing and deserializing the data. In fact, efficiency-wise, I think JSON is generally a lot faster.

Tweepy already makes the process simple for JSON because the API returns the data as JSON. I don't think there would be any advantages to using pickle over JSON for API v2 objects, but I think the fix for this error would be relatively simple.

michaelmilleryoder · 2022-08-08T20:54:31Z

I also got this error while trying to pickle Response objects. I ended up saving out each part of Response objects (.data, .includes, .errors) as JSON lines files, but it would have saved me time to simply be able to pickle them all.

harshil21 · 2022-08-12T21:23:40Z

Would also love seeing support for this, mostly for the reasons mentioned in the OP.

Harmon758 · 2022-10-27T01:08:00Z

@michaelmilleryoder Is there any reason you couldn't set the return_type to dict and save the response as JSON directly?

@harshil21 I still don't see a particular need or use case for this over storing the data as JSON, besides a specific desire to teach pickling.

michaelmilleryoder · 2022-10-27T17:11:28Z

@Harmon758 Setting return_type=dict when I initialize the tweepy Client should work! I didn't know about that option--thanks.
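
A minimal sketch of that approach, assuming `client` was created with `return_type=dict` so that `search_recent_tweets` returns a plain dict. The helper names `save_search` and `load_search` are hypothetical.

```python
import json


def save_search(client, query, path, max_results=10):
    """Run a recent-tweet search and store the whole response as JSON."""
    # Assumes the client returns a plain dict (return_type=dict).
    response = client.search_recent_tweets(query=query, max_results=max_results)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(response, fh, ensure_ascii=False)
    return response


def load_search(path):
    """Load a response previously written by save_search."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

This keeps every part of the response in one file, so nothing has to be saved separately.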

qin-yu · 2022-11-17T16:41:23Z

I personally prefer plain JSON format for storing and transporting data.

JSON is the default for, say, "web", and Twitter APIs provide data encoded using JSON too. I would say pickling in Tweepy reduces interoperability, readability, and security. Use pickle only if you are sure it brings more benefits.

(I hope I didn't misunderstand the Request For Comments (RFC) label)
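
To illustrate the security point, here is a small sketch of why loading a pickle from an untrusted source is risky: the stream can name any importable callable, and `pickle.loads` will call it. The `Payload` class is hypothetical, and `len` is a harmless stand-in for whatever an attacker might choose.

```python
import pickle


class Payload:
    """Hypothetical object whose pickle stream calls a function on load."""

    def __reduce__(self):
        # (callable, args): on load, pickle calls len("abc") and returns
        # the result instead of rebuilding a Payload instance.
        return (len, ("abc",))


stream = pickle.dumps(Payload())
result = pickle.loads(stream)  # executes the callable stored in the stream
```

JSON parsing has no equivalent hook, which is one reason it is the safer choice for files that get shared.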

achimgaedke · 2023-02-03T09:30:45Z

PR #2060 fixes this issue as well.

Changing the unittest accordingly verifies that. (I have not added this, as it would add a new code dependency.)

import dill
import unittest

from tweepy.tweet import Tweet


class TweepyTweetTests(unittest.TestCase):
    def test_tweet(self):
        t = Tweet(
            data={
                "edit_history_tweet_ids": ["16213149247090909"],
                "id": "16213149247090909",
                "text": "We're super excited to not only welcome the Flamingo Janes in...",
            }
        )

        pickled_tweet = dill.dumps(t)
        t2 = dill.loads(pickled_tweet)

        self.assertDictEqual(t.data, t2.data)

solstag · 2023-03-09T10:09:14Z

Ni! Thanks @achimgaedke , that worked for me.

My use case was quite straightforward: it allowed me to recover a large dataset that a colleague had (unthoughtfully, yet conveniently for them) stored as a pickle.

@Harmon758 : I think it is a problem that things can be pickled but not unpickled. If you don't plan to support unpickling, then please make it so that pickling doesn't give the false impression that the data was properly stored and can be recovered. There are quite a few people who are used to pickling stuff for quick storage and will fall into the trap, possibly losing their work.

Best!

Harmon758 added the Feature Request and Need Follow-Up labels · Jan 29, 2022

Harmon758 added the RFC (Request For Comments) label and removed the Need Follow-Up label · Oct 27, 2022

achimgaedke mentioned this issue Feb 3, 2023

Tweet object fails to unpickle with built-in pickle #2059

Closed
