v2 Tweet, unable to unpickle using dill, max recursion reached #1792

Open
ofloveandhate opened this issue Jan 27, 2022 · 11 comments
Labels
Feature Request, RFC

Comments

@ofloveandhate

Using the new Twitter API v2 with Tweepy 4.5.0 and dill 0.3.4, the following code raises RecursionError: maximum recursion depth exceeded:

import dill
import tweepy
from twitter_credentials import bearer_token

query = 'your search here'

client = tweepy.Client(bearer_token=bearer_token, return_type=tweepy.Response)
search_result = client.search_recent_tweets(query=query, max_results=10)

t = search_result.data[0]

serialized = dill.dumps(t)
u = dill.loads(serialized)  # RecursionError is raised here, while unpickling

The problem seems to be in __getattr__:

  File "/Users/username/opt/anaconda3/lib/python3.8/site-packages/tweepy/mixins.py", line 30, in __getattr__
    return self.data[name]
  [Previous line repeated 992 more times]
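
A likely explanation, sketched with hypothetical stand-in classes (NaiveModel, GuardedModel, and restore_like_pickle are illustrative names, not Tweepy or dill code): the unpickler creates the object without calling __init__ and then looks up __setstate__ on it. That lookup falls through to __getattr__, which reads self.data before data exists, which calls __getattr__ again, and so on. Guarding the data lookup is one possible way to break the cycle; it is an assumption, not necessarily the fix Tweepy would adopt.

```python
class NaiveModel:
    """Hypothetical stand-in that forwards unknown attributes to self.data."""

    def __init__(self, data):
        self.data = data

    def __getattr__(self, name):
        # Only called when normal lookup fails. If "data" itself is missing,
        # self.data re-enters __getattr__ and recurses without end.
        try:
            return self.data[name]
        except KeyError:
            raise AttributeError(name) from None


class GuardedModel:
    """Same idea, but refuses to forward the lookup of "data" itself."""

    def __init__(self, data):
        self.data = data

    def __getattr__(self, name):
        if name == "data":
            # data has not been restored yet, so report a normal miss.
            raise AttributeError(name)
        try:
            return self.data[name]
        except KeyError:
            raise AttributeError(name) from None


def restore_like_pickle(cls, state):
    """Mimic the unpickler's steps: skip __init__, then apply the state."""
    obj = cls.__new__(cls)
    setstate = getattr(obj, "__setstate__", None)  # triggers __getattr__
    if setstate is not None:
        setstate(state)
    else:
        obj.__dict__.update(state)
    return obj


state = {"data": {"id": "1", "text": "hello"}}

restored = restore_like_pickle(GuardedModel, state)
print(restored.text)

try:
    restore_like_pickle(NaiveModel, state)
except RecursionError:
    print("NaiveModel hit the recursion limit while being restored")
```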

It sure would be dandy to be able to serialize using dill!

@Harmon758
Member

What's the use case for this? Is there a reason you're trying to pickle the Tweet object rather than just storing the JSON data?

Harmon758 added the Feature Request and Need Follow-Up labels on Jan 29, 2022
@ofloveandhate
Author

Great question. I teach a course on Python for Data Science, and I like teaching pickling/unpickling as a component of the second part of our project. In particular, I like that unpickling (hypothetically) keeps Tweet objects as Tweet objects, so there's no conversion from JSON back to tweepy.Tweet necessary. Just unpickle.

A secondary lesson I teach related to this is to collect data once and process it as needed. That way we have a stable data set when developing analysis code, and we're not wasting our search credits (since they're a finite resource). So I really want my students to be able to save the results of a search, and load those results back into Tweepy objects at a later time.

If you feel this use case is silly and JSON is superior in every way, I'm all ears. Or, if conversion back to Tweet is a one-liner, I'm open to that, too.

@ofloveandhate
Author

I do feel like my request is reasonable, as I was able to pickle Status objects coming from the v1 Twitter API. So this feels like a regression to me.

@Harmon758
Member

Harmon758 commented Jan 29, 2022

I'm not completely averse to adding support for pickle. Dill is stated to be a drop-in replacement for pickle, so adding support for pickle should be equivalent to support for dill. As you pointed out, Twitter API v1 models/objects can be pickled (although that's not a regression, as the Twitter API v2 models/objects are a new interface). However, it's worth noting that right now, it's already very simple to get the JSON data dictionary from an API v2 object and/or use it to (re-)create the object, e.g.:

data = tweet.data
tweet = tweepy.Tweet(data)

Currently, this is the best way to retrieve the data from an API v2 object and be able to save it (e.g. as a JSON file) and load it back into a Tweepy object.
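
As a rough sketch of that save-and-reload workflow, the helpers below write each object's data dictionary as one JSON object per line and rebuild objects from the file later. The names save_models and load_models, the file name, and the FakeTweet stand-in are hypothetical and only there so the example runs without API access.

```python
import json
import os
import tempfile


def save_models(models, path):
    """Write each model's raw data dict as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for model in models:
            f.write(json.dumps(model.data) + "\n")


def load_models(path, factory):
    """Rebuild one object per line by passing the parsed dict to factory."""
    with open(path, encoding="utf-8") as f:
        return [factory(json.loads(line)) for line in f if line.strip()]


class FakeTweet:
    """Hypothetical stand-in for an object that wraps a data dict."""

    def __init__(self, data):
        self.data = data


path = os.path.join(tempfile.mkdtemp(), "tweets.jsonl")
save_models([FakeTweet({"id": "1", "text": "hello"})], path)
reloaded = load_models(path, FakeTweet)
print(reloaded[0].data)
```

With real results, the same helpers would presumably be called as save_models(search_result.data, "tweets.jsonl") and load_models("tweets.jsonl", tweepy.Tweet), relying on the tweepy.Tweet(data) constructor shown above.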

I actually think this is an interesting use case, and I'm inspired that Tweepy is being used educationally. However, I'm not sure about the practical application or necessity of pickling in this case, beyond as a way of teaching pickling/unpickling. I don't think JSON is superior in every way, but as I'm sure you know, there are definitely advantages over pickling.

Both are serialization formats, so I don't think it makes too much sense to say that pickle keeps objects as themselves more so than JSON does, at least in cases like this where the objects can be represented in their entirety as JSON data (and are in fact parsed from JSON data). In both cases, you're serializing and deserializing the data. In fact, efficiency-wise, I think JSON is generally a lot faster.

Tweepy already makes the process simple for JSON because the API returns the data as JSON. I don't think there would be any advantages to using pickle over JSON for API v2 objects, but I think the fix for this error would be relatively simple.

@michaelmilleryoder

I also got this error while trying to pickle Response objects. I ended up saving out each part of Response objects (.data, .includes, .errors) as JSON lines files, but it would have saved me time to simply be able to pickle them all.

@harshil21

Would also love seeing support for this, mostly for the reasons mentioned in the OP.

@Harmon758
Member

@michaelmilleryoder Is there any reason you couldn't set the return_type to dict and save the response as JSON directly?

@harshil21 I still don't see a particular need or use case for this over storing the data as JSON, besides a specific desire to teach pickling.

@michaelmilleryoder

@Harmon758 Setting return_type=dict when I initialize the tweepy Client should work! I didn't know about that option--thanks.
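
For anyone else landing here, a minimal sketch of that approach: once the client returns plain dictionaries, a whole response can go to a single JSON file and be read back later. The helper names dump_response and load_response and the sample dictionary are made up for illustration; the real response layout depends on the API call.

```python
import json
import os
import tempfile
from pathlib import Path


def dump_response(response, path):
    """Save a dict response as a single UTF-8 JSON file."""
    Path(path).write_text(
        json.dumps(response, ensure_ascii=False), encoding="utf-8"
    )


def load_response(path):
    """Read a response previously saved with dump_response."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Made-up sample standing in for what the client would return.
sample = {"data": [{"id": "1", "text": "hello"}], "meta": {"result_count": 1}}

path = os.path.join(tempfile.mkdtemp(), "search.json")
dump_response(sample, path)
print(load_response(path) == sample)
```

In practice the dictionary would come from a client created with tweepy.Client(bearer_token=bearer_token, return_type=dict), e.g. dump_response(client.search_recent_tweets(query=query, max_results=10), "search.json").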

Harmon758 added the RFC label and removed the Need Follow-Up label on Oct 27, 2022
@qin-yu
Contributor

qin-yu commented Nov 17, 2022

I personally prefer plain JSON format for storing and transporting data.

JSON is the default format on the web, and the Twitter API provides JSON-encoded data too. I would say pickling in Tweepy reduces interoperability, readability, and security. Use pickle only if you are sure it brings more benefits.

(I hope I didn't misunderstand the Request For Comments (RFC) label)

@achimgaedke

PR #2060 fixes this issue as well.

Changing the unit test as follows verifies that. (I have not added this to the PR, as it would add a new code dependency.)

import dill
import unittest

from tweepy.tweet import Tweet


class TweepyTweetTests(unittest.TestCase):
    def test_tweet(self):
        t = Tweet(
            data={
                "edit_history_tweet_ids": ["16213149247090909"],
                "id": "16213149247090909",
                "text": "We're super excited to not only welcome the Flamingo Janes in...",
            }
        )

        pickled_tweet = dill.dumps(t)
        t2 = dill.loads(pickled_tweet)

        self.assertDictEqual(t.data, t2.data)

@solstag

solstag commented Mar 9, 2023

Ni! Thanks @achimgaedke, that worked for me.

My use case was quite straightforward: it allowed me to recover a large dataset that a colleague had (unthoughtfully, yet conveniently for them) stored as a pickle.

@Harmon758: I think it is a problem that things can be pickled but not unpickled. If you don't plan to support unpickling, then please make it so that pickling doesn't give the false impression that the data was properly stored and can be recovered. There are quite a few people who are used to pickling stuff for quick storage and will fall into the trap, possibly losing their work.

Best!
