v2 Tweet, unable to unpickle using dill, max recursion reached #1792
Comments
What's the use case for this? Is there a reason you're trying to pickle the Tweet object rather than just storing the JSON data? |
Great question. I teach a course on Python for Data Science, and I like teaching pickling/unpickling as a component of the second part of our project. In particular, I like that unpickling (hypothetically) keeps A secondary lesson I teach related to this is to collect data once, and process it as-needed. That way we have a stable data set when developing analysis code, and we're not wasting our search credits (since they're a finite resource). So I really want my students to be able to save the results of a search, and load those results back into Tweepy objects at a later time. If you feel this use case is silly and JSON is superior in every way, I'm all ears. Or, if conversion back to |
I do feel like my request is reasonable, as I was able to pickle |
I'm not completely averse to adding support for pickle. Dill is stated to be a drop-in replacement for pickle, so adding support for pickle should be equivalent to support for dill. As you pointed out, Twitter API v1 models/objects can be pickled (although that's not a regression, as the Twitter API v2 models/objects are a new interface). However, it's worth noting that right now, it's already very simple to get the JSON data dictionary from an API v2 object and/or use it to (re-)create the object, e.g.:
data = tweet.data
tweet = tweepy.Tweet(data)
Currently, this is the best way to retrieve the data from an API v2 object and be able to save it (e.g. as a JSON file) and load it back into a Tweepy object. I actually think this is an interesting use case, and I'm inspired that Tweepy is being used educationally. However, I'm not sure about the practical application or necessity of pickling in this case, beyond as a way of teaching pickling/unpickling. I don't think JSON is superior in every way, but as I'm sure you know, there are definitely advantages over pickling. Both are serialization formats, so I don't think it makes too much sense to say that pickle keeps objects as themselves more so than JSON does, at least in cases like this where the objects can be represented in their entirety as JSON data (and are in fact parsed from JSON data). In both cases, you're serializing and deserializing the data. In fact, efficiency-wise, I think JSON is generally a lot faster. Tweepy already makes the process simple for JSON because the API returns the data as JSON. I don't think there would be any advantages to using pickle over JSON for API v2 objects, but I think the fix for this error would be relatively simple. |
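For anyone who wants the "collect once, process later" workflow without pickle, the JSON round trip described above could look roughly like the sketch below. It works on plain data dictionaries and only uses the standard library; the helper names `save_records` and `load_records` and the file name `tweets.jsonl` are made up for illustration and are not part of Tweepy.

```python
import json
import tempfile
from pathlib import Path


def save_records(records, path):
    """Write one JSON object per line (JSON Lines). Hypothetical helper."""
    with Path(path).open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_records(path):
    """Read the dictionaries back, skipping blank lines. Hypothetical helper."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Stand-in for [tweet.data for tweet in tweets]; no API call is made here.
sample = [
    {
        "edit_history_tweet_ids": ["16213149247090909"],
        "id": "16213149247090909",
        "text": "We're super excited to not only welcome the Flamingo Janes in...",
    }
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "tweets.jsonl"
    save_records(sample, path)
    reloaded = load_records(path)
```

With Tweepy installed, the same helpers would presumably be fed `[tweet.data for tweet in tweets]` on the way out, and each reloaded dictionary passed to `tweepy.Tweet(data)` on the way back in, as in the two-line example above.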
I also got this error while trying to pickle Response objects. I ended up saving out each part of Response objects (.data, .includes, .errors) as JSON lines files, but it would have saved me time to simply be able to pickle them all. |
Would also love seeing support for this, mostly for the reasons mentioned in the OP. |
@michaelmilleryoder Is there any reason you couldn't set the @harshil21 I still don't see a particular need or use case for this over storing the data as JSON, besides a specific desire to teach pickling. |
@Harmon758 Setting |
I personally prefer plain JSON format for storing and transporting data. JSON is the default for, say, "web", and Twitter APIs provide data encoded using JSON too. I would say pickling in Tweepy reduces interoperability, readability, and security. Use pickle only if you are sure it brings more benefits. (I hope I didn't misunderstand the |
PR #2060 fixes this issue as well. Changing the unit test accordingly verifies that. (I have not added this, as it would add a new code dependency.)
import dill
import unittest
from tweepy.tweet import Tweet
class TweepyTweetTests(unittest.TestCase):
    def test_tweet(self):
        t = Tweet(
            data={
                "edit_history_tweet_ids": ["16213149247090909"],
                "id": "16213149247090909",
                "text": "We're super excited to not only welcome the Flamingo Janes in...",
            }
        )
        pickled_tweet = dill.dumps(t)
        t2 = dill.loads(pickled_tweet)
        self.assertDictEqual(t.data, t2.data) |
Ni! Thanks @achimgaedke , that worked for me. My use case was quite straightforward: it allowed me to recover a large dataset that a colleague had (unthoughtfully, yet conveniently for them) stored as a pickle. @Harmon758 : I think it is a problem that things can be pickled but not unpickled. If you don't plan to support unpickling, then please make it so that pickling doesn't give the false impression that the data was properly stored and can be recovered. There are quite a few people who are used to pickling stuff for quick storage and will fall into the trap, possibly losing their work. Best! |
Using the new v2 Twitter API and version 4.5.0 of Tweepy, with version 0.3.4 of dill, the following code results in a RecursionError: maximum recursion depth exceeded:
The problem seems to be in __getattr__:
It sure would be dandy to be able to serialize using dill!
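As a rough illustration of how a `__getattr__` that reads `self.data` can recurse during unpickling, here is a minimal sketch. The classes `Fragile` and `Guarded` and the helper `simulate_unpickle` are hypothetical stand-ins, not Tweepy's actual code, and the guard shown is only one possible fix, not necessarily the one Tweepy adopted. The helper mimics the relevant steps of unpickling (create the instance without calling `__init__`, then probe for `__setstate__`) rather than calling pickle or dill directly.

```python
class Fragile:
    """Hypothetical stand-in for a data-backed model; not Tweepy's class."""

    def __init__(self, data):
        self.data = data

    def __getattr__(self, name):
        # Only called when normal lookup fails. If "data" itself is missing,
        # self.data re-enters __getattr__("data") and never terminates.
        try:
            return self.data[name]
        except KeyError:
            raise AttributeError(name) from None


class Guarded(Fragile):
    """Same lookup, but never resolves "data" or dunder names via the mapping."""

    def __getattr__(self, name):
        if name == "data" or (name.startswith("__") and name.endswith("__")):
            raise AttributeError(name)
        return super().__getattr__(name)


def simulate_unpickle(cls, state):
    """Mimic unpickling: __new__ without __init__, then a __setstate__ probe."""
    obj = cls.__new__(cls)  # no attributes have been set at this point
    setstate = getattr(obj, "__setstate__", None)
    if setstate is not None:
        setstate(state)
    else:
        obj.__dict__.update(state)  # default behaviour: restore attributes
    return obj


state = {"data": {"id": "16213149247090909"}}

try:
    simulate_unpickle(Fragile, state)
    fragile_recursed = False
except RecursionError:
    fragile_recursed = True

restored = simulate_unpickle(Guarded, state)
```

In this sketch the unguarded class fails on the `__setstate__` probe because `data` does not exist yet on the freshly created instance, while the guarded class raises a plain `AttributeError` for that probe and is then restored normally.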