For timesearch, we'll solve the `parent_id` problem by transforming this data in `DummyObject`, so that by the time it reaches TSDB it is back to normal.

     def __init__(self, **attributes):
         for (key, val) in attributes.items():
             if key == 'author':
                 val = DummyObject(name=val)
             elif key == 'subreddit':
                 val = DummyObject(display_name=val)
             elif key in ['body', 'selftext']:
                 val = html.unescape(val)
    +        elif key == 'parent_id':
    +            if val is None:
    +                val = attributes['link_id']
    +            elif isinstance(val, int):
    +                val = 't1_' + common.b36(val)

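For anyone without the timesearch source at hand, here is a self-contained sketch of the same normalization. `b36` and `normalize_parent_id` are hypothetical names used only for illustration; `b36` is assumed to behave like `common.b36`, encoding a non-negative integer in lowercase base 36, the alphabet reddit uses for its IDs.

```python
from typing import Optional, Union

ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyz'


def b36(n: int) -> str:
    """Encode a non-negative integer in lowercase base 36.

    Hypothetical stand-in for timesearch's common.b36 (assumed equivalent).
    """
    if n < 0:
        raise ValueError('b36 expects a non-negative integer')
    if n == 0:
        return '0'
    digits = []
    while n:
        n, rem = divmod(n, 36)
        digits.append(ALPHABET[rem])
    return ''.join(reversed(digits))


def normalize_parent_id(parent_id: Optional[Union[str, int]], link_id: str) -> str:
    """Return parent_id as a reddit fullname, mirroring the DummyObject change."""
    if parent_id is None:
        # Root comment: its parent is the submission, so reuse link_id ('t3_...').
        return link_id
    if isinstance(parent_id, int):
        # Reply: the integer is assumed to be the base-10 value of the parent
        # comment's base-36 ID, so re-encode it and add the comment prefix.
        return 't1_' + b36(parent_id)
    # Already a fullname such as 't1_...' or 't3_...'; leave it unchanged.
    return parent_id
```

For example, `normalize_parent_id(10, 't3_abc')` returns `'t1_a'`, and `normalize_parent_id(None, 't3_abc')` returns `'t3_abc'`.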
If you have a timesearch database and need to repair the `parent_id` data, you do not need to re-download any of those comments. Just take all rows where the parent is null and copy in the submission ID, and take all rows where it is an integer and run them through the b36 function, prefixing the result with `t1_`.
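A minimal sketch of that repair using Python's built-in `sqlite3` module. The schema here is an assumption, not taken from timesearch: a `comments` table with `idstr`, `parent`, and `submission` columns, where `submission` holds the submission fullname (e.g. `'t3_abc'`). Check the real column names in your database and back the file up before running anything like this. `b36` is again a hypothetical stand-in for `common.b36`.

```python
import sqlite3

ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyz'


def b36(n: int) -> str:
    """Lowercase base-36 encoder (hypothetical stand-in for common.b36)."""
    if n == 0:
        return '0'
    digits = []
    while n:
        n, rem = divmod(n, 36)
        digits.append(ALPHABET[rem])
    return ''.join(reversed(digits))


def repair_parent_ids(conn: sqlite3.Connection) -> int:
    """Fix parent values written from bad Pushshift data; return rows changed.

    Assumes a hypothetical schema: comments(idstr, parent, submission).
    """
    changed = 0
    # Read everything first so we are not updating the table mid-iteration.
    rows = conn.execute('SELECT idstr, parent, submission FROM comments').fetchall()
    for idstr, parent, submission in rows:
        if parent is None:
            # Root comment: copy the submission ID.
            new_parent = submission
        elif isinstance(parent, int) or (isinstance(parent, str) and parent.isdecimal()):
            # In a TEXT column SQLite stores the integer as a digit string,
            # so accept both forms before re-encoding.
            new_parent = 't1_' + b36(int(parent))
        else:
            # Already a proper fullname; nothing to do.
            continue
        conn.execute('UPDATE comments SET parent = ? WHERE idstr = ?', (new_parent, idstr))
        changed += 1
    conn.commit()
    return changed
```

Running it a second time should change nothing, since every repaired value is a prefixed fullname.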
When requesting comments from the Pushshift API, `parent_id` is coming back as null for root comments and as integers for reply comments. Another user reported this issue on reddit: https://www.reddit.com/r/pushshift/comments/ujwdyt/parent_id_is_being_returned_as_integer_bug/