Recovering from exception #2

Closed
Brotakuu opened this issue Mar 16, 2018 · 2 comments
Brotakuu commented Mar 16, 2018

After running timesearch through a huge sub, the bot exited with an exception.

Is there a way to resume progress from where it exited? Running timesearch again only grabs the most recent threads from the top (it does not attempt to continue where it left off).

Also: any idea what might be the cause? (I'm running two instances with different apps configured on macOS.)

Jul 14 2015 13:28:52 - Jul 14 2015 12:37:17 +100
Jul 14 2015 12:36:47 - Jul 14 2015 11:16:45 +100
Jul 14 2015 11:16:19 - Jul 14 2015 09:50:38 +100
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 383, in _make_request
    httplib_response = conn.getresponse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1331, in getresponse
    response.begin()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 258, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 1009, in recv_into
    return self.read(nbytes, buffer)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 871, in read
    return self._sslobj.read(len, buffer)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 631, in read
    v = self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 639, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/util/retry.py", line 357, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/packages/six.py", line 686, in reraise
    raise value
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 389, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 309, in _raise_timeout
    raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='oauth.reddit.com', port=443): Read timed out. (read timeout=16.0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/prawcore/requestor.py", line 47, in request
    return self._http.request(*args, timeout=TIMEOUT, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/adapters.py", line 521, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='oauth.reddit.com', port=443): Read timed out. (read timeout=16.0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "timesearch.py", line 11, in <module>
    status_code = timesearch.main(sys.argv[1:])
  File "/Users/1/ts/timesearch/__init__.py", line 425, in main
    args.func(args)
  File "/Users/1/ts/timesearch/__init__.py", line 329, in timesearch_gateway
    timesearch.timesearch_argparse(args)
  File "/Users/1/ts/timesearch/timesearch.py", line 152, in timesearch_argparse
    interval=common.int_none(args.interval),
  File "/Users/1/ts/timesearch/timesearch.py", line 78, in timesearch
    for chunk in submissions:
  File "/Users/1/ts/timesearch/common.py", line 66, in generator_chunker
    for item in generator:
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/praw/models/reddit/subreddit.py", line 451, in submissions
    sort='new', syntax='cloudsearch'):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/praw/models/listing/generator.py", line 52, in __next__
    self._next_batch()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/praw/models/listing/generator.py", line 62, in _next_batch
    self._listing = self._reddit.get(self.url, params=self.params)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/praw/reddit.py", line 367, in get
    data = self.request('GET', path, params=params)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/praw/reddit.py", line 472, in request
    params=params)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/prawcore/sessions.py", line 181, in request
    params=params, url=url)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/prawcore/sessions.py", line 112, in _request_with_retries
    data, files, json, method, params, retries, url)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/prawcore/sessions.py", line 97, in _make_request
    params=params)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/prawcore/rate_limit.py", line 33, in call
    response = request_function(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/prawcore/requestor.py", line 49, in request
    raise RequestException(exc, args, kwargs)
prawcore.exceptions.RequestException: error with request HTTPSConnectionPool(host='oauth.reddit.com', port=443): Read timed out. (read timeout=16.0)
voussoir (Owner) commented

Sorry for the inconvenience, this is another artifact of the PRAW3-to-PRAW4 transition. Posts used to be collected oldest-first, which is why a second run only checks for newer posts. Now they're collected newest-first, which isn't as good for resuming.

You can still provide the --upper and --lower arguments on the command line. --lower should be the timestamp at which the subreddit was created, which you can find in its about.json:

https://www.reddit.com/r/askreddit/about.json (search for "created_utc")
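
If you want to grab that value programmatically rather than reading the JSON by hand, here is a minimal sketch using the requests library (the User-Agent string is just a placeholder; reddit only asks that it be descriptive):

    # Sketch: read a subreddit's creation timestamp from its about.json.
    import requests

    resp = requests.get(
        'https://www.reddit.com/r/askreddit/about.json',
        headers={'User-Agent': 'timesearch-resume-example/1.0'},
    )
    resp.raise_for_status()
    created_utc = int(resp.json()['data']['created_utc'])
    print(created_utc)  # use this as the --lower argument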

For --upper you can use the timestamp from just before it crashed. July 14 2015 is approximately 1436897779, but you may want to set it a bit higher to account for possible timezone offsets.
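
If you'd rather compute --upper from the last timestamp the bot printed before crashing, a rough sketch (this assumes the logged times are roughly UTC; the extra day of slack covers timezone offsets):

    # Sketch: turn "Jul 14 2015 13:28:52" from the log into an --upper value.
    from datetime import datetime, timezone

    last_seen = datetime(2015, 7, 14, 13, 28, 52, tzinfo=timezone.utc)
    upper = int(last_seen.timestamp()) + 86400  # one day of slack
    print(upper)  # use this as the --upper argument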

TLDR:

> timesearch timesearch -r askreddit --upper 1436984179 --lower 1201146735

voussoir (Owner) commented

The cause is just a read timeout, which means the website was probably too busy. Timestamp searching is fairly expensive, which is probably one of the reasons they're killing it / the new platform doesn't support it. It's not your fault.
