Skip to content
This repository has been archived by the owner on Oct 5, 2020. It is now read-only.

fix aiohttp>=2.0.0 exception handling #67

Merged
merged 8 commits into from Apr 25, 2017
Merged

fix aiohttp>=2.0.0 exception handling #67

merged 8 commits into from Apr 25, 2017

Conversation

escapewindow
Copy link
Contributor

aiohttp 2.0.0 fixes a lot of things, but is also backwards incompatible.

This PR fixes taskcluster-client async calls when using aiohttp>=2.0.0, and requires it.

http://aiohttp.readthedocs.io/en/stable/migration.html

@escapewindow
Copy link
Contributor Author

Do you think requiring aiohttp>=2.0.0 for py35 is ok @jhford ? We have that dependency for scriptworker already.

Copy link
Contributor

@jhford jhford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I think Releng is the only team which is making use of the Async client, so I'll defer to your judgement on whether that dependency change is OK. I would like us to make a major version bump if we're requiring a new version of aiohttp, since that is a breaking change, I think.

One question I did have is about the comment of the PR. You said that this PR will fix the taskcluster client calls, but I only see changes about exception catching and the version change. Is the issue it's fixing around exception catching or is it something else?

@@ -144,7 +143,7 @@ class AsyncBaseClient(BaseClient):
if response.status == 204:
return None

except aiohttp.HttpProcessingError as rerr:
except aiohttp.ClientError as rerr:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs suggest to me that it's aiohttp.ClientResponseError here?

http://aiohttp.readthedocs.io/en/stable/client_reference.html#aiohttp.ClientResponse.raise_for_status

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this a little more, since the default in aiohttp is to follow redirects, and raise_for_status() only raises for http status code >= 400, and that not all 300s are automatically follow-able redirects, and none (I think) contain a valid response body, we should probably check the status code and throw if <200 or >= 300 and not use raise_for_status().

The requests library also has this issue. Would you be able to update the sync client to match this behaviour? The aiohttp docs seem to be better than the requests docs, since the latter requires you to read the code to spot that only >=400 is throwing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! Done.

@@ -128,7 +127,7 @@ class AsyncBaseClient(BaseClient):
log.debug('Making attempt %d', retry)
try:
response = await asyncutils.makeSingleHttpRequest(method, url, payload, headers)
except requests.exceptions.RequestException as rerr:
except aiohttp.ClientError as rerr:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch about the requests library exception catching.

setup.py Outdated
@@ -64,7 +64,7 @@ def run_tests(self):
raise Exception('this library does not support python >2 and <3.5')
elif sys.version_info[:2] >= (3, 5):
install_requires.extend([
'aiohttp',
'aiohttp>=2.0.0',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though aiohttp uses an attempt at being humorous for a versioning scheme, the scheme they uses still attempts to encode information. I think.

Can we use >=2.0.0,<3? Also, maybe add something for async_timeout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, and gives us a little ascii heart to boot :)

@escapewindow
Copy link
Contributor Author

You said that this PR will fix the taskcluster client calls, but I only see changes about exception catching and the version change. Is the issue it's fixing around exception catching or is it something else?

aiohttp.HttpProcessingError doesn't exist anymore, so I was hitting uncaught exceptions in tests... this likely is hitting us in production as well.

There are two more issues I wanted to address in this PR, but didn't figure out how:

  • Unclosed client session: this is usually handled by a with aiohttp.ClientSession(...) as session:, but I didn't find a place to put the with statement that would be compatible with sync.
  • Creating a client session outside of coroutine - similar to above. Since aiohttp is async, it expects the ClientSession to be created inside an async coroutine rather than a synchronous function. But I'm not sure how to do this with the parallel sync/async structure.

Because of those two issues, we end up with spurious errors in logs whenever we create or destroy async taskcluster objects.

@escapewindow
Copy link
Contributor Author

Ready for re-review, I believe.

setup.py Outdated
@@ -64,8 +64,8 @@ def run_tests(self):
raise Exception('this library does not support python >2 and <3.5')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is pre-existing, but isn't the exception message here incorrect?

Shouldn't it be:
this library does not support python >3 and <3.5

Or perhaps clearer:
This library does not support Python 3 versions below 3.5.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the error message was correct. The suggested this library does not support python >3 and <3.5 mean it doesn't support version 4 or less than 3.5, which isn't right.

I'm happy to have the error string that uses real English too :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow.

The old error message implied Python 2.7 wasn't supported, but it is.

Both of the proposed error messages don't comment about Python 2.x, but do say that 3.0-3.4 are not supported, which seems to have been the intention all along.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was using pythonesque boolean logic, but your suggestion of "This library does not support Python 3 versions below 3.5." is much, much better!

Copy link
Contributor

@jhford jhford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to avoid creating aiohttp/requests internal exceptions in our code, but otherwise this looks great. Let me know if you'd like me to take a look at that.

status = response.status
# Handle non 2xx status code and retry if possible
if status < 200 or status >= 300:
rerr = aiohttp.ClientResponseError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to create this exception, at least not here. Let's just throw the TaskclusterRestFailure with these values as own properties.

My concern is that the cause of the exception isn't really in the aiohttp or request library, so it shouldn't have an exception from those packages. We should still be catching errors from the client library, but independently from status checks. If you'd like, I can take a look at this in the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not raising the exception, just handling it like it would look if we had caught raise_for_status, if raise_for_status also raised for <200 or >=300. The alternative was to craft an error message string, and then I have no exception to set as superExc in the taskcluster exception. I wasn't sure if anything downstream is assuming there is an exception set in superExc, otherwise I would have set the error string there.

So we're still raising taskcluster exceptions; we're just using rerr as the source of message in log.warn, and as the superExc when we're raising a taskcluster exception.

Do you still want changes here?

# error message for status >= 400, and craft one for other
# non-2xx statuses.
if status < 200 or status >= 300:
raise requests.exceptions.RequestException(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like the async, let's throw a taskcluster exception with these as properties.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm explicitly raising the exception that we're catching in the except line below, so we don't have to deal with exceptions non-2xx statuses in more than one place. Do you still object?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like manually creating requests library exceptions outside of that package. I think that like the async version, we should not call the raise_for_status method, meaning we won't throw any exception for a valid non-200 request. Instead, we should handle the different non-exceptional status codes as application logic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this is what I'm referring to, similar to the changes in async. I just don't really want the control flow to differ that much between sync and async when it doesn't have to.

diff --git a/taskcluster/client.py b/taskcluster/client.py
index 08bd2dc..23a715d 100644
--- a/taskcluster/client.py
+++ b/taskcluster/client.py
@@ -408,57 +408,57 @@ class BaseClient(object):
 
             # Handle non 2xx status code and retry if possible
             status = response.status_code
-            try:
-                response.raise_for_status()
-                if status == 204:
-                    return None
-
-                # raise_for_status only catches status >= 400, but it also
-                # customizes the error message.  Let's use that customized
-                # error message for status >= 400, and craft one for other
-                # non-2xx statuses.
-                if status < 200 or status >= 300:
-                    raise requests.exceptions.RequestException(
-                        '%s Error: %s for url: %s' % (
-                            response.status_code, response.reason, response.url
-                        ), response=response
-                    )
+            if status == 204:
+                return None
 
-            except requests.exceptions.RequestException as rerr:
-                status = response.status_code
-                if 500 <= status and status < 600 and retry < retries:
-                    log.warn('Retrying because of: %s' % rerr)
-                    continue
-                # Parse messages from errors
-                data = {}
-                try:
-                    data = response.json()
-                except:
-                    pass  # Ignore JSON errors in error messages
-                # Find error message
-                message = "Unknown Server Error"
-                if isinstance(data, dict):
-                    message = data.get('message')
-                else:
-                    if status == 401:
-                        message = "Authentication Error"
-                    elif status == 500:
-                        message = "Internal Server Error"
-                # Raise TaskclusterAuthFailure if this is an auth issue
-                if status == 401:
-                    raise exceptions.TaskclusterAuthFailure(
-                        message,
-                        status_code=status,
-                        body=data,
-                        superExc=rerr
-                    )
-                # Raise TaskclusterRestFailure for all other issues
+            # Catch retryable errors and go to the beginning of the loop
+            # to do the retry
+            if 500 <= status and status < 600 and retry < retries:
+                log.warn('Retrying because of a %s status code' % status)
+                continue
+
+            # Throw errors for non-retryable errors
+            if status < 200 or status >= 300:
                 raise exceptions.TaskclusterRestFailure(
+                    '%s Error: %s for url: %s' % (
+                        response.status_code, response.reason, response.url
+                    ), 
+                    status_code=status,
+                    response=response,
+                    body=data,
+                    superExc=None,
+                )
+
+            # Handle valid responses
+            data = {}
+            try:
+                data = response.json()
+            except:
+                pass  # Ignore JSON errors in error messages
+            # Find error message
+            message = "Unknown Server Error"
+            if isinstance(data, dict):
+                message = data.get('message')
+            else:
+                if status == 401:
+                    message = "Authentication Error"
+                elif status == 500:
+                    message = "Internal Server Error"
+            # Raise TaskclusterAuthFailure if this is an auth issue
+            if status == 401:
+                raise exceptions.TaskclusterAuthFailure(
                     message,
                     status_code=status,
                     body=data,
-                    superExc=rerr
+                    superExc=None
                 )
+            # Raise TaskclusterRestFailure for all other issues
+            raise exceptions.TaskclusterRestFailure(
+                message,
+                status_code=status,
+                body=data,
+                superExc=None
+            )
 
             # Try to load JSON
             try:

@jhford
Copy link
Contributor

jhford commented Apr 7, 2017

Regarding the coroutine and context manager stuff, would you mind giving me a quick snippet of code that would trigger each of the errors and put the stuff we'd need to change in comments? I'm not an expert at python async/await, so I'm not 100% clear on the error here. I think it would be great for us to solve the problem though.

@escapewindow
Copy link
Contributor Author

escapewindow commented Apr 7, 2017

Here's a test script:

#!/usr/bin/env python
import aiohttp
import asyncio
from taskcluster.async import Queue

async def error_msg():
    # This actually looks like it's using taskcluster.client.createSession!
    q = Queue()
    await q.ping()

async def should_be_no_error_msg():
    # unpatched, this still warns about unclosed client session because we're
    # not passing session to asyncutils.makeSingleHttpRequest from AsyncBaseClient
    with aiohttp.ClientSession() as session:
        q = Queue(session=session)
        print(await q.ping())

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    #loop.run_until_complete(error_msg())
    loop.run_until_complete(should_be_no_error_msg())

I think I found a couple issues that I might be able to fix.

@escapewindow
Copy link
Contributor Author

With the latest patches, I'm able to run both the error_msg and should_be_no_error_msg functions without errors.

@escapewindow
Copy link
Contributor Author

@jhford any thoughts on the latest changes/comments?

Copy link
Contributor

@jhford jhford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, the only outstanding this is finishing the removal of raise_for_status in the sync class to match that in the async class.

Once we have that dealt with, I'm happy to merge

status = response.status
# Handle non 2xx status code and retry if possible
if status < 200 or status >= 300:
rerr = aiohttp.ClientResponseError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned before, I don't think there's any reason for us to be throwing an aiohttp internal exception here. Let's just throw the exceptions.TaskclusterRestFailure with the properties here.

# error message for status >= 400, and craft one for other
# non-2xx statuses.
if status < 200 or status >= 300:
raise requests.exceptions.RequestException(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like manually creating requests library exceptions outside of that package. I think that like the async version, we should not call the raise_for_status method, meaning we won't throw any exception for a valid non-200 request. Instead, we should handle the different non-exceptional status codes as application logic.

# error message for status >= 400, and craft one for other
# non-2xx statuses.
if status < 200 or status >= 300:
raise requests.exceptions.RequestException(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this is what I'm referring to, similar to the changes in async. I just don't really want the control flow to differ that much between sync and async when it doesn't have to.

diff --git a/taskcluster/client.py b/taskcluster/client.py
index 08bd2dc..23a715d 100644
--- a/taskcluster/client.py
+++ b/taskcluster/client.py
@@ -408,57 +408,57 @@ class BaseClient(object):
 
             # Handle non 2xx status code and retry if possible
             status = response.status_code
-            try:
-                response.raise_for_status()
-                if status == 204:
-                    return None
-
-                # raise_for_status only catches status >= 400, but it also
-                # customizes the error message.  Let's use that customized
-                # error message for status >= 400, and craft one for other
-                # non-2xx statuses.
-                if status < 200 or status >= 300:
-                    raise requests.exceptions.RequestException(
-                        '%s Error: %s for url: %s' % (
-                            response.status_code, response.reason, response.url
-                        ), response=response
-                    )
+            if status == 204:
+                return None
 
-            except requests.exceptions.RequestException as rerr:
-                status = response.status_code
-                if 500 <= status and status < 600 and retry < retries:
-                    log.warn('Retrying because of: %s' % rerr)
-                    continue
-                # Parse messages from errors
-                data = {}
-                try:
-                    data = response.json()
-                except:
-                    pass  # Ignore JSON errors in error messages
-                # Find error message
-                message = "Unknown Server Error"
-                if isinstance(data, dict):
-                    message = data.get('message')
-                else:
-                    if status == 401:
-                        message = "Authentication Error"
-                    elif status == 500:
-                        message = "Internal Server Error"
-                # Raise TaskclusterAuthFailure if this is an auth issue
-                if status == 401:
-                    raise exceptions.TaskclusterAuthFailure(
-                        message,
-                        status_code=status,
-                        body=data,
-                        superExc=rerr
-                    )
-                # Raise TaskclusterRestFailure for all other issues
+            # Catch retryable errors and go to the beginning of the loop
+            # to do the retry
+            if 500 <= status and status < 600 and retry < retries:
+                log.warn('Retrying because of a %s status code' % status)
+                continue
+
+            # Throw errors for non-retryable errors
+            if status < 200 or status >= 300:
                 raise exceptions.TaskclusterRestFailure(
+                    '%s Error: %s for url: %s' % (
+                        response.status_code, response.reason, response.url
+                    ), 
+                    status_code=status,
+                    response=response,
+                    body=data,
+                    superExc=None,
+                )
+
+            # Handle valid responses
+            data = {}
+            try:
+                data = response.json()
+            except:
+                pass  # Ignore JSON errors in error messages
+            # Find error message
+            message = "Unknown Server Error"
+            if isinstance(data, dict):
+                message = data.get('message')
+            else:
+                if status == 401:
+                    message = "Authentication Error"
+                elif status == 500:
+                    message = "Internal Server Error"
+            # Raise TaskclusterAuthFailure if this is an auth issue
+            if status == 401:
+                raise exceptions.TaskclusterAuthFailure(
                     message,
                     status_code=status,
                     body=data,
-                    superExc=rerr
+                    superExc=None
                 )
+            # Raise TaskclusterRestFailure for all other issues
+            raise exceptions.TaskclusterRestFailure(
+                message,
+                status_code=status,
+                body=data,
+                superExc=None
+            )
 
             # Try to load JSON
             try:

@escapewindow
Copy link
Contributor Author

Looks like something in the latest requested changes has broken a lot of tests. Looking...

@escapewindow
Copy link
Contributor Author

I believe we have the workflow we want now.

When we raised an exception right after the early if status < 200 or status >= 300: block, the subsequent logic that raises a TaskclusterAuthFailure on 401 was skipped, and a TaskclusterRestFailure on all other errors was triggering for any request, even a 200.

Since the block includes status < 200 now, the final TaskclusterRestFailure should handle those statuses.

Does this look ok @jhford ?

@jhford jhford merged commit 83c53dc into taskcluster:master Apr 25, 2017
@jhford
Copy link
Contributor

jhford commented Apr 25, 2017

Yep, looks good to me, squashed and merged

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants