Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"cheat" env behavior causes an application crash with SystemError #1408

Closed
a-feld opened this issue Nov 28, 2016 · 7 comments
Closed

"cheat" env behavior causes an application crash with SystemError #1408

a-feld opened this issue Nov 28, 2016 · 7 comments

Comments

@a-feld
Copy link
Contributor

a-feld commented Nov 28, 2016

Hi!

My name is Allan and I work at New Relic.

Overview

We've had various customer reports of application crashes when using uwsgi + Python3 with our agent.

(Example: https://github.com/Mobeye/NewRelic)

The reported error is:

SystemError: Objects/tupleobject.c:156: bad argument to internal function

After some investigation, we discovered (under certain circumstances) using the uwsgi_python_create_env_cheat env behavior
can cause the underlying WSGI application to cease communicating with uwsgi.

This error can appear and cause application crashes even when the New Relic agent is not used.

Although uwsgi consistently causes an error to be set with uwsgi_python_create_env_cheat independent of the cpython version, this error only exposes a crash in Python 3.5 and greater. As more developers move to Python 3.5, this issue will start to occur more frequently.

Debug Summary

The following versions were used for reproduction of this issue:

Name Version
uwsgi 2.0.14
Python 3.5.2
wrapt 1.10.8

uwsgi_python_create_env_cheat re-uses the async_args tuple between transactions (request/response).
As part of its re-use it calls PyTuple_SetItem(wsgi_req->async_args, 0, wsgi_req->async_environ) on every request.
If the reference count on the wsgi_req->async_args tuple != 1, then Python raises a SystemError and causes PyEval_CallObject to return NULL.

To understand why, in some cases, the reference count on wsgi_req->async_args never returns to 1, consider the following example:

import wrapt

@wrapt.function_wrapper
def wrapper(wrapped, instance, args, kwargs):
    print("I'm wrapped")
    return wrapped(*args, **kwargs)


class WSGIHandler(object):
    @wrapper
    def __call__(self, environ, start_response):
        raise Exception('BAD APPLICATION')

application = WSGIHandler()

In this case, the wrapt library adds a function between uwsgi and the __call__ method of WSGIHandler. The reference count for async_args starts at 1 and the subsequent call to the wrapper(wrapped, instance, args, kwargs) passes async_args as args which brings the reference count to 2.

When a traceback is generated in any version of cpython, a reference to the frame which generated the traceback is created and the frame's reference count is subsequently incremented. This ensures that the frame which generated the traceback is not deallocated before the traceback is handled.

In the above example, the traceback is generated from the raise Exception('BAD APPLICATION'). The exception is subsequently handled in uwsgi through a PyErr_Print() call in uwsgi_python_exception_log here.

According to the C-API documentation, this call sets the sys.last_traceback variable and holds a reference to the traceback which consequently holds a reference to the frame. Since the frame is never deallocated, the reference count of wsgi_req->async_args remains at 2 for the next transaction.

When the next request arrives, the call to PyTuple_SetItem(wsgi_req->async_args, 0, wsgi_req->async_environ) (plugins/python/wsgi_subhandler.c:237) causes a SystemError exception with the application not being able to communicate with uwsgi.

To further demonstrate that this is the case, changing PyErr_Print to PyErr_PrintEx(0); eliminates the SystemError above.

In some cases, sys.last_traceback is not the only Python object that holds references to the frame. There are examples relating to tracebacks / frames being held due to cyclical references which are garbage collected on the next request/response. In those cases, even when sys.last_traceback is not stored, the frame is not deallocated at the end of the request/response.

Possible Solutions

  1. uwsgi can use uwsgi_python_create_env_holy as the default env creation process. This creates a new Tuple for every request and therefore resolves the SystemError issue.

  2. uwsgi can check for PyErr_Occurred as part of python_call and choose to clear certain types of errors before the underlying function call is made. Printing the error may be a reasonable compromise rather than keeping the error set for the function call (which causes the app to cease communications with uwsgi). An example of this code might be:

plugins/python/pyutils.c
@@ -258,6 +258,9 @@ void uwsgi_python_exception_log(struct wsgi_request *wsgi_req) {
 }
 
 PyObject *python_call(PyObject *callable, PyObject *args, int catch, struct wsgi_request *wsgi_req) {
+        if (PyErr_Occurred()) {
+            PyErr_Print();
+        }
 
        //uwsgi_log("ready to call %p %p\n", callable, args);
        
        PyObject *result = PyEval_CallObject(extract_tb, args);

I would be happy to submit a pull request with a fix, but I thought I'd ask the project what the preferred solution is before doing so.
Thanks in advance for your help!

@unbit
Copy link
Owner

unbit commented Nov 29, 2016

Hi @a-feld, thanks for your analysis. I think it is time to start removing nowadays-useless optimizations in the core, like the one you mentioned. In 2.1 i have just changed the default to the holy one. Now we need to find a workaround for 2.0 without changing the default. You solution in python_call would be a bit 'brutal' so before accepting it i would like to find a better one. Eventually we could consider setting the default allocator for python 3 as the 'holy' one (that probably is even better)

@a-feld
Copy link
Contributor Author

a-feld commented Nov 29, 2016

Hi @unbit! Thanks for the quick response!
For 2.0, adding the following option to a uwsgi.ini appears to eliminate the SystemError:
wsgi-env-behavior = holy

unbit added a commit that referenced this issue Nov 29, 2016
@unbit
Copy link
Owner

unbit commented Nov 29, 2016

What about this ? da68b9d

unbit added a commit that referenced this issue Nov 29, 2016
@a-feld
Copy link
Contributor Author

a-feld commented Nov 29, 2016

Decrementing the reference count on the tuple without deleting the object referencing the tuple can cause the tuple to be deallocated while there is still a reference to the tuple. If the tuple is deallocated in that case, it will eventually result in undefined interpreter behavior.

What about this solution? #1411

Since wsgi_req->async_environ is also reused we can check to see if the async_args tuple does not already contain async_environ before calling PyTuple_SetItem. If the PyTuple_SetItem call would be idempotent, then there's no reason to call it at all (which prevents the SystemError from occurring). This ensures the "cheat" env behavior will not raise a SystemError, allowing developers to still choose to use uwsgi_python_create_env_cheat.

unbit added a commit that referenced this issue Dec 5, 2016
Fix SystemError exception when async_args PyTuple is reused. #1408
@unbit
Copy link
Owner

unbit commented Dec 5, 2016

I think your patch is less invasive than mine. Decref'ing blindly as i did could lead to side problems. Merged it. @xrmx safe for backport

@a-feld
Copy link
Contributor Author

a-feld commented Dec 5, 2016

Thanks for the speedy response and resolution @unbit! I'll go ahead and close this ticket.

@a-feld a-feld closed this as completed Dec 5, 2016
xrmx pushed a commit that referenced this issue Dec 6, 2016
In the case where the uwsgi_python_create_env_cheat env behavior is
used, the async_args PyTuple is reused on every request.

In some cases, the reference count on async_args may be > 1 at the end
of a request/response cycle. If PyTuple_SetItem is called with a
reference count > 1, a SystemError is raised.

When the cheat env behavior is used, the environ dictionary is also
reused between requests. This enables the elimination of redundant
PyTuple_SetItem calls by first checking to see if the environ dictionary
is already in the async_args tuple.
@xrmx
Copy link
Collaborator

xrmx commented Dec 6, 2016

Backported, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants