"cheat" env behavior causes an application crash with SystemError #1408

a-feld · 2016-11-28T22:36:15Z

Hi!

My name is Allan and I work at New Relic.

Overview

We've had various customer reports of application crashes when using uwsgi + Python3 with our agent.

(Example: https://github.com/Mobeye/NewRelic)

The reported error is:

SystemError: Objects/tupleobject.c:156: bad argument to internal function

After some investigation, we discovered (under certain circumstances) using the uwsgi_python_create_env_cheat env behavior
can cause the underlying WSGI application to cease communicating with uwsgi.

This error can appear and cause application crashes even when the New Relic agent is not used.

Although uwsgi consistently causes an error to be set with uwsgi_python_create_env_cheat independent of the cpython version, this error only exposes a crash in Python 3.5 and greater. As more developers move to Python 3.5, this issue will start to occur more frequently.

Debug Summary

The following versions were used for reproduction of this issue:

Name	Version
uwsgi	2.0.14
Python	3.5.2
wrapt	1.10.8

uwsgi_python_create_env_cheat re-uses the async_args tuple between transactions (request/response).
As part of its re-use it calls PyTuple_SetItem(wsgi_req->async_args, 0, wsgi_req->async_environ) on every request.
If the reference count on the wsgi_req->async_args tuple != 1, then Python raises a SystemError and causes PyEval_CallObject to return NULL.

To understand why, in some cases, the reference count on wsgi_req->async_args never returns to 1, consider the following example:

import wrapt

@wrapt.function_wrapper
def wrapper(wrapped, instance, args, kwargs):
    print("I'm wrapped")
    return wrapped(*args, **kwargs)


class WSGIHandler(object):
    @wrapper
    def __call__(self, environ, start_response):
        raise Exception('BAD APPLICATION')

application = WSGIHandler()

In this case, the wrapt library adds a function between uwsgi and the __call__ method of WSGIHandler. The reference count for async_args starts at 1 and the subsequent call to the wrapper(wrapped, instance, args, kwargs) passes async_args as args which brings the reference count to 2.

When a traceback is generated in any version of cpython, a reference to the frame which generated the traceback is created and the frame's reference count is subsequently incremented. This ensures that the frame which generated the traceback is not deallocated before the traceback is handled.

In the above example, the traceback is generated from the raise Exception('BAD APPLICATION'). The exception is subsequently handled in uwsgi through a PyErr_Print() call in uwsgi_python_exception_log here.

According to the C-API documentation, this call sets the sys.last_traceback variable and holds a reference to the traceback which consequently holds a reference to the frame. Since the frame is never deallocated, the reference count of wsgi_req->async_args remains at 2 for the next transaction.

When the next request arrives, the call to PyTuple_SetItem(wsgi_req->async_args, 0, wsgi_req->async_environ) (plugins/python/wsgi_subhandler.c:237) causes a SystemError exception with the application not being able to communicate with uwsgi.

To further demonstrate that this is the case, changing PyErr_Print to PyErr_PrintEx(0); eliminates the SystemError above.

In some cases, sys.last_traceback is not the only Python object that holds references to the frame. There are examples relating to tracebacks / frames being held due to cyclical references which are garbage collected on the next request/response. In those cases, even when sys.last_traceback is not stored, the frame is not deallocated at the end of the request/response.

Possible Solutions

uwsgi can use uwsgi_python_create_env_holy as the default env creation process. This creates a new Tuple for every request and therefore resolves the SystemError issue.
uwsgi can check for PyErr_Occurred as part of python_call and choose to clear certain types of errors before the underlying function call is made. Printing the error may be a reasonable compromise rather than keeping the error set for the function call (which causes the app to cease communications with uwsgi). An example of this code might be:

plugins/python/pyutils.c
@@ -258,6 +258,9 @@ void uwsgi_python_exception_log(struct wsgi_request *wsgi_req) {
 }
 
 PyObject *python_call(PyObject *callable, PyObject *args, int catch, struct wsgi_request *wsgi_req) {
+        if (PyErr_Occurred()) {
+            PyErr_Print();
+        }
 
        //uwsgi_log("ready to call %p %p\n", callable, args);
        
        PyObject *result = PyEval_CallObject(extract_tb, args);

I would be happy to submit a pull request with a fix, but I thought I'd ask the project what the preferred solution is before doing so.
Thanks in advance for your help!

The text was updated successfully, but these errors were encountered:

unbit · 2016-11-29T07:05:10Z

Hi @a-feld, thanks for your analysis. I think it is time to start removing nowadays-useless optimizations in the core, like the one you mentioned. In 2.1 i have just changed the default to the holy one. Now we need to find a workaround for 2.0 without changing the default. You solution in python_call would be a bit 'brutal' so before accepting it i would like to find a better one. Eventually we could consider setting the default allocator for python 3 as the 'holy' one (that probably is even better)

a-feld · 2016-11-29T07:23:44Z

Hi @unbit! Thanks for the quick response!
For 2.0, adding the following option to a uwsgi.ini appears to eliminate the SystemError:
wsgi-env-behavior = holy

unbit · 2016-11-29T07:37:30Z

What about this ? da68b9d

a-feld · 2016-11-29T20:04:38Z

Decrementing the reference count on the tuple without deleting the object referencing the tuple can cause the tuple to be deallocated while there is still a reference to the tuple. If the tuple is deallocated in that case, it will eventually result in undefined interpreter behavior.

What about this solution? #1411

Since wsgi_req->async_environ is also reused we can check to see if the async_args tuple does not already contain async_environ before calling PyTuple_SetItem. If the PyTuple_SetItem call would be idempotent, then there's no reason to call it at all (which prevents the SystemError from occurring). This ensures the "cheat" env behavior will not raise a SystemError, allowing developers to still choose to use uwsgi_python_create_env_cheat.

Fix SystemError exception when async_args PyTuple is reused. #1408

unbit · 2016-12-05T08:48:46Z

I think your patch is less invasive than mine. Decref'ing blindly as i did could lead to side problems. Merged it. @xrmx safe for backport

a-feld · 2016-12-05T23:10:22Z

Thanks for the speedy response and resolution @unbit! I'll go ahead and close this ticket.

In the case where the uwsgi_python_create_env_cheat env behavior is used, the async_args PyTuple is reused on every request. In some cases, the reference count on async_args may be > 1 at the end of a request/response cycle. If PyTuple_SetItem is called with a reference count > 1, a SystemError is raised. When the cheat env behavior is used, the environ dictionary is also reused between requests. This enables the elimination of redundant PyTuple_SetItem calls by first checking to see if the environ dictionary is already in the async_args tuple.

xrmx · 2016-12-06T08:40:44Z

Backported, thanks!

unbit added a commit that referenced this issue Nov 29, 2016

attempted fix for #1408

da68b9d

unbit added a commit that referenced this issue Nov 29, 2016

removed useless garbage introduced in #1408

199d2d5

unbit added a commit that referenced this issue Nov 29, 2016

attempted fix for #1408

a9b3336

unbit added a commit that referenced this issue Nov 29, 2016

removed useless garbage introduced in #1408

109c32a

unbit added a commit that referenced this issue Dec 5, 2016

Merge pull request #1411 from a-feld/master

e61e3e6

Fix SystemError exception when async_args PyTuple is reused. #1408

a-feld closed this as completed Dec 5, 2016

andrewgoetz mentioned this issue Jan 3, 2017

Same problem with NewRelic Mobeye/NewRelic#1

Open

ipmb mentioned this issue Mar 23, 2017

Crashes with wsgi-env-behavior=holy #1497

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

"cheat" env behavior causes an application crash with SystemError #1408

"cheat" env behavior causes an application crash with SystemError #1408

a-feld commented Nov 28, 2016 •

edited

Loading

unbit commented Nov 29, 2016 •

edited

Loading

a-feld commented Nov 29, 2016

unbit commented Nov 29, 2016

a-feld commented Nov 29, 2016

unbit commented Dec 5, 2016

a-feld commented Dec 5, 2016

xrmx commented Dec 6, 2016

"cheat" env behavior causes an application crash with SystemError #1408

"cheat" env behavior causes an application crash with SystemError #1408

Comments

a-feld commented Nov 28, 2016 • edited Loading

Overview

Debug Summary

Possible Solutions

unbit commented Nov 29, 2016 • edited Loading

a-feld commented Nov 29, 2016

unbit commented Nov 29, 2016

a-feld commented Nov 29, 2016

unbit commented Dec 5, 2016

a-feld commented Dec 5, 2016

xrmx commented Dec 6, 2016

a-feld commented Nov 28, 2016 •

edited

Loading

unbit commented Nov 29, 2016 •

edited

Loading