Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix segfault on repr(OOBucket([... non-Latin-1 thingies ...])) #108

Merged
merged 3 commits into from
Jul 30, 2019

Conversation

mgedmin
Copy link
Member

@mgedmin mgedmin commented Jul 29, 2019

Fixes #106.

The bug was in not checking the return value of PyUnicode_AsLatin1String() and passing a NULL to PyOS_snprintf().

The fix uses PyUnicode_FromFormat() and avoids all the hassle of manual encoding/sprintf/decoding and even calling repr().

I've also added a regression test and tweaked tox.ini to set PYTHONFAULTHANDLER=1 to make the segfault visible -- without that all you see is the test runner exiting unexpectedly with no additional output (and you have to interpret "exited with code -11" as died from a SIGSEGV yourself).

The test fails on Python 3 only, and on non-pure builds only.  I added
PYTHONFAULTHANDLER=1 to the environment to make the failure less
mysteriously silent.

I've also updated tox.ini to use extras=foo instead of deps=.[foo] and
to use factors to set PURE_PYTHON=1 instead of defining an entire
environment.  This now lets you run arbitrary pure Python builds like
tox -e py37-pure.
Fixes #106.

The bug was in not checking the return value of
PyUnicode_AsLatin1String() and passing a NULL to PyOS_snprintf().

The fix is to use PyUnicode_FromFormat() and avoid all the hassle of
manual encoding/sprintf/decoding and even calling repr().
@mgedmin mgedmin requested review from tseaver and jamadden July 29, 2019 13:15
Copy link
Member

@jamadden jamadden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks good.

Probably needs a change note.

I wonder if it might be more clear at this point to put the #ifdef PY3K around the entire definition of bucket_repr, since there's essentially one line of shared code now. That would save the large stack allocation. (PyObject_Repr deals gracefully with a NULL input, but sadly PyUnicode_FromFormat("%R") does not, so the check on bucket_items is still necessary.)

#ifdef PY3K
static PyObject * bucket_repr(Bucket *self) 
{
   PyObject *i, *r;
   i = bucket_items(self, NULL, NULL);
   if (!i) 
        return NULL;
   r = PyUnicode_FromFormat(...)
   Py_DECREF(i)
   return r;
}
#else 
static PyObject* bucket_repr(Bucket *self)
{
   PyObject *i, *r;
   char repr[10000];
   int rv;
   i = bucket_items(self, NULL, NULL);
   if (!i) 
        return NULL;
   // complicated code here
}
#endif

@mgedmin
Copy link
Member Author

mgedmin commented Jul 29, 2019

The large stack allocation is inside a #ifndef PY3K, so it should already be saved.

Splitting the entire definition would probably improve readability. (OTOH I can't wait for Jan 2020 to roll around so I can submit a PR ripping out Python 2 compatibility and all the ifdefs.)

Good catch on the changelog note!

@mgedmin mgedmin merged commit 44fd158 into master Jul 30, 2019
@mgedmin mgedmin deleted the segfault-on-unicode branch July 30, 2019 08:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SIGSEGV in bucket_repr() with items that have a non-Latin-1 repr() on Python 3
3 participants