SIGSEGV with broken unicode due to Py_BuildValue returning NULL #1463

Closed
amerlyq opened this issue Apr 3, 2020 · 3 comments

amerlyq commented Apr 3, 2020

Bug summary

I'm using the chain skypeweb -> purple -> bitlbee -> weechat.
Skypeweb is notorious for sometimes messing up the encoding of whole messages, or even parts of them, so weechat receives several incompatible encodings at once.
Invalid Unicode sometimes crashes weechat, because weechat has no error handling to fall back on when Py_BuildValue returns NULL (here key = "text", dict_value = NULL).

#0  0x00007f8341db3ce5 in raise () from /usr/lib/libc.so.6
#1  0x00007f8341d9d857 in abort () from /usr/lib/libc.so.6
#2  0x00005627187ed026 in weechat_shutdown (return_code=..., crash=...) at weechat.c:739
#3  <signal handler called>
#4  0x00007f834089fa99 in PyDict_SetItem () from /usr/lib/libpython3.8.so.1.0

(gdb) f 5
#5  0x00007f8340b9b134 in weechat_python_hashtable_map_cb (data=0x7f833c2acac0, hashtable=<optimized out>, key=<optimized out>, value=<optimized out>) at /tmp/weechat/src/plugins/python/weechat-python.c:240
warning: Source file is more recent than executable.
240         PyDict_SetItem (dict, dict_key, dict_value);

(gdb) list
235         dict = (PyObject *)data;
236
237         dict_key = Py_BuildValue ("s", key);
238         dict_value = Py_BuildValue ("s", value);
239
240         PyDict_SetItem (dict, dict_key, dict_value);
241
242         Py_DECREF (dict_key);
243         Py_DECREF (dict_value);
244     }

Steps to reproduce

The easiest way is to artificially set the input argument value to broken Unicode in gdb, which drops you straight into the segfault.
Of course, Python must be enabled so that weechat_python_hashtable_map_cb is called at least once.
The value below is copied directly from the message that originally crashed my weechat on receipt.

gdb -ex 'source /dev/fd/3' weechat 3<<< $'set breakpoint pending on\nb weechat_python_hashtable_map_cb\ncomm 1\nset var value = ":\\xd0\\x9e\\xd1\\x84\\xd0\\x00\\xd0\\xbc"\ncont\nend\nrun'
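
For readers wondering why this particular value makes Py_BuildValue ("s", value) fail: the "s" format expects a NUL-terminated UTF-8 string. Going by C string semantics, the embedded \x00 ends the string early, so the callback would see ":" followed by D0 9E D1 84 D0, which ends in a lone lead byte with no continuation byte. The following is a minimal sketch of a strict validity check that illustrates this; utf8_is_valid_sketch is a hypothetical name and is not part of WeeChat or the Python C API.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a WeeChat function): returns 1 if the
 * NUL-terminated string is well-formed UTF-8, 0 otherwise.
 */
int
utf8_is_valid_sketch (const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    unsigned int cp, cp_min;
    int extra;

    while (*p)
    {
        if (*p < 0x80)
        {
            /* plain ASCII byte */
            p++;
            continue;
        }
        else if ((*p & 0xE0) == 0xC0) { extra = 1; cp = *p & 0x1F; cp_min = 0x80; }
        else if ((*p & 0xF0) == 0xE0) { extra = 2; cp = *p & 0x0F; cp_min = 0x800; }
        else if ((*p & 0xF8) == 0xF0) { extra = 3; cp = *p & 0x07; cp_min = 0x10000; }
        else
            return 0;  /* stray continuation byte or invalid lead byte */

        p++;
        for (; extra > 0; extra--, p++)
        {
            /* a NUL here means the sequence was truncated */
            if ((*p & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (*p & 0x3F);
        }

        /* reject overlong forms, surrogates and out-of-range values */
        if (cp < cp_min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
    }
    return 1;
}
```

Under these assumptions the check rejects the value from the gdb command above, while accepting the same bytes once the trailing D0 is removed.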

Suggested solutions

Simply don't print broken Unicode on screen at all.
We may store it in the logs if possible, but if not, better to store nothing.
A working weechat is more important.
Alternatively, you could replace the value with a <Broken Unicode> message to at least notify the user about the problem, so they can ask people to send the message again.
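
A gentler variant of the replacement idea is to overwrite only the offending bytes, so the readable part of the message survives and the result is safe to hand to Py_BuildValue ("s", ...). The sketch below is only an illustration under that assumption: utf8_seq_len_sketch and utf8_sanitize_sketch are hypothetical names, not WeeChat functions, and the replacement character is the caller's choice.

```c
#include <stddef.h>

/*
 * Hypothetical helper: length (1 to 4) of the well-formed UTF-8 sequence
 * starting at p, or 0 if the bytes at p are not valid UTF-8.
 * p must point to a non-NUL byte of a NUL-terminated string.
 */
static size_t
utf8_seq_len_sketch (const unsigned char *p)
{
    size_t len, i;
    unsigned int cp, cp_min;

    if (p[0] < 0x80)
        return 1;
    if ((p[0] & 0xE0) == 0xC0) { len = 2; cp = p[0] & 0x1F; cp_min = 0x80; }
    else if ((p[0] & 0xF0) == 0xE0) { len = 3; cp = p[0] & 0x0F; cp_min = 0x800; }
    else if ((p[0] & 0xF8) == 0xF0) { len = 4; cp = p[0] & 0x07; cp_min = 0x10000; }
    else
        return 0;

    for (i = 1; i < len; i++)
    {
        /* stops at the terminating NUL, so there is no read past the end */
        if ((p[i] & 0xC0) != 0x80)
            return 0;
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    if (cp < cp_min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    return len;
}

/*
 * Hypothetical helper: overwrites, in place, every byte that does not
 * belong to a well-formed UTF-8 sequence with `replacement`.
 * Returns the number of bytes replaced.
 */
size_t
utf8_sanitize_sketch (char *s, char replacement)
{
    unsigned char *p = (unsigned char *)s;
    size_t replaced = 0, n;

    while (*p)
    {
        n = utf8_seq_len_sketch (p);
        if (n == 0)
        {
            *p = (unsigned char)replacement;
            replaced++;
            p++;
        }
        else
            p += n;
    }
    return replaced;
}
```

A non-zero return value tells the caller that something was altered, which could be used to log the event or append a notice for the user.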

  • WeeChat version: 2.9-dev, c59f812
  • OS, distribution and version:  Linux 5.5.8-arch1-1 SMP PREEMPT x86_64 GNU/Linux
@amerlyq amerlyq added the bug Unexpected problem or unintended behavior label Apr 3, 2020
@flashcode
Member

flashcode commented Apr 4, 2020

Hi,
Do you have the complete backtrace, so I can identify which API call raised this crash?
And do you have a way to reproduce without having to change sources?
Thanks.

@flashcode flashcode self-assigned this Apr 4, 2020
@flashcode flashcode added the waiting info Waiting for info from author of issue label Apr 4, 2020
@flashcode
Member

According to my tests, you're right: if I give an invalid UTF-8 string to Py_BuildValue, like this:

dict_value = Py_BuildValue ("s", "\xc3");

then the returned value is NULL, and it crashes in the call:

PyDict_SetItem (dict, dict_key, dict_value);

I need to know how such invalid UTF-8 can end up in value in this function.
I tested many functions that use hashtables and cannot reproduce it: even with invalid UTF-8 data, WeeChat always tries to convert to valid UTF-8 before reaching this function.

I could fix this function, but that may hide a problem elsewhere in the code, which is why I must be able to reproduce it to understand exactly what's happening.

@amerlyq
Author

amerlyq commented Apr 4, 2020

Ah, yes, validating it at the interface level is really the better solution.
Still, the rest of the backtrace looks "somewhat too generic" to me because of all the callbacks, so I can't relate it to anything. But I can privately share the core and its corresponding binary if necessary, or give you the requested info in chat.

Sorry, but I can't reproduce it anymore, at least not without recreating the whole situation, which would require a lot of effort. This problem arose multiple times before due to skypeweb messing up encodings, but I couldn't track it down then. Reproduction is possible only when one of the last messages in the current skype chat is broken: when you connect to the skype server, it sends you back the last 10 or so messages (and crashes your weechat). But because these messages aren't saved anywhere, neither in the bitlbee logs nor in weechat (due to the crash), I can't reproduce it once the chat has moved more than 10 messages past them.
However, we could try to simulate it on some IRC server, of course...

(gdb) bt -full -entry-values both -frame-info source-and-location

#0  0x00007f8341db3ce5 in raise () from /usr/lib/libc.so.6
No symbol table info available.

#1  0x00007f8341d9d857 in abort () from /usr/lib/libc.so.6
No symbol table info available.

#2  0x00005627187ed026 in weechat_shutdown (return_code=0x1, return_code@entry=<optimized out>, crash=0x1, crash@entry=<optimized out>) at /usr/src/debug/weechat/src/core/weechat.c:739
739             abort ();
No locals.

#3  <signal handler called>
No symbol table info available.

#4  0x00007f834089fa99 in PyDict_SetItem () from /usr/lib/libpython3.8.so.1.0
No symbol table info available.

#5  0x00007f8340b9b134 in weechat_python_hashtable_map_cb (data=0x7f833c2acac0, data@entry=<optimized out>, hashtable=<optimized out>, hashtable@entry=<optimized out>, key=<optimized out>, key@entry=<optimized out>, value=<optimized out>, value@entry=<optimized out>) at /usr/src/debug/weechat/src/plugins/python/weechat-python.c:240
240         PyDict_SetItem (dict, dict_key, dict_value);
        dict = 0x7f833c2acac0
        dict_key = 0x7f833c2da670
        dict_value = 0x0

#6  0x0000562718810a96 in hashtable_map_string (hashtable=0x56271bbea170, hashtable@entry=<optimized out>, callback_map=0x7f8340b9b0f0 <weechat_python_hashtable_map_cb>, callback_map@entry=<optimized out>, callback_map_data=0x7f833c2acac0, callback_map_data@entry=<optimized out>) at /usr/src/debug/weechat/src/core/wee-hashtable.c:632
632                 (void) (callback_map) (callback_map_data,
        i = 0x11
        ptr_item = <optimized out>
        ptr_next_item = 0x56271bc9c5d0
        str_key = <optimized out>
        str_value = <optimized out>
        key = 0x56271bc99520 "text"
        value = 0x56271a6f5c60 "XXXXX"...

#7  0x00007f8340b9bd75 in weechat_python_hashtable_to_dict (hashtable=0x56271bbea170, hashtable@entry=0x56271bbea170) at /usr/src/debug/weechat/src/plugins/python/weechat-python.c:262
262         weechat_hashtable_map_string (hashtable, &weechat_python_hashtable_map_cb,
        dict = 0x7f833c2acac0

#8  0x00007f8340bb1cc4 in weechat_python_api_info_get_hashtable (self=<optimized out>, self@entry=<optimized out>, args=<optimized out>, args@entry=<optimized out>) at /usr/src/debug/weechat/src/plugins/python/weechat-python-api.c:4309
4309        result_dict = weechat_python_hashtable_to_dict (result_hashtable);
        info_name = 0x7f833ff36560 "irc_message_parse"
        hashtable = 0x56271bc996e0
        result_hashtable = 0x56271bbea170
        dict = 0x7f833c2ace80
        result_dict = <optimized out>
        python_function_name = 0x7f8340bbe460 "info_get_hashtable"

#9  0x00007f8340870b6f in PyCFunction_Call () from /usr/lib/libpython3.8.so.1.0
No symbol table info available.

#10 0x00007f8340863f42 in _PyObject_MakeTpCall () from /usr/lib/libpython3.8.so.1.0
No symbol table info available.

#11 0x00007f834092198f in _PyEval_EvalFrameDefault () from /usr/lib/libpython3.8.so.1.0
No symbol table info available.

#12 0x00007f834090e0dd in _PyFunction_Vectorcall () from /usr/lib/libpython3.8.so.1.0
No symbol table info available.

#13 0x00007f83408c9a2e in PyObject_CallFunction () from /usr/lib/libpython3.8.so.1.0
No symbol table info available.

#14 0x00007f8340b9c498 in weechat_python_exec (script=0x56271a917f70, script@entry=0x56271a917f70, ret_type=0x1, ret_type@entry=0x1, function=0x56271ac755d0 "\260\222\344\031'V", function@entry=<optimized out>, format=0x7f8340bbe6aa "ssss", format@entry=0x7f8340bbe6aa "ssss", argv=0x7ffccab9a0f0, argv@entry=0x7ffccab9a0f0) at /usr/src/debug/weechat/src/plugins/python/weechat-python.c:515
515             rc = PyObject_CallFunction (evFunc, format2,
        old_python_current_script = 0x56271ba2dca0
        old_interpreter = 0x56271ad13ec0
        evMain = <optimized out>
        evDict = <optimized out>
        evFunc = 0x7f833fef65e0
        rc = <optimized out>
        argv2 = {0x7ffccab9a117, 0x7ffccab9a220, 0x56271a2ad6e0, 0x56271bb55070, 0x0 <repeats 12 times>}
        ret_value = 0x0
        ret_temp = <optimized out>
        format2 = "sssy\000V\000\000\370\t\002A\203\177\000\000P"
        i = <optimized out>
        argc = 0x4
        ret_int = <optimized out>

#15 0x00007f8340b9df81 in weechat_python_api_hook_modifier_cb (pointer=0x56271a917f70, pointer@entry=<optimized out>, data=<optimized out>, data@entry=<optimized out>, modifier=<optimized out>, modifier@entry=<optimized out>, modifier_data=<optimized out>, modifier_data@entry=<optimized out>, string=<optimized out>, string@entry=<optimized out>) at /usr/src/debug/weechat/src/plugins/python/weechat-python-api.c:2977
2977            return (char *)weechat_python_exec (script,
        script = 0x56271a917f70
        func_argv = {0x7ffccab9a117, 0x7ffccab9a220, 0x56271a2ad6e0, 0x56271bb55070}
        empty_arg = ""
        ptr_function = 0x56271ac755d0 "\260\222\344\031'V"
        ptr_data = 0x0

#16 0x000056271882b5f5 in hook_modifier_exec (plugin=<optimized out>, plugin@entry=<optimized out>, modifier=0x7ffccab9a220 "irc_in_PRIVMSG", modifier@entry=<optimized out>, modifier_data=0x56271a2ad6e0 "bitlbee", modifier_data@entry=<optimized out>, string=<optimized out>, string@entry=<optimized out>) at /usr/src/debug/weechat/src/core/hook/wee-hook-modifier.c:116
116                 new_msg = (HOOK_MODIFIER(ptr_hook, callback))
        ptr_hook = 0x56271ae36210
        next_hook = 0x56271ae8c140
        new_msg = <optimized out>
        message_modified = 0x56271bb55070 "XXXXX"...

#17 0x00007f8340fe919b in irc_server_msgq_flush () at /usr/src/debug/weechat/src/plugins/irc/irc-server.c:3020
3020                        new_msg = weechat_hook_modifier_exec (
        next = <optimized out>
        ptr_data = 0x56271bb52fb0 "XXXXX"...
        new_msg = <optimized out>
        new_msg2 = <optimized out>
        ptr_msg = <optimized out>
        ptr_msg2 = <optimized out>
        pos = <optimized out>
        nick = 0x56271bb56f50 "host"
        host = 0x56271bc9bd10 "\240b\307\033'V"
        command = 0x56271bb56f30 "PRIVMSG"
        channel = 0x56271bc31e20 "PRIVMSG"
        arguments = 0x56271bc05060 "p\016\300\033'V"
        msg_decoded = <optimized out>
        msg_decoded_without_color = <optimized out>
        str_modifier = "XXXXX
        modifier_data = "XXXXX"...
        pos_channel = 0x56
        pos_text = 0x61
        pos_decode = <optimized out>

#18 0x00007f8340fedceb in irc_server_recv_cb (data=<optimized out>, data@entry=<optimized out>, fd=<optimized out>, fd@entry=<optimized out>, pointer=<optimized out>, pointer@entry=<optimized out>) at /usr/src/debug/weechat/src/plugins/irc/irc-server.c:3308
3308            irc_server_msgq_flush ();
        server = <optimized out>
        num_read = <optimized out>
        msgq_flush = <optimized out>
        end_recv = <optimized out>
        server = <optimized out>
        buffer = "XXXXX"...
        num_read = <optimized out>
        msgq_flush = <optimized out>
        end_recv = <optimized out>

#19 irc_server_recv_cb (pointer=0x56271a2ad750, pointer@entry=<optimized out>, data=<optimized out>, data@entry=<optimized out>, fd=<optimized out>, fd@entry=<optimized out>) at /usr/src/debug/weechat/src/plugins/irc/irc-server.c:3215
3215    irc_server_recv_cb (const void *pointer, void *data, int fd)
        server = 0x56271a2ad750
        num_read = <optimized out>
        msgq_flush = <optimized out>
        end_recv = <optimized out>
        buffer = "XXXXX"...

#20 0x0000562718828cea in hook_fd_exec () at /usr/src/debug/weechat/src/core/hook/wee-hook-fd.c:261
261                     (void) (HOOK_FD(ptr_hook, callback)) (
        i = <optimized out>
        num_fd = 0x5
        timeout = <optimized out>
        ready = <optimized out>
        found = 0x1
        ptr_hook = 0x56271bb512f0
        next_hook = 0x56271bb44990

#21 0x000056271886795b in gui_main_loop () at /usr/src/debug/weechat/src/gui/curses/gui-curses-main.c:508
508             hook_fd_exec ();
        hook_fd_keyboard = 0x56271b1740c0
        send_signal_sigwinch = 0x0

#22 0x00005627187ed07e in main (argc=0x1, argc@entry=<optimized out>, argv=0x7ffccab9a598, argv@entry=<optimized out>) at /usr/src/debug/weechat/src/gui/curses/normal/main.c:43
43          gui_main_loop ();
No locals.

@flashcode flashcode removed the waiting info Waiting for info from author of issue label Apr 4, 2020
@flashcode flashcode added this to the 2.9 milestone Apr 4, 2020