Skip to content

<ctype.h> is*() and to*() functions are being used unsafely #13332

@Keith-S-Thompson

Description

@Keith-S-Thompson

Steps to reproduce

There is no easily reproducible bug on most platforms.

The is*() and to*() standard functions declared in <ctype.h> require an argument that either is equal to EOF (typically -1) or is in the range of unsigned char.

If plain char is signed, as it is on many systems, a character extracted from a command line argument, from user interactive input, or from a file can have a negative value, resulting in undefined behavior. Typically this applies only to non-ASCII characters.

In most implementations, these functions behave reasonably sanely for negative values, so in spite of the undefined behavior, the observed behavior is likely to be benign.

One exception: Microsoft's C implementation (I used Visual Studio 2022 on Windows 11) aborts on an invalid argument to these functions, but only when an application is built in Debug mode. I've built xxd.c as a standalone application under Visual Studio. The resulting executable crashes when invoked as follows (I named the source file c.c for this quick-and-dirty test):

Release\c.exe -i héllö.txt

I've created a fork of this repo and a branch with a proposed fix for this issue:
https://github.com/Keith-S-Thompson/vim/tree/ctype-fix

I have not yet created a pull request. I'll be glad to do so.

The fix adds a number of macros, such as:

#define SAFE_isalnum(c)  (isalnum ((unsigned char)(c)))

and replaces most calls to the standard functions with invocations of these macros.

Note that there is already some code that's designed to avoid problems calling these functions with negative arguments. This existing code does not cover all cases; my update is intended to do so. For now, I've left most such code in place, though it's probably unnecessary.

(An aside: vim has code that works around implementations where toupper() and tolower() operate incorrectly on arguments that are not respectively lowercase or uppercase letters. I'm skeptical that there are any existing implementations that (a) have this bug and (b) are able to compile vim, since toupper() and tolower() have been required to operate correctly since the 1989 ANSI C standard. This is not directly related to this issue, but it's a potential opportunity for some cleanup.)

Expected behaviour

Calls to is*() and to*() functions declared in <ctype.h> should behave correctly -- which is in fact what currently happens 99% of the time. (Behaving correctly is just one possible consequence of undefined behavior.)

Version of Vim

9.0.2018

Environment

OS: Ubuntu 22.04.3 ; Windows 11
Terminal: xterm, tmux ; cmd.exe
$TERM: tmux-256-color ; N/A
shell: bash 5.3.0 ; cmd.exe

Logs and stack traces

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions