strptime() fails to parse month names containing İ (U+0130) #136028

Closed
@serhiy-storchaka

Bug report

Bug description:

strptime() fails to parse month names containing "İ" (U+0130, LATIN CAPITAL LETTER I WITH DOT ABOVE) like "İyun" or "İyul". This affects locales 'az_AZ', 'ber_DZ', 'ber_MA' and 'crh_UA'.

This happens because 'İ'.lower() == 'i\u0307', but the re module only supports 1-to-1 character matching in case-insensitive mode.
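The mismatch can be reproduced without any locale installed. The snippet below is only a minimal illustration of the mechanism described above, not the code in _strptime; the variable names are made up for the example.

```python
import re

name = "İyun"              # month name starting with U+0130
lowered = name.lower()     # full case mapping turns İ into 'i' + U+0307

# The lower-cased name is one code point longer than the original.
print(len(name), len(lowered))   # 4 5

# A regex built from the lower-cased name cannot match the original
# spelling: each pattern character consumes exactly one input character,
# even with re.IGNORECASE, so a 5-character pattern never fits 4 characters.
pattern = re.compile(lowered, re.IGNORECASE)
match = pattern.fullmatch(name)
print(match)                     # None
```

The same pattern still matches the already lower-cased spelling, which is why the failure only shows up for input that contains the capital "İ".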

There are several ways to fix this:

  1. Do not convert month names (or any other names) to lower case in _strptime.LocaleTime. This is a large change, and it would make converting names to indices more complex and/or slower. It is the universal approach: it would work for any other letter whose lower-case form is more than one character (although 'İ' is currently the only such case in Linux locales).
  2. Just fix the regular expressions created from lower-cased month names. This only works for 'İ' in month names, but this is all that is needed for now.
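The second option could look roughly like the sketch below. It is an assumption about one possible shape of the fix, not the patch that was actually merged: build_name_regex, the constant names, and the group name B are hypothetical and chosen only for illustration.

```python
import re

DOTTED_I = "\u0130"            # İ
LOWERED_DOTTED_I = "i\u0307"   # what str.lower() produces for İ

def build_name_regex(lowered_names):
    """Hypothetical helper: alternation of names that also accepts İ."""
    parts = []
    # Longest names first so a shorter name cannot shadow a longer one.
    for lowered in sorted(lowered_names, key=len, reverse=True):
        escaped = re.escape(lowered)
        # Let the two-character lower-case form also match the single İ.
        parts.append(escaped.replace(
            LOWERED_DOTTED_I, f"(?:{DOTTED_I}|{LOWERED_DOTTED_I})"))
    return re.compile("(?P<B>" + "|".join(parts) + ")", re.IGNORECASE)

# Names are still stored lower-cased, so the index lookup stays simple.
months = [m.lower() for m in ["May", "İyun", "İyul"]]
regex = build_name_regex(months)

found = regex.fullmatch("İyul")
index = months.index(found.group("B").lower())
print(index)   # 2
```

Only the generated pattern changes here. The matched text is lower-cased afterwards and looked up in the same lower-cased list as before, which is what keeps this option small.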

The third way -- converting the input string to lower case before parsing -- does not work, because Z for timezone is case-sensitive.
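The problem with lower-casing the input can be seen with the %z directive. This is a small sketch that assumes a CPython version where %z accepts a literal Z only in upper case; the parses helper is hypothetical.

```python
from datetime import datetime

def parses(text, fmt):
    """Hypothetical helper: report whether strptime() accepts the input."""
    try:
        datetime.strptime(text, fmt)
    except ValueError:
        return False
    return True

fmt = "%Y-%m-%d %z"
original = "2025-06-27 Z"

# Upper-case Z is accepted as UTC; the lower-cased input is rejected
# (assumption: the interpreter matches Z case-sensitively).
print(parses(original, fmt))
print(parses(original.lower(), fmt))
```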

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

Labels

3.13: bugs and security fixes
3.14: bugs and security fixes
3.15: new features, bugs and security fixes
stdlib: Python modules in the Lib dir
type-bug: An unexpected behavior, bug, or error
