New Rule: disallow unicode confusable identifiers #117

@mhofman

Description

Rule details

Compute the Unicode skeleton of each declared identifier and disallow the declaration if its skeleton matches that of a different identifier already in scope.

Related CVE

CVE-2021-42694

Example code

const loremIpsum = "latin only";
const lоrеmIрsum = "with Cyrillic ";
const lorem‍Ipsum = "with ZWJ";

Participation

  • I am willing to submit a pull request to implement this rule.

Additional comments

The Zero-Width Joiner (\u200d) is a valid identifier character, even though some parsers, such as the ones used by TypeScript or Webpack, fail to parse it correctly.
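A quick way to check the ZWJ claim on a given engine is to let the engine parse such an identifier itself. The sketch below is only illustrative: `evaluatesWithZwj` and `isIdentifierName` are hypothetical helper names, and the regular expression is an approximation of the ECMAScript `IdentifierName` grammar that ignores `\uXXXX` escape sequences inside identifiers.

```javascript
// U+200D (ZWJ) is written as an escape so the invisible character is
// visible here; the string itself contains the real character.
const source =
  'const lorem\u200dIpsum = "with ZWJ"; return lorem\u200dIpsum;';

// Hypothetical helper: asks the engine's own parser to compile the snippet.
function evaluatesWithZwj() {
  try {
    return new Function(source)() === "with ZWJ";
  } catch (err) {
    // A SyntaxError here would mean this parser rejects ZWJ in identifiers.
    return false;
  }
}

// Approximation of IdentifierName: ID_Start (or $ or _) first, then
// ID_Continue, $, ZWNJ (U+200C) or ZWJ (U+200D). Escapes are not handled.
const identifierName = /^[\p{ID_Start}$_][\p{ID_Continue}$\u200c\u200d]*$/u;

// Hypothetical helper: static check that does not execute any code.
function isIdentifierName(name) {
  return identifierName.test(name);
}

console.log(evaluatesWithZwj());
console.log(isIdentifierName("lorem\u200dIpsum"));
```

Note that the joiner is only accepted after the first character, so a name that starts with ZWJ is rejected by the pattern.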

The Cyrillic characters in the example code are one case of Unicode characters that are confusable with Latin ones, but there are many other possibilities, including confusion between non-Latin characters. Unicode defines an algorithm (in UTS #39, Unicode Security Mechanisms) to compute the skeleton of a string. We could apply it to identifiers and base the comparison on the skeleton instead of the raw identifier string.
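To make the idea concrete, here is a rough sketch of a skeleton-based comparison. It is not a full implementation: `PROTOTYPES` is a hand-picked three-entry subset standing in for the complete `confusables.txt` data, `skeleton` and `findConfusable` are hypothetical names, and the step that drops default-ignorable code points (which covers ZWJ) is assumed here to follow recent revisions of UTS #39.

```javascript
// Hand-picked subset of confusable mappings, for illustration only.
// A real rule would load the full confusables.txt data from UTS #39.
const PROTOTYPES = new Map([
  ["\u043e", "o"], // CYRILLIC SMALL LETTER O
  ["\u0435", "e"], // CYRILLIC SMALL LETTER IE
  ["\u0440", "p"], // CYRILLIC SMALL LETTER ER
]);

const DEFAULT_IGNORABLE = /\p{Default_Ignorable_Code_Point}/u;

// Sketch of the skeleton transform: NFD, drop ignorables,
// map each code point to its prototype, then NFD again.
function skeleton(identifier) {
  let mapped = "";
  for (const ch of identifier.normalize("NFD")) { // iterates by code point
    if (DEFAULT_IGNORABLE.test(ch)) continue;     // drops ZWJ, ZWNJ, ...
    mapped += PROTOTYPES.get(ch) ?? ch;
  }
  return mapped.normalize("NFD");
}

// Hypothetical helper: names already in scope that differ from `name`
// as strings but share its skeleton.
function findConfusable(name, namesInScope) {
  const target = skeleton(name);
  return namesInScope.filter(
    (other) => other !== name && skeleton(other) === target
  );
}

const inScope = ["loremIpsum", "unrelated"];
console.log(findConfusable("l\u043er\u0435mI\u0440sum", inScope));
console.log(findConfusable("lorem\u200dIpsum", inScope));
```

Comparing skeletons rather than raw strings means a single check covers both the Cyrillic and the ZWJ variants from the example code, and redeclaring the exact same name is left for other rules to report.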

First reported in eslint/eslint#15240 (comment)
