Description
Feature Request: Detect and Flag Non-ASCII Characters in Identifiers
Summary
Add a StyleCop rule (or rules) to detect and flag identifiers containing non-ASCII characters (e.g., Greek or Cyrillic letters) that can be visually indistinguishable from standard Latin letters.
Use Case / Motivation
When coding in C#, developers sometimes inadvertently switch keyboard layouts (e.g., to Greek) and end up typing characters that look identical to standard Latin letters but are actually different Unicode code points. For instance:
```csharp
public interface ΙMyService // 'Ι' here is Greek capital Iota (U+0399)
{
    // ...
}

public class IMyService : ΙMyService // Compiles, but 'I' here is Latin (U+0049): a different name that merely looks the same
{
    // ...
}
```
It’s easy to spend time troubleshooting odd compile errors or mismatched references, only to discover that a single character comes from the wrong alphabet.
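When that happens, printing each character's code point is a quick way to spot the impostor. A minimal sketch; `CodePointDump` is a hypothetical helper, not part of StyleCop:

```csharp
using System.Text;

public static class CodePointDump
{
    // Lists every UTF-16 code unit of the name as "U+XXXX", separated by spaces.
    public static string Describe(string identifier)
    {
        var sb = new StringBuilder();
        foreach (char c in identifier)
        {
            if (sb.Length > 0)
            {
                sb.Append(' ');
            }

            sb.Append("U+").Append(((int)c).ToString("X4"));
        }

        return sb.ToString();
    }
}
```

For example, `CodePointDump.Describe("\u0399MyService")` begins with `U+0399`, whereas `IMyService` typed with a Latin I begins with `U+0049`.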
A StyleCop rule that flags these occurrences would provide immediate feedback to developers, preventing such subtle bugs.
Proposed Solution
- New Rule:
  - ID: Suggest something like `SA????` (whatever fits StyleCop’s numbering scheme).
  - Name: “IdentifiersMustUseAsciiCharacters”
  - Category: “Naming” or “Maintainability”
  - Severity: Configurable; default to Warning.
- Behavior:
  - For each identifier (class, interface, method, property, field, local variable, parameter, etc.), scan the text for any character outside the ASCII range (`> 0x7F`).
  - If found, report a diagnostic indicating which identifier is problematic.
- Configuration:
  - Let users choose whether to disallow all non-ASCII characters or only characters from scripts with known homoglyphs (e.g., Greek, Cyrillic).
  - Possibly allow specific characters to be ignored when they are needed for legitimate non-English names (though that might be out of scope for a first pass).
- Rationale:
  - This rule prevents confusion caused by visually identical but semantically different characters, saving time and reducing friction during development.
  - Many teams adopt “English-only identifiers” as a best practice to avoid these pitfalls, so providing built-in enforcement aligns with real-world usage.
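The scan described under Behavior, combined with the two Configuration modes and an allowlist, could look roughly like this sketch. The enum values, class name, and method names are hypothetical, and "known homoglyph scripts" is approximated here by the Greek and Coptic block (U+0370..U+03FF) plus the Cyrillic block (U+0400..U+04FF):

```csharp
using System.Collections.Generic;

// Hypothetical setting values; the proposal does not name them.
public enum NonAsciiMode
{
    DisallowAllNonAscii,           // flag any character above 0x7F
    DisallowKnownHomoglyphScripts, // flag only Greek and Cyrillic letters
}

public static class NonAsciiIdentifierCheck
{
    // Greek and Coptic (U+0370..U+03FF) and Cyrillic (U+0400..U+04FF)
    // are adjacent blocks, so one range check covers both.
    private static bool IsGreekOrCyrillic(char c) => c >= '\u0370' && c <= '\u04FF';

    // Returns the index and character of every UTF-16 code unit that violates the chosen mode.
    public static List<(int Index, char Character)> FindViolations(
        string identifier, NonAsciiMode mode, ISet<char> allowed)
    {
        var result = new List<(int Index, char Character)>();
        for (int i = 0; i < identifier.Length; i++)
        {
            char c = identifier[i];
            if (c <= 0x7F || allowed.Contains(c))
            {
                continue; // plain ASCII, or explicitly allowed by the team
            }

            if (mode == NonAsciiMode.DisallowAllNonAscii || IsGreekOrCyrillic(c))
            {
                result.Add((i, c));
            }
        }

        return result;
    }

    // Formats a character as "U+XXXX" so a diagnostic can name the offending code point.
    public static string ToCodePoint(char c) => $"U+{(int)c:X4}";
}
```

With `DisallowKnownHomoglyphScripts`, a name like `café` passes while `ΙMyService` is flagged at index 0 (`U+0399`). Block-based matching misses lookalikes from other ranges (for example fullwidth Latin letters), which the stricter `DisallowAllNonAscii` mode does catch. Characters outside the Basic Multilingual Plane appear as two surrogate entries in strict mode.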
Potential Implementation Details
- Roslyn:
  - A SyntaxNode or SyntaxToken analysis hooking into `SyntaxKind.IdentifierToken`.
  - Perform a quick check:
    ```csharp
    foreach (char c in identifierText)
    {
        if (c > 127)
        {
            // Report diagnostic, then stop scanning this identifier.
            break;
        }
    }
    ```
- Message:
  - Something like: “Identifier `{0}` contains non-ASCII characters and may cause confusion.”
- Example:
  The analyzer would produce a warning explaining that the identifier uses a non-ASCII character.
  ```csharp
  public void ΜyMethod() // This 'Μ' might be Greek capital Mu
  {
  }
  ```
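Fleshing out the Roslyn approach above, a minimal analyzer might look like the following sketch. It assumes a project that references the `Microsoft.CodeAnalysis.CSharp` NuGet package; the ID `SA0000`, the class name, and the category string are placeholders, not StyleCop's real values. It checks each token's `ValueText` rather than its `Text`, so a name written with a Unicode escape such as `\u0399MyService` is caught as well.

```csharp
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Diagnostics;

// Sketch only: ID, title, and category are placeholders, not StyleCop's actual values.
[DiagnosticAnalyzer(LanguageNames.CSharp)]
public sealed class IdentifiersMustUseAsciiCharactersAnalyzer : DiagnosticAnalyzer
{
    public const string DiagnosticId = "SA0000"; // hypothetical; real ID per StyleCop's numbering

    private static readonly DiagnosticDescriptor Rule = new DiagnosticDescriptor(
        id: DiagnosticId,
        title: "Identifiers must use ASCII characters",
        messageFormat: "Identifier '{0}' contains non-ASCII characters and may cause confusion",
        category: "StyleCop.CSharp.NamingRules",
        defaultSeverity: DiagnosticSeverity.Warning,
        isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics => ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
        context.EnableConcurrentExecution();
        context.RegisterSyntaxTreeAction(AnalyzeTree);
    }

    private static void AnalyzeTree(SyntaxTreeAnalysisContext context)
    {
        SyntaxNode root = context.Tree.GetRoot(context.CancellationToken);

        foreach (SyntaxToken token in root.DescendantTokens())
        {
            if (!token.IsKind(SyntaxKind.IdentifierToken))
            {
                continue;
            }

            // ValueText is the decoded name, so escaped non-ASCII letters are seen too.
            foreach (char c in token.ValueText)
            {
                if (c > 0x7F)
                {
                    context.ReportDiagnostic(Diagnostic.Create(Rule, token.GetLocation(), token.ValueText));
                    break; // one diagnostic per identifier occurrence
                }
            }
        }
    }
}
```

Because every identifier token is visited, this version reports usages as well as declarations; narrowing it to declaration sites (for example with a symbol action) would reduce repeated warnings for the same name.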
Benefits
- Immediate Feedback: Prevents confusion from near-homoglyphs that can break references or cause subtle bugs.
- Aligns with Common Practices: Many coding standards advise using only ASCII for public-facing identifiers.
- Minimal Overhead: Implementation is straightforward (simple character check).
- Highly Configurable: Could provide toggles or whitelists for teams who need exceptions.
Possible Downsides or Considerations
- Legitimate Use of Non-ASCII: In some projects, non-English words or domain-specific terminology might be intentionally used. A global rule might cause false positives.
  - Mitigation: Provide `.editorconfig` or rule settings so users can suppress the rule for specific code blocks or allow specific characters.
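Beyond rule-specific settings, the standard C# suppression mechanisms would apply to such a rule as to any other analyzer diagnostic, and its severity could be tuned project-wide in `.editorconfig` through the usual `dotnet_diagnostic.<ID>.severity` key. A sketch assuming the hypothetical ID `SA0000` and rule name used above: `#pragma warning` opts out a block, while `SuppressMessageAttribute` opts out a single member.

```csharp
using System.Diagnostics.CodeAnalysis;

public static class Geometry
{
    // Opt a deliberately Greek identifier out of the hypothetical SA0000 rule for this block only.
#pragma warning disable SA0000
    public static double Δx(double x1, double x2) => x2 - x1;
#pragma warning restore SA0000

    // Member-level suppression; category and check ID mirror the placeholder analyzer values.
    [SuppressMessage("StyleCop.CSharp.NamingRules", "SA0000:IdentifiersMustUseAsciiCharacters", Justification = "Domain notation")]
    public static double Σ(double[] values)
    {
        double total = 0;
        foreach (double v in values)
        {
            total += v;
        }

        return total;
    }
}
```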
Thank you for all the great work on StyleCop Analyzers. We’d love to see this feature to help developers avoid tricky Unicode/homoglyph issues in their day-to-day C# projects.