Small detail on capturing all capitalized words #24

Closed
@akonsta

In your code you use the regex
r'[A-Z][a-z0-9]+'
as the pattern, but that would not count capitalized one-letter words (e.g., I or A) or all-caps words and abbreviations (e.g., USA, STOP), and it would only partially match capitalized words with apostrophes (e.g., Don't, Isn't, O'Leary) or with internal capitals (e.g., McKnight). There are certainly much more complicated ways to write the pattern, but I would suggest
r"[A-Z][A-Za-z0-9']*"
(double-quoted so that the apostrophe does not end the string literal). It is not a big deal, but I thought I would mention it. This pattern might capture unintended strings, for example list counters like '(A)' and '(B)', variables that show up in equations like X + Y, and non-word strings like the UK postal codes EC1 and W8, but I am of the school of thought that I would rather have more data than less.
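
To make the difference concrete, here is a minimal sketch that runs both patterns over one sentence. The sample text and the variable names are made up for illustration and are not taken from your code, and it assumes plain ASCII apostrophes (a curly ’ would still split a word).

```python
import re

# Pattern currently used in the code (as quoted above).
ORIGINAL = re.compile(r"[A-Z][a-z0-9]+")
# Suggested pattern; double quotes so the apostrophe can sit inside the class.
SUGGESTED = re.compile(r"[A-Z][A-Za-z0-9']*")

# Hypothetical sample sentence covering the cases mentioned above.
sample = "I told O'Leary that McKnight left the USA. Don't STOP at (A) or EC1."

original_hits = ORIGINAL.findall(sample)
suggested_hits = SUGGESTED.findall(sample)

print(original_hits)   # ['Leary', 'Mc', 'Knight', 'Don', 'C1']
print(suggested_hits)  # ['I', "O'Leary", 'McKnight', 'USA', "Don't", 'STOP', 'A', 'EC1']
```

The first list shows the misses and partial matches (no I, USA, or STOP; McKnight split in two), while the second also shows the trade-off: the list counter A and the postal code EC1 are picked up as well.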
