In your code you use the regex
r'[A-Z][a-z0-9]+'
as the pattern, but that would not count capitalized one-letter words (i.e., I or A), it would cut off capitalized words with apostrophes (e.g., Don't, Isn't, O'Leary), and it would also miss or split words and abbreviations that have more than one capital (e.g., USA, STOP and McKnight). There are certainly more complicated ways to write the pattern, but I would suggest the pattern
r'[A-Z][A-Za-z0-9']*'
It is not a big deal, but I thought I would mention it. The suggested pattern might capture unintended strings (e.g., lists with letter counters like '(A)', '(B)', etc.; variables that show up in equations like X + Y; non-word strings like UK postal codes such as EC1, W8, etc.), but I am of the school of thought that I would rather have more data than less.
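To make the difference concrete, here is a minimal sketch that runs both patterns over the same text with Python's standard `re` module. The sample sentences and the helper name `capitalized_tokens` are made up for illustration and are not taken from the project's code.

```python
import re

# Pattern currently in the code vs. the pattern suggested in this issue.
ORIGINAL = re.compile(r"[A-Z][a-z0-9]+")
SUGGESTED = re.compile(r"[A-Z][A-Za-z0-9']*")


def capitalized_tokens(pattern, text):
    """Return every non-overlapping match of `pattern` in `text` (hypothetical helper)."""
    return pattern.findall(text)


# Illustrative sample covering the cases listed above.
sample = "I told O'Leary that McKnight Isn't in the USA"

original_hits = capitalized_tokens(ORIGINAL, sample)
suggested_hits = capitalized_tokens(SUGGESTED, sample)

print(original_hits)   # ['Leary', 'Mc', 'Knight', 'Isn']
print(suggested_hits)  # ['I', "O'Leary", 'McKnight', "Isn't", 'USA']

# The trade-off mentioned above: list counters, variables and postal codes also match.
noise = "(A) X + Y at EC1"
noise_hits = capitalized_tokens(SUGGESTED, noise)
print(noise_hits)      # ['A', 'X', 'Y', 'EC1']
```

With the original pattern, I and USA are dropped entirely, McKnight is counted as two words, and O'Leary and Isn't are cut off at the apostrophe. The suggested pattern returns each of them whole, at the cost of the extra matches shown in the last line.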