## Regexp sets and ranges

# Sets

While the dot matches almost any character, sets let us be more specific in our regexp templates and narrow down the scope of our search. Each set in a regular expression takes the place of exactly one character in the string, but it defines a whole group of characters that can match it. These characters are listed inside square brackets, []:

template = '[bd]a[td]'

In the template above, we have defined two sets. The first one matches either b or d in the string, and the second one matches either t or d. Here are the results for some of the possible strings:

In [None]:
import re

re.match(template, 'bat')  # match
re.match(template, 'dad')  # match
re.match(template, 'cat')  # no match: 'c' is not in the first set
re.match(template, 'dot')  # no match: 'o' instead of 'a'
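
Since `re.match` returns a match object on success and `None` otherwise, the four results above can be checked in one pass. This is only a minimal sketch; the helper name `matches` is made up for illustration and is not part of the `re` module.

```python
import re

template = '[bd]a[td]'

def matches(pattern, string):
    """Return True if the pattern matches at the start of the string."""
    return re.match(pattern, string) is not None

# Map each test word to True (match) or False (no match)
results = {word: matches(template, word) for word in ['bat', 'dad', 'cat', 'dot']}
print(results)
```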

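An empty pair of brackets does not produce an empty set. A right square bracket that comes first is treated as a literal character, so the set is never closed and the template fails to compile:

In [None]:
re.match('c[]at', 'cat')  # re.error: unterminated character set (older Python versions report "unexpected end of regular expression")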


An unescaped left square bracket with no matching unescaped right square bracket causes the same error:

In [None]:
re.match('[', '[')  # re.error: unterminated character set (older Python versions report "unexpected end of regular expression")
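
Because these errors are raised when the template is compiled, they can be caught with `re.error`. The sketch below assumes a hypothetical helper, `compile_error`, that returns the error message for an invalid template and `None` for a valid one. The exact wording of the message depends on the Python version, so no output is shown.

```python
import re

def compile_error(pattern):
    """Return the error message if the pattern is invalid, else None."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

# The first two templates are invalid; an escaped bracket and a closed set are fine
for pattern in ['c[]at', '[', r'\[', '[bd]']:
    print(repr(pattern), '->', compile_error(pattern))
```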


By the way, good news, everyone! There is (almost) no need for boring escaping stuff when we use sets in regexp.



# Escaping in sets


Sets in regular expressions have a sort of superpower: they automatically "neutralize" the metacharacters listed inside the square brackets, turning them into regular characters. This way, the dot and the question mark, for example, do not have to be escaped if they are part of a regexp set:



In [None]:
template = 'Hodor[?.]'
re.match(template, 'Hodor?')  # match
re.match(template, 'Hodor.')  # match
re.match(template, 'Hodor!')  # no match


The metacharacters that do not fall under this rule and keep their special status are, predictably, the right square bracket ] and the backslash \ (the dash and the caret also play special roles inside a set; they are covered below). The right square bracket should be escaped to show that it is a part of the set, not the metacharacter denoting its end:

In [None]:
template = r'=[\]]'
re.match(template, '=]')  # match

# Here the set closes at the first ], so the second ] is a literal character after the set
template = r'=[)]]'
re.match(template, '=]')  # no match
re.match(template, '=)]')  # match (the only string this template can match)
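
The backslash gets the same treatment: inside a set it still starts an escape sequence, so a literal backslash is written as `\\` in a raw string. A small sketch, using a made-up path-like string purely as an illustration:

```python
import re

# [\\/] is a set of two characters: a literal backslash and a forward slash
template = r'C:[\\/]temp'

backslash_ok = bool(re.match(template, r'C:\temp'))  # backslash is in the set
slash_ok = bool(re.match(template, 'C:/temp'))       # forward slash is in the set
dash_ok = bool(re.match(template, 'C:-temp'))        # dash is not in the set

print(backslash_ok, slash_ok, dash_ok)
```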

# Ranges
A key feature of sets is that you can not only list characters individually but also use ranges of characters. A range is written with a dash -. For example, if you want your set to match every letter from a to z, you do not have to list the whole alphabet; you can simply write [a-z].

In [None]:
re.match('ja[a-z].', 'jazz')  # match
re.match('[A-Z]ill', 'kill')  # no match: [A-Z] matches only uppercase letters
re.match('[A-Z]ill', 'Bill')  # match

In [None]:
re.match('[0-9]', '7')   # match
re.match('[0-9]', '07')  # match
re.match('[1-9]', '07')  # no match

In [None]:
re.match('love [a-zA-Z]', 'love U')  # match: [a-zA-Z] matches both uppercase and lowercase
re.match('love [a-z!A-Z]', 'love !')  # match: [a-z!A-Z] matches letters and !

In [None]:
re.match('[A-Z]bermensch', 'Übermensch')  # no match: Ü is not within A-Z range
re.match('[À-Ý]bermensch', 'Übermensch')  # match
re.match('[À-Ý]bermensch', '×bermensch')  # match: × is within À-Ý range

In [None]:
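re.match('[-1-9]1', '-1')  # match: a dash at the start of the set is a literal dash
re.match('[1-9-]1', '-1')  # match: a dash at the end of the set is a literal dash too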
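
A range covers every character whose code point lies between its two endpoints, which is why × falls inside [À-Ý] in the example above. The sketch below compares a plain code point check with the regexp result for three characters; the characters are written as escapes only so that the comparison is unambiguous.

```python
import re

candidates = ['\u00dc', '\u00d7', '\u00df']   # Ü, ×, ß
low, high = '\u00c0', '\u00dd'                # À, Ý

# Single characters compare by code point, just like the endpoints of a range
by_code_point = [low <= ch <= high for ch in candidates]

# The same check expressed as a regexp range
by_regexp = [re.match(f'[{low}-{high}]', ch) is not None for ch in candidates]

print(by_code_point)
print(by_regexp)
```

Both lists agree: Ü and × are inside the range, while ß lies just past Ý and is not.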


# Exclusion of characters
The caret ^ (aka the hat) is also a set-specific metacharacter: when it is placed as the first character in the set, the set specifies the characters you do not want to see in the string. Any character that is not a part of such a set will match it:



In [1]:
re.match('[^A-Z]ond', 'Bond')  # no match: 'B' is an uppercase letter
re.match('Bon[^A-Z]', 'Bond')  # match: 'd' is not an uppercase letter



In [None]:
re.match('[A-Z^]ames', 'James')  # match
re.match('[A-Z^]ames', '^ames')  # match: a caret that is not first in the set is a literal character
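
One detail worth checking is that a negated set still takes the place of exactly one character, so it cannot match "nothing". A short sketch with the same template as above:

```python
import re

template = 'Bon[^A-Z]'

with_letter = re.match(template, 'Bond')   # 'd' is not an uppercase letter
with_digit = re.match(template, 'Bon7')    # '7' is not an uppercase letter either
too_short = re.match(template, 'Bon')      # no character left for the set to match

print(with_letter is not None, with_digit is not None, too_short is not None)
```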

In [9]:
import re

def is_suitable_name(name):
    # The pattern requires a first letter from B to N, followed by
    # zero or more vowels; re.match only checks the start of the name
    return bool(re.match('[B-N][aeiouAEIOU]*', name))

# Get input from the user
name = input()
# Check if the name is suitable and print the result
if is_suitable_name(name):
    print("Suitable!")


Suitable!


Our next task is to write a program that matches hyphenated words, such as:

- twenty-one
- long-term
- co-worker
- well-known


It should print True if the word is written with a dash, and False otherwise.

In [15]:
import re

def has_hyphen(word):
    # One or more letters, a hyphen, then one or more letters, and nothing else
    pattern = r'^[A-Za-z]+-[A-Za-z]+$'

    # re.match returns a match object or None; bool() turns that into True or False
    return bool(re.match(pattern, word))

# Get input from the user
word = input()

# Print True if the word is hyphenated, False otherwise
print(has_hyphen(word))



True
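
To try the function without typing input, it can be run over the sample words from the task. This is only a sketch: the last three words are extra test cases made up here, not part of the original task, and they show what this particular pattern rejects.

```python
import re

def has_hyphen(word):
    # Same pattern as above: letters, one hyphen, letters, and nothing else
    return bool(re.match(r'^[A-Za-z]+-[A-Za-z]+$', word))

samples = ['twenty-one', 'long-term', 'co-worker', 'well-known',
           'hello', 'mother-in-law', 'e-mail2']

checks = {word: has_hyphen(word) for word in samples}
for word, result in checks.items():
    print(word, result)
```

Note that the pattern allows exactly one hyphen and letters only, so a word with two hyphens or a digit is reported as False. Whether that is desired depends on how the task defines a hyphenated word.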
