# Groups and alternations

What if we want to specify how many times a string of several different characters should occur? In this case, we need parentheses `()`. Parentheses in regular expressions group the desired parts of a template into single units so that they can be processed together. Let's discuss how to use them!

In this topic, we'll also come across the OR operator of regular expressions (an alternation), represented by the vertical bar `|`.


## Groups

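By default, a quantifier in a template applies only to the character (or character class) right before it. Take `h[ao]{2}` as an example: the quantifier requires two characters from the set `[ao]` (such as "aa" or "ao"), while `h` can occur only once. To repeat a longer substring, wrap it in parentheses; a quantifier placed after the group then applies to the whole group: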

```python
import re

template = r"(h[ao]){2}"  # matches a string consisting of two "ha" or "ho" units
re.match(template, "haha")  # a match
re.match(template, "hoha")  # a match
re.match(template, "haa")  # no match
re.match(template, "hho")  # no match
```
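
To see the difference a group makes, here is a small sketch that checks the ungrouped and grouped templates against the same strings. It uses `re.fullmatch`, which, unlike `re.match`, succeeds only if the entire string fits the template; the sample strings are just illustrations.

```python
import re

ungrouped = r"h[ao]{2}"  # "h" once, then two characters from [ao]
grouped = r"(h[ao]){2}"  # the unit "h" + one of [ao], repeated twice

for text in ["hao", "haa", "haha", "hoha"]:
    # fullmatch succeeds only if the entire string fits the template
    print(text, re.fullmatch(ungrouped, text) is not None, re.fullmatch(grouped, text) is not None)
```

This prints `True False` for "hao" and "haa" and `False True` for "haha" and "hoha": without the group, `h` cannot repeat.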

You can apply any quantifier to a group; the syntax stays the same. For example, the question mark quantifier `?` marks a substring as optional: the group then matches either one occurrence of its contents or none:

```python
import re

template = r"ha(\?!)?"  # we expect "?!" to occur together and in this exact order
re.match(template, "ha?!")  # a match
re.match(template, "ha")  # a match
# if "?" or "!" occurs on its own, the group doesn't match it
re.match(template, "ha?")  # matches only "ha", not "?", since no "!" follows the "?"
re.match(template, "ha!")  # matches only "ha", not "!", since no "?" precedes the "!"
```

```
<re.Match object; span=(0, 2), match='ha'>
```
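
Since `re.match` only needs the beginning of the string to fit the template, it helps to check exactly what was matched. A minimal sketch that prints the matched text with the match object's `group()` method (called with no argument, it returns the whole match) and compares it with `re.fullmatch`:

```python
import re

template = r"ha(\?!)?"

for text in ["ha?!", "ha", "ha?", "ha!"]:
    m = re.match(template, text)  # matches at the start of the string
    full = re.fullmatch(template, text)  # requires the entire string to match
    print(repr(text), "->", m.group(), "| full match:", full is not None)
```

For "ha?" and "ha!", `group()` returns just "ha" and there is no full match, because the optional group is skipped entirely.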

## Nested groups

Groups can also be nested: you can put a group inside another group to repeat smaller substrings within larger ones. The template below matches one or more consecutive blocks, each consisting of two letter-digit pairs (an uppercase letter followed by a digit, such as `A0` or `C3`) and a closing dot:

```python
template = r"(([A-Z]\d){2}\.)+"
re.match(template, "A0C3.B8K5.")  # a match
re.match(template, "A0C3.")  # a match
re.match(template, "A0.C3B8K5")  # no match: the first dot comes after only one letter-digit pair
re.match(template, "A0.C3.B8K5")  # no match: the string starts with a single pair, "A0", followed by a dot
```
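
The difference between `re.match` and `re.fullmatch` matters here too: `re.match` is satisfied by a matching prefix, so a trailing pair block without a dot is simply left out of the match. A short sketch with illustrative strings:

```python
import re

template = r"(([A-Z]\d){2}\.)+"

for text in ["A0C3.B8K5.", "A0C3.B8K5"]:
    m = re.match(template, text)  # prefix match at the start of the string
    full = re.fullmatch(template, text) is not None  # whole-string match
    print(text, "->", m.group() if m else None, "| full match:", full)
```

For "A0C3.B8K5", `re.match` returns only "A0C3.", and `re.fullmatch` fails because "B8K5" is not followed by a dot.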

## Alternations

Often, a pattern we'd like to match contains alternative substrings: sometimes one, sometimes another. For example, a web address can end in `.com`, `.org`, `.net`, and so on. We can match several domain types with one template by using `|`.

`|` is the OR operator in regular expressions. If you separate alternative substrings with vertical bars, the template matches any one of them. Take a look:

```python
template = r"python|java|kotlin"
re.match(template, "python")  # a match
re.match(template, "java")  # a match
re.match(template, "kotlin")  # a match
re.match(template, "c++")  # no match
re.match(template, "k")  # no match
re.match(template, "jav")  # no match
```
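
Going back to the web address example, here is a sketch of a template for a few domain types. The pattern `[a-z]+\.(com|org|net)` is a simplified assumption for illustration only (real domain names allow many more characters). The second part shows that Python tries alternatives from left to right and stops at the first one that works, so order matters when one alternative is a prefix of another.

```python
import re

# simplified assumption: lowercase letters, a dot, then one of three domain types
domain = r"[a-z]+\.(com|org|net)"
print(re.fullmatch(domain, "example.org") is not None)  # True
print(re.fullmatch(domain, "example.io") is not None)  # False

# alternatives are tried left to right; the first one that matches wins
print(re.match(r"java|javascript", "javascript").group())  # java
print(re.match(r"javascript|java", "javascript").group())  # javascript
```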

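## Combining groups and alternations

Alternation has the lowest precedence in regular expressions, so `|` splits the whole template unless a group limits its scope. For instance, to find the strings "python course", "kotlin course", "python lesson", and "kotlin lesson", we put each set of alternatives in its own group: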

```python
template = r"(python|kotlin) (course|lesson)"
re.match(template, "kotlin")  # no match
re.match(template, "lesson")  # no match
re.match(template, "python lesson")  # a match
re.match(template, "kotlin course")  # a match
```

```
<re.Match object; span=(0, 13), match='kotlin course'>
```
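
To see why the parentheses are needed, compare the template with an ungrouped version. Without groups, `python|kotlin course|lesson` means "python" OR "kotlin course" OR "lesson", which is not what we want. A small sketch using `re.fullmatch` on illustrative strings:

```python
import re

ungrouped = r"python|kotlin course|lesson"  # three alternatives: "python", "kotlin course", "lesson"
grouped = r"(python|kotlin) (course|lesson)"  # a language, a space, then a material type

for text in ["python", "lesson", "python lesson", "kotlin lesson"]:
    print(repr(text), re.fullmatch(ungrouped, text) is not None, re.fullmatch(grouped, text) is not None)
```

The ungrouped template accepts "python" and "lesson" on their own but rejects "python lesson" and "kotlin lesson"; the grouped template does the opposite.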