# Grouping and Alternators

We are going to learn how to group patterns using parentheses, use alternators to switch between patterns, and use prefixes and suffixes to qualify a match by what surrounds it.

## Grouping

You might have already encountered a need to quantify an entire series of patterns in a regular expression and not just one. For instance, let's say you wanted to match the words "Support" and "Supported". We could group the two literals `ed` in parentheses and follow them with an optional quantifier: `(ed)?`.

In [None]:
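from re import fullmatch

# "Support" matches because the group (ed) is optional.
fullmatch(pattern="Support(ed)?", string="Support") is not None

In [None]:
# "Supported" matches with the optional group present.
fullmatch(pattern="Support(ed)?", string="Supported") is not None

In [None]:
# "Supportability" does not match: the pattern has nothing to cover "ability".
fullmatch(pattern="Support(ed)?", string="Supportability") is not None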
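To see why the parentheses matter, here is a small sketch comparing the grouped pattern with an ungrouped variant, `Supported?`, which is not part of the examples above and is used only for contrast. Without the group, the `?` applies only to the single character before it.

```python
from re import fullmatch

# Without parentheses, `?` applies only to the character right before it (the "d").
ungrouped_e = fullmatch(pattern="Supported?", string="Supporte") is not None
ungrouped_base = fullmatch(pattern="Supported?", string="Support") is not None

# With parentheses, `?` makes the whole "ed" sequence optional.
grouped_e = fullmatch(pattern="Support(ed)?", string="Supporte") is not None
grouped_base = fullmatch(pattern="Support(ed)?", string="Support") is not None

print(ungrouped_e, ungrouped_base)  # True False
print(grouped_e, grouped_base)      # False True
```

The ungrouped pattern accepts "Supporte" and rejects "Support", which is the opposite of what we wanted.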

What if we wanted to match an uppercase letter followed by a digit, and repeat that pattern an indefinite number of times? We can do that by putting the pattern in parentheses followed by a `+`, which quantifies the whole expression one or more times.

In [None]:
# A single letter-digit pair matches.
fullmatch(pattern="([A-Z][0-9])+", string="A2") is not None

In [None]:
# Four consecutive letter-digit pairs also match.
fullmatch(pattern="([A-Z][0-9])+", string="F4W9F3W6") is not None
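It can also help to check a few strings that should not match. The sketch below runs the same pattern over a handful of candidates; the extra test strings are made up for illustration.

```python
from re import fullmatch

pattern = "([A-Z][0-9])+"

# A mix of valid strings and near misses (illustrative inputs).
candidates = ["A2", "F4W9F3W6", "A", "A22", "a2", ""]

# Map each candidate to whether the whole string matches the pattern.
results = {s: fullmatch(pattern=pattern, string=s) is not None for s in candidates}

for s, matched in results.items():
    print(repr(s), matched)
```

Only "A2" and "F4W9F3W6" match. "A22" fails because the trailing "2" cannot start another letter-digit pair, and the empty string fails because `+` requires at least one repetition.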

Here is a final example matching a US phone number of 10 digits with optional hyphens, where the area code (the first three digits) is optional.

In [None]:
# All 10 digits with no hyphens.
fullmatch(pattern="([0-9]{3}-?)?[0-9]{3}-?[0-9]{4}", string="4803718745") is not None

In [None]:
# No area code, which the optional group allows.
fullmatch(pattern="([0-9]{3}-?)?[0-9]{3}-?[0-9]{4}", string="371-8745") is not None

In [None]:
# Area code and both hyphens present.
fullmatch(pattern="([0-9]{3}-?)?[0-9]{3}-?[0-9]{4}", string="480-371-8745") is not None
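As a rough check of where this pattern draws the line, the sketch below tries a few more inputs, including some that should be rejected. The additional phone strings are invented for illustration.

```python
from re import fullmatch

phone_pattern = "([0-9]{3}-?)?[0-9]{3}-?[0-9]{4}"

# Illustrative inputs: mixed hyphen usage, a missing area code, and malformed numbers.
samples = ["480371-8745", "3718745", "480-371-87", "(480) 371-8745", "48-0371-8745"]

phone_results = {s: fullmatch(pattern=phone_pattern, string=s) is not None for s in samples}

for s, matched in phone_results.items():
    print(s, matched)
```

The first two strings match because every hyphen and the area code group are optional. The last three fail: one is too short, one contains characters the pattern never mentions, and one has a hyphen in a position the pattern does not allow.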

Remember to always read a regular expression from left to right, and pay attention to groupings of patterns in parentheses: a quantifier that follows a group repeats the whole sequence of patterns inside it.

## Alternators

Another useful operator is the alternator `|`, which allows us to switch between two or more patterns. Think of it as an `OR` in a regular expression. Below we match the simple literal strings "ALPHA", "BETA", "GAMMA", and "DELTA" using the alternator `|`. The regular expression will only match these four strings.

In [None]:
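# "ALPHA" is one of the four alternatives.
fullmatch(pattern="ALPHA|BETA|GAMMA|DELTA", string="ALPHA") is not None

In [None]:
# "DELTA" is one of the four alternatives.
fullmatch(pattern="ALPHA|BETA|GAMMA|DELTA", string="DELTA") is not None

In [None]:
# "EPSILON" is not listed, so it does not match.
fullmatch(pattern="ALPHA|BETA|GAMMA|DELTA", string="EPSILON") is not None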

Here is another example where we match any string that starts with either two digits or "ZZ", followed by a hyphen and then the string "FOXTROT".

In [None]:
# Two digits, then "-FOXTROT".
fullmatch(pattern="([0-9]{2}|ZZ)-FOXTROT", string="12-FOXTROT") is not None

In [None]:
# "ZZ", then "-FOXTROT".
fullmatch(pattern="([0-9]{2}|ZZ)-FOXTROT", string="ZZ-FOXTROT") is not None

In [None]:
# "AB" is neither two digits nor "ZZ", so this does not match.
fullmatch(pattern="([0-9]{2}|ZZ)-FOXTROT", string="AB-FOXTROT") is not None

You will find alternators are often used inside a group because it is common to switch between two or more patterns at a certain place in the regular expression. 
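The grouping matters because `|` splits everything on either side of it unless parentheses limit its reach. The sketch below compares the grouped pattern with an ungrouped variant, which is an assumption introduced here only to show the difference.

```python
from re import fullmatch

# Ungrouped: the alternatives are "[0-9]{2}" OR "ZZ-FOXTROT".
ungrouped = "[0-9]{2}|ZZ-FOXTROT"
# Grouped: the alternatives are "[0-9]{2}" OR "ZZ", and "-FOXTROT" always follows.
grouped = "([0-9]{2}|ZZ)-FOXTROT"

ungrouped_digits_only = fullmatch(pattern=ungrouped, string="12") is not None
ungrouped_full = fullmatch(pattern=ungrouped, string="12-FOXTROT") is not None
grouped_digits_only = fullmatch(pattern=grouped, string="12") is not None
grouped_full = fullmatch(pattern=grouped, string="12-FOXTROT") is not None

print(ungrouped_digits_only, ungrouped_full)  # True False
print(grouped_digits_only, grouped_full)      # False True
```

Without the parentheses, the bare string "12" is accepted and "12-FOXTROT" is rejected, so the group is what keeps the alternation confined to the first part of the pattern.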

## Prefix and Suffix 

Especially when you are scanning documents, it can be helpful to require a certain pattern next to your match without including that pattern in the result. This is where prefixes and suffixes can be helpful.

Let's say I want to match a sequence of digits but only if they are preceded by an uppercase letter. I would specify the uppercase letter inside a prefix `(?<=[A-Z])`, which is checked but not returned. The `[0-9]+` following it is returned, but only if the prefix condition is met.

In [None]:
from re import search

result = search(pattern="(?<=[A-Z])[0-9]+", string="A23")
if result: 
    print(result[0])
else:
    print("No match")

In [None]:
result = search(pattern="(?<=[A-Z])[0-9]+", string="23")
if result: 
    print(result[0])
else:
    print("No match")

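Notice how `23` is the only text returned in the first case, even though the uppercase letter preceding it was required for the match. In the second case there is no uppercase letter before the digits, so nothing matches. The `?<=` at the start of a group `(?<=...)` is what defines a prefix (also called a look-behind), and everything that follows it inside the parentheses is the prefix pattern.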
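When scanning a longer piece of text, the same look-behind can be combined with `findall` to collect every qualifying run of digits. This is a minimal sketch with a made-up input string.

```python
from re import findall

# Illustrative text: some digit runs follow an uppercase letter, others do not.
text = "A23 77 B5 x19 C400"

# Only digit runs immediately preceded by an uppercase letter are returned,
# and the letter itself is left out of each result.
lookbehind_matches = findall(pattern="(?<=[A-Z])[0-9]+", string=text)

print(lookbehind_matches)  # ['23', '5', '400']
```

The runs "77" and "19" are skipped because they follow a space and a lowercase letter respectively.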

You can also use a suffix to do a look-ahead, qualifying a pattern that's ahead but not including it. A suffix is written as a group that starts with `?=`. Below I match a sequence of digits but only if they are followed by an uppercase letter.

In [None]:
result = search(pattern="[0-9]+(?=[A-Z])", string="23")
if result: 
    print(result[0])
else:
    print("No match")

In [None]:
result = search(pattern="[0-9]+(?=[A-Z])", string="23L")
if result: 
    print(result[0])
else:
    print("No match")
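The look-ahead works the same way with `findall` when there are several candidates in one string. The input below is invented for illustration.

```python
from re import findall

# Illustrative text: some digit runs are followed by an uppercase letter, others are not.
text = "23L 45 678M 9z"

# Only digit runs immediately followed by an uppercase letter are returned,
# and the letter itself is left out of each result.
lookahead_matches = findall(pattern="[0-9]+(?=[A-Z])", string=text)

print(lookahead_matches)  # ['23', '678']
```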

As we will learn, prefixes and suffixes can be helpful in splitting strings based on more complicated patterns. The downside of the prefix is that Python's `re` module requires a look-behind to match text of a fixed width. Therefore, do not expect variable-width quantifiers such as `+`, `*`, or `{2,4}` to be allowed inside a prefix, although a fixed count like `{3}` is fine. Suffixes (look-aheads) do not have this restriction in Python.
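As a quick check of how Python's built-in `re` module treats pattern width inside these groups, the sketch below compiles a look-behind with a `+` quantifier and catches the resulting error, then tries a fixed-count look-behind and a look-ahead with `+`. The patterns and strings are illustrative, and other regex engines or third-party libraries may behave differently.

```python
import re

# A look-behind containing `+` has no fixed width, so `re` rejects it when compiling.
try:
    re.compile("(?<=[A-Z]+)[0-9]+")
    lookbehind_error = None
except re.error as exc:
    lookbehind_error = str(exc)

# A fixed count such as {2} is accepted inside a look-behind.
fixed_width = re.search(pattern="(?<=[A-Z]{2})[0-9]+", string="AB23")

# A look-ahead may contain a variable-width quantifier.
variable_ahead = re.search(pattern="[0-9]+(?=[A-Z]+)", string="23LM")

print(lookbehind_error)
print(fixed_width[0], variable_ahead[0])  # 23 23
```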

## Exercise

Write a regular expression that will match a United States zip code, which is 5 digits optionally followed by a hyphen and a sequence of 4 digits.

Replace the question mark `?` below. 

In [None]:
fullmatch(pattern=?, string="75035-3821") != None

### SCROLL DOWN FOR ANSWER
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
v 

In [None]:
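# The 5 digits followed by the optional hyphen and 4 digits.
fullmatch(pattern="[0-9]{5}(-[0-9]{4})?", string="75035-3821") is not None

In [None]:
# The 5 digits alone also match because the group is optional.
fullmatch(pattern="[0-9]{5}(-[0-9]{4})?", string="75035") is not None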