# Repeating Patterns with Quantifiers

By now, you might be slightly tired of writing `[A-Za-z0-9][A-Za-z0-9][A-Za-z0-9]` just to match three alphanumeric characters. Now imagine you have to match 20 alphanumeric characters 😱. Doesn't that just sound awful?

```
[A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9]
```

Well, worry not. Quantifiers have come to the rescue! We can consolidate all of that into this. 😂

```
[A-Za-z0-9]{20}
```

And we can do much, much more. 

## Fixed Quantifiers

The regex you just saw, `[A-Za-z0-9]{20}`, has a number in curly braces, `{20}`, which specifies how many times to repeat the preceding pattern. It matches exactly 20 alphanumeric characters `[A-Za-z0-9]`.

In [None]:
from re import fullmatch 

fullmatch(pattern="[A-Za-z0-9]{20}", string="Achd46") != None

In [None]:
fullmatch(pattern="[A-Za-z0-9]{20}", string="hgbjh734hgfhsabfghhf") != None

As you can guess, a **quantifier** repeats a regex pattern, and the example above is a **fixed quantifier**: it repeats the pattern an exact number of times.
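To see that a fixed quantifier is just a compact way of writing the repetition out, here is a small sketch that builds the long-hand pattern with string multiplication and compares it with the `{20}` form. The character class is written as `[A-Za-z0-9]`, and the names `char_class`, `longhand`, `shorthand`, and `samples` are illustrative choices, not part of the `re` module.

```python
from re import fullmatch

char_class = "[A-Za-z0-9]"
longhand = char_class * 20         # the class written out 20 times in a row
shorthand = char_class + "{20}"    # the same class with a fixed quantifier

# 6 characters, 20 characters, and 21 characters
samples = ["Achd46", "hgbjh734hgfhsabfghhf", "hgbjh734hgfhsabfghhf1"]

# Pair the long-hand result with the quantifier result for each sample
results = [
    (fullmatch(longhand, s) is not None, fullmatch(shorthand, s) is not None)
    for s in samples
]
print(results)
```

Both forms should agree on every sample, and only the string with exactly 20 characters should match.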

## Min/Max Quantifiers

Using two numbers separated by a comma, we can specify a **min/max quantifier**, which repeats a pattern at least `min` times and at most `max` times. 

```
pattern{min,max}
```

For example, we can match airline and airport codes, which have two and three uppercase letters respectively. 

In [None]:
fullmatch(pattern="[A-Z]{2,3}", string="DL") != None 

In [None]:
fullmatch(pattern="[A-Z]{2,3}", string="JFK") != None 

In [None]:
fullmatch(pattern="[A-Z]{2,3}", string="ALPHA") != None 

We can match 1 to 100 numeric digits. 

In [None]:
fullmatch(pattern="[0-9]{1,100}", string="25482") != None 

In [None]:
fullmatch(pattern="[0-9]{1,100}", string="98465462164984335498465649849463574546325775455") != None 

If we leave the `max` blank, we can capture an unlimited number of digits. Below we match at least 3 digits with an unlimited maximum. 

In [None]:
fullmatch(pattern="[0-9]{3,}", string="98") != None 

In [None]:
fullmatch(pattern="[0-9]{3,}", string="98465462164984335498465649849463574546325775455") != None 

We can also have a minimum of 0, which makes the presence of that pattern completely optional. Below we match an `x` followed by an uppercase letter, with 0 to 3 digits allowed between them. 

In [None]:
fullmatch(pattern="x[0-9]{0,3}[A-Z]", string="xZ") != None 

In [None]:
fullmatch(pattern="x[0-9]{0,3}[A-Z]", string="x75Z") != None 
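The boundaries of a min/max quantifier are easiest to see by testing strings just inside and just outside the allowed range. The sketch below does that for the three forms covered in this section. The helper `matches` and the dictionary names are hypothetical, introduced only for this illustration.

```python
from re import fullmatch

def matches(pattern, text):
    """Return True when the whole of `text` matches `pattern`."""
    return fullmatch(pattern, text) is not None

# {2,3}: fewer than 2 or more than 3 repetitions fail
bounded = {s: matches("[A-Z]{2,3}", s) for s in ["D", "DL", "JFK", "KJFK"]}

# {3,}: a minimum of 3 with no upper bound
open_ended = {s: matches("[0-9]{3,}", s) for s in ["98", "984", "984654"]}

# {0,3}: the digits may be absent, but no more than 3 are allowed
optional = {s: matches("x[0-9]{0,3}[A-Z]", s) for s in ["xZ", "x75Z", "x7512Z"]}

print(bounded)
print(open_ended)
print(optional)
```

Using `fullmatch` matters here: with a partial match, `KJFK` would still contain a valid two- or three-letter run, so the upper bound would appear not to apply.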

## Shorthand Quantifiers

Certain min/max quantifiers, specifically `{1,}`, `{0,}`, and `{0,1}`, are so common that they get their own shorthands: `+`, `*`, and `?` respectively. 

|Shorthand|Min/Max Equivalent|Description|
|---|---|---|
|`+`|`{1,}`|Matches one or more instances of a pattern|
|`*`|`{0,}`|Matches 0 or more instances of a pattern|
|`?`|`{0,1}`|Matches only 0 or 1 instances of a pattern|

Below, we match one or more digits followed by one or more uppercase letters. 

In [None]:
fullmatch(pattern="[0-9]+[A-Z]+", string="746234WHISKEY") != None 

Again, `+` is equivalent to `{1,}`, so the same task could have been achieved this way. It specifies "at least one instance of this pattern must exist, and I'll capture as many as exist after that." 

In [None]:
fullmatch(pattern="[0-9]{1,}[A-Z]{1,}", string="746234WHISKEY") != None 

We can also make the digit sequence completely optional by using the `*` instead of `+`, which is the equivalent of using `{0,}` instead of `{1,}`. 

In [None]:
fullmatch(pattern="[0-9]*[A-Z]{1,}", string="746234WHISKEY") != None 

In [None]:
fullmatch(pattern="[0-9]*[A-Z]{1,}", string="WHISKEY") != None 

In [None]:
fullmatch(pattern="[0-9]*[A-Z]{1,}", string="746234") != None 

The `?` is another common shorthand, which is the same as `{0,1}`. It is often referred to as an **optional** because it says one instance of a pattern can be there, but it does not have to be. Below we match two uppercase letters that can optionally be preceded by a single digit. 

In [None]:
fullmatch(pattern="[0-9]?[A-Z]{2}", string="AZ") != None 

In [None]:
fullmatch(pattern="[0-9]?[A-Z]{2}", string="4AZ") != None 
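As a quick check on the table above, this sketch compares each shorthand with its min/max spelling across a handful of strings, then records how each shorthand treats zero, one, and two digits. The names `equivalents`, `samples`, `agreement`, and `behaviour` are made up for this example.

```python
from re import fullmatch

# Each shorthand paired with its min/max spelling
equivalents = [("+", "{1,}"), ("*", "{0,}"), ("?", "{0,1}")]
samples = ["", "7", "77", "7777", "A"]

# True when the shorthand and the min/max form agree on every sample
agreement = {}
for shorthand, minmax in equivalents:
    agreement[shorthand] = all(
        (fullmatch("[0-9]" + shorthand, s) is None)
        == (fullmatch("[0-9]" + minmax, s) is None)
        for s in samples
    )

# How each shorthand treats zero, one, and two digits
behaviour = {
    q: [fullmatch("[0-9]" + q, s) is not None for s in ["", "7", "77"]]
    for q in ["+", "*", "?"]
}

print(agreement)
print(behaviour)
```

The `behaviour` dictionary makes the differences concrete: `+` rejects the empty string, `*` accepts all three, and `?` rejects two digits.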

## Greedy versus Lazy Quantifiers

Switching over to a partial-match context using `search()`, notice what happens when I search for a sequence of `X`, `Y`, and digit characters. I will also show that you can index the `Match` object using square brackets `[ ]`. The index `[0]` returns the entire text that was matched. 

In [None]:
from re import search 

search(pattern="[XY0-9]+", string="XXYY9637ALPHA")[0]

No surprise. It captured everything up to and including the `7`. But ask yourself this: why did it not stop at the first `X`? That would satisfy the regex `[XY0-9]+`, right? The reason is that quantifiers are **greedy** by default, meaning they capture as much text as they can until the pattern can no longer be matched. If you want to make a quantifier **lazy**, so that it stops as early as possible once the pattern is satisfied, add a question mark after it, as in `+?`. 

In [None]:
search(pattern="[XY0-9]+?", string="XXYY9637ALPHA")[0]

Do not confuse the two uses of the question mark. Earlier we used it as a shorthand for the `{0,1}` quantifier, but when it follows another quantifier it acts as a lazy modifier. I personally have not used lazy modifiers often, and these simple examples may not make it clear why they are useful. After all, you can achieve the same behavior in this case just by looking for one instance. 

In [None]:
search(pattern="[XY0-9]", string="XXYY9637ALPHA")[0]

But once we build more complex regular expressions, and especially when you are traversing documents, lazy quantifiers can be handy. They are also a useful tool when your regex is capturing more than you expected and the lazy modifier is simpler than writing a more complex regular expression. 
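A common case where the difference shows up is pulling out delimited pieces of text, such as quoted phrases. The sketch below is only an illustration: it uses the `.` wildcard (any character) and `findall()` from the `re` module, neither of which this section has covered, and the sample sentence is made up.

```python
from re import findall

text = 'say "hello" and then "goodbye"'

# Greedy: .+ grabs as much as it can, so it runs to the LAST quote
greedy = findall(r'".+"', text)

# Lazy: .+? stops at the nearest closing quote, giving one hit per phrase
lazy = findall(r'".+?"', text)

# A more explicit alternative without a lazy modifier: anything but a quote
negated = findall(r'"[^"]+"', text)

print(greedy)   # ['"hello" and then "goodbye"']
print(lazy)     # ['"hello"', '"goodbye"']
print(negated)  # ['"hello"', '"goodbye"']
```

On this input the lazy version and the negated character class give the same result, so which one to use is mostly a matter of which reads more clearly.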

## Exercise

Write a regular expression that matches a series of digits, then a space, then a series of uppercase letters, then another space, and finally the word "END". Put the regular expression string in place of the question mark `?` below. 

HINT: remember that `\s` is a regex pattern for a whitespace character, which includes a space, and do not forget to use a raw string `r"my regex"` since there will be a backslash. 
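If the raw-string advice feels abstract, this short sketch shows what it protects against, using `\b` as the example because Python treats it as a backspace in a normal string. It also checks which characters `\s` accepts. The variable names are illustrative only.

```python
from re import fullmatch

# In a normal string Python interprets some backslash sequences itself;
# a raw string passes the backslash through to the regex engine untouched.
normal_len = len("\b")    # 1: a single backspace character
raw_len = len(r"\b")      # 2: a backslash followed by the letter b

# \s matches any single whitespace character, not only the space
whitespace_hits = [
    fullmatch(r"\s", ch) is not None for ch in [" ", "\t", "\n", "_"]
]

print(normal_len, raw_len)
print(whitespace_hits)
```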

In [None]:
from re import fullmatch 

fullmatch(pattern=?, string="5766264 TANGO END") != None

### SCROLL DOWN FOR ANSWER
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
v 

In [None]:
from re import fullmatch 

fullmatch(pattern=r"[0-9]+\s[A-Z]+\sEND", string="5766264 TANGO END") != None