## Start and End of Lines and Strings

We will learn how to qualify the start and end of lines and strings, which are called **anchors**. Along the way, we will understand the nuances between full matches and partial matches, and how anchors can assist with this too. 

## Full Matches versus Partial Matches

Let's bring in the `match()`, `search()`, and `fullmatch()` functions from Python's `re` regex library. 

In [1]:
from re import match, search, fullmatch

### fullmatch()

We have learned how to do a full match using the `fullmatch()` function. The regex `[0-9][A-Z]` below will not match `5BA` because the string contains an extraneous character. 

In [2]:
fullmatch(pattern="[0-9][A-Z]", string="5BA") != None

False

### match()

But what happens if we use the `match()` function? 

In [3]:
match(pattern="[0-9][A-Z]", string="5BA") != None

True

Interestingly, this now matches. That is because `match()` allows a partial match, and it found `5B`. What if we gave it the string `A5B`? If it matches partially, should it not find `5B` there too?  

In [4]:
match(pattern="[0-9][A-Z]", string="A5B") != None

False

Wait, that does not work. The reason is that `match()` only performs partial matches that begin at the start of the string. If you intend to find a partial match *anywhere* in the string, you need to use `search()`. 
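To see what each function actually found, you can inspect the returned match object instead of comparing it to `None`. This minimal sketch uses the standard `Match.group()` and `Match.start()` methods. The variable names (`m1`, `m2`, `m3`) are only illustrative.

```python
from re import match, search

pattern = "[0-9][A-Z]"

# match() only tries position 0, so it finds "5B" at the start of "5BA"
m1 = match(pattern=pattern, string="5BA")
print(m1.group(), m1.start())   # 5B 0

# ...but it gives up on "A5B" because position 0 holds a letter
m2 = match(pattern=pattern, string="A5B")
print(m2)                       # None

# search() scans forward and finds "5B" starting at index 1
m3 = search(pattern=pattern, string="A5B")
print(m3.group(), m3.start())   # 5B 1
```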

### search()

If you expected `match()` to return a partial match anywhere in the string rather than only at the start, you most likely wanted `search()` instead. Don't mix up these two; in practice, you are much more likely to use `search()`. 

In [7]:
search(pattern="[0-9][A-Z]", string="A5B") != None

True

So use `search()` when you intend to look for partial matches *anywhere* in the string. You can also make `search()` behave like `match()` by qualifying the *start-of-string* with `^`. We will learn about the *start-of-string* and *end-of-string* anchors next. 

In [8]:
search(pattern="^[0-9][A-Z]", string="A5B") != None

False

In [9]:
search(pattern="^[0-9][A-Z]", string="5BA") != None

True
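As a sanity check, the sketch below compares `match(p, s)` with `search("^" + p, s)` on a few sample strings (chosen here for illustration). Without flags the two agree. The Python documentation notes that even in `re.MULTILINE` mode, `match()` only matches at the beginning of the string, whereas `^` can then match after any newline, so the equivalence no longer holds.

```python
import re
from re import match, search

pattern = "[0-9][A-Z]"
samples = ["5BA", "A5B", "9Z", "ZZ9"]

# Without flags, match(p, s) and search("^" + p, s) agree on every sample
for s in samples:
    via_match = match(pattern=pattern, string=s) is not None
    via_search = search(pattern="^" + pattern, string=s) is not None
    print(s, via_match, via_search)

# Under re.MULTILINE, ^ may also match right after a newline,
# while match() still only tries the very start of the string
text = "A5B\n5BA"
print(match(pattern=pattern, string=text, flags=re.MULTILINE) is not None)         # False
print(search(pattern="^" + pattern, string=text, flags=re.MULTILINE) is not None)  # True
```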

## Start of String and Line

As we saw in the previous example, we can qualify a *start-of-string* using the caret operator `^`. Logically, it is most common to use this operator at the beginning of a regular expression. If I wanted to match only a digit that is the first character of a string, I would use the regex `^[0-9]`. 

In [10]:
search(pattern="^[0-9]", string="7 Apple Macbooks") != None

True

In [11]:
search(pattern="^[0-9]", string="iPhone 8") != None

False

When you have multiple lines in your string, you may want to change the behavior of `^` so it qualifies the start of a line rather than the start of a string. You can use the `re.MULTILINE` flag to achieve this. 

In [13]:
import re 

receipt = """
7 Apple Macbooks
iPhone 8
3 iPad Airs
"""

search(pattern="^[0-9]", string=receipt, flags=re.MULTILINE) != None

True

Above, two lines start with a digit. `search()` stops at the first match, but any match is enough for the expression to evaluate to `True`. You might be wondering how we can return several partial matches from a document or multiline string. We will learn how to do this in a later section. 

The example below will result in no matches, as no line starts with a digit. 

In [14]:
receipt = """
Apple Macbook Air 
iPhone 8
iPad Mini 3
"""

search(pattern="^[0-9]", string=receipt, flags=re.MULTILINE) != None

False
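To see what the flag changes, here is the earlier receipt searched with and without `re.MULTILINE`. Because the triple-quoted string begins with a newline, `^` without the flag only tests position 0, which holds that newline, so no digit is found.

```python
import re
from re import search

receipt = """
7 Apple Macbooks
iPhone 8
3 iPad Airs
"""

# Without the flag, ^ only tests position 0, which is a newline character
print(search(pattern="^[0-9]", string=receipt) is not None)                      # False

# With re.MULTILINE, ^ also matches right after each newline
print(search(pattern="^[0-9]", string=receipt, flags=re.MULTILINE) is not None)  # True
```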

## End of String and Line

You can also qualify the end of a string or line in a similar manner using `$`. Since it matches the end of the string, it logically goes at the end of your regex rather than the beginning. 

In [15]:
search(pattern="[0-9]$", string="iPhone 8") != None

True

In [16]:
search(pattern="[0-9]$", string="7 Apple Macbooks") != None

False
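One subtlety: in Python, `$` matches at the very end of the string *and* just before a newline that ends the string, even without `re.MULTILINE`. The sketch below shows this with a single trailing `\n`, and shows that other trailing characters, such as a space, are not skipped.

```python
from re import search

# $ matches just before a final trailing newline, so "8" is still found
print(search(pattern="[0-9]$", string="iPhone 8\n") is not None)   # True

# A trailing space is not skipped, so the digit is no longer at the end
print(search(pattern="[0-9]$", string="iPhone 8 ") is not None)    # False
```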

We can also match digits at the end of the line, rather than the end of the string, using `re.MULTILINE`. 

In [17]:
receipt = """
Apple Macbook Air 
iPhone 8
iPad Mini 3
"""

search(pattern="[0-9]$", string=receipt, flags=re.MULTILINE) != None

True

In [18]:
import re 

receipt = """
7 Apple Macbooks
3 iPad Airs
"""

search(pattern="[0-9]$", string=receipt, flags=re.MULTILINE) != None

False
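Because `$` can match before the final newline, the first receipt above would match even without `re.MULTILINE`: its only digit-ending line is also the last line. The sketch below reuses those lines and then adds a second, illustrative receipt where the digit-ending line sits in the middle, so the flag makes the difference.

```python
import re
from re import search

# Same lines as the first receipt above: the digit-ending line is last
receipt = """
Apple Macbook Air
iPhone 8
iPad Mini 3
"""
# Matches even WITHOUT the flag, because $ can match before the final newline
print(search(pattern="[0-9]$", string=receipt) is not None)                       # True

# Here the digit-ending line is in the middle of the string
receipt2 = """
iPhone 8
Apple Macbook Air
"""
print(search(pattern="[0-9]$", string=receipt2) is not None)                      # False
print(search(pattern="[0-9]$", string=receipt2, flags=re.MULTILINE) is not None)  # True
```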

## Forcing Full Matches with `^` and `$`

To force a full match on a regular expression, you can always use `fullmatch()`. But it can be helpful to have the regular expression itself express a full-match requirement, even when it is used with a partial-match function. This is done by using both the start-of-string `^` and the end-of-string `$`. Below, we force a full match with the regex `^[0-9][A-Z]$`. This reads as "only a digit followed by an uppercase letter can exist between the start and end of the string." This logic is effectively a full match. 

In [19]:
search(pattern="^[0-9][A-Z]$", string="A5B") != None

False

In [20]:
search(pattern="^[0-9][A-Z]$", string="5B") != None

True
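The word "effectively" is doing some work here. The sketch below compares the anchored `search()` with a plain `fullmatch()` on a few illustrative strings. They agree except when the string ends in a newline, because `$` also matches just before a trailing newline while `fullmatch()` requires the whole string to match.

```python
from re import search, fullmatch

anchored = "^[0-9][A-Z]$"
plain = "[0-9][A-Z]"

for s in ["5B", "A5B", "5BA", "5B\n"]:
    print(repr(s),
          search(pattern=anchored, string=s) is not None,
          fullmatch(pattern=plain, string=s) is not None)
# '5B' True True
# 'A5B' False False
# '5BA' False False
# '5B\n' True False   <- $ also matches just before a trailing newline
```

If that difference matters, Python's `\Z` anchors only at the absolute end of the string.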

> When I stored regular expressions in a database to build business rule engines, I followed this practice so people knew the regex was intended to be used as a full match. When you switch between platforms like SQL or Java, it also helps to have a regex built this way so you don't misuse a partial-match function thinking it does full matching. If you intend to do a full match, I believe it is a good practice to get into. However, I will refrain from imposing it on the rest of this course and will just use `fullmatch()` when I intend to do a full match. 

Of course, you can use this pattern to "full match" the contents of each line using `re.MULTILINE`. 

In [21]:
import re 

my_doc = """
7HD
H7A
5MD
"""

search(pattern="^[0-9][A-Z][A-Z]$", string=my_doc, flags=re.MULTILINE) != None

True

Notice below how `4HAU` is not matched by `^[0-9][A-Z][A-Z]$`: the anchors force a full match on each line, not a partial one. 

In [22]:
import re 

my_doc = """
4HAU
H7A
YHH
"""

search(pattern="^[0-9][A-Z][A-Z]$", string=my_doc, flags=re.MULTILINE) != None

False
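Another way to convince yourself that the anchored `re.MULTILINE` pattern full-matches each line is to split the document into lines and call `fullmatch()` on each one. This sketch runs both approaches on the two documents above; the names `doc_a`, `doc_b`, and `results` are illustrative.

```python
import re
from re import search, fullmatch

doc_a = """
7HD
H7A
5MD
"""
doc_b = """
4HAU
H7A
YHH
"""

results = []
for doc in (doc_a, doc_b):
    # Option 1: anchors plus re.MULTILINE, as in the cells above
    anchored = search(pattern="^[0-9][A-Z][A-Z]$", string=doc, flags=re.MULTILINE) is not None
    # Option 2: split into lines and full-match each line explicitly
    per_line = any(fullmatch(pattern="[0-9][A-Z][A-Z]", string=line) is not None
                   for line in doc.splitlines())
    results.append((anchored, per_line))
    print(anchored, per_line)
# True True
# False False
```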

## Exercise

Write a regular expression below that determines whether any lines start with a 2-letter airline code. Replace the question mark `?` below. 

HINT: remember that `\s` matches a whitespace character, such as a space, and do not forget to use a raw string `r"my regex"` since there will be a backslash. 

In [None]:
import re
from re import search

flights = """
WN 672 
    ABQ HOU
DL 78
    ATL PHX
"""

search(pattern=?, string=flights, flags=re.MULTILINE) != None

### SCROLL DOWN FOR ANSWER
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
v 

In [None]:
import re
from re import search

flights = """
WN 672 
    ABQ HOU
DL 78
    ATL PHX
"""

search(pattern=r"^[A-Z][A-Z]\s", string=flights, flags=re.MULTILINE) != None
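To see which line the answer pattern caught, you can inspect the match object; `search()` returns only the first match. This sketch also checks that indented airport lines on their own do not satisfy the pattern. The `airports_only` string is an illustrative input, not part of the exercise.

```python
import re
from re import search

pattern = r"^[A-Z][A-Z]\s"

flights = """
WN 672
    ABQ HOU
DL 78
    ATL PHX
"""

m = search(pattern=pattern, string=flights, flags=re.MULTILINE)
print(repr(m.group()))   # 'WN '

# Lines that begin with indentation are rejected, since ^ is followed by spaces
airports_only = """
    ABQ HOU
    ATL PHX
"""
print(search(pattern=pattern, string=airports_only, flags=re.MULTILINE) is not None)   # False
```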