# Regular Expressions

The term `Regular Expression` refers to a sequence of characters that defines a search pattern for text. Python's standard library includes the `re` module, which contains functions for working with regular expressions. This notebook covers four of its main functions:

- `findall` - returns a list containing all matches
- `search` - returns a `Match` object for the first match, or `None` if there is no match
- `split` - returns a list where the string has been split at each match
- `sub` - replaces all matches in a string with other specified characters

More information about regex metacharacters and special sequences can be found in the Python documentation: [Regex in Python](https://docs.python.org/3/howto/regex.html)

In [2]:
import re

## Finding a match
`findall` returns a list of all the matches that correspond to the regular expression pattern.

In [4]:
str1 = "The rain in Spain falls mainly on the plains"
pat1 = r"\w+ain\w*"
x1 = re.findall(pat1,str1)
print(x1)

['rain', 'Spain', 'mainly', 'plains']


In [7]:
str2="Denise and Dennis deny denouncing their dentist. Their denials denominated the news."
pat2=r"[Dd]en\w*"
x2 = re.findall(pat2,str2)
print(x2)

['Denise', 'Dennis', 'deny', 'denouncing', 'dentist', 'denials', 'denominated']
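The character class `[Dd]` handles the two capitalisations that occur in this sentence. As a hedged alternative, the sketch below uses the `re.IGNORECASE` flag instead; the names `pat_flag` and `matches` are introduced here only for illustration.

```python
import re

str2 = "Denise and Dennis deny denouncing their dentist. Their denials denominated the news."

# re.IGNORECASE lets "den" match "Den", "den", "DEN", and any other casing
pat_flag = r"den\w*"
matches = re.findall(pat_flag, str2, flags=re.IGNORECASE)
print(matches)
```

For this sentence the result is the same list as `x2`. The flag is broader than `[Dd]`, though: it would also accept `DENTIST` or `dEny`, which may or may not be what you want.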


## Search for only the first match
Unlike `findall`, which returns every match as a list of strings, `search` finds only **the first match** in the string of interest and returns it as a `Match` object (or `None` if nothing matches).

In [8]:
str1 = "The rain in Spain falls mainly on the plains"
pat1 = r"\w+ain\w*"
x = re.search(pat1,str1)
print(x)

<re.Match object; span=(4, 8), match='rain'>


There are several methods and attributes that you can use to access the information in the `Match` object produced by `search`. They give you the indices of the match, the matched text, and the original string.

In [9]:
print(x.start())
print(x.end())
print(x.span())
print(x.string)
print(x.group())

4
8
(4, 8)
The rain in Spain falls mainly on the plains
rain
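Because `search` returns `None` when the pattern is not found, calling `.group()` or `.span()` on the result without checking it first raises an `AttributeError`. A minimal sketch of the usual guard follows; the pattern and the name `result` are illustrative choices, not part of the cells above.

```python
import re

str1 = "The rain in Spain falls mainly on the plains"

# "snow" does not occur in str1, so search returns None
result = re.search(r"\w*snow\w*", str1)

if result is None:
    print("no match")
else:
    print(result.group(), result.span())
```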


## Splitting apart
The `split` function in `re` splits a string wherever the pattern matches, so you can split on any character(s) you wish. The optional `maxsplit` argument sets the maximum number of splits made in a given string.

In [11]:
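getty = "But, in a larger sense, we can not dedicate—we can not consecrate—we can not hallow—this ground."
pat3 = r'[\s.—,]+'  # one or more whitespace characters, periods, dashes, or commas
pat4 = r'[—]'       # a single dash
x3=re.split(pat3,getty)
# pass maxsplit by keyword; passing it positionally is deprecated in recent Python versions
x4=re.split(pat4,getty,maxsplit=4)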
print(x3)
print(x4)

['But', 'in', 'a', 'larger', 'sense', 'we', 'can', 'not', 'dedicate', 'we', 'can', 'not', 'consecrate', 'we', 'can', 'not', 'hallow', 'this', 'ground', '']
['But, in a larger sense, we can not dedicate', 'we can not consecrate', 'we can not hallow', 'this ground.']
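Two details of the output above are worth a closer look: `maxsplit=4` had no visible effect because the sentence contains only three dashes, and `x3` ends with an empty string because the sentence ends with a delimiter (the period). The sketch below uses a smaller `maxsplit` and filters out empty strings; `first_cut` and `words` are names introduced here for illustration.

```python
import re

getty = "But, in a larger sense, we can not dedicate—we can not consecrate—we can not hallow—this ground."

# maxsplit=1 stops after the first dash; the remainder is returned unsplit
first_cut = re.split(r"—", getty, maxsplit=1)
print(first_cut)

# Empty strings are falsy, so the filter drops the trailing ''
words = [w for w in re.split(r"[\s.—,]+", getty) if w]
print(words)
```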


## Substitution
The function `sub` replaces every match of the pattern with a replacement string and returns the new string; the original string is left unchanged.

In [12]:
x5 = re.sub(r"[Dd]en",r"ur",str2)
print(x5)

urise and urnis ury urouncing their urtist. Their urials urominated the news.
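`sub` also accepts an optional `count` argument that limits the number of replacements, and the replacement can be a function that receives each `Match` object. The following is a small sketch of both; the helper `keep_case` and the variables `x6` and `x7` are hypothetical names used only for this example.

```python
import re

str2 = "Denise and Dennis deny denouncing their dentist. Their denials denominated the news."

# count=2 limits the substitution to the first two matches
x6 = re.sub(r"[Dd]en", "ur", str2, count=2)
print(x6)

# A function as the replacement: keep the capitalisation of the matched text
def keep_case(m):
    return "Ur" if m.group()[0].isupper() else "ur"

x7 = re.sub(r"[Dd]en", keep_case, str2)
print(x7)
```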
