# Regular Expressions

Regular expressions, or "regex" for short, are a powerful tool for matching patterns in text. In Python, the `re` module provides support for regular expressions. With this module, you can search for patterns in strings, replace parts of strings that match a pattern, and split strings based on a pattern.
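The cells below concentrate on searching. Here is a minimal sketch of the other two operations, replacing and splitting, using `re.sub()` and `re.split()`. The sample string is made up for illustration. The split pattern uses two pieces of syntax not covered yet: the character class `[,;]` matches one comma or semicolon, and `\s*` matches any amount of whitespace.

```python
import re

text = "apples, oranges; bananas"

# re.sub(pattern, replacement, string) replaces every match of the pattern
replaced = re.sub(r"apples", "pears", text)
print(replaced)  # pears, oranges; bananas

# re.split(pattern, string) cuts the string wherever the pattern matches;
# here the pattern is a comma or semicolon followed by any amount of whitespace
parts = re.split(r"[,;]\s*", text)
print(parts)  # ['apples', 'oranges', 'bananas']
```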

The basic building blocks of regular expressions are ordinary characters and metacharacters. Ordinary characters match themselves. Metacharacters have special meanings: they match classes of characters or describe the structure of the pattern. Some examples are `.` (matches any single character except a newline), `*` (matches zero or more occurrences of the preceding element), `+` (matches one or more occurrences of the preceding element), and `?` (matches zero or one occurrence of the preceding element).
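A short sketch of the four metacharacters listed above. It uses `re.findall()`, which returns a list of all non-overlapping matches. The sample strings are made up for illustration.

```python
import re

# "." matches any single character except a newline
dots = re.findall(r"c.t", "cat cot c t")
print(dots)  # ['cat', 'cot', 'c t']

# "*" allows zero or more of the preceding "b"
stars = re.findall(r"ab*", "a ab abbb")
print(stars)  # ['a', 'ab', 'abbb']

# "+" requires at least one "b"
pluses = re.findall(r"ab+", "a ab abbb")
print(pluses)  # ['ab', 'abbb']

# "?" makes the preceding "u" optional
optional = re.findall(r"colou?r", "color colour")
print(optional)  # ['color', 'colour']
```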

In [1]:
import re  # import the re module for regular-expression support

text = "The quick brown fox jumps over the lazy dog."
match = re.search(r"fox", text)
print(match.group())

fox
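`re.search()` returns `None` when the pattern does not occur anywhere in the string. Calling `.group()` on that result raises an `AttributeError`, so it is safer to check the result first. The sketch below also uses the match object's `start()` and `end()` methods to show where the match was found. The search term `"wolf"` is just an illustrative non-matching example.

```python
import re

text = "The quick brown fox jumps over the lazy dog."

# re.search returns None when there is no match, so test the result before using it
match = re.search(r"wolf", text)
if match:
    print(match.group())
else:
    print("No match found")

# start() and end() give the slice of the string that matched
fox = re.search(r"fox", text)
print(fox.start(), fox.end())  # 16 19
```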


In [2]:
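# raw strings in Python

firstPath = "D:\01.dataScience\01.Python"  # "\01" is an octal escape for the invisible control character chr(1)

secondPath = "D:\01.dataScience\01.Python\newfolder"  # in addition, "\n" is interpreted as a newline

secondPathModified = r"D:\01.dataScience\01.Python\newfolder"  # the r prefix keeps every backslash literally

print(firstPath)  # the "\01" sequences print as invisible characters, so the backslashes seem to vanish
print(secondPath)  # "\n" starts a new line, so "newfolder" shows up as "ewfolder" on the next line
print(secondPathModified)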

D:.dataScience.Python
D:.dataScience.Python
ewfolder
D:\01.dataScience\01.Python\newfolder
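Calling `repr()` on the non-raw string makes the hidden escape sequences visible. The sketch also shows that a raw string holds the same characters as a normal string with doubled backslashes. This is why regex patterns such as `r"\d+"` are usually written as raw strings: the backslash reaches the `re` module unchanged. The room-number text is made up for illustration.

```python
import re

path = "D:\01.dataScience\01.Python\newfolder"
# repr() shows escape sequences instead of interpreting them
print(repr(path))  # 'D:\x01.dataScience\x01.Python\newfolder'

# A raw string and a string with doubled backslashes contain exactly the same characters
raw = r"D:\01.dataScience\01.Python\newfolder"
escaped = "D:\\01.dataScience\\01.Python\\newfolder"
print(raw == escaped)  # True

# r"\d+" passes a backslash followed by "d" to the re module: one or more digits
digits = re.findall(r"\d+", "Room 101, floor 3")
print(digits)  # ['101', '3']
```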


### Matching patterns with regex

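Suppose we want to match a US-format phone number such as 415-555-4757.

In regex, `\d` matches a single digit character, i.e. any digit from 0 to 9. (For `str` patterns, Python's `\d` also matches other Unicode decimal digits unless the `re.ASCII` flag is set.)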

The phone number above can therefore be matched with the pattern `\d\d\d-\d\d\d-\d\d\d\d`.
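A quick check of the digit-by-digit pattern. It finds the number inside a longer sentence and returns `None` for a string whose digit groups have the wrong lengths. Both sample sentences are made up for illustration.

```python
import re

# three digits, a hyphen, three digits, a hyphen, four digits
phone_pattern = r"\d\d\d-\d\d\d-\d\d\d\d"

match = re.search(phone_pattern, "Call me at 415-555-4757 tomorrow.")
print(match.group())  # 415-555-4757

# the digit groups have the wrong lengths here, so there is no match
bad = re.search(phone_pattern, "Call me at 415-5555-475.")
print(bad)  # None
```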

Regular expressions can express this more compactly. A number in braces after a pattern element specifies how many times that element must repeat: `\d{3}` means exactly *three* digits. The same phone number can therefore be matched with `\d{3}-\d{3}-\d{4}`.
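A sketch showing that the braces form matches exactly what the longer pattern matches. It also shows the `{m,n}` form, which allows between *m* and *n* repetitions. `{m,n}` is standard `re` syntax but goes slightly beyond the text above. It is greedy, so it takes as many repetitions as it can.

```python
import re

long_pattern = r"\d\d\d-\d\d\d-\d\d\d\d"
short_pattern = r"\d{3}-\d{3}-\d{4}"  # {n} repeats the preceding element exactly n times

text = "Call me at 415-555-4757."
print(re.search(long_pattern, text).group())   # 415-555-4757
print(re.search(short_pattern, text).group())  # 415-555-4757

# {m,n} allows m to n repetitions and greedily takes as many as possible
ranges = re.findall(r"\d{2,3}", "1 12 123 1234")
print(ranges)  # ['12', '123', '123']
```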

For the full syntax, see the official `re` module documentation: https://docs.python.org/3/library/re.html?highlight=re#module-re

In [12]:
phoneNumRegex = re.compile(r'(\d{3})-(\d{3}-\d{4})')  # compile the pattern once into a reusable regex object

In [17]:
mo = phoneNumRegex.search("My number is 312-415-2222")

print(mo.group(1))


312


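Parentheses in the pattern create *groups*. `mo.group(1)` returns the text matched by the first group and `mo.group(2)` the text matched by the second. `mo.group()` (or `mo.group(0)`) returns the entire match. `mo.groups()` returns all the groups at once as a tuple, here `('312', '415-2222')`.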
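Repeating the cell above, the group accessors can be compared side by side. Because `groups()` returns a tuple, it can be unpacked directly. The names `areaCode` and `mainNumber` are illustrative.

```python
import re

phoneNumRegex = re.compile(r'(\d{3})-(\d{3}-\d{4})')
mo = phoneNumRegex.search("My number is 312-415-2222")

print(mo.group(0))  # 312-415-2222  (the entire match, same as mo.group())
print(mo.group(1))  # 312
print(mo.group(2))  # 415-2222
print(mo.groups())  # ('312', '415-2222')

# groups() returns a tuple, so it can be unpacked into separate variables
areaCode, mainNumber = mo.groups()
```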


### Special Characters in Regular Expressions

`.  ^  $  *  +  ?  {  }  [  ]  \  |  (  )`


To match one of these characters literally, escape it with a backslash: for example, `\.` matches a period and `\(` matches an opening parenthesis.
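A sketch of why escaping matters. An unescaped `.` matches any character, and unescaped parentheses create a group instead of matching `(`. The standard function `re.escape()` backslash-escapes every special character in a literal string, which helps when the text to find comes from user input. The sample strings are made up for illustration.

```python
import re

# Unescaped, "." matches any character, so "3x14" matches too
loose = re.findall(r"3.14", "3.14 and 3x14")
print(loose)  # ['3.14', '3x14']

# Escaped, "\." matches only a literal period
strict = re.findall(r"3\.14", "3.14 and 3x14")
print(strict)  # ['3.14']

# Escaped parentheses match literal "(" and ")" instead of forming a group
paren = re.search(r"\(\d{3}\) \d{3}-\d{4}", "Call (415) 555-4757")
print(paren.group())  # (415) 555-4757

# re.escape() escapes the special characters in a literal string for you
literal = re.search(re.escape("1+1=2?"), "Is 1+1=2? Yes.")
print(literal.group())  # 1+1=2?
```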


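### Matching Multiple Groups with the Pipe Symbol

The `|` character (the pipe) matches either the expression on its left or the one on its right, so `r'Frog|Batman'` matches either `Frog` or `Batman`.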

In [27]:
heroRegex = re.compile(r'Frog|Batman')  # compile a pattern matching either 'Frog' or 'Batman'

mo = heroRegex.search("Cat and Frog")

print(mo.group())

Frog
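Some further behaviour of the pipe. When both alternatives occur, `search()` returns whichever appears first in the text, while `findall()` returns every match. The pipe can also go inside a group to share a common prefix. The `batRegex` pattern and the sample sentences are hypothetical examples, not from the original notebook.

```python
import re

heroRegex = re.compile(r'Frog|Batman')

# search() returns the alternative that appears first in the text
first = heroRegex.search("Batman and Frog")
print(first.group())  # Batman

# findall() returns every non-overlapping match
allHeroes = heroRegex.findall("Frog, Batman and another Frog")
print(allHeroes)  # ['Frog', 'Batman', 'Frog']

# A pipe inside a group: "Bat" followed by one of several suffixes
batRegex = re.compile(r'Bat(man|mobile|copter)')
bat = batRegex.search("Batmobile lost a wheel")
print(bat.group())   # Batmobile
print(bat.group(1))  # mobile
```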
