## Python Regular Expression

A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.

RegEx can be used to check if a string contains the specified search pattern.

### RegEx Module

Python has a built-in package called re, which can be used to work with Regular Expressions.

In [None]:
import re

##### When you have imported the re module, you can start using regular expressions:

##### Search the string to see if it starts with "The" and ends with "Dubai":

In [3]:
import re

txt = "The rain in Dubai"
x = re.search("^The.*Dubai$", txt)

if x:
  print("YES! We have a match!")
else:
  print("No match")


YES! We have a match!


### RegEx Functions
The re module offers a set of functions that allow us to search a string for a match:

#### Function - Description

findall - Returns a list containing all matches

search - Returns a Match object if there is a match anywhere in the string

split - Returns a list where the string has been split at each match

sub - Replaces one or many matches with a string

### Metacharacters

Metacharacters are characters with a special meaning:

#### Character - Description - Example

[] - A set of characters - "[a-m]"

\ - Signals a special sequence (can also be used to escape special characters) - "\d"

. - Any character (except the newline character) - "he..o"

^ - Starts with - "^hello"

( $ ) (dollar sign) - Ends with - "world$"

( * ) - Zero or more occurrences - "aix*"

( + ) - One or more occurrences - "aix+"

{} - Exactly the specified number of occurrences - "al{2}"

| - Either or - "falls|stays"

() - Capture and group
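As a rough illustration of the table above, the sketch below tries several metacharacters against one short string. The sample text and the variable names are made up for this example, and the patterns are written as raw strings (r"...") so that backslashes reach the re module unchanged.

```python
import re

txt = "hello world"

# [] matches any one character from the set; [a-m] is the range a to m
in_range = re.findall(r"[a-m]", txt)

# . matches any single character except a newline
dots = re.findall(r"he..o", txt)

# ^ anchors the pattern to the start of the string, $ to the end
starts = re.search(r"^hello", txt) is not None
ends = re.search(r"world$", txt) is not None

# + means one or more of the preceding item, {2} means exactly two
one_or_more = re.findall(r"l+", txt)
exactly_two = re.findall(r"l{2}", txt)

# | matches either alternative
either = re.findall(r"hello|world", txt)

# () captures the matched parts as groups
groups = re.search(r"(\w+) (\w+)", txt).groups()

print(in_range)      # ['h', 'e', 'l', 'l', 'l', 'd']
print(dots)          # ['hello']
print(starts, ends)  # True True
print(one_or_more)   # ['ll', 'l']
print(exactly_two)   # ['ll']
print(either)        # ['hello', 'world']
print(groups)        # ('hello', 'world')
```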

### Special Sequences

A special sequence is a \ followed by one of the characters in the list below, and has a special meaning:

#### Character - Description - Example

\A - Returns a match if the specified characters are at the beginning of the string - "\AThe"

\b - Returns a match where the specified characters are at the beginning or at the end of a word - r"\bain", r"ain\b"

\B - Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word - r"\Bain", r"ain\B"

\d - Returns a match where the string contains digits (numbers from 0-9) - "\d"

\D - Returns a match where the string DOES NOT contain digits - "\D"

\s - Returns a match where the string contains a white space character - "\s"

\S - Returns a match where the string DOES NOT contain a white space character - "\S"

\w - Returns a match where the string contains any word characters (characters from a to Z, digits from 0-9, and the underscore _ character) - "\w"

\W - Returns a match where the string DOES NOT contain any word characters - "\W"

\Z - Returns a match if the specified characters are at the end of the string - "Spain\Z"
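The short sketch below shows how a few of these special sequences behave in practice. The second sample string ("Room 42, floor 7") and all variable names are assumptions chosen only for illustration. Raw strings are used so that sequences such as \b are not interpreted by Python itself before the pattern reaches re.

```python
import re

txt = "The rain in Dubai"
info = "Room 42, floor 7"  # made-up sample text with digits and punctuation

# \A and \Z anchor to the very start and very end of the string
at_start = re.search(r"\AThe", txt) is not None
at_end = re.search(r"Dubai\Z", txt) is not None

# \b is a word boundary, \B is a position that is NOT a word boundary
word_end = re.findall(r"ain\b", txt)    # "ain" at the end of "rain"
word_start = re.findall(r"\bain", txt)  # no word starts with "ain"
inside = re.findall(r"\Bain", txt)      # "ain" not at the start of a word

# \d digits, \s white space, \w word characters, \W everything else
digits = re.findall(r"\d", info)
spaces = re.findall(r"\s", info)
words = re.findall(r"\w+", info)
non_words = re.findall(r"\W", info)

print(at_start, at_end)  # True True
print(word_end)          # ['ain']
print(word_start)        # []
print(inside)            # ['ain']
print(digits)            # ['4', '2', '7']
print(len(spaces))       # 3
print(words)             # ['Room', '42', 'floor', '7']
print(non_words)         # [' ', ',', ' ', ' ']
```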

### Sets
A set is a group of characters inside a pair of square brackets [].

#### Set - Description

[arn] - Returns a match where one of the specified characters (a, r, or n) is present

[a-n] - Returns a match for any lower case character, alphabetically between a and n

[^arn] - Returns a match for any character EXCEPT a, r, and n

[0123] - Returns a match where any of the specified digits (0, 1, 2, or 3) is present

[0-9] - Returns a match for any digit between 0 and 9

[0-5][0-9] - Returns a match for any two-digit number from 00 to 59

[a-zA-Z] - Returns a match for any character alphabetically between a and z, lower case OR upper case

[+] - In sets, +, *, ., |, (), $, {} have no special meaning, so [+] means: return a match for any + character in the string
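To see a few of these sets in action, here is a minimal sketch. The sample sentence and the variable names are invented for this example only.

```python
import re

txt = "Train 8 leaves at 07:45 + 5 min"  # made-up sample text

# [arn] matches a single a, r, or n wherever it appears
arn = re.findall(r"[arn]", txt)

# [0-5][0-9] matches two adjacent digits that form a number from 00 to 59
two_digits = re.findall(r"[0-5][0-9]", txt)

# Inside a set, + is a literal plus sign, not a quantifier
plus_signs = re.findall(r"[+]", txt)

# [^...] negates the set: anything that is not a letter, digit, or space
others = re.findall(r"[^a-zA-Z0-9 ]", txt)

print(arn)         # ['r', 'a', 'n', 'a', 'a', 'n']
print(two_digits)  # ['07', '45']
print(plus_signs)  # ['+']
print(others)      # [':', '+']
```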

### The findall() Function

The findall() function returns a list containing all matches.

##### Print a list of all matches:

In [7]:
import re

txt = "The rain in Dubai"
x = re.findall("ai", txt)
print(x)

['ai', 'ai']


##### Return an empty list if no match was found:

In [8]:
import re

txt = "The rain in Dubai"

# Check if "India" is in the string:

x = re.findall("India", txt)
print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")

[]
No match


### The search() Function

The search() function searches the string for a match, and returns a Match object if there is a match.

If there is more than one match, only the first occurrence of the match will be returned:

##### Search for the first white-space character in the string:

In [9]:
import re

txt = "The rain in Dubai"
x = re.search(r"\s", txt)  # raw string, so \s is passed to re unchanged

print("The first white-space character is located in position:", x.start()) 


The first white-space character is located in position: 3


##### If no matches are found, the value None is returned:

In [10]:
import re

txt = "The rain in Dubai"
x = re.search("India", txt)
print(x)

None


### The split() Function

The split() function returns a list where the string has been split at each match:

##### Split at each white-space character:

In [11]:
import re

txt = "The rain in Dubai"
x = re.split(r"\s", txt)  # raw string, so \s is passed to re unchanged
print(x)




['The', 'rain', 'in', 'Dubai']


##### You can control the number of splits by specifying the maxsplit parameter:

In [13]:
#Split the string only at the first occurrence:

import re

txt = "The rain in Dubai"
x = re.split(r"\s", txt, maxsplit=1)  # pass maxsplit by keyword for clarity
print(x)

['The', 'rain in Dubai']


### The sub() Function

The sub() function replaces the matches with the text of your choice:

##### Replace every white-space character with the number 9:

In [14]:
import re

txt = "The rain in Dubai"
x = re.sub(r"\s", "9", txt)  # raw string, so \s is passed to re unchanged
print(x)

The9rain9in9Dubai


##### You can control the number of replacements by specifying the count parameter:

In [15]:
import re

txt = "The rain in Dubai"
x = re.sub(r"\s", "9", txt, count=2)  # pass count by keyword for clarity
print(x)

The9rain9in Dubai


### Match Object

A Match Object is an object containing information about the search and the result.

##### The search() function returns a Match object:

In [17]:
import re

txt = "The rain in Dubai"
x = re.search("ai", txt)
print(x)


<re.Match object; span=(5, 7), match='ai'>


#### The Match object has properties and methods used to retrieve information about the search, and the result:

.span() - returns a tuple containing the start and end positions of the match

.string - returns the string passed into the function

.group() - returns the part of the string where there was a match

#### Print the position (start and end position) of the first match occurrence.

The regular expression looks for any word that starts with an upper case "D":

##### Search for an upper case "D" character at the beginning of a word, and print its position:


In [21]:
import re


txt = "The rain in Dubai"
x = re.search(r"\bD\w+", txt)
print(x.span())


(12, 17)


##### Print the string passed into the function:

The string property returns the string that was searched:


In [23]:
import re


txt = "The rain in Dubai"
x = re.search(r"\bD\w+", txt)
print(x.string)


The rain in Dubai


##### Print the part of the string where there was a match.

Search for an upper case "D" character at the beginning of a word, and print the word:


In [24]:
import re

txt = "The rain in Dubai"
x = re.search(r"\bD\w+", txt)
print(x.group())


Dubai


##### Note: If there is no match, the value None will be returned, instead of the Match Object.
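
Because search() can return None, calling .span() or .group() directly on the result raises an AttributeError when nothing matches. One way to guard against that is sketched below. The helper name first_capitalized_word and its pattern are hypothetical, written only for this example.

```python
import re

def first_capitalized_word(text):
    """Return the first word that starts with an upper-case letter, or None."""
    match = re.search(r"\b[A-Z]\w*", text)
    if match is None:
        # No match: there is no Match object to call .group() on
        return None
    return match.group()

found = first_capitalized_word("The rain in Dubai")
missing = first_capitalized_word("no capitals here")

print(found)    # The
print(missing)  # None
```

Checking the result with "is None" (or a plain "if match:") before using it keeps the code safe for inputs that do not match.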