# Python RegEx
A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.

RegEx can be used to check if a string contains the specified search pattern.

# RegEx Module
Python has a built-in module called re, which can be used to work with regular expressions.

Import the re module:

In [1]:
import re

# ^a...s$
The heading above is a RegEx pattern. It matches any five-character string that starts with a and ends with s (each . stands for exactly one character).

In [3]:
import re

pattern = '^a...s$'
test_string = 'ab2@s'   # 'a', any three characters, then 's'
result = re.match(pattern, test_string)
print(result)

if result:
    print("Search successful.")
else:
    print("Search unsuccessful.")

<re.Match object; span=(0, 5), match='ab2@s'>
Search successful.


In [24]:
test_string = 'Alias'   # starts with 'A', not 'a'; matching is case-sensitive
result = re.match(pattern, test_string)
print(result)

None


In [4]:
test_string = 'alias'
result = re.match(pattern, test_string)
print(result)

<re.Match object; span=(0, 5), match='alias'>


Here, we used the re.match() function to match pattern against test_string. re.match() only looks for a match at the beginning of the string: it returns a match object if the match succeeds and None otherwise.
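To see the "beginning of the string" rule in action, here is a small sketch contrasting re.match() with re.search(), which scans the whole string. The sample text "an alias" is made up for illustration.

```python
import re

text = "an alias"

# re.match() only tries the pattern at index 0, so the later 'alias' is not found
m = re.match("alias", text)
print(m)  # None

# re.search() scans the whole string and returns the first match anywhere
s = re.search("alias", text)
print(s)  # <re.Match object; span=(3, 8), match='alias'>
```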

## Specify Pattern Using RegEx
To specify regular expressions, metacharacters are used. In the above example, ^ and $ are metacharacters.

## Metacharacters
Metacharacters are characters that are interpreted in a special way by a RegEx engine. Here's a list of metacharacters:

[] . ^ $ * + ? {} () \ |
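Because these characters carry special meaning, matching one of them literally needs a backslash in front of it. A minimal sketch, using a made-up price string, showing how re.escape() builds such an escaped pattern for you:

```python
import re

price = "Total: $5.00 (approx.)"

# Unescaped, '$' anchors to the end of the string, so this literal search fails
print(re.search("$5.00", price))  # None

# re.escape() backslash-escapes the metacharacters so they are matched literally
literal = re.escape("$5.00")
print(literal)                    # \$5\.00
print(re.search(literal, price))  # <re.Match object; span=(7, 12), match='$5.00'>
```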

## [] - Square brackets

Square brackets specify a set of characters you wish to match.
You can also specify a range of characters using - inside square brackets.

- [a-e] is the same as [abcde].
- [1-4] is the same as [1234].
- [0-39] is the same as [01239].

You can complement (invert) the character set by placing a caret ^ immediately after the opening square bracket.

- [^abc] means any character except a, b or c.
- [^0-9] means any non-digit character.
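The rules above can be checked quickly with re.findall(); the sample strings here are invented for the demonstration.

```python
import re

text = "abc-123 xyz"

# [^abc]: any single character except 'a', 'b' or 'c'
not_abc = re.findall("[^abc]", "abcxyz")
print(not_abc)     # ['x', 'y', 'z']

# [^0-9]+: runs of one or more non-digit characters
non_digits = re.findall("[^0-9]+", text)
print(non_digits)  # ['abc-', ' xyz']

# [0-39] is the set {0, 1, 2, 3, 9}, not the range 0..39
digit_set = re.findall("[0-39]", "0123456789")
print(digit_set)   # ['0', '1', '2', '3', '9']
```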

In [None]:
# any single letter (lower or upper case) or digit
pattern = "[a-zA-Z0-9]"

In [5]:
import re
test_strings = ["abc", "ab", "hey dear", "dc ce"]
pattern = "[abc]"   # matches a single 'a', 'b' or 'c'
for count, ele in enumerate(test_strings, start=1):
    print(count, ":")
    print(re.findall(pattern, ele))


1 :
['a', 'b', 'c']
2 :
['a', 'b']
3 :
['a']
4 :
['c', 'c']


In [5]:
import re
pattern1 = "[+]*[0-9][0-9]*"   # zero or more '+' signs, then one or more digits
test_string1 = ["a","45","+63","76","-29"]
count=1
for ele in test_string1:
    print(count,":")
    count+=1
    result1 = re.findall(pattern1, ele)
    print(result1)

1 :
[]
2 :
['45']
3 :
['+63']
4 :
['76']
5 :
['29']


In [49]:
pattern1 = "[a-zA-Z]*[^0-9]"   # zero or more letters followed by one non-digit character
test_string1 = ["5","shrs5","8wvhdjh","jdvjhd","ASDCVEV"]
count=1
for ele in test_string1:
    print(count,":")
    count+=1
    result1 = re.findall(pattern1, ele)
    print(result1)

1 :
[]
2 :
['shrs']
3 :
['wvhdjh']
4 :
['jdvjhd']
5 :
['ASDCVEV']


In [18]:
import re
txt = "The rain in Spain"
#Find all lower case characters alphabetically between "a" and "m":
x = re.findall("[a-m]", txt)
print(x)

['h', 'e', 'a', 'i', 'i', 'a', 'i']


## . - Period
A period matches any single character (except newline '\n').
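A short sketch of the newline exception, using a made-up two-line string: by default . stops at '\n', and the re.DOTALL flag lifts that restriction.

```python
import re

text = "ab\ncd"

# By default '.' does not match the newline, so 'b.c' fails across the line break
plain = re.search("b.c", text)
print(plain)                 # None

# With re.DOTALL, '.' matches any character including '\n'
dotall = re.search("b.c", text, re.DOTALL)
print(repr(dotall.group()))  # 'b\nc'
```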

In [33]:
txt = "he2#o world"
#Search for a sequence that starts with "h", followed by three (any) characters, and an "o":
x = re.findall("h...o", txt)
print(x)

['he2#o']


## ^ - Caret
The caret symbol ^ is used to check if a string starts with a certain pattern.
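For text with several lines, ^ normally only matches at the very start of the whole string. The sketch below (with an invented three-line string) shows how the re.MULTILINE flag makes it match at the start of every line as well.

```python
import re

text = "hello world\nhello again\nbye"

# Without flags, '^' only matches at the very start of the string
first_only = re.findall("^hello", text)
print(first_only)  # ['hello']

# With re.MULTILINE, '^' also matches right after each newline
every_line = re.findall("^hello", text, re.MULTILINE)
print(every_line)  # ['hello', 'hello']
```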

In [34]:
txt = "hello world"
#Check if the string starts with 'hello':
x = re.findall("^hello", txt)
if x:
  print("Yes, the string starts with 'hello'")
else:
  print("No match")

Yes, the string starts with 'hello'


## $ - Dollar
The dollar symbol $ is used to check if a string ends with a certain pattern.

In [36]:
txt = "hello world"

#Check if the string ends with 'world':

x = re.findall("world$", txt)
if x:
  print("Yes, the string ends with 'world'")
else:
  print("No match")

Yes, the string ends with 'world'


## * - Star
The star symbol * matches zero or more occurrences of the pattern to its left.
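One detail worth knowing: * is greedy, so it repeats as many times as it can while still allowing a match. Appending ? (giving *?) makes it lazy instead. A small sketch on a made-up HTML snippet:

```python
import re

html = "<b>bold</b>"

# '.*' is greedy: it runs all the way to the last '>'
greedy = re.findall("<.*>", html)
print(greedy)  # ['<b>bold</b>']

# '.*?' is lazy: it stops at the first '>' that completes a match
lazy = re.findall("<.*?>", html)
print(lazy)    # ['<b>', '</b>']

# zero occurrences are allowed, so plain 'ai' still matches 'aix*'
zero = re.findall("aix*", "ai")
print(zero)    # ['ai']
```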

In [53]:
txt = "The rain in Spain falls mainly in the plain! aix aixx"

#Check if the string contains "ai" followed by 0 or more "x" characters:

x = re.findall("aix*", txt)

print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")


['ai', 'ai', 'ai', 'ai', 'aix', 'aixx']
Yes, there is at least one match!


## + - Plus
The plus symbol + matches one or more occurrences of the pattern to its left.

In [39]:
txt = "The rain in Spain falls mainly in the plain!"

#Check if the string contains "ai" followed by 1 or more "x" characters:

x = re.findall("aix+", txt)

print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")

[]
No match


In [58]:
txt = "ai aix aixxxx"
x = re.findall("aix+", txt)
x

['aix', 'aixxxx']

In [13]:
txt = "873476221"

#Check for a 9- or 10-digit number whose first digit is 7, 8 or 9:
pattern = "[7-9][0-9]{8,9}" # after the first digit, [0-9] must repeat at least 8 and at most 9 times
x = re.findall(pattern, txt)
print(x)

['873476221']


## ? - Question Mark
The question mark symbol ? matches zero or one occurrence of the pattern to its left.
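A typical use of ? is an optional character. This sketch, on invented spellings, uses re.fullmatch() so that the whole word has to fit the pattern:

```python
import re

words = ["color", "colour", "colouur"]

# 'u?' makes the 'u' optional: zero or one occurrence
pattern = "colou?r"
matches = [w for w in words if re.fullmatch(pattern, w)]
print(matches)  # ['color', 'colour']
```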

In [41]:
test_string = ["mn","man","maaan","main","woman"]
pattern = "ma?n" # only 0 or 1 occurrences of a are matched
count=1
for ele in test_string:
    print(count,":")
    count+=1
    result1 = re.findall(pattern, ele)
    print(result1)

1 :
['mn']
2 :
['man']
3 :
[]
4 :
[]
5 :
['man']


## {} - Braces
{n,m} means at least n and at most m repetitions of the pattern to its left.
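Braces have a few variants besides {n,m}. The sketch below compares them on a list of made-up words; the helper matching() is hypothetical, written only for this illustration, and uses re.fullmatch() so each word must match completely.

```python
import re

words = ["a", "aa", "aaa", "aaaa", "aaaaa"]

def matching(pattern):
    # keep only the words that the whole pattern matches from start to end
    return [w for w in words if re.fullmatch(pattern, w)]

exactly3 = matching("a{3}")      # exactly 3
two_to_4 = matching("a{2,4}")    # 2 to 4
at_least3 = matching("a{3,}")    # 3 or more
at_most2 = matching("a{,2}")     # 0 to 2
print(exactly3)   # ['aaa']
print(two_to_4)   # ['aa', 'aaa', 'aaaa']
print(at_least3)  # ['aaa', 'aaaa', 'aaaaa']
print(at_most2)   # ['a', 'aa']
```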

In [42]:
txt = "The rain in Spain falls mainly in the plain! alll, allll, alllllll"

#Check if the string contains "a" followed by two to four "l" characters:

x = re.findall("al{2,4}", txt)

print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")

['all', 'alll', 'allll', 'allll']
Yes, there is at least one match!


## | - Alternation (either or)
The vertical bar | is used for alternation (the "or" operator): a|b matches either a or b.
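Two behaviours of | that often surprise beginners: it has the lowest precedence of all the operators, and at any given position the first alternative that matches wins. A sketch on invented strings:

```python
import re

# '^cat|dog$' means '^cat' OR 'dog$', not '^(cat|dog)$'
anchored = re.findall("^cat|dog$", "cats and hotdog")
print(anchored)    # ['cat', 'dog']

# alternatives are tried left to right, so 'Py' wins before 'Python' is tried
first_wins = re.findall("Py|Python", "Python")
print(first_wins)  # ['Py']
```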

In [14]:
txt = "The rain in Spain falls mainly in the plain!"

#Check if the string contains either "falls" or "stays":

x = re.findall("falls|stays", txt)

print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")


['falls']
Yes, there is at least one match!


In [43]:
txt = "Practical 8 - Python3"
x = re.findall("[a-z | A-Z]", txt)   # inside [], ' ' and '|' are literal characters, so spaces are matched too

print(x)

['P', 'r', 'a', 'c', 't', 'i', 'c', 'a', 'l', ' ', ' ', ' ', 'P', 'y', 't', 'h', 'o', 'n']


## () - Group

Parentheses () are used to group sub-patterns. For example, (a|b|c)xz matches a, b or c followed by xz.
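A sketch of the (a|b|c)xz example on a few invented strings. Besides grouping, the parentheses also capture: m.group(1) returns the part matched inside them.

```python
import re

pattern = "(a|b|c)xz"
results = {}
for s in ["axz", "cabxz", "xz", "dxz"]:
    m = re.search(pattern, s)
    # m.group() is the whole match; m.group(1) is the text captured by the parentheses
    results[s] = (m.group(), m.group(1)) if m else None
print(results)
# {'axz': ('axz', 'a'), 'cabxz': ('bxz', 'b'), 'xz': None, 'dxz': None}
```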

## \ - Backslash

A backslash \ is used to escape various characters, including all metacharacters. For example, \$a matches a literal $ followed by a; here $ is not treated as the end-of-string anchor.
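A small sketch of escaping, using a made-up string. It also shows why patterns are usually written as raw strings (r"..."): a raw string hands the backslash to the regex engine unchanged, so you do not have to double it.

```python
import re

text = "Price: $a and 5a"

# '\$' escapes the metacharacter, so r"\$a" matches a literal '$' followed by 'a'
dollar_a = re.findall(r"\$a", text)
print(dollar_a)  # ['$a']

# the raw string r"\$a" and the ordinary string "\\$a" are the same pattern
same = r"\$a" == "\\$a"
print(same)      # True

# a backslash also starts special sequences such as '\d' (any digit)
digits = re.findall(r"\d", text)
print(digits)    # ['5']
```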

## re.findall()
The re.findall() method returns a list of strings containing all matches.
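What the list contains depends on whether the pattern has groups. A sketch on an invented "name: age" string:

```python
import re

text = "john: 25, mary: 31"

# no groups: the full matches
whole = re.findall(r"\w+: \d+", text)
print(whole)       # ['john: 25', 'mary: 31']

# one group: only the text of that group
one_group = re.findall(r"(\w+): \d+", text)
print(one_group)   # ['john', 'mary']

# several groups: a list of tuples
pairs = re.findall(r"(\w+): (\d+)", text)
print(pairs)       # [('john', '25'), ('mary', '31')]

# no match: an empty list, not None
none_found = re.findall(r"\d{3}", text)
print(none_found)  # []
```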

In [15]:
# Program to extract numbers from a string
string = 'hello 12 hi 89. Howdy 3, 9989'
pattern = r'\d+'   # one or more digits (raw string, so '\d' reaches the regex engine unchanged)
result = re.findall(pattern, string) 
print(result)

['12', '89', '3', '9989']


## re.split()
The re.split() function splits the string at every match and returns a list of the pieces between the matches.
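Two useful variations, sketched on a made-up string: the maxsplit argument limits the number of splits, and a capturing group in the pattern keeps the separators in the result.

```python
import re

text = "one1two22three333four"

# split on every run of digits
parts = re.split(r"\d+", text)
print(parts)      # ['one', 'two', 'three', 'four']

# maxsplit=1: split once, the rest stays in the last item
limited = re.split(r"\d+", text, maxsplit=1)
print(limited)    # ['one', 'two22three333four']

# a capturing group keeps the separators
with_seps = re.split(r"(\d+)", text)
print(with_seps)  # ['one', '1', 'two', '22', 'three', '333', 'four']
```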

In [67]:
string = 'Twelve:12 Eighty nine:89.'
pattern = r'\d+'
result = re.split(pattern, string) 
print(result)  
# If the pattern is not found, re.split() returns a list containing the original string.

['Twelve:', ' Eighty nine:', '.']


## re.sub()
The syntax of re.sub() is:

re.sub(pattern, replace, string)

The method returns a string in which every match of pattern is replaced with replace.
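re.sub() also accepts a count argument that limits the number of replacements, and the replacement can be a function that receives each match object. A sketch on an invented string:

```python
import re

text = "a1 b22 c333"

# replace every run of digits with '#'
all_replaced = re.sub(r"\d+", "#", text)
print(all_replaced)  # a# b# c#

# count=1: only the first match is replaced
first_only = re.sub(r"\d+", "#", text, count=1)
print(first_only)    # a# b22 c333

# a function replacement: each digit run becomes its length
lengths = re.sub(r"\d+", lambda m: str(len(m.group())), text)
print(lengths)       # a1 b2 c3
```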

In [16]:
# Program to remove all whitespaces

# string containing spaces and a newline
string = 'abc 12\\de 23 \n f45 6'

# matches all whitespace characters
pattern = r'\s+'

# empty string
replace = ''

new_string = re.sub(pattern, replace, string) 
print(new_string)
# If the pattern is not found, re.sub() returns the original string.

abc12\de23f456


## re.search()
The re.search() method takes two arguments: a pattern and a string. The method looks for the first location where the RegEx pattern produces a match with the string.

If the search is successful, re.search() returns a match object; if not, it returns None.

match = re.search(pattern, string)

In [52]:
import re
string = "Python Python is fun"

# find the first occurrence of 'Python' anywhere in the string
match = re.search('Python', string)
print(match)
if match:
  print("pattern found inside the string")
else:
  print("pattern not found")  

<re.Match object; span=(0, 6), match='Python'>
pattern found inside the string
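A match object can tell you more than "found or not". This sketch reuses the string from the cell above and searches for "is" to show the standard group(), start(), end() and span() methods.

```python
import re

string = "Python Python is fun"
match = re.search("is", string)

if match:
    print(match.group())  # is        (the matched text)
    print(match.start())  # 14        (index where the match begins)
    print(match.end())    # 16        (index just past the match)
    print(match.span())   # (14, 16)
```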


In [None]:
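# Exercise: write a RE for a valid mail id, using these samples:
#   bongirwarvk@rknec.edu
#   vrushali_bb@gmail.com
#   Afh.is_dd23@gmail.com
#   nsd23@gov.in
#
#   ss.ss@fff
#   ss@ns.i
#
# Exercise: write a RE for valid mobile nos.
#
# Exercise: write a RE for valid complex nos, using these samples:
#   2+9j  3.8+7.3j  0+6j  5.4+0j
#   8
#   9j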
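One possible starting point for the mail-id exercise, assuming the last two sample addresses (ss.ss@fff and ss@ns.i) are meant as invalid. The rule encoded here is a simplification chosen for these samples, not a complete email validator: a local part of letters, digits, '.' or '_', then '@', a domain name, and one or more '.suffix' parts of at least two letters.

```python
import re

# Simplified rule (an assumption, not a full email-address validator)
EMAIL = re.compile(r"[A-Za-z0-9._]+@[A-Za-z0-9]+(\.[A-Za-z]{2,})+")

samples = [
    "bongirwarvk@rknec.edu",
    "vrushali_bb@gmail.com",
    "Afh.is_dd23@gmail.com",
    "nsd23@gov.in",
    "ss.ss@fff",
    "ss@ns.i",
]

# fullmatch requires the whole string to fit the pattern
results = {s: bool(EMAIL.fullmatch(s)) for s in samples}
for s, ok in results.items():
    print(s, ok)
```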
