<br>
<br>

## `A regular expression is a special sequence of characters that defines a search pattern for finding a string or a set of strings.`

<br>
<br>

In [1]:
import re

<br>
<br>

# Metacharacters in regex:

<br>
<br>

1. `.` : Matches any character except a newline.
2. `^` : Matches the start of the string.
3. `$` : Matches the end of the string or just before the newline at the end of the string.
4. `*` : Matches 0 or more repetitions of the preceding regex.
5. `+` : Matches 1 or more repetitions of the preceding regex.
6. `?` : Matches 0 or 1 repetition of the preceding regex.
7. `{}` : Matches the specified number of repetitions: `{m}` means exactly m, `{m,n}` means from m to n.
8. `[]` : Used to indicate a set of characters.
9. `|` : Matches either the regex on the left or the regex on the right.
10. `()` : Used for grouping and capturing.
11. `\` : Escapes a special character or signals a special sequence.




In [2]:

a = "charlie chaplin coa and the chocolate factory"
b = "ayushi.jain@gmail.com"
c = "hello"
d = "XYZ,YZ,XYZZ,XYYZ,XXZZY,ZYZ,"

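# re.search(pattern, string) returns the first match, or None if there is none
match = re.search(r".",b)
print("r'.' : {}".format(match))
# An unescaped `.` matches any character, so this returns 'a'.
# To match a literal '.', escape it with `\`.
# In the output, span is the (start, end) index pair of the match; end is exclusive.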


match = re.search(r"\.",b)
print("r'\\.' : {}".format(match))


match = re.search(r"[l]",c)
print(match)

r'.' : <re.Match object; span=(0, 1), match='a'>
r'\.' : <re.Match object; span=(6, 7), match='.'>
<re.Match object; span=(2, 3), match='l'>
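
The printed `re.Match` objects above bundle the matched text and its span. As a small sketch (reusing the same sample email string, with variable names chosen only for illustration), the pieces can be read back with `group()` and `span()`:

```python
import re

email = "ayushi.jain@gmail.com"  # same sample string as above

match = re.search(r"\.", email)  # first literal dot

if match is not None:             # re.search returns None when nothing matches
    first_dot = match.group()     # the matched text
    start, end = match.span()     # (start, end) indices; end is exclusive
    print(first_dot, start, end)  # . 6 7
    print(email[start:end])       # slicing with the span gives the same text
```

Checking for `None` before calling `group()` avoids an `AttributeError` when the pattern does not occur.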


<br>

## Use of `re.findall()`

### 1. `[]` : Used to indicate a set of characters.

<br>

In [3]:

match = re.findall(r"[l]",c)
print(match)

match = re.findall(r"[ayu]",b)
print(match)




['l', 'l']
['a', 'y', 'u', 'a', 'a']


<br>
<br>

# 2. `^` : Matches the start of the string.

<br>
<br>

In [4]:

match = re.search(r"^l",c) # no match: the first character of c is 'h', not 'l'
print(match)

match = re.search(r"^l",b) # no match: the first character of b is 'a', not 'l'
print(match)

match = re.search(r"^a",b) # match: the first character of b is 'a'
print(match)

match = re.findall(r"^k",b) # no match: the first character of b is not 'k'
print(match)

match = re.findall(r"^a",b)
print(match)




None
None
<re.Match object; span=(0, 1), match='a'>
[]
['a']
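
By default `^` anchors only at the very start of the string. The sketch below, using a made-up three-line string, shows how the standard `re.MULTILINE` flag changes that so `^` also matches right after each newline:

```python
import re

text = "apple\navocado\nbanana"  # hypothetical three-line sample

# Default: ^ anchors only at the very start of the string.
default_hits = re.findall(r"^a[a-z]+", text)
print(default_hits)    # ['apple']

# With re.MULTILINE, ^ also anchors right after each newline.
multiline_hits = re.findall(r"^a[a-z]+", text, flags=re.MULTILINE)
print(multiline_hits)  # ['apple', 'avocado']
```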


<br>
<br>


# 3. `$` : Matches the end of the string or just before the newline at the end of the string.

<br>
<br>

In [5]:
match = re.search(r"com$",b)
print(match)

match = re.search(r"shi.jain@gmail.com$",b)
print(match)

match = re.search(r"dkfshi.jain@gmail.com$",b)
print(match)

match = re.findall(r"com$",b)
print(match)

match = re.findall(r"shi.jain@gmail.com$",b)
print(match)

match = re.findall(r"dkfshi.jain@gmail.com$",b)
print(match)




<re.Match object; span=(18, 21), match='com'>
<re.Match object; span=(3, 21), match='shi.jain@gmail.com'>
None
['com']
['shi.jain@gmail.com']
[]
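
The "just before the newline at the end of the string" part of the description is easy to miss. A minimal sketch, assuming the same email string with a newline appended:

```python
import re

with_newline = "ayushi.jain@gmail.com\n"  # sample string plus a trailing newline

# $ still matches because it may sit just before a final newline.
m = re.search(r"com$", with_newline)
print(m.span())  # (18, 21)

# Text after the newline moves the end of the string, so $ no longer matches.
no_match = re.search(r"com$", with_newline + "x")
print(no_match)  # None
```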


<br>
<br>

# 4. `.` : Matches any character except a newline.

<br>
<br>


In [6]:
# one dot
match = re.search(r'c.a',a)
print(match)

match = re.findall(r'c.a',a)
print(match)


# two dot
match = re.findall(r'c..p',a)
print(match)


<re.Match object; span=(0, 3), match='cha'>
['cha', 'cha', 'coa']
['chap']
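
The "except a newline" exception can be seen with a string that has a newline between `c` and `a`. This sketch uses a made-up sample and the standard `re.DOTALL` flag, which lets `.` match newlines as well:

```python
import re

two_lines = "c\na cha"  # hypothetical sample: a newline sits between 'c' and 'a'

default_hits = re.findall(r"c.a", two_lines)
print(default_hits)  # ['cha']

dotall_hits = re.findall(r"c.a", two_lines, flags=re.DOTALL)
print(dotall_hits)   # ['c\na', 'cha']
```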


<br>
<br>

# 5. `|` : Matches either the regex on the left or the regex on the right.

<br>
<br>


In [7]:

match = re.findall(r" c.a | c..p",a)
print(match) 

match = re.findall(r"cha|fac",a)
print(match) 

# match only the right side
match = re.findall(r"chomun|fac",a)
print(match) 




[' chap', ' coa ']
['cha', 'cha', 'fac']
['fac']
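
One thing to keep in mind is that `|` splits the whole pattern unless a group limits it. The sketch below uses a made-up sample string and a non-capturing group `(?:...)` to restrict the alternation to a single letter:

```python
import re

text = "gray grey"  # hypothetical sample

# | has the lowest precedence, so this reads as "gra" OR "ey".
loose = re.findall(r"gra|ey", text)
print(loose)    # ['gra', 'ey']

# A group limits the alternation to the single letter.
grouped = re.findall(r"gr(?:a|e)y", text)
print(grouped)  # ['gray', 'grey']
```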



<br>
<br>

# 6. `?` : Matches 0 or 1 repetition of the preceding regex.

<br>
<br>

In [8]:
match = re.findall(r"cha?a",a)
print(match)


m = "cha chao chk cho cham chokk"
match = re.findall(r"ch?a",m)
print(match)


['cha', 'cha']
['cha', 'cha', 'cha']


<br>
<br>

# 7. `*` : Matches 0 or more repetitions of the preceding regex.

<br>
<br>

In [9]:

m = "cha chaaaao chk cho cham chokk"
match = re.findall(r"cha*",m)
print(match)


['cha', 'chaaaa', 'ch', 'ch', 'cha', 'ch']


<br>
<br>

# 8. `+` : Matches 1 or more repetitions of the preceding regex.

<br>
<br>

In [10]:

match = re.findall(r"XY+Z",d)
print(match)

m = "XYZ,XYZ,XYYZ, XYYYYYK, XXXYYYYYZZZ"
match = re.findall(r"XY+Z",m)
print(match)


['XYZ', 'XYZ', 'XYYZ']
['XYZ', 'XYZ', 'XYYZ', 'XYYYYYZ']
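
To compare `?`, `*` and `+` side by side, the sketch below runs all three against one made-up string whose words contain zero, one, two and three `Y` characters:

```python
import re

sample = "XZ XYZ XYYZ XYYYZ"  # hypothetical sample with 0, 1, 2 and 3 Y's

zero_or_one = re.findall(r"XY?Z", sample)
print(zero_or_one)   # ['XZ', 'XYZ']

zero_or_more = re.findall(r"XY*Z", sample)
print(zero_or_more)  # ['XZ', 'XYZ', 'XYYZ', 'XYYYZ']

one_or_more = re.findall(r"XY+Z", sample)
print(one_or_more)   # ['XYZ', 'XYYZ', 'XYYYZ']
```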


<br>
<br>

# 9. `{}` : Matches the specified number of repetitions (`{m}` exactly m, `{m,n}` from m to n).
<br>
<br>

In [11]:

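# the runs of 'a' have lengths 2, 8 and 6; with the range {2,5} they split as 2 | 5+3 | 5 (the last single 'a' is too short to match)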
m = "aaBbbbbbbbBBBBBaaaaaaaaKKaaaaaa"
match = re.findall(r"a{2,5}",m)
print(match)


['aa', 'aaaaa', 'aaa', 'aaaaa']
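
The braces accept three forms: `{m}`, `{m,}` and `{m,n}`. The sketch below uses a made-up sample string and wraps each pattern in `\b` (the word-boundary sequence covered in the Special Sequences section) so that only whole runs are counted:

```python
import re

runs = "a aa aaa aaaa"  # hypothetical sample: runs of 1 to 4 'a' characters

exactly_two = re.findall(r"\ba{2}\b", runs)
print(exactly_two)   # ['aa']

two_or_more = re.findall(r"\ba{2,}\b", runs)
print(two_or_more)   # ['aa', 'aaa', 'aaaa']

two_to_three = re.findall(r"\ba{2,3}\b", runs)
print(two_to_three)  # ['aa', 'aaa']
```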


<br>
<br>

# 10. `()` : Used for grouping and capturing.

<br>
<br>

In [12]:

# d = "XYZ,YZ,XYZZ,XYYZ,XXZZY,ZYZ,"

match = re.findall(r"(X|Y)Z",d)
print(match)


['Y', 'Y', 'Y', 'Y', 'X', 'Y']
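
The output above contains only `X` or `Y` because, when a pattern has one capturing group, `re.findall()` returns what the group captured rather than the whole match. A short sketch on the same string `d` shows the difference between a capturing group and a non-capturing group `(?:...)`:

```python
import re

d = "XYZ,YZ,XYZZ,XYYZ,XXZZY,ZYZ,"  # same sample string as above

# One capturing group: findall returns only what the group captured.
captured = re.findall(r"(X|Y)Z", d)
print(captured)  # ['Y', 'Y', 'Y', 'Y', 'X', 'Y']

# Non-capturing group (?:...): findall returns the whole match instead.
whole = re.findall(r"(?:X|Y)Z", d)
print(whole)     # ['YZ', 'YZ', 'YZ', 'YZ', 'XZ', 'YZ']

# With search, group(0) is the whole match and group(1) is the first group.
m = re.search(r"(X|Y)Z", d)
print(m.group(0), m.group(1))  # YZ Y
```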


<br>
<br>
<br>

# Special Sequences:

| Character | Description                                                                                                         | Example                 |
|-----------|---------------------------------------------------------------------------------------------------------------------|-------------------------|
| `\A`      | Matches only if the specified characters are at the beginning of the string                                         | `\AThe`                 |
| `\b`      | Matches where the specified characters are at the beginning or at the end of a word (use a raw string `r""`)        | `r"\bain"`, `r"ain\b"`  |
| `\B`      | Matches where the specified characters are present, but NOT at the beginning (or at the end) of a word              | `r"\Bain"`, `r"ain\B"`  |
| `\d`      | Matches any digit (0-9)                                                                                             | `\d`                    |
| `\D`      | Matches any character that is NOT a digit                                                                           | `\D`                    |
| `\s`      | Matches any whitespace character                                                                                    | `\s`                    |
| `\S`      | Matches any character that is NOT whitespace                                                                        | `\S`                    |
| `\w`      | Matches any word character (a to z, A to Z, digits 0-9, and the underscore `_`)                                     | `\w`                    |
| `\W`      | Matches any character that is NOT a word character                                                                  | `\W`                    |
| `\Z`      | Matches only if the specified characters are at the end of the string                                               | `Spain\Z`               |

<br>
<br>
<br>

<br>

# 1. `\A` Returns a match if the specified characters are at the beginning of the string          

<br>

In [13]:
a = "harray porter"

match = re.search(r"\Aharray",a)
print(match)

match = re.findall(r"\Aharr",a)
print(match)


match = re.search(r"\Aharray por",a)
print(match)

match = re.search(r"\Adlkfj",a)
print(match)

match = re.findall(r"\Aharray portk",a)
print(match)



<re.Match object; span=(0, 6), match='harray'>
['harr']
<re.Match object; span=(0, 10), match='harray por'>
None
[]
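
`\A` looks like `^`, but the two differ once the `re.MULTILINE` flag is used: `^` then matches at the start of every line, while `\A` still matches only at the start of the whole string. A small sketch with a made-up two-line string:

```python
import re

text = "harray porter\nharray again"  # hypothetical two-line sample

caret_hits = re.findall(r"^harray", text, flags=re.MULTILINE)
print(caret_hits)   # ['harray', 'harray']

anchor_hits = re.findall(r"\Aharray", text, flags=re.MULTILINE)
print(anchor_hits)  # ['harray']
```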


<br>
<br>

# 2. `\b` Returns a match where the specified characters are at the beginning or at the end of a word (use raw string `r""`)

<br>
<br>

In [14]:

a = "harray porter harray"

#---------beginning----------
match = re.findall(r"\bay",a)
print(match)

match = re.findall(r"\bha",a)
print(match)

match = re.findall(r"ay\a",a) # \a is the ASCII bell character, not a word boundary, so nothing matches
print(match)

#---------ending-------------
match = re.search(r"ay\b",a)
print(match)

match = re.findall(r"ay\b",a)
print(match)



[]
['ha', 'ha']
[]
<re.Match object; span=(4, 6), match='ay'>
['ay', 'ay']
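
A common use of `\b` is to place it on both sides of a pattern to match whole words only. The sketch below uses a made-up sentence, and also shows why the raw string matters: in a normal string literal, `"\b"` is the backspace character rather than a word boundary.

```python
import re

text = "cat concat category cat."  # hypothetical sample

anywhere = re.findall(r"cat", text)
print(len(anywhere))  # 4

whole_words = re.findall(r"\bcat\b", text)
print(whole_words)    # ['cat', 'cat']

# Without the r prefix, "\b" is a backspace character, so nothing matches.
not_raw = re.findall("\bcat\b", text)
print(not_raw)        # []
```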


<br>
<br>

# 3. `\B` Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word

<br>
<br>

In [15]:


a = "harray porter harray"

match = re.findall(r"\Barra",a)
print(match)

# 'te' matches because it is not at the end of a word (it is followed by 'r')
match = re.search(r"te\B",a)
print(match)

# 'or' matches because it is not at the beginning of a word (it is preceded by 'p')
match = re.findall(r"\Bor",a)
print(match)



['arra', 'arra']
<re.Match object; span=(10, 12), match='te'>
['or']


<br>
<br>

# 4. `\d` Matches any digit (0-9).

<br>

In [16]:

a = "yasin2102030k993"

match = re.findall(r"\d",a)
print(match)

match = re.findall(r"\d{2}",a)
print(match)


['2', '1', '0', '2', '0', '3', '0', '9', '9', '3']
['21', '02', '03', '99']
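
Combining `\d` with the `+` quantifier pulls out whole numbers instead of single digits. A minimal sketch on the same sample string:

```python
import re

a = "yasin2102030k993"  # same sample string as above

numbers = re.findall(r"\d+", a)      # each run of consecutive digits
print(numbers)                       # ['2102030', '993']

as_ints = [int(n) for n in numbers]  # convert the digit strings to integers
print(as_ints)                       # [2102030, 993]
```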


<br>
<br>

# 5. `\D` Matches any character that is NOT a digit.

<br>

In [17]:

a = "yasin2102030k993 KALU"

match = re.findall(r"\D",a)
print(match)

match = re.findall(r"\D{5}",a)
print(match)


['y', 'a', 's', 'i', 'n', 'k', ' ', 'K', 'A', 'L', 'U']
['yasin', ' KALU']


<br>
<br>

# 6. `\s` Matches any whitespace character.

<br>

In [18]:

a = "My name is Yasin Arafat."
match = re.findall(r"\s",a)
print(match)



[' ', ' ', ' ', ' ']


<br>
<br>

# 7. `\S` Matches any character that is NOT whitespace.

<br>

In [19]:

a = "My name is Yasin Arafat."
match = re.findall(r"\S",a)
print(match)



['M', 'y', 'n', 'a', 'm', 'e', 'i', 's', 'Y', 'a', 's', 'i', 'n', 'A', 'r', 'a', 'f', 'a', 't', '.']


<br>
<br>

# 8. `\w` Matches any word character (a to z, A to Z, digits 0-9, and the underscore `_`).

<br>

In [20]:

a = "My name is Yasin_Arafat !@#$%^&*()."
match = re.findall(r"\w",a)
print(match)



['M', 'y', 'n', 'a', 'm', 'e', 'i', 's', 'Y', 'a', 's', 'i', 'n', '_', 'A', 'r', 'a', 'f', 'a', 't']
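
With the `+` quantifier, `\w` returns whole words rather than single characters. A short sketch on the same sample string:

```python
import re

a = "My name is Yasin_Arafat !@#$%^&*()."  # same sample string as above

words = re.findall(r"\w+", a)  # each run of consecutive word characters
print(words)                   # ['My', 'name', 'is', 'Yasin_Arafat']
```

The underscore counts as a word character, so `Yasin_Arafat` stays in one piece.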


<br>
<br>

# 9. `\W` Matches any character that is NOT a word character.

<br>

In [21]:

a = "My name is Yasin_Arafat !@#$%^&*()."
match = re.findall(r"\W",a)
print(match)



[' ', ' ', ' ', ' ', '!', '@', '#', '$', '%', '^', '&', '*', '(', ')', '.']


<br>
<br>

# 10. `\Z`  Returns a match if the specified characters are at the end of the string 

<br>

In [22]:

a = "My name is Yasin_Arafat !@#$%^&*()."

# the unescaped () is an empty capturing group, so findall returns what it captured: ''
match = re.findall(r"().\Z",a)
print(match)

match = re.findall(r".\Z",a)
print(match)


['']
['.']
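
To match the literal characters `().` at the end of the string, the parentheses and the dot need to be escaped. The sketch below shows that on the same sample string, and also shows how `\Z` differs from `$` when the string ends with a newline:

```python
import re

a = "My name is Yasin_Arafat !@#$%^&*()."  # same sample string as above

# Escaped parentheses and dot match the literal characters "()." at the end.
literal_end = re.findall(r"\(\)\.\Z", a)
print(literal_end)  # ['().']

# Unlike $, \Z does not match before a trailing newline.
dollar_ok = re.search(r"\.$", a + "\n") is not None
anchor_ok = re.search(r"\.\Z", a + "\n") is not None
print(dollar_ok, anchor_ok)  # True False
```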


<br>
<br>

# Sets in regex:

| Set          | Description                                                                                                                        |
|--------------|------------------------------------------------------------------------------------------------------------------------------------|
| `[arn]`      | Returns a match where one of the specified characters (a, r, or n) is present                                                      |
| `[a-n]`      | Returns a match for any lower case character, alphabetically between a and n                                                       |
| `[^arn]`     | Returns a match for any character EXCEPT a, r, and n                                                                               |
| `[0123]`     | Returns a match where any of the specified digits (0, 1, 2, or 3) are present                                                      |
| `[0-9]`      | Returns a match for any digit between 0 and 9                                                                                      |
| `[0-5][0-9]` | Returns a match for any two-digit number from 00 to 59                                                                             |
| `[a-zA-Z]`   | Returns a match for any character alphabetically between a and z, lower case OR upper case                                         |
| `[+]`        | In sets, `+`, `*`, `.`, `\|`, `()`, `$`, `{}` have no special meaning, so `[+]` means: return a match for any `+` character in the string |


<br>
<br>

# 1. `[arn]` Returns a match where one of the specified characters (a, r, or n) is present  

<br>

In [23]:

a = "charlie chaplin coa and the chocolate factory"

match = re.findall(r"[fac]",a)
print(match)

match = re.findall(r"[k]",a)
print(match)


['c', 'a', 'c', 'a', 'c', 'a', 'a', 'c', 'c', 'a', 'f', 'a', 'c']
[]


<br>
<br>

# 2. `[a-n]`  Returns a match for any lower case character, alphabetically between a and n    

<br>

In [24]:

a = "charlie chaplin coa and the chocolate factory"

match = re.findall(r"[a-d]",a)
print(match)

match = re.findall(r"[m-n]",a)
print(match)

['c', 'a', 'c', 'a', 'c', 'a', 'a', 'd', 'c', 'c', 'a', 'a', 'c']
['n', 'n']


<br>
<br>

# 3. `[^arn]`  Returns a match for any character EXCEPT a, r, and n          

<br>

In [25]:
a = "aaaaaabbbbbcccccddddddd kkk lll"

match = re.findall(r"[^abc]",a)
print(match)

['d', 'd', 'd', 'd', 'd', 'd', 'd', ' ', 'k', 'k', 'k', ' ', 'l', 'l', 'l']


<br>
<br>

# 4. `[0123]` Returns a match where any of the specified digits (0, 1, 2, or 3) are present        
<br>

In [26]:
a = "34324123432423"

match = re.findall(r"[0123]",a)
print(match)

['3', '3', '2', '1', '2', '3', '3', '2', '2', '3']


<br>
<br>

# 5. `[0-9]` Returns a match for any digit between 0 and 9                 

<br>

In [27]:
a = "45dfjkl56dflj78kk80ll90mm99"
match = re.findall(r"[0-5]",a) # narrower range than [0-9]: only the digits 0 to 5 match
print(match)

['4', '5', '5', '0', '0']


<br>
<br>

# 6. `[0-5][0-9]` Returns a match for any two-digit numbers from 00 to 59                                 
<br>

In [28]:
a = "45dfjkl56dflj78kk80ll90mm99"
match = re.findall(r"[0-5][0-9]",a)
print(match)

['45', '56']
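
A typical use of `[0-5][0-9]` is the minutes part of a clock time. The sketch below is only an illustration: the helper name `is_valid_time` and the 24-hour `HH:MM` format are assumptions, not part of the notebook. It uses `fullmatch` so the whole string has to fit the pattern.

```python
import re

# Hypothetical helper: checks a 24-hour "HH:MM" string.
# Hours are 00-19 or 20-23; minutes are 00-59 via [0-5][0-9].
TIME_RE = re.compile(r"(?:[01][0-9]|2[0-3]):[0-5][0-9]")

def is_valid_time(text):
    # fullmatch succeeds only if the entire string matches the pattern
    return TIME_RE.fullmatch(text) is not None

print(is_valid_time("09:45"))  # True
print(is_valid_time("23:59"))  # True
print(is_valid_time("12:60"))  # False
print(is_valid_time("24:00"))  # False
```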


<br>
<br>

# 7. `[a-zA-Z]`  Returns a match for any character alphabetically between a and z, lower case OR upper case       

<br>

In [29]:

a = "ABCabc"
match = re.findall(r"[A-Ba-b]",a)
print(match)

['A', 'B', 'a', 'b']


<br>
<br>

# 8. `[+]` In sets, +, *, ., |, (), $, {} have no special meaning, so `[+]` means: return a match for any `+` character in the string

<br>

In [34]:
a = "!@#$%^&*()_+_"

match = re.findall(r"[+_()]",a)
print(match)

match = re.findall(r"[_]",a)
print(match)

match = re.findall(r"[$]",a)
print(match)


['(', ')', '_', '+', '_']
['_', '_']
['$']
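
A few characters do keep a special role inside a set: `^` as the first character negates the set, `-` between two characters forms a range, and `]` and `\` need escaping. The sketch below uses a made-up sample string to show how each can be matched literally:

```python
import re

a = "a-b^c]d\\e"  # hypothetical sample containing - ^ ] and a backslash

caret = re.findall(r"[a^]", a)       # ^ is not first, so it is a literal
print(caret)      # ['a', '^']

hyphen = re.findall(r"[a\-c]", a)    # escaped hyphen is a literal '-'
print(hyphen)     # ['a', '-', 'c']

in_range = re.findall(r"[a-c]", a)   # unescaped hyphen forms the range a to c
print(in_range)   # ['a', 'b', 'c']

brackets = re.findall(r"[\]\\]", a)  # escaped ] and escaped backslash
print(brackets)   # [']', '\\']
```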
