## re — Regular expression operations
A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.<br>
RegEx can be used to check if a string contains the specified search pattern.

### Topics to Learn

- [Metacharacters](#Metacharacters)
- [Special Sequences](#Special-Sequences)
- [re Functions](#re-Functions)
- [re Objects](#re-Objects)
- [Match Objects](#Match-Objects)
- [re Flags](#re-Flags)

In [1]:
# Importing Regular expression module
import re

### Metacharacters
- [[] – Square Brackets](#[]-–-Square-Brackets)
- [\ – Backslash](#\\-–-Backslash)
- [. – Dot](#.-–-Dot)
- [^ – Caret](#^-–-Caret)
- [\\$ – Dollar](#$-–-Dollar)
- [* – Star](#*-–-Star)
- [+ – Plus](#+-–-Plus)
- [? – Question Mark](#?-–-Question-Mark)
- [{} – Braces](#{}-–-Braces)
- [| – Or](#|-–-Or)
- [() – Parenthesis](#()-–-Parenthesis)

#### [] – Square Brackets
Square brackets ([]) represent a character class: a set of characters, any one of which may match. For example, the character class [abc] matches a single a, b, or c. 
We can also specify a range of characters using a hyphen (-) inside the square brackets. For example, 

- [0-3] is the same as [0123]
- [a-c] is the same as [abc]
<br>
We can also invert the character class by placing a caret (^) immediately after the opening bracket. For example, 

- [^0-3] means any character except 0, 1, 2, or 3
- [^a-c] means any character except a, b, or c

In [None]:
txt = "[^0-3] means any number except 0, 1, 2, or 3"
#Find all lower case characters alphabetically between "a" and "m":
pattern = r"[a-m]"
print(re.findall(pattern, txt))
print()
#Find all vowels in the string:
pattern = r"[aeiouAEIOU]"
print(re.findall(pattern, txt))
print()
#Find all digits from 2 to 5 in the string:
pattern = r"[2-5]"
print(re.findall(pattern, txt))
print()
#Find all characters except alphabetical letters in the string:
pattern = r"[^a-zA-Z]"
print(re.findall(pattern, txt))

#### \ – Backslash
The backslash (\\) escapes the character that follows it, so that a metacharacter is matched literally instead of being treated in a special way.

In [None]:
s = 'The value of pi is 3.14'
 
# without using \
match = re.search(r'.', s)
print(match)
 
# using \
match = re.search(r'\.', s)
print(match)

#### . – Dot
The dot (.) matches any single character except the newline character (\n).

In [None]:
string = "The quick brown fox jumps over the lazy dog."
pattern = r"brown..ox" # two dots - one for white space and one for 'f'
 
match = re.search(pattern, string)
if match:
    print("Match found!")
else:
    print("Match not found.")

#### ^ – Caret
The caret (^) matches at the beginning of the string, i.e. it checks whether the string starts with the given character(s).

In [None]:
regex = r'^The'
strings = ['The quick brown fox', 'The lazy dog', 'A quick brown fox', 'Brown The fox']
Matched = []
Not_matched = []
for string in strings:
    if re.search(regex, string):  # re.search, so the ^ does the anchoring
        Matched.append(string)
    else:
        Not_matched.append(string)
print("Matched strings: ", Matched)
print("Not matched strings: ", Not_matched)

#### \\$ – Dollar
The dollar ($) matches at the end of the string, i.e. it checks whether the string ends with the given character(s).

In [3]:
string = "Hello World!"
pattern = r"!$"
 
match = re.search(pattern, string)
if match:
    print("Match found!")
else:
    print("Match not found.")

#### * – Star
Star (*) symbol matches zero or more occurrences of the regex preceding the * symbol.

In [4]:
txt = "hello planet hellllllllo helo heo"

#Search for a sequence that starts with "he", followed by 0 or more "l" characters, and an "o":

re.findall("hel*o", txt)

#### + – Plus
Plus (+) symbol matches one or more occurrences of the regex preceding the + symbol.

In [5]:
txt = "hello planet hellllllllo helo heo"

#Search for a sequence that starts with "he", followed by 1 or more "l" characters, and an "o":

re.findall("hel+o", txt)

#### ? – Question Mark
The question mark (?) matches zero or one occurrence of the regex preceding it, which makes that part of the pattern optional.

In [6]:
txt = "hello planet hellllllllo helo heo"

#Search for a sequence that starts with "he", followed by 0 or 1 "l" characters, and an "o":

re.findall("hel?o", txt)

#### {} – Braces
- {m}<br>
Specifies that exactly m copies of the previous RE should be matched.
- {m,n}<br>
Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as many repetitions as possible.
- {m,n}?<br>
Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as few repetitions as possible.
- {m,n}+<br>
Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as many repetitions as possible without establishing any backtracking points. This possessive form requires Python 3.11 or later.

In [7]:
txt = "hello planet hellllllllo helo heo"

#Search for a sequence that starts with "he", followed by exactly 2 "l" characters, and an "o":

re.findall("hel{2}o", txt)

In [8]:
txt = "hello planet hellllllllo helo heo"

#Search for a sequence that starts with "he", followed by 0 to 3 "l" characters, and an "o":

re.findall("hel{0,3}o", txt)

In [9]:
txt = "aaaaaaa"

print("Without question mark")
print(re.findall("a{3,5}", txt))

# match as few repetitions as possible
print("With question mark")
print(re.findall("a{3,5}?", txt))

In [10]:
txt = "aaaaaaaaa"

print("Without + sign")
print(re.findall("a{3,5}", txt))

# possessive: match as many repetitions as possible without backtracking (Python 3.11+)
print("With + sign")
print(re.findall("a{3,5}+", txt))

#### | – Or
A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B.

In [11]:
txt = "The rain in Spain falls mainly in the plain!"

#Check if the string contains either "falls" or "stays":

x = re.findall("falls|stays", txt)

print(x)

#### () – Parenthesis
Parentheses group a sub-pattern and capture the text it matches. 

In [12]:
txt = "acd abd adbc abcd dabcd"

re.findall("(a|b)cd", txt)

[<p style="text-align:right">[Back to Current Topic]</p>](#Metacharacters)
[<p style="text-align:right">[Back to top]</p>](#re-—-Regular-expression-operations)

### Special Sequences
- [\A](#\A)
- [\b](#\b)
- [\B](#\B)
- [\d](#\d)
- [\D](#\D)
- [\s](#\s)
- [\S](#\S)
- [\w](#\w)
- [\W](#\W)
- [\Z](#\Z)

#### \A
Returns a match if the specified characters are at the beginning of the string

In [13]:
txt = "The rain in Spain"
x = re.findall(r"\AThe", txt)
print(x)
if x:
    print("Yes, there is a match!")
else:
    print("No match")

In [14]:
txt = "the rain in The Spain"
x = re.findall(r"\AThe", txt)
print(x)
if x:
    print("Yes, there is a match!")
else:
    print("No match")

#### \b
Returns a match where the specified characters are at the beginning or at the end of a word. The "r" prefix makes the pattern a raw string; without it, Python would read "\b" as a backspace character.

In [15]:
txt = "The rain in Spain"

#Check if "ain" is present at the beginning of a WORD:

x = re.findall(r"\bain", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

In [16]:
txt = "The rain in Spain"

#Check if "rain" is present at the beginning of a WORD:

x = re.findall(r"\brain", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

In [17]:
txt = "The rain in Spain"

#Check if "ain" is present at the end of a WORD:

x = re.findall(r"ain\b", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

#### \B
Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word (the "r" in the beginning is making sure that the string is being treated as a "raw string")

In [18]:
txt = "The rain in Spain"

#Check if "ain" is present, but NOT at the beginning of a word:

x = re.findall(r"\Bain", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

In [19]:
txt = "The rain in Spain"

#Check if "ain" is present, but NOT at the end of a word:

x = re.findall(r"ain\B", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

#### \d
Returns a match where the string contains digits (numbers from 0-9)

In [20]:
txt = "The rain in Spain"

#Check if the string contains any digits (numbers from 0-9):

x = re.findall(r"\d", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

In [21]:
txt = "The rain in 1Spain"

#Check if the string contains any digits (numbers from 0-9):

x = re.findall(r"\d", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

#### \D
Returns a match at every character that is NOT a digit

In [22]:
txt = "The rain in Spain"

#Return a match at every non-digit character:

x = re.findall(r"\D", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

In [23]:
txt = "123456789"

#Return a match at every non-digit character:

x = re.findall(r"\D", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

#### \s
Returns a match where the string contains a white space character

In [24]:
txt = "The rain in Spain"

#Return a match at every white-space character:

x = re.findall(r"\s", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

In [25]:
txt = "TheRainInSpain"

#Return a match at every white-space character:

x = re.findall(r"\s", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

#### \S
Returns a match at every character that is NOT a white space character

In [26]:
txt = "The rain in Spain"

#Return a match at every NON white-space character:

x = re.findall(r"\S", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

In [27]:
txt = "     "

#Return a match at every NON white-space character:

x = re.findall(r"\S", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

#### \w
Returns a match at every word character (letters a-z and A-Z, digits 0-9, and the underscore _ character)

In [28]:
txt = "The rain in Spain"

#Return a match at every word character (letters a-z and A-Z, digits 0-9, and the underscore _ character):

x = re.findall(r"\w", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

In [29]:
txt = "@# $%^"

#Return a match at every word character (letters a-z and A-Z, digits 0-9, and the underscore _ character):

x = re.findall(r"\w", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

#### \W
Returns a match at every character that is NOT a word character

In [30]:
txt = "The rain in Spain"

#Return a match at every NON word character (anything other than letters, digits, and underscore, such as "!", "?", white-space etc.):

x = re.findall(r"\W", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

#### \Z
Returns a match if the specified characters are at the end of the string

In [31]:
txt = "The rain in Spain"

#Check if the string ends with "Spain":

x = re.findall(r"Spain\Z", txt)
print(x)
if x:
    print("Yes, there is a match!")
else:
    print("No match")

In [32]:
txt = "The rain in Spainish city"

#Check if the string ends with "Spain":

x = re.findall(r"Spain\Z", txt)
print(x)
if x:
    print("Yes, there is a match!")
else:
    print("No match")

[<p style="text-align:right">[Back to Current Topic]</p>](#Special-Sequences)
[<p style="text-align:right">[Back to top]</p>](#re-—-Regular-expression-operations)

### re Functions
- [re.compile](#re.compile)
- [re.search](#re.search)
- [re.match](#re.match)
- [re.fullmatch](#re.fullmatch)
- [re.split](#re.split)
- [re.findall](#re.findall)
- [re.finditer](#re.finditer)
- [re.sub](#re.sub)
- [re.subn](#re.subn)
- [re.escape](#re.escape)
- [re.purge](#re.purge)

#### re.compile
Regular expressions are compiled into pattern objects, which have methods for various operations such as searching for pattern matches or performing string substitutions. 
<p style="background-color:yellow">SYNTAX - re.compile(pattern, flags=0)</p>

In [33]:
a_to_e = re.compile('[a-e]')
print(a_to_e)

In [34]:
digits = re.compile(r'\d')
print(digits)
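
Compiling pays off once the pattern object is reused. The following is a minimal sketch, with illustrative names (`number_run`, `samples`) that are not part of the original notebook; it compiles a pattern once and applies it to several strings.

```python
import re

# Compile once, reuse many times (names here are illustrative only).
number_run = re.compile(r"\d+")
samples = ["room 101", "no numbers", "call 555 0199"]

# Each call reuses the same compiled pattern object.
found = [number_run.findall(s) for s in samples]
print(found)
```

The compiled object exposes the same operations as the module-level functions, minus the pattern argument.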

#### re.search
Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding match object. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.
<p style="background-color:yellow">SYNTAX - re.search(pattern, string, flags=0)</p>

In [35]:
s = 'I am a software developer'
match = re.search(r'soft', s)
print(match)
print('Start Index:', match.start())
print('End Index:', match.end())

In [36]:
regex = r"([a-zA-Z]+) (\d+)"

match = re.search(regex, "I was born on March 14")
if match is not None:
    print("match found: ", match)
    print("Match at index %s, %s" % (match.start(), match.end()))
    print("Full match: %s" % (match.group(0)))
    print("Month: %s" % (match.group(1)))
    print("Day: %s" % (match.group(2)))
else:
    print("The regex pattern does not match.")

In [37]:
s = 'I am a software developer'
match = re.search(r'hardware', s)
print(match)

#### re.match
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object. Return None if the string does not match the pattern; note that this is different from a zero-length match.
<br>
Note that even in MULTILINE mode, re.match() will only match at the beginning of the string and not at the beginning of each line.
<br>
If you want to locate a match anywhere in string, use search() instead.
<p style="background-color:yellow">SYNTAX - re.match(pattern, string, flags=0)</p>

In [38]:
regex = r"([a-zA-Z]+) (\d+)"
match = re.match(regex, "I was born on March 14")
print(match)

In [39]:
regex = r"[a-zA-Z]+\s\d+"
match = re.match(regex, "March 14 is my birth date")
print(match)

#### re.fullmatch
If the whole string matches the regular expression pattern, return a corresponding match object. Return None if the string does not match the pattern; note that this is different from a zero-length match.
<p style="background-color:yellow">SYNTAX - re.fullmatch(pattern, string, flags=0)</p>

In [40]:
string = 'Python'
pattern = 'P[h-y]{3}on'
print(re.fullmatch(pattern, string))

#### re.split
Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list. If maxsplit is nonzero, at most maxsplit splits occur, and the remainder of the string is returned as the final element of the list.
<p style="background-color:yellow">SYNTAX - re.split(pattern, string, maxsplit=0, flags=0)</p>

In [41]:
re.split(r'\W+', 'Words, words,,,,, words.')

In [42]:
re.split(r'(\W+)', 'Words, words, words.')

In [43]:
re.split(r'\W+', 'Words, words, words.', maxsplit=1)

In [44]:
re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE)

#### re.findall
Return all non-overlapping matches of pattern in string, as a list of strings or tuples. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result.
<p style="background-color:yellow">SYNTAX - re.findall(pattern, string, flags=0)</p>

In [45]:
string = "The quick brown fox jumps over the lazy dog"
pattern = "[aeiou]"
re.findall(pattern, string)

In [46]:
re.findall("ai", "The rain in Spain")

In [47]:
# words containing "in"
txt = "The raining in Spain"
pattern = r"[a-zA-Z]*in[a-z]*"
re.findall(pattern, txt)

In [48]:
print(re.findall(r'\w+=\d+', 'set width=20 and height=10'))
print(re.findall(r'(\w+)=(\d+)', 'set width=20 and height=10'))

#### re.finditer
Return an iterator yielding match objects over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result.
<p style="background-color:yellow">SYNTAX - re.finditer(pattern, string, flags=0)</p>

In [49]:
s1 = 'Blue Berries Blue Berries Blue Berries'
pattern = 'Blue'
match = re.finditer(pattern, s1)
print("Iterator object generated: ", match)
for i in match:
    s = i.start()
    e = i.end()
    print('String match "%s" at %d:%d' % (s1[s:e], s, e))

In [50]:
s1 = 'Blue Berries Blue Berries Blue Berries'
pattern = 'Blue'
match = re.findall(pattern, s1)
print("List generated: ", match)

#### re.sub
Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. repl can be a string or a function; if it is a string, any backslash escapes in it are processed. That is, \n is converted to a single newline character, \r is converted to a carriage return, and so forth.
<p style="background-color:yellow">SYNTAX - re.sub(pattern, repl, string, count=0, flags=0)</p>

In [51]:
re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
       r'static PyObject*\npy_\1(void)\n{',
       'def myfunc():')

In [52]:
re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam', flags=re.IGNORECASE)

In [53]:
def dashrepl(matchobj):
    # a single dash becomes a space; a double dash becomes a single dash
    if matchobj.group(0) == '-':
        return ' '
    return '-'
    
re.sub('-{1,2}', dashrepl, 'pro----gram-files')

#### re.subn
Perform the same operation as sub(), but return a tuple (new_string, number_of_subs_made).
<p style="background-color:yellow">SYNTAX - re.subn(pattern, repl, string, count=0, flags=0)</p>

In [54]:
re.subn(r'\sAND\s', ' & ', 'Baked Beans And Spam', flags=re.IGNORECASE)

#### re.escape
Escape special characters in pattern. This is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
<p style="background-color:yellow">SYNTAX - re.escape(pattern)</p>

In [55]:
re.escape('https://www.python.org')

In [56]:
re.escape("!#$%&'*+-.^_`|~:")

In [57]:
re.escape("I Asked what is this [a-9], he said \t ^WoW")
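
The reason for escaping shows up when literal text is used as a pattern. The sketch below uses made-up sample strings (`user_text`, `text`) and compares searching with and without re.escape.

```python
import re

# Hypothetical literal text that happens to contain a metacharacter (+).
user_text = "1+1=2"
text = "1+1=2 but 11=2"

unsafe = re.findall(user_text, text)            # "+" is read as a quantifier
safe = re.findall(re.escape(user_text), text)   # "+" is matched literally
print(unsafe)
print(safe)
```

Without escaping, the pattern means "one or more 1s, then 1=2", so it finds a different substring than the one intended.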

#### re.purge
Clear the regular expression cache.
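
A minimal sketch of calling it, assuming nothing beyond the standard library: the module-level functions cache compiled patterns internally, and re.purge() empties that cache without affecting later calls.

```python
import re

re.search(r"\d+", "abc 123")            # compiles the pattern and caches it
re.purge()                              # clears the cache; returns None
result = re.search(r"\d+", "abc 123")   # still works; the pattern is recompiled
print(result.group())
```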

[<p style="text-align:right">[Back to Current Topic]</p>](#re-Functions)
[<p style="text-align:right">[Back to top]</p>](#re-—-Regular-expression-operations)

### re Objects
- [Pattern.search](#Pattern.search)
- [Pattern.match](#Pattern.match)
- [Pattern.fullmatch](#Pattern.fullmatch)
- [Pattern.split](#Pattern.split)
- [Pattern.findall](#Pattern.findall)
- [Pattern.finditer](#Pattern.finditer)
- [Pattern.sub](#Pattern.sub)
- [Pattern.subn](#Pattern.subn)
- [Pattern.flags](#Pattern.flags)
- [Pattern.groups](#Pattern.groups)
- [Pattern.groupindex](#Pattern.groupindex)
- [Pattern.pattern](#Pattern.pattern)

#### Pattern.search
<p style="background-color:yellow">SYNTAX - Pattern.search(string[, pos[, endpos]])</p>
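
Pattern.search works like re.search on a compiled pattern and additionally accepts optional pos and endpos arguments that limit the region being scanned. A small sketch, with illustrative variable names:

```python
import re

pattern = re.compile(r"o")
text = "hello world"

first = pattern.search(text)          # scans the whole string
later = pattern.search(text, 5)       # starts scanning at index 5
nothing = pattern.search(text, 5, 7)  # only considers text[5:7], i.e. " w"
print(first.start(), later.start(), nothing)
```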

#### Pattern.match
<p style="background-color:yellow">SYNTAX - Pattern.match(string[, pos[, endpos]])</p>
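
Pattern.match only succeeds when the pattern matches at the position where matching starts, which is index 0 unless pos is given. A short sketch with illustrative names:

```python
import re

pattern = re.compile(r"o")

at_start = pattern.match("dog")    # None: "o" is not at index 0
at_one = pattern.match("dog", 1)   # matches: matching starts at index 1
print(at_start, at_one.span())
```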

#### Pattern.fullmatch
<p style="background-color:yellow">SYNTAX - Pattern.fullmatch(string[, pos[, endpos]])</p>

#### Pattern.split
<p style="background-color:yellow">SYNTAX - Pattern.split(string, maxsplit=0)</p>
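
Pattern.split behaves like re.split with the pattern already supplied. The sketch below assumes a hypothetical separator pattern (commas or semicolons with optional surrounding whitespace) to show the effect of maxsplit.

```python
import re

# Hypothetical separator: a comma or semicolon with optional whitespace.
separator = re.compile(r"\s*[,;]\s*")
line = "a, b;c ,d"

fields = separator.split(line)                   # split at every separator
first_split = separator.split(line, maxsplit=1)  # split only once
print(fields)
print(first_split)
```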

#### Pattern.findall
<p style="background-color:yellow">SYNTAX - Pattern.findall(string[, pos[, endpos]])</p>

#### Pattern.finditer
<p style="background-color:yellow">SYNTAX - Pattern.finditer(string[, pos[, endpos]])</p>
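
Pattern.findall returns the matched strings, while Pattern.finditer yields match objects that also carry positions. A sketch comparing the two, using an illustrative pattern for words ending in "ing":

```python
import re

# Hypothetical pattern: whole words ending in "ing".
ing_words = re.compile(r"\b\w+ing\b")
text = "singing and dancing while walking"

words = ing_words.findall(text)                       # list of strings
spans = [m.span() for m in ing_words.finditer(text)]  # list of (start, end)
print(words)
print(spans)
```

Slicing the original string with each span gives back the same words, which is handy when the position matters as much as the text.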

#### Pattern.sub
<p style="background-color:yellow">SYNTAX - Pattern.sub(repl, string, count=0)</p>
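
Pattern.sub and Pattern.subn mirror re.sub and re.subn. The following sketch, with illustrative names, collapses runs of whitespace and shows how count limits the number of replacements.

```python
import re

whitespace = re.compile(r"\s+")
text = "too    many   spaces  here"

collapsed = whitespace.sub(" ", text)             # replace every run
first_only = whitespace.sub(" ", text, count=1)   # replace only the first run
with_count = whitespace.subn(" ", text)           # (new_string, number_of_subs)
print(collapsed)
print(first_only)
print(with_count)
```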

#### Pattern.subn
<p style="background-color:yellow">SYNTAX - Pattern.subn(repl, string, count=0)</p>

#### Pattern.flags
<p style="background-color:yellow">SYNTAX - Pattern.flags</p>

#### Pattern.groups
<p style="background-color:yellow">SYNTAX - Pattern.groups</p>

#### Pattern.groupindex
<p style="background-color:yellow">SYNTAX - Pattern.groupindex</p>

#### Pattern.pattern
<p style="background-color:yellow">SYNTAX - Pattern.pattern</p>
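
The four attributes Pattern.flags, Pattern.groups, Pattern.groupindex and Pattern.pattern describe a compiled pattern. A small sketch, using a made-up key=value pattern, prints each of them:

```python
import re

# Illustrative pattern with one named group and one unnamed group.
pattern = re.compile(r"(?P<key>\w+)=(\d+)", re.IGNORECASE)

source = pattern.pattern               # the original pattern string
group_count = pattern.groups           # number of capturing groups
names = dict(pattern.groupindex)       # group name -> group number
ignores_case = bool(pattern.flags & re.IGNORECASE)
print(source, group_count, names, ignores_case)
```

Pattern.flags also includes implicit flags such as re.UNICODE for str patterns, so testing a single flag with `&` is safer than comparing the whole value.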

[<p style="text-align:right">[Back to Current Topic]</p>](#re-Objects)
[<p style="text-align:right">[Back to top]</p>](#re-—-Regular-expression-operations)

### Match Objects
- [Match.expand](#Match.expand)
- [Match.group](#Match.group)
- [Match.groups](#Match.groups)
- [Match.groupdict](#Match.groupdict)
- [Match.start](#Match.start)
- [Match.end](#Match.end)
- [Match.span](#Match.span)
- [Match.pos](#Match.pos)
- [Match.endpos](#Match.endpos)
- [Match.lastindex](#Match.lastindex)
- [Match.lastgroup](#Match.lastgroup)
- [Match.re](#Match.re)
- [Match.string](#Match.string)

#### Match.expand
<p style="background-color:yellow">SYNTAX - Match.expand(template)</p>
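
Match.expand fills a template string with the groups of a match, using the same backreference syntax as the repl argument of sub(). A sketch with an illustrative name pattern:

```python
import re

m = re.search(r"(?P<first>\w+) (?P<last>\w+)", "Ada Lovelace")

# \g<last> refers to the named group, \1 to the first group by number.
reordered = m.expand(r"\g<last>, \1")
print(reordered)
```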

#### Match.group
<p style="background-color:yellow">SYNTAX - Match.group([group1, ...])</p>
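
Match.group returns the whole match when called without arguments, a single group when given one index or name, and a tuple when given several. A sketch with a made-up address string:

```python
import re

m = re.search(r"(\w+)@(\w+)\.com", "contact: user@example.com")

whole = m.group()       # same as m.group(0): the entire match
user = m.group(1)       # first capturing group
both = m.group(1, 2)    # several groups at once, returned as a tuple
domain = m[2]           # indexing is shorthand for m.group(2)
print(whole, user, both, domain)
```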

#### Match.groups
<p style="background-color:yellow">SYNTAX - Match.groups(default=None)</p>

#### Match.groupdict
<p style="background-color:yellow">SYNTAX - Match.groupdict(default=None)</p>
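
Match.groupdict returns the named groups as a dictionary, and Match.groups returns all groups as a tuple; in both, default replaces groups that did not take part in the match. A sketch using an illustrative date pattern with an optional day:

```python
import re

# The day part is optional, so its group may not participate in the match.
m = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})(?:-(?P<day>\d{2}))?", "2024-05")

as_tuple = m.groups()                  # unmatched groups become None
with_default = m.groups(default="01")  # or the supplied default
as_dict = m.groupdict()
print(as_tuple, with_default, as_dict)
```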

#### Match.start
<p style="background-color:yellow">SYNTAX - Match.start([group])</p>

#### Match.end
<p style="background-color:yellow">SYNTAX - Match.end([group])</p>

#### Match.span
<p style="background-color:yellow">SYNTAX - Match.span([group])</p>
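
Match.start, Match.end and Match.span report where the match, or one of its groups, sits in the searched string. A sketch with an illustrative input:

```python
import re

text = "range 10-25 units"
m = re.search(r"(\d+)-(\d+)", text)

whole_span = m.span()                 # (start, end) of the entire match
start, end = m.start(), m.end()       # the same two numbers, separately
second_span = m.span(2)               # position of the second group only
print(whole_span, start, end, second_span)
```

The end index is exclusive, so `text[start:end]` reproduces the matched text.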

#### Match.pos
<p style="background-color:yellow">SYNTAX - Match.pos</p>

#### Match.endpos
<p style="background-color:yellow">SYNTAX - Match.endpos</p>

#### Match.lastindex
<p style="background-color:yellow">SYNTAX - Match.lastindex</p>

#### Match.lastgroup
<p style="background-color:yellow">SYNTAX - Match.lastgroup</p>

#### Match.re
<p style="background-color:yellow">SYNTAX - Match.re</p>

#### Match.string
<p style="background-color:yellow">SYNTAX - Match.string</p>
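
The attributes Match.pos, Match.endpos, Match.lastindex, Match.lastgroup, Match.re and Match.string record how a match was produced. The sketch below uses a made-up pattern and input to show all of them on one match object.

```python
import re

pattern = re.compile(r"(?P<word>[a-z]+)(\d+)?")
m = pattern.search("..abc..", 1, 6)   # search only within indices 1 to 6

print(m.pos, m.endpos)    # the pos and endpos values passed to search()
print(m.lastindex)        # index of the last group that matched
print(m.lastgroup)        # name of that group, if it has one
print(m.re is pattern)    # the pattern object that produced the match
print(m.string)           # the full string that was searched
```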

[<p style="text-align:right">[Back to Current Topic]</p>](#Match-Objects)
[<p style="text-align:right">[Back to top]</p>](#re-—-Regular-expression-operations)

### re Flags
- [re.A or re.ASCII](#re.A-or-re.ASCII)
- [re.DEBUG](#re.DEBUG)
- [re.I or re.IGNORECASE](#re.I-or-re.IGNORECASE)
- [re.M or re.MULTILINE](#re.M-or-re.MULTILINE)
- [re.NOFLAG](#re.NOFLAG)
- [re.S or re.DOTALL](#re.S-or-re.DOTALL)
- [re.X or re.VERBOSE](#re.X-or-re.VERBOSE)

#### re.A or re.ASCII
Make \w, \W, \b, \B, \d, \D, \s and \S perform ASCII-only matching instead of full Unicode matching.

In [58]:
# string with ASCII and Unicode characters
txt = "虎太郎 and Jessa are friends"

# Without re.A or re.ASCII
# Match all 3-letter words
result = re.findall(r"\b\w{3}\b", txt)
print(result)
# Output ['虎太郎', 'and', 'are']

# With re.A or re.ASCII
# regex to match only 3-letter ASCII words
result = re.findall(r"\b\w{3}\b", txt, re.A)
print(result)
# Output ['and', 'are']

In [59]:
txt = '\u0967\u096a\u096c'
print(txt)
print(re.search(r'\w+', txt))
print(re.search(r'\w+', txt, re.UNICODE))
print(re.search(r'\w+', txt, re.A))

#### re.DEBUG
Display debug information about compiled expression. No corresponding inline flag.

In [60]:
re.search('foo.bar', 'fooxbar', re.DEBUG)

In [61]:
txt = 'fooxbar'
re.search('[a-z]{3}.[a-z]{3}', txt, re.DEBUG)

#### re.I or re.IGNORECASE
Perform case-insensitive matching; expressions like [A-Z] will also match lowercase letters.

In [62]:
txt = "KELLy is a Python developer at a PYnative. kelly loves ML and AI"

# Without using re.I
result = re.findall(r"kelly", txt)
print(result)

# with re.I
result = re.findall(r"kelly", txt, re.I)
print(result)

# with re.IGNORECASE
result = re.findall(r"kelly", txt, re.IGNORECASE)
print(result)

#### re.M or re.MULTILINE
When specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character '\\$' matches at the end of the string and at the end of each line (immediately preceding each newline). By default, '^' matches only at the beginning of the string, and '\\$' only at the end of the string and immediately before the newline (if any) at the end of the string.

In [63]:
target_str = "Joy lucky number is 75\nTom lucky number is 25"

# find a 3-letter word at the start of each line
# Without re.M or re.MULTILINE flag
result = re.findall(r"^\w{3}", target_str)
print(result)
# Output ['Joy']

# find a 2-digit number at the end of each line
# Without re.M or re.MULTILINE flag
result = re.findall(r"\d{2}$", target_str)
print(result)
# Output ['25']

# With re.M or re.MULTILINE
# find a 3-letter word at the start of each line
result = re.findall(r"^\w{3}", target_str, re.MULTILINE)
print(result)
# Output ['Joy', 'Tom']

# With re.M
# find a 2-digit number at the end of each line
result = re.findall(r"\d{2}$", target_str, re.M)
print(result)
# Output ['75', '25']

#### re.NOFLAG
Indicates no flag being applied, the value is 0. This flag may be used as a default value for a function keyword argument or as a base value that will be conditionally ORed with other flags.
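
The "base value" use can be sketched as follows. The helper `find_word` is hypothetical, and because re.NOFLAG only exists from Python 3.11, the sketch assumes a fallback to 0 on older versions.

```python
import re

# re.NOFLAG was added in Python 3.11; fall back to 0 on older versions.
NOFLAG = getattr(re, "NOFLAG", 0)

def find_word(word, text, ignore_case=False):
    """Hypothetical helper: find a literal word, optionally ignoring case."""
    flags = NOFLAG                  # start with no flags
    if ignore_case:
        flags |= re.IGNORECASE      # conditionally OR in another flag
    return re.findall(re.escape(word), text, flags)

exact = find_word("spam", "Spam spam SPAM")
any_case = find_word("spam", "Spam spam SPAM", ignore_case=True)
print(exact)
print(any_case)
```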

#### re.S or re.DOTALL
Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.

In [64]:
# string with newline character
target_str = "ML\nand AI"

# Match any character
result = re.search(r".+", target_str)
print("Without using re.S flag:", result.group())
# Output 'ML'

# With re.S flag
result = re.search(r".+", target_str, re.S)
print("With re.S flag:", result.group())
# Output 'ML\nand AI'

# With re.DOTALL flag
result = re.search(r".+", target_str, re.DOTALL)
print("With re.DOTALL flag:", result.group())
# Output 'ML\nand AI'

#### re.X or re.VERBOSE
This flag allows you to write regular expressions that look nicer and are more readable by allowing you to visually separate logical sections of the pattern and add comments.

In [65]:
target_str = "Jessa is a Python developer, and her salary is 8000"

# re.X to add indentation and comments in regex
result = re.search(r"""(^\w{2,}) # match a word of 2 or more characters at the start
                        .+(\d{4}$) # match 4-digit number at the end """, target_str, re.X)
# Word at the start of the string
print(result.group(1))
# Output 'Jessa'

# 4-digit number
print(result.group(2))
# Output 8000

[<p style="text-align:right">[Back to Current Topic]</p>](#re-Flags)
[<p style="text-align:right">[Back to top]</p>](#re-—-Regular-expression-operations)