# 1. What is the name of the feature responsible for generating Regex objects?

The re.compile() function returns Regex objects. In other words, we can compile a regular expression into a Regex object once and then reuse it to look for occurrences of the same pattern in many target strings without rewriting it.
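As a minimal sketch of that reuse, the snippet below compiles one pattern and searches several strings with it. The name `phone_regex` and the sample strings are made up for illustration.

```python
import re

# Compile the pattern once (hypothetical 7-digit phone format).
phone_regex = re.compile(r'\d{3}-\d{4}')

# Reuse the same Regex object on several made-up target strings.
targets = ['Call 555-1234 today', 'No number here', 'Fax: 555-9876']
found = []
for text in targets:
    match = phone_regex.search(text)
    found.append(match.group() if match else None)

print(found)  # ['555-1234', None, '555-9876']
```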

# 2. Why do raw strings often appear in Regex objects?

Regular expressions use the backslash character ('\') to indicate special forms (special sequences such as \d) or to allow special characters (metacharacters such as . or *) to be used without invoking their special meaning. This collides with Python’s usage of the same character for the same purpose in string literals. Hence, raw strings are used (e.g. r"\n") so that backslashes do not have to be escaped a second time.
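The difference can be seen directly by comparing string lengths. This is a small sketch; the variable names and the sample text are illustrative only.

```python
import re

plain = '\n'    # one character: a newline
raw = r'\n'     # two characters: a backslash and the letter n
print(len(plain), len(raw))  # 1 2

# Without a raw string, the backslash has to be doubled to reach the regex engine.
escaped = '\\d+'
raw_pattern = r'\d+'
print(escaped == raw_pattern)  # True

numbers = re.findall(raw_pattern, 'room 101, floor 7')
print(numbers)  # ['101', '7']
```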

# 3. What is the return value of the search() method?

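The search() method returns a Match object for the first place the pattern is found in the string, or None if the pattern is not found anywhere.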
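Because search() can return None, it is usually worth checking the result before calling methods on it. A short sketch with made-up input strings:

```python
import re

digits = re.compile(r'\d+')

hit = digits.search('order 66 shipped')   # pattern is present
miss = digits.search('no digits here')    # pattern is absent

print(miss)  # None
if hit is not None:
    print(hit.group(), hit.span())  # 66 (6, 8)
```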

# 4. From a Match object, how do you get the actual strings that match the pattern?

The group() method returns strings of the matched text.

# 5. In the regex created from r'(\d\d\d)-(\d\d\d-\d\d\d\d)', what does group zero cover? Group 2? Group 1?

In the regex r'(\d\d\d)-(\d\d\d-\d\d\d\d)', group zero covers the entire match, whereas the first group covers (\d\d\d) and the second group covers (\d\d\d-\d\d\d\d).

In [1]:
# Example Program
import re
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My number is 123-456-7891.')
print(mo.groups()) # Prints all groups in a tuple format
print(mo.group()) # Always returns the fully matched string 
print(mo.group(1)) # Returns the first group
print(mo.group(2)) # Returns the second group

('123', '456-7891')
123-456-7891
123
456-7891


# 6. In regular expression syntax, parentheses and periods have special meanings. How can you tell a regex that you want it to match literal parentheses and periods?

Escaping the characters with a backslash (\., \(, and \)) in the raw string passed to re.compile() makes the regex match literal period and parenthesis characters.

In [1]:
# Example Program
import re
phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My phone number is (123) 456-7891.')
print(mo.group())

(123) 456-7891


# 7. The findall() method returns a string list or a list of string tuples. What causes it to return one of the two options?

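If the regex has no groups, findall() returns a list of strings, one per match. If the regex has exactly one group, it returns a list of strings holding that group's text. If the regex has two or more groups, it returns a list of tuples of strings, one tuple per match.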

In [2]:
# Example Program
import re
phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.findall('My phone number is (123) 456-7891')
print(mo)

# Example Program
import re
phoneNumRegex = re.compile(r'\d{3}-\d{3}-\d{4}')
mo = phoneNumRegex.findall('My number is 123-456-7891.')
print(mo) # No groups in this regex, so a list of strings is printed

[('(123)', '456-7891')]
['123-456-7891']
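For completeness, here is a sketch of the single-group case, where findall() returns a list of plain strings holding only the grouped text. The name `area_regex` and the phone numbers are made up for illustration.

```python
import re

# Exactly one capturing group: the three digits inside the literal parentheses.
area_regex = re.compile(r'\((\d{3})\) \d{3}-\d{4}')

area_codes = area_regex.findall('Call (123) 456-7891 or (987) 654-3210')
print(area_codes)  # ['123', '987']
```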


# 8. In regular expressions, what does the | character mean?

The | character signifies “either, or”: the regex matches either the expression on its left or the expression on its right.
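A brief sketch of alternation, both at the top level and inside a group. The sample strings are illustrative only.

```python
import re

# Top-level alternation: whichever alternative occurs first in the text wins.
hero = re.compile(r'Batman|Tina Fey')
first = hero.search('Batman and Tina Fey').group()
print(first)  # Batman

# Alternation inside a group: a shared prefix followed by one of several suffixes.
bat = re.compile(r'Bat(man|mobile|copter)')
mo = bat.search('Batmobile lost a wheel')
print(mo.group(), mo.group(1))  # Batmobile mobile
```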

# 9. In regular expressions, what does the ? character stand for?

The ? character can either mean “match zero or one of the preceding group” or be used to signify nongreedy matching.
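Both uses of ? can be sketched side by side. The patterns and strings below are illustrative examples, not part of the original question.

```python
import re

# Use 1: the preceding group is optional (zero or one occurrence).
optional = re.compile(r'Bat(wo)?man')
without_wo = optional.search('Batman').group()
with_wo = optional.search('Batwoman').group()
print(without_wo, with_wo)  # Batman Batwoman

# Use 2: after a quantifier, ? switches it to nongreedy matching.
greedy = re.compile(r'(Ha){3,5}').search('HaHaHaHaHa').group()
lazy = re.compile(r'(Ha){3,5}?').search('HaHaHaHaHa').group()
print(greedy)  # HaHaHaHaHa
print(lazy)    # HaHaHa
```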

# 10. In regular expressions, what is the difference between the + and * characters?

In regular expressions, * matches zero or more occurrences of the preceding group, whereas + matches one or more occurrences of the preceding group.
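The difference only shows up when the group is absent, as this small sketch with made-up strings suggests:

```python
import re

star = re.compile(r'Bat(wo)*man')   # zero or more 'wo'
plus = re.compile(r'Bat(wo)+man')   # one or more 'wo'

results = {}
for text in ['Batman', 'Batwoman', 'Batwowowoman']:
    # Record (matched by *, matched by +) for each string.
    results[text] = (bool(star.search(text)), bool(plus.search(text)))
    print(text, results[text])
```

Only 'Batman', which contains no 'wo' at all, is accepted by * and rejected by +.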

# 11. What is the difference between {4} and {4,5} in regular expression?

The {4} matches exactly four instances of the preceding group. The {4,5} matches between four and five instances.
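A quick check of both quantifiers against 3 to 6 repetitions of a sample group. The anchors ^ and $ are added here so that a longer run cannot match by containing a shorter one; they are an assumption of this sketch, not part of the question.

```python
import re

exact = re.compile(r'^(Ha){4}$')     # exactly four repetitions
ranged = re.compile(r'^(Ha){4,5}$')  # four or five repetitions

counts = {}
for n in range(3, 7):
    text = 'Ha' * n
    counts[n] = (bool(exact.search(text)), bool(ranged.search(text)))

print(counts)
# {3: (False, False), 4: (True, True), 5: (False, True), 6: (False, False)}
```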

# 12. What do the \d, \w, and \s shorthand character classes signify in regular expressions?

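The \d, \w, and \s shorthand character classes match a single digit, a single word character (a letter, digit, or underscore), or a single whitespace character (such as a space, tab, or newline), respectively.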
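The lowercase shorthand classes can be tried on a short sample string (made up for illustration) to see which characters each one picks out:

```python
import re

sample = 'Room 42_b\tis open'

digits = re.findall(r'\d', sample)     # single digits
spaces = re.findall(r'\s', sample)     # single whitespace characters
words = re.findall(r'\w+', sample)     # runs of word characters

print(digits)  # ['4', '2']
print(spaces)  # [' ', '\t', ' ']
print(words)   # ['Room', '42_b', 'is', 'open']
```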

# 13. What do the \D, \W, and \S shorthand character classes signify in regular expressions?

The \D, \W, and \S shorthand character classes are the opposites of \d, \w, and \s: they match a single character that is not a digit, not a word character, or not a whitespace character, respectively.
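The uppercase classes \D, \W, and \S match whatever their lowercase counterparts do not. A short sketch on a made-up string:

```python
import re

sample = 'A1 b2!'

non_digits = re.findall(r'\D', sample)   # everything except 0-9
non_words = re.findall(r'\W', sample)    # everything except letters, digits, underscore
non_spaces = re.findall(r'\S', sample)   # everything except whitespace

print(non_digits)  # ['A', ' ', 'b', '!']
print(non_words)   # [' ', '!']
print(non_spaces)  # ['A', '1', 'b', '2', '!']
```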

# 14. What is the difference between the patterns .* and .*? in a regex?

.* is greedy: it matches the longest string that still lets the overall pattern match. .*? is nongreedy: it matches the shortest string that still lets the overall pattern match.
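The two modes give different results whenever the closing delimiter appears more than once. A minimal sketch with a made-up string:

```python
import re

text = '<To serve man> for dinner.>'

greedy = re.search(r'<.*>', text).group()    # runs to the last '>'
lazy = re.search(r'<.*?>', text).group()     # stops at the first '>'

print(greedy)  # <To serve man> for dinner.>
print(lazy)    # <To serve man>
```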

# 15. What is the syntax for matching both numbers and lowercase letters with a character class?

Either [0-9a-z] or [a-z0-9]
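As a quick check, the class can be combined with + to pull out runs of digits and lowercase letters. The name `lower_alnum` and the input are illustrative only.

```python
import re

lower_alnum = re.compile(r'[0-9a-z]+')

# Uppercase letters and spaces are not in the class, so they split the runs.
runs = lower_alnum.findall('abc123 XYZ def456')
print(runs)  # ['abc123', 'def456']
```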

# 16. What is the procedure for making a regular expression case-insensitive?

Pass re.IGNORECASE (or its short form, re.I) as the second argument to re.compile() to make the regular expression case-insensitive.
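A short sketch of the flag in use; the pattern and sample strings are made up for illustration.

```python
import re

robocop = re.compile(r'robocop', re.IGNORECASE)

samples = ['RoboCop is part man', 'ROBOCOP protects', 'why robocop?']
matches = [robocop.search(text).group() for text in samples]

# The matched text keeps the casing of the input, not of the pattern.
print(matches)  # ['RoboCop', 'ROBOCOP', 'robocop']
```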

# 17. What does the . character normally match? What does it match if re.DOTALL is passed as 2nd argument in re.compile()?

The . character normally matches any character except the newline character. If re.DOTALL is passed as the second argument to re.compile(), then the dot will also match newline characters.
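The effect of re.DOTALL is easiest to see on a string that contains a newline. A minimal sketch with a made-up two-line string:

```python
import re

text = 'Serve the public trust.\nProtect the innocent.'

# Default: the dot stops at the newline.
no_newline = re.compile(r'.*').search(text).group()

# With re.DOTALL: the dot also consumes the newline and everything after it.
with_newline = re.compile(r'.*', re.DOTALL).search(text).group()

print(repr(no_newline))    # 'Serve the public trust.'
print(repr(with_newline))  # 'Serve the public trust.\nProtect the innocent.'
```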

# 18. If numReg = re.compile(r'\d+'), what will numReg.sub('X', '11 drummers, 10 pipers, five rings, 4 hen') return?

The output will be 'X drummers, X pipers, five rings, X hen'. Each run of one or more digits is replaced by 'X'; 'five' is left alone because it contains no digits.

In [4]:
import re
numReg = re.compile(r'\d+')
numReg.sub('X', '11 drummers, 10 pipers, five rings, 4 hen')

'X drummers, X pipers, five rings, X hen'

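# 19. What does passing re.VERBOSE as the 2nd argument to re.compile() allow you to do?

re.VERBOSE lets you add whitespace and comments to the pattern string passed to re.compile(). Whitespace outside character classes and text following a # are ignored, which makes long patterns easier to read.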

In [5]:
# Without Using VERBOSE
regex_email = re.compile(r'^([a-z0-9_\.-]+)@([0-9a-z\.-]+)\.([a-z]{2,6})$', re.IGNORECASE)  # {2,6} must not contain a space
 
# Using VERBOSE
regex_email = re.compile(r"""
                            ^([a-z0-9_\.-]+)              # local Part like username
                            @                             # single @ sign 
                            ([0-9a-z\.-]+)                # Domain name like google
                            \.                            # single Dot .
                            ([a-z]{2,6})$                 # Top level Domain  like com/in/org
                         """,re.VERBOSE | re.IGNORECASE)   

# 20. How would you write a regex that matches a number with a comma for every three digits? It must match the following:

'42'

'1,234'

'6,368,745'

but not the following:

'12,34,567' (which has only two digits between the commas)

'1234' (which lacks commas)

In [6]:
import re
pattern = r'^\d{1,3}(,\d{3})*$'
pagex = re.compile(pattern)
for ele in ['42','1,234', '6,368,745','12,34,567','1234']:
    print('Output:',ele, '->', pagex.search(ele))

Output: 42 -> <re.Match object; span=(0, 2), match='42'>
Output: 1,234 -> <re.Match object; span=(0, 5), match='1,234'>
Output: 6,368,745 -> <re.Match object; span=(0, 9), match='6,368,745'>
Output: 12,34,567 -> None
Output: 1234 -> None


# 21. How would you write a regex that matches the full name of someone whose last name is Watanabe? You can assume that the first name that comes before it will always be one word that begins with a capital letter. The regex must match the following:

'Haruto Watanabe'

'Alice Watanabe'

'RoboCop Watanabe'

but not the following:

'haruto Watanabe' (where the first name is not capitalized)

'Mr. Watanabe' (where the preceding word has a nonletter character)

'Watanabe' (which has no first name)

'Haruto watanabe' (where Watanabe is not capitalized)

In [7]:
import re
# A capital letter followed by any letters, so a first name with an inner capital ('RoboCop') matches in full
pattern = r'[A-Z][a-zA-Z]*\sWatanabe'
namex = re.compile(pattern)
for name in ['Haruto Watanabe','Alice Watanabe','RoboCop Watanabe','haruto Watanabe','Mr. Watanabe','Watanabe','Haruto watanabe']:
    print('Output: ',name,'->',namex.search(name))

Output:  Haruto Watanabe -> <re.Match object; span=(0, 15), match='Haruto Watanabe'>
Output:  Alice Watanabe -> <re.Match object; span=(0, 14), match='Alice Watanabe'>
Output:  RoboCop Watanabe -> <re.Match object; span=(0, 16), match='RoboCop Watanabe'>
Output:  haruto Watanabe -> None
Output:  Mr. Watanabe -> None
Output:  Watanabe -> None
Output:  Haruto watanabe -> None


# 22. How would you write a regex that matches a sentence where the first word is either Alice, Bob, or Carol; the second word is either eats, pets, or throws; the third word is apples, cats, or baseballs; and the sentence ends with a period? This regex should be case-insensitive. It must match the following:

'Alice eats apples.'

'Bob pets cats.'

'Carol throws baseballs.'

'Alice throws Apples.'

'BOB EATS CATS.'

but not the following:

'RoboCop eats apples.'

'ALICE THROWS FOOTBALLS.'

'Carol eats 7 cats.'

In [8]:
import re
pattern = r'(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.'
casex = re.compile(pattern,re.IGNORECASE)
for ele in ['Alice eats apples.','Bob pets cats.','Carol throws baseballs.','Alice throws Apples.','BOB EATS CATS.','RoboCop eats apples.'
,'ALICE THROWS FOOTBALLS.','Carol eats 7 cats.']:
    print('Output: ',ele,'->',casex.search(ele))

Output:  Alice eats apples. -> <re.Match object; span=(0, 18), match='Alice eats apples.'>
Output:  Bob pets cats. -> <re.Match object; span=(0, 14), match='Bob pets cats.'>
Output:  Carol throws baseballs. -> <re.Match object; span=(0, 23), match='Carol throws baseballs.'>
Output:  Alice throws Apples. -> <re.Match object; span=(0, 20), match='Alice throws Apples.'>
Output:  BOB EATS CATS. -> <re.Match object; span=(0, 14), match='BOB EATS CATS.'>
Output:  RoboCop eats apples. -> None
Output:  ALICE THROWS FOOTBALLS. -> None
Output:  Carol eats 7 cats. -> None
