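# 1. What is the name of the feature responsible for generating Regex objects?

* The `re.compile()` function creates `Regex` objects. It takes a pattern string and returns a compiled pattern object (an instance of `re.Pattern`).
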
In [1]:
import re

pattern = re.compile(r'\d+')
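A compiled `Regex` object can be reused for many searches. This minimal sketch checks its type and calls two of its methods. The sample strings are made up for illustration.

```python
import re

# re.compile() turns the pattern string into a reusable Pattern object
pattern = re.compile(r'\d+')

print(isinstance(pattern, re.Pattern))           # True
print(pattern.findall('7 apples and 12 pears'))  # ['7', '12']
print(pattern.search('no digits here'))          # None
```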


# 2. Why do raw strings often appear in Regex objects?

* Regular expressions use the backslash heavily, for example in `\d`, `\w`, and `\b`. In a normal Python string literal the backslash is itself an escape character, so every regex backslash would have to be doubled (`\\d`). A raw string (`r'...'`) tells Python not to process escape sequences, so the backslashes reach the regex engine unchanged. This makes patterns more readable and less error-prone.
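To see why the `r` prefix matters, compare the word-boundary escape `\b` with and without it. In a normal string literal Python turns `\b` into a backspace character before the regex engine sees it, so the pattern no longer means "word boundary". The sample text is made up for illustration.

```python
import re

text = 'a cat sat'

# Normal string: '\b' becomes a backspace character (\x08), so nothing matches
plain = re.search('\bcat\b', text)

# Raw string: the backslash is kept, and \b means "word boundary" to re
raw = re.search(r'\bcat\b', text)

print(plain)        # None
print(raw.group())  # cat

# Without a raw string, every regex backslash has to be doubled
print('\\bcat\\b' == r'\bcat\b')  # True
```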


# 3. What is the return value of the search() method?

* The `search()` method scans a string for the first location where the regular expression matches. If it finds one, it returns a `Match` object containing information about the match, such as the matched text and its position in the string. If no match is found, `search()` returns `None`.
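A minimal sketch of both outcomes. Because `search()` can return `None`, the result is checked before its methods are called. The input strings are made up for illustration.

```python
import re

match = re.search(r'\d+', 'abc 42 def 7')
if match is not None:     # always check for None before using the result
    print(match.group())  # 42 (only the first occurrence)
    print(match.span())   # (4, 6)

print(re.search(r'\d+', 'no digits'))  # None
```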

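# 4. From a Match object, how do you get the actual strings that match the pattern?

* Call the `group()` method of the `Match` object. `group()` (or `group(0)`) returns the entire matched text, and `group(n)` returns the text captured by the nth set of parentheses.
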
In [2]:
import re

pattern = re.compile(r'(\d+)-(\d+)-(\d+)')
text = 'Date: 2023-05-11'

match = pattern.search(text)
if match:
    print('Match found:', match.group())
    print('Year:', match.group(1))
    print('Month:', match.group(2))
    print('Day:', match.group(3))


Match found: 2023-05-11
Year: 2023
Month: 05
Day: 11
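Besides `group()`, a `Match` object has `groups()`, which returns every captured group at once. This sketch also uses named groups (`(?P<name>...)`), a standard `re` feature; the group names `year`, `month`, and `day` are chosen here for illustration.

```python
import re

pattern = re.compile(r'(?P<year>\d+)-(?P<month>\d+)-(?P<day>\d+)')
match = pattern.search('Date: 2023-05-11')

print(match.groups())        # ('2023', '05', '11'): all groups as a tuple
print(match.group('month'))  # 05: a named group looked up by name
print(match.groupdict())     # {'year': '2023', 'month': '05', 'day': '11'}
```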


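# 5. In the regex created from `r'(\d\d\d)-(\d\d\d-\d\d\d\d)'`, what does group zero cover? Group 1? Group 2?

* Group 0 covers the entire match. Group 1 covers the first set of parentheses (the first three digits), and group 2 covers the second set of parentheses (the `\d\d\d-\d\d\d\d` part, including its hyphen).
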
In [3]:
import re

pattern = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
text = 'My phone number is 123-456-7890.'

match = pattern.search(text)
if match:
    print('Match found:', match.group())
    print('Group 1:', match.group(1))
    print('Group 2:', match.group(2))


Match found: 123-456-7890
Group 1: 123
Group 2: 456-7890
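A short sketch, reusing the pattern and text from the cell above, that confirms `group(0)` is the whole match and shows where each group sits in the string via `span()`.

```python
import re

pattern = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
match = pattern.search('My phone number is 123-456-7890.')

# group(0) is the same as group(): the whole matched text
print(match.group(0) == match.group())  # True
# Several group numbers at once return a tuple
print(match.group(1, 2))                # ('123', '456-7890')
# span(n) gives the start and end index of group n in the string
print(match.span(0), match.span(1), match.span(2))  # (19, 31) (19, 22) (23, 31)
```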


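# 6. In regular expression syntax, parentheses and periods have special meanings. How can you tell a regex that you want it to match literal parentheses and periods?

* Escape them with a backslash: `\(`, `\)`, and `\.` match a literal opening parenthesis, closing parenthesis, and period.
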
In [4]:
import re

text = '(some.text)'
pattern = re.compile(r'\(some\.text\)') # escape parentheses and period with backslash

match = pattern.search(text)
if match:
    print('Match found:', match.group())


Match found: (some.text)
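The sketch below shows what goes wrong without the escape, and uses `re.escape()`, which backslash-escapes the regex metacharacters in a literal string (in Python 3.7 and later it escapes only special characters). The strings are made up for illustration.

```python
import re

# An unescaped '.' is a wildcard, so it also accepts other characters
print(re.fullmatch(r'some.text', 'someXtext'))   # a Match object
print(re.fullmatch(r'some\.text', 'someXtext'))  # None

# re.escape() escapes every regex metacharacter in a literal string
literal = re.escape('(some.text)')
print(literal)                                       # \(some\.text\)
print(re.fullmatch(literal, '(some.text)').group())  # (some.text)
```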


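# 7. The findall() method returns a list of strings or a list of tuples of strings. What causes it to return one or the other?

* It depends on the groups in the pattern. If the regex has no groups, `findall()` returns a list of the full matched strings; with exactly one group, it returns a list of the strings captured by that group. If the regex has two or more groups, it returns a list of tuples, with one string per group.
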
In [5]:
import re

text = 'The quick brown fox jumps over the lazy dog'
pattern = re.compile(r'\w+')
matches = pattern.findall(text)
print(matches)


['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']


In [6]:
import re

text = 'John Smith, 30 years old'
pattern = re.compile(r'(\w+)\s+(\w+),\s+(\d+)\s+years\s+old')
matches = pattern.findall(text)
print(matches)


[('John', 'Smith', '30')]
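The two cells above cover the no-group and multi-group cases. This sketch puts all three cases side by side on one made-up string, including the single-group case that still returns plain strings.

```python
import re

text = '555-1234, 555-9876'

# No groups: a list of the full matches
print(re.findall(r'\d{3}-\d{4}', text))      # ['555-1234', '555-9876']
# Exactly one group: a list of strings, each the text of that group
print(re.findall(r'\d{3}-(\d{4})', text))    # ['1234', '9876']
# Two or more groups: a list of tuples, one string per group
print(re.findall(r'(\d{3})-(\d{4})', text))  # [('555', '1234'), ('555', '9876')]
```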


# 8. In regular expressions, what does the | character mean?

* The `|` character (the pipe) means alternation, a logical OR. It separates alternative patterns, and the regex matches if any one of them matches. In Python's `re`, the alternatives are tried from left to right at each position in the string, and the first one that matches is used.
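A small sketch of alternation, with made-up sample strings. It shows the left-to-right preference and how parentheses restrict `|` to part of the pattern.

```python
import re

# At each position the alternatives are tried from left to right;
# search() returns the earliest match in the string
print(re.search(r'Batman|Tina Fey', 'Batman and Tina Fey').group())  # Batman
print(re.findall(r'Batman|Tina Fey', 'Batman and Tina Fey'))  # ['Batman', 'Tina Fey']

# Parentheses limit the alternation to part of the pattern
match = re.search(r'Bat(man|mobile|copter)', 'The Batmobile lost a wheel')
print(match.group())   # Batmobile
print(match.group(1))  # mobile
```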

# 9. In regular expressions, what does the . character stand for?

* In regular expressions, the . (dot) character is a wildcard that matches any single character except a newline.
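A quick illustration of the wildcard on a made-up sentence, plus the newline exception.

```python
import re

print(re.findall(r'.at', 'The cat in the hat sat on the flat mat.'))
# ['cat', 'hat', 'sat', 'lat', 'mat']

# '.' does not match a newline unless re.DOTALL is used
print(re.search(r'a.b', 'a\nb'))  # None
```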

# 10. In regular expressions, what is the difference between the + and * characters?

* Both + and * are quantifiers: they control how many times the preceding element may repeat.

* The + matches one or more occurrences of the preceding element, while the * matches zero or more occurrences, so * also succeeds when the element is absent.
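The difference shows up when the repeated element is missing. A minimal sketch on a made-up string:

```python
import re

text = 'a ab abb'

print(re.findall(r'ab*', text))  # ['a', 'ab', 'abb']  (* allows zero b's)
print(re.findall(r'ab+', text))  # ['ab', 'abb']       (+ needs at least one b)
```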

# 11. What is the difference between {4} and {4,5} in a regular expression?

* In regular expressions, {4} means match exactly four occurrences of the preceding pattern, while {4,5} means match at least four and at most five occurrences of the preceding pattern.
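A short sketch using `re.fullmatch()` so that the whole string must satisfy the repetition count. The digit strings are made up for illustration.

```python
import re

print(re.fullmatch(r'\d{4}', '2023'))     # a Match: exactly four digits
print(re.fullmatch(r'\d{4}', '12345'))    # None: five digits is too many
print(re.fullmatch(r'\d{4,5}', '12345'))  # a Match: four or five digits

# Ranges are greedy by default, so the longest allowed run is taken
print(re.search(r'\d{4,5}', '123456').group())  # 12345
```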

# 12. What do the \d, \w, and \s shorthand character classes signify in regular expressions?

* \d: Matches any digit character. For ASCII text this is equivalent to [0-9]; for Python str patterns it also matches other Unicode decimal digits unless the re.ASCII flag is used.
* \w: Matches any word character: letters, digits, and the underscore. With re.ASCII this is equivalent to [a-zA-Z0-9_]; otherwise it also matches non-ASCII letters.
* \s: Matches any whitespace character, including spaces, tabs, and line breaks.
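A sketch of the three classes on made-up strings, plus the effect of `re.ASCII` on `\w` for a word containing a non-ASCII letter.

```python
import re

print(re.findall(r'\d+', 'Call 555-0199 now'))  # ['555', '0199']
print(re.findall(r'\w+', 'hi_there, 42!'))      # ['hi_there', '42']
print(re.findall(r'\s', 'a b\tc\nd'))           # [' ', '\t', '\n']

# For str patterns \w is Unicode-aware; re.ASCII restricts it to [a-zA-Z0-9_]
print(re.findall(r'\w+', 'café'))            # ['café']
print(re.findall(r'\w+', 'café', re.ASCII))  # ['caf']
```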

# 13. What do the \D, \W, and \S shorthand character classes signify in regular expressions?

* \D matches any non-digit character.
* \W matches any non-word character. A word character includes letters, digits, and underscores (_).
* \S matches any non-whitespace character.
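Each uppercase class is the complement of its lowercase counterpart. A minimal sketch with made-up strings:

```python
import re

print(re.findall(r'\D+', 'abc123def'))   # ['abc', 'def']
print(re.findall(r'\W+', 'hi, there!'))  # [', ', '!']
print(re.findall(r'\S+', 'a b\tc'))      # ['a', 'b', 'c']
```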

# 14. What is the difference between .* and .*?

* Both combine the dot (.), which matches any character except a newline, with the asterisk (*), which repeats it zero or more times. `.*` is greedy: it matches the longest possible string that still lets the rest of the pattern succeed. Adding a question mark, `.*?`, makes it non-greedy (lazy): it matches the shortest possible string instead.
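The classic illustration is matching a tag in a small, made-up HTML string:

```python
import re

html = '<b>bold</b>'

# Greedy: .* runs as far as it can, so the match ends at the last '>'
print(re.search(r'<.*>', html).group())   # <b>bold</b>
# Non-greedy: .*? stops at the first '>' that lets the pattern succeed
print(re.search(r'<.*?>', html).group())  # <b>
```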

# 15. What is the syntax for matching both numbers and lowercase letters with a character class?

* The syntax for matching both numbers and lowercase letters with a character class in a regular expression is `[0-9a-z]`. The brackets denote a character class, and the dash `-` indicates a range of characters to match, in this case, from 0 to 9 and from a to z. By using this character class, the regular expression will match any single character that is either a lowercase letter or a digit.
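A quick check of the class on a made-up string. Uppercase letters fall outside it, and the order of the ranges inside the brackets does not matter.

```python
import re

print(re.findall(r'[0-9a-z]+', 'Abc123 XYZ9'))  # ['bc123', '9']
print(re.findall(r'[a-z0-9]+', 'Abc123 XYZ9'))  # ['bc123', '9'] (same class)
```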

# 16. How do you make a regular expression case-insensitive?

* Pass `re.IGNORECASE` (or its short form `re.I`) as the second argument to `re.compile()`.

In [7]:
import re

text = "Hello, world!"
pattern = re.compile(r"hello", re.IGNORECASE)
match = pattern.search(text)
print(match)


<re.Match object; span=(0, 5), match='Hello'>
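Two equivalent spellings of the same flag, shown on made-up strings: the `re.I` alias and the inline `(?i)` flag at the start of the pattern.

```python
import re

# re.I is a short alias for re.IGNORECASE
print(re.compile(r'hello', re.I).findall('Hello HELLO hello'))  # ['Hello', 'HELLO', 'hello']

# The inline flag (?i) at the start of the pattern has the same effect
print(re.search(r'(?i)hello', 'Say HELLO').group())  # HELLO
```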


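# 17. What does the . character normally match? What does it match if re.DOTALL is passed as the second argument to re.compile()?

* Normally `.` matches any character except a newline. With `re.DOTALL`, it matches every character, including newlines.
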
In [8]:
import re

pattern = re.compile(r'.+')  # '.' matches any character except a newline


In [9]:
import re

pattern = re.compile(r'.+', re.DOTALL)  # with re.DOTALL, '.' also matches '\n'
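The two cells above only compile the patterns. This sketch runs the same `.+` pattern on a made-up two-line string with and without `re.DOTALL` so the difference is visible.

```python
import re

text = 'line one\nline two'

# Default: '.' stops at the newline, so only the first line is matched
print(repr(re.search(r'.+', text).group()))             # 'line one'
# re.DOTALL: '.' also matches '\n', so the whole string is matched
print(repr(re.search(r'.+', text, re.DOTALL).group()))  # 'line one\nline two'
```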


# 18. If numRegex = re.compile(r'\d+'), what will numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4 hen') return?

* `sub()` replaces every match of `\d+` (one or more digits) in the input string with `'X'`, so the call returns `'X drummers, X pipers, five rings, X hen'`. The word `five` is left unchanged because it contains no digits.
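Running the call from the question confirms the result:

```python
import re

numRegex = re.compile(r'\d+')
result = numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4 hen')
print(result)  # X drummers, X pipers, five rings, X hen
```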


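# 19. What does passing re.VERBOSE as the second argument to re.compile() allow you to do?

* `re.VERBOSE` tells the regex engine to ignore whitespace and newlines in the pattern and to treat text after an unescaped `#` as a comment. This lets you spread a complex regex over several lines and annotate each part.
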
In [10]:
pattern = r'\d{3}-\d{2}-\d{4}'


In [11]:
import re

pattern = re.compile(r'''
    \d{3}   # match three digits
    -       # match a hyphen
    \d{2}   # match two digits
    -       # match a hyphen
    \d{4}   # match four digits
''', re.VERBOSE)  # re.VERBOSE ignores the whitespace and the # comments
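A sketch comparing the compact and verbose spellings of the same pattern on a made-up input. It also shows that the verbose string only works when `re.VERBOSE` is passed; without it, the spaces, newlines, and comments are treated as literal text.

```python
import re

compact = r'\d{3}-\d{2}-\d{4}'
verbose = r'''
    \d{3}   # three digits
    -       # a hyphen
    \d{2}   # two digits
    -       # a hyphen
    \d{4}   # four digits
'''

print(re.fullmatch(compact, '123-45-6789') is not None)              # True
print(re.fullmatch(verbose, '123-45-6789', re.VERBOSE) is not None)  # True
# Without re.VERBOSE the whitespace and comments must appear literally
print(re.fullmatch(verbose, '123-45-6789'))                          # None
```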


# 20. How would you write a regex that matches a number with commas for every three digits? It must match the following:
* '42'
* '1,234'
* '6,368,745'

but not the following:
* '12,34,567' (which has only two digits between the commas)
* '1234' (which lacks commas)

The pattern `^\d{1,3}(,\d{3})*$` will match:

* Any number of one to three digits without commas (e.g. '0', '42', '999')
* Any number that starts with one to three digits followed by groups of exactly three digits, each preceded by a comma (e.g. '1,000', '123,456,789')

It will not match:

* Any number with fewer than three digits between the commas (e.g. '12,34,567')
* Any number of four or more digits without commas (e.g. '1234')

In [13]:
import re

pattern = re.compile(r'^\d{1,3}(,\d{3})*$')

print(pattern.match('42'))        # Matches
print(pattern.match('1,234'))     # Matches
print(pattern.match('6,368,745')) # Matches
print(pattern.match('12,34,567')) # Does not match
print(pattern.match('1234'))      # Does not match


<re.Match object; span=(0, 2), match='42'>
<re.Match object; span=(0, 5), match='1,234'>
<re.Match object; span=(0, 9), match='6,368,745'>
None
None
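One subtlety worth knowing: in Python, `$` also matches just before a trailing newline, so the anchored pattern above accepts `'1,234\n'`. If that matters, `re.fullmatch()` requires the entire string to match. A minimal sketch with made-up inputs:

```python
import re

pattern = re.compile(r'^\d{1,3}(,\d{3})*$')

# '$' also matches just before a trailing newline, so this is accepted
print(pattern.match('1,234\n'))  # a Match for '1,234'
# fullmatch() requires the whole string to match, newline included
print(re.fullmatch(r'\d{1,3}(,\d{3})*', '1,234\n'))                # None
print(re.fullmatch(r'\d{1,3}(,\d{3})*', '6,368,745') is not None)  # True
```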


# 21. How would you write a regex that matches the full name of someone whose last name is Watanabe? You can assume that the first name that comes before it will always be one word that begins with a capital letter. The regex must match the following:
* 'Haruto Watanabe'
* 'Alice Watanabe'
* 'RoboCop Watanabe'

but not the following:
* 'haruto Watanabe' (where the first name is not capitalized)
* 'Mr. Watanabe' (where the preceding word has a nonletter character)
* 'Watanabe' (which has no first name)
* 'Haruto watanabe' (where Watanabe is not capitalized)

In [14]:
import re

name_regex = re.compile(r'[A-Z][a-zA-Z]* Watanabe')  # capitalized first name, a space, then "Watanabe"
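A quick check of the pattern against the examples from the question. Using `fullmatch()` rather than `search()` is a choice made in this sketch so that extra text before the first name (for example a made-up `'xHaruto Watanabe'`) is also rejected.

```python
import re

name_regex = re.compile(r'[A-Z][a-zA-Z]* Watanabe')

should_match = ['Haruto Watanabe', 'Alice Watanabe', 'RoboCop Watanabe']
should_not_match = ['haruto Watanabe', 'Mr. Watanabe', 'Watanabe', 'Haruto watanabe']

# fullmatch() makes the pattern cover the entire string
for name in should_match:
    print(name, '->', bool(name_regex.fullmatch(name)))  # True for each
for name in should_not_match:
    print(name, '->', bool(name_regex.fullmatch(name)))  # False for each
```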


# 22. How would you write a regex that matches a sentence where the first word is either Alice, Bob, or Carol; the second word is either eats, pets, or throws; the third word is apples, cats, or baseballs; and the sentence ends with a period? This regex should be case-insensitive. It must match the following:
* 'Alice eats apples.'
* 'Bob pets cats.'
* 'Carol throws baseballs.'
* 'Alice throws Apples.'
* 'BOB EATS CATS.'

but not the following:
* 'RoboCop eats apples.'
* 'ALICE THROWS FOOTBALLS.'
* 'Carol eats 7 cats.'

`^(Alice|Bob|Carol)\s+(eats|pets|throws)\s+(apples|cats|baseballs)\.$`


* ^ matches the start of the string
* (Alice|Bob|Carol) matches any of the names Alice, Bob, or Carol
* \s+ matches one or more whitespace characters
* (eats|pets|throws) matches any of the verbs eats, pets, or throws
* (apples|cats|baseballs) matches any of the objects apples, cats, or baseballs
* \. matches a period
* $ matches the end of the string
* Python has no trailing i flag; to make the regex case-insensitive, pass re.IGNORECASE as the second argument to re.compile() (or start the pattern with (?i))
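Putting the pieces together in Python, the pattern can be compiled with `re.IGNORECASE` and checked against every example sentence from the question:

```python
import re

sentence_regex = re.compile(
    r'^(Alice|Bob|Carol)\s+(eats|pets|throws)\s+(apples|cats|baseballs)\.$',
    re.IGNORECASE,  # Python's way of making the whole pattern case-insensitive
)

should_match = ['Alice eats apples.', 'Bob pets cats.', 'Carol throws baseballs.',
                'Alice throws Apples.', 'BOB EATS CATS.']
should_not_match = ['RoboCop eats apples.', 'ALICE THROWS FOOTBALLS.', 'Carol eats 7 cats.']

print(all(sentence_regex.search(s) for s in should_match))          # True
print(not any(sentence_regex.search(s) for s in should_not_match))  # True
```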



