# Q1. What is the benefit of regular expressions?

A1. Regular expressions provide a powerful way to search for and manipulate patterns in text. They can be used for tasks such as data validation, text parsing and manipulation, and search and replace operations. A single pattern can often replace many lines of hand-written string-scanning code, which makes the code more concise and easier to adapt when the text format changes.

In [1]:
import re

# search for a pattern in a string
string = "The quick brown fox jumps over the lazy dog"
pattern = "fox"
result = re.search(pattern, string)
print(result.group()) # "fox"


fox
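
The validation and search-and-replace uses mentioned above can be sketched briefly. This is only an illustration: the product-code format (three letters, a hyphen, four digits) and the sample strings are assumptions made up for the example, not part of the original question.

```python
import re

# Validation: hypothetical rule, a code is three letters, a hyphen, four digits.
code_pattern = r"[A-Za-z]{3}-\d{4}"
is_valid = re.fullmatch(code_pattern, "abc-1234") is not None
is_invalid = re.fullmatch(code_pattern, "abc-12345") is not None

# Search and replace: collapse each run of whitespace into a single space.
cleaned = re.sub(r"\s+", " ", "The   quick \t brown   fox")

print(is_valid, is_invalid)  # True False
print(cleaned)               # The quick brown fox
```

`re.fullmatch` is used for validation because it succeeds only when the whole string fits the pattern, whereas `re.search` would also accept a valid code embedded in a longer string.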


# Q2. Describe the difference between the effects of "(ab)c+" and "a(bc)+." Which of these, if any, is the unqualified pattern "abc+"?

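A2. The "+" quantifier applies only to the item immediately before it. In "(ab)c+" that item is the single character "c", so the pattern matches "ab" followed by one or more "c" characters ("abc", "abcc", "abccc", ...) and captures "ab" as a group. In "a(bc)+" the item is the group "(bc)", so the pattern matches "a" followed by one or more repetitions of "bc" ("abc", "abcbc", "abcbcbc", ...) and captures the last repetition of "bc". The unqualified pattern "abc+" matches the same strings as "(ab)c+", because "+" again applies only to "c"; the only difference is that it has no capturing group.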

In [2]:
import re

# match "(ab)c+"
pattern1 = "(ab)c+"
string1 = "abcccabc"
result1 = re.findall(pattern1, string1)
print(result1) # ["ab", "ab"]: findall returns the captured group, not the whole match

# match "a(bc)+"
pattern2 = "a(bc)+"
string2 = "abcbcbc"
result2 = re.findall(pattern2, string2)
print(result2) # ["bc"]: one match ("abcbcbc"); the group holds its last repetition

# match "abc+"
pattern3 = "abc+"
string3 = "abcccabc"
result3 = re.findall(pattern3, string3)
print(result3) # ["abccc", "abc"]: no group, so findall returns the whole matches


['ab', 'ab']
['bc']
['abccc', 'abc']
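
Because `re.findall` returns captured groups rather than whole matches, the output above can obscure which strings each pattern accepts. A small sketch with `re.fullmatch` makes the comparison direct; the list of candidate strings is chosen only for illustration.

```python
import re

candidates = ["abc", "abccc", "abcbc", "abcabc"]
patterns = [r"(ab)c+", r"a(bc)+", r"abc+"]

# For each pattern, keep the candidates that it matches in their entirety.
full_matches = {
    p: [s for s in candidates if re.fullmatch(p, s)]
    for p in patterns
}

for p, matched in full_matches.items():
    print(p, matched)
# (ab)c+ ['abc', 'abccc']
# a(bc)+ ['abc', 'abcbc']
# abc+ ['abc', 'abccc']
```

The first and third patterns accept exactly the same candidates, while none of the three accepts "abcabc", since no pattern here repeats the whole sequence "abc".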


# Q3. How often do you need to use the following statement when using regular expressions?

import re

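A3. The "import re" statement is needed once per module (or once per notebook session), before the first use of anything from the module. By convention it goes at the top of the file. It imports the "re" module, which provides functions and classes for working with regular expressions in Python. Repeating the import in every cell is harmless but not required.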



In [3]:
import re

string = "The quick brown fox jumps over the lazy dog"
pattern = "fox"
result = re.search(pattern, string)
print(result.group()) # "fox"


fox


# Q4. Which characters have special significance in square brackets when expressing a range, and under what circumstances?

A4. In square brackets, certain characters have special significance when used to express a range of characters. These characters include:

- "-" (hyphen): specifies a range when it appears between two characters, e.g. "[a-z]" matches any lowercase ASCII letter. Placed first or last in the set (e.g. "[a-]"), or escaped, it is a literal hyphen.
- "^" (caret): negates the set when it is the first character, e.g. "[^0-9]" matches any non-digit character. Anywhere else in the set it is a literal caret.
- "\" (backslash): escapes a character that would otherwise be special inside the set, e.g. "[\]]" matches a literal closing bracket, and it also introduces classes such as "\d".

In [4]:
import re

# match lowercase letters
pattern1 = "[a-z]+"
string1 = "The quick brown fox jumps over the lazy dog"
result1 = re.findall(pattern1, string1)
print(result1) # ["he", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]

# match non-digits
pattern2 = "[^0-9]+"
string2 = "1, 2, buckle my shoe"
result2 = re.findall(pattern2, string2)
print(result2) # [", ", ", buckle my shoe"]


['he', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
[', ', ', buckle my shoe']
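
The positional rules for these characters can be checked with a short sketch. The sample text is made up for the example.

```python
import re

text = "a-z ^ [x] 42"

# A hyphen at the end of the set is literal: this set is "a", "z", or "-".
hyphen_literal = re.findall(r"[az-]", text)

# A caret that is not the first character of the set is literal.
caret_literal = re.findall(r"[a^]", text)

# A backslash escapes brackets so they can appear inside the set.
brackets = re.findall(r"[\[\]]", text)

print(hyphen_literal)  # ['a', '-', 'z']
print(caret_literal)   # ['a', '^']
print(brackets)        # ['[', ']']
```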


# Q5. How does compiling a regular-expression object benefit you?

A5. Compiling a pattern with "re.compile()" parses it once and returns a reusable pattern object, which avoids repeated work when the same pattern is used many times, for example inside a loop. The module-level functions such as "re.search()" cache recently compiled patterns, so the speed gain is often modest. The larger benefits are readability and having the pattern, together with flags such as "re.IGNORECASE" or "re.MULTILINE", defined in one place.



In [5]:
import re

# Without compiling the regular expression
pattern = r'\d+'
match = re.search(pattern, 'Hello 123')
print(match.group())

# Compiling the regular expression
compiled_pattern = re.compile(r'\d+')
match = compiled_pattern.search('Hello 123')
print(match.group())


123
123
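
The reuse and flag benefits can be sketched as follows. The log lines are hypothetical sample data, invented for this example.

```python
import re

# Compile once, with a flag, then reuse the same pattern object.
error_re = re.compile(r"\berror\b", re.IGNORECASE)

# Hypothetical sample data.
log_lines = [
    "ERROR: disk full",
    "warning: low memory",
    "Error while reading file",
    "no errors found",
]

# The compiled object exposes the same methods as the module-level functions.
flagged = [line for line in log_lines if error_re.search(line)]

print(flagged)  # ['ERROR: disk full', 'Error while reading file']
```

The last line is not flagged because `\b` requires a word boundary directly after "error", and "errors" continues with another word character.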


# Q6. What are some examples of how to use the match object returned by re.match and re.search?

A6. The match object returned by re.match and re.search contains information about the match, including the matched text, its position, and any captured groups. Both functions return None when there is no match, so the result should be checked before use. Typical uses include getting the matched text with "group()", getting a specific captured group with "group(n)", getting all captured groups as a tuple with "groups()", and getting the match position with "start()", "end()", or "span()".

In [6]:
import re

# Using re.match
pattern = r'hello'
text = 'hello world'
match = re.match(pattern, text)
if match:
    print(f"Matched text: {match.group()}")
    print(f"Match start position: {match.start()}")
    print(f"Match end position: {match.end()}")

# Using re.search
pattern = r'\d+'
text = 'hello 123 world'
match = re.search(pattern, text)
if match:
    print(f"Matched text: {match.group()}")
    print(f"Match start position: {match.start()}")
    print(f"Match end position: {match.end()}")


Matched text: hello
Match start position: 0
Match end position: 5
Matched text: 123
Match start position: 6
Match end position: 9
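
The cell above uses patterns without groups. A minimal sketch of the group-related methods follows; the pattern, the group names `year` and `month`, and the sample text are chosen only for illustration.

```python
import re

match = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "Released on 2023-07-15")

if match is not None:
    whole = match.group(0)         # the entire matched text
    first = match.group(1)         # first captured group, by number
    month = match.group("month")   # captured group, by name
    all_groups = match.groups()    # tuple of all captured groups
    by_name = match.groupdict()    # dict of the named groups
    position = match.span()        # (start, end) of the whole match

    print(whole)       # 2023-07
    print(first)       # 2023
    print(month)       # 07
    print(all_groups)  # ('2023', '07')
    print(by_name)     # {'year': '2023', 'month': '07'}
    print(position)    # (12, 19)
```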


# Q7. What is the difference between using a vertical bar (|) as an alternation and using square brackets as a character set?

A7. The vertical bar "|" specifies an alternation, where one of several alternative subpatterns may be matched, and each alternative can be any length. For example, the regular expression "dog|cat" matches either "dog" or "cat". Square brackets, on the other hand, specify a character set, which matches exactly one character from the set. For example, the regular expression "[aeiou]" matches any single lowercase vowel.



In [7]:
import re

# Using the vertical bar
pattern = r'hello|world'
text = 'hello'
match = re.search(pattern, text)
if match:
    print("Matched")

# Using square brackets
pattern = r'[abc]'
text = 'a'
match = re.search(pattern, text)
if match:
    print("Matched")


Matched
Matched


In the first example, the pattern hello|world matches either hello or world. In the second example, the pattern [abc] matches any of the characters a, b, or c.
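
The difference is easier to see when both forms are built from the same letters. In this sketch (sample text made up for the example), the alternation finds whole words, while the character set finds individual characters.

```python
import re

text = "a cat and a dog"

# Alternation: each alternative is a complete subpattern.
alternation = re.findall(r"cat|dog", text)

# Character set: matches one character at a time from the set c, a, t, d, o, g.
char_set = re.findall(r"[catdog]", text)

print(alternation)  # ['cat', 'dog']
print(char_set)     # ['a', 'c', 'a', 't', 'a', 'd', 'a', 'd', 'o', 'g']
```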

# Q8. In regular-expression search patterns, why is it necessary to use the raw-string indicator (r)? In replacement strings?

A8. In search patterns, the raw-string indicator "r" is not strictly required, but it is strongly recommended: it stops Python from interpreting backslashes as string escape sequences before the pattern reaches the regular-expression engine. Without it, "\b" becomes a backspace character instead of a word boundary, and every backslash meant for the engine has to be doubled. The same applies to replacement strings, where a backreference such as "\1" written without "r" becomes the control character with code 1 rather than a reference to group 1.

In [8]:
import re

pattern = r'(\w+) (\w+)'
text = 'John Smith'
replacement = r'\2, \1'
new_text = re.sub(pattern, replacement, text)
print(new_text)  # Output: Smith, John


Smith, John
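
To show what goes wrong without the raw-string indicator, the following sketch runs the same pattern and replacement with and without "r". The sample text is made up for the example.

```python
import re

text = "cat concatenate cat"

# Without r, Python turns "\b" into a single backspace character,
# so the pattern looks for literal backspaces and finds nothing.
non_raw = re.findall("\bcat\b", text)

# With r, the backslash reaches the regex engine and "\b" means word boundary.
raw = re.findall(r"\bcat\b", text)

# Replacement strings: without r, "\2" and "\1" are the control characters
# chr(2) and chr(1), not backreferences.
broken = re.sub(r"(\w+) (\w+)", "\2, \1", "John Smith")
fixed = re.sub(r"(\w+) (\w+)", r"\2, \1", "John Smith")

print(non_raw)       # []
print(raw)           # ['cat', 'cat']
print(repr(broken))  # '\x02, \x01'
print(fixed)         # Smith, John
```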
