In [1]:
import re

## Basic Syntax

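- `.`: Matches any single character except a newline (any character at all with `re.DOTALL`)
- `^`: Matches the start of the string (and the start of each line with `re.MULTILINE`)
- `$`: Matches the end of the string or just before a trailing newline (and the end of each line with `re.MULTILINE`)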
- `*`: Matches 0 or more repetitions of the preceding element
- `+`: Matches 1 or more repetitions of the preceding element
- `?`: Matches 0 or 1 repetition of the preceding element
- `{n}`: Matches exactly n repetitions of the preceding element
- `{n,}`: Matches at least n repetitions of the preceding element
- `{n,m}`: Matches between n and m repetitions of the preceding element
- `|`: Alternation, matches either the pattern before or the pattern after the symbol
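
The quantifiers, anchors, and alternation listed above can be tried out with `re.findall`. This is a minimal sketch; the sample text and variable names are made up for illustration.

```python
import re

# Hypothetical sample text, chosen only to exercise the quantifiers above.
text = "color colour colouur"

# `?` makes the preceding "u" optional (0 or 1 times).
optional_u = re.findall(r"colou?r", text)

# `+` requires at least one "u".
one_or_more_u = re.findall(r"colou+r", text)

# `{2}` requires exactly two "u"s.
exactly_two_u = re.findall(r"colou{2}r", text)

# `|` tries either alternative; `^` only matches at the start of the string.
at_start = re.findall(r"^color|^colour", text)

print(optional_u)     # ['color', 'colour']
print(one_or_more_u)  # ['colour', 'colouur']
print(exactly_two_u)  # ['colouur']
print(at_start)       # ['color']
```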

## Character Classes

- `[abc]`: Matches any one of the characters a, b, or c
- `[^abc]`: Matches any character that is not a, b, or c
- `[a-z]`: Matches any character from a to z
- `[A-Z]`: Matches any character from A to Z
- `[0-9]`: Matches any digit
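- `\d`: Matches any digit (equivalent to `[0-9]` with `re.ASCII`; by default, `str` patterns in Python 3 also match other Unicode digits)
- `\D`: Matches any non-digit
- `\w`: Matches any word character (equivalent to `[a-zA-Z0-9_]` with `re.ASCII`; by default, `str` patterns in Python 3 also match Unicode letters and digits)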
- `\W`: Matches any non-word character
- `\s`: Matches any whitespace character
- `\S`: Matches any non-whitespace character
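
A short sketch of the character classes above in action. The sample string is hypothetical and only meant to mix letters, digits, punctuation, and whitespace.

```python
import re

# Hypothetical sample string.
text = "Order #42: 3 apples, 15 pears"

digits = re.findall(r"\d+", text)        # runs of digits
words = re.findall(r"[A-Za-z]+", text)   # runs of ASCII letters
symbols = re.findall(r"[^\w\s]", text)   # neither word characters nor whitespace
tokens = re.split(r"\s+", text)          # split on runs of whitespace

print(digits)   # ['42', '3', '15']
print(words)    # ['Order', 'apples', 'pears']
print(symbols)  # ['#', ':', ',']
print(tokens)   # ['Order', '#42:', '3', 'apples,', '15', 'pears']
```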

## Special Characters

- `\`: Escapes a special character (e.g. `\.` matches a literal dot)
- `(...)`: Defines a capturing group
- `(?:...)`: Non-capturing group
- `(?=...)`: Positive lookahead assertion
- `(?!...)`: Negative lookahead assertion
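
The difference between capturing groups, non-capturing groups, and lookaheads shows up in what `re.findall` returns. The sketch below uses a made-up price string; the negative lookahead also excludes `\d`, because otherwise the engine would backtrack and match a shorter run of digits such as `10` out of `100USD`.

```python
import re

# Hypothetical sample text.
text = "price: 100USD, 250EUR, 75USD"

# Two capturing groups: findall returns one tuple per match.
pairs = re.findall(r"(\d+)(USD|EUR)", text)

# Non-capturing group: only the digits group is returned.
amounts = re.findall(r"(\d+)(?:USD|EUR)", text)

# Positive lookahead: digits that are followed by "USD"; "USD" is not consumed.
usd_only = re.findall(r"\d+(?=USD)", text)

# Negative lookahead: digit runs followed by neither another digit nor "USD".
not_usd = re.findall(r"\d+(?!\d|USD)", text)

print(pairs)     # [('100', 'USD'), ('250', 'EUR'), ('75', 'USD')]
print(amounts)   # ['100', '250', '75']
print(usd_only)  # ['100', '75']
print(not_usd)   # ['250']
```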

## Examples

- `abc`: Matches the string "abc"
- `abc|def`: Matches "abc" or "def"
- `^abc`: Matches "abc" at the start of the string
- `abc$`: Matches "abc" at the end of the string
- `a.b`: Matches "a", then any single character except a newline, then "b" (e.g. "a-b" or "axb")
- `a*`: Matches 0 or more 'a's
- `a+`: Matches 1 or more 'a's
- `a?`: Matches 0 or 1 'a'
- `\d{2,4}`: Matches between 2 and 4 digits
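
Whether a pattern "matches a string" depends on which function is used: `re.search` looks for the pattern anywhere in the string, while `re.fullmatch` requires the whole string to match. A small sketch with hypothetical inputs:

```python
import re

# re.search finds "a.b" anywhere; re.fullmatch needs the entire string to match.
found_anywhere = re.search(r"a.b", "xxa-bxx") is not None
whole_string = re.fullmatch(r"a.b", "xxa-bxx") is not None

# Anchors restrict where re.search may match.
starts_with_abc = re.search(r"^abc", "abcdef") is not None
ends_with_abc = re.search(r"abc$", "xyzabc") is not None

# `\d{2,4}` is greedy: it takes up to 4 digits out of a longer run.
digit_runs = re.findall(r"\d{2,4}", "7 42 12345")

print(found_anywhere, whole_string)    # True False
print(starts_with_abc, ends_with_abc)  # True True
print(digit_runs)                      # ['42', '1234']
```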

In [2]:
s = '''
<a href="https://amazon.com/categories/ski">Ski</a>
<a href="https://amazon.com/p/1234567890/awesome-product-1">Coffee beans</a>
<a href="https://amazon.com/p/6454343333/ok-product-2">Backcountry Ski</a>
<a href="https://amazon.com/p/6543565454/great-product-1">Book</a>
<a href="https://amazon.com/about-us">About Us</a>
'''

In [4]:
# Extract only the product links
# Expected output:
# https://amazon.com/p/1234567890/awesome-product-1
# https://amazon.com/p/6454343333/ok-product-2
# https://amazon.com/p/6543565454/great-product-1

re.findall(r'<a href="https://amazon\.com/p/1234567890/awesome-product-1">', s)

['<a href="https://amazon.com/p/1234567890/awesome-product-1">']

In [9]:
re.findall(r'<a href="https://amazon\.com/p/\d+/.+">', s)

['<a href="https://amazon.com/p/1234567890/awesome-product-1">',
 '<a href="https://amazon.com/p/6454343333/ok-product-2">',
 '<a href="https://amazon.com/p/6543565454/great-product-1">']

In [10]:
re.findall(r'<a href="(https://amazon\.com/p/\d+/.+)">', s)

['https://amazon.com/p/1234567890/awesome-product-1',
 'https://amazon.com/p/6454343333/ok-product-2',
 'https://amazon.com/p/6543565454/great-product-1']
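
A greedy `.+` returns one link per match here only because `.` does not cross newlines and each link sits on its own line. The sketch below, using a hypothetical one-line input, compares `.+` with the negated class `[^"]+`, which stops at the closing quote.

```python
import re

# Hypothetical input: two product links on a single line.
one_line = (
    '<a href="https://amazon.com/p/111/first">A</a>'
    '<a href="https://amazon.com/p/222/second">B</a>'
)

# Greedy `.+` runs to the last `">` on the line, merging both links into one match.
greedy = re.findall(r'<a href="(https://amazon\.com/p/\d+/.+)">', one_line)

# `[^"]+` cannot cross a double quote, so each link is matched separately.
bounded = re.findall(r'<a href="(https://amazon\.com/p/\d+/[^"]+)">', one_line)

print(len(greedy))  # 1
print(bounded)      # ['https://amazon.com/p/111/first', 'https://amazon.com/p/222/second']
```

For anything beyond simple, well-formed snippets like these, an HTML parser is usually a safer choice than a regular expression.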

## Practice Problems

### Problem 1: Email Extraction

**Problem**: Extract emails from a given string.  
**String**: "Contact us at support@example.com or sales@example.org"
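
One possible approach, sketched under the assumption that a simplified pattern is good enough: it accepts a local part, an `@`, one or more domain labels, and an alphabetic top-level domain. It is not a full validator for every address the email standards allow.

```python
import re

text = "Contact us at support@example.com or sales@example.org"

# Simplified email pattern (an assumption, not a complete validator):
# local part, "@", domain labels, then a top-level domain of 2+ letters.
email_pattern = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.[A-Za-z]{2,}")

emails = email_pattern.findall(text)
print(emails)  # ['support@example.com', 'sales@example.org']
```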

### Problem 2: Phone Number Validation

**Problem**: Validate and extract US phone numbers in the format xxx-xxx-xxxx.  
**String**: "My numbers are 123-456-7890 or 333-333-3333"
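
A minimal sketch of one way to do both tasks: `findall` with word boundaries for extraction, and `fullmatch` for validating a single candidate. The helper name `is_valid_phone` is hypothetical.

```python
import re

text = "My numbers are 123-456-7890 or 333-333-3333"

# `\b` keeps the pattern from matching inside a longer run of digits.
phone_pattern = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
phones = phone_pattern.findall(text)


def is_valid_phone(candidate: str) -> bool:
    """Return True if the whole string has the form xxx-xxx-xxxx."""
    return re.fullmatch(r"\d{3}-\d{3}-\d{4}", candidate) is not None


print(phones)                          # ['123-456-7890', '333-333-3333']
print(is_valid_phone("123-456-7890"))  # True
print(is_valid_phone("123-4567-890"))  # False
```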

### Problem 3: Password Strength Check

**Problem**: Check if a password is at least 8 characters long and contains at least one digit, one uppercase letter, and one lowercase letter.  
**String**: "Password1"
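
One way to express several independent requirements in a single pattern is to stack positive lookaheads, each checking one condition from the start of the string. This is a sketch; the helper name `is_strong` is hypothetical, and only ASCII letters are considered.

```python
import re

# Each lookahead checks one requirement without consuming characters;
# `.{8,}` then enforces the minimum length.
strong_pattern = re.compile(r"(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}")


def is_strong(password: str) -> bool:
    """Return True if the password meets all four requirements."""
    return strong_pattern.fullmatch(password) is not None


print(is_strong("Password1"))  # True
print(is_strong("password1"))  # False (no uppercase letter)
print(is_strong("Pass1"))      # False (too short)
```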

### Problem 4: Extracting Domain Name

**Problem**: Extract the domain name from an email address.  
**String**: "user@example.com"
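
A minimal sketch using a capturing group for everything after the `@`. It assumes the input is a single, well-formed address.

```python
import re

email = "user@example.com"

# Capture everything after "@" up to the end of the string.
match = re.search(r"@([\w.-]+)$", email)
domain = match.group(1) if match else None

print(domain)  # example.com
```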

### Problem 5: Validating an IP Address

**Problem**: Check if a string is a valid IPv4 address.  
**String**: "192.168.1.1"
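
One possible solution builds a sub-pattern for a single octet (0 to 255) and repeats it four times. The sketch assumes leading zeros such as `01` should be rejected; the names `OCTET`, `IPV4`, and `is_ipv4` are hypothetical.

```python
import re

# One octet: 250-255, 200-249, 100-199, or 0-99 without a leading zero.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"

# Four octets separated by literal dots.
IPV4 = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")


def is_ipv4(candidate: str) -> bool:
    """Return True if the whole string is a dotted-quad IPv4 address."""
    return IPV4.fullmatch(candidate) is not None


print(is_ipv4("192.168.1.1"))  # True
print(is_ipv4("256.1.1.1"))    # False (octet out of range)
print(is_ipv4("1.2.3"))        # False (only three octets)
```

Outside of a regex exercise, the standard library's `ipaddress.ip_address` is a more robust way to validate addresses.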