# Regular Expressions

## 1. Characters and Character Classes

| Character or Character Class | Meaning |
| :-: | :- |
| . | any character except a newline |
| \d | a single digit character |
| \D | a single non-digit character |
| \s | a single whitespace character |
| \S | a single non-whitespace character |
| \n | a newline character |
| \t | a tab character |
| a | the character 'a' |
| \w | a single word character: a letter, a digit, or an underscore ([a-zA-Z0-9_] for ASCII text) |
| \W | a single non-word character (anything not matched by \w) |
| [abcd] | a character from the set [a,b,c,d] | 
| [^abcd] | a character not from the set [a,b,c,d] |
| [a-z] | a single character in the range a to z |
| [^a-z] | a single character not in the range a to z |

### Examples

a) Match any single character together with the letter 'f' that immediately follows it.

In [1]:
import re

text = '''Diddle, diddle, dumpling, my son John.
Went to bed with his trousers on.
One shoe off, and the other shoe on.
Diddle, diddle, dumpling, my son John.'''

re.findall(r'.f',text)

['of']

b) Match any single lowercase letter together with the letter 'l' that immediately follows it.

In [2]:
re.findall(r'[a-z]l',text)

['dl', 'dl', 'pl', 'dl', 'dl', 'pl']

c) Match the character 'a' followed by a character that is not in the set [a,f,h,n,s]. The only 'a' in the text is in 'and', where it is followed by 'n', so nothing matches.

In [3]:
re.findall(r'a[^afhns]',text)

[]

d) Match the character 'o' followed by a character that is neither in the range g to l nor in the range A to Z.

In [4]:
re.findall(r'o[^g-lA-Z]',text)

['on', 'o ', 'ou', 'on', 'oe', 'of', 'ot', 'oe', 'on', 'on']

e) Find all digit characters

In [5]:
re.findall(r'\d','He was aged 18 in 1996. He is now 42.')

['1', '8', '1', '9', '9', '6', '4', '2']
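
The shorthand classes `\w`, `\W`, `\s` and `\S` from the table are not exercised by the examples in this section. The following is a minimal sketch of how they behave; the string `sample` is made up for illustration and is not part of the original examples.

```python
import re

# Hypothetical sample string (an assumption for illustration only)
sample = "user_name = 'Ada Lovelace'\tage=36\n"

# \w+ : runs of word characters (letters, digits and the underscore)
words = re.findall(r'\w+', sample)
print(words)      # ['user_name', 'Ada', 'Lovelace', 'age', '36']

# \W : single non-word characters; the underscore is not among them
non_word = re.findall(r'\W', 'a_b-c d')
print(non_word)   # ['-', ' ']

# \s : single whitespace characters (space, tab, newline, ...)
spaces = re.findall(r'\s', sample)
print(spaces)     # [' ', ' ', ' ', '\t', '\n']

# \S+ : runs of non-whitespace characters
chunks = re.findall(r'\S+', sample)
print(chunks)     # ['user_name', '=', "'Ada", "Lovelace'", 'age=36']
```

Note that `user_name` comes back as one match because `\w` includes the underscore.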

## 2. Repetition Operators (Quantifiers)

| Quantifier | Meaning |
| :-: | :- |
| ? | **Zero or one** occurrence of the preceding expression |
| * | **Zero or more** occurrences of the preceding expression |
| + | **One or more** occurrences of the preceding expression |
| {n} | **Exactly *n*** occurrences of the preceding expression |
| {n,m} | **Between *n* and *m*** occurrences (inclusive) of the preceding expression |
| {n,} | ***n* or more** occurrences of the preceding expression |
| {,m} | **Up to *m*** occurrences of the preceding expression |

### Examples

1) Match zero or one occurrence of the character 'n'. Note the empty strings in the result: each one is a 'zero occurrences' match at a position where there is no 'n'.

In [6]:
re.findall(r'n?',text)

['',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 'n',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 'n',
 '',
 '',
 '',
 '',
 'n',
 '',
 '',
 '',
 '',
 'n',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 'n',
 '',
 '',
 '',
 'n',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 'n',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 'n',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 'n',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 'n',
 '',
 '',
 '',
 '',
 'n',
 '',
 '']
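
The long list above is easier to follow on a short string. As a small sketch (the input `'son.'` is chosen only for illustration), `n?` is tried at every position, including the very end of the string, and yields an empty match wherever no 'n' is found. The output shown assumes a recent Python 3 release.

```python
import re

# One result per position: 's', 'o', 'n', '.', and the end of the string
matches = re.findall(r'n?', 'son.')
print(matches)     # ['', '', 'n', '', '']

# If only the real occurrences are of interest, drop the empty matches
non_empty = [m for m in matches if m]
print(non_empty)   # ['n']
```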

2) Match one or more consecutive occurrences of the character 'n'.

In [7]:
re.findall(r'n+',text)

['n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n']

3) Match only when the character 'd' occurs twice in succession.

In [8]:
re.findall(r'd{2}',text)

['dd', 'dd', 'dd', 'dd']
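
The quantifiers `*`, `{n,m}`, `{n,}` and `{,m}` have no example above. The sketch below tests each one against a handful of made-up strings; the list `candidates` and the helper `accepted` are hypothetical names introduced here for illustration. It uses `re.fullmatch`, so a candidate is accepted only if the whole string fits the pattern.

```python
import re

# Hypothetical candidates (an assumption for illustration only)
candidates = ['b', 'ab', 'aab', 'aaab', 'aaaab']

def accepted(pattern):
    """Return the candidates that the pattern matches in full."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

star = accepted(r'a*b')          # zero or more 'a', then 'b'
between = accepted(r'a{2,3}b')   # two or three 'a', then 'b'
at_least = accepted(r'a{3,}b')   # three or more 'a', then 'b'
at_most = accepted(r'a{,2}b')    # at most two 'a', then 'b'

print(star)       # ['b', 'ab', 'aab', 'aaab', 'aaaab']
print(between)    # ['aab', 'aaab']
print(at_least)   # ['aaab', 'aaaab']
print(at_most)    # ['b', 'ab', 'aab']
```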

## 3. Position Anchors

| Anchor | Meaning |
| :-: | :- |
| ^ | Matches at the start of the string; in MULTILINE mode (re.M), also at the start of each line |
| $ | Matches at the end of the string (or just before a trailing newline); in MULTILINE mode (re.M), also at the end of each line |
| \b | Matches at a word boundary (the start or end of a word) |
| \B | Matches at any position that is not a word boundary |

### Examples

a) Match the text 'on' as a complete word only

In [9]:
re.findall(r'\bon\b',text)

['on', 'on']

b) Match the text 'on' only when it is the end of a longer word.


In [10]:
re.findall(r'\Bon\b',text)

['on', 'on']

c) Match the text 'Peter' when it occurs at the start of a line.

In [23]:
# A triple-quoted string already contains the line breaks, so no explicit \n is needed
text = '''Peter Piper picked a peck of pickled peppers
A peck of pickled peppers Peter Piper picked
If Peter Piper picked a peck of pickled peppers
Where’s the peck of pickled peppers Peter Piper picked?'''

# re.M (MULTILINE) lets ^ match at the start of every line, not only at the start of the string
re.compile(r'^Peter',re.M).findall(text)

['Peter']

d) Match the text 'peppers' or the text 'picked' when it occurs at the end of a line.

In [28]:
re.compile(r'picked$|peppers$',re.M).findall(text)

['peppers', 'picked', 'peppers']
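
To make the effect of MULTILINE mode explicit, the sketch below runs the same patterns with and without `re.M`. The two-line string `rhyme` is a shortened, made-up variant of the text above, used only for illustration.

```python
import re

# Hypothetical two-line string (an assumption for illustration only)
rhyme = "Peter Piper picked\nPeter Piper pecked"

# Without re.M, ^ matches only at the start of the whole string
start_default = re.findall(r'^Peter', rhyme)
# With re.M, ^ also matches right after each newline
start_multiline = re.findall(r'^Peter', rhyme, flags=re.M)

# Without re.M, $ matches only at the end of the whole string
end_default = re.findall(r'\w+$', rhyme)
# With re.M, $ also matches just before each newline
end_multiline = re.findall(r'\w+$', rhyme, flags=re.M)

print(start_default)     # ['Peter']
print(start_multiline)   # ['Peter', 'Peter']
print(end_default)       # ['pecked']
print(end_multiline)     # ['picked', 'pecked']
```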