In [35]:
# Regular Expressions

## Regex Cheatsheet:
| regex        | definition                                          |
|:------------:|:----------------------------------------------------|
| abc…         | Letters                                             |
| 123…         | Digits                                              |
| \\d          | Any Digit                                           |
| \\D          | Any Non-digit character                             |
| .            | Any Character                                       |
| \\.          | Period                                              |
| [abc]        | Only a, b, or c                                     |
| [^abc]       | Not a, b, nor c                                     |
| [a-z]        | Characters a to z                                   |
| [0-9]        | Numbers 0 to 9                                      |
| \\w          | Any Alphanumeric character or underscore            |
| \\W          | Any character that is not alphanumeric or underscore |
| {m}          | m Repetitions                                       |
| {m,n}        | m to n Repetitions                                  |
| *            | Zero or more repetitions                            |
| +            | One or more repetitions                             |
| ?            | Optional character                                  |
| \\s          | Any Whitespace                                      |
| \\S          | Any Non-whitespace character                        |
| ^…$          | Starts and ends                                     |
| (…)          | Capture Group                                       |
| (a(bc))      | Capture Sub-group                                   |
| (.*)         | Capture all                                         |
| (abc\|def)   | Matches abc or def                                  |
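
A few of the cheatsheet entries can be checked directly in Python. The sketch below uses ```re.fullmatch()```, which succeeds only when the whole string matches the pattern. The sample strings are made up for illustration and are not from any particular data set.

```python
import re

# Each entry: (cheatsheet pattern, a string it should match, a string it should not).
# All sample strings are illustrative assumptions.
examples = [
    (r"\d{3}", "123", "12a"),                    # {m}: exactly three digits
    (r"[^abc]+", "xyz", "xbz"),                  # [^abc]: none of a, b, c
    (r"colou?r", "color", "colouur"),            # ?: optional character
    (r"\w+\s\w+", "hello world", "helloworld"),  # \w and \s
    (r"(abc|def)", "def", "abd"),                # alternation
]

for pattern, good, bad in examples:
    matched_good = re.fullmatch(pattern, good) is not None
    matched_bad = re.fullmatch(pattern, bad) is not None
    print("%-12s %r -> %s, %r -> %s" % (pattern, good, matched_good, bad, matched_bad))
```

Each line should report ```True``` for the first string and ```False``` for the second.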

In [36]:
## Regex in Python!!

In [37]:
import re

In [38]:
pattern = r"([a-zA-Z]+) (\d+)"
# Search once and reuse the match object in the branch below
match = re.search(pattern, "June 24")
if match:
    print("Match at index %s, %s." % (match.start(), match.end()))

    print("Full match: %s" % (match.group(0)))
    print("Month: %s" % (match.group(1)))
    print("Day: %s" % (match.group(2)))
else:
    # If re.search() does not match, then None is returned
    print("The regex pattern does not match. :(")

Match at index 0, 7.
Full match: June 24
Month: June
Day: 24


To find every match rather than just the first, use either ```findall()```, which returns a list of the matched strings, or ```finditer()```, which returns an iterator of match objects:
```python
matchList = re.findall(pattern, string, flags=0)
matchListIter = re.finditer(pattern, string, flags=0)
```
For example:

In [39]:
regex = r"[a-zA-Z]+ \d+"
matches = re.findall(regex, "June 24, August 9, Dec 12")
for match in matches:
    print("Full match: %s" % (match))

Full match: June 24
Full match: August 9
Full match: Dec 12


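To capture only the month of each date, wrap that part of the pattern in a capture group. When the pattern contains a single group, ```findall()``` returns the text of that group instead of the full match: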

In [40]:
regex = r"([a-zA-Z]+) \d+"
matches = re.findall(regex, "June 24, August 9, Dec 12")
for match in matches:
    print("Match month: %s" % (match))

Match month: June
Match month: August
Match month: Dec
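
When the pattern has more than one capture group, ```findall()``` returns a list of tuples, one tuple per match with one element per group. A minimal sketch, reusing the two-group pattern from the first example:

```python
import re

# Two capture groups: findall() returns a list of (month, day) tuples.
regex = r"([a-zA-Z]+) (\d+)"
pairs = re.findall(regex, "June 24, August 9, Dec 12")
for month, day in pairs:
    print("Month: %s, Day: %s" % (month, day))
```

Note that the day is returned as a string; convert it with ```int()``` if a number is needed.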


For the exact positions of each match:

In [41]:
regex = r"([a-zA-Z]+) \d+"
matches = re.finditer(regex, "June 24, August 9, Dec 12")
for match in matches:
    print("Match at index: %s, %s" % (match.start(), match.end()))

Match at index: 0, 7
Match at index: 9, 17
Match at index: 19, 25
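
Because ```finditer()``` yields match objects, the position of an individual capture group is also available by passing the group number to ```start()```, ```end()```, or ```span()```. The sketch below records where each month begins and ends; the variable name ```spans``` is just an illustrative choice.

```python
import re

regex = r"([a-zA-Z]+) (\d+)"
spans = []
for match in re.finditer(regex, "June 24, August 9, Dec 12"):
    # span(1) is the (start, end) pair of the first capture group, the month
    spans.append((match.group(1), match.span(1)))
    print("Month %s at index: %s, %s" % (match.group(1), match.start(1), match.end(1)))
```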


## Finding and Replacing Strings

Use the ```sub()``` function:
```python
replacedString = re.sub(pattern, replacement_pattern, input_str, count=0, flags=0)
```
Backreferences such as ```\1``` and ```\2``` in the replacement pattern refer to the captured groups. For example, I can swap the order of the day and the month in a date string as follows:

In [42]:
pattern = r"([a-zA-Z]+) (\d+)"
print(re.sub(pattern, r"\2 of \1", "June 24, August 9, Dec 12"))

24 of June, 9 of August, 12 of Dec
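
The ```count``` parameter limits how many matches are replaced; the default of ```0``` replaces all of them. A small sketch using the same pattern and input, with ```count``` passed by keyword:

```python
import re

pattern = r"([a-zA-Z]+) (\d+)"
text = "June 24, August 9, Dec 12"

# count=1 replaces only the first match and leaves the rest untouched
first_only = re.sub(pattern, r"\2 of \1", text, count=1)
print(first_only)  # 24 of June, August 9, Dec 12
```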


## Flags
The above functions all have a ```flags``` parameter.  Flags change how the pattern is interpreted. They are optional conveniences, and each one only matters in particular situations:

 - ```re.IGNORECASE``` makes the pattern case insensitive.
 - ```re.MULTILINE``` is useful when the input string contains newline characters (\n).  It lets the start (^) and end ($) metacharacters match at the beginning and end of each line rather than only at the beginning and end of the whole string.
 - ```re.DOTALL``` allows the dot (.) metacharacter to match all characters, including the newline character.
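
The effect of each flag can be seen on a small multi-line string. The input below is made up for illustration:

```python
import re

text = "June 24\njuly 4\nDec 12"

# re.IGNORECASE: the lowercase pattern also matches "June"
ignore_case = re.findall(r"june", text, flags=re.IGNORECASE)
print(ignore_case)  # ['June']

# re.MULTILINE: ^ matches at the start of every line, not just the string
without_multiline = re.findall(r"^[a-zA-Z]+", text)
with_multiline = re.findall(r"^[a-zA-Z]+", text, flags=re.MULTILINE)
print(without_multiline)  # ['June']
print(with_multiline)     # ['June', 'july', 'Dec']

# re.DOTALL: . also matches "\n", so the match can span several lines
without_dotall = re.search(r"June.*Dec", text)
with_dotall = re.search(r"June.*Dec", text, flags=re.DOTALL)
print(without_dotall)             # None
print(repr(with_dotall.group(0)))
```

Several flags can be combined with the ```|``` operator, for example ```flags=re.IGNORECASE | re.MULTILINE```.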

## Compiling a Pattern for Performance

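If you need to test many input strings against the same regular expression, it is recommended to compile the pattern once with the following function, which returns a compiled pattern object (```regexObject```). The object provides the same ```search()```, ```findall()```, ```finditer()```, and ```sub()``` methods, without the pattern argument:

```python
regexObject = re.compile(pattern, flags=0)
```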


In [43]:
regex = re.compile(r"(\w+) World")
result = regex.search("Hello World is easiest")
if result:
    # Prints "0 11": the start and end index of the match
    print(result.start(), result.end())

0 11


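With a single capture group, ```findall()``` returns the text of that group for each match, so the following prints
```
Hello
Bonjour
```
one line for each captured group that matched: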

In [44]:
for result in regex.findall("Hello World, Bonjour World"):
    print(result)

Hello
Bonjour


This will substitute "World" with "Earth" and print ```Hello Earth```:

In [45]:
print(regex.sub(r"\1 Earth", "Hello World"))

Hello Earth
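
To tie this back to the motivation for compiling, here is a minimal sketch that compiles a pattern once and reuses it across several input strings. The inputs and the names ```date_regex``` and ```found``` are hypothetical, chosen only for illustration.

```python
import re

# Compile once, then reuse the same pattern object for every input string
date_regex = re.compile(r"([a-zA-Z]+) (\d+)")

# Hypothetical inputs, made up for illustration
inputs = ["June 24", "no date here", "due Dec 12"]

found = []
for s in inputs:
    result = date_regex.search(s)
    if result:
        found.append((result.group(1), result.group(2)))
        print("%r -> month=%s, day=%s" % (s, result.group(1), result.group(2)))
    else:
        # search() returned None: this input contains no date
        print("%r -> no match" % s)
```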
