### Regular expressions

In [1]:
import re

In [2]:
text = "In this chain the word is magic"
re.search("magic", text)

<re.Match object; span=(26, 31), match='magic'>

In [3]:
word = "magic"
find = re.search(word, text) # returns a Match object, or None if the pattern is not found

In [4]:
if find is not None:
    print("The word was found")
else:
    print("The word was not found")

The word was found


In [5]:
find.start() # index where the match starts

26

In [6]:
find.end() # index just past the last matched character

31

In [7]:
find.span() # (start, end) of the match as a tuple

(26, 31)

In [8]:
find.string # the full string that was searched

'In this chain the word is magic'
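
As a small complement to the accessors above, the sketch below recovers the matched text itself. It assumes the same sample string; the variable names `matched_text` and `sliced_text` are only illustrative.

```python
import re

text = "In this chain the word is magic"
match = re.search("magic", text)

# group() returns the text that matched the pattern
matched_text = match.group()

# Slicing the original string with span() gives the same text
start, end = match.span()
sliced_text = text[start:end]

print(matched_text)
print(sliced_text)
```

Both prints show `magic`, since `span()` marks exactly the slice that `group()` returns.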

**re.match** looks for the pattern only at the beginning of the string
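
To make the difference with `re.search` concrete, here is a minimal sketch that reuses the earlier sample sentence. `re.fullmatch` is added for contrast only; it is a standard `re` function that requires the whole string to match.

```python
import re

text = "In this chain the word is magic"

# match() only succeeds if the pattern starts at position 0
at_start = re.match("magic", text)       # None
prefix = re.match("In", text)            # Match with span (0, 2)

# search() scans the whole string for the first occurrence
anywhere = re.search("magic", text)      # Match with span (26, 31)

# fullmatch() requires the pattern to cover the entire string
whole = re.fullmatch("In", text)         # None

print(at_start, prefix.span(), anywhere.span(), whole)
```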

In [9]:
text = "Hello world"
re.match("Hello", text) 

<re.Match object; span=(0, 5), match='Hello'>

In [10]:
text = "Hello world"
re.match("world", text) # returns None because "world" is not at the beginning of the string

In [11]:
text = "Lets split this chain"
re.split(' ', text) # split the string at every occurrence of the pattern

['Lets', 'split', 'this', 'chain']
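
Splitting on a single space produces empty strings when the text contains consecutive spaces. The sketch below, with a made-up sample string, shows one possible way around that using the `+` quantifier (covered later in this section) and the optional `maxsplit` argument.

```python
import re

text = "Lets  split   this chain"  # note the repeated spaces

# A single-space pattern yields empty strings between consecutive spaces
naive = re.split(' ', text)

# ' +' treats any run of spaces as one separator
clean = re.split(' +', text)

# maxsplit limits the number of splits performed
first_only = re.split(' +', text, maxsplit=1)

print(naive)
print(clean)
print(first_only)
```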

In [12]:
text = "Hello friend"
re.sub("Hello", "Bye", text) # replace every occurrence of the pattern with another string

'Bye friend'
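
By default `re.sub` replaces every occurrence. As a brief illustration with a made-up sample string, the `count` argument limits the number of replacements, and `re.subn` also reports how many were made.

```python
import re

text = "hello bye hello hello bye"

# Replace only the first occurrence
first_replaced = re.sub("hello", "hi", text, count=1)

# subn() returns a (new_string, number_of_replacements) tuple
all_replaced, n_replacements = re.subn("hello", "hi", text)

print(first_replaced)
print(all_replaced, n_replacements)
```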

In [13]:
text = "hello bye hello hello bye"
re.findall("hello", text) # find all occurrences of the pattern in the string

['hello', 'hello', 'hello']

This is useful for counting how many times the pattern occurs

In [14]:
len(re.findall("hello", text))

3

In [15]:
len(re.findall("e", text))

5
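
One caveat when counting this way: `re.findall` returns non-overlapping matches, so the count can be lower than the number of positions where the pattern could start. A minimal sketch with a made-up string:

```python
import re

text = "aaaa"

# "aa" could start at index 0, 1 or 2, but matches cannot overlap,
# so only the occurrences at 0 and 2 are reported
matches = re.findall("aa", text)
count = len(matches)

print(matches, count)
```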

In [16]:
text = "hello bye hola adios hello"
re.findall("(hola|hello)", text) # the | operator matches either alternative, so we can look for more than one word

['hello', 'hola', 'hello']
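
When the pattern contains one pair of parentheses, `re.findall` returns the text captured by that group rather than the whole match. In the example above the two coincide, but they differ as soon as the pattern extends beyond the group. The sketch below uses a made-up sentence and the non-capturing form `(?:...)` to get the whole match back.

```python
import re

text = "hello world, hola world"

# With a capturing group, findall returns only the group contents
captured = re.findall("(hola|hello) world", text)

# With a non-capturing group, findall returns the whole match
whole = re.findall("(?:hola|hello) world", text)

print(captured)
print(whole)
```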

In [17]:
text = "hllo hello hola hellooo heeello"
re.findall("hllo", text)

['hllo']

In [18]:
re.findall("hola", text)

['hola']

This helper function prints every match of each pattern in a list, so we can compare several patterns on the same text

In [19]:
text = "hllo hello hola hellooo heeello"

def find_patterns(patterns, text):
    for pattern in patterns:
        print(re.findall(pattern, text))

patterns = ['hllo', 'hello', 'hola']

find_patterns(patterns, text)

['hllo']
['hello', 'hello']
['hola']


In [20]:
text = "hllo hello hola hellooo heeello"
patterns = ['he']

find_patterns(patterns, text)

['he', 'he', 'he']
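
If the position of each match matters as well, a variant based on `re.finditer` can return the spans instead of printing only the text. This is just a sketch; `find_pattern_spans` is a hypothetical helper name, not part of the original notebook.

```python
import re

def find_pattern_spans(patterns, text):
    """Return {pattern: [(matched_text, (start, end)), ...]} for each pattern."""
    return {
        pattern: [(m.group(), m.span()) for m in re.finditer(pattern, text)]
        for pattern in patterns
    }

text = "hllo hello hola hellooo heeello"
spans = find_pattern_spans(['he', 'hola'], text)

for pattern, found in spans.items():
    print(pattern, found)
```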


The **meta-character `*`**:

It matches zero or more repetitions of the character immediately to its left

In [21]:
text = "hllo hello hola hellooo heeello"
patterns = ['he', 'he*'] 

find_patterns(patterns, text)

['he', 'he', 'he']
['h', 'he', 'h', 'he', 'heee']


In [22]:
text = "hllo hello hola hellooo heeello"
patterns = ['he', 'he*', 'ho*la', 'he*llo'] 

find_patterns(patterns, text)

['he', 'he', 'he']
['h', 'he', 'h', 'he', 'heee']
['hola']
['hllo', 'hello', 'hello', 'heeello']


The **meta-character `+`**:

It matches one or more repetitions of the character immediately to its left

In [23]:
text = "hllo hello hola hellooo heeello"
patterns = ['he', 'he+', 'ho+la', 'he+llo'] 

find_patterns(patterns, text)

['he', 'he', 'he']
['he', 'he', 'heee']
['hola']
['hello', 'hello', 'heeello']


In [24]:
text = "hllo hello hola hellooo heeello"
patterns = ['he*', 'he+'] 

find_patterns(patterns, text)

['h', 'he', 'h', 'he', 'heee']
['he', 'he', 'heee']


The **meta-character `?`**:

It matches zero or one repetition of the character immediately to its left

In [25]:
text = "hllo hello hola hellooo heeello"
patterns = ['he*', 'he+', 'he?'] 

find_patterns(patterns, text)

['h', 'he', 'h', 'he', 'heee']
['he', 'he', 'heee']
['h', 'he', 'h', 'he', 'he']


In [26]:
text = "hllo hello hola hellooo heeelloo"
patterns = ['he*', 'he+', 'he?', 'he?llo'] 

find_patterns(patterns, text)

['h', 'he', 'h', 'he', 'heee']
['he', 'he', 'heee']
['h', 'he', 'h', 'he', 'he']
['hllo', 'hello', 'hello']
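
To compare the three quantifiers side by side, the sketch below checks which whole words each pattern accepts. It uses `re.fullmatch` so that partial matches inside a longer word do not count; the word list and the `accepted` dictionary are illustrative.

```python
import re

words = ['hllo', 'hello', 'heello']
patterns = ['he*llo', 'he+llo', 'he?llo']

# For each pattern, keep the words that match from start to end
accepted = {
    pattern: [w for w in words if re.fullmatch(pattern, w)]
    for pattern in patterns
}

for pattern, matched in accepted.items():
    print(pattern, matched)
```

Here `*` accepts all three words, `+` rejects `hllo` because it needs at least one `e`, and `?` rejects `heello` because it allows at most one.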


Another syntax, **{n}**, states the number of repetitions explicitly: the character to the left must appear exactly n times

In [27]:
text = "hllo hello hola helloo heeelloo"
patterns = ['he{0}llo', 'he{1}llo', 'he{3}llo'] 

find_patterns(patterns, text)

['hllo']
['hello', 'hello']
['heeello']


We can also use a range **{x,y}** (written without a space) to require between x and y repetitions of the character to the left

In [28]:
text = "hllo hello hola helloo heeelloo"
patterns = ['he{0,1}llo', 'he{0,3}llo'] 

find_patterns(patterns, text)

['hllo', 'hello', 'hello']
['hllo', 'hello', 'hello', 'heeello']
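
Two details of the range syntax are easy to miss, sketched below on a made-up string: either bound can be omitted, and a space inside the braces makes Python's `re` treat the braces as literal text instead of a quantifier.

```python
import re

text = "hllo hello heello heeello"

# {2,} means two or more repetitions (no upper bound)
at_least_two = re.findall('he{2,}llo', text)

# {,1} means at most one repetition (lower bound defaults to zero)
at_most_one = re.findall('he{,1}llo', text)

# With a space, "{0, 1}" is no longer a quantifier, so nothing matches here
with_space = re.findall('he{0, 1}llo', text)

print(at_least_two)
print(at_most_one)
print(with_space)
```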


Square brackets define a set of characters: **[ ]** matches any single character listed inside it

In [29]:
text = "hala hela hila hola hula"
patterns = ['h[ou]la', 'h[aio]la', 'h[aeiou]la'] 

find_patterns(patterns, text)

['hola', 'hula']
['hala', 'hila', 'hola']
['hala', 'hela', 'hila', 'hola', 'hula']


Quantifiers such as * or {x,y} can also be applied to a set, to match repetitions of any of its characters

In [30]:
text = "haala heeela hiiiila hooooola"
patterns = ['h[ae]la', 'h[ae]*la', 'h[io]{3,9}la'] 

find_patterns(patterns, text)

[]
['haala', 'heeela']
['hiiiila', 'hooooola']


With the character **^** at the start of a set we can exclude characters: the set then matches any character that is not listed

In [31]:
text = "hala hela hila hola hula"
patterns = ['h[o]la', 'h[^o]la'] 

find_patterns(patterns, text)

['hola']
['hala', 'hela', 'hila', 'hula']
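
Note that a negated set matches any character that is not listed, not only letters. The sketch below uses a made-up string to show this, and also shows that `^` only negates when it is the first character in the set.

```python
import re

text = "hola h1la h_la h^la"

# [^o] accepts digits and symbols too, as long as the character is not "o"
not_o = re.findall('h[^o]la', text)

# Several characters and ranges can be excluded at once
not_o_or_digit = re.findall('h[^o0-9]la', text)

# When ^ is not first, it is just a literal character in the set
o_or_caret = re.findall('h[o^]la', text)

print(not_o)
print(not_o_or_digit)
print(o_or_caret)
```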


##### Ranges [-]:

Another feature that makes sets very powerful is the ability to define ranges. Examples of ranges:

* **[A-Z]:** Any uppercase letter (no digits or special characters).
* **[a-z]:** Any lowercase letter (no digits or special characters).
* **[A-Za-z]:** Any letter, uppercase or lowercase (no digits or special characters).
* **[A-z]:** Any uppercase or lowercase letter, plus the six characters that sit between `Z` and `a` in ASCII (`[`, `\`, `]`, `^`, `_` and the backtick). Prefer **[A-Za-z]** when you want letters only.
* **[0-9]:** Any digit (no letters or special characters).
* **[a-zA-Z0-9]:** Any alphanumeric character (no special characters).

Keep in mind that any range can be negated with **^** to match the opposite set; for example, **[^0-9]** matches any character that is not a digit.

In [32]:
text = "hola h0la Hola mola m0la M0la defs"
patterns = ['h[a-z]la', 'h[0-9]la', '[A-z]{4}', '[A-z][A-z0-9]{3}'] 

find_patterns(patterns, text)

['hola']
['h0la']
['hola', 'Hola', 'mola', 'defs']
['hola', 'h0la', 'Hola', 'mola', 'm0la', 'M0la', 'defs']
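
The example above gives the same results with `[A-z]` and `[A-Za-z]` because the text contains no symbols. The following sketch, on a made-up string, shows where the two ranges differ: `[A-z]` also accepts the ASCII characters located between `Z` and `a`, such as `_`, `[` and `]`.

```python
import re

text = "snake_case [x]"

# [A-z] spans ASCII codes 65 to 122, which includes [ \ ] ^ _ and `
wide_range = re.findall('[A-z]+', text)

# [A-Za-z] matches letters only
letters_only = re.findall('[A-Za-z]+', text)

print(wide_range)
print(letters_only)
```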


If we had to write out ranges every time we wanted to define a variable pattern, we would end up with giant regular expressions. Luckily, the syntax also accepts a series of escape sequences that act as shorthand for common sets. Some of the most important are:

<p style='text-align: left;'> 

| Code |                  Meaning                  |
|:----:|:-----------------------------------------:|
|  \d  |                   digit                   |
|  \D  |                 non-digit                 |
|  \s  |     whitespace (space, tab, newline)      |
|  \S  |              non-whitespace               |
|  \w  | word character (alphanumeric or underscore) |
|  \W  |            non-word character             |

</p>

In [33]:
text = "This is going to be over soon 2019"
patterns = [r'\d', r'\d+', r'\D', r'\D+'] 

find_patterns(patterns, text)

['2', '0', '1', '9']
['2019']
['T', 'h', 'i', 's', ' ', 'i', 's', ' ', 'g', 'o', 'i', 'n', 'g', ' ', 't', 'o', ' ', 'b', 'e', ' ', 'o', 'v', 'e', 'r', ' ', 's', 'o', 'o', 'n', ' ']
['This is going to be over soon ']


In [34]:
# Now we put a number in the middle of the string

text = "This is going to 08 be over soon 2019"
patterns = [r'\d', r'\d+', r'\D', r'\D+', r'\s', r'\S', r'\S+', r'\w', r'\w+',r'\W',r'\W+'] 

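# For this text, \S+ gives the same result as \w+ because the text has no punctuation
# Likewise, \s, \W and \W+ give the same result because the only non-word characters are single spaces;
# with two or more consecutive spaces, \W+ would return each run as a single match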

find_patterns(patterns, text)

['0', '8', '2', '0', '1', '9']
['08', '2019']
['T', 'h', 'i', 's', ' ', 'i', 's', ' ', 'g', 'o', 'i', 'n', 'g', ' ', 't', 'o', ' ', ' ', 'b', 'e', ' ', 'o', 'v', 'e', 'r', ' ', 's', 'o', 'o', 'n', ' ']
['This is going to ', ' be over soon ']
[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
['T', 'h', 'i', 's', 'i', 's', 'g', 'o', 'i', 'n', 'g', 't', 'o', '0', '8', 'b', 'e', 'o', 'v', 'e', 'r', 's', 'o', 'o', 'n', '2', '0', '1', '9']
['This', 'is', 'going', 'to', '08', 'be', 'over', 'soon', '2019']
['T', 'h', 'i', 's', 'i', 's', 'g', 'o', 'i', 'n', 'g', 't', 'o', '0', '8', 'b', 'e', 'o', 'v', 'e', 'r', 's', 'o', 'o', 'n', '2', '0', '1', '9']
['This', 'is', 'going', 'to', '08', 'be', 'over', 'soon', '2019']
[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
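
The equivalences seen in the last cell depend on the text. As a rough illustration with a made-up sentence that contains punctuation, `\S+` and `\w+` stop agreeing, and `\W+` starts returning more than single spaces. The last line also shows that `\w` accepts the underscore.

```python
import re

text = "Over soon, in 2019!"

words = re.findall(r'\w+', text)        # runs of word characters
non_spaces = re.findall(r'\S+', text)   # runs of non-whitespace, punctuation included
non_words = re.findall(r'\W+', text)    # runs of spaces and punctuation
spaces = re.findall(r'\s', text)        # single whitespace characters

identifier = re.findall(r'\w+', "my_var = 1")

print(words)
print(non_spaces)
print(non_words)
print(spaces)
print(identifier)
```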
