# Chapter 2
# Strings and Text

## Table of contents
1. [Splitting Strings on Any of Multiple Delimiters](#2.1-Splitting-Strings-on-Any-of-Multiple-Delimiters)

## 2.1 Splitting Strings on Any of Multiple Delimiters

You need to split a string into fields, but the delimiters (and the spacing around them) are not consistent throughout the string. The `str.split()` method works for simple cases, but it accepts only a single fixed separator, so it cannot handle several different delimiters at once.
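To see why `str.split()` falls short here, consider a minimal sketch on the same sample string. With no argument, `str.split()` splits only on runs of whitespace and leaves punctuation attached to the words. With an explicit argument, it splits only on that one separator.

```python
# Sample string reused from the example below
line = "This is my; life and I doubt, it will turn. out.         well"

# No argument: split on runs of whitespace only, so ';', ',' and '.' stay attached
print(line.split())
# ['This', 'is', 'my;', 'life', 'and', 'I', 'doubt,', 'it', 'will', 'turn.', 'out.', 'well']

# Explicit separator: only that exact string is used, giving 2 fields here
print(len(line.split(';')))
# 2
```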

If you need more flexibility, use the `re.split()` function instead:

```python
import re

line = "This is my; life and I doubt, it will turn. out.         well"
print(re.split(r'[;,.\s]\s*', line))
```

```
['This', 'is', 'my', 'life', 'and', 'I', 'doubt', 'it', 'will', 'turn', 'out', 'well']
```


`re.split()` is useful because a single pattern can describe several alternative separators. In the example above, a separator is a semicolon, a comma, a dot, or a whitespace character, followed by any amount of additional whitespace (`\s*`). Wherever the pattern matches, the entire match becomes the delimiter between the fields on either side of it. The result is a list of fields, just as with `str.split()`.
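One way to see exactly which text the pattern treats as a delimiter is to list its matches with `re.finditer()`. In the sketch below (the `pattern` and `delimiters` names are illustrative), each match, including the whitespace that follows a punctuation mark, is one delimiter, so `re.split()` returns one more field than there are delimiters. Note that inside a character class such as `[;,.\s]`, `.` is a literal dot, so it needs no escaping there.

```python
import re

line = "This is my; life and I doubt, it will turn. out.         well"
pattern = r'[;,.\s]\s*'

# Each match of the pattern is one delimiter
delimiters = [m.group() for m in re.finditer(pattern, line)]
print(len(delimiters))      # 11
print(repr(delimiters[2]))  # '; ' (the semicolon plus the space after it)

# re.split() returns the text between those matches
fields = re.split(pattern, line)
print(len(fields))          # 12
```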

When using `re.split()`, be careful if the pattern contains a capture group (a part of the pattern enclosed in parentheses). If capture groups are used, the matched text is also included in the result:

```python
# '.' must be escaped here: outside a character class it matches any character
fields = re.split(r'(;|,|\.|\s)\s*', line)
print(fields)
```

```
['This', ' ', 'is', ' ', 'my', ';', 'life', ' ', 'and', ' ', 'I', ' ', 'doubt', ',', 'it', ' ', 'will', ' ', 'turn', '.', 'out', '.', 'well']
```
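Capturing the separators is useful when you want to keep them, for example to rebuild the line after transforming the fields. A minimal sketch, assuming a pattern with exactly one capture group so that values and separators alternate in the result (the `values`, `delimiters`, and `rebuilt` names are illustrative):

```python
import re

line = "This is my; life and I doubt, it will turn. out.         well"
fields = re.split(r'(;|,|\.|\s)\s*', line)

# With one capture group the result alternates: value, separator, value, ...
values = fields[::2]
delimiters = fields[1::2] + ['']  # pad so zip() pairs every value with a separator

# Rebuild the text after transforming each value
rebuilt = ''.join(v.upper() + d for v, d in zip(values, delimiters))
print(rebuilt)
# THIS IS MY;LIFE AND I DOUBT,IT WILL TURN.OUT.WELL
```

Because the group captures only the single delimiter character and `\s*` sits outside it, the whitespace after each punctuation mark is not captured and does not reappear in the rebuilt string.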


If you don't want the separators in the result but still need parentheses to group parts of the pattern, use a non-capturing group, written `(?:...)`:

```python
print(re.split(r'(?:;|,|\.|\s)\s*', line))
```

```
['This', 'is', 'my', 'life', 'and', 'I', 'doubt', 'it', 'will', 'turn', 'out', 'well']
```
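In an alternation such as `(?:;|,|.|\s)`, an unescaped `.` is a metacharacter that matches any character, which turns every character of the string into a delimiter and leaves only empty fields. A sketch of the difference, assuming the literal delimiters are kept in a list (`delims` and `pattern` are illustrative names) and escaped with `re.escape()` when the pattern is built:

```python
import re

line = "This is my; life and I doubt, it will turn. out.         well"

# Unescaped '.': every character is a delimiter, so every field is empty
print(set(re.split(r'(?:;|,|.|\s)\s*', line)))  # {''}

# Escape each literal delimiter before joining them into an alternation
delims = [';', ',', '.']
pattern = '(?:' + '|'.join(map(re.escape, delims)) + r'|\s)\s*'
print(re.split(pattern, line))
# ['This', 'is', 'my', 'life', 'and', 'I', 'doubt', 'it', 'will', 'turn', 'out', 'well']
```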

