# Strings and Regular Expressions

## 郭耀仁

## Working with strings like a list

- A string can be indexed and sliced with square brackets, much like a list

In [1]:
luke = 'Luke Skywalker'
print(len(luke))
print(luke[0])
print(luke[:4])
print(luke[-1])

14
L
Luke
r
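
As a small sketch of where the list analogy holds and where it stops: strings accept the same slicing syntax as lists (including a step), but they are immutable, so item assignment raises `TypeError`. The variable names below are illustrative only.

```python
luke = 'Luke Skywalker'

# Slicing with a step, and reversing, work just as they do on a list
every_other = luke[::2]
reversed_name = luke[::-1]
print(every_other)
print(reversed_name)

# Unlike a list, a string is immutable: item assignment fails
try:
    luke[0] = 'D'
    assignment_worked = True
except TypeError:
    assignment_worked = False
print(assignment_worked)
```

To "change" a string, build a new one instead, for example with slicing and concatenation or with `.replace()`.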


## Common string methods: changing case

- `.lower()`
- `.upper()`
- `.capitalize()`
- `.title()`
- `.swapcase()`

In [2]:
print(luke.lower())
print(luke.upper())
print(luke.lower().capitalize())
print(luke.upper().title())
print(luke.swapcase())

luke skywalker
LUKE SKYWALKER
Luke skywalker
Luke Skywalker
lUKE sKYWALKER


## Common string methods: removing leading/trailing whitespace

- `.lstrip()`
- `.rstrip()`
- `.strip()`

In [3]:
luke = ' Luke Skywalker '
luke.lstrip()

'Luke Skywalker '

In [4]:
luke.rstrip()

' Luke Skywalker'

In [5]:
luke.strip()

'Luke Skywalker'
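
A brief sketch of two details the cells above do not show: with no argument the strip methods remove every kind of surrounding whitespace (tabs and newlines included), and with an argument they remove any of the listed characters instead. The sample strings are made up for illustration.

```python
# With no argument, surrounding tabs, spaces, and newlines are all removed
raw = '\t Luke Skywalker \n'
clean = raw.strip()
print(repr(clean))

# With an argument, the listed characters are stripped instead of whitespace
padded = '--Luke Skywalker--'
unpadded = padded.strip('-')
left_only = padded.lstrip('-')
print(unpadded)
print(left_only)
```

Note that none of these methods touch whitespace in the middle of the string.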

## Common string methods: splitting on a separator

- `.split()`

In [6]:
luke = 'Luke Skywalker'
given_name = luke.split(' ')[0]
family_name = luke.split(' ')[1]
print(given_name)
print(family_name)

Luke
Skywalker


In [7]:
luke = 'Skywalker, Luke'
given_name = luke.split(', ')[1]
family_name = luke.split(', ')[0]
print(given_name)
print(family_name)

Luke
Skywalker
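
The cells above call `.split()` twice and index into each result. A minimal alternative sketch, assuming the name has exactly two parts, unpacks a single call. It also shows how `.split(' ')` and `.split()` differ on repeated spaces, and how the optional second argument (`maxsplit`) limits the number of splits. The sample strings are illustrative.

```python
luke = 'Luke Skywalker'

# Split once and unpack; this raises ValueError if there are not exactly two parts
given_name, family_name = luke.split(' ')
print(given_name, family_name)

# An explicit ' ' separator keeps empty strings between repeated spaces,
# while split() with no argument treats any run of whitespace as one separator
messy = 'Luke   Skywalker'
explicit = messy.split(' ')
default = messy.split()
print(explicit)
print(default)

# maxsplit=1 stops after the first separator
first, rest = 'Leia Organa Solo'.split(' ', 1)
print(first, '|', rest)
```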


## `re.split()` from the regular expression module

In [8]:
import re

luke = 'Luke Skywalker'
# Use raw strings for patterns so that \s reaches the regex engine unchanged
given_name = re.split(r'\s', luke)[0]
family_name = re.split(r'\s', luke)[1]
print(given_name)
print(family_name)

Luke
Skywalker


In [9]:
import re

luke = 'Skywalker, Luke'
# Raw string pattern: a comma followed by one whitespace character
given_name = re.split(r',\s', luke)[1]
family_name = re.split(r',\s', luke)[0]
print(given_name)
print(family_name)

Luke
Skywalker
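
One reason to reach for `re.split()` over `.split()` is that a single pattern can absorb inconsistent spacing. The following is only a sketch: the pattern `\s*,\s*` and the sample inputs are assumptions chosen for illustration, not part of the original material.

```python
import re

# The same name written with different spacing around the comma
names = ['Skywalker, Luke', 'Skywalker,Luke', 'Skywalker ,   Luke']

# A comma surrounded by any amount of whitespace (including none)
pattern = r'\s*,\s*'
parts = [re.split(pattern, name) for name in names]
print(parts)
```

With `.split(', ')`, only the first of these three inputs would be split cleanly.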


## Common string methods: finding a substring

- `.find()`
- `.rfind()`
- `.index()`
- `.rindex()`

In [10]:
luke = 'Luke Skywalker'
print(luke.find('ke'))
print(luke.rfind('ke'))
print(luke.index('ke'))
print(luke.rindex('ke'))

2
11
2
11


## `.find()` and `.index()` differ in how they react when the substring is not found

In [11]:
luke = 'Luke Skywalker'
print(luke.find('sk'))
print(luke.rfind('sk'))
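# .index() and .rindex() raise ValueError instead of returning -1,
# so these two lines are left commented out
#print(luke.index('sk'))
#print(luke.rindex('sk'))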

-1
-1
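
The difference can be sketched as follows: `.find()` signals a miss with the return value `-1`, while `.index()` raises `ValueError`, which has to be caught. The flag variables are illustrative names only.

```python
luke = 'Luke Skywalker'

# .find() returns -1 on a miss, so compare against -1 explicitly
position = luke.find('sk')
found_with_find = position != -1

# .index() raises ValueError on a miss
try:
    luke.index('sk')
    found_with_index = True
except ValueError:
    found_with_index = False

# A match at position 0 is falsy, so do not use the result of .find() as a boolean;
# the `in` operator is the clearer membership test
starts_at_zero = luke.find('Lu')
contains_lu = 'Lu' in luke
print(found_with_find, found_with_index, starts_at_zero, contains_lu)
```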


## `re.findall()` from the regular expression module

In [12]:
import re

luke = 'Luke Skywalker'
re.findall('ke', luke)

['ke', 'ke']

In [13]:
luke = 'Luke Skywalker'
re.findall('sk', luke)

[]
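
The empty result above comes from case sensitivity: the string contains `Sk`, not `sk`. As a sketch, passing `re.IGNORECASE` makes the search case-insensitive, and `re.finditer()` recovers the positions that `.find()` and `.rfind()` reported earlier.

```python
import re

luke = 'Luke Skywalker'

# Case-insensitive search returns the text as it appears in the string
matches = re.findall('sk', luke, flags=re.IGNORECASE)
print(matches)

# re.finditer() yields match objects, which carry the start position of each hit
starts = [m.start() for m in re.finditer('ke', luke)]
print(starts)
```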

## Common string methods: replacing

- `.replace()`

In [14]:
luke = 'Luke Skywalker'
anakin = luke.replace('Luke', 'Anakin')
print(anakin)
print(anakin.replace(' ', ''))

Anakin Skywalker
AnakinSkywalker


## `re.sub()` from the regular expression module

In [15]:
import re

luke = 'Luke Skywalker'
anakin = re.sub('Luke', 'Anakin', luke)
print(anakin)
print(re.sub(r'\s', '', anakin))

Anakin Skywalker
AnakinSkywalker
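
For fixed text, `re.sub()` does nothing that `.replace()` cannot. It becomes useful when the replacement depends on what was matched. The sketch below uses capture groups and backreferences (`\1`, `\2`) to reorder a name; the pattern is an assumption that fits simple two-part names only.

```python
import re

luke = 'Skywalker, Luke'

# Capture the family name and the given name, then emit them in swapped order
reordered = re.sub(r'(\w+),\s*(\w+)', r'\2 \1', luke)
print(reordered)

# Collapse any run of whitespace into a single space
collapsed = re.sub(r'\s+', ' ', 'Luke    Skywalker')
print(collapsed)
```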


## Applying a function to many strings

- Use `map()` with a `lambda` to apply it to each element of a list
- Use the pandas `.str` methods to apply it to a Series

## Example: the `.upper()` method

In [16]:
# map lambda function
skywalkers = ['Anakin Skywalker', 'Luke Skywalker']
list(map(lambda x: x.upper(), skywalkers))

['ANAKIN SKYWALKER', 'LUKE SKYWALKER']
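
As a side note, the `lambda` is not strictly required here. A minimal sketch of two equivalent spellings: passing the method `str.upper` directly to `map()`, and using a list comprehension.

```python
skywalkers = ['Anakin Skywalker', 'Luke Skywalker']

# The form used in the cell above
with_lambda = list(map(lambda x: x.upper(), skywalkers))

# str.upper can be passed to map() directly, with no lambda
with_method = list(map(str.upper, skywalkers))

# A list comprehension gives the same result
with_comprehension = [name.upper() for name in skywalkers]

print(with_lambda == with_method == with_comprehension)
```

The `lambda` form is still needed when the method takes extra arguments, as in the `.split(' ')` and `.replace(' ', '')` examples.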

In [17]:
# Series
import pandas as pd

ser = pd.Series(skywalkers)
ser.str.upper()

0    ANAKIN SKYWALKER
1      LUKE SKYWALKER
dtype: object

## Example: the `.split()` method

In [18]:
# map lambda function
skywalkers = ['Anakin Skywalker', 'Luke Skywalker']
list(map(lambda x: x.split(' '), skywalkers))

[['Anakin', 'Skywalker'], ['Luke', 'Skywalker']]

In [19]:
# Series
import pandas as pd

ser = pd.Series(skywalkers)
ser.str.split(' ')

0    [Anakin, Skywalker]
1      [Luke, Skywalker]
dtype: object

## Example: the `.replace()` method

In [20]:
# map lambda function
skywalkers = ['Anakin Skywalker', 'Luke Skywalker']
list(map(lambda x: x.replace(' ', ''), skywalkers))

['AnakinSkywalker', 'LukeSkywalker']

In [21]:
# Series
import pandas as pd

ser = pd.Series(skywalkers)
ser.str.replace(' ', '')

0    AnakinSkywalker
1      LukeSkywalker
dtype: object

## Further reading

- [re — Regular expression operations](https://docs.python.org/3/library/re.html#module-re)
- [Python Strings](https://developers.google.com/edu/python/strings)