# Python Efficient Programming Tips 2

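10. [How to iterate in reverse, and how to implement reverse iteration?](#10)
11. [How to slice an iterator?](#11)
12. [How to iterate over multiple iterables in a single for statement?](#12)
13. [How to split a string that contains multiple kinds of delimiters?](#13)
14. [How to check whether string a starts or ends with string b?](#14)
15. [How to reformat text within a string?](#15)
16. [How to join many small strings into one large string?](#16)
17. [How to left-, right-, or center-align a string?](#17)
18. [How to remove unwanted characters from a string?](#18)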

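## <span id='10'>10 How to iterate in reverse, and how to implement reverse iteration?</span>
- Implement the `__reversed__` method of the reverse-iteration protocol; it returns a reverse iterator, and the built-in `reversed()` calls it.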

In [2]:
l = [1,2,3,4,5]
l.reverse()
l  # reverse() modifies the original list in place, which is often not acceptable

[5, 4, 3, 2, 1]

In [4]:
l = [1,2,3,4,5]
l[::-1]  # builds a new list, which can be wasteful for large lists

[5, 4, 3, 2, 1]

In [5]:
# reverse iteration over the list
reversed(l)

<list_reverseiterator at 0x1064483c8>

In [6]:
# forward iteration over the list
iter(l)

<list_iterator at 0x106448208>

In [7]:
for x in reversed(l):
    print(x)

5
4
3
2
1


In [9]:
l.__iter__

<method-wrapper '__iter__' of list object at 0x106422d08>

In [12]:
help(l.__reversed__)

Help on built-in function __reversed__:

__reversed__(...) method of builtins.list instance
    L.__reversed__() -- return a reverse iterator over the list
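`reversed()` does not strictly require `__reversed__`: according to the Python documentation it also accepts any object that supports the sequence protocol (`__len__` plus `__getitem__` with integer indices starting at 0). A generator supports neither, so `reversed()` rejects it. A minimal sketch; the class `Squares` and function `gen` are hypothetical examples:

```python
class Squares:
    """Hypothetical sequence of i*i for i in range(n); it defines no __reversed__."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        # reversed() requests indices len-1, len-2, ..., 0
        if not 0 <= i < self.n:
            raise IndexError(i)
        return i * i


print(list(reversed(Squares(5))))  # [16, 9, 4, 1, 0]


def gen():
    yield from range(3)


try:
    reversed(gen())
except TypeError as exc:
    # generators have neither __reversed__ nor __len__/__getitem__
    print('TypeError:', exc)
```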



**Example: implement a floating-point range generator FloatRange, similar to the integer range()**

In [14]:
class FloatRange:
    def __init__(self, start, end, step=0.1):
        self.start = start
        self.end = end
        self.step = step
    
    def __iter__(self):
        t = self.start
        while t <= self.end:
            yield t
            t += self.step
    
    def __reversed__(self):
        t = self.end
        while t >= self.start:
            yield t
            t -= self.step

for x in FloatRange(1.0, 4.0, 0.5):
    print(x)

1.0
1.5
2.0
2.5
3.0
3.5
4.0


In [15]:
for x in reversed(FloatRange(1.0, 4.0, 0.5)):
    print(x)

4.0
3.5
3.0
2.5
2.0
1.5
1.0
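One caveat: FloatRange above accumulates `t += self.step`, so floating-point rounding errors build up. With step 0.5 every value is exact, but with step 0.1 (the default) `0.1 + 0.1 + 0.1` is `0.30000000000000004`, which is greater than `0.3`, so `FloatRange(0.0, 0.3, 0.1)` silently stops before its end value. A sketch of one common workaround computes each value as `start + i * step` from a precomputed step count. The `1e-9` tolerance is an arbitrary assumption, and the sketch assumes `step > 0`:

```python
import math


class FloatRange:
    """Index-based variant of FloatRange: value i is start + i * step, so
    rounding errors do not accumulate from one step to the next."""

    def __init__(self, start, end, step=0.1):
        self.start = start
        self.end = end
        self.step = step
        # number of whole steps from start to end; the 1e-9 tolerance (an
        # assumption) absorbs results such as 0.3 / 0.1 == 2.9999999999999996
        self.n = math.floor((end - start) / step + 1e-9)

    def __iter__(self):
        for i in range(self.n + 1):
            yield self.start + i * self.step

    def __reversed__(self):
        for i in range(self.n, -1, -1):
            yield self.start + i * self.step


print([round(x, 10) for x in FloatRange(0.0, 0.3, 0.1)])  # [0.0, 0.1, 0.2, 0.3]
print(list(reversed(FloatRange(1.0, 4.0, 0.5))))
```

If `end < start`, the step count is negative and both directions yield nothing, which matches the original class.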


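## <span id='11'>11 How to slice an iterator?</span>
Suppose we have a text file and want to read a certain range of it, for example lines 100 to 300. A Python text file object is iterable, so can we use list-style slicing to get a generator over lines 100-300 of the file? Does the code below work?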
```python
f = open('/var/log/wifi.log')
f[100:300]
```

In [18]:
cat -n /var/log/wifi.log

     1	Jul 23 00:30:00 leipengdeMBP newsyslog[1015]: logfile turned over
     2	Mon Jul 23 09:00:32.616 ***Starting Up***
     3	Mon Jul 23 09:00:32.651 <kernel> IO80211Controller::addSubscriptionForThisReporterFetchedOnTimer() Failed to addSubscription for group Interface p2p0 subgroup Data Packets driver 0x419f4e8031888eff - data underrun
     4	Mon Jul 23 09:00:32.651 <kernel> IO80211InterfaceMonitor::configureSubscriptions() failed to add subscription
     5	Mon Jul 23 09:00:32.655 <kernel>  Creating all peerManager reporters
     6	Mon Jul 23 09:00:32.686 <airportd[113]> airportdProcessDLILEvent: en0 attached (up)
     7	Mon Jul 23 09:00:32.700 <kernel> wl0: setAWDL_PEER_TRAFFIC_REGISTRATION: active 0, roam_off: 0, err 0 roam_start_set 0 forced_roam_set 0
     8	Mon Jul 23 09:00:33.304 <kernel> Setting BTCoex Profile: band:8
     9	Mon Jul 23 09:00:33.304 <kernel> Profile[0]: mode:7; desense:0; desense_level:0; chain_power_offset:<kernel> 0,<kernel> 0,<kernel> 0,<kernel> 0

In [24]:
f = open('/var/log/wifi.log')
f[50:100]  # file objects cannot be sliced directly

TypeError: '_io.TextIOWrapper' object is not subscriptable

In [25]:
# What about readlines()?
lines = f.readlines()
lines[50:60]

['Mon Jul 23 09:00:34.208 <kernel> GTK:\n',
 'Mon Jul 23 09:00:34.208 [00000000] FA 7E EF 35 08 F2 3A 61 AC AB 79 E8 3D 28 23 44 \n',
 'Mon Jul 23 09:00:34.208 <kernel> installGTK: GTK installed\n',
 'Mon Jul 23 09:00:35.134 <kernel> Setting BTCoex Config: enable_2G:1, profile_2g:0, enable_5G:1, profile_5G:0\n',
 'Mon Jul 23 09:00:37.230 <airportd[113]> _processIPv4Changes: ARP/NDP offloads disabled, not programming the offload\n',
 'Mon Jul 23 09:00:37.245 <airportd[113]> _processIPv4Changes: ARP/NDP offloads disabled, not programming the offload\n',
 'Mon Jul 23 09:00:41.990 <airportd[113]> _processIPv4Changes: ARP/NDP offloads disabled, not programming the offload\n',
 'Mon Jul 23 09:00:57.947 <kernel> en0: roam event, sending supplicant link down message.\n',
 'Mon Jul 23 09:00:58.370 <kernel> IO80211AssociationJoinSnapshot::captureRequestCallback Problem reported from corecapture\n',
 'Mon Jul 23 10:45:12.550 <kernel> IO80211AssociationJoinSnapshot::captureRequestCallback Problem 

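readlines() reads the entire file into memory at once. Some log files can be several GB in size, which would be a disaster.
#### The best way to read a text file is still the iteration protocol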

In [26]:
for line in f:
    print(line)

Nothing is printed, because the earlier readlines() call already moved the file position to the end of the file.

In [27]:
f.seek(0)
for line in f:
    print(line)

Jul 23 00:30:00 leipengdeMBP newsyslog[1015]: logfile turned over

Mon Jul 23 09:00:32.616 ***Starting Up***

Mon Jul 23 09:00:32.651 <kernel> IO80211Controller::addSubscriptionForThisReporterFetchedOnTimer() Failed to addSubscription for group Interface p2p0 subgroup Data Packets driver 0x419f4e8031888eff - data underrun

Mon Jul 23 09:00:32.651 <kernel> IO80211InterfaceMonitor::configureSubscriptions() failed to add subscription

Mon Jul 23 09:00:32.655 <kernel>  Creating all peerManager reporters

Mon Jul 23 09:00:32.686 <airportd[113]> airportdProcessDLILEvent: en0 attached (up)

Mon Jul 23 09:00:32.700 <kernel> wl0: setAWDL_PEER_TRAFFIC_REGISTRATION: active 0, roam_off: 0, err 0 roam_start_set 0 forced_roam_set 0

Mon Jul 23 09:00:33.304 <kernel> Setting BTCoex Profile: band:8

Mon Jul 23 09:00:33.304 <kernel> Profile[0]: mode:7; desense:0; desense_level:0; chain_power_offset:<kernel> 0,<kernel> 0,<kernel> 0,<kernel> 0,<kernel> 

Mon Jul 23 09:00:33.304 <kernel> Profile[1]: mode:7

**Use itertools.islice from the standard library; it returns an iterator that lazily yields a slice of any iterable**

In [28]:
from itertools import islice
help(islice)

Help on class islice in module itertools:

class islice(builtins.object)
 |  islice(iterable, stop) --> islice object
 |  islice(iterable, start, stop[, step]) --> islice object
 |  
 |  Return an iterator whose next() method returns selected values from an
 |  iterable.  If start is specified, will skip all preceding elements;
 |  otherwise, start defaults to zero.  Step defaults to one.  If
 |  specified as another value, step determines how many values are 
 |  skipped between successive calls.  Works like a slice() on a list
 |  but returns an iterator.
 |  
 |  Methods defined here:
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.
 |  
 |  __next__(self, /)
 |      Implement next(self).
 |  
 |  __reduce__(...)
 |      Return state information for pickling.
 |  
 |  

In [29]:
islice(f, 50, 100)

<itertools.islice at 0x1064fbc28>

In [32]:
f.seek(0)
for line in islice(f, 50, 100):
    print(line)

Mon Jul 23 09:00:34.208 <kernel> GTK:

Mon Jul 23 09:00:34.208 [00000000] FA 7E EF 35 08 F2 3A 61 AC AB 79 E8 3D 28 23 44 

Mon Jul 23 09:00:34.208 <kernel> installGTK: GTK installed

Mon Jul 23 09:00:35.134 <kernel> Setting BTCoex Config: enable_2G:1, profile_2g:0, enable_5G:1, profile_5G:0

Mon Jul 23 09:00:37.230 <airportd[113]> _processIPv4Changes: ARP/NDP offloads disabled, not programming the offload

Mon Jul 23 09:00:37.245 <airportd[113]> _processIPv4Changes: ARP/NDP offloads disabled, not programming the offload

Mon Jul 23 09:00:41.990 <airportd[113]> _processIPv4Changes: ARP/NDP offloads disabled, not programming the offload

Mon Jul 23 09:00:57.947 <kernel> en0: roam event, sending supplicant link down message.

Mon Jul 23 09:00:58.370 <kernel> IO80211AssociationJoinSnapshot::captureRequestCallback Problem reported from corecapture

Mon Jul 23 10:45:12.550 <kernel> IO80211AssociationJoinSnapshot::captureRequestCallback Problem reported from corecapture

Mon Jul 23 12:06:18.

In [33]:
islice(f, 500)  # the first 500 lines

<itertools.islice at 0x1064fbbd8>

In [35]:
islice(f, 100, None)  # from index 100 (the 101st line) to the end of the file

<itertools.islice at 0x1064fbd18>
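Because /var/log/wifi.log exists only on the author's machine, here is a self-contained sketch that uses `io.StringIO` as a stand-in file (an assumption for the demo). Note that islice indices are 0-based, so `islice(f, 100, 300)` yields the 101st through 300th lines:

```python
import io
from itertools import islice

# In-memory stand-in for a log file: "line 1\n" ... "line 10\n"
f = io.StringIO(''.join('line {}\n'.format(i) for i in range(1, 11)))

first3 = list(islice(f, 3))         # islice(f, stop): the first 3 lines
f.seek(0)                           # rewind the file; islice cannot do this itself
tail = list(islice(f, 7, None))     # islice(f, start, None): index 7 to the end
f.seek(0)
stepped = list(islice(f, 2, 8, 2))  # indices 2, 4, 6 -> lines 3, 5, 7

print(first3)   # ['line 1\n', 'line 2\n', 'line 3\n']
print(tail)     # ['line 8\n', 'line 9\n', 'line 10\n']
print(stepped)  # ['line 3\n', 'line 5\n', 'line 7\n']
```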

In [36]:
islice(f, 100, -100)

ValueError: Indices for islice() must be None or an integer: 0 <= x <= sys.maxsize.
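islice cannot know in advance where an iterator ends, so it rejects negative indices. If you need "everything except the last N lines", one workaround is to hold the most recent N items in a `collections.deque` and emit an item only once it is known not to be among the last N; memory stays proportional to N. This is a sketch, and the helper name `islice_drop_tail` is hypothetical. A deque with `maxlen` also gives just the last N items directly:

```python
from collections import deque
from itertools import islice


def islice_drop_tail(iterable, start, tail):
    """Hypothetical helper: yield items from index `start` onward, except the
    last `tail` items (roughly what islice(it, start, -tail) would mean)."""
    buf = deque()
    for item in islice(iterable, start, None):
        buf.append(item)
        if len(buf) > tail:
            # buf holds tail + 1 items, so the oldest one cannot be in the tail
            yield buf.popleft()


print(list(islice_drop_tail(range(10), 2, 3)))  # [2, 3, 4, 5, 6]

# Only the last 3 items: a bounded deque discards older entries
print(list(deque(range(10), maxlen=3)))         # [7, 8, 9]
```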

**islice consumes the iterator it is given (it advances it); an example makes this clear**

In [41]:
l = range(20)
l

range(0, 20)

In [42]:
t = iter(l)

In [43]:
for x in islice(t, 5, 10):  # t is advanced from 0; the values before index 5 are discarded
    print(x)

5
6
7
8
9


In [44]:
for x in t:  # t has already been advanced past 9
    print(x)

10
11
12
13
14
15
16
17
18
19


So to slice again from the start, create a fresh iterator; neither t nor the islice object can be rewound.
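The consumption happens only because `t` is an iterator. If islice is given a re-iterable object such as a range or a list, it calls `iter()` on it and gets a fresh iterator each time, so repeated slices start from the beginning. A small sketch:

```python
from itertools import islice

r = range(20)
t = iter(r)

a = list(islice(r, 5, 10))  # a new iterator over r is created here...
b = list(islice(r, 5, 10))  # ...and again here, so b == a
c = list(islice(t, 5, 10))  # consumes 0..9 from t
d = list(islice(t, 5, 10))  # t resumes at 10: skips 10..14, yields 15..19

print(a, b)  # [5, 6, 7, 8, 9] [5, 6, 7, 8, 9]
print(c, d)  # [5, 6, 7, 8, 9] [15, 16, 17, 18, 19]
```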

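## <span id='12'>12 How to iterate over multiple iterables in a single for statement?</span>
- In parallel: use the built-in function zip, which combines several iterables and yields one tuple per iteration
- In series: use itertools.chain from the standard library, which concatenates several iterables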

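#### Parallel problem: a class's final exam scores for Chinese, math, and English are stored in 3 separate lists. Iterate over the three lists together to compute each student's total score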

In [48]:
# Traditional approach: index-based
from random import randint

chinese = [randint(60, 100) for _ in range(20)]
math = [randint(60, 100) for _ in range(20)]
english = [randint(60, 100) for _ in range(20)]

total = []
for i in range(len(math)):
    total.append(chinese[i] + math[i] + english[i])

The approach above works, but it has a limitation: not every iterable supports indexing. Generators are one example.
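A sketch of that limitation: the hypothetical generator `scores` below cannot be indexed, but zip only needs each argument to be iterable. The short lists of scores are made-up sample data:

```python
def scores(values):
    """Hypothetical generator that yields one score at a time (no indexing)."""
    for v in values:
        yield v


ch = scores([80, 90, 70])
ma = scores([85, 75, 95])
en = scores([60, 88, 92])

# ch[0] would raise TypeError: 'generator' object is not subscriptable,
# but zip simply pulls one item from each generator per iteration
total = [c + m + e for c, m, e in zip(ch, ma, en)]
print(total)  # [225, 253, 257]
```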

In [52]:
zip([1,2,3,4], ('a', 'b', 'c', 'd'))

<zip at 0x1064e6308>

In [53]:
list(zip([1,2,3,4], ('a', 'b', 'c', 'd')))

[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]

In [55]:
list(zip([1,2,3,4], ('a', 'b', 'c')))  # lengths may differ; zip stops at the shortest iterable

[(1, 'a'), (2, 'b'), (3, 'c')]
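Note that zip silently drops the extra items of the longer iterables. If they should be kept, `itertools.zip_longest` pads the shorter ones with `fillvalue` (None by default):

```python
from itertools import zip_longest

pairs = list(zip_longest([1, 2, 3, 4], ('a', 'b', 'c'), fillvalue='-'))
print(pairs)  # [(1, 'a'), (2, 'b'), (3, 'c'), (4, '-')]
```

On Python 3.10 and later, `zip(..., strict=True)` instead raises ValueError when the lengths differ.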

In [56]:
total = []

for c, m, e in zip(chinese, math, english):
    total.append(c + m + e)
total

[246,
 221,
 226,
 282,
 218,
 243,
 223,
 250,
 214,
 209,
 214,
 219,
 244,
 230,
 204,
 246,
 219,
 230,
 204,
 249]

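#### Serial problem: a grade has 4 classes, and each class's English scores from one exam are stored in 4 separate lists. Iterate over the lists one after another and count how many students in the whole grade scored 90 or above.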

In [61]:
from itertools import chain

help(chain)

Help on class chain in module itertools:

class chain(builtins.object)
 |  chain(*iterables) --> chain object
 |  
 |  Return a chain object whose .__next__() method returns elements from the
 |  first iterable until it is exhausted, then elements from the next
 |  iterable, until all of the iterables are exhausted.
 |  
 |  Methods defined here:
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.
 |  
 |  __next__(self, /)
 |      Implement next(self).
 |  
 |  __reduce__(...)
 |      Return state information for pickling.
 |  
 |  __setstate__(...)
 |      Set state information for unpickling.
 |  
 |  from_iterable(...) from builtins.type
 |      chain.from_iterable(iterable) --> chain object
 |      
 |      Alternate chain() constructor taking a single iterable argument

In [62]:
for x in chain([1, 2, 3], ['a', 'b', 'c']):
    print(x)

1
2
3
a
b
c


In [63]:
e1 = [randint(60, 100) for _ in range(19)]
e2 = [randint(60, 100) for _ in range(21)]
e3 = [randint(60, 100) for _ in range(22)]
e4 = [randint(60, 100) for _ in range(18)]

count = 0
for i in chain(e1, e2, e3, e4):
    if i >= 90:
        count += 1
count

25

You can also concatenate the lists directly with +, which is more convenient, but it builds a new list and uses extra memory.

In [67]:
e = e1 + e2 + e3 + e4
[i for i in e if i >= 90]

[99,
 96,
 92,
 97,
 90,
 97,
 92,
 96,
 90,
 97,
 100,
 95,
 95,
 90,
 95,
 95,
 99,
 97,
 97,
 94,
 90,
 93,
 98,
 100,
 99]

In [68]:
len([i for i in e if i >= 90])

25
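With chain, the count can also be written as a single generator expression without building any list. When the class lists are themselves collected in a list (for example, a variable number of classes), `chain.from_iterable` takes that one outer iterable instead of separate arguments. A sketch with small fixed lists standing in for e1..e4 (sample data, an assumption):

```python
from itertools import chain

# Fixed sample data in place of the random scores above
classes = [[95, 80, 91], [88, 90], [99, 70, 60, 93]]

# chain.from_iterable(classes) is equivalent to chain(*classes)
count = sum(1 for score in chain.from_iterable(classes) if score >= 90)
print(count)  # 5
```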

In [69]:
zip?

## <span id='13'>13 How to split a string that contains multiple kinds of delimiters?</span>

In [70]:
!ps aux

USER               PID  %CPU %MEM      VSZ    RSS   TT  STAT STARTED      TIME COMMAND
thunderbang        652   4.9  7.6 107350300 635360   ??  Ss   10:14AM  25:09.89 /System/Library/
_hidd               94   1.8  0.1  4334312  11916   ??  Ss    9:00AM  10:31.24 /usr/libexec/hid
thunderbang        499   1.1  2.4  6400444 200528   ??  S     9:00AM  20:23.74 /Applications/Sa
thunderbang        824   1.1  3.6 123546184 305548   ??  Rs    1:02PM  16:00.24 /System/Library/
_windowserver      151   0.7  1.7  6840796 145620   ??  Ss    9:00AM  32:31.08 /System/Library/
root                97   0.1  0.0  4305216   2120   ??  Ss    9:00AM   0:15.46 /usr/sbin/notify
root                56   0.1  0.3  4383536  23360   ??  Ss    9:00AM   0:11.66 /usr/libexec/log
root                53   0.1  0.1  4331180   9312   ??  Ss    9:00AM   0:25.54 /System/Library/
thunderbang       1056   0.0  0.3  4338280  24244   ??  S     5:46PM   0:00.22 /System/Library/
thunderbang       1055   0.0  0.3  43

_locationd         201   0.0  0.1  4335900   8444   ??  S     9:00AM   0:00.28 /usr/libexec/tru
_locationd         200   0.0  0.0  4304900   1420   ??  S     9:00AM   0:00.08 /usr/sbin/cfpref
root               199   0.0  0.1  4307204   6016   ??  Ss    9:00AM   0:00.04 /System/Library/
_fpsd              198   0.0  0.1  4308668   7684   ??  Ss    9:00AM   0:00.21 /System/Library/
_locationd         197   0.0  0.1  4332616   9924   ??  S     9:00AM   0:00.33 /usr/libexec/sec
_locationd         196   0.0  0.1  4377884  11368   ??  S     9:00AM   0:01.21 /System/Library/
root               195   0.0  0.1  4331952   6956   ??  Ss    9:00AM   0:00.07 /System/Library/
_nsurlstoraged     194   0.0  0.1  4304992   5980   ??  Ss    9:00AM   0:00.20 /usr/libexec/nsu
root               193   0.0  0.1  4332248   7616   ??  Ss    9:00AM   0:00.30 /usr/libexec/san
root               187   0.0  0.1  4316556  10436   ??  Ss    9:00AM   0:00.12 /usr/libexec/dis
root               176   0.0  

In [76]:
x = !ps aux
type(x)

IPython.utils.text.SList

In [78]:
s = x[-1]
s

'thunderbang       1058   0.0  0.3  4338280  24076   ??  S     5:46PM   0:00.16 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker -c MDSImporterWorker -m com.apple.mdworker.single'

In [79]:
s.split() #默认用空格分隔

['thunderbang',
 '1058',
 '0.0',
 '0.3',
 '4338280',
 '24076',
 '??',
 'S',
 '5:46PM',
 '0:00.16',
 '/System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker',
 '-s',
 'mdworker',
 '-c',
 'MDSImporterWorker',
 '-m',
 'com.apple.mdworker.single']

In [82]:
s = 'ab;cd|efg|hi,jkl|mn\topq;rst,uvw\txyz'

In [86]:
s.split(';')  # str.split cannot take ';|,' to mean "any of these"; the whole string is treated as one separator

['ab', 'cd|efg|hi,jkl|mn\topq', 'rst,uvw\txyz']

In [107]:
res = s.split(';')

In [108]:
map(lambda x: x.split('|'), res)

<map at 0x106906c50>

In [109]:
list(map(lambda x: x.split('|'), res))

[['ab'], ['cd', 'efg', 'hi,jkl', 'mn\topq'], ['rst,uvw\txyz']]

In [110]:
t = []
list(map(lambda x: t.extend(x.split('|')), res))
t

['ab', 'cd', 'efg', 'hi,jkl', 'mn\topq', 'rst,uvw\txyz']

In [111]:
res = t
t = []
list(map(lambda x: t.extend(x.split(',')), res))
t

['ab', 'cd', 'efg', 'hi', 'jkl', 'mn\topq', 'rst', 'uvw\txyz']

In [112]:
res = t
t = []
list(map(lambda x: t.extend(x.split('\t')), res))
t

['ab', 'cd', 'efg', 'hi', 'jkl', 'mn', 'opq', 'rst', 'uvw', 'xyz']

**Following the steps above, we can implement this with a loop**

In [116]:
def mySplit(s, ds):
    res = [s]
    for d in ds:
        # split every piece produced so far on the next delimiter
        res = [part for x in res for part in x.split(d)]
    return res

s = 'ab;cd|efg|hi,jkl|mn\topq;rst,uvw\txyz'
ds = ';,|\t'

mySplit(s, ds)

['ab', 'cd', 'efg', 'hi', 'jkl', 'mn', 'opq', 'rst', 'uvw', 'xyz']

In [119]:
s = 'ab;cd|efg|hi,,jkl|mn\topq;rst,uvw\txyz'

mySplit(s, ds)

['ab', 'cd', 'efg', 'hi', '', 'jkl', 'mn', 'opq', 'rst', 'uvw', 'xyz']

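Because of the two consecutive delimiters, the result contains an empty string, so mySplit should filter empty strings out of its return value.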

In [120]:
def mySplit(s, ds):
    res = [s]
    for d in ds:
        res = [part for x in res for part in x.split(d)]
    return [x for x in res if x]  # drop empty strings

s = 'ab;cd|efg|hi,,jkl|mn\topq;rst,uvw\txyz'

mySplit(s, ds)

['ab', 'cd', 'efg', 'hi', 'jkl', 'mn', 'opq', 'rst', 'uvw', 'xyz']

**Now use re.split to solve the problem in one step**

In [122]:
import re

re.split(r'[,;\t|]+', s)

['ab', 'cd', 'efg', 'hi', 'jkl', 'mn', 'opq', 'rst', 'uvw', 'xyz']

For a single delimiter, str.split() is recommended because it is faster; for multiple delimiters, use re.split().
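A sketch that builds the character class from an arbitrary delimiter string, as mySplit does with `ds`. `re.escape` keeps characters such as `|`, `-`, or `.` from being read as regex syntax, and the final filter drops the empty strings that re.split produces when the input starts or ends with a delimiter. The helper name `multi_split` is hypothetical:

```python
import re


def multi_split(s, delimiters):
    """Hypothetical helper: split s on runs of any character in `delimiters`."""
    pattern = '[' + ''.join(re.escape(d) for d in delimiters) + ']+'
    return [part for part in re.split(pattern, s) if part]


print(multi_split(';ab;cd|efg|hi,,jkl|mn\topq;rst,uvw\txyz;', ';,|\t'))
# ['ab', 'cd', 'efg', 'hi', 'jkl', 'mn', 'opq', 'rst', 'uvw', 'xyz']
```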

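## <span id='14'>14 How to check whether string a starts or ends with string b?</span>
Use cases: checking whether a URL starts with https, changing the permissions of all .py files, and so on.
- str.startswith()
- str.endswith()

Example: change the permissions of the files in a directory whose names end in .ipynb or .py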

In [124]:
ls -l

total 352
-rw-r--r--@ 1 thunderbang  staff      45 Jul 23 14:26 history
-rw-r--r--  1 thunderbang  staff   54106 Jul 23 16:25 python高效编程技巧1.ipynb
-rw-r--r--  1 thunderbang  staff  115968 Jul 23 19:34 python高效编程技巧2.ipynb
-rw-r--r--@ 1 thunderbang  staff       0 Jul 23 19:37 test.py


In [131]:
import os, stat

os.listdir('.')

['.DS_Store',
 'test.py',
 'history',
 'python高效编程技巧1.ipynb',
 '.ipynb_checkpoints',
 'python高效编程技巧2.ipynb']

In [132]:
s = 'test.py'
s.endswith('.py')

True

In [136]:
s.endswith(('.py', '.ipynb'))  # a tuple matches if any element matches; a list is not accepted (TypeError)

True

In [138]:
[name for name in os.listdir('.') if name.endswith(('.py', '.ipynb'))]

['test.py', 'python高效编程技巧1.ipynb', 'python高效编程技巧2.ipynb']

In [142]:
os.stat('test.py')

os.stat_result(st_mode=33188, st_ino=2973742, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=0, st_atime=1532345862, st_mtime=1532345862, st_ctime=1532345877)

In [143]:
os.stat('test.py').st_mode

33188

In [144]:
# show the mode in octal
oct(os.stat('test.py').st_mode)

'0o100644'

In [156]:
stat.S_*?

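Take a permission bit mask (here stat.S_IXUSR, the owner-execute bit), OR it with the current mode, and set the result as the new mode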

In [157]:
os.chmod('test.py', os.stat('test.py').st_mode | stat.S_IXUSR)

In [158]:
ls -l

total 360
-rw-r--r--@ 1 thunderbang  staff      45 Jul 23 14:26 history
-rw-r--r--  1 thunderbang  staff   54106 Jul 23 16:25 python高效编程技巧1.ipynb
-rw-r--r--  1 thunderbang  staff  119273 Jul 23 19:54 python高效编程技巧2.ipynb
-rwxr--r--@ 1 thunderbang  staff       0 Jul 23 19:37 test.py*
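The single chmod call above can be extended to the task stated at the start of this section: adding the owner-execute bit to every .py or .ipynb file in a directory. A sketch; the helper `add_user_exec` is hypothetical, and the permission bits behave this way only on POSIX systems (on Windows, os.chmod can only toggle the read-only flag). The demo runs in a throwaway temporary directory so no real files are touched:

```python
import os
import stat
import tempfile


def add_user_exec(directory, suffixes=('.py', '.ipynb')):
    """Hypothetical helper: OR stat.S_IXUSR into the mode of each regular file
    in `directory` whose name ends with one of `suffixes` (a tuple, as
    str.endswith requires). Returns the sorted names that were changed."""
    changed = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and name.endswith(suffixes):
            os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
            changed.append(name)
    return sorted(changed)


with tempfile.TemporaryDirectory() as tmp:
    for name in ('a.py', 'b.ipynb', 'c.txt'):
        open(os.path.join(tmp, name), 'w').close()  # create empty files
    result = add_user_exec(tmp)
    mode_a = os.stat(os.path.join(tmp, 'a.py')).st_mode
    mode_c = os.stat(os.path.join(tmp, 'c.txt')).st_mode

print(result)  # ['a.py', 'b.ipynb']
```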


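## <span id='15'>15 How to reformat text within a string?</span>
#### For example, change the dates in a log file from the Chinese format 'yyyy-mm-dd' to 'mm/dd/yyyy'
Use the regex function re.sub() to do the replacement: capture each part of the date with a capture group, then reorder the groups in the replacement string.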

In [160]:
# an arbitrary string containing a yyyy-mm-dd date
log = '2018-07-21 10:00:00 learning python'
log

'2018-07-21 10:00:00 learning python'

In [162]:
import re

re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\2/\3/\1', log)

'07/21/2018 10:00:00 learning python'

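**Parentheses () in a regular expression form capture groups, which can be referenced by position. In the replacement string above, `\2` refers to the second capture group. Note the `r''` raw-string prefix, which stops Python from interpreting the backslashes itself.**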

Regular expressions also support named groups: `(?P<name>xxx)` gives a captured group a name, and `\g<name>` refers to it by name in the replacement string.

In [163]:
re.sub(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})', r'\g<month>/\g<day>/\g<year>', log)

'07/21/2018 10:00:00 learning python'
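re.sub also accepts a function as the replacement: it is called with each match object and must return the replacement text, which helps when the new format needs logic beyond reordering groups. A sketch reusing the named groups above; the names `DATE` and `to_us_date` are illustrative:

```python
import re

DATE = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')


def to_us_date(m):
    # m is a match object; groupdict() maps group names to the matched text
    return '{month}/{day}/{year}'.format(**m.groupdict())


log = '2018-07-21 10:00:00 learning python'
print(DATE.sub(to_us_date, log))  # 07/21/2018 10:00:00 learning python
```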

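## <span id='16'>16 How to join many small strings into one large string?</span>
Scenario: a UDP-based network protocol passes a series of parameters to a server in a fixed order. Suppose the program collects the parameters in order into a list. Because UDP does not guarantee that datagrams arrive in order, the parameters must be joined into a single packet before sending.
- Method 1: iterate over the list and concatenate each string with '+'
- Method 2: use str.join(), which concatenates all strings in the list faster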

In [166]:
s1 = 'abcdefg'
s2 = '12345'
s1 + s2  # + is operator overloading; it calls the method below

'abcdefg12345'

In [167]:
str.__add__(s1, s2)

'abcdefg12345'

In [168]:
pl = ['<0112>', '<32>', '<1024x726>', '1']
s = ''

for p in pl:
    s += p
s

'<0112><32><1024x726>1'

In [169]:
pl = ['<0112>', '<32>', '<1024x726>', '1']
s = ''

for p in pl:
    s += p
    print(s)

<0112>
<0112><32>
<0112><32><1024x726>
<0112><32><1024x726>1


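#### As shown, when many strings are joined this way, every step creates a new temporary string that is freed right after use. All that copying and freeing is costly: in general the total copying grows quadratically with the number of pieces. (CPython can sometimes optimize `s += p` in place, but PEP 8 advises against relying on this.)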
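A rough way to compare the two methods on your own machine is `timeit`. The absolute numbers, and even the size of the gap, depend on the interpreter version and hardware, so no figures are claimed here. The function names are hypothetical:

```python
import timeit

parts = ['<0112>', '<32>', '<1024x726>', '1'] * 10_000


def concat_plus(pieces):
    s = ''
    for p in pieces:
        s += p  # may build a new string on every step
    return s


def concat_join(pieces):
    return ''.join(pieces)  # one call; CPython sizes the result once, then copies


assert concat_plus(parts) == concat_join(parts)
print('+=  :', timeit.timeit(lambda: concat_plus(parts), number=10))
print('join:', timeit.timeit(lambda: concat_join(parts), number=10))
```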

In [170]:
help(str.join)

Help on method_descriptor:

join(...)
    S.join(iterable) -> str
    
    Return a string which is the concatenation of the strings in the
    iterable.  The separator between elements is S.



In [171]:
';'.join(['abc', '123', 'xyz'])

'abc;123;xyz'

In [172]:
''.join(['abc', '123', 'xyz'])

'abc123xyz'

In [173]:
''.join(pl)

'<0112><32><1024x726>1'

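If the list contains numbers as well as strings and you still want a single string, what can you do?
- Use a list comprehension (builds an intermediate list)
- Use a generator expression (recommended for concision; note that CPython's str.join first converts any non-list argument into a sequence internally, so the memory saving is small)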

In [174]:
l = ['abc', 123, 'xyz']
''.join([str(x) for x in l])

'abc123xyz'

In [177]:
(str(x) for x in l)  # parentheses create a generator expression

<generator object <genexpr> at 0x10697ffc0>

In [179]:
''.join((str(x) for x in l))

'abc123xyz'

In [180]:
''.join(str(x) for x in l)  # the extra parentheses can be omitted when the generator is the only argument

'abc123xyz'
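For the UDP scenario at the start of this section, the datagram payload must be bytes rather than str. Two equivalent sketches: join the str pieces and encode once, or join pre-encoded pieces with `bytes.join`. UTF-8 is an assumption here; the real protocol may require a different encoding:

```python
pl = ['<0112>', '<32>', '<1024x726>', '1']

payload1 = ''.join(pl).encode('utf-8')             # join, then encode once
payload2 = b''.join(p.encode('utf-8') for p in pl)  # bytes.join works like str.join

print(payload1)  # b'<0112><32><1024x726>1'
# payload1 could then be sent with a UDP socket's sendto(payload1, (host, port))
```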

## <span id='17'>17 How to left-, right-, or center-align a string?</span>
- Method 1: use the string methods str.ljust(), str.rjust(), and str.center()
- Method 2: use format() with format specs such as '<20', '>20', and '^20'

In [181]:
s = 'abc'
s.ljust(20)

'abc                 '

In [182]:
print(s.ljust(20, '='))
print(s.rjust(20))
print(s.center(20))

abc=================
                 abc
        abc         


In [183]:
s = 'abc'
format(s, '<20')

'abc                 '

In [185]:
print(format(s, '>20'))
print(format(s, '^20'))

                 abc
        abc         


Example: display dictionary keys left-aligned

In [188]:
d = {
    'Discull': 500.0,
    'smallcull': 0.01,
    'far': 444,
    'large': 40,
    'thisislongest': 100
}

In [190]:
d.keys()

dict_keys(['Discull', 'smallcull', 'far', 'large', 'thisislongest'])

In [191]:
map(len, d.keys())

<map at 0x10698aa20>

In [192]:
max(map(len, d.keys()))

13

In [193]:
w = max(map(len, d.keys()))

In [194]:
for k in d:
    print(k.ljust(w), ':', d[k])

Discull       : 500.0
smallcull     : 0.01
far           : 444
large         : 40
thisislongest : 100


In [196]:
for k in d:
    print(format(k, '<{}'.format(w)), ':', d[k])

Discull       : 500.0
smallcull     : 0.01
far           : 444
large         : 40
thisislongest : 100
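Since Python 3.6, an f-string can do the same with a nested field width: in `{k:<{w}}`, the inner `{w}` is substituted first and becomes part of the format spec. A fill character can also precede the alignment sign, as in `'=^9'`. A sketch using a subset of the dictionary above:

```python
d = {'far': 444, 'large': 40, 'thisislongest': 100}
w = max(map(len, d))  # iterating a dict yields its keys

lines = [f'{k:<{w}} : {v}' for k, v in d.items()]
print('\n'.join(lines))

print(format('abc', '=^9'))         # ===abc===
print('{:*>{}}'.format('abc', 8))   # *****abc (the width comes from the 2nd argument)
```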


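## <span id='18'>18 How to remove unwanted characters from a string?</span>
- Method 1: strip(), lstrip(), and rstrip() remove characters from the ends of a string
- Method 2: to delete a single character at a fixed position, use slicing plus concatenation
- Method 3: the string method replace() or the regex function re.sub() removes characters at any position
- Method 4: the string method translate() can remove several different characters at once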

In [198]:
s = '   abc   123   '
s.strip()

'abc   123'

In [199]:
s.lstrip()

'abc   123   '

In [200]:
s.rstrip()

'   abc   123'

In [201]:
s = '---abc+++'
s.strip('-+')

'abc'
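Note that the argument of strip() is a set of characters, not a substring: every leading or trailing character found in the set is removed, in any order. To remove an exact prefix or suffix once, Python 3.9 added `str.removeprefix()` and `str.removesuffix()` (this sketch therefore requires Python 3.9+):

```python
name = 'happy.py'

print(name.rstrip('.py'))             # ha: every trailing '.', 'p' or 'y' is stripped
print(name.removesuffix('.py'))       # happy
print('---abc---'.removeprefix('-'))  # --abc---: exactly one '-' is removed
```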

In [202]:
# slicing + concatenation
s = 'abc:123'
s[:3] + s[4:]

'abc123'

In [203]:
# replace
s = '\tabc\t123\txyz'
s.replace('\t', '')

'abc123xyz'

In [205]:
# re.sub
s = '\tabc\t123\txyz\ropq\r'

import re

re.sub('[\t\r]', '', s)

'abc123xyzopq'

In [211]:
# translate: Python 2 had both str.translate and unicode.translate;
# in Python 3, str is Unicode and str.translate takes a mapping, like the old unicode.translate
help(str.translate)

Help on method_descriptor:

translate(...)
    S.translate(table) -> str
    
    Return a copy of the string S in which each character has been mapped
    through the given translation table. The table must implement
    lookup/indexing via __getitem__, for instance a dictionary or list,
    mapping Unicode ordinals to Unicode ordinals, strings, or None. If
    this operation raises LookupError, the character is left untouched.
    Characters mapped to None are deleted.



In [212]:
# map abc -> xyz and xyz -> abc, a toy substitution "cipher"
s = 'abc123xyz'

str.maketrans('abcxyz', 'xyzabc')  # build the mapping table; the first two arguments must have the same length

{97: 120, 98: 121, 99: 122, 120: 97, 121: 98, 122: 99}

In [213]:
s.translate(str.maketrans('abcxyz', 'xyzabc'))

'xyz123abc'

In [221]:
# in Python 3, s.translate() takes only one argument: the table
s = 'abc\r123\nxyz\t'
s.translate(str.maketrans('\r\n\t', '   '))

'abc 123 xyz '
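`str.maketrans()` also takes an optional third argument: a string whose characters are mapped to None, that is, deleted. This covers the "remove several characters at once" case without building a dict by hand, and it can be combined with a mapping in the same table:

```python
s = 'abc\r123\nxyz\t'

table = str.maketrans('', '', '\r\n\t')  # third argument: characters to delete
print(repr(s.translate(table)))          # 'abc123xyz'

# map '1' -> 'A' and '2' -> 'B', and delete '\t', in one table
print(repr('1\t2'.translate(str.maketrans('12', 'AB', '\t'))))  # 'AB'
```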

In [225]:
u = 'ni\u0301 ha\u030co'
u

'ní hǎo'

In [230]:
u.encode('utf-8')

b'ni\xcc\x81 ha\xcc\x8co'

In [237]:
u.encode('ascii')

UnicodeEncodeError: 'ascii' codec can't encode character '\u0301' in position 2: ordinal not in range(128)

In [239]:
u.translate({0x0301: None})  # keys are Unicode code points (ordinals), written here in hex; U+0301 is a combining acute accent, not ASCII

'ni hǎo'

In [242]:
u.translate(dict.fromkeys([0x0301, 0x030c]))  # dict.fromkeys maps every key to None, so several characters are deleted at once

'ni hao'
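Listing code points by hand does not scale to arbitrary accented text. A common alternative, sketched here with the standard `unicodedata` module, decomposes characters with NFD normalization and then drops every code point whose combining class is nonzero. Be aware that this also alters text in scripts where combining marks carry meaning:

```python
import unicodedata


def strip_combining(text):
    """Remove combining marks such as U+0301 (acute) and U+030C (caron)."""
    decomposed = unicodedata.normalize('NFD', text)  # e.g. 'é' -> 'e' + U+0301
    return ''.join(c for c in decomposed if not unicodedata.combining(c))


print(strip_combining('ni\u0301 ha\u030co'))  # ni hao
print(strip_combining('café'))                # cafe
```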