# Efficient Python Programming Techniques, Part 1

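1. [How do you filter data in lists, dicts, and sets by a condition?](#1)
2. [How do you name the elements of a tuple to improve readability?](#2)
3. [How do you count how often each element occurs in a sequence?](#3)
4. [How do you sort the items of a dict by value?](#4)
5. [How do you quickly find the keys common to several dicts?](#5)
6. [How do you keep a dict ordered?](#6)
7. [How do you implement a user history feature (at most n entries)?](#7)
8. [How do you implement iterable and iterator objects?](#8)
9. [How do you implement an iterable object with a generator function?](#9)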

## <span id='1'>1 How do you filter data in lists, dicts, and sets by a condition?</span>
- Lists
 - filter function: filter(lambda x : x>=0, data)
 - List comprehension: \[x for x in data if x >= 0\]
- Dicts
 - Dict comprehension: \{k : v for k, v in d.items() if v > 90}
- Sets
 - Set comprehension: \{x for x in s if x%3 == 0}

In [1]:
from random import randint

In [3]:
data = [randint(-10, 10) for _ in range(10)]
data

[5, -10, 7, -2, -4, 9, -4, -5, -7, -10]

In [4]:
filter?

In [10]:
filter(lambda x: x >= 0, data)

<filter at 0x106463a20>

In [11]:
[x for x in data if x >= 0]

[5, 7, 9]

In [12]:
timeit filter(lambda x: x >= 0, data)

217 ns ± 3.21 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [13]:
timeit [x for x in data if x >= 0]

532 ns ± 19.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
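
A caveat about the two timings above: in Python 3, `filter` returns a lazy iterator (the `<filter at 0x...>` object shown earlier), so timing the bare call measures only the construction of that object and not the filtering itself. A fairer comparison, sketched here with the standard `timeit` module, wraps the call in `list(...)` so that both statements do the same work. The fixed sample list and the helper names `with_filter` and `with_comprehension` are assumptions made for this illustration, and no timings are quoted because they vary by machine.

```python
from timeit import timeit

# Fixed sample data (an assumption; the notebook uses random values).
data = [5, -10, 7, -2, -4, 9, -4, -5, -7, -10]

def with_filter():
    # list() forces the lazy filter object to actually do the filtering.
    return list(filter(lambda x: x >= 0, data))

def with_comprehension():
    return [x for x in data if x >= 0]

# Both approaches produce the same list.
assert with_filter() == with_comprehension() == [5, 7, 9]

# Time each approach; absolute numbers depend on the machine.
t_filter = timeit(with_filter, number=10_000)
t_comp = timeit(with_comprehension, number=10_000)
print(f'filter + list: {t_filter:.4f}s, comprehension: {t_comp:.4f}s')
```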


In [14]:
d = {x: randint(60, 100) for x in range(1, 21)}
d

{1: 87,
 2: 78,
 3: 63,
 4: 75,
 5: 82,
 6: 88,
 7: 61,
 8: 69,
 9: 88,
 10: 97,
 11: 69,
 12: 85,
 13: 91,
 14: 62,
 15: 70,
 16: 71,
 17: 70,
 18: 89,
 19: 82,
 20: 85}

In [16]:
{k: v for k, v in d.items() if v > 90}

{10: 97, 13: 91}

In [17]:
data

[5, -10, 7, -2, -4, 9, -4, -5, -7, -10]

In [18]:
s = set(data)
s

{-10, -7, -5, -4, -2, 5, 7, 9}

In [19]:
{x for x in s if x % 3 == 0}

{9}

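## <span id='2'>2 How do you name the elements of a tuple to improve readability?</span>
- Define a series of numeric constants, similar to an enumeration type in other languages
- Use collections.namedtuple from the standard library in place of the built-in tuple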

In [21]:
student = ('Jim', 16, 'male', 'jim8721@gmail.com')
student

('Jim', 16, 'male', 'jim8721@gmail.com')

In [22]:
student[0]

'Jim'

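Accessing tuple elements this way scatters numeric indices through the code, which hurts readability.
#### Approach 1: define a series of numeric constants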

In [23]:
NAME, AGE, SEX, EMAIL = range(4)
student[NAME]

'Jim'
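
The numeric constants above are plain module-level integers. One alternative, offered as a sketch rather than as the notebook's own method, is the standard library's `enum.IntEnum`, which groups the constants under one name while still letting them act as tuple indices. The class name `Field` is hypothetical.

```python
from enum import IntEnum

class Field(IntEnum):
    # Hypothetical name; each member doubles as a tuple index.
    NAME = 0
    AGE = 1
    SEX = 2
    EMAIL = 3

student = ('Jim', 16, 'male', 'jim8721@gmail.com')

# IntEnum members are ints, so they can index a tuple directly.
print(student[Field.NAME])
print(student[Field.AGE])
```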

**Approach 2: use collections.namedtuple from the standard library in place of the built-in tuple**

In [24]:
from collections import namedtuple
Student = namedtuple('Student', ['name', 'age', 'sex', 'email'])
s = Student('Jim', 16, 'male', 'jim8721@gmail.com')
s

Student(name='Jim', age=16, sex='male', email='jim8721@gmail.com')

In [26]:
s = Student(name='Jim', age=16, sex='male', email='jim8721@gmail.com')
s

Student(name='Jim', age=16, sex='male', email='jim8721@gmail.com')

In [27]:
s.name

'Jim'

In [28]:
s.age

16

In [29]:
isinstance(s, tuple)

True
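
Because a namedtuple instance is still a tuple, it stays immutable and can be unpacked as usual. The sketch below shows two helper methods that namedtuple classes provide, `_replace` and `_asdict`; the variable names `older` and `record` are chosen only for this illustration.

```python
from collections import namedtuple

Student = namedtuple('Student', ['name', 'age', 'sex', 'email'])
s = Student('Jim', 16, 'male', 'jim8721@gmail.com')

# Unpacking works exactly as with a plain tuple.
name, age, sex, email = s

# _replace returns a new instance; the original is left unchanged.
older = s._replace(age=17)

# _asdict converts the record to a dict keyed by field name.
record = s._asdict()

print(older.age, s.age, record['email'])
```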

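## <span id='3'>3 How do you count how often each element occurs in a sequence? </span>
- Counting element frequencies in a random sequence
- Word frequency: find the 10 most frequent words and their counts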

In [30]:
from random import randint

In [32]:
data = [randint(0, 20) for _ in range(30)]
data

[18,
 4,
 9,
 18,
 7,
 15,
 0,
 16,
 20,
 2,
 0,
 11,
 19,
 3,
 8,
 0,
 20,
 10,
 3,
 19,
 3,
 13,
 16,
 10,
 12,
 15,
 11,
 9,
 16,
 18]

In [33]:
c = dict.fromkeys(data, 0)
c

{18: 0,
 4: 0,
 9: 0,
 7: 0,
 15: 0,
 0: 0,
 16: 0,
 20: 0,
 2: 0,
 11: 0,
 19: 0,
 3: 0,
 8: 0,
 10: 0,
 13: 0,
 12: 0}

In [34]:
for x in data:
    c[x] += 1
c

{18: 3,
 4: 1,
 9: 2,
 7: 1,
 15: 2,
 0: 3,
 16: 3,
 20: 2,
 2: 1,
 11: 2,
 19: 2,
 3: 3,
 8: 1,
 10: 2,
 13: 1,
 12: 1}

In [45]:
sorted(c.items(), key=lambda item: item[1])

[(4, 1),
 (7, 1),
 (2, 1),
 (8, 1),
 (13, 1),
 (12, 1),
 (9, 2),
 (15, 2),
 (20, 2),
 (11, 2),
 (19, 2),
 (10, 2),
 (18, 3),
 (0, 3),
 (16, 3),
 (3, 3)]

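The manual approach above is still fairly tedious.
#### Using a collections.Counter object
- Pass the sequence to the Counter constructor; the resulting Counter object is a dict that maps each element to its frequency
- Counter.most_common(n) returns a list of the n most frequent elements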

In [47]:
from collections import Counter
c2 = Counter(data)
c2

Counter({18: 3,
         4: 1,
         9: 2,
         7: 1,
         15: 2,
         0: 3,
         16: 3,
         20: 2,
         2: 1,
         11: 2,
         19: 2,
         3: 3,
         8: 1,
         10: 2,
         13: 1,
         12: 1})

In [48]:
c2.most_common(3)

[(18, 3), (0, 3), (16, 3)]

**Word frequency**

In [58]:
import re
txt = '''
I’m a Civil Engineer who decided to change his career track. For the last years, I’ve been self-learning how to move from Civil Engineering to Machine Learning.

Here, I’ll tell you stories about my learning projects. We, humans, learn better from other people. That’s why I believe that, by telling you stories about how I solved a problem, I can help you
'''
txt

'\nI’m a Civil Engineer who decided to change his career track. For the last years, I’ve been self-learning how to move from Civil Engineering to Machine Learning.\n\nHere, I’ll tell you stories about my learning projects. We, humans, learn better from other people. That’s why I believe that, by telling you stories about how I solved a problem, I can help you\n'

In [59]:
c3 = Counter(re.split(r'\W+', txt))
c3

Counter({'': 2,
         'I': 6,
         'm': 1,
         'a': 2,
         'Civil': 2,
         'Engineer': 1,
         'who': 1,
         'decided': 1,
         'to': 3,
         'change': 1,
         'his': 1,
         'career': 1,
         'track': 1,
         'For': 1,
         'the': 1,
         'last': 1,
         'years': 1,
         've': 1,
         'been': 1,
         'self': 1,
         'learning': 2,
         'how': 2,
         'move': 1,
         'from': 2,
         'Engineering': 1,
         'Machine': 1,
         'Learning': 1,
         'Here': 1,
         'll': 1,
         'tell': 1,
         'you': 3,
         'stories': 2,
         'about': 2,
         'my': 1,
         'projects': 1,
         'We': 1,
         'humans': 1,
         'learn': 1,
         'better': 1,
         'other': 1,
         'people': 1,
         'That': 1,
         's': 1,
         'why': 1,
         'believe': 1,
         'that': 1,
         'by': 1,
         'telling': 1,
         'solved': 1,
         'problem': 1,
         'can': 1,
         'help': 1})

In [60]:
c3.most_common(10)

[('I', 6),
 ('to', 3),
 ('you', 3),
 ('', 2),
 ('a', 2),
 ('Civil', 2),
 ('learning', 2),
 ('how', 2),
 ('from', 2),
 ('stories', 2)]
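
The result above contains an empty-string entry, because `re.split` produces an empty first or last item whenever the text starts or ends with a separator, and it also counts `Learning` and `learning` separately. A possible refinement, sketched here on one sentence from the sample text, is to extract words with `re.findall` and lower-case the text first. Whether case should be folded is a choice made for this illustration, not something the notebook prescribes.

```python
import re
from collections import Counter

sentence = ('For the last years, I’ve been self-learning how to move '
            'from Civil Engineering to Machine Learning.')

# re.split leaves an empty string behind the trailing period.
split_counts = Counter(re.split(r'\W+', sentence))

# re.findall returns only the matched words, so no empty strings appear.
word_counts = Counter(re.findall(r'\w+', sentence.lower()))

print('' in split_counts, '' in word_counts)
print(word_counts['learning'], word_counts['to'])
```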

##  <span id='4'> 4 How do you sort the items of a dict by value? </span>
- Use zip to convert the dict data into (value, key) tuples
- Pass the key argument to the sorted function

In [62]:
d = {x: randint(60, 100) for x in 'xyzabc'}
d

{'x': 92, 'y': 92, 'z': 95, 'a': 82, 'b': 94, 'c': 68}

In [63]:
sorted(d)

['a', 'b', 'c', 'x', 'y', 'z']

In [64]:
iter(d)

<dict_keyiterator at 0x106515318>

In [65]:
list(iter(d))

['x', 'y', 'z', 'a', 'b', 'c']

In [66]:
(92, 'x') > (95, 'z')

False

In [67]:
d.keys()

dict_keys(['x', 'y', 'z', 'a', 'b', 'c'])

In [68]:
d.values()

dict_values([92, 92, 95, 82, 94, 68])

In [71]:
zip(d.values(), d.keys())

<zip at 0x1064fb648>

In [72]:
list(zip(d.values(), d.keys()))

[(92, 'x'), (92, 'y'), (95, 'z'), (82, 'a'), (94, 'b'), (68, 'c')]

In [74]:
sorted(zip(d.values(), d.keys()))

[(68, 'c'), (82, 'a'), (92, 'x'), (92, 'y'), (94, 'b'), (95, 'z')]

Approach 2: the key argument

In [75]:
d.items()

dict_items([('x', 92), ('y', 92), ('z', 95), ('a', 82), ('b', 94), ('c', 68)])

In [77]:
sorted(d.items(), key=lambda item: item[1])

[('c', 68), ('a', 82), ('x', 92), ('y', 92), ('b', 94), ('z', 95)]
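
A common variation on the key-argument approach is to sort from highest to lowest and keep only the top entries. The sketch below uses `operator.itemgetter` in place of the lambda and reuses the scores shown above; the names `by_value_desc` and `top3` are made up for this example.

```python
from operator import itemgetter

d = {'x': 92, 'y': 92, 'z': 95, 'a': 82, 'b': 94, 'c': 68}

# itemgetter(1) picks the value out of each (key, value) pair.
by_value_desc = sorted(d.items(), key=itemgetter(1), reverse=True)
print(by_value_desc)

# sorted() is stable, so the tied entries 'x' and 'y' keep their
# original relative order even with reverse=True.
top3 = by_value_desc[:3]
print(top3)
```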

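## <span id='5'> 5 How do you quickly find the keys common to several dicts?</span>
Use the intersection operation on sets
- Call each dict's keys() method to get a set-like view of its keys
- Use map to get the key views of all the dicts
- Use reduce to take the intersection of all those key sets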

In [2]:
from random import randint, sample
sample('abcdefg', 3)

['f', 'a', 'g']

In [3]:
sample('abcdefg', randint(3, 6))

['f', 'c', 'e']

In [4]:
s1 = {x: randint(1, 4) for x in sample('abcdefg', randint(3, 6))}
s1

{'d': 4, 'c': 1, 'f': 2, 'b': 1}

In [5]:
s2 = {x: randint(1, 4) for x in sample('abcdefg', randint(3, 6))}
s3 = {x: randint(1, 4) for x in sample('abcdefg', randint(3, 6))}
print(s2)
print(s3)

{'f': 3, 'b': 2, 'a': 1, 'e': 1, 'c': 4, 'g': 4}
{'a': 4, 'd': 2, 'f': 1, 'b': 4, 'e': 1}


In [6]:
res = []
for k in s1:
    if k in s2 and k in s3:
        res.append(k)
res

['f', 'b']

Take the intersection with & (suitable for a small number of dicts)

In [7]:
s1.keys()

dict_keys(['d', 'c', 'f', 'b'])

In [8]:
s2.keys()

dict_keys(['f', 'b', 'a', 'e', 'c', 'g'])

In [9]:
s1.keys() & s2.keys() & s3.keys()

{'b', 'f'}

Use map and reduce (works for the intersection of n sets)

In [10]:
map(dict.keys, [s1, s2, s3])

<map at 0x1065caa20>

In [11]:
list(map(dict.keys, [s1, s2, s3]))

[dict_keys(['d', 'c', 'f', 'b']),
 dict_keys(['f', 'b', 'a', 'e', 'c', 'g']),
 dict_keys(['a', 'd', 'f', 'b', 'e'])]

In [14]:
from functools import reduce
reduce(lambda a, b: a & b, map(dict.keys, [s1, s2, s3]))

{'b', 'f'}
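
The map and reduce steps can be wrapped in a small reusable function. This is a minimal sketch; `common_keys` is a hypothetical helper name, and returning an empty set when no dicts are passed is an assumption about the desired behavior.

```python
from functools import reduce
from operator import and_

def common_keys(*dicts):
    """Return the set of keys present in every dict (hypothetical helper)."""
    if not dicts:
        return set()
    # dict.keys() views support &, so reduce can fold them pairwise.
    return set(reduce(and_, map(dict.keys, dicts)))

s1 = {'d': 4, 'c': 1, 'f': 2, 'b': 1}
s2 = {'f': 3, 'b': 2, 'a': 1, 'e': 1, 'c': 4, 'g': 4}
s3 = {'a': 4, 'd': 2, 'f': 1, 'b': 4, 'e': 1}

print(common_keys(s1, s2, s3))
```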

##  <span id='6'>6 How do you keep a dict ordered? </span>

In [16]:
d = {}
d['Jim'] = (1, 35)
d['Leo'] = (2, 37)
d['Bob'] = (3, 40)
d

{'Jim': (1, 35), 'Leo': (2, 37), 'Bob': (3, 40)}

In [20]:
for k in d:
    print(k)

Jim
Leo
Bob


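On Python versions before 3.7, the built-in dict does not guarantee insertion order (the output above comes out in order only because CPython 3.6 preserves it as an implementation detail), so the ranking order cannot be relied on there. From Python 3.7 onward insertion order is part of the language; collections.OrderedDict makes the guarantee explicit on every version.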

In [22]:
from collections import OrderedDict
d = OrderedDict()
d['Jim'] = (1, 35)
d['Leo'] = (2, 37)
d['Bob'] = (3, 40)
d

OrderedDict([('Jim', (1, 35)), ('Leo', (2, 37)), ('Bob', (3, 40))])

In [23]:
for k in d:
    print(k)

Jim
Leo
Bob
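
Beyond remembering insertion order, OrderedDict has some order-aware behavior that a plain dict lacks. The sketch below shows two such features, `move_to_end` and order-sensitive equality; the variable name `same_items` is illustrative only.

```python
from collections import OrderedDict

d = OrderedDict()
d['Jim'] = (1, 35)
d['Leo'] = (2, 37)
d['Bob'] = (3, 40)

# move_to_end repositions an existing key; last=False moves it to the front.
d.move_to_end('Bob', last=False)
print(list(d))

# Equality between two OrderedDicts takes order into account ...
same_items = OrderedDict([('Jim', (1, 35)), ('Leo', (2, 37)), ('Bob', (3, 40))])
print(d == same_items)

# ... while comparing against a plain dict ignores order.
print(d == dict(same_items))
```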


**Example: simulating a quiz ranking system ordered by answer time**

In [28]:
from time import time
from random import randint
from collections import OrderedDict

d = OrderedDict()
players = list('ABCDEFGH')
start = time()

for i in range(len(players)):
    input()  # each press of Enter represents one player submitting
    p = players.pop(randint(0, 7 - i))  # randint is inclusive at both ends
    end = time()
    print(i + 1, p, end - start)
    d[p] = (i + 1, end - start)


1 F 0.9821949005126953

2 G 1.7208008766174316

3 D 2.0660178661346436

4 E 2.3328609466552734

5 A 2.6529877185821533

6 B 2.968859910964966

7 C 3.647524833679199

8 H 4.424718856811523


In [29]:
d

OrderedDict([('F', (1, 0.9821949005126953)),
             ('G', (2, 1.7208008766174316)),
             ('D', (3, 2.0660178661346436)),
             ('E', (4, 2.3328609466552734)),
             ('A', (5, 2.6529877185821533)),
             ('B', (6, 2.968859910964966)),
             ('C', (7, 3.647524833679199)),
             ('H', (8, 4.424718856811523))])

In [30]:
for k in d:
    print(k, d[k])

F (1, 0.9821949005126953)
G (2, 1.7208008766174316)
D (3, 2.0660178661346436)
E (4, 2.3328609466552734)
A (5, 2.6529877185821533)
B (6, 2.968859910964966)
C (7, 3.647524833679199)
H (8, 4.424718856811523)
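
Since the OrderedDict keeps players in finishing order, a rank or a range of ranks can be looked up by slicing its items lazily. The following is a sketch under stated assumptions: `query_by_rank` is a hypothetical helper, and the stored values are simplified to the rank alone, with the timings omitted.

```python
from collections import OrderedDict
from itertools import islice

def query_by_rank(d, first, last=None):
    """Return the entries ranked first..last, 1-based and inclusive."""
    if last is None:
        last = first
    # islice walks the items in insertion order without copying them all.
    return list(islice(d.items(), first - 1, last))

# Finishing order taken from the run above; values hold only the rank.
d = OrderedDict((p, i + 1) for i, p in enumerate('FGDEABCH'))

print(query_by_rank(d, 1))
print(query_by_rank(d, 2, 4))
```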


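## <span id='7'>7 How do you implement a user history feature (at most n entries)?</span>
Store the history in a queue with capacity n
- Use deque from the standard library's collections module; it is a double-ended queue, and when created with a maximum length it discards the oldest entry once it is full
- Before the program exits, the queue object can be saved to a file with pickle and loaded again on the next run

A quick look at the queue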

In [37]:
from collections import deque

q = deque([], 5)
q.append(1)
q

deque([1])

In [38]:
q.append(2)
q.append(3)
q.append(4)
q.append(5)
q

deque([1, 2, 3, 4, 5])

In [39]:
q.append(6)
q

deque([2, 3, 4, 5, 6])

**Example: a number-guessing game**

In [40]:
from random import randint

N = randint(0, 10)

def guess(k):
    if k == N:
        print('right')
        return True
    if k < N:
        print('%s is less than N' % k)
    else:
        print('%s is greater than N' % k)
    return False

while True:
    line = input('please input a number')  # Python 3 has no raw_input; input() replaces it
    if line.isdigit():
        k = int(line)
        if guess(k):
            break

please input a number5
5 is greater than N
please input a number3
right


Adding the history feature

In [43]:
from random import randint
from collections import deque

N = randint(0, 100)
history = deque([], 5)

def guess(k):
    if k == N:
        print('right')
        return True
    if k < N:
        print('%s is less than N' % k)
    else:
        print('%s is greater than N' % k)
    return False

while True:
    line = input('please input a number:')  # Python 3 has no raw_input; input() replaces it
    if line.isdigit():
        k = int(line)
        history.append(k)
        if guess(k):
            break
    elif line == 'history' or line == 'h?':
        print(list(history))

please input a number:50
50 is greater than N
please input a number:25
25 is greater than N
please input a number:13
13 is less than N
please input a number:20
20 is greater than N
please input a number:h?
[50, 25, 13, 20]
please input a number:16
right


Persisting the history to a file instead of keeping it only in memory

In [45]:
import pickle
q

deque([2, 3, 4, 5, 6])

In [53]:
pickle.dump

<function _pickle.dump(obj, file, protocol=None, *, fix_imports=True)>

In [58]:
with open('history', 'wb') as f:  # the with block closes the file after writing
    pickle.dump(q, f)

In [62]:
pickle.load

<function _pickle.load(file, *, fix_imports=True, encoding='ASCII', errors='strict')>

In [69]:
with open('history', 'rb') as f:  # the with block closes the file after reading
    q2 = pickle.load(f)

In [70]:
q2

deque([2, 3, 4, 5, 6])
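
Putting the pieces together, the history could be loaded when the program starts and saved before it exits. The sketch below is one way to do that, not the notebook's own code: `save_history` and `load_history` are hypothetical helpers, and a temporary directory stands in for wherever the file would really live. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.

```python
import os
import pickle
import tempfile
from collections import deque

def save_history(history, path):
    with open(path, 'wb') as f:
        pickle.dump(history, f)

def load_history(path, maxlen=5):
    """Return the saved deque, or an empty one if no file exists yet."""
    try:
        with open(path, 'rb') as f:
            return pickle.load(f)
    except FileNotFoundError:
        return deque([], maxlen)

# Temporary location used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'history')

history = load_history(path)          # first run: no file yet
for k in (50, 25, 13, 20, 16, 18):    # six guesses, capacity five
    history.append(k)
save_history(history, path)

restored = load_history(path)         # next run: read the file back
print(list(restored), restored.maxlen)
```

The maximum length survives the round trip, so the restored deque keeps discarding old entries as before.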

## <span id='8'>8 How do you implement iterable and iterator objects? </span>

In [71]:
l = [1, 2, 3, 4]
s = 'abcde'
for x in l: print(x)

1
2
3
4


In [72]:
for x in s: print(x)

a
b
c
d
e


In [73]:
iter(l)

<list_iterator at 0x10652af98>

In [74]:
iter(s)

<str_iterator at 0x106345518>

In [76]:
help(iter)

Help on built-in function iter in module builtins:

iter(...)
    iter(iterable) -> iterator
    iter(callable, sentinel) -> iterator
    
    Get an iterator from an object.  In the first form, the argument must
    supply its own iterator, or be a sequence.
    In the second form, the callable is called until it returns the sentinel.



In [96]:
l.__*? # methods of the iterable object

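The built-in iter() first looks for the object's \__iter\__ method; if there is none, it falls back to \__getitem\__, calling it with the indexes 0, 1, 2, ... until IndexError is raised.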
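
The \__getitem\__ fallback can be observed with a class that defines no \__iter\__ at all. In this minimal sketch, `Squares` is a hypothetical class invented for the illustration.

```python
class Squares:
    """Hypothetical sequence-like class that defines only __getitem__."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        if index >= self.n:
            # IndexError tells the fallback iterator where to stop.
            raise IndexError(index)
        return index * index

# The class has no __iter__, yet iter() still works through __getitem__.
print(hasattr(Squares, '__iter__'))
print(list(iter(Squares(4))))
```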

In [98]:
t = iter(l)
next(t) # the iterator's method

1

In [99]:
t.__next__()

2

In [101]:
t.next() # Python 2 syntax; not available in Python 3

AttributeError: 'list_iterator' object has no attribute 'next'

In [103]:
from collections.abc import Iterable, Iterator  # importing these from collections itself was removed in Python 3.10
Iterator.__abstractmethods__

frozenset({'__next__'})

In [104]:
Iterable.__abstractmethods__

frozenset({'__iter__'})

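**Example: weather lookup. Implement a fetch-on-demand strategy: wrap the temperatures of all the cities in one object that can be iterated with for**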

In [107]:
import requests
def getWeather(city):
    r = requests.get(u'http://wthrcdn.etouch.cn/weather_mini?city=' + city)
    data = r.json()['data']['forecast'][0]
    return '%s, %s, %s' %(city, data['low'], data['high'])

# [u'北京', u'上海', u'广州', u'长春']
print(getWeather(u'北京'))
print(getWeather(u'长春'))

北京, 低温 25℃, 高温 32℃
长春, 低温 25℃, 高温 33℃


In [108]:
class WeatherIterator(Iterator):
    def __init__(self, cities):
        self.cities = cities
        self.index = 0
        
    def getWeather(self, city):
        r = requests.get(u'http://wthrcdn.etouch.cn/weather_mini?city=' + city)
        data = r.json()['data']['forecast'][0]
        return '%s, %s, %s' %(city, data['low'], data['high'])
    
    def __next__(self):
        if self.index == len(self.cities):
            raise StopIteration
        city = self.cities[self.index]
        self.index += 1
        return self.getWeather(city)
    
class WeatherIterable(Iterable):
    def __init__(self, cities):
        self.cities = cities
        
    def __iter__(self):
        return WeatherIterator(self.cities)

In [111]:
cities = [u'北京', u'上海', u'广州', u'长春']
WeatherIterable(cities)

<__main__.WeatherIterable at 0x10e0e0400>

In [112]:
for x in WeatherIterable(cities):
    print(x)

北京, 低温 25℃, 高温 32℃
上海, 低温 27℃, 高温 33℃
广州, 低温 25℃, 高温 33℃
长春, 低温 25℃, 高温 33℃


In [114]:
weather_info = WeatherIterator(cities)
weather_info

<__main__.WeatherIterator at 0x10e0cfc50>

In [115]:
next(weather_info)

'北京, 低温 25℃, 高温 32℃'

In [116]:
next(weather_info)

'上海, 低温 27℃, 高温 33℃'

In [117]:
next(weather_info)

'广州, 低温 25℃, 高温 33℃'

In [118]:
next(weather_info)

'长春, 低温 25℃, 高温 33℃'

In [119]:
next(weather_info)

StopIteration: 
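
The same iterator and iterable split can be tried without network access by passing in the fetch function. The sketch below is an offline variant built on assumptions: `LazyIterator`, `LazyIterable`, and `fake_fetch` are hypothetical names, and `fake_fetch` only records its calls instead of contacting a weather service. It shows that nothing is fetched until an item is requested, and that the iterable hands out a fresh iterator on every pass while a single iterator is used up after one pass.

```python
from collections.abc import Iterable, Iterator

class LazyIterator(Iterator):
    def __init__(self, keys, fetch):
        self.keys = keys
        self.fetch = fetch
        self.index = 0

    def __next__(self):
        if self.index == len(self.keys):
            raise StopIteration
        key = self.keys[self.index]
        self.index += 1
        return self.fetch(key)  # the work happens only when asked for

class LazyIterable(Iterable):
    def __init__(self, keys, fetch):
        self.keys = keys
        self.fetch = fetch

    def __iter__(self):
        return LazyIterator(self.keys, self.fetch)

calls = []

def fake_fetch(city):
    # Stand-in for a network request; it just records what was asked for.
    calls.append(city)
    return city.upper()

cities = LazyIterable(['beijing', 'shanghai'], fake_fetch)
it = iter(cities)
print(calls)          # nothing has been fetched yet
print(next(it))       # fetches the first city only
print(list(cities))   # a second, independent pass over all cities
print(list(it))       # the first iterator resumes where it stopped
```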

## <span id='9'>9 How do you implement an iterable object with a generator function? </span>

In [128]:
def f():
    print('in f(), 1')
    yield 1
    
    print('in f(), 2')
    yield 2
    
    print('in f(), 3')
    yield 3
    
g = f()
g

<generator object f at 0x10e14d7d8>

In [129]:
next(g)

in f(), 1


1

In [130]:
next(g)
next(g)

in f(), 2
in f(), 3


3

In [131]:
next(g)

StopIteration: 

In [137]:
g = f()
for x in g:
    print(x)

in f(), 1
1
in f(), 2
2
in f(), 3
3


In [139]:
g.__iter__()

<generator object f at 0x10e14d990>

In [141]:
g.__iter__() is g

True

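#### Example: implement an iterable class that yields all the prime numbers in a given range
- Implement the class's \__iter\__ method as a generator function that yields one prime at a time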

In [144]:
class PrimeNumbers:
    def __init__(self, start, end):
        self.start = start
        self.end = end
        
    def isPrime(self, k):
        if k < 2:
            return False
        
        # + 1 so that floor(sqrt(k)) itself is tested as a divisor
        for i in range(2, int(k**0.5) + 1):
            if k % i == 0:
                return False
        
        return True
    
    def __iter__(self):
        for k in range(self.start, self.end+1):
            if self.isPrime(k):
                yield k
                
for x in PrimeNumbers(1, 20):
    print(x)

2
3
5
7
11
13
17
19


In [148]:
a = PrimeNumbers(1, 20)
a.__iter__()

<generator object PrimeNumbers.__iter__ at 0x10e19c678>
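
Because \__iter\__ is a generator function, every call returns a new generator, so the object can be traversed any number of times, whereas each individual generator is exhausted after one pass. The sketch below restates the class in compact form to stay self-contained; the method name `is_prime` and the use of `all(...)` are stylistic choices made for this illustration.

```python
class PrimeNumbers:
    def __init__(self, start, end):
        self.start = start
        self.end = end

    def is_prime(self, k):
        if k < 2:
            return False
        # Trial division up to and including floor(sqrt(k)).
        return all(k % i for i in range(2, int(k ** 0.5) + 1))

    def __iter__(self):
        for k in range(self.start, self.end + 1):
            if self.is_prime(k):
                yield k

primes = PrimeNumbers(1, 20)
first_pass = list(primes)    # uses one generator
second_pass = list(primes)   # a fresh generator, so this works again
print(first_pass, second_pass == first_pass)

gen = iter(primes)
print(list(gen))   # consumes this particular generator
print(list(gen))   # the same generator is now empty
```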