Pandas logical operations: The truth value of a Series is ambiguous

The error occurs because Python's `or` and `and` operators need a truth value for each operand, that is, a single True or False.

However, df['pop']>3 does not return True or False. It returns another Series with one boolean per row, so pandas treats using it with `and` / `or` as ambiguous.
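As a minimal sketch of why this fails, the snippet below uses a small Series `s` made up for illustration (it is not part of the notebook's data). The comparison yields one boolean per element, and asking Python for a single truth value of that result raises the error.

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration.
s = pd.Series([2.5, 4.3, 3.2])

mask = s > 3                  # element-wise comparison: a boolean Series
mask_values = mask.tolist()   # one True/False per element

# `and` / `or` call bool() on their operands. pandas refuses to guess
# whether that should mean "any element is True" or "all elements are True".
try:
    bool(mask)
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)

print(mask_values)
print(raised, message)
```

If a single truth value is really what is wanted, `mask.any()` or `mask.all()` states the intent explicitly.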

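So use the bitwise operators instead, which pandas applies element-wise: | (or) and & (and). Wrap each comparison in parentheses, because & and | bind more tightly than comparison operators such as > and <: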

In [1]:
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [2]:
data = {'state': ['US','Ohio','Nevada'], 'pop':[2.5, 4.3, 3.2]}
df = pd.DataFrame(data)
df

   pop   state
0  2.5      US
1  4.3    Ohio
2  3.2  Nevada


In [3]:
df[(df['pop']>3) and (df['pop']<4)]  # wrong: `and` needs a single truth value

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

In [4]:
df[(df['pop']>3) & (df['pop']<4)]  # correct: & combines the masks element-wise

   pop   state
2  3.2  Nevada
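The same pattern extends to the other element-wise operators. The sketch below rebuilds the notebook's `df` and shows `|` (or) and `~` (not). It also shows `Series.between` as one possible shorthand for a range check. The variable names `either`, `negated` and `in_range` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'state': ['US', 'Ohio', 'Nevada'], 'pop': [2.5, 4.3, 3.2]})

# OR: rows where pop is below 3 or above 4.
either = df[(df['pop'] < 3) | (df['pop'] > 4)]

# NOT: rows where pop is not greater than 3.
negated = df[~(df['pop'] > 3)]

# Range check. Note that between() includes both endpoints by default,
# unlike the strict (> 3) & (< 4) test; no value here sits on a boundary.
in_range = df[df['pop'].between(3, 4)]

print(either)
print(negated)
print(in_range)
```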


# UserWarning:
Boolean Series key will be reindexed to match DataFrame index.

In [5]:
df[df['pop']>3][df['pop']<4]  # warning: the second boolean key is reindexed to match the already-filtered frame


   pop   state
2  3.2  Nevada
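The warning appears because the second mask is built from the full `df` (index 0, 1, 2) but is applied to a frame that has already lost row 0. A sketch of two ways to get the same rows with matching indexes, using illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({'state': ['US', 'Ohio', 'Nevada'], 'pop': [2.5, 4.3, 3.2]})

# Option 1: build the second mask from the already-filtered frame,
# so the mask and the frame share the same index.
first = df[df['pop'] > 3]            # keeps rows 1 and 2
chained = first[first['pop'] < 4]

# Option 2: combine both conditions into one mask on the original frame.
combined = df.loc[(df['pop'] > 3) & (df['pop'] < 4)]

print(chained)
print(combined)
```

Option 2 is usually the simpler choice, since it filters in a single step and never mixes masks from different frames.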


# Filtering rows by string content with the str accessor

In [7]:
df[df.state.str.find('U') != 0]  # keep rows whose state does not start with 'U'

   pop   state
1  4.3    Ohio
2  3.2  Nevada
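`str.find` returns the position of the first match, or -1 when there is none, so `!= 0` means "does not start with 'U'". As a sketch of a possibly more readable alternative, `str.startswith` and `str.contains` return boolean masks directly, and these combine with `~`, `&` and `|` like any other mask. The variable names below are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'state': ['US', 'Ohio', 'Nevada'], 'pop': [2.5, 4.3, 3.2]})

# Position of 'U' in each value: 0 for 'US', -1 where 'U' does not occur.
positions = df['state'].str.find('U').tolist()

# Same filter as find('U') != 0, written with a boolean mask.
not_starting_with_u = df[~df['state'].str.startswith('U')]

# Literal substring match anywhere in the value (regex=False disables regex parsing).
contains_a = df[df['state'].str.contains('a', regex=False)]

print(positions)
print(not_starting_with_u)
print(contains_a)
```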
