# 1. Common pitfalls when updating DataFrame data

## (1) Code 1: the following code works as expected

In [None]:
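from pandas import DataFrame

a = {'a': 1, 'b': 1}  # both values are ints, so the frame has a single dtype
df = DataFrame([a])
print(df)
rec = df.loc[0]  # originally df.ix[0]; .ix was removed in pandas 1.0
rec['a'] = 100   # on the pandas version used here, rec is a view, so df changes too
print(df)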

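## (2) Code 2: the following code fails to update the DataFrame
The only difference between the two snippets is the type of the values in the dict: here `b` is the string `'1'` rather than the integer `1`, so the two columns have different dtypes. As a result, selecting the row with `ix` returns a copy rather than a view, and modifying that copy does not affect the original DataFrame.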

In [None]:
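from pandas import DataFrame

a = {'a': 1, 'b': '1'}  # b is now a string, so the columns have different dtypes
df = DataFrame([a])
print(df)
rec = df.loc[0]  # originally df.ix[0]; .ix was removed in pandas 1.0
rec['a'] = 100   # rec is a copy here, so df is left unchanged
print(df)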
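
The dtype difference between the two snippets above can be made visible directly. The following is a minimal sketch, assuming a reasonably recent pandas; the names `homogeneous` and `mixed` are illustrative and do not appear in the original notebook. It only inspects column dtypes, because whether a selected row is a view or a copy depends on the pandas version and should not be relied upon.

```python
import pandas as pd

# Same two records as in the snippets above (variable names are illustrative).
homogeneous = pd.DataFrame([{'a': 1, 'b': 1}])   # both columns hold integers
mixed = pd.DataFrame([{'a': 1, 'b': '1'}])       # column b holds a string

# Print the dtype of every column in each frame.
print(homogeneous.dtypes)
print(mixed.dtypes)

# Count the distinct column dtypes in each frame.
n_homogeneous = len(set(homogeneous.dtypes))
n_mixed = len(set(mixed.dtypes))
print(n_homogeneous, n_mixed)
```

When every column shares one dtype, a row can be backed by the same underlying array as the frame; with mixed dtypes, pandas has to build a new object to hold the row, which is why the second snippet ends up editing a copy.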

## (3) Further reading

I originally thought this was a bug and reported it to the pandas development team:<br/>
https://github.com/pydata/pandas/issues/11510

Documentation that explains this behavior:<br/>
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

# 2. Safe ways to update a DataFrame

In [35]:
from pandas import DataFrame
data = [ {'a': 1, 'b': '1'}, {'a': 2, 'b': '2'} ]
df = DataFrame(data)
df

|   | a | b |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 2 | 2 |


The following code fails to modify `df`:

In [36]:
rec = df.loc[0]  # originally df.ix[0]; .ix was removed in pandas 1.0
rec['a'] = 100   # modifies a copy of the row, not df
df

A value is trying to be set on a copy of a slice from a DataFrame

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


|   | a | b |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 2 | 2 |


## (1) Method 1: use a slice

The following code modifies `df` successfully, but raises a warning:

In [37]:
rec = df.loc[0:0]  # originally df.ix[0:0]; label slices include both endpoints
rec['a'] = 100
df

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


|   | a   | b |
|---|-----|---|
| 0 | 100 | 1 |
| 1 | 2   | 2 |


## (2) Method 2: use `loc` (the officially recommended method)

In [38]:
df.loc[1, 'a'] = 1000  # assign through a single .loc call on df itself
df

|   | a    | b |
|---|------|---|
| 0 | 100  | 1 |
| 1 | 1000 | 2 |
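
The same `.loc[row_indexer, col_indexer] = value` form also accepts a boolean mask as the row indexer, which covers the common case of updating every row that matches a condition. This is a minimal sketch on a fresh copy of the two-row frame; the threshold `10` is chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame([{'a': 1, 'b': '1'}, {'a': 2, 'b': '2'}])

# Update a single cell: row label 0, column 'a'.
df.loc[0, 'a'] = 100

# Update every row that matches a condition in a single indexing call.
df.loc[df['a'] < 10, 'a'] = 1000

print(df)
```

Because the row selection and the assignment happen in one call on `df` itself, there is no intermediate object that could turn out to be a copy.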


## (3) Method 3: use the `update` method

In [43]:
df = DataFrame(    
    {'a': range(0, 5), 'b': range(10, 15), 'c': range(20, 25)},
    index=range(5)
)
df

|   | a | b  | c  |
|---|---|----|----|
| 0 | 0 | 10 | 20 |
| 1 | 1 | 11 | 21 |
| 2 | 2 | 12 | 22 |
| 3 | 3 | 13 | 23 |
| 4 | 4 | 14 | 24 |


The following code leaves `df` unchanged and raises a warning:

In [44]:
toUpdate = df[df.b > 12]
toUpdate['c'] = 1000
df

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


|   | a | b  | c  |
|---|---|----|----|
| 0 | 0 | 10 | 20 |
| 1 | 1 | 11 | 21 |
| 2 | 2 | 12 | 22 |
| 3 | 3 | 13 | 23 |
| 4 | 4 | 14 | 24 |


After calling `update`, `df` is modified:

In [46]:
df.update(toUpdate)
df

|   | a | b  | c    |
|---|---|----|------|
| 0 | 0 | 10 | 20   |
| 1 | 1 | 11 | 21   |
| 2 | 2 | 12 | 22   |
| 3 | 3 | 13 | 1000 |
| 4 | 4 | 14 | 1000 |
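
The warning in the intermediate step can be avoided by taking an explicit copy of the filtered rows before editing them. This is a minimal sketch of that variant, not the notebook's own code; the name `to_update` and the call to `.copy()` are additions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': range(0, 5), 'b': range(10, 15), 'c': range(20, 25)})

# Take an explicit copy so that editing it is unambiguous.
to_update = df[df['b'] > 12].copy()
to_update['c'] = 1000

# Write the edited rows back; update() aligns on index and column labels
# and modifies df in place.
df.update(to_update)

print(df)
```

For a simple scalar assignment like this one, `df.loc[df['b'] > 12, 'c'] = 1000` gives the same result in one step; `update` is more useful when the filtered rows go through several edits before being written back.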
