# 1. Pandas

- Definition: a Python library built for data manipulation and analysis.
- The name "pandas" comes from "panel data", an econometrics term for multidimensional, structured data sets.
> balanced panel
![balanced panel](https://wikimedia.org/api/rest_v1/media/math/render/svg/de4ab9449dffb05244e681551e6f3ce710856ac6)
unbalanced panel
![unbalanced panel](https://wikimedia.org/api/rest_v1/media/math/render/svg/fad5580f0bc2deadc1a110b647dded40867600c0)
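
The difference between the two kinds of panel can be sketched in pandas itself. The data below is hypothetical, and `is_balanced` is an illustrative helper, not a pandas function; it assumes the panel is indexed by (entity, period) with no duplicate observations.

```python
import pandas as pd

# Hypothetical long-format panel: one row per (person, year) observation.
balanced = pd.DataFrame({
    "person": [1, 1, 2, 2],
    "year": [2016, 2017, 2016, 2017],
    "income": [1300, 1600, 2000, 2000],
}).set_index(["person", "year"])

# Dropping the last observation leaves person 2 with a single year.
unbalanced = balanced.iloc[:-1]


def is_balanced(panel):
    """Return True if every entity has one observation for every period."""
    counts = panel.groupby(level=0).size()
    n_periods = panel.index.get_level_values(1).nunique()
    return bool((counts == n_periods).all())


print(is_balanced(balanced))    # True
print(is_balanced(unbalanced))  # False
```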

## 1.1 10 Minutes to pandas (http://pandas.pydata.org/pandas-docs/stable/10min.html)
 - [Object Creation](http://pandas.pydata.org/pandas-docs/stable/10min.html#object-creation)
 - [Viewing Data](http://pandas.pydata.org/pandas-docs/stable/10min.html#viewing-data)
 - [Selection](http://pandas.pydata.org/pandas-docs/stable/10min.html#selection)
     - Selection by Label
     - Selection by Position
     - Boolean Indexing
     - Setting
 - [Missing Data](http://pandas.pydata.org/pandas-docs/stable/10min.html#missing-data)
 - [Operations](http://pandas.pydata.org/pandas-docs/stable/10min.html#operations)
     - Stats
     - Apply
     - Histogramming
     - String Methods
 - [Merge](http://pandas.pydata.org/pandas-docs/stable/10min.html#merge)
     - Concat
     - Join
     - Append
 - [Grouping](http://pandas.pydata.org/pandas-docs/stable/10min.html#grouping)
 - [Reshaping](http://pandas.pydata.org/pandas-docs/stable/10min.html#reshaping)
     - Stack
     - Pivot Tables
 - [Time Series](http://pandas.pydata.org/pandas-docs/stable/10min.html#time-series)
 - [Categoricals](http://pandas.pydata.org/pandas-docs/stable/10min.html#categoricals)
 - [Plotting](http://pandas.pydata.org/pandas-docs/stable/10min.html#plotting)
 - [Getting Data In/Out](http://pandas.pydata.org/pandas-docs/stable/10min.html#getting-data-in-out)
 - [Gotchas](http://pandas.pydata.org/pandas-docs/stable/10min.html#gotchas)

### 1.1.1 Object Creation

In [4]:
import pandas as pd
import numpy as np
s = pd.Series([1,3,5,np.nan,6,8])
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))

### 1.1.2 Viewing Data

#### 1.1.2.1 Viewing the First and Last Rows

In [5]:
df.head()

Unnamed: 0,A,B,C,D
2013-01-01,0.581564,-2.359538,-0.592095,-0.07066
2013-01-02,-0.530759,0.850432,3.071119,-1.111024
2013-01-03,-1.37277,-0.943479,0.431011,1.418258
2013-01-04,-0.759652,1.52357,-1.351856,-1.171365
2013-01-05,-0.675534,2.117911,1.675931,-0.550272


In [6]:
df.tail(3)

Unnamed: 0,A,B,C,D
2013-01-04,-0.759652,1.52357,-1.351856,-1.171365
2013-01-05,-0.675534,2.117911,1.675931,-0.550272
2013-01-06,-0.186688,-0.341802,0.211513,-0.000795


#### 1.1.2.2 Viewing the Index, Columns, and Values

In [7]:
df.index

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [8]:
df.columns

Index(['A', 'B', 'C', 'D'], dtype='object')

In [9]:
df.values

array([[  5.81563738e-01,  -2.35953751e+00,  -5.92095482e-01,
         -7.06595426e-02],
       [ -5.30758993e-01,   8.50432428e-01,   3.07111864e+00,
         -1.11102415e+00],
       [ -1.37277001e+00,  -9.43479157e-01,   4.31011466e-01,
          1.41825832e+00],
       [ -7.59651913e-01,   1.52357019e+00,  -1.35185609e+00,
         -1.17136489e+00],
       [ -6.75534272e-01,   2.11791063e+00,   1.67593135e+00,
         -5.50272218e-01],
       [ -1.86687990e-01,  -3.41802025e-01,   2.11512997e-01,
         -7.94971804e-04]])

#### 1.1.2.3 Summary Statistics, Transposing, and Sorting

In [10]:
df.describe()

Unnamed: 0,A,B,C,D
count,6.0,6.0,6.0,6.0
mean,-0.49064,0.141182,0.57427,-0.247643
std,0.65243,1.672414,1.591879,0.954752
min,-1.37277,-2.359538,-1.351856,-1.171365
25%,-0.738623,-0.79306,-0.391193,-0.970836
50%,-0.603147,0.254315,0.321262,-0.310466
75%,-0.272706,1.355286,1.364701,-0.018261
max,0.581564,2.117911,3.071119,1.418258


In [11]:
df.T

Unnamed: 0,2013-01-01 00:00:00,2013-01-02 00:00:00,2013-01-03 00:00:00,2013-01-04 00:00:00,2013-01-05 00:00:00,2013-01-06 00:00:00
A,0.581564,-0.530759,-1.37277,-0.759652,-0.675534,-0.186688
B,-2.359538,0.850432,-0.943479,1.52357,2.117911,-0.341802
C,-0.592095,3.071119,0.431011,-1.351856,1.675931,0.211513
D,-0.07066,-1.111024,1.418258,-1.171365,-0.550272,-0.000795


In [12]:
df.sort_index(axis=1, ascending=False)

Unnamed: 0,D,C,B,A
2013-01-01,-0.07066,-0.592095,-2.359538,0.581564
2013-01-02,-1.111024,3.071119,0.850432,-0.530759
2013-01-03,1.418258,0.431011,-0.943479,-1.37277
2013-01-04,-1.171365,-1.351856,1.52357,-0.759652
2013-01-05,-0.550272,1.675931,2.117911,-0.675534
2013-01-06,-0.000795,0.211513,-0.341802,-0.186688


In [13]:
df.sort_values(by='B')

Unnamed: 0,A,B,C,D
2013-01-01,0.581564,-2.359538,-0.592095,-0.07066
2013-01-03,-1.37277,-0.943479,0.431011,1.418258
2013-01-06,-0.186688,-0.341802,0.211513,-0.000795
2013-01-02,-0.530759,0.850432,3.071119,-1.111024
2013-01-04,-0.759652,1.52357,-1.351856,-1.171365
2013-01-05,-0.675534,2.117911,1.675931,-0.550272


### 1.1.3 Selection
#### 1.1.3.1 List-Style Indexing (`[]`)

In [14]:
df['A']

2013-01-01    0.581564
2013-01-02   -0.530759
2013-01-03   -1.372770
2013-01-04   -0.759652
2013-01-05   -0.675534
2013-01-06   -0.186688
Freq: D, Name: A, dtype: float64

In [15]:
df[0:3]

Unnamed: 0,A,B,C,D
2013-01-01,0.581564,-2.359538,-0.592095,-0.07066
2013-01-02,-0.530759,0.850432,3.071119,-1.111024
2013-01-03,-1.37277,-0.943479,0.431011,1.418258


In [16]:
df['2013-01-02':'2013-01-03']

Unnamed: 0,A,B,C,D
2013-01-02,-0.530759,0.850432,3.071119,-1.111024
2013-01-03,-1.37277,-0.943479,0.431011,1.418258


#### 1.1.3.2 Selection by Label

In [17]:
df.loc[dates[0]]

A    0.581564
B   -2.359538
C   -0.592095
D   -0.070660
Name: 2013-01-01 00:00:00, dtype: float64

In [18]:
df.loc['2013-01-02']

A   -0.530759
B    0.850432
C    3.071119
D   -1.111024
Name: 2013-01-02 00:00:00, dtype: float64

In [19]:
df.loc[:,['A', 'B']]

Unnamed: 0,A,B
2013-01-01,0.581564,-2.359538
2013-01-02,-0.530759,0.850432
2013-01-03,-1.37277,-0.943479
2013-01-04,-0.759652,1.52357
2013-01-05,-0.675534,2.117911
2013-01-06,-0.186688,-0.341802


In [20]:
df.loc['20130102':'20130104',['A','B']]

Unnamed: 0,A,B
2013-01-02,-0.530759,0.850432
2013-01-03,-1.37277,-0.943479
2013-01-04,-0.759652,1.52357


In [21]:
df.loc['20130102',['A','B']]

A   -0.530759
B    0.850432
Name: 2013-01-02 00:00:00, dtype: float64

In [22]:
df.loc[dates[0],'A']

0.58156373835831165

For access to a single scalar value, `at` (shown below) is somewhat faster than `loc`.

In [23]:
df.at[dates[0],'A']

0.58156373835831165

#### 1.1.3.3 Selection by Position

In [24]:
df.iloc[3]

A   -0.759652
B    1.523570
C   -1.351856
D   -1.171365
Name: 2013-01-04 00:00:00, dtype: float64

In [25]:
df.iloc[3:5,0:2]

Unnamed: 0,A,B
2013-01-04,-0.759652,1.52357
2013-01-05,-0.675534,2.117911


In [26]:
df.iloc[[1,2,4],[0,2]]

Unnamed: 0,A,C
2013-01-02,-0.530759,3.071119
2013-01-03,-1.37277,0.431011
2013-01-05,-0.675534,1.675931


In [27]:
df.iloc[1:3,:]

Unnamed: 0,A,B,C,D
2013-01-02,-0.530759,0.850432,3.071119,-1.111024
2013-01-03,-1.37277,-0.943479,0.431011,1.418258


In [28]:
df.iloc[:,1:3]

Unnamed: 0,B,C
2013-01-01,-2.359538,-0.592095
2013-01-02,0.850432,3.071119
2013-01-03,-0.943479,0.431011
2013-01-04,1.52357,-1.351856
2013-01-05,2.117911,1.675931
2013-01-06,-0.341802,0.211513


In [29]:
df.iloc[1,1]

0.85043242775317796

For access to a single scalar value, `iat` (shown below) is faster than `iloc`.

In [30]:
df.iat[1,1]

0.85043242775317796
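
The relationship between the scalar accessors and their general counterparts can be checked with a small sketch. The frame below uses deterministic values instead of the random data above, and the timing uses the standard-library `timeit`; the absolute numbers depend on the machine and pandas version, so no expected timings are given.

```python
import timeit

import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list("ABCD"))

# Scalar access: .at/.iat return the same value as .loc/.iloc.
print(df.loc[dates[1], "B"] == df.at[dates[1], "B"])  # True
print(df.iloc[1, 1] == df.iat[1, 1])                  # True

# Rough timing comparison of label-based scalar access.
t_loc = timeit.timeit(lambda: df.loc[dates[1], "B"], number=2000)
t_at = timeit.timeit(lambda: df.at[dates[1], "B"], number=2000)
print(f"loc: {t_loc:.4f}s, at: {t_at:.4f}s")
```

`at` and `iat` accept only a single row/column pair, while `loc` and `iloc` also accept slices and lists, which is the price paid for the faster scalar path.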

#### 1.1.3.4 Boolean Indexing

In [31]:
df[df.A>0]

Unnamed: 0,A,B,C,D
2013-01-01,0.581564,-2.359538,-0.592095,-0.07066


In [32]:
df[df>0]

Unnamed: 0,A,B,C,D
2013-01-01,0.581564,,,
2013-01-02,,0.850432,3.071119,
2013-01-03,,,0.431011,1.418258
2013-01-04,,1.52357,,
2013-01-05,,2.117911,1.675931,
2013-01-06,,,0.211513,
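
The two masks above behave differently: a Series mask filters rows, while a DataFrame mask keeps the shape and blanks out cells. A minimal sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, -2.0, 3.0], "B": [-1.0, 5.0, -6.0]})

# A Series mask selects rows: only rows where A > 0 are kept.
rows = df[df["A"] > 0]

# A DataFrame mask keeps the shape and replaces non-matching cells with NaN.
cells = df[df > 0]

print(rows.index.tolist())             # [0, 2]
print(cells.isna().sum().tolist())     # [1, 2]
print(cells.equals(df.where(df > 0)))  # True
```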


In [33]:
df2 = df.copy()
df2['E'] = ['one', 'one','two','three','four','three']

In [34]:
df2

Unnamed: 0,A,B,C,D,E
2013-01-01,0.581564,-2.359538,-0.592095,-0.07066,one
2013-01-02,-0.530759,0.850432,3.071119,-1.111024,one
2013-01-03,-1.37277,-0.943479,0.431011,1.418258,two
2013-01-04,-0.759652,1.52357,-1.351856,-1.171365,three
2013-01-05,-0.675534,2.117911,1.675931,-0.550272,four
2013-01-06,-0.186688,-0.341802,0.211513,-0.000795,three


In [35]:
df2[df2['E'].isin(['two','four'])]

Unnamed: 0,A,B,C,D,E
2013-01-03,-1.37277,-0.943479,0.431011,1.418258,two
2013-01-05,-0.675534,2.117911,1.675931,-0.550272,four


#### 1.1.3.5 Setting

In [36]:
s1 = pd.Series([1,2,3,4,5,6], index=pd.date_range('20130102', periods=6))

In [37]:
s1

2013-01-02    1
2013-01-03    2
2013-01-04    3
2013-01-05    4
2013-01-06    5
2013-01-07    6
Freq: D, dtype: int64

In [38]:
df['F'] = s1

In [39]:
df.at[dates[0], 'A'] = 0

In [40]:
df.iat[0,1] = 0

In [41]:
df.loc[:,'D'] = np.array([5]* len(df))

In [42]:
df

Unnamed: 0,A,B,C,D,F
2013-01-01,0.0,0.0,-0.592095,5,
2013-01-02,-0.530759,0.850432,3.071119,5,1.0
2013-01-03,-1.37277,-0.943479,0.431011,5,2.0
2013-01-04,-0.759652,1.52357,-1.351856,5,3.0
2013-01-05,-0.675534,2.117911,1.675931,5,4.0
2013-01-06,-0.186688,-0.341802,0.211513,5,5.0
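
Column `F` is empty for 2013-01-01 because assigning a Series aligns it on the index rather than by position. A small sketch of that alignment, using shorter made-up data:

```python
import pandas as pd

dates = pd.date_range("20130101", periods=3)
df = pd.DataFrame({"A": [0.0, 1.0, 2.0]}, index=dates)

# The Series starts one day later than the DataFrame index.
s1 = pd.Series([10, 20, 30], index=pd.date_range("20130102", periods=3))

# Assignment aligns on the index: 2013-01-01 has no match and becomes NaN,
# and the value for 2013-01-04 is discarded because df has no such row.
df["F"] = s1
print(df["F"].tolist())  # [nan, 10.0, 20.0]
```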


In [43]:
df2 = df.copy()
df2[df2>0] = -df2

In [44]:
df2

Unnamed: 0,A,B,C,D,F
2013-01-01,0.0,0.0,-0.592095,-5,
2013-01-02,-0.530759,-0.850432,-3.071119,-5,-1.0
2013-01-03,-1.37277,-0.943479,-0.431011,-5,-2.0
2013-01-04,-0.759652,-1.52357,-1.351856,-5,-3.0
2013-01-05,-0.675534,-2.117911,-1.675931,-5,-4.0
2013-01-06,-0.186688,-0.341802,-0.211513,-5,-5.0


### 1.1.4 Missing Data

In [45]:
df1 = df.reindex(index=dates[0:4], columns=list(df.columns)+ ['E'])

In [46]:
df1

Unnamed: 0,A,B,C,D,F,E
2013-01-01,0.0,0.0,-0.592095,5,,
2013-01-02,-0.530759,0.850432,3.071119,5,1.0,
2013-01-03,-1.37277,-0.943479,0.431011,5,2.0,
2013-01-04,-0.759652,1.52357,-1.351856,5,3.0,


In [47]:
df1.loc[dates[0]:dates[1], 'E'] = 1

In [48]:
df1

Unnamed: 0,A,B,C,D,F,E
2013-01-01,0.0,0.0,-0.592095,5,,1.0
2013-01-02,-0.530759,0.850432,3.071119,5,1.0,1.0
2013-01-03,-1.37277,-0.943479,0.431011,5,2.0,
2013-01-04,-0.759652,1.52357,-1.351856,5,3.0,


In [49]:
# Drop rows that contain any NaN
df1.dropna(how='any')

Unnamed: 0,A,B,C,D,F,E
2013-01-02,-0.530759,0.850432,3.071119,5,1.0,1.0


In [50]:
# Fill missing values
df1.fillna(value=5)

Unnamed: 0,A,B,C,D,F,E
2013-01-01,0.0,0.0,-0.592095,5,5.0,1.0
2013-01-02,-0.530759,0.850432,3.071119,5,1.0,1.0
2013-01-03,-1.37277,-0.943479,0.431011,5,2.0,5.0
2013-01-04,-0.759652,1.52357,-1.351856,5,3.0,5.0


In [51]:
pd.isnull(df1)

Unnamed: 0,A,B,C,D,F,E
2013-01-01,False,False,False,False,True,False
2013-01-02,False,False,False,False,False,False
2013-01-03,False,False,False,False,False,True
2013-01-04,False,False,False,False,False,True
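
The `how` argument of `dropna` controls how strict the filter is. The following sketch, on a small hypothetical frame, contrasts `how='any'` with `how='all'` and shows `fillna`:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, np.nan, 6.0]})

any_dropped = df1.dropna(how="any")  # drop rows with at least one NaN
all_dropped = df1.dropna(how="all")  # drop rows where every value is NaN
filled = df1.fillna(value=0)         # replace NaN with a constant

print(len(any_dropped), len(all_dropped))  # 1 2
print(int(filled.isna().sum().sum()))      # 0
```

None of these calls modify `df1`; each returns a new DataFrame.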


### 1.1.5 Operations
#### 1.1.5.1 Stats

In [52]:
# Mean of each column
df.mean()

A   -0.587567
B    0.534439
C    0.574270
D    5.000000
F    3.000000
dtype: float64

In [53]:
# Mean of each row (index)
df.mean(1)

2013-01-01    1.101976
2013-01-02    1.878158
2013-01-03    1.022952
2013-01-04    1.482412
2013-01-05    2.423662
2013-01-06    1.936605
Freq: D, dtype: float64

In [54]:
s = pd.Series([1,3,5,np.nan,6,8], index=dates).shift(2)

In [55]:
pd.Series([1,3,5,np.nan,6,8], index=dates)

2013-01-01    1.0
2013-01-02    3.0
2013-01-03    5.0
2013-01-04    NaN
2013-01-05    6.0
2013-01-06    8.0
Freq: D, dtype: float64

In [56]:
df.sub(s, axis='index')

Unnamed: 0,A,B,C,D,F
2013-01-01,,,,,
2013-01-02,,,,,
2013-01-03,-2.37277,-1.943479,-0.568989,4.0,1.0
2013-01-04,-3.759652,-1.47643,-4.351856,2.0,0.0
2013-01-05,-5.675534,-2.882089,-3.324069,0.0,-1.0
2013-01-06,,,,,


In [57]:
df.sub(s, axis='index').dropna()

Unnamed: 0,A,B,C,D,F
2013-01-03,-2.37277,-1.943479,-0.568989,4.0,1.0
2013-01-04,-3.759652,-1.47643,-4.351856,2.0,0.0
2013-01-05,-5.675534,-2.882089,-3.324069,0.0,-1.0
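
The NaN rows above come from two sources: `shift(2)` leaves the first two positions empty, and the original Series already had a NaN. A smaller sketch with made-up numbers shows the shift and the row-wise subtraction:

```python
import pandas as pd

dates = pd.date_range("20130101", periods=4)
df = pd.DataFrame(
    {"A": [10.0, 20.0, 30.0, 40.0], "B": [1.0, 2.0, 3.0, 4.0]}, index=dates
)

# shift(2) moves values two rows down, leaving NaN in the first two rows.
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=dates).shift(2)

# axis='index' aligns s with the rows and subtracts it from every column.
result = df.sub(s, axis="index")
print(result["A"].tolist())  # [nan, nan, 29.0, 38.0]
print(result["B"].tolist())  # [nan, nan, 2.0, 2.0]
```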


#### 1.1.5.2 Apply

In [None]:
df.apply(np.cumsum)

In [None]:
df.apply(lambda x:x.max() - x.min())
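
Since the cells above have no recorded output, here is a sketch of the same two calls on a small deterministic frame (values are made up), plus the row-wise variant with `axis=1`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 30, 20]})

cumulative = df.apply(np.cumsum)                           # column-wise running total
col_range = df.apply(lambda x: x.max() - x.min())          # one value per column
row_range = df.apply(lambda x: x.max() - x.min(), axis=1)  # one value per row

print(cumulative["B"].tolist())  # [10, 40, 60]
print(col_range.tolist())        # [2, 20]
print(row_range.tolist())        # [9, 28, 17]
```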

#### 1.1.5.3 Histogramming

In [None]:
s = pd.Series(np.random.randint(0, 7, size=10))

In [None]:
s

In [None]:
# Frequency of each value
s.value_counts()

#### 1.1.5.4 String Methods

In [None]:
s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

In [None]:
s.str.lower()

### 1.1.6 Merge

#### 1.1.6.1 Concat

In [None]:
df = pd.DataFrame(np.random.randn(10,4))

In [None]:
df

In [None]:
pieces = [df[:3], df[3:7],df[7:]]

In [None]:
pd.concat(pieces)

#### 1.1.6.2 Join

In [None]:
left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})

In [None]:
left

In [None]:
right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})

In [None]:
right

In [None]:
pd.merge(left, right, on='key')
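
Because `key` is duplicated on both sides, the merge produces every left/right combination. A sketch that reruns the same merge and inspects the result:

```python
import pandas as pd

left = pd.DataFrame({"key": ["foo", "foo"], "lval": [1, 2]})
right = pd.DataFrame({"key": ["foo", "foo"], "rval": [4, 5]})

# Two matching rows on each side yield 2 x 2 = 4 rows.
merged = pd.merge(left, right, on="key")
print(merged.shape)                             # (4, 3)
print(merged[["lval", "rval"]].values.tolist()) # [[1, 4], [1, 5], [2, 4], [2, 5]]
```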

#### 1.1.6.3 Append

In [None]:
df = pd.DataFrame(np.random.randn(8, 4), columns=['A','B','C','D'])

In [None]:
df

In [None]:
s = df.iloc[3]

In [None]:
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result.
pd.concat([df, s.to_frame().T], ignore_index=True)

### 1.1.7 Grouping

In [None]:
df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                          'foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'three',
                          'two', 'two', 'one', 'three'],
                   'C' : np.random.randn(8),
                   'D' : np.random.randn(8)})

In [None]:
df

In [None]:
df.groupby('A').sum()

In [None]:
df.groupby(['A','B']).sum()
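
A deterministic sketch of the same grouping, with made-up integers in place of the random columns so the sums can be checked by hand:

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "B": ["one", "one", "two", "one"],
    "C": [1, 2, 3, 4],
})

# Select the numeric column explicitly so the string column B is left out
# of the aggregation.
by_a = df.groupby("A")["C"].sum()
by_ab = df.groupby(["A", "B"])["C"].sum()

print(by_a.to_dict())   # {'bar': 6, 'foo': 4}
print(by_ab.to_dict())  # {('bar', 'one'): 6, ('foo', 'one'): 1, ('foo', 'two'): 3}
```

Grouping by a list of columns produces a hierarchical (MultiIndex) result, one level per grouping column.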

### 1.1.8 Reshaping

#### 1.1.8.1 Stack

In [None]:
tuples = list(zip(*[['bar', 'bar', 'baz', 'baz',
                     'foo', 'foo', 'qux', 'qux'],
                    ['one', 'two', 'one', 'two',
                     'one', 'two', 'one', 'two']]))

In [None]:
tuples

In [None]:
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])

In [None]:
index

In [None]:
df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['A', 'B'])

In [None]:
df2 = df[:4]

In [None]:
df2

In [None]:
# Move the columns into the index
stacked = df2.stack()

In [None]:
stacked

In [None]:
type(stacked)

In [None]:
stacked.index

In [None]:
type(stacked.unstack())

In [None]:
stacked

In [None]:
stacked.unstack()

In [None]:
stacked.unstack(1)

In [None]:
stacked.unstack(2)
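
The cells above can be summarized in a sketch with fixed values: `stack()` adds the column labels as a third index level, and `unstack(level)` moves the chosen level back into the columns. The numbers are made up for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("baz", "one"), ("baz", "two")],
    names=["first", "second"],
)
df2 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]}, index=index)

stacked = df2.stack()         # columns become the innermost index level
print(stacked.index.nlevels)  # 3

# unstack() with no argument moves the innermost level back to the columns.
print(stacked.unstack().values.tolist() == df2.values.tolist())  # True

# unstack(0) moves the outermost level ('first') to the columns instead.
print(stacked.unstack(0).columns.tolist())  # ['bar', 'baz']
```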

#### 1.1.8.2 Pivot Tables

In [None]:
df = pd.DataFrame({'A' : ['one', 'one', 'two', 'three'] * 3,
                   'B' : ['A', 'B', 'C'] * 4,
                   'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,
                   'D' : np.random.randn(12),
                   'E' : np.random.randn(12)})

In [None]:
df

In [None]:
pd.pivot_table(df, values='D', index=['A','B'], columns=['C'])
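
`pivot_table` aggregates with the mean by default and leaves NaN where a combination has no rows. A sketch on a small hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["one", "one", "two", "two"],
    "C": ["foo", "bar", "foo", "foo"],
    "D": [1.0, 2.0, 3.0, 5.0],
})

# (two, foo) has two rows, which are averaged; (two, bar) has none.
table = pd.pivot_table(df, values="D", index="A", columns="C")
print(table.loc["two", "foo"])           # 4.0
print(pd.isna(table.loc["two", "bar"]))  # True
```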

### 1.1.9 Time Series

In [None]:
rng = pd.date_range('1/1/2012', periods=100, freq='S')

In [None]:
ts = pd.Series(np.random.randint(0,500,len(rng)), index=rng)

In [None]:
ts.resample('5Min').sum()
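
To make the binning visible, the sketch below resamples a series of constant ones instead of random integers. It uses the lowercase frequency aliases `'s'` and `'min'`; treating them as equivalent to `'S'` and `'Min'` above is an assumption about the installed pandas version.

```python
import pandas as pd

# 100 one-second observations, each with value 1, starting at midnight.
rng = pd.date_range("2012-01-01", periods=100, freq="s")
ts = pd.Series(1, index=rng)

# All 100 seconds fall inside the first 5-minute bin.
five_min = ts.resample("5min").sum()
print(len(five_min), int(five_min.iloc[0]))  # 1 100

# 30-second bins split the same data into 30 + 30 + 30 + 10.
print(ts.resample("30s").sum().tolist())  # [30, 30, 30, 10]
```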

In [None]:
rng = pd.date_range('3/6/2012 00:00', periods=5, freq='D')

In [None]:
ts = pd.Series(np.random.randn(len(rng)), rng)

In [None]:
rng

In [None]:
ts

In [None]:
ts_utc = ts.tz_localize('UTC')

In [None]:
ts_utc

In [None]:
ts_utc.tz_convert('US/Eastern')

In [None]:
rng = pd.date_range('1/1/2012', periods=5, freq = 'M')

In [None]:
ts = pd.Series(np.random.randn(len(rng)), index=rng)

In [None]:
ts

In [None]:
ps = ts.to_period()

In [None]:
ps

In [None]:
ps.to_timestamp()

In [None]:
prng = pd.period_range('1990Q1', '2000Q4', freq='Q-NOV')

In [None]:
ts = pd.Series(np.random.randn(len(prng)), prng)

In [None]:
ts.index = (prng.asfreq('M', 'e') + 1).asfreq('H', 's') + 9

In [None]:
ts.head()

### 1.1.10 Categoricals

In [None]:
df = pd.DataFrame({"id":[1,2,3,4,5,6], "raw_grade":['a', 'b', 'b', 'a', 'a', 'e']})

In [None]:
df["grade"] = df["raw_grade"].astype("category")

In [None]:
df["grade"]

In [None]:
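# Assigning to .cat.categories is not supported in pandas 2.0+; rename instead.
df["grade"] = df["grade"].cat.rename_categories(["very good", "good", "very bad"])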

In [None]:
df['grade']

In [None]:
df

In [None]:
df.dtypes

In [None]:
df.sort_values(by='grade')

In [None]:
df.groupby("grade").size()

In [None]:
df.groupby("grade")
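
The sorting and grouping cells above depend on category order rather than alphabetical order. The sketch below reuses the same grades and additionally calls `set_categories` to add unused levels, a step taken from the upstream tutorial and not from the cells above:

```python
import pandas as pd

grades = pd.Series(["a", "b", "b", "a", "a", "e"]).astype("category")

# rename_categories returns a new Series; names are matched by position (a, b, e).
grades = grades.cat.rename_categories(["very good", "good", "very bad"])

# set_categories fixes the full category list and its order, including unused levels.
grades = grades.cat.set_categories(["very bad", "bad", "medium", "good", "very good"])

# Sorting follows category order, not alphabetical order.
print(grades.sort_values().tolist())
# ['very bad', 'good', 'good', 'very good', 'very good', 'very good']

# observed=False keeps empty categories in the grouped result.
print(grades.groupby(grades, observed=False).size().to_dict())
# {'very bad': 1, 'bad': 0, 'medium': 0, 'good': 2, 'very good': 3}
```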

### 1.1.11 Plotting

In [None]:
import matplotlib.pyplot as plt

In [None]:
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))

In [None]:
ts = ts.cumsum()

In [None]:
ts.plot()

In [None]:
plt.figure()

In [None]:
plt.show()

In [None]:
df = pd.DataFrame(np.random.randn(1000,4), index=ts.index, columns=['A', 'B', 'C', 'D'])

In [None]:
df = df.cumsum()

In [None]:
plt.figure()

In [None]:
df.plot()

In [None]:
plt.legend(loc='best')

In [None]:
plt.show()

### 1.1.12 Getting Data In/Out

In [None]:
df.to_csv('foo.csv')

In [None]:
pd.read_csv('foo.csv')

In [None]:
df.to_json('foo.json')

In [None]:
pd.read_json('foo.json')

In [None]:
df.to_hdf('foo.h5', 'df')

In [None]:
pd.read_hdf('foo.h5','df')

In [None]:
df.to_pickle('foo.pic')

In [None]:
pd.read_pickle('foo.pic')

In [None]:
df.to_excel('foo.xlsx', sheet_name='Sheet1')

In [None]:
pd.read_excel('foo.xlsx', 'Sheet1', index_col=None, na_values=['NA'])

#### File Size Comparison

In [None]:
!dir foo*
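
`!dir` only works in a Windows shell. A platform-independent sketch of the same comparison uses `os.path.getsize`. It writes to a temporary directory and covers only CSV, JSON, and pickle, on the assumption that the optional HDF5 and Excel dependencies may not be installed.

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 4), columns=list("ABCD"))

# Write each format into a temporary directory and record the size in bytes.
sizes = {}
with tempfile.TemporaryDirectory() as tmp:
    writers = {
        "foo.csv": df.to_csv,
        "foo.json": df.to_json,
        "foo.pic": df.to_pickle,
    }
    for name, write in writers.items():
        path = os.path.join(tmp, name)
        write(path)
        sizes[name] = os.path.getsize(path)

# Print the formats from smallest to largest file.
for name, size in sorted(sizes.items(), key=lambda item: item[1]):
    print(f"{name}: {size} bytes")
```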

### 1.1.13 Gotchas

In [None]:
if pd.Series([False, True, False]) is not None:
    print("I was not None")

In [None]:
if pd.Series([False, True, False]).any():
    print("I am any")

In [None]:
pd.Series([True]).bool()

In [None]:
pd.Series([False]).bool()

In [None]:
pd.DataFrame([[True]]).bool()

In [None]:
pd.DataFrame([[False]]).bool()
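
The reason these explicit forms exist is that a Series with several elements has no single truth value. A short sketch of the error and the usual alternatives:

```python
import pandas as pd

s = pd.Series([False, True, False])

# Using a multi-element Series directly in a boolean context raises ValueError.
try:
    if s:
        pass
    raised = False
except ValueError:
    raised = True

print(raised)         # True
print(bool(s.any()))  # True: at least one element is True
print(bool(s.all()))  # False: not every element is True
print(s.empty)        # False: the Series has elements
```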

## 1.2 전체 API 목록

http://pandas.pydata.org/pandas-docs/stable/api.html

## Example: Plotting Stock Data

In [None]:
import pandas as pd
import pandas_datareader.data
import requests
import datetime
import matplotlib.pyplot as plt

CODE='005930.KS'
df = pandas_datareader.data.DataReader(CODE, "yahoo", '2017-01-01', datetime.datetime.now())

df['MA_5'] = df['Adj Close'].rolling(window=5, center=False).mean()
df['MA_20'] = df['Adj Close'].rolling(window=20, center=False).mean()
df['diff'] = df['MA_5'] - df['MA_20']
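
The moving averages above start with NaN values because a window of size *n* needs *n* observations before it can produce a mean. A sketch with made-up prices (not downloaded market data):

```python
import pandas as pd

prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# The first window - 1 entries are NaN; each later entry averages the last 3 prices.
ma3 = prices.rolling(window=3).mean()
print(ma3.tolist())  # [nan, nan, 2.0, 3.0, 4.0, 5.0]
```

This is why the plotting code calls `fillna(0)` on the `diff` column before drawing the signal chart.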

In [None]:
fig = plt.gcf()
fig.set_size_inches(16,8)

# Price
price_chart = plt.subplot2grid((4,1),(0,0),rowspan=2)
price_chart.plot(df.index, df['Adj Close'], label = 'Adj Close')
price_chart.plot(df.index, df['MA_5'], label = 'MA_5')
price_chart.plot(df.index, df['MA_20'], label = 'MA_20')

plt.title("Samsung 2017")
plt.legend(loc='best')

vol_chart = plt.subplot2grid((4,1),(2,0), rowspan = 1)
vol_chart.bar(df.index, df['Volume'], color = 'c')

signal_chart = plt.subplot2grid((4,1), (3,0), rowspan=1)
signal_chart.plot(df.index, df['diff'].fillna(0), color = 'g')
plt.axhline(y=0, linestyle = '--', color = 'k')

prev_key = prev_val = 0
for key, val in df['diff'][1:].items():
    if val == 0:
        continue
    elif val * prev_val < 0 and val > prev_val:
        print('GOLD', key, val)
        price_chart.annotate('Golden', xy = (key, df['MA_20'][key]), xytext=(10,-30), 
                             textcoords='offset points', arrowprops=dict(arrowstyle='-|>'))        
        signal_chart.annotate('BUY', xy = (key, df['diff'][key]), xytext=(10,-30),
                              textcoords='offset points', arrowprops=dict(arrowstyle='-|>'))        
    elif val * prev_val < 0 and val < prev_val:
        print('DEAD', key, val)
        price_chart.annotate('Dead', xy = (key, df['MA_20'][key]), xytext=(10,30),
                             textcoords='offset points', arrowprops=dict(arrowstyle='-|>'))
        signal_chart.annotate('Sell', xy = (key, df['diff'][key]), xytext=(10,30),
                             textcoords='offset points', arrowprops=dict(arrowstyle='-|>'))
    prev_key, prev_val = key, val
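
The crossover test in the loop can also be written without iteration. The sketch below is a vectorized variant on hypothetical values, not the loop's exact logic: unlike the loop, it does not skip over zero values when looking for the previous sign.

```python
import pandas as pd

# Hypothetical difference between short and long moving averages.
diff = pd.Series([-2.0, -1.0, 1.0, 2.0, -1.0, -3.0, 0.5])

prev = diff.shift(1)
golden = (prev < 0) & (diff > 0)  # crossed from negative to positive (buy signal)
dead = (prev > 0) & (diff < 0)    # crossed from positive to negative (sell signal)

print(diff.index[golden].tolist())  # [2, 6]
print(diff.index[dead].tolist())    # [4]
```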


In [None]:
plt.show()

In [None]:
df.tail(3)

In [None]:
plt.show()