# 1. Pandas

- Definition: pandas is a software library written for the Python programming language for data manipulation and analysis.
- The name is derived from "panel data", an econometrics term for multidimensional structured data sets.
> balanced panel
![balanced panel](https://wikimedia.org/api/rest_v1/media/math/render/svg/de4ab9449dffb05244e681551e6f3ce710856ac6)
unbalanced panel
![unbalanced panel](https://wikimedia.org/api/rest_v1/media/math/render/svg/fad5580f0bc2deadc1a110b647dded40867600c0)

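- A library specialized for data analysis. Data analysis would otherwise call for another language such as R, but pandas lets you process and analyze data in Python.
- Key feature: you can set an index.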
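
As a minimal sketch of what setting an index means, the example below turns a column into the row index and looks up a row by label. The data and the column names `team` and `wins` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "team": ["A", "B", "C"],
    "wins": [10, 7, 4],
})

# Use the "team" column as the row index instead of the default 0..n-1.
indexed = df.set_index("team")

# Rows can now be looked up by label.
wins_b = indexed.loc["B", "wins"]
print(wins_b)
```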


## 1.1 10 Minutes to pandas (http://pandas.pydata.org/pandas-docs/stable/10min.html)
 - [Object Creation](http://pandas.pydata.org/pandas-docs/stable/10min.html#object-creation)
 - [Viewing Data](http://pandas.pydata.org/pandas-docs/stable/10min.html#viewing-data)
 - [Selection](http://pandas.pydata.org/pandas-docs/stable/10min.html#selection)
     - Selection by Label
     - Selection by Position
     - Boolean Indexing
     - Setting
 - [Missing Data](http://pandas.pydata.org/pandas-docs/stable/10min.html#missing-data)
 - [Operations](http://pandas.pydata.org/pandas-docs/stable/10min.html#operations)
     - Stats
     - Apply
     - Histogramming
     - String Methods
 - [Merge](http://pandas.pydata.org/pandas-docs/stable/10min.html#merge)
     - Concat
     - Join
     - Append
 - [Grouping](http://pandas.pydata.org/pandas-docs/stable/10min.html#grouping)
 - [Reshaping](http://pandas.pydata.org/pandas-docs/stable/10min.html#reshaping)
     - Stack
     - Pivot Tables
 - [Time Series](http://pandas.pydata.org/pandas-docs/stable/10min.html#time-series)
 - [Categoricals](http://pandas.pydata.org/pandas-docs/stable/10min.html#categoricals)
 - [Plotting](http://pandas.pydata.org/pandas-docs/stable/10min.html#plotting)
 - [Getting Data in/out](http://pandas.pydata.org/pandas-docs/stable/10min.html#getting-data-in-out)
 - [Gotchas](http://pandas.pydata.org/pandas-docs/stable/10min.html#gotchas)
 

## 1.1 Pandas
1. Relationship to dictionaries (key-value)
    - Series -> Dictionary : variable.to_dict()
    - Dictionary -> Series : Series(variable)
2. Finding null values : pd.isnull()
3. Adding Series together (values are aligned on the index) : series1 + series2
4. Naming a Series : ~.name = "name"
5. Naming the index : ~.index.name = "name"
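
The operations listed above can be sketched together as follows. The values and the names `s1`, `s2`, `total`, and `key` are made up for illustration.

```python
import pandas as pd

# Dictionary -> Series: keys become the index, values become the data.
s1 = pd.Series({"a": 1, "b": 2, "c": 3})
s2 = pd.Series({"b": 10, "c": 20, "d": 30})

# Series -> dictionary.
as_dict = s1.to_dict()

# Addition aligns on index labels; a label found in only one Series gives NaN.
total = s1 + s2

# pd.isnull marks the missing entries with True.
missing = pd.isnull(total)

# Name the Series and its index.
total.name = "total"
total.index.name = "key"

print(total)
print(missing)
```

Because `"a"` and `"d"` each appear in only one of the two Series, their sums are missing, which is why `pd.isnull` is useful right after adding Series together.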


## 1.2 Series
## 1.3 DataFrame
1. A table-like, spreadsheet-style data structure (2-dimensional)
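
A small sketch of the two-dimensional structure, assuming made-up values and hypothetical column names `open` and `close`: each dictionary key becomes a column, and both a single column and a single row come back as a Series.

```python
import pandas as pd

# Hypothetical values; each dictionary key becomes a column.
df = pd.DataFrame(
    {"open": [100, 102, 101], "close": [102, 101, 105]},
    index=["day1", "day2", "day3"],
)

print(df.shape)        # (rows, columns)
col = df["close"]      # one column is a Series
row = df.loc["day2"]   # one row is also a Series
print(row)
```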



## 1.4 Usage
### 1.4.1 Object Creation
### 1.4.2 Viewing Data

In [None]:
import pandas as pd
import pandas_datareader.data
import requests
import datetime

In [None]:
CODE='005930.KS'
df = pandas_datareader.data.DataReader(CODE, "yahoo", '1970-01-01', datetime.datetime.now())
df

In [None]:
df.columns

In [None]:
df.index

In [None]:
# .ix was removed in pandas 1.0; use .iloc for position-based access
df.iloc[0]

In [None]:
df.head(3)

In [None]:
df.tail(2)

In [None]:
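# Select rows by partial date string with .loc (df['2013'] looks up a column in recent pandas)
df.loc['2013']

In [None]:
df.loc['2017-01']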

In [None]:
df['2016-07-01':'2017-12-31']

In [None]:
df['2017-01-01':]

In [None]:
df.describe()
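
The row selections above depend on downloaded data. As a sketch that runs without network access, the example below builds a synthetic daily frame (the values and the `Close` column are made up) and applies the same kinds of position-based and date-based selection.

```python
import numpy as np
import pandas as pd

# Synthetic daily data standing in for downloaded prices (values are made up).
idx = pd.date_range("2016-01-01", "2017-12-31", freq="D")
prices = pd.DataFrame({"Close": np.arange(len(idx), dtype=float)}, index=idx)

first_row = prices.iloc[0]                      # by position
year_2017 = prices.loc["2017"]                  # partial string: a whole year
jan_2017 = prices.loc["2017-01"]                # partial string: one month
window = prices.loc["2016-07-01":"2017-12-31"]  # label slice, both ends inclusive

print(len(year_2017), len(jan_2017), len(window))
```

Unlike positional slices, label slices on a date index include the end date.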

## Example: Visualizing Stock Data

In [None]:
import pandas as pd
import pandas_datareader.data
import requests
import datetime
import matplotlib.pyplot as plt

CODE='005930.KS'
df = pandas_datareader.data.DataReader(CODE, "yahoo", '2017-01-01', datetime.datetime.now())

df['MA_5'] = df['Adj Close'].rolling(window=5, center=False).mean()
df['MA_20'] = df['Adj Close'].rolling(window=20, center=False).mean()
df['diff'] = df['MA_5'] - df['MA_20']
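
To see what the rolling means compute, here is a sketch on a tiny made-up series, with windows of 2 and 4 instead of 5 and 20 so the arithmetic is easy to check by hand.

```python
import pandas as pd

# Made-up closing prices, chosen so the averages are easy to verify.
close = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

ma_short = close.rolling(window=2).mean()  # NaN until 2 values are available
ma_long = close.rolling(window=4).mean()   # NaN until 4 values are available
diff = ma_short - ma_long                  # NaN wherever either average is NaN

print(pd.DataFrame({"close": close, "ma_short": ma_short,
                    "ma_long": ma_long, "diff": diff}))
```

The leading NaN values are the warm-up period of each window, which is why the plotting code fills them before drawing the signal line.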

In [None]:
fig = plt.gcf()
fig.set_size_inches(16,8)

# Price
price_chart = plt.subplot2grid((4,1),(0,0),rowspan=2)
price_chart.plot(df.index, df['Adj Close'], label = 'Adj Close')
price_chart.plot(df.index, df['MA_5'], label = 'MA_5')
price_chart.plot(df.index, df['MA_20'], label = 'MA_20')

plt.title("Samsung 2017")
plt.legend(loc='best')

vol_chart = plt.subplot2grid((4,1),(2,0), rowspan = 1)
vol_chart.bar(df.index, df['Volume'], color = 'c')

signal_chart = plt.subplot2grid((4,1), (3,0), rowspan=1)
signal_chart.plot(df.index, df['diff'].fillna(0), color = 'g')
plt.axhline(y=0, linestyle = '--', color = 'k')

prev_key = prev_val = 0
# Series.iteritems() was removed in pandas 2.0; items() is the replacement
for key, val in df['diff'][1:].items():
    if val == 0:
        continue
    elif val * prev_val < 0 and val > prev_val:
        print('GOLD', key, val)
        price_chart.annotate('Golden', xy = (key, df['MA_20'][key]), xytext=(10,-30), 
                             textcoords='offset points', arrowprops=dict(arrowstyle='-|>'))        
        signal_chart.annotate('BUY', xy = (key, df['diff'][key]), xytext=(10,-30),
                              textcoords='offset points', arrowprops=dict(arrowstyle='-|>'))        
    elif val * prev_val < 0 and val < prev_val:
        print('DEAD', key, val)
        price_chart.annotate('Dead', xy = (key, df['MA_20'][key]), xytext=(10,30),
                             textcoords='offset points', arrowprops=dict(arrowstyle='-|>'))
        signal_chart.annotate('Sell', xy = (key, df['diff'][key]), xytext=(10,30),
                             textcoords='offset points', arrowprops=dict(arrowstyle='-|>'))
    prev_key, prev_val = key, val
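
The loop above flags a golden cross when the difference turns from negative to positive and a dead cross when it turns from positive to negative. A vectorized sketch of the same idea, on made-up values, compares each value with the previous one via `shift`. It is a simplification: unlike the loop, it does not carry the previous sign across values that are exactly zero.

```python
import numpy as np
import pandas as pd

# Made-up difference series (short MA minus long MA); NaN marks the warm-up period.
diff = pd.Series(
    [np.nan, -2.0, -1.0, 1.5, 2.0, -0.5, -1.0],
    index=pd.date_range("2017-01-02", periods=7, freq="D"),
)

prev = diff.shift(1)  # previous day's difference

# Golden cross: the difference moves from negative to positive.
golden = diff[(prev < 0) & (diff > 0)].index
# Dead cross: the difference moves from positive to negative.
dead = diff[(prev > 0) & (diff < 0)].index

print("GOLD", list(golden))
print("DEAD", list(dead))
```

Comparisons against NaN evaluate to False, so the warm-up rows never produce a signal.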


In [None]:
plt.show()

In [None]:
df.tail(3)

1. Load the clipboard contents into a pandas DataFrame
 - Copy the text from a web page: <a href ="http://www.koreabaseball.com/teamrank/teamrank.aspx"> Baseball standings </a>

In [None]:
df = pd.read_clipboard()
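
`pd.read_clipboard()` needs a real system clipboard, so it is hard to try in a headless environment. It reads the clipboard text and passes it to `pd.read_csv`, so the parsing step can be sketched with a string instead. The table text below is made up and only stands in for what a copied standings table might look like.

```python
import io
import pandas as pd

# Made-up text standing in for a table copied from a web page.
copied = "rank team wins losses\n1 A 10 2\n2 B 8 4\n3 C 5 7\n"

# Parse the text as whitespace-separated columns, much as the clipboard text would be.
table = pd.read_csv(io.StringIO(copied), sep=r"\s+")

print(table)
```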
