### pandas study

Study date: August 24, 2019

Topics: pandas pct_change, diff

References
- pct_change : https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pct_change.html
- diff : https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.diff.html

Read the official documentation and follow along with the example code :)

In [1]:
import pandas as pd

### pct_change

```
DataFrame.pct_change(self, periods=1, fill_method='pad', limit=None, freq=None, **kwargs)
Percentage change between the current and a prior element.

Computes the percentage change from the immediately previous row by default. This is useful in comparing the percentage of change in a time series of elements.

Parameters

periods : int, default 1
Periods to shift for forming percent change.

fill_method : str, default ‘pad’
How to handle NAs before computing percent changes.

limit : int, default None
The number of consecutive NAs to fill before stopping.

freq : DateOffset, timedelta, or offset alias string, optional
Increment to use from time series API (e.g. ‘M’ or BDay()).

**kwargs
Additional keyword arguments are passed into DataFrame.shift or Series.shift.
```
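The definition above amounts to dividing each element by the element `periods` positions earlier and subtracting 1. Below is a minimal sketch of that idea, assuming a Series with no missing values; the helper name `manual_pct_change` is made up for illustration and is not part of pandas.

```python
import pandas as pd


def manual_pct_change(s: pd.Series, periods: int = 1) -> pd.Series:
    """Hypothetical helper: percentage change computed with shift()."""
    # shift(periods) moves the values down by `periods` rows, so each row
    # is divided by the value that sits `periods` rows before it.
    return s / s.shift(periods) - 1


s = pd.Series([90, 91, 95])

manual_1 = manual_pct_change(s, periods=1)
manual_2 = manual_pct_change(s, periods=2)

builtin_1 = s.pct_change(periods=1)
builtin_2 = s.pct_change(periods=2)
```

The first `periods` rows have no earlier element to compare against, which is why they come out as NaN in both versions.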

In [2]:
# example

s = pd.Series([90, 91, 95])
s

0    90
1    91
2    95
dtype: int64

In [3]:
s.pct_change()

0         NaN
1    0.011111
2    0.043956
dtype: float64

In [7]:
# How does it work? (current - previous) / previous

print((91-90)/90, (95-91)/91)

0.011111111111111112 0.04395604395604396


In [8]:
s.pct_change(periods=2)

0         NaN
1         NaN
2    0.055556
dtype: float64

In [9]:
# How does it work? Compare with the element two positions earlier.

print((95-90)/90)

0.05555555555555555


In [11]:
s = pd.Series([90, 91, None, 85])
s

0    90.0
1    91.0
2     NaN
3    85.0
dtype: float64

In [14]:
# Percentage change in a Series, with NAs filled by carrying the last valid observation forward.

s.pct_change(fill_method='ffill')

0         NaN
1    0.011111
2    0.000000
3   -0.065934
dtype: float64

In [16]:
# How does it work? The NaN is first replaced by 91, then the change is computed.

print((91-90)/90, (91-91)/91, (85-91)/91)

0.011111111111111112 0.0 -0.06593406593406594
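The arithmetic above can also be written as two explicit steps: forward-fill first, then take the percentage change of the filled values. This is only a sketch of that reading. As far as I know, recent pandas releases deprecate the `fill_method` argument, so filling explicitly is one way to get the same numbers without relying on it.

```python
import pandas as pd

s = pd.Series([90, 91, None, 85])

# Step 1: forward-fill, so the NaN at position 2 takes the last valid value.
filled = s.ffill()

# Step 2: ordinary percentage change on the filled values.
# fill_method=None asks pct_change not to do any filling of its own.
result = filled.pct_change(fill_method=None)
```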


In [18]:
# Percentage change in French franc, Deutsche Mark, and Italian lira from 1980-01-01 to 1980-03-01.

df = pd.DataFrame({
    'FR': [4.0405, 4.0963, 4.3149],
    'GR': [1.7246, 1.7482, 1.8519],
    'IT': [804.74, 810.01, 860.13]},
    index=['1980-01-01', '1980-02-01', '1980-03-01']
)

In [19]:
df

                FR      GR      IT
1980-01-01  4.0405  1.7246  804.74
1980-02-01  4.0963  1.7482  810.01
1980-03-01  4.3149  1.8519  860.13


In [20]:
df.pct_change()

                  FR        GR        IT
1980-01-01       NaN       NaN       NaN
1980-02-01  0.013810  0.013684  0.006549
1980-03-01  0.053365  0.059318  0.061876


In [21]:
# Percentage of change in GOOG and APPL stock volume. Shows computing the percentage change between columns.

df = pd.DataFrame({
    '2016': [1769950, 30586265],
    '2015': [1500923, 40912316],
    '2014': [1371819, 41403351]},
    index=['GOOG', 'APPL']
)
df

          2016      2015      2014
GOOG   1769950   1500923   1371819
APPL  30586265  40912316  41403351


In [22]:
df.pct_change(axis='columns')

      2016      2015      2014
GOOG   NaN -0.151997 -0.086016
APPL   NaN  0.337604  0.012002


In [26]:
print((1500923-1769950)/1769950, (1371819-1500923)/1500923)

-0.1519969490663578 -0.08601640457238646
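With `axis='columns'`, each value is compared with the value in the column to its left rather than the row above. One way to convince yourself of this is to transpose, take the usual row-wise change, and transpose back. The variable names here are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    '2016': [1769950, 30586265],
    '2015': [1500923, 40912316],
    '2014': [1371819, 41403351]},
    index=['GOOG', 'APPL']
)

# Column-wise percentage change, as in the cell above.
direct = df.pct_change(axis='columns')

# Same computation expressed as a row-wise change on the transposed frame.
by_transpose = df.T.pct_change().T
```

The first column is all NaN because there is no column to its left to compare against.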


---------------------------

### diff


```
DataFrame.diff(self, periods=1, axis=0)
First discrete difference of element.

Calculates the difference of a DataFrame element compared with another element in the DataFrame (default is the element in the same column of the previous row).

periods : int, default 1
Periods to shift for calculating difference, accepts negative values.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Take difference over rows (0) or columns (1).

New in version 0.16.1.

```
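Put differently, `diff` subtracts a shifted copy of the frame from the frame itself. The sketch below checks that reading for a few values of `periods`, including a negative one. The dictionary names are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 1, 2, 3, 5, 8],
                   'c': [1, 4, 9, 16, 25, 36]})

# Manual version: element minus the element `p` rows earlier
# (or later, when `p` is negative).
manual = {p: df - df.shift(p) for p in (1, 3, -1)}

# Built-in version for the same values of `periods`.
builtin = {p: df.diff(periods=p) for p in (1, 3, -1)}
```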

In [27]:
df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 1, 2, 3, 5, 8],
                   'c': [1, 4, 9, 16, 25, 36]}
                 )
df

   a  b   c
0  1  1   1
1  2  1   4
2  3  2   9
3  4  3  16
4  5  5  25
5  6  8  36


In [28]:
df.diff()

     a    b     c
0  NaN  NaN   NaN
1  1.0  0.0   3.0
2  1.0  1.0   5.0
3  1.0  1.0   7.0
4  1.0  2.0   9.0
5  1.0  3.0  11.0


In [29]:
# Difference with previous column

df.diff(axis=1)

    a    b     c
0 NaN  0.0   0.0
1 NaN -1.0   3.0
2 NaN -1.0   7.0
3 NaN -1.0  13.0
4 NaN  0.0  20.0
5 NaN  2.0  28.0


In [30]:
# Difference with the row 3 positions earlier

df.diff(periods=3)

     a    b     c
0  NaN  NaN   NaN
1  NaN  NaN   NaN
2  NaN  NaN   NaN
3  3.0  2.0  15.0
4  3.0  4.0  21.0
5  3.0  6.0  27.0


In [32]:
# Go backwards: difference with the following row

df.diff(periods=-1)

     a    b     c
0 -1.0  0.0  -3.0
1 -1.0 -1.0  -5.0
2 -1.0 -1.0  -7.0
3 -1.0 -2.0  -9.0
4 -1.0 -3.0 -11.0
5  NaN  NaN   NaN
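As a closing note tying the two methods together: the percentage change is the difference divided by the earlier value. The sketch below illustrates this on the first Series from these notes. The two results should agree up to floating-point rounding, not necessarily bit for bit.

```python
import pandas as pd

s = pd.Series([90, 91, 95])

# diff() gives current - previous; shift() gives previous.
via_diff = s.diff() / s.shift()

# pct_change() computes the same ratio directly.
via_pct_change = s.pct_change()
```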
