Using ZigZag
===
[ZigZag](https://pypi.python.org/pypi/ZigZag) is a (very) small library [I](http://twitter.com/generativist) wrote for calculating the peaks and valleys of a sequence (e.g. time series data). It can also calculate the [maximum drawdown](http://en.wikipedia.org/wiki/Drawdown_(economics)), a useful metric for trading analysis. The repository is on GitHub at [https://github.com/jbn/ZigZag](https://github.com/jbn/ZigZag). Prior to version `0.1.4` it optionally used [`numba`](https://github.com/numba/numba); starting with version `0.1.4`, I switched to [`Cython`](http://cython.org/).

This notebook demonstrates how to use ZigZag, and draws attention to a few caveats.

Installation
---
`pip install zigzag`

Basic Usage
---

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from zigzag import *

In [None]:
# This is not necessary to use zigzag. It's only here so that
# this example is reproducible.
np.random.seed(1997)

In [None]:
X = np.cumprod(1 + np.random.randn(100) * 0.01)
pivots = peak_valley_pivots(X, 0.03, -0.03)

In [None]:
from stk.dao import Daily
from stk.graphs import column_show, Candle

In [None]:
# Load the daily data for code '600004' and keep the rows dated after 2022-01-01.
daily = Daily()
df = daily.read_by_code('600004')
df = df[df.date > '2022-01-01'].copy().reset_index(drop=True)[['date', 'close', 'open', 'high', 'low', 'chg']]
# column_show([Candle('byjc', df).renderV2()])

In [None]:
df

In [None]:
def plot_pivots(X, pivots):
    plt.xlim(0, len(X))
    plt.ylim(X.min()*0.99, X.max()*1.01)
    plt.plot(np.arange(len(X)), X, 'k:', alpha=0.5)
    plt.plot(np.arange(len(X))[pivots != 0], X[pivots != 0], 'k-')
    plt.scatter(np.arange(len(X))[pivots == 1], X[pivots == 1], color='g')
    plt.scatter(np.arange(len(X))[pivots == -1], X[pivots == -1], color='r')

In [None]:
X = df.close.to_numpy()
pivots = peak_valley_pivots(X, 0.15, -0.1)

The following plot illustrates how the sequence was annotated. 

In [None]:
plot_pivots(X, pivots)

In [None]:
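# Skip a peak at the very first index (no trough precedes it) and a
# trough at the very last index (no peak follows it).
idx_peak = [i for i, v in enumerate(pivots) if v == 1 and i != 0]
idx_trough = [i for i, v in enumerate(pivots) if v == -1 and i != len(pivots) - 1]
# Pair each trough with the peak that follows it.
idx = list(zip(idx_trough, idx_peak))
print(idx)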

In [None]:
idx_peak = [peak for _, peak in idx]
idx_trough = [trough for trough, _ in idx]
idx_peak, idx_trough

In [None]:
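def cal_percentage(idx_pairs, df):
    """Return one entry per row of df: '' for each trough, then the
    trough-to-peak fractional change for the peak that follows it."""
    change = []
    for trough_idx, peak_idx in idx_pairs:
        peak = df.loc[peak_idx].close
        trough = df.loc[trough_idx].close
        change.append('')  # trough row: nothing to report
        change.append((peak - trough) / trough)
    return change


# Keep only the pivot rows, in chronological order (trough, peak, trough, ...).
df2 = df.iloc[idx_peak + idx_trough].sort_index()
df2['change'] = cal_percentage(idx, df2)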
df2

In [None]:
plot_pivots(X, pivots)

The following shows how you can use `pivots_to_modes` to inspect the segments.

In [None]:
modes = pivots_to_modes(pivots)
pd.Series(X).pct_change().groupby(modes).describe()
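To make the idea of a "mode" concrete, here is a minimal pure-Python sketch of one plausible labelling scheme: every point is tagged `1` while the series is rising towards a peak and `-1` while it is falling towards a valley. The function name `label_modes` is hypothetical, and the handling of the first point and of the pivot points themselves is an assumption; the library's own `pivots_to_modes` may differ in those details.

```python
def label_modes(pivots):
    """Label each point 1 (rising segment) or -1 (falling segment).

    Assumption: a pivot is counted as the last point of the segment
    that leads into it, and the first point keeps its own pivot label.
    """
    modes = [pivots[0]]
    mode = -pivots[0]  # after a valley we rise, after a peak we fall
    for p in pivots[1:]:
        modes.append(mode)
        if p != 0:
            mode = -p  # direction flips once a pivot has been reached
    return modes


example_pivots = [-1, 0, 0, 1, 0, -1, 0, 1]
print(label_modes(example_pivots))
```

Grouping the per-step percentage changes by such labels, as the cell above does with `groupby(modes)`, then summarises the rising and falling segments separately.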

Calculate the peak-to-valley returns for all of the segments.

In [None]:
compute_segment_returns(X, pivots)
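As a rough illustration of what a segment return is, the sketch below takes the values at consecutive pivots and computes the fractional change from each one to the next. The function name `segment_returns` and the sample data are made up for this example; it is a plain-Python approximation of the idea, not necessarily how the library implements it.

```python
def segment_returns(values, pivots):
    """Fractional change between each pair of consecutive pivot values."""
    pivot_values = [v for v, p in zip(values, pivots) if p != 0]
    return [b / a - 1.0 for a, b in zip(pivot_values, pivot_values[1:])]


# Hypothetical series: valley at 100, peak at 125, valley at 100, peak at 130.
sample_values = [100.0, 110.0, 125.0, 120.0, 100.0, 105.0, 130.0]
sample_pivots = [-1, 0, 1, 0, -1, 0, 1]

for r in segment_returns(sample_values, sample_pivots):
    print(f"{r:+.1%}")
```

With four pivots there are three segments, so the result has one entry fewer than the number of pivots.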

Finally, compute the oft-quoted (in financial literature) `max_drawdown`.

In [None]:
max_drawdown(X)
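For readers unfamiliar with the metric, maximum drawdown is the largest fractional decline from a running peak to a later low. The sketch below is a single-pass, pure-Python version under the assumption that the result is reported as a positive fraction; `max_drawdown_sketch` and the sample prices are hypothetical and are not taken from the library.

```python
def max_drawdown_sketch(values):
    """Largest decline from a running peak, as a positive fraction."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)                     # highest value seen so far
        worst = max(worst, (peak - v) / peak)   # decline from that high
    return worst


# Hypothetical prices: the high of 120 is followed by a low of 80.
prices = [100.0, 120.0, 90.0, 110.0, 80.0, 130.0]
print(f"{max_drawdown_sketch(prices):.1%}")
```

Note that the low must come after the peak it is measured against, which is why the running maximum is updated before each comparison.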

Pandas Compatibility
---
The `peak_valley_pivots` function works on a pandas `Series`, assuming the index is either a `DatetimeIndex` or is \[0, n). [Pandas](http://pandas.pydata.org/) is great.

In [None]:
from pandas_datareader import get_data_yahoo

X = get_data_yahoo('GOOG')['Adj Close']
pivots = peak_valley_pivots(X.values, 0.2, -0.2)
ts_pivots = pd.Series(X, index=X.index)
ts_pivots = ts_pivots[pivots != 0]
X.plot()
ts_pivots.plot(style='g-o');

# `#WONTFIX`

[In PR #13, `ytian` writes](https://github.com/jbn/ZigZag/pull/13),

> the code has some bugs for some test cases:
> code:
>
> a = np.array([1, 1.2, 1.5, 1.8, 2.4, 3.3, 2.4, 1.5, 1.6])  
> peak_valley_pivots(a, 0.2, -0.2) 
> 
> output (wrong result)  
> array([-1, 0, 0, 0, 0, 1, 0, 0, 1]) 
> 
> output (after fix)  
> array([-1, 0, 0, 0, 0, 1, 0, -1, 1])


Visually,

In [None]:
X = np.array([1, 1.2, 1.5, 1.8, 2.4, 3.3, 2.4, 1.5, 1.6])
pivots = peak_valley_pivots(X, 0.2, -0.2)
plot_pivots(X, pivots)

Intuitively, what he is saying seems to make sense. 

- 3.3 is the peak
- 1.5 is the lowest point after it
- 1.6 is above 1.5 so can't be the valley pivot

The difficulty here is that 1.6 is not a peak either. It is a local one, yes, but not according to the segmenter, which requires a rise of `>20%`; 1.6 is only about 6.7% above 1.5, so it does not qualify.
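The arithmetic behind that argument can be checked directly. The short snippet below uses only the numbers from the example above and the 0.2 / -0.2 thresholds; the variable names are chosen for illustration.

```python
X = [1, 1.2, 1.5, 1.8, 2.4, 3.3, 2.4, 1.5, 1.6]

drop = X[7] / X[5] - 1    # from the peak at 3.3 down to 1.5
bounce = X[8] / X[7] - 1  # from 1.5 back up to 1.6

print(f"drop from peak: {drop:.1%}")
print(f"bounce off low: {bounce:.1%}")
print("drop meets the -20% threshold:", drop <= -0.2)
print("bounce meets the +20% threshold:", bounce >= 0.2)
```

The fall from 3.3 comfortably clears the down threshold, but the final move up is far too small to confirm 1.5 as a valley by turning 1.6 into a peak.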