In [1]:
import datetime
import urllib.request
import json
import pandas as pd

## FinMind
The data for an ETF that tracks `TAIEX` comes from **FinMind**, a remarkable project and a free, generous data source.

See:  
+ [Website](https://finmindtrade.com/)  
+ [Github](https://github.com/FinMind/FinMind)

In [2]:
ds = datetime.date(2021, 9, 22)

url = "https://api.finmindtrade.com/api/v4/data"

params = {
    'dataset': 'TaiwanStockPriceTick',
    'data_id': '006204',
    'start_date': ds.strftime('%Y-%m-%d'),
}

In [3]:
def get_paramed_url(url: str, params: dict[str, str]) -> str:
    """
    Concatenate a URL with query parameters.

    Note: keys and values are not percent-encoded, so they must be URL-safe.

    Args:
        url: bare URL
        params: key-value pairs for the query string
    Returns:
        the URL with the query string appended
    """
    return url + '?' + '&'.join(f'{k}={v}' for k, v in params.items())

get_paramed_url(url, params)

'https://api.finmindtrade.com/api/v4/data?dataset=TaiwanStockPriceTick&data_id=006204&start_date=2021-09-22'
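
Plain concatenation works here because none of the values contain reserved characters. As a hedged alternative, the standard library's `urllib.parse.urlencode` percent-encodes keys and values before joining them. The helper name `build_url` below is made up for illustration; it is not part of this notebook's code or of FinMind.

```python
from urllib.parse import urlencode


def build_url(base_url: str, params: dict[str, str]) -> str:
    """Append percent-encoded query parameters to base_url (hypothetical helper)."""
    return f"{base_url}?{urlencode(params)}"


example_url = build_url(
    "https://api.finmindtrade.com/api/v4/data",
    {
        "dataset": "TaiwanStockPriceTick",
        "data_id": "006204",
        "start_date": "2021-09-22",
    },
)
print(example_url)
```

For the parameters used in this notebook, the result is identical to the concatenated URL shown above; the two only differ once a value contains characters such as spaces or `&`.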

A simple HTTP GET request does the trick.

An unregistered connection can make up to 300 requests per hour.  
Fortunately, this project's usage won't exceed that limit.
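
If the request volume ever grew, 300 requests per hour averages out to one request every 12 seconds. The sketch below spaces calls evenly on that basis. The `Throttle` class is a hypothetical helper, and even spacing is a conservative assumption: how FinMind actually counts requests within the hour is not covered here.

```python
import time


class Throttle:
    """Space out calls so at most `limit_per_hour` happen per hour (hypothetical helper)."""

    def __init__(self, limit_per_hour: int = 300, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 3600 / limit_per_hour  # seconds between requests
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay


throttle = Throttle()
print(throttle.min_interval)
```

Calling `throttle.wait()` right before each request would keep the notebook under the limit without tracking a request count.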

The data comes back in JSON format.

In [4]:
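# Send the GET request and read the raw response body (bytes).
# The context manager closes the connection once the body is read.
with urllib.request.urlopen(get_paramed_url(url, params)) as res:
    content = res.read()

content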

b'{"msg":"success","status":200,"data":[{"date":"2021-09-22","stock_id":"006204","deal_price":90.1,"volume":1,"Time":"09:00:11.161","TickType":1},{"date":"2021-09-22","stock_id":"006204","deal_price":90.0,"volume":10,"Time":"13:20:24.969","TickType":1},{"date":"2021-09-22","stock_id":"006204","deal_price":90.0,"volume":10,"Time":"13:20:26.038","TickType":1}]}'

In [5]:
json_obj = json.loads(content.decode('utf8'))
data = json_obj['data']

data

[{'date': '2021-09-22',
  'stock_id': '006204',
  'deal_price': 90.1,
  'volume': 1,
  'Time': '09:00:11.161',
  'TickType': 1},
 {'date': '2021-09-22',
  'stock_id': '006204',
  'deal_price': 90.0,
  'volume': 10,
  'Time': '13:20:24.969',
  'TickType': 1},
 {'date': '2021-09-22',
  'stock_id': '006204',
  'deal_price': 90.0,
  'volume': 10,
  'Time': '13:20:26.038',
  'TickType': 1}]
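
The response above carries `msg` and `status` fields next to `data`, so it may be worth checking them before using the records. This is a minimal sketch: `extract_data` is a hypothetical helper, and it assumes that a failed request still returns a JSON body with a non-200 `status`, which this notebook does not confirm.

```python
import json


def extract_data(raw: bytes) -> list:
    """Decode a response body and return its 'data' list; raise if status is not 200."""
    payload = json.loads(raw.decode("utf-8"))
    if payload.get("status") != 200:
        # Assumption: error responses also carry 'status' and 'msg' fields.
        raise RuntimeError(
            f"request failed: status={payload.get('status')}, msg={payload.get('msg')}"
        )
    return payload["data"]


# One record copied from the response shown earlier.
sample = (
    b'{"msg":"success","status":200,"data":[{"date":"2021-09-22",'
    b'"stock_id":"006204","deal_price":90.1,"volume":1,'
    b'"Time":"09:00:11.161","TickType":1}]}'
)
records = extract_data(sample)
print(records[0]["deal_price"])
```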

Make a pandas DataFrame for easy organization and display.

In [6]:
df = pd.DataFrame(data)
df

         date stock_id  deal_price  volume          Time  TickType
0  2021-09-22   006204        90.1       1  09:00:11.161         1
1  2021-09-22   006204        90.0      10  13:20:24.969         1
2  2021-09-22   006204        90.0      10  13:20:26.038         1


Merge the date and time columns into a single timezone-aware `datetime` column, then keep only the columns of interest.

In [7]:
# Parse date + time as Taipei local time, then truncate to whole seconds.
df['datetime'] = pd.to_datetime(
    df['date'].str.strip() + ' ' + df['Time'].str.strip(),
    format='%Y-%m-%d %H:%M:%S.%f',
).dt.tz_localize('Asia/Taipei').dt.floor('s')

# Keep the columns of interest and rename deal_price to price.
df = df[['datetime', 'stock_id', 'deal_price', 'volume']].rename(columns={'deal_price': 'price'})
df

                   datetime stock_id  price  volume
0 2021-09-22 09:00:11+08:00   006204   90.1       1
1 2021-09-22 13:20:24+08:00   006204   90.0      10
2 2021-09-22 13:20:26+08:00   006204   90.0      10
