FOREX HistData.com ETL Tools
The good people at HistData.com have set up the infrastructure necessary to provide FOREX data for free. This is awesome &, if possible, you should donate to or purchase some of their services if you are going to use the data. The tools contained herein will download, merge, and convert the datasets so they are usable, but not (yet) easily updateable. The entire sets have to be rebuilt when updates occur (though not re-downloaded in their entirety).
A Makefile is included for some common operations. They are documented below.
`make download` — This will get all files from HistData.com as MetaTrader CSVs and put the contents in a local data directory.
`make merge` — This will combine all the yearly & monthly files into a single CSV per instrument (e.g. AUDUSD.csv). Since HistData.com provides 1 minute data, the merged output is also 1 minute data.
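The merge step is essentially a concatenation of per-period CSVs into one file per instrument. A minimal sketch in pandas, assuming a MetaTrader-style filename pattern such as `DAT_MT_EURUSD_M1_2005.csv` (the pattern and the `merge_instrument_csvs` helper are illustrative, not the actual script):

```python
import glob
import os

import pandas as pd


def merge_instrument_csvs(input_dir, symbol, output_path):
    """Concatenate every yearly/monthly CSV for one instrument into one file.

    The filename pattern below is an assumption; HistData's MetaTrader
    archives typically look like DAT_MT_EURUSD_M1_2005.csv.
    """
    parts = sorted(glob.glob(os.path.join(input_dir, "*_%s_*.csv" % symbol)))
    # The raw files have no header row, so read and write them headerless.
    frames = [pd.read_csv(p, header=None) for p in parts]
    merged = pd.concat(frames, ignore_index=True)
    merged.to_csv(output_path, header=False, index=False)
    return merged
```

Sorting the globbed filenames keeps the rows in chronological order, since the year is embedded in the filename.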
`make convert` — Using Pandas, this will invoke the convert_data.py script and by default create 60 minute data sets. The end result is stored in `./data/60M/`.
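The core of that conversion is downsampling 1 minute OHLC bars to 60 minute bars. A sketch of how this can be done with pandas `resample` (the column names and the sample data are assumptions, not necessarily what convert_data.py uses):

```python
import pandas as pd


def to_hourly(df):
    """Downsample 1-minute OHLC bars to 60-minute bars.

    Assumes a DatetimeIndex and open/high/low/close columns.
    """
    agg = {"open": "first", "high": "max", "low": "min", "close": "last"}
    return df.resample("60min").agg(agg).dropna()


# Hypothetical 1-minute sample covering two hours of flat prices:
idx = pd.date_range("2010-01-04 00:00", periods=120, freq="1min")
minute = pd.DataFrame(
    {"open": 1.43, "high": 1.44, "low": 1.42, "close": 1.435}, index=idx
)
hourly = to_hourly(minute)  # two 60-minute bars
```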
SYMBOL=EURUSD make download
All commands mentioned above can be used with the `SYMBOL=<symbol>` environment variable; otherwise the scripts will download / merge / convert all available symbols.
Optional year range
START_YEAR=2005 END_YEAR=2010 make download
This command downloads data for all available symbols between the years 2005 - 2010. It can be used together with `SYMBOL`.
The following are required:
- Python (2.7 or greater)
Yeah, yeah, how do I get FOREX Data working in Pandas as DataFrames!???
If you just want some FOREX data as Pandas DataFrames, do this:
```
$ make download
$ make merge
$ make convert
```
```python
import os

import pandas as pd

source_path = "./data/60M/"
source = "EURUSD.csv"
df = pd.read_csv(os.path.join(source_path, source), sep=",")
```
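From there you will usually want a proper datetime index. A sketch assuming MetaTrader-style columns (date, time, OHLC, volume — the column order and the sample rows here are assumptions about the merged CSV, not a guarantee):

```python
import pandas as pd

# Hypothetical 1-minute rows in MetaTrader CSV order (an assumption):
rows = [
    ["2010.01.04", "00:00", 1.4305, 1.4310, 1.4300, 1.4308, 0],
    ["2010.01.04", "00:01", 1.4308, 1.4312, 1.4306, 1.4310, 0],
]
cols = ["date", "time", "open", "high", "low", "close", "volume"]
df = pd.DataFrame(rows, columns=cols)

# MetaTrader dates look like "2010.01.04"; combine them with the time column.
df.index = pd.to_datetime(df["date"] + " " + df["time"], format="%Y.%m.%d %H:%M")

# With a DatetimeIndex in place, simple transforms become one-liners:
returns = df["close"].pct_change()
```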
Why do this?
I wanted to try some ML models on FOREX data but didn't have a great data source at the time. I am publishing this because distributing public data should be easier than writing bash scripts to get random tokens in order to download data about the world's currencies. Why isn't exchange rate data completely public?
Maybe in the future
At some point, I may do the following:
- `make update` should do daily updates
- stop using bash, this can all be in Python... should have started there
- create pickled versions of various time-series
- incorporate the gaps found -- those are ignored / deleted at the moment