- Ubuntu 18.04
- Python 3.6
- Docker 18.09.4
- docker-compose 1.24.0
- Selenium 3.141.0
- chrome-browser 71.0.3578.98-0
- chromedriver 2.45
- MariaDB 10.3
- Chin-Ho, Lin
- tainvecs@gmail.com
- GitLab
This project provides a foreign exchange spider for 34 banks in Taiwan. The spider crawls the latest foreign exchange data from the banks' websites and saves it to JSON files or MariaDB. The project environment can be built and set up with docker-compose. The project also supports crontab scheduling and emailed error reports for the foreign exchange spiders.
- Structure
    - Docker Image
        - fx_tw-crawler
        - fx_tw-mariadb
    - Docker Container
        - fx
        - db-my
    - Shared Volume
        - spider_code
        - mysql_data
        - mysql_code
    - Network
        - db-net
- Build
    - Option 1: Build with Docker Compose
        - Build Docker Images and Set Up Network and Shared Volumes
        - Create and Run Containers
    - Option 2: Build with Dockerfile
        - Docker Images
            - Option 1: Build Docker Images with Dockerfiles
            - Option 2: Pull Docker Images from Docker Hub
        - Docker Volumes
        - Docker Network
        - Docker Containers
- Set Up
    - Set Up MariaDB (Optional)
        - Connecting to the MySQL Server
        - Change the Root Password
        - Create a New User and Grant All Privileges
        - Set Up MariaDB Config File
        - Initialize the Database for Saving Crawled Data
    - Set Up Email Config for Sending Error Report (Optional)
- Run
    - Run a Foreign Exchange Crawler
        - Run a Crawler and Save Crawled Data to a JSON File
        - Run a Crawler and Save Crawled Data to MariaDB
        - Run a Crawler that Saves Crawled Data to MariaDB and Emails an Error Report
    - Set Up Schedule for Foreign Exchange Crawler (Optional)
- Docker Image
- fx_tw-crawler
- Docker image of the spider that crawls foreign exchange data from banks in Taiwan
- fx_tw-mariadb
- Docker image of the MariaDB server that stores the crawled data
- Docker Container
- fx
- The container instance of fx_tw-crawler
- db-my
- The container instance of fx_tw-mariadb
- Shared Volume
- spider_code
- Shared volume for the code of the foreign exchange spider
- mysql_data
- Shared volume for the default MySQL data directory of the db-my container
- mysql_code
- Shared volume for the code of the db-my container
- Network
- db-net
- Network connecting the fx and db-my containers
- Option 1: Build with Docker Compose
- Build Docker Images and Set Up Network and Shared Volumes
    docker-compose build --no-cache
- Create and Run Containers
    docker-compose up -d
- Option 2: Build with Dockerfile
- Docker Images
- Option 1: Build Docker Images with Dockerfiles
    docker build -t tainvecs/fx_tw-crawler --no-cache ./fx_tw-crawler
    docker build -t tainvecs/fx_tw-mariadb --no-cache ./fx_tw-mariadb
- Option 2: Pull Docker Images from Docker Hub
    docker pull tainvecs/fx_tw-crawler
    docker pull tainvecs/fx_tw-mariadb
- Docker Volumes
    docker volume create "spider_code"
    docker volume create "mysql_data"
    docker volume create "mysql_code"
- Docker Network
    docker network create db-net
- Docker Containers
    docker run \
        -itd \
        --name fx \
        --network db-net \
        -v spider_code:/spider_code \
        tainvecs/fx_tw-crawler

    docker run \
        -d \
        --name db-my \
        --network db-net \
        -v mysql_data:/var/lib/mysql \
        -v mysql_code:/mysql_code \
        -e MYSQL_ROOT_PASSWORD=root \
        tainvecs/fx_tw-mariadb
- Set Up MariaDB (Optional)
- Execute "/bin/bash" Command in db-my Container
    docker exec -it db-my /bin/bash
- Connecting to the MySQL Server
    mysql -u root -P 3306 -p
- The default password is "root".
- Change the Root Password
    SET PASSWORD FOR 'root'@'%' = PASSWORD('new_root_password');
    SET PASSWORD FOR 'root'@'localhost' = PASSWORD('new_root_password');
    FLUSH PRIVILEGES;
- Replace "new_root_password" with the new root password.
- Create a New User and Grant All Privileges
    CREATE USER 'user_name'@'%' IDENTIFIED BY 'new_user_password';
    GRANT ALL PRIVILEGES ON *.* TO 'user_name'@'%';
- Replace "user_name" and "new_user_password" with the new user's name and password.
- Exit from db-my Container
    exit
- Execute "/bin/bash" Command in fx Container
    docker exec -it fx /bin/bash
- Set Up MariaDB Config File "/spider_code/res/mariadb.config" (more info)
    {
        "mariadb_host": "db-my",
        "mariadb_port": 3306,
        "mariadb_user": "user_name",
        "mariadb_password": "user_password",
        "mariadb_database": "fx_tw",
        "mariadb_charset": "utf8"
    }
- Replace "user_name" and "user_password" with the MariaDB user name and password.
- Initialize the Database for Saving Crawled Data (more info)
    python3 /spider_code/src/init-mariadb.py \
        --mariadb_config /spider_code/res/mariadb.config \
        --mariadb_init_template /spider_code/res/mariadb_init_template.json \
        --bank_table /spider_code/res/bank_table.json
- Exit from fx Container
    exit
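The MariaDB config file from this section is plain JSON, so it can be sanity-checked before running the initialization script. The following is a minimal sketch, not the project's own loader: the function name `load_mariadb_config` and the validation rules are assumptions for illustration, and only the key names are taken from the example config.

```python
import json

# Key names taken from the example "/spider_code/res/mariadb.config".
REQUIRED_KEYS = {
    "mariadb_host",
    "mariadb_port",
    "mariadb_user",
    "mariadb_password",
    "mariadb_database",
    "mariadb_charset",
}


def load_mariadb_config(path):
    """Load a MariaDB config file and check that the expected keys exist.

    Hypothetical helper: the real crawler may read the file differently.
    """
    with open(path, encoding="utf-8") as f:
        config = json.load(f)

    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise KeyError(f"missing config keys: {sorted(missing)}")

    # The example config stores the port as a JSON number, not a string.
    if not isinstance(config["mariadb_port"], int):
        raise TypeError("mariadb_port must be an integer")

    return config
```

Calling `load_mariadb_config("/spider_code/res/mariadb.config")` inside the fx container would then either return the parsed settings or fail early with a message naming the missing keys.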
- Set Up Email Config for Sending Error Report (Optional)
- Generate a Gmail app password in advance.
- Execute "/bin/bash" Command in fx Container
    docker exec -it fx /bin/bash
- Set Up Email Config File "/spider_code/res/email.config" (more info)
    {
        "from_user": "username_from",
        "from_email": "email_address_from",
        "password": "email_app_password",
        "to_user": "username_to",
        "to_email": "email_address_to",
        "subject": "subject"
    }
- Replace "email_app_password" with a Gmail app password.
- Replace "username_from", "email_address_from", "username_to", "email_address_to", and "subject" with your own values.
- Exit from fx Container
    exit
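To illustrate how the fields of the email config fit together, here is a minimal sketch that builds an error-report message from such a config using Python's standard library. It is not the crawler's actual mailing code: the function names are hypothetical, and sending through `smtp.gmail.com` on port 465 is an assumption based on the Gmail app password requirement.

```python
import smtplib
from email.message import EmailMessage
from email.utils import formataddr


def build_error_report(config, body):
    """Build an email message from an email.config-style dict.

    Hypothetical helper; only the config key names come from the example file.
    """
    msg = EmailMessage()
    msg["From"] = formataddr((config["from_user"], config["from_email"]))
    msg["To"] = formataddr((config["to_user"], config["to_email"]))
    msg["Subject"] = config["subject"]
    msg.set_content(body)
    return msg


def send_error_report(config, body):
    """Send the report over SSL. Host and port are assumptions for Gmail."""
    msg = build_error_report(config, body)
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        # Gmail expects the app password here, not the account password.
        server.login(config["from_email"], config["password"])
        server.send_message(msg)
```

Keeping message construction separate from sending makes it possible to inspect the headers without network access or real credentials.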
- Run a Foreign Exchange Crawler (more info)
- Attach to the fx Container
    docker attach fx
- Run a Crawler and Save Crawled Data to a JSON File "/spider_code/bank-data.json"
    python3 /spider_code/crawler.py \
        --bank_table local:/spider_code/res/bank_table.json \
        --currency2id local:/spider_code/res/currency2id.json \
        --out_file /spider_code/bank-data.json \
        --out_type json_lines
- Run a Crawler and Save Crawled Data to MariaDB
    python3 /spider_code/crawler.py \
        --bank_table local:/spider_code/res/bank_table.json \
        --currency2id local:/spider_code/res/currency2id.json \
        --mariadb_config /spider_code/res/mariadb.config
- Run a Crawler that Saves Crawled Data to MariaDB and Emails an Error Report
    python3 /spider_code/crawler.py \
        --bank_table local:/spider_code/res/bank_table.json \
        --currency2id local:/spider_code/res/currency2id.json \
        --email_config /spider_code/res/email.config \
        --mariadb_config /spider_code/res/mariadb.config
- Detach from the fx Container
- Press Ctrl+p Ctrl+q
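When the crawler is run with `--out_type json_lines`, the output file can be inspected with a few lines of Python. This sketch assumes the file follows the usual JSON Lines convention of one JSON value per line; the record fields are not documented in this README, so each record is treated as an opaque value. The function name `read_json_lines` is hypothetical.

```python
import json


def read_json_lines(path):
    """Read a JSON Lines file into a list, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

For example, `len(read_json_lines("/spider_code/bank-data.json"))` would give the number of records written by the crawl.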
- Set Up Schedule for Foreign Exchange Crawler (Optional) (more info)
- Attach to the fx Container
    docker attach fx
- Edit the Scheduled Crawler Command
- "/spider_code/schedule/schedule.sh"
- By default, the scheduled crawler saves crawled data to MariaDB and to a JSON file in "/spider_code/output/".
- By default, it also uses the email config "/spider_code/res/email.config" to send error reports.
- Edit the Schedule Task
- "/spider_code/schedule/schedule_start.sh"
- By default, the schedule task runs the crawler every ten minutes from 09:00 to 18:00 every day.
- By default, it also redirects stdout and stderr to the log file "/spider_code/log/crontag.log".
- Start a Schedule Task for a Foreign Exchange Crawler
    bash /spider_code/schedule/schedule_start.sh
- Detach from the fx Container
- Press Ctrl+p Ctrl+q
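To get a feel for how often the default schedule fires, the run times of an "every ten minutes from 09:00 to 18:00" schedule can be enumerated. This is only an illustrative sketch: the function name is hypothetical, and whether the 18:00 run is actually included depends on the crontab entry in `schedule_start.sh`, which is not shown here. The sketch assumes both endpoints are inclusive.

```python
from datetime import datetime, timedelta


def scheduled_run_times(start="09:00", end="18:00", step_minutes=10):
    """List HH:MM run times from start to end (inclusive) at a fixed step."""
    fmt = "%H:%M"
    current = datetime.strptime(start, fmt)
    last = datetime.strptime(end, fmt)
    times = []
    while current <= last:
        times.append(current.strftime(fmt))
        current += timedelta(minutes=step_minutes)
    return times
```

Under these assumptions the crawler would run 55 times per day, which is a useful number to keep in mind when sizing the log file and the output directory.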