Update the params.yaml file with your parameters, then run the scraper to collect Google Maps Places and Reviews and upload the results to GCS.
- Install Dependencies:
make install
- Set Environment Variables:
export GOOGLE_APPLICATION_CREDENTIALS="path/to/crawler_gcp_keyfile.json"
export GCS_BUCKET_NAME="your-bucket-name"
export GCS_BLOB_NAME="your-blob-name"
- Get the results by running:
make run
Make sure your parameters are set in params.yaml before running.
- Clean repo:
make clean
- Clean repo and results:
make clean_all
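Since `make run` depends on the three environment variables exported above, it can help to check them first. The following is a minimal sketch, not part of this repo: the helper name `missing_env_vars` is hypothetical, and only the variable names come from the export lines above.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_VARS = (
    "GOOGLE_APPLICATION_CREDENTIALS",
    "GCS_BUCKET_NAME",
    "GCS_BLOB_NAME",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to the current process environment; passing a dict
    makes the check easy to try out without touching the real shell.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments inspects the current shell environment; an empty list means all three variables are set.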
- Build Docker Image
docker build -t gmaps-scraper .
- Run Docker Container
docker run -it --rm -m 4g --shm-size=2g \
-v $(pwd)/crawler_gcp_keyfile.json:/app/crawler_gcp_keyfile.json \
-e GCS_BUCKET_NAME="your-bucket-name" \
-e GCS_BLOB_NAME="your-blob-name" \
gmaps-scraper
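If you would rather launch the container from a script than type the command by hand, the same `docker run` call can be assembled as an argument list. This is a rough sketch under assumptions: the function name `docker_run_command` is hypothetical, the `-it` flags are dropped because a script has no interactive terminal, and the keyfile is assumed to sit in the current directory unless another path is passed.

```python
import os


def docker_run_command(bucket, blob, keyfile="crawler_gcp_keyfile.json",
                       image="gmaps-scraper"):
    """Build the argument list for the `docker run` call shown above."""
    # Resolve to an absolute host path; with -v, a bare relative name
    # would be treated as a named volume rather than a bind mount.
    keyfile_path = os.path.abspath(keyfile)
    return [
        "docker", "run", "--rm",
        "-m", "4g", "--shm-size=2g",
        "-v", f"{keyfile_path}:/app/crawler_gcp_keyfile.json",
        "-e", f"GCS_BUCKET_NAME={bucket}",
        "-e", f"GCS_BLOB_NAME={blob}",
        image,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; using a list instead of a shell string avoids quoting problems with bucket or blob names.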
- Build Docker Image
docker build -t gmaps-scraper .
- Set Docker Proxy in Airflow docker-compose
- Add DockerOperator to your DAG
from airflow.providers.docker.operators.docker import DockerOperator
from docker.types import Mount

run_scraper = DockerOperator(
    task_id="e_gmaps-scraper",
    image="gmaps-scraper",
    api_version="auto",
    auto_remove=True,
    environment={
        "GCS_BUCKET_NAME": "your-bucket-name",
        "GCS_BLOB_NAME": "your-blob-name",
    },
    command="make run",
    mounts=[
        Mount(
            source="<your-gcp-keyfile>",  # local path on the Docker host
            target="/app/crawler_gcp_keyfile.json",
            type="bind",
            read_only=True,
        ),
    ],
    mount_tmp_dir=False,
    mem_limit="4g",  # maximum memory the container may use: 4 GB
    shm_size="2g",  # shared memory size: 2 GB
    docker_url="tcp://docker-proxy:2375",
    network_mode="bridge",
)
- Upload to GCS
- Pack as Docker Image
- Run in Airflow
- Get more detailed Google Maps Places info
- Filter time to get Google Maps Reviews
- Refactor Code