yeha98552/google-maps-reviews-scraper

Google Maps Scraper

Update the params.yaml file with your parameters, run the scraper to collect Google Maps places and reviews, and upload the results to GCS.

Usage

Use locally

  1. Install dependencies:
make install
  2. Set environment variables:
export GOOGLE_APPLICATION_CREDENTIALS="path/to/crawler_gcp_keyfile.json"
export GCS_BUCKET_NAME="your-bucket-name"
export GCS_BLOB_NAME="your-blob-name"
  3. Get the results by running:
make run

Remember to add your parameters to the params.yaml file before running.

  4. Clean the repo:
make clean
  5. Clean the repo and the results:
make clean_all
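
The three environment variables above are what the scraper needs to reach GCS. As a minimal sketch of how they might be read and used (the helper names `load_gcs_settings` and `upload_results` are hypothetical and not taken from this repo; the upload assumes the `google-cloud-storage` client, which picks up `GOOGLE_APPLICATION_CREDENTIALS` on its own):

```python
import os
from typing import Mapping, NamedTuple


class GcsSettings(NamedTuple):
    keyfile: str
    bucket_name: str
    blob_name: str


REQUIRED_VARS = (
    "GOOGLE_APPLICATION_CREDENTIALS",
    "GCS_BUCKET_NAME",
    "GCS_BLOB_NAME",
)


def load_gcs_settings(env: Mapping[str, str] = os.environ) -> GcsSettings:
    """Read the three variables and fail early if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return GcsSettings(*(env[name] for name in REQUIRED_VARS))


def upload_results(local_path: str, settings: GcsSettings) -> None:
    """Upload one local file to the configured bucket (hypothetical helper)."""
    # Imported lazily so the settings check works without the GCS package.
    from google.cloud import storage

    client = storage.Client()  # reads GOOGLE_APPLICATION_CREDENTIALS itself
    bucket = client.bucket(settings.bucket_name)
    bucket.blob(settings.blob_name).upload_from_filename(local_path)
```

Calling `load_gcs_settings()` before `make run` would surface a missing variable immediately rather than after the scrape has finished.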

Use in Docker Container

  1. Build the Docker image:
docker build -t gmaps-scraper .
  2. Run the Docker container:
docker run -it --rm -m 4g --shm-size=2g \
  -v "$(pwd)/crawler_gcp_keyfile.json:/app/crawler_gcp_keyfile.json" \
  -e GCS_BUCKET_NAME="your-bucket-name" \
  -e GCS_BLOB_NAME="your-blob-name" \
  gmaps-scraper
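
If the container is started from a script rather than by hand, the same command can be assembled in Python. This is only a sketch: `build_docker_run_args` is a hypothetical helper, and it drops `-it` on the assumption that no terminal is attached.

```python
import shlex
from pathlib import Path
from typing import List


def build_docker_run_args(keyfile: str, bucket_name: str, blob_name: str,
                          image: str = "gmaps-scraper") -> List[str]:
    """Return the argv list for the `docker run` command shown above."""
    host_keyfile = Path(keyfile).resolve()  # bind mounts need an absolute host path
    return [
        "docker", "run", "--rm",
        "-m", "4g", "--shm-size=2g",
        "-v", f"{host_keyfile}:/app/crawler_gcp_keyfile.json",
        "-e", f"GCS_BUCKET_NAME={bucket_name}",
        "-e", f"GCS_BLOB_NAME={blob_name}",
        image,
    ]


if __name__ == "__main__":
    args = build_docker_run_args(
        "crawler_gcp_keyfile.json", "your-bucket-name", "your-blob-name"
    )
    print(shlex.join(args))  # inspect the command before running it
    # import subprocess; subprocess.run(args, check=True)  # start the container
```

Passing the arguments as a list avoids shell quoting problems when the keyfile path contains spaces.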

Use in Airflow

  1. Build the Docker image:
docker build -t gmaps-scraper .
  2. Set up a Docker proxy in the Airflow docker-compose file.

  3. Add a DockerOperator to your DAG:

from airflow.providers.docker.operators.docker import DockerOperator
from docker.types import Mount

run_scraper = DockerOperator(
    task_id="e_gmaps-scraper",
    image="gmaps-scraper",
    api_version="auto",
    auto_remove=True,
    environment={
        "GCS_BUCKET_NAME": "your-bucket-name",
        "GCS_BLOB_NAME": "your-blob-name",
    },
    command="make run",
    mounts=[
        Mount(
            source="<your-gcp-keyfile>",  # path to the keyfile on the Docker host
            target="/app/crawler_gcp_keyfile.json",
            type="bind",
            read_only=True,
        ),
    ],
    mount_tmp_dir=False,
    mem_limit="4g",  # maximum memory the container may use: 4 GB
    shm_size="2g",  # shared memory size: 2 GB
    docker_url="tcp://docker-proxy:2375",  # the Docker proxy from step 2
    network_mode="bridge",
)

TODO

  • Upload to GCS
  • Pack as Docker Image
  • Run in Airflow
  • Get more detailed Google Maps Places info
  • Filter time to get Google Maps Reviews
  • Refactor Code
