Highly scalable webcrawler for towardsdatascience.com by using Python, Selenium, Docker, Kubernetes and the infrastructure of the Google Cloud Platform

wesleycheung0/twds-crawler

 
 

twds-crawler

This repository contains the code for a highly scalable web crawler for towardsdatascience.com, built with Python, Selenium, Docker, Kubernetes, and the infrastructure of the Google Cloud Platform. It was written as part of a data science class, as a way to get hands-on experience with some of the most common technologies for large-scale web crawling and big data processing.

Documentation

A more detailed description of the implementation can be found in my Medium article.

Troubleshooting

I also documented some of the challenges I ran into in trouble-shooting.md.

Languages

  • Python 96.1%
  • Shell 3.0%
  • Dockerfile 0.9%