DataMiner
==========

What data can you scrape?

Scraping code for any website can be added to this project for research (or similar) purposes. Currently, the project supports the following sites. For further information, please refer to the comments in the Python scripts.

imdb.com Dependencies: BeautifulSoup4, lxml, ParsePy
kobis.or.kr Dependencies: BeautifulSoup4, lxml, Selenium
boxofficemojo.com Dependencies: BeautifulSoup4, lxml
hometax.go.kr Dependencies: Selenium, PyInquirer

IMDB.com project

It comprises one WPF C# app and one Python script. The Python script can run on any isolated server. While it parses data, a manager or data administrator can monitor what is happening on the data mining (or scraping) server through the C# application.

Hometax project

Demonstrates Selenium manipulation of select tags, alert handling, and iframe switching.
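
The three techniques named above can be sketched with Selenium's Python bindings roughly as follows. This is a minimal illustration, not the code used in the hometax.go.kr scripts: the helper names `choose_option`, `accept_alert`, and `inside_iframe` are hypothetical, and the sketch assumes the select element can be located by its `id`.

```python
from contextlib import contextmanager


def choose_option(driver, select_id, visible_text):
    """Pick an <option> in a <select> element by its visible label."""
    # Imported here so the other helpers can be loaded without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import Select

    # Assumption: the <select> element is identified by its id attribute.
    Select(driver.find_element(By.ID, select_id)).select_by_visible_text(visible_text)


def accept_alert(driver):
    """Accept the currently open JavaScript alert and return its message."""
    alert = driver.switch_to.alert
    message = alert.text
    alert.accept()
    return message


@contextmanager
def inside_iframe(driver, frame_reference):
    """Run a block inside an iframe, then return to the top-level document."""
    # frame_reference may be a frame name or id, an index, or a WebElement.
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        # Always switch back, even if the block raised an exception.
        driver.switch_to.default_content()
```

With a real WebDriver instance, a call might look like `with inside_iframe(driver, "content_frame"): choose_option(driver, "year", "2023")`, where the frame name and element id are placeholders. Wrapping the frame switch in a context manager keeps later lookups from accidentally running inside the iframe.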

Installation

BeautifulSoup4. Install it with pip:

    pip install BeautifulSoup4

or manually download the package from https://pypi.python.org/pypi/beautifulsoup4

lxml 3.6.4 or higher. On yum-based systems, install the libxslt and libxml2 development packages:

    yum install libxslt-devel libxml2-devel

Download the package here: https://pypi.python.org/pypi/lxml/3.6.4

ParsePy (https://github.com/milesrichardson/ParsePy). The easiest way to install this package is with pip, directly from its GitHub repository:

    pip install git+https://github.com/milesrichardson/ParsePy.git

Selenium

    pip install selenium

PyInquirer

    pip install PyInquirer
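
After installing, a quick check like the one below can confirm that each dependency is importable. It is only a sketch: the function name `find_missing` is hypothetical, and the import names in the mapping (for example `bs4` for BeautifulSoup4 and `parse_rest` for ParsePy) are assumptions based on how these packages are usually imported.

```python
import importlib.util

# Package name -> top-level module name (assumed import names, see above).
REQUIRED_MODULES = {
    "BeautifulSoup4": "bs4",
    "lxml": "lxml",
    "ParsePy": "parse_rest",
    "Selenium": "selenium",
    "PyInquirer": "PyInquirer",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the package names whose module cannot be found."""
    return [
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All dependencies are importable.")
```

Not every site needs every package, so a missing entry only matters for the scripts that list it as a dependency.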

