quora-crawler

Use selenium, beautiful soup and mongodb to crawl and store data from quora. The reason for using selenium is that some pages are loaded with JavaScript: new items only appear as you scroll down, and beautiful soup alone cannot execute JavaScript to reveal all the results. Selenium is therefore used to make all the results show up, while beautiful soup, which is easier to work with than selenium for this purpose, analyzes and extracts the data.
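As a rough sketch of this pattern (not the exact code in this repository; the URL, scroll count, wait time and parser choice are assumptions), selenium scrolls the page until the JavaScript-loaded items appear, and beautiful soup then parses the rendered HTML:

    import time
    from bs4 import BeautifulSoup
    from selenium import webdriver

    # Placeholder URL; the real crawler builds it from the userid argument.
    url = "https://www.quora.com/profile/Justin-Li-65"

    driver = webdriver.Chrome()
    driver.get(url)

    # Scroll down repeatedly so the JavaScript loads the remaining items.
    for _ in range(10):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)  # give new results time to render

    # Hand the fully rendered page to beautiful soup for parsing.
    soup = BeautifulSoup(driver.page_source, "html.parser")
    driver.quit()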

precondition

  1. install python3 https://www.python.org/downloads/
  2. install selenium
  3. install beautiful soup
  4. install pymongo
  5. run MongoDB on localhost (command line: mongod)
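
For steps 2 through 4, the packages can usually be installed with pip (these are the standard PyPI package names; whether controller.py needs anything else is not verified here):

    pip3 install selenium beautifulsoup4 pymongo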

usage

userid is your personal user id, and depth is how many steps the crawler follows out from your followers (1 means one-step followers, 2 means two-step followers, etc.). To find your userid, go to your personal profile; it appears in the URL. For example, mine is https://www.quora.com/profile/Justin-Li-65, so my userid is Justin-Li-65.

python3 controller.py -u <userid> -d <depth>

python3 controller.py --userid <userid> --depth <depth>

example

python3 controller.py -u Justin-Li-65 -d 1

python3 controller.py --userid Justin-Li-65 --depth 1
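
After a run, the stored data can be inspected with pymongo against the local mongod instance. This is only a sketch; the database and collection names below ("quora" and "users") are assumptions, not necessarily the names controller.py actually uses:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017/")  # the local mongod started above
    db = client["quora"]             # assumed database name
    for doc in db["users"].find():   # assumed collection name
        print(doc)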
