crawl ptt articles from its website
Latest commit 792286d May 10, 2016
parser/home-sale
.gitignore
README.md
cat.ls
crawler.ls
food.ls
home-sale.ls
id-stat-show.ls
id-stat.ls
package.json

ptt-crawler

crawl ptt articles from its website

usage:

scrape a specific ptt board:

lsc crawler.ls <board-name>

All posts are downloaded into the data//post/ folder. A data//post-list.json file keeps track of your download history, so you can interrupt a download at any time and resume it later.
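The resume behaviour described above can be sketched roughly as follows. This is an illustration in Python, not the LiveScript code in crawler.ls, and it rests on several assumptions: that post-list.json holds a plain JSON list of already-downloaded post IDs, that each post is stored as one `.html` file, and that `fetch`, `crawl`, `load_history`, `save_history` and `data_dir` are hypothetical names chosen for this example.

```python
import json
import os


def load_history(path):
    """Return the set of post IDs recorded in the history file (empty if missing)."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return set(json.load(f))


def save_history(path, done):
    """Write to a temp file, then rename, so an interrupt never leaves a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(sorted(done), f)
    os.replace(tmp, path)


def crawl(post_ids, fetch, data_dir):
    """Download every post not yet in the history; return the IDs fetched in this run.

    `fetch` is a caller-supplied function (hypothetical) mapping a post ID to its text.
    The history format (a JSON list of IDs) is an assumption, not the project's real format.
    """
    post_dir = os.path.join(data_dir, "post")
    os.makedirs(post_dir, exist_ok=True)
    history_path = os.path.join(data_dir, "post-list.json")
    done = load_history(history_path)
    fetched = []
    for post_id in post_ids:
        if post_id in done:
            continue  # already downloaded in an earlier run
        body = fetch(post_id)  # may raise or be interrupted; nothing is recorded yet
        with open(os.path.join(post_dir, post_id + ".html"), "w", encoding="utf-8") as f:
            f.write(body)
        done.add(post_id)
        save_history(history_path, done)  # record progress after every post
        fetched.append(post_id)
    return fetched
```

Because the history is saved after every post, a second call to `crawl` with the same `data_dir` skips everything that finished before the interruption and only fetches what is still missing.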

categorize authors by title:

lsc cat.ls <board-name>

food.ls: example of fetching articles for article generation.
home-sale.ls: example of categorizing articles by purpose.
id-stat.ls: analyzes users' standpoints and writes the result to data//id-stat.json.
id-stat-show.ls: shows user statistics and generates suspect.json.

LICENSE

All sources are licensed under the MIT License. (I previously used the CC-BY-4.0 license, but the MIT License is better suited to code. Please refer to whichever license applied at the time you forked this project.)
