Skip to content
使用express写一个简易的爬虫
JavaScript
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
doc
.gitignore
Procfile
README.md
app-init.js
app.js
package-lock.json
package.json

README.md

Node Spider

使用 superagent 与 cheerio 完成简单爬虫,使用 async 控制并发

使用

git clone https://github.com/yangtao2o/node-express-spider.git

cd node-express-spider

npm i

npm start

运行http://localhost:3000/,开始获取资源,需要一时半会儿,若一直重复获取中,可重启再试,也可手动改动请求数目

比如如下结果:log-20190921

题目

代码的入口是 app.js,当调用 node app.js 时,它会输出 CNode(https://cnodejs.org/ ) 社区首页的所有主题的标题,链接和第一条评论,以 json 的格式。

注意:并发连接数需要控制在 5 个。

输出示例:

[
  {
    "title": "【公告】发招聘帖的同学留意一下这里",
    "href": "http://cnodejs.org/topic/541ed2d05e28155f24676a12",
    "comment1": "呵呵呵呵"
  },
  {
    "title": "发布一款 Sublime Text 下的 JavaScript 语法高亮插件",
    "href": "http://cnodejs.org/topic/54207e2efffeb6de3d61f68f",
    "comment1": "沙发!"
  }
]

知识点

  1. 学习 async(https://github.com/caolan/async ) 的使用。这里有个详细的 async demo 演示:https://github.com/alsotang/async_demo
  2. 学习使用 async 来控制并发连接数。

Heroku

➜  node-express-spider git:(master) npm i
npm notice created a lockfile as package-lock.json. You should commit this file.
npm WARN learn-02@1.0.0 No repository field.

added 77 packages from 86 contributors and audited 175 packages in 2.302s
found 0 vulnerabilities

➜  node-express-spider git:(master) ✗ node app.js
➜  node-express-spider git:(master) ✗ heroku create 
Creating app... done, ⬢ fathomless-headland-59476
https://fathomless-headland-59476.herokuapp.com/ | https://git.heroku.com/fathomless-headland-59476.git
➜  node-express-spider git:(master) ✗ git remote -v
heroku  https://git.heroku.com/fathomless-headland-59476.git (fetch)
heroku  https://git.heroku.com/fathomless-headland-59476.git (push)
origin  https://github.com/yangtao2o/node-express-spider.git (fetch)
origin  https://github.com/yangtao2o/node-express-spider.git (push)
➜  node-express-spider git:(master) ✗ git push heroku master
枚举对象: 16, 完成.
对象计数中: 100% (16/16), 完成.
使用 8 个线程进行压缩
压缩对象中: 100% (14/14), 完成.
写入对象中: 100% (16/16), 16.19 KiB | 5.40 MiB/s, 完成.
总共 16 (差异 2),复用 0 (差异 0)
remote: Compressing source files... done.
remote: Building source:
remote: 
remote: -----> Node.js app detected
remote:        
remote: -----> Creating runtime environment
remote:        
remote:        NPM_CONFIG_LOGLEVEL=error
remote:        NODE_ENV=production
remote:        NODE_MODULES_CACHE=true
remote:        NODE_VERBOSE=false
remote:        
remote: -----> Installing binaries
remote:        engines.node (package.json):  unspecified
remote:        engines.npm (package.json):   unspecified (use default)
remote:        
remote:        Resolving node version 10.x...
remote:        Downloading and installing node 10.15.3...
remote:        Using default npm version: 6.4.1
remote:        
remote: -----> Building dependencies
remote:        Installing node modules (package.json + package-lock)
remote:        added 148 packages from 193 contributors and audited 279 packages in 4.549s
remote:        found 2 low severity vulnerabilities
remote:          run `npm audit fix` to fix them, or `npm audit` for details
remote:        
remote: -----> Caching build
remote:        - node_modules
remote:        
remote: -----> Pruning devDependencies
remote:        audited 279 packages in 1.578s
remote:        found 2 low severity vulnerabilities
remote:          run `npm audit fix` to fix them, or `npm audit` for details
remote:        
remote: -----> Build succeeded!
remote: -----> Discovering process types
remote:        Procfile declares types     -> (none)
remote:        Default types for buildpack -> web
remote: 
remote: -----> Compressing...
remote:        Done: 21.1M
remote: -----> Launching...
remote:        Released v3
remote:        https://fathomless-headland-59476.herokuapp.com/ deployed to Heroku
remote: 
remote: Verifying deploy... done.
To https://git.heroku.com/fathomless-headland-59476.git
 * [new branch]      master -> master
➜  node-express-spider git:(master) ✗ heroku open
➜  node-express-spider git:(master) ✗ heroku logs --tail

Node 相关

You can’t perform that action at this time.