Save complete web pages to local storage
Latest commit e9c3145 Mar 21, 2019


📃 Vanilla Clipper

Japanese article (Qiita)

Vanilla Clipper is a Node.js library that saves a complete web page to local storage using Puppeteer. A single command saves all of the page's content, including images, videos, CSS, web fonts, iframes, and Shadow DOMs.

Dependencies

  • Node.js (>= 8.10)
  • Chrome or Chromium (Latest version)

Installation

yarn global add vanilla-clipper
# or
npm i -g vanilla-clipper

Usage

CLI

Note: If the browser fails to launch, try adding the --no-sandbox (-n) option.

  • Save https://example.com:

    vanilla-clipper https://example.com
  • Save the .timeline element of https://example.com to the tech directory, with the browser language set to Japanese:

    vanilla-clipper -d tech -s .timeline -l ja-JP https://example.com
  • Log in with the sub account defined in the config file:

    vanilla-clipper -a sub https://example.com

See here for details of the options.
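
The CLI examples above can also be assembled from a script. The following is a minimal sketch, not part of Vanilla Clipper itself: `buildClipperArgs` and its option names (`directory`, `selector`, `language`, `account`, `noSandbox`) are hypothetical, and it uses only the short flags that appear in the examples above (-d, -s, -l, -a, -n).

```javascript
// Hypothetical helper: turns an options object into vanilla-clipper CLI arguments.
// Only the flags shown in the examples above are used.
function buildClipperArgs(url, options = {}) {
    const args = []
    if (options.directory) args.push('-d', options.directory) // output directory
    if (options.selector) args.push('-s', options.selector) // element to save
    if (options.language) args.push('-l', options.language) // browser language
    if (options.account) args.push('-a', options.account) // account label in the config
    if (options.noSandbox) args.push('-n') // same as --no-sandbox
    args.push(url)
    return args
}

const args = buildClipperArgs('https://example.com', {
    directory: 'tech',
    selector: '.timeline',
    language: 'ja-JP',
})
console.log(['vanilla-clipper', ...args].join(' '))
// prints "vanilla-clipper -d tech -s .timeline -l ja-JP https://example.com"

// To actually run the command (requires vanilla-clipper on your PATH):
// require('child_process').execFileSync('vanilla-clipper', args, { stdio: 'inherit' })
```

Passing the arguments as an array to `execFileSync` rather than building a shell string avoids quoting problems with selectors that contain spaces or brackets.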

📂 Directory structure in ~/.vanilla-clipper

📂 .vanilla-clipper
   📂 pages
      📂 main
         📃 20190213-page1.html
         ︙
      📂 {SOME_FOLDER}
         📃 20190213-page2.html
         📃 20190214-page3.html
         ︙

   📂 resources
      📂 20190213
         📎 {ulid}.jpg
         📎 {ulid}.svg
         ︙
      📂 20190214
         📎 {ulid}.woff2
         ︙

   💎 resources.json
   💎 config.js
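
As a rough illustration of the layout above, the sketch below builds paths that follow the same pattern. The naming scheme (a YYYYMMDD prefix, `main` as the default folder) is inferred from the tree, and the helper names `dateStamp`, `pagePath`, and `resourcePath` are hypothetical rather than taken from Vanilla Clipper's source.

```javascript
const os = require('os')
const path = require('path')

// Root directory, as shown in the tree above.
const ROOT = path.join(os.homedir(), '.vanilla-clipper')

// Format a Date as YYYYMMDD in local time, matching the prefix seen in the tree.
function dateStamp(date) {
    const y = date.getFullYear()
    const m = String(date.getMonth() + 1).padStart(2, '0')
    const d = String(date.getDate()).padStart(2, '0')
    return `${y}${m}${d}`
}

// Hypothetical: pages/{folder}/{YYYYMMDD}-{name}.html
function pagePath(name, date, folder = 'main') {
    return path.join(ROOT, 'pages', folder, `${dateStamp(date)}-${name}.html`)
}

// Hypothetical: resources/{YYYYMMDD}/{id}.{ext}, where id stands in for a ULID
function resourcePath(id, ext, date) {
    return path.join(ROOT, 'resources', dateStamp(date), `${id}.${ext}`)
}

const saved = new Date(2019, 1, 13) // 13 Feb 2019 (months are zero-based)
console.log(pagePath('page1', saved))
console.log(pagePath('page2', saved, 'tech'))
console.log(resourcePath('EXAMPLE_ULID', 'jpg', saved))
```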

⚙️ Config file example

{YOUR_HOME_DIRECTORY}/.vanilla-clipper/config.js

module.exports = {
    resource: { maxSize: 50 * 1024 * 1024 },
    sites: [
        {
            url: 'example.com', // site URL
            accounts: {
                default: {
                    // ↑ account label
                    username: 'main', // or () => 'main'
                    password: 'password1',
                },
                sub: {
                    // ↑ account label
                    username: 'sub_account',
                    password: 'password2',
                },
            },
            login: [
                // [action, arg1, arg2, ...]
                [
                    'goto',
                    'https://example.com/login', // URL
                ],
                [
                    'input',
                    'input[name="session[username_or_email]"]', // selector
                    '$username', // -> accounts.{ACCOUNT_LABEL}.username
                ],
                [
                    'input',
                    'input[name="session[password]"]', // selector
                    '$password', // -> accounts.{ACCOUNT_LABEL}.password
                ],
                [
                    'submit',
                    '[role=button]', // selector
                ],
            ],
        },
    ],
}
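
To make the `$username` and `$password` placeholders concrete, here is a minimal sketch of how the login steps could be resolved for a given account label. It is an illustration only, not Vanilla Clipper's actual implementation: `resolveLoginSteps` is a hypothetical helper, and treating a function-valued `password` the same way as a function-valued `username` is an assumption.

```javascript
// Trimmed copy of the site entry from the example config above.
const site = {
    url: 'example.com',
    accounts: {
        default: { username: () => 'main', password: 'password1' },
        sub: { username: 'sub_account', password: 'password2' },
    },
    login: [
        ['goto', 'https://example.com/login'],
        ['input', 'input[name="session[username_or_email]"]', '$username'],
        ['input', 'input[name="session[password]"]', '$password'],
        ['submit', '[role=button]'],
    ],
}

// Hypothetical helper: replaces '$username' / '$password' in each step with the
// values from accounts[label]. Function values are called to obtain the string.
function resolveLoginSteps(site, label = 'default') {
    const account = site.accounts[label]
    if (!account) throw new Error(`Unknown account label: ${label}`)
    const valueOf = v => (typeof v === 'function' ? v() : v)
    const substitute = arg => {
        if (arg === '$username') return valueOf(account.username)
        if (arg === '$password') return valueOf(account.password)
        return arg
    }
    return site.login.map(([action, ...args]) => [action, ...args.map(substitute)])
}

console.log(resolveLoginSteps(site, 'sub'))
```

With the `sub` label, the second step becomes `['input', 'input[name="session[username_or_email]"]', 'sub_account']`, which corresponds to running the CLI with `-a sub`.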