🐾 Creeper - The Next Generation Crawler Framework (Go)
Latest commit eb1753d May 16, 2017
Type Name Latest commit message Commit time
art Update readme & add logo Feb 19, 2017
.gitignore add gitignore & rename creeper script file extname Feb 17, 2017
creeper.go Optimized @next (Support automatic stop) Mar 4, 2017
node.go Fix fun param order bug & Prepare for implementation of @next node Mar 2, 2017
town.go Fix fun param order bug & Prepare for implementation of @next node Mar 2, 2017



Creeper is a next-generation crawler that fetches web pages using Creeper scripts. As a cross-platform embedded crawler, it can be used in news apps, subscription programs, etc.

Warning: This project is still in an early stage of development. Please do not use it in production.

Get Started


$ go get

Hello World!

First, write a Creeper script:


page(@page=1) = "{@page}"

news[]: page -> $("tr.athing")
    title: $(".title a.storylink").text
    site: $(".title span.sitestr").text
    link: $(".title a.storylink").href

Then, create main.go

package main

import ""

func main() {
	// Load the Creeper script
	c := creeper.Open("./")
	// Visit every item of the news[] array node
	c.Array("news").Each(func(c *creeper.Creeper) {
		println("title: ", c.String("title"))
		println("site: ", c.String("site"))
		println("link: ", c.String("link"))
	})
}

Build and run it. The console will print something like:

title:  Samsung chief Lee arrested as S.Korean corruption probe deepens
title:  ReactOS 0.4.4 Released
title:  FeFETs: How this new memory stacks up against existing non-volatile memory

Script Spec


Town

A town is a lambda-like expression that stores a string, which may be mutable or immutable. Most of the time it is used to store a URL.

page(@page=1, ext) = "{@page}&ext={ext}"

To use a town, call it as if it were a function:

news[]: page(ext="Hello World!") -> $("tr.athing")

You might have noticed that the @page parameter is not supplied in this call. That is because it is a special parameter.

In a town definition, an expression such as name="something" means that the parameter name has the default value "something".
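To make the definition and call rules concrete, here is a minimal Go sketch of town expansion. It is not Creeper's implementation: the Town type, its Call method and the example.com URL are hypothetical, and the sketch assumes placeholders are written as {name} and that arguments passed at the call site override defaults.

```go
package main

import (
	"fmt"
	"strings"
)

// Town is a hypothetical representation of a town definition such as
// page(@page=1, ext) = "...{@page}&ext={ext}".
type Town struct {
	Params   []string          // declared parameter names, in order
	Defaults map[string]string // values given as name=value in the definition
	Template string            // the string with {name} placeholders
}

// Call fills every {param} placeholder. Explicit arguments win over
// defaults; a parameter with neither is reported as an error.
func (t Town) Call(args map[string]string) (string, error) {
	out := t.Template
	for _, p := range t.Params {
		v, ok := args[p]
		if !ok {
			v, ok = t.Defaults[p]
		}
		if !ok {
			return "", fmt.Errorf("missing value for parameter %q", p)
		}
		out = strings.ReplaceAll(out, "{"+p+"}", v)
	}
	return out, nil
}

func main() {
	// Hypothetical equivalent of: page(@page=1, ext) = "...{@page}&ext={ext}"
	page := Town{
		Params:   []string{"@page", "ext"},
		Defaults: map[string]string{"@page": "1"},
		Template: "https://example.com/news?p={@page}&ext={ext}",
	}
	// Equivalent of calling page(ext="Hello World!"): @page falls back to 1.
	url, err := page.Call(map[string]string{"ext": "Hello World!"})
	if err != nil {
		panic(err)
	}
	fmt.Println(url) // https://example.com/news?p=1&ext=Hello World!
}
```

A real crawler would probably also URL-escape argument values (for example with net/url's QueryEscape); the sketch leaves them untouched to keep the substitution visible.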

The special @page parameter increments automatically when the current page has no more content.


Node

Nodes form a tree that represents the structure of the data you are going to crawl.

news[]: page -> $("tr.athing")
	title: $(".title a.storylink").text
	site: $(".title span.sitestr").text
	link: $(".title a.storylink").href

As in YAML, the node hierarchy is expressed through indentation.
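Indentation-based nesting is commonly parsed with a stack of open nodes, the usual approach for YAML-like formats. The sketch below is a hypothetical illustration rather than Creeper's parser: Node and parseTree are made-up names, and because it compares raw counts of leading spaces or tabs, it assumes each script indents consistently.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical in-memory form of one script line and its children.
type Node struct {
	Line     string
	Children []*Node
}

// parseTree builds a forest from indented lines: a line becomes a child of
// the nearest preceding line that is indented less than it is.
func parseTree(src string) []*Node {
	type frame struct {
		indent int
		node   *Node
	}
	root := &Node{}
	stack := []frame{{indent: -1, node: root}} // the root is never popped
	for _, raw := range strings.Split(src, "\n") {
		if strings.TrimSpace(raw) == "" {
			continue // skip blank lines
		}
		trimmed := strings.TrimLeft(raw, " \t")
		indent := len(raw) - len(trimmed)
		// Pop siblings and deeper lines; what remains on top is the parent.
		for stack[len(stack)-1].indent >= indent {
			stack = stack[:len(stack)-1]
		}
		n := &Node{Line: trimmed}
		parent := stack[len(stack)-1].node
		parent.Children = append(parent.Children, n)
		stack = append(stack, frame{indent, n})
	}
	return root.Children
}

func main() {
	script := "news[]: page -> $(\"tr.athing\")\n" +
		"\ttitle: $(\".title a.storylink\").text\n" +
		"\tsite: $(\".title span.sitestr\").text\n" +
		"\tlink: $(\".title a.storylink\").href\n"
	for _, n := range parseTree(script) {
		fmt.Println(n.Line)
		for _, c := range n.Children {
			fmt.Println("    child:", c.Line)
		}
	}
}
```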

Node Name

Every node has a name. title is a field name and represents a plain string value. news[] is an array name and represents a parent structure that contains multiple child items.
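Following the naming rule above, a parser only needs to inspect the text before the first colon to tell fields from arrays. This Go sketch is hypothetical (parseName, NodeKind, Field and Array are illustrative names). It strips the [] suffix because the main.go example asks for c.Array("news") rather than "news[]".

```go
package main

import (
	"fmt"
	"strings"
)

// NodeKind is a hypothetical classification of node names.
type NodeKind int

const (
	Field NodeKind = iota // e.g. "title": a single string value
	Array                 // e.g. "news[]": repeated child structures
)

// parseName reads the name before the first ':' of a node line and reports
// whether it is an array (trailing "[]"). ok is false if there is no ':'.
func parseName(line string) (name string, kind NodeKind, ok bool) {
	i := strings.Index(line, ":")
	if i < 0 {
		return "", Field, false
	}
	name = strings.TrimSpace(line[:i])
	if strings.HasSuffix(name, "[]") {
		return strings.TrimSuffix(name, "[]"), Array, true
	}
	return name, Field, true
}

func main() {
	for _, l := range []string{
		`news[]: page -> $("tr.athing")`,
		`title: $(".title a.storylink").text`,
	} {
		name, kind, _ := parseName(l)
		fmt.Println(name, kind == Array)
	}
}
```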


Page

The page indicates where to fetch the field data from. It can be a town expression or a field reference.

Field references are an advanced use of nodes; you can find the details in ./

If a node has both a page and a fun, the page goes on the left of -> and the fun on the right: page -> fun
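The page -> fun rule amounts to splitting a node body on the arrow. In this hypothetical Go sketch, splitPageFun splits on the first "->". A body without an arrow is treated as fun only, which matches child nodes such as title: $(".title a.storylink").text but is an assumption about the grammar.

```go
package main

import (
	"fmt"
	"strings"
)

// splitPageFun splits a node body (the text after "name:") into its page
// (left of the first "->") and fun (right of it). Without an arrow, the
// whole body is assumed to be a fun.
func splitPageFun(body string) (page, fun string) {
	if i := strings.Index(body, "->"); i >= 0 {
		return strings.TrimSpace(body[:i]), strings.TrimSpace(body[i+2:])
	}
	return "", strings.TrimSpace(body)
}

func main() {
	page, fun := splitPageFun(`page(ext="Hello World!") -> $("tr.athing")`)
	fmt.Println("page:", page)
	fmt.Println("fun:", fun)
}
```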


Fun

A fun describes how the data is processed.

The supported funs are:

Name       Parameters                          Description
$          (selector: string)                  Relative CSS selector (select from parent node)
$root      (selector: string)                  Absolute CSS selector (select from body)
html       none                                inner HTML
text       none                                inner text
outerHTML  none                                outer HTML
attr       (attr: string)                      attribute value
style      none                                style attribute value
href       none                                href attribute value
src        none                                src attribute value
class      none                                class attribute value
id         none                                id attribute value
calc       (prec: int)                         calculate arithmetic expression
match      (regexp: string)                    match first sub-string via regular expression
expand     (regexp: string, target: string)    expand matched strings to target string
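The regular-expression funs map naturally onto Go's regexp package. The sketch below is a hypothetical approximation rather than Creeper's code. It assumes that match returns the first capture group of the first match (or the whole match when the pattern has no groups), that expand rewrites only the first match using Go's $1 / ${name} template syntax, and that both operate on plain strings instead of selections.

```go
package main

import (
	"fmt"
	"regexp"
)

// match approximates the match fun: first capture group of the first match,
// or the whole match if the pattern has no groups; "" when nothing matches.
func match(s, pattern string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", err
	}
	m := re.FindStringSubmatch(s)
	switch {
	case m == nil:
		return "", nil
	case len(m) > 1:
		return m[1], nil
	default:
		return m[0], nil
	}
}

// expand approximates the expand fun: the first match of pattern is
// rewritten into target, which may reference groups as $1, ${1} or ${name}.
func expand(s, pattern, target string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", err
	}
	idx := re.FindStringSubmatchIndex(s)
	if idx == nil {
		return "", nil
	}
	return string(re.ExpandString(nil, target, s, idx)), nil
}

func main() {
	score, _ := match("128 points by someone", `(\d+) points`)
	fmt.Println(score) // 128

	date, _ := expand("2017-02-19", `(\d+)-(\d+)-(\d+)`, "$3/$2/$1")
	fmt.Println(date) // 19/02/2017
}
```

Note that Go reads $1x as the group named "1x"; write ${1}x when a group reference is followed directly by letters or digits.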


Plutonist · Github @wspl
