GOPA, A Spider Written in Go.
- Lightweight, low footprint; memory usage should stay below 100 MB
- Easy to deploy, no runtime or dependencies required
- Easy to use, no programming or scripting skills needed; features work out of the box
First, get Gopa. There are two options: download the pre-built package or compile it yourself.
Go to the Release or Snapshot page and download the right package for your platform.
Note: Darwin packages are for Mac.
- Mac/Linux: Run make build to build Gopa.
- Windows: See the wiki page How to build GOPA on Windows.
So far, we have:
- gopa, the main program, a single binary.
- config/, Elasticsearch-related scripts etc.
- gopa.yml, the main configuration for Gopa.
By default, Gopa works out of the box except for indexing. If you want to use Elasticsearch for indexing, follow these steps:
- Create an index in Elasticsearch with the script config/gopa-index-mapping.sh. Example:
curl -XPUT "http://localhost:9200/gopa-index" -H 'Content-Type: application/json' -d' { "mappings": { "doc": { "properties": { "host": { "type": "keyword", "ignore_above": 256 }, "snapshot": { "properties": { "bold": { "type": "text" }, "url": { "type": "keyword", "ignore_above": 256 }, "content_type": { "type": "keyword", "ignore_above": 256 }, "file": { "type": "keyword", "ignore_above": 256 }, "ext": { "type": "keyword", "ignore_above": 256 }, "h1": { "type": "text" }, "h2": { "type": "text" }, "h3": { "type": "text" }, "h4": { "type": "text" }, "hash": { "type": "keyword", "ignore_above": 256 }, "id": { "type": "keyword", "ignore_above": 256 }, "images": { "properties": { "external": { "properties": { "label": { "type": "text" }, "url": { "type": "keyword", "ignore_above": 256 } } }, "internal": { "properties": { "label": { "type": "text" }, "url": { "type": "keyword", "ignore_above": 256 } } } } }, "italic": { "type": "text" }, "links": { "properties": { "external": { "properties": { "label": { "type": "text" }, "url": { "type": "keyword", "ignore_above": 256 } } }, "internal": { "properties": { "label": { "type": "text" }, "url": { "type": "keyword", "ignore_above": 256 } } } } }, "path": { "type": "keyword", "ignore_above": 256 }, "sim_hash": { "type": "keyword", "ignore_above": 256 }, "lang": { "type": "keyword", "ignore_above": 256 }, "screenshot_id": { "type": "keyword", "ignore_above": 256 }, "size": { "type": "long" }, "text": { "type": "text" }, "title": { "type": "text", "fields": { "keyword": { "type": "keyword" } } }, "version": { "type": "long" } } }, "task": { "properties": { "breadth": { "type": "long" }, "created": { "type": "date" }, "depth": { "type": "long" }, "id": { "type": "keyword", "ignore_above": 256 }, "original_url": { "type": "keyword", "ignore_above": 256 }, "reference_url": { "type": "keyword", "ignore_above": 256 }, "schema": { "type": "keyword", "ignore_above": 256 }, "status": { "type": "integer" }, "updated": { "type": "date" 
}, "url": { "type": "keyword", "ignore_above": 256 }, "last_screenshot_id": { "type": "keyword", "ignore_above": 256 } } } } } } }'
Note: The Elasticsearch version should be >= v5.3.
- Enable the index module in gopa.yml and update the Elasticsearch settings:
  - module: index
    enabled: true
    ui:
      enabled: true
    elasticsearch:
      endpoint: http://dev:9200
      index_prefix: gopa-
      username: elastic
      password: changeme
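Before enabling the index module, it can help to confirm that the configured endpoint accepts the credentials and meets the version note above. The following Go program is a minimal sketch, not part of Gopa: the esSettings type and the helper names are hypothetical, and it assumes the standard Elasticsearch root endpoint (GET /), which reports version.number in its JSON response.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strconv"
	"strings"
	"time"
)

// esSettings is a hypothetical struct mirroring the elasticsearch keys in gopa.yml.
type esSettings struct {
	Endpoint, Username, Password string
}

// rootRequest builds a GET request for the cluster root, with basic auth if a user is set.
func rootRequest(s esSettings) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, strings.TrimRight(s.Endpoint, "/")+"/", nil)
	if err != nil {
		return nil, err
	}
	if s.Username != "" {
		req.SetBasicAuth(s.Username, s.Password)
	}
	return req, nil
}

// versionNumber extracts version.number from a root-endpoint response body.
func versionNumber(body []byte) (string, error) {
	var info struct {
		Version struct {
			Number string `json:"number"`
		} `json:"version"`
	}
	if err := json.Unmarshal(body, &info); err != nil {
		return "", err
	}
	if info.Version.Number == "" {
		return "", fmt.Errorf("no version.number in response")
	}
	return info.Version.Number, nil
}

// atLeast reports whether version (for example "5.6.3") is >= major.minor.
func atLeast(version string, major, minor int) (bool, error) {
	parts := strings.SplitN(version, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected version %q", version)
	}
	maj, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, err
	}
	mnr, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, err
	}
	return maj > major || (maj == major && mnr >= minor), nil
}

func main() {
	// Values copied from the example configuration; adjust to your cluster.
	s := esSettings{Endpoint: "http://dev:9200", Username: "elastic", Password: "changeme"}
	req, err := rootRequest(s)
	if err != nil {
		fmt.Println("bad endpoint:", err)
		return
	}
	client := &http.Client{Timeout: 3 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("Elasticsearch not reachable:", err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	v, err := versionNumber(body)
	if err != nil {
		fmt.Println("unexpected response:", err)
		return
	}
	ok, err := atLeast(v, 5, 3)
	if err != nil {
		fmt.Println("cannot parse version:", err)
		return
	}
	fmt.Printf("Elasticsearch %s, meets >= 5.3: %v\n", v, ok)
}
```

If the endpoint is unreachable or the credentials are rejected, the program prints the reason and exits without touching the cluster.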
Gopa doesn't require any dependencies; simply run ./gopa to start the program.
Gopa can also run as a daemon (note: only available on Linux and Mac). Example:
➜ gopa git:(master) ✗ ./bin/gopa --daemon
  ________ ________ __________  _____
 /  _____/ \_____  \\______   \/  _  \
/   \  ___  /   |   \|     ___/  /_\  \
\    \_\  \/    |    \    |  /    |    \
 \______  /\_______  /____|  \____|__  /
        \/         \/                \/
[gopa] 0.10.0_SNAPSHOT
///last commit: 99616a2, Fri Oct 20 14:04:54 2017 +0200, medcl, update version to 0.10.0 ///
[10-21 16:01:09] [INF] [instance.go:23] workspace: data/gopa/nodes/0
[gopa] started.
Run ./gopa -h to get the full list of command line options. Example:
➜ gopa git:(master) ✗ ./bin/gopa -h
  ________ ________ __________  _____
 /  _____/ \_____  \\______   \/  _  \
/   \  ___  /   |   \|     ___/  /_\  \
\    \_\  \/    |    \    |  /    |    \
 \______  /\_______  /____|  \____|__  /
        \/         \/                \/
[gopa] 0.10.0_SNAPSHOT
///last commit: 99616a2, Fri Oct 20 14:04:54 2017 +0200, medcl, update version to 0.10.0 ///
Usage of ./bin/gopa:
  -config string
        the location of config file (default "gopa.yml")
  -cpuprofile string
        write cpu profile to this file
  -daemon
        run in background as daemon
  -debug
        run in debug mode, wi
  -log string
        the log level,options:trace,debug,info,warn,error (default "info")
  -log_path string
        the log path (default "log")
  -memprofile string
        write memory profile to this file
  -pidfile string
        pidfile path (only for daemon)
  -pprof string
        enable and setup pprof/expvar service, eg: localhost:6060 , the endpoint will be: http://localhost:6060/debug/pprof/ and http://localhost:6060/debug/vars
It is safe to press Ctrl+C to stop a running Gopa. Gopa handles the rest and saves a checkpoint,
so you can resume the job later; the world is still in your hands.
If you are running Gopa as a daemon, you can stop it like this:
kill -QUIT `pgrep gopa`
- Search Console: http://127.0.0.1:9001/
- Admin Console: http://127.0.0.1:9001/admin/
- TBD
You are warmly welcome to contribute to this project, whether it is UI styling, core features, or just a piece of documentation. Let's make it better.
Released under the Apache License, Version 2.0.