Pull out bits of URLs provided on stdin
Switch branches/tags
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
script
.gitignore
CONTRIBUTING.mkd
LICENSE
README.mkd
format_test.go
main.go

README.mkd

unfurl

Pull out bits of URLs provided on stdin

Install

If you have Go installed and configured:

▶ go get -u github.com/tomnomnom/unfurl

Otherwise download the latest binary for your platform, extract it and move it to somewhere in your $PATH (e.g. /usr/bin/):

▶ wget https://github.com/tomnomnom/unfurl/releases/download/v0.0.1/unfurl-linux-amd64-0.0.1.tgz
▶ tar xzf unfurl-linux-amd64-0.0.1.tgz
▶ sudo mv unfurl /usr/bin/

Usage

unfurl works with URLs provided on stdin; they might come from a file like this one:

▶ cat urls.txt
https://sub.example.com/users?id=123&name=Sam
https://sub.example.com/orgs?org=ExCo#about
http://example.net/about#contact

Domains

You can extract the domains from the URLs with the domains mode:

▶ cat urls.txt | unfurl domains
sub.example.com
sub.example.com
example.net

If you don't want to output duplicate values you can use the -u or --unique flag:

▶ cat urls.txt | unfurl --unique domains
sub.example.com
example.net

The -u/--unique flag works for all modes.

Paths

▶ cat urls.txt | unfurl paths
/users
/orgs
/about

Query String Keys

▶ cat urls.txt | unfurl keys
id
name
org

Query String Values

▶ cat urls.txt | unfurl values
123
Sam
ExCo

Query String Key/Value Pairs

▶ cat urls.txt | unfurl keypairs
id=123
name=Sam
org=ExCo

Custom Formats

You can use the format mode to specify a custom output format:

▶ cat urls.txt | unfurl format %d%p
sub.example.com/users
sub.example.com/orgs
example.net/about

The available format directives are:

%%  A literal percent character
%s  The request scheme (e.g. https)
%u  The user info (e.g. user:pass)
%d  The domain (e.g. sub.example.com)
%S  The subdomain (e.g. sub)
%r  The root of domain (e.g. example)
%t  The TLD (e.g. com)
%P  The port (e.g. 8080)
%p  The path (e.g. /users)
%q  The raw query string (e.g. a=1&b=2)
%f  The page fragment (e.g. page-section)
%@  Inserts an @ if user info is specified
%:  Inserts a colon if a port is specified
%?  Inserts a question mark if a query string exists
%#  Inserts a hash if a fragment exists
%a  Authority (alias for %u%@%d%:%P)

Any characters that don't match a format directive remain untouched:

▶ cat urls.txt | unfurl -u format "%d (%s)"
sub.example.com (https)
example.net (http)

Note that if a URL does not include the data requested, there will be no output for that URL:

▶ echo http://example.com | unfurl format "%P"
▶ echo http://example.com:8080 | unfurl format "%P"
8080

Help

▶ unfurl -h
Format URLs provided on stdin

Usage:
  unfurl [OPTIONS] [MODE] [FORMATSTRING]

Options:
  -u, --unique   Only output unique values
  -v, --verbose  Verbose mode (output URL parse errors)

Modes:
  keys     Keys from the query string (one per line)
  values   Values from the query string (one per line)
  keypairs Key=value pairs from the query string (one per line)
  domains  The hostname (e.g. sub.example.com)
  paths    The request path (e.g. /users)
  format   Specify a custom format (see below)

Format Directives:
  %%  A literal percent character
  %s  The request scheme (e.g. https)
  %u  The user info (e.g. user:pass)
  %d  The domain (e.g. sub.example.com)
  %P  The port (e.g. 8080)
  %p  The path (e.g. /users)
  %q  The raw query string (e.g. a=1&b=2)
  %f  The page fragment (e.g. page-section)
  %@  Inserts an @ if user info is specified
  %:  Inserts a colon if a port is specified
  %?  Inserts a question mark if a query string exists
  %#  Inserts a hash if a fragment exists
  %a  Authority (alias for %u%@%d%:%P)

Examples:
  cat urls.txt | unfurl keys
  cat urls.txt | unfurl format %s://%d%p?%q