Skip to content


Subversion checkout URL

You can clone with
Download ZIP
Branch: master

Update READMEs to have correct installation instructions for each pac…

…kage. Make sure instructions are clear for both the precompiled-binary and built-from-source cases.
latest commit 531c88e499
@aseering aseering authored



The Shell Load extension package provides a mechanism to execute shell
commands from within Vertica and retrieve the result as a stream of
raw file data to be loaded via the COPY command.  You can execute shell
commands to generate or fetch data, and commands to filter data loaded
by an existing Vertica source, passing it via pipes through an external

Be aware: executing arbitrary shell commands is a loaded gun and should
be used with care.


To install, first copy build/ and 
build/ to a node on your cluster.  Then log 
into that node with vsql and run the following commands:

CREATE LIBRARY ExternalFilterLib AS '/path/to/';
CREATE FILTER ExternalFilter AS LANGUAGE 'C++' NAME 'ExternalFilterFactory' LIBRARY ExternalFilterLib;
CREATE LIBRARY ExternalSourceLib AS '/path/to/';
CREATE SOURCE ExternalSource AS LANGUAGE 'C++' NAME 'ExternalSourceFactory' LIBRARY ExternalSourceLib;

Alternatively, assuming vsql is in your path, just do:

$ make install
$ make uninstall

Make sure the user running the Vertica process is able to sudo to user 'nobody'
without password and without having a tty. The following /etc/sudoers entry
should work:

dbadmin ALL=(nobody) NOPASSWD: ALL
Defaults:dbadmin !requiretty

Alternatively, add -DNO_SUDO to CXXFLAGS in the Makefile to have the
commands run as the Vertica process, and rebuild from source. Note that 
even though running as 'nobody' is not secure (one user could harm or 
inspect another user's process), at least it is a bit safer in that an 
accidental "rm -rf" will not delete your database.


To build this library from source code:

$ make

To test the resulting binary:

$ make test


COPY <table> WITH SOURCE ExternalSource(cmd='<command>' [, nodes='<nodes list>']);
COPY <table> WITH FILTER ExternalFilter(cmd='<command>');

command    - command to execute.
nodes list - comma-separated list of node names to execute on,
             'ANY NODE' (default) or 'ALL NODES'.

The actual execution is accomplished by forking and running: sh -c <command>
Because the command is actually interpreted by the shell, pipes,
redirects, looping and other shell functionality is available.

Example 1:
Creating a table and populating it once.

create table t (i int);
copy t with source ExternalSource(cmd='seq 1 10');
copy t from 'list_of_numbers.txt.gz' with filter ExternalFilter(cmd='gunzip');

Example 2:
Creating an external table without state (is populated on every query).

create external table t (i int) as copy source ExternalSource(cmd='seq 1 10');

The following environment variables are available from ExternalSource:

  - CURRENT_NODE_NAME  - name of the current node, e.g. 'v_test_node0001'
  - TARGET_NODE_NAMES  - comma-separated list of nodes on which the query runs
  - NUM_TARGET_NODES   - number of nodes on which the query runs
  - CURRENT_NODE_INDEX - an index (0..NUM_TARGET_NODES-1) of this node

You can use NUM_TARGET_NODES and CURRENT_NODE_INDEX to have cmd return a subset
of data on each node, e.g. that where:

    [ $something % $NUM_TARGET_NODES -eq $CURRENT_NODE_INDEX ]

The following environment variables are available from ExternalFilter:

  - CURRENT_NODE_NAME  - see above


Please see LICENSE.txt


This UDL executes an external program and passes its stdout directly to Vertica.  It's effective, for example, if your data is generated by a program, or if your data is available on a remote computer and a command-line client program can access it.

To install this UDL, run "make install", or see the vsql commands under the 'install' target in the Makefile.

The ExternalSource UDL takes two arguments:
- 'cmd' (required): The command to execute.  Must be installed on all nodes that will execute the query; by far the simplest approach is to simply install it on all nodes.
- 'nodes' (optional): A comma-separated list of the names of nodes to run the query on.  If none is specified, chooses one node at random.

Usage example:

create table t (i int);
copy t with source ExternalSource(cmd='seq 1 10');

Something went wrong with that request. Please try again.