briggs
is a simple command-line program for converting test data between many popular
formats, such as:
- JSON
- CSV
- XML
- C/C++ & Java structures
- SQL
- and hopefully many more...
Let's go champ...
To get briggs
working, you'll need at a minimum:
- OpenSSL
If you want MySQL, Postgres or Oracle support you'll need to add those libraries to your system or request that they be compiled in if building briggs
from source.
Keep reading for instructions on how to build.
yum install postgresql
postgresql
On Unix-like systems (e.g. not Windows) make && make install
should be all
that's needed to run on your system. Windows users can get up and running fairly
easily via Cygwin.
Provided you have wget or curl on your system, run the following three commands
to get briggs
built and installed on your system.
$ wget https://github.com/zaiah-dj/briggs/archive/master.zip $ unzip master.zip && cd briggs-master/ && make $ sudo make install
Options for running briggs
are as follows.
-i, --input <arg> Specify an input datasource (required) --create If an output source does not exist, create it -o, --output <arg> Specify an output datasource (optional, default is stdout) -c, --convert Converts input data to another format or into a datasource -H, --headers Displays headers of input datasource and stop -S, --schema Generates an SQL schema using headers of input datasource -d, --delimiter <arg> Specify a delimiter when using a file as input datasource -u, --output-delimiter <arg> Specify an output delimiter when generating serialized output -t, --typesafe Enforce and/or enable typesafety -C, --coerce Specify a type for a column -f, --format <arg> Specify a format to convert to -j, --json Convert into JSON (short for --convert --format "json") -x, --xml Convert into XML (short for --convert --format "xml") -q, --sql Convert into general SQL INSERT statement. -p, --prefix <arg> Specify a prefix -s, --suffix <arg> Specify a suffix --class <arg> Generate a Java-like class using headers of input datasource --struct <arg> Generate a C-style struct using headers of input datasource --for <arg> Use <arg> as the backend when generating a schema (See `man briggs` for available backends) --camel-case Use camel case for class properties or struct labels -n, --no-newline Do not generate a newline after each row. --id <arg> Add and specify a unique ID column named <arg>. --add-datestamps Add columns for date created and date updated -a, --ascii Remove any characters that aren't ASCII and reproduce --no-unsigned Remove any unsigned character sequences. -T, --table <arg> Use the specified table when connecting to a database for source data -Q, --query <arg> Use this query to filter data from datasource -y, --stats Dump stats at the end of an operation -X, --dumpdsn Dump the DSN only. (DEBUG) -h, --help Show help.
The repository ships with a few test files in the test/ folder.
Let's start off with the file called test/inventory.csv
. Running
the following command:
briggs --convert test/inventory.csv --stream json
(or in short form:)
briggs -c test/inventory.csv -j
Will yield a list resembling the following:
[{ "inv_cost": "0.99" ,"inv_qty": "-1" ,"inv_item": "2020-11-04-Sunrise.mp3" ,"inv_image": "" ,"inv_description": "Mr. Doctor – Sunrise (MP3 encoded)" } ,{ "inv_cost": "0.99" ,"inv_qty": "-1" ,"inv_item": "2020-11-04-Scapes-v1.mp3" ,"inv_image": "" ,"inv_description": "Mr. Doctor – Scapes (MP3 encoded)" } /* this list continues... */ }]
If we take a peek at that file, we'll see that briggs
is automatically
using the first line of data as key names.
//Output of head -n 1 test/inventory.csv inv_cost,inv_qty,inv_item,inv_image,inv_description
If we would like to see those in one column, or just need to see what the
key names will look like in a data structure, we can use the following to
see what briggs
made of the file's headers:
inv_cost inv_qty inv_item inv_image inv_description
While not terribly useful on it's own, the --headers option can be used in
conjunction with sed
or similar to do quick data formatting. So with the
following:
briggs --headers test/inventory.csv | sed 's/_/entory /g; s;\(.*\);<h2>\1</h2>;'
...we'll get:
<h2>inventory cost</h2> <h2>inventory qty</h2> <h2>inventory item</h2> <h2>inventory image</h2> <h2>inventory description</h2>
Now, let's say we've been tasked with saving this data to a sqlite database of some sort. Assuming we're using something like sqlite, this should be a fairly trivial operation.
First, if there is no table we'll have to create it.
//create statement for the inventory table CREATE TABLE inventory ( inv_cost TEXT, inv_qty TEXT, inv_item TEXT, inv_image TEXT, inv_description TEXT );
Second, let briggs
handle the conversion and place the insert
statement into a file called inventory.sql.
briggs -c test/inventory.csv -f sql --root inventory > inventory.sql
This will yield:
INSERT INTO inventory ( inv_cost,inv_qty,inv_item,inv_image,inv_description ) VALUES ( '0.99','-1','2020-11-04-Sunrise.mp3','','Mr. Doctor – Sunrise (MP3 encoded)' ); INSERT INTO inventory ( inv_cost,inv_qty,inv_item,inv_image,inv_description ) VALUES ( '0.99','-1','2020-11-04-Scapes-v1.mp3','','Mr. Doctor – Scapes (MP3 encoded)' ); /* .... more records .... */ INSERT INTO inventory ( inv_cost,inv_qty,inv_item,inv_image,inv_description ) VALUES ( '0.99','-1','2020-11-04-MoreLyrical.mp3','','Mr. Doctor – More Lyrical (MP3 encoded)' );
Notice how we used the --root
option to specify the name of the table. If we
didn't use it, briggs
would have used the word 'root' as our assumed table name.
It's worth noting that for options in which a named data structure will make
sense, briggs
will rely on the root flag. So, for example, if we wanted a C
structure with a particular variable name, the --root option would be used to
specify that name. Obviously, for certain formats this won't apply.
Let's take another example, using an Oracle database for storage. By default,
briggs
will output SQL insert code with a single-quote delimiter for strings.
So what if we encounter a string like the third line below in one of our data
files? (Leading spaces have been added for clarity.)
10001, 100 E Santos St., Charlotte, NC, 28201, Fish populate this neighborhood. 10000, 20 E Flatbush Ave., Charlotte, NC, 28201, It is so nice here. 10009,777 Unlucky Cir.,Charlotte,NC,28201,This is sure to trip up ' ' 'anyone' '. 10007, 80 E Yonkers Ct., Charlotte, NC, 28206, Runaway... fast.
Running our usual insert generator will give us something that doesn't work quite so well with a default Oracle XE install:
//Generate an INSERT statement and save to 'oracle.sql' briggs --sql -c test/oracle.csv > oracle.sql
//Dump the contents of 'oracle.sql' INSERT INTO root ( record,address,city,state,zip,description ) VALUES ( '10001','100 E Santos St.','Charlotte','NC','28201','Phish are all over this neighborhood.' ); INSERT INTO root ( record,address,city,state,zip,description ) VALUES ( '10000','20 E Flatbush Ave.','Charlotte','NC','28201','It is so nice here.' ); INSERT INTO root ( record,address,city,state,zip,description ) VALUES ( '10009','777 Unlucky Cir.','Charlotte','NC','28201','This is sure to trip up ' ' 'anyone' '.' ); INSERT INTO root ( record,address,city,state,zip,description ) VALUES ( '10007','80 E Yonkers Ct.','Charlotte',' NC','28206','Runaway... fast.' );
sudo -Eu oracle sqlplus / as sysdba < oracle.sql /* ...Oracle's diagnostic and version information would display here... */ SQL> 2 1 row created. SQL> 2 1 row created. SQL> 2 ERROR: ORA-01756: quoted string not properly terminated SQL> 2 1 row created.
Yep, that error message says it all. The typical SQL database does not handle single quotes very well. For what it's worth, Oracle handled this extremely well, as some database engines won't even tell you why this particular statement does not work.
So, how do we fix it? The first solution is to just escape the single quote with either a backslash () or another single quote ('). So all the characters in our offending field would look something like this:
'This is sure to trip up \' \' \'anyone\' \'.' //or 'This is sure to trip up '' '' ''anyone'' ''.'
This can work for this ONE change. But what if we've got a thousand lines like this? Or even worse, what if multiple fields in our data look like this?
//Yuck... 10001, 100 'E' Santos St., Charlotte, NC, 28201, Fish populate this neighborhood. 10000, 20 E Flatbush Ave., Charlotte, 'NC', 28201, It's so nice here on 'Tuesday'. 10009, 777 Unlucky Cir., Charlotte, NC, 28201, This is sure to trip up ''anyone''. 10007, 80 E Yonkers Ct., 'Charlotte', NC, 28206, Runaway... fast.
The second solution will involve relying on a completely different delimiter.
Fortunately for us, Oracle has a built-in way to solve this problem via the 'q'
tag. It tells the database engine that it is about to encounter an unusual
string, that may contain strange characters like the ones in our test file.
The idea is similar to a heredoc for those familiar with Bash or PHP. You can
read more about the 'q' mechanism here.
To make use of things like this, briggs
comes with an --output-delimiter
option
(or -u
for short) to change the default string enclosure. It is invoked when
converting to SQL and used as a delimiter when not specifying a particular format.
So, with no format specified, we'll get something like the following when running
briggs
with that flag:
//Command briggs -c test/oracle.csv -u "q'[,]'" //Yields /* record => q'[10001]' address => q'[100 E Santos St.]' city => q'[Charlotte]' state => q'[NC]' zip => q'[28201]' description => q'[Phish are all over this neighborhood.]' record => q'[10000]' address => q'[20 E Flatbush Ave.]' city => q'[Charlotte]' state => q'[NC]' zip => q'[28201]' description => q'[It is so nice here.]' record => q'[10009]' address => q'[777 Unlucky Cir.]' city => q'[Charlotte]' state => q'[NC]' zip => q'[28201]' description => q'[This is sure to trip up ' ' 'anyone' '.]' record => q'[10007]' address => q'[80 E Yonkers Ct.]' city => q'[Charlotte]' state => q'[NC]' zip => q'[28206]' description => q'[Runaway... fast.]' */
And to solve our current problem, we can run:
//Command briggs --sql -c test/oracle.csv -u "q'[,]'" > oracle.sql
...which will yield
INSERT INTO root ( record,address,city,state,zip,description ) VALUES ( q'[10001]',q'[100 E Santos St.]',q'[Charlotte]',q'[NC]',q'[28201]',q'[Phish are all over this neighborhood.]' ); INSERT INTO root ( record,address,city,state,zip,description ) VALUES ( q'[10000]',q'[20 E Flatbush Ave.]',q'[Charlotte]',q'[NC]',q'[28201]',q'[It is so nice here.]' ); INSERT INTO root ( record,address,city,state,zip,description ) VALUES ( q'[10009]',q'[777 Unlucky Cir.]',q'[Charlotte]',q'[NC]',q'[28201]',q'[This is sure to trip up ' ' 'anyone' '.]' ); INSERT INTO root ( record,address,city,state,zip,description ) VALUES ( q'[10007]',q'[80 E Yonkers Ct.]',q'[Charlotte]',q'[NC]',q'[28206]',q'[Runaway... fast.]' );
One of the reasons for writing this tool was a need to stay away from in-depth
sed
& awk
incantations. In theory, they can solve just about any problem
one would encounter when dealing with ASCII text. In practice (for me anyway),
it always takes too many lines and far too much thought to get at a quick solution.
This is open-source software, and a project currently in beta.
briggs
was named in honor of New York's
Shannon Briggs -- notorious for his vicious (and hilarious) trolling of
Vladimir Klitschko before the pair's promised title fight.
Please submit a pull request, a comment here on Github, or just email me directly at ramar@ramar.work if you're having issues using it.