xml2 for JSON; like gron
There is a tool, xml2, that converts XML files to an intermediate format in which editing and data extraction can be done with simple (not XML-aware) tools, such as regular-expression-based grep or sed. It does not solve the general task of transforming XML files, but it lets text-handling tools go further than a direct attempt to use them on XML.

But xml2 is for XML, and somebody may want a similar tool for JSON.

This repository has two main tools plus several supplementary ones:

  • json2 - converts JSON to intermediate text-editable format;
  • 2json - converts that intermediate format back to JSON;
  • json_compare - compares two JSON files for equality and reports the difference found, if any;
  • random_json - generates random "tricky" JSON (with confusing strings, empty objects, etc.);
  • json_keys - gathers the keys used in objects in the JSON;
  • test.sh - endless "fuzz test" of json2 | 2json using random_json and json_compare;
  • json2dir and dir2json - "unpack" JSON to files and directories and back.

Tested with Python 2.6, 2.7 and 3.2.

Example

JSON file

{ "mist": "qua\nlity\n",
  "fist": [],
   "gist": [5,6,"7"],
   "...": null,
   "test":[[[false]]],
   "var":{"lib":{"dpkg":"status"}}
}

Output of json2:

/...=null
/gist/0=5
/gist/1=6
/gist/2="7
/var/lib/dpkg=status
/fist=[]
/test/0/0/0=false
/mist="qua
/mist="lity
/mist="
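The mapping from a JSON tree to address=value lines can be illustrated with a minimal Python 3 sketch. This is not the code of json2: the function name flatten is hypothetical, keys are used verbatim (no mangling), and every string is marked with a leading ", so it would print /var/lib/dpkg="status where the output above has /var/lib/dpkg=status.

```python
import json


def flatten(node, address=""):
    """Yield (address, value_text) pairs for a parsed JSON tree.

    Simplified illustration only: keys are not mangled and every
    string value is marked with a leading double quote.
    """
    if isinstance(node, dict):
        if not node:
            yield address, "{}"          # only empty objects need a stub
        for key, child in node.items():
            yield from flatten(child, address + "/" + key)
    elif isinstance(node, list):
        if not node:
            yield address, "[]"          # only empty lists need a stub
        for index, child in enumerate(node):
            yield from flatten(child, address + "/" + str(index))
    elif isinstance(node, str):
        # A multi-line string becomes repeated lines with the same address.
        for part in node.split("\n"):
            yield address, '"' + part
    else:
        # null, booleans and numbers keep their JSON spelling.
        yield address, json.dumps(node)


if __name__ == "__main__":
    document = json.loads('{"mist": "qua\\nlity\\n", "fist": [], "gist": [5, 6, "7"]}')
    for address, value in flatten(document):
        print(address + "=" + value)
```

The order of the printed lines follows the order of keys in the parsed dictionary, which need not match the order json2 produces.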

Rules of the format

  • Each line must contain "=". The first "=" on each line is always put by json2, subsequent "="s may happen in the data extracted from JSON;
  • The left part of the line before "=" is "address", the right part after the first "=" is "value".
  • A value can be a string, a number (integer or float), null, a boolean, an empty list or an empty object.
  • Any value that can't be interpreted as non-string is interpreted as string. Using " character just after = forces it to be a string. By default json2 uses unescaped strings where possible: if there_may_be_problems then prefix_with_" else use_the_string_as_is. JSON2_ALWAYS_MARK_STRINGS=true overrides this and makes json2 put " before any string values.
  • Only empty lists and objects must be explicitly mentioned as values. Non-empty lists and objects still can have "stubs" like =[] or ={} at the respective address. JSON2_ALWAYS_STUBS=true forces stubs for all lists and objects.
  • Address is a list of keys separated by "/". The first empty key (before the first /) is ignored, subsequent empty keys are treated as empty keys of objects (for example, {"":{"":""}} -> //="). Each address entry "descends" from the top-level list or object into its children (creating intermediate lists or objects if necessary).
  • Numeric keys are used as indexes (starting from 0) of the lists in JSON. Non-numeric keys are keys for object fields.
  • All keys of object fields are mangled to preserve the assumptions about the usage of /, =, " and \n characters and to avoid mistakenly interpreting them as indexes for lists instead of keys for objects. Mangling rules are not standard: apart from the usual \n, \r and \t, the characters / " = \ become \| \' \_ \! respectively. Additionally, the entire key may be prefixed with \ if it looks like a number.
  • Multiline string values are handled as repeated lines (with the same address).
  • Apart from multi-line string values, lines in 2json input file may be reordered arbitrarily.
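The key mangling rule above can be sketched in Python 3 as follows. The names MANGLE, UNMANGLE, looks_like_number, mangle_key and unmangle_key are hypothetical, and using int() to decide whether a key "looks like a number" is an assumption; the actual test in json2 may differ.

```python
# Replacement table taken from the rules above.
MANGLE = {
    "\n": "\\n", "\r": "\\r", "\t": "\\t",
    "/": "\\|", '"': "\\'", "=": "\\_", "\\": "\\!",
}
# Reverse table, indexed by the character that follows the backslash.
UNMANGLE = {escaped[1]: raw for raw, escaped in MANGLE.items()}


def looks_like_number(text):
    # Assumption: a key "looks like a number" when int() accepts it.
    try:
        int(text)
        return True
    except ValueError:
        return False


def mangle_key(key):
    """Escape an object key so it is safe inside an address."""
    mangled = "".join(MANGLE.get(ch, ch) for ch in key)
    if looks_like_number(mangled):
        mangled = "\\" + mangled     # keep it from being read as a list index
    return mangled


def unmangle_key(mangled):
    """Reverse mangle_key."""
    out = []
    chars = iter(mangled)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        code = next(chars, None)
        if code in UNMANGLE:
            out.append(UNMANGLE[code])
        elif code is not None:
            # A backslash before any other character is the number prefix.
            out.append(code)
    return "".join(out)
```

Because a literal backslash is always written as \!, a backslash followed by any character outside the table can only be the number prefix, which is what lets unmangle_key drop it without extra state.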

Limitations

  • Order of fields in objects is not preserved;
  • 2json is slow. It navigates into the hierarchy of objects and lists from the root for every line;
  • All tools load the entire input file in memory as a tree, not "streamed".
  • It may be a poor option if you need to handle recursive JSON files.
  • There may be corner-case incompatibilities between the json2 format generated when running under Python 2 and under Python 3. For example, "+ 1" is not considered a valid number by Python 3, hence it is not prefixed with \.
  • The round-trip test fails on Python 2 in a tricky corner case (involving tricky characters in keys).
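The cost described in the second limitation comes from resolving every line independently, starting at the root each time. A minimal Python 3 sketch of that approach is given below. It is not the code of 2json: the names NUMBER, INDEX, parse_value, parse_key, get_slot and rebuild are hypothetical, keys are not unmangled, and malformed input (for example, a numeric key addressed into an object) is not handled.

```python
import json
import re

NUMBER = re.compile(r"-?(0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?")
INDEX = re.compile(r"[0-9]+")


def parse_value(text):
    """Interpret the text after the first '='."""
    if text.startswith('"'):
        return text[1:]                          # forced string
    if text in ("null", "true", "false", "[]", "{}") or NUMBER.fullmatch(text):
        return json.loads(text)
    return text                                  # everything else is a string


def parse_key(key):
    # Assumption: keys made only of ASCII digits are list indexes.
    return int(key) if INDEX.fullmatch(key) else key


def get_slot(container, key):
    """Return container[key], creating an empty slot if it is missing."""
    if isinstance(container, list):
        while len(container) <= key:
            container.append(None)               # pad lists filled out of order
    else:
        container.setdefault(key, None)
    return container[key]


def rebuild(lines):
    top = [None]                                 # top[0] holds the document root
    for line in lines:
        address, _, text = line.partition("=")   # split on the first '=' only
        keys = [parse_key(k) for k in address.split("/")[1:]]
        value = parse_value(text)
        container, key = top, 0
        for next_key in keys:                    # walk from the root on every line
            child = get_slot(container, key)
            if child is None:
                child = [] if isinstance(next_key, int) else {}
                container[key] = child
            container, key = child, next_key
        old = get_slot(container, key)
        if isinstance(old, str) and isinstance(value, str):
            container[key] = old + "\n" + value  # repeated address: multi-line string
        elif not isinstance(old, (list, dict)):
            container[key] = value               # stubs never replace filled containers
    return top[0]
```

Each line costs a walk proportional to the depth of its address, and the whole tree stays in memory until the last line is read, which matches the first three limitations listed above.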
