thedod edited this page Feb 8, 2013 · 5 revisions

json/ contains ~150MB of json files representing graphs (<hex hash>.json) and an index file (mrn2graphs.json) mapping each MRN to a list of the graph files that contain it.
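A minimal sketch of how the index could be used, assuming mrn2graphs.json is a single JSON object whose keys are MRNs and whose values are lists of graph file names (the exact layout isn't spelled out here, so check a real file first). The helper name, the sample MRN and the sample hashes are made up for illustration:

```python
import json
import os
import tempfile


def graphs_for_mrn(json_dir, mrn):
    """Return the paths of the graph files listed for `mrn` in the index.

    Assumes mrn2graphs.json looks like {"<MRN>": ["<hex hash>.json", ...]}.
    """
    with open(os.path.join(json_dir, "mrn2graphs.json")) as f:
        index = json.load(f)
    # An MRN missing from the index simply has no graphs.
    return [os.path.join(json_dir, name) for name in index.get(mrn, [])]


# Tiny self-contained demo with a fake index (hypothetical MRN and hashes).
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "mrn2graphs.json"), "w") as f:
    json.dump({"09EXAMPLE123": ["0a1b2c.json", "3d4e5f.json"]}, f)

demo_paths = graphs_for_mrn(demo_dir, "09EXAMPLE123")
print(demo_paths)
```

If the index turns out to store bare hashes rather than file names, appending ".json" inside the list comprehension would be the only change needed.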

Graph generation

make-json/ contains the scripts needed in order to generate these graphs with cable2graph (see the HOWTO file for details). This enables you to experiment with various ways of generating json files for your CableWeaver fork.

Currently, there are 4 steps (see make-json.sh):

  1. Generate a full graph of all cables and references
  2. Find clusters
  3. Find communities using the Blondel et al. multilevel algorithm
  4. Split every graph that is "too big to handle" in CableWeaver (rule of thumb: a graphml file larger than 100KB) into sub-communities.

Note that steps 2 and 3 both happen at line 7 of make-json.sh (that's what the --clusters and --multilevel switches do).
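To make the steps a bit more concrete, here is a rough, dependency-free sketch of step 2 and of the size check behind step 4. It is not how cable2graph does it. "Clusters" is taken here to mean connected components of the graph with edge direction ignored, and a node count stands in for the graphml file size; the function names, the toy edges and the threshold are all assumptions for illustration. The multilevel community step is left out because it really needs a graph library.

```python
from collections import defaultdict, deque


def find_clusters(edges):
    """Group nodes into connected components, ignoring edge direction."""
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    seen = set()
    clusters = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for other in sorted(neighbours[node]):
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        clusters.append(sorted(component))
    return clusters


def needs_splitting(cluster, max_nodes):
    """Stand-in for the ">100KB graphml" rule: flag clusters over a node limit."""
    return len(cluster) > max_nodes


# Hypothetical references between cables: (citing MRN, cited MRN).
toy_edges = [("A", "B"), ("C", "B"), ("D", "E")]
toy_clusters = find_clusters(toy_edges)
print(toy_clusters)
print([needs_splitting(c, max_nodes=2) for c in toy_clusters])
```

Any cluster flagged by the size check would then be handed to the community algorithm again to get sub-communities.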

If you fork this, you can try other methods of generating the json/ folder. To see which algorithms are available, read Cable2Graph's observations page.

Conversion to json

g2json is a downsized version of Cable2Graph's g2svg (not having to compute the layout makes it a hell of a lot faster). The main additions are:

  • Fixing directionality (cluster and community algorithms treat the graph as non-directional and botch the directionality).
  • Adding auxiliary information to the json files that is [at least for me] easier to compute in Python. Note that color and colorindex are two different ways to give a node a color: at the moment we use color since it's globally consistent, but colorindex (which the prototype uses) can produce more readable graphs as long as your color palette contains enough colors.
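The directionality fix can be pictured like this. It is a rough sketch, not g2json's actual code: it assumes the original directed references are available as (source, target) pairs and that the cluster or community step hands back unordered pairs. The function name and the sample data are invented.

```python
def restore_direction(undirected_edges, directed_edges):
    """Re-orient edges from an undirected subgraph using the original graph.

    Both orientations are kept when two cables reference each other.
    """
    original = set(directed_edges)
    restored = []
    for a, b in undirected_edges:
        if (a, b) in original:
            restored.append((a, b))
        # Skip the reverse check for self-loops to avoid duplicates.
        if a != b and (b, a) in original:
            restored.append((b, a))
    return restored


# Hypothetical data: X cites Y, Z cites Y, and Y and W cite each other.
full_graph = [("X", "Y"), ("Z", "Y"), ("Y", "W"), ("W", "Y")]
community = [("Y", "X"), ("Y", "Z"), ("W", "Y")]  # direction lost or flipped
fixed_edges = restore_direction(community, full_graph)
print(fixed_edges)
```

Looking edges up in a set keeps the pass linear in the number of edges, which matters more than layout here since g2json skips layout entirely.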

Runtime

CableWeaver uses D3's force layout feature, with various tweaks that are not necessarily optimal (I'm a D3 noob), so if you're a D3 expert, you can probably find ways to do things better, faster and more elegantly.

One thing I couldn't figure out how to do is make the nodes anything more complex than a circle (e.g. add the MRN as text). As soon as I use a group element (an SVG <g>) containing a circle and text (or even only a circle), the force layout system becomes too slow to be practical. Perhaps it's only a matter of optimization, perhaps nothing can be done (and a circle with a mouse-over title is the best we can have). I'd love to get some input from D3 wizards about this.
