Marc van Grootel edited this page Jan 3, 2016 · 4 revisions

Introduction to Origami

Origami is a templating library for XQuery 3.1. It targets BaseX, an XML database engine and XPath/XQuery 3.1 processor.

The following sections introduce the library and give an idea of how Origami works.

Topics:

  • The Mu data structure
  • Pure code templates
  • Mu pipelines
  • Document builders
  • Node transformers
  • Template builders

Mu, a micro XML data structure

An Origami template is a data structure built out of arrays and maps. XQuery 3.0 already introduced functions as data items and XQuery 3.1 introduces arrays and maps. Origami takes advantage of these new language features.

Mu is the name of the Greek letter "μ". The Mu data structure can represent the structure of XML documents (minus some of its features, such as comments, processing instructions and namespaces). Its data model resembles the proposal for Micro-XML, so you could call it a flavor of μ-XML.

With the Origami library you can:

  • create Mu documents from XML or HTML
  • extract nodes from XML or HTML documents
  • attach node handlers (functions) to elements and attributes
  • transform Mu documents or compose larger templates from smaller ones
  • create HTML or XML from Mu documents

Although Origami primarily deals with Mu data structures these can also contain XML nodes. But the reverse is not true: XML nodes cannot contain Mu data structures.

If this were a game of rock-paper-scissors then Mu would be the paper and XML the rock.

Because of their simplicity, Mu documents are easy to write by hand. They are pure XQuery code. If you want to use Origami only for Mu, you need to learn just two functions: o:doc and o:xml. Take the following XML fragment.

declare variable $xml := 
  <p>Hello, <span class="name">Origami</span>!</p>;

We can convert this XML fragment to a Mu data structure using the o:doc function.

o:doc($xml)

This will return the following Mu data structure.

['p', 
    'Hello, ', 
    ['span', map { 'class': 'name' }, 'Origami'], 
    '!'
]

It's easy to work out how this data structure represents the original XML fragment. In essence, o:doc returns a nested structure of arrays (the elements), maps (the attributes) and plain strings. Element names are also plain strings, but they must occur in the first position of each array. Handlers (function items) can be embedded too, but they will be introduced later.
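To make the layout concrete, here is a minimal sketch in plain XQuery 3.1 that takes a Mu element apart. The local:has-attrs, local:tag, local:attrs and local:content helpers are hypothetical names made up for this illustration. They are not Origami functions, and they assume that the attribute map, when present, directly follows the element name.

```xquery
xquery version "3.1";

(: Hypothetical helpers for illustration only; they are not Origami functions. :)

(: Assumption: the attribute map, if there is one, is the second member. :)
declare function local:has-attrs($mu as array(*)) as xs:boolean {
  if (array:size($mu) ge 2) then $mu(2) instance of map(*) else false()
};

(: The element name is the first member of the array. :)
declare function local:tag($mu as array(*)) as xs:string {
  $mu(1)
};

(: The attributes, or an empty map when the element has none. :)
declare function local:attrs($mu as array(*)) as map(*) {
  if (local:has-attrs($mu)) then $mu(2) else map { }
};

(: Everything after the name and the optional attribute map is content. :)
declare function local:content($mu as array(*)) as item()* {
  array:subarray($mu, if (local:has-attrs($mu)) then 3 else 2)?*
};

let $mu := ['p', 'Hello, ', ['span', map { 'class': 'name' }, 'Origami'], '!']
let $span := local:content($mu)[. instance of array(*)]
return (
  local:tag($mu),            (: 'p' :)
  local:attrs($span)?class,  (: 'name' :)
  local:content($span)       (: 'Origami' :)
)
```

Nothing here is specific to Origami: a Mu document is ordinary XQuery data, so the standard array and map functions work on it.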

If you have such a data structure use the o:xml function to get back XML.

declare variable $mu := o:doc($xml);

o:xml($mu)

This will return an XML node structure identical to the original.
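To get a feeling for what such a conversion involves, the following is a rough sketch of a Mu-to-XML function in plain XQuery 3.1. The name local:to-xml is hypothetical and this is not how o:xml is implemented. The sketch ignores embedded handlers and assumes the attribute map directly follows the element name.

```xquery
xquery version "3.1";

(: Hypothetical converter for illustration only; it is not Origami's o:xml
   and it does not deal with function items embedded in the structure. :)
declare function local:to-xml($item as item()) as node()* {
  typeswitch ($item)
    case $mu as array(*) return
      let $has-attrs :=
        if (array:size($mu) ge 2) then $mu(2) instance of map(*) else false()
      let $attrs := if ($has-attrs) then $mu(2) else map { }
      let $content := array:subarray($mu, if ($has-attrs) then 3 else 2)?*
      return
        element { $mu(1) } {
          (: attributes first, then the converted content :)
          map:for-each($attrs, function($name, $value) {
            attribute { $name } { $value }
          }),
          $content ! local:to-xml(.)
        }
    (: XML nodes embedded in a Mu structure are passed through unchanged :)
    case $node as node() return $node
    (: anything else is treated as text :)
    default return text { string($item) }
};

local:to-xml(['p', 'Hello, ', ['span', map { 'class': 'name' }, 'Origami'], '!'])
```

For the example fragment this produces the p element with its nested span again. The second case also shows why Mu can contain XML nodes: a converter can simply copy them into the result.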

Templates as code

Using Mu data structures we can easily turn a sequence of values into an HTML list.

declare function list($items) 
{
  ['ul', map { 'class': 'groceries' },
    for $item in $items
    return ['li', $item]
  ]
};

list(('Apples', 'Bananas', 'Pears'))

This returns

['ul', map { 'class': 'groceries' },
  ['li', 'Apples'],
  ['li', 'Bananas'],
  ['li', 'Pears']
]

Using the o:xml function we get the XML for this list.

o:xml(list(('Apples', 'Bananas', 'Pears')))

<ul class="groceries">
  <li>Apples</li>
  <li>Bananas</li>
  <li>Pears</li>
</ul>

This type of templating will feel very familiar to the developer as it uses little else than the XQuery language itself. It takes a generic data structure and folds it together with data into another generic data structure.

On the other hand, this example may not show off the best abilities of Origami, as it is just as easy to write the following.

declare function list($items) 
{
  <ul class='groceries'>{
    for $item in $items
    return <li>{ $item }</li>
  }</ul>
};

list(('Apples', 'Bananas', 'Pears'))

In a language like XQuery a lot of the infrastructure is geared towards queries on XML nodes. A lot of the power of XPath cannot be used for querying Mu data structures. But it's easy to mix the two approaches. Origami is aimed at algorithmic manipulation of documents for the purpose, mainly, of templating.

This example shows that you can implement a familiar templating style using the Mu data structure. It makes your code read less like a weird blend between XML markup and code. It's also the style that is probably easiest to introduce in your code.

Mu pipelines

Let's look at three different examples for generating an HTML list from a simple XML list structure. This is a simple transformation which amounts to little else than renaming a few elements. These examples will show how to build little Mu pipeline functions for modifying Mu documents.

declare variable $local:xml :=
  <list>
    <item>A</item>
    <item>B</item>
    <item>C</item>
  </list>;

declare function x:ul($list as element(list))
as element(ul)
{
    ['ul',
        o:doc($list)
        => o:unwrap()
        => for-each(o:rename('li'))
    ] => o:xml()
};

$local:xml => x:ul()

<ul>
  <li>A</li>
  <li>B</li>
  <li>C</li>
</ul>

In this case the function accepts XML nodes and also produces XML nodes. When building larger applications you may choose to transform the documents or data to Mu and then let the individual pipelines work on the Mu documents. This way you can take advantage of many of the other Mu node transformer functions and only do the serialization from and to XML at the beginning and end of the processing.

The above example would then look like this.

declare function x:ul($list as array(*))
as array(*)
{
    ['ul',
        $list
        => o:unwrap()
        => for-each(o:rename('li'))
    ]
};

o:doc($local:xml) => x:ul() => o:xml()

I am using the arrow operator as a handy syntactic device to make the code look more like a pipeline through which Mu nodes flow.
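For readers new to the arrow operator, here is a small sketch that uses only built-in functions (no Origami functions). The expression on the left of => becomes the first argument of the function call on the right, so a pipeline is just another way to write nested calls.

```xquery
xquery version "3.1";

(: Only built-in functions are used here. :)
let $piped  := '  hello   origami ' => normalize-space() => upper-case()
let $nested := upper-case(normalize-space('  hello   origami '))
(: The left-hand side also becomes the first argument when more arguments follow. :)
let $mapped := ('a', 'b', 'c') => for-each(upper-case#1)
return (
  $piped eq $nested,  (: true() :)
  $piped,             (: 'HELLO ORIGAMI' :)
  $mapped             (: 'A', 'B', 'C' :)
)
```

The last binding has the same shape as the => for-each(o:rename('li')) step above: the sequence flows in from the left and the function item is the second argument.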

The previous example is a bit contrived and there is a much more succinct way of describing such transformations. We can describe the transformation rules in a map.

declare function x:ul-xf($list as element(list))
as element(ul)
{
    o:doc(
        $list, 
        map { 
            'list': o:rename('ul'),
            'item': o:rename('li')
        }
    ) => o:apply() => o:xml()
};

The map contains two entries or rules. The first handles list elements and the second handles all item elements. The rule entry values are functions that are returned by calling o:rename#1. It's not unlike an XSLT stylesheet with two match templates.

After the o:doc call that handles the transformation there is now an o:apply call. What does that do? The o:doc call transforms a document (Mu or XML) using the rules in the map provided as its second argument. This transformation will attach the o:rename handlers to each of the matching nodes but it will not evaluate them yet. This is so that we can store the result of the o:doc call and re-use it over and over as a template. To invoke the handlers and effect the actual renaming of nodes you need to call o:apply.
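The split between preparing and applying rests on the fact that functions are values in XQuery. The sketch below is a conceptual model only: local:rename is a hypothetical stand-in written for this illustration, and the map with 'node' and 'handler' keys is not how Origami stores handlers internally.

```xquery
xquery version "3.1";

(: Hypothetical stand-in for a rename handler factory: calling it returns
   a function that can be applied to a Mu element later. :)
declare function local:rename($name as xs:string)
as function(array(*)) as array(*)
{
  function($mu as array(*)) as array(*) {
    (: replace the first member (the element name), keep the rest :)
    array:join(([$name], array:tail($mu)))
  }
};

(: "Prepare" step: pair a node with a handler, but do not call the handler. :)
let $prepared := map {
  'node': ['item', 'A'],
  'handler': local:rename('li')
}
(: "Apply" step: only now is the handler invoked. :)
return $prepared?handler($prepared?node)  (: ['li', 'A'] :)
```

Because $prepared is plain data, it can be stored in a variable and applied as often as needed, which is the point of keeping the two steps apart.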

Using a map to define a transformation suffices for simple examples, but it does not allow real XPath. To define more complex transforms, use an array of rules, which may be nested. This builds and uses an XSLT transform in the background and lets you match nodes with real XPath expressions.

declare function x:ul-xslt($list as element(list))
as element(ul)
{
    o:doc(
        $list,
        ['list', o:rename('ul'),
            ['item', o:rename('li')]
        ]
    ) => o:apply() => o:xml()
};

This example is the same as the previous one but it defines rules using arrays. These can be nested to confine node matching to a specific area of a document.

As this type of transformation uses XSLT (1.0) in the background it is a lot slower than the former examples. However, this type of transform is intended to be used at compile time to prepare templates. The data structure returned (with handlers attached to the matched nodes) can then be used many times at run-time (of course this depends on the XQuery engine used and if it can re-use compiled expressions).

As this is using proper XPath we can do things that we cannot do with a map-based transformer.

declare function x:ul-xslt2($list as element(list))
{
    o:doc(
        $list,
        ['list', o:rename('ul'),
            ['item', o:rename('li')],
            ['item[1]', o:comp((o:rename('li'),o:insert('replaced')))]
        ]
    ) => o:apply() => o:xml()
};

The result of this function will be the following HTML list.

<ul>
  <li>replaced</li>
  <li>B</li>
  <li>C</li>
</ul>

The handler for the first item node is special as it uses o:comp to create a handler function by composing a rename followed by an insert. This is a more functional approach but is otherwise identical to using an anonymous function (or even a named function if you want to re-use it in different places).

declare function x:ul-xslt2($list as element(list))
{
    o:doc(
        $list,
        ['list', o:rename('ul'),
            ['item', o:rename('li')],
            ['item[1]', function($n) {
                $n
                => o:rename('li')
                => o:insert('replaced')
            }]
        ]
    ) => o:apply() => o:xml()
};

This achieves exactly the same result; the difference is mostly a matter of style.
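The idea behind composing handlers can be sketched in plain XQuery with fold-left. The local:comp function below is a hypothetical helper written for this illustration, not Origami's o:comp. It assumes that each function takes one argument and that the functions run from left to right, as in the rename-then-insert example above.

```xquery
xquery version "3.1";

(: Hypothetical composition helper, for illustration only. It returns a new
   function that feeds its argument through $fns from left to right. :)
declare function local:comp($fns as function(*)*)
as function(item()*) as item()*
{
  function($x as item()*) as item()* {
    fold-left($fns, $x, function($acc, $f) { $f($acc) })
  }
};

(: Compose two built-in functions and call the result. :)
let $shout := local:comp((normalize-space#1, upper-case#1))
return $shout('  hello   origami ')  (: 'HELLO ORIGAMI' :)
```

Whether you compose functions like this or write out an anonymous function with arrows, the value that ends up in the rule is a single function item.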

Document builders

When reading external templating files you often need to extract specific nodes from them or remove some. Origami provides document builders to deal with this.

An extraction is defined using rules which resemble the Mu data structure. You can pass these rules to the o:builder function and pass the created builder object to the o:doc function as a second argument or you can let the o:doc function create this builder implicitly for you.

Conceptually this process can be described as follows.

  • Initially the extraction process will start in removal mode.
  • A rule can switch the extraction to start copying nodes selected by an XPath expression.
  • Nested rules can choose whether to switch to removing or copying nodes.

This way you can declare powerful extraction processes that take from the input document only what you need. And, because these builder rules are just data structures, like Mu documents, you can generate, compose and re-use them.

Assume you have the following input document.

declare variable $html :=
  <html>
    <body>
      <p>This is a table</p>
      <table>
        <tr class="odd" x="foo">
          <th>hello <b>world</b>!</th>
          <th>foobar</th>
        </tr>
        <tr class="even" y="bar">
          <td>bla <b>bla</b></td>
          <td>foobar</td>
        </tr>
      </table>
    </body>
  </html>;

Extract only the table as-is.

o:xml(o:doc($html, ['table']))

Take the table but remove all attributes by using a nested rule inside the table rule. This nested rule matches on all attributes and snips them (using ()).

o:xml(o:doc($html, ['table', ['@*', ()]]))

Say we want to extract all individual cells but we also want to remove inline markup from these cells. I've added indentation to show the separate nested rules more clearly.

o:xml(
  o:apply(
    o:doc($html, 
      ['table',
        ['td|th', 
          ['*', o:unwrap()]
        ]
      ]
    )
  )
)

Here we get to the limit of what a normal extraction builder can do. We have to introduce a new concept (handlers) which is explained in more detail later.

Let's walk through this example. First, all table nodes are copied. The first nested rule then selects all cells. The next nested rule means: match all element nodes inside the cell. Each rule creates the context for its nested rules to operate in.

But when an element inside a cell is matched we cannot just snip it out of the tree, as this would remove the text nodes inside it as well. So we provide a node transformer as a handler, in this case o:unwrap, which removes the element surrounding the text.

We also had to add a new function o:apply. This function is needed to invoke the handlers in the Mu data structure. A limitation is that to unwrap further levels of nested inline markup we would have to apply even more tricks so I am not going to go there.

Note that currently Origami uses XPath/XSLT 1.0 to implement these builders. In a future version I intend to lift this restriction.

Node transformers

When dealing with Mu data structures you need tools to, let's say, slice and dice them. Origami contains a collection of functions to deal with Mu documents and document fragments. Together with some of the new operators and functions introduced in XPath they can be used to build concise transformations on Mu data structures.

TBD

Template builders

This style of templating is inspired by a Clojure library named Enlive. Like Enlive, Origami uses HTML files that are completely free of templating instructions.

Say you have an HTML file named groceries.html with the following contents.

<html>
  <head>
    <title>Template</title>
    <meta charset="UTF-8" />
    <link rel="stylesheet" type="text/css" href="base.css"></link>
  </head>
  <body>
    <ul class="groceries">
      <li>Item 1</li>
      <li>Item 2</li>
      <li>Item 3</li>
    </ul>
  </body>
</html>

We want to use this HTML as a template which involves:

  1. Changing the title
  2. Adding a CSS link in the header
  3. Displaying a list of groceries

Earlier I used a builder to extract nodes from a document. Here I need to do more than just extracting nodes. I need to inject data into nodes, add nodes and remove nodes. Like earlier, the same o:doc function is used to prepare the template. Then the o:apply function is used to inject data into this template data structure.

I do not want to add code or foreign templating markup to the template. I want to use the HTML as-is. I have agreed with the designer to use a lightweight convention for class and id attributes and some structural markup that will remain constant. But other than that the designer can update the HTML without telling me.

Let's address the first task: changing the title.

declare variable $template := 
  o:doc(
    o:read-html('groceries.html'),
    ['html', 
      ['head',
        ['title', function($node, $data) { 
          $node => o:insert($data?title) 
        }]
      ]
    ]
  );

let $data := 
  map {
    'title': 'Shopping List',
    'css': 'shopping-list.css',
    'items': ('Apples', 'Bananas', 'Pears')
  }

return 
  o:xml(
    o:apply($template, $data)
  )

The o:doc function loads the HTML (using the o:read-html function) and then uses the builder rules to bind handlers (functions) to selected nodes. It returns the resulting Mu data structure with the handlers bound to, in this case, the title element.

Handlers receive two arguments: the current element node and the data passed into the template. The handler's body takes the node and inserts the page title ($data?title).

The data used for this example is modelled as a map. It could just as easily be an XML node (and maybe it should be in a more realistic example). The map contains the title of the list, the name of an extra CSS file to load and a sequence of shopping items.

Finally the data is applied to the template. The result of this is a Mu data structure and to see it as XML we pass this result through the o:xml function.

Great, on to the next task: adding a CSS link in the header. I will only show the code for the current task. At the end I will show the full example and the final result.

declare variable $template := 
  o:doc(
    o:read-html('groceries.html'),
    ['html', 
      ['head',
        ['link[@rel="stylesheet"][last()]', function($node, $data) {
          let $link := $node => o:set-attr(map { 'href': $data?css })
          return
            $node => o:after($link)
        }]
      ]
    ]
  );

The link handler matches the last CSS link element and uses it to build a new node ($link) by setting the href attribute to the value of $data?css. Then this new link node is inserted after the current $node element.

Just one more thing to do: Displaying a list of groceries.

declare variable $template := 
  o:doc(
    o:read-html('groceries.html'),
    ['html', 
      ['ul[@class="groceries"]',
        ['li[1]', function($node, $data) {
          for $item in $data?items
          return
            $node => o:insert($item)
        }],
        ['li', ()]
      ]
    ]
  );

First find the list node (ul[@class="groceries"]) and in the list replicate the first list item for each item in the shopping list ($data?items). A second rule is necessary to get rid of the two other list items that are in the template.

We can now pass different data into the template and get a fully styled HTML page back.

This is the complete XQuery code for this example.

declare variable $template := 
  o:doc(
    o:read-html('groceries.html'),
    ['html', 
      ['head',
        ['title', function($node, $data) { 
          $node => o:insert($data?title) 
        }],
        ['link[@rel="stylesheet"][last()]', function($node, $data) {
          let $link := $node => o:set-attr(map { 'href': $data?css })
          return
            $node => o:after($link)
        }]
      ],
      ['ul[@class="groceries"]',
        ['li[1]', function($node, $data) {
          for $item in $data?items
          return
            $node => o:insert($item)
        }],
        ['li', ()]
      ]
    ]
  );

let $data := 
  map {
    'title': 'Shopping List',
    'css': 'shopping-list.css',
    'items': ('Apples', 'Bananas', 'Pears')
  }

return 
  o:xml(
    o:apply($template, $data)
  )

Running this code will give you the following.

<html>
  <head>
    <title>Shopping List</title>
    <meta charset="UTF-8"/>
    <link href="base.css" rel="stylesheet" type="text/css"/>
    <link href="shopping-list.css" rel="stylesheet" type="text/css"/>
  </head>
  <body>
    <ul class="groceries">
      <li>Apples</li>
      <li>Bananas</li>
      <li>Pears</li>
    </ul>
  </body>
</html>