Switch branches/tags
Nothing to show
Find file
Fetching contributors…
Cannot retrieve contributors at this time
445 lines (307 sloc) 13.5 KB
This is the documentation for libsyck and describes how to extend it.
= Overview =
Syck is designed to take a YAML stream and a symbol table and move
data between the two. Your job is to simply provide callback functions which
understand the symbol table you are keeping.
Syck also includes a simple symbol table implementation.
== About the Source ==
The Syck distribution is laid out as follows:
lib/ libsyck source (core API) lexer for YAML bytecode (re2c)
emitter.c emitter functions
gram.y grammar for YAML documents (bison)
handler.c internal handlers which glue the lexer and grammar lexer for builtin YAML types (re2c)
node.c node allocation and access
syck.c parser funcs, central funcs
syck.h libsyck definitions
syck_st.c symbol table functions
syck_st.h symbol table definitions lexer for YAML plaintext (re2c)
yaml2byte.c simple bytecode emitter
ext/ ruby, python, php, cocoa extensions
tests/ unit tests for libsyck
YTS.c.rb generates YAML Testing Suite unit test
(use: ruby YTS.c.rb > YTS.c)
Basic.c allocation and buffering tests
Parse.c parser sanity
Emit.c emitter sanity
== Using SyckNodes ==
The SyckNode is the structure which YAML data is loaded into
while parsing. It's also a good structure to use while emitting,
however you may choose to emit directly from your native types
if your extension is very small.
SyckNodes are designed to be used in conjunction with a symbol
table. More on that in a moment. For now, think of a symbol
table as a library which stores nodes, assigning each node a
unique identifier.
This identifier is called the SYMID in Syck. Nodes refer to
each other by SYMIDs, rather than pointers. This way, the
nodes can be free'd as the parser goes.
To be honest, SYMIDs are used because this is the way Ruby
works. And this technique means Syck can use Ruby's symbol
table directly. But the included symbol table is lightweight,
solves the problem of keeping too much data in memory, and
simply pairs SYMIDs with your native object type (such as
PyObject pointers.)
Three kinds of SyckNodes are available:
1. scalar nodes (syck_str_kind):
These nodes store a string, a length for the string
and a style (indicating the format used in the YAML
2. sequence nodes (syck_seq_kind):
Sequences are YAML's array or list type.
These nodes store a list of items, which allocation
is handled by syck functions.
3. mapping nodes (syck_map_kind):
Mappings are YAML's dictionary or hashtable type.
These nodes store a list of pairs, which allocation
is handled by syck functions.
The syck_kind_tag enum specifies the above enumerations,
which can be tested against the SyckNode.kind field.
PLEASE leave the SyckNode.shortcut field alone!! It's
used by the parser to workaround parser ambiguities!!
=== Node API ===
SyckNode *
Allocates a node of a given type and initializes its
internal union to emptiness. When left as-is, these
nodes operate as a valid empty string, empty sequence
and empty map.
Remember that the node's id (SYMID) isn't set by the
allocation functions OR any other node functions herein.
It's up to your handler function to do that.
syck_free_node( SyckNode *n )
While the Syck parser will free nodes it creates, use
this to free your own nodes. This function will free
all of its internals, its type_id and its anchor. If
you don't need those members free, please be sure they
are set to NULL.
SyckNode *
syck_new_str( char *str, enum scalar_style style )
syck_new_str2( char *str, long len, enum scalar_style style )
Creates scalar nodes from C strings. The first function
will call strlen() to determine length.
syck_replace_str( SyckNode *n, char *str, enum scalar_style style )
syck_replace_str2( SyckNode *n, char *str, long len, enum scalar_style style )
Replaces the string content of a node `n', while keeping
the node's type_id, anchor and id.
char *
syck_str_read( SyckNode *n )
Returns a pointer to the null-terminated string inside scalar node
`n'. Normally, you might just want to use:
char *ptr = n->data.str->ptr
long len = n->data.str->len
SyckNode *
syck_new_map( SYMID key, SYMID value )
Allocates a new map with an initial pair of nodes.
syck_map_empty( SyckNode *n )
Empties the set of pairs for a mapping node.
syck_map_add( SyckNode *n, SYMID key, SYMID value )
Pushes a key-value pair on the mapping. While the ordering
of pairs DOES affect the ordering of pairs on output, loaded
nodes are deliberately out of order (since YAML mappings do
not preserve ordering.)
See YAML's builtin !omap type for ordering in mapping nodes.
syck_map_read( SyckNode *n, enum map_part, long index )
Loads a specific key or value from position `index' within
a mapping node. Great for iteration:
for ( i = 0; i < syck_map_count( n ); i++ ) {
SYMID key = sym_map_read( n, map_key, i );
SYMID val = sym_map_read( n, map_value, i );
syck_map_assign( SyckNode *n, enum map_part, long index, SYMID id )
Replaces a specific key or value at position `index' within
a mapping node. Useful for replacement only, will not allocate
more room when assigned beyond the end of the pair list.
syck_map_count( SyckNode *n )
Returns a count of the pairs contained by the mapping node.
syck_map_update( SyckNode *n, SyckNode *n2 )
Combines all pairs from mapping node `n2' into mapping node
SyckNode *
syck_new_seq( SYMID val )
Allocates a new seq with an entry `val'.
syck_seq_empty( SyckNode *n )
Empties a sequence node `n'.
syck_seq_add( SyckNode *n, SYMID val )
Pushes a new item `val' onto the end of the sequence.
syck_seq_assign( SyckNode *n, long index, SYMID val )
Replaces the item at position `index' in the sequence
node with item `val'. Useful for replacement only, will not allocate
more room when assigned beyond the end of the pair list.
syck_seq_read( SyckNode *n, long index )
Reads the item at position `index' in the sequence node.
Again, for iteration:
for ( i = 0; i < syck_seq_count( n ); i++ ) {
SYMID val = sym_seq_read( n, i );
syck_seq_count( SyckNode *n )
Returns a count of items contained by sequence node `n'.
== YAML Parser ==
Syck's YAML parser is extremely simple. After setting up a
SyckParser struct, along with callback functions for loading
node data, use syck_parse() to start reading data. Since
syck_parse() only reads single documents, the stream can be
managed by calling syck_parse() repeatedly for an IO source.
The parser has four callbacks: one for reading from the IO
source, one for handling errors that show up, one for
handling nodes as they come in, one for handling bad
anchors in the document. Nodes are loaded in the order they
appear in the YAML document, however nested nodes are loaded
before their parent.
=== How to Write a Node Handler ===
Inside the node handler, the normal process should be:
1. Convert the SyckNode data to a structure meaningful
to your application.
2. Check for the bad anchor caveat described in the
next section.
3. Add the new structure to the symbol table attached
to the parser. Found at parser->syms.
4. Return the SYMID reserved in the symbol table.
=== Nodes and Memory Allocation ===
One thing about SyckNodes passed into your handler:
Syck WILL free the node once your handler is done with it.
The node is temporary. So, if you plan on keeping a node
around, you'll need to make yourself a new copy.
And you'll probably need to reassign all the items
in a sequence and pairs in a map. You can do this
with syck_seq_assign() and syck_map_assign(). But, before
you do that, you might consider using your own node structure
that fits your application better.
=== A Note About Anchors in Parsing ===
YAML anchors can be recursive. This means deeper alias nodes
can be loaded before the anchor. This is the trickiest part
of the loading process.
Assuming this YAML document:
--- &a [*a]
The loading process is:
1. Load alias *a by calling parser->bad_anchor_handler, which
reserves a SYMID in the symbol table.
2. The `a' anchor is added to Syck's own anchor table,
referencing the SYMID above.
3. When the anchor &a is found, the SyckNode created is
given the SYMID of the bad anchor node above. (Usually
nodes created at this stage have the `id' blank.)
4. The parser->handler function is called with that node.
Check for node->id in the handler and overwrite the
bad anchor node with the new node.
=== Parser API ===
See <syck.h> for layouts of SyckParser and SyckNode.
SyckParser *
Creates a new Syck parser.
syck_free_parser( SyckParser *p )
Frees the parser, as well as associated symbol tables
and buffers.
syck_parser_implicit_typing( SyckParser *p, int on )
Toggles implicit typing of builtin YAML types. If
this is passed a zero, YAML builtin types will be
ignored (!int, !float, etc.) The default is 1.
syck_parser_taguri_expansion( SyckParser *p, int on )
Toggles expansion of types in full taguri. This
defaults to 1 and is recommended to stay as 1.
Turning this off removes a layer of abstraction
that will cause incompatibilities between YAML
documents of differing versions.
syck_parser_handler( SyckParser *p, SyckNodeHandler h )
Assign a callback function as a node handler. The
SyckNodeHandler signature looks like this:
SYMID node_handler( SyckParser *p, SyckNode *n )
syck_parser_error_handler( SyckParser *p, SyckErrorHandler h )
Assign a callback function as an error handler. The
SyckErrorHandler signature looks like this:
void error_handler( SyckParser *p, char *str )
syck_parser_bad_anchor_handler( SyckParser *p, SyckBadAnchorHandler h )
Assign a callback function as a bad anchor handler.
The SyckBadAnchorHandler signature looks like this:
SyckNode *bad_anchor_handler( SyckParser *p, char *anchor )
syck_parser_file( SyckParser *p, FILE *f, SyckIoFileRead r )
Assigns a FILE pointer as an IO source and a callback function
which handles buffering of that IO source.
The SyckIoFileRead signature looks like this:
long SyckIoFileRead( char *buf, SyckIoFile *file, long max_size, long skip );
Syck comes with a default FILE handler named `syck_io_file_read'. You
can assign this default handler explicitly or by simply passing in NULL
as the `r' parameter.
syck_parser_str( SyckParser *p, char *ptr, long len, SyckIoStrRead r )
Assigns a string as the IO source with a callback function `r'
which handles buffering of the string.
The SyckIoStrRead signature looks like this:
long SyckIoFileRead( char *buf, SyckIoStr *str, long max_size, long skip );
Syck comes with a default string handler named `syck_io_str_read'. You
can assign this default handler explicitly or by simply passing in NULL
as the `r' parameter.
syck_parser_str_auto( SyckParser *p, char *ptr, SyckIoStrRead r )
Same as the above, but uses strlen() to determine string size.
syck_parse( SyckParser *p )
Parses a single document from the YAML stream, returning the SYMID for
the root node.
== YAML Emitter ==
Since the YAML 0.50 release, Syck has featured a new emitter API. The idea
here is to let Syck figure out shortcuts that will clean up output, detect
builtin YAML types and -- especially -- determine the best way to format
outgoing strings.
The trick with the emitter is to learn its functions and let it do its
job. If you don't like the formatting Syck is producing, please get in
contact the author and pitch your ideas!!
Like the YAML parser, the emitter has a couple of callbacks: namely,
one for IO output and one for handling nodes. Nodes aren't necessarily
SyckNodes. Since we're ultimately worried about creating a string, SyckNodes
become sort of unnecessary.
=== The Emitter Process ===
1. Traverse the structure you will be emitting, registering all nodes
with the emitter using syck_emitter_mark_node(). This step will
determine anchors and aliases in advance.
2. Call syck_emit() to begin emitting the root node.
3. Within your emitter handler, use the syck_emit_* convenience methods
to build the document.
4. Call syck_emit_flush() to end the document and push the remaining
document to the IO stream. Or continue to add documents to the output
stream with syck_emit().
=== Emitter API ===
See <syck.h> for the layout of SyckEmitter.
SyckEmitter *
Creates a new Syck emitter.
syck_emitter_mark_node( SyckEmitter *e, st_data_t node )
Adds an outgoing node to the symbol table, allocating an anchor
for it if it has repeated in the document and scanning the type
tag for auto-shortcut.
syck_output_handler( SyckEmitter *e, SyckOutputHandler out )
Assigns a callback as the output handler.
void *out_handler( SyckEmitter *e, char * ptr, long len );
Receives the emitter object, pointer to the buffer and a count
of bytes which should be read from the buffer.
syck_emitter_handler( SyckEmitter *e, SyckEmitterHandler