JSONSL

JSON Stateful (or Simple, or Stacked) Lexer

Why another (and yet another) JSON lexer?

I took inspiration from some of the uses of YAJL, which looked quite nice, but whose build system seemed unusable, source horribly mangled, and grown beyond its original design. In other words, I saw it as a bunch of cruft.

Instead of spending a few days figuring out how to use it, I concluded that what I needed (simple token notifications coupled with some kind of state-shift detection) could be done with a simple, small, embeddable ANSI C source file.

I am still not sure if YAJL provides the featureset of JSONSL, but I'm guessing I've got at least some innovation.

JSONSL

Inspiration was also taken from Joyent's http-parser project, which seems to use a similar, embeddable, and simple model.

Here's a quick featureset

Stateful

Maintains state about the current descent/recursion/nesting level. Furthermore, you can access information about 'lower' stacks as long as they are active.

Decoupling Object Graph from Data

JSONSL separates the object graph from the (usually more CPU-intensive) work of populating higher-level structures such as "hashes" and "arrays" with "decoded" and "meaningful" values. Using this, one can implement an on-demand type of conversion.

Callback oriented, selectively

Invokes callbacks for all sorts of events, but you can control which kinds of events you are interested in receiving without writing a ton of wrapper stubs.

Non-Buffering

This doesn't buffer, copy, or allocate any data. The only allocation overhead is during the initialization of the parser, when the initial stack structures are set up.

Simple

Just a C source file, and a corresponding header file. ANSI C.
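The "callback oriented, selectively" idea from the list above can be pictured with a small standalone sketch. Nothing here is taken from jsonsl.h: the sketch_* names, the enable table, and the callback signature are all hypothetical and exist only to illustrate how one callback plus per-type switches avoids a pile of wrapper stubs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event types, loosely mirroring the T_* names used later on. */
enum sketch_type {
    SKETCH_T_OBJECT,
    SKETCH_T_LIST,
    SKETCH_T_HKEY,
    SKETCH_T_STRING,
    SKETCH_T_SPECIAL,
    SKETCH_T_COUNT
};

/* Hypothetical callback signature; not the one declared in jsonsl.h. */
typedef void (*sketch_event_cb)(enum sketch_type type, size_t pos, void *ctx);

struct sketch_dispatcher {
    sketch_event_cb cb;                     /* single user callback */
    void *ctx;                              /* passed through untouched */
    unsigned char enabled[SKETCH_T_COUNT];  /* nonzero: deliver this type */
};

/* Deliver an event only if its type has been switched on. */
void sketch_emit(const struct sketch_dispatcher *d,
                 enum sketch_type type, size_t pos)
{
    assert(d != NULL);
    assert((int)type >= 0 && (int)type < SKETCH_T_COUNT);
    if (d->cb != NULL && d->enabled[type]) {
        d->cb(type, pos, d->ctx);
    }
}

/* Example callback: counts the events it receives (ctx is an unsigned*). */
void sketch_count_event(enum sketch_type type, size_t pos, void *ctx)
{
    (void)type;
    (void)pos;
    ++*(unsigned *)ctx;
}
```

With only enabled[SKETCH_T_HKEY] set, a lexer calling sketch_emit for every token would reach the callback for hash keys alone; everything else is dropped before any user code runs.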

The rest of this documentation needs work

Details

Terminology

Because the JSON spec is quite confusing in its terminology, especially when we want to map it to a different model, here is the terminology used in this document.

I will use element, object, and state interchangeably. They all refer to some form of atomic unit as far as JSON is concerned.

I will use the term hash for those things which look like {"foo":"bar"}, and refer to its contents as keys and values.

I will use the term list for those things which look like ["hello", "byebye"], and refer to their contents as list elements or array elements explicitly.

Model

States

A state represents a JSON element. This can be a hash (T_OBJECT), array (T_LIST), hash key (T_HKEY), string (T_STRING), or a 'special' value (T_SPECIAL), which should be either a numeric value or one of true, false, null.

A state comprises and maintains the following information

Type

This merely states what type it is, as one of the JSONSL_T_* constants (the T_* names mentioned above, with the JSONSL_ prefix).

Positioning

This contains positioning information mapping the location of the element as an offset relative to the input stream. When a state begins, its start position is set. Whenever control returns to the state, its current position is updated and set to the point in the stream at which the return occurred.

Extended Information

For non-scalar state types, information regarding the number of children contained is stored.

User Data

This is a simple void* pointer, and allows you to associate your own data with a given state
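To make the four pieces of state information concrete, here is a minimal sketch of a state record and of how the positioning fields allow on-demand decoding. The struct layout, the sketch_* names, and the choice to treat the current position as the offset of the element's last byte are assumptions made for this illustration; jsonsl.h is the authority on the real fields and their exact meaning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum sketch_type {
    SKETCH_T_OBJECT,
    SKETCH_T_LIST,
    SKETCH_T_HKEY,
    SKETCH_T_STRING,
    SKETCH_T_SPECIAL
};

/* Hypothetical state record; not the layout used by jsonsl.h. */
struct sketch_state {
    enum sketch_type type;  /* Type */
    size_t pos_begin;       /* Positioning: offset of the first byte */
    size_t pos_cur;         /* Positioning: offset of the last byte (assumed) */
    size_t nelem;           /* Extended Information: number of children */
    void *data;             /* User Data */
};

/*
 * Decode a numeric 'special' value only when the caller asks for it.
 * Returns 0 and stores the number in *out, or -1 if the state is not a
 * number. Note that strtod accepts more than JSON does (hex, "inf").
 */
int sketch_special_as_double(const char *buf, const struct sketch_state *st,
                             double *out)
{
    char tmp[64];
    char *end;
    size_t n;

    assert(buf != NULL && st != NULL && out != NULL);
    if (st->type != SKETCH_T_SPECIAL || st->pos_cur < st->pos_begin) {
        return -1;
    }
    n = st->pos_cur - st->pos_begin + 1;
    if (n >= sizeof tmp) {
        return -1;
    }
    /* strtod needs a NUL-terminated string, so copy the short slice. */
    memcpy(tmp, buf + st->pos_begin, n);
    tmp[n] = '\0';
    *out = strtod(tmp, &end);
    return (end == tmp + n) ? 0 : -1;
}
```

The lexer itself would never call such a function; the application decides, per element, whether the cost of conversion is worth paying.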

Stack

A stack consists of multiple states. When a state begins, it is pushed to the stack, and when the state terminates, it is popped from the stack and returns control to the previous stack state.

When a state is popped, the contained information regarding positioning and children is complete, and it is therefore possible to retrieve the entire element from the byte stream.

Once a state has been popped, it is considered invalid (though it is still valid during the callback).
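The push/pop behavior described above can be sketched in a few dozen lines. This is not how jsonsl.c is implemented: it is a toy scanner that tracks only hashes and lists, uses a fixed-size stack, and records the offset of the closing bracket as the final position. All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_DEPTH 32
#define SKETCH_LOG_MAX 8

/* Hypothetical state record; not the layout used by jsonsl.h. */
struct sketch_state {
    char type;         /* '{' for a hash, '[' for a list */
    size_t pos_begin;  /* offset of the opening bracket */
    size_t pos_cur;    /* last offset at which this state had control */
    unsigned level;    /* nesting depth; the outermost container is 1 */
};

typedef void (*sketch_pop_cb)(const struct sketch_state *st, void *ctx);

struct sketch_log {
    size_t count;
    struct sketch_state popped[SKETCH_LOG_MAX];
};

/* Example callback: copies the state, since it is invalid after the pop. */
void sketch_log_pop(const struct sketch_state *st, void *ctx)
{
    struct sketch_log *rec = ctx;
    if (rec->count < SKETCH_LOG_MAX) {
        rec->popped[rec->count] = *st;
    }
    rec->count++;
}

/* Returns 0 on success, -1 on excessive nesting or unbalanced input. */
int sketch_scan(const char *buf, size_t len, sketch_pop_cb on_pop, void *ctx)
{
    struct sketch_state stack[SKETCH_MAX_DEPTH];
    size_t depth = 0;
    size_t i;
    int in_string = 0;

    assert(buf != NULL);
    for (i = 0; i < len; i++) {
        char c = buf[i];
        if (in_string) {
            if (c == '\\') {
                i++;                      /* skip the escaped character */
            } else if (c == '"') {
                in_string = 0;
            }
            continue;
        }
        if (c == '"') {
            in_string = 1;
        } else if (c == '{' || c == '[') {
            if (depth == SKETCH_MAX_DEPTH) {
                return -1;
            }
            stack[depth].type = c;        /* push a new state */
            stack[depth].pos_begin = i;
            stack[depth].pos_cur = i;
            stack[depth].level = (unsigned)(depth + 1);
            depth++;
        } else if (c == '}' || c == ']') {
            char open = (c == '}') ? '{' : '[';
            if (depth == 0 || stack[depth - 1].type != open) {
                return -1;
            }
            depth--;                      /* pop: positions are now complete */
            stack[depth].pos_cur = i;
            if (on_pop != NULL) {
                on_pop(&stack[depth], ctx);
            }
            if (depth > 0) {
                stack[depth - 1].pos_cur = i;  /* control returns to parent */
            }
        }
    }
    return (depth == 0 && !in_string) ? 0 : -1;
}
```

Inside the callback, the bytes from pos_begin through pos_cur of the input are exactly the popped hash or list, so the element can be sliced out of the caller's buffer without the scanner copying anything.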

Below is a diagram of a sample JSON stream annotated with stack/state information.

 Level 0
    {

    Level 1

        Level 2
            "ABC"
        :
        Level 2
            "XYZ"
        ,

    Level 1

        [
        Level 2

            {
            Level 3

                Level 4
                "Foo":"Bar"

            Level 3
            }
        Level 2
        ]
    Level 1
    }

USING

The header file jsonsl.h contains the API. Read it.

As an additional note, you can 'extend' the state structure (thereby eliminating the need to allocate extra pointers for the void *data field) by defining the JSONSL_STATE_USER_FIELDS macro to expand to additional struct fields.

This is assumed as the default behavior - and should work when you compile your project with jsonsl.c directly.

If you wish to use the 'generic' mode, make sure to #define or -D the JSONSL_STATE_GENERIC macro.
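The general technique behind a user-fields macro looks roughly like the sketch below. It uses a stand-in struct and a stand-in macro name (SKETCH_STATE_USER_FIELDS) so that it compiles on its own; the real structure and the exact point where JSONSL_STATE_USER_FIELDS is expanded are defined in jsonsl.h, and the key_count and label fields are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical application-specific fields. In a real build this would
 * be supplied before the header is included, or with -D on the compiler
 * command line.
 */
#ifndef SKETCH_STATE_USER_FIELDS
#define SKETCH_STATE_USER_FIELDS \
    int key_count;               \
    const char *label;
#endif

/* Stand-in for the library's state structure; not the real layout. */
struct sketch_state {
    int type;
    size_t pos_begin;
    size_t pos_cur;
    void *data;               /* generic pointer, still available */
    SKETCH_STATE_USER_FIELDS  /* expands to the application's fields */
};

/* Zero the whole state, user fields included, then set the basics. */
void sketch_state_init(struct sketch_state *st, int type, size_t pos)
{
    assert(st != NULL);
    memset(st, 0, sizeof *st);
    st->type = type;
    st->pos_begin = pos;
    st->pos_cur = pos;
}
```

Because the extra fields live inside each state, a callback can write st->key_count directly instead of allocating a separate object and hanging it off the data pointer.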

UNICODE

While JSONSL does not support Unicode directly (it does not decode \uXXXX escapes, nor does it care about any non-ASCII characters), you can compile JSONSL with the JSONSL_USE_WCHAR macro defined. This makes jsonsl iterate over wchar_t characters instead of the good ol' char. Of course, you would still need to process the stream correctly to make sure the multibyte stream is complete.
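One way to make sure a multibyte stream is complete before converting a chunk to wchar_t is to hold back any partial sequence at the end of the chunk. The sketch below assumes the input is UTF-8, which this document does not state, and the function name is hypothetical. It only finds the cut point; validating and converting the bytes is left to whatever decoder the application uses.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the length of the longest prefix of buf that does not end in
 * the middle of a UTF-8 sequence. Bytes past that length should be kept
 * and prepended to the next chunk. Malformed input is passed through
 * unchanged so that the real decoder can report it.
 */
size_t sketch_utf8_complete_len(const unsigned char *buf, size_t len)
{
    size_t i = len;
    size_t back = 0;
    size_t need;
    unsigned char lead;

    assert(buf != NULL || len == 0);

    /* Walk back over at most three continuation bytes (10xxxxxx). */
    while (i > 0 && back < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0) {
        return len;               /* no lead byte in range */
    }

    lead = buf[i - 1];
    if (lead < 0x80) {
        need = 1;                 /* ASCII */
    } else if ((lead & 0xE0) == 0xC0) {
        need = 2;
    } else if ((lead & 0xF0) == 0xE0) {
        need = 3;
    } else if ((lead & 0xF8) == 0xF0) {
        need = 4;
    } else {
        return len;               /* not a valid lead byte */
    }

    /* back + 1 bytes are present; cut before the lead byte if too few. */
    return (back + 1 < need) ? i - 1 : len;
}
```

For a chunk ending in the first two bytes of a three-byte character, the function returns the offset of that character's lead byte, so only the complete part is converted and fed to the lexer.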

AUTHOR AND COPYRIGHT

Copyright (C) 2012 M. Nunberg.

See LICENSE for license information.
