Corpus-tagged Text Converter

A PHP library for converting files tagged with corpus metadata to JSON, PHP, or XML.

Corpus linguistics researchers use a markup-like syntax to attach metadata to texts. Before applications can consume those texts, the syntax needs to be converted into a more universal, machine-readable format. JSON was the format chosen for this; the library can also produce PHP and XML output.

Basic Usage

The included /demo/index.php file contains a conversion form demonstration.

Make your code aware of the TagConverter class via your preferred method (e.g., use or require).

Then pass a string of text into the class:

$text = TagConverter::json('<MyTag: 123>My tagged text here');
echo $text;
// Returns {"MyTag":"123","text":"My tagged text here"}

$text = TagConverter::php('<MyTag: 123>My tagged text here');
echo $text;
// Returns array('MyTag' => '123', 'text' => 'My tagged text here')

$text = TagConverter::xml('<MyTag: 123>My tagged text here');
echo $text;
// Returns <?xml version="1.0"?><root><MyTag>123</MyTag><text>My tagged text here</text></root>
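
Once converted, the JSON output can be consumed with PHP's built-in json_decode. The following is a minimal sketch that assumes the output string has exactly the shape documented above; the literal assigned to $json stands in for a real TagConverter::json() call so that the snippet runs on its own.

```php
<?php
// Assumption: $json holds the string documented above as the result of
// TagConverter::json('<MyTag: 123>My tagged text here').
$json = '{"MyTag":"123","text":"My tagged text here"}';

// Decode into an associative array; JSON_THROW_ON_ERROR makes malformed
// input raise a JsonException instead of silently returning null.
$data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

echo $data['MyTag'], PHP_EOL; // 123
echo $data['text'], PHP_EOL;  // My tagged text here
```

Tag values arrive as strings ("123", not 123), so cast them explicitly if the application needs numbers.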

Expected input format

The corpus-style tagging syntax expected by the library is defined as follows:

  1. Tags must be wrapped in < and >
  2. Tag names and tag values may only contain alphanumeric characters, spaces, underscores, and hyphens
  3. Tag names must be separated from tag values by a :
  4. Spaces at the beginning or end of tag names or tag values are ignored; spaces within tag names and tag values are preserved
  5. Everything not wrapped in < and > is considered "text"

Status  Tag Example               Explanation
Good    <MyTag:SomeText>
Good    <My Tag:Some Text>        Spaces in tag names & values OK
Good    < My Tag : Some Text >    Spaces padding tag names & values OK
Good    < My-Tag : Some_Text >    Underscores & hyphens OK
Bad     < My/Tag : Some:Text >    Other characters not OK
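
The rules above can be illustrated with a short regular-expression sketch. This is not the library's implementation: parseCorpusTags is a hypothetical helper written only to make the rules concrete, and its handling of malformed tags (leaving them in the text) is an assumption, since the behaviour for "Bad" input is not specified here.

```php
<?php
// Hypothetical helper, NOT part of the library: a sketch of the five
// input-format rules using a single regular expression.
function parseCorpusTags(string $input): array
{
    // Rules 1-3: "<name:value>", where name and value contain only
    // alphanumerics, spaces, underscores, and hyphens.
    $pattern = '/<([A-Za-z0-9 _-]+):([A-Za-z0-9 _-]+)>/';

    $result = [];
    preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        // Rule 4: padding is ignored, inner spaces are preserved.
        $name  = trim($match[1]);
        $value = trim($match[2]);
        if ($name === '' || $value === '') {
            continue; // assumption: skip tags that are empty after trimming
        }
        $result[$name] = $value;
    }

    // Rule 5: whatever is left after removing well-formed tags is "text".
    // Assumption: malformed tags are not stripped and stay in the text.
    $result['text'] = trim((string) preg_replace($pattern, '', $input));

    return $result;
}

echo json_encode(parseCorpusTags('<MyTag: 123>My tagged text here')), PHP_EOL;
// {"MyTag":"123","text":"My tagged text here"}
```

With this sketch, < My-Tag : Some_Text > yields the name My-Tag and the value Some_Text, while < My/Tag : Some:Text > matches nothing and is left untouched in the text.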


Unit tests can be run (after composer install) by executing vendor/bin/phpunit.
