A PHP library for converting files tagged with corpus metadata to JSON, PHP, or XML.
Corpus linguistics researchers use a markup-like syntax to attach metadata to texts. Before applications can consume this metadata, the syntax must be converted into a more universal, machine-readable format. JSON was chosen as the primary format, and PHP and XML output are also supported.
The included `/demo/index.php` file contains a demonstration conversion form.
Make your code aware of the `TagConverter` class using your preferred method (e.g., `use` or `require`).
Then pass a string of text to one of the class's static methods:

```php
$text = TagConverter::json('<MyTag: 123>My tagged text here');
echo $text;
// Returns {"MyTag":"123","text":"My tagged text here"}

$text = TagConverter::php('<MyTag: 123>My tagged text here');
print_r($text); // print_r also prints arrays, which echo cannot
// Returns array('MyTag' => '123', 'text' => 'My tagged text here')

$text = TagConverter::xml('<MyTag: 123>My tagged text here');
echo $text;
// Returns <?xml version="1.0"?><root><MyTag>123</MyTag><text>My tagged text here</text></root>
```
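An application will typically decode the JSON string before using it. The minimal sketch below applies PHP's built-in `json_decode` to the JSON output documented above. That output is hard-coded as a string literal, which is an assumption made so the snippet runs without the library installed.

```php
<?php
// JSON string as documented for TagConverter::json('<MyTag: 123>My tagged text here')
$json = '{"MyTag":"123","text":"My tagged text here"}';

// true => decode JSON objects into associative arrays;
// JSON_THROW_ON_ERROR (PHP 7.3+) raises a JsonException on malformed input
$data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

echo $data['MyTag'], PHP_EOL; // 123
echo $data['text'], PHP_EOL;  // My tagged text here
```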
The corpus-style tagging syntax expected by the library is defined as follows:

- Tags must be wrapped in `<` and `>`.
- Tag names and tag values may contain only alphanumeric characters, spaces, underscores, and hyphens (multiple values may additionally be separated by `|` or `;`, as shown in the table below).
- Tag names must be separated from tag values by a `:`.
- Spaces at the beginning and end of tag names or tag values are ignored; spaces within tag values are preserved.
- Everything not wrapped in `<` and `>` is considered "text".
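As a rough illustration of these rules, the sketch below checks a single tag with one regular expression. `isValidCorpusTag` is a hypothetical helper and is not part of `TagConverter`. It assumes that tag names and values must be non-empty and that `|` and `;` are allowed in values as multi-value separators. The library's own validation may differ.

```php
<?php
/**
 * Hypothetical helper (not part of TagConverter): returns true if $tag is a
 * single tag such as "< My Tag : Some Text >" that follows the syntax rules.
 */
function isValidCorpusTag(string $tag): bool
{
    // "< *"   optional padding spaces after "<"
    // name:   starts with a letter, digit, "_" or "-"; may then contain spaces
    // " *: *" the ":" separator, optionally padded with spaces
    // value:  same characters as the name, plus "|" and ";" (assumed separators)
    // " *>"   optional padding before ">"; \z anchors the very end of the string
    $pattern = '/^< *[A-Za-z0-9_-][A-Za-z0-9_ -]* *: *[A-Za-z0-9_-][A-Za-z0-9_ |;-]* *>\z/';
    return preg_match($pattern, $tag) === 1;
}

var_dump(isValidCorpusTag('< My Tag : Some Text >')); // bool(true)
var_dump(isValidCorpusTag('< My/Tag : Some:Text >')); // bool(false)
```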

Status | Tag Example | Explanation |
---|---|---|
Good | `<MyTag:SomeText>` | |
Good | `<My Tag:Some Text>` | Spaces in tag names & values OK |
Good | `< My Tag : Some Text >` | Spaces padding tag names & values OK |
Good | `< My-Tag : Some_Text >` | Underscores & hyphens OK |
Good | `< My-Tag : Value 1 \| Value 2 >` | Pipe separators for multiple values |
Good | `< My-Tag : Value 1 ; Value 2 >` | Semicolon separators for multiple values |
Bad | `< My/Tag : Some:Text >` | Other characters not OK |
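The sketch below shows one way to split a whole string into tags and text. It returns the same array shape shown in the `TagConverter::php` example. `parseCorpusTags` is hypothetical and is not the library's implementation. In particular, it keeps malformed tags (such as the Bad example in the table) in the text rather than rejecting them, and it leaves multi-value strings unsplit.

```php
<?php
/**
 * Hypothetical sketch (not TagConverter's actual implementation): extracts
 * tag name/value pairs and puts everything else under the "text" key.
 */
function parseCorpusTags(string $input): array
{
    // Padding spaces sit outside the capture groups (the lazy *? stops before
    // them), so they are dropped while spaces inside names and values are kept.
    $tagPattern = '/< *([A-Za-z0-9_-][A-Za-z0-9_ -]*?) *: *([A-Za-z0-9_-][A-Za-z0-9_ |;-]*?) *>/';

    $result = [];
    preg_match_all($tagPattern, $input, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        $result[$match[1]] = $match[2]; // a repeated tag name overwrites the earlier value
    }

    // Remove well-formed tags; whatever remains (including malformed tags) is text.
    $result['text'] = preg_replace($tagPattern, '', $input);
    return $result;
}

$parsed = parseCorpusTags('<MyTag: 123>My tagged text here');
// ['MyTag' => '123', 'text' => 'My tagged text here']

$multi = parseCorpusTags('< My-Tag : Value 1 | Value 2 >Some text');
// ['My-Tag' => 'Value 1 | Value 2', 'text' => 'Some text']
```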
Unit tests can be run (after running `composer install`) by executing `vendor/bin/phpunit`.