A PHP library for parsing NITF (News Industry Text Format) XML documents into a flat, searchable structure optimized for Meilisearch and similar full-text search engines.
composer require tacman/ntif-parser

Requirements:
- PHP 8.4+
use Tacman\NTF\NTF;
// Parse from file
$ntf = NTF::fromFile('article.xml');
// Or from XML string
$ntf = NTF::fromXml($xmlString);
// Or from a zip archive containing multiple NITF files
foreach (NTF::fromZip('articles.zip') as $ntf) {
    echo $ntf->headline;
}
// Get a flat array ready for indexing
$searchable = $ntf->toSearchable();

The NTF class provides these public properties:

| Property | Type | Description |
|---|---|---|
| `$id` | `string` | Document ID (from `doc-id/@id-string`) |
| `$headline` | `string` | Main headline (from `hl1`) |
| `$subhead` | `string` | Sub-headline (from `hl2`) |
| `$byline` | `string` | Author byline |
| `$summary` | `string` | Article summary/abstract |
| `$body` | `string` | Full body text (all `<p>` elements joined) |
| `$keywords` | `string[]` | Keywords from `key-list` |
| `$categories` | `array` | Classifications as `['type' => '...', 'value' => '...']` |
| `$images` | `array` | Media references with `source`, `name`, `mimeType` |
| `$publishedAt` | `?DateTime` | Publication date |
| `$modifiedAt` | `?DateTime` | Last modification date |
| `$section` | `?string` | Publication section |
| `$type` | `?string` | Publication type |

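To make the element-to-property mapping above more concrete, here is a minimal sketch of how a few of these fields could be read from NITF XML using PHP's built-in DOM extension. It is not the library's actual implementation: the function name `extractNitfFields` and the returned array keys are hypothetical, and only the mappings stated in the table (`doc-id/@id-string`, `hl1`, `hl2`, `<p>`, `key-list`) are covered.

```php
<?php
/**
 * Hypothetical sketch: read a handful of NITF fields with DOMXPath.
 * The function name and array keys are illustrative, not the library's API.
 */
function extractNitfFields(string $xml): array
{
    $doc = new DOMDocument();
    // Suppress parser warnings; a false return means the XML is malformed.
    if (!@$doc->loadXML($xml)) {
        throw new InvalidArgumentException('Invalid XML');
    }

    $xpath = new DOMXPath($doc);

    // NITF files often declare a default namespace. XPath 1.0 needs an
    // explicit prefix for it, so bind one when a namespace is present.
    $ns = $doc->documentElement->namespaceURI;
    $p = '';
    if ($ns !== null && $ns !== '') {
        $xpath->registerNamespace('n', $ns);
        $p = 'n:';
    }

    // Collect the text of every <p> under body.content.
    $paragraphs = [];
    foreach ($xpath->query("//{$p}body.content/{$p}p") as $node) {
        $paragraphs[] = trim($node->textContent);
    }

    // Collect keyword/@key values from key-list.
    $keywords = [];
    foreach ($xpath->query("//{$p}key-list/{$p}keyword/@key") as $attr) {
        $keywords[] = $attr->nodeValue;
    }

    return [
        'id'       => $xpath->evaluate("string(//{$p}doc-id/@id-string)"),
        'headline' => $xpath->evaluate("string(//{$p}hl1)"),
        'subhead'  => $xpath->evaluate("string(//{$p}hl2)"),
        'body'     => implode("\n\n", $paragraphs),
        'keywords' => $keywords,
    ];
}
```

Binding a prefix to the default namespace is the step that is easiest to miss: without it, an unprefixed XPath such as `//hl1` matches nothing in a namespaced document.
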
The toSearchable() method returns a flat array ready for direct indexing:
$ntf = NTF::fromFile('article.xml');
$searchable = $ntf->toSearchable();
// Index directly into Meilisearch
$client->index('articles')->addDocuments([$searchable]);

The searchable array includes all fields with:
- publishedAt and modifiedAt as ISO 8601 strings
- keywords as an array
- categories and images as JSON arrays
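The flattening described above can be illustrated with a small, self-contained sketch. The helper `toSearchableSketch` below is hypothetical and is not the library's `toSearchable()` method. It assumes that dates are formatted with `DATE_ATOM` and that array fields are passed through unchanged, so that they serialize as JSON arrays when the document is sent to the search engine. The library may make different choices.

```php
<?php
/**
 * Hypothetical helper: turn parsed fields into an index-ready flat array.
 * Not the library's toSearchable(); shown only to illustrate the idea.
 */
function toSearchableSketch(array $fields): array
{
    $out = [];
    foreach ($fields as $key => $value) {
        if ($value instanceof DateTimeInterface) {
            // Dates become ISO 8601 strings such as 2026-01-15T00:01:00+00:00.
            $out[$key] = $value->format(DATE_ATOM);
        } else {
            // Strings, nulls, and arrays pass through unchanged.
            $out[$key] = $value;
        }
    }

    return $out;
}
```

Keeping dates as strings means the result can be passed to `json_encode()` without any custom serialization step.
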
Process large archives efficiently using the generator:
// Iterate through all NITF files in a zip
$count = 0;
foreach (NTF::fromZip('archive.zip') as $ntf) {
    $count++;
    // Process each document
}
// Or get all as an array
$all = NTF::allFromZip('archive.zip');

The zip parser:
- Only processes .xml files
- Skips invalid XML files silently
- Uses a generator for memory efficiency
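The behaviour listed above can be sketched with PHP's `ZipArchive` and a generator. This is an illustration under stated assumptions, not the library's code: the function name `xmlDocumentsFromZip` is hypothetical, it yields `DOMDocument` objects rather than `NTF` instances, and it requires the `zip` extension.

```php
<?php
/**
 * Hypothetical sketch: lazily yield each well-formed .xml entry in a zip.
 * Yields entry name => DOMDocument; the library yields NTF objects instead.
 */
function xmlDocumentsFromZip(string $zipPath): Generator
{
    $zip = new ZipArchive();
    // open() returns true on success or an integer error code on failure.
    if ($zip->open($zipPath) !== true) {
        throw new RuntimeException("Cannot open zip: {$zipPath}");
    }

    try {
        for ($i = 0; $i < $zip->numFiles; $i++) {
            $name = $zip->getNameIndex($i);
            // Only consider entries whose extension is .xml (case-insensitive).
            if ($name === false
                || strtolower(pathinfo($name, PATHINFO_EXTENSION)) !== 'xml') {
                continue;
            }

            $contents = $zip->getFromIndex($i);
            if ($contents === false || $contents === '') {
                continue;
            }

            $doc = new DOMDocument();
            // Suppress parser warnings so malformed files are skipped silently.
            if (!@$doc->loadXML($contents)) {
                continue;
            }

            // Only one document is held in memory per iteration.
            yield $name => $doc;
        }
    } finally {
        // Runs when iteration finishes or the generator is destroyed early.
        $zip->close();
    }
}
```

Because the function is a generator, the archive is opened only when iteration starts, and each entry is read and parsed on demand.
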
Given a NITF XML file:
<?xml version="1.0" encoding="UTF-8"?>
<nitf xmlns="http://iptc.org/std/NITF/2006-10-18/">
  <head>
    <docdata>
      <doc-id id-string="abc123"/>
      <date.release norm="2026-01-15T00:01:00Z"/>
      <key-list>
        <keyword key="#news"/>
        <keyword key="#sports"/>
      </key-list>
    </docdata>
    <pubdata type="web" position.section="news/sports"/>
  </head>
  <body>
    <body.head>
      <hedline>
        <hl1>Big Game Today</hl1>
        <hl2>Preview and analysis</hl2>
      </hedline>
      <byline>By John Smith</byline>
    </body.head>
    <body.content>
      <p>First paragraph of the article...</p>
      <p>Second paragraph...</p>
    </body.content>
  </body>
</nitf>

You get:
$ntf->id; // "abc123"
$ntf->headline; // "Big Game Today"
$ntf->subhead; // "Preview and analysis"
$ntf->byline; // "By John Smith"
$ntf->body; // "First paragraph...\n\nSecond paragraph..."
$ntf->keywords; // ["#news", "#sports"]
$ntf->section; // "news/sports"
$ntf->publishedAt; // DateTime object

Run the tests:
./vendor/bin/phpunit

License: MIT