zspitz/PandocFilters

PandocFilters

Write Pandoc filters in .NET, using strongly-typed data structures for the Pandoc AST.

Pandoc filters

Pandoc is a command-line program and Haskell library for converting documents from and to many different formats. Documents are translated from the input format to an AST (defined in the Text.Pandoc.Definition module), which is then used to create the output format.

Pandoc allows writing filters: programs that read the AST as JSON from standard input, modify it, and write it back out to standard output. Filters can be run using the pipe operator (|), which works in both Linux and Windows shells:

pandoc -s input.md -t json | my-filter | pandoc -s -f json -o output.html

or using the Pandoc --filter command-line option:

pandoc -s input.md --filter my-filter -o output.html
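The stdin/stdout contract described above can be sketched without any library at all. The following is a minimal pass-through filter, assuming .NET 6 or later for System.Text.Json.Nodes; the PassThroughFilter class and its Run method are hypothetical names used only for illustration and are not part of PandocFilters.

```csharp
using System;
using System.IO;
using System.Text.Json.Nodes;

// Hypothetical illustration of the filter contract; not part of PandocFilters.
public static class PassThroughFilter {
    // Reads the JSON-serialized AST from `input` and writes it to `output`.
    public static void Run(TextReader input, TextWriter output) {
        // Parse the whole document that Pandoc sends.
        var root = JsonNode.Parse(input.ReadToEnd());

        // A real filter would inspect and modify the tree here.

        // Serialize the (unchanged) tree back out as compact JSON.
        output.Write(root?.ToJsonString() ?? "null");
    }
}
```

A console program whose entry point calls PassThroughFilter.Run(Console.In, Console.Out) could then take the place of my-filter in either command above.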

Pandoc AST

Much of the JSON-serialized AST comes in the form of objects with a t and a c property (see note 1):

{
    "t": "Para",
    "c": [

    ]
}

This corresponds to a Para object with properties filled with the values at the c property.
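To make the t/c shape concrete, here is a small sketch that walks raw JSON with System.Text.Json and counts how often each t tag occurs. It does not use PandocFilters; the TagCounter class and its CountTags method are hypothetical names introduced for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper for exploring the raw JSON; not part of PandocFilters.
public static class TagCounter {
    // Returns how many objects carry each "t" tag anywhere in the JSON.
    public static Dictionary<string, int> CountTags(string json) {
        using var doc = JsonDocument.Parse(json);
        var counts = new Dictionary<string, int>();
        Walk(doc.RootElement, counts);
        return counts;
    }

    private static void Walk(JsonElement element, Dictionary<string, int> counts) {
        switch (element.ValueKind) {
            case JsonValueKind.Object:
                // Record the tag if this object has a string "t" property.
                if (element.TryGetProperty("t", out var tag) && tag.ValueKind == JsonValueKind.String) {
                    var name = tag.GetString() ?? "";
                    counts[name] = counts.TryGetValue(name, out var n) ? n + 1 : 1;
                }
                // Descend into every property, including "c".
                foreach (var property in element.EnumerateObject()) {
                    Walk(property.Value, counts);
                }
                break;
            case JsonValueKind.Array:
                foreach (var item in element.EnumerateArray()) {
                    Walk(item, counts);
                }
                break;
        }
    }
}
```

Calling CountTags on a paragraph such as {"t":"Para","c":[{"t":"Str","c":"Hello"},{"t":"Space"},{"t":"Str","c":"world"}]} counts one Para, two Str, and one Space. Note that some nodes, like Space, carry only a t property.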

The library defines types and base classes for both levels:

Type level       | Description                     | Namespace         | Visitor base class
Raw              | Objects with a t and c property | PandocFilters.Raw | RawVisitorBase
Higher-level AST | e.g. Para type                  | PandocFilters.Ast | VisitorBase

The library also includes two predefined visitors — DelegateVisitor and RawDelegateVisitor — which can be extended by adding delegates via the Add method, instead of defining a new class (see below for sample).

1. This applies to all the types in pandoc-types except for the root Pandoc type and the Citation type.

Usage

  1. Create a console application.
  2. Install the PandocFilters NuGet package.
  3. Define your visitor — either
    • write a class that inherits from one of the visitor base classes, and create an instance of the class, or
    • create an instance of the appropriate delegate visitor class, and append delegates using the Add methods.
  4. Pass the instance into Filter.Run.
  5. Either pass your program to Pandoc using --filter, or pipe the JSON output from Pandoc into your program and pipe the output back into Pandoc.

Note that Filter.Run takes an arbitrary number of visitors — you can create multiple visitors and pass them into Filter.Run.

Sample

using System.Collections.Immutable;
using System.Linq;
using PandocFilters;
using PandocFilters.Types;

var visitor = new RemoveImagePositioning();
Filter.Run(visitor);

// Strips the "height" and "width" attributes from every image in the document.
class RemoveImagePositioning : VisitorBase {
    public override Image VisitImage(Image image) =>
        image with {
            Attr = image.Attr with {
                KeyValuePairs =
                    image.Attr.KeyValuePairs
                        .Where(x => x.Item1 != "height" && x.Item1 != "width")
                        .ToImmutableList()
            }
        };
}

Using the delegate visitor:

using System.Collections.Immutable;
using System.Linq;
using PandocFilters;
using PandocFilters.Types;

var visitor = new DelegateVisitor();
// Strips the "height" and "width" attributes from every image in the document.
visitor.Add((Image image) => image with {
    Attr = image.Attr with {
        KeyValuePairs =
            image.Attr.KeyValuePairs
                .Where(x => x.Item1 != "height" && x.Item1 != "width")
                .ToImmutableList()
    }
});
Filter.Run(visitor);

For a real-world usage example with multiple visitors (and the reason I wrote this in the first place), see DlrDocsProcessor.

Credits

Notes

  • PandocFilters is written against the types in pandoc-types 1.22. When pandoc-types is updated, code written against the raw types should still receive the JSON-source data structures, while code written against the higher-level types could fail at the JSON parsing stage.
  • The library uses C# 9 record types (and System.Collections.Immutable) to enforce immutability; otherwise we'd have to check for circular references before serializing. If you're using C# 9 or later, you can use the with keyword to clone/initialize the returned instance; otherwise you'll have to pass in all arguments to the constructor.
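The difference between the with keyword and passing every argument to the constructor can be sketched as follows. DemoAttr and DemoImage are hypothetical stand-ins defined only for this example; they are not the library's actual Attr and Image types, whose members may differ.

```csharp
using System;
using System.Collections.Immutable;
using System.Linq;

// Hypothetical stand-in records; not the library's actual types.
public record DemoAttr(string Identifier, ImmutableList<(string, string)> KeyValuePairs);
public record DemoImage(DemoAttr Attr, string Url);

public static class WithDemo {
    // C# 9 and later: copy the record and change only what differs.
    public static DemoImage StripSize(DemoImage image) =>
        image with {
            Attr = image.Attr with {
                KeyValuePairs = WithoutSize(image.Attr.KeyValuePairs)
            }
        };

    // Without `with`: every constructor argument must be passed explicitly.
    public static DemoImage StripSizeViaConstructor(DemoImage image) =>
        new DemoImage(
            new DemoAttr(image.Attr.Identifier, WithoutSize(image.Attr.KeyValuePairs)),
            image.Url);

    // Builds a new list without the "height" and "width" pairs; the input list is not modified.
    private static ImmutableList<(string, string)> WithoutSize(ImmutableList<(string, string)> pairs) =>
        pairs.Where(x => x.Item1 != "height" && x.Item1 != "width").ToImmutableList();
}
```

Both methods return a new instance and leave the original untouched, which is the property that immutability provides.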