
Can [[wikilink]] in YAML ever be "done right"? NO. But there could be a "route of least pain". #137

Moonbase59 started this conversation in General
Jun 23, 2021 · 3 comments · 6 replies

Here are some thoughts I wrote up, detailing the problems of using Obsidian "wikilinks" ([[link]] style) in YAML, and a proposed "best route to go". Since Obsidian Leaflet’s code block is also in YAML format (and parsed as such), this is becoming more and more interesting. And error-prone.

The original file is from my notes. It uses the Dataview and Templater plugins to show some points and will not work on GitHub, so please download it and extract it into your own Obsidian vault.

Test YAML, with 'single' and "double" quotes.md.zip

I’ll nevertheless copy-paste the text here, so everyone can follow, more or less:


---
date: 2021-06-23
author: Matthias C. Hormann a.k.a. Moonbase59
file1: [[Test YAML, with 'single' and "double" quotes#Test YAML with 'single' and double quotes|linked to its own H1]]
---

Test YAML, with 'single' and "double" quotes

This is a perfectly legal filename in Obsidian and can be [[Test YAML, with 'single' and "double" quotes#Test YAML with 'single' and double quotes|linked to its own H1]].

How can we safely represent this in YAML?

To be clear, the Obsidian "wikilink" looks like this:

[[Test YAML, with 'single' and "double" quotes#Test YAML with 'single' and double quotes|linked to its own H1]]

We can see the Obsidian link suggester removes the comma , and double quotes " for its header link, but leaves the single quotes ' in.

Using the complete link in YAML

file1: [[Test YAML, with 'single' and "double" quotes#Test YAML with 'single' and double quotes|linked to its own H1]]

shows us that Obsidian’s syntax highlighter is broken (it shows a YAML comment where there is none), and gives us:

---
!!map {
  ? !!str "file1"
  : !!seq [
    !!seq [
      !!str "Test YAML",
      !!str "with 'single' and \"double\" quotes#Test YAML with 'single' and double\
        \ quotes|linked to its own H1",
    ],
  ],
}

Interestingly, this seems to be "self-quoting" in some way, and it doesn’t break at the hash # because there is no space before it. (A YAML comment starts with <space>#.)

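The "self-quoting" probably comes from the YAML parser seeing a plain !!str scalar within a !!seq within another !!seq (the double brackets [[). Plain scalars can use \ backslashes, " quotes ' and $ a % lot /&?+ of other {} [] stuff without the need for quoting. (Inside a flow sequence like [[…]], though, a comma followed by a space ends the scalar, which is why the link above was split in two.)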

See also [[#Characters that cannot be used at the beginning of a plain scalar]], [[#Character sequences that can't be used inside a scalar]], Plain Scalars (yaml.info).

We can see the YAML parser returns a sequence !!seq of two strings !!str, namely

"Test YAML"

and

"with 'single' and \"double\" quotes#Test YAML with 'single' and double quotes|linked to its own H1"

Dataview

Dataview output (inline query =this.file1 in the original note):

Dataview creates a working link. We don’t know how it parses YAML data; it probably uses its own parser or some special-case handling.

Templater

Templater output (command <%+ tp.frontmatter.file1 %> in the original note):

We can see that Templater blindly concatenates all elements of the !!seq, joining them with a comma , (watch closely: the blank after "Test YAML," is missing). Assuming is not a good strategy in programming.
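We don't know Templater's internals, so this is only a guess: JavaScript's default array-to-string conversion (Array.prototype.toString) joins elements with a bare comma at every nesting level, which would produce exactly this output. A minimal TypeScript sketch, assuming the parser hands back the nested array shown in the !!map dump above:

```typescript
// The frontmatter value file1 as the YAML parser returns it:
// a sequence containing a sequence of two plain strings.
const file1: string[][] = [
  [
    "Test YAML",
    "with 'single' and \"double\" quotes#Test YAML with 'single' and double quotes|linked to its own H1",
  ],
];

// Default string conversion (String(x), template literals, "" + x) calls
// Array.prototype.toString, which joins with "," and no space, recursively.
const naive: string = String(file1);

// Re-joining the innermost strings with ", " restores the link text,
// but only because we assume the user typed exactly one space after the comma.
const rejoined: string = file1[0].join(", ");

console.log(naive.startsWith("Test YAML,with")); // true
console.log(rejoined.startsWith("Test YAML, with")); // true
```

The re-join is itself an assumption: a link written as "Test YAML,  with" (two spaces) would not survive the round trip.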

Obsidian Leaflet

The Obsidian Leaflet plugin uses a YAML parser to parse the code block. (Which?)

Some of the wikilinks work, some don’t, some variables look like wikilinks but aren’t.

  • markerFile: [[Elvenking’s Halls]] works
  • This works:
    geojson:
      - [[geo-fangorn.geojson]]
      - [[geo-line.geojson]]
  • geojson: [[Columbus, OH]] doesn’t work
  • geojson: [["Columbus, OH"]] works (but breaks auto-suggest)
  • geojson: "[[Columbus, OH]]" doesn’t work
  • overlay: [[red, [705.88, 941.175], 100 miles, "Test Overlay"]] is not a wikilink, but looks like one (and works)

No good solution seems to be at hand

Still, for Obsidian Leaflet, …

  1. We have to come up with something as usable and easy as possible.
  2. It should—as far as possible—adhere to the YAML 1.2 spec.
  3. We should be able to continue using the YAML parser for code blocks.
  4. We must tell users about possible exceptions and how to circumvent those.

Most compatible solution for wikilinks I can think of

  • Allow wikilinks in the [[link]] format.
  • If needed for naming things (like layers), internally strip off the |name part.
  • Use plain !!str scalars inside the brackets, since these allow the most characters without escaping.
  • Obsidian forbids colons : in file names, so we don’t have the colon problem.
  • Assume concatenation for comma-separated scalars using a ,<space> joiner. (Most people will use a blank after a comma.) Tell users in the docs.
  • Tell users what isn’t allowed in plain scalars (i.e., "wikilinks"): [[#Characters that cannot be used at the beginning of a plain scalar|Characters not allowed at the beginning]], [[#Character sequences that can't be used inside a scalar|character sequences that are not allowed within]].
  • For the more technically inclined, let them know how to circumvent this restriction (i.e., by building the Obsidian link using the suggester, then using quoted strings within the [[…]] sequence).
  • Use a uniform "drilldown" function that can …
    • handle "array in array" notation ([[link]]),
    • "array in array in array" (multiple [[link]] on - lines, or [ [[link1]], [[link2]] ] notation),
    • concatenate plain scalars in the innermost level, if needed (option?), using a ,<space> joiner,
    • strip the |name part of an Obsidian link for further use (layer names),
    • arrive at a functioning file name (and return it, besides a possible "name").
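The drilldown idea from the list above could look roughly like this. It is a sketch under stated assumptions, not Leaflet's code: the names drilldown, isWikilink, LinkTarget and Parsed are made up, the parser is assumed to return plain nested arrays, and the innermost scalars are re-joined with ", " as proposed.

```typescript
// Hypothetical sketch: Parsed, LinkTarget, isWikilink and drilldown are
// illustrative names, not Obsidian Leaflet's actual code.

// Roughly what a YAML parser returns for a code block value.
type Parsed = string | number | boolean | null | Parsed[];

interface LinkTarget {
  file: string;  // link text without the |name part (may still contain #heading)
  name?: string; // the |name part, e.g. for use as a layer name
}

const isScalar = (v: Parsed): boolean => !Array.isArray(v);

// [[a, b]] arrives as [["a", "b"]]: a one-element array whose only
// element is an array of scalars.
function isWikilink(v: Parsed): boolean {
  if (!Array.isArray(v) || v.length !== 1) return false;
  const inner = v[0];
  return Array.isArray(inner) && inner.every(isScalar);
}

function drilldown(value: Parsed): LinkTarget[] {
  if (!Array.isArray(value)) return []; // a bare scalar is not a wikilink

  if (isWikilink(value)) {
    // Re-join the innermost scalars, assuming the user wrote ", " between them.
    const text = (value[0] as Parsed[]).map((s) => String(s)).join(", ");
    const bar = text.indexOf("|");
    return bar === -1
      ? [{ file: text }]
      : [{ file: text.slice(0, bar), name: text.slice(bar + 1) }];
  }

  // Otherwise descend one level: "- [[a]]" lines and "[ [[a]], [[b]] ]"
  // both give an array of wikilinks.
  const found: LinkTarget[] = [];
  for (const item of value) found.push(...drilldown(item));
  return found;
}
```

With this rule the overlay value [[red, [705.88, 941.175], 100 miles, "Test Overlay"]] yields no link, because its inner array contains another array. What it cannot resolve: any one-element flow sequence of scalars, such as [[red, blue]], is always treated as a link, and a file name the parser resolved as a number (say 1e3) would come back re-stringified as "1000".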

Most compatible solution for tags I can think of

In order not to break YAML by accidentally using the hash # as a comment introducer, I think using tags in the non-hash notation, as in Obsidian’s own tags: frontmatter field, might be best. So we could have:

markerTags: family, friends

or

markerTags: [family, friends]

The user must still know [[#List of things not allowed for YAML scalars|what’s not allowed]] in YAML scalars.

Caveat: Obsidian’s tag suggester can’t be used. To be precise, it can be used but the hashes # have to be removed manually.
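A small normalizer could accept both notations. This is a hypothetical helper (normalizeTags is not an existing API). It also drops a leading # in case a user quoted tags taken from the suggester, e.g. markerTags: "#family, #friends"; an unquoted markerTags: #family would still be read as a comment and arrive as null.

```typescript
// Hypothetical helper; normalizeTags is an illustrative name.
//   markerTags: family, friends    -> parser gives "family, friends"
//   markerTags: [family, friends]  -> parser gives ["family", "friends"]
function normalizeTags(value: unknown): string[] {
  const items: unknown[] = Array.isArray(value) ? value : [value];
  const tags: string[] = [];
  for (const item of items) {
    if (item === null || item === undefined) continue; // e.g. the value was only a comment
    // Split on commas, trim blanks, and drop one leading "#".
    for (const part of String(item).split(",")) {
      const tag = part.trim().replace(/^#/, "");
      if (tag !== "") tags.push(tag);
    }
  }
  return tags;
}
```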

List of things not allowed for YAML scalars

Other good sources:

Characters that cannot be used at the beginning of a plain scalar

  • ! Tag like !!null
  • & Anchor like &mapping_for_later_use
  • * Alias like *mapping_for_later_use
  • -<space> Block sequence entry
  • :<space> Block mapping entry
  • ?<space> Explicit mapping key
  • {, }, [, ] Flow mapping or sequence
  • , Flow Collection entry separator
  • # Comment
  • |, > Block Scalar
  • @, ` (backtick) Reserved characters
  • ", ' Double and single quote
  • <whitespace>
  • % Directive
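The list above translates into a quick check. This is a hypothetical sketch (leadingProblem is an illustrative name) covering only the characters listed here, not the full YAML grammar:

```typescript
// Hypothetical checker (leadingProblem is an illustrative name) covering only
// the list above. Returns a reason, or null if the start looks safe for a plain scalar.
const ALWAYS_UNSAFE_FIRST = "!&*{}[],#|>@`\"'%";

function leadingProblem(s: string): string | null {
  if (s.length === 0) return "empty value"; // an empty value is not a string at all
  const first = s.charAt(0);
  if (/\s/.test(first)) return "leading whitespace";
  if (ALWAYS_UNSAFE_FIRST.indexOf(first) !== -1) return `starts with ${first}`;
  // "-", ":" and "?" only act as indicators when followed by a space.
  if ("-:?".indexOf(first) !== -1 && s.charAt(1) === " ") return `starts with "${first} "`;
  return null;
}
```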

Character sequences that can't be used inside a scalar

flow style sequence: [ string one, string two ]
flow style mapping: { key: value }

As you can see, a comma or a square bracket will end a plain scalar. Therefore, to avoid confusion, the following characters or character sequences are not allowed in plain scalars:

  • :<space> Key/value separator. A colon without following whitespace is allowed.
  • <space># This starts a comment.
  • [, ]
  • {, }
  • , (possible exception: assumed auto-concatenation using ,<space> joiner)
  • :[
  • :]
  • :{
  • :}
  • :,
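Likewise, a rough check for the sequences above, for a scalar meant to sit inside a flow collection such as [[ ]]. Again a hypothetical sketch (innerProblems is an illustrative name); the allowCommaSpace flag models the assumed ,<space> auto-concatenation:

```typescript
// Hypothetical checker (innerProblems is an illustrative name) for a plain scalar
// that will sit inside a flow collection such as [[ ]], mirroring the list above.
const FORBIDDEN_INSIDE = [": ", " #", "[", "]", "{", "}", ",", ":[", ":]", ":{", ":}", ":,"];

function innerProblems(s: string, allowCommaSpace = false): string[] {
  const found: string[] = [];
  for (const seq of FORBIDDEN_INSIDE) {
    if (seq === "," && allowCommaSpace) {
      // With auto-concatenation, only a comma NOT followed by a space is a problem.
      if (/,(?! )/.test(s)) found.push(",");
    } else if (s.indexOf(seq) !== -1) {
      found.push(seq);
    }
  }
  return found;
}
```

The test file name from the top of this post passes once ", " is allowed: its # has no space in front of it, so it does not start a comment.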

Special types you shouldn’t use in wikilinks or tags

Another use case for quotes is when you have a string that would be resolved as a special type. This depends heavily on the YAML version and on the schema in use. Here are some examples where you need quotes:

  • true, false Boolean values
  • 23 Integer numbers
  • 1e3 Numbers with exponent
  • 3.14159 Float numbers
  • null
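To test whether an unquoted value would resolve to one of these types, one can use the resolution patterns of the YAML 1.2 core schema. This is a sketch assuming the parser follows that schema; YAML 1.1 parsers additionally treat yes/no/on/off (and more) as booleans, so results vary by parser. resolvedType is an illustrative name.

```typescript
// Sketch: which type would an unquoted scalar resolve to under the YAML 1.2 core schema?
// Checked in order; anything that matches none of them stays a string.
const CORE_SCHEMA: Array<[string, RegExp]> = [
  ["null", /^(?:null|Null|NULL|~)?$/],
  ["bool", /^(?:true|True|TRUE|false|False|FALSE)$/],
  ["int", /^(?:[-+]?[0-9]+|0o[0-7]+|0x[0-9a-fA-F]+)$/],
  ["float", /^(?:[-+]?(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)(?:[eE][-+]?[0-9]+)?|[-+]?\.(?:inf|Inf|INF)|\.(?:nan|NaN|NAN))$/],
];

function resolvedType(s: string): string {
  for (const [type, re] of CORE_SCHEMA) {
    if (re.test(s)) return type;
  }
  return "str";
}
```

Anything other than "str" means the wikilink or tag needs quoting (or must be avoided).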

Replies


This is a great writeup. I haven't had much time to sit down and look at this plugin for a few days (got busy :) ), but I appreciate the significant amount of effort that went into this.

As far as the YAML parser, I am using Obsidian's exposed parseYAML function (https://github.com/obsidianmd/obsidian-api/blob/b0aa06eab03d8c39c0733949da5aa1edc2084b01/obsidian.d.ts#L2064). I believe they use yaml.js, as their error messages are similar.

For Dataview, just took a look through the code:

I have no experience with parsimmon (which is what he is using to parse the Query language), but the wikilinks are pulled out here:

https://github.com/blacksmithgu/obsidian-dataview/blob/905b9d0b2216b9c6e0f290b3ff6eda5fc25c9ecf/src/expression/parse.ts#L200

And the parsing is defined here:

https://github.com/blacksmithgu/obsidian-dataview/blob/905b9d0b2216b9c6e0f290b3ff6eda5fc25c9ecf/src/expression/parse.ts#L42-L59

Possibly what I could do here is pull out any parameter in the code block that matches a wikilink RegExp, add it to an internal map, replace the wikilink in the source with a YAML-safe key from that map, parse the YAML, then re-add the wikilink into my parameters object.

@valentine195

The same could be done for the markerTags field - if it's detected, pull the tags out before the YAML parsing and then re-add them.

@Moonbase59

Good to hear from you! Of course we’re all awaiting the next great optimizations, but (I think I can speak for all of us here) we appreciate the effort you put into it, and understand that real life sometimes takes precedence!

So I took the time to do some thinking and experimenting in the meantime, with collaboration in mind. I’m not a good JavaScript or TypeScript programmer, but I can often see things from the "10,000 ft" perspective, and have been known to pinpoint possible weaknesses and drill down to details as well. This is where 43 years of IT pay off. ;-) Also, I just love how we can work together, and what comes out of it.

Now YAML is "human-friendly" (they say), but it has many syntax and especially quoting quirks, so I wouldn’t be too sure about coming up with a "perfect" regex. And how would you distinguish things that look like a wikilink but aren’t? We do have to have "arrays in arrays" (see the "overlay" example).

I think that, to come up with a foolproof plugin, the number of "special exceptions" needs to be kept to a minimum, and basic functions should be used that work even in extreme cases. So it’s best to think this through together and come up with a good way. It will also help keep the code clean and make the thing as robust as it can be. (Always my goal: it’s easier to modify a function, or extend a class, than to check "spaghetti code" for possible incompatibilities and problems, especially when you have to add something months or years later.)

If your idea is possible, taking into account all the "specials" (links with #header, ^pointer, |name), it might be a way. We don’t want to take away all the linking variants in the maps, but still be able to use some features for ourselves (like image: [[my-image.png|layername]]).
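Splitting a link into its parts is straightforward once the link text is available. This is a hypothetical helper (splitLink and LinkParts are illustrative names), assuming block references are written as file#^id, which is how Obsidian writes them:

```typescript
// Hypothetical helper: split Obsidian link text into path, subpath and alias.
//   "file#Heading|name" -> path "file", subpath "#Heading", alias "name"
//   "file#^blockid"     -> path "file", subpath "#^blockid", no alias
interface LinkParts {
  path: string;    // file name, used to find the file
  subpath: string; // "#heading" or "#^block", or "" if none
  alias?: string;  // the |name part, e.g. a layer name
}

function splitLink(text: string): LinkParts {
  const bar = text.indexOf("|");
  const target = bar === -1 ? text : text.slice(0, bar);
  const hash = target.indexOf("#");
  const path = hash === -1 ? target : target.slice(0, hash);
  const subpath = hash === -1 ? "" : target.slice(hash);
  return bar === -1 ? { path, subpath } : { path, subpath, alias: text.slice(bar + 1) };
}
```

For example, splitLink("my-image.png|layername") gives the file my-image.png and the layer name layername.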

What parser do you currently use for the code block parsing?

I’ll certainly have a peek at @blacksmithgu’s way of doing it, thanks for the pointers. I only fear Michael is on his way to turning Dataview into a "monster": devising your own language is extremely hard, and "quick shots" at correcting things often turn out to be disastrous, especially when it comes to code maintenance.

Anyways (time for writing novels today, huh?), I’m sure we’ll come up with something that might not be "perfect", but clean and maintainable!

@valentine195

And how would you distinguish things that look like a wikilink but aren’t?

One way is to look to see if the match exists in the vault. It's a rather small overhead.

Basically, just parse the source string (not the YAML-parsed object) for anything matching /\[\[([^\[\]]*?)\]\]/u, then test the results - any results found in the vault would be mapped to something YAML-friendly (something like INTERNAL_LINK_1: result), then replaced in the source string by the index key (INTERNAL_LINK_1), then the YAML would be parsed.
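A minimal sketch of this pre-parsing step, using the RegExp above (with the g flag added for replacing). The names preparse and restore and the exists callback are hypothetical; in a plugin, exists would ask the vault whether the link path resolves, and the actual YAML parsing happens between the two calls:

```typescript
// Hypothetical sketch of the pre-parsing idea; not the plugin's actual code.
const WIKILINK = /\[\[([^\[\]]*?)\]\]/gu;

function preparse(source: string, exists: (path: string) => boolean) {
  const links = new Map<string, string>();
  const safe = source.replace(WIKILINK, (match: string, inner: string) => {
    // Only the path decides existence: drop "|name" and "#heading" / "#^block".
    const path = inner.split("|")[0].split("#")[0];
    if (!exists(path)) return match; // looks like a link but isn't: leave it alone
    const key = `INTERNAL_LINK_${links.size + 1}`;
    links.set(key, match);
    return key;
  });
  return { safe, links };
}

// After parsing `safe` as YAML, walk the result and put the wikilinks back.
function restore(value: unknown, links: Map<string, string>): unknown {
  if (typeof value === "string") return links.get(value) ?? value;
  if (Array.isArray(value)) return value.map((v) => restore(v, links));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value)) out[k] = restore(v, links);
    return out;
  }
  return value;
}
```

Note that the overlay example never matches the RegExp, since its inner [ breaks the [[…]] pattern, so it reaches the YAML parser untouched.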

What parser do you currently use for the code block parsing?

Currently, I try to use Obsidian's parseYAML and, if that fails, I manually parse it using my old parser. See here.

FYI, I just pushed 3.19.0 which implements this YAML pre-parsing.

@Moonbase59

Not (yet) completely, see #134 (comment)

I’d say @valentine195’s proposed YAML pre-parsing in 3.19.1+ solves this problem in the most user-friendly way: Obsidian’s auto-suggest can be used, and most "odd" file names no longer present a problem.

I still recommend keeping this somewhere for reference regarding what might be illegal YAML.
