Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding support for task lists in GitHubFlavor #461

Open
jirijakes opened this issue Jul 8, 2023 · 6 comments
Open

adding support for task lists in GitHubFlavor #461

jirijakes opened this issue Jul 8, 2023 · 6 comments
Milestone

Comments

@jirijakes
Copy link

When Markdown input contains task list:

Todo:

 - [x] Task 1
 - [ ] Task 2

rendering into HTML produces an error:

import laika.api.Transformer
import laika.format.{Markdown, HTML}
import laika.markdown.github.GitHubFlavor

Transformer.from(Markdown).to(HTML).using(GitHubFlavor).build.transform(
"""Todo:

 - [x] Task 1
 - [ ] Task 2"""
)
Either[laika.parse.markup.DocumentParser.TransformationError, String] =
  Left(laika.parse.markup.DocumentParser$ParserError:
    Error parsing document '/': One or more error nodes in result:

  [3]: unresolved link id reference: x

   - [x] Task 1
     ^

  [4]: unresolved link id reference:

   - [ ] Task 2
     ^)

Tested with version 0.19.3.

@jenshalm
Copy link
Contributor

jenshalm commented Jul 8, 2023

Task lists are specific to GitHub and not official Markdown syntax. There are currently no plans to support them in Laika.

They could, theoretically, get added to the existing GitHubFlavor support in Laika, but I won't have the bandwidth in the near future to work on this as it would be a non-trivial piece of work, in particular as it has to somehow shadow the existing bullet list parser which is part of official Markdown syntax. It would also require some interesting decisions on how to render them in EPUB or PDF.

I'll be adding this to the backlog, in case I have time for this sometime next year or in case someone else wants to pick this up.

You see the error, since without support for task lists [x] is parsed as a link (it's standard Markdown link syntax) and x does not exist as a target.

@jenshalm jenshalm added this to the Backlog milestone Jul 8, 2023
@jirijakes
Copy link
Author

Thank you for the response, Jens.

@jenshalm jenshalm changed the title Cannot render task list adding support for task lists in GitHubFlavor Jul 8, 2023
@jenshalm
Copy link
Contributor

jenshalm commented Jul 8, 2023

I changed the title to make it clear that this is essentially a feature request, not a bug.

@jirijakes
Copy link
Author

I played with this for a while and found two ways using an extension, one with a new block parser (adapted regular list parser), one with a rewrite rule. Both ways are based on detecting the special case of having [ ] or [x] right at the beginning of a list item and rewriting it. Most likely it misses some cases and I guess it also may not be the best way, but here ends my familiarity with Laika's internals.

I do not think this would be ready for a PR but if you think it is close to it, I may try open one. Could also work with PDF and EPUB, since the check box is just an ordinary Unicode character (I haven't tried though).

Anyway, for anybody who may need to parse Github's tasks lists:

object TaskLists extends ExtensionBundle {
  override def description: String = "TaskLists"

  def rewriteTaskItem(content: Seq[Block]): Seq[Block] =
    content.map {
      case Paragraph(LinkIdReference(_, "x" | "X", _, _) +: rest, o) =>
        Paragraph(Text("") +: rest, o)
      case Paragraph(LinkIdReference(_, " ", _, _) +: rest, o) =>
        Paragraph(Text("") +: rest, o)
      case other => other
    }

  // As parser

  val taskListParser: BlockParserBuilderOps = {
    val bulletChar: Parser[String] = oneOf('*', '-', '+')
    val after: Parser[String]      = someOf(' ', '\t') ~>
      lookAhead(literal("[ ]") | literal("[x]") | literal("[X]"))

    BlockParser.recursive { implicit recParsers =>
      PrefixedParser('*', '-', '+') {
        lookAhead(oneChar) >> { char =>
          val bullet = StringBullet(char)
          list(
            bulletChar,
            after,
            BulletList(_: List[BulletListItem], bullet),
            (_, blocks) => BulletListItem(rewriteTaskItem(blocks), bullet)
          )
        }
      }
    }
  }

  override def parsers: ParserBundle = ParserBundle(blockParsers = List(taskListParser))

  // As rewrite rule

  val rewriteTaskLists: RewriteRules =
    RewriteRules(
      blockRules = List({ case list: BulletList =>
        val newContent = list.content.map(item => item.withContent(rewriteTaskItem(item.content)))
        Replace(list.copy(content = newContent))
      })
    )

  override def rewriteRules: RewriteRules.RewritePhaseBuilder = { case RewritePhase.Build =>
    List(_ => Right(rewriteTaskLists))
  }

}
Transformer.from(Markdown).to(HTML).using(TaskLists).build.transform("""
- [x] done
- [ ] todo
- [X] DONE
""")

produces

<ul>
  <li>☒ done</li>
  <li>☐ todo</li>
  <li>☒ DONE</li>
</ul>

By the way, Jens, a couple hours ago was my first time ever digging into Laika and I have to say it has been a pleasure! Very easy to navigate around and understand, thanks to all the type aliases, names and documentation.

@jenshalm
Copy link
Contributor

jenshalm commented Jul 9, 2023

That's kind of cool how quickly you found two workarounds for the problem. I'm always a little afraid new developers get lost in the huge API surface of Laika. From the two variants I think I prefer the parser solution as the rewrite rule relies on invalid AST nodes being created first. Just be aware that the list function you are using for the parser will be package-private API in version 1.x, so you'd need to implement your extension inside that package (and you couldn't rely on API stability for that method).

For a PR for official Laika support though, a lot more work would need to be done, and I'll try my best to summarize the steps below, just in case you or someone else wants to pick this up.


AST

The downside of your solution is that a significant aspect of the model (the presence of a task list) is hidden inside a tiny detail within a regular BulletList. This is not how the AST is intended to be used. If people would want to match on such a node they'd need to write very low-level code (and they could not even rely on a Text("☐") node being present since Laika often optimizes the model by merging adjacent Text nodes). For proper AST support new types would need to be introduced, similar to existing list variants (e.g. BulletList/BulletListItem, EnumList/EnumListItem, etc.). They could be called TaskList and TaskListItem and live in the lists.scala file in the laika.ast package. The best signature for TaskListItem would probably be this:

case class TaskListItem(content: Seq[Block], done: Boolean, options: Options = NoOpt) 
  extends ListItem with BlockContainer

Here, content would not contain a unicode character for a checkbox as that information is kept in a simple boolean flag.

Parser

The Markdown syntax is specific to GitHub and not part of any standard, so it would live in the laika.markdown.github package and get linked into the existing GitHubFlavor type. The parser would produce the new AST types.

Core Renderer

This is a second downside of your workaround: it kind of hard-codes the assumption the checkbox gets rendered by a unicode character of the standard text font. These kind of decisions are usually left to the renderer and not hardcoded in the AST. The default renderer would probably still use those unicode characters as it does not have access to any icon sets without theme support. But it would need to be implemented in both, HTMLRenderer and FORenderer (the latter for PDF), to support all output formats. This is another hurdle, as the author would need to understand the XSL-FO format for this task.

Test Suites

Since Laika is not tailored for the single use case of transforming Markdown to HTML (even though it is the most common one), parsers and renderers are usually tested separately. There are very few test suites that go the full way from markup input to HTML output. The parser would need to be tested in GitHubFlavorSpec, going from text input to AST nodes as result. It would need to include tests for edge cases and incorrect input. Since it is not part of any official spec, the only guidance would be the behaviour on this site, including checks what GitHub does with unusual inputs (e.g. just a single item in a list missing the check box). The renderer would need to be tested in both HTMLRendererSpec and FORendererSpec, here going from AST nodes as input to HTML strings and XSL-FO strings as output.

Theme Support (Helium)

The built-in Helium theme usually adds a bit more comfort on top of the core renderers. For example, picking a unicode character of the standard text font is often not the best look & feel (on this site for example it renders with a very tiny font which does not look very good imho). I'm also not sure how commonly text fonts include those unicode characters. The Helium theme includes an icon set and we can add support for those there. Ideally we would expand the integrated icofont with two new glyphs if it has support for those symbols. If not we'd need to pick SVG icons with compatible license or create them ourselves. The icons would also need to become part of the API (e.g. HeliumIcon.checkboxOn and HeliumIcon.checkbox.Off). We'd also need to decide on the best representation (a checkbox with a tick seems to be more common than a checkbox with an X).

Finally, the default theme CSS would need to be adjusted, since task lists commonly render without a visible bullet.


That's all I can think of at the moment. You probably see by now why it is not very high priority for me. While none of the steps above is particularly challenging, it's a significant amount of work and the gain is somewhat limited in my view - for two reasons: the first is that I think this is a feature that is most useful on sites with interactivity, like this site, where you can check/uncheck and re-order items. This can only be delivered with a backend, so will never be supported out of the box in Laika/Helium. The use case for our users is mostly around producing static sites, and for those the value of task lists is somewhat limited. The second reason is that it is not a requested feature. The number of people, worldwide, who are interested in this feature just recently went from 0 to 1. 🙂

That being said, if anyone still wants to go through all the steps above, I'm happy to merge such a PR. It's just unlikely I'll prioritise it myself in the near future for the reasons given.

@jirijakes
Copy link
Author

Thank you very much for the interesting explanation, Jens.

Alright, I understand that it might work as a quick and dirty solution for my needs but it would not be a good way to integrate with the rest of the library.

I don't think I will get further into adding this feature into Laika now. Although it might be good adventure to explore the inner-most parts of it someday.

By the way, my use case is rendering Github issues and comments into a static HTML and perhaps with some rewriting. Turns out there are quite many task lists on Github. Laika was first choice I explored as I had remembered it from the past and have always had the connection Scala–Markdown–Laika in my head.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants