
feat: tokenizer extension position #3594

Draft · wants to merge 5 commits into base: master
Conversation

UziTech (Member) commented Jan 18, 2025

Marked version: 15.0.6

Description

Add a position property for tokenizer extensions so they can pick where they are run in the lexer.

TODO:

  • Check benchmark speed (initial check seems to be fine)
  • Write tests
  • Write docs
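A minimal sketch of what registering such an extension might look like. The `position` value and the `++underline++` syntax here are illustrative assumptions, not part of this PR:

```javascript
// Hypothetical extension object: `position` names the built-in tokenizer
// step it should run at, instead of only a coarse `level`.
const underlineExtension = {
  name: 'underline',
  position: 'emStrong', // assumed position name; run at the emStrong step
  tokenizer(src) {
    // Match ++text++ at the start of the remaining source.
    const match = /^\+\+([^+]+)\+\+/.exec(src);
    if (match) {
      return { type: 'underline', raw: match[0], text: match[1] };
    }
  },
};

// With marked this would be registered as usual:
//   marked.use({ extensions: [underlineExtension] });
```

The tokenizer itself works the same as in existing extensions; only the placement hint is new.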

Contributor

  • Test(s) exist to ensure functionality and minimize regression (if no tests added, list tests covering this PR); or,
  • no tests required for this PR.
  • If submitting new feature, it has been documented in the appropriate places.

Committer

In most cases, this should be a different person than the contributor.


vercel bot commented Jan 18, 2025

marked-website: ✅ Ready (preview updated Mar 1, 2025, 6:14pm UTC)

calculuschild (Contributor)
I haven't run any benchmarks on this, but I'm curious how much of a slowdown this adds, assuming no extensions. Do you have any measurements?

As an alternative that might be more flexible, though it would mean more restructuring: I wonder what you think about extracting each of the parser/renderer steps into an array of "extension-like objects".

I.e., instead of:

```js
beforeCode();
// code
if (token = this.tokenizer.code(src)) {
  // ...
}

beforeFences();
// fences
if (token = this.tokenizer.fences(src)) {
  // ...
}
```

something like (pseudocode):

```js
const tokenizers = {
  code: () => { if (token = this.tokenizer.code(src)) { /* ... */ } },
  fences: () => { if (token = this.tokenizer.fences(src)) { /* ... */ } },
  // ...
};

for (const [tokenizerName, tokenizerFunction] of Object.entries(tokenizers)) {
  runExtensionBefore(tokenizerName);
  tokenizerFunction();
}
```

I'm fairly certain there is some speed impact when calling the tokenizers from an array/map/object like this, but it might make the whole project more flexible for this type of "extension position" customization.
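A runnable toy of the loop sketched above, with all names hypothetical (`extensionsByPosition`, the two-step tokenizer map); it only demonstrates the dispatch ordering, not marked's real lexer:

```javascript
// Toy model of the proposed restructuring: lexer steps live in an ordered
// map, and extensions registered for a position run just before that
// built-in step.
const extensionsByPosition = {
  // An extension "positioned" at fences: tries ::: fences first.
  fences: [(src) => src.startsWith(':::')],
};

function lex(src) {
  const calls = []; // records the order in which steps were tried
  const tokenizers = {
    code: (s) => { calls.push('code'); return false; },
    fences: (s) => { calls.push('fences'); return s.startsWith('~~~'); },
  };
  for (const [name, fn] of Object.entries(tokenizers)) {
    for (const ext of extensionsByPosition[name] ?? []) {
      calls.push(`ext-before-${name}`);
      if (ext(src)) return calls; // extension consumed the source
    }
    if (fn(src)) return calls; // built-in tokenizer consumed it
  }
  return calls;
}
```

Lexing a `~~~` fence here tries code, then the fences-positioned extension, then the built-in fences tokenizer, in that order.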

UziTech (Member, Author) commented Feb 26, 2025

I did try to move the tokenizers to some sort of array but there were two problems.

  1. Some tokenizers require a lot of extra logic that is not easy to move out of the lexer.
  2. I couldn't think of an easy way to expose the array to the user so they can pick where to put their tokenizers.

I think just adding a position property like this is easiest for the user, and even though it adds a lot of boilerplate, we are not likely to add or remove tokenizers from the lexer.

calculuschild (Contributor) left a comment

Implementation seems good to me assuming there's no alternative to the boilerplate.

Remaining comments:

  1. Should this include a documentation update?
  2. Does this need a test for an extension at one of the custom positions?
  3. Is there any significant performance impact? Not sure if the extra function calls between each step are going to be an issue.

```js
if (ext.position && ![...tokenizerBlockPositions, ...tokenizerInlinePositions].includes(ext.position)) {
  throw new Error(`extension position must be one of '${tokenizerBlockPositions.join("', '")}', '${tokenizerInlinePositions.join("', '")}'`);
}
if (!ext.level && !ext.position) {
```
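The check in this diff can be exercised standalone; the position lists below are illustrative stand-ins, not the real lists from marked's lexer:

```javascript
// Sketch of the validation: reject a `position` that names no known
// tokenizer step. List contents are assumptions for illustration.
const tokenizerBlockPositions = ['code', 'fences', 'heading'];
const tokenizerInlinePositions = ['escape', 'emStrong'];

function validateExtension(ext) {
  const allPositions = [...tokenizerBlockPositions, ...tokenizerInlinePositions];
  if (ext.position && !allPositions.includes(ext.position)) {
    throw new Error(`extension position must be one of '${allPositions.join("', '")}'`);
  }
}
```

A valid position passes silently; an unknown one throws with the full list of accepted names in the message.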
calculuschild (Contributor):
Should we enforce compatible level and position entries? I.e., a block level doesn't make sense with an inline position. Or should these properties be mutually exclusive to avoid unexpected behavior?

UziTech (Member, Author):
They are mutually exclusive. They can both be specified in case an extension wants to support many versions of marked, but versions of marked that allow position will ignore level if position is provided.
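That precedence rule could be sketched as follows (the function name and return shape are hypothetical, not from this PR):

```javascript
// An extension may ship both keys for cross-version compatibility:
// a position-aware marked ignores `level` when `position` is present,
// while older versions simply never read `position` and use `level`.
function resolvePlacement(ext, supportsPosition) {
  if (supportsPosition && ext.position) {
    return { position: ext.position };
  }
  return { level: ext.level };
}
```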

calculuschild (Contributor) commented Feb 28, 2025

> I did try to move the tokenizers to some sort of array but there were two problems.

Ok, fair enough. Looking back, this is what I tried to do back in 2021 and I think we concluded arrays just add too much slowdown anyway. #1872

Also looking back, we had discussed a "before" parameter 4 years ago. Hoping it wasn't also a lag issue that made us drop it #2043 (comment)

UziTech (Member, Author) commented Feb 28, 2025

> Remaining comments:

Yeah, those three are in the TODO section in this PR's description. I will work on them soon.

> Also looking back, we had discussed a "before" parameter 4 years ago. Hoping it wasn't also a lag issue that made us drop it

Looks like we just figured we would get it done when it was actually needed.

Development

Successfully merging this pull request may close these issues: Select Extension priority

2 participants