Improve DefaultSlugGenerator to preserve complex unicode letters, numbers, and marks #467
The `DefaultSlugGenerator` implementation currently strips out certain Unicode code points, which breaks the slugs generated for titles in many languages.

Test: https://gist.github.com/Ayesh/f357dcc18b60e117ab771476606dda3a
| Title | Expected slug | `league/commonmark` slug |
| --- | --- | --- |
| Test | #test | #test |
| අත්හදා බලන මාතෘකාව | #අත්හදා-බලන-මාතෘකාව | #-- |
| අත්හදා බලන මාතෘකාව - | #අත්හදා-බලන-මාතෘකාව--- | #---- |
| 测试标题 | #测试标题 | # |
| 試験タイトル | #試験タイトル | # |
The second and third strings are from my native Sinhalese language. Some of the glyphs are letters (`\p{L}`), but we also have marks (`\p{M}`). These marks are not symbols (`\p{S}`) or punctuation (`\p{P}`).

I think the current slug-ify process makes it impossible to use the HeadingPermalink extension in a meaningful way for those who write in complex scripts, which include East Asian and South Asian languages (which, statistically, account for more than half of the world population).

Pretty much every news site, and even Wikipedia, processes slugs this way.
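To illustrate the letter/mark distinction, here is a minimal Python sketch (not the library's PHP code) that counts the Unicode major classes in the first word of the Sinhala test title. The helper name `major_classes` is hypothetical, and the word is written with explicit escapes so the code points are unambiguous.

```python
import unicodedata
from collections import Counter


def major_classes(text: str) -> Counter:
    """Count characters by Unicode major class (L = letter, M = mark, ...)."""
    return Counter(unicodedata.category(ch)[0] for ch in text)


# First word of the Sinhala test title, spelled out as code points:
# U+0D85, U+0DAD, U+0DCA (mark), U+0DC4, U+0DAF, U+0DCF (mark)
word = "\u0d85\u0dad\u0dca\u0dc4\u0daf\u0dcf"

counts = major_classes(word)

# Keeping only \p{L}-class characters silently drops the two marks,
# which changes the word a reader would see.
letters_only = "".join(
    ch for ch in word if unicodedata.category(ch)[0] == "L"
)

print(dict(counts))
print(len(word), len(letters_only))
```

Two of the six code points in this one word are marks, so a filter that keeps letters only cannot reproduce the title.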
This PR relaxes the stripping logic so that titles in complex scripts are preserved. Punctuation, symbols, emoji, etc. are still removed. After this change, `league/commonmark` slugs match GitHub's slugs verbatim.
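The relaxed rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in this PR: the function name `slugify` is hypothetical, and the handling of whitespace (one hyphen per whitespace character) and of existing hyphens (kept as-is) is assumed rather than taken from the actual implementation.

```python
import unicodedata

# Unicode major classes to keep: letters, numbers, and marks.
KEPT_CLASSES = ("L", "N", "M")


def slugify(title: str) -> str:
    """Lowercase the title, turn whitespace into hyphens, and drop
    everything that is not a letter, number, mark, or hyphen."""
    out = []
    for ch in title.strip().lower():
        if ch.isspace():
            out.append("-")  # assumption: one hyphen per whitespace char
        elif ch == "-" or unicodedata.category(ch)[0] in KEPT_CLASSES:
            out.append(ch)
        # punctuation, symbols, emoji, etc. fall through and are removed
    return "".join(out)


print(slugify("Hello, World!"))  # prints "hello-world"
print(slugify("测试标题"))
```

Under these assumptions the Chinese title passes through unchanged instead of collapsing to an empty slug, while the comma and exclamation mark in the English title are still removed.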