-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix($shared-utils): Replace diacritics with String.normalize #1855
fix($shared-utils): Replace diacritics with String.normalize #1855
Conversation
Nice job @larionov i will have a look when i got time ! |
it should fix #1815 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @larionov,
Thanks for this PR, seems great when reading tests.
I'm not an expert on this topic so I'll just ask questions for now:
/[\u0300-\u036F]/g
this will remove all chars code from 768 to 879. Why this specific range? Are you sure this is correct?- Are you sure
str.normalize('NFKD')
is correct? Could you explain?
@kefranabg I'm not an expert either but: At that point I thought that the purpose of this is to make sure the urls are unambiguous and it's easy to type it and make sure you have the same url if it was re-encoded somewhere else somehow. For that reason I chose the NFKD decomposition: It's purpose is exactly that:
So it will make the
I got it from here: https://stackoverflow.com/questions/990904/remove-accents-diacritics-in-a-string-in-javascript#answer-37511463 Sorry for the late answer! |
I extended the tests with additional German Umlauts (besides For example try
|
PS: I am following this PR and its issue due to about-code/glossarify-md#27. As a workaround for users of glossarify-md I currently recommend to replace vuepress' internal slugger via vuepress' |
Looks good to me, but I'd like one more code review as I'm not 100% confident with this |
@kefranabg What has changed since me reporting that German diacritics are not handled correctly? In case it was a misunderstanding: please note that while I wrote, to have extended the tests I meant to have done so only locally to verify the PR. I haven't been pushing the changes yet, that's why tests still run green. But I don't think the current implementation is already sufficient with respect to unicode characters in general. @larionov I would push my changes with German characters, if you don't mind. This is going to make tests fail but I am hardly an expert on the topic, either, and don't know what the unicode character range would have to look like to cover all diacritics. |
@about-code I think the only way to solve this problem is to make your package to use the same slugger as vuepress does, and you can actually import shared-utils from your project and use it there too. The German characters difference is not the only difference between your slugger and this one and it doesn't make sense to make it equal character by character, it will break in the end anyway, sooner or later. |
... but I am not sure what the exact purpose of the fix is: is this all about just the cyrillic letter or about changing the way diacritics are handled in general? If it's just about the cyrillic character then just forget about my comments. |
@larionov thanks for your quick response. Just saw it. Edit:
Since the slugger in my project is a transitive dependency I can't do it this way - only the other way around. But I'm happy with the fact, that vuepress allows for replacing its slugger via its ❤️ |
Can it be merged? This PR has been open for almost a year and the problem it solves still exists. |
Hey @osipxd! Thanks for pinging this thread again. I noticed that there's a merge conflict on this since there's a |
|
d94c91f
to
b35e451
Compare
b35e451
to
d3c0664
Compare
@bencodezen @osipxd @ulivz done |
d3c0664
to
10716f6
Compare
Thank you @larionov! Really appreciate our work on this! |
Summary
fix #1815
What kind of change does this PR introduce? (check at least one)
If changing the UI of default theme, please provide the before/after screenshot:
Does this PR introduce a breaking change? (check one)
If yes, please describe the impact and migration path for existing applications:
The slugs in the existing URL's will change, so the external links referencing existing pages might break.
The PR fulfills these requirements:
fix #xxx[,#xxx]
, where "xxx" is the issue number)You have tested in the following browsers: (Providing a detailed version will be better.)
If adding a new feature, the PR's description includes:
To avoid wasting your time, it's best to open a feature request issue first and wait for approval before working on it.
Other information:
So diacritics has some errors in the replacements sets, and it wasn't updated for quite some time. I did a bit of research and looks like using String.normalize might work.
It is supported by node > 6 but not supported by IE, and the polyfill is huge, but,
@f3ltron, do I understand correctly that slugify util is only ever run on the server side and never in the browser? If so then it's not a problem then.
We can normalize the string (https://www.unicode.org/reports/tr15/#Compatibility_Equivalence_Figure) and in the process it will show accents as separate characters which can be removed with the regex (thanks @glebtv).
I think this will also help if, say, someone typed the links in the .md files manually, and they look the same but internally different - they will reference the same page anyway.