Licensing problem with unidecode #3311
Comments
Thanks for reporting this. It seems like others have had this problem with unidecode in the past: avian2/unidecode#1.
https://github.com/kmike/text-unidecode is also a port of Perl's
Seems to me like this is an option worth looking at. |
Hello, just to add some insight to this discussion: the bad behavior of the stock Django slugify with non-Latin characters is a really old and painful story. Slugify will happily accept non-Latin characters in its input and will just convert them to spaces (!)... Yes, since django 1.9 slugify actually accepts an
Due to this behavior of slugify I have experienced problems with Greek characters being ignored in slugs in various projects like django-taggit (jazzband/django-taggit#273), django-crispy-forms (django-crispy-forms/django-crispy-forms#396) and Wagtail (before some of the unidecode patches were provided).
So the correct solution would be to create slugs in which Unicode characters are transliterated to their Latin counterparts. For this, unfortunately (because of its restrictive license), unidecode is (at least for me) the only proper way of implementing a Unicode-friendly slugify (the text-unidecode proposed by @thibaudcolas seems abandoned)!
Now, I think the best way to resolve the problem of unidecode's restrictive license would be to remove it from the dependencies but still use it in Wagtail if it has been installed anyway. This is the solution that was used in django-taggit (see the discussion on jazzband/django-taggit#273 and the relevant patch at jazzband/django-taggit#315) and from my understanding it does not introduce any GPL-related licensing problems. |
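For illustration, the "use it if it is installed" idea can be sketched as an optional import. This is only a minimal sketch under stated assumptions: the helper name `to_ascii` and the stdlib fallback are hypothetical and are not taken from the django-taggit patch or from Wagtail.

```python
import unicodedata

try:
    # unidecode is GPL-licensed, so this sketch treats it as an optional extra.
    from unidecode import unidecode
except ImportError:
    unidecode = None


def to_ascii(value: str) -> str:
    """Hypothetical helper: transliterate with unidecode when it is installed."""
    if unidecode is not None:
        return unidecode(value)
    # Assumed fallback: decompose accented characters (NFKD) and drop
    # everything that still is not ASCII. Non-Latin scripts are simply lost.
    decomposed = unicodedata.normalize("NFKD", value)
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

With this shape the project itself does not declare unidecode as a dependency; whoever integrates it decides whether to install the package.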
I was not aware of how the taggit developers had circumvented the licensing problem. However, the problem is still there; it is just passed on to the integrator of the Wagtail platform if they want to work with non-Latin characters. But they have the choice, and I think that is already much less penalizing. |
As of Wagtail 1.6, URL slugs preserve unicode characters by default (and the
Given the less-public-facing nature of those filenames, I think we could reasonably drop unidecode there in favour of a simpler unicode-to-ascii conversion - a good candidate that already exists in the Wagtail codebase is
The bigger problem will be wagtailforms, I believe - if I'm not mistaken, changing the function that converts from human-readable labels to internal form field names will cause existing form submission data to be 'lost' (they'll still exist in the database, but we lose the ability to map them back to the original fields when exporting). Related: #3088 |
Hello @gasman, I wasn't aware that Wagtail URL slugs preserve unicode characters by default. In my opinion this behavior is not ideal: try visiting the URL for a Greek Wikipedia page, for example https://el.wikipedia.org/wiki/Ελλάδα (using either Chrome or Firefox), and then copy and paste that URL from your browser into a text editor. You'll get the following: https://el.wikipedia.org/wiki/%CE%95%CE%BB%CE%BB%CE%AC%CE%B4%CE%B1
For me this is not acceptable; that's why I recommend never using Unicode characters in URLs and always transliterating them to Latin ones. Also, for filenames and form fields I'd really prefer the transliterated version instead of the
So please allow using unidecode if it is installed -- it is really required for non-English speakers! |
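The copy/paste behaviour described above is ordinary percent-encoding of the UTF-8 bytes of the path. A small sketch using only the Python standard library reproduces it (the variable names are illustrative):

```python
from urllib.parse import quote, unquote

path = "/wiki/Ελλάδα"

# Each non-ASCII character is encoded as UTF-8 and every byte becomes %XX.
encoded = quote(path)
print(encoded)  # /wiki/%CE%95%CE%BB%CE%BB%CE%AC%CE%B4%CE%B1

# Decoding gives back the readable form that the browser shows in the address bar.
decoded = unquote(encoded)
print(decoded == path)  # True
```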
Hi @spapas - please see the discussion at #1443. I guess different languages / nationalities have different thoughts about the acceptability of needing a Latin transliteration - and if we have to choose one way or the other, I'd rather choose the option that doesn't add an extra task for editors. (Also, the fact that Wikipedia have decided in favour of Unicode URLs, despite the messy copy/paste behaviour, surely can't be discounted :-) ) Either way, slug generation usually happens client-side (the exception being page instances that are created outside of the admin interface, e.g. import scripts) so there would be extra work involved in hooking up unidecode there.
Agreed - happy to support unidecode here if available. |
A solution could be to use our own script to generate transliterations. We could also parse [these official transliteration XML files] to fetch only the most basic rules (see syntax detail); I guess that would already give us excellent support for non-Latin languages. That's more work than the previous solution, but it would be much more complete too. |
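As a rough sketch of what a home-grown, rule-based transliteration could look like: the table below is hypothetical and hand-written, covers only a handful of Greek letters, and is not derived from the XML files mentioned above.

```python
# Hypothetical per-character rules; a real table would be generated from a
# complete source rather than typed by hand.
GREEK_TO_LATIN = {
    "\u0395": "E",  # Ε
    "\u03b1": "a",  # α
    "\u03ac": "a",  # ά (alpha with tonos, precomposed)
    "\u03b4": "d",  # δ
    "\u03bb": "l",  # λ
}
_TABLE = str.maketrans(GREEK_TO_LATIN)


def transliterate(text: str) -> str:
    """Apply the rules; characters without a rule pass through unchanged."""
    return text.translate(_TABLE)


print(transliterate("Ελλάδα"))  # Ellada
```

Rules that depend on context (digraphs, position in the word) would not fit into a plain character table like this one.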
Just stumbled on https://pypi.python.org/pypi/text-unidecode, a unidecode replacement licensed under the Artistic License. (Still need to confirm that it's compatible with BSD, and if we do switch we'll still need to handle form builder fields as per #3088 (comment) so that we don't lock out old form submission data.) |
@gasman the Artistic License version 1 might be a problem for commercial users of Wagtail as it uses terms like "reasonable copying fee":
I'm not a lawyer, but I don't want to worry about whether I'm charging for my services or for the distribution of third-party packages I use in my work. |
Is this bug still being looked at? No update for some time. |
@connorsml No progress on it lately, but it's something we're keen to resolve. Contributions welcome! |
So the issues with changing to another library are:
|
I wonder if the Ruby version matches the current functionality of the python implementation. Perhaps we could port this to python. |
@connorsml Finding an alternative library with comparable functionality to unidecode isn't really an issue - unidecode is only used in fairly minor places (filenames for images/documents, and generating field names for the form builder) where it's not the end of the world if we switch to a less-smart conversion algorithm, such as
The bigger issue is indeed with migrating existing sites, specifically ones using the form builder (see #3088) - if we change the conversion algorithm at all then we risk making existing form submissions inaccessible. |
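To make the migration risk concrete, here is a minimal sketch of how a changed conversion breaks lookups. Both functions are hypothetical stand-ins, not the actual form builder code or either library; the label and the stored value are made up.

```python
import unicodedata


def old_field_name(label: str) -> str:
    # Stand-in for the existing algorithm: strip accents, then drop non-ASCII.
    text = unicodedata.normalize("NFKD", label)
    text = text.encode("ascii", "ignore").decode("ascii")
    return text.lower().replace(" ", "_")


def new_field_name(label: str) -> str:
    # Stand-in for a "less smart" replacement: drop non-ASCII outright.
    text = label.encode("ascii", "ignore").decode("ascii")
    return text.lower().replace(" ", "_")


label = "T\u00e9l\u00e9phone"  # "Téléphone"

# A submission saved while the old algorithm was in use.
stored_submission = {old_field_name(label): "555-0100"}

print(old_field_name(label))  # telephone
print(new_field_name(label))  # tlphone
print(stored_submission.get(new_field_name(label)))  # None
```

The data is still present under the old key, but code that derives the key from the label with the new function can no longer find it.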
This is in preparation for supporting implementations other than unidecode, since it has a GPLv2 license. See wagtail#3311
See #3088 (comment) for a proposal of how to deal with the form builder migration. Forgot that I'd already written this up :-) |
The Postgres search backend's dependency on unidecode has now been removed, in #5514. |
More progress towards this goal |
I found this - https://pypi.org/project/anyascii/ I think it might be a suitable drop-in replacement for unidecode, although it may produce different output (at first glance the approach to æ is different). However, it looks like it could work once the above-mentioned PR resolves the backwards compatibility with form submission data. The licence is ISC, which hopefully is ok. @geonux would this licence be appropriate? |
The ISC license seems to be compatible, as it is a variant of the BSD license (like the Wagtail license). So for me it is a good solution. |
+1 for anyascii; I tested it and it works perfectly:
The ISC licence seems to be the same as the MIT licence with some changes in language, so it should be fine for use in any kind of project! Thank you for mentioning it, I'm going to slowly replace unidecode with anyascii in my projects :) |
#6093 is now merged - this eliminates the use of unidecode in the form builder, which was the blocker for swapping out unidecode for an equivalent-but-not-100%-exact replacement such as anyascii. The dependency on unidecode will have to remain until 2.12 to provide an upgrade path for anyone with saved form data that's stored under the unidecode-created field names. @rjmackay (or anyone else...) - I'd be happy to accept a PR that updates the |
- Add anyascii to replace unidecode - Update wagtail.core.utils.string_to_ascii to use anyascii. - Anyascii has a similar but not exactly the same encoding - see updates to tests. Refs #3311
Completed in #6244 - the unidecode dependency will be left in place until 2.12 so that developers have the window of time of the 2.10 and 2.11 releases to deploy a new Wagtail version and have their form data migrated. (After that, if they skip straight from 2.9 to 2.12 then they'll need to install unidecode themselves to do the migration.) |
Not all Wagtail dependencies have the same license. This is not a problem in most cases: many licenses are compatible with each other, in particular permissive (non-copyleft) licenses such as BSD, MIT, ...
Wagtail itself is made available under one of these permissive licenses: the BSD 3-Clause license.
The problem comes from Unidecode, which is GPL v2 (https://github.com/avian2/unidecode/blob/master/LICENSE). This is a so-called copyleft ("contaminating") license. All derivative works resulting from the integration of the software component are therefore theoretically affected and should be distributed under the same license. In other words, Wagtail would have to be GPL v2.
Unfortunately, this strong copyleft causes many integration problems, which limits how your software can be used and integrated. Moreover, it does not really fit the philosophy of the Python community, which favors openness and permissive licenses.
In addition, the GPL v2 license is incompatible with another component: Django-treebeard, which is released under Apache v2 (https://github.com/django-treebeard/django-treebeard/blob/master/LICENSE).
More information about that:
As things stand, the licenses of Wagtail and its dependencies are therefore incompatible.
Can you do the necessary tasks to correct this problem?
Removing the Unidecode component is probably the best solution. It keeps the current license and therefore the Wagtail user community.
For your information, I am not the owner of any of these libraries. I just want to integrate Wagtail into my own developments, and I have strong constraints that require me to produce "clean" software.