Fix misleading pattern name and documentation #109

mrezzamoradi · 2022-02-16T10:54:18Z

ALLOWED_CHARS_PATTERN and regex_pattern actually match the characters that are disallowed in the final slug.
Also, since the code lowercases the text in line 141, there's no need for a separate regex pattern for lowercase strings.

Currently, mypy understands the type as `Iterable[str]`, which doesn't match what should actually be passed in, which is `Iterable[Iterable[str]]` or, ideally, `Iterable[Tuple[str, str]]`
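
To illustrate the typing point, here is a minimal sketch. `apply_replacements` is a hypothetical helper written for this example, not the library's code; it only assumes that the argument in question is a sequence of (old, new) string pairs, which is what `Iterable[Tuple[str, str]]` expresses.

```python
from typing import Iterable, Tuple


def apply_replacements(text: str, replacements: Iterable[Tuple[str, str]] = ()) -> str:
    """Hypothetical helper: apply each (old, new) pair to the text in order."""
    for old, new in replacements:
        text = text.replace(old, new)
    return text


# Each element is a pair of strings, not a single string.
pairs = [('|', 'or'), ('%', 'percent')]
print(apply_replacements('10 | 20 %', pairs))  # 10 or 20 percent
```

With the annotation `Iterable[str]`, a type checker would expect plain strings as elements and could not verify that each element unpacks into exactly two values.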

un33k · 2022-02-16T17:58:36Z

@mrezzamoradi Thank you for raising this PR.

However, the pattern is to match all chars we want to keep.
Anything else outside the pattern will be replaced with the separator.

Code:

    # replace all other unwanted characters
    if lowercase:
        pattern = regex_pattern or ALLOWED_CHARS_PATTERN
    else:
        pattern = regex_pattern or ALLOWED_CHARS_PATTERN_WITH_UPPERCASE
    text = re.sub(pattern, DEFAULT_SEPARATOR, text)  # <-- look here

Example:

>>> allowedPattern = re.compile(r'[^-a-z0-9]+')
>>> text = 'Aboo$%$%$%9-999--asdfasfd'
>>> unwantedRemoved = re.sub(allowedPattern, '-', text)
>>> print(unwantedRemoved)
-boo-9-999--asdfasfd

Also:
Case sensitivity needs to follow the lowercase flag (default = False).
This is especially true when the separator is something like `ABZZzzzzDDD`.
Please refer to the tests here: https://github.com/un33k/python-slugify/blob/master/test.py#L97

un33k · 2022-02-16T18:05:46Z

@mrezzamoradi

Once again thank you for taking the time to raise this PR.

With that said, I am going to close this PR for now.
If the above explanation is not sufficient, or you have new updates, feel free to raise an updated PR or a new one.

Should you have time to add GitHub Actions to this repo, I would appreciate it as well.
Thx

mrezzamoradi · 2022-02-16T19:56:48Z

> @mrezzamoradi Thank you for raising this PR.
>
> However, the pattern is to match all chars we want to keep. Anything else outside the pattern will be replaced with the separator.
>
> Code:
>
>     # replace all other unwanted characters
>     if lowercase:
>         pattern = regex_pattern or ALLOWED_CHARS_PATTERN
>     else:
>         pattern = regex_pattern or ALLOWED_CHARS_PATTERN_WITH_UPPERCASE
>     text = re.sub(pattern, DEFAULT_SEPARATOR, text)  # <-- look here
>
> Example:
>
>     >>> allowedPattern = re.compile(r'[^-a-z0-9]+')
>     >>> text = 'Aboo$%$%$%9-999--asdfasfd'
>     >>> unwantedRemoved = re.sub(allowedPattern, '-', text)
>     >>> print(unwantedRemoved)
>     -boo-9-999--asdfasfd
>
> Also: Case sensitivity needs to follow the lowercase flag (default = False). This is especially true when the separator is something like `ABZZzzzzDDD`. Please refer to the tests here: https://github.com/un33k/python-slugify/blob/master/test.py#L97

Thanks for your comment @un33k
Still, you may have either misnamed ALLOWED_CHARS_PATTERN or misunderstood the re.sub function. Let's dig into it:

- `[^-a-z0-9]+` simply means "match any character that is not a dash, a-z, or 0-9", that is, characters like `?<>!@`. Is the code supposed to keep those characters? I guess not. They are the characters you want to disallow and replace with a separator.
- `re.sub(pattern, repl, string)` is simply the regex substitution (replacement) function. It replaces every part of the string that matches the pattern with `repl`. So the aim of `re.sub(pattern, DEFAULT_SEPARATOR, text)` is clearly not to keep what the pattern matches. It is the opposite: to replace or remove it from the text.
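
The two points above can be checked directly. This is a small sketch using only the standard `re` module and the pattern and sample text already quoted in this thread; the variable names are chosen for this example.

```python
import re

# The negated character class quoted in this thread.
pattern = re.compile(r'[^-a-z0-9]+')

text = 'Aboo$%$%$%9-999--asdfasfd'

# findall shows what the pattern matches: the runs of characters
# that are NOT a dash, a-z, or 0-9.
matched = pattern.findall(text)

# sub replaces every matched run with the separator and leaves the rest alone.
result = pattern.sub('-', text)

print(matched)  # ['A', '$%$%$%']
print(result)   # -boo-9-999--asdfasfd
```

The matched runs are exactly the characters that disappear from the result, which is why a name describing them as disallowed fits the pattern better.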

I hope this explanation sheds some light on the confusion here. The same reasoning partly explains why I removed the lowercase regex pattern (because the matching characters would always be a subset of the matching characters when using ALLOWED_CHARS_PATTERN_WITH_UPPERCASE). Also note that the current default value of the lowercase flag is True, and the code converts the text to lowercase before any pattern substitution.
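
A quick way to see why one pattern is enough: once the text has been lowercased, a lowercase-only pattern and an uppercase-aware pattern replace the same characters. The sketch below assumes the uppercase-aware pattern is `[^-a-zA-Z0-9]+`; the names `LOWER_ONLY`, `WITH_UPPER`, and `strip_disallowed` are made up for this example and are not the library's code.

```python
import re

# Assumed definitions, modelled on the pattern quoted in this thread.
LOWER_ONLY = re.compile(r'[^-a-z0-9]+')
WITH_UPPER = re.compile(r'[^-a-zA-Z0-9]+')


def strip_disallowed(text, pattern, lowercase=True):
    """Hypothetical helper: optionally lowercase, then replace disallowed runs."""
    if lowercase:
        text = text.lower()
    return pattern.sub('-', text)


samples = ['Hello World!', 'ABZZzzzzDDD', 'Aboo$%$%$%9-999--asdfasfd']
for sample in samples:
    a = strip_disallowed(sample, LOWER_ONLY)
    b = strip_disallowed(sample, WITH_UPPER)
    # After lowercasing, both patterns give the same output.
    print(sample, '->', a, a == b)
```

With `lowercase=False`, only the uppercase-aware pattern preserves capital letters, so in this sketch it covers both cases.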

mrezzamoradi · 2022-02-16T19:57:59Z

btw this PR passes all the tests

un33k · 2022-02-16T22:50:12Z

@mrezzamoradi I enabled GitHub Actions to ensure we have a trusted CI.

As soon as the tests passed, I decided to pull your request in.
https://github.com/un33k/python-slugify/tree/v6.0.1

I do appreciate your contribution.

I have sent a request, should you decide to join this project as a contributor.

fahhem and others added 4 commits May 12, 2021 13:25

Add better typing for slugify.slugify

1097c23

Currently, mypy understands the type as `Iterable[str]`, which doesn't match what should actually be passed in, which is `Iterable[Iterable[str]]` or, ideally, `Iterable[Tuple[str, str]]`

whitespace around =

a90d967

fix misleading pattern name and documentation

5810e3d

fix README.md and cli doc as well

376f949

un33k closed this Feb 16, 2022

un33k reopened this Feb 16, 2022

Merge branch 'staging' into master

9640069

un33k merged commit dce2189 into un33k:staging Feb 16, 2022
