Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix misleading pattern name and documentation #109

Merged
merged 5 commits into from Feb 16, 2022
Merged

Fix misleading pattern name and documentation #109

merged 5 commits into from Feb 16, 2022

Conversation

mrezzamoradi
Copy link
Collaborator

The ALLOWED_CHARS_PATTERN and regex_pattern are actually disallowed characters in the final slug.
Also since the code lowers the case in line 141, there's no need for a separate regex pattern for lowercase strings

fahhem and others added 4 commits May 12, 2021 13:25
Currently, mypy understands the type as `Iterable[str]`, which doesn't match what should actually be passed in, which is `Iterable[Iterable[str]]` or, ideally, `Iterable[Tuple[str, str]]`
@un33k
Copy link
Owner

un33k commented Feb 16, 2022

@mrezzamoradi Thank you for raising this PR.

However, the pattern is to match all chars we want to keep.
Anything else outside the pattern will be replaced with the separator.

Code:

 # replace all other unwanted characters
    if lowercase:
        pattern = regex_pattern or ALLOWED_CHARS_PATTERN
    else:
        pattern = regex_pattern or ALLOWED_CHARS_PATTERN_WITH_UPPERCASE
    text = re.sub(pattern, DEFAULT_SEPARATOR, text). # <-- look here 

Example:

>>> allowedPattern = re.compile(r'[^-a-z0-9]+')
>>> text = 'Aboo$%$%$%9-999--asdfasfd'
>>> unwantedRemoved = re.sub(allowedPattern, '-', text)
>>> print(unwantedRemoved)
'-boo-9-999--asdfasfd'
>>>

Also:
Case sensitive needs to follow the lowercase flag (default = False).
This is specially true when the separator is something like: `ABZZzzzzDDD';
Please refer to the tests here. https://github.com/un33k/python-slugify/blob/master/test.py#L97

@un33k
Copy link
Owner

un33k commented Feb 16, 2022

@mrezzamoradi

Once again thank you for taking the time to raise this PR.

With that said, I am going to closing this PR for now.
If the above explanation is not sufficient, or you have new updates, feel free to raise an updated PR or a new one.

Should you have time to add github actions to this repo, I would appreciate it as well.
Thx

@un33k un33k closed this Feb 16, 2022
@mrezzamoradi
Copy link
Collaborator Author

mrezzamoradi commented Feb 16, 2022

@mrezzamoradi Thank you for raising this PR.

However, the pattern is to match all chars we want to keep. Anything else outside the pattern will be replaced with the separator.

Code:

 # replace all other unwanted characters
    if lowercase:
        pattern = regex_pattern or ALLOWED_CHARS_PATTERN
    else:
        pattern = regex_pattern or ALLOWED_CHARS_PATTERN_WITH_UPPERCASE
    text = re.sub(pattern, DEFAULT_SEPARATOR, text). # <-- look here 

Example:

>>> allowedPattern = re.compile(r'[^-a-z0-9]+')
>>> text = 'Aboo$%$%$%9-999--asdfasfd'
>>> unwantedRemoved = re.sub(allowedPattern, '-', text)
>>> print(unwantedRemoved)
'-boo-9-999--asdfasfd'
>>>

Also: Case sensitive needs to follow the lowercase flag (default = False). This is specially true when the separator is something like: `ABZZzzzzDDD'; Please refer to the tests here. https://github.com/un33k/python-slugify/blob/master/test.py#L97

Thanks for your comment @un33k
Still you might have either misnamed the ALLOWED_CHARS_PATTERN or misunderstood the re.sub function. let's dig into it:

  • [^-a-z0-9]+ simply means match any character that is not in the list of dash, a-z, or 0-9. It means characters like ?<>!@. Now is the code supposed to keep the latter characters? I guess not. Those are the characters you want to ignore/disallow and replace them by a separator

  • re.sub(pattern, repl, string) is simply regex substitution (replacement) function. It replaces any character in the string that matches the pattern, with repl. Now again obviously the aim of re.sub(pattern, DEFAULT_SEPARATOR, text) is not to keep the pattern in the text. It's quite the opposite of keeping, to replace/remove them from the text.

I hope this explanation brings some light on the confusion here. The same explanation can explain some part of why I've removed the lowercase regex pattern (because the matching characters would always be a subset of the matching characters when using ALLOWED_CHARS_PATTERN_WITH_UPPERCASE) . Also note that the current default value of lowercase flag is True and it converts the text to lowercase before any substitution of patterns.

@mrezzamoradi
Copy link
Collaborator Author

btw this PR passes all the tests

@un33k un33k reopened this Feb 16, 2022
@un33k un33k merged commit dce2189 into un33k:staging Feb 16, 2022
@un33k
Copy link
Owner

un33k commented Feb 16, 2022

@mrezzamoradi I enabled github actions to ensure we have a trusted CI.

As soon as the tests passed, I decided to pull your request in.
https://github.com/un33k/python-slugify/tree/v6.0.1

I do appreciate your contribution.

I have sent a request, should you decide to join this project as a contributor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants