Parameters to excludes custom words #14

Closed
service-paradis opened this issue Dec 3, 2020 · 14 comments

@service-paradis
Contributor

Thanks for this action!

I would like to use something similar for https://github.com/simple-icons/simple-icons

It would be really beneficial if we could filter some words out of the algo. For example, we have a lot of requests that look like Request: brandname icon. There would be a lot of false positives if we used your action as-is. But if we could exclude specific words like Request and icon, it would probably catch a good number of duplicates.

bubkoo added a commit that referenced this issue Dec 4, 2020
filter newly created issue by title, any matched issue would not go on detection

re #14
@bubkoo bubkoo self-assigned this Dec 4, 2020
@bubkoo
Member

bubkoo commented Dec 4, 2020

@service-paradis The filter input will be supported in the next release. Detection is skipped for any newly created issue whose title matches the filter. The filter can be a single string or several space-separated strings, and it is matched with https://www.npmjs.com/package/anymatch.

@service-paradis
Contributor Author

@bubkoo Thanks for your work and the follow up!
The change is great. It is not exactly what I need, though.

Here is an example.

  • If I open Request Ubuntu icon and Request Fedora icon, they will be flagged as potential duplicates.

I would like the algo to exclude a custom list of words before comparing the title. For example, having something like:

excludes:
  - Request
  - icon
  • This way, the algo will compare Ubuntu with Fedora (Request and icon are dropped from both titles). They won't be flagged as potential duplicates.
  • If I have the titles Ubuntu icon and Request Ubuntu icon, the algo will compare Ubuntu with Ubuntu. They will be flagged as potential duplicates.
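
The exclusion behaviour requested above can be sketched as follows. This is only an illustration, not the action's code: stripExcludedWords and its parameters are hypothetical names, and the sketch assumes titles are split on whitespace and that excluded words match whole tokens exactly (case-sensitive).

```typescript
// Hypothetical helper: drop excluded words from a title before comparison.
// Assumes whitespace-separated tokens and exact, case-sensitive matches.
function stripExcludedWords(title: string, excludes: string[]): string {
  return title
    .split(/\s+/)
    .filter((word) => word.length > 0 && excludes.indexOf(word) === -1)
    .join(' ')
}

const excludedWords = ['Request', 'icon']
const ubuntu = stripExcludedWords('Request Ubuntu icon', excludedWords) // "Ubuntu"
const fedora = stripExcludedWords('Request Fedora icon', excludedWords) // "Fedora"
const plain = stripExcludedWords('Ubuntu icon', excludedWords) // "Ubuntu"
```

With the excluded words gone, the first two titles no longer share any token, while the third reduces to the same text as the first.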

@bubkoo
Member

bubkoo commented Dec 4, 2020

@service-paradis You can use a config like this:

jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: bubkoo/potential-duplicates@v1
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          filter: 'Request ** icon'

And these are my test issues: #15 and #16.

@service-paradis
Contributor Author

service-paradis commented Dec 4, 2020

Thanks again @bubkoo 😄
Since people are not that disciplined, would it also work for variations other than the current filter?
For example, people can request icons using these kinds of titles:

  • Request: Ubuntu icon
  • Request Ubuntu icon
  • Request: Ubuntu
  • Request Ubuntu
  • Add Ubuntu icon
  • Add Ubuntu
  • Ubuntu icon
  • Ubuntu
  • ...

@bubkoo
Member

bubkoo commented Dec 7, 2020

@service-paradis You can specify multiple filters, one per line, such as:

jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: bubkoo/potential-duplicates@v1
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          filter: |
            Request ** icon
            Add ** icon
            ** icon **
            ** Ubuntu **

@mondeja
Contributor

mondeja commented Dec 7, 2020

Sorry @bubkoo, but this is not what we need.

If I'm not mistaken, when an issue is opened and its title matches at least one filter (that is, its title is "valid"), it will not be checked for potential duplicates. What we need is different: regardless of the opened issue's title (without "validating" it), remove the words that are not needed from it, and also remove those words from the other titles it is compared against, to improve the matching between titles.

As @service-paradis pointed out, we need an "exclude" function. Is that something you plan to include or not?
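
The difference between the two behaviours can be sketched like this. All names here are hypothetical, and matchesFilter is a deliberately simplified stand-in that treats only "**" as a wildcard; it does not reproduce anymatch's real matching rules.

```typescript
// Simplified stand-in for glob matching: only "**" is treated as a wildcard.
// This is NOT anymatch's real semantics, just enough to show the control flow.
function matchesFilter(title: string, pattern: string): boolean {
  const source = pattern
    .split('**')
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*')
  return new RegExp(`^${source}$`).test(title)
}

// "filter" as described in the thread: a matching title is skipped entirely.
function shouldSkipDetection(title: string, filters: string[]): boolean {
  return filters.some((pattern) => matchesFilter(title, pattern))
}

// "exclude" as requested: the title is still checked, minus some words.
function titleForComparison(title: string, excludes: string[]): string {
  return title
    .split(/\s+/)
    .filter((word) => word.length > 0 && excludes.indexOf(word) === -1)
    .join(' ')
}

const skipped = shouldSkipDetection('Request Ubuntu icon', ['Request ** icon']) // true
const compared = titleForComparison('Request Ubuntu icon', ['Request', 'icon']) // "Ubuntu"
```

In the first case the issue never reaches duplicate detection; in the second it does, but only the distinguishing part of the title takes part in the comparison.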

bubkoo added a commit that referenced this issue Dec 8, 2020
replace keyworlds with empty string

re #14
@bubkoo
Member

bubkoo commented Dec 8, 2020

@mondeja @service-paradis Keywords specified in exclude will be replaced with an empty string before detection.

jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: bubkoo/potential-duplicates@v1
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          exclude: |
            request
            icon
            add
            ubuntu

@mondeja
Contributor

mondeja commented Dec 8, 2020

What about removing them instead of replacing them with empty strings? Empty strings will also be compared, increasing the possibility of false positives. Check this test: the action is comparing "" Ansys "" with "" Ubuntu "" and raising false positives. What is the point of replacing the exclusions with empty strings?
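
To see why empty tokens can inflate a score, here is a small sketch. The thread does not say which similarity measure the action uses, so the Dice coefficient over tokens below is an assumption chosen purely for illustration, and diceSimilarity is a hypothetical name.

```typescript
// Assumed measure (not confirmed by the thread): Dice coefficient over tokens,
// counting each shared token once per occurrence.
function diceSimilarity(a: string[], b: string[]): number {
  if (a.length + b.length === 0) return 0
  const remaining = b.slice()
  let shared = 0
  for (const token of a) {
    const index = remaining.indexOf(token)
    if (index !== -1) {
      shared += 1
      remaining.splice(index, 1) // each token in b can be matched only once
    }
  }
  return (2 * shared) / (a.length + b.length)
}

// Two of the three tokens on each side are empty strings, and they all "match".
const withEmpties = diceSimilarity(['', 'Ansys', ''], ['', 'Ubuntu', ''])
// Without the empty tokens the two titles share nothing.
const withoutEmpties = diceSimilarity(['Ansys'], ['Ubuntu'])
```

Under this assumed measure the first comparison scores well above one half even though the only real words differ, while the second scores zero.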

bubkoo added a commit that referenced this issue Dec 9, 2020
Empty strings will be compared also, increasing the possibility of false positives.

re #14
@bubkoo bubkoo closed this as completed Jan 4, 2021
@service-paradis
Contributor Author

Thanks again for your work on this @bubkoo

Unfortunately, I don't think the changes you made will totally solve the previous problem with unnecessary spaces.

For example, if we want to exclude "Request" and "icon" from "Request Ubuntu icon"

before:

  .reduce((memo, keyworld) => memo.replace(keyworld, ' '), title)

gives "⎵⎵Ubuntu⎵⎵"

after:

  .reduce((memo, keyworld) => memo.replace(keyworld, ''), title)
  .replace(/\s+/, ' ')

gives "⎵Ubuntu⎵"

For the comparison, you are creating arrays using split(' '). We can see a slight improvement, as it'll compare ["", "Ubuntu", ""] instead of ["", "", "Ubuntu", "", ""]. But it can still bring false positives.

Maybe trimming leading and trailing spaces before splitting would solve the above.
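
A minimal sketch of that suggestion, assuming the comparison works on the array produced by split(' '). The tokenize helper is a hypothetical name, not a function from the action.

```typescript
// Hypothetical tokenizer: collapse every whitespace run, trim, then split.
function tokenize(title: string): string[] {
  const cleaned = title.replace(/\s+/g, ' ').trim()
  // ''.split(' ') would yield [''], so an empty title maps to an empty list.
  return cleaned.length > 0 ? cleaned.split(' ') : []
}

const untrimmed = ' Ubuntu '.split(' ') // ['', 'Ubuntu', '']
const trimmed = tokenize(' Ubuntu ') // ['Ubuntu']
const nothingLeft = tokenize('   ') // []
```

The last case covers a title that consists only of excluded words: it yields no tokens at all rather than a single empty string.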

@bubkoo
Member

bubkoo commented Jan 12, 2021

@service-paradis It now trims leading and trailing spaces before returning.

  export function formatTitle(title: string) {
    const exclude = core.getInput('exclude')
    if (exclude) {
      return exclude
        .split(/[\s\n]+/)
        .map((keyworld) => keyworld.trim())
        .filter((keyworld) => keyworld.length > 0)
        .reduce((memo, keyworld) => memo.replace(keyworld, ''), title)
        .replace(/\s+/, ' ')
        .trim()
    }
    return title
  }
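
One detail worth noting about the snippet above: in JavaScript, replace with a string pattern, or with a regular expression lacking the g flag, only replaces the first match. The sketch below shows one way to handle repeated words and repeated whitespace runs. formatTitleSketch is a hypothetical name, and the exclude list is passed as a parameter (instead of being read through core.getInput) so that the example runs standalone.

```typescript
// Hypothetical variant that removes every occurrence of each keyword.
function formatTitleSketch(title: string, exclude: string): string {
  return exclude
    .split(/\s+/)
    .filter((keyword) => keyword.length > 0)
    // split/join removes every occurrence, not only the first one
    .reduce((memo, keyword) => memo.split(keyword).join(''), title)
    // the g flag collapses every whitespace run, not only the first one
    .replace(/\s+/g, ' ')
    .trim()
}

const firstOnly = 'icon Ubuntu icon'.replace('icon', '') // ' Ubuntu icon'
const allRemoved = formatTitleSketch('icon Ubuntu   big icon', 'icon') // 'Ubuntu big'
```

Like the original, this removes substrings rather than whole words, so an excluded word that appears inside a longer word would also be cut.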

@ericcornelissen

Would it make sense to make the list of excluded words case insensitive?

(Also, keyworld should probably be keyword, not sure if you copied this snippet straight from the source.)

@service-paradis
Contributor Author

Thank you for your work @bubkoo!

Would it make sense to make the list of excluded words case insensitive?

I agree that it would be better if the exclusions were case insensitive. What do you think @bubkoo?

Yes, the typo comes from the source itself.

bubkoo added a commit that referenced this issue Jan 22, 2021
keyworld => keyword

re #14
bubkoo added a commit that referenced this issue Jan 22, 2021
@bubkoo
Member

bubkoo commented Jan 22, 2021

@service-paradis Thanks for your tips and suggestions.

@service-paradis
Contributor Author

@bubkoo I see that you added case insensitivity when matching titles.

It would also be great to add case insensitivity when removing excluded words. For example, here, I need to add every word in different cases (e.g. request and Request).
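
A sketch of case-insensitive removal, under the assumption that excluded entries are plain words (they start and end with a letter, digit, or underscore, so \b word boundaries apply). removeExcluded and escapeRegExp are hypothetical helpers, not the action's API.

```typescript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Hypothetical helper: remove excluded whole words regardless of case.
function removeExcluded(title: string, excludes: string[]): string {
  return excludes
    .filter((word) => word.length > 0)
    .reduce(
      // "g" removes every occurrence, "i" ignores case
      (memo, word) => memo.replace(new RegExp(`\\b${escapeRegExp(word)}\\b`, 'gi'), ''),
      title,
    )
    .replace(/\s+/g, ' ')
    .trim()
}

const lowerCaseList = ['request', 'icon']
const mixedCase = removeExcluded('Request Ubuntu ICON', lowerCaseList) // "Ubuntu"
const partialWord = removeExcluded('Iconic Ubuntu icon', lowerCaseList) // "Iconic Ubuntu"
```

With this approach a single lower-case entry per word is enough, and the word boundaries keep longer words that merely contain an excluded word intact.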

bubkoo added a commit that referenced this issue Jan 26, 2021
remove matched keywords from title

re #14
github-actions bot pushed a commit to iv-org/close-potential-duplicates that referenced this issue Jun 3, 2023
# 1.0.0 (2023-06-03)

### Bug Fixes

* 🐛 Avoid false positives if issue title is empty after excluding ([wow-actions#20](https://github.com/iv-org/close-potential-duplicates/issues/20)) ([e58dfb2](e58dfb2))
* 🐛 Exclude pull requests searching for duplicates ([wow-actions#19](https://github.com/iv-org/close-potential-duplicates/issues/19)) ([eebe64d](eebe64d))
* 🐛 excluded words should be case insensitivity ([f6ebc6e](f6ebc6e)), closes [wow-actions#14](https://github.com/iv-org/close-potential-duplicates/issues/14)
* 🐛 remove matched keywords ([0d07e57](0d07e57)), closes [wow-actions#14](https://github.com/iv-org/close-potential-duplicates/issues/14)
* 🐛 trimming leading and trailing spaces in issue title ([e0aa68f](e0aa68f)), closes [wow-actions#12](https://github.com/iv-org/close-potential-duplicates/issues/12)
* 🐛 typos ([393ab50](393ab50)), closes [wow-actions#14](https://github.com/iv-org/close-potential-duplicates/issues/14)
* 🐛 typos ([2bfd89d](2bfd89d))

### Features

* ✨ exclude keyworlds in title before detecting ([6f12b76](6f12b76)), closes [wow-actions#14](https://github.com/iv-org/close-potential-duplicates/issues/14)
* ✨ issue title filter ([7f95e44](7f95e44)), closes [wow-actions#14](https://github.com/iv-org/close-potential-duplicates/issues/14)
* ✨ make the list of excluded words case insensitive ([8dde4ad](8dde4ad)), closes [wow-actions#14](https://github.com/iv-org/close-potential-duplicates/issues/14)
* ✨ support multi line filters ([4e8f2cd](4e8f2cd))
* ✨ support reactions ([9908687](9908687))

### Performance Improvements

* ⚡️ init ([05eddff](05eddff))