Parameters to excludes custom words #14

service-paradis · 2020-12-03T01:29:29Z

Thanks for this action!

I would like to use something similar for https://github.com/simple-icons/simple-icons

It would be really beneficial if we could filter some words out before the algo compares titles. For example, we have a lot of requests that look like "Request: brandname icon". There would be a lot of false positives if we used your action as-is. But if we could exclude specific words like "Request" and "icon", it would probably catch a good number of duplicates.

bubkoo · 2020-12-04T02:45:55Z

@service-paradis A filter input will be supported in the next release. Detection is skipped for any newly created issue whose title matches the filter. The filter can be a single string or several space-separated strings, and is matched with https://www.npmjs.com/package/anymatch.

service-paradis · 2020-12-04T14:06:39Z

@bubkoo Thanks for your work and the follow up!
The change is great. It is not exactly what I need, though.

Here is an example.

If I open Request Ubuntu icon and Request Fedora icon, they will be flagged as potential duplicates.

I would like the algo to exclude a custom list of words before comparing the title. For example, having something like:

excludes:
  - Request
  - icon

This way, the algo will compare ~~Request~~ Ubuntu ~~icon~~ with ~~Request~~ Fedora ~~icon~~. They won't be flagged as potential duplicates.
If I have the titles Ubuntu icon and Request Ubuntu icon, the algo will compare Ubuntu ~~icon~~ with ~~Request~~ Ubuntu ~~icon~~. They will be flagged as potential duplicates.
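
The requested behaviour can be illustrated with a minimal sketch. It assumes a plain word-set overlap (Jaccard) as the similarity measure, which is only a stand-in for whatever the action really uses; `titleWords` and `similarity` are hypothetical names, and the ASCII-only word split is a simplification.

```typescript
// Hypothetical helper: lower-cased words of a title, minus the excluded ones.
function titleWords(title: string, excludes: string[]): Set<string> {
  const excluded = new Set(excludes.map((w) => w.toLowerCase()))
  const words = title
    .toLowerCase()
    .split(/[^a-z0-9]+/) // simplification: ASCII letters and digits only
    .filter((w) => w.length > 0 && !excluded.has(w))
  return new Set(words)
}

// Assumed similarity measure: shared words divided by all distinct words.
function similarity(a: string, b: string, excludes: string[] = []): number {
  const wa = titleWords(a, excludes)
  const wb = titleWords(b, excludes)
  if (wa.size === 0 || wb.size === 0) return 0
  let shared = 0
  wa.forEach((w) => {
    if (wb.has(w)) shared += 1
  })
  return shared / (wa.size + wb.size - shared)
}

const excludes = ['Request', 'icon']
console.log(similarity('Request Ubuntu icon', 'Request Fedora icon')) // 0.5
console.log(similarity('Request Ubuntu icon', 'Request Fedora icon', excludes)) // 0
console.log(similarity('Ubuntu icon', 'Request Ubuntu icon', excludes)) // 1
```

Without exclusions the two different requests share two of four distinct words; with exclusions only the brand names are compared.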

bubkoo · 2020-12-04T14:59:44Z

@service-paradis You can configure it like this:

jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: bubkoo/potential-duplicates@v1
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          filter: 'Request ** icon'

And these are my test issues: #15 #16

service-paradis · 2020-12-04T17:53:19Z

Thanks again @bubkoo 😄
Since people are not that disciplined, would it also work for variations that the current filter does not cover?
For example, people can request icons using these kinds of titles:

Request: Ubuntu icon
Request Ubuntu icon
Request: Ubuntu
Request Ubuntu
Add Ubuntu icon
Add Ubuntu
Ubuntu icon
Ubuntu
...

bubkoo · 2020-12-07T02:12:44Z

@service-paradis You can specify multiple filters, one per line, such as:

jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: bubkoo/potential-duplicates@v1
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          filter: |
            Request ** icon
            Add ** icon
            ** icon **
            ** Ubuntu **
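
To make the filter semantics described so far concrete: a title that matches any filter line is skipped rather than compared. The sketch below assumes a simplified glob in which `**` matches any run of characters. It is a stand-in and not anymatch's actual matching rules; `toRegExp` and `skipsDetection` are hypothetical names.

```typescript
// Simplified stand-in for glob matching: `**` matches any run of characters.
// This is NOT anymatch; its real matching rules may differ.
function toRegExp(pattern: string): RegExp {
  const source = pattern
    .split('**')
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')) // escape literals
    .join('.*')
  return new RegExp('^' + source + '$')
}

// Hypothetical helper: true when the title matches at least one filter line,
// which (per the discussion above) means duplicate detection is skipped.
function skipsDetection(title: string, filterInput: string): boolean {
  return filterInput
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .some((line) => toRegExp(line).test(title))
}

const sampleFilter = `
  Request ** icon
  Add ** icon
`
console.log(skipsDetection('Request Ubuntu icon', sampleFilter)) // true
console.log(skipsDetection('Ubuntu', sampleFilter)) // false
```

Under this reading, a title such as "Request Ubuntu icon" is never compared against other issues at all.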

mondeja · 2020-12-07T13:11:08Z

Sorry @bubkoo, but this is not what we need.

If I'm not mistaken, when an issue is opened and its title matches at least one filter (that is, its title is "valid"), it will not be checked for potential duplicates. What we need is different: regardless of the opened issue's title (without "validating" it), remove the unneeded words from it, and remove the same words from the other titles it is compared with, to improve the match between titles.

As @service-paradis pointed out, we need an "exclude" function. Is this something that you plan to include or not?

bubkoo · 2020-12-08T02:30:01Z

@mondeja @service-paradis Keywords specified in exclude will be replaced with an empty string before detection.

jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: bubkoo/potential-duplicates@v1
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          exclude: |
            request
            icon
            add
            ubuntu

mondeja · 2020-12-08T12:34:39Z

What about removing them instead of replacing them with empty strings? Empty strings will also be compared, increasing the possibility of false positives. Check this test: the action is comparing "" Ansys "" with "" Ubuntu "" and raising false positives. What is the point of replacing the exclusions with empty strings?

service-paradis · 2021-01-11T20:22:53Z

Thanks again for your work on this @bubkoo

Unfortunately, I don't think the changes you made will totally solve the previous problem with unnecessary spaces.

For example, if we want to exclude "Request" and "icon" from "Request Ubuntu icon"

before:

  .reduce((memo, keyworld) => memo.replace(keyworld, ' '), title)

gives "⎵⎵Ubuntu⎵⎵"

after:

  .reduce((memo, keyworld) => memo.replace(keyworld, ''), title)
  .replace(/\s+/, ' ')

gives "⎵Ubuntu⎵"

For the comparison, you are creating arrays using split(' '). We can see a slight improvement, as it will compare ["", "Ubuntu", ""] instead of ["", "", "Ubuntu", "", ""]. But it can still produce false positives.

Maybe trimming leading and trailing spaces before splitting would solve the above.
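
The three cases above can be reproduced with a small sketch. The variable names are illustrative only; the `reduce` and `replace` calls are the ones quoted above.

```typescript
const sampleTitle = 'Request Ubuntu icon'
const keywords = ['Request', 'icon']

// "before": each keyword is replaced with a single space
const spaced = keywords.reduce((memo, k) => memo.replace(k, ' '), sampleTitle)

// "after": each keyword is removed, then the first whitespace run is collapsed
const removed = keywords
  .reduce((memo, k) => memo.replace(k, ''), sampleTitle)
  .replace(/\s+/, ' ')

console.log(spaced.split(' ')) // ["", "", "Ubuntu", "", ""]
console.log(removed.split(' ')) // ["", "Ubuntu", ""]
console.log(removed.trim().split(' ')) // ["Ubuntu"]
```

Only the trimmed version yields an array without empty strings, so two unrelated titles no longer share "" entries.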

bubkoo · 2021-01-12T05:47:46Z

@service-paradis It will now trim leading and trailing spaces before returning.

  export function formatTitle(title: string) {
    const exclude = core.getInput('exclude')
    if (exclude) {
      return exclude
        .split(/[\s\n]+/)
        .map((keyworld) => keyworld.trim())
        .filter((keyworld) => keyworld.length > 0)
        .reduce((memo, keyworld) => memo.replace(keyworld, ''), title)
        .replace(/\s+/, ' ')
        .trim()
    }
    return title
  }

ericcornelissen · 2021-01-21T17:20:38Z

Would it make sense to make the list of excluded words case insensitive?

(Also, keyworld should probably be keyword; not sure if you copied this snippet straight from the source.)

service-paradis · 2021-01-21T17:28:57Z

Thank you for your work @bubkoo!

Would it make sense to make the list of excluded words case insensitive?

I agree that it would be better if the exclusions were case insensitive. What do you think @bubkoo?

Yes, the typo comes from the source itself.

bubkoo · 2021-01-22T01:35:08Z

@service-paradis Thanks for your tips and suggestions.

service-paradis · 2021-01-25T13:51:53Z

@bubkoo I see that you added case insensitivity to match titles.

It would also be great to make the removal of excluded words case insensitive. For example, here, I need to add every word in each case variant (e.g. request and Request).
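
One way case-insensitive removal could look is sketched below. This is an illustration and not the action's actual code: `stripExcluded` is a hypothetical variant of the formatTitle function quoted earlier that takes the exclude list as an argument instead of reading it from the action input.

```typescript
// Escape regex metacharacters so each keyword is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Hypothetical variant of formatTitle: removes every occurrence of each
// keyword regardless of case, collapses whitespace runs, then trims.
function stripExcluded(title: string, exclude: string): string {
  return exclude
    .split(/\s+/)
    .filter((keyword) => keyword.length > 0)
    .reduce(
      (memo, keyword) => memo.replace(new RegExp(escapeRegExp(keyword), 'gi'), ''),
      title,
    )
    .replace(/\s+/g, ' ')
    .trim()
}

console.log(stripExcluded('Request Ubuntu icon', 'request icon')) // "Ubuntu"
console.log(stripExcluded('REQUEST Ubuntu Icon', 'request icon')) // "Ubuntu"
```

Like the original snippet, this removes substrings, so "icon" would also be stripped from inside a longer word; adding word boundaries to the pattern would be one way to avoid that.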

# 1.0.0 (2023-06-03)

### Bug Fixes

* 🐛 Avoid false positives if issue title is empty after excluding ([wow-actions#20](https://github.com/iv-org/close-potential-duplicates/issues/20)) ([e58dfb2](e58dfb2))
* 🐛 Exclude pull requests searching for duplicates ([wow-actions#19](https://github.com/iv-org/close-potential-duplicates/issues/19)) ([eebe64d](eebe64d))
* 🐛 excluded words should be case insensitivity ([f6ebc6e](f6ebc6e)), closes [wow-actions#14](https://github.com/iv-org/close-potential-duplicates/issues/14)
* 🐛 remove matched keywords ([0d07e57](0d07e57)), closes [wow-actions#14](https://github.com/iv-org/close-potential-duplicates/issues/14)
* 🐛 trimming leading and trailing spaces in issue title ([e0aa68f](e0aa68f)), closes [wow-actions#12](https://github.com/iv-org/close-potential-duplicates/issues/12)
* 🐛 typos ([393ab50](393ab50)), closes [wow-actions#14](https://github.com/iv-org/close-potential-duplicates/issues/14)
* 🐛 typos ([2bfd89d](2bfd89d))

### Features

* ✨ exclude keyworlds in title before detecting ([6f12b76](6f12b76)), closes [wow-actions#14](https://github.com/iv-org/close-potential-duplicates/issues/14)
* ✨ issue title filter ([7f95e44](7f95e44)), closes [wow-actions#14](https://github.com/iv-org/close-potential-duplicates/issues/14)
* ✨ make the list of excluded words case insensitive ([8dde4ad](8dde4ad)), closes [wow-actions#14](https://github.com/iv-org/close-potential-duplicates/issues/14)
* ✨ support multi line filters ([4e8f2cd](4e8f2cd))
* ✨ support reactions ([9908687](9908687))

### Performance Improvements

* ⚡️ init ([05eddff](05eddff))

service-paradis mentioned this issue Dec 3, 2020

Detecting potential duplicated issues simple-icons/simple-icons#4128

Closed

bubkoo added a commit that referenced this issue Dec 4, 2020

feat: ✨ issue title filter

7f95e44

filter newly created issue by title, any matched issue would not go on detection re #14

bubkoo self-assigned this Dec 4, 2020

service-paradis closed this as completed Dec 4, 2020

service-paradis reopened this Dec 4, 2020

bubkoo added a commit that referenced this issue Dec 8, 2020

feat: ✨ exclude keyworlds in title before detecting

6f12b76

replace keyworlds with empty string re #14

bubkoo added a commit that referenced this issue Dec 9, 2020

fix: 🐛 remove matched keywords

0d07e57

Empty strings will be compared also, increasing the possibility of false positives. re #14

bubkoo closed this as completed Jan 4, 2021

bubkoo added a commit that referenced this issue Jan 22, 2021

fix: 🐛 typos

393ab50

keyworld => keyword re #14

bubkoo added a commit that referenced this issue Jan 22, 2021

feat: ✨ make the list of excluded words case insensitive

8dde4ad

add `nocase` option for anymatch re #14

bubkoo added a commit that referenced this issue Jan 26, 2021

fix: 🐛 excluded words should be case insensitivity

f6ebc6e

remove matched keywords from title re #14

service-paradis mentioned this issue Jan 29, 2021

Detect potential duplicated issues simple-icons/simple-icons#4817

Merged

3 tasks
