Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a uniq filter. #653

Merged
merged 1 commit into from Oct 10, 2021
Merged

Add a uniq filter. #653

merged 1 commit into from Oct 10, 2021

Conversation

Sveder
Copy link
Contributor

@Sveder Sveder commented Jul 20, 2021

No description provided.

@Sveder
Copy link
Contributor Author

Sveder commented Jul 20, 2021

@thp wdyt?
I tried using shellpipe: uniq (didn't work) but makes sense to me to have this as a first class as sort exists.

(Draft as I didn't add docs, if you'll ok this I'll add them and un-draft)

@Sveder Sveder marked this pull request as draft July 20, 2021 13:25
@thp
Copy link
Owner

thp commented Jul 29, 2021

The reason why sort exists is historical, there was no shellpipe filter when sort was introduced.

The reason why your uniq with shellpipe didn't work is most likely because you didn't sort it before (uniq in Unix requires lines to be sorted).

So for line-based uniqueness, I think it's fine to just sort it first (try shellpipe: 'sort -u' or shellpipe: 'sort | uniq').

What might be interesting is a kind of "unique / sort" kind of filter that works on e.g. CSS or XPath selectors, as this is kind of hard to do with built-in unix commands.

Feel free to turn this PR into a documentation change that shows how to use shellpipe + sort -u or sort | uniq to filter duplicate lines.

@Sveder
Copy link
Contributor Author

Sveder commented Jul 30, 2021

@thp thanks for answering. You might be right that I didn't play with shellpipe: uniq enough, and indeed now that I read uniq docs it is not what I wanted as I don't want to sort the data.

In this case, I think my implementation of uniq is definitely simpler and more intuitive than some of the awk one liners that stack overflow suggests to uniq without sort.

Is this use case still not interesting enough to be merged?

@thp
Copy link
Owner

thp commented Jul 30, 2021

Yeah I think it's valid, but rename it remove-duplicate-lines or something so that it's easier for non-Unix people and so that there's no confusion with the Unix uniq tool which works slightly differently.

@Sveder
Copy link
Contributor Author

Sveder commented Aug 2, 2021

@thp updated the name as per your suggestion.

lib/urlwatch/filters.py Outdated Show resolved Hide resolved
Copy link
Owner

@thp thp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments, rest looks good already. The optional separator subfilter option should also be documented.

@Sveder
Copy link
Contributor Author

Sveder commented Aug 5, 2021

@thp made the changes :)

docs/source/filters.rst Outdated Show resolved Hide resolved
Copy link
Owner

@thp thp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments.

@Sveder
Copy link
Contributor Author

Sveder commented Aug 18, 2021

Changes made.

@thp
Copy link
Owner

thp commented Aug 18, 2021

Please mark as ready for review + update changelog + squash to a single commit and then we can merge this.

@Sveder Sveder marked this pull request as ready for review October 6, 2021 12:35
@Sveder
Copy link
Contributor Author

Sveder commented Oct 6, 2021

@thp done.

@thp thp merged commit d732547 into thp:master Oct 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants