Array#uniq does not remove equal Twingly::URLs #123
Great catch! Not sure how we can solve it (seems tricky). I'll dump the workaround you used here:

```
$ irb -rtwingly/url
irb(main):001:0> a = Twingly::URL.parse("http://google.com")
=> #<Twingly::URL:0x3fc7cf08b444 http://google.com>
irb(main):002:0> b = Twingly::URL.parse("http://google.com")
=> #<Twingly::URL:0x3fc7cec9e890 http://google.com>
irb(main):003:0> [a, b].uniq(&:to_s)
=> [#<Twingly::URL:0x3fc7cf08b444 http://google.com>]
```
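The workaround is needed because `Array#uniq` deduplicates via `#hash` and `#eql?`, which default to object identity, not via `==` or string representation. A minimal sketch (using a hypothetical `Wrapper` class, not part of twingly-url) shows why two equal-looking objects both survive `uniq` unless a block is given:

```ruby
# Array#uniq compares elements with #hash and #eql?. Their defaults are
# identity-based, so two distinct wrappers around the same string are
# both kept -- unless uniq is given a block to derive the comparison key.
class Wrapper
  attr_reader :value

  def initialize(value)
    @value = value
  end

  def to_s
    value
  end
end

a = Wrapper.new("http://google.com")
b = Wrapper.new("http://google.com")

p [a, b].uniq.size         # 2 -- default identity comparison keeps both
p [a, b].uniq(&:to_s).size # 1 -- the block's return value is compared instead
```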
I think this is how to solve it:

```diff
diff --git a/lib/twingly/url.rb b/lib/twingly/url.rb
index 20b4437..4999fd6 100644
--- a/lib/twingly/url.rb
+++ b/lib/twingly/url.rb
@@ -1,5 +1,7 @@
 # frozen_string_literal: true

+require "digest"
+
 require "addressable/idna/pure"
 require "addressable/uri"
 require "public_suffix"
@@ -23,6 +25,11 @@ module Twingly
       PublicSuffix::DomainInvalid,
     ].freeze

+    # Instantiate the digest class up front, in a thread-safe manner.
+    # This is important since we don't know how people will use this
+    # gem (whether or not they require it in a thread-safe way).
+    SHA256_DIGEST = Digest(:SHA256)
+
     private_constant :ACCEPTED_SCHEMES
     private_constant :CUSTOM_PSL
     private_constant :STARTS_WITH_WWW
@@ -193,6 +200,14 @@ module Twingly
       self.to_s <=> other.to_s
     end

+    def eql?(other)
+      self.hash == other.hash
+    end
+
+    def hash
+      SHA256_DIGEST.digest(self.to_s).unpack1("q")
+    end
+
     def to_s
       addressable_uri.to_s
     end
diff --git a/lib/twingly/url/null_url.rb b/lib/twingly/url/null_url.rb
index 1044303..1d5e442 100644
--- a/lib/twingly/url/null_url.rb
+++ b/lib/twingly/url/null_url.rb
@@ -24,6 +24,14 @@ module Twingly
       self.to_s <=> other.to_s
     end

+    def eql?(other)
+      self.hash == other.hash
+    end
+
+    def hash
+      "".hash
+    end
+
     def to_s
       ""
     end
diff --git a/spec/lib/twingly/url/null_url_spec.rb b/spec/lib/twingly/url/null_url_spec.rb
index 7ae42c1..181e1ae 100644
--- a/spec/lib/twingly/url/null_url_spec.rb
+++ b/spec/lib/twingly/url/null_url_spec.rb
@@ -97,4 +97,9 @@ describe Twingly::URL::NullURL do
       expect { url.method_does_not_exist }.to raise_error(NoMethodError)
     end
   end
+
+  describe "uniqueness" do
+    subject(:list) { 10.times.map { described_class.new }.uniq }
+
+    it { is_expected.to eq([described_class.new]) }
+  end
 end
diff --git a/spec/lib/twingly/url_spec.rb b/spec/lib/twingly/url_spec.rb
index 02614cc..f505d2f 100644
--- a/spec/lib/twingly/url_spec.rb
+++ b/spec/lib/twingly/url_spec.rb
@@ -847,6 +847,18 @@ describe Twingly::URL do
     end
   end

+  describe "uniqueness" do
+    context "with the same URL twice" do
+      let(:a) { described_class.parse("https://www.twingly.com/") }
+      let(:b) { described_class.parse("https://www.google.com/") }
+      let(:c) { described_class.parse("https://www.twingly.com/") }
+
+      it "only keeps unique URLs" do
+        expect([a, b, c].uniq).to eq([a, b])
+      end
+    end
+  end
+
   describe "#inspect" do
     let(:url_object) { described_class.parse(url) }
     subject { url_object.inspect }
```
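The patch works because Ruby's `Array#uniq` (and `Hash` keys) consult `#hash` first and then `#eql?`, and the contract requires that objects which are `eql?` return the same `#hash`. A minimal sketch of that contract, using a hypothetical `Url` class standing in for `Twingly::URL`:

```ruby
require "digest"

# Hypothetical stand-in for Twingly::URL, illustrating the eql?/hash
# contract the patch relies on: equal string representations must
# produce equal hash values.
class Url
  SHA256 = Digest(:SHA256)

  attr_reader :str

  def initialize(str)
    @str = str
  end

  def to_s
    str
  end

  # Objects that are eql? must have the same #hash; Array#uniq and
  # Hash lookups use both methods together.
  def eql?(other)
    hash == other.hash
  end

  # Derive the hash from the string form, truncated to a 64-bit
  # integer via unpack1("q"), as in the proposed patch.
  def hash
    SHA256.digest(to_s).unpack1("q")
  end
end

list = [Url.new("https://duh.se/"), Url.new("https://duh.se/")]
p list.uniq.size # 1
```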
Example:

```
$ bundle exec pry
[1] pry(main)> require_relative "lib/twingly/url"
=> true
[2] pry(main)> 1000.times.map { Twingly::URL.parse(%Q{https://#{["www.", ""].sample}duh.se/}) }.uniq
=> [#<Twingly::URL:0x3ff188258e60 https://duh.se/>, #<Twingly::URL:0x3ff18824453c https://www.duh.se/>]
[3] pry(main)> 1000.times.map { Twingly::URL.parse(%Q{https://#{["www.", ""].sample}duh.se/}).normalized }.uniq
=> [#<Twingly::URL:0x3ff1881f4794 https://www.duh.se/>]
```
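The session above shows the "normalize, then uniq" pattern: `eql?`/`hash` alone still treat `https://duh.se/` and `https://www.duh.se/` as distinct, so normalizing first collapses them. A sketch with a hypothetical `normalize` that only prepends `www.` (the real `Twingly::URL#normalized` does much more):

```ruby
# Hypothetical, simplified normalizer: prepend "www." to bare hosts.
# The real Twingly::URL#normalized performs many more transformations.
def normalize(url)
  url.sub(%r{\Ahttps://(?!www\.)}, "https://www.")
end

urls = ["https://duh.se/", "https://www.duh.se/", "https://duh.se/"]

p urls.uniq.size                          # 2 -- the two host spellings differ
p urls.map { |u| normalize(u) }.uniq.size # 1 -- normalized before deduplication
```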
Nice!
jage added a commit that referenced this issue on Feb 1, 2019:
Will reduce duplicates when using `.uniq`. To get an even better result, the user can normalize first, then call `.uniq`. Close #123
In the sample above I would expect to get 1 element back.