Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Array#uniq does not remove equal Twingly::URLs #123

Closed
walro opened this issue Apr 16, 2018 · 3 comments
Closed

Array#uniq does not remove equal Twingly::URLs #123

walro opened this issue Apr 16, 2018 · 3 comments
Labels

Comments

@walro
Copy link
Contributor

walro commented Apr 16, 2018

[2] pry(main)> url = "https://www.twingly.com"
=> "https://www.twingly.com"
[3] pry(main)> Twingly::URL.parse(url) == Twingly::URL.parse(url)
=> true
[4] pry(main)> [Twingly::URL.parse(url), Twingly::URL.parse(url)].uniq.size
=> 2

In the sample above I would expect to get 1 element back.

@walro walro added the bug label Apr 16, 2018
@dentarg
Copy link
Collaborator

dentarg commented Apr 16, 2018

Great catch, not sure how we can solve it (seems tricky)

I will dump the workaround you used here:

$ irb -rtwingly/url
irb(main):001:0> a = Twingly::URL.parse("http://google.com")
=> #<Twingly::URL:0x3fc7cf08b444 http://google.com>
irb(main):002:0> b = Twingly::URL.parse("http://google.com")
=> #<Twingly::URL:0x3fc7cec9e890 http://google.com>
irb(main):003:0> [a, b].uniq(&:to_s)
=> [#<Twingly::URL:0x3fc7cf08b444 http://google.com>]

@jage
Copy link
Contributor

jage commented Jan 31, 2019

I think this is how to solve it.

diff --git a/lib/twingly/url.rb b/lib/twingly/url.rb
index 20b4437..4999fd6 100644
--- a/lib/twingly/url.rb
+++ b/lib/twingly/url.rb
@@ -1,5 +1,7 @@
 # frozen_string_literal: true
 
+require "digest"
+
 require "addressable/idna/pure"
 require "addressable/uri"
 require "public_suffix"
@@ -23,6 +25,11 @@ module Twingly
       PublicSuffix::DomainInvalid,
     ].freeze
 
+    # Instantiate digest classes in a thread-safe manner
+    # This is important since we don't know how people will
+    # use this gem (if they require it in a thread safe way)
+    SHA256_DIGEST = Digest(:SHA256)
+
     private_constant :ACCEPTED_SCHEMES
     private_constant :CUSTOM_PSL
     private_constant :STARTS_WITH_WWW
@@ -193,6 +200,14 @@ module Twingly
       self.to_s <=> other.to_s
     end
 
+    def eql?(other)
+      self.hash == other.hash
+    end
+
+    def hash
+      SHA256_DIGEST.digest(self.to_s).unpack1("q")
+    end
+
     def to_s
       addressable_uri.to_s
     end
diff --git a/lib/twingly/url/null_url.rb b/lib/twingly/url/null_url.rb
index 1044303..1d5e442 100644
--- a/lib/twingly/url/null_url.rb
+++ b/lib/twingly/url/null_url.rb
@@ -24,6 +24,14 @@ module Twingly
         self.to_s <=> other.to_s
       end
 
+      def eql?(other)
+        self.hash == other.hash
+      end
+
+      def hash
+        "".hash
+      end
+
       def to_s
         ""
       end
diff --git a/spec/lib/twingly/url/null_url_spec.rb b/spec/lib/twingly/url/null_url_spec.rb
index 7ae42c1..181e1ae 100644
--- a/spec/lib/twingly/url/null_url_spec.rb
+++ b/spec/lib/twingly/url/null_url_spec.rb
@@ -97,4 +97,9 @@ describe Twingly::URL::NullURL do
       expect { url.method_does_not_exist }.to raise_error(NoMethodError)
     end
   end
+
+  describe "uniqueness" do
+    subject(:list) { 10.times.map { |_| described_class.new }.uniq }
+    it { is_expected.to eq([described_class.new]) }
+  end
 end
diff --git a/spec/lib/twingly/url_spec.rb b/spec/lib/twingly/url_spec.rb
index 02614cc..f505d2f 100644
--- a/spec/lib/twingly/url_spec.rb
+++ b/spec/lib/twingly/url_spec.rb
@@ -847,6 +847,18 @@ describe Twingly::URL do
     end
   end
 
+  describe "uniqueness" do
+    context do "with the same URL twice"
+      let(:a) { described_class.parse("https://www.twingly.com/") }
+      let(:b) { described_class.parse("https://www.google.com/") }
+      let(:c) { described_class.parse("https://www.twingly.com/") }
+
+      it "only unique URLs" do
+        expect([a,b,c].uniq).to eq([a,b])
+      end
+    end
+  end
+
   describe "#inspect" do
     let(:url_object) { described_class.parse(url) }
     subject { url_object.inspect }

Example:

$ bundle exec pry
[1] pry(main)> require_relative "lib/twingly/url"
=> true
[2] pry(main)> 1000.times.map { Twingly::URL.parse(%Q{https://#{["www.", ""].sample}duh.se/}) }.uniq
=> [#<Twingly::URL:0x3ff188258e60 https://duh.se/>, #<Twingly::URL:0x3ff18824453c https://www.duh.se/>]
[3] pry(main)> 1000.times.map { Twingly::URL.parse(%Q{https://#{["www.", ""].sample}duh.se/}).normalized }.uniq
=> [#<Twingly::URL:0x3ff1881f4794 https://www.duh.se/>]

@dentarg
Copy link
Collaborator

dentarg commented Jan 31, 2019

Nice!

jage added a commit that referenced this issue Feb 1, 2019
Will reduce duplicates when using `.uniq`, to get even better result the
user could first normalize, then uniq.

Close #123
@jage jage closed this as completed in #129 Feb 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants