Tests #3

Merged
merged 21 commits into from
Feb 21, 2014

@jage jage commented Feb 20, 2014

  • Add basic tests so I can refactor
  • Refactor
  • Add tests for weird URLs
  • Fix red tests
  • Add profiling tests
  • Optimize
  • 💵
  • Bump Gem version

Using minitest with some extras:

* Turn for more informative run output
* Shoulda for context and matchers

Turn: https://github.com/turn-project/turn
Shoulda: https://github.com/thoughtbot/shoulda
Broken URLs found during work with Zambezi
end

should "handle URL with reference to another URL in it" do
url = "http://news.google.com/news/url?sa=t&fd=R&usg=AFQjCNGc4A_sfGS6fMMqggiK_8h6yk2miw&url=http:%20%20%20//fansided.com/2013/08/02/nike-decides-to-drop-milwaukee-brewers-ryan-braun"
Contributor Author

Should this URL be supported?

Collaborator

Yes, why not?

Contributor Author

Because some library didn't like it (it's working now).
I don't know the RFC off the top of my head; some strings might not be real URLs.

I agree this should be supported. If Chrome supports it, it should work (we should snatch their tests!).

Collaborator

Works in curl too.

Collaborator

> handle URL with reference to another URL in it

Hmm, is the "another URL" valid?

$ curl -v http:%20%20%20//fansided.com/2013/08/02/nike-decides-to-drop-milwaukee-brewers-ryan-braun
* Adding handle: conn: 0x7fbd82007a00
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7fbd82007a00) send_pipe: 1, recv_pipe: 0
* Could not resolve host: http
* Closing connection 0
curl: (6) Could not resolve host: http

Collaborator

But I don't think it matters that it's a URL with another URL in it; parameters can contain whatever, can't they?

Contributor Author

Probably. I think the original site changed the "internal URL", replaced some stuff.
Chrome rewrites the URL to: http:+++//..

Contributor Author

But I'm not sure the test name is that good; I couldn't figure out a better one.

Contributor Author

> Probably. I think the original site changed the "internal URL", replaced some stuff.

Or not, Google says "The previous page is sending you to an invalid url".
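
The thread above turns on whether a query parameter may carry arbitrary data. A minimal sketch, using only Ruby's standard `uri` library rather than the libraries this gem depends on (so results may differ from the gem's), suggests that the outer URL parses cleanly while the embedded value, once decoded, is not a valid URL on its own:

```ruby
require "uri"

# The URL from the test under discussion.
url = "http://news.google.com/news/url?sa=t&fd=R&usg=AFQjCNGc4A_sfGS6fMMqggiK_8h6yk2miw&url=http:%20%20%20//fansided.com/2013/08/02/nike-decides-to-drop-milwaukee-brewers-ryan-braun"

# The outer URL is well-formed: the query may contain ":", "/" and "%20".
outer = URI.parse(url)
params = URI.decode_www_form(outer.query).to_h

# Decoding turns each %20 into a literal space inside the "url" parameter.
inner = params["url"]

# The decoded inner value contains spaces, so the stdlib parser rejects it.
inner_valid = begin
  URI.parse(inner)
  true
rescue URI::InvalidURIError
  false
end

puts outer.host
puts inner.inspect
puts "inner URL parses on its own: #{inner_valid}"
```

This is consistent with the curl output quoted earlier: the outer URL is fine to normalize, and the inner one is just opaque parameter data.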

shoulda includes shoulda-context and shoulda-matchers. We’re not using
the matchers at the moment, so there is no need to pull them in (since
they introduce lots of development dependencies).
PostRank::URI couldn’t handle umlauts.

We will lose the ability to detect URLs without a protocol (e.g. “twingly.com”),
but we don’t see the need for this feature.

On the plus side, lots of runtime dependencies are removed (nokogiri!).
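
To illustrate why a URL without a protocol needs special handling, here is a small sketch using Ruby's standard `uri` library. It is only an illustration under that assumption; the gem itself appears to rely on Addressable, whose behavior is not shown here. A generic parser has no way to tell that "twingly.com" is a host, so it treats the whole string as a relative path:

```ruby
require "uri"

with_scheme    = URI.parse("http://twingly.com/")
without_scheme = URI.parse("twingly.com")

# With a scheme, the host is recognized.
puts with_scheme.host.inspect

# Without a scheme, nothing identifies the string as a host:
# scheme and host are both nil and the text ends up in the path.
puts without_scheme.scheme.inspect
puts without_scheme.host.inspect
puts without_scheme.path.inspect
```

Detecting such input therefore takes an extra heuristic step, which is the feature this commit gives up.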
@jage jage self-assigned this Feb 20, 2014
assert_equal "http://www.twingly.com/", result
end

should "not be able to normalize url without protocol" do
Contributor Author

Added this so we don't add this feature by mistake in the future.

Contributor

Smart


jage commented Feb 20, 2014

With PostRank::URI

Loaded Suite test,test/profile,test/unit

Started at 2014-02-20 16:24:15 +0100 w/ seed 36130.

NormalizerPerformanceTest
Thread ID: 70317498558180
Fiber ID: 70317500203180
Total: 30.794471
Sort by: self_time

 %self      total      self      wait     child     calls  name
  5.93      2.230     1.827     0.000     0.403   410000   PublicSuffix::Rule::Base#odiff 
  4.88      6.988     1.501     0.000     5.487   410000   PublicSuffix::Rule::Base#match? 
  4.50      3.358     1.384     0.000     1.974   425991   <Class::PublicSuffix::Domain>#domain_to_labels 
  3.70      1.138     1.138     0.000     0.000   451983   String#split 
  3.62      1.115     1.115     0.000     0.000    60000   String#=~ 
  3.19      0.981     0.981     0.000     0.000   160000   String#gsub 
  2.73      0.841     0.841     0.000     0.000  1170000   Kernel#instance_variable_defined? 
  2.67      5.164     0.824     0.000     4.341    30000   <Class::Addressable::URI>#normalize_component 
  2.63      6.655     0.810     0.000     5.845    30000   <Class::Addressable::URI>#parse 
  2.38      3.363     0.734     0.000     2.629   170000   Addressable::URI#validate 
  2.20      0.940     0.678     0.000     0.262   370000   Addressable::URI#host 
  2.06      0.635     0.635     0.000     0.000   455991   Array#reverse 
  1.94      7.592     0.596     0.000     6.996    20000   Array#select 
  1.81      0.779     0.557     0.000     0.222   300000   Addressable::URI#scheme 
  1.79      0.794     0.551     0.000     0.243   390000   String#== 
  1.66      0.849     0.510     0.000     0.339    40000   Addressable::URI#host= 
  1.40      1.034     0.433     0.000     0.601    30000   <Class::Addressable::URI>#encode_component 
  1.40      0.677     0.431     0.000     0.245    70001   Array#each 
  1.37      1.769     0.420     0.000     1.349    40000   Addressable::URI#scheme= 
  1.36     24.423     0.418     0.000    24.004    40000  *String#scan 
  1.32      0.562     0.407     0.000     0.155   220000   Addressable::URI#path 
  1.31      0.403     0.403     0.000     0.000   410000   Array#[] 
  1.23      0.380     0.380     0.000     0.000   538055   String#to_s 
  1.22      1.129     0.376     0.000     0.752    40000   <Class::Addressable::URI>#unencode 
  1.17      1.482     0.360     0.000     1.123    20000   Addressable::URI#to_s 
  1.15      0.354     0.354     0.000     0.000   246073   String#[] 
  1.13      0.487     0.347     0.000     0.140    50000   Addressable::URI#path= 
  0.98      0.777     0.303     0.000     0.474   210000   BasicObject#!= 
  0.98      0.558     0.300     0.000     0.258   110000   Kernel#dup 
  0.92      0.351     0.283     0.000     0.067    10000   PublicSuffix::Rule::Normal#decompose 
  0.89      0.838     0.275     0.000     0.563    50000   Addressable::URI#authority 
  0.89      1.084     0.275     0.000     0.809    50000   Addressable::URI#ip_based? 
  0.87      7.264     0.268     0.000     6.996    40000   Addressable::URI#initialize 
  0.86      0.542     0.264     0.000     0.278    30000   <Module::Addressable::IDNA>#unicode_sort_canonical 
  0.81      0.250     0.250     0.000     0.000   270000   Kernel#respond_to? 
  0.79      0.243     0.243     0.000     0.000   340000   Kernel#respond_to_missing? 
  0.79      0.242     0.242     0.000     0.000   100000   <Module::Addressable::IDNA>#lookup_unicode_combining_class 
  0.75      2.342     0.231     0.000     2.111    30000   <Module::Addressable::IDNA>#unicode_normalize_kc 
  0.68      6.852     0.208     0.000     6.644    40000   Addressable::URI#defer_validation 
  0.67      0.657     0.206     0.000     0.451    30000   <Module::Addressable::IDNA>#unicode_compose_pair 
  0.59      0.912     0.181     0.000     0.731    10000   Range#each 
  0.58      0.857     0.178     0.000     0.680    10000   Addressable::URI#authority= 
  0.56      0.173     0.173     0.000     0.000   140000   String#strip 
  0.56      0.298     0.172     0.000     0.125   120000   Kernel#initialize_dup 
  0.56      0.235     0.171     0.000     0.064   150000   Array#include? 
  0.55      0.374     0.169     0.000     0.205    40000   Addressable::URI#userinfo 
  0.49      1.087     0.151     0.000     0.936    30000   <Module::Addressable::IDNA>#unicode_compose 
  0.47      0.619     0.143     0.000     0.475    10000   Addressable::URI#replace_self 
  0.46      3.588     0.141     0.000     3.446    10000   Domainatrix::DomainParser#parse 
  0.45      0.408     0.139     0.000     0.269    10000   Domainatrix::DomainParser#parse_domains_from_host 
  0.44      0.136     0.136     0.000     0.000   180000   Kernel#is_a? 
  0.44      0.135     0.135     0.000     0.000    60000   Hash#keys 
  0.43      7.747     0.133     0.000     7.615    55992  *Class#new 
  0.43      0.310     0.131     0.000     0.179    50000   <Class::Addressable::URI>#ip_based_schemes 
  0.42      0.131     0.131     0.000     0.000    40000   NilClass#to_s 
  0.42      0.181     0.130     0.000     0.051    70000   Addressable::URI#query 
  0.42      0.129     0.129     0.000     0.000   130000   String#force_encoding 
  0.42      0.128     0.128     0.000     0.000    30000   Kernel#lambda 
  0.41      3.089     0.127     0.000     2.962    30000   Addressable::URI#normalized_scheme 
  0.40      3.014     0.124     0.000     2.890    10000   Addressable::URI#normalized_path 
  0.40      0.301     0.123     0.000     0.178    10000   <Class::Addressable::URI>#normalize_path 
  0.37      0.156     0.112     0.000     0.044    60000   Addressable::URI#password 
  0.36      0.153     0.110     0.000     0.042    60000   Addressable::URI#port 
  0.35     13.479     0.108     0.000    13.371    10000   PostRank::URI#c18n 
  0.34      0.976     0.105     0.000     0.870    10000   Addressable::URI#normalized_host 
  0.34      8.874     0.103     0.000     8.771    20001   Array#map 
  0.33      0.102     0.102     0.000     0.000   130000   Kernel#nil? 
  0.33     13.056     0.101     0.000    12.955    20000   PostRank::URI#parse 
  0.33      0.115     0.101     0.000     0.014    10000   Addressable::URI#password= 
  0.33      0.100     0.100     0.000     0.000   140000   String#to_str 
  0.32      0.099     0.099     0.000     0.000    75991   String#downcase 
  0.31      0.096     0.096     0.000     0.000    40000   Array#join 
  0.31      1.625     0.094     0.000     1.531    10000   Addressable::URI#normalized_authority 
  0.30      0.330     0.094     0.000     0.237    30000   <Module::Addressable::IDNA>#unicode_decompose 
  0.30      0.129     0.093     0.000     0.037    50000   Addressable::URI#user 
  0.30     10.220     0.093     0.000    10.127    10000   Addressable::URI#normalize 
  0.30      0.988     0.092     0.000     0.895    10000   PostRank::URI#normalize 
  0.29      0.164     0.090     0.000     0.074    10000   PostRank::URI#embedded 
  0.28      0.129     0.087     0.000     0.042    30000   Array#hash 
  0.28      0.086     0.086     0.000     0.000   105991   Hash#has_key? 
  0.27      0.085     0.085     0.000     0.000   120000   Kernel#instance_variable_set 
  0.27      0.083     0.083     0.000     0.000    40000   <Module::Addressable::IDNA>#lookup_unicode_compatibility 
  0.26      0.079     0.079     0.000     0.000    30000   Array#pack 
  0.26      0.125     0.079     0.000     0.046    10000   Addressable::URI#port= 
  0.25      0.078     0.078     0.000     0.000   105991   Kernel#class 
  0.25      0.118     0.078     0.000     0.041    10000   Addressable::URI#user= 
  0.24      0.075     0.075     0.000     0.000    90000   Kernel#kind_of? 
  0.24      0.074     0.074     0.000     0.000    60000   <Class::Addressable::URI>#port_mapping 
  0.24      0.101     0.073     0.000     0.027    40000   Addressable::URI#fragment 
  0.24      7.757     0.072     0.000     7.684    10000   PublicSuffix::List#select 
  0.24      0.072     0.072     0.000     0.000    30000   String#gsub! 
  0.23      0.071     0.071     0.000     0.000    80000   String#initialize_copy 
  0.23      1.136     0.071     0.000     1.065    40000   Kernel#!~ 
  0.23      0.177     0.070     0.000     0.107    10000   <Module::Addressable::IDNA>#to_ascii 
  0.21      0.194     0.065     0.000     0.129    30000   <Module::Addressable::IDNA>#lookup_unicode_composition 
  0.20      0.062     0.062     0.000     0.000    10000   Domainatrix::Url#initialize 
  0.19      0.192     0.057     0.000     0.135    10000   Addressable::URI#normalized_port 
  0.19     15.349     0.057     0.000    15.291    10000   PostRank::URI#clean 
  0.18      0.056     0.056     0.000     0.000    10000   Array#slice 
  0.18     24.276     0.055     0.000    24.221    10000   PostRank::URI#extract 
  0.17      6.400     0.053     0.000     6.347    10000   <Class::Twingly::URL::Normalizer>#normalize_url 
  0.16      0.051     0.051     0.000     0.000    30000   String#unpack 
  0.16      0.129     0.050     0.000     0.080     5991   PublicSuffix::Rule::Base#initialize 
  0.16      0.089     0.049     0.000     0.040    10000   Hash#merge 
  0.16      8.639     0.048     0.000     8.591    10000   <Module::PublicSuffix>#valid? 
  0.16      0.670     0.048     0.000     0.622    10000   Addressable::URI#fragment= 
  0.14      7.874     0.044     0.000     7.829    10000   PublicSuffix::List#find 
  0.14      0.042     0.042     0.000     0.000    60000   Kernel#hash 
  0.13      0.039     0.039     0.000     0.000    10000   Array#values_at 
  0.12      0.038     0.038     0.000     0.000    35991   Module#name 
  0.12      0.404     0.037     0.000     0.367    10000   PublicSuffix::Rule::Base#allow? 
  0.12      0.062     0.036     0.000     0.026    10000   Addressable::URI#query_values 
  0.12      0.036     0.036     0.000     0.000    10000   Addressable::URI#query= 
  0.11      0.035     0.035     0.000     0.000    10000   Array#& 
  0.11      0.084     0.035     0.000     0.049    10000   PostRank::URI#unescape 
  0.11      0.062     0.034     0.000     0.028    10000   Addressable::URI#normalized_query 
  0.11      0.261     0.034     0.000     0.227        1   IO#each_line 
  0.11      0.207     0.034     0.000     0.173     5991   <Class::PublicSuffix::Rule>#factory 
  0.10      0.068     0.032     0.000     0.036    10000   Addressable::URI#query_values= 
  0.10      0.030     0.030     0.000     0.000    30000   Array#initialize_copy 
  0.09     30.751     0.027     0.000    30.724    10000   <Class::Twingly::URL::Normalizer>#normalize 
  0.09     30.778     0.027     0.000    30.751   110000  *Proc#call 
  0.09     10.865     0.027     0.000    10.838    10000   Addressable::URI#normalize! 
  0.08      3.696     0.026     0.000     3.670    10000   <Module::Domainatrix>#parse 
  0.08      0.025     0.025     0.000     0.000    10000   Kernel#instance_variables 
  0.08      0.024     0.024     0.000     0.000    10000   String#chomp 
  0.08      0.024     0.024     0.000     0.000    10000   Hash#initialize_copy 
  0.08      0.024     0.024     0.000     0.000    30000   Module#== 
  0.07      0.022     0.022     0.000     0.000    10000   Regexp#match 
  0.07      0.032     0.022     0.000     0.010    10000   Enumerable#inject 
  0.07      0.021     0.021     0.000     0.000    10000   String#squeeze 
  0.07     24.296     0.020     0.000    24.276    10000   <Class::Twingly::URL::Normalizer>#extract_urls 
  0.07      0.046     0.020     0.000     0.026    10000   Addressable::URI#normalized_fragment 
  0.06      0.027     0.019     0.000     0.008    10000   Enumerable#any? 
  0.06      0.019     0.019     0.000     0.000    10000   String#tr 
  0.06      0.110     0.019     0.000     0.090    10000   Addressable::URI#normalized_userinfo 
  0.06      0.018     0.018     0.000     0.000    25991   Array#first 
  0.06      0.195     0.017     0.000     0.177    10001   Enumerable#each_with_index 
  0.05      0.039     0.017     0.000     0.022    10000   String#match 
  0.05     30.794     0.017     0.000    30.778        1   Integer#times 
  0.05      0.016     0.016     0.000     0.000    10000   PublicSuffix::Rule::Normal#parts 
  0.05      0.305     0.016     0.000     0.289    10000   <Class::PublicSuffix::List>#default 
  0.05      0.023     0.016     0.000     0.007    10000   Fixnum#== 
  0.05      0.015     0.015     0.000     0.000     5991   PublicSuffix::List#add 
  0.04      0.013     0.013     0.000     0.000    15991   Array#last 
  0.04      0.140     0.012     0.000     0.128     5909   PublicSuffix::Rule::Normal#initialize 
  0.04      0.011     0.011     0.000     0.000    10000   String#include? 
  0.04      0.011     0.011     0.000     0.000    10000   String#to_i 
  0.03      0.011     0.011     0.000     0.000    10000   Array#compact 
  0.03      0.009     0.009     0.000     0.000    10000   Array#push 
  0.03      0.008     0.008     0.000     0.000    10000   Symbol#== 
  0.03      0.008     0.008     0.000     0.000    10000   NilClass#nil? 
  0.02      0.007     0.007     0.000     0.000    10000   BasicObject#== 
  0.02      0.007     0.007     0.000     0.000     5991   Module#const_get 
  0.02      0.006     0.006     0.000     0.000     6868   String#strip! 
  0.02      0.006     0.006     0.000     0.000     5991   String#capitalize 
  0.02      0.005     0.005     0.000     0.000     5991   String#to_sym 
  0.00      0.000     0.000     0.000     0.000      308   Hash#[]= 
  0.00      0.001     0.000     0.000     0.001       41   PublicSuffix::Rule::Wildcard#initialize 
  0.00      0.001     0.000     0.000     0.001       41   PublicSuffix::Rule::Exception#initialize 
  0.00     30.794     0.000     0.000    30.794        1   Object#measure 
  0.00      0.000     0.000     0.000     0.000        1   File#initialize 
  0.00      0.289     0.000     0.000     0.289        1   PublicSuffix::List#initialize 
  0.00      0.000     0.000     0.000     0.000        1   <Class::PublicSuffix::List>#default_definition 
  0.00      0.028     0.000     0.000     0.028        1   PublicSuffix::List#create_index! 
  0.00      0.289     0.000     0.000     0.289        1   <Class::PublicSuffix::List>#parse 
  0.00      0.000     0.000     0.000     0.000        1   <Class::File>#dirname 
  0.00      0.000     0.000     0.000     0.000        1   <Class::File>#join 
  0.00      0.000     0.000     0.000     0.000        1   <Class::IO>#new 
  0.00      0.000     0.000     0.000     0.000        1   Kernel#block_given? 

* indicates recursively called methods
              PASS (0:00:30.961) test: .normalize_url should normalizing a short URL (10000x). 

Finished in 30.961786 seconds.

1 tests, 1 passed, 0 failures, 0 errors, 0 skips, 0 assertions

Without PostRank::URI

Loaded Suite test,test/profile,test/unit

Started at 2014-02-20 16:23:30 +0100 w/ seed 21376.

NormalizerPerformanceTest
Thread ID: 70309785233120
Fiber ID: 70309800490320
Total: 5.622905
Sort by: self_time

 %self      total      self      wait     child     calls  name
  9.97      3.701     0.561     0.000     3.140    20000   <Class::Addressable::URI>#parse 
  4.99      1.319     0.281     0.000     1.038    60000   Addressable::URI#validate 
  4.73      0.266     0.266     0.000     0.000   360000   Kernel#instance_variable_defined? 
  4.32      0.403     0.243     0.000     0.160    20000   Addressable::URI#host= 
  4.17      0.339     0.235     0.000     0.105   170000   String#== 
  3.96      0.310     0.222     0.000     0.088   120000   Addressable::URI#host 
  3.70      0.390     0.208     0.000     0.182    20000   Addressable::URI#scheme= 
  3.35      0.265     0.188     0.000     0.077   100000   Addressable::URI#scheme 
  3.18      0.750     0.179     0.000     0.571    10000   Addressable::URI#to_s 
  2.74      0.154     0.154     0.000     0.000   110000   String#[] 
  2.64      0.206     0.148     0.000     0.057    80000   Addressable::URI#path 
  2.57      0.341     0.145     0.000     0.196    10000   Domainatrix::DomainParser#parse_domains_from_host 
  2.55      0.361     0.144     0.000     0.218   100000   BasicObject#!= 
  2.52      0.193     0.142     0.000     0.052    20000   Addressable::URI#path= 
  2.42      2.610     0.136     0.000     2.474    10000   Domainatrix::DomainParser#parse 
  2.23      2.718     0.125     0.000     2.593    20000   Addressable::URI#initialize 
  2.16      0.122     0.122     0.000     0.000    20000   String#scan 
  1.96      0.429     0.110     0.000     0.319    20000   Addressable::URI#ip_based? 
  1.92      2.564     0.108     0.000     2.456    20000   Addressable::URI#defer_validation 
  1.86      0.105     0.105     0.000     0.000   150000   Kernel#respond_to_missing? 
  1.75      0.297     0.099     0.000     0.199    20000   Addressable::URI#authority 
  1.55      2.867     0.087     0.000     2.779    30000   Class#new 
  1.47      0.109     0.083     0.000     0.026    10000   Array#each 
  1.23      0.069     0.069     0.000     0.000    40000   String#gsub 
  1.11      5.469     0.063     0.000     5.406    20000   Array#map 
  1.09      0.061     0.061     0.000     0.000    10000   Domainatrix::Url#initialize 
  1.09      0.061     0.061     0.000     0.000    70000   Kernel#respond_to? 
  1.07      0.060     0.060     0.000     0.000    50000   String#strip 
  1.00      0.056     0.056     0.000     0.000    20000   Hash#keys 
  0.93      5.393     0.052     0.000     5.341    10000   <Class::Twingly::URL::Normalizer>#normalize_url 
  0.92      0.052     0.052     0.000     0.000    20000   String#=~ 
  0.92      0.133     0.051     0.000     0.081    20000   <Class::Addressable::URI>#ip_based_schemes 
  0.81      0.045     0.045     0.000     0.000    60000   Hash#has_key? 
  0.76      0.094     0.043     0.000     0.052    10000   Addressable::URI#userinfo 
  0.75      0.042     0.042     0.000     0.000    60000   String#to_str 
  0.74      0.119     0.041     0.000     0.078    10000   <Class::Twingly::URL::Normalizer>#extract_urls 
  0.67      0.052     0.037     0.000     0.015    20000   Addressable::URI#query 
  0.58      0.033     0.033     0.000     0.000    20000   String#split 
  0.55      0.083     0.031     0.000     0.052    20000   Kernel#!~ 
  0.53      0.065     0.030     0.000     0.035    10000   Hash#merge 
  0.51      0.044     0.029     0.000     0.015    20000   Array#include? 
  0.49      0.028     0.028     0.000     0.000    20000   Array#join 
  0.47      0.026     0.026     0.000     0.000    10000   Array#flatten 
  0.46      5.583     0.026     0.000     5.557    10000   <Class::Twingly::URL::Normalizer>#normalize 
  0.45      0.025     0.025     0.000     0.000    20000   <Class::Addressable::URI>#port_mapping 
  0.45      2.743     0.025     0.000     2.717    10000   <Module::Domainatrix>#parse 
  0.45      5.608     0.025     0.000     5.583    30000  *Proc#call 
  0.42      0.023     0.023     0.000     0.000    30000   Array#reverse 
  0.39      0.022     0.022     0.000     0.000    30000   String#to_s 
  0.36      0.020     0.020     0.000     0.000    10000   Hash#initialize_copy 
  0.34      0.019     0.019     0.000     0.000    20000   Module#name 
  0.34      0.026     0.019     0.000     0.007    10000   Addressable::URI#user 
  0.33      0.026     0.019     0.000     0.007    10000   Addressable::URI#fragment 
  0.33      0.026     0.019     0.000     0.007    10000   Addressable::URI#password 
  0.33      0.026     0.019     0.000     0.008    10000   Addressable::URI#port 
  0.33      0.019     0.019     0.000     0.000    20000   String#downcase 
  0.31      0.126     0.017     0.000     0.109    10000   Enumerable#each_with_index 
  0.31      0.017     0.017     0.000     0.000    20000   Kernel#kind_of? 
  0.28      0.016     0.016     0.000     0.000    20000   Kernel#is_a? 
  0.27      0.035     0.015     0.000     0.020    10000   Kernel#initialize_dup 
  0.27      5.623     0.015     0.000     5.608        1   Integer#times 
  0.25      0.014     0.014     0.000     0.000    20000   Kernel#class 
  0.23      0.013     0.013     0.000     0.000    10000   Kernel#Array 
  0.17      0.010     0.010     0.000     0.000    10000   Array#slice 
  0.16      0.009     0.009     0.000     0.000    10000   String#force_encoding 
  0.13      0.007     0.007     0.000     0.000    10000   Symbol#to_proc 
  0.00      5.623     0.000     0.000     5.623        1   Object#measure 

* indicates recursively called methods
              PASS (0:00:05.639) test: .normalize_url should normalizing a short URL (10000x). 

Finished in 5.639741 seconds.

1 tests, 1 passed, 0 failures, 0 errors, 0 skips, 0 assertions
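
For a quick comparison of the two runs, the arithmetic can be sketched as follows. The totals are copied from the "Total:" lines of the two profiles above; both were measured under the profiler, so the absolute per-call times are likely inflated and only the ratio is meaningful:

```ruby
# "Total:" values from the two profiler runs above, in seconds.
with_postrank    = 30.794471
without_postrank = 5.622905
calls            = 10_000 # iterations, per the "(10000x)" in the test name

speedup = with_postrank / without_postrank

puts format("speedup: %.2fx", speedup)
puts format("per call: %.2f ms -> %.2f ms",
            with_postrank * 1000 / calls,
            without_postrank * 1000 / calls)
```

That works out to roughly a 5.5x reduction in profiled time per normalization.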


walro commented Feb 20, 2014

Much improve, so amaze!

Inspiration from elasticsearch-transport tests:

https://github.com/elasticsearch/elasticsearch-ruby/blob/6f83143b8e6409a2eaf451a4dabf2c64f25ade31/elasticsearch-transport/test/profile/client_benchmark_test.rb
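
As a rough idea of what a repeat-and-measure test does, here is a sketch using only Ruby's standard `benchmark` library. It is not the test added in this PR (which produces the profiler output shown in this thread); `placeholder_normalize` and `ITERATIONS` are hypothetical names, and the stdlib `URI#normalize` call merely stands in for the gem's normalizer:

```ruby
require "benchmark"
require "uri"

ITERATIONS = 10_000 # same repetition count as the runs reported in this thread

# Hypothetical stand-in workload; the real test would call the gem's normalizer.
def placeholder_normalize(url)
  URI.parse(url).normalize.to_s
end

# Time the whole loop rather than single calls, so timer resolution does not matter.
elapsed = Benchmark.realtime do
  ITERATIONS.times { placeholder_normalize("HTTP://Www.Twingly.com") }
end

puts format("%d calls in %.3f s (%.3f ms per call)",
            ITERATIONS, elapsed, elapsed * 1000 / ITERATIONS)
```

A wall-clock measurement like this is cheaper than a full profile and is enough to catch large regressions; a profiler is still needed to see where the time goes.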

jage commented Feb 20, 2014

I say I'm done!


walro commented Feb 20, 2014

:shipit:

Should not exist in gems
In 19d28c6, when I removed PostRank::URI, I also removed the feature
that detected URLs without a protocol.

This commit enables the tests for it again.
Enabled the behavior removed in 19d28c6

This uses PublicSuffix and Addressable instead of PostRank::URI, though.
Why? PostRank::URI was very slow. This is also slow, but not quite as
slow.

jage commented Feb 20, 2014

Ok, we had some discussion about the changed behavior in this gem. I've added the old features again, but with new code.

I'm using PublicSuffix instead of PostRank::URI.

Since I'm verifying the domains, this is pretty slow. Not quite as slow as PostRank::URI, though.
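
The general idea of "assume a protocol, then verify the domain" can be sketched with the standard library alone. This is not the code in this PR, which uses Addressable and PublicSuffix; `sketch_normalize` and `KNOWN_SUFFIXES` are hypothetical, and the three-entry list is a toy stand-in for the real public suffix list:

```ruby
require "uri"

# Hypothetical stand-in for the public suffix list (the real one is far larger).
KNOWN_SUFFIXES = %w[com se co.uk].freeze

# Prepend "http://" when the input has no scheme, then accept the result only
# if the host ends in a known suffix. Returns nil for anything else.
def sketch_normalize(input)
  candidate = input =~ %r{\A[a-z][a-z0-9+.-]*://}i ? input : "http://#{input}"
  uri = URI.parse(candidate)
  host = uri.host.to_s.downcase

  # Linear scan over the suffixes: simple, but the cost grows with the list size.
  return nil unless KNOWN_SUFFIXES.any? { |suffix| host.end_with?(".#{suffix}") }

  uri.path = "/" if uri.path.empty?
  uri.to_s
rescue URI::InvalidURIError
  nil
end

puts sketch_normalize("twingly.com").inspect
puts sketch_normalize("localhost").inspect
puts sketch_normalize("not a url").inspect
```

The suffix check is what keeps arbitrary words from being turned into URLs, and scanning rules on every call is one plausible reason the verification shows up prominently in the profile below.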

Loaded Suite test/lib,test,test/profile,test/unit

Started at 2014-02-20 19:29:04 +0100 w/ seed 4195.

NormalizerPerformanceTest
Thread ID: 70319318886120
Fiber ID: 70319324815320
Total: 12.994087
Sort by: self_time

 %self      total      self      wait     child     calls  name
 13.80      2.209     1.793     0.000     0.416   420000   PublicSuffix::Rule::Base#odiff 
 12.21      7.021     1.587     0.000     5.434   420000   PublicSuffix::Rule::Base#match? 
 11.79      3.333     1.533     0.000     1.801   436385   <Class::PublicSuffix::Domain>#domain_to_labels 
  8.45      1.098     1.098     0.000     0.000   462771   String#split 
  4.77      7.648     0.620     0.000     7.029    20000   Array#select 
  3.32      0.432     0.432     0.000     0.000   436385   Array#reverse 
  3.20      0.531     0.416     0.000     0.116    20000   PublicSuffix::Rule::Normal#decompose 
  3.20      0.416     0.416     0.000     0.000   420000   Array#[] 
  2.78      0.361     0.361     0.000     0.000   499208   String#to_s 
  2.30      1.976     0.299     0.000     1.676    10000   <Class::Addressable::URI>#parse 
  1.46      0.190     0.190     0.000     0.000   240000   Kernel#instance_variable_defined? 
  1.46      0.799     0.189     0.000     0.609    10000   Addressable::URI#to_s 
  1.35      2.478     0.175     0.000     2.303    10000   <Class::Addressable::URI>#heuristic_parse 
  1.15      0.713     0.150     0.000     0.564    30000   Addressable::URI#validate 
  1.15      0.218     0.150     0.000     0.069   100000   String#== 
  1.09      0.200     0.141     0.000     0.059    70000   Addressable::URI#scheme 
  1.06      0.192     0.138     0.000     0.054    70000   Addressable::URI#host 
  1.02      8.860     0.132     0.000     8.729    10000   <Module::PublicSuffix>#parse 
  0.99      0.214     0.129     0.000     0.085    10000   Addressable::URI#host= 
  0.83      0.208     0.108     0.000     0.100    10000   Addressable::URI#scheme= 
  0.79      0.312     0.103     0.000     0.209    20000   Addressable::URI#authority 
  0.78      0.140     0.101     0.000     0.039    50000   Addressable::URI#path 
  0.72      7.877     0.093     0.000     7.784    10000   PublicSuffix::List#select 
  0.68      0.226     0.088     0.000     0.138    60000   BasicObject#!= 
  0.61      0.080     0.080     0.000     0.000    56438   String#[] 
  0.59     12.736     0.077     0.000    12.659    10000   <Class::Twingly::URL::Normalizer>#normalize_url 
  0.57      0.102     0.074     0.000     0.028    10000   Addressable::URI#path= 
  0.53      0.069     0.069     0.000     0.000    90000   Kernel#respond_to_missing? 
  0.51      0.066     0.066     0.000     0.000    20000   String#=~ 
  0.50      1.452     0.065     0.000     1.386    10000   Addressable::URI#initialize 
  0.49      0.064     0.064     0.000     0.000    10000   String#scan 
  0.48      0.072     0.062     0.000     0.010    10000   Enumerable#inject 
  0.48      0.062     0.062     0.000     0.000    10000   String#gsub! 
  0.47      0.062     0.062     0.000     0.000    30000   Array#join 
  0.47      0.097     0.061     0.000     0.036    10000   Hash#merge 
  0.45      0.239     0.059     0.000     0.180    10000   Addressable::URI#ip_based? 
  0.45      0.119     0.058     0.000     0.061    10000   PublicSuffix::Domain#subdomain? 
  0.44      0.138     0.057     0.000     0.081     6385   PublicSuffix::Rule::Base#initialize 
  0.41      1.371     0.054     0.000     1.317    10000   Addressable::URI#defer_validation 
  0.39      0.324     0.051     0.000     0.274        1   IO#each_line 
  0.37      0.048     0.048     0.000     0.000    50000   Kernel#respond_to? 
  0.36      1.896     0.047     0.000     1.849    26386  *Class#new 
  0.36      8.035     0.046     0.000     7.989    10000   PublicSuffix::List#find 
  0.35     12.796     0.046     0.000    12.750    20001  *Array#map 
  0.34      0.099     0.045     0.000     0.055    10000   Addressable::URI#userinfo 
  0.34      0.145     0.045     0.000     0.100    10000   <Class::Twingly::URL::Normalizer>#extract_urls 
  0.34      0.044     0.044     0.000     0.000    10000   Array#flatten 
  0.30      0.236     0.038     0.000     0.197     6385   <Class::PublicSuffix::Rule>#factory 
  0.29      0.037     0.037     0.000     0.000    20000   String#gsub 
  0.29      0.037     0.037     0.000     0.000    10000   Hash#keys 
  0.28      0.036     0.036     0.000     0.000    50000   Kernel#nil? 
  0.27      0.338     0.035     0.000     0.303    10000   PublicSuffix::Rule::Base#allow? 
  0.26     12.951     0.033     0.000    12.917    10000   <Class::Twingly::URL::Normalizer>#normalize 
  0.25      0.098     0.032     0.000     0.066    20000   Kernel#!~ 
  0.24      0.032     0.032     0.000     0.000    20000   PublicSuffix::Rule::Normal#parts 
  0.24      0.031     0.031     0.000     0.000    10000   Regexp#=== 
  0.24      0.061     0.031     0.000     0.031    20000   Kernel#initialize_dup 
  0.24      0.039     0.031     0.000     0.008    10000   PublicSuffix::Domain#initialize 
  0.23      0.030     0.030     0.000     0.000    40000   String#to_str 
  0.21      0.078     0.028     0.000     0.050    10000   <Class::Addressable::URI>#ip_based_schemes 
  0.21     12.978     0.028     0.000    12.951    20000  *Proc#call 
  0.19      0.030     0.024     0.000     0.006    10001   Array#each 
  0.18      0.023     0.023     0.000     0.000    20000   String#strip 
  0.17      0.022     0.022     0.000     0.000    20000   String#chomp 
  0.16      0.021     0.021     0.000     0.000    10000   Array#values_at 
  0.16      0.021     0.021     0.000     0.000    10000   Hash#initialize_copy 
  0.16      0.021     0.021     0.000     0.000    26385   Hash#has_key? 
  0.16      0.020     0.020     0.000     0.000    20000   String#include? 
  0.15      0.027     0.020     0.000     0.007    10000   Addressable::URI#password 
  0.15      0.027     0.020     0.000     0.008    10000   Addressable::URI#user 
  0.15      0.020     0.020     0.000     0.000    26385   Array#first 
  0.15      0.027     0.020     0.000     0.008    10000   Addressable::URI#fragment 
  0.15      0.028     0.020     0.000     0.008    10000   Addressable::URI#query 
  0.15      0.027     0.020     0.000     0.008    10000   Addressable::URI#port 
  0.14      0.018     0.018     0.000     0.000    20000   Kernel#kind_of? 
  0.14      0.044     0.018     0.000     0.025    10000   Kernel#dup 
  0.13      0.017     0.017     0.000     0.000     6385   PublicSuffix::List#add 
  0.13      0.016     0.016     0.000     0.000    16385   Module#name 
  0.13      0.016     0.016     0.000     0.000    16385   String#downcase 
  0.12     12.994     0.016     0.000    12.978        1   Integer#times 
  0.12      0.375     0.016     0.000     0.359    10000   <Class::PublicSuffix::List>#default 
  0.12      0.024     0.016     0.000     0.009    10000   Array#include? 
  0.11      0.014     0.014     0.000     0.000    10000   Kernel#Array 
  0.11      0.014     0.014     0.000     0.000     7820   <Class::PublicSuffix::List>#private_domains? 
  0.10      0.150     0.014     0.000     0.137     6332   PublicSuffix::Rule::Normal#initialize 
  0.10      0.013     0.013     0.000     0.000    10000   PublicSuffix::Domain#trd 
  0.10      0.013     0.013     0.000     0.000    10000   PublicSuffix::Domain#tld 
  0.10      0.013     0.013     0.000     0.000    10000   PublicSuffix::Domain#sld 
  0.10      0.013     0.013     0.000     0.000    10000   <Class::Addressable::URI>#port_mapping 
  0.10      0.013     0.013     0.000     0.000    16385   Array#last 
  0.10      0.013     0.013     0.000     0.000    16385   Kernel#class 
  0.08      0.010     0.010     0.000     0.000    10000   Array#compact 
  0.08      0.010     0.010     0.000     0.000    10000   String#initialize_copy 
  0.07      0.009     0.009     0.000     0.000    10000   String#force_encoding 
  0.07      0.009     0.009     0.000     0.000    10000   Symbol#to_proc 
  0.07      0.009     0.009     0.000     0.000    10000   Array#pop 
  0.06      0.008     0.008     0.000     0.000    10000   Kernel#is_a? 
  0.06      0.008     0.008     0.000     0.000    10001   Kernel#block_given? 
  0.06      0.007     0.007     0.000     0.000    10000   Symbol#== 
  0.06      0.007     0.007     0.000     0.000     6385   Module#const_get 
  0.05      0.007     0.007     0.000     0.000     7820   String#strip! 
  0.05      0.006     0.006     0.000     0.000     6385   String#capitalize 
  0.04      0.006     0.006     0.000     0.000     6385   String#to_sym 
  0.01      0.001     0.001     0.000     0.000      560   Hash#[]= 
  0.00      0.001     0.000     0.000     0.001       34   PublicSuffix::Rule::Wildcard#initialize 
  0.00      0.000     0.000     0.000     0.000       19   PublicSuffix::Rule::Exception#initialize 
  0.00     12.994     0.000     0.000    12.994        1   Object#measure 
  0.00      0.000     0.000     0.000     0.000        1   File#initialize 
  0.00      0.359     0.000     0.000     0.359        1   PublicSuffix::List#initialize 
  0.00      0.020     0.000     0.000     0.020        1   Enumerable#each_with_index 
  0.00      0.034     0.000     0.000     0.034        1   PublicSuffix::List#create_index! 
  0.00      0.000     0.000     0.000     0.000        1   <Class::PublicSuffix::List>#default_definition 
  0.00      0.000     0.000     0.000     0.000        1   <Class::File>#join 
  0.00      0.000     0.000     0.000     0.000        1   <Class::IO>#new 
  0.00      0.359     0.000     0.000     0.359        1   <Class::PublicSuffix::List>#parse 
  0.00      0.000     0.000     0.000     0.000        1   <Class::File>#dirname 

* indicates recursively called methods
              PASS (0:00:13.171) test: .normalize_url should normalizing a short URL (10000x). 

Finished in 13.171820 seconds.

1 tests, 1 passed, 0 failures, 0 errors, 0 skips, 0 assertions
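
The profile above shows `Object#measure` wrapping `Integer#times`, which in turn calls the normalizer 10000 times. A minimal sketch of such a timing helper follows, assuming a monotonic clock and a stand-in normalizer. `stand_in_normalize` is hypothetical and is not the gem's code, and the real `measure` helper in the test suite may look different.

```ruby
# Hypothetical stand-in for the normalizer under test; NOT the gem's code.
def stand_in_normalize(url)
  [url.strip.downcase]
end

# Run the given block `count` times and return the elapsed wall-clock seconds.
# The monotonic clock is used so system clock adjustments do not skew the result.
def measure(count)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  count.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

runs = 10_000
elapsed = measure(runs) { stand_in_normalize("HTTP://Example.com/ ") }
per_call_us = elapsed / runs * 1_000_000
puts format("%d calls in %.3fs (%.1f us per call)", runs, elapsed, per_call_us)
```

Wall-clock timing like this only gives the total; a profiler is still needed to get the per-method breakdown shown in the table.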

assert_equal [url], @normalizer.normalize(url)
end

should "should not blow up when there's no URL in the text" do
Collaborator

One "should" too many?

Collaborator

dentarg commented Feb 21, 2014

From bundle exec rake in stanley using this branch for twingly-url-normalizer:

TestUrlHelper
     FAIL (0:00:00.066) test_site_url_without_scheme
          Expected: "//www.asos.com/"
            Actual: "//www.asos.com"
        @ /Users/dentarg/.rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/minitest/unit.rb:200:in `assert'
          /Users/dentarg/.rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/minitest/unit.rb:240:in `assert_equal'
          test/unit/url_helper_test.rb:10:in `test_site_url_without_scheme'
          /Users/dentarg/.rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/minitest/unit.rb:1301:in `run'
          /Users/dentarg/.rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/minitest/unit.rb:867:in `_run_anything'
          /Users/dentarg/.rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/minitest/unit.rb:1060:in `run_tests'
          /Users/dentarg/.rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/minitest/unit.rb:1047:in `block in _run'
          /Users/dentarg/.rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/minitest/unit.rb:1046:in `each'
          /Users/dentarg/.rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/minitest/unit.rb:1046:in `_run'
          /Users/dentarg/.rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/minitest/unit.rb:1035:in `run'
          /Users/dentarg/.rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/minitest/unit.rb:789:in `block in autorun'

Contributor Author

jage commented Feb 21, 2014

Expected: "//www.asos.com/"
Actual: "//www.asos.com"

Ok, I'll look into it.

Insert / if no path exists.
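
A minimal sketch of the behaviour that commit title describes, assuming Ruby's stdlib `URI` rather than whatever the gem uses internally (the profile suggests Addressable). `ensure_root_path` is a hypothetical helper name, not part of the gem's API.

```ruby
require "uri"

# Hypothetical helper (not the gem's API): give a URL the root path "/"
# when it has a host but an empty path. Other URLs are returned unchanged.
def ensure_root_path(url)
  uri = URI.parse(url)
  uri.path = "/" if uri.host && uri.path.empty?
  uri.to_s
end

puts ensure_root_path("//www.asos.com")        # prints //www.asos.com/
puts ensure_root_path("http://www.asos.com/a") # prints http://www.asos.com/a
```

The first call reproduces the scheme-less case from the failing `test_site_url_without_scheme` assertion; the second shows that an existing path is left alone.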
Contributor Author

jage commented Feb 21, 2014

So, what's next? Start using it and see where it breaks?

Contributor

walro commented Feb 21, 2014

I think so. Hopefully the tests in Zambezi and Stanley will pick any problems up :)

jage added a commit that referenced this pull request Feb 21, 2014
jage merged commit d345763 into master Feb 21, 2014
jage deleted the tests branch February 21, 2014 16:42
jage mentioned this pull request Feb 23, 2014
3 participants