warning: regexp match /.../n against to UTF-8 string #87

Closed
mhfs opened this issue Jun 4, 2010 · 25 comments

@mhfs

mhfs commented Jun 4, 2010

Hey there,

In one of my specs I'm getting several warnings because of this line:

click 'Enviar instruções de redefinição de senha'

gems/rack-1.1.0/lib/rack/utils.rb:15: warning: regexp match /.../n against to UTF-8 string
gems/rack-1.1.0/lib/rack/utils.rb:15: warning: regexp match /.../n against to UTF-8 string
gems/rack-1.1.0/lib/rack/utils.rb:15: warning: regexp match /.../n against to UTF-8 string
gems/rack-1.1.0/lib/rack/utils.rb:15: warning: regexp match /.../n against to UTF-8 string

In case it helps, here is my Gemfile content https://gist.github.com/c0e8f04433cf1affcf84

Thanks,
Marcelo

@mhfs
Author

mhfs commented Jun 4, 2010

Oops ... forgot to mention I'm using ruby-1.9.2-preview3.

@jnicklas
Collaborator

sounds like this is coming from inside rack, any evidence this is a Capybara problem?

@rosenfeld

I'm also using Capybara and I'm having exactly the same problem, but I can't tell whether it is a Capybara problem. I don't know what the cause is... I'm also getting these warnings:

Non US-ASCII detected and no charset defined.
Defaulting to UTF-8, set your own if this is incorrect.

These are probably related to mail delivery...

@jnicklas
Collaborator

God I hate Ruby 1.9 charset issues :(

@rosenfeld

:)

@jnicklas
Collaborator

I find it very unlikely that this is a Capybara issue. Closing this. If someone can prove to me otherwise, please reopen!

@ctrochalakis

Yes, Rack::Utils.escape uses an ASCII regular expression on Unicode data; notice the '/n':

def escape(s)
  s.to_s.gsub(/([^ a-zA-Z0-9_.-]+)/n) {
    '%'+$1.unpack('H2'*bytesize($1)).join('%').upcase
  }.tr(' ', '+')
end

But I have no idea where this gets called :|
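To look at the mismatch in isolation, here is a minimal sketch (not Rack's code). It reuses the pattern quoted above and checks that the `/n` flag marks the regexp as binary. `binary_regexp?` and `escape_as_binary` are hypothetical helper names introduced only for this illustration. The second helper retags the input as binary before matching, so the regexp and the string agree on encoding.

```ruby
# Same pattern as in the quoted Rack::Utils.escape; /n means "no encoding" (binary).
PATTERN = /([^ a-zA-Z0-9_.-]+)/n

# Hypothetical helper: true when the regexp was compiled with the /n flag.
def binary_regexp?(re)
  (re.options & Regexp::NOENCODING) != 0
end

# Hypothetical helper: escape after retagging the input as binary, so the
# binary regexp is matched against a binary string rather than a UTF-8 one.
def escape_as_binary(s)
  s.to_s.dup.force_encoding(Encoding::BINARY).gsub(PATTERN) {
    '%' + $1.unpack('H2' * $1.bytesize).join('%').upcase
  }.tr(' ', '+')
end

puts binary_regexp?(PATTERN)
puts escape_as_binary("senha ☃")
```

The escaped bytes are the same either way; only the encoding tag on the string being matched differs.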

@e2
Contributor

e2 commented Aug 16, 2010

It's because of the _snowman ...

@e2
Contributor

e2 commented Aug 16, 2010

@e2
Contributor

e2 commented Aug 16, 2010

Here is an extremely bad patch for capybara/driver/rack_test_driver.rb:

  def process(method, path, attributes = {})
    return if path.gsub(/^#{request_path}/, '') =~ /^#/

    if Kernel.const_defined?(:Encoding)
      att = {}
      attributes.each_pair do |k,v|
        key = k.dup.force_encoding(Encoding::ASCII_8BIT)
        att[key] = v.dup.force_encoding(Encoding::ASCII_8BIT)
      end
      path = path.dup.force_encoding(Encoding::ASCII_8BIT)
    else
      att = attributes
    end

    send(method, path, att, env)
    follow_redirects!
  end

The same goes for the submit method.
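Since both methods would need the same casting, it could be pulled into one helper. This is only a rough sketch under assumptions, not the code that was eventually merged: `binary_params` is a hypothetical name, and the handling of nested hashes and arrays is an addition that the patch above does not have.

```ruby
# Hypothetical helper: return a copy of +value+ in which every String
# (including hash keys) is retagged as binary. The bytes are left untouched
# and the caller's objects are not mutated.
def binary_params(value)
  case value
  when String
    value.dup.force_encoding(Encoding::BINARY)
  when Hash
    value.each_with_object({}) do |(k, v), out|
      out[binary_params(k)] = binary_params(v)
    end
  when Array
    value.map { |v| binary_params(v) }
  else
    value # symbols, numbers, nil, uploaded files, ... pass through unchanged
  end
end

params = { "label" => "Enviar instruções", "tags" => ["☃"] }
p binary_params(params).values.first.encoding
```

On a Ruby without `Encoding` (1.8) this helper would not load, so it would still need the same `const_defined?` guard as the patch.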

@jnicklas
Collaborator

I don't know much about encodings, but that seems completely wrong to me. Unless the attributes are actually in ASCII_8BIT, then force encoding them there seems like a bad idea, and if they're not ASCII, then why cast them? Wouldn't that just lead to encoding errors? I think in effect we are removing the encoding information from the Strings for no gain. The problem lies deeper than this. I'm not sure how to fix it, but I'm pretty sure this isn't it.

@e2
Contributor

e2 commented Aug 17, 2010

ASCII_8BIT is an alias for binary and 'force_encoding' does exactly what you mentioned: it removes the encoding. You cannot "encode" to binary, and as for speed, this doesn't have overhead, apart from looking very ugly. The fact is that frameworks such as cucumber don't guarantee binary strings, and they don't seem to belong to rack.

Why Capybara? Because rack works on binary data and is optimized for real world applications, where parameters are binary, because you cannot effectively make assumptions about the encoding (what about UTF-16? Shift-JIS? - they will crash on the /u parameter). Capybara has a driver that passes utf-8 strings that occur only because the parameters are crafted in the cucumber tests and never cast to binary.

In summary, the warning is there because making things assume UTF-8 is fundamentally wrong and suggests a deeper problem, as you have described.

Summary: In my opinion Capybara should simulate the real world by passing unencoded strings (ASCII_8BIT as named in ruby), instead of getting rack to work around acceptance tests, which provide non-binary strings and grow encoding handling functionality.

Also, note that US_ASCII is the only exception that is compatible with ASCII_8BIT ('binary' in normal people's terms), and Ruby handles automatic conversion as a special case. That's why it works without warning until UTF-8 characters are used (_snowman in this case).

I deeply recommend the following: http://yehudakatz.com/2010/05/17/encodings-unabridged/

You can read through ruby-core (especially Yui Naruse's comments) on why ruby works this way and there is no "binary" encoding.
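A small sketch of the points above, for anyone who wants to try them in irb. The helper names `retag_as_binary` and `compatible?` are made up for this example. It shows that `force_encoding` only changes the encoding tag and leaves the bytes alone, and that compatibility with a binary string depends on whether the other string contains non-ASCII bytes.

```ruby
# Hypothetical helper: copy the string and retag it as binary (ASCII-8BIT).
# No transcoding happens; the bytes stay exactly as they were.
def retag_as_binary(str)
  str.dup.force_encoding(Encoding::BINARY)
end

# Hypothetical helper: true when Ruby considers the two strings combinable.
def compatible?(a, b)
  !Encoding.compatible?(a, b).nil?
end

snowman = "_snowman ☃"
binary  = retag_as_binary(snowman)

p binary.bytes == snowman.bytes                        # same bytes, different tag
p compatible?("ascii only".encode("US-ASCII"), binary)
p compatible?(snowman, binary)

begin
  snowman + binary
rescue Encoding::CompatibilityError => e
  puts "cannot combine: #{e.message}"
end
```

In this sketch a UTF-8 string that happens to contain only ASCII bytes is also reported as compatible, which fits the observation that nothing goes wrong until a non-ASCII character such as the snowman shows up.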

@jnicklas
Collaborator

That was a case well made, I'm actually convinced. Can you provide a patch with tests? That'd be absolutely awesome! This whole encoding thing is giving me headaches (and yes, I've read through Yehuda's article, and a number of others), so I feel ill equipped to tackle this.

@e2
Contributor

e2 commented Aug 19, 2010

Rushed patch with tests here:

http://github.com/e2/capybara/commit/c8a55c4011c2f91943ee9aeebe3b81a4420c3ecf

I don't like the solution and you can completely redo it without crediting me at all - and I will be completely happy. Just let me know once you have patched it so I can nuke my fork...

Thanks :)

P.S. Sorry about "bonus" indenting - I just didn't want to rebase+squash AGAIN...

@jnicklas
Collaborator

Good job e2! I actually thought that solution was pretty decent. I've merged it in, if someone wants to improve on it, they're welcome, but this looks good for now.

@jwilsonsprings

Hmm, I upgraded to this while using Ruby 1.9.2 and Rails 3.0.0. The warning is gone, but things are worse off. My tests now fail with:

undefined local variable or method `node' for # (NameError)
./features/step_definitions/web_steps.rb:35:in `block (2 levels) in '
./features/step_definitions/web_steps.rb:14:in `with_scope'
./features/step_definitions/web_steps.rb:34:in `/^(?:|I )follow "([^\"]*)"(?: within "([^\"]*)")?$/'

When I try to use Spork, I get this:

/home/jwilson/.rvm/rubies/ruby-1.9.2-p0/lib/ruby/1.9.1/drb/drb.rb:573:in `load': too large packet 67654656 (DRb::DRbConnError)
    from /home/jwilson/.rvm/rubies/ruby-1.9.2-p0/lib/ruby/1.9.1/drb/drb.rb:632:in `recv_reply'
    from /home/jwilson/.rvm/rubies/ruby-1.9.2-p0/lib/ruby/1.9.1/drb/drb.rb:918:in `recv_reply'
    from /home/jwilson/.rvm/rubies/ruby-1.9.2-p0/lib/ruby/1.9.1/drb/drb.rb:1197:in `send_message'
    from /home/jwilson/.rvm/rubies/ruby-1.9.2-p0/lib/ruby/1.9.1/drb/drb.rb:1088:in `block (2 levels) in method_missing'
    from /home/jwilson/.rvm/rubies/ruby-1.9.2-p0/lib/ruby/1.9.1/drb/drb.rb:1172:in `open'
    from /home/jwilson/.rvm/rubies/ruby-1.9.2-p0/lib/ruby/1.9.1/drb/drb.rb:1087:in `block in method_missing'
    from /home/jwilson/.rvm/rubies/ruby-1.9.2-p0/lib/ruby/1.9.1/drb/drb.rb:1105:in `with_friend'
Exception encountered: #
backtrace:

@jwilsonsprings

To be fair, I just pulled edge from GitHub, so it's possible this is something else... Falling back to the released 0.3.9.

@jnicklas
Collaborator

You shouldn't be getting that error on the latest master. It's a problem where cucumber monkey-patches a Capybara method which no longer exists. There's a line in env.rb along the lines of:

 require 'cucumber/capybara-javascript-emulation'

or something to that effect. Remove it and the error will go away.

@jwilsonsprings

Yep, that did it! Thanks.

@DCarper

DCarper commented Nov 11, 2010

Deleting this makes clicking normal links work, but for me at least, links with onclick handlers (generated from link_to ... :method => :post) then fail.

Any ideas?

@leebyrd

leebyrd commented Jan 15, 2011

Having the same problem. Any resolution?

@jeroenhouben

whatever happened to this issue?

@jeroenhouben

OK seems to be continued here: #243

@jnicklas
Collaborator

jnicklas commented Jun 9, 2011

tl;dr: upgrade to Rack 1.3.0

@mischa

mischa commented Sep 15, 2011

tl;dr 2:

A lot of things don't support Rack 1.3.0 yet (formtastic, rails 3.0.9, sendgrid-rails, etc). For now, you can put this in an initializer or support file:

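require 'cgi' # CGI.escape / CGI.unescape live here

module Rack
  module Utils
    # Delegate to CGI, which does not use a /n regexp
    def escape(s)
      CGI.escape(s.to_s)
    end

    def unescape(s)
      CGI.unescape(s)
    end
  end
end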
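To see what the CGI-based replacement does with the string from the original report, here is a quick sketch. It calls `CGI.escape` and `CGI.unescape` directly rather than going through Rack, and `escape_utf8` / `unescape_utf8` are hypothetical wrapper names used only here.

```ruby
require "cgi"

# Hypothetical wrappers mirroring the workaround, without reopening Rack::Utils.
def escape_utf8(s)
  CGI.escape(s.to_s)
end

def unescape_utf8(s)
  CGI.unescape(s)
end

label   = "Enviar instruções de redefinição de senha"
escaped = escape_utf8(label)

puts escaped
puts unescape_utf8(escaped) == label
```

Non-ASCII characters come out as percent-encoded UTF-8 bytes and spaces as `+`, and the round trip gives back the original string.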

@lock lock bot locked and limited conversation to collaborators Aug 18, 2019
This issue was closed.