warning: regexp match /.../n against to UTF-8 string #87

Closed
mhfs opened this Issue Jun 4, 2010 · 25 comments

mhfs commented Jun 4, 2010

Hey there,

In one of my specs I'm getting several warnings because of this line:

click 'Enviar instruções de redefinição de senha'

gems/rack-1.1.0/lib/rack/utils.rb:15: warning: regexp match /.../n against to UTF-8 string
gems/rack-1.1.0/lib/rack/utils.rb:15: warning: regexp match /.../n against to UTF-8 string
gems/rack-1.1.0/lib/rack/utils.rb:15: warning: regexp match /.../n against to UTF-8 string
gems/rack-1.1.0/lib/rack/utils.rb:15: warning: regexp match /.../n against to UTF-8 string

In case it helps, here is my Gemfile content https://gist.github.com/c0e8f04433cf1affcf84

Thanks,
Marcelo

mhfs commented Jun 4, 2010

Oops ... forgot to mention I'm using ruby-1.9.2-preview3.

Collaborator

jnicklas commented Jun 12, 2010

Sounds like this is coming from inside Rack. Any evidence this is a Capybara problem?

rosenfeld Jun 19, 2010

I'm also using Capybara and I'm having exactly the same problem, but I can't tell whether it is a Capybara problem. I don't know what the cause is... I'm also getting these warnings:

Non US-ASCII detected and no charset defined.
Defaulting to UTF-8, set your own if this is incorrect.

These are probably related to mail delivery...

Collaborator

jnicklas commented Jun 19, 2010

God I hate Ruby 1.9 charset issues :(

rosenfeld Jun 19, 2010

:)

Collaborator

jnicklas commented Jun 29, 2010

I find it very unlikely that this is a Capybara issue. Closing this. If someone can prove to me otherwise, please reopen!

ctrochalakis Jul 9, 2010

Yes, Rack::Utils.escape uses an ASCII regular expression on Unicode data; notice the '/n':

def escape(s)
  s.to_s.gsub(/([^ a-zA-Z0-9_.-]+)/n) {
    '%'+$1.unpack('H2'*bytesize($1)).join('%').upcase
  }.tr(' ', '+')
end

But I have no idea where this gets called :|
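
For anyone trying to spot which strings set this off, here is a minimal sketch. It assumes the warning fires when a `/n` regexp meets a string that is neither binary nor ASCII-only, which matches the behavior described in this thread. `risky_for_binary_regexp?` is a made-up helper name, not something from Rack or Capybara.

```ruby
# Hypothetical helper for illustration only.
# Assumption: a /n regexp warns when the string is tagged with a
# non-binary encoding and contains non-ASCII characters.
def risky_for_binary_regexp?(str)
  str.encoding != Encoding::ASCII_8BIT && !str.ascii_only?
end

# The link text from the original report, written with \u escapes so the
# literal is UTF-8 regardless of the source file's encoding.
label = "Enviar instru\u00E7\u00F5es de redefini\u00E7\u00E3o de senha"

puts risky_for_binary_regexp?(label)      # true
puts risky_for_binary_regexp?("Sign in")  # false
puts risky_for_binary_regexp?(label.dup.force_encoding(Encoding::ASCII_8BIT)) # false
```

Running something like this over the params a test sends may help narrow down which step passes non-ASCII text into Rack.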

Contributor

e2 commented Aug 16, 2010

It's because of the _snowman ...

Contributor

e2 commented Aug 16, 2010

Here is an extremely bad patch for capybara/driver/rack_test_driver.rb:

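  def process(method, path, attributes = {})
    # Skip links that only point at a fragment of the current page.
    return if path.gsub(/^#{request_path}/, '') =~ /^#/

    # On Ruby 1.9+, relabel the path and every param as binary (ASCII-8BIT)
    # so Rack's /n regexps no longer see UTF-8 strings. Ruby 1.8 has no
    # Encoding constant, so its strings are passed through untouched.
    if Kernel.const_defined?(:Encoding)
      att = {}
      attributes.each_pair do |k,v|
        key = k.dup.force_encoding(Encoding::ASCII_8BIT)
        att[key] = v.dup.force_encoding(Encoding::ASCII_8BIT)
      end
      path = path.dup.force_encoding(Encoding::ASCII_8BIT)
    else
      att = attributes
    end

    send(method, path, att, env)
    follow_redirects!
  end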

The same goes for the submit method.
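
The same idea can be pulled out into a small standalone helper. This is only a sketch: `as_binary` and `binary_params` are hypothetical names, it is not the code that went into Capybara, and it assumes a flat hash (nested params would need recursion). Unlike the patch above, it leaves non-String values alone instead of calling `force_encoding` on them.

```ruby
# Hypothetical helpers for illustration; not Capybara's actual code.

# Relabel a String as binary without touching its bytes. Anything that is
# not a String (numbers, nil, uploaded files) is returned unchanged.
def as_binary(obj)
  return obj unless obj.is_a?(String) && defined?(Encoding)
  obj.dup.force_encoding(Encoding::ASCII_8BIT)
end

# Build a new flat hash with binary keys and values; the input is not modified.
def binary_params(attributes)
  result = {}
  attributes.each_pair { |key, value| result[as_binary(key)] = as_binary(value) }
  result
end

params = binary_params("q" => "instru\u00E7\u00F5es", "page" => 2)
puts params.keys.all? { |k| k.encoding == Encoding::ASCII_8BIT }  # true
```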

Collaborator

jnicklas commented Aug 16, 2010

I don't know much about encodings, but that seems completely wrong to me. Unless the attributes are actually in ASCII_8BIT, force-encoding them seems like a bad idea, and if they're not ASCII, then why cast them? Wouldn't that just lead to encoding errors? I think in effect we are removing the encoding information from the Strings for no gain. The problem lies deeper than this. I'm not sure how to fix it, but I'm pretty sure this isn't it.

Contributor

e2 commented Aug 17, 2010

ASCII_8BIT is an alias for binary, and 'force_encoding' does exactly what you mentioned: it removes the encoding. You cannot "encode" to binary, and as for speed, this has no overhead, apart from looking very ugly. The fact is that frameworks such as Cucumber don't guarantee binary strings, and they don't seem to belong to Rack.

Why Capybara? Because Rack works on binary data and is optimized for real-world applications, where parameters are binary, because you cannot effectively make assumptions about the encoding (what about UTF-16? Shift-JIS? They will crash on the /u flag). Capybara has a driver that passes UTF-8 strings, which occur only because the parameters are crafted in the Cucumber tests and never cast to binary.

In short, the warning is there because making things assume UTF-8 is fundamentally wrong and suggests a deeper problem, as you have described.

Summary: In my opinion, Capybara should simulate the real world by passing unencoded strings (ASCII_8BIT, as Ruby names it), instead of getting Rack to work around acceptance tests that provide non-binary strings and to grow encoding-handling functionality.

Also, note that US_ASCII is the only exception that is compatible with ASCII_8BIT ('binary' in normal people's terms), and Ruby handles the automatic conversion as a special case. That's why it works without a warning until non-ASCII UTF-8 characters are used (_snowman in this case).

I highly recommend the following: http://yehudakatz.com/2010/05/17/encodings-unabridged/

You can read through ruby-core (especially Yui Naruse's comments) on why ruby works this way and there is no "binary" encoding.
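
A short sketch of the points above, for anyone who wants to try them in irb. `relabel_as_binary` is a hypothetical name used only here; everything else is plain Ruby 1.9+ String and Encoding API.

```ruby
# Hypothetical helper: same bytes, relabeled as binary (ASCII-8BIT).
def relabel_as_binary(str)
  str.dup.force_encoding(Encoding::ASCII_8BIT)
end

snowman = "\u2603"                      # UTF-8, one character, three bytes
binary  = relabel_as_binary(snowman)

puts binary.bytes == snowman.bytes      # true: force_encoding only relabels
puts binary.length                      # 3, because each byte is now a "character"

# Transcoding a non-ASCII character to binary is not possible.
begin
  snowman.encode(Encoding::ASCII_8BIT)
rescue Encoding::UndefinedConversionError => e
  puts "encode failed: #{e.class}"
end

# ASCII-only strings mix freely with binary data; other UTF-8 strings do not.
puts Encoding.compatible?("abc", binary) == Encoding::ASCII_8BIT  # true
p Encoding.compatible?(snowman, binary)                           # nil
```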

Collaborator

jnicklas commented Aug 19, 2010

That was a case well made; I'm actually convinced. Can you provide a patch with tests? That'd be absolutely awesome! This whole encoding thing is giving me headaches (and yes, I've read through Yehuda's article, and a number of others), so I feel ill-equipped to tackle this.

Contributor

e2 commented Aug 19, 2010

Rushed patch with tests here:

http://github.com/e2/capybara/commit/c8a55c4011c2f91943ee9aeebe3b81a4420c3ecf

I don't like the solution and you can completely redo it without crediting me at all - and I will be completely happy. Just let me know once you have patched it so I can nuke my fork...

Thanks :)

P.S. Sorry about "bonus" indenting - I just didn't want to rebase+squash AGAIN...

Collaborator

jnicklas commented Aug 20, 2010

Good job e2! I actually thought that solution was pretty decent. I've merged it in; if someone wants to improve on it, they're welcome, but this looks good for now.

jwilsonsprings Sep 28, 2010

Hmm, I upgraded to this while using Ruby 1.9.2 and Rails 3.0.0. While I no longer get the warning, things are worse off. My tests fail with:

undefined local variable or method `node' for # (NameError)
./features/step_definitions/web_steps.rb:35:in `block (2 levels) in '
./features/step_definitions/web_steps.rb:14:in `with_scope'
./features/step_definitions/web_steps.rb:34:in `/^(?:|I )follow "([^\"]*)"(?: within "([^\"]*)")?$/'

When I try to use Spork, I get this:

/home/jwilson/.rvm/rubies/ruby-1.9.2-p0/lib/ruby/1.9.1/drb/drb.rb:573:in `load': too large packet 67654656 (DRb::DRbConnError)
    from /home/jwilson/.rvm/rubies/ruby-1.9.2-p0/lib/ruby/1.9.1/drb/drb.rb:632:in `recv_reply'
    from /home/jwilson/.rvm/rubies/ruby-1.9.2-p0/lib/ruby/1.9.1/drb/drb.rb:918:in `recv_reply'
    from /home/jwilson/.rvm/rubies/ruby-1.9.2-p0/lib/ruby/1.9.1/drb/drb.rb:1197:in `send_message'
    from /home/jwilson/.rvm/rubies/ruby-1.9.2-p0/lib/ruby/1.9.1/drb/drb.rb:1088:in `block (2 levels) in method_missing'
    from /home/jwilson/.rvm/rubies/ruby-1.9.2-p0/lib/ruby/1.9.1/drb/drb.rb:1172:in `open'
    from /home/jwilson/.rvm/rubies/ruby-1.9.2-p0/lib/ruby/1.9.1/drb/drb.rb:1087:in `block in method_missing'
    from /home/jwilson/.rvm/rubies/ruby-1.9.2-p0/lib/ruby/1.9.1/drb/drb.rb:1105:in `with_friend'
Exception encountered: #
backtrace:

jwilsonsprings Sep 28, 2010

To be fair, I just pulled edge from GitHub, so possibly this is something else... Falling back to the released 0.3.9.

Collaborator

jnicklas commented Sep 28, 2010

You shouldn't be getting that error on the latest master. It's a problem where cucumber monkey-patches a Capybara method which no longer exists. There's a line in env.rb along the lines of:

 require 'cucumber/capybara-javascript-emulation'

or something to that effect. Remove it and the error will go away.

jwilsonsprings Sep 28, 2010

Yep, that did it! Thanks.

DCarper commented Nov 11, 2010

Deleting this makes clicking normal links work, but for me at least, links with onclick handlers (generated from link_to ... :method => :post) then fail.

Any ideas?

leebyrd commented Jan 15, 2011

Having the same problem. Any resolution?

jeroenhouben Jun 9, 2011

Whatever happened to this issue?

jeroenhouben Jun 9, 2011

OK, it seems to be continued here: jnicklas#243

Collaborator

jnicklas commented Jun 9, 2011

tl;dr: upgrade to Rack 1.3.0

mischa commented Sep 15, 2011

tl;dr 2:

A lot of things don't support Rack 1.3.0 yet (formtastic, Rails 3.0.9, sendgrid-rails, etc.). For now, you can put this in an initializer or support file:

require 'cgi'

# Stopgap: delegate Rack's escaping helpers to the standard library's CGI.
module Rack
  module Utils
    def escape(s)
      CGI.escape(s.to_s)
    end
    def unescape(s)
      CGI.unescape(s)
    end
  end
end
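
To see what the CGI calls in that workaround do with non-ASCII input, here is a small sketch using the link text from the original report (shortened). It only exercises `CGI.escape` and `CGI.unescape` from Ruby's standard library and makes no claim about how Rack itself behaves once patched.

```ruby
require 'cgi'

# Written with \u escapes so the literal is UTF-8 in any source file.
label   = "instru\u00E7\u00F5es de senha"
escaped = CGI.escape(label)

puts escaped                          # instru%C3%A7%C3%B5es+de+senha
puts CGI.unescape(escaped) == label   # true
```

Each non-ASCII character comes out as its UTF-8 bytes in percent-encoded form, and spaces become `+`.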

This issue was closed.
