Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unicode ranges as pstore called dynamicly #3

Merged
merged 3 commits into from
Jul 12, 2015

Conversation

equivalent
Copy link
Contributor

No description provided.

@coveralls
Copy link

Coverage Status

Coverage remained the same at 100.0% when pulling a441999 on equivalent:named-property-lambda into aa0a1da on tom-lord:master.

@equivalent
Copy link
Contributor Author

If I'm breaking any of your standards for code syntax or how you imagine the gem should work pls let me know I'll change that

@equivalent
Copy link
Contributor Author

It looks like the failing test is some kind of encoding problem as it works with ruby 2.1.5 and not on 2.2.0 (any idea ??)

expected "᠎" to match /\A(?:\p{Space})\z/

def ranges_to_unicode(ranges)
result = []
ranges.each do |range|
if range.is_a? Fixnum # Small hack to improve readability below
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we're now using a pstore (which is not human readable), it might be worth removing this "hack" from the code? I only put it in, so that you don't end up with (in the old style) ugly definitions like:

'blank' => ranges_to_unicode(9..9, 32..32, 160..160, .....)

In fact, you could even potentially remove the whole "array of numbers --> ranges" conversion altogether - although this would make db/unicode_ranges.pstore significantly larger (especially if I remove the "only save first 128 matches").

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitelly agree I just wanted to do small step first to see where it will lead, but yeah I'll refactor to include everyting

@tom-lord
Copy link
Owner

tom-lord commented Mar 3, 2015

It looks like you've generated the pstore with ruby 2.1, and this is causing a failing test in ruby 2.2.

To be specific:

/\p{Space}/ =~ "\u180e" # 0 in ruby 2.1; nil in ruby 2.2

To solve this properly (and improve test coverage) perhaps we should just generate a separate pstore for ruby 2.0/2.1/2.2 (and the micros versions too, if it turns out that's needed!)

In other words, some small tweak to lib/regexp-examples/unicode_char_ranges.rb:9:

STORE_FILENAME = 'unicode_ranges.pstore' # before
STORE_FILENAME = "unicode_ranges_#{RUBY_VERSION}.pstore" # something like this
STORE_FILENAME = "unicode_ranges_#{RUBY_VERSION[0..2]}.pstore"# or perhaps we can get away with this

?

NamedGroups.each do |name|
count += 1
matching_codes = (0..55295).lazy.select { |x| /\p{#{name}}/ =~ eval("?\\u{#{x.to_s(16)}}") }.first(128)
f.puts "'#{name.downcase}' => ranges_to_unicode(#{calculate_ranges(matching_codes)}),"
store[name.downcase] = calculate_ranges(matching_codes)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.... Or in other words (as per my above comment), I think we should either be using a human-readable YAML::Store (which might not be a bad idea?), or just do:

 store[name.downcase] = (0..55295)
  .lazy
  .map { |x| eval("?\\u{#{x.to_s(16)}") } # convert to unicode
  .select { |x| /\p{#{name}}/ =~ x) }
  .first(128)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can use YAML store which entirely acts as pstore only human readable

however is slower
you can run this to see how much

require 'benchmark'
require 'yaml/store'
require 'fileutils'
FileUtils.rm_f('bm_blog.yml')
FileUtils.rm_f('bm_blog.pstore')
repo = YAML::Store.new('bm_blog.yml')
Post = Struct.new(:title,:body)
pstore_repo = PStore.new('bm_blog.pstore')
Benchmark.bm do |b| 
  b.report('yaml') do
    1000.times do |i| 
      repo.transaction do |r|.
        (r["posts"] ||= []) << Post.new("Post #{i}")
      end 
    end 
  end 
  b.report('pstore') do
    1000.times do |i| 
      pstore_repo.transaction do |r|.
        (r["posts"] ||= []) << Post.new("Post #{i}")
      end 
    end 
  end 
end

(stolen from Avdi Grim ruby tapas 163)

so it's your call what do you think will be the best approach :) ... I will implement what you say is the best for the direction of this project

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point about performance - running that script on my machine gives:

   user     system      total        real

yaml 45.020000 0.140000 45.160000 ( 45.167915)
pstore 1.680000 0.050000 1.730000 ( 1.725712)

Performance is unlikely to be an issue in "real use" of the system, but could become significant in a larger test suite. For example, if Onigmo implements support for extended bracket character classes (in Perl: http://perldoc.perl.org/perlrecharclass.html#Extended-Bracketed-Character-Classes), then this gem might need additional features+tests added for ruby 2.3+

So the question is really: Do we want to maximise the gem performance (with a pstore), or provide a human-readable file diff between ruby versions?

I guess even though the latter would be nice to have, it should not be the responsibility of this gem to effectively provide (what should be) ruby documentation. Perhaps I could re-run the script in YAML-mode and save the result in a blog post/something.

tl;dr: PStore is fine, ruby should document this stuff not me 😛

@coveralls
Copy link

Coverage Status

Coverage remained the same at 100.0% when pulling 42592f9 on equivalent:named-property-lambda into aa0a1da on tom-lord:master.

1 similar comment
@coveralls
Copy link

Coverage Status

Coverage remained the same at 100.0% when pulling 42592f9 on equivalent:named-property-lambda into aa0a1da on tom-lord:master.

@tom-lord tom-lord merged commit 42592f9 into tom-lord:master Jul 12, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants