-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unicode ranges as pstore called dynamicly #3
Conversation
If I'm breaking any of your standards for code syntax or how you imagine the gem should work pls let me know I'll change that |
It looks like the failing test is some kind of encoding problem as it works with ruby 2.1.5 and not on 2.2.0 (any idea ??)
|
def ranges_to_unicode(ranges) | ||
result = [] | ||
ranges.each do |range| | ||
if range.is_a? Fixnum # Small hack to improve readability below |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we're now using a pstore (which is not human readable), it might be worth removing this "hack" from the code? I only put it in, so that you don't end up with (in the old style) ugly definitions like:
'blank' => ranges_to_unicode(9..9, 32..32, 160..160, .....)
In fact, you could even potentially remove the whole "array of numbers --> ranges" conversion altogether - although this would make db/unicode_ranges.pstore
significantly larger (especially if I remove the "only save first 128 matches").
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitelly agree I just wanted to do small step first to see where it will lead, but yeah I'll refactor to include everyting
It looks like you've generated the pstore with ruby 2.1, and this is causing a failing test in ruby 2.2. To be specific:
To solve this properly (and improve test coverage) perhaps we should just generate a separate pstore for ruby 2.0/2.1/2.2 (and the micros versions too, if it turns out that's needed!) In other words, some small tweak to lib/regexp-examples/unicode_char_ranges.rb:9:
? |
NamedGroups.each do |name| | ||
count += 1 | ||
matching_codes = (0..55295).lazy.select { |x| /\p{#{name}}/ =~ eval("?\\u{#{x.to_s(16)}}") }.first(128) | ||
f.puts "'#{name.downcase}' => ranges_to_unicode(#{calculate_ranges(matching_codes)})," | ||
store[name.downcase] = calculate_ranges(matching_codes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.... Or in other words (as per my above comment), I think we should either be using a human-readable YAML::Store (which might not be a bad idea?), or just do:
store[name.downcase] = (0..55295)
.lazy
.map { |x| eval("?\\u{#{x.to_s(16)}") } # convert to unicode
.select { |x| /\p{#{name}}/ =~ x) }
.first(128)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can use YAML store which entirely acts as pstore only human readable
however is slower
you can run this to see how much
require 'benchmark'
require 'yaml/store'
require 'fileutils'
FileUtils.rm_f('bm_blog.yml')
FileUtils.rm_f('bm_blog.pstore')
repo = YAML::Store.new('bm_blog.yml')
Post = Struct.new(:title,:body)
pstore_repo = PStore.new('bm_blog.pstore')
Benchmark.bm do |b|
b.report('yaml') do
1000.times do |i|
repo.transaction do |r|.
(r["posts"] ||= []) << Post.new("Post #{i}")
end
end
end
b.report('pstore') do
1000.times do |i|
pstore_repo.transaction do |r|.
(r["posts"] ||= []) << Post.new("Post #{i}")
end
end
end
end
(stolen from Avdi Grim ruby tapas 163)
so it's your call what do you think will be the best approach :) ... I will implement what you say is the best for the direction of this project
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point about performance - running that script on my machine gives:
user system total real
yaml 45.020000 0.140000 45.160000 ( 45.167915)
pstore 1.680000 0.050000 1.730000 ( 1.725712)
Performance is unlikely to be an issue in "real use" of the system, but could become significant in a larger test suite. For example, if Onigmo implements support for extended bracket character classes (in Perl: http://perldoc.perl.org/perlrecharclass.html#Extended-Bracketed-Character-Classes), then this gem might need additional features+tests added for ruby 2.3+
So the question is really: Do we want to maximise the gem performance (with a pstore), or provide a human-readable file diff between ruby versions?
I guess even though the latter would be nice to have, it should not be the responsibility of this gem to effectively provide (what should be) ruby documentation. Perhaps I could re-run the script in YAML-mode and save the result in a blog post/something.
tl;dr: PStore is fine, ruby should document this stuff not me 😛
No description provided.