Updated the RegExp to catch Strings earlier in the tokenization process. #90

Merged
merged 1 commit into from

2 participants

Kevin Menard Aaron Patterson
Kevin Menard

This improves on my last pull request and is another attempt at the speed issue in #84. Using a single RegExp match rather than two speeds things up quite a bit. It still isn't optimal, but I got hung up on corner cases: "1_000" is an int while "_100" is a String, which really complicates the pattern. As a result, any String that starts with a digit currently takes the unhappy path, even if it contains a clearly non-numeric character later on. Someone better at RegExp might be able to get that working; I could only do it through some really messy disjunctions.
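
As an illustrative sketch (not part of the change itself), the pattern from the diff can be run against a few scalars to see which ones it treats as Strings. The regular expression is copied verbatim from the diff; the constant name `STRING_LIKE`, the helper `string_fast_path?`, and the sample inputs are hypothetical and exist only for this example.

```ruby
# Pattern added in this pull request, copied verbatim from the diff.
STRING_LIKE = /^[^\d\.:-]?[A-Za-z_\s!@#\$%\^&\*\(\)\{\}\<\>\|\/\\~;=]+/

# Hypothetical helper for illustration only; it is not part of Psych.
# Returns true when the scalar would hit the early String branch.
def string_fast_path?(scalar)
  STRING_LIKE.match?(scalar)
end

# Sample scalars chosen to cover the corner cases described above.
samples = ["hello world", "_100", "1_000", "1abc", "-.inf", ".inf", ":key", "0x1F"]

samples.each do |scalar|
  puts "#{scalar.inspect} => #{string_fast_path?(scalar)}"
end
```

Of these inputs, only "hello world" and "_100" match. "1abc" is rejected even though it is plainly not a number, which is the leading-digit limitation described above.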

With this change, the tokenization process is no longer a hot spot in some VCR-heavy tests I have. Previously, tokenization was accounting for around 26% of the total execution time.

As an aside, I think we could do better in the intermediate branches all the way down to the else clause. E.g., the two infinity cases could be collapsed into a single RegExp match. But they weren't coming up in the profiler, so I left them for now.
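
One way such a collapse might look, assuming the two cases are the positive and negative `.inf` forms (the existing branches are not shown in this diff). The names `INFINITY_PATTERN` and `parse_infinity` are hypothetical, and this is a sketch rather than the scanner's actual code.

```ruby
# Hypothetical single pattern covering both ".inf" and "-.inf",
# case-insensitively. The optional capture group records the sign.
INFINITY_PATTERN = /\A(-)?\.inf\z/i

# Hypothetical helper: returns +/-Float::INFINITY for an infinity
# scalar, or nil when the scalar is not an infinity literal.
def parse_infinity(scalar)
  match = INFINITY_PATTERN.match(scalar)
  return nil unless match

  match[1] ? -Float::INFINITY : Float::INFINITY
end

puts parse_infinity(".inf").inspect
puts parse_infinity("-.INF").inspect
puts parse_infinity("inf").inspect
```

A single match with a captured sign replaces two separate pattern tests, at the cost of one extra branch on the capture group.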

Aaron Patterson tenderlove merged commit 4be2aef into from
Showing with 3 additions and 1 deletion.
  1. +3 −1 lib/psych/scalar_scanner.rb
4 lib/psych/scalar_scanner.rb
@@ -24,7 +24,9 @@ def tokenize string
       return string if @string_cache.key?(string)
       case string
-      when /^[A-Za-z_~]/
+      # Check for a String type, being careful not to get caught by hash keys, hex values, and
+      # special floats (e.g., -.inf).
+      when /^[^\d\.:-]?[A-Za-z_\s!@#\$%\^&\*\(\)\{\}\<\>\|\/\\~;=]+/
         if string.length > 5
           @string_cache[string] = true
           return string