Updated the RegExp to catch Strings earlier in the tokenization process.

This improves on my last pull request, which tried to address the speed issue in #84. Using a single RegExp match instead of two speeds things up considerably. It still isn't optimal, but I got hung up on corner cases: the fact that "1_000" is an Integer while "_100" is a String really complicates the pattern. As a result, any String that starts with a digit currently follows the unhappy path, even if it later contains a character that clearly rules out a number. Someone better at RegExps might be able to handle that case; I could only do it with some really messy alternations.
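To make the corner cases concrete, here is a small check that runs the new pattern from the diff against a few sample scalars. `STRING_FAST_PATH` and `fast_path?` are hypothetical names used only for illustration. A match only means the scalar enters the early String branch of `tokenize`, where (per the diff) strings longer than five characters are cached and returned immediately.

```ruby
# Pattern copied verbatim from the new `when` clause in the diff.
# STRING_FAST_PATH and fast_path? are hypothetical names for illustration.
STRING_FAST_PATH = /^[^\d\.:-]?[A-Za-z_\s!@#\$%\^&\*\(\)\{\}\<\>\|\/\\~;=]+/

# True when the scalar would enter the early String branch of `tokenize`.
def fast_path?(scalar)
  STRING_FAST_PATH.match?(scalar)
end

%w[hello _100 1_000 100abc -.inf :key].each do |scalar|
  puts "#{scalar}: #{fast_path?(scalar) ? 'early String branch' : 'falls through'}"
end
# hello: early String branch
# _100: early String branch
# 1_000: falls through
# 100abc: falls through
# -.inf: falls through
# :key: falls through
```

Note that "100abc" falls through even though its trailing letters rule out a number; that is the digit-first limitation described above.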

With this change, tokenization is no longer a hot spot in some of my VCR-heavy tests. Previously, it accounted for around 26% of total execution time.

As an aside, I think the intermediate branches leading up to the else clause could be improved as well. For example, the two infinity cases could be collapsed into a single RegExp match. But they weren't showing up in the profiler, so I left them alone for now.
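A minimal sketch of that idea, assuming the two existing branches match `.inf` and `-.inf` case-insensitively: one pattern with an optional `-` capture covers both, and the capture decides the sign. `tokenize_infinity` is a hypothetical stand-alone method, not part of Psych, and the `\A`/`\z` anchors are an assumption (the real branches may use `^`/`$`).

```ruby
# Hypothetical stand-alone method; in Psych this would be a single `when` clause in `tokenize`.
def tokenize_infinity(string)
  case string
  when /\A(-)?\.inf\z/i
    # Regexp#=== sets $~, so $1 holds the optional leading minus sign (or nil).
    $1 ? -Float::INFINITY : Float::INFINITY
  else
    string
  end
end

p tokenize_infinity(".inf")   # => Infinity
p tokenize_infinity("-.INF")  # => -Infinity
p tokenize_infinity("inf")    # => "inf"
```

This trades two `when` branches (and two regexp matches on the miss path) for one.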

@@ -24,7 +24,9 @@ def tokenize string
return string if @string_cache.key?(string)
case string
- when /^[A-Za-z_~]/
+ # Check for a String type, being careful not to get caught by hash keys, hex values, and
+ # special floats (e.g., -.inf).
+ when /^[^\d\.:-]?[A-Za-z_\s!@#\$%\^&\*\(\)\{\}\<\>\|\/\\~;=]+/
if string.length > 5
@string_cache[string] = true
return string
