Updated the RegExp to catch Strings earlier in the tokenization process. #90



This is an improvement over my last pull request addressing the speed issue in #84. Using a single RegExp match rather than two speeds things up quite a bit. It still isn't optimal, but I got hung up on corner cases: the fact that "1_000" is an int while "_100" is a String really complicates the pattern. As a result, any String that starts with a digit currently follows the unhappy path, even when a later character clearly rules out a number. Someone better at RegExp might be able to get that working; I could only do it with some really messy disjunctions.
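To make the corner case concrete, here is a minimal sketch of the early-String check described above. The regex and method names are illustrative assumptions, not the actual code in this PR: a scalar whose first character cannot begin a number is classified as a String immediately, while anything starting with a digit falls through to the slower number-parsing path, even when a later character rules a number out.

```ruby
# Hypothetical sketch; EARLY_STRING and classify are illustrative names.
# Cheap first-character test: anything not starting with a sign, dot, or
# digit can never be a number, so it is a String right away.
EARLY_STRING = /\A[^-+.0-9]/

def classify(scalar)
  return :string if scalar.match?(EARLY_STRING)
  # Unhappy path: starts with a number-like character, so do the full
  # (slower) check, including Ruby-style underscore separators.
  return :integer if scalar.match?(/\A[-+]?\d[\d_]*\z/)
  :string
end
```

With this shape, "_100" is caught by the cheap test (underscore is not a valid number prefix), while "1_000" and a String like "123abc" both take the unhappy path, which is exactly the behavior the paragraph above describes.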

With this change, the tokenization process is no longer a hot spot in some VCR-heavy tests I have. Previously, tokenization accounted for around 26% of the total execution time.

As an aside, I think we could do better in the intermediate branches all the way down to the else clause. For example, the two Infinity cases could be collapsed into a single RegExp match. But they weren't showing up in the profiler, so I left them for now.
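For what the aside is suggesting, a sketch of collapsing the two Infinity branches into one match might look like the following. This is an assumption about the shape of the fix, not the project's code; it captures the optional sign in a single pattern covering YAML's ".inf"/".Inf"/".INF" spellings.

```ruby
# Illustrative only: one pattern replacing separate +Infinity and
# -Infinity branches by capturing the optional sign.
INFINITY = /\A([-+]?)\.(?:inf|Inf|INF)\z/

def parse_infinity(scalar)
  return nil unless (m = INFINITY.match(scalar))
  m[1] == '-' ? -Float::INFINITY : Float::INFINITY
end
```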

@tenderlove tenderlove merged commit 4be2aef into tenderlove:master Oct 2, 2012
