Hash with djb2 function #10
This sounds good to me! The existing hash comes from the K&R C book (second edition on page 144). Though, for some reason, I must've thought it came from The Practice of Programming book (hence TPOP). Either way, it looks like djb2 has decent qualities. It will expand to be one instruction larger, however, due to the initial constant being larger than the acceptable range of a single I suppose at the end of the day this choice would come down to the nuanced characteristics of each hash function. I doubt the extra instruction with djb2 would make any noticeable difference (hashing isn't on a hot path of the interpreter). For this PR, I'd want to go ahead and simply rename |
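For readers comparing the two functions discussed above, here is a minimal C sketch of each. The names `kr_hash` and `djb2_hash` are made up for illustration and are not the project's identifiers; the project itself implements its hash in assembly, so this only shows the arithmetic, not the actual code. The cast to `unsigned char` and the 32-bit result width are also assumptions.

```c
#include <stdint.h>

/* K&R-style string hash: initial value 0, multiplier 31.
 * The book version reduces the result modulo a table size;
 * that step is left out here. */
uint32_t kr_hash(const char *s)
{
    uint32_t h = 0;
    for (; *s != '\0'; s++)
        h = (unsigned char)*s + 31u * h;
    return h;
}

/* djb2: initial value 5381, multiplier 33
 * (often written as (h << 5) + h). */
uint32_t djb2_hash(const char *s)
{
    uint32_t h = 5381u;
    for (; *s != '\0'; s++)
        h = 33u * h + (unsigned char)*s;
    return h;
}
```

The only structural differences are the two constants: a nonzero starting value and a different multiplier. That is why the change costs so little in code size.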
Oh nooo!! I totally missed the fact that Now thinking about it, I do somewhat prefer fewer instructions even though you say it won't make a difference... let me think about this some more. |
Indeed, it adds only one instruction to
As you said, it shouldn't make a big difference. |
Awesome, I like this change! I've always been thinking (in the back of my mind) about how (or if) the interpreter should handle hash collisions. In Forth, overriding words with new implementations is super common so collisions are "expected" behavior in that sense. If two words ever collided unintentionally, then the program would be in a bad state without the user really knowing why (unless they know to run the I thought about simply printing something special if a word gets defined that replaces an existing one (maybe print |
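One way the "print something on redefinition" idea could look, sketched in C rather than the project's assembly. Everything here is hypothetical: the `struct word` layout, the `define_word` name, and the assumption that the dictionary is a singly linked list holding only a 32-bit hash per word.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dictionary entry: only the hash of the name is stored. */
struct word {
    uint32_t hash;
    struct word *prev;   /* previously defined word, or NULL */
};

/* Link `w` in as the newest word. Returns true when an older entry
 * already carries the same hash, i.e. the new word shadows it. */
bool define_word(struct word **latest, struct word *w, uint32_t hash)
{
    bool shadows = false;
    for (const struct word *p = *latest; p != NULL; p = p->prev) {
        if (p->hash == hash) {
            shadows = true;
            break;
        }
    }
    w->hash = hash;
    w->prev = *latest;
    *latest = w;
    return shadows;
}
```

A caller could print a short notice whenever `define_word` returns true. Because only hashes are stored, this cannot tell an intentional redefinition from an accidental collision; it can only make either case visible to the user.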
Yep, I came to pretty much the same conclusion regarding the likelihood of a collision. I was initially skeptical about word hashing due to the "performance hit", but in this case it means a word name only occupies 4 bytes instead of N bytes, which provides quite a bit of savings as the dictionary grows, which is particularly useful on small MCUs. Now we just need to fix the O(N) lookup time for user-defined dictionary words. I'm testing some ideas for this and will get back to you in a separate issue. |
Yea, the Yea, the O(N) lookups are slightly fussy, but thankfully they aren't on the hot path. Once a word is looked up and linked, it will never be looked up again (by the word being defined). I could possibly see an argument claiming that "startup times" could be negatively impacted by the linear lookup behavior: defining M words in sequence could lead to O(N*M) worst case. In practice, however, new words tend to refer to other recent words which keeps the behavior more linear than quadratic. Plus, I worry that doing anything fancier (some sort of trie?) would hurt code size and general readability. It is probably worth some research, though. |
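To make the lookup cost concrete, here is a small C sketch of a newest-first linear search that also reports how many entries it visited. The `struct entry` layout, the `find_entry` name, and the newest-first walk are assumptions for illustration, not the interpreter's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dictionary entry, linked from newest to oldest. */
struct entry {
    uint32_t hash;
    const struct entry *prev;
};

/* Search from the newest entry backwards. On return, *steps holds the
 * number of entries visited, whether or not a match was found. */
const struct entry *find_entry(const struct entry *latest,
                               uint32_t hash, size_t *steps)
{
    size_t visited = 0;
    for (const struct entry *p = latest; p != NULL; p = p->prev) {
        visited++;
        if (p->hash == hash) {
            *steps = visited;
            return p;
        }
    }
    *steps = visited;
    return NULL;
}
```

With this layout a reference to a recently defined word returns after a step or two, while a reference to one of the oldest words (or a miss) walks all N entries. That matches the point above: the O(N*M) worst case only shows up when new definitions keep reaching far back into the dictionary.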
This PR makes 2 changes:
- .asm file (smaller binary?)
- tpop hash function with values used in the djb2 hash function

I don't think it'll make a big difference performance-wise, but I like the idea of using a well-known initial magic constant or hash value with a good multiplier.