Optimized regexp for matching tags. #206
The second commit adds support for non-ASCII tags. My blog is in Russian, and I kept wondering why only English tags worked. So I studied the writefreely sources and some of the Golang docs, and it seems that Golang regexp fully supports only the English language. For example, I've added a very simple workaround that attempts to match
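To illustrate the behavior described above: in Go's regexp syntax, `\w` covers only ASCII word characters (`[0-9A-Za-z_]`), while Unicode classes such as `\p{L}` cover letters from any script. The following is a minimal sketch, not the workaround from this PR; both patterns and the `findTags` helper are hypothetical and are not taken from the writefreely sources.

```go
package main

import (
	"fmt"
	"regexp"
)

// asciiTag uses \w, which in Go's RE2 syntax is ASCII-only: [0-9A-Za-z_].
// Hypothetical pattern for illustration.
var asciiTag = regexp.MustCompile(`#(\w+)`)

// unicodeTag uses Unicode letter and number classes instead of \w.
// Hypothetical pattern for illustration, not the one from this PR.
var unicodeTag = regexp.MustCompile(`#([\p{L}\p{N}_]+)`)

// findTags returns the tag names (without the leading '#') that re captures in text.
func findTags(re *regexp.Regexp, text string) []string {
	var tags []string
	for _, m := range re.FindAllStringSubmatch(text, -1) {
		tags = append(tags, m[1])
	}
	return tags
}

func main() {
	text := "Пост про #golang и #блог"
	fmt.Println(findTags(asciiTag, text))   // only the ASCII tag is found
	fmt.Println(findTags(unicodeTag, text)) // both tags are found
}
```

With `\w`, the Cyrillic tag is skipped because no ASCII word character follows the `#`; with the Unicode classes, both tags are captured.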
Thanks for contributing this. While I think this is a good stopgap for instances that need it, I'd prefer we implement a more permanent fix that supports all possible character sets on the front and back end. Part of this work will likely involve larger database changes, including tracking hashtags associated with posts instead of doing things with regular expressions. I'll leave this open for now, but ideally we can fix #219 with a more robust system than what we have today.
Isn't it better to have a solution that is not perfect but works today while waiting for a better system? It's your choice, of course, but this PR replaces one regexp with another regexp, so while it's not a perfect solution, the original regexp isn't optimal either. E.g.
I agree and am fine with removing the
Closing now since there hasn't been any progress. If you want to make those improvements, please feel free to reopen this!
A tiny optimization for the SQLite regexp matching, removing the no-op .* before and after the regexp matching tags. I'm not sure if it affects performance noticeably, but it's an inaccurate regexp because .* will match anything and doesn't really have to be included.
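The point about the redundant `.*` can be checked with a small sketch. It assumes the SQLite REGEXP function performs an unanchored search, as Go's `MatchString` does; the tag pattern and the `sameResult` helper are hypothetical and are not the ones from the PR. Under that assumption, wrapping a pattern in `.*` on both sides does not change which rows match.

```go
package main

import (
	"fmt"
	"regexp"
)

// tagPattern is a hypothetical tag pattern used only for illustration.
const tagPattern = `#golang\b`

var (
	// padded wraps the pattern in .* on both sides, like the original query.
	padded = regexp.MustCompile(`.*` + tagPattern + `.*`)
	// bare is the same pattern without the surrounding .*
	bare = regexp.MustCompile(tagPattern)
)

// sameResult reports whether both patterns agree on every input.
func sameResult(inputs []string) bool {
	for _, s := range inputs {
		if padded.MatchString(s) != bare.MatchString(s) {
			return false
		}
	}
	return true
}

func main() {
	inputs := []string{
		"intro #golang outro",
		"no tags here",
		"line one\n#golang on line two",
		"#golangs",
	}
	fmt.Println(sameResult(inputs))
}
```

Because an unanchored search already scans for the pattern at every position, the surrounding `.*` only adds work for the matcher without changing the result.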