Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate uses of AbstractString and string indices #24

Closed
jakobnissen opened this issue Apr 28, 2023 · 4 comments
Closed

Validate uses of AbstractString and string indices #24

jakobnissen opened this issue Apr 28, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@jakobnissen
Copy link

I noticed that at least a few of your functions that take AbstractString can't actually deal with AbstractString, and/or does not handle non-ASCII strings, for example:

julia> parse(Identifier, Test.GenericString("a"))
ERROR: ArgumentError: regex matching is only available for the String type; use String(s) to convert

julia> DataToolkitBase.stringdist("ÆaÆb", "ÆaÆb")
ERROR: StringIndexError: invalid index [2], valid nearby indices [1]=>'Æ', [3]=>'a'

In general, I've found that

  • You rarely need to index into strings. If you do, use nextind, prevind etc.
  • Any time you need a pointer to a string (e.g. when matching a Regex to it), you need it to be Union{String, SubString{String}}
  • If you test your functions with Test.GenericString containing non-ASCII characters, you catch most of these bugs.
@tecosaur
Copy link
Owner

Hi Jakob, thanks for taking a look and making this issue. This project is beginning to move onto the testing/documenting/tweaking phase, so your comments are particularly appreciated. I'll see what I can do to make those functions better behaved.

If you might have any other comments, please do share them. I'd really like this project to be as well-constructed as I can make it, and so far I'm the only one who's really looked into the code, so I'm very keen on feedback.

@jakobnissen
Copy link
Author

I really just noticed it when I copy-pasted your stringdist function to test in the context of a comment of yours. I don't use your package, just wanted to let you know

@tecosaur
Copy link
Owner

tecosaur commented May 6, 2023

Yea, I'm pretty sure at this stage I'm basically the only user of this package 😛, I really want to avoid locking in bad design decisions though, which is why I'm desperate keen to get more eyes on the code.

Just mentioning this in case anything might come of it, but I understand this is a drive-by issue 🙂

@tecosaur
Copy link
Owner

The utils.jl string functions should be multi-codepoint-safe now.

@tecosaur tecosaur added the bug Something isn't working label May 16, 2024
@tecosaur tecosaur transferred this issue from tecosaur/DataToolkitCore.jl May 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants