Make URL schema case insensitive #2620
We should really aim to keep In fact I had a very similar situation recently, just with file extensions. Since the list of extensions in that case was fixed, it was safe to use a temporary stack-allocated buffer of fixed size and write the lower-case version of the extension into it. Since we have the possibility to add user-defined schemas here, that unfortunately doesn't work without constraints. However, I think that limiting the length of schema names in
@s-ludwig I didn't touch
inet/vibe/inet/url.d
Outdated
string lowerschema = schema;
try
    lowerschema = schema.toLower();
catch (Exception)
    return false;
What @s-ludwig means is something like this:
if (schema.length > 128) return false;
char[128] buffer;
buffer[0 .. schema.length] = schema[];
scope lowerSchema = buffer[0 .. schema.length];
lowerSchema.toLowerInPlace();
Unfortunately, I think that it is not @nogc because it decodes and thus can throw.
Another approach:
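immutable ubyte CaseOffset = 'a' - 'A';
// `buffer` is assumed to be the fixed-size char[128] from the previous suggestion
foreach (idx, char c; schema)
{
    // We should only get ASCII input, anything else is rejected
    if (c & 0b1000_0000) return false;
    // Shift upper-case ASCII letters into the lower-case range
    if (c >= 'A' && c <= 'Z')
        buffer[idx] = cast(char) (c + CaseOffset);
    else
        buffer[idx] = c;
}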
It's crude but should work. What do you think @s-ludwig ?
That should work; there is also std.ascii.toLower to make this a little bit cleaner.
A bit of refactoring, and it works well:
char[128] lowerschema = '\0';
if (schema.length >= 128) return false;
foreach (ix, char c; schema)
{
    if (!isASCII(c)) return false;
    lowerschema[ix] = toLower(c);
}
However, since StringSet requires string parameters, the buffer has to be cast, which requires a @trusted scope:
() @trusted {
    // Cast only the filled prefix; the '\0' padding must not be part of the key
    return set ? set.contains(cast(string) lowerschema[0 .. schema.length]) : false;
} ();
I think we can alternatively just change contains to accept scope const(char)[] without generating a compile error here.
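The approach discussed in this thread (a fixed-size stack buffer, ASCII-only lower-casing, and a lookup that accepts a non-owning slice) could look roughly like the sketch below. The function name `isKnownSchemaSketch` and the hard-coded schema list are assumptions made for illustration; the real code looks the schema up in a StringSet, which is not reproduced here.

```d
import std.ascii : isASCII, toLower;

/// Hypothetical helper (not vibe.d API): case-insensitive check against a
/// small fixed list of schemas, without GC allocation or exceptions.
bool isKnownSchemaSketch(scope const(char)[] schema) @safe nothrow @nogc
{
    char[128] buffer;                       // fixed-size stack storage
    if (schema.length > buffer.length) return false;

    foreach (i, char c; schema)
    {
        if (!isASCII(c)) return false;      // reject non-ASCII input
        buffer[i] = cast(char) toLower(c);  // ASCII-only lower-casing, no decoding
    }

    // Only the filled prefix is compared, never the unused tail of the buffer.
    const(char)[] lowered = buffer[0 .. schema.length];
    switch (lowered)
    {
        case "http", "https", "ftp", "file": return true;
        default: return false;
    }
}
```

Because the lookup takes a `scope const(char)[]`, no cast to `string` and no `@trusted` block is needed in this variant.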
That's not currently the case though, we always allocate when calling I think some users will want to keep user input, some will prefer to normalize it. We would really like a way to normalize it, even if it is a method called explicitly that allocates, as it's a rare (in the grand scheme of things) event. |
Hence why "the plan" ;-)
We are not only talking about user input; you might be loading a bunch of URLs out of an XML or CSV file, where allocating per item can have a considerable impact. Adding a normalization functionality makes sense, though, and I'd also see that allocate in all realistic OT: would see the normalization doing three things:
Do you have anything else in mind (apart from what the "fuzzy" parser we've talked about earlier already does)?
Remove the |
inet/vibe/inet/url.d
Outdated
try {
    lowerschema = schema.toLower();
} catch (Exception) {
    return;
Why is this function nothrow, @s-ludwig? I think any error should be reported, and not silently ignored.
Should be assert(false, e.msg) instead of return.
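As a minimal sketch of that suggestion: inside a nothrow function, the exception is caught with a named variable and turned into an assertion failure, so the error surfaces instead of being dropped. The function name `lowerOrAbort` is hypothetical and only stands in for the code under review.

```d
import std.uni : toLower;

/// Hypothetical example: a nothrow wrapper that lower-cases a string and
/// treats an unexpected exception as a programming error.
string lowerOrAbort(string s) nothrow
{
    try {
        return s.toLower();
    } catch (Exception e) {
        // Report the failure instead of returning silently.
        assert(false, e.msg);
    }
}
```

Note that the catch clause has to name the exception (`catch (Exception e)`) for `e.msg` to be available.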
Apart from that
Fixes #2619
Default port and common internet schema checks are now case-insensitive. The schema stored in the URL structure still preserves the letter case the user supplied.
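The behaviour described above can be illustrated with a small stand-alone sketch. `UrlSketch`, `equalsIgnoreAsciiCase`, and the port table are hypothetical and are not the vibe.d URL type; they only show a case-insensitive check that leaves the stored schema untouched.

```d
import std.ascii : toLower;

// Hypothetical illustration only; this is not the vibe.d URL type.
struct UrlSketch
{
    string schema; // stored exactly as the user wrote it

    // Case-insensitive lookup; returns 0 for schemas not in this small table.
    ushort defaultPort() const @safe nothrow @nogc
    {
        if (equalsIgnoreAsciiCase(schema, "http")) return 80;
        if (equalsIgnoreAsciiCase(schema, "https")) return 443;
        if (equalsIgnoreAsciiCase(schema, "ftp")) return 21;
        return 0;
    }
}

// Compares two strings byte by byte, ignoring ASCII letter case only.
bool equalsIgnoreAsciiCase(scope const(char)[] a, scope const(char)[] b)
@safe pure nothrow @nogc
{
    if (a.length != b.length) return false;
    foreach (i, char c; a)
        if (toLower(c) != toLower(b[i])) return false;
    return true;
}
```

With this sketch, `UrlSketch("HTTPS").defaultPort` yields 443 while `schema` still reads "HTTPS".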