Use Diplomat &[&str], cmp::Ordering, unvalidated UTF Writeable #4786

Merged
merged 21 commits into from
Apr 10, 2024

Conversation

robertbastian
Member

No description provided.

@robertbastian robertbastian requested review from Manishearth and a team as code owners April 9, 2024 06:32
Manishearth
Manishearth previously approved these changes Apr 9, 2024
@robertbastian robertbastian changed the title Use Diplomat &[&str] Use Diplomat &[&str], cmp::Ordering Apr 9, 2024
Manishearth
Manishearth previously approved these changes Apr 9, 2024
Member

@Manishearth Manishearth left a comment

Would like @sffc to approve writeable changes

ffi.Pointer<_SliceUtf8> allocIn(ffi.Allocator alloc) {
final slice = alloc<_SliceUtf8>(length);
for (var i = 0; i < length; i++) {
final codeUnits = Utf8Encoder().convert(_strings[i]);

Maybe extract the Utf8Encoder() to a field? It is even a const constructor, so writing const Utf8Encoder encoder could increase performance.

Member Author

Is that actually necessary? I'd hope the compiler doesn't need to create an empty object to call the method.

I don't know if the compiler optimizes that, better to extract it and be sure IMO.

Prefer using const.

Const constructors may be used to get a constant, but they can also be used to instantiate a new object. Using the const keyword ensures the result is a constant. (In AOT compilation, it also ensures the object is dropped if the method doesn't actually use it.)

Member Author

A const field or a const variable?

utils/writeable/src/utf.rs Outdated Show resolved Hide resolved
@robertbastian robertbastian changed the title Use Diplomat &[&str], cmp::Ordering Use Diplomat &[&str], cmp::Ordering, unvalidated UTF Writeable Apr 10, 2024
@robertbastian robertbastian requested a review from sffc April 10, 2024 15:48
utils/writeable/src/utf.rs Outdated Show resolved Hide resolved
Comment on lines 62 to 66
out.push_str(valid);
out.push_str(
char::REPLACEMENT_CHARACTER
.encode_utf8(&mut [0; char::REPLACEMENT_CHARACTER.len_utf8()]),
);
Member

Suggestion: why not just use <String as core::fmt::Write>::write_char and write_str here?

Member Author

Because they're fallible.

Member

Suggestion: use String::push(char) for the second one.

Member Author

Hmm, that has an extra branch which might not be optimized out. It's probably cheapest to just use push_str.

Member Author

Confirmed in godbolt that it does in fact get optimized out.
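
The trade-off in this thread can be sketched as follows. This is a minimal illustration, not the code from the PR: the function name `write_lossy` is hypothetical, and it assumes the input is a byte slice that may contain invalid UTF-8. It uses `String::push_str` and `String::push`, which return `()`, rather than the `core::fmt::Write` methods, which return `fmt::Result` and so would need error handling.

```rust
/// Hypothetical helper (an assumption for illustration, not the PR's code):
/// appends `bytes` to `out`, replacing each invalid UTF-8 sequence with U+FFFD.
pub fn write_lossy(mut bytes: &[u8], out: &mut String) {
    loop {
        match core::str::from_utf8(bytes) {
            Ok(valid) => {
                // Everything that remains is valid UTF-8.
                out.push_str(valid);
                return;
            }
            Err(e) => {
                let (valid, rest) = bytes.split_at(e.valid_up_to());
                // The prefix is known to be valid; re-checking it avoids `unsafe`.
                out.push_str(core::str::from_utf8(valid).unwrap_or(""));
                // Infallible, unlike `<String as core::fmt::Write>::write_char`.
                out.push(char::REPLACEMENT_CHARACTER);
                match e.error_len() {
                    // Skip the invalid sequence and keep scanning.
                    Some(n) => bytes = &rest[n..],
                    // The input ended in the middle of a sequence.
                    None => return,
                }
            }
        }
    }
}
```

Calling `write_lossy(b"ab\xFFcd", &mut s)` on an empty `String` leaves `s` holding `ab`, U+FFFD, then `cd`.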

Co-authored-by: Shane F. Carr <shane@unicode.org>
@robertbastian robertbastian requested a review from sffc April 10, 2024 15:56
);

out.push_str(valid);
out.push('�');
Member

Nit, optional: I would prefer explicitly writing out the code point so that I know this is a literal replacement character, U+FFFD, and not some other weird Unicode question mark symbol.

Suggested change
- out.push('�');
+ out.push('\u{FFFD}');

I say this aware of the following section in the style guide.

https://github.com/unicode-org/icu4x/blob/main/documents/process/style_guide.md#render-visible-characters-in-docs-and-code--suggested

Member Author

No, I recognise the question mark as the replacement character; I don't recognise the escape sequence.
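
For readers weighing the two positions, the spellings that appear in this PR all denote the same `char`. The sketch below is only an illustration; the function name `replacement_spellings` is hypothetical and does not come from the PR.

```rust
/// Hypothetical helper (an assumption for illustration): returns the three
/// spellings seen in this review, in the order escape sequence,
/// standard-library constant, literal glyph.
pub fn replacement_spellings() -> [char; 3] {
    ['\u{FFFD}', char::REPLACEMENT_CHARACTER, '�']
}
```

All three entries compare equal, so the choice between them is purely one of readability in source code.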

@robertbastian robertbastian merged commit 3412e3b into unicode-org:main Apr 10, 2024
30 checks passed
@robertbastian robertbastian deleted the list branch April 10, 2024 16:53

5 participants