Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider moving to std::string_view for string handling #708

Open
timwoj opened this issue Dec 13, 2019 · 4 comments
Open

Consider moving to std::string_view for string handling #708

timwoj opened this issue Dec 13, 2019 · 4 comments

Comments

@timwoj
Copy link
Contributor

@timwoj timwoj commented Dec 13, 2019

I’m starting to dive into using std::string_view in prep for maybe bringing in {fmt} and I wanted to get thoughts on standardizing on ways to do strings in general. We don’t really have a standard way of handling strings, and the code is kind of all over the place depending on when something was implemented. Some methods will take or return const char* and some will return std::string or take const or not const std::string&

The advice around std::string_view is that you should use it whenever you have a method that takes a string that won’t be modified and doesn’t have a longer life time than the caller of the method (1 and 2). This covers pretty much every place we use const std::string& and nearly every place we use const char* (there’s a few outliers). For function arguments, I’m proposing we follow that advice and move over to using std::string_view (non-const, pass-by-value) for most of the places we take strings as arguments that won’t be modified.

The main argument for not using string_view is that you can't guarantee that it's going to be null-terminated. Given that we're already passing around a large number of bare char* pointers that may or may not be null-terminated, I think that argument is moot.

For strings that can be modified, move everything over to take either std::string& or std::string* depending on whether we want to strictly follow the coding style guidelines on objects are modified in a function. I don't think there's an awful lot of places that we pass a string into a function where it gets modified.

For returning strings from methods, we should standardize on returning std::string. In nearly every case, RVO optimizes away copying the string. Does anyone have any arguments for staying with returning bare const char* pointers?

@jsiwek

This comment has been minimized.

Copy link
Member

@jsiwek jsiwek commented Dec 14, 2019

The advice around std::string_view is that you should use it whenever you have a method that takes a string that won’t be modified and doesn’t have a longer life time than the caller of the method (1 and 2). This covers pretty much every place we use const std::string& and nearly every place we use const char* (there’s a few outliers).

Yes, a lot of those places I'd agree might be better w/ string_view. Especially for const std::string& parameters that just end up copying string literals I pass in.

For strings that can be modified, move everything over to take either std::string& or std::string* depending on whether we want to strictly follow the coding style guidelines on objects are modified in a function. I don't think there's an awful lot of places that we pass a string into a function where it gets modified.

Yeah, also not sure if there many places that apply to what you're thinking. But if the general sentiment is to do less manual management of memory pointed to by char* types and instead use std::string, sounds good to me.

For returning strings from methods, we should standardize on returning std::string. In nearly every case, RVO optimizes away copying the string. Does anyone have any arguments for staying with returning bare const char* pointers?

Can you give an example? I thought if you return a string literal into an std::string it's going to do an implicit conversion/copy of that string literal. The RVO optimizes away copy of the implicit std::string object itself, but still a copy of the bytes happens. If that's true, I'd favor still returning const char*. I mean the documentation for the function could just be clear enough that one understands they don't take ownership from a memory management perspective. But for cases where they would take ownership, yeah, that's a case I'd probably rather have used std:string for the originating value in the first place instead of allocating some buffer manually.

@ZekeMedley

This comment has been minimized.

Copy link
Member

@ZekeMedley ZekeMedley commented Dec 15, 2019

I thought if you return a string literal into an std::string it's going to do an implicit conversion/copy of that string literal. The RVO optimizes away copy of the implicit std::string object itself, but still a copy of the bytes happens.

By no means a deep dive, but a quick skim of how LLVM and Microsoft's string implementations seem to imply that moving does indeed copy the underlying string literal.

LLVM

https://github.com/llvm/llvm-project/blob/0133dc3983c2ed477a198d414d5d7ad4b95db549/libcxx/include/__string#L157-L176

Microsoft

https://github.com/microsoft/STL/blob/b61483432979aafb6b89b245fb8e82eee7aee32f/stl/inc/xstring#L85-L89

@timwoj

This comment has been minimized.

Copy link
Contributor Author

@timwoj timwoj commented Dec 16, 2019

Yes, a lot of those places I'd agree might be better w/ string_view. Especially for const std::string& parameters that just end up copying string literals I pass in.

Alright, I'll dive into this.

Can you give an example? I thought if you return a string literal into an std::string it's going to do an implicit conversion/copy of that string literal.

Ah, you're correct. I could have sworn that compilers would optimize away creating the string now, but I guess that's not really possible now that I sit here thinking about it. Fixed string literals are still best returned as const char*. https://godbolt.org/z/_6mVSt has an example.

@timwoj timwoj self-assigned this Jan 2, 2020
@timwoj

This comment has been minimized.

Copy link
Contributor Author

@timwoj timwoj commented Jan 2, 2020

There's also an argument to be made against string_view when it comes to API changes. One big problem is that string_view can't be created from a null pointer. I've only found a few places we do this internally (all in scan.l) but it's something to look out for. Passing a null to a string_view argument causes an exception to be thrown, and in our case crashes zeek. There's a proposal out to fix this issue in the standard, but who knows if or when it will land.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.