This crate defines ArcStr, a reference counted string type. It's essentially trying to be a better Arc<str> or Arc<String>, at least for most use cases.
ArcStr intentionally gives up some of the features of Arc which are rarely used for Arc<str> (Weak, Arc::make_mut, ...). In exchange, it gets a number of features that are very useful, especially for strings. Most notably, it has robust support for cheap/zero-cost ArcStrs holding static data (for example, string literals).
(Aside from this, it's also a single pointer, which can be good for performance and FFI.)
Additionally, if the substr feature is enabled (and it is by default), we provide a Substr type, which is essentially a (ArcStr, Range<usize>) with better ergonomics and more functionality, and which represents a shared slice of a "parent" ArcStr. (Note that in reality, u32 is used for the index type, but this is not exposed in the API, and can be transparently changed via a cargo feature.)
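To make the "(ArcStr, Range<usize>)" description concrete, here is a minimal conceptual sketch that uses only the standard library. It is not the crate's implementation: SharedSlice and its methods are hypothetical names, Arc<str> stands in for ArcStr, and the u32 bounds and the panic on overflow are modeled on the description in this README.

```rust
use std::convert::TryFrom;
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical stand-in for a shared slice: a parent string plus a byte range.
/// `Arc<str>` plays the role of `ArcStr` here (an assumption for illustration).
#[derive(Clone, Debug)]
struct SharedSlice {
    parent: Arc<str>,
    // Compact `u32` bounds instead of `usize`, as described in the text.
    start: u32,
    end: u32,
}

impl SharedSlice {
    /// Panics if `range` is not a valid slice of `parent` (out of bounds or
    /// not on a char boundary), or if a bound does not fit in a `u32`.
    fn new(parent: Arc<str>, range: Range<usize>) -> Self {
        assert!(parent.get(range.clone()).is_some(), "invalid range for parent");
        let start = u32::try_from(range.start).expect("index overflows u32");
        let end = u32::try_from(range.end).expect("index overflows u32");
        SharedSlice { parent, start, end }
    }

    /// The slice's text, borrowed from the shared parent (no copy).
    fn as_str(&self) -> &str {
        &self.parent[self.start as usize..self.end as usize]
    }

    /// The byte range within the parent, widened back to `usize`.
    fn range(&self) -> Range<usize> {
        self.start as usize..self.end as usize
    }

    /// The parent string this slice keeps alive.
    fn parent(&self) -> &Arc<str> {
        &self.parent
    }
}
```

Creating a slice only bumps the parent's reference count; the text itself is never copied, which is the property that makes this kind of type attractive for parsers and tokenizers.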
Here's a quick tour of the distinguishing features (there's also a list of benefits in the ArcStr documentation, which covers some of the reasons you might want to use it over other alternatives). Note that it offers essentially the full set of string-like functionality you would probably expect from an immutable string type; these are just the unique selling points:
use arcstr::ArcStr;
// Works in const:
const AMAZING: ArcStr = arcstr::literal!("amazing constant");
assert_eq!(AMAZING, "amazing constant");
// `arcstr::literal!` input can come from `include_str!` too:
const MY_BEST_FILES: ArcStr = arcstr::literal!(include_str!("my-best-files.txt"));
Or, you can define the literals in normal expressions. Note that these literals are essentially "Zero Cost". Specifically, below we not only don't allocate any heap memory to instantiate wow or any of the clones, we also don't have to perform any atomic reads or writes when cloning or dropping them (or during any other operations on them).
let wow: ArcStr = arcstr::literal!("Wow!");
assert_eq!("Wow!", wow);
// This line is probably not something you want to do regularly,
// but as mentioned, causes no extra allocations, nor performs any
// atomic loads, stores, rmws, etc.
let wowzers = wow.clone().clone().clone().clone();
// At some point in the future, we can get a `&'static str` out of one
// of the literal `ArcStr`s too.
let static_str: Option<&'static str> = ArcStr::as_static(&wowzers);
assert_eq!(static_str, Some("Wow!"));
// Note that this returns `None` for dynamically allocated `ArcStr`:
let dynamic_arc = ArcStr::from(format!("cool {}", 123));
assert_eq!(ArcStr::as_static(&dynamic_arc), None);
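Why can clones of a literal avoid atomic operations entirely? One way to picture it is a value that remembers whether its data is static. The sketch below is a conceptual model built only on the standard library, not the crate's layout (the real ArcStr is a single pointer, whereas this enum is wider). CheapStr and its methods are hypothetical names chosen for illustration.

```rust
use std::sync::Arc;

/// Hypothetical two-variant model of "static or reference counted" strings.
/// This is an assumption for illustration, not how `ArcStr` is laid out.
#[derive(Clone, Debug)]
enum CheapStr {
    /// Static data: cloning copies a reference, dropping does nothing,
    /// so no allocation and no atomic counter is involved.
    Static(&'static str),
    /// Heap data: cloning increments an atomic reference count.
    Shared(Arc<str>),
}

impl CheapStr {
    /// View the contents regardless of where they live.
    fn as_str(&self) -> &str {
        match self {
            CheapStr::Static(s) => *s,
            CheapStr::Shared(s) => &**s,
        }
    }

    /// Only static data can be handed out with a `'static` lifetime.
    fn as_static(&self) -> Option<&'static str> {
        match self {
            CheapStr::Static(s) => Some(*s),
            CheapStr::Shared(_) => None,
        }
    }
}
```

In this model the static case never touches a counter, which mirrors the behavior described above for literals, while dynamically built strings pay the usual reference counting cost.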
Open TODO: Include Substr usage here, as it has some compelling use cases too!
It's a normal Rust crate; drop it in your Cargo.toml's dependencies section. In the somewhat unlikely case that you're here and don't know how:
[dependencies]
# ...
arcstr = { version = "...", features = ["..."] }
The following cargo features are available. Only substr is on by default currently.
- std (off by default): Turn on to use std::process's aborting, instead of triggering an abort using the "double-panic trick".
  Essentially, there's one case where we need to abort, and that's during a catastrophic error where you leak the same (dynamic) ArcStr 2^31 times on 32-bit systems, or 2^63 times on 64-bit systems. If this happens, we follow libstd's lead and just abort because we're hosed anyway. If std is enabled, we use the real std::process::abort. If std is not enabled, we trigger an abort by triggering a panic while another panic is unwinding, which is either defined to cause an abort, or causes one in practice.
  In practice you will never hit this edge case, and it still works in no_std, so no_std is the default. If you have to turn this on because you hit this ridiculous case and found our handling bad, let me know.
  Concretely, the difference here is that without this, this case becomes a call to core::intrinsics::abort, and not std::process::abort. It's a ridiculously unlikely edge case to hit, but if you do hit it, std::process::abort results in a SIGABRT whereas core::intrinsics::abort results in a SIGILL, and the former has meaningfully better UX. That said, it's extraordinarily unlikely that you manage to leak 2^31 or 2^63 copies of the same ArcStr, so in our opinion it's not really worth depending on std by default.
- serde (off by default): Enable serde serialization of ArcStr. Note that this doesn't do any fancy deduping or whatever.
- substr (on by default): Implement the Substr type and related functions.
- substr-usize-indices (off by default, implies substr): Use usize under the hood for the boundaries, instead of u32.
  Without this, if you use Substr and an index would overflow a u32, we unceremoniously panic.
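The overflow case and the "double-panic trick" from the std item can be sketched with the standard library alone. This is an illustration under stated assumptions, not the crate's code: MAX_REFCOUNT, increment_overflowed, clone_count, and abort_via_double_panic are hypothetical names, and the isize::MAX threshold is modeled on what std's Arc does.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Assumed threshold, modeled on std's `Arc`: 2^31 - 1 on 32-bit targets,
/// 2^63 - 1 on 64-bit targets.
const MAX_REFCOUNT: usize = isize::MAX as usize;

/// Abort without `std::process::abort` by panicking while another panic is
/// already unwinding. The guard's destructor runs during unwinding from the
/// first panic and panics again, which aborts the process.
fn abort_via_double_panic() -> ! {
    struct PanicOnDrop;
    impl Drop for PanicOnDrop {
        fn drop(&mut self) {
            panic!("second panic while unwinding: aborting");
        }
    }
    let _guard = PanicOnDrop;
    panic!("reference count overflow");
}

/// Increment the counter and report whether the previous value was already
/// past the threshold (meaning an absurd number of clones were leaked).
fn increment_overflowed(count: &AtomicUsize) -> bool {
    let old = count.fetch_add(1, Ordering::Relaxed);
    old > MAX_REFCOUNT
}

/// What a clone of a dynamically allocated string would do in this model.
fn clone_count(count: &AtomicUsize) {
    if increment_overflowed(count) {
        abort_via_double_panic();
    }
}
```

With std available, the body of the abort helper could simply call std::process::abort(), which is the behavior the std feature selects.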
While this crate does contain a decent amount of unsafe code, we justify this in the following ways:
- We have a very high test coverage ratio (essentially the only uncovered functions are the out-of-memory handler, which just calls alloc::handle_alloc_error, and an extremely pathological integer overflow where we just abort).
- All tests pass under various sanitizers: asan, msan, tsan, and miri.
- We have a few loom models, although I'd love to have more.
- Our tests pass on a ton of different targets (thanks to cross for making many of these possible, and even easy):
  - Linux x86, x86_64, armv7 (arm32), aarch64 (arm64), riscv64, mips32, and mips64 (the mips32 and mips64 targets let us check big-endian 32-bit and 64-bit, although we don't have any endian-specific code at the moment).
  - Windows 32-bit and 64-bit, on both GNU and MSVC toolchains.
  - macOS on x86_64.
Additionally, we test on Rust stable, beta, nightly, and our MSRV (see badge above for MSRV).
Note that the above is not a list of supported platforms. In general I expect arcstr to support all platforms Rust supports, except for ones with target_pointer_width="16", which should work if you turn off the substr feature. That said, if you'd like me to add a platform to the CI coverage to ensure it doesn't break, just ask* (although, if it's harder than adding a line for another cross target, I'll probably need you to justify why it's unlikely to be covered by the existing platform tests).
* This is why riscv64 is in the list.