- Types: StrSlice for strings, Slice for bytes, StrChar for characters
- Gas efficient (see gas benchmarks)
- Versioned releases, available for both foundry and hardhat
- Simple imports, you only need e.g.
StrSliceandtoSlice StrSliceenforces UTF-8 character boundaries;StrCharvalidates character encoding- Clean, well-documented and thoroughly-tested source code
- Optional PRBTest extension with assertions like
assertContainsandassertLtfor both slices and nativebytes,string SliceandStrSliceare value types, not structs- Low-level functions like memchr, memcmp, memmove etc
Gas costs - Strings are expensive.
- Storage: Storing strings on-chain costs ~20,000 gas per 32-byte slot. A microblog-length string could cost $5-50+ depending on gas prices
- Operations: String comparison, concatenation, and manipulation require loops over bytes—each iteration costs gas
- Memory expansion: Longer strings expand memory, which has quadratic cost growth
- No native support: Solidity has no built-in string functions, so any operation requires library code (more bytecode = higher deployment cost)
- Determinism concerns - Strings introduce complexity:
- Variable length makes gas estimation unpredictable
- UTF-8 encoding edge cases can cause unexpected behavior
- Comparison/sorting depends on encoding, not semantic meaning
Best practice alternatives:
| Instead of... | Use... |
|---|---|
| String identifiers | bytes32 (fixed, cheap comparison) |
| String enums | uint8 enum types |
| User-facing text | Events (logged off-chain, much cheaper) |
| Configuration | Off-chain storage with on-chain hash verification |
Despite gas concerns, there are excellent use cases for on-chain string manipulation:
- Input validation in view functions
- Formatting data for front-end consumption
- ABI encoding helpers
Gas on L2s (Arbitrum, Optimism, Base) is 10-100x cheaper than mainnet. On app-specific chains, it's often negligible.
- Assertion helpers for test suites (included!)
- Fuzzing utilities for string inputs
- Console.log formatting in Foundry
Some operations justify the gas cost:
- EIP-712 signature formatting — Structured data often requires string building
- NFT/Token URI construction — Dynamic metadata URIs
- Rich revert messages — Debugging with dynamic error content
- Merkle proof data — String-based allowlists
Convert strings to native types:
toUint()— Parse numeric stringstoAddress()— Parse hex addresses (with or without0xprefix)
Future development could lower the gas cost of strings, making them more viable.
| Trend | Impact |
|---|---|
| L2 Scaling | Already 100x cheaper, heading toward negligible |
| EIP-4844 (Blobs) | Dramatically reduces data availability costs |
| Verkle Trees | Will reduce state access costs |
| Account Abstraction | Gas sponsorship means users don't feel costs |
| App-Specific Chains | Gaming/social chains prioritize UX over minimal gas |
Emerging use cases:
- On-chain social profiles and content
- Rich metadata stored directly in contracts
- Human-readable governance proposals
- On-chain search and indexing
Use this library when:
- You're on an L2 or low-cost chain
- You're building view functions or off-chain tooling
- You need URI construction, input parsing, or rich error messages
- You're writing tests
- You've planned for gas costs or are not worried about gas costs
Think twice when:
- You're on a circa 2025 EVM with cost-sensitive users
- A
bytes32or enum would suffice - The string could live off-chain with only a hash on-chain
Solidity String Utils Gas Measurements note that the original Arachnid implementation was designed before Solidity 0.8.0, and doesn't use code in unchecked blocks to significantly reduce gas costs.
Standard forge gas snapshots don't capture string/bytes function efficiency well—they're not suited for internal functions with dynamic inputs where gas varies significantly (e.g., finding an item at index 0 vs index 500 in a 1000-byte string).
See the gas benchmarks repo for detailed measurements.
| Operation | Best For | Notes |
|---|---|---|
memchr, find |
Strings > ~8 bytes | Optimized for long strings; binary search tricks for 8-32 bytes |
memcmp (inequality) |
Long strings | For 10,000+ bytes, approaches keccak256 hash comparison speed |
memmove |
Fast copying (view) | Uses identity precompile; very efficient but requires view |
memcpy |
Pure copying | Chunked mload/mstore; slower but pure compatible |
StrCharsIter.count |
Validated counting | Slower—must validate every UTF-8 character |
StrCharsIter.unsafeCount |
Fast counting | Very fast—no validation (equivalent to Arachnid's len) |
Optimized for longer strings: Short strings (~8 bytes) have proportionally more overhead from safety checks like len() and ptr() (~50 gas each). This overhead is a deliberate trade-off for usability and readability.
Why some methods are view: Methods using memmove (identity precompile) are view because the precompile isn't pure. The pure alternative memcpy is available but slower.
UTF-8 validation costs: Safe iteration via StrCharsIter validates encoding. Use unsafeCount when you trust the input and need speed.
For maximum efficiency, use the low-level utilities directly:
- memchr — byte search
- memcmp, memmove, memcpy — memory operations
These have minimal overhead and are suitable for performance-critical code.
Solidity does not provide native string manipulation functions such as
substring, slice, indexOf, or split. There is no built-in way to slice
strings directly (for example, str.substring(1, 5) is not supported).
-
Work with
bytes
Strings are UTF-8, butbytesare indexable, which allows manual byte-level slicing.
⚠️ This operates on bytes, not Unicode characters, and may break on multi-byte UTF-8 characters. -
**Use a
solidity-stringutilslibrary **
This library exists to fill the gap, providing slicing, substring search, and comparison utilities via asliceabstraction.
This is the closest thing to real substring support in Solidity. -
Concatenation only (built-in)
Solidity0.8.12+supportsstring.concat(...), but still offers no slicing or parsing functionality. Solidity has slowly been adding more string functions.
- String manipulation is gas-expensive
- UTF-8 handling is error-prone on-chain
- Best practice is to avoid string parsing on-chain
- Prefer
bytes32, enums, IDs, or hashes - Perform complex string operations off-chain when possible
- ❌ No native substring support in Solidity
- ✅ Possible via
bytesor libraries ⚠️ Byte-based, not Unicode-safe- 🧠 Often better to redesign than parse strings on-chain
import { StrSlice, toSlice } from "solidity-stringutils/src/StrSlice.sol";
using { toSlice } for string;
/// @dev Returns the content of brackets, or empty string if not found
function extractFromBrackets(string memory stuffInBrackets) pure returns (StrSlice extracted) {
StrSlice s = stuffInBrackets.toSlice();
bool found;
(found, , s) = s.splitOnce(toSlice("("));
if (!found) return toSlice("");
(found, s, ) = s.rsplitOnce(toSlice(")"));
if (!found) return toSlice("");
return s;
}
/*
assertEq(
extractFromBrackets("((1 + 2) + 3) + 4"),
toSlice("(1 + 2) + 3")
);
*/See ExamplesTest.
Internally StrSlice uses Slice and extends it with logic for multibyte UTF-8 where necessary.
| Method | Description |
|---|---|
len |
length in bytes |
isEmpty |
true if len == 0 |
toString |
copy slice contents to a new string |
keccak |
equal to keccak256(s.toString()), but cheaper |
| concatenate | |
add |
Concatenate 2 slices into a new string |
join |
Join slice array on self as separator |
| compare | |
cmp |
0 for eq, < 0 for lt, > 0 for gt |
eq,ne |
==, != (more efficient than cmp) |
lt,lte |
<, <= |
gt,gte |
>, >= |
| index | |
isCharBoundary |
true if given index is an allowed boundary |
get |
get 1 UTF-8 character at given index |
splitAt |
(slice[:index], slice[index:]) |
getSubslice |
slice[start:end] |
| search | |
find |
index of the start of the first match |
rfind |
index of the start of the last match |
return type(uint256).max for no matches |
|
contains |
true if a match is found |
startsWith |
true if starts with pattern |
endsWith |
true if ends with pattern |
| modify | |
stripPrefix |
returns subslice without the prefix |
stripSuffix |
returns subslice without the suffix |
splitOnce |
split into 2 subslices on the first match |
rsplitOnce |
split into 2 subslices on the last match |
replacen |
experimental replace n matches |
| replacen requires 0 < pattern.len() <= to.len() | |
| iterate | |
chars |
character iterator over the slice |
| ascii | |
isAscii |
true if all chars are ASCII |
| dangerous | |
asSlice |
get underlying Slice |
ptr |
get memory pointer |
| case conversion | |
toUpperCaseEnglish |
get new string with A-Z |
toLowerCaseEnglish |
get new string with a-z |
capitalizeEnglish |
get new string with first char capitalized |
equalsIgnoreCaseEnglish |
true if equal ignoring case |
isLowerCaseEnglish |
true if contains no uppercase A-Z |
isUpperCaseEnglish |
true if contains no lowercase a-z |
| non-ASCII chars are left unchanged | |
| trimming | |
trim |
remove leading and trailing whitespace |
trimStart |
remove leading whitespace |
trimEnd |
remove trailing whitespace |
| parsing | |
toUint |
parse ASCII numeric string to uint256 |
toAddress |
parse hex string to address |
| formatting | |
repeat |
repeat slice N times into a new string |
padLeft |
pad left to totalLen with char |
padRight |
pad right to totalLen with char |
toHexString |
convert uint256 or bytes to "0x..." string |
| search & split | |
count |
count non-overlapping matches |
split |
split into StrSlice[] array by delimiter |
Indexes are in bytes, not characters. Indexing methods revert if isCharBoundary is false.
Returned by chars method of StrSlice
import { StrSlice, toSlice, StrCharsIter } from "@dk1a/solidity-stringutils/src/StrSlice.sol";
using { toSlice } for string;
/// @dev Returns a StrSlice of `str` with the 2 first UTF-8 characters removed
/// reverts on invalid UTF8
function removeFirstTwoChars(string memory str) pure returns (StrSlice) {
StrCharsIter memory chars = str.toSlice().chars();
for (uint256 i; i < 2; i++) {
if (chars.isEmpty()) break;
chars.next();
}
return chars.asStr();
}
/*
assertEq(removeFirstTwoChars(unicode"📎!こんにちは"), unicode"こんにちは");
*/| Method | Description |
|---|---|
asStr |
get underlying StrSlice of the remainder |
len |
remainder length in bytes |
isEmpty |
true if len == 0 |
next |
advance the iterator, return the next StrChar |
nextBack |
advance from the back, return the next StrChar |
count |
returns the number of UTF-8 characters |
validateUtf8 |
returns true if the sequence is valid UTF-8 |
| dangerous | |
unsafeNext |
advance unsafely, return the next StrChar |
unsafeCount |
unsafely count chars, read the source for caveats |
ptr |
get memory pointer |
count, validateUtf8, unsafeCount consume the iterator in O(n).
Safe methods revert on an invalid UTF-8 byte sequence.
unsafeNext does NOT check if the iterator is empty, may underflow! Does not revert on invalid UTF-8. If returned StrChar is invalid, it will have length 0. Otherwise length 1-4.
Internally next, unsafeNext, count all use _nextRaw. It's very efficient, but very unsafe and complicated. Read the source and import it separately if you need it.
Represents a single UTF-8 encoded character. Internally it's bytes32 with leading byte at MSB.
It's returned by some methods of StrSlice and StrCharsIter.
| Method | Description |
|---|---|
len |
character length in bytes |
toBytes32 |
returns the underlying bytes32 value |
toString |
copy the character to a new string |
toCodePoint |
returns the unicode code point (ord in python) |
cmp |
0 for eq, < 0 for lt, > 0 for gt |
eq,ne |
==, != |
lt,lte |
<, <= |
gt,gte |
>, >= |
isValidUtf8 |
usually true |
isAscii |
true if the char is ASCII |
Import StrChar__ (static function lib) to use StrChar__.fromCodePoint for code point to StrChar conversion.
len can return 0 only for invalid UTF-8 characters. But some invalid chars may have non-zero len! (use isValidUtf8 to check validity). Note that 0x00 is a valid 1-byte UTF-8 character, its len is 1.
isValidUtf8 can be false if the character was formed with an unsafe method (fromUnchecked, wrap).
import { Slice, toSlice } from "solidity-stringutils/src/Slice.sol";
using { toSlice } for bytes;
function findZeroByte(bytes memory b) pure returns (uint256 index) {
return b.toSlice().find(
bytes(hex"00").toSlice()
);
}See using {...} for Slice global in the source for a function summary. Many are shared between Slice and StrSlice, but there are differences.
Internally Slice has very minimal assembly, instead using memcpy, memchr, memcmp and others; if you need the low-level functions, see src/utils/.
import { PRBTest } from "@prb/test/src/PRBTest.sol";
import { Assertions } from "solidity-stringutils/src/test/Assertions.sol";
contract StrSliceTest is PRBTest, Assertions {
function testContains() public {
bytes memory b1 = "12345";
bytes memory b2 = "3";
assertContains(b1, b2);
}
function testLt() public {
string memory s1 = "123";
string memory s2 = "124";
assertLt(s1, s2);
}
}You can completely ignore slices if all you want is e.g. assertContains for native bytes/string.
- dk1a/solidity-stringutils - This library is a fork of dk1a's string utils.
- Arachnid/solidity-stringutils - One of the first versions of solidity-stringutils
- paulrberg/prb-math - good template for solidity data structure libraries with
using {...} for ... global - brockelmore/memmove - good assembly memory management examples