layout | title |
---|---|
post |
第94期 |
公众号
欢迎投稿,推荐或自荐文章/软件/资源等
马上2022就要结束了。祝大家新年快乐。
我感觉这波阳性我算比较早期的。希望大家都没事。
值类型重申,可能很多人还停留在modern effective c++介绍的auto那里
考虑一种需求,把二进制编成ascii码, base64有点复杂,不如base16
void encode_scalar(const uint8_t *source, size_t len, char *target) {
const uint16_t table[] = {
0x3030, 0x3130, 0x3230, 0x3330, 0x3430, ...
0x6366, 0x6466, 0x6566, 0x6666};
for (size_t i = 0; i < len; i++) {
uint16_t code = table[source[i]];
::memcpy(target, &code, 2);
target += 2;
}
}
显然,能simd
__m128i shuf = _mm_set_epi8('f', 'e', 'd', 'c', 'b', 'a', '9', '8', '7', '6',
'5', '4', '3', '2', '1', '0');
size_t i = 0;
__m128i maskf = _mm_set1_epi8(0xf);
for (; i + 16 <= len; i += 16) {
__m128i input = _mm_loadu_si128((const __m128i *)(source + i));
__m128i inputbase = _mm_and_si128(maskf, input);
__m128i inputs4 =
_mm_and_si128(maskf, _mm_srli_epi16(input, 4));
__m128i firstpart = _mm_unpacklo_epi8(inputs4, inputbase);
__m128i output1 = _mm_shuffle_epi8(shuf, firstpart);
__m128i secondpart = _mm_unpackhi_epi8(inputs4, inputbase);
__m128i output2 = _mm_shuffle_epi8(shuf, secondpart);
_mm_storeu_si128((__m128i *)(target), output1);
target += 16;
_mm_storeu_si128((__m128i *)(target), output2);
target += 16;
}
代码就不列出来了,在这里 https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/blob/master/2022/12/23/base16.cpp
直接看性能吧
方法 | 速度 |
---|---|
table lookup | 0.9 GB/s |
128-bit vectors | 6.4 GB/s |
256-bit vectors | 11 GB/s |
常规
static const std::unordered_set<std::string_view> special_set = {
"ftp", "file", "http", "https", "ws", "wss"};
bool hash_is_special(std::string_view input) {
return special_set.find(input) != special_set.end();
}
枚举
bool direct_is_special(std::string_view input) {
return (input == "https") | (input == "http") | (input == "ftp") |
(input == "file") | (input == "ws") | (input == "wss");
}
既然是这种特殊短字符串,直接转成int给大家开开眼
static inline uint64_t string_to_uint64(std::string_view view) {
uint64_t val;
std::memcpy(&val, view.data(), sizeof(uint64_t));
return val;
}
uint32_t string_to_uint32(const char *data) {
uint32_t val;
std::memcpy(&val, data, sizeof(uint32_t));
return val;
}
bool fast_is_special(std::string_view input) {
uint64_t inputu = string_to_uint64(input);
if ((inputu & 0xffffffffff) == string_to_uint64("https\0\0\0")) {
return input.size() == 5;
}
if ((inputu & 0xffffffff) == string_to_uint64("http\0\0\0\0")) {
return input.size() == 4;
}
if (uint32_t(inputu) == string_to_uint32("file")) {
return input.size() == 4;
}
if ((inputu & 0xffffff) == string_to_uint32("ftp\0")) {
return input.size() == 3;
}
if ((inputu & 0xffffff) == string_to_uint32("wss\0")) {
return input.size() == 3;
}
if ((inputu & 0xffff) == string_to_uint32("ws\0\0")) {
return input.size() == 2;
}
return false;
}
直接看看速度
GCC 11 Intel Ice Lake
方法 | 速度 |
---|---|
std::unordered_map | 20 ns/string |
direct | 9.1 ns/string |
fast | 3.0 ns/string |
Apple M2 LLVM 12
方法 | 速度 |
---|---|
std::unordered_map | 14 ns/string |
direct | 5.5 ns/string |
fast | 1.6 ns/string |
硬转效果还挺好
看代码
struct person {
int age;
std::string name;
}
// ...
auto alice = std::make_shared<person>("Alice", 38);
std::shared_ptr<std::string> name(alice, &alice->name);
assert(alice.use_count() == name.use_count()); // single-threaded use only
通过第二种构造,name相当于alice的别名了,这么写问题出在哪里?
alice 可能被乱搞,可能已经失效了,这个时候使用alice的name是有问题的
直接来个极端的例子
std::shared_ptr<void> null;
std::shared_ptr<std::string> weirdo(null, &some_global_string_that_is_always_valid)
null是无效的但weirdo是有效的。https://godbolt.org/z/xT5qzK443
不要用shared_ptr的这种构造函数。很容易写出坑
数据复制是最常见的场景了,把数据传来传去,所以说一个memcpy速度快是很重要的。
那么实现一个快速的memcpy要考虑什么呢?
- Fail: Copy-in-RAM and DMA engines
- Both the CPU side and the memory side of caches matter
- Load/Store Partial and Double-width-shift help significantly
- Longer Prefetching matters
- Letting the memory controller know about prefetch length can be 3x faster
- Controlling cache pollution matters
- Fixed-precision formatting of floating-point numbers
感觉很牛逼。浮点数啥的我一直不懂
介绍c++23新特性。不多说了。说了好多次了
template<auto... Ns> consteval auto fn() {
std::vector v{Ns...};
return std::size(v);
}
static_assert(3uz == fn<1, 2, 3>());
额,怎么说呢这段代码,想不出有啥用途
性能路径尽量别用异常,懂得都懂。不多说了
能快一点,但这玩意一般不是瓶颈
size_t sve_strlen(const char *s) {
/*if(svcntb() > 256) {
// do something here because we assume that our
// vectors have no more than 256.
}*/
size_t len = 0;
while (true) {
svuint8_t input = svldff1_u8(svptrue_b8(), (const uint8_t *)s + len);
svbool_t matches = svcmpeq_n_u8(svptrue_b8(), input, 0);
if (svptest_any(svptrue_b8(), matches)) {
return len + svlastb_u8(svbrka_z(matches, matches), svindex_u8(0, 1));
}
len += svcntb();
}
}
都是一坨难受的代码
timer内部标记一个flag,然后外面UI事件框架根据flag来搞优先级?
没有嗷别做梦了
手把手教你cmake嵌入复杂文件信息,tar包
值得一看
cppcon
讲struc_pack的。和boost.pfr差不多
- asteria 一个脚本语言,可嵌入,长期找人,希望胖友们帮帮忙,也可以加群753302367和作者对线
- saf asio基础上的scheduler
- libenvpp A modern C++ library for type-safe environment variable parsing
- xmake.sh xmake脚本