Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variants handling doubles unic-langid binary size #49

Closed
sffc opened this issue Nov 17, 2019 · 10 comments
Closed

Variants handling doubles unic-langid binary size #49

sffc opened this issue Nov 17, 2019 · 10 comments
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

@sffc
Copy link

sffc commented Nov 17, 2019

I was doing some binary size analysis on unic-langid. Overall it's pretty tight, I think about as tight as possible, except for the variants.sort(); line, which adds about 40% to the binary weight (from about 3 KB to 5 KB). It also appears that the compiler likes to basically inline the sorting algorithm.

The next biggest chunk of binary size comes from the vector operations required when there is more than one variant.

I think in most cases, language tags have 0 or 1 variant.

@zbraniecki
Copy link
Owner

Thank you for the analysis!

I'm wondering if there's an equivalent of AutoVec that would optimize for 0-1 element arrays.

@sffc
Copy link
Author

sffc commented Nov 18, 2019

It appeared that the compiler already optimizes Vecs that it can figure out are only going to have 0 or 1 elements, but if it's possible for it to have more than 2 variants, then it has to pull in the extra machinery. I'm not sure how to avoid that. If there's a lighterweight sorting implementation in some crate, that could be an option. But, I wouldn't plan around it; this issue is more for posterity. Maybe a good resolution would be adding a TODO comment suggesting that variants.sort() may be able to be optimized in the future.

@zbraniecki
Copy link
Owner

@Manishearth pointed out that sort_unstable may be cheaper here, and if that's not enough, he suggested we just implement a dumb sort.

@zbraniecki zbraniecki added enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed labels Dec 16, 2019
@zbraniecki
Copy link
Owner

@sffc - could you test if sort_unstable is worth it?

@zbraniecki
Copy link
Owner

@sffc - I landed the switch to sort_unstable in 0.8. Lmk if that works, and reopen if needed.

@sffc
Copy link
Author

sffc commented Jan 30, 2020

Sorry, finally got around to looking at this again.

I have this small Rust function:

use wee_alloc::WeeAlloc;
use unic_langid_impl::LanguageIdentifier;

#[wasm_bindgen]
extern "C" {
	fn alert(s: &str);
}

#[wasm_bindgen]
pub fn get_langauge(input: &str) {
	let loc1 = LanguageIdentifier::from_bytes(input.as_bytes()).unwrap();
	let loc2 = LanguageIdentifier::from_bytes("en-US".as_bytes()).unwrap();
	alert(&loc1.get_language());
	alert(&loc2.get_language());
}

I have my toolchain set up here. I use Twiggy to profile the size of the code.

With unic-langid-impl-0.7.2 out of the box, I get:

           5873 ┊     19.85% ┊ export "get_langauge"
           5858 ┊     19.79% ┊   ⤷ get_langauge
           5278 ┊     17.83% ┊       ⤷ unic_langid_impl::LanguageIdentifier::from_bytes::h25c1831e8202e4b6
            591 ┊      2.00% ┊           ⤷ unic_langid_impl::subtags::parse_variant_subtag::h1198a9025204caa2
            427 ┊      1.44% ┊           ⤷ <core::slice::Split<T,P> as core::iter::traits::iterator::Iterator>::next::h5ccdfcdaa719e75d
             58 ┊      0.20% ┊               ⤷ <core::ops::range::Range<usize> as core::slice::SliceIndex<[T]>>::index::hfdbd2e8e98a5538d
            295 ┊      1.00% ┊           ⤷ alloc::slice::insert_head::h333d8d70a035dbe8
            237 ┊      0.80% ┊           ⤷ unic_langid_impl::subtags::parse_region_subtag::hfac6b531b4ecf168
            235 ┊      0.79% ┊           ⤷ alloc::vec::Vec<T>::push::h38bce0d5c9bff681
              5 ┊      0.02% ┊               ⤷ type[13]: (i32, i64) -> nil
            204 ┊      0.69% ┊           ⤷ alloc::slice::<impl [T]>::sort::{{closure}}::h4078393406c8d185
              6 ┊      0.02% ┊               ⤷ type[15]: (i64, i64) -> i32
            115 ┊      0.39% ┊           ⤷ memmove
            104 ┊      0.35% ┊           ⤷ core::iter::adapters::Peekable<I>::peek::hc1006d67d971c83b
             89 ┊      0.30% ┊           ⤷ core::alloc::Layout::repeat::h907536d65c798a93
             61 ┊      0.21% ┊           ⤷ <core::ops::range::Range<usize> as core::slice::SliceIndex<[T]>>::index_mut::ha4042dde3e27d80a
             49 ┊      0.17% ┊           ⤷ tinystr::tinystr4::TinyStr4::is_ascii_alphabetic::hbd058fdde3c679e9
             44 ┊      0.15% ┊           ⤷ <alloc::vec::Vec<T> as core::ops::index::Index<I>>::index::h4253ae23d517efe1
             33 ┊      0.11% ┊           ⤷ alloc::raw_vec::RawVec<T,A>::dealloc_buffer::haa6e8f76fb979e93
             12 ┊      0.04% ┊           ⤷ alloc::raw_vec::capacity_overflow::h25594a7affbe56e3
              8 ┊      0.03% ┊           ⤷ type[10]: (i32, i32, i32, i32, i32) -> nil
              7 ┊      0.02% ┊           ⤷ alloc::raw_vec::RawVec<T,A>::allocate_in::{{closure}}::hb238c4cb2f739463
              7 ┊      0.02% ┊           ⤷ alloc::raw_vec::RawVec<T,A>::allocate_in::{{closure}}::hb1e67dc4d598d007
            237 ┊      0.80% ┊       ⤷ core::result::Result<T,E>::unwrap::hc7aa25d8963c7c4c
            138 ┊      0.47% ┊           ⤷ core::result::unwrap_failed::hf7591c1dd9412006
             78 ┊      0.26% ┊       ⤷ unic_langid_impl::LanguageIdentifier::get_language::he9b07973971093fc
             45 ┊      0.15% ┊       ⤷ omnicu_xargo::alert::hd6da0f4b9baeffb3
             35 ┊      0.12% ┊           ⤷ import wbg::__wbg_alert_07a8b1643f0fc0c4
             43 ┊      0.15% ┊       ⤷ core::ptr::real_drop_in_place::had65767f0b54c832
             24 ┊      0.08% ┊       ⤷ __rust_dealloc
             12 ┊      0.04% ┊           ⤷ __rg_dealloc

When I comment out the variants.sort() line in unic-langid-impl/src/parser/mod.rs, I get:

           3506 ┊     13.21% ┊ export "get_langauge"
           3491 ┊     13.15% ┊   ⤷ get_langauge
           2911 ┊     10.97% ┊       ⤷ unic_langid_impl::LanguageIdentifier::from_bytes::h25c1831e8202e4b6
            591 ┊      2.23% ┊           ⤷ unic_langid_impl::subtags::parse_variant_subtag::h1198a9025204caa2
            435 ┊      1.64% ┊           ⤷ <core::slice::Split<T,P> as core::iter::traits::iterator::Iterator>::next::h5ccdfcdaa719e75d
             66 ┊      0.25% ┊               ⤷ <core::ops::range::Range<usize> as core::slice::SliceIndex<[T]>>::index::hfdbd2e8e98a5538d
              8 ┊      0.03% ┊                   ⤷ type[10]: (i32, i32, i32, i32, i32) -> nil
            237 ┊      0.89% ┊           ⤷ unic_langid_impl::subtags::parse_region_subtag::hfac6b531b4ecf168
            232 ┊      0.87% ┊           ⤷ alloc::vec::Vec<T>::push::h38bce0d5c9bff681
             12 ┊      0.05% ┊               ⤷ alloc::raw_vec::capacity_overflow::h25594a7affbe56e3
              5 ┊      0.02% ┊               ⤷ type[13]: (i32, i64) -> nil
            104 ┊      0.39% ┊           ⤷ core::iter::adapters::Peekable<I>::peek::hc1006d67d971c83b
             49 ┊      0.18% ┊           ⤷ tinystr::tinystr4::TinyStr4::is_ascii_alphabetic::hbd058fdde3c679e9
             33 ┊      0.12% ┊           ⤷ alloc::raw_vec::RawVec<T,A>::dealloc_buffer::haa6e8f76fb979e93
            237 ┊      0.89% ┊       ⤷ core::result::Result<T,E>::unwrap::hc7aa25d8963c7c4c
            138 ┊      0.52% ┊           ⤷ core::result::unwrap_failed::hf7591c1dd9412006
             78 ┊      0.29% ┊       ⤷ unic_langid_impl::LanguageIdentifier::get_language::he9b07973971093fc
             45 ┊      0.17% ┊       ⤷ omnicu_xargo::alert::hd6da0f4b9baeffb3
             35 ┊      0.13% ┊           ⤷ import wbg::__wbg_alert_07a8b1643f0fc0c4
             43 ┊      0.16% ┊       ⤷ core::ptr::real_drop_in_place::had65767f0b54c832
             24 ┊      0.09% ┊       ⤷ __rust_dealloc
             12 ┊      0.05% ┊           ⤷ __rg_dealloc

When I replace .sort() with .sort_unstable() in that one line, I get:

           7922 ┊     24.78% ┊ export "get_langauge"
           7907 ┊     24.73% ┊   ⤷ get_langauge
           7327 ┊     22.92% ┊       ⤷ unic_langid_impl::LanguageIdentifier::from_bytes::h25c1831e8202e4b6
           4396 ┊     13.75% ┊           ⤷ core::slice::sort::recurse::h26ae2b3ecd611434
            324 ┊      1.01% ┊               ⤷ core::slice::sort::shift_tail::he508bb8a3243ccdb
            204 ┊      0.64% ┊               ⤷ core::slice::<impl [T]>::sort_unstable::{{closure}}::h09904293e31dc7af
              6 ┊      0.02% ┊                   ⤷ type[15]: (i64, i64) -> i32
            168 ┊      0.53% ┊               ⤷ core::slice::sort::heapsort::{{closure}}::hac3b65c9e0abfb48
            123 ┊      0.38% ┊               ⤷ core::slice::sort::choose_pivot::{{closure}}::h9f1cc9882f75af60
             86 ┊      0.27% ┊                   ⤷ core::slice::sort::choose_pivot::{{closure}}::h4f952a83fc00f926
             91 ┊      0.28% ┊               ⤷ core::slice::<impl [T]>::swap::h1be65a25a19266f2
             70 ┊      0.22% ┊               ⤷ core::slice::sort::choose_pivot::{{closure}}::hdbca027bb550e45b
             61 ┊      0.19% ┊               ⤷ <core::ops::range::Range<usize> as core::slice::SliceIndex<[T]>>::index_mut::ha4042dde3e27d80a
             47 ┊      0.15% ┊               ⤷ core::slice::<impl core::ops::index::IndexMut<I> for [T]>::index_mut::he58255048e17ba39
             47 ┊      0.15% ┊               ⤷ core::slice::<impl core::ops::index::IndexMut<I> for [T]>::index_mut::hd469b9c87a64ed09
            591 ┊      1.85% ┊           ⤷ unic_langid_impl::subtags::parse_variant_subtag::h1198a9025204caa2
            427 ┊      1.34% ┊           ⤷ <core::slice::Split<T,P> as core::iter::traits::iterator::Iterator>::next::h5ccdfcdaa719e75d
             58 ┊      0.18% ┊               ⤷ <core::ops::range::Range<usize> as core::slice::SliceIndex<[T]>>::index::hfdbd2e8e98a5538d
            237 ┊      0.74% ┊           ⤷ unic_langid_impl::subtags::parse_region_subtag::hfac6b531b4ecf168
            232 ┊      0.73% ┊           ⤷ alloc::vec::Vec<T>::push::h38bce0d5c9bff681
             12 ┊      0.04% ┊               ⤷ alloc::raw_vec::capacity_overflow::h25594a7affbe56e3
              5 ┊      0.02% ┊               ⤷ type[13]: (i32, i64) -> nil
            104 ┊      0.33% ┊           ⤷ core::iter::adapters::Peekable<I>::peek::hc1006d67d971c83b
             49 ┊      0.15% ┊           ⤷ tinystr::tinystr4::TinyStr4::is_ascii_alphabetic::hbd058fdde3c679e9
             33 ┊      0.10% ┊           ⤷ alloc::raw_vec::RawVec<T,A>::dealloc_buffer::haa6e8f76fb979e93
              8 ┊      0.03% ┊           ⤷ type[10]: (i32, i32, i32, i32, i32) -> nil
            237 ┊      0.74% ┊       ⤷ core::result::Result<T,E>::unwrap::hc7aa25d8963c7c4c
            138 ┊      0.43% ┊           ⤷ core::result::unwrap_failed::hf7591c1dd9412006
             78 ┊      0.24% ┊       ⤷ unic_langid_impl::LanguageIdentifier::get_language::he9b07973971093fc
             45 ┊      0.14% ┊       ⤷ omnicu_xargo::alert::hd6da0f4b9baeffb3
             35 ┊      0.11% ┊           ⤷ import wbg::__wbg_alert_07a8b1643f0fc0c4
             43 ┊      0.13% ┊       ⤷ core::ptr::real_drop_in_place::had65767f0b54c832
             24 ┊      0.08% ┊       ⤷ __rust_dealloc
             12 ┊      0.04% ┊           ⤷ __rg_dealloc

When I use version 0.8.0, I get (after fixing a compile error in my call site):

           7922 ┊     24.78% ┊ export "get_langauge"
           7907 ┊     24.74% ┊   ⤷ get_langauge
           7327 ┊     22.92% ┊       ⤷ unic_langid_impl::LanguageIdentifier::from_bytes::h279347be58208d78
           4396 ┊     13.75% ┊           ⤷ core::slice::sort::recurse::h62ca1ef9fc6d2fa4
            324 ┊      1.01% ┊               ⤷ core::slice::sort::shift_tail::h2ede9195bbc498f9
            204 ┊      0.64% ┊               ⤷ core::slice::<impl [T]>::sort_unstable::{{closure}}::haeab3f68a1ccf43b
              6 ┊      0.02% ┊                   ⤷ type[15]: (i64, i64) -> i32
            168 ┊      0.53% ┊               ⤷ core::slice::sort::heapsort::{{closure}}::hf6d2f08faf447ccd
            123 ┊      0.38% ┊               ⤷ core::slice::sort::choose_pivot::{{closure}}::he66d10be01a20fce
             86 ┊      0.27% ┊                   ⤷ core::slice::sort::choose_pivot::{{closure}}::h40c5df205093f0cb
             91 ┊      0.28% ┊               ⤷ core::slice::<impl [T]>::swap::hbf082fe8001070e2
             70 ┊      0.22% ┊               ⤷ core::slice::sort::choose_pivot::{{closure}}::h7747a1eb98086fc9
             61 ┊      0.19% ┊               ⤷ <core::ops::range::Range<usize> as core::slice::SliceIndex<[T]>>::index_mut::hdf3c81e26708a9ce
             47 ┊      0.15% ┊               ⤷ core::slice::<impl core::ops::index::IndexMut<I> for [T]>::index_mut::h85611fa31aa54632
             47 ┊      0.15% ┊               ⤷ core::slice::<impl core::ops::index::IndexMut<I> for [T]>::index_mut::haf274e062922f630
            591 ┊      1.85% ┊           ⤷ unic_langid_impl::subtags::parse_variant_subtag::h12e9dbb8f3487194
            427 ┊      1.34% ┊           ⤷ <core::slice::Split<T,P> as core::iter::traits::iterator::Iterator>::next::h872e2b6622523b23
             58 ┊      0.18% ┊               ⤷ <core::ops::range::Range<usize> as core::slice::SliceIndex<[T]>>::index::h81a320c31745a79b
            237 ┊      0.74% ┊           ⤷ unic_langid_impl::subtags::parse_region_subtag::hbc3516caf4c2cd30
            232 ┊      0.73% ┊           ⤷ alloc::vec::Vec<T>::push::h12ecf5cffdd880ad
             12 ┊      0.04% ┊               ⤷ alloc::raw_vec::capacity_overflow::h25594a7affbe56e3
              5 ┊      0.02% ┊               ⤷ type[13]: (i32, i64) -> nil
            104 ┊      0.33% ┊           ⤷ core::iter::adapters::Peekable<I>::peek::h89f4a8368e18e1a5
             49 ┊      0.15% ┊           ⤷ tinystr::tinystr4::TinyStr4::is_ascii_alphabetic::hbd058fdde3c679e9
             33 ┊      0.10% ┊           ⤷ alloc::raw_vec::RawVec<T,A>::dealloc_buffer::h1773962ecf307183
              8 ┊      0.03% ┊           ⤷ type[10]: (i32, i32, i32, i32, i32) -> nil
            237 ┊      0.74% ┊       ⤷ core::result::Result<T,E>::unwrap::hb24a7d3d886a3e84
            138 ┊      0.43% ┊           ⤷ core::result::unwrap_failed::hf7591c1dd9412006
             78 ┊      0.24% ┊       ⤷ unic_langid_impl::LanguageIdentifier::language::h023debf99ab01c72
             45 ┊      0.14% ┊       ⤷ omnicu_xargo::alert::hd6da0f4b9baeffb3
             35 ┊      0.11% ┊           ⤷ import wbg::__wbg_alert_07a8b1643f0fc0c4
             43 ┊      0.13% ┊       ⤷ core::ptr::real_drop_in_place::h6a2d91186314df7f
             24 ┊      0.08% ┊       ⤷ __rust_dealloc
             12 ┊      0.04% ┊           ⤷ __rg_dealloc

And finally, if I replace .sort_unstable() with .sort() in version 0.8.0, I get:

           5873 ┊     19.85% ┊ export "get_langauge"
           5858 ┊     19.80% ┊   ⤷ get_langauge
           5278 ┊     17.84% ┊       ⤷ unic_langid_impl::LanguageIdentifier::from_bytes::h279347be58208d78
            591 ┊      2.00% ┊           ⤷ unic_langid_impl::subtags::parse_variant_subtag::h12e9dbb8f3487194
            427 ┊      1.44% ┊           ⤷ <core::slice::Split<T,P> as core::iter::traits::iterator::Iterator>::next::h872e2b6622523b23
             58 ┊      0.20% ┊               ⤷ <core::ops::range::Range<usize> as core::slice::SliceIndex<[T]>>::index::h81a320c31745a79b
            295 ┊      1.00% ┊           ⤷ alloc::slice::insert_head::h98090e23b8ac0c98
            237 ┊      0.80% ┊           ⤷ unic_langid_impl::subtags::parse_region_subtag::hbc3516caf4c2cd30
            235 ┊      0.79% ┊           ⤷ alloc::vec::Vec<T>::push::h12ecf5cffdd880ad
              5 ┊      0.02% ┊               ⤷ type[13]: (i32, i64) -> nil
            204 ┊      0.69% ┊           ⤷ alloc::slice::<impl [T]>::sort::{{closure}}::h9bbfe9ffd3974521
              6 ┊      0.02% ┊               ⤷ type[15]: (i64, i64) -> i32
            115 ┊      0.39% ┊           ⤷ memmove
            104 ┊      0.35% ┊           ⤷ core::iter::adapters::Peekable<I>::peek::h89f4a8368e18e1a5
             89 ┊      0.30% ┊           ⤷ core::alloc::Layout::repeat::h907536d65c798a93
             61 ┊      0.21% ┊           ⤷ <core::ops::range::Range<usize> as core::slice::SliceIndex<[T]>>::index_mut::hdf3c81e26708a9ce
             49 ┊      0.17% ┊           ⤷ tinystr::tinystr4::TinyStr4::is_ascii_alphabetic::hbd058fdde3c679e9
             44 ┊      0.15% ┊           ⤷ <alloc::vec::Vec<T> as core::ops::index::Index<I>>::index::hbf7210bebfec57a9
             33 ┊      0.11% ┊           ⤷ alloc::raw_vec::RawVec<T,A>::dealloc_buffer::h1773962ecf307183
             12 ┊      0.04% ┊           ⤷ alloc::raw_vec::capacity_overflow::h25594a7affbe56e3
              8 ┊      0.03% ┊           ⤷ type[10]: (i32, i32, i32, i32, i32) -> nil
              7 ┊      0.02% ┊           ⤷ alloc::raw_vec::RawVec<T,A>::allocate_in::{{closure}}::he8ece9a7f410bd90
              7 ┊      0.02% ┊           ⤷ alloc::raw_vec::RawVec<T,A>::allocate_in::{{closure}}::h5fff764167c54a21
            237 ┊      0.80% ┊       ⤷ core::result::Result<T,E>::unwrap::hb24a7d3d886a3e84
            138 ┊      0.47% ┊           ⤷ core::result::unwrap_failed::hf7591c1dd9412006
             78 ┊      0.26% ┊       ⤷ unic_langid_impl::LanguageIdentifier::language::h023debf99ab01c72
             45 ┊      0.15% ┊       ⤷ omnicu_xargo::alert::hd6da0f4b9baeffb3
             35 ┊      0.12% ┊           ⤷ import wbg::__wbg_alert_07a8b1643f0fc0c4
             43 ┊      0.15% ┊       ⤷ core::ptr::real_drop_in_place::h6a2d91186314df7f
             24 ┊      0.08% ┊       ⤷ __rust_dealloc
             12 ┊      0.04% ┊           ⤷ __rg_dealloc

This is not a very big or comprehensive test, but on the surface it appears as though .sort() is smaller than .sort_unstable().

@sffc
Copy link
Author

sffc commented Jan 30, 2020

One other observation. It appears that the compiler keeps .sort_unstable() as its own function, whereas it inlines .sort(). So, if there are other code paths that use .sort_unstable(), maybe the sorting code would not be duplicated, which would be an overall savings even if one call site is a bit bigger. I have not tested this theory.

@sffc
Copy link
Author

sffc commented Jan 31, 2020

I'm also increasingly thinking that OmnICU would need to be #![no_std] if we want to target WASM. That's the only way to definitely get the binary size down and not let it accidentally explode on us.

Have you considered making a #![no_std] version of unic-langid?

@Manishearth
Copy link
Collaborator

There are some vector libraries that allocate on the stack and don't require std.

But we could also use no_std+liballoc instead and this crate should just work, modulo the sort function. We can write our own simpler sort function.

@zbraniecki
Copy link
Owner

Thank you @sffc ! I filled spin off.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants