Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revisiting no_std in ICU4X #812

Closed
sffc opened this issue Jun 17, 2021 · 11 comments · Fixed by #911
Closed

Revisiting no_std in ICU4X #812

sffc opened this issue Jun 17, 2021 · 11 comments · Fixed by #911
Assignees
Labels
A-design Area: Architecture or design A-performance Area: Performance (CPU, Memory) C-meta Component: Relating to ICU4X as a whole question Unresolved questions; type unclear S-epic Size: Major project (create smaller child issues)
Milestone

Comments

@sffc
Copy link
Member

sffc commented Jun 17, 2021

2021 follow-up to #77

The allocator remains one of the largest chunks of code size in WASM files, and it has been reported to us by some customers (such as .NET) that it can cause integration issues.

I would therefore like us to revisit no_std as a baseline in ICU4X, meaning no allocator (more specifically, types that require heap allocations, like Vec and String, would be off limits from core library code).

With #78 and #667, the data provider will support zero-alloc. We are fairly close in some other areas. FixedDecimal and LanguageIdentifier both have a Vec-like field; we would need to decide what to do with these: perhaps return an error code if overflow would occur?

Most importantly, I don't want to embark on no_std without the full support of the team. I truly think no_std is an achievable and noble goal for us, but it will require commitment and a team culture where no_std is the norm. More specifically, although I don't think it should be a requirement that everything in the ICU4X core library is no_std, I want that to be the exception, not the rule. New features should land with no_std support by default, unless team members provide a technical justification otherwise.

@sffc sffc added the discuss Discuss at a future ICU4X-SC meeting label Jun 17, 2021
@dminor
Copy link
Contributor

dminor commented Jun 17, 2021

There's a little bit of use of Vec in LocaleCanonicalizer, but that is largely driven by the Vec like field on LanguageIdentifier. There's also a use of String to work around some awkwardness parsing an extension value. I think a little work on the APIs for LanguageIdentifier/Locale and it would be easy to remove this stuff.

In general, I think supporting no_std is a good idea. Beyond WASM, it would potentially open up uses in embedded systems as well.

@sffc
Copy link
Member Author

sffc commented Jun 24, 2021

  • @zbraniecki - I like the idea but I'm a bit concerned about the drag factor. It's one more thing that affects our velocity. And it's not something we need for Gecko. I think the reward is there, but I'm not sure about the RoI.
  • @Manishearth - Now that we have ZeroVec, I think this is a valid goal. Note that trait objects over FFI require allocation because of boxes. So I think this is a good idea, but there are also other ways to solve the C# issues mentioned above. I am in favor of this, but not for the stated reasons.
  • @nordzilla - If FFI requires it, then would the only consumers of no_std be Rust consumers?
  • @Manishearth - We could do opaque stack traits, but those are tricky. Plugging in a bump allocator would be easier.
  • @sffc - I think we can decouple the FFI problem from no_std in components.
  • @Manishearth - I think we can do this in many small steps.
  • @sffc - We should have a common location where we document decisions about how to migrate things like Vec and String to no_std.

@sffc sffc added A-design Area: Architecture or design A-performance Area: Performance (CPU, Memory) C-meta Component: Relating to ICU4X as a whole question Unresolved questions; type unclear S-epic Size: Major project (create smaller child issues) and removed discuss Discuss at a future ICU4X-SC meeting labels Jun 24, 2021
@sffc sffc added this to the 2021 Q3-m1 milestone Jun 24, 2021
@sffc sffc self-assigned this Jun 24, 2021
@sffc
Copy link
Member Author

sffc commented Jun 24, 2021

Next step: assess the work involved and find an assignee.

@gregtatum
Copy link
Member

praise: Thanks for leaving nice notes on the conversation here, it helps get up to speed here, since I missed this meeting.

@Manishearth
Copy link
Member

I'm planning to spend the next week-ish trying stuff here. My initial direction will actually be to have both no_std and no_std-but-alloc support where possible, and perhaps that goal will be too lofty, but I'd like to start with that and downgrade to no_std-with-alloc if it doesn't work. My main goal is to figure out what challenges are involved and how much work this will take. I'm aware of the shape of the challenges for this kind of refactor; however it's as of yet unclear if they apply to our specific situation. We'll see!

I'll probably have some kind of draft PR that might not be representative of how we actually want to do this.

@Manishearth Manishearth self-assigned this Jul 13, 2021
@Manishearth
Copy link
Member

Manishearth commented Jul 14, 2021

I've been making some progress, I managed to get icu_capi's serde dependency off std (which I anticipated would be a bigger task). A larger issue is that we use thiserror, which unfortunately implements std::error::Error and Display, incompatibly with no_std. We could turn it off but we do use the Display impl: I recommend switching to displaydoc as is mentioned here: dtolnay/thiserror#64

@Manishearth
Copy link
Member

rust-lang/cargo#9696 is making it harder for me to do this, but it's not actually a limitation in the final situation, it just makes it harder to work on this incrementally.

@Manishearth
Copy link
Member

immediately realized that i need to do more error work. I think for now I might actually drop the usage of std::error completely and use a dummy blanket trait instead

@Manishearth
Copy link
Member

core_error does solve our problem but it's prerelease

@Manishearth
Copy link
Member

#865 makes progress, however it doesn't remove std from fs since we still use std::error::Error. I'll be working on removing that next

@Manishearth
Copy link
Member

Manishearth commented Jul 20, 2021

Useful awk script for adding imports
# gawk -f std.awk  -i inplace file.rs file.rs
# find src -name *.rs | xargs -I {} gawk -f std.awk  -i inplace {} {}

NR==FNR{ 
    if(!/\/\/\//) {
        if(/\yString\y/) {
            string = 1;
        }
        if(/Vec/) {
            vec = 1;
        }
        if(/\yBox\y/) {
            box = 1;
        }
        if(/\yAny\y/) {
            any = 1;
        }
        if(/\yATypeId\y/) {
            typeid = 1;
        }
        if(/\yvec[!]/) {
            vecm = 1;
        }
        if(/\yformat[!]/) {
            format = 1;
        }
        if(/\yto_string\y/) {
            tostring = 1;
        }
    }
    next
}
{
    if(/^use / && use == 0) {
        use = 1
        if(any == 1) {
            print "use core::any::Any;"
        }
        if(typeid == 1) {
            print "use core::any::TypeId;"
        }
        if(box == 1) {
            print "use alloc::boxed::Box;"
        }
        if(format == 1) {
            print "use alloc::format;"
        }
        if(string == 1) {
            print "use alloc::string::String;"
        }
        if(tostring == 1) {
            print "use alloc::string::ToString;"
        }
        if(vec == 1) {
            print "use alloc::vec::Vec;"
        }
        if(vecm == 1) {
            print "use alloc::vec;"
        }
   }

   if(!/\/\/\//) {
       gsub(/std::rc::Rc/,"alloc::rc::Rc");
       gsub(/std::sync::Arc/,"alloc::sync::Arc");
       gsub(/std::borrow::Cow/,"alloc::borrow::Cow");
       gsub(/std::borrow::ToOwned/,"alloc::borrow::ToOwned");
       gsub(/\sToOwned/," alloc::borrow::ToOwned");
       gsub(/std::/,"core::");
   }
   print $0;
}

This was referenced Jul 22, 2021
@sffc sffc modified the milestones: 2021 Q3-m1, ICU4X 0.3 Jul 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-design Area: Architecture or design A-performance Area: Performance (CPU, Memory) C-meta Component: Relating to ICU4X as a whole question Unresolved questions; type unclear S-epic Size: Major project (create smaller child issues)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants