Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update to unicode 12.1.0 #11

Merged
merged 1 commit into from
Jul 25, 2019
Merged

Conversation

matklad
Copy link
Contributor

@matklad matklad commented Jul 21, 2019

Ran python scripts/unicode.py, copied tables.rs.

Question: what is the maintenance status of this crate? I'd love to switch rustc's lexer to this crate (discussion): this'll make it easier to reuse lexer elsewhere, and will also allow us to remove corresponding unicode tables from libcore.

@Manishearth
Copy link
Member

It's maintained, but not well. Happy to have help from rustc folks.

@Manishearth Manishearth merged commit 31a29c1 into unicode-rs:master Jul 25, 2019
@Manishearth
Copy link
Member

Make a PR for the new version?

@matklad matklad deleted the new-unicode branch July 25, 2019 12:44
@matklad
Copy link
Contributor Author

matklad commented Jul 25, 2019

@Manishearth done #12.

Pulled #7 and #8 there as well

Centril added a commit to Centril/rust that referenced this pull request Sep 5, 2019
Use unicode-xid crate instead of libcore

This PR proposes to remove `char::is_xid_start` and `char::is_xid_continue` functions from `libcore` and use `unicode_xid` crate from crates.io (note that this crate is already present in rust-lang/rust's Cargo.lock).

Reasons to do this:

* removing rustc-binary-specific stuff from libcore
* making sure that, across the ecosystem, there's a single definition of what rust identifier is (`unicode-xid` has almost 10 million downs, as a `proc_macro2` dependency)
* making it easier to share `rustc_lexer` crate with rust-analyzer: no need to `#[cfg]` if we are building as a part of the compiler

Reasons not to do this:

* increased maintenance burden: we'll need to upgrade unicode version both in libcore and in unicode-xid. However, this shouldn't be a too heavy burden: just running `./unicode.py` after new unicode version. I (@matklad) am ready to be a t-compiler side maintainer of unicode-xid. Moreover, given that xid-unicode is an important dependency of syn, *someone* needs to maintain it anyway.
* xid-unicode implementation is significantly slower. It uses a more compact table with binary search, instead of a trie. However, this shouldn't matter in practice, because we have fast-path for ascii anyway, and code size savings is a plus. Moreover, in rust-lang#59706 not using libcore turned out to be *faster*, presumably beacause checking for whitespace with match is even faster.

<details>

<summary>old description</summary>

Followup to rust-lang#59706

r? @eddyb

Note that this doesn't actually remove tables from libcore, to avoid conflict with rust-lang#62641.

cc unicode-rs/unicode-xid#11

</details>
Centril added a commit to Centril/rust that referenced this pull request Sep 5, 2019
Use unicode-xid crate instead of libcore

This PR proposes to remove `char::is_xid_start` and `char::is_xid_continue` functions from `libcore` and use `unicode_xid` crate from crates.io (note that this crate is already present in rust-lang/rust's Cargo.lock).

Reasons to do this:

* removing rustc-binary-specific stuff from libcore
* making sure that, across the ecosystem, there's a single definition of what rust identifier is (`unicode-xid` has almost 10 million downs, as a `proc_macro2` dependency)
* making it easier to share `rustc_lexer` crate with rust-analyzer: no need to `#[cfg]` if we are building as a part of the compiler

Reasons not to do this:

* increased maintenance burden: we'll need to upgrade unicode version both in libcore and in unicode-xid. However, this shouldn't be a too heavy burden: just running `./unicode.py` after new unicode version. I (@matklad) am ready to be a t-compiler side maintainer of unicode-xid. Moreover, given that xid-unicode is an important dependency of syn, *someone* needs to maintain it anyway.
* xid-unicode implementation is significantly slower. It uses a more compact table with binary search, instead of a trie. However, this shouldn't matter in practice, because we have fast-path for ascii anyway, and code size savings is a plus. Moreover, in rust-lang#59706 not using libcore turned out to be *faster*, presumably beacause checking for whitespace with match is even faster.

<details>

<summary>old description</summary>

Followup to rust-lang#59706

r? @eddyb

Note that this doesn't actually remove tables from libcore, to avoid conflict with rust-lang#62641.

cc unicode-rs/unicode-xid#11

</details>
Centril added a commit to Centril/rust that referenced this pull request Sep 5, 2019
Use unicode-xid crate instead of libcore

This PR proposes to remove `char::is_xid_start` and `char::is_xid_continue` functions from `libcore` and use `unicode_xid` crate from crates.io (note that this crate is already present in rust-lang/rust's Cargo.lock).

Reasons to do this:

* removing rustc-binary-specific stuff from libcore
* making sure that, across the ecosystem, there's a single definition of what rust identifier is (`unicode-xid` has almost 10 million downs, as a `proc_macro2` dependency)
* making it easier to share `rustc_lexer` crate with rust-analyzer: no need to `#[cfg]` if we are building as a part of the compiler

Reasons not to do this:

* increased maintenance burden: we'll need to upgrade unicode version both in libcore and in unicode-xid. However, this shouldn't be a too heavy burden: just running `./unicode.py` after new unicode version. I (@matklad) am ready to be a t-compiler side maintainer of unicode-xid. Moreover, given that xid-unicode is an important dependency of syn, *someone* needs to maintain it anyway.
* xid-unicode implementation is significantly slower. It uses a more compact table with binary search, instead of a trie. However, this shouldn't matter in practice, because we have fast-path for ascii anyway, and code size savings is a plus. Moreover, in rust-lang#59706 not using libcore turned out to be *faster*, presumably beacause checking for whitespace with match is even faster.

<details>

<summary>old description</summary>

Followup to rust-lang#59706

r? @eddyb

Note that this doesn't actually remove tables from libcore, to avoid conflict with rust-lang#62641.

cc unicode-rs/unicode-xid#11

</details>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants