Rework handling general entity references (&entity;) #766

Draft: wants to merge 11 commits into base: master

Mingun
Collaborator

@Mingun Mingun commented Jun 21, 2024

This is a big change to how general entity references and character references are handled. I'm opening the PR early to get feedback.

With these changes we can correctly parse the document

<!DOCTYPE root [
  <!ENTITY root "<root/>">
]>
&root;

as equivalent to the normalized document

<root/>

The updated custom_entities example shows how the specification's requirement about parsed general entities could be implemented. The serde deserializer has not been updated yet, because that part is not trivial and will probably be done in another PR.

Of course, this change will probably make performance worse; I haven't measured the impact yet.
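For readers who want the idea without opening the example, here is a minimal sketch of expanding parsed general entities. It is not the code from this PR or from custom_entities: `expand_entities` and `MAX_DEPTH` are hypothetical names, the entity table is a plain `HashMap`, and predefined entities and character references are deliberately left out.

```rust
use std::collections::HashMap;

/// Hypothetical nesting limit; it stops self-referencing entities
/// from recursing forever.
const MAX_DEPTH: usize = 16;

/// Replaces every `&name;` in `text` with its replacement text from
/// `entities`, expanding references inside the replacement text as well.
/// Pass `0` as `depth` for the initial call.
fn expand_entities(
    text: &str,
    entities: &HashMap<&str, &str>,
    depth: usize,
) -> Result<String, String> {
    if depth > MAX_DEPTH {
        return Err("entity expansion is nested too deeply".to_string());
    }
    let mut out = String::new();
    let mut rest = text;
    while let Some(amp) = rest.find('&') {
        // Copy everything before the reference unchanged.
        out.push_str(&rest[..amp]);
        let after = &rest[amp + 1..];
        let semi = after
            .find(';')
            .ok_or_else(|| "unterminated entity reference".to_string())?;
        let name = &after[..semi];
        let value = entities
            .get(name)
            .ok_or_else(|| format!("undefined entity `{}`", name))?;
        // The replacement text may itself contain references.
        out.push_str(&expand_entities(value, entities, depth + 1)?);
        rest = &after[semi + 1..];
    }
    // Copy whatever follows the last reference.
    out.push_str(rest);
    Ok(out)
}
```

With a table that maps `root` to `<root/>`, calling `expand_entities("&root;", &entities, 0)` yields the markup string `<root/>`, which a real reader would then have to parse as XML rather than treat as text.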

Mingun and others added 11 commits June 23, 2024 17:59
…onstruction in a text

failures (18):
  serde-de (9):
    borrow::escaped::element
    borrow::escaped::top_level
    resolve::resolve_custom_entity
    trivial::text::byte_buf
    trivial::text::bytes
    trivial::text::string::field
    trivial::text::string::naked
    trivial::text::string::text
    xml_schema_lists::element::text::string
  serde-migrated (1):
    test_parse_string
  serde-se (5):
    with_root::char_amp
    with_root::char_gt
    with_root::char_lt
    with_root::str_escaped
    with_root::tuple
  --doc (3):
    src\de\resolver.rs - de::resolver::EntityResolver (line 13)
    src\reader\ns_reader.rs - reader::ns_reader::NsReader<&'i[u8]>::read_text (line 870)
    src\reader\slice_reader.rs - reader::slice_reader::Reader<&'a[u8]>::read_text (line 186)
Text events produced by the Reader can no longer contain escaped data;
all such data is now represented by Event::GeneralRef
failures:
    src\reader\ns_reader.rs - reader::ns_reader::NsReader<&'i[u8]>::read_text (line 870)
    src\reader\slice_reader.rs - reader::slice_reader::Reader<&'a[u8]>::read_text (line 185)
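To illustrate the commit note about text events no longer carrying escaped data, the sketch below splits character data into literal text and reference pieces. It assumes nothing about the real quick-xml types: `Piece` and `split_references` are hypothetical stand-ins, and the handling of an unterminated `&` is an arbitrary choice made for this example.

```rust
/// A piece of character data, loosely modelled on the split described above.
#[derive(Debug, PartialEq)]
enum Piece<'a> {
    /// Literal text that contains no references.
    Text(&'a str),
    /// The name between `&` and `;`, for example `lt` or `#x20`.
    GeneralRef(&'a str),
}

/// Splits `input` so that every `&name;` becomes its own piece and the
/// text pieces never contain a complete reference.
fn split_references(input: &str) -> Vec<Piece<'_>> {
    let mut pieces = Vec::new();
    let mut rest = input;
    while let Some(amp) = rest.find('&') {
        if amp > 0 {
            pieces.push(Piece::Text(&rest[..amp]));
        }
        let after = &rest[amp + 1..];
        match after.find(';') {
            Some(semi) => {
                pieces.push(Piece::GeneralRef(&after[..semi]));
                rest = &after[semi + 1..];
            }
            None => {
                // Unterminated reference: this sketch keeps the tail as text.
                pieces.push(Piece::Text(&rest[amp..]));
                rest = "";
            }
        }
    }
    if !rest.is_empty() {
        pieces.push(Piece::Text(rest));
    }
    pieces
}
```

For the input `a &lt; b` this produces a text piece `a `, a reference piece `lt`, and a text piece ` b`. Resolving the reference is left to the consumer, which is the division of work the commit note describes.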