Proposal: Implement new method for parsing markup from Fetch responses #10076

Closed
brandonmcconnell opened this issue Jan 18, 2024 · 4 comments
Labels
addition/proposal: New features or enhancements
needs implementer interest: Moving the issue forward requires implementers to express interest

Comments

@brandonmcconnell

brandonmcconnell commented Jan 18, 2024

Spec proposal

collapsed by default to avoid cluttering the required fields below

Introduction

This proposal introduces a new method, .markup(), to be included in the Fetch API response parsing interface. This method offers a streamlined interface for parsing various markup data formats such as HTML and XML.

Rationale

Consistency and Flexibility

Just as .text() and .json() methods offer simplified handling of text and JSON data, .markup() extends this ease of use to markup languages. By providing a single method with optional configurations, developers can handle XML and HTML data more flexibly within the same framework.

Performance and Optimization

Integrating this method into the language standard allows for engine-level optimizations, potentially outperforming custom parsing solutions and improving overall performance.

Enhanced Readability and Maintenance

A unified method simplifies codebases, enhancing readability and ease of maintenance. This aligns with modern JavaScript's goal of concise and powerful syntax.

Technical Specification

.markup() Method

  • Purpose: Parses the response body as XML or HTML based on specified configurations.
  • Usage: response.markup(options).
    • options: An optional argument specifying parsing preferences
      • type: "text/html" | "text/xml" (enforces self-closing tags, etc. for HTML)
  • Return Type: A promise that resolves with the result of parsing the response body text as specified.

Implementation Notes

  • Should follow the structural design of .text() and .json().
  • Includes error handling for malformed content, with robustness akin to .json().
  • Security considerations are paramount, especially for HTML content, to prevent injection attacks.
  • Should be capable of handling self-closing tags in HTML when specified in options.
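
As a rough illustration of the notes above, the proposed behavior can be approximated today with a helper built on .text() and DOMParser. This is a minimal sketch under stated assumptions, not the proposed API itself: the name parseMarkup, the "text/html" default, the parser option (present only so the helper can run where DOMParser is not a global), and the choice of SyntaxError for malformed XML are all hypothetical.

```javascript
// Hypothetical helper approximating the proposed response.markup(options).
// `parseMarkup`, the default type, and the `parser` option are assumptions.
const SUPPORTED_TYPES = ["text/html", "text/xml"];

async function parseMarkup(response, { type = "text/html", parser } = {}) {
  if (!SUPPORTED_TYPES.includes(type)) {
    throw new TypeError(`Unsupported markup type: ${type}`);
  }
  // Read the body as text first, as the .text() + DOMParser workaround does.
  const source = await response.text();
  // Fall back to the browser's DOMParser when no parser is injected.
  const domParser = parser ?? new DOMParser();
  const doc = domParser.parseFromString(source, type);
  // For XML, DOMParser reports malformed input through a <parsererror>
  // element instead of throwing, so surface it as a rejection here.
  if (type === "text/xml" && doc.querySelector("parsererror")) {
    throw new SyntaxError("Response body is not well-formed XML");
  }
  return doc;
}
```

Rejecting on malformed XML mirrors how .json() rejects on invalid JSON. HTML parsing is error-tolerant, so the sketch applies no equivalent check for "text/html".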

Use Cases

  • XML Feeds: Facilitates the consumption of XML feeds, such as RSS or Atom.
  • Client-Side Templating: Simplifies integration of HTML templates fetched from a server.
  • Web Scraping: Aids in efficient parsing of HTML for data extraction.
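
To make the XML feed use case concrete, here is a small sketch of what consuming a parsed RSS document might look like. The helper name extractFeedTitles is hypothetical, and it assumes the feed has already been parsed into a Document (today via DOMParser with "text/xml", or via the proposed method if it existed).

```javascript
// Hypothetical helper: collect item titles from an already-parsed RSS document.
function extractFeedTitles(feedDocument) {
  // "item > title" matches each <title> that is a direct child of an <item>,
  // which skips the channel-level <title>.
  const titleElements = feedDocument.querySelectorAll("item > title");
  return Array.from(titleElements, (element) => element.textContent.trim());
}
```

In a browser this could be called as extractFeedTitles(new DOMParser().parseFromString(xmlText, "text/xml")), where xmlText is the feed body obtained from response.text().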

Potential Challenges

  • Security Concerns: Ensuring safe parsing, particularly for HTML, to prevent XSS attacks.
  • Browser Support and Polyfills: Guaranteeing consistent behavior across different JavaScript engines and providing polyfills for backward compatibility.

Conclusion

Introducing the .markup() method in ECMAScript offers a versatile and optimized approach to handling XML and HTML data. This proposal seeks TC39's consideration for this addition, which is in line with JavaScript's evolution towards a more powerful and developer-friendly language.

What problem are you trying to solve?

Currently, developers handling XML and HTML content in ECMAScript face a lack of native, streamlined methods for parsing these markup languages. This leads to reliance on custom or third-party parsing solutions, which can vary in efficiency, security, and ease of use.

fetch("https://swapi.dev")
  .then(response => response.markup({ type: "text/html" }))
  .then(doc => console.log(doc))
  .catch(error => console.error(error));

What solutions exist today?

Presently, developers typically use custom-built parsers or third-party libraries to parse XML and HTML content. For example, libraries like xml2js or node-html-parser provide these capabilities, but they require additional dependencies and may not be optimized for all use cases. These solutions often lead to inconsistent implementations and may pose security risks, especially when parsing HTML content.

One workaround involves using the .text() method and then parsing its content using a new DOMParser.

For example:

fetch("https://swapi.dev")
  .then(response => response.text())
  .then(data => {
    const parser = new DOMParser();
    const doc = parser.parseFromString(data, "text/html");
    console.log(doc);
  })
  .catch(error => console.error(error));

This method is a bit cumbersome and does not provide any of the security benefits of the Sanitizer API.

How would you solve it?

The solution is to introduce a new method, .markup(), into the ECMAScript standard. This method will unify and simplify the parsing of XML and HTML content. By offering an optional configuration argument, it allows developers to specify the content type (XML or HTML) and other parsing preferences. For instance, response.markup({ type: "text/html" }) would parse HTML content while appropriately handling self-closing tags (the default behavior). This approach ensures consistency, optimizes performance, and reduces the security risks associated with third-party parsers.

Anything else?

In addition to providing a unified method for parsing markup languages, the .markup() method will include robust error handling and security features, especially vital for HTML parsing to prevent cross-site scripting (XSS) attacks. It should natively support the Sanitizer API, similar to how the setHTML() method will.
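
For context on the setHTML() comparison, the sketch below shows one way fetched markup could be inserted with sanitization where the browser supports it. The helper name insertUntrustedHtml and the textContent fallback are assumptions made for illustration. setHTML() is feature-detected because it may not be available in every browser.

```javascript
// Hypothetical helper: insert untrusted markup into `target`.
// Returns the name of the mechanism that was used.
function insertUntrustedHtml(target, html) {
  if (typeof target.setHTML === "function") {
    // Element.setHTML() sanitizes the markup before inserting it.
    target.setHTML(html);
    return "setHTML";
  }
  // Conservative fallback (an assumption of this sketch): show the markup as
  // inert text rather than parsing it with innerHTML.
  target.textContent = html;
  return "textContent";
}
```

The fallback trades fidelity for safety: the markup is displayed literally instead of being rendered, so no script or event-handler attribute in it can run.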

Its design will be in line with the existing .text() and .json() methods, ensuring familiarity and ease of adoption for developers. The proposal also considers the need for backward compatibility and browser support, suggesting the development of polyfills for older environments.
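
A polyfill for older environments could follow the usual feature-detection pattern. The sketch below is only an assumption about how such a polyfill might be wired up: installMarkupPolyfill and its parse callback are hypothetical names, and the actual parsing logic is left to the caller.

```javascript
// Hypothetical polyfill installer. `ResponseCtor` would normally be the global
// Response; `parse` is a caller-supplied function (response, options) => Promise.
function installMarkupPolyfill(ResponseCtor, parse) {
  // Never overwrite a native (or previously installed) implementation.
  if (typeof ResponseCtor.prototype.markup === "function") {
    return false;
  }
  Object.defineProperty(ResponseCtor.prototype, "markup", {
    value: function markup(options) {
      // `this` is the Response instance the method was called on.
      return parse(this, options);
    },
    writable: true,
    enumerable: true,
    configurable: true,
  });
  return true;
}
```

Passing the constructor in, rather than reaching for the global, keeps the installer usable in environments that provide their own Response implementation.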

@brandonmcconnell brandonmcconnell added addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest labels Jan 18, 2024
@annevk
Member

annevk commented Jan 19, 2024

Duplicate of #2142.

@annevk annevk closed this as not planned Jan 19, 2024
@domenic
Member

domenic commented Jan 19, 2024

Seemed to me kind of like a dupe of whatwg/fetch#16

@brandonmcconnell
Author

@annevk This proposal does not relate to streaming HTML content into elements.

@annevk
Member

annevk commented Jan 19, 2024

At least to me #2142 covers the idea of a streaming parser API generally.

And yeah, I guess Domenic is correct that exposing a method directly on Response for this is a non-starter.


3 participants