

@hideokamoto
Contributor

Problem

When running the langchain-stripe-loader package in a Node.js environment, the following error occurs:

Error loading documents: ReferenceError: document is not defined

This is because the Turndown library used for HTML to Markdown conversion relies on browser DOM APIs, which are not available in Node.js environments.
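A minimal TypeScript sketch of the failure mode: plain Node.js has no global `document`, so code that reads the bare identifier (as browser-only, DOM-based code does) throws a `ReferenceError`, while a `typeof` check on the same global is safe. The helper names are hypothetical and are not part of langchain-stripe-loader or Turndown.

```typescript
// Hypothetical helpers that only illustrate the error; not part of any package in this PR.

// A `typeof` check on a missing global is safe: it yields "undefined" instead of throwing.
export function hasDomDocument(): boolean {
  return typeof (globalThis as unknown as { document?: unknown }).document !== "undefined";
}

// Mimics browser-only code that reads the bare identifier `document`.
// `new Function` compiles the body in global scope, so the lookup fails when it is called in Node.js.
export function readDocumentLikeBrowserCode(): string {
  try {
    const doc = new Function("return document;")() as unknown;
    return `document is available: ${typeof doc}`;
  } catch (err) {
    if (err instanceof ReferenceError) {
      return `ReferenceError: ${err.message}`;
    }
    throw err;
  }
}

console.log(hasDomDocument());              // false under plain Node.js
console.log(readDocumentLikeBrowserCode()); // ReferenceError: document is not defined
```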

Solution

This PR replaces the Turndown library with node-html-markdown, which is designed to work natively in Node.js environments without requiring DOM APIs.

Changes:

  • Removed Turndown and its type definitions
  • Added node-html-markdown as a dependency
  • Updated the HTML to Markdown conversion logic in both StripeComDocumentLoader and StripeDocsDocumentLoader classes

Benefits

  • Fixed compatibility with Node.js environments (including AWS Lambda)
  • Significantly reduced bundle size (from 4 MB to 376 KB)
  • Improved performance by using a library optimized for Node.js
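For scale, the reported drop from 4 MB to 376 KB is roughly a 91% reduction when 1 MB is taken as 1,024 KB (about 90.6% with 1,000 KB):

```latex
1 - \frac{376\ \text{KB}}{4\ \text{MB}} = 1 - \frac{376}{4096} \approx 0.908
```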

Testing

  • Tested in a Node.js demo application
  • Verified that the loaders can successfully fetch and convert Stripe documentation


@gemini-code-assist (bot) left a comment

Hello @hideokamoto, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request addresses an issue where the langchain-stripe-loader package was not compatible with Node.js environments due to the Turndown library's reliance on browser DOM APIs. To resolve this, the PR replaces Turndown with node-html-markdown, which is designed for Node.js environments. The change also significantly reduces the bundle size and improves performance. The changes primarily involve updating the package.json files to remove the turndown dependency and add node-html-markdown, and modifying the StripeComDocumentLoader and StripeDocsDocumentLoader classes to use node-html-markdown for HTML-to-Markdown conversion.

Highlights

  • Node.js Compatibility: The primary goal of this PR is to fix compatibility issues with Node.js environments by replacing Turndown with node-html-markdown.
  • Bundle Size Reduction: Replacing Turndown with node-html-markdown significantly reduces the bundle size of the package.
  • Performance Improvement: The new library is optimized for Node.js, leading to improved performance in Node.js environments.
  • Metadata Updates: The package.json files have been updated with keywords, descriptions, repository information, and other metadata.

Changelog

  • package.json
    • Added keywords related to Stripe, documentation, loader, langchain, and sitemap.
    • Updated the license to MIT.
    • Added a description for the package.
    • Added repository, bugs, and homepage information.
    • These changes were applied to the root package.json.
  • packages/core/package.json
    • Added keywords related to Stripe, documentation, sitemap, processor, and utility.
    • Added repository, bugs, and homepage information.
    • Added publishConfig to allow public access.
    • Added a prepublishOnly script to ensure the package is built before publishing.
  • packages/langchain-stripe-loader/package.json
    • Replaced turndown with node-html-markdown as a dependency.
    • Removed @types/turndown from devDependencies.
    • Added keywords related to Stripe, documentation, loader, langchain, LLM, AI, and document-loader.
    • Added repository, bugs, and homepage information.
    • Added publishConfig to allow public access.
    • Added a prepublishOnly script to ensure the package is built before publishing.
  • packages/langchain-stripe-loader/src/StripeComLoader.ts
    • Replaced Turndown import with NodeHtmlMarkdown import.
    • Replaced encoder.turndown(article.content) with nhm.translate(article.content) to convert HTML to Markdown using node-html-markdown.
  • packages/langchain-stripe-loader/src/StripeDocsLoader.ts
    • Replaced Turndown import with NodeHtmlMarkdown import.
    • Replaced encoder.turndown(article.content) with nhm.translate(article.content) to convert HTML to Markdown using node-html-markdown.
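Both loaders make the same one-line change, so the conversion step can be pictured as a function that maps fetched articles to documents through any object exposing `translate(html): string`; an instance created with `new NodeHtmlMarkdown()` from node-html-markdown has such a method. The `Article`, `LoadedDocument`, and `HtmlTranslator` types, the `articlesToDocuments` function, and the metadata keys are hypothetical stand-ins, not the package's actual code, and a tiny fake translator is used so the sketch runs without dependencies.

```typescript
// Hypothetical shapes; the real loaders build LangChain Document objects from their own article type.
interface Article {
  url: string;
  title: string;
  content: string; // HTML extracted from the page
}

interface LoadedDocument {
  pageContent: string;
  metadata: Record<string, string>;
}

// Anything with translate(html) => markdown fits, e.g. `new NodeHtmlMarkdown()`.
interface HtmlTranslator {
  translate(html: string): string;
}

export function articlesToDocuments(articles: Article[], nhm: HtmlTranslator): LoadedDocument[] {
  return articles.map(article => {
    // Previously: encoder.turndown(article.content)
    const markdownContent = nhm.translate(article.content);
    return {
      pageContent: markdownContent,
      metadata: { source: article.url, title: article.title }, // hypothetical metadata keys
    };
  });
}

// Fake translator for illustration only: handles <h1> and <p> and nothing else.
const fakeTranslator: HtmlTranslator = {
  translate: html =>
    html
      .replace(/<h1>(.*?)<\/h1>/g, "# $1\n\n")
      .replace(/<p>(.*?)<\/p>/g, "$1\n")
      .trim(),
};

const docs = articlesToDocuments(
  [{ url: "https://example.com/a", title: "A", content: "<h1>Hello</h1><p>World</p>" }],
  fakeTranslator,
);
console.log(JSON.stringify(docs[0].pageContent)); // "# Hello\n\nWorld"
```

Passing the translator in, rather than constructing it inside the loader, keeps a future library swap to a single line.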
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in issue comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP 8 for Python) by adding files to a .gemini/ folder at the root of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


Trivia time!

What is the primary use case for Markdown, and who initially created it?

Markdown is primarily used for formatting text on the web and was created by John Gruber in 2004.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist (bot) left a comment

Code Review

This pull request addresses a critical issue where the langchain-stripe-loader package was incompatible with Node.js environments due to the Turndown library's reliance on browser DOM APIs. The solution of replacing Turndown with node-html-markdown is well-justified and brings significant benefits, including Node.js compatibility, reduced bundle size, and improved performance. The changes are focused and the provided testing information is reassuring.

Summary of Findings

  • Missing Error Handling: The code does not include error handling for the nhm.translate function. If this function throws an error, it could crash the application. Error handling should be added to ensure the application is robust.

Merge Readiness

The pull request is well-structured and addresses a significant compatibility issue. However, the absence of error handling around the nhm.translate function is a concern that needs to be addressed before merging. I am unable to approve this pull request and recommend that others review and approve this code before merging. At a minimum, the high-severity issue should be addressed before merging.

Comment on lines 102 to +103
     const documents = articles.map(article => {
    -  const markdownContent = encoder.turndown(article.content);
    +  const markdownContent = nhm.translate(article.content);

Severity: high

Consider adding error handling around the nhm.translate function. If this function throws an error, it could crash the application. Wrapping it in a try-catch block would make the code more robust.

    const documents = articles.map(article => {
      let markdownContent;
      try {
        markdownContent = nhm.translate(article.content);
      } catch (error) {
        console.error("Error converting HTML to Markdown:", error);
        markdownContent = ""; // Or some default value or error handling
      }
      return new Document({
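Since the same try/catch is suggested for both loaders, it could live in one helper so the fallback behavior is defined once. This sketch assumes `nhm.translate` may throw on some input, which the review raises as a possibility rather than an observed failure; `safeTranslate`, its `fallback` parameter, and the `HtmlTranslator` interface are hypothetical names.

```typescript
// Minimal shape of the converter; a NodeHtmlMarkdown instance provides translate(html).
interface HtmlTranslator {
  translate(html: string): string;
}

// Hypothetical helper: converts HTML to Markdown, returning `fallback` instead of throwing.
export function safeTranslate(nhm: HtmlTranslator, html: string, fallback = ""): string {
  try {
    return nhm.translate(html);
  } catch (error) {
    // Log and continue so one bad article does not abort the whole load.
    console.error("Error converting HTML to Markdown:", error);
    return fallback;
  }
}

// Illustration only: a translator that fails on one specific input.
const flakyTranslator: HtmlTranslator = {
  translate: html => {
    if (html.includes("<broken")) throw new Error("cannot parse");
    return html.replace(/<\/?p>/g, "");
  },
};

const results = ["<p>ok</p>", "<broken"].map(html => safeTranslate(flakyTranslator, html));
console.log(results); // [ 'ok', '' ]
```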

Comment on lines 68 to +69
     const documents = articles.map(article => {
    -  const markdownContent = encoder.turndown(article.content);
    +  const markdownContent = nhm.translate(article.content);

Severity: high

Consider adding error handling around the nhm.translate function. If this function throws an error, it could crash the application. Wrapping it in a try-catch block would make the code more robust.

    const documents = articles.map(article => {
      let markdownContent;
      try {
        markdownContent = nhm.translate(article.content);
      } catch (error) {
        console.error("Error converting HTML to Markdown:", error);
        markdownContent = ""; // Or some default value or error handling
      }
      return new Document({
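An alternative to substituting an empty string, which would still emit an empty document, is to skip articles whose conversion fails and report which ones failed. The sketch below does that with `flatMap`; `convertAll`, `Converted`, `ConversionFailure`, and the `Article` shape are hypothetical. Which behavior is preferable depends on how downstream consumers treat documents with empty content.

```typescript
interface Article {
  url: string;
  content: string;
}

interface HtmlTranslator {
  translate(html: string): string;
}

interface Converted {
  url: string;
  markdown: string;
}

interface ConversionFailure {
  url: string;
  error: unknown;
}

// Hypothetical variant: keep only successful conversions and collect the failures.
export function convertAll(
  articles: Article[],
  nhm: HtmlTranslator,
): { converted: Converted[]; failures: ConversionFailure[] } {
  const failures: ConversionFailure[] = [];
  const converted = articles.flatMap<Converted>(article => {
    try {
      return [{ url: article.url, markdown: nhm.translate(article.content) }];
    } catch (error) {
      failures.push({ url: article.url, error });
      return []; // skip this article instead of emitting an empty document
    }
  });
  return { converted, failures };
}

// Illustration only: a translator that rejects empty input.
const translator: HtmlTranslator = {
  translate: html => {
    if (html === "") throw new Error("empty input");
    return html.toUpperCase();
  },
};

const { converted, failures } = convertAll(
  [{ url: "a", content: "<b>x</b>" }, { url: "b", content: "" }],
  translator,
);
console.log(converted.length, failures.map(f => f.url)); // 1 [ 'b' ]
```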

@hideokamoto merged commit 6488970 into main on Mar 11, 2025