Fix: Replace Turndown with node-html-markdown for Node.js compatibility #1
Hello @hideokamoto, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
This pull request addresses an issue where the langchain-stripe-loader package was not compatible with Node.js environments due to the Turndown library's reliance on browser DOM APIs. To resolve this, I've replaced Turndown with node-html-markdown, which is designed for Node.js environments. This change also significantly reduces the bundle size and improves performance. The changes primarily involve updating the package.json files to remove the turndown dependency and add node-html-markdown, and modifying the StripeComDocumentLoader and StripeDocsDocumentLoader classes to use node-html-markdown for HTML to Markdown conversion.
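The root cause described above is that a browser-oriented converter expects DOM globals that a plain Node.js process does not provide. The snippet below is only an illustration of that gap, not code from this PR: `hasBrowserDom` is a hypothetical helper that checks for the standard browser globals `document` and `DOMParser`.

```typescript
// Hypothetical helper (not part of this PR): reports whether the browser
// DOM globals that a DOM-based HTML parser would rely on exist here.
function hasBrowserDom(): boolean {
  // Cast through unknown so we can probe globals without DOM typings.
  const g = globalThis as unknown as Record<string, unknown>;
  return typeof g.document !== "undefined" && typeof g.DOMParser !== "undefined";
}

// In plain Node.js neither global is defined by default, so this logs
// "Browser DOM available: false".
console.log(`Browser DOM available: ${hasBrowserDom()}`);
```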
Highlights
- Node.js Compatibility: The primary goal of this PR is to fix compatibility issues with Node.js environments by replacing `Turndown` with `node-html-markdown`.
- Bundle Size Reduction: Replacing `Turndown` with `node-html-markdown` significantly reduces the bundle size of the package.
- Performance Improvement: The new library is optimized for Node.js, leading to improved performance in Node.js environments.
- Metadata Updates: The `package.json` files have been updated with keywords, descriptions, repository information, and other metadata.
Changelog
- package.json
  - Added keywords related to Stripe, documentation, loader, langchain, and sitemap.
  - Updated the license to MIT.
  - Added a description for the package.
  - Added repository, bugs, and homepage information.
  - These changes were applied to the root `package.json`.
- packages/core/package.json
  - Added keywords related to Stripe, documentation, sitemap, processor, and utility.
  - Added repository, bugs, and homepage information.
  - Added `publishConfig` to allow public access.
  - Added a `prepublishOnly` script to ensure the package is built before publishing.
- packages/langchain-stripe-loader/package.json
  - Replaced `turndown` with `node-html-markdown` as a dependency.
  - Removed `@types/turndown` from devDependencies.
  - Added keywords related to Stripe, documentation, loader, langchain, LLM, AI, and document-loader.
  - Added repository, bugs, and homepage information.
  - Added `publishConfig` to allow public access.
  - Added a `prepublishOnly` script to ensure the package is built before publishing.
- packages/langchain-stripe-loader/src/StripeComLoader.ts
  - Replaced the `Turndown` import with a `NodeHtmlMarkdown` import.
  - Replaced `encoder.turndown(article.content)` with `nhm.translate(article.content)` to convert HTML to Markdown using `node-html-markdown`.
- packages/langchain-stripe-loader/src/StripeDocsLoader.ts
  - Replaced the `Turndown` import with a `NodeHtmlMarkdown` import.
  - Replaced `encoder.turndown(article.content)` with `nhm.translate(article.content)` to convert HTML to Markdown using `node-html-markdown`.
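Both loaders received the identical one-line edit, which suggests the conversion step could live in one place so a future library swap touches a single function. The sketch below is a hypothetical refactor, not what this PR does: `HtmlToMarkdown`, `articlesToDocs`, and `Article` are illustrative names, `Doc` is a minimal stand-in for LangChain's `Document` (`pageContent` plus `metadata`), and the toy converter in the usage lines is not a real HTML-to-Markdown implementation.

```typescript
// Hypothetical refactor sketch: both loaders pass their converter here.
// In the real code the converter would be something like
// `(html) => nhm.translate(html)`.
type HtmlToMarkdown = (html: string) => string;

/** Minimal stand-in for LangChain's Document (pageContent + metadata). */
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

/** Assumed article shape: only `content` appears in the PR diff. */
interface Article {
  content: string;
}

function articlesToDocs(articles: Article[], convert: HtmlToMarkdown): Doc[] {
  return articles.map((article) => ({
    pageContent: convert(article.content),
    metadata: {}, // the loaders' real metadata is not shown in this PR
  }));
}

// Usage with a toy converter that only rewrites <h1> tags.
const toyConvert: HtmlToMarkdown = (html) =>
  html.replace(/<h1>(.*?)<\/h1>/g, "# $1");
const docs = articlesToDocs([{ content: "<h1>Payments</h1>" }], toyConvert);
console.log(docs[0].pageContent); // # Payments
```

Injecting the converter also makes the loaders testable without pulling in either HTML library.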
Code Review
This pull request addresses a critical issue where the langchain-stripe-loader package was incompatible with Node.js environments due to the Turndown library's reliance on browser DOM APIs. The solution of replacing Turndown with node-html-markdown is well-justified and brings significant benefits, including Node.js compatibility, reduced bundle size, and improved performance. The changes are focused and the provided testing information is reassuring.
Summary of Findings
- Missing Error Handling: The code does not include error handling for the `nhm.translate` call. If it throws, the exception propagates out of the `articles.map` callback, aborts the entire load, and can crash the application if nothing upstream catches it. Error handling should be added to ensure the application is robust.
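The PR does not establish whether `nhm.translate` actually throws on malformed input, so the finding is precautionary. One way to address it without repeating a try-catch in both loaders is a small wrapper; the sketch below assumes a hypothetical `translateOrFallback` helper that takes any HTML-to-Markdown function, and the converters in the usage lines are toys rather than `node-html-markdown` itself.

```typescript
// Hypothetical wrapper: run a converter, log failures, and return a
// fallback string instead of letting the exception escape.
type HtmlToMarkdown = (html: string) => string;

function translateOrFallback(
  convert: HtmlToMarkdown,
  html: string,
  fallback = "",
): string {
  try {
    return convert(html);
  } catch (error) {
    console.error("Error converting HTML to Markdown:", error);
    return fallback;
  }
}

// Toy converters standing in for a failing and a succeeding library call.
const failing: HtmlToMarkdown = () => {
  throw new Error("boom");
};
console.log(JSON.stringify(translateOrFallback(failing, "<p>x</p>"))); // ""
console.log(translateOrFallback((h) => h.toUpperCase(), "<p>x</p>")); // <P>X</P>
```

In a loader this would read `const markdownContent = translateOrFallback((html) => nhm.translate(html), article.content);`.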
Merge Readiness
The pull request is well-structured and addresses a significant compatibility issue. However, the absence of error handling around the nhm.translate function is a concern that needs to be addressed before merging. I am unable to approve this pull request, and recommend that others review and approve this code before merging. At a minimum, the high severity issue should be addressed before merging.
  const documents = articles.map(article => {
-   const markdownContent = encoder.turndown(article.content);
+   const markdownContent = nhm.translate(article.content);
Consider adding error handling around the nhm.translate function. If this function throws an error, it could crash the application. Wrapping it in a try-catch block would make the code more robust.
const documents = articles.map(article => {
  let markdownContent: string;
  try {
    markdownContent = nhm.translate(article.content);
  } catch (error) {
    console.error("Error converting HTML to Markdown:", error);
    markdownContent = ""; // Or some default value or error handling
  }
  return new Document({

  const documents = articles.map(article => {
-   const markdownContent = encoder.turndown(article.content);
+   const markdownContent = nhm.translate(article.content);
Consider adding error handling around the nhm.translate function. If this function throws an error, it could crash the application. Wrapping it in a try-catch block would make the code more robust.
const documents = articles.map(article => {
  let markdownContent: string;
  try {
    markdownContent = nhm.translate(article.content);
  } catch (error) {
    console.error("Error converting HTML to Markdown:", error);
    markdownContent = ""; // Or some default value or error handling
  }
  return new Document({
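The suggested fallback still emits a document with empty content for every failed article, which can end up as blank entries in a vector store. An alternative design is to skip failed articles and report which ones failed. The sketch below is hypothetical: `convertArticles`, `ConversionResult`, and `Article` are illustrative names, `Doc` stands in for LangChain's `Document`, and the `strict` converter is a toy that only simulates a failure.

```typescript
// Hypothetical alternative: drop articles whose conversion throws and
// return their indexes so the caller can log or retry them.
type HtmlToMarkdown = (html: string) => string;

interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

interface Article {
  content: string;
}

interface ConversionResult {
  docs: Doc[];
  failedIndexes: number[];
}

function convertArticles(articles: Article[], convert: HtmlToMarkdown): ConversionResult {
  const docs: Doc[] = [];
  const failedIndexes: number[] = [];
  articles.forEach((article, index) => {
    try {
      docs.push({ pageContent: convert(article.content), metadata: {} });
    } catch (error) {
      console.error(`Skipping article ${index}:`, error);
      failedIndexes.push(index);
    }
  });
  return { docs, failedIndexes };
}

// Toy converter that rejects empty input to simulate a conversion failure.
const strict: HtmlToMarkdown = (html) => {
  if (html.trim() === "") throw new Error("empty HTML");
  return html;
};
const result = convertArticles([{ content: "<p>a</p>" }, { content: "" }], strict);
console.log(`${result.docs.length} converted, failed: [${result.failedIndexes.join(", ")}]`);
// 1 converted, failed: [1]
```

Whether skipping or an empty placeholder is preferable depends on whether downstream code expects one document per source article.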
Problem
When running the `langchain-stripe-loader` package in a Node.js environment, the following error occurs:

This is because the `Turndown` library used for HTML to Markdown conversion relies on browser DOM APIs, which are not available in Node.js environments.

Solution

This PR replaces the `Turndown` library with `node-html-markdown`, which is designed to work natively in Node.js environments without requiring DOM APIs.

Changes:
- Removed `Turndown` and its type definitions
- Added `node-html-markdown` as a dependency
- Updated the `StripeComDocumentLoader` and `StripeDocsDocumentLoader` classes

Benefits
Testing