Add Oxylabs Document Loader #4625

Open
wants to merge 1 commit into
base: main

oxy-rostyslav

Description

Adds an Oxylabs document loader that can load data from multiple sources efficiently.

Example

[Two screenshots]

@HenryHengZJ
Contributor

Can you run pnpm lint-fix to fix the linting issues?

@HenryHengZJ HenryHengZJ requested a review from Copilot June 24, 2025 18:31
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

Adds a new Oxylabs document loader that integrates with the Oxylabs real-time scraping API and exposes it as a Flowise node.

  • Introduces OxylabsLoader for making authenticated requests to various Oxylabs sources.
  • Wraps the loader in an INode implementation (Oxylabs_DocumentLoaders) with UI inputs and output handling.
  • Adds a credential definition for Oxylabs API credentials.

Reviewed Changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| packages/components/nodes/documentloaders/Oxylabs/Oxylabs.ts | New loader class, API request methods, and Flowise node wrapper |
| packages/components/credentials/OxylabsApi.credential.ts | New credential class for Oxylabs API username/password |

Comments suppressed due to low confidence (3)

packages/components/nodes/documentloaders/Oxylabs/Oxylabs.ts:69

  • [nitpick] The OxylabsLoader class lacks JSDoc or inline comments. Adding a brief description and parameter documentation would improve readability and help future maintainers.
export class OxylabsLoader extends BaseDocumentLoader {

packages/components/nodes/documentloaders/Oxylabs/Oxylabs.ts:165

  • [nitpick] Class name Oxylabs_DocumentLoaders uses an underscore and plural form. Consider renaming to OxylabsDocumentLoaderNode for consistency with the project's naming conventions.
class Oxylabs_DocumentLoaders implements INode {

packages/components/nodes/documentloaders/Oxylabs/Oxylabs.ts:1

  • No unit tests were added for OxylabsLoader. Consider adding tests to cover each source type, parameter filtering, and error handling paths.
import { TextSplitter } from 'langchain/text_splitter'
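
One way to start on the parameter-filtering test suggested above is to pull the filtering step into a standalone function and check it directly. This is only a sketch: the name filterEmptyParams is hypothetical and does not appear in the PR, and the predicate is copied from the sendAPIRequest excerpt in this review.

```typescript
// Hypothetical helper (not part of the PR): drops null, undefined, and
// empty-string values, using the same predicate as sendAPIRequest.
function filterEmptyParams(params: Record<string, unknown>): Record<string, unknown> {
    return Object.fromEntries(
        Object.entries(params).filter(([, value]) => value !== null && value !== '' && value !== undefined)
    )
}

// Example input; the keys are illustrative, not taken from the PR.
const filtered = filterEmptyParams({
    source: 'universal',
    url: 'https://example.com',
    geo_location: '',
    render: null,
    parse: false
})
console.log(filtered)
```

Note that falsy values such as false and 0 pass the filter, so a test should probably cover them alongside the null, undefined, and empty-string cases.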

private async sendAPIRequest<R>(params: any): Promise<AxiosResponse<R, any>> {
params = Object.fromEntries(Object.entries(params).filter(([_, value]) => value !== null && value !== '' && value !== undefined))

const auth = btoa(`${this.params.username}:${this.params.password}`)

Copilot AI Jun 24, 2025

Using btoa may not work in Node.js environments; consider using Buffer.from(`${this.params.username}:${this.params.password}`).toString('base64') for server-side base64 encoding.

Suggested change
- const auth = btoa(`${this.params.username}:${this.params.password}`)
+ const auth = Buffer.from(`${this.params.username}:${this.params.password}`).toString('base64')
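
To make the suggestion concrete, here is a minimal sketch of building an HTTP Basic Authorization header value with Buffer. The helper name buildBasicAuthHeader is hypothetical and not part of the PR, and the sketch assumes the credentials are sent as a standard Basic header.

```typescript
// Hypothetical helper (not part of the PR): builds an HTTP Basic
// Authorization header value from a username and password.
function buildBasicAuthHeader(username: string, password: string): string {
    // Buffer is a Node.js global; strings are encoded as UTF-8 here.
    const token = Buffer.from(`${username}:${password}`, 'utf8').toString('base64')
    return `Basic ${token}`
}

console.log(buildBasicAuthHeader('user', 'pass')) // prints "Basic dXNlcjpwYXNz"
```

Besides availability, one reason to prefer Buffer is character handling: Buffer encodes the string as UTF-8, whereas btoa throws on characters outside the Latin-1 range, which could matter for passwords containing such characters.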

Copilot uses AI. Check for mistakes.

Comment on lines +153 to +159
const docs: OxylabsDocument[] = [
{
id: response.data.job.id.toString(),
pageContent: response.data.results[0].content,
metadata: {}
}
]

Copilot AI Jun 24, 2025

Only the first result is converted into a document. To support multiple pages or results, map over response.data.results and create a document for each entry.

Suggested change
- const docs: OxylabsDocument[] = [
- {
- id: response.data.job.id.toString(),
- pageContent: response.data.results[0].content,
- metadata: {}
- }
- ]
+ const docs: OxylabsDocument[] = response.data.results.map((result, index) => ({
+ id: `${response.data.job.id.toString()}-${index}`,
+ pageContent: result.content,
+ metadata: {}
+ }))

return response
}

private async getUniversal(): Promise<AxiosResponse<OxylabsResponse, any>> {

Copilot AI Jun 24, 2025

[nitpick] The getUniversal, getGoogleSearch, getAmazonProduct, and getAmazonSearch methods are almost identical. Consider unifying them into a single method that accepts source as an argument to reduce duplication.
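
The unification could look roughly like the sketch below. It assumes the four methods differ only in the source value they send, which the diff excerpt here does not confirm. The names OxylabsSource, buildRequestParams, and fetchFromSource are hypothetical, the source strings are assumed from the method names, and the send callback stands in for the PR's sendAPIRequest.

```typescript
// Assumed source identifiers, derived from the four method names.
type OxylabsSource = 'universal' | 'google_search' | 'amazon_product' | 'amazon_search'
type RequestParams = Record<string, unknown>

// Hypothetical helper: merges shared parameters with the chosen source.
// The explicit source argument overrides any source in baseParams.
function buildRequestParams(source: OxylabsSource, baseParams: RequestParams): RequestParams {
    return { ...baseParams, source }
}

// Hypothetical replacement for the four near-identical methods. The send
// function is injected so the sketch runs without a network call.
async function fetchFromSource<R>(
    source: OxylabsSource,
    baseParams: RequestParams,
    send: (params: RequestParams) => Promise<R>
): Promise<R> {
    return send(buildRequestParams(source, baseParams))
}

// Usage with a stub sender that echoes the request parameters.
const echo = async (params: RequestParams) => ({ data: params })
fetchFromSource('google_search', { query: 'flowise' }, echo).then((response) => console.log(response.data))
```

Injecting the sender also makes each source path straightforward to unit test, which ties in with the earlier comment about missing tests.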
