Add Oxylabs Document Loader #4625
base: main
can you do |
Pull Request Overview
Adds a new Oxylabs document loader that integrates with the Oxylabs real-time scraping API and exposes it as a Flowise node.
- Introduces `OxylabsLoader` for making authenticated requests to various Oxylabs sources.
- Wraps the loader in an `INode` implementation (`Oxylabs_DocumentLoaders`) with UI inputs and output handling.
- Adds a credential definition for Oxylabs API credentials.
Reviewed Changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
packages/components/nodes/documentloaders/Oxylabs/Oxylabs.ts | New loader class, API request methods, and Flowise node wrapper |
packages/components/credentials/OxylabsApi.credential.ts | New credential class for Oxylabs API username/password |
Comments suppressed due to low confidence (3)
packages/components/nodes/documentloaders/Oxylabs/Oxylabs.ts:69
- [nitpick] The `OxylabsLoader` class lacks JSDoc or inline comments. Adding a brief description and parameter docs would improve readability and help future maintainers.
export class OxylabsLoader extends BaseDocumentLoader {
packages/components/nodes/documentloaders/Oxylabs/Oxylabs.ts:165
- [nitpick] Class name `Oxylabs_DocumentLoaders` uses an underscore and plural form. Consider renaming to `OxylabsDocumentLoaderNode` for consistency with the project's naming conventions.
class Oxylabs_DocumentLoaders implements INode {
packages/components/nodes/documentloaders/Oxylabs/Oxylabs.ts:1
- No unit tests were added for `OxylabsLoader`. Consider adding tests to cover each source type, parameter filtering, and error handling paths.
import { TextSplitter } from 'langchain/text_splitter'
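As one possible starting point for the parameter-filtering test suggested above, the filter can be pulled into a small pure function and checked in isolation. This is only a sketch: `dropEmptyParams` is a hypothetical helper name, and the sample keys (`geo_location`, `render`, `parse`, `pages`) are illustrative inputs, not taken from the PR.

```typescript
// Hypothetical standalone version of the filtering step used before sending a request.
type Params = Record<string, unknown>

// Drops entries whose value is null, undefined, or an empty string.
// Falsy but meaningful values such as false and 0 are kept.
function dropEmptyParams(params: Params): Params {
    return Object.fromEntries(
        Object.entries(params).filter(([, value]) => value !== null && value !== '' && value !== undefined)
    )
}

// Illustrative input only; the key names are assumptions.
const cleaned = dropEmptyParams({
    source: 'universal',
    url: 'https://example.com',
    geo_location: '',
    render: null,
    parse: false,
    pages: 0
})
console.log(cleaned)
```

Keeping `false` and `0` matters here: a plain truthiness check would silently drop them, which is why the filter compares against the three empty values explicitly.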
private async sendAPIRequest<R>(params: any): Promise<AxiosResponse<R, any>> {
    params = Object.fromEntries(Object.entries(params).filter(([_, value]) => value !== null && value !== '' && value !== undefined))

    const auth = btoa(`${this.params.username}:${this.params.password}`)
Using `btoa` may not work in Node.js environments; consider using ``Buffer.from(`${this.params.username}:${this.params.password}`).toString('base64')`` for server-side base64 encoding.
- const auth = btoa(`${this.params.username}:${this.params.password}`)
+ const auth = Buffer.from(`${this.params.username}:${this.params.password}`).toString('base64')
Copilot uses AI. Check for mistakes.
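A minimal sketch of the `Buffer`-based encoding, assuming the encoded value ends up in an HTTP Basic `Authorization` header (the diff does not show how `auth` is used). `basicAuthHeader` is a hypothetical helper, not part of the PR.

```typescript
// Hypothetical helper: builds an HTTP Basic auth header value in Node.js.
// Buffer handles non-Latin-1 characters in credentials, which btoa rejects.
function basicAuthHeader(username: string, password: string): string {
    const token = Buffer.from(`${username}:${password}`, 'utf8').toString('base64')
    return `Basic ${token}`
}

// Placeholder credentials for illustration only.
const header = basicAuthHeader('user', 'pass')
console.log(header) // Basic dXNlcjpwYXNz
```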
const docs: OxylabsDocument[] = [
    {
        id: response.data.job.id.toString(),
        pageContent: response.data.results[0].content,
        metadata: {}
    }
]
Only the first result is converted into a document. To support multiple pages or results, map over `response.data.results` and create a document for each entry.
- const docs: OxylabsDocument[] = [
-     {
-         id: response.data.job.id.toString(),
-         pageContent: response.data.results[0].content,
-         metadata: {}
-     }
- ]
+ const docs: OxylabsDocument[] = response.data.results.map((result, index) => ({
+     id: `${response.data.job.id.toString()}-${index}`,
+     pageContent: result.content,
+     metadata: {}
+ }))
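The suggested mapping can be tried outside the loader with stand-in types. The interfaces below are assumptions that mirror only the fields visible in the diff (`job.id`, `results[].content`); the real `OxylabsResponse` and `OxylabsDocument` types in the PR may carry more. `toDocuments` is a hypothetical name.

```typescript
// Assumed minimal shapes, based only on the fields used in the diff.
interface ResultEntry {
    content: string
}
interface ResponseData {
    job: { id: number | string }
    results: ResultEntry[]
}
interface LoaderDocument {
    id: string
    pageContent: string
    metadata: Record<string, unknown>
}

// One document per result; the index suffix keeps ids unique within a job.
function toDocuments(data: ResponseData): LoaderDocument[] {
    return data.results.map((result, index) => ({
        id: `${String(data.job.id)}-${index}`,
        pageContent: result.content,
        metadata: {}
    }))
}

// Illustrative payload only.
const docs = toDocuments({ job: { id: 123 }, results: [{ content: 'first' }, { content: 'second' }] })
console.log(docs.map((d) => d.id))
```

Unlike indexing `results[0]` directly, this version also returns an empty array instead of throwing when `results` is empty.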
    return response
}

private async getUniversal(): Promise<AxiosResponse<OxylabsResponse, any>> {
[nitpick] The `getUniversal`, `getGoogleSearch`, `getAmazonProduct`, and `getAmazonSearch` methods are almost identical. Consider unifying them into a single method that accepts `source` as an argument to reduce duplication.
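A rough sketch of what the unified method might look like, with the request function injected so the example runs without a network call. The `source` strings are guesses derived from the four method names, and `makeGetBySource`, `SendRequest`, and `stubSend` are hypothetical names; in the PR the send function would presumably be `sendAPIRequest`.

```typescript
// Assumed source identifiers, inferred from the method names in the review comment.
type OxylabsSource = 'universal' | 'google_search' | 'amazon_product' | 'amazon_search'

// Stand-in for the loader's request method.
type SendRequest<R> = (params: Record<string, unknown>) => Promise<R>

// Returns a single fetcher that replaces the four near-identical methods:
// the shared parameters are fixed once and only `source` varies per call.
function makeGetBySource<R>(send: SendRequest<R>, baseParams: Record<string, unknown>) {
    return (source: OxylabsSource): Promise<R> => send({ ...baseParams, source })
}

// Stub that records what would have been sent instead of calling the API.
const calls: Record<string, unknown>[] = []
const stubSend: SendRequest<string> = async (params) => {
    calls.push(params)
    return 'ok'
}

const getBySource = makeGetBySource(stubSend, { query: 'example' })
void getBySource('google_search')
console.log(calls)
```

Restricting `source` to a union type means an unsupported source is rejected at compile time rather than by the API.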
Description
Adds an Oxylabs document loader that can load data from multiple sources efficiently.
Example