feat: added 3 Bright Data web scraping tools #4700
base: main
- Add BrightDataWebScraper: Web scraping with markdown/HTML output
- Add BrightDataSearchEngine: Multi-engine search (Google, Bing, Yandex)
- Add BrightDataStructuredData: 40+ dataset auto-detection and extraction

All tools include:
- Comprehensive error handling
- Configurable timeouts and zones
- FlowiseAI integration patterns
- Debug logging for troubleshooting
This update contains the components by Bright Data.
- Fixed YouTube video/comments dataset ID conflict
- Updated regex patterns for Zara, Yahoo Finance, X/Twitter, Booking.com
- Enhanced tool descriptions to include all 40+ supported platforms
- Improved pattern detection order for better matching
- Added comprehensive platform support documentation
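The commit above mentions regex patterns and detection order. The PR's actual patterns and dataset IDs are not shown in this conversation, so the following is only a minimal sketch of why order matters in first-match detection. The dataset names (`x_post`, `x_profile`, `booking_hotel`) and the regexes are hypothetical and chosen purely for illustration.

```typescript
// Hypothetical entries: names and patterns are illustrative only,
// not the dataset IDs or regexes used in the PR.
interface DatasetPattern {
    name: string
    pattern: RegExp
}

// Order matters: the first matching entry wins, so the more specific
// pattern (a single post) is listed before the broader one (a profile).
const DATASET_PATTERNS: DatasetPattern[] = [
    { name: 'x_post', pattern: /^https?:\/\/(www\.)?(x|twitter)\.com\/[^/]+\/status\/\d+/ },
    { name: 'x_profile', pattern: /^https?:\/\/(www\.)?(x|twitter)\.com\/[^/?#]+/ },
    { name: 'booking_hotel', pattern: /^https?:\/\/(www\.)?booking\.com\/hotel\// }
]

// Returns the name of the first dataset whose pattern matches, or null.
function detectDataset(url: string): string | null {
    for (const { name, pattern } of DATASET_PATTERNS) {
        if (pattern.test(url)) return name
    }
    return null
}
```

If the two X/Twitter entries were swapped, every post URL would be classified as a profile, which is the kind of conflict a change in detection order would address. Anchoring each pattern at the start of the URL also keeps hosts that merely end in `x.com` from matching.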
…ata/brightdata-Flowise-component into feature/brightdata-tools
Pull Request Overview
Adds two new Bright Data–powered tools and a credential definition to support web scraping and search functionality.
- Introduces `BrightDataWebScraperTool` for page scraping with Bright Data Web Unlocker.
- Implements `BrightDataSearchEngineTool` for paginated search results from Google, Bing, and Yandex.
- Defines `BrightDataApiCredential` for managing Bright Data API tokens.
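As a rough illustration of what paginated search across the three engines can look like, here is a sketch that builds a results-page URL per engine. The function name `buildSearchUrl` and the zero-based `page` convention are assumptions made for this example. The query parameters (`start` for Google, `first` for Bing, `p` for Yandex) are the engines' commonly used public parameters, and whether the PR builds its URLs this way is not shown here.

```typescript
type SearchEngine = 'google' | 'bing' | 'yandex'

// Hypothetical helper: builds a search URL for a zero-based results page.
// Google takes a zero-based result offset, Bing a one-based result offset,
// and Yandex a zero-based page index.
function buildSearchUrl(engine: SearchEngine, query: string, page = 0, pageSize = 10): string {
    const q = encodeURIComponent(query)
    switch (engine) {
        case 'google':
            return `https://www.google.com/search?q=${q}&start=${page * pageSize}`
        case 'bing':
            return `https://www.bing.com/search?q=${q}&first=${page * pageSize + 1}`
        case 'yandex':
            return `https://yandex.com/search/?text=${q}&p=${page}`
    }
}
```

Keeping the per-engine differences in one function means the tool's input schema can expose a single `page` number regardless of engine.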
Reviewed Changes
Copilot reviewed 3 out of 10 changed files in this pull request and generated 4 comments.
File | Description |
---|---|
packages/components/nodes/tools/BrightData/BrightDataWebScraper/BrightDataWebScraper.ts | Web scraper tool implementation and node registration |
packages/components/nodes/tools/BrightData/BrightDataSearchEngine/BrightDataSearchEngine.ts | Search engine tool with pagination and error handling |
packages/components/credentials/BrightData.credential.ts | Bright Data API credential definition |
Comments suppressed due to low confidence (1)
packages/components/nodes/tools/BrightData/BrightDataWebScraper/BrightDataWebScraper.ts:125
- [nitpick] The class name contains an underscore; rename to 'BrightDataWebScraperTools' to follow PascalCase naming conventions and maintain consistency.
class BrightDataWebScraper_Tools implements INode {
Thanks! Can you remove the redundant folder?
Hi @HenryHengZJ, I've removed the redundant shared folder as requested. Regarding "allow edits for maintainer": this option is not available for PRs from organization forks (brightdata) due to GitHub's policy. GitHub only allows this feature for personal account forks. If you need to make edits, I'm happy to implement any changes you suggest through the normal review process, or to submit from a personal account (but the tool will have to be under BrightData). Thanks!
Hey @Idanvilenski. I tried using the Bright Data tools in my chatflow and I can't get them to work. I keep getting 400 errors during tool calls. Let me know if I'm doing something wrong or if I should follow certain steps (so we can document it). I'm using the API key from a free Bright Data account.
Hi @0xi4o, thanks for checking out the component, and I'm sorry to see that you are having problems with it. Since I can see that the tools are being called correctly by the agent, I think this is an API issue. Please make sure you have "Admin permissions" for your API key on the Bright Data website (like in the picture), and let me know if that's not the case. Also, I noticed that the agent is trying to use the search_engine function (which is used for SERP searches on Google, Yandex, and Bing) to perform the web_unlocker / structured_data actions (extracting data from a specific website). We will look into that on our end. Please look at the permissions issue and let me know if that was the problem. Thanks,
Hey @Idanvilenski. Unfortunately, I'm still running into the same issues. I used an API key with admin permissions. I did test out the tools individually, and got different errors for each one:
Hey @0xi4o, I am sorry about the slow process.
Regarding the Search Engine tool:
Regarding the Structured Data tool:
Regarding the Web Scraper tool:
Here is an example of a more comprehensive use of search + structured data extraction. Note that sometimes it does not work, or the correct answer arrives after an error message, because the agent receives the tool's response after answering in the chat (when this happened to me, the agent gave the correct answer without an additional prompt after a couple of seconds). Let me know if everything works!
Hi @0xi4o, did you have a chance to try the steps in the last comment? Thanks,
Added 3 web scraping tools powered by Bright Data
Structured Data tool - contains 40+ different data sets with auto-select according to the website in the URL
Web Unlocker tool - Unlocks any website with blocking bypass
Search Engine tool - Uses Bright Data to search Bing, Google, or Yandex.
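For readers debugging failed tool calls, it can help to see what a Web Unlocker request roughly looks like. The sketch below only builds the request and does not send it. The endpoint (`https://api.brightdata.com/request`) and the field names (`zone`, `url`, `format`, `data_format`) are assumptions based on Bright Data's public request API and are not taken from this PR, so check them against the current Bright Data documentation. `buildUnlockerRequest` is a hypothetical helper name.

```typescript
// Assumption: endpoint and field names follow Bright Data's public request API;
// they are not confirmed by this PR and should be verified against the docs.
const BRIGHTDATA_REQUEST_ENDPOINT = 'https://api.brightdata.com/request'

type OutputFormat = 'markdown' | 'html'

interface UnlockerRequest {
    endpoint: string
    headers: Record<string, string>
    body: string
}

// Hypothetical helper: assembles (but does not send) a Web Unlocker request.
function buildUnlockerRequest(apiToken: string, zone: string, targetUrl: string, output: OutputFormat): UnlockerRequest {
    const payload: Record<string, string> = { zone, url: targetUrl, format: 'raw' }
    // Assumed behavior: markdown output is requested explicitly, HTML is the default.
    if (output === 'markdown') payload.data_format = 'markdown'
    return {
        endpoint: BRIGHTDATA_REQUEST_ENDPOINT,
        headers: {
            Authorization: `Bearer ${apiToken}`,
            'Content-Type': 'application/json'
        },
        body: JSON.stringify(payload)
    }
}
```

The result can be passed to `fetch(req.endpoint, { method: 'POST', headers: req.headers, body: req.body })`. Logging the payload before sending is a quick way to check whether a 400 response comes from a missing zone or a malformed URL.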
You can find the tools in the Tools section (under the "LangChain" column).

For best results, use them as tools connected to the agent in a chatflow or agentflow.

Thanks