feat: added 3 Bright Data web scraping tools #4700

Open
wants to merge 14 commits into
base: main

Conversation

Idanvilenski

Added 3 web scraping tools powered by Bright Data:
Structured Data tool - covers 40+ datasets and auto-selects one based on the website in the URL
Web Unlocker tool - unlocks any website by bypassing blocking
Search Engine tool - uses Bright Data to search Bing, Google, or Yandex

You can find the tools in the Tools section (under the "LangChain" column).
image (2)

Use as tools connected to the agent in a chat-flow or agent-flow for best results
image (1)

Thanks

- Add BrightDataWebScraper: Web scraping with markdown/HTML output
- Add BrightDataSearchEngine: Multi-engine search (Google, Bing, Yandex)
- Add BrightDataStructuredData: 40+ dataset auto-detection and extraction

All tools include:
- Comprehensive error handling
- Configurable timeouts and zones
- FlowiseAI integration patterns
- Debug logging for troubleshooting
This update contains the Bright Data components
- Fixed YouTube video/comments dataset ID conflict
- Updated regex patterns for Zara, Yahoo Finance, X/Twitter, Booking.com
- Enhanced tool descriptions to include all 40+ supported platforms
- Improved pattern detection order for better matching
- Added comprehensive platform support documentation
@HenryHengZJ HenryHengZJ requested a review from Copilot June 24, 2025 10:15
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

Adds two new Bright Data–powered tools and a credential definition to support web scraping and search functionality.

  • Introduces BrightDataWebScraperTool for page scraping with Bright Data Web Unlocker.
  • Implements BrightDataSearchEngineTool for paginated search results from Google, Bing, and Yandex.
  • Defines BrightDataApiCredential for managing Bright Data API tokens.

Reviewed Changes

Copilot reviewed 3 out of 10 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| packages/components/nodes/tools/BrightData/BrightDataWebScraper/BrightDataWebScraper.ts | Web scraper tool implementation and node registration |
| packages/components/nodes/tools/BrightData/BrightDataSearchEngine/BrightDataSearchEngine.ts | Search engine tool with pagination and error handling |
| packages/components/credentials/BrightData.credential.ts | Bright Data API credential definition |
Comments suppressed due to low confidence (1)

packages/components/nodes/tools/BrightData/BrightDataWebScraper/BrightDataWebScraper.ts:125

  • [nitpick] The class name contains an underscore; rename to 'BrightDataWebScraperTools' to follow PascalCase naming conventions and maintain consistency.
class BrightDataWebScraper_Tools implements INode {

@HenryHengZJ
Contributor

thanks! can you remove the redundant `shared` folder, and allow edits by maintainers?

@Idanvilenski
Author

Hi @HenryHengZJ ,

I've removed the redundant shared folder as requested. Regarding "allow edits by maintainers": this option is not available for PRs from organization forks (brightdata) due to GitHub's policy.

GitHub only allows this feature for personal account forks. If you need to make edits, I'm happy to implement any changes you suggest through the normal review process, or to resubmit from a personal account (but the tool will have to stay under BrightData).

Thanks!

@0xi4o
Contributor

0xi4o commented Jul 4, 2025

Hey @Idanvilenski. I tried using the Brightdata tools in my chatflow and I can't get them to work. I keep getting 400 errors during tool call. Let me know if I'm doing something wrong or if I should follow certain steps (so we can document it). I'm using the API key from a free Brightdata account.

Flowise-Build-AI-Agents-Visually-07-04-2025_03_03_PM
Flowise-Build-AI-Agents-Visually-07-04-2025_03_02_PM
Flowise-Build-AI-Agents-Visually-07-04-2025_03_01_PM

@Idanvilenski
Author

Hi @0xi4o, thanks for checking out the component.

I'm sorry to see that you are having problems with it. Since I can see that the tools are being called correctly by the agent, I think this is an API issue.

Please make sure your API key has "Admin permissions" on the Bright Data website (like in the picture), and let me know if that's not the case.
image

Also, I noticed that the agent is trying to use the search_engine function (which is used for SERP searches on Google, Yandex, and Bing) to perform the web_unlocker / structured_data actions (extracting data from a specific website). We will look into that on our end.

Please check the permissions issue and let me know if that was the problem.
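To separate an API-side problem from a Flowise-side one, it can help to send the same request outside the flow. The following is only a minimal sketch: the endpoint `https://api.brightdata.com/request` and the `zone` / `url` / `format` body fields are assumptions to be checked against Bright Data's API documentation, `web_unlocker1` is a placeholder zone name, and `buildUnlockerRequest` is a hypothetical helper, not code from this PR.

```typescript
// Hypothetical helper: builds the pieces of a direct Bright Data request so the
// same call can be reproduced outside Flowise (for example with fetch or curl).
interface UnlockerRequest {
    endpoint: string
    init: {
        method: 'POST'
        headers: Record<string, string>
        body: string
    }
}

// ASSUMPTION: endpoint and body field names should be verified against the Bright Data docs.
const ASSUMED_ENDPOINT = 'https://api.brightdata.com/request'

function buildUnlockerRequest(apiToken: string, zone: string, targetUrl: string): UnlockerRequest {
    if (!apiToken.trim()) {
        throw new Error('API token is empty')
    }
    return {
        endpoint: ASSUMED_ENDPOINT,
        init: {
            method: 'POST',
            headers: {
                Authorization: `Bearer ${apiToken.trim()}`,
                'Content-Type': 'application/json'
            },
            // 'raw' asks for the unprocessed page body (assumed option name).
            body: JSON.stringify({ zone, url: targetUrl, format: 'raw' })
        }
    }
}

// 'web_unlocker1' is a placeholder; use the zone configured in your own account.
const sampleRequest = buildUnlockerRequest('YOUR_API_TOKEN', 'web_unlocker1', 'https://example.com')
console.log(sampleRequest.endpoint, sampleRequest.init.body)
```

Passing `sampleRequest.endpoint` and `sampleRequest.init` to `fetch` and logging the status code and response text should show whether the 400 comes from the API itself (token, permissions, zone name) or from how the agent calls the tool.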

Thanks,
Idan

@0xi4o
Contributor

0xi4o commented Jul 9, 2025

Hey @Idanvilenski. Unfortunately, I'm still running into the same issues. I used an API key with admin permissions.

Bright-Data-Web-Data-Platform

I did test out the tools individually, and got different errors for each one:

Search Engine:
Flowise-Build-AI-Agents-Visually-07-09-2025_03_07_PM

Structured Data:
Flowise-Build-AI-Agents-Visually-07-09-2025_03_00_PM

Web Scraper:
Flowise-Build-AI-Agents-Visually-07-09-2025_03_04_PM

@Idanvilenski
Author

Idanvilenski commented Jul 9, 2025

Hey @0xi4o, sorry about the slow process.

Regarding the Search Engine tool:
I just tried it successfully. I suspect it was one of 2 problems:

  • Not pressing save before running the flow - I get the same results as you if I don't save the flow before running.
  • In the "Additional Parameters" section, add a description for the tool (like: "use this tool to perform search on any search engine - the result will be a list of URLs") to help the agent know how to call the tool. It works without it, but it is good practice.
    This was the result I got for the same prompt:
    image

Regarding the Structured Data tool:
You entered the URL "www.example.com". Note that you need to enter a real URL, because we use regex to parse the URL and select the relevant dataset for that request. You can try "https://www.walmart.com/ip/Apple-MacBook-Air-13-3-inch-Laptop-Space-Gray-M1-Chip-Built-for-Apple-Intelligence-8GB-RAM-256GB-storage/609040889?classType=VARIANT&athbdg=L1800" instead.
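To illustrate why "www.example.com" cannot be routed, here is a rough sketch of regex-based dataset selection. It is not the code from this PR: the dataset keys, the patterns, and the `detectDataset` function are all hypothetical, and the real tool maps to Bright Data dataset IDs across 40+ platforms.

```typescript
// Hypothetical dataset keys, for illustration only.
type DatasetKey = 'amazon_product' | 'walmart_product'

interface DatasetPattern {
    key: DatasetKey
    pattern: RegExp
}

// The first matching pattern wins, so the order of this list matters.
const DATASET_PATTERNS: DatasetPattern[] = [
    // Amazon product pages: optional slug, then /dp/ followed by a 10-character product code.
    { key: 'amazon_product', pattern: /^https?:\/\/(www\.)?amazon\.[a-z.]+\/(.*\/)?dp\/[A-Z0-9]{10}/i },
    // Walmart product pages live under /ip/.
    { key: 'walmart_product', pattern: /^https?:\/\/(www\.)?walmart\.com\/ip\//i }
]

// Returns the dataset key for a URL, or null when no supported site matches.
function detectDataset(url: string): DatasetKey | null {
    const trimmed = url.trim()
    for (const { key, pattern } of DATASET_PATTERNS) {
        if (pattern.test(trimmed)) {
            return key
        }
    }
    return null
}

console.log(detectDataset('https://www.walmart.com/ip/Some-Product/609040889')) // walmart_product
console.log(detectDataset('www.example.com')) // null
```

Logging the detected key (or the `null` case) before sending the request makes it easier to tell a site-detection miss apart from an API error.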

Regarding the Web Scraper tool:
I apologize for that. It was a problem we had for a few hours, and it's fixed now.

Here is an example of a more comprehensive use of search + structured data extraction. Note that sometimes it does not work, or the correct answer arrives after an error message, because the agent receives the tool's response after answering in the chat (when this happened to me, the agent gave the correct answer without an additional prompt after a couple of seconds).
https://github.com/user-attachments/assets/d572bc1b-98e6-4378-b24d-a9b8f5f0a06f

Let me know if everything works!
Thanks,
Idan

@0xi4o
Contributor

0xi4o commented Jul 11, 2025

@Idanvilenski

I made sure to save and added "use this tool to perform search on any search engine - the result will be a list of URLs" as the tool description in additional parameters. I'm still getting the same result.
Flowise-Build-AI-Agents-Visually-07-10-2025_03_34_PM
Flowise-Build-AI-Agents-Visually-07-10-2025_03_10_PM
Flowise-Build-AI-Agents-Visually-07-10-2025_03_09_PM

For the structured data tool, Walmart links work fine but not Amazon links.
Flowise-Build-AI-Agents-Visually-07-10-2025_03_25_PM
Flowise-Build-AI-Agents-Visually-07-10-2025_03_23_PM

So I added some logs, and it seems like the site detection for Amazon is not working correctly.

walmart-link-detection amazon-link-detection

I'm still getting the same error for web scraper:
Flowise-Build-AI-Agents-Visually-07-09-2025_03_04_PM

@Idanvilenski
Author

Regarding the structured data -
There was a problem only with the Amazon product data. It is fixed now (thank you for pointing this out), so please pull the changes and try again.
I used this system prompt when testing the structured data; it will improve your results:
"
You are a helpful AI assistant.
Your input is a URL
You will insert this URL into your tools and output the response
Important - your response must ALWAYS contain ALL the details you receive from the tool
"

Regarding the search engine -
Note that the search "PlayStation 5 Pro site:amazon" yields no search results:

image

Searching for "PlayStation 5 Pro" will work better. I also recommend changing the system prompt to display links:
"You are a helpful AI assistant.
Your input is a search phrase - you will input it into your tool
As your response you will display the full data that was extracted - INCLUDING LINKS"
The result is:

image
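As a rough illustration of the query issue above, the sketch below builds a plain search URL for each engine and flags `site:` filters that do not look like a full domain (for example `site:amazon` rather than `site:amazon.com`), which is one likely reason a query returns nothing. The function names are hypothetical and this is not the tool's actual code; only the public search URL formats of the three engines are used.

```typescript
type SearchEngine = 'google' | 'bing' | 'yandex'

// Builds the public search URL for a query on the chosen engine.
function buildSearchUrl(query: string, engine: SearchEngine = 'google'): string {
    const q = encodeURIComponent(query.trim())
    switch (engine) {
        case 'bing':
            return `https://www.bing.com/search?q=${q}`
        case 'yandex':
            return `https://yandex.com/search/?text=${q}`
        default:
            return `https://www.google.com/search?q=${q}`
    }
}

// Returns the operands of site: filters that have no dot-separated TLD.
function findIncompleteSiteFilters(query: string): string[] {
    const incomplete: string[] = []
    const siteFilter = /\bsite:(\S+)/gi
    let match: RegExpExecArray | null
    while ((match = siteFilter.exec(query)) !== null) {
        const host = match[1]
        if (!/\.[a-z]{2,}/i.test(host)) {
            incomplete.push(host)
        }
    }
    return incomplete
}

console.log(buildSearchUrl('PlayStation 5 Pro'))
console.log(findIncompleteSiteFilters('PlayStation 5 Pro site:amazon')) // [ 'amazon' ]
```

A check like this could run before the tool call, so the agent gets a hint to rewrite the query instead of returning an empty result.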

If you still see the problem after changing the system prompt and the prompt, please send me the logs for the Search Engine request so I can understand it. On our end we tried it multiple times and did not hit this problem.

Regarding the web scraper -
Please use the same system prompt as detailed for the structured data.
I tried the same URL and received a correct answer:

image image

You can send me the logs so I can understand the problem.

Also, you will get better results if you change the temperature of your LLM of choice to 0.1 instead of 0.9, since the responses and tool handling will be more accurate.

Thank you,
Idan

@Idanvilenski
Author

Hi @0xi4o ,

Did you have a chance to try following the last comment?

Thanks,
Idan


3 participants