Unable to Scrape PDF URL #58

Open
nagendrakumar02 opened this issue Jul 15, 2024 · 0 comments

I'm unable to scrape a PDF URL using crawl4ai. The URL in question is https://www.myelectric.coop/wp-content/uploads/Electric-Vehicle-Charging-Equipment-Rebates.pdf.

Also, is there an example of using crawl4ai with Azure OpenAI?

Steps to Reproduce:

1. Attempt to scrape the PDF URL using crawl4ai.
2. Observe that the crawl fails with the error shown under "Error Message".

Expected Behavior:

crawl4ai should successfully scrape the PDF URL and return its contents.

Actual Behavior:

crawl4ai is unable to scrape the PDF URL; the crawl fails with the error shown under "Error Message".

Error Message:
""" Failed to crawl https://www.myelectric.coop/wp-content/uploads/Electric-Vehicle-Charging-Equipment-Rebates.pdf, error: can only concatenate str (not "NoneType") to str"""
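
For context, the tail of that message is the standard Python TypeError raised when a str is concatenated with None. My guess, which I have not confirmed against the crawl4ai source, is that some text field the crawler expects is None for a PDF response. A minimal sketch that reproduces the same message in isolation (the helper name `concat_error_message` is made up for illustration and is not part of crawl4ai):

```python
def concat_error_message(prefix, value):
    """Return the TypeError text raised by `prefix + value`, or None if it succeeds."""
    try:
        prefix + value
    except TypeError as exc:
        return str(exc)
    return None


# Concatenating a str with None produces the same message as in the crawl error.
print(concat_error_message("Failed to crawl ", None))
```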

Reproduction Code:

from crawl4ai import WebCrawler


def fetch_with_crawl(url):
    # Create an instance of WebCrawler
    crawler = WebCrawler()

    # Warm up the crawler (load necessary models)
    crawler.warmup()

    # Run the crawler on a URL
    result = crawler.run(url=url)

    # Return the extracted content as markdown
    # print(result.markdown)
    return result.markdown
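
As a possible stopgap on my side, I could check whether a URL points at a PDF before handing it to the crawler and route those URLs elsewhere. A rough sketch using only the standard library; `looks_like_pdf` is a hypothetical helper of mine, not a crawl4ai function, and it only inspects the URL path, so a PDF served from a path without a `.pdf` suffix would be missed:

```python
from urllib.parse import urlparse


def looks_like_pdf(url):
    """Heuristic: True if the URL path ends in .pdf (case-insensitive)."""
    path = urlparse(url).path  # drops the query string and fragment
    return path.lower().endswith(".pdf")


url = "https://www.myelectric.coop/wp-content/uploads/Electric-Vehicle-Charging-Equipment-Rebates.pdf"
if looks_like_pdf(url):
    print("PDF URL detected; skipping the HTML crawler for this one")
```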

Let me know if you'd like me to add anything else to the issue!
