Error downloading Html - Exception Message #171

Closed
sharathm89 opened this issue Apr 12, 2018 · 10 comments

Comments

@sharathm89

I am trying to scrape this link but have been unable to.

It throws an exception with the message "Error downloading Html".


    public static async Task<HtmlDocument> GetDocument()
    {
        HtmlDocument doc = null;
        string url = "https://www.finedininglovers.com/recipes/appetizer/vegan-dishes-white-asparagus/";
        try
        {
            HtmlWeb web = new HtmlWeb();
            doc = await web.LoadFromWebAsync(url);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
            Console.WriteLine(ex.StackTrace);
        }
        return doc;
    }

I tried setting Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7 as the UserAgent, but it still does not work.

@JonathanMagnan JonathanMagnan self-assigned this Apr 12, 2018
@JonathanMagnan
Member

Hello @sharathm89 ,

Unfortunately, the server returns the following error:

{StatusCode: 500, ReasonPhrase: 'Internal Server Error', Version: 1.1, Content: System.Net.Http.StreamContent, Headers:
{
  x-frame-options: DENY
  X-UA-Compatible: IE=Edge
  X-Iinfo: 8-41929732-41929787 SNNN RT(1523536424411 339) q(0 0 0 -1) r(1 1) U11
  X-CDN: Incapsula
  Transfer-Encoding: chunked
  Cache-Control: private
  Date: Thu, 12 Apr 2018 12:33:42 GMT
  Server: 
  Content-Type: text/html; charset=utf-8
}}

However, the non-async version works fine.

HtmlAgilityPack.HtmlDocument doc = null;
string url = "https://www.finedininglovers.com/recipes/appetizer/vegan-dishes-white-asparagus/";

HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
doc = web.Load(url);
var html = doc.DocumentNode.OuterHtml;

So you can use it in the meantime while we investigate the issue.

Best Regards,

Jonathan
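
For anyone who wants to check the server's response independently of HtmlAgilityPack, here is a minimal sketch using plain `HttpClient`. The class and method names (`ResponseProbe`, `Describe`, `ProbeAsync`) are hypothetical and not part of any library; the only assumption taken from the dump above is that the CDN identifies itself through an `X-CDN` header.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class ResponseProbe
{
    // Summarizes the status line and the X-CDN header of a response,
    // e.g. "500 Internal Server Error (X-CDN: Incapsula)".
    public static string Describe(HttpResponseMessage response)
    {
        string cdn = response.Headers.TryGetValues("X-CDN", out var values)
            ? string.Join(",", values)
            : "none";
        return $"{(int)response.StatusCode} {response.ReasonPhrase} (X-CDN: {cdn})";
    }

    // Sends a GET request and returns the summary without throwing on 4xx/5xx.
    public static async Task<string> ProbeAsync(HttpClient client, string url)
    {
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            return Describe(response);
        }
    }
}
```

Calling `await ResponseProbe.ProbeAsync(new HttpClient(), url)` with the URL from the report would show whether the 500 comes from the server or CDN rather than from the parsing library.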

@sharathm89
Author

thanks @JonathanMagnan

@JonathanMagnan
Member

Hello @sharathm89 ,

Version v1.8.1 has been released.

You should no longer have the issue with the Async method.

Best Regards,

Jonathan

@sharathm89
Author

sharathm89 commented Apr 29, 2018

@JonathanMagnan the issue still exists with the latest v1.8.1. Below is the code I tested; the URL is included.

The async version throws: An error occurred while sending the request.

The non-async version throws: The server committed a protocol violation. Section=ResponseHeader Detail=CR must be followed by LF

It used to work earlier, but I guess it started failing after the version upgrade.

    class Program
    {
        const string url = "https://www.finedininglovers.com/recipes/appetizer/vegan-dishes-white-asparagus/";
        static void Main(string[] args)
        {
            try
            {
                GetHtmlDocumentAsync().GetAwaiter().GetResult();
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);  // An error occurred while sending the request.
            }

            try
            {
                GetHtmlDocument();
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);  // The server committed a protocol violation. Section=ResponseHeader Detail=CR must be followed by LF
            }
            Console.ReadLine();
        }

        async public static Task<HtmlDocument> GetHtmlDocumentAsync()
        {
            HtmlWeb web = new HtmlWeb();
            return await web.LoadFromWebAsync(url);
        }

        public static HtmlDocument GetHtmlDocument()
        {
            HtmlWeb web = new HtmlWeb();
            return web.Load(url);
        }
    }


@JonathanMagnan
Member

Hello @sharathm89 ,

Thank you for the additional info.

We will continue to look at it.

Best Regards,

Jonathan

@sharathm89
Author

thanks @JonathanMagnan

@JonathanMagnan
Member

Hello @sharathm89 ,

We tried your code but everything is working on our side ;(

Could you try it and let us know what we are missing?

HtmlAsync.zip

Best Regards,

Jonathan

@sharathm89
Author

@JonathanMagnan I tried the same code, and the error only happens sometimes. After reporting the issue, I tried again 3 hours later and it worked, but when I tried 2 days ago I got the same error again. Now that I have tried once more, it is working.

So it does not occur every time.

@JonathanMagnan
Member

Hello @sharathm89 ,

That is probably due to bot detection that bans an IP address that has made too many requests in a very short time.

There is nothing we can do at this moment for such an error ;(

Best Regards,

Jonathan
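
If the intermittent failures really are caused by rate-based bot detection, one client-side mitigation is to space out retries. The following is only a sketch under that assumption: `Retry.WithBackoffAsync` is a hypothetical helper, not part of HtmlAgilityPack, and nothing in this thread confirms that backing off is enough to avoid the ban.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class Retry
{
    // Runs `action`; on failure waits, doubles the delay, and tries again.
    // The exception from the final attempt is allowed to propagate.
    public static async Task<T> WithBackoffAsync<T>(
        Func<Task<T>> action, int maxAttempts, TimeSpan initialDelay)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        TimeSpan delay = initialDelay;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await action();
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Wait before the next attempt, then double the delay.
                await Task.Delay(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}
```

With the code from this issue, a call might look like `await Retry.WithBackoffAsync(() => new HtmlWeb().LoadFromWebAsync(url), 3, TimeSpan.FromSeconds(5))`; the attempt count and delay here are illustrative values, not recommendations from the maintainers.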

@sharathm89
Author

@JonathanMagnan Probably so. In that case, I'll close the issue.
