Session cookies do not inherit during redirection #55

WhZzi · 2023-04-28T00:56:05Z

problem: Session cookies do not inherit during redirection
version: 0.5.4
descibe: When a session with cookie requests a redirected website A, 302 jumps to B, and the request to access B does not carry any cookies

WhZzi · 2023-04-28T01:05:56Z

from curl_cffi import requests
s2 = requests.Session(impersonate="chrome110")
s2.cookies.update({"a":"123"})
r = s2.get('http://www.baidu.com/',verify=False,proxies={"http":"127.0.0.1:8888","https":"127.0.0.1:8888"})

use fiddler to capture the requests

FrenziedIM · 2023-05-19T16:41:14Z

It's because the cookie header is in lowercase, I think the server does not pick it up. I am also facing the same issue and would love to know if it could be possible to fix.

yifeikong · 2023-05-23T06:33:39Z

Cookies are lost during transition from python cookiejar to curl cookiejar. To fix this issue, we will have to change the implementation to handle cookies entirely by either python or curl. Before it's fixed, consider hitting the HTTPS url directly, instead of hitting the http url and getting redirected.

WhZzi · 2023-05-24T17:56:07Z

The above demonstration is just an example. Typically, during the process of redirecting from one HTTPS page to another HTTPS page, it seems that cookies are also lost. I would be glad if you fix this problem.

coletdjnz · 2023-08-05T02:44:30Z

I was looking into this for our project (yt-dlp/yt-dlp#7595). Ideally we want to keep the Python CookieJar and internal curl cookie store in sync.

One way I was looking into doing this was the following:

Before every request sent to curl, clear curl cookie store and export the Python CookieJar to curl using CurlOpt.COOKIELIST
- I notice this looks to be what you are also looking at doing in master...feature/cookielist
- https://github.com/yt-dlp/yt-dlp/pull/7595/files#diff-2b612b0627302329d9cf69f275167c780c15a190f80909c4888c53b87ca2e665R90-R122
When we get a curl response, extract cookies back into Python CookieJar using CurlInfo.COOKIELIST
- Not currently supported: Add support for curl_info options, like CurlInfo.COOKIELIST #4
- You can also extract cookies using a cookiefile to curl to save to, but curl only appears to export cookies to this when it is closed (so don't think that will work for us).

It may not be entirely efficient though, especially when lots of cookies are present in the CookieJar (in our project that is not uncommon). But not sure if we can get any better, so might just be a tradeoff we have to make.

yifeikong · 2023-08-05T08:56:33Z

@coletdjnz Yes, that's exactly the way I want it to be implemented, too. In my experiment, I found that libcurl is much more efficient than pure python http clients like requests or httpx, so I guess it wouldn't hurt too much to sync cookies like this.

BTW, I noticed that you listed the libcurl error codes in your PR, would you mind if I incorporate that enum in this project?

coletdjnz · 2023-08-05T20:37:21Z

BTW, I noticed that you listed the libcurl error codes in your PR, would you mind if I incorporate that enum in this project?

Yeah absolutely, go for it. I was thinking it might be better here too.

* Preliminary support for cookie fix * Extract cookies using CurlInfo.COOKIELIST, fix #23 #55 #83 * Remove unused domain related stuff * Add CurlECode * Remove unused types.py * Add test for single request cookies

yifeikong · 2023-08-07T06:30:25Z

It seems that python's cookiejar only works with Cookiejar.make_cookies, there are very complicate processes behind it. Using set_cookie to add cookies does not work as expected.

In the docs:

It is not expected that users of http.cookiejar construct their own Cookie instances. Instead, if necessary, call make_cookies() on a CookieJar instance.

However, make_cookies requires a request/response pair, which can not be constructed when there are redirections, unless we handle redirects entirely in python, which seems to be less efficient.

The other way is to let curl handle cookies. For python, use something like a CookieProxy to read and set cookies from curl. Reopening this issue until it's implemented.

Grub4K · 2023-08-07T09:15:14Z

Using set_cookie to add cookies does not work as expected.

Here is what I gathered from looking at the CPython implementation:

CookieJar.make_cookies parses Netscape and Rfc2965 and calls _cookies_from_attrs_set for each: Rfc2965, Netscape.
That function calls _normalized_cookie_tuples and then creates the Cookie from these through _cookie_from_cookie_tuple.

_normalized_cookie_tuples does not do anything interesting, while _cookie_from_cookie_tuple sets various defaults in Cookie construction from the Request.
Here we get some differences in the handling due to the current curl_cffi implemetation:

According to https://curl.se/docs/http-cookies.html, expires can be 0.
For CookieJar however, that would have to be translated to expires=None, discard=True. (Ref)
While not necessarily relevant, before setting path, if defined, it will get escaped through escape_path. (Ref)

So if these quirks are accounted for it should be fine if we create our own Cookie. What other roadblock were you thinking of?

yifeikong · 2023-08-07T10:12:47Z

Thanks for the investigation! Another quirk I noticed is the handling of initial dots mentioned in #97, which may be related to cookiejar implementation here.

The other way I was thinking is to handle cookies all by curl, this is a curl wrapper anyway. It looks like this:

class CookieProxy:
    def __init__(self, curl):
        self.curl = curl
        self.jar = FakeJar()  # cookie iterable for compatibility

    def get(self, name, domain=None, ...):
        cookies = self.curl.getinfo(CurlInfo.COOKIELIST)
        for cookie in cookies:
            cookie = self._parse_curl_cookie(cookie)
            if cookie.name == name:
                return cookie.value

    def set(self, name, value, domain=None, ...):
        cookie_line = self._dump_curl_cookie(name, value, ...)
        self.curl.setopt(CurlOpt.COOKIELIST, cookie_line)

    def __setitem__(self, name, value):
        self.set(name, value)
    ...

The benefits could be:

less code, no need to set and extract cookies in Session and add to CookieJar.
better performance, cookies are only extracted when needed, not for every request.

coletdjnz · 2023-08-11T21:28:55Z

Thanks for the investigation! Another quirk I noticed is the handling of initial dots mentioned in #97, which may be related to cookiejar implementation here.

The other way I was thinking is to handle cookies all by curl, this is a curl wrapper anyway. It looks like this:

class CookieProxy:
    def __init__(self, curl):
        self.curl = curl
        self.jar = FakeJar()  # cookie iterable for compatibility

    def get(self, name, domain=None, ...):
        cookies = self.curl.getinfo(CurlInfo.COOKIELIST)
        for cookie in cookies:
            cookie = self._parse_curl_cookie(cookie)
            if cookie.name == name:
                return cookie.value

    def set(self, name, value, domain=None, ...):
        cookie_line = self._dump_curl_cookie(name, value, ...)
        self.curl.setopt(CurlOpt.COOKIELIST, cookie_line)

    def __setitem__(self, name, value):
        self.set(name, value)
    ...

The benefits could be:

* less code, no need to set and extract cookies in Session and add to CookieJar.

* better performance, cookies are only extracted when needed, not for every request.

Hmm true, something like that could work for curl_cffi to keep things simple. I do recall seeing there may be multiple curl instances for a session though (due to threading?) - could that be an issue for this?

Now that CurlInfo.COOKIELIST is working, projects that need more complex/specific cookie handling can always implement their own method on their end (such as full cookie sync in the instance of our project).

yifeikong · 2023-08-14T07:54:17Z

Hmm true, something like that could work for curl_cffi to keep things simple. I do recall seeing there may be multiple curl instances for a session though (due to threading?) - could that be an issue for this?

Curl has dedicated functions for this already - curl_share_*, which should have been exposed anyway. I think I will add this as a second option for handling cookies along side of cookiejar.

Now that CurlInfo.COOKIELIST is working, projects that need more complex/specific cookie handling can always implement their own method on their end (such as full cookie sync in the instance of our project).

I'm still pondering on how to sync cookies between curl and cookiejar. Because of the different formats of cookies and indirect interface for adding/updating cookies in the cookiejar, there are more to consider.

Going back to the original issue here, we were using cookie: header to tell curl what cookies to use in outgoing requests. Since curl's cookie engine is not activated, when redirect happens, the second request is sent without cookies. When the final response arrives, cookies are striped from set-cookie headers and updated in the cookiejar.

It's very easy to sync cookies in this way, since the primary interfaces(extract_cookies/add_cookie_headers) exposed by cookiejar are based on cookie/set-cookie headers.

To solve the missing cookies problem above, the related changes on master branch are:

expose curl.getinfo(CurlInfo.COOKIELIST) which is not exposed before.
enable the cookie engine of curl by curl.setopt(CurlOpt.COOKIEFILE, "") or whatsoever.
use CURLOPT_COOKIELIST to send cookies with curl instead of headers.
use CURLINFO_COOKIELIST to extract cookies from curl.

However, as mentioned above, curl's cookie format does not work well with cookiejar's internal format. We have to consider expires, initial_dot, path and so one. To work around this issue, we can revert to cookiejar.extract_headers while keeping other changes, which should work since it's just a replay of how cookie changes on python side, i.e.

~~use CURLINFO_COOKIELIST to extract cookies from curl.~~
extract cookies from set-cookie headers of all history redirects in visit order.

What do you think?

coletdjnz · 2023-08-19T07:06:23Z

In theory you should be able to full sync cookies - at the end of the day they should all be following the same standard. It might just require getting into the nitty gritty of how to translate the two. When I get some time I wouldn't mind looking into this more.

Another alternative could be to take a requests-like approach - handle the redirects in curl_cffi. That way you could use extract_cookies/add_cookie_headers for redirects too. But redirect handling can get a bit tricky too - lots of little things you have to follow (could inherit from requests? though, requests handling is not perfect either).

Going back to basics: is there no way to get information about the intermediate requests? (any curl callback options helpful?) If we can get the request urls and response headers as it goes through redirects then that would make things easier.

in worst case maybe you could parse the debug info 😅? Though, this might require verbose to be on so might not be viable. https://curl.se/libcurl/c/CURLOPT_DEBUGFUNCTION.html
https://curl.se/libcurl/c/CURLOPT_HEADERFUNCTION.html implies you can get headers for all the requests?
some others to maybe look at: https://curl.se/libcurl/c/CURLOPT_IOCTLFUNCTION.html, https://curl.se/libcurl/c/CURLOPT_SEEKFUNCTION.html https://curl.se/libcurl/c/CURLOPT_PREREQFUNCTION.html

yifeikong · 2023-08-20T16:06:24Z

https://curl.se/libcurl/c/CURLOPT_HEADERFUNCTION.html implies you can get headers for all the requests?

Yes, you can read the response header lines for each request from the preset header BytesIO buffer. However, to use the cookiejar.extract_cookies function correctly, you will have to construct each request object, since it's needed here and here. It's even more complicated and cumbersome than CURLINFO_COOKIELIST with cookiejar.make_cookies. 😂

yifeikong · 2023-08-23T15:09:25Z

I uploaded a beta version to PyPI, cookies seem to be correct, I also changed the wrong option used for POST request, as I found here. Besides, the error code is exposed here, so you don't have to use a regex to extract it. You can try to see if it fits your needs when you have some free time.

pip install curl-cffi==0.5.9b2

WhZzi · 2023-08-24T13:31:19Z

I test same codes with version 0.5.9b2, it has perfectly solved this problem, thank for you and all guys maintenance 😊😊

coletdjnz · 2023-08-25T08:41:32Z

I uploaded a beta version to PyPI, cookies seem to be correct, I also changed the wrong option used for POST request, as I found here. Besides, the error code is exposed here, so you don't have to use a regex to extract it. You can try to see if it fits your needs when you have some free time.
pip install curl-cffi==0.5.9b2

Awesome, thanks, this will help heaps 😃

I also changed the wrong option used for POST request, as I found here

sweet - seems to be working better. Though I'm still having some redirect issues (e.g. think content-length header is being sent on GET after POST->GET redirect?). I'll have to investigate - might just be our test suite being too pedantic.

Besides, the error code is exposed here, so you don't have to use a regex to extract it.

Ah, missed that. Though, it isn't copied over to the RequestsError so I have to access with RequestError.args[0].code.

yifeikong · 2023-08-25T15:03:16Z

These two issues should be fixed in 0.5.9b3

yifeikong self-assigned this May 23, 2023

yifeikong added the bug Something isn't working label May 23, 2023

yifeikong added this to the v0.6 milestone May 23, 2023

yifeikong mentioned this issue Jul 15, 2023

Cookies are lost between GET and POST requests in a session #83

Closed

coletdjnz mentioned this issue Aug 5, 2023

[rh:curlcffi] Add support for curl_cffi (curl-impersonate) yt-dlp/yt-dlp#7595

Closed

9 tasks

yifeikong added a commit that referenced this issue Aug 5, 2023

Extract cookies using CurlInfo.COOKIELIST, fix #23 #55 #83

a5736b3

yifeikong closed this as completed in a8dd6d1 Aug 6, 2023

yifeikong reopened this Aug 7, 2023

yifeikong mentioned this issue Aug 17, 2023

version 0.5.8 #102

Closed

WhZzi closed this as completed Aug 24, 2023

T-256 mentioned this issue Dec 26, 2023

sync-to-date docs #188

Merged

yifeikong mentioned this issue Apr 1, 2024

[Feature] Remove big overhead in BaseSession cookies section #285

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Session cookies do not inherit during redirection #55

Session cookies do not inherit during redirection #55

WhZzi commented Apr 28, 2023

WhZzi commented Apr 28, 2023 •

edited

FrenziedIM commented May 19, 2023

yifeikong commented May 23, 2023

WhZzi commented May 24, 2023

coletdjnz commented Aug 5, 2023

yifeikong commented Aug 5, 2023

coletdjnz commented Aug 5, 2023

yifeikong commented Aug 7, 2023

Grub4K commented Aug 7, 2023

yifeikong commented Aug 7, 2023

coletdjnz commented Aug 11, 2023

yifeikong commented Aug 14, 2023 •

edited

coletdjnz commented Aug 19, 2023 •

edited

yifeikong commented Aug 20, 2023

yifeikong commented Aug 23, 2023

WhZzi commented Aug 24, 2023

coletdjnz commented Aug 25, 2023

yifeikong commented Aug 25, 2023

Session cookies do not inherit during redirection #55

Session cookies do not inherit during redirection #55

Comments

WhZzi commented Apr 28, 2023

WhZzi commented Apr 28, 2023 • edited

FrenziedIM commented May 19, 2023

yifeikong commented May 23, 2023

WhZzi commented May 24, 2023

coletdjnz commented Aug 5, 2023

yifeikong commented Aug 5, 2023

coletdjnz commented Aug 5, 2023

yifeikong commented Aug 7, 2023

Grub4K commented Aug 7, 2023

yifeikong commented Aug 7, 2023

coletdjnz commented Aug 11, 2023

yifeikong commented Aug 14, 2023 • edited

coletdjnz commented Aug 19, 2023 • edited

yifeikong commented Aug 20, 2023

yifeikong commented Aug 23, 2023

WhZzi commented Aug 24, 2023

coletdjnz commented Aug 25, 2023

yifeikong commented Aug 25, 2023

WhZzi commented Apr 28, 2023 •

edited

yifeikong commented Aug 14, 2023 •

edited

coletdjnz commented Aug 19, 2023 •

edited