Session cookies do not inherit during redirection #55
Comments
I think it's because the cookie header is in lowercase, so the server does not pick it up. I am also facing the same issue and would love to know whether it can be fixed.
Cookies are lost during the transition from the Python cookiejar to the curl cookiejar. To fix this issue, we will have to change the implementation so that cookies are handled entirely by either Python or curl. Until it's fixed, consider hitting the HTTPS URL directly, instead of hitting the HTTP URL and getting redirected.
The above demonstration is just an example. Cookies also seem to be lost when redirecting from one HTTPS page to another HTTPS page. I would be glad if you could fix this problem.
I was looking into this for our project (yt-dlp/yt-dlp#7595). Ideally we want to keep the Python CookieJar and internal curl cookie store in sync. One way I was looking into doing this was the following:
It may not be entirely efficient though, especially when lots of cookies are present in the CookieJar (in our project that is not uncommon). But I'm not sure we can do any better, so it might just be a tradeoff we have to make.
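For illustration, here is a minimal sketch of one direction of such a sync: serializing `http.cookiejar` cookies into the tab-separated Netscape cookie-file lines that curl accepts for its cookie list. The helper names (`make_cookie`, `cookie_to_netscape_line`, `jar_to_netscape_lines`) are made up for this example and are not part of curl_cffi or yt-dlp; it also assumes session cookies can be written with an expiry of `0`.

```python
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain, path="/", secure=False, expires=None):
    """Hypothetical helper: build a version-0 Cookie with sensible defaults."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path=path, path_specified=True,
        secure=secure, expires=expires, discard=expires is None,
        comment=None, comment_url=None, rest={},
    )


def cookie_to_netscape_line(cookie):
    """Serialize one cookie as a Netscape cookie-file line (7 tab-separated fields)."""
    include_subdomains = "TRUE" if cookie.domain.startswith(".") else "FALSE"
    secure = "TRUE" if cookie.secure else "FALSE"
    # Assumption: a session cookie (no expiry) is written with expiry 0.
    expires = str(cookie.expires) if cookie.expires is not None else "0"
    return "\t".join([
        cookie.domain, include_subdomains, cookie.path,
        secure, expires, cookie.name, cookie.value or "",
    ])


def jar_to_netscape_lines(jar):
    """Walk the whole jar; this is the O(number of cookies) cost mentioned above."""
    return [cookie_to_netscape_line(cookie) for cookie in jar]


jar = CookieJar()
jar.set_cookie(make_cookie("sid", "abc", ".example.com"))
lines = jar_to_netscape_lines(jar)
```

Each resulting line could then be handed to curl one at a time before a request; the reverse direction (curl back to the jar) would need the matching parser.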
@coletdjnz Yes, that's exactly the way I want it to be implemented, too. In my experiment, I found that libcurl is much more efficient than pure python http clients like requests or httpx, so I guess it wouldn't hurt too much to sync cookies like this. BTW, I noticed that you listed the libcurl error codes in your PR, would you mind if I incorporate that enum in this project?
Yeah absolutely, go for it. I was thinking it might be better here too.
It seems that python's cookiejar only works with In the docs:
However, The other way is to let curl handle cookies. For python, use something like a
Here is what I gathered from looking at the CPython implementation:
So if these quirks are accounted for it should be fine if we create our own
Thanks for the investigation! Another quirk I noticed is the handling of The other way I was thinking is to handle cookies all by curl, this is a curl wrapper anyway. It looks like this:

    class CookieProxy:
        def __init__(self, curl):
            self.curl = curl
            self.jar = FakeJar()  # cookie iterable for compatibility

        def get(self, name, domain=None, ...):
            cookies = self.curl.getinfo(CurlInfo.COOKIELIST)
            for cookie in cookies:
                cookie = self._parse_curl_cookie(cookie)
                if cookie.name == name:
                    return cookie.value

        def set(self, name, value, domain=None, ...):
            cookie_line = self._dump_curl_cookie(name, value, ...)
            self.curl.setopt(CurlOpt.COOKIELIST, cookie_line)

        def __setitem__(self, name, value):
            self.set(name, value)

        ...

The benefits could be:
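As a rough sketch of what a helper like `_parse_curl_cookie` in the proxy idea above might do: curl reports its cookie list as Netscape-format lines (seven tab-separated fields, with a `#HttpOnly_` prefix on HttpOnly cookies). The `CurlCookie` tuple and both function names below are assumptions made for this example, not existing curl_cffi code.

```python
from typing import NamedTuple

HTTP_ONLY_PREFIX = "#HttpOnly_"


class CurlCookie(NamedTuple):
    """Hypothetical container for one parsed curl cookie line."""
    domain: str
    include_subdomains: bool
    path: str
    secure: bool
    expires: int
    name: str
    value: str
    http_only: bool


def parse_curl_cookie(line):
    """Parse one Netscape-format cookie line (bytes or str) into a CurlCookie."""
    if isinstance(line, bytes):
        line = line.decode()
    http_only = line.startswith(HTTP_ONLY_PREFIX)
    if http_only:
        line = line[len(HTTP_ONLY_PREFIX):]
    fields = line.rstrip("\r\n").split("\t", 6)
    if len(fields) == 6:
        fields.append("")  # tolerate a missing (empty) value field
    if len(fields) != 7:
        raise ValueError(f"not a Netscape cookie line: {line!r}")
    domain, subdomains, path, secure, expires, name, value = fields
    return CurlCookie(domain, subdomains == "TRUE", path, secure == "TRUE",
                      int(expires), name, value, http_only)


def find_cookie_value(lines, name):
    """Mirror of the get() lookup in the sketch: first cookie with a matching name."""
    for line in lines:
        cookie = parse_curl_cookie(line)
        if cookie.name == name:
            return cookie.value
    return None


parsed = parse_curl_cookie(b"#HttpOnly_.example.com\tTRUE\t/\tTRUE\t1700000000\tsid\tabc")
```

A lookup by name alone ignores domain and path, so a real implementation would probably need to filter on those as well.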
Hmm true, something like that could work for curl_cffi to keep things simple. I do recall seeing there may be multiple Now that
Curl has dedicated functions for this already -
I'm still pondering on how to sync cookies between curl and cookiejar. Because of the different formats of cookies and indirect interface for adding/updating cookies in the cookiejar, there are more to consider. Going back to the original issue here, we were using It's very easy to sync cookies in this way, since the primary interfaces( To solve the missing cookies problem above, the related changes on master branch are:
However, as mentioned above, curl's cookie format does not work well with cookiejar's internal format. We have to consider
What do you think?
In theory you should be able to fully sync cookies - at the end of the day they should all be following the same standard. It might just require getting into the nitty gritty of how to translate between the two. When I get some time I wouldn't mind looking into this more. Another alternative could be to take a requests-like approach - handle the redirects in curl_cffi. That way you could use extract_cookies/add_cookie_header for redirects too. But redirect handling can get a bit tricky too - lots of little things you have to follow (could inherit from requests? though, requests handling is not perfect either). Going back to basics: is there no way to get information about the intermediate requests? (any curl callback options helpful?) If we can get the request urls and response headers as it goes through redirects then that would make things easier.
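To make the requests-like idea concrete, here is a minimal stdlib-only sketch of one redirect hop handled with `CookieJar.extract_cookies` and `CookieJar.add_cookie_header`. `FakeResponse` and `follow_redirect_with_jar` are hypothetical names for this example; in a real client the header pairs would come from curl rather than being passed in by hand.

```python
from email.message import Message
from http.cookiejar import CookieJar
from urllib.request import Request


class FakeResponse:
    """Hypothetical stand-in: extract_cookies() only needs an .info() method."""

    def __init__(self, header_pairs):
        self._message = Message()
        for name, value in header_pairs:
            self._message[name] = value  # repeated names are appended, not replaced

    def info(self):
        return self._message


def follow_redirect_with_jar(jar, url, response_headers, next_url):
    """Store cookies from one hop, then build the Cookie header for the next hop."""
    jar.extract_cookies(FakeResponse(response_headers), Request(url))
    next_request = Request(next_url)
    jar.add_cookie_header(next_request)
    return next_request.get_header("Cookie")


jar = CookieJar()
cookie_header = follow_redirect_with_jar(
    jar,
    "https://example.com/login",
    [("Set-Cookie", "sid=abc; Path=/"), ("Location", "/home")],
    "https://example.com/home",
)
```

The jar's cookie policy decides what is stored and returned, so cross-domain redirects would behave differently from this same-host example.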
Yes, you can read the response header lines for each request from the preset header BytesIO buffer. However, to use the
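As a sketch of reading per-hop headers from such a buffer: when curl follows redirects, the header data for every intermediate response ends up in the same buffer, with each response's header block terminated by a blank line. The function name `split_header_blocks` and the sample buffer contents are invented for this example.

```python
from io import BytesIO


def split_header_blocks(raw):
    """Split a raw header buffer into one (status_line, headers) pair per response."""
    blocks = []
    for chunk in raw.split(b"\r\n\r\n"):
        lines = [line.decode("latin-1") for line in chunk.split(b"\r\n") if line]
        if not lines:
            continue  # trailing empty chunk after the final blank line
        status_line, header_lines = lines[0], lines[1:]
        headers = []
        for line in header_lines:
            name, _, value = line.partition(":")
            headers.append((name.strip(), value.strip()))
        blocks.append((status_line, headers))
    return blocks


# Made-up buffer contents for a 302 followed by a 200.
header_buffer = BytesIO()
header_buffer.write(b"HTTP/1.1 302 Found\r\nSet-Cookie: sid=abc; Path=/\r\nLocation: /home\r\n\r\n")
header_buffer.write(b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n")

hops = split_header_blocks(header_buffer.getvalue())
set_cookies = [value for name, value in hops[0][1] if name.lower() == "set-cookie"]
```

This gives the response headers per hop, but not the request URL of each hop, which would still have to be reconstructed from the `Location` values.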
I uploaded a beta version to PyPI, cookies seem to be correct, I also changed the wrong option used for POST request, as I found here. Besides, the error code is exposed here, so you don't have to use a regex to extract it. You can try to see if it fits your needs when you have some free time.
I tested the same code with version 0.5.9b2 and it solves this problem perfectly. Thanks to you and all the maintainers 😊😊
Awesome, thanks, this will help heaps 😃
sweet - seems to be working better. Though I'm still having some redirect issues (e.g. think content-length header is being sent on GET after POST->GET redirect?). I'll have to investigate - might just be our test suite being too pedantic.
Ah, missed that. Though, it isn't copied over to the
These two issues should be fixed in 0.5.9b3.
problem: Session cookies do not inherit during redirection
version: 0.5.4
describe: When a session with cookies requests website A, which returns a 302 redirect to B, the follow-up request to B does not carry any cookies.