Consider supporting ca_certs specified as file-like object #1768

Open
ecbftw opened this issue Dec 4, 2019 · 9 comments
Comments

@ecbftw

ecbftw commented Dec 4, 2019

Hi there. Thanks for a great library.

Here's what I'm trying to do: I'd like to provide users a trust-on-first-use (TOFU) mechanism for self-signed certificates in a complex application. I'm storing observed certificates (self-signed, or signed by a CA that isn't public) in my database and then I give the user the option whether or not to trust certain certificates. (Yes, this could be risky for the user, but the reality of the enterprise world is that no one wants to centrally manage internal CAs and TOFU is much safer than disabling validation. Do you manually verify every SSH host key you see??? ;-) )

From there, I'd like to pass a CA bundle to urllib3 (or requests, etc.) as either a simple buffer or a file-like object. Lo and behold, others have asked for this in #474. Back when that issue was closed in 2016, there didn't seem to be an easily supportable way to do this with the Python standard library. However, I think the world may have changed since then. Consider that the standard library's SSLContext.load_verify_locations method supports a cadata argument that we could use for this. This argument was added in Python 3.4. Python 3.3 support ended on 2017-09-29.

Would it make sense to implement this fully in urllib3 now? My suggestion is:
A) The caller may provide ca_certs as a file-like object, which is easily distinguishable from a string that specifies a path.
B) If a file-like object is received and urllib3 is using pyOpenSSL, then use the workaround described in #474.
C) If a file-like object is received and urllib3 is using the standard library, read the full contents of the file-like object and pass them as the cadata argument.
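
Option C might look roughly like the following using only the standard library. This is a minimal sketch, not urllib3 code: `split_ca_source` and `context_from_ca_certs` are hypothetical helper names, and decoding binary data that starts with a PEM header is an assumption made here because `cadata` treats `bytes` as DER and `str` as PEM. Option B (pyOpenSSL) is not covered.

```python
import io
import os
import ssl


def split_ca_source(ca_certs):
    """Return (cafile, cadata); exactly one of the two is not None.

    Hypothetical helper: a str/bytes/PathLike value is treated as a path,
    anything else is assumed to be a file-like object with a read() method.
    """
    if isinstance(ca_certs, (str, bytes, os.PathLike)):
        return os.fspath(ca_certs), None
    data = ca_certs.read()
    # cadata expects str for PEM and bytes for DER, so PEM read in
    # binary mode is decoded to text first (an assumption of this sketch).
    if isinstance(data, bytes) and data.lstrip().startswith(b"-----BEGIN"):
        data = data.decode("ascii")
    return None, data


def context_from_ca_certs(ca_certs):
    """Build a client SSLContext that trusts only the given bundle."""
    cafile, cadata = split_ca_source(ca_certs)
    # When cafile or cadata is given, create_default_context() loads only
    # those certificates and skips the system default store.
    return ssl.create_default_context(cafile=cafile, cadata=cadata)
```

A context built this way could then be handed to urllib3 through its existing `ssl_context` argument, for example `urllib3.PoolManager(ssl_context=ctx)`.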

Thoughts?

@sethmlarson
Member

What do you think about using assert_fingerprint? First calculate the SHA256 of the certificate, then pass it to a PoolManager like this:

import urllib3

http = urllib3.PoolManager(
    assert_fingerprint="6FA628EDA9F8679B08F95FD7116E35D077DBB84F8108623E660E6683FDD77556",
    ...
)

You might also find my blog post which mentions a lot of things about TOFU / fingerprinting interesting: https://sethmlarson.dev/blog/2019-11-26/designing-for-real-world-https
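
For reference, the SHA256 fingerprint mentioned above can be computed from a PEM certificate with the standard library. This is a small sketch; `sha256_fingerprint` is a hypothetical helper name and the host in the commented usage is a placeholder.

```python
import hashlib
import ssl


def sha256_fingerprint(pem_cert):
    """Return the uppercase hex SHA-256 digest of a PEM-encoded certificate."""
    # The fingerprint is taken over the DER encoding, not the PEM text.
    der = ssl.PEM_cert_to_DER_cert(pem_cert)
    return hashlib.sha256(der).hexdigest().upper()


# Hypothetical usage (needs network access, so it is left commented out):
# pem = ssl.get_server_certificate(("internal.example", 443))
# print(sha256_fingerprint(pem))
```

The resulting string has the same form as the assert_fingerprint value in the snippet above.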

@ecbftw
Author

ecbftw commented Dec 14, 2019

Hi @sethmlarson, thanks for the suggestion. I wasn't aware of these assert_... options in urllib3, but I'm not sure they will give me the flexibility that I desire.

Consider a case where a service starts off using self-signed certs. We do the TOFU thing with assert_fingerprint and maybe assert_hostname, and that works OK. But then later the user of my software decides to do it the right way and generates their own CA certificate. They install it in my software and then proceed to replace various service certificates with ones signed by that CA. At that point, my software should realize the certs can be verified off of the new CA. But with assert_fingerprint, the verification would fail even though a "better" certificate is now installed.

There are likely workarounds for this (e.g. try to connect twice with different settings, etc.), but they are slow/ugly, and it would be far more flexible if I could customize my CA list up front according to whatever logic I deem appropriate and then connect once with the appropriately groomed CA bundle.

Note that in my application, I'll be storing perhaps many thousands of self-signed certificates and doing TOFU against them. At this scale, one can't prompt the user for every little change (such as a transition from TOFU to a CA-signed cert). It has to be carefully thought out to be reasonably secure while still manageable.

@sigmavirus24
Contributor

@ecbftw the fundamental limitation is in Python's own ssl library. Last I checked, it didn't even allow for this.

@ecbftw
Author

ecbftw commented Dec 15, 2019

@sigmavirus24 Right, that was true until Python 3.4 when the cadata argument was added. See my explanation in the first post. All supported versions of Python 3 now have this, which was not the case back when #474 was closed.

@sethmlarson
Member

sethmlarson commented Dec 16, 2019

cadata doesn't add any additional functionality to load_verify_locations(). It's a mechanism where certificates can be loaded into the SSLContext without requiring the filesystem.

I don't think the mechanism you're describing is possible without attempting multiple connections with two different SSLContext objects, one configured with CA certificates and one configured to not verify the chain of trust and to instead verify the signature of the peer certificate.
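
As a rough illustration of the two-context approach, the pair of SSLContext objects might be set up as below. This is only a sketch under stated assumptions: `build_contexts` is a hypothetical name, and with the second context the caller has to check the peer certificate itself (for example against a stored fingerprint), because chain and hostname verification are switched off.

```python
import ssl


def build_contexts(cadata=None):
    """Return (ca_ctx, pin_ctx) for a verify-then-fall-back strategy."""
    # First attempt: normal chain-of-trust verification. With cadata=None
    # the system default CA store is used.
    ca_ctx = ssl.create_default_context(cadata=cadata)

    # Second attempt: no chain verification at all. check_hostname must be
    # disabled before verify_mode can be set to CERT_NONE.
    pin_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    pin_ctx.check_hostname = False
    pin_ctx.verify_mode = ssl.CERT_NONE
    return ca_ctx, pin_ctx
```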

@ecbftw
Author

ecbftw commented Dec 19, 2019

Yeah, so the whole point here is that I find it really crazy that to provide a custom bundle, I have to write it to disk. I want to dynamically generate my bundle and pass it as a parameter. What I was suggesting is a file handle so I can just do StringIO while being backward compatible. cadata is just nice so urllib3 wouldn't have to write anything to disk either. As it stands now, I have no choice but to use something like NamedTemporaryFile.
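
The temporary-file workaround described here could be wrapped up along these lines. This is a sketch, not code from the thread: `bundle_on_disk` is a hypothetical helper name, and `delete=False` is chosen so the file can be reopened by path while it is still needed.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def bundle_on_disk(pem_bundle):
    """Write a PEM bundle to a temporary file and yield its path.

    The file is removed when the with-block exits.
    """
    tmp = tempfile.NamedTemporaryFile("w", suffix=".pem", delete=False)
    try:
        with tmp:
            tmp.write(pem_bundle)  # closed (and flushed) on leaving this block
        yield tmp.name
    finally:
        os.unlink(tmp.name)


# Hypothetical usage with urllib3's existing path-based ca_certs argument:
# with bundle_on_disk(bundle_text) as path:
#     http = urllib3.PoolManager(ca_certs=path)
```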

If you set the X509_V_FLAG_PARTIAL_CHAIN verify flag, then you can present a CA bundle that includes truly self-signed certificates and it will verify fine. (What I mean by truly self-signed, is that it is a single server certificate that also signed itself.)

However, if you provide a bundle that includes a server certificate that is signed by a custom CA (and that custom CA isn't in your bundle), then this approach doesn't work. So in that case, you're right, you're not really able to do TOFU that way.

I'm finding that basically... OpenSSL kinda sucks at this. It just isn't flexible for CA management. I still think my suggestion is an improvement in flexibility and shouldn't be discounted, but to achieve what I want, my only current option is to use pyOpenSSL with an OpenSSL.SSL.Context() and call the set_verify() method to set a callback, then do the certificate validation by hand. I think that works the way I want, but now I'm not sure how to use the SSL socket I created with Requests or urllib3. Any tips? Can I subclass a PoolManager or Requests adapter?

Thanks for your help.

@ecbftw
Author

ecbftw commented Jan 9, 2020

I have come up with a fully working solution that:

  • Forces urllib3 to use pyOpenSSL via the contrib module (which seems like a semi-legit thing)
  • Implements a pyOpenSSL set_verify callback method to validate certificates
  • The custom callback method is able to validate certificates based on a relaxed form of the traditional CA chain, and failing that, is able to perform TOFU validation on certificates
  • All validation happens during a single TLS handshake, not requiring reconnects

This is all wonderful for my end user, but it requires several monkey patches and isn't exactly elegant in places. One thing that would help tremendously is if there was a much easier way to override the pyOpenSSL callback method passed to set_verify. Any thoughts on this?
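
The actual implementation was not posted in this thread. Purely as an illustration, a TOFU-style callback with the signature pyOpenSSL's set_verify expects could look like the sketch below. `make_verify_callback` and `normalize_fingerprint` are hypothetical names, and the logic is deliberately simplified: it only overrides errors reported on the leaf certificate (depth 0) and rejects any error higher in the chain, so it does not reproduce the relaxed chain validation described above.

```python
def normalize_fingerprint(value):
    """Lowercase hex without colons, from str or bytes input."""
    if isinstance(value, bytes):
        value = value.decode("ascii")
    return value.replace(":", "").lower()


def make_verify_callback(pinned_fingerprints):
    """Build a callback for OpenSSL.SSL.Context.set_verify().

    pinned_fingerprints: SHA-256 fingerprints the user chose to trust.
    """
    pinned = {normalize_fingerprint(fp) for fp in pinned_fingerprints}

    def verify(connection, cert, errnum, depth, ok):
        if ok:
            # OpenSSL already accepted this certificate.
            return True
        if depth == 0:
            # Leaf failed normal validation: fall back to the pinned set.
            # Note this also overrides other leaf errors such as expiry.
            return normalize_fingerprint(cert.digest("sha256")) in pinned
        # Errors on CA certificates are not overridden in this sketch.
        return False

    return verify


# Hypothetical attachment to a pyOpenSSL context:
# ctx = OpenSSL.SSL.Context(OpenSSL.SSL.TLS_CLIENT_METHOD)
# ctx.set_verify(OpenSSL.SSL.VERIFY_PEER, make_verify_callback(fingerprints))
```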

@barnettZQG

This proposal is exactly what I need. In my scenario, I need to make continuous requests to a number of services that use self-signed certificates. The certificate content is stored in a database and can change. Right now I have to write it to disk before making a request. To avoid hurting performance, I don't refresh the certificate contents on disk every time, so the certificate on disk may already have expired.

@IvanLauLinTiong
Contributor

pyOpenSSL is deprecated and will be removed in a future 2.x release (#2691).


5 participants