Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

contrib.appengine.AppEngineManager #664

Merged
merged 1 commit into from Jul 17, 2015
Merged

Conversation

theacodes
Copy link
Member

This is currently very rough and early. I want to make sure I'm on the right track before continuing.

Outstanding questions:

  1. What functionality should be covered by the connection pool? (retries, etc.)
  2. Should I just inherit from TestConnectionPool and null out the tests that aren't applicable to GAE instead of duplicating code?
  3. How should this connection class be wired in? On GAE Sandbox environments w/o sockets, I assume this should be default.
  4. What exact edge cases should this class be aware; e.g, where does URLFetch behavior differ from what's expected of httplib/sockets? Do the tests already cover this?

[shazow edit: Fixes #664]


try:
from urllib.parse import urlencode
except:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except ImportError?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@shazow
Copy link
Member

shazow commented Jul 1, 2015

@Lukasa @sigmavirus24 Would love your thoughts on some of the open questions inline:

What functionality should be covered by the connection pool? (retries, etc.)

Hm, I've been thinking about this. I think it may make more sense to have the AppEngine URLFetch driver be at the Manager-level (see poolmanager.py) rather than ConnectionPool-level because it doesn't make sense for our PoolManager/ProxyManager/etc to manage pools of GAEConnectionPool, right?

This should make things a fair bit more straight-forward. Any other opinions/ideas?

Should I just inherit from TestConnectionPool and null out the tests that aren't applicable to GAE instead of duplicating code?

Ideally there shouldn't be too many duplicate tests. Any overlapping functionality should be inherited (and thus won't need to be exercised separately) whereas any new functionality should be tested explicitly, right?

How should this connection class be wired in? On GAE Sandbox environments w/o sockets, I assume this should be default.

This is a good question, and it'll depend if we go with the ConnectionPool-level or Manager-level. The other question is: How automatic/implicit vs explicit should this be?

What exact edge cases should this class be aware; e.g, where does URLFetch behavior differ from what's expected of httplib/sockets? Do the tests already cover this?

Honestly I was hoping that's something you could inform us about. We're not too sure how URLFetch differs, as we don't use it ourselves. :/

Another thought: It may be better to start this off as a urllib3/contrib/ submodule. Our standards for those are somewhat lower, so it'll be easier to get this work merged and start to be used by the community while we figure out how we want to integrate it into the core API.

@shazow
Copy link
Member

shazow commented Jul 1, 2015

@jonparrott Btw, thank you again for volunteering to tackle this. We very much appreciate your time and effort. :)

@Lukasa
Copy link
Sponsor Contributor

Lukasa commented Jul 1, 2015

Hm, I've been thinking about this. I think it may make more sense to have the AppEngine URLFetch driver be at the Manager-level (see poolmanager.py) rather than ConnectionPool-level because it doesn't make sense for our PoolManager/ProxyManager/etc to manage pools of GAEConnectionPool, right?

This should make things a fair bit more straight-forward. Any other opinions/ideas?

The downside of this is that GAE no longer becomes transparent to users. The ideal case would be that libraries that use urllib3 just magically work on GAE, and I don't think that'd be true if the AppEngine driver was a replacement for PoolManager instead of a replacement for ConnectionPool. Though I'm a little concern that I missed what you were aiming for here.

This is a good question, and it'll depend if we go with the ConnectionPool-level or Manager-level. The other question is: How automatic/implicit vs explicit should this be?

My vote is for automatic and implicit. The reason to do explicit is to force users to know that something different happens on GAE, but what we're doing there is pushing responsibility for testing and debugging those differences onto our users, when we should be the ones bearing that burden.

@theacodes
Copy link
Member Author

Ideally there shouldn't be too many duplicate tests. Any overlapping functionality should be inherited (and thus won't need to be exercised separately) whereas any new functionality should be tested explicitly, right?

Cool, I'll inherit from the other test and null out the cases that aren't applicable.

Honestly I was hoping that's something you could inform us about. We're not too sure how URLFetch differs, as we don't use it ourselves. :/

I can help, absolutely, but as I mentioned before I'm not intimately familiar with how URLFetch works under the covers (though I guess I could look). Most importantly, I'd like to know what things you guys are already aware of.

It may be better to start this off as a urllib3/contrib/ submodule

Fine with me.

PoolManager vs ConnectionPool

I'm cool with it either way, it seems the implementation is basically the same (urlopen). Let me know what you guys decide.

@jonparrott Btw, thank you again for volunteering to tackle this. We very much appreciate your time and effort. :)

No worries, I'm here to help make GCP a great place for Python developers.

My vote is for automatic and implicit.

We should be aware of what we're doing to users if it's automatic and implicit, and the user should always be able to switch. Here's a comparison of App Engine's environments and the caveats I'm aware of:

_ Local Dev Server GAE (Sandbox) GAE (Managed VMs Compat) GAE (Managed VMs Custom)
URLFetch ✔️ ✔️ ✔️
Sockets ✔️ limited1 ✔️ ✔️
PyOpenSSL maybe2 optional3 optional4 optional5
Custom certs depends6 depends6 ✔️ ✔️
  1. Only paid apps, can't access google apis, etc. limitations
  2. It's up to the user to have PyOpenSSL installed on their system.
  3. Sandbox users have to explicitly list this, and it's a fixed, google-provided version.
  4. Users can either use the method in 3 or modify their dockerfile / requirements to include it.
  5. Users should be able to just add this to their requirements, but failing that, can add it to their dockerfile.
  6. URLFetch doesn't support custom certs, but sockets + ssl do.

So, we need to figure out what the best implicit behavior is for each environment (I have code that can reliably detect this) and how to give users the ability to change it.

@shazow
Copy link
Member

shazow commented Jul 1, 2015

The downside of this is that GAE no longer becomes transparent to users. The ideal case would be that libraries that use urllib3 just magically work on GAE, and I don't think that'd be true if the AppEngine driver was a replacement for PoolManager instead of a replacement for ConnectionPool. Though I'm a little concern that I missed what you were aiming for here.

@Lukasa Why not? We could swap out PoolManager with GAEPoolManager at runtime same way we swap in VerifiedHTTPSConnectionPool. Or maybe we need to finally add another layer above PoolManager which would do this implicitly (we've been talking about adding a Session object? Or another name?).

The silly thing is that ConnectionPool objects are there to manage pools of connections (via our queue collections), which URLFetch does not have (those are managed behind the scenes by URLFetch). In effect, URLFetch is an interface to a ConnectionPool-type object. Then the PoolManager would make an instance of these ConnectionPool managers per host, which would really add nothing in terms of functionality—runtime complexity and stack depth (and possibly GAE-incompatible bugs therein).

Plus, it's not like any of our Manager-features would work with URLFetch without more interception too. For example, ProxyManager wouldn't work.

@theacodes
Copy link
Member Author

@shazow I've moved the pool into the contrib directory.

@Lukasa
Copy link
Sponsor Contributor

Lukasa commented Jul 1, 2015

Why not? We could swap out PoolManager with GAEPoolManager at runtime same way we swap in VerifiedHTTPSConnectionPool.

I suppose, but...ewww. It feels really weird to do that at runtime if we can possibly avoid it, though I suppose it's an acceptable approach.

Basically, I don't like it, but I can't articulate a good reason. Consider me -0 on that idea. =P

@Lukasa
Copy link
Sponsor Contributor

Lukasa commented Jul 1, 2015

Actually, no, I think that might be fine. If it were me I'd probably take the stack depth hit, but if that's off the table, the 'patch at runtime' approach is fine.

@shazow
Copy link
Member

shazow commented Jul 1, 2015

@Lukasa Please do me a favour: sleep on it and try to articulate why you feel negative about it, I'm curious. :) Either way it's going to be a patch at runtime, right? Whether we swap the ConnectionPool or the PoolManager. Only difference is that a ConnectionPool of ConnectionPools is an extra unnecessary layer of abstraction, as far as I can tell. There's no benefit, and some potential surprises (e.g. ProxyManager not working with it).

I'd really like to get somekind of consensus on this as I think this decision will inform how we move forward in our future designs. I still suspect that adding another layer above PoolManager which automagically manages the correct things might be the best and least-surprising option.

@theacodes
Copy link
Member Author

I still suspect that adding another layer above PoolManager which automagically manages the correct things might be the best and least-surprising option.

It's your library, but adding another layer of complexity just to support GAE's sandbox doesn't sound fun to me. If there's other use cases beyond just supporting GAE for this then I'm cool with it.

@shazow
Copy link
Member

shazow commented Jul 1, 2015

It's your library, but adding another layer of complexity just to support GAE's sandbox doesn't sound fun to me. If there's other use cases beyond just supporting GAE for this then I'm cool with it.

Yea it's something we've talked about before. urllib3 in general is largely about being very manual/explicit, but more and more we move towards autodetecting/autoconfiguring which creates a schism. Putting all that magic in one place might help keep things more sane: Things like setting up good TLS defaults (use certifi if available), proxy settings (load from .netrc/env vars), platform-specific injections (like AppEngine, or PyOpenSSL), maybe someday cookie handling (ugh).

@theacodes
Copy link
Member Author

Gotcha, sounds good to me.

@t-8ch
Copy link
Contributor

t-8ch commented Jul 2, 2015

AFAIK URLfetch does not support chunked encoding, so when we use it implicitly streaming APIs will not work anymore.

status=urlfetch_resp.status_code
)

def _absolute_url(self, path):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better move the definition from HTTPConnectionPool to ConnectionPool and use it from there

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will hold off on this until the decision of pool vs manager is settled.

@Lukasa
Copy link
Sponsor Contributor

Lukasa commented Jul 2, 2015

AFAIK URLfetch does not support chunked encoding, so when we use it implicitly streaming APIs will not work anymore.

Are you serious?

From RFC 7230 Section 4.1:

A recipient MUST be able to parse and decode the chunked transfer coding.

The only way URLFetch gets around that requirement is by advertising themselves as a HTTP/1.0 client. Do they do that?

**response_kw):

retries = self._get_retries(retries, redirect)
log.info(retries)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try to keep logging messages consistent with other urlopens?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debugging statement I accidentally left in, fixed.

@shazow
Copy link
Member

shazow commented Jul 10, 2015

Looks sensible, just need to cleanup some debugging code and ideally match up logging messages with other pools.

Could use another pair of eyes, though. (@Lukasa @t-8ch @sigmavirus24?)

@theacodes
Copy link
Member Author

Removed the debugging stuff (that's what I get for trying to get this pushed before I left yesterday).

@shazow shazow added this to the v1.11 milestone Jul 16, 2015
@shazow
Copy link
Member

shazow commented Jul 16, 2015

@Lukasa Can you give this a quick look-over? I want to include this in the v1.11 early next week.

@jonparrott Would you be up for adding a bit of docs for how to use this module? It can live here for now: https://github.com/shazow/urllib3/blob/master/docs/contrib.rst -> https://urllib3.readthedocs.org/en/latest/contrib.html

Something along the lines of...

from urllib3 import PoolManager
from urllib3.contrib.appengine import AppEngineManager, is_appengine

# This substitution will be done automagically once appengine code
# graduates from the contrib module.
if is_appengine():
    # AppEngineManager uses AppEngine's URLFetch API behind the scenes
    http = AppEngineManager()
else:
    # PoolManager uses a socket-level API behind the scenes
    http = PoolManager()

# The client API should be consistent across managers, though some features are not available
# in URLFetch and you'll get warnings when you try to use them (like granular timeouts).
r = http.request('GET', 'https://google.com/')

Overall and example and little blurb for detecting which AppEngine environment you're in, a recipe for swapping out the respective manager, and that this will happen automagically as soon as AppEngineManager graduates from contrib-land. :)

Connection manager for Google App Engine sandbox applications.

This manager uses the URLFetch service directly instead of using the
emulated httplib.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a list of features that AppEngineManager will throw PlatformError/Warnings for in the docstring?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@Lukasa
Copy link
Sponsor Contributor

Lukasa commented Jul 16, 2015

@shazow I'm certainly happy to have this in contrib land for the next release.

@theacodes theacodes force-pushed the master branch 2 times, most recently from 7ce31ae to 28adf54 Compare July 16, 2015 20:47
@theacodes
Copy link
Member Author

Fixed stuff that @Lukasa mentioned and added some docs in contrib (thanks for writing the sample for me, @shazow).

shazow added a commit that referenced this pull request Jul 17, 2015
contrib: App Engine Connection Pool
@shazow shazow merged commit 0047075 into urllib3:master Jul 17, 2015
@shazow
Copy link
Member

shazow commented Jul 17, 2015

@jonparrott This is fantastic and invaluable. Thank you so much for helping us out here, I'm very excited about this and the urllib3/requests community really appreciates your contribution but even moreso: I'm sure all the AppEngine users will be just ecstatic! Make sure to link to this thread in your promo packet, you really deserve some recognition for your work. :) ❤️ 🍮 ✨

@theacodes
Copy link
Member Author

Happy to help. I'm sure this will have bugs and fun edge cases, so please mention me on any bugs that get filed on this.

@webmaven
Copy link

@jonparrott Thanks so much for responding when I sent up the Bat-Signal!

@shazow shazow changed the title App Engine Connection Pool contrib.appengine.AppEngineManager Jul 21, 2015
shazow added a commit that referenced this pull request Jul 21, 2015
@Lukasa Lukasa mentioned this pull request Oct 11, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants