Outgoing connection for LDAP/AD #157

Closed
tkrajca opened this issue Feb 20, 2014 · 30 comments

@tkrajca

tkrajca commented Feb 20, 2014

Hi,

I am wondering: would it be useful (in terms of the number of users who would need it) to create a native outgoing connection for LDAP/AD?

Currently, we store our LDAP credentials (host, port, bind_dn, bind_password) in Redis, retrieve them in our service, establish a new connection and then use this connection to do something useful. The only downside is that a new LDAP/AD connection is established on every invocation of this service, or of any other service that connects to LDAP; otherwise it works fine.

We are generally happy to write a native outgoing connection for LDAP/AD and submit it back to Zato via a pull request, provided there is enough demand for it.


@dsuch
Collaborator

dsuch commented Feb 20, 2014

Hi @tkrajca and thanks for asking!

LDAP outconns would certainly come in handy, and I think we could still deliver them for Zato 1.2.

As it happens, I haven't worked with LDAP in Python, but generally speaking, what I've had in mind was adding both LDAP connections and LDAP searches.

The latter means the ability to specify templates in the form of

  (&(objectClass=person)(|(givenName={givenName})(mail={mail})))

And later on in the code you'd do

  for item in self.ldap.search('My Query Name', givenName='Alice', mail='alice*'):
      # Here each item would be available from a remote resource
      pass
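A minimal sketch of how such named templates could be filled in, assuming plain str.format placeholders as in the template above. SearchTemplates and escape_filter_value are hypothetical names, not part of Zato or python-ldap (python-ldap has its own helper, ldap.filter.escape_filter_chars). Values are escaped per RFC 4515 so that user input can't change the filter's structure, while '*' is left alone by default so wildcard lookups such as mail='alice*' keep working.

```python
# RFC 4515 escapes for characters that would change a filter's structure
_FILTER_ESCAPES = {
    '\\': r'\5c',
    '(': r'\28',
    ')': r'\29',
    '\x00': r'\00',
}


def escape_filter_value(value, allow_wildcards=True):
    """Escape a single assertion value for use inside an LDAP filter."""
    escapes = dict(_FILTER_ESCAPES)
    if not allow_wildcards:
        escapes['*'] = r'\2a'
    # Character by character, so an inserted backslash is never re-escaped
    return ''.join(escapes.get(char, char) for char in value)


class SearchTemplates(object):
    """Hypothetical registry of named LDAP filter templates."""

    def __init__(self):
        self._templates = {}

    def add(self, name, template):
        self._templates[name] = template

    def build(self, name, allow_wildcards=True, **values):
        # Escape every value first, then substitute into the named template
        escaped = dict((key, escape_filter_value(value, allow_wildcards))
                       for key, value in values.items())
        return self._templates[name].format(**escaped)
```

With the template above registered as 'My Query Name', build('My Query Name', givenName='Alice', mail='alice*') yields (&(objectClass=person)(|(givenName=Alice)(mail=alice*))), a string that could then be handed to a search call such as python-ldap's search_s.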

Both features need core code for interfacing with LDAP and some glue code to make it work within Zato - GUI, services and updates to workers so that everything is always cached locally.

If you could tell me what you'd be interested in working on, I'd be able to guide you from there. Naturally, I will appreciate any assistance, but please don't feel compelled to implement the whole of it.

One more question - is the LDAP client library you're using thread-safe or not? If not, we'd need to employ a queue of connections. Also, is it a pure-Python one?

Cheers!


@tkrajca
Author

tkrajca commented Feb 21, 2014

Hi @dsuch,

Thanks, that all sounds great.

LDAP integration in Python is pretty easy with python-ldap:

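    import ldap  # python-ldap

    def _get_ldap(self, name=''):
        # Connection details are kept in Zato's KVDB (Redis) under per-name keys
        server_uri = self.kvdb.conn.get('server_uri{0}'.format(name))
        bind_dn = self.kvdb.conn.get('bind_dn{0}'.format(name))
        bind_password = self.kvdb.conn.get('bind_password{0}'.format(name))

        if not (server_uri and bind_dn and bind_password):
            return None

        # A new connection and bind on every call - this is the overhead
        # a native outgoing connection would avoid
        conn = ldap.initialize(server_uri)
        try:
            conn.simple_bind_s(bind_dn, bind_password)
        except ldap.INVALID_CREDENTIALS:
            conn.unbind_s()  # release the connection that failed to bind
            return None
        else:
            return conn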

I certainly like the idea of LDAP searches.

I have already put together the Django GUI for creating new LDAP outgoing connections; we use Django a lot, so that was pretty easy. I haven't started on the backend integration/glue as I was not quite sure where to start. I figured I would probably start by writing ./src/zato/server/service/internal/outgoing/ldap.sql for CRUD operations on LDAP connections; where would I go from there? Could you please briefly describe how it's all glued together (when and where the connections get created, how they get passed to a Service, etc.)? I have no problem reading source code, so it doesn't need to be overly detailed; a high-level overview is fine. I am happy to work on the LDAP connection itself as I have already started on it a little.

Yes, our LDAP client code is pure Python, but it depends on python-ldap, a C extension built on the OpenLDAP libraries, which requires libsasl2-dev and libldap2-dev for building (on Debian). I am not sure whether it's thread-safe, but I'll try to figure that out.

@dsuch
Collaborator

dsuch commented Feb 23, 2014

Thanks @tkrajca - the code looks nice.

I think it will be best to provide you with information step by step, starting with the GUI and going deeper, explaining as we go. Before I start though, could you please tell me how much you already have? Can you upload the code somewhere?

Broadly speaking, the steps, for connections alone, will be:

  • Add JS, forms, templates and views to Django
  • Templates and JS use a simple framework for passing around data received from views - note that it was originally written in 2009 with Dojo in mind, and I'm sure many cool JS libraries already implement such things by now
  • Views invoke Zato services using req.zato.client
  • Services need an SQL model defined in zato.common.odb
  • Each component, such as LDAP, has GetList, Create, Edit and Delete services
  • GetList operates on SQLAlchemy based queries in zato.common.odb as well
  • Each service needs to be listed in spring_context.py (zato-server)
  • Create/Edit/Delete need to publish internal reconfiguration messages on a broker so that other servers learn of the fact that a new object has been created/updated/deleted
  • Such messages are picked up by worker.py (zato-server), which has a handler method for each message received
  • This method does what needs to be done, e.g. deletes details of a connection from a dict of connections
  • If a client library is thread-safe, a mere dict will indeed suffice
  • If it's not thread-safe, a queue of connections needs to be created (see SudsSOAPWrapper in zato-server for an example)
  • Note that in our context thread-safe means greenlet-safe because we have gevent under the hood, but it works the same
  • Each server, when starting, always needs to read all the information regarding everything from ODB and pass it along to worker.py for it to construct connections or other objects
  • Each service needs a facade that is later exposed in self.foo, so there will be an LDAPFacade that will be known as self.ldap and this will be the thing that will be a thin wrapper around worker.py
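To make the broker/worker steps above more concrete, here is a minimal, hypothetical sketch of a worker keeping a local dict of LDAP outconn definitions and updating it as create/edit/delete messages arrive. The class name, the message keys and the action strings are made up for illustration and are not Zato's actual names; in Zato the messages arrive via the broker rather than through a direct method call.

```python
class LDAPWorkerStore(object):
    """Hypothetical worker-side cache of LDAP outconn definitions."""

    def __init__(self):
        # Connection name -> its configuration
        self.out_ldap = {}

    def handle(self, msg):
        # Dispatch on the message's action, e.g. 'OUTGOING_LDAP_CREATE'
        handler = getattr(self, 'on_broker_msg_' + msg['action'].lower())
        handler(msg)

    def on_broker_msg_outgoing_ldap_create(self, msg):
        self.out_ldap[msg['name']] = dict(msg)

    def on_broker_msg_outgoing_ldap_edit(self, msg):
        # An edit may rename the connection, so drop the old entry first
        self.out_ldap.pop(msg['old_name'], None)
        self.out_ldap[msg['name']] = dict(msg)

    def on_broker_msg_outgoing_ldap_delete(self, msg):
        self.out_ldap.pop(msg['name'], None)
```

Because each server runs the same handlers against its own local dict, every server ends up with the same view of the configuration without consulting the ODB on each request.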

Yes, well, I think this is what a general description would look like :-)

You'll probably find working with internal services the most time-consuming part - after each change a server needs to be restarted, i.e. internal services aren't hot-deployed.

Let's delve into details, including links to pieces of code implementing each part, as we progress, shall we?

If you'd like to have a look at something already, please check out Security Basic Auth on the Django end; this is what the LDAP GUI code should look like. Note though that SimpleIO in web-admin's views is not really server-side SIO: it uses the same name with a view to implementing SIO in web-admin in the future, but that's not done yet.

https://github.com/zatosource/zato/blob/master/code/zato-web-admin/src/zato/admin/web/views/security/basic_auth.py

https://github.com/zatosource/zato/blob/master/code/zato-web-admin/src/zato/admin/templates/zato/security/basic-auth.html

https://github.com/zatosource/zato/blob/master/code/zato-web-admin/src/zato/admin/static/js/security/basic-auth.js

https://github.com/zatosource/zato/blob/master/code/zato-web-admin/src/zato/admin/web/forms/security/basic_auth.py

Also - please add a zato.ldap namespace to common.js before starting to work on JS

https://github.com/zatosource/zato/blob/master/code/zato-web-admin/src/zato/admin/static/js/common.js

@tkrajca
Author

tkrajca commented Feb 24, 2014

Hi @dsuch,

We forked your repo here:

https://github.com/dpaw2/zato

I am not sure I did the right thing though: I started to work on release/1.1 and installed hotfixes on top of that. I'll try to get my changes into our master without the hotfixes (I believe the hotfixes are part of master anyway, right?).

@tkrajca
Author

tkrajca commented Feb 24, 2014

Thanks for the description, it's pretty good, I am confident I can get started from there.

Thanks for the links. I used the SQL outgoing connections code as inspiration (instead of Security Basic Auth); they look pretty similar anyway.

@tkrajca
Author

tkrajca commented Feb 24, 2014

I think the LDAP client is not thread-safe. Which of the other outgoing connections are thread-safe?

@tkrajca
Author

tkrajca commented Feb 24, 2014

I squashed the GUI changes into our master https://github.com/dpaw2/zato/commit/749b52af1b298150f0f3eb61632dbfb9ae727fa7

@dsuch
Collaborator

dsuch commented Feb 24, 2014

Hi @tkrajca - the code looks fine, although some things are superfluous; for instance, LDAP probably won't need an 'engine' field, which is something SQL needs. But naturally this becomes easier once the backend is there and services reply with proper data.

Another point is that you are using an older approach to develop Django views - the new one is class-based and much more declarative, check it out here

https://github.com/zatosource/zato/blob/master/code/zato-web-admin/src/zato/admin/web/views/security/oauth.py

https://github.com/zatosource/zato/blob/master/code/zato-web-admin/src/zato/admin/web/views/channel/zmq.py

However, it's fine if you continue to use the older one - I'll just rewrite it myself before merging it into master.

The last thing would be to base your work on master instead of any branch related to 1.1. You are right that /usually/ what is in hotfixes gets applied to master, but it's not a rule. For instance, I'm planning to do away with zdaemon for servers, but this will be done in master only, i.e. in what will become 1.2 with time. 1.1 will continue to use zdaemon for as long as 1.1 is supported, so in this case hotfixes and master won't contain the same code.

This ticket has number 157, so let's develop the feature in a branch called dpaw-f-gh157-ldap-outconns in your fork. Also, I'm open to any sort of rebasing, squashing or anything else (if at all), and the only thing I'd like to ask for is to prefix all commits with "GH #157", for instance "GH #157 - here goes a message". This lets us understand later on which feature a given commit came from, and there will always be more than one commit, so --no-ff is not enough.

In any case, good work on the frontend :-)

Now for the backend, we need SQLAlchemy models to be added here

https://github.com/zatosource/zato/blob/master/code/zato-common/src/zato/common/odb/model.py

you can base it on XPath or ElemPath (for instance).

Next, at least two queries are needed here

https://github.com/zatosource/zato/blob/master/code/zato-common/src/zato/common/odb/query.py

Please check out _service, service and service_list for how it's done - basically, the first one is a common query that the other two use to return either a single object or a list of such objects.
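The _service/service/service_list pattern can be sketched as follows. This uses the standard library's sqlite3 module as a stand-in for SQLAlchemy so that it runs anywhere; the out_ldap table, its columns and the function names are hypothetical, chosen to mirror query.py's naming rather than copied from it.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical table, loosely modelled on what an LDAP outconn needs
    conn.execute(
        'CREATE TABLE out_ldap ('
        ' id INTEGER PRIMARY KEY,'
        ' cluster_id INTEGER NOT NULL,'
        ' name TEXT NOT NULL,'
        ' is_active INTEGER NOT NULL,'
        ' server_uri TEXT NOT NULL,'
        ' bind_dn TEXT NOT NULL)')


def _out_ldap(cluster_id):
    # Common base query with the shared filtering, reused by both functions below
    sql = ('SELECT id, name, is_active, server_uri, bind_dn'
           ' FROM out_ldap WHERE cluster_id = ?')
    return sql, [cluster_id]


def out_ldap(conn, cluster_id, item_id):
    # A single object, in the spirit of query.py's service
    sql, params = _out_ldap(cluster_id)
    return conn.execute(sql + ' AND id = ?', params + [item_id]).fetchone()


def out_ldap_list(conn, cluster_id):
    # A list of objects, in the spirit of query.py's service_list
    sql, params = _out_ldap(cluster_id)
    return conn.execute(sql + ' ORDER BY name', params).fetchall()
```

Keeping the filtering in one base query means a change such as an extra join or condition only has to be made once for both the single-object and the list variant.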

No migrations are necessary - I can add them before 1.2 is released. If you'd like to add them, though, please find them here https://github.com/zatosource/zato/tree/master/code/alembic - but I personally find it much more productive to add them all in one session before a full release.

That would be it for now. Please let me know whether I'm providing too many or too few details and whether the pace is good.

@tkrajca
Author

tkrajca commented Feb 25, 2014

Hi,

Another point is that you are using an older approach to develop Django views - the new one is class-based and much more declarative, check it out here

I am actually more comfortable using class-based views, but because the SQL outgoing connections are done the older way, I thought that was the "standard" for Zato.

We actually don't even use class-based views for most of our apps; we use Django's ModelAdmin. Basically, we hook all our models up to corresponding admin models and then customize the admin models. Usually, all the admin models inherit from a common base so all common behavior is in there. That way, it's really DRY and there is as little code repetition as possible :)

It would be nice to refactor Zato this way to avoid all the repeating patterns but it's quite a drastic change so I can't see that happening before Zato 2.0.

@tkrajca
Author

tkrajca commented Feb 25, 2014

Not sure what I was thinking working on the master directly :)

https://github.com/dpaw2/zato/tree/dpaw-f-gh157-ldap-outconns

@dsuch
Collaborator

dsuch commented Feb 25, 2014

https://github.com/dpaw2/zato/tree/dpaw-f-gh157-ldap-outconns

Oh wow, this looks cool!

For the connection pool, please use the approach Suds SOAP connections use, given that the LDAP client is not thread-safe.

https://github.com/zatosource/zato/blob/master/code/zato-server/src/zato/server/connection/http_soap/outgoing.py#L312

Essentially, there needs to be a queue from which services check out connections using Python's with statement, which guarantees they are put back, as in this blog post:

https://zato.io/blog/posts/secure-scalable-and-dynamic-invocation-of-soap-services-with-zato-and-suds.html

Note that in order to make sure the process of building a queue of connections eventually ends, there's a time limit assigned to that task - for Suds it is misc.suds_soap_queue_build_cap in server.conf.
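A minimal sketch of the check-out/check-in idea, assuming a queue of already-established connections. It uses the standard library's queue.Queue and contextlib.contextmanager as stand-ins (Zato's _SudsClient uses explicit __enter__/__exit__ methods instead); under gevent, gevent.queue.Queue offers the same get/put calls. ConnectionQueue is a hypothetical name, not Zato's class.

```python
import queue  # Python 3 name; gevent.queue.Queue has the same get/put calls
from contextlib import contextmanager


class ConnectionQueue(object):
    """Hypothetical pool of connections for a client that is not thread-safe."""

    def __init__(self, connections):
        self._queue = queue.Queue()
        for conn in connections:
            self._queue.put(conn)

    @contextmanager
    def connection(self, timeout=None):
        # Blocks until a connection is free, so no two callers share one
        conn = self._queue.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always returned to the queue, even if the caller's code raised
            self._queue.put(conn)
```

In a service this would read as `with pool.connection() as conn: conn.search_s(...)`, and the finally clause is what guarantees the connection goes back even when the LDAP call fails.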

Actually, let's see how adding such a connection queue for LDAP goes. Perhaps there's a pattern here and, once you add it, it will be worth refactoring both Suds and LDAP to use a generic non-thread-safe-outconn-queue kind of container. But there's no rush - let's first implement it and only then find out how much sense it makes to add it.

Thanks for suggestion regarding Zato web-admin's views. Frankly, admin is the part of Django that I'm the least familiar with, it somehow always was the case that I wrote all the GUIs from scratch, without using admin that much so I realize I may be missing out on something useful, that's true.

@aek
Contributor

aek commented Feb 25, 2014

I think so too. Using Django's ModelAdmin is a way to remove a lot of client-side code, and in conjunction with Bootstrap the look is improved by default. I use the yawdadmin Django app to get a Bootstrap-styled Django admin, and it works very well with Django 1.6. This app also supports displaying Google Analytics statistics, which could be a future integration point for Zato services: service handlers could track events and call Google Analytics using a client library such as http://code.google.com/p/gdata-python-client/ or https://github.com/supercodepoet/python-ga. That way you can track GA events and have them displayed in the admin. I know that Zato has its own statistics of service execution, but by using GA, Zato could benefit from other GA-related features, of course for those who need them. If not, yawdadmin is just another, but very good and up-to-date, Django admin Bootstrap theme.
You may say I'm a dreamer, but this could be done.
Cheers


@aek
Contributor

aek commented Feb 25, 2014

Could you take a look at this gevent LDAP pool implementation?
Maybe it fits what you want to do.
https://bitbucket.org/lordmauve/webchat/src/02f1bc67da350915ef0a69ce069373bebf6827e7/ldap_async.py?at=default

Hope this helps


@tkrajca
Author

tkrajca commented Feb 25, 2014

Thanks, I'll check it out.

We've got a really good project that nicely utilizes Django's ModelAdmin; I think it would be a great example of how DRY the code can be. Django's class-based views are nice, but there is still a fair bit of repetition, plus there is so much that Django's ModelAdmin does for you out of the box (e.g. form generation, validation, change views, changelist views, etc.).

I got about halfway through publishing it on PyPI and making it open source. I'll try to strip out all its sensitive information today and put it at least on our public GitHub repo so that you can have a look.

@tkrajca
Author

tkrajca commented Feb 25, 2014

This is the project I was talking about https://github.com/dpaw2/django-prescribed-burn-system

Sorry, there is pretty much no documentation for it, yet. The public (open source) release is in progress.


@tkrajca
Author

tkrajca commented Feb 25, 2014

Hi, I just got stuck on this: my backend server doesn't start any more. Below is part of the server log.

2014-02-25 13:49:17,670 - DEBUG - 6168:MainThread - springpython.factory.PythonObjectFactory:22 - Creating an instance of singleton_server
2014-02-25 13:49:17,671 - DEBUG - 6168:MainThread - springpython.config.objectSingleton<function singleton_server at 0x3a63410> - (<zato.server.spring_context.ZatoContext object at 0x15ea810>,)scope.SINGLETON:22 - This IS the top-level object, calling singleton_server().
2014-02-25 13:49:17,671 - DEBUG - 6168:MainThread - springpython.config.objectSingleton<function scheduler at 0x3a635f0> - (<zato.server.spring_context.ZatoContext object at 0x15ea810>,)scope.SINGLETON:22 - This is NOT the top-level object singleton_server, deferring to container.
2014-02-25 13:49:17,671 - DEBUG - 6168:MainThread - springpython.config.objectSingleton<function scheduler at 0x3a635f0> - (<zato.server.spring_context.ZatoContext object at 0x15ea810>,)scope.SINGLETON:22 - Container = <springpython.context.ApplicationContext object at 0x3a62750>
2014-02-25 13:49:17,671 - DEBUG - 6168:MainThread - springpython.config.objectSingleton<function scheduler at 0x3a635f0> - (<zato.server.spring_context.ZatoContext object at 0x15ea810>,)scope.SINGLETON:22 - Found <zato.server.scheduler.Scheduler object at 0x3a68dd0> inside the container
2014-02-25 13:49:17,671 - DEBUG - 6168:MainThread - springpython.config.objectSingleton<function singleton_server at 0x3a63410> - (<zato.server.spring_context.ZatoContext object at 0x15ea810>,)scope.SINGLETON:22 - Found <zato.server.base.singleton.SingletonServer object at 0x3a68f50>
2014-02-25 13:49:17,672 - DEBUG - 6168:MainThread - springpython.context.ApplicationContext:22 - Stored object 'singleton_server' in container's singleton storage
2014-02-25 13:49:17,673 - DEBUG - 6168:Dummy-1 - springpython.context.ApplicationContext:22 - Invoking the destroy_method on registered objects
2014-02-25 13:49:17,673 - DEBUG - 6168:Dummy-1 - springpython.context.ApplicationContext:22 - About to destroy object 'parallel_server'
2014-02-25 13:49:17,674 - ERROR - 6168:Dummy-1 - springpython.context.ApplicationContext:22 - Could not destroy object 'parallel_server', exception 'Traceback (most recent call last):
  File "/root/zato/code/eggs/springpython-1.3.0.RC1-py2.7.egg/springpython/context/__init__.py", line 106, in shutdown_hook
    destroy_method()
  File "/root/zato/code/zato-server/src/zato/server/base/parallel.py", line 696, in destroy
    self.config.odb_data = self.get_config_odb_data(self)
  File "/root/zato/code/zato-server/src/zato/server/base/parallel.py", line 543, in get_config_odb_data
    odb_data.db_name = parallel_server.odb_data['db_name']
TypeError: 'NoneType' object has no attribute '__getitem__'
'

I haven't worked with springpython, I assume it has something to do with it? That's happening on the head of dpaw-f-gh157-ldap-outconns.

I haven't got around to setting up the LDAP pool yet. I like this approach https://bitbucket.org/lordmauve/webchat/src/02f1bc67da350915ef0a69ce069373bebf6827e7/ldap_async.py?at=default (thanks @aek for the suggestion); I'll see whether it can play nicely with the Suds approach.

@dsuch
Collaborator

dsuch commented Feb 25, 2014

TypeError: 'NoneType' object has no attribute '__getitem__'

I'll check it out @tkrajca but one thing I recommend you cherry-pick from zatosource/zato is this commit 8ce9b1a

There's an upstream bug https://bitbucket.org/circuits/circuits/issue/62/installing-circuits-15-install-1515 and actually, we can live without circuits - it is needed only for WebSphere MQ and only in very, very specialized cases that are not even to do with Zato. So we can drop this requirement.

Without this commit you won't be able to run install.sh in your fork, as I have just witnessed :)

@dsuch
Collaborator

dsuch commented Feb 25, 2014

@tkrajca - I can't start a server running from head on dpaw-f-gh157-ldap-outconns

AttributeError: 'LDAPStore' object has no attribute 'add_params'
Traceback (most recent call last):
  File "/home/dsuch/projects/dpaw2-zato/code/eggs/gunicorn-18.0-py2.7.egg/gunicorn/arbiter.py", line 494, in spawn_worker
    self.cfg.post_fork(self, worker)
  File "/home/dsuch/projects/dpaw2-zato/code/zato-server/src/zato/server/base/parallel.py", line 637, in post_fork
    parallel_server._after_init_accepted(server, arbiter.zato_deployment_key)
  File "/home/dsuch/projects/dpaw2-zato/code/zato-server/src/zato/server/base/parallel.py", line 433, in _after_init_accepted
    self.worker_store.init()
  File "/home/dsuch/projects/dpaw2-zato/code/zato-server/src/zato/server/base/worker.py", line 112, in init
    self.init_ldap()
  File "/home/dsuch/projects/dpaw2-zato/code/zato-server/src/zato/server/base/worker.py", line 203, in init_ldap
    self.worker_config.out_ldap.add_params(config_list)

I've also noticed the LDAPStore is based on SQL connections - please use SudsSOAP instead or the one @aek found.

As for the latter, async connection pool, after a cursory look I'm not sure whether it's OK to poll for LDAP replies like that. Will it be efficient?

@tkrajca
Author

tkrajca commented Feb 26, 2014

Hi, thanks, I worked around the circuits problem, but it's nice to see that it's fixed.

I had been running the code from "master" against stock Zato 1.1, which is why I was getting the original error.

I have now created a new Zato cluster against master (which got rid of my original problem).

This is a much nicer way to debug the backend server than doing zato start ... :)

bin/py -m zato.server.main /root/zato2/server1

@tkrajca
Author

tkrajca commented Feb 26, 2014

Hi @dsuch,

I think I am finally starting to understand how things work together, and I have got to the point where I am pretty much able to create a new LDAP outgoing connection (in the DB, not physically to LDAP yet). Once I can create a new LDAP outgoing connection, I'll start working on the SudsLDAPWrapper.

The work is a little bit slower than I would like due to other projects that I participate in but I think it is heading the right way.

@aek
Contributor

aek commented Feb 26, 2014

Some days ago I created a class that lets you bootstrap a zato-server inside Eclipse and debug Zato code and deployed web services. It requires werkzeug as a development server. If you are looking for a way to debug what you are doing in Zato, you could use it. I just pushed it to my Zato fork on GitHub:

https://github.com/aek/zato/blob/master/code/zato-server/src/zato/server/wsgi.py

It is used the same way as the main.py you are running with the line

bin/py -m zato.server.main /root/zato2/server1

Just run or debug it in Eclipse with the path of your server directory as the argument in the run or debug configuration. wsgi.py contains a simple WSGI application with no dependency on gunicorn; the werkzeug dependency exists just to serve the WSGI app.


@tkrajca
Author

tkrajca commented Feb 26, 2014

Thanks @aek, that is good to know. I actually use vim, so the bash command that logs to stdout/stderr works pretty well for me.

@tkrajca
Author

tkrajca commented Feb 26, 2014

Hi @dsuch,

Could you please explain in a bit more detail how the server workers (zato/code/zato-server/src/zato/server/base/worker.py) interact with the connections, e.g. zato/code/zato-server/src/zato/server/connection/ldap.py?

So, let's say I've got 4 workers. I assume there is a queue of established LDAP connections, and when a worker is to serve a Service, it picks up one of the connections and passes it to the Service so that 'self.outgoing.ldap' works? Sorry, I have not done much with threading, queues and gevent-based threads, so I feel a little unsure of what I am doing here. I am not exactly sure why it's not thread-safe, what green-safe means, etc., and how that relates to LDAP (or to connections in general). For example, from your source code it seems that SQL and FTP connections are thread-safe while HTTP/SOAP connections are not, and I can't see why.

Thanks

@dsuch
Collaborator

dsuch commented Feb 26, 2014

Hello @tkrajca,

SQL connections are a bit of an oddball because they were added first. FTP on the other hand is not very representative because a connection to an FTP resource is created each time an FTP command is executed.

In the context of this ticket, LDAP really seems most similar to HTTP so let me just describe what happens when a server is starting and how it's possible for a service to have access to self.outgoing.plain_http and self.outgoing.soap.

There are three dimensions of parallelism:

  • Multiple servers may be running in a cluster
  • Each server may consist of multiple gunicorn-managed gevent processes
  • Within each process, there may be multiple tasks of execution running in parallel (greenlets, although calling them threads is OK as long as we remember they are not real OS threads, they are lightweight ones spawned by gevent)

Lucky for us, when adding a new outgoing connection we can forget about the first two - there are mechanisms that make sure that:

  • Servers start independently
  • For each server, each of its gevent processes starts independently and they don't get into each other's way

Also, as far as terminology goes - there is always one worker.py for each gevent process.

In other words, when adding an outconn we can for the most part assume we are working with a single process, however this process may start multiple threads (greenlets) hence we need to take it into account. And indeed, we need to remember that a new greenlet is spawned each time Zato receives a request.

What is described below is based on this commit (head as of now)

https://github.com/zatosource/zato/tree/c658e9e98e52e959f36539ff0af4341804e9168d

For clarity, I won't be using full paths to modules each time so here they are introduced upfront

Module: full path

  • parallel.py: zato-server/src/zato/server/base/parallel.py
  • worker.py: zato-server/src/zato/server/base/worker.py
  • config.py: zato-server/src/zato/server/config.py
  • odb.py: zato-server/src/zato/server/odb.py
  • outgoing.py: zato-server/src/zato/server/connection/http_soap/outgoing.py
  • service/__init__.py: zato-server/src/zato/server/service/__init__.py
  • reqresp/__init__.py: zato-server/src/zato/server/service/reqresp/__init__.py
  • query.py: zato-common/src/zato/common/odb/query.py

Definitions are read from the DB

  • parallel.py L:347-353 - all the outgoing HTTP/SOAP definitions are read from the DB
  • odb.py's get_http_soap_list is a method that does it
  • The method invokes query.py's function in turn
  • Once read, the definitions are turned into a ConfigDict by config.py

A worker is assigned its config which gets initialized

  • parallel.py L:447-451 - once all the definitions are read, they are assigned to a worker, whose .init method is called
  • worker.py's init initializes various unrelated structures and eventually calls self.init_http_soap in L:111
  • self.init_http_soap calls self._http_soap_wrapper_from_config in L:203
  • self._http_soap_wrapper_from_config returns either a regular HTTPSOAPWrapper or a SudsSOAPWrapper. The latter, and the one we are interested in here, is returned only if the outconn uses Suds for serializing SOAP requests into wire format.
  • Whatever is returned, it's assigned to worker's self.worker_config.out_soap or self.worker_config.out_plain_http in self.init_http_soap

What SudsSOAPWrapper does, really

  • Note L:164 in worker.py, which calls SudsSOAPWrapper's build_client_queue in outgoing.py L:354
  • Holding a lock that prevents access to the outconn until a queue is built, the method spawns (L:393-394) as many greenlets as there are supposed to be connections kept in the queue
  • Each greenlet calls 'add_client' in L366, which does some SOAP-specific dances but eventually adds a newly established connection to the queue in L:390
  • Meanwhile, the main greenlet, the one that spawned all those busy populating the queue, keeps ticking in L:396-413
  • What it does in L:396-413 is abort the process of building the queue if we run out of time
  • But let's say we don't run out of time and the queue is ready with connections - you'll note that each actual connection is wrapped in a _SudsClient instance in L:314-333, whose purpose is to use __enter__ and __exit__ to make sure a connection fetched from the queue is always returned to it
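The queue-building steps above can be sketched as follows, with Python 3 threads standing in for greenlets and a threading.Event standing in for the lock that keeps callers out until building has finished. CappedConnectionQueue, connect and build_cap are hypothetical names; build_cap plays the role of a setting like misc.suds_soap_queue_build_cap, and connect would be something like a function wrapping ldap.initialize plus simple_bind_s.

```python
import queue
import threading
import time


class CappedConnectionQueue(object):
    """Eagerly fills a queue with connections, but for at most build_cap seconds."""

    def __init__(self, connect, size, build_cap):
        self.connect = connect      # zero-argument callable returning a connection
        self.size = size            # how many connections to keep in the queue
        self.build_cap = build_cap  # seconds to wait before giving up
        self.queue = queue.Queue(maxsize=size)
        self.ready = threading.Event()  # set once building has finished

    def _add_client(self):
        # Runs in its own thread: one connection attempt per thread
        try:
            self.queue.put(self.connect())
        except Exception:
            pass  # a failed attempt simply leaves the queue one connection short

    def build(self, poll=0.05):
        for _ in range(self.size):
            threading.Thread(target=self._add_client, daemon=True).start()

        # The main thread keeps ticking until the queue is full or time runs out;
        # connections that arrive after the deadline still land in the queue
        deadline = time.monotonic() + self.build_cap
        while self.queue.qsize() < self.size and time.monotonic() < deadline:
            time.sleep(poll)

        self.ready.set()
        return self.queue.qsize()

    def get(self, timeout=None):
        # Callers wait until build() has finished before checking anything out
        self.ready.wait()
        return self.queue.get(timeout=timeout)
```

The cap matters because an unreachable LDAP server would otherwise keep a starting server waiting indefinitely; with it, the server starts with whatever connections it managed to build.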

This concludes the steps a starting server takes in order to read everything from the DB, cache it locally in ConfigDicts and eagerly create connections.
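The queue-building steps described above can be approximated with the standard library. This sketch swaps gevent greenlets for plain threads and uses a `threading.Event` in place of the lock; `QueuedClientPool` and its `connect` factory are hypothetical names, not the `outgoing.py` implementation.

```python
import threading
import time
from contextlib import contextmanager
from queue import Queue
from typing import Any, Callable, Iterator


class QueuedClientPool:
    """Thread-based sketch of a queue of pre-built client connections."""

    def __init__(self, connect: Callable[[], Any], size: int, timeout: float):
        self.connect = connect  # hypothetical factory returning a new connection
        self.size = size
        self.timeout = timeout
        self.queue: Queue = Queue(maxsize=size)
        self.ready = threading.Event()

    def add_client(self) -> None:
        # Each worker establishes one connection and puts it in the queue
        self.queue.put(self.connect())

    def build_queue(self) -> bool:
        for _ in range(self.size):
            threading.Thread(target=self.add_client, daemon=True).start()
        deadline = time.monotonic() + self.timeout
        # Main thread ticks until the queue is full or time runs out
        while not self.queue.full():
            if time.monotonic() >= deadline:
                return False  # abort building the queue
            time.sleep(0.01)
        self.ready.set()  # only now may callers borrow connections
        return True

    @contextmanager
    def client(self) -> Iterator[Any]:
        # Block until the queue has been built (the lock's role in Zato)
        if not self.ready.wait(self.timeout):
            raise RuntimeError('connection queue is not ready')
        conn = self.queue.get(timeout=self.timeout)
        try:
            yield conn
        finally:
            # Like _SudsClient's __exit__: always return the connection
            self.queue.put(conn)


pool = QueuedClientPool(object, size=3, timeout=1.0)
print(pool.build_queue())        # True
with pool.client() as conn:
    print(pool.queue.qsize())    # 2 while one connection is borrowed
print(pool.queue.qsize())        # 3 once it has been returned
```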

A service is invoked and wishes to invoke an HTTP/SOAP resource

  • When a service is invoked through a channel, control is eventually passed to service/__init__.py's _init method in L:200
  • This method establishes various facades for what workers have created in the steps above; in particular, L:215 creates an Outgoing object from reqresp/__init__.py
  • reqresp/__init__.py L:387-401 - it doesn't do much, it simply makes it possible for services to invoke outconns from one place instead of accessing worker.py directly
  • So now when you do self.outgoing.plain_http.get('my-rest-resource'), it reaches into worker.py's self.worker_config.out_plain_http and returns a ConfigDict whose .conn is an HTTPSOAPWrapper
  • Similarly, when you call self.outgoing.soap.get('my-soap-resource') and it happens to be an outconn that uses Suds, a ConfigDict is again returned, this time from worker.py's self.worker_config.out_soap, and its .conn is a SudsSOAPWrapper with a queue of connections inside
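In other words, `self.outgoing` is a thin facade over the worker's lookup tables. A minimal sketch, with `Outgoing` and the `SimpleNamespace` stand-ins being hypothetical simplifications of the real objects:

```python
from types import SimpleNamespace
from typing import Any, Dict


class Outgoing:
    """Hypothetical facade: one place from which services reach outconns.

    It only holds references to the lookup tables built by the worker.
    """

    def __init__(self, plain_http: Dict[str, Any], soap: Dict[str, Any]):
        self.plain_http = plain_http
        self.soap = soap


# Stand-ins for worker_config.out_plain_http / out_soap; each entry has a .conn
worker_config = SimpleNamespace(
    out_plain_http={'my-rest-resource': SimpleNamespace(conn='http-wrapper')},
    out_soap={'my-soap-resource': SimpleNamespace(conn='suds-wrapper')},
)

outgoing = Outgoing(worker_config.out_plain_http, worker_config.out_soap)
print(outgoing.plain_http.get('my-rest-resource').conn)  # http-wrapper
print(outgoing.soap.get('my-soap-resource').conn)        # suds-wrapper
```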

What to do next? Well, if the underlying LDAP library is thread-safe, we need to make everything look exactly like HTTPSOAPWrapper above. However, if it's not thread-safe, it needs to look like SudsSOAPWrapper - which means 90% stays the same except for the fact that a queue of connections is needed.

I certainly appreciate all your efforts and I understand you may be pressed for time, so if you're interested, please add as much as you can - even without such a queue - and I'll be happy to add the rest.

Cheers!

@tkrajca
Author

tkrajca commented Feb 27, 2014

Hi Dariusz,

Thank you for a great explanation, I think I finally got it.

Just a quick question: which of the outgoing connections are thread-safe? I think that is what got me confused. FTP, SQL, HTTP/SOAP, etc. are generally not thread/greenlet-safe, right? It is just that each of them deals with that differently - FTP by establishing a new connection whenever a service is invoked, SOAP via gevent.queue and Suds, SQL via SQLAlchemy, etc. Do I have that right? I suppose it might be nice to have a common interface for all of them, so that you don't have to invoke .session() for SQL, .conn for HTTP, etc. Is that feasible?

Anyway, I took the same approach as FTP and create an LDAP connection per service invocation (I believe that's thread-safe). The current head of https://github.com/dpaw2/zato/tree/dpaw-f-gh157-ldap-outconns works with our authentication service (and its tests seem to pass):

from zato.server.service import Service

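import ldap
import ldap.filter


class Invoke(Service):
    class SimpleIO:
        input_required = ('email', 'password')
        output_required = ('email', 'status')

    def handle(self):
        self.response.payload.email = self.request.input.email
        ldapconn = self.outgoing.ldap.get('AD').conn
        if not ldapconn:
            self.response.payload.status = 'failed to connect to AD'
            return

        # an empty password would make simple_bind_s below an unauthenticated
        # bind, which some servers accept, so reject it up front
        if not self.request.input.password:
            self.response.payload.status = 'fail'
            return

        # find the user in AD by email; escape the input so it cannot
        # change the structure of the LDAP filter
        try:
            user_dn, user = ldapconn.search_s(
                "DC=corporateict,DC=domain", ldap.SCOPE_SUBTREE,
                "(mail={0})".format(
                    ldap.filter.escape_filter_chars(self.request.input.email)))[0]
        except (ldap.LDAPError, IndexError):
            # LDAP failure or no matching entry
            self.response.payload.status = 'fail'
            return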

        # try to bind to the AD with this user and his supplied password
        try:
            ldapconn.simple_bind_s(user_dn, self.request.input.password)
        except ldap.INVALID_CREDENTIALS:
            self.response.payload.status = 'fail'
            return

        self.response.payload.status = 'okay'
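The FTP-style approach, one fresh connection on every access, can be sketched as below. `PerInvocationConn` and its `connect` factory are hypothetical; with python-ldap the factory would typically call `ldap.initialize()` and then bind. Returning `None` on failure is an assumption made to match the `if not ldapconn` check in the service above.

```python
from typing import Any, Callable


class PerInvocationConn:
    """Hypothetical wrapper: every access to .conn opens a new connection.

    Nothing is shared between callers, so no locking or pooling is needed,
    at the price of connecting once per service invocation.
    """

    def __init__(self, connect: Callable[[], Any]):
        self._connect = connect

    @property
    def conn(self) -> Any:
        try:
            return self._connect()
        except Exception:
            # Lets callers use a simple 'if not conn' check, as the service does
            return None


wrapper = PerInvocationConn(object)
print(wrapper.conn is wrapper.conn)  # False: each access is a new connection
```

The trade-off is the one mentioned at the start of this issue: there is no shared state to protect, but a new connection is established for every invocation.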

I might give this https://bitbucket.org/lordmauve/webchat/src/02f1bc67da350915ef0a69ce069373bebf6827e7/ldap_async.py?at=default a go to create a pool of asynchronous LDAP connections later on (not sure).

@dsuch
Collaborator

dsuch commented Feb 27, 2014

Hi @tkrajca - yes, it can be put that way.

You are right that a common interface would be handy, but truth be told, the underlying libraries of all the connection protocols so far have such widely different characteristics that it's not so easy, especially once you start including connections that aren't pure-Python, such as WebSphere MQ and ZeroMQ.

You have been dealing with the internals, so it may have seemed particularly complicated, but from a user's point of view there are basically two ways to obtain connections:

self.foo['name'].conn

or

with .. as session/client:

And this is what I'd like to keep - these two ways should suffice. The former is for connections that have no underlying queues/pools, whereas the latter is currently for SOAP and SQL, which happen to use queues/pools.

As for LDAP, this is really great work in very short time!

Over the next 2-3 weeks, now that you have something that works at your end, this is what I'd like to do:

  1. Merge the branch into zatosource/zato
  2. Add unit tests
  3. Add tests against a live LDAP server
  4. Merge it into master

Next we could see what it takes to make connections BIND by default and how to go about search templates.
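One possible shape for search templates, assuming named templates filled in with Python's `str.format` and every value escaped per RFC 4515 so user input cannot alter the filter. `SearchTemplates` and `escape_value` are hypothetical; python-ldap users could rely on `ldap.filter.escape_filter_chars` instead of the hand-written escaping.

```python
from typing import Dict

# Characters that must be escaped in LDAP filter values (RFC 4515)
_ESCAPES = {'\\': r'\5c', '*': r'\2a', '(': r'\28', ')': r'\29', '\x00': r'\00'}


def escape_value(value: str) -> str:
    # Character by character, so an escape's own backslash is never re-escaped
    return ''.join(_ESCAPES.get(ch, ch) for ch in value)


class SearchTemplates:
    """Hypothetical registry of named filter templates."""

    def __init__(self, templates: Dict[str, str]):
        self.templates = templates

    def render(self, name: str, **params: str) -> str:
        # Escape every value so user input cannot change the filter's structure
        safe = {key: escape_value(val) for key, val in params.items()}
        return self.templates[name].format(**safe)


templates = SearchTemplates({
    'My Query Name': '(&(objectClass=person)(|(givenName={givenName})(mail={mail})))',
})
print(templates.render('My Query Name', givenName='Alice', mail='alice)(x'))
# (&(objectClass=person)(|(givenName=Alice)(mail=alice\29\28x)))
```

Escaping every value also turns `*` into a literal, so a wildcard lookup such as `mail='alice*'` would need an explicit opt-in in a real design.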

As for 3 - can you recommend a very lightweight server for me to set up in my test environment? Something like the SQLite of the LDAP world?

As for binding by default - I've noticed you can search without a bind in your code - does it mean this is how your particular AD works? I've never checked it and I'm not sure - say you open an Outlook contact book and look someone up - doesn't AD always require a BIND under the hood, even if transparently through Kerberos? I just don't know how it works, I'm asking out of curiosity.

Thanks a lot!

@tkrajca
Author

tkrajca commented Feb 28, 2014

Hi @dsuch,

Thanks, it's been a steep learning curve for me but it would take much longer without your great guidance.

Yes, you are quite right about that; two ways are probably OK for now:

with .. as session/client:

This could probably be used for all connections. I guess it depends on whether you want to wait for the garbage collector to free up the FTP, LDAP, etc. connections, or do it explicitly.
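The explicit-release variant could look like the context manager below. `ldap_client` and `FakeConn` are hypothetical; the only real API assumed is python-ldap's `unbind_s()` method on connection objects.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def ldap_client(connect: Callable[[], Any]) -> Iterator[Any]:
    """Hypothetical: open a connection, hand it out, always release it."""
    conn = connect()
    try:
        yield conn
    finally:
        # Release deterministically instead of waiting for the garbage collector
        conn.unbind_s()


class FakeConn:
    """Stand-in for an LDAP connection, used only for this demo."""

    def __init__(self):
        self.unbound = False

    def unbind_s(self):
        self.unbound = True


fake = FakeConn()
with ldap_client(lambda: fake) as conn:
    print(conn.unbound)  # False while in use
print(fake.unbound)      # True: released on leaving the block
```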

As for the mock ldap server, I haven't used any myself but I googled a couple of interesting projects that aim to do that:

https://bitbucket.org/psagers/mockldap
https://pypi.python.org/pypi/python-ldap-test
https://pypi.python.org/pypi/mockldap
https://pypi.python.org/pypi/fakeldap

Regarding the binding, I think there are "open" LDAP servers - anybody can connect and search, etc. and "authenticated" LDAP servers - only authenticated users can access services. My code assumes that the LDAP connection is already bound to our AD (I am sure it's not open to everybody) - that's why there is bind_dn and password.
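Binding by default could be handled by the connection factory itself: bind with the configured bind_dn and password when present, stay anonymous otherwise. A sketch under that assumption; `make_bound_connect` and `FakeLDAPConn` are hypothetical, and with python-ldap `initialize` would be `ldap.initialize`, whose connection objects provide `simple_bind_s(who, cred)`.

```python
from typing import Any, Callable, Optional


def make_bound_connect(initialize: Callable[[str], Any], address: str,
                       bind_dn: Optional[str] = None,
                       password: Optional[str] = None) -> Callable[[], Any]:
    """Hypothetical factory: returns a callable that opens and binds a connection."""
    if bind_dn and not password:
        # An empty password would turn the bind into an unauthenticated one
        raise ValueError('bind_dn is set but password is empty')

    def connect() -> Any:
        conn = initialize(address)
        if bind_dn:
            # Authenticated server: bind with the configured service account
            conn.simple_bind_s(bind_dn, password)
        # Without bind_dn the connection stays anonymous ("open" servers)
        return conn

    return connect


class FakeLDAPConn:
    """Demo stand-in recording binds; not python-ldap."""

    def __init__(self, address):
        self.address = address
        self.bound_as = None

    def simple_bind_s(self, who, cred):
        self.bound_as = who


connect = make_bound_connect(FakeLDAPConn, 'ldap://ad.example.com',
                             bind_dn='CN=svc,DC=example,DC=com', password='secret')
conn = connect()
print(conn.bound_as)  # CN=svc,DC=example,DC=com
```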

As for your plan, what would you like me to help with? Should I create a pull request with what I've got now? I am happy to provide some assistance if I've got time, but my life is about to get crazy for a couple of weeks - I am having a baby, moving interstate and starting a new job, all in the next 3 weeks :) so I am not sure how available I will be. Also, I am not sure I will have any time for this in my new job - the focus and direction are most likely going to be very different as far as I know. I'll try to talk to my colleagues (@slurms, @adonm) here and see whether they are interested in taking this a little further, e.g. creating a pool of async LDAP connections. I don't think template search will be a priority, as we've got other means of searching the AD. Anyway, I should be at my current job until the end of next week.

@adonm

adonm commented Feb 28, 2014

Just a note on a sample LDAP server: if you have Docker available, I found this pretty useful:

https://github.com/mattvoss/docker-ldap

It takes a bit of downloading during the build, but it's a one-liner to get an LDAP service up and running.

Kind Regards,
Adon

Regards,

Adon Metcalfe

@tkrajca
Author

tkrajca commented Mar 6, 2014

Hi @dsuch,
I created a pull request #165 with what I've got so far. I've got a son as of Tuesday morning so today is my last day at the Department of Parks and Wildlife. Thank you for your guidance through this work, it was excellent. I'll try to keep track of what is happening with this pull request/ticket but I might simply be too busy with other stuff, feel free to contact @slurms or @adonm for any assistance.
Cheers,
Tomas

dsuch added a commit that referenced this issue Mar 6, 2014
@dsuch
Collaborator

dsuch commented Mar 6, 2014

Thanks @tkrajca - this was merged and I'll be still working on it a bit in a feature branch, i.e. tests and query templates. It was very nice to work with you, wishing you all the best!

@dsuch dsuch closed this as completed Mar 6, 2014
This was referenced May 5, 2014