Sending messages to remote actor systems #24
I'm glad you are enjoying Thespian so far. If I understand correctly, you have multiple systems and actors that started independently on those systems, and you now want to ensure that the actors on the remote systems have the address of the coordinator actor? If so, that is an unusual configuration. Typically, actors exist in a tree of parent/child relationships. It's possible for actors to communicate with any other actor provided they have its address, but the address is typically passed as part of a message.

You are getting the remoteAdminAddress from the convention update, but the address provided is that of the admin itself, which runs in the background and is responsible for keeping everything running on the local system. The admin will not respond to arbitrary messages, only to system messages it knows about.

The customary usage model is that the actors on the remote systems are created as children of an actor on the central system; for example, the coordinator is responsible for starting the top-level actors on those remote systems, and therefore they automatically know about each other. The convention notification and associated admin address are generally intended for bookkeeping purposes (for example, we use them to keep a database updated with which remote systems are "online" at any point in time), but actors generally maintain location independence by not specifically targeting systems; instead, capabilities and requirements are used to drive the placement of new actors. The multi-system examples (specifically https://github.com/godaddy/Thespian/tree/master/examples/multi_system/act5) show how this can be done.
I realize this is a pretty high-level response; there are some compelling elements of the architecture I described above that might help drive your model, but I would also like to know more about your model to make sure Thespian supports a broad set of usage categories, so I'm happy to discuss further details if you would like. A couple of other notes:
I hope the above helps, and I would be happy to discuss any of the points in more detail, as well as to learn more about your usage model. Feedback on the documentation and examples is also welcome. -Kevin
It is getting clearer now, thanks. So, in my case the best and supported solution would be to create each top-level actor on the remote systems from the coordinator. This would result in a large distributed tree of actors. If that is the case, my only requirement is to have exactly one top-level ("host coordinator") actor on each remote actor system. How can I ensure that with capabilities only? More generally, how can I ensure that there is only one instance of a specific type of actor in an actor system? Or should I avoid this when designing the architecture? The rest of the job is clear: the top-level local admins create child actors in their own remote systems, which are then indirectly connected to the coordinator (grandparent) and can send messages to it.
The need to have "only one top-level actor on each remote actor system" should be manageable by the coordinator: the coordinator registers for convention notifications, and when it receives one, it can call createActor with requirements that match the capabilities of the new remote system. This implies that there is at least one unique capability on each started remote Thespian instance; this is typically provided by whatever startup mechanism is being used, and could be any value you choose: an IP address, a hostname, a UUID, etc. Because the coordinator (or its delegate) is creating the remote actors, it knows which systems have remote actors and can ensure that there is exactly one on each remote system.

We have found that the pattern of single responsibility is a big benefit when implementing an actor-based application or service. For example, if your coordinator is also acting as the entry point for requests that will be handled by the system, then it may be convenient to create a separate "registrar" actor that manages the existence of top-level actors. The registrar's responsibilities are:
This allows the main coordinator to simply request the host coordinator address for any remote system on demand from the registrar, and pushes all responsibility for maintaining those remote host coordinators to the registrar. I don't think you should avoid this architecture; in fact, we have implemented something quite similar for the set of actors responsible for remote system management and monitoring. We also have other patterns active (simultaneously) where actors get placed based on other capabilities (for example, valid network access to a remote service, credentials for accessing a database, etc.), and there is work in progress to allow load-based automatic actor distribution, which would allow the least-heavily utilized remote to be chosen when creating a new actor. -Kevin
Works as expected, thank you very much! Now I have only one question left. On the host of the Coordinator, I shut down the actor system using a "reaper pool" that includes every actor managed by the specific HostCoordinator. If the reaper pool is empty, the HostCoordinator sends a message to its own ActorSystem to shut itself down (its address was saved during initialization). I would also like to have this process on the remote systems. However, when the HostCoordinators are created by the Registrar, this method cannot be used, as I cannot reach the address of the remotely hosting ActorSystem (i.e., a remote HostCoordinator cannot send a message to its own ActorSystem to shut it down). Is there a way to automatically shut down remote actor systems when their actors have finished their jobs? Or should I create a separate process that monitors the ActorSystem for active actors?
Glad to hear things are progressing well for you! We have customarily treated the Actor System itself as a background service (similar to sshd or inetd), but I can see the value in a cleanup-on-completion scenario. I'm curious how you are starting the Actor System on the remotes; if you are using something like systemd or upstart, then it may be most appropriate to use the same mechanism to effect the shutdown. One thing that comes to mind is that when your reaper has decided it is time to shut down the remote, it can create an actor on that remote and send it a message that causes that actor to initiate the shutdown (via whatever means) on that remote system (local to it); there may be some timeouts and other abruption issues to deal with in this method. Also please be aware that shutting down an actor will automatically and recursively send shutdown requests to its children, whether they are running locally or remotely, and that shutting down an actor system will shut down all actors running in that system. -Kevin
I was also trying this idea: when it is time to shut down a remote actor system, its host coordinator gets notified and tells its system to shut itself down. However, I get a retryable exception when the host coordinator calls the ActorSystem().shutdown() method in receiveMessage(). Moreover, I tried invoking shutdown() when the HostCoordinator gets an ActorExitRequest, but nothing happens in this case (I think a deadlock occurs, as shutdown() waits for all actors to finish their ActorExitRequest receive functions [including the caller HostCoordinator]). Is there a proper way to shut down a local actor system from one of its actors?
It's not something that we've done before, but it's not unreasonable to support something like this. I would have expected the ActorSystem().shutdown() to work, so I'll investigate this and get back to you.
Thank you for the fast reply; I'm waiting for your response.
Hi David, I apologize for the mixup: I had sent you a reply, but had done so by email "reply" to a message in the middle of the chain rather than the last message, not realizing that GitHub would ignore a reply that didn't occur at the end. I'm posting my original reply below directly via GitHub (and you can ignore the original if it ever pops out of the ether): ---- original message follows ---- I've done some testing, and while it's sub-optimal, it does basically work for me. I've included a little test utility below that I used; feel free to update it to match your usage scenario. The utility also has two ways to trigger the shutdown (one line is commented out; feel free to uncomment it and comment out the following line).
As I stated above, this actually worked, but it's not desirable because it's not very clean, and it does violate the admonition not to call ActorSystem methods from an Actor itself. A better solution would be to provide a ".shutdownSystem()" method on the Actor itself, but I want to synchronize on the behavior of the above methods first to ensure that I've captured the correct functionality. Please let me know what happens when you run the above, and send me the contents of ${TMPDIR}/thespian.log for the period of the run. Once we've synchronized on the behavior we are seeing from the test utility, I think adding the .shutdownSystem() method to the Actor itself is probably reasonable. One thing to be aware of is that it would operate asynchronously, so the call would return to the Actor's receiveMessage() handler, and other messages may be delivered (including an ActorExitRequest) before the ActorSystem actually shuts down. -Kevin ----snip----
First, I modified the snippet a little, because in the killer actor a new instance of the ActorSystem was created (due to the systemBase argument) on the same address, which caused an exception. Thus, I removed the systemBase argument from both constructor invocations in the Killer actor. The results are attached.
This is the same situation I ran into in my own system. I'm looking forward to a shutdownSystem method in the Actor class (or in a special, dedicated actor class).
Hi David, You are getting what I would expect to see after your modifications, although your modifications didn't work quite as expected. I'm really interested in the exception you were getting with the original snippet above, because I don't get an exception, and I would like to understand what is happening differently for you. I'm working on the [...]

Details regarding your results (feel free to skip): By default, the [...] Unfortunately, the killer actor is created as a separate process (because the [...]). The processes you had leftover were:
BTW, if you install the [...]
If you run the original version of the tests above with the systemBase specified in the killer actor, then that should start creating a new multiprocTCPBase system, which will look for an existing multiprocTCPBase admin; in this case it should find the original one created and simply use that, thereby operating with the system-wide "singleton" instance. This is normal and expected behavior, and the fact that you got an exception instead is concerning. Here's the thespian.log output for me running case 2 with the systemBase specified in the killer actor:
It's notable that there is a 10-second delay before the first ERR line: this is the amount of time that an ActorSystem will gracefully wait for all actors to stop before giving up and exiting with this error. There is another 12-second delay before the TCP connection fails during the shutdown process and the final cleanup code is run. After this point, there should be no more processes running, but within the 22-second period from the start of the run, there will still be processes if you look. I'm also concerned if you don't see this final cleanup performed and there are still processes running 30 seconds after starting the test.

Much of the above is pretty complicated, and the normal expectation is that a developer writing an actor application using Thespian should not need to know about this level of intricacy (and shouldn't normally need to consult the thespian.log, which is for debugging the internals of Thespian). The proper solution in this case is to provide a ".shutdownSystem()" method on the Actor itself, as discussed above.
Hi Kevin, First of all, I am using a Windows 7 environment with Python 3.4.4. In my case, when the systemBase argument is given to the ActorSystem constructor, it also tries to create another system (see the attached log of running case 2 with systemBase arguments) on the same port, which causes an exception (InvalidActorAddress for the admin actor). If you need further tests, let me know and I'll run them. I'm also really looking forward to the ".shutdownSystem()" method.
Hi Kevin, Both the ActorSystem constructor and the [...]
David, this thread started with the idea of connecting different remote actor systems. Any help would be appreciated.
Hi, Exactly, I was able to get remote actors to communicate with each other using a convention. An example is provided in this directory. The key idea is to have a convention leader actor system to which the remotes can connect. When this happens, a special actor in the leader actor system can get notifications about it (using notifyOnSystemRegistrationChanges and handling the ActorSystemConventionUpdate messages; see the documentation). If you want to create a new actor on the newly joined remote actor system, then you should use actorSystemCapabilityCheck with a unique identifier of each remote actor system (e.g., its IP or a unique name).
@jnkramer3 - hopefully David's response above is helpful for your decision. I am closing this issue, but please feel free to create a new issue or reach out on the Thespian mailing list (https://groups.google.com/forum/#!forum/thespianpy) if you would like further help with your analysis or design.
The new [...]
I just recently started using Thespian, and its features are really great. However, I have just run into a limitation/bug that I cannot solve. My situation is the following.
I have multiple actor systems on different machines, including a dedicated system that has a coordinator actor (this system is the convention leader). I would like to distribute the address of the coordinator actor to some specific actors in the other, remote actor systems (so they can send back the results of jobs). I am currently using the convention update system message for this purpose (the coordinator actor is subscribed to it).
My problem is that I cannot send a message to an actor system that has just connected to the convention. Using the message handler of the coordinator, I send a message to the remoteAdminAddress (received in the convention update message). However, this message is never delivered to the recently connected remote system (the remote waits for a message with ActorSystem.listen()).
Also note that the coordinator does not throw any exception; the send finishes successfully. I have tried configuring admin routing as well, but it did not help.