Storm Dependency Isolation
An unresolved issue in Storm is dependency conflicts between Storm itself and the user's topology code. For example, Storm depends on log4j 1.2.16, while the user's topology code might depend on version 1.2.17.
In fact, any container-like framework that runs as a container and accepts user code (a jar) to run inside it has this kind of problem, so looking at how the web container Tomcat solves it may give us a clue.
Tomcat uses ClassLoader isolation to solve the problem. A simplified view of its classloader hierarchy (see its documentation and source code) looks like this:
       Bootstrap
           |
        System
           |
        Common
  _________|__________
  |        |         |
Server  Webapp1   Webapp2 ...
- The Common classloader loads the classes that are visible to both Tomcat and the user's code, such as the servlet API.
- The Server classloader loads the classes that Tomcat itself depends on.
- Tomcat creates a separate classloader for each webapp (Webapp1, Webapp2, etc.), so each webapp can depend on a different version of the same class.
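The shape of that hierarchy can be sketched with plain JDK classloaders. This is only an illustration, not Tomcat's actual code: the class name LoaderHierarchySketch is hypothetical, and the empty URL arrays stand in for the real jar locations each loader would be given.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch of the hierarchy above; not taken from Tomcat.
class LoaderHierarchySketch {
    final ClassLoader common;
    final ClassLoader server;
    final ClassLoader webapp1;
    final ClassLoader webapp2;

    LoaderHierarchySketch() {
        ClassLoader system = ClassLoader.getSystemClassLoader();
        // Empty URL arrays are placeholders for real jar locations.
        common = new URLClassLoader(new URL[0], system);
        // Server and each webapp are siblings: they share Common as parent
        // but cannot see each other's classes.
        server = new URLClassLoader(new URL[0], common);
        webapp1 = new URLClassLoader(new URL[0], common);
        webapp2 = new URLClassLoader(new URL[0], common);
    }

    public static void main(String[] args) {
        LoaderHierarchySketch h = new LoaderHierarchySketch();
        System.out.println(h.server.getParent() == h.common);  // true
        System.out.println(h.webapp1.getParent() == h.common); // true
        System.out.println(h.webapp1 == h.webapp2);            // false
    }
}
```

Note that URLClassLoader asks its parent first before looking in its own URLs; a real container may use a different lookup order for its webapp loaders.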
This approach works well for Tomcat because Tomcat's server code and user code do not run on the same thread. As long as we set the corresponding class loader for each thread using:
// for tomcat server thread
Thread.currentThread().setContextClassLoader(serverClassLoader);
// for webapp
Thread.currentThread().setContextClassLoader(webapp1ClassLoader);
nothing more is needed.
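As a small, self-contained illustration of the per-thread setting (the class and method names here are hypothetical and not part of Tomcat), each thread below is given its own context class loader and reports the one it observes:

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical demo: one context class loader per thread.
class PerThreadLoaderSketch {
    // Starts a thread with the given context class loader and returns
    // the loader that the thread itself observed.
    static ClassLoader observedLoader(ClassLoader loader) {
        ClassLoader[] seen = new ClassLoader[1];
        Thread t = new Thread(() -> seen[0] = Thread.currentThread().getContextClassLoader());
        t.setContextClassLoader(loader);
        t.start();
        try {
            t.join(); // join() makes the write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        // Placeholder loaders standing in for the server and webapp loaders.
        ClassLoader serverLoader = new URLClassLoader(new URL[0]);
        ClassLoader webappLoader = new URLClassLoader(new URL[0]);
        System.out.println(observedLoader(serverLoader) == serverLoader); // true
        System.out.println(observedLoader(webappLoader) == webappLoader); // true
    }
}
```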
Can the same approach be applied to Storm? Short answer: no. In the current implementation of Storm, the server code and the user's topology code run on the same thread. There is no separate server thread or user thread, so execution on a single thread looks like this:
storm-server-code
...
user-topology-code
...
storm-server-code
...
user-topology-code
So we can't simply set the ContextClassLoader to ServerClassLoader or UserClassLoader once per thread.
One possible solution is to separate the Storm server code and the user topology code into different threads. This does not seem feasible for Storm: if we split them, there would be one server thread for every user thread, doubling the number of threads, which is too much overhead. We might be able to optimize this to reduce the number of server threads, but that seems too big a change to the current implementation.
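To make the cost of that split concrete, here is a minimal sketch of what running user code on its own thread could look like. It is not Storm code: SplitThreadsSketch and runUserCodeOnOwnThread are hypothetical names, and the "user code" is just a lambda that reports which loader it sees. The calling (server) thread has to hand the work off and block for the result, which is the extra thread and handoff described above.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the "separate threads" alternative.
class SplitThreadsSketch {
    static String runUserCodeOnOwnThread(ClassLoader userLoader) {
        // A dedicated thread whose context class loader is the user loader.
        ExecutorService userThread = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "user-thread");
            t.setContextClassLoader(userLoader);
            return t;
        });
        try {
            // The server thread submits the work and waits for the answer.
            Future<String> result = userThread.submit(() ->
                Thread.currentThread().getContextClassLoader() == userLoader
                    ? "user-loader" : "other-loader");
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            userThread.shutdown();
        }
    }

    public static void main(String[] args) {
        ClassLoader userLoader = new java.net.URLClassLoader(new java.net.URL[0]);
        System.out.println(runUserCodeOnOwnThread(userLoader)); // user-loader
    }
}
```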
Can we keep the Storm server code and the user topology code running on the same thread, but let them use different classloaders? The answer is yes, and the implementation will not be very ugly.
In order to use different classloaders for Storm server code and user topology code on one thread, we need to be able to clearly identify which lines of code are Storm server code and which are user topology code. This is not hard, because user topology code has a clear boundary, defined by these two interfaces:
- https://github.com/nathanmarz/storm/blob/master/src/jvm/backtype/storm/spout/ISpout.java
- https://github.com/nathanmarz/storm/blob/master/src/jvm/backtype/storm/task/IBolt.java
So one can implement a SpoutWrapper and a BoltWrapper to wrap the user's spout and bolt. The wrapper's responsibility is to set the ContextClassLoader to UserClassLoader before executing the user's spout/bolt code, and to set the ContextClassLoader back to ServerClassLoader before it returns. The pseudo code of the BoltWrapper would be:
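public class BoltWrapper implements IBolt {
    private final IBolt realBolt;
    private final ClassLoader serverClassLoader;
    private final ClassLoader userClassLoader;
    public BoltWrapper(IBolt realBolt, ClassLoader serverClassLoader, ClassLoader userClassLoader) {
        this.realBolt = realBolt;
        this.serverClassLoader = serverClassLoader;
        this.userClassLoader = userClassLoader;
    }
    public void execute(Tuple tuple) {
        Thread.currentThread().setContextClassLoader(this.userClassLoader);
        try {
            this.realBolt.execute(tuple);
        } finally {
            // switch back even if the user's bolt throws
            Thread.currentThread().setContextClassLoader(this.serverClassLoader);
        }
    }
    // the other IBolt methods (prepare, cleanup) would be wrapped the same way
}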
This way we can use ServerClassLoader to load Storm's dependencies, and use UserClassLoader to load the user topology's dependencies. What do you think?
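The switching pattern itself can be tried outside Storm. The sketch below is an assumption-laden stand-in, not the proposed Storm code: it wraps a plain Runnable instead of an IBolt, and ContextSwitchingWrapper is a hypothetical name. It shows the context class loader being swapped for the duration of the user call and restored afterwards on the same thread.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical stand-in for BoltWrapper, using Runnable instead of IBolt.
class ContextSwitchingWrapper implements Runnable {
    private final Runnable userCode;
    private final ClassLoader userLoader;

    ContextSwitchingWrapper(Runnable userCode, ClassLoader userLoader) {
        this.userCode = userCode;
        this.userLoader = userLoader;
    }

    @Override
    public void run() {
        Thread current = Thread.currentThread();
        // Remember whatever loader was active instead of assuming a fixed one.
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(userLoader);
        try {
            userCode.run();
        } finally {
            // Restore on both normal return and exception.
            current.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) {
        ClassLoader userLoader = new URLClassLoader(new URL[0]);
        ClassLoader before = Thread.currentThread().getContextClassLoader();
        ClassLoader[] seenInside = new ClassLoader[1];

        new ContextSwitchingWrapper(
            () -> seenInside[0] = Thread.currentThread().getContextClassLoader(),
            userLoader).run();

        System.out.println(seenInside[0] == userLoader); // true
        System.out.println(Thread.currentThread().getContextClassLoader() == before); // true
    }
}
```

Saving and restoring the previous loader, rather than always resetting to a fixed server loader, is a design choice of this sketch: it keeps the wrapper correct if wrapped calls are ever nested.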