
Storm Dependency Isolation

xumingming edited this page Jun 13, 2012 · 3 revisions

Problem Description

An unresolved issue in Storm is the dependency conflict between Storm itself and the user's topology code. For example, Storm depends on log4j 1.2.16, while a user's topology code might depend on version 1.2.17.

Tomcat's solution

In fact, any container-like framework (one that runs as a container and executes the user's code, packaged as a jar, inside it) has this kind of problem, so looking at how the web container Tomcat solves it may give us a clue.

Tomcat uses classloader isolation to solve the problem. A simplified view of its classloader hierarchy (see its documentation and source code) looks like this:

      Bootstrap
          |
       System
          |
       Common
   _______|__________ 
  |         |        |
Server   Webapp1   Webapp2 ...  
  • The Common classloader loads the classes that are visible to both Tomcat and the user's code, such as the Servlet API.
  • The Server classloader loads the classes that Tomcat itself depends on.
  • Tomcat creates a separate classloader for each webapp (Webapp1, Webapp2, etc.), so each webapp can depend on a different version of the same class.
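The hierarchy above can be sketched with plain java.net.URLClassLoader instances. This is a minimal sketch under assumptions: the class name ContainerLoaders and the empty jar lists are hypothetical placeholders. Note also that a plain URLClassLoader delegates to its parent first, whereas Tomcat's webapp classloader by default looks in the webapp's own classes and jars first for most classes, which lets a webapp's own copy of a library win over one visible from a parent. The sketch only reproduces the shape of the tree.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch of a Tomcat-style classloader tree built from plain
// URLClassLoaders. Empty URL arrays stand in for real lists of jars.
class ContainerLoaders {
    final ClassLoader common;
    final ClassLoader server;
    final ClassLoader webapp1;
    final ClassLoader webapp2;

    ContainerLoaders(URL[] commonJars, URL[] serverJars, URL[] webapp1Jars, URL[] webapp2Jars) {
        // Common sits on top of the JVM's system (application) classloader.
        common = new URLClassLoader(commonJars, ClassLoader.getSystemClassLoader());
        // Server and each webapp get their own loader with Common as parent:
        // they all see Common's classes, but not each other's.
        server = new URLClassLoader(serverJars, common);
        webapp1 = new URLClassLoader(webapp1Jars, common);
        webapp2 = new URLClassLoader(webapp2Jars, common);
    }

    static ContainerLoaders empty() {
        URL[] none = new URL[0];
        return new ContainerLoaders(none, none, none, none);
    }

    public static void main(String[] args) {
        ContainerLoaders loaders = empty();
        System.out.println(loaders.server.getParent() == loaders.common);  // true
        System.out.println(loaders.webapp1.getParent() == loaders.common); // true
        System.out.println(loaders.webapp1 == loaders.webapp2);            // false
    }
}
```

Sibling loaders share nothing but their parent, which is what keeps one webapp's dependencies out of another's.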

This approach works well for Tomcat because Tomcat's server code and user code do not run on the same thread. As long as we set the corresponding classloader for each thread using:

// for tomcat server thread
Thread.currentThread().setContextClassLoader(serverClassLoader);

// for webapp
Thread.currentThread().setContextClassLoader(webapp1ClassLoader);

that's all.
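A minimal sketch of the per-thread idea, using only the JDK: the context classloader is a property of each thread, so code running on a given thread sees whatever loader was set for that thread. The class PerThreadContextLoader and its observeOnThread helper are hypothetical names, and setting the loader on the Thread object before start() stands in for the thread calling setContextClassLoader on itself.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch: each thread carries its own context classloader.
class PerThreadContextLoader {

    // Starts a new thread whose context classloader is `loader` and returns
    // the context classloader that code running on that thread observes.
    static ClassLoader observeOnThread(ClassLoader loader) {
        ClassLoader[] seen = new ClassLoader[1];
        Thread worker = new Thread(() -> seen[0] = Thread.currentThread().getContextClassLoader());
        worker.setContextClassLoader(loader); // set before the thread runs any code
        worker.start();
        try {
            worker.join(); // join() also makes seen[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        ClassLoader common = ClassLoader.getSystemClassLoader();
        ClassLoader serverClassLoader = new URLClassLoader(new URL[0], common);
        ClassLoader webapp1ClassLoader = new URLClassLoader(new URL[0], common);
        System.out.println(observeOnThread(serverClassLoader) == serverClassLoader);   // true
        System.out.println(observeOnThread(webapp1ClassLoader) == webapp1ClassLoader); // true
    }
}
```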

Can Storm use the same solution?

Short answer: no. In Storm's current implementation, the Storm server code and the user's topology code run on the same thread. There is no separate server thread or user thread; the two kinds of code take turns on one thread, so execution looks like this:

storm-server-code
...
user-topology-code
...
storm-server-code
...
user-topology-code

So we cannot simply set the ContextClassLoader to the ServerClassLoader or the UserClassLoader once for the whole thread.

Possible Solution I

One possible solution is to separate the Storm server code and the user topology code into different threads. This does not seem feasible for Storm: if we split them, every user thread needs its own server thread, which doubles the number of threads and adds too much overhead. We might be able to apply some optimization to reduce the number of server threads, but that seems too big a change to the current implementation.
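To make the overhead concrete, here is a minimal sketch of Solution I, assuming one dedicated user thread paired with each server thread (the class DedicatedUserThread is hypothetical). Besides the extra thread, every call into user code becomes a hand-off to another thread followed by a blocking wait.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: user code runs on its own thread, bound to the user classloader.
class DedicatedUserThread implements AutoCloseable {
    private final ExecutorService userThread;

    DedicatedUserThread(ClassLoader userClassLoader) {
        // One extra thread per server thread, permanently using the user classloader.
        this.userThread = Executors.newSingleThreadExecutor(runnable -> {
            Thread t = new Thread(runnable, "user-topology-code");
            t.setContextClassLoader(userClassLoader);
            t.setDaemon(true);
            return t;
        });
    }

    // Called on the server thread: hands the user call over and blocks until it completes.
    <T> T callUser(Callable<T> userCode) {
        try {
            return userThread.submit(userCode).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        }
    }

    @Override
    public void close() {
        userThread.shutdown();
    }

    public static void main(String[] args) {
        ClassLoader userClassLoader = new URLClassLoader(new URL[0], ClassLoader.getSystemClassLoader());
        try (DedicatedUserThread user = new DedicatedUserThread(userClassLoader)) {
            // Stands in for realBolt.execute(tuple): report which loader the user code sees.
            ClassLoader seen = user.callUser(() -> Thread.currentThread().getContextClassLoader());
            System.out.println(seen == userClassLoader); // true
        }
    }
}
```

Applied to Storm, this would mean one such extra thread for every thread that currently runs user code, and a cross-thread hand-off for every tuple.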

Possible Solution II

Can we keep the Storm server code and the user topology code running on the same thread, but let them use different classloaders? The answer is yes, and the implementation need not be very ugly.

In order to use different classloaders for the Storm server code and the user topology code on one thread, we need to be able to clearly identify which code is Storm server code and which is user topology code. This is not hard, because user topology code has a clear boundary, defined by the interfaces it implements: ISpout for spouts and IBolt for bolts. Storm enters the user's spout and bolt code through the methods of these interfaces.

So we can implement a SpoutWrapper and a BoltWrapper to wrap the user's spout and bolt. The wrapper's responsibility is to set the ContextClassLoader to the UserClassLoader before executing the user's spout/bolt code, and to set it back to the ServerClassLoader before returning. The pseudocode of the BoltWrapper would be:

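public class BoltWrapper implements IBolt {
    private final IBolt realBolt;
    private final ClassLoader serverClassLoader, userClassLoader;

    public BoltWrapper(IBolt realBolt, ClassLoader serverClassLoader, ClassLoader userClassLoader) {
        this.realBolt = realBolt;
        this.serverClassLoader = serverClassLoader;
        this.userClassLoader = userClassLoader;
    }

    public void execute(Tuple tuple) {
        Thread.currentThread().setContextClassLoader(this.userClassLoader);
        try {
            this.realBolt.execute(tuple);
        } finally {
            // switch back even if the user's bolt throws
            Thread.currentThread().setContextClassLoader(this.serverClassLoader);
        }
    }
    // prepare() and cleanup() are wrapped the same way
}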
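The real IBolt and ISpout need Storm on the classpath, so the following self-contained sketch uses simplified, hypothetical stand-in interfaces (SimpleBolt, SimpleSpout) to exercise the same switching logic for both a bolt and a spout. It factors the switch into one helper (ClassLoaderSwitch.runAsUser, also hypothetical) so every wrapped method restores the server loader in a finally block, even when the user's code throws.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical, simplified stand-ins for Storm's IBolt and ISpout; the real
// interfaces have more methods than these.
interface SimpleBolt { void execute(String tuple); }
interface SimpleSpout { void nextTuple(); }

final class ClassLoaderSwitch {
    private ClassLoaderSwitch() {}

    // Runs user code under the user classloader, then switches back to the
    // server classloader even if the user code throws.
    static void runAsUser(ClassLoader serverLoader, ClassLoader userLoader, Runnable userCode) {
        Thread current = Thread.currentThread();
        current.setContextClassLoader(userLoader);
        try {
            userCode.run();
        } finally {
            current.setContextClassLoader(serverLoader);
        }
    }
}

class SimpleBoltWrapper implements SimpleBolt {
    private final SimpleBolt realBolt;
    private final ClassLoader serverLoader, userLoader;

    SimpleBoltWrapper(SimpleBolt realBolt, ClassLoader serverLoader, ClassLoader userLoader) {
        this.realBolt = realBolt;
        this.serverLoader = serverLoader;
        this.userLoader = userLoader;
    }

    @Override
    public void execute(String tuple) {
        ClassLoaderSwitch.runAsUser(serverLoader, userLoader, () -> realBolt.execute(tuple));
    }
}

class SimpleSpoutWrapper implements SimpleSpout {
    private final SimpleSpout realSpout;
    private final ClassLoader serverLoader, userLoader;

    SimpleSpoutWrapper(SimpleSpout realSpout, ClassLoader serverLoader, ClassLoader userLoader) {
        this.realSpout = realSpout;
        this.serverLoader = serverLoader;
        this.userLoader = userLoader;
    }

    @Override
    public void nextTuple() {
        ClassLoaderSwitch.runAsUser(serverLoader, userLoader, realSpout::nextTuple);
    }
}

class WrapperDemo {
    public static void main(String[] args) {
        ClassLoader app = ClassLoader.getSystemClassLoader();
        ClassLoader serverLoader = new URLClassLoader(new URL[0], app);
        ClassLoader userLoader = new URLClassLoader(new URL[0], app);
        Thread.currentThread().setContextClassLoader(serverLoader);

        SimpleBolt bolt = new SimpleBoltWrapper(
                tuple -> System.out.println("user sees user loader: "
                        + (Thread.currentThread().getContextClassLoader() == userLoader)),
                serverLoader, userLoader);
        bolt.execute("tuple-1"); // prints "user sees user loader: true"
        System.out.println("back on server loader: "
                + (Thread.currentThread().getContextClassLoader() == serverLoader)); // true
    }
}
```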

This way, we can use the ServerClassLoader to load Storm's dependencies and the UserClassLoader to load the user topology's dependencies. What do you think?
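One design choice worth weighing: the wrapper switches back to a fixed ServerClassLoader, while an alternative is to remember whichever loader was active and restore that, so switches can nest safely (for example, if user code calls back into code that switches loaders again). A minimal sketch of that variant follows; ContextClassLoaders.with is a hypothetical helper name. A related caveat: the context classloader only affects code that asks for it explicitly, such as ServiceLoader.load(Class), which uses the current thread's context classloader. Ordinary class references are resolved through the loader that defined the referring class, so the user's spout and bolt classes would themselves need to be loaded through the UserClassLoader for their dependencies to come from it.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.function.Supplier;

final class ContextClassLoaders {
    private ContextClassLoaders() {}

    // Runs `action` with `loader` as the context classloader, then restores
    // whatever loader was active before, so calls can nest safely.
    static <T> T with(ClassLoader loader, Supplier<T> action) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return action.get();
        } finally {
            current.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) {
        ClassLoader app = ClassLoader.getSystemClassLoader();
        ClassLoader serverLoader = new URLClassLoader(new URL[0], app);
        ClassLoader userLoader = new URLClassLoader(new URL[0], app);
        Thread.currentThread().setContextClassLoader(serverLoader);

        // User code that calls back into server-side code, which switches again.
        boolean ok = with(userLoader, () -> {
            ClassLoader inner = with(serverLoader, () -> Thread.currentThread().getContextClassLoader());
            return inner == serverLoader
                    && Thread.currentThread().getContextClassLoader() == userLoader;
        });
        System.out.println(ok); // true
        System.out.println(Thread.currentThread().getContextClassLoader() == serverLoader); // true
    }
}
```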