Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the Flink library and shade dependencies from the fat jar #78

Closed
linhr opened this issue Nov 18, 2022 · 5 comments
Closed

Remove the Flink library and shade dependencies from the fat jar #78

linhr opened this issue Nov 18, 2022 · 5 comments

Comments

@linhr
Copy link
Contributor

linhr commented Nov 18, 2022

  1. I noticed that the Nebula Flink connector library available in the Maven central repository is large (over 100 MB). It seems that the flat jar contains the entire Flink library as dependencies. Shall we mark the Flink dependencies as <provided> in pom.xml so that it is not included in the jar? (BTW, do we have a plan to publish artifacts compatible with each recent Flink version? For example for the Kafka connector I can pick flink-sql-connector-kafka-1.16.0.jar where 1.16.0 is the Flink version I need.)
  2. It seems that third-party dependencies are not shaded in the flat jar, so there may be version conflicts with other libraries at runtime. Shall we shade all third-party dependencies under e.g. org.apache.flink.connector.nebula.shaded?

I've seen the above two issues taken care of in other official connectors (e.g. the Flink Elasticsearch connector). I have the corresponding code changes locally, and I'm happy to submit a PR if the idea makes sense.

@wey-gu
Copy link

wey-gu commented Nov 21, 2022

Thanks so much @linhr

what do you think @Nicole00 ?

@Nicole00
Copy link
Contributor

  1. not included in t
  1. Yeah, the size is not unreasonable for the maven dependency, I will submit a pr to change the package way.
  2. Multi-version compatibility is in our plans, but its priority is not high. I would appreciate it if you contribute your local code.
  3. the shade action looks not so necessary. maybe we can adjust the package way and when users use the flink connector, they can exclude the internal dependencies and import another version by themselves. just like:
<dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-core</artifactId>
            <version>${flink.version}</version>
            <exclusions>
                <exclusion>
                    <artifactId>commons-lang3</artifactId>
                    <groupId>org.apache.commons</groupId>
                </exclusion>
            </exclusions>
        </dependency>

What's your idea? @linhr

@linhr
Copy link
Contributor Author

linhr commented Nov 22, 2022

@Nicole00 Thanks for your response!

I've created a PR (#79) to illustrate my ideas. Regarding your comments:

  1. I've marked all Flink dependencies as <provided>, in this way not only the flat jar size is reduced, but also people can deploy the connector jar to their Flink environment of a different version. (Before this change, it was not possible. For example, the nebula-flink-connector-3.3.0.jar file in the Maven repository was bundled with Flink 1.14, so when I deployed it to my Flink 1.16 environment I got errors in job/task managers due to Java class conflicts.)
  2. I haven't looked into how multi-version support can be handled. I guess we need to setup unit/integration tests so that they are run against each Flink version we want to support. I'm fine if it is not of high priority at this point, since the user can build the connector from source and test against the Flink version they want.
  3. Dependency shading: There is no change to the Java code import statements, and it only affects the flat jar so developers do not need to be aware of it. I feel it is cleaner than asking the user to provide all third-party dependencies (and transitive dependencies). Especially, PyFlink and Flink SQL users typically do not have Maven in their development setup, so a self-contained flat jar would be more desirable. Dependency shading is a standard practice which can be found in other official Flink connectors as well (e.g. the Kafka connector here).

@linhr linhr changed the title Remove the Flink library and shade dependencies from the flat jar Remove the Flink library and shade dependencies from the fat jar Nov 22, 2022
@Nicole00
Copy link
Contributor

@linhr Thanks for your contributions very much!

  1. I'm fine to mark dependencies as <provided> and to shade the dependencies, it has the same result with what I will do with maven-assembly-plugin.
  2. For multi-version flink support, we need to extract the common code for different flink version, and use profile's properties to control flink version. Then when deploy the snapshot or release connector, just run deploy action with parameters -pl ${connector_directory} -am -Pflink-2.14

@Nicole00
Copy link
Contributor

#79 resolved it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants