vdn-projects/hashing-columns-nifi-processor

NiFi hash column custom processor

Introduction

NiFi's standard hashing processors mostly operate on a flowfile's attributes or its entire content, whereas we only need to hash part of the content. This project builds a custom processor that hashes specific columns with one of the supported algorithms: MD2, MD5, SHA224, SHA256 and SHA512. Because our use case requires the output in CSV format, CSV output support is included as well.
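The core idea of hashing only selected columns can be sketched in plain Java. This is a minimal sketch, not the processor's actual code: it assumes each record is available as a `Map<String, String>` (standing in for an Avro `GenericRecord`), and the names `ColumnHasher`, `jcaName` and `hashColumns` are hypothetical. It maps the algorithm names above to the standard names accepted by `java.security.MessageDigest`; hex-encoding the digest is an assumption about the output format.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class ColumnHasher {

    // Hypothetical mapping from the README's algorithm names to standard JCA names.
    static String jcaName(String algorithm) {
        switch (algorithm.toUpperCase(Locale.ROOT)) {
            case "MD2":    return "MD2";
            case "MD5":    return "MD5";
            case "SHA224": return "SHA-224";
            case "SHA256": return "SHA-256";
            case "SHA512": return "SHA-512";
            default: throw new IllegalArgumentException("Unsupported algorithm: " + algorithm);
        }
    }

    // Hashes one value (UTF-8 bytes) and returns the digest as lowercase hex.
    static String hash(String value, String algorithm) {
        try {
            MessageDigest digest = MessageDigest.getInstance(jcaName(algorithm));
            byte[] bytes = digest.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(bytes.length * 2);
            for (byte b : bytes) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns a copy of the record in which only the named columns are hashed;
    // all other columns pass through unchanged, and null values stay null.
    static Map<String, String> hashColumns(Map<String, String> record,
                                           List<String> columns,
                                           String algorithm) {
        Map<String, String> out = new LinkedHashMap<>(record);
        for (String column : columns) {
            String value = out.get(column);
            if (value != null) {
                out.put(column, hash(value, algorithm));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("id", "1");
        row.put("email", "jane@example.com");
        System.out.println(hashColumns(row, Arrays.asList("email"), "SHA256"));
    }
}
```

Copying the record before hashing keeps the input untouched, and using a `LinkedHashMap` preserves the column order, which matters once the record is written back out as CSV.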

You can download the compiled output file HERE and test it directly with your data flow. To install it, put the NAR file in the lib folder of your NiFi installation and restart NiFi; the imported processor only shows up after a restart.

Dependencies

The following libraries are not included in the template generated by the NiFi processor Maven archetype and were added separately:

  • avro: from the Apache Avro library, to make working with generic records easier
  • commons-csv: from Apache Commons, to read and write the CSV format

Getting the project running

Make sure the following are installed on your computer:

  • Java 8 JDK

  • Maven

Then, from a terminal, run the commands below:

    git clone https://github.com/vanducng/hashing-columns-nifi-processor.git
    cd hashing-columns-nifi-processor
    mvn clean install

The built NAR file is located at: .\hashing-columns-nifi-processor\nifi-HashColumn-nar\target\nifi-HashColumn-nar-1.0.nar

Future improvements

  • Make the CSV format configurable through properties such as value separator, record separator, quote character, escape character, etc.
  • Add JSON output support, since chaining another conversion processor adds to the processing time of the whole data flow.
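For the first item, Apache Commons CSV's `CSVFormat` already exposes delimiter, quote, escape and record-separator settings, so a real implementation would most likely read those values from processor properties and pass them through. To show only the formatting rules involved, here is a stdlib-only sketch; `CsvOptions` and `CsvLineWriter` are hypothetical names, and the policy of quoting a field only when it contains a special character is an assumption, roughly in the spirit of Commons CSV's `QuoteMode.MINIMAL`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical holder for the CSV properties a future version might expose.
final class CsvOptions {
    final char delimiter;
    final char quote;
    final char escape;
    final String recordSeparator;

    CsvOptions(char delimiter, char quote, char escape, String recordSeparator) {
        this.delimiter = delimiter;
        this.quote = quote;
        this.escape = escape;
        this.recordSeparator = recordSeparator;
    }

    // RFC 4180-style defaults: comma, double quote, quotes escaped by doubling, CRLF.
    static CsvOptions defaults() {
        return new CsvOptions(',', '"', '"', "\r\n");
    }
}

final class CsvLineWriter {

    // Quotes a field only when it contains a special character; inside quotes,
    // quote and escape characters are prefixed with the escape character.
    static String formatField(String field, CsvOptions opts) {
        if (field == null) {
            return ""; // assumption: nulls are written as empty fields
        }
        boolean needsQuoting = field.indexOf(opts.delimiter) >= 0
                || field.indexOf(opts.quote) >= 0
                || field.indexOf(opts.escape) >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuoting) {
            return field;
        }
        StringBuilder sb = new StringBuilder(field.length() + 2);
        sb.append(opts.quote);
        for (char c : field.toCharArray()) {
            if (c == opts.quote || c == opts.escape) {
                sb.append(opts.escape);
            }
            sb.append(c);
        }
        sb.append(opts.quote);
        return sb.toString();
    }

    // Joins the formatted fields with the delimiter and terminates the record.
    static String formatRecord(List<String> fields, CsvOptions opts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(opts.delimiter);
            }
            sb.append(formatField(fields.get(i), opts));
        }
        return sb.append(opts.recordSeparator).toString();
    }

    public static void main(String[] args) {
        CsvOptions semicolons = new CsvOptions(';', '\'', '\\', "\n");
        System.out.print(formatRecord(Arrays.asList("1", "x;y", "it's"), semicolons));
    }
}
```

With the semicolon/single-quote/backslash options used in `main`, the sketch prints `1;'x;y';'it\'s'`. When the escape character equals the quote character (the default), the same loop produces the familiar doubled-quote escaping.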

Resources

  • YOUTUBE LINK: Custom Processor Development with Apache NiFi: a very informative introduction to custom processor development.

  • APACHE.ORG: Official developer documentation.

  • MEDIUM LINK: Building a first simple custom processor.

  • APACHE.ORG: Working with Apache Avro

  • BLOG LINK: Working with Apache Commons CSV

About

Hash specific columns and output the flowfile in CSV or Avro format
