Releases: yurkao/spark-dns

Spark-DNS 1.0.3

23 Mar 13:36
  1. Added Structured Streaming write support

Spark-DNS 1.0.2

21 Mar 15:07
  1. Added integration tests
  2. Added Spark DNS batch write support (Dataset API and Spark SQL) for publishing DNS updates to the DNS server
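Batch writes publish rows to the DNS server as DNS updates. The release notes do not describe the DataFrame schema or how the updates are sent. The snippet below therefore only illustrates what a batch of A-record additions amounts to as a standard RFC 2136 dynamic update, rendered as an input script for BIND's `nsupdate` tool. The row shape `(name, address, ttl)` and `to_nsupdate_script` are hypothetical and are not part of spark-dns.

```python
from typing import Iterable, Tuple

Row = Tuple[str, str, int]  # hypothetical row shape: (owner name, IPv4 address, TTL in seconds)


def to_nsupdate_script(server: str, zone: str, rows: Iterable[Row]) -> str:
    """Render A-record additions as an input script for BIND's nsupdate tool."""
    lines = [f"server {server}", f"zone {zone}"]
    for name, address, ttl in rows:
        lines.append(f"update add {name} {ttl} A {address}")
    lines.append("send")  # sends everything queued above as one RFC 2136 UPDATE
    return "\n".join(lines) + "\n"


print(to_nsupdate_script("192.0.2.53", "example.com.",
                         [("www.example.com.", "192.0.2.10", 300)]), end="")
```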

Features added

02 Mar 20:33

Support for ignoring XFR failures

Spark-DNS

02 Mar 20:29

Introduction

A Spark data source for retrieving DNS A records from a DNS server.
The Spark DNS data source uses zone transfers to retrieve data from the DNS server.
It attempts IXFR for every zone transfer, although some DNS server implementations may respond with a full AXFR instead.
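The response itself shows whether the server sent a real increment or fell back to a full transfer. RFC 1995 specifies that an IXFR response begins and ends with the zone's current SOA record. A response consisting of that single SOA means the client is already up to date. Over UDP, it can also mean the answer did not fit and the client should retry over TCP. An incremental response has the old version's SOA as its second record, and anything else is AXFR-formatted.

The sketch below applies those rules to a simplified record representation. The `Record` tuple, `XfrResult` and `classify_ixfr_response` are hypothetical names for illustration, not spark-dns code. A real client would parse DNS wire-format messages with a DNS library instead.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Simplified resource record: (type, owner name, rdata).
# For SOA records the rdata is just the zone serial (an int).
Record = Tuple[str, str, object]


@dataclass
class XfrResult:
    kind: str      # "up-to-date", "incremental" or "axfr"
    serial: int    # current zone serial reported by the server
    added: List[Record] = field(default_factory=list)
    deleted: List[Record] = field(default_factory=list)


def classify_ixfr_response(records: List[Record]) -> XfrResult:
    """Interpret the records of an IXFR response following RFC 1995."""
    if not records or records[0][0] != "SOA":
        raise ValueError("response must start with the zone's SOA record")
    serial = records[0][2]
    if len(records) == 1:
        # A lone SOA: the client's serial is already current (or, over UDP, retry via TCP).
        return XfrResult("up-to-date", serial)
    if records[-1][0] != "SOA":
        raise ValueError("response must end with the zone's SOA record")
    body = records[1:-1]
    if body and body[0][0] == "SOA":
        # Incremental: repeated [old SOA, deletions..., new SOA, additions...]
        result = XfrResult("incremental", serial)
        deleting = False
        for rtype, name, rdata in body:
            if rtype == "SOA":
                deleting = not deleting  # each SOA switches between deletions and additions
            elif rtype == "A":
                (result.deleted if deleting else result.added).append((rtype, name, rdata))
        return result
    # Server fell back to AXFR format: the body is the whole zone; keep only A records.
    return XfrResult("axfr", serial, added=[r for r in body if r[0] == "A"])
```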

The Spark DNS data source can operate on multiple DNS zones in a single DataFrame.
Because of the way DNS zone transfers work, a single zone cannot be retrieved in parallel,
but multiple zones are retrieved in parallel (each DNS zone is handled in a separate partition of the underlying RDD).
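The partitioning scheme described above (parallel across zones, sequential within a zone) can be illustrated without Spark. In this minimal sketch, a `ThreadPoolExecutor` stands in for Spark executors, and each submitted task stands in for one partition holding one zone. `read_zones` and the `transfer` callback are hypothetical names, and the in-memory `fake_transfer` replaces a real zone transfer. This is not how the data source itself is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, List, Tuple

Record = Tuple[str, str]  # (owner name, IPv4 address)


def read_zones(zones: List[str],
               transfer: Callable[[str], Iterator[Record]],
               max_workers: int = 4) -> Dict[str, List[Record]]:
    """Read several zones concurrently, one task per zone.

    Each task plays the role of one Spark partition: it drains a single
    zone transfer sequentially, because the records of one transfer arrive
    as an ordered stream, typically on one TCP connection. Parallelism only
    exists across zones.
    """
    def one_partition(zone: str) -> List[Record]:
        return list(transfer(zone))  # sequential within a zone

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `zones`
        return dict(zip(zones, pool.map(one_partition, zones)))


# In-memory stand-in for two DNS zones (hypothetical data).
FAKE_ZONES = {
    "a.example.": [("www.a.example.", "192.0.2.1")],
    "b.example.": [("mail.b.example.", "192.0.2.2"), ("ns.b.example.", "192.0.2.3")],
}


def fake_transfer(zone: str) -> Iterator[Record]:
    yield from FAKE_ZONES[zone]


print(read_zones(list(FAKE_ZONES), fake_transfer))
```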

Rationale

  1. Learning Spark internals
  2. Integrating Spark with 3rd-party data sources
  3. Just for fun

Features and limitations

Limitations

  1. Specifying multiple DNS servers in the options for the same dataset/table is not currently supported
  2. Structured Streaming in continuous processing mode is not supported yet
  3. On Spark 2.4 (including CDH 6.3.x), only batch reading is supported.

Currently implemented features

  1. Spark batch read
  2. Retrieving DNS A records from multiple DNS zones (from a single DNS server)
  3. The new SOA serial of each DNS zone is exposed through an accumulator, visible in the Spark UI (see the relevant stage)
  4. Spark Structured Streaming read support (only the Once and ProcessingTime triggers are supported)
  5. Zone transfer timeout
  6. Specifying explicit zone transfer type (AXFR/IXFR) to use when retrieving data from DNS server.
    • When using xfr=ixfr, only DNS zone updates made since the initial serial are returned.
      • In Structured Streaming, this may produce empty DataFrames when there are no updates
    • When using xfr=axfr, all A records of the DNS zone are returned
  7. Handling temporary failures during zone transfer (similar to failOnDataLoss in Spark+Kafka)
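Features 3, 6 and 7 interact in streaming mode. With xfr=ixfr, each trigger only needs the changes made after the serial processed by the previous trigger. A temporary failure can either fail the query or be skipped, similar in spirit to failOnDataLoss.

The sketch below models one trigger in plain Python under those assumptions. `run_micro_batch`, the `ixfr` callback, `TransientXfrError` and the `fail_on_error` flag are hypothetical names; the data source's actual option names may differ. The `offsets` dict plays the role of Spark source offsets and of the serial accumulator. Serials are compared with RFC 1982 serial-number arithmetic, since SOA serials are 32-bit values that may wrap around.

```python
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, str]  # (owner name, IPv4 address)

# Hypothetical transfer callback:
# (zone, last_serial) -> (current_serial, A records added since last_serial)
Ixfr = Callable[[str, int], Tuple[int, List[Record]]]


class TransientXfrError(Exception):
    """Hypothetical exception for a temporary zone-transfer failure (e.g. a timeout)."""


def serial_gt(a: int, b: int) -> bool:
    """RFC 1982 serial arithmetic for 32-bit SOA serials: is `a` newer than `b`?"""
    return a != b and (a - b) % 2**32 < 2**31


def run_micro_batch(zones: List[str], offsets: Dict[str, int], ixfr: Ixfr,
                    fail_on_error: bool = True) -> List[Record]:
    """One trigger of a streaming read in IXFR mode.

    `offsets` maps every zone to the last SOA serial already emitted and is
    advanced in place, so the next trigger only asks for newer changes.
    """
    batch: List[Record] = []
    for zone in zones:
        try:
            serial, added = ixfr(zone, offsets[zone])
        except TransientXfrError:
            if fail_on_error:
                raise
            continue  # skip the zone; its offset is unchanged, so it is retried next trigger
        if serial_gt(serial, offsets[zone]):
            batch.extend(added)
            offsets[zone] = serial
    return batch  # empty when no zone changed since the previous trigger


# In-memory stand-in for a DNS server whose zone is now at serial 7 (hypothetical data).
ZONE_STATE = {"example.com.": (7, [("new.example.com.", "192.0.2.7")])}


def fake_ixfr(zone: str, last_serial: int) -> Tuple[int, List[Record]]:
    serial, added = ZONE_STATE[zone]
    return (serial, added) if serial_gt(serial, last_serial) else (last_serial, [])


offsets = {"example.com.": 5}
print(run_micro_batch(["example.com."], offsets, fake_ixfr))  # -> [('new.example.com.', '192.0.2.7')]
print(run_micro_batch(["example.com."], offsets, fake_ixfr))  # -> [] (no updates, empty batch)
```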