This connector is used to read TsFile data in Spark and to write it back (write support is still under development).


1. Dependency

2. Versions

The required versions of Spark, Scala, Java, and TsFile are as follows:

Spark Version   Scala Version   Java Version   TsFile
2.0+            2.11            1.8            0.4.0

ATTENTION: Check the jar packages in the root directory of your Spark installation and replace libthrift-0.9.2.jar and libfb303-0.9.2.jar with libthrift-0.9.1.jar and libfb303-0.9.1.jar, respectively.
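As a quick sanity check against the version table above, the following minimal sketch prints the Scala and Java versions of the running JVM and compares them with the listed 2.11 / 1.8 requirement. The object and helper names (VersionCheck, isSupportedScala, isSupportedJava) are hypothetical and are not part of the connector. In spark-shell, `spark.version` reports the Spark version itself.

```scala
// Hypothetical helper, not part of the connector: compares the running
// Scala and Java versions with the versions listed in the table above.
object VersionCheck {
  // Scala 2.11.x is the version listed for Spark 2.0+.
  def isSupportedScala(version: String): Boolean = version.startsWith("2.11.")

  // Java 8 reports its version as "1.8.0_<update>".
  def isSupportedJava(version: String): Boolean = version.startsWith("1.8")

  def main(args: Array[String]): Unit = {
    val scalaVersion = scala.util.Properties.versionNumberString
    val javaVersion = System.getProperty("java.version")
    println(s"Scala $scalaVersion supported: ${isSupportedScala(scalaVersion)}")
    println(s"Java $javaVersion supported: ${isSupportedJava(javaVersion)}")
  }
}
```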

3. TsFile Type <=> SparkSQL Type

This library uses the following mapping from TsFile data types to SparkSQL data types:

TsFile    SparkSQL
INT32     IntegerType
INT64     LongType
FLOAT     FloatType
DOUBLE    DoubleType
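
The mapping above can be written down as a small lookup table. This is only an illustrative sketch: the object name TsFileTypeMapping and the method toSparkType are hypothetical, and the connector's own conversion code may look different. The Spark types come from the standard org.apache.spark.sql.types package.

```scala
import org.apache.spark.sql.types._

// Illustrative lookup only; the connector's own conversion code may differ.
object TsFileTypeMapping {
  private val mapping: Map[String, DataType] = Map(
    "INT32"  -> IntegerType,
    "INT64"  -> LongType,
    "FLOAT"  -> FloatType,
    "DOUBLE" -> DoubleType
  )

  // Returns None for any type name that the table above does not list.
  def toSparkType(tsFileType: String): Option[DataType] = mapping.get(tsFileType)
}
```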

4. TsFile Schema <-> SparkSQL Table Structure

The set of time-series data in section "Time-series Data" is used here to illustrate the mapping from TsFile Schema to SparkSQL Table Structure.
[Figure: A set of time-series data]

4.1. Using delta_object as a reserved column

The SparkSQL table has two reserved columns:

  • time : Timestamp, LongType
  • delta_object : Delta_object ID, StringType

The SparkSQL table structure is as follows:

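time(LongType)  delta_object(StringType)  sensor_1(FloatType)  sensor_2(IntegerType)  sensor_3(IntegerType)
1               root.car.turbine1         1.2                  20                     null
2               root.car.turbine1         null                 20                     50
3               root.car.turbine1         1.4                  21                     null
4               root.car.turbine1         null                 20                     51
5               root.car.turbine1         1.1                  null                   null
6               root.car.turbine1         null                 null                   52
7               root.car.turbine1         1.8                  null                   null
8               root.car.turbine1         null                 null                   53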
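
For reference, the table structure above corresponds to a Spark SQL schema along the following lines. This is a sketch built with the standard StructType API, not output captured from the connector; the object name SchemaSketch is hypothetical, and nullability is left at Spark's default (nullable) as an assumption.

```scala
import org.apache.spark.sql.types._

// Sketch of the schema described above: two reserved columns
// followed by one column per sensor.
object SchemaSketch {
  val schema: StructType = StructType(Seq(
    StructField("time", LongType),
    StructField("delta_object", StringType),
    StructField("sensor_1", FloatType),
    StructField("sensor_2", IntegerType),
    StructField("sensor_3", IntegerType)
  ))
}
```

Calling `println(SchemaSketch.schema.treeString)` prints the schema in the same layout that `df.printSchema()` uses, which makes it easy to compare with a DataFrame loaded from a TsFile.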

4.2. Unfolding the delta_object column

If you want to unfold the delta_object column into multiple columns, add the following option when reading and writing:


option("delta_object_name", "root.device.turbine")

"delta_object_name" is a reserved key.

The SparkSQL table structure is then as follows:

time(LongType)  device(StringType)  turbine(StringType)  sensor_1(FloatType)  sensor_2(IntegerType)  sensor_3(IntegerType)
1               car                 turbine1             1.2                  20                     null
2               car                 turbine1             null                 20                     50
3               car                 turbine1             1.4                  21                     null
4               car                 turbine1             null                 20                     51
5               car                 turbine1             1.1                  null                   null
6               car                 turbine1             null                 null                   52
7               car                 turbine1             1.8                  null                   null
8               car                 turbine1             null                 null                   53
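
The relationship between the "delta_object_name" value and the unfolded columns can be illustrated in plain Scala. The sketch below pairs each level of "root.device.turbine" with the matching level of a delta_object ID. It assumes the ID for the rows above is "root.car.turbine1" (inferred from the column values car and turbine1) and that the leading "root" level does not become a column. The helper UnfoldDeltaObject is hypothetical and does not show how the connector implements unfolding internally.

```scala
// Hypothetical helper: pairs each level name in delta_object_name with the
// corresponding level of a delta_object ID. Not the connector's actual code.
object UnfoldDeltaObject {
  def unfold(deltaObjectName: String, deltaObject: String): Seq[(String, String)] = {
    val names = deltaObjectName.split('.')
    val values = deltaObject.split('.')
    require(names.length == values.length,
      s"'$deltaObject' does not have the same number of levels as '$deltaObjectName'")
    // The first level ("root") does not appear as a column in the table above.
    names.zip(values).drop(1).toSeq
  }

  def main(args: Array[String]): Unit = {
    unfold("root.device.turbine", "root.car.turbine1").foreach {
      case (column, value) => println(s"$column = $value")
    }
  }
}
```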

You can then group by any level of delta_object and, using the same option, write the DataFrame back to a TsFile.
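
As a rough illustration of grouping by one level, the sketch below builds a few rows of the unfolded table locally (instead of reading them from a TsFile, so it does not depend on the connector) and groups them with the standard DataFrame API. The object name GroupByLevel, the helper sample, and the local-mode session are assumptions made for this example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.avg

object GroupByLevel {
  // The first four rows of the unfolded table, built locally for illustration.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq[(Long, String, String, Option[Float], Option[Int], Option[Int])](
      (1L, "car", "turbine1", Some(1.2f), Some(20), None),
      (2L, "car", "turbine1", None, Some(20), Some(50)),
      (3L, "car", "turbine1", Some(1.4f), Some(21), None),
      (4L, "car", "turbine1", None, Some(20), Some(51))
    ).toDF("time", "device", "turbine", "sensor_1", "sensor_2", "sensor_3")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("group-by-level")
      .getOrCreate()
    val df = sample(spark)

    // Group by the "device" level only, then by both unfolded levels.
    df.groupBy("device").count().show()
    df.groupBy("device", "turbine").agg(avg("sensor_1")).show()

    spark.stop()
  }
}
```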

5. Building

mvn clean scala:compile compile package

6. Examples

The file 'test.tsfile' used in the following examples is located at "data/test.tsfile". Upload it to HDFS in advance so that its HDFS path is "/test.tsfile".

6.1 Scala API

  • Example 1

     //read data in TsFile and create a table
     val df = spark.read.tsfile("/test.tsfile")
     df.createOrReplaceTempView("tsfile_table")
     //query with filter
     val newDf = spark.sql("select * from tsfile_table where s1 > 1.2").cache()
  • Example 2

     val df = spark.read
        .format("<tsfile-data-source>")
        .load("/test.tsfile")
     df.filter("time < 10").show()
  • Example 3

     //create a table in SparkSQL and build relation with a TsFile
     spark.sql("create temporary view tsfile_table using <tsfile-data-source> options(path = \"test.ts\")")
     spark.sql("select * from tsfile_table where s1 > 1.2").show()
  • Example 4 (using options to read)

     val df = spark.read.option("delta_object_name", "root.device.turbine").tsfile("/test.tsfile")
     //create a table in SparkSQL and build relation with a TsFile
     df.createOrReplaceTempView("tsfile_table")
     spark.sql("select * from tsfile_table where turbine = 'd1' and device = 'car' and time < 10").show()
  • Example 5 (write)

     val df = spark.read.tsfile("/test.tsfile")
     df.write.tsfile("/out")
  • Example 6 (using options to write)

     val df = spark.read.option("delta_object_name", "root.device.turbine").tsfile("/test.tsfile")
     df.write.option("delta_object_name", "root.device.turbine").tsfile("/out")

6.2 spark-shell

6.2.1 Start Spark

Local Mode

./spark-2.0.1-bin-hadoop2.7/bin/spark-shell --jars tsfile-0.4.0.jar,tsfile-spark-connector-0.4.0.jar

  • Please replace "spark-2.0.1-bin-hadoop2.7/bin/spark-shell" with the real path of your spark-shell.
  • Multiple jar packages are separated by commas without any spaces.
  • The latest version used is v0.4.0.

Distributed Mode

./spark-2.0.1-bin-hadoop2.7/bin/spark-shell --jars tsfile-0.4.0.jar,tsfile-spark-connector-0.4.0.jar --master spark://ip:7077
