Spark-curves provides SQL functions and ML Classes on Apache Spark about curve calculations such as curve similarity, clustering, etc. For example, it can apply to the analysis of traffic flow or human flow.
Apache Spark SQL & MLlib 3.5.0 (on Scala 2.12 or 2.13) are required to use Spark-curves.
libraryDependencies ++= Seq(
"io.github.toyotainfotech" %% "spark-curves" % "0.4.0",
"org.apache.spark" %% "spark-sql" % "3.5.0" % Provided,
"org.apache.spark" %% "spark-mllib" % "3.5.0" % Provided
)
import io.github.toyotainfotech.spark.curves.sql.functions.registerCurveFunctions
val spark: SparkSession = ??? // use a SparkSession instance in your Spark environment
registerCurveFunctions(spark.sessionState.functionRegistry)
> SELECT discrete_frechet_distance(array(array(0d, 0d), array(2d, 0d), array(6d, 0d)), array(array(0d, 1d), array(2d, 1d), array(3d, 1d), array(6d, 1d)))
1.414
> SELECT continuous_dynamic_time_warping(array(array(-1d, 1d), array(0d, 1d), array(0d, 2d), array(1d, 2d), array(2d, 2d), array(3d, 2d), array(3d, 1d), array(4d, 1d)), array(array(-1d, 0d), array(0d, 0d), array(3d, 0d), array(4d, 0d)))
2.722
val klClustering = new KLClustering()
.setK(2)
.setL(16)
.setIter(2)
.setCurveSimilarityFunc("continuous_frechet_distance")
.setCurveSimplifyFunc("polyline_sample_equidistant")
.setPointsCenterFunc("smallest_circle_center")
.setInputCurveCol("curve")
.setInputCurveAxisFields(Array("x", "y"))
.setOutputClusterIndexCol("clusterIndex")
.setOutputSimilarityCol("similarity")
.setOutputMatchingCurveCol("matching_curve")
val klClusterModel = klClustering.fit(trajectories)
val result = klClusterModel.transform(trajectories)
See API.
The following experiment is clustering of up to 3,578,281 curves on Amazon EMR.
See DEIM 2023 paper (Japanese) for details.
Copyright 2023 TOYOTA MOTOR CORPORATION
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.