shortTitle | shortDescription | expandText | anchorTarget | icon |
---|---|---|---|---|
Data Processing | Pick your favorite notebook. Run massively distributed big data pipelines; train NLP or ML models; perform numerical analysis; visualize data and more. | Processing data | processing-data | icon11.svg |
Analyse petabytes of data in parallel on single-node machines or on clusters.
Compute either in batches or in real-time. Execute fast, distributed relational operations on your data, or train machine learning algorithms.
Work with popular storage and computation engines such as Spark, Kafka, Hadoop, Flink, Cassandra, Delta Lake and more.
Libraries for processing big data
Analyse data across a cluster with Spark
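// Count the number of occurrences of each word in a text source
// (assumes `spark` is an active SparkSession)
val textFile = spark.sparkContext.textFile("hdfs://...")
val counts = textFile
  .flatMap(line => line.split(" ")) // split each line into words
  .map(word => (word, 1))           // pair each word with a count of 1
  .reduceByKey(_ + _)               // sum the counts for each word
counts.saveAsTextFile("hdfs://...")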
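The same flatMap, map and reduce-by-key steps can be tried without a cluster. The following is a minimal sketch using only the Scala standard library (2.13 or later), not Spark; the object name `LocalWordCount` and the method `wordCount` are hypothetical names chosen for illustration.

```scala
// Local sketch of the word count using plain Scala collections (no Spark).
// `LocalWordCount` and `wordCount` are illustrative names, not part of any library.
object LocalWordCount {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))  // split each line into words
      .filter(_.nonEmpty)                // drop empty tokens from blank lines or repeated spaces
      .map(word => (word, 1))            // pair each word with a count of 1
      .groupMapReduce(_._1)(_._2)(_ + _) // sum the counts per word, the local analogue of reduceByKey
}
```

Calling `LocalWordCount.wordCount(Seq("to be or not to be"))` returns a map from each distinct word to the number of times it occurs. On a cluster, Spark distributes the same steps across partitions of the input instead of running them on one in-memory collection.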
Explore data in web-based notebooks and produce rich, interactive output.
Combine code, data, and visualizations in a single document. Make changes and instantly see results. Share and collaborate with others.
Alongside many cloud-hosted solutions, open-source notebooks for Scala include the almond Jupyter kernel, Zeppelin and Polynote.
Libraries for big data and visualisation