shortTitle	shortDescription	expandText	anchorTarget	icon
Data Processing	Pick your favorite notebook. Run massively distributed big data pipelines; train NLP or ML models; perform numerical analysis; visualize data and more.	Processing data	processing-data	icon11.svg

Big Data Analysis

Analyse petabytes of data in parallel on single-node machines or on clusters.

Compute in batches or in real time. Execute fast, distributed relational operations on your data, or train machine learning models.

Work with popular storage and computation engines such as Spark, Kafka, Hadoop, Flink, Cassandra, Delta Lake and more.

Libraries for processing big data

Analyse data across a cluster with Spark

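// Count the occurrences of each word in a text source.
// Assumes `spark` is an active SparkSession; the RDD API lives on its SparkContext.
val textFile = spark.sparkContext.textFile("hdfs://...")
val counts = textFile
  .flatMap(line => line.split(" "))  // split each line into words
  .map(word => (word, 1))            // pair each word with a count of 1
  .reduceByKey(_ + _)                // sum the counts for each word
counts.saveAsTextFile("hdfs://...")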
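
To try the same word-count logic without a cluster, here is a minimal sketch on an in-memory collection using only the Scala standard library (2.13 or later). The `wordCounts` helper and the sample lines are hypothetical, introduced for illustration; `groupMapReduce` stands in for the `map` and `reduceByKey` steps of the Spark pipeline.

```scala
// Hypothetical helper: counts occurrences of each word in a local collection.
// Mirrors the Spark pipeline: split lines into words, then sum a 1 per occurrence.
def wordCounts(lines: Seq[String]): Map[String, Int] =
  lines
    .flatMap(line => line.split(" "))             // split each line into words
    .groupMapReduce(word => word)(_ => 1)(_ + _)  // group equal words and sum their counts

// Hypothetical sample input standing in for the lines of a text file.
val sample = List("to be or not to be", "to do is to be")
val sampleCounts = wordCounts(sample)
```

Because the collection API and the Spark RDD API share the same functional style, a transformation can often be prototyped locally like this before it is run on a cluster.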

Notebooks

Explore data in web-based notebooks and produce rich, interactive output.

Combine code, data, and visualizations in a single document. Make changes and instantly see results. Share and collaborate with others.

Alongside many cloud-hosted solutions, open-source notebooks for Scala include the almond Jupyter kernel, Zeppelin and Polynote.

Libraries for big data and visualisation
