The repository for Scala Spark workshop held by Tenaris Data Science Department in universities

tenaris/scala-spark-workshop

Scala Spark Workshop

This repository collects the Databricks notebooks used in the Scala Spark workshop held at universities by the Tenaris Data Science Department.

Contents

The repository contains two Databricks notebooks made for Databricks Community Edition. The aim is to teach Spark fundamentals to future software engineers.

One notebook contains exercises to be completed by students, while the other contains the solutions.

The notebooks are in Italian and can run on Spark 2.0+ clusters. The previous edition of the classes was based on Spark 1.6+; its code is still available on the spark_1.6.0 branch.

Getting Started

Workshop Scala Spark Edition: Students should create an account on Databricks Community Edition and import the notebook published at https://raw.githubusercontent.com/tenaris/scala-spark-workshop/master/src/main/databricks/EsercitazioneScalaSparkNoSoluzioni.dbc

Workshop PySpark Edition: Students should create an account on Databricks Community Edition and import the notebook published at https://github.com/tenaris/scala-spark-workshop/raw/master/src/main/databricks/WorkshopPySpark_English_NoSolution_Cleaned.dbc

Dataset References

  • The Iris Plants Database, created by R.A. Fisher and made available by Lichman, M. (2013). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.
  • The Italian 2016 Referendum dataset, freely available on the Eligendo portal and licensed under IODL 2.0.
