2018-12-25-pyspark-pycharm-development.md

PySpark development environment with debugger; leveraging PyCharm on mac

Categories: dev, spark, pyspark, pycharm, bigdata, python, tech

Note: This tutorial only covers Spark version 2.2.0 or later.

Many tutorials on the web cover this topic, but none of them actually use pip to speed up the process.

With the release of version 2.2.0, Spark added support for installing PySpark with pip. The Jira issue for this change is SPARK-1267.

To simplify setting up a development environment, we will install pyspark with the pip package manager from within PyCharm.
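Because pyspark is an ordinary pip package from 2.2.0 onward, a script can check whether it is present in the active interpreter. The sketch below is one way to do that once the installation is finished. It assumes Python 3.8 or later (for `importlib.metadata`), and the helper name `installed_version` is made up for this illustration.

```python
import importlib.metadata
import sys


def installed_version(package):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


# sys.executable shows which interpreter (for example a PyCharm virtualenv) is running.
print("interpreter:", sys.executable)

version = installed_version("pyspark")
if version is None:
    print("pyspark is not installed in this interpreter")
else:
    print("pyspark", version, "is installed")
```

Running this from inside the PyCharm project should print the path of the project's virtual environment, which helps confirm that pyspark went into the interpreter you intended.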

Follow the steps:

  • Download PyCharm
  • Create a new project in PyCharm
  • Select Pure Python as the project type
  • Under Project Interpreter, select New Virtual Environment and choose Python 3 or later in the Base interpreter section
  • Note: this tutorial uses Python 3. Spark 2.x also runs on Python 2.7, but Python 2 has reached end of life and newer Spark releases no longer support it.


  • Click the Create button to continue
  • Open Preferences (shortcut: Command + ,)
  • In Preferences, go to the Project section
  • Select Project Interpreter and click + to add packages


  • Search for pyspark and install the version of your choice
  • Click Apply and then OK
  • pyspark is now installed in your PyCharm environment
  • In your project, create a new Python file and add the following code:
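from pyspark import SparkContext

if __name__ == "__main__":
    # Create the Spark context; the app name shows up in logs and the Spark UI.
    sc = SparkContext(appName="python-helloworld")
    # startTime is the context's start time in epoch milliseconds.
    print("hello world", sc.startTime, sc.appName)
    # Release the context's resources before exiting.
    sc.stop()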
  • Run the file using the Run button or the shortcut Control + R
  • If the file runs successfully and prints output like hello world 1545767710074 python-helloworld, you have completed the tutorial.
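The number in the sample output is `sc.startTime`, which Spark reports as milliseconds since the Unix epoch. As a rough sketch, the value can be turned into a readable timestamp with the standard library alone; the helper name `start_time_to_utc` is hypothetical and not part of pyspark.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def start_time_to_utc(start_time_ms):
    """Convert epoch milliseconds (as in sc.startTime) to a UTC datetime."""
    # Integer milliseconds are added as a timedelta to avoid float rounding.
    return EPOCH + timedelta(milliseconds=start_time_ms)


# The value from the sample output above.
print(start_time_to_utc(1545767710074).isoformat())
# prints 2018-12-25T19:55:10.074000+00:00
```

In your own script you would pass `sc.startTime` instead of the literal value.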

What is PyCharm?

PyCharm is an integrated development environment (IDE) for the Python language. It is developed by the Czech company JetBrains, which also develops the IntelliJ IDEA Java IDE and the Kotlin programming language. PyCharm provides code analysis, a graphical debugger, an integrated unit tester, and integration with version control systems (VCSes), and it supports web development with Django.

Using the Debugger in PyCharm

JetBrains has great documentation on PyCharm debugging.

  • A general introduction to debugging features such as breakpoints can be found here.
  • A debugging session with a detailed example can be found here.
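To have something to practise breakpoints on, here is a minimal sketch that keeps the per-line logic in a plain Python function. One caveat worth keeping in mind: functions passed to Spark transformations generally run in separate Python worker processes, so breakpoints inside them may not be hit by a debugger attached to the driver script. Calling the same function directly, as below, sidesteps that. The names `word_lengths`, `run_on_spark`, and the app name `python-debug-demo` are assumptions made for this example.

```python
def word_lengths(line):
    """Return (word, length) pairs for one line of text."""
    return [(word, len(word)) for word in line.split()]


def run_on_spark(lines):
    """Apply word_lengths to every line through Spark and collect the result."""
    # Imported lazily so word_lengths can be debugged without starting Spark.
    from pyspark import SparkContext

    sc = SparkContext(appName="python-debug-demo")  # hypothetical app name
    try:
        return sc.parallelize(lines).flatMap(word_lengths).collect()
    finally:
        sc.stop()


# Driver-side call: set a breakpoint inside word_lengths and start the debugger.
print(word_lengths("hello pyspark"))
# prints [('hello', 5), ('pyspark', 7)]
```

Once the function behaves as expected under the debugger, `run_on_spark(["hello pyspark", "hello world"])` runs the same logic through a Spark context.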
