forked from runawayhorse001/learning-apache-spark
-
Notifications
You must be signed in to change notification settings - Fork 0
/
install.Rmd
79 lines (61 loc) · 2.2 KB
/
install.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
---
title: "Installations"
author: "Wenqiang Feng & Ming Chen"
date: "2/17/2017"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
### **Caution**: Before you start the following steps, please make sure you have already installed
- [Java JDK](http://www.oracle.com/technetwork/java/javase/downloads/index-jsp-138363.html)
- [Ipython and python](https://ipython.org/install.html)
- If you use this method to install the Spark, you can skip the [Spark on Jupyter section](pyspark-on-jupyter.html)
### 1. Download Apache Spark from the official website
Weblink: [Download Apache Spark™](http://spark.apache.org/downloads.html)
### 2. Installation
Actually, the Pre-build version doesn't need installation. You can
use it when you unpack it.
### 3. Set path link
This is the most difficult step for the beginner. However, this step can be easily solved via Min RK's [`findspark`](https://github.com/minrk/findspark).
- install findspark
```{python eval=FALSE}
pip install findspark
```
- open `ipython` in terminal and import findspark
```{python eval=FALSE}
import findspark
findspark.init()
```
- finding spark path
```{python eval=FALSE}
findspark.find()
```
```{python eval=FALSE}
Out[3]: '/Users/wenqiangfeng/spark/'
```
- open `ipython --profile=myprofile` in terminal then run the following code
```{python eval=FALSE}
findspark.init('/Users/wenqiangfeng/spark/', edit_profile=True)
```
```{python eval=FALSE}
findspark.init('/Users/wenqiangfeng/spark/', edit_rc=True)
```
### Note:
- This will also help you to set up the `ipython notebook` or `Jupyter`. You may run the following code in terminal to double check it:
```{python eval=FALSE}
jupyter notebook
```
* If you PySpark still doesn't work, you need to check your `.profile` or `bash_profile` and add the following path to it
+ check `.profile` or `bash_profile` at terminal
+ add the path to your `.profile` or `bash_profile`
```{bash eval=FALSE}
vim ~/.profile
```
```{bash eval=FALSE}
# Added for Pyspark
export SPARK_HOME=YOUR_PATH/apache-spark/libexec
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
```