Install findspark, add spylon-kernel for scala

Install the Python findspark library so that a standalone Python script or Jupyter notebook can run a Spark application outside the PySpark shell.
Install the Jupyter notebook spylon-kernel to run Scala code interactively inside a Jupyter notebook.
For Windows:
To install the findspark library for Python, open an Anaconda command prompt as administrator.
For Linux or Mac:
Simply open a terminal, either by starting the terminal app, by connecting with putty.exe from a Windows machine, or with ssh from a non-Windows machine.
Activate the virtual environment spark you created earlier, which has Python 3.6 and Jupyter notebook:
conda activate spark
If you want to exit the virtual environment later, run:
conda deactivate
The first time, update pip:
pip install pip --upgrade
Then install the findspark Python module:
pip install findspark
The findspark library provides the class method findspark.init(), which looks up where the Spark home directory is located. For findspark.init() to work, the SPARK_HOME environment variable must already be set and point to the Apache Spark home directory.
To invoke findspark.init(), run it at the beginning of your Python script:
import findspark
findspark.init()
You need to run findspark.init() especially when you write your code in a Jupyter notebook.
You do not need to run findspark.init() if you start Python using pyspark, which already knows where the Spark home directory is located.
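Putting the pieces together, a minimal sketch of a standalone script looks like the following. It assumes SPARK_HOME is already set and Spark is installed; the app name "findspark-demo" is just an illustrative choice.

```python
import os

# findspark.init() relies on SPARK_HOME, so fail early with a clear message.
if os.environ.get("SPARK_HOME") is None:
    raise RuntimeError("Set SPARK_HOME to your Apache Spark home directory first")

import findspark
findspark.init()  # adds Spark's Python libraries to sys.path

# Only after findspark.init() can pyspark be imported outside the pyspark shell.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("findspark-demo").getOrCreate()
print(spark.range(5).count())  # small smoke test: counts rows 0..4
spark.stop()
```

Run it with a plain `python script.py` from the activated spark environment; no `pyspark` launcher is needed.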
On Windows:
conda activate spark
%SPARK_HOME%\bin\pyspark
On Linux or Mac:
conda activate spark
$SPARK_HOME/bin/pyspark
Following is the pyspark prompt:
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/

Using Python version 3.6.10 (default, Jan  7 2020 21:14:29)
SparkSession available as 'spark'.
>>>
Next, install the spylon-kernel library so Jupyter notebook can run Scala commands interactively.
Open an Anaconda command prompt as administrator on Windows, or a terminal on Mac or Linux:
conda activate spark
pip install spylon-kernel
Create a kernel spec for Jupyter notebook:
python -m spylon_kernel install --user
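If you want to confirm the kernel spec was registered before launching Jupyter, you can list the installed kernels (the exact paths in the output will vary by machine):

```shell
jupyter kernelspec list
```

spylon_kernel should appear in the list alongside the default python3 kernel.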
Now you can test it out: start Jupyter notebook from Anaconda Navigator -> spark virtual environment and click New; you will see spylon-kernel available in the drop-down list.
Open a notebook by clicking on spylon-kernel to run some Scala code, such as estimating Pi.
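As a sketch of such a notebook cell, the classic Monte Carlo Pi estimate can be written as below. It assumes the spylon-kernel has initialized a SparkSession bound to the name spark, as it does by default; the sample size n is an arbitrary choice.

```scala
// Run in a spylon-kernel notebook cell; `spark` is predefined by the kernel.
val n = 1000000
val inside = spark.sparkContext.parallelize(1 to n).filter { _ =>
  val x = math.random  // random point in the unit square
  val y = math.random
  x * x + y * y <= 1   // keep points inside the quarter circle
}.count()
println(s"Pi is roughly ${4.0 * inside / n}")
```

The ratio of points inside the quarter circle to the total approximates Pi/4, so the printed value should be close to 3.14.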