Install findspark, add spylon-kernel for Scala
Install the Python findspark library so that a standalone Python script or Jupyter notebook can run a Spark application outside the pyspark shell.
Install the spylon-kernel for Jupyter notebook to run Scala code interactively inside a notebook.
For Windows:
To install the findspark library for Python, open an Anaconda command prompt as administrator.

For Linux or Mac:
Simply open a terminal, either by starting the terminal app, by connecting with putty.exe from a Windows machine, or with ssh from a non-Windows machine.
On either platform, activate the virtual environment spark that you created earlier with Python 3.6 and Jupyter notebook:
conda activate spark
To exit the virtual environment later, run:
conda deactivate
The first time, also update pip:
pip install pip --upgrade
Then install the findspark Python module:
pip install findspark
The findspark library provides the function findspark.init(), which locates the Spark home directory. For findspark.init() to work, the SPARK_HOME environment variable must already be set and point to the Apache Spark home directory.
Call findspark.init() at the beginning of your Python script:
import findspark
findspark.init()
You especially need to run findspark.init() when you write your code in a Jupyter notebook.
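For example, a minimal standalone script might look like the following sketch (it assumes SPARK_HOME is already set; the app name "demo" is just an illustration):
import findspark
findspark.init()  # locate Spark before importing pyspark
from pyspark.sql import SparkSession
# Build or reuse a local SparkSession
spark = SparkSession.builder.master("local[*]").appName("demo").getOrCreate()
print(spark.range(10).count())  # quick sanity check, prints 10
spark.stop()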

You do not need to run findspark.init() if you start Python through pyspark itself, since pyspark already knows where the Spark home directory is located.
On Windows:
conda activate spark
%SPARK_HOME%\bin\pyspark
On Linux or Mac:
conda activate spark
$SPARK_HOME/bin/pyspark
The pyspark prompt looks like the following:
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/

Using Python version 3.6.10 (default, Jan  7 2020 21:14:29)
SparkSession available as 'spark'.
>>>
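At the >>> prompt you can run a quick sanity check using the spark session the shell created, for example:
>>> spark.range(5).count()
5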
Next, install the spylon-kernel library so that Jupyter notebook can run Scala code.
Open an Anaconda command prompt as administrator on Windows, or a terminal on Mac or Linux:
conda activate spark
pip install spylon-kernel
Create a kernel spec for Jupyter notebook:
python -m spylon_kernel install --user
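You can verify that the kernel spec was registered by listing the installed kernels:
jupyter kernelspec list
An entry for the Spylon kernel should appear in the output.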
Now you can test it out: start Jupyter notebook from Anaconda Navigator under the spark virtual environment, click New, and you will see spylon-kernel available in the drop-down list.

Open a notebook by clicking on spylon-kernel and run some Scala code, such as estimating Pi; see the sketch below.
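For example, here is a minimal Monte Carlo sketch of the classic Pi estimation (it assumes the kernel has started its Spark session, so sc is available; NUM_SAMPLES is an arbitrary choice):
// Estimate Pi: sample random points in the unit square and
// count how many fall inside the quarter circle
val NUM_SAMPLES = 100000
val count = sc.parallelize(1 to NUM_SAMPLES).filter { _ =>
  val x = math.random
  val y = math.random
  x * x + y * y < 1
}.count()
println(s"Pi is roughly ${4.0 * count / NUM_SAMPLES}")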
