Install findspark, add spylon-kernel for Scala

Install the Python findspark library so that a standalone Python script or a Jupyter notebook can run a Spark application outside of the pyspark shell.

Install the Spylon kernel for Jupyter notebook to run Scala code interactively inside a notebook.

For Windows:

To install the findspark library for Python, open an Anaconda command prompt and run it as administrator.

For Linux or Mac:

Open a terminal by starting the terminal app, by connecting with putty.exe from a Windows machine, or by connecting with ssh from a non-Windows machine.

Activate the spark virtual environment you created earlier, which has Python 3.6 and Jupyter notebook:

conda activate spark

To exit the virtual environment later, run:

conda deactivate

The first time you use the environment, upgrade pip:

pip install pip --upgrade

Then install the findspark Python module:

pip install findspark

The findspark library provides the function findspark.init(), which locates the Spark home directory and adds PySpark to the Python path. The simplest setup is to have the SPARK_HOME environment variable already set and pointing to the Apache Spark home directory; findspark.init() also accepts the path as an optional argument.
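Before calling findspark.init(), it can help to confirm that SPARK_HOME is set and points to a real directory. The following is a minimal sketch using only the Python standard library; the helper name describe_spark_home is hypothetical and is not part of findspark.

```python
import os
from pathlib import Path


def describe_spark_home(env=None):
    """Return a short diagnostic about SPARK_HOME (hypothetical helper)."""
    # Use the real process environment unless a mapping is passed in.
    env = os.environ if env is None else env
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        return "SPARK_HOME is not set"
    if not Path(spark_home).is_dir():
        return f"SPARK_HOME points to a missing directory: {spark_home}"
    return f"SPARK_HOME looks usable: {spark_home}"


if __name__ == "__main__":
    print(describe_spark_home())
```

If the message reports a missing or unset variable, fix SPARK_HOME first and then open a new terminal or command prompt so the change is picked up.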

To invoke findspark.init(), run it at the beginning of your Python script:

import findspark
findspark.init()

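Calling findspark.init() is especially important in a Jupyter notebook, because the notebook's Python kernel is not started through pyspark and does not otherwise know where Spark is installed.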
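As an illustration, a first notebook cell might look like the sketch below. It assumes Spark and findspark are installed and SPARK_HOME is set; the function name start_local_session, the application name, and the local[*] master are choices made for this example only.

```python
def start_local_session(app_name="findspark-demo"):
    """Locate Spark with findspark, then build a local SparkSession."""
    import findspark

    # Must run before pyspark is imported, so that pyspark is on sys.path.
    findspark.init()

    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master("local[*]")  # assumption: run Spark locally on all cores
        .appName(app_name)
        .getOrCreate()
    )


# Usage in a notebook cell (requires a working Spark installation):
#   spark = start_local_session()
#   spark.range(5).show()
#   spark.stop()
```

The imports sit inside the function so that findspark.init() is guaranteed to run before pyspark is imported.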

You do not need to call findspark.init() if you start Python through pyspark, which already knows where the Spark home directory is located.

On Windows:

conda activate spark
%SPARK_HOME%\bin\pyspark

On Linux or Mac:

conda activate spark
$SPARK_HOME/bin/pyspark

The pyspark prompt looks like this:

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/

Using Python version 3.6.10 (default, Jan  7 2020 21:14:29)
SparkSession available as 'spark'.
>>>

Next, install the Spylon kernel so that Jupyter notebook can run Scala commands.

On Windows, open an Anaconda command prompt as administrator; on Mac or Linux, open a terminal:

conda activate spark
pip install spylon-kernel

Create a kernel spec for Jupyter notebook:

python -m spylon_kernel install --user
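To confirm that the kernel spec was registered, you can ask Jupyter for its list of kernels. The sketch below uses KernelSpecManager from jupyter_client, which is installed alongside Jupyter; the helper names are hypothetical, and the kernel name spylon-kernel is an assumption based on the name shown in the notebook drop-down list.

```python
def find_kernel(specs, wanted="spylon-kernel"):
    """Return the resource directory for `wanted`, or None if it is absent."""
    return specs.get(wanted)


def installed_kernel_specs():
    """Ask Jupyter for its kernel specs (needs jupyter_client installed)."""
    from jupyter_client.kernelspec import KernelSpecManager

    # Maps each kernel name to the directory that holds its kernel.json.
    return KernelSpecManager().find_kernel_specs()


# Usage (inside the spark virtual environment):
#   location = find_kernel(installed_kernel_specs())
#   print(location or "spylon-kernel is not registered")
```

Running jupyter kernelspec list in the terminal shows the same information.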

Now test it: start Jupyter notebook from Anaconda Navigator in the spark virtual environment and click New. spylon-kernel should appear in the drop-down list.

Open a notebook by selecting spylon-kernel and run some Scala code, such as an estimate of Pi.


