Install findspark, add spylon-kernel for Scala
Install the Python findspark library so that a standalone Python script or a Jupyter notebook can run a Spark application outside the pyspark shell.
Install the spylon-kernel for Jupyter notebook so that you can run Scala code interactively inside a notebook.
For Windows:
To install the findspark library for Python, open an Anaconda command prompt as administrator and run:
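```
pip install findspark
```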
For Linux or Mac:
Simply open a terminal by starting the terminal app, or connect remotely with putty.exe from a Windows machine or with ssh from a non-Windows machine.
Activate the virtual environment spark you created earlier, which has Python 3.6 and Jupyter notebook:
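```
conda activate spark    # on older conda releases: source activate spark
```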
If you want to exit the virtual environment later, run:
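```
conda deactivate    # on older conda releases: source deactivate
```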
The first time, update pip:
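```
python -m pip install --upgrade pip
```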
Then install the findspark Python module:
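```
pip install findspark
```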
The findspark library provides the function findspark.init(), which looks up where the Spark home directory is located. For findspark.init() to work, the SPARK_HOME environment variable must already be set and point to the Apache Spark home directory.
Call findspark.init() at the beginning of your Python script:
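A minimal sketch of such a script (the application name here is arbitrary):

```python
import findspark
findspark.init()  # reads SPARK_HOME and adds pyspark to sys.path

import pyspark    # this import only works after findspark.init()
sc = pyspark.SparkContext(appName="myApp")
```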
You especially need to run findspark.init() when you write your code in a Jupyter notebook.
You do not need to run findspark.init() if you start Python with the pyspark shell, which already knows where the Spark home directory is located.
On Windows:
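For example, in an Anaconda command prompt (assuming SPARK_HOME is set as described above):

```
%SPARK_HOME%\bin\pyspark
```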
On Linux or Mac:
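For example (again assuming SPARK_HOME is set):

```
$SPARK_HOME/bin/pyspark
```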
Following is the pyspark prompt:
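The startup banner varies with your Spark and Python versions, but it looks roughly like this:

```
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/

Using Python version 3.6.5
SparkSession available as 'spark'.
>>>
```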
Next, install the spylon-kernel library so that Jupyter notebook can run Scala commands.
Open an Anaconda command prompt as administrator on Windows, or a terminal on Mac or Linux, and run:
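```
pip install spylon-kernel
```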
Then create a kernel spec so that Jupyter notebook can find the kernel:
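```
python -m spylon_kernel install
```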
Now you can test it out: start Jupyter notebook from the spark virtual environment in Anaconda Navigator and click New; you will see spylon-kernel in the drop-down list.
Open a notebook by clicking on spylon-kernel and run some Scala code, such as estimating Pi:
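A minimal sketch of the classic Monte Carlo Pi estimate; spylon-kernel creates a SparkContext for the notebook and exposes it as sc, and the sample count n here is arbitrary:

```scala
// Draw n random points in the unit square and count how many
// fall inside the quarter circle; the ratio approximates Pi/4.
val n = 100000
val count = sc.parallelize(1 to n).map { _ =>
  val x = math.random
  val y = math.random
  if (x * x + y * y <= 1) 1 else 0
}.reduce(_ + _)
println(s"Pi is roughly ${4.0 * count / n}")
```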