# Install findspark, add spylon-kernel for scala

Install the Python findspark library so that standalone Python scripts and Jupyter notebooks can run Spark applications outside of the pyspark shell.

Install the spylon-kernel for Jupyter notebook so you can run Scala code interactively inside a notebook.

For Windows:

To install the findspark library for Python, open an Anaconda command prompt, *run as administrator*:

![](https://2100080250-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M1PNTHVApkPePuMdTu3%2F-M7Zom9wteJuj03v3myp%2F-M7ZpRAi-AyQzAmsVZug%2Fanaconda_windows_01.jpg?alt=media\&token=009691b6-8925-4ee5-b66a-528a9066304e)

For Linux or Mac:

Simply open a terminal: start the terminal app locally, connect with putty.exe from a Windows machine, or use ssh from a non-Windows machine.

Activate the virtual environment `spark` you created earlier, which has Python 3.6 and Jupyter notebook:

```
conda activate spark
```

To exit the virtual environment, run:

```
conda deactivate
```

If this is the first time, upgrade pip:

```
pip install pip --upgrade
```

Then install the findspark Python module:

```
pip install findspark
```

The findspark library provides the function findspark.init(), which looks up where the Spark home directory is located. For findspark.init() to work, the SPARK\_HOME environment variable must already be set and point to the Apache Spark home directory.
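Conceptually, findspark.init() resolves SPARK\_HOME and makes Spark's bundled Python bindings importable. The following is a rough sketch of that idea (an approximation for illustration, not findspark's actual source):

```python
import glob
import os
import sys

def init_sketch(spark_home=None):
    """Approximate what findspark.init() does: locate Spark via the
    SPARK_HOME environment variable and put Spark's Python bindings
    on sys.path. Illustrative only, not findspark's real code."""
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        raise ValueError("SPARK_HOME is not set")
    # Spark ships its Python API under $SPARK_HOME/python,
    # plus a bundled py4j zip under python/lib.
    sys.path.insert(0, os.path.join(spark_home, "python"))
    for zip_path in glob.glob(
            os.path.join(spark_home, "python", "lib", "py4j-*.zip")):
        sys.path.insert(0, zip_path)
    return spark_home
```

This is why the SPARK\_HOME variable must be set first: without it there is nothing to resolve.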

To invoke findspark.init(), run it at the beginning of your Python script:

```
import findspark
findspark.init()
```

You need to run findspark.init() especially when you write your code in a Jupyter notebook:

![](https://2100080250-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M1PNTHVApkPePuMdTu3%2F-M7Zvt75P8MWz_mkdyp8%2F-M7ZwYZy0I5x3Q38-GjW%2Ffindspark.jpg?alt=media\&token=72c4bf92-b782-4ab6-ad45-e0af24702618)

You do not need to run findspark.init() if you start Python through pyspark, which already knows where the Spark home directory is located.

On Windows:

```
conda activate spark
%SPARK_HOME%\bin\pyspark
```

On Linux or Mac:

```
conda activate spark
$SPARK_HOME/bin/pyspark
```
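The two invocations above differ only in how each shell expands the SPARK\_HOME variable; both end up running the same launcher under Spark's bin directory. A small cross-platform sketch (a hypothetical helper, not part of Spark or conda) of the path being run:

```python
def pyspark_launcher(spark_home, windows=False):
    """Build the path to the pyspark launcher under a Spark home.
    Hypothetical illustration of %SPARK_HOME%\\bin\\pyspark on
    Windows versus $SPARK_HOME/bin/pyspark on Linux or Mac."""
    sep = "\\" if windows else "/"
    return sep.join([spark_home, "bin", "pyspark"])
```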

The pyspark prompt looks like this:

```
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/

Using Python version 3.6.10 (default, Jan  7 2020 21:14:29)
SparkSession available as 'spark'.
>>>

```

Next, install the spylon-kernel library so that Jupyter notebook can run Scala commands.

Open an Anaconda command prompt as administrator on Windows, or a terminal on Mac or Linux:

```
conda activate spark
pip install spylon-kernel
```

Create a kernel spec for Jupyter notebook:

```
python -m spylon_kernel install --user
```
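With `--user`, the kernel spec is registered in your per-user Jupyter data directory. A sketch of the standard default locations (assuming Jupyter's defaults and no JUPYTER\_DATA\_DIR override):

```python
import os
import sys

def user_kernels_dir():
    """Default per-user Jupyter kernels directory, where a --user
    kernel spec such as spylon-kernel is registered (assumes
    Jupyter's standard paths, no JUPYTER_DATA_DIR override)."""
    if sys.platform == "win32":
        return os.path.join(os.environ.get("APPDATA", ""), "jupyter", "kernels")
    if sys.platform == "darwin":
        return os.path.expanduser("~/Library/Jupyter/kernels")
    return os.path.expanduser("~/.local/share/jupyter/kernels")
```

You can also confirm the registration with `jupyter kernelspec list`, which prints every installed kernel and its location.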

Now you can test it out: start Jupyter notebook from the spark virtual environment in Anaconda Navigator, click New, and you will see spylon-kernel in the drop-down list.

![](https://2100080250-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M1PNTHVApkPePuMdTu3%2F-M7ExKguzlYaIWojXeqb%2F-M7F8VAFb-Yp6HkiWAMw%2Fjupyter-spylon.jpg?alt=media\&token=c9b16852-5194-4b0f-a70b-a1fb73657133)

Open a notebook by clicking spylon-kernel and run some Scala code, such as estimating Pi:

![](https://2100080250-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M1PNTHVApkPePuMdTu3%2F-M7ExKguzlYaIWojXeqb%2F-M7F8lpbPWlzRCGZEQsd%2Fspylon_pi.jpg?alt=media\&token=25c193aa-17cd-4508-ae16-23ead8798c0d)
