# Download and install Spark

Make sure to choose Spark 3.0.0 pre-built for Apache Hadoop 2.7, which is the version the code in this course will run against.

<https://spark.apache.org/downloads.html>

![](https://2100080250-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M1PNTHVApkPePuMdTu3%2F-M1b3VYrmhuBaQNyUzeU%2F-M1dZfozDtknswxrB0BM%2Fspark.jpg?alt=media\&token=678d6f9d-e2d7-4149-a1ad-b36ae1d13867)

If you are using Windows and do not have WinZip or WinRAR installed, there are free decompression tools that can expand the .tgz archive in which Spark is distributed:

<https://beebom.com/winzip-winrar-alternatives/>

On Windows, you also need to set up Hadoop support by downloading winutils.exe:

<https://github.com/steveloughran/winutils>

Specifically, you can download this winutils.exe file directly:

<https://github.com/steveloughran/winutils/blob/master/hadoop-2.7.1/bin/winutils.exe>

First create the winutils folders by opening a cmd window as administrator:

```
mkdir c:\winutils
mkdir c:\winutils\bin
```

Then download winutils.exe into the c:\winutils\bin folder.
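If you prefer the command line, recent Windows 10 builds ship with curl, so you can fetch the file directly into the folder created above (this assumes the c:\winutils\bin folder already exists):

```
curl -L -o c:\winutils\bin\winutils.exe https://github.com/steveloughran/winutils/raw/master/hadoop-2.7.1/bin/winutils.exe
```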

After downloading, test that it runs:

```
c:\winutils\bin\winutils.exe
```

By now you have downloaded and extracted Spark and winutils.exe (the Hadoop utility for Windows). The next task is to set environment variables in Windows control panel->system->advanced system settings->environment variables. Make sure all environment variables:

*Must NOT contain any blank spaces, even at the end*

*Only create/modify user variables, NOT system variables*

Set up the following user environment variables.

Set the SPARK\_HOME environment variable to point to the Spark home directory, in our case:

SPARK\_HOME=c:\spark\spark

Append %SPARK\_HOME%\bin to the PATH environment variable.

Set the HADOOP\_HOME environment variable to point to the Hadoop home directory, in our case:

HADOOP\_HOME=c:\winutils

Append %HADOOP\_HOME%\bin to the PATH environment variable.
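If you prefer the command line, the two home variables can also be created from a cmd window with setx, which writes user environment variables by default:

```
setx SPARK_HOME "c:\spark\spark"
setx HADOOP_HOME "c:\winutils"
```

Note that setx only affects cmd windows opened afterwards, and PATH itself is safer to edit through the GUI, since setx truncates values longer than 1024 characters.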

Set up the default /tmp/hive directory that Spark needs. On Windows this means you need to create the folder yourself.

Open a cmd window as administrator:

```
mkdir c:\tmp
mkdir c:\tmp\hive
```

Then set permission by

```
%HADOOP_HOME%\bin\winutils.exe chmod -R 777 c:\tmp\hive
```

Also set the temporary directories Spark will use by creating user environment variables TEMP and TMP:

TEMP=c:\temp

TMP=c:\tmp

If c:\temp and/or c:\tmp do not exist, please create them.
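One way to sanity-check the variables above is a short Python script (a minimal sketch, assuming Python is installed; it only verifies that each variable is set, has no stray whitespace, and points to an existing folder):

```python
import os

def check_spark_env():
    """Return a list of problems with the environment variables this
    guide asks you to create (an empty list means all looks good)."""
    problems = []
    for name in ("SPARK_HOME", "HADOOP_HOME", "TEMP", "TMP"):
        value = os.environ.get(name)
        if value is None:
            problems.append(f"{name} is not set")
        elif value != value.strip():
            # Catches the trailing-blank mistake warned about above
            problems.append(f"{name} contains leading/trailing whitespace")
        elif not os.path.isdir(value):
            problems.append(f"{name} points to a missing folder: {value}")
    return problems

if __name__ == "__main__":
    for problem in check_spark_env():
        print(problem)
```

Run it in a new cmd window (so the freshly created variables are picked up); no output means the four variables are set and point at existing folders.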

Make sure you have defined the environment variables in Windows control panel->system->advanced system settings->environment variables:

![](https://2100080250-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M1PNTHVApkPePuMdTu3%2F-M7F8qIoATR18PHfBC4j%2F-M7Qy7WbXfVDPdb8GwMW%2Fwindows_env.jpg?alt=media\&token=b163d4d3-165a-4ab3-957b-e7a0eae961c2)

The PATH environment variable should be similar to:

![](https://2100080250-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M1PNTHVApkPePuMdTu3%2F-M7R47nLF6AsbHJw0WWo%2F-M7R8Cq3u8vdChQ7BlcP%2Fwindows_env2.jpg?alt=media\&token=b12fa711-0d1b-4e11-af83-c00acd0716fa)
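As a final check, open a new cmd window (so the variables above are in effect) and launch the Spark shell:

```
%SPARK_HOME%\bin\spark-shell
```

If the setup is correct, you should get a Scala prompt with a SparkSession available; type :quit to exit.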

Then you are done with Spark setup.
