
You are now in the SPARK_HOME directory.

In my example, SPARK_HOME=/home/bigdata2/spark/spark

Append the following lines to ~/.bashrc:

export SPARK_HOME=/home/bigdata2/spark/spark

export PATH=$SPARK_HOME/bin:$PATH

Log out and log back in so that the new variables take effect.
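
After logging back in, it can help to confirm that both variables took effect. The following is a minimal Python sketch, not part of the original setup: the helper name `check_spark_env` is hypothetical, and it only assumes that SPARK_HOME should be set and that $SPARK_HOME/bin should appear on PATH, as configured in ~/.bashrc.

```python
import os


def check_spark_env(env):
    """Return a list of problems found in the given environment mapping."""
    problems = []
    spark_home = env.get("SPARK_HOME", "")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
        return problems
    # The ~/.bashrc lines are expected to put $SPARK_HOME/bin on PATH.
    expected_bin = os.path.join(spark_home, "bin")
    path_entries = env.get("PATH", "").split(os.pathsep)
    if expected_bin not in path_entries:
        problems.append(expected_bin + " is not on PATH")
    return problems


# Check the current shell environment and report the result.
issues = check_spark_env(os.environ)
print("OK" if not issues else "\n".join(issues))
```

If the script prints anything other than OK, re-check the two export lines in ~/.bashrc and start a new login session.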

You are now ready to work with Spark, which will integrate with Hadoop and Hive.

Before starting the Spark master service, set the following environment variable manually on the command line:

export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native

To start the Spark cluster, run:

$SPARK_HOME/sbin/start-all.sh

Once the Spark cluster is running with its master and worker nodes (in our cluster, the Spark master and worker nodes are on the same machine), you can see the cluster information by connecting to the server on port 8080.
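
A quick way to confirm that the master's web interface is listening is to try a TCP connection to port 8080. The sketch below uses only the Python standard library; the helper name `port_is_open` and the host `localhost` are assumptions for illustration, so substitute the address of your own server.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# "localhost" is a placeholder; use the address of the Spark master machine.
print("Spark master UI reachable:", port_is_open("localhost", 8080))
```

A result of False usually means the cluster has not started yet or that the port is blocked between your workstation and the server.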

The environment is now ready: you can start developing Spark code on your development workstation and deploy it to the Spark cluster, which will run it.
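
Deploying to a standalone cluster is typically done with spark-submit, passing the master URL through --master. As a rough sketch, the helper below only assembles such a command and does not run it; `build_submit_command`, the host `localhost`, and the script name `my_app.py` are hypothetical, and 7077 is the standalone master's default port, which your cluster may override.

```python
import os


def build_submit_command(spark_home, master_host, app_path, master_port=7077):
    """Assemble a spark-submit command line for a standalone Spark master."""
    spark_submit = os.path.join(spark_home, "bin", "spark-submit")
    master_url = "spark://{}:{}".format(master_host, master_port)
    return [spark_submit, "--master", master_url, app_path]


# Placeholder values; replace them with your own master host and application.
cmd = build_submit_command(
    os.environ.get("SPARK_HOME", "/home/bigdata2/spark/spark"),
    "localhost",
    "my_app.py",
)
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run` once the placeholders point at a real application and master.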

Working with Apache Spark means writing a lot of code against the APIs of its libraries, which cover SQL, machine learning, streaming, and graph computing.

Spark supports Scala, Python, Java and R.

In our class, we will focus only on Scala and Python for all hands-on programming.
