You are now in SPARK_HOME. In this example, SPARK_HOME=/home/bigdata2/spark/spark
Append the following to ~/.bashrc:
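The exact lines are not reproduced in the text. A typical set of entries, assuming Spark lives at the path above and Hadoop and Hive are installed under hypothetical sibling directories, might look like:

```shell
# Hypothetical install locations -- adjust to match your machine
export SPARK_HOME=/home/bigdata2/spark/spark
export HADOOP_HOME=/home/bigdata2/hadoop/hadoop
export HIVE_HOME=/home/bigdata2/hive/hive

# Put the Spark launcher scripts on the PATH
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH

# Let Spark find Hadoop's configuration (core-site.xml, hdfs-site.xml, etc.)
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
```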
Log out and log back in.
Now you are ready to work with Spark integrated with Hadoop and Hive.
Before starting the Spark master service, set the following environment variables manually on the command line:
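The specific variables are not listed in the text. When integrating a standalone Spark deployment with an existing Hadoop installation, the variables commonly set by hand look like this (values are hypothetical):

```shell
# Put Hadoop's jars on Spark's classpath so Spark can talk to HDFS/YARN/Hive.
# Requires the `hadoop` command to be on the PATH.
export SPARK_DIST_CLASSPATH=$(hadoop classpath)

# Make Hadoop's native libraries visible to Spark (hypothetical path)
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native:$LD_LIBRARY_PATH
```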
To start the Spark cluster, run:
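The command itself is not shown. For a standalone deployment, the standard launcher scripts shipped in $SPARK_HOME/sbin do this:

```shell
# Start the master plus every worker listed in conf/workers
# (the file is named conf/slaves in older Spark releases)
$SPARK_HOME/sbin/start-all.sh

# Or start the daemons individually:
$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/start-worker.sh spark://<master-host>:7077
```

The master listens for workers and applications on port 7077 by default, and serves its web UI on port 8080.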
Once the Spark cluster is up with master and worker nodes (in our cluster, the Spark master and worker run on the same machine), you can view cluster information by connecting to the server on port 8080.
The environment is now ready: you can develop Spark code on your workstation and deploy it to the Spark cluster for execution.
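Deployment is typically done with spark-submit. A minimal sketch, assuming a hypothetical application jar app.jar with main class example.Main (substitute your own build artifacts and master hostname):

```shell
# Hypothetical jar and class names -- replace with your own
$SPARK_HOME/bin/spark-submit \
  --master spark://<master-host>:7077 \
  --class example.Main \
  /path/to/app.jar
```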
Working with Apache Spark means extensive coding against the APIs provided by the Spark libraries, which cover SQL, machine learning, streaming, and graph computation.
Spark supports Scala, Python, Java and R.
In our class, we will focus only on Scala and Python for all hands-on programming.
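To give a flavor of that API-driven style in Scala, here is a minimal word-count sketch using the core RDD API. The master URL and input path are placeholders, not values from this guide:

```scala
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    // Placeholder master URL -- use your cluster's spark://host:7077 address
    val spark = SparkSession.builder()
      .appName("WordCount")
      .master("spark://<master-host>:7077")
      .getOrCreate()

    // Placeholder input path on HDFS
    val counts = spark.sparkContext
      .textFile("hdfs:///path/to/input.txt")
      .flatMap(_.split("\\s+"))   // split each line into words
      .map(word => (word, 1))     // pair each word with a count of 1
      .reduceByKey(_ + _)         // sum the counts per word

    counts.take(10).foreach(println)
    spark.stop()
  }
}
```

The same pipeline can be written nearly line-for-line in Python with PySpark, which is why the class can move between the two languages freely.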