Dockerfile
Create a file called Dockerfile (the name matters: by default, docker looks for a file with exactly that name)
You also need to create two shell scripts, start-master.sh and start-worker.sh, to start up a Spark cluster with master and worker nodes
vi Dockerfile, enter the content below, then save and exit
FROM openjdk:8-alpine
RUN apk --update add wget tar bash
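# download Spark; if this mirror link goes stale, the same tgz is archived at
# https://archive.apache.org/dist/spark/spark-3.0.0-preview2/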
RUN wget http://mirrors.advancedhosters.com/apache/spark/spark-3.0.0-preview2/spark-3.0.0-preview2-bin-hadoop2.7.tgz
RUN tar -xzf spark-3.0.0-preview2-bin-hadoop2.7.tgz && mv spark-3.0.0-preview2-bin-hadoop2.7 /spark && rm spark-3.0.0-preview2-bin-hadoop2.7.tgz
COPY start-master.sh /start-master.sh
COPY start-worker.sh /start-worker.sh
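That is the entire Dockerfile. Note that it sets no ENTRYPOINT or CMD: you choose which script to run when you start a container, and both scripts read their settings from environment variables supplied at that point (see the run sketch after the build step). If you prefer defaults baked into the image, one option is to add an ENV instruction such as the sketch below; the values are simply Spark's standard ports.
ENV SPARK_MASTER_PORT=7077 \
    SPARK_MASTER_WEBUI_PORT=8080 \
    SPARK_WORKER_WEBUI_PORT=8081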
vi start-master.sh, enter the content below, then save and exit
Do not forget the backslash \ at the end of each continued line; the shell treats it as a line continuation
#!/bin/sh
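# start a standalone Spark master; the three SPARK_* variables must be set in the container's environment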
/spark/bin/spark-class org.apache.spark.deploy.master.Master \
--ip $SPARK_LOCAL_IP \
--port $SPARK_MASTER_PORT \
--webui-port $SPARK_MASTER_WEBUI_PORT
chmod +x start-master.sh
vi start-worker.sh, enter the content below, then save and exit
#!/bin/sh
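# start a Spark worker and register it with the master; SPARK_MASTER is the master URL, e.g. spark://<master-host>:7077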
/spark/bin/spark-class org.apache.spark.deploy.worker.Worker \
--webui-port $SPARK_WORKER_WEBUI_PORT \
$SPARK_MASTER
chmod +x start-worker.sh
Then build the Docker image we will use in class.
This assumes you are inside the docker_dir directory; if not, cd into it, because the build needs the Dockerfile (and the two scripts) to be there.
Run the command below to build the Spark cluster image
docker build -t spark_lab/spark:latest .
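Once the build finishes, you can start a master and a worker from the image. Below is a minimal sketch, not the only way to do it: the container and network names are made up for illustration, and 7077, 8080, and 8081 are simply Spark's standard master, master web UI, and worker web UI ports.

docker network create spark_net

docker run -d --name spark-master --hostname spark-master --network spark_net \
  -e SPARK_LOCAL_IP=spark-master \
  -e SPARK_MASTER_PORT=7077 \
  -e SPARK_MASTER_WEBUI_PORT=8080 \
  -p 8080:8080 \
  spark_lab/spark:latest /start-master.sh

docker run -d --name spark-worker-1 --network spark_net \
  -e SPARK_WORKER_WEBUI_PORT=8081 \
  -e SPARK_MASTER=spark://spark-master:7077 \
  -p 8081:8081 \
  spark_lab/spark:latest /start-worker.sh

If it worked, the master's web UI at http://localhost:8080 should list one registered worker.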