Dockerfile

Create a file called Dockerfile (the name does matter; the docker program looks for a file with that exact name by default)

We need to create two shell scripts, start-master.sh and start-worker.sh, to start up the Spark cluster's master and worker nodes

Run vi Dockerfile, enter the contents below, then save and exit

# Base image: JDK 8 on Alpine Linux
FROM openjdk:8-alpine
# Install the tools needed to download and unpack Spark
RUN apk --update add wget tar bash
# Download the Spark 3.0.0-preview2 binary distribution
RUN wget http://mirrors.advancedhosters.com/apache/spark/spark-3.0.0-preview2/spark-3.0.0-preview2-bin-hadoop2.7.tgz
# Unpack it to /spark and delete the archive to keep the image small
RUN tar -xzf spark-3.0.0-preview2-bin-hadoop2.7.tgz && mv spark-3.0.0-preview2-bin-hadoop2.7 /spark && rm spark-3.0.0-preview2-bin-hadoop2.7.tgz
# Copy the startup scripts into the image
COPY start-master.sh /start-master.sh
COPY start-worker.sh /start-worker.sh

Run vi start-master.sh, enter the contents below, then save and exit

Do not forget the backslash (\) at the end of each continued line; it is the shell's line-continuation character

#!/bin/sh
# Launch the Spark standalone master; the SPARK_* values are expected
# as environment variables when the container is started
/spark/bin/spark-class org.apache.spark.deploy.master.Master \
--ip "$SPARK_LOCAL_IP" \
--port "$SPARK_MASTER_PORT" \
--webui-port "$SPARK_MASTER_WEBUI_PORT"

Make the script executable (COPY preserves this bit inside the image):

chmod +x start-master.sh
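
Note that the three SPARK_* variables are not set anywhere in the image; they have to be supplied when the container starts. Below is a minimal sketch of how that might look once the image is built (see the docker build step later); the container name, hostname, and the default ports 7077 and 8080 are assumptions for illustration, not part of this lab:

# Hypothetical example: start a master container, passing the
# environment variables that start-master.sh expects
docker run -d --name spark-master -h spark-master \
  -e SPARK_LOCAL_IP=spark-master \
  -e SPARK_MASTER_PORT=7077 \
  -e SPARK_MASTER_WEBUI_PORT=8080 \
  -p 8080:8080 -p 7077:7077 \
  spark_lab/spark:latest /start-master.sh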

Run vi start-worker.sh, enter the contents below, then save and exit

#!/bin/sh
# Launch a Spark standalone worker and register it with the master
# given by SPARK_MASTER (a spark://host:port URL)
/spark/bin/spark-class org.apache.spark.deploy.worker.Worker \
--webui-port "$SPARK_WORKER_WEBUI_PORT" \
"$SPARK_MASTER"

Make this script executable as well:

chmod +x start-worker.sh
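
Similarly, a worker container needs SPARK_MASTER pointing at the master's spark:// URL. A minimal sketch, assuming the master container from the earlier example and Docker's default bridge network with --link (the names and ports here are illustrative assumptions):

# Hypothetical example: start a worker that connects to the master above
docker run -d --name spark-worker -h spark-worker \
  --link spark-master:spark-master \
  -e SPARK_WORKER_WEBUI_PORT=8081 \
  -e SPARK_MASTER=spark://spark-master:7077 \
  -p 8081:8081 \
  spark_lab/spark:latest /start-worker.sh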

Then build the Docker image to be used in our class.

This assumes you are inside the docker_dir directory; if not, cd into it, since that is where the required Dockerfile lives.

Run the command below to build the Spark cluster image:

docker build -t spark_lab/spark:latest .
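
Once the build finishes, you can confirm the image was tagged correctly; this check is just a quick sketch (the exact output columns depend on your Docker version):

# List local images for the repository we just tagged
docker images spark_lab/spark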
