Create a file called Dockerfile (the name matters; docker looks for a file with exactly that name).
Run vi Dockerfile, enter the content below, then save and exit:
FROM openjdk:8-alpine
RUN apk --update add wget tar bash
# Fetch the Spark 3.0.0-preview distribution from the Apache archive
RUN wget https://archive.apache.org/dist/spark/spark-3.0.0-preview/spark-3.0.0-preview-bin-hadoop2.7.tgz
RUN tar -xzf spark-3.0.0-preview-bin-hadoop2.7.tgz && \
    mv spark-3.0.0-preview-bin-hadoop2.7 /spark && \
    rm spark-3.0.0-preview-bin-hadoop2.7.tgz
COPY start-master.sh /start-master.sh
COPY start-worker.sh /start-worker.sh
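The two startup scripts created in the next steps read their settings from environment variables. One option (an assumption, not part of the original) is to bake defaults into the Dockerfile with ENV, using Spark's standard ports (7077 for the master, 8080/8081 for the web UIs); any of these can still be overridden with docker run -e later. The spark-master hostname here is hypothetical:

# Default settings for the startup scripts (hostname assumed)
ENV SPARK_LOCAL_IP=spark-master \
    SPARK_MASTER_PORT=7077 \
    SPARK_MASTER_WEBUI_PORT=8080 \
    SPARK_WORKER_WEBUI_PORT=8081 \
    SPARK_MASTER=spark://spark-master:7077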
Next, create two shell scripts, start-master.sh and start-worker.sh, to start the Spark cluster's master and worker nodes.
Run vi start-master.sh, enter the content below, then save and exit:
#!/bin/sh
/spark/bin/spark-class org.apache.spark.deploy.master.Master \
    --ip $SPARK_LOCAL_IP \
    --port $SPARK_MASTER_PORT \
    --webui-port $SPARK_MASTER_WEBUI_PORT
Make the script executable:
chmod +x start-master.sh
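For example, with SPARK_LOCAL_IP=spark-master (a hostname assumed for illustration) and Spark's default ports 7077 and 8080, the script expands to:

/spark/bin/spark-class org.apache.spark.deploy.master.Master \
    --ip spark-master --port 7077 --webui-port 8080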
Run vi start-worker.sh, enter the content below, then save and exit:
#!/bin/sh
/spark/bin/spark-class org.apache.spark.deploy.worker.Worker \
    --webui-port $SPARK_WORKER_WEBUI_PORT \
    $SPARK_MASTER
Make the script executable:
chmod +x start-worker.sh
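The worker's final argument, $SPARK_MASTER, must point at the master in Spark's standard spark://host:port form, for example (hostname assumed):

SPARK_MASTER=spark://spark-master:7077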
Then build the Docker image to be used in our class.
This assumes you are inside the docker_dir directory; if not, cd into it, since it contains the required Dockerfile.
Run the command below to build the Spark cluster Docker image:
docker build -t spark_lab/spark:latest .
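With the image built, you can bring up a small cluster. The commands below are a sketch, not from the original: the network name, container names, and hostnames are assumptions, and the environment variables match the ones the two startup scripts expect.

# Create a user-defined network so the worker can reach the master by name (names assumed)
docker network create spark_network
# Start the master, exposing its web UI on port 8080
docker run -d --name spark-master -h spark-master --network spark_network \
    -e SPARK_LOCAL_IP=spark-master -e SPARK_MASTER_PORT=7077 -e SPARK_MASTER_WEBUI_PORT=8080 \
    -p 8080:8080 spark_lab/spark:latest /start-master.sh
# Start a worker, pointing it at the master's spark:// URL
docker run -d --name spark-worker -h spark-worker --network spark_network \
    -e SPARK_WORKER_WEBUI_PORT=8081 -e SPARK_MASTER=spark://spark-master:7077 \
    -p 8081:8081 spark_lab/spark:latest /start-worker.sh

If everything started correctly, the master's web UI should be reachable at http://localhost:8080 and the worker's at http://localhost:8081.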