Dockerfile

Create a file called Dockerfile (the name matters: by default the docker build command looks for a file with exactly that name).
You also need to create 2 shell scripts, start-master.sh and start-worker.sh, to start up the Spark cluster with master and worker nodes.
Run vi Dockerfile, enter the lines below, then save and exit
FROM openjdk:8-alpine
RUN apk --update add wget tar bash
RUN wget http://mirrors.advancedhosters.com/apache/spark/spark-3.0.0-preview2/spark-3.0.0-preview2-bin-hadoop2.7.tgz
RUN tar -xzf spark-3.0.0-preview2-bin-hadoop2.7.tgz && mv spark-3.0.0-preview2-bin-hadoop2.7 /spark && rm spark-3.0.0-preview2-bin-hadoop2.7.tgz
COPY start-master.sh /start-master.sh
COPY start-worker.sh /start-worker.sh
Run vi start-master.sh, enter the lines below, then save and exit.
Do not forget the backslash \ at the end of each line; it is the shell's line-continuation character.
#!/bin/sh
/spark/bin/spark-class org.apache.spark.deploy.master.Master \
--ip $SPARK_LOCAL_IP \
--port $SPARK_MASTER_PORT \
--webui-port $SPARK_MASTER_WEBUI_PORT
chmod +x start-master.sh
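Note that start-master.sh reads three environment variables that are not defined anywhere in the image itself; they are expected to be supplied when the container starts (for example via docker run -e). As a sketch with hypothetical values (adjust them to your own setup), you would set them like this before the script runs:

```shell
# Hypothetical values for the variables start-master.sh expects;
# 7077 and 8080 are Spark's conventional defaults.
export SPARK_LOCAL_IP=0.0.0.0          # address the master binds to
export SPARK_MASTER_PORT=7077          # master RPC port
export SPARK_MASTER_WEBUI_PORT=8080    # master web UI port
echo "master will listen on $SPARK_LOCAL_IP:$SPARK_MASTER_PORT"
```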
Run vi start-worker.sh, enter the lines below, then save and exit.
#!/bin/sh
/spark/bin/spark-class org.apache.spark.deploy.worker.Worker \
--webui-port $SPARK_WORKER_WEBUI_PORT \
$SPARK_MASTER
chmod +x start-worker.sh
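Likewise, start-worker.sh expects SPARK_WORKER_WEBUI_PORT and SPARK_MASTER to be set in the container's environment. A sketch with hypothetical values (the spark:// URL must point at wherever your master is actually reachable):

```shell
# Hypothetical values for the variables start-worker.sh expects.
export SPARK_WORKER_WEBUI_PORT=8081            # worker web UI port
export SPARK_MASTER=spark://spark-master:7077  # URL of the running master
echo "worker will register with $SPARK_MASTER"
```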
Then build the docker image to be used in our class.
This assumes you are inside the docker_dir directory; if not, cd into it, because it contains the required Dockerfile.
Run the command below to build the Spark cluster docker image.
docker build -t spark_lab/spark:latest .
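Once the image is built, the master and worker containers still need the environment variables the two scripts read. One convenient way to wire everything together is a docker-compose file; the fragment below is a sketch under assumed values (service names, network addresses, and port mappings are hypothetical), not part of the original lab:

```yaml
# Hypothetical docker-compose sketch for the spark_lab/spark image.
version: "3"
services:
  spark-master:
    image: spark_lab/spark:latest
    command: /start-master.sh
    environment:
      - SPARK_LOCAL_IP=0.0.0.0
      - SPARK_MASTER_PORT=7077
      - SPARK_MASTER_WEBUI_PORT=8080
    ports:
      - "8080:8080"   # master web UI
      - "7077:7077"   # master RPC port
  spark-worker:
    image: spark_lab/spark:latest
    command: /start-worker.sh
    environment:
      - SPARK_WORKER_WEBUI_PORT=8081
      - SPARK_MASTER=spark://spark-master:7077
    depends_on:
      - spark-master
```

Here spark-worker reaches the master by the compose service name spark-master, which compose resolves on its default network.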