
Create a file called Dockerfile (the name matters: the docker program looks for a file with exactly that name)

Run vi Dockerfile, enter the content below, then save and exit:

FROM openjdk:8-alpine

RUN apk --update add wget tar bash

RUN wget http://apache.mirrors.lucidnetworks.net/spark/spark-3.0.0-preview/spark-3.0.0-preview-bin-hadoop2.7.tgz

RUN tar -xzf spark-3.0.0-preview-bin-hadoop2.7.tgz && \
    mv spark-3.0.0-preview-bin-hadoop2.7 /spark && \
    rm spark-3.0.0-preview-bin-hadoop2.7.tgz

COPY start-master.sh /start-master.sh

COPY start-worker.sh /start-worker.sh
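
The two COPY instructions expect start-master.sh and start-worker.sh to sit next to the Dockerfile in the build context; they are written in the next steps, so the directory should end up looking like this (docker_dir is the directory name used in the build step at the end of this page):

ls docker_dir
# Dockerfile  start-master.sh  start-worker.sh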

You need to create two shell scripts, start-master.sh and start-worker.sh, to start the Spark cluster's master and worker nodes.

Run vi start-master.sh, enter the content below, then save and exit:

#!/bin/sh

/spark/bin/spark-class org.apache.spark.deploy.master.Master \
    --ip $SPARK_LOCAL_IP \
    --port $SPARK_MASTER_PORT \
    --webui-port $SPARK_MASTER_WEBUI_PORT

Make the script executable (COPY preserves the file mode, so it will still be executable inside the image):

chmod +x start-master.sh
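
start-master.sh reads three environment variables that are not set anywhere in the image; they have to be supplied when the container is started. The values below are only an illustration (7077 and 8080 are Spark's conventional master RPC and web UI ports), not something fixed by the script:

# Illustrative values only; supply these when starting the master container
export SPARK_LOCAL_IP=spark-master       # hostname/IP the master binds to
export SPARK_MASTER_PORT=7077            # Spark's default master RPC port
export SPARK_MASTER_WEBUI_PORT=8080      # Spark's default master web UI port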

Run vi start-worker.sh, enter the content below, then save and exit:

#!/bin/sh

/spark/bin/spark-class org.apache.spark.deploy.worker.Worker \
    --webui-port $SPARK_WORKER_WEBUI_PORT \
    $SPARK_MASTER

chmod +x start-worker.sh
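
Likewise, start-worker.sh expects two environment variables at container start time. SPARK_MASTER must be the URL of the master in spark://host:port form; the values here are just an example that matches the master settings shown above:

# Illustrative values only; supply these when starting the worker container
export SPARK_WORKER_WEBUI_PORT=8081             # Spark's default worker web UI port
export SPARK_MASTER=spark://spark-master:7077   # URL of the master started above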

Then build the Docker image that will be used in our class.

This assumes you are inside the docker_dir directory; if not, cd into it, since it contains the Dockerfile required for the build.

Run the command below to build the Spark cluster Docker image:

docker build -t spark_lab/spark:latest .
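
Once the build finishes you can confirm the image exists and, if you want to try it right away, start a master and a worker container from it. The container names, the spark_net network, and the port mappings below are assumptions for illustration, not part of the class setup:

docker image ls spark_lab/spark

docker network create spark_net

docker run -d --name spark-master --hostname spark-master --network spark_net \
    -e SPARK_LOCAL_IP=spark-master \
    -e SPARK_MASTER_PORT=7077 \
    -e SPARK_MASTER_WEBUI_PORT=8080 \
    -p 8080:8080 \
    spark_lab/spark /start-master.sh

docker run -d --name spark-worker --network spark_net \
    -e SPARK_WORKER_WEBUI_PORT=8081 \
    -e SPARK_MASTER=spark://spark-master:7077 \
    -p 8081:8081 \
    spark_lab/spark /start-worker.sh

If both containers start cleanly, the master web UI should be reachable at http://localhost:8080 and should show one registered worker.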
