# Dockerfile

Create a file called `Dockerfile`. The name matters: `docker build` looks for a file with that name by default.

You also need two shell scripts, `start-master.sh` and `start-worker.sh`, to start the Spark cluster's master and worker nodes.

Open the file with `vi Dockerfile`, enter the following, then save and exit:

```
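# Base image: OpenJDK 8 on Alpine Linux (Spark needs a JVM)
FROM openjdk:8-alpine
# Install the tools used below and by Spark's launch scripts
RUN apk --update add wget tar bash
# Download the Spark 3.0.0-preview2 distribution built for Hadoop 2.7
RUN wget http://mirrors.advancedhosters.com/apache/spark/spark-3.0.0-preview2/spark-3.0.0-preview2-bin-hadoop2.7.tgz
# Unpack to /spark and delete the archive
RUN tar -xzf spark-3.0.0-preview2-bin-hadoop2.7.tgz && mv spark-3.0.0-preview2-bin-hadoop2.7 /spark && rm spark-3.0.0-preview2-bin-hadoop2.7.tgz
# Copy the startup scripts into the image root
COPY start-master.sh /start-master.sh
COPY start-worker.sh /start-worker.sh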
```
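The archive name, the unpacked directory, and the download URL in the Dockerfile all follow one pattern built from the Spark version and the Hadoop profile. The shell sketch below shows that pattern, which may help when adapting the Dockerfile to another release. The variable names are hypothetical and not part of the original setup.

```shell
#!/bin/sh
# Hypothetical variable names; the values are taken from the Dockerfile above.
SPARK_VERSION="3.0.0-preview2"
HADOOP_PROFILE="hadoop2.7"
MIRROR="http://mirrors.advancedhosters.com/apache/spark"

# Directory name inside the archive, and the archive file name
SPARK_DIST="spark-${SPARK_VERSION}-bin-${HADOOP_PROFILE}"
SPARK_TGZ="${SPARK_DIST}.tgz"

# Full download URL, as used by the wget step
SPARK_URL="${MIRROR}/spark-${SPARK_VERSION}/${SPARK_TGZ}"

echo "$SPARK_URL"
```

Changing only `SPARK_VERSION` and `HADOOP_PROFILE` keeps the three names consistent, provided the mirror actually hosts that release.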

Open the file with `vi start-master.sh`, enter the following, then save and exit.

Do not forget the backslash `\` at the end of each continued line; it is the line-continuation character.

```
#!/bin/sh
/spark/bin/spark-class org.apache.spark.deploy.master.Master \
--ip $SPARK_LOCAL_IP \
--port $SPARK_MASTER_PORT \
--webui-port $SPARK_MASTER_WEBUI_PORT
```

Make the script executable with `chmod +x start-master.sh`.
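`start-master.sh` reads three environment variables that must be supplied when the container starts. The sketch below shows how the command line is assembled from them. The fallback values are assumptions for illustration: 7077 and 8080 are Spark's usual master and web UI ports, and the loopback address is only a placeholder. The sketch prints the command rather than running Spark.

```shell
#!/bin/sh
# Assumed fallbacks, used only when the variable is unset or empty.
: "${SPARK_LOCAL_IP:=127.0.0.1}"        # placeholder address
: "${SPARK_MASTER_PORT:=7077}"          # Spark's usual master port
: "${SPARK_MASTER_WEBUI_PORT:=8080}"    # Spark's usual master web UI port

# Build the same command that start-master.sh runs, as one string
MASTER_CMD="/spark/bin/spark-class org.apache.spark.deploy.master.Master"
MASTER_CMD="$MASTER_CMD --ip $SPARK_LOCAL_IP"
MASTER_CMD="$MASTER_CMD --port $SPARK_MASTER_PORT"
MASTER_CMD="$MASTER_CMD --webui-port $SPARK_MASTER_WEBUI_PORT"

# Print it for inspection instead of executing it
echo "$MASTER_CMD"
```

If any of the three variables is empty when the real script runs, the matching flag is left without a value, so it is worth checking that all of them are set.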

Open the file with `vi start-worker.sh`, enter the following, then save and exit:

```
#!/bin/sh
/spark/bin/spark-class org.apache.spark.deploy.worker.Worker \
--webui-port $SPARK_WORKER_WEBUI_PORT \
$SPARK_MASTER
```

Make the script executable with `chmod +x start-worker.sh`.
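The worker script passes `$SPARK_MASTER` as the address of the master to register with. Spark expects a master URL of the form `spark://host:port`. The sketch below shows how such a value could be composed. The host name `spark-master` is a hypothetical placeholder, and 7077 and 8081 are assumed here as Spark's usual master and worker web UI ports.

```shell
#!/bin/sh
# Hypothetical master host name; replace with the real master's address.
MASTER_HOST="spark-master"
MASTER_PORT=7077                 # must match SPARK_MASTER_PORT on the master

# Values consumed by start-worker.sh
SPARK_MASTER="spark://${MASTER_HOST}:${MASTER_PORT}"
SPARK_WORKER_WEBUI_PORT=8081     # assumed: Spark's usual worker web UI port
export SPARK_MASTER SPARK_WORKER_WEBUI_PORT

echo "worker will register with $SPARK_MASTER"
```

The port in `SPARK_MASTER` has to be the one the master was started with, otherwise the worker cannot register.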

Then build the Docker image used in this class.

This assumes you are inside the `docker_dir` directory. If not, `cd` into it, because it contains the required `Dockerfile`.

Run the following to build the Spark cluster Docker image:

```
docker build -t spark_lab/spark:latest .
```
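The build needs all three files in the build directory, and the two scripts should already be executable, since `COPY` carries the file mode into the image. The helper below is a sketch of a pre-build check. The function name `check_build_context` is hypothetical and not part of Docker.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 only if the directory is ready for the build.
check_build_context() {
  dir="$1"
  problems=0
  # All three files must be present
  for f in Dockerfile start-master.sh start-worker.sh; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f"
      problems=$((problems + 1))
    fi
  done
  # Both scripts must be executable
  for s in start-master.sh start-worker.sh; do
    if [ -f "$dir/$s" ] && [ ! -x "$dir/$s" ]; then
      echo "not executable: $s"
      problems=$((problems + 1))
    fi
  done
  [ "$problems" -eq 0 ]
}
```

One way to use it is `check_build_context . && docker build -t spark_lab/spark:latest .`, so the build only starts when the check passes.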

