Python with Apache Spark using Jupyter notebook

Now let’s run the Python version of the pi program. Start Anaconda Navigator and select the virtual environment spark.
Click Jupyter Notebook.
In the Jupyter Notebook, you need to import findspark and run findspark.init(), which locates the Spark installation that SPARK_HOME points to.
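As a quick sanity check, a minimal notebook cell like the sketch below confirms which Spark installation findspark picked up (the printed path is only an example and depends on your machine):

import findspark
findspark.init()         # locates Spark via SPARK_HOME or common install paths
print(findspark.find())  # prints the Spark home directory that was found, e.g. /opt/spark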
The full Python script, pi.py, is listed below; outside the notebook you can also run it directly with:
python pi.py
#!/usr/bin/env python
# coding: utf-8
from __future__ import print_function

import findspark
findspark.init()

import sys
from random import random
from operator import add
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("PythonPi").getOrCreate()

partitions = 1
n = 100000 * partitions

def f(_):
    # Sample a random point in the square [-1, 1] x [-1, 1] and
    # return 1 if it falls inside the unit circle, 0 otherwise.
    x = random() * 2 - 1
    y = random() * 2 - 1
    return 1 if x ** 2 + y ** 2 <= 1 else 0

count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
print("Pi is roughly %f" % (4.0 * count / n))

spark.stop()
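The estimate works because a uniformly random point in the square lands inside the unit circle with probability π/4, so 4.0 * count / n converges to π as n grows. For a tighter estimate you can raise the number of partitions, which also raises the sample count. A minimal variation is sketched below; the value 10 is just an illustration, and it assumes the spark session and the function f from the script above are still defined:

partitions = 10
n = 100000 * partitions  # more partitions -> more random samples
count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
print("Pi is roughly %f" % (4.0 * count / n))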