Python with Apache Spark using Jupyter notebook
Now let's run the Python version of the pi program. Start Anaconda Navigator, select the spark virtual environment, and launch Jupyter Notebook.
In the notebook we first need to import findspark and call findspark.init(), which locates the Spark installation that SPARK_HOME points to and makes pyspark importable.
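If findspark cannot find Spark on its own (for example, when SPARK_HOME is not visible to the environment Jupyter was started from), you can pass the installation path explicitly. A minimal sketch, using an assumed example path rather than one from this walkthrough:

import findspark
# The path below is an assumed example location for the Spark installation;
# replace it with wherever Spark is actually unpacked on your machine.
findspark.init("/opt/spark")

With that in place, the full pi program for the notebook is: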
#!/usr/bin/env python
# coding: utf-8
from __future__ import print_function

# findspark locates the Spark installation via SPARK_HOME and adds pyspark
# to the Python path so it can be imported from the notebook.
import findspark
findspark.init()

from random import random
from operator import add
from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession for this notebook.
spark = SparkSession.builder.appName("PythonPi").getOrCreate()

partitions = 1
n = 100000 * partitions

def f(_):
    # Draw a random point in the square [-1, 1] x [-1, 1] and return 1
    # if it falls inside the unit circle, 0 otherwise.
    x = random() * 2 - 1
    y = random() * 2 - 1
    return 1 if x ** 2 + y ** 2 <= 1 else 0

# Count how many of the n random points land inside the unit circle;
# the fraction of hits approximates pi / 4.
count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
print("Pi is roughly %f" % (4.0 * count / n))
spark.stop()
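Running the cell prints a Monte Carlo approximation of pi (a value close to 3.14159); the exact digits vary from run to run because the sample points are random. To tighten the estimate, or to spread the work over more cores, you can raise partitions, which here also scales the number of samples n. A minimal sketch, assuming it is run in a cell before spark.stop() so the same SparkSession is still active:

# With more partitions the job runs as more parallel tasks, and since
# n = 100000 * partitions it also draws more samples, so the estimate of pi
# generally gets closer to 3.14159. The value 4 is just an example.
partitions = 4
n = 100000 * partitions
count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
print("Pi is roughly %f" % (4.0 * count / n))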