Data Visualization with Vegas Viz and Scala with Spark ML

If you are a Python programmer working in data science, you are almost certainly familiar with Matplotlib for visualizing classification or regression results, especially in Jupyter Notebook. Matplotlib's pyplot module is the standard tool for data visualization, but there is one problem: it is only for Python.

If you code in Scala, you will need an alternative. One of the great tools for data visualization in Scala is Vegas.

Vegas works with the Jupyter-scala kernel, which, unlike spylon-kernel for example, is not necessarily tightly integrated with Apache Spark. The Jupyter-scala kernel is lightweight and great for pure Scala coding; I have used it for Scala work that did not need Apache Spark, for example:

https://github.com/geyungjen/jentekllc/blob/master/Spark/Scala/Enrich_Json/ScalaJsonChallengeNoSparkNew.ipynb

At the time of writing, Vegas-viz is available from the Maven repository only for Scala versions up to 2.11. If you are running Scala 2.12, it is not yet available there, which means you cannot download the required jar files from Maven.
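To check whether a given kernel falls in that range, a small helper can compare the running Scala version against the 2.11 ceiling. This is a minimal sketch: the object and method names are hypothetical, and the "up to 2.11" rule simply encodes the statement in this article rather than querying Maven.

```scala
import scala.util.Try

// Hypothetical helper: encodes "Vegas-viz on Maven only up to Scala 2.11".
object VegasScalaCheck {

  // Parses "major.minor[.patch]" into (major, minor); None if the string is malformed.
  def majorMinor(version: String): Option[(Int, Int)] =
    version.split('.').toList match {
      case major :: minor :: _ =>
        for {
          ma <- Try(major.toInt).toOption
          mi <- Try(minor.toInt).toOption
        } yield (ma, mi)
      case _ => None
    }

  // True for Scala 2.x with x <= 11, false otherwise (including unparsable input).
  def vegasAvailable(version: String): Boolean =
    majorMinor(version).exists { case (ma, mi) => ma == 2 && mi <= 11 }

  def main(args: Array[String]): Unit = {
    // Version of the Scala library on the current classpath, e.g. "2.11.12".
    val running = scala.util.Properties.versionNumberString
    println(s"Scala $running, Vegas-viz from Maven: ${vegasAvailable(running)}")
  }
}
```

In a notebook cell, `VegasScalaCheck.vegasAvailable(scala.util.Properties.versionNumberString)` gives the same answer without calling `main`.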

The use case I would like to demonstrate is integrating the lightweight Jupyter-scala kernel, now called Almond, with Apache Spark.

Because Vegas-viz needs Scala 2.11 or earlier, you can set up Apache Spark 2.4.4 with Hadoop 2.7, which ships with Scala 2.11.

This assumes you already have Jupyter Notebook installed. Then simply follow the Almond installation instructions on the Almond website to install the Jupyter-scala kernel.

Once it is installed, you are ready to use Jupyter-scala in Jupyter Notebook together with the Vegas-viz data visualization tool.
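For reference, a first notebook cell could pull in Spark and Vegas-viz with Almond's `import $ivy` syntax. This is a sketch under assumptions: the artifact coordinates and versions (Spark 2.4.4, Vegas 0.3.11) are my choices to match the Scala 2.11 setup, and the renderer line is adapted from the Vegas README's jupyter-scala instructions, so check it against the versions you actually install.

```scala
// Almond notebook cell: `import $ivy` is kernel (Ammonite) syntax, not plain Scala source.
// Assumed versions, chosen to match Spark 2.4.4 on Scala 2.11.
import $ivy.`org.apache.spark::spark-sql:2.4.4`
import $ivy.`org.apache.spark::spark-mllib:2.4.4`
import $ivy.`org.vegas-viz::vegas:0.3.11`

import vegas._

// Assumption: routes Vegas' HTML output into the notebook via Almond's kernel.publish handle.
implicit val render = vegas.render.ShowHTML(kernel.publish.html(_))
```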

Here is a simple Scala example that runs Linear Regression from the Apache Spark ML library under the Almond/Jupyter-scala kernel in Jupyter Notebook.
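As a stand-in for that notebook code, here is a minimal sketch of linear regression with Spark ML, assuming the Spark 2.4.4 `spark-sql` and `spark-mllib` artifacts are on the classpath. The object name, the local master, and the toy data lying exactly on y = 2x + 1 are hypothetical. `VectorAssembler` packs the input column into the single `features` vector column that `LinearRegression` expects.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.{DataFrame, SparkSession}

object LinearRegressionSketch {

  // Local session; master and app name are illustrative choices.
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("vegas-linreg-sketch")
    .getOrCreate()

  // Fits label ~ x and returns (predictions DataFrame, slope, intercept).
  def fitAndPredict(): (DataFrame, Double, Double) = {
    import spark.implicits._

    // Hypothetical toy data lying exactly on y = 2x + 1.
    val raw = Seq(1.0, 2.0, 3.0, 4.0, 5.0)
      .map(x => (x, 2.0 * x + 1.0))
      .toDF("x", "label")

    // Spark ML estimators read a single vector column of features.
    val assembled = new VectorAssembler()
      .setInputCols(Array("x"))
      .setOutputCol("features")
      .transform(raw)

    val model = new LinearRegression()
      .setFeaturesCol("features")
      .setLabelCol("label")
      .fit(assembled)

    // transform adds a "prediction" column next to x, label and features.
    (model.transform(assembled), model.coefficients(0), model.intercept)
  }
}
```

In the notebook, `val (preds, slope, intercept) = LinearRegressionSketch.fitAndPredict()` followed by `preds.show()` displays the predictions; with no regularization and data this clean, the fitted slope and intercept should come out very close to 2 and 1.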

With Vegas-viz, you can visualize whatever you see fit from the datasets you have in Scala.
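As one illustration, a scatter plot of actual versus predicted values can be built from the predictions DataFrame returned by a fitted `LinearRegressionModel`'s `transform`. This sketch assumes the `vegas` artifact is loaded and, for `.show`, that an implicit renderer is in scope. `PredictionPlot` and its methods are hypothetical names, and the rows are collected to the driver and passed to `withData` rather than going through the vegas-spark extension.

```scala
import org.apache.spark.sql.DataFrame
import vegas._

object PredictionPlot {

  // Turns (label, prediction) pairs into the Seq[Map[String, Any]] rows Vegas' withData accepts.
  def toPlotRows(pairs: Seq[(Double, Double)]): Seq[Map[String, Any]] =
    pairs.map { case (label, prediction) =>
      Map("label" -> label, "prediction" -> prediction)
    }

  // Pulls label/prediction to the driver; fine for small results only.
  def rowsFrom(predictions: DataFrame): Seq[Map[String, Any]] =
    toPlotRows(
      predictions.select("label", "prediction").collect().toSeq
        .map(r => (r.getDouble(0), r.getDouble(1)))
    )

  // Scatter plot spec: actual label on x, model prediction on y.
  def plot(rows: Seq[Map[String, Any]]) =
    Vegas("Actual vs. predicted").
      withData(rows).
      encodeX("label", Quant).
      encodeY("prediction", Quant).
      mark(Point)
}
```

For example, `PredictionPlot.plot(PredictionPlot.rowsFrom(preds)).show` renders the chart in the notebook; points close to the diagonal indicate predictions close to the actual labels.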

As always, the code used in this post is on my GitHub site:
