k-means is one of the most commonly used clustering algorithms. It partitions the data points into a predefined number of clusters, k, assigning each point to the cluster whose center is nearest.
import org.apache.spark.ml.clustering.KMeans

// Load the sample data in LIBSVM format.
val dataset = spark.read.format("libsvm").load("file:///opt/spark/data/mllib/sample_libsvm_data.txt")

// Train a k-means model with k = 2 clusters and a fixed seed for reproducibility.
val kmeans = new KMeans().setK(2).setSeed(1L)
val model = kmeans.fit(dataset)

// Evaluate the clustering by computing the Within Set Sum of Squared Errors (WSSSE).
// Note: computeCost was deprecated in Spark 2.4 and removed in Spark 3.0.
val WSSSE = model.computeCost(dataset)
println(s"Within Set Sum of Squared Errors = $WSSSE")

// Show the resulting cluster centers.
println("Cluster Centers: ")
model.clusterCenters.foreach(println)
Within Set Sum of Squared Errors = 2.1480729824555594E8
Cluster Centers:
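To make the Within Set Sum of Squared Errors figure more concrete, here is a minimal sketch of the k-means iteration in plain Scala, without Spark. It is not Spark's implementation: the names `KMeansSketch`, `run`, and `wssse` are made up for this illustration, the initial centers are supplied by the caller, and the loop runs for a fixed number of iterations rather than testing for convergence.

```scala
// Illustrative sketch only; KMeansSketch and its methods are hypothetical names,
// not part of Spark ML.
object KMeansSketch {
  type Point = Vector[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def sqDist(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the center closest to p.
  def nearest(p: Point, centers: Seq[Point]): Int =
    centers.indices.minBy(i => sqDist(p, centers(i)))

  // Component-wise mean of a non-empty group of points.
  def mean(ps: Seq[Point]): Point =
    ps.transpose.map(col => col.sum / col.size).toVector

  // Repeat the two k-means steps a fixed number of times:
  // assign each point to its nearest center, then move each center
  // to the mean of its assigned points (an empty cluster keeps its center).
  def run(data: Seq[Point], init: Seq[Point], iterations: Int): Seq[Point] =
    (1 to iterations).foldLeft(init) { (centers, _) =>
      val groups = data.groupBy(p => nearest(p, centers))
      centers.indices.map(i => groups.get(i).map(mean).getOrElse(centers(i)))
    }

  // WSSSE: sum over all points of the squared distance to the nearest center.
  def wssse(data: Seq[Point], centers: Seq[Point]): Double =
    data.map(p => sqDist(p, centers(nearest(p, centers)))).sum
}
```

For example, with the four points (0, 0), (0, 1), (10, 10), (10, 11) and initial centers (0, 0) and (10, 10), `KMeansSketch.run` settles on the centers (0, 0.5) and (10, 10.5), and `KMeansSketch.wssse` for those centers is 1.0. A lower WSSSE for the same k means the points sit closer to their assigned centers.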