Latent Dirichlet allocation or LDA
Latent Dirichlet allocation (LDA) is implemented as an Estimator that supports both EMLDAOptimizer and OnlineLDAOptimizer, and generates an LDAModel as the base model.
import org.apache.spark.ml.clustering.LDA

// Loads data.
val dataset = spark.read.format("libsvm")
  .load("file:///opt/spark/data/mllib/sample_libsvm_data.txt")

// Trains an LDA model.
val lda = new LDA().setK(10).setMaxIter(10)
val model = lda.fit(dataset)

val ll = model.logLikelihood(dataset)
val lp = model.logPerplexity(dataset)
println(s"The lower bound on the log likelihood of the entire corpus: $ll")
println(s"The upper bound on perplexity: $lp")

// Describes each topic by its 3 top-weighted terms.
val topics = model.describeTopics(3)
println("The topics described by their top-weighted terms:")
topics.show(false)

/*
Output:

The lower bound on the log likelihood of the entire corpus: -1.2892413081774298E7
The upper bound on perplexity: 5.296387332859924
The topics described by their top-weighted terms:
+-----+---------------+--------------------------------------------------------------------+
|topic|termIndices    |termWeights                                                         |
+-----+---------------+--------------------------------------------------------------------+
|0    |[569, 597, 598]|[0.01048331300016006, 0.010116199318706288, 0.009445101538413367]   |
|1    |[233, 261, 205]|[0.022233380425792586, 0.01683403182240119, 0.014646518135972245]   |
|2    |[342, 343, 553]|[0.01101987068888023, 0.010051896006494202, 0.009974954658255184]   |
|3    |[125, 124, 331]|[0.010249053484287323, 0.008001789628260321, 0.007856951022221307]  |
|4    |[406, 434, 378]|[0.016726468396808178, 0.016551662166306314, 0.016312669466501947]  |
|5    |[301, 272, 538]|[0.011187985574975348, 0.01026802560070681, 0.009910574054908557]   |
|6    |[265, 237, 181]|[0.016311101183295176, 0.01450491494274881, 0.013849316888254096]   |
|7    |[542, 514, 682]|[0.0426212232584461, 0.040669536800267865, 0.04004669879586029]     |
|8    |[48, 420, 421] |[0.001968951371791888, 0.0018823651925661982, 0.0018553426747176778]|
|9    |[664, 637, 465]|[0.04727237035523583, 0.04361701605039732, 0.03568842133530933]     |
+-----+---------------+--------------------------------------------------------------------+
*/
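The example above leaves the optimizer at its default. As a minimal sketch of how one of the two optimizers named at the top of this page might be selected explicitly, the snippet below passes a string to `setOptimizer`: `"em"` corresponds to EMLDAOptimizer and `"online"` to OnlineLDAOptimizer. The helper name `fitWithOptimizer` is hypothetical and not part of Spark, and the values of `k` and `maxIter` are simply copied from the example above. The top-level `def` assumes a context such as spark-shell; in a compiled application it would need to live inside an object.

```scala
import org.apache.spark.ml.clustering.{LDA, LDAModel}
import org.apache.spark.sql.DataFrame

// Hypothetical helper (not a Spark API): fits an LDA model using the
// optimizer selected by name. Accepted names are "em" and "online".
def fitWithOptimizer(dataset: DataFrame, optimizer: String): LDAModel =
  new LDA()
    .setK(10)                // number of topics, as in the example above
    .setMaxIter(10)          // iteration cap, as in the example above
    .setOptimizer(optimizer) // "em" or "online"
    .fit(dataset)
```

Assuming the `dataset` DataFrame loaded earlier, `fitWithOptimizer(dataset, "em")` should return a model whose `isDistributed` flag is set, while `"online"` should return a local model. The distributed model keeps the training corpus alongside the topics, so it is typically the heavier of the two to store and reuse.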