Normalizer
Normalizer is a Transformer which transforms a dataset of Vector rows, normalizing each Vector to have unit norm. It takes a parameter p, which specifies the p-norm used for normalization (p = 2 by default). This normalization can help standardize your input data and improve the behavior of learning algorithms.
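Concretely, the transformer divides each Vector by its p-norm, ||v||_p = (|v_1|^p + ... + |v_n|^p)^(1/p). With the default p = 2, for example, the vector [3.0, 4.0] has norm sqrt(9.0 + 16.0) = 5.0 and is rescaled to [0.6, 0.8]; with p set to infinity, the norm is the largest absolute value in the vector. The Scala example below applies the L^1 and L^inf norms explicitly.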
import org.apache.spark.ml.feature.Normalizer
import org.apache.spark.ml.linalg.Vectors

val dataFrame = spark.createDataFrame(Seq(
  (0, Vectors.dense(1.0, 0.5, -1.0)),
  (1, Vectors.dense(2.0, 1.0, 1.0)),
  (2, Vectors.dense(4.0, 10.0, 2.0))
)).toDF("id", "features")

// Normalize each Vector using $L^1$ norm.
val normalizer = new Normalizer()
  .setInputCol("features")
  .setOutputCol("normFeatures")
  .setP(1.0)

val l1NormData = normalizer.transform(dataFrame)
println("Normalized using L^1 norm")
l1NormData.show()

// Normalize each Vector using $L^\infty$ norm.
val lInfNormData = normalizer.transform(dataFrame, normalizer.p -> Double.PositiveInfinity)
println("Normalized using L^inf norm")
lInfNormData.show()

/*
Output:
Normalized using L^1 norm
+---+--------------+------------------+
| id|      features|      normFeatures|
+---+--------------+------------------+
|  0|[1.0,0.5,-1.0]|    [0.4,0.2,-0.4]|
|  1| [2.0,1.0,1.0]|   [0.5,0.25,0.25]|
|  2|[4.0,10.0,2.0]|[0.25,0.625,0.125]|
+---+--------------+------------------+

Normalized using L^inf norm
+---+--------------+--------------+
| id|      features|  normFeatures|
+---+--------------+--------------+
|  0|[1.0,0.5,-1.0]|[1.0,0.5,-1.0]|
|  1| [2.0,1.0,1.0]| [1.0,0.5,0.5]|
|  2|[4.0,10.0,2.0]| [0.4,1.0,0.2]|
+---+--------------+--------------+
*/
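The example above never exercises the default p = 2 setting mentioned in the description. The following is a minimal sketch of the default L^2 behavior, reusing the dataFrame and column names from above (output not shown; for instance, [1.0,0.5,-1.0] has L^2 norm 1.5 and maps to roughly [0.667,0.333,-0.667]).

// Normalize each Vector using the default $L^2$ norm (no setP call needed).
val l2Normalizer = new Normalizer()
  .setInputCol("features")
  .setOutputCol("normFeatures")

val l2NormData = l2Normalizer.transform(dataFrame)
println("Normalized using L^2 norm")
l2NormData.show()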