MinMaxScaler

MinMaxScaler transforms a dataset of Vector rows, rescaling each feature to a specific range (often [0, 1]). It takes parameters:
- min: 0.0 by default. Lower bound after transformation, shared by all features.
- max: 1.0 by default. Upper bound after transformation, shared by all features.
MinMaxScaler computes summary statistics on a dataset and produces a MinMaxScalerModel. The model can then transform each feature individually so that it falls in the given range: for a feature with observed column minimum E_min and maximum E_max, a value e is rescaled to ((e - E_min) / (E_max - E_min)) * (max - min) + min.
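As a quick sanity check, the formula can be written out by hand. The rescale helper below is our own sketch, not part of the Spark API; the numbers come from the example that follows.

// Minimal sketch of the rescaling formula above.
// `rescale` is a hypothetical helper, not a Spark API.
def rescale(e: Double, eMin: Double, eMax: Double,
            min: Double = 0.0, max: Double = 1.0): Double =
  ((e - eMin) / (eMax - eMin)) * (max - min) + min

// First feature of the example below: column min 1.0, column max 3.0.
rescale(2.0, eMin = 1.0, eMax = 3.0) // 0.5, matching the middle row

The full example: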
import org.apache.spark.ml.feature.MinMaxScaler
import org.apache.spark.ml.linalg.Vectors

val dataFrame = spark.createDataFrame(Seq(
  (0, Vectors.dense(1.0, 0.1, -1.0)),
  (1, Vectors.dense(2.0, 1.1, 1.0)),
  (2, Vectors.dense(3.0, 10.1, 3.0))
)).toDF("id", "features")

val scaler = new MinMaxScaler()
  .setInputCol("features")
  .setOutputCol("scaledFeatures")

// Compute summary statistics and generate a MinMaxScalerModel
val scalerModel = scaler.fit(dataFrame)

// Rescale each feature to the range [min, max]
val scaledData = scalerModel.transform(dataFrame)
println(s"Features scaled to range: [${scaler.getMin}, ${scaler.getMax}]")
scaledData.select("features", "scaledFeatures").show()

/*
Output:
Features scaled to range: [0.0, 1.0]
+--------------+--------------+
|      features|scaledFeatures|
+--------------+--------------+
|[1.0,0.1,-1.0]| [0.0,0.0,0.0]|
| [2.0,1.1,1.0]| [0.5,0.1,0.5]|
|[3.0,10.1,3.0]| [1.0,1.0,1.0]|
+--------------+--------------+
*/
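If a different target range is needed, set the bounds before fitting. A minimal sketch, reusing the dataFrame defined above; the [-1, 1] range is an arbitrary illustration:

// Rescale each feature to [-1, 1] instead of the default [0, 1]
val symmetricScaler = new MinMaxScaler()
  .setInputCol("features")
  .setOutputCol("scaledFeatures")
  .setMin(-1.0)
  .setMax(1.0)

val symmetricModel = symmetricScaler.fit(dataFrame)
symmetricModel.transform(dataFrame).show()

// The fitted model also exposes the per-feature statistics it learned
println(symmetricModel.originalMin) // [1.0,0.1,-1.0]
println(symmetricModel.originalMax) // [3.0,10.1,3.0]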