MaxAbsScaler
MaxAbsScaler transforms a dataset of Vector rows, rescaling each feature to the range [-1, 1] by dividing by the maximum absolute value of that feature. It does not shift or center the data, and therefore does not destroy sparsity (see the sparse-input sketch after the example below).
MaxAbsScaler computes summary statistics on a dataset and produces a MaxAbsScalerModel, which can then transform each feature individually to the range [-1, 1].
import org.apache.spark.ml.feature.MaxAbsScaler
import org.apache.spark.ml.linalg.Vectors

val dataFrame = spark.createDataFrame(Seq(
  (0, Vectors.dense(1.0, 0.1, -8.0)),
  (1, Vectors.dense(2.0, 1.0, -4.0)),
  (2, Vectors.dense(4.0, 10.0, 8.0))
)).toDF("id", "features")

val scaler = new MaxAbsScaler()
  .setInputCol("features")
  .setOutputCol("scaledFeatures")

// Compute summary statistics and generate a MaxAbsScalerModel
val scalerModel = scaler.fit(dataFrame)

// Rescale each feature to the range [-1, 1]
val scaledData = scalerModel.transform(dataFrame)
scaledData.select("features", "scaledFeatures").show()

/*
Output:
+--------------+----------------+
|      features|  scaledFeatures|
+--------------+----------------+
|[1.0,0.1,-8.0]|[0.25,0.01,-1.0]|
|[2.0,1.0,-4.0]|  [0.5,0.1,-0.5]|
|[4.0,10.0,8.0]|   [1.0,1.0,1.0]|
+--------------+----------------+
*/
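The per-feature maximum absolute values here are 4.0, 10.0, and 8.0, so the first row rescales to [1.0/4.0, 0.1/10.0, -8.0/8.0] = [0.25, 0.01, -1.0].

Because the transform only divides each entry by a per-feature constant, zero entries stay zero and sparse input stays sparse. The following is a minimal sketch of that behavior, assuming the same SparkSession `spark` as above; the sparse input vectors are illustrative.

import org.apache.spark.ml.feature.MaxAbsScaler
import org.apache.spark.ml.linalg.Vectors

// Illustrative sparse input: only the listed (index, value) pairs are stored.
val sparseDF = spark.createDataFrame(Seq(
  (0, Vectors.sparse(3, Seq((0, 1.0), (2, -8.0)))),
  (1, Vectors.sparse(3, Seq((1, 10.0))))
)).toDF("id", "features")

val sparseModel = new MaxAbsScaler()
  .setInputCol("features")
  .setOutputCol("scaledFeatures")
  .fit(sparseDF)

// Zero entries are untouched by the division, so the scaled vectors remain sparse.
sparseModel.transform(sparseDF).show(truncate = false)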