VectorAssembler
VectorAssembler is a transformer that combines a given list of columns into a single vector column. It is useful for combining raw features and features generated by different feature transformers into a single feature vector, in order to train ML models like logistic regression and decision trees. VectorAssembler accepts the following input column types: all numeric types, boolean type, and vector type. In each row, the values of the input columns will be concatenated into a vector in the specified order.
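To make the concatenation rule concrete, here is a minimal plain-Scala model of it. This is a sketch, not Spark's implementation: `AssembleSketch` and `assemble` are hypothetical names, a vector is represented as a plain `Array[Double]`, and booleans are assumed to map to 1.0/0.0 as a cast to double would.

```scala
// Plain-Scala model of the concatenation rule; NOT Spark's implementation.
object AssembleSketch {
  // Hypothetical helper: flattens the supported value kinds into one
  // Array[Double], preserving the order in which the values are given.
  def assemble(values: Seq[Any]): Array[Double] =
    values.flatMap {
      case b: Boolean          => Seq(if (b) 1.0 else 0.0) // assumption: true -> 1.0
      case n: java.lang.Number => Seq(n.doubleValue())     // any numeric type
      case v: Array[Double]    => v.toSeq                  // stand-in for a vector column
      case other =>
        throw new IllegalArgumentException(s"Unsupported value: $other")
    }.toArray

  def main(args: Array[String]): Unit = {
    // hour = 18, mobile = 1.0, userFeatures = [0.0, 10.0, 0.5]
    val row = Seq[Any](18, 1.0, Array(0.0, 10.0, 0.5))
    println(assemble(row).mkString("[", ", ", "]"))
    // prints: [18.0, 1.0, 0.0, 10.0, 0.5]
  }
}
```

Changing the order of the values changes the order of the entries in the result, which is why the order of the input columns matters.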
Examples
Assume that we have a DataFrame with the columns id, hour, mobile, userFeatures, and clicked:
 id | hour | mobile | userFeatures     | clicked
----|------|--------|------------------|---------
 0  | 18   | 1.0    | [0.0, 10.0, 0.5] | 1.0
userFeatures is a vector column that contains three user features. We want to combine hour, mobile, and userFeatures into a single feature vector called features and use it to predict clicked. If we set VectorAssembler's input columns to hour, mobile, and userFeatures and its output column to features, the transformation should produce the following DataFrame:
 id | hour | mobile | userFeatures     | clicked | features
----|------|--------|------------------|---------|-----------------------------
 0  | 18   | 1.0    | [0.0, 10.0, 0.5] | 1.0     | [18.0, 1.0, 0.0, 10.0, 0.5]
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.Vectors

// Assumes an active SparkSession named `spark` (for example, in spark-shell).
val dataset = spark.createDataFrame(
  Seq((0, 18, 1.0, Vectors.dense(0.0, 10.0, 0.5), 1.0))
).toDF("id", "hour", "mobile", "userFeatures", "clicked")

// Concatenate the three input columns, in this order, into "features".
val assembler = new VectorAssembler()
  .setInputCols(Array("hour", "mobile", "userFeatures"))
  .setOutputCol("features")

val output = assembler.transform(dataset)
println("Assembled columns 'hour', 'mobile', 'userFeatures' to vector column 'features'")
output.select("features", "clicked").show(false)

/*
Output:
+-----------------------+-------+
|features               |clicked|
+-----------------------+-------+
|[18.0,1.0,0.0,10.0,0.5]|1.0    |
+-----------------------+-------+
*/
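Since the assembled vector is meant to feed models such as logistic regression, the following sketch chains a VectorAssembler and a LogisticRegression in a Pipeline. It is one possible arrangement, not a prescribed one: the object name `AssemblerPipelineSketch`, the helper `buildPipeline`, the local SparkSession settings, and every training row are made up for illustration.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object AssemblerPipelineSketch {
  // Hypothetical helper: assembler followed by a classifier on its output column.
  def buildPipeline(): Pipeline = {
    val assembler = new VectorAssembler()
      .setInputCols(Array("hour", "mobile", "userFeatures"))
      .setOutputCol("features")

    val lr = new LogisticRegression()
      .setFeaturesCol("features") // reads the column the assembler writes
      .setLabelCol("clicked")
      .setMaxIter(10)

    new Pipeline().setStages(Array(assembler, lr))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is acceptable for a quick experiment.
    val spark = SparkSession.builder()
      .appName("AssemblerPipelineSketch")
      .master("local[*]")
      .getOrCreate()

    // Made-up rows for illustration only; they carry no real meaning.
    val training = spark.createDataFrame(Seq(
      (0, 18, 1.0, Vectors.dense(0.0, 10.0, 0.5), 1.0),
      (1, 9, 0.0, Vectors.dense(1.0, 2.0, 0.1), 0.0),
      (2, 21, 1.0, Vectors.dense(0.0, 8.0, 0.7), 1.0),
      (3, 7, 0.0, Vectors.dense(1.0, 1.0, 0.2), 0.0)
    )).toDF("id", "hour", "mobile", "userFeatures", "clicked")

    // Fitting runs the assembler first, then trains on the "features" column.
    val model = buildPipeline().fit(training)
    model.transform(training).select("id", "features", "prediction").show(false)

    spark.stop()
  }
}
```

Keeping the assembler inside the pipeline means the same column order is applied at training and at prediction time, so the feature vector layout cannot drift between the two.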