Interaction
Interaction is a Transformer which takes vector or double-valued columns, and generates a single vector column that contains the product of all combinations of one value from each input column.
For example, given two input vector columns with 3 dimensions each, the output column is a 9-dimensional vector.
Each output element is the product of one element from each input vector, so the output covers the Cartesian product of the two vectors' elements.
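To make the two-vector case concrete, here is a minimal plain-Scala sketch of the same arithmetic. The helper `interact` is hypothetical and not part of Spark; the element ordering (first vector's index varying slowest) is an assumption read off the example output in this page.

```scala
// Hypothetical helper, not a Spark API: multiply every element of `a`
// by every element of `b`. The index of `a` varies slowest, which is the
// ordering seen in the example output on this page.
def interact(a: Seq[Double], b: Seq[Double]): Seq[Double] =
  for (x <- a; y <- b) yield x * y

// Two 3-dimensional vectors give a 9-dimensional result.
val out = interact(Seq(1.0, 2.0, 3.0), Seq(8.0, 4.0, 5.0))
println(out.mkString("[", ",", "]"))
```

The snippet is written for the Scala REPL or spark-shell, where top-level definitions are allowed.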
Examples
Assume that we have the following DataFrame with the columns “id1”, “vec1”, and “vec2”:
id1 | vec1           | vec2
----|----------------|----------------
1   | [1.0,2.0,3.0]  | [8.0,4.0,5.0]
2   | [4.0,3.0,8.0]  | [7.0,9.0,8.0]
3   | [6.0,1.0,9.0]  | [2.0,3.0,6.0]
4   | [10.0,8.0,6.0] | [9.0,4.0,5.0]
5   | [9.0,2.0,7.0]  | [10.0,7.0,3.0]
6   | [1.0,1.0,4.0]  | [2.0,8.0,4.0]
Applying Interaction with all three columns as inputs produces the output column interactedCol, in which each element is id1 multiplied by one element of vec1 and one element of vec2:
id1 | vec1           | vec2           | interactedCol
----|----------------|----------------|--------------------------------------------------------
1   | [1.0,2.0,3.0]  | [8.0,4.0,5.0]  | [8.0,4.0,5.0,16.0,8.0,10.0,24.0,12.0,15.0]
2   | [4.0,3.0,8.0]  | [7.0,9.0,8.0]  | [56.0,72.0,64.0,42.0,54.0,48.0,112.0,144.0,128.0]
3   | [6.0,1.0,9.0]  | [2.0,3.0,6.0]  | [36.0,54.0,108.0,6.0,9.0,18.0,54.0,81.0,162.0]
4   | [10.0,8.0,6.0] | [9.0,4.0,5.0]  | [360.0,160.0,200.0,288.0,128.0,160.0,216.0,96.0,120.0]
5   | [9.0,2.0,7.0]  | [10.0,7.0,3.0] | [450.0,315.0,135.0,100.0,70.0,30.0,350.0,245.0,105.0]
6   | [1.0,1.0,4.0]  | [2.0,8.0,4.0]  | [12.0,48.0,24.0,12.0,48.0,24.0,48.0,192.0,96.0]
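The rows above can be checked by hand with a short plain-Scala sketch. The helper `interactAll` is hypothetical and not part of Spark; it assumes a scalar column such as id1 behaves like a one-element vector, which is consistent with the values in the table.

```scala
// Hypothetical helper, not a Spark API: fold over the input columns,
// multiplying every partial product by every element of the next column.
// A scalar column is passed as a one-element Seq.
def interactAll(cols: Seq[Seq[Double]]): Seq[Double] =
  cols.foldLeft(Seq(1.0)) { (acc, col) =>
    for (x <- acc; y <- col) yield x * y
  }

// Row with id1 = 3, vec1 = [6.0,1.0,9.0], vec2 = [2.0,3.0,6.0]
val row3 = interactAll(Seq(Seq(3.0), Seq(6.0, 1.0, 9.0), Seq(2.0, 3.0, 6.0)))
println(row3.mkString("[", ",", "]"))
```

For example, the first element of that row is 3 × 6 × 2 = 36.0, which matches the first entry of interactedCol for id1 = 3.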
import org.apache.spark.ml.feature.Interaction
import org.apache.spark.ml.feature.VectorAssembler

// One id column (id1) followed by six integer feature columns
val df = spark.createDataFrame(Seq(
  (1, 1, 2, 3, 8, 4, 5),
  (2, 4, 3, 8, 7, 9, 8),
  (3, 6, 1, 9, 2, 3, 6),
  (4, 10, 8, 6, 9, 4, 5),
  (5, 9, 2, 7, 10, 7, 3),
  (6, 1, 1, 4, 2, 8, 4)
)).toDF("id1", "id2", "id3", "id4", "id5", "id6", "id7")

// Assemble id2..id4 into the vector column vec1
val assembler1 = new VectorAssembler().
  setInputCols(Array("id2", "id3", "id4")).
  setOutputCol("vec1")

val assembled1 = assembler1.transform(df)

// Assemble id5..id7 into the vector column vec2
val assembler2 = new VectorAssembler().
  setInputCols(Array("id5", "id6", "id7")).
  setOutputCol("vec2")

val assembled2 = assembler2.transform(assembled1).select("id1", "vec1", "vec2")

// Interact the scalar id1 with both vectors.
// Trailing dots keep the chained calls valid when pasted line by line into spark-shell.
val interaction = new Interaction().
  setInputCols(Array("id1", "vec1", "vec2")).
  setOutputCol("interactedCol")

val interacted = interaction.transform(assembled2)

interacted.show(truncate = false)

/*
Output:
+---+--------------+--------------+------------------------------------------------------+
|id1|vec1          |vec2          |interactedCol                                         |
+---+--------------+--------------+------------------------------------------------------+
|1  |[1.0,2.0,3.0] |[8.0,4.0,5.0] |[8.0,4.0,5.0,16.0,8.0,10.0,24.0,12.0,15.0]            |
|2  |[4.0,3.0,8.0] |[7.0,9.0,8.0] |[56.0,72.0,64.0,42.0,54.0,48.0,112.0,144.0,128.0]     |
|3  |[6.0,1.0,9.0] |[2.0,3.0,6.0] |[36.0,54.0,108.0,6.0,9.0,18.0,54.0,81.0,162.0]        |
|4  |[10.0,8.0,6.0]|[9.0,4.0,5.0] |[360.0,160.0,200.0,288.0,128.0,160.0,216.0,96.0,120.0]|
|5  |[9.0,2.0,7.0] |[10.0,7.0,3.0]|[450.0,315.0,135.0,100.0,70.0,30.0,350.0,245.0,105.0] |
|6  |[1.0,1.0,4.0] |[2.0,8.0,4.0] |[12.0,48.0,24.0,12.0,48.0,24.0,48.0,192.0,96.0]       |
+---+--------------+--------------+------------------------------------------------------+
*/