
# Interaction

`Interaction` is a Transformer that takes vector or double-valued columns and generates a single vector column containing the products of all combinations of one value from each input column.

For example, if the input columns are two vector columns with 3 dimensions each, the output column is a 9-dimensional vector (3 × 3 = 9).

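In other words, the output holds one product for every pair in the Cartesian product of the two vectors' elements: each element of the first vector is multiplied by each element of the second.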
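
Written out for two input vectors, this amounts to the following. The symbols $u$, $v$, $w$, $m$, and $n$ are notation introduced here for illustration, and the index order is chosen to match the example output shown further down this page:

$$
u \in \mathbb{R}^{m},\quad v \in \mathbb{R}^{n},\qquad
w_{(i-1)\,n + j} = u_i \, v_j,\qquad i = 1,\dots,m,\quad j = 1,\dots,n
$$

The output $w$ therefore has $m \cdot n$ entries. A double-valued input column behaves like a vector of length 1, so it scales every entry of $w$ by the same factor.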

## Examples

Assume that we have the following DataFrame with the columns “id1”, “vec1”, and “vec2”:

| id1 | vec1            | vec2            |
| --- | --------------- | --------------- |
| 1   | \[1.0,2.0,3.0]  | \[8.0,4.0,5.0]  |
| 2   | \[4.0,3.0,8.0]  | \[7.0,9.0,8.0]  |
| 3   | \[6.0,1.0,9.0]  | \[2.0,3.0,6.0]  |
| 4   | \[10.0,8.0,6.0] | \[9.0,4.0,5.0]  |
| 5   | \[9.0,2.0,7.0]  | \[10.0,7.0,3.0] |
| 6   | \[1.0,1.0,4.0]  | \[2.0,8.0,4.0]  |

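Applying `Interaction` with `id1`, `vec1`, and `vec2` as input columns produces the output column `interactedCol`. Because `id1` is a single numeric value, it acts as a one-dimensional input, so each of the nine `vec1` × `vec2` products is also multiplied by `id1`: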

| id1 | vec1            | vec2            | interactedCol                                           |
| --- | --------------- | --------------- | ------------------------------------------------------- |
| 1   | \[1.0,2.0,3.0]  | \[8.0,4.0,5.0]  | \[8.0,4.0,5.0,16.0,8.0,10.0,24.0,12.0,15.0]             |
| 2   | \[4.0,3.0,8.0]  | \[7.0,9.0,8.0]  | \[56.0,72.0,64.0,42.0,54.0,48.0,112.0,144.0,128.0]      |
| 3   | \[6.0,1.0,9.0]  | \[2.0,3.0,6.0]  | \[36.0,54.0,108.0,6.0,9.0,18.0,54.0,81.0,162.0]         |
| 4   | \[10.0,8.0,6.0] | \[9.0,4.0,5.0]  | \[360.0,160.0,200.0,288.0,128.0,160.0,216.0,96.0,120.0] |
| 5   | \[9.0,2.0,7.0]  | \[10.0,7.0,3.0] | \[450.0,315.0,135.0,100.0,70.0,30.0,350.0,245.0,105.0]  |
| 6   | \[1.0,1.0,4.0]  | \[2.0,8.0,4.0]  | \[12.0,48.0,24.0,12.0,48.0,24.0,48.0,192.0,96.0]        |
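
The rows above can be checked without Spark. The following is a minimal sketch in plain Scala, not Spark's implementation; `InteractionSketch` and `interact` are hypothetical names introduced for illustration. It treats the scalar `id1` as a one-element vector and lets the last input vary fastest, which reproduces the ordering in the table.

```scala
object InteractionSketch {
  // Hypothetical helper, not part of Spark: multiplies every combination of
  // one value per input, with the last input varying fastest.
  def interact(inputs: Seq[Seq[Double]]): Seq[Double] =
    inputs.foldLeft(Seq(1.0)) { (acc, vec) =>
      for (a <- acc; v <- vec) yield a * v
    }

  def main(args: Array[String]): Unit = {
    // Row with id1 = 2 from the table: scalar 2.0, then vec1, then vec2.
    val row2 = interact(Seq(Seq(2.0), Seq(4.0, 3.0, 8.0), Seq(7.0, 9.0, 8.0)))
    println(row2.mkString("[", ",", "]"))
    // prints [56.0,72.0,64.0,42.0,54.0,48.0,112.0,144.0,128.0]
  }
}
```

Folding over the inputs keeps the sketch independent of how many columns are interacted: each step expands the running list of partial products by one more input.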

```scala
import org.apache.spark.ml.feature.Interaction
import org.apache.spark.ml.feature.VectorAssembler

// Assumes an active SparkSession named `spark`, as provided by spark-shell.
val df = spark.createDataFrame(Seq(
  (1, 1, 2, 3, 8, 4, 5),
  (2, 4, 3, 8, 7, 9, 8),
  (3, 6, 1, 9, 2, 3, 6),
  (4, 10, 8, 6, 9, 4, 5),
  (5, 9, 2, 7, 10, 7, 3),
  (6, 1, 1, 4, 2, 8, 4)
)).toDF("id1", "id2", "id3", "id4", "id5", "id6", "id7")

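// Assemble id2, id3, id4 into the first 3-dimensional vector column.
val assembler1 = new VectorAssembler().
  setInputCols(Array("id2", "id3", "id4")).
  setOutputCol("vec1")

val assembled1 = assembler1.transform(df)

// Assemble id5, id6, id7 into the second vector column,
// then keep only the columns that Interaction will read.
val assembler2 = new VectorAssembler().
  setInputCols(Array("id5", "id6", "id7")).
  setOutputCol("vec2")

val assembled2 = assembler2.transform(assembled1).select("id1", "vec1", "vec2")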

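// Interact the scalar id1 with both vectors: 1 * 3 * 3 = 9 output entries.
// Trailing dots match the assemblers above and paste cleanly into spark-shell.
val interaction = new Interaction().
  setInputCols(Array("id1", "vec1", "vec2")).
  setOutputCol("interactedCol")

val interacted = interaction.transform(assembled2)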

interacted.show(truncate = false)

/*
Output:
+---+--------------+--------------+------------------------------------------------------+
|id1|vec1          |vec2          |interactedCol                                         |
+---+--------------+--------------+------------------------------------------------------+
|1  |[1.0,2.0,3.0] |[8.0,4.0,5.0] |[8.0,4.0,5.0,16.0,8.0,10.0,24.0,12.0,15.0]            |
|2  |[4.0,3.0,8.0] |[7.0,9.0,8.0] |[56.0,72.0,64.0,42.0,54.0,48.0,112.0,144.0,128.0]     |
|3  |[6.0,1.0,9.0] |[2.0,3.0,6.0] |[36.0,54.0,108.0,6.0,9.0,18.0,54.0,81.0,162.0]        |
|4  |[10.0,8.0,6.0]|[9.0,4.0,5.0] |[360.0,160.0,200.0,288.0,128.0,160.0,216.0,96.0,120.0]|
|5  |[9.0,2.0,7.0] |[10.0,7.0,3.0]|[450.0,315.0,135.0,100.0,70.0,30.0,350.0,245.0,105.0] |
|6  |[1.0,1.0,4.0] |[2.0,8.0,4.0] |[12.0,48.0,24.0,12.0,48.0,24.0,48.0,192.0,96.0]       |
+---+--------------+--------------+------------------------------------------------------+
*/
```
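
To see the 9-dimensional case described at the top of this page, with two 3-dimensional vector columns and no scalar multiplier, `id1` can be left out of the input columns. The following is a sketch under a few assumptions: it creates a local `SparkSession` (in spark-shell, `getOrCreate()` should simply return the existing one), it uses only the first two rows of the example data, and the names `raw`, `withVec1`, `withBoth`, `pairwise`, and `pairwiseCol` are introduced here for illustration. The vectors are assembled with `VectorAssembler`, as in the example above, on the assumption that `Interaction` needs the attribute metadata the assembler attaches.

```scala
import org.apache.spark.ml.feature.{Interaction, VectorAssembler}
import org.apache.spark.sql.SparkSession

// Assumption: a local session for experimentation.
val spark = SparkSession.builder().
  master("local[*]").
  appName("PairwiseInteraction").
  getOrCreate()

// First two rows of the example data.
val raw = spark.createDataFrame(Seq(
  (1, 1, 2, 3, 8, 4, 5),
  (2, 4, 3, 8, 7, 9, 8)
)).toDF("id1", "id2", "id3", "id4", "id5", "id6", "id7")

// Build vec1 and vec2 exactly as in the example above.
val withVec1 = new VectorAssembler().
  setInputCols(Array("id2", "id3", "id4")).
  setOutputCol("vec1").
  transform(raw)

val withBoth = new VectorAssembler().
  setInputCols(Array("id5", "id6", "id7")).
  setOutputCol("vec2").
  transform(withVec1)

// Only the two vector columns are interacted, so id1 no longer scales the result.
val pairwise = new Interaction().
  setInputCols(Array("vec1", "vec2")).
  setOutputCol("pairwiseCol")

pairwise.transform(withBoth).
  select("id1", "vec1", "vec2", "pairwiseCol").
  show(truncate = false)
```

For the row with `id1` = 2, `pairwiseCol` should come out as `[28.0,36.0,32.0,21.0,27.0,24.0,56.0,72.0,64.0]`, which is half of the corresponding `interactedCol` values above because the multiplier 2 is no longer applied.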