Issue from running Cartesian Join Query
What is a Cartesian join query?
A Cartesian join SQL query is also known as a cross join SQL query.
A SQL join query that has no join condition, or not enough join conditions, is a Cartesian join query. Its result is called a Cartesian product.
To avoid a Cartesian product, a SQL query that joins N tables must have at least N-1 join conditions.
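The difference between a Cartesian product and a properly conditioned join can be seen on a small scale without Spark. The following is a minimal sketch using Python's built-in sqlite3 module; the tables visitors and products and their rows are hypothetical and exist only for illustration.

```python
import sqlite3

# Two tiny in-memory tables (hypothetical data, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visitors (visitor_id TEXT, product_id INTEGER);
    CREATE TABLE products (id INTEGER, name TEXT);
    INSERT INTO visitors VALUES ('v1', 1), ('v2', 2), ('v3', 2);
    INSERT INTO products VALUES (1, 'book'), (2, 'pen');
""")

# No join condition: every visitor row is paired with every product row.
cartesian = conn.execute(
    "SELECT v.visitor_id, p.name FROM visitors v CROSS JOIN products p"
).fetchall()

# Two tables, so one (N-1) join condition is enough to avoid the product.
joined = conn.execute(
    "SELECT v.visitor_id, p.name FROM visitors v "
    "JOIN products p ON v.product_id = p.id"
).fetchall()

print(len(cartesian))  # 3 * 2 = 6 rows
print(len(joined))     # 3 rows, one per visitor
conn.close()
```

With 3 and 2 rows the product is harmless, but the row count grows as the product of the table sizes, which is presumably why Spark asks for an explicit opt-in.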
A Cartesian join query may be legitimately needed, depending on the design of your application. However, when you run a Cartesian join query in Spark SQL, you are likely to run into the error below:
org.apache.spark.sql.AnalysisException:
Detected implicit cartesian product for INNER
join between logical plans
Project [VisitorId#14]
+- LogicalRDD [products#13, visitorId#14], false
and
Project [id#21, if (isnotnull(name#6)) name#6 else invalid product AS name#25, interest#22]
+- Join FullOuter, (id#21 = id#5)
:- Project [products#19.id AS id#21, products#19.interest AS interest#22]
: +- Generate explode(products#13), [0], false, [products#19]
: +- Project [products#13]
: +- LogicalRDD [products#13, visitorId#14], false
+- LocalRelation [id#5, name#6]
Join condition is missing or trivial.
Either: use the CROSS JOIN syntax to allow cartesian products between these
relations, or: enable implicit cartesian products by setting the configuration
variable spark.sql.crossJoin.enabled=true;
The error message itself tells you the workaround. The solutions are therefore as follows:
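Add the configuration in the Spark session
Set spark.sql.crossJoin.enabled=true on the Spark session, either when the session is created or afterwards at runtime.
This solution requires you to add spark.sql.crossJoin.enabled=true to the code of each of your Spark driver applications.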
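The original code samples for this step are not available here, so the following is only a sketch of how the setting might be applied from PySpark. It relies on two real PySpark calls, the builder's config(key, value) method and spark.conf.set(key, value); the helper names enable_cross_join_on_builder and enable_cross_join_on_session are hypothetical and introduced purely for illustration.

```python
# The configuration key and value named in the error message.
CROSS_JOIN_CONF = ("spark.sql.crossJoin.enabled", "true")


def enable_cross_join_on_builder(builder):
    """Apply the setting while the session is being built.

    `builder` is expected to behave like pyspark's SparkSession.builder,
    whose config(key, value) method returns the builder itself.
    """
    key, value = CROSS_JOIN_CONF
    return builder.config(key, value)


def enable_cross_join_on_session(spark):
    """Apply the setting to an already created session via spark.conf.set."""
    key, value = CROSS_JOIN_CONF
    spark.conf.set(key, value)
    return spark


# Usage with PySpark (assumes pyspark is installed):
#   from pyspark.sql import SparkSession
#   spark = enable_cross_join_on_builder(SparkSession.builder).getOrCreate()
# or, for a session that already exists:
#   enable_cross_join_on_session(spark)
```

Either route has the same effect on the session; the builder form is convenient when the driver creates its own session, while the runtime form suits notebooks where a session is already provided.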
Set the default in the Spark configuration so that it applies to all applications
In the $SPARK_HOME/conf directory, there is a file called:
spark-defaults.conf.template
Copy or rename this file to spark-defaults.conf
Then edit spark-defaults.conf (for example with vi) and add a line at the end that sets spark.sql.crossJoin.enabled to true.
Make sure the line does not begin with a # sign. Then restart (bounce) the Spark cluster so that the new default takes effect.
If you are running jupyter-notebook, make sure to restart the jupyter-notebook server process as well.
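The copy-and-append steps for spark-defaults.conf could also be scripted instead of done by hand in an editor. The sketch below assumes the whitespace-separated "key value" form that spark-defaults.conf uses; the function name enable_cross_join_default is hypothetical, and restarting the cluster is deliberately left out because it depends on how Spark is deployed.

```python
import os
import shutil

# The line to add, in the "key value" form used by spark-defaults.conf.
SETTING = "spark.sql.crossJoin.enabled true"


def enable_cross_join_default(conf_dir):
    """Create spark-defaults.conf from the template if needed, then append the setting."""
    template = os.path.join(conf_dir, "spark-defaults.conf.template")
    target = os.path.join(conf_dir, "spark-defaults.conf")

    # Copy rather than rename, so the template stays available.
    if not os.path.exists(target):
        shutil.copyfile(template, target)

    with open(target) as f:
        lines = f.read().splitlines()

    # Do nothing if the exact (uncommented) setting is already present.
    if any(line.split() == SETTING.split() for line in lines):
        return target

    with open(target, "a") as f:
        # No leading '#', otherwise the line is treated as a comment.
        f.write("\n" + SETTING + "\n")
    return target


# Usage (assumes SPARK_HOME is set in the environment):
#   enable_cross_join_default(os.path.join(os.environ["SPARK_HOME"], "conf"))
```

Running the function twice leaves a single copy of the setting in the file, so it is safe to include in a provisioning script.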