The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredCompete to become Power BI Data Viz World Champion! First round ends August 18th. Get started.
I am connecting to Databricks through Native Query that uses left joins and has where clause.
Based on the Microsodt Document - Optimize by Expanding Columns it seems that merge queries are better than using left joins.
Also based on the article on Query folding in Azure Databricks it is suggested that Dtabricks.Query be used instead of the Databricks.Catalog connector that does not allow query folding. Additionaly, it is suggested that teh WHERE clause be separated out.
Based on the above:
1. Are merge queries better than left joins used in native SQL queries? Why so? When the technique mentioned in the article was used, I could see that "View Native Query" option was enabled i.e not greyed out meaning query folding was happening. Wouldn't the entire query including the joins and where clauses get executed at the source -Databricks?
2. According to the article, its better to split the query into 2 parts - WHERE clause separately and also parametarise the query if possible. What advantage does this bring (if any)? My query still folds with just one long query including the WHERE clause.
3. What about the Databricks.Catalogs is not allowing query folding to happen ? Even when Enable query folding was set to "TRUE", the option to view native query was disabled as shown below:
Value.NativeQuery(Databricks.Catalogs(host_path, warehouse_path)[Catalog="hive_metastore", Database=null, EnableAutomaticProxyDiscovery=null]) {[Name="hive_metastore",Kind="Database"]}[Data], "select * from schema.table A left join another_table B on A.key_col = B.key_col where B.key_col = "some_value"", null, [EnableFolding=true])
Which is a preferable way to optimise for efficiency?
1. Import separate tables and the merge them into one ?
2. Use Databricks.Query and use a SQL with left join and seperate out the where clause ?
@Hoping did you ever fix this? running into the same issues now with query folding and databricks. Whatever I try, it doesn't work even though databricks itself explicitly mention that query folding should be possible now:
https://www.databricks.com/blog/databricks-power-bi-connector-now-supports-native-query