Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

To celebrate FabCon Vienna, we are offering 50% off select exams. Ends October 3rd. Request your discount now.

Make append with remove duplicates work in a Dataflow Gen2 using a Lakehouse data source

You can’t currently publish a Dataflow Gen2 that appends two tables from a Fabric Lakehouse and removes duplicates after append. When, after the “Append queries”-step (i.e. Table.Combine), you include the “Remove duplicates” step (i.e. Table.Distinct) in Power Query Online UI, the data preview stops refreshing and times out after 10 minutes. You can’t view the query plan either, which seems to suggest that there is an issue with compiling the query plan. If you try to publish the dataflow regardless, the publishing process keeps running until timed out after 8 hours. In my experience you can only make the dataflow publish if you use Table.Buffer-function before Table.Distinct.


Below is an example code to reproduce the situation with the working code commented out. The data source is a Lakehouse with two delta tables (table_a and table_b) both having just one column “A” with a couple of integer values.


let

 Source = Lakehouse.Contents(null){[workspaceId = "XXX"]}[Data]{[lakehouseId = "YYY"]}[Data],

 #"Navigation 1" = Source{[Id = "table_a", ItemKind = "Table"]}[Data],

 #"Navigation 2" = Source{[Id = "table_b", ItemKind = "Table"]}[Data],

 #"Appended query" = Table.Combine({#"Navigation 1", #"Navigation 2"}),

 #"Removed duplicates" = Table.Distinct(#"Appended query", {"A"})

 // #"Removed duplicates" = Table.Distinct(Table.Buffer(#"Appended query"), {"A"})

in

 #"Removed duplicates"

 

I’m told this is by design although this seems more like a bug to me. If this cannot be fixed I would suggest that Microsoft would properly document the need for always using Table.Buffer before Table.Combine + Table.Distinct with a Lakehouse data source, and preferably create a warning and/or error message for the user instead of making the user wait until timed out. Best solution would of course be that the Lakehouse connection would support query folding for this seemingly rather simple query (perhaps using “union” instead of a “union all”).

Status: New
Comments
fbcideas_migusr
New Member
Status changed to: New