Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Dataflow Remove Duplicates and Group By Time out

Hi, 

 

I am trying to create dimension tables for my star schema in the dataflow in Fabric. However, my compuation is timing out at the "Remove duplicates" step. Then i tried to use Group By on all the columns in order to get to a table with all unique rows. Same issue. 

 

The exact same dataflow works in the legacy dataflow setup (in the Power BI workspaces) however it is failing in the MS Fabric trial tenant. Please help! My M query is below. The point of failure is highlighted in orange. 

 

Best,

Arjun

 

let
  Source = Table.Combine({#"T - DIM - Person Pid", #"T - DIM - Person Opl"}),
  Custom1 = Table.TransformColumnNames(Source,fnRenameColumns),
  #"Added custom" = Table.TransformColumnTypes(Table.AddColumn(Custom1, "Person Index", each [Full Name] & [Employee Num] & [Location Type] & [Primary Capability] & [Employee Career Stage] & [Home Global Region Name] & [Current Global Region Name] & [Home Country Name] & [Current Country Name] & [Person Type] & [Employee Business Unit] & [Derived End Country] & [Employee Job Description] & [Career Stage Level] & [Home Metro Name] & [Current Metro Name] & [Capability Type Pda] & [Capability Group Pda] & [Craft Pda] & (if [Cs Id] = null then "" else [Cs Id]) & [Capability Group Pda Collapsed]), {{"Person Index", type text}}),
  #"Inserted conditional column" = Table.AddColumn(#"Added custom", "Location Type Sorting", each if [Location Type] = "Local" then 1 else if [Location Type] = "Traveler" then 2 else if [Location Type] = "India" then 3 else if [Location Type] = "No Person Footprint" then 4 else 999),
  #"Changed column type" = Table.TransformColumnTypes(#"Inserted conditional column", {{"Location Type Sorting", Int64.Type}}),
  #"Inserted conditional column 1" = Table.AddColumn(#"Changed column type", "Capability Group Pda Collapsed Sorting", each if [Capability Group Pda Collapsed] = "Engineering" then 1 else if [Capability Group Pda Collapsed] = "Product" then 2 else if [Capability Group Pda Collapsed] = "Strategy" then 3 else if [Capability Group Pda Collapsed] = "Experience" then 4 else if [Capability Group Pda Collapsed] = "Non-Core Capabilities" then 5 else if [Capability Group Pda Collapsed] = "No Person Footprint" then 6 else 999),
  #"Changed column type 1" = Table.TransformColumnTypes(#"Inserted conditional column 1", {{"Capability Group Pda Collapsed Sorting", Int64.Type}}),
  #"Removed duplicates" = Table.Distinct(#"Changed column type 1", {"Person Index"}),
  #"Added index" = Table.AddIndexColumn(#"Removed duplicates", "Index", 1, 1, Int64.Type),
  #"Removed columns" = Table.RemoveColumns(#"Added index", {"Person Index"}),
  #"Renamed columns" = Table.RenameColumns(#"Removed columns", {{"Index", "Person Index"}})
in
  #"Renamed columns"

 

 

Status: Investigating

Hi @arjunjalan ,

 

Dataflows' UI validation logic has a built-in timeout of 10 minutes. It is possible that some queries, take longer than that, especially if the cannot fold. 

Did you try  to split your entities into multiple dataflows to see if it works?

 

Community Support Team_Caitlyn

 

Comments
v-xiaoyan-msft
Community Support
Status changed to: Investigating

Hi @arjunjalan ,

 

Dataflows' UI validation logic has a built-in timeout of 10 minutes. It is possible that some queries, take longer than that, especially if the cannot fold. 

Did you try  to split your entities into multiple dataflows to see if it works?

 

Community Support Team_Caitlyn

 

arjunjalan
Frequent Visitor

Thank you for the response! 

 

I did not split the queries. Reason being, the exact same dataflow works in the Power BI ecosystem (pre MS Fabric release). In fact we have production level reports using the dataflow. However, the same dataflow fails in MS Fabric. Not just in the UI, but even if I publish it, the dataflow fails. I understand it may take time for results to render as I am building the flow, but it fails in execution everytime. 

 

Essentially I am trying to recrate a product that already exists in the legacy Power BI Ecosystem that looks up SQL tables and then this STAR Schema Dataflow is created. But in the MS Fabric ecosystem, I have 1st moved the SQL tables into the lakehouse, and then trying to build a STAR schema within the Fabric ecosystem by using the flat tables stored in the lakehouse. 

 

Edit:

So i ran the same flow in the Power BI Dataflow ecosystem where the underlying table being queired is in an Azure SQL Database. It took for the query of interest 2m 16 seconds to render in the UI. 

 

The same query in the MS Fabric Dataflow ecosystem, where the underlying table being queried is a lakehouse table (excatly the numbers of rows and columns as the SQL Table). It always maxes out at the 10min mark and fails. 

 

Thanks!

Arjun 

 

Thanks!

arjunjalan
Frequent Visitor

Any updates here please ?!

v-xiaoyan-msft
Community Support

Hi  @arjunjalan ,

 

Sorry for the late reply, as Fabric is still in preview, issues related to it are still being optimized.

 

Best regards.
Community Support Team_Caitlyn