Anonymous
Not applicable

Transformations in Power Query - Very Slow Performance

Hi Power BI Community,

 

I have a solution that is running very slowly, and I wanted to get everyone's input to understand what is causing the performance hit. So what do I mean by ‘slow’?

 

Firstly, in Power Query, after I apply an Append or Merge, Power Query takes a very long time to finish previewing the result (completion is indicated in the bottom right).

Secondly, once I click ‘Close and Apply’ to load the data into the Model, it takes over 30 minutes.

 

Also, if anyone has any tips on monitoring performance levels in Power Query to understand what’s causing the slow performance, they would be very much appreciated.

 

I have 5 CSV Files:

 

CSV UK = 1.7M Rows (132 MB)

CSV AU = 205K Rows (15.8 MB)

CSV CZ = 5.5K Rows (445 KB)

CSV EU = 108K Rows (8.02 MB)

CSV ROW = 228K Rows (16.8 MB)

 

All these files are appended in Power Query, starting from ‘CSV UK’, as shown below:

 

#"Appended Query" = Table.Combine({#"Added Custom", AU, CZ, EU, ROW})

 

These files let us see each Transaction and the Items that belong to it, with measures such as Sales Value. This query has been named ‘Online Fact’.

 

Q: Does the Append already put much strain on the solution? The reason I ask is that once I click on the Append step in the ‘Applied Steps’ panel, Power Query takes quite some time to COMPLETE previewing the data.

 

Following this, I need to add a Flag Field to act as a filter for Transactions that have at least 1 item with a Sales value of 0.

 

To do this, I duplicated ‘Online Fact’, renamed it ‘Zero Sales’ in Power Query, and:

 

  1. Added a filter on Sales Value = 0
  2. Removed Duplicates
  3. Added a Custom Column that repeats ‘Y’ across all rows and named it ‘AdjZeroFlag’
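
For reference, a minimal sketch of what the ‘Zero Sales’ query those steps describe might look like in M (query and column names are assumed from this post, not taken from the actual file):

```
let
    // Start from the appended fact query described above
    Source = #"Online Fact",
    // Keep only item rows with a zero Sales Value
    ZeroRows = Table.SelectRows(Source, each [Sales Value] = 0),
    // Reduce to the transaction key and deduplicate
    KeyOnly = Table.SelectColumns(ZeroRows, {"TransactionID"}),
    NoDupes = Table.Distinct(KeyOnly),
    // Flag every remaining transaction with 'Y'
    Flagged = Table.AddColumn(NoDupes, "AdjZeroFlag", each "Y", type text)
in
    Flagged
```

Keeping only the TransactionID column before Table.Distinct is what makes the deduplication produce one row per transaction.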

Doing this allowed me to have a single dataset showing only Transactions that have at least one item with a 0 Sales Value. I then did a LEFT OUTER MERGE:

 

#"Merged Queries" = Table.NestedJoin(#"Sorted Rows",{"TransactionID"},#"Zero Sales",{"TransactionID"},"Zero Sales",JoinKind.LeftOuter),

#"Expanded Zero Sales" = Table.ExpandTableColumn(#"Merged Queries", "Zero Sales", {"AdjZeroFlag"}, {"AdjZeroFlag"}),

 

Q: Does the LEFT OUTER MERGE add more performance issues to the overall solution?

 

The final step is to get some measures from a different source which are missing from the initial Text Files. This source is a SQL DB and I called it in Power Query ‘Lookup’.

 

The query returns 2.7M records and will be used to return the Sales Cost, which is missing from the initial Power BI query ‘Online Fact’. The way I approached this was, once more, a LEFT OUTER MERGE:

 

#"Merged Queries1" = Table.NestedJoin(#"Added Custom6",{"OrderID", "ProductID"},#"LOOKUP",{"OntransRef", "StyleBarcode"},"LOOKUP",JoinKind.LeftOuter),

 

Once I add this Merge, everything becomes very slow. Power Query takes ages just to preview the data on screen. Once it finally does and I click ‘Close and Apply’ to load the model, it takes over 30 minutes for the data to load.

 

Important: The only table I am loading into the model is the Power BI query ‘Online Fact’; I unticked the ‘Enable Load’ option for the others, as all they were needed for was the LEFT OUTER MERGE and APPEND.

 

Steps I will take to try to increase overall performance:

 

  1. I will try appending all the CSV files outside of Power BI, so that less work is needed from Power Query.
  2. Instead of duplicating the entire ‘Online Fact’ and using it for a LEFT OUTER MERGE, I will try to find an alternative method for deriving a Flag that indicates whether a Transaction has at least 1 item with a 0 Value.
    1. I will explore this in DAX using a Calculated Column, as I am hoping I can apply the logic at the Transaction ID level, even though the lowest level in ‘Online Fact’ is Transaction Item.
    2. I will remove the LEFT OUTER MERGE for returning the Sales Cost and instead load ‘Lookup’ and simply create a relationship between ‘Online Fact’ and ‘Lookup’.
      1. This is frustrating, as I would have liked to simply add the measure to the Online Fact.

 

Thank you PBI Community

Laz

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi Community & @v-jiascu-msft,

 

So I managed to reduce the refresh time of the entire model from 30 minutes to 4 minutes 😄

 

From my initial post, there were three areas which I believed were causing the slow performance:

 

1. Append - multiple CSV's.

2. Merge - merge Online Fact in order to produce a Flag Field.

3. Merge - merge Online Fact with Lookup table to retrieve other fields.

 

What changes did I make to cut the refresh time from 30 minutes to 4 minutes:

 

Append - multiple CSV's

Instead of appending each CSV individually, I used the Folder connector as @v-jiascu-msft advised. I don’t believe this made much difference in performance; however, it does simplify the query and reduces the number of steps needed.
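
The Folder-connector approach can be sketched roughly as follows (the folder path is hypothetical, and this assumes all five CSVs share the same column layout):

```
let
    // List every file in the drop folder (path is illustrative only)
    Source = Folder.Files("C:\Sales\CSV"),
    // Keep only CSV files
    CsvOnly = Table.SelectRows(Source, each Text.Lower([Extension]) = ".csv"),
    // Parse each file's binary content and promote its header row
    Parsed = Table.AddColumn(CsvOnly, "Data",
        each Table.PromoteHeaders(Csv.Document([Content]))),
    // Stack all parsed tables into one, replacing the manual Append
    Combined = Table.Combine(Parsed[Data])
in
    Combined
```

A new country file dropped into the folder is then picked up automatically on the next refresh, with no query changes.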

 

Merge - merge Online Fact in order to produce a Flag Field

I used a DAX formula to create a Flag that filters Transaction IDs which have at least one Product ID with a Sales Value of 0.
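
The post doesn’t show the exact formula, but a calculated column along these lines would implement that flag (table and column names assumed from the earlier posts):

```
AdjZeroFlag =
VAR CurrentTxn = 'Online Fact'[TransactionID]
RETURN
    IF (
        -- Count item rows of the current transaction with zero sales
        COUNTROWS (
            FILTER (
                'Online Fact',
                'Online Fact'[TransactionID] = CurrentTxn
                    && 'Online Fact'[Sales Value] = 0
            )
        ) > 0,
        "Y",
        "N"
    )
```

Because this is a calculated column, it is evaluated once at refresh time, so the row-by-row scan over the fact table does not affect report interaction speed.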

 

Merge - merge Online Fact with Lookup table to retrieve other fields

I loaded the Lookup table into the Model and created a relationship between ‘Online Fact’ and ‘Lookup’. Yes, this means my measures are separated across tables, but performance improved dramatically. I also needed to use the DAX function RELATED to create measures/columns when one field was in ‘Online Fact’ and another was in the Lookup table.
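
As a sketch, pulling the cost across the relationship looks like this (column names assumed; RELATED requires a single-column, many-to-one relationship from ‘Online Fact’ to ‘Lookup’):

```
-- Calculated column on 'Online Fact', following the relationship to 'Lookup'
Sales Cost = RELATED ( 'Lookup'[SalesCost] )

-- If the natural key is composite (the earlier merge joined on
-- OrderID + ProductID), a combined key column may be needed on
-- both tables first to support the relationship, e.g.:
Key = 'Online Fact'[OrderID] & "|" & 'Online Fact'[ProductID]
```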

 

I have simplified the amount of work done to make this huge difference in refresh time (30 minutes to 4 minutes), but I believe the thing to take away from here is: don’t try to do everything in Power Query.

 

Please feel free to comment on why you think the performance has increased 😄

 

Laz

 

 

 


7 REPLIES

Hi,

I am also facing the same issue: there is no huge data and no complex queries, but the initial load takes 10 to 15 minutes. Is there any resolution? Kindly help.

Thanks for sharing, @Anonymous. I'm glad you made it.

 

If the dataset is small, everything is fine. If the dataset is large, we have to keep everything simple. For example, fewer joins or merges, which can double the data and slow down performance.

 

Best Regards,
Dale

Community Support Team _ Dale
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
v-jiascu-msft
Microsoft

Hi Laz,

 

I would suggest you only do "Get data" in the Power Query. Other steps are reasonable to do with DAX. 

1. Since you only use the appended table, you can try the connector Folder that will combine the CSV files at one time. So you don't need to import the individual files.

2. Why did you extract 0 sales in the Power Query? Usually, this should be done with DAX. Either a measure or a calculated column. 

3. If you use DAX instead in item 2 above, the Left Outer Join wouldn't be necessary.

 

Best Regards,
Dale

Anonymous
Not applicable

Hi @v-jiascu-msft,

 

Thank you for the reply.

 

Regarding your notes, I am assuming that when you say I should only do 'Get Data' in Power Query, this means I should not apply any of the transformations I am currently doing (mainly the Merges), as these can be done in DAX?

 

Also coming back to you on your points:

 

  1. Since you only use the appended table, you can try the connector Folder that will combine the CSV files at one time. So you don't need to import the individual files.

I will certainly do this; I believed that appending each CSV individually would have the same performance outcome as using the Folder connector.

 

  2. Why did you extract 0 sales in the Power Query? Usually, this should be done with DAX. Either a measure or a calculated column.

In all honesty, the reason I chose to duplicate ‘Online Fact’, filter all records which have a 0 Sales Value, remove all fields apart from the Transaction ID, remove all duplicates and MERGE with the original ‘Online Fact’ was that I did not know how to do this in DAX.

 

I did not know the method to do this in DAX using a measure or column, as the dataset in question goes down to Transaction Item level. The field I am looking to derive is a Flag indicating whether a Transaction ID has at least one Item with a 0 Value. Therefore, this measure or column should operate at the Transaction ID level, whilst using the data from the Transaction Item level.

 

  3. If you use DAX instead in item 2 above, the Left Outer Join wouldn't be necessary.

 

I don’t get this point; I can’t see DAX helping me when I want to pull data from a different source into the ‘Online Fact’ query. For instance, as mentioned, ‘Online Fact’ comes from CSV files, but they are missing the Sales Cost field. To get the Sales Cost I need to write a query against the SQL DB source. Once I get this data into Power BI, I need to pull it into ‘Online Fact’. The way I currently do this is through a LEFT OUTER MERGE. However, performance is an issue, hence I was thinking of simply joining the two tables using Relationships. The only frustrating aspect of this is that I am loading another table into the Model only to get a single field, the Sales Cost.

 

I will be applying these changes and will get back to you and the community with my findings.

Thanks again for the reply.

Laz

Hi Laz,

 

Can you share a little sample? A dummy one is enough.

Regarding the last part, if you don't have a query of "0 sales", you don't need to apply "left outer join" unless you'd like to join other queries. 

 

Best Regards,
Dale

Anonymous
Not applicable

Hi @v-jiascu-msft

 

Looking to resolve this and will get back with an update 🙂

 

Thanks,

Laz
