Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Hey All.
It would be great to have some explanation regarding the performance of the two Dataflow connectors and how they differ technically.
I have been doing some testing with semi large Dataflows (35,000,000 rows) in Power BI desktop and have concluded that the PowerBI.Dataflows connector is much faster than the PowerPlatform.Dataflows connector, the former finishing 80-85% quicker than the latter when using the refresh button.
For context in Power Query:
• After the navigation step the rows are filtered by a 0/1 column to exlude some data.
• Then I use a remove other columns step to reduce the columns from 26 to 5.
• Then a replace value step is done to convert a 1-5 column into text results i.e 1 = Red, 2 = Blue etc.
• Then a grouped rows step is apllied to reduce the rows from 35,000,000 to 2,500,000.
• Then a conditional column is added which acts as a numerical sort column for another column.
• Finally a changed type step to ensure all the data is in the right format.
Both queries are identical bar the initial connector, I would love to know why there is such a performance different as I am very reluctant to move my company over to the new connector when the performance is so poor.
After continued testing using identical query logic (after the navigation step) I can confirm the Power Platform Dataflow Connector is much slower than the Power BI Dataflow Connector, in some instances by 80%.
Below is my code (removed the navigation step ID's).
let
Source = PowerPlatform.Dataflows(null),
#"Bookings Detail" = *NAVIGATION STEPS*,
#"Filtered Rows" = Table.SelectRows(#"Bookings Detail", each ([Closure Date Filter] = 0)),
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Centre Code", "Booking Date", "Booking Type", "Room Type", "Bookings"}),
#"Replaced Value" = Table.ReplaceValue(#"Removed Other Columns",each [Room Type], each if [Room Type] = 1 then "Nursery" else if [Room Type] = 2 then "Toddler" else if [Room Type] = 3 then "Junior Kindy" else if [Room Type] = 4 then "Kindy" else if [Room Type] = 5 then "Pre-school" else if [Room Type] = 6 then "Before School Care" else if [Room Type] = 7 then "After School Care" else if [Room Type] = 8 then "Vacation Care" else if [Room Type] = 9 then "Before/After School Care" else null,Replacer.ReplaceValue,{"Room Type"}),
#"Grouped Rows" = Table.Group(#"Replaced Value", {"Centre Code", "Booking Date", "Booking Type", "Room Type"}, {{"Bookings", each List.Sum([Bookings]), type nullable number}}),
#"Added Conditional Column" = Table.AddColumn(#"Grouped Rows", "Room Type Sort", each if [Room Type] = "Toddler" then 2 else if [Room Type] = "Kindy" then 4 else if [Room Type] = "Pre-school" then 5 else if [Room Type] = "Nursery" then 1 else if [Room Type] = "Junior Kindy" then 3 else if [Room Type] = "After School Care" then 7 else if [Room Type] = "Before School Care" then 6 else if [Room Type] = "Vacation Care" then 9 else if [Room Type] = null then 10 else if [Room Type] = "Before/After School Care" then 8 else null, type number),
#"Changed Type" = Table.TransformColumnTypes(#"Added Conditional Column",{{"Centre Code", type text}, {"Booking Date", type date}, {"Booking Type", type text}, {"Room Type", type text}, {"Bookings", Int64.Type}, {"Room Type Sort", Int64.Type}})
in
#"Changed Type"
It appears the Power Platform Dataflow Connector loads the data by rows whereas the Power BI Dataflow Connector loads it by data and is much faster. It is worth mentioning I have a gigabit 1000/50 internet connection so I can download from Dataflows very fast.
It is very dissapointing that Microsoft have made the Power Platform Dataflow the default as clearly the performance is not as good.
It would be great to have an explanation as to why there is a big performance difference.
Thanks for your reply however I am not really sure that answers anything, I have been using PowerBI Dataflows since inception so know them pretty well.
What I am trying to understand is why they are so much slower, at least in my case.
Hey JP,
Did you ever get to the bottom of this? Would also like to understand exactly what the difference and why between these two?
Would love to hear from the product team for their insights.
-Aaron
Hi @JP8991 ,
Here are several relationships between Power Platform dataflows and Power BI dataflows. Please refer to
Power Platform dataflows - Power Platform Release Plan | Microsoft Learn
If you want to know more about Power Platform dataflows, there is a document for your reference.
Create and use dataflows in Microsoft Power Platform - Power Query | Microsoft Learn
Actually, Power Platform dataflow will be easier and faster according to the documentation.
Best Regards,
Stephen Tao
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Check out the September 2024 Power BI update to learn about new features.
Learn from experts, get hands-on experience, and win awesome prizes.
User | Count |
---|---|
71 | |
63 | |
40 | |
28 | |
17 |