Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started

Reply
MKDK
Frequent Visitor

Power BI Desktop import speed from Power BI Dataflows (ADLS2)

Question What affects the import speed from Power BI Dataflows to Power BI Desktop?

Summary

In some cases the import speed is consistently fast and in other cases slow. Together with MS Support Team (2111090050002523) and member from PQ Team, we had a hypothesis that the number of rows affects the import speed due to CSV processing needed on the ADLS2 side. Throughout 48 test cases I could not confirm this hypothesis. 

Notes

  • High speed = Task Manager showing high LAN utilization and speed at max of available internet connection
  • Slow speed = around 35 Mbits
  • Initial tests were on workspace with default Dataflow storage. Afterwards in workspace with attached ADLS2, so I could see the entity CSV size.
  • Real Dataflow names were replaced with generic names in the test case overview table

Steps

  • In the Premium workspaces with default and ADLS2 storage I have generated multiple dataflows and entities with various number of records, columns and column data types
  • In PBI Desktop I connected to these DFLs with Power BI Dataflow connector and imported data. There are no transformation steps apart from the Source & Navigation steps.
  • After import, I have saved > closed PBI Desktop. Each test case is single pbix containing 1 table which represents the Dataflow entity.

Findings

  • There are dataflows that are consistently importing with high speed:
    • TestLog Index 2, 8, 26, 31 relate to DFL3 where case 26 and 31 have 9M and 49M records respectively. The underlying ADLS2 CSVs are 6,5GB and  28GB in size respectively.
  • There are dataflows that are consistently importing with lower speed:
    • TestLog Index 3, 33, 34, 35, 36 relate to DFL DFL4 where the original entity had 65M (7,2GB) rows and I have created subsets with 5M (0,5GB), 10M (1,1GB), 20M (2,1GB) and 40M (3,2GB) respectively. The speed was consistent at each subset and was not much faster than the full record set.
  • Considering that DFL3 is wide table with mostly text columns and DFL4 is a narrow with mostly numeric columns, I got couple of ideas:
    • DataType idea: I have created DFL4_Various Sizes AsText where numeric column data types are set to text. There was minor increase in import speed form 50Mbit to 70Mbit. (TestLog Index 44-48)
    • Table width idea: I have created DFL7_Wide where I duplicated the columns so total column count is 30. There was no change in import speed. (TestLog Index 42-43)
  • Based on information in SQLBI Course Optimizing DAX, module VertiPaq Storage Engine and the book Definitive Guide to DAX, chapter The VertiPaq Engine in DAX, I thought that import speed could be related to the data structure (column cardinality and number of columns). Unfortunately I was not able to find correlation between the information in the video course & the book, and my TestLog results.
  • Additionally, reading the MS Docs Data reduction techniques for Import modeling did not bring more clarity into this topic

 

Conclusion

The import speed is likely affected by the data structure rather than number of records as shown in the TestLog but based on the information available I was not able to find clear correlation.

 

Test Log

Table contains only subset of test cases referenced above due to limitation of number of characters in a post (20k incl HTML tags).

 

 

TestLog IndexDFL NameEntity NameDevicePBI Desktop Avg Download SpeedRow Count (M)No of ColumnsDFL size in ADLS (GB)Tested OnWorkspace StoragePBI Desktop Version
2DFL3DFL3_Entity1User 1170932 09/11/2021DefaultOct 2021
3DFL4DFL4_Entity1User 1356514 09/11/2021DefaultOct 2021
8DFL3DFL3_Entity1VDI20932 11/11/2021DefaultOct 2021
26DFL3DFL3_Entity1User 11709326,510/02/2022ADLS2Dec 2021
31DFL3DFL3_Entity1_4xUser 117049322828/03/2022ADLS2Mar 2022
33DFL4_Various SizesDFL4_Entity1_5MUser 1505140,528/03/2022ADLS2Mar 2022
34DFL4_Various SizesDFL4_Entity1_10MUser 15010141,128/03/2022ADLS2Mar 2022
35DFL4_Various SizesDFL4_Entity1_20MUser 15020142,131/03/2022ADLS2Mar 2022
36DFL4_Various SizesDFL4_Entity1_40MUser 15040143,231/03/2022ADLS2Mar 2022
42DFL7_WideDFL7_Entity1Wide-1xUser 1352,2300,328/03/2022ADLS2Mar 2022
43DFL7_WideDFL7_Entity1Wide-5xUser 13511301,428/03/2022ADLS2Mar 2022
44DFL4_Various Sizes AsTextDFL4_Entity1_AsText_1MUser 11701140,130/03/2022ADLS2Mar 2022
45DFL4_Various Sizes AsTextDFL4_Entity1_AsText_5MUser 1705140,530/03/2022ADLS2Mar 2022
46DFL4_Various Sizes AsTextDFL4_Entity1_AsText_10MUser 17010141,130/03/2022ADLS2Mar 2022
47DFL4_Various Sizes AsTextDFL4_Entity1_AsText_20MUser 17020142,131/03/2022ADLS2Mar 2022
48DFL4_Various Sizes AsTextDFL4_Entity1_AsText_40MUser 17040144,231/03/2022ADLS2Mar 2022

User speed info

 

UserInternet SpeedAzure Speed Test to preferred datacenter
User 1160 Mbps72 ms
VDI690 Mbps41 ms
User 291 Mbps52 ms

 

Looking forward to any suggestions and ideas that could shed more light on the import speed.

3 REPLIES 3
lbendlin
Super User
Super User

Dataflows are glorified CSV files (or Parquet if you want).  My money is either on some weird column conversions in some of your dataflows, or a non-deterministic behavior of your Premium Capacity.  Microsoft say that these capacities are isolated, but there is a nagging feeling that they are still using some type of colo and that you can be impacted by someone else's activities.

 

Note that the "Direct Query for Dataflows"  nonsense introduces another layer of complexity - not sure if that is applicable in your scenario.

MKDK
Frequent Visitor

Hi Ibendlin,
I love your term "glorified CVS files", I have certainly not seen this term anywhere in Microsoft marketing materials.
In our organization we have been using PBI Dataflows since the day we rolled-out Power BI, that was approx. 3 years ago. The main selling point of DFL was the simplicity of provisioning data to our users from various systems and so far with mostly positive feedback. We currently have around 1500 dataflows across our capacities.
Without having the deep technical knowledge, I would guess that there should not be performance effect of our Gen2 capacities in the import speed from dataflows. Import in this case is simple read operation on the ADLS side. 
As you have suggested, "...some weird column conversions in some of your dataflows..." is also where my money is, but I would like to understand more about this behaviour, which was the reason to post here - to fish ideas from others 🙂

If you store your dataflow results into a dedicated ADLG2 instance you'll see that they are indeed snapshotted CSVs with no header information.  The column names and types are found in the Common Data Model file of model.json at the top level of the data flow.

 

If I download one of our larger dataflow CSVs (2.55 GiB) using Microsoft Azure Storage Explorer it speeds along at an average of ~700 Mbps which is significantly better than the ~20Mbps we see when pulling in the data via PowerBI Desktop.  🙄

Helpful resources

Announcements
July 2024 Power BI Update

Power BI Monthly Update - July 2024

Check out the July 2024 Power BI update to learn about new features.

July Newsletter

Fabric Community Update - July 2024

Find out what's new and trending in the Fabric Community.