Explore and share Fabric Notebooks to boost Power BI insights in the new community notebooks gallery.
Check it out now!Microsoft is giving away 50,000 FREE Microsoft Certification exam vouchers. Get Fabric certified for FREE! Learn more
In some cases the import speed is consistently fast and in other cases slow. Together with MS Support Team (2111090050002523) and member from PQ Team, we had a hypothesis that the number of rows affects the import speed due to CSV processing needed on the ADLS2 side. Throughout 48 test cases I could not confirm this hypothesis.
The import speed is likely affected by the data structure rather than number of records as shown in the TestLog but based on the information available I was not able to find clear correlation.
Table contains only subset of test cases referenced above due to limitation of number of characters in a post (20k incl HTML tags).
TestLog Index | DFL Name | Entity Name | Device | PBI Desktop Avg Download Speed | Row Count (M) | No of Columns | DFL size in ADLS (GB) | Tested On | Workspace Storage | PBI Desktop Version |
2 | DFL3 | DFL3_Entity1 | User 1 | 170 | 9 | 32 | 09/11/2021 | Default | Oct 2021 | |
3 | DFL4 | DFL4_Entity1 | User 1 | 35 | 65 | 14 | 09/11/2021 | Default | Oct 2021 | |
8 | DFL3 | DFL3_Entity1 | VDI | 20 | 9 | 32 | 11/11/2021 | Default | Oct 2021 | |
26 | DFL3 | DFL3_Entity1 | User 1 | 170 | 9 | 32 | 6,5 | 10/02/2022 | ADLS2 | Dec 2021 |
31 | DFL3 | DFL3_Entity1_4x | User 1 | 170 | 49 | 32 | 28 | 28/03/2022 | ADLS2 | Mar 2022 |
33 | DFL4_Various Sizes | DFL4_Entity1_5M | User 1 | 50 | 5 | 14 | 0,5 | 28/03/2022 | ADLS2 | Mar 2022 |
34 | DFL4_Various Sizes | DFL4_Entity1_10M | User 1 | 50 | 10 | 14 | 1,1 | 28/03/2022 | ADLS2 | Mar 2022 |
35 | DFL4_Various Sizes | DFL4_Entity1_20M | User 1 | 50 | 20 | 14 | 2,1 | 31/03/2022 | ADLS2 | Mar 2022 |
36 | DFL4_Various Sizes | DFL4_Entity1_40M | User 1 | 50 | 40 | 14 | 3,2 | 31/03/2022 | ADLS2 | Mar 2022 |
42 | DFL7_Wide | DFL7_Entity1Wide-1x | User 1 | 35 | 2,2 | 30 | 0,3 | 28/03/2022 | ADLS2 | Mar 2022 |
43 | DFL7_Wide | DFL7_Entity1Wide-5x | User 1 | 35 | 11 | 30 | 1,4 | 28/03/2022 | ADLS2 | Mar 2022 |
44 | DFL4_Various Sizes AsText | DFL4_Entity1_AsText_1M | User 1 | 170 | 1 | 14 | 0,1 | 30/03/2022 | ADLS2 | Mar 2022 |
45 | DFL4_Various Sizes AsText | DFL4_Entity1_AsText_5M | User 1 | 70 | 5 | 14 | 0,5 | 30/03/2022 | ADLS2 | Mar 2022 |
46 | DFL4_Various Sizes AsText | DFL4_Entity1_AsText_10M | User 1 | 70 | 10 | 14 | 1,1 | 30/03/2022 | ADLS2 | Mar 2022 |
47 | DFL4_Various Sizes AsText | DFL4_Entity1_AsText_20M | User 1 | 70 | 20 | 14 | 2,1 | 31/03/2022 | ADLS2 | Mar 2022 |
48 | DFL4_Various Sizes AsText | DFL4_Entity1_AsText_40M | User 1 | 70 | 40 | 14 | 4,2 | 31/03/2022 | ADLS2 | Mar 2022 |
User | Internet Speed | Azure Speed Test to preferred datacenter |
User 1 | 160 Mbps | 72 ms |
VDI | 690 Mbps | 41 ms |
User 2 | 91 Mbps | 52 ms |
Looking forward to any suggestions and ideas that could shed more light on the import speed.
Dataflows are glorified CSV files (or Parquet if you want). My money is either on some weird column conversions in some of your dataflows, or a non-deterministic behavior of your Premium Capacity. Microsoft say that these capacities are isolated, but there is a nagging feeling that they are still using some type of colo and that you can be impacted by someone else's activities.
Note that the "Direct Query for Dataflows" nonsense introduces another layer of complexity - not sure if that is applicable in your scenario.
Hi Ibendlin,
I love your term "glorified CVS files", I have certainly not seen this term anywhere in Microsoft marketing materials.
In our organization we have been using PBI Dataflows since the day we rolled-out Power BI, that was approx. 3 years ago. The main selling point of DFL was the simplicity of provisioning data to our users from various systems and so far with mostly positive feedback. We currently have around 1500 dataflows across our capacities.
Without having the deep technical knowledge, I would guess that there should not be performance effect of our Gen2 capacities in the import speed from dataflows. Import in this case is simple read operation on the ADLS side.
As you have suggested, "...some weird column conversions in some of your dataflows..." is also where my money is, but I would like to understand more about this behaviour, which was the reason to post here - to fish ideas from others 🙂
If you store your dataflow results into a dedicated ADLG2 instance you'll see that they are indeed snapshotted CSVs with no header information. The column names and types are found in the Common Data Model file of model.json at the top level of the data flow.
If I download one of our larger dataflow CSVs (2.55 GiB) using Microsoft Azure Storage Explorer it speeds along at an average of ~700 Mbps which is significantly better than the ~20Mbps we see when pulling in the data via PowerBI Desktop. 🙄