Question What affects the import speed from Power BI Dataflows to Power BI Desktop?

MKDK · ‎05-01-2022

Question What affects the import speed from Power BI Dataflows to Power BI Desktop?

Summary

In some cases the import speed is consistently fast and in other cases slow. Together with MS Support Team (2111090050002523) and member from PQ Team, we had a hypothesis that the number of rows affects the import speed due to CSV processing needed on the ADLS2 side. Throughout 48 test cases I could not confirm this hypothesis.

Notes

High speed = Task Manager showing high LAN utilization and speed at max of available internet connection
Slow speed = around 35 Mbits
Initial tests were on workspace with default Dataflow storage. Afterwards in workspace with attached ADLS2, so I could see the entity CSV size.
Real Dataflow names were replaced with generic names in the test case overview table

Steps

In the Premium workspaces with default and ADLS2 storage I have generated multiple dataflows and entities with various number of records, columns and column data types
In PBI Desktop I connected to these DFLs with Power BI Dataflow connector and imported data. There are no transformation steps apart from the Source & Navigation steps.
After import, I have saved > closed PBI Desktop. Each test case is single pbix containing 1 table which represents the Dataflow entity.

Findings

There are dataflows that are consistently importing with high speed:

TestLog Index 2, 8, 26, 31 relate to DFL3 where case 26 and 31 have 9M and 49M records respectively. The underlying ADLS2 CSVs are 6,5GB and 28GB in size respectively.

There are dataflows that are consistently importing with lower speed:

TestLog Index 3, 33, 34, 35, 36 relate to DFL DFL4 where the original entity had 65M (7,2GB) rows and I have created subsets with 5M (0,5GB), 10M (1,1GB), 20M (2,1GB) and 40M (3,2GB) respectively. The speed was consistent at each subset and was not much faster than the full record set.

Considering that DFL3 is wide table with mostly text columns and DFL4 is a narrow with mostly numeric columns, I got couple of ideas:

DataType idea: I have created DFL4_Various Sizes AsText where numeric column data types are set to text. There was minor increase in import speed form 50Mbit to 70Mbit. (TestLog Index 44-48)
Table width idea: I have created DFL7_Wide where I duplicated the columns so total column count is 30. There was no change in import speed. (TestLog Index 42-43)

Based on information in SQLBI Course Optimizing DAX, module VertiPaq Storage Engine and the book Definitive Guide to DAX, chapter The VertiPaq Engine in DAX, I thought that import speed could be related to the data structure (column cardinality and number of columns). Unfortunately I was not able to find correlation between the information in the video course & the book, and my TestLog results.
Additionally, reading the MS Docs Data reduction techniques for Import modeling did not bring more clarity into this topic

Conclusion

The import speed is likely affected by the data structure rather than number of records as shown in the TestLog but based on the information available I was not able to find clear correlation.

Test Log

Table contains only subset of test cases referenced above due to limitation of number of characters in a post (20k incl HTML tags).

TestLog Index	DFL Name	Entity Name	Device	PBI Desktop Avg Download Speed	Row Count (M)	No of Columns	DFL size in ADLS (GB)	Tested On	Workspace Storage	PBI Desktop Version
2	DFL3	DFL3_Entity1	User 1	170	9	32		09/11/2021	Default	Oct 2021
3	DFL4	DFL4_Entity1	User 1	35	65	14		09/11/2021	Default	Oct 2021
8	DFL3	DFL3_Entity1	VDI	20	9	32		11/11/2021	Default	Oct 2021
26	DFL3	DFL3_Entity1	User 1	170	9	32	6,5	10/02/2022	ADLS2	Dec 2021
31	DFL3	DFL3_Entity1_4x	User 1	170	49	32	28	28/03/2022	ADLS2	Mar 2022
33	DFL4_Various Sizes	DFL4_Entity1_5M	User 1	50	5	14	0,5	28/03/2022	ADLS2	Mar 2022
34	DFL4_Various Sizes	DFL4_Entity1_10M	User 1	50	10	14	1,1	28/03/2022	ADLS2	Mar 2022
35	DFL4_Various Sizes	DFL4_Entity1_20M	User 1	50	20	14	2,1	31/03/2022	ADLS2	Mar 2022
36	DFL4_Various Sizes	DFL4_Entity1_40M	User 1	50	40	14	3,2	31/03/2022	ADLS2	Mar 2022
42	DFL7_Wide	DFL7_Entity1Wide-1x	User 1	35	2,2	30	0,3	28/03/2022	ADLS2	Mar 2022
43	DFL7_Wide	DFL7_Entity1Wide-5x	User 1	35	11	30	1,4	28/03/2022	ADLS2	Mar 2022
44	DFL4_Various Sizes AsText	DFL4_Entity1_AsText_1M	User 1	170	1	14	0,1	30/03/2022	ADLS2	Mar 2022
45	DFL4_Various Sizes AsText	DFL4_Entity1_AsText_5M	User 1	70	5	14	0,5	30/03/2022	ADLS2	Mar 2022
46	DFL4_Various Sizes AsText	DFL4_Entity1_AsText_10M	User 1	70	10	14	1,1	30/03/2022	ADLS2	Mar 2022
47	DFL4_Various Sizes AsText	DFL4_Entity1_AsText_20M	User 1	70	20	14	2,1	31/03/2022	ADLS2	Mar 2022
48	DFL4_Various Sizes AsText	DFL4_Entity1_AsText_40M	User 1	70	40	14	4,2	31/03/2022	ADLS2	Mar 2022

User speed info

User	Internet Speed	Azure Speed Test to preferred datacenter
User 1	160 Mbps	72 ms
VDI	690 Mbps	41 ms
User 2	91 Mbps	52 ms

Looking forward to any suggestions and ideas that could shed more light on the import speed.

lbendlin · ‎05-02-2022

Dataflows are glorified CSV files (or Parquet if you want). My money is either on some weird column conversions in some of your dataflows, or a non-deterministic behavior of your Premium Capacity. Microsoft say that these capacities are isolated, but there is a nagging feeling that they are still using some type of colo and that you can be impacted by someone else's activities.

Note that the "Direct Query for Dataflows" nonsense introduces another layer of complexity - not sure if that is applicable in your scenario.

MKDK · ‎05-09-2022

Hi Ibendlin,
I love your term "glorified CVS files", I have certainly not seen this term anywhere in Microsoft marketing materials.
In our organization we have been using PBI Dataflows since the day we rolled-out Power BI, that was approx. 3 years ago. The main selling point of DFL was the simplicity of provisioning data to our users from various systems and so far with mostly positive feedback. We currently have around 1500 dataflows across our capacities.
Without having the deep technical knowledge, I would guess that there should not be performance effect of our Gen2 capacities in the import speed from dataflows. Import in this case is simple read operation on the ADLS side.
As you have suggested, "...some weird column conversions in some of your dataflows..." is also where my money is, but I would like to understand more about this behaviour, which was the reason to post here - to fish ideas from others 🙂

pelowski · ‎10-18-2022

If you store your dataflow results into a dedicated ADLG2 instance you'll see that they are indeed snapshotted CSVs with no header information. The column names and types are found in the Common Data Model file of model.json at the top level of the data flow.

If I download one of our larger dataflow CSVs (2.55 GiB) using Microsoft Azure Storage Explorer it speeds along at an average of ~700 Mbps which is significantly better than the ~20Mbps we see when pulling in the data via PowerBI Desktop. 🙄

Power BI Desktop import speed from Power BI Dataflows (ADLS2)

Question What affects the import speed from Power BI Dataflows to Power BI Desktop?

Summary

Notes

Steps

Findings

Conclusion

Test Log

User speed info

Helpful resources

Power BI Monthly Update - August 2025

Fabric Community Update - August 2025

How to Get Your Question Answered Quickly

Huge last-minute discounts for FabCon Vienna from September 15-18, 2025

Power BI Desktop import speed from Power BI Dataflows (ADLS2)

Question What affects the import speed from Power BI Dataflows to Power BI Desktop?

Summary

Notes

Steps

Findings

Conclusion

Test Log

User speed info

Helpful resources

Power BI Monthly Update - August 2025

Fabric Community Update - August 2025

How to Get Your Question Answered Quickly