Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now! Learn more

Reply
Anonymous
Not applicable

Tables and queries - is this why my query is reading in so much data?

Hi all, 

 

I need some help understanding if I have the right idea about why I was having problems with really slow power query experience. 

It's kind of a long story but in a nutshell, I was reading in a 250MB CSV file of invoices from a legacy system and aggregating the sales by month and SKU, then generating a list of unique month-product combinations and filling in all the missing combinations so that I could find the true average and standard deviations (0 sales months obviously not being in the invoice report). When working on the data everything slowed and I saw that power query was reading in over 1.7GB of data - which over a VPN was painfully slow.

 

I have fixed my problem and have a working theory what the engine was doing behind the scenes, but I need someone a bit more knowlegable about what the engine is doing behind the scenes to correct my thinking if I have the wrong end of the stick about what was happening. I did a walk through of a simplfied version in order to check my thinking which looked like this:

The approach in a nutshell as you can see from the Queries tab show:

  1. Read in data
  2. Reference Step 1 to make list of unique products
  3. Reference Step 1 again to make a list of unique dates, then cross join with Step 2 to make all combinations of SKU and month.
  4. Reference Step 1 again to group sales by month.
  5. Merge Steps 3 and 4 to left outer join the result to make the final table, and add a special column for the moving annual total using a clever custom function.

My thought is that by using references and merges, I was making multiple queries instead of just making multiple tables and lists in one query where I could user Table.Buffer() to stop repeated reads of the csv. So I went back and did the entire thing in a single block of code, creating the tables and lists all in one query. This fixed my problem, but it wouldn't have been possible to write the query from the GUI as far as I can tell as it involved a cross join and a left join to tables which only appear within the query. 

In case this is unclear, I've written full details of what I think is happening with sample code here:

https://wordpress.com/post/supplychaindatascientistcom.wordpress.com/195

While my problem is fixed, I'd like to get a better idea what was happening inside power query here in terms of amount of data read in when using one query of multiple tables vs. multiple queries of single tables.

Thanks for any help.

Nick 

1 ACCEPTED SOLUTION
mahoneypat
Microsoft Employee
Microsoft Employee

This article and video explain it well.

Chris Webb's BI Blog: Why Does Power BI Query My Data Source More Than Once? Chris Webb's BI Blog (c...

Regards,

Pat





Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.


@mahoneypa HoosierBI on YouTube


View solution in original post

2 REPLIES 2
mahoneypat
Microsoft Employee
Microsoft Employee

This article and video explain it well.

Chris Webb's BI Blog: Why Does Power BI Query My Data Source More Than Once? Chris Webb's BI Blog (c...

Regards,

Pat





Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.


@mahoneypa HoosierBI on YouTube


Anonymous
Not applicable

Thanks. It will take me some time to go through that, but it looks very promising. 

Greatly appreciate you taking the time. 

Helpful resources

Announcements
November Power BI Update Carousel

Power BI Monthly Update - November 2025

Check out the November 2025 Power BI update to learn about new features.

Fabric Data Days Carousel

Fabric Data Days

Advance your Data & AI career with 50 days of live learning, contests, hands-on challenges, study groups & certifications and more!

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.