jb257
Frequent Visitor

Handling huge amounts of data

Dear Community,

 

I would like to know how you would load the following data into Power Query and ultimately into Power BI. I am provided with eight different exports that cannot be changed. The exports share the same column names but contain different values, and the values have to be summed up after they have been loaded. Below is an overview of the data.

 

xlsx files | no. of columns | no. of rows
file 1     | 600            | 35 000
...        | ...            | ...
file 8     | 600            | 35 000

 

Unfortunately, the 15-minute time resolution must be retained, so the number of rows cannot be reduced. The same applies to the number of columns. The Excel files are stored on SharePoint and are only updated once a year.
The transformation of the files is not a problem in itself; the problem is the unacceptable waiting time when performing transformations such as group by.
My current solution is a Python script that takes over the transformations from Power Query.
But is there also a solution that I can implement with Power Query and Power BI alone? Maybe by customizing the import mode of the Excel files or something similar?

 

I look forward to your thoughts

 

Best regards

 

jb

1 ACCEPTED SOLUTION
lbendlin
Super User

I ran into something similar recently. The lessons I learned were:

- Don't try to be cute and use List.Generate or something like that. Create eight (yes, eight) separate Power Query queries, one for each export. This will yield the maximum possible performance (don't forget to set parallelism to 8 or more).

- Use Binary.Buffer when fetching the files from SharePoint, so that each file is pulled into memory in one go instead of being streamed from the server in chunks.

- Do as few transforms as possible. Maybe change column types, but that's it.

- Load all queries into Power BI and do a UNION calculated table there. A sketch of what one of the per-file queries could look like follows below.
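
For illustration, one of the eight per-file queries might look roughly like this (the site URL, file name, sheet name and the "Timestamp" column are placeholders, not details from your actual setup):

let
    Source     = SharePoint.Files("https://yourtenant.sharepoint.com/sites/YourSite", [ApiVersion = 15]),
    FileBinary = Source{[Name = "file 1.xlsx"]}[Content],
    // Binary.Buffer pulls the whole file into memory in one go instead of streaming it in chunks
    Buffered   = Binary.Buffer(FileBinary),
    Workbook   = Excel.Workbook(Buffered, true),
    Data       = Workbook{[Item = "Sheet1", Kind = "Sheet"]}[Data],
    // keep transforms minimal: here only the timestamp column type is set
    Typed      = Table.TransformColumnTypes(Data, {{"Timestamp", type datetime}})
in
    Typed

The other seven queries would differ only in the file name. Once all eight tables are loaded, the consolidation happens in the model, e.g. with a calculated table along the lines of Combined = UNION('file 1', 'file 2', ..., 'file 8'), and any summing can then be done with measures.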


3 REPLIES

Wow, those are some interesting and new findings. Generally you prefer to keep the data model in Power BI small and clear, but evidently the performance of Power BI itself (measures and calculated tables) is much better here. Thank you very much!

It is important to note that this is a corner case. Usually you want code that is more concise and easier to maintain, even if it is a little slower. But in your scenario, where brute force is required, you need to compromise on code maintainability.
