suzukishm
Frequent Visitor

How to deal with timeouts when combining multiple tables read from a SharePoint folder

Hello,

 

I place multiple Excel files in a SharePoint folder, process them with a Power BI service dataflow, and create a data store for reporting. The main reason for using the Power BI service is automatic refresh, which Power BI Desktop cannot provide. The workspace license is Pro.

 

In this situation, when reading and processing the multiple Excel files in the Power BI service, the data volume has grown too large and a timeout occurs. There are about 800 Excel files, totaling roughly 1.5 million rows.

 

Most of the time is spent merging the multiple tables. The current code is as follows:
```
// previous_table: one nested table per Excel file, held in the table_column column
previous_table = ...
// Combine all of the nested tables into a single table
output_table = Table.Combine(previous_table[table_column])
...
```

What would allow me to finish this process within the 10-minute query evaluation limit?

 

1 ACCEPTED SOLUTION

The use of incremental refresh is good in many situations (not just dataflows), but merge steps can be costly for performance, especially if you end up processing all your Excel files multiple times. Do you need to do the merges? Could you have separate tables in your model instead?

 

While developing, the "Keep Top Rows" tip is a good one, so that you only work with data from 2-3 files.
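As a sketch of that tip, assuming the files are listed with SharePoint.Files (the site URL and step names below are illustrative, not from the original post):

```
let
    // Hypothetical SharePoint site URL, for illustration only
    Source = SharePoint.Files("https://contoso.sharepoint.com/sites/Reports", [ApiVersion = 15]),
    // "Keep Top Rows": work with only the first 3 files while developing,
    // then remove this step before the full refresh
    KeptFirstRows = Table.FirstN(Source, 3)
in
    KeptFirstRows
```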

 

Also, see this article: Chris Webb's BI Blog: Excel.Workbook() And The delayTypes Option In Power Query/Power BI.
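For reference, delayTypes is the optional third argument of Excel.Workbook. A hedged sketch of applying it per file (the step and column names here are illustrative assumptions):

```
let
    // "Source" stands for the step that lists the Excel files;
    // [Content] is each file's binary content
    Source = SharePoint.Files("https://contoso.sharepoint.com/sites/Reports", [ApiVersion = 15]),
    // Passing true for delayTypes defers column type detection,
    // which can reduce the work done when opening many workbooks
    AddedData = Table.AddColumn(Source, "Data", each Excel.Workbook([Content], null, true))
in
    AddedData
```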

 

Pat

 

Microsoft Employee


5 REPLIES
ppm1
Solution Sage

Please see this video that shows how to set up incremental refresh with files stored on SharePoint.

https://www.youtube.com/watch?v=IVMdg16yBKE

 

Pat

Microsoft Employee

Thanks @ppm1,

 

I think your idea would be very useful if I were having trouble refreshing the dataflow in the Power BI service. In my case, however, the problem occurs while developing the ETL in Power Query, so the target is a bit different.


I would appreciate it if you could share another idea with me.

suzukishm
Frequent Visitor

Thanks @MAwwad 

 

> Use the "Partition" feature in the Power BI service to break up the data into smaller chunks. You can use the "Partition Size" option in the "Advanced" tab of the Data Store settings to specify the number of rows that should be processed at a time.

 

Would you mind sharing a screenshot of this option? I cannot find it on my screen...

MAwwad
Super User

If you are experiencing a timeout when reading and processing multiple Excel files in the Power BI service, there are a few things you can try to improve performance:

  1. Use the Power Query Editor in Power BI Desktop to optimize the data transformation process. You can use the "Remove Rows" and "Keep Rows" options to remove unnecessary rows, and the "Remove Columns" and "Keep Columns" options to remove unnecessary columns. You can also use the "Filter Rows" and "Sort Rows" options to reduce the amount of data that needs to be processed.

  2. Use the "Partition" feature in the Power BI service to break up the data into smaller chunks. You can use the "Partition Size" option in the "Advanced" tab of the Data Store settings to specify the number of rows that should be processed at a time.

  3. Use the "Index" column in Power Query to optimize the merge process. When you merge tables using the "Table.Combine" function, Power Query will automatically create an "Index" column to identify each row. You can use this column to optimize the merge process by sorting the tables by the "Index" column before merging them.

  4. Consider using a different data source that is better suited for large data sets, such as a SQL Server database or an Azure Data Lake.
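Tip 1 above can be sketched in Power Query M (the table and column names are made up for illustration):

```
let
    // Stand-in for a previously loaded table
    Source = Table.FromRecords({
        [Region = "East", Sales = 100, Notes = "a"],
        [Region = "West", Sales = 0,   Notes = "b"]
    }),
    // Keep only the columns the report actually needs
    KeptColumns = Table.SelectColumns(Source, {"Region", "Sales"}),
    // Remove rows that are not needed downstream
    FilteredRows = Table.SelectRows(KeptColumns, each [Sales] > 0)
in
    FilteredRows
```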
