I am developing a solution that creates a match/mismatch list from two sets of Excel files, and it is working perfectly. However, during refresh the row counts shown climb far beyond the actual data volume, and I am afraid that as the number of Excel files grows, refresh might become a concern.
Below I describe how the queries are designed. It would be difficult to represent the data exactly, but I still need to ask: what are the key terms I should look for to solve this? (A rough M sketch of the join pattern follows the list.)
There are two folders, each containing multiple Excel files. I create a data connection to each folder and combine its files into one query, so in Power Query I end up with two queries, one per folder. I need to find exact matches, partial matches, and mismatches between these two queries.
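Roughly, each folder query is built like this (a simplified sketch of what the folder combine does; the folder path, file extension, and sheet name below are placeholders, not my real ones):

```
let
    // Placeholder path – each folder query points at its own folder
    Source    = Folder.Files("C:\Data\Folder1"),
    ExcelOnly = Table.SelectRows(Source, each Text.EndsWith([Name], ".xlsx")),
    // Pull the "Data" sheet out of every workbook (sheet name is a placeholder)
    AddSheet  = Table.AddColumn(ExcelOnly, "Sheet",
                    each Excel.Workbook([Content]){[Item = "Data", Kind = "Sheet"]}[Data]),
    Promoted  = Table.TransformColumns(AddSheet, {{"Sheet", Table.PromoteHeaders}}),
    // Stack the rows from all files into one table
    Combined  = Table.Combine(Promoted[Sheet])
in
    Combined
```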
Query 1 – Folder 1 (1500 rows, the total across all Excel files in that folder)
Query 2 – Folder 2 (600 rows, the total across all Excel files in that folder)
Query 3 – Merge Query 1 and Query 2 with an inner join to find matches (203 rows)
Query 4 – Merge Query 1 and Query 2 with an outer join to find mismatches (1350 rows)
Query 5 – Merge Query 4 with Query 2 to find a mismatch in one column (7 rows)
Query 6 – Merge Query 4 with Query 2 to find a mismatch in one column (4 rows)
Query 7 – Merge Query 4 with Query 2 to find a mismatch in one column (4 rows)
Query 8 – Merge Query 4 with Query 2 to find a mismatch in one column (1 row)
Query 9 – Append Queries 3, 5, 6, 7 and 8 to build the query for matches and partial matches
Query 10 – Merge Query 9 with Query 1 to find records not in Query 2
Query 11 – Merge Query 9 with Query 2 to find records not in Query 1
Query 12 (final result) – Append Queries 9, 10 and 11 (1960 rows)
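As a minimal sketch, the join pattern I use looks roughly like this ("Key" stands in for the actual matching columns, which I have left out here):

```
// Query 3 – exact matches between the two folders (inner join)
let
    Joined   = Table.NestedJoin(#"Query 1", {"Key"}, #"Query 2", {"Key"}, "Folder2Rows", JoinKind.Inner),
    Expanded = Table.ExpandTableColumn(Joined, "Folder2Rows", {"Key"}, {"Folder2.Key"})
in
    Expanded
```

```
// Query 10 – records of Query 1 with no match in Query 9 (left anti join)
let
    Joined = Table.NestedJoin(#"Query 1", {"Key"}, #"Query 9", {"Key"}, "Matched", JoinKind.LeftAnti),
    Result = Table.RemoveColumns(Joined, {"Matched"})
in
    Result
```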
At refresh time it takes a couple of minutes (with 3 Excel files in each folder), and many evaluation steps are shown in the refresh dialog (you probably know what I am referring to).
I need to find ways to optimize the refresh, or an alternative way to handle this requirement.
I hope I was able to explain the concern.
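Would consolidating the comparison into a single query, with the two folder queries buffered once, be the kind of thing to look at? Just a rough sketch of what I mean ("Key" and the "Status" labels are placeholders, and I understand Table.Buffer holds both tables in memory, so it may not always help):

```
let
    // Evaluate each folder query once and keep the result in memory for the joins below
    Buffered1 = Table.Buffer(#"Query 1"),
    Buffered2 = Table.Buffer(#"Query 2"),

    Matches   = Table.NestedJoin(Buffered1, {"Key"}, Buffered2, {"Key"}, "Q2", JoinKind.Inner),
    OnlyInQ1  = Table.NestedJoin(Buffered1, {"Key"}, Buffered2, {"Key"}, "Q2", JoinKind.LeftAnti),
    OnlyInQ2  = Table.NestedJoin(Buffered2, {"Key"}, Buffered1, {"Key"}, "Q1", JoinKind.LeftAnti),

    // Tag each bucket and stack them into one result table
    Result = Table.Combine({
        Table.AddColumn(Table.RemoveColumns(Matches,  {"Q2"}), "Status", each "Match"),
        Table.AddColumn(Table.RemoveColumns(OnlyInQ1, {"Q2"}), "Status", each "Only in folder 1"),
        Table.AddColumn(Table.RemoveColumns(OnlyInQ2, {"Q1"}), "Status", each "Only in folder 2")
    })
in
    Result
```

(The partial-match checks from Queries 5–8 would still need to be added as further join steps inside the same query.)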
Hi @mahenkj2, later you will probably be looking for a tool that identifies which columns are and are not used in your model.
Check these two links: the first for Power Pivot Utilities (Excel and DAX) and the second for Power BI, covering the DAX query view.
If this helps, accept it as a solution so other members can find it.
Proud to be a Super User!
Hi @mahenkj2, do not import columns that you do not need for your model / queries.
Analyse which columns you need and import only those.
Back up your working Excel and PBIX files and experiment with a reduced number of columns to figure out which ones are not needed.
Maybe the best approach would be to do some of the query work outside Power BI and then import only the Excel files that you think will grow / will be needed in the future.
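For example, a trimming step right after the combine could look like this (just a sketch; the query name and column names below are placeholders for your own):

```
let
    // "Folder1Combined" stands for your combined folder query
    Source  = Folder1Combined,
    // Keep only the columns the match/mismatch logic really uses
    Trimmed = Table.SelectColumns(Source, {"Key", "Name", "Amount"}, MissingField.Ignore)
in
    Trimmed
```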
Proud to be a Super User!
Hi @some_bih
I do have a couple of additional columns kept for possible future use; each has only 1 or 2 unique values. After removing them I see a little improvement in performance. I would still expect there to be some alternative means of reducing the total rows processed during refresh, or some alternative way to handle this kind of application.
The second solution you suggest is not possible in my case.