I am developing a solution that creates a match/mismatch list from two sets of Excel files, and it is working perfectly. However, during refresh the row counts shown climb far beyond the actual data volume, and I am afraid that as the number of Excel files grows, refresh might become a concern.
Below I describe how the queries are designed. It would be difficult to represent the data exactly, but I still need to ask: what are the key terms I should look for to solve this? (A rough M sketch of the join pattern follows the list.)
There are two folders, each containing multiple Excel files. I create a data connection to each folder and combine its files into one query, so in Power Query I end up with two queries, one per folder. I need to find exact matches, partial matches, and mismatches between these two queries.
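Roughly, each folder query is built like this (a simplified sketch of what the folder combine does; the folder path, file extension, and sheet name below are placeholders, not my real ones):

```
let
    // Placeholder path – each folder query points at its own folder
    Source    = Folder.Files("C:\Data\Folder1"),
    ExcelOnly = Table.SelectRows(Source, each Text.EndsWith([Name], ".xlsx")),
    // Pull the "Data" sheet out of every workbook (sheet name is a placeholder)
    AddSheet  = Table.AddColumn(ExcelOnly, "Sheet",
                    each Excel.Workbook([Content]){[Item = "Data", Kind = "Sheet"]}[Data]),
    Promoted  = Table.TransformColumns(AddSheet, {{"Sheet", Table.PromoteHeaders}}),
    // Stack the rows from all files into one table
    Combined  = Table.Combine(Promoted[Sheet])
in
    Combined
```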
Query 1 – Folder 1 (1500 rows, the total across all Excel files in that folder)
Query 2 – Folder 2 (600 rows, the total across all Excel files in that folder)
Query 3 – Merge Query 1 and Query 2 with an inner join to find matches (203 rows)
Query 4 – Merge Query 1 and Query 2 with an outer join to find mismatches (1350 rows)
Query 5 – Merge Query 4 with Query 2 to find a mismatch in one column (7 rows)
Query 6 – Merge Query 4 with Query 2 to find a mismatch in one column (4 rows)
Query 7 – Merge Query 4 with Query 2 to find a mismatch in one column (4 rows)
Query 8 – Merge Query 4 with Query 2 to find a mismatch in one column (1 row)
Query 9 – Append Queries 3, 5, 6, 7 and 8 to build the query for matches and partial matches
Query 10 – Merge Query 9 with Query 1 to find records not in Query 2
Query 11 – Merge Query 9 with Query 2 to find records not in Query 1
Query 12 (final result) – Append Queries 9, 10 and 11 (1960 rows)
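As a minimal sketch, the join pattern I use looks roughly like this ("Key" stands in for the actual matching columns, which I have left out here):

```
// Query 3 – exact matches between the two folders (inner join)
let
    Joined   = Table.NestedJoin(#"Query 1", {"Key"}, #"Query 2", {"Key"}, "Folder2Rows", JoinKind.Inner),
    Expanded = Table.ExpandTableColumn(Joined, "Folder2Rows", {"Key"}, {"Folder2.Key"})
in
    Expanded
```

```
// Query 10 – records of Query 1 with no match in Query 9 (left anti join)
let
    Joined = Table.NestedJoin(#"Query 1", {"Key"}, #"Query 9", {"Key"}, "Matched", JoinKind.LeftAnti),
    Result = Table.RemoveColumns(Joined, {"Matched"})
in
    Result
```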
At refresh time it takes a couple of minutes (with 3 Excel files in each folder), and many evaluation steps are shown in the refresh dialog (you probably know what I am referring to).
I need to find ways to optimize the refresh, or an alternative way to handle this requirement.
I hope I was able to explain the concern.
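Would consolidating the comparison into a single query, with the two folder queries buffered once, be the kind of thing to look at? Just a rough sketch of what I mean ("Key" and the "Status" labels are placeholders, and I understand Table.Buffer holds both tables in memory, so it may not always help):

```
let
    // Evaluate each folder query once and keep the result in memory for the joins below
    Buffered1 = Table.Buffer(#"Query 1"),
    Buffered2 = Table.Buffer(#"Query 2"),

    Matches   = Table.NestedJoin(Buffered1, {"Key"}, Buffered2, {"Key"}, "Q2", JoinKind.Inner),
    OnlyInQ1  = Table.NestedJoin(Buffered1, {"Key"}, Buffered2, {"Key"}, "Q2", JoinKind.LeftAnti),
    OnlyInQ2  = Table.NestedJoin(Buffered2, {"Key"}, Buffered1, {"Key"}, "Q1", JoinKind.LeftAnti),

    // Tag each bucket and stack them into one result table
    Result = Table.Combine({
        Table.AddColumn(Table.RemoveColumns(Matches,  {"Q2"}), "Status", each "Match"),
        Table.AddColumn(Table.RemoveColumns(OnlyInQ1, {"Q2"}), "Status", each "Only in folder 1"),
        Table.AddColumn(Table.RemoveColumns(OnlyInQ2, {"Q1"}), "Status", each "Only in folder 2")
    })
in
    Result
```

(The partial-match checks from Queries 5–8 would still need to be added as further join steps inside the same query.)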
Hi @mahenkj2, later you will probably be looking for a tool that identifies which columns are and are not used in your model.
Check these two links: the first for Power Pivot Utilities (Excel and DAX) and the second for Power BI, covering the DAX query view.
If this helps, accept it as a solution so other members can find it.
Proud to be a Super User!
Hi @mahenkj2, do not import columns that you do not need for your model / queries.
Analyse which columns you need and import only those.
Back up your working Excel and PBIX files and experiment with a reduced number of columns to figure out which ones are not needed.
Maybe the best approach would be to do some of the query work outside Power BI and then import only the Excel files that you think will grow / will be needed in the future.
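For example, a trimming step right after the combine could look like this (just a sketch; the query name and column names below are placeholders for your own):

```
let
    // "Folder1Combined" stands for your combined folder query
    Source  = Folder1Combined,
    // Keep only the columns the match/mismatch logic really uses
    Trimmed = Table.SelectColumns(Source, {"Key", "Name", "Amount"}, MissingField.Ignore)
in
    Trimmed
```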
Proud to be a Super User!
Hi @some_bih
I do have a couple of additional columns kept for possible future use; each has only 1 or 2 unique values. After removing them I see a little improvement in performance. I would still expect there to be some alternative means of reducing the total rows processed during refresh, or some alternative way to handle this kind of application.
The second solution you suggest is not possible in my case.