Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Enhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.

Reply
IOdzelashvili
Frequent Visitor

Optimizing Filtering in Power Query: Large Datasets with 6M+ Loans and 20M+ Balances

Hello there!

Imagine I have two table in Power Query Editor:

  1. Loans, with LoanId, BusinessSegment columns. This table contains over 6 mln rows with unique LoanIds.
  2. LoanBalances with LoanId, YearMonth, BalanceAmount. This table contains over 20 mln rows.

I want to filter the LoanBalances table where the BusinessSegment from the Loans table is "Mass".

 

the Condition is that I need to do this step in Power Query Editor, these are my tables and I cannot edit them before loading them into Power Query Editor.


I have tried many options available on the internet, but they only work by merging these two tables, which is very slow, because of the numbers of rows in both tables.


Could you help me find a better solution for my issue?

 

Thanks in advance.

11 REPLIES 11
Anonymous
Not applicable

Hi,

Thanks for the solution PwerQueryKees  and watkinnc  offered, and i want to offer some more infotmation for user to refet to.

hello @IOdzelashvili , you can refer to the follwing solution.

1.in Loans Table, create a new column to merge the  LoanId, BusinessSegment columns, can name it merged.

2.Then in LoanBalances create a custom column.

=let a=List.Buffer(List.Select(Loans[Merged],each Text.Contains(_,"Mass"))),
b=[LoanId]
in List.Count(List.Select(a,each Text.Contains(_,b)))

Then you can filter the new column that it greater than 0.

 

Best Regards!

Yolo Zhu

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

PwerQueryKees
Super User
Super User

@IOdzelashvili if you got a solution working, please share with some explanation! Power query performance on large datasets is always tricky and I would like to learn from your experience...

Anonymous
Not applicable

A couple more things: I don't know how many results your query gets, but if you're loading them to a page, then you can only print like 2 million some odd rows. Also, is your loan ID column text? If so, that's much slower for that many values.

 

--Nate

PwerQueryKees
Super User
Super User

Joinkinndnis only supported by table.join unfortunately. So if you stick with table.join, you may want to look at the join algorithms: https://powerquery.how/joinalgorithm-type/

PwerQueryKees
Super User
Super User

I agree with @Anonymous .

For even more speed you can use the Table.AddJoinColumn function instead because it does not actually bring any data from the joined table.

And JoinKind.LeftSemi could even add a bit more speed.

Anonymous
Not applicable

I would first filter the loans table like Table.SelectRows(Loans, each [BusinessSegment] = "Mass")

 

Then sort both tables ascending by LoanID (this is critical for

next step). Then, use the non-nested inner join:


Table.Join(PriorStepOrTableName, "LoanId", LoanBalances, "LoanId", JoinKind.Inner, JoinAlgorithm.SortMerge)

 

That should be pretty fast.

 

--Nate

PwerQueryKees
Super User
Super User

Not sure if it wil be any faster though. Just try...

PwerQueryKees
Super User
Super User

I don't have acces to my laptop. But you can write your own selection function for the selectrows of the loan balances table.

 

It would be something like this:

Table.Selectrows(LoanBalances, each Table.IsEmpty(Table.Selectrows(Loans, (L) => L[LoanId] = [LoanId] and L[BusinessSegement] "Mass")) = false)

IOdzelashvili
Frequent Visitor

Merge is doing everything good, just it is too slow and I have to wait sometimes more then 30-40 minute to complete this process.

 

I hope there is easier way to do this, using M code, but if there is not, okay, I will continuous to merge my queries.

Anonymous
Not applicable

I mean if you wanted to build a proper data model, you'd be done right now. Load those tables to the data model,  make a relationship between the tables, filter to your heart's content. Otherwise, you have to join them.

 

--Nate

My goal is not to load too many data and not to increas in size my model.

 

Also, I am creating data model for self-service, not a report, so it is important to be loaded only needed data for my model users.

Helpful resources

Announcements
July 2025 community update carousel

Fabric Community Update - July 2025

Find out what's new and trending in the Fabric community.

July PBI25 Carousel

Power BI Monthly Update - July 2025

Check out the July 2025 Power BI update to learn about new features.