Power BI is turning 10! Tune in for a special live episode on July 24 with behind-the-scenes stories, product evolution highlights, and a sneak peek at what’s in store for the future.
Save the dateEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.
Hello there!
Imagine I have two table in Power Query Editor:
I want to filter the LoanBalances table where the BusinessSegment from the Loans table is "Mass".
the Condition is that I need to do this step in Power Query Editor, these are my tables and I cannot edit them before loading them into Power Query Editor.
I have tried many options available on the internet, but they only work by merging these two tables, which is very slow, because of the numbers of rows in both tables.
Could you help me find a better solution for my issue?
Thanks in advance.
Hi,
Thanks for the solution PwerQueryKees and watkinnc offered, and i want to offer some more infotmation for user to refet to.
hello @IOdzelashvili , you can refer to the follwing solution.
1.in Loans Table, create a new column to merge the LoanId, BusinessSegment columns, can name it merged.
2.Then in LoanBalances create a custom column.
=let a=List.Buffer(List.Select(Loans[Merged],each Text.Contains(_,"Mass"))),
b=[LoanId]
in List.Count(List.Select(a,each Text.Contains(_,b)))
Then you can filter the new column that it greater than 0.
Best Regards!
Yolo Zhu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
@IOdzelashvili if you got a solution working, please share with some explanation! Power query performance on large datasets is always tricky and I would like to learn from your experience...
A couple more things: I don't know how many results your query gets, but if you're loading them to a page, then you can only print like 2 million some odd rows. Also, is your loan ID column text? If so, that's much slower for that many values.
--Nate
Joinkinndnis only supported by table.join unfortunately. So if you stick with table.join, you may want to look at the join algorithms: https://powerquery.how/joinalgorithm-type/
I agree with @Anonymous .
For even more speed you can use the Table.AddJoinColumn function instead because it does not actually bring any data from the joined table.
And JoinKind.LeftSemi could even add a bit more speed.
I would first filter the loans table like Table.SelectRows(Loans, each [BusinessSegment] = "Mass")
Then sort both tables ascending by LoanID (this is critical for
next step). Then, use the non-nested inner join:
Table.Join(PriorStepOrTableName, "LoanId", LoanBalances, "LoanId", JoinKind.Inner, JoinAlgorithm.SortMerge)
That should be pretty fast.
--Nate
Not sure if it wil be any faster though. Just try...
I don't have acces to my laptop. But you can write your own selection function for the selectrows of the loan balances table.
It would be something like this:
Table.Selectrows(LoanBalances, each Table.IsEmpty(Table.Selectrows(Loans, (L) => L[LoanId] = [LoanId] and L[BusinessSegement] "Mass")) = false)
Merge is doing everything good, just it is too slow and I have to wait sometimes more then 30-40 minute to complete this process.
I hope there is easier way to do this, using M code, but if there is not, okay, I will continuous to merge my queries.
I mean if you wanted to build a proper data model, you'd be done right now. Load those tables to the data model, make a relationship between the tables, filter to your heart's content. Otherwise, you have to join them.
--Nate
My goal is not to load too many data and not to increas in size my model.
Also, I am creating data model for self-service, not a report, so it is important to be loaded only needed data for my model users.