Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now

Reply
MarkDGaal
Helper III
Helper III

Steps for Creating Multiple Step Joins on Different Keys

I have two tables, Table 1; always has the full detail:

 

Location | Category 1 | Category 2 | Category 3 | Category 4 | Hours
WAC      | Plane      |Civil       | Recon      | Air        |  1000
WAC      | Plane      | Military   | Patrol     | Land       |  1000
WAC      | Plane      | Civil      |Survey      |Water       |  1000

Table 2 sometime has the full detail and sometimes it does not and never has the "Hours" field:

ID#| Location | Category 1 | Category 2 | Category 3 | Category 4
001| WAC      | Plane      | Civil      | Recon      | Air            
002| WAC      | Plane      | Civil      | Recon      | Air                    
003| WAC      | Plane      | Military   | Patrol     |            
004| WAC      | Plane      | Civil      |            |

*at minimum Table2 always has a match to Table1 based on Location and Category1

 

I want to create a multiple step join in which the first time I merge Table1 & Table2 it uses Location & Category 1-4 to bring over a count of matching ID#'s as well as the ID# itself. Then, the next time I only use Location & Category 1-3. Followed by Location & Category 1-2.... etc For a final output that looks like:

Location | Category 1 | Category 2 | Category 3 | Category 4 | Hours | CountMatchingTable2.ID#| Table2.ID#
WAC | Plane | Civil | Recon | Air | 1000 | 2 | 001 (match on Loc & C1-4) WAC | Plane | Civil | Recon | Air | 1000 | 2 | 002 (match on Loc & C1-4) WAC | Plane | Military | Patrol | Land | 1000 | 1 | 003 (match on Loc & C1-3) WAC | Plane | Civil |Survey |Water | 1000 | 1 | 004 (match on Loc & C1-2)

Thus far I've created Table2 Grouped by Location & Category 1-4, Table 2 grouped by Location & Category 1-3, Table2 Grouped by Location & Category 1-2, etc... Then, I merge (left outer join) Table1 with Table2 on Location and Category1-4. This duplicates the first record in Table1 and ascribes ID# 001 and ID#002 to the records. The 3rd and 4th Records in Table1 are null because Category 4 and 3 do not match respectively.

 

Now my problem is I would like to only do the remaining (progressively less detailed) joins on the unmatched records in Table1. However, I can't figure out how to do that exclusively on the unmatched records and when I try to merge Table1 again with Table2 on Location & Category1-3 I duplicate the already matched (1st and 2nd records) from Table1. The end goal here is to divide Hours by the number of matching records in Table2 and then create a relationship back to Table2 using ID#. 

 

Also if there is a way to do this without having 4 separate queries for Table2 grouped 4 different ways kindly let me know. Thanks in advance.

1 ACCEPTED SOLUTION
ImkeF
Super User
Super User

Somehow this request feels a bit strange, but anyway - here we go:

 

// Table2
let
    Source = YourTable2,
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Location", type text}, {"Category 1", type text}, {"Category 2", type text}, {"Category 3", type text}, {"Category 4", type text}})
in
    #"Changed Type"

// Table1
let
    Source = YourTable1,
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Location", type text}, {"Category 1", type text}, {"Category 2", type text}, {"Category 3", type text}, {"Category 4", type text}, {"Hours", Int64.Type}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
    MatchCat4 = Table.NestedJoin(#"Added Index",{"Location", "Category 1", "Category 2", "Category 3", "Category 4"},DistinctCats,{"Location", "Category 1", "Category 2", "Category 3", "Category 4"},"Cat4",JoinKind.LeftOuter),
    StrikesCat4 = Table.AggregateTableColumn(MatchCat4, "Cat4", {{"First", List.Last, "Cat4"}}),
    MatchCat3 = Table.NestedJoin(StrikesCat4,{"Location", "Category 1", "Category 2", "Category 3"},DistinctCats,{"Location", "Category 1", "Category 2", "Category 3"},"NewColumn",JoinKind.LeftOuter),
    StrikesCat3 = Table.AggregateTableColumn(MatchCat3, "NewColumn", {{"First", List.Last, "Cat3"}}),
    MatchCat2 = Table.NestedJoin(StrikesCat3,{"Location", "Category 1", "Category 2"},DistinctCats,{"Location", "Category 1", "Category 2"},"NewColumn",JoinKind.LeftOuter),
    StrikesCat2 = Table.AggregateTableColumn(MatchCat2, "NewColumn", {{"First", List.Last, "Cat2"}}),
    HighestMatchLevel = Table.AddColumn(StrikesCat2, "HighestMatchLevel", each List.Min({[Cat4],[Cat3],[Cat2]}))
in
    HighestMatchLevel // DistinctCats let Source = Table2, #"Grouped Rows" = Table.Group(Source, {"Location", "Category 1", "Category 2", "Category 3", "Category 4"}, {{"First", each List.Min([ID]), type number}, {"CountIDs", each Table.RowCount(_), type number}}) in #"Grouped Rows" // Result let Source = Table2, MergeDistinctCats = Table.NestedJoin(Source,{"Location", "Category 1", "Category 2", "Category 3", "Category 4"},DistinctCats,{"Location", "Category 1", "Category 2", "Category 3", "Category 4"},"NewColumn",JoinKind.LeftOuter), ShowLookups = Table.ExpandTableColumn(MergeDistinctCats, "NewColumn", {"First", "CountIDs", "AllIDs"}, {"First", "CountIDs", "AllIDs"}), MergeTable1OnID = Table.NestedJoin(ShowLookups,{"First"},Table1,{"HighestMatchLevel"},"NewColumn",JoinKind.LeftOuter), ShowTotalHours = Table.ExpandTableColumn(MergeTable1OnID, "NewColumn", {"Hours"}, {"TotalHours"}), CalculateHoursPerID = Table.AddColumn(ShowTotalHours, "HoursPerID", each [TotalHours]/[CountIDs]) in CalculateHoursPerID

The query-names are given as //... at the beginning of each query

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

View solution in original post

12 REPLIES 12
ImkeF
Super User
Super User

Somehow this request feels a bit strange, but anyway - here we go:

 

// Table2
let
    Source = YourTable2,
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Location", type text}, {"Category 1", type text}, {"Category 2", type text}, {"Category 3", type text}, {"Category 4", type text}})
in
    #"Changed Type"

// Table1
let
    Source = YourTable1,
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Location", type text}, {"Category 1", type text}, {"Category 2", type text}, {"Category 3", type text}, {"Category 4", type text}, {"Hours", Int64.Type}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
    MatchCat4 = Table.NestedJoin(#"Added Index",{"Location", "Category 1", "Category 2", "Category 3", "Category 4"},DistinctCats,{"Location", "Category 1", "Category 2", "Category 3", "Category 4"},"Cat4",JoinKind.LeftOuter),
    StrikesCat4 = Table.AggregateTableColumn(MatchCat4, "Cat4", {{"First", List.Last, "Cat4"}}),
    MatchCat3 = Table.NestedJoin(StrikesCat4,{"Location", "Category 1", "Category 2", "Category 3"},DistinctCats,{"Location", "Category 1", "Category 2", "Category 3"},"NewColumn",JoinKind.LeftOuter),
    StrikesCat3 = Table.AggregateTableColumn(MatchCat3, "NewColumn", {{"First", List.Last, "Cat3"}}),
    MatchCat2 = Table.NestedJoin(StrikesCat3,{"Location", "Category 1", "Category 2"},DistinctCats,{"Location", "Category 1", "Category 2"},"NewColumn",JoinKind.LeftOuter),
    StrikesCat2 = Table.AggregateTableColumn(MatchCat2, "NewColumn", {{"First", List.Last, "Cat2"}}),
    HighestMatchLevel = Table.AddColumn(StrikesCat2, "HighestMatchLevel", each List.Min({[Cat4],[Cat3],[Cat2]}))
in
    HighestMatchLevel // DistinctCats let Source = Table2, #"Grouped Rows" = Table.Group(Source, {"Location", "Category 1", "Category 2", "Category 3", "Category 4"}, {{"First", each List.Min([ID]), type number}, {"CountIDs", each Table.RowCount(_), type number}}) in #"Grouped Rows" // Result let Source = Table2, MergeDistinctCats = Table.NestedJoin(Source,{"Location", "Category 1", "Category 2", "Category 3", "Category 4"},DistinctCats,{"Location", "Category 1", "Category 2", "Category 3", "Category 4"},"NewColumn",JoinKind.LeftOuter), ShowLookups = Table.ExpandTableColumn(MergeDistinctCats, "NewColumn", {"First", "CountIDs", "AllIDs"}, {"First", "CountIDs", "AllIDs"}), MergeTable1OnID = Table.NestedJoin(ShowLookups,{"First"},Table1,{"HighestMatchLevel"},"NewColumn",JoinKind.LeftOuter), ShowTotalHours = Table.ExpandTableColumn(MergeTable1OnID, "NewColumn", {"Hours"}, {"TotalHours"}), CalculateHoursPerID = Table.AddColumn(ShowTotalHours, "HoursPerID", each [TotalHours]/[CountIDs]) in CalculateHoursPerID

The query-names are given as //... at the beginning of each query

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Hummm, okay I've gotten through the code and I think we have a bit of a misunderstanding, let me illustrate with my original tables:

 

Table1 (i've added an Index as you did in your code):

Location | Category 1 | Category 2 | Category 3 | Category 4 | Hours |Index
WAC      | Plane      |Civil       | Recon      | Air        |  1000 |1
WAC      | Plane      | Military   | Patrol     | Land       |  1000 |2
WAC      | Plane      | Civil      |Survey      |Water       |  1000 |3

 

Table2 from my original post:

ID#| Location | Category 1 | Category 2 | Category 3 | Category 4
001| WAC      | Plane      | Civil      | Recon      | Air            
002| WAC      | Plane      | Civil      | Recon      | Air                    
003| WAC      | Plane      | Military   | Patrol     |            
004| WAC      | Plane      | Civil      |            |

When I do the Matching on my most detialed level ID# 001 AND 002 both match with Index #1.

 

 

Accordingly in my final output table (Index added):

Location | Category 1 | Category 2 | Category 3 | Category 4 | Hours | Index| CountMatchingTable2.ID#| Table2.ID#
WAC      | Plane      | Civil      | Recon      | Air        |  1000 | 1    | 2                      | 001 (match on Loc & C1-4)
WAC      | Plane      | Civil      | Recon      | Air        |  1000 | 1    | 2                      | 002 (match on Loc & C1-4) 
WAC      | Plane      | Military   | Patrol     | Land       |  1000 | 2    | 1                      | 003 (match on Loc & C1-3) 
WAC      | Plane      | Civil      |Survey      |Water       |  1000 | 3    | 1                      | 004 (match on Loc & C1-2)

Index 1 has duplicated and 1000 will be divided by 2 matching ID's splitting 500 hours to ID#001 and ID#002. I believe by using "List.Last" in your "Strikes" this sort of duplication will not be possible.....

 

I hear you on this feeling odd, but really all that I am trying to accomplish is making sure that my Hours make it back to the ID#'s at the most detailed levels possible and then not have the hours ever double-counted. I'm open to a different approach if you can think of one...

Hi Mark,

please have a look at my code again. It should do exactly what you've requested (at least at this point): The result-query starts with query 2 (which is in your example the table with the ID's and incomplete Category-allocations).

 

So please check your table names and the according references in your code.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Hmmm... no it's basically not allowing ID# 002 to match back with Table1 only ID#001....

 

Could it be the List.Min in the DistinctCats query? Wouldn't that always just take ID#001 rather than both ID 001 and 002 when the count of IDs is > 1?

This is the result I got:

PBI_MarkDGaalMultipleStepJoins2.png

 

Please check if this is your desired result.

If that’s your desired result, please post your query code so that I can have a look to check.

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Okay, phew finally got my head around this, thanks ImkeF you are correct the code handles the duplication correctly! Thanks for all of your help.

Seems I've hit a snag in the Results query....

 

That query refers to an "ALLIDS" field that is not present in the //Distincats query did you mean to add an "All Rows" Aggregation to the Group By?

Sorry, yes, I did that and later removed it. Just throw it out:

 

ShowLookups = Table.ExpandTableColumn(MergeDistinctCats, "NewColumn", {"First", "CountIDs"}, {"First", "CountIDs"}),
   

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

I've changed the code for the first 2 queries. Now the first "Source"-steps reference the 2 tables which should be provided ("YourTable1" and "YourTable2")

 

So if these tables already exist in your model the easiest way would be to rename them to: "YourTable1" and "YourTable2".

That way these queries will work instantly.

 

Yes, this will create some redundant code and queries. So if you feel confident enough, just clean up afterwards 🙂

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

@ImkeF Thank you, I'm starting through your code now (I like to learn so I'm doing it step-by-step). It appears that at the first Aggregation of the most detailed level this step takes 4-5 mins of processing time (2cores/4threads at 50% utilization):

    #"StrikesCat4" = Table.AggregateTableColumn(#"MatchCat4", "NewColumn", {{"First", List.Last, "Cat4IDMatch"__li__ng_escape____li__ng_escape____li__ng_escape__}__li__ng_escape____li__ng_escape____li__ng_escape__})

 

Table2 is 493 rows and Table1 is 1478 rows (after Grouping, source is 170,000rows)... should it be that processor intensive? I should also add that ID from Table2 is not an integer field but rather a combination of letters and numbers. Also, what does "List.Last" do as that doesn't appear to be a function I can do through the Aggregation GUI?

 

EDIT: Table2 Source is a SharePoint Library Table1 Source is an Excel Doc.

EDIT2: Note Saving the Excel Doc source for Table1 already Grouped (1478rows vs 170,000) did not improve the processor time for Matching&Striking in Table1 

 

 

Sorry, didn't pay attention to the performance at all. This is about finding the right place for the List.Buffer or Table.Buffer basically.

Try it here:

 

// DistinctCats
let
    Source = Table2,
    #"Grouped Rows" = Table.Buffer(Table.Group(Source, {"Location", "Category 1", "Category 2", "Category 3", "Category 4"}, {{"First", each List.Min([ID]), type number}, {"CountIDs", each Table.RowCount(_), type number}}))
in
    #"Grouped Rows"

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Well that helped! All the Matchs&Strikes run in 5secs now as opposed to 50mins. Getting to the Results Table now should be done shortly, will let you know if I have any issues.

 

Helpful resources

Announcements
November Carousel

Fabric Community Update - November 2024

Find out what's new and trending in the Fabric Community.

Live Sessions with Fabric DB

Be one of the first to start using Fabric Databases

Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.

Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early Bird pricing ends December 9th.

Nov PBI Update Carousel

Power BI Monthly Update - November 2024

Check out the November 2024 Power BI update to learn about new features.