The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
I have two tables, Table 1; always has the full detail:
Location | Category 1 | Category 2 | Category 3 | Category 4 | Hours WAC | Plane |Civil | Recon | Air | 1000 WAC | Plane | Military | Patrol | Land | 1000 WAC | Plane | Civil |Survey |Water | 1000
Table 2 sometime has the full detail and sometimes it does not and never has the "Hours" field:
ID#| Location | Category 1 | Category 2 | Category 3 | Category 4 001| WAC | Plane | Civil | Recon | Air 002| WAC | Plane | Civil | Recon | Air 003| WAC | Plane | Military | Patrol | 004| WAC | Plane | Civil | |
*at minimum Table2 always has a match to Table1 based on Location and Category1
I want to create a multiple step join in which the first time I merge Table1 & Table2 it uses Location & Category 1-4 to bring over a count of matching ID#'s as well as the ID# itself. Then, the next time I only use Location & Category 1-3. Followed by Location & Category 1-2.... etc For a final output that looks like:
Location | Category 1 | Category 2 | Category 3 | Category 4 | Hours | CountMatchingTable2.ID#| Table2.ID#
WAC | Plane | Civil | Recon | Air | 1000 | 2 | 001 (match on Loc & C1-4) WAC | Plane | Civil | Recon | Air | 1000 | 2 | 002 (match on Loc & C1-4) WAC | Plane | Military | Patrol | Land | 1000 | 1 | 003 (match on Loc & C1-3) WAC | Plane | Civil |Survey |Water | 1000 | 1 | 004 (match on Loc & C1-2)
Thus far I've created Table2 Grouped by Location & Category 1-4, Table 2 grouped by Location & Category 1-3, Table2 Grouped by Location & Category 1-2, etc... Then, I merge (left outer join) Table1 with Table2 on Location and Category1-4. This duplicates the first record in Table1 and ascribes ID# 001 and ID#002 to the records. The 3rd and 4th Records in Table1 are null because Category 4 and 3 do not match respectively.
Now my problem is I would like to only do the remaining (progressively less detailed) joins on the unmatched records in Table1. However, I can't figure out how to do that exclusively on the unmatched records and when I try to merge Table1 again with Table2 on Location & Category1-3 I duplicate the already matched (1st and 2nd records) from Table1. The end goal here is to divide Hours by the number of matching records in Table2 and then create a relationship back to Table2 using ID#.
Also if there is a way to do this without having 4 separate queries for Table2 grouped 4 different ways kindly let me know. Thanks in advance.
Solved! Go to Solution.
Somehow this request feels a bit strange, but anyway - here we go:
// Table2 let Source = YourTable2, #"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Location", type text}, {"Category 1", type text}, {"Category 2", type text}, {"Category 3", type text}, {"Category 4", type text}}) in #"Changed Type" // Table1 let
Source = YourTable1,
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Location", type text}, {"Category 1", type text}, {"Category 2", type text}, {"Category 3", type text}, {"Category 4", type text}, {"Hours", Int64.Type}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
MatchCat4 = Table.NestedJoin(#"Added Index",{"Location", "Category 1", "Category 2", "Category 3", "Category 4"},DistinctCats,{"Location", "Category 1", "Category 2", "Category 3", "Category 4"},"Cat4",JoinKind.LeftOuter),
StrikesCat4 = Table.AggregateTableColumn(MatchCat4, "Cat4", {{"First", List.Last, "Cat4"}}),
MatchCat3 = Table.NestedJoin(StrikesCat4,{"Location", "Category 1", "Category 2", "Category 3"},DistinctCats,{"Location", "Category 1", "Category 2", "Category 3"},"NewColumn",JoinKind.LeftOuter),
StrikesCat3 = Table.AggregateTableColumn(MatchCat3, "NewColumn", {{"First", List.Last, "Cat3"}}),
MatchCat2 = Table.NestedJoin(StrikesCat3,{"Location", "Category 1", "Category 2"},DistinctCats,{"Location", "Category 1", "Category 2"},"NewColumn",JoinKind.LeftOuter),
StrikesCat2 = Table.AggregateTableColumn(MatchCat2, "NewColumn", {{"First", List.Last, "Cat2"}}),
HighestMatchLevel = Table.AddColumn(StrikesCat2, "HighestMatchLevel", each List.Min({[Cat4],[Cat3],[Cat2]}))
in
HighestMatchLevel // DistinctCats let Source = Table2, #"Grouped Rows" = Table.Group(Source, {"Location", "Category 1", "Category 2", "Category 3", "Category 4"}, {{"First", each List.Min([ID]), type number}, {"CountIDs", each Table.RowCount(_), type number}}) in #"Grouped Rows" // Result let Source = Table2, MergeDistinctCats = Table.NestedJoin(Source,{"Location", "Category 1", "Category 2", "Category 3", "Category 4"},DistinctCats,{"Location", "Category 1", "Category 2", "Category 3", "Category 4"},"NewColumn",JoinKind.LeftOuter), ShowLookups = Table.ExpandTableColumn(MergeDistinctCats, "NewColumn", {"First", "CountIDs", "AllIDs"}, {"First", "CountIDs", "AllIDs"}), MergeTable1OnID = Table.NestedJoin(ShowLookups,{"First"},Table1,{"HighestMatchLevel"},"NewColumn",JoinKind.LeftOuter), ShowTotalHours = Table.ExpandTableColumn(MergeTable1OnID, "NewColumn", {"Hours"}, {"TotalHours"}), CalculateHoursPerID = Table.AddColumn(ShowTotalHours, "HoursPerID", each [TotalHours]/[CountIDs]) in CalculateHoursPerID
The query-names are given as //... at the beginning of each query
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
Somehow this request feels a bit strange, but anyway - here we go:
// Table2 let Source = YourTable2, #"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Location", type text}, {"Category 1", type text}, {"Category 2", type text}, {"Category 3", type text}, {"Category 4", type text}}) in #"Changed Type" // Table1 let
Source = YourTable1,
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Location", type text}, {"Category 1", type text}, {"Category 2", type text}, {"Category 3", type text}, {"Category 4", type text}, {"Hours", Int64.Type}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
MatchCat4 = Table.NestedJoin(#"Added Index",{"Location", "Category 1", "Category 2", "Category 3", "Category 4"},DistinctCats,{"Location", "Category 1", "Category 2", "Category 3", "Category 4"},"Cat4",JoinKind.LeftOuter),
StrikesCat4 = Table.AggregateTableColumn(MatchCat4, "Cat4", {{"First", List.Last, "Cat4"}}),
MatchCat3 = Table.NestedJoin(StrikesCat4,{"Location", "Category 1", "Category 2", "Category 3"},DistinctCats,{"Location", "Category 1", "Category 2", "Category 3"},"NewColumn",JoinKind.LeftOuter),
StrikesCat3 = Table.AggregateTableColumn(MatchCat3, "NewColumn", {{"First", List.Last, "Cat3"}}),
MatchCat2 = Table.NestedJoin(StrikesCat3,{"Location", "Category 1", "Category 2"},DistinctCats,{"Location", "Category 1", "Category 2"},"NewColumn",JoinKind.LeftOuter),
StrikesCat2 = Table.AggregateTableColumn(MatchCat2, "NewColumn", {{"First", List.Last, "Cat2"}}),
HighestMatchLevel = Table.AddColumn(StrikesCat2, "HighestMatchLevel", each List.Min({[Cat4],[Cat3],[Cat2]}))
in
HighestMatchLevel // DistinctCats let Source = Table2, #"Grouped Rows" = Table.Group(Source, {"Location", "Category 1", "Category 2", "Category 3", "Category 4"}, {{"First", each List.Min([ID]), type number}, {"CountIDs", each Table.RowCount(_), type number}}) in #"Grouped Rows" // Result let Source = Table2, MergeDistinctCats = Table.NestedJoin(Source,{"Location", "Category 1", "Category 2", "Category 3", "Category 4"},DistinctCats,{"Location", "Category 1", "Category 2", "Category 3", "Category 4"},"NewColumn",JoinKind.LeftOuter), ShowLookups = Table.ExpandTableColumn(MergeDistinctCats, "NewColumn", {"First", "CountIDs", "AllIDs"}, {"First", "CountIDs", "AllIDs"}), MergeTable1OnID = Table.NestedJoin(ShowLookups,{"First"},Table1,{"HighestMatchLevel"},"NewColumn",JoinKind.LeftOuter), ShowTotalHours = Table.ExpandTableColumn(MergeTable1OnID, "NewColumn", {"Hours"}, {"TotalHours"}), CalculateHoursPerID = Table.AddColumn(ShowTotalHours, "HoursPerID", each [TotalHours]/[CountIDs]) in CalculateHoursPerID
The query-names are given as //... at the beginning of each query
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
Hummm, okay I've gotten through the code and I think we have a bit of a misunderstanding, let me illustrate with my original tables:
Table1 (i've added an Index as you did in your code):
Location | Category 1 | Category 2 | Category 3 | Category 4 | Hours |Index WAC | Plane |Civil | Recon | Air | 1000 |1 WAC | Plane | Military | Patrol | Land | 1000 |2 WAC | Plane | Civil |Survey |Water | 1000 |3
Table2 from my original post:
ID#| Location | Category 1 | Category 2 | Category 3 | Category 4 001| WAC | Plane | Civil | Recon | Air 002| WAC | Plane | Civil | Recon | Air 003| WAC | Plane | Military | Patrol | 004| WAC | Plane | Civil | |
When I do the Matching on my most detialed level ID# 001 AND 002 both match with Index #1.
Accordingly in my final output table (Index added):
Location | Category 1 | Category 2 | Category 3 | Category 4 | Hours | Index| CountMatchingTable2.ID#| Table2.ID# WAC | Plane | Civil | Recon | Air | 1000 | 1 | 2 | 001 (match on Loc & C1-4) WAC | Plane | Civil | Recon | Air | 1000 | 1 | 2 | 002 (match on Loc & C1-4) WAC | Plane | Military | Patrol | Land | 1000 | 2 | 1 | 003 (match on Loc & C1-3) WAC | Plane | Civil |Survey |Water | 1000 | 3 | 1 | 004 (match on Loc & C1-2)
Index 1 has duplicated and 1000 will be divided by 2 matching ID's splitting 500 hours to ID#001 and ID#002. I believe by using "List.Last" in your "Strikes" this sort of duplication will not be possible.....
I hear you on this feeling odd, but really all that I am trying to accomplish is making sure that my Hours make it back to the ID#'s at the most detailed levels possible and then not have the hours ever double-counted. I'm open to a different approach if you can think of one...
Hi Mark,
please have a look at my code again. It should do exactly what you've requested (at least at this point): The result-query starts with query 2 (which is in your example the table with the ID's and incomplete Category-allocations).
So please check your table names and the according references in your code.
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
Hmmm... no it's basically not allowing ID# 002 to match back with Table1 only ID#001....
Could it be the List.Min in the DistinctCats query? Wouldn't that always just take ID#001 rather than both ID 001 and 002 when the count of IDs is > 1?
This is the result I got:
Please check if this is your desired result.
If that’s your desired result, please post your query code so that I can have a look to check.
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
Okay, phew finally got my head around this, thanks ImkeF you are correct the code handles the duplication correctly! Thanks for all of your help.
Seems I've hit a snag in the Results query....
That query refers to an "ALLIDS" field that is not present in the //Distincats query did you mean to add an "All Rows" Aggregation to the Group By?
Sorry, yes, I did that and later removed it. Just throw it out:
ShowLookups = Table.ExpandTableColumn(MergeDistinctCats, "NewColumn", {"First", "CountIDs"}, {"First", "CountIDs"}),
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
I've changed the code for the first 2 queries. Now the first "Source"-steps reference the 2 tables which should be provided ("YourTable1" and "YourTable2")
So if these tables already exist in your model the easiest way would be to rename them to: "YourTable1" and "YourTable2".
That way these queries will work instantly.
Yes, this will create some redundant code and queries. So if you feel confident enough, just clean up afterwards 🙂
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
@ImkeF Thank you, I'm starting through your code now (I like to learn so I'm doing it step-by-step). It appears that at the first Aggregation of the most detailed level this step takes 4-5 mins of processing time (2cores/4threads at 50% utilization):
#"StrikesCat4" = Table.AggregateTableColumn(#"MatchCat4", "NewColumn", {{"First", List.Last, "Cat4IDMatch"__li__ng_escape____li__ng_escape____li__ng_escape__}__li__ng_escape____li__ng_escape____li__ng_escape__})
Table2 is 493 rows and Table1 is 1478 rows (after Grouping, source is 170,000rows)... should it be that processor intensive? I should also add that ID from Table2 is not an integer field but rather a combination of letters and numbers. Also, what does "List.Last" do as that doesn't appear to be a function I can do through the Aggregation GUI?
EDIT: Table2 Source is a SharePoint Library Table1 Source is an Excel Doc.
EDIT2: Note Saving the Excel Doc source for Table1 already Grouped (1478rows vs 170,000) did not improve the processor time for Matching&Striking in Table1
Sorry, didn't pay attention to the performance at all. This is about finding the right place for the List.Buffer or Table.Buffer basically.
Try it here:
// DistinctCats
let
Source = Table2,
#"Grouped Rows" = Table.Buffer(Table.Group(Source, {"Location", "Category 1", "Category 2", "Category 3", "Category 4"}, {{"First", each List.Min([ID]), type number}, {"CountIDs", each Table.RowCount(_), type number}}))
in
#"Grouped Rows"
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
Well that helped! All the Matchs&Strikes run in 5secs now as opposed to 50mins. Getting to the Results Table now should be done shortly, will let you know if I have any issues.
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.
User | Count |
---|---|
108 | |
77 | |
71 | |
48 | |
39 |
User | Count |
---|---|
137 | |
108 | |
69 | |
64 | |
58 |