The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.
I have 2 tables, #"Lead Data1" and #"Enrolment Data", with ~300k and ~16k rows respectively with key (Lead Number). I am using a separate query to pull #"Enrolment Data".
I want to update the #"Lead Data1" columns {"Created On", "Course", "Program", "Business"} from the #"Enrolment Data", retaining the other columns.
If any key (Lead Number) in #"Enrolment Data" is not found in #"Lead Data1", those records need to be appended to it.
I tried the below code:
let
Source = Excel.Workbook(File.Contents("xxx.xlsb"), null, true),
#"Lead Data1" = Source{[Name="Lead Data"]}[Data],
#"Promoted Headers" = Table.PromoteHeaders(#"Lead Data1", [PromoteAllScalars=true]),
#"Removed Other Columns" = Table.SelectColumns(#"Promoted Headers",{"Created On", "Owner", "Lead Number", "Source", "Source type", "TL", "Course", "Program", "Business"}),
Here I remove all unnecessary columns, these are the ones I need in my output from combining #"Lead Data1" and #"Enrolment Data". I will use #"Removed Other Columns" table again to append non-matched items.
#"Removed Other Columns1" = Table.SelectColumns(#"Removed Other Columns",{"Owner", "Lead Number", "Source", "Source type", "TL"}),
I only keep columns I need to merge to #"Enrolment Data" (which don't have these)
#"Merged Queries" = Table.Join(#"Removed Other Columns1", {"Lead Number"}, #"Enrolment Data", {"Lead Number"}, JoinKind.RightOuter, 2),
I am merging with #"Enrolment Data" and not the other way around because it has much lower number of rows (if that makes a difference). This would give me the "updated" rows, including those with keys not matching #"Lead Data1".
#"Rows to Remove" = Table.Column(#"Merged Queries","Lead Number"),
Once merged, I select the list of matched keys (updated rows) in #"Enrolment Data".
#"Selected Rows" = Table.RemoveMatchingRows(#"Removed Other Columns", each List.Contains(#"Rows to Remove",[#"Lead Number"])),
I remove those rows from #"Lead Data1" (giving "non-updated" rows)
#"Appended Queries" = Table.Combine({#"Merged Queries", #"Selected Rows"})
I append the "Updated" and "Non-Updated" rows.
in #"Appended Queries"
However, the query throws an error of "Data Source not found" - I think this might be because I am referencing #"Removed Other Columns" table wrong. Is there any way around it, or my approach itself is faulty? If so, any solution will be highly appreciated.
Lead Data | ||||||||
Created On | "Owner" | "Lead Number" | "Source" | "Source type" | "TL" | "Course" | "Program" | "Business" |
01/01/2024 | A1 | 123 | B1 | C1 | D1 | E1 | F1 | G1 |
02/01/2024 | A2 | 124 | B2 | C2 | D2 | E2 | F2 | G2 |
03/01/2024 | A3 | 125 | B3 | C3 | D3 | E3 | F3 | G3 |
04/01/2024 | A4 | 126 | B4 | C4 | D4 | E4 | F4 | G4 |
05/01/2024 | A5 | 127 | B5 | C5 | D5 | E5 | F5 | G5 |
06/01/2024 | A6 | 128 | B6 | C6 | D6 | E6 | F6 | G6 |
Enrolment Data | ||||||||
Created On | "Lead Number" | "Course" | "Program" | "Business" | ||||
03/01/2024 | 125 | E7 | F7 | G7 | ||||
04/01/2024 | 126 | E8 | F8 | G8 | ||||
07/01/2024 | ABC | E9 | F9 | G9 | ||||
Output | ||||||||
Created On | "Owner" | "Lead Number" | "Source" | "Source type" | "TL" | "Course" | "Program" | "Business" |
01/01/2024 | A1 | 123 | B1 | C1 | D1 | E1 | F1 | G1 |
02/01/2024 | A2 | 124 | B2 | C2 | D2 | E2 | F2 | G2 |
03/01/2024 | A3 | 125 | B3 | C3 | D3 | E7 | F7 | G7 |
04/01/2024 | A4 | 126 | B4 | C4 | D4 | E8 | F8 | G8 |
05/01/2024 | A5 | 127 | B5 | C5 | D5 | E5 | F5 | G5 |
06/01/2024 | A6 | 128 | B6 | C6 | D6 | E6 | F6 | G6 |
07/01/2024 | null | ABC | null | null | null | E9 | F9 | G9 |
Hello @Anonymous,
Can you please try this streamlined version of your approach:
let
SourceLeadData = Excel.Workbook(File.Contents("xxx.xlsb"), null, true),
LeadData1 = SourceLeadData{[Name="Lead Data"]}[Data],
PromotedHeadersLead = Table.PromoteHeaders(LeadData1, [PromoteAllScalars=true]),
SourceEnrolmentData = Excel.Workbook(File.Contents("yyy.xlsb"), null, true),
EnrolmentData = SourceEnrolmentData{[Name="Enrolment Data"]}[Data],
PromotedHeadersEnrolment = Table.PromoteHeaders(EnrolmentData, [PromoteAllScalars=true]),
MergedQueries = Table.Join(PromotedHeadersLead, "Lead Number", PromotedHeadersEnrolment, "Lead Number", JoinKind.FullOuter),
CustomUpdated = Table.AddColumn(MergedQueries, "Custom Created On", each if [Created On.1] = null then [Created On] else [Created On.1]),
RemovedIntermediateColumns = Table.RemoveColumns(CustomUpdated,{"Created On", "Created On.1", "Course", "Course.1", "Program", "Program.1", "Business", "Business.1"}),
RenamedColumns = Table.RenameColumns(RemovedIntermediateColumns, {{"Custom Created On", "Created On"}})
// Repeat the renaming for other custom columns as needed
in
RenamedColumns
I have tried this solution, and it works, but feels very inefficient as the operations have to done on all 300k rows. It takes a really long time, and further increases if I need to update more columns (I removed a few here for simplicity). I was hoping there was a better and efficient way to perform the same operation using merge and append combo.