The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Hi Guys,
I want to get a Master table while combining the data of two different sources in Power Query. The constraints are :
- Data is coming from a sharepoint server and I have absolutly no control over the data shape on either sources
- Unique ID can come from both sources
- The only reliable key between tables is to combine "ID" and "Domain" (from the user point of view)
- All data are usefull / No line can be discarded
Here is an example for the 2 source tables and the output I am looking for.
Table Source 1 :
Key Source 1 | ID code source 1 | Domain source 1 | Specs source 1 | Lead Lead company source 1 | Qualification type source 1 |
ID 1/D1 | ID 1 | D1 | Spec#1; Spec#2;Spec 5 | .Ltd | First occurence |
ID 2/D2 | ID 2 | D2 | Spec#3;Spec#10;Spec#11 | .Gmbh | Renewal |
ID 3/D3 | ID 3 | D3 | Spec#50;Spec#18 | .Ltd | Under sampling |
ID 7/D1 | ID 7 | D1 | Spec#1 | .SARL | Renewal |
ID 10/D2 | ID 10 | D2 | Spec#10;Spec#11 | .SA | Under sampling |
ID 11/D4 | ID 11 | D4 | Spec#20;Spec#23;Spec#28 | .Gmbh | First occurence |
ID 15/D4 | ID 15 | D4 | Spec#;Spec#28 | .Ltd | Under sampling |
Table Source 2 :
Key source 2 | ID code source 2 | Domain source 2 | Sampling level source 2 | Leac company source 2 | Impacted companies source 2 |
ID 2/D1 | ID 2 | D1 | Light | .Gmbh | .Ltd |
ID 4/D3 | ID 4 | D3 | Nominal | .SA | .Ltd |
ID 10/D4 | ID 10 | D4 | Nominal | .Ltd | .Gmbh;.SA |
ID 15/D4 | ID 15 | D4 | Heavy | .Ltd | .SARL;.SA |
ID 17/D2 | ID 17 | D2 | Light | .SARL | .Gmbh |
ID 3/D3 | ID 3 | D3 | Heavy | .Ltd | .SA |
ID 4/D5 | ID 4 | D5 | Heavy | .Gmbh | .Ltd |
Master table / Target output :
Key Master | ID Master | Domain Master | Lead company 1 | Lead Company 2 | Impacted company | Qualification type | Specs | Sampling level |
ID 1/D1 | ID 1 | D1 | .Ltd | First occurence | Spec#1; Spec#2;Spec 5 | |||
ID 10/D2 | ID 10 | D2 | .SA | Under sampling | Spec#10;Spec#11 | |||
ID 10/D4 | ID 10 | D4 | .Ltd | .Gmbh;.SA | Nominal | |||
ID 11/D4 | ID 11 | D4 | .Gmbh | First occurence | Spec#20;Spec#23;Spec#28 | |||
ID 15/D4 | ID 15 | D4 | .Ltd | .Ltd | .SARL;.SA | Under sampling | Spec#;Spec#28 | Heavy |
ID 17/D2 | ID 17 | D2 | .SARL | .Gmbh | Light | |||
ID 2/D1 | ID 2 | D1 | .Gmbh | .Ltd | Light | |||
ID 2/D2 | ID 2 | D2 | .Gmbh | Renewal | Spec#3;Spec#10;Spec#11 | |||
ID 3/D3 | ID 3 | D3 | .Ltd | .Ltd | .SA | Under sampling | Spec#50;Spec#18 | Heavy |
ID 4/D3 | ID 4 | D3 | .SA | .Ltd | Nominal | |||
ID 4/D5 | ID 4 | D5 | .Gmbh | .Ltd | Heavy | |||
ID 7/D1 | ID 7 | D1 | .SARL | Renewal | Spec#1 |
I've tried different topics within the power Bi community, the functions Merge & Append, various relationships modes and creating distinct tables with DAX but each time there is either a loss of data or the KPI are not flexible enough to be exploited properly.
Since there can be several identical values in each columns, lookupvalue is of little help.
Does any of you has an idea how to get the proper table in Power Query ?
Many thanks,
Regards.
Romuald.
Solved! Go to Solution.
Hi @Anonymous
Paste my query in Advanced Editor of a new query
query name: Source2
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W8nRRMNJ3MVTSAbOAFJjtk5meUQKk9dxzkzJAtE9JilKsDli5ib6LMUS5CUg5iO2Xn5uZl5gDUhjsiKbc0EDfxQSi3tAApMEEVQNIKdQia5BumDZThDZTmDaP1MSySiRNwY5BPiiazPVdjKCazEGajJD9AlIN9xNUhzHcM8Ywz2BaguRzUySfmyIrRgmoWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Key source 2" = _t, #"ID code source 2" = _t, #"Domain source 2" = _t, #"Sampling level source 2" = _t, #"Leac company source 2" = _t, #"Impacted companies source 2" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Key source 2", type text}, {"ID code source 2", type text}, {"Domain source 2", type text}, {"Sampling level source 2", type text}, {"Leac company source 2", type text}, {"Impacted companies source 2", type text}})
in
#"Changed Type"
Query name :Source1
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("dZDNCoMwEIRfJdirqIkGBU+CtBQ8KT2JBxtDK6gVtfT1m6zxJ7SeZljYb2cnz41rjLAdY8MEJwR81nN2wiECJaEURMXcSqZKyLkexgm9GHsPvGPcKEwAETsmM0gKeAC44cxzlMoL1qW9P4WmvOOfslkQrh27M0IKeNihy26wpbh1FR/QWLZ9U3ePheCv3/j6N3Ixi9Lk9yh21uDY0ZLrkbPo8CoWJXoKAS16C4IoBFEtkGD3/UGRmG40qtH2jP8tFF8=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Key Source 1" = _t, #"ID code source 1" = _t, #"Domain source 1" = _t, #"Specs source 1" = _t, #"Lead Lead company source 1" = _t, #"Qualification type source 1" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Key Source 1", type text}, {"ID code source 1", type text}, {"Domain source 1", type text}, {"Specs source 1", type text}, {"Lead Lead company source 1", type text}, {"Qualification type source 1", type text}}),
#"Merged Queries" = Table.NestedJoin(#"Changed Type", {"Key Source 1"}, Source2, {"Key source 2"}, "Source2", JoinKind.FullOuter),
#"Expanded Source2" = Table.ExpandTableColumn(#"Merged Queries", "Source2", {"Key source 2", "ID code source 2", "Domain source 2", "Sampling level source 2", "Leac company source 2", "Impacted companies source 2"}, {"Source2.Key source 2", "Source2.ID code source 2", "Source2.Domain source 2", "Source2.Sampling level source 2", "Source2.Leac company source 2", "Source2.Impacted companies source 2"}),
#"Added Conditional Column" = Table.AddColumn(#"Expanded Source2", "Key Master", each if [Key Source 1] = null then [Source2.Key source 2] else [Key Source 1]),
#"Added Conditional Column1" = Table.AddColumn(#"Added Conditional Column", "ID Master", each if [ID code source 1] = null then [Source2.ID code source 2] else [ID code source 1]),
#"Added Conditional Column2" = Table.AddColumn(#"Added Conditional Column1", "Domain Master", each if [Domain source 1] = null then [Source2.Domain source 2] else [Domain source 1]),
#"Removed Columns" = Table.RemoveColumns(#"Added Conditional Column2",{"Key Source 1", "ID code source 1", "Domain source 1", "Source2.Key source 2", "Source2.ID code source 2", "Source2.Domain source 2"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Lead Lead company source 1", "Lead company 1"}, {"Source2.Leac company source 2", "Lead company 2"}, {"Qualification type source 1", "Qualification type"}, {"Source2.Impacted companies source 2", "Impacted companies"}, {"Source2.Sampling level source 2", "Sampling level"}, {"Specs source 1", "Specs"}}),
#"Reordered Columns" = Table.ReorderColumns(#"Renamed Columns",{"Key Master", "ID Master", "Domain Master", "Lead company 1", "Lead company 2", "Impacted companies", "Qualification type", "Specs", "Sampling level"}),
#"Sorted Rows" = Table.Sort(#"Reordered Columns",{{"Key Master", Order.Ascending}})
in
#"Sorted Rows"
Best Regards
Maggie
Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi @Anonymous
Paste my query in Advanced Editor of a new query
query name: Source2
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W8nRRMNJ3MVTSAbOAFJjtk5meUQKk9dxzkzJAtE9JilKsDli5ib6LMUS5CUg5iO2Xn5uZl5gDUhjsiKbc0EDfxQSi3tAApMEEVQNIKdQia5BumDZThDZTmDaP1MSySiRNwY5BPiiazPVdjKCazEGajJD9AlIN9xNUhzHcM8Ywz2BaguRzUySfmyIrRgmoWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Key source 2" = _t, #"ID code source 2" = _t, #"Domain source 2" = _t, #"Sampling level source 2" = _t, #"Leac company source 2" = _t, #"Impacted companies source 2" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Key source 2", type text}, {"ID code source 2", type text}, {"Domain source 2", type text}, {"Sampling level source 2", type text}, {"Leac company source 2", type text}, {"Impacted companies source 2", type text}})
in
#"Changed Type"
Query name :Source1
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("dZDNCoMwEIRfJdirqIkGBU+CtBQ8KT2JBxtDK6gVtfT1m6zxJ7SeZljYb2cnz41rjLAdY8MEJwR81nN2wiECJaEURMXcSqZKyLkexgm9GHsPvGPcKEwAETsmM0gKeAC44cxzlMoL1qW9P4WmvOOfslkQrh27M0IKeNihy26wpbh1FR/QWLZ9U3ePheCv3/j6N3Ixi9Lk9yh21uDY0ZLrkbPo8CoWJXoKAS16C4IoBFEtkGD3/UGRmG40qtH2jP8tFF8=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Key Source 1" = _t, #"ID code source 1" = _t, #"Domain source 1" = _t, #"Specs source 1" = _t, #"Lead Lead company source 1" = _t, #"Qualification type source 1" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Key Source 1", type text}, {"ID code source 1", type text}, {"Domain source 1", type text}, {"Specs source 1", type text}, {"Lead Lead company source 1", type text}, {"Qualification type source 1", type text}}),
#"Merged Queries" = Table.NestedJoin(#"Changed Type", {"Key Source 1"}, Source2, {"Key source 2"}, "Source2", JoinKind.FullOuter),
#"Expanded Source2" = Table.ExpandTableColumn(#"Merged Queries", "Source2", {"Key source 2", "ID code source 2", "Domain source 2", "Sampling level source 2", "Leac company source 2", "Impacted companies source 2"}, {"Source2.Key source 2", "Source2.ID code source 2", "Source2.Domain source 2", "Source2.Sampling level source 2", "Source2.Leac company source 2", "Source2.Impacted companies source 2"}),
#"Added Conditional Column" = Table.AddColumn(#"Expanded Source2", "Key Master", each if [Key Source 1] = null then [Source2.Key source 2] else [Key Source 1]),
#"Added Conditional Column1" = Table.AddColumn(#"Added Conditional Column", "ID Master", each if [ID code source 1] = null then [Source2.ID code source 2] else [ID code source 1]),
#"Added Conditional Column2" = Table.AddColumn(#"Added Conditional Column1", "Domain Master", each if [Domain source 1] = null then [Source2.Domain source 2] else [Domain source 1]),
#"Removed Columns" = Table.RemoveColumns(#"Added Conditional Column2",{"Key Source 1", "ID code source 1", "Domain source 1", "Source2.Key source 2", "Source2.ID code source 2", "Source2.Domain source 2"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Lead Lead company source 1", "Lead company 1"}, {"Source2.Leac company source 2", "Lead company 2"}, {"Qualification type source 1", "Qualification type"}, {"Source2.Impacted companies source 2", "Impacted companies"}, {"Source2.Sampling level source 2", "Sampling level"}, {"Specs source 1", "Specs"}}),
#"Reordered Columns" = Table.ReorderColumns(#"Renamed Columns",{"Key Master", "ID Master", "Domain Master", "Lead company 1", "Lead company 2", "Impacted companies", "Qualification type", "Specs", "Sampling level"}),
#"Sorted Rows" = Table.Sort(#"Reordered Columns",{{"Key Master", Order.Ascending}})
in
#"Sorted Rows"
Best Regards
Maggie
Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi @v-juanli-msft / Maggie,
Thank you very much for that example, it worked like a charm.
It took me a bit of time to deconstructs the steps into my original queries but up to you, I think I get it. Thank you so much !
@artemus I now get what you were saying with the "Merge function", all I needed was more steps to prepare the data and not to perform the Merge as new.
Cheers,
Romuald.
While you *can* do this, to what end will it be useful? How will you visualize it in Power Bi if you don't know what the column names are?
Now if you just want to show the tables in Power Bi, you would need to unpivot the data (unpivot other columns), then do a merge, and use the matrix visual in Power Bi to display the combined data.
Hi @artemus and thanks for your feedback.
I do know the column in advance but I cannot change their name nor the data they display
The point is not to just show the tables in Power Bi but also to establish KPI to help people on their day to day job and others to help clean the data. Hence the constraint where no line can be discarded.
I've previously tried the method you are describing and did it again just now but I still have some data loss : only the lines in common from the two sources are shown in the new table.
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.
Learn from experts, get hands-on experience, and win awesome prizes.
User | Count |
---|---|
180 | |
52 | |
39 | |
27 | |
25 |