The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredCompete to become Power BI Data Viz World Champion! First round ends August 18th. Get started.
Hi everyone,
I have a question regarding best practices for future scalability, where I may need to add more data from additional units down the line. Here's the situation:
I have separate data sources for each unit, and I want to combine data from two units into a single dashboard, with the ability to filter by unit as needed.
The issue is that each unit has its own product dimension table, and the product IDs are different between the units. For example:
My goal is to centralize everything into a single fact table by combining data from both units, and include a column to identify the unit. The challenge I'm facing is that I need to list the product IDs, but they differ between the units. I was considering creating a surrogate key (ID) to standardize this across units. For example:
Would creating a surrogate ID be the best practice in this case? If so, what would be the recommended approach to implement this? Any guidance or suggestions would be greatly appreciated!
Thanks in advance for your help!
Solved! Go to Solution.
If it would have been just diffrenet key names, the solution would have been simple, harmonize each separate data source (you have seperate data source anyway) by renaming.
The question is how do you harmonize the values?
Do you think you can harmonize automatically? Here you treat the productnames as an alternate key. In general I would nor regard that as a best practise though, because names tend to have spelling variations and case variations.
But there may me other filds that can act as alternate key?
If not, you need a translation table.
In general I would make a table with 3 fields:
You want to make it scalable. This is very doable in PowerQuery. What I would do:
And the same for the sales tables (you can add the sales data sources also to the datasource table).
Expanding your solution with new datasources will only need 2 steps:
Hope this helps!
If you would like more help on the actual solution, send some more details on your setup and workflow and share some more realistic sample data and queries you already created.
Hi,
Thanks for the solution PwerQueryKees and ToddChitt offered, and i want to offer some more information for user to refer to.
hello @FelipMark , you can use merge queries in power query.
You can refer the folloing queries.Please create these queries in order.
Dimproduct1
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUQrKTE5VitWJVjICcpyKUhNTwDxjIM83NbFEKTYWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [IDProduct = _t, Description = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"IDProduct", Int64.Type}, {"Description", type text}})
in
#"Changed Type"
dimproduct2
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUfJNTSxRitWJVjICcoIyk1PBHGMgx6koNTFFKTYWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [IDProduct = _t, Description = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"IDProduct", Int64.Type}, {"Description", type text}})
in
#"Changed Type"
Factsalesunit1
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSAeJYnWglIyDLCMwyBrKMlWJjAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [IDSale = _t, IDProduct = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"IDSale", Int64.Type}, {"IDProduct", Int64.Type}}),
#"Merged Queries" = Table.NestedJoin(#"Changed Type", {"IDProduct"}, Query1, {"IDProduct"}, "DimProductUnit1", JoinKind.LeftOuter),
#"Expanded DimProductUnit1" = Table.ExpandTableColumn(#"Merged Queries", "DimProductUnit1", {"Description"}, {"Description"})
in
#"Expanded DimProductUnit1"
factsalesunit2
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTJSitWJBpI6SsZgljGQZagUGwsA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [IDSale = _t, IDProduct = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"IDSale", Int64.Type}, {"IDProduct", Int64.Type}}),
#"Merged Queries" = Table.NestedJoin(#"Changed Type", {"IDProduct"}, Query2, {"IDProduct"}, "DimProductUnit2", JoinKind.LeftOuter),
#"Expanded DimProductUnit2" = Table.ExpandTableColumn(#"Merged Queries", "DimProductUnit2", {"Description"}, {"Description"}),
#"Merged Queries1" = Table.NestedJoin(#"Expanded DimProductUnit2", {"IDSale"}, Query3, {"IDSale"}, "FactSalesUnit1", JoinKind.LeftOuter),
#"Expanded FactSalesUnit1" = Table.ExpandTableColumn(#"Merged Queries1", "FactSalesUnit1", {"IDProduct", "Description"}, {"IDProduct.1", "Description.1"})
in
#"Expanded FactSalesUnit1"
Output(in factsalesunit2 table)
Best Regards!
Yolo Zhu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi,
Thanks for the solution PwerQueryKees and ToddChitt offered, and i want to offer some more information for user to refer to.
hello @FelipMark , you can use merge queries in power query.
You can refer the folloing queries.Please create these queries in order.
Dimproduct1
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUQrKTE5VitWJVjICcpyKUhNTwDxjIM83NbFEKTYWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [IDProduct = _t, Description = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"IDProduct", Int64.Type}, {"Description", type text}})
in
#"Changed Type"
dimproduct2
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUfJNTSxRitWJVjICcoIyk1PBHGMgx6koNTFFKTYWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [IDProduct = _t, Description = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"IDProduct", Int64.Type}, {"Description", type text}})
in
#"Changed Type"
Factsalesunit1
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSAeJYnWglIyDLCMwyBrKMlWJjAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [IDSale = _t, IDProduct = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"IDSale", Int64.Type}, {"IDProduct", Int64.Type}}),
#"Merged Queries" = Table.NestedJoin(#"Changed Type", {"IDProduct"}, Query1, {"IDProduct"}, "DimProductUnit1", JoinKind.LeftOuter),
#"Expanded DimProductUnit1" = Table.ExpandTableColumn(#"Merged Queries", "DimProductUnit1", {"Description"}, {"Description"})
in
#"Expanded DimProductUnit1"
factsalesunit2
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTJSitWJBpI6SsZgljGQZagUGwsA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [IDSale = _t, IDProduct = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"IDSale", Int64.Type}, {"IDProduct", Int64.Type}}),
#"Merged Queries" = Table.NestedJoin(#"Changed Type", {"IDProduct"}, Query2, {"IDProduct"}, "DimProductUnit2", JoinKind.LeftOuter),
#"Expanded DimProductUnit2" = Table.ExpandTableColumn(#"Merged Queries", "DimProductUnit2", {"Description"}, {"Description"}),
#"Merged Queries1" = Table.NestedJoin(#"Expanded DimProductUnit2", {"IDSale"}, Query3, {"IDSale"}, "FactSalesUnit1", JoinKind.LeftOuter),
#"Expanded FactSalesUnit1" = Table.ExpandTableColumn(#"Merged Queries1", "FactSalesUnit1", {"IDProduct", "Description"}, {"IDProduct.1", "Description.1"})
in
#"Expanded FactSalesUnit1"
Output(in factsalesunit2 table)
Best Regards!
Yolo Zhu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
If it would have been just diffrenet key names, the solution would have been simple, harmonize each separate data source (you have seperate data source anyway) by renaming.
The question is how do you harmonize the values?
Do you think you can harmonize automatically? Here you treat the productnames as an alternate key. In general I would nor regard that as a best practise though, because names tend to have spelling variations and case variations.
But there may me other filds that can act as alternate key?
If not, you need a translation table.
In general I would make a table with 3 fields:
You want to make it scalable. This is very doable in PowerQuery. What I would do:
And the same for the sales tables (you can add the sales data sources also to the datasource table).
Expanding your solution with new datasources will only need 2 steps:
Hope this helps!
If you would like more help on the actual solution, send some more details on your setup and workflow and share some more realistic sample data and queries you already created.
Generally speaking, I wouldn't handle surrogate keys in Power Query. Oh, sure, you can do Indexing and the like, but you can also eat soup with a fork. Question is, why would you want to?
If you can handle the surrogate key in your soure, like SQL Server, and have a stored procedure do the combining, that would probably be best. Then the Fact tables, also processed in SQL, would do a lookup agains the Product Unit to get the Surrogate Key for the respective dimension member.
Proud to be a Super User! | |