March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Register NowBe one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now
My question is whether I should separate out single-column dimensions into a dimension tabel or whether to leave them just as-is in the fact table?
I have a list of >1M water assets: pumps, valves, tanks, etc, and one of the fields is its operational status: "In Use", "Abandoned", "Demolished", "Future", "Mothballed" (and more). This field is going to be used a slicer in my Power BI report.
Should I split this field off into a seperate dimension table, given that that dimension table would only have two columns: primary key and the aforementioned descriptions?
I understand the concepts of Star Schema and all that, and the rule of thumb that "fact tables contain data you aggregate on, dimension tables contain data you filter on". At the same time, I also understand that the Power BI engine would create some sort of lookup under the bonnet for such low-cardinality fields anyway, therefore would be pointless and actually having a separate dimension table would create (minor) additional overhead.
So, in the end: is there any reason why I should create single-column dimension tables?
(and would you call this a degenerate dimension or is that something slightly different?)
Hello @TallArnie ,
I would made a dimension table with one column of values and a column for key and used the key in the fact table.
since when you refresh, instead of refreshing several characters per row which are the words, its better to refresh 1 or 2 character per row as a key.
check the normalization part in this article https://learn.microsoft.com/en-us/power-bi/guidance/star-schema
If I answered your question, please mark my post as solution, Appreciate your Kudos 👍
Proud to be a Super User! | |
Thank you @Idrissshatila
All my data is currently in a flat CSV file, with the operational status being one of the columns. And re-reading/refreshing that data will only need to be done on an occasional basis.
Leaving it as-is means I don't need to do anything to this column in Power Query, whilst moving it into a dimension table means I would need to split it off and generate a key. I am not sure if refreshing the latter would be faster than the former.
User | Count |
---|---|
120 | |
78 | |
58 | |
52 | |
46 |
User | Count |
---|---|
170 | |
117 | |
63 | |
58 | |
51 |