Solved: Re: degenerate dimensions

Jeanxyz · ‎11-14-2025

I'm reading MS documentation about data modeling, and got confused about degenerate dimensions. Basically MS says if there is only one column in the dimension table and this column (e.g., sales order id) exists in the fact table, it's better to delete the sales order id dimension table and filter the fact table using the sales order id in the fact table itself (https://learn.microsoft.com/en-us/power-bi/guidance/star-schema#junk-dimensions).

"A degenerate dimension refers to an attribute of the fact table that's required for filtering. At Adventure Works, the reseller sales order number is a good example. In this instance, it doesn't make sense to create an independent table consisting of just this one column because it would increase the model storage size and result in Data pane clutter.

In the Power BI semantic model, it can be appropriate to add the sales order number column to the fact table to allow filtering or grouping by sales order number. It's an exception to the formerly introduced rule that you shouldn't mix table types (generally, model tables should be either dimension or fact)."

Is this really ok? I have long believed any category fields use for filtering or grouping should come from a dimension table in order to : reduce model size and simplify calculation, avoid auto-exist errors (https://www.sqlbi.com/articles/understanding-dax-auto-exist/) . But this MS documentation seems to break this golden rule.

v-achippa · ‎11-27-2025

Hi @Jeanxyz,

Yes you are right that one column does not automatically mean 1:1. The real rule is that,
If that column has repeated values and is used to group or filter across many fact rows, then it is a real dimension and should stay as a dimension.
If the column is almost unique for each fact row and has no descriptive attributes or reuse, then it behaves like a degenerate dimension and there is no benefit in keeping a separate table.

So the decision is not based on the number of columns, it is based on cardinality and whether the column actually functions as a dimension.

Thanks and regards,

Anjan Kumar Chippa

View solution in original post

KarinSzilagyi · ‎11-14-2025

Hi @Jeanxyz, a degenerate dimension applies specifically when the field would produce a dimension with a 1:1 relationship to your fact, where the dimension only contains that single column.

In that case, creating a separate dimension doesn't add any analytical value. You'd just end up with a table that mirrors the fact table key, which also increases the model size and adds an unnecessary relationship. Power BI would need to filter through a join, just to return exactly the same data you'd get by using the column directly from your fact table.

Did I answer your question? If so, please consider marking my response as the ‘Accepted Solution’ - it helps others with the same issue find the answer more easily!

Jeanxyz · ‎11-14-2025

The real question is why do we need dimension tables in the first place? Are you saying dimension table is only used to remove duplicates from the fact table? How about auto-exist?

Idrissshatila · ‎11-15-2025

Hello @Jeanxyz .

mainly facts store events / transactions like amounts quantities and clicks for example. while dimensions store the attributes you filter / slice with and analyze by like stores, product names, customer names.

so instead of leaving the store name, customer name and attributes like age, gender, email. you create a dimension table that stores all columns of the customer and have a column key.

and only reference the key in the fact so that you eleminate repeated texts so that it performs better and have less storage.

If you keep these attributes in the Fact table:

They are repeated millions of times
They blow up file size
They slow relationships

Did I answer your question? Mark my post as a solution! Appreciate your Kudos

Follow me on LinkedIn

Vote for my Community Mobile App Idea

Proud to be a Super User!

Jeanxyz · ‎11-15-2025

Thanks for explaining the benefits of dimension table. That makes sense.

So here is my next question, what if the dimension table contains only one column as in the case of Dim_SalesOrderID? MS is saying it's better to remove the dimension table since there is only one column in the dimension table.

1. is that correct? Based on your explanation, even if there is only one column in the dimension table, it's better to keep the dimension table because it will speed up the filtering/slicing.

2. what if there are lots of unique rows in the dimension table? Will a long dimension table slow down the filtering query?

3. Can someone explain in more detail how the filtering works in Veripaq. In DataBricks, there is a clear explanation about how groupBy is handled in Executors step by step (shuffle-sorting - partition coalesce etc.)? How does Veripaq handle the filtering (i.e., groupBy) query?

v-achippa · ‎11-19-2025

Hi @Jeanxyz,

Thank you for reaching out to Microsoft Fabric Community.

Thank you @MFelix, @KarinSzilagyi and @Idrissshatila for the prompt response.

In this case microsoft’s guidance is correct. If a dimension contains only a single column that is 1:1 with the fact table, then keeping that column in the fact table is better so that it avoids an unnecessary relationship, reduces model clutter and storage and also speeds queries.

Keep a separate dimension when the column has many repeated values, so that deduplication saves space or when you need attributes such as descriptions, hierarchies, translations, RLS or reuse across multiple fact tables. If the value is unique per fact row and has no additional analytic value, it’s best to keep as a degenerate dimension directly in the fact.

Thanks and regards,

Anjan Kumar Chippa

Jeanxyz · ‎11-20-2025

Sorry, but I'm not convinced with any of the explanations. MS documentation says dimension table has benefits 1, 2, 3, 4, 5..., but the degenerate dimension doc says because benefit 4 doesn't apply, hence the dimension table should be removed. Am I the only one who see the fallacy?

So unless I understand how the fact and dimension relationship acts step by step, I'm afraid I won't buy into this documenation. Also Marco Russo said slicing and dicing should always come from dimension table to avoid auto exist error. No one talks about this in the thread.

v-achippa · ‎11-25-2025

Hi @Jeanxyz,

Thank you for the response and yes the dimension tables normally provide several benefits. The key point here is that those benefits only apply when the table is functioning as a real dimension that means like it has repeated values, attributes or is shared across fact tables.

In a degenerate dimension where a single column that is 1:1 with the fact and has no attributes, none of those benefits apply including AutoExist, because there are no attribute combinations to prune.

That is why microsoft treats it as an exception and recommends keeping it in the fact table. This is not contradicting the general rules, it’s just recognising that a 1-column, high-cardinality table is not functioning as a real dimension.

FYI please refer this below documents:

https://learn.microsoft.com/en-us/power-bi/guidance/star-schema#dimension-tables

https://learn.microsoft.com/en-us/power-bi/guidance/star-schema#degenerate-dimensions

Thanks and regards,

Anjan Kumar Chippa

Jeanxyz · ‎11-25-2025

It's not correct. There is only one column in the dimension table, but it doesn't mean there is a 1:1 relationship between the dimension and fact table.

v-achippa · ‎11-27-2025

Hi @Jeanxyz,

Yes you are right that one column does not automatically mean 1:1. The real rule is that,
If that column has repeated values and is used to group or filter across many fact rows, then it is a real dimension and should stay as a dimension.
If the column is almost unique for each fact row and has no descriptive attributes or reuse, then it behaves like a degenerate dimension and there is no benefit in keeping a separate table.

So the decision is not based on the number of columns, it is based on cardinality and whether the column actually functions as a dimension.

Thanks and regards,

Anjan Kumar Chippa

v-achippa · ‎12-01-2025

Hi @Jeanxyz,

As we haven’t heard back from you, we wanted to kindly follow up to check if the solution I have provided for the issue worked? or let us know if you need any further assistance.

Thanks and regards,

Anjan Kumar Chippa

v-achippa · ‎12-04-2025

Hi @Jeanxyz,

We wanted to kindly follow up to check if the solution I have provided for the issue worked? or let us know if you need any further assistance.

Thanks and regards,

Anjan Kumar Chippa

MFelix · ‎11-14-2025

Hi @Jeanxyz ,

This is a very specific usecase where the Sales Order Table is not a dimension in itself because it aggregates values of the sales order that will allow you to pickup data for the dates, customer and so on that would connect to other dimension tables, so in the end you would need to have a Snowflake model with the Sales connected to the sales order and the sales order connected to the other dimensions.

In this case you should pass all your attributes to the fact table and generate a star schema and since the Sales Order Column is a single column in that model you can filter directly from the fact table.

Regards

Miguel Félix

Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

Check out my blog: Power BI em Português