wildrose2023
Frequent Visitor

Hierarchy Modelling for SellOut Data

Hi everyone,


I'm looking for some input on the following modeling concept.

The idea is pretty straightforward: we receive SellOut data from different providers—one file contains data at the SKU (product) level, and another contains data at the Brand level (which is the lowest level in that file’s hierarchy).

The SKU-level data is accurate for each SKU, but some SKUs are not reported. This means that simply summing the SKU data results in incorrect Brand totals. In these cases, the external file with Brand-level data becomes the source of truth.


To keep things clean and simple in Power BI (avoiding complex DAX like ISINSCOPE and relying instead on plain SUM, AVG, etc.), I'm aiming to bring everything into a single fact table. For additive measures, my approach is to assign the missing amount, i.e. the Brand total minus the sum of the reported SKUs, to a blank SKU row, which seems to solve the problem neatly.
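In ETL terms, that blank row could be derived roughly like this; a minimal PySpark sketch, where the table names (sellout_sku, sellout_brand) and columns (brand, sku, sales) are placeholders rather than the real schema:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Placeholder inputs: the SKU-level file and the trusted Brand-level file.
sku_df = spark.table("sellout_sku")      # columns: brand, sku, sales
brand_df = spark.table("sellout_brand")  # columns: brand, sales

# Sum the reported SKUs per brand.
reported = sku_df.groupBy("brand").agg(F.sum("sales").alias("reported_sales"))

# The blank row carries the gap between the trusted Brand total and the SKU sum.
blank_rows = (
    brand_df.join(reported, "brand", "left")
    .withColumn("sales", F.col("sales") - F.coalesce(F.col("reported_sales"), F.lit(0.0)))
    .filter(F.col("sales") > 0)                     # only brands with unreported SKUs
    .withColumn("sku", F.lit(None).cast("string"))  # the "blank" SKU member
    .select("brand", "sku", "sales")
)

# One fact table: reported SKU rows plus a blank row per brand where needed.
fact = sku_df.select("brand", "sku", "sales").unionByName(blank_rows)
```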


However, for non-additive measures, such as weighted distribution (WD) expressed as a percentage, there has to be a mathematical rule for the value attributed to the blank row so that the aggregated result still matches the trusted Brand-level figure.
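To make that logic concrete: if the Brand-level WD behaves as a weight-based average of the SKU-level values, and the blank row carries the residual weight, the blank row's value can be solved for algebraically. A small sketch under that assumption (the weighting scheme is illustrative; the real rule depends on how WD is defined in the data):

```python
def blank_row_wd(brand_wd, sku_wd, sku_weight, blank_weight):
    """Reverse-calculate the blank row's WD so that the weighted average
    over all rows reproduces the trusted brand-level WD.

    brand_wd     -- trusted Brand-level weighted distribution (e.g. 0.80)
    sku_wd       -- WD values of the reported SKUs
    sku_weight   -- weights (e.g. sales) of the reported SKUs
    blank_weight -- residual weight assigned to the blank row
    """
    total_weight = sum(sku_weight) + blank_weight
    reported_part = sum(w * x for w, x in zip(sku_weight, sku_wd))
    # Solve: (reported_part + blank_weight * blank_wd) / total_weight == brand_wd
    return (brand_wd * total_weight - reported_part) / blank_weight

# Two reported SKUs; the blank row carries the residual weight of 30.
print(blank_row_wd(0.80, [0.90, 0.70], [100.0, 50.0], 30.0))  # ≈ 0.6333
```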

Does my diagram make sense to you, and does this approach seem solid? Do you think the NON-ADDITIVE part makes it much more complicated to do in Databricks? As always, thank you for the help and input!

[Diagram attached: wildrose2023_0-1744898975160.png]


1 ACCEPTED SOLUTION
v-priyankata
Community Support

Hi @wildrose2023 
Thank you for sharing your detailed explanation and diagram. You’ve clearly put a lot of thought into modeling your SellOut data for both additive and non-additive measures.


Additive Measures:
Your approach of using a blank row to store the “missing” SKU-level data to align with the brand-level source of truth makes sense. This ensures that simple aggregations work cleanly without extra DAX complexity.


Non-Additive Measures:
For weighted distributions (non-additive), your method of reverse-calculating the blank row’s value so that the overall brand-level measure aggregates to the correct average also looks solid. This technique ensures that the final brand-level WD aligns with the trusted brand-level file.


Regarding Databricks (or Spark-based environments):
In principle, the calculation logic for the blank row (reverse-calculating the missing piece) is the same regardless of the underlying platform (Power BI, Databricks, etc.). The challenge in Databricks would typically be to replicate the dynamic calculation of that blank row's value in PySpark or SQL. However, if you pre-calculate and stamp the value in your ETL layer (as your diagram suggests), you avoid runtime complexity in the query layer, just as in Power BI.
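For example, a rough PySpark sketch of stamping that value at ETL time, following the weighted-average logic above (fact, brand_df and the columns brand, sku, weight, wd are assumptions about your schema, with the blank row identified by a NULL sku and its residual weight already set):

```python
from pyspark.sql import functions as F

# Aggregate the reported (non-blank) rows per brand.
reported = (
    fact.filter(F.col("sku").isNotNull())
    .groupBy("brand")
    .agg(
        F.sum(F.col("weight") * F.col("wd")).alias("reported_part"),
        F.sum("weight").alias("reported_weight"),
    )
)

# Overwrite the blank row's wd so the weighted average hits the trusted value.
stamped = (
    fact.join(reported, "brand", "left")
    .join(brand_df.select("brand", F.col("wd").alias("brand_wd")), "brand", "left")
    .withColumn(
        "wd",
        F.when(
            F.col("sku").isNull(),
            (F.col("brand_wd") * (F.col("reported_weight") + F.col("weight"))
             - F.col("reported_part")) / F.col("weight"),
        ).otherwise(F.col("wd")),
    )
    .drop("reported_part", "reported_weight", "brand_wd")
)
```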


If this information is helpful, please “Accept it as a solution” and give a “kudos” to assist other community members in resolving similar issues more efficiently. Thank you.


3 REPLIES

wildrose2023
Frequent Visitor

Thank you @v-priyankata for validating 🙂

lbendlin
Super User

“but some SKUs are not reported.”

How do you know that? It is very hard to report on the absence of things if you don't know that they are absent.
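Worth spelling out: detecting the absence presupposes a reference list of expected SKUs. With one, the check is a simple anti-join; a hypothetical sketch (product_dim and the column names are assumed):

```python
# Hypothetical: product_dim lists every SKU expected to report;
# sku_df is what the provider actually delivered for the period.
unreported = product_dim.join(sku_df.select("sku"), on="sku", how="left_anti")

# Brands whose SKU-level sum will understate the trusted Brand total.
affected_brands = unreported.select("brand").distinct()
```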
